Dear friends,
I have six samples (more than ~36M reads per sample) from a ribosome profiling experiment. In all samples, >95% of the reads contain adapter sequence, as expected. TopHat (default parameters) was used to align the reads against the human genome (hg38). One sample had an alignment rate of 81.50%; the rest had alignment rates ranging from 0.60% to 9.00%.
The reads that did not align to the human genome were searched against the NCBI nt database using BLAST, but we could map at most 10% of the reads. We have checked for barcode and adapter contamination. I have the following questions:
1. What could be the reason for the low alignment rate in the other samples?
2. How can I check where these unaligned reads come from?
Any suggestions would be appreciated.
I suggest running FastQC on your samples to see whether there are differences in read quality, overrepresented sequences, etc. between the "good" sample and the others.
Can you give more information?
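If it helps to compare the samples side by side alongside the FastQC reports, here is a minimal Python sketch that tallies the most frequent read sequences and the read-length distribution from FASTQ lines. It is not part of FastQC and does not reproduce its statistics. The function names (`read_sequences`, `summarize`) and the `max_reads` subsampling limit are my own illustrative choices, and it assumes plain 4-line FASTQ records.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each record, assuming 4 lines per FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def summarize(
    lines: Iterable[str],
    top_n: int = 5,
    max_reads: Optional[int] = None,
) -> Tuple[int, List[Tuple[str, int, float]], Dict[int, int]]:
    """Return (reads examined, top sequences with count and percent, length histogram).

    max_reads is a hypothetical convenience: it limits the tally to the first
    N reads so memory stays bounded on very large files.
    """
    seq_counts: Counter = Counter()
    length_counts: Counter = Counter()
    for seq in islice(read_sequences(lines), max_reads):
        seq_counts[seq] += 1
        length_counts[len(seq)] += 1
    total = sum(length_counts.values())
    top = [
        (seq, n, 100.0 * n / total)
        for seq, n in seq_counts.most_common(top_n)
    ]
    return total, top, dict(length_counts)
```

To use it on a compressed file, open it in text mode, for example `with gzip.open(path, "rt") as fh: summarize(fh, max_reads=1_000_000)`, and run it once per sample. If a handful of sequences account for a large share of the reads in the poorly aligning samples but not in the "good" one, those sequences are natural candidates to BLAST individually.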