8.9 years ago
kanika.151
I ran a de novo assembly on my data using Trinity. I then aligned the reads against the assembled .fasta file with TopHat, without providing an annotation file. The unmapped.bam files are larger than I expected, ranging from 100 MB to 600 MB across conditions. I have 6 different conditions and paired-end data.
My question: is it normal to get such a high number of unmapped reads?
One of the align_summary.txt files:
Left reads:
Input : 12382431
Mapped : 11331326 (91.5% of input)
of these: 10276265 (90.7%) have multiple alignments (312919 have >20)
Right reads:
Input : 12382431
Mapped : 11346906 (91.6% of input)
of these: 10290863 (90.7%) have multiple alignments (312928 have >20)
91.6% overall read mapping rate.
Aligned pairs: 11146003
of these: 10125490 (90.8%) have multiple alignments
65963 ( 0.6%) are discordant alignments
89.5% concordant pair alignment rate.
Should I be concerned?
Don't go by the file size. Check what percentage of reads are unmapped. According to the align_summary, about 91.6% of reads mapped back to the assembled transcriptome.
Note: Because you are aligning to a transcriptome, which may contain multiple assembled transcripts for the same gene (redundancy), you will get more multi-mapped reads.
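The percentage check suggested here can be sketched in a few lines of Python. This is only an illustration: the function name `unmapped_fraction` is made up, and it assumes the summary text contains `Input :` and `Mapped :` lines like those in the align_summary.txt posted in the question.

```python
import re

def unmapped_fraction(summary_text):
    """Return (total_input, total_mapped, percent_unmapped).

    Hypothetical helper: sums the 'Input' and 'Mapped' counts found in the
    text of a TopHat align_summary.txt (left and right reads together).
    """
    inputs = [int(n) for n in re.findall(r"Input\s*:\s*(\d+)", summary_text)]
    mapped = [int(n) for n in re.findall(r"Mapped\s*:\s*(\d+)", summary_text)]
    total_in = sum(inputs)
    total_mapped = sum(mapped)
    pct_unmapped = 100.0 * (total_in - total_mapped) / total_in
    return total_in, total_mapped, pct_unmapped

# Counts copied from the align_summary.txt in the question.
summary = """Left reads:
Input : 12382431
Mapped : 11331326 (91.5% of input)
Right reads:
Input : 12382431
Mapped : 11346906 (91.6% of input)
"""

total_in, total_mapped, pct = unmapped_fraction(summary)
print(f"{total_in - total_mapped} of {total_in} reads unmapped ({pct:.1f}%)")
```

For the numbers in the question this gives roughly 8.4% unmapped reads for that sample, which is the figure to compare across conditions rather than the size of unmapped.bam.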
The Trinity assembly results in a factor of 3 in my case. I was expecting some of the data to be unmapped, but 15-20% of the data not aligning raised some flags.
There could be better ways, but I would just BLAST a few of the unmapped reads and see what they are.
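One way to pull out a handful of unmapped reads for BLAST is to convert a few SAM records to FASTA. The sketch below is an assumption-laden example, not a fixed recipe: `sam_to_fasta` is a hypothetical helper, and the two SAM lines are invented placeholders standing in for the text you would get by dumping unmapped.bam to SAM (for example with `samtools view`).

```python
def sam_to_fasta(sam_lines, limit=20):
    """Convert up to `limit` SAM records to FASTA text.

    Hypothetical helper: uses column 1 (QNAME) as the header and
    column 10 (SEQ) as the sequence, skipping header lines.
    """
    records = []
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, seq = fields[0], fields[9]
        if seq == "*":                    # sequence not stored
            continue
        records.append(f">{name}\n{seq}")
        if len(records) >= limit:
            break
    return "\n".join(records)

# Invented example records (flag 4 = read unmapped); replace with real SAM lines.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read_1\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read_2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAATT\tIIIIIIIIII",
]
fasta = sam_to_fasta(example, limit=20)
print(fasta)
```

The resulting FASTA text can be pasted into a BLAST search to see whether the unmapped reads look like rRNA, contamination, adapters, or something else.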