11.3 years ago
newDNASeqer
a quick question:
After running TopHat on a FASTQ file, I found that the number of reads in accepted_hits.bam and unmapped.bam combined is greater than the number of reads in the FASTQ file. Why is this? I thought accepted_hits.bam plus unmapped.bam should add up to the total number of reads that TopHat started with.
I used samtools view -c to count the total reads in both accepted_hits.bam and unmapped.bam, and grep "^@" to count the number of reads in the FASTQ file.
Basically, the number of entries in the BAM is not the number of READS, but the number of ALIGNMENTS. If multiple alignments are allowed per read, a read that maps to several locations gets one entry per location, so you will have more alignments than reads.
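To compare like with like, one option is to count distinct reads rather than entries. The sketch below is only an illustration, not something from the original posts: the function names `count_fastq_reads` and `count_unique_reads` and the file name `reads.fastq` are made up, it assumes a standard 4-line-per-record FASTQ, and it assumes mates of a pair share a read name and are told apart by FLAG bits 0x40 and 0x80. It also counts FASTQ records by line count, since grep "^@" can overcount when a quality string happens to start with "@".

```shell
# Count FASTQ records as (number of lines) / 4.
# Assumes standard 4-line records; grep -c "^@" may overcount because
# a quality line can also begin with "@".
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Count distinct reads in SAM text read from stdin.
# A read is identified by its name (column 1) plus its mate status taken
# from the FLAG (column 2): 0x40 = first mate, 0x80 = second mate.
# Header lines, if present, start with "@" and are skipped.
count_unique_reads() {
    awk -F'\t' '
        /^@/ { next }
        {
            mate = 0
            if (int($2 / 64) % 2 == 1) mate = 1
            else if (int($2 / 128) % 2 == 1) mate = 2
            seen[$1 "/" mate] = 1
        }
        END { n = 0; for (k in seen) n++; print n }
    '
}

# Hypothetical usage (file names are placeholders):
#   count_fastq_reads reads.fastq
#   samtools view accepted_hits.bam | count_unique_reads
#   samtools view unmapped.bam | count_unique_reads
```

With counts taken this way, the distinct reads in accepted_hits.bam plus those in unmapped.bam should be much closer to the FASTQ total. If the extra alignments are flagged as secondary (FLAG 0x100), samtools view -c -F 0x100 may give a similar number, but whether that holds for a particular TopHat version is worth checking rather than assuming.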