How can read-count from BAM file be greater than than from fastq file?
1
0
Entering edit mode
3.4 years ago
c_u ▴ 520

Hello,

This may be an odd question, but I recently aligned a fastq file (from mouse total-RNA seq) using STAR, and I wanted to get the total read count for further analysis. I tried to get it from the fastq file using

echo $(cat myfile.fastq|wc -l)/4|bc

And the number was 18,526,266.

I then also thought of trying to get it using the BAM file using

samtools flagstat myfile.bam

And the result was -

23712455 + 0 in total (QC-passed reads + QC-failed reads)
6132434 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
23712455 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

I don't understand how the fastq file gives me 18,526,266 reads and the BAM file gives 23,712,455, which is much larger than the result from the former. What am I missing?

rna-seq • 2.2k views
ADD COMMENT
5
Entering edit mode
3.4 years ago

6M of your ~24M reads are secondary alignments (24M - 6M ~ 18M). The most common reason (but not only reason) for a read to have a secondary alignment is that the read aligns to more than one location in the genome, and one of them has been assigned "primary" at random and all the others assigned "secondary" (multimapped reads). An alternative reason might be that the first part of the read maps to one genome location, and the other part of the read to an other genome location (a chimeric read).

ADD COMMENT
0
Entering edit mode

That was quite helpful. Thanks a lot!

ADD REPLY
0
Entering edit mode

So, for using the total # of reads, would it be better to use 23.7M - 6.1M ~ 17.5M or the ~18.5M that fastq gives. I would assume the former would be better as the latter would include reads that didn't match anything?

ADD REPLY
1
Entering edit mode

Yes, the number of primary alignments is the number of reads that were aligned.

ADD REPLY

Login before adding your answer.

Traffic: 2045 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6