Does a BAM File Include Unmapped Reads As Well?
11.8 years ago
Jordan

Hi,

I have a rookie question. I was using samtools flagstat to check the statistics of BAM files. When I view the results for a BAM file, the number of reads that pass QC is sometimes greater than the number of reads that mapped. My understanding is that BAM files include only mapped reads. Do they contain unmapped reads too?

For example:

$ samtools flagstat file.bam
257823892 + 0 in total (QC-passed reads + QC-failed reads)
132531248 + 0 duplicates
209402202 + 0 mapped (81.22%:nan%)
257823892 + 0 paired in sequencing
128911946 + 0 read1
128911946 + 0 read2
152678438 + 0 properly paired (59.22%:nan%)
48421690 + 0 singletons (18.78%:nan%)
3565988 + 0 with mate mapped to a different chr
1316058 + 0 with mate mapped to a different chr (mapQ>=5)

Here, the mapping rate is 81.22%. I thought that if BAM files contain only mapped reads, it should be 100%. Can anyone help me understand this? I tried looking online but had no luck.

The BAM file was generated by LifeScope mapping of paired SOLiD reads.

Thanks!
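
The 81.22% on the "mapped" line is the mapped count divided by the total number of records. Whatever is left over is present in the file but unmapped. The minimal Python sketch below reproduces that arithmetic from the output above using only the standard library. The helper name `parse_flagstat` is hypothetical. The regex assumes the older `<QC-passed> + <QC-failed> <label>` line format shown here, and newer samtools releases print additional lines.

```python
import re

# Output copied from the question; the "read1"/"read2" and
# "mate mapped to a different chr" lines are left out because they are not used.
FLAGSTAT = """\
257823892 + 0 in total (QC-passed reads + QC-failed reads)
132531248 + 0 duplicates
209402202 + 0 mapped (81.22%:nan%)
257823892 + 0 paired in sequencing
152678438 + 0 properly paired (59.22%:nan%)
48421690 + 0 singletons (18.78%:nan%)
"""


def parse_flagstat(text):
    """Hypothetical helper: map each flagstat label (e.g. 'mapped') to its QC-passed count."""
    counts = {}
    for line in text.splitlines():
        # "<QC-passed> + <QC-failed> <label> [(percentages)]"
        m = re.match(r"(\d+) \+ (\d+) (.+?)(?: \(.*\))?$", line)
        if m:
            counts[m.group(3)] = int(m.group(1))
    return counts


counts = parse_flagstat(FLAGSTAT)
total = counts["in total"]
mapped = counts["mapped"]
unmapped = total - mapped  # records in the file that are not mapped

print(f"{mapped / total:.2%}")  # 81.22%
print(unmapped)                 # 48421690
```

So roughly 48.4 million records in this file are unmapped reads.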

I guess that with LifeScope, read pairs in which both reads remain unaligned end up in an unmapped.bam file. I think you have the option to choose what to do with the unmapped reads. However, the mapped BAM file will contain both reads of any pair in which one read mapped and the other failed to map.

Oh, I see. In that case, singletons are all the single reads that failed to map, and the number 209,402,202 is all the pairs that were mapped. Is that right?
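
For reference, samtools flagstat counts reads (records), not pairs. Its "singletons" line counts reads that are themselves mapped but whose mate is unmapped.

With the numbers from the question, the unmapped count (total minus mapped) comes out exactly equal to the singleton count. That is consistent with, though it does not prove, the comment above: pairs in which both reads fail to map are written elsewhere, so every unmapped read left in this file has a mapped mate. A small Python check of that arithmetic:

```python
# Counts taken from the flagstat output in the question
total = 257_823_892
mapped = 209_402_202
singletons = 48_421_690  # mapped reads whose mate is unmapped

unmapped = total - mapped  # unmapped reads (records), not pairs
print(unmapped)                # 48421690
print(unmapped == singletons)  # True
```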

11.8 years ago

Some aligners leave out unmapped reads; I think Bowtie does that by default. Others, like BWA, leave them in. I guess LifeScope leaves them in.

Note that samtools flagstat only reads the flags. It does not make any QC decisions, duplicate decisions, or anything like that itself. There is a flag for "failed QC", but that doesn't mean the software you used necessarily tried to assess QC. So you can't take those flagstat lines at face value: they only mean something if you ran software that would have set those flags correctly in your .bam. If you are sure you should have more reads, maybe LifeScope was doing an internal QC and dumping the bad reads.
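
To illustrate that flagstat-style counts are nothing more than bit tests on each record's FLAG, here is a simplified Python tally. It is an assumption-laden sketch, not samtools' implementation: real flagstat also splits QC-passed from QC-failed columns and handles secondary and supplementary records. The flag values in the example are hypothetical.

```python
# SAM FLAG bits as defined in the SAM format specification
UNMAPPED = 0x4
QC_FAIL = 0x200
DUPLICATE = 0x400


def tally(flags):
    """Count records by testing FLAG bits only; no QC or duplicate logic is applied."""
    counts = {"total": 0, "mapped": 0, "qc_failed": 0, "duplicates": 0}
    for flag in flags:
        counts["total"] += 1
        if not flag & UNMAPPED:
            counts["mapped"] += 1
        if flag & QC_FAIL:
            counts["qc_failed"] += 1
        if flag & DUPLICATE:
            counts["duplicates"] += 1
    return counts


# Hypothetical flags: a proper pair (99/147), a pair with one unmapped
# read (73/133), and one read marked as a duplicate (1123 = 1024 + 99)
print(tally([99, 147, 73, 133, 1123]))
# {'total': 5, 'mapped': 4, 'qc_failed': 0, 'duplicates': 1}
```

The `qc_failed` count is 0 here only because no tool set the 0x200 bit. It says nothing about whether the reads would actually pass a QC check.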

11.8 years ago

It also stores unmapped reads; you can identify them by their flags. See: http://picard.sourceforge.net/explain-flags.html
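
The explain-flags page decodes a FLAG integer into its individual bits. A minimal Python equivalent is sketched below. The bit values come from the SAM specification, while the descriptions are paraphrased and may not match that page's wording exactly. `explain_flag` is a hypothetical helper name.

```python
# (bit, description) pairs for the SAM FLAG field
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the descriptions of every bit set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


print(explain_flag(133))  # ['read paired', 'read unmapped', 'second in pair']
print(explain_flag(73))   # ['read paired', 'mate unmapped', 'first in pair']
```

Any record whose FLAG has the 0x4 bit set is an unmapped read stored in the BAM file.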

The total number of reads in this file was 324 million, but the BAM file has only 257 million reads overall. How were these 257 million reads selected, then?

It looks like QC-failed reads were removed.

6.4 years ago

As for the question "Does a BAM file have unmapped reads too?": keeping the unaligned reads is an option. Some workflows keep them and others don't, depending on their use case, so the answer depends on the file you have at hand.
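
For the file at hand, you can check directly by looking for records with the 0x4 bit set. On the command line, `samtools view -c -f 4 file.bam` counts them and `samtools view -c -F 4 file.bam` counts the rest. The Python sketch below does the same split on SAM text lines, such as those printed by `samtools view -h`. The records are hypothetical toy data, and `split_by_mapping` is a hypothetical helper name.

```python
def split_by_mapping(sam_lines):
    """Separate SAM alignment lines into mapped and unmapped by the 0x4 FLAG bit."""
    mapped, unmapped = [], []
    for line in sam_lines:
        if line.startswith("@"):  # header lines carry no FLAG
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd mandatory column
        (unmapped if flag & 0x4 else mapped).append(line)
    return mapped, unmapped


# Hypothetical records: r1 has one mapped and one unmapped mate,
# r2 is a pair in which neither read mapped
SAM = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t73\tchr1\t100\t60\t10M\t=\t100\t0\tACGTACGTAC\t*",
    "r1\t133\tchr1\t100\t0\t*\t=\t100\t0\tTTTTTTTTTT\t*",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tGGGGGGGGGG\t*",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tCCCCCCCCCC\t*",
]

mapped, unmapped = split_by_mapping(SAM)
print(len(mapped), len(unmapped))  # 1 3
```

If the unmapped list is empty for a real file, that workflow most likely dropped unaligned reads or wrote them elsewhere.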
