Struggling with the information contained in a BAM/SAM file
9.2 years ago

SAM/BAM files are the kind of files whose hidden treasures need to be uncovered by the inexperienced user. That is me...

My case: after analyzing a BAM file, I asked myself how many reads have a mate that remains unmapped.

One possibility to answer this is by analyzing the FLAG values with samtools.

I understand that each FLAG is a unique combination of many individual flags. So all of these FLAG values (73, 89, 121, 153, 185, 137, 77, 141 and so on) contain the "8", which in turn should indicate that the mate read is unmapped. I got this information from this web page, to get an idea of what information the FLAGs can provide.
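
To make the "contains the 8" idea concrete, here is a minimal Python sketch that splits a FLAG into its individual bits. The bit values follow the SAM specification; the function names (`decode_flag`, `mate_unmapped`) and the short descriptions are made up for illustration.

```python
# SAM FLAG bits; the short descriptions are paraphrased for this example.
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the descriptions of every bit that is set in a FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def mate_unmapped(flag):
    """True when bit 8 (mate unmapped) is set, whatever else is set."""
    return bool(flag & 0x8)


if __name__ == "__main__":
    for flag in (73, 89, 121, 153, 185, 137, 77, 141):
        print(flag, mate_unmapped(flag), decode_flag(flag))
```

Note that 77 and 141 also contain the "4" bit, meaning the read itself is unmapped as well as its mate.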

Now a summary. To answer this question I have analyzed a single BAM file in two ways:

  • One is by counting the number of * in column 7 (the RNEXT value), because according to the official SAM specification this could mean that the mate is unmapped ("This field is set as `*` when the information is unavailable"). In this case, I got over 65000 sequences whose mate could be unmapped.
  • However, if I run samtools view file.bam -f 8 | wc -l, I end up with only 2903 sequences.
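
One way to see where the two counts diverge is to cross-tabulate both criteria per record. The sketch below is only an illustration: the function name `tally` and the three SAM records are invented for the example, not taken from the BAM file discussed here. On real data the lines could come from the text output of samtools view.

```python
from collections import Counter


def tally(sam_lines):
    """Count records by (RNEXT is '*', mate-unmapped bit 0x8 is set)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])     # column 2: FLAG
        rnext = fields[6]         # column 7: RNEXT
        counts[(rnext == "*", bool(flag & 0x8))] += 1
    return counts


# Hypothetical records, for illustration only.
toy_records = [
    "r1\t73\tchr1\t100\t60\t50M\t=\t100\t0\t*\t*",   # mate unmapped, RNEXT set
    "r2\t0\tchr1\t200\t60\t50M\t*\t0\t0\t*\t*",      # unpaired read, RNEXT is *
    "r3\t99\tchr1\t300\t60\t50M\t=\t450\t200\t*\t*", # properly paired
]

if __name__ == "__main__":
    for (rnext_star, bit8), n in sorted(tally(toy_records).items()):
        print(f"RNEXT is *: {rnext_star}, bit 8 set: {bit8}, records: {n}")
```

Records that fall in the "RNEXT is * but bit 8 not set" cell are the ones the column 7 count includes and the -f 8 count does not.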

One possibility is that, with the -f option, the program looks for a lone "8" in the FLAG field. But if I look at column 2 of the BAM file, I don't find any FLAG that is just that lone 8. That convinced me that samtools view -f tries to find any FLAG value that intrinsically contains the 8, and thus it should tell me how many mates are unmapped.

With all this information, I am still not fully confident about what the right answer to this question is. Or I have serious doubts about the usefulness of the -f option of the samtools view command. Or maybe many other "lone" flags should be included in the search, because I am missing some important information and/or the orientation does not seem to matter when the BAM file is generated.

SAM aligment BAM • 2.5k views
9.2 years ago

-f 8 is an AND operation, so it just checks that all bits set in 8 are also set in the flag; it matches any FLAG that contains 8, not only a FLAG equal to 8. -F, on the other hand, excludes a read as soon as a single bit overlaps between the value provided and the flag. Anyway, the most robust method would be samtools view -cf8 -F 4 file.bam, which counts reads that are themselves mapped but whose mate is unmapped. This discounts cases where both mates in a pair are unmapped, since I assume you don't care about those. Note that using column 7 to get this is risky, since it relies on aligners behaving in a way that isn't guaranteed (e.g., some aligners will give mapping coordinates to unmapped reads).
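
The -f / -F logic can be sketched in a few lines of Python. This is a minimal illustration of the bit tests only, not samtools' actual code; the function name `passes_filter` and the list of example FLAG values are made up.

```python
def passes_filter(flag, require=0, exclude=0):
    """Mimic the -f REQUIRE / -F EXCLUDE tests on a single FLAG value."""
    has_all_required = (flag & require) == require  # -f: every bit must be set
    has_no_excluded = (flag & exclude) == 0         # -F: no bit may be set
    return has_all_required and has_no_excluded


# Example FLAG values chosen for illustration.
flags = [73, 77, 99, 141, 147, 153]

# Equivalent of -f 8: the mate is unmapped, whatever the read's own status.
mate_unmapped = [f for f in flags if passes_filter(f, require=8)]

# Equivalent of -f 8 -F 4: the read is mapped, its mate is not.
mapped_with_unmapped_mate = [
    f for f in flags if passes_filter(f, require=8, exclude=4)
]

if __name__ == "__main__":
    print(mate_unmapped)
    print(mapped_with_unmapped_mate)
```

With these values, 77 and 141 pass -f 8 alone but are dropped once -F 4 is added, because in those records the read itself is unmapped too.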


So what exactly does the * value in the 7th column provide?

Since you say "more robust", I understand you believe that 2903 is the correct answer.


The asterisk comes from the aligner, and each aligner has different quirks in its output. For example, I recall that some aligners won't set mate alignment information if they align mates as singletons, though I don't recall exactly which ones do this.
