How to check if a read in a SAM file contains substitutions
2
0
Entering edit mode
6.7 years ago

How I do check whether a read contains substitutions or not. I'm not sure if I should check the flag, the cigar or the sequence.

sam bam cpp • 1.9k views
ADD COMMENT
1
Entering edit mode
6.7 years ago

Typically this is indicated in the NM auxiliary tag. Since you have C++ in the tags, I'll note that you can retrieve this in htslib via uint8_t *v = bam_aux_get(b, "NM") and then int64_t NM = bam_aux2i(v);.

ADD COMMENT
0
Entering edit mode

That's great, what exactly do I have to check in the NM tag?

ADD REPLY
0
Entering edit mode

I gave you two C commands that use htslib in my original reply.

ADD REPLY
0
Entering edit mode

NM is defined as the edit distance to the reference. Insertions and deletions are included . Most caller do not included soft or hard clipped bases in their calculation (the specs do not define whether they should or not).

ADD REPLY
0
Entering edit mode

True, one would also need to parse the CIGAR string and subtract appropriately.

ADD REPLY
1
Entering edit mode
6.7 years ago
d-cameron ★ 2.9k

If you are looking exclusively for base substitution for alignments that use M CIGAR operators (as opposed to X and =), then you either have to a) check each base against the reference and calculate mismatches yourself, or b) use the NM tag and adjust for any indels in the alignment.

ADD COMMENT
3
Entering edit mode

To add to this answer, A while back a wrote a program, ExpandCigar, that expands the M operator with 'X' or '='. After that, it's easy to extract reads where the CIGAR string contains X.

ADD REPLY

Login before adding your answer.

Traffic: 2518 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6