How to get an exact match from the bam alignment file
5.3 years ago
xiaoguang ▴ 160

How can I get exact matches from a BAM alignment file? It is well known that M in the CIGAR string covers both matches and mismatches, so how do I get only the reads whose sequence is completely identical to the reference?

bam bwa samtools • 1.8k views
5.2 years ago
yasokannan93 ▴ 20

You can use a combination of the read length and the M operation in the CIGAR string from the alignment file. For example, if the read length is 100, you can try extracting reads with a CIGAR string of 100M. This would mean the read is completely consistent, i.e. an exact match.


That is not true: an M can also be a mismatch, as xiaoguang correctly says. M only says that there was neither an insertion nor a deletion, but a single base aligned at that position. A possible strategy could be to check the MD tag, which should contain the positions of mismatches. If it reports none, there should not be any mismatch; check the SAM documentation and other threads to confirm this: https://samtools.github.io/hts-specs/SAMtags.pdf
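A minimal sketch of that strategy in Python, working on one SAM text record (for example a line printed by samtools view). The function name is_exact_match is hypothetical, and the sketch assumes the aligner writes an MD tag and uses M rather than the extended = / X CIGAR operations. It treats a read as an exact match only when the CIGAR is a single M run covering the whole read and the MD tag is a single number equal to the read length.

```python
import re


def is_exact_match(sam_line: str) -> bool:
    """Return True if a SAM record looks like a full-length exact match.

    Assumptions (not from the thread): the record carries an MD tag and
    the CIGAR uses M rather than the extended =/X operations.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]

    # Optional tags start at column 12; look for MD:Z:<value>.
    md = None
    for tag in fields[11:]:
        if tag.startswith("MD:Z:"):
            md = tag[5:]
            break
    if md is None:
        return False  # cannot decide without the MD tag

    # CIGAR must be one M run spanning the read: no indels, no clipping.
    cigar_match = re.fullmatch(r"(\d+)M", cigar)
    if cigar_match is None or int(cigar_match.group(1)) != len(seq):
        return False

    # MD must be a bare number (no mismatch letters, no ^deletions)
    # equal to the read length.
    return re.fullmatch(r"\d+", md) is not None and int(md) == len(seq)


if __name__ == "__main__":
    record = "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0\tMD:Z:10"
    print(is_exact_match(record))
```

Records without an MD tag are rejected here rather than guessed at; whether that is the right default depends on how the BAM file was produced.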


Thanks for the clarification. So for the example, should we expect MD:Z:100?

https://samtools.github.io/hts-specs/SAMtags.pdf says

The MD field aims to achieve SNP/indel calling without looking at the reference. For example, a string ‘10A5^AC6’ means from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference which is different from the aligned read base; the next 5 reference bases are matches followed by a 2bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches. The MD field ought to match the CIGAR string.
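To make the quoted example concrete, here is a small Python sketch that splits an MD string into its parts. The function name parse_md and the tuple labels are hypothetical, chosen only for illustration; the token grammar (numbers, single reference bases, and ^-prefixed deleted bases) follows the description quoted above.

```python
import re

# One token is either a run of matches, a ^-prefixed deletion,
# or a single mismatched reference base.
_MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def parse_md(md: str):
    """Split an MD string into (kind, value) tuples.

    Zero-length match runs (e.g. the 0 between two adjacent mismatches)
    are skipped. Characters outside the grammar are ignored, so this is
    not a validator.
    """
    ops = []
    for token in _MD_TOKEN.finditer(md):
        if token.group(1) is not None:
            count = int(token.group(1))
            if count > 0:
                ops.append(("match", count))
        elif token.group(2) is not None:
            ops.append(("deletion", token.group(2)))
        else:
            ops.append(("mismatch", token.group(3)))
    return ops


if __name__ == "__main__":
    print(parse_md("10A5^AC6"))
    # [('match', 10), ('mismatch', 'A'), ('match', 5), ('deletion', 'AC'), ('match', 6)]
```

Under this reading, a 100 bp read with no mismatches and no deletions parses to a single ('match', 100) entry, which corresponds to MD:Z:100.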
