Estimate edit distance to a reference for a subset of bases in reads (bam files)
3.8 years ago
AQ7 ▴ 30

I am fairly new to BAM/SAM files, and I would like to estimate the edit distance to the reference (NM) not over the full length of my reads but over a subset of bases in each read. Say my reads are 100 bp long and I already have the edit distance over the full length of each read; I would like to estimate NM over a shorter stretch, say 20 bases per read. Is there a way to do that? I was thinking of "cutting" the MD tag and recalculating NM with samtools calmd, but I have no idea how to do that. Does anyone have suggestions? Thanks a lot.

next-gen bam samtools sequence • 815 views
3.8 years ago

You'll have to loop over the CIGAR string and the read bases, and compare them with the reference genome. You'll need an API that can read both the BAM file and the reference.


Thanks, Pierre, for your suggestion. I'm sorry, I think I am missing some background to understand it properly; could you be a bit more specific, please? I know how to extract my CIGAR values, but what do you mean by "loop over the cigar-string + the read bases"? And secondly, what do you mean by "an API reading the BAM"? Thanks a lot.


To compute the edit distance you'll have to compare the read bases with the reference bases. To know which read base is aligned to which reference base, you'll need to parse the elements of the CIGAR string.
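
As a rough illustration of what parsing the CIGAR elements means, here is a minimal pure-Python sketch. The function names `parse_cigar` and `aligned_pairs` are made up for this example and are not part of samtools or any library; the op handling follows the SAM convention that M/=/X consume both read and reference, I/S consume only the read, and D/N consume only the reference.

```python
import re

# One CIGAR element is a length followed by a single operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5M1I4M' into (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def aligned_pairs(cigar, ref_start=0):
    """Yield (read_index, ref_index) pairs; None marks a gap on that side.

    ref_start is the 0-based reference position of the first aligned base
    (hypothetical parameter for this sketch).
    """
    read_i, ref_i = 0, ref_start
    for length, op in parse_cigar(cigar):
        for _ in range(length):
            if op in "M=X":      # read base aligned to a reference base
                yield read_i, ref_i
                read_i += 1
                ref_i += 1
            elif op in "IS":     # base present only in the read
                yield read_i, None
                read_i += 1
            elif op in "DN":     # base present only in the reference
                yield None, ref_i
                ref_i += 1
            # H and P consume neither read nor reference


pairs = list(aligned_pairs("2M1I2M1D1M", ref_start=100))
```

Each pair tells you which read base to compare with which reference base; pairs containing `None` are insertions, deletions, or clipped bases.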


And how do I select the first 20 aligned bases to compute MD and NM?


You'll have to loop over the CIGAR string.
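
One possible way to restrict the count to the first 20 bases is sketched below in pure Python. It is only an assumption about what "first 20 bases" should mean: the window counts read bases in M/=/X/I operations, skips soft clips, and stops before any deletion that follows the last base in the window. The function name `windowed_nm` and its parameters are hypothetical, and `ref_seq` is assumed to be the reference sequence starting at the read's leftmost aligned position (which you would have to fetch yourself from the FASTA).

```python
import re


def windowed_nm(read_seq, ref_seq, cigar, n_bases=20):
    """Mismatches + inserted + deleted bases over the first n_bases read bases.

    Assumption: ref_seq[0] is the reference base under the first aligned
    (non-clipped) read base.
    """
    read_i = ref_i = 0
    seen = 0   # read bases counted in the window so far (soft clips excluded)
    nm = 0
    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        for _ in range(int(n)):
            if seen >= n_bases:
                return nm
            if op in "M=X":
                # aligned base: counts as an edit only if it differs
                if read_seq[read_i].upper() != ref_seq[ref_i].upper():
                    nm += 1
                read_i += 1
                ref_i += 1
                seen += 1
            elif op == "I":      # inserted base: one edit, consumes read only
                nm += 1
                read_i += 1
                seen += 1
            elif op == "D":      # deleted base: one edit, consumes reference only
                nm += 1
                ref_i += 1
            elif op == "S":      # soft clip: skip the read base, not an edit
                read_i += 1
            elif op == "N":      # skipped reference region: not an edit
                ref_i += 1
    return nm


# Toy example: 1 mismatch in the first 4M, then 1 insertion, then 1 deletion.
ref = "ACGTACGGTA"
read = "ACCTGACGTA"
cigar = "4M1I3M1D2M"
nm_first4 = windowed_nm(read, ref, cigar, n_bases=4)
nm_full = windowed_nm(read, ref, cigar, n_bases=100)
```

With a library that reads BAM files you would take the read sequence, CIGAR string, and start position from each record and call a function like this per read; for reverse-strand reads you would also need to decide which end counts as the "first" 20 bases.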


OK... I think I lack the skills to do that in practice :( Thanks a lot for your support.
