I am pretty new to BAM/SAM files, but I would like to estimate the edit distance to the reference (NM) on a subset of bases in each read rather than over the full read length. Say my reads are 100 bp long and I already have the edit distance over the full length of each read; I would like to estimate NM on a shorter stretch, say 20 bases per read. Is there a way to do that? I was wondering whether I could "cut" the MD tag and recalculate NM with samtools calmd, but I have no idea how to do that. Does anyone have any suggestions? Thanks a lot.
Thanks, Pierre, for your suggestion. I am sorry, I think I am missing something needed to understand it properly; could you be a bit more specific, please? I know how to extract my CIGAR values, but what do you mean by "loop over the cigar-string + the read bases"? And secondly, what is an API reading? Thanks a lot.
To compute the edit distance you'll have to compare the read bases with the reference bases. To know which read base is aligned to which reference base, you'll need to parse the elements of the CIGAR string.
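To make "loop over the CIGAR string + the read bases" concrete, here is a minimal Python sketch. It assumes you already have the CIGAR string, the read sequence (SEQ), and the reference bases starting at the alignment position (POS) as plain strings. The function name `edit_distance` and its arguments are made up for illustration and are not part of samtools or any library. It counts mismatches plus inserted plus deleted bases, which is how NM is defined.

```python
import re

# One CIGAR element = a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def edit_distance(cigar, read_seq, ref_seq):
    """Mismatches + inserted bases + deleted bases over the whole alignment.

    ref_seq is assumed to hold the reference bases starting at the
    alignment's POS (hypothetical input, fetched by you beforehand).
    """
    nm = 0
    read_pos = 0  # index into read_seq
    ref_pos = 0   # index into ref_seq
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            # Aligned block: compare base by base.
            for i in range(length):
                if read_seq[read_pos + i].upper() != ref_seq[ref_pos + i].upper():
                    nm += 1
            read_pos += length
            ref_pos += length
        elif op == "I":
            # Inserted bases exist only in the read.
            nm += length
            read_pos += length
        elif op == "D":
            # Deleted bases exist only in the reference.
            nm += length
            ref_pos += length
        elif op == "S":
            # Soft clip: bases are in SEQ but not aligned.
            read_pos += length
        elif op == "N":
            # Skipped reference region (e.g. intron): not an edit.
            ref_pos += length
        # H and P consume neither read nor reference.
    return nm

# Toy alignment: 1 mismatch + 1 inserted base + 2 deleted bases.
print(edit_distance("2S4M1I3M2D2M", "NNACGTACCCGG", "ACGACCCTTGG"))
```

The same walk works on records read through a BAM library; only the way you obtain the three strings changes.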
And how do I select the first 20 aligned bases to compute the MD and NM?
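One possible way, sketched in Python under a few assumptions: if the records already carry an MD tag, you do not need the reference at all, because the MD tag marks every mismatched and deleted reference base. The sketch below expands MD into one flag per reference base, walks the CIGAR, and stops once `n` read bases have been counted. The names `expand_md` and `nm_first_n` are hypothetical. The conventions are my own choices rather than anything samtools defines: soft-clipped bases do not count towards the window, inserted bases do, and a deletion is counted only if it occurs before the window is full. Note that SEQ is stored in reference orientation, so for reverse-strand reads "first" means leftmost on the reference, not the first sequenced cycle.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tokens: a run of matches, a deletion (^BASES), or one mismatched base.
MD_RE = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")

def expand_md(md):
    """Expand an MD tag into one flag per reference base:
    '=' match, 'X' mismatch, 'D' base deleted from the read."""
    flags = []
    for match_len, deleted, mismatch in MD_RE.findall(md):
        if match_len:
            flags.extend("=" * int(match_len))
        elif deleted:
            flags.extend("D" * len(deleted))
        else:
            flags.append("X")
    return flags

def nm_first_n(cigar, md, n=20):
    """Edit distance restricted to the first n aligned/inserted read bases."""
    flags = iter(expand_md(md))
    nm = 0
    seen = 0  # read bases counted so far (soft clips excluded)
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            for _ in range(length):
                if seen == n:
                    return nm
                if next(flags) == "X":
                    nm += 1
                seen += 1
        elif op == "I":
            # Insertions are absent from MD but are read bases and edits.
            for _ in range(length):
                if seen == n:
                    return nm
                nm += 1
                seen += 1
        elif op == "D":
            if seen == n:
                return nm
            # Consume the deleted reference bases recorded in MD.
            for _ in range(length):
                next(flags)
            nm += length
        # S, H, N, P: no MD entries and not part of the window.
    return nm

# Same toy alignment in CIGAR + MD form; window of 5 read bases.
print(nm_first_n("2S4M1I3M2D2M", "3A3^TT2", n=5))
```

With `n` larger than the read, the result is the full-length edit distance, which makes it easy to sanity-check against the NM tag already in the file.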
OK... I think I lack the skills to do that in practice :( Thanks a lot for your support.