Hi. I am trying to retrieve exact breakpoint positions from the CIGAR string using Python. I know how to use a regex to retrieve the integers that precede the letter denoting an insertion or deletion. How can I sum these integers, or do anything else, so that my script reports the start and end position of my indel?
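import re
a = '12M1I'
# only parse when the CIGAR contains an insertion or a match
if 'I' in a or 'M' in a:
    # lengths of all insertion (I) operations
    matchI = re.findall(r'(\d+)I', a)
    intlistI = [int(x) for x in matchI]
    print(matchI)
    # lengths of all match (M) operations
    matchM = re.findall(r'(\d+)M', a)
    intlistM = [int(x) for x in matchM]
    print(matchM)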
Or, more simply:
# list of (length, operation) pairs, e.g. [('12', 'M'), ('1', 'I')]
match = re.findall(r'(\d+)(\w)', a)
print(match)
Hmm, that's not really helpful. I'm already using pysam later in my script, and I already report whether a read has an insertion or a deletion. I only want the script to report the exact location of the indel, that is, its position counted from the starting position of the read alignment. Example: for a read with CIGAR 10M1I10M1D and a starting position of, say, 10, I would like this output:
1 insertion 20
1 deletion 31
Store the initial position (read.pos) in a variable. For each match or deletion, increase the variable by the length of that CIGAR operation. Whenever you reach an insertion or deletion, report it at the current value of the variable.
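As a rough sketch of that idea, assuming you pass in what pysam exposes as read.reference_start (read.pos in older versions, 0-based) and read.cigartuples (a list of (operation, length) pairs using the BAM operation codes). The function name report_indels is made up for illustration, and it takes plain values so it can be tried without a BAM file:

```python
# BAM operation codes as they appear in pysam's read.cigartuples
CIGAR_INS, CIGAR_DEL = 1, 2
# operations that advance the reference position: M, D, N, =, X
REF_CONSUMING = {0, 2, 3, 7, 8}

def report_indels(reference_start, cigartuples):
    """Hypothetical helper: return (length, kind, reference position) per indel."""
    pos = reference_start  # running position on the reference
    report = []
    for op, length in cigartuples:
        if op == CIGAR_INS:
            # inserted bases sit just before reference position `pos`
            report.append((length, 'insertion', pos))
        elif op == CIGAR_DEL:
            # first deleted reference base is at `pos`
            report.append((length, 'deletion', pos))
        if op in REF_CONSUMING:
            pos += length
    return report

# 10M1I10M1D with the alignment starting at position 10
for length, kind, pos in report_indels(10, [(0, 10), (1, 1), (0, 10), (2, 1)]):
    print(length, kind, pos)
```

With this convention the deletion is reported at its first deleted base (30 in this example). If you count differently (1-based coordinates, or counting inserted bases as well), the numbers shift accordingly.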
Yes, the problem is simple:
a = '10M1I10M1D'
pos = 10
match = re.findall(r'(\d+)(\w)', a)
print(match)
for i in match:
    # print(i[0], i[1])
    posindel = pos + int(i[0])
    print(posindel, i[0], i[1])
But this does not report the correct position of each subsequent event.
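One likely reason: the loop above adds each length to the same fixed pos, so nothing accumulates from one operation to the next. A minimal sketch of a running-position version that works directly on the CIGAR string is below. The function name indel_positions is invented for this example, and the coordinate convention (start of the event, half-open end on the reference) is an assumption you may need to adjust:

```python
import re

def indel_positions(cigar, start):
    """Hypothetical helper: list (kind, length, ref_start, ref_end) per indel."""
    pos = start  # running reference position, updated as we walk the CIGAR
    events = []
    for length, op in re.findall(r'(\d+)([MIDNSHP=X])', cigar):
        length = int(length)
        if op == 'I':
            # an insertion consumes no reference bases: start == end
            events.append(('insertion', length, pos, pos))
        elif op == 'D':
            # a deletion covers reference positions [pos, pos + length)
            events.append(('deletion', length, pos, pos + length))
        if op in 'MDN=X':
            # only these operations advance the reference position
            pos += length
    return events

for event in indel_positions('10M1I10M1D', 10):
    print(event)
# ('insertion', 1, 20, 20)
# ('deletion', 1, 30, 31)
```

Insertions, soft clips and hard clips are skipped when advancing pos because they do not consume reference bases.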
I am confused: is this a problem with the CIGAR string or with the program?