Hello, I am working on a task, where have an interval - for instance chromosome 10 12000-12008 and I have to iterate over those reads that overlap this interval (The overlap should be 100%, meaning I skip reads that start at 12001,12002... or end before the position 12008).
Now, I cannot decide what approach would work for me since I came across a problem with deletions. Lets say I do pileup and iterate over those reads that overlap position 12 000 (I only look for reads that have to overlap whole interval). However, if I find out that the base is deletion (function pileupread.is_del) I want to check if following 8th bases are deletions as well. Now I cannot figure out how to approach neighboring ''bases''.
Is there a way to extract these overlapping reads and ''look'' at the read interval 12 000 - 12 008, to see if they are deletions? Should I change my approach to fetch() method, if yes how can I check those deletion (using CIGAR)?
Thank you for your help, I am sorry if this is something easy it is my first question, I am a beginner when it comes to pysam and samtools. I put a code example. I am just interested in the part after the if condition, but I wanted to give some example on how I approach the reads, I can however, edit the post and add some code if needed
#Code:
for pileupcolumn in bamfile.pileup('chr10', start, start+1):
if pileupcolumn.pos == start:
for pileupread in pileupcolumn.pileups:
if pileupread.is_del:
pass
#look at bases from the interval and check if they are all deletions or not
else:
pass
#save bases from the interval trough pileupread.alignment.query_sequence[interval]