Hi everyone,
I have a bunch of cDNA reads generated by minion sequencing and mapped onto the genome with minimap2. Now I'm trying to find the genomic positions of the exon junctions in order to class my reads by exon usage.
I'm using pysam
to get the coordinates of every base from the CIGAR string and then I compare one position with the previous one and, if the distance is greater than 90 (for example), I consider it as an intron and so I print the coordinates.
Here is the small script I have written so far:
import pysam
samfile = pysam.AlignmentFile("file.sam", "r")
for read in samfile:
pos = pysam.AlignedSegment.get_reference_positions(read, full_length=False)
for i in range(len(pos)):
if (pos[i+1]-pos[i]) > 90:
print(read.query_name+" "+ str(pos[i]) + " : " + str(pos[i+1]))
samfile.close()
However, it seems to work for the first read in the SAM file but not for the next ones and all I get is :
04f7c7b3-2959-4e73-b025-c659677e26b0 14585254 : 14622303
04f7c7b3-2959-4e73-b025-c659677e26b0 14622510 : 14622827
04f7c7b3-2959-4e73-b025-c659677e26b0 14622896 : 14623009
04f7c7b3-2959-4e73-b025-c659677e26b0 14623147 : 14623827
04f7c7b3-2959-4e73-b025-c659677e26b0 14623897 : 14624302
04f7c7b3-2959-4e73-b025-c659677e26b0 14624419 : 14624955
04f7c7b3-2959-4e73-b025-c659677e26b0 14625088 : 14625278
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-19-bf9e6dc8c1a4> in <module>()
6 pos = pysam.AlignedSegment.get_reference_positions(read, full_length=False)
7 for i in range(len(pos)):
----> 8 if (pos[i+1]-pos[i]) > 90:
9 print(read.query_name+" "+ str(pos[i]) + " : " + str(pos[i+1]))
10
IndexError: list index out of range
To be fair, I don't understand why it raises this error and how I can fix it in order to get all coordinates for every junctions of each reads.
If anyone has an idea I would be happy to hear about it ! Thanks in advance to anyone reading me. Florian.
that's perfect ! Thanks a lot.