I am working on RNA sequencing data from glioblastoma samples.
One of the best-known variants in glioblastoma is EGFR variant III.
EGFR variant III is an aberrant EGFR protein lacking exons 2-7, which I cannot detect from my whole-exome sequencing data with variant callers.
Therefore, I tried to locate RNA-seq reads spanning from exon 1 to exon 8 with pysam (Python 3).
I used the fetch function to extract reads overlapping the exon 1 region with the following call:
loaded_bam.fetch('chr7',55086794,55087058)
Now I need to identify the RNA-seq reads from that set whose other end is located in exon 8 ('chr7', 55224226, 55224352).
How can I identify them?
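
One possible approach, as a minimal sketch rather than a definitive solution: for each read fetched from exon 1, look at its gapless aligned blocks (pysam's `get_blocks()`) and keep the read if one block overlaps exon 1 and another overlaps exon 8. The helper names (`overlaps`, `spans_exons`, `find_spanning_reads`) and the BAM path are hypothetical. It assumes a splice-aware aligner was used, the BAM is indexed, and the coordinates above can be treated as 0-based half-open intervals, which may need a one-base adjustment if they come from a 1-based annotation.

```python
# Exon coordinates taken from the question (assumed 0-based, half-open).
EXON1 = ("chr7", 55086794, 55087058)
EXON8 = ("chr7", 55224226, 55224352)


def overlaps(block, start, end):
    """True if a half-open aligned block (b_start, b_end) overlaps [start, end)."""
    b_start, b_end = block
    return b_start < end and b_end > start


def spans_exons(blocks, exon_a, exon_b):
    """True if some block overlaps exon_a and some block overlaps exon_b."""
    return (any(overlaps(b, *exon_a) for b in blocks)
            and any(overlaps(b, *exon_b) for b in blocks))


def find_spanning_reads(bam_path):
    """Return names of reads with aligned blocks in both exon 1 and exon 8."""
    import pysam  # imported here so the helpers above run without pysam

    hits = []
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # fetch() needs an indexed BAM; it yields reads overlapping exon 1.
        for read in bam.fetch(*EXON1):
            if read.is_unmapped or read.is_secondary or read.is_supplementary:
                continue
            # get_blocks() returns the gapless aligned segments of the read,
            # so a spliced read shows up as two or more blocks.
            if spans_exons(read.get_blocks(), EXON1[1:], EXON8[1:]):
                hits.append(read.query_name)
    return hits
```

Calling `find_spanning_reads("sample.bam")` (with a placeholder file name) would return the names of candidate exon 1 to exon 8 junction reads. This only checks single reads that are themselves spliced across the junction; read pairs where one mate lies in exon 1 and the other in exon 8 would need a separate check on the mate position.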