How to identify RNA-seq reads spanning a given genomic range?
5.7 years ago
agicict ▴ 200

I am working on RNA sequencing data from glioblastoma samples.

One of the best-known variants in glioblastoma is EGFR variant III (EGFRvIII).

EGFRvIII is an aberrant EGFR protein that lacks the region encoded by exons 2-7. It is not detectable in my whole-exome sequencing data with variant callers.

Therefore, I tried to locate RNA-seq reads spanning from exon 1 to exon 8 with pysam (Python 3).

I used the fetch function to extract reads overlapping the exon 1 region with the following call:

loaded_bam.fetch('chr7',55086794,55087058)

Now I need to identify the reads whose other end falls in exon 8 ('chr7',55224226,55224352).

How can I identify them?

RNA-Seq python pysam • 893 views
5.7 years ago
ATpoint 85k

Get the annotation (GTF or GFF3) file for your genome version, grep out your gene and then the respective exons, essentially as described in "A: how to get intronic and intergenic sequences based on gff file?". You can then use these coordinates with your code snippet from above.
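
Once you have the two exon intervals, one possible way to filter the reads is sketched below. This is only an illustration, not a tested EGFRvIII caller: the helper names (`overlaps`, `spans_both`, `find_spanning_reads`) are made up for this example, the coordinates are simply the ones from the question, and it assumes a coordinate-sorted, indexed BAM. It fetches reads overlapping exon 1 and keeps those whose aligned blocks also overlap exon 8.

```python
# Exon coordinates copied from the question. They are treated here as
# 0-based, half-open intervals, which is what pysam expects. GTF/GFF3
# coordinates are 1-based and inclusive, so subtract 1 from the start
# when taking them from an annotation file.
EXON1 = ("chr7", 55086794, 55087058)
EXON8 = ("chr7", 55224226, 55224352)


def overlaps(blocks, start, end):
    """Return True if any aligned block overlaps the interval [start, end)."""
    return any(b_start < end and b_end > start for b_start, b_end in blocks)


def spans_both(blocks, first=EXON1, second=EXON8):
    """Return True if the aligned blocks of one read touch both exons."""
    return (overlaps(blocks, first[1], first[2])
            and overlaps(blocks, second[1], second[2]))


def find_spanning_reads(bam_path):
    """Yield reads from an indexed BAM whose alignment touches both exons."""
    import pysam  # imported here so the helpers above work without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(*EXON1):
            if read.is_unmapped or read.is_secondary:
                continue
            # get_blocks() returns the gap-free aligned segments as
            # (start, end) pairs, so a read spliced across an intron
            # (CIGAR 'N') contributes at least one block per exon.
            if spans_both(read.get_blocks()):
                yield read
```

Something like `for read in find_spanning_reads("sample.bam"): print(read.query_name, read.cigarstring)` would then list the candidates (the file name is a placeholder). Note that this only looks at single reads; in paired-end data the junction can also be supported by a pair with one mate in exon 1 and the other in exon 8, which this sketch does not check.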
