Get all intronic regions from a generic GTF file
8.2 years ago

Hello, I have to write a Java program for a college course that finds possible intron retention in a given sample.

I am stuck on the initial part: given a reference GTF file, I have to parse it and recover all intron regions from it (producing another, pruned GTF file).

I don't understand how to find where an intron starts and ends. Thanks

GTF JAVA • 8.1k views

Are you using a specific GTF file? Can you post a few lines? The latest GTF spec is available at this link.


Hi, thank you for the help. I forgot to mention (and I think it is quite important) that my input file contains ALL and ONLY the known exons of a human genome sample. I was thinking of going through that file chromosome by chromosome, then gene by gene, then transcript by transcript (I saw that the same gene can have multiple versions of itself due to splicing events), recording where each exon starts and ends. I would then compute the introns as the complement of the exons.

What do you think of my algorithm?
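
A minimal sketch of the grouping step, assuming a tab-separated GTF with the usual nine columns (feature type in column 3, start and end in columns 4 and 5, attributes in column 9) and a `transcript_id "..."` attribute on every exon line. The class name `GtfExonGrouper`, the method `groupExons`, and the `chromosome|transcript` key format are made up for illustration; they are not part of any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups exon coordinates by chromosome and transcript.
class GtfExonGrouper {
    // Assumes the attribute column contains: transcript_id "SOME_ID";
    private static final Pattern TRANSCRIPT_ID =
            Pattern.compile("transcript_id \"([^\"]+)\"");

    /**
     * Returns a map from "chromosome|transcript_id" to the list of
     * exon [start, end] pairs (1-based, inclusive, as written in the GTF).
     */
    static Map<String, List<int[]>> groupExons(List<String> gtfLines) {
        Map<String, List<int[]>> exonsByTranscript = new LinkedHashMap<>();
        for (String line : gtfLines) {
            // Skip blank lines and comment/header lines.
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] cols = line.split("\t");
            // Keep only well-formed exon records.
            if (cols.length < 9 || !cols[2].equals("exon")) continue;
            Matcher m = TRANSCRIPT_ID.matcher(cols[8]);
            if (!m.find()) continue; // no transcript_id: cannot assign the exon
            String key = cols[0] + "|" + m.group(1);
            int start = Integer.parseInt(cols[3]);
            int end = Integer.parseInt(cols[4]);
            exonsByTranscript
                    .computeIfAbsent(key, k -> new ArrayList<>())
                    .add(new int[] {start, end});
        }
        return exonsByTranscript;
    }
}
```

Grouping per transcript rather than per gene matters here: two transcripts of the same gene can use different exons, so a region that is intronic in one transcript may be exonic in another.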

Thanks


Please use ADD REPLY/ADD COMMENT when responding to existing posts.

Introns are not complementary (not sure in what sense you mean that). They are the intervals between two consecutive exons of a transcript, e.g. Exon_1-Intron_1-Exon_2-Intron_2-Exon_3, etc. More here: https://en.wikipedia.org/wiki/Exon

Also see this thread for a nice graphic: What's The Difference Between CDS And ORF?
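
The Exon_1-Intron_1-Exon_2 layout described above can be turned into coordinates roughly as follows. This is only a sketch, assuming the exons passed in all belong to one transcript, do not overlap, and use GTF-style 1-based inclusive coordinates; the class name `IntronFinder` and method `introns` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: derives introns from the exons of a single transcript.
class IntronFinder {
    /**
     * Takes exon [start, end] pairs (1-based, inclusive) of ONE transcript
     * and returns the intron [start, end] pairs between consecutive exons.
     */
    static List<int[]> introns(List<int[]> exons) {
        // Sort a copy by start position so the input order does not matter.
        List<int[]> sorted = new ArrayList<>(exons);
        sorted.sort(Comparator.comparingInt((int[] e) -> e[0]));

        List<int[]> introns = new ArrayList<>();
        for (int i = 0; i + 1 < sorted.size(); i++) {
            // An intron starts one base after the end of an exon
            // and ends one base before the start of the next exon.
            int intronStart = sorted.get(i)[1] + 1;
            int intronEnd = sorted.get(i + 1)[0] - 1;
            // Skip adjacent exons that leave no gap between them.
            if (intronStart <= intronEnd) {
                introns.add(new int[] {intronStart, intronEnd});
            }
        }
        return introns;
    }
}
```

For example, exons at 100-200, 301-400 and 501-600 give introns at 201-300 and 401-500. A transcript with a single exon yields no introns.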

8.2 years ago
brent_wilson ▴ 140

Hi Alessandro,

Here are a couple of relevant links that may be useful:

If you can use UCSC rather than being restricted to a GTF file: Bed File With Introns Only

A little more detail on a manual solution is here: https://biostar.usegalaxy.org/p/6453/

And some basic information on GTF parsing in Python: http://biopython.org/wiki/GFF_Parsing

It's tough to give an exact solution without seeing the file, but hopefully this is useful. Good luck!

Brent Wilson, PhD | Project Scientist | Cofactor Genomics

