Hello, I have to make a Java program for a college course in which I have to find possible intron retention in a given sample.
I am stuck on the initial part where, given a reference GTF file, I have to parse it and recover all intron regions from it (producing another, pruned, GTF file).
I don't understand how to find where an intron starts and ends. Thanks
Are you using a specific GTF file? Can you post a few lines? The latest GTF spec is available at this link.
Hi, thank you for the help. I forgot to mention (I think it's quite important) that my input file contains ALL and ONLY the known exons of a human genome sample. I was thinking of going through that file chromosome by chromosome, then gene by gene, then transcript by transcript (I saw that the same gene can have multiple versions of itself due to splicing events), and recording where each exon starts and ends. Then I would compute the introns as the complement of the exons.
What do you think of my algorithm?
Thanks
Please use ADD REPLY/ADD COMMENT when responding to existing posts.
Introns are not complementary (I'm not sure in what sense you mean that). They represent the interval between two consecutive exons of the same transcript, e.g.
Exon_1
-Intron_1
-Exon_2
-Intron_2
-Exon_3
etc. More here: https://en.wikipedia.org/wiki/Exon
Also see this thread for a nice graphic: What's The Difference Between Cds And Orf?
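Since you are working in Java, here is a minimal sketch of the "interval between two exons" idea. It is only one way to do it and makes several assumptions: the input is standard 9-column, tab-separated GTF with 1-based inclusive coordinates, every exon line carries a `transcript_id` attribute, and exons of one transcript never span chromosomes. The class name `IntronFinder` and the method `findIntrons` are made up for illustration. Each intron line reuses the attribute column of the upstream exon unchanged, so exon-specific attributes such as `exon_number` would still need to be stripped or renamed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IntronFinder {

    // Pulls the transcript_id value out of the GTF attribute column (column 9).
    private static final Pattern TRANSCRIPT_ID =
            Pattern.compile("transcript_id \"([^\"]+)\"");

    /** Returns one GTF-style "intron" line per gap between consecutive exons of a transcript. */
    public static List<String> findIntrons(List<String> gtfLines) {
        // transcript_id -> exon records (each record is the split GTF columns)
        Map<String, List<String[]>> exonsByTranscript = new LinkedHashMap<>();
        for (String line : gtfLines) {
            if (line.isEmpty() || line.startsWith("#")) continue;   // skip blanks and comments
            String[] f = line.split("\t");
            if (f.length < 9 || !f[2].equals("exon")) continue;     // keep exon features only
            Matcher m = TRANSCRIPT_ID.matcher(f[8]);
            if (!m.find()) continue;                                // assumption: skip lines without transcript_id
            exonsByTranscript.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(f);
        }

        List<String> introns = new ArrayList<>();
        for (List<String[]> exons : exonsByTranscript.values()) {
            // Sort by genomic start so neighbouring list entries are neighbouring exons.
            exons.sort(Comparator.comparingInt((String[] f) -> Integer.parseInt(f[3])));
            for (int i = 1; i < exons.size(); i++) {
                String[] prev = exons.get(i - 1);
                String[] next = exons.get(i);
                int start = Integer.parseInt(prev[4]) + 1;  // first base after the upstream exon
                int end = Integer.parseInt(next[3]) - 1;    // last base before the downstream exon
                if (start > end) continue;                  // adjacent or overlapping exons: no gap
                introns.add(String.join("\t",
                        prev[0], prev[1], "intron",
                        String.valueOf(start), String.valueOf(end),
                        ".", prev[6], ".", prev[8]));
            }
        }
        return introns;
    }

    public static void main(String[] args) {
        // Made-up example records, not taken from a real annotation.
        List<String> sample = Arrays.asList(
                "chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
                "chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";");
        for (String intron : findIntrons(sample)) {
            System.out.println(intron);
        }
    }
}
```

Grouping by transcript rather than by gene matters here: a region that is an intron in one transcript can be an exon in another transcript of the same gene. Note also that the exons are sorted by genomic position, so on the minus strand the introns come out in the reverse of transcript order.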