GTF file: CDS feature is exon 1, but has frame = 1 or 2
7.4 years ago
Marvin ▴ 220

Hello, I'm looking at a record from a GTF file:

18  protein_coding  CDS 2554668 2554691 .   -   2    gene_id "ENSG00000101574"; transcript_id "ENST00000576251"; exon_number "1"; gene_name "METTL4"; gene_biotype "protein_coding"; transcript_name "METTL4-010"; protein_id "ENSP00000460774";

If this is exon_number 1, how can it have a frame of 2? I would expect 0.

According to the documentation, a frame of 2 means that the third base of this feature is the first base of a codon. So what about the first two bases of the feature, given that this is exon 1? Where is the missing base? Do you know what I mean?
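To make the question concrete, here is a minimal Python sketch of how the frame column is usually read. It assumes the file follows the GTF2 convention: the frame is the number of bases to skip at the 5' end of the feature before the first whole codon, and for a minus-strand feature the 5' end is the end coordinate. The helper name `first_whole_codon` is hypothetical, and the record is the one quoted above with the attribute column shortened.

```python
# The CDS record from the question, rebuilt with explicit tabs
# (attribute column shortened for readability).
RECORD = "\t".join([
    "18", "protein_coding", "CDS", "2554668", "2554691", ".", "-", "2",
    'gene_id "ENSG00000101574"; transcript_id "ENST00000576251"; exon_number "1";',
])


def first_whole_codon(gtf_line):
    """Return (five_prime_base, first_base_of_first_whole_codon).

    Assumes the GTF2 convention: frame counts the bases to skip from the
    5' end of the feature, which is `end` for features on the '-' strand.
    """
    fields = gtf_line.rstrip("\n").split("\t")
    start, end = int(fields[3]), int(fields[4])
    strand, frame = fields[6], int(fields[7])
    if strand == "+":
        return start, start + frame
    return end, end - frame


print(first_whole_codon(RECORD))  # (2554691, 2554689)
```

Under that assumption, the two skipped bases are counted from coordinate 2554691, not from 2554668.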

gtf cds exon • 3.0k views
7.4 years ago

For genes on the - strand, you don't want exon 1, but the last exon. You'll note it has frame 0.


It just clicked, and I now understand what you meant 2 weeks ago :D

I do not know how best to explain it to others, but I highly recommend this: download the .gtf file from the Ensembl FTP server and check out the following transcript:

awk '$0 ~ /ENST00000576251/ && $3 == "CDS" {print $0}' Homo_sapiens.GRCh37.68.gtf | less

You will notice it has 4 exons. Pick exon_number "1" and enter its coordinates into UCSC genome browser hg19 like this:

chr18:2554667-2554692

Notice how I extended the interval by 1 nucleotide on both sides. In UCSC you will find the corresponding transcript among others. You will see that the "intron arrows" of this exon point to the LEFT (extending the interval by 1 base makes this visible). That means (as Devon said) that the gene is on the minus strand. Now you can clearly see why it is correct that the left-most position in this CDS does NOT have frame 0. The last exon (exon 4) has frame 0.
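If you prefer to check this from the file itself, the following is a rough Python sketch, not a definitive implementation. It collects the CDS lines of one transcript, orders them 5' to 3' according to the strand, and computes the frame each segment implies for the next one using the phase rule from the GFF3 specification, `(3 - ((length - frame) % 3)) % 3`. The function names are hypothetical, and it assumes tab-separated lines with the transcript ID in column 9.

```python
def cds_segments(lines, transcript_id):
    """Collect (start, end, strand, frame) for the CDS lines of one transcript,
    ordered 5'->3' (ascending on '+', descending on '-')."""
    needle = 'transcript_id "%s"' % transcript_id
    segments = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "CDS" or needle not in f[8]:
            continue
        segments.append((int(f[3]), int(f[4]), f[6], int(f[7])))
    on_minus = bool(segments) and segments[0][2] == "-"
    return sorted(segments, reverse=on_minus)


def expected_next_frame(start, end, frame):
    """Frame the following CDS segment should carry, given this one."""
    length = end - start + 1
    return (3 - ((length - frame) % 3)) % 3
```

To use it, open the GTF (for example `Homo_sapiens.GRCh37.68.gtf`), pass the file handle and `"ENST00000576251"` to `cds_segments`, then compare each segment's frame with `expected_next_frame` of the segment before it.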

I got it now, thanks for your reply Devon :)


Or you could just look at column 7 of the GTF, which gives the strand (+ or -).


I think you misunderstood the purpose of my post: the idea was not to go to UCSC in order to see which strand the gene is on. Instead, the idea was to walk through an example that makes you _understand_ (and see with your own eyes) why exon 1 doesn't necessarily have frame 0.
