Question

UCSC Gene Table Exon Frames Generating Stop Codons

3.9 years ago

andrewjiajia ▴ 10

Hi,

I'm using UCSC gene tables and am having trouble interpreting exon frames. In some cases, applying the exon frame from the table produces stop codons, which shouldn't happen inside a coding region.

As an example, from the hg19 gene NM_001369291 on chromosome 22, I have this line from the gene table:

733 NM_001369291    chr22   +   19466988    19508131    19467079    19506431    19  19466988,19467680,19468475,19470212,19471384,19481849,19483503,19484908,19486623,19492884,19494908,19495288,19496052,19502271,19502487,19504049,19504339,19506366,19508003, 19467094,19467740,19468568,19470350,19471528,19481905,19483552,19484970,19486674,19493004,19495040,19495387,19496214,19502410,19502571,19504168,19504416,19506432,19508131, 0   CDC45   cmpl    cmpl    0,0,0,0,0,0,2,0,2,2,2,2,2,2,0,0,2,1,-1,

Here, the first comma-separated list is the exon starts and the last list is the exon frames. The exon starting at 19495288 has a frame of 2, but with the exon sequence from UCSC, only a frame of 1 gives a translation with no stop codons:

>hg19_ncbiRefSeqCurated_NM_001369291.1_22 range=chr22:19495289-19495387 5'pad=0 3'pad=0 strand=+ repeatMasking=none
TCTTCCCCTGAAGCAGGTGAAGCAGAAGTTCCAGGCCATGGACATCTCCT
TGAAGGAGAATTTGCGGGAAATGATTGAAGAGTCTGCAAATAAATTTGG

Am I missing something about how to interpret the exon frames in the gene table? Unless I am mistaken, the gene table coordinates are 0-based and the exon's FASTA range is 1-based.

Thanks in advance!
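For reference, here is a minimal Python sketch of reading the row above. It assumes the column layout of the UCSC `ncbiRefSeq`/`refGene` tables (a leading `bin` column followed by the genePredExt fields); `parse_exons` and `to_one_based` are hypothetical helpers, not UCSC tools. The conversion follows the usual UCSC convention that table starts are 0-based and ends are exclusive, so only the start changes when converting to the 1-based, closed range shown in the FASTA header.

```python
from typing import NamedTuple

# Assumed column order of the ncbiRefSeq/refGene tables (bin + genePredExt).
FIELDS = ["bin", "name", "chrom", "strand", "txStart", "txEnd",
          "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
          "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames"]


class Exon(NamedTuple):
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    frame: int  # -1 when the exon has no coding bases


def ints(csv: str) -> list[int]:
    # UCSC lists carry a trailing comma, so skip the empty last item.
    return [int(x) for x in csv.split(",") if x]


def parse_exons(row: str) -> tuple[dict, list[Exon]]:
    rec = dict(zip(FIELDS, row.split()))
    exons = [Exon(s, e, f) for s, e, f in zip(ints(rec["exonStarts"]),
                                              ints(rec["exonEnds"]),
                                              ints(rec["exonFrames"]))]
    return rec, exons


def to_one_based(chrom: str, start0: int, end0: int) -> str:
    # 0-based half-open [start0, end0) -> 1-based closed [start0 + 1, end0]
    return f"{chrom}:{start0 + 1}-{end0}"


ROW = ("733 NM_001369291 chr22 + 19466988 19508131 19467079 19506431 19 "
       "19466988,19467680,19468475,19470212,19471384,19481849,19483503,"
       "19484908,19486623,19492884,19494908,19495288,19496052,19502271,"
       "19502487,19504049,19504339,19506366,19508003, "
       "19467094,19467740,19468568,19470350,19471528,19481905,19483552,"
       "19484970,19486674,19493004,19495040,19495387,19496214,19502410,"
       "19502571,19504168,19504416,19506432,19508131, "
       "0 CDC45 cmpl cmpl 0,0,0,0,0,0,2,0,2,2,2,2,2,2,0,0,2,1,-1,")

rec, exons = parse_exons(ROW)
exon12 = exons[11]  # 12th exon in genomic order (+ strand transcript)
print(exon12)       # Exon(start=19495288, end=19495387, frame=2)
print(to_one_based(rec["chrom"], exon12.start, exon12.end))  # chr22:19495289-19495387
```

The converted range matches the `range=chr22:19495289-19495387` in the FASTA header, and the exon length (99 bases) is the same in both systems.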

orf ucsc genome frame

updated 3.3 years ago by Luis Nassar ▴ 670 • written 3.9 years ago by andrewjiajia ▴ 10

score 0 · Answer 1 · 2021-09-15

Hello,

I am not sure which stop codons you are seeing, but I can explain how the exon frames work for this transcript.

Exon frames tell you how many bases of an exon's first codon are 'borrowed' from, i.e. belong to, the previous exon. (This previous mailing-list question may also be helpful: https://groups.google.com/a/soe.ucsc.edu/g/genome/c/U-w4b_ZS2j0)
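To make that definition concrete, here is a Python sketch that recomputes the exonFrames column from the exon and CDS coordinates of the row in the question. It assumes a + strand transcript (a minus-strand transcript would have to be walked from its other end, which this sketch does not do), and `exon_frames` is a hypothetical helper. For this row it reproduces the table's frames exactly.

```python
# exonStarts / exonEnds / cdsStart / cdsEnd copied from the NM_001369291 row.
STARTS = [19466988, 19467680, 19468475, 19470212, 19471384, 19481849, 19483503,
          19484908, 19486623, 19492884, 19494908, 19495288, 19496052, 19502271,
          19502487, 19504049, 19504339, 19506366, 19508003]
ENDS = [19467094, 19467740, 19468568, 19470350, 19471528, 19481905, 19483552,
        19484970, 19486674, 19493004, 19495040, 19495387, 19496214, 19502410,
        19502571, 19504168, 19504416, 19506432, 19508131]
CDS_START, CDS_END = 19467079, 19506431


def exon_frames(starts, ends, cds_start, cds_end):
    """Frame of each exon on a + strand transcript: the number of coding bases
    in earlier exons, modulo 3, i.e. how many bases of the exon's first codon
    were already supplied by the previous exon. Non-coding exons get -1."""
    frames, coding_before = [], 0
    for s, e in zip(starts, ends):
        cs, ce = max(s, cds_start), min(e, cds_end)  # coding part of the exon
        if cs >= ce:
            frames.append(-1)
            continue
        frames.append(coding_before % 3)
        coding_before += ce - cs
    return frames


print(exon_frames(STARTS, ENDS, CDS_START, CDS_END))
# [0, 0, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 1, -1]
```

For exon 12, 920 coding bases precede it and 920 % 3 = 2, which is the frame of 2 in the table.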

So, for your example, here is a session showing the end of exon 12: https://genome.ucsc.edu/s/Lou/NM_001369291exon12end and here is a session showing the start of exon 13: https://genome.ucsc.edu/s/Lou/NM_001369291exon13start

We see that exon 12 ends with "GG", the first two bases of a glycine (Gly, G) codon, and exon 13 begins with the final "G" that completes it. The exon frame for exon 13 is therefore 2, because 2 bases of its first codon come from the previous exon.
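The boundary codon can be sketched in Python as follows. The helper names are hypothetical; the only rule assumed is the one described above, that a frame of f means f bases of the codon come from the end of the previous exon and the remaining (3 - f) % 3 come from the start of the next one.

```python
def bases_to_skip(frame: int) -> int:
    """Bases at the start of an exon that finish the codon begun in the
    previous exon; the first codon wholly inside the exon starts after them."""
    return (3 - frame) % 3


def boundary_codon(prev_exon: str, next_exon: str, next_frame: int) -> str:
    # next_frame bases come from the end of the previous exon and
    # (3 - next_frame) % 3 from the start of the next exon.
    # Returns "" when next_frame is 0 (no codon spans the junction).
    tail = prev_exon[len(prev_exon) - next_frame:]
    return tail + next_exon[:bases_to_skip(next_frame)]


# Last bases of exon 12 and first bases of exon 13 (from the sequences below).
print(boundary_codon("AAATTTGG", "GATGAAGGAC", 2))  # GGG (glycine)
print(bases_to_skip(2))  # 1
```

Applied to exon 12, which also has frame 2, the same rule means skipping one base, not two: its first base completes a codon begun in exon 11.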

The same thing can be seen in the transcript's sequence (shown here with the intervening intron included):

>hg19_refGene_NM_001369291_22 range=chr22:19495289-19495387 5'pad=0 3'pad=0 strand=+ repeatMasking=none
TCTTCCCCTGAAGCAGGTGAAGCAGAAGTTCCAGGCCATGGACATCTCCT
TGAAGGAGAATTTGCGGGAAATGATTGAAGAGTCTGCAAATAAATTTGG
>hg19_refGene_NM_001369291_23 range=chr22:19495388-19496052 5'pad=0 3'pad=0 strand=+ repeatMasking=none
gtaaacacacatttttctggatttatcttcattacatccaggttcattaa
agtgaagggttctactttaactgtgctcctaaaataatgcaaaaaaaacc
aacgtgctcctaaaataatgcaaaaaaaacaacaacccgagaatgtacaa
atgagtgttatctgagtctggccattcctgaagtcttgagttctttttgg
gttaaaaataagacttcttcaagcttgtgaggtttaggatcctaggatcc
ttggatcctagggctgctgtgggacctgtgaggtcgaccccagtctgttt
cactggagacagcaaatgggccacagggccaggttcaggtgaactctgcc
cgaccaagtccatgggcactcctggaggcccctggcttctcccaccccag
ccagctggcatgctaaggtgtgagagaggaccccacacaccccctaagcc
agccataaagctgttgacaagaaggcacccggccactctgggctgcaggg
ctggtatgtctggacgttctgacctgcctcatcctctgagcagcatggct
gtgctggaagcattgacttgggcctgtgagggacattagcacatctgttg
gcctgcctggcagtgagagcttgcccacaatttgagggtgacagctgtgt
ttgctcccatgacag
>hg19_refGene_NM_001369291_24 range=chr22:19496053-19496214 5'pad=0 3'pad=0 strand=+ repeatMasking=none
GATGAAGGACATGCGCGTGCAGACTTTCAGCATTCATTTTGGGTTCAAGC
ACAAGTTTCTGGCCAGCGACGTGGTCTTTGCCACCATGTCTTTGATGGAG
AGCCCCGAGAAGGATGGCTCAGGGACAGATCACTTCATCCAGGCTCTGGA
CAGCCTCTCCAG
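As a quick check, here is a minimal Python sketch that translates the exon 12 and exon 13 sequences above, assuming a + strand transcript and the standard genetic code (NCBI translation table 1). The `translate` helper and the hard-coded sequences are for illustration only; the number of bases skipped is (3 - frame) % 3, so frame 2 means skipping one base.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

EXON12 = ("TCTTCCCCTGAAGCAGGTGAAGCAGAAGTTCCAGGCCATGGACATCTCCT"
          "TGAAGGAGAATTTGCGGGAAATGATTGAAGAGTCTGCAAATAAATTTGG")
EXON13 = ("GATGAAGGACATGCGCGTGCAGACTTTCAGCATTCATTTTGGGTTCAAGC"
          "ACAAGTTTCTGGCCAGCGACGTGGTCTTTGCCACCATGTCTTTGATGGAG"
          "AGCCCCGAGAAGGATGGCTCAGGGACAGATCACTTCATCCAGGCTCTGGA"
          "CAGCCTCTCCAG")


def translate(seq: str, skip: int) -> str:
    """Translate the complete codons that follow the first `skip` bases."""
    seq = seq.upper()
    return "".join(CODON[seq[i:i + 3]] for i in range(skip, len(seq) - 2, 3))


frame12 = 2
skip = (3 - frame12) % 3  # 1 base completes the codon begun in exon 11
print(translate(EXON12, skip))   # LPLKQVKQKFQAMDISLKENLREMIEESANKF
print(translate(EXON12, 2)[:3])  # FP* (skipping 2 bases reaches TGA)
print("*" in translate(EXON12 + EXON13, skip))  # False
```

With the table's frame of 2, exon 12 reads through without a stop codon and the GG + G codon at the junction gives glycine. Skipping two bases instead reaches TGA at the third codon, which may be where the stop codons in the question come from.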

Let me know if this does not answer your question, and we can look into it further.

We can also answer questions via our public help desk, which can always be reached at genome@soe.ucsc.edu. You may also send questions to genome-www@soe.ucsc.edu if they contain sensitive data. For any Genome Browser questions on Biostars, the UCSC tag is the best way to make sure the team sees them.