Length(Ucsc/Ensgene) % 3 != 0
14.5 years ago

Before I take this question to the UCSC mailing list: is it me, or is something else going on?

I've noticed that some (not all) protein-coding records in the hg18 UCSC ensGene.txt have a coding sequence whose length is not a multiple of 3 (length % 3 != 0).

For example, see http://genome.ucsc.edu/cgi-bin/hgc?g=htcGeneMrna&i=ENST00000383614&c=chr6&l=30082338&r=30083996&o=ensGene&table=ensGene

>ENST00000383614
ccccagacgccgacgatggggtcATGGCGCCCCGAACCCTCCTCCTGCTG
CTCTCGGGGACCCTGGCCCTGGCCGAGACCTGGGCGGCCCCCCCCAAGAC
ACACGTGACCCacccccctctctgaacatgaggcataa

echo -n ATGGCGCCCCGAACCCTCCTCCTGCTGCTCTCGGGGACCCTGGCCCTGGCCGAGACCTGGGCGGCCCCCCCCAAGACACACGTGACCC | wc -c
88

but 88 % 3 = 1, not 0.
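The same check can be scripted instead of pasting the sequence into `wc`. This is a minimal sketch, assuming (as in the record above) that coding bases are shown in uppercase and UTR bases in lowercase; the helper name `cds_from_fasta` is made up for illustration.

```python
# FASTA record exactly as posted above.
# Assumption: uppercase = coding bases, lowercase = UTR.
record = """>ENST00000383614
ccccagacgccgacgatggggtcATGGCGCCCCGAACCCTCCTCCTGCTG
CTCTCGGGGACCCTGGCCCTGGCCGAGACCTGGGCGGCCCCCCCCAAGAC
ACACGTGACCCacccccctctctgaacatgaggcataa
"""

def cds_from_fasta(text):
    """Return the uppercase (coding) bases of a single-record FASTA string."""
    seq_lines = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    ]
    return "".join(base for base in "".join(seq_lines) if base.isupper())

cds = cds_from_fasta(record)
# Length of the coding part and its remainder modulo 3.
print(len(cds), len(cds) % 3)
```

A remainder of 0 is what a complete coding sequence (start codon through stop codon) would give; anything else flags the record for a closer look.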

Is this an error on the UCSC side, or am I missing something?

protein ucsc cdna translation sequence • 2.7k views

The sequence you posted has a stop codon at position 87 of the nucleotide sequence (84 if you start counting from 0).


Yes, that was an error: "We have determined that the data as originally incorporated into the track was strangely annotated and Ensembl has since corrected the error. The track on our side will be updated (and this data corrected) at the next update."

14.5 years ago

It may be an error in the annotation; there are many, I can assure you. A while ago, the Ensembl maintainers removed a gene I was studying when they merged its transcript with another gene.

Note that the sequence you posted has a stop codon at position 87 of the nucleotide sequence (84 if you start counting from 0).
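A claim like this can be checked by listing every stop triplet together with its reading frame. The sketch below assumes the sequence in question is the 88 nt uppercase coding part from the question and reports offsets relative to its first base; `find_stops` is a hypothetical helper, not part of any library.

```python
# Coding sequence from the question (the uppercase bases), 88 nt.
cds = (
    "ATGGCGCCCCGAACCCTCCTCCTGCTG"
    "CTCTCGGGGACCCTGGCCCTGGCCGAGACCTGGGCGGCCCCCCCCAAGAC"
    "ACACGTGACCC"
)

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_stops(seq):
    """Return (0-based offset, frame, codon) for every stop triplet in seq."""
    hits = []
    for i in range(len(seq) - 2):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            # Frame 0 is the frame of the first base (here, the initial ATG).
            hits.append((i, i % 3, codon))
    return hits

for offset, frame, codon in find_stops(cds):
    print(f"{codon} at 0-based offset {offset} (1-based {offset + 1}), frame {frame}")
```

Only a triplet in frame 0 would terminate translation from the initial ATG. On these 88 nt the scan reports a single TGA, at 0-based offset 82 in frame 1, so it is worth confirming which sequence and which numbering a quoted position refers to.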

By the way, the sequence you posted belongs to an MHC chain, a gene which is well known for its variability and for generating a lot of transcripts.


Agreed, this is a very variable region with several transcripts and a pseudogene.

14.5 years ago
Neilfws 49k

The same transcript at ensembl.org has length = 87 bp and a slightly different 3' sequence. I wonder if this is related to UCSC sequences having zero-based starts (i.e. first base = 0)?
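For context on the coordinate question: UCSC tables store intervals as zero-based, half-open (the start is 0-based, the end is exclusive), so a length is simply end minus start. Reading the same numbers as 1-based inclusive makes each interval one base too long. The sketch below uses made-up coordinates, not the real ENST00000383614 record, and `cds_length` is a hypothetical helper.

```python
def cds_length(cds_start, cds_end, exon_starts, exon_ends):
    """Sum the overlap of each exon with the CDS interval.

    All coordinates are zero-based, half-open: [start, end).
    """
    total = 0
    for start, end in zip(exon_starts, exon_ends):
        overlap = min(end, cds_end) - max(start, cds_start)
        if overlap > 0:
            total += overlap
    return total

# Made-up single-exon transcript, for illustration only.
exon_starts = [100]
exon_ends = [400]
cds_start, cds_end = 123, 210

correct = cds_length(cds_start, cds_end, exon_starts, exon_ends)
# Misreading the interval as 1-based inclusive adds one base.
misread = cds_end - cds_start + 1

print(correct, correct % 3)   # 87 0
print(misread, misread % 3)   # 88 1
```

This only illustrates how an off-by-one could arise; according to the reply quoted earlier in the thread, the cause in this case was an annotation error that Ensembl later corrected.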


Forgot to add that this comes from the latest Ensembl release, whereas your data are from hg18.
