Question

Internal stop codons in protein sequences

8.0 years ago

vivekananthrp ▴ 20

Hi, there are protein sequences with internal stop codons (* in between).

For example,

http://www.candidagenome.org/cgi-bin/protein/proteinPage.pl?dbid=CAL0000175821&seq_source=C.%20albicans%20SC5314%20Assembly%2022

What do these many internal stop codons mean? When running bioinformatics predictions, should I remove the * characters, or should I remove these sequences from the FASTA file before analysis?

Thanks in advance.

-Vivek Ananth

fasta sequence • 4.5k views

updated 19 months ago by Ram 44k • written 8.0 years ago by vivekananthrp ▴ 20

score 3 · Answer 1 · 2016-12-07

8.0 years ago

Brian Bushnell 20k

That means the annotation is incorrect. The frame is wrong, or the sequence is wrong, or there isn't really a gene there.
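
If the goal is to keep such entries out of a downstream analysis rather than silently deleting the * characters, one option is to flag them first. The following is only a minimal sketch in plain Python: the function names and the two-record example are made up for illustration, and it assumes a single trailing * is a normal terminator rather than an internal stop.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def has_internal_stop(protein):
    """True if '*' occurs anywhere other than as one trailing terminator."""
    body = protein[:-1] if protein.endswith("*") else protein
    return "*" in body


def split_by_internal_stop(text):
    """Return (clean_headers, flagged_headers) for a protein FASTA string."""
    clean, flagged = [], []
    for header, seq in read_fasta(text):
        (flagged if has_internal_stop(seq) else clean).append(header)
    return clean, flagged


# Hypothetical records, not taken from CGD.
example = ">ok\nMKT*\n>bad\nMK*TL*\n"
clean, flagged = split_by_internal_stop(example)
```

Flagged records can then be inspected or excluded as a group, which keeps the decision visible instead of hiding a likely annotation problem.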

score 2 · Answer 2 · 2016-12-07

I have no idea why the amino acid sequence is so full of errors, but if you download the DNA sequence and translate it yourself, it produces a single open reading frame. You may want to contact the webmaster at CGD about this problem.

FYI, I thought it might be due to the alternative genetic code used by Candida (both nuclear and mitochondrial genes contain variant codons), but that's not the explanation.
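
For anyone who wants to repeat this kind of check, here is a rough sketch of translating a coding sequence yourself and counting internal stops. It assumes the standard genetic code plus the alternative yeast nuclear code (NCBI translation table 12), which differs from the standard code only in reading CTG as serine instead of leucine. The stop codons are the same in both, so switching between these two tables cannot add or remove a * on its own. The toy sequence and helper names are invented for illustration and are not the CGD entry.

```python
BASES = "TCAG"
# Standard code in NCBI order (first, second, third base each cycle T, C, A, G).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, STANDARD_AA))
# Alternative yeast nuclear code: CTG is read as Ser instead of Leu.
ALT_YEAST = dict(STANDARD, CTG="S")


def translate(dna, table):
    """Translate in frame 0; incomplete trailing codons are dropped,
    codons with ambiguous bases become 'X'."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def internal_stops(protein):
    """Count '*' characters other than one trailing terminator."""
    body = protein[:-1] if protein.endswith("*") else protein
    return body.count("*")


toy = "ATGCTGAAATAA"                       # made-up coding sequence
in_frame_std = translate(toy, STANDARD)    # "MLK*"
in_frame_alt = translate(toy, ALT_YEAST)   # "MSK*"
shifted = translate(toy[1:], STANDARD)     # wrong frame: "C*N"
```

In this toy case the correct frame gives no internal stops under either table, while shifting the frame by one base introduces one, which is the kind of pattern a wrong-frame annotation would produce.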