What does an amino acid sequence that does not start with methionine mean?
2.8 years ago
Riku ▴ 80

Dear all,

My de novo transcriptome assembly produced the following amino acid sequences. Many of them start with methionine, but some, like the one below, do not.

What is this protein?

>At_DN540_c0_g1_i10.p1
AEDLRETAGEYADSAKEKAENLKNRAKEHLPDSDTLSKTKESLKKKASDGKEYVQEKAEDLRETAGEYADSAKEKAENLKNRAKEHLPDSDTLSKTKESLKKKASDGKEYVQEKAEDLRETAGEYADSAKEKAENLKNRAKEQAENLKDNVKDKYDEYTDSAKEYANDAKDKVKSKTEEYTGSAKKTANDVKDKVKSKANDLKEGAQGMSESVSDSAREGFHHVEDKAEDIKEAGEQKKEEIDAKAEEAKSTIGGKIKSAADTVVGGVKSAAESVGSTVSGVFSSASDKADQAKEDVKEKAEEKREEIKESVRRKRETVGEKIDSNIEDAKSKASDAKAAAGEKIDEAKEKLNRFRRHTSAEEAGEKAGSTIDRAREKASEAGQAVGDKAKELKDDVTNRMKRAENDTVSGGGSKIGEGLAEIGGAAKTGAANAGATVVGGVIFAGEKVGEGAKAAKDKTVETAQAVGDKASEAAEAARQKADDAKNSFGKTVSFNTRSDTSIQNLGNL*

Thank you for reading.

de novo assembly Trinity protein TransDecoder • 2.5k views
2.8 years ago
Mensur Dlakic ★ 28k

There are many possible reasons, and some have been mentioned by others. Briefly: 1) gene overprediction, where the open reading frame continues upstream of the true Met start codon and that upstream stretch gets included in the prediction; 2) as swbarnes2 pointed out, GTG (normally valine) is often used as a start codon. At the start of a coding sequence it is translated as Met, while GTG anywhere else in the sequence is translated as Val; 3) the transcript is incomplete at the 5' end.
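For reasons 1 and 3, a quick first look is to see where the first Met sits in a peptide that does not start with one. The sketch below is only an illustration: the helper names are hypothetical, the example peptide is made up, and the position of the first Met alone cannot tell an overpredicted ORF apart from a transcript truncated at the 5' end.

```python
from typing import Optional


def first_met_offset(peptide: str) -> Optional[int]:
    """Return the 0-based index of the first 'M' in a peptide, or None."""
    idx = peptide.rstrip("*").find("M")
    return idx if idx != -1 else None


def trim_to_first_met(peptide: str) -> Optional[str]:
    """Hypothetical helper: drop the residues upstream of the first Met.

    This mimics what a shorter prediction would look like if the upstream
    stretch were overprediction (reason 1). It is a guess, not a proof.
    """
    offset = first_met_offset(peptide)
    return peptide[offset:] if offset is not None else None


if __name__ == "__main__":
    example = "AEDLRETAGMSESV*"  # made-up short peptide for illustration
    print(first_met_offset(example))
    print(trim_to_first_met(example))
```

A long stretch with database hits upstream of the first Met would argue for a 5'-incomplete transcript rather than overprediction.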


Hi.

When I was evaluating coding sequences from Ensembl transcripts, I found that some do not start with ATG. They start with GTC, CAC, GGC, etc., and the peptide sequence sometimes starts with X, H, etc.

Some examples: ENST00000466610 (starts with H), ENST00000680216 and ENST00000704018 (start with X), and ENST00000685033 (starts with G).

I can't find a reason, and these transcripts are not using a rare start codon either. Would you mind explaining a little more? Thank you.

2.8 years ago

I know that Mycobacterium tuberculosis, whose genome is about 66% GC, often starts its coding sequences with GTG. I believe that when GTG is at the beginning of the coding region it really is translated as Met, but a naive translation program might not know that.

I think you also have to consider that you don't have the start of the protein. When I BLAST the protein you provided, it doesn't hit the beginning of any proteins, just the middles.
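To see how many predicted peptides are likely missing their start, one could tally the first residue against the ORF type recorded in the FASTA header. This is a minimal sketch that assumes the headers carry a `type:` field (for example `type:complete` or `type:5prime_partial`) in the style of TransDecoder `.pep` output; the header shown in the question is abbreviated, so that is an assumption, and `read_fasta` and `summarize_starts` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_starts(lines: Iterable[str]) -> Counter:
    """Count records by (ORF type from header, starts with Met?)."""
    counts = Counter()
    for header, seq in read_fasta(lines):
        # Assumed header convention: "... type:5prime_partial len:123 ..."
        match = re.search(r"type:(\S+)", header)
        orf_type = match.group(1) if match else "unknown"
        counts[(orf_type, seq.startswith("M"))] += 1
    return counts


if __name__ == "__main__":
    demo = [
        ">a.p1 ORF type:5prime_partial len:5",
        "AEDL*",
        ">b.p1 ORF type:complete len:4",
        "MKV*",
    ]
    for key, n in summarize_starts(demo).items():
        print(key, n)
```

With a real file, pass an open file handle instead of the demo list, for example `summarize_starts(open("proteins.pep"))`, where the file name is a placeholder.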


Thank you very much for your help. I did a BLAST and found many hits in the front section of this protein sequence, as shown in the figure. It seems likely that the protein sequence is not complete.

[Figure: BLAST hits against the protein sequence]

2.8 years ago
5heikki 11k

Codons other than AUG can initiate translation in many "genetic codes".

Perhaps you didn't translate your transcriptome into proteins using the appropriate genetic code.
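The difference between a naive translation and one that knows about alternative initiators can be sketched in a few lines. The codon table below is the standard one, and the start-codon sets are the initiators listed for NCBI translation tables 1 (standard) and 11 (bacterial, archaeal and plant plastid) as I understand them; check them against the NCBI genetic code listing before relying on this. The `translate` function and its parameters are illustrative, not any particular tool's API.

```python
from itertools import product

BASES = "TCAG"
# Standard amino acid assignments in NCBI order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Initiator codons per translation table (assumed from the NCBI listing).
START_CODONS = {
    1: {"ATG", "CTG", "TTG"},
    11: {"ATG", "GTG", "TTG", "CTG", "ATT", "ATC", "ATA"},
}


def translate(cds: str, table: int = 1, start_aware: bool = True) -> str:
    """Translate a coding sequence codon by codon.

    With start_aware=True, a first codon that is a permitted initiator in
    the chosen table is reported as 'M' regardless of its usual meaning.
    Codons that are not in the table (e.g. containing N) become 'X'.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    residues = [CODON_TABLE.get(codon, "X") for codon in codons]
    if start_aware and codons and codons[0] in START_CODONS[table]:
        residues[0] = "M"
    return "".join(residues)


if __name__ == "__main__":
    cds = "GTGGCTGTGTAA"  # made-up CDS: GTG GCT GTG TAA
    print(translate(cds, start_aware=False))  # naive translation
    print(translate(cds, table=11))           # GTG treated as an initiator
```

Only the first codon is affected; an internal GTG is still read as Val in both modes.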

2.8 years ago

Although the most common start codon codes for methionine, this amino acid can also be cleaved off posttranslationally.


But a program translating a de novo transcriptome won't know that.

