Question

EMBOSS transeq translate to protein with 3 letter code

6.9 years ago

marongiu.luigi

Dear all,

Would it be possible to translate a nucleotide sequence into the three-letter code using EMBOSS transeq? I can do it with the one-letter code:

$ echo atgtttcaggacccacaggagtaa | transeq -filter -osformat2 text
MFQDPQE*

But I don't see an option for the three-letter code in the manual.

Thank you

If you don't see that option in the manual, then no. You could do some replacements with sed if you must have the three-letter code.

6.9 years ago by GenoMax
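A minimal sketch of the sed approach suggested above, assuming the input is a single line of standard one-letter codes (the 20 common residues plus * for stop), as transeq printed in the question. The function name one_to_three is hypothetical. The only subtlety is the order of the substitutions: replacing Q with Gln introduces a new capital G, so the G rule must already have run by then.

```shell
#!/usr/bin/env bash
# Sketch: one-letter protein code (e.g. transeq output) -> three-letter code.
# one_to_three is a hypothetical helper name; it reads from stdin.
#
# Order matters. The first two -e lines handle letters whose code starts with
# the same capital (A->Ala, G->Gly, ...). The remaining rules (R->Arg, Q->Gln,
# F->Phe, ...) only create capitals whose rules have already run, so no
# residue is ever translated twice.
one_to_three() {
  sed -e 's/A/Ala/g; s/C/Cys/g; s/G/Gly/g; s/H/His/g; s/I/Ile/g; s/L/Leu/g' \
      -e 's/M/Met/g; s/P/Pro/g; s/S/Ser/g; s/T/Thr/g; s/V/Val/g' \
      -e 's/R/Arg/g; s/N/Asn/g; s/D/Asp/g; s/Q/Gln/g; s/E/Glu/g' \
      -e 's/K/Lys/g; s/F/Phe/g; s/W/Trp/g; s/Y/Tyr/g' \
      -e 's/\*/***/g'
}

# Quote the input so the shell does not try to glob the trailing *.
echo 'MFQDPQE*' | one_to_three
```

Writing the stop codon as *** mirrors the showseq output in the accepted answer; characters without a rule (such as X) pass through unchanged.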

Any particular reason why you want to do that? It will only be very confusing ...

6.9 years ago by lieven.sterck

Just for graphical reasons: with the three-letter code, it is easier to see the correspondence with the triplets:

atgttt...
MetPhe...

6.9 years ago by marongiu.luigi

I find this even easier:

atgttt
 M  F

6.9 years ago by h.mon

Yes, but you need to add 5 spaces, because the sequence is given as MF, not as _M__F_.

6.9 years ago by marongiu.luigi

Not quite right: the general pattern is one initial space, then two spaces between every amino acid, then a final space. There are several tricks for splitting a string into characters. As I like Perl: split //, $_ splits a string at every character, and join then puts the pieces back together with whatever separator you choose. The split perldoc has some examples of using them together.

6.9 years ago by h.mon
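The spacing pattern h.mon describes can be sketched as follows. Instead of Perl's split and join, this swaps in sed, where 's/./ & /g' wraps every character in one space on each side. That yields exactly one leading space, two spaces between residues and one trailing space, so each residue lands under the middle base of its codon. The helper name space_out is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: place one-letter residues under the middle base of each codon.
# space_out is a hypothetical helper name; it reads from stdin.
# 's/./ & /g' replaces every character c with " c ", so n residues become
# a string of 3n characters, the same length as the coding sequence.
space_out() {
  sed 's/./ & /g'
}

dna='atgtttcaggacccacaggagtaa'
echo "$dna"
# Quote the protein so the shell does not glob the trailing *.
echo 'MFQDPQE*' | space_out
```

With EMBOSS installed, the transeq output from the question could be piped into space_out instead of the literal string, assuming it arrives as a single line.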

It wouldn't be too hard to write a script that converts between one- and three-letter codes. There must be ample Python examples to get you started.

6.9 years ago by WouterDeCoster
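For the three-to-one direction of such a conversion script, here is a sketch in sed rather than Python, to stay with the shell used elsewhere in this thread. It assumes three-letter codes written back to back with a capital first letter, as showseq -threeletter prints them, and *** for a stop. The helper name three_to_one is hypothetical. Because each code contains exactly one capital, at its start, no substitution can create a spurious match across neighbouring codes, so the order of the rules does not matter here.

```shell
#!/usr/bin/env bash
# Sketch: three-letter protein code (e.g. MetPhe...) -> one-letter code.
# three_to_one is a hypothetical helper name; it reads from stdin.
# Assumes codes are concatenated with a capital first letter and that
# a stop is written as ***; anything else passes through unchanged.
three_to_one() {
  sed -e 's/Ala/A/g; s/Arg/R/g; s/Asn/N/g; s/Asp/D/g; s/Cys/C/g' \
      -e 's/Gln/Q/g; s/Glu/E/g; s/Gly/G/g; s/His/H/g; s/Ile/I/g' \
      -e 's/Leu/L/g; s/Lys/K/g; s/Met/M/g; s/Phe/F/g; s/Pro/P/g' \
      -e 's/Ser/S/g; s/Thr/T/g; s/Trp/W/g; s/Tyr/Y/g; s/Val/V/g' \
      -e 's/\*\*\*/*/g'
}

echo 'MetPheGlnAspProGlnGlu***' | three_to_one
```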

Sure, that is not the problem; I just wanted to know whether transeq does it directly, to save the effort.

6.9 years ago by marongiu.luigi

Accepted Answer · 2018-09-20

6.9 years ago

cpad0112

The output peptide sequence is always in the standard one-letter IUPAC code.

http://structure.usc.edu/emboss/transeq.html

As an alternative, try showseq, which can display the translation in three-letter code:

$ echo atgtttcaggacccacaggagtaa | showseq -filter -threeletter y -format 4

           10        20        
  ----:----|----:----|----
  atgtttcaggacccacaggagtaa

  MetPheGlnAspProGlnGlu***