Question

Translation Of Nucleotide To Amino Acid

3

12.5 years ago

figo ▴ 220

Hi all,

I need to translate the nucleotide sequences in a multi-FASTA file into amino acids in all six reading frames and then select the best reading frame for each sequence. If any of you has a Perl script or knows a tool that can do this, it would be a great help.

Regards

translation • 25k views

ADD COMMENT • link updated 2.6 years ago by Ram 45k • written 12.5 years ago by figo ▴ 220

Ram · Answer 1 · 2012-10-29

6

12.5 years ago

JC 13k

EMBOSS has sixpack for this: http://emboss.sourceforge.net/apps/cvs/emboss/apps/sixpack.html

ADD COMMENT • link 12.5 years ago by JC 13k

0

Hello, the Emboss util doesn't work:

@BioPower3-IBM ~/Bacteria/cufflinks/older $ sixpack -mstart yes -orfminsize 300 -sequence diff_expressed.fasta -outfile diff_expressed.sixpack -outseq diff_expressed_AA.fasta -mstart -orfminsize 50
Display a DNA sequence with 6-frame translation and ORFs
@BioPower3-IBM ~/Bacteria/cufflinks/older $ head diff_expressed_AA.fasta
>16451-16691(-)_1_ORF1  Translation of 16451-16691(-) in frame 1, ORF 1, threshold 50, 79aa
MKKLLFLFFALTAFLFGAVNINTATLKELKSLNGIGEAKAKAILEYRKEANFTSIDDLKK
VKGIGDKLFEKIKNDITIE
>16451-16691(-)_3_ORF1  Translation of 16451-16691(-) in frame 3, ORF 1, threshold 50, 37aa
EKITIFIFCFNGFSLWCCKYQHCNTKRIKKFKWYWRS
>16451-16691(-)_4_ORF1  Translation of 16451-16691(-) in frame 4, ORF 1, threshold 50, 36aa
LFYCDIIFDFFKKLITYAFNFFKIINTCKICFFAVF
>16451-16691(-)_5_ORF1  Translation of 16451-16691(-) in frame 5, ORF 1, threshold 50, 3aa
ILL
>16451-16691(-)_6_ORF1  Translation of 16451-16691(-) in frame 6, ORF 1, threshold 50, 64aa

Even though a minimum size for ORFs is set, smaller ORFs are reported, and the first AA is not always M, but it should be.

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 10.0 years ago by apelin20 ▴ 490
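One possible workaround, offered as a minimal sketch and not as a fix for sixpack itself, is to post-filter the protein FASTA that sixpack wrote. The function names `read_fasta` and `filter_orfs` and the default threshold of 50 residues are illustrative assumptions; the example records are copied from the output above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_orfs(records, min_len=50, require_met=True):
    """Keep ORFs with at least min_len residues, optionally only those starting with M."""
    for header, seq in records:
        if len(seq) < min_len:
            continue
        if require_met and not seq.startswith("M"):
            continue
        yield header, seq


# Three of the records reported above, used here as inline test data.
example_lines = """\
>16451-16691(-)_1_ORF1  Translation of 16451-16691(-) in frame 1, ORF 1, threshold 50, 79aa
MKKLLFLFFALTAFLFGAVNINTATLKELKSLNGIGEAKAKAILEYRKEANFTSIDDLKK
VKGIGDKLFEKIKNDITIE
>16451-16691(-)_3_ORF1  Translation of 16451-16691(-) in frame 3, ORF 1, threshold 50, 37aa
EKITIFIFCFNGFSLWCCKYQHCNTKRIKKFKWYWRS
>16451-16691(-)_5_ORF1  Translation of 16451-16691(-) in frame 5, ORF 1, threshold 50, 3aa
ILL
""".splitlines()

for header, seq in filter_orfs(read_fasta(example_lines)):
    print(">" + header)
    print(seq)
```

For a real file, pass an open file handle (for example `open("diff_expressed_AA.fasta")`) to `read_fasta` instead of the inline lines. Only the 79 aa record survives the filter in this example.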

Ram · Answer 2 · 2015-09-18

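Here is the corrected code from shane.neely, which is otherwise very elegant and useful. The negative frames now use the reverse complement of the sequence rather than the plain reversed string: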

beans = "TGACTGTGTTTCTGAACAATAAATGACTTAAACCAGGTATGGCTGCCGATGGTTATCTT"

gencode = {
      'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
      'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
      'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
      'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
      'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
      'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
      'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
      'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
      'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
      'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
      'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
      'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
      'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
      'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
      'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
      'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

basepairs = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}

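def translate_frameshifted( sequence ):
      # Translate consecutive codons; anything not in gencode (e.g. a codon containing N) becomes 'X'
      translate = ''.join([gencode.get(sequence[3*i:3*i+3],'X') for i in range(len(sequence)//3)])
      return translate

def reverse_complement( sequence ):
      # Reverse the string, then complement each base; unknown bases become 'X'
      reversed_sequence = (sequence[::-1])
      rc = ''.join([basepairs.get(reversed_sequence[i], 'X') for i in range(len(sequence))])
      return rc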

print(translate_frameshifted(beans[0:]))    # first frame
print(translate_frameshifted(beans[1:]))    # second frame
print(translate_frameshifted(beans[2:]))    # third frame
print(translate_frameshifted(reverse_complement(beans)))    # negative first frame
print(translate_frameshifted(reverse_complement(beans[:len(beans)-1])))    # negative second frame
print(translate_frameshifted(reverse_complement(beans[:len(beans)-2])))    # negative third frame

# This ::-1 syntax in python means reverse the string.

score 1 · Answer 3 · 2012-10-30

1

12.5 years ago

vaskin90 ▴ 290

You can use UGENE Workflow Designer. No scripts are needed, only GUI steps:

1) Download UGENE: http://ugene.unipro.ru/
2) Open Workflow Designer from the Tools menu.
3) Create a scheme with the following elements: "Read Sequence" -> "Amino Translation" -> "Write sequence".

ADD COMMENT • link 12.5 years ago by vaskin90 ▴ 290

Ram · Answer 4 · 2015-10-16

1

9.5 years ago

x.jack.min ▴ 20

Try OrfPredictor: it reports the best frame as well as all six frames, and it also retrieves the CDS.

http://proteomics.ysu.edu/tools/OrfPredictor.html

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 9.5 years ago by x.jack.min ▴ 20

score 1 · Answer 5 · 2018-12-17

I have written a program in Python 3 that takes a nucleotide FASTA file as input and translates each sequence in that file in the frame that produces the fewest stop codons.

The software translates each sequence in all six frames and counts the number of stop codons in each. It then writes the original sequence in the 'optimal' frame to an output FASTA file, with the frame name appended to the contig name. Where several frames are equally optimal, Optimal Translate writes all of them to the output FASTA file. It only needs Python 3 to run and works on both DNA and RNA sequences.

The software can be run in Python or as a Geneious 11 plugin on Windows.

You can access it here: https://github.com/Kzra/Optimal-Translate
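The selection rule described above can be sketched as follows. This is not the Optimal-Translate code, only a minimal illustration of the "fewest stop codons" idea under stated assumptions: the names `translate`, `six_frames` and `best_frames` are made up for this example, the standard genetic code (NCBI table 1) is assumed, stops are written as `*`, and only a single sequence is handled (looping over the records of a FASTA file is left out). The test sequence is the one used in the other answers in this thread.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate whole codons only; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return {frame_label: protein} for frames +1..+3 and -1..-3."""
    seq = seq.upper().replace("U", "T")      # accept RNA input as well
    rev = seq.translate(COMPLEMENT)[::-1]    # reverse complement
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq[offset:])
        frames["-%d" % (offset + 1)] = translate(rev[offset:])
    return frames


def best_frames(seq):
    """Return the frame(s) with the fewest stop codons; ties are all kept."""
    frames = six_frames(seq)
    fewest = min(protein.count("*") for protein in frames.values())
    return {label: p for label, p in frames.items() if p.count("*") == fewest}


beans = "TGACTGTGTTTCTGAACAATAAATGACTTAAACCAGGTATGGCTGCCGATGGTTATCTT"
for label, protein in best_frames(beans).items():
    print(label, protein)   # prints: -1 KITIGSHTWFKSFIVQKHS
```

Counting stops is a crude criterion: a short sequence can have several stop-free frames, which is why ties are returned rather than broken arbitrarily.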

score 0 · Answer 6 · 2012-10-29

0

12.5 years ago

Sukhi Singh 11k

Try this Python script from this resource; I haven't tested it, though.

translatedna.py: reads a FASTA file, does a 6-frame translation, and prints the best protein sequences as FASTA.

ADD COMMENT • link 12.5 years ago by Sukhi Singh 11k

1

Beware: "haven't tested" is a big risk if there is in fact a mistake or bug. In general, for routine tasks like this, in my opinion no new code should be written at all, given that highly stable and well-tested code already exists. I stress this because a single typo in the genetic code matrix, for example, can taint all your downstream analysis.

ADD REPLY • link 12.5 years ago by Michael 55k
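For anyone who does use a hand-typed codon dictionary, one way to guard against the typo risk mentioned above is to compare it against the compact form of the standard genetic code (NCBI translation table 1). This is a minimal sketch: `reference_table` and `check_codon_table` are hypothetical helper names, and the stop symbol `_` is assumed to match the dictionaries posted in this thread.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def reference_table(stop="_"):
    """Build the 64-codon standard table, writing stops with the given symbol."""
    return {"".join(codon): (stop if aa == "*" else aa)
            for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def check_codon_table(table, stop="_"):
    """Return a list of problems; an empty list means the table matches the standard code."""
    expected = reference_table(stop)
    problems = []
    for codon in sorted(expected):
        if codon not in table:
            problems.append("missing codon %s" % codon)
        elif table[codon] != expected[codon]:
            problems.append("%s maps to %s, expected %s" % (codon, table[codon], expected[codon]))
    for key in sorted(set(table) - set(expected)):
        problems.append("unexpected key %s" % key)
    return problems


# Demonstration with a deliberately broken copy of the table.
broken = reference_table()
broken["TGG"] = "C"   # simulated typo: W mistyped as C
del broken["ATG"]     # simulated omission
for problem in check_codon_table(broken):
    print(problem)
```

Passing a hand-typed `gencode` dictionary to `check_codon_table` should return an empty list if it encodes the standard code with `_` for stops.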

0

Yeah, I agree, but it was a pointer in the right direction, which is why I mentioned that I haven't tested the code. The user should at least bear responsibility for testing and usage, although that might not happen if someone is new to the subject and unable to check the result.

ADD REPLY • link 12.5 years ago by Sukhi Singh 11k

score 0 · Answer 7 · 2012-10-29

0

12.5 years ago

Neilfws 49k

There are multiple tools available online to do this. Did you try "translate 6 frames" as a Google search? Here's the top hit.

ADD COMMENT • link 12.5 years ago by Neilfws 49k

score 0 · Answer 8 · 2012-10-29

Here is some Python to translate all six frames of DNA.

beans = "TGACTGTGTTTCTGAACAATAAATGACTTAAACCAGGTATGGCTGCCGATGGTTATCTT"

gencode = {
      'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
      'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
      'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
      'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
      'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
      'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
      'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
      'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
      'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
      'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
      'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
      'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
      'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
      'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
      'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
      'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

def translate_frameshifted( sequence ):
      translate = ''.join([gencode.get(sequence[3*i:3*i+3],'X') for i in range(len(sequence)//3)])
      return translate

print(translate_frameshifted(beans[0:]))    # first frame
print(translate_frameshifted(beans[1:]))    # second frame
print(translate_frameshifted(beans[2:]))    # third frame
print(translate_frameshifted(beans[::-1][0:]))    # first frame from end (reversed only, not complemented)
print(translate_frameshifted(beans[::-1][1:]))    # second frame from end (reversed only, not complemented)
print(translate_frameshifted(beans[::-1][2:]))    # third frame from end (reversed only, not complemented)

# The [::-1] slice reverses the string; it does not complement the bases.

Result:

_LCF_TINDLNQVWLPMVI
DCVSEQ_MT_TRYGCRWLS
TVFLNNK_LKPGMAADGYL
FYW_PSVWTKFSK_QVFVS
SIGSRRYGPNSVNNKSLCQ
LLVAVGMDQIQ_ITSLCVS

score 0 · Answer 9 · 2015-09-18

0

9.6 years ago

iontrap.ms ▴ 20

Result:

_LCF_TINDLNQVWLPMVI
DCVSEQ_MT_TRYGCRWLS
TVFLNNK_LKPGMAADGYL
KITIGSHTWFKSFIVQKHS
R_PSAAIPGLSHLLFRNTV
DNHRQPYLV_VIYCSETQS

ADD COMMENT • link 9.6 years ago by iontrap.ms ▴ 20