How to translate protein sequences to nucleotide sequences?
4.9 years ago
Misha ▴ 60

I want to convert a list of protein sequences in FASTA format, stored in a .text file, into corresponding nucleotide sequences. A Google search only turns up DNA-to-protein conversion, not the reverse. I also came across "How do I find the nucleotide sequence of a protein using Biopython?", but that is not what I am looking for. Is there a way to do this in Python? I am sure there must be some way to do it without writing the code from scratch. Thanks!

protein sequence Nucleotide translation • 6.0k views

Would it be possible to give a bit of context?

Biologically, it is (near) impossible to translate a protein back to its DNA sequence.

You can translate the protein into a DNA sequence, but not into its DNA sequence.

And more on topic: if there is a Biopython solution, why is that no good? I'm no Python expert, but it should be possible to create a dictionary where every amino acid points to a codon (3 nucleotides), then loop over each amino acid and print the codon for it.
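
A minimal sketch of that dictionary idea, applied to FASTA-formatted text as in the question. The codon chosen for each amino acid is arbitrary, and `FIRST_CODON`, `read_fasta` and `back_translate` are hypothetical names used only for illustration; residues missing from the dictionary are written as `NNN`.

```python
# One arbitrarily chosen codon per amino acid (standard genetic code).
FIRST_CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
    "*": "TAA",
}

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def back_translate(protein):
    """Return one of many possible DNA sequences encoding `protein`."""
    return "".join(FIRST_CODON.get(aa, "NNN") for aa in protein.upper())

# Made-up example records; sequences may span several lines.
example = """>pep1
MAR
ND
>pep2
MW*
"""
for name, protein in read_fasta(example.splitlines()):
    print(f">{name}\n{back_translate(protein)}")
```

For a real file, an open file handle can be passed to `read_fasta` in place of `example.splitlines()`.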


Hello Misha!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/11345/how-to-translate-amino-acid-sequences-to-nucleotide-sequences

This is typically not recommended as it runs the risk of annoying people in both communities.

4.9 years ago
cschu181 ★ 2.8k

As lieven.sterck points out, this returns 'a' back-translation of a peptide sequence, not necessarily the original coding sequence. You could use a more dedicated statistical model based on codon frequencies from your organism under study, but this is the gist of it:

import random

AA2NA = {
    "A": "GCT,GCC,GCA,GCG".split(","),
    "R": "CGT,CGC,CGA,CGG,AGA,AGG".split(","),
    "N": "AAT,AAC".split(","),
    "D": "GAT,GAC".split(","),
    "C": "TGT,TGC".split(","),
    "Q": "CAA,CAG".split(","),
    "E": "GAA,GAG".split(","),
    "G": "GGT,GGC,GGA,GGG".split(","),
    "H": "CAT,CAC".split(","),
    "I": "ATT,ATC,ATA".split(","),
    "L": "TTA,TTG,CTT,CTC,CTA,CTG".split(","),
    "K": "AAA,AAG".split(","),
    "M": "ATG".split(","),
    "F": "TTT,TTC".split(","),
    "P": "CCT,CCC,CCA,CCG".split(","),
    "S": "TCT,TCC,TCA,TCG,AGT,AGC".split(","),
    "T": "ACT,ACC,ACA,ACG".split(","),
    "W": "TGG".split(","),
    "Y": "TAT,TAC".split(","),
    "V": "GTT,GTC,GTA,GTG".split(","),
    "*": "TAA,TGA,TAG".split(",")
}

def aa2na(seq):
    # Pick one codon at random for each residue; residues not in AA2NA become "---".
    na_seq = [random.choice(AA2NA.get(c, ["---"])) for c in seq]
    return "".join(na_seq)

print("MARNDCQEGHILKMFPSTWYV*", aa2na("MARNDCQEGHILKMFPSTWYV*"))

One possible output:

MARNDCQEGHILKMFPSTWYV* ATGGCTCGAAATGACTGCCAAGAGGGACACATTCTTAAAATGTTTCCGAGTACCTGGTACGTCTAA

Edit: changed return value of AA2NA.get() for "unknown" amino acids to "---" instead of "-".
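
The codon-frequency idea mentioned at the top of this answer could be sketched with `random.choices`, which accepts per-item weights. The weights below are made-up placeholders for a few residues, not real codon usage data; in practice they would come from a codon usage table for the organism under study. `CODON_WEIGHTS` and `aa2na_weighted` are hypothetical names.

```python
import random

# Placeholder weights for illustration only, NOT real codon usage values.
CODON_WEIGHTS = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "L": {"TTA": 0.1, "TTG": 0.1, "CTT": 0.1,
          "CTC": 0.2, "CTA": 0.1, "CTG": 0.4},
    "*": {"TAA": 0.3, "TGA": 0.5, "TAG": 0.2},
}

def aa2na_weighted(seq, weights=CODON_WEIGHTS, rng=random):
    """Sample one codon per residue in proportion to its weight.

    Residues without an entry in `weights` are written as "---".
    """
    codons = []
    for aa in seq:
        table = weights.get(aa)
        if table is None:
            codons.append("---")
        else:
            picked = rng.choices(list(table), weights=list(table.values()), k=1)
            codons.append(picked[0])
    return "".join(codons)

print("MKL*", aa2na_weighted("MKL*"))
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.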


Thanks a lot for answering this.

4.9 years ago
Mensur Dlakic ★ 28k

There is something called codon degeneracy, which means that multiple nucleotide triplets (codons) translate into the same amino acid. Conversely, a single amino acid can be back-translated into multiple codons, which is why there is no single solution to what you are asking.
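
To get a feel for how much ambiguity degeneracy introduces, one can multiply the number of codons available for each residue. A small sketch, assuming the standard genetic code; `CODON_COUNTS` and `count_back_translations` are names introduced here for illustration.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code ('*' = stop).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4, "*": 3,
}

def count_back_translations(protein):
    """Number of distinct DNA sequences that encode `protein`."""
    return prod(CODON_COUNTS[aa] for aa in protein)

print(count_back_translations("MW"))     # 1: M and W each have a single codon
print(count_back_translations("MARND"))  # 1 * 4 * 6 * 2 * 2 = 96
```

Even a five-residue peptide can have dozens of valid encodings, and the count grows multiplicatively with length.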
