As lieven.sterck points out, this gives you 'a' back-translation of a peptide sequence, not 'the' original coding sequence. You could use a more dedicated statistical model based on codon frequencies from your organism under study, but this is the gist of it:
import random
AA2NA = {
"A": list("GCT,GCC,GCA,GCG".split(",")),
"R": list("CGT,CGC,CGA,CGG,AGA,AGG".split(",")),
"N": list("AAT,AAC".split(",")),
"D": list("GAT,GAC".split(",")),
"C": list("TGT,TGC".split(",")),
"Q": list("CAA,CAG".split(",")),
"E": list("GAA,GAG".split(",")),
"G": list("GGT,GGC,GGA,GGG".split(",")),
"H": list("CAT,CAC".split(",")),
"I": list("ATT,ATC,ATA".split(",")),
"L": list("TTA,TTG,CTT,CTC,CTA,CTG".split(",")),
"K": list("AAA,AAG".split(",")),
"M": list("ATG".split(",")),
"F": list("TTT,TTC".split(",")),
"P": list("CCT,CCC,CCA,CCG".split(",")),
"S": list("TCT,TCC,TCA,TCG,AGT,AGC".split(",")),
"T": list("ACT,ACC,ACA,ACG".split(",")),
"W": list("TGG".split(",")),
"Y": list("TAT,TAC".split(",")),
"V": list("GTT,GTC,GTA,GTG".split(",")),
"*": list("TAA,TGA,TAG".split(","))
}
def aa2na(seq):
    # Pick one random codon per residue; unknown residues become "---".
    na_seq = [random.choice(AA2NA.get(c, ["---"])) for c in seq]
    return "".join(na_seq)
print("MARNDCQEGHILKMFPSTWYV*", aa2na("MARNDCQEGHILKMFPSTWYV*"))
One possible output:
MARNDCQEGHILKMFPSTWYV* ATGGCTCGAAATGACTGCCAAGAGGGACACATTCTTAAAATGTTTCCGAGTACCTGGTACGTCTAA
Edit: changed return value of AA2NA.get() for "unknown" amino acids to "---" instead of "-".
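The codon-frequency idea mentioned at the top could be sketched roughly like this. It is only a minimal illustration: the name aa2na_weighted and the CODON_WEIGHTS table are made up for this example, the table covers just four residues, and the weights are placeholders, not measured frequencies. You would substitute a real codon usage table for your organism.

```python
import random

# Hypothetical relative codon weights for a few residues only.
# Placeholder numbers, NOT real codon usage data.
CODON_WEIGHTS = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "F": {"TTT": 0.4, "TTC": 0.6},
    "*": {"TAA": 0.6, "TGA": 0.3, "TAG": 0.1},
}

def aa2na_weighted(seq, weights=CODON_WEIGHTS):
    """Back-translate seq, sampling each codon in proportion to its weight."""
    codons = []
    for aa in seq:
        table = weights.get(aa)
        if table is None:
            # Unknown residue: same "---" placeholder as in the answer.
            codons.append("---")
            continue
        # random.choices draws k items with the given relative weights.
        pick = random.choices(list(table), weights=list(table.values()), k=1)[0]
        codons.append(pick)
    return "".join(codons)

print("MKF*", aa2na_weighted("MKF*"))
```

The output still varies from run to run; the weights only make frequently used codons more likely to be chosen.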
Would it be possible to give a bit of context?
Biologically it is (near) impossible to translate a protein back to its DNA sequence.
You can translate the protein into a DNA sequence, but not into its DNA sequence.
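To put a number on "a DNA sequence but not its DNA sequence", one could count how many distinct DNA sequences encode a given peptide under the standard genetic code. This is just a sketch: the name count_backtranslations is made up here, and it raises a KeyError on any residue outside the 20 standard amino acids plus stop.

```python
import math

# Number of codons per residue in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
    "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4, "*": 3,
}

def count_backtranslations(peptide):
    """Return how many DNA sequences translate to the given peptide."""
    # Each position is chosen independently, so the counts multiply.
    return math.prod(DEGENERACY[aa] for aa in peptide)

print(count_backtranslations("MW"))   # 1
print(count_backtranslations("MLS"))  # 36
```

Only peptides made entirely of M and W have a single possible coding sequence; for anything else the count grows multiplicatively with length.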
And more on topic: if there is a Biopython solution, why is that no good? I'm no Python expert, but it should be possible to create a dictionary where every amino acid points to a codon (3 nucleotides), then loop over each amino acid and print the codon for it.
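That dictionary-and-loop idea might look something like the following. It is a minimal sketch, not a Biopython call: the names AA2CODON and backtranslate are made up for this example, the codon picked for each amino acid is an arbitrary choice among the valid ones, and emitting "NNN" for unrecognised letters is an assumption of this sketch.

```python
# One arbitrarily chosen codon per amino acid (standard genetic code).
AA2CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "TTA", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
    "*": "TAA",
}

def backtranslate(peptide):
    """Return one fixed DNA sequence encoding the peptide."""
    # Loop over the residues and look up the codon for each one;
    # unrecognised letters become "NNN" (an assumption of this sketch).
    return "".join(AA2CODON.get(aa, "NNN") for aa in peptide.upper())

print(backtranslate("MKW*"))  # ATGAAATGGTAA
```

Unlike a random pick, this always returns the same DNA sequence for the same peptide, which can be handy when results need to be reproducible.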
Hello Misha!
It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/11345/how-to-translate-amino-acid-sequences-to-nucleotide-sequences
This is typically not recommended as it runs the risk of annoying people in both communities.