I see lots of example on how to convert DNA to RNA or RNA to Amino. I see plenty examples using Python and Biopython. How could I do the reverse? Amino acid sequence to RNA and then RNA to DNA.
Define a dict that maps Amino acids to the corresponding codon
AA_codon = {
'C': ['TGT', 'TGC'],
'A': ['GAT', 'GAC'],
'S': ['TCT', 'TCG', 'TCA', 'TCC', 'AGC', 'AGT'],
'G': ['CAA', 'CAG'],
'M': ['ATG'], #Start
'A': ['AAC', 'AAT'],
'P': ['CCT', 'CCG', 'CCA', 'CCC'],
'L': ['AAG', 'AAA'],
'Q': ['TAG', 'TGA', 'TAA'], #Stop
'T': ['ACC', 'ACA', 'ACG', 'ACT'],
'P': ['TTT', 'TTC'],
'A': ['GCA', 'GCC', 'GCG', 'GCT'],
'G': ['GGT', 'GGG', 'GGA', 'GGC'],
'I': ['ATC', 'ATA', 'ATT'],
'L': ['TTA', 'TTG', 'CTC', 'CTT', 'CTG', 'CTA'],
'H': ['CAT', 'CAC'],
'A': ['CGA', 'CGC', 'CGG', 'CGT', 'AGG', 'AGA'],
'T': ['TGG'],
'V': ['GTA', 'GTC', 'GTG', 'GTT'],
'G': ['GAG', 'GAA'],
'T': ['TAT', 'TAC'] }
Basically, it's because there are so many possible combinations, and so much redundancy, that the number of possible sequences that give rise to a particular amino acid, is too big to be useful for anything downstream. It would also be difficut to even represent the data in a useful way.
You can choose a frequency table, and sort of do a codon optimization. I say sort of, because you won't be taking into account PTMs, or other risks.
I was under the impression that Biopython had method(s) to work this kind of problem, but I was not considering the "redundancy" of the codon table.