Hi, so I have a nucleotide multiple sequence alignment that I would like to translate into an amino acid MSA based on the reading frame of a reference sequence in that alignment. I'm looking for the best way to do this, preferably with Biopython. Thanks!
Not sure if you are still interested, but... this solution assumes that the nucleotide alignment is a codon alignment, i.e. gaps come in whole codons. If it is not, you will end up with a bunch of "X" in place of amino acids.
from Bio import SeqIO

frame = 0  # set to 0, 1 or 2 depending on the frame of the reference
with open("translated.fas", "w") as out:
    for record in SeqIO.parse("alignment.phy", "phylip"):  # change this to whichever format
        sequence = []
        # step through the alignment codon by codon, skipping a trailing partial codon
        for c in range(frame, len(record.seq) - 2, 3):
            codon = record.seq[c:c+3]
            if "-" not in str(codon):
                sequence.append(str(codon.translate()))
            elif str(codon) == "---":
                sequence.append("-")  # whole-codon gap stays a gap
            else:
                sequence.append("X")  # gap splits the codon
        # write each translated record in FASTA format (Python 3)
        out.write(">" + record.id + "\n")
        out.write("".join(sequence) + "\n")
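If you are unsure whether your alignment really is a codon alignment, a quick check before translating might help. This is only a rough sketch in plain Python: the function name `frame_breaking_codons` is made up for illustration, and it assumes gaps are written as "-" and that sequences are passed in as plain strings.

```python
def frame_breaking_codons(aligned_seq, frame=0):
    """Return start positions of codons that mix gaps and nucleotides."""
    bad = []
    # walk the sequence codon by codon, ignoring a trailing partial codon
    for start in range(frame, len(aligned_seq) - 2, 3):
        codon = aligned_seq[start:start + 3]
        if "-" in codon and codon != "---":
            bad.append(start)
    return bad


print(frame_breaking_codons("ATG---AAATGA"))  # gaps in whole codons: []
print(frame_breaking_codons("ATG-AAATGA--"))  # gaps break the frame: [3, 9]
```

Any position it reports is a codon that would come out as "X" in the translation above.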
Why is it better to translate a nucleotide sequence to an amino acid sequence for MSA?
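DNA codons are redundant: several different codons encode the same amino acid, so two sequences can differ at the nucleotide level and still encode the same protein. Amino acids are not redundant in that way.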
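To make the redundancy concrete, here is a small sketch in plain Python (no Biopython needed). It builds the standard genetic code from the usual TCAG ordering; the names `CODON_TABLE` and `translate` and the two toy sequences are made up for illustration.

```python
from collections import Counter

BASES = "TCAG"
# standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate an ungapped, unambiguous DNA string in frame 0."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# how many codons encode each amino acid ("*" is stop)
codons_per_aa = Counter(CODON_TABLE.values())
print(codons_per_aa["L"], codons_per_aa["M"])  # 6 1

# two toy sequences that differ at most positions but encode the same peptide
seq_a = "CTTTCAAGAGGA"
seq_b = "TTGAGCCGTGGG"
mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
print(mismatches, translate(seq_a), translate(seq_b))  # 8 LSRG LSRG
```

The two toy sequences differ at 8 of 12 nucleotide positions yet translate to the same four residues, which is the kind of noise an amino acid alignment does not have to deal with.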