2.8 years ago
nrizzolo1796
I have this FASTA file, which I load with Biopython:
files = list(SeqIO.parse("proteasomes.fasta", "fasta"))
This gives a list of proteasome amino acid sequences like "MALASVLERPLPVNQRGFFGLGGRADLLDLGPGSLSDGLSL..."
I want to convert each letter of each sequence to the number specified in this dictionary:
AMINO_ACID_TO_ID = {'0': 0,
'A': 1,
'C': 2,
'D': 3,
'E': 4,
'F': 5,
'G': 6,
'H': 7,
'I': 8,
'K': 9,
'L': 10,
'M': 11,
'N': 12,
'P': 13,
'Q': 14,
'R': 15,
'S': 16,
'T': 17,
'V': 18,
'W': 19,
'Y': 20}
Here is the code I tried, which did not work:
converted = np.asarray[AMINO_ACID_TO_ID[(files[0].seq)]]
Is there a quick way to do this?
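One possible approach, as a minimal sketch: look up each residue letter individually rather than indexing the dictionary with the whole sequence. The function name encode_sequence and the unknown_id parameter are hypothetical, and sending letters missing from the dictionary (for example 'X') to 0 is an assumption about what the '0' entry is for, so adjust it if 0 means something else in your setup.

```python
# Same mapping as in the question.
AMINO_ACID_TO_ID = {'0': 0, 'A': 1, 'C': 2, 'D': 3, 'E': 4, 'F': 5, 'G': 6,
                    'H': 7, 'I': 8, 'K': 9, 'L': 10, 'M': 11, 'N': 12,
                    'P': 13, 'Q': 14, 'R': 15, 'S': 16, 'T': 17, 'V': 18,
                    'W': 19, 'Y': 20}


def encode_sequence(sequence, mapping=AMINO_ACID_TO_ID, unknown_id=0):
    """Return a list with one integer ID per residue letter.

    Assumption: letters not present in the mapping get `unknown_id`.
    """
    # str() lets this accept a plain string or a Biopython Seq object.
    return [mapping.get(residue, unknown_id) for residue in str(sequence).upper()]


example = "MALASVLERP"  # first ten residues of the sequence quoted above
encoded = encode_sequence(example)
print(encoded)  # [11, 1, 10, 1, 16, 18, 10, 4, 15, 13]
```

With the records from the question this would be something like encode_sequence(files[0].seq), and np.asarray(encoded) turns the result into a NumPy array. Sequences of different lengths give lists of different lengths, so they would need padding before being stacked into one 2-D array.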
Out of curiosity, how will you distinguish between, for instance, two As (1 and 1) and one M (11)?
Not sure yet; I was thinking of a NumPy array. The goal is to use the FASTA files as input data for a neural net that generates similar sequences.
Ah, OK, you keep them in an array (it would indeed only be a problem if you printed them on one line again).
That is a good point, because they would need to be converted back to letters at the end.
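To illustrate the point about converting back, here is a rough sketch of the reverse lookup. The names ALPHABET, ID_TO_AMINO_ACID and decode_sequence are hypothetical; the alphabet string is just a compact way to rebuild the dictionary from the question, since its IDs follow the order of the letters. As long as the IDs stay separate elements of a list or array, two As and one M remain distinguishable; the ambiguity only appears if the digits are joined into one string.

```python
# Letters in ID order: position 0 is '0', position 1 is 'A', ..., position 20 is 'Y'.
ALPHABET = "0ACDEFGHIKLMNPQRSTVWY"
AMINO_ACID_TO_ID = {letter: index for index, letter in enumerate(ALPHABET)}
ID_TO_AMINO_ACID = {index: letter for letter, index in AMINO_ACID_TO_ID.items()}


def decode_sequence(ids):
    """Turn an iterable of integer IDs back into a string of residue letters."""
    # int() also accepts NumPy integer scalars, e.g. when iterating over an array.
    return "".join(ID_TO_AMINO_ACID[int(i)] for i in ids)


print(decode_sequence([1, 1]))  # AA
print(decode_sequence([11]))    # M

# Joining the digits is what loses information: both cases collapse to "11".
print("".join(str(i) for i in [1, 1]) == "".join(str(i) for i in [11]))  # True
```

Anything the network outputs outside 0 to 20 would raise a KeyError here, which is probably what you want while debugging.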