Question

How to convert amino acid sequence to numbers

Asked 2.8 years ago by nrizzolo1796

I have this file, loaded with Biopython's SeqIO:

files = list(SeqIO.parse("proteasomes.fasta", "fasta"))

This gives a list of SeqRecord objects holding proteasome amino acid sequences such as "MALASVLERPLPVNQRGFFGLGGRADLLDLGPGSLSDGLSL..."

I want to convert each letter of every sequence to the number given in this dictionary:

AMINO_ACID_TO_ID = {'0': 0,
                'A': 1,
                'C': 2,
                'D': 3,
                'E': 4,
                'F': 5,
                'G': 6,
                'H': 7,
                'I': 8,
                'K': 9,
                'L': 10,
                'M': 11,
                'N': 12,
                'P': 13,
                'Q': 14,
                'R': 15,
                'S': 16,
                'T': 17,
                'V': 18,
                'W': 19,
                'Y': 20}

Sample code I tried, which did not work:

converted = np.asarray[AMINO_ACID_TO_ID[(files[0].seq)]]

Any quick way to do this?
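
A minimal sketch of one way to do it, assuming numpy is available. The attempt above fails because it looks up the whole sequence as a single dictionary key and indexes np.asarray with square brackets instead of calling it; the fix is to look up each letter separately. The helper name encode is hypothetical, and the dictionary is rebuilt compactly from the same alphabetical letter order, giving the same 21 entries as AMINO_ACID_TO_ID above.

```python
import numpy as np

# Same mapping as AMINO_ACID_TO_ID in the question: '0' -> 0, then the 20
# standard amino acids in alphabetical one-letter order -> 1..20.
AMINO_ACID_TO_ID = {'0': 0, **{aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY", start=1)}}


def encode(seq: str) -> np.ndarray:
    """Map each residue letter of seq to its integer ID (hypothetical helper).

    Raises KeyError for letters missing from the dictionary, e.g. 'X', 'B' or 'U'.
    """
    return np.array([AMINO_ACID_TO_ID[aa] for aa in seq.upper()], dtype=np.int64)


print(encode("MALASV").tolist())  # [11, 1, 10, 1, 16, 18]

# With the Biopython records from the question (not run here):
# encoded = [encode(str(record.seq)) for record in files]
```

Each record becomes a 1-D array as long as its sequence, so sequences of different lengths still need padding before they can be stacked into one rectangular array.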

Tags: fasta, acids, amino · 1.6k views

Out of curiosity, how will you distinguish, for instance, two As (1, 1) from one M (11)?

Reply by lieven.sterck (15k), 2.8 years ago

Not sure yet; I was thinking of a numpy array. The goal is to use the FASTA files as input data for a neural net that generates similar sequences.

Reply by nrizzolo1796, 2.8 years ago

Ah, OK, you keep them in an array. (That would only be a problem if you printed them on one line again.)

Reply by lieven.sterck (15k), 2.8 years ago
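
A small sketch of the ambiguity raised here, assuming the IDs were written out as one string with no separator: since A = 1 and M = 11, "AA" and "M" both become "11", whereas a list or array keeps one integer per residue and stays unambiguous. Both helper names are hypothetical.

```python
# Same 21-entry mapping as AMINO_ACID_TO_ID in the question.
AMINO_ACID_TO_ID = {'0': 0, **{aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY", start=1)}}


def ids_as_string(seq):
    """Concatenate the integer IDs as text with no separator (lossy)."""
    return "".join(str(AMINO_ACID_TO_ID[aa]) for aa in seq)


def ids_as_list(seq):
    """Keep one integer per residue (lossless)."""
    return [AMINO_ACID_TO_ID[aa] for aa in seq]


print(ids_as_string("AA"), ids_as_string("M"))  # 11 11
print(ids_as_list("AA"), ids_as_list("M"))      # [1, 1] [11]
```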

That is a good point, because they would need to be converted back to letters at the end.

Reply by nrizzolo1796, 2.8 years ago
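
Converting back is a matter of inverting the dictionary. A minimal sketch, assuming the 21-entry mapping from the question; the names ID_TO_AMINO_ACID and decode are hypothetical, and skipping ID 0 by default is an assumption (it treats the '0' entry as padding rather than as a residue).

```python
import numpy as np

# Same 21-entry mapping as AMINO_ACID_TO_ID in the question.
AMINO_ACID_TO_ID = {'0': 0, **{aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY", start=1)}}
# Invert the mapping: integer ID -> letter (all IDs are unique, so this is safe).
ID_TO_AMINO_ACID = {i: aa for aa, i in AMINO_ACID_TO_ID.items()}


def decode(ids, drop_zero=True):
    """Turn a sequence of integer IDs back into a one-letter amino acid string.

    If drop_zero is True, ID 0 is skipped instead of being emitted as '0'.
    """
    return "".join(
        ID_TO_AMINO_ACID[int(i)] for i in ids if not (drop_zero and int(i) == 0)
    )


ids = np.array([11, 1, 10, 1, 16, 18, 0, 0])
print(decode(ids))                   # MALASV
print(decode(ids, drop_zero=False))  # MALASV00
```

If the network outputs a score per class rather than integer IDs, taking np.argmax over the class axis first gives IDs that this function can decode.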

Answer (score 2), 2022-03-07, by Mensur Dlakic (28k)

You may want to read about one-hot encoding of amino acids. Out of curiosity, why is 0 not used for an amino acid?

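
A minimal numpy sketch of one-hot encoding, assuming the 21-entry dictionary from the question (so the '0' entry gets its own column). Each residue becomes a vector with a single 1, so the network does not read ID 11 as "larger" than ID 1. The helper one_hot is hypothetical; frameworks ship their own utilities, for example torch.nn.functional.one_hot in PyTorch or tf.keras.utils.to_categorical in Keras.

```python
import numpy as np

# Same 21-entry mapping as AMINO_ACID_TO_ID in the question.
AMINO_ACID_TO_ID = {'0': 0, **{aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY", start=1)}}
NUM_CLASSES = len(AMINO_ACID_TO_ID)  # 21: ID 0 plus the 20 amino acids


def one_hot(seq):
    """Return a (len(seq), NUM_CLASSES) float32 matrix with a single 1 per row."""
    ids = np.array([AMINO_ACID_TO_ID[aa] for aa in seq], dtype=np.int64)
    # Row k of the identity matrix is the one-hot vector for class k,
    # so fancy-indexing with the IDs picks one row per residue.
    return np.eye(NUM_CLASSES, dtype=np.float32)[ids]


x = one_hot("MALASV")
print(x.shape)                     # (6, 21)
print(x.argmax(axis=1).tolist())   # [11, 1, 10, 1, 16, 18]
```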

Thank you for the sources. I copied that part of the code, so I'm not sure why 0 is not used.

Reply by nrizzolo1796, 2.8 years ago
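
One common reason for reserving 0 (an assumption here, since the intent of the copied code is unknown) is padding. Sequences of different lengths are right-padded with 0 so they fit in one rectangular array, and a model can be told to ignore that value; Keras, for instance, offers mask_zero=True on its Embedding layer. A minimal sketch, with pad_batch and PAD_ID as hypothetical names:

```python
import numpy as np

# Same 21-entry mapping as AMINO_ACID_TO_ID in the question.
AMINO_ACID_TO_ID = {'0': 0, **{aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY", start=1)}}
PAD_ID = AMINO_ACID_TO_ID['0']  # 0, assumed here to mean "no residue / padding"


def pad_batch(seqs, max_len=None):
    """Encode sequences and right-pad (or truncate) them to a common length with PAD_ID."""
    if max_len is None:
        max_len = max((len(s) for s in seqs), default=0)
    batch = np.full((len(seqs), max_len), PAD_ID, dtype=np.int64)
    for row, seq in enumerate(seqs):
        ids = [AMINO_ACID_TO_ID[aa] for aa in seq[:max_len]]  # truncate long sequences
        batch[row, :len(ids)] = ids
    return batch


print(pad_batch(["MALA", "GG"]).tolist())  # [[11, 1, 10, 1], [6, 6, 0, 0]]
```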