Question

how to convert a protein sequence into a list of characters

0

2.6 years ago

Debut ▴ 20

I would like to convert a protein sequence, and a sequence containing its mutations (the differences between the protein sequence and a reference sequence), into lists so that I can compare the two lists. For example:

Seq           GEDAPEEMN
----------
Mut                   LM
----------
output seq     ['G',    'E', 'D', 'A', 'P','E', 'E', 'M', 'N'] (list)
----------
output mut [' L  ', 'M ', '    ', '', '',' ', ' ', ' ', ' ']

I made this code but it does not work:

lignes=myFile.readlines()
for ligne in lignes :
    split_tableau= ligne.split(",")
    seq= split_tableau[4]
    mut= split_tableau[6]

    for caraS in seq :
        caraSeq= caraS.split()
    for caraM in mut :
        caraMut= caraM.split()
        print(caraMut)

python list • 1.8k views

ADD COMMENT • link updated 2.6 years ago by Jeremy ▴ 930 • written 2.6 years ago by Debut ▴ 20

0

I reformatted your post - maybe lost some of the white-space formatting you'd added in, sorry about that. Can you please check again so your Seq Mut etc are properly formatted? Also, do the extra white spaces in the output seq and output mut blocks have any significance?

ADD REPLY • link 2.6 years ago by Ram 44k

0

Thanks for your answer. mut is a sequence that highlights the mutations my sequence has relative to a reference sequence (I don't need the reference sequence for my code, which is why it doesn't appear). The spaces mark the matches between my sequence and the reference sequence. For example, the first and second positions are mutations, and the remaining positions, where there are spaces, are matches: the same amino acid appears in both sequences.

ADD REPLY • link 2.6 years ago by Debut ▴ 20

0

So in your example, the "L" and "M" are supposed to match with the "G" and "E"?

It's not clear at all where the mutations are supposed to come in or what governs their position.

ADD REPLY • link 2.6 years ago by Joe 21k

0

Thank you for your feedback. There are two sequences. mut is a sequence that shows the mutations of the protein sequence compared to the reference sequence (the reference sequence does not appear in my code because I don't need it). seq is the protein sequence that has been compared with the reference sequence. I want to put the two sequences in list format and compare them by index in order to get the position of each mutation, for example G to L at position 0. A space, as at positions 3, 4, ..., means that the protein sequence and the reference sequence match there. So first I want to turn them into lists, but the problem is that for seq (and mut), each character ends up alone in its own list.

ADD REPLY • link 2.6 years ago by Debut ▴ 20

0

You need to stop opening new questions. This is the 3rd post I've seen of yours in the last 2 or 3 days, all related to the same topic.

ADD REPLY • link 2.6 years ago by Joe 21k

score 0 · Answer 1 · 2022-05-19

0

2.6 years ago

Jeremy ▴ 930

You can use the following code in Python.

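# Assumes Seq and Mut are plain Python strings
newseq = list(Seq)
# Replace each space (a match) in Mut with '-' so it is visible in the list
newmut = ['-' if letter == ' ' else letter for letter in Mut]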

ADD COMMENT • link 2.6 years ago by Jeremy ▴ 930

1

Thank you for your answer. When I display the result on the console, I have each character alone in its own list, but I want all the characters of a sequence in the same list. I have several sequences, so I need a for loop, don't I?

with open('data2.csv', 'r') as myFile:
    lignes = myFile.readlines()
    for ligne in lignes:
        split_tableau = ligne.split(",")
        seq = split_tableau[4]
        mut = split_tableau[6]

        #print(seq)
        #print(mut)
        for s in seq:
            caraSeq = list(s)
        for m in mut:
            caraMut = list(m)
            print(caraMut)

ADD REPLY • link 2.6 years ago by Debut ▴ 20

0

OK, I thought your sequences were strings. It looks like you're starting with a CSV file. It might be helpful if you could show the first 5 or 10 rows of your CSV file and what you would like the final output to look like.
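In the meantime, here is a minimal sketch of one way to get a single list per sequence. It assumes, as in the code above, that the sequence is in column 4 and the mutation string in column 6; the helper name rows_to_lists and the two sample rows are hypothetical and only stand in for data2.csv. The key point is that list() is called once on the whole string, not once per character inside a loop.

```python
import csv
import io


def rows_to_lists(handle, seq_col=4, mut_col=6):
    """Yield (seq_list, mut_list) for each CSV row read from handle."""
    for row in csv.reader(handle):
        if len(row) <= max(seq_col, mut_col):
            continue  # skip empty or short rows
        # list() on the whole string gives one list with one item per character
        yield list(row[seq_col]), list(row[mut_col])


# Hypothetical sample rows standing in for data2.csv (column layout is assumed)
sample = io.StringIO(
    "id1,a,b,c,GEDAPEEMN,x,LM" + " " * 7 + "\n"
    "id2,a,b,c,GEDA,x,  L \n"
)

results = list(rows_to_lists(sample))
for seq, mut in results:
    print(seq)
    print(mut)
```

With a real file, the same function could be called as rows_to_lists(myFile) inside a with open('data2.csv', newline='') as myFile: block. Using csv.reader also drops the trailing newline that ligne.split(",") would leave on the last field.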

ADD REPLY • link 2.6 years ago by Jeremy ▴ 930

score 0 · Answer 2 · 2022-05-20

Maybe this solves the question but I'm still not sure:

# Make lists from strings
>>> seq = list("GEDAPEEMN")
>>> seq
['G', 'E', 'D', 'A', 'P', 'E', 'E', 'M', 'N']
>>> mut = list("LM")
>>> mut
['L', 'M']

# Pad the mutation list
>>> mut += [' '] * (len(seq) - len(mut))

# Display
>>> print(seq, mut, sep="\n")
['G', 'E', 'D', 'A', 'P', 'E', 'E', 'M', 'N']
['L', 'M', ' ', ' ', ' ', ' ', ' ', ' ', ' ']

This only works for right-hand padding, though. Without more information we can't write a general solution, because we know nothing about how the mutation coordinates are determined.
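If the mutation string really is aligned position by position with the sequence (an assumption, since the thread never confirms it), the index comparison the question asks for could be sketched as follows. The function name mutation_positions and the match_char parameter are hypothetical; a space is assumed to mean "same as the reference".

```python
def mutation_positions(seq, mut, match_char=" "):
    """Return (index, seq_residue, mut_residue) for each non-blank mut position."""
    seq_list = list(seq)
    # Right-pad mut with the match character (assumes left-aligned mutations)
    mut_list = list(mut.ljust(len(seq_list), match_char))
    return [
        (i, s, m)
        for i, (s, m) in enumerate(zip(seq_list, mut_list))
        if m != match_char
    ]


print(mutation_positions("GEDAPEEMN", "LM"))
# [(0, 'G', 'L'), (1, 'E', 'M')]
```

Each tuple gives the 0-based position, the residue in seq, and the residue in mut, which corresponds to "G to L at position 0" in the question's wording.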