I would like to convert a protein sequence, and a sequence containing its mutations (the differences between the protein sequence and a reference sequence), into lists so that I can compare the two lists. For example:
Seq GEDAPEEMN
----------
Mut LM
----------
output seq ['G', 'E', 'D', 'A', 'P','E', 'E', 'M', 'N'] (list)
----------
output mut [' L ', 'M ', ' ', '', '',' ', ' ', ' ', ' ']
I made this code but it does not work:
lignes = myFile.readlines()
for ligne in lignes:
    split_tableau = ligne.split(",")
    seq = split_tableau[4]
    mut = split_tableau[6]
    for caraS in seq:
        caraSeq = caraS.split()
    for caraM in mut:
        caraMut = caraM.split()
        print(caraMut)
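A likely cause, assuming the loops are nested as they appear to be: iterating over a string already yields one character at a time, and calling split() on a single character returns a one-element list (or an empty list for a space). The sketch below contrasts that with list(), which builds the whole list in one step. The strings are taken from the example above; the padding of mut with spaces is an assumption about the input.

```python
seq = "GEDAPEEMN"
mut = "LM" + " " * 7  # assumed: mutation string padded with spaces to len(seq)

# What the loop in the question does: split() on each single character.
per_char = [c.split() for c in seq]  # each element is a one-item list, e.g. ['G']

# split() with no argument discards whitespace, so a space gives an empty list.
space_split = " ".split()

# list() turns a string into a list of its characters, spaces included.
seq_list = list(seq)
mut_list = list(mut)

print(seq_list)
print(mut_list)
```

With list(), the spaces in mut are kept as elements, so both lists have the same length and can be compared index by index.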
I reformatted your post - maybe lost some of the white-space formatting you'd added in, sorry about that. Can you please check again so your Seq Mut etc are properly formatted? Also, do the extra white spaces in the output seq and output mut blocks have any significance?
Thanks for your answer. mut is a sequence that highlights the mutations my sequence has relative to a reference sequence (I don't need the reference sequence for my code, which is why it doesn't appear). The spaces mark the matches between my sequence and the reference sequence. For example, the first and second positions are mutations, and everywhere there is a space the sequence matches the reference, i.e. the same amino acid appears.
So in your example, the "L" and "M" are supposed to match with the "G" and "E"?
It's not clear at all where the mutations are supposed to come in or what governs their position.
Thank you for your feedback. There are two sequences: mut shows the mutations of the protein sequence compared to the reference sequence (I didn't include the reference sequence in my code because I don't need it), and seq is the protein sequence that was compared with the reference sequence. I want to put both sequences into list format and compare them by index in order to get the positions of the mutations, for example G to L at position 0. A space, as at positions 3, 4, ..., means that the protein sequence and the reference sequence match there. So first I want to turn them into lists, but the problem with seq (and mut) is that each character ends up alone in its own list.
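One way to get the positions described here is to walk both strings in parallel and keep only the indexes where mut is not a space. This is a minimal sketch, not a definitive solution: find_mutations is a hypothetical helper name, positions are 0-based as in the "G to L at position 0" example, and the padding step assumes trailing spaces in mut may have been trimmed when the file was read.

```python
def find_mutations(seq, mut):
    """Return (position, residue_in_seq, residue_in_mut) for each non-blank mut position."""
    # Assumption: mut may be shorter than seq if trailing spaces were stripped,
    # so pad it on the right with spaces to the same length.
    mut = mut.ljust(len(seq))
    # zip pairs the characters by index; enumerate supplies the 0-based position.
    return [(i, s, m) for i, (s, m) in enumerate(zip(seq, mut)) if m != " "]


result = find_mutations("GEDAPEEMN", "LM")
print(result)
# [(0, 'G', 'L'), (1, 'E', 'M')]
```

When reading from the file, remove only the line ending (for example with rstrip("\n")) rather than calling strip(), since strip() would also remove the leading spaces in mut and shift every position.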
You need to stop opening new questions. This is the 3rd post I've seen of yours in the last 2 or 3 days, all related to the same topic.