10.3 years ago
ishengomae
I am looking for tools that can process a multiple alignment of protein sequences and extract only the polymorphic sites, relative to a reference sequence. What I have in mind is matrix-like output such as:
Position 2 10 20 45 60 63
RefSeq Val Leu Ala Ser Phe Thr
Seq1 Met Tyr
Seq2 Gly Pro
Seq3 Ala Arg
Any suggestions for tools or scripts (preferably Python) that can help me achieve this quickly?
Thanks.
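For readers landing on this question, here is a minimal sketch of one way to build such a matrix in plain Python. It is not the code from the answer referred to in the replies below. It assumes the alignment has already been loaded as equal-length strings, that positions are 1-based alignment columns, and that a gap character simply counts as a difference. The names `polymorphic_sites` and `format_matrix` and the toy sequences are made up for illustration, and one-letter residue codes are used instead of the three-letter codes shown above.

```python
def polymorphic_sites(ref, seqs):
    """Return {column: (ref_residue, {seq_name: residue})} for every
    1-based alignment column where at least one sequence differs from ref."""
    sites = {}
    for col, ref_res in enumerate(ref, start=1):
        diffs = {name: seq[col - 1]
                 for name, seq in seqs.items()
                 if seq[col - 1] != ref_res}
        if diffs:
            sites[col] = (ref_res, diffs)
    return sites


def format_matrix(sites, names):
    """Render the sites as tab-separated rows; '.' means 'same as reference'."""
    positions = sorted(sites)
    rows = [["Position"] + [str(p) for p in positions],
            ["RefSeq"] + [sites[p][0] for p in positions]]
    for name in names:
        rows.append([name] + [sites[p][1].get(name, ".") for p in positions])
    return "\n".join("\t".join(row) for row in rows)


# Toy aligned sequences (hypothetical data, all the same length).
ref = "MVLSA"
seqs = {"Seq1": "MMLSA", "Seq2": "MVLSG", "Seq3": "MVLSA"}

sites = polymorphic_sites(ref, seqs)
print(format_matrix(sites, list(seqs)))
```

With real data the strings would come from an alignment parser (for example Biopython's `AlignIO`), and a one-letter to three-letter lookup table could be applied when printing.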
Excellent and elegant, thanks very much.
Hello @Cytosine, more help please.
I implemented this code with a snippet of my data and I think I am just a small step away from achieving my goal.
This would give me:
The positional information at the top is crucial: I only want to extract the reference residues at those positions and the residues aligned to them, not the entire length of the alignment. I have tried to figure out how to modify this code to do that, but so far without success.
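One possible way to restrict the output to a chosen list of positions is sketched below. This is not the code from the answer or the final solution mentioned in this thread. It assumes the positions are 1-based and refer to the ungapped reference sequence, so they first have to be mapped to alignment columns. The helper names `ref_to_column` and `residues_at`, the gap character `-`, and the toy sequences are all assumptions made for illustration.

```python
def ref_to_column(ref_aligned, gap="-"):
    """Map 1-based positions in the ungapped reference to 0-based alignment columns."""
    mapping = {}
    pos = 0
    for col, res in enumerate(ref_aligned):
        if res != gap:
            pos += 1
            mapping[pos] = col
    return mapping


def residues_at(positions, ref_aligned, seqs, gap="-"):
    """Return {name: [residue, ...]} for the requested reference positions only."""
    cols = ref_to_column(ref_aligned, gap)
    table = {"RefSeq": [ref_aligned[cols[p]] for p in positions]}
    for name, seq in seqs.items():
        table[name] = [seq[cols[p]] for p in positions]
    return table


# Toy alignment (hypothetical data); the reference has a gap in column 3.
ref_aligned = "MV-LSA"
seqs = {"Seq1": "MMKLSA", "Seq2": "MV-LSG"}

table = residues_at([2, 5], ref_aligned, seqs)
for name, residues in table.items():
    print(name, *residues, sep="\t")
```

If the positions are already alignment columns, the mapping step can be skipped and the sequences indexed directly with `p - 1`.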
@Cytosine,
Finally I have found a solution. You helped me so much that I can't help sharing my solution with you -- which is just a small modification of your code. Again, thanks so much. This code does the job.