I have a large number of aligned protein sequences in FASTA format, plus a reference sequence, all of the same length. I would like to extract only the amino acid mutations from these sequences, so that in the end I have a list that looks something like this: I456L, W675T, etc. Is there software, or any other way, to do this? Thanks.
Pierre has a complete solution, but in case that does not work you could use blastp with -outfmt 3 (the flat query-anchored view without identities), which identifies the differences and prints only those. The Biopython BLAST parser may be able to help finish the rest.
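
Since the sequences are already aligned to the same length, a direct column-by-column comparison may be enough without running BLAST at all. Below is a minimal sketch in plain Python, not a tested pipeline. The function names (read_fasta, list_mutations, report) and the file and record names in the usage note are made up for illustration. It assumes positions are numbered along the ungapped reference, and that gaps ("-") and unknown residues ("X") should be skipped rather than reported.

```python
def read_fasta(path):
    """Parse a FASTA file into a dict mapping record id -> sequence."""
    records = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the record id.
                name = line[1:].split()[0]
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def list_mutations(reference, sequence):
    """Return substitutions such as 'I456L' (1-based reference numbering)."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    ref_pos = 0
    for ref_aa, aa in zip(reference.upper(), sequence.upper()):
        if ref_aa == "-":
            # Gap in the reference: not a reference position, so skip it.
            continue
        ref_pos += 1
        # Assumption: gaps and unknown residues are not counted as mutations.
        if aa in ("-", "X") or ref_aa == "X":
            continue
        if aa != ref_aa:
            mutations.append(f"{ref_aa}{ref_pos}{aa}")
    return mutations


def report(alignment_path, reference_id):
    """Print one line of mutations per non-reference record in the file."""
    records = read_fasta(alignment_path)
    reference = records[reference_id]
    for name, sequence in records.items():
        if name != reference_id:
            print(name, ", ".join(list_mutations(reference, sequence)))
```

For example, report("aligned.fasta", "reference") would print one line per sequence, where both the file name and the reference id are placeholders for your own. If insertions and deletions also matter to you, the two skipped branches are the place to record them.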