Hello!
I would like to merge all amino acid sequences with identical names in a given FASTA file (a trimmed alignment). I know this can be done for manually selected sequences in AliView, but I am looking for a tool or script (bash, Perl, Python) that can do it for hundreds of files without manual selection. Where the sequences being merged have different amino acids at the same position, that position should be replaced by "X". If the script or tool also allows setting a sequence similarity threshold and a minimum overlap length, even better :) Any help would be much appreciated. Thank you!
My file:
>Blue
MADKLTRIAIVNHDKCKPKKCRQECKKSCPVVRMGKLCIEVTPQSKIAWISETLCIGCGI
>Red
CIKKCPFGALSIVNLPSNLEKETTHRYCAI------------------------------
>Red
-----------------------THRYCANAFKLHRLPIPRPGEVL--------------
>Red
--------------------------------------------------TNGIGKSTAL
>Yellow
KILAGKQKPNLGKYDDPPDWQEILTYFRGSELQNYFTKILEDDLKAIIKPQYVDQIPKAA
>Green
KGTVGSILDRKDETKTQAI-----------------------------------------
>Green
-------------TKKQAIVCQQLDLTHLKERNVEDLSGGELQRFACAVVCIQDQICKKI
Desired output:
>Blue
MADKLTRIAIVNHDKCKPKKCRQECKKSCPVVRMGKLCIEVTPQSKIAWISETLCIGCGI
>Red
CIKKCPFGALSIVNLPSNLEKETTHRYCAXAFKLHRLPIPRPGEVL----TNGIGKSTAL
>Yellow
KILAGKQKPNLGKYDDPPDWQEILTYFRGSELQNYFTKILEDDLKAIIKPQYVDQIPKAA
>Green
KGTVGSILDRKDETKXQAIVCQQLDLTHLKERNVEDLSGGELQRFACAVVCIQDQICKKI
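One way the merge described above could be done is column by column: for each name, look at every alignment position, ignore gaps, keep the residue if all fragments agree, and write "X" if they disagree. The Python sketch below assumes that all sequences sharing a name have the same aligned length and that "-" is the only gap character. The function names (`read_fasta`, `merge_columns`, `merge_fasta`) and the `.merged.fasta` output suffix are made up for illustration. It does not implement the similarity or overlap thresholds.

```python
import sys


def read_fasta(lines):
    """Parse FASTA text (an iterable of lines) into a list of (name, sequence)."""
    records, name, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def merge_columns(seqs, gap="-", mismatch="X"):
    """Merge aligned sequences position by position.

    Gaps are ignored; agreeing residues are kept; conflicting residues
    become `mismatch`. Assumes every sequence has the same length.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences with the same name differ in length")
    merged = []
    for column in zip(*seqs):
        residues = {c for c in column if c != gap}
        if not residues:
            merged.append(gap)             # only gaps at this position
        elif len(residues) == 1:
            merged.append(residues.pop())  # all fragments agree
        else:
            merged.append(mismatch)        # fragments disagree
    return "".join(merged)


def merge_fasta(lines):
    """Group records by name (first-seen order) and merge each group."""
    groups = {}
    for name, seq in read_fasta(lines):
        groups.setdefault(name, []).append(seq)
    return [(name, merge_columns(seqs)) for name, seqs in groups.items()]


if __name__ == "__main__":
    # Hypothetical batch usage: python merge_names.py file1.fasta file2.fasta ...
    for path in sys.argv[1:]:
        with open(path) as handle:
            merged = merge_fasta(handle)
        with open(path + ".merged.fasta", "w") as out:
            for name, seq in merged:
                out.write(f">{name}\n{seq}\n")
```

Saved as, say, `merge_names.py`, it could be run over many files with a shell glob (`python merge_names.py *.fasta`). A threshold check (for example, skipping a merge when too many overlapping positions disagree) would have to be added inside `merge_columns`.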
jan: Please test the Perl solution offered by @JC. If all looks well, accept it too (you can accept multiple answers).