I am currently using Clustal to align orthologous proteins from about 50 species, but would like to use MUSCLE instead.
Since I am examining thousands of proteins, I use the Linux binary of MUSCLE.
The Problem
MUSCLE appears to not accept "empty" input sequences. That is, Protein X is not present in, say, the bear, and this is shown as lines with dashes/gaps:
Input
>ProteinX_human
ABC
>ProteinX_cat
A-C
>ProteinX_bear
---
Output
The MUSCLE output alignment file does not include >ProteinX_bear.
Question
How do I ensure that MUSCLE outputs the alignment with >ProteinX_bear shown as dashes/gaps (-) across the full alignment length? I cannot find any information about this in the MUSCLE manual, although I am new to bioinformatics and may simply have overlooked it. It is very important for my downstream analysis that species lacking AAs are included in the alignment output.
Thank you for your help, and I hope my question is clear.
Why don't you post-process the output of MUSCLE instead and add back the missing sequence? It sounds safer and you can easily use it with any other aligner, should you want to replace MUSCLE in the future.
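A minimal sketch of that post-processing step in plain Python, in case it helps. The function names (`read_fasta`, `add_back_missing`, `to_fasta`) are made up for this example and are not part of MUSCLE. It assumes the headers in the MUSCLE output are identical to the headers in your input file; if your MUSCLE version shortens or changes them, the lookup would need adjusting. Any input sequence that is missing from the alignment gets a row of dashes as long as the alignment.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples, in file order."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def add_back_missing(input_records, aligned_records):
    """Return the alignment in input order, with all-gap rows for dropped sequences.

    Assumption: headers in the aligner output match the input headers exactly.
    """
    aligned = dict(aligned_records)
    # Every row of an alignment has the same length, so the first row gives the width.
    width = len(aligned_records[0][1]) if aligned_records else 0
    return [(name, aligned.get(name, "-" * width)) for name, _ in input_records]


def to_fasta(records):
    """Format (header, sequence) tuples as FASTA text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

To use it, read your original input file and the MUSCLE output file as text, pass both through `read_fasta`, call `add_back_missing(input_records, aligned_records)`, and write the result of `to_fasta` to a new file. Since the output follows the order of the input file, it also works if the aligner reorders sequences, and nothing here is specific to MUSCLE.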