Generally speaking this is a straightforward task, but you haven't given us many details. My suggestions will therefore be general, but they should be a good enough starting point for you to adapt to your specific needs.
Here is a simple C-shell script that will do this (a bash script would be fairly similar):
foreach i ( *.fasta )
    mafft --maxiterate 1000 --localpair --thread 8 --nomemsave $i > $i:r.afa
    trimal -in $i:r.afa -out $i:r.trimmed.afa -gt 0.5
end
This assumes that all your starting files are in the same directory and have a .fasta extension. Alignments are done with mafft in its comprehensive (slowest) mode, but you may want to choose a different program (clustalw, clustalo, muscle, etc.). After that, each alignment (ending in .afa) is trimmed with trimal such that all columns in which more than half of the positions are gaps are removed (resulting in .trimmed.afa files). This may or may not be what you want, so you should look up the other available trimming options.
The whole script probably need not be longer than 3-4 lines like above, though you will probably want to adjust the exact commands. Lastly, I suggest you consider how to speed up the whole thing by utilizing most or all of your CPUs, and at that point it becomes a waiting game.
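One possible way to keep more CPUs busy is to run several files at once, each with fewer mafft threads. The sketch below does this in bash with background jobs processed in batches. The function name, the batch size of 4 and the 2 threads per job are assumptions for illustration only; you would want jobs times threads to roughly match your core count, and whether this beats a single job with many threads depends on your data.

```bash
#!/usr/bin/env bash
# Sketch: process several FASTA files concurrently in fixed-size batches.
# align_and_trim_parallel and its defaults are hypothetical, chosen for illustration.
align_and_trim_parallel() {
    local dir="${1:-.}"          # directory holding the .fasta files
    local max_jobs="${2:-4}"     # how many files to process at the same time
    local threads="${3:-2}"      # mafft threads per file
    local f running=0
    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue  # skip the literal pattern if nothing matches
        (
            base="${f%.fasta}"
            mafft --maxiterate 1000 --localpair --thread "$threads" --nomemsave "$f" > "$base.afa" &&
            trimal -in "$base.afa" -out "$base.trimmed.afa" -gt 0.5
        ) &
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait                 # let the current batch finish before starting more
            running=0
        fi
    done
    wait                         # wait for the last, possibly partial, batch
}
```

For example, align_and_trim_parallel . 4 2 would process four files at a time with two threads each. Batching is simple but not optimal, since one slow alignment holds up its whole batch; a job scheduler or GNU parallel would balance the load better if you have one available.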
Two trimming programs are discussed in this answer: A: How to clean multiple protein sequences alignement in order to make a phylogenic