Question

Align and trim thousands of genes

3.9 years ago

sankadinesh ▴ 20

Dear All, I have 20,000 ASVs obtained from multiple studies, with gene sizes ranging from 200 to 320. I would like to do a multiple alignment and trim the unaligned portions, as MEGA does. Please suggest suitable software and a protocol for doing this. Thanks

Regards, Dinesh

alignment gene sequencing next-gen • 945 views

updated 3.9 years ago by Mensur Dlakic ★ 28k • written 3.9 years ago by sankadinesh ▴ 20

Two trimming program options are included in this answer: A: How to clean multiple protein sequences alignement in order to make a phylogenic

3.9 years ago by GenoMax 147k

score 0 · Answer 1 · 2021-01-01

Generally speaking, this is a straightforward task, but we lack details from you. My suggestions will therefore be general, but they should be a good enough starting point for you to adapt to your specific needs.

Here is a simple C-shell script that will do this (bash script would be fairly similar):

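foreach i ( *.fasta )
    # Align each file; $i:r is the file name without its extension
    mafft --maxiterate 1000 --localpair --thread 8 --nomemsave $i > $i:r.afa
    # Remove columns in which more than half of the sequences have a gap
    trimal -in $i:r.afa -out $i:r.trimmed.afa -gt 0.5
end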

This assumes that all your starting files are in the same directory and have a .fasta extension. Alignments are done with mafft in its comprehensive (and slowest) mode, but you may want to choose a different program (clustalw, clustalo, muscle, etc.). Each alignment (ending in .afa) is then trimmed with trimal so that all columns in which more than half of the positions are gaps are removed (resulting in .trimmed.afa files). This may or may not be what you want, so you should look up the other available trimming options.
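For those who prefer bash, here is a rough equivalent of the loop above. It is only a sketch: the function name align_and_trim and the optional directory argument are my own additions for illustration, and the mafft and trimal options are simply copied from the C-shell version, so adjust them in the same way.

```bash
#!/usr/bin/env bash
# Sketch of the same align-then-trim loop in bash.
# align_and_trim and its directory argument are hypothetical conveniences,
# not part of mafft or trimal.

align_and_trim() {
    local dir=${1:-.} f base
    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue      # nothing matched the pattern, so skip
        base=${f%.fasta}             # strip the extension, like $i:r in csh
        # Align; move on to the next file if mafft fails
        mafft --maxiterate 1000 --localpair --thread 8 --nomemsave "$f" > "$base.afa" || continue
        # Drop columns in which more than half of the sequences have a gap
        trimal -in "$base.afa" -out "$base.trimmed.afa" -gt 0.5
    done
}

# Process the directory given as the first argument, or the current one
align_and_trim "${1:-.}"
```

Saved as, say, align_trim.sh, it could be run as `bash align_trim.sh /path/to/fasta_dir`, leaving the .afa and .trimmed.afa files next to the inputs.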

The whole script probably need not be longer than the 3-4 lines above, though you will probably want to adjust the exact commands. Lastly, I suggest you consider how to speed up the whole thing by using most or all of your CPUs; after that it becomes a waiting game.
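One possible way to use more of your CPUs when there are many input files is to run several alignments at once with xargs -P. This is a sketch under assumptions: JOBS and THREADS are hypothetical variable names with placeholder defaults, align_and_trim_parallel is a made-up function name, and the -r option (do nothing on empty input) comes from GNU xargs.

```bash
#!/usr/bin/env bash
# Sketch: run several align-then-trim jobs concurrently.
# JOBS, THREADS and align_and_trim_parallel are hypothetical names.

JOBS=${JOBS:-4}          # number of files processed at the same time
THREADS=${THREADS:-2}    # mafft threads used for each file
export THREADS           # make it visible to the sh started by xargs

align_and_trim_parallel() {
    # One sh process per .fasta file, at most $JOBS running at once
    find "${1:-.}" -maxdepth 1 -name '*.fasta' -print0 |
        xargs -0 -r -n 1 -P "$JOBS" sh -c '
            base=${1%.fasta}
            mafft --maxiterate 1000 --localpair --thread "$THREADS" "$1" > "$base.afa" &&
            trimal -in "$base.afa" -out "$base.trimmed.afa" -gt 0.5
        ' _
}

# Process the directory given as the first argument, or the current one
align_and_trim_parallel "${1:-.}"
```

As a rule of thumb, keep JOBS times THREADS at or below the number of cores you have, otherwise the jobs just compete with each other.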