How can I run multiple sequence alignment for a large number of proteins (~10k)?
23 months ago
O.rka ▴ 740

What's the preferred method for running multiple sequence alignment on such a large number of protein sequences? I'm trying something fairly experimental, and an MSA would be really helpful for the approach.

I usually use muscle and noticed there is a super5 module that helps with this: https://drive5.com/muscle5/manual/cmd_super5.html

How can I adjust the parameters to avoid running out of memory? Alternatively, is there another tool that's better suited for this? Basically, I want the MSA as a FASTA file for the output.

muscle msa protein multiple alignment sequence • 1.5k views

Hi, take a look at MAFFT.


I should have mentioned that some of the sequences are long. There are a few that are ~70k. I've trimmed them out and it's working now, but I'll keep MAFFT in the back of my mind in case this fails.
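
For anyone who wants to do the same trimming step, here is a minimal sketch in plain Python that splits a FASTA file into sequences at or below a length cutoff and those above it. The function names (`read_fasta`, `split_by_length`, `write_fasta`) and the cutoff are illustrative assumptions, not the script that was actually used in this thread.

```python
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an open FASTA file."""
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records: Iterable[Record], max_len: int) -> Tuple[List[Record], List[Record]]:
    """Return (kept, dropped): kept has len(seq) <= max_len, dropped is the rest."""
    kept: List[Record] = []
    dropped: List[Record] = []
    for header, seq in records:
        (kept if len(seq) <= max_len else dropped).append((header, seq))
    return kept, dropped


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

Typical use would be to open the input with `open(path)`, pass `read_fasta(handle)` to `split_by_length` with whatever cutoff suits your data, and write the `kept` list to a new file for the aligner. Keeping the `dropped` list around makes it easy to check which sequences were excluded.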

23 months ago
Mensur Dlakic ★ 28k

As advertised in their paper, FAMSA is meant specifically for aligning huge protein families. Clustal Omega should work as well.


Wow, FAMSA is really fast AND memory efficient. Nice find, thank you! Do you usually use single linkage, UPGMA, or NJ for the guide tree?


I have always used single linkage but with proteins that were not as long as yours. If speed and memory are not problematic for your computer setup, that should be the best choice.
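
To compare the guide-tree options on the same input, a small wrapper like the sketch below can help. It assumes a `famsa` binary on your `PATH` and uses the `-gt` (guide tree: `sl`, `upgma`, `nj`) and `-t` (threads) options as I understand them from the FAMSA README; check `famsa` help output for your installed version before relying on it. The helper names and file names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional

# Guide-tree methods discussed in this thread: single linkage, UPGMA, neighbour joining.
GUIDE_TREES = ("sl", "upgma", "nj")


def famsa_command(in_fasta: str, out_fasta: str,
                  guide_tree: str = "sl",
                  threads: Optional[int] = None) -> List[str]:
    """Build the argument list for a FAMSA run (assumed CLI: famsa [options] in out)."""
    if guide_tree not in GUIDE_TREES:
        raise ValueError(f"guide_tree must be one of {GUIDE_TREES}, got {guide_tree!r}")
    cmd = ["famsa", "-gt", guide_tree]
    if threads is not None:
        # Only pass -t when explicitly requested; otherwise FAMSA picks its default.
        cmd += ["-t", str(threads)]
    cmd += [in_fasta, out_fasta]
    return cmd


def run_famsa(in_fasta: str, out_fasta: str,
              guide_tree: str = "sl",
              threads: Optional[int] = None) -> None:
    """Run FAMSA and raise if the binary is missing or exits with an error."""
    if shutil.which("famsa") is None:
        raise FileNotFoundError("famsa executable not found on PATH")
    subprocess.run(famsa_command(in_fasta, out_fasta, guide_tree, threads), check=True)
```

For example, calling `run_famsa("proteins.fasta", "proteins.sl.aln.fasta", guide_tree="sl")` and then again with `"upgma"` gives two alignments of the same input to compare for run time, memory, and quality.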

