How can I use MAFFT for large-scale sequence alignment?
5.2 years ago
sunyeping ▴ 110

I wish to use MAFFT to align a large protein dataset containing over 100,000 sequences with an average length of 1,000 residues. I guess I need to use a supercomputer.

Does anyone know how many CPU cores and how much memory are needed to run the alignment smoothly?

Can MAFFT itself estimate the time it needs to finish an alignment? And what would the estimated time be for the alignment above, given enough computational resources?

alignment • 2.5k views

Have you tried any configuration? Say, 32 GB RAM + 12 cores? How quickly you start seeing results should tell you whether you need more resources. You could also start with a wall time of 48-72 hours and tweak it from there.
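As a rough sketch of what such a trial run could look like, the Python snippet below launches MAFFT with a fixed thread count and reports the elapsed time. It assumes `mafft` is installed and on the `PATH`; the helper names `build_mafft_command` and `run_mafft` and the file paths are made up for illustration. `--auto` lets MAFFT pick a strategy from the input size, and `--thread` sets the number of threads.

```python
import shutil
import subprocess
import time


def build_mafft_command(fasta_path, threads=12):
    """Return the argument list for a multithreaded MAFFT run (hypothetical helper)."""
    # --auto: let MAFFT choose an alignment strategy based on the input size.
    # --thread: number of threads MAFFT may use.
    return ["mafft", "--auto", "--thread", str(threads), fasta_path]


def run_mafft(fasta_path, out_path, threads=12):
    """Run MAFFT, write the alignment to out_path, and return elapsed seconds."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft was not found on PATH")
    start = time.perf_counter()
    # MAFFT prints the alignment to stdout and its progress messages to stderr.
    with open(out_path, "w") as out:
        subprocess.run(build_mafft_command(fasta_path, threads), stdout=out, check=True)
    return time.perf_counter() - start
```

Timing a run on a small random subset first, for example `run_mafft("subset.fasta", "subset.aln", threads=12)`, gives a feel for the speed before committing to a long wall time, although run time will not grow linearly with the number of sequences.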


Out of curiosity: how did you get to a dataset of 100,000 proteins?

Does it need to be done with MAFFT, or is any other aligner also fine?


Any aligner is fine. And what if the similarity is very high? In that case it should not be too hard. Can I do it on my notebook, which has 16 GB of memory, although my virtual Linux system has only 8 GB?

5.2 years ago
Cupton ▴ 80

Throw out identical sequences first....

We made this tool years ago: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694821/ Not sure where the code is these days.
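For the first step, throwing out identical sequences, a minimal Python sketch could look like the following. This is not the tool from the linked paper, only an illustration of exact-match deduplication; the function names `read_fasta`, `deduplicate`, and `write_unique` are hypothetical, and sequences are compared case-insensitively as an assumption.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def deduplicate(records):
    """Group headers by identical sequence (case-insensitive, an assumption)."""
    groups = {}
    for header, seq in records:
        groups.setdefault(seq.upper(), []).append(header)
    return groups


def write_unique(groups, path):
    """Write one record per distinct sequence, labelled with its first header."""
    with open(path, "w") as out:
        for seq, headers in groups.items():
            out.write(f">{headers[0]}\n{seq}\n")
```

Keeping the full header list per sequence makes it possible to copy each aligned representative back to its duplicates after the alignment. With about 100,000 sequences of roughly 1,000 residues, the dictionary holds on the order of 100 MB of sequence text, so it should fit in memory on a laptop.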
