How to speed up construction of a phylogenetic tree
11 months ago
Nelo ▴ 20

I have ~600 protein sequences from different species (same protein family) for which I want to construct a maximum-likelihood tree with 1000 bootstrap replicates. In the MEGA GUI (version 11.0.11), it took about 1 month to construct. MEGA also offers a command-line version, MEGA CC, but it is also very slow.

Is there a way I could force the MEGA command line to run faster?

MEGA tree phylogenetic • 2.3k views

Is there a specific reason you need to use MEGA? Maybe try other software that might be faster (e.g. PhyML, RAxML, IQ-TREE). Also, do you really need 1000 bootstrap replicates? Wouldn't 100 be good enough?

I used RAxML on a trial FASTA file of only about 33 sequences:

$ raxmlHPC-PTHREADS-AVX -f a -T 80 -m PROTGAMMAWAG -n test1 -s bittergourd_33_tree_input_aligned.fas -p 54321 -N 10 -x 12345

I am still working out whether the command I ran was correct, as I first wanted to check that RAxML works on my system at all.

The above command took 22 hours to complete, even though I set the thread count (-T) to 80.

More threads are not always better and can even slow things down. Try using 4-8 cores/threads instead.

That seems extremely slow to me. The only reason I can think of is that your computer doesn't actually have 80 cores/threads. It is generally not a good idea to use more threads than are available on the machine.
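
A quick way to check this before launching a run is to compare the requested thread count with what the machine actually exposes. The sketch below uses only the Python standard library; the helper name `safe_thread_count` and its `reserve` parameter are made up for illustration, and the clamped value would simply be passed to RAxML's `-T` option.

```python
import os


def safe_thread_count(requested, reserve=0):
    """Clamp a requested thread count to the cores this process may use.

    `reserve` is a hypothetical knob for leaving some cores free for other work.
    """
    try:
        # Linux: honours the CPU affinity mask of the current process.
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the logical core count.
        available = os.cpu_count() or 1
    usable = max(1, available - reserve)
    return max(1, min(requested, usable))


print("Threads to pass to -T:", safe_thread_count(80))
```

On a machine with 16 usable logical cores, `safe_thread_count(80)` returns 16 rather than 80.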

Do you have access to an HPC or cloud computing service? If so, you could parallelise almost every step of RAxML. You can run 20+ ML tree searches at the same time rather than in sequence, then manually pick the best tree from the likelihood scores; in my experience the ML searches are usually the most time-consuming part. Once you have the ML tree, the bootstrapping can easily be split up too.

Also, what machine are you using that has 80 threads?
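
To illustrate the splitting idea, here is a minimal sketch that only builds the RAxML command lines for independent bootstrap batches, so each batch can be submitted as a separate job. It assumes standard (non-rapid) bootstrapping via `-b`, plus the `PROTGAMMAWAG` model and binary name from the command quoted earlier in the thread. The helper name and the batch naming scheme (`boot_001`, ...) are hypothetical. Each batch gets a different `-b` seed, otherwise every job would draw the same replicates.

```python
def raxml_bootstrap_batches(alignment, total=1000, batch_size=50, threads=8,
                            model="PROTGAMMAWAG", seed=12345,
                            binary="raxmlHPC-PTHREADS-AVX"):
    """Return one RAxML argument list per bootstrap batch (nothing is executed)."""
    if total < 1 or batch_size < 1:
        raise ValueError("total and batch_size must be positive")
    commands = []
    done = 0
    while done < total:
        batch = len(commands) + 1
        n = min(batch_size, total - done)   # the last batch may be smaller
        commands.append([
            binary,
            "-T", str(threads),
            "-m", model,
            "-s", alignment,
            "-p", str(seed),                # parsimony seed
            "-b", str(seed + batch),        # distinct bootstrap seed per batch
            "-#", str(n),                   # replicates in this batch
            "-n", f"boot_{batch:03d}",      # hypothetical run name
        ])
        done += n
    return commands


for cmd in raxml_bootstrap_batches("aligned.fas", total=120, batch_size=50):
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run` or written into a job script. Afterwards the per-batch `RAxML_bootstrap.*` files can be concatenated and mapped onto the best ML tree with RAxML's `-f b` mode (`-t` for the best tree, `-z` for the combined bootstrap trees).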

11 months ago
Mensur Dlakic ★ 28k

It depends on how long your protein is. Generally speaking, it shouldn't take weeks or months. Trees with more than ~100 branches are very difficult to print and look at in any meaningful way. I would ask myself if 600 proteins are really needed. For example, proteins that are identical or near-identical (say, > 95% identity) need not be included, as one knows they will be right next to each other in the tree.
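
As a rough illustration of the redundancy filter described above, the sketch below greedily keeps a sequence only if it is at most 95% identical to every sequence already kept. It assumes the sequences are already aligned (equal length, `-` for gaps) and held in a dict of name to sequence. The function names and the way identity is computed (ignoring columns where either sequence has a gap) are my assumptions, and a dedicated clustering tool would be the usual choice for large datasets.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += (x == y)
    return matches / compared if compared else 0.0


def drop_near_identical(aligned, threshold=0.95):
    """Greedy filter: keep a sequence unless it is > threshold identical to a kept one."""
    kept = {}
    for name, seq in aligned.items():
        if all(pairwise_identity(seq, rep) <= threshold for rep in kept.values()):
            kept[name] = seq
    return kept


example = {
    "sp1": "MKTAYIAKQR",
    "sp2": "MKTAYIAKQR",   # identical to sp1, so it is dropped
    "sp3": "MKTAYIAKQW",   # 90% identical to sp1, so it is kept
}
print(list(drop_near_identical(example)))
```

The result depends on input order, because the first member of each near-identical group becomes its representative.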

I suggest you try command-line programs:

The average length of the FASTA sequences is ~300 aa. I can't exclude any sequences, as I also have to assign them nomenclature based on their position in the clade. As for the command-line option you mentioned, as I said before, I have used RAxML, which is also rather slow even after specifying the maximum number of threads available on my system.

11 months ago
Michael 55k

I agree with the other recommendations and I wouldn't recommend MEGA either, but would like to add that most programs, including MEGA, have multithreading options which you might want to set to the maximum number of threads available on your machine. For MEGA, the multithreading parameter is set in the options dialogue.

If at all possible, try a different program. IQ-TREE has several advantages over MEGA: proper checkpoints, broad automatic model testing (although you might want to restrict the search space for your case), an ultrafast bootstrap algorithm, and automatic determination of the optimal number of threads.
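
For reference, a run using those features might look like the command assembled below. This is a sketch assuming IQ-TREE 2 option names (`-m MFP` for model testing, `-mset` to restrict the candidate models, `-B` for ultrafast bootstrap, `-T AUTO` for thread detection). The binary name `iqtree2`, the model shortlist and the file name are placeholders, and version 1.x spells some options differently (`-bb`, `-nt`).

```python
def iqtree_command(alignment, models=("LG", "WAG", "JTT"), ufboot=1000,
                   binary="iqtree2"):
    """Build (but do not run) an IQ-TREE 2 argument list."""
    cmd = [binary, "-s", alignment, "-m", "MFP"]
    if models:
        # Restrict model testing to a shortlist to save time.
        cmd += ["-mset", ",".join(models)]
    # Ultrafast bootstrap replicates and automatic thread selection.
    cmd += ["-B", str(ufboot), "-T", "AUTO"]
    return cmd


print(" ".join(iqtree_command("family_aligned.fas")))
# prints: iqtree2 -s family_aligned.fas -m MFP -mset LG,WAG,JTT -B 1000 -T AUTO
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the file name and model list match your data.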

Another aspect worth mentioning is reducing your alignment by trimming it with programs such as trimAl or ClipKIT. Trimming may be worth trying; however, the effect on total run time may be minuscule.
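
To show what trimming does in principle, here is a deliberately simple gap-threshold filter: it drops every alignment column whose gap fraction exceeds a cutoff. This is not the algorithm used by trimAl or ClipKIT, just a sketch of the general idea. The function name, the 0.9 default cutoff and the dict-of-sequences input are assumptions.

```python
def trim_gappy_columns(aligned, max_gap_fraction=0.9):
    """Remove alignment columns whose fraction of '-' exceeds max_gap_fraction."""
    names = list(aligned)
    seqs = [aligned[name] for name in names]
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    # Indices of the columns that are kept.
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}


example = {"sp1": "MK-TA", "sp2": "MK-TA", "sp3": "M-QTA"}
print(trim_gappy_columns(example, max_gap_fraction=0.5))
```

With a cutoff of 0.5, the third column (two gaps out of three sequences) is removed and the alignment shrinks from five to four columns.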

I used MEGA CC (the command-line option), but there is no option to specify threads in its --help manual, and the default thread count is 4.

11 months ago
Joe 21k

If you really, really need speed, you could also look at FastTree. AFAIK it has an ML or ML-like function.

IQ-TREE should be more than fast enough though, I would think.

If this isn't the case, I would perhaps reassess your foundational assumptions:

  1. Why do you need 1000 bootstraps (a 'good' or at least unambiguous tree will converge long before that)?
  2. Is the alignment you're giving the tree constructor any good? Bad, highly gappy alignments generally take longer for these algorithms to determine a sensible tree.

I will also try FastTree.

8 months ago
bioramg • 0

Hi, please use the following command to speed things up:

$ raxmlHPC-PTHREADS -T 80 -m GTRGAMMA -p 12345 -x 123 -#1000 -f a -s input_filename -n output_filename
