Question

How to speed up construction of a phylogenetic tree

17 months ago

Nelo ▴ 20

I have ~600 protein sequences from different species (same protein family), for which I want to construct a maximum-likelihood tree with 1000 bootstrap replicates. In the MEGA GUI (version 11.0.11), it took about 1 month to construct. MEGA also offers a command-line version, MEGA-CC, but it is also very slow.

Is there a way to make the MEGA command line run faster?

MEGA tree phylogenetic • 3.9k views

Updated 14 months ago by Joe 22k • written 17 months ago by Nelo ▴ 20

Is there a specific reason you need to use MEGA? Maybe try another program that might be faster (e.g. PhyML, RAxML, IQ-TREE). Also, do you really need 1000 bootstrap replicates? Wouldn't 100 be good enough?

Reply • 17 months ago by liorglic ★ 1.5k

I tried RAxML on a trial FASTA file of only about 33 sequences:

$ raxmlHPC-PTHREADS-AVX -f a -T 80 -m PROTGAMMAWAG -n test1 -s bittergourd_33_tree_input_aligned.fas -p 54321 -N 10 -x 12345

I am still figuring out whether the command I ran was correct, as I first wanted to see whether RAxML works on my system at all.

The above command took 22 hours to complete, even though I set the thread count (-T) to 80.

Reply • 17 months ago by Nelo ▴ 20

More threads are not always better and can actually slow things down: on a short alignment, the threads can spend more time synchronizing than computing. Try using 4-8 cores/threads instead.

Reply • 14 months ago by Joe 22k

That seems extremely slow to me. The only reason I can think of is that your computer doesn't actually have 80 cores/threads. It is normally not a good idea to use more threads than are available on the machine.
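
A minimal Python sketch of the two pieces of advice above (never ask for more threads than the machine has, and stay in a modest range). The helper name `pick_thread_count` and the default cap of 8 are illustrative assumptions, not part of RAxML or MEGA; `os.cpu_count()` reports logical CPUs, which may be more than the physical cores.

```python
import os


def pick_thread_count(requested: int, upper_limit: int = 8) -> int:
    """Cap a requested thread count by the CPUs Python can see and by a soft limit.

    upper_limit=8 mirrors the "4-8 threads" suggestion in this thread; it is an
    assumption, not a value recommended by any particular program.
    """
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available, upper_limit))


if __name__ == "__main__":
    threads = pick_thread_count(80)
    print(f"logical CPUs: {os.cpu_count()}, suggested -T value: {threads}")
```

The printed value could then be passed to `-T` instead of a hard-coded 80.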

Reply • 14 months ago by Michael 55k

Do you have access to an HPC cluster or a cloud computing service? If so, you could parallelise almost every step of RAxML. You can run 20+ ML tree searches at the same time rather than in sequence and then manually pick the best tree from the likelihood scores; the ML search is usually the most time-consuming part in my experience. Once you have the ML tree, the bootstrapping can easily be split up too.

Also, what machine are you using that has 80 threads?
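
A rough Python sketch of how the split could look: it only builds the command lines (it does not run them), so each one can be submitted as a separate cluster job. The binary name, file names, seeds, and batch size are assumptions for illustration; the flags used (`-T`, `-m`, `-p`, `-b`, `-N`, `-s`, `-n`) are standard RAxML options.

```python
import shlex

RAXML = "raxmlHPC-PTHREADS"  # assumed binary name; adjust to the local build


def ml_search_commands(alignment, n_searches=20, threads=4,
                       model="PROTGAMMAWAG", base_seed=1000):
    """One independent ML search per command, each with its own -p seed."""
    return [
        [RAXML, "-T", str(threads), "-m", model, "-p", str(base_seed + i),
         "-s", alignment, "-n", f"ml_{i:02d}"]
        for i in range(n_searches)
    ]


def bootstrap_commands(alignment, total_reps=1000, batch_size=50, threads=4,
                       model="PROTGAMMAWAG", base_seed=5000):
    """Split the bootstrap replicates into batches that can run as separate jobs."""
    commands = []
    done = 0
    batch = 0
    while done < total_reps:
        reps = min(batch_size, total_reps - done)
        seed = str(base_seed + batch)  # a different seed for every batch
        commands.append(
            [RAXML, "-T", str(threads), "-m", model, "-p", seed, "-b", seed,
             "-N", str(reps), "-s", alignment, "-n", f"bs_{batch:02d}"])
        done += reps
        batch += 1
    return commands


if __name__ == "__main__":
    for cmd in ml_search_commands("aligned.fas", n_searches=3):
        print(shlex.join(cmd))
    for cmd in bootstrap_commands("aligned.fas", total_reps=100, batch_size=50):
        print(shlex.join(cmd))
```

After the jobs finish, the best tree is the one with the highest final likelihood, and the bootstrap tree files can be concatenated and mapped onto it (as far as I know, with RAxML's `-f b -t best_tree -z bootstrap_trees`).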

Reply • 17 months ago by dthorbur ★ 3.0k

score 3 · Answer 1 · 2023-12-11

17 months ago

Mensur Dlakic ★ 29k

It depends on how long your protein is. Generally speaking, it shouldn't take weeks or months. Trees with more than ~100 branches are very difficult to print and look at in any meaningful way. I would ask myself if 600 proteins are really needed. For example, proteins that are identical or near-identical (say, > 95% identity) need not be included, as one knows they will be right next to each other in the tree.
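
To illustrate the near-identical filter, here is a small Python sketch that works on already-aligned sequences held in a dict. The function names and the greedy "keep the first representative" strategy are my own assumptions for illustration; a dedicated clustering tool such as CD-HIT would be the usual choice for real data.

```python
from typing import Dict


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def drop_near_identical(seqs: Dict[str, str], threshold: float = 0.95) -> Dict[str, str]:
    """Greedily keep a sequence only if it is at most `threshold` identical
    to every sequence kept so far (input order decides the representative)."""
    kept: Dict[str, str] = {}
    for name, seq in seqs.items():
        if all(identity(seq, rep) <= threshold for rep in kept.values()):
            kept[name] = seq
    return kept


if __name__ == "__main__":
    toy = {
        "a": "MKTAYIAKQR",
        "b": "MKTAYIAKQR",  # identical to "a", so it is dropped
        "c": "MKTWWWWKQR",  # 60% identical to "a", so it is kept
    }
    print(list(drop_near_identical(toy)))
```

The dropped names can be recorded alongside their representative so they can still be placed in the same clade afterwards.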

I suggest you try command-line programs:

The average sequence length is ~300 aa. I can't exclude any sequences, as I also have to name them on the basis of their position in the clade. As for the command-line option you mentioned, as I said before, I have used RAxML, which is also rather slow even after specifying the maximum number of threads available on my system.

Reply • 17 months ago by Nelo ▴ 20

score 2 · Answer 2 · 2023-12-11

I agree with the other recommendations and I wouldn't recommend MEGA either, but would like to add that most programs, including MEGA, have multithreading options which you might want to set to the maximum number of threads available on your machine. For MEGA, the multithreading parameter is set in the options dialogue.

If at all possible, try a different program. IQ-TREE has several advantages over MEGA: proper checkpoints, broad automatic model testing (although you might want to restrict the search space for your case), an ultrafast bootstrap algorithm, and automatic determination of the optimal number of threads.
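
As a hedged illustration, the Python sketch below assembles an IQ-TREE 2 command that uses those features. The binary name `iqtree2`, the file names, and the restricted model list are assumptions; the flags are the IQ-TREE 2 spellings (`-B` for ultrafast bootstrap, `-T AUTO` for thread detection, `-mset` to restrict model testing), and IQ-TREE 1.x uses different names for some of them.

```python
import shlex


def iqtree_command(alignment, models=("LG", "WAG", "JTT"), ufboot=1000,
                   prefix="family_tree", binary="iqtree2"):
    """Build (not run) an IQ-TREE 2 command line.

    models: candidate substitution models for the automatic model test
            (an illustrative short list, not a recommendation).
    ufboot: number of ultrafast bootstrap replicates.
    """
    return [
        binary,
        "-s", alignment,            # input alignment
        "-m", "MFP",                # model test followed by tree inference
        "-mset", ",".join(models),  # restrict the model search space
        "-B", str(ufboot),          # ultrafast bootstrap replicates
        "-T", "AUTO",               # let IQ-TREE choose the thread count
        "--prefix", prefix,         # prefix for all output files
    ]


if __name__ == "__main__":
    print(shlex.join(iqtree_command("aligned.fas")))
```

If a run is interrupted, re-running the same command should resume from the checkpoint file rather than start over.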

Another aspect worth mentioning is reducing your alignment by trimming it with programs such as trimAl or ClipKIT. Trimming may be worth trying; however, its effect on total run time may be minuscule.
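
To show what trimming does in principle, here is a deliberately simple Python sketch that drops alignment columns consisting mostly of gaps. This is a plain gap-fraction filter, not the algorithm used by trimAl or ClipKIT, and the function name and 0.5 cutoff are assumptions for illustration.

```python
from typing import Dict


def trim_gappy_columns(seqs: Dict[str, str], max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Remove columns whose fraction of '-' characters exceeds max_gap_fraction."""
    if not seqs:
        return {}
    rows = list(seqs.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        col for col in range(length)
        if sum(row[col] == "-" for row in rows) / len(rows) <= max_gap_fraction
    ]
    return {name: "".join(seq[col] for col in keep) for name, seq in seqs.items()}


if __name__ == "__main__":
    toy = {"s1": "MK-TA", "s2": "MK-TA", "s3": "M-QTA"}
    # Column 3 is a gap in 2 of 3 sequences, so it is removed.
    print(trim_gappy_columns(toy))
```

Fewer columns means less likelihood computation per tree, which is why trimming can shorten the run, though as noted the gain may be small.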

score 1 · Answer 3 · 2023-12-11

17 months ago

Joe 22k

If you really, really need speed, you could also look at FastTree. AFAIK it infers approximately-maximum-likelihood trees.

IQ-TREE should be more than fast enough, though, I would think.

If this isn't the case, I would perhaps reassess your foundational assumptions:

Why do you need 1000 bootstraps (a 'good' or at least unambiguous tree will converge long before that)?
Is the alignment you're giving the tree constructor any good? Bad, highly gappy alignments generally take longer for these algorithms to determine a sensible tree.

I will also try FastTree.

Reply • 17 months ago by Nelo ▴ 20

score 0 · Answer 4 · 2024-03-12

14 months ago

bioramg • 0

Hi, please use the following command to speed things up:

$ raxmlHPC-PTHREADS -T 80 -m PROTGAMMAWAG -p 12345 -x 123 -# 1000 -f a -s input_filename -n output_filename
