I have ~600 protein sequences from different species (same protein family) for which I want to construct a maximum likelihood tree with 1000 bootstrap replicates. In the MEGA GUI (version 11.0.11) it took me about a month to build it. MEGA also offers a command-line version, MEGA-CC, but it is also very slow.
Is there a way to make the MEGA command-line version run faster?
Is there a specific reason you need to use MEGA? Maybe try other software that might be faster (e.g. PhyML, RAxML, IQ-TREE). Also, do you really need 1000 bootstrap replicates? Wouldn't 100 be good enough?
I tried RAxML on a trial FASTA file with only about 33 sequences:
raxmlHPC-PTHREADS-AVX -f a -T 80 -m PROTGAMMAWAG -n test1 -s bittergourd_33_tree_input_aligned.fas -p 54321 -N 10 -x 12345
I am still working out whether the command I ran was correct; first I just wanted to check that RAxML works on my system at all.
The command above took 22 hours to complete, even though I set the thread count to 80 with -T 80.
More threads are not always better and can even slow things down. Try using 4-8 cores/threads instead.
That seems extremely slow to me. The only explanation I can think of is that your computer doesn't actually have 80 cores/threads. It is normally not a good idea to use more threads than the machine has available.
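A minimal Python sketch of that check, assuming you simply want -T to be no larger than the number of logical CPUs the OS reports and no larger than a small cap such as 8 (pick_threads and the cap value are illustrative, not part of RAxML; the best value depends on your alignment and machine):

```python
import os


def pick_threads(requested, available=None, cap=8):
    """Return a thread count to pass to RAxML's -T option.

    The result never exceeds the number of logical CPUs and is capped at
    `cap` (hypothetical default of 8, following the 4-8 suggestion above).
    """
    if available is None:
        # os.cpu_count() counts logical CPUs and can return None.
        available = os.cpu_count() or 1
    return max(1, min(requested, available, cap))


if __name__ == "__main__":
    n = pick_threads(80)
    print(f"logical CPUs reported: {os.cpu_count()}, using -T {n}")
```

On Linux, nproc prints the same logical-CPU count from the shell. Logical CPUs include hyperthreads, so the number of physical cores may be lower.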
Do you have access to an HPC cluster or a cloud computing service? If so, you could parallelise almost every step of RAxML. You can run 20+ ML tree searches at the same time rather than one after another, then pick the best tree yourself from the likelihood scores; in my experience the tree search is usually the most time-consuming part. Once you have the ML tree, the bootstrapping can easily be split up too.
Also, what are you running on that gives you 80 threads?
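As a rough sketch of that workflow (assuming Python 3.8+ and RAxML 8-style flags taken from the command in the question; the alignment file name, run names, seeds and batch sizes are hypothetical placeholders), the commands for independent ML searches and batched rapid bootstraps could be generated like this and submitted as separate jobs:

```python
import shlex

# Binary and model come from the command in the question; the alignment
# name is a hypothetical placeholder.
RAXML = "raxmlHPC-PTHREADS-AVX"
MODEL = "PROTGAMMAWAG"
ALIGNMENT = "alignment.fas"


def ml_search_commands(n_searches, threads, first_seed=1):
    """One independent ML tree search per parsimony seed (-p), one job each."""
    cmds = []
    for i in range(n_searches):
        seed = first_seed + i
        argv = [RAXML, "-m", MODEL, "-s", ALIGNMENT, "-p", str(seed),
                "-n", f"ml_{seed}", "-T", str(threads)]
        cmds.append(shlex.join(argv))
    return cmds


def bootstrap_commands(total, per_job, threads, first_seed=1000):
    """Split `total` rapid bootstrap replicates (-x, -N) into separate jobs.

    Each job gets its own seed so the replicates differ between jobs.
    """
    cmds = []
    done, job = 0, 0
    while done < total:
        n = min(per_job, total - done)
        seed = first_seed + job
        argv = [RAXML, "-m", MODEL, "-s", ALIGNMENT, "-p", str(seed),
                "-x", str(seed), "-N", str(n), "-n", f"bs_{job}",
                "-T", str(threads)]
        cmds.append(shlex.join(argv))
        done += n
        job += 1
    return cmds


def best_run(scores):
    """Return the run name with the highest log-likelihood.

    Log-likelihoods are negative, so the value closest to zero wins.
    """
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for cmd in ml_search_commands(20, threads=4):
        print(cmd)
    for cmd in bootstrap_commands(1000, per_job=100, threads=4):
        print(cmd)
```

The scores dictionary for best_run has to be filled in from the final likelihood each search reports in its RAxML_info.<run name> file (check the exact file names and wording for your RAxML version). Afterwards you could concatenate the bootstrap trees (cat RAxML_bootstrap.bs_* > all_bootstraps) and draw support values onto the best tree with something like raxmlHPC -f b -m PROTGAMMAWAG -t RAxML_bestTree.ml_<best seed> -z all_bootstraps -n final.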