Question

Trouble Running MrBayes with GPU(s)

2.4 years ago

CRL-CHAR • 0

Hi everyone,

I have been attempting to run MrBayes on my university's cluster that has NVIDIA GPUs. I compiled MrBayes with the BEAGLE library according to the manual in order to take advantage of running my analysis using a GPU, and the HPC team verified that it was installed and compiled correctly on the cluster.

My amino acid dataset consists of 179 taxa and 90350 sites, and I am running a partitioned analysis in MrBayes. When I run my dataset using a GPU, the memory fills up immediately after the model partitioning step, right at the start of the run.

I was wondering if anyone has had any luck running a large dataset in MrBayes with GPU(s)? I can run my dataset with CPUs, but it takes about 26 days for it to complete its run.

Additional MrBayes details: ngen = 1000000, nruns = 4, burnin = 1000, samplefreq = 1000, and nchains = 4

I appreciate any and all help that may be given. If any more info needs to be given, I'm happy to share more.

phylogenetics MrBayes Bayesian GPU inference • 1.5k views

updated 2.4 years ago by Mensur Dlakic ★ 28k • written 2.4 years ago by CRL-CHAR • 0

score 1 · Answer 1 · 2022-08-10

2.4 years ago

Mensur Dlakic ★ 28k

Don't know about your GPU's memory, but that sounds like too big of a dataset. I have an 8 GB GPU, and it works for about 150 taxa and ~3000 residues. It doesn't have enough memory for 150 taxa and ~10000 residues. Your numbers are far greater, so I suspect it won't work regardless of what GPU you have.
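A rough back-of-envelope sketch of why the memory fills up so quickly. It assumes the dominant cost is the partial-likelihood arrays, with one array per internal node of an unrooted tree holding sites × states × rate categories values. The function name, the 4 rate categories, and double precision are assumptions for illustration, not values reported by MrBayes or BEAGLE. Real usage will differ, since BEAGLE works on unique site patterns rather than raw sites and keeps additional buffers.

```python
def partials_gib(taxa, sites, states=20, rate_cats=4, bytes_per_value=8, chains=1):
    """Rough size in GiB of the partial-likelihood arrays (hypothetical helper)."""
    internal_nodes = taxa - 2  # internal nodes of an unrooted binary tree
    per_node_bytes = sites * states * rate_cats * bytes_per_value
    return internal_nodes * per_node_bytes * chains / 2**30


# Dataset from the question: 179 taxa, 90350 amino acid sites (20 states).
per_chain = partials_gib(179, 90350)
# nruns = 4 and nchains = 4 give 16 chains in total.
all_chains = partials_gib(179, 90350, chains=4 * 4)

print(f"per chain:  ~{per_chain:.1f} GiB")
print(f"all chains: ~{all_chains:.1f} GiB")
```

Under these assumptions a single chain already needs about 9.5 GiB, and all 16 chains together about 150 GiB, so the estimate is only meant to show the order of magnitude.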

When compiled with BEAGLE and run with multiple CPUs (I have 40), the smaller MrBayes analysis I mentioned above runs only 2-3x slower than with GPU.

PS That burn-in of 1000 seems awfully small. In general, one wants to discard as many generations as it takes for the sampling to stabilize, which most definitely will not happen with 4 chains after 1000 generations. Common burn-ins are 10-25% of the total run, which in your case would be at least 100000 generations.
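The arithmetic behind the 10-25% rule, as a small sketch using the ngen and samplefreq values from the question. The helper name is made up for illustration. It also converts generations to samples, because the MrBayes help text describes the burnin setting as a number of samples rather than generations, so it is worth checking which unit your value is in.

```python
def burnin_for(ngen, samplefreq, percent):
    """Return (generations, samples) to discard for a burn-in percentage.

    Hypothetical helper; integer arithmetic avoids float rounding surprises.
    """
    generations = ngen * percent // 100
    samples = generations // samplefreq
    return generations, samples


ngen, samplefreq = 1_000_000, 1000  # values from the question

for percent in (10, 25):
    generations, samples = burnin_for(ngen, samplefreq, percent)
    print(f"{percent}% burn-in: {generations} generations = {samples} samples")
```

MrBayes can also take the burn-in as a fraction of the run through the relburnin and burninfrac options, which avoids the unit question altogether.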


Hi Mensur! Thank you for your response and helpful tips. Our cluster is currently undergoing its annual maintenance. As soon as it is back online, I will take a look at the estimated time running my dataset on the compiled version of MrBayes with BEAGLE with multiple CPUs.

Also, thank you for correcting my burn-in. I will definitely change that in my Bayes block.

2.4 years ago by CRL-CHAR • 0

Hi Mensur,

After some troubleshooting on my university's cluster, I was able to run a MrBayes job using the MPI version with the BEAGLE library. In my Bayes block I used the following commands per the manual:

set usebeagle=yes beagledevice=cpu beagleprecision=double;
set beaglescaling=dynamic beaglesse=yes;

My job is actually running slower than if I do not utilize the BEAGLE lib. Do you change any of these settings when you have run MrBayes on CPUs with BEAGLE?

I appreciate your help!

2.4 years ago by CRL-CHAR • 0

The settings look fine. You need to ask for a larger number of threads, as each MPI job will most likely want to run multi-threaded. On my 40-thread computers, all threads are used even when I run only 8 parallel MPI jobs.

It is possible that BEAGLE may have been compiled without SSE support. Alternatively, some clusters have old nodes without SSE, so you may have been unlucky that your job was assigned to one of them. It should run much faster with BEAGLE - at least 5-10 times.

2.4 years ago by Mensur Dlakic ★ 28k