Hi everyone,
I have been attempting to run MrBayes on my university's cluster that has NVIDIA GPUs. I compiled MrBayes with the BEAGLE library according to the manual in order to take advantage of running my analysis using a GPU, and the HPC team verified that it was installed and compiled correctly on the cluster.
My amino acid dataset consists of 179 taxa and 90350 sites, and I am running a partitioned analysis in MrBayes. When I run my dataset using a GPU, the memory fills up immediately after the model partitioning step, right at the start of the run.
I was wondering if anyone has had any luck running a large dataset in MrBayes with GPU(s)? I can run my dataset with CPUs, but it takes about 26 days for it to complete its run.
Additional MrBayes details: ngen = 1000000, nruns = 4, burnin = 1000, samplefreq = 1000, and nchains = 4
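For anyone wondering why the memory could fill up so quickly, here is a rough back-of-envelope sketch in Python of the conditional likelihood storage for a dataset of this size. It is only an estimate under stated assumptions, not how MrBayes or BEAGLE actually account for memory: I assume 4 gamma rate categories (not stated in the post), double precision, one partial array per internal node of an unrooted tree, and no compression of the 90350 sites into unique site patterns. The function name and its parameters are made up for illustration.

```python
def partials_bytes(n_taxa, n_sites, n_states=20, n_rate_cats=4, bytes_per_value=8):
    """Rough size of the conditional likelihood arrays for ONE chain.

    Assumptions (not from the original post): 4 rate categories,
    double precision (8 bytes), one array per internal node, and
    every site counted as its own pattern.
    """
    internal_nodes = n_taxa - 2  # unrooted binary tree with n_taxa tips
    return internal_nodes * n_sites * n_states * n_rate_cats * bytes_per_value


per_chain = partials_bytes(n_taxa=179, n_sites=90350)
total = per_chain * 4 * 4  # nruns = 4, nchains = 4

print(f"{per_chain / 1e9:.1f} GB per chain")
print(f"{total / 1e9:.1f} GB if all 16 chains were held at once")
```

Under these assumptions a single chain already needs on the order of 10 GB, which is in the range of the total memory of many GPUs. The real figure may be lower (fewer unique site patterns, single precision) or higher (extra buffers), so treat it as an order-of-magnitude check only.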
I appreciate any and all help that may be given. If any more info needs to be given, I'm happy to share more.
Hi Mensur! Thank you for your response and helpful tips. Our cluster is currently undergoing its annual maintenance. As soon as it is back online, I will check the estimated run time for my dataset using the BEAGLE-enabled build of MrBayes on multiple CPUs.
Also, thank you for correcting my burn-in. I will definitely change that in my Bayes block.
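For other readers, a small Python sketch of why burnin = 1000 is a problem with the settings from the first post. It assumes that the burnin value counts samples rather than generations, and that generation 0 is also sampled; the helper name is made up, and the exact rounding MrBayes applies to a relative burn-in may differ slightly from the plain truncation used here.

```python
def samples_per_run(ngen, samplefreq):
    """Number of samples written per run, assuming generation 0 is sampled too."""
    return ngen // samplefreq + 1


n_samples = samples_per_run(ngen=1_000_000, samplefreq=1000)

# Absolute burn-in of 1000 samples, as in the original settings
kept_absolute = n_samples - 1000

# Relative burn-in of 25% of the samples, for comparison
discarded_relative = int(n_samples * 0.25)
kept_relative = n_samples - discarded_relative

print(n_samples, kept_absolute, kept_relative)
```

With these numbers an absolute burn-in of 1000 samples leaves almost nothing to summarize, whereas a relative burn-in of 25% keeps roughly three quarters of the samples.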
Hi Mensur,
I was able to troubleshoot on my university's cluster, and I was able to run a MrBayes job using the MPI version with the BEAGLE library. In my Bayes block I used the following commands per the manual:
set usebeagle=yes beagledevice=cpu beagleprecision=double;
set beaglescaling=dynamic beaglesse=yes;
My job is actually running slower than if I do not utilize the BEAGLE lib. Do you change any of these settings when you have run MrBayes on CPUs with BEAGLE?
I appreciate your help!
The settings look fine. You need to ask for a larger number of threads, as each MPI job will most likely want to run multi-threaded. On my 40-thread computers, all threads are used even when I run only 8 parallel MPI jobs.
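As a rough illustration of how the chains and MPI processes relate, here is a short Python sketch. It assumes the chains are spread as evenly as possible over the MPI processes and that there is no point in requesting more processes than nruns x nchains; the function name is hypothetical and this is not MrBayes code.

```python
def chains_per_rank(nruns, nchains, nranks):
    """Spread nruns * nchains chains as evenly as possible over nranks MPI processes."""
    total = nruns * nchains
    if nranks < 1 or nranks > total:
        raise ValueError("number of MPI processes must be between 1 and nruns * nchains")
    base, extra = divmod(total, nranks)
    # The first `extra` ranks take one additional chain each
    return [base + 1 if rank < extra else base for rank in range(nranks)]


# nruns = 4 and nchains = 4 give 16 chains in total
print(chains_per_rank(4, 4, 8))   # 8 MPI processes
print(chains_per_rank(4, 4, 16))  # one chain per process
```

If each process also runs multi-threaded, the number of cores to request from the scheduler is the number of MPI processes times the threads per process, not just the number of processes.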
It is possible that BEAGLE was compiled without SSE support. Alternatively, some clusters have old nodes without SSE, so you may have been unlucky and had your job assigned to one of them. It should run much faster with BEAGLE - at least 5-10 times.