How To Speed Up The BEAST Tool (Bayesian Evolutionary Analysis By Sampling Trees)
11.7 years ago
polangxin ▴ 80

I'm using BEAST to analyze a dataset of 172 taxa and 90,000 sites. My first run of 10,000,000 MCMC steps took 22 days! But when I open the log file in Tracer, it shows that the run has not reached a stable state. [Image: Tracer trace of the run]
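One way to put a number on "not stable" is the effective sample size (ESS) of a logged parameter. The following is a minimal sketch of a common ESS estimator (chain length divided by one plus twice the summed positive autocorrelations); it is not necessarily the exact calculation Tracer performs, and the two toy chains are simulated here, not taken from the run above.

```python
import random

def effective_sample_size(samples):
    """Estimate ESS as n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        raise ValueError("ESS is undefined for a constant trace")
    rho_sum = 0.0
    for lag in range(1, n // 2):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0.0:
            break  # truncate at the first non-positive autocorrelation
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

def ar1_chain(n, phi, rng):
    """Simulate a toy autocorrelated chain (stand-in for a logged parameter)."""
    x, chain = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain

rng = random.Random(1)
sticky = ar1_chain(2000, 0.95, rng)  # slowly mixing chain
free = ar1_chain(2000, 0.0, rng)     # independent draws
print(f"ESS of slowly mixing chain: {effective_sample_size(sticky):.0f}")
print(f"ESS of independent draws:   {effective_sample_size(free):.0f}")
```

A slowly mixing chain yields far fewer effective samples than its length suggests. An ESS above roughly 200 per parameter is a commonly quoted rule of thumb, though it does not by itself prove convergence.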

If I run 100,000,000 MCMC steps, I think it will get better?
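For a rough sense of what a tenfold longer chain would cost, here is a small sketch that extrapolates linearly from the figures in the question. It assumes the time per MCMC step stays constant on the same hardware, which is only an approximation.

```python
def projected_days(observed_days, observed_steps, target_steps):
    """Linear extrapolation of run time, assuming constant cost per MCMC step."""
    return observed_days * target_steps / observed_steps

days = projected_days(22, 10_000_000, 100_000_000)
print(f"Projected run time: {days:.0f} days (about {days / 30:.1f} months)")
```

At 220 days on a single core, simply lengthening the chain is impractical without a faster likelihood calculation or a better-mixing model.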

But can BEAST use multiple cores? There is a -threads option, but it still uses only 1 core. I know I can run chains in parallel, e.g. 10 runs of 10,000,000 steps each, but I think the result would still not be what I want.
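On the idea of several shorter runs: independent chains can be pooled after discarding burn-in from each, which is what BEAST's LogCombiner tool is for. Below is a minimal sketch of that pooling step, assuming a tab-separated log with `#` comment lines and a header row; the toy logs and column names are made up for illustration. Pooling only helps if every chain has individually reached the same stationary distribution, so it does not rescue runs that have not converged.

```python
import io

def read_log(handle):
    """Parse a tab-separated log: skip blank and '#' lines, first row is the header."""
    header, rows = None, []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if header is None:
            header = fields
        else:
            rows.append(fields)
    return header, rows

def combine_runs(handles, burnin_fraction=0.1):
    """Drop the first burnin_fraction of each run, then concatenate the rest."""
    combined_header, combined = None, []
    for handle in handles:
        header, rows = read_log(handle)
        if combined_header is None:
            combined_header = header
        elif header != combined_header:
            raise ValueError("log files have different columns")
        skip = int(len(rows) * burnin_fraction)
        combined.extend(rows[skip:])
    return combined_header, combined

def toy_log(offset):
    """Build a fake 10-sample log in memory (hypothetical values)."""
    lines = ["# toy log, not real BEAST output", "state\tposterior\tclock.rate"]
    for i in range(10):
        lines.append(f"{i * 1000}\t{-5000.0 + i + offset}\t{0.001 * (i + 1)}")
    return io.StringIO("\n".join(lines) + "\n")

header, rows = combine_runs([toy_log(0), toy_log(5)], burnin_fraction=0.1)
print(header)
print(f"{len(rows)} pooled samples")
```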

Could someone please help me?


I have sent an email to the corresponding author using the 'request help via email' link above


Very cool new feature!

11.7 years ago
qiyunzhu ▴ 430

Two points:

(1) BEAST can run on multiple cores. Just install a library called Beagle, which lets BEAST, MrBayes and other programs use the full power of your machine. You can find Beagle on BEAST's website. Installation is not entirely straightforward. If you have problems, let me know.

(2) Judging by your trace file, your run is not even approaching a stable plateau. This is usually because the evolutionary model you chose isn't optimal. How come your alignment has so many (90,000) sites? Is it composed of multiple genes? If so, you should definitely divide them into multiple partitions, then run a proper model test, then assign one model to each partition. Working on a single large partition (like what you have) can give you a lot of trouble. Partitioning will most likely solve your problem. Keep me updated and I will be glad to debug with you.
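The partitioning step described in (2) can be sketched as follows. This is only an illustration: the taxon names, gene names and coordinates are hypothetical, and in practice the gene boundaries would come from your own annotation.

```python
def split_alignment(alignment, partitions):
    """Cut a concatenated alignment into per-gene alignments.

    alignment:  dict of taxon name -> aligned sequence (all the same length)
    partitions: dict of partition name -> (start, end), 1-based and inclusive
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences in an alignment must have equal length")
    total = lengths.pop()
    result = {}
    for name, (start, end) in partitions.items():
        if not 1 <= start <= end <= total:
            raise ValueError(f"partition {name} is outside the alignment")
        result[name] = {taxon: seq[start - 1:end] for taxon, seq in alignment.items()}
    return result

# Toy data: two taxa, ten sites, two hypothetical genes.
alignment = {"taxonA": "ACGTACGTAC", "taxonB": "ACGTTCGTAA"}
partitions = {"gene1": (1, 4), "gene2": (5, 10)}

for name, part in split_alignment(alignment, partitions).items():
    print(name, part)
```

Each resulting block can then be model-tested on its own and given its own substitution model.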


Thank you!
(1) I've installed Beagle on Ubuntu, using 40 cores, and it works great! But I tried and failed to install Beagle on another server, which runs SUSE Linux Enterprise Server 10 SP2 (it has 70 nodes, each with 16-64 cores).

(2) The data come from re-sequencing 170 bacterial meningitis samples, mapping them to a reference, and calling SNPs. I then concatenated all SNPs (unique sites) to get an MSA file.
Help: how do I select a proper model, or how do I run a model test? Right now I use GTR + Gamma, tip dates (the year each sample was collected), a strict clock, and a constant-size coalescent.
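For readers unfamiliar with the SNP concatenation described in (2), here is a minimal sketch of the idea. The reference, sample names and calls are made up, and the input layout (a dict of 1-based position to called base per sample) is an assumption for illustration, not the format of any particular variant caller.

```python
def snp_alignment(reference, sample_calls):
    """Build an alignment of variable sites only.

    reference:    reference sequence as a string
    sample_calls: dict of sample -> {1-based position: called base};
                  positions missing for a sample are taken from the reference
    """
    variable_positions = sorted({pos for calls in sample_calls.values() for pos in calls})
    alignment = {}
    for sample, calls in sample_calls.items():
        alignment[sample] = "".join(
            calls.get(pos, reference[pos - 1]) for pos in variable_positions
        )
    return variable_positions, alignment

reference = "ACGTACGT"
sample_calls = {
    "s1": {2: "T"},
    "s2": {2: "T", 5: "G"},
    "s3": {},  # identical to the reference at every called site
}

positions, alignment = snp_alignment(reference, sample_calls)
print(positions)
for sample, seq in alignment.items():
    print(sample, seq)
```

One caveat often raised for this kind of matrix: because invariant sites are dropped, rate and branch-length estimates can be affected, so it is worth checking how your chosen model handles SNP-only data.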

(3) Could you suggest some books or URLs where I can learn about this? (I've read some of the BEAST tutorials and guides, but I still don't understand how to really use this software.) Thank you!


Hello polangxin, glad to hear Beagle works in your case. The compilation process varies among Linux distributions. I have never used SUSE, but I can try to debug it for you. What screen output did you see when you tried to compile?

2) Concerning "Model test", you can refer to these pages as a start: http://en.wikipedia.org/wiki/Models_of_DNA_evolution http://en.wikipedia.org/wiki/Substitution_model http://www.rhizobia.co.nz/phylogenetics/modeltest

In brief, you can download a program called jModelTest: http://code.google.com/p/jmodeltest2/

And simply follow the directions.

This is a computationally expensive step, so your multi-core servers will definitely be helpful. Then use the best model in BEAST. If you find that BEAST does not support a particular model, please let me know, and I can try to write some code to support it.

Note that the best model from jModelTest is not always the best model for a Bayesian analysis. GTR often comes out best, because it is the most parameter-rich of the commonly used models. However, in BEAST analyses you will sometimes find GTR running crazy. Then you need to replace GTR with a simpler model and try again, to see whether the traces are smoother.
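To make the trade-off between fit and parameter count concrete, here is a small sketch of the BIC score that jModelTest can rank models by: BIC = -2 lnL + k ln(n), where lower is better. The log-likelihoods below are invented for illustration, k counts only substitution-model parameters (branch lengths are left out on the assumption that they are the same for every candidate), and n is taken to be the number of sites.

```python
import math

def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: -2 lnL + k ln(n). Lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)

n_sites = 90_000
# (model name, hypothetical lnL, number of free substitution-model parameters)
candidates = [
    ("JC", -52000.0, 0),
    ("HKY+G", -50500.0, 5),
    ("GTR+G", -50400.0, 9),
]

scores = {name: bic(lnl, k, n_sites) for name, lnl, k in candidates}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:6s} BIC = {score:.1f}")

best = min(scores, key=scores.get)
print("Best by BIC:", best)
```

With a long alignment the penalty term is small relative to the likelihood differences, which is one reason richer models such as GTR+G tend to win the comparison.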

For Bayesian analyses, the standard way of model testing is very complicated, variable and still being explored. You can see these instructions:

http://beast.bio.ed.ac.uk/Model_selection

But trust me, you will suffer a lot if you really want to carry out these analyses, unless you are a super expert. So jModelTest is typically the best workaround for model testing.

3) As for books, well, I don't know of any that are particularly helpful and well organized. I simply refer to the BEAST tutorials and the MrBayes user manual. You can also post your questions on BEAST's Google group and see if the authors answer you.

Good luck!


Thanks a lot! I'll read the URLs you suggested and try them. BEAST is great, but so complicated. I'll do my best!


Dear qiyunzhu:

I tried jModelTest, and it works great! It ran 88 models and reported under ::Best Models:: BIC: GTR+G

That means the Bayesian criterion and the GTR+Gamma model, am I right?

But I still have a question about the settings in BEAUti:

i) Tips: set each sequence with the date it was collected

ii)Sites: GTR Estimated Gamma Off

iii)Clocks: Estimate check

iv)Trees: Coalescent: Constant Size

QUESTIONS when setting Priors: clock.rate does not have an initial setting. I tried all of them (1/x, Uniform, Exponential, Normal, Lognormal, Gamma, Inverse Gamma), but I do not know what they mean. How do I choose this prior, or how do I compare them?

Thanks a lot!


Hi polangxin! BIC (the Bayesian information criterion) is a standard for deciding the best model. It's based on Bayesian inference, but it is independent of the subsequent phylogenetic reconstruction. No matter whether you use RAxML or BEAST, you can choose the BIC model. Therefore, in your case, you may use the GTR + Gamma model. I think your model settings are fine. For the clock rate, you can simply pick an arbitrary starting value. It does not usually matter (unless you know your clock rate). I usually set an exponential prior with mean = 1.0. There was a post in the BEAST Google group discussing this issue, but I cannot recall where it is. You may do a search.
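To see what an exponential prior with mean 1.0 actually implies, here is a short sketch of its density and quantiles. It is plain probability arithmetic rather than anything BEAST-specific, and the interpretation of the rate in substitutions per site per year assumes tip dates given in years.

```python
import math

def exponential_pdf(x, mean):
    """Density of an exponential distribution with the given mean."""
    if x < 0:
        return 0.0
    return math.exp(-x / mean) / mean

def exponential_quantile(p, mean):
    """Value below which a fraction p of the prior mass lies."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -mean * math.log(1.0 - p)

mean = 1.0
for p in (0.025, 0.5, 0.975):
    print(f"{p:.3f} quantile: {exponential_quantile(p, mean):.4f}")
print(f"density at 0.001: {exponential_pdf(0.001, mean):.4f}")
```

The prior is very broad compared with typical per-year rates, so it acts as a weak constraint and leaves the tip dates to inform the rate. If you do have outside knowledge of the rate, a tighter prior centred on it may be a better choice.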
