Bootstrap Using Paup* Is So Slow, Why?
14.1 years ago
Ijessie ▴ 70

Hi,

Yesterday I tried to run a bootstrap analysis with 1000 replicates using PAUP* on Windows 7. My dataset consists of 107 coding sequences of 675 characters each. First I used jModelTest to select the best-fit model, then appended the following PAUP block to the DATA block and executed it in the PAUP console.

BEGIN PAUP;           
log start=yes file=test.log replace=yes;
set criterion=likelihood;

lset base=(0.3037 0.2083 0.2478) nst=6  rmat=(1.4346 9.4669 0.8990 0.3065 7.1675) rates=gamma shape=0.6360 ncat=4 pinvar=0;

bootstrap nreps=1000 bseed=12345 conlevel=60 treefile=bootstrap.tre replace=yes format=NEXUS search=HEURISTIC;

savetrees brlens=yes savebootp=both maxdecimals=0 from=1 to=1 file=boots.tre replace=yes;
log stop;
END;   

But after 17 hours it is only at bootstrap replicate 6. Each replicate takes more than 3 hours.

So, is anything wrong with the commands? Or is it simply that PAUP* is not recommended once the number of sequences exceeds a certain number, such as 100?

Thanks in advance for your answer and help!

phylogenetics • 7.4k views
14.1 years ago
Paulo Nuin ★ 3.7k

The performance you're getting sounds about right. The main bottleneck here is the number of taxa; the length of the sequences doesn't have a huge impact on performance.

PAUP is well known to be slow on bootstrap calculations. In some tests I ran, PAUP was at least 2-3 times slower than Phylip. My suggestion would be to use Phylip to calculate the bootstrap, but you might not be able to set the same evolutionary model in it.

In Phylip you need to use SEQBOOT to generate the bootstrap matrices, then PROML (or DNAML for nucleotide data) to run the likelihood analysis on each of these matrices, and then CONSENSE to get a consensus tree from all the resulting trees.
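To make the first step concrete, here is a minimal Python sketch of the resampling idea behind a bootstrap matrix: alignment columns are drawn with replacement until the replicate has as many sites as the original. This is only an illustration of the concept, not Phylip's own code; the function name `bootstrap_alignment` and the toy sequences are made up for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate built by sampling columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so the result is reproducible.
    """
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same column choice to every taxon so columns stay intact.
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment, not the data from the question.
toy = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "ATGTAC"}
rng = random.Random(12345)
replicates = [bootstrap_alignment(toy, rng) for _ in range(3)]
for rep in replicates:
    print(rep)
```

Each replicate would then go through its own tree search, which is why the total run time grows with the number of replicates.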


Thanks. I will give Phylip a try.

When I read papers, I've noticed that most of them use PAUP* to construct an ML tree with 1000 bootstrap replicates. In some cases their datasets are large, too. So I'm just curious how they did it...


They do use PAUP too, and the performance is the same as what you're getting. In some tests I ran (some years ago), generating 1000 bootstrap replicates of a 100-taxon matrix took more than two weeks.

14.1 years ago

I have never used PAUP, but I have done similar things with Phylip. Bootstrapping should increase the run time linearly with the number of replicates. However, 107 sequences of 675 positions each is very computationally intensive, and I would not be surprised if each replicate takes 3 hours.
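As a rough back-of-the-envelope check of the linear scaling, the sketch below extrapolates from the figures in the question. It assumes that being "in replicate 6" after 17 hours means 5 replicates had finished, and that every replicate costs about the same; both are assumptions, and the function name is just for illustration.

```python
def estimate_total_hours(hours_elapsed, replicates_finished, total_replicates):
    """Extrapolate total run time, assuming every replicate costs the same."""
    hours_per_replicate = hours_elapsed / replicates_finished
    return hours_per_replicate * total_replicates


# Figures from the question; 5 finished replicates is an assumption.
total = estimate_total_hours(hours_elapsed=17, replicates_finished=5, total_replicates=1000)
print(f"{total:.0f} hours, about {total / 24:.0f} days")
```

Under those assumptions the full run lands on the order of 3,400 hours, so months rather than days on a single machine.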

Usually only a conserved region is used for phylogenetic analysis. Are you sure all 675 positions are conserved across your 107 sequences?


Yes, I think they're pretty conserved at most positions. Besides, I excluded positions with gaps. I'm not sure whether I need to exclude them, but I read an article entitled "Phylogeny for the faint of heart: a tutorial", which says: "The general rule is to delete all positions with gaps plus any adjacent, ambiguously aligned positions."
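For illustration, the first half of that rule (dropping every column that contains a gap) could be sketched in Python as below. The function name `strip_gap_columns` and the toy alignment are hypothetical, and the sketch does not attempt the second half of the rule, since deciding which neighbouring positions are ambiguously aligned needs human judgment.

```python
def strip_gap_columns(alignment, gap_chars="-"):
    """Return a copy of the alignment without any column that contains a gap.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    # Keep a column only if no sequence has a gap character at that position.
    keep = [i for i in range(len(seqs[0])) if all(s[i] not in gap_chars for s in seqs)]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


# Hypothetical toy alignment, not the data from the question.
toy = {"seq1": "ATG-CCA", "seq2": "ATGACCA", "seq3": "A-GACCA"}
cleaned = strip_gap_columns(toy)
print(cleaned)
```

In the toy alignment the second and fourth columns each contain a gap, so only five of the seven columns survive.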

When you conduct such an analysis, what do you do with the data after the multiple alignment?


Don't worry about deleting positions with gaps; PAUP does not use them.
