7 months ago
pablo
Dear all,
I am working on a 16S PacBio dataset. I used DADA2 to assign taxonomy, and now I would like to use raxml-ng to build a phylogeny. I first ran DECIPHER to align my sequences, and that went well:
./raxml-ng --check --msa ./output/alignment.fasta --model GTR+G
(...)
[00:00:00] Reading alignment from file:./output/alignment.fasta
[00:00:00] Loaded alignment with 18299 taxa and 10649 sites
Alignment comprises 1 partitions and 10649 sites
Partition 0: noname
Model: GTR+FO+G4m
Alignment sites: 10649
Gaps: 86.32 %
Invariant sites: 45.54 %
Alignment can be successfully read by RAxML-NG.
Getting my bootstrapped tree is very time-consuming with:
./raxml-ng --all --msa ./output/alignment.fasta --model GTR+G --prefix test --seed 2 --threads 128 --bs-metric fbp,tbe
That is why I am trying a different route: first build a consensus tree from 100 trees, then compute the bootstrap values:
#consensus
./raxml-ng --msa ./output/alignment.fasta --model GTR+G --prefix test --tree pars{50},rand{50} --threads 128 --start
./raxml-ng --consense MRE --prefix consensus --tree test.raxml.startTree --threads 128
#bootstrap
./raxml-ng --msa ./output/alignment.fasta --model GTR+G --prefix T7 --seed 2 --threads 128 --bootstrap
The bootstrap step is still running. I will use the --support option to map these values onto the tree if it finishes in time.
Is this a correct way to proceed?
Best
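For reference, the --support step mentioned above could look roughly like the sketch below. It assumes the bootstrap run writes its replicates to T7.raxml.bootstraps (from the T7 prefix) and that the consensus step writes consensus.raxml.consensusTreeMRE; both file names are assumptions based on raxml-ng's usual output naming, so check what is actually on disk. The run helper and the "support" prefix are made up for illustration. The helper only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != 1 ]; then
        "$@"
    fi
}

RAXML=${RAXML:-./raxml-ng}

# Assumed file names (derived from the prefixes used in the post).
REF_TREE=consensus.raxml.consensusTreeMRE   # tree to annotate; swap in your own
BS_TREES=T7.raxml.bootstraps                # replicates from the --bootstrap run

# Map both FBP and TBE support values onto the reference tree.
run "$RAXML" --support \
    --tree "$REF_TREE" \
    --bs-trees "$BS_TREES" \
    --bs-metric fbp,tbe \
    --prefix support \
    --threads 2
```

Run it once as-is to inspect the command, then again with DRY_RUN=0 once the bootstrap file exists.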
How did you decide to use 128 threads? Have you run raxml-ng with the --parse flag? If not, I would recommend doing so and then setting the number of threads to the recommended value. Using more threads than recommended may actually make the analysis slower than it otherwise would be.

I decided on this arbitrarily, thinking that increasing the number of threads would improve computing speed. That command gives me:
* Recommended number of threads / MPI processes: 9
I'll check that. Thanks.

I finally reduced my input dataset from 19k ASVs to 3,600 ASVs because it was still getting stuck.
raxml-ng runs well on it with the number of threads recommended by --parse. Thanks.

Reducing my input dataset was the key. It took about 2 days with the default 20 starting trees. I need to set 100 bootstrap replicates, but I suppose that is not enough (since the default value is 1000)? I also found this thread: https://stats.stackexchange.com/questions/86040/rule-of-thumb-for-number-of-bootstrap-samples#:~:text=Example%20(Table%20V%2C%20ibid.,95%25%20sure%2C%20850%20replications. which says "to be 90% sure that the relative CI length discrepancy does not exceed 10%, 700 replications are sufficient in half of the cases, and to be 95% sure, 850 replications."
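The two checks that came up in this thread can be sketched as below: --parse to get the recommended thread count for the (reduced) alignment, and --bsconverge to test after the fact whether the bootstrap replicates already computed have converged, rather than guessing a number. This is only a sketch. The run helper, the "parsed" and "bsconv" prefixes, and the T7.raxml.bootstraps file name are assumptions for illustration, and 0.03 is, as far as I know, the default bootstopping cutoff. The helper only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != 1 ]; then
        "$@"
    fi
}

RAXML=${RAXML:-./raxml-ng}
MSA=./output/alignment.fasta        # alignment path from the original post
BS_TREES=T7.raxml.bootstraps        # assumed output of the --bootstrap run

# 1) Parse the alignment; the log reports the recommended number of threads.
run "$RAXML" --parse --msa "$MSA" --model GTR+G --prefix parsed

# 2) Check whether the existing replicates pass the MRE-based convergence test.
run "$RAXML" --bsconverge \
    --bs-trees "$BS_TREES" \
    --bs-cutoff 0.03 \
    --seed 2 \
    --prefix bsconv \
    --threads 1
```

If the convergence test fails with 100 replicates, more can be generated in a separate run and the replicate files concatenated before re-running the check.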