I am running IQ-TREE on a dataset of 160 samples, with 2.1 million reads per sample on average, for a genome of 310 Mb.
However, the software has been computing for a week now, and progress in the standard output is very slow.
The last part of the standard output is:
Create initial parsimony tree by phylogenetic likelihood library (PLL)... 101.349 seconds
Perform fast likelihood tree search using GTR+I+G model...
Estimate model parameters (epsilon = 5.000)
Perform nearest neighbor interchange...
Optimizing NNI: done in 37959.5 secs using 99.58% CPU
Estimate model parameters (epsilon = 1.000)
1. Initial log-likelihood: -28771040.770
2. Current log-likelihood: -28771036.989
3. Current log-likelihood: -28771033.633
4. Current log-likelihood: -28771030.718
5. Current log-likelihood: -28771028.173
6. Current log-likelihood: -28771025.940
7. Current log-likelihood: -28771023.982
8. Current log-likelihood: -28771022.265
9. Current log-likelihood: -28771020.758
10. Current log-likelihood: -28771019.437
11. Current log-likelihood: -28771018.277
12. Current log-likelihood: -28771017.259
Optimal log-likelihood: -28771016.360
Rate parameters: A-C: 0.99469 A-G: 2.56577 A-T: 1.06937 C-G: 0.76029 C-T: 2.56126 G-T: 1.00000
Base frequencies: A: 0.254 C: 0.244 G: 0.245 T: 0.256
Proportion of invariable sites: 0.000
Gamma shape alpha: 3.383
Parameters optimization took 12 rounds (9957.005 sec)
Time for fast ML tree search: 101193.759 seconds
NOTE: ModelFinder requires 214666 MB RAM!
ModelFinder will test up to 484 DNA models (sample size: 4006362) ...
 No. Model         -LnL          df   AIC           AICc          BIC
  1  GTR+F         28847670.292  339  57696018.584  57696018.642  57700494.535
  2  GTR+F+I       28847652.995  340  57695985.990  57695986.047  57700475.144
  3  GTR+F+G4      28771009.772  340  57542699.545  57542699.602  57547188.699
  4  GTR+F+I+G4    28771010.076  341  57542702.152  57542702.210  57547204.509
  5  GTR+F+R2      28587738.467  341  57176158.933  57176158.992  57180661.291
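For anyone reading the table: the AIC, AICc and BIC columns can be recomputed from the -LnL and df columns plus the reported sample size. Below is a minimal sketch in Python, assuming the standard textbook definitions of the three criteria (this is not IQ-TREE's own code, and the function name is just for illustration). It should reproduce the columns above to within rounding.

```python
import math

def information_criteria(neg_lnl, df, n):
    """Return (AIC, AICc, BIC) given the negative log-likelihood,
    the number of free parameters (df) and the sample size (n)."""
    aic = 2.0 * neg_lnl + 2.0 * df
    # Small-sample correction; negligible here because n is much larger than df.
    aicc = aic + 2.0 * df * (df + 1) / (n - df - 1)
    bic = 2.0 * neg_lnl + df * math.log(n)
    return aic, aicc, bic

n_sites = 4006362  # "sample size" reported by ModelFinder in the log above

# (model, -LnL, df) copied from three rows of the table above
rows = [
    ("GTR+F", 28847670.292, 339),
    ("GTR+F+G4", 28771009.772, 340),
    ("GTR+F+R2", 28587738.467, 341),
]

for model, neg_lnl, df in rows:
    aic, aicc, bic = information_criteria(neg_lnl, df, n_sites)
    print(f"{model:10s} AIC={aic:.3f} AICc={aicc:.3f} BIC={bic:.3f}")
```

Lower values are better for all three criteria, so among the five models shown GTR+F+R2 is ahead so far.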
Does anyone know whether ModelFinder will actually test all 484 DNA models?
It took days to output these five tests. Is there a shortcut for estimating the right substitution model?
Thank you, Mensur Dlakic. This is my first time using IQ-TREE, so I was unaware of the software's requirements. I will consult the documentation and try to find a way to subset the dataset.
To provide some explanations:
I am working on a Hyb-Seq dataset of 160 samples that captures 1239 loci in the target genome. I aligned the raw reads to a reference genome and jointly called variants. The output is a GVCF file that includes all 160 samples.
I converted the GVCF file into a PHYLIP file with the tool vcf2phylip, then used the PHYLIP file as input for IQ-TREE.
I am not yet sure how to handle subsets of the VCF file, but this might be covered in the IQ-TREE documentation.
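One possible way to subset, sketched in Python, is to work on the PHYLIP alignment rather than the VCF and keep a random sample of columns. This is only an illustration under stated assumptions: it expects a sequential PHYLIP layout (a header line, then one "name sequence" pair per line), the function names are hypothetical, and random site subsampling is just one option. Subsetting by locus or by sample may suit a Hyb-Seq dataset better.

```python
import random

def read_phylip(text):
    """Parse a sequential PHYLIP alignment (one 'name sequence' pair per line)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa, n_sites = map(int, lines[0].split())
    seqs = {}
    for ln in lines[1:]:
        name, seq = ln.split(None, 1)
        seqs[name] = seq.replace(" ", "")
    # Sanity check: the header must agree with the parsed content.
    if len(seqs) != n_taxa or any(len(s) != n_sites for s in seqs.values()):
        raise ValueError("header does not match alignment content")
    return seqs

def subsample_sites(seqs, n_keep, seed=0):
    """Keep a random subset of n_keep columns, preserving their original order."""
    n_sites = len(next(iter(seqs.values())))
    keep = sorted(random.Random(seed).sample(range(n_sites), n_keep))
    return {name: "".join(seq[i] for i in keep) for name, seq in seqs.items()}

def write_phylip(seqs):
    """Serialize the alignment back to sequential PHYLIP text."""
    n_sites = len(next(iter(seqs.values())))
    lines = [f"{len(seqs)} {n_sites}"]
    lines += [f"{name} {seq}" for name, seq in seqs.items()]
    return "\n".join(lines) + "\n"

# Toy alignment standing in for the real file (hypothetical data).
example = "3 8\nA ACGTACGT\nB ACGTACGA\nC ACGAACGT\n"
subset = subsample_sites(read_phylip(example), n_keep=4, seed=1)
print(write_phylip(subset))
```

The same columns are kept for every sample, so the result is still a valid alignment. The fixed seed makes the subset reproducible between runs.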
Hi, did IQ-TREE really test all 480 DNA models? I am also new to this, so I was wondering whether you figured this out and have any suggestions for finishing faster.
Answering my own query!
No, I believe IQ-TREE will not test all 480 DNA models, judging from the log printed on the screen:
The job finished by the end of the day on which I raised my concern in the comment section.
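For anyone who would rather shrink the model search explicitly, IQ-TREE documents options for restricting ModelFinder: `-mset` (base models to test), `-mrate` (rate heterogeneity types) and `-cmax` (maximum number of FreeRate categories). Here is a minimal Python sketch that only assembles such a command line. The executable name `iqtree`, the file name `alignment.phy` and the particular restrictions are assumptions for illustration, not a recommendation for this dataset.

```python
import shlex

def modelfinder_command(alignment, base_models=("GTR",), rate_types=("G", "R"),
                        max_rate_categories=4, threads="AUTO"):
    """Build an IQ-TREE command line with a restricted ModelFinder search.

    Assumes the executable is called 'iqtree'; IQ-TREE 2 installs are
    often called 'iqtree2' instead.
    """
    return [
        "iqtree",
        "-s", alignment,                     # input alignment
        "-m", "MFP",                         # ModelFinder, then tree search
        "-mset", ",".join(base_models),      # base substitution models to test
        "-mrate", ",".join(rate_types),      # rate heterogeneity types to test
        "-cmax", str(max_rate_categories),   # cap on FreeRate categories
        "-nt", str(threads),                 # number of threads
    ]

cmd = modelfinder_command("alignment.phy")  # hypothetical file name
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a fixed model to `-m` instead of `MFP` skips model selection altogether, at the cost of having to justify the chosen model some other way.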