Question

IQ-TREE stuck in selecting a substitution model with ModelFinder

2.9 years ago

Begonia_pavonina ▴ 210

I am running IQ-TREE on a dataset of 160 samples, with on average 2.1 million reads per sample, aligned to a 310 Mb genome.

However, the software has been running for a week now, and the progress reported in the standard output is very slow.

The last segment of the standard output is:

Create initial parsimony tree by phylogenetic likelihood library (PLL)... 101.349 seconds
Perform fast likelihood tree search using GTR+I+G model...
Estimate model parameters (epsilon = 5.000)
Perform nearest neighbor interchange...
Optimizing NNI: done in 37959.5 secs using 99.58% CPU
Estimate model parameters (epsilon = 1.000)
1. Initial log-likelihood: -28771040.770
2. Current log-likelihood: -28771036.989
3. Current log-likelihood: -28771033.633
4. Current log-likelihood: -28771030.718
5. Current log-likelihood: -28771028.173
6. Current log-likelihood: -28771025.940
7. Current log-likelihood: -28771023.982
8. Current log-likelihood: -28771022.265
9. Current log-likelihood: -28771020.758
10. Current log-likelihood: -28771019.437
11. Current log-likelihood: -28771018.277
12. Current log-likelihood: -28771017.259
Optimal log-likelihood: -28771016.360
Rate parameters:  A-C: 0.99469  A-G: 2.56577  A-T: 1.06937  C-G: 0.76029  C-T: 2.56126  G-T: 1.00000
Base frequencies:  A: 0.254  C: 0.244  G: 0.245  T: 0.256
Proportion of invariable sites: 0.000
Gamma shape alpha: 3.383
Parameters optimization took 12 rounds (9957.005 sec)
Time for fast ML tree search: 101193.759 seconds

NOTE: ModelFinder requires 214666 MB RAM!
ModelFinder will test up to 484 DNA models (sample size: 4006362) ...
 No. Model         -LnL         df   AIC          AICc         BIC
  1  GTR+F         28847670.292 339  57696018.584 57696018.642 57700494.535
  2  GTR+F+I       28847652.995 340  57695985.990 57695986.047 57700475.144
  3  GTR+F+G4      28771009.772 340  57542699.545 57542699.602 57547188.699
  4  GTR+F+I+G4    28771010.076 341  57542702.152 57542702.210 57547204.509
  5  GTR+F+R2      28587738.467 341  57176158.933 57176158.992 57180661.291
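For anyone reading this table for the first time: the AIC, AICc and BIC columns follow from -LnL, the number of free parameters (df) and the sample size (number of alignment sites) via the standard formulas AIC = 2(-LnL) + 2k, AICc = AIC + 2k(k+1)/(n-k-1) and BIC = 2(-LnL) + k ln(n). The Python sketch below recomputes row 1 of the table as a sanity check; `information_criteria` is a hypothetical helper, not part of IQ-TREE.

```python
import math


def information_criteria(neg_lnl: float, k: int, n: int) -> dict[str, float]:
    """AIC, AICc and BIC for a model with log-likelihood -neg_lnl,
    k free parameters (the df column) and n sites (the sample size)."""
    aic = 2.0 * neg_lnl + 2.0 * k
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = 2.0 * neg_lnl + k * math.log(n)           # penalty grows with ln(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Row 1 of the table: GTR+F, -LnL 28847670.292, df 339, sample size 4006362
row1 = information_criteria(28847670.292, 339, 4006362)
for name, value in row1.items():
    print(f"{name}: {value:.3f}")
```

With about four million sites, the AICc correction is tiny (about 0.06 here), which is why the AIC and AICc columns are nearly identical in this run.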

Does anyone know whether ModelFinder will actually test all 484 DNA models?

It took days to output these five tests. Is there any shortcut for selecting the right substitution model?

substitution ModelFinder model IQ-TREE • 3.1k views

updated 2.0 years ago by rohitsatyam102 ▴ 940 • written 2.9 years ago by Begonia_pavonina ▴ 210

score 4 · Accepted Answer · 2022-07-15

2.9 years ago

Mensur Dlakic ★ 29k

> I am running IQ-TREE on a dataset of 160 samples with on average 2.1 millions reads on a genome of 310 Mb.

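I'm not quite sure what you mean from the explanation above, but the log-likelihood values tell me this is a monstrous alignment: the log-likelihood is a sum over all sites, so a value around -28.8 million reflects millions of alignment columns (ModelFinder reports a sample size of 4006362). Nothing will run fast on an alignment that large.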

If, on top of that, you have insufficient RAM and the program needs to swap a lot (the NOTE in your log says ModelFinder requires 214666 MB, roughly 210 GB), this could take months. I may be able to offer better advice if you explain in more detail what your alignment is, but either way it sounds like you need to rethink your strategy: either start from a smaller alignment, or use a generic substitution model with the alignment you have.
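A minimal sketch of both options, assuming IQ-TREE 2 is on the PATH as `iqtree2` and the alignment is called `alignment.phy` (both names are assumptions). The first command fixes one model with `-m`, so ModelFinder is skipped entirely; the second keeps ModelFinder (`-m MFP`) but restricts the candidate substitution models with `-mset` and the rate-heterogeneity types with `-mrate`. The script only builds and prints the commands; please check the flags against the IQ-TREE documentation for your version before running them.

```python
import shlex

IQTREE = "iqtree2"            # assumed executable name for IQ-TREE 2
ALIGNMENT = "alignment.phy"   # hypothetical input file name


def fixed_model_cmd(alignment: str, model: str = "GTR+F+G4",
                    threads: str = "AUTO") -> list[str]:
    """Skip ModelFinder by passing a single model explicitly with -m."""
    return [IQTREE, "-s", alignment, "-m", model, "-T", threads]


def restricted_modelfinder_cmd(alignment: str, threads: str = "AUTO") -> list[str]:
    """Run ModelFinder (-m MFP) on a reduced candidate set."""
    return [
        IQTREE, "-s", alignment,
        "-m", "MFP",
        "-mset", "GTR,HKY,K2P",   # only these substitution models
        "-mrate", "E,G,I+G",      # equal rates, +G, +I+G; no FreeRate (+R)
        "-T", threads,
    ]


for cmd in (fixed_model_cmd(ALIGNMENT), restricted_modelfinder_cmd(ALIGNMENT)):
    print(shlex.join(cmd))
```

Leaving `R` out of `-mrate` removes the FreeRate variants, which make up most of the fitted models listed in the logs in this thread. Whether a generic model such as GTR+F+G4 is good enough for your data is a judgment call.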

2.9 years ago by Mensur Dlakic ★ 29k

Thank you, Mensur Dlakic. This is my first time using IQ-TREE, so I was unaware of the software's requirements. I will consult the documentation and try to find a way to subset the dataset.

To explain in more detail:

I am working on a Hyb-Seq dataset of 160 samples that captures 1239 loci in the target genome. I aligned the raw reads to a reference genome and jointly called variants across samples. The output is a GVCF file that includes all 160 samples.

I converted the GVCF file to a PHYLIP file with the tool vcf2phylip, then used the PHYLIP file as input for IQ-TREE.

I am not sure yet how to subset the VCF file, but it may be covered in the IQ-TREE documentation.
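One quick way to get a smaller test alignment is to keep only a window of columns from the PHYLIP file written by vcf2phylip. The sketch below assumes a relaxed sequential PHYLIP layout (a header line `ntax nchar`, then one `name sequence` line per sample); the function and file names are hypothetical. A contiguous window is only useful for gauging run time and memory before committing to the full run; it is not a substitute for choosing loci on biological grounds.

```python
import os
import tempfile


def read_phylip(path: str) -> dict[str, str]:
    """Read a relaxed sequential PHYLIP file into {sample name: sequence}."""
    with open(path) as handle:
        ntax, nchar = map(int, handle.readline().split()[:2])
        seqs: dict[str, str] = {}
        for line in handle:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            name, seq = parts[0], "".join(parts[1:])
            if len(seq) != nchar:
                raise ValueError(f"{name}: expected {nchar} sites, got {len(seq)}")
            seqs[name] = seq
    if len(seqs) != ntax:
        raise ValueError(f"expected {ntax} sequences, got {len(seqs)}")
    return seqs


def subset_phylip(seqs: dict[str, str], start: int, end: int) -> dict[str, str]:
    """Keep alignment columns start..end-1 (0-based, end exclusive) for every sample."""
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end")
    return {name: seq[start:end] for name, seq in seqs.items()}


def write_phylip(seqs: dict[str, str], path: str) -> None:
    """Write {name: sequence} as relaxed sequential PHYLIP."""
    width = max(len(name) for name in seqs) + 1
    nchar = len(next(iter(seqs.values())))
    with open(path, "w") as out:
        out.write(f"{len(seqs)} {nchar}\n")
        for name, seq in seqs.items():
            out.write(f"{name.ljust(width)}{seq}\n")


if __name__ == "__main__":
    # Tiny demo alignment: 3 samples, 8 sites; keep columns 2..5.
    demo = "3 8\nS1  ACGTACGT\nS2  ACGTTCGT\nS3  ACGAACGT\n"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "full.phy")
        dst = os.path.join(tmp, "subset.phy")
        with open(src, "w") as fh:
            fh.write(demo)
        write_phylip(subset_phylip(read_phylip(src), 2, 6), dst)
        with open(dst) as fh:
            print(fh.read(), end="")
```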

2.9 years ago by Begonia_pavonina ▴ 210

Hi, did IQ-TREE really test all 484 DNA models? I am also new to this, so I was wondering whether you figured this out and have any suggestions for finishing faster.

2.1 years ago by rohitsatyam102 ▴ 940

Answering my own query!

No, I believe IQ-TREE does not test all 484 DNA models, judging by the log printed on the screen:

Create initial parsimony tree by phylogenetic likelihood library (PLL)... 1.449 seconds
Perform fast likelihood tree search using GTR+I+G model...
Estimate model parameters (epsilon = 5.000)
Perform nearest neighbor interchange...
Optimizing NNI: done in 9.52888 secs using 99.95% CPU
Estimate model parameters (epsilon = 1.000)
1. Initial log-likelihood: -429571.697
Optimal log-likelihood: -429571.351
Rate parameters:  A-C: 1.17401  A-G: 7.09460  A-T: 1.35196  C-G: 0.67433  C-T: 9.96279  G-T: 1.00000
Base frequencies:  A: 0.244  C: 0.286  G: 0.257  T: 0.213
Proportion of invariable sites: 0.362
Gamma shape alpha: 1.080
Parameters optimization took 1 rounds (6.237 sec)
Time for fast ML tree search: 127.481 seconds

NOTE: ModelFinder requires 1138 MB RAM!
ModelFinder will test up to 484 DNA models (sample size: 7690) ...
 No. Model         -LnL         df  AIC          AICc         BIC
  1  GTR+F         496388.282   1437 995650.563   996311.602   1005634.374
  2  GTR+F+I       456572.422   1438 916020.843   916682.908   926011.602
  3  GTR+F+G4      432090.617   1438 867057.234   867719.299   877047.992
  4  GTR+F+I+G4    429571.336   1439 862020.673   862683.764   872018.378
  5  GTR+F+R2      442523.960   1439 887925.921   888589.012   897923.627
  6  GTR+F+R3      434148.792   1441 871179.584   871844.732   881191.186
  7  GTR+F+R4      431235.580   1443 865357.160   866024.368   875382.656
  8  GTR+F+R5      429874.475   1445 862638.951   863308.224   872678.343
  9  GTR+F+R6      429546.890   1447 861987.780   862659.121   872041.067
 10  GTR+F+R7      429376.452   1449 861650.904   862324.318   871718.087
 11  GTR+F+R8      429285.610   1451 861473.220   862148.709   871554.298
 12  GTR+F+R9      429046.704   1453 860999.407   861676.977   871094.381
 13  GTR+F+R10     428989.534   1455 860889.069   861568.722   870997.938
 14  GTR+F+I+R2    433703.982   1440 870287.963   870952.083   880292.617
 15  GTR+F+I+R3    430168.735   1442 863221.470   863887.648   873240.019
 16  GTR+F+I+R4    429144.492   1444 861176.985   861845.225   871209.429
 17  GTR+F+I+R5    428878.289   1446 860648.577   861318.884   870694.917
 18  GTR+F+I+R6    428728.022   1448 860352.043   861024.420   870412.278
 19  GTR+F+I+R7    428688.557   1450 860277.114   860951.565   870351.245
 20  GTR+F+I+R8    428685.942   1452 860275.884   860952.413   870363.909
 41  SYM+I+R7      429160.114   1447 861214.228   861885.569   871267.515
 63  TVM+F+I+R7    429109.493   1449 861116.986   861790.400   871184.169
 85  TVMe+I+R7     429488.826   1446 861869.652   862539.958   871915.991
107  TIM3+F+I+R7   429061.488   1448 861018.977   861691.353   871079.212
129  TIM3e+I+R7    429462.358   1445 861814.715   862483.988   871854.107
151  TIM2+F+I+R7   428812.781   1448 860521.562   861193.939   870581.797
173  TIM2e+I+R7    429183.124   1445 861256.248   861925.521   871295.640
195  TIM+F+I+R7    429110.702   1448 861117.405   861789.782   871177.640
217  TIMe+I+R7     429416.395   1445 861722.789   862392.062   871762.181
239  TPM3u+F+I+R7  429485.360   1447 861864.720   862536.062   871918.008
261  TPM3+I+R7     429786.037   1444 862460.074   863128.315   872492.519
283  TPM2u+F+I+R7  429238.527   1447 861371.054   862042.396   871424.342
305  TPM2+I+R7     429511.559   1444 861911.118   862579.359   871943.563
327  K3Pu+F+I+R7   429539.404   1447 861972.808   862644.149   872026.095
349  K3P+I+R7      429740.678   1444 862369.357   863037.597   872401.801
371  TN+F+I+R7     429162.113   1447 861218.226   861889.568   871271.514
393  TNe+I+R7      429462.645   1444 861813.289   862481.529   871845.734
415  HKY+F+I+R7    429591.438   1446 862074.877   862745.183   872121.216
437  K2P+I+R7      429786.033   1443 862458.066   863125.274   872483.562
459  F81+F+I+R7    470412.434   1445 943714.867   944384.140   953754.259
481  JC+I+R7       469638.848   1442 942161.697   942827.874   952180.246
Akaike Information Criterion:           GTR+F+I+R8
Corrected Akaike Information Criterion: GTR+F+I+R7
Bayesian Information Criterion:         GTR+F+I+R7
Best-fit model: GTR+F+I+R7 chosen according to BIC
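Each summary line above is simply the model with the lowest value in the corresponding column. A small Python sketch using rows 18 to 20, which hold the lowest AIC, AICc and BIC values in the full table (numbers copied from the log; `best_by` is a hypothetical helper):

```python
# (model, AIC, AICc, BIC) copied from rows 18-20 of the ModelFinder log
ROWS = [
    ("GTR+F+I+R6", 860352.043, 861024.420, 870412.278),
    ("GTR+F+I+R7", 860277.114, 860951.565, 870351.245),
    ("GTR+F+I+R8", 860275.884, 860952.413, 870363.909),
]
# Column index of each criterion within a row tuple
CRITERIA = {"AIC": 1, "AICc": 2, "BIC": 3}


def best_by(rows, criterion):
    """Return the model name with the lowest value for the given criterion."""
    column = CRITERIA[criterion]
    return min(rows, key=lambda row: row[column])[0]


for criterion in CRITERIA:
    print(f"{criterion}: {best_by(ROWS, criterion)}")
```

AIC's lighter penalty lets the extra rate category of +I+R8 win by about 1.2 units, while AICc and BIC prefer +I+R7, which is the model reported as chosen according to BIC.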

The job finished by the end of the same day I raised my concern in the comments.
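The model numbers in the log also explain the figure of 484 and why far fewer models were actually fitted. Reading rows 1 to 20, each substitution model seems to get a block of 22 rate-heterogeneity variants (none, +I, +G4, +I+G4, +R2 to +R10, +I+R2 to +I+R10), and 22 substitution models × 22 variants = 484. The sketch below reconstructs the printed numbers under that layout, which is inferred from the log rather than taken from the IQ-TREE source.

```python
# Substitution models in the order their +I+R7 rows appear in the log.
SUBST = [
    "GTR+F", "SYM", "TVM+F", "TVMe", "TIM3+F", "TIM3e", "TIM2+F", "TIM2e",
    "TIM+F", "TIMe", "TPM3u+F", "TPM3", "TPM2u+F", "TPM2", "K3Pu+F", "K3P",
    "TN+F", "TNe", "HKY+F", "K2P", "F81+F", "JC",
]
# Rate-heterogeneity variants in block order (inferred from rows 1-20).
RATES = (["", "+I", "+G4", "+I+G4"]
         + [f"+R{k}" for k in range(2, 11)]
         + [f"+I+R{k}" for k in range(2, 11)])


def model_number(subst: str, rate: str) -> int:
    """1-based row number printed for subst+rate under the inferred layout."""
    return SUBST.index(subst) * len(RATES) + RATES.index(rate) + 1


total = len(SUBST) * len(RATES)
# Rows 1-20 cover GTR+F; after that only +I+R7 is listed for the other models.
fitted = 20 + (len(SUBST) - 1)
print(total, fitted)
print([model_number(s, "+I+R7") for s in SUBST[1:4]])
```

Under this layout the +I+R7 rows land on 41, 63, 85, ..., 481, exactly as printed, and the log lists 41 fits rather than 484: ModelFinder appears to settle the rate-heterogeneity part with GTR+F first and then try only that variant with the other substitution models.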

2.0 years ago by rohitsatyam102 ▴ 940