Question

Extremely low bootstrap values for phylogenetic tree

4.0 years ago

alicecol ▴ 20

Hi all,

I have constructed a phylogenetic tree for a set of 427 species using aligned and concatenated sequences for two genes (18S and COI). The species set is taxonomically very diverse and spans multiple phyla.

My workflow is as follows:

1. Align sequences with MUSCLE
2. Trim alignments with trimAl (using heuristic determination of best trimming method)
3. Concatenate gene alignments with geneStitcher.py
4. Perform maximum likelihood tree inference + parametric bootstrap with iqtree2
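For readers unfamiliar with the concatenation step, here is a minimal sketch of what it amounts to. It is not geneStitcher.py itself; the function name `concatenate` and the toy sequences are assumptions made for illustration. Taxa missing from a gene are padded with gaps, and the partition boundaries are printed as RAxML-style lines so each gene can get its own substitution model.

```python
def concatenate(alignments):
    """Join per-gene alignments into one supermatrix.

    alignments: dict mapping gene name -> dict of taxon -> aligned sequence.
    Returns (supermatrix, partitions) where partitions holds
    (gene, start, end) tuples with 1-based inclusive coordinates.
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    supermatrix = {t: "" for t in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences differ in length; is it aligned?")
        length = lengths.pop()
        for t in taxa:
            # Taxa without this gene are filled with gap characters.
            supermatrix[t] += aln.get(t, "-" * length)
        partitions.append((gene, start, start + length - 1))
        start += length
    return supermatrix, partitions


# Toy data (hypothetical): sp3 has no 18S sequence.
example = {
    "18S": {"sp1": "ACGT-A", "sp2": "ACGTTA"},
    "COI": {"sp1": "ATGG", "sp2": "ATGA", "sp3": "ATGC"},
}
matrix, parts = concatenate(example)
for gene, start, end in parts:
    print(f"DNA, {gene} = {start}-{end}")
```

Note how a taxon with only one gene ends up as mostly gaps in the supermatrix, which is worth keeping an eye on in a two-gene dataset.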

The topology of the tree generally looks okay, but the node bootstrap values are extremely low (1-4%) across the entire tree, which is concerning. I know alignment quality can affect bootstrap support, but I had hoped that running the alignments through trimAl would reduce any issues, especially as the dataset is too large to edit manually.

Are there other potential sources of error that could be causing such low bootstrap values?

bootstrap phylogenetic tree • 3.7k views

updated 4.0 years ago by Michael 55k • written 4.0 years ago by alicecol ▴ 20

There could be multiple reasons, for example that the two genes convey conflicting phylogenetic information. Why did you choose exactly those two (a ribosomal RNA gene plus a mitochondrial protein-coding gene)?

Some more ideas:

- The alignments are of DNA sequences, I guess? And I assume you used only the CDS of the COI gene?
- How did you choose your substitution model? Run ModelTest. I guess that part is difficult for a combination of protein-coding and RNA-coding genes; for example, you couldn't use a codon model here.
- Did you visually inspect your alignments or plot a PCA? I expect the COI alignment to be much worse than the 18S one.
- I would swap steps 2 and 3 (concatenate first, then trim).
- Replace iqtree2 with RAxML.
- Replace trimAl with Gblocks or, better, leave this step out completely if computationally feasible.
- Run the phylogenetic reconstruction separately for each gene, choosing an adequate substitution model in each case (or do an amino-acid-based MSA for COI), and calculate a consensus tree.
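With 427 species, a quick numerical check can complement visual inspection. The following is a minimal sketch, assuming aligned FASTA input; the helper names, the 50% threshold, and the choice to count `?` and `N` as missing data are all assumptions for illustration. It flags sequences that are mostly gaps in a given gene alignment.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(chunks) for n, chunks in records.items()}


def gap_fraction(seq, missing="-?Nn"):
    """Fraction of positions that are gaps or missing data."""
    if not seq:
        return 1.0
    return sum(c in missing for c in seq) / len(seq)


def flag_gappy(alignment, threshold=0.5):
    """Names of sequences whose gap fraction exceeds the threshold."""
    return sorted(n for n, s in alignment.items() if gap_fraction(s) > threshold)


# Toy alignment (hypothetical data): sp3 is mostly gaps.
toy = """>sp1
ACGTACGTAC
>sp2
ACGTAC-TAC
>sp3
------GTA-
"""
aln = read_fasta(toy)
for name, seq in aln.items():
    print(name, round(gap_fraction(seq), 2))
print("flagged:", flag_gappy(aln))
```

Running this per gene, before concatenation, makes it easier to see whether one of the two alignments contributes most of the poorly aligned or missing data.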

In conclusion, I think the choice or combination of genes is the problem here. If you want to do a multi-gene phylogeny I would start from a protein-level alignment of single-copy orthologs or stick with the 18S sequence only, or both.

4.0 years ago by Michael 55k

Thanks for the advice! I decided to use both 18S and COI to provide resolution among both closely and more distantly related species, since there is considerable taxonomic diversity in my species set.

The substitution model for each gene sub-alignment was selected by iqtree2, which uses ModelFinder. I inspected the COI and 18S alignments individually and ran the phylogenetic reconstruction for each gene; in both cases the COI alignment and tree seem better than the 18S ones. The COI tree has bootstrap values of around 12%, while the 18S bootstrap values are around 1.5%.

4.0 years ago by alicecol ▴ 20

OK, so now I'd say things have slightly improved, but it also becomes harder to give more advice without seeing the actual tree and input data. Do all nodes have low support values, or are there some with good support as well? You could also experiment with different tools and parameters, but it is easy to get absorbed by the many options and combinations. Some more ideas:

- Your species might be so diverse that this is expected (but we don't know what they are); it might make sense to restrict your selection to related taxa.
- I hope you used only the transcript/exon sequences of the 18S genes, correct?
- My best bet is to try to improve the alignment; try, for example, SSU-align, Gblocks, or ClustalO.
- Exclude some outlier species that may mess up the tree through long-branch attraction.
- Change the tools for each step in the pipeline, including the phylogenetic reconstruction method.
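To answer the question of whether all nodes are poorly supported or only some, the support values can be pulled straight out of the Newick tree file. This is a minimal sketch, assuming a Newick string with a single numeric support label per internal node and no parentheses inside taxon names; the function names and the cutoff of 70 are illustrative assumptions, not a recommendation.

```python
import re
from statistics import median

# A support label is the number that directly follows a closing parenthesis.
SUPPORT = re.compile(r"\)(\d+(?:\.\d+)?)")


def support_values(newick):
    """Return all internal-node support labels as floats."""
    return [float(v) for v in SUPPORT.findall(newick)]


def summarize(newick, cutoff=70.0):
    """Count labelled nodes, their median support, and how many reach the cutoff."""
    vals = support_values(newick)
    if not vals:
        return {"nodes": 0, "median": None, "well_supported": 0}
    return {
        "nodes": len(vals),
        "median": median(vals),
        "well_supported": sum(v >= cutoff for v in vals),
    }


# Toy tree (hypothetical): one well-supported clade, two weak ones.
tree = "((sp1:0.1,sp2:0.2)95:0.05,(sp3:0.3,(sp4:0.1,sp5:0.1)3:0.2)12:0.1);"
print(summarize(tree))
```

If a handful of shallow clades are well supported while the deep nodes are not, that points towards the taxon sampling being too broad for these two markers rather than a problem with the pipeline as a whole.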

To make this easier, I would create a smaller subset of, say, 10-50 species (including some good and bad branches) to experiment with; that would also allow you to inspect the alignments visually.
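A minimal sketch of how such a subset could be drawn, assuming the alignment is already loaded as a dict of taxon name to aligned sequence. The function names, the `keep` argument for hand-picked taxa, and the fixed seed are assumptions for illustration. Columns that contain only gaps after subsetting are dropped, although realigning the subset from the unaligned sequences would usually be preferable.

```python
import random


def subsample(alignment, n, keep=(), seed=0):
    """Pick n taxa: everything in `keep` plus a reproducible random draw."""
    kept = [t for t in keep if t in alignment]
    pool = sorted(t for t in alignment if t not in kept)
    rng = random.Random(seed)
    n_random = max(0, min(n - len(kept), len(pool)))
    chosen = kept + rng.sample(pool, n_random)
    return {t: alignment[t] for t in chosen}


def drop_gap_only_columns(alignment):
    """Remove alignment columns that are gaps in every remaining sequence."""
    seqs = list(alignment.values())
    if not seqs:
        return {}
    cols = [i for i in range(len(seqs[0])) if any(s[i] != "-" for s in seqs)]
    return {t: "".join(s[i] for i in cols) for t, s in alignment.items()}


# Toy alignment (hypothetical data) with six taxa.
full = {
    "sp1": "AC-GT",
    "sp2": "AC-GA",
    "sp3": "ACTGT",
    "sp4": "AC-CT",
    "sp5": "GC-GT",
    "sp6": "AC-GG",
}
small = drop_gap_only_columns(subsample(full, 4, keep=("sp1", "sp5")))
for name, seq in small.items():
    print(f">{name}\n{seq}")
```

Listing taxa from both well-resolved and poorly resolved parts of the tree in `keep` gives a test set that covers the good and bad branches.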

4.0 years ago by Michael 55k