Long branch lengths, low bootstrap values and misalignments
10 months ago

Hello, I am building my first phylogeny. It is shallow: I am trying to see the relationships among multiple populations of the same species. I am using the phyluce pipeline and have tried no-trim, edge-trim and internal-trim alignments. Every run results in trees with low bootstrap values (ranging from 20 to 60), and the branch lengths are exceptionally long. I read that misalignments can lead to such results, but how do I fix this? What are the ways to correct misaligned sequences? I also tried MUSCLE, but the results weren't great. Any help is greatly appreciated, thanks!

mafft muscle iqtree
10 months ago
inedraylig

There's a lot of information missing from the question: what kind of data are you using? How was it obtained, aligned and filtered? A pipeline needs to be adjusted to your data; there is no one-size-fits-all pipeline.

Low bootstrap values are not in themselves a sign that something is wrong: they indicate weak or conflicting signal. This is a typical outcome when the groups in a data set have not diverged much, hence the low support at the nodes. Exceptionally long branches, however, can indicate long-branch attraction or mishandling of missing data, among other causes, but it's hard to tell from here. I'm not familiar with the phyluce pipeline, but I know it is designed for phylogenomics of ultraconserved elements (UCEs). Highly conserved genomic regions are, by definition, less likely to be differentiated among populations, which leads to low bootstrap values. If the data set includes only a few conserved contigs, I would personally begin by aligning the regions with MAFFT and checking them visually for misalignments, to get an idea of the alignment quality and the extent of missing data. I would also run a PCA on a data set with little or no missing data, to see whether the groups can be differentiated at all. These are a few ideas for exploring your data to understand possible caveats.
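To put a number on the extent of missing data alongside visual inspection, a small Python sketch like the one below may help. It assumes each locus alignment is available as FASTA text and treats `-`, `?` and `N` as missing; `read_fasta`, `missing_per_taxon` and `complete_columns` are hypothetical helper names, not part of phyluce or MAFFT. The last function lists the columns with no missing data in any individual, which is the kind of complete-site subset one might then use for a PCA.

```python
from typing import Dict, List

# Assumption: gaps, '?' and ambiguous 'N' all count as missing data.
MISSING = set("-?Nn")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {taxon_name: aligned_sequence}."""
    seqs: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the first word of the header
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line  # sequences may span several lines
    return seqs


def missing_per_taxon(alignment: Dict[str, str]) -> Dict[str, float]:
    """Fraction of alignment columns that are missing for each taxon."""
    return {
        name: sum(c in MISSING for c in seq) / len(seq) if seq else 1.0
        for name, seq in alignment.items()
    }


def complete_columns(alignment: Dict[str, str]) -> List[int]:
    """0-based indices of columns with no missing data in any taxon."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    return [
        i for i in range(len(seqs[0]))
        if all(s[i] not in MISSING for s in seqs)
    ]


if __name__ == "__main__":
    aln = read_fasta(""">popA_1
ACGT--GT
>popB_1
ACGTNNNN
>outgroup_1
ACG?ACGT
""")
    # List taxa from most to least missing data
    for taxon, frac in sorted(missing_per_taxon(aln).items(), key=lambda kv: -kv[1]):
        print(f"{taxon}\t{frac:.3f}")
    print("complete columns:", complete_columns(aln))
```

Individuals that sit near the top of that list across many loci are the natural candidates to inspect closely or drop.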

Thanks for your reply. I am using a dataset of UCE loci obtained from multiple individuals from many populations of the same species, plus an outgroup. We sequenced our own data, and I used MAFFT to align it with the no-trim option. I viewed the alignment, and there seems to be a lot of missing data. Do you have any suggestions for handling long-branch attraction?

LBA is a complex problem, practically impossible to address in a short online answer. Generally speaking, sequencing UCEs to differentiate populations is tricky; you might not have enough signal in the data. If you want to be able to justify your pipeline, I recommend diving into the literature and also exploring and understanding what's in your data. All of this is very normal for phylogenomic analysis, so don't be discouraged.

Short answer: filter for missing data, use a partitioned analysis, and remove individuals if needed.
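As a rough illustration of filtering for missing data, the plain-Python sketch below applies two common filters: it drops loci present in fewer than a chosen fraction of individuals, and it drops alignment columns where too many individuals are missing. The 0.75 and 0.5 thresholds, the helper names and the input layout (`{locus: {taxon: aligned sequence}}`) are illustrative assumptions, meant to show the idea rather than to replace dedicated filtering tools.

```python
from typing import Dict, List

# Assumption: gaps, '?' and ambiguous 'N' all count as missing data.
MISSING = set("-?Nn")
Alignment = Dict[str, str]  # {taxon: aligned sequence}


def is_present(seq: str) -> bool:
    """A taxon counts as present in a locus if it has any non-missing character."""
    return any(c not in MISSING for c in seq)


def filter_loci_by_occupancy(
    loci: Dict[str, Alignment], all_taxa: List[str], min_fraction: float = 0.75
) -> Dict[str, Alignment]:
    """Keep loci in which at least `min_fraction` of all taxa are present."""
    kept = {}
    for locus, aln in loci.items():
        present = sum(1 for t in all_taxa if t in aln and is_present(aln[t]))
        if present / len(all_taxa) >= min_fraction:
            kept[locus] = aln
    return kept


def drop_gappy_columns(aln: Alignment, max_missing: float = 0.5) -> Alignment:
    """Remove columns in which more than `max_missing` of the taxa are missing."""
    names = list(aln)
    n_cols = len(aln[names[0]])
    keep = [
        i for i in range(n_cols)
        if sum(aln[n][i] in MISSING for n in names) / len(names) <= max_missing
    ]
    return {n: "".join(aln[n][i] for i in keep) for n in names}


if __name__ == "__main__":
    loci = {
        "uce-1": {"a": "ACGT", "b": "AC-T", "c": "----"},
        "uce-2": {"a": "ACGT", "b": "ACGT", "c": "ACGA", "d": "ACGT"},
    }
    kept = filter_loci_by_occupancy(loci, ["a", "b", "c", "d"], min_fraction=0.75)
    print(sorted(kept))  # uce-1 has only 2 of 4 taxa present, so it is dropped
    print(drop_gappy_columns({"a": "A-GT", "b": "A--T", "c": "ACGN"}))
```

Stricter thresholds leave fewer loci and sites but less missing data; rerunning the tree at a few thresholds shows whether the long branches depend on how much missing data is kept.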

Are the different loci concatenated into a single unpartitioned alignment? This is something to avoid, because different loci can have different substitution rates. Instead, check whether specific loci are prone to missing data. You can build a tree for each locus, or for different combinations of taxa, and see whether specific populations or loci are prone to LBA, or whether removing a single taxon changes the overall topology. Otherwise, filtering out missing data, unreliably aligned regions, or sites that evolve much faster than others can help, as can carefully choosing the substitution model used for inference. For the latter, I recommend IQ-TREE 2: it's quite fast, the documentation is good, and it runs a number of tests of sequence composition and model selection that are very informative. A partitioned analysis may give you more robust results than concatenating all sites together.
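A partitioned analysis in IQ-TREE 2 needs a concatenated alignment plus a partition file giving each locus's coordinates. The Python sketch below builds both from per-locus alignments, padding individuals absent from a locus with `?`. The function name, the input layout and the `?` padding are assumptions for illustration; check the IQ-TREE documentation for the partition-file formats it accepts. Locus names are sanitised because characters such as `-` can be troublesome in NEXUS charset names.

```python
import re
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # {taxon: aligned sequence}


def concatenate_with_partitions(
    loci: Dict[str, Alignment], taxa: List[str]
) -> Tuple[Alignment, str]:
    """Concatenate loci in sorted order; return (supermatrix, NEXUS partition text).

    Taxa absent from a locus are padded with '?'. Taxa not listed in `taxa`
    are left out entirely.
    """
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    charsets = []
    start = 1  # NEXUS coordinates are 1-based and inclusive
    for locus in sorted(loci):
        aln = loci[locus]
        length = len(next(iter(aln.values())))
        for t in taxa:
            parts[t].append(aln.get(t, "?" * length))
        end = start + length - 1
        safe_name = re.sub(r"\W", "_", locus)  # e.g. 'uce-1' -> 'uce_1'
        charsets.append(f"    charset {safe_name} = {start}-{end};")
        start = end + 1
    nexus = "#nexus\nbegin sets;\n" + "\n".join(charsets) + "\nend;\n"
    return {t: "".join(p) for t, p in parts.items()}, nexus


if __name__ == "__main__":
    loci = {
        "uce-2": {"a": "ACG", "b": "ACT"},
        "uce-1": {"a": "AC", "c": "GG"},
    }
    supermatrix, partitions = concatenate_with_partitions(loci, ["a", "b", "c"])
    for taxon, seq in supermatrix.items():
        print(f">{taxon}\n{seq}")  # FASTA-formatted supermatrix
    print(partitions)
```

With the supermatrix saved as, say, `concat.fasta` and the partition text as `partitions.nex`, a run along the lines of `iqtree2 -s concat.fasta -p partitions.nex -m MFP -B 1000 -T AUTO` asks ModelFinder to pick a model per partition and computes ultrafast bootstrap support, while `iqtree2 -S loci_dir -T AUTO` builds one tree per locus alignment in a directory. Passing a `taxa` list with one individual removed produces a leave-one-out dataset for checking whether the topology changes. Verify these options against the documentation for your installed IQ-TREE version.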

Thank you so much for your time! Your suggestions are giving me a sense of direction :)
