Best method for phylogeny on bacterial genes
4.3 years ago
vujex ▴ 10

Hello!

I have approximately 40 bacterial (non-conserved) genes that I am interested in. I want to build a phylogenetic tree per gene (so 40 trees) and later (try to) take a consensus tree of those. All of the genes are around 200-500 bp. I have installed RAxML, IQ-TREE and MEGA X, and I can work with all of them. But because I don't have much theoretical background in phylogenetics, I was wondering if anyone has suggestions, or could help by explaining or pointing me to relevant theory. I have of course searched Google and the manuals of the software listed above, but I couldn't find what I was looking for.

Basically my questions are:

Is there a "best" way to build phylogenetic trees based on one bacterial gene?

&

Is there software that takes all 40 genes as input and generates a single tree from them?

Thanks a lot!

phylogeny raxml megax iqtree • 1.6k views

I can recommend:

  1. Concatenate all 40 alignments into a supermatrix, then use the supermatrix as input for a phylogenetic tool.

  2. Generate 40 separate trees and then build a final consensus tree from them (IQ-TREE has an option for this).
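To make option 1 concrete, here is a minimal Python sketch of building a supermatrix. It is only an illustration under stated assumptions: the gene and strain names are made up, the alignments are held in plain dictionaries rather than read from files, and a strain that lacks a gene is padded with gaps. The partition lines follow the RAxML-style layout, which is one common way to tell a tree-building tool where each gene starts and ends.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix.

    alignments: dict mapping gene name -> dict mapping taxon -> aligned sequence.
    Returns (supermatrix, partitions); partitions holds 1-based, inclusive
    column ranges for each gene.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            # A taxon that lacks this gene is filled with gap characters.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy data (hypothetical): two genes, the second one absent from strainC.
alignments = {
    "geneA": {"strainA": "ATGC", "strainB": "ATGA", "strainC": "ATGT"},
    "geneB": {"strainA": "GG-TA", "strainB": "GGCTA"},
}
matrix, partitions = concatenate_alignments(alignments)

for taxon, seq in matrix.items():
    print(f">{taxon}\n{seq}")           # FASTA-style supermatrix
for gene, first, last in partitions:
    print(f"DNA, {gene} = {first}-{last}")  # RAxML-style partition line
```

Keeping the partition boundaries matters because it lets each gene get its own substitution model later instead of forcing one model on the whole concatenation.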

Importantly,

There may be long-branch attraction among your species, so you have to handle this.

I recommend selecting a substitution model suited to it. For the model:

  1. IQ-TREE has the PMSF and C10-C60 models
  2. PhyloBayes has the CAT model

Thanks! I have DNA data, so I don't think I can use PMSF. I did, however, build 40 trees with GTR and took a consensus using the IQ-TREE command (iqtree -con mytrees -minsup 0.5). The result looks dodgy and not biologically meaningful, but this could be caused by the dataset.
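For anyone wondering what a 0.5 support threshold actually counts, here is a rough Python sketch of majority-rule consensus on toy data. The assumptions are mine: trees are hand-written nested tuples rather than parsed Newick, the taxon names are made up, and "more than half of the trees" is how I read the threshold, not a statement about IQ-TREE internals. Each tree is reduced to its non-trivial bipartitions (splits), and a split is kept only if it occurs in more than half of the gene trees.

```python
from collections import Counter


def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of a tree, treated as unrooted.

    Each split is stored as the side that excludes the alphabetically
    first taxon, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Skip trivial splits (a single leaf against everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def majority_splits(trees, min_support=0.5):
    """Return {split: support} for splits seen in more than min_support of trees."""
    counts = Counter(s for tree in trees for s in splits(tree))
    total = len(trees)
    return {s: n / total for s, n in counts.items() if n / total > min_support}


# Three hypothetical gene trees over taxa A-E.
gene_trees = [
    (("A", "B"), "C", ("D", "E")),
    (("A", "B"), "D", ("C", "E")),
    (("A", "C"), "B", ("D", "E")),
]
consensus = majority_splits(gene_trees)
for split, support in sorted(consensus.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(split), round(support, 2))
```

With short genes of 200-500 bp, many splits may fall below the cutoff simply because individual gene trees are poorly resolved, in which case the consensus collapses into a mostly unresolved tree.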

4.3 years ago
Lesley Sitter ▴ 610

Here are several methods I've seen so far.

Method 1: Gene array tree

Concatenate all the genes into one sequence. This is an old-school trick and only works for genes that differ by SNPs and one or two indels, not for genes that have vastly different sizes.

Method 2: Summary tree

You can build individual trees for each gene, for example with PhyML. Then use all 40 trees to create a summary tree, for example with the SumTrees program from the DendroPy package. This just summarizes the values from the individual trees.

Method 3: Clustering

If you build the 40 gene trees, there are also various clustering algorithms that use the distance between trees to reach a conclusion. For example, I just found this TreeCluster tool (I haven't actually used it, just pointing it out as an example): https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0221068
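As a rough illustration of what "distance between trees" can mean, here is a small Python sketch using the Robinson-Foulds distance, which counts the splits present in one tree but not the other. This is a generic example and not a description of how the linked tool works. It assumes each gene tree has already been reduced to its set of non-trivial bipartitions, with each split written consistently as the same side; the gene names and splits are made up.

```python
from itertools import combinations


def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: number of splits found in only one tree."""
    return len(splits_a ^ splits_b)


# Hypothetical gene trees over taxa A-E, each given as its set of
# non-trivial bipartitions (always the two-taxon side of the split).
gene_trees = {
    "gene1": {frozenset("AB"), frozenset("DE")},
    "gene2": {frozenset("AB"), frozenset("DE")},
    "gene3": {frozenset("AC"), frozenset("DE")},
    "gene4": {frozenset("AD"), frozenset("BE")},
}

# Pairwise distance table, the usual input for a clustering step.
distances = {
    (a, b): rf_distance(gene_trees[a], gene_trees[b])
    for a, b in combinations(sorted(gene_trees), 2)
}

# Mean distance of each gene tree to all the others.
mean_distance = {
    gene: sum(d for pair, d in distances.items() if gene in pair)
    / (len(gene_trees) - 1)
    for gene in gene_trees
}
most_discordant = max(mean_distance, key=mean_distance.get)

for pair, d in distances.items():
    print(pair, d)
print("most discordant:", most_discordant)
```

A gene whose tree sits far from all the others is a candidate for a closer look, for instance as a possible horizontal transfer or simply a poorly resolved alignment.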

Method 4: Network

Because your genes are bacterial, HGT might be a factor; I'm not sure. HGT introduces the extra layer of presence/absence of genes, which can completely skew the previous three methods (although the extent depends on the case). One method I heard about a lot at ECCB 2018 was phylogenetic networks. These don't treat the tree as a fixed, one-directional thing, but also consider branches rejoining (where a gene or locus moves from one species to a very distant one, through ICEs, plasmids, etc.). I have not found a good tool myself, but if you do go this route, it would be nice to add it as an answer to your own question for future readers.
