Construction of a phylogenetic tree for thousands of bacterial genomes
5 months ago
biobiu

Hi, I wish to build a phylogenetic tree for a thousand microbial genomes that span high diversity. Can anyone recommend the steps or a pipeline for doing so? I was thinking of a 16S tree but am open to other suggestions. Thanks

5 months ago
Mensur Dlakic

There is already a tree that includes 100,000+ bacterial species.

https://gtdb.ecogenomic.org/

I am not sure what exactly you are trying to achieve, but for practical purposes it is impossible to look at, or to publish, a tree with more than a few hundred branches. It is also practically impossible to compute such a tree without taking shortcuts, which means using FastTree, hash-based trees as suggested, or other approximations.
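Hash-based trees are usually built in two stages: pairwise genome distances are computed (for example with Mash), and a distance method such as neighbor joining turns the matrix into a tree. The sketch below is a plain, unoptimized neighbor-joining implementation for illustration only; it is not how FastTree or any Mash-based pipeline works internally, and its O(n³) running time hints at why thousands of genomes need optimized tools. It assumes a small symmetric distance matrix with at least three taxa; the taxon names and distances in the example are a textbook toy matrix, not real data.

```python
def neighbor_joining(names, dist):
    """Return an unrooted Newick string built by neighbor joining.

    names: leaf labels (at least 3); dist: symmetric list-of-lists of distances.
    """
    nodes = list(names)
    d = [[float(x) for x in row] for row in dist]  # work on a copy
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # row sums used by the Q criterion
        # Find the pair (i, j) that minimises Q(i, j) = (n - 2) d_ij - r_i - r_j.
        best, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        # Branch lengths from the new internal node to the joined pair.
        li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        joined = f"({nodes[bi]}:{li:.4g},{nodes[bj]}:{lj:.4g})"
        # Distances from the new node to every remaining node.
        keep = [k for k in range(n) if k not in (bi, bj)]
        new_row = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, val in zip(d, new_row):
            row.append(val)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes left: attach them to one central node.
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = d[0][1] - la
    lc = d[0][2] - la
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, dist))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

On real, non-additive distances neighbor joining can return negative branch lengths, so results from a toy implementation like this should be treated with care.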

My suggestion is to be sure you know what you are doing and what you are trying to achieve, as this is a non-trivial task that can take many months to complete and still not be appreciated by either the general public or reviewers. I think one can easily make a diverse bacterial tree with 150-200 entries, and it should be built from concatenated single-copy marker genes rather than 16S rRNA (see the exact procedure at the link above).
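Concatenating marker genes means aligning each marker separately and then joining the per-marker alignments, genome by genome, into a single supermatrix that a tree program can read; a genome lacking a marker gets a block of gaps for it. A minimal sketch of that joining step, assuming the per-marker alignments already exist in a hypothetical nested-dict layout. The marker names, genome IDs and sequences are made up, and this is not GTDB's own code.

```python
def concatenate_alignments(per_marker, genomes):
    """Build a supermatrix from per-marker alignments.

    per_marker: {marker_name: {genome_id: aligned_sequence}}; all sequences
    within one marker must share the same aligned length.
    Returns {genome_id: concatenated_sequence}.
    """
    parts = {g: [] for g in genomes}
    for marker in sorted(per_marker):  # fixed, reproducible marker order
        aln = per_marker[marker]
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{marker}: sequences are missing or not aligned")
        width = lengths.pop()
        for g in genomes:
            # A genome without this marker gets gaps of the same width.
            parts[g].append(aln.get(g, "-" * width))
    return {g: "".join(blocks) for g, blocks in parts.items()}


def to_fasta(supermatrix):
    """Format {genome_id: sequence} as FASTA text."""
    return "".join(f">{g}\n{seq}\n" for g, seq in supermatrix.items())


per_marker = {
    "markerA": {"g1": "MK-L", "g2": "MKVL"},  # hypothetical marker alignments
    "markerB": {"g1": "AC", "g3": "AG"},
}
print(to_fasta(concatenate_alignments(per_marker, ["g1", "g2", "g3"])))
```

You may want to drop genomes that are missing many markers before concatenating, since mostly-gap rows add little signal.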

Thank you for the reply. The goal is to show the presence of a specific protein family across the diversity of genomes. I was thinking of using Mash-based methods, but I'm afraid the diversity might be too high. So I thought of using 16S, but as you say, it involves several steps that I want to make sure I do correctly.
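Mash compares genomes through the k-mers they share: it estimates the Jaccard index j of the two k-mer sets from MinHash sketches and converts it to a distance D = -(1/k) ln(2j/(1+j)). The sketch below computes the exact Jaccard index instead of a MinHash estimate (so it only suits short toy sequences) and uses forward-strand k-mers, whereas Mash uses canonical k-mers. It illustrates why very high diversity is a real concern: under a simple substitution model with per-base rate p, a k-mer survives with probability roughly (1 - p)^k, so distant genomes share almost no k-mers and D becomes noisy or undefined (j = 0). The random genome and the `mutate` model are invented for illustration.

```python
import math
import random


def kmers(seq, k):
    """Set of all k-mers in seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(seq1, seq2, k=16):
    """Mash-style distance from the exact k-mer Jaccard index.

    Returns math.inf when the sequences share no k-mers (j == 0).
    """
    a, b = kmers(seq1, k), kmers(seq2, k)
    j = len(a & b) / len(a | b)
    if j == 0:
        return math.inf
    return -1.0 / k * math.log(2 * j / (1 + j))


def mutate(seq, rate, rng):
    """Toy evolution: substitute each base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


rng = random.Random(1)
ref = "".join(rng.choice("ACGT") for _ in range(20000))
for rate in (0.01, 0.05, 0.2, 0.5):
    # Distances track the mutation rate at first, then lose resolution.
    print(rate, round(mash_distance(ref, mutate(ref, rate, rng)), 4))
```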

How would you show "the presence of a specific protein family across the diversity of genomes" by using 16S rRNA?

If I were a reviewer, you would convince me that a given protein family is widely distributed just as well by showing its presence in 100-200 well-chosen genomes as you would with 1,000 genomes. The difference is that the former is easier to execute, and the resulting tree can actually be inspected.
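Choosing the 100-200 genomes is the step that deserves the most care. One generic way to do it, offered as an illustration rather than as the procedure suggested in the reply, is greedy farthest-point (max-min) selection: start from one genome and repeatedly add the genome farthest from everything already chosen. The sketch assumes a precomputed symmetric distance matrix; the names and values in the example are invented. A simpler alternative is, for example, to take one representative genome per genus or family from the GTDB taxonomy.

```python
def farthest_point_subset(names, dist, size, start=0):
    """Greedily pick up to `size` mutually distant items (max-min selection).

    names: item labels; dist: symmetric list-of-lists of distances;
    start: index of the item picked first.
    """
    size = min(size, len(names))
    if size <= 0:
        return []
    chosen = [start]
    picked = {start}
    # nearest[i] = distance from item i to its closest already-chosen item.
    nearest = list(dist[start])
    while len(chosen) < size:
        nxt = max((i for i in range(len(names)) if i not in picked),
                  key=lambda i: nearest[i])
        chosen.append(nxt)
        picked.add(nxt)
        nearest = [min(nearest[i], dist[nxt][i]) for i in range(len(names))]
    return [names[i] for i in chosen]


# Two tight clusters: {a, b} and {c, d}.
names = ["a", "b", "c", "d"]
dist = [
    [0, 1, 10, 11],
    [1, 0, 10, 10],
    [10, 10, 0, 2],
    [11, 10, 2, 0],
]
print(farthest_point_subset(names, dist, 2))  # ['a', 'd']
```

Farthest-point selection tends to pull in outliers, such as unusually divergent or poorly assembled genomes, so it is worth checking genome quality before running it.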
