Is there a phylogenetic tree with edge lengths that covers all of the bacterial genomes in, e.g., RefSeq or GenBank at NCBI?
Essentially, I am looking for an extension of the NCBI Taxonomy that includes phylogenetic distances (edge lengths).
Looking around, I found the following related but unsatisfactory resources:
The NCBI tool CommonTree outputs a tree for a given set of input taxonomy IDs, but the resulting tree has no edge lengths, so it is just the NCBI Taxonomy in a different format.
The NCBI Taxonomy FTP does not have any tree-like files or anything with phylogenetic distance-like information.
phyloT has a tree with distances, but it requires a subscription or payment, and I really need something that stays up to date with the rapidly evolving NCBI repositories (RefSeq, GenBank, Taxonomy).
The Genome Taxonomy Database (GTDB) has bacterial and archaeal trees, and the nodes and tips of those trees seem to carry RefSeq/GenBank assembly IDs, but GTDB uses its own taxonomy assignments rather than NCBI's, so its tree does not correspond to the NCBI Taxonomy.
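To make the distinction between a topology-only tree and a tree with edge lengths concrete, here is a minimal sketch in Python that extracts edge lengths from a Newick string. The function name `edge_lengths`, the regular expression, and both toy trees (including their numeric values) are hypothetical illustrations, not output from CommonTree, phyloT, or GTDB. The sketch also assumes unquoted labels, since a quoted label containing a colon would confuse the pattern.

```python
import re

# Assumption: edge lengths appear as ":<non-negative number>" after a label
# or a closing parenthesis, and labels are unquoted.
_EDGE_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def edge_lengths(newick: str) -> list:
    """Return every edge length found in a Newick string (empty list if none)."""
    return [float(value) for value in _EDGE_LENGTH.findall(newick)]


# Hypothetical toy trees: the first carries topology only,
# the second carries made-up edge lengths.
topology_only = "((Escherichia_coli,Salmonella_enterica),Bacillus_subtilis);"
with_lengths = (
    "((Escherichia_coli:0.02,Salmonella_enterica:0.03):0.40,"
    "Bacillus_subtilis:0.55);"
)

print(edge_lengths(topology_only))
print(edge_lengths(with_lengths))
```

An empty result means the tree carries topology only, which is the limitation described above for taxonomy-derived trees.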
I don't think such a tree exists. However, I can say with great confidence that even the bacterial portion of RefSeq includes a very large number of misannotated genomes. Using my own methodology, I have independently arrived at largely the same conclusions as e.g. this repo, although my resolution is higher 😎