Hello,
I have a set of genomes that I downloaded from NCBI. I would like to build a reference phylogenetic tree that contains only these genomes.
Instead of aligning them or using Mash distances to build my own tree, is there a way I can simply provide the genomes or taxa to GTDB and get a tree back?
Nicely done. However, pruning the tree will not work for user-supplied genomes that are not already in the tree.
Good point. I think it's possible to "place" those tips using IQ-TREE2's constrained tree search option (-g).
Here the pruned tree is supplied as a topological constraint: IQ-TREE keeps the relationships in the pruned tree fixed and infers the full tree, adding the "new" user-specific sequences wherever they fit best.
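A minimal sketch of that constrained run, driven from Python. It assumes you already have an alignment (for example a concatenated marker-gene alignment) that contains both the GTDB reference genomes kept in the pruned tree and the new user genomes, with sequence names matching the tree's tip labels; the file names in the usage line are hypothetical. -s (alignment), -g (constraint tree), -m MFP (ModelFinder Plus), -T and --prefix are standard IQ-TREE 2 options; check them against your installed version.

```python
import shlex
import subprocess
from pathlib import Path


def build_constrained_iqtree_cmd(alignment, constraint_tree, prefix,
                                 model="MFP", threads="AUTO", binary="iqtree2"):
    """Return the argument list for a constrained IQ-TREE 2 search.

    alignment:       alignment with the reference genomes AND the new genomes
    constraint_tree: Newick file holding the pruned GTDB tree; its tips must
                     match sequence names in the alignment (new genomes are
                     simply absent from it and get placed freely)
    """
    return [
        binary,
        "-s", str(alignment),        # input alignment
        "-g", str(constraint_tree),  # topological constraint tree
        "-m", model,                 # MFP = ModelFinder Plus (model selection)
        "-T", str(threads),          # threads; AUTO lets IQ-TREE choose
        "--prefix", str(prefix),     # prefix for all output files
    ]


def run_constrained_search(alignment, constraint_tree, prefix, **kwargs):
    """Run IQ-TREE and return the path of the resulting tree file."""
    cmd = build_constrained_iqtree_cmd(alignment, constraint_tree, prefix, **kwargs)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(f"{prefix}.treefile")  # IQ-TREE writes the ML tree here
```

Usage (hypothetical file names): run_constrained_search("markers_aln.faa", "gtdb_pruned.tree", "placed"). Building the command in a separate function makes it easy to inspect or log before launching a long run.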
I think it is very likely that most, if not all, of the user's NCBI genomes are already in GTDB. The GTDB reference tree covers approximately 80,000 bacterial species, each represented by a single genome (the species representative). The complete GTDB database, however, contains around 400,000 genomes, both representative and non-representative. The full genome list, including NCBI identifiers, is in the GTDB metadata file, which can be downloaded from https://data.gtdb.ecogenomic.org/releases/latest/ and describes every genome in the database.
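A minimal sketch of looking up your NCBI accessions in that metadata file and finding the species representative (the genome that appears as a tip in the reference tree) for each one. The column names accession, gtdb_genome_representative, gtdb_taxonomy and ncbi_genbank_assembly_accession, and the RS_/GB_ prefixes on GTDB accessions, are assumptions based on recent releases; check the header of the file you download. Matching ignores the version suffix (.1, .2, ...).

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator


def strip_gtdb_prefix(accession: str) -> str:
    """'RS_GCF_000000001.1' -> 'GCF_000000001.1' (assumed RS_/GB_ prefixes)."""
    for prefix in ("RS_", "GB_"):
        if accession.startswith(prefix):
            return accession[len(prefix):]
    return accession


def base_accession(accession: str) -> str:
    """Drop the version suffix: 'GCF_000000001.1' -> 'GCF_000000001'."""
    return accession.partition(".")[0]


def read_metadata_lines(path) -> Iterator[str]:
    """Yield lines from a (possibly gzipped) GTDB metadata TSV."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as fh:
        yield from fh


def map_to_gtdb(metadata_lines: Iterable[str],
                ncbi_accessions: Iterable[str]) -> Dict[str, dict]:
    """Map each requested (version-less) NCBI accession to its GTDB record."""
    wanted = {base_accession(a) for a in ncbi_accessions}
    hits: Dict[str, dict] = {}
    for row in csv.DictReader(metadata_lines, delimiter="\t"):
        # A genome may be listed under its RefSeq (GCF) accession while the
        # user has the GenBank (GCA) one, so try both identifiers.
        keys = {base_accession(strip_gtdb_prefix(row["accession"]))}
        genbank = row.get("ncbi_genbank_assembly_accession")
        if genbank:
            keys.add(base_accession(genbank))
        for key in keys & wanted:
            hits[key] = {
                "gtdb_accession": row["accession"],
                "representative": row["gtdb_genome_representative"],
                "taxonomy": row["gtdb_taxonomy"],
            }
    return hits
```

Usage: hits = map_to_gtdb(read_metadata_lines("bac120_metadata.tsv.gz"), my_accessions), where the file name is only an example. The set {h["representative"] for h in hits.values()} then gives the tips to keep when pruning the reference tree, and any accession missing from hits is a genome that would need to be placed separately.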