Hi, I have a question specific to the EnsEMBL Compara database: how can I retrieve the species tree that was used to run TreeBest in the Compara pipeline? This question is based on release 93.
In short:
Does anyone know how to extract the EnsEMBL species tree used by TreeBest?
Detailed question:
My problem is that in the description of the protein trees pipeline, it says that:
The species tree is based on the NCBI taxonomy tree (subject to some modifications depending on new datasets).
So I am unsure whether the tree that can be downloaded manually from the species tree page is the one that includes these modifications. Is that the case?
In any case, I am interested in fetching it automatically. I am more at ease with SQL than object-oriented Perl, but I tried the API:
#!/usr/bin/env perl
use warnings;
use strict;
use Bio::EnsEMBL::Registry;
# Auto-configuration
Bio::EnsEMBL::Registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
    -port => 5306);
my $species_tree_adaptor = Bio::EnsEMBL::Registry->get_adaptor(
    'Multi', 'compara', 'SpeciesTree');
#Bio::EnsEMBL::Compara::DBSQL::SpeciesTreeAdaptor
my $species_trees = $species_tree_adaptor->fetch_all();
foreach my $tree (@{$species_trees}) {
    print $tree->toString(), "\n";
}
This code fetches something, but I think it's just an ID and a label from the species_tree_root table, whereas I would like Newick output...
Below, I tried using the SpeciesTree class directly, but create_species_tree does not work without any argument, and in any case I don't see any method of this class that writes a Newick tree. Maybe I should use SpeciesTreeNode->newick_format(), but I don't know how to get such an instance...
use Bio::EnsEMBL::Compara::Utils::SpeciesTree;
## include all available species from genome_db by default
my $species_tree = Bio::EnsEMBL::Compara::Utils::SpeciesTree->create_species_tree();
#print $species_tree->newick_format();
Additionally, I am interested in the $species_tree->ultrametrize_from_timetree() method.
Many thanks.
It seems clear to me that the tree provided on GitHub is the one used by Ensembl. As far as I know, the modifications to the NCBI tree concern the inclusion of new species.
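As for getting Newick out of the API, here is a minimal sketch rather than a definitive recipe. It reuses the connection settings from the question and assumes that each SpeciesTree object returned by the adaptor has a root() method giving its root SpeciesTreeNode, and a label() accessor for the label stored in species_tree_root. I have not run it against release 93, so check both assumptions against the API documentation for your release.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Same public server and port as in the question.
Bio::EnsEMBL::Registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
    -port => 5306,
);

my $species_tree_adaptor = Bio::EnsEMBL::Registry->get_adaptor(
    'Multi', 'compara', 'SpeciesTree');

foreach my $tree (@{ $species_tree_adaptor->fetch_all() }) {
    # Assumption: root() returns the root SpeciesTreeNode of this tree.
    my $root = $tree->root();

    # Assumption: label() returns the tree's label from species_tree_root.
    print $tree->label(), "\n";

    # newick_format() is the node-level method mentioned in the question.
    print $root->newick_format(), "\n";
}
```

If this works, it prints one Newick string per stored species tree. You would then need to pick out the tree attached to the protein-trees analysis, for example by its label or its method_link_species_set_id.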