I have some newly sequenced genomes. Now I would like to know where else in the world these genomes can be found, and in which habitats. My idea is to BLAST all of my genomic sequences (not only 16S) against some kind of metagenome database, preferably one of assembled contigs.
So far I have found that MG-RAST and EBI host the raw data, but neither offers a utility for comparing custom sequences. Their sequences are on the order of 100 TB, so downloading them would require an AWS Snowmobile, and that is just for one BLAST run.
Does anyone have a good idea of how to do this?
But I want to know where my specific microbe occurs, not its taxonomy.
The idea of using GTDB is to build a custom database based on the taxonomy of your strain. For example, if the GTDB toolkit reports that your bugs belong to Pseudomonadaceae, then there is no need to download the whole EBI or NCBI database, only the genomes and MAGs that share the taxonomic affiliation of your genome...
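As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes a two-column, tab-separated taxonomy table (accession, then a semicolon-separated lineage with rank prefixes such as `f__`), which is my understanding of how the GTDB taxonomy files are laid out; check it against the release you download. The accessions and the function name `select_accessions` are made up for the example.

```python
import io


def select_accessions(taxonomy_lines, rank_label):
    """Return accessions whose lineage contains rank_label (e.g. 'f__Pseudomonadaceae')."""
    selected = []
    for line in taxonomy_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        accession, lineage = fields[0], fields[1]
        ranks = [rank.strip() for rank in lineage.split(";")]
        if rank_label in ranks:
            selected.append(accession)
    return selected


# Toy table with placeholder accessions (not real GTDB entries).
toy_table = io.StringIO(
    "GENOME_A\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas;s__Pseudomonas aeruginosa\n"
    "GENOME_B\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli\n"
)

family_hits = select_accessions(toy_table, "f__Pseudomonadaceae")
print(family_hits)
```

The resulting accession list is what you would then download and turn into a local BLAST database, which is far smaller than the full archive.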
The main requirement stated by the OP is this:
The only way to do that is to identify the submitted samples that have hits and check where they were collected.
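To make that concrete, here is a minimal Python sketch of the bookkeeping, assuming you already have BLAST tabular output (`-outfmt 6`, whose first four columns are qseqid, sseqid, pident, length). The two lookup tables (contig to sample, sample to habitat) are hypothetical and would have to be built from whatever metadata the repository provides; the identity and length cutoffs are arbitrary placeholders, not recommendations.

```python
from collections import defaultdict


def habitats_with_hits(blast_lines, contig_to_sample, sample_to_habitat,
                       min_identity=95.0, min_length=1000):
    """Group samples containing a passing hit by the habitat they came from."""
    by_habitat = defaultdict(set)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        subject_id = fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if identity < min_identity or length < min_length:
            continue  # too weak to count as "the same microbe"
        sample = contig_to_sample.get(subject_id)
        if sample is None:
            continue  # contig without sample metadata
        by_habitat[sample_to_habitat.get(sample, "unknown")].add(sample)
    return {habitat: sorted(samples) for habitat, samples in by_habitat.items()}


# Toy data: all identifiers below are invented for illustration.
blast_lines = [
    "my_contig_1\tmg_contig_17\t99.1\t5200\t40\t2\t1\t5200\t300\t5499\t0.0\t9300",
    "my_contig_2\tmg_contig_42\t97.5\t3100\t70\t3\t10\t3109\t1\t3100\t0.0\t5400",
    "my_contig_1\tmg_contig_88\t81.0\t400\t70\t5\t1\t400\t1\t400\t1e-50\t200",
]
contig_to_sample = {"mg_contig_17": "sample_A", "mg_contig_42": "sample_B",
                    "mg_contig_88": "sample_C"}
sample_to_habitat = {"sample_A": "soil", "sample_B": "soil", "sample_C": "marine"}

result = habitats_with_hits(blast_lines, contig_to_sample, sample_to_habitat)
print(result)
```

The third toy hit is dropped by the cutoffs, so only the two soil samples are reported. Collection coordinates could be carried through the same way as the habitat label.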