How to search for similar sequences in all public metagenomes?
3.8 years ago
Dgg32 ▴ 90

I have some newly sequenced genomes. Now I would like to know where else can this genome be found around the world and in which habitats. My idea is to BLAST all my genomic sequences (not only 16S) against some kind of metagenome database, preferably one made of assembled contigs.

So far, MG-RAST and EBI have the raw data, but they don't offer any utilities for custom sequence comparisons. Downloading their sequences, which are on the order of 100 TB, would require an AWS Snowmobile, and that is just for one BLAST search.

So does anyone have good ideas as to how to do this please?

metagenome • 875 views
3.8 years ago
GenoMax 148k

Looks like your best bet is to download the protein dataset from EBI metagenomics. You could then use DIAMOND or blastp to do the searches locally. Looks like they update it every six months.
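A minimal sketch of what that local search could look like, assuming DIAMOND is installed and that you already have predicted proteins from your genomes in a FASTA file. The file names and the database name `metagenome_proteins` are hypothetical placeholders, and the e-value cutoff is only an illustrative default.

```python
import shlex
import subprocess


def diamond_commands(db_fasta, query_fasta, out_tsv, threads=8, evalue=1e-5):
    """Return the two DIAMOND invocations (makedb, blastp) as argument lists."""
    db_prefix = "metagenome_proteins"  # hypothetical database name
    makedb = ["diamond", "makedb", "--in", db_fasta, "--db", db_prefix]
    search = [
        "diamond", "blastp",
        "--db", db_prefix,
        "--query", query_fasta,
        "--out", out_tsv,
        "--outfmt", "6",  # BLAST-style tabular output
        "--evalue", str(evalue),
        "--threads", str(threads),
    ]
    return makedb, search


def run_search(db_fasta, query_fasta, out_tsv):
    """Build the database, then search it. Requires diamond on the PATH."""
    for cmd in diamond_commands(db_fasta, query_fasta, out_tsv):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_search("downloaded_proteins.faa", "my_proteins.faa", "hits.tsv")` would run both steps. The database only needs to be built once per download.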

You could BLAST representative sequences against SRA using the -remote option of BLAST+, but that is not assembled contigs.

3.8 years ago
Mensur Dlakic ★ 28k

The GTDB toolkit should be good enough to classify most of your sequences. In some cases you may be able to get the order, family or species designations. That may help to focus your efforts on a narrower group of microbes.


But I want to know where my specific microbe occurs, not the taxonomy.


The idea of using GTDB is to create a custom database based on the taxonomy of your strain. For example, if the GTDB toolkit shows that your bugs are affiliated with Pseudomonadaceae, then there is no need to download the whole EBI or NCBI database, just the genomes and MAGs sharing the taxonomic affiliation of your genome...
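As a rough sketch of that filtering step, the snippet below pulls the family out of a GTDB-style lineage string and keeps only the accessions from a metadata table that share it. It assumes a tab-separated table with `accession` and `gtdb_taxonomy` columns; check the column names in the file you actually download, since they are an assumption here.

```python
import csv
import io


def rank_of(classification, prefix="f__"):
    """Pull one rank (default: family) out of a GTDB-style lineage string."""
    for part in classification.split(";"):
        part = part.strip()
        if part.startswith(prefix):
            return part[len(prefix):]
    return ""


def accessions_in_family(metadata_tsv, family):
    """Yield accessions whose lineage belongs to the given family.

    metadata_tsv is an open file (or any iterable of lines) with a header
    row containing 'accession' and 'gtdb_taxonomy' (assumed column names).
    """
    for row in csv.DictReader(metadata_tsv, delimiter="\t"):
        if rank_of(row["gtdb_taxonomy"]) == family:
            yield row["accession"]


# Tiny made-up table standing in for a real metadata download.
example = io.StringIO(
    "accession\tgtdb_taxonomy\n"
    "genome_A\td__Bacteria;p__P1;c__C1;o__O1;f__Pseudomonadaceae;g__G1;s__S1\n"
    "genome_B\td__Bacteria;p__P2;c__C2;o__O2;f__Bacillaceae;g__G2;s__S2\n"
)
wanted = list(accessions_in_family(example, "Pseudomonadaceae"))
```

The resulting accession list can then be used to download only those genomes and MAGs and build a much smaller search database.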


The main requirement stated by the OP is this:

Now I would like to know where else can this genome be found around the world and in which habitats.

The only way to do that is to identify submitted samples that have hits and check where they were collected.
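One way this could be sketched, assuming tabular search output (BLAST/DIAMOND outfmt 6, where column 2 is the subject ID and column 3 the percent identity) and a sample metadata table you have assembled yourself. The `SAMPLE|contig` subject ID layout, the `sample`, `biome` and `location` column names, and the 95% identity cutoff are all hypothetical choices for illustration.

```python
import csv
import io
from collections import Counter


def habitats_with_hits(hits_tsv, sample_metadata_tsv, min_identity=95.0):
    """Count distinct samples with at least one hit, grouped by (biome, location)."""
    # Map each sample accession to where it was collected.
    meta = {}
    for row in csv.DictReader(sample_metadata_tsv, delimiter="\t"):
        meta[row["sample"]] = (row["biome"], row["location"])

    samples_hit = set()
    for fields in csv.reader(hits_tsv, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and comment lines
        subject, identity = fields[1], float(fields[2])
        if identity < min_identity:
            continue
        # Assumption: subject IDs look like "SAMPLE|contig".
        samples_hit.add(subject.split("|", 1)[0])

    # Samples missing from the metadata table are silently ignored.
    return Counter(meta[s] for s in samples_hit if s in meta)


# Made-up data: two soil samples pass the cutoff, the marine hit does not.
hits = io.StringIO(
    "q1\tS1|c1\t98.0\n"
    "q2\tS1|c2\t99.0\n"
    "q1\tS2|c9\t80.0\n"
    "q3\tS3|c4\t97.5\n"
)
metadata = io.StringIO(
    "sample\tbiome\tlocation\n"
    "S1\tsoil\tGermany\n"
    "S2\tmarine\tPacific\n"
    "S3\tsoil\tGermany\n"
)
summary = habitats_with_hits(hits, metadata)
```

Counting distinct samples rather than raw hits keeps one deeply sequenced sample from dominating the habitat summary.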
