Hello,
I have a few draft bacterial genomes that I would like to BLAST in their entirety to find the closest genomes in GenBank. Can anyone suggest a tool or a way to do that?
Many thanks,
You may want to look into tools that determine the similarity based on large-scale matching of k-mers, sketches, etc.
Yes, KMCP provides some prebuilt databases for genome searching. There is no GenBank database, but you can use GTDB. Here's the tutorial.
Sourmash provides databases for GTDB too, and older GenBank databases (2018) are also available.
Both tools work well. The biggest challenge is downloading the whole GenBank database; building the database with a sketching algorithm is fast.
You can use BlobTools to identify the taxonomy of your sequences, and then use other tools to find the closest relative.
I have found an easy way to download all the reference genomes available in GenBank and include your own genomes among them.
Use PhyloPhlAn 3.0; the details are here:
https://github.com/biobakery/biobakery/wiki/PhyloPhlAn-3.0:-Example-02:-Tree-of-life
If you know the taxid you can find the closest whole genome using
gaas_ncbi_get_genome_tree.pl
from GAAS
I don't know the taxid. All I have is contig sequences.
There is no tool that aligns a draft genome against all genomes in GenBank. First you must identify the taxonomic lineage (taxid) to which your draft genomes belong. To figure out the taxid, get the 16S rRNA gene from the annotated genome, or use the Type Strain Genome Server to find the closest type strain.