Question

Convert GTDB database to blast database


2.5 years ago

emi • 0

I am checking for homologs of a specific gene across a representative tree of bacteria. I was given a list of representative bacteria to use; however, the list came from GTDB. Is there a way to convert the GTDB list into a taxid database that I can use to run blastp, or is there a better way to search for homologs in each of these species?

GTDB blast • 1.5k views

Updated 2.5 years ago by andres.firrincieli 3.8k • written 2.5 years ago by emi • 0


"I was given a list of representative bacteria to use, however, the list was from GTDB."

I suspect there is a specific reason the list came from GTDB rather than NCBI. GTDB assigns taxonomy from a standardized genome phylogeny, whereas NCBI taxonomy largely follows submitted and published names. As a result, the taxonomic lineage of a genome in GTDB does not necessarily match its lineage in NCBI.

"Is there a way for me to convert the GTDB to a taxid database"

Can you give an example of the GTDB list you have?

Reply • 2.5 years ago by andres.firrincieli 3.8k


RS_GCF_005380545.1 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia flexneri

Here's an example row from the list. Thanks so much for your help!
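For readers working with a list in this format, here is a minimal Python sketch of how one row could be split into an accession and a rank-by-rank lineage. It assumes the genome ID and the lineage are separated by whitespace (GTDB taxonomy files normally use a tab) and that the ID carries GTDB's `RS_` (RefSeq) or `GB_` (GenBank) prefix. The function name `parse_gtdb_row` is illustrative, not part of any GTDB tool.

```python
# Single-letter rank prefixes used in GTDB lineage strings.
RANKS = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}

def parse_gtdb_row(row):
    """Split one GTDB row into (NCBI assembly accession, {rank: name})."""
    # Split on the first run of whitespace only; species names contain spaces.
    genome_id, lineage = row.strip().split(None, 1)
    # Drop the GTDB source prefix (RS_ = RefSeq, GB_ = GenBank) if present.
    if genome_id.startswith(("RS_", "GB_")):
        accession = genome_id[3:]
    else:
        accession = genome_id
    taxa = {}
    for field in lineage.split(";"):
        prefix, name = field.split("__", 1)
        taxa[RANKS[prefix.strip()]] = name.strip()
    return accession, taxa

row = ("RS_GCF_005380545.1\td__Bacteria;p__Proteobacteria;"
       "c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;"
       "g__Escherichia;s__Escherichia flexneri")
accession, taxa = parse_gtdb_row(row)
print(accession, taxa["species"])
```

Keeping the GTDB lineage next to each accession makes it possible to report later BLAST hits under the GTDB names rather than the NCBI ones.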

Reply • 2.5 years ago by emi • 0


GCF_005380545.1 is the NCBI Assembly accession of the Escherichia flexneri genome in your GTDB list (NCBI identifies it as Escherichia coli!). The RS_ prefix is added by GTDB to mark a RefSeq assembly; GenBank assemblies are prefixed with GB_. You can use these accession numbers to download the protein FASTA file (.faa) of each genome in the list and then build your database with makeblastdb.
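The steps above can be sketched in Python as follows. This is only one possible workflow, not necessarily the one intended here: it assumes the NCBI `datasets` command-line tool and BLAST+ are installed, and the file names (`accessions.txt`, `gtdb_reps.zip`, `gtdb_reps.faa`) and the database name `gtdb_reps` are placeholders chosen for illustration. The script only builds and prints the commands; it does not run them.

```python
import shlex

def strip_gtdb_prefix(genome_id):
    """Remove GTDB's RS_/GB_ prefix to recover the NCBI assembly accession."""
    return genome_id[3:] if genome_id.startswith(("RS_", "GB_")) else genome_id

def accessions_from_rows(rows):
    """Collect accessions from GTDB rows (first whitespace-separated column)."""
    return [strip_gtdb_prefix(r.split()[0]) for r in rows if r.strip()]

def build_commands(accession_file, combined_faa, db_name):
    """Return the download and makeblastdb commands as argument lists."""
    download = ["datasets", "download", "genome", "accession",
                "--inputfile", accession_file,   # one accession per line
                "--include", "protein",          # request protein FASTA only
                "--filename", "gtdb_reps.zip"]   # placeholder archive name
    # Between the two steps, unzip the archive and concatenate the
    # per-genome protein.faa files into combined_faa.
    makedb = ["makeblastdb", "-in", combined_faa,
              "-dbtype", "prot", "-out", db_name]
    return download, makedb

rows = ["RS_GCF_005380545.1\td__Bacteria;p__Proteobacteria;"
        "c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;"
        "g__Escherichia;s__Escherichia flexneri"]
accessions = accessions_from_rows(rows)
download_cmd, makedb_cmd = build_commands("accessions.txt",
                                          "gtdb_reps.faa", "gtdb_reps")
print("\n".join(accessions))        # contents for accessions.txt
print(shlex.join(download_cmd))
print(shlex.join(makedb_cmd))
```

Because protein IDs alone do not say which genome they came from, it may help to keep one .faa file per accession, or to prepend the accession to each FASTA header before concatenating, so that blastp hits can be traced back to the species in the GTDB list.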

Reply • 2.5 years ago by andres.firrincieli 3.8k

score 0 · Answer 1 · 2022-06-14


2.5 years ago

Mensur Dlakic ★ 28k

This could help:

https://gtdb.ecogenomic.org/tools
