2.5 years ago
emi
I am checking for homologs of a specific gene across a representative tree of bacteria. I was given a list of representative bacteria to use; however, the list comes from GTDB. Is there a way to convert the GTDB list into a taxid database I can use to run blastp, or is there a better way to search for homologs in each of these species?
There is probably a specific reason the list came from GTDB rather than NCBI: GTDB is a curated, genome-based taxonomy, whereas the NCBI taxonomy is not built that way. As a result, the lineage GTDB assigns to a genome does not necessarily match the lineage NCBI assigns to the same genome.
Can you give an example of the GTDB list you have?
RS_GCF_005380545.1 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia flexneri
Here's an example row from the list. Thanks so much for your help!
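A row like the one above can be split into the NCBI accession and the GTDB lineage. This is a minimal sketch, assuming one genome per line with the ID and lineage separated by whitespace; the function name `parse_gtdb_row` is hypothetical. It relies on the fact that GTDB prefixes genome IDs with `RS_` (RefSeq) or `GB_` (GenBank), so dropping that prefix leaves the NCBI Assembly accession.

```python
from __future__ import annotations


def parse_gtdb_row(line: str) -> tuple[str, str, dict[str, str]]:
    """Split a GTDB row into (NCBI accession, GTDB species, {rank: name})."""
    # Genome ID and lineage are separated by the first run of whitespace
    # (a tab in GTDB files); the lineage itself may contain spaces.
    genome_id, lineage = line.strip().split(None, 1)
    # GTDB prefixes IDs with RS_ (RefSeq) or GB_ (GenBank); dropping the
    # prefix leaves the NCBI Assembly accession.
    if genome_id.startswith(("RS_", "GB_")):
        accession = genome_id[3:]
    else:
        accession = genome_id
    # Each lineage field looks like "g__Escherichia": rank letter, "__", name.
    ranks: dict[str, str] = {}
    for field in lineage.split(";"):
        rank, _, name = field.strip().partition("__")
        ranks[rank] = name
    return accession, ranks.get("s", ""), ranks


if __name__ == "__main__":
    row = ("RS_GCF_005380545.1\td__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;"
           "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia flexneri")
    accession, species, _ = parse_gtdb_row(row)
    print(accession, species)  # GCF_005380545.1 Escherichia flexneri
```

Collecting the accession from every row, one per line, gives a plain list that can be used to fetch each genome from NCBI.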
GCF_005380545.1 is the NCBI Assembly accession of the Escherichia flexneri genome in your GTDB list (in NCBI it is identified as Escherichia coli!). The RS_ prefix is just GTDB's marker that the genome comes from RefSeq (GB_ marks GenBank), so drop it to get the NCBI accession. You can use these accession numbers to download the protein FASTA file (.faa) of each genome in the list and build your database with makeblastdb.
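A minimal sketch of the merge step, assuming the proteomes were fetched with the NCBI Datasets CLI (for example `datasets download genome accession --inputfile accessions.txt --include protein`) and unzipped, which places each genome's proteins at `ncbi_dataset/data/<accession>/protein.faa`. The helper names (`tag_headers`, `merge_proteomes`, `makeblastdb_command`), the output names, and the `__` separator are illustrative choices, not part of any tool. The script prefixes every FASTA header with its genome accession, concatenates all proteomes into one file, and prints a `makeblastdb -in ... -dbtype prot -out ...` command for that file.

```python
from __future__ import annotations

import shlex
from pathlib import Path
from typing import Iterable, Iterator

SEP = "__"  # illustrative separator between accession and protein ID


def tag_headers(lines: Iterable[str], accession: str) -> Iterator[str]:
    """Prefix each FASTA header with the genome accession; pass sequence lines through."""
    for line in lines:
        if not line.endswith("\n"):
            line += "\n"  # guard against a missing final newline before the next file
        if line.startswith(">"):
            yield f">{accession}{SEP}{line[1:]}"
        else:
            yield line


def merge_proteomes(data_dir: Path, out_fasta: Path) -> list[str]:
    """Concatenate <data_dir>/<accession>/protein.faa into one tagged FASTA.

    Assumes the NCBI Datasets layout ncbi_dataset/data/<accession>/protein.faa.
    Returns the accessions that had a protein file, in sorted order.
    """
    found: list[str] = []
    with out_fasta.open("w") as out:
        for faa in sorted(data_dir.glob("*/protein.faa")):
            accession = faa.parent.name  # the directory name is the accession
            with faa.open() as handle:
                out.writelines(tag_headers(handle, accession))
            found.append(accession)
    return found


def makeblastdb_command(fasta: Path, db_name: str) -> str:
    """Build the shell command that turns the merged FASTA into a protein BLAST database."""
    return shlex.join(["makeblastdb", "-in", str(fasta), "-dbtype", "prot", "-out", db_name])


if __name__ == "__main__":
    data = Path("ncbi_dataset/data")  # assumed location after unzipping the download
    if data.is_dir():
        merged = Path("representatives.faa")
        accessions = merge_proteomes(data, merged)
        print(f"merged {len(accessions)} proteomes into {merged}")
        print(makeblastdb_command(merged, "representatives"))
```

Putting the accession into every defline is the design choice here: one combined database keeps the blastp run simple, and each hit's subject title can still be traced back to the genome (and so to the GTDB row) it came from.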