Question

Refseq genomic BLAST database

7.3 years ago

chland • 0

Hi everyone, I'd like to run blastn searches against a preformatted bacterial database from NCBI, so I used the update_blastdb script to download the refseq_genomic database. All of the downloaded files are named refseq_genomic.##, where ## is 04, 07, 195, etc. How can I find out which of these files contain my bacteria of interest, such as Streptococcus or Shewanella? Thanks in advance.

NCBI BLAST database • 3.5k views

ADD COMMENT • link updated 7.3 years ago by GenoMax 153k • written 7.3 years ago by chland • 0

Thanks. So I would have to point to the directory as refseq_genomic.*.tar.gz, both for blastn searches and for extracting sequences with blastdbcmd?

ADD REPLY • link 7.3 years ago by chland • 0

Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.

ADD REPLY • link 7.3 years ago by GenoMax 153k

score 0 · Answer 1 · 2018-04-24

7.3 years ago

lieven.sterck 15k

I don't think that is even possible.

In any case, what you want (or need) to do is use all parts for your BLAST searches. Together they form a single DB, so you have to refer to it as -db refseq_genomic on your BLAST command line (omitting the .## from the name).
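
As a rough illustration of the naming rule above, here is a minimal shell sketch. It assumes the .tar.gz volumes have already been extracted into the current directory. The helper name db_name_from_volume, the query file query.fasta and the ACCESSION variable are hypothetical placeholders, not part of the original thread.

```shell
#!/usr/bin/env bash
# Sketch only: derive the name to pass to -db from a downloaded volume file.
# Volumes look like refseq_genomic.04.tar.gz; BLAST wants plain "refseq_genomic".
db_name_from_volume() {
    local file
    file=$(basename "$1")      # drop any leading directory
    file=${file%.tar.gz}       # drop the archive extension, if present
    printf '%s\n' "${file%.[0-9]*}"   # drop the trailing .## volume suffix
}

DB=$(db_name_from_volume refseq_genomic.04.tar.gz)
echo "Database name for -db: $DB"

# Hypothetical inputs: replace with your own query file and accession.
QUERY=query.fasta
ACCESSION=${ACCESSION:-}

# Run only if BLAST+ is installed and the placeholder inputs actually exist.
if command -v blastn >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    # Search all volumes at once through the single database name
    blastn -db "$DB" -query "$QUERY" -outfmt 6 -out hits.tsv
fi
if command -v blastdbcmd >/dev/null 2>&1 && [ -n "$ACCESSION" ]; then
    # Extract one sequence by accession from the same database
    blastdbcmd -db "$DB" -entry "$ACCESSION" -out "${ACCESSION}.fasta"
fi
```

Both blastn and blastdbcmd take the same base name, so neither needs to know which numbered volume holds a given organism.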

ADD COMMENT • link 7.3 years ago by lieven.sterck 15k

score 0 · Answer 2 · 2018-04-24

Use the answer provided by @5heikki in the thread "How to download COMPLETE bacterial genomes from NCBI based on list of names?" to download the genomes. Then index them (e.g. with makeblastdb) and blast away.

You can find the names in this file (I am only looking for reference genomes).

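# Download the RefSeq assembly summary table
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt
# Print organism names (column 8) for rows whose refseq_category (column 5) mentions "reference"
awk -F '\t' '$5 ~ /reference/ {print $8}' assembly_summary_refseq.txt > names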
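
To narrow this down to a genus such as Streptococcus or Shewanella, something along these lines might work. It is a sketch under a few assumptions: the summary table keeps its usual layout (refseq_category in column 5, organism_name in column 8, ftp_path in column 20), and the genomic FASTA sits under each ftp_path as <directory name>_genomic.fna.gz. The function name genome_urls_for_genus and the output file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: print the genomic FASTA URL for each reference genome of one genus.
# $1 = genus (e.g. Streptococcus), $2 = path to assembly_summary_refseq.txt
genome_urls_for_genus() {
    awk -F '\t' -v genus="$1" '
        /^#/ { next }    # skip the commented header lines
        $5 ~ /reference/ && index($8, genus " ") == 1 {
            n = split($20, parts, "/")    # last path element is the assembly directory
            print $20 "/" parts[n] "_genomic.fna.gz"
        }' "$2"
}

# Possible usage (assumes wget, gunzip and BLAST+ are installed; names are illustrative):
#   genome_urls_for_genus Streptococcus assembly_summary_refseq.txt | wget -i -
#   gunzip -c *_genomic.fna.gz > streptococcus.fna
#   makeblastdb -in streptococcus.fna -dbtype nucl -parse_seqids -out streptococcus
#   blastn -db streptococcus -query query.fasta -outfmt 6
```

Matching on the start of the organism name keeps species from other genera out of the list. If too few genomes come back, relaxing the column 5 filter (for example to also accept "representative genome") is one option.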