question: how to convert downloaded tar.gz and .gz.md5 files to a database blastn can work with?
I downloaded the nr database to do some command line blast with the ncbi-blast package with the following command: ‘perl update_blastdb.pl —decompress nr’
The database download completed, but I got an error saying that ‘decompress’ was not found, and I ended up with a lot of tar.gz and tar.gz.md5 files (indexed 0 to 55).
I tried running blast with blastn -query dummy.fasta -db $path/to/nr_db -num_threads 8 -out dummy.out
and it failed, saying “BLAST Database error: No alias or index file found for nucleotide database”
Then I tried to uncompress the downloaded database files with gunzip -cd *.tar.gz | tar xvf - and this yielded a log file > 6 GB within 5 minutes of running, so I killed the job.
I’m just trying to run blast, and I’m not following how to go from several gzipped tar files to a database blastn can use, since a gzipped database is not usable as it is.
thanks!
It is odd that you were not able to decompress the downloaded files. Yes, they all need to be decompressed and need to stay in one directory. The nr database is close to 300-400G worth of data, so the uncompress job will run for a while and will need enough space to be available. You should not need to capture logs unless your download has somehow been corrupted and that is generating error messages. In that case you will need to redownload the data.
Thanks for your response. Is there a good way to decompress the downloaded files without using the --decompress flag?
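One way to do this by hand, as a rough sketch rather than the official procedure: extract each archive on its own instead of piping them all into one tar. tar normally stops at the first end-of-archive marker, so feeding it several concatenated archives is unreliable unless it is told to ignore those markers (GNU tar's --ignore-zeros). The function name extract_blastdb is made up for illustration. The checksum step assumes the .md5 files are in the usual "hash  filename" format that md5sum -c reads, and it is skipped if md5sum is not installed.

```shell
# Hypothetical helper: verify (when possible) and extract every *.tar.gz
# found in the given directory, one archive at a time, into that directory.
extract_blastdb() {
    dir=$1
    for archive in "$dir"/*.tar.gz; do
        # Skip the literal pattern if no archives are present.
        [ -e "$archive" ] || continue
        name=$(basename "$archive")
        # Assumption: the .md5 file is in md5sum's "hash  filename" format.
        if [ -f "$archive.md5" ] && command -v md5sum >/dev/null 2>&1; then
            (cd "$dir" && md5sum -c "$name.md5" >/dev/null) || {
                echo "checksum mismatch: $name (redownload it)" >&2
                return 1
            }
        fi
        # No -v flag, so tar does not print one line per extracted file.
        tar -xzf "$archive" -C "$dir" || return 1
    done
}
```

Called as extract_blastdb followed by the directory that holds the downloaded files, this leaves all extracted volumes side by side in that directory. Note that -db expects the database base name inside that directory (for example the directory path followed by /nr), not the directory itself.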
Also, your error saying "decompress not found" comes from copy-pasting the wrong Unicode character in ‘perl update_blastdb.pl —decompress nr’, which may have fallen victim to some auto-correction. The character — may look like a dash, but it is what some autocorrection (Word, email client) makes from --. So just retype the command next time, and it should work.
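If you want to check a pasted command before running it, here is a small sketch that counts the bytes outside plain ASCII. The function name count_non_ascii is invented for this example. Any result above zero suggests a substituted character such as a long dash or a curly quote.

```shell
# Hypothetical helper: print how many non-ASCII bytes a string contains.
count_non_ascii() {
    # Delete every ASCII byte (octal 000-177), then count what is left.
    printf '%s' "$1" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# A correctly typed command contains only ASCII bytes.
count_non_ascii 'perl update_blastdb.pl --decompress nr'
```

A long dash pasted in place of the two hyphens is encoded as several bytes in UTF-8, so the count becomes non-zero for the broken version of the command.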