Hi there,
I've been trying to run blastn locally on metabarcoding sequences of eukaryotes and am running into errors both on the command line and when using system2() in R.
I have downloaded and unzipped the entire nt_euk library from NCBI, resulting in a folder called "nt_euk" containing several file types (.nhr, .nin, .nnd, .nni, .nog, .nsq) for each chunk (e.g., nt_euk.01, nt_euk.02, etc.) of the reference database, as well as files for the whole database (nt_euk.nal, nt_euk.ndb, nt_euk., nt_euk.nos, nt_euk.not, nt_euk.ntf, nt_euk.nto). The folder also contains taxonomy files (taxdb.btd, taxdb.bti, taxonomy4blast.sqlite3).
I am trying to implement this in my bioinformatics pipeline in R using system2() to run command line functions from blast+ on unassigned ASVs from my samples. The code looks like this:
blast.f6 <- c('qseqid', 'sseqid', 'sscinames', 'scomnames', 'sskingdoms', 'pident', 'qcovs')
blastn <- "C:/Program Files/NCBI/blast-2.15.0+/bin/blastn.exe"
ntdb <- "data/nt_euk"
input <- "data/results/18S-Comeau_ASV_sequences.fasta"
blast.out <- system2(
  command = blastn,
  args = c('-db', ntdb,
           '-num_threads', '10',
           '-outfmt', sprintf('"6 %s"', paste(collapse = ' ', blast.f6)),
           '-perc_identity', '.99',
           '-max_target_seqs', '1',
           '-query', input,
           '-out', 'data/results/18S_blast.txt'),
  wait = TRUE,
  stdout = TRUE
)
This results in an error:
Warning message:
In system2(command = blastn, args = c("-db", ntdb, "-num_threads", :
  running command '"C:/Program Files/NCBI/blast-2.15.0+/bin/blastn.exe" ...' had status 2
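The warning only reports the exit status, not the message blastn itself printed. A minimal sketch of how that message could be captured from R, assuming the program writes its errors to stderr: passing stderr = TRUE alongside stdout = TRUE makes system2() return both streams as a character vector, with the exit code in the "status" attribute. The child R process below is only a stand-in for blastn so that the example runs without BLAST+ installed.

```r
# Stand-in for blastn (an assumption for illustration): a child R process
# that writes a message to stderr and exits with status 2.
rscript <- file.path(R.home("bin"), "Rscript")
expr <- 'message("simulated BLAST error"); quit(status = 2)'

# stdout = TRUE and stderr = TRUE capture both streams as a character vector.
# suppressWarnings() hides the "had status 2" warning; the exit code is still
# available in the "status" attribute of the result.
out <- suppressWarnings(
  system2(rscript, args = c("-e", shQuote(expr)), stdout = TRUE, stderr = TRUE)
)

print(out)                  # captured output, including the stderr message
print(attr(out, "status"))  # exit status of the child process
```

With the real call, the equivalent change would be adding stderr = TRUE to the existing system2() arguments. Because -out already sends the hits to a file, the returned vector should then contain mostly diagnostics.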
I tried to run blastn in the Terminal directly and I also get an error:
blastn -query data/results/18S-Comeau_ASV_sequences.fasta -db data/nt_euk
BLAST Database error: No alias or index file found for nucleotide database
I thought the alias/index file is nt_euk.nal, which is in my directory. So I'm not sure what exactly the issue is here, and all my Google searching has led me to dead ends. Any insights or solutions would be much appreciated!
What do you see if you cat nt_euk.nal? Does the number of pieces mentioned in that file match what you have locally?
I am seeing
'cat' is not recognized as an internal or external command, operable program or batch file.
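Since cat is not available in the Windows command prompt, the same check could be done from R instead. The sketch below assumes the alias file lists its volumes on a line starting with DBLIST (optionally quoted), and that each volume has a matching .nsq file next to the .nal file. The helper names nal_volumes and missing_volumes are hypothetical, and the demo files are throwaway stand-ins rather than real BLAST data.

```r
# Hypothetical helper: return the volume names listed on the DBLIST line
# of a BLAST alias (.nal) file.
nal_volumes <- function(nal_path) {
  lines <- readLines(nal_path, warn = FALSE)
  dblist <- grep("^DBLIST", lines, value = TRUE)
  vols <- unlist(strsplit(sub("^DBLIST[[:space:]]*", "", dblist), "[[:space:]]+"))
  gsub('"', "", vols[nzchar(vols)])
}

# Hypothetical helper: volumes named in the alias file that have no
# sequence file (.nsq) in the same folder as the alias file.
missing_volumes <- function(nal_path, ext = ".nsq") {
  vols <- nal_volumes(nal_path)
  vols[!file.exists(file.path(dirname(nal_path), paste0(vols, ext)))]
}

# Demo on a temporary folder: the alias file names two volumes,
# but only one of them is present on disk.
demo_dir <- file.path(tempdir(), "nal_demo")
dir.create(demo_dir, showWarnings = FALSE)
nal <- file.path(demo_dir, "demo.nal")
writeLines(c("TITLE demo", 'DBLIST "demo.00" "demo.01"'), nal)
invisible(file.create(file.path(demo_dir, "demo.00.nsq")))

print(nal_volumes(nal))      # volumes the alias file expects
print(missing_volumes(nal))  # volumes that are not on disk
```

For the real database, the path to nt_euk.nal would be passed in place of the demo file. An empty result from missing_volumes() would suggest that every listed chunk is present.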
Also, nt_euk.nal is just one file. There are other nt_euk files in my directory with different extensions (.ndb, .nos, .not, .ntf, .nto).
The -db data/nt_euk switch means that in your current directory you have a subdirectory called data, and all the nt_euk files are in that directory. If that's not the case, it will throw this error.
It may be a good idea to give a whole path, e.g. -db C:/data/nt_euk or whatever the actual path is. As always, it is not recommended to have space characters in directory names if it can be avoided.
Yes, in my actual script I use the whole path to nt_euk. I was just showing C:/data/nt_euk as shorthand for the full path. There are no space characters in my directory name.
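One more thing that may be worth checking, given that the original post describes a folder that is itself called nt_euk: -db takes the folder plus the database base name, not the folder alone, so with that layout the value would be something like data/nt_euk/nt_euk. A rough sketch of a pre-flight check in R, assuming blastn looks for <prefix>.nal or <prefix>.nin for a nucleotide database. The helper name blast_db_found is hypothetical and the demo folder is a throwaway copy of the described layout.

```r
# Hypothetical helper (not part of BLAST+ or base R): TRUE if the prefix
# points at a nucleotide alias (.nal) or index (.nin) file.
blast_db_found <- function(prefix) {
  file.exists(paste0(prefix, ".nal")) || file.exists(paste0(prefix, ".nin"))
}

# Throwaway folder mimicking the layout from the question: a folder named
# "nt_euk" that holds files which are also named nt_euk.*
db_dir <- file.path(tempdir(), "data", "nt_euk")
dir.create(db_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(db_dir, "nt_euk.nal")))

folder_only <- db_dir                             # like -db data/nt_euk
folder_plus_name <- file.path(db_dir, "nt_euk")   # like -db data/nt_euk/nt_euk

print(blast_db_found(folder_only))       # FALSE: there is no data/nt_euk.nal
print(blast_db_found(folder_plus_name))  # TRUE: data/nt_euk/nt_euk.nal exists
```

If the check returns FALSE for the value currently assigned to ntdb but TRUE once the base name is appended, that would be consistent with the "No alias or index file found" error.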