Hello.
I'm trying to run tblastn and I get an error message. I'm posting the commands I ran and the error message I got, in the hope that someone can offer some tips.
The commands:
export BLASTDB="/.../biodb/BLAST/Proteins2/nr"
module load blast/blast-2.10.0
Then I opened Python and ran the following commands:
import subprocess
file = '/path/to/fasta_file.fa'
db_path = '/.../biodb/BLAST/Proteins2/nr'
output_path = '/output/path'
command = f'tblastn -query {file} -db {db_path} -max_hsps 1 -max_target_seqs 20 -num_threads 10 -evalue 1e-5 ' \
f'-out {output_path + file[:-2]}txt -outfmt "6 qseqid sseqid pident staxids sskingdoms qstart qend ' \
f'qlen length sstart send slen evalue mismatch gapopen bitscore stitle"'
subprocess.run(command, stdout=subprocess.PIPE, shell=True)
The error message:
BLAST Database error: No alias or index file found for nucleotide database [/bioseq/biodb/BLAST/Proteins2/nr] in search path [/powerapps/share/centos7/ncbi-blast/ncbi-blast-2.10.0+/blastdb:/bioseq/biodb/BLAST/Proteins2/nr:]
Thanks!
You have one dot too many in your path to the DB.
You mean in
db_path = '/.../biodb/BLAST/Proteins2/nr'
? That's not the actual path. The actual path doesn't have dots.
What do you have in your folder
"/.../biodb/BLAST/Proteins2/"
? Apparently it can't find the files for the DB.
That's where the protein database is. I can't understand what's wrong from the error message.
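One way to read the error: tblastn is looking for a nucleotide database, meaning an alias file ending in .nal or an index file ending in .nin, and finds none under that name. Below is a minimal sketch that reports which kind of BLAST database files sit in a folder. It assumes the usual BLAST extensions (.pal/.pin for protein, .nal/.nin for nucleotide, with multi-volume names such as nr.00.pin); the function name is hypothetical.

```python
from pathlib import Path

# Extensions that mark a protein vs. nucleotide BLAST database.
# .pal/.nal are alias files; .pin/.nin are index files.
DB_EXTENSIONS = {
    "protein": (".pal", ".pin"),
    "nucleotide": (".nal", ".nin"),
}


def detect_db_types(folder, name):
    """Return the set of database types found for `name` inside `folder`."""
    found = set()
    for path in Path(folder).iterdir():
        # Accept single-volume (nr.pin) and multi-volume (nr.00.pin) files,
        # but skip files that belong to a differently named database.
        if not path.name.startswith(name + "."):
            continue
        for db_type, extensions in DB_EXTENSIONS.items():
            if path.suffix in extensions:
                found.add(db_type)
    return found
```

If detect_db_types('/.../biodb/BLAST/Proteins2', 'nr') returned only {'protein'}, that would be consistent with the error: the folder holds a protein database named nr, and no nucleotide one for tblastn to search.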
Can you show the results of
ls -l /.../biodb/BLAST/Proteins2/
? Did you create the DB yourself or download it from NCBI? If so, can you post the command you used?
Are you sure you're using a nucleotide DB?
tblastn
searches translated nucleotide databases using a protein query.
Here's a sample of the command
ls -l /.../biodb/BLAST/Proteins2/
output:
The database was downloaded by our IT unit (I posted a question to them too, but getting an answer from them might take a while).
From what you wrote, I'm thinking that maybe I'm using the wrong BLAST program. My PI guided me to use tblastn, but maybe I didn't fully understand her. What I want to do is align DNA sequences in FASTA format against a protein database in order to get gene annotations for the sequences.
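For that goal (DNA queries against a protein database), the BLAST+ program that translates a nucleotide query and searches a protein database is blastx; tblastn does the reverse. Below is a minimal sketch that picks the program from the query and database types and builds the command as an argument list, so no shell quoting is needed. The helper names are hypothetical, the paths are placeholders, and the options are copied from the command in the question.

```python
import subprocess

# Which BLAST+ program fits each (query type, database type) pair.
# tblastx (translated query vs. translated database) is left out for brevity.
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # translated DNA query vs. protein DB
    ("protein", "nucleotide"): "tblastn",  # protein query vs. translated DNA DB
}

OUTFMT = (
    "6 qseqid sseqid pident staxids sskingdoms qstart qend "
    "qlen length sstart send slen evalue mismatch gapopen bitscore stitle"
)


def build_blast_command(query_type, db_type, query, db, out):
    """Return the BLAST command as a list of arguments."""
    program = PROGRAMS[(query_type, db_type)]
    return [
        program,
        "-query", query,
        "-db", db,
        "-max_hsps", "1",
        "-max_target_seqs", "20",
        "-num_threads", "10",
        "-evalue", "1e-5",
        "-out", out,
        "-outfmt", OUTFMT,
    ]


def run_blast(command):
    """Run the command; raises CalledProcessError if BLAST exits with an error."""
    return subprocess.run(command, check=True)
```

With DNA queries and the protein nr database, build_blast_command('nucleotide', 'protein', '/path/to/fasta_file.fa', '/.../biodb/BLAST/Proteins2/nr', '/output/path/fasta_file.txt') starts with blastx; the output file name here is only an illustration.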