Failing to run tblastn
3.6 years ago
langziv ▴ 70

Hello.

I'm trying to run tblastn and I get an error message. I'm posting the commands I ran and the error message I got, in the hope that someone can offer some tips.

The commands:

export BLASTDB="/.../biodb/BLAST/Proteins2/nr"
module load blast/blast-2.10.0

Then I opened python and ran the following commands:

import subprocess

file = '/path/to/fasta_file.fa'
db_path = '/.../biodb/BLAST/Proteins2/nr'
output_path = '/output/path'

command = f'tblastn -query {file} -db {db_path} -max_hsps 1 -max_target_seqs 20 -num_threads 10 -evalue 1e-5 ' \
          f'-out {output_path + file[:-2]}txt -outfmt "6 qseqid sseqid pident staxids sskingdoms qstart qend ' \
          f'qlen length sstart send slen evalue mismatch gapopen bitscore stitle"'

subprocess.run(command, stdout=subprocess.PIPE, shell=True)

The error message:

BLAST Database error: No alias or index file found for nucleotide database [/bioseq/biodb/BLAST/Proteins2/nr] in search path [/powerapps/share/centos7/ncbi-blast/ncbi-blast-2.10.0+/blastdb:/bioseq/biodb/BLAST/Proteins2/nr:]

Thanks!

tblstn blast • 2.0k views

You have one dot too many in your path to the DB.

file = '/path/to/fasta_file.fa'
db_path = '/.../biodb/BLAST/Proteins2/nr'
output_path = '/output/path'

You mean in db_path = '/.../biodb/BLAST/Proteins2/nr'? That's not the actual path. The actual path doesn't have dots.

What do you have in your folder "/.../biodb/BLAST/Proteins2/"? Apparently BLAST can't find the files for the DB.

That's where the protein database is. I can't understand what's wrong from the error message.

Can you show the output of ls -l /.../biodb/BLAST/Proteins2/?

Did you create the DB yourself or download it from NCBI? If so, can you post the command you used?

Are you sure you're using a nucleotide DB? tblastn searches a translated nucleotide database using a protein query.
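
To make the pairing of query type and database type concrete, here is a small lookup sketch. The table reflects the standard BLAST+ programs; the names BLAST_PROGRAMS and programs_for are made up for this illustration and are not part of BLAST.

```python
# Molecule types each BLAST+ program expects: (query type, database type).
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),      # the query is translated
    "tblastn": ("protein", "nucleotide"),     # the database is translated
    "tblastx": ("nucleotide", "nucleotide"),  # both are translated
}


def programs_for(query_type, db_type):
    """Return the programs that accept this query/database combination."""
    wanted = (query_type, db_type)
    return sorted(name for name, types in BLAST_PROGRAMS.items() if types == wanted)


# DNA query sequences against a protein database:
print(programs_for("nucleotide", "protein"))
```

A protein query against a nucleotide database gives tblastn, which is why tblastn looks for nucleotide database files.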

Here's a sample of the output of ls -l /.../biodb/BLAST/Proteins2/: [screenshot of the directory listing]

The database was downloaded by our IT unit (I posted a question to them too, but getting an answer from them might take a while).

From what you wrote, I'm thinking that maybe I'm using the wrong BLAST program. My PI told me to use tblastn, but maybe I didn't fully understand her. What I want to do is align DNA sequences in FASTA format against a protein database in order to get gene annotations for the sequences.

3.6 years ago
Assa Yeroslaviz ★ 1.9k

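You're using a protein DB. tblastn needs a nucleotide one, which is usually named nt.

Your nr DB has the protein suffixes below; a nucleotide DB has the matching n-prefixed ones (.nal, .ndb, .nos, .not, .ntf, .nto):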

nr.pal
nr.pdb
nr.pos
nr.pot
nr.ptf
nr.pto
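
The file suffixes are enough to tell what kind of database a prefix points at: protein databases use alias/index extensions starting with p (.pal, .pin), nucleotide ones start with n (.nal, .nin). A rough sketch of that check follows; guess_db_type is a hypothetical helper written for this thread, not a BLAST+ tool, and it only looks at those four extensions.

```python
from pathlib import Path

# Alias (.pal/.nal) and index (.pin/.nin) files that BLAST looks for.
PROTEIN_EXTS = (".pal", ".pin")
NUCLEOTIDE_EXTS = (".nal", ".nin")


def guess_db_type(db_prefix):
    """Guess a BLAST database's molecule type from its alias/index files.

    db_prefix is the database path without an extension, e.g. the value
    passed to -db. Returns "protein", "nucleotide", or None when neither
    kind of alias/index file exists next to that prefix.
    """
    prefix = str(db_prefix)
    if any(Path(prefix + ext).exists() for ext in PROTEIN_EXTS):
        return "protein"
    if any(Path(prefix + ext).exists() for ext in NUCLEOTIDE_EXTS):
        return "nucleotide"
    return None
```

Called with the db_path from the question, a directory holding nr.pal would come back as "protein", which matches the "No alias or index file found for nucleotide database" error.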

If I'm not mistaken, for searching a nucleotide sequence (DNA) against a protein database you'll need blastx.

You can also use tblastx, but then you still need a nucleotide DB, which is translated in the process.

Shouldn't there be some protein (amino acid) sequences involved? Should I first translate my DNA sequences to amino acid sequences and then run them against the nucleotide database?

Great. Thanks a lot!

As Assa Yeroslaviz commented, the correct BLAST program for my needs is blastx, since I need to search nucleotide sequences against a protein database.
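
For anyone landing here later, this is roughly what the corrected call could look like, with the same options as in the question but with blastx. It is only a sketch: the paths are placeholders, build_blastx_command and run_blastx are names invented for this example, and the output file naming (query file stem plus .txt inside the output directory) is an assumption rather than what the original script did. Passing the arguments as a list avoids shell=True and the quoting around -outfmt.

```python
import subprocess
from pathlib import Path

# Tabular output (format 6) with the columns from the original command.
OUTFMT = (
    "6 qseqid sseqid pident staxids sskingdoms qstart qend qlen length "
    "sstart send slen evalue mismatch gapopen bitscore stitle"
)


def build_blastx_command(query, db_path, output_dir, threads=10):
    """Build the blastx argument list (nucleotide query, protein database)."""
    query = Path(query)
    # Assumed naming: <output_dir>/<query file name without extension>.txt
    out_file = Path(output_dir) / (query.stem + ".txt")
    return [
        "blastx",
        "-query", str(query),
        "-db", str(db_path),
        "-max_hsps", "1",
        "-max_target_seqs", "20",
        "-num_threads", str(threads),
        "-evalue", "1e-5",
        "-out", str(out_file),
        "-outfmt", OUTFMT,
    ]


def run_blastx(query, db_path, output_dir):
    """Run blastx; check=True raises CalledProcessError if blastx fails."""
    cmd = build_blastx_command(query, db_path, output_dir)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

With capture_output=True, the BLAST error text is available in the stderr attribute of the raised exception instead of being lost.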

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.