Ncbi Blast Failing For Unknown Reasons
13.1 years ago
Jesse J ▴ 150

I have a Python script that runs Biopython's web BLAST function (NCBIWWW.qblast). We're using large FASTA files, so the script splits them into smaller files and then BLASTs each one. Below is a partial query:

>gi|73485745|gb|AAJJ01000902.1|_30
CCATCTGGCCTGACCCAGATCGGCCTTTTATGGCATACTCGTACCGTAATAAA
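
For readers following along, the splitting step described above can be sketched roughly as follows. This is only an illustration under assumptions, not the script from the question: `batch_iterator` is a hypothetical helper name, and it groups any iterable into fixed-size lists.

```python
def batch_iterator(records, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly shorter, batch
    if batch:
        yield batch
```

With Biopython, `records` could be the iterator returned by `SeqIO.parse(...)` on the large file, and each batch could be written out with `SeqIO.write(batch, filename, "fasta")`. Splitting this way keeps every sequence whole, since only the list of records is divided.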

The frustrating part is that some of the smaller files work and some don't, even though they appear to be in exactly the same format. On top of that, a file that fails from the script works when submitted through the NCBI BLAST web page. Below is the error message I get when running my script:

ValueError: Error message from NCBI: Cannot accept request, error code: 1

According to NCBI, an error code of 1 means bad query sequences or BLAST options. This is the function:

result_handle = NCBIWWW.qblast("blastn", "nr", fasta_string, megablast=MEGA_BLAST)

where MEGA_BLAST is a boolean. Does anyone have any idea why it would fail? The input string, as far as I can tell, is fine. I have no idea why this is occurring.
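
Since error code 1 points at the query sequences or the options, one way to narrow things down is to check each FASTA string locally before submitting it. The sketch below is an assumption-laden illustration: `check_fasta_string` is a hypothetical helper, and its rules (a `>` header first, at least one sequence line per header, IUPAC nucleotide letters only) are a guess at sensible checks, not NCBI's actual validation.

```python
VALID_NUCLEOTIDES = set("ACGTUNRYKMSWBDHV")  # IUPAC nucleotide codes

def check_fasta_string(fasta_string):
    """Return a list of problems found in a nucleotide FASTA string (empty if none)."""
    lines = [line.strip() for line in fasta_string.splitlines() if line.strip()]
    if not lines:
        return ["empty query"]
    problems = []
    if not lines[0].startswith(">"):
        problems.append("first line is not a '>' header")
    current_header = None
    current_length = 0
    for line in lines:
        if line.startswith(">"):
            # A new header closes the previous record; flag it if it had no sequence
            if current_header is not None and current_length == 0:
                problems.append("no sequence after header: " + current_header)
            current_header, current_length = line, 0
        else:
            bad = set(line.upper()) - VALID_NUCLEOTIDES
            if bad:
                problems.append("unexpected characters " + "".join(sorted(bad)))
            current_length += len(line)
    # Check the last record as well
    if current_header is not None and current_length == 0:
        problems.append("no sequence after header: " + current_header)
    return problems
```

Printing `check_fasta_string(fasta_string)` for a failing chunk just before the `qblast` call would show whether splitting produced a header with no sequence or stray characters.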

UPDATE: This is a file that failed.

biopython blast • 16k views
ADD COMMENT
0
Entering edit mode

You should post one of the smaller files that fail to a location that we can download it from

ADD REPLY
0
Entering edit mode

So where did you find that command line? blastn doesn't work with nr.

ADD REPLY
0
Entering edit mode

You're wrong Michael, QBLAST with "blastn" DOES work with nr - see below. It is probably treated as an alias for nt, given that the NCBI refers to it as "Nucleotide collection (nt/nr)" on the BLASTN website. It is surprising though, as NR normally means the protein database.

ADD REPLY
0
Entering edit mode

I've updated the Biopython Tutorial to use "nt" rather than "nr" to avoid the confusion - thanks for flagging this Michael: https://github.com/biopython/biopython/commit/60fed13c350ab8e3f2e79b69d490b0701a1b2540

ADD REPLY
2
Entering edit mode
13.1 years ago
Niek De Klein ★ 2.6k

If you BLAST the sequences manually on the NCBI BLAST site they all get the following result: "No significant similarity found. For reasons why,click here."

Clicking there brings you to the FAQ: http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=FAQ

The FAQ entry explaining the "No significant similarity found" message is here: http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=FAQ#nohits

Which says:

Below are common reasons that a BLAST search results in the "No significant similarity 
found" message.

Short query sequences: Short alignments may have Expect values above the default 
threshold, which is 10 on most pages, and, therefore, are not displayed. Try increasing 
the Expect threshold (under 'Algorithm parameters'). Also, see the FAQ Submitting 
primers or other short sequences.

So one possibility is that your alignments are too short and that this "no hits" outcome comes back as error code 1.

EDIT:

I see that you're not only breaking up one file of FASTA sequences into smaller files of complete FASTA sequences; you are also chopping up the genes themselves.

(e.g.)

In this example, fragments 1, 2 and 5 can be found; the others cannot. Again, I don't know whether "No significant similarity found. For reasons why,click here" in the NCBI web application shows up as error code 1 in Biopython's BLAST, but that would be my safest bet. Try keeping the sequences intact.

ADD COMMENT
0
Entering edit mode

Because of short query sequences? No, that can't be it, considering I've tested the exact same program with a test fasta file, one that has some sequences that are only 3 characters in length, and it ran just fine.

ADD REPLY
2
Entering edit mode
13.1 years ago
Peter 6.0k

Thanks for sharing the problem input file. This is my test with the latest Biopython:

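from Bio import SeqIO
from Bio.Blast import NCBIWWW

# Submit each record in the problem file to QBLAST, one query at a time
for record in SeqIO.parse("permutations91.fa", "fasta"):
    # The tuple is parenthesised so both values feed the format string
    print("%s length %i" % (record.id, len(record)))
    result_handle = NCBIWWW.qblast("blastn", "nr", record.format("fasta"), megablast=False)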

This is a simple silly script which calls QBLAST but ignores the results. The output:

gi|73486063|gb|AAJJ01000584.1|_20 length 263
gi|73486063|gb|AAJJ01000584.1|_21 length 1247
...

No errors. I checked the last result and it looked like proper XML. Niek de Klein pointed out some other possible failure causes, but they don't seem to apply here. Potentially your issue is a simple network failure, and a try/except could be used to repeat the query?
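
The try/except idea could look something like the sketch below. It is a minimal illustration, not part of Biopython: `retry` is a hypothetical helper, and the attempt count and delay are arbitrary assumptions.

```python
import time

def retry(func, attempts=3, delay=5.0, exceptions=(Exception,)):
    """Call func() up to `attempts` times, sleeping `delay` seconds after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            # Give up and re-raise once the last attempt has failed
            if attempt == attempts:
                raise
            time.sleep(delay)
```

A call such as `retry(lambda: NCBIWWW.qblast("blastn", "nr", fasta_string), exceptions=(ValueError, IOError))` would then repeat the query a few times. If the same chunk fails on every attempt, a transient network problem becomes a less likely explanation.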

However, if you are using this script on large FASTA files, you would be much better off downloading the NR database and standalone BLAST and running this locally.
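
As a rough sketch of the local route, the standalone BLAST+ `blastn` program can be driven from Python with `subprocess`. The helper names, file paths and the database name passed in are assumptions for illustration; the code presumes BLAST+ is installed, on the PATH, and that a formatted database has already been downloaded.

```python
import subprocess

def build_blastn_command(query_path, db_name, out_path, evalue=10.0):
    """Build the argument list for a standalone blastn run with XML output."""
    return [
        "blastn",
        "-query", query_path,
        "-db", db_name,
        "-out", out_path,
        "-outfmt", "5",          # 5 = XML, the same format QBLAST returns
        "-evalue", str(evalue),  # expect threshold
    ]

def run_local_blastn(query_path, db_name, out_path):
    """Run blastn and raise CalledProcessError if it exits with an error."""
    subprocess.check_call(build_blastn_command(query_path, db_name, out_path))
```

The XML written to `out_path` can be parsed with the same Biopython code used for QBLAST results, so the rest of a script would not need to change much.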

ADD COMMENT
0
Entering edit mode

I had thought it could be a network problem, except the same program works as expected with a different fasta file. I think I will try the local blast though.

ADD REPLY
