Biopython: qblast() times out every time when run from the Python shell
5 months ago
Ryan • 0

I am new to Biopython. I am not a bioinformatician or a scientist at all; I'm doing this as a project to learn some new programming concepts and to pick up a little about the subject matter along the way.

I have set up a virtual environment and installed Biopython into it with pip.

I am following the instructions in the Biopython Tutorial and Cookbook.

First, I import the module:

>>> from Bio.Blast import NCBIWWW

Then, as the guidelines specify, I set my email address:

>>> NCBIWWW.email = "myemail@domain.com"

Finally, I run the query:

>>> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
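For reference, here is a minimal sketch of the same session as a reusable script that also saves the result to disk, so a slow search does not have to be re-issued just to look at the output again. It assumes Biopython is installed; `blast_and_save`, `out_path`, and the `blast_fn` hook (which lets you substitute a stand-in for `NCBIWWW.qblast` when testing offline) are hypothetical names, not part of Biopython. Because the type of data returned by `qblast` may differ between Biopython versions (an assumption on my part), the sketch writes either `str` or `bytes`.

```python
from pathlib import Path


def blast_and_save(query, out_path="blast_result.xml", program="blastn",
                   database="nt", email=None, blast_fn=None):
    """Submit a BLAST search and write the raw report to out_path.

    blast_fn is a hypothetical hook; by default it is Biopython's NCBIWWW.qblast.
    """
    if blast_fn is None:
        from Bio.Blast import NCBIWWW  # requires Biopython

        if email is not None:
            NCBIWWW.email = email  # identify yourself to NCBI, as the guidelines ask
        blast_fn = NCBIWWW.qblast

    result_handle = blast_fn(program, database, query)
    try:
        data = result_handle.read()
    finally:
        result_handle.close()

    path = Path(out_path)
    # Write text or bytes, whichever the handle produced
    if isinstance(data, bytes):
        path.write_bytes(data)
    else:
        path.write_text(data)
    return path
```

For example, `blast_and_save("8332116", email="myemail@domain.com")` would run the query from the question and write the report to `blast_result.xml`.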

Every time I send this query, it hangs for a very long time. Eventually I get the following warning:

/home/user/python/covid_dna/env/lib/python3.10/site-packages/Bio/Blast/NCBIWWW.py:281: BiopythonWarning: BLAST request BHFD53MJ016 is taking longer than 10 minutes, consider re-issuing it
  warnings.warn(

I expected the query to return a result in result_handle. Instead, either the query fails, or I have to interrupt it with Ctrl+C after it has hung for more than 30 minutes.
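Since the call can block for half an hour, one option is to run it in a background thread and stop waiting after a deadline. This is a sketch with a hypothetical helper, `call_with_timeout`; note that it only stops *waiting*: the daemon thread (and the request at NCBI) may keep running in the background until the interpreter exits.

```python
import threading


def call_with_timeout(func, *args, timeout, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread; raise TimeoutError if it
    has not finished after `timeout` seconds."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The worker is still blocked; we give up waiting but cannot kill it.
        raise TimeoutError(f"call did not finish within {timeout} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

For example, `call_with_timeout(NCBIWWW.qblast, "blastn", "nt", "8332116", timeout=900)` would give up after 15 minutes instead of hanging indefinitely.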

Am I using incorrect syntax? Have I failed to set some variable correctly, such as url_base?

I have run the same BLAST query through the web interface, and it returns the expected search result: [image: web_blast_result]

Tags: qblast, biopython, blast, ncbi
5 months ago
Mensur Dlakic ★ 28k

Generally speaking, remote BLAST jobs are not allowed to run as long as web-based jobs. That is a feature, not a bug. If you cannot run a blastn command with the -remote switch from the command line, it is not going to work from Python either.

I suggest you run the same command with pdb as the database instead of nt. If that works, nothing is wrong with your script; the job is simply too time-consuming for remote execution. If you search through the BLAST help pages, you will find an explanation of why this limit is imposed. The bottom line is that poorly formatted or simply very long jobs are submitted to BLAST all the time, and the system has to protect itself from spending time on individual jobs that may not amount to anything or that take an unreasonable amount of resources.
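The pdb-versus-nt test suggested above could be scripted along these lines. This is only a sketch: `compare_databases` and its `blast_fn` hook are hypothetical names, and the default path assumes Biopython is installed and `NCBIWWW.email` has already been set. Only `Exception` is caught, so Ctrl+C still interrupts a search that hangs.

```python
import time


def compare_databases(query, databases=("pdb", "nt"), program="blastn", blast_fn=None):
    """Run one query against several databases; record success and wall-clock time."""
    if blast_fn is None:
        from Bio.Blast import NCBIWWW  # requires Biopython; set NCBIWWW.email first

        blast_fn = NCBIWWW.qblast

    report = {}
    for database in databases:
        start = time.monotonic()
        try:
            handle = blast_fn(program, database, query)
            size = len(handle.read())  # size of the returned report
            report[database] = {"ok": True, "seconds": time.monotonic() - start,
                                "size": size}
        except Exception as exc:  # network errors, NCBI errors, etc.
            report[database] = {"ok": False, "seconds": time.monotonic() - start,
                                "error": repr(exc)}
    return report
```

If pdb comes back quickly and nt does not, that points at the size of the nt search rather than at the script.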


There's also a new core_nt database. It's nt without eukaryotic chromosomes, so it is substantially faster to search. See https://ncbiinsights.ncbi.nlm.nih.gov/2024/07/18/new-blast-core-nucleotide-database/


Yes, I saw this. core_nt is not currently supported by Biopython, so I will try installing the command-line tool and calling it from Python with subprocess.
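Calling the BLAST+ command-line tool from Python might look like the sketch below. It assumes BLAST+ is installed with `blastn` on the PATH, that the query sequence has been saved as a FASTA file (the file names here are hypothetical), and that your BLAST+ version accepts `core_nt` together with `-remote`. `-outfmt 5` requests BLAST XML output, and `subprocess.run` kills the child process if `timeout` expires.

```python
import shutil
import subprocess


def build_remote_blastn_cmd(query_fasta, database="core_nt", out_file="blast_result.xml"):
    """Build the argument list for a remote blastn search (hypothetical helper)."""
    return [
        "blastn",
        "-remote",            # run the search on NCBI's servers
        "-db", database,
        "-query", query_fasta,
        "-outfmt", "5",       # BLAST XML
        "-out", out_file,
    ]


def run_remote_blastn(query_fasta, database="core_nt", out_file="blast_result.xml",
                      timeout=1800):
    """Run blastn -remote and return the output path; raises on failure or timeout."""
    if shutil.which("blastn") is None:
        raise FileNotFoundError("blastn not found on PATH; install BLAST+ first")
    cmd = build_remote_blastn_cmd(query_fasta, database, out_file)
    # check=True raises CalledProcessError on a non-zero exit status;
    # timeout raises TimeoutExpired after killing the child process.
    subprocess.run(cmd, check=True, timeout=timeout, capture_output=True, text=True)
    return out_file
```

Passing the command as a list (rather than a shell string) avoids quoting problems with file names.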


I will install the BLAST command-line tools and test with them, following NCBI's BLAST CLI documentation. I also asked a friend to run the same script on his computer, and he got an SSL certificate error; the error message suggested that the URL could not be accessed from Python. I'll start with the CLI tool and see.
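To separate an SSL or network problem from a BLAST problem, a quick check is whether Python can reach NCBI's BLAST endpoint at all. Below is a minimal standard-library sketch. It assumes the endpoint is `https://blast.ncbi.nlm.nih.gov/Blast.cgi` (the default `url_base` in Biopython's NCBIWWW, as far as I know), and `check_endpoint` is a hypothetical helper. Certificate-verification failures usually surface as a `URLError` whose `reason` is an `ssl.SSLError`.

```python
import socket
import ssl
import urllib.error
import urllib.request

# Assumed endpoint: the BLAST URL API that Biopython's qblast posts to by default
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def check_endpoint(url=BLAST_URL, timeout=15.0):
    """Return (reachable, message) for a plain HTTPS GET of `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, just with an error status
        return True, f"server reachable, HTTP {exc.code}"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ssl.SSLError):
            # e.g. certificate verification failed (missing or outdated CA bundle)
            return False, f"SSL error: {exc.reason}"
        return False, f"network error: {exc.reason}"
    except (socket.timeout, TimeoutError):
        return False, f"timed out after {timeout} seconds"
    except ValueError as exc:
        return False, f"invalid URL: {exc}"
```

Running `print(check_endpoint())` on both machines would show whether the friend's SSL error is specific to his Python installation.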


I think it was an outage, a DNS issue, or some other NCBI server-side problem. The same BLAST search completed successfully this morning from within Python, with nt as the database. Thanks for your reply!
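Because the failure turned out to be transient, wrapping the call in a retry with increasing delays is one way to ride out short outages. This is a sketch with a hypothetical helper, `retry_with_backoff`; by default it retries only on `OSError`, which covers urllib's `URLError` as well as SSL and socket errors, and you may want to add other exception types. The long, doubling delays avoid resubmitting rapidly to NCBI's servers.

```python
import time


def retry_with_backoff(func, *args, attempts=3, initial_delay=60.0,
                       retry_on=(OSError,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on `retry_on` errors.

    The delay doubles after each failed attempt; the last error is re-raised.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

For example, `retry_with_backoff(NCBIWWW.qblast, "blastn", "nt", "8332116")` would try up to three times, waiting 60 and then 120 seconds between attempts.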
