Is there a Python utility or library that does the equivalent of BLAST+? I'm a hobbyist and don't have the storage capacity to download the NCBI databases, so I use the -remote option of BLAST+. I find that -remote frequently hangs without ever returning, and there is no way to monitor progress to find out where it is hanging. Queries that take a minute or so on the website can take 10 minutes or longer via the BLAST+ command line with -remote.
It is my understanding that Biopython uses a locally installed blastp to access the NCBI databases. I am looking for an open-source, Python-based utility that uses the NCBI web API directly to do the equivalent of BLAST+ and that does not depend on a locally installed BLAST+.
I could download the source for blastp and modify it to build a more robust implementation of the -remote option, but I don't want to program in C++, which I believe it is written in. If I'm going to create an enhancement, I would rather do it in a more popular language like Python.
I'm not clear on why you can't just use the website, since it is fast and targeted at the 'hobbyist' level you say you prefer to work at. NCBI offers plenty of options for you to scale up or automate, one of which is to put some resources on the table yourself. However, I don't think you are considering the scale of the operation. NCBI is serving a lot of scientists via its various online avenues and isn't looking to get slammed. (Note the comment about the limits on public web services below.)
The options that exist already usually can be wrapped in Python, and you can translate the results for analysis via the Python ecosystem, like this example.
Please check out documentation about running BLAST+ in the cloud, if you haven't already:
You can integrate as much Python into that as you'd like.
Thank you for your reply. From what I read on your GitHub project, blast-binder uses a locally installed BLAST+. I was looking to get away from that completely. As I said, the -remote option is not implemented robustly in the BLAST+ command line. Of course, I could write a Python program that uses the NCBI web API to access the NCBI web service directly, but I was wondering if someone else has already done that, because there is clearly a lot of logic in the BLAST+ tool for formatting and parsing the output that would be burdensome to replicate. As far as the cloud is concerned, it would cost me money to rent space on a cloud hosting service. I haven't looked at the actual pricing, but I would assume it would be too costly to be worth it for a hobbyist like myself. As for using the website directly, it doesn't support all features. For example, the blastp page no longer supports Entrez queries.
The -remote options are limited by NCBI. I don't think the issue is that the software is poorly written or that a new Python implementation could do better. NCBI wants to be able to manage the traffic and number of tasks in order to serve the community. That's been the case for as long as I recall. That is why I pointed out their documentation referring to "limits on public web services". Maybe someone else can chime in? See this page that has a Perl example of accessing the API. I thought an older version of that page at least mentioned a Python route, but maybe they decided it was kind of moot given Biopython's 'Bio.Blast.NCBIWWW' module, see here for example and here. (Note the NCBI documentation page again stresses that it is a shared resource and says what you should do if the limits are holding you back.)
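For what it's worth, the flow in that Perl example (submit, poll, fetch) can be sketched with only the Python standard library, which also gives you the progress monitoring you said was missing. This is a rough sketch, not a tested client: the function names, the default polling interval, and the time limits are my own choices, and it assumes the Blast.cgi endpoint and the CMD/RID/RTOE/Status fields behave as the NCBI URL API documentation and the Perl example describe. It is still subject to the same NCBI limits as -remote.

```python
import re
import time
import urllib.parse
import urllib.request

# Endpoint used by NCBI's own sample client for the BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_params(query, program="blastp", database="nr", entrez_query=None):
    """Build the parameters for a CMD=Put (submit search) request."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    if entrez_query:
        params["ENTREZ_QUERY"] = entrez_query  # e.g. "txid9606[ORGN]"
    return params


def parse_rid_rtoe(text):
    """Extract the request ID and estimated wait (seconds) from a Put response."""
    rid = re.search(r"RID = (\S+)", text)
    rtoe = re.search(r"RTOE = (\d+)", text)
    if rid is None:
        raise ValueError("no RID found in response; submission may have failed")
    return rid.group(1), int(rtoe.group(1)) if rtoe else 0


def parse_status(text):
    """Extract the search status (WAITING, READY, FAILED, UNKNOWN) from a response."""
    match = re.search(r"Status=(\w+)", text)
    return match.group(1) if match else "UNKNOWN"


def _fetch(params, timeout, post=False):
    """Send one request; the timeout keeps a stalled connection from hanging forever."""
    encoded = urllib.parse.urlencode(params)
    if post:
        request = urllib.request.Request(BLAST_URL, data=encoded.encode())
    else:
        request = urllib.request.Request(BLAST_URL + "?" + encoded)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def remote_blast(query, poll_seconds=60, max_wait=1800, timeout=60, **put_kwargs):
    """Submit a search, print progress while polling, and return the XML report."""
    rid, rtoe = parse_rid_rtoe(_fetch(build_put_params(query, **put_kwargs), timeout, post=True))
    print(f"Submitted, RID={rid}, estimated wait {rtoe}s")
    time.sleep(rtoe)
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        info = _fetch({"CMD": "Get", "FORMAT_OBJECT": "SearchInfo", "RID": rid}, timeout)
        status = parse_status(info)
        print(f"RID {rid}: {status}")
        if status == "READY":
            return _fetch({"CMD": "Get", "FORMAT_TYPE": "XML", "RID": rid}, timeout)
        if status in ("FAILED", "UNKNOWN"):
            raise RuntimeError(f"search {rid} ended with status {status}")
        time.sleep(poll_seconds)  # poll gently; this is a shared resource
    raise TimeoutError(f"search {rid} not ready after {max_wait}s")
```

You would call something like remote_blast(my_protein_sequence, entrez_query="txid9606[ORGN]") and get the XML back as a string. The printed RID can also be pasted into the website to look at the same job there, so you can at least tell whether the job is queued at NCBI or your connection has stalled.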
The blast-binder example was meant to illustrate using Python for downstream work once you have the data from BLAST, along the lines of saying there are lots of ways to use Python along with BLAST. Your enthusiasm for Python is admirable; however, you probably don't want to go about reinventing the wheel in a different language. It was also meant to point out that there are ways to make the databases and run queries that don't impact your local storage or machine, if you wanted.
The cloud way may be an alternative way to bring you the speed you want at little cost. You'd have to explore. Both Google and Amazon offer some free initial tiers that are more limited so you can work out some of the basics on a free or cheap machine and then only activate a large machine when you are really ready. From what I understand from the documentation, the databases are available so you don't need to supply them,