Using the BLAST API from Python (3)
7.4 years ago
Freek ▴ 60

Hi all,

I'm trying to use the BLAST API (https://blast.ncbi.nlm.nih.gov/Blast.cgi) from Python with the requests module. My goal is to send a sequence and get genomic coordinates (Ensembl GRCh38) back.

request = 'https://blast.ncbi.nlm.nih.gov/Blast.cgi?QUERY=gagtctcctttggaactctgcaggttctatttgctttttcccagatgagctctttttctggtgtttgtct&DATABASE=nt&PROGRAM=blastn&CMD=Put&FORMAT_TYPE=JSON2'

(This sequence is part of the ACTB gene)

I sent it to the server like this:

response = requests.get(request)

The response looks like:

print(response)
<Response [200]>
print(response.headers)
{'Server': 'Apache', 'Set-Cookie': 'BlastCubbyImported=passive; domain=ncbi.nlm.nih.gov, MyBlastUser=1lgZT_2PBCUePBfITK86610D67; domain=.ncbi.nlm.nih.gov; path=/, ncbi_sid=5AAB86A694B876A1_0000SID; domain=.nih.gov; path=/; expires=Fri, 22 Jun 2018 09:01:30 GMT', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Security-Policy': 'upgrade-insecure-requests', 'X-UA-Compatible': 'IE=Edge', 'Cache-Control': 'private', 'Referrer-Policy': 'origin-when-cross-origin', 'NCBI-SID': '5AAB86A694B876A1_0000SID', 'NCBI-PHID': '5AAB86A694B876A10000000000000001.m_1', 'Keep-Alive': 'timeout=1, max=10', 'X-XSS-Protection': '1; mode=block', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Date': 'Thu, 22 Jun 2017 09:01:30 GMT', 'Connection': 'Keep-Alive'}

print(response.content)
b'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>\n<meta name="jig" content="ncbitoggler ncbiautocomplete"/>\n<meta name="ncbi_app" content="static"/>\n<meta name="ncbi_pdid" content="blastformatreq"/>\n<meta name="ncbi_stat" content="false"/>\n<meta name="ncbi_sessionid" content="5AAB86A694B876A1_0000SID"/>\n<meta name="ncbi_phid" content="5AAB86A694B876A10000000000000001"/>\nNCBI Blast\n<link rel="stylesheet" type="text/css" href="css/header.css" media="screen"/>\n<link rel="stylesheet" type="text/css" href="css/google-fonts.css" media="screen"/>\n<link rel="stylesheet" type="text/css" href="css/footer.css" media="screen"/>\n<link rel="stylesheet" type="text/css" href="css/main.css" media="screen"/>\n<link rel="stylesheet" type="text/css" href="css/common.css" media="screen"/>\n<link rel="stylesheet" type="text/css" href="css/blastReq.css" media="screen"/>\n\n<link rel="stylesheet" type="text/css" href="css/print.css" media="print"/>\n\n\n\n<script type="text/javascript" src="/core/jig/1.14.8/js/jig.min.js             "></script>   \n<script type="text/javascript" src="js/utils.js"></script>\n<script type="text/javascript" src="js/blast.js"></script>\n<script type="text/javascript" src="js/format.js"></script>\n\n</head>\n\n<body id="type-a">\n\n
\n\t\t \t\n

Most of it is cut off because of the character limit of this post.

This is unexpected, isn't it? The response is not JSON and is difficult to parse; it looks like I am getting a web page back somehow.

Any suggestions?

Best regards,

Freek.

7.4 years ago

How set are you on using the JSON format for the response?
Have you considered using the BLAST API from Biopython? http://biopython.org/DIST/docs/api/Bio.Blast-module.html

-> very easy to parse!


Hi Gunnar, thanks for your response.

Hmm, before asking to have such things installed on our compute cluster, I would prefer this minimal approach. I will look into Biopython, but I would still prefer minimal, self-made, flexible code and an easy-to-parse JSON response for portability, if anybody can get it to work :)

I feel I'm missing a very small thing.


By the way, whatever FORMAT_TYPE I use, I get the same HTML web page as a response.
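One possible explanation, as far as I understand the NCBI URL API: `CMD=Put` only submits the search and replies with an HTML page carrying a request ID (RID), while `FORMAT_TYPE` takes effect on a later `CMD=Get` request for that RID. Below is a minimal sketch of building both URLs with the standard library. `build_put_url` and `build_get_url` are hypothetical helper names, and `JSON2_S` (single-file JSON) is an assumption that should be checked against the current NCBI documentation.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_url(query, program="blastn", database="nt"):
    """Build the URL that submits a search (hypothetical helper)."""
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }
    return BLAST_URL + "?" + urlencode(params)


def build_get_url(rid, format_type="JSON2_S"):
    """Build the URL that retrieves results for an RID (hypothetical helper).

    Assumption: the report format is chosen here, not at submission time.
    """
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return BLAST_URL + "?" + urlencode(params)


# "EXAMPLERID" is a placeholder, not a real request ID.
print(build_put_url("gagtctcctttggaactctgcagg"))
print(build_get_url("EXAMPLERID"))
```

Passing a dict through `urlencode` also takes care of escaping, which a hand-assembled query string does not.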


Any tips on using Biopython then?

If I do this:

result_handle = NCBIWWW.qblast("blastn", "nt", 'gagtctcctttggaactctgcaggttctatttgctttttcccagatgagctctttttctggtgtttgtct')

I get some XML output, but when I try to print the contents of result_handle again, it is empty! How can I save the result, for example?

Never mind, I found this in the Biopython Cookbook:

"We need to be a bit careful since we can use result_handle.read() to read the BLAST output only once – calling result_handle.read() again returns an empty string."

I really don't understand the reason for this; it made me re-run BLAST many, many times, wondering what went wrong in the part of my script after the .read(). Anyway, thanks for the suggestion.
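The likely reason is that the handle behaves like any file-like stream: reading moves the position to the end, so a second `read()` has nothing left to return. Here is a small sketch that uses `io.StringIO` as a stand-in for the handle returned by `qblast` (an assumption: the real return type may differ between Biopython versions). The XML string and the file name `my_blast.xml` are placeholders.

```python
import io
import os
import tempfile

# Stand-in for the handle returned by NCBIWWW.qblast (assumption).
result_handle = io.StringIO("<BlastOutput>placeholder</BlastOutput>")

# Read once and keep the text in a variable.
blast_xml = result_handle.read()

# A second read starts at the end of the stream and returns an empty string.
second_read = result_handle.read()

# Save the text so later steps can reopen it without re-running BLAST.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_blast.xml")
    with open(path, "w") as out_file:
        out_file.write(blast_xml)
    with open(path) as saved_file:
        reloaded = saved_file.read()

print(repr(second_read))
print(reloaded == blast_xml)
```

Keeping the text in a variable, or on disk, means the later parsing code can be debugged repeatedly against the same result.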

20 months ago
Crunk ▴ 10

I'm on exactly the same page as you: I am also trying to find a way to use web BLAST through python-requests. I think it would certainly be much faster than the alternatives (QBlast, BLAST+).

I have been stuck on finding the RID (request ID) of the BLAST job. The RID is the only missing piece: if we find a way to create the RID on the client and post it with the other parameters, we'll get the result.
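As far as I know, the RID is not created on the client: the server assigns it and embeds it in the HTML returned by the `CMD=Put` request, inside a `QBlastInfoBegin` / `QBlastInfoEnd` block, together with an estimated wait time in seconds (`RTOE`). Here is a sketch of pulling both out with regular expressions. `parse_qblast_info` is a hypothetical helper, and the sample page and RID below are made up for illustration.

```python
import re


def parse_qblast_info(html):
    """Return (rid, rtoe_seconds) from a Put response page (hypothetical helper)."""
    # Look through every QBlastInfo block and use the first one holding an RID.
    for block in re.findall(r"QBlastInfoBegin(.*?)QBlastInfoEnd", html, re.DOTALL):
        rid_match = re.search(r"RID\s*=\s*(\S+)", block)
        if rid_match is None:
            continue
        rtoe_match = re.search(r"RTOE\s*=\s*(\d+)", block)
        rtoe = int(rtoe_match.group(1)) if rtoe_match else None
        return rid_match.group(1), rtoe
    raise ValueError("no RID found in response")


# Made-up sample page; a real response is much longer.
sample_page = """<html><body>
<!--QBlastInfoBegin
    RID = ABCD1234016
    RTOE = 29
QBlastInfoEnd
-->
</body></html>"""

rid, rtoe = parse_qblast_info(sample_page)
print(rid, rtoe)
```

With real traffic, `html` would be `response.text` from the submission request, and the returned RID would go into the follow-up `CMD=Get` requests.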



Web BLAST is a public resource that I don't think is meant to be used programmatically. You could either run local BLAST searches or, if you are only doing a limited number, run -remote command-line searches from a local BLAST+ client. If you have a lot of queries, web or remote BLAST will not work, since NCBI may ban your IP for repeated queries, or you will start getting errors about using more resources than can be allocated.
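For occasional queries, NCBI's usage guidelines ask (as I recall them; please check the current developer documentation) that scripts contact the server no more than once every 10 seconds and poll a given RID no more than once a minute. Below is a sketch of a polite polling loop under those assumptions. `fetch_status_page` is a hypothetical callable that would perform the `CMD=Get&FORMAT_OBJECT=SearchInfo` request; it is injected here so the logic can be tried without network access.

```python
import re
import time


def parse_status(text):
    """Extract the Status= value from a SearchInfo reply; UNKNOWN if absent."""
    match = re.search(r"Status=(\w+)", text)
    return match.group(1) if match else "UNKNOWN"


def wait_until_ready(fetch_status_page, rid, poll_seconds=60, max_polls=30,
                     sleep=time.sleep):
    """Poll until the search is READY; return False if max_polls is exhausted."""
    for _ in range(max_polls):
        status = parse_status(fetch_status_page(rid))
        if status == "READY":
            return True
        if status in ("FAILED", "UNKNOWN"):
            raise RuntimeError("search %s ended with status %s" % (rid, status))
        # Still WAITING: pause before asking again (assumed once-a-minute limit).
        sleep(poll_seconds)
    return False


# Offline demonstration with canned replies and a sleep that only records.
canned_replies = iter(["Status=WAITING", "Status=READY"])
recorded_waits = []
ready = wait_until_ready(lambda rid: next(canned_replies), "EXAMPLERID",
                         sleep=recorded_waits.append)
print(ready, recorded_waits)
```

In real use, `fetch_status_page` would wrap a `requests.get` call and `sleep` would stay at its default, so each RID is checked at most once per `poll_seconds`.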
