Hi all,
I'm trying to use the BLAST URL API (https://blast.ncbi.nlm.nih.gov/Blast.cgi) from Python with the requests module. My goal is to send a sequence and get genomic (Ensembl GRCh38) coordinates back.
request = 'https://blast.ncbi.nlm.nih.gov/Blast.cgi?QUERY=gagtctcctttggaactctgcaggttctatttgctttttcccagatgagctctttttctggtgtttgtct&DATABASE=nt&PROGRAM=blastn&CMD=Put&FORMAT_TYPE=JSON2'
(This sequence is part of the ACTB gene)
I sent it to the server like this:
response = requests.get(request)
The response looks like:
print(response)
<Response [200]>
print(response.headers)
{'Server': 'Apache', 'Set-Cookie': 'BlastCubbyImported=passive; domain=ncbi.nlm.nih.gov, MyBlastUser=1lgZT_2PBCUePBfITK86610D67; domain=.ncbi.nlm.nih.gov; path=/, ncbi_sid=5AAB86A694B876A1_0000SID; domain=.nih.gov; path=/; expires=Fri, 22 Jun 2018 09:01:30 GMT', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Security-Policy': 'upgrade-insecure-requests', 'X-UA-Compatible': 'IE=Edge', 'Cache-Control': 'private', 'Referrer-Policy': 'origin-when-cross-origin', 'NCBI-SID': '5AAB86A694B876A1_0000SID', 'NCBI-PHID': '5AAB86A694B876A10000000000000001.m_1', 'Keep-Alive': 'timeout=1, max=10', 'X-XSS-Protection': '1; mode=block', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Date': 'Thu, 22 Jun 2017 09:01:30 GMT', 'Connection': 'Keep-Alive'}
print(response.content)
b'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>\n<meta name="jig" content="ncbitoggler ncbiautocomplete"/>\n<meta name="ncbi_app" content="static"/>\n<meta name="ncbi_pdid" content="blastformatreq"/>\n<meta name="ncbi_stat" content="false"/>\n<meta name="ncbi_sessionid" content="5AAB86A694B876A1_0000SID"/>\n<meta name="ncbi_phid" content="5AAB86A694B876A10000000000000001"/>\nNCBI Blast \n<link rel="stylesheet" type="text/css" href="css/header.css" media="screen"/>\n<link rel="stylesheet" type="text/css" href="css/google-fonts.css" media="screen"/>\n<link rel="stylesheet" type="text/css" href="css/footer.css" media="screen"/>\n<link rel="stylesheet" type="text/css" href="css/main.css" media="screen"/>\n<link rel="stylesheet" type="text/css" href="css/common.css" media="screen"/>\n<link rel="stylesheet" type="text/css" href="css/blastReq.css" media="screen"/>\n\n<link rel="stylesheet" type="text/css" href="css/print.css" media="print"/>\n\n\n\n<script type="text/javascript" src="/core/jig/1.14.8/js/jig.min.js "></script> \n<script type="text/javascript" src="js/utils.js"></script>\n<script type="text/javascript" src="js/blast.js"></script>\n<script type="text/javascript" src="js/format.js"></script>\n\n</head>\n\n<body id="type-a">\n\n\n\t\t \t\n \n \n - https://www.ncbi.nlm.nih.gov/">NCBI Home
\n - https://www.ncbi.nlm.nih.gov/myncbi">Sign in to NCBI
\n - Skip to Main Content
\n - Skip to Navigation
\n - https://www.ncbi.nlm.nih.gov/guide/browsers/#accesskeys">About NCBI Accesskeys
\n
\n
Most of it is cut off because of the character limit of this post.
This is unexpected, isn't it? The response is not JSON and is difficult to parse; it looks like I'm getting a web page back somehow.
Any suggestions?
Best regards,
Freek.
Hi Gunnar, thanks for your response.
Hmm, before asking to have such things installed on our compute cluster, I would prefer this minimal approach. I will look into Biopython, but I would still prefer minimal, self-written, flexible code and an easy-to-parse JSON response for portability, if anybody can get it to work :)
I feel I'm missing a very small thing.
By the way, whatever FORMAT_TYPE I use, I get the same HTML web page as a response.
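As far as the NCBI URL API documentation describes it, CMD=Put does not return results directly: it replies with an HTML page that carries a request ID (RID) and an estimated wait time (RTOE) inside a QBlastInfo comment, and the results are fetched afterwards with CMD=Get and that RID. That would explain why FORMAT_TYPE makes no difference on the Put call. Below is a minimal sketch of the submit / poll / fetch cycle using only the standard library (urllib is swapped in for requests so nothing needs installing). The helper names, the 60-second polling interval, and the default format "XML" are my own assumptions, and I have not checked whether FORMAT_TYPE=JSON2 comes back as plain JSON or as an archive.

```python
import re
import time
import urllib.parse
import urllib.request

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def parse_put_response(html):
    """Extract (RID, RTOE) from the QBlastInfo comment of a CMD=Put reply."""
    rid = re.search(r"^\s*RID = (\S+)", html, re.M)
    rtoe = re.search(r"^\s*RTOE = (\d+)", html, re.M)
    if rid is None:
        raise ValueError("no RID found in the Put response")
    return rid.group(1), int(rtoe.group(1)) if rtoe else 0


def parse_status(text):
    """Return the Status value (e.g. WAITING, READY) from a SearchInfo reply."""
    match = re.search(r"Status=(\w+)", text)
    return match.group(1) if match else "UNKNOWN"


def call_blast(params, post=False):
    """Send one request to Blast.cgi and return the body as text."""
    encoded = urllib.parse.urlencode(params)
    if post:
        # Passing data makes urllib issue a POST instead of a GET.
        target = urllib.request.Request(BLAST_URL, data=encoded.encode())
    else:
        target = BLAST_URL + "?" + encoded
    with urllib.request.urlopen(target) as resp:
        return resp.read().decode("utf-8", errors="replace")


def run_blast(query, program="blastn", database="nt", fmt="XML", poll=60):
    """Submit a query, wait until the search is ready, return the results."""
    html = call_blast(
        {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query},
        post=True,
    )
    rid, rtoe = parse_put_response(html)
    time.sleep(rtoe)  # estimated time until the search finishes
    while True:
        info = call_blast({"CMD": "Get", "FORMAT_OBJECT": "SearchInfo", "RID": rid})
        status = parse_status(info)
        if status == "READY":
            break
        if status != "WAITING":
            raise RuntimeError("search %s ended with status %s" % (rid, status))
        time.sleep(poll)  # assumed interval; avoid polling too often
    return call_blast({"CMD": "Get", "FORMAT_TYPE": fmt, "RID": rid})
```

Calling run_blast("gagtctcctttgg...") with the sequence above should then return the formatted result text once the search has finished; nothing is sent to NCBI until that function is called.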
Any tips on using Biopython then?
If I do this:
I get some XML output, but when I try to print result_handle again, it is empty! How can I save the result, for example?
Never mind, I found this in the Biopython Cookbook:
"We need to be a bit careful since we can use result_handle.read() to read the BLAST output only once – calling result_handle.read() again returns an empty string."
I really don't understand the reason for this; it made me re-run BLAST many, many times, wondering what went wrong in the part of my script after the .read(). Anyway, thanks for the suggestion.
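For anyone hitting the same thing: the handle behaves like any file-like stream, so the first .read() moves the position to the end and a second .read() has nothing left to return. Reading once into a variable (and writing that to disk) avoids having to re-run BLAST. The sketch below shows this with an io.StringIO stand-in instead of a real BLAST call; the helper name read_and_save, the file name my_blast.xml, and the sample XML string are made up for illustration.

```python
import io
import os
import tempfile


def read_and_save(handle, path):
    """Read a file-like handle once, write the text to path, and return it."""
    text = handle.read()
    with open(path, "w") as out:
        out.write(text)
    return text


# Stand-in for the handle a BLAST call would return (hypothetical content).
handle = io.StringIO("<BlastOutput>example</BlastOutput>")
path = os.path.join(tempfile.mkdtemp(), "my_blast.xml")

first = read_and_save(handle, path)
second = handle.read()  # the stream is already at its end, so this is empty

print(len(first), repr(second))

# The saved copy can be reopened as often as needed.
with open(path) as saved:
    assert saved.read() == first
```

With Biopython the same pattern should apply to the handle returned by NCBIWWW.qblast: read it once, keep the string or the saved file, and parse from that afterwards.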