NCBI E-utilities: HTTP POST not working for EPost
9.4 years ago

I am trying to download FASTA sequences for a list of protein GIs (~100,000). My plan is to upload the list with EPost via HTTP POST and then use EFetch to download the FASTA records.

The server returns an empty response body (no WebEnv or query_key is generated) when I upload the list of GIs with EPost via HTTP POST.

The code that I am using for EPost is:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
my $url  = $base . "epost.fcgi";

my $url_params = "db=protein&id=830003112&id=830003110&id=830003108&id=830003106&id=830003104&id=830003102&id=830003100&id=830003098&id=830003096&id=830003094&id=830003092&id=830003090";

# create HTTP user agent
my $ua = LWP::UserAgent->new;

# create HTTP request object
my $req = HTTP::Request->new(POST => $url);
$req->content_type('application/x-www-form-urlencoded');
$req->content($url_params);

# post the HTTP request
my $response = $ua->request($req);
print $response->content;

Running this prints nothing, although it should print the WebEnv and the query_key. The HTTP status is 200 (OK).

If I remove all GIs from $url_params:

$url_params = "db=protein";

I get the following output:

<?xml version="1.0"?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ePost_020511.dtd">
<ePostResult>
    <ERROR>Empty ID list; Nothing to store</ERROR>
</ePostResult>

I have no idea what the problem is and why the server isn't generating WebEnv and query_key.

If anyone knows the solution please help me out.

E-utilities EPost Perl • 3.8k views
9.4 years ago
piet ★ 1.9k
$url_params = "db=protein&id=830003112&id=830003110&id=830003108&id=830003106&id=830003104&id=830003102&id=830003100&id=830003098&id=830003096&id=830003094&id=830003092&id=830003090";

You have to pass multiple IDs as a comma-separated list:

$url_params = "db=protein&id=830003112,830003110,830003108,830003106,830003104,830003102,830003100,830003098,830003096,830003094,830003092,830003090";

See also the example on the NCBI site: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EPost
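For a long GI list, the comma-separated body can be built from an array instead of being typed out. The following is a minimal sketch of that idea; `build_epost_body` is a hypothetical helper name introduced here for illustration, not part of any NCBI library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build an EPost request body with all IDs
# joined into a single comma-separated id parameter.
sub build_epost_body {
    my ($db, @ids) = @_;
    return "db=$db&id=" . join(',', @ids);
}

# GIs taken from the question; in practice these would be read from a file.
my @gis  = (830003112, 830003110, 830003108);
my $body = build_epost_body('protein', @gis);
print "$body\n";
```

The resulting string can be passed unchanged to `$req->content(...)` in the script from the question.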


The comma-separated list is for EPost using HTTP GET, which has a limit of 200 GIs per query. I want to query a million GIs, and for that HTTP POST should be used. Please read http://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_EPost_ (under required parameters, id).

In an HTTP POST query each ID is separated by &id=. This is shown in sample application 4: http://www.ncbi.nlm.nih.gov/books/NBK25498/#_chapter3_Application_4_Finding_unique_se_


There is no difference in the format of the data between a GET and a POST request. For some reason the NCBI server expects a comma-separated list for id and fails if &id= is given multiple times. Maybe the NCBI server does not follow the CGI standard.

I guess that the problem is with Perl. Please pass your comma-separated GI list as a plain string to the $req instance. If you pass the GIs as a list or an associative array/hash, Perl will presumably turn them into multiple &id= parameters.
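The difference between passing a list and passing a plain string can be checked locally without contacting NCBI. The sketch below assumes the URI module is installed (it is a dependency of LWP) and uses its `query_form` method as one example of a form encoder; `encode_form` is a hypothetical helper name. The comma in the joined string may appear percent-encoded as %2C, which a server that decodes form data reads as a comma.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Hypothetical helper: encode key/value pairs as a form query string
# using URI's query_form.
sub encode_form {
    my (@pairs) = @_;
    my $uri = URI->new('http:');
    $uri->query_form(@pairs);
    return $uri->query;
}

my @gis = (830003112, 830003110, 830003108);

# An array reference is expanded into one id= parameter per element.
my $repeated = encode_form(db => 'protein', id => \@gis);

# A pre-joined string is sent as a single id= parameter.
my $single = encode_form(db => 'protein', id => join(',', @gis));

print "$repeated\n$single\n";
```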


You are right, NCBI does not accept repeated &id= parameters. It worked as soon as I converted the IDs to a comma-separated list.

It seems the code in their sample application 4 for HTTP POST is incorrect.

Thanks a lot!
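Once EPost succeeds, the WebEnv and query key have to be pulled out of the XML response before calling EFetch. A minimal sketch using regular expressions follows; `parse_epost_result` is a hypothetical helper name, and the sample XML uses made-up placeholder values rather than a real server response.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract QueryKey and WebEnv from an ePostResult document.
# Returns undef for a field that is missing (for example on an <ERROR> response).
sub parse_epost_result {
    my ($xml) = @_;
    my ($query_key) = $xml =~ m{<QueryKey>(\d+)</QueryKey>};
    my ($web_env)   = $xml =~ m{<WebEnv>(\S+?)</WebEnv>};
    return ($query_key, $web_env);
}

# Placeholder response; the WebEnv value here is invented for illustration.
my $sample = <<'XML';
<?xml version="1.0"?>
<ePostResult>
    <QueryKey>1</QueryKey>
    <WebEnv>PLACEHOLDER_WEBENV_STRING</WebEnv>
</ePostResult>
XML

my ($key, $web) = parse_epost_result($sample);
print "query_key=$key WebEnv=$web\n";
```

In the script from the question, `$response->content` would take the place of `$sample`.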


I'd be curious to know whether the system itself will allow you to pass 1 million GIs in a single Entrez POST request (once you figure out how to generate the right format). Let us know.


I have successfully passed 100,000 GIs with one POST request. Will try 1 million soon.
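With that many records stored on the history server, the FASTA download is usually done in batches using EFetch's retstart and retmax parameters. The sketch below only builds the batch URLs; `efetch_batch_urls` is a hypothetical helper name, the batch size of 500 is chosen for illustration, and the WebEnv value is a placeholder. It uses the https form of the base URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build one EFetch URL per batch of $retmax records.
sub efetch_batch_urls {
    my ($base, $db, $query_key, $web_env, $count, $retmax) = @_;
    my @urls;
    for (my $retstart = 0; $retstart < $count; $retstart += $retmax) {
        push @urls, $base
            . "efetch.fcgi?db=$db&query_key=$query_key&WebEnv=$web_env"
            . "&retstart=$retstart&retmax=$retmax&rettype=fasta&retmode=text";
    }
    return @urls;
}

my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# Placeholder query key and WebEnv; real values come from the EPost response.
my @urls = efetch_batch_urls($base, 'protein', 1, 'PLACEHOLDER_WEBENV_STRING', 100_000, 500);
print scalar(@urls), " batches\n";
print "$urls[0]\n";
```

Each URL can then be fetched in turn (for example with LWP::UserAgent) and the output appended to one FASTA file.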


I'm kinda surprised to (still) see this working. Didn't NCBI switch to the HTTPS protocol some time ago? I would have thought the HTTP approach would have been phased out by now.
