HTTP Error 502-Biopython-Entrez Files
10.0 years ago

Hi, I am using Biopython to pull files from NCBI using Entrez. The program works on small files, but on larger files I get an error. I would really appreciate some insight or help figuring out what went wrong.

Here is the program:

from Bio import Entrez
Entrez.email = "jbro262@lsu.edu"
search_handle = Entrez.esearch(db="nucleotide",term="Saimiri",usehistory="n")
search_results = Entrez.read(search_handle)
search_handle.close()

gi_list = search_results["IdList"]
count = int(search_results["Count"])
a = open("Numfile.txt", "a+")
a.write("The number of Saimiri files are :")
a.write(str(count))
a.write("\n")
a.close()

webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]

batch_size = 25
out_handle = open("SaimiriDNA.fasta", "w")

for start in range(0,count,batch_size):
    
    end = min(count, start+batch_size)
    print("Going to download record %i to %i" % (start+1, end))
    
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text", retstart=start, retmax=batch_size, webenv=webenv, query_key=query_key)
    data=fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()

HERE ARE THE ERRORS:

Traceback (most recent call last):
  File "Entrezfiles_Saimiri.py", line 53, in <module>
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text", retstart=start, retmax=batch_size, webenv=webenv, query_key=query_key)
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/__init__.py", line 149, in efetch
    return _open(cgi, variables, post)
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/__init__.py", line 464, in _open
    raise exception
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/__init__.py", line 462, in _open
    handle = _urlopen(cgi)
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 461, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 499, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 502: Bad Gateway

Does this mean something is going wrong with my server while the files are downloading?

Help is greatly appreciated.

biopython entrez python3 ubuntu error • 8.5k views

A 502 usually has nothing to do with the client. Could you try again in half an hour or so and see whether it still occurs?

Thank you. Okay, I will try again in a few minutes. However, it took an hour or so for the error to occur last time. What exactly does Error 502 mean, and how does that relate to a urllib.error in Python?

Under Python 3, you would import the HTTPError class with: from urllib.error import HTTPError

Having done that you can use it to catch the exception, see also: http://stackoverflow.com/questions/3193060/catch-specific-http-error-in-python

HTTP error code 502 (Bad Gateway) is a server-side problem: a gateway or proxy received an invalid response from the server behind it (in this case, an NCBI problem). See http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
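
To illustrate the import and the except clause mentioned above, here is a minimal sketch. The helper name describe_http_error is made up for this example, and the error is raised by hand (with a placeholder URL) so that the snippet runs without contacting NCBI; in a real script the call inside the try block would be something like Entrez.efetch(...).

```python
from urllib.error import HTTPError


def describe_http_error(err):
    """Return a short label saying whether an HTTPError looks transient.

    Hypothetical helper for illustration only, not part of Biopython.
    """
    # 5xx codes are server-side errors; 502, 503 and 504 are often temporary.
    if err.code in (502, 503, 504):
        return "server-side, worth retrying"
    return "not obviously transient"


try:
    # Stand-in for a network call such as Entrez.efetch(...).
    # The URL is a placeholder, not a real NCBI address.
    raise HTTPError("https://example.invalid/efetch", 502, "Bad Gateway", None, None)
except HTTPError as err:
    # err.code holds the numeric status, err.reason the accompanying message.
    print("HTTP Error %i: %s (%s)" % (err.code, err.reason, describe_http_error(err)))
```

The point is only that the exception carries the status code, so the script can decide whether to retry or stop.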

Thanks! I'll use this information to help edit my code.

Hey. The try/except around Entrez.efetch fixed my program. Works great now. Thanks!

10.0 years ago
Peter 6.0k

When making heavy use of an online service like NCBI Entrez, you should expect to get intermittent network errors like HTTP Error 502: Bad Gateway from time to time. The standard approach would be to wrap the call in a try/except block and retry it (e.g. three retries, with a pause between each).

Or just wait and retry when the NCBI is less busy (i.e. avoid USA working hours), that is often easier ;)

See also Ncbi Entrez Server Issues
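
As a rough sketch of the "try/except and retry with a pause" idea described in this answer: the function name call_with_retries and its parameters (retries, pause, sleep) are assumptions made up for this example, not anything provided by Biopython. It retries only on 5xx errors and re-raises the last error once the retries run out.

```python
import time
from urllib.error import HTTPError


def call_with_retries(func, retries=3, pause=5.0, sleep=time.sleep):
    """Call func(); on a 5xx HTTPError, pause and retry up to `retries` times.

    Hypothetical helper for illustration; `sleep` is a parameter only so
    that the waiting can be swapped out when testing.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except HTTPError as err:
            # Re-raise client-side errors (not 5xx) immediately, and give up
            # with the original error once all retries have been used.
            if not 500 <= err.code < 600 or attempt == retries:
                raise
            print("Got HTTP Error %i, retry %i of %i after %g s"
                  % (err.code, attempt + 1, retries, pause))
            sleep(pause)
```

With Biopython it might be used along the lines of call_with_retries(lambda: Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text", retstart=start, retmax=batch_size, webenv=webenv, query_key=query_key)). The pause of a few seconds is a guess; a longer pause may be kinder to the server when it is busy.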

Thanks Peter! That makes a lot of sense. Would you mind giving me an example of what you mean by wrapping the call in a try/except block? I am fairly new to programming, so I'm learning as I go. Do you mean that for each batch I have a block of code that tries to pull the data, and when an error occurs the program handles it and moves on from there?

I'll also take a look at the NCBI server issues link.

I don't have an example to hand, but another Biopython contributor might: http://lists.open-bio.org/pipermail/biopython-dev/2014-November/020773.html

I would in the first instance put the try/except round the Entrez.efetch(...) call to allow a pause and retry - but that would only work as long as the history session does not expire.
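
To show where the try/except would sit in a batch loop like the one in the question, here is a hedged sketch. The function download_batches and its fetch argument are hypothetical names introduced for this example; fetch(start, batch_size) stands in for the Entrez.efetch(...) call so the loop can be tried without network access. It does nothing about an expired history session, so the caveat above still applies.

```python
import time
from urllib.error import HTTPError


def download_batches(fetch, count, out_handle, batch_size=25, retries=3, pause=5.0):
    """Write `count` records to out_handle in batches, retrying failed fetches.

    `fetch(start, batch_size)` is a hypothetical callable returning a handle
    with read() and close(); each batch gets up to `retries` attempts.
    """
    for start in range(0, count, batch_size):
        end = min(count, start + batch_size)
        print("Going to download record %i to %i" % (start + 1, end))
        for attempt in range(1, retries + 1):
            try:
                fetch_handle = fetch(start, batch_size)
                break  # success, stop retrying this batch
            except HTTPError as err:
                print("Attempt %i of %i failed with HTTP Error %i"
                      % (attempt, retries, err.code))
                time.sleep(pause)
        else:
            # Runs only if no attempt succeeded (the loop never hit `break`).
            raise RuntimeError("Giving up on batch starting at record %i" % (start + 1))
        out_handle.write(fetch_handle.read())
        fetch_handle.close()


# With Biopython, `fetch` could be defined roughly like this (untested here,
# using the variables from the script in the question):
#
# def fetch(start, size):
#     return Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text",
#                          retstart=start, retmax=size,
#                          webenv=webenv, query_key=query_key)
```

Only the fetch call is inside the try block, so a failed batch is retried as a whole and nothing is written to the output file until the fetch has succeeded.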

Okay I will look into other examples. Thanks so much.

As far as using the history option goes: when I tried it, not all of the files that I see online would download, only a portion of them. So I opted to put no instead of yes for "use history". Could that be an issue?
