Hi, I am using Biopython to pull files from NCBI using Entrez. The program works on small files, but on larger files I get an error. I would really appreciate some insight or help figuring out what went wrong.
Here is the program:
from Bio import Entrez

Entrez.email = "jbro262@lsu.edu"
search_handle = Entrez.esearch(db="nucleotide", term="Saimiri", usehistory="n")
search_results = Entrez.read(search_handle)
search_handle.close()

gi_list = search_results["IdList"]
count = int(search_results["Count"])

a = open("Numfile.txt", "a+")
a.write("The number of Saimiri files are :")
a.write(str(count))
a.write("\n")
a.close()

webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]

batch_size = 25
out_handle = open("SaimiriDNA.fasta", "w")
for start in range(0, count, batch_size):
    end = min(count, start+batch_size)
    print("Going to download record %i to %i" % (start+1, end))
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text", retstart=start, retmax=batch_size, webenv=webenv, query_key=query_key)
    data = fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()
HERE ARE THE ERRORS:
Traceback (most recent call last):
  File "Entrezfiles_Saimiri.py", line 53, in <module>
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text", retstart=start, retmax=batch_size, webenv=webenv, query_key=query_key)
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/__init__.py", line 149, in efetch
    return _open(cgi, variables, post)
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/__init__.py", line 464, in _open
    raise exception
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/__init__.py", line 462, in _open
    handle = _urlopen(cgi)
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 461, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 499, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 502: Bad Gateway
Does this mean something is going wrong with my server while the files are downloading?
Help is greatly appreciated.
A 502 usually has nothing to do with the client. Could you try again in half an hour or so and see if the error still occurs?
Thank you. Okay, I will try again in a few minutes. However, it took an hour or so for the error to occur the last time. What exactly does Error 502 mean, and how does that relate to a urllib.error in Python?
Under Python 3, you would import the HTTPError class with:
from urllib.error import HTTPError
Having done that, you can use it to catch the exception. See also: http://stackoverflow.com/questions/3193060/catch-specific-http-error-in-python
HTTP error code 502 is a specific server problem (in this case, an NCBI problem). See http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
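For anyone finding this later, here is a minimal sketch of what catching the exception and retrying could look like. The helper name fetch_with_retries and its max_attempts and delay_seconds parameters are made up for illustration (they are not part of Biopython), and the default values are arbitrary rather than anything NCBI recommends. It retries only on 5xx server errors such as the 502 above and re-raises anything else.

```python
import time
from urllib.error import HTTPError


def fetch_with_retries(fetch, max_attempts=3, delay_seconds=15):
    """Call fetch() and return its result, retrying on HTTP 5xx errors.

    fetch is any zero-argument callable, for example a lambda that wraps
    Entrez.efetch(...). The name and the defaults are hypothetical.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except HTTPError as err:
            # Only server-side errors (500-599) are retried. Client errors
            # and the last failed attempt are re-raised to the caller.
            if not 500 <= err.code < 600 or attempt == max_attempts:
                raise
            print("Received HTTP error %i on attempt %i of %i, retrying"
                  % (err.code, attempt, max_attempts))
            time.sleep(delay_seconds)


# Possible use inside the download loop from the question (assumes
# Biopython is installed and webenv, query_key, start and batch_size
# are defined as in the original script):
#
# fetch_handle = fetch_with_retries(
#     lambda: Entrez.efetch(db="nucleotide", rettype="fasta",
#                           retmode="text", retstart=start,
#                           retmax=batch_size, webenv=webenv,
#                           query_key=query_key))
```

Passing the call in as a lambda keeps the retry logic separate from the Entrez arguments, so the same helper could wrap esearch or any other request that may hit a temporary server error.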
Thanks! I'll use this information to help edit my code.
Hey. The try/except around Entrez.efetch fixed my program. Works great now. Thanks!