Python stopped opening XML URL: connection closed
7.1 years ago
alowi33 ▴ 50

I used the following code in Python 3.5 to scrape data from several XML web pages. It worked once and then stopped working when I tried again. Do you know why, and how I can fix it? The following is part of a longer script:

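import urllib.error  # imported explicitly so urllib.error.URLError always resolves
from urllib.request import urlopen

urls = ["https://www.ebi.ac.uk/ena/data/view/ERS1887141&display=xml",
        "https://www.ebi.ac.uk/ena/data/view/ERS1887140&display=xml",
        "https://www.ebi.ac.uk/ena/data/view/ERS1887139&display=xml"]

for url in urls:
    contents = None
    # keep retrying the same URL until a response body has been read
    while contents is None:
        try:
            s = urlopen(url)
            contents = s.read()
        except urllib.error.URLError as err:
            # err.reason holds the underlying cause (here, the SSL error)
            print("urllib error:", err.reason)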

It worked the first time I opened a single URL. It failed when I tried to open the same URL again, or another URL on the same domain. When I remove the exception handling I get the error below. Increasing the timeout did not solve the problem. The error seems to be related to the TLS/SSL connection.

Unlike NCBI, the European Nucleotide Archive (ENA) does not provide the metadata (info table) for all samples in the study PRJEB99111 as a single file. I have seen the metadata only in HTML or XML format, hence my URL scraping. Please let me know if you can find this metadata/info file.

Traceback (most recent call last):
  File "/home/amirza/.conda/envs/py35/lib/python3.5/urllib/request.py", line 1254, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/http/client.py", line 1107, in request
    self._send_request(method, url, body, headers)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/http/client.py", line 1152, in _send_request
    self.endheaders(body)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/http/client.py", line 1103, in endheaders
    self._send_output(message_body)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/http/client.py", line 877, in send
    self.connect()
  File "/home/amirza/.conda/envs/py35/lib/python3.5/http/client.py", line 1261, in connect
    server_hostname=server_hostname)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/ssl.py", line 385, in wrap_socket
    _context=self)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/ssl.py", line 760, in __init__
    self.do_handshake()
  File "/home/amirza/.conda/envs/py35/lib/python3.5/ssl.py", line 996, in do_handshake
    self._sslobj.do_handshake()
  File "/home/amirza/.conda/envs/py35/lib/python3.5/ssl.py", line 641, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:719)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/amirza/.conda/envs/py35/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/urllib/request.py", line 466, in open
    response = self._open(req, data)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/urllib/request.py", line 484, in _open
    '_open', req)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/urllib/request.py", line 1297, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/home/amirza/.conda/envs/py35/lib/python3.5/urllib/request.py", line 1256, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error TLS/SSL connection has been closed (EOF) (_ssl.c:719)>
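
The traceback shows a `URLError` that wraps an `ssl` exception, so the underlying cause is available as `err.reason`. Below is a minimal sketch, not a confirmed fix, of a bounded retry that inspects that reason and pauses longer after each failure. The function name `fetch_with_backoff` and its parameters (`opener`, `max_attempts`, `base_delay`, `sleep`) are hypothetical and exist only for illustration; whether pausing between requests helps with this particular server is an assumption.

```python
import ssl
import time
import urllib.error
from urllib.request import urlopen


def fetch_with_backoff(url, opener=urlopen, max_attempts=4,
                       base_delay=1.0, sleep=time.sleep):
    """Return the response body, retrying on URLError with growing pauses.

    `opener` and `sleep` are injectable (hypothetical parameters) so the
    function can be exercised without touching the network.
    """
    for attempt in range(max_attempts):
        try:
            with opener(url, timeout=30) as response:
                return response.read()
        except urllib.error.URLError as err:
            # err.reason is the wrapped exception, e.g. an ssl.SSLError
            is_tls = isinstance(err.reason, ssl.SSLError)
            print("attempt", attempt + 1, "failed; TLS-related:", is_tls)
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
```

Calling `fetch_with_backoff(url)` for each entry in `urls` would replace the unbounded `while` loop, so a persistent failure ends with an exception instead of looping forever.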
web scraping urlopen urllib python EMBL-EBI • 2.9k views

I am not sure exactly what metadata you are looking for, but if you click on the "TEXT" link on this page you can get a detailed dump of all samples (including the data download locations).


"TEXT" does not contain the metadata (age, sex, disease status, etc.). The metadata can be found by clicking on the "Attributes" tab for each sample individually (147 samples in total). This page contains the metadata for one of the samples: https://www.ebi.ac.uk/ena/data/view/SAMEA104228118
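
Once the XML for a sample has been downloaded, the per-sample attributes could be collected into a dictionary along these lines. This is only a sketch: it assumes each attribute appears as a `SAMPLE_ATTRIBUTE` element with `TAG` and `VALUE` children, which should be checked against a real response, and the XML string and its values below are made up for illustration. The helper name `sample_attributes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up example document; the element layout is an assumption.
SAMPLE_XML = """
<SAMPLE_SET>
  <SAMPLE>
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE><TAG>age</TAG><VALUE>45</VALUE></SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE><TAG>sex</TAG><VALUE>female</VALUE></SAMPLE_ATTRIBUTE>
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</SAMPLE_SET>
"""


def sample_attributes(xml_text):
    """Map each attribute TAG to its VALUE for one sample's XML."""
    root = ET.fromstring(xml_text)
    attrs = {}
    for node in root.iter("SAMPLE_ATTRIBUTE"):
        tag = node.findtext("TAG")
        if tag is not None:
            # a missing VALUE element becomes an empty string
            attrs[tag] = node.findtext("VALUE", default="")
    return attrs


print(sample_attributes(SAMPLE_XML))
```

Running this over all 147 samples and writing one row per sample would give a single metadata table.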
