Question

Web EFetch JSON output that is not real JSON

7.9 years ago

Lilizine ▴ 10

Hi all,

I am using this code to download PubMed articles:

import requests

search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2010/01/01&maxdate=2016/12/31&usehistory=y&retmode=json"
search_r = requests.post(search_url)
search_data = search_r.json()
webenv = search_data["esearchresult"]['webenv']
total_records = int(search_data["esearchresult"]['count'])
fetch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmax=9999&query_key=1&webenv="+webenv

for i in range(0, total_records, 10000):
    this_fetch = fetch_url+"&retstart="+str(i)
    print("Getting this URL: "+this_fetch)
    fetch_r = requests.post(this_fetch)
    f = open('pubmed_batch_'+str(i)+'_to_'+str(i+9999)+".json", 'w')
    f.write(fetch_r.text)
    f.close()

I want the output in XML, not JSON. The problem appears when I try to do this:

page = urllib.urlopen('one of the URLs')
content = page.read()
obj = json.loads(content)
xml = dicttoxml.dicttoxml(content)
print(xml)

I have this error:

No JSON object could be decoded

Ideally, I would like to extract XML directly and avoid the JSON outputs, which are not recognized as JSON.

PS: ignore any indentation issues due to copy-paste.

Thanks

json efetch biopython python pubmed • 5.4k views

ADD COMMENT • link updated 6.5 years ago by MatthewP ★ 1.4k • written 7.9 years ago by Lilizine ▴ 10

I don't understand the issue here. If you want xml output from NCBI, just use retmode=xml in the url.

ADD REPLY • link 7.9 years ago by Jean-Karim Heriche 27k

I already tried this and it returns an error.

ADD REPLY • link 7.9 years ago by Lilizine ▴ 10

What returns an error? What's the error message?

ADD REPLY • link 7.9 years ago by Jean-Karim Heriche 27k

I changed the following:

search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2010/01/01&maxdate=2016/12/31&usehistory=y&retmode=xml"
search_r = requests.post(search_url)
search_data = search_r.xml()
webenv = search_data["esearchresult"]['webenv']
total_records = int(search_data["esearchresult"]['count'])
fetch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmax=9999&query_key=1&webenv="+webenv

for i in range(0, total_records, 10000):
    this_fetch = fetch_url+"&retstart="+str(i)
    print("Getting this URL: "+this_fetch)
    fetch_r = requests.post(this_fetch)
    f = open('pubmed_batch_'+str(i)+'_to_'+str(i+9999)+".txt", 'w')
    f.write(fetch_r.text)
    f.close()

print("Number of records found :"+str(total_records))

I got this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-8-262417e4aa63> in <module>()
      1 search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2010/01/01&maxdate=2016/12/31&usehistory=y&retmode=xml"
      2 search_r = requests.post(search_url)
----> 3 search_data = search_r.xml()
      4 webenv = search_data["esearchresult"]['webenv']
      5 total_records = int(search_data["esearchresult"]['count'])

AttributeError: 'Response' object has no attribute 'xml'

ADD REPLY • link 7.9 years ago by Lilizine ▴ 10

I don't know which programming language you're using (Python maybe? I don't know Python), but this error indicates that your object has no method called xml. Maybe the module you're using cannot handle XML. The solution is to deal with the XML yourself, perhaps with the help of another module. This now looks more like a programming question than a bioinformatics one, so you may have better luck asking on StackOverflow. Alternatively, explain what you're trying to do, i.e. what bioinformatics problem you're trying to solve. There may already be a solution.

ADD REPLY • link 7.9 years ago by Jean-Karim Heriche 27k

score 3 · Answer 1 · 2019-01-30

6.5 years ago

MatthewP ★ 1.4k

Hello, you set retmode=json, but this may not be supported by db=pubmed. This table shows the default and valid retmode values for all E-utilities databases.

ADD COMMENT • link 6.5 years ago by MatthewP ★ 1.4k

score 0 · Answer 2 · 2019-01-30

So far, eUtils does not offer retmode='json' for eFetch on PubMed; you can use it for eSearch queries (eSummary also accepts it).

import json
from urllib import request

# search_url is the eSearch URL from the question (retmode=json)
with request.urlopen(search_url) as response:
    content = response.read()
data = json.loads(content)

web = data['esearchresult']['webenv']
key = data['esearchresult']['querykey']
count = int(data['esearchresult']['count'])
ids = data['esearchresult']['idlist']

When you plan to eFetch 'pubmed' data, you have three options for retmode (asn.1, text, xml). Within the text category you can choose among three rettypes (medline, uilist, abstract), of which only 'medline' gets you all the data for the publication. The 'asn.1' and 'xml' retmodes have no associated rettypes, so you can leave that field unset in those cases [1].
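To make the retmode/rettype choice concrete, here is a minimal sketch that builds the eFetch URL with urllib.parse.urlencode instead of string concatenation. The helper names build_efetch_url and batch_starts are hypothetical (they are not part of any NCBI library), and the batch size of 10000 is just the one used in the question. Stepping by exactly the batch size also avoids skipping a record between batches, which happens when retmax is 9999 but retstart advances by 10000.

```python
from urllib.parse import urlencode

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(webenv, query_key, retstart=0, retmax=10000,
                     retmode="xml", rettype=None):
    """Build an eFetch URL for PubMed records kept on the history server.

    Hypothetical helper for illustration only.
    """
    params = {
        "db": "pubmed",
        "query_key": query_key,
        "WebEnv": webenv,      # value returned by eSearch with usehistory=y
        "retstart": retstart,
        "retmax": retmax,
        "retmode": retmode,
    }
    # rettype is only needed with retmode=text (medline, uilist, abstract);
    # for xml and asn.1 it is simply left out.
    if rettype is not None:
        params["rettype"] = rettype
    return EFETCH_BASE + "?" + urlencode(params)


def batch_starts(total_records, batch_size=10000):
    """Return the retstart offsets needed to cover total_records."""
    return list(range(0, total_records, batch_size))


# Example: XML for the first batch, MEDLINE text for the second one.
xml_url = build_efetch_url("EXAMPLE_WEBENV", "1")
medline_url = build_efetch_url("EXAMPLE_WEBENV", "1", retstart=10000,
                               retmode="text", rettype="medline")
```

EXAMPLE_WEBENV is a placeholder; in practice you would pass the webenv and querykey values taken from your own eSearch result.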

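from urllib import request
from bs4 import BeautifulSoup
# fetch_url is the eFetch URL (e.g. with retmode=xml)
with request.urlopen(fetch_url) as response:
    content = response.read()
soup = BeautifulSoup(content, 'html.parser')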

You will get real JSON output for your eSearch results, but HTML/XML for the eFetch call. There is currently no way to get eFetch results in JSON format. Hopefully NCBI adds this functionality to their other eUtils tools soon!
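If you would rather stay in the standard library, the XML can also be parsed with xml.etree.ElementTree. This is only a sketch: the function names are hypothetical, the two sample documents are hand-written and heavily shortened, and real responses contain many more elements. It also covers the AttributeError from the comments: a requests Response has .json() but no .xml(), so you pass the body (fetch_r.content or response.read()) to an XML parser yourself.

```python
import xml.etree.ElementTree as ET


def parse_esearch_xml(xml_text):
    """Pull count, WebEnv and QueryKey out of an eSearch XML response."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count")),
        "webenv": root.findtext("WebEnv"),
        "querykey": root.findtext("QueryKey"),
    }


def parse_pubmed_articles(xml_text):
    """Return (PMID, title) pairs from an eFetch PubmedArticleSet document."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title_node = article.find("MedlineCitation/Article/ArticleTitle")
        # Titles can contain inline markup such as <i>, so join nested text.
        title = None
        if title_node is not None:
            title = "".join(title_node.itertext())
        records.append((pmid, title))
    return records


# Hand-written, shortened samples (assumed shapes, for illustration only).
SAMPLE_ESEARCH = (
    "<eSearchResult><Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>"
    "<QueryKey>1</QueryKey><WebEnv>EXAMPLE_WEBENV</WebEnv></eSearchResult>"
)
SAMPLE_EFETCH = (
    "<PubmedArticleSet>"
    "<PubmedArticle><MedlineCitation><PMID Version='1'>111</PMID>"
    "<Article><ArticleTitle>First title</ArticleTitle></Article>"
    "</MedlineCitation></PubmedArticle>"
    "<PubmedArticle><MedlineCitation><PMID Version='1'>222</PMID>"
    "<Article><ArticleTitle>Second <i>in vitro</i> title</ArticleTitle></Article>"
    "</MedlineCitation></PubmedArticle>"
    "</PubmedArticleSet>"
)

search_info = parse_esearch_xml(SAMPLE_ESEARCH)
articles = parse_pubmed_articles(SAMPLE_EFETCH)
```

With a live request you would replace the samples with the downloaded bytes, e.g. parse_pubmed_articles(fetch_r.content).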