Web efetch JSON output which is not real JSON
7.2 years ago
Lilizine ▴ 10

Hi all,

I am using this code to download PubMed articles:

import requests

search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2010/01/01&maxdate=2016/12/31&usehistory=y&retmode=json"
search_r = requests.post(search_url)
search_data = search_r.json()
webenv = search_data["esearchresult"]['webenv']
total_records = int(search_data["esearchresult"]['count'])
fetch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmax=9999&query_key=1&webenv=" + webenv

for i in range(0, total_records, 10000):
    this_fetch = fetch_url + "&retstart=" + str(i)
    print("Getting this URL: " + this_fetch)
    fetch_r = requests.post(this_fetch)
    f = open('pubmed_batch_' + str(i) + '_to_' + str(i + 9999) + ".json", 'w')
    f.write(fetch_r.text)
    f.close()

I want the output in XML, not in JSON. The problem appears when I try to do this:

page = urllib.urlopen('one of the URLs')
content = page.read()
obj = json.loads(content)
xml = dicttoxml.dicttoxml(content)
print(xml)

I have this error:

No JSON object could be decoded

Ideally, I would like to extract the XML directly and avoid these outputs, which are not recognized as JSON.

PS: ignore any indentation issues due to copy and paste.

Thanks

json efetch biopython python pubmed • 4.6k views

I don't understand the issue here. If you want XML output from NCBI, just use retmode=xml in the URL.


I already did this, and it returns an error.


What returns an error? What's the error message?


I changed the following:

search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2010/01/01&maxdate=2016/12/31&usehistory=y&retmode=xml"
search_r = requests.post(search_url)
search_data = search_r.xml()
webenv = search_data["esearchresult"]['webenv']
total_records = int(search_data["esearchresult"]['count'])
fetch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmax=9999&query_key=1&webenv=" + webenv

for i in range(0, total_records, 10000):
    this_fetch = fetch_url + "&retstart=" + str(i)
    print("Getting this URL: " + this_fetch)
    fetch_r = requests.post(this_fetch)
    f = open('pubmed_batch_' + str(i) + '_to_' + str(i + 9999) + ".txt", 'w')
    f.write(fetch_r.text)
    f.close()

print("Number of records found :"+str(total_records))

I got this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-8-262417e4aa63> in <module>()
      1 search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2010/01/01&maxdate=2016/12/31&usehistory=y&retmode=xml"
      2 search_r = requests.post(search_url)
----> 3 search_data = search_r.xml()
      4 webenv = search_data["esearchresult"]['webenv']
      5 total_records = int(search_data["esearchresult"]['count'])

AttributeError: 'Response' object has no attribute 'xml'


I don't know which programming language you're using (Python maybe? I don't know Python), but this indicates that your object has no method called xml. Maybe the module you're using cannot handle XML. The solution is to deal with the XML yourself, perhaps with the help of another module. This now looks more like a programming question than a bioinformatics one, so you may have better luck asking on StackOverflow. Alternatively, explain what you're trying to do, i.e. what bioinformatics problem you're trying to solve. There may already be a solution.
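For what it's worth, here is a minimal sketch of "dealing with the XML yourself" in Python using the standard library's xml.etree.ElementTree. The sample string is a made-up, shortened ESearch reply used only for illustration, and parse_esearch_xml is a hypothetical helper name, not something from requests or NCBI. It assumes the reply uses the usual eSearchResult layout with Count, QueryKey, WebEnv and IdList elements.

```python
import xml.etree.ElementTree as ET

# Made-up, shortened ESearch reply (assumption: real replies carry more fields).
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>42</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
  </IdList>
</eSearchResult>"""


def parse_esearch_xml(xml_text):
    """Pull the history-server fields out of an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count")),
        "query_key": root.findtext("QueryKey"),
        "webenv": root.findtext("WebEnv"),
        "ids": [node.text for node in root.findall("IdList/Id")],
    }


result = parse_esearch_xml(SAMPLE_ESEARCH_XML)
print(result)
```

With the code from the question, the call would presumably be parse_esearch_xml(search_r.text) in place of search_r.xml(), since a requests Response exposes the raw body as .text.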

5.8 years ago
MatthewP ★ 1.4k

Hello, you set retmode=json, but this may not be supported by db=pubmed. This table shows the default and valid retmode values for every E-utilities database.

5.8 years ago
jrdeans • 0

So far, eUtils only allows retmode='json' for eSearch queries.

import json
from urllib import request

# search_url is the eSearch URL from the question (retmode=json)
with request.urlopen(search_url) as response:
    content = response.read()
data = json.loads(content)

web = data['esearchresult']['webenv']
key = data['esearchresult']['querykey']
count = int(data['esearchresult']['count'])
ids = data['esearchresult']['idlist']

When you are planning to eFetch 'pubmed' data you have three options for retmode (asn.1, text, xml). Within the text category you can select 3 different kinds of rettypes (medline, uilist, abstract) of which only 'medline' will get you all the data of the publication. Both 'asn.1' and 'xml' retmodes do not have associated rettypes, so you can ignore setting that field in those scenarios [1].
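To make the retmode/rettype choices above concrete, here is a rough sketch of a URL builder that only accepts the combinations listed in this answer. build_efetch_url and PUBMED_MODES are hypothetical names of my own, and the WebEnv value in the example is a placeholder. The parameter spelling WebEnv follows the NCBI documentation.

```python
from urllib.parse import urlencode

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# retmode -> allowed rettype values for db=pubmed, as listed in the answer above.
# None means "leave rettype unset".
PUBMED_MODES = {
    "xml": {None},
    "asn.1": {None},
    "text": {"medline", "uilist", "abstract"},
}


def build_efetch_url(webenv, query_key, retstart, retmax,
                     retmode="xml", rettype=None):
    """Build an eFetch URL for db=pubmed, rejecting unlisted combinations."""
    if retmode not in PUBMED_MODES:
        raise ValueError("unsupported retmode for pubmed: %r" % retmode)
    if rettype not in PUBMED_MODES[retmode]:
        raise ValueError("rettype %r is not valid with retmode %r"
                         % (rettype, retmode))
    params = {
        "db": "pubmed",
        "query_key": query_key,
        "WebEnv": webenv,
        "retstart": retstart,
        "retmax": retmax,
        "retmode": retmode,
    }
    if rettype is not None:
        params["rettype"] = rettype
    return EFETCH_BASE + "?" + urlencode(params)


# Placeholder WebEnv; a real one comes from the eSearch reply.
url = build_efetch_url("EXAMPLE_WEBENV", 1, retstart=0, retmax=9999,
                       retmode="text", rettype="medline")
print(url)
```

Passing retmode="json" raises a ValueError here, which mirrors the point that eFetch does not offer JSON for pubmed.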

from urllib import request
from bs4 import BeautifulSoup

with request.urlopen(fetch_url) as response:
    content = response.read()
soup = BeautifulSoup(content, 'html.parser')

You will be getting real json output for your eSearch results, but HTML/XML for the eFetch call. There currently is no way to get eFetch results in json format. Hopefully NCBI adds this functionality to their other eUtils tools soon!
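If JSON is what you ultimately need, one workaround is to parse the eFetch XML and serialize the fields you care about yourself. The sketch below uses the standard library instead of BeautifulSoup and assumes the usual PubmedArticleSet / PubmedArticle / MedlineCitation layout. The sample document and the extract_records helper are invented for illustration, and only the PMID and title are pulled out.

```python
import json
import xml.etree.ElementTree as ET

# Invented, heavily shortened eFetch reply (real records carry many more fields).
SAMPLE_EFETCH_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">111</PMID>
      <Article>
        <ArticleTitle>First example title.</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">222</PMID>
      <Article>
        <ArticleTitle>Second <i>example</i> title.</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_records(xml_text):
    """Return a list of {'pmid', 'title'} dicts from eFetch PubMed XML."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        title_node = citation.find("Article/ArticleTitle")
        # itertext() keeps text inside inline markup such as <i>...</i>.
        title = "".join(title_node.itertext()) if title_node is not None else None
        records.append({"pmid": citation.findtext("PMID"), "title": title})
    return records


records = extract_records(SAMPLE_EFETCH_XML)
json_text = json.dumps(records, indent=2)
print(json_text)
```

The resulting json_text can be written to the pubmed_batch_*.json files from the question, and json.loads will then accept it.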
