Python module to download any paper by DOI
7.3 years ago
akarazeev ▴ 10

Hi. Is there a Python module that can download a paper given its DOI?

I am familiar with the SciHub module (https://github.com/zaytoun/scihub.py), but it runs into problems with captchas.

Perhaps you know of other ways to download papers through an API.

I'm willing to pay to avoid captchas, but I don't know of any service that offers this.

Thank you in advance.

pubmed python scihub • 8.7k views
7.3 years ago
Joe 21k

It's not a module per se, and it isn't specific to DOIs, but you could do something like the following:

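import requests

def getDOI(top_hit):
    """Query the PDBe REST API to get the DOI of a publication associated with a PDB ID"""
    # The response is expected to be keyed by the lowercase PDB ID
    pdb_id = str(top_hit).lower()
    try:
        query = requests.get("https://www.ebi.ac.uk/pdbe/api/pdb/entry/publications/" + pdb_id, timeout=30)
        qjson = query.json()
        doi = qjson[pdb_id][0]['doi']

        if not doi:
            doi = "No DOI found."

    # Missing key, empty publication list, or a non-JSON response body
    except (KeyError, IndexError, ValueError):
        doi = "Lookup failed. ID likely deprecated."

    return doi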

I use this code snippet to return DOIs from a PDB ID query. You may be able to chop it up to suit your own needs.

Otherwise, take a look at the esearch/efetch functions in Bio.Entrez (http://biopython.org/DIST/docs/api/Bio.Entrez-module.html).
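As a rough illustration of the Bio.Entrez route, the sketch below looks up the PubMed ID for a DOI with esearch and then pulls the abstract with efetch. The helper names (doi_search_term, doi_to_pmids, fetch_abstract) are made up for this example, and the [AID] (article identifier) field tag is my assumption about how to restrict a PubMed search to DOIs. Note that this only returns PubMed records, not the paper's PDF.

```python
def doi_search_term(doi):
    """Build a PubMed query restricted to the article identifier field.

    The [AID] tag is an assumption about how PubMed indexes DOIs.
    """
    return f"{doi.strip()}[AID]"


def doi_to_pmids(doi, email):
    """Return the PubMed IDs matching a DOI, via Bio.Entrez.esearch."""
    # Imported here so the pure helper above works without Biopython installed
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.esearch(db="pubmed", term=doi_search_term(doi))
    record = Entrez.read(handle)
    handle.close()
    return list(record["IdList"])


def fetch_abstract(pmid, email):
    """Return the plain-text abstract for a PubMed ID, via Bio.Entrez.efetch."""
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="pubmed", id=pmid, rettype="abstract", retmode="text")
    text = handle.read()
    handle.close()
    return text
```

Something like doi_to_pmids("10.1371/journal.ppat.1002485", "you@example.org") should give you a list of PMIDs that you can pass on to fetch_abstract.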

6.8 years ago

You could try the Europe PMC API to retrieve a publication's full text via its DOI. First map the DOI to the corresponding PMCID using the search module, then use the PMCID to retrieve the full-text XML from the open access subset. For example, the search module returns PMC3257301 as the PMCID for the DOI 10.1371/journal.ppat.1002485, and the fullTextXML module then retrieves the full text for PMC3257301.
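The two steps above could be sketched roughly as follows, using only the standard library. The base URL, the DOI:"..." query syntax, and the resultList/result/pmcid layout of the JSON response are my assumptions about the Europe PMC REST service, and the function names are made up for this example, so check them against the current API documentation. It will only work for papers in the open access subset.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the Europe PMC REST service
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest"


def search_url(doi):
    """Build the search-module URL that looks a paper up by DOI."""
    params = {"query": f'DOI:"{doi}"', "format": "json", "resultType": "lite"}
    return f"{BASE_URL}/search?{urllib.parse.urlencode(params)}"


def extract_pmcid(search_json):
    """Return the first PMCID found in a search response, or None."""
    results = search_json.get("resultList", {}).get("result", [])
    for record in results:
        if record.get("pmcid"):
            return record["pmcid"]
    return None


def fulltext_url(pmcid):
    """Build the fullTextXML-module URL for a PMCID."""
    return f"{BASE_URL}/{pmcid}/fullTextXML"


def fetch_fulltext_xml(doi, timeout=30):
    """Map a DOI to a PMCID, then download the full-text XML (None if no PMCID)."""
    with urllib.request.urlopen(search_url(doi), timeout=timeout) as response:
        pmcid = extract_pmcid(json.load(response))
    if pmcid is None:
        return None
    with urllib.request.urlopen(fulltext_url(pmcid), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_fulltext_xml("10.1371/journal.ppat.1002485") should go through PMC3257301, as in the example above. A None result means the search found no PMCID for that DOI.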
