Question

Modifying a GET request for article retrieval in R

2.5 years ago

francesca.longhin.2 • 0

Hi everyone

I am using the R package europepmc (https://cran.r-project.org/web/packages/europepmc/europepmc.pdf) and its function epmc_ftxt to obtain the full texts of some articles given their PMC IDs. However, for many articles I keep getting the following error:

"Request failed [404]. Retrying in 1 seconds... Error in epmc_ftxt("PMC2701033") : Not Found (HTTP 404). Failed to retrieve full text.."

I guess that is because the article does not belong to the Open Access subset. However, I checked and my university has a license to access that article. So my question is: how can I edit the GET request in the function to tell epmc_ftxt that I can actually access that article? Code below:

    #' This function loads full texts into R. Full texts are in XML format and are
    #' only provided for the Open Access subset of Europe PMC.
    #'
    #' @param ext_id character, PMCID. 
    #'   All full text publications have external IDs starting 'PMC_'
    #'
    #' @export
    #' @return xml_document
    #'
    #' @examples
    #'   \dontrun{
    #'   epmc_ftxt("PMC3257301")
    #'   epmc_ftxt("PMC3639880")
    #'   }
    epmc_ftxt <- function(ext_id = NULL) {
      if (!grepl("^PMC", ext_id))
        stop("Please provide a PMCID, i.e. ids starting with 'PMC'")
      # call api
      req <-
        httr::RETRY("GET",
                    base_uri(),
                    path = paste(rest_path(), ext_id,
                                 "fullTextXML", sep = "/"))
      # check for http status
      httr::stop_for_status(req, "retrieve full text.")
      # load xml into r
      httr::content(req, as = "text", encoding = "utf-8") %>%
        xml2::read_xml()
    }

GET retrival R request articles • 458 views

updated 2.5 years ago by Matthias Zepper 5.0k • written 2.5 years ago by francesca.longhin.2 • 0

Well, the API specification does not list any means of authentication (e.g. via tokens), and if authentication were the problem I would expect a status other than 404 (e.g. 401 or 403). It seems that those full texts are simply not available in this format, because you can't get them with curl either:

Doesn't work (your example):

    curl -X GET --header 'Accept: application/xml' 'https://www.ebi.ac.uk/europepmc/webservices/rest/PMC2701033/fullTextXML'

Works:

    curl -X GET --header 'Accept: application/xml' 'https://www.ebi.ac.uk/europepmc/webservices/rest/PMC2601033/fullTextXML'
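
The same check can be done from R before trying to parse anything. The following is only a rough sketch: `fulltext_url()` and `fetch_available()` are hypothetical helper names, not part of the europepmc package, and the endpoint is simply the one used in the curl calls above. It assumes the httr and xml2 packages are installed. Instead of stopping on the first 404, it returns NULL for every ID whose full-text XML is not served, so a long list of PMCIDs can be processed in one go.

```r
# Sketch only: these helpers are hypothetical and not part of europepmc.

# Build the fullTextXML URL for one PMCID (same endpoint as the curl calls).
fulltext_url <- function(ext_id) {
  stopifnot(is.character(ext_id), length(ext_id) == 1, grepl("^PMC", ext_id))
  paste("https://www.ebi.ac.uk/europepmc/webservices/rest",
        ext_id, "fullTextXML", sep = "/")
}

# Request each full text; keep the parsed XML on HTTP 200, NULL otherwise.
fetch_available <- function(ext_ids) {
  out <- lapply(ext_ids, function(id) {
    resp <- httr::GET(fulltext_url(id), httr::accept("application/xml"))
    if (httr::status_code(resp) != 200) {
      return(NULL)  # e.g. 404: no full-text XML served for this ID
    }
    xml2::read_xml(httr::content(resp, as = "text", encoding = "UTF-8"))
  })
  stats::setNames(out, ext_ids)
}

# Usage (needs network access):
# res <- fetch_available(c("PMC2701033", "PMC2601033"))
# names(Filter(is.null, res))  # IDs for which nothing could be retrieved
```

This does not grant access to anything new; it only makes the unavailable IDs easy to list, so they can be obtained some other way (for example through the publisher, using your university's license).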

2.5 years ago by Matthias Zepper 5.0k