Question

Parse Drugbank XML in R

7.7 years ago

kakukeshi ▴ 80

Hi guys,

Does anybody know how to parse the complete DrugBank database, which is distributed as XML, into R? I need the drug mechanism of action, so I can't use the other downloadable files from DrugBank.

Thanks

drug xml R gene • 4.8k views

updated 6.1 years ago by mohfcis ▴ 20 • written 7.7 years ago by kakukeshi ▴ 80

score 1 · Answer 1 · 2017-08-27

I successfully used the xmlEventParse() function from the XML package in R (https://www.rdocumentation.org/packages/XML/versions/3.98-1.9/topics/xmlEventParse) to extract selected fields from the DrugBank database. I first tried loading the full 600+ MB database into memory, but that did not work, so I ended up using this event-driven (SAX) parsing method instead.

I've included a subset of my code to give you a feel for what this looks like:

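library(XML)   # xmlEventParse(), toString.XMLNode()
library(xml2)  # read_xml(), xml_find_first(), xml_text()

# Path to the downloaded DrugBank XML file (placeholder name; point this at your own file)
filename <- "drugbank.xml"

# Collects one name per drug as the file is parsed
drug.name <- character(0)

# Define function to extract necessary data from each drug (= each main node)
getDrug <- function(x, ...) {

  # re-parse the current drug node with xml2 for easy XPath access
  current_drug <- read_xml(toString.XMLNode(x))

  # extract properties related to drug
  # <<- appends to the vector defined above; a plain <- would only set a
  # local variable that is discarded when the function returns
  drug.name <<- c(drug.name, xml_text(xml_find_first(current_drug, './name')))

  # drop the local reference to the current node when finished with it
  rm(x)

}

# Use event-driven SAX parser to process the XML without requiring the full tree structure to be loaded into memory
# Call the function defined above once per <drug> branch
xmlEventParse(file = filename, handlers = NULL, trim = FALSE, branches = list(drug = getDrug))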
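
Since the question asks for the mechanism of action, here is a minimal sketch of how the same kind of branch handler could collect it alongside the name. It uses only the XML package and a tiny made-up two-drug file so that it runs on its own. The element name mechanism-of-action is what I would expect in the DrugBank file, but treat it as an assumption and check it against your download. The helper child_text(), the records environment, and the toy data are all invented for illustration. The real file also declares an XML namespace, which this toy file leaves out.

```r
library(XML)

# Toy stand-in for the DrugBank file: two <drug> records with invented content.
toy_xml <- '<drugbank>
  <drug><name>DrugA</name><mechanism-of-action>Inhibits enzyme X.</mechanism-of-action></drug>
  <drug><name>DrugB</name><mechanism-of-action>Blocks receptor Y.</mechanism-of-action></drug>
</drugbank>'
toy_file <- tempfile(fileext = ".xml")
writeLines(toy_xml, toy_file)

# Hypothetical helper: text of the first child called `tag`, or NA if absent.
child_text <- function(node, tag) {
  child <- node[[tag]]
  if (is.null(child)) NA_character_ else xmlValue(child)
}

# An environment lets the handler accumulate results without global <<-.
records <- new.env()
records$name <- character(0)
records$moa  <- character(0)

# Branch handler: called once for each complete <drug> subtree.
get_drug <- function(x, ...) {
  records$name <- c(records$name, child_text(x, "name"))
  records$moa  <- c(records$moa,  child_text(x, "mechanism-of-action"))
  invisible(NULL)
}

# Stream the file; only one <drug> subtree is held in memory at a time.
invisible(xmlEventParse(file = toy_file, handlers = NULL,
                        branches = list(drug = get_drug)))

drugs <- data.frame(name = records$name,
                    mechanism_of_action = records$moa,
                    stringsAsFactors = FALSE)
print(drugs)
```

Growing vectors with c() is fine for a sketch, but with many thousands of drugs it may be worth preallocating or collecting the rows in a list and binding them once at the end.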

Hope this helps.

score 0 · Answer 2 · 2019-04-01

6.1 years ago

mohfcis ▴ 20

I know this is an old post, but for anyone who might have the same question: there is a new package called dbparser that parses the DrugBank database into several R datasets. https://github.com/Dainanahan/dbparser
