Question

How Do I Import RDF Data Into R?

17

14.4 years ago

Egon Willighagen 5.4k

What approach are you using to import Resource Description Framework (RDF) data into R? There is minimal support in the R package Rredland, but that seems rather spartan. There was an interesting package, Rswub, but that was lost in time. I also noted Rsparql, but the project does not seem to have delivered anything yet. And, of course, I can do something manually... What are your best practices for using RDF data from, for example, Bio2RDF?

r web • 18k views

Updated 7.3 years ago by Biostar 20 • written 14.4 years ago by Egon Willighagen 5.4k

1

Your first link points to the Swedish version of Wikipedia. For the English version: http://en.wikipedia.org/wiki/Resource_Description_Framework

14.4 years ago by David Quigley 11k

0

Nice, we all speak Swedish RDF :D http://www.youtube.com/watch?v=9OfsABOGw3c&feature=related

14.4 years ago by Michael 55k

0

Sorry, you lost me... Swedish RDF?

14.4 years ago by Egon Willighagen 5.4k

0

Oh, crap... OK, fixing... stupid, we're-so-smart-we-know-where-you-live websites... :(

14.4 years ago by Egon Willighagen 5.4k

0

Ah! Sorry about that; fixed now.

14.4 years ago by Egon Willighagen 5.4k

Ram · Answer 1 · 2011-03-23

I started a package for just this purpose yesterday. It is not yet available from CRAN, as the functionality is still a bit limited today:

library(rrdf)
m1 = load.rdf("one.rdf")
m2 = load.rdf("two.rdf")
m3 = combine.rdf(m1, m2)
summarize.rdf(m3)
sparql.rdf(m3, "SELECT ?s ?p { ?s ?p ?o }")

It wraps Jena and uses rJava to interface with it.

There is in fact also a Bioconductor package called Rredland.

Because the rrdf package now also supports SPARQL queries against remote databases, you can also do (following this BioStar answer):

library(rrdf)

endpoint = "http://rdf.farmbio.uu.se/chembl/sparql"

query = "
SELECT ?organism ?instance
WHERE {
  ?instance a <http://rdf.farmbio.uu.se/chembl/onto/#Target> ;
    <http://rdf.farmbio.uu.se/chembl/onto/#organism> ?organism .
}
";

data = sparql.remote(endpoint, query)
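For readers curious what a remote query looks like on the wire, here is a minimal base R sketch of the request URL that the standard SPARQL protocol uses for a GET query. It is not a description of rrdf internals (rrdf delegates to Jena), nothing is sent over the network, and the names `q`, `encoded` and `request_url` are made up for illustration.

```r
# Endpoint taken from the example above; nothing is sent over the network.
endpoint <- "http://rdf.farmbio.uu.se/chembl/sparql"

q <- "
SELECT ?organism ?instance
WHERE {
  ?instance a <http://rdf.farmbio.uu.se/chembl/onto/#Target> ;
    <http://rdf.farmbio.uu.se/chembl/onto/#organism> ?organism .
}
"

# Percent-encode the whole query, including reserved characters such as
# '?', '#', '<' and '>', so it is safe to use as a URL parameter value.
encoded <- URLencode(q, reserved = TRUE)

# SPARQL protocol: a query sent via GET is passed in the 'query' parameter.
request_url <- paste0(endpoint, "?query=", encoded)
print(request_url)
```

Pasting such a URL into a browser or an HTTP client is a quick way to check whether an endpoint is alive before debugging the R side.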

As of version 1.4 you can also use one of the SPARQL variables as the source of the row names. For example, to get a single column with the protein names as row names, you do:

query = "
SELECT ?organism ?title
WHERE {
  ?instance a <http://rdf.farmbio.uu.se/chembl/onto/#Target> ;
    <http://purl.org/dc/elements/1.1/title> ?title ;
    <http://rdf.farmbio.uu.se/chembl/onto/#organism> ?organism .
}
";

data = sparql.remote(endpoint, query, rowvarname="title")

This results in an R matrix like:

                                                      organism                       
Maltase-glucoamylase                                  "Homo sapiens"                 
Sulfonylurea receptor 2                               "Homo sapiens"                 
Voltage-gated T-type calcium channel alpha-1H subunit "Homo sapiens"                 
Dihydrofolate reductase                               "Escherichia coli (strain K12)"
Tyrosine-protein kinase ABL                           "Homo sapiens"                 
DNA-directed RNA polymerase beta chain                "Escherichia coli (strain K12)"
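
If your rrdf version lacks the row-name option, the same shape can be approximated in base R. This is a minimal sketch, assuming the query result arrives as a character matrix with one column per SPARQL variable; the helper `use_rownames` and the sample data are made up for illustration and are not part of rrdf.

```r
# Hypothetical helper (not part of rrdf): move one column of a character
# matrix into the row names and keep the remaining columns as a matrix.
use_rownames <- function(m, rowvar) {
  stopifnot(is.matrix(m), rowvar %in% colnames(m))
  out <- m[, colnames(m) != rowvar, drop = FALSE]
  rownames(out) <- m[, rowvar]
  out
}

# Made-up sample shaped like a two-variable SPARQL result
res <- cbind(
  organism = c("Homo sapiens", "Escherichia coli (strain K12)"),
  title    = c("Maltase-glucoamylase", "Dihydrofolate reductase")
)

data <- use_rownames(res, "title")
print(data)
```

The `drop = FALSE` matters: without it, a result with a single remaining column would collapse to a plain vector instead of staying a matrix.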

score 6 · Answer 2 · 2010-08-10

The following hints are all far from perfect and will require some experimenting on your side, but here's my best guess (I only have worst practices for language interfaces, not for reading data from BioRDF):

- The Redland C library has many language bindings (Perl, Python, Ruby). If these bindings are more complete than Rredland, you could use e.g. the Perl binding with RSPerl or the Python binding with RPy.
- There are Java libraries out there; see the StackExchange answer. They can be interfaced using e.g. SJava or (less nicely) JRI.
- Extend the Rredland package to add the functionality you need (maybe the cleanest option, but it takes a lot of your time).

I would maybe go for the SJava solution first, because there are at least four Java libraries to choose from. I have had some mixed experiences with language bindings, but in the end RSPerl and SJava worked with Perl and Java for me, and I heard that RPy works nicely too. So it should be possible in principle™ to access the RDF libraries too. Whatever solution you come up with will likely be appreciated by the BioC community.