Parsing OWL (ChEBI) Ontology
13.4 years ago
Iain ▴ 260

Hi all,

I was wondering if anybody could give me some pointers on how to parse an OWL ontology with available tools.

I would like to generate an output file based on the ChEBI ontology that lists all the terms associated with specific compounds, covering both direct and indirect annotations. The ChEBI ontology is available from http://www.ebi.ac.uk/chebi/downloadsForward.do

For example, methotrexate (http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:44185) is annotated as a dicarboxylic acid (http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3A35692). However, dicarboxylic acid is itself a carboxylic acid (http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3A33575). For methotrexate, I would like to return annotations to both dicarboxylic acid and carboxylic acid.

My only experience with ontologies is with GO, and for parsing that I use various Bioconductor packages in R, so any help is greatly appreciated.

Thanks

Iain

13.4 years ago
Chris Mungall ▴ 320

I'm not aware of any R-specific packages (unfortunately there is a lack of fully featured OWL-level APIs outside Java).

Your best bet would be to process this externally, using either a Java program built on the OWL API or Jena, or a query language like SPARQL to process the RDF/XML.

The following SPARQL query will extract all class-subclass pairs from an OWL ontology:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

SELECT * 
  WHERE { ?x rdfs:subClassOf ?y 
}

If you run this through a reasoner such as Pellet, this has the advantage of also giving you the inferred subclass relationships.

Save the above as "subclassOf.sparql" and run the following:

pellet query -q subclassOf.sparql chebi.owl

If you want to use the other relationship types in ChEBI, it takes a little extra syntax; at that point it may be worth the effort of using the OWL API instead.

If you want a table mapping IDs to labels:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

SELECT * 
  WHERE { ?x rdfs:label ?xn
}

This strategy should work with the OWL version of any of the ontologies in the OBO library (including GO).

Of course, you could always try the OBO-format version of ChEBI with the standard GO R tools, but be warned that these tools might make assumptions that do not hold for ChEBI.
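
If you go the OBO route without the GO-specific tools, the flat file is simple enough to read directly. Here is a minimal sketch in Python (standard library only), assuming the usual [Term] stanza layout with id:, name: and is_a: tags. The function name parse_obo_terms and the sample text are made up for illustration, and all other tags and relationship types are ignored.

```python
def parse_obo_terms(lines):
    """Return {term_id: {"name": str, "is_a": [parent_id, ...]}} from OBO lines."""
    terms = {}
    current = None  # the term dict being filled, or None outside a [Term] stanza
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"name": "", "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue  # header lines, blank lines, non-Term stanzas
        tag, value = line.split(": ", 1)
        value = value.split(" ! ", 1)[0].strip()  # drop a trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


# Hypothetical sample in OBO layout (not copied from the real ChEBI file).
sample = """\
format-version: 1.2

[Term]
id: CHEBI:44185
name: methotrexate
is_a: CHEBI:35692 ! dicarboxylic acid

[Term]
id: CHEBI:35692
name: dicarboxylic acid
is_a: CHEBI:33575

[Typedef]
id: has_role
name: has role
"""

terms = parse_obo_terms(sample.splitlines())
print(terms["CHEBI:44185"])
```

For the real file you would pass an open file handle instead of the sample lines. The resulting child-to-parent lists can then be followed upwards to collect indirect annotations.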

Hopefully we'll see more support for OWL in R in the future.

Hope this helps,
Chris


That is a great help Chris, thanks a million.


Hi Chris,

Would you have recommendations on where I could learn more about the basics of using a reasoner such as Pellet? Ontology and semantic web discussions seem to use a very different lingo than I am used to!

Thanks

Iain


The Pellet docs are good, but they don't aim to explain why you might want a full-blown reasoner.

There is a lot of good material that is geared towards developers of ontologies, who are currently the primary beneficiaries of reasoning.

We need more material aimed at bioinformatics software developers consuming ontologies within some tool or database. For many scenarios using an OWL reasoner is overkill when simple graph traversal can give you valid and complete answers. This is changing as ontologies are getting richer and more integrated, but there's a lack of docs for bioinformaticians.


Watch this space..

13.4 years ago

The right way would be to use an RDF reasoner, e.g. Pellet: http://clarkparsia.com/pellet

However, you can transform this OWL file with the following XSLT stylesheet:


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns="http://purl.obolibrary.org/obo#"
         xml:base="http://purl.obolibrary.org/obo"
         xmlns:obo="http://purl.obolibrary.org/obo/"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:obo2="http://purl.obolibrary.org/obo#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
        version='1.0'
        >
<xsl:output method="text"/>

<xsl:template match="rdf:RDF">
<xsl:apply-templates select="owl:Class"/>
</xsl:template>

<xsl:template match="owl:Class">
<xsl:variable name="subject" select="@rdf:about"/>
<xsl:variable name="label" select="rdfs:label"/>
<xsl:for-each select="rdfs:subClassOf">
<xsl:value-of select="$subject"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="$label"/>
<xsl:text>&#9;</xsl:text>
<xsl:choose>
  <xsl:when test="@rdf:resource">
    <xsl:value-of select="@rdf:resource"/>
  </xsl:when>
  <xsl:when test="owl:Restriction/owl:someValuesFrom/@rdf:resource">
    <xsl:value-of select="owl:Restriction/owl:someValuesFrom/@rdf:resource"/>
  </xsl:when>
  <xsl:otherwise>
    <xsl:message terminate="yes">??? <xsl:value-of select="$subject"/></xsl:message>
  </xsl:otherwise>
</xsl:choose>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>

It will linearize the RDF statements to 3 columns: ID/label/parent-id

$ xsltproc stylesheet.xsl chebi.owl | grep CHEBI_44185
http://purl.obolibrary.org/obo/CHEBI_44185  methotrexate    http://purl.obolibrary.org/obo/CHEBI_26373
http://purl.obolibrary.org/obo/CHEBI_44185  methotrexate    http://purl.obolibrary.org/obo/CHEBI_29347
http://purl.obolibrary.org/obo/CHEBI_44185  methotrexate    http://purl.obolibrary.org/obo/CHEBI_35692
http://purl.obolibrary.org/obo/CHEBI_44185  methotrexate    http://purl.obolibrary.org/obo/CHEBI_50680
http://purl.obolibrary.org/obo/CHEBI_44185  methotrexate    http://purl.obolibrary.org/obo/CHEBI_16015
http://purl.obolibrary.org/obo/CHEBI_44185  methotrexate    http://purl.obolibrary.org/obo/CHEBI_50683
(...)

$ xsltproc stylesheet.xsl chebi.owl | grep "^http://purl.obolibrary.org/obo/CHEBI_35692"
http://purl.obolibrary.org/obo/CHEBI_35692  dicarboxylic acid   http://purl.obolibrary.org/obo/CHEBI_33575

You can then read this tab-delimited file into R and query it recursively, walking from each child ID to its parent ID.
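
The recursive lookup can be done in any language. As a rough sketch in Python (standard library only), assuming the three-column ID/label/parent-ID output is tab-separated: the function names are made up for illustration, and the inline sample rows are taken from the output shown above.

```python
import csv
import io
from collections import defaultdict


def load_parents(handle):
    """Read ID <tab> label <tab> parent-ID rows into two lookup tables."""
    parents = defaultdict(set)
    labels = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        child, label, parent = row
        labels[child] = label
        parents[child].add(parent)
    return parents, labels


def ancestors(term, parents):
    """Return every ID reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


OBO = "http://purl.obolibrary.org/obo/"
sample = (
    f"{OBO}CHEBI_44185\tmethotrexate\t{OBO}CHEBI_35692\n"
    f"{OBO}CHEBI_35692\tdicarboxylic acid\t{OBO}CHEBI_33575\n"
)
parents, labels = load_parents(io.StringIO(sample))
result = ancestors(OBO + "CHEBI_44185", parents)
print(sorted(result))
```

Note that the stylesheet also writes the target of each owl:Restriction into the parent column, so the closure mixes subclass links with other relationships unless those rows are filtered out first.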


That is great, thank you!


@Pierre - neat xslt script!

Unfortunately, using XML-level tools and APIs to process RDF/XML is not a very robust strategy (although fine for one-offs). This is because the same RDF can be serialized as RDF/XML in structurally different but semantically identical ways. Also, it's just more work than using an RDF or OWL-level API or language! I'll add some other options for processing the OWL below.
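
To illustrate with a small made-up example: the two documents below state the same triples (B is an owl:Class and a subclass of A), but an XML-level match on owl:Class elements finds the pair in only one of them. This is a sketch using Python's standard library; the example.org IRIs and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
HEADER = ('<rdf:RDF xmlns:rdf="%(rdf)s" xmlns:rdfs="%(rdfs)s" '
          'xmlns:owl="%(owl)s">' % NS)

# Typed-node form: the shape an owl:Class template would match.
typed = HEADER + """
  <owl:Class rdf:about="http://example.org/B">
    <rdfs:subClassOf rdf:resource="http://example.org/A"/>
  </owl:Class>
</rdf:RDF>"""

# The same triples written with rdf:Description and an explicit rdf:type.
described = HEADER + """
  <rdf:Description rdf:about="http://example.org/B">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
    <rdfs:subClassOf rdf:resource="http://example.org/A"/>
  </rdf:Description>
</rdf:RDF>"""


def naive_subclass_pairs(xml_text):
    """Mimic an XML-level match on owl:Class/rdfs:subClassOf."""
    root = ET.fromstring(xml_text)
    about = "{%s}about" % NS["rdf"]
    resource = "{%s}resource" % NS["rdf"]
    return [(cls.get(about), sub.get(resource))
            for cls in root.findall("owl:Class", NS)
            for sub in cls.findall("rdfs:subClassOf", NS)]


print(naive_subclass_pairs(typed))      # one pair found
print(naive_subclass_pairs(described))  # nothing found, same RDF content
```

An RDF-level parser would return the same subclass triple for both inputs, which is the robustness argument in a nutshell.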


You're right, Chris; that is why I said that the best option was to use an RDF reasoner.


Hi Pierre,

Just a quick follow up if I may. How could I modify the style sheet to ignore subclasses that have a restriction? e.g. I would like to ignore this relationship. [?] [?] [?] [?] [?] [?]

Thanks,

Iain


try to change [?] to [?]


Pierre - you did indeed, sorry for hammering home the point. In any case, I see my comments were already redundant with a previous exchange:

Problem To Transform An Owl File Related To Cell Line Ontology Using An Xslt


Hi Pierre, unfortunately that didn't work.


Is it possible to change the stylesheet to output the owl:Restriction relationships too? Then I could look only at the ones I am interested in. Sorry for such basic questions!

13.1 years ago
Pablacious ▴ 630

You can use the OWL API to query the reasoned ontology:

First you load the ontology:

OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
OWLOntology ontology = manager.loadOntologyFromOntologyDocument(owlFilePath);
PrefixManager pm = new DefaultPrefixManager("http://purl.obolibrary.org/obo/");

You init the reasoner:

OWLReasonerFactory reasonerFactory = new Reasoner.ReasonerFactory();
OWLReasoner reasoner = reasonerFactory.createReasoner(ontology);
OWLDataFactory dataFactory = manager.getOWLDataFactory();

Next you define the property to query on, here has_role, and precompute the inferences (I use the roles ontology a lot; you would need to adapt this for your particular question, but I guess it is a start):

OWLObjectProperty propertyHasRole = dataFactory.getOWLObjectProperty(IRI.create("http://purl.obolibrary.org/obo#has_role"));
OWLClass chebiRole = dataFactory.getOWLClass(initialChEBIRole,this.pm);
reasoner.precomputeInferences();

Then you can ask things such as whether a particular ChEBI entry has a particular ChEBI role:

public boolean hasChEBIRole(String chebiEntry, String chebiRoleName) {
    OWLClass chebiClass = dataFactory.getOWLClass(chebiEntry,this.pm);

    OWLClass chebiQueryRole = dataFactory.getOWLClass(chebiRoleName,this.pm);
    OWLClassExpression hasRoleChEBIRole = dataFactory.getOWLObjectSomeValuesFrom(propertyHasRole, chebiQueryRole);
    NodeSet<OWLClass> chebiEntitiesWithRole = reasoner.getSubClasses(hasRoleChEBIRole, false);

    return chebiEntitiesWithRole.containsEntity(chebiClass);
}

I hope the example is useful. The OWL API is quite tricky to use, and it took me a while to manage these kinds of things. The OWL API code examples are helpful: http://owlapi.sourceforge.net/documentation.html

13.1 years ago

The simplest way to do this is with a small Cactvs script (see www.xemistry.com/academic for free academic versions).

set nh [network read chebi.obo]
filter create relationship_is_a property C_ONTOLOGY_LINK operator = value is_a
filter create rootnode property V_LEVEL value 0 operator =
set vquery [network scan $nh "V_ONTOLOGY_TERM(id) = CHEBI:44185" vertex]
foreach r [network vertices $nh rootnode] {
    set paths [vertex paths $nh $vquery $r [dict create filters relationship_is_a flags {outgoing}]]
    foreach path $paths {
        set level 0
        foreach v $path {
            puts "[replicate - [incr level]]Node $v = [vertex get $nh $v V_ONTOLOGY_TERM(name)]"
        }
    }
}

This gives you output looking like

-Node 44185 = methotrexate
--Node 35692 = dicarboxylic acid
---Node 33575 = carboxylic acid
----Node 36586 = carbonyl compound
-----Node 36963 = organooxygen compounds
------Node 25806 = oxygen molecular entities
-------Node 33304 = chalcogen molecular entities
--------Node 33675 = p-block molecular entities
---------Node 33579 = main group molecular entity
----------Node 23367 = molecular entity
-----------Node 24431 = molecular structure
-Node 44185 = methotrexate
--Node 35692 = dicarboxylic acid
---Node 33575 = carboxylic acid
----Node 50860 = organic molecular entity
-----Node 33582 = carbon group molecular entities
------Node 33675 = p-block molecular entities
-------Node 33579 = main group molecular entity
--------Node 23367 = molecular entity
---------Node 24431 = molecular structure

(plus a few more such paths)
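
For readers without Cactvs, the same kind of path listing can be sketched in plain Python. The edges below are hand-copied from the two paths shown above, so they are only a fragment of ChEBI, and the function name is made up for illustration. The recursion assumes the is_a graph has no cycles.

```python
# child -> parents, transcribed from the two example paths above
IS_A = {
    44185: [35692], 35692: [33575], 33575: [36586, 50860],
    36586: [36963], 36963: [25806], 25806: [33304], 33304: [33675],
    50860: [33582], 33582: [33675],
    33675: [33579], 33579: [23367], 23367: [24431],
}


def paths_to_roots(node, is_a):
    """Yield every is_a path from `node` up to a term with no parents."""
    parents = is_a.get(node, [])
    if not parents:
        yield [node]  # reached a root: the path ends here
        return
    for parent in parents:
        for rest in paths_to_roots(parent, is_a):
            yield [node] + rest


paths = list(paths_to_roots(44185, IS_A))
for path in paths:
    for depth, term in enumerate(path, start=1):
        print("-" * depth + "Node %d" % term)
```

Collecting the union of all terms on these paths gives the full set of direct and indirect is_a annotations for the starting compound.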

13.1 years ago
Tomasz ▴ 50

ontocat R package - http://www.ontocat.org/wiki/r

Here's an example of how it's used for a gene enrichment test: http://www.ontocat.org/browser/trunk/ontoCAT/src/uk/ac/ebi/ontocat/examples/R/Example1.R

This package also has a pure Java version with more examples: http://www.ontocat.org/wiki/OntocatGuide
