Thanks for your help with this. It is appreciated.
To give you more detail: I am using the 'Homo Sapiens.owl' file available from http://www.reactome.org/download/current/biopax.zip.
I am using SWI-Prolog's RDF parser (http://www.swi-prolog.org/pldoc/package/rdf2pl.html) to bring the file into a Prolog knowledge base.
As an example of what I am trying to understand, I am looking at the pathway "Regulation of activated PAK-2p34 by proteasome mediated degradation" (rdf:ID="Pathway1474").
Diagram: http://www.reactome.org/PathwayBrowser/#DIAGRAM=169911&ID=211733&PATH=109581
From the diagram it can be seen that there is a reaction "Regulation of activated PAK-2p34 by proteasome mediated degradation" which is catalyzed by "26S proteasome".
The corresponding XML snippet is:
<bp:Pathway rdf:ID="Pathway1473">
<bp:pathwayComponent rdf:resource="#BiochemicalReaction6535"/>
<bp:pathwayComponent rdf:resource="#BiochemicalReaction6536"/>
<bp:pathwayOrder rdf:resource="#PathwayStep8092"/>
<bp:pathwayOrder rdf:resource="#PathwayStep8091"/>
<bp:organism rdf:resource="#BioSource2"/>
<bp:displayName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Regulation of activated PAK-2p34 by proteasome mediated degradation</bp:displayName>
<bp:xref rdf:resource="#UnificationXref80321"/>
<bp:xref rdf:resource="#UnificationXref80322"/>
<bp:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Stimulation of cell death by PAK-2 requires the generation and stabilization of the caspase-activated form, PAK-2p34 (Walter et al., 1998;Jakobi et al., 2003). Levels of proteolytically activated PAK-2p34 protein are controlled by ubiquitin-mediated proteolysis. PAK-2p34 but not full-length PAK-2 is degraded by the 26 S proteasome (Jakobi et al., 2003). It is not known whether ubiquitination and degradation of PAK-2p34 occurs in the cytoplasm or in the nucleus.</bp:comment>
<bp:xref rdf:resource="#PublicationXref13720"/>
<bp:xref rdf:resource="#PublicationXref7487"/>
<bp:xref rdf:resource="#RelationshipXref749"/>
<bp:dataSource rdf:resource="#Provenance1"/>
<bp:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Authored: Jakobi, R, 2008-02-05 11:04:14</bp:comment>
<bp:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Reviewed: Chang, E, 2008-05-21 00:05:41</bp:comment>
<bp:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Edited: Matthews, L, 2008-02-03 20:50:13</bp:comment>
<bp:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Edited: Matthews, L, 2008-06-12 00:23:53</bp:comment>
</bp:Pathway>
Here you can see that there are two biochemical reactions in the pathway, rdf:ID="BiochemicalReaction6535" and rdf:ID="BiochemicalReaction6536", and also two pathway steps, rdf:ID="PathwayStep8092" and rdf:ID="PathwayStep8091".
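The reactions and steps hang off bp:pathwayComponent and bp:pathwayOrder respectively, so a single query can list both. What follows is a minimal sketch, assuming SWI-Prolog's library(semweb/rdf_db); the `rx` prefix and its IRI are hypothetical stand-ins for the real file's base IRI, the triples are asserted by hand from the Pathway snippet rather than loaded from the OWL file, and `pathway_members/3` is a hypothetical helper name.

```prolog
:- use_module(library(semweb/rdf_db)).

% BioPAX Level 3 vocabulary, as used in the predicates in this post.
:- rdf_register_prefix(bp, 'http://www.biopax.org/release/biopax-level3.owl#').
% Hypothetical base IRI; the real rdf:IDs resolve against the OWL file's base.
:- rdf_register_prefix(rx, 'http://example.org/reactome#').

% Hand-asserted copy of part of the Pathway snippet (not the real file).
load_sample :-
    rdf_assert(rx:'Pathway1473', rdf:type, bp:'Pathway'),
    rdf_assert(rx:'Pathway1473', bp:pathwayComponent, rx:'BiochemicalReaction6535'),
    rdf_assert(rx:'Pathway1473', bp:pathwayComponent, rx:'BiochemicalReaction6536'),
    rdf_assert(rx:'Pathway1473', bp:pathwayOrder, rx:'PathwayStep8092'),
    rdf_assert(rx:'Pathway1473', bp:pathwayOrder, rx:'PathwayStep8091').
:- load_sample.

% Lets callers write rx:'Pathway1473' instead of the full IRI.
:- rdf_meta pathway_members(r, -, -).

% pathway_members(+Pathway, -Components, -Steps)
% Components: sorted objects of bp:pathwayComponent.
% Steps: sorted objects of bp:pathwayOrder.
pathway_members(Pathway, Components, Steps) :-
    findall(C, rdf(Pathway, bp:pathwayComponent, C), Components0),
    findall(S, rdf(Pathway, bp:pathwayOrder, S), Steps0),
    sort(Components0, Components),
    sort(Steps0, Steps).
```

On the sample, `?- pathway_members(rx:'Pathway1473', Cs, Ss).` binds Cs to the two BiochemicalReaction IRIs and Ss to the two PathwayStep IRIs. Against the real data one would call rdf_load/1 on the OWL file instead of `load_sample`, and use that file's IRIs.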
I use Prolog to query the RDF triples (I am not sure if you are familiar with Prolog; apologies if not!). For example, I have a predicate:
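:- use_module(library(semweb/rdf_db)).
% rdf/3 does no RDFS inference, so this matches only resources typed
% bp:Control directly, not instances of subclasses such as bp:Catalysis.
controlled_reaction(Controller, Controlled_Reaction, Control) :-
    rdf(Control, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://www.biopax.org/release/biopax-level3.owl#Control'),
    rdf(Control, 'http://www.biopax.org/release/biopax-level3.owl#controller', Controller),
    rdf(Control, 'http://www.biopax.org/release/biopax-level3.owl#controlled', Controlled_Reaction).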
And for catalyzed reactions:
% Each bp:Catalysis with its controller and the reaction it controls.
catalyzed_reaction(Controller, Controlled_Reaction, Catalyst) :-
    rdf(Catalyst, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://www.biopax.org/release/biopax-level3.owl#Catalysis'),
    rdf(Catalyst, 'http://www.biopax.org/release/biopax-level3.owl#controller', Controller),
    rdf(Catalyst, 'http://www.biopax.org/release/biopax-level3.owl#controlled', Controlled_Reaction).
For example, I can use this to find that for rdf:ID="Catalysis1" the controller is rdf:ID="Protein5" and the controlled reaction is rdf:ID="BiochemicalReaction2". The corresponding XML snippet is:
<bp:Catalysis rdf:ID="Catalysis1">
<bp:controller rdf:resource="#Protein5"/>
<bp:controlled rdf:resource="#BiochemicalReaction2"/>
<bp:controlType rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ACTIVATION</bp:controlType>
<bp:xref rdf:resource="#RelationshipXref1"/>
<bp:xref rdf:resource="#RelationshipXref2"/>
<bp:dataSource rdf:resource="#Provenance1"/>
</bp:Catalysis>
However, if I query for rdf:ID="BiochemicalReaction6536" as the controlled reaction, no results are returned.
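The empty result can be reproduced on a small hand-built graph. This is a sketch, assuming SWI-Prolog's library(semweb/rdf_db), a hypothetical `rx` prefix standing in for the file's base IRI, and triples copied by hand from the Catalysis1 and Catalysis102 snippets; `catalyzed_reaction/3` is the predicate above rewritten with prefixes, and `direct_catalyses/2` is a hypothetical helper.

```prolog
:- use_module(library(semweb/rdf_db)).

:- rdf_register_prefix(bp, 'http://www.biopax.org/release/biopax-level3.owl#').
% Hypothetical base IRI for the rdf:IDs in the Reactome file.
:- rdf_register_prefix(rx, 'http://example.org/reactome#').

% Triples copied by hand from the Catalysis1 and Catalysis102 snippets.
load_sample :-
    rdf_assert(rx:'Catalysis1', rdf:type, bp:'Catalysis'),
    rdf_assert(rx:'Catalysis1', bp:controller, rx:'Protein5'),
    rdf_assert(rx:'Catalysis1', bp:controlled, rx:'BiochemicalReaction2'),
    rdf_assert(rx:'Catalysis102', rdf:type, bp:'Catalysis'),
    rdf_assert(rx:'Catalysis102', bp:controller, rx:'Complex343'),
    rdf_assert(rx:'Catalysis102', bp:controlled, rx:'BiochemicalReaction271').
:- load_sample.

% Same logic as catalyzed_reaction/3 in the post, written with prefixes.
catalyzed_reaction(Controller, Controlled_Reaction, Catalyst) :-
    rdf(Catalyst, rdf:type, bp:'Catalysis'),
    rdf(Catalyst, bp:controller, Controller),
    rdf(Catalyst, bp:controlled, Controlled_Reaction).

:- rdf_meta direct_catalyses(r, -).

% direct_catalyses(+Reaction, -Catalyses): every Catalysis whose
% bp:controlled value is Reaction (the empty list if there is none).
direct_catalyses(Reaction, Catalyses) :-
    findall(C, catalyzed_reaction(_, Reaction, C), Catalyses).
```

Here `direct_catalyses(rx:'BiochemicalReaction2', Cs)` yields a one-element list holding the Catalysis1 IRI, while `rx:'BiochemicalReaction6536'` yields `[]`, because no Catalysis in the sample names that reaction through bp:controlled.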
It seems that the information about the catalyst for this reaction is instead contained in the pathway step:
<bp:PathwayStep rdf:ID="PathwayStep8092">
<bp:stepProcess rdf:resource="#BiochemicalReaction6536" />
<bp:stepProcess rdf:resource="#Catalysis102" />
</bp:PathwayStep>
This shows a link between rdf:ID="BiochemicalReaction6536" and rdf:ID="Catalysis102".
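That link can be expressed as a join on the shared PathwayStep. A minimal sketch, assuming SWI-Prolog's library(semweb/rdf_db), a hypothetical `rx` prefix for the file's base IRI, and triples copied by hand from the PathwayStep8092 snippet; `step_catalysis/3` is a hypothetical name, and it only records that the two processes sit in the same step, not that one controls the other.

```prolog
:- use_module(library(semweb/rdf_db)).

:- rdf_register_prefix(bp, 'http://www.biopax.org/release/biopax-level3.owl#').
% Hypothetical base IRI for the rdf:IDs in the Reactome file.
:- rdf_register_prefix(rx, 'http://example.org/reactome#').

% Triples copied by hand from the PathwayStep8092 snippet, plus the types
% implied by the <bp:BiochemicalReaction> and <bp:Catalysis> elements.
load_sample :-
    rdf_assert(rx:'PathwayStep8092', rdf:type, bp:'PathwayStep'),
    rdf_assert(rx:'PathwayStep8092', bp:stepProcess, rx:'BiochemicalReaction6536'),
    rdf_assert(rx:'PathwayStep8092', bp:stepProcess, rx:'Catalysis102'),
    rdf_assert(rx:'BiochemicalReaction6536', rdf:type, bp:'BiochemicalReaction'),
    rdf_assert(rx:'Catalysis102', rdf:type, bp:'Catalysis').
:- load_sample.

:- rdf_meta step_catalysis(r, r, r).

% step_catalysis(?Reaction, ?Catalysis, ?Step)
% Reaction and Catalysis are both bp:stepProcess values of the same
% PathwayStep. This encodes co-membership in a step and nothing more.
step_catalysis(Reaction, Catalysis, Step) :-
    rdf(Step, rdf:type, bp:'PathwayStep'),
    rdf(Step, bp:stepProcess, Reaction),
    rdf(Reaction, rdf:type, bp:'BiochemicalReaction'),
    rdf(Step, bp:stepProcess, Catalysis),
    rdf(Catalysis, rdf:type, bp:'Catalysis').
```

On the sample, `?- step_catalysis(rx:'BiochemicalReaction6536', C, S).` binds C to the Catalysis102 IRI and S to the PathwayStep8092 IRI.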
<bp:Catalysis rdf:ID="Catalysis102">
<bp:controller rdf:resource="#Complex343"/>
<bp:controlled rdf:resource="#BiochemicalReaction271"/>
<bp:controlType rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ACTIVATION</bp:controlType>
<bp:xref rdf:resource="#RelationshipXref144"/>
<bp:xref rdf:resource="#RelationshipXref145"/>
<bp:dataSource rdf:resource="#Provenance1"/>
</bp:Catalysis>
If I look up rdf:ID="Complex343", I can see that this is indeed "26S proteasome". However, rdf:ID="Catalysis102" states that the controlled reaction is rdf:ID="BiochemicalReaction271", and that the control type is "ACTIVATION".
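A query can flag every place where this pattern occurs: a Catalysis that shares a PathwayStep with a reaction but whose own bp:controlled value names a different reaction. This is a sketch, assuming SWI-Prolog's library(semweb/rdf_db), a hypothetical `rx` prefix for the file's base IRI, and triples copied by hand from the PathwayStep8092 and Catalysis102 snippets; `step_mismatch/3` is a hypothetical name.

```prolog
:- use_module(library(semweb/rdf_db)).

:- rdf_register_prefix(bp, 'http://www.biopax.org/release/biopax-level3.owl#').
% Hypothetical base IRI for the rdf:IDs in the Reactome file.
:- rdf_register_prefix(rx, 'http://example.org/reactome#').

% Triples copied by hand from the PathwayStep8092 and Catalysis102 snippets.
load_sample :-
    rdf_assert(rx:'PathwayStep8092', bp:stepProcess, rx:'BiochemicalReaction6536'),
    rdf_assert(rx:'PathwayStep8092', bp:stepProcess, rx:'Catalysis102'),
    rdf_assert(rx:'BiochemicalReaction6536', rdf:type, bp:'BiochemicalReaction'),
    rdf_assert(rx:'Catalysis102', rdf:type, bp:'Catalysis'),
    rdf_assert(rx:'Catalysis102', bp:controller, rx:'Complex343'),
    rdf_assert(rx:'Catalysis102', bp:controlled, rx:'BiochemicalReaction271').
:- load_sample.

:- rdf_meta step_mismatch(r, r, r).

% step_mismatch(?Reaction, ?Catalysis, ?Declared)
% Catalysis shares a PathwayStep with Reaction, but its own
% bp:controlled value (Declared) is a different resource.
step_mismatch(Reaction, Catalysis, Declared) :-
    rdf(Step, bp:stepProcess, Reaction),
    rdf(Reaction, rdf:type, bp:'BiochemicalReaction'),
    rdf(Step, bp:stepProcess, Catalysis),
    rdf(Catalysis, rdf:type, bp:'Catalysis'),
    rdf(Catalysis, bp:controlled, Declared),
    Declared \== Reaction.
```

On the sample this has exactly one solution, pairing BiochemicalReaction6536 with Catalysis102 and binding Declared to the BiochemicalReaction271 IRI. Run over the whole file, it would show whether the pattern is systematic or confined to a few steps.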
So my question is: what is the relationship between rdf:ID="BiochemicalReaction271" and rdf:ID="BiochemicalReaction6536"?
Do I infer that rdf:ID="BiochemicalReaction6536" is a type of rdf:ID="BiochemicalReaction271" because of rdf:ID="PathwayStep8092"? That is, should they have the same properties, in particular the control type, i.e. "ACTIVATION" in this case?
To put it another way:
By using the pathway steps, the results match the diagram and seem to make sense, but I am uncertain whether I am capturing the information correctly. For example, rdf:ID="Catalysis102" states that the controller is rdf:ID="Complex343", the controlled reaction is rdf:ID="BiochemicalReaction271", and the control type is "ACTIVATION". Can I say that, because rdf:ID="PathwayStep8092" contains the step processes rdf:ID="BiochemicalReaction6536" and rdf:ID="Catalysis102", the reaction rdf:ID="BiochemicalReaction6536" is also controlled by rdf:ID="Complex343" with control type "ACTIVATION"? And will the control type always be 'inherited' in this manner?
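The reading proposed here can be written down as an explicit rule, which at least keeps the assumption in one place. This is a sketch, assuming SWI-Prolog's library(semweb/rdf_db), a hypothetical `rx` prefix for the file's base IRI, and triples copied by hand from the PathwayStep8092 and Catalysis102 snippets; `step_control/3` and `literal_text/2` are hypothetical names, and the rule encodes the unconfirmed assumption that a Catalysis sharing a PathwayStep with a reaction controls that reaction.

```prolog
:- use_module(library(semweb/rdf_db)).

:- rdf_register_prefix(bp, 'http://www.biopax.org/release/biopax-level3.owl#').
% Hypothetical base IRI for the rdf:IDs in the Reactome file.
:- rdf_register_prefix(rx, 'http://example.org/reactome#').

% Triples copied by hand from the PathwayStep8092 and Catalysis102 snippets.
load_sample :-
    rdf_assert(rx:'PathwayStep8092', bp:stepProcess, rx:'BiochemicalReaction6536'),
    rdf_assert(rx:'PathwayStep8092', bp:stepProcess, rx:'Catalysis102'),
    rdf_assert(rx:'BiochemicalReaction6536', rdf:type, bp:'BiochemicalReaction'),
    rdf_assert(rx:'Catalysis102', rdf:type, bp:'Catalysis'),
    rdf_assert(rx:'Catalysis102', bp:controller, rx:'Complex343'),
    rdf_assert(rx:'Catalysis102', bp:controlled, rx:'BiochemicalReaction271'),
    rdf_assert(rx:'Catalysis102', bp:controlType,
               literal(type('http://www.w3.org/2001/XMLSchema#string', 'ACTIVATION'))).
:- load_sample.

:- rdf_meta step_control(r, r, -).

% step_control(?Reaction, ?Controller, ?ControlType)
% ASSUMPTION (the open question): a Catalysis sharing a PathwayStep with
% Reaction controls Reaction, regardless of its own bp:controlled value.
step_control(Reaction, Controller, ControlType) :-
    rdf(Step, bp:stepProcess, Reaction),
    rdf(Reaction, rdf:type, bp:'BiochemicalReaction'),
    rdf(Step, bp:stepProcess, Catalysis),
    rdf(Catalysis, rdf:type, bp:'Catalysis'),
    rdf(Catalysis, bp:controller, Controller),
    rdf(Catalysis, bp:controlType, literal(Lit)),
    literal_text(Lit, ControlType).

% Strip an optional datatype or language tag from a stored literal.
literal_text(type(_, Value), Value) :- !.
literal_text(lang(_, Value), Value) :- !.
literal_text(Value, Value).
```

On the sample, `?- step_control(rx:'BiochemicalReaction6536', Ctrl, Type).` binds Ctrl to the Complex343 IRI and Type to 'ACTIVATION'. Because the rule rests on the assumption being asked about, it may be worth keeping its results apart from those of catalyzed_reaction/3 until that assumption is confirmed.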
Is the relationship between rdf:ID="BiochemicalReaction6536" and rdf:ID="BiochemicalReaction271" connected to the section of the Reactome pathway viewer website which notes that a reaction has been "Inferred from another species", or is this unrelated?
Sorry, this is quite long and I am not sure it is much clearer, but hopefully it is!
Thanks once again for your time and the links. Paxtools does indeed look useful, but I am trying to do this in Prolog for a number of reasons, so it is not completely suitable. Reading its documentation has, however, helped me to understand the format, so thank you.
thanks for the details, Igor :)