What SPARQL End Point Can I Use To Find All Animals?
12.0 years ago

Is there a public SPARQL end point where I can query whether a particular organism is a plant, animal, or virus? The input I have is a taxonomy identifier, e.g. http://bio2rdf.org/taxonomy:9606, and I want to get the superclass, classification, or similar that tells me it falls in the animal kingdom.

Technically, this is done by using a reasoning engine, e.g. as outlined in this question. But I want to mash things up, leave a public server in charge of updating the end point content, and use federated SPARQL instead of caching the data myself.

The kind of SPARQL query I would like to run is:

SELECT * WHERE {
  tax:9606 foo:isA ?kingdom .
  ?kingdom rdf:type bar:Kingdom . 
}
taxonomy • 6.3k views

Your link to "this question" is not correct (links to viruses at uniprot.org instead)

Fixed. Thanks for letting me know.

12.0 years ago
Jerven ▴ 660

Of course, there is no "animal kingdom" in the NCBI/UniProt taxonomy, so I replaced it with Metazoa. The UniProt beta SPARQL end point can be used like this:

PREFIX up:<http://purl.uniprot.org/core/> 
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
SELECT ?input ?kingdom ?name
FROM <http://purl.uniprot.org/taxonomy/>
WHERE
{
  ?kingdom a up:Taxon .
  ?kingdom up:scientificName ?name.
  { ?kingdom up:rank up:Kingdom } UNION {?kingdom up:rank up:Superkingdom}
  BIND (taxon:9606 AS ?input)
  ?input rdfs:subClassOf+ ?kingdom .
}

This query returns the kingdom and superkingdom that your NCBI tax ID belongs to.

Or use this:

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
SELECT ?input ?isPlant ?isMetazoa ?isVirus 
FROM <http://purl.uniprot.org/taxonomy/>
WHERE
{
  BIND(taxon:9606 AS ?input)
  {
    ?input rdfs:subClassOf+ taxon:33090 . # Viridiplantae
    BIND(true AS ?isPlant)
  } UNION {
    ?input rdfs:subClassOf+ taxon:33208 . # Metazoa
    BIND(true AS ?isMetazoa)
  } UNION {
    ?input rdfs:subClassOf+ taxon:10239 . #viruses
    BIND(true AS ?isVirus)
  }
}
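
If you want to send a query like the ones above from a script rather than a web form, something along these lines might work. This is only a minimal sketch: it assumes the end point follows the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter) and can return the W3C SPARQL JSON results format. The `ENDPOINT` URL is a placeholder of mine, not an address given in this answer, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Placeholder (assumption): substitute the real end point URL here.
ENDPOINT = "https://example.org/sparql"


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts.

    Variables that are unbound in a row (e.g. the flags of UNION branches
    that did not match) are simply absent from that row's dict.
    """
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run(endpoint, query):
    """Send the query and return the flattened rows (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return rows(json.load(response))
```

Calling `run(ENDPOINT, query_text)` with one of the queries above as `query_text` should then give one dict per result row, provided the end point behaves as assumed.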

Once we adapt to the latest SPARQL 1.1 draft, you can ask about multiple tax IDs in one go. This won't work before January 2013 at the earliest, so until then you need to use the method above.

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
SELECT ?input ?isPlant ?isMammal ?isVirus 
FROM <http://purl.uniprot.org/taxonomy/>
WHERE
{
  VALUES ?input { taxon:9606 taxon:8333 } # etc.
  {
    ?input rdfs:subClassOf+ taxon:33090 . # Viridiplantae
    BIND(true AS ?isPlant)
  } UNION {
    ?input rdfs:subClassOf+ taxon:40674 . # Mammalia
    BIND(true AS ?isMammal)
  } UNION {
    ?input rdfs:subClassOf+ taxon:10239 . #viruses
    BIND(true AS ?isVirus)
  }
}
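
If the list of tax IDs comes from a script, the VALUES block can be generated rather than typed by hand. The sketch below is one possible way to do that; `build_query` is a hypothetical helper name, and the VALUES clause is written in the single-variable form of the final SPARQL 1.1 specification, which I am assuming the end point accepts. Forcing every ID through `int()` keeps anything that is not a plain number out of the query text.

```python
def build_query(tax_ids):
    """Return a SPARQL query that classifies each NCBI tax ID via VALUES."""
    ids = [int(t) for t in tax_ids]  # raises ValueError for non-numeric input
    if not ids or min(ids) < 1:
        raise ValueError("need at least one positive taxonomy ID")
    values = " ".join(f"taxon:{i}" for i in ids)
    # Doubled braces are literal braces inside an f-string.
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT ?input ?isPlant ?isMammal ?isVirus
FROM <http://purl.uniprot.org/taxonomy/>
WHERE
{{
  VALUES ?input {{ {values} }}
  {{
    ?input rdfs:subClassOf+ taxon:33090 . # Viridiplantae
    BIND(true AS ?isPlant)
  }} UNION {{
    ?input rdfs:subClassOf+ taxon:40674 . # Mammalia
    BIND(true AS ?isMammal)
  }} UNION {{
    ?input rdfs:subClassOf+ taxon:10239 . # Viruses
    BIND(true AS ?isVirus)
  }}
}}"""
```

For example, `build_query([9606, 8333])` produces the same query shape as above with both IDs in the VALUES block.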

The use of "rdfs:subClassOf+" is really interesting...

I think that something has changed in UniProt regarding the use of rdfs:subClassOf+, because it doesn't work now. If I use rdfs:subClassOf? (or without ?) it works fine and I can get an organism lineage like this. Here's the error I get when using rdfs:subClassOf+:

Exceeded 1000000000 bytes in transitive temp memory. use t_distinct, t_max or more T_MAX_memory options to limit the search or increase the pool

Jerven, could you update the answer? The same applies to this answer: A: Order by subclass SPARQL

Very interesting, Jerven, thanks! I need to learn more about the SPARQL grammar.

Yes, so what we need is a reasoning SPARQL end point. Doing things iteratively will not work.

Or see my answer and use SPARQL path queries

12.0 years ago

I am surprised that this is not in the RDF you already have. According to the NCBI Taxonomy (the underlying database), 9606 is human and thus a primate (which it shows; see http://www.ncbi.nlm.nih.gov/taxonomy?term=9606). Since the taxonomy is a tree, it should simply know that we are not plants, and it is the right source for that. So the RDF for the taxonomy itself should be improved to reflect the tree structure, if that is not already in there.

RDF is not SPARQL. I'm sure it is available as RDF, but I would like to hear about public SPARQL end points. I also changed the title to match the example I give...

Agreed. I will leave the answer up, since together with your comment it is actually quite instructive.
