I need to run semantic similarity searches on the Human Phenotype Ontology (HPO).
I am aware of packages like GoSim, SemSim etc. These packages work well with GO. I am looking for a package that can take any .obo file and compute semantic similarities on top of it. Do you know of any packages that can do this?
Thanks in advance!
Thanks Hfang,
dnet is a great resource.
I was trying the example you provided and have the following queries.
Thanks!
Hi Khader,
Below are the long answers to your questions. I hope they are useful.
1. HPO has three namespaces (sub-ontologies). This is very similar to GO and its sub-ontologies (Biological Process, Molecular Function, Cellular Component). For this reason, you have to calculate semantic similarity for each sub-ontology separately and then take the sum as your final semantic similarity. Alternatively, for HPO, the sub-ontology Phenotypic Abnormality is usually the useful one; the other two (Mode of Inheritance; Onset and Clinical Course) are not as well defined.
2. First, to be clear, semantic similarity is a comparison that assesses the degree of relatedness between two entities. It can be computed between two terms, and also between two genes annotated by terms. For this, the information content (IC) of a term is defined as the negative log10 of the frequency of genes annotated to that term. This definition uses the actual usage of a term (the fraction of annotated genes it has) to measure how specific and informative the term is. The function dDAGtermSim (http://supfam.org/dnet/dDAGtermSim.html) calculates semantic similarity between terms, which is then used by the function dDAGgeneSim (http://supfam.org/dnet/dDAGgeneSim.html) to calculate semantic similarity between genes. Semantic similarity between terms is NOT their distance in the ontology hierarchy (which is organised as a DAG: directed acyclic graph). Its meaning depends on the method used. If you choose the method 'Resnik', the semantic similarity is the IC of the most informative common ancestor (MICA) of the two terms of interest. The MICA for HP:0000062 x HP:0000062 is HP:0000062 itself (whose IC is 1.958076). The MICA for HP:0000062 x HP:0010931 is the root of the ontology, and the IC at the root is always zero because every annotated gene counts towards the root, so the similarity for that pair is zero.
3. The Disease Ontology (DO) has no sub-ontologies, so the situation is very simple. Here is how to do it using dnet.
3a) if you are interested in the semantic similarity between DO terms.
3b) if you are interested in the semantic similarity between human genes (annotatable by DO terms).
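To make the definitions in answer 2 concrete, here is a minimal sketch of IC and Resnik similarity on a toy ontology. It is plain Python and does not use dnet or reproduce its API; the term names, the `PARENTS` graph and the `DIRECT` gene annotations are all made up for illustration.

```python
import math

# Hypothetical toy ontology: term -> list of parents (a DAG rooted at "root").
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
}

# Hypothetical direct annotations: term -> genes annotated directly to it.
DIRECT = {
    "A1": {"g1", "g2"},
    "A2": {"g3"},
    "B": {"g4", "g5"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(term) = -log10(fraction of annotated genes carried by the term)."""
    genes = {t: set() for t in PARENTS}
    # A gene annotated to a term also counts for every ancestor of that term.
    for term, gs in DIRECT.items():
        for anc in ancestors(term):
            genes[anc] |= gs
    total = len(genes["root"])
    # log10(total / n) equals -log10(n / total); unannotated terms get no IC.
    return {t: math.log10(total / len(g)) for t, g in genes.items() if g}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor (MICA)."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC[t] for t in common if t in IC)


print(resnik("A1", "A1"))  # MICA is A1 itself
print(resnik("A1", "A2"))  # MICA is A
print(resnik("A1", "B"))   # MICA is the root, so the similarity is zero
```

In this toy example, a term compared with itself scores its own IC, siblings score the IC of their shared parent, and terms that only meet at the root score zero, which mirrors the HP:0000062 cases described in answer 2.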
Hi Fang,
Thank you for taking the time to explain this with such clarity; your package is very well documented.
My requirement is a bit different. I have a set of DOIDs and HPOIDs - no gene association data; some of these phenotypes are mapped to non-coding regions.
I need to get the similarity between sets of HPO or DO IDs, using a function similar to the GO term similarity methods.
What I really want is a single number that gives a cumulative similarity score across all DO / HPO IDs, instead of a matrix of pair-wise similarities. Is there any option in dnet to get this?
Even an implementation that would take a set of IDs like
Input<-c("HPOID:0100543", "HPOID:0100543", "HPOID:0001250", "HPOID:0001250")
and provide a similarity metric as output would also be very useful. Thank you again for providing a very useful package.
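For reference, one common way to collapse a matrix of pair-wise term similarities into a single set-level score is a best-match average. The sketch below is plain Python and is not dnet's API; the function name `best_match_average`, the term labels and the similarity values in `TOY_SIM` are hypothetical and only illustrate the arithmetic. In practice the `sim` argument would be whatever term-level measure (for example Resnik) has already been computed.

```python
def best_match_average(set_a, set_b, sim):
    """Collapse pair-wise term similarities into one score for two term sets.

    For each term in one set, take its best match in the other set; average
    those best matches in each direction, then average the two directions.
    """
    a, b = sorted(set(set_a)), sorted(set(set_b))  # duplicate IDs are collapsed
    if not a or not b:
        raise ValueError("both term sets must be non-empty")
    a_to_b = sum(max(sim(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(max(sim(x, y) for x in a) for y in b) / len(b)
    return (a_to_b + b_to_a) / 2


# Hypothetical, made-up term similarities purely for illustration.
TOY_SIM = {
    frozenset(["T1"]): 2.0,
    frozenset(["T2"]): 1.5,
    frozenset(["T3"]): 1.0,
    frozenset(["T1", "T2"]): 0.5,
    frozenset(["T1", "T3"]): 0.0,
    frozenset(["T2", "T3"]): 1.0,
}


def toy_sim(x, y):
    """Symmetric lookup of the made-up similarity between two terms."""
    return TOY_SIM[frozenset([x, y])]


score = best_match_average(["T1", "T2"], ["T2", "T3"], toy_sim)
print(score)
```

Because the score is built only from term-to-term similarities, it needs no gene association data, only the two sets of IDs and a term-level measure.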