Hi,
We are trying to make sense of the PubChem RDF data, and we see some anomalies compared with other RDF datasets. In other RDF datasets, each entity URI (e.g. Protein, Gene, Compound) usually has at least one rdf:type triple.
In the PubChem RDF data, we find that many entity URIs do not have an rdf:type triple, e.g.:
- Gene: 58198 unique gene IDs in the gene file, of which only 291 have an rdf:type predicate
- Protein: 20223 unique IDs, but only 16120 IDs with rdf:type = bp:Protein
- Compound: 103 million unique IDs, but only 133k IDs with rdf:type
Could an expert explain why this is the case? And how should we interpret entities whose IDs do not have an rdf:type triple?
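
For anyone who wants to reproduce this kind of count, here is a minimal sketch of the comparison in Python. It assumes the triples have already been parsed into (subject, predicate, object) tuples by whatever RDF parser is in use; the function name type_coverage and the ex: URIs are made up for illustration and are not taken from the PubChem files.

```python
# Full IRI that the rdf:type prefixed name expands to.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def type_coverage(triples):
    """Return (all_subjects, typed_subjects) for an iterable of (s, p, o) tuples.

    all_subjects   : every URI that appears in subject position
    typed_subjects : the subset that has at least one rdf:type triple
    """
    all_subjects = set()
    typed_subjects = set()
    for s, p, o in triples:
        all_subjects.add(s)
        if p == RDF_TYPE:
            typed_subjects.add(s)
    return all_subjects, typed_subjects


# Hypothetical toy data standing in for a parsed gene file.
triples = [
    ("ex:gene1", RDF_TYPE, "ex:Gene"),
    ("ex:gene1", "ex:label", "A"),
    ("ex:gene2", "ex:label", "B"),
]

all_s, typed = type_coverage(triples)
untyped = all_s - typed
print(len(all_s), len(typed), sorted(untyped))
```

Note that this only looks at URIs in subject position. If the "unique IDs" figure also includes URIs that appear only as objects, or if the rdf:type triples for an entity live in a different file from the one being counted, the two numbers will differ for that reason alone.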