I have run into an issue when doing GO enrichment with clusterProfiler in combination with org.Hs.eg.db. In this analysis, I am interested in the set of proteins (labeled by their UniProt IDs) that are differentially abundant between two experimental conditions.
In the process, I noticed that a large number of the UniProt IDs in my protein set (31 of 87) have no information in the org.Hs.eg.db database. This confuses me, because on UniProt's website these protein IDs do have GO terms associated with them. So the GO data exist; they just do not seem to be contained in org.Hs.eg.db. I am using the most recent version of org.Hs.eg.db (3.17.0).
For example, A0A075B6H9, A0A075B6I4, and A0A0B4J1Y9 fit this pattern.
I am wondering: 1. Why is this the case? 2. What can I do about it? Any help would be much appreciated. Thanks!
How outdated is org.Hs.eg.db compared to UniProt annotations? From the latest docs:
So how outdated is this Entrez Gene info relative to UniProt?
As for point 2, why not use the GO annotations from UniProt directly?
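To illustrate what using UniProt annotations directly involves, here is a minimal Python sketch of an over-representation test run on a custom protein-to-GO mapping. Everything in it is an assumption for illustration: the protein IDs, the GO term labels, and the helper names `hypergeom_sf` and `enrich` are made up. Real input would be the GO annotations exported from UniProt for your background proteome, and the resulting p-values would still need multiple-testing correction.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N proteins from M, of which n carry the term."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def enrich(study, background, term2proteins):
    """One-sided hypergeometric test per term; returns rows sorted by p-value."""
    background = set(background)
    study = set(study) & background  # ignore study IDs outside the background
    rows = []
    for term, proteins in term2proteins.items():
        annotated = set(proteins) & background
        hits = annotated & study
        if not hits:
            continue  # no study protein carries this term
        p = hypergeom_sf(len(hits), len(background), len(annotated), len(study))
        rows.append((term, len(hits), len(annotated), p))
    return sorted(rows, key=lambda row: row[3])


# Toy data: placeholder IDs and term labels, not real UniProt annotations.
background = {f"PROT{i}" for i in range(1, 11)}
term2proteins = {
    "GO:TERM_A": {"PROT1", "PROT2", "PROT3"},
    "GO:TERM_B": {"PROT4", "PROT5", "PROT6", "PROT7", "PROT8"},
}
study = {"PROT1", "PROT2", "PROT3", "PROT4"}

results = enrich(study, background, term2proteins)
for term, n_hits, n_annotated, p in results:
    print(f"{term}\thits={n_hits}\tannotated={n_annotated}\tp={p:.4g}")
```

In R, the same kind of custom mapping can be passed to clusterProfiler's `enricher()` through its `TERM2GENE` argument, which bypasses org.Hs.eg.db entirely.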
Thanks for the answers! This is what I ended up doing.
In case it helps anyone else, the way I did this is below: