Question

Missing org.Hs.eg.db GO Annotations for UniProt IDs

22 months ago

Charlie ▴ 10

I have run into an issue when trying to do GO enrichment using clusterProfiler in combination with org.Hs.eg.db. In this analysis, I am interested in the set of proteins (identified by their UniProt IDs) that are differentially abundant between two experimental conditions.

In the process, I noticed that a large number of the UniProt IDs in my protein set (31 of 87) have no information in the org.Hs.eg.db database. This confuses me, because UniProt's website does list GO terms for these protein IDs. So the GO data exists; it just does not seem to be contained in org.Hs.eg.db. I am using the most recent version of org.Hs.eg.db (3.17.0).

For example, A0A075B6H9, A0A075B6I4, and A0A0B4J1Y9 fit this pattern.

I am wondering: 1. Why is this the case? 2. What can I do about it? Any help would be much appreciated! Thanks!
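One way to narrow the problem down is to check whether the IDs are absent from the package's UniProt keys altogether, as opposed to being present but lacking GO terms. The following is a minimal sketch, assuming org.Hs.eg.db is installed; `find_unmapped` is a hypothetical helper name introduced here for illustration, and the sketch only tests the first possibility.

```r
# Hypothetical helper: return the IDs that do not appear among the known keys.
find_unmapped <- function(ids, known_keys) {
  setdiff(unique(ids), known_keys)
}

# The three example accessions from the question.
ids <- c("A0A075B6H9", "A0A075B6I4", "A0A0B4J1Y9")

# Only query the annotation package if it is installed.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # All UniProt accessions that org.Hs.eg.db knows about.
  known <- AnnotationDbi::keys(org.Hs.eg.db::org.Hs.eg.db, keytype = "UNIPROT")
  print(find_unmapped(ids, known))
}
```

Any ID printed here has no UNIPROT key in the package at all, so no Entrez Gene mapping (and therefore no GO annotation) can be retrieved for it from this source.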

Org.hs.eg.db clusterProfiler Uniprot • 1.4k views

ADD COMMENT • link 22 months ago by Charlie ▴ 10

How outdated is org.Hs.eg.db compared to UniProt annotations? From the latest docs:

Mappings were based on data provided by: Entrez Gene ftp://ftp.ncbi.nlm.nih.gov/gene/DATA With a date stamp from the source of: 2023-Mar05

So how outdated is this Entrez Gene info relative to UniProt?

ADD REPLY • link 22 months ago by Jean-Karim Heriche 27k

As for point 2, why not then use GO annotations from UniProt directly?
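As a rough sketch of what "directly" could look like without any helper package: the UniProt REST search endpoint can return a tab-separated table of accessions and GO IDs, which base R can reshape into the two-column term-to-gene table that `clusterProfiler::enricher` expects. The endpoint URL, the `fields`, `format` and `size` parameters, and the returned column names (`Entry`, `Gene Ontology IDs`) are written from memory of the UniProt API and should be treated as assumptions to verify; `parse_uniprot_go` and `run_query` are hypothetical names.

```r
# Hypothetical helper: turn UniProt TSV text (columns "Entry" and
# "Gene Ontology IDs") into a long TERM/GENE data frame.
parse_uniprot_go <- function(tsv_text) {
  tab <- read.delim(text = tsv_text, stringsAsFactors = FALSE)
  # read.delim converts "Gene Ontology IDs" to "Gene.Ontology.IDs".
  go_list <- strsplit(as.character(tab$Gene.Ontology.IDs), ";\\s*")
  out <- data.frame(
    TERM = unlist(go_list),
    GENE = rep(tab$Entry, lengths(go_list)),
    stringsAsFactors = FALSE
  )
  # Drop proteins that came back without any GO ID.
  out[!is.na(out$TERM) & nzchar(out$TERM), , drop = FALSE]
}

ids <- c("A0A075B6H9", "A0A075B6I4", "A0A0B4J1Y9")

# Assumed URL layout of the UniProt REST search endpoint.
query <- paste0("accession:", ids, collapse = " OR ")
url <- paste0(
  "https://rest.uniprot.org/uniprotkb/search?query=",
  utils::URLencode(query, reserved = TRUE),
  "&fields=accession,go_id&format=tsv&size=500"
)

run_query <- FALSE  # set to TRUE to actually contact UniProt
if (run_query) {
  tsv_text <- paste(readLines(url, warn = FALSE), collapse = "\n")
  term2gene <- parse_uniprot_go(tsv_text)
  print(head(term2gene))
}
```

Keeping the parsing separate from the download makes it easy to check the reshaping step on a small hand-written table before sending real queries.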

ADD REPLY • link 22 months ago by Jean-Karim Heriche 27k

Thanks for the answers! This is what I ended up doing.

In case it helps anyone else, the way I did this is below:

# ids: vector of UniProt accessions to annotate (used as the universe below)
# up:  vector of UniProt accessions to test for enrichment
library(magrittr)  # provides the %>% pipe

# Annotate UniProt IDs with GO terms using the UniProt API (https://github.com/baynec2/GLabR)
uniprot_go = GLabR::annotate_uniprot_single(ids, columns = "accession,go_id")

# Format the GO terms into a long data frame, one row per protein/GO ID pair
# (separate_rows lives in tidyr, not dplyr)
go = uniprot_go %>%
  tidyr::separate_rows(Gene.Ontology.IDs, sep = ";") %>%
  dplyr::mutate(Gene.Ontology.IDs = gsub(" ", "", Gene.Ontology.IDs))

# Make the TERM2GENE data frame to use with enricher
TERM2GENE = go %>%
  dplyr::select(TERM = Gene.Ontology.IDs, GENE = Entry)

# Get the name of each GO term to use with enricher (returns GOID and TERM columns)
TERM2NAME = AnnotationDbi::select(GO.db::GO.db,
                                  keys = unique(TERM2GENE$TERM),
                                  columns = c("TERM"))

# Run the enrichment analysis
enrich = clusterProfiler::enricher(up,
         universe = go$Entry,
         TERM2GENE = TERM2GENE,
         TERM2NAME = TERM2NAME)

# Make a network plot (cnetplot is provided by enrichplot)
plot = enrichplot::cnetplot(enrich, categorySize = "pvalue")

plot
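For readers wondering what `enricher` computes with the custom TERM2GENE table: it runs an over-representation test based on the hypergeometric distribution for each term. Below is a minimal base R sketch of that test for a single term. All counts are hypothetical and chosen only for illustration, and the sketch leaves out the multiple-testing adjustment that `enricher` also applies.

```r
# Hypothetical counts, for illustration only.
N <- 500  # proteins in the universe
M <- 40   # universe proteins annotated to the GO term
n <- 60   # proteins in the set of interest
k <- 12   # proteins in the set of interest annotated to the term

# P(X >= k): probability of seeing at least k annotated proteins
# when drawing n proteins at random from the universe.
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(p_value)
```

Because the universe enters the test through `N` and `M`, the choice of background (here, only the proteins that were actually annotated) directly affects the resulting p-values.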

ADD REPLY • link 22 months ago by Charlie ▴ 10