Question

How to obtain all genes associated with a given GO term from PANTHER API enrichment analysis

0


2.7 years ago

gt ▴ 30

I have written some code to run GO enrichment within an R Shiny application where the user can filter the input gene list before submitting for analysis. The code is below:

library(httr)      # provides GET()
library(jsonlite)  # provides fromJSON()
# Query the PANTHER over-representation endpoint (Fisher's exact test).
result_go <- GET(paste0("http://pantherdb.org/services/oai/pantherdb/enrich/overrep?geneInputList=",genes,"&organism=",organism,"&annotDataSet=GO%3A",annot_,"&enrichmentTestType=FISHER&correction=",correct))
result_go <- rawToChar(result_go$content)
result_go <- fromJSON(result_go)
result_go <- data.frame(result_go$results$result)
# Build the table with data.frame() rather than cbind(), which would coerce every column to character.
result_go <- data.frame(result_go$term$id,result_go$term$label,as.numeric(result_go$number_in_list),as.numeric(result_go$number_in_reference),as.numeric(result_go$fold_enrichment),as.numeric(result_go$fdr),stringsAsFactors = FALSE)
names(result_go) <- c("Term Id","Term label","# Genes","# Reference","Fold enrichment","Adj. p-value")
# Filter on the unrounded adjusted p-value first, then round for display.
result_go <- result_go[(result_go$`# Genes` > 0) & (result_go$`Adj. p-value` < 0.05),] # in the Shiny app the cutoff can come from input$adj_p_val_thresh
result_go$`Fold enrichment` <- round(result_go$`Fold enrichment`,2)
result_go$`Adj. p-value` <- round(result_go$`Adj. p-value`,2)
result_go <- result_go[order(result_go$`Fold enrichment`,decreasing = TRUE),]

This code gives me a table which looks like the following:

[image of the results table not available]

From this point I took all the GO term IDs and used the AmiGO Gene Ontology site to look up the GO terms. An example link for the GO term 0038130 is: http://amigo.geneontology.org/amigo/term/GO:0038130. Instead of looking these up manually for each GO term, I use the read.table function in R to download the data for each GO term:

read.table(paste0("http://golr-aux.geneontology.io/solr/select?defType=edismax&qt=standard&indent=on&wt=csv&rows=100000&start=0&fl=bioentity,bioentity_name,bioentity_label,annotation_class,annotation_class_label,aspect,panther_family,type,reference&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&hl=true&hl.simple.pre=%3Cem%20class=%22hilite%22%3E&hl.snippets=1000&csv.encapsulator=&csv.separator=%09&csv.header=true&csv.mv.separator=%7C&fq=document_category:%22annotation%22&fq=isa_partof_closure:%22",go_term,"%22&fq=taxon_subset_closure_label:%22Mus%20musculus%22&facet.field=aspect&facet.field=taxon_subset_closure_label&facet.field=type&facet.field=evidence_subset_closure_label&facet.field=regulates_closure_label&facet.field=isa_partof_closure_label&facet.field=annotation_class_label&facet.field=qualifier&facet.field=annotation_extension_class_closure_label&facet.field=assigned_by&facet.field=panther_family_label&q=*:*"),quote = "",stringsAsFactors = FALSE,fill = TRUE,header = TRUE,sep = "\t")

However, when I intersect the input gene list with the genes associated with each GO term from AmiGO, I get either more or fewer genes than the # Genes column in the results table reports. Does anyone know what could be happening here? I was under the impression that both the PANTHER API and AmiGO use Gene Ontology for the annotations. Is there a way to obtain just the GO terms and associated genes that the PANTHER API uses as its reference?

API scRNA-seq R PANTHER • 2.0k views

Updated 2.0 years ago by Viraj • 0 • written 2.7 years ago by gt ▴ 30

score 1 · Answer 1 · 2022-04-02

1


2.7 years ago

Istvan Albert 101k

I would suggest downloading and parsing the Gene Ontology annotation files that connect GO terms to gene/protein IDs. Those are simple tab-delimited files.

The resulting lists are not that large (even the human genome has only about 600K rows), so the data should be easy either to load into memory or to place in an SQLite-type database.
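
A minimal sketch of that parsing step in R, assuming a GAF 2.x annotation file, where comment lines start with `!`, column 3 holds the gene symbol, and column 5 holds the GO ID. The three-row file written here is invented for illustration and stands in for a real download; `read_gaf` is a hypothetical helper name, not part of any package.

```r
# Toy stand-in for a downloaded GAF file: 17 tab-separated columns per row.
# Gene names and identifiers below are made up for illustration only.
make_row <- function(symbol, go_id) {
  paste(c("MGI", "MGI:0000000", symbol, "", go_id, "PMID:0", "IDA", "", "P",
          "", "", "protein", "taxon:10090", "20200101", "MGI", "", ""),
        collapse = "\t")
}
gaf_path <- tempfile(fileext = ".gaf")
writeLines(c("!gaf-version: 2.2",
             make_row("GeneA", "GO:0050804"),
             make_row("GeneB", "GO:0050804"),
             make_row("GeneA", "GO:0038130")),
           gaf_path)

# Hypothetical helper: read a GAF file and keep unique (symbol, GO ID) pairs.
read_gaf <- function(path) {
  lines <- readLines(path)
  lines <- lines[!startsWith(lines, "!")]  # drop header/comment lines
  gaf <- read.delim(text = lines, header = FALSE, quote = "",
                    stringsAsFactors = FALSE)
  unique(data.frame(symbol = gaf[[3]], go_id = gaf[[5]],
                    stringsAsFactors = FALSE))
}

annot <- read_gaf(gaf_path)

# In-memory lookup: GO ID -> character vector of directly annotated symbols.
genes_by_term <- split(annot$symbol, annot$go_id)
genes_by_term[["GO:0050804"]]
```

The resulting list only reflects the annotations written in the file itself, so a lookup by GO ID returns the genes tagged with exactly that term.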


0


Do you know where I can find those?

Reply • 2.6 years ago by gt ▴ 30

0


http://geneontology.org/docs/downloads/

Reply • 2.6 years ago by Istvan Albert 101k

0


Yes, I actually tried this method as well, downloading the file manually instead of programmatically. But I still do not get the same number of associated genes from the GO annotations as I do from the PANTHER API. An example is the term ID GO:0050804: the PANTHER API tells me there should be 529 genes associated with the term, but using the file from the link you provided, I obtain only 87. Are the annotations different between the two tools? Is there a way to get a similar file from PANTHER?

Reply • 2.6 years ago by gt ▴ 30

1


Ok, now I understand the original question.

I believe the source of the discrepancy is that the download shows you only the terms each gene was directly tagged with (the most specific nodes), not all the intermediate nodes above them. When a protein is tagged with a GO term, it implicitly also carries every other term that the tagged term is connected to through is_a relationships.

A -> B -> C

Something tagged as C is also A and B.

But there are also things tagged just as B. If we search the file only for B, we will find those, but we won't find the entry tagged as C.
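
The effect can be reproduced with a small sketch in R. Everything here is a toy assumption: the three-term ontology mirrors the A -> B -> C example, the gene names are made up, and `ancestors_and_self` and `genes_for` are hypothetical helpers rather than functions from any package.

```r
# Toy ontology: each term maps to its direct is_a parents (C is_a B, B is_a A).
parents <- list(A = character(0), B = "A", C = "B")

# Direct annotations as they would appear in an annotation file (made-up genes).
direct <- data.frame(gene = c("gene1", "gene2"),
                     term = c("C", "B"),
                     stringsAsFactors = FALSE)

# Collect a term plus all of its ancestors by walking the parent links.
ancestors_and_self <- function(term) {
  found <- character(0)
  queue <- term
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% found)) {
      found <- c(found, current)
      queue <- c(queue, parents[[current]])
    }
  }
  found
}

# Expand every direct annotation to the term itself and all of its ancestors.
propagated <- unique(do.call(rbind, lapply(seq_len(nrow(direct)), function(i) {
  data.frame(gene = direct$gene[i],
             term = ancestors_and_self(direct$term[i]),
             stringsAsFactors = FALSE)
})))

# Genes annotated to a term in a given annotation table.
genes_for <- function(annot, term) sort(unique(annot$gene[annot$term == term]))

genes_for(direct, "B")      # searching the raw file: misses the gene tagged as C
genes_for(propagated, "B")  # after propagation: includes the gene tagged as C
```

For real data, precomputed ancestor tables may be more practical than walking the ontology by hand. As far as I know, the Bioconductor package GO.db exposes ancestor maps such as `GOBPANCESTOR`, and organism packages such as org.Mm.eg.db provide a `GO2ALLEGS` map that already includes genes annotated to child terms. Counts may still differ from PANTHER if the annotation releases differ.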

Reply • 2.6 years ago by Istvan Albert 101k

0


Ah okay this makes more sense now, thank you!

Reply • 2.6 years ago by gt ▴ 30

0


I am having the same issue. I used the PANTHER API for my uploaded genes and found 541 genes over-represented in intracellular anatomical structure (GO:0005622).

To investigate further about those 541 genes, I mapped every gene in my uploaded list to GO annotations using this API: http://pantherdb.org/services/oai/pantherdb/geneinfo.

The geneinfo service returns only 4 genes with that GO annotation (GO:0005622).

Is there a way to fix this, possibly by getting all the intermediate nodes? Any help would be greatly appreciated!

Reply • 2.0 years ago by Viraj • 0