I have written some code to run GO enrichment within an R Shiny application where the user can filter the input gene list before submitting for analysis. The code is below:
library(httr)      # GET()
library(jsonlite)  # fromJSON()
# Query the PANTHER over-representation service (Fisher's exact test).
result_go <- GET(paste0("http://pantherdb.org/services/oai/pantherdb/enrich/overrep?geneInputList=",genes,"&organism=",organism,"&annotDataSet=GO%3A",annot_,"&enrichmentTestType=FISHER&correction=",correct))
result_go <- fromJSON(rawToChar(result_go$content))
result_go <- data.frame(result_go$results$result)
# Build the table with data.frame() rather than cbind(), so the count columns are not coerced to character.
result_go <- data.frame(result_go$term$id,result_go$term$label,result_go$number_in_list,result_go$number_in_reference,result_go$fold_enrichment,result_go$fdr,stringsAsFactors = FALSE)
names(result_go) <- c("Term Id","Term label","# Genes","# Reference","Fold enrichment","Adj. p-value")
result_go$`Fold enrichment` <- as.numeric(as.character(result_go$`Fold enrichment`))
result_go$`Adj. p-value` <- as.numeric(as.character(result_go$`Adj. p-value`))
# Filter on the unrounded adjusted p-value; rounding first would turn e.g. 0.049 into 0.05 and drop it.
result_go <- result_go[(result_go$`# Genes` > 0) & (result_go$`Adj. p-value` < 0.05),] # (result_go$`Adj. p-value` < input$adj_p_val_thresh) &
# Round for display only, then sort by fold enrichment.
result_go$`Fold enrichment` <- round(result_go$`Fold enrichment`,2)
result_go$`Adj. p-value` <- round(result_go$`Adj. p-value`,2)
result_go <- result_go[order(result_go$`Fold enrichment`,decreasing = TRUE),]
This code gives me a table which looks like the following:
From this point I took all the GO term IDs and used the AmiGO Gene Ontology site to look up the GO terms. For example, the page for the GO term 0038130 is http://amigo.geneontology.org/amigo/term/GO:0038130. Instead of looking up each GO term manually, I use the read.table function in R to download the data for each one:
read.table(paste0("http://golr-aux.geneontology.io/solr/select?defType=edismax&qt=standard&indent=on&wt=csv&rows=100000&start=0&fl=bioentity,bioentity_name,bioentity_label,annotation_class,annotation_class_label,aspect,panther_family,type,reference&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&hl=true&hl.simple.pre=%3Cem%20class=%22hilite%22%3E&hl.snippets=1000&csv.encapsulator=&csv.separator=%09&csv.header=true&csv.mv.separator=%7C&fq=document_category:%22annotation%22&fq=isa_partof_closure:%22",go_term,"%22&fq=taxon_subset_closure_label:%22Mus%20musculus%22&facet.field=aspect&facet.field=taxon_subset_closure_label&facet.field=type&facet.field=evidence_subset_closure_label&facet.field=regulates_closure_label&facet.field=isa_partof_closure_label&facet.field=annotation_class_label&facet.field=qualifier&facet.field=annotation_extension_class_closure_label&facet.field=assigned_by&facet.field=panther_family_label&q=*:*"),quote = "",stringsAsFactors = FALSE,fill = TRUE,header = TRUE,sep = "\t")
However, when I intersect the input gene list with the genes associated with each GO term from AmiGO, I get either more or fewer genes than the # Genes column in the results table reports. Does anyone know what could be happening here? I was under the impression that both the PANTHER API and AmiGO use Gene Ontology for their annotations. Is there a way to obtain just the GO terms and associated genes that the PANTHER API uses as its reference?
Do you know where I can find those?
http://geneontology.org/docs/downloads/
Yes, I actually tried this method as well, using the downloaded file instead of downloading programmatically. But I still do not get the same number of associated genes from the PANTHER API as I do from the GO annotations. An example is the term ID GO:0050804: the PANTHER API tells me there should be 529 genes associated with the term, but using the file from the link you provided I only obtain 87 genes. Are the annotations different between the two tools? Is there a way to get a similar file from PANTHER?
Ok, now I understand the original question.
I believe that the source of the discrepancy is that the download lists only the direct annotations, i.e. the most specific terms each protein was tagged with, not the intermediate nodes above them. When a protein is tagged with a GO term, it is implicitly annotated to every ancestor term reachable through an is_a relationship.
For example, if C is_a B and B is_a A, then something tagged as C is also B and A.
But some things are tagged just as B, so if we search the file only for B we will find those and miss the entry tagged as C.
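The A/B/C example above can be sketched in base R. This is a toy illustration only: the `parents` list, the `direct` table and the `with_ancestors` helper are hypothetical names made up for the sketch, not part of PANTHER, AmiGO or any package.

```r
# Toy ontology: each term maps to its direct is_a parent(s); A is the root.
parents <- list(A = character(0), B = "A", C = "B")

# Direct annotations, as they would appear in an annotation download.
direct <- data.frame(
  gene = c("gene1", "gene2"),
  term = c("C", "B"),
  stringsAsFactors = FALSE
)

# Collect a term plus all of its ancestors by walking up the parent links.
with_ancestors <- function(term, parents) {
  seen <- character(0)
  todo <- term
  while (length(todo) > 0) {
    current <- todo[1]
    todo <- todo[-1]
    if (!(current %in% seen)) {
      seen <- c(seen, current)
      todo <- c(todo, parents[[current]])
    }
  }
  seen
}

# Expand every direct annotation to its full set of implied terms.
propagated <- do.call(rbind, lapply(seq_len(nrow(direct)), function(i) {
  data.frame(
    gene = direct$gene[i],
    term = with_ancestors(direct$term[i], parents),
    stringsAsFactors = FALSE
  )
}))
propagated <- unique(propagated)

# Genes per term: direct counts versus propagated counts.
direct_counts <- table(factor(direct$term, levels = names(parents)))
propagated_counts <- table(factor(propagated$term, levels = names(parents)))
print(rbind(direct = direct_counts, propagated = propagated_counts))
```

Searching the direct annotations for B finds only gene2, while the propagated table also counts gene1, which was tagged with the more specific term C. A gap of this kind is probably what separates the count from the download and the count reported by the enrichment tool.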
Ah okay this makes more sense now, thank you!
I am having the same issue. I used the PANTHER API for my uploaded genes and found 541 genes over-represented in intracellular anatomical structure (GO:0005622).
To investigate those 541 genes further, I mapped every gene in my uploaded list to GO annotations using this API: http://pantherdb.org/services/oai/pantherdb/geneinfo.
The geneinfo service returns only 4 genes with that GO annotation (GO:0005622).
Is there a way to fix this, possibly by getting all the intermediate nodes? Any help would be greatly appreciated!
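One possible approach, sketched here under several assumptions, is to count a gene for a term if any of its direct annotations is the term itself or one of its descendants. The `gene2go` table and the `genes_under_term` helper are hypothetical; the table stands in for whatever you parse out of the geneinfo response. The descendant list is taken from the Bioconductor package GO.db (`GOCCOFFSPRING`, since GO:0005622 is a cellular component term) when it is installed, and otherwise from a hand-written stand-in so the sketch still runs. The GO release in GO.db may differ from the one PANTHER uses, so the counts may not match exactly.

```r
# Hypothetical input: one row per (gene, directly annotated GO term),
# e.g. assembled from the geneinfo response.
gene2go <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  go_id = c("GO:0005622", "GO:0005634", "GO:0005886"),
  stringsAsFactors = FALSE
)

# Genes annotated to `term` directly or through any listed descendant term.
genes_under_term <- function(gene2go, term, descendants) {
  hits <- gene2go$go_id %in% c(term, descendants)
  unique(gene2go$gene[hits])
}

term <- "GO:0005622"
if (requireNamespace("GO.db", quietly = TRUE)) {
  # All descendant terms of `term` in the cellular component ontology.
  descendants <- GO.db::GOCCOFFSPRING[[term]]
} else {
  # Stand-in list used only so the sketch runs without GO.db installed.
  descendants <- c("GO:0005634")
}

print(genes_under_term(gene2go, term, descendants))
```

With a real gene-to-GO table, comparing `length(genes_under_term(...))` against the `# Genes` value from the enrichment result would show how much of the gap is explained by descendant terms.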