Greetings,
At which part are you having difficulty with the function, exactly?
The function can be used in different ways but the input gene list should be a name-value pairing. The values can be a rank (#1, highest) or a list of p-values, e.g., from differential expression. In this example, I supply a list of ranked genes:
1, obtain input genes
rankedGenes <- c(1:10)
names(rankedGenes) <- c("BRCA1","BRCA2","TP53", "ATM", "ERCC1", "BRCC3", "TP63", "ERCC2", "ERCC3", "ERCC4")
rankedGenes
BRCA1 BRCA2 TP53 ATM ERCC1 BRCC3 TP63 ERCC2 ERCC3 ERCC4
1 2 3 4 5 6 7 8 9 10
So, in my genes, BRCA1 is ranked highest... ERCC4 is lowest.
Edit April 25, 2019:
- One can also supply name-value pairings with p-values, in which case
the lowest p-value is ranked highest.
- You can also just supply a vector where everything has the same
value, in which case everything has equal rank
2, set-up the enrichment object based on your genes
require(topGO)
require(org.Hs.eg.db)
selection <- function(x) TRUE
allGO2genes <- annFUN.org(whichOnto="BP", feasibleGenes=NULL, mapping="org.Hs.eg.db", ID="symbol")
GOdata <- new("topGOdata", ontology="BP", allGenes=rankedGenes,
annot=annFUN.GO2genes, GO2genes=allGO2genes,
geneSel=selection, nodeSize=5)
Depending on the format of your input genes, you can change the value of ID
, which is passed to annFUN.org()
. This is to allow for gene mapping to GO terms to take place.
Also note that whichOnto
and ontology
can be BP, MF, or CC, relating to the GO categories.
It's up to you to ensure that the gene list that you're providing is the correct list of genes. The geneSel
parameter of new()
(above) can be used to filter your list in real time, e.g., filtering on P<0.01 if supplying genes and their respective p-values; otherwise, you can just set this to geneSel=TRUE
and use everything that's supplied (as I have).
3, perform gene enrichment. Here, we make use of the rank information by using the Kolmogorov-Smirnov (K-S) test on the enriched terms
results.ks <- runTest(GOdata, algorithm="classic", statistic="ks")
goEnrichment <- GenTable(GOdata, KS=results.ks, orderBy="KS", topNodes=10)
goEnrichment$KS <- as.numeric(goEnrichment$KS)
goEnrichment <- goEnrichment[goEnrichment$KS<0.05,]
goEnrichment <- goEnrichment[,c("GO.ID","Term","KS")]
goEnrichment
goEnrichment
GO.ID Term KS
1 GO:0010212 response to ionizing radiation 0.015
2 GO:0007154 cell communication 0.041
3 GO:0007165 signal transduction 0.041
4 GO:0008630 intrinsic apoptotic signaling pathway in... 0.041
5 GO:0016569 covalent chromatin modification 0.041
6 GO:0016570 histone modification 0.041
7 GO:0023052 signaling 0.041
8 GO:0030330 DNA damage response, signal transduction... 0.041
9 GO:0035556 intracellular signal transduction 0.041
10 GO:0042127 regulation of cell proliferation 0.041
See a related thread where I also produce an enrichment plot using ggplot2: A: GO analysis using topGO
Kevin
See also my related answers: