Hi, I would perform a GO enrichment analysis for a set of genes, for exemple if I have 3 or 4 genes how to get the GO terms using topGO? in this link how to put my genes in 'allgenes' of the object topGOdata? Thank you
Hi, I would perform a GO enrichment analysis for a set of genes, for exemple if I have 3 or 4 genes how to get the GO terms using topGO? in this link how to put my genes in 'allgenes' of the object topGOdata? Thank you
NB - another answer here: issues regarding topGO usage
The input to topGO is a named list of genes and P-values, like this:
head(genes)
ANXA2 DHCR24 GBE1 GLA PRDX1 PSMD14
0 0 0 0 0 0
tail(genes)
CTTNBP2NL METTL7B AKR1C3 SLC5A11 TIMP4 PPP2R2C
7.395843e-07 1.496372e-06 4.328879e-05 1.612348e-04 2.620590e-04 9.728924e-04
range(genes)
[1] 0.0000000000 0.0009728924
The p-values are used to rank the genes, which is important when using the Kolmogorov-Smirnov test.
You can then run topGO like this (here with GO BP as database against which enrichment is performed).
require(topGO)
require(org.Hs.eg.db)
selection <- function(allScore){ return(allScore < 0.05)} # function that returns TRUE/FALSE for p-values<0.05
allGO2genes <- annFUN.org(whichOnto="BP", feasibleGenes=NULL, mapping="org.Hs.eg.db", ID="symbol")
GOdata <- new("topGOdata",
ontology="BP",
allGenes=genes,
annot=annFUN.GO2genes,
GO2genes=allGO2genes,
geneSel=selection,
nodeSize=10)
In order to make use of the rank information, use Kolmogorov-Smirnov (K-S) test:
results.ks <- runTest(GOdata, algorithm="classic", statistic="ks")
goEnrichment <- GenTable(GOdata, KS=results.ks, orderBy="KS", topNodes=20)
goEnrichment$KS <- as.numeric(goEnrichment$KS)
goEnrichment <- goEnrichment[goEnrichment$KS<0.05,]
goEnrichment <- goEnrichment[,c("GO.ID","Term","KS")]
goEnrichment$Term <- gsub(" [a-z]*\\.\\.\\.$", "", goEnrichment$Term)
goEnrichment$Term <- gsub("\\.\\.\\.$", "", goEnrichment$Term)
goEnrichment$Term <- paste(goEnrichment$GO.ID, goEnrichment$Term, sep=", ")
goEnrichment$Term <- factor(goEnrichment$Term, levels=rev(goEnrichment$Term))
Plot the results (the enrichment score is just negative log (base 10) of the enrichment P-value):
require(ggplot2)
ggplot(goEnrichment, aes(x=Term, y=-log10(KS))) +
stat_summary(geom = "bar", fun.y = mean, position = "dodge") +
xlab("Biological process") +
ylab("Enrichment") +
ggtitle("Title") +
scale_y_continuous(breaks = round(seq(0, max(-log10(goEnrichment$KS)), by = 2), 1)) +
theme_bw(base_size=24) +
theme(
legend.position='none',
legend.background=element_rect(),
plot.title=element_text(angle=0, size=24, face="bold", vjust=1),
axis.text.x=element_text(angle=0, size=18, face="bold", hjust=1.10),
axis.text.y=element_text(angle=0, size=18, face="bold", vjust=0.5),
axis.title=element_text(size=24, face="bold"),
legend.key=element_blank(), #removes the border
legend.key.size=unit(1, "cm"), #Sets overall area/size of the legend
legend.text=element_text(size=18), #Text size
title=element_text(size=18)) +
guides(colour=guide_legend(override.aes=list(size=2.5))) +
coord_flip()
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Hi kevin, thank you very much, you are always so helpful. It worked and I have the output of 'GenTable'. But my genes of interest have been selected as DEG. So Why we use KS<0.05 ? in my case I have all KS>0.05.
KS is the GO enrichment p-value threshold from the Kolmogorov-Smirnov (K-S) test. It is not the p-value relating to the DEG that you performed.
should come before filtering KS!
Thank you - I have made the change
See also my related answers: