Question

GO analysis using topGO

6

Entering edit mode

6.7 years ago

mannoulag1 ▴ 130

Hi, I would perform a GO enrichment analysis for a set of genes, for exemple if I have 3 or 4 genes how to get the GO terms using topGO? in this link how to put my genes in 'allgenes' of the object topGOdata? Thank you

topGO GO GOenrichment gene R • 25k views

ADD COMMENT • link updated 6.7 years ago by Kevin Blighe 89k • written 6.7 years ago by mannoulag1 ▴ 130

score 20 · Accepted Answer · 2018-11-21

NB - another answer here: issues regarding topGO usage

-----------------------------------

The input to topGO is a named list of genes and P-values, like this:

head(genes)
 ANXA2 DHCR24   GBE1    GLA  PRDX1 PSMD14 
     0      0      0      0      0      0 
tail(genes)
   CTTNBP2NL      METTL7B       AKR1C3      SLC5A11        TIMP4      PPP2R2C 
7.395843e-07 1.496372e-06 4.328879e-05 1.612348e-04 2.620590e-04 9.728924e-04 

range(genes)
[1] 0.0000000000 0.0009728924

The p-values are used to rank the genes, which is important when using the Kolmogorov-Smirnov test.

You can then run topGO like this (here with GO BP as database against which enrichment is performed).

require(topGO)
require(org.Hs.eg.db)

selection <- function(allScore){ return(allScore < 0.05)} # function that returns TRUE/FALSE for p-values<0.05
allGO2genes <- annFUN.org(whichOnto="BP", feasibleGenes=NULL, mapping="org.Hs.eg.db", ID="symbol")
GOdata <- new("topGOdata",
  ontology="BP",
  allGenes=genes,
  annot=annFUN.GO2genes,
  GO2genes=allGO2genes,
  geneSel=selection,
  nodeSize=10)

In order to make use of the rank information, use Kolmogorov-Smirnov (K-S) test:

results.ks <- runTest(GOdata, algorithm="classic", statistic="ks")
goEnrichment <- GenTable(GOdata, KS=results.ks, orderBy="KS", topNodes=20)
goEnrichment$KS <- as.numeric(goEnrichment$KS)
goEnrichment <- goEnrichment[goEnrichment$KS<0.05,]
goEnrichment <- goEnrichment[,c("GO.ID","Term","KS")]
goEnrichment$Term <- gsub(" [a-z]*\\.\\.\\.$", "", goEnrichment$Term)
goEnrichment$Term <- gsub("\\.\\.\\.$", "", goEnrichment$Term)
goEnrichment$Term <- paste(goEnrichment$GO.ID, goEnrichment$Term, sep=", ")
goEnrichment$Term <- factor(goEnrichment$Term, levels=rev(goEnrichment$Term))

Plot the results (the enrichment score is just negative log (base 10) of the enrichment P-value):

require(ggplot2)
ggplot(goEnrichment, aes(x=Term, y=-log10(KS))) +
    stat_summary(geom = "bar", fun.y = mean, position = "dodge") +
    xlab("Biological process") +
    ylab("Enrichment") +
    ggtitle("Title") +
    scale_y_continuous(breaks = round(seq(0, max(-log10(goEnrichment$KS)), by = 2), 1)) +
    theme_bw(base_size=24) +
    theme(
        legend.position='none',
        legend.background=element_rect(),
        plot.title=element_text(angle=0, size=24, face="bold", vjust=1),
        axis.text.x=element_text(angle=0, size=18, face="bold", hjust=1.10),
        axis.text.y=element_text(angle=0, size=18, face="bold", vjust=0.5),
        axis.title=element_text(size=24, face="bold"),
        legend.key=element_blank(),     #removes the border
        legend.key.size=unit(1, "cm"),      #Sets overall area/size of the legend
        legend.text=element_text(size=18),  #Text size
        title=element_text(size=18)) +
    guides(colour=guide_legend(override.aes=list(size=2.5))) +
    coord_flip()