Information content of gene ontology
4.7 years ago
Yean ▴ 140

Hi all

I am struggling to calculate the information content (IC) of individual GO terms.

As far as I know, most R packages, such as GOSemSim, deal with the semantic similarity between GO terms,

but I have not yet found a tool that calculates the IC of individual GO terms.

Does anyone know of a tool that does this directly?

thanks

gene ontology information content • 1.4k views
4.4 years ago
Zuguang Gu ▴ 220

A quick look at the code of GOSemSim::godata() shows that the IC slot of the object it returns contains the IC for all GO terms in the chosen ontology. The values are computed by the internal function computeIC():

> GOSemSim:::computeIC
function (goAnno, ont) 
{
    if (!exists(".GOSemSimEnv")) 
        .initial()
    .GOSemSimEnv <- get(".GOSemSimEnv", envir = .GlobalEnv)
    godata <- get("gotbl", envir = .GOSemSimEnv)
    goids <- unique(godata[godata$Ontology == ont, "go_id"])
    goterms = goAnno$GO
    gocount <- table(goterms)
    goname <- names(gocount)
    go.diff <- setdiff(goids, goname)
    m <- double(length(go.diff))
    names(m) <- go.diff
    gocount <- as.vector(gocount)
    names(gocount) <- goname
    gocount <- c(gocount, m)
    Offsprings <- switch(ont, MF = AnnotationDbi::as.list(GOMFOFFSPRING), 
        BP = AnnotationDbi::as.list(GOBPOFFSPRING), CC = AnnotationDbi::as.list(GOCCOFFSPRING))
    cnt <- gocount[goids] + sapply(goids, function(i) sum(gocount[Offsprings[[i]]], 
        na.rm = TRUE))
    names(cnt) <- goids
    p <- cnt/sum(gocount)
    IC <- -log(p)
    return(IC)
}

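E.g. (the first argument is an OrgDb annotation package and ont is one of "BP", "MF" or "CC"; org.Hs.eg.db is used here only as an example):

library(GOSemSim)
semData <- godata("org.Hs.eg.db", ont = "BP")
head(semData@IC)  # named vector: GO ID -> IC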
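
To make the arithmetic in computeIC() concrete, here is a minimal base-R sketch on a toy ontology. The term names (`root`, `A`, `B`, `A1`), the annotation vector and the `offspring` list are all made up for illustration; only the calculation mirrors the function shown above.

```r
# Toy annotations: one entry per (gene, term) pair; all names are hypothetical.
goterms <- c("A1", "A1", "A", "B", "B", "B", "root", "A1")

# Toy offspring lists (all descendants of each term), standing in for GOBPOFFSPRING.
offspring <- list(
  root = c("A", "B", "A1"),
  A    = "A1",
  B    = character(0),
  A1   = character(0)
)
goids <- names(offspring)

# Direct annotation counts per term (0 for terms that are never annotated).
gocount <- table(factor(goterms, levels = goids))
gocount <- setNames(as.vector(gocount), goids)

# A term's count is its own annotations plus those of all its offspring.
cnt <- sapply(goids, function(i) gocount[[i]] + sum(gocount[offspring[[i]]]))

# Probability and information content (natural log, as in computeIC).
p  <- cnt / sum(gocount)
ic <- -log(p)
```

In this toy example the root gets an IC of 0, because every annotation falls under it, and the terms with the fewest annotations get the largest values.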
4.7 years ago
Yean ▴ 140

I think I found the answer, which is to use the annotation package GO.db from Bioconductor.

This is the example code I used (sorry that it is a bit messy):

#if (!requireNamespace("BiocManager", quietly = TRUE))
#  install.packages("BiocManager")
#BiocManager::install("GO.db")

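library(dplyr)  # provides filter() and %>%; load before GO.db so that select() stays AnnotationDbi's
library(GO.db)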

P <- toTable(GOBPOFFSPRING)
names(P) <- c("child","parent")
P_count <- as.data.frame(table(P$parent))
P_count$type <- "BP"

C <- toTable(GOCCOFFSPRING)
names(C) <- c("child","parent")
C_count <- as.data.frame(table(C$parent))
C_count$type <- "CC"

M <- toTable(GOMFOFFSPRING) 
names(M) <- c("child","parent")
M_count <- as.data.frame(table(M$parent))
M_count$type <- "MF"

ref <- rbind(M_count,C_count,P_count)
IC <- function(id, onto){
  # number of offspring terms of `id` (terms without offspring are absent from ref)
  cnt <- ref$Freq[ref$Var1 == id]
  # total number of GO terms in the selected ontology
  df <- AnnotationDbi::select(GO.db, keys(GO.db, "GOID"), columns = "ONTOLOGY")
  all_onto <- sum(df$ONTOLOGY == onto, na.rm = TRUE)
  # structure-based IC: -log2(offspring count / number of terms in the ontology)
  prob <- cnt/all_onto
  -log2(prob)
}

IC("GO:0001895","BP")  # GO:0001895 (retina homeostasis) is a BP term, so use the BP total
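
Note that the calculation above is purely structural: it divides the number of offspring terms by the number of terms in the ontology and does not use gene annotations. A minimal base-R sketch of the same idea on a made-up ontology follows. The `offspring` list, the term names and the function name `structural_ic` are hypothetical, and counting the term itself (the `+ 1`) is an assumption added here, not part of the code above, so that leaf terms get a finite value.

```r
# Toy offspring lists (all descendants of each term); names are hypothetical.
offspring <- list(
  root = c("A", "B", "A1", "A2"),
  A    = c("A1", "A2"),
  B    = character(0),
  A1   = character(0),
  A2   = character(0)
)
n_terms <- length(offspring)  # number of terms in the toy ontology

structural_ic <- function(id) {
  # offspring count plus the term itself (assumption: avoids log(0) for leaves)
  n_sub <- length(offspring[[id]]) + 1
  -log2(n_sub / n_terms)
}

ics <- sapply(names(offspring), structural_ic)
```

With this convention the root has an IC of 0 and leaf terms share the maximum value, `log2(n_terms)`.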