Information content of gene ontology
Hi all,
I am struggling to calculate the information content (IC) of individual GO terms.
As far as I know, most R packages, such as GOSemSim, deal with the semantic similarity between GO terms,
but I haven't found any tool that calculates the IC of an individual GO term.
Does anyone know of a tool that does this directly?
Thanks
gene ontology
information content
A quick look at the code of GOSemSim::godata() shows that the IC slot of the object returned by this function contains the IC for all GO terms:
> GOSemSim:::computeIC
function (goAnno, ont)
{
    if (!exists(".GOSemSimEnv"))
        .initial()
    .GOSemSimEnv <- get(".GOSemSimEnv", envir = .GlobalEnv)
    godata <- get("gotbl", envir = .GOSemSimEnv)
    goids <- unique(godata[godata$Ontology == ont, "go_id"])
    goterms = goAnno$GO
    gocount <- table(goterms)
    goname <- names(gocount)
    go.diff <- setdiff(goids, goname)
    m <- double(length(go.diff))
    names(m) <- go.diff
    gocount <- as.vector(gocount)
    names(gocount) <- goname
    gocount <- c(gocount, m)
    Offsprings <- switch(ont, MF = AnnotationDbi::as.list(GOMFOFFSPRING),
        BP = AnnotationDbi::as.list(GOBPOFFSPRING), CC = AnnotationDbi::as.list(GOCCOFFSPRING))
    cnt <- gocount[goids] + sapply(goids, function(i) sum(gocount[Offsprings[[i]]],
        na.rm = TRUE))
    names(cnt) <- goids
    p <- cnt/sum(gocount)
    IC <- -log(p)
    return(IC)
}
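E.g.:
# db: an OrgDb annotation package (e.g. "org.Hs.eg.db"); ont: "BP", "MF" or "CC"
semData <- godata(db, ont = ont)
semData@IC  # named vector of IC values, one per GO ID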
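To make the logic of computeIC easier to follow, here is a minimal base-R sketch of the same calculation on a toy ontology. The term names (A to D), the annotation counts and the offspring lists are all made up for illustration; only the steps (direct count plus offspring counts, divided by the total, then -log) mirror the function printed above.

```r
# Hypothetical direct annotation counts for four toy terms (not real GO IDs).
direct <- c(A = 1, B = 2, C = 3, D = 4)

# Hypothetical offspring lists: A is the root, D is a child of B.
offspring <- list(
  A = c("B", "C", "D"),
  B = "D",
  C = character(0),
  D = character(0)
)

# Count for a term = its own annotations + annotations of all its offspring.
cnt <- sapply(names(direct), function(term) {
  direct[[term]] + sum(direct[offspring[[term]]])
})

# Probability relative to all annotations, then IC as the negative natural log.
p <- cnt / sum(direct)
ic <- -log(p)
print(ic)
```

In this toy case the root A covers every annotation, so its probability is 1 and its IC is 0, while more specific terms get larger values.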
I think I found the answer, which uses the annotation package GO.db from Bioconductor.
This is the example code I used (sorry that it is a bit messy):
library(GO.db)
library(dplyr)  # provides filter() and %>%

# Number of offspring terms per GO term, for each ontology
P <- toTable(GOBPOFFSPRING)
names(P) <- c("child","parent")
P_count <- as.data.frame(table(P$parent))
P_count$type <- "BP"
C <- toTable(GOCCOFFSPRING)
names(C) <- c("child","parent")
C_count <- as.data.frame(table(C$parent))
C_count$type <- "CC"
M <- toTable(GOMFOFFSPRING)
names(M) <- c("child","parent")
M_count <- as.data.frame(table(M$parent))
M_count$type <- "MF"
ref <- rbind(M_count,C_count,P_count)

# IC of a term: -log2(number of offspring / number of terms in its ontology)
IC <- function(id,onto){
    cnt <- filter(ref, Var1 == id) %>% dplyr::select(Freq) %>% as.numeric()
    # AnnotationDbi::select is spelled out because dplyr masks select()
    df <- AnnotationDbi::select(GO.db, keys(GO.db, "GOID"), columns = c("ONTOLOGY"))
    df_02 <- as.data.frame(table(df$ONTOLOGY)) %>% filter(.,Var1 == onto)
    all_onto <- as.numeric(df_02[,2])
    prob <- cnt/all_onto
    IC <- -log2(prob)
    return(IC)
}
# onto must be the ontology the term belongs to; GO:0001895 is a BP term
IC("GO:0001895","BP")
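For comparison, here is a small base-R sketch of this offspring-count idea on a toy table, so it runs without GO.db. The term names (T1 to T4) and the table are hypothetical. One deliberate difference from the code above, which is my own assumption: each term is counted together with its offspring (the "+ 1"), so that leaf terms, which never appear as a parent, still get a finite IC and the root gets an IC of 0.

```r
# Hypothetical offspring table: one row per (offspring, ancestor) pair.
pairs <- data.frame(
  child  = c("T2", "T3", "T4", "T4"),
  parent = c("T1", "T1", "T1", "T2"),
  stringsAsFactors = FALSE
)
all_terms <- c("T1", "T2", "T3", "T4")

# Offspring count per term; terms without offspring get 0 via the factor levels.
n_offspring <- table(factor(pairs$parent, levels = all_terms))

# Assumption: count the term itself as well, so no probability is zero.
prob <- (as.vector(n_offspring) + 1) / length(all_terms)
names(prob) <- all_terms

# Structure-based IC in bits.
ic <- -log2(prob)
print(ic)
```

Note that this kind of IC depends only on the shape of the ontology, whereas the GOSemSim value depends on how often each term is used in the annotation of a given organism, so the two are not expected to agree.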