I am trying to run a GO enrichment analysis with topGO on a rice dataset, following Reference 1 and Reference 2. According to the manual and these references, it is a good idea to provide p-values for ranking the genes.
However, I am working with a list of DEGs that were identified in common across different conditions (not from one specific comparison). In that case, how can I rank the genes by p-value or FDR? I am not sure how to input the DEGs and build a ranking vector.
The Fisher test and the KS test produce different GO enrichment results. Which one should I select?
I use the following code:
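library(topGO)

# Connect to the Ensembl Plants BioMart and fetch GO annotations for rice
mart <- biomaRt::useMart(biomart = "plants_mart",
                         dataset = "osativa_eg_gene",
                         host = "plants.ensembl.org")
get_go <- biomaRt::getBM(attributes = c("ensembl_gene_id", "go_id"),
                         mart = mart)
# Drop genes without a GO annotation
get_go <- get_go[get_go$go_id != "", ]
# Gene-to-GO mapping: one character vector of GO IDs per gene
geneID2GO <- by(get_go$go_id,
                get_go$ensembl_gene_id,
                function(x) as.character(x))
# Gene universe: every annotated gene
all.genes <- sort(unique(as.character(get_go$ensembl_gene_id)))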
#Input list
?????
go.obj <- new("topGOdata", ontology='BP'
, allGenes = int.genes
, annot = annFUN.gene2GO
, gene2GO = geneID2GO
, nodeSize = 10
)
#Fisher test
results <- runTest(go.obj, algorithm = "elim", statistic = "fisher")
results.tab <- GenTable(object = go.obj, elimFisher = results)
#Kolmogorov-Smirnov (K-S) test
results.ks <- runTest(go.obj, algorithm="classic", statistic="ks")
goEnrichment <- GenTable(go.obj, KS=results.ks, orderBy="KS", topNodes=20)
# GenTable returns p-values as character strings (possibly "< 1e-30"), so convert before filtering
goEnrichment$KS <- as.numeric(sub("^< *", "", goEnrichment$KS))
goEnrichment <- goEnrichment[goEnrichment$KS < 0.05, ]
goEnrichment <- goEnrichment[, c("GO.ID", "Term", "KS")]
# Strip the trailing "..." left on truncated term names
goEnrichment$Term <- gsub(" [a-z]*\\.\\.\\.$", "", goEnrichment$Term)
goEnrichment$Term <- gsub("\\.\\.\\.$", "", goEnrichment$Term)
goEnrichment$Term <- paste(goEnrichment$GO.ID, goEnrichment$Term, sep = ", ")
goEnrichment$Term <- factor(goEnrichment$Term, levels = rev(goEnrichment$Term))
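For the open "Input list" step, one possible sketch, assuming the DEG list has no per-gene p-value to rank by: encode the gene universe as a named two-level factor (1 = DEG, 0 = background) and pass that as allGenes. The gene IDs and the name deg.ids below are hypothetical placeholders, not values from my data. As far as I understand, a factor like this only supports count-based tests such as Fisher; the KS test needs a numeric score per gene.

```r
# Hypothetical toy data standing in for the real objects in the question.
all.genes <- c("Os01g0100100", "Os01g0100200", "Os01g0100300",
               "Os01g0100400", "Os01g0100500")   # gene universe (placeholder IDs)
deg.ids   <- c("Os01g0100200", "Os01g0100500")   # DEGs shared across conditions (placeholder)

# Named two-level factor: 1 = gene of interest, 0 = background gene.
int.genes <- factor(as.integer(all.genes %in% deg.ids), levels = c(0, 1))
names(int.genes) <- all.genes

# Quick check of how many genes fall in each class.
table(int.genes)
```

With the real all.genes from the BioMart query and the real DEG IDs in place of the toy vectors, int.genes has the shape that the allGenes argument of the topGOdata constructor accepts for a predefined list of interesting genes.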
Thanks for the explanation and reference @antonioggsousa.