Dear all,
I have been analysing RNA-seq data and want to run gene set enrichment analysis with the clusterProfiler package. I used DESeq2 to identify differentially expressed genes, setting lfcThreshold = 1 when calling results(). I then created a vector of log2 fold changes named by Entrez IDs. I thought that the gseGO function and the GSEA function from clusterProfiler do the same thing. Am I wrong? I ran gseGO on my sorted log2 fold change vector, and then ran GSEA on the same vector with TERM2GENE set to a gene set collection downloaded from the Broad Institute's website (MSigDB c5: GO).
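For context, a vector like the one described above can be built roughly as follows. This is only a minimal sketch in base R: the toy table `res` and its column names (`entrez`, `log2FoldChange`) are assumptions standing in for a DESeq2 results table with Entrez IDs already attached, and keeping the largest absolute fold change per duplicated ID is just one possible choice. Both gseGO and GSEA expect a named numeric vector sorted in decreasing order.

```r
# Toy stand-in for a DESeq2 results table; values and column names are made up.
res <- data.frame(
  entrez = c("7157", "1956", "672", "1956", NA),
  log2FoldChange = c(-1.8, 2.4, 0.3, 2.1, 1.0),
  stringsAsFactors = FALSE
)

# Drop rows without an Entrez ID or without a fold change.
res <- res[!is.na(res$entrez) & !is.na(res$log2FoldChange), ]

# For duplicated IDs keep the row with the largest absolute fold change
# (an assumption; other rules, such as averaging, are also used).
res <- res[order(abs(res$log2FoldChange), decreasing = TRUE), ]
res <- res[!duplicated(res$entrez), ]

# Named numeric vector, sorted in decreasing order.
foldchanges <- res$log2FoldChange
names(foldchanges) <- res$entrez
foldchanges <- sort(foldchanges, decreasing = TRUE)

print(foldchanges)
```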
Basically this is what I did:
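library(clusterProfiler)
library(org.Hs.eg.db)

# GSEA against the GO annotation in org.Hs.eg.db
gseaGO1 <- gseGO(geneList     = foldchanges,
                 OrgDb        = org.Hs.eg.db,
                 ont          = 'All',
                 nPerm        = 1000,
                 minGSSize    = 10,
                 pvalueCutoff = 0.05,
                 verbose      = FALSE)

# GSEA against the c5 (GO) gene sets read from the .gmt file
c5 <- read.gmt("c5.all.v7.0.entrez.gmt")
gseaGO2 <- GSEA(foldchanges,
                TERM2GENE    = c5,
                minGSSize    = 10,
                nPerm        = 1000,
                pvalueCutoff = 0.05,
                verbose      = FALSE)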
The results are very similar but not identical. Some GO sets appear in the results of both gseaGO1 and gseaGO2, and as far as I can see they have the same enrichment score but different NES, p-value and adjusted p-value (the differences are VERY small, though).
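To make the comparison concrete, here is a rough sketch of how two result tables can be put side by side. The numbers are made up purely for illustration and are not my real output, and `term` is a hypothetical shared key: in practice the tables would come from as.data.frame(gseaGO1) and as.data.frame(gseaGO2), and the set identifiers from the two sources would first have to be matched to each other.

```r
# Toy tables standing in for as.data.frame(gseaGO1) and as.data.frame(gseaGO2).
# All values are invented; "term" is a hypothetical key shared by both tables.
res1 <- data.frame(term = c("A", "B", "C"),
                   enrichmentScore = c(0.61, -0.48, 0.35),
                   NES = c(2.10, -1.75, 1.32),
                   pvalue = c(0.001, 0.004, 0.030))
res2 <- data.frame(term = c("B", "A", "D"),
                   enrichmentScore = c(-0.48, 0.61, 0.40),
                   NES = c(-1.78, 2.07, 1.41),
                   pvalue = c(0.005, 0.001, 0.020))

# Keep only the sets reported by both runs and compute per-set differences.
both <- merge(res1, res2, by = "term", suffixes = c(".gseGO", ".GSEA"))
both$ES_diff  <- both$enrichmentScore.gseGO - both$enrichmentScore.GSEA
both$NES_diff <- both$NES.gseGO - both$NES.GSEA
both$p_diff   <- both$pvalue.gseGO - both$pvalue.GSEA

print(both[, c("term", "ES_diff", "NES_diff", "p_diff")])
```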
So my questions are: are the gseGO and GSEA functions from the clusterProfiler package the same (in a mathematical sense)? Additionally, I supplied c5.all.v7.0.entrez.gmt as the gene set database for the GSEA function, but which gene set database does gseGO use?
I am new to this kind of analysis, and although I have read a lot about it, this point still isn't clear to me. Thank you very much for your time and help.
Thank you very much MatthewP. It is clear to me now.