(...) You should use all genes, or at least all relevant genes. In
DESeq2 that might be the genes surviving the independent filtering
(= adjusted p-value not NA) or in edgeR those that survive
filterByExpr. GSEA tests whether a gene set as a whole (rather than
individual genes, as we test in a pairwise comparison with the
mentioned tools) shows evidence of being over- or underexpressed. A
gene set can (as a whole) show evidence of overexpression even though
no individual gene needs to be significantly overexpressed in a
pairwise comparison. Pairwise DE testing and GSEA simply ask two
different types of questions. For DESeq2 I would therefore
use all genes surviving the independent filtering, e.g. ranked by
moderated and shrunken LFC after applying lfcShrink. Because GSEA
works on a ranked list, much of the information about the magnitude of
the ranking metric (here the fold changes) is lost, so GSEA informs
about global
tendencies. I think it makes sense to always pair GSEA results with
other information, like the fold changes from DESeq2. If your GSEA
result is significant but the DESeq2 fold changes for the genes of the
particular pathway you are running fgsea against turn out to be tiny
(very close to zero), then it is questionable whether the result is
biologically meaningful, even though the analysis was significant in
GSEA rank space. In general, I think combining different analysis
methods to make confident statements always makes sense, not just in
the GSEA context.
Does that make sense to you?
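To make the last point concrete, here is a minimal toy sketch in plain Python. It is not fgsea or the Broad implementation, just a simplified running-sum enrichment score with a one-sided gene-set permutation test. The gene names, fold changes, and the `enrichment_score` helper are all made up for illustration. Every pathway gene has a tiny LFC, yet the pathway sits at the top of the ranking, so the result is "significant" in rank space.

```python
import random


def enrichment_score(ranked_genes, stats, gene_set, weight=1.0):
    """Signed maximum deviation of a weighted running sum over a ranked list.

    Simplified illustration of the GSEA idea, not the fgsea algorithm.
    """
    hits = [g in gene_set for g in ranked_genes]
    hit_w = [abs(stats[g]) ** weight if h else 0.0
             for g, h in zip(ranked_genes, hits)]
    total_hit = sum(hit_w)
    n_miss = len(ranked_genes) - sum(hits)
    running = best = 0.0
    for w, h in zip(hit_w, hits):
        # Step up (weighted) on a pathway gene, step down on any other gene.
        running += w / total_hit if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical toy data: 100 background genes with LFCs between -0.05 and 0.05
stats = {f"bg{i}": -0.05 + 0.001 * i for i in range(100)}
# ... and 10 pathway genes with small positive LFCs (0.06 to 0.15).
pathway = {f"pw{i}" for i in range(10)}
stats.update({f"pw{i}": 0.06 + 0.01 * i for i in range(10)})

# Rank all genes by the statistic, largest first.
ranked = sorted(stats, key=stats.get, reverse=True)
es = enrichment_score(ranked, stats, pathway)

# Null distribution from random gene sets of the same size (fixed seed).
rng = random.Random(0)
n_perm = 1000
null = [enrichment_score(ranked, stats, set(rng.sample(ranked, len(pathway))))
        for _ in range(n_perm)]
p_value = (1 + sum(e >= es for e in null)) / (n_perm + 1)

# Pair the enrichment result with the actual effect sizes.
max_abs_lfc = max(abs(stats[g]) for g in pathway)
print(f"ES = {es:.3f}, permutation p = {p_value:.4f}, "
      f"largest |LFC| in pathway = {max_abs_lfc:.2f}")
```

The enrichment looks very convincing, but the largest fold change in the pathway is still small, which is exactly why I would look at both numbers before calling it biologically meaningful.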
You should read up on GSEA; it sounds like you don't have a good grasp of what the process involves, which could lead you to misinterpret the results. The original paper gives a good overview of the theory, and this page gives some good tips on providing a rank statistic.
I am really sorry for asking for clarification; I am a beginner.
I already used all genes (normalized and filtered) for GSEA with the GSEA program, version 4.1.0, but before LFC shrinkage. Is that correct?
Do you mean that I should rank the genes in the picture (normalized and filtered) according to shrunken LFC and then use them as input for GSEA?
No problem. I am not really familiar with what the GSEA implementation from the Broad Institute does. I personally use fgsea from Bioconductor, and for this I rank the genes by shrunken logFC. It is up to you what you use for ranking. I do not know how the Broad GSEA works or what kind of input it expects.