How should I deal with data in which some genes have a log2 fold change of 0? Is it a good idea to prefilter these genes before ranking, since they are essentially not modulated by the treatment?
Edit 31st July, 2019: I gave my original answer (below) assuming that you were referring to the general process of gene enrichment (or 'gene-set enrichment analysis'), and not to GSEA, the Broad Institute's PROGRAM that hijacked the term GSEA.
GSEA (the Broad Institute program) accepts a ranked list of genes, as do topGO (R), fgsea (see my comment below), and other enrichment programs - there are too many programs.
---------------------------------------------
It makes sense that there is no consensus, as there are countless ways to do this. My own recommendation would be to:
Set an adjusted P value cut-off
Rank genes based on absolute log (base 2) fold change
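The two steps above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the gene names, the values, the 0.05 cut-off, and the helper name `rank_genes` are all made up for the example, and in practice the input would come from your differential expression results table.

```python
import math

def rank_genes(results, padj_cutoff=0.05):
    """Keep genes passing the adjusted P value cut-off, then rank by |log2FC|.

    `results` is a list of dicts with keys 'gene', 'log2fc' and 'padj'
    (hypothetical field names chosen for this sketch).
    """
    kept = [
        r for r in results
        # Drop genes with a missing adjusted P value, then apply the cut-off.
        if r["padj"] is not None
        and not math.isnan(r["padj"])
        and r["padj"] <= padj_cutoff
    ]
    # Largest absolute log2 fold change first, regardless of direction.
    return sorted(kept, key=lambda r: abs(r["log2fc"]), reverse=True)

# Made-up example values, not real data.
results = [
    {"gene": "GENE_A", "log2fc": 2.1, "padj": 0.001},
    {"gene": "GENE_B", "log2fc": -3.4, "padj": 0.020},
    {"gene": "GENE_C", "log2fc": 0.0, "padj": 0.900},
    {"gene": "GENE_D", "log2fc": 0.8, "padj": None},
    {"gene": "GENE_E", "log2fc": -0.5, "padj": 0.040},
]

ranked = rank_genes(results)
for r in ranked:
    print(r["gene"], r["log2fc"], r["padj"])
```

In this toy input, the gene with a log2 fold change of 0 is removed by the adjusted P value cut-off and never reaches the ranking step.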
I believe the most widely used method is to just set cut-offs for adjusted P value and log (base 2) fold change, and to then 'throw' the resulting gene list into GSEA without any ranking.
The lack of consensus on a proper filtering strategy may in part be due to the fact that a substantial proportion of researchers do not pay much attention to the results of GSEA. GSEA results would certainly never stand as the sole evidence in a clinical test, nor would they be sufficient evidence on which to base conclusions in most reputable journals.
I have a query regarding the analysis of GSEA results. I have used GSEA to obtain the dysregulated KEGG pathways. Now, I want to rank these pathways.
So, is it logical to use NES * (-log10 Nominal p-value) or NES * (-log10 FDR q-value) for ranking the KEGG pathways?
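To make the proposed score concrete, here is a minimal sketch of NES * (-log10 FDR q-value) in plain Python. It only shows how the number would be computed, not whether it is a statistically sound way to rank pathways. The pathway names, values, the helper name `pathway_score`, and the floor of 1e-10 are assumptions for illustration; the floor is there because a reported q-value of 0 would otherwise make -log10 undefined.

```python
import math

def pathway_score(nes, q_value, floor=1e-10):
    """Combined score NES * (-log10 q), with q floored to avoid log10(0).

    The floor value is an arbitrary choice made for this sketch.
    """
    q = max(q_value, floor)
    return nes * (-math.log10(q))

# Made-up example values, not real GSEA output.
pathways = [
    {"pathway": "PATHWAY_1", "nes": 2.0, "fdr_q": 0.01},
    {"pathway": "PATHWAY_2", "nes": -1.8, "fdr_q": 0.001},
    {"pathway": "PATHWAY_3", "nes": 1.5, "fdr_q": 0.2},
]

# Rank by the magnitude of the score; the sign still records the direction
# (positive NES = enriched at the top of the list, negative = at the bottom).
ranked_pathways = sorted(
    pathways,
    key=lambda p: abs(pathway_score(p["nes"], p["fdr_q"])),
    reverse=True,
)

for p in ranked_pathways:
    print(p["pathway"], round(pathway_score(p["nes"], p["fdr_q"]), 3))
```

Swapping `fdr_q` for the nominal p-value gives the other variant mentioned above; the nominal p-value is not corrected for the number of gene sets tested, so the two variants can order pathways differently.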