Dear Biostars,
I want to prepare the rank file for GSEA analysis based on RNA-seq results that were generated by DESeq2. I have found different recommendations on how to create the pre-ranked gene list. The GSEA site mentions that the gene list can be sorted by any value; however, other people have pointed out in this blog that the direction of fold change is important for GSEA analysis. Now, if the genes are sorted only by their log fold change, a gene with a large fold change but a poor p-value will be ranked higher than a gene with a smaller but statistically significant fold change.
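To make the concern concrete, here is a minimal Python sketch with made-up numbers. The gene names, fold changes and p-values are hypothetical and not taken from my data. Sorting by log2 fold change alone puts the noisy gene first:

```python
# Hypothetical DESeq2-style results: (gene, log2FoldChange, pvalue).
results = [
    ("GENE_A", 4.0, 0.40),   # large fold change, poor p-value
    ("GENE_B", 1.2, 1e-8),   # smaller fold change, highly significant
    ("GENE_C", -0.1, 0.90),  # essentially unchanged
]

# Rank by log2 fold change only, largest first.
by_lfc = sorted(results, key=lambda row: row[1], reverse=True)
ranked_genes = [gene for gene, _, _ in by_lfc]
print(ranked_genes)
```

GENE_A ends up above GENE_B even though its change is not statistically supported, which is exactly the behaviour I am worried about.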
I've also read Mark Ziemann's post about his approach to this issue, where he generates a new scoring metric by multiplying the sign of the fold change by the inverse of its p-value: http://genomespot.blogspot.com.au/2014/09/data-analysis-step-8-pathway-analysis.html
He also adds that "at the top of the list are the genes with 'strongest' up-regulation and the bottom of the list are genes with 'strongest' down-regulation and genes not changing are in the middle". Is this the right assumption for a GSEA input file?
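As I understand it, the metric described above (sign of the fold change times 1/p-value) could be sketched in Python roughly as follows. This is only my reading of the approach, not code from the blog post: the function name `signed_inverse_p`, the `min_p` floor (added to avoid dividing by a p-value of exactly 0) and the example genes are all my own assumptions.

```python
import math

def signed_inverse_p(log2fc, pvalue, min_p=1e-300):
    """Sign of the fold change times 1/p-value.

    min_p is an assumed floor so that a reported p-value of 0
    does not cause a division by zero.
    """
    p = max(pvalue, min_p)
    sign = math.copysign(1.0, log2fc) if log2fc != 0 else 0.0
    return sign / p

# Hypothetical DESeq2-style results: (gene, log2FoldChange, pvalue).
results = [
    ("GENE_UP", 1.2, 1e-8),      # significant up-regulation
    ("GENE_NOISY", 4.0, 0.40),   # large but unreliable fold change
    ("GENE_FLAT", -0.1, 0.90),   # essentially unchanged
    ("GENE_DOWN", -1.5, 1e-6),   # significant down-regulation
]

# Score every gene and sort from most positive to most negative.
scored = [(gene, signed_inverse_p(lfc, p)) for gene, lfc, p in results]
scored.sort(key=lambda item: item[1], reverse=True)
rank_order = [gene for gene, _ in scored]

# Two tab-separated columns (gene, score), one gene per line.
rnk_lines = [f"{gene}\t{score:.6g}" for gene, score in scored]
print("\n".join(rnk_lines))
```

With these numbers the significantly up-regulated gene comes first, the significantly down-regulated gene comes last, and the noisy and unchanged genes fall in between, which matches the ordering described in the quote. I am assuming that two tab-separated columns is the layout GSEA expects for a .rnk file.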
I would greatly appreciate it if you could help me understand this and explain your preferred method for creating a GSEA rank file from RNA-seq expression results.
Many thanks,
Noushin