Question

GSEA preranking metric for RNA Seq

1

7.5 years ago

bipin ▴ 30

I came across multiple posts regarding the pre-ranking metric for GSEA when using RNA seq data. However, there doesn't seem to be a consensus.

Some of the metrics I came across are:

sign(log fold change) * -log10(p-value), using the raw (unadjusted) p-value
Shrunken log fold change values from DESeq2
The built-in Signal2Noise metric from GSEA. However, this cannot be used with fewer than 3 replicates.
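
For readers who want to see the first metric spelled out, here is a minimal Python sketch. It is an illustration only, not the code of any package mentioned in this thread: the gene names and values are made up, `signed_log_p` is a hypothetical helper name, and the `min_p` floor is an assumption added so that p-values reported as 0 do not produce infinite scores.

```python
import math

def signed_log_p(log_fc, p_value, min_p=1e-300):
    """Return sign(log fold change) * -log10(unadjusted p-value).

    min_p is an assumed floor so that a p-value of 0 does not give infinity.
    """
    sign = (log_fc > 0) - (log_fc < 0)  # +1, -1, or 0
    return sign * -math.log10(max(p_value, min_p))

# Toy differential-expression results: gene -> (log2 fold change, raw p-value)
results = {
    "GENE_A": (2.0, 0.001),
    "GENE_B": (-1.5, 0.01),
    "GENE_C": (0.3, 0.5),
}

scores = {gene: signed_log_p(lfc, p) for gene, (lfc, p) in results.items()}

# Most up-regulated first, most down-regulated last
ranked = sorted(scores, key=scores.get, reverse=True)
```

With this metric, a gene's position depends on its significance and the direction of change, not on the size of the fold change.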

Which metric do you use to rank the genes, or which do you know to be widely used?

RNA-Seq gsea deseq2 • 12k views

ADD COMMENT • link updated 13 months ago by V_Vibes ▴ 10 • written 7.5 years ago by bipin ▴ 30

0

How should one deal with data in which some genes have a log2 fold change of 0? Is it a good idea to filter these genes out before ranking, since they are essentially not modulated by the treatment, for example?

ADD REPLY • link 18 months ago by Balasubramaniam ▴ 20

score 6 · Answer 1 · 2018-02-12

6

7.5 years ago

Kevin Blighe 89k

Edit 31st July, 2019: I gave my original answer (below) assuming that you were referring to the general process of gene enrichment (or 'gene-set enrichment analysis'), and not to GSEA, the Broad Institute's PROGRAM that hijacked the term GSEA.

GSEA (the Broad Institute program) accepts a ranked list of genes, as do topGO (R), fGSEA (see my comment below), and other enrichment programs - there are too many programs.

---------------------------------------------

It makes sense that there is no consensus, as there are countless ways to do this. My own recommendation would be to:

Set an adjusted P value cut-off
Rank genes based on absolute log (base 2) fold change
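
The two steps above can be sketched as follows. This is only an illustrative Python sketch with made-up data, not the answer author's code; the 0.05 threshold and the tuple layout are assumptions.

```python
# Toy results: (gene, log2 fold change, adjusted p-value); all values are made up.
results = [
    ("GENE_A", 2.0, 0.001),
    ("GENE_B", -3.1, 0.020),
    ("GENE_C", 4.0, 0.300),  # large change, but fails the significance cut-off
    ("GENE_D", -0.4, 0.010),
]

PADJ_CUTOFF = 0.05  # assumed threshold; choose one suited to your study

# Step 1: keep genes that pass the adjusted P value cut-off
significant = [row for row in results if row[2] < PADJ_CUTOFF]

# Step 2: rank by absolute log2 fold change, largest first
ranked = sorted(significant, key=lambda row: abs(row[1]), reverse=True)
ranked_genes = [gene for gene, _, _ in ranked]
```

Note that ranking on the absolute value discards direction: up- and down-regulated genes are mixed in the same ordering.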

I believe the most widely used method is to just set adjusted P value and log (base 2) fold change cut-offs, and to then 'throw' the resulting gene list into the enrichment analysis without any ranking.

The lack of consensus on a proper filtering strategy may in part be due to the fact that a substantial proportion of researchers do not pay much attention to the results of GSEA. GSEA results would certainly never stand as the sole evidence in a clinical test, nor would they be sufficient evidence on which to base conclusions in most reputable journals.

Kevin

ADD COMMENT • link 4.7 years ago by Kevin Blighe 89k

0

What would your answer be if the question were about the Broad Institute's GSEA program?

ADD REPLY • link 5.4 years ago by asalimih ▴ 60

1

Rank by fold-change.

ADD REPLY • link 5.4 years ago by Kevin Blighe 89k

0

For others landing here, note also these excellent postings on using fGSEA:

ADD REPLY • link 4.7 years ago by Kevin Blighe 89k

0

How bad would it be to use -log10(p-value) * log2(fold change)? Note that I mean log2(fold change) itself, not just its sign.

ADD REPLY • link 3.3 years ago by Juan Cordero ▴ 140

0

No issue - at least you maintain directionality, in that case.

ADD REPLY • link 3.3 years ago by Kevin Blighe 89k
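
To make the difference between the two variants discussed in this exchange concrete, here is a small Python sketch with made-up numbers. The function names and the `min_p` floor are assumptions introduced for illustration; neither function comes from GSEA or any other package.

```python
import math

def product_score(log2_fc, p_value, min_p=1e-300):
    # -log10(p) * log2FC: keeps both the direction and the size of the change
    return -math.log10(max(p_value, min_p)) * log2_fc

def signed_p_score(log2_fc, p_value, min_p=1e-300):
    # sign(log2FC) * -log10(p): keeps only the direction of the change
    # (copysign treats a log2FC of exactly 0 as positive)
    return math.copysign(1.0, log2_fc) * -math.log10(max(p_value, min_p))

# Two made-up genes with the same p-value but different effect sizes
strong_up = (4.0, 0.01)
weak_down = (-0.5, 0.01)

product = [product_score(*strong_up), product_score(*weak_down)]
signed = [signed_p_score(*strong_up), signed_p_score(*weak_down)]
```

Both scores keep the sign, so up- and down-regulated genes land at opposite ends of the ranked list. The product additionally weights each gene by the size of its fold change, whereas the sign-only version gives these two genes scores of equal magnitude.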

0

I have a question about analysing GSEA results. I have used GSEA to identify dysregulated KEGG pathways, and now I want to rank them. Is it reasonable to use NES * -log10(nominal p-value) or NES * -log10(FDR q-value) for this ranking?

ADD REPLY • link 13 months ago by V_Vibes ▴ 10