Question

What is the best way to rank genes for GSEA?

9

5.6 years ago

Gabriel ▴ 170

I am doing pathway and gene ontology analysis using Gene Set Enrichment Analysis (GSEA). The tools require a ranked gene list; however, various papers have provided different recommendations on how to produce one.

Is there a current consensus on the ideal way to do this? I've been using log2 fold change, and I am unsure whether to use fold change or p-values instead, or some other method.

One post, "Problem with creating GSEA rank file", recommended signed p-values, but I haven't found any literature reviews or clarification on the issue. clusterProfiler mentions fold change for ranked gene lists, so I am unsure whether I would get "bad results" by sorting on p-values, or whether different packages are optimized for one sorting or the other.

According to Yu, the author of clusterProfiler:

geneList contains three features:

numeric vector: fold change or other type of numerical variable
named vector: every number has a name, the corresponding gene ID
sorted vector: number should be sorted in decreasing order

https://github.com/GuangchuangYu/DOSE/wiki/how-to-prepare-your-own-geneList
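The three features quoted above (numeric, named, sorted in decreasing order) can be sketched in a language-neutral way. This is only an illustration in Python, not clusterProfiler's own code; the gene IDs and scores are made up, and `make_ranked_list` is a hypothetical helper name.

```python
# Hypothetical scores keyed by gene ID (values invented for illustration).
log2_fold_changes = {
    "GENE_A": 1.8,
    "GENE_B": -2.3,
    "GENE_C": 0.4,
    "GENE_D": -0.1,
}


def make_ranked_list(scores):
    """Return (gene_id, score) pairs sorted by score in decreasing order."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = make_ranked_list(log2_fold_changes)
for gene_id, score in ranked:
    print(gene_id, score)
```

Each entry keeps its gene ID as the "name", and the most up-regulated gene comes first while the most down-regulated gene comes last.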

"other type of numerical variable" is unclear. Perhaps there are other methods similar to GSEA that have a more concrete way of doing things?

EDIT: with clusterProfiler's gseGO() function, I get different results when ranking by Log2FoldChange versus FoldChange.
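One possible explanation for the difference noted in the edit, offered as an assumption rather than a confirmed diagnosis: log2 is a monotonic transform, so both metrics give the same gene order, but the score magnitudes differ. If the enrichment statistic weights each gene by the absolute value of its score (as the weighted GSEA statistic does), the results can change even though the order does not. The gene IDs and values below are made up.

```python
import math

# Hypothetical fold changes (values invented for illustration).
fold_changes = {"GENE_A": 4.0, "GENE_B": 0.25, "GENE_C": 1.5, "GENE_D": 0.8}
log2_fold_changes = {g: math.log2(fc) for g, fc in fold_changes.items()}


def rank_order(scores):
    """Gene IDs sorted by score in decreasing order."""
    return [g for g, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]


# The order is identical because log2 is monotonic.
same_order = rank_order(fold_changes) == rank_order(log2_fold_changes)
print("same order:", same_order)

# The magnitudes are not: a 4-fold increase and a 4-fold decrease are
# symmetric on the log2 scale but very unequal on the raw scale.
print("raw  :", abs(fold_changes["GENE_A"]), abs(fold_changes["GENE_B"]))
print("log2 :", abs(log2_fold_changes["GENE_A"]), abs(log2_fold_changes["GENE_B"]))
```

On the raw scale all scores are positive and down-regulated genes get small magnitudes, so a magnitude-weighted statistic would treat the two ends of the list very differently.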

GSEA GO Gene ranking RNA-Seq • 23k views

ADD COMMENT • link updated 7 weeks ago by hermidalc ▴ 60 • written 5.6 years ago by Gabriel ▴ 170

score 14 · Answer 1 · 2019-04-18

14

5.6 years ago

Pietro ▴ 240

Hi Gabriel

For GSEA, some people rank genes by sign(fold change) * -log10(p-value). I found it here: http://crazyhottommy.blogspot.com/2016/08/gene-set-enrichment-analysis-gsea.html
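A minimal sketch of that metric, assuming the direction comes from the log2 fold change and the magnitude from the p-value. The function name `signed_log_p`, the `min_p` floor (added here to avoid taking the log of zero), and the example values are all hypothetical.

```python
import math


def signed_log_p(log2_fc, p_value, min_p=1e-300):
    """sign(log2 fold change) * -log10(p-value).

    min_p is an assumed floor so that a p-value of 0 does not break log10.
    """
    sign = (log2_fc > 0) - (log2_fc < 0)  # evaluates to 1, 0, or -1
    return sign * -math.log10(max(p_value, min_p))


# Hypothetical differential expression results: gene -> (log2FC, p-value).
de_results = {
    "GENE_A": (1.8, 1e-6),
    "GENE_B": (-2.3, 1e-4),
    "GENE_C": (0.4, 0.5),
}

scores = {g: signed_log_p(lfc, p) for g, (lfc, p) in de_results.items()}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for gene_id, score in ranked:
    print(gene_id, round(score, 3))
```

Significant up-regulated genes end up at the top, significant down-regulated genes at the bottom, and non-significant genes near zero in the middle.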

ADD COMMENT • link 5.6 years ago by Pietro ▴ 240

2

Just because you see something in published papers doesn't mean it's good or recommended. Many authors miss things or lack deep knowledge of the methods they use, and such technical details are often not checked by peer reviewers, even in high-impact papers.

Ranking metrics based only on logFC or only on p-value (which includes the approach above, since it uses the logFC only to get the direction) each have downsides: genes ranked by logFC are biased by the larger variance of genes with low counts, and genes ranked by p-value are biased toward genes with higher abundance and longer transcripts. See https://support.bioconductor.org/p/85681/

ADD REPLY • link 2.3 years ago by hermidalc ▴ 60

0

I have a query regarding the analysis of GSEA Results. I have used GSEA to obtain the dysregulated KEGG pathways. Now, I want to rank the dysregulated KEGG pathways. So, is it logical to use NES * (-log10 Nominal p-value) or NES * (-log10 FDR q-value) for ranking the KEGG pathways?

ADD REPLY • link 5 months ago by V_Vibes ▴ 10

1

To rank pathways, you can do the following:

Sort by adjusted p-value, keeping pathways with adjusted p-value < 0.25 (recommended).
Filter the data by adjusted p-value and sort by absolute NES value.
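The filter-then-sort suggestion could look roughly like this. It is a sketch only: the field names `nes` and `padj`, the helper `rank_pathways`, and the pathway rows are hypothetical and do not correspond to any particular tool's output format.

```python
# Hypothetical GSEA results (pathway names and numbers invented).
results = [
    {"pathway": "PATHWAY_1", "nes": 1.9, "padj": 0.01},
    {"pathway": "PATHWAY_2", "nes": -2.4, "padj": 0.20},
    {"pathway": "PATHWAY_3", "nes": 2.8, "padj": 0.60},
    {"pathway": "PATHWAY_4", "nes": -1.2, "padj": 0.04},
]


def rank_pathways(rows, padj_cutoff=0.25):
    """Keep rows below the adjusted p-value cutoff, then sort by |NES|, largest first."""
    kept = [row for row in rows if row["padj"] < padj_cutoff]
    return sorted(kept, key=lambda row: abs(row["nes"]), reverse=True)


ranked = rank_pathways(results)
for row in ranked:
    print(row["pathway"], row["nes"], row["padj"])
```

Sorting on the absolute NES mixes up- and down-regulated pathways in one list; the sign of NES still tells you the direction of each.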

ADD REPLY • link 5 months ago by DareDevil ★ 4.3k

0

What about the case where the GSEA results don't have any pathway with adjusted p-value <0.25?

ADD REPLY • link 5 months ago by V_Vibes ▴ 10

0

If you don't have any pathways with an adjusted p-value < 0.25, you may consider increasing the threshold to < 0.3 (though this is not recommended). Essentially, this indicates that there are no significantly different pathways in your analysis.

ADD REPLY • link 5 months ago by DareDevil ★ 4.3k

0

Do you have a reference for the recommendation of using an adjusted p-value < 0.25?

ADD REPLY • link 3 months ago by cmmcdowell • 0

0

See here and here too

ADD REPLY • link 3 months ago by DareDevil ★ 4.3k

0

So, are FDR and adjusted p-value the same thing, then?

ADD REPLY • link 12 weeks ago by cmmcdowell • 0

0

An FDR-adjusted p-value is one specific type of adjusted p-value, most commonly computed with the Benjamini-Hochberg procedure, which controls the false discovery rate. There are other methods for adjusting p-values too.
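For reference, a minimal sketch of the generic Benjamini-Hochberg adjustment on a list of p-values. The function name and the example p-values are hypothetical, and a given GSEA tool may compute its reported FDR q-value differently (for example from permutations), so treat this as an illustration of the procedure only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal p-values for four pathways.
p_values = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(p_values)
for p, q in zip(p_values, adjusted):
    print(p, round(q, 4))
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a larger p-value from receiving a smaller adjusted value than a smaller one.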

ADD REPLY • link 7 weeks ago by hermidalc ▴ 60