What is the best way to rank genes for GSEA?
5.6 years ago
Gabriel ▴ 170

I am doing pathway and gene ontology analysis using Gene Set Enrichment Analysis (GSEA). The tools require a ranked gene list; however, various papers give different recommendations on how to produce it.

Is there a current consensus on the ideal way to do this? I've been using log2 fold change, and I am unsure whether to use fold change or p-values instead, or some other method.

One post, "Problem with creating GSEA rank file", recommended signed p-values, but I haven't found any literature reviews or clarification on the issue. clusterProfiler mentions fold change for ranked gene lists, so I am unsure whether I would be getting "bad results" by using p-value sorting, or whether the different packages are optimized for one sorting or the other.

According to Yu, author of clusterProfiler:

geneList contains three features:

  1. numeric vector: fold change or other type of numerical variable
  2. named vector: every number has a name, the corresponding gene ID
  3. sorted vector: number should be sorted in decreasing order

https://github.com/GuangchuangYu/DOSE/wiki/how-to-prepare-your-own-geneList

"other type of numerical variable" is unclear. Perhaps there are other methods similar to GSEA that have a more concrete way of doing things?

EDIT: with clusterProfiler's function gseGO() I get different results when ranking by Log2FoldChange versus FoldChange.
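A possible explanation for the difference, sketched in Python: log2 fold change and fold change sort the genes in the same order, but a weighted GSEA running sum uses the magnitude of the metric as the step size, so the two metrics weight the same genes very differently. The function below is a minimal sketch of the weighted running-sum statistic as described by Subramanian et al. (2005), not clusterProfiler's actual implementation; the gene names, values, and gene set are made up for illustration.

```python
def enrichment_score(ranked, gene_set, exponent=1.0):
    """Weighted running-sum enrichment score (illustrative sketch only).

    ranked: list of (gene, metric) pairs sorted by metric, descending.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    weights = [abs(metric) ** exponent for _, metric in ranked]
    total_hit_weight = sum(w for w, h in zip(weights, hits) if h)
    n_miss = len(ranked) - sum(hits)

    running, best = 0.0, 0.0
    for w, h in zip(weights, hits):
        # Step up by the gene's share of hit weight, or down by 1/n_miss.
        running += w / total_hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def rank(metric):
    """Sort a {gene: value} dict into a descending ranked list."""
    return sorted(metric.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical values: the same six genes expressed as log2FC and as FC.
log2fc = {"A": 3.0, "B": 1.0, "C": 0.5, "D": -0.5, "E": -1.0, "F": -3.0}
fc = {gene: 2 ** value for gene, value in log2fc.items()}

gene_set = {"A", "F"}  # hypothetical gene set

es_log2fc = enrichment_score(rank(log2fc), gene_set)
es_fc = enrichment_score(rank(fc), gene_set)

print("same order:", [g for g, _ in rank(log2fc)] == [g for g, _ in rank(fc)])
print("ES with log2FC:", es_log2fc)
print("ES with FC:    ", es_fc)
```

In this toy case the order is identical, yet the score is 0.5 with log2FC and about 0.98 with FC, because the strongly downregulated gene F has weight 3 under log2FC but only 0.125 under FC.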

GSEA GO Gene ranking RNA-Seq • 23k views
5.6 years ago
Pietro ▴ 240

Hi Gabriel

For GSEA, some people rank by the sign of the fold change * -log10(p-value). I found it here: http://crazyhottommy.blogspot.com/2016/08/gene-set-enrichment-analysis-gsea.html
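As a rough illustration, one reading of this metric is sign(log2FC) * -log10(p-value), computed per gene and then sorted in decreasing order. The sketch below assumes that reading; the gene names, the values, and the `min_p` floor (used so a p-value of exactly 0 does not produce infinity) are assumptions made for the example.

```python
import math


def signed_p_metric(log2fc, pvalue, min_p=1e-300):
    """Return sign(log2fc) * -log10(pvalue).

    min_p is an assumed floor that guards against log10(0).
    """
    sign = (log2fc > 0) - (log2fc < 0)  # +1, 0, or -1
    return -sign * math.log10(max(pvalue, min_p))


# Hypothetical differential expression results: gene -> (log2FC, p-value)
de_results = {
    "GENE_A": (2.1, 1e-8),
    "GENE_B": (-1.7, 1e-12),
    "GENE_C": (0.3, 0.2),
    "GENE_D": (-0.1, 0.9),
}

# Decreasing order: strongly up-regulated genes first, strongly down last.
ranked = sorted(
    ((gene, signed_p_metric(lfc, p)) for gene, (lfc, p) in de_results.items()),
    key=lambda kv: kv[1],
    reverse=True,
)

for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

Note that the magnitude of the fold change plays no role here: GENE_B ends up at the very bottom because of its small p-value, not because of the size of its log2FC.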


Just because you see something in published papers doesn't mean it's good or recommended. Many authors miss things or lack deep knowledge of what they are doing, and such technical details are often not checked by peer reviewers, even in high-impact papers.

Ranking metrics based only on logFC or only on p-value (which includes the above approach, since it uses the logFC only to get the direction) each have their downsides: genes ranked by logFC are biased by the larger variance of genes with low counts, and genes ranked by p-value are biased towards genes with higher abundance and longer transcripts. See https://support.bioconductor.org/p/85681/


I have a question about analysing GSEA results. I have used GSEA to obtain the dysregulated KEGG pathways, and now I want to rank them. Is it logical to use NES * (-log10 nominal p-value) or NES * (-log10 FDR q-value) for ranking the KEGG pathways?


For ranking pathways you can do one of the following:

  1. Keep pathways with adjusted p-value < 0.25 and sort them by adjusted p-value (recommended).
  2. Filter the data by adjusted p-value and sort by absolute NES value.
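The second option could look roughly like this in Python. This is only a sketch: the pathway names, the NES and adjusted p-values, and the `NES`/`padj` field names are invented for the example and will differ depending on which GSEA tool produced the table.

```python
# Hypothetical GSEA result rows; real column names depend on the tool used.
pathways = [
    {"pathway": "KEGG_A", "NES": 2.1, "padj": 0.01},
    {"pathway": "KEGG_B", "NES": -2.6, "padj": 0.04},
    {"pathway": "KEGG_C", "NES": 1.2, "padj": 0.40},
    {"pathway": "KEGG_D", "NES": -1.5, "padj": 0.20},
]


def rank_pathways(rows, padj_cutoff=0.25):
    """Keep rows below the adjusted p-value cutoff, then sort by |NES|."""
    kept = [row for row in rows if row["padj"] < padj_cutoff]
    return sorted(kept, key=lambda row: abs(row["NES"]), reverse=True)


top = rank_pathways(pathways)
for row in top:
    print(row["pathway"], row["NES"], row["padj"])
```

Sorting by absolute NES keeps strongly negative pathways (here KEGG_B) near the top instead of pushing them to the end of the list.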

What about the case where the GSEA results don't have any pathway with adjusted p-value <0.25?


If you don't have any pathways with an adjusted p-value < 0.25, you may consider increasing the threshold to < 0.3 (though this is not recommended). Essentially, this indicates that there are no significantly different pathways in your analysis.


Do you have a reference for the recommendation of using an adjusted p-value < 0.25?


See here and here too


So, are FDR and adjusted p-value the same thing then?


An FDR-adjusted p-value is one specific type of adjusted p-value, most commonly computed with the Benjamini-Hochberg false discovery rate procedure. There are other methods for adjusting p-values too, such as Bonferroni.
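To make the Benjamini-Hochberg adjustment concrete, here is a minimal Python sketch. Each p-value is multiplied by m / rank (m being the number of tests, rank its position when sorted ascending), and the results are then forced to be monotone by taking a running minimum from the largest p-value down. The input p-values are made up, and this is an illustration of the procedure rather than the code used by any particular GSEA package.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that an adjusted
    # value never exceeds the adjusted value of a larger p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical nominal p-values
padj = benjamini_hochberg(pvals)
for p, q in zip(pvals, padj):
    print(f"p = {p:.2f}  ->  BH-adjusted = {q:.4f}")
```

With these inputs, 0.03 and 0.04 end up with the same adjusted value (about 0.0533) because of the monotonicity step.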
