Question

How to rank genes for GSEA using edgeR-LRT results?


3 days ago

Picasa ▴ 660

Hi all,

I'm running GSEA on RNA-seq differential expression results and I’m wondering what’s the best way to rank genes when using the edgeR LRT.

I know that for:

DESeq2, it's common to rank by log2FoldChange after shrinkage.

limma-voom, the t-statistic is often used for ranking.

But edgeR-LRT does not provide a t-stat, so I’m not sure what’s most appropriate. Would using logFC alone be enough?

DEG edgeR • 292 views

updated 8 hours ago by dariober 15k • written 3 days ago by Picasa ▴ 660


2 days ago

ATpoint 87k

I assume you mean glmLRT followed by topTags? If so, the LR column could be used, signed by the direction of the fold change, e.g. with the topTags()$table output being tt:

sign(tt$logFC) * tt$LR

Or you can use the signed -log10(p-value), which gives essentially the same ranking. I would not use the fold change alone, since you need the p-value to judge whether a large fold change is reliable or an artifact of a large standard error.
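To make the two scores concrete, here is a minimal sketch in base R. The table `tt` is a made-up stand-in for `topTags(lrt, n = Inf)$table` (a real table also carries logCPM and FDR columns), and the p-value is recomputed from LR assuming a 1-df test. The last gene has a deliberately extreme LR to show one practical difference: its p-value underflows to zero, so the -log10 score becomes infinite, while the signed LR stays finite.

```r
# Toy stand-in for topTags(lrt, n = Inf)$table (values are invented)
tt <- data.frame(
  logFC = c(2.1, -1.4, 0.3, -0.2, 3.0),
  LR    = c(25.0, 12.0, 0.8, 0.1, 2000),
  row.names = paste0("gene", 1:5)
)

# Assumption: single-coefficient test, so LR is chi-square with 1 df under the null
tt$PValue <- pchisq(tt$LR, df = 1, lower.tail = FALSE)

# Signed LR and signed -log10(p-value)
score_lr <- sign(tt$logFC) * tt$LR
score_p  <- sign(tt$logFC) * -log10(tt$PValue)

# Both scores order the genes identically
same_order <- identical(order(score_lr), order(score_p))

# The extreme gene has an underflowed p-value, hence an infinite score
print(data.frame(gene = rownames(tt), score_lr, score_p))
```

If several genes underflow like this they end up tied at infinity, which is one reason the LR-based score can be the more convenient of the two.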


You don't necessarily need to run topTags. Working directly from the glmLRT output, lrt say, the signed LRT statistic

z <- sign(lrt$table$logFC) * sqrt(lrt$table$LR)

is approximately a standard normal z-statistic under the null hypothesis (for a test of a single coefficient or contrast), and would be a good choice for GSEA ranking analyses.

2 days ago by Gordon Smyth ★ 7.8k
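As a small illustration of the reply above, the sketch below turns the signed z-statistic into the named, decreasingly sorted vector that most ranked-GSEA tools expect. The `lrt` object here is a hypothetical plain list standing in for real glmLRT output (a DGELRT object whose `$table` has logFC and LR columns, among others), and the gene names and values are invented.

```r
# Hypothetical stand-in for glmLRT output: only the $table slot is mimicked
lrt <- list(
  table = data.frame(
    logFC = c(1.5, -2.0, 0.1),
    LR    = c(9, 16, 0.04),
    row.names = c("geneA", "geneB", "geneC")
  )
)

# Signed square root of the likelihood ratio statistic
z <- sign(lrt$table$logFC) * sqrt(lrt$table$LR)
names(z) <- rownames(lrt$table)

# Ranked list: most up-regulated first, most down-regulated last
ranked <- sort(z, decreasing = TRUE)
print(ranked)
```

The names matter as much as the values, since gene sets are matched to the ranked list by identifier.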


Checking that my understanding is correct: GSEA, at least as implemented in clusterProfiler, only uses the rank of the genes, not the quantitative score used for ranking. This should mean that ranking by signed p-value is equivalent to ranking by the t-statistic (or the signed LRT statistic, in the case of a GLM), and that sqrt(lrt$table$LR) gives the same ranking as lrt$table$LR. However, occasionally I've heard people discussing whether you should use the p-value, -log10(p-value) or the t-statistic... Am I missing something? (It's different for logFC, shrunk logFC and p-value, where the ranking may indeed change.)

8 hours ago by dariober 15k
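One way to probe the question above is with a toy version of the running-sum enrichment score. This is a minimal sketch of the classic weighted statistic, not clusterProfiler's or fgsea's actual code. The function `enrichment_score` and its exponent `p` are written here for illustration only. With `p = 0` only ranks matter, whereas with `p = 1` each hit is weighted by the magnitude of the statistic. As far as I know, clusterProfiler's `GSEA()` has an `exponent` argument that defaults to 1, in which case the scale of the score would matter and not just the ordering.

```r
# Running-sum enrichment score in the style of classic GSEA (illustrative only).
# stat:   numeric ranking statistic, one value per gene
# in_set: logical vector, TRUE where the gene belongs to the gene set
# p:      weighting exponent (0 = ranks only, 1 = hits weighted by |stat|)
enrichment_score <- function(stat, in_set, p = 1) {
  ord <- order(stat, decreasing = TRUE)
  stat <- stat[ord]
  in_set <- in_set[ord]
  w <- abs(stat)^p
  hit <- cumsum(ifelse(in_set, w, 0)) / sum(w[in_set])
  miss <- cumsum(!in_set) / sum(!in_set)
  running <- hit - miss
  running[which.max(abs(running))]   # maximum deviation from zero
}

z <- c(3, 2, 1, -1, -2, -3)          # signed sqrt(LR), invented values
signed_lr <- sign(z) * z^2           # signed LR: same ordering, different scale
in_set <- c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Unweighted: both statistics give 0.75
es_rank_z  <- enrichment_score(z, in_set, p = 0)
es_rank_lr <- enrichment_score(signed_lr, in_set, p = 0)

# Weighted: 0.75 for z but 0.9 for signed LR
es_wt_z  <- enrichment_score(z, in_set, p = 1)
es_wt_lr <- enrichment_score(signed_lr, in_set, p = 1)
```

So monotone transformations such as LR versus sqrt(LR) are interchangeable only when the method is purely rank-based. With a weighted score they can give different enrichment results even though the gene order is identical.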

score 1 · Accepted Answer · 2025-03-22

The best choice would be to use edgeR's built-in GSEA functionality provided by camera and cameraPR. These functions adjust for inter-gene correlation, which other ranked GSEA tools do not.
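As a rough sketch of the pre-ranked route, the example below calls `cameraPR()` from limma (which edgeR depends on) on a plain numeric vector of z-statistics. Everything about the data is an assumption made for illustration: the statistics are simulated, the two gene sets are arbitrary index vectors, and the first set is shifted upwards by hand so there is something to detect.

```r
library(limma)

set.seed(1)
n_genes <- 1000

# Simulated gene-level z-statistics (a stand-in for signed sqrt(LR) values)
z <- rnorm(n_genes)
names(z) <- paste0("gene", seq_len(n_genes))

# Artificially shift the first 20 genes upwards
z[1:20] <- z[1:20] + 1.5

# Gene sets given as integer indices into z (both sets are invented)
index <- list(
  shifted = 1:20,
  random  = 101:120
)

# Pre-ranked camera test; inter.gene.cor is left at its default
res <- cameraPR(z, index)
print(res)
```

When the full DGEList and design matrix are available, `camera()` can be run on those instead, so that the inter-gene correlation is estimated from the data.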

Otherwise, you can use the signed LRT statistic

z <- sign(lrt$table$logFC) * sqrt(lrt$table$LR)

as suggested by ATpoint, which is analogous to the t-statistic from limma-voom.

Finally, if you wanted a shrunk logFC analogous to that from DESeq2, you could use predFC with a large prior.count, say prior.count=5.
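For the shrunk logFC option, a minimal sketch on simulated counts might look like the following. The count matrix, group labels and two-group design are all invented for illustration, and the choice of column 2 assumes the design `~ group`, where the second coefficient is the treatment-versus-control log2 fold change.

```r
library(edgeR)

set.seed(1)

# Simulated counts: 200 genes, 6 samples (values are invented)
counts <- matrix(
  rnbinom(200 * 6, mu = 50, size = 10),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("s", 1:6))
)
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

# Shrunk log2 fold changes; a larger prior.count shrinks more strongly
shrunk <- predFC(y, design, prior.count = 5)

# Column 2 is the trt vs ctrl coefficient under this design
ranked <- sort(shrunk[, 2], decreasing = TRUE)
head(ranked)
```

Unlike the signed LRT statistic, this ranking reflects effect size rather than evidence, so the two can order genes quite differently.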