Question

edgeR glmQLF with a low number of genes

11 months ago

merdanosman • 0

Hi,

I have scRNA-seq data comprising two main groups ('Treatment') from two batches. All cells are CD8+ T cells.

Batch 1: one control and one treatment sample
Batch 2: one control and one treatment sample

However, the scRNA-seq assay uses a targeted immune panel of 400 genes. After removing genes with zero counts, I was left with approximately 250 genes.

I want to try pseudobulk DE testing on this dataset to compare the treatment and control groups, and I have used edgeR. As far as I know, edgeR relies on having many genes to estimate dispersions. My BCV plots and QL dispersion (QLDisp) plots looked odd.
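Pseudobulking means summing the counts of all cells that come from the same biological sample, so each sample contributes one column to the count matrix that edgeR analyses. For the design above this gives four columns (two batches × control/treatment). A minimal, language-neutral sketch in Python, assuming counts are held in a plain dict of per-cell lists; the function name `pseudobulk`, the gene names and the sample labels are hypothetical, and this is not the API of edgeR or any single-cell package:

```python
from collections import defaultdict


def pseudobulk(counts, cell_sample):
    """Sum per-cell counts into one pseudobulk count per (gene, sample).

    counts: gene -> list of per-cell counts, cells in the same order for every gene
    cell_sample: sample label of each cell, in that same order
    Returns gene -> {sample: summed count}.
    """
    result = {}
    for gene, per_cell in counts.items():
        if len(per_cell) != len(cell_sample):
            raise ValueError(f"{gene}: {len(per_cell)} counts for {len(cell_sample)} cells")
        totals = defaultdict(int)
        for count, sample in zip(per_cell, cell_sample):
            totals[sample] += count  # add this cell's count to its sample's total
        result[gene] = dict(totals)
    return result


# Hypothetical toy data: 6 cells from the four samples of a 2-batch x 2-condition design
cells = ["B1_ctrl", "B1_ctrl", "B1_trt", "B2_ctrl", "B2_trt", "B2_trt"]
counts = {"GENE_A": [1, 2, 0, 4, 3, 3], "GENE_B": [0, 0, 5, 1, 0, 2]}
print(pseudobulk(counts, cells))
```

Whatever the number of cells, the DE test then works with one count per gene per sample, so the replication that matters is the number of samples, not cells.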

Now I am not sure whether the results of this analysis are reliable. Do you know how edgeR performs in this type of situation?

Plots: [BCV and QL dispersion plots; images not available]

Tags: edgeR, DGE, Pseudobulking

score 1 · Answer 1 · 2024-08-14

11 months ago

Gordon Smyth ★ 8.2k

When the number of genes is small, the edgeR dispersion trend lines may show some wobbles, but this isn't a problem. edgeR should work fine down to quite small numbers of genes, well below 250.

You should use the latest version of edgeR (edgeR v4.2) because we are actively working on edgeR QL for small counts and small numbers of genes. Also try setting robust=TRUE in the glmQLFit call.

Having zero counts is fine, just as long as the gene is not entirely zero for all samples.
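The rule above is that only genes which are zero in every sample need to be removed; genes that merely contain some zeros should stay. A minimal sketch of that filter, assuming a pseudobulk matrix stored as a dict of gene -> {sample: count} (the function name and toy data are hypothetical):

```python
def drop_all_zero_genes(pb):
    """Keep every gene with a nonzero count in at least one sample.

    pb maps gene -> {sample: count}. Genes containing some zeros are kept;
    only genes that are zero in all samples are removed.
    """
    return {gene: by_sample for gene, by_sample in pb.items()
            if any(count > 0 for count in by_sample.values())}


pb = {
    "GENE_A": {"B1_ctrl": 0, "B1_trt": 0, "B2_ctrl": 0, "B2_trt": 0},  # all zero: dropped
    "GENE_B": {"B1_ctrl": 0, "B1_trt": 7, "B2_ctrl": 0, "B2_trt": 5},  # some zeros: kept
}
print(sorted(drop_all_zero_genes(pb)))  # ['GENE_B']
```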

Thank you very much for the comment.

Reply by merdanosman • 11 months ago

Hello Professor Smyth, I'm in a similar situation with a targeted amplicon panel, but I have only 8 genes. I was wondering whether you have a sense of how far below 250 genes edgeR would still provide valid results.

Reply by iwt • 11 months ago

limma and edgeR are written in such a way that they can be applied to any number of genes, even just a few. edgeR will run even on just one gene, in which case it is equivalent to a univariate generalized linear model.
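To make the one-gene case concrete: with a single gene there is nothing to share between genes, and the fit is an ordinary negative binomial GLM with a log link. The sketch below fits such a model by iteratively reweighted least squares (IRLS) with the dispersion held fixed. It is an illustration written for this thread, not edgeR's internal code (edgeR has its own fitting routines and also estimates the dispersions); the function names and toy counts are hypothetical, and `offset` is where log library sizes would go.

```python
import math


def solve_linear(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def nb_glm_one_gene(y, X, dispersion, offset=None, max_iter=100, tol=1e-10):
    """Fit a negative binomial GLM with log link to ONE gene by IRLS.

    Model: log(mu_i) = offset_i + sum_j X[i][j] * beta_j,
           Var(y_i) = mu_i + dispersion * mu_i**2  (dispersion held fixed).
    Returns the coefficient list beta.
    """
    n, p = len(y), len(X[0])
    offset = list(offset) if offset is not None else [0.0] * n
    mu = [yi + 0.1 for yi in y]  # crude starting values; avoids log(0)
    beta = None
    for _ in range(max_iter):
        # IRLS weights and working response for the NB model with log link
        w = [m / (1.0 + dispersion * m) for m in mu]
        z = [math.log(m) - o + (yi - m) / m for m, o, yi in zip(mu, offset, y)]
        # Weighted least-squares normal equations: (X'WX) beta = X'Wz
        XtWX = [[sum(w[i] * X[i][a] * X[i][c] for i in range(n)) for c in range(p)]
                for a in range(p)]
        XtWz = [sum(w[i] * X[i][a] * z[i] for i in range(n)) for a in range(p)]
        new_beta = solve_linear(XtWX, XtWz)
        mu = [math.exp(o + sum(X[i][j] * new_beta[j] for j in range(p)))
              for i, o in enumerate(offset)]
        converged = beta is not None and max(
            abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


y = [10, 12, 30, 34]                  # hypothetical counts: 2 control, 2 treated samples
X = [[1, 0], [1, 0], [1, 1], [1, 1]]  # intercept + treatment indicator
b0, b1 = nb_glm_one_gene(y, X, dispersion=0.1)
print(math.exp(b0), math.exp(b1))     # fitted control mean, treated/control fold change
```

With a group-indicator design and no offsets, the fitted means equal the group sample means (here 11 and 32) whatever dispersion is used; the dispersion affects the standard errors and tests, which this sketch does not compute. A group whose counts are all zero has no finite estimate, and the sketch would fail on it.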

The edgeR v4 QL pipeline automatically simplifies the dispersion trend when there are only a few genes. In theory, it can benefit from borrowing information between genes with as few as three genes, although I wouldn't recommend that in practice. It should work fine on 8 genes. Of course, your data need to include replicate samples, regardless of the number of genes.
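"Borrowing information between genes" refers to empirical Bayes moderation: each gene's own dispersion estimate, which rests on only a few residual degrees of freedom, is pulled toward a prior value shared across genes. The sketch below shows the df-weighted average computed by limma's `squeezeVar`, the same kind of moderation that edgeR's QL framework applies to QL dispersions. In the real pipeline the prior (a trend) and its prior df are estimated from all the genes, so with very few genes the prior itself rests on little data; here both are fixed, hypothetical numbers. The residual df line assumes a design of intercept + batch + treatment for the four samples in the original question, and shows why replicates matter: without them the residual df is zero and a gene has no dispersion estimate of its own.

```python
import math


def residual_df(n_samples, n_coefficients):
    """Residual degrees of freedom left after fitting the design."""
    return n_samples - n_coefficients


def squeeze_variance(s2, df, s2_prior, df_prior):
    """Moderated (posterior) variance: df-weighted average of a gene's own
    estimate s2 (on df residual df) and a shared prior s2_prior (on df_prior df)."""
    if math.isinf(df_prior):
        return s2_prior  # infinitely strong prior: every gene gets the prior value
    if df + df_prior == 0:
        raise ValueError("no residual df and no prior information")
    return (df_prior * s2_prior + df * s2) / (df_prior + df)


# Hypothetical numbers: 4 pseudobulk samples, design = intercept + batch + treatment
d = residual_df(n_samples=4, n_coefficients=3)  # 1 residual df per gene
print(d)
print(squeeze_variance(s2=0.3, df=d, s2_prior=0.05, df_prior=10))
```

With one residual df, the gene's own estimate gets weight 1/11 in this example, so the moderated value sits close to the prior.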

Reply by Gordon Smyth • 11 months ago