edgeR glmQLF with a low number of genes
4 months ago

Hi,

I have scRNA-seq data comprising two main groups ('Treatment') from 2 batches. All cells are CD8+ T cells.

  • Batch 1: one control and one treatment sample
  • Batch 2: one control and one treatment sample

The scRNA-seq panel, however, is a targeted immune panel comprising 400 genes. After removing genes with all-zero counts, I was left with approximately 250 genes.

I want to try pseudobulk DE testing on this dataset to compare the treatment and control groups, and I have used edgeR for this. As far as I know, edgeR relies on having a large number of genes to estimate dispersion. The BCV and QLDisp plots I got looked odd.

I am therefore not sure that the results from this analysis are reliable. Do you know how edgeR performs in this kind of situation?

Plots: [image not included]

edgeR DGE Pseudobulking
4 months ago
Gordon Smyth ★ 7.7k

When the number of genes is small, the edgeR dispersion trend lines may show some wobbles, but this isn't a problem. edgeR should work fine down to quite small numbers of genes, well below 250.

You should use the latest version of edgeR (edgeR v4.2) because we are actively working on edgeR QL for small counts and small numbers of genes. Also try setting robust=TRUE in the glmQLFit call.

Having zero counts is fine, just as long as the gene is not entirely zero for all samples.
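
A minimal sketch of the advice above in R, for the 2-batch design described in the question. The counts are simulated (hypothetical data, not the poster's), the sample names and the additive ~ batch + treatment design are assumptions, and the TMM normalization step is ordinary edgeR usage rather than something prescribed in this thread.

```r
library(edgeR)

set.seed(1)

# Hypothetical pseudobulk counts: 250 genes x 4 samples (simulated, not real data)
n_genes <- 250
gene_mu <- 2^runif(n_genes, 2, 10)   # assumed spread of expression levels
counts <- matrix(rnbinom(n_genes * 4, mu = rep(gene_mu, 4), size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 c("b1_ctrl", "b1_trt", "b2_ctrl", "b2_trt")))

# Assumed sample annotation: one control and one treatment sample per batch
batch <- factor(c("b1", "b1", "b2", "b2"))
treatment <- factor(c("ctrl", "trt", "ctrl", "trt"), levels = c("ctrl", "trt"))

y <- DGEList(counts = counts)

# Remove only genes that are zero in every sample; other zeros are kept
keep <- rowSums(y$counts) > 0
y <- y[keep, , keep.lib.sizes = FALSE]

# Standard TMM normalization factors
y <- calcNormFactors(y)

# Additive model: batch effect plus treatment effect
design <- model.matrix(~ batch + treatment)

# Dispersion estimation, then the quasi-likelihood fit with robust = TRUE
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design, robust = TRUE)

# Test the treatment coefficient (treatment vs control, adjusted for batch)
res <- glmQLFTest(fit, coef = "treatmenttrt")
top <- topTags(res, n = Inf)$table

# The two diagnostic plots mentioned in the question
plotBCV(y)
plotQLDisp(fit)
```

With real data, counts would be the gene-by-sample pseudobulk matrix, and the batch and treatment factors would come from the sample sheet.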


Thank you very much for the comment.


Hello Professor Smyth, I'm in a similar situation with a targeted amplicon panel, but I have only 8 genes. I was wondering if you have a sense of how far below 250 edgeR would provide valid results?


limma and edgeR are written in such a way that they can be applied to any number of genes, even just a few. edgeR will run even on just one gene, in which case it is equivalent to a univariate generalized linear model.

The edgeR v4 QL pipeline automatically simplifies the dispersion trend when there are only a few genes. Theoretically, it can benefit from borrowing information between genes when there are as few as three genes, although I wouldn't promote that in practice. It should work fine on 8 genes. Of course, your data needs to include replicate samples, regardless of the number of genes.
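
One way to check the replication requirement before fitting is to count the residual degrees of freedom left by the design matrix. The sketch below uses base R only; the sample layouts are hypothetical, and the first one assumes the same additive batch + treatment design as the original question.

```r
# Hypothetical sample sheet: one control and one treatment sample in each of 2 batches
batch <- factor(c("b1", "b1", "b2", "b2"))
treatment <- factor(c("ctrl", "trt", "ctrl", "trt"), levels = c("ctrl", "trt"))
design <- model.matrix(~ batch + treatment)

# Residual df = number of samples minus the rank of the design matrix
residual_df <- nrow(design) - qr(design)$rank

# A layout with no replication: a single control and a single treatment sample
design_norep <- model.matrix(~ factor(c("ctrl", "trt")))
residual_df_norep <- nrow(design_norep) - qr(design_norep)$rank

# With zero residual df there is nothing left over to estimate dispersion from
if (residual_df_norep < 1) {
  message("No residual df: this layout has no replication")
}
```

A positive residual df means the data contain some replication; zero means every sample is used up estimating the model coefficients, whatever the number of genes.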
