Hi,
I have scRNA-seq data comprising two main groups ('Treatment') from 2 batches. All cells are CD8+ T cells.
- Batch 1: 1 control and 1 treatment sample
- Batch 2: 1 control and 1 treatment sample
The scRNA-seq panel, however, is a targeted immune panel comprising 400 genes. After removing genes with all-zero counts, I was left with approximately 250 genes.
I want to try pseudobulk DE testing on this dataset to compare treatment vs control groups. I have used edgeR. As far as I know, edgeR borrows information across a large number of genes to estimate dispersion. I got weird-looking BCV and QLDisp plots.
Now, I am not sure that the results from this analysis are reliable. Do you know how edgeR performs under these types of situations?
plots:
Thank you very much for the comment.
Hello Professor Smyth, I'm in a similar situation with a targeted amplicon panel, but I have only 8 genes. I was wondering if you have a sense of how far below 250 genes edgeR can go and still provide valid results?
limma and edgeR are written in such a way that they can be applied to any number of genes, even just a few. edgeR will run even on just one gene, in which case it is equivalent to a univariate generalized linear model.
The edgeR v4 QL pipeline automatically simplifies the dispersion trend when there are only a few genes. Theoretically, it can benefit from borrowing information between genes when there are as few as three genes, although I wouldn't promote that in practice. It should work fine on 8 genes. Of course, your data needs to include replicate samples, regardless of the number of genes.
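For illustration, here is a minimal sketch of what a QL analysis on a very small panel might look like. It assumes edgeR v4 or later, where `glmQLFit` estimates the dispersions itself. The counts are simulated and the gene names, sample names and factor levels are hypothetical, chosen only to mirror the two-batch, control vs treatment layout from the original question. It is not a recommended workflow for any particular dataset.

```r
library(edgeR)

# Simulated pseudobulk counts (hypothetical data): 8 genes x 4 samples,
# standing in for counts summed over cells within each sample.
set.seed(1)
counts <- matrix(
  rnbinom(8 * 4, mu = 200, size = 10),
  nrow = 8,
  dimnames = list(paste0("gene", 1:8),
                  c("B1_ctrl", "B1_trt", "B2_ctrl", "B2_trt"))
)

batch     <- factor(c("B1", "B1", "B2", "B2"))
treatment <- factor(c("ctrl", "trt", "ctrl", "trt"), levels = c("ctrl", "trt"))

# Additive design: treatment effect adjusted for batch.
# With 4 samples and 3 coefficients this leaves only 1 residual df.
design <- model.matrix(~ batch + treatment)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# Assumes edgeR >= 4.0, where glmQLFit estimates dispersions directly.
fit <- glmQLFit(y, design)
res <- glmQLFTest(fit, coef = "treatmenttrt")

# One row per gene, with logFC, F statistic, p-value and FDR.
topTags(res, n = Inf)
```

With one control and one treatment sample per batch, the batch pairing is the only source of replication, so the residual degrees of freedom are very limited and the shared information between genes carries much of the weight.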