Can I run edgeR or DESeq2 on bimodal data caused by a batch effect?
3.9 years ago

I have count data for 18 samples from two conditions, sequenced in two batches. I would normally run DE analysis with batch as a covariate.

A density plot of the scaled CPM data shows a bimodal distribution. Density plots of the separate batches each show a single-peaked, more-or-less normal distribution, so my conclusion is that the bimodality is caused by the batches.
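A minimal sketch of why pooling can produce two peaks, assuming (hypothetically) that each batch is normal on the scaled-CPM scale with the same spread but a shifted mean. The pooled density is then an equal-weight mixture of the two batch densities, and for two equal-weight normals with a common standard deviation it becomes bimodal once the means differ by more than 2 standard deviations. The batch means and standard deviation below are illustrative values, not taken from the data in this post.

```python
import math

def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def pooled_density(x: float, batch_means: list[float], sd: float = 1.0) -> float:
    """Equal-weight mixture: density of the pooled data when each batch is normal."""
    return sum(normal_pdf(x, m, sd) for m in batch_means) / len(batch_means)

# Hypothetical batch means (illustrative only): 4 sd apart, so the pooled
# density dips between them even though each batch alone has a single peak.
batch_means = [0.0, 4.0]
for x in [0.0, 2.0, 4.0]:
    print(x, round(pooled_density(x, batch_means), 4))
```

The density at the midpoint (x = 2) is lower than at either batch mean, reproducing the "two peaks pooled, one peak per batch" pattern described above.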

I know that when running batch correction with ComBat you need to choose the non-parametric correction if the distribution isn't normal, because the regular parametric method assumes a normal distribution. I also know you're supposed to run DE on the raw counts without corrections, so I can't use the batch-corrected data. What I don't fully understand is how DE programs account for covariates, and whether they need normally distributed data to do so.

So my question is: can I run edgeR or DESeq2 with batch as a covariate the way I normally do, or will the bimodality cause issues?

RNA-Seq

Hello, could you please add the code and the resulting figure to your post? It is difficult to follow based on the textual description alone. Thanks.

2.1 years ago
Gordon Smyth ★ 7.7k

edgeR makes no assumptions about the shape of the CPM distribution, so bimodality does not present any problem. You can simply run the DE analysis with batch as a blocking factor. The same would be true for DESeq2.
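As a rough illustration of "batch as a blocking factor", the Python sketch below builds the same treatment-coded design matrix that R's `model.matrix(~ batch + condition)` would give: an intercept, a batch indicator and a condition indicator. In edgeR a matrix like this is what gets passed to the GLM fitting functions; in DESeq2 the equivalent is the design formula `~ batch + condition`. The helper name and the sample layout (9 samples per batch, both conditions present in each batch) are hypothetical.

```python
def design_matrix(batch: list[str], condition: list[str]) -> list[list[int]]:
    """Hypothetical helper: treatment-coded design for ~ batch + condition.
    The first sorted level of each factor is the reference (all-zero) level."""
    batch_levels = sorted(set(batch))
    cond_levels = sorted(set(condition))
    rows = []
    for b, c in zip(batch, condition):
        row = [1]  # intercept: reference batch, reference condition
        row += [int(b == lvl) for lvl in batch_levels[1:]]  # batch indicator(s)
        row += [int(c == lvl) for lvl in cond_levels[1:]]   # condition indicator(s)
        rows.append(row)
    return rows

# Hypothetical layout: 18 samples, 9 per batch, both conditions in each batch.
batch = ["b1"] * 9 + ["b2"] * 9
condition = ["ctrl"] * 5 + ["trt"] * 4 + ["ctrl"] * 4 + ["trt"] * 5
X = design_matrix(batch, condition)
for row in X:
    print(row)
```

The condition coefficient (last column) is then estimated within batches, so a shift between batches is absorbed by the batch coefficient instead of being reported as differential expression. This only works because each batch contains samples from both conditions; if batch and condition were completely confounded, the two columns could not be separated.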

edgeR is essentially only making assumptions about the mean-variance relationship. It does not make any assumptions about the distribution of genewise expression levels.
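To make "only the mean-variance relationship" concrete: edgeR models each gene's count in each sample as negative binomial, with a log-linear mean built from the library size and the design-matrix coefficients, and a variance of mu + phi * mu^2 for dispersion phi. The Python sketch below writes out those two formulas for a single gene; the coefficient values and library size are hypothetical. Nothing in it refers to how expression levels are distributed across genes.

```python
import math

def nb_mean(lib_size: float, x_row: list[float], beta: list[float]) -> float:
    """Fitted mean for one gene in one sample, log link:
    log(mu) = log(lib_size) + sum_j x_row[j] * beta[j]."""
    eta = math.log(lib_size) + sum(x * b for x, b in zip(x_row, beta))
    return math.exp(eta)

def nb_variance(mu: float, dispersion: float) -> float:
    """Negative binomial mean-variance relationship: var = mu + phi * mu^2.
    A dispersion of 0 reduces to the Poisson case (var = mu)."""
    return mu + dispersion * mu ** 2

# Hypothetical coefficients for one gene: [intercept, batch2, condition]
beta = [math.log(1e-5), 0.8, 0.5]
lib_size = 1e6  # hypothetical library size
mu_b1 = nb_mean(lib_size, [1, 0, 0], beta)  # batch 1, reference condition
mu_b2 = nb_mean(lib_size, [1, 1, 0], beta)  # batch 2, reference condition
print(mu_b1, mu_b2, nb_variance(mu_b1, 0.1))
```

Moving from batch 1 to batch 2 multiplies the expected count by exp(0.8); that multiplicative shift is how the batch term absorbs a batch effect, while the variance stays tied to the mean through the same dispersion.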
