Question

Can I run edgeR or DESeq2 on bimodal data caused by a batch effect?

3.9 years ago

jeltje.van.baren ▴ 80

I have count data for 18 samples from two conditions, sequenced in two batches. I would normally run the DE analysis with batch as a covariate.

A density plot of the scaled CPM data shows a bimodal distribution. Density plots of the two batches separately each show a single-peaked, more-or-less normal distribution, so my conclusion is that the bimodality is caused by the batches.

I know that when running batch correction with ComBat you need to select the non-parametric correction if the distribution isn't normal, because the regular parametric method assumes a normal distribution. I also know you're supposed to run DE on the raw counts without corrections, so I can't use the batch-corrected data. What I don't fully understand is how DE programs subtract out the covariates, and whether they need a normally distributed dataset to do so.

So my question is: can I run edgeR or DESeq2 with batch as a covariate the way I normally do, or will the bimodality cause issues?

RNA-Seq • 1.3k views

ADD COMMENT • link updated 2.1 years ago by Gordon Smyth ★ 7.7k • written 3.9 years ago by jeltje.van.baren ▴ 80

Hello, could you please add the code and the resulting figure to your post? It is difficult to follow based on a textual description alone. Thanks.

ADD REPLY • link 3.9 years ago by ATpoint 85k

score 0 · Answer 1 · 2022-10-28

edgeR makes no assumptions about the shape of the CPM distribution, so bimodality does not present any problem. You can simply run the DE analysis with batch as a blocking factor. The same would be true for DESeq2.

edgeR essentially makes assumptions only about the mean-variance relationship. It does not make any assumptions about the distribution of genewise expression levels.
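To illustrate why a blocking factor handles this, here is a minimal sketch in plain Python. It is not what edgeR or DESeq2 actually do: they fit negative binomial GLMs to raw counts, whereas this uses ordinary least squares on made-up, noise-free log-scale values for a single gene. The sample layout, the effect sizes, and all function and variable names are assumptions chosen for illustration. The point is only that a batch column in the design matrix absorbs the batch shift, so the pooled values can be bimodal without affecting the condition estimate.

```python
# Illustrative stand-in only: ordinary least squares on log-scale values,
# NOT the negative binomial GLM that edgeR and DESeq2 fit to raw counts.


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(design, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical layout: 18 samples, two batches, two conditions, and
# deliberately unbalanced (batch 0 is mostly control, batch 1 mostly treated).
batch = [0] * 9 + [1] * 9
condition = [0] * 6 + [1] * 3 + [0] * 3 + [1] * 6

# Hypothetical noise-free log-expression for one gene. The large batch shift
# makes the pooled values cluster around two separate modes.
TRUE_BATCH_SHIFT = 3.0
TRUE_CONDITION_EFFECT = 1.0
y = [5.0 + TRUE_BATCH_SHIFT * b + TRUE_CONDITION_EFFECT * c
     for b, c in zip(batch, condition)]

# Design matrix with columns: intercept, batch, condition (like ~ batch + condition).
design = [[1.0, float(b), float(c)] for b, c in zip(batch, condition)]
intercept, batch_coef, condition_coef = fit_least_squares(design, y)

# Naive comparison that ignores batch: difference of condition means.
treated = [v for v, c in zip(y, condition) if c == 1]
control = [v for v, c in zip(y, condition) if c == 0]
naive_difference = sum(treated) / len(treated) - sum(control) / len(control)

print(f"condition effect with batch in the design: {condition_coef:.2f}")
print(f"condition effect ignoring batch:           {naive_difference:.2f}")
```

In this toy setup the fit with batch in the design recovers the simulated condition effect of 1, while the comparison that ignores batch gives 2, because the batch shift leaks into the condition means. Nothing in the fit depends on the pooled values having a single peak.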