Question

Batch effect correction for bulk RNA-seq

2.8 years ago

bobbybobbobbo • 0

I have a bulk RNA-seq dataset made up of control and treatment conditions for a range of cell lines. This dataset was generated in two batches, such that the cell lines are split between batches but both the treatment and control for each cell line are within the same batch. As the cell lines are very different, I'm looking to do DE analysis for each cell line and then compare these DE responses between cell lines.

I'm wondering whether batch effect correction is useful here, since I'm only ever comparing DE values and not the raw counts themselves. Also, the baseline expression values for the cell lines are so different that they already constitute a strong biological batch effect, one that is perfectly correlated with the technical batch effect and that batch correction therefore wouldn't be able to separate out.
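To make the "only comparing DE values" reasoning concrete, here is a minimal sketch. It assumes the batch effect is a purely multiplicative factor that hits treatment and control equally, which real data will not guarantee; the numbers and the `batch_factor` name are made up for illustration.

```python
import math

def log2_fold_change(treated, control):
    """Log2 ratio of treated to control expression."""
    return math.log2(treated / control)

# Hypothetical expression values for one gene in one cell line.
control, treated = 100.0, 400.0

# Assumption: the batch scales every sample in it by the same factor.
batch_factor = 3.0

lfc_without_batch = log2_fold_change(treated, control)
lfc_with_batch = log2_fold_change(treated * batch_factor, control * batch_factor)

# The shared factor cancels in the ratio, so both values are the same.
print(lfc_without_batch, lfc_with_batch)
```

Under that assumption the within-batch fold change is untouched. A batch problem that changes the variability, or that affects treatment and control differently, would not cancel this way.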

statistics batch-effect RNA-seq • 1.7k views

updated 22 months ago by Ram 44k • written 2.8 years ago by bobbybobbobbo • 0

score 0 · Answer 1 · 2022-02-13

Could it still be detrimental to the study? The short answer is yes. There are tons of reasons why, so I'll choose a very easy one. Suppose one of the batches was contaminated and run without anyone noticing. Perhaps, for instance, some liquid was spilled into it, displacing 90% of what should be there and leaving only 10%.

The net effect could be, for instance, much higher variability in the results, so that the differential expression values themselves fluctuate. This increased variability makes that batch less likely to agree with the others, ultimately driving down your test statistics overall; in other words, increasing Type II error. However, this and other problems could drive up Type I error as well.
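A rough illustration of the variability point, with made-up numbers: both datasets below have the same mean difference between control and treated, but the second is much noisier. The `welch_t` helper is just a hand-rolled Welch t-statistic for this sketch, not what a DE package actually computes.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch t-statistic for the difference in means (y minus x)."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / standard_error

# Hypothetical log-expression values for one gene.
clean_control = [10.0, 11.0, 9.0, 10.0]
clean_treated = [12.0, 13.0, 11.0, 12.0]

# Same group means (10 and 12), but a much larger spread.
noisy_control = [6.0, 14.0, 8.0, 12.0]
noisy_treated = [8.0, 16.0, 10.0, 14.0]

t_clean = welch_t(clean_control, clean_treated)
t_noisy = welch_t(noisy_control, noisy_treated)

# The same effect size gives a much smaller statistic in the noisy data.
print(round(t_clean, 2), round(t_noisy, 2))
```

The effect is identical in both cases; only the noise differs, and that alone is enough to pull the statistic down.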

How do you determine if there is a batch effect? Personally, I would say this: you should know for sure whether there is a batch effect by the time you are done with your QC.

There are lots of ways to do this. I generally get a lot from looking at each sample's scores on the top 10 PCs. If the samples group by batch rather than by case versus control, then the batch effect is strong relative to the effect you are trying to measure. There are other ways to detect a batch effect too, e.g. a sample similarity plot or a difference (distance) plot. See the DESeq2 vignette for examples, and search for more if anything is unclear.
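As a crude numeric stand-in for eyeballing a PCA or distance plot, one could compare how close samples are when they share a batch versus when they share a condition. This is only a sketch: the expression values, sample names, and the `mean_distance` helper are all hypothetical, and a real check would use properly transformed counts.

```python
import math
from itertools import combinations

# Hypothetical log-expression profiles (3 genes per sample).
samples = {
    "b1_ctrl": {"batch": 1, "cond": "ctrl", "expr": [5.0, 7.0, 3.0]},
    "b1_treat": {"batch": 1, "cond": "treat", "expr": [5.5, 7.4, 3.2]},
    "b2_ctrl": {"batch": 2, "cond": "ctrl", "expr": [8.0, 9.9, 6.1]},
    "b2_treat": {"batch": 2, "cond": "treat", "expr": [8.6, 10.3, 6.2]},
}

def mean_distance(key):
    """Mean Euclidean distance over sample pairs that share `key`."""
    distances = [
        math.dist(a["expr"], b["expr"])
        for a, b in combinations(samples.values(), 2)
        if a[key] == b[key]
    ]
    return sum(distances) / len(distances)

within_batch = mean_distance("batch")
within_cond = mean_distance("cond")

# If same-batch samples sit closer together than same-condition samples,
# the samples are grouping by batch rather than by condition.
batch_dominates = within_batch < within_cond
print(within_batch, within_cond, batch_dominates)
```

In this toy data the same-batch pairs are far closer than the same-condition pairs, which is the pattern to worry about.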

You could also test control 1 vs control 2 vs control 3, etc. This would mean you are essentially testing whether the controls differ by batch. If you find really strong differences between your controls, that's not a good sign.
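A toy version of the control-vs-control idea, assuming log2-scale values and an arbitrary cutoff of 1 (a two-fold change). A real analysis would run a proper DE test between the control groups rather than a bare threshold; the gene names and numbers here are invented.

```python
from statistics import mean

# Hypothetical log2 expression of control samples, per batch and gene.
controls = {
    "batch1": {"geneA": [5.0, 5.2], "geneB": [7.1, 6.9], "geneC": [3.0, 3.1]},
    "batch2": {"geneA": [8.0, 8.1], "geneB": [7.0, 7.2], "geneC": [6.2, 6.0]},
}
threshold = 1.0  # assumed cutoff on the absolute log2 difference

shifted = []
for gene in controls["batch1"]:
    diff = mean(controls["batch2"][gene]) - mean(controls["batch1"][gene])
    if abs(diff) > threshold:
        shifted.append(gene)

# Genes whose controls differ strongly between the two batches.
print(shifted)
```

If a large share of genes ends up in `shifted`, the controls are not comparable across batches.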

What to do if you have a batch effect?

Control for it as a covariate
Run the experiment as a meta-analysis
Use a random effects model with a blocking variable (for batch) to account for that variability
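To show what the first option does, here is a small least-squares sketch on invented data for a single gene, fitted with and without a batch column. The `solve` and `ols` helpers are written just for this example; in practice you would put batch in the design of your DE tool. It also assumes batch is not perfectly confounded with another term in the model, otherwise the batch coefficient cannot be estimated separately.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(design, y):
    """Least-squares coefficients and residual sum of squares."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    coef = solve(xtx, xty)
    rss = sum((yi - sum(c * x for c, x in zip(coef, row))) ** 2
              for row, yi in zip(design, y))
    return coef, rss

# Hypothetical log-expression of one gene; (treatment, batch) per sample.
labels = [(0, 0), (0, 0), (1, 0), (1, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
y = [5.0, 5.2, 6.0, 6.2, 8.0, 8.2, 9.0, 9.2]

without_batch = [[1.0, t] for t, _ in labels]   # intercept + treatment
with_batch = [[1.0, t, b] for t, b in labels]   # intercept + treatment + batch

coef_without, rss_without = ols(without_batch, y)
coef_with, rss_with = ols(with_batch, y)

# The batch column absorbs the shift between batches, so the residual
# sum of squares drops sharply once it is in the model.
print(coef_with, rss_without, rss_with)
```

In this balanced toy design the treatment estimate is the same either way; what the batch term buys is a much smaller residual, and so a more sensitive test.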

TL;DR: it costs very little to just add a term for batch. The larger point, again, is that if you know how to read the QC plots, you should have a good idea going in whether there is a batch effect or not.