Batch Effect Removal on non-Linearly independent samples
16 months ago
James • 0

The wet lab that I work with did a bulk RNA-Seq experiment with wild-type (WT) and diseased cells. They then treated some of the WT and some of the diseased cells with an RNA methyltransferase to see if it rescued the diseased state. Here are the treatment status, batch, and disease group for each sample, in sample order.

Treatment (T=treated; U=untreated):

  • T U U U T T T T U U U T

Batch:

  • 1 2 3 4 5 5 5 5 3 4 2 1

Disease Group:

  • 1 1 1 1 1 1 2 2 2 2 2 2

To point out the problem, it seems that the Treated samples are in batches 1 and 5, but the untreated samples are in separate batches. If I perform batch effect removal inputting the batches and the Disease groups, wouldn't this cancel out the Treatment effects? What should I do in this situation? I wasn't involved in the wet lab part of this experiment and I wasn't consulted on the planning of the experiment.
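To make the confounding concrete, here is a minimal Python sketch that tabulates the annotations listed above. The helper name `levels_per_batch` is made up for illustration; only the three annotation vectors come from the experiment.

```python
from collections import defaultdict

# Sample annotations copied from the lists above, in sample order.
treatment = "T U U U T T T T U U U T".split()
batch = [1, 2, 3, 4, 5, 5, 5, 5, 3, 4, 2, 1]
disease = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]

def levels_per_batch(factor, batch):
    """Map each batch to the set of factor levels observed in it (hypothetical helper)."""
    seen = defaultdict(set)
    for level, b in zip(factor, batch):
        seen[b].add(level)
    return dict(seen)

treatment_by_batch = levels_per_batch(treatment, batch)
disease_by_batch = levels_per_batch(disease, batch)

# A factor is fully determined by batch when every batch holds a single level of it.
treatment_confounded = all(len(v) == 1 for v in treatment_by_batch.values())
disease_confounded = all(len(v) == 1 for v in disease_by_batch.values())

for b in sorted(treatment_by_batch):
    print("batch", b, sorted(treatment_by_batch[b]), sorted(disease_by_batch[b]))
print("treatment determined by batch:", treatment_confounded)
print("disease determined by batch:", disease_confounded)
```

Every batch contains only treated or only untreated samples, while both disease groups appear in every batch. So batch correction can separate batch from disease, but not batch from treatment.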

Also, if I perform batch removal, can I use all 6 samples from each Disease group to compare differentially expressed genes, since the Treatment effect will also have been removed? Ideally, I would like to keep the analysis faithful to what was planned, but any tips, suggestions, or advice would be helpful.

batch rna-seq ngs • 825 views

What does "batch" mean in this context? Is it sequencing batch? Experimental batch? Maybe the lab considers the treatment a different batch where in practice someone else might consider it the same batch. In some cases treatments and controls can't be done together for technical reasons. I would clear this up before jumping to conclusions.

16 months ago
LChart 4.6k

So "rescuing the disease state" to me means a difference in differences of the form:

(Treated Disease vs Untreated Disease) vs (Untreated Disease vs Untreated Control)

The second part (Untreated Disease vs Untreated Control) lies entirely within batches 2, 3, and 4, each of which contains both groups, so it is OK; you can define the "disease" signature without trouble.

The first part (Treated Disease vs Untreated Disease) is, as you mention, stratified by (5,1) vs (2,3,4). A batch effect correction method won't complain because you also have (Treated WT vs Untreated WT) in these batches as well; and correcting for batch will correct for the average treatment effect for both WT and disease; leaving any residual (Disease x Treatment) effect.

Unfortunately, this is a perfect confound, and there's very little you can do. You can use the variability among batches 2, 3, 4 to set a prior on the magnitude of batch effects for 1, 5 -- and in the case of a strong treatment effect this will reduce the magnitude of correction; but the correction will still be in the "direction" of treatment. Ultimately there is no way to distinguish between treatment effects and batch effects under this fixed-effects design.

Funnily enough you can compare (Treated Disease vs Treated Control) vs (Untreated Disease vs Untreated Control) without issue since both of these are balanced. This will actually let you make statements about how treatment impacts the differentially expressed genes; but you have to make assumptions about whether it does this by making disease look more like control; or by making control look more like disease. While you cannot definitively rule out that there is a treatment effect on controls that is then "canceled out" by a similarly large -- but opposite -- batch effect; if the (disease vs control) effects are large compared to the batch 2,3,4 effects, then you can make a compelling case that "successful treatment" is far more likely than "pernicious batch effect."
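As a hedged illustration of why this balanced comparison survives the confound, the sketch below simulates one gene on the real sample layout. All effect sizes (`BASE`, `DISEASE_EFFECT`, `TREATMENT_EFFECT`, `INTERACTION`, `BATCH_EFFECT`) are hypothetical numbers chosen for the example, and there is no noise term; it is not a substitute for a proper model fit.

```python
from collections import defaultdict
from statistics import fmean

# Sample layout from the question.
treatment = "T U U U T T T T U U U T".split()
batch = [1, 2, 3, 4, 5, 5, 5, 5, 3, 4, 2, 1]
disease = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]  # 1 = control, 2 = disease

# Hypothetical effects for one gene (assumptions, not estimates from real data).
BASE, DISEASE_EFFECT, TREATMENT_EFFECT, INTERACTION = 10, 4, 1, -3
BATCH_EFFECT = {1: 2, 2: -1, 3: 0, 4: 1, 5: -2}

def simulate(t, b, d):
    """Noise-free expression value under an additive model with interaction."""
    treated, diseased = t == "T", d == 2
    return (BASE + DISEASE_EFFECT * diseased + TREATMENT_EFFECT * treated
            + INTERACTION * (treated and diseased) + BATCH_EFFECT[b])

expr = [simulate(t, b, d) for t, b, d in zip(treatment, batch, disease)]

# Disease-minus-control difference inside each batch: the batch effect cancels.
by_batch = defaultdict(lambda: {1: [], 2: []})
for y, b, d in zip(expr, batch, disease):
    by_batch[b][d].append(y)
diff = {b: fmean(g[2]) - fmean(g[1]) for b, g in by_batch.items()}

treated_batches = sorted({b for t, b in zip(treatment, batch) if t == "T"})
untreated_batches = sorted({b for t, b in zip(treatment, batch) if t == "U"})

# (Treated Disease vs Treated Control) vs (Untreated Disease vs Untreated Control)
did = (fmean([diff[b] for b in treated_batches])
       - fmean([diff[b] for b in untreated_batches]))

# Naive (Treated Disease vs Untreated Disease): batch effects do not cancel here.
td = [y for y, t, d in zip(expr, treatment, disease) if t == "T" and d == 2]
ud = [y for y, t, d in zip(expr, treatment, disease) if t == "U" and d == 2]
naive = fmean(td) - fmean(ud)

print("difference in differences:", did, "| simulated interaction:", INTERACTION)
print("naive treated-vs-untreated in disease:", naive,
      "| simulated value:", TREATMENT_EFFECT + INTERACTION)
```

In this toy setup the within-batch difference in differences recovers the simulated interaction regardless of the batch effects, while the naive treated-vs-untreated comparison in disease is shifted by them.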

One way to do this is to correct for the batch factor via a random effect, rather than a fixed effect. As far as I know, the most direct way to accomplish this at the moment is to switch from DESeq2/edgeR over to limma so you can use the duplicateCorrelation function to specify batch as a random effect; alternatively, you could take vst/voom/logTPM expression values and fit a mixed linear model using lmer. I would recommend the former.


Thanks for the quick response! So the second differences in differences makes sense to me as it's comparing the differentially expressed genes, but I'm a bit confused on the first differences in differences: "(Treated Disease vs Untreated Disease) vs (Untreated Disease vs Untreated Control)" - what exactly would this tell me? "Treated Disease vs Untreated Disease" would tell me which genes are DEX due to treatment on the diseased state. "Untreated Disease vs Untreated Control" would tell me which genes are DEX in the disease. I don't understand how comparing these two differences would give me information about the rescue.

Thanks for the advice with the second comparison though. Will definitely try that!

Edit: Also, thinking about this more, how would I compare a difference of differences with DESeq2 or similar software? I feel like I've only ever compared one group to another.


Because if (Treated Disease vs Untreated Disease) is equal and opposite to (Untreated Disease vs Untreated Control), then the treatment effect (on diseased cells) completely ablates the disease effect itself. From the perspective of "reversal," the effect of treatment on WT cells carries no information.
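A minimal sketch of that "equal and opposite" check, per gene, could look like the following. The function name `rescue_fraction` and the log2 fold changes are hypothetical, made up for illustration; and in this particular design the treated-vs-untreated fold change is still confounded with batch, so the numbers would need to be read with that caveat.

```python
def rescue_fraction(disease_lfc, treatment_in_disease_lfc):
    """Fraction of the disease effect reversed by treatment (hypothetical helper).

    disease_lfc: log2FC of Untreated Disease vs Untreated Control.
    treatment_in_disease_lfc: log2FC of Treated Disease vs Untreated Disease.
    1.0 means equal and opposite (complete reversal), 0.0 means no reversal,
    and negative values mean treatment pushes further in the disease direction.
    """
    if disease_lfc == 0:
        raise ValueError("no disease effect to rescue for this gene")
    return -treatment_in_disease_lfc / disease_lfc

# Hypothetical log2 fold changes: (disease effect, treatment effect in disease).
example_lfcs = {
    "geneA": (2.0, -2.0),   # treatment effect equal and opposite to disease effect
    "geneB": (-1.5, 0.75),  # treatment reverses part of the disease effect
    "geneC": (3.0, 1.5),    # treatment moves further in the disease direction
}

rescue = {gene: rescue_fraction(d, t) for gene, (d, t) in example_lfcs.items()}
for gene, value in rescue.items():
    print(gene, value)
```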
