Question

Batch correction confounded by outcome distribution across batches

3.6 years ago

boris ▴ 10

I'm having trouble figuring out how to correct for batch effects in my experiment. I have multiple batches to correct, and each batch has a different marginal distribution of the outcome. With ComBat or ComBat-seq I can stratify by the outcome, but then I can't properly test the model on held-out data for which outcomes are not known. Correcting without controlling for the outcome risks confounding batch with the outcome distribution and, both empirically and as expected, removes signal from the data. The only way I see to proceed is to correct with the outcome known, then test out of sample by pretending that the batch-affected held-out data are basically equivalent to batch-corrected data. Alternatively, I can skip batch correction, but the batch effect is quite strong and includes differences in read depth.
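To make the concern concrete, here is a minimal simulated sketch of how outcome-blind correction can attenuate signal. It is a toy with one feature and two batches, not ComBat itself: the "naive" correction is plain per-batch mean centering, and the sample sizes, outcome proportions, shift, and effect size are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_batch(n, frac_positive, batch_shift, effect=2.0):
    """Return (x, y): one feature with an outcome effect plus an additive batch shift."""
    y = (rng.random(n) < frac_positive).astype(float)
    x = effect * y + batch_shift + rng.normal(0.0, 1.0, n)
    return x, y

# Batch A is mostly negative, batch B mostly positive (hypothetical proportions).
xa, ya = simulate_batch(2000, 0.2, batch_shift=0.0)
xb, yb = simulate_batch(2000, 0.8, batch_shift=5.0)
y_all = np.concatenate([ya, yb])

def outcome_gap(x, y):
    """Pooled difference in mean feature value between positive and negative samples."""
    return x[y == 1].mean() - x[y == 0].mean()

# Naive correction: center each batch on its own mean, ignoring the outcome.
# Each batch mean also contains the outcome effect, so part of the signal is removed.
x_naive = np.concatenate([xa - xa.mean(), xb - xb.mean()])

# Oracle correction: subtract only the true batch shift (known here because we simulated it).
x_oracle = np.concatenate([xa - 0.0, xb - 5.0])

gap_naive = outcome_gap(x_naive, y_all)
gap_oracle = outcome_gap(x_oracle, y_all)
print(f"pooled outcome gap, naive centering: {gap_naive:.2f}")
print(f"pooled outcome gap, oracle shift:    {gap_oracle:.2f}")
```

In this setup the pooled outcome gap after naive centering should come out noticeably smaller than the simulated effect of 2.0, because each batch mean absorbs part of the outcome effect in proportion to that batch's outcome prevalence.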

Thanks in advance for any advice!

batch correction • 880 views

updated 3.1 years ago by madbadradscientist ▴ 20 • written 3.6 years ago by boris ▴ 10

score 0 · Answer 1 · 2022-04-06

I've created a new method called ConDo, which was designed to solve exactly this problem! At training time it conditions on the confounding variables, but it learns a single optimal linear transform that is applied to all samples regardless of the values of those variables. You can therefore apply it to test data even when you don't know the values of the confounders.
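The general idea (condition on the outcome when fitting, then apply an outcome-free transform to new samples) can be illustrated with a deliberately simplified sketch. This is not ConDo's algorithm or API: it swaps in ordinary least squares with a batch indicator and estimates only an additive shift for a single feature. The simulated data, the function names `simulate` and `correct_batch_b`, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, frac_positive, batch_shift, effect=2.0):
    """One feature = outcome effect + additive batch shift + unit Gaussian noise."""
    y = (rng.random(n) < frac_positive).astype(float)
    x = effect * y + batch_shift + rng.normal(0.0, 1.0, n)
    return x, y

# Training data: outcome is known and its prevalence differs between batches.
xa, ya = simulate(2000, 0.2, 0.0)   # reference batch
xb, yb = simulate(2000, 0.8, 5.0)   # batch to be corrected

# Fit x ~ intercept + outcome + batch indicator by ordinary least squares,
# so the batch coefficient is estimated while adjusting for the outcome.
x = np.concatenate([xa, xb])
y = np.concatenate([ya, yb])
is_b = np.concatenate([np.zeros_like(xa), np.ones_like(xb)])
design = np.column_stack([np.ones_like(x), y, is_b])
coef, *_ = np.linalg.lstsq(design, x, rcond=None)
batch_offset = coef[2]

def correct_batch_b(x_new):
    """Map batch-B samples onto the reference batch; requires no outcome labels."""
    return x_new - batch_offset

# Held-out batch-B samples: outcomes exist but are never passed to the correction.
x_test, y_test = simulate(1000, 0.8, 5.0)
x_test_corrected = correct_batch_b(x_test)

# Evaluation only: the outcome gap in the corrected held-out data.
test_gap = x_test_corrected[y_test == 1].mean() - x_test_corrected[y_test == 0].mean()
print(f"estimated batch offset: {batch_offset:.2f}")
print(f"held-out outcome gap after correction: {test_gap:.2f}")
```

The point of the design is that the outcome enters only the fitting step; the correction applied to held-out samples depends on batch membership alone, so it can be used when outcomes are unknown.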

Preprint is here: https://arxiv.org/abs/2203.12720

Software is here: https://github.com/calvinmccarter/condo-adapter

Please feel free to reach out here if you have issues using the software, or open a GitHub issue.