I'm having trouble figuring out how to correct for batch effects in my experiment. I have multiple batches to correct for, and each batch has a different marginal distribution of the outcome. With ComBat or ComBat-seq, I can adjust for the outcome during correction. But then I can't properly test the model on held-out data for which outcomes are not known. Correcting without controlling for outcome risks confounding batch with the outcome distribution and, both empirically and as expected, removes signal from the data. The only way I see to proceed is to correct with the outcome known, then test out of sample by pretending that the held-out data are basically equivalent to the batch-corrected data. Alternatively, I can skip batch correction, but the batch effect is quite strong and includes differences in read depth.
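To make the concern concrete, here is a minimal toy sketch in Python. It is not ComBat or ComBat-seq: it simulates a single feature, uses plain per-batch mean centering as the outcome-blind correction, and uses an ordinary least-squares fit as the outcome-aware one. All function names, effect sizes, batch shifts, and case fractions are made up for illustration.

```python
import numpy as np


def simulate(n_per_batch=2000, effect=1.0, batch_shift=(0.0, 3.0),
             case_frac=(0.2, 0.8), seed=0):
    """Simulate one feature in two batches with different outcome fractions.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    batch = np.repeat([0, 1], n_per_batch)
    # Each batch gets its own fraction of cases (outcome == 1).
    outcome = np.concatenate(
        [(rng.random(n_per_batch) < p).astype(float) for p in case_frac]
    )
    shift = np.asarray(batch_shift)[batch]
    expr = effect * outcome + shift + rng.normal(0.0, 1.0, size=batch.size)
    return expr, batch, outcome


def center_by_batch(expr, batch):
    """Outcome-blind correction: subtract each batch's own mean."""
    out = expr.copy()
    for b in np.unique(batch):
        out[batch == b] -= expr[batch == b].mean()
    return out


def adjust_with_outcome(expr, batch, outcome):
    """Outcome-aware correction: fit expr ~ 1 + batch + outcome by least
    squares, then remove only the fitted batch term."""
    X = np.column_stack([np.ones_like(expr), batch.astype(float), outcome])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return expr - beta[1] * batch


def pooled_effect(expr, outcome):
    """Difference in mean expression between cases and controls, pooled
    across batches."""
    return expr[outcome == 1].mean() - expr[outcome == 0].mean()


if __name__ == "__main__":
    expr, batch, outcome = simulate()
    print("uncorrected:  ", pooled_effect(expr, outcome))
    print("outcome-blind:", pooled_effect(center_by_batch(expr, batch), outcome))
    print("outcome-aware:",
          pooled_effect(adjust_with_outcome(expr, batch, outcome), outcome))
```

In this toy setup, the outcome-blind centering should shrink the pooled case/control difference below the simulated effect, while the outcome-aware fit should roughly recover it. The catch is that the outcome-aware version needs labels, which is exactly what the held-out data lack.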
Thanks in advance for any advice!