Question

Is ComBat unable to handle only two batches? What are good alternatives?

5 months ago

samuelandjw ▴ 270

Final Edit: The lesson here is not to convert variables with only two levels to factors. Just leave them as numeric.

Edit 2: Sorry, I wrongly inferred that ComBat cannot handle 2 batches.

Original: I have an Olink NPX dataset with only 2 batches that needs batch correction. I tried ComBat, but it stops with the error message "At least one covariate is confounded with batch! Please remove confounded covariates and rerun ComBat", with or without the design matrix. I looked into the code, and it seems that ComBat is not able to handle batch correction with only 2 batches.

Am I correct? What are good alternatives for batch-correcting a dataset with only 2 batches?

Edit 1:

While I cannot post the covariates and batch variables directly, I can provide the correlation matrix of design, the matrix ComBat uses to infer confounding:

structure(c(1, -1, 0, 0, -0.0260011520030977, 0.0791694781327139,
-0.0460764787433406, 0.108519031075261, 0.0123805848206014, -1,
1, 0, 0, 0.0260011520030977, -0.0791694781327139, 0.0460764787433406,
-0.108519031075261, -0.0123805848206014, 0, 0, 1, -1, -0.260035694607971,
-0.608536557719787, -0.105492246706702, -0.412662510120705, 0.0596664424282021,
0, 0, -1, 1, 0.260035694607971, 0.608536557719787, 0.105492246706702,
0.412662510120705, -0.0596664424282021, -0.0260011520030977,
0.0260011520030977, -0.260035694607971, 0.260035694607971, 1,
0.264457402769079, 0.10259294363056, 0.249699029250425, 0.0500069775024256,
0.0791694781327139, -0.0791694781327139, -0.608536557719787,
0.608536557719787, 0.264457402769079, 1, 0.0162943643294796,
0.678122792929598, -0.0214816402734835, -0.0460764787433406,
0.0460764787433406, -0.105492246706702, 0.105492246706702, 0.10259294363056,
0.0162943643294796, 1, -0.000200465769279558, 0.0884104795401378,
0.108519031075261, -0.108519031075261, -0.412662510120705, 0.412662510120705,
0.249699029250425, 0.678122792929598, -0.000200465769279558,
1, -0.0710095314512266, 0.0123805848206014, -0.0123805848206014,
0.0596664424282021, -0.0596664424282021, 0.0500069775024256,
-0.0214816402734835, 0.0884104795401378, -0.0710095314512266,
1), dim = c(9L, 9L), dimnames = list(c("batch0", "batch1", "as.factor(gender)0",
"as.factor(gender)1", "age", "as.factor(smoking_status)1",
"BMI", "pack_year", "second_hand_smoking"), c("batch0",
"batch1", "as.factor(gender)0", "as.factor(gender)1", "age",
"as.factor(smoking_status)1", "BMI", "pack_year", "second_hand_smoking"
)))
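
A minimal sketch of how this error can arise, assuming the covariate matrix was built without an intercept (for example with `~ 0 + ...`), which would explain why both `as.factor(gender)0` and `as.factor(gender)1` appear above with a correlation of -1. The toy metadata and the helper `is_rank_deficient` are hypothetical, and the check only mimics the kind of rank test ComBat seems to apply to the combined batch and covariate matrix; it is not ComBat's own code.

```r
# Hypothetical toy metadata; values are made up for illustration only.
n <- 40
meta <- data.frame(
  batch  = rep(c(0, 1), each = n / 2),   # two batches of 20 samples
  gender = rep(c(0, 1), times = n / 2),  # both genders present in each batch
  age    = 30:69
)

# One indicator column per batch.
batchmod <- model.matrix(~ 0 + factor(batch), data = meta)

# Assumption: no intercept in the covariate formula, so a two-level factor
# expands into two indicator columns (gender0 and gender1).
mod_factor <- model.matrix(~ 0 + as.factor(gender) + age, data = meta)

# Leaving gender as a 0/1 numeric yields a single column instead.
mod_numeric <- model.matrix(~ 0 + gender + age, data = meta)

# Hypothetical helper: TRUE if the combined design is not of full column rank.
is_rank_deficient <- function(batchmod, mod) {
  design <- cbind(batchmod, mod)
  qr(design)$rank < ncol(design)
}

# gender0 + gender1 is a column of ones, and so is batch0 + batch1,
# hence the factor version is linearly dependent on the batch columns.
is_rank_deficient(batchmod, mod_factor)   # TRUE
is_rank_deficient(batchmod, mod_numeric)  # FALSE
```

Under this assumption, the two gender indicators sum to the same column of ones as the two batch indicators, so the design looks confounded even though gender is spread across both batches. Keeping the factor but dropping one of its indicator columns should have the same effect as leaving the variable numeric.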

ComBat batch correction • 455 views

Can you show a table summarizing the conditions of your experiment and the batches, i.e. which samples belong to which batch? The error tells you that the batches are either nested within each other or collinear with a condition.

ADD REPLY • link 5 months ago by ATpoint 86k

Thank you! Please see my edit.

ADD REPLY • link 5 months ago by samuelandjw ▴ 270

score 2 · Answer 1 · 2024-07-30

ComBat can handle 2 batches just fine as far as I know. More likely, your batches are totally aligned with your covariate of interest, thereby making it impossible to account for properly. Posting your sample metadata and code would help us determine if that is indeed the case.
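
One quick way to check for this kind of alignment is to cross-tabulate batch against the covariate of interest. The sketch below uses a hypothetical sample sheet with made-up column names (`batch`, `group`); it is only an illustration of the check, not the poster's data.

```r
# Hypothetical sample sheet; replace with the real metadata.
meta <- data.frame(
  batch = c(0, 0, 0, 1, 1, 1),
  group = c("case", "case", "case", "control", "control", "control")
)

# Rows are batches, columns are groups; cells count samples.
tab <- table(batch = meta$batch, group = meta$group)
print(tab)

# If every group occurs in exactly one batch, batch and group cannot be
# separated: any batch correction would also remove the group effect.
fully_confounded <- all(colSums(tab > 0) == 1)
fully_confounded
```

In this toy example every case is in batch 0 and every control in batch 1, so `fully_confounded` is `TRUE`. A table with non-zero counts in every cell indicates that the groups are spread across batches.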

As for solutions, if batch and the covariate of interest are fully aligned, there's nothing you can do. Put more thought into experimental design next time to ensure samples from all groups of interest are spread across batches.