Final Edit: The lesson here is: don't convert variables with only two levels to factors when building the covariate matrix. Just leave them as numeric.
Edit 2: Sorry, I wrongly inferred that ComBat cannot handle 2 batches.
Original: I have an Olink NPX dataset that needs batch correction, with only 2 batches. I tried ComBat, but it stops with the error message "At least one covariate is confounded with batch! Please remove confounded covariates and rerun ComBat", with or without the design matrix. I looked into the code and it seems that ComBat is not able to handle batch correction with only 2 batches.
Am I correct? What are good alternatives if I need to do batch correction on a dataset with only 2 batches?
Edit 1:
While I cannot post the covariates and batch variables directly, I can provide the correlation matrix of design, the structure used in ComBat to infer confounding:
structure(c(1, -1, 0, 0, -0.0260011520030977, 0.0791694781327139,
-0.0460764787433406, 0.108519031075261, 0.0123805848206014, -1,
1, 0, 0, 0.0260011520030977, -0.0791694781327139, 0.0460764787433406,
-0.108519031075261, -0.0123805848206014, 0, 0, 1, -1, -0.260035694607971,
-0.608536557719787, -0.105492246706702, -0.412662510120705, 0.0596664424282021,
0, 0, -1, 1, 0.260035694607971, 0.608536557719787, 0.105492246706702,
0.412662510120705, -0.0596664424282021, -0.0260011520030977,
0.0260011520030977, -0.260035694607971, 0.260035694607971, 1,
0.264457402769079, 0.10259294363056, 0.249699029250425, 0.0500069775024256,
0.0791694781327139, -0.0791694781327139, -0.608536557719787,
0.608536557719787, 0.264457402769079, 1, 0.0162943643294796,
0.678122792929598, -0.0214816402734835, -0.0460764787433406,
0.0460764787433406, -0.105492246706702, 0.105492246706702, 0.10259294363056,
0.0162943643294796, 1, -0.000200465769279558, 0.0884104795401378,
0.108519031075261, -0.108519031075261, -0.412662510120705, 0.412662510120705,
0.249699029250425, 0.678122792929598, -0.000200465769279558,
1, -0.0710095314512266, 0.0123805848206014, -0.0123805848206014,
0.0596664424282021, -0.0596664424282021, 0.0500069775024256,
-0.0214816402734835, 0.0884104795401378, -0.0710095314512266,
1), dim = c(9L, 9L), dimnames = list(c("batch0", "batch1", "as.factor(gender)0",
"as.factor(gender)1", "age", "as.factor(smoking_status)1",
"BMI", "pack_year", "second_hand_smoking"), c("batch0",
"batch1", "as.factor(gender)0", "as.factor(gender)1", "age",
"as.factor(smoking_status)1", "BMI", "pack_year", "second_hand_smoking"
)))
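For anyone hitting the same error, here is a minimal sketch of what the -1 correlations in the matrix above point to. The data are made up, and the assumption (inferred only from the column names, not confirmed) is that the covariate matrix was built without an intercept, so the two-level factor gets one dummy column per level. Those two columns sum to 1, exactly like batch0 + batch1, which makes the combined design rank-deficient. Only base R is used; ComBat itself is not called.

```r
# Hypothetical toy data: 2 batches, a 0/1 gender variable, and age.
n      <- 20
batch  <- factor(rep(c(0, 1), each = n / 2))
gender <- rep(c(0, 1), times = n / 2)      # balanced within each batch
age    <- seq(30, 68, length.out = n)

# One indicator column per batch: batch0, batch1.
batchmod <- model.matrix(~ -1 + batch)

# Gender as a factor in a no-intercept formula: BOTH levels get a column.
mod_factor <- model.matrix(~ 0 + as.factor(gender) + age)
design_bad <- cbind(batchmod, mod_factor)

# gender0 + gender1 equals batch0 + batch1 (a column of ones),
# so the design is rank-deficient.
bad_is_deficient <- qr(design_bad)$rank < ncol(design_bad)

# The two gender dummies are perfectly anticorrelated (approximately -1),
# the same pattern as in the posted correlation matrix.
gender_cor <- cor(design_bad)["as.factor(gender)0", "as.factor(gender)1"]

# Gender left as numeric 0/1: a single column, no redundancy.
mod_numeric <- model.matrix(~ 0 + gender + age)
design_ok   <- cbind(batchmod, mod_numeric)
ok_is_full_rank <- qr(design_ok)$rank == ncol(design_ok)

print(c(bad_is_deficient = bad_is_deficient, ok_is_full_rank = ok_is_full_rank))
```

Note that nothing here is specific to having 2 batches; the redundancy comes from the full dummy coding of the first factor in a no-intercept formula.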
Can you show a table summarizing the conditions of your experiment and the batches, i.e. which samples are in which batch? The error tells you that either the batches are nested within each other or that a batch is linearly dependent on a condition.
Thank you! Please see my edit.