I am analysing RNA-seq data from 3 groups of samples: 2 different tumour types and normal control tissue. The design is not balanced: one tumour type was sequenced in house (a separate batch), whereas the samples of the other tumour type and the normal samples were downloaded from TCGA. Despite this, I would like to remove the batch effect while retaining the biological differences between groups.
factor 1 --> group, with 3 levels (tumour type a, tumour type b, normal)
factor 2 --> class, with 2 levels (tumour, non tumour)
factor 3 --> batch (column `run` in design_data), with 25 levels corresponding to 25 different runs
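As listed, `class` is a coarsening of `group`: tumour types a and b both map to "tumour" and normal maps to "non tumour". A minimal sketch with a hypothetical six-sample table (the sample layout and level labels are assumptions for illustration, not the real data) shows that adding `class` to a model that already contains `group` adds a column but no information, so the design matrix becomes rank-deficient:

```r
# Hypothetical sample table: two samples per group (labels are illustrative only)
design_data <- data.frame(
  group = factor(c("tumour_a", "tumour_a", "tumour_b", "tumour_b", "normal", "normal")),
  class = factor(c("tumour", "tumour", "tumour", "tumour", "non_tumour", "non_tumour"))
)

# Each group falls into exactly one class, so class is a function of group
print(table(design_data$group, design_data$class))

m_group <- model.matrix(~ group, data = design_data)
m_both  <- model.matrix(~ group + class, data = design_data)

# m_both has 4 columns, but classtumour = grouptumour_a + grouptumour_b,
# so its rank stays at 3, the same as the group-only design
cat("group only: columns", ncol(m_group), "rank", qr(m_group)$rank, "\n")
cat("group + class: columns", ncol(m_both), "rank", qr(m_both)$rank, "\n")
```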
To do so, I am using ComBat as follows, but I get this error:
modcombat <- model.matrix(~as.factor(group) + as.factor(class), data=design_data)
combat_data <- ComBat(dat = y_norm,
                      batch = design_data$run,
                      mod = modcombat,
                      par.prior = TRUE,
                      prior.plots = FALSE,
                      mean.only = TRUE)
Using the 'mean only' version of ComBat
Found25batches
Adjusting for3covariate(s) or covariate level(s)
Error in ComBat(dat = y_norm, batch = design_data$run, mod = modcombat, :
At least one covariate is confounded with batch! Please remove confounded covariates and rerun ComBat
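The error comes from a rank check on the combined design: one indicator column per batch, bound to the covariate columns from `mod`, with the all-ones intercept dropped. The function below is a simplified sketch of that idea, not ComBat's source. The helper `is_confounded()` and both run layouts are hypothetical; in particular, it is an assumption (not stated above) that each of the 25 runs contains samples from a single group only.

```r
# Simplified rank check in the spirit of ComBat's confounding test (hypothetical helper)
is_confounded <- function(batch, mod) {
  batch_ind <- model.matrix(~ -1 + factor(batch))  # one indicator column per batch
  design <- cbind(batch_ind, mod)
  # Drop the intercept: the batch indicators already sum to a column of ones
  design <- design[, !apply(design, 2, function(x) all(x == 1)), drop = FALSE]
  qr(design)$rank < ncol(design)
}

# Hypothetical six-sample table
samples <- data.frame(
  group = factor(c("tumour_a", "tumour_a", "tumour_b", "tumour_b", "normal", "normal")),
  class = factor(c("tumour", "tumour", "tumour", "tumour", "non_tumour", "non_tumour"))
)
mod_both  <- model.matrix(~ group + class, data = samples)
mod_group <- model.matrix(~ group, data = samples)

# Scenario 1 (assumed): every run contains samples from a single group
run_nested <- c("r1", "r1", "r2", "r2", "r3", "r3")
# Scenario 2 (hypothetical): every run mixes samples from all groups
run_mixed <- c("r1", "r2", "r1", "r2", "r1", "r2")

is_confounded(run_nested, mod_both)   # TRUE
is_confounded(run_nested, mod_group)  # TRUE: group is nested within run
is_confounded(run_mixed,  mod_group)  # FALSE
```

Under the single-group-per-run assumption, dropping `class` alone would not clear the error, because a group effect cannot be separated from a run effect that coincides with it.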
I would appreciate any suggestions on how to specify the model so that the biological signal is not lost.
Thanks!
Could you please show a table that lists, for each sample, which covariate levels are assigned?
Here are excerpts of the sample table for each of the 3 main groups.