I'm trying to remove batch effect from my data using the sva
package. The process described here is like this:
library(bladderbatch)
data(bladderdata)
pheno = pData(bladderEset)
edata = exprs(bladderEset)
# The null model contains only the adjustment variables. Since we are not adjusting
# for any other variables in this analysis, only an intercept is included in
# the model.
mod0 = model.matrix(~1,data=pheno)
# The full model includes both the adjustment variables
# and the variable of interest
mod <- model.matrix(~1+ cancer, data=pheno)
# Identify the number of latent factors that need to be estimated.
n.sv = num.sv(edata,mod,method="leek")
# estimate the surrogate variables
svobj = sva(edata,mod,mod0,n.sv=n.sv)
# include the surrogate variables in both the null and full models
modSv = cbind(mod,svobj$sv)
mod0Sv = cbind(mod0,svobj$sv)
# Adjusting for surrogate variables using the limma package
fit = lmFit(edata,modSv)
Now I want to have a one vs all comparison. So, according to this post by Michael Love my model should be like this:
limma_mod <- model.matrix(~0 + cancer, data=pheno)
I'm confused whether I should use model.matrix(~1+ cancer, data=pheno)
as my full model and model.matrix(~1, data=pheno)
as the null model, then, for limma
, merge model.matrix(~0+ cancer, data=pheno)
with the surrogate variables, or use model.matrix(~0+ cancer, data=pheno)
as my full model and model.matrix(~1, data=pheno)
as the null model?
Note: using mod0 <- model.matrix(~0, data=pheno)
and mod <- model.matrix(~0+ cancer, data=pheno)
results in this error:
Error in solve.default(t(mod0) %*% mod0) : 'a' is 0-diml
I don't really see how your post title relates to the body of your post, but as described here if you have known batch effects you want to use either ComBat (maybe to get CPM values for heatmaps) or simply account for them in your limma design matrix (for DE analysis), whereas SVA is useful if you have unknown sources of variation. Do you still wish to use SVA? If so I can dig through some old code and probably help out for real.
Sorry about the title. I am really confused on this topic. Working with combat is really easier but I actually don't know all sources of variation in my data. I would be grateful if you coul help me.