Hi there,
After having read the vignette, I am really keen to understand in depth how the Limma functions 'removebatcheffect' actually works. Looked at the function:
# removeBatchEffect.R
# A refinement would be to empirical Bayes shrink
# the batch effects before subtracting them.
removeBatchEffect <- function(x,batch=NULL,batch2=NULL,covariates=NULL,design=matrix(1,ncol(x),1),...)
# Remove batch effects from matrix of expression data
# Gordon Smyth and Carolyn de Graaf
# Created 1 Aug 2008. Last revised 1 June 2014.
{
if(is.null(batch) && is.null(batch2) && is.null(covariates)) return(as.matrix(x))
if(!is.null(batch)) {
batch <- as.factor(batch)
contrasts(batch) <- contr.sum(levels(batch))
batch <- model.matrix(~batch)[,-1,drop=FALSE]
}
if(!is.null(batch2)) {
batch2 <- as.factor(batch2)
contrasts(batch2) <- contr.sum(levels(batch2))
batch2 <- model.matrix(~batch2)[,-1,drop=FALSE]
}
if(!is.null(covariates)) covariates <- as.matrix(covariates)
X.batch <- cbind(batch,batch2,covariates)
fit <- lmFit(x,cbind(design,X.batch),...)
beta <- fit$coefficients[,-(1:ncol(design)),drop=FALSE]
beta[is.na(beta)] <- 0
as.matrix(x) - beta %*% t(X.batch)
}
My understanding is that when you have just one known source of bias (say, different days in which an experiment was performed), covariates is = null. Thus, the function will be the following:
batch <- as.factor(batch)
contrasts(batch) <- contr.sum(levels(batch))
batch <- model.matrix(~batch)[,-1,drop=FALSE]
once a design matrix has been created..
fit <- lmFit(x,cbind(design,X.batch),...)
beta <- fit$coefficients[,-(1:ncol(design)),drop=FALSE]
beta[is.na(beta)] <- 0
as.matrix(x) - beta %*% t(X.batch)
So that's the hard bit for a lay person... So once you have your 'dds' object (I am using DESeq2), design formula (es. ~condition
), Limma's function adds the batch object just created and then it fits a linear model for each gene in the dataset.
- Is that correct?
- can anyone explain what is the beta object in this case? I guess the following strings will add the beta variable to the equation?
Sorry if my questions may be very naive for you but I am really trying to understand every single steps and in the meantime trying to study a bit of stats.
Related: https://support.bioconductor.org/p/116821/#116828