Remove Surrogate Variables and Batch
3
3
Entering edit mode
10.1 years ago
tucanj ▴ 100

Is there a way to output a gene expression matrix corrected for batch (I know I can use ComBat) and surrogate variables from SVA? I would like to visualize the results of a limma differential expression however they are confounded and thus not amenable to visualization.

R sva ComBat • 9.4k views
ADD COMMENT
11
Entering edit mode
10.1 years ago
brentp 24k

If you have:

  • y as the gene expresion matrix
  • mod as the model matrix you sent to sva (the full model)
  • svs as svobj$sv where svobj is the output from the sva function

then you can use the function below to get a "cleaned" version of the matrix with the surrogate variables removed.

cleanY = function(y, mod, svs) {
    X = cbind(mod, svs)
    Hat = solve(t(X) %*% X) %*% t(X)
    beta = (Hat %*% t(y))
    rm(Hat)
    gc()
    P = ncol(mod)
    return(y - t(as.matrix(X[,-c(1:P)]) %*% beta[-c(1:P),]))
}

I modified this from a post by Andrew Jaffe here.

ADD COMMENT
0
Entering edit mode

The joys of linear algebra :)

ADD REPLY
4
Entering edit mode
8.2 years ago
jtleek ▴ 40

It is not recommended that you perform differential expression analysis after data cleaning - the resulting degrees of freedom will be wrong. The best thing to do is include the svs as covariates in downstream linear models in limma etc. as adjustment models.

  • Jeff (co-developer of sva)
ADD COMMENT
1
Entering edit mode

Dear Jeff, is it correct to perform a Gene regulatory network (GRN) analysis after the data cleaning with the function mentioned before?

ADD REPLY
3
Entering edit mode
8.2 years ago
ddiez ★ 2.0k

With Jeff's answer in mind, if you just want to remove the effect of batch variables for visualisation (e.g. heatmap, PCA), the limma function removeBatchEffect() can be used for that purpose.

ADD COMMENT
0
Entering edit mode

Hello, could you be more clear? why just for visualization? are you saying that the Limma corrected dataset cannot be used for other purposes?

ADD REPLY
0
Entering edit mode

Take a look at ?limma::removeBatchEffect for details of other uses of the processed dataset. For differential expression it is better to use the original dataset and add batch as covariate, as indicated in @jtleek's answer.

ADD REPLY

Login before adding your answer.

Traffic: 908 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6