Adjusting for batch effect and covariates with ComBat
1
0
Entering edit mode
4.1 years ago
juara ▴ 40

Dear All,

my question is related to this post: Error in while (change > conv) { : missing value where TRUE/FALSE needed

I have a heterogeneous RNAseq dataset in TPMs from 66 samples and two sequencing batches (64 from one batch, 2 from the second batch). This dataset contains many disease types (categorical), sample types (categorical) and age (continuous) so I thought I can adjust for these covariates using ComBat. Ultimately, I would like to use this dataset for WGCNA to identify gene signatures predictive of survival. I have few questions. Here is what I have done so far:

First I removed samples with zero variance:

matrix_good  <- mymatrix[apply(mymatrix, 1, var) != 0, ]

Run a PCA and color the batches:

res.pca <- prcomp(t(matrix_good), scale = T)

groups <- as.factor(metadata$Sequencing.Batch)
fviz_pca_ind(res.pca, label = "none", col.ind = groups,addEllipses = FALSE)

download

While I understand there is only two samples in the second batch, I think I should adjust for them. Or do you think I should just simply remove them from the analysis?

Run a PCA and color the disease types:

groups <- as.factor(metadata$disease)
fviz_pca_ind(res.pca, label = "none", col.ind = groups, addEllipses = FALSE)

disas

Some disease types cluster together (for example the dark yellow + on the bottom right) and so I need to adjust for this and possibly age.

Now I use the ComBat function to correct for batch effect while adjusting for disease type and age:

batch <- metadata$Sequencing.Batch
modcombat <- model.matrix(~ as.factor(disease) + AGE, data= metadata)
combat_tpm = ComBat(dat=matrix_good, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)

I have two questions here. Shouldn't PCA analysis of combat_tpm look more homogenous than the original dataset?

res.pca_combat <- prcomp(t(combat_tpm), scale = T)
groups <- as.factor(metadata$disease)
fviz_pca_ind(res.pca_combat, label = "none",col.ind = groups, addEllipses = FALSE)

disas1

Maybe I am not familiar with typical corrections with such tools, but It does not look to me that the confounding covariate is adjusted? I was expecting to see more uniform distribution of my samples.

and my second question. If I exclude AGE from my model, it gives me an error and I can not understand why.

batch <- metadata$Sequencing.Batch
modcombat <- model.matrix(~ as.factor(disease), data= metadata)
combat_tpm = ComBat(dat=matrix_good, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)

Error in while (change > conv) {: missing value where TRUE/FALSE needed

based on the post above, I thought this error occurs when variance is very low, but here, my expression data is the same. Interestingly, in line with the post above, it seems to run well (but a lot longer) if, par.prior=FALSE. I get the same error if I quantile normalize the data before running ComBat.

I appreciate if you could help me understand this better.

Thank you

batch-effect RNA-Seq combat • 3.1k views
ADD COMMENT
0
Entering edit mode
17 months ago

For your first question that is "Shouldn't PCA analysis of combat_tpm look more homogenous than the original dataset?", NO, because you included variables "disease" and "AGE" in the model matrix. SO ComBat will consider these two as your biological variables of interest. Then, it PRESERVES the effect of these two in your data and you will see their biological effect in your clustering in PCA. Actually you are doing the opposite of what you want.

ADD COMMENT

Login before adding your answer.

Traffic: 2374 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6