Dear All,
my question is related to this post: Error in while (change > conv) { : missing value where TRUE/FALSE needed
I have a heterogeneous RNAseq dataset in TPMs from 66 samples and two sequencing batches (64 from one batch, 2 from the second batch). This dataset contains many disease types (categorical), sample types (categorical) and age (continuous) so I thought I can adjust for these covariates using ComBat. Ultimately, I would like to use this dataset for WGCNA to identify gene signatures predictive of survival. I have few questions. Here is what I have done so far:
First I removed samples with zero variance:
matrix_good <- mymatrix[apply(mymatrix, 1, var) != 0, ]
Run a PCA and color the batches:
res.pca <- prcomp(t(matrix_good), scale = T)
groups <- as.factor(metadata$Sequencing.Batch)
fviz_pca_ind(res.pca, label = "none", col.ind = groups,addEllipses = FALSE)
While I understand there is only two samples in the second batch, I think I should adjust for them. Or do you think I should just simply remove them from the analysis?
Run a PCA and color the disease types:
groups <- as.factor(metadata$disease)
fviz_pca_ind(res.pca, label = "none", col.ind = groups, addEllipses = FALSE)
Some disease types cluster together (for example the dark yellow + on the bottom right) and so I need to adjust for this and possibly age.
Now I use the ComBat
function to correct for batch effect while adjusting for disease type and age:
batch <- metadata$Sequencing.Batch
modcombat <- model.matrix(~ as.factor(disease) + AGE, data= metadata)
combat_tpm = ComBat(dat=matrix_good, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)
I have two questions here. Shouldn't PCA analysis of combat_tpm
look more homogenous than the original dataset?
res.pca_combat <- prcomp(t(combat_tpm), scale = T)
groups <- as.factor(metadata$disease)
fviz_pca_ind(res.pca_combat, label = "none",col.ind = groups, addEllipses = FALSE)
Maybe I am not familiar with typical corrections with such tools, but It does not look to me that the confounding covariate is adjusted? I was expecting to see more uniform distribution of my samples.
and my second question. If I exclude AGE
from my model, it gives me an error and I can not understand why.
batch <- metadata$Sequencing.Batch
modcombat <- model.matrix(~ as.factor(disease), data= metadata)
combat_tpm = ComBat(dat=matrix_good, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)
Error in while (change > conv) {: missing value where TRUE/FALSE needed
based on the post above, I thought this error occurs when variance is very low, but here, my expression data is the same. Interestingly, in line with the post above, it seems to run well (but a lot longer) if, par.prior=FALSE
. I get the same error if I quantile normalize the data before running ComBat.
I appreciate if you could help me understand this better.
Thank you