Entering edit mode
7.5 years ago
Folder40g
▴
190
Hi
I have samples (rnaseq) coming from two different experiments. I plotted all samples in a PCA plot and I saw a clear batch effect.
Then I used svaseq
to plot a PCA without the batch effect and I got something quite reasonable, samples from different experiments mix but without a clear order. Samples from each experiment in the PCA cluster as expected.
But then I run combat, write the output and create a heatmap with it in morpheus and the clustering that I get is the one I would get without batch effect correction.
Why is this so? What I'm doing wrong?
Thanks
library(sva)
library(limma)
count <- read.table("/local/deseq/ALLsamples.txt", header = T, row.names = 1)
count <- log(count + 1)
#-- Filtering counts min >=2 genes having >5 counts
filter <- apply(count, 1, function(x) length(x[x>5])>=2)
filtered <- count[filter,]
dat0 <- as.matrix(filtered)
#-- Phenotype
pheno <- as.data.frame(cbind(c(rep("Daf", 10), rep("Tuv",19)), c(rep("Dox",5), rep("Doxless",5), rep("mN",7), rep("mP",6), rep("mT",6))))
rownames(pheno) <- colnames(count)
colnames(pheno) <- c("Experiment", "GroupSample")
#-- Models
#-- Full model = adjustment variables + variables of interest
#mod = model.matrix(~as.factor(Experiment) + as.factor(GroupSample), data=pheno)
mod <- model.matrix(~as.factor(Experiment), data=pheno)
#-- Null model = adjustment variables
mod0 <- model.matrix(~1, data=pheno)
#-- Number of surrogated variables n-sv
num.sv(as.matrix(filtered),mod,method="leek") #-- Value is 1
#-- svaseq + PCA plot
#-- svaseq
svseq <- svaseq(as.matrix(filtered), mod = mod, mod0 = mod0, n.sv = 1)
svseq_plot <- svaseq(as.matrix(filtered), mod = mod, mod0 = mod0, n.sv = 1)$sv
#-- Plot
plot(svseq_plot, pch=19, col="blue")
#-- Clean original input and obtain corrected counts
cleaningY = function(y, mod, svaobj) {
X = cbind(mod, svaobj$sv)
Hat = solve(t(X)%*%X)%*%t(X)
beta = (Hat%*%t(y))
P = ncol(mod)
cleany = y-t(as.matrix(X[,-c(1:P)])%*%beta[-c(1:P),])
return(cleany)
}
#-- Clean
counts_sva = cleaningY(count, mod, svseq)
pca = prcomp(t( counts_sva ), scale=FALSE)
plot(pca$x[,1], pca$x[,2])
text(pca$x[,1], pca$x[,2], rownames(pca$x), pos= 3 )
#-- Combat
mod_batch <- mod[,2]
combat_adj <- ComBat(as.matrix(filtered), batch = mod_batch, mod = mod0, par.prior=TRUE, prior.plots=FALSE)
write.table(combat_adj, "/local/CombatAdjusted.txt", sep ="\t")
Hi,in the process of the PCA plot,why you choose scale=FALSE ?I have tried my data with the scale TRUE or FALSE ,the results are very different.
Scaling or not will depend on the distribution of your input data.