I'm working with stage-specific data: one set of samples comes from one publication and the other set from a different publication. That raises the question of batch effects, since many factors can contribute noise that has nothing to do with true biological variability. How do I remove the batch effect? I came across ComBat, RUV and sva, but DESeq2 also lets you define batch in the design to account for it.
I have 4 HSC samples and 2 granulocyte samples, and the granulocyte samples are from a different publication. I'm trying to define this in my design, but I'm getting an error. I have read about similar issues, but I'm not able to resolve it.
library(DESeq2)
# Read the count table; the first column is used as row names
countdata <- read.table('HSC_Gran.txt', header=TRUE, row.names=1)
head(countdata)
# Keep columns 6 onward, which are used as the per-sample count columns
countdata <- countdata[ ,6:ncol(countdata)]
# Keep rows with more than 10 reads in total across all samples
countdata <- countdata[rowSums(countdata)>10,]
names(countdata)
nrow(countdata)
countdata <- as.matrix(countdata)
head(countdata)
colnames(countdata)
# 4 HSC samples (Control, batch A) and 2 granulocyte samples (Test, batch B)
condition <- factor(c(rep("Control", 4),rep("Test", 2)),levels=c("Control", "Test"))
batch <- factor(c(rep("A", 4),rep("B", 2)),levels=c("A", "B"))
(coldata <- data.frame(row.names=colnames(countdata), condition,batch))
# This call produces the error shown below
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition:batch)
Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.
Please read the vignette section 'Model matrix not full rank':
vignette('DESeq2')
I'm doing something wrong in my design. Any suggestion or help would be highly appreciated.
You might want to look at my answer here, which may clarify your issues.
Additionally, with design=~condition:batch you're looking at the interaction between condition and batch, not adjusting for batch.
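To see why DESeq2 rejects the design, it may help to build the model matrix by hand. The following is a minimal base-R sketch that reuses the condition and batch factors from the question; rank_report is a hypothetical helper name introduced only for this illustration, and no DESeq2 call is involved.

```r
# Same sample layout as in the question: 4 Control (batch A), 2 Test (batch B)
condition <- factor(c(rep("Control", 4), rep("Test", 2)), levels = c("Control", "Test"))
batch     <- factor(c(rep("A", 4), rep("B", 2)), levels = c("A", "B"))

# Hypothetical helper: compare the number of coefficients (columns)
# with the rank of the model matrix built from a design formula
rank_report <- function(design) {
  mm <- model.matrix(design)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}

interaction_only <- rank_report(~ condition:batch)   # design used in the question
additive         <- rank_report(~ batch + condition) # the usual "adjust for batch" design
condition_only   <- rank_report(~ condition)         # ignores batch entirely

print(rbind(interaction_only, additive, condition_only))
```

Whenever rank is smaller than columns, the design is not full rank. Here every Control sample is in batch A and every Test sample is in batch B, so the batch column and the condition column are identical and the additive design fails for the same reason as the interaction design. Only the condition-only design is estimable, and its condition coefficient then contains the batch difference as well.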
Okay, so what do you suggest? There are no mature samples (my granulocytes) from the same publication; that data comes from a different lab/publication. So how do I remove the batch effect and do the comparison/differential analysis?
As @JJ has already told you, this design cannot be made balanced. I'd suggest analysing the experiments separately, and if you really feel they're comparable, try a non-parametric approach such as RankProd for looking at the differences in respective fold changes.
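The confounding can also be read directly from a cross-tabulation of condition against batch. The sketch below uses base R only; the second layout (hyp_condition, hyp_batch) is purely hypothetical and is not the poster's data. It is included only to show what a layout looks like when batch and condition can be separated.

```r
# Layout from the question: 4 Control in batch A, 2 Test in batch B
condition <- factor(c(rep("Control", 4), rep("Test", 2)), levels = c("Control", "Test"))
batch     <- factor(c(rep("A", 4), rep("B", 2)), levels = c("A", "B"))
observed  <- table(condition, batch)
print(observed)  # the Control/B and Test/A cells are empty

# HYPOTHETICAL layout (not the poster's data): both conditions in every batch
hyp_condition <- factor(rep(c("Control", "Test"), times = 4), levels = c("Control", "Test"))
hyp_batch     <- factor(rep(c("A", "B"), each = 4), levels = c("A", "B"))
hypothetical  <- table(hyp_condition, hyp_batch)
print(hypothetical)

# With both conditions in each batch, the additive design is full rank
hyp_mm        <- model.matrix(~ hyp_batch + hyp_condition)
hyp_full_rank <- qr(hyp_mm)$rank == ncol(hyp_mm)
print(hyp_full_rank)
```

With empty off-diagonal cells, as in the observed table, no design formula can tell the batch effect apart from the condition effect, which is why analysing the two experiments separately is suggested above.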