Dear all,
I am performing a differential expression analysis with DESeq2, but I first have to account for batch effects. I have an info file with several technical confounders and other sample information (the samples are twins), such as family and zygosity. As you suggested, I am using the svaseq function to correct for batch effects, following: http://www.bioconductor.org/help/workflows/rnaseqGene/#batch
This is my workflow:
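library(DESeq2)
library(sva)
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = info,
                              design = ~ condition)
dds <- DESeq(dds)
# normalized counts are the input for svaseq
dat <- counts(dds, normalized = TRUE)
# full model (with condition) and null model (intercept only)
mod <- model.matrix(~ condition, info)
mod0 <- model.matrix(~ 1, info)
svseq <- svaseq(dat, mod, mod0, n.sv = 2)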
I understand that this will clean the dataset, but how can I also account for family relatedness? Can I add this information to the model (mod?), or should I handle it in the later steps of the differential expression analysis, by including in the design not only condition and the surrogate variables (do there have to be 2?) but also family or zygosity? For example:
design <- ~ SV1 + SV2 + fam + condition
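To make the first option concrete, here is a minimal sketch of what I mean by adding family to the model. It uses only base R on a made-up sample table (4 hypothetical twin pairs); the toy values and the names info_toy, mod_fam, mod0_fam and is_full_rank are my own assumptions for illustration, not my real data. I am assuming fam would go into both the full and the null model, but I am not sure this is the right approach.

```r
# Toy sample table: 4 hypothetical twin pairs, twins discordant for condition
info_toy <- data.frame(
  fam = factor(rep(1:4, each = 2)),
  condition = factor(rep(c("ctrl", "case"), times = 4),
                     levels = c("ctrl", "case"))
)

# Full model: family plus the variable of interest
mod_fam <- model.matrix(~ fam + condition, info_toy)
# Null model: family only (condition dropped)
mod0_fam <- model.matrix(~ fam, info_toy)

# A design is only estimable if its model matrix has full column rank
is_full_rank <- function(m) qr(m)$rank == ncol(m)
full_ok <- is_full_rank(mod_fam)

# Counter-example: condition constant within each family
# (families 1-2 ctrl, families 3-4 case), so the condition column
# is a linear combination of the family columns
info_toy$condition_by_fam <- factor(rep(c("ctrl", "case"), each = 4),
                                    levels = c("ctrl", "case"))
confounded_ok <- is_full_rank(
  model.matrix(~ fam + condition_by_fam, info_toy)
)

print(c(discordant = full_ok, confounded = confounded_ok))
```

If this is sensible, I would pass mod_fam and mod0_fam to svaseq in place of mod and mod0. The second check suggests that fam can only sit in the design next to condition when condition varies within families; otherwise the two are confounded.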
Thanks
Best Regards