Hi,
I want to normalize the gene counts from GTEx using the variance stabilizing transformation (VST), but I'm confused about which variables I should include in the "design" when creating the DESeqDataSet. For the moment I'm doing the following:
library(DESeq2)

# Build the DESeq data set with tissue as the only design variable
dds <- DESeqDataSetFromMatrix(countData = gtex,
                              colData = sampledata,
                              design = ~ tissue)
dds <- dds[rowSums(counts(dds)) > 1, ]  # drop genes with a total count of 0 or 1
vsd <- vst(dds, blind = FALSE)          # VST that uses the tissue design (not blind)
However, this only takes the different tissues into account during the normalization. My question is: should I do it like this and include all the tissues together, or run it separately for each tissue and use something like ~ 1? And should I include other variables, such as the experimental batch or the post-mortem interval (PMI)?
Many thanks
When you set blind = TRUE, I'm pretty sure you are not considering the different tissues.
Oops! It's corrected now.
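
For anyone who wants to see what the blind argument changes, here is a minimal sketch. It uses simulated counts from makeExampleDESeqDataSet() in place of the real GTEx matrix, and the two tissue labels ("liver", "lung") are made up for illustration, so treat it as a toy comparison rather than a recommendation for the actual data.

```r
library(DESeq2)

set.seed(1)

# Assumption: simulated counts stand in for the GTEx count matrix.
# The first 6 samples and the last 6 samples form two simulated groups.
sim <- makeExampleDESeqDataSet(n = 5000, m = 12, betaSD = 1)
counts_mat <- counts(sim)

# Hypothetical sample table with a single 'tissue' column
sampledata <- data.frame(
  tissue = factor(rep(c("liver", "lung"), each = 6)),
  row.names = colnames(counts_mat)
)

dds <- DESeqDataSetFromMatrix(countData = counts_mat,
                              colData = sampledata,
                              design = ~ tissue)
dds <- dds[rowSums(counts(dds)) > 1, ]  # drop genes with a total count of 0 or 1

# blind = TRUE ignores the design (dispersions are estimated with ~ 1)
vsd_blind <- vst(dds, blind = TRUE)

# blind = FALSE uses the design (~ tissue) when estimating dispersions
vsd_design <- vst(dds, blind = FALSE)

# Largest difference between the two sets of transformed values
max(abs(assay(vsd_blind) - assay(vsd_design)))
```

With blind = FALSE the design is only used to estimate the dispersion trend; the transformed values are not adjusted for tissue or any other variable in the design.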