Hi all,
I am trying to do a multifactor analysis in DESeq2 with RNA-Seq data from mammalian cells. I have 12 different samples:
name,cell,treatment
Sample 1,cell_a,untreated
Sample 2,cell_a,untreated
Sample 3,cell_a,untreated
Sample 4,cell_a,treated
Sample 5,cell_a,treated
Sample 6,cell_a,treated
Sample 7,cell_b,untreated
Sample 8,cell_b,untreated
Sample 9,cell_b,untreated
Sample 10,cell_b,treated
Sample 11,cell_b,treated
Sample 12,cell_b,treated
That is, cell lines a and b are each treated and untreated, with three biological replicates per combination.
Now I want to find the genes that are differentially expressed after treatment (with cell line b as the reference), but with the genes that are differentially expressed in the untreated samples subtracted as a baseline. The goal is to exclude the genes that are already differentially expressed between the two cell lines without any treatment.
This is my code when working with only one factor (treatment in this case):
dds <- DESeqDataSetFromMatrix(countData = allcts,
                              colData = anno,
                              design = ~ treatment)
dds
dds$treatment <- factor(dds$treatment, levels = c("treated", "untreated"))
dds$cell <- factor(dds$cell, levels = c("cell_a", "cell_b"))
dds$treatment <- relevel(dds$treatment, ref = "untreated")
dds$cell <- relevel(dds$cell, ref = "cell_b")
dds <- DESeq(dds)
res <- results(dds)
And this is my code when it comes to the multifactor analysis:
ddsMF <- dds
design(ddsMF) <- formula(~ treatment + cell)
ddsMF <- DESeq(ddsMF)
resMF <- results(ddsMF)
head(resMF)
The intention is to get the effect of the cell line on gene expression while accounting for (or normalizing to) the effect of treatment. But this does not seem to have the desired effect. I analyzed the ratios of cell a vs. cell b, treated and untreated, and if the above code worked the way I intend, genes that are DE in the untreated cell lines should not show up in the results for the treated cell lines. I think I got the design formula wrong. Maybe someone has an idea how to express this in the design formula; I am not sure.
Cheers and best wishes
Niklas
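For reference, here is a minimal sketch of what the additive design `~ treatment + cell` reports. It uses simulated counts from `makeExampleDESeqDataSet()` rather than the real `allcts` matrix; the simulated data and the object name `dds_sim` are assumptions for illustration only. By default `results()` returns the comparison for the last variable in the design (here the cell-line effect), and other comparisons can be requested explicitly.

```r
library(DESeq2)

set.seed(1)
# Simulated counts (assumption): 500 genes x 12 samples, standing in for the real data
dds_sim <- makeExampleDESeqDataSet(n = 500, m = 12)

# Same layout as the sample sheet above; the first level is the reference
dds_sim$cell <- factor(rep(c("cell_a", "cell_b"), each = 6),
                       levels = c("cell_b", "cell_a"))
dds_sim$treatment <- factor(rep(rep(c("untreated", "treated"), each = 3), times = 2),
                            levels = c("untreated", "treated"))

# Additive model: one treatment effect shared by both cell lines,
# one cell-line effect shared by treated and untreated samples
design(dds_sim) <- ~ treatment + cell
dds_sim <- DESeq(dds_sim, quiet = TRUE)

# Coefficients fitted by the additive model
additive_names <- resultsNames(dds_sim)
print(additive_names)

# Cell-line effect (cell_a vs cell_b), adjusted for treatment
res_cell <- results(dds_sim, name = "cell_cell_a_vs_cell_b")

# Treatment effect (treated vs untreated), adjusted for cell line
res_treatment <- results(dds_sim, contrast = c("treatment", "treated", "untreated"))
```

In this model the cell-line effect is estimated from treated and untreated samples together, so genes that differ between the cell lines at baseline are expected to appear in `res_cell`.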
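Hi, I'm not sure why you think this is true: "genes that are DE in the untreated cell lines should not show up in the results for the treated cell lines"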
Surely if you are isolating the effects of cell type from the effects of treatment, then most of the genes that are DE between cell_a and cell_b in treated should also be DE in untreated?
Thank you for the reply. Maybe I phrased this incorrectly, sorry. Yes, most of the genes should be DE in both the treated and the untreated samples when isolating the effect of the cell line. I will try to clarify: Let's say we have two groups of differentially expressed genes, those that are DE after treatment in cell line a vs. cell line b (DE#1) and those that are DE without any treatment in cell line a vs. cell line b (DE#2). Is there a way to do DE#1 - DE#2 = DE#3, with DE#3 comprising the genes that are DE specifically because of the treatment? Otherwise, if one only looks at DE#1, it also contains genes that are DE because of the cell line, so it is basically a combination of the effect of the cell line AND the effect of the treatment. I hope I can get my point across.
Yep. I understand. See answer by @swbarnes2 and my comments on it.
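One common way to express the "DE#1 - DE#2" idea from the discussion above is an interaction term, which tests whether the cell_a vs. cell_b difference changes with treatment. The sketch below is not necessarily the answer referred to in the thread. It again uses simulated counts from `makeExampleDESeqDataSet()`; the simulated data and the object name `dds_int` are assumptions, and the coefficient names should be checked with `resultsNames()` on the real data.

```r
library(DESeq2)

set.seed(1)
# Simulated counts (assumption): 500 genes x 12 samples, standing in for the real data
dds_int <- makeExampleDESeqDataSet(n = 500, m = 12)

# Reference levels: cell_b and untreated (first level of each factor)
dds_int$cell <- factor(rep(c("cell_a", "cell_b"), each = 6),
                       levels = c("cell_b", "cell_a"))
dds_int$treatment <- factor(rep(rep(c("untreated", "treated"), each = 3), times = 2),
                            levels = c("untreated", "treated"))

# Main effects plus the cell:treatment interaction
design(dds_int) <- ~ cell + treatment + cell:treatment
dds_int <- DESeq(dds_int, quiet = TRUE)

interaction_names <- resultsNames(dds_int)
print(interaction_names)

# Interaction: (cell_a vs cell_b in treated) minus (cell_a vs cell_b in untreated),
# i.e. the difference of the two log2 fold changes
res_interaction <- results(dds_int, name = "cellcell_a.treatmenttreated")

# Treatment effect within the reference cell line (cell_b)
res_trt_cell_b <- results(dds_int, name = "treatment_treated_vs_untreated")

# Treatment effect within cell_a: main treatment effect plus the interaction
res_trt_cell_a <- results(dds_int,
                          contrast = list(c("treatment_treated_vs_untreated",
                                            "cellcell_a.treatmenttreated")))
```

Testing the interaction coefficient is generally preferable to subtracting one list of significant genes from another, because a gene can cross the significance threshold in only one comparison without its fold change actually differing between them.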