Multi factor analysis in DESeq2 and design formula
1
0
Entering edit mode
2.1 years ago

Hi all,

I am trying to do a multifactor analysis in DESeq2 with RNA-Seq data from mammalian cells. I have 12 different samples:

name,cell,treatment
Sample 1,cell_a,untreated
Sample 2,cell_a,untreated
Sample 3,cell_a,untreated
Sample 4,cell_a,treated
Sample 5,cell_a,treated
Sample 6,cell_a,treated
Sample 7,cell_b,untreated
Sample 8,cell_b,untreated
Sample 9,cell_b,untreated
Sample 10,cell_b,treated
Sample 11,cell_b,treated
Sample 12,cell_b,treated

This means I have both cell line a and b treated and untreated and three biological replicates per combination.

Now, I want to analyze the data for differentially expressed genes after treatment (cell line b as a reference), but with the differentially expressed genes in the untreated samples substracted as a baseline. The goal is to exclude the genes that are already differentially expressed between the two cell lines without any treatment.

This is my code when working with only one factor (treatment in this case):

dds <- DESeqDataSetFromMatrix(countData = allcts,
                              colData = anno,
                              design = ~ treatment)
dds

dds$treatment <- factor(dds$treatment, levels = c("treated", "untreated"))
dds$cell <- factor(dds$cell, levels = c("cell_a", "cell_b"))
dds$treatment <- relevel(dds$treatment, ref = "untreated")
dds$cell <- relevel(dds$cell, ref = "cell_b")
dds <- DESeq(dds)
res <- results(dds)

And this is my code when it comes to the multifactor analysis:

ddsMF <- dds

design(ddsMF) <- formula(~ treatment + cell)

ddsMF <- DESeq(ddsMF)
resMF <- results(ddsMF)
head(resMF)

The intention is to have the effect of the cell line on the gene expression, accounting for (or normalized on) the effect of treatment. But this does not seem to have the desired effect. I analyzed the ratios of cell a vs. cell b treated and untreated, and if the above code works the way I intend it to, genes that are DE in the untreated cell lines should not show up in the results for the treated cell lines. I think I got the design formula wrong. Maybe someone has an idea how to put this into these terms, I am not sure.

Cheers and best wishes

Niklas

rnaseq DE statistics deseq design • 1.8k views
ADD COMMENT
1
Entering edit mode

Hi, I'm not sure why you think this is true:

I analyzed the ratios of cell a vs. cell b treated and untreated, and if the above code works the way I intend it to, genes that are DE in the untreated cell lines should not show up in the results for the treated cell lines.

Surely if you are isolating the effects of cell type from the effects of treatment, then most of the genes that are DE between cell_a and cell_b in treated should also be DE in untreated?

ADD REPLY
0
Entering edit mode

Thank you for the reply. Maybe I phrased this incorrectly, sorry. Yes, most of the genes should be DE in the treated samples and the untreated, when isolating the effect of the cell line. I will try to clarify: Let's say we have to groups of differentially expressed genes, those that are DE after treatment in cell line a vs cell line b (DE#1) and then those that are DE withouth any treatment in cell line a vs cell line b (DE#2). Is there a way to do DE#1 - DE#2 = DE#3, with DE#3 comprising the genes that are DE specifically because of the treatment? Otherwise, if one only looks at DE#1, there are also DE genes because of the effect of the cell line. So basically a combination of the effect the cell line has AND the effect the treatment has. I hope I can get my point across.

ADD REPLY
0
Entering edit mode

Yep. I understand. See answer by @swbarnes2 and my comments on it.

ADD REPLY
1
Entering edit mode
2.1 years ago

What people typically do for experiments like this is use interactions to work out the difference in the fold changes between treated vs untreated in a, compared to the fold changes between treated and untreated in b.

ADD COMMENT
1
Entering edit mode

See the part of the DESeq2 manual here: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions

You might also do exactly what you said in your comment to me on the question: DE#1 - DE#2

If instread of having two columns in your design matrix, you have just one (say condition), and its values are cell_a_treated, cell_a_untreated, cell_b_treated, cell_b_untreated.

We can now think of DE#1 cell_a_treated - cell_a_untreated and DE#2 as cell_b_treated - cell_b_untreated

DE#1 - DE#2 is now (cell_a_treated - cell_a_untreated) - (cell_b_treated - cell_b_untreated). Which can be rearraged to cell_a_treated + cell_b_untreaded - cell_a_unstreated - cell_b_treated.

You can then input this into the DESeq2 contrasts arguement, either as list(c("cell_a_treated", "cell_b_untreated)", c("cell_a_untreated", "cell_b_treated")) or by using a numeric vector of coefficients.

ADD REPLY
0
Entering edit mode

Ok great, this seems totally plausible. Thank you for the explanation! I have just one more point that is not clear to me: the contrast argument needs a list of two elements as an input, the names of the log2 FC to be added and the ones to be substracted. These should come from the resultsNames function. But the different conditions itself (cell_a_untreated,...) do not contain log2 FC. Do I overlook something here? Sorry for my ignorance, I get the idea but I am not sure how to transfer this to R.

ADD REPLY

Login before adding your answer.

Traffic: 2342 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6