Dear all,
I have a question concerning DESeq2's multi-factor design for a RIP-seq experiment. My experimental design is quite complicated, as can be seen in the following table:
condition  input_output  treatment
no_tag     input         no_stress
tag        input         no_stress
no_tag     input         stress
tag        input         stress
no_tag     output        no_stress
tag        output        no_stress
no_tag     output        stress
tag        output        stress
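For reference, the table above could be built as a sample sheet roughly as follows. This is only a sketch: the object name `sample_information` is made up for illustration, replicates are omitted (one row per group), and I set the factor levels explicitly so that `no_tag`, `input` and `no_stress` act as the reference levels.

```r
# Hypothetical reconstruction of the sample table (one row per group, no replicates).
# expand.grid varies the first argument fastest, which reproduces the row order above.
sample_information <- expand.grid(
  condition    = c("no_tag", "tag"),
  treatment    = c("no_stress", "stress"),
  input_output = c("input", "output"),
  stringsAsFactors = FALSE
)

# Set the reference (first) level of each factor explicitly instead of relying
# on alphabetical ordering.
sample_information$condition    <- factor(sample_information$condition,
                                          levels = c("no_tag", "tag"))
sample_information$input_output <- factor(sample_information$input_output,
                                          levels = c("input", "output"))
sample_information$treatment    <- factor(sample_information$treatment,
                                          levels = c("no_stress", "stress"))

# Reorder the columns to match the table.
sample_information <- sample_information[, c("condition", "input_output", "treatment")]
print(sample_information)
```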
Here, input means RNA after extraction from the cells and output refers to the immunoprecipitated (IP) RNA, where IP uses an antibody against the tag. Therefore, the samples containing a tagged protein should show an enrichment of RNAs compared to the input and, if the signal is real (no random binding), also compared to the tag-free samples.
My initial approach was to follow the analysis described here: https://support.bioconductor.org/p/61509/ and test for significant enrichment under 'stress' and 'no stress' separately with the following design:
# Full model: main effects plus the condition:input_output interaction
design <- ~ condition + input_output + condition:input_output
ddsCountMatrix <- DESeqDataSetFromMatrix(
  colData = sample_information_stress,
  countData = count_table_stress,
  design = design)
dds <- DESeq(ddsCountMatrix)
# Reduced model without the interaction term, used for the likelihood ratio test
reduce <- ~ condition + input_output
dds <- DESeq(dds, test = 'LRT', reduced = reduce)
res <- results(dds, altHypothesis = 'greater')
The model matrix generated by this design looks like this:
model.matrix(~input_output+condition+input_output:condition,sample_information_stress)
   (Intercept) input_outputoutput conditionDhh1
5            1                  0             0
6            1                  0             1
7            1                  0             0
8            1                  0             1
13           1                  1             0
14           1                  1             1
15           1                  1             0
16           1                  1             1
   input_outputoutput:conditionDhh1
5                                 0
6                                 0
7                                 0
8                                 0
13                                0
14                                1
15                                0
16                                1
So the intercept corresponds to the 'no tag', 'input' samples. However, contrary to my expectation, the genes in the resulting list are more highly expressed in the control samples, even though I requested a positive LFC via altHypothesis.
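To make explicit how I read the coefficients of this model matrix, here is a small base-R sketch. It uses one row per group, the level name `tag` in place of `Dhh1`, and made-up log2 group means (the numbers are purely illustrative, not from my data). Under treatment contrasts the interaction coefficient should equal the difference of differences, (tag output - tag input) - (no_tag output - no_tag input).

```r
# One row per group; factor levels chosen so that no_tag / input are the reference.
groups <- data.frame(
  condition    = factor(c("no_tag", "tag", "no_tag", "tag"),
                        levels = c("no_tag", "tag")),
  input_output = factor(c("input", "input", "output", "output"),
                        levels = c("input", "output"))
)
X <- model.matrix(~ input_output + condition + input_output:condition, groups)
rownames(X) <- paste(groups$condition, groups$input_output, sep = "_")

# Made-up log2 group means, in the same row order as X (illustrative only).
mu <- c(no_tag_input = 5, tag_input = 5, no_tag_output = 6, tag_output = 9)

# With four groups and four coefficients the system X %*% beta = mu is exactly
# determined, so the coefficients can be recovered by solving it directly.
beta <- setNames(as.vector(solve(X, mu)), colnames(X))
print(beta)

# The interaction term equals the IP enrichment in tagged samples minus the
# IP enrichment in untagged samples.
diff_of_diffs <- (mu[["tag_output"]] - mu[["tag_input"]]) -
                 (mu[["no_tag_output"]] - mu[["no_tag_input"]])
print(diff_of_diffs)
```

If this reading is right, a positive interaction coefficient would mean stronger output-vs-input enrichment in the tagged samples than in the untagged ones, which is the quantity I am trying to test.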
My questions therefore are:
- What am I missing in the design to find genes that are enriched in output tag vs. input tag and output no_tag?
- Is it possible to include the stress vs. no stress comparison in the design as well? Or should I stick to identifying genes above background with the above methodology and then continue with a smaller gene list for comparing stress vs. no stress?
Any help is greatly appreciated, thanks in advance and best regards,
René