DESeq2 RIP-seq design with multiple factors

9.9 years ago

Bontus ▴ 80

Dear all,

I have a question concerning DESeq2's multi-factor design for a RIP-seq experiment. My experimental design is quite complicated, as can be seen in the following table:

condition   input_output   treatment
no_tag      input          no_stress
tag         input          no_stress
no_tag      input          stress
tag         input          stress
no_tag      output         no_stress
tag         output         no_stress
no_tag      output         stress
tag         output         stress
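
For reference, a table like the one above could be built in R roughly as follows. This is a minimal sketch: the object name `sample_information` is hypothetical, one row per group is used because the number of replicates is not stated here, and the factor levels are set explicitly so that `no_tag`, `input` and `no_stress` become the reference levels.

```r
# Hypothetical sample table mirroring the design above (one row per group).
# expand.grid varies the first argument fastest, which reproduces the row order of the table.
sample_information <- expand.grid(
  condition    = c("no_tag", "tag"),
  treatment    = c("no_stress", "stress"),
  input_output = c("input", "output"),
  stringsAsFactors = FALSE
)

# Put the columns in the same order as the table
sample_information <- sample_information[, c("condition", "input_output", "treatment")]

# Set factor levels explicitly; the first level is the reference level in the model
sample_information$condition    <- factor(sample_information$condition,
                                          levels = c("no_tag", "tag"))
sample_information$input_output <- factor(sample_information$input_output,
                                          levels = c("input", "output"))
sample_information$treatment    <- factor(sample_information$treatment,
                                          levels = c("no_stress", "stress"))

print(sample_information)
```

Setting the levels by hand avoids relying on alphabetical ordering to decide which group ends up in the intercept.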

Here, input means RNA after extraction from the cells and output refers to the immunoprecipitated (IP) RNA, where IP uses an antibody against the tag. Therefore, the samples containing a tagged protein should show an enrichment of RNAs compared to the input and, if the signal is real (no random binding), also compared to the tag-free samples.

My initial approach was to follow the analysis described here: https://support.bioconductor.org/p/61509/ and test for significant enrichment under 'stress' and 'no stress' separately with the following design:

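library(DESeq2)

# Full model: main effects plus the condition:input_output interaction
design <- ~ condition + input_output + condition:input_output
ddsCountMatrix <- DESeqDataSetFromMatrix(
  countData = count_table_stress,
  colData = sample_information_stress,
  design = design)
dds <- DESeq(ddsCountMatrix)

# Reduced model without the interaction term, used for the likelihood ratio test
reduce <- ~ condition + input_output
dds <- DESeq(dds, test = 'LRT', reduced = reduce)
res <- results(dds, altHypothesis = 'greater')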

The model matrix generated by this design looks like this:

model.matrix(~input_output+condition+input_output:condition,sample_information_stress)

   (Intercept) input_outputoutput conditionDhh1 input_outputoutput:conditionDhh1
5            1                  0             0                                0
6            1                  0             1                                0
7            1                  0             0                                0
8            1                  0             1                                0
13           1                  1             0                                0
14           1                  1             1                                1
15           1                  1             0                                0
16           1                  1             1                                1

So the intercept corresponds to the 'no tag', 'input' samples. However, contrary to my expectation, the genes in the resulting list are more highly expressed in the control samples, despite having a positive LFC (which I requested via altHypothesis = 'greater').
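
How the coefficients of such a model matrix map onto the four groups can be sketched in base R. The toy data frame below is hypothetical: it uses the `no_tag`/`tag` labels from the table (not the `Dhh1` label in the printed matrix) and assumes two replicates per group. It shows that the interaction column is the only one left after taking the difference of differences, (output tag - input tag) - (output no_tag - input no_tag), which is the quantity the interaction coefficient estimates on the log2 scale.

```r
# Toy version of one treatment subset: 2 hypothetical replicates per group
toy <- data.frame(
  input_output = factor(rep(c("input", "output"), each = 4),
                        levels = c("input", "output")),
  condition    = factor(rep(c("no_tag", "tag"), times = 4),
                        levels = c("no_tag", "tag"))
)
mm <- model.matrix(~ input_output + condition + input_output:condition, toy)

# Keep one representative design row per group (order of first appearance)
groups <- unique(mm)
rownames(groups) <- c("input_no_tag", "input_tag", "output_no_tag", "output_tag")

# Difference of differences:
# (output - input) within tag minus (output - input) within no_tag
dod <- (groups["output_tag", ] - groups["input_tag", ]) -
       (groups["output_no_tag", ] - groups["input_no_tag", ])
print(dod)
```

Only the `input_outputoutput:conditiontag` entry of `dod` is non-zero, so a positive interaction coefficient means the output/input ratio is larger in the tagged samples than in the untagged ones.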

My questions therefore are:

What am I missing in the design to find genes that are enriched in output tag vs. input tag and output no_tag?
Is it possible to include the stress vs. no stress comparison in the design as well? Or should I stick to identifying genes above background with the above methodology and then continue with a smaller gene list for comparing stress vs. no stress?
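
As an illustration of the second question only, the sketch below shows what the coefficients of a model with all three factors would look like. It is not a verified answer: the data frame is hypothetical, two replicates per group are assumed, and whether a fully crossed model is appropriate for this experiment is exactly what is being asked.

```r
# Hypothetical full layout: 8 groups x 2 assumed replicates
toy <- expand.grid(
  condition    = c("no_tag", "tag"),
  input_output = c("input", "output"),
  treatment    = c("no_stress", "stress"),
  replicate    = 1:2,
  stringsAsFactors = FALSE
)
toy$condition    <- factor(toy$condition,    levels = c("no_tag", "tag"))
toy$input_output <- factor(toy$input_output, levels = c("input", "output"))
toy$treatment    <- factor(toy$treatment,    levels = c("no_stress", "stress"))

# Fully crossed design: all main effects, two-way and three-way interactions
full_design <- ~ condition * input_output * treatment
mm <- model.matrix(full_design, toy)

# The three-way term describes how the tag-specific output/input difference
# changes between stress and no_stress
print(colnames(mm))
```

In this parameterization the `conditiontag:input_outputoutput` column plays the same role as the interaction term in the per-treatment analysis, but for the `no_stress` reference level.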

Any help is greatly appreciated, thanks in advance and best regards,

René

multifactor deseq2 DESeq2 RIP-seq R • 3.1k views

updated 3.0 years ago by Ram 45k • written 9.9 years ago by Bontus ▴ 80