Question

edgeR generic design matrix

Asked 7.0 years ago by rfenouil

Hello all,

I have some very generic/naive questions about edgeR. My apologies if they have already been asked; I could not find clear answers on the forum.

1. Build a generic design matrix

In the edgeR User's Guide, section "3.3 Experiments with all combinations of multiple factors", I read: "A simple, multi-purpose approach is to combine all the experimental factors into one combined factor".

If I understand correctly, this approach allows one to specify a generic design matrix (only replicates are identified as such). Then, comparisons between conditions of interest are carried out using contrasts during the statistical testing procedure. Does that sound correct?

Does that mean that design matrix information is not required during estimation of dispersion? In other words, could you confirm that this approach is theoretically equivalent to the classic approach, where conditions are encoded as separate factors in the design matrix?

Using this strategy systematically would help me automate an analysis, and I was wondering whether it makes sense.

2. QC plots

While 'playing' with edgeR, I generated the attached plots. I believe they give important information about data quality for downstream steps, and that it may be worth providing them for every analysis.

Unfortunately, the details of edgeR's method are too elaborate for me to follow. Is there a simple way to explain what is important to look for in these plots?

From reading the documentation, I formed my own idea of what they mean, but it is surely incomplete and likely incorrect in places. I would appreciate advice from experts. Apologies if the figure titles, axes, or legends don't make sense; I made some of them based on what I thought I understood.

Thank you very much for your help.

[Figures: attached QC plots, not reproduced here]

Tags: RNA-Seq, EdgeR

Written 7.0 years ago by rfenouil; last updated 7.0 years ago by Devon Ryan.

Answer 1 · 2018-06-19

Answered 7.0 years ago by Devon Ryan

> If I understand correctly, this approach allows one to specify a generic design matrix (only replicates are identified as such). Then, comparisons between conditions of interest are carried out using contrasts during the statistical testing procedure. Does that sound correct?

It depends on how generic "generic" is. Basically, use a design of ~0 + group, where the levels of group are combinations such as 'WT untreated', 'WT treated', 'Mut untreated' and 'Mut treated', to give a simple example.
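For illustration, here is a minimal Python/NumPy sketch of what such a combined-factor design looks like, assuming a hypothetical 2 × 2 experiment (genotype × treatment) with two replicates per group. In R this would typically come from `factor(paste(genotype, treatment, sep = "."))` followed by `model.matrix(~0 + group)`; the Python version only mirrors the structure of that matrix, and R may order the columns differently.

```python
import numpy as np

# Hypothetical layout: 2 genotypes x 2 treatments x 2 replicates (illustration only)
genotype = ["WT"] * 4 + ["Mut"] * 4
treatment = ["untreated", "untreated", "treated", "treated"] * 2

# One combined label per sample, mirroring paste(genotype, treatment, sep = ".") in R
group = [f"{g}.{t}" for g, t in zip(genotype, treatment)]
levels = ["WT.untreated", "WT.treated", "Mut.untreated", "Mut.treated"]

# ~0 + group: one 0/1 indicator column per group level, and no intercept column
design = np.array([[1 if s == lvl else 0 for lvl in levels] for s in group])
print(design)
```

Each row has a single 1 marking the sample's group, so each coefficient fitted with this design corresponds to one group's expression level (on the log scale), and comparisons between groups are made afterwards with contrasts.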

> Does that mean that design matrix information is not required during estimation of dispersion?

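No, you need to know your groups either way, since dispersion is estimated from the variation left after fitting the design, that is, from the differences between replicates within each group. You can only automate so much, though: more complicated designs inevitably mean that only some comparisons will be of interest.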
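To see why the groups matter for dispersion estimation, here is a rough sketch that uses ordinary least squares on hypothetical log2 expression values as a stand-in for edgeR's negative binomial GLM fit (a simplifying assumption for illustration only). If the design ignores the groups, real differences between conditions end up in the residuals and inflate the estimated variability.

```python
import numpy as np

# Hypothetical log2 expression of one gene in 8 samples (illustration only),
# as replicate pairs from WT.untreated, WT.treated, Mut.untreated, Mut.treated
y = np.array([5.0, 5.2, 6.0, 6.2, 3.0, 3.2, 4.5, 4.7])

def residual_variance(design, y):
    """Least-squares residual variance, using df = n - rank(design)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    df = len(y) - np.linalg.matrix_rank(design)
    return resid @ resid / df, df

# Intercept-only design: ignores the groups entirely
intercept_only = np.ones((8, 1))
# ~0 + group design: one indicator column per group, 2 consecutive replicates each
group_design = np.kron(np.eye(4), np.ones((2, 1)))

print(residual_variance(intercept_only, y))
print(residual_variance(group_design, y))
```

With the intercept-only design, the residual variance is dominated by the differences between groups, whereas the `~0 + group` design leaves only the replicate-to-replicate scatter, with 8 - 4 = 4 residual degrees of freedom.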

Regarding the plots, the absolute simplest explanation is that you want to check that (A) the trend lines actually fit the points and (B) the squeezed variances move in a reasonable direction (toward the fitted trend or the NB mean-variance relationship).
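As a rough guide to what "squeezed" means, here is a simplified Python sketch of two ingredients: the NB mean-variance relationship, var = mu + phi * mu^2, and empirical Bayes moderation written as a degrees-of-freedom-weighted average of each gene-wise estimate and the trend. That weighted-average form is the one used by limma's `squeezeVar`, which edgeR's quasi-likelihood pipeline applies to the QL dispersions; the NB dispersion shrinkage in `estimateDisp` uses a weighted likelihood instead, but likewise pulls estimates toward the trend. edgeR's `plotBCV` and `plotQLDisp` draw these trends. All dispersion values and degrees of freedom below are made up.

```python
import numpy as np

def nb_variance(mu, phi):
    """Negative binomial mean-variance relationship: var = mu + phi * mu**2."""
    return mu + phi * mu**2

def squeeze(genewise, trend, df_gene, df_prior):
    """Moderated estimate as a df-weighted average of gene-wise and trended values."""
    return (df_prior * trend + df_gene * genewise) / (df_prior + df_gene)

# Hypothetical gene-wise dispersions and the fitted trend at each gene's abundance
genewise = np.array([0.01, 0.10, 0.50])
trend = np.array([0.08, 0.08, 0.08])

# Hypothetical degrees of freedom: 4 residual df per gene, 10 prior df
squeezed = squeeze(genewise, trend, df_gene=4, df_prior=10)
print(squeezed)  # each value lies between its gene-wise estimate and the trend
print(nb_variance(10.0, 0.5))  # 10 + 0.5 * 100 = 60.0
```

In a sensible plot, the squeezed points sit between the raw points and the trend line; if they move away from the trend, or the trend clearly misses the bulk of the points, that is worth investigating.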

Hello, and thank you!

OK, ~0 + group is what I was thinking of (a generic design matrix), with groups defining combinations of experimental conditions as in your example. That should then allow me to compare WT vs Mut using the contrast (1, 1, -1, -1), and untreated vs treated using a different contrast (1, -1, 1, -1), when applying the statistical test (with the group levels in the order of your example). Is that approach correct, and is it equivalent to defining separate factors for WT/Mut and treated/untreated in the design matrix?

If so, this would be enough for my 'automation' needs.
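The contrast arithmetic described above can be sketched in a few lines of Python. The per-group log2 values are hypothetical, and the group order (WT untreated, WT treated, Mut untreated, Mut treated) is assumed to match the columns of the design. In R, limma's `makeContrasts` builds the same kind of vectors from group names, and the result can be passed to edgeR's `glmQLFTest` or `glmLRT` through their `contrast` argument.

```python
import numpy as np

# Hypothetical fitted log2 expression per group for one gene, in design-column order:
# WT.untreated, WT.treated, Mut.untreated, Mut.treated
beta = np.array([5.0, 6.0, 3.0, 4.5])

c_genotype = np.array([1, 1, -1, -1])   # WT vs Mut
c_treatment = np.array([1, -1, 1, -1])  # untreated vs treated

print(c_genotype @ beta)        # (5 + 6) - (3 + 4.5) = 3.5
print((c_genotype / 2) @ beta)  # average WT - Mut difference: 1.75
print(c_treatment @ beta)       # (5 + 3) - (6 + 4.5) = -2.5
```

Scaling a contrast by 1/2 does not change the null hypothesis (the contrast equals zero), so the test result is the same, but the reported log-fold-change does change: (1, 1, -1, -1) gives the sum of the two WT-minus-Mut differences, while (0.5, 0.5, -0.5, -0.5) gives their average, which is usually easier to interpret.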

Reply by rfenouil, 7.0 years ago

Yup, that'd be equivalent.
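One caveat worth making explicit: the equivalence holds when the factorial design includes the genotype-by-treatment interaction (`~genotype * treatment` in R). That design and `~0 + group` both have four columns spanning the same space, so they should give the same fitted values and hence the same dispersion estimates; an additive design (`~genotype + treatment`) has only three columns and is a different model. The Python sketch below checks this with matrix ranks on a hypothetical 2 × 2 layout with two replicates per group.

```python
import numpy as np

# Hypothetical 2 x 2 layout with 2 replicates per cell (illustration only)
is_mut = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
is_trt = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
ones = np.ones(8)

# ~0 + group: one indicator column per genotype/treatment combination
cell_means = np.column_stack([(1 - is_mut) * (1 - is_trt), (1 - is_mut) * is_trt,
                              is_mut * (1 - is_trt), is_mut * is_trt])
# ~genotype * treatment (with interaction) and ~genotype + treatment (additive)
interaction = np.column_stack([ones, is_mut, is_trt, is_mut * is_trt])
additive = np.column_stack([ones, is_mut, is_trt])

def same_column_space(a, b):
    """True if every column of each matrix is a linear combination of the other's."""
    r = np.linalg.matrix_rank
    return r(a) == r(b) == r(np.hstack([a, b]))

print(same_column_space(cell_means, interaction))  # True
print(same_column_space(cell_means, additive))     # False
```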

Reply by Devon Ryan, 7.0 years ago

Awesome, thank you very much for your help.

Reply by rfenouil, 7.0 years ago