Hi guys,
I have a general question about the correct use of the estimateDisp function in edgeR.
Suppose you have the following situation:
Sample Condition
A1 Control
A2 Control
B1 Treatment1
B2 Treatment1
B3 Treatment1
C1 Treatment2
C2 Treatment2
C3 Treatment2
and you want to compare Control vs Treatment1, Control vs Treatment2, Treatment1 vs Treatment2.
The data matrix contains around 12,000 genes in rows and 8 columns (samples).
Is it possible and correct to apply the estimateDisp function only to the subset of samples you want to compare (e.g. Control vs Treatment1)? In other words, when I compare Control vs Treatment1, should the design matrix also contain Treatment2?
Now suppose a second situation: you want to compare only Treatment1 vs Treatment2, and for some reason you never use samples A1 and A2. In this case, is it correct to keep the control samples when estimating the dispersion?
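To make the two options from the first question concrete, here is a minimal sketch of what I mean. The counts are simulated placeholders (not my real data), and the object names (y_all, y_sub, design_all, design_sub) are just illustrative. It only shows the two ways of calling estimateDisp; it is not meant to say which one is right.

```r
library(edgeR)

# Placeholder counts only: simulated so the sketch runs, NOT real data.
set.seed(1)
samples <- c("A1", "A2", "B1", "B2", "B3", "C1", "C2", "C3")
group <- factor(c(rep("Control", 2), rep("Treatment1", 3), rep("Treatment2", 3)),
                levels = c("Control", "Treatment1", "Treatment2"))
counts <- matrix(rnbinom(12000 * 8, mu = 50, size = 10), ncol = 8,
                 dimnames = list(paste0("gene", 1:12000), samples))

# Option 1: keep all 8 samples and put all three groups in the design.
y_all <- DGEList(counts = counts, group = group)
y_all <- calcNormFactors(y_all)
design_all <- model.matrix(~ 0 + group)
colnames(design_all) <- levels(group)
y_all <- estimateDisp(y_all, design_all)

# Option 2: keep only the Control and Treatment1 samples before estimating.
keep <- group %in% c("Control", "Treatment1")
group_sub <- droplevels(group[keep])
y_sub <- DGEList(counts = counts[, keep], group = group_sub)
y_sub <- calcNormFactors(y_sub)
design_sub <- model.matrix(~ 0 + group_sub)
colnames(design_sub) <- levels(group_sub)
y_sub <- estimateDisp(y_sub, design_sub)

# The two calls give separate dispersion estimates that can be compared.
disp <- c(all_samples = y_all$common.dispersion,
          subset = y_sub$common.dispersion)
print(disp)
```

With option 1, the Control vs Treatment1 comparison would then be specified as a contrast on the full design (for example with makeContrasts) rather than by dropping samples.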
Although it is pretty clear to me how to proceed, I'm a little confused by how some bioinformaticians use the function in practice (beyond the theory).
Thank you in advance.
E.
I would ask this on the Bioconductor support forum.