I have an experiment where I know that both group (high, low, control) and subject (A - E) play a role in my data. As such, I would normally model this as `~0 + group + subject`. However, a couple of subjects are not represented in all groups. Below is a simplified example where groups `high` and `low` have subjects `A` - `C`, but group `control` has subjects `C` - `E`. Control is thus confounded by `D` and `E`.
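
| sample | group   | subject |
|--------|---------|---------|
| 1      | high    | A       |
| 2      | high    | B       |
| 3      | high    | C       |
| 4      | low     | A       |
| 5      | low     | B       |
| 6      | low     | C       |
| 7      | control | C       |
| 8      | control | D       |
| 9      | control | E       |
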
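
To make the layout concrete, here is a minimal sketch that builds the indicator columns a `~0 + group + subject` design would have for the table above and checks their rank. It is plain Python rather than my actual R workflow, and the column ordering, the choice of subject `A` as reference level, and the helper names `design_row` and `matrix_rank` are assumptions made for illustration only.

```python
from fractions import Fraction

# Toy layout from the table above: (group, subject) per sample.
samples = [
    ("high", "A"), ("high", "B"), ("high", "C"),
    ("low", "A"), ("low", "B"), ("low", "C"),
    ("control", "C"), ("control", "D"), ("control", "E"),
]

groups = ["high", "low", "control"]
# Assumption: subject A is the reference level, so it gets no column.
subjects = ["B", "C", "D", "E"]
columns = ["group" + g for g in groups] + ["subject" + s for s in subjects]


def design_row(group, subject):
    """One row of 0/1 indicator columns for a single sample."""
    return [int(group == g) for g in groups] + [int(subject == s) for s in subjects]


def matrix_rank(rows):
    """Rank via Gaussian elimination on exact fractions (no rounding issues)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot position with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


design = [design_row(g, s) for g, s in samples]
rank = matrix_rank(design)
print(f"columns: {len(columns)}, rank: {rank}")
```

A rank smaller than the number of columns is what makes some coefficients impossible to estimate. For the simplified table exactly as written, this check reports 7 columns and rank 7, because subject `C` links `control` to the other two groups, so the toy layout on its own is full rank.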
When modeling this experiment, I can of course consider only the group effect (`~0 + group`), but then I will not know whether any comparison against `control` reflects a difference against this group or against subjects `D` and `E`. In this situation I get a large number of estimated differentially expressed features (at FDR < 0.05: high vs low = 1100, high vs control = 800, low vs control = 110).
However, when I include the subject effect in the model (`~0 + group + subject`), I get an expected warning that the coefficients for `D` and `E` cannot be estimated. Yet the number of differentially expressed features is much lower, as expected (at FDR < 0.05: high vs low = 200, high vs control = 300, low vs control = 10).
My questions are:
- Despite the coefficients for `D` and `E` not being estimable, can I still rely on the differential expression results when accounting for `subject` in the model, particularly if I am only interested in group vs group comparisons?
- Would this mean that I could at least have an accurate estimate of `high vs low`, but not of any comparison against control?
- In other words, can we be confident in the coefficients that do not yield any warning, despite the others that do?
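
To illustrate what I mean by the second question, here is a rough sketch of how the estimability of individual contrasts could be checked. It relies on the standard linear-algebra criterion that a contrast is estimable exactly when it lies in the row space of the design matrix. The layout below is hypothetical (control contains only `D` and `E`, with no shared subject); it is not my real data. The helper names are made up for this example, and it is plain Python rather than the R tooling I actually use.

```python
from fractions import Fraction

groups = ["high", "low", "control"]
subjects = ["B", "C", "D", "E"]  # assumption: subject A is the reference level

# HYPOTHETICAL layout (not my real data): control has only subjects D and E.
samples = [
    ("high", "A"), ("high", "B"), ("high", "C"),
    ("low", "A"), ("low", "B"), ("low", "C"),
    ("control", "D"), ("control", "E"),
]


def design_row(group, subject):
    """One row of 0/1 indicator columns for a single sample."""
    return [int(group == g) for g in groups] + [int(subject == s) for s in subjects]


def matrix_rank(rows):
    """Rank via Gaussian elimination on exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_estimable(design, contrast):
    """Estimable iff appending the contrast as a row leaves the rank unchanged."""
    return matrix_rank(design + [contrast]) == matrix_rank(design)


design = [design_row(g, s) for g, s in samples]

# Column order: high, low, control, B, C, D, E
contrasts = {
    "high vs low": [1, -1, 0, 0, 0, 0, 0],
    "high vs control": [1, 0, -1, 0, 0, 0, 0],
    "low vs control": [0, 1, -1, 0, 0, 0, 0],
}
results = {name: is_estimable(design, c) for name, c in contrasts.items()}
for name, ok in results.items():
    print(f"{name}: {'estimable' if ok else 'not estimable'}")
```

In this hypothetical layout the design has rank 6 for 7 columns, and only `high vs low` comes out as estimable, which is the pattern I suspect applies to my real design.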
Thanks in advance