Experiment 1: Condition 1 vs. Control
Experiment 2: Condition 2 vs. Control
Both experiments use the same measurement. However, they were performed by different users on different cohorts of animals. We would like to compare the effect of Condition 1 with the effect of Condition 2.
I cannot directly compare all four groups together, because I believe there are experiment, batch, and observer/handling effects. Given that each experiment has a control group of the same kind, is there any way to normalize the data so that I can compare the two conditions and run a significance test on that comparison?
Below is a sample of the data:
Cond1 Ctrl1 Cond2 Ctrl2
89 63 181 225
98 125 132 239
100 103 140 224
83 118 117 214
89 112 144 206
95 111 150 228
74 73 162 208
65 102 136 174
66 88 148 207
76 107 169 201
79 102 108 203
95 101 133 228
One approach I considered is to standardize each condition against its own control, as follows:
(X_cond1 - Mean of Ctrl1)/SD_ctrl1
and
(X_cond2 - Mean of Ctrl2)/SD_ctrl2
and then compare the two sets of standardized values.
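A minimal Python sketch of the standardization proposed above, using the sample table: each condition is converted to z-scores against its own batch's control, and the two sets of z-scores are then compared. The significance test here is a permutation test on the difference in mean z-scores. That test, and the helper names `standardize` and `permutation_pvalue`, are my own illustrative choices, not something proposed in the thread.

```python
import random
from statistics import mean, stdev

# Sample data from the question (one value per animal).
cond1 = [89, 98, 100, 83, 89, 95, 74, 65, 66, 76, 79, 95]
ctrl1 = [63, 125, 103, 118, 112, 111, 73, 102, 88, 107, 102, 101]
cond2 = [181, 132, 140, 117, 144, 150, 162, 136, 148, 169, 108, 133]
ctrl2 = [225, 239, 224, 214, 206, 228, 208, 174, 207, 201, 203, 228]


def standardize(values, control):
    """Hypothetical helper: z-score each value against its own batch's control."""
    mu, sd = mean(control), stdev(control)  # stdev() is the sample SD (n - 1)
    return [(x - mu) / sd for x in values]


def permutation_pvalue(a, b, n_perm=10_000, seed=0):
    """Hypothetical helper: two-sided permutation test on the difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    k = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the pooled z-scores to two groups
        left, right = pooled[:k], pooled[k:]
        if abs(sum(left) / k - sum(right) / len(right)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


z1 = standardize(cond1, ctrl1)
z2 = standardize(cond2, ctrl2)
p = permutation_pvalue(z1, z2)
print("mean z, Cond1:", round(mean(z1), 3))
print("mean z, Cond2:", round(mean(z2), 3))
print("permutation p:", p)
```

One caveat: this treats each control mean and SD as known constants, so the p-value ignores the uncertainty in those estimates (with 12 animals per control, that uncertainty is not small). It also expresses each effect in units of its own control's SD, which only makes the experiments comparable if that SD reflects the same underlying variability in both batches.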
Any help/comments will be very much appreciated. Thanks in advance!
I'm reminded of chapter 9 in the limma user's guide, which addresses this sort of design in the context of microarrays (or RNAseq, by extension). Are these actually count-based values or was that just an example?
Sorry, I did not mention this in the question. These are not RNA-seq counts; they are measurements from a behavioral experiment.
OK, so I assume that the values under Cond1 are measurements of the same behaviour taken from different mice/rats/whatever (as opposed to there being 12 different behavioural measurements, with each value being a sum within a cohort). Is this assumption correct?
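Just to reply preemptively in case the answer is "yes, that's correct": it sounds like a generalized linear model would work. You could melt() the table in your example into long format and add "condition" and "batch" columns to the resulting data frame (both control groups get the same condition label, so the batch term absorbs the difference between the two experiments). Then fit glm.nb(values ~ condition + batch, ...) from the MASS package. I tried that and the results look reasonable (the data turn out not to be overdispersed; also remember that glm.nb uses a log link, so the coefficients are on the log scale, i.e. log fold changes rather than raw differences).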
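The comment above uses MASS::glm.nb in R. As a rough, standard-library-only Python illustration of the same design, the sketch below swaps in a quasi-Poisson log-link model instead of the negative binomial: the variance is assumed proportional to the mean, with the dispersion estimated from Pearson residuals. With condition (Ctrl, Cond1, Cond2) plus batch (1, 2) and only four groups, the model is saturated, so the fitted means are just the group means, and the Cond1-vs-Cond2 question becomes a single contrast: the log of the ratio between the two treatment/control fold changes. The p-value uses a normal approximation to the Wald statistic (R's `quasipoisson` family reports t statistics instead), so treat this as a sketch of the idea, not a replacement for the R fit.

```python
import math
from statistics import NormalDist

# Sample data from the question; batch 1 = Cond1/Ctrl1, batch 2 = Cond2/Ctrl2.
groups = {
    "Cond1": [89, 98, 100, 83, 89, 95, 74, 65, 66, 76, 79, 95],
    "Ctrl1": [63, 125, 103, 118, 112, 111, 73, 102, 88, 107, 102, 101],
    "Cond2": [181, 132, 140, 117, 144, 150, 162, 136, 148, 169, 108, 133],
    "Ctrl2": [225, 239, 224, 214, 206, 228, 208, 174, 207, 201, 203, 228],
}

# Saturated log-link model: each fitted mean equals that group's sample mean.
mu = {g: sum(v) / len(v) for g, v in groups.items()}

# Contrast on the log scale: (log Cond1 - log Ctrl1) - (log Cond2 - log Ctrl2),
# i.e. the log ratio of the two treatment-vs-control fold changes.
weights = {"Cond1": 1, "Ctrl1": -1, "Cond2": -1, "Ctrl2": 1}
contrast = sum(w * math.log(mu[g]) for g, w in weights.items())

# Quasi-Poisson dispersion: Pearson chi-square over residual df (N - 4 groups).
n_total = sum(len(v) for v in groups.values())
pearson = sum((y - mu[g]) ** 2 / mu[g] for g, v in groups.items() for y in v)
phi = pearson / (n_total - len(groups))

# Var(log mu_g) = phi / (n_g * mu_g); the four group estimates are independent.
var = sum(phi / (len(groups[g]) * mu[g]) for g in weights)
z = contrast / math.sqrt(var)
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided Wald test, normal approx.

print(f"fold change Cond1/Ctrl1: {mu['Cond1'] / mu['Ctrl1']:.3f}")
print(f"fold change Cond2/Ctrl2: {mu['Cond2'] / mu['Ctrl2']:.3f}")
print(f"log ratio of fold changes: {contrast:.3f}, dispersion: {phi:.2f}")
print(f"z = {z:.2f}, p = {p:.4f}")
```

A dispersion estimate near 1 would be consistent with plain Poisson variance; values well above 1 indicate extra variability that the test statistic has to account for, which is the "properly estimate dispersion" point made later in the thread.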
Thank you very much, and yes, the assumption is correct. However, I don't quite understand the explanation about the linear model. Could you please elaborate a bit more, if you don't mind? Thanks a lot in advance!
You'll probably be better off just googling around for a tutorial on linear (and generalized linear) models, since those will be rather more eloquent than anything I would write here :) In short, though, the idea is to model the batch effect explicitly, properly estimate the dispersion (how variable the measurements are relative to their mean) in your system, and use that estimate in the model fit and the test statistic.
Great! Awesome idea!