In an experiment on fruit flies, I have a set of expression data of four groups of larvae:
1. 0 day old mutants
2. 3 day old mutants
3. 5 day old mutants
4. 0 day old wild-type
(The wild-type does not have 3 or 5 day olds because by that time wild-type larvae have pupated)
I have a quandary with normalization. Currently, when using R and limma, I normalize only the arrays involved in the comparison I'm doing (e.g. if I were comparing 0 day old mutants to 0 day old wild-types, I would normalize only those two groups and then do the linear modeling / t-tests).
My question is: should I normalize across all the groups even if I'm only comparing two groups at a time?
FYI, the assay was performed by Genosensor Corporation of Tempe, AZ using the Genoexplorer microRNA system (http://www.genosensorcorp.com/infoaboutproducts.html), and the arrays were scanned using GenePix Pro software version 5.0.0.49. The arrays are single-color (635 nm), so I didn't do any within-array normalization (as is suggested in the limma documentation).
If you really are doing isolated comparisons, then it doesn't make sense to normalize more arrays than will be compared. The point of normalization is to put the things being compared on a common scale, so that real differences can be identified. The most sensitive way to do this is to include only the arrays being compared in your normalization.
However, are you really only going to do isolated comparisons this way even though you have a more extensive data set? If this is a well-controlled time course, I would think that many kinds of comparisons are possible. In limma you can create a single normalized data set, then ask many questions of it by defining many contrasts of interest. You might consider doing an in silico experiment where you try both kinds of normalization and then compare the results.
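The in silico comparison suggested above can be sketched on toy data. The following is a minimal illustration in plain Python, not limma's implementation: it assumes quantile normalization as the between-array method, uses made-up intensities, and the helper names (`quantile_normalize`, `mean_difference`, `top_probes`) and group labels are hypothetical. It normalizes the 0 day mutant and wild-type arrays once on their own and once together with all four groups, then checks how much the top-ranked probes overlap.

```python
import random


def quantile_normalize(columns):
    """Quantile-normalize equal-length arrays (one list of intensities per array)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across all arrays.
    reference = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        # Rank probes within the array (ties broken by position, a simplification).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


def mean_difference(a_cols, b_cols):
    """Per-probe difference of group means (group B minus group A)."""
    n = len(a_cols[0])
    return [
        sum(c[i] for c in b_cols) / len(b_cols)
        - sum(c[i] for c in a_cols) / len(a_cols)
        for i in range(n)
    ]


def top_probes(diffs, k=10):
    """Indices of the k probes with the largest absolute difference."""
    return set(sorted(range(len(diffs)), key=lambda i: -abs(diffs[i]))[:k])


# Toy data (entirely made up): 4 groups x 3 replicate arrays x 50 probes,
# each group with its own overall intensity shift.
random.seed(0)
groups = ["mut0", "mut3", "mut5", "wt0"]
arrays = {
    g: [[random.gauss(8 + 0.5 * gi, 1) for _ in range(50)] for _ in range(3)]
    for gi, g in enumerate(groups)
}

# Strategy 1: normalize only the two groups being compared.
pair = quantile_normalize(arrays["mut0"] + arrays["wt0"])
diff_pair = mean_difference(pair[:3], pair[3:])

# Strategy 2: normalize all arrays together, then pull out the same two groups.
joint = quantile_normalize([col for g in groups for col in arrays[g]])
diff_joint = mean_difference(joint[0:3], joint[9:12])

overlap = len(top_probes(diff_pair) & top_probes(diff_joint))
print("top-10 probes shared by both strategies:", overlap)
```

In R, the analogous between-array step in limma is `normalizeBetweenArrays`, and with real data you would compare the moderated statistics from the fitted model rather than raw mean differences.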
It would help if you mentioned the expression platform, as there might be caveats related to the technology.
Thanks for the great answer. I edited the post to include platform information. I have done the in silico comparison, and the only thing that splitting the arrays up into groups does is improve the p-values slightly; it doesn't change the order or significance of the mis-regulated genes.