I've picked up an analysis that has been sitting in someone's drawer for a while. It's an Agilent array experiment where global changes are expected between the control and the treatment. The person who originally did the analysis (on GeneSpring :( ) stated in their materials and methods that they normalised to a set of unchanging control genes. This is implemented in limma as the "control" method of normalizeWithinArrays, and it makes sense here, given that the usual assumption that most genes are unchanged might not hold. The original analyst stated that they selected 83 genes that don't change across conditions to use as controls, but they don't say how they selected them. Has anyone ever done this sort of analysis? Any idea how they might have picked those genes?
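For concreteness, I think the call would look roughly like this, assuming a two-colour MAList MA read in with read.maimages() and a character vector control_ids holding the 83 probe names (both are stand-ins for the real objects):

```r
library(limma)

## Flag the probes to be treated as non-differentially-expressed controls.
## MA$genes$ProbeName is the usual Agilent probe ID column; adjust if yours differs.
is_control <- MA$genes$ProbeName %in% control_ids

## method="control" fits the loess curve through the control spots only
## and then applies that correction to every probe on the array.
MA_norm <- normalizeWithinArrays(MA, method = "control", controlspots = is_control)
```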
Just speculating, but two approaches to selecting a control gene set would be picking "known" housekeeping genes (e.g. GAPDH, Actin, etc.), or rank invariant set normalization, i.e. selecting genes whose ranks are stable across samples (original ref I believe at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC55329 ). Given the size of the set you mention (83), the second sounds like the better fit to your description, to me.
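For what it's worth, the core idea can be sketched like this for two samples (a toy version: `expr` and the 5% cutoff are placeholders, and the published method is iterative with intensity-dependent thresholds rather than a single fixed cutoff):

```r
## Toy rank-invariant selection between two samples, assuming `expr` is a
## matrix of log-intensities with genes in rows and exactly two columns.
rank_invariant_genes <- function(expr, cutoff = 0.05) {
  stopifnot(ncol(expr) == 2)
  r1 <- rank(expr[, 1])
  r2 <- rank(expr[, 2])
  n  <- nrow(expr)
  ## Keep genes whose rank shifts by less than `cutoff` of the total number of genes.
  rownames(expr)[abs(r1 - r2) / n < cutoff]
}
```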
We are looking at RNA processing genes, so there is no reason to assume that "housekeeping" genes would be unaffected, but I'll look into the rank invariance approach.
Just to report back on this... Rank invariant normalisation does in fact do better than something like quantile normalisation, but it is still clearly normalising away real differences, as can be seen from the behaviour of the spike-in controls. Unfortunately normalising to the spike-ins doesn't work either ... still tinkering.
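For anyone following along, the spike-in check was nothing fancy, just something along these lines (MA_raw, MA_norm and is_spike are stand-ins for the un-normalised and normalised MALists and a logical vector flagging the spike-in probes):

```r
## Compare the spike-in log-ratios before and after normalisation,
## one box per array, to see how much the normalisation moves them.
par(mfrow = c(1, 2))
boxplot(MA_raw$M[is_spike, ],  main = "Spike-ins, raw",        ylab = "M", xlab = "Array")
boxplot(MA_norm$M[is_spike, ], main = "Spike-ins, normalised", ylab = "M", xlab = "Array")
```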