I know people do this commonly, but what's the best way to normalize multiple different datasets for what is essentially a "meta-analysis" for clustering purposes? Basically, I want to take several RNAi screens from different papers, normalize them (since their scoring schemes may differ), and find common signatures shared between them for clustering. What I don't know is at which step this sort of "meta" analysis should be done - before or after clustering?
I ask because I worry that normalizing the data can weaken its spread (after all, that's essentially the whole point of normalization...) and thus affect how the changes in phenotype cluster.
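To make the worry concrete, here's a toy sketch of the kind of per-screen normalization I have in mind (gene names and scores are made up; the only real API call is scipy's rankdata) - converting each screen's raw scores to within-screen quantiles so the scales become comparable while the ordering within each screen is preserved:

```python
# Toy sketch, not real data: three screens whose scores live on different
# scales, put onto a common within-screen quantile scale before any
# cross-screen comparison.
import numpy as np
from scipy.stats import rankdata

screens = {
    "screen_A": {"geneA": 2.1, "geneB": -0.5, "geneC": 3.3, "geneD": 0.2},   # z-like scores
    "screen_B": {"geneA": 180, "geneB": 40, "geneC": 220, "geneD": 95},      # raw counts
    "screen_C": {"geneA": 0.90, "geneB": 0.10, "geneC": 0.95, "geneD": 0.40} # fraction surviving
}

normalized = {}
for name, scores in screens.items():
    genes = list(scores)
    vals = np.array([scores[g] for g in genes], dtype=float)
    # Rank-based quantiles in (0, 1): comparable across screens, while the
    # ordering within each screen (the spread that matters for clustering)
    # is untouched.
    quantiles = rankdata(vals) / (len(vals) + 1)
    normalized[name] = dict(zip(genes, quantiles))

print(normalized["screen_B"])  # geneC ranks highest in every screen
```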
What's the readout from the screens - is it multiparametric (high-content imaging, RNA-Seq, ...) or single-parameter (cell survival, a reporter assay)? I'm not clear on what signatures you're looking for by clustering - are these clusters of genes that have common readout patterns across RNAi treatments? If the readouts are very different across the datasets, a common approach would be to reduce data to more comparable summary statistics within each dataset, and then compare the summary statistics across the datasets.
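To sketch what I mean, assuming each screen boils down to one numeric score per gene (all numbers and gene names below are fabricated): collapse each screen to a robust z-score per gene, computed within that screen, then stack the shared genes into a gene x screen matrix that can be clustered directly.

```python
# Illustrative sketch only - one score per gene per screen assumed.
import numpy as np

def robust_z(values):
    """Median/MAD z-score, so screens with different scales become comparable."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    return (values - med) / (1.4826 * mad)  # 1.4826 makes MAD consistent with SD

screens = {
    "screen_A": {"geneA": 2.1, "geneB": -0.5, "geneC": 3.3, "geneD": 0.2},
    "screen_B": {"geneA": 180.0, "geneB": 40.0, "geneC": 220.0, "geneD": 95.0},
}

# Keep only genes present in every screen, in a fixed order.
genes = sorted(set.intersection(*(set(s) for s in screens.values())))
matrix = np.column_stack(
    [robust_z([screens[name][g] for g in genes]) for name in sorted(screens)]
)
print(genes)
print(matrix)  # rows = genes, columns = screens, on a shared scale
```

Because the median and MAD come from within each screen, a z of +2 means roughly the same thing ("unusually strong survival for that screen") regardless of the original scoring scale.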
Most of the datasets are survival readouts. It's like a suppressor screen - you do a genome-wide RNAi, add a drug (or do the RNAi in the presence of a mutant), and see what survives.
When you say "a common approach would be to reduce data to more comparable summary statistics within each dataset, and then compare the summary statistics across the datasets", do the summary statistics tell you enough to cluster? When I think of summary statistics, I think of the distribution itself (like the shape of the phenotypic survival scores), and I'm not sure that tells you anything about which genes group together.
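To check my understanding, is the idea something like the following (fabricated matrix; standard scipy hierarchical clustering), where each gene's per-screen summary scores - not the overall distribution shape - form its feature vector, and the clusters are the gene groups?

```python
# Fabricated example: each gene's row of per-screen summary scores is its
# feature vector, and hierarchical clustering on those rows recovers the
# gene groups.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
# Rows = genes, columns = three screens (already-normalized summary scores).
z = np.array([
    [ 2.0,  1.8,  2.2],   # strong suppressor in all three screens
    [ 1.9,  2.1,  1.7],   # similar pattern -> should co-cluster with geneA
    [-1.5, -1.8, -1.2],   # sensitizer pattern
    [ 0.1, -0.2,  0.0],   # no effect
    [-1.4, -1.6, -1.3],   # sensitizer -> should co-cluster with geneC
])

tree = linkage(z, method="ward")    # Euclidean distance between gene rows
labels = fcluster(tree, t=3, criterion="maxclust")
for gene, lab in zip(genes, labels):
    print(gene, "-> cluster", lab)  # A/B, C/E, and D come out as 3 groups
```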