Hi everyone.
I have been contemplating running a few analyses of publicly available Affymetrix array data with R and Bioconductor, to define phenotype signatures, identify transcriptional biomarkers, et cetera. I always start from raw data, and the workflow options I have are:
[1] Pool all the .CEL files, then run them through RMA and limma in one go.
[2] Normalise the arrays from each study separately, treating each study as its own batch, then combine the normalised expression values into one expression set for further analysis.
[3] Analyse each study separately and combine the per-study P-values, for instance using Stouffer's Z.
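For option [3], here is a minimal sketch of Stouffer's Z for a single gene, written in stdlib-only Python purely to show the arithmetic (in practice this would sit downstream of per-study limma fits in R). It assumes each study supplies a two-sided, unadjusted P-value and a log fold change for the gene. Each P-value is turned into a signed z-score so that effects in opposite directions cancel rather than reinforce, and the combined score is Z = sum(w_i * z_i) / sqrt(sum(w_i^2)). The function names are hypothetical, and the optional weights are an assumption; sqrt(sample size) is a common choice.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def signed_z(p_two_sided, log_fc):
    """Turn a two-sided P-value and the sign of its log fold change into a z-score."""
    # Floor tiny P-values so inv_cdf never receives 0 (it requires 0 < p < 1).
    p = min(max(p_two_sided, 1e-300), 1.0)
    # -inv_cdf(p/2) equals inv_cdf(1 - p/2) but avoids rounding 1 - p/2 up to 1.0.
    z = -_STD_NORMAL.inv_cdf(p / 2.0)
    return z if log_fc >= 0 else -z


def stouffer(p_values, log_fcs, weights=None):
    """Combine one gene's per-study two-sided P-values with (weighted) Stouffer's Z.

    Returns (combined_z, combined_two_sided_p).
    """
    if weights is None:
        weights = [1.0] * len(p_values)  # unweighted Stouffer
    zs = [signed_z(p, fc) for p, fc in zip(p_values, log_fcs)]
    numerator = sum(w * z for w, z in zip(weights, zs))
    z_comb = numerator / sqrt(sum(w * w for w in weights))
    p_comb = 2.0 * _STD_NORMAL.cdf(-abs(z_comb))
    return z_comb, p_comb


if __name__ == "__main__":
    # Hypothetical gene: modest, same-direction effects in three studies.
    p_vals = [0.04, 0.08, 0.06]
    fcs = [0.8, 0.5, 0.6]
    sizes = [12, 8, 20]  # hypothetical per-study sample sizes
    print(stouffer(p_vals, fcs))
    print(stouffer(p_vals, fcs, weights=[sqrt(n) for n in sizes]))
```

Multiple-testing correction (e.g. Benjamini-Hochberg) would then be applied once to the combined P-values across all genes, rather than within each study.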
Previously, I took the lists of differentially expressed genes from each study addressing a given question and looked for genes that recurred across them. I am not a fan of this approach: dodgy or small datasets give high adjusted P-values, which introduces lots of false negatives.
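The recurrence approach amounts to vote counting. A minimal Python sketch, assuming each study has been reduced to a dict mapping gene IDs to BH-adjusted P-values (the gene names, the alpha cutoff and the min_studies threshold are all hypothetical):

```python
from collections import Counter


def recurrent_genes(adj_p_by_study, alpha=0.05, min_studies=2):
    """Return genes called DE (adjusted P < alpha) in at least min_studies studies."""
    counts = Counter()
    for study in adj_p_by_study:  # each study: dict of gene ID -> adjusted P-value
        for gene, adj_p in study.items():
            if adj_p < alpha:
                counts[gene] += 1
    return {gene: n for gene, n in counts.items() if n >= min_studies}


if __name__ == "__main__":
    studies = [
        {"GENE_A": 0.01, "GENE_B": 0.02},  # large study
        {"GENE_A": 0.20, "GENE_B": 0.03},  # small study
        {"GENE_A": 0.30, "GENE_B": 0.04},  # small study
    ]
    print(recurrent_genes(studies))  # {'GENE_B': 3}
```

GENE_A is dropped because only one study calls it significant, regardless of whether the smaller studies point the same way; their lack of power shows up as false negatives.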
Which workflow would you recommend, and why? Also, what other solutions exist for carrying out microarray meta-analysis starting from .CEL files and sample group data?
Cheers, Ankur Chakravarthy.