Let's say I had two separate gene expression microarray experiments of normal vs. cancer cells, for the same cell type and on the same platform, such as the Affymetrix Human Genome U133A Array. Would there be any pitfalls in aggregating the data by taking the CEL files from both experiments, RMA-normalizing them together, and then comparing the aggregate control to the aggregate cancer? If so, would it be best to start from the CEL files, or could the aggregation also work on more processed downstream data, such as the expression values from the SOFT files in the GEO database?
I agree with Neilfws and disagree with matt.newman and andrew. I would add, though, that you should include batch/study as a covariate in the design matrix. I haven't used RankProd before, but it looks appealing. If you want to find genes relevant to your cancer, comparing the end results of the two studies (p-values of fold changes) gives you worse sensitivity and worse specificity than an integrated meta-analysis. For example, genes that are found to be differential in one study and not in the other are often borderline significant in both studies. A Venn diagram, or any side-by-side comparison of the p-values of the two studies, doesn't take this information into account, while an integrated meta-analysis does.
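To make the "study as a covariate" point concrete, here is a minimal sketch in Python with NumPy. It is not the limma or RankProd workflow. It is just an ordinary least-squares fit per gene, and the function name, the toy expression values and the study labels are all made up for illustration. The toy data are deliberately unbalanced (study A is mostly cancer, study B is mostly normal) so that the batch shift hides the real difference when the samples are simply pooled.

```python
import numpy as np

def cancer_effect_with_batch(expr, is_cancer, study):
    """Per-gene least squares: expr ~ intercept + cancer + study.

    expr      : genes x samples matrix of log2 expression values
    is_cancer : 0/1 per sample
    study     : study/batch label per sample
    Returns the cancer-vs-normal coefficient for every gene.
    (Hypothetical helper for illustration, not a library function.)
    """
    expr = np.asarray(expr, dtype=float)
    is_cancer = np.asarray(is_cancer, dtype=float)
    study = np.asarray(study)

    # One indicator column per study, except the first (the reference level).
    levels = np.unique(study)
    batch_cols = [(study == lv).astype(float) for lv in levels[1:]]

    design = np.column_stack([np.ones(expr.shape[1]), is_cancer] + batch_cols)
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    return coef[1]  # row 1 of the coefficients is the cancer effect

# Made-up, noise-free toy gene: baseline 5, true cancer effect +1 (log2),
# and study B shifted by +2 relative to study A.
is_cancer = np.array([0, 0, 1, 1, 1, 1, 1, 1,   # study A: 2 normal, 6 cancer
                      0, 0, 0, 0, 0, 0, 1, 1])  # study B: 6 normal, 2 cancer
study = np.array(["A"] * 8 + ["B"] * 8)
expr = (5.0 + 1.0 * is_cancer + 2.0 * (study == "B"))[np.newaxis, :]

# Pooling everything and ignoring the study label.
naive = expr[:, is_cancer == 1].mean(axis=1) - expr[:, is_cancer == 0].mean(axis=1)
# Same data, with study in the design matrix.
adjusted = cancer_effect_with_batch(expr, is_cancer, study)

print("naive difference:", naive[0])
print("batch-adjusted effect:", adjusted[0])
```

In this toy case the pooled difference of means comes out at essentially zero, while the model with the study column recovers the +1 effect. Real data are noisier and you would want moderated statistics on top, but the design-matrix idea is the same.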
I guess it really depends on how many datasets you're comparing. Take this one, for example (from ImmunoLand by Omicsoft: www.omicsoft.com/immunoland):
The x-axis represents the log2 fold change, while the size of each dot indicates the p-value. Each dot is a comparison in a particular GEO dataset. I think you can conclude that this gene (and genes with similar patterns) is consistently up-regulated in skin disease and IBD compared with normal.
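If you want a number to go with that kind of plot, one option is to combine the per-dataset results while keeping the direction of change. The sketch below uses Stouffer's signed z-score method with equal weights. This is my own choice for illustration, not something ImmunoLand is known to do. It assumes the comparisons are independent and the p-values are two-sided, and the fold changes and p-values in the example are invented.

```python
from math import sqrt
from statistics import NormalDist

def stouffer_signed(log2_fold_changes, p_values):
    """Combine two-sided p-values across datasets, keeping direction.

    Each p-value is turned into a z-score whose sign follows the log2
    fold change, so datasets that disagree in direction cancel out.
    Assumes independent comparisons and equal weights.
    Returns (combined z, combined two-sided p-value).
    """
    norm = NormalDist()
    z_scores = []
    for lfc, p in zip(log2_fold_changes, p_values):
        p = min(max(p, 1e-300), 1.0)   # keep p inside (0, 1] for inv_cdf
        z = -norm.inv_cdf(p / 2.0)     # magnitude from a two-sided p-value
        z_scores.append(z if lfc >= 0 else -z)
    z_comb = sum(z_scores) / sqrt(len(z_scores))
    p_comb = 2.0 * (1.0 - norm.cdf(abs(z_comb)))
    return z_comb, p_comb

# Invented example: borderline in both datasets, same direction.
z_same, p_same = stouffer_signed([0.8, 0.9], [0.06, 0.07])
# Same p-values, but the two datasets disagree on the direction.
z_mixed, p_mixed = stouffer_signed([0.8, -0.9], [0.06, 0.07])

print("same direction:", z_same, p_same)
print("opposite directions:", z_mixed, p_mixed)
```

Two results that each miss the 0.05 cutoff combine to a p-value below 0.05 when they point the same way, and to almost no evidence when they point in opposite directions. A Venn diagram of per-study hit lists would treat both cases the same.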