Question

Aggregating Microarrays from different experiments

0

9.8 years ago

ad ▴ 30

Let's say I have two separate gene expression microarray experiments comparing normal vs. cancer cells of the same cell type on the same platform, e.g. the Affymetrix Human Genome U133A Array. Would there be any pitfalls in aggregating the data by taking the CEL files from both experiments, RMA-normalizing them together, and then comparing the aggregate control group to the aggregate cancer group? If so, would it be best to start from the CEL files, or could the aggregation also work on more processed downstream data, such as the expression values from the SOFT files in the GEO database?

microarray Affymetrix expression • 3.5k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 9.8 years ago by ad ▴ 30

Ram · Answer 1 · 2015-07-06

2

9.8 years ago

Neilfws 49k

Plenty of pitfalls, yes. But we can combine different studies; it's called meta-analysis. As a starting point, you may like to investigate the R/Bioconductor package RankProd, which was written for this purpose.

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 9.8 years ago by Neilfws 49k
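To give a rough idea of what a rank-based meta-analysis does, here is a minimal Python sketch of the rank-product idea: rank the genes by fold change within each study, then take the geometric mean of each gene's ranks across studies. This is not RankProd's API or its full method (the package also estimates significance by permutation, which is omitted here). The gene names and log2 fold changes are made-up illustration values.

```python
import math

def rank_descending(values):
    """Return 1-based ranks where rank 1 is the largest value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def rank_product(logfc_by_study):
    """Geometric mean of each gene's per-study rank (smaller = more consistently up)."""
    n_studies = len(logfc_by_study)
    n_genes = len(logfc_by_study[0])
    per_study_ranks = [rank_descending(fc) for fc in logfc_by_study]
    return [
        math.exp(sum(math.log(r[g]) for r in per_study_ranks) / n_studies)
        for g in range(n_genes)
    ]

# Hypothetical log2 fold changes (cancer vs normal), one list per study.
genes = ["geneA", "geneB", "geneC", "geneD"]
study1_logfc = [2.1, 0.3, 1.8, -0.5]
study2_logfc = [1.7, 0.2, 1.9, 0.4]
study3_logfc = [2.0, 0.1, 1.5, 0.0]

rp = rank_product([study1_logfc, study2_logfc, study3_logfc])
ranked_genes = [g for _, g in sorted(zip(rp, genes))]
print(ranked_genes)
```

Because only within-study ranks are used, the studies never have to be put on a common intensity scale, which is part of why this kind of approach is attractive for combining experiments.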

1

I agree with Neilfws and disagree with matt.newman and andrew. I would add that you should include batch/study as a covariate in the design matrix. I haven't used RankProd before, but it looks appealing. If you want to find genes relevant to your cancer, comparing the end results of the two studies (p-values of fold changes) gives you worse sensitivity and worse specificity than an integrated meta-analysis. For example, genes that are found differential in one study but not in the other are often borderline significant in both. A Venn diagram or a side-by-side comparison of the two studies' p-values doesn't use this information, while an integrated meta-analysis does.

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 9.8 years ago by Irsan ★ 7.8k
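As a small illustration of why study belongs in the design matrix, here is a Python sketch for a single gene. It assumes an additive model (expression = study baseline + cancer effect) and made-up numbers in which study B sits 3 log2 units above study A and has more cancer samples. The study-adjusted estimate is computed by centering within each study, which gives the same cancer coefficient as a least-squares fit with study as a covariate. In practice you would fit this for every gene with something like limma rather than by hand.

```python
def mean(xs):
    return sum(xs) / len(xs)

def naive_difference(samples):
    """Pooled cancer mean minus pooled normal mean, ignoring study."""
    cancer = [y for _, is_cancer, y in samples if is_cancer]
    normal = [y for _, is_cancer, y in samples if not is_cancer]
    return mean(cancer) - mean(normal)

def study_adjusted_difference(samples):
    """Cancer effect with study as an additive covariate (within-study centering)."""
    by_study = {}
    for study, is_cancer, y in samples:
        by_study.setdefault(study, []).append((float(is_cancer), y))
    num = den = 0.0
    for rows in by_study.values():
        x_bar = mean([x for x, _ in rows])
        y_bar = mean([y for _, y in rows])
        for x, y in rows:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den

# Hypothetical log2 expression of one gene: (study, is_cancer, value).
# True cancer effect is +1; study B has a +3 batch shift and more cancers.
samples = [
    ("A", False, 5.0), ("A", False, 5.0), ("A", False, 5.0), ("A", True, 6.0),
    ("B", False, 8.0), ("B", True, 9.0), ("B", True, 9.0), ("B", True, 9.0),
]

naive = naive_difference(samples)
adjusted = study_adjusted_difference(samples)
print(naive, adjusted)
```

With these numbers the pooled comparison mixes the batch shift into the cancer effect, while the study-adjusted estimate recovers the effect that was built into the toy data.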

0

I guess it really depends on how many datasets you're comparing. Take this one for example (taken from ImmunoLand by Omicsoft: www.omicsoft.com/immunoland):

The x-axis represents the log2 fold change, while the size of the dot indicates the p-value. Each dot is a comparison in a particular GEO dataset. I think you can conclude that this gene (and genes with similar patterns) is consistently up-regulated in skin disease and IBD when compared to normal.

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 9.8 years ago by matt.newman ▴ 170

0

Sure, things that are very solid will be more easily identified using both methods.

ADD REPLY • link 9.8 years ago by Irsan ★ 7.8k
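The borderline case discussed in this thread can be made concrete with a small Python sketch. It uses Fisher's method for combining independent p-values, which is a stand-in chosen for simplicity and not the method of RankProd or of any tool mentioned here. The p-values are made up, and the method ignores the direction of the effect, so treat it as an illustration only.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # For even degrees of freedom (2k) the chi-square tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

alpha = 0.05
p_study1, p_study2 = 0.06, 0.06  # hypothetical gene, borderline in both studies

# List-overlap (Venn) approach: the gene must pass the cutoff in each study.
in_overlap = p_study1 < alpha and p_study2 < alpha

# Combined evidence across the two studies.
combined = fisher_combined_p([p_study1, p_study2])
print(in_overlap, round(combined, 4))
```

A gene like this is dropped by both individual lists and therefore by their overlap, yet the combined evidence falls below the same 0.05 cutoff.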

Ram · Answer 2 · 2015-07-07

As Neilfws points out, it can be done, but that doesn't mean it should be done. We have an application designed specifically for this called iPathwayGuide (www.iPathwayGuide.com). This is a web-based application that doesn't require any coding experience. Simply upload your CEL files in the groups you wish to analyze. iPathwayGuide will QC check and normalize (GCRMA) the CEL files automatically. Information about DEGs, predicted miRNAs, GO terms, pathways and diseases is provided in minutes. If you have multiple analyses, you can generate a meta report that will give you information about the overlap between the datasets. Here's a brief video.

Ram · Answer 3 · 2015-07-07

0

9.8 years ago

matt.newman ▴ 170

I'd consider looking at the differentially expressed genes from each dataset and comparing those between the datasets, rather than trying to aggregate the raw (or normalized) data all at once. We do something similar with public datasets in our ImmunoLand product (http://www.omicsoft.com/immunoland).

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 9.8 years ago by matt.newman ▴ 170