Aggregating Microarrays from different experiments
9.4 years ago
ad ▴ 30

Let's say I had two separate gene expression microarray experiments comparing normal vs. cancer cells, for the same cell type and on the same platform (e.g. the Affymetrix Human Genome U133A Array). Would there be any pitfalls in aggregating the data by taking the CEL files from both, RMA-normalizing them together, and then comparing the aggregate control to the aggregate cancer samples? If so, would it be best to start from the CEL files, or could the aggregation also work on more processed downstream data, such as the expression values from the SOFT files in the GEO database?

microarray Affymetrix expression • 3.2k views
9.4 years ago
Neilfws 49k

Plenty of pitfalls, yes. But different studies can be combined; this is called meta-analysis. As a starting point, you may like to investigate the R/Bioconductor package RankProd, which was written for this purpose.
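To make the rank-based idea concrete, here is a minimal Python sketch of a rank product statistic computed across studies. It is not the RankProd package's code: it assumes each study has already been reduced to one log2 fold change per gene, the gene names and values are hypothetical toy data, and it omits the permutation step that a real analysis would need to assess significance.

```python
import math

def rank_product(fold_changes_by_study):
    """Geometric mean of per-study ranks for each gene.

    fold_changes_by_study: list of dicts mapping gene -> log2 fold change
    (cancer vs. normal), one dict per study. Rank 1 = most up-regulated.
    """
    # Only genes measured in every study can be ranked consistently.
    genes = sorted(set.intersection(*(set(fc) for fc in fold_changes_by_study)))
    log_rank_sum = {g: 0.0 for g in genes}
    for fc in fold_changes_by_study:
        # Rank within each study separately, so raw values are never
        # compared across studies.
        ordered = sorted(genes, key=lambda g: fc[g], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            log_rank_sum[gene] += math.log(rank)
    k = len(fold_changes_by_study)
    return {g: math.exp(log_rank_sum[g] / k) for g in genes}

# Hypothetical toy values, not real data.
study_a = {"GENE1": 2.1, "GENE2": 0.3, "GENE3": -1.0, "GENE4": 1.2}
study_b = {"GENE1": 1.8, "GENE2": -0.2, "GENE3": -0.7, "GENE4": 2.0}

rp = rank_product([study_a, study_b])
for gene in sorted(rp, key=rp.get):
    print(gene, round(rp[gene], 3))
```

Because only within-study ranks are used, differences in overall intensity scale between the two experiments do not enter the statistic; a small rank product means the gene is near the top of the list in every study.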


I agree with Neilfws and disagree with matt.newman and andrew, though I would add that you should include batch/study as a covariate in the design matrix. I haven't used RankProd before, but it looks appealing. If you want to find genes relevant to your cancer, comparing the end results of both studies (p-values of fold changes) gives you worse sensitivity and worse specificity than an integrated meta-analysis. For example, genes that are found differential in one study and not in the other are often borderline significant in both studies. A Venn diagram or a comparison of the two studies' p-values doesn't take this information into account, while an integrated meta-analysis does.
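As a rough illustration of why the study covariate matters, the sketch below fits one gene's log2 expression by ordinary least squares with and without a study column in the design matrix. Everything in it is an assumption made for illustration: the numbers are toy values, the `ols` helper is a small hand-written solver, and the design is deliberately unbalanced (study 1 mostly normal, study 2 mostly cancer). In R the adjusted model would typically be written with a formula along the lines of `~ group + study`.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical samples for one gene: (log2 expression, is_cancer, is_study2).
samples = [
    (7.0, 0, 0), (7.1, 0, 0), (6.9, 0, 0), (8.0, 1, 0),
    (8.5, 0, 1), (9.5, 1, 1), (9.6, 1, 1), (9.4, 1, 1),
]
y = [s[0] for s in samples]
X_naive = [[1, s[1]] for s in samples]           # intercept + cancer
X_adjusted = [[1, s[1], s[2]] for s in samples]  # intercept + cancer + study

naive = ols(X_naive, y)
adjusted = ols(X_adjusted, y)
print("cancer effect, study ignored: ", round(naive[1], 3))
print("cancer effect, study adjusted:", round(adjusted[1], 3))
```

With these toy numbers the cancer coefficient is 1.75 when study is ignored and 1.0 once study is in the model, because the 1.5 offset between the two studies is otherwise absorbed into the cancer effect. In a perfectly balanced design the estimate itself would not shift, but the study term would still remove between-study variation from the residuals.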


I guess it really depends on how many datasets you're comparing. Take this one for example (taken from ImmunoLand by Omicsoft: www.omicsoft.com/immunoland):

The x-axis represents the log2 fold change, while the size of each dot indicates the p-value. Each dot is a comparison in a particular GEO dataset. I think you can conclude that this gene (and genes with similar patterns) is consistently up-regulated in skin disease and IBD when compared to normal.

Sure, things that are very solid will be more easily identified using both methods.
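The difference between the two approaches for solid versus borderline genes can be shown with a few lines of arithmetic. The sketch below uses Fisher's method for combining p-values, chosen here only because it is simple to write down; it is not what RankProd or either commercial tool does. It assumes the studies are independent and that the p-values are one-sided tests in the same direction, and the gene labels and p-values are made up.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # test statistic / 2
    # Closed-form chi-square survival function, valid for even df (2k).
    tail = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail

ALPHA = 0.05
# Hypothetical per-study p-values for two genes.
per_study_p = {"SOLID": [0.001, 0.002], "BORDERLINE": [0.04, 0.06]}

called_in_both = {g: all(p < ALPHA for p in ps) for g, ps in per_study_p.items()}
combined = {g: fisher_combined_p(ps) for g, ps in per_study_p.items()}

for gene in per_study_p:
    print(gene, "significant in both studies:", called_in_both[gene],
          "| combined p:", round(combined[gene], 5))
```

The solid gene passes either way. The borderline gene misses the cutoff in one study, so it drops out of a list intersection, yet its combined p-value is about 0.017 because both studies point the same way.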
9.4 years ago
andrew ▴ 560

As Neilfws points out, it can be done, but that doesn't mean it should be done. We have an application designed specifically for this called iPathwayGuide (www.iPathwayGuide.com). This is a web-based application that doesn't require any coding experience. Simply upload your CEL files in the groups you wish to analyze. iPathwayGuide will QC-check and normalize (GCRMA) the CEL files automatically. Information about DEGs, predicted miRNAs, GO terms, pathways and diseases is provided in minutes. If you have multiple analyses, you can generate a meta report that will give you information about the overlap between the datasets. Here's a brief video.

9.4 years ago
matt.newman ▴ 170

I'd consider looking at the differentially expressed genes from each dataset and comparing those between the datasets, rather than trying to aggregate the raw (or normalized) data all at once. We do something similar with public datasets in our ImmunoLand product (http://www.omicsoft.com/immunoland).
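A comparison of per-dataset results along these lines might look like the following sketch. It is not how ImmunoLand works; it simply assumes each study has already been analyzed separately and reduced to a log2 fold change and an adjusted p-value per gene, and the function name, cutoff and values are all hypothetical.

```python
def concordant_degs(results_a, results_b, alpha=0.05):
    """Genes significant in both studies with the same direction of change.

    Each results dict maps gene -> (log2 fold change, adjusted p-value).
    """
    shared = set(results_a) & set(results_b)
    hits = []
    for gene in sorted(shared):
        fc_a, p_a = results_a[gene]
        fc_b, p_b = results_b[gene]
        same_direction = (fc_a > 0) == (fc_b > 0)
        if p_a < alpha and p_b < alpha and same_direction:
            hits.append(gene)
    return hits

# Hypothetical per-study results, not real data.
study_a = {"GENE1": (2.0, 0.001), "GENE2": (1.1, 0.03),
           "GENE3": (-1.5, 0.002), "GENE4": (0.4, 0.40)}
study_b = {"GENE1": (1.7, 0.004), "GENE2": (-0.9, 0.02),
           "GENE3": (-1.2, 0.010), "GENE5": (0.8, 0.01)}

overlap = concordant_degs(study_a, study_b)
print(overlap)
```

Checking the sign of the fold change as well as the p-value keeps out genes such as GENE2 here, which is significant in both studies but changes in opposite directions.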
