Question

Microarray analysis of CEL files with Log-transformation instead of GCRMA or RMA

9.7 years ago

Bioinformatist Newbie ▴ 270

Basic problem: When I analyse microarray data (treated vs. control) with GEO2R, I get about 1000 genes above lfc=1, but when I do the analysis using either GCRMA or RMA followed by limma, I get only 3 genes above lfc=1.
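For reference, "lfc" in this thread is the log2 fold change. For a simple two-group comparison it is the mean log2 expression of the treated arrays minus that of the control arrays, so lfc=1 corresponds to a two-fold change on the linear scale. The Python sketch below only computes that difference and counts genes past the cutoff. It is not limma, which fits a linear model per gene, and the gene names and values are hypothetical.

```python
from statistics import mean


def log2_fold_change(treated, control):
    """lfc for one gene: mean log2 value in treated minus mean log2 value in control."""
    return mean(treated) - mean(control)


def count_above(lfcs, threshold=1.0):
    """Number of genes whose |lfc| is at least the threshold."""
    return sum(1 for v in lfcs if abs(v) >= threshold)


# Hypothetical log2 expression values, 3 treated vs. 3 control arrays per gene.
genes = {
    "geneA": ([8.5, 8.4, 8.6], [7.0, 7.1, 6.9]),  # about 1.5: roughly 2.8-fold up
    "geneB": ([5.0, 5.2, 4.9], [5.1, 5.0, 5.0]),  # about 0: unchanged
    "geneC": ([6.0, 6.1, 5.9], [7.6, 7.5, 7.7]),  # about -1.6: roughly 3-fold down
}
lfcs = {g: log2_fold_change(t, c) for g, (t, c) in genes.items()}
print(count_above(lfcs.values()))  # 2
print(2 ** 1.0)  # an lfc of 1 is a 2.0-fold change on the linear scale
```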

I want to do differential gene expression analysis for multiple drug-treated vs. control comparisons. I was wondering whether it is possible to read the CEL files but, instead of using RMA or GCRMA, apply only a log2 transformation, as GEO2R does. I am having trouble building an ExpressionSet from the CEL files without using RMA or GCRMA.

If anyone has tried this approach, please share your experience. Thanks.

Note: The dataset I have doesn't contain a series matrix file; otherwise I could simply have used the GEO2R approach. All I have are the CEL files.

microarray limma R GEO2R • 5.3k views

updated 2.6 years ago by Ram 45k • written 9.7 years ago by Bioinformatist Newbie ▴ 270

Do you have replicates (i.e., more than one sample per group when doing differential expression)?

9.7 years ago by Sean Davis 27k

Yes, for every experiment I have at least 3 treated and 3 control samples. I am analyzing Build 2 of the Connectivity Map.

9.7 years ago by Bioinformatist Newbie ▴ 270

When making comparisons with replicates available, I'd suggest focusing on FDR rather than (or at least in addition to) LFC. If the FDRs are near 1, then your experiment may simply not have detectable differentially-expressed genes.
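To make the LFC-versus-FDR distinction concrete, here is a minimal Python sketch of the Benjamini-Hochberg adjustment. This is the method limma's `topTable` uses by default for its `adj.P.Val` column. The sketch follows it with a filter that requires both a small FDR and a large |LFC|. The gene names, p-values, fold changes and cutoffs are hypothetical; in a real analysis the p-values would come from limma's moderated t-tests.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(names, lfcs, pvalues, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes that pass both an FDR cutoff and an |LFC| cutoff."""
    fdr = benjamini_hochberg(pvalues)
    return [name for name, lfc, q in zip(names, lfcs, fdr)
            if q <= fdr_cutoff and abs(lfc) >= lfc_cutoff]


names = ["g1", "g2", "g3", "g4"]           # hypothetical genes
lfcs = [2.0, 0.3, -1.5, 3.0]               # hypothetical log2 fold changes
pvalues = [0.001, 0.01, 0.03, 0.5]         # hypothetical raw p-values
print(benjamini_hochberg(pvalues))         # approximately [0.004, 0.02, 0.04, 0.5]
print(select_genes(names, lfcs, pvalues))  # ['g1', 'g3']
```

Here g4 has the largest fold change but is dropped because its FDR is 0.5. A large LFC on its own does not show that a gene is reliably different between groups.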

updated 2.6 years ago by Ram 45k • written 9.7 years ago by Sean Davis 27k

Ram · Answer 1 · 2015-08-20

9.7 years ago

andrew ▴ 560

RMA and GCRMA are used for normalization (more precisely, for background correction, normalization and summarization of the probe-level data). Supposedly, GEO2R performs RMA, but in my experience it does not appear to perform any kind of QC. This is a serious problem, because you have no way of knowing whether there is a bad file/sample. Although GEO2R does allow you to inspect the distribution of each CEL file, it provides no objective tool to identify outliers. Log transformation is usually performed only after QC and normalization have been done.
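RMA's normalization step is quantile normalization: every array is forced onto the same intensity distribution, namely the mean of the sorted arrays. The plain-Python sketch below shows only that idea on two hypothetical arrays. It is a simplified illustration, not the Bioconductor implementation. In particular, it does not average tied values, and it skips the background-correction and summarization steps.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays.

    arrays: list of equal-length lists, one per sample (e.g. probe intensities).
    Ties are not averaged here, unlike production implementations.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: average of the k-th smallest value across arrays.
    reference = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from smallest to largest
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Two hypothetical arrays with different intensity distributions.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization, both arrays contain exactly the same set of values. Only the ranks of the probes within each array are kept.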

The company I work for offers these capabilities for most major Affymetrix platforms for human, mouse and rat. The application is called iPathwayGuide. It accepts raw CEL files, performs QC and normalization (GCRMA) automatically, and reports statistics on the acceptable CEL files. It then performs DEG analysis, including prediction of miRNA activity, GO analysis, pathway analysis and disease analysis, and can perform meta-analysis comparing various contrasts.

The best part is that it's 100% free to use. You only pay if you want to keep your results beyond 72 hours.

Give it a shot.

http://www.iPathwayGuide.com

Here are a few screen shots of the QC/Normalization process.

[Screenshots 1, 2 and 3 of the QC/normalization process are not reproduced here.]

updated 2.6 years ago by Ram 45k • written 9.7 years ago by andrew ▴ 560

In my case it will be of no use, because GCRMA gives me no more than 3 genes above lfc=1, and the application you are sharing uses GCRMA for normalization by default.

9.7 years ago by Bioinformatist Newbie ▴ 270

GEO2R does not perform RMA. It uses the values as provided by the submitter (modulo log2 transformation), I believe.
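The "modulo log2 transformation" point can be illustrated with a sketch of the kind of check involved. As far as I recall, the R script that GEO2R generates computes a few quantiles of the submitted values. It log2-transforms them only if they look like linear-scale intensities, and it does no renormalization. The thresholds below follow that script as I remember it, so treat them as an assumption rather than documented GEO2R behavior. The function names are hypothetical.

```python
import math


def r_quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics (R's default type 7)."""
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def looks_unlogged(values):
    """Heuristic with assumed thresholds: do the values look like linear-scale intensities?"""
    v = sorted(x for x in values if not math.isnan(x))
    q0, q25, _, q75, q99, q100 = (r_quantile(v, p) for p in (0.0, 0.25, 0.5, 0.75, 0.99, 1.0))
    return (q99 > 100
            or (q100 - q0 > 50 and q25 > 0)
            or (0 < q25 < 1 and 1 < q75 < 2))


def maybe_log2(values):
    """log2-transform only if the data look unlogged; no normalization is done."""
    if not looks_unlogged(values):
        return list(values)
    # Non-positive values cannot be logged; mark them as missing.
    return [math.log2(x) if x > 0 else float("nan") for x in values]


raw = [20.0, 150.0, 800.0, 3000.0, 12000.0]   # hypothetical linear-scale values
logged = [4.3, 7.2, 9.6, 11.5, 13.5]          # hypothetical already-logged values
print(maybe_log2(logged) == logged)           # True: left untouched
print(maybe_log2(raw)[0])                     # log2(20), about 4.32
```

Whatever normalization the submitter applied (or did not apply) is passed through unchanged. That is why GEO2R results depend on the submitted processing.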

updated 2.6 years ago by Ram 45k • written 9.7 years ago by Sean Davis 27k

Yes, you are right, and I already know that. But my questions are:

1. Why do I get more than 1000 genes above lfc=1 using GEO2R, but only 3 genes above lfc=1 using RMA or GCRMA followed by limma?
2. Is it possible to read the CEL files and, instead of applying RMA or GCRMA, just apply a log2 transformation and then limma to find DEGs? I think that in this case I would get exactly the same number of genes above lfc=1 as with the GEO2R analysis.

Problem: I have tried the second method. I read the CEL files into an AffyBatch and took exprs(affybatch), but then I lose the gene identifiers (row names), and the expression matrix has ~550,000 rows (while the GPL96 platform has 22,283 probe sets). In contrast, when I read the CEL files and apply RMA or GCRMA, I get an ExpressionSet with 22,283 rows, including the gene identifiers. How can I solve this?
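The ~550,000 rows are expected. `exprs()` on an AffyBatch returns one value per probe cell on the chip, whereas RMA and GCRMA also summarize the probes of each probe set into a single value. That is why their output has one row per GPL96 probe set. The Python sketch below shows Tukey's median polish, the summarization step RMA uses, applied to a hypothetical probes x arrays matrix of log2 intensities for one probe set. It mirrors the iteration of R's `medpolish()` but is only an illustration. The probe-to-probe-set mapping, which comes from the array's CDF, is not modeled.

```python
from statistics import median


def median_polish(matrix, max_iter=10, eps=0.01):
    """Tukey median polish: matrix ~ overall + row effect + column effect + residual."""
    z = [list(row) for row in matrix]          # residuals (probes x arrays)
    nr, nc = len(z), len(z[0])
    overall, row_eff, col_eff = 0.0, [0.0] * nr, [0.0] * nc
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep probe (row) medians into the row effects.
        rdelta = [median(row) for row in z]
        z = [[x - d for x in row] for row, d in zip(z, rdelta)]
        row_eff = [a + d for a, d in zip(row_eff, rdelta)]
        delta = median(col_eff)
        col_eff = [a - delta for a in col_eff]
        overall += delta
        # Sweep array (column) medians into the column effects.
        cdelta = [median([z[i][j] for i in range(nr)]) for j in range(nc)]
        z = [[x - cdelta[j] for j, x in enumerate(row)] for row in z]
        col_eff = [a + d for a, d in zip(col_eff, cdelta)]
        delta = median(row_eff)
        row_eff = [a - delta for a in row_eff]
        overall += delta
        new_sum = sum(abs(x) for row in z for x in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff, z


def summarize_probeset(log2_probe_matrix):
    """One expression value per array: overall effect + that array's column effect."""
    overall, _, col_eff, _ = median_polish(log2_probe_matrix)
    return [overall + c for c in col_eff]


# Hypothetical probe set: 3 probes (rows) with different affinities, 3 arrays (columns).
probes = [[7.0, 8.0, 9.0],
          [8.0, 9.0, 10.0],
          [10.0, 11.0, 12.0]]
print(summarize_probeset(probes))  # [8.0, 9.0, 10.0]
```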

Kindly read it thoroughly and reply comprehensively. Thanks

updated 2.6 years ago by Ram 45k • written 9.7 years ago by Bioinformatist Newbie ▴ 270

1. GEO2R has to assume that all of the normalization done previously was appropriate.
2. Sure, but the results will be meaningless.

Use affy with RMA or GCRMA normalization. The results from that will be more reliable than what GEO2R could reasonably produce.
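Here is a toy example of how log2 transformation without normalization can produce meaningless fold changes. The "treated" arrays below carry exactly the same hypothetical signal as the controls, only scanned 2.5 times brighter. Plain log2 values turn this purely technical difference into an |lfc| above 1 for every gene. After per-array median centering, no gene passes. Median centering is a deliberately simple stand-in for normalization, not what RMA or GCRMA actually do. This illustrates the general risk and is not a diagnosis of the dataset in this thread.

```python
import math
from statistics import mean, median


def log2_arrays(arrays):
    """Plain log2 transform of each array (no normalization)."""
    return [[math.log2(x) for x in a] for a in arrays]


def median_center(arrays):
    """Subtract each array's own median: a crude per-array normalization."""
    centered = []
    for a in arrays:
        m = median(a)
        centered.append([x - m for x in a])
    return centered


def count_lfc_hits(control, treated, threshold=1.0):
    """Genes whose |mean(treated) - mean(control)| reaches the threshold."""
    hits = 0
    for g in range(len(control[0])):
        lfc = mean([a[g] for a in treated]) - mean([a[g] for a in control])
        if abs(lfc) >= threshold:
            hits += 1
    return hits


# Hypothetical raw intensities for 6 genes; identical biology in both groups.
signal = [50.0, 120.0, 400.0, 900.0, 2500.0, 8000.0]
control = [list(signal) for _ in range(3)]
treated = [[2.5 * x for x in signal] for _ in range(3)]  # only the scan brightness differs

raw_hits = count_lfc_hits(log2_arrays(control), log2_arrays(treated))
normalized_hits = count_lfc_hits(median_center(log2_arrays(control)),
                                 median_center(log2_arrays(treated)))
print(raw_hits, normalized_hits)  # 6 0
```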

updated 2.6 years ago by Ram 45k • written 9.7 years ago by Devon Ryan 105k

But if I use affy with RMA or GCRMA, I usually get fewer than 4 genes above lfc=1 for most of the experiments. What should I do in this case: try other normalizations such as MAS5, or decrease the lfc threshold? The same approach (affy + RMA/GCRMA) yields ~1500 genes above lfc=1 for some experiments, but in the current study it gives no more than 3 genes above this threshold. Does that mean the samples I am using have very similar expression values?

updated 2.6 years ago by Ram 45k • written 9.7 years ago by Bioinformatist Newbie ▴ 270

You seem to be assuming that your data actually have detectable differentially expressed genes. That is simply not always the case, unfortunately. The number of differentially expressed genes is strongly affected by the experiment, not just the analysis approach.

You are free to try other normalization methods and use different cutoffs for LFC (you should probably be using FDR, instead, for comparing across methods), but you'll need to determine the effect that this has on your false positive rate.
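One rough way to gauge the false-positive side is to recompute the number of "hits" for every possible relabeling of the samples into two groups. The Python sketch below does this, assuming a simple |lfc| cutoff. If labelings that mix treated and control arrays produce about as many hits as the real labels, the cutoff is mostly picking up noise. With 3 vs. 3 samples there are only 20 labelings, or 10 distinct splits since swapping the groups gives the same |lfc|, so this is a coarse check. The matrix and function names are hypothetical, and this is not a replacement for the FDR that limma reports.

```python
from itertools import combinations
from statistics import mean


def count_hits(matrix, group_a, group_b, threshold=1.0):
    """Count genes (rows) with |mean(group_a) - mean(group_b)| >= threshold."""
    hits = 0
    for row in matrix:
        lfc = mean([row[j] for j in group_a]) - mean([row[j] for j in group_b])
        if abs(lfc) >= threshold:
            hits += 1
    return hits


def permutation_hit_counts(matrix, n_treated, threshold=1.0):
    """Hit counts for every way of assigning n_treated samples to group A."""
    n_samples = len(matrix[0])
    counts = []
    for group_a in combinations(range(n_samples), n_treated):
        group_b = [j for j in range(n_samples) if j not in group_a]
        counts.append(count_hits(matrix, list(group_a), group_b, threshold))
    return counts


# Hypothetical log2 matrix: 4 genes x 6 arrays; columns 0-2 treated, 3-5 control.
matrix = [
    [8.0, 8.4, 8.2, 7.2, 7.0, 7.1],
    [5.0, 6.4, 4.8, 5.6, 4.9, 6.1],
    [9.1, 9.0, 9.2, 9.1, 9.3, 9.0],
    [3.2, 4.9, 3.0, 3.1, 4.8, 3.3],
]
observed = count_hits(matrix, [0, 1, 2], [3, 4, 5])
null_counts = permutation_hit_counts(matrix, 3)
# Share of labelings (the real one included) giving at least as many hits.
share_as_extreme = sum(c >= observed for c in null_counts) / len(null_counts)
print(observed, null_counts, share_as_extreme)
```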

updated 5.5 years ago by Ram 45k • written 9.7 years ago by Sean Davis 27k