Difference between RMA analysis of CEL files and data from GEOquery for array data
11.6 years ago
J.F.Jiang ▴ 930

Hi all,

This is just for discussion.

For microarray data, there are two common ways to obtain probe-level expression values across samples:

1) download the original CEL files, then use ReadAffy and rma to get the matrix, or use justRMA directly

2) use GEOquery to obtain the matrix directly

However, I found small differences between the values from these two methods, and I do not know why.

Another question: can I use the matrix from GEOquery directly for differential expression analysis, just as I would use the output of rma?

And which is better for DE analysis: 1) probe level or 2) gene level? One gene may be targeted by several probes, and a DE analysis includes a multiple-testing step (p.adjust). An array may have 50K probes but only 20K genes, so the adjusted p-values, and hence the results, may differ considerably between the two levels.
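To make the multiple-testing concern concrete, here is a minimal sketch of Benjamini-Hochberg adjustment (the method R's `p.adjust` offers as `"BH"`), written in plain Python for illustration. The function name `bh_adjust` and the toy p-values are assumptions made for this example, not something taken from a real dataset. It shows that the same raw p-value receives a larger adjusted value when it is one of 50,000 tests than when it is one of 20,000.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy setting (hypothetical): one small p-value, all other tests null-like.
p_small = 1e-6
gene_level = [p_small] + [1.0] * 19_999    # about 20K genes
probe_level = [p_small] + [1.0] * 49_999   # about 50K probes

print("gene level :", bh_adjust(gene_level)[0])
print("probe level:", bh_adjust(probe_level)[0])
```

In this deliberately extreme toy case the adjusted value scales with the number of tests. With real data the effect is usually softer, because probes for the same gene tend to be correlated and several of them may be significant together.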

Can anyone answer these questions?

Thanks

I always use the expression matrix directly. The difference between the two methods can be ignored; array data are not that accurate. I don't know how to choose between probes and genes.

I am not sure I fully agree with you. I think that for gene expression analysis, arrays seem more accurate than RNA-seq (using VST or RPKM values). The great advantage of RNA-seq, I think, is its ability to cover all genes, especially those with low transcription.

If I have misunderstood, please correct me.

11.6 years ago
Neilfws 49k
  1. Where raw data (CEL files) are available, you should use them, simply because you can never fully trust data that have been processed by someone else unless what they did is absolutely explicit.

  2. You can expect "minimal" differences in RMA values between different implementations. If raw data are not available on which to perform normalization yourself and you are comfortable with the available processed data matrix, by all means use it. By "comfortable" I mean you understand what kind of values it contains, how they were derived and that they "look sensible" (for example, are not in the hundreds or thousands if the data were supposedly log2-transformed).

  3. Neither probeset-level nor gene-level data are "better" for DE analysis: it all depends what you are trying to achieve. Using multiple probesets per gene can be informative if you are interested in splice variants or in evaluating how good probesets are as measures of expression; some may be more "responsive" than others.

In general, the most differentially-expressed genes in a gene-level analysis will also have the most differentially-expressed probesets. This is simply because gene-level values are a rather crude summary, most often obtained by taking the median of (core) probesets for a gene.
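The median summary mentioned above can be sketched as follows. This is only an illustration in plain Python: the probeset IDs, gene names, values and the helper `collapse_to_genes` are all hypothetical, and real pipelines usually take the probeset-to-gene mapping from the platform annotation.

```python
from collections import defaultdict
from statistics import median


def collapse_to_genes(probe_values, probe_to_gene):
    """Summarize probeset rows into gene rows using the per-sample median."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probesets without an annotation are dropped
            grouped[gene].append(values)
    # zip(*rows) iterates over samples; take the median across probesets.
    return {
        gene: [median(sample) for sample in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Hypothetical log2 values for three samples.
probe_values = {
    "ps_1": [7.0, 7.2, 9.1],
    "ps_2": [6.0, 6.4, 8.9],
    "ps_3": [8.0, 8.0, 9.5],
    "ps_4": [5.0, 5.1, 5.2],
    "ps_5": [4.0, 4.0, 4.0],  # no gene annotation in this toy mapping
}
probe_to_gene = {
    "ps_1": "GENE_A",
    "ps_2": "GENE_A",
    "ps_3": "GENE_A",
    "ps_4": "GENE_B",
}

gene_values = collapse_to_genes(probe_values, probe_to_gene)
print(gene_values)
```

Other summaries (mean, or keeping the probeset with the highest variance) are also used. The median is shown here only because it is the one named in the answer.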

Great comments. I later tried both rma and justRMA, and there is no difference between them. Regarding splice variants, probe-level analysis is indeed important. However, p-value adjustment at that level may introduce bias, and it is still unclear to me how to adjust the p-values: FDR control to get q-values?

For CEL file analysis, RMA is usually recommended to reduce bias between arrays. Here is another question: MAS5 normalization scales all arrays to the same level (e.g., a target of 200 for affy), after which we log2-transform the matrix. So do both methods seem appropriate for DE analysis?
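Whichever method produced a matrix, it is worth checking whether the values are already on the log2 scale before any DE analysis, in the spirit of the "look sensible" advice in the answer. The sketch below is a rough heuristic in plain Python, not an established rule: the cutoff `max_log2=20.0`, the `floor` value and both function names are assumptions chosen for illustration.

```python
import math


def looks_log2(values, max_log2=20.0):
    """Heuristic: log2 intensities stay small; raw intensities reach hundreds or thousands."""
    finite = [v for v in values if v is not None and not math.isnan(v)]
    return bool(finite) and max(finite) <= max_log2


def to_log2(values, floor=1.0):
    """Log2-transform raw-scale values, clamping at `floor` to avoid log of values <= 0."""
    return [math.log2(max(v, floor)) for v in values]


# Hypothetical values: MAS5-like raw scale versus RMA-like log2 scale.
raw_scale = [35.0, 210.5, 1800.0, 15230.0]
log_scale = [5.1, 7.7, 10.8, 13.9]

for name, vals in [("raw_scale", raw_scale), ("log_scale", log_scale)]:
    if looks_log2(vals):
        print(name, "already looks log2-transformed")
    else:
        print(name, "looks raw; after log2:", to_log2(vals))
```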

Finally, with RMA or MAS5 the probe-level matrix is normalized. After log2 transformation we quantile-normalize the probes and map them to genes, so which should be carried out first: normalization -> gene mapping, or gene mapping -> normalization?
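For reference, quantile normalization itself can be sketched in a few lines. The plain-Python version below is a simplified illustration with made-up values: ties are broken by original order rather than averaged, and `quantile_normalize` is a name chosen for this example. In RMA as it is usually described, this step is applied to probe intensities before they are summarized into probesets, so any gene mapping comes afterwards.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list of values per array)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across arrays at each rank.
    reference = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n)
    ]
    normalized = []
    for col in columns:
        # Rank each value within its array, then substitute the reference value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Three hypothetical arrays measured on the same four probes.
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]

result = quantile_normalize(arrays)
for col in result:
    print(col)
```

After this step every array has the same set of values, and only the ranking of probes within each array differs.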
