I was analyzing .cel files. When I mapped the probe IDs to gene symbols, I found that a gene is sometimes represented by multiple probe sets. How should I define a single expression value for such a gene?
+1. In my opinion this question has received surprisingly little attention, especially considering that different probes on the same gene might show very different patterns of expression. I vaguely remember a paper suggesting that the best way to represent the overall expression of a gene is to use the probe with the highest intensity (can't find the ref just now...).
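As a rough illustration of the highest-intensity rule mentioned above, here is a minimal Python sketch. It assumes the expression values are already normalized and held in plain dictionaries. The probe IDs, gene symbols, and the function name `collapse_by_max_mean` are made up for the example and do not come from any particular package.

```python
from statistics import mean

def collapse_by_max_mean(expr, probe_to_gene):
    """For each gene, keep the probe set with the highest mean intensity.

    expr: dict mapping probe ID -> list of intensities (one per sample).
    probe_to_gene: dict mapping probe ID -> gene symbol.
    Returns a dict mapping gene symbol -> (chosen probe ID, its intensities).
    """
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # skip probes without a gene annotation
        avg = mean(values)
        if gene not in best or avg > best[gene][0]:
            best[gene] = (avg, probe, values)
    return {gene: (probe, values) for gene, (avg, probe, values) in best.items()}

# Hypothetical toy data: two probe sets for GENE_A, one for GENE_B.
expr = {
    "probe_1": [7.1, 7.3, 6.9],
    "probe_2": [9.8, 10.1, 9.9],
    "probe_3": [5.0, 5.2, 5.1],
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

gene_expr = collapse_by_max_mean(expr, probe_to_gene)
print(gene_expr["GENE_A"][0])  # probe_2
```

Ranking by the mean across samples is only one possible reading of "highest intensity"; the median or the maximum would be equally easy to swap in.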
I suppose you are referring to gene expression arrays that can have multiple probes per gene. If you want a single value, take the probe closest to the 3' end of the gene. The reasoning is that RNA molecules are degraded from the start (the 5' end) of the molecule, so the signal from a probe near the 3' end is the most reliable. Multiple probes were designed per gene to quantify the extent of RNA degradation and to allow analysis of differential isoform expression.
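A minimal Python sketch of the 3'-end rule, assuming you already have genomic coordinates and strand for each probe. The record layout, the IDs, and the function name `pick_three_prime_probe` are hypothetical and only serve the example. On the plus strand the 3' end has the largest coordinate; on the minus strand it has the smallest.

```python
def pick_three_prime_probe(probes):
    """Pick, for each gene, the probe lying closest to the gene's 3' end.

    probes: list of dicts with keys "id", "gene", "strand" ("+" or "-"),
            "start" and "end" (genomic coordinates, start <= end).
    Returns a dict mapping gene symbol -> chosen probe ID.
    """
    chosen = {}
    for p in probes:
        # Higher score means further towards the 3' end of the transcript.
        score = p["end"] if p["strand"] == "+" else -p["start"]
        gene = p["gene"]
        if gene not in chosen or score > chosen[gene][0]:
            chosen[gene] = (score, p["id"])
    return {gene: probe_id for gene, (score, probe_id) in chosen.items()}

# Hypothetical annotation: GENE_A on the plus strand, GENE_B on the minus strand.
probes = [
    {"id": "probe_1", "gene": "GENE_A", "strand": "+", "start": 100, "end": 125},
    {"id": "probe_2", "gene": "GENE_A", "strand": "+", "start": 900, "end": 925},
    {"id": "probe_3", "gene": "GENE_B", "strand": "-", "start": 500, "end": 525},
    {"id": "probe_4", "gene": "GENE_B", "strand": "-", "start": 50, "end": 75},
]

three_prime = pick_three_prime_probe(probes)
print(three_prime)  # {'GENE_A': 'probe_2', 'GENE_B': 'probe_4'}
```

This compares probes only against each other, not against an annotated transcript end, so it selects the 3'-most probe among those available for the gene.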