summarise probe-level expression to genes (or exons)

Question

How to map probeset-associated statistics to gene-level statistics in microarray differential expression analysis?


7.2 years ago

moxu ▴ 510

For microarray data, differential expression analysis is done for each probeset. The problem is that one gene typically maps to multiple probesets. Since, for most if not all practical purposes, we are interested in differential expression at the gene level rather than the probeset level, I am wondering what the best way is to map probeset-level results to gene-level results. For instance, a gene has multiple probesets, and each probeset has its own p-value, fold-change, etc. When we map the probesets to the corresponding gene, should we take the probeset with the smallest p-value and use its statistics for the gene? Or the median p-value? The mean? Something else?

Thanks in advance!

gene R RNA-Seq • 2.4k views

updated 7.2 years ago by Kevin Blighe • written 7.2 years ago by moxu

score 4 · Answer 1 · 2018-03-05

NB - added July 31, 2020: see also C: Human Exon array probeset to gene-level expression

----

For microarray analysis, one key parameter of the rma() function (as provided by the oligo package) relates to your question when you run RMA normalisation: target

# summarise probe-level expression to genes (or exons)

rma(MyCELfiles, background=TRUE, normalize=TRUE, target="core")

What this does depends on the array type. If you have a 'Gene' array, then expression is summarised to genes. If you have an 'Exon' array, then it is summarised to exons.

# summarise at probe-set level

rma(MyCELfiles, background=TRUE, normalize=TRUE, target="probeset")

---------------------------------------

Two further options are available for 'Exon' arrays:

target = "full"
target = "extended"

If you still cannot obtain the correct level of summarisation with these, then just summarise yourself: average the probesets that map to each gene with the aggregate() function.
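As a rough illustration of the aggregate() approach, here is a minimal sketch on a made-up table. The data frame `expr`, its column names, and all values are hypothetical and not taken from any real array. In practice the probeset-to-gene mapping would come from your array's annotation package.

```r
# Toy probeset-level expression table (hypothetical values, log2 scale assumed).
expr <- data.frame(
  probeset = c("ps1", "ps2", "ps3", "ps4", "ps5"),
  gene     = c("GeneA", "GeneA", "GeneB", "GeneB", "GeneB"),
  sample1  = c(8, 9, 5, 6, 7),
  sample2  = c(10, 12, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Columns holding expression values (everything except the identifiers).
sample_cols <- c("sample1", "sample2")

# Average all probesets that map to the same gene, separately for each sample.
gene_expr <- aggregate(
  expr[, sample_cols],
  by  = list(gene = expr$gene),
  FUN = mean
)

print(gene_expr)
```

This averages the expression values, not the p-values, so the differential expression test would then be run on the resulting gene-level table. Swapping FUN = mean for FUN = median is a one-word change if a more outlier-resistant summary is preferred.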

Kevin