Question

Microarray gene name annotation and multiple probes mapped to one gene

5.4 years ago

MatthewP ★ 1.4k

Hello, I have googled a lot but I am still confused. First, what is the major difference between calling differential expression (limma::lmFit) on the expression matrix before gene name annotation, versus annotating the expression matrix with gene names first and then calling differential expression?
Second, how should I choose a strategy to handle multiple probes mapped to the same gene? Some people use the average (mean) value and some use the largest value. If I care about the whole gene, maybe the average is better. But if I call differential expression at the probe level and then merge (combine) the data from multiple probes into one gene, I don't think it is appropriate to average logFC or P.val. Can you share your experience here? Thanks.

microarray limma • 1.6k views

updated 5.4 years ago by Kevin Blighe 88k • written 5.4 years ago by MatthewP ★ 1.4k

score 2 · Accepted Answer · 2019-07-13

For your first question, assuming that you have a data matrix of n x m dimensions, it does not make any difference what the rows are called. limma will fit a linear model to each row (gene or probe) independently, irrespective of the row names.
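To illustrate why the labels do not matter, here is a minimal sketch in plain Python (not limma itself, which is an R package). It computes a simple per-row log fold-change as a stand-in for the per-row model fit. The expression values, probe IDs, gene symbols and the function name `log_fold_changes` are all made up for illustration.

```python
from statistics import mean

def log_fold_changes(rows, groups):
    """Per-row difference in group means (group 'B' minus group 'A').

    This is only a stand-in for a per-row model fit; the row labels are
    never passed in, so they cannot influence the result.
    """
    result = []
    for values in rows:
        a = [v for v, g in zip(values, groups) if g == "A"]
        b = [v for v, g in zip(values, groups) if g == "B"]
        result.append(mean(b) - mean(a))
    return result

# Hypothetical log2 expression values: 3 probes x 4 samples.
expr = [
    [7.0, 7.2, 8.0, 8.2],
    [5.0, 5.5, 5.0, 5.5],
    [9.0, 9.5, 8.0, 8.5],
]
groups = ["A", "A", "B", "B"]

probe_ids = ["probe_1", "probe_2", "probe_3"]  # made-up probe IDs
gene_names = ["GENE_X", "GENE_Y", "GENE_X"]    # made-up gene symbols

lfc = log_fold_changes(expr, groups)

# Attaching probe IDs or gene names afterwards only changes the labels,
# not the statistics.
by_probe = list(zip(probe_ids, lfc))
by_gene_label = list(zip(gene_names, lfc))
```

The statistics only change if rows are merged or removed before the fit, which is a separate decision from how the rows are named.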

For your second question, there is no clear answer without knowing the array type and version that you are using. Each array is designed differently.

For example, some Affymetrix arrays are designed with probe-sets (a probe-set consists of multiple probes that are related to each other) that target exons, while others are designed with probe-sets that target entire genes. Probe-level summarisation for Affymetrix arrays can be controlled to some extent during RMA normalisation, as described here: "How to map probeset associated statistics to gene statistics in microarray differential expression analysis?"

If, after that, you still find that you have duplicate genes in your expression matrix, then you can summarise these by mean or median expression.
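As a rough sketch of what summarising duplicates by mean or median looks like, here is a small plain-Python example. It is not a limma function; the helper name `collapse_by_gene`, the gene symbols and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

def collapse_by_gene(rows, gene_names, summarise=mean):
    """Combine rows that share a gene name, sample by sample."""
    grouped = defaultdict(list)
    for gene, values in zip(gene_names, rows):
        grouped[gene].append(values)
    # zip(*probe_rows) walks over samples (columns) within one gene.
    return {
        gene: [summarise(column) for column in zip(*probe_rows)]
        for gene, probe_rows in grouped.items()
    }

# Hypothetical log2 expression values: 3 probes x 4 samples,
# where two probes map to the same (made-up) gene.
expr = [
    [7.0, 7.0, 8.0, 8.0],
    [5.0, 5.5, 5.0, 5.5],
    [9.0, 9.0, 8.0, 8.0],
]
gene_names = ["GENE_X", "GENE_Y", "GENE_X"]

by_mean = collapse_by_gene(expr, gene_names, summarise=mean)
by_median = collapse_by_gene(expr, gene_names, summarise=median)
```

In this made-up data the two GENE_X probes move in opposite directions between the first two and last two samples, so the averaged row is flat. That is worth checking for in real data before collapsing.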

For Agilent arrays, well, these are again designed differently, and RMA normalisation cannot be used for them. For summarisation, however, you can use the avereps() function in limma, which averages over replicate probes (rows that share the same ID).

I will not comment on Illumina cDNA arrays.

Kevin