Hi All
I know similar questions have been asked before but, having read the answers, I am still unclear about the best solution to the following problem:
We have run a custom one-colour Agilent oligonucleotide microarray (with essentially genome-wide coverage) on 24 disease and 24 control human brain samples. In some cases, multiple probes correspond to the same gene. How do I calculate the fold change for a gene that is targeted by multiple probes? Here are some of the options I have come across:
- use the probe with the highest normalized intensity averaged over all samples
- use the probe with the highest absolute value of differential expression
- use the probe with the highest signal variation
- use the probe with the maximum interquartile range (IQR) of expression values (this method is implemented in Agilent's GeneSpring for the Gene Set Enrichment Analysis function)
- for each gene, select a single RefSeq entry, primarily the one annotated by TaqMan assays. If multiple probes match the same RefSeq entry, only the probe closest to the 3′ end is used (this method is adopted in this MicroArray Quality Control project paper)
- select the probe least likely to cross-hybridise, i.e., the probe with the least similarity to other areas of the genome based on a BLAT search using the UCSC Genome Browser
- take the median fold change of all probes
- select the probe with the lowest p-value
Which option would you use & why? (Apologies about the long question!)
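To make three of the options above concrete (highest mean intensity, maximum IQR, and median fold change), here is a minimal sketch in plain Python. It assumes the intensities are already normalized and on a log2 scale, so that a fold change is a difference of group means. The probe IDs, the values, and the function names are invented for illustration and are not from any real array or package.

```python
from statistics import mean, median, quantiles


def log2_fold_change(disease, control):
    """Difference of group means; valid as a log2 fold change
    only if the inputs are log2-scale intensities."""
    return mean(disease) - mean(control)


def iqr(values):
    """Interquartile range using the default (exclusive) quartile method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def gene_fold_changes(probes):
    """probes maps probe ID -> (disease values, control values) for ONE gene.
    Returns the gene-level log2 fold change under three selection rules."""
    fc = {pid: log2_fold_change(d, c) for pid, (d, c) in probes.items()}
    # Probe with the highest intensity averaged over all samples
    by_intensity = max(probes, key=lambda p: mean(probes[p][0] + probes[p][1]))
    # Probe with the largest interquartile range over all samples
    by_iqr = max(probes, key=lambda p: iqr(probes[p][0] + probes[p][1]))
    return {
        "highest_mean_intensity": fc[by_intensity],
        "max_iqr": fc[by_iqr],
        "median_of_probes": median(fc.values()),
    }


# Hypothetical gene with three probes, four samples per group
example_probes = {
    "probe_1": ([8.0, 8.2, 7.8, 8.0], [7.0, 7.1, 6.9, 7.0]),
    "probe_2": ([10.0, 10.1, 9.9, 10.0], [9.8, 9.9, 9.7, 9.8]),
    "probe_3": ([6.0, 6.0, 6.0, 6.0], [5.5, 5.5, 5.5, 5.5]),
}
result = gene_fold_changes(example_probes)
print(result)
```

In this toy example the three rules give three different answers for the same gene (roughly 0.2, 1.0, and 0.5 on the log2 scale), which is exactly why the choice matters.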
Dear Davy,
I realise it has been a long time since your suggestion, but in my opinion option 8 (selecting the probe with the lowest p-value) amounts to cherry-picking from the data: you keep only the probe that looks most significant, and that probe may not reflect the underlying biology, especially when looking at drug treatments for a particular condition. Any thoughts?
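As a rough illustration of the cherry-picking concern, here is a small simulation sketch. It makes several assumptions that are mine and not from the thread: there is no true difference between groups, each gene has four probes, the probes are statistically independent (real probes for the same gene are correlated, so the real inflation would be smaller), and |t| > 2.0 stands in for p < 0.05 with 24 samples per group. With equal group sizes, the probe with the largest |t| is also the probe with the lowest p-value from a standard two-sample t-test.

```python
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample t statistic (pooled and Welch forms coincide for equal n)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def simulate(n_genes=1000, probes_per_gene=4, n_per_group=24,
             t_cutoff=2.0, seed=0):
    """Return (rate using the first probe, rate using the best probe)
    of 'significant' genes when no gene is truly differential."""
    rng = random.Random(seed)
    first_hits = 0
    best_hits = 0
    for _ in range(n_genes):
        abs_t = []
        for _ in range(probes_per_gene):
            disease = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            abs_t.append(abs(t_statistic(disease, control)))
        first_hits += abs_t[0] > t_cutoff      # one pre-chosen probe
        best_hits += max(abs_t) > t_cutoff     # probe with the lowest p-value
    return first_hits / n_genes, best_hits / n_genes


first_rate, best_rate = simulate()
print(f"first probe: {first_rate:.3f}, lowest-p probe: {best_rate:.3f}")
```

Under these assumptions, the pre-chosen probe should give a false-positive rate near the nominal 5%, while picking the lowest-p probe per gene pushes it well above that. If option 8 is used at all, the p-values would need to be corrected for the number of probes compared per gene.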