Question

Microarray: How To Select One Of Multiple Probes Corresponding To A Gene

12.4 years ago

Nasir ▴ 270

Hi All

I know similar questions have been asked before, but having read the answers, I am still unclear about the best solution to the following problem:

We have run a custom one-colour Agilent oligonucleotide microarray (with essentially genome-wide coverage) on 24 disease and 24 control human brain samples. In some cases, multiple probes correspond to the same gene. How do I calculate the fold change for a gene that is targeted by multiple probes? Here are some of the options I have come across:

1. Use the probe with the highest normalized intensity averaged over all samples.
2. Use the probe with the largest absolute differential expression.
3. Use the probe with the highest signal variation.
4. Use the probe with the largest interquartile range (IQR) of expression (this method is implemented in Agilent's GeneSpring for its Gene Set Enrichment Analysis function).
5. For each gene, select a single RefSeq entry, primarily the one annotated by TaqMan assays. If multiple probes match the same RefSeq entry, use only the probe closest to the 3′ end (this method was adopted in the MicroArray Quality Control (MAQC) project paper).
6. Select the probe least likely to cross-hybridise, i.e., the probe with the least similarity to other regions of the genome, based on a BLAT search in the UCSC Genome Browser.
7. Take the median fold change across all probes.
8. Select the probe with the lowest p-value.
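As a rough illustration of how much the data-driven options can differ, here is a minimal Python sketch, assuming the data are already normalized and log2-transformed in a probes × samples matrix. The function names, rule keys and toy values are hypothetical. It covers options 1-4, 7 and 8; options 5 and 6 depend on probe annotation (RefSeq and 3′ position, BLAT hits) rather than intensities, so they are left out. For option 8 it ranks probes by |t| from a pooled-variance (Student) t-test, which gives the same order as ranking by p-value because every probe has the same degrees of freedom.

```python
import numpy as np

def probe_stats(expr, is_disease):
    """Per-probe statistics for a log2-scale matrix (rows = probes, columns = samples)."""
    expr = np.asarray(expr, dtype=float)
    grp = np.asarray(is_disease, dtype=bool)
    dis, ctl = expr[:, grp], expr[:, ~grp]
    n1, n2 = dis.shape[1], ctl.shape[1]
    log2fc = dis.mean(axis=1) - ctl.mean(axis=1)
    # Pooled-variance t statistic; df = n1 + n2 - 2 is identical for every
    # probe, so the largest |t| corresponds to the lowest p-value.
    sp2 = ((n1 - 1) * dis.var(axis=1, ddof=1)
           + (n2 - 1) * ctl.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    t = log2fc / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    q75, q25 = np.percentile(expr, [75, 25], axis=1)
    return {
        "mean": expr.mean(axis=1),             # option 1: average normalized intensity
        "abs_log2fc": np.abs(log2fc),          # option 2: largest differential expression
        "variance": expr.var(axis=1, ddof=1),  # option 3: signal variation
        "iqr": q75 - q25,                      # option 4: interquartile range
        "abs_t": np.abs(t),                    # option 8: lowest p-value (via largest |t|)
        "log2fc": log2fc,
    }

def collapse_to_genes(probe_genes, expr, is_disease, rule="iqr"):
    """Return one log2 fold change per gene, chosen or summarized by `rule`."""
    stats = probe_stats(expr, is_disease)
    out = {}
    for gene in dict.fromkeys(probe_genes):    # unique genes, first-seen order
        idx = [i for i, g in enumerate(probe_genes) if g == gene]
        if rule == "median_fc":                # option 7: median over all probes
            out[gene] = float(np.median(stats["log2fc"][idx]))
        else:                                  # options 1-4 and 8: keep one probe
            best = max(idx, key=lambda i: stats[rule][i])
            out[gene] = float(stats["log2fc"][best])
    return out

# Hypothetical toy data: 3 probes, 2 disease and 2 control samples.
expr = [[8.0, 8.2, 6.0, 6.1],     # probe 1 -> gene A (changes)
        [10.0, 10.1, 9.9, 10.0],  # probe 2 -> gene A (bright but flat)
        [5.0, 5.5, 5.0, 5.4]]     # probe 3 -> gene B
genes = ["A", "A", "B"]
disease = [True, True, False, False]
print(collapse_to_genes(genes, expr, disease, rule="mean"))
```

In this toy example, rule="mean" keeps the bright but flat probe on gene A, while "abs_log2fc", "variance", "iqr" and "abs_t" all keep the probe that actually changes, so the reported fold change for the same gene moves from about 0.1 to about 2.05 depending on the rule.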

Which option would you use & why? (Apologies for the long question!)

microarray agilent • 22k views

updated 5.2 years ago by asalimih ▴ 60 • written 12.4 years ago by Nasir ▴ 270

score 6 · Answer 1 · 2012-06-20

12.4 years ago

Jeremy Leipzig 22k

Before you do any of that, see if you can associate your probes with transcripts (ENST IDs or otherwise) instead of genes. You might find that some of these changes are limited to one isoform, which averaging would mask.
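To see how averaging can hide an isoform-specific change, here is a minimal Python sketch with hypothetical probe IDs, placeholder transcript labels (standing in for real ENST IDs) and made-up fold changes: two probes sit on one transcript that changes, two on another that does not. A real probe-to-transcript map would have to come from the array annotation or a re-alignment of the probe sequences.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotation: probe -> (gene, transcript).
probe_annot = {
    "P1": ("GENE1", "ENST_A"),
    "P2": ("GENE1", "ENST_A"),
    "P3": ("GENE1", "ENST_B"),
    "P4": ("GENE1", "ENST_B"),
}
# Hypothetical per-probe log2 fold changes (disease vs control).
probe_log2fc = {"P1": 2.0, "P2": 1.8, "P3": 0.1, "P4": -0.1}

def summarize(level):
    """Average probe log2FC per gene (level=0) or per transcript (level=1)."""
    groups = defaultdict(list)
    for probe, fc in probe_log2fc.items():
        groups[probe_annot[probe][level]].append(fc)
    return {key: mean(vals) for key, vals in groups.items()}

print("per gene:      ", summarize(0))
print("per transcript:", summarize(1))
```

Here the gene-level mean (0.95) looks like a modest change, while the transcript-level summary shows one isoform up by about 1.9 log2 units and the other essentially unchanged.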

score 3 · Answer 2 · 2012-06-20

12.4 years ago

Davy ▴ 410

This option is not on your list, but I would check whether the probes carry information about multiple transcripts from the same gene. I don't know of a way to do this programmatically, but if you have a list of top-hit probes, you could check those first, then revisit even the non-significant probes from within the same gene, if there are any. If you are calculating the fold change, you probably want to show the biggest difference, because this is likely a list of top hits or something similar, so I would go for option 2 or 8.
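The triage step described above (start from the top-hit probes, then look at the other probes on the same gene) can be sketched in Python as follows. The result tuples, probe and gene names, and the p-value cutoff are all hypothetical; deciding whether discordant sibling probes really reflect different transcripts still needs annotation and manual inspection.

```python
from collections import defaultdict

# Hypothetical per-probe results: (probe_id, gene, log2fc, p_value).
results = [
    ("P1", "GENE1", 1.9, 1e-6),
    ("P2", "GENE1", 0.1, 0.70),
    ("P3", "GENE2", -1.2, 1e-4),
    ("P4", "GENE3", 0.3, 0.40),
]

def siblings_of_top_hits(results, p_cutoff=0.001):
    """For each probe below p_cutoff, list the other probes on the same gene."""
    by_gene = defaultdict(list)
    for row in results:
        by_gene[row[1]].append(row)
    report = {}
    for probe, gene, fc, p in results:
        if p < p_cutoff:
            report[probe] = [r for r in by_gene[gene] if r[0] != probe]
    return report

for hit, sibs in siblings_of_top_hits(results).items():
    print(hit, "->", [(s[0], s[2], s[3]) for s in sibs])
```

In this toy input, P1 has a non-significant sibling (P2) worth a closer look, while P3 is the only probe on its gene.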

Dear Davy,

I understand that it has been a long time since your suggestion, but in my opinion, option 8 might be considered cherry-picking from the data, since you keep only the probe with the lowest p-value, and that might not reflect the biological scenario, especially when we look at drug treatments for a particular condition. Any thoughts?

5.5 years ago by vinayjrao ▴ 250
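The multiple-testing side of this concern can be made concrete. Assuming a gene has k probes whose null p-values are independent and uniform on (0, 1), the chance that the smallest one falls at or below α is 1 - (1 - α)^k, so always keeping the lowest p-value inflates false positives. The minimal Python sketch below computes that rate and checks it by simulation; the function names are hypothetical, and because probes on the same gene are usually correlated, the real inflation is likely smaller than this independence figure.

```python
import random

def min_p_false_positive_rate(k, alpha=0.05):
    """P(min of k independent null p-values <= alpha) = 1 - (1 - alpha)**k."""
    return 1 - (1 - alpha) ** k

def simulate(k, alpha=0.05, n_genes=100_000, seed=1):
    """Monte Carlo check: under H0 each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(min(rng.random() for _ in range(k)) <= alpha
               for _ in range(n_genes))
    return hits / n_genes

for k in (1, 2, 4):
    print(k, round(min_p_false_positive_rate(k), 4), simulate(k))
```

Under this assumption, with four probes the false-positive rate at α = 0.05 rises from 0.05 to about 0.185. If the lowest p-value is used anyway, a Šidák-style adjustment, 1 - (1 - p_min)^k, compensates under the same independence assumption.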

score 2 · Answer 3 · 2012-06-21

From my own experience, and as Davy and Jeremy stated, if multiple probes targeting the same gene show different expression levels, this might indicate alternative transcription and should be investigated. There may be many more such cases than you first expect.

For probes targeting the same transcript and showing similar expression levels, I would take the median (or mean) fold change; there is no need to get too fancy here.

But do check for alternative transcripts first.
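The rule of thumb above (take the median when probes agree, investigate when they do not) can be sketched in Python like this. The per-gene fold changes and the max_spread threshold of 0.5 log2 units are hypothetical; a real analysis might prefer a different concordance criterion, such as whether the probes agree in sign.

```python
from statistics import median

# Hypothetical per-probe log2 fold changes, grouped by gene.
gene_probes = {
    "GENE1": [1.1, 1.0, 1.2],   # probes agree
    "GENE2": [1.5, -0.2, 0.1],  # probes disagree: possible isoform effect
}

def summarize_gene(fcs, max_spread=0.5):
    """Median log2FC if the probes agree within max_spread, else flag the gene."""
    spread = max(fcs) - min(fcs)
    if spread <= max_spread:
        return ("median", median(fcs))
    return ("check_transcripts", spread)

for gene, fcs in gene_probes.items():
    print(gene, summarize_gene(fcs))
```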

score 1 · Answer 4 · 2019-09-11

5.2 years ago

asalimih ▴ 60

Although this question is old, I found this answer on one of the Reddit bioinformatics forums very useful: link
