I need to make a judgment call about what the background level of expression is in a given experiment on the Affymetrix Gene ST microarray platform (described here). Samples are normalized using RMA from the Bioconductor oligo package. At the gene level, the probesets fall into these categories:
- main (i.e. the real genes)
- control->affx (spike-in controls)
- control->bgp->antigenomic
- normgene->intron (for wacky housekeeping gene calculations)
- normgene->exon (more wacky housekeeping)
- flmrna->unmapped
My thought was to use the expression level of the antigenomic probes (which are designed not to hybridize to anything in mouse, human, fly, etc) to get an idea of the post-normalization background level. The point is to determine where credible expression levels start for that experiment. This would be a rule of thumb, not a statistical measure. Antigenomic probes range across the spectrum of GC content (which affects hybridization strength).
The trouble is that post-normalization expression for these probes ranges between 3 and 8 on the log2 scale, presumably reflecting GC content. My question: has anyone got a better idea than "take the median of background probes" to get a floor for post-normalization background expression? Major bonus for answers motivated by statistics or chemistry.
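To make the rule of thumb concrete, here is a minimal sketch in Python (standard library only, so it stands apart from the R/oligo pipeline). It reports the median of the background probesets together with an upper percentile, which gives a more conservative floor than the median alone when the background is spread as widely as it is here. The function names, the 95th-percentile cutoff, and all values are made up for illustration; none of this is part of oligo or any Affymetrix tool.

```python
import statistics


def background_floor(bg_log2, upper_pct=95):
    """Summarize background probesets: return (median, upper percentile).

    bg_log2   : post-normalization log2 values of the antigenomic probesets
    upper_pct : integer percentile (1..99) used as the conservative floor;
                95 is an arbitrary illustrative choice
    """
    if len(bg_log2) < 2:
        raise ValueError("need at least two background values")
    if not 1 <= upper_pct <= 99:
        raise ValueError("upper_pct must be an integer between 1 and 99")
    # quantiles(n=100) returns the 99 percentile cut points;
    # entry k-1 is the k-th percentile
    cuts = statistics.quantiles(bg_log2, n=100, method="inclusive")
    return statistics.median(bg_log2), cuts[upper_pct - 1]


def above_floor(main_log2, floor):
    """Return the IDs of main probesets whose log2 expression exceeds the floor."""
    return [pid for pid, value in main_log2.items() if value > floor]


# Made-up values, for illustration only
antigenomic = [3.1, 3.4, 3.9, 4.2, 4.8, 5.5, 6.0, 6.7, 7.4, 8.0]
main = {"geneA": 4.0, "geneB": 7.9, "geneC": 9.3}

median_bg, floor_bg = background_floor(antigenomic)
credible = above_floor(main, floor_bg)
```

The same summary could be computed separately within GC-content bins if one floor across all probes turns out to be too crude.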
In theory the antigenomic probes are negative controls, since they shouldn't bind to anything in the genome. The trouble is that their measured signal varies a lot. I'll try GCRMA and see if there is a qualitative difference in results; my guess is I won't see one, but you never know until you try.