How To Determine If A Gene Has Low Intensity/Expression In Microarrays?
11.0 years ago

Hello,

I have read that, in microarrays, genes with low intensity/expression are more prone to being false positives than more highly expressed genes.

I do not know what intensity threshold a gene must fall below to be considered lowly expressed.

I found that genes can be classified as not expressed, low, medium or high expressing using signal-to-noise ratio (SNR) thresholds (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC436055/).

However, in this publication the threshold based on SNR seems to be chosen arbitrarily.

Do you know other approaches to classify genes according to their expression levels?

gene classification microarray • 10k views
ADD COMMENT

I'd suggest that many cut-offs, thresholds etc. are chosen arbitrarily. There often isn't a "correct" value; it's a matter of how many true/false positives/negatives you are prepared to tolerate.

ADD REPLY
11.0 years ago

If you create an MA plot, you can see how absolute intensity values correlate with variation:

http://en.wikipedia.org/wiki/MA_plot

Where the cloud of points spreads out more along the M axis, your data is getting noisier. You can use this plot to pick a threshold below which to ignore low-expression genes (e.g. the value of A at which the variation in M starts to increase).

Different platforms and different normalization methods yield different signal distributions, so it isn't possible to give one universal cutoff for genes with low expression values.
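To make the MA-plot idea concrete, here is a minimal sketch that computes M and A for a two-colour array and then summarises the spread of M in bins of increasing A. The function names, the equal-size binning and the use of the standard deviation as the measure of spread are my own assumptions for illustration, not part of any particular package; it also assumes all intensities are positive.

```python
import math
import statistics

def ma_values(red, green):
    """Return (M, A) lists for paired two-channel intensities (all > 0).

    M = log2(R) - log2(G)          (log ratio)
    A = 0.5 * (log2(R) + log2(G))  (average log intensity)
    """
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a

def spread_by_intensity(m, a, n_bins=4):
    """Sort spots by A, split them into n_bins bins of (nearly) equal size,
    and return one (mean A, standard deviation of M) pair per bin."""
    if len(a) < n_bins:
        raise ValueError("need at least one spot per bin")
    order = sorted(range(len(a)), key=lambda i: a[i])
    size = len(order) // n_bins
    summary = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any leftover spots.
        end = (b + 1) * size if b < n_bins - 1 else len(order)
        idx = order[start:end]
        mean_a = statistics.mean([a[i] for i in idx])
        sd_m = statistics.pstdev([m[i] for i in idx])
        summary.append((mean_a, sd_m))
    return summary
```

Reading the summary from the lowest-A bin upwards, the point where the standard deviation of M levels off is one candidate for the cutoff; as noted above, it will differ between platforms and normalization methods.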

ADD COMMENT

As Neilfws suggests, and as is neatly supported by the figure at the bottom of the page you link to, there is no correct threshold. On the figure, there is no logical place to choose as a cutoff, only a gradual increase in the 'noisiness'. Speaking of that figure, it is a strange choice to present as typical data, since it shows at least two kinds of artefact (a spur in the upper left pointing up and to the right, and one in the center pointing down).

ADD REPLY
11.0 years ago

We usually go for the following procedure to choose genes that are expressed highly enough to analyse. Although it is arbitrary, in the sense that WE choose the method and threshold, not that we choose some random method, it is easy to explain and logical so it never gave us problems during reviews. It works if you have blank spots or negative controls on your array.

  1. We calculate the average and standard deviation of the expression value of all the blanks and negative controls on one array (we do it separately for both colours in 2-colour arrays).
  2. We define a threshold for each array (or colour) that is equal to the mean expression of the blanks plus two times the standard deviation (threshold = mean + 2*stdev). Assuming the blanks do not deviate too much from a normal distribution, about 97.7% of blank spots fall below this threshold (the familiar ~95% refers to the two-sided range mean ± 2*stdev), so the spots we keep lie outside the bulk of the blank distribution.
  3. We keep a gene if it is above the threshold on more than 80% of the samples of at least one of the biological groups we are testing.

So, let's say you have 4 groups: if at least 1 group has a spot above the mean + 2 times the stdev for 80% or more of the samples, we retain it in the analysis.

Well, maybe not THAT simple to explain, but easy to understand and kind of logical. Plus, it retains spots that may not be expressed in one group but are expressed in another. These genes are especially interesting since they could very well be differentially expressed. Other methods may miss these genes.
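The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the dictionary layout are hypothetical, the sample standard deviation is used because the post does not say which one is meant, and "80% or more" is implemented as `>=` (step 3 says "more than 80%", so switch to `>` if you read it that way).

```python
import statistics

def blank_threshold(blank_values, k=2.0):
    """Steps 1-2: threshold for one array (or one colour) = mean + k * stdev
    of its blank / negative-control spots (sample stdev assumed)."""
    return statistics.mean(blank_values) + k * statistics.stdev(blank_values)

def keep_gene(values_by_group, thresholds_by_group, min_fraction=0.8):
    """Step 3: keep the gene if, in at least one biological group, it is
    above the per-array threshold on at least min_fraction of the arrays.

    values_by_group:     {group: [this gene's intensity on each array]}
    thresholds_by_group: {group: [threshold of each array, same order]}
    """
    for group, values in values_by_group.items():
        thresholds = thresholds_by_group[group]
        n_above = sum(v > t for v, t in zip(values, thresholds))
        if n_above / len(values) >= min_fraction:
            return True
    return False

# Usage with made-up numbers: one threshold per array, then one gene.
threshold = blank_threshold([10, 12, 14])
gene = {"control": [20, 20, 20, 20, 5], "treated": [1, 1, 1, 1, 1]}
limits = {"control": [threshold] * 5, "treated": [threshold] * 5}
print(keep_gene(gene, limits))
```

Because the filter looks at group labels, the caveat about possible bias in downstream tests applies to this sketch as well.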

ADD COMMENT

Well-described. Just a note that using information about biological groups to filter genes has the potential to bias downstream statistical tests. In practice, how much bias is introduced will depend on the experimental system.

ADD REPLY

I now have duplicate dual-color microarray raw data (FE text files) and would like to set an intensity threshold to filter out lowly expressed genes. Per your method, the readings of "all the blanks and negative controls on one array" should be used to calculate the average and standard deviation of their expression values. Could you specify which values in an FE text file correspond to the "blanks and negative controls on one array"? I suppose they are the values in the gBGUsed and rBGUsed columns. But since every g/rBGSubSignal value is obtained individually by the subtraction "g/rMeanSignal - g/rBGUsed", I do not understand why the above procedure indicates a subtraction of "the mean expression" calculated from ALL the blanks and negative controls.

ADD REPLY
11.0 years ago

When analyzing Affymetrix human exon arrays, I've used the DABG (detected above background) function from the Affymetrix Power Tools software. This provides a p-value for each probe after testing whether or not a probeset is detected above background. When using gene-level estimates, a threshold can be set, such as requiring DABG P < 0.05 in ~50% of the samples of at least one group (as suggested here). This method requires you to have the raw CEL files, and I'm not sure whether it is appropriate for Affy arrays other than the human exon array.
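Once the DABG p-values have been exported, the filtering rule itself is a few lines. The sketch below assumes the p-values for one gene have already been collected per group into a plain dictionary; the function name and data layout are hypothetical, and it does not call Affymetrix Power Tools in any way.

```python
def detected_in_any_group(pvalues_by_group, alpha=0.05, min_fraction=0.5):
    """Return True if, in at least one group, the fraction of samples with
    DABG p < alpha is at least min_fraction.

    pvalues_by_group: {group: [DABG p-value of this gene in each sample]}
    """
    return any(
        sum(p < alpha for p in pvals) / len(pvals) >= min_fraction
        for pvals in pvalues_by_group.values()
    )

# Usage with made-up p-values: detected in 2 of 4 control samples.
example = {"control": [0.01, 0.2, 0.03, 0.6], "treated": [0.5, 0.4, 0.3, 0.2]}
print(detected_in_any_group(example))
```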

ADD COMMENT
