In microarray experiments there is no clear-cut way of defining cut-offs, other than this simple rule: only consider expression differences that are significantly beyond what you would expect from replicates.
You describe cases where you observe 'a higher variance within the replicates than between samples'. Unless there is some major experimental problem, this can only be explained by chance: at best, the 'different samples' are not fundamentally different from 'replicates', e.g. because the real differences between them are minor.
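To make that comparison concrete for a single probe, here is a minimal Python sketch. It assumes you have at least two replicate values per sample; the function name `variance_components` and the input layout are made up for illustration, not taken from any package.

```python
from statistics import mean, variance

def variance_components(groups):
    """groups: one list of replicate values per sample, for a single probe.

    Returns (within, between): the pooled within-replicate variance and
    the variance of the per-sample means. Each sample needs >= 2 replicates.
    """
    # Pool the replicate variances, weighting each sample by its degrees of freedom.
    df = sum(len(g) - 1 for g in groups)
    within = sum((len(g) - 1) * variance(g) for g in groups) / df
    # Spread of the sample means; note this still contains a share of replicate noise.
    between = variance([mean(g) for g in groups])
    return within, between

within, between = variance_components([[1.0, 1.2, 0.8], [3.0, 3.2, 2.8]])
print(within, between)
```

If `within` comes out as large as or larger than `between` for a probe, that probe gives you no evidence that the samples differ.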
You should be aware of the different sources of variation:
- technical random noise (inter-array differences). This is what you see in purely technical replicates (split an RNA sample into two halves, process them separately and hybridize them on different arrays). This variability is superimposed onto everything that you measure. In my experience (many thousands of microarrays) this is nowadays not too big a problem and is usually smaller than the other noise sources. However, it sets the technical limit, so you can't expect results more accurate than that.
- biological variability. This can be a big problem in outbred populations (such as humans). There are plenty of 'expression polymorphisms' caused by inter-individual differences. Even worse, there are differences not only in the baseline expression level (which you could 'normalize out') but also in the response to various stimuli. You would really have to sample many individuals before claiming that an observed difference is a general phenomenon.
- sampling variability. This is a major issue that is often not sufficiently recognized. It is caused by (hidden) variations in sample preparation. Factors to be controlled are i) time of day, ii) nutritional status, iii) drugs taken, iv) cell composition of biopsy samples, etc. Which of these factors matter depends on your system. Often iv) is of major concern because even minute contamination with other tissue (blood! fat!) can lead to dramatic changes in the expression profile.
- systematic variation. This can often be avoided but might nevertheless be a big problem. Possible causes are batch effects (of the arrays, enzymes, buffers) or a change in the microarray operator. Sometimes even a different hybridisation chamber or a different room temperature can make a big difference.
These sources of variation have to be weighed against the expression changes you expect to see. Sometimes the expression changes are so big that you hardly have to worry about variation (e.g. when exposing cells to toxins, LPS or the like), but often you do.
If you expect subtle expression changes (which you probably do, judging by your question) you should pay a lot of attention to your replicates, and maybe even increase their number. When you want to judge whether a given expression difference is meaningful, you have to apply statistical tests (e.g. t-test or ANOVA) to calculate the significance, i.e. the probability of seeing a difference at least this large by chance alone. Very important: when doing statistics on microarray results, don't forget to apply a correction for multiple testing (e.g. Benjamini-Hochberg). By doing this, you will get an 'implicit cut-off' which is directly based on the variability you observed in your replicate samples.
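As a rough illustration of the multiple-testing step, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment. It assumes you already have one p-value per probe from whatever test you ran; the function name is my own, and in practice you would more likely use an existing implementation from a statistics package.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four probes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in adjusted]
print(adjusted, significant)
```

Probes whose adjusted value falls below your chosen false discovery rate (0.05 in the example) are the ones that pass the 'implicit cut-off'.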
Could you elaborate on your experiment? What are the variables in your experiment? What is the hypothesis you are testing?
Our experiment consists of microarrays for 60 individuals. For 2 of them there are 5 replicates (5 adjacent days). The reason for this is that these samples are human cell lines, which most likely adds a layer of growth variance. We would like to filter out targeted regions that are influenced more by growth than by variance between individuals.
And what is the biological hypothesis that you want to test? How do the 60 arrays from individuals come into it?
We're performing a GWAS study on these 60 samples to find heritable chromatin statuses. However, because these are cell lines, we have seen quite some noise for some probes within the replicates. We can simply filter these by removing the top 10% most variable probes from our dataset before summarisation.
I was just wondering if anyone knows of a better/improved/de-facto way of doing this. I cannot imagine that we're the first to try something like this to improve microarray results.
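For reference, the simple filter described above might look roughly like this in Python. This is only a sketch under assumptions: the probe ids, the `replicates` layout (probe id mapped to the replicate values of one cell line) and the function name are hypothetical, and combining the two replicated cell lines is left out.

```python
from statistics import variance

def drop_most_variable(replicates, fraction=0.10):
    """replicates: dict mapping probe id -> list of replicate values.

    Returns the set of probe ids kept after removing the given fraction
    of probes with the highest replicate variance.
    """
    # Rank probes from most to least variable across the replicates.
    ranked = sorted(replicates, key=lambda p: variance(replicates[p]), reverse=True)
    # Number of probes to discard (rounded down).
    n_drop = int(len(ranked) * fraction)
    return set(ranked[n_drop:])

# Toy data: ten probes with increasing replicate spread.
toy = {f"p{i}": [1.0, 1.0 + 0.01 * i, 1.0 - 0.01 * i] for i in range(10)}
kept = drop_most_variable(toy)
print(sorted(kept))
```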
Wait a sec... What kind of arrays are these? You are doing GWAS, meaning they are SNP arrays? If so, how do you explain the variation in the first place? (And no, if this is about expression arrays, you are not the first to try to check the relation between intra- and inter-individual variation.)