When working with a microarray containing tens of thousands of probes, multiple testing is clearly an issue. I also understand that it is common to apply a multiple testing correction (for example Bonferroni or Benjamini-Hochberg), and that the more genes you test, the more stringent the correction becomes.
However, say I have a microarray study that was designed to answer a question specifically about genes related to metabolism and how they differ between two test conditions. In such a case it makes sense to me to first remove all genes that are not related to metabolism from my expression matrix (I'm not immediately sure how one would go about doing this, but I imagine it must be possible?) before performing the statistical analysis and the subsequent multiple testing correction. This would, I guess, make it easier to detect results that are relevant to my particular research question.
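To make the idea concrete, here is a minimal sketch of the order of operations I have in mind. Everything specific in it is a placeholder or assumption: the file names, the column layout, the hypothetical set of metabolism-annotated probe IDs (which one would presumably pull from GO/KEGG annotation beforehand), and the plain per-probe Welch t-test standing in for a proper limma-style fit.

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Hypothetical inputs: an expression matrix (probes x samples) and a set of
# probe IDs annotated to metabolism, defined from annotation, not from the data.
expr = pd.read_csv("expression_matrix.csv", index_col=0)          # placeholder file
metabolism_probes = set(pd.read_csv("metabolism_probes.csv")["probe_id"])

# Restrict the matrix to the a priori gene set BEFORE any testing,
# so the multiple testing correction only pays for these tests.
expr = expr.loc[expr.index.isin(metabolism_probes)]

# Placeholder design: first 4 columns are condition A, next 4 are condition B.
group_a, group_b = expr.iloc[:, :4], expr.iloc[:, 4:8]

# Simple per-probe Welch t-test (a stand-in for the real statistical model).
pvals = stats.ttest_ind(group_a, group_b, axis=1, equal_var=False).pvalue

# Benjamini-Hochberg correction is now applied over the reduced set only.
rejected, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

results = pd.DataFrame({"p": pvals, "q": qvals, "significant": rejected},
                       index=expr.index).sort_values("q")
print(results.head())
```

The point of the sketch is only the ordering: the subset is defined without looking at the expression values, and the p-values are computed and corrected only for that subset.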
My question: is this thinking correct? And if so, why is it not done more frequently?
One thing I forgot until looking at the paper today is that removing the probes with the lowest variance will interfere with limma: its empirical Bayes moderation borrows variance information across all probes, so discarding the low-variance ones biases the prior it estimates. That is something to keep in mind when doing variance-based filtering.
Thanks for the link. This is an approach I have been advocating a lot, mostly on gut feeling and common sense, so it is great to have something more substantial to back it up.
I had also been thinking about this recently, in the context of both microarray and RNA-seq experiments, after some discussions with colleagues. My impression was that it would obviously increase power, but that you have to be very careful not to introduce experimenter bias into the analysis. One of the benefits of not filtering is that you avoid introducing any such bias and may discover significant but unexpected results. I would probably run both a filtered and an unfiltered analysis if I opted to do this.