That always makes me wonder... I find it a very unsafe approach to eliminate genes as non-expressed in a particular tissue/cell type. What if we actually introduce further bias because of detection issues? I.e., can we really say that a certain set of genes is not expressed under any conditions in a particular tissue/cell type? NB: tissues and cells are dynamic and responsive; there is no static state or static signature that holds under all conditions. That's why we do the experiments, after all. So the argument that, because some genes might not be detected, we should remove even more genes from the background set doesn't really convince me.
Now, I can understand the point some people make that if we get a transcriptomic profile of a tissue and compare it to a "universe" background, all we'll learn is that we are studying that tissue. Yet if I design experiments aiming to discover an enriched/enhanced process, I would normally compare the same tissue/cell type, e.g. treated vs. untreated. That means the tissue- or cell type-specific signature will be "filtered out" at the DE level, since those genes should be at more or less the same expression level in both conditions, and the enriched sets will contain genes regulated by the treatment. The exception is a treatment that also affects e.g. the differentiation rate of the tissue or its identity: then I would get terms relevant to that tissue phenotype, but in that case I would obviously want to know they are regulated.
So, with all the possible biases, I still feel that comparing against all the genes that could be expressed (hence all the genes) is more biologically relevant than comparing against an artificially/arbitrarily selected background.
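For anyone who wants to see where the background actually enters the calculation, here is a minimal sketch of a one-sided hypergeometric over-representation test. The helper names (`hypergeom_sf`, `enrichment_p`) and the toy gene IDs are made up for illustration; real tools may differ in details such as multiple-testing correction or how genes outside the background are handled.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K are in the gene set."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def enrichment_p(de_genes, gene_set, background):
    """One-sided over-representation p-value for a chosen background.

    Genes outside the background are dropped from both lists
    (an assumption of this sketch, not a universal convention).
    """
    background = set(background)
    de = set(de_genes) & background
    gs = set(gene_set) & background
    overlap = len(de & gs)
    return hypergeom_sf(overlap, len(background), len(gs), len(de))

# Toy data (hypothetical): 20,000 annotated genes, half of them detected.
universe = [f"g{i}" for i in range(20000)]
expressed = universe[:10000]
gene_set = universe[:100]      # a 100-gene set, all detected in the tissue
de_genes = universe[95:295]    # 200 DE genes, 5 of them in the gene set

p_universe = enrichment_p(de_genes, gene_set, universe)
p_expressed = enrichment_p(de_genes, gene_set, expressed)
print(p_universe, p_expressed)
```

With the same DE list and the same gene set, only the background size changes, and the p-value against the full universe comes out smaller than against the expressed-only background. Which of the two is the more appropriate null is exactly the question under discussion here.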
But I would be very happy if somebody could suggest some thorough reading on the topic, especially as it relates to NGS data (RNA-seq and ChIP-seq). I found the brief article cited above a bit disappointing.
But couldn't/shouldn't one also limit to genes that are expressed in the tissue or cell type being profiled?
In my RNA-seq experiment I can definitely confirm that, unless I limit the background gene set to genes that are actually expressed in my tissue of interest, I see enrichment of tissue-specific genes even without enrichment among differentially expressed genes.
My tentative conclusion from this is that one should remove genes from the background gene set if they are not expressed in any sample of the experiment, but I am curious what other people think about that.
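A minimal sketch of that filter, assuming the counts are available as a plain mapping from gene to per-sample counts. The function name `expressed_background`, the `min_count=10` cutoff and the toy counts are all illustrative assumptions, not a recommended threshold.

```python
def expressed_background(counts, min_count=10, min_samples=1):
    """Return genes with at least `min_count` reads in at least `min_samples` samples.

    counts: dict mapping gene ID -> list of per-sample read counts.
    Both cutoffs are arbitrary defaults for this sketch.
    """
    return {
        gene
        for gene, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }

# Hypothetical counts for four genes across three samples.
toy_counts = {
    "GeneA": [0, 0, 0],        # never detected: dropped
    "GeneB": [0, 25, 3],       # detected in one sample: kept
    "GeneC": [120, 98, 110],   # detected everywhere: kept
    "GeneD": [2, 1, 4],        # below the cutoff in every sample: dropped
}

background = expressed_background(toy_counts)
print(sorted(background))
```

Note that a gene like `GeneD` is where the detection-bias concern raised earlier in the thread bites: whether it counts as "not expressed" depends entirely on the chosen cutoff and sequencing depth.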
Thanks for this link. Very helpful.