What is considered the best way to handle genes that are not detected at all in a two-group comparison when doing an over-representation analysis?
For example, define all genes with FPKM < 1 as not detected. I have three different two-group comparisons to make. If all undetected genes are excluded from the analysis, then a different set of ontology categories will be excluded for each comparison, because a different set of categories will reach the minimum number of genes per category in each comparison. The other option is to keep all genes in the analysis. Then the ontology categories with sufficient genes in the experiment will be the same for all three comparisons, but this has two undesired effects: the multiple-testing adjustment becomes stricter because every gene is included, and genes with small counts will inevitably be found not to be differentially expressed. This seems to artificially inflate the count of genes that are not differentially expressed, because those genes might truly be differentially expressed if more sequencing depth covered them, for example with targeted RNA-seq. There must be some abundance threshold below which the answer to "is this gene differentially expressed?" should be "don't know" rather than "no".
Filtering on the whole dataset at once doesn't seem specific enough. Consider a gene with these read counts:
Time 1: 100 110 115
Time 2: 5 3 0
Time 3: 0 2 1
Filtering on the whole dataset using a cutoff such as "at least 2 observations with >= 10 reads" would keep this gene for all contrasts, but it is only interesting for the Time 2 - Time 1 and Time 3 - Time 1 contrasts, not for the Time 3 - Time 2 contrast.
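The difference between the two strategies can be checked directly on this gene. This is a small sketch, assuming the per-contrast filter simply applies the same "at least 2 observations with >= 10 reads" rule to the pooled samples of the two groups being compared; the function names are hypothetical:

```python
# Read counts for the example gene (three observations per time point)
COUNTS = {
    "Time 1": [100, 110, 115],
    "Time 2": [5, 3, 0],
    "Time 3": [0, 2, 1],
}

MIN_READS = 10
MIN_OBS = 2


def passes(values, min_reads=MIN_READS, min_obs=MIN_OBS):
    """True if at least `min_obs` observations have >= `min_reads` reads."""
    return sum(v >= min_reads for v in values) >= min_obs


def passes_whole_dataset(counts):
    # Every sample from every time point is considered at once
    return passes([v for vals in counts.values() for v in vals])


def passes_contrast(counts, a, b):
    # Only the samples from the two groups in this contrast are considered
    return passes(counts[a] + counts[b])


print(passes_whole_dataset(COUNTS))  # True
for a, b in [("Time 2", "Time 1"), ("Time 3", "Time 1"), ("Time 3", "Time 2")]:
    print(f"{a} - {b}: {passes_contrast(COUNTS, a, b)}")
# Time 2 - Time 1: True
# Time 3 - Time 1: True
# Time 3 - Time 2: False
```

The whole-dataset rule keeps the gene everywhere because of the Time 1 counts, whereas the per-contrast rule drops it only from the Time 3 - Time 2 contrast, where no observation reaches 10 reads.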
Yeah. Most of the filtering examples you'll see apply it to the whole dataset. You could go ahead and filter separately for each comparison, though (in fact, you would normally filter each comparison's final output just before adjusting the p-values).
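The order of operations described above (raw p-values for every gene, then the per-comparison filter, then the adjustment on the survivors only) might look like the sketch below. It assumes Benjamini-Hochberg as the adjustment method, implemented by hand here, and the genes, p-values and filter outcomes are hypothetical. This kind of filtering is generally considered valid when the filter criterion (such as read counts) is independent of the test statistic under the null hypothesis, not when it is based on the p-values themselves.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical results for one comparison:
# gene -> (raw p-value, passes this comparison's abundance filter)
RESULTS = {
    "g1": (0.001, True),
    "g2": (0.020, True),
    "g3": (0.030, False),
    "g4": (0.040, True),
    "g5": (0.900, False),
}


def filter_then_adjust(results):
    """Drop genes failing the filter, then adjust only the remaining p-values."""
    kept = [g for g, (_, ok) in results.items() if ok]
    adjusted = bh_adjust([results[g][0] for g in kept])
    return dict(zip(kept, adjusted))


for gene, q in filter_then_adjust(RESULTS).items():
    print(gene, round(q, 4))
# g1 0.003
# g2 0.03
# g4 0.04
```

Adjusting all five raw p-values instead gives 0.05 for both g2 and g4, which illustrates how keeping low-abundance genes that can never reach significance makes the correction stricter for the genes that can. Filtered-out genes then fall into the "don't know" bucket for that comparison rather than being counted as not differentially expressed.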