Suppose I perform an RNA-seq experiment and make three different comparisons in it using, say, DESeq2 (i.e. WT timepoint1 vs MUT timepoint1; WT timepoint2 vs WT timepoint1; MUT timepoint2 vs MUT timepoint1).
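Concretely, the three comparisons would look something like this (only a sketch, using a combined group factor as in the DESeq2 vignette; dds, genotype and timepoint are placeholder names):

library(DESeq2)
dds$group <- factor(paste0(dds$genotype, dds$timepoint))  # e.g. WTt1, WTt2, MUTt1, MUTt2
design(dds) <- ~ group
dds <- DESeq(dds)
res1 <- results(dds, contrast = c('group', 'MUTt1', 'WTt1'))   # MUT vs WT at timepoint1
res2 <- results(dds, contrast = c('group', 'WTt2', 'WTt1'))    # WT timepoint2 vs timepoint1
res3 <- results(dds, contrast = c('group', 'MUTt2', 'MUTt1'))  # MUT timepoint2 vs timepoint1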
In each comparison, a statistical test is performed on each of many genes; so within each comparison, a correction for multiple hypotheses has to be applied.
But I perform three comparisons, not one. The chance of falsely rejecting H0 for some gene increases with each additional comparison. So should I also correct for the number of comparisons, and if not, why not?
To further play the devil's advocate: suppose that not only do I perform this experiment, but another laboratory performs exactly the same experiment. That laboratory also runs a statistical test, attempting to replicate my result. Every time we perform a statistical test, we increase the chance of a false positive. So why would the replication of the experiment by another lab not count as a reason to correct for multiple hypotheses?
In other words, what are the necessary and sufficient conditions for applying a multiple hypothesis correction?
Related to the second question (should I correct for other laboratories also testing the data?): no, you should not correct for that, and it would be impossible anyway. But you are right that it increases the chance of a false positive. This matter was explored in Ioannidis' famous paper Why most published research findings are false.
Regarding the first issue: is there a standard way to adjust for the multiple comparisons being made? Could you refer me to any resource? I have not seen this issue addressed in introductory-level DESeq2 tutorials, and I don't think it appears in the DESeq2 vignette either.
I would take the universe of unadjusted p-values from your runs, put them in p, and use
p.adjust(p, method = 'BH', n = length(p))
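For instance, a minimal sketch, assuming res1, res2 and res3 are the DESeq2 results objects from the three contrasts (hypothetical names):

p <- c(res1$pvalue, res2$pvalue, res3$pvalue)  # pool the raw p-values from all three contrasts
p <- p[!is.na(p)]                              # DESeq2 sets some p-values to NA (e.g. count outliers)
padj_global <- p.adjust(p, method = 'BH')      # one global BH adjustment; n defaults to length(p)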
There is an interesting post by Gordon Smyth (author of the limma package) on this question. I admit I don't quite follow his logic. Regarding your suggestion of pooling the p-values from all runs before adjusting, he specifically advises against it.
The logic of this is based on how the BH p-value correction depends on the distribution of the raw p-values. If the p-values are uniformly distributed (few DE genes, or at least not many more DE genes than expected if there is no factorial effect), then the BH correction will be very stringent, and almost no adjusted p-value will be "significant". On the other hand, if the p-value distribution is skewed towards low values (many DE genes), then the correction is less stringent and most DE genes will remain significant after correction.
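A quick way to see this is a toy simulation (not tied to any real dataset):

set.seed(1)
p_null <- runif(10000)                          # all genes null: uniform p-values
p_mixed <- c(rbeta(3000, 0.1, 1), runif(7000))  # 30% DE genes: p-values skewed towards 0
sum(p.adjust(p_null, method = 'BH') < 0.05)     # essentially 0 genes pass
sum(p.adjust(p_mixed, method = 'BH') < 0.05)    # roughly 2000 genes pass

With uniform p-values BH declares almost nothing significant, while at the same cutoff the skewed distribution keeps most of the small p-values significant.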
In simple terms, Gordon's post explains that if you mix p-values from different contrasts before applying the BH correction, then you "blur" the [p-value distribution] -> [correction stringency] relationship and might introduce bias.
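Here is a toy illustration of that blurring (simulated; contrast A has strong signal, contrast B is pure null):

set.seed(2)
pA <- c(rbeta(2000, 0.1, 1), runif(8000))  # contrast A: many DE genes
pB <- runif(10000)                         # contrast B: no true effects
sum(p.adjust(pB, method = 'BH') < 0.05)    # per-contrast BH on B: essentially 0 discoveries
pooled <- p.adjust(c(pA, pB), method = 'BH')
sum(pooled[10001:20000] < 0.05)            # pooled BH: a few dozen B "discoveries", all false

In this simulation the pooled list as a whole still has roughly 5% false discoveries, but every discovery attributed to contrast B is false; the signal-rich contrast carries the null contrast past the threshold.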
Gordon Smyth seems to say that if one controls the FDR within each comparison, then one doesn't need to worry about multiplicity across many comparisons (contrasts) when looking at the overall FDR, since FDR is a scalable quantity: for example, if comparison A yields 1000 genes at 5% FDR (~50 expected false positives) and comparison B yields 200 genes at 5% FDR (~10 expected), the combined list of 1200 genes still contains ~5% expected false discoveries. (Though I am not 100% sure it applies to Benjamini-Hochberg from his answer.)
That makes no sense to me. If I do 100 experiments and look at one gene in each, then I have to correct for multiple comparisons post hoc but not a priori.