Question

Benjamini+ Hochberg multiple testing p.adjust R

1

Entering edit mode

6.8 years ago

dp0b ▴ 80

Hi,

Im just trying to get my head around Benjamini + Hochberg multiple testing using the p.adjust approach in R. So is the BH mutliple testing option (pval *no of tests)/rank ?

I have a dummy dataset and bonferroni multiple testing works as should

pvals = c(2.335810e-07, 4.820826e-07, 4.820826e-07, 5.807533e-07, 5.807533e-07,6.954857e-07)
pval2=as.numeric(pvals)
BONF = p.adjust(pval2, "bonferroni")
head(BONF)


[1] 1.401486e-06 2.892496e-06 2.892496e-06 3.484520e-06 3.484520e-06
[6] 4.172914e-06

However, when I try the Benjamini and Hochberg option the adjusted pvalues are all the same

BH = p.adjust(pval2, "BH")
 head(BH)
[1] 6.954857e-07 6.954857e-07 6.954857e-07 6.954857e-07 6.954857e-07
[6] 6.954857e-07

If there is any advice on what Im doing wrong it would be much appreciated or if anybody knows the exact formula p.adjust uses for BH as well.

Thanks

R p.adjust multiple testing p values • 20k views

ADD COMMENT • link updated 6.4 years ago by Renesh ★ 2.2k • written 6.8 years ago by dp0b ▴ 80

score 2 · Answer 1 · 2018-07-09

2

Entering edit mode

6.8 years ago

Devon Ryan 105k

is the BH mutliple testing option (pval *no of tests)/rank ?

No, it involves the cumulative minimum. It's most useful to think of BH adjustment as subtracting out a uniform distribution of p-values from your distribution. That won't produce the actual p-values, but it's a more useful visual representation. Alternatively, the adjusted p-value is the excess significance after plotting the p-value vs. its rank.

Given a vector of p-values p of length n:

i = n:1  # The reverse rank order
o <- order(p, decreasing = TRUE)
ro <- order(o)
pmin(1, cummin(n/i * p[o]))[ro]

Coincidentally, that's the exact code that p.adjust() uses.

ADD COMMENT • link 4.6 years ago by Devon Ryan 105k

0

Entering edit mode

p.adjust does not use your q.

I also don't understand your comment on subtracting out a uniform distribution, or the "excess significance".

The adjusted p-value is simply the minimum FDR under which the test is significant.

ADD REPLY • link 4.7 years ago by asebsadowski • 0

1

Entering edit mode

It looks like I had the BY method rather than BH, I'll update the answer to correct that.

Regarding subtracting out a uniform, that's one of the ways of conceptually understanding how BH works.

ADD REPLY • link 4.6 years ago by Devon Ryan 105k

score 1 · Answer 2 · 2018-07-09

Have you tried reading the paper ? The procedure is described in section 3. What it says is:
- Order the n p-values (smallest first)
- For a desired false discovery rate threshold alpha, compare p-value p(i) to alpha * i/n
- Find k as the largest i satisfying p(i) <= alpha * i/n
- Reject all null hypotheses p(i) for all i such that p(i)<=p(k)
Alternatively, the corrected p-value can be computed as min(p(i)*n/i, corrected.p(i+1)) for i from n-1 to 1

score 1 · Answer 3 · 2018-07-09

TL;DR There is nothing wrong with the output from the p.adjust function using the "BH" method. If you had more p-values that were not as highly similar, the results would generate different "adjusted p-values" from the "BH" method. You can see the exact calculations performed by p.adjust by simply typing "p.adjust" without parentheses into the R terminal.

What you're observing is a by product of the p.adjust package using a cumulative min when you specify the BH procedure. The original BH procedure (https://www.jstor.org/stable/2346101) doesn't actually specify an adjusted p-value corresponding to each p-value, rather it specifies a criterion to reject the null hypothesis:

[Equation 1 in paper]
let k be the largest i for which P(i) <= i/m * q^*
then reject all H(i), i=1,...,k

where P(i) is the i'th smallest p-value in your vector, H(i) is the i'th hypothesis, m is the number of tests, and q^* is the the desired false discovery rate.

Note that, as you point out, q = P(i) / (i / m) is basically the equivalent of an "adjusted p-value" for the BH procedure. Normally you just reject all null hypotheses that have q less than some value. Except there is a problem. q is not necessarily monotonic as you get to larger p-values. This could result in not calling all hypothesis as significant, as specified in the BH procedure to reject all H(i), from i = 1, ..., k. The fix in the p.adjust function was to use a cumulative min so that the rejection of null hypotheses will be the same as specified in the original paper.

score 0 · Answer 4 · 2018-12-18

0

Entering edit mode

6.4 years ago

Renesh ★ 2.2k

Read this article about how to adjust P-values using Bonferroni and Benjamini and Hochberg in multiple testing https://reneshbedre.github.io/blog/mtest.html

ADD COMMENT • link 6.4 years ago by Renesh ★ 2.2k