Question

How to Calculate FDR in permutation F test

1

8.1 years ago

hellocita ▴ 40

Hi all, I am a little confused about how to calculate the FDR after a permutation F test.

Assume there are 6000 genes in my data. For each gene, I run a permutation F test and get 1000 F values: 1 original F value and 999 F values from permuted data. The p-value is then sum(F > F-original)/1000.

But I am confused about how to calculate the FDR. I think it should be FDR = number of false positive genes / number of genes with permutation p < 0.05.

Thank you in advance:)

R • 6.3k views

updated 8.1 years ago by Jean-Karim Heriche 27k • written 8.1 years ago by hellocita ▴ 40

0

Hi! Did you find an answer to your question? As I understand it, for each gene you have to calculate: perm_p-value = (number of permutation p-values <= experimental p-value + 1) / (total number of permutations + 1). So your formula is not quite correct as written. To perform the FDR correction, take your raw p-values and adjust them, e.g. with the base R function p.adjust(method='fdr').
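
The corrected permutation p-value above can be sketched as follows. This is only an illustration in Python rather than R, and the function name `permutation_p_value` and the toy F values are hypothetical. It assumes that a larger F statistic is more extreme, so counting permuted F values >= the original F value plays the same role as counting permutation p-values <= the experimental p-value.

```python
def permutation_p_value(observed_stat, permuted_stats):
    """Permutation p-value with the +1 correction: (b + 1) / (n + 1)."""
    # b = number of permuted statistics at least as extreme as the observed one
    b = sum(1 for s in permuted_stats if s >= observed_stat)
    # n = number of permutations; the +1 terms count the observed statistic itself
    return (b + 1) / (len(permuted_stats) + 1)


# Toy example for a single gene (made-up F values, 9 permutations)
observed_f = 4.2
permuted_f = [0.5, 1.1, 4.8, 2.3, 0.9, 3.7, 6.0, 1.6, 0.2]
print(permutation_p_value(observed_f, permuted_f))  # 0.3
```

With the +1 correction the p-value can never be exactly zero, which the uncorrected count allows whenever no permuted statistic exceeds the original one.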

5.9 years ago by Denis ▴ 310

score 0 · Answer 1 · 2017-04-16

The FDR is the expected proportion of false positives among the tests called significant at a given p-value threshold. It is approximately E[false positives]/E[significant tests]. E[significant tests] can be estimated by the number of tests called significant at the chosen threshold. The problem is then to estimate the number of false positives. This is the number of true negatives (tests for which the null hypothesis is true) times the probability of calling one of them significant, which is the chosen threshold. So we need to estimate the number of true negatives. For this we can assume that the p-values of true negatives are uniformly distributed, plot a histogram of the observed p-values and find the region where the distribution is flat. The height of this flat part gives an estimate of the proportion of true negatives. In practice, one picks a value lambda above which the p-value distribution is flat, and estimates the proportion of true negatives as (number of p-values greater than lambda) / ((1 - lambda) × total number of tests). See Storey, J. D. and R. Tibshirani (2003). “Statistical significance for genome-wide studies.” Proceedings of the National Academy of Sciences 100(16): 9440-9445.
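
The estimate described above can be sketched in a few lines. This is a minimal illustration in Python, not the procedure of any particular package: the function names `estimate_pi0` and `estimate_fdr`, the default lambda of 0.5, and the toy p-values are all assumptions made for the example.

```python
def estimate_pi0(p_values, lam=0.5):
    """Proportion of true negatives: #{p > lambda} / ((1 - lambda) * m), capped at 1."""
    m = len(p_values)
    n_above = sum(1 for p in p_values if p > lam)
    return min(1.0, n_above / ((1.0 - lam) * m))


def estimate_fdr(p_values, threshold, lam=0.5):
    """Estimated FDR at a threshold: pi0 * m * threshold / #{p <= threshold}."""
    m = len(p_values)
    n_significant = sum(1 for p in p_values if p <= threshold)
    if n_significant == 0:
        return 0.0  # nothing is called significant, so there are no false discoveries
    # Expected false positives = (number of true negatives) * threshold
    expected_false_positives = estimate_pi0(p_values, lam) * m * threshold
    return min(1.0, expected_false_positives / n_significant)


# Toy example: 8 made-up p-values, 4 of them very small
toy_p = [0.001, 0.002, 0.003, 0.004, 0.2, 0.4, 0.6, 0.8]
print(estimate_pi0(toy_p))        # 0.5
print(estimate_fdr(toy_p, 0.05))  # 0.05
```

Capping both estimates at 1 is a choice made here so that the outputs stay valid proportions when the p-value histogram is noisy.
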
This is related to the q-value, which is the minimum FDR at which a particular test can be called significant. This is probably what you want, and it is available as the qvalue() function in the qvalue R package.
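
To make the definition concrete, here is a rough Python sketch of how q-values follow from p-values once the proportion of true negatives is known. The function name `q_values` and its `pi0` argument are hypothetical, and the sketch takes pi0 as an input instead of estimating it, so it is not a substitute for qvalue(). With pi0 = 1 it reduces to Benjamini-Hochberg adjusted p-values.

```python
def q_values(p_values, pi0=1.0):
    """q-value of each test: the smallest estimated FDR at which it is significant."""
    m = len(p_values)
    # Indices of the p-values, ordered from the largest p-value to the smallest
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    q = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        # Estimated FDR when the threshold is set at this p-value
        fdr_at_p = pi0 * m * p_values[i] / rank
        # Taking the minimum over all larger thresholds keeps q-values monotone
        running_min = min(running_min, fdr_at_p)
        q[i] = running_min
    return q


# Toy example with made-up p-values; results come back in the input order
toy_p = [0.01, 0.04, 0.03, 0.5]
print(q_values(toy_p))
print(q_values(toy_p, pi0=0.5))
```

Calling a gene significant when its q-value is at most 0.05 then corresponds to accepting an estimated FDR of 5% among the genes called significant.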