Problem With Fdr Test : Q-Values All < 0.05 For A List Of P-Values That Are Not All Significant.
1
4
Entering edit mode
12.5 years ago

Hi,

I found 2,302 significant SNPs in genomic sequences representing a few hundred genes sequenced for 2 populations (12 samples / population). Based on the allele counts, I calculated the allele frequencies at each locus. Then, I calculated de allelic frequency divergence (AFD), i.e :

AFD = freq[a1,pop1] - freq[a1,pop2] (allele-freq for pop1 - allele-freq for pop2)

With that, I want to discover which SNPs show a significant difference in allele frequency between both populations in order to identify putative divergent genes (I also performed an outlier test with Bayescan and I also have some problems with that, but it's not the topic of this question!).

I used the Python package "fisher-0.1.4" (http://pypi.python.org/pypi/fisher/) to compute the p-values for each AFD (i.e for each SNP) and I counter-verified these values with R, building a 2x2 matrix for each SNP and performing this test:

fisher.test(mat, alternative="less")

I find that R computes the same p-values than Python. Based on these p-values, among the 2,302 SNPs, 56 SNPs show a AFD > 0.5 and a p-value < 0.05. Now, I want to see if those SNPs are also significant when corrected for multiple hypotheses (FDR). I tried to perform a FDR test to compute q-values based on the distribution of p-values (for the 2,302 SNPs). This is where things get complicated...

I tried two R packages: "qvalue" AND "fdrtool". With both of them, I get an error message, which makes me think that there is something wrong with my dataset of p-values. Here is what I did for the package "qvalue" :

> p <- scan("~/Desktop/pvalues.txt")
Read 2302 items
>qobj <- qvalue(p, fdr.level=0.05, pi0.method="bootstrap")
[1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

And even when I try to change some parameters in the test, I ALWAYS get this error message.

Then with the package "fdrtool", this is what I did:

> fdr <- fdrtool(p, statistic="pvalue")
Step 1... determine cutoff point
Step 2... estimate parameters of null distribution and eta0
Step 3... compute p-values and estimate empirical PDF/CDF
Step 4... compute q-values and local fdr
Step 5... prepare for plotting

Warning message :
Censored sample for null model estimation has only size 2 !

With that second package (fdrtool), I can try to visualize the q-values computed with the command:

fdr$qval

But ALL the q-values are < 0.05, which is quite strange. How can I have p-values > 0.05 that become significant (i.e < 0.05) when I correct for FDR? And how is it possible that ALL the q-values (n=2,302) are significant? I decided to include in my question a graph showing the distribution of my p-values. Maybe this can help...

http://www.freeimagehosting.net/5v76e

UPDATE 02-06-2012 : max p-value = 0.758 and min p-value = 2.73E-10.

If someone could help me, it would be tremendously appreciated!

Cheers ! FO

statistics • 19k views
ADD COMMENT
0
Entering edit mode

can you post the min, max of your p-values here? The error seems to indicate that your p-values are < 0.

ADD REPLY
0
Entering edit mode

Sure I can do that ! I just updated my question by giving the min and max p-values. I have some p-values that are really small, but they are all above 0.

ADD REPLY
0
Entering edit mode

Okay. So I got the error part wrong. Would it be possible for you to upload the p-values somewhere and link here? Or if you're willing to try a multiple-correction using R and multtest package, I can provide you the snippet to accomplish this and see how it turns out..

ADD REPLY
0
Entering edit mode

Problem solved! Thanks for your time. I think my dataset wasn't of super good quality... so I changed a few things and it works now. Since your first answer, I was concerned with the range of my p-values and it turned out that it was the core of the problem. So thanks again!

ADD REPLY
3
Entering edit mode
12.5 years ago
Michael 55k

First, I think you are correct with being concerned about the estimated q-values and the presence of the warning. In my understanding, p-values adjusted for multiple testing should always be greater-equal to the unadjusted p-values. The warning also indicates that something in the estimation process went wrong. It might have to do with your p-values not fully covering the whole interval [0,1]. If the method tries to find an estimate for the percentage of true null-hypotheses in the data, due to the lack of p-values close or equal to 1, it might not be able to do so. At least this is my interpretation of the warning message.

I have tried to reproduce the behavior, given your p-value range. It seems to be reproducible with simulated data only.

pvalues <- c(runif(2250, min=2.73E-10, max= 0.758)) # your values
res1 <- fdrtool(pvalues, statistic="pvalue")
Step 1... determine cutoff point
Step 2... estimate parameters of null distribution and eta0
Step 3... compute p-values and estimate empirical PDF/CDF
Step 4... compute q-values and local fdr
Step 5... prepare for plotting

Warning message:
Censored sample for null model estimation has only size 1 ! 
cbind(res1$pval,res1$qval)[res1$pval > res1$qval, ] # have a look at them...
               [,1]         [,2]
[1,] 0.297413740 0.0013290189
[2,] 0.425104731 0.0013472933
[3,] 0.330043362 0.0013349787
[4,] 0.424277501 0.0013472092
[5,] 0.751239836 0.0013910413
....
 sum(res1$pval > res1$qval)
[1] 2249

That yields almost identical significant q-values for all different p-values, ant therefore looks very wrong to me, while:

pvalues <- c(runif(2250, min=0, max=1))
res2 <- fdrtool(pvalues, statistic="pvalue")
sum(res2$pval > res2$qval)
[1] 0

In consequence, the estimation method seem to make assumptions about your data that are maybe not justified. We possibly need to understand the method a bit better, to use other parameters or use a different method. That is all I can say from this applied perspective.

If you are only out for any kind of FDR correction you might as well use p.adjust(p, method="fdr") which will use the method of Benjamini&Hochberg (1995), which has been applied for microarray data for example. I wouldn't comment of the initial test step of your analysis, other than that you could have a look at more complex association tests e.g. in PLINK.

ADD COMMENT
0
Entering edit mode

Thank you very much for your time and your thorough answer! You are right, the problem definitely comes from the p-values themselves. I was worried about the quality of that dataset since I didn't have any p-value really close to 1 as you noted. I adjusted my script to compute them and it's giving me different p-values and there is a bunch of 1's in there. I used this new dataset to perform the qvalue test and it works fine.

You brought up something important that helped solve the problem, many thanks! Best regards, FO

ADD REPLY
0
Entering edit mode

(+1) wow, that's some debugging! :)

ADD REPLY

Login before adding your answer.

Traffic: 2356 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6