Hi,
I found 2,302 significant SNPs in genomic sequences representing a few hundred genes sequenced for 2 populations (12 samples / population). Based on the allele counts, I calculated the allele frequencies at each locus. Then, I calculated de allelic frequency divergence (AFD), i.e :
AFD = freq[a1,pop1] - freq[a1,pop2] (allele-freq for pop1 - allele-freq for pop2)
With that, I want to discover which SNPs show a significant difference in allele frequency between both populations in order to identify putative divergent genes (I also performed an outlier test with Bayescan and I also have some problems with that, but it's not the topic of this question!).
I used the Python package "fisher-0.1.4" (http://pypi.python.org/pypi/fisher/) to compute the p-values for each AFD (i.e for each SNP) and I counter-verified these values with R, building a 2x2 matrix for each SNP and performing this test:
fisher.test(mat, alternative="less")
I find that R computes the same p-values than Python. Based on these p-values, among the 2,302 SNPs, 56 SNPs show a AFD > 0.5 and a p-value < 0.05. Now, I want to see if those SNPs are also significant when corrected for multiple hypotheses (FDR). I tried to perform a FDR test to compute q-values based on the distribution of p-values (for the 2,302 SNPs). This is where things get complicated...
I tried two R packages: "qvalue" AND "fdrtool". With both of them, I get an error message, which makes me think that there is something wrong with my dataset of p-values. Here is what I did for the package "qvalue" :
> p <- scan("~/Desktop/pvalues.txt")
Read 2302 items
>qobj <- qvalue(p, fdr.level=0.05, pi0.method="bootstrap")
[1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."
And even when I try to change some parameters in the test, I ALWAYS get this error message.
Then with the package "fdrtool", this is what I did:
> fdr <- fdrtool(p, statistic="pvalue")
Step 1... determine cutoff point
Step 2... estimate parameters of null distribution and eta0
Step 3... compute p-values and estimate empirical PDF/CDF
Step 4... compute q-values and local fdr
Step 5... prepare for plotting
Warning message :
Censored sample for null model estimation has only size 2 !
With that second package (fdrtool), I can try to visualize the q-values computed with the command:
fdr$qval
But ALL the q-values are < 0.05, which is quite strange. How can I have p-values > 0.05 that become significant (i.e < 0.05) when I correct for FDR? And how is it possible that ALL the q-values (n=2,302) are significant? I decided to include in my question a graph showing the distribution of my p-values. Maybe this can help...
http://www.freeimagehosting.net/5v76e
UPDATE 02-06-2012 : max p-value = 0.758 and min p-value = 2.73E-10.
If someone could help me, it would be tremendously appreciated!
Cheers ! FO
can you post the min, max of your p-values here? The error seems to indicate that your p-values are < 0.
Sure I can do that ! I just updated my question by giving the min and max p-values. I have some p-values that are really small, but they are all above 0.
Okay. So I got the error part wrong. Would it be possible for you to upload the p-values somewhere and link here? Or if you're willing to try a multiple-correction using R and multtest package, I can provide you the snippet to accomplish this and see how it turns out..
Problem solved! Thanks for your time. I think my dataset wasn't of super good quality... so I changed a few things and it works now. Since your first answer, I was concerned with the range of my p-values and it turned out that it was the core of the problem. So thanks again!