Dear Biostars, I need your help please,
I am new to genome-wide association studies and am struggling with the following issue. I have around 150 samples of RNA sequencing data. I cleaned the data and called variants ... following the GATK workflow, then kept only variants with at most 3% missingness across samples. I ended up with 20,000 high-quality variants. This is just to give you some background on the project.
Here is the issue: I used PLINK 2 (and also tried 1.9) and got some good p-values, but when I add --adjust to calculate the FDR, it comes out at almost 1 for all variants.
Here are the first lines of the --adjust output:
UNADJ        GC           BONF  HOLM  SIDAK_SS  SIDAK_SD  FDR_BH    FDR_BY
8.39E-05     9.38E-05     1     1     0.767967  0.767967  0.986743  1
0.000556724  0.000607716  1     1     0.999939  0.999939  0.986743  1
0.000935481  0.00101439   1     1     1         1         0.986743  1
0.00112283   0.00121471   1     1     1         1         0.986743  1
Please help!
It may be that there is nothing significant in the dataset; that is exactly what the output is supposed to look like when there is no signal. You can extract the p-values, perform the FDR correction yourself in R, and compare the results with PLINK's to check whether the correction was done properly.
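For that cross-check, here is a minimal sketch of the Benjamini-Hochberg step-up adjustment in plain Python. It is intended to follow the same logic as R's p.adjust(method = "BH"); reading the UNADJ column out of the PLINK file is left out, and the function name bh_adjust is just an illustrative choice.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so we can take a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every adjusted value at 1
    for pos, idx in enumerate(order):
        rank = m - pos  # ascending rank of this p-value (1 = smallest)
        candidate = pvalues[idx] * m / rank
        # Monotonicity: a smaller p-value never gets a larger adjusted value.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted
```

For the smallest p-value (rank 1) the raw term is simply p × m. If all 20,000 variants were tested, 8.39e-05 × 20,000 ≈ 1.68 before capping, so its adjusted value can only drop below 1 when some lower-ranked p-values are small relative to their rank, which is not the case here.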
Thanks for your reply.
The problem is that I get significant p-values, around 1e-5, with both the logistic regression model and the chi-square test, but when I adjust them and calculate the FDR, it is 1. I don't think the two should differ that much. If there is nothing significant, why does logistic regression give p-values of order 1e-5? Regards
Given the number of SNPs in a typical GWAS, we usually define SNPs with p < 5e-8 as significant. As yours only go down to about 1e-5, it is not too surprising that you have no signal after adjustment.
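The 5e-8 threshold is commonly justified as a Bonferroni correction for roughly one million independent common variants. A quick sketch of the same arithmetic for this dataset, assuming all 20,000 filtered variants are counted as tests (Bonferroni treats them as independent, which makes it conservative when variants are in LD):

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Conventional genome-wide threshold: about 1 million independent common variants.
genome_wide = bonferroni_threshold(0.05, 1_000_000)  # 5e-8
# This study: about 20,000 variants passed filtering (assumed to all be tested).
this_study = bonferroni_threshold(0.05, 20_000)  # 2.5e-6
smallest_p = 8.39e-05  # top UNADJ value from the PLINK output

for name, cutoff in [("genome-wide", genome_wide), ("this study", this_study)]:
    verdict = "passes" if smallest_p < cutoff else "does not pass"
    print(f"{name}: cutoff {cutoff:.2e}; smallest p {smallest_p:.2e} {verdict}")
```

Even with the much lighter burden of 20,000 tests, the best p-value is more than an order of magnitude above the cutoff.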
Thanks for your reply. Just to add one point: my data are RNA-seq, so I have fewer variants (only those in expressed genes). This may be what causes the problem.
Your answer partially explains the issue; many thanks.
Yep, I simulated 20,000 p-values (which are uniformly distributed under the null hypothesis), and the minimum was 8.5e-05. That is of the same order as your smallest p-value, yet it arose purely by chance under the assumption of independent tests, so I would not read much into the raw p-values at all.
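A sketch of that simulation, plus the closed-form version of the argument, assuming 20,000 independent tests with every null hypothesis true. The expected minimum of n Uniform(0, 1) draws is 1/(n + 1), about 5e-05 here, and the chance that the minimum falls at or below a value p is 1 − (1 − p)^n, which is the same expression as the Šidák single-step adjustment (the SIDAK_SS column). The seed and function names are arbitrary choices for illustration.

```python
import random


def simulate_min_pvalue(n_tests: int, seed: int = 1) -> float:
    """Smallest of n_tests p-values drawn from Uniform(0, 1), i.e. all nulls true."""
    rng = random.Random(seed)
    return min(rng.random() for _ in range(n_tests))


def prob_min_at_most(p: float, n_tests: int) -> float:
    """P(min p-value <= p) for n_tests independent null tests: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n_tests


n = 20_000
print("simulated minimum:", simulate_min_pvalue(n))
print("expected minimum:", 1 / (n + 1))  # mean of the minimum of n Uniform(0, 1) draws
print("P(min <= 8.39e-05):", prob_min_at_most(8.39e-05, n))
```

The last line should come out at roughly 0.81: with 20,000 independent null tests, a smallest p-value of 8.39e-05 or lower is the expected outcome rather than a surprise.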