PLINK association: FDR is 1 for all variants
4.6 years ago

Dear Biostars, I need your help, please.

I am new to genome-wide association studies and am struggling with the following issue. I have around 150 RNA-seq samples. I cleaned the data and called variants following the GATK workflow, then kept only variants with at most 3% missingness across samples. I ended up with 20,000 high-quality variants. This is just to give you some background on the project.

Here is the issue: I ran the association test in PLINK (2.0, and I also tried 1.9) and got some good p-values, but when I add --adjust to calculate the FDR, it is almost 1 for all variants.

Here are the header and the first rows of my output file:

UNADJ        GC           BONF  HOLM  SIDAK_SS  SIDAK_SD  FDR_BH    FDR_BY
8.39E-05     9.38E-05     1     1     0.767967  0.767967  0.986743  1
0.000556724  0.000607716  1     1     0.999939  0.999939  0.986743  1
0.000935481  0.00101439   1     1     1         1         0.986743  1
0.00112283   0.00121471   1     1     1         1         0.986743  1
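For reference, these are the textbook definitions behind the main columns, for m tested variants with raw p-values sorted as p_(1) <= ... <= p_(m). This is a sketch assuming PLINK applies the standard formulas to the UNADJ column; check the PLINK documentation for details such as whether genomic-control-corrected p-values are used instead.

$$
\begin{aligned}
\mathrm{BONF}_{(k)} &= \min\bigl(1,\; m\,p_{(k)}\bigr)\\
\mathrm{HOLM}_{(k)} &= \min\Bigl(1,\; \max_{j \le k}\, (m-j+1)\,p_{(j)}\Bigr)\\
\mathrm{SIDAK\_SS}_{(k)} &= 1-\bigl(1-p_{(k)}\bigr)^{m}\\
\mathrm{FDR\_BH}_{(k)} &= \min_{j \ge k}\, \min\Bigl(1,\; \frac{m\,p_{(j)}}{j}\Bigr)
\end{aligned}
$$

Inverting the Šidák formula for the first row gives m = ln(1 - 0.767967) / ln(1 - 8.39e-5) ≈ 17,400, so PLINK apparently counted somewhat fewer than 20,000 tests (the second row, 0.999939, is consistent with the same m). Either way m·p_(1) ≈ 1.46 > 1, so BONF and HOLM are capped at 1 even for the top variant, and FDR_BH can only drop below a level q if some rank j has p_(j) <= j·q/m.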

Please help!

plink gwas fdr RNA-Seq • 1.7k views

It may be that there is nothing significant in your dataset; this is exactly how the adjusted values are supposed to look when there is no real signal. You can extract the p-values, perform the FDR correction in R, and compare the result with PLINK's output to check that the correction was done correctly.

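vec <- runif(20000)  # 20,000 p-values simulated under the null (uniform on [0, 1])

summary(p.adjust(vec))  # p.adjust() defaults to method = "holm"; use method = "BH" for FDR
 Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1       1       1       1       1       1 

min(vec)  # smallest raw p-value in this (unseeded) simulation
 [1] 8.516642e-05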
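The same check can be done outside R. Below is a minimal Python sketch that recomputes Benjamini-Hochberg values from the UNADJ column of a PLINK --adjust output file and counts disagreements with the FDR_BH column. It assumes a whitespace-delimited file whose header row names UNADJ and FDR_BH, one row per tested variant, and NA for untested variants; the function names are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Step-up: walk from the largest p-value down, keeping the running
    # minimum of m * p / rank so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_with_plink(path, tol=1e-4):
    """Recompute FDR_BH from the UNADJ column of a PLINK --adjust output file.

    Assumes a whitespace-delimited file whose header row names the UNADJ and
    FDR_BH columns, one row per tested variant; rows with NA are skipped.
    Returns (number of variants, number of FDR_BH values differing by > tol).
    """
    with open(path) as fh:
        header = fh.readline().split()
        rows = [line.split() for line in fh if line.strip()]
    p_col, fdr_col = header.index("UNADJ"), header.index("FDR_BH")
    rows = [r for r in rows if r[p_col] != "NA"]
    raw = [float(r[p_col]) for r in rows]
    plink_fdr = [float(r[fdr_col]) for r in rows]
    ours = bh_adjust(raw)
    mismatches = sum(abs(a - b) > tol for a, b in zip(ours, plink_fdr))
    return len(raw), mismatches
```

For example, `compare_with_plink("plink.assoc.logistic.adjusted")` (a hypothetical file name; use whatever --out prefix you chose) should report 0 mismatches if PLINK's FDR_BH follows the standard BH procedure. In R, `p.adjust(p, method = "BH")` gives the same adjusted values.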

Thanks for your reply.

The problem is that I get significant p-values, around 1e-5, with the logistic regression model or the chi-square test, but when I adjust them and calculate the FDR, it is 1. I don't believe they should differ that much. If there is nothing significant, why does logistic regression give p-values of the order of 1e-5? Regards


Given the number of SNPs in a typical GWAS, we usually consider SNPs significant at p < 5e-8. As yours only go down to about 1e-5, it is not surprising that you have no signal after adjustment.
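The 5e-8 convention is usually explained as a Bonferroni correction of alpha = 0.05 for roughly one million independent common-variant tests across the genome. A quick sketch of the arithmetic (the one-million figure is the conventional approximation, not something derived from this dataset) shows that even the much smaller number of tests here leaves a threshold well below the observed p-values:

```python
ALPHA = 0.05  # family-wise error rate target


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


smallest_p = 8.39e-5  # smallest UNADJ p-value in the table from the question

for label, n in [("~1,000,000 independent tests", 1_000_000),
                 ("20,000 variants", 20_000)]:
    thr = bonferroni_threshold(n)
    print(f"{label}: threshold {thr:.1e}, smallest p passes: {smallest_p < thr}")
```

With 20,000 tests the threshold is 2.5e-6, more than 30 times smaller than 8.39e-5: fewer tests make the correction milder, but not nearly enough here.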


Thanks for your reply. Just to add one point: I have RNA-seq data, so there are fewer variants (only in expressed genes). This may be part of the problem.

Your answer partially explains the issue. Many thanks!


Yep. I simulated 20,000 p-values (p-values are uniformly distributed under the null hypothesis), and the minimum was 8.5e-05. That is of the same order as your smallest p-value, yet it arises purely by chance when all tests are independent and null, so I would not worry much about the raw p-values at all.
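This point can be made exact with a little probability. If the m tests are independent and all null, each p-value is Uniform(0, 1), so the smallest one is at or below x with probability 1 - (1 - x)^m, and its expected value is 1/(m + 1). A minimal sketch, assuming independence (SNPs in linkage disequilibrium make real tests correlated, so treat this as a rough guide):

```python
import math


def prob_min_at_most(x, m):
    """P(min of m independent Uniform(0, 1) p-values <= x) = 1 - (1 - x)**m."""
    # log1p/expm1 keep the result accurate when x is tiny.
    return -math.expm1(m * math.log1p(-x))


m = 20_000
print(f"expected minimum of {m} null p-values: {1 / (m + 1):.2e}")
for x in (8.39e-5, 1e-4):
    print(f"P(min p <= {x:g}) = {prob_min_at_most(x, m):.3f}")
```

For m = 20,000 the expected minimum is about 5e-5, and there is roughly an 81% chance that at least one null p-value is at or below 8.39e-5, so the smallest p-value in the question is entirely typical of pure noise.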
