Question

plink association: FDR is one for all

4.6 years ago

ali.al-fatlawi ▴ 10

Dear Biostars, I need your help please,

I am new to genome-wide association studies and am struggling with the following issue. I have around 150 RNA-seq samples. I cleaned the data and called variants ... based on the GATK workflow, then kept the variants with at most 3% missingness across samples. I ended up with 20,000 high-quality variants. This is just to give you some background on the project.

Here is the issue: I used plink (2, and I also tried 1.9) and got some good p-values, but when I add --adjust to calculate the FDR, it is almost 1 for all variants.

Here are the header and the first rows of my file:

UNADJ        GC           BONF  HOLM  SIDAK_SS  SIDAK_SD  FDR_BH    FDR_BY
8.39E-05     9.38E-05     1     1     0.767967  0.767967  0.986743  1
0.000556724  0.000607716  1     1     0.999939  0.999939  0.986743  1
0.000935481  0.00101439   1     1     1         1         0.986743  1
0.00112283   0.00121471   1     1     1         1         0.986743  1

Please help!

plink gwas fdr RNA-Seq • 1.7k views

ADD COMMENT • link updated 4.6 years ago by ATpoint 85k • written 4.6 years ago by ali.al-fatlawi ▴ 10

It may be that there is nothing significant in the dataset; this is how the output is supposed to look if there is nothing there. You can extract the p-values, perform the FDR correction in R, and compare the results with what plink reports to check whether the correction was adequate.

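# 20,000 p-values drawn under the null hypothesis (uniform on [0, 1])
vec <- runif(20000)

# Note: p.adjust() defaults to method = "holm"; use method = "BH" for the Benjamini-Hochberg FDR
summary(p.adjust(vec))
 Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1       1       1       1       1       1 

# Smallest raw p-value in this random draw
min(vec)
 [1] 8.516642e-05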
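
To make the comparison with plink's FDR_BH column more direct, the same simulation can be run with both the Holm adjustment (the `p.adjust()` default) and the Benjamini-Hochberg adjustment. This is only a sketch: the seed is arbitrary, and `n_tests <- 20000` is an assumption taken from the "around 20,000 variants" in the question.

```r
# Sketch: adjust simulated null p-values with Holm and with Benjamini-Hochberg.
set.seed(1)                # arbitrary seed, only for reproducibility
n_tests <- 20000           # assumed number of tests (from the question)
p_null <- runif(n_tests)   # p-values are Uniform(0, 1) when nothing is associated

holm <- p.adjust(p_null)                  # default method is "holm"
bh   <- p.adjust(p_null, method = "BH")   # Benjamini-Hochberg FDR

min(p_null)   # smallest raw p-value
min(holm)     # smallest Holm-adjusted p-value
min(bh)       # smallest BH-adjusted p-value

# Number of tests that would pass a 5% FDR threshold
sum(bh < 0.05)
```

If the adjusted values from real data look like the ones from this null simulation, the correction itself is probably behaving as intended.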

ADD REPLY • link updated 4.6 years ago by Kevin Blighe 88k • written 4.6 years ago by German.M.Demidov ★ 2.9k

Thanks for your reply.

The problem is that I get significant p-values, around 1e-5, using a logistic regression model or a chi-square test, but when I adjust them and calculate the FDR, it is one. I believe the two should not differ that much. If there is nothing significant, why does the logistic regression p-value come out around 1e-5? Regards

ADD REPLY • link 4.6 years ago by ali.al-fatlawi ▴ 10

Considering the number of SNPs in a typical GWAS, we usually define SNPs with p < 5e-8 as significant. As yours only go down to about 1e-5, it is not too surprising that you have no signal after adjustment.
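
The 5e-8 threshold is commonly explained as a Bonferroni correction of 0.05 for roughly one million independent tests. The same arithmetic can be applied to a smaller set of variants, as a rough sketch; `n_tests <- 20000` is an assumption based on the "around 20,000 variants" in the question, and the p-values are copied from the UNADJ column posted there.

```r
alpha <- 0.05

# Conventional GWAS threshold: Bonferroni for about one million independent tests
alpha / 1e6                    # 5e-08

# Same reasoning for this dataset (n_tests is an assumption from the question)
n_tests <- 20000
threshold <- alpha / n_tests   # 2.5e-06

# Raw p-values copied from the UNADJ column in the question
unadj <- c(8.39e-05, 0.000556724, 0.000935481, 0.00112283)
unadj < threshold              # FALSE for all four
```

Even with the much smaller number of tests, the smallest reported p-value is still more than an order of magnitude above the Bonferroni threshold.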

ADD REPLY • link 4.6 years ago by Sam ★ 4.8k

Thanks for your reply. Just to add one point: I have RNA data, so we have fewer variants (only those in expressed genes). This may be causing the problem.

Your answer explains the issue partially, many thanks.

ADD REPLY • link 4.6 years ago by ali.al-fatlawi ▴ 10

Yep, I simulated 20,000 p-values (which are uniformly distributed under the null hypothesis) and the minimum was 8.5e-05, which is of the same order as your number but is entirely random under the independence hypothesis. So I would not worry much about the raw p-values at all.
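
The same point can be checked without simulation. As a minimal sketch, assuming 20,000 independent tests (the approximate count from the question), the chance that the smallest null p-value is at least as small as the observed 8.39e-05 follows from the distribution of the minimum of independent uniform draws:

```r
n_tests   <- 20000      # assumed number of independent tests
p_min_obs <- 8.39e-05   # smallest UNADJ value in the question

# P(at least one of n independent null p-values is <= p_min_obs)
prob_by_chance <- 1 - (1 - p_min_obs)^n_tests
prob_by_chance          # about 0.81

# Expected smallest p-value among n independent Uniform(0, 1) draws
1 / (n_tests + 1)
```

This is the single-step Sidak formula, so it plays the same role as the SIDAK_SS column; the value there differs somewhat, presumably because plink uses the exact number of tests rather than the rounded 20,000 assumed here.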

ADD REPLY • link 4.6 years ago by German.M.Demidov ★ 2.9k