Question

How to determine which SNPs are above FDR line?

1

Entering edit mode

8.7 years ago

joshf ▴ 50

Hey so here I have an R script I use to make Q-Q plots for GWA SNP P-values with FDR lines. How could I determine with programming which P-values (or SNPs, really) are significant enough to have passed the 0.05 FDR line? I'm just really not sure how to approach this. Example below!

  qqunif = function(p, BH=T, MAIN = " ", SUB=" ")
    {
      nn = length(p)
      xx =  -log10((1:nn)/(nn+1))
      plot( xx,  -sort(log10(p)),
            main = MAIN, sub= SUB, cex.sub=1.3,
         xlab=expression(Expected~~-log[10](italic(p))),
            ylab=expression(Observed~~-log[10](italic(p))),
           cex.lab=1.0,mgp=c(2,1,0))
      abline(0,1,col='gray')
      if(BH) ## BH = include Benjamini Hochberg FDR
        {
          abline(-log10(0.05),1, col='purple',lty=1)
          abline(-log10(0.10),1, col='blue',lty=1)
          abline(-log10(0.25),1, col='green',lty=1)
          legend('topleft', c("FDR = 0.05","FDR = 0.10","FDR = 0.25"),
                 col=c('purple','blue','green'),lty=c(1,1,1), cex=0.8)
          if (BF)
          {
            abline(h=-log10(0.05/nn), col='black') ## bonferroni
          }
        }
    }

EXAMPLE: enter image description here I can SEE that there are 25 SNPs that passed the 0.05 FDR line, but how can I get a list of just those 25 SNPs in a new file? Thing is sometimes I work with much larger files. My input files are really just: rs_number pval

Hopefully someone can help me with this

FDR R GWAS snp • 3.5k views

ADD COMMENT • link updated 4.1 years ago by Biostar 20 • written 8.7 years ago by joshf ▴ 50

score 0 · Answer 1 · 2016-03-10

The equation for your .05 FDR line is y= -log10(expected p value) + (-log10(.05)), so you just have to determine if your observed values fall above this line. If you create two vectors with your expected and observed p-values

where

Expected_Pvalue = xx, Observed_Pvalue = -sort(log10(p)))

Then you can use the line equation to do a simple logical evaluation to get a subsetting index:

Pvalue_keeps <- Observed_Pvalue > -log10(Expected_Pvalue) + -log10(.05)

Finally use this index to subset your dataframe of snps and pvalues, which was also sorted based on: -sort(log10(p))

snps_pvalues_df[Pvalue_keeps, ]