Question

Compute confidence intervals for lambda genomic factor statistics

3

Entering edit mode

7.2 years ago

jamespower ▴ 100

Hi,

I am computing the lambda statistics in R to check whether the distribution of my p-values follow a null distribution using:

pvalue <- runif(1000, min=0, max=1)
chisq <- qchisq(1 - pvalue, 1)
lambda <- median(chisq) / qchisq(0.5, 1)

Is there a procedure for constructing the confidence interval for this lambda genomic factor in R?

Thanks

R confidence intervals chi • 5.2k views

ADD COMMENT • link 7.2 years ago by jamespower ▴ 100

0

Entering edit mode

Thank you Kevin! I had no idea about the 1.253 to have a median from a mean. I am wondering if this is what is normally done in GWAS studies (or whether maybe Bootstrapping is more appropriate)? It seems reasonable to be able to use this approximation since we would expect a large sample size as you have (1000000).
I will look up the stats as suggested. If there are other opinions about what is normally done for GWAS please share.

Thank you

ADD REPLY • link 7.2 years ago by jamespower ▴ 100

0

Entering edit mode

Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized. This comment ideally belongs under @Kevin's answer.

ADD REPLY • link 7.2 years ago by GenoMax 151k

0

Entering edit mode

There is no standard way of doing this, from what I could / can see. Hence the disclaimers at the start and end of my post. I presume that you were asked to do this by a reviewer (possibly a statistician) for a manuscript? Bootstrapping it to obtain bootstrapped CIs would be a good idea. That can be as easy as random sampling the p values and then repeating 250x.

ADD REPLY • link 7.2 years ago by Kevin Blighe 89k

score 7 · Accepted Answer · 2018-02-14

I encourage you to follow up on my points here with a statistician in your department. Always better to get more brains looking at the same thing. There are various ways to calculate confidence intervals for series of data.

You could start with a typical formula for the standard error of the mean (SE mean), which is the standard deviation divided by the square root of the sample n:

σ / sqrt (n)

Lambda is a median value, so, we have to convert the SE mean to SE median:

1.253 * (σ / sqrt (n))

(various articles on the WWW validate this)

As you are dealing with a Chi-squared quantile distribution, in order to calculate the confidence intervals, I believe you can do this with the following:

pvalue <- runif(1000000, min=0, max=1)
chisq <- qchisq(1 - pvalue, 1)
lambda <- median(chisq) / qchisq(0.5, 1)

lambda 
[1] 0.9963626

SE.median <- qchisq(0.975, 1) * (1.253 * ( sd(chisq) / sqrt( length(chisq) ) ) )

SE.median
[1] 0.008914307

lambda + (SE.median / qchisq(0.5, 1))
[1] 1.015957

lambda - (SE.median / qchisq(0.5, 1))
[1] 0.976768

I increased the number of random p-values to 1 million just to make the confidence intervals tighter. At 1000 (your original post), they were pretty wide.

Take this as a first step into this. I encourage you to look up all functions / formulae that I have used.

Kevin