Question

What is the recommended Z-score for filtration in GWAS studies?


9 months ago

Orlando • 0

I am conducting a post-GWAS analysis of summary statistics from an FVC study. I am attempting to remove SNPs that do not meet relatively stringent criteria. What Z-score filtering threshold is recommended in the field, or how do I calculate the required Z-score?

Many Thanks

GWAS • 547 views

updated 9 months ago by LauferVA 4.5k • written 9 months ago by Orlando • 0


Please don't add bioinformatics or help as tags, they are not subject matter related and, broadly speaking, are too vague to be useful to anyone.

9 months ago by Ram 44k

score 0 · Answer 1 · 2024-04-06

Your question deals specifically with GWA studies. For this, please see this post, which is about the suggestive level of association in contrast to genome-wide significance.

As important as those numbers, if not more so, is the theoretical basis for any threshold that might have been quoted to you. In that sense, your question is really about the type I error rate and multiple testing. Those terms would be an appropriate starting point for finding a dedicated review on the subject in PubMed.

Briefly, there are different numbers one could provide. Traditionally, the Bonferroni correction has been used in GWAS. Researchers knew that studies would accumulate, and so they set the alpha level so stringently that the expected number of associations arising by chance alone in a study is close to zero (at most 0.05). This is done by dividing the initial alpha level (0.05) by the number of independent markers being tested. That number is roughly the number of SNVs that are not in linkage disequilibrium with one another, and it varies a little by population. For Western Europeans, for instance, a pretty good estimate is 1,000,000 per genome.

Therefore, the Bonferroni-corrected alpha threshold for a GWAS is given by (1):

Bonferroni-corrected alpha level = alpha level for one test / number of independent tests done            (1)

or alpha = 0.05 / 1,000,000 = 5 x 10^-8.
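Equation (1) can be checked with a few lines of Python. This is only a minimal sketch: the function name `bonferroni_alpha` is made up for illustration, and the figure of 1,000,000 independent tests is the estimate quoted above, not a universal constant.

```python
def bonferroni_alpha(alpha: float, n_independent_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction.

    Hypothetical helper: divides the single-test alpha by the number
    of independent tests, as in equation (1).
    """
    if n_independent_tests <= 0:
        raise ValueError("n_independent_tests must be a positive integer")
    return alpha / n_independent_tests


# Assumed inputs: alpha = 0.05 and ~1,000,000 independent markers.
threshold = bonferroni_alpha(0.05, 1_000_000)
print(f"{threshold:.1e}")  # 5.0e-08
```

If your study tests a different number of effectively independent markers, the same call with that number gives the corresponding threshold.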

But there are other ways to control the type I error rate under multiple testing. Strictly speaking, Bonferroni controls the family-wise error rate (FWER): the alpha level for each individual test is set so that all the tests together (as a family) have a specified probability of producing even one false association. A second major option is to control the false discovery rate (FDR). Here, the goal is not to avoid even one false association. Rather, it is to limit the expected proportion of false associations among those declared significant. It is like saying, in an RNA-Seq study, "I know that false associations may be generated, but I want these to be no more than 5% of all the associations I report." To be clear, this is different from Bonferroni, because Bonferroni is "I don't want even 1 false association".
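The "no more than 5% of all the associations" idea is usually formalized as control of the false discovery rate. One widely used procedure for this is Benjamini-Hochberg, which the answer does not name, so treat the sketch below as an illustrative assumption and not as the method the author had in mind. The p-values are invented toy numbers, and both function names are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject tests whose p-value is at most alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at false discovery rate q.

    Finds the largest rank k with p_(k) <= q * k / m and rejects the
    k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Toy p-values, made up purely for illustration.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

print(sum(bonferroni(p_values)))          # 1
print(sum(benjamini_hochberg(p_values)))  # 2
```

On the same inputs, the Bonferroni rule is the more conservative of the two, which is the trade-off described above.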

There are many other types of multiple testing penalties, and many other values have been proposed in GWA studies. To learn more, the literature search terms mentioned above could prove helpful.

Finally, to bring this all the way back to your question: it now reduces to "what is the Z-score that corresponds to a given p-value?"

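from scipy.stats import norm

p_value = 5e-8  # genome-wide significance threshold

# isf (inverse survival function) is equivalent to ppf(1 - p),
# but numerically more accurate for very small p
z_score_one_sided = norm.isf(p_value)      # all of p in one tail
z_score_two_sided = norm.isf(p_value / 2)  # p split across both tails

print(f"{z_score_one_sided:.2f}")
print(f"{z_score_two_sided:.2f}")

5.33
5.45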
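To apply such a cutoff to summary statistics, one common approach is to compute Z = beta / SE for each SNP and keep those with |Z| at or above the threshold. The following is a minimal sketch under stated assumptions: the field names `snp`, `beta` and `se`, the helper names, and the three records are all hypothetical, and it uses the standard library's `statistics.NormalDist` so that it runs without scipy.

```python
from statistics import NormalDist


def z_threshold(p_value: float, two_sided: bool = True) -> float:
    """Z-score whose tail probability equals p_value under a standard normal."""
    tail = p_value / 2 if two_sided else p_value
    return NormalDist().inv_cdf(1 - tail)


def filter_by_z(records, p_value=5e-8):
    """Keep records whose |beta / se| reaches the two-sided Z cutoff."""
    cutoff = z_threshold(p_value, two_sided=True)
    kept = []
    for rec in records:
        z = rec["beta"] / rec["se"]
        if abs(z) >= cutoff:
            kept.append({**rec, "z": z})
    return kept


# Hypothetical summary statistics; real files will use their own column names.
summary_stats = [
    {"snp": "rs_example_1", "beta": 0.060, "se": 0.010},
    {"snp": "rs_example_2", "beta": -0.055, "se": 0.010},
    {"snp": "rs_example_3", "beta": 0.030, "se": 0.010},
]

kept = filter_by_z(summary_stats)
print([rec["snp"] for rec in kept])  # ['rs_example_1', 'rs_example_2']
```

The absolute value matters here because effect estimates can be negative; a two-sided cutoff treats both directions of effect the same way.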

Note: generative AI was used to hasten the process of writing the python code above using the following query:

Could you please provide the Z-score that corresponds to a p-value of 5 x 10^-8 for 1 and 2 sided test?

All other portions of the answer were typed out manually.