Question

Plink HWE QC question

10 months ago

janny.lau ▴ 10

Hello, I have been trying to do some QC on my SNP chip data using PLINK (I am running PLINK from RStudio), and I am very confused about the HWE QC step.

One of the QC filters in PLINK is the Hardy-Weinberg Equilibrium test (--hwe). From my understanding, SNPs whose deviation from HWE is unlikely to be due to random variation (at a specified p-value) will be excluded from further analysis, meaning that they deviate significantly from HWE. So would that mean that the smaller the p-value you use, the more stringent the HWE test you are setting, and the fewer SNPs will be removed?

Therefore, small p-value = more SNPs retained and high p-value = fewer SNPs retained?

I honestly am very uncertain about this. If someone could please help me clear this up, I would appreciate it so very much.

HWE Plink p-value • 1.4k views

updated 10 months ago by bk11 ★ 3.1k • written 10 months ago by janny.lau ▴ 10

score 0 · Answer 1 · 2024-09-19

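You have the direction right: a smaller p-value threshold means fewer SNPs are removed and more are retained. The wording around it is a little mixed up, though, so let me clarify how the Hardy-Weinberg Equilibrium (HWE) test works in PLINK: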

HWE Test and p-Value: The HWE test checks whether the observed genotype frequencies deviate from the frequencies expected under Hardy-Weinberg Equilibrium. The p-value from this test is the probability of observing a deviation at least as extreme as the one seen, assuming the SNP really is in equilibrium.
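To make the idea of "observed versus expected genotype frequencies" concrete, here is a minimal Python sketch. It uses a simple 1-degree-of-freedom chi-square test rather than the exact test that PLINK reports, so treat it as an illustration of the principle and not a reproduction of PLINK's numbers. The function name and the genotype counts are made up for this example.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square HWE p-value (1 df) from observed genotype counts.

    Simplified illustration only: PLINK itself uses an exact test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical counts: genotypes exactly at HWE proportions
p_balanced = hwe_chi2_pvalue(250, 500, 250)
# Hypothetical counts: strong heterozygote deficit
p_het_deficit = hwe_chi2_pvalue(400, 200, 400)

print(p_balanced, p_het_deficit)
```

The first SNP matches its expected counts and gets a p-value of 1, while the second, with far too few heterozygotes, gets a p-value close to zero.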

Interpreting p-Values:

Small p-Value: A small p-value (e.g., <0.0001) indicates that the SNP is significantly out of HWE. This suggests that there might be some problem with the SNP, such as genotyping errors, population stratification, or selection pressures. The smaller a SNP's p-value, the stronger the evidence of deviation from HWE, and the more likely it is to fall below your threshold and be removed.

Large p-Value: A large p-value suggests that the SNP does not show a significant deviation from HWE and is consistent with being in equilibrium. SNPs with large p-values stay above the threshold and are retained.

Setting the Threshold: When you use the --hwe flag in PLINK, you specify a p-value threshold. SNPs with p-values below this threshold are excluded from further analysis. For example, --hwe 1e-6 will exclude only SNPs with p-values less than 1e-6 (that is, 10^-6). Because a SNP needs very strong evidence of deviation to fall below such a small cutoff, fewer SNPs are removed. Conversely, --hwe 0.05 excludes every SNP with a p-value below 0.05, so it is a much more aggressive filter and removes more SNPs.
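The effect of the threshold can be sketched in a few lines of Python. The SNP IDs and p-values below are invented purely for illustration, and the helper function is hypothetical; it only mimics the "remove everything below the threshold" rule described above.

```python
# Hypothetical per-SNP HWE p-values (made-up numbers for illustration only)
hwe_pvalues = {
    "snp1": 0.80,
    "snp2": 0.03,
    "snp3": 2e-4,
    "snp4": 5e-7,
    "snp5": 1e-12,
}

def removed_by_threshold(pvalues, threshold):
    """Return the SNP IDs whose p-value is below the threshold (these are dropped)."""
    return sorted(snp for snp, p in pvalues.items() if p < threshold)

removed_at_1e6 = removed_by_threshold(hwe_pvalues, 1e-6)
removed_at_005 = removed_by_threshold(hwe_pvalues, 0.05)

print(removed_at_1e6)  # ['snp4', 'snp5']
print(removed_at_005)  # ['snp2', 'snp3', 'snp4', 'snp5']
```

With the small cutoff only the two most extreme SNPs are dropped, while the 0.05 cutoff drops four of the five. On real data the equivalent PLINK call would look something like `plink --bfile mydata --hwe 1e-6 --make-bed --out mydata_hwe`, where `mydata` and `mydata_hwe` are placeholder file prefixes.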

In summary:

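Smaller p-value threshold (e.g., 1e-6): only SNPs with very strong deviation are removed, so fewer SNPs are removed and more are retained.

Larger p-value threshold (e.g., 0.05): SNPs with even mild deviation are removed, so more SNPs are removed and fewer are retained.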

I hope this clears up the confusion!