I'd like to understand what the Hardy-Weinberg equilibrium represents in GWAS. What does it mean to say that a SNP is not in equilibrium?
Under Hardy-Weinberg assumptions, the expected genotype frequencies can be predicted from the allele frequencies. When the observed proportions of homozygous and heterozygous genotypes differ significantly from the prediction under HWE, it can indicate genotyping errors, batch effects, population stratification, or--much more rarely--association. Since the last of those is least likely, departure from HWE is typically taken as an indicator that a marker should be discarded. I find this is typically true unless adjacent markers in LD with one another both violate HWE.
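To make the comparison concrete, here is a minimal sketch of a one-degree-of-freedom chi-square test for HWE at a single biallelic SNP. The function name `hwe_chi_square` and the example counts are made up for illustration; real GWAS tools often use an exact test instead, which behaves better when genotype counts are small.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for HWE at one biallelic SNP.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    Returns (chi2, p_value), using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under HWE: n*p^2, n*2pq, n*q^2.
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, n * 2 * p * q, n * q * q)

    # Skip classes with zero expectation (monomorphic SNP) to avoid 0/0.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts with a deficit of heterozygotes.
    chi2, p_value = hwe_chi_square(320, 400, 280)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

A small p-value here only says the genotype proportions do not fit the HWE prediction; it does not say which of the causes listed above is responsible.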
Principal components are the most common means by which to correct population stratification or some batch effects. You can also examine LD between the markers you would otherwise discard for violating HWE to see if you might accidentally discard markers that are legitimately associated and do not indicate the other types of error present.
I would say yes, if several SNPs are in high LD (say r² > 0.8 or 0.9) and all out of HWE, you can typically assume it is a legitimate association. The chance of having 3 or more SNPs in high LD fail HWE checks because they all genotype badly is pretty low. The only other way this could happen--that I can think of--is if all of these SNPs were in a copy number variable region.
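If you want to check the LD between two flagged SNPs yourself, a rough sketch is below. It assumes genotypes are coded as allele counts (0, 1, 2) with no missing values, and it uses the squared Pearson correlation of those counts, which is only an approximation of the haplotype-based r² when phase is unknown. The function name `genotype_r2` and the example vectors are hypothetical.

```python
def genotype_r2(x, y):
    """Approximate LD r^2 between two SNPs from unphased genotypes.

    x and y are equal-length sequences of allele counts (0, 1 or 2),
    one entry per sample. Returns NaN if either SNP is monomorphic.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and the same length")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    if sxx == 0 or syy == 0:
        return float("nan")  # no variation, correlation undefined
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    snp1 = [0, 1, 2, 2, 1, 0, 1, 2]
    snp2 = [0, 1, 2, 2, 1, 0, 2, 2]
    print(f"r2 = {genotype_r2(snp1, snp2):.3f}")
```

Pairs of HWE-failing SNPs that clear the threshold you choose (0.8 or 0.9 in the answer above) would be the ones worth a second look before discarding.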
You looked up the Hardy-Weinberg equilibrium equation, right? And how it predicts all the possible genotypes should be in particular ratios?
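For reference, this is the equation for a SNP with two alleles, A and a, where p and q are their frequencies in the population. The numbers in the last line are just an illustrative choice of p = 0.7.

```latex
\begin{align*}
p + q &= 1 \\
p^2 + 2pq + q^2 &= 1 \\
f(AA) = p^2, \quad f(Aa) &= 2pq, \quad f(aa) = q^2 \\
p = 0.7,\ q = 0.3 \;\Rightarrow\; f(AA) = 0.49, \quad f(Aa) &= 0.42, \quad f(aa) = 0.09
\end{align*}
```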
In general, if you see a SNP out of HWE, it might be under selection. Or, you might have a consistent genotyping error. I think those are the biggest worries with that kind of data.
I couldn't understand it, because I thought that if a SNP differs from the prediction under HWE, that difference is probably caused by an association, so why discard it? But I see now. Good answers, I understood! Thank you both!
Hope it was helpful. Don't forget to select a "correct" answer if you feel it was useful. Principal components, BTW, can be very powerful for correcting for a number of errors. Learning how to use them--especially if your GWAS may have those issues (batch issues like date, genotyping center, sample type, genotyping platform, or population stratification)--can save you from publishing a terrible paper.
This is a case where the Wikipedia entry is well worth reading.
There is a nice recent discussion on Reddit about why HW is used even if it never occurs in reality. It's not exactly what you were asking for, but I recommend reading it. Anyway, basically HW is used to determine whether a genotype might be a sequencing error, or to infer population structure.
Thank you Giovanni, I liked the discussion, it's really helpful.