I am testing HWE for 6 HLA genes (A, B, C, DRB1, DQB1, DPB1) in a population that consists of 2 subpopulations (municipalities). I am using Arlequin v 3.5.2.2 to run the HWE test on unphased genotypes. For one of the HLA genes (B), the p-value is 0.002 in the total population, 0.037 in subpopulation 1 and 0.098 in subpopulation 2. We set the significance threshold for rejecting HWE at 0.01. This means that subpopulations 1 and 2 do not deviate significantly from HWE, but the total population does.
Do you have any idea what underlying genetic reasons might explain this? Is there any additional analysis you would consider that could shed some light on it? The remaining HLA genes (A, C, DRB1, DQB1 and DPB1) are in HWE in all three populations (total + 2 subpopulations).
This is a basic biology question rather than a bioinformatics one; you may get more complete answers on Biology StackExchange.
There are a number of possible explanations. One is purely statistical: because the total population has a larger sample size than either subpopulation, the test on the total population has greater power. It can therefore detect a deviation from Hardy-Weinberg equilibrium in the total population that the smaller subpopulation tests miss.
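To illustrate the power argument, here is a minimal sketch of a Pearson chi-square HWE test for a single biallelic locus. This is a simplification: Arlequin performs an exact test based on a Markov chain, which also handles multi-allelic loci such as HLA-B, and the function name and genotype counts below are hypothetical. The point is that for fixed genotype proportions the chi-square statistic grows in proportion to sample size, so the same relative deviation can become significant once more individuals are tested.

```python
import math


def hwe_chi2_biallelic(n_AA, n_Aa, n_aa):
    """Pearson chi-square test of HWE for one biallelic locus (1 df).

    Hypothetical helper; a simplified stand-in for the exact test Arlequin runs.
    Returns (chi2 statistic, p-value).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Same genotype proportions (a heterozygote deficit), different sample sizes.
small = hwe_chi2_biallelic(30, 40, 30)  # n = 100
large = hwe_chi2_biallelic(60, 80, 60)  # n = 200
print(small)  # chi2 = 4.0, p ≈ 0.0455 (not significant at 0.01)
print(large)  # chi2 = 8.0, p ≈ 0.0047 (significant at 0.01)
```

Doubling the counts like this only mimics pooling when both subpopulations have the same genotype frequencies; when their allele frequencies differ, the structural explanation below also applies.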
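Another possibility is that the subpopulations have different allele frequencies and each is in Hardy-Weinberg equilibrium. In that case, the deviation from Hardy-Weinberg equilibrium you are observing is caused by population structure, which you are ignoring when you pool the subpopulations for the test. Pooling subpopulations with different allele frequencies produces a deficit of heterozygotes relative to Hardy-Weinberg expectations (the Wahlund effect).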
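The population-structure explanation (the Wahlund effect) can be made concrete with a small calculation. The sketch below assumes one biallelic locus and two subpopulations that are each exactly in HWE; the function names and allele frequencies are hypothetical. In the pooled sample, the heterozygote frequency expected under HWE exceeds the observed one by exactly 2·Var(p), where Var(p) is the variance of the allele frequency across subpopulations, and the relative deficit equals Wright's F_ST.

```python
import math


def pooled_heterozygosity(p1, p2, w1=0.5):
    """Pool two subpopulations that are each in HWE (hypothetical helper).

    p1, p2: frequency of allele A in subpopulations 1 and 2.
    w1: fraction of the pooled sample drawn from subpopulation 1.
    Returns (observed heterozygosity, heterozygosity expected under HWE
    from the pooled allele frequency).
    """
    w2 = 1 - w1
    # Each subpopulation is in HWE, so its heterozygote frequency is 2pq.
    het_obs = w1 * 2 * p1 * (1 - p1) + w2 * 2 * p2 * (1 - p2)
    p_bar = w1 * p1 + w2 * p2
    het_exp = 2 * p_bar * (1 - p_bar)
    return het_obs, het_exp


def fst(p1, p2, w1=0.5):
    """Wright's F_ST = Var(p) / (p_bar * (1 - p_bar)) for two subpopulations."""
    w2 = 1 - w1
    p_bar = w1 * p1 + w2 * p2
    var_p = w1 * (p1 - p_bar) ** 2 + w2 * (p2 - p_bar) ** 2
    return var_p / (p_bar * (1 - p_bar))


# Hypothetical allele frequencies in the two municipalities.
het_obs, het_exp = pooled_heterozygosity(0.3, 0.7)
print(het_obs, het_exp)  # about 0.42 vs 0.5: a heterozygote deficit
print(fst(0.3, 0.7))     # about 0.16, equal to (0.5 - 0.42) / 0.5
```

With identical allele frequencies (p1 == p2) the deficit vanishes, so pooling only pushes the total population out of HWE when the municipalities differ. On the real data, estimating F_ST between the two municipalities (for example with the AMOVA or pairwise F_ST analyses in Arlequin) would be a direct way to check this explanation.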
Thank you for your reply. By the way, I also posted this on Reddit and received more or less the same comments. I believe these differences are indeed due to population structure.