Hi Everyone,
Just a general question about GWAS. This has probably been addressed already, but browsing through the literature I couldn't find any specific mention of the issue. I was wondering whether GWAS take haplotype length into account as a potential confounding variable. If you are using SNP data from a population and doing a GWAS to identify regions of the genome contributing to an extremely polygenic trait, aren't SNPs on long haplotypes more likely to show significant associations? Firstly, you are more likely to tag a long haplotype with a SNP, and secondly, a long haplotype is more likely to contain multiple causative alleles (especially if you assume a highly polygenic additive model in which each variant contributes an equally small amount to the trait).
Couldn't this result in an enrichment for GWAS hits on young haplotypes or regions that have recently experienced a selective sweep?
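To make the worry concrete, here is a minimal toy sketch of the argument (my own illustration, not taken from any GWAS package). It assumes, purely for simplicity, that every haplotype block is in complete LD with a single tag SNP, that causal variants fall uniformly along the genome, and that each has the same small effect size with a random sign. The function name and the parameters `causal_per_kb`, `beta` and `n_reps` are hypothetical. Under those assumptions the signal a tag SNP picks up is the summed effect of the causal variants in its block, and its expected square grows with block length.

```python
import random


def simulate_tag_signal(block_lengths_kb, causal_per_kb=0.05, beta=0.01,
                        n_reps=2000, seed=1):
    """Mean squared aggregate effect captured by one tag SNP, per block length.

    Toy assumptions (not a real GWAS model): complete LD within a block,
    causal variants uniformly distributed, equal effect magnitudes with
    random sign, and equal allele frequencies.
    """
    rng = random.Random(seed)
    results = {}
    for length in block_lengths_kb:
        total = 0.0
        for _ in range(n_reps):
            # One Bernoulli trial per kb decides whether a causal variant sits there.
            n_causal = sum(1 for _ in range(length) if rng.random() < causal_per_kb)
            # Sum the effects; signs are random, so some effects cancel.
            aggregate = sum(beta if rng.random() < 0.5 else -beta
                            for _ in range(n_causal))
            total += aggregate ** 2
        results[length] = total / n_reps
    return results


if __name__ == "__main__":
    for length, signal in simulate_tag_signal([10, 100, 1000]).items():
        print(f"{length:>5} kb block: mean squared tagged effect = {signal:.2e}")
```

In this toy setting the expected squared effect is roughly `length * causal_per_kb * beta**2`, so a block ten times longer carries about ten times the expected signal. Real LD decays with distance and is never this clean, so this only illustrates the direction of the effect, not its size.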
If anybody knows of papers that address this, or can explain why it is not an issue, I would be grateful to hear about it. Perhaps this is not directly a bioinformatics question, but I wonder whether GWAS software controls for this, or whether it needs to.
Best regards,
Rubal
Thanks for the thoughtful reply. On your first point, yes, it's important to consider the different priorities that go into array design for different studies. HLA is a good example of a positively selected locus that comes up often in GWAS (with lots of balancing selection going on there as well). That said, it probably has frequent GWAS hits because it is functionally connected to the phenotypes in question. I was more concerned about 'false positive' hits, or at least hits that are not exactly what the experimenter was looking for: hits driven by a selective sweep that produces a region of high LD, which is more likely to be tagged by a SNP and/or to contain more causative variants because of its length, relative to other tested SNPs.
I agree with all your points in the second paragraph. I still wonder how much of a problem variation in haplotype length is for GWAS. I suppose that in an ideal world all SNPs would tag small haplotypes of equal size, but given the variation in LD across the genome you just have to accept that this is a factor that influences the power of these studies.
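One rough way to see how unevenly SNPs tag their surroundings is to sum each SNP's r^2 with its neighbours inside a window (a quantity sometimes called an LD score). The sketch below is only an illustration on made-up genotypes: the function names, the toy data, and the base-pair window are my own assumptions, and real tools work on phased or imputed data at a much larger scale.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_scores(genotypes, positions, window_bp):
    """Sum of r^2 between each SNP and every SNP within window_bp (itself included)."""
    scores = []
    for j, pos_j in enumerate(positions):
        score = 0.0
        for k, pos_k in enumerate(positions):
            if abs(pos_k - pos_j) <= window_bp:
                score += pearson_r2(genotypes[j], genotypes[k])
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: three SNPs in perfect LD, plus one uncorrelated SNP.
    block_snp = [0, 1, 2, 0, 1, 2]
    lone_snp = [0, 0, 0, 2, 2, 2]
    genotypes = [block_snp, block_snp, block_snp, lone_snp]
    positions = [100, 200, 300, 400]
    print(ld_scores(genotypes, positions, window_bp=1000))  # [3.0, 3.0, 3.0, 1.0]
```

SNPs with a high score sit on long or strongly correlated haplotypes and so tag more of the surrounding variation, which is exactly the unevenness in power I was asking about.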
Thanks very much for the review paper you suggested.