We are investigating the possibility of synthetic associations. We would like to identify coding variants in LD with GWAS-identified variants using publicly available resources, and we are willing to look over long distances for them. As Kai Wang pointed out, r2 may be a meaningless measure for low-frequency variants, so one may wish to look at D-prime instead. Even then, D-prime estimated from public samples (e.g. HapMap / 1KG) may not reflect the long-distance LD present in a disease population. Given this stack of caveats, how would the BioStar community pursue such a question? We do not have complete sequencing data on these individuals, so only samples in public databases would be fair game.
Put on your thinking caps.
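To make the r2 versus D-prime point concrete, here is a minimal sketch that computes both from haplotype frequencies using the standard two-locus definitions. The function name `ld_stats` and the allele frequencies are made up for illustration and are not taken from HapMap, 1KG, or any real data set.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    p_ab : frequency of the haplotype carrying both A and B
    """
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D_max is the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r2 is the squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example: a rare coding variant (1%) that sits entirely
# on haplotypes carrying the common GWAS allele (30%).
d, d_prime, r2 = ld_stats(p_a=0.01, p_b=0.30, p_ab=0.01)
print(f"D = {d:.4f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")
```

In this made-up case the rare allele never occurs without the common allele, so D-prime is at its maximum, yet r2 stays small because the allele frequencies are so mismatched. That is why an r2 threshold would miss exactly the kind of rare variant being asked about.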
I am curious as to why you wish to limit/focus this search on coding variants. Is this because you have exome sequence data available? Or are you hedging that coding variants, either synonymous or non-synonymous, will be the drivers of disease risk/disease phenotype?
Both. Sequence data are coming, but we cannot get them for our entire sample. We were hoping to test whether coding variants exert "action at a distance" that leads to the GWAS signal. So, for a GWAS-significant association, are there one or more rare variants in some LD with it (within a disease population or otherwise) that might drive the signal?