I have GWAS data indicating that a large number of SNPs may be associated with a trait of interest. Some of these SNPs lie within or very close to regulatory elements, such as long non-coding RNAs (lncRNAs).
These SNPs are markers for alleles from many different genotypes. It is unclear whether the regulatory elements near these SNPs are the cause of the associations with the trait, or whether another element in LD with the SNP is causing the association.
I'm trying to figure out a way to estimate the probability that each of these lncRNAs is the cause of the association. I don't yet know how to do this, but I have been thinking of an approach that uses integral calculus and the following variables:
Variables I know how to incorporate:
- rate of LD decay
- length of regulatory element
- position of SNP relative to the regulatory element of interest
Variables I don't know how to account for:
- strength of association
- nearby associated SNPs, also within LD, and the strengths of these associations
- variation in LD that depends on local genomic structure
The approach I'm considering would depend on calculating the black area in the figure relative to the blue area. I suspect this is rather crude, and I'm wondering how to make the estimate more accurate by including more variables, such as the ones I listed above and any others that are relevant.
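
For concreteness, here is a minimal Python sketch of the area-ratio idea. It rests on assumptions that go beyond what I have stated above: LD (r²) is taken to decay exponentially with distance from the SNP, the blue area is taken to be the total area under that curve within a fixed window around the SNP, and the black area is the part of it that overlaps the regulatory element. The names `ld_area`, `element_fraction`, `decay_bp`, and `window_bp` are made up for illustration, and the numbers in the usage lines are placeholders, not real data.

```python
import math


def ld_area(a, b, snp_pos, decay_bp):
    """Integral of exp(-|x - snp_pos| / decay_bp) over [a, b].

    ASSUMPTION: LD decays exponentially with distance from the SNP,
    with length scale decay_bp (in base pairs).
    """
    def signed_area_from_snp(x):
        # Signed area under the curve between the SNP and position x
        # (negative when x lies to the left of the SNP).
        d = x - snp_pos
        area = decay_bp * (1.0 - math.exp(-abs(d) / decay_bp))
        return math.copysign(area, d)

    return signed_area_from_snp(b) - signed_area_from_snp(a)


def element_fraction(elem_start, elem_end, snp_pos, decay_bp, window_bp):
    """Share of the LD-weighted area in the window that falls on the element.

    ASSUMPTION: "blue area" = area within snp_pos +/- window_bp,
    "black area" = portion of that area overlapping the element.
    """
    win_start, win_end = snp_pos - window_bp, snp_pos + window_bp
    # Clip the element to the window so the ratio never exceeds 1.
    lo, hi = max(elem_start, win_start), min(elem_end, win_end)
    if hi <= lo:
        return 0.0  # element lies entirely outside the window
    total = ld_area(win_start, win_end, snp_pos, decay_bp)
    return ld_area(lo, hi, snp_pos, decay_bp) / total


# Hypothetical example: a 2 kb element starting 500 bp downstream of the SNP.
fraction = element_fraction(
    elem_start=10_500, elem_end=12_500,
    snp_pos=10_000, decay_bp=5_000.0, window_bp=50_000,
)
print(f"LD-weighted share covered by the element: {fraction:.3f}")
```

This only uses the three variables I already know how to incorporate (LD decay rate, element length, and SNP position relative to the element). It treats every position in the window as equally likely a priori to be causal, which is part of why I think it is crude.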