Caroline,
There is a closed-form mathematical result relating the association signal at a true positive "causal" variant to the signal at any other variant (one that is NOT a true positive): the negative log of the non-causal variant's p-value is proportional to the negative log of the causal variant's p-value and to the degree of LD between the two, as measured by r^2. Or, more simply,
(1) -log(pNC) = -log(pC) * r^2 + E,
where pNC is the p-value of the non-causal variant, pC is the p-value of the causal variant, r^2 is the LD between the two, and E is a zero-mean term accounting for random noise ...
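To make relation (1) concrete, here is a minimal Python sketch of the signal you would expect at a non-causal variant, given the causal variant's p-value and the r^2 between them. It assumes the left-hand side of (1) is read on the -log scale, that the log is base 10, and that the noise term E is ignored; the function name `expected_neglog_p` is made up for illustration.

```python
import math


def expected_neglog_p(p_causal, r2):
    """Expected -log10(p) at a non-causal variant under relation (1).

    Assumptions (not stated in the post): log is base 10 and the
    noise term E is dropped, so this is an expectation only.
    """
    if not 0.0 < p_causal <= 1.0:
        raise ValueError("p_causal must be in (0, 1]")
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must be in [0, 1]")
    return -math.log10(p_causal) * r2


# Expected signal at variants in decreasing LD with a causal variant at p = 1e-8.
for r2 in (1.0, 0.8, 0.5, 0.2):
    print(f"r^2 = {r2:.1f}  expected -log10(p) = {expected_neglog_p(1e-8, r2):.2f}")
```

The point of the sketch is only that the expected signal decays linearly with r^2, so variants in weak LD with the causal one should look much weaker than it does.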
As such, suppose that you don't know where the "causal" variant is, but you think it lies within approximately a megabase of sequence, and you have, say, 2,000 variants in that region. The first thing to do is to eliminate everything with a p-value > 10^-2. This should get rid of the vast majority of your variants, leaving you with only reasonable candidate causal variants.
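The p-value filter described above can be sketched in a few lines of Python. The variant IDs and p-values below are invented toy data, and the dictionary layout and the name `filter_candidates` are assumptions for illustration, not any particular tool's format.

```python
def filter_candidates(variants, p_threshold=1e-2):
    """Drop variants with p > p_threshold and return the rest, strongest first."""
    kept = [v for v in variants if v["p"] <= p_threshold]
    return sorted(kept, key=lambda v: v["p"])


# Hypothetical association results for a handful of variants in the region.
toy_variants = [
    {"id": "var1", "p": 3e-9},
    {"id": "var2", "p": 0.2},
    {"id": "var3", "p": 4e-4},
    {"id": "var4", "p": 0.05},
]

for v in filter_candidates(toy_variants):
    print(v["id"], v["p"])
```

In practice you would read the p-values from your association output rather than typing them in; the threshold is left as a parameter in case 10^-2 turns out to be too lenient or too strict for your region.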
Remember, no algorithm out there will work properly if the true positive (causal) polymorphism is not assayed, for instance, an indel in a repeat region or some such. The other thing to keep in mind is the total number of causal variants you think are present in the locus. If there is more than one, the math is no longer as simple as (1), above. If you have reason to suspect this, be sure to use an algorithm capable of modeling more than one causal variant per locus.
I think if you do these things, you will have a very manageable number of variants left over, and may not even need to worry about the LD exclusions ... I've never had to.
VL