Question

Info/r2 threshold to use for LD file for FINEMAP/SUSIE

3.3 years ago

Caroline • 0

Hi everyone,

I am running a fine-mapping analysis using FINEMAP and SUSIE. I am using an imputed dataset to construct the LD reference file, but I am unsure what r2/info score threshold to apply when selecting variants for the LD reference. I used a fairly liberal threshold (r2 > 0.3) for inclusion in the meta-analysis. For fine-mapping I am considering a more stringent threshold, because I am concerned about introducing too much noise if the variants are not well imputed. On the other hand, I am also concerned about being too stringent and losing true secondary signals, especially because the LD reference population is a non-European population. Are there any recommendations on best practice for r2/info thresholds when including variants in the LD file for FINEMAP or SUSIE?

Thank you,
Caroline

susie finemap LD threshold imputation • 903 views

updated 3.3 years ago by LauferVA 4.7k • written 3.3 years ago by Caroline • 0

score 0 · Answer 1 · 2022-03-09

Caroline,

There is a standard result that, when a locus contains a single true positive ("causal") variant, the association signal at any other (non-causal) variant is approximately the causal variant's signal scaled by the LD between the two, as measured by r^2. Or, more simply,

(1) -log(pNC) ≈ -log(pC) * r^2 + E, where,

pNC is the p-value of the non-causal variant, pC is the p-value of the causal variant, r^2 is the LD between the two, and E is a noise term with an expected value of zero. Strictly, the scaling by r^2 holds for the expected chi-square non-centrality rather than for -log(p) itself, so treat (1) as a rule of thumb for strongly associated variants.
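To make the r^2 scaling concrete, here is a minimal Python sketch. It assumes a single causal variant, two-sided p-values from a normal test statistic, and no noise (E = 0); the function name `expected_noncausal_p` is made up for illustration and is not part of FINEMAP or SuSiE.

```python
import math
from statistics import NormalDist


def expected_noncausal_p(p_causal: float, r2: float) -> float:
    """Noise-free p-value expected at a non-causal variant.

    Assumption: one causal variant, so the expected z-score at the
    non-causal variant is sqrt(r2) times the causal z-score
    (equivalently, the chi-square signal is scaled by r2).
    """
    if not 0.0 < p_causal < 1.0:
        raise ValueError("p_causal must be strictly between 0 and 1")
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must be between 0 and 1")
    # Convert the two-sided p-value to an absolute z-score.
    z_causal = -NormalDist().inv_cdf(p_causal / 2.0)
    # Scale by the correlation |r| = sqrt(r2).
    z_noncausal = math.sqrt(r2) * z_causal
    # Convert back to a two-sided p-value.
    return math.erfc(z_noncausal / math.sqrt(2.0))


if __name__ == "__main__":
    p_c = 1e-20  # hypothetical causal p-value
    for r2 in (1.0, 0.8, 0.5, 0.2):
        p_nc = expected_noncausal_p(p_c, r2)
        print(f"r2={r2:.1f}  -log10(pNC)={-math.log10(p_nc):.2f}  "
              f"r2 * -log10(pC)={r2 * -math.log10(p_c):.2f}")
```

The two printed columns should track each other closely for strong signals, which is all that (1) claims; they drift apart as the signal weakens.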

As such, suppose that you don't know where the "causal" variant is, but you think it lies within approximately a megabase of sequence and you have, say, 2,000 variants in that region. The first thing to do is to eliminate everything with a p-value > 10^-2. This should get rid of the vast majority of your variants, leaving you with only reasonable candidate causal variants.
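A minimal sketch of that filtering step in Python follows. The variant IDs, p-values, and LD values are invented for illustration, and `filter_by_pvalue` is a hypothetical helper, not a FINEMAP or SuSiE function. The one point it is meant to show is that the LD matrix has to be subset to the same variants, in the same order, as the summary statistics.

```python
def filter_by_pvalue(variants, ld, max_p=1e-2):
    """Keep variants with p <= max_p and subset the LD matrix to match.

    variants: list of dicts with keys "id" and "p" (hypothetical layout)
    ld:       square list-of-lists of r values, ordered like `variants`
    """
    if len(ld) != len(variants) or any(len(row) != len(variants) for row in ld):
        raise ValueError("LD matrix must be square and match the variant list")
    keep = [i for i, v in enumerate(variants) if v["p"] <= max_p]
    kept_variants = [variants[i] for i in keep]
    # Take the same rows and columns so the LD matrix stays aligned.
    kept_ld = [[ld[i][j] for j in keep] for i in keep]
    return kept_variants, kept_ld


# Made-up example data: three variants and their pairwise r values.
example_variants = [
    {"id": "var_a", "p": 3e-12},
    {"id": "var_b", "p": 0.4},
    {"id": "var_c", "p": 2e-5},
]
example_ld = [
    [1.0, 0.1, 0.7],
    [0.1, 1.0, 0.2],
    [0.7, 0.2, 1.0],
]

if __name__ == "__main__":
    kept, kept_ld = filter_by_pvalue(example_variants, example_ld)
    print([v["id"] for v in kept])
    print(kept_ld)
```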

Remember, no algorithm out there will work properly if the true positive (causal) polymorphism is not assayed, for instance an indel in a repeat region. The other thing to keep in mind is the total number of causal variants you think are present in the locus. If there is more than one, the math is no longer as simple as (1), above. If you have reason to suspect this may be true, be sure to use an algorithm capable of modeling more than one causal variant per locus.
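To illustrate why more than one causal variant complicates things, here is a small Python sketch under the usual summary-statistic model, where the expected marginal z-score at each variant is the LD-weighted sum of the causal effects (expected z = R * lambda, with R the signed correlation matrix). The LD matrix and effect sizes are invented, and `expected_marginal_z` is a hypothetical helper.

```python
def expected_marginal_z(ld, causal_effects):
    """Expected marginal z-scores given LD and joint causal effects.

    ld:             square list-of-lists of signed r values (not r^2)
    causal_effects: joint (conditional) z-scale effects, zero for
                    non-causal variants
    """
    n = len(causal_effects)
    if len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("LD matrix must be square and match causal_effects")
    # Row-by-vector product: each variant picks up signal from every
    # causal variant in proportion to its correlation with it.
    return [sum(ld[j][k] * causal_effects[k] for k in range(n)) for j in range(n)]


# Made-up locus: variants 0 and 2 are causal, variant 1 is not.
example_r = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
example_effects = [5.0, 0.0, 4.0]

if __name__ == "__main__":
    for j, z in enumerate(expected_marginal_z(example_r, example_effects)):
        print(f"variant {j}: expected z = {z:.2f}")
```

In this toy locus the non-causal variant 1 inherits signal from both causal variants, so its expected z-score ends up higher than that of causal variant 2. A single-causal-variant relation like (1) cannot account for that.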

I think if you do these things, you will have a very manageable number of variants left over, and may not even need to worry about the LD exclusions ... I've never had to.

VL