Hi everyone,
I have a VCF file from which I would like to retain (i) a list of known, associated SNPs and (ii) background, independent variants that are not in linkage disequilibrium (LD) with each other or with any of my associated SNPs.
I know that if I just wanted independent SNPs, I could use something like plink --indep-pairwise, but that wouldn't guarantee that I keep all of my associated SNPs or that the "random" SNPs are independent of them. I could theoretically overcome that problem by then removing, one associated SNP at a time, the random SNPs that are in LD with it. For example, I could add the associated variants to the VCF file, use vcftools to calculate r2 in the neighborhood around each associated variant, and build a list of background variants to remove (e.g. those with r2 > 0.5 to the associated variant). However, running this separately for each of the thousands of associated variants feels extremely inefficient.
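To make the filtering step concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration under my own assumptions: the function names (`r_squared`, `background_to_remove`) are hypothetical, genotypes are assumed to be already loaded as 0/1/2 dosages for a single chromosome with positions sorted, r2 is taken as the squared Pearson correlation of those dosages (a stand-in for what an LD tool would report), and the 100 kb window is an arbitrary choice.

```python
from bisect import bisect_left, bisect_right


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic site: treat as uncorrelated (assumption)
    return cov * cov / (var_x * var_y)


def background_to_remove(positions, dosages, associated,
                         window=100_000, r2_max=0.5):
    """Return indices of background variants in LD with any associated variant.

    positions  : sorted base-pair positions on one chromosome
    dosages    : one list of 0/1/2 dosages per variant, same order as positions
    associated : set of indices of the associated variants (always kept)
    """
    remove = set()
    for i in associated:
        # Only look at variants within +/- window bp of the associated variant.
        lo = bisect_left(positions, positions[i] - window)
        hi = bisect_right(positions, positions[i] + window)
        for j in range(lo, hi):
            if j in associated or j in remove:
                continue
            if r_squared(dosages[i], dosages[j]) > r2_max:
                remove.add(j)
    return remove


# Toy data: variant 0 is associated; variant 1 is a perfect proxy for it,
# variant 2 is weakly correlated, variant 3 lies outside the window.
positions = [100, 200, 300, 500_000]
dosages = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 0, 1, 1], [0, 1, 2, 1]]
to_drop = background_to_remove(positions, dosages, associated={0})
```

The remaining background variants would then still need to be pruned against each other (e.g. with --indep-pairwise), with the associated SNPs added back afterwards.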
Do you have any suggestions for solving this problem, or know of any tools which may help?
Thanks in advance!