Is there a good way to take any two SNPs and pull out the LD between them, particularly the R-squared and the directionality of the linkage (e.g. A in SNP1 occurs with G in SNP2 95% of the time)?
Obviously I can do this manually, but I want to do it for a list of several thousand SNPs, so I am hoping for a scalable solution. Right now I can't find anything, and it looks like I will have to come up with my own solution using vcftools and the 1000 Genomes data.
Thanks!
plink 1.9 offers this functionality, as answered by @chrchang523.
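For anyone landing here later, a rough sketch of what the plink 1.9 calls could look like when driven from Python. The flags (--ld, --r2 in-phase, --ld-snp-list, --ld-window, --ld-window-kb, --ld-window-r2) are from the plink 1.9 LD documentation as I understand it; the fileset prefix "1000g_eur", the file "snps.txt", and the helper names are made up for illustration, and the code only builds the command lines rather than running them.

```python
import shlex


def plink_ld_pair_cmd(bfile, snp_a, snp_b):
    """argv for a single-pair report (haplotype frequencies, r^2, D' in the log)."""
    return ["plink", "--bfile", bfile, "--ld", snp_a, snp_b]


def plink_r2_cmd(bfile, snp_list, out, window_kb=1000, min_r2=0.2):
    """argv for a table of r^2 values between listed SNPs and their neighbours.

    'in-phase' asks plink to add a column with the in-phase allele pairs.
    --ld-window is raised because its default only looks at a few
    neighbouring variants.
    """
    return [
        "plink", "--bfile", bfile,
        "--r2", "in-phase",
        "--ld-snp-list", snp_list,
        "--ld-window", "99999",
        "--ld-window-kb", str(window_kb),
        "--ld-window-r2", str(min_r2),
        "--out", out,
    ]


if __name__ == "__main__":
    # Hypothetical fileset prefix and SNP list; pass the list to
    # subprocess.run(cmd, check=True) once plink is on your PATH.
    print(shlex.join(plink_ld_pair_cmd("1000g_eur", "rs1042779", "rs6792369")))
    print(shlex.join(plink_r2_cmd("1000g_eur", "snps.txt", "ld_out")))
```

Building the argv as a list (rather than one shell string) avoids quoting problems if a file path contains spaces.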
In case anyone else wants to do this, I wrote a little package based on plink and LDlink that allows many-to-many LD lookup. Basically, given two SNP lists, it creates a list of SNP LD pairs between each SNP in the first list and every SNP in the second list, filtered by distance and R2. Given a first list of 40,000 SNPs and a second list of ~10 million risk alleles, it runs in a couple of hours.
The output includes phase data for the SNPs, so that for every possible pair you can ask the question: 'given Allele X in SNP A, what is the allele in SNP B?'
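To make the 'given Allele X in SNP A' question concrete, here is a minimal sketch of the underlying arithmetic, assuming you already have phased haplotypes for two biallelic SNPs (e.g. from a phased 1000 Genomes VCF). The function name and the toy counts are mine, not taken from the package above. It uses the standard definitions D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)), plus plain conditional frequencies for the directionality.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Compute r^2, D and P(SNP2 allele | SNP1 allele).

    haplotypes: iterable of (allele_at_snp1, allele_at_snp2) tuples,
    one tuple per phased chromosome.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    alleles1 = sorted({a for a, _ in counts})
    alleles2 = sorted({b for _, b in counts})
    if len(alleles1) != 2 or len(alleles2) != 2:
        raise ValueError("both SNPs must be biallelic and polymorphic")

    # Pick one reference allele per SNP; r^2 does not depend on the choice.
    a, b = alleles1[0], alleles2[0]
    p_a = sum(c for (x, _), c in counts.items() if x == a) / n
    p_b = sum(c for (_, y), c in counts.items() if y == b) / n
    p_ab = counts[(a, b)] / n

    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

    # Directionality: for each SNP1 allele, how often each SNP2 allele rides with it.
    cond = {}
    for x in alleles1:
        n_x = sum(counts[(x, y)] for y in alleles2)
        cond[x] = {y: counts[(x, y)] / n_x for y in alleles2}
    return r2, d, cond


if __name__ == "__main__":
    # Toy data: 200 chromosomes, invented for illustration only.
    toy = ([("A", "G")] * 95 + [("A", "T")] * 5
           + [("C", "G")] * 10 + [("C", "T")] * 90)
    r2, d, cond = ld_stats(toy)
    print(f"r2={r2:.3f}  D={d:.4f}  P(G at SNP2 | A at SNP1)={cond['A']['G']:.2f}")
```

In the toy data, A at SNP1 travels with G at SNP2 on 95 of 100 A-carrying chromosomes, which is the '95% of the time' style of statement from the original question. With unphased genotypes you cannot count haplotypes directly like this; haplotype frequencies have to be estimated first.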
All of this is just done by some basic math, running plink a bunch of times, and parsing the output. Hopefully it helps someone else.
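For the 'parsing the output' part, a small sketch of how a plink .ld table might be read, assuming it was produced with --r2 in-phase and therefore has a whitespace-delimited header containing SNP_A, SNP_B, PHASE and R2 columns, with PHASE values such as AG/CT for single-base alleles. Those column names and the PHASE layout are my recollection of plink 1.9 output, so check them against your own files; the sample rows are invented and this is not the code from the package above.

```python
import io


def parse_plink_ld(handle):
    """Read a whitespace-delimited plink .ld table into a list of dicts."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        row["R2"] = float(row["R2"])
        rows.append(row)
    return rows


def phase_pairs(phase):
    """Turn 'AG/CT' into {'A': 'G', 'C': 'T'} (SNP_A allele -> SNP_B allele).

    Only handles single-base alleles; returns None for anything else so the
    caller can fall back to the raw string.
    """
    halves = phase.split("/")
    if len(halves) != 2 or any(len(h) != 2 for h in halves):
        return None
    return {h[0]: h[1] for h in halves}


# Invented rows, for illustration only.
SAMPLE = """\
 CHR_A   BP_A  SNP_A  CHR_B   BP_B  SNP_B  PHASE    R2
     1   1000    rsA      1   2500    rsB  AG/CT  0.72
     1   1000    rsA      1   9000    rsC  AT/CG  0.35
"""

if __name__ == "__main__":
    for row in parse_plink_ld(io.StringIO(SAMPLE)):
        print(row["SNP_A"], row["SNP_B"], row["R2"], phase_pairs(row["PHASE"]))
```

Keeping the phase lookup as a dict means the 'which allele goes with which' question becomes a single key lookup per pair.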
Hi Mike,
I have exactly the same issue as you, and I would like to use your code to compare LD between two lists of SNPs. Could you please tell me where I can find that code?
It turns out that LDlink does this beautifully for a single pair of rsIDs, but it doesn't seem to work in batch mode: https://analysistools.nci.nih.gov/LDlink/
You may be able to get your results from LDlink with some scripting as well, though it possibly requires parsing the HTML. For example, you can construct URLs programmatically pretty easily: https://analysistools.nci.nih.gov/LDlink/?var1=rs1042779&var2=rs6792369&pop=YRI%2BLWK%2BGWD%2BMSL%2BESN%2BASW%2BACB%2BMXL%2BPUR%2BCLM%2BPEL%2BCHB%2BJPT%2BCHS%2BCDX%2BKHV%2BCEU%2BTSI%2BFIN%2BGBR%2BIBS%2BGIH%2BPJL%2BBEB%2BSTU%2BITU&tab=ldpair
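A quick sketch of building those URLs in Python, assuming the query parameters stay exactly as in the link above (var1, var2, pop, tab=ldpair) and that populations are joined with "+", which gets percent-encoded to %2B. The helper names are made up, and this only constructs the URLs; fetching them and parsing whatever comes back is a separate step that I have not verified.

```python
from itertools import product
from urllib.parse import urlencode

BASE = "https://analysistools.nci.nih.gov/LDlink/"


def ldpair_url(var1, var2, pops):
    """Build one LDpair URL in the same shape as the hand-written link."""
    query = urlencode({
        "var1": var1,
        "var2": var2,
        "pop": "+".join(pops),  # urlencode turns "+" into %2B
        "tab": "ldpair",
    })
    return f"{BASE}?{query}"


def all_pair_urls(list1, list2, pops):
    """One URL for every (SNP in list1, SNP in list2) combination."""
    return [ldpair_url(a, b, pops) for a, b in product(list1, list2)]


if __name__ == "__main__":
    for url in all_pair_urls(["rs1042779"], ["rs6792369"], ["CEU", "TSI"]):
        print(url)
```

For thousands of SNPs the number of pairs grows quickly, so it would be worth spacing the requests out and checking how much load the server allows before looping over everything.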
Yes! It looks like that does work. I am not sure how many queries their API can tolerate, but I am going to try this tonight to see. I also want to compare this to running the calculations with plink/vcftools to see which is faster and more stable. Thanks!