[SOLVED] Get LD data for any two SNPs
7.7 years ago
Mike Dacre ▴ 130

Is there a good way to take any two SNPs and pull out the LD between them, particularly the R-squared and the directionality of the linkage (e.g. A in SNP1 occurs with G in SNP2 95% of the time)?

Obviously I can do this manually, but I want to do it for a list of several thousand SNPs, so I am hoping for a scalable solution. So far I haven't found anything, and it looks like I will have to build my own solution using vcftools and the 1000 Genomes data.

Thanks!

genome linkage disequilibrium • 6.4k views

plink 1.9 offers this functionality, as answered by @chrchang523.

In case anyone else wants to do this, I wrote a little package based on plink and LDlink that allows many-to-many LD lookup. Given two SNP lists, it creates a list of SNP LD pairs between each SNP in the first list and every SNP in the second list, filtered by distance and R2. With a first list of 40,000 SNPs and a second list of ~10 million risk alleles, it runs in a couple of hours.

The output includes phased SNP data, which answers the question 'given Allele X in SNP A, what is the allele in SNP B?' for every possible pair.

All of this is just done by some basic math, running plink a bunch of times, and parsing the output. Hopefully it helps someone else.
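For readers wondering what the "basic math" looks like, here is a minimal sketch (not the package's actual code) that computes R2 and the conditional allele frequencies from phased haplotypes for one pair of biallelic SNPs. The function name `pairwise_ld` and the input layout (one tuple of alleles per phased chromosome) are assumptions made for illustration.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Return (r2, cond) for two biallelic SNPs.

    haplotypes: iterable of (allele_at_snp1, allele_at_snp2) tuples,
    one per phased chromosome (hypothetical input layout).
    cond[x][y] is the fraction of chromosomes carrying allele x at SNP1
    that carry allele y at SNP2.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    alleles1 = sorted({h[0] for h in counts})
    alleles2 = sorted({h[1] for h in counts})
    if len(alleles1) != 2 or len(alleles2) != 2:
        raise ValueError("both SNPs must be biallelic and polymorphic")

    a, b = alleles1[0], alleles2[0]
    # Allele and haplotype frequencies for one reference allele per SNP.
    p_a = sum(c for (x, _), c in counts.items() if x == a) / n
    p_b = sum(c for (_, y), c in counts.items() if y == b) / n
    p_ab = counts[(a, b)] / n

    # D is the deviation from independence; r2 normalises D squared.
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

    # Directionality: P(allele at SNP2 | allele at SNP1).
    cond = {}
    for x in alleles1:
        total = sum(counts[(x, y)] for y in alleles2)
        cond[x] = {y: counts[(x, y)] / total for y in alleles2}
    return r2, cond
```

With toy data such as 45 A-G, 5 A-T and 50 C-T chromosomes, `cond["A"]["G"]` is the "A occurs with G X% of the time" figure asked about in the question.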


Hi Mike,

I have exactly the same issue as you, and I would like to use your code to compare LD between two lists of SNPs. Could you please tell me where I can find it?


It turns out that LDlink does this beautifully for a single pair of rsids, but it doesn't seem to work in batch mode: https://analysistools.nci.nih.gov/LDlink/


You might be able to get your results from LDlink with some scripting as well, though it may require parsing the HTML. For example, you can construct URLs programmatically pretty easily: https://analysistools.nci.nih.gov/LDlink/?var1=rs1042779&var2=rs6792369&pop=YRI%2BLWK%2BGWD%2BMSL%2BESN%2BASW%2BACB%2BMXL%2BPUR%2BCLM%2BPEL%2BCHB%2BJPT%2BCHS%2BCDX%2BKHV%2BCEU%2BTSI%2BFIN%2BGBR%2BIBS%2BGIH%2BPJL%2BBEB%2BSTU%2BITU&tab=ldpair
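As a rough sketch of that URL construction, the snippet below rebuilds the query string from the link above for arbitrary SNP pairs. The parameter names (`var1`, `var2`, `pop`, `tab`) are taken from that link; the helper names `ldpair_url` and `all_pair_urls` are hypothetical, and whether the site tolerates scripted requests is not something this sketch checks.

```python
from itertools import product
from urllib.parse import urlencode

# Base address copied from the link in this thread.
LDLINK_BASE = "https://analysistools.nci.nih.gov/LDlink/"


def ldpair_url(snp1, snp2, populations):
    """Build an LDlink LDpair URL for one SNP pair.

    Populations are joined with '+', which urlencode escapes as %2B,
    matching the form of the example link.
    """
    query = urlencode({
        "var1": snp1,
        "var2": snp2,
        "pop": "+".join(populations),
        "tab": "ldpair",
    })
    return f"{LDLINK_BASE}?{query}"


def all_pair_urls(list1, list2, populations):
    """Yield one URL for every (SNP from list1, SNP from list2) pair."""
    for snp1, snp2 in product(list1, list2):
        yield ldpair_url(snp1, snp2, populations)
```

For several thousand SNPs this generates a very large number of URLs, so some throttling between requests would be sensible.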


Yes! It looks like that does work. I am not sure how many queries their API can tolerate, but I am going to try this tonight to see. I also want to compare this to running the calculations with plink/vcftools to see which is faster and more stable. Thanks!

7.7 years ago

"plink --r2 in-phase" provides both r-squared and directionality; see https://www.cog-genomics.org/plink/1.9/ld#r .
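plink writes its LD report as a whitespace-delimited table with a header row, so the results can be picked up with a few lines of Python. The sketch below keys every row by whatever names appear in the header rather than hard-coding them; the column names used by `strong_pairs` (`SNP_A`, `SNP_B`, `R2`) are assumptions to check against your own output file, as is the name of the extra column that `in-phase` adds.

```python
def read_ld_table(lines):
    """Parse a whitespace-delimited table with a header row.

    Returns a list of dicts keyed by the header names, with every
    value left as a string.
    """
    rows = iter(lines)
    header = next(rows).split()
    records = []
    for line in rows:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        records.append(dict(zip(header, fields)))
    return records


def strong_pairs(records, min_r2, snp_a="SNP_A", snp_b="SNP_B", r2="R2"):
    """Return (snp_a, snp_b, r2) tuples with r2 >= min_r2.

    The default column names are assumptions; pass your own if the
    header of your file differs.
    """
    return [
        (rec[snp_a], rec[snp_b], float(rec[r2]))
        for rec in records
        if float(rec[r2]) >= min_r2
    ]
```

Typical use would be `read_ld_table(open("LD_file.ld"))`, where the file name depends on the `--out` prefix you gave plink.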

7.7 years ago
cmdcolin ★ 4.0k

The Ensembl REST API offers this function: http://rest.ensembl.org/documentation/info/ld_pairwise_get


Note that you can also use 1000GENOMES:phase_3:ALL instead of a specific population, for example: http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?content-type=application/json;population_name=1000GENOMES:phase_3:ALL
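A small sketch of scripting this endpoint, using only the URL layout shown in the example above. The function names are hypothetical, and the shape of the returned JSON is not assumed here; inspect it before relying on particular fields.

```python
import json
from urllib.request import urlopen


def ensembl_ld_url(snp1, snp2, population="1000GENOMES:phase_3:ALL",
                   species="human"):
    """Build a pairwise-LD URL following the pattern used in this thread."""
    return (
        f"http://rest.ensembl.org/ld/{species}/pairwise/{snp1}/{snp2}"
        f"?content-type=application/json;population_name={population}"
    )


def fetch_ld(snp1, snp2, **kwargs):
    """Fetch and decode the JSON response for one SNP pair.

    Makes a live network request; no retry or rate limiting is done.
    """
    with urlopen(ensembl_ld_url(snp1, snp2, **kwargs)) as response:
        return json.load(response)
```

Calling `fetch_ld("rs6792369", "rs1042779")` requests the same resource as the link above.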


Thanks! That is great, but unfortunately it doesn't include the directionality. What I really need to know is what SNP2 is given some value for SNP1 (i.e. SNP2 is T 90% of the time when SNP1 is G).


I see! Perhaps the Ensembl team would be interested in adding that function.

2.4 years ago
Raju • 0

You can do this with the command below. You need to have .bim, .bed and .fam files:

plink --bfile Obesity_Send --allow-no-sex --extract SNPs.txt --r2 --out LD_file