In another question, the OP mentioned that he was able to find some preliminary LD data for the 1000 Genomes project. I was unable to find the file mentioned there, but I got the impression that LD data for the project already exists. Where can I find it?
The 1000G files from the website are best approached with a tool that keeps the phasing information intact, which is one of the benefits of 1000G data. For that, haploxt is recommended: http://genome.sph.umich.edu/wiki/Haploxt
It can be used on the files downloaded from here.
If you want to do some PLINK calculation, I can try to help by offering a script for extraction of such data from the phased file. Working on that at the moment.
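As background on why keeping the phase matters, here are the usual textbook definitions of the pairwise LD statistics for two biallelic sites. The symbols are my own notation, not taken from haploxt or PLINK: $p_A$ and $p_B$ are the frequencies of one allele at each site, and $p_{AB}$ is the frequency of the haplotype carrying both.

```latex
D = p_{AB} - p_A\,p_B, \qquad
r^2 = \frac{D^2}{p_A (1 - p_A)\, p_B (1 - p_B)}, \qquad
|D'| = \frac{|D|}{D_{\max}},
\quad
D_{\max} =
\begin{cases}
\min\bigl(p_A (1 - p_B),\; (1 - p_A)\, p_B\bigr) & \text{if } D > 0,\\[2pt]
\min\bigl(p_A\, p_B,\; (1 - p_A)(1 - p_B)\bigr) & \text{if } D < 0.
\end{cases}
```

With phased haplotypes, $p_{AB}$ can be counted directly. With unphased genotypes it has to be estimated, because double heterozygotes are ambiguous.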
Written 14.1 years ago by Ryan D; updated 5.2 years ago by Ram.
It has been almost 3 years since this question was asked. Did you manage to find LD data?
As far as I know, no such information is currently available to the public, at least not on the official FTP site (you can always check the whole up-to-date site tree here).
Some data exists on this LD. I ended up pulling data from files with code like this:

    zcat ~/1000Genomes/2010-06/CEU/LD/xt/chr19.xt.gz | grep -w rs11671664 | awk '$4 > 0.5'

    rs11670375      rs11671664      0.8961  0.5673  A,G
    chr19:50848886  rs11671664      0.8990  0.7340  C,G
    rs11083777      rs11671664      0.8990  0.7340  G,G
    chr19:50851826  rs11671664      0.8899  0.7134  C,G
    chr19:50852809  rs11671664      0.8990  0.7340  A,G
    chr19:50853145  rs11671664      0.8990  0.7340  G,G
    rs4375772       rs11671664      0.8899  0.7134  C,G
    rs11671664      chr19:50865055  1.0000  0.6138  G,G
Unfortunately, I haven't been able to find that data on the 1000 Genomes FTP server. It would be very interesting to know exactly where this data is available.
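For anyone who does have such pairwise LD files locally, the zcat/grep/awk one-liner quoted above can be wrapped in a small function. This is only a sketch: `ld_partners` is a hypothetical name, and the column layout (two marker IDs, two numeric columns, then alleles) is assumed from the output shown in the thread, not from any format documentation. It matches the SNP ID exactly in column 1 or 2 instead of using `grep -w`, and keeps the same filter on the fourth column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the LD partners of one SNP from a gzipped
# pairwise-LD table laid out like the output quoted in the thread.
# Assumed columns: marker1 marker2 value3 value4 alleles
#
# Arguments:
#   $1  path to the gzipped table
#   $2  SNP identifier to look up
#   $3  minimum value for column 4 (defaults to 0.5, as in the thread)
ld_partners() {
    local xt_file="$1" snp="$2" min_val="${3:-0.5}"
    # gzip -dc behaves like zcat but is more portable across systems.
    gzip -dc "$xt_file" |
        awk -v snp="$snp" -v min="$min_val" \
            '($1 == snp || $2 == snp) && ($4 + 0) > (min + 0)'
}
```

Usage, with the path and SNP from the thread: `ld_partners ~/1000Genomes/2010-06/CEU/LD/xt/chr19.xt.gz rs11671664 0.5`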
The 1000G files from the website are best approached with a tool that keeps the phasing information intact, which is one of the benefits of 1000G data. For that, haploxt is recommended: genome.sph.umich.edu/wiki/Haploxt It can be used on the files downloaded from here: sph.umich.edu/csg/abecasis/MACH/download/… If you want to do some PLINK calculation, I can try to help by offering a script for extraction of such data from the phased file. Working on that at the moment.
It looks like SNAP has been updated with the 1000 Genomes pilot 1 data. So if you don't mind being a bit out-of-date, you can use that for quick calculations: http://www.broadinstitute.org/mpg/snap/ldsearch.php
Under SNP data set choose "1000 Genomes pilot 1".
Now we're using a Ruby script that pulls the region of interest from the 1000G data and calculates LD with PLINK. Sub-optimal, but better than pilot 1 data.
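The Ruby script itself isn't posted here, so the following is not that script, just a rough shell sketch of the same two steps. It assumes `tabix` and PLINK 1.9 are on the PATH and that the input is a bgzipped, tabix-indexed VCF. The function name `region_ld`, the file names, and the window settings are placeholders of mine. Note that, as far as I know, PLINK 1.9 discards phase when importing a VCF, so the r² values are genotype-based estimates.

```shell
#!/usr/bin/env bash
# Sketch: slice one region out of an indexed VCF, then compute pairwise r^2
# with PLINK 1.9. Set DRY_RUN=1 to print the commands without running them.
#
# Arguments:
#   $1  bgzipped, tabix-indexed VCF (local path or URL)
#   $2  region as chrom:start-end, using the VCF's own chromosome naming
#   $3  output prefix (placeholder; results are written next to it)
region_ld() {
    local vcf="$1" region="$2" out="$3"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "tabix -h $vcf $region > $out.vcf"
        echo "plink --vcf $out.vcf --r2 --ld-window-kb 1000 --ld-window 99999 --ld-window-r2 0 --out $out"
        return 0
    fi
    # -h keeps the VCF header so the slice is still a valid VCF.
    tabix -h "$vcf" "$region" > "$out.vcf"
    # Report r^2 for all SNP pairs within 1000 kb, with no r^2 cutoff.
    plink --vcf "$out.vcf" --r2 \
        --ld-window-kb 1000 --ld-window 99999 --ld-window-r2 0 \
        --out "$out"
}
```

A hypothetical call would be `region_ld chr19.vcf.gz 19:50800000-50900000 chr19_slice`, after which PLINK writes the pairwise table to `chr19_slice.ld`. The region coordinates here are made up for illustration.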
I calculated it myself from the phase 1 VCF, using this: A: 1000 genomes LD calculation