I have a list of SNPs, and I would like to generate an LD matrix for it.
Below is an example; I obtained the r² values from SNAP [http://archive.broadinstitute.org/mpg/snap/ldsearch.php]. However, this is very time-consuming because I have many lists of SNPs.
I have also searched Ensembl, but it seems I can only look up the LD between two SNPs at a time. [Example: http://www.ensembl.org/Homo_sapiens/Variation/HighLD?db=core;r=7:127081784-127082784;v=rs2283094;vdb=variation;vf=1684883;second_variant_name=rs2283095]
I have also looked at the LDheatmap package in R, but it seems to calculate LD from a study's own SNP data, i.e. one needs to provide the genotype of each participant. The snpStats package in Bioconductor seems to work the same way, so neither method fits my situation.
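To make the genotype-based calculation concrete, here is a minimal sketch of one common estimator: r² as the squared Pearson correlation between two SNPs' allele-count vectors (0, 1 or 2 copies of the alternate allele per individual). This is an assumption about the estimator; haplotype-based r² (from phased haplotypes or EM-estimated haplotype frequencies) can give somewhat different numbers, so values from this sketch need not match SNAP exactly. The genotype vectors are hypothetical toy data, not CEU samples.

```python
import math

def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs' allele-count vectors.

    g1, g2: per-individual alternate-allele counts (0, 1 or 2) listed in the
    same individual order. Individuals missing either genotype (None) are skipped.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two individuals genotyped at both SNPs")
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs)
    var1 = sum((a - mean1) ** 2 for a, _ in pairs)
    var2 = sum((b - mean2) ** 2 for _, b in pairs)
    if var1 == 0 or var2 == 0:
        return math.nan  # a SNP is monomorphic in this sample: r^2 is undefined
    return cov * cov / (var1 * var2)

# Hypothetical genotypes for six individuals (toy data only)
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 2, 1, 0, 2]
snp_c = [2, 1, 0, 1, 2, 0]  # mirror image of snp_a
snp_d = [0, 0, 1, 2, 1, 2]
print(genotype_r2(snp_a, snp_c))  # 1.0 (perfect negative correlation still gives r^2 = 1)
print(genotype_r2(snp_a, snp_d))  # 0.25
```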
Is there a way to extract the LD matrix (r²) for the CEU population from the HapMap release 22 or 1000 Genomes data, given that I have many lists of SNPs?
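For many SNP lists, one approach is to compute pairwise r² once (for example with PLINK on the 1000 Genomes genotypes) and then slice a square matrix out for each list. A minimal sketch, assuming a whitespace-separated table whose header includes columns named SNP_A, SNP_B and R2 (which, as far as I know, is what PLINK's `--r2` table output uses; adjust the names if your file differs). As far as I recall, PLINK's defaults only report pairs within a limited window and above r² 0.2 (see `--ld-window`, `--ld-window-kb`, `--ld-window-r2`), so pairs absent from the table come back as `None` here. The list names are hypothetical; the r² values are the ones from the example in this question.

```python
def parse_ld_table(lines):
    """Map each unordered SNP pair to its r^2.

    Assumes a header row containing the column names SNP_A, SNP_B and R2;
    other columns (CHR_A, BP_A, ...) are ignored.
    """
    header = lines[0].split()
    ia, ib, ir = header.index("SNP_A"), header.index("SNP_B"), header.index("R2")
    pairs = {}
    for line in lines[1:]:
        fields = line.split()
        if fields:  # skip blank lines
            pairs[frozenset((fields[ia], fields[ib]))] = float(fields[ir])
    return pairs

def ld_matrix(snps, pairs):
    """Square r^2 matrix for one SNP list; pairs missing from the table give None."""
    return [[1.0 if a == b else pairs.get(frozenset((a, b))) for b in snps]
            for a in snps]

# Toy table built from the example values in this question
table = [
    "SNP_A SNP_B R2",
    "rs2283094 rs2283095 0.143",
    "rs2283094 rs6467111 0.042",
    "rs2283095 rs6467111 0.297",
]
pairs = parse_ld_table(table)
snp_lists = {  # hypothetical list names
    "list1": ["rs2283094", "rs2283095", "rs6467111"],
    "list2": ["rs6467111", "rs2283094"],
}
for name, snps in snp_lists.items():
    print(name, ld_matrix(snps, pairs))
```

Because the pair keys are unordered `frozenset`s, the order of SNP_A and SNP_B in the table does not matter, and each list can be in any order.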
Example:

SNPs:
rs2283094
rs2283095
rs6467111

LD matrix (r²):
           rs2283094  rs2283095  rs6467111
rs2283094  1          0.143      0.042
rs2283095  0.143      1          0.297
rs6467111  0.042      0.297      1
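To write a matrix in the layout of the example, a small formatter is enough. This sketch assumes the matrix is a list of rows in the same order as the SNP IDs; it emits tab-separated text (a choice made here so that spreadsheets and most table readers can load it), rounds to three decimals, and writes missing values as NA.

```python
def format_ld_matrix(snps, matrix, digits=3):
    """Render a square r^2 matrix as tab-separated text.

    The first line holds the column labels (preceded by an empty corner cell);
    each following line starts with the row's SNP ID. None becomes "NA".
    """
    lines = ["\t".join([""] + list(snps))]
    for snp, row in zip(snps, matrix):
        cells = ["NA" if v is None else f"{round(v, digits):g}" for v in row]
        lines.append("\t".join([snp] + cells))
    return "\n".join(lines)

snps = ["rs2283094", "rs2283095", "rs6467111"]
matrix = [
    [1.0, 0.143, 0.042],
    [0.143, 1.0, 0.297],
    [0.042, 0.297, 1.0],
]
print(format_ld_matrix(snps, matrix))
```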
I downloaded the 1000 Genomes phase 3 VCF files from this FTP folder [ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/]. Does the file named " " cover chr1 to chr22? I suspect "wgs" means whole genome, but its file size seems small compared with the separate files for chr1 to chr22.
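One way to check what a VCF actually contains is to read its #CHROM header line: a VCF with genotypes has a FORMAT column followed by one column per sample after the eight fixed columns, while a sites-only VCF ends at INFO. A small whole-genome file could plausibly be sites-only (a guess, not verified here), and a sites-only file has no genotypes to compute LD from. The sketch below counts sample columns and, optionally, lists the chromosomes present; Python's `gzip` module can read the bgzip-compressed files because bgzip output is valid multi-member gzip. The chromosome scan reads the whole file, so it is slow on large VCFs.

```python
import gzip

def open_text(path):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")

def vcf_sample_count(path):
    """Number of sample columns in the VCF header (0 means no genotypes)."""
    with open_text(path) as fh:
        for line in fh:
            if line.startswith("#CHROM"):
                cols = line.rstrip("\n").split("\t")
                # 8 fixed columns + FORMAT come before the sample columns
                return max(len(cols) - 9, 0)
    raise ValueError("no #CHROM header line found")

def vcf_chromosomes(path):
    """Distinct CHROM values in file order (scans every data line)."""
    seen = {}
    with open_text(path) as fh:
        for line in fh:
            if not line.startswith("#"):
                seen.setdefault(line.split("\t", 1)[0], None)
    return list(seen)
```

For example, `vcf_sample_count("ALL.chr22....vcf.gz")` (substitute the real file name) should be well above zero for a genotype file, while a result of 0 would indicate a sites-only file.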
I also tried PLINK on my PC. I put the plink executable in the same folder as the VCF file and then ran this command:
plink --vcf ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf --make-bed --out binary_fileset
It returns the error: Failed to open ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf
Any idea what is wrong?
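PLINK's "Failed to open" means it could not open a file with exactly that name, resolved relative to the current working directory. Two plausible causes (assumptions, not confirmed): the command is run from a different folder than the one holding the VCF, or the downloaded file still carries the .vcf.gz extension under which the 1000 Genomes release files are distributed. This sketch checks both; `find_vcf` and `list_matching` are hypothetical helper names.

```python
import os

def find_vcf(name, folder="."):
    """Return the path that exists for `name` or `name + '.gz'`, else None."""
    for candidate in (name, name + ".gz"):
        path = os.path.join(folder, candidate)
        if os.path.isfile(path):
            return path
    return None

def list_matching(folder, substring):
    """Files in `folder` whose names contain `substring`, to spot near-miss names."""
    return sorted(f for f in os.listdir(folder) if substring in f)

if __name__ == "__main__":
    target = "ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf"
    print("Working directory:", os.getcwd())
    print("Path to pass to plink:", find_vcf(target))
    print("Files mentioning chr22:", list_matching(".", "chr22"))
```

PLINK 1.9 accepts gzipped VCF input directly, so if the .gz file is the one found, the original command with the .gz name should get past this error: plink --vcf ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz --make-bed --out binary_fileset. From the resulting binary fileset, PLINK's `--extract` (a file of variant IDs, one per line) combined with `--r2 square` should produce a square r² matrix for one SNP list; check the PLINK documentation for the exact output options.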