The answer is yes. We will download data from the 1000 Genomes Project, keep the SVs, and then do the LD calculations. In this example I will do this for all SVs within 100 kb of a known gene:
Download a VCF slice of the region with tabix, convert it to PLINK format with VCFtools, then to bed/bim/fam format with PLINK:
tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr22.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 22:38853527-39888783 > genotypes.vcf
vcftools --vcf genotypes.vcf --plink-tped --out plinkformat
plink --tfile plinkformat --make-bed --out tmp
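If you want a quick sanity check that the conversion worked (this is just an optional check, not part of the recipe itself), count the variants in the bim file and glance at their IDs:
wc -l tmp.bim
head tmp.bim | cut -f2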
Now pull out the SVs; here they are labeled with esv in the bim file:
cat tmp.bim | grep esv | cut -f2 > sv.txt
That gives me the following SVs:
cat sv.txt
esv2672287
esv2660594
esv2665859
esv2666691
esv2658587
esv2670964
esv2676044
esv2673637
esv2676660
esv2667824
Extract these SVs from the PLINK file and write a new bed/bim/fam set containing only the SVs:
plink --tfile plinkformat --extract sv.txt --make-bed --out sv
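Optionally, you can confirm that the new file contains only the 10 esv IDs listed in sv.txt:
cut -f2 sv.bim
wc -l sv.bim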
Now run the LD check; the last argument (--ld-window-r2) sets the r2 reporting threshold:
plink --bfile sv --r2 --ld-window-kb 1000 --ld-window 99999 --ld-window-r2 0.5
This gives the following result:
CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2
22 39294101 esv2660594 22 39294104 esv2665859 0.986685
22 39358280 esv2666691 22 39360536 esv2658587 0.892472
22 39358280 esv2666691 22 39374101 esv2670964 0.81834
22 39360536 esv2658587 22 39374101 esv2670964 0.789573
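By default plink writes these results to plink.ld (use --out to change the output prefix). If you want to post-process the table, for example to keep only pairs with r2 >= 0.8, something like this should do it (a minimal sketch against the default output file; the r2 values are in column 7):
awk 'NR == 1 || $7 >= 0.8' plink.ld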
I'm assuming you are talking about small indels (<10bp or so)? You might take a look at this paper (and related ones). http://genome.cshlp.org/content/21/6/830.full
Yes, I do mean indels! Thanks for the paper!
I read the paper, and they talk about their results, but unfortunately there is no description of how they calculated LD... :-(
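In case it helps, here is one way to run the same pipeline on the indels themselves rather than the SVs (just a sketch reusing the steps above, not the method from the paper; I'm assuming VCFtools' --keep-only-indels filter here):
vcftools --vcf genotypes.vcf --keep-only-indels --plink-tped --out indelformat
plink --tfile indelformat --make-bed --out indels
plink --bfile indels --r2 --ld-window-kb 1000 --ld-window 99999 --ld-window-r2 0.5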