I'm using 1000 Genomes VCFs, and I'm trying to thin out SNPs in moderate linkage disequilibrium (r2) using vcftools. In plink, I would do this with the --indep-pairwise parameter and then exclude the SNPs it outputs:
plink --bfile DATA --indep-pairwise 50 10 0.8 --out OUTPUT --noweb
plink --bfile DATA --exclude OUTPUT.prune.out --noweb --make-bed --out DATA_FILTERED
Does anyone know if there is an equivalent one- or two-step solution for this using vcftools? I would like to avoid converting to plink format and back to VCF, if possible.
Thanks!
Hi Kevin, Thanks for this response, it is helpful. However, from what I understand, this command will just output a file containing the r2, D, and D’ statistics. Is there a way to actually filter based on r2 after we have this file?
Hi everyone! I've got the same question and am wondering how you can actually prune for LD using VCFtools (not just identify the SNPs that are in LD). I'm wondering if you could use --hap-r2-positions <positions list file> to create a list of positions that are out of LD, and then use --exclude-positions to prune out the SNPs that are in or out of LD. I'm going to give this a go, but if there are any other suggestions, that would be greatly appreciated!
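For what it's worth, here is a minimal Python sketch of the "build a list, then exclude it" idea. It assumes the LD table came from vcftools --geno-r2 (or --hap-r2) and is tab-separated with a header row and the columns CHR, POS1, POS2, a count column, and R^2, in that order. The function names (greedy_prune, format_positions, prune_file) and any file names are made up for illustration. The greedy rule (keep the first SNP of a high-r2 pair, drop the second) is a simplification and will not reproduce plink's --indep-pairwise output exactly.

```python
import csv
import math


def greedy_prune(ld_lines, r2_threshold=0.8):
    """Return the (chrom, pos) sites to drop, in the order they were chosen.

    ld_lines is any iterable of tab-separated lines (an open file works),
    assumed to start with a header row and to hold CHR, POS1, POS2, a count
    column and R^2, in that order.
    """
    dropped = set()
    order = []
    reader = csv.reader(ld_lines, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 5:
            continue  # blank or malformed line
        chrom, pos1, pos2, r2_text = row[0], row[1], row[2], row[4]
        try:
            r2 = float(r2_text)
        except ValueError:
            continue  # unparseable r2 value
        if math.isnan(r2) or r2 <= r2_threshold:
            continue
        first, second = (chrom, pos1), (chrom, pos2)
        # If either SNP is already gone, this pair is already broken up.
        if first in dropped or second in dropped:
            continue
        dropped.add(second)
        order.append(second)
    return order


def format_positions(sites):
    """Format sites as tab-separated 'chromosome<TAB>position' lines."""
    return "".join(f"{chrom}\t{pos}\n" for chrom, pos in sites)


def prune_file(ld_path, out_path, r2_threshold=0.8):
    """Read an LD table from ld_path and write the sites to drop to out_path."""
    with open(ld_path, newline="") as handle:
        sites = greedy_prune(handle, r2_threshold)
    with open(out_path, "w") as out:
        out.write(format_positions(sites))
```

One way to use it, with placeholder file names: first write the LD table with something like vcftools --gzvcf DATA.vcf.gz --geno-r2 --ld-window 50 --min-r2 0.8 --out OUTPUT, then call prune_file("OUTPUT.geno.ld", "OUTPUT.prune.out"), and finally run vcftools --gzvcf DATA.vcf.gz --exclude-positions OUTPUT.prune.out --recode --recode-INFO-all --out DATA_FILTERED. Only pairs that appear in the LD table are considered, so the window you give vcftools decides which pairs can be pruned.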
VCFtools has long been superseded by BCFtools. Please use that. If you have other questions, you may open your own question.
Kia ora (thank you) Kevin! I just saw your other post here. It was very helpful!
VCFtools version for LD calculations specifying bin size
Kia ora bro / dudette!