How to calculate the LD from a list of SNPs using plink
4.7 years ago
1587620186 • 0

Hi, I have a list of SNPs (with rs IDs) in a .txt file. Because we don't have the .ped and .map files, we want to know whether it is possible to do LD pruning with plink, and if so, how. I am new to plink and know very little about LD pruning, so a detailed answer would be very helpful. Thank you very much for any help. I highly appreciate any suggestions!

snp LD plink • 7.3k views

Please show the format of the data that you currently have.


Do you have any genotype files such as .bed, .bim and .fam (the binary equivalents of .ped and .map)? If not, you can neither calculate LD nor perform LD pruning, unless you are willing to use the 1000 Genomes data as a reference and your SNPs are present in it. If that is the case, you can do it with something like

plink --bfile <1000 genome bed file prefix> --indep-pairwise 200 50 0.2 --out Out

You can find the documentation here
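
As a rough sketch of how this might look with your own rs ID list: the script below first tidies the list, then restricts the reference data to those SNPs while pruning. The file names my_snps.txt and ref are hypothetical placeholders, and the plink step only runs if plink and the reference files are actually present.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names (assumptions, adjust to your files):
#   my_snps.txt : your rs ID list, one ID per line
#   ref         : prefix of the reference bed/bim/fam file set
snps="${SNPS:-my_snps.txt}"
ref="${REF:-ref}"

# Tidy the list: drop Windows line endings and blank lines,
# keep only the first column, and remove repeated IDs.
clean_snplist() {
  tr -d '\r' < "$1" | awk 'NF { print $1 }' | sort -u
}

if [ -f "$snps" ] && [ -f "$ref.bed" ] && command -v plink >/dev/null 2>&1; then
  clean_snplist "$snps" > snplist.txt
  # Window of 200 variants, step of 50 variants, r^2 threshold 0.2,
  # the same settings as in the command above.
  plink --bfile "$ref" --extract snplist.txt --indep-pairwise 200 50 0.2 --out Out
  # Out.prune.in lists the variants that were kept,
  # Out.prune.out lists the variants removed because of LD.
  wc -l Out.prune.in Out.prune.out
fi
```

SNPs from your list that are missing from the reference data are silently skipped by --extract, so it is worth comparing the line counts of snplist.txt and the two prune files.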

4.7 years ago
JFF • 0

No, you can't do LD pruning without reference genotype data. LD is estimated from the genotypes of individuals, so plink needs a genotype file for almost every analysis; a list of rs IDs alone is not enough. First decide which population your study is about, such as European or Asian. Then download the 1000 Genomes reference data from the 1000 Genomes website, which covers five super-populations (AFR, AMR, EAS, EUR, SAS).
The downloaded files will probably be vcf.gz, so you have to convert them to plink binary files (bed, bim, fam) with "plink --vcf xxx.vcf.gz --make-bed --out xxxx" (xxxx is the output file prefix). Next you can use "plink --bfile xxxx --extract your_snplist --indep-pairwise 50 5 0.5" to do the pruning. Tip: in my experience, the 1000 Genomes data contain duplicate SNPs and multi-allelic sites, which cause errors in the LD calculation with plink 1.07 or plink 1.9. There are two ways around this: (1) use plink2, which handled this for me, or (2) remove all duplicate and multi-allelic variants from the reference data.
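
For option (2), a minimal sketch with plink 1.9 could look like the following. It assumes the --biallelic-only strict modifier of --vcf is available in your build (please check the documentation for your version), and the names xxx.vcf.gz and ref are placeholders. The plink steps only run if plink and the VCF are present.

```shell
#!/usr/bin/env bash
set -euo pipefail

vcf="${VCF:-xxx.vcf.gz}"   # placeholder VCF name, as in the answer
out="${OUT:-ref}"          # hypothetical output prefix

# Print variant IDs (column 2 of a .bim file) that occur more than once.
dup_ids() {
  awk '{ print $2 }' "$1" | sort | uniq -d
}

if [ -f "$vcf" ] && command -v plink >/dev/null 2>&1; then
  # Import the VCF and skip sites with more than one ALT allele.
  plink --vcf "$vcf" --biallelic-only strict --make-bed --out "$out"
  # Remove every variant whose ID is duplicated, then write a clean file set.
  dup_ids "$out.bim" > dup_ids.txt
  plink --bfile "$out" --exclude dup_ids.txt --make-bed --out "${out}_nodup"
fi
```

Note that --exclude removes all copies of a duplicated ID, including variants whose ID is just ".", which is usually acceptable when you only care about SNPs with rs IDs.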

There are three tutorials for plink 1.07/1.9/2:
http://zzz.bwh.harvard.edu/plink/index.shtml
https://www.cog-genomics.org/plink2
https://www.cog-genomics.org/plink/2.0/

I recommend reading the first one (plink 1.07) first, because it is friendlier to beginners and the main usage is the same in all three versions. I hope this helps.


Dear JFF, thank you for your kind help. I have downloaded the vcf.gz files from 1000 Genomes. Because the population I am focusing on is European, I used vcftools to extract the EUR data following the tutorial (How To Extract A Specific Population Vcf File From 1000G Genotypes Vcf File). However, this method is a little slow. I would like to know whether I can extract the EUR data using plink, and how to do it. Thank you!!


You can use "plink --vcf xxx.vcf.gz --make-bed --out xxxx" (xxxx is the output file prefix); I already wrote this in my answer. Another piece of advice: you can use "--extract snplist --make-bed --out xxxx" to create a genotype file that contains only the SNPs you are interested in. You can then read this genotype file to do the pruning, which is convenient if you use it frequently. Because I use the data so often, I use --extract to split the whole genotype data into 23 per-chromosome files. (After all, we usually calculate LD within one chromosome.) By the way, the plink 1.07 tutorial is really useful; I learned almost all of the usage from there.
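
On restricting the data to EUR samples with plink itself: one possible route is the --keep flag, which takes a two-column list of family and individual IDs. The sketch below assumes a tab-separated sample panel file with a header and the columns sample, pop, super_pop, gender (the name samples.panel is a placeholder), and it assumes plink set both IDs to the sample ID when the VCF was imported, which is worth checking in the .fam file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names (assumptions, adjust to your files):
#   samples.panel : tab-separated panel with a header line and the
#                   columns sample, pop, super_pop, gender
#   ref           : prefix of the bed/bim/fam set made from the VCF
panel="${PANEL:-samples.panel}"
ref="${REF:-ref}"

# Print "FID IID" for every EUR sample, assuming FID = IID = sample ID.
make_keep_list() {
  awk -F'\t' 'NR > 1 && $3 == "EUR" { print $1, $1 }' "$1"
}

if [ -f "$panel" ]; then
  make_keep_list "$panel" > eur.keep
fi

# Subset the genotype data to the listed samples (needs plink on PATH).
if [ -f eur.keep ] && [ -f "$ref.bed" ] && command -v plink >/dev/null 2>&1; then
  plink --bfile "$ref" --keep eur.keep --make-bed --out "${ref}_EUR"
fi
```

Because the subsetting happens on the binary file set rather than on the VCF, it only has to be done once and the resulting ref_EUR files can be reused for every later pruning run.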
