Hello,
I wanted to download the chr22 genotype data from 1000 Genomes phase 3 (from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) and use PLINK to extract the SNPs in high LD with the SNPs in my list (test.txt). Let's say my test.txt file contains only the following line:
rs3761445
When I ran
plink --vcf chr22.phase3.vcf.gz --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8
the error message said
Error: Duplicate ID 'rs10656307'.
Following suggestions from previous questions and answers posted on this forum, I tried removing the duplicates first:
plink --vcf chr22.phase3.vcf.gz --list-duplicate-vars ids-only suppress-first
plink --vcf chr22.phase3.vcf.gz --exclude plink.dupvar --make-bed --out chr22.phase3
plink --bfile chr22.phase3 --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8
The same error came back, so I manually added 'rs10656307' to the plink.dupvar file, saved it as plink.dupvar2, and ran again:
plink --vcf chr22.phase3.vcf.gz --exclude plink.dupvar2 --make-bed --out chr22.phase3v2
plink --bfile chr22.phase3v2 --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8
This time the error message named a different SNP:
Error: Duplicate ID 'rs111334030'.
I wonder whether this is a problem with the 1000 Genomes phase 3 data or whether I am doing something wrong.
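In case it helps with diagnosing this: as far as I understand the PLINK 1.9 documentation, --list-duplicate-vars groups variants by position and allele codes rather than by ID, so variants that share an rsID but differ in alleles may not end up in plink.dupvar. Below is a rough sketch of how the repeated IDs could be listed straight from the VCF instead. The function name list_dup_ids and the output names dup_ids.txt and chr22.phase3.nodup are made up for illustration, and I have not confirmed that this is the recommended approach.

```shell
# Hypothetical helper: print each value of VCF column 3 (ID) that occurs
# more than once in a gzipped VCF, one ID per line, in file order.
# Header lines (starting with '#') are skipped. A repeated missing ID (".")
# would be reported as well.
list_dup_ids() {
    gzip -dc "$1" | awk -F'\t' '!/^#/ { if (++seen[$3] == 2) print $3 }'
}

# Possible usage with the file names from the commands above
# (assumption: --exclude drops every variant carrying a listed ID):
#   list_dup_ids chr22.phase3.vcf.gz > dup_ids.txt
#   plink --vcf chr22.phase3.vcf.gz --exclude dup_ids.txt --make-bed --out chr22.phase3.nodup
#   plink --bfile chr22.phase3.nodup --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8
```

Note that this would drop all copies of each repeated ID rather than keeping one, which may or may not be what is wanted.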
Opal