Removing duplicated-by-position SNPs with PLINK fails on 1000 Genomes phase 3 data
6.4 years ago
Opal • 0

Hello,

I downloaded the chr22 genotype data from 1000 Genomes phase 3 (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) and want to use PLINK to extract the SNPs in high LD with the SNPs in my list (test.txt). Suppose test.txt contains only the following line:

rs3761445

When I ran

plink --vcf chr22.phase3.vcf.gz --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8

the error message said

Error: Duplicate ID 'rs10656307'.

Following suggestions from earlier questions and answers on this forum, I first tried to remove the duplicates:

plink --vcf chr22.phase3.vcf.gz --list-duplicate-vars ids-only suppress-first

plink --vcf chr22.phase3.vcf.gz --exclude plink.dupvar --make-bed --out chr22.phase3

plink --bfile chr22.phase3 --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8

Then I manually added 'rs10656307' to the plink.dupvar file, saved the result as plink.dupvar2, and ran again:

plink --vcf chr22.phase3.vcf.gz --exclude plink.dupvar2 --make-bed --out chr22.phase3v2

plink --bfile chr22.phase3v2 --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8

This time the error message named a different SNP:

Error: Duplicate ID 'rs111334030'.

Is this a problem with the 1000 Genomes phase 3 data, or am I doing something wrong?
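For context, --list-duplicate-vars reports variants that share a position and allele codes, so records that reuse an rsID with different alleles would not appear in plink.dupvar. That may be why a new duplicate ID turns up on each run. A minimal sketch of one way to check this is to list every ID that occurs more than once in the VCF itself. The function name list_duplicate_ids and the output names dup_ids.txt and chr22.phase3.nodup are made up for illustration and are not part of PLINK.

```shell
# Sketch only: print each variant ID (VCF column 3) that occurs more than once.
# Accepts plain or gzip-compressed input; missing IDs (".") are ignored.
list_duplicate_ids() {
    gzip -cdf "$1" | grep -v '^#' | cut -f3 | grep -vx '\.' | sort | uniq -d
}

# Hypothetical usage with the file names from this post:
#   list_duplicate_ids chr22.phase3.vcf.gz > dup_ids.txt
#   plink --vcf chr22.phase3.vcf.gz --exclude dup_ids.txt --make-bed --out chr22.phase3.nodup
```

Note that --exclude drops every variant carrying a listed ID, so all copies of a duplicated ID are removed, not just the extra ones.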

Opal

PLINK 1000 Genome phase 3 duplicated SNP id • 2.8k views