Hi,
I have a data file containing autosomal SNPs imputed from the 1000 genomes data. The SNPs in my file are named as chr:pos, but I want them to be named by rs number. I downloaded the 1000 genomes phase 1 data from the PLINK resources site, excluded the sex chromosomes, and organized the file so that it has 2 columns: chr:pos (column 1) and the corresponding rs number (column 2). I then tried to use the PLINK --update-name command to update the SNP names in my file:
./plink --bfile my_data --update-name 1000_genomes_chrpos_rs.txt --make-bed --out my_data2
I got back the following error message:
Error: Duplicate variant ID '1:2351395' in --update-name file
In the 1000 genomes file, this (and likely other) chr:pos corresponds to multiple rs numbers. Is there a way to rectify this or modify the PLINK command so that I can change the naming of the SNPs in my data file from chr:pos to rs number?
Thanks so much!
Hi,
I have exactly the same problem now. Have you figured out how to remove the duplicate variant IDs?
Thank you.
I am wondering too: how did you solve this problem?
Hi,
You can use a Unix/Linux command to rename (or remove) duplicated or triplicated entries in your file. The example here assumes that you want to make column 2 unique (test it on a small file first). It appends _0, _1, _2, etc. to each value, counting how many times that value has already appeared. For example, if your file has 2 columns:

    18 15
    44 16
    55 15
    77 15

it will be turned into:

    18 15_0
    44 16_0
    55 15_1
    77 15_2

(note that only column 2 changes). The next step in the pipe, sed 's/_0//', removes the _0 suffix and keeps _1, _2, etc.:

    18 15
    44 16
    55 15_1
    77 15_2

so the second column ends up with unique values.
The command is (I'm assuming you have 6 columns; if the number is different, remove or add $3, $4, etc.):
awk '{print $1, $2"_"x[$2]++, $3, $4, $5, $6}' update_file_name | sed 's/_0//' > result_file_name
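For anyone who wants to try this on the four-row example first, here is a minimal sketch that does the same thing in a single awk call, without the sed step. The file names sample.txt and sample_unique.txt are made up for illustration, and it assumes whitespace-separated columns.

```shell
# Build the four-row sample file (hypothetical file name).
printf '18 15\n44 16\n55 15\n77 15\n' > sample.txt

# n is how many times this column-2 value has been seen before.
# The first occurrence (n == 0) is left alone; repeats get _1, _2, ...
awk '{ n = seen[$2]++; if (n > 0) $2 = $2 "_" n; print }' sample.txt > sample_unique.txt

cat sample_unique.txt
```

Because the suffix is only added to repeats, there is no _0 to strip afterwards, and the script works for any number of columns.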
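If you would rather drop the lines that still carry a suffix (15_1, 15_2, etc.), you can extend the pipe with grep -v '_'. Note that this removes every line that contains an underscore anywhere, so check that your other columns do not use underscores. To make column 1 unique instead of column 2, change the start of the print statement from $1, $2"_"x[$2]++ to $1"_"x[$1]++, $2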
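For the two-column chr:pos / rs number file from the original question, where the duplicates are in column 1, a shorter option might be to keep only the first line for each chr:pos. This is just a sketch: the file names and the rs labels (rsA, rsB, rsC) are placeholders, not real IDs, and which rs number survives simply depends on the order of the lines in the file.

```shell
# Hypothetical update file: 1:2351395 appears with two different rs labels.
printf '1:2351395 rsA\n1:2351395 rsB\n1:3000000 rsC\n' > chrpos_rs.txt

# Keep a line only the first time its column-1 value is seen.
awk '!seen[$1]++' chrpos_rs.txt > chrpos_rs_dedup.txt

# List any column-1 values that are still duplicated (should print nothing).
cut -d' ' -f1 chrpos_rs_dedup.txt | sort | uniq -d
```

The deduplicated file could then be passed to --update-name in place of the original one. If it matters which rs number is kept for a position, sort or filter the file before this step.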
I hope this helps. Thanks