Question

Full-stop/period as SNP identifier in BIM file causes error in Plink v1.9 --read-freq command

7.6 years ago

olavur ▴ 150

In the example shown below, you see that certain variants have a full-stop/period instead of an identifier.

1       rs1495237       0       4372049 T       C
1       .       0       4372921 0       T
1       .       0       4372921 0       T
1       rs1353341       0       4372992 A       G
1       rs12080695      0       4375410 A       G

When I try to run a Plink command with --read-freq, I get an error. For example:

plink --bfile output_data/results_pruned --read-freq output_data/results_freq.frq --make-bed --out temp

Gives me the output:

PLINK v1.90b3.36 64-bit (31 Mar 2016)      https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to temp.log.
Options in effect:
  --bfile output_data/results_pruned
  --make-bed
  --out temp
  --read-freq output_data/results_freq.frq

64386 MB RAM detected; reserving 32193 MB for main workspace.
Allocated 13581 MB successfully, after larger attempt(s) failed.
151955 variants loaded from .bim file.
48 people (26 males, 22 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 32 founders and 16 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.999511.
Error: Duplicate ID '.'.

I have only encountered this problem when using --read-freq.

Obviously, one thing I can do is remove all variants with these missing identifiers. Does anyone have another solution, or perhaps just an explanation?

plink snp • 3.0k views

ADD COMMENT • link updated 7.6 years ago by sbk ▴ 60 • written 7.6 years ago by olavur ▴ 150

Check whether it runs when "." is replaced with NA. NA is a more standard marker for missing values than ".".

ADD REPLY • link 7.6 years ago by Santosh Anand 5.8k

score 0 · Answer 1 · 2017-05-08

Hi @Olavur,

You can probably replace the ID column with "chr:pos:ref:alt", for example 1:4372049:T:C. You can also do this for every row if you don't need the rsID column for further analysis. This can be achieved with the following awk script:

awk 'BEGIN{FS=OFS="\t"}{$2=$1":"$4":"$5":"$6;print}' filename.bim

If you would like to keep the rsID wherever one is available, add an if condition so that only rows whose ID does not start with "rs" are rewritten:

awk 'BEGIN{FS=OFS="\t"}{if($2!~/^rs/){$2=$1":"$4":"$5":"$6};print}' filename.bim
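One caveat: in the BIM excerpt from the question, the two "." rows share the same chromosome, position, and alleles, so a plain chr:pos:ref:alt ID would still be duplicated for them. A minimal sketch of one way around that is below. It assumes a counter suffix such as "_2" is acceptable in your IDs and that no existing ID already has that form. `fix_bim_ids` is a hypothetical helper name, not a PLINK command.

```shell
# Sketch: give every "." variant a chr:pos:a1:a2 ID and keep existing IDs as they are.
# fix_bim_ids is a hypothetical helper name used only for this example.
fix_bim_ids() {
    # Default field splitting accepts both tab- and space-delimited BIM files.
    awk 'BEGIN { OFS = "\t" }
    {
        if ($2 == ".") {
            id = $1 ":" $4 ":" $5 ":" $6
            n = ++seen[id]
            # Second and later copies of the same ID get a counter suffix.
            $2 = (n > 1) ? (id "_" n) : id
        } else {
            $1 = $1   # rebuild the record so the output is tab-separated throughout
        }
        print
    }' "$1"
}
```

Usage would look like `fix_bim_ids filename.bim > filename.fixed.bim`. As far as I know, --read-freq matches variants by ID, so the .frq file should be regenerated with --freq from the renamed fileset rather than reused with the old "." IDs.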