In the example below, you can see that certain variants have a full stop/period instead of an identifier.
1 rs1495237 0 4372049 T C
1 . 0 4372921 0 T
1 . 0 4372921 0 T
1 rs1353341 0 4372992 A G
1 rs12080695 0 4375410 A G
When I try to run a PLINK command with --read-freq, I get an error. For example:
plink --bfile output_data/results_pruned --read-freq output_data/results_freq.frq --make-bed --out temp
This gives the following output:
PLINK v1.90b3.36 64-bit (31 Mar 2016) https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to temp.log.
Options in effect:
--bfile output_data/results_pruned
--make-bed
--out temp
--read-freq output_data/results_freq.frq
64386 MB RAM detected; reserving 32193 MB for main workspace.
Allocated 13581 MB successfully, after larger attempt(s) failed.
151955 variants loaded from .bim file.
48 people (26 males, 22 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 32 founders and 16 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.999511.
Error: Duplicate ID '.'.
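A quick way to confirm which IDs trigger this error is to count how often each ID appears in the .bim file. The sketch below is a minimal illustration, assuming the standard six whitespace-separated .bim columns (chromosome, variant ID, cM, base-pair position, allele 1, allele 2); the function name find_duplicate_ids is hypothetical and not part of PLINK.

```python
from collections import Counter


def find_duplicate_ids(bim_lines):
    """Return {variant_id: count} for every ID that appears more than once.

    Assumes six-column .bim rows; the variant ID is the second field.
    Blank lines are skipped.
    """
    counts = Counter(line.split()[1] for line in bim_lines if line.strip())
    return {vid: n for vid, n in counts.items() if n > 1}


# The rows from the example above.
sample = [
    "1 rs1495237 0 4372049 T C",
    "1 . 0 4372921 0 T",
    "1 . 0 4372921 0 T",
    "1 rs1353341 0 4372992 A G",
    "1 rs12080695 0 4375410 A G",
]
print(find_duplicate_ids(sample))  # {'.': 2}
```

On the real data this could be run as `with open("output_data/results_pruned.bim") as f: print(find_duplicate_ids(f))`.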
I have only encountered this problem when using --read-freq.
Obviously, one thing I can do is remove all variants with these missing identifiers. Does anyone have another solution, or perhaps just an explanation?
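One alternative to removing the variants is to give each '.' entry a unique ID before running --read-freq. The sketch below assumes a hypothetical chrom:pos:allele1:allele2 naming scheme (not a PLINK convention) and appends a numeric suffix on collisions, which matters here because the two '.' rows in the example share the same position and alleles. The function name assign_missing_ids is illustrative only. The .frq file would then need to be regenerated (for example with --freq) from the rewritten fileset so that its variant IDs match.

```python
def assign_missing_ids(bim_lines):
    """Replace '.' variant IDs with unique, hypothetical chrom:pos:a1:a2 IDs.

    Assumes six-column .bim rows. If a generated ID is already taken
    (by an existing ID or an earlier replacement), a suffix _2, _3, ...
    is appended so every ID in the output is unique. Blank lines are
    dropped and the output is tab-delimited.
    """
    used = {line.split()[1] for line in bim_lines if line.strip()}
    out = []
    for line in bim_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[1] == ".":
            chrom, pos, a1, a2 = fields[0], fields[3], fields[4], fields[5]
            base = f"{chrom}:{pos}:{a1}:{a2}"
            new_id, n = base, 1
            while new_id in used:  # resolve collisions with a suffix
                n += 1
                new_id = f"{base}_{n}"
            used.add(new_id)
            fields[1] = new_id
        out.append("\t".join(fields))
    return out
```

Because the two '.' rows in the example have identical coordinates and alleles, they may be true duplicates; in that case dropping one of them could be more appropriate than renaming both.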
Check whether it runs when "." is replaced with NA; NA is more standard than "." for missing values.