Full-stop/period as SNP identifier in BIM file causes error in Plink v1.9 --read-freq command
7.6 years ago
olavur ▴ 150

In the BIM excerpt below, some variants have a full stop/period (".") in place of an identifier.

1       rs1495237       0       4372049 T       C
1       .       0       4372921 0       T
1       .       0       4372921 0       T
1       rs1353341       0       4372992 A       G
1       rs12080695      0       4375410 A       G

When I try to run a Plink command with --read-freq, I get an error. For example:

plink --bfile output_data/results_pruned --read-freq output_data/results_freq.frq --make-bed --out temp

Gives me the output:

PLINK v1.90b3.36 64-bit (31 Mar 2016)      https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to temp.log.
Options in effect:
  --bfile output_data/results_pruned
  --make-bed
  --out temp
  --read-freq output_data/results_freq.frq

64386 MB RAM detected; reserving 32193 MB for main workspace.
Allocated 13581 MB successfully, after larger attempt(s) failed.
151955 variants loaded from .bim file.
48 people (26 males, 22 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 32 founders and 16 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.999511.
Error: Duplicate ID '.'.

I have only encountered this problem when using --read-freq.
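To see which identifiers are repeated, a quick check along these lines may help. It is only a sketch: it assumes the .bim file is whitespace-delimited with the variant ID in column 2, and `demo.bim` and `demo_dups.txt` are hypothetical file names, with the demo rows copied from the excerpt above.

```shell
# Build a small demo .bim from the rows shown above (hypothetical file name).
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 rs1495237 0 4372049 T C \
  1 . 0 4372921 0 T \
  1 . 0 4372921 0 T \
  1 rs1353341 0 4372992 A G \
  1 rs12080695 0 4375410 A G > demo.bim

# Count how often each variant ID (column 2) occurs
# and list only the IDs that appear more than once, with their counts.
awk '{ n[$2]++ }
     END { for (id in n) if (n[id] > 1) print id "\t" n[id] }' demo.bim > demo_dups.txt

cat demo_dups.txt
```

On the demo rows this lists "." with a count of 2, which matches the ID named in the error message.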

Obviously, one thing I can do is remove all variants with these missing identifiers. Does anyone have another solution, or perhaps just an explanation?


Check whether it runs when "." is replaced with NA. NA is a more standard code for missing values than ".".

7.6 years ago
sbk ▴ 60

Hi @Olavur,

You can probably replace the ID column with "chr:pos:ref:alt", for example 1:4372049:T:C. You can also do this for all rows if you don't need the rsIDs for further analysis. This can be done with the following awk script:

awk 'BEGIN{FS=OFS="\t"}{$2=$1":"$4":"$5":"$6;print}' filename.bim

If you would like to keep the rsID wherever one is available, add an if condition so that only IDs that do not start with "rs" are replaced:

awk 'BEGIN{FS=OFS="\t"}{if($2!~/^rs/){$2=$1":"$4":"$5":"$6};print}' filename.bim
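One caveat: in the excerpt from the question, the two "." rows share the same position and alleles, so a plain chr:pos:allele ID would still be duplicated. A possible variant is sketched below. It assumes a whitespace-delimited .bim (tabs or spaces), and the `_2`, `_3` suffix scheme and the file names `demo.bim` and `demo_fixed.bim` are made up for illustration. Columns 5 and 6 are used as they appear in the .bim file.

```shell
# Demo input copied from the question (hypothetical file name).
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 rs1495237 0 4372049 T C \
  1 . 0 4372921 0 T \
  1 . 0 4372921 0 T \
  1 rs1353341 0 4372992 A G \
  1 rs12080695 0 4375410 A G > demo.bim

# Replace only "." IDs with chr:pos:col5:col6 and add a numeric suffix
# when that ID has already been generated, so every new ID is unique.
awk 'BEGIN { OFS = "\t" }
{
  $1 = $1                       # rebuild every line with tab separators
  if ($2 == ".") {
    id = $1 ":" $4 ":" $5 ":" $6
    seen[id]++
    if (seen[id] > 1) { $2 = id "_" seen[id] } else { $2 = id }
  }
  print
}' demo.bim > demo_fixed.bim

cat demo_fixed.bim
```

Only the ID column changes, so the row order still matches the .bed file. The .frq file passed to --read-freq would presumably need the same new IDs, so it is probably safest to regenerate it from the renamed data.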
