Plink Bed Format Confusion
11.2 years ago
kindlychung ▴ 60

I have been reading the description of the plink file formats ( http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml ) and I have some questions.

First, how is the null distinguished from the first homozygote? In terms of binary bits they are exactly the same:

            Genotype    Person    SNP
 11011100

       00   G/G         1 1       snp1
     11     A/A         1 2       snp1
   10       0/0         1 3       snp1
 11         A/A         2 1       snp1


 00001111

       11   A/A         2 2       snp1
     11     A/A         2 3       snp1
   00       (null)
 00         (null)

Second, is there any reason for reading the bits in the reverse order?

Third, it would be more natural and intuitive to encode (homozygote 1, heterozygote, homozygote 2) as (00, 01, 10), which in decimal is just (0, 1, 2). What is the motivation behind designating homozygote 2 as 11?

plink • 3.0k views
11.2 years ago
zx8754 12k

I wouldn't go as far as using the word myths. Plink is one of the most robust pieces of software, with good documentation.

Regarding the first point: they do look exactly the same, but (null) only appears where there are no individuals left for snp1 in that byte. Plink already knows where to stop from the fam file. In the example above there are 6 individuals, so any bits after the sixth individual are read as (null). From the plink manual:

Finally, when we reach the end of a SNP (or if in individual-mode, the end of an individual) we skip to the start of a new byte (i.e. skip any remaining bits in that byte).

The second and third points could simply be the programmers' design choices.
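To make the first point concrete, here is a minimal Python sketch of decoding one SNP's bytes in SNP-major mode. It is only an illustration, not plink's own code: the function name `decode_snp` and the label strings are made up for this example, the individual count is assumed to come from the fam file, and the bytes passed in are assumed to be the genotype bytes for a single SNP (any file header already skipped).

```python
# Two-bit codes by numeric value, after shifting each pair down to the low bits.
# In the manual's right-to-left notation the third pair of 11011100 is written "10";
# as a number that pair is 0b01, which is the missing-genotype code.
LABELS = {0b00: "hom1", 0b01: "missing", 0b10: "het", 0b11: "hom2"}


def decode_snp(block, n_individuals):
    """Decode the genotypes of one SNP; the count comes from the fam file."""
    calls = []
    for i in range(n_individuals):
        byte = block[i // 4]                     # four individuals per byte
        code = (byte >> (2 * (i % 4))) & 0b11    # lowest pair is the first individual
        calls.append(LABELS[code])
    # Bits beyond n_individuals are never inspected: they are padding,
    # so their value (00 here) cannot be confused with a real homozygote.
    return calls


snp1 = bytes([0b11011100, 0b00001111])           # the two bytes from the question
print(decode_snp(snp1, 6))
# ['hom1', 'hom2', 'missing', 'hom2', 'hom2', 'hom2']
```

The loop stops after the sixth individual, so the trailing 00 00 in the second byte is simply never looked at. That is the sense in which those bits are (null) rather than a first homozygote.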
