I have been reading the description of the PLINK binary file format (http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml) and I have some questions.
First, how is the null distinguished from the first homozygote? In terms of binary bits they are exactly the same:
            Genotype  Person  SNP
11011100
    00      G/G       1 1     snp1
    11      A/A       1 2     snp1
    10      0/0       1 3     snp1
    11      A/A       2 1     snp1
00001111
    11      A/A       2 2     snp1
    11      A/A       2 3     snp1
    00      (null)
    00      (null)
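To make the comparison concrete, here is a minimal Python sketch of the reading order as I understand it from the table above: each byte is split into four 2-bit pairs, taken from the right-hand end first, and each pair is written with its two bits swapped. The function name `decode_byte` is my own and not part of PLINK; this is only an illustration of the layout, not PLINK's actual reader.

```python
def decode_byte(byte_str):
    """Split an 8-character bit string into four 2-bit codes.

    Assumption (taken from the table above, not from PLINK's source):
    pairs are read starting from the rightmost pair, and within each
    pair the two bits are written in swapped order.
    """
    value = int(byte_str, 2)
    codes = []
    for i in range(4):
        pair = (value >> (2 * i)) & 0b11   # i-th pair, counting from the right
        low_bit = pair & 1
        high_bit = (pair >> 1) & 1
        codes.append(f"{low_bit}{high_bit}")  # bits written in swapped order
    return codes


first = decode_byte("11011100")
second = decode_byte("00001111")
print(first)   # codes for the first four people
print(second)  # last two people, then two padding entries
```

The last two codes of the second byte come out as `00`, the same code as the `G/G` entry in the first byte, which is exactly what my first question is about.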
Second, is there any reason for reading the bits in the reverse order?
Third, it would be more natural and intuitive to encode (homozygote 1, heterozygote, homozygote 2) as (00, 01, 10), which in decimal is just (0, 1, 2). What is the motivation behind designating homozygote 2 as 11?