HAP file
The HAP file contains the haplotypes. The HAP file corresponding to the example dataset is:
0 0 1 0 0 0 1 1
0 1 1 0 0 1 0 1
0 1 1 0 1 1 1 1
This file is SPACE delimited. Each line corresponds to a single SNP. Each successive column pair (0, 1), (2, 3), (4, 5) and (6, 7) corresponds to the alleles carried at the 4 SNPs by each haplotype of a single individual. For example a pair "1 0" means that the first haplotype carries the B allele while the second carries the A allele as specified in the LEGEND file. The haplotypes are given in the same order than in the SAMPLE file. This file should have L lines and 2N columns, where L and N are the numbers of SNPs and individuals respectively.
http://www.shapeit.fr/pages/m02_formats/haplegsample.html
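The layout quoted above can be sketched in a few lines of Python. This is only an illustration of the documented format, not part of SHAPEIT: the helper names `parse_hap` and `individual_haplotypes` are made up here, and the input is the three-line example shown above.

```python
# Minimal sketch of reading a SPACE-delimited HAP file.
# parse_hap and individual_haplotypes are hypothetical helper names.

HAP_TEXT = """\
0 0 1 0 0 0 1 1
0 1 1 0 0 1 0 1
0 1 1 0 1 1 1 1
"""


def parse_hap(text):
    """Return one list of 0/1 integers per line (one line per SNP)."""
    rows = [[int(x) for x in line.split()] for line in text.strip().splitlines()]
    n_cols = len(rows[0])
    # Every line needs the same, even number of columns (2 per individual).
    if n_cols % 2 != 0 or any(len(row) != n_cols for row in rows):
        raise ValueError("expected the same even number of columns on every line")
    return rows


def individual_haplotypes(rows, i):
    """Return the two haplotypes of individual i (0-based), one allele per SNP."""
    first = [row[2 * i] for row in rows]
    second = [row[2 * i + 1] for row in rows]
    return first, second


rows = parse_hap(HAP_TEXT)
n_snps = len(rows)                 # L: number of lines
n_individuals = len(rows[0]) // 2  # N: number of column pairs

print(n_snps, n_individuals)           # 3 4
print(individual_haplotypes(rows, 1))  # ([1, 1, 1], [0, 0, 0])
```

Reading down a single column gives one haplotype; a column pair gives the two haplotypes of one individual, in SAMPLE file order.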
Comparing the documentation with your example shows where the confusion may come from: the example does not fully match the documentation. In other words, if your example is correct, the documentation is incomplete, and vice versa.
Let's look at the first row of your example. It has 5 undocumented columns (I added the ** and () markers):
**5 rs79182581 521049 G A** (0 0) (0 0) (1 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0) (0 0)
5: possibly Chromosome
rs79182581: SNP rs-ID see: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=79182581
521049: possibly the genomic position of the SNP, which also suggests the annotation is based on an older genome build (GRCh37.p13) rather than the latest one
G: Allele corresponding to 0
A: Allele corresponding to 1
0 0 := the sample is homozygous at rs79182581, carrying G G
1 0 := the sample is heterozygous, carrying A G
In layman's terms: most patients have a G at position 521049 on both copies of chromosome 5, while a single patient has one copy of chromosome 5 with an A and one with a G.
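As a rough sketch, the reading above can be turned into code. It assumes the five leading columns really are chromosome, SNP ID, position, allele for 0 and allele for 1, as inferred in this answer; `decode_row` is a hypothetical helper, and the row is shortened to the first four allele pairs of the example.

```python
# Sketch: translate the 0/1 codes of one row into allele letters.
# Assumed column layout (inferred in this thread, not taken from the docs):
# chromosome, SNP ID, position, allele coded 0, allele coded 1, then 0/1 pairs.

ROW = "5 rs79182581 521049 G A 0 0 0 0 1 0 0 0"  # shortened to four pairs


def decode_row(line):
    """Split one row into its annotation columns and per-sample allele pairs."""
    fields = line.split()
    chrom, snp_id, pos, allele0, allele1 = fields[:5]
    codes = fields[5:]
    if len(codes) % 2 != 0:
        raise ValueError("expected two allele codes per sample")
    letters = {"0": allele0, "1": allele1}
    pairs = [
        (letters[codes[i]], letters[codes[i + 1]])
        for i in range(0, len(codes), 2)
    ]
    return {"chrom": chrom, "id": snp_id, "pos": int(pos), "pairs": pairs}


decoded = decode_row(ROW)
# Samples whose two haplotypes carry different alleles are heterozygous.
heterozygous = [i for i, (a, b) in enumerate(decoded["pairs"]) if a != b]

print(decoded["pairs"])  # [('G', 'G'), ('G', 'G'), ('A', 'G'), ('G', 'G')]
print(heterozygous)      # [2]
```

Here the third sample (index 2) is the only heterozygous one, carrying A on its first haplotype and G on its second.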
A predicted (phased) haplotype consists of a single column. In other words, the software predicts that all the alleles in one column lie on the same copy of the chromosome within each pair of chromosomes. Is that clear?
Please explain your problem in sufficient detail. And what do you mean by 'explain' the data (the format, the results, or what is wrong here)? Note that it is virtually impossible to interpret somebody else's data from a fragment seen out of context.
Some todo's for you:
By 'explain' I mean how to interpret this data: the format, and what these 0s and 1s mean for that ID.
Have you tried to search for documentation on that format? I assume SHAPEIT comes with some sort of documentation?
Yes, it has documentation: http://www.shapeit.fr/pages/m02_formats/haplegsample.html
Thanks a lot. Now I get it. By the way, 5 means chromosome 5.