The phase status of an allele tells you on which chromosome of a homologous pair it was found. As far as I know, the main reason to use allele phasing information is to increase the correctness of the haplotypes and haplotype blocks inferred from them. Once you know which allele of each pair sits on which chromosome, it makes sense to write all allele pairs in the same order, because with everything sorted you can easily build haplotypes by going sequentially through the first alleles only and then through the second alleles only.
Trying to be a little more visual (and simplistic too, so all basic geneticists please accept my apologies in advance), take the table from the webpage you mentioned:
IND, id1, id2, id3, id4, id5
rs1, AT, TT, ??, AT, AA
rs2, GC, CC, GG, CC, CG
rs3, CC, ??, ??, CG, GG
rs4, AC, CC, AA, AC, AA
If you look at individual 1 (id1) you will get 2 different haplotypes: AGCA (from the first chromosome of the pair) and TCCC (from the second chromosome of the pair). This information wouldn't be known if the genotypes were unphased, in which case another haplotyping algorithm would have to be applied.
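To make the "first alleles only, then second alleles only" idea concrete, here is a minimal Python sketch using the table above. It assumes every genotype is phased and written in the same order for all SNPs; the names `genotypes` and `haplotypes` are just made up for this illustration.

```python
# Phased genotypes per individual, in SNP order rs1, rs2, rs3, rs4.
# Assumption: the first letter of every pair is on one chromosome copy
# and the second letter is on the other copy ("??" marks missing data).
genotypes = {
    "id1": ["AT", "GC", "CC", "AC"],
    "id2": ["TT", "CC", "??", "CC"],
    "id3": ["??", "GG", "??", "AA"],
    "id4": ["AT", "CC", "CG", "AC"],
    "id5": ["AA", "CG", "GG", "AA"],
}


def haplotypes(phased_genotypes):
    """Return the two haplotypes of one individual.

    The first haplotype collects the first allele of every SNP,
    the second haplotype collects the second allele of every SNP.
    """
    first = "".join(pair[0] for pair in phased_genotypes)
    second = "".join(pair[1] for pair in phased_genotypes)
    return first, second


hap1, hap2 = haplotypes(genotypes["id1"])
print(hap1, hap2)  # AGCA TCCC
```

With unphased genotypes the same loop would still run, but the two strings would not be real haplotypes, because the order inside each pair would carry no meaning.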
I know this post is "old" but it was helpful for me as a springboard to dig further into the subject. If it was helpful to me now, it definitely will be helpful to others "tomorrow". Below is an excerpt (copy-paste) from The Variant Call Format and VCFtools, Danecek et al. (2011):
GT, genotype, encodes alleles as numbers: 0 for the reference allele, 1 for the first allele listed in ALT column, 2 for the second allele listed in ALT and so on. The number of alleles suggests ploidy of the sample and the separator indicates whether the alleles are phased (”|”) or unphased (”/”) with respect to other data lines (Figure 1).
This was useful but still left the meaning of the order of the alleles ambiguous for me - i.e. which alleles are in the same chromosome/phase. A look at Fig 1 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/figure/F1/) in the original paper you referenced confirms that for phased data alleles from different variants that are in the same position in the GT field are on the same chromosome (provided they are in the same phase set which is implied if no PS field is present).
So when a VCF has 0|1 or 1|0, then it is safe to assume that the first column (before |) always represents one haplotype, and the second column (after |) always represents another haplotype?
More or less. You will be able to build a haplotype with the alleles in the first column, and another one with the alleles in the second column.
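A rough Python sketch of that idea for GT values, in case it helps. The records below are invented for illustration (they are not from a real VCF), and the sketch assumes a diploid sample whose variants all belong to a single phase set; `parse_gt` and `build_haplotypes` are hypothetical helper names, not part of any library.

```python
def parse_gt(gt):
    """Split a VCF GT string into allele indices and a phased flag."""
    phased = "|" in gt
    separator = "|" if phased else "/"
    # "." means the allele call is missing.
    alleles = [None if a == "." else int(a) for a in gt.split(separator)]
    return alleles, phased


def build_haplotypes(records):
    """Build two haplotypes from (ref, alts, gt) tuples of one sample.

    Assumption: every genotype is phased, diploid, and in the same
    phase set, so position 1 of each GT is always the same chromosome.
    """
    hap_a, hap_b = [], []
    for ref, alts, gt in records:
        alleles, phased = parse_gt(gt)
        if not phased or len(alleles) != 2:
            raise ValueError(f"genotype {gt!r} is not phased diploid")
        bases = [ref] + alts  # index 0 = REF, 1.. = ALT alleles
        hap_a.append("?" if alleles[0] is None else bases[alleles[0]])
        hap_b.append("?" if alleles[1] is None else bases[alleles[1]])
    return "".join(hap_a), "".join(hap_b)


# Made-up example records: (REF, [ALT, ...], GT)
records = [
    ("A", ["T"], "0|1"),
    ("G", ["C"], "1|0"),
    ("C", ["G"], "0|0"),
    ("A", ["C", "T"], "2|1"),
]

print(build_haplotypes(records))  # ('ACCT', 'TGCC')
```

The sketch deliberately refuses "/" genotypes instead of guessing. In a real file you would also have to check the PS field, since alleles are only on the same chromosome when they share a phase set.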
Thanks, this is what I wanted to be sure of.