Question

1000 Genomes: Phased Or Not?

2

14.2 years ago

Chronos ▴ 620

Q1. Running

yields

NA18981
0/0
0/0
1/1
0/0
0/0
0/0
0/0
0/0
0/0

while running

yields

NA18981
0|0
0|0
0|0
0|0
0|0
0|0
0|0
1|0
0|0

Is NA18981 phased or not? Is it partially phased? If so, what rule or convention explains the partial phasing? (I know that microsatellite calls are left unphased in otherwise phased genomes, but I don't believe I have seen any in this file.)

Q2. For autosomes (I've only checked chromosomes 1 and 2, but I assume the pattern holds for all autosomes) all 629 samples appear to be phased: their genotypes at every position are either missing (./.) or phased (e.g. 0|1). So are all 629 samples really phased on all autosomes?
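The check described in Q1 and Q2 can be sketched in a few lines of Python. This is only a minimal illustration, not the command used above: the function names `classify_gt` and `phasing_summary` are hypothetical, and the code assumes a tab-separated VCF whose FORMAT column contains a `GT` key. It counts, per sample, how many genotypes are phased (`|`), unphased (`/`), missing, or haploid (no separator).

```python
import re
from collections import Counter


def classify_gt(gt: str) -> str:
    """Classify a VCF GT value as 'missing', 'phased', 'unphased' or 'haploid'."""
    alleles = re.split(r"[/|]", gt)
    if all(a == "." for a in alleles):
        return "missing"
    if "|" in gt:
        return "phased"
    if "/" in gt:
        return "unphased"
    return "haploid"  # a single allele with no separator, e.g. "1"


def phasing_summary(vcf_lines):
    """Count GT classes per sample from an iterable of VCF text lines.

    Assumes the #CHROM header line precedes the records and that
    every record's FORMAT column includes GT.
    """
    samples = []
    counts = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names start at column 10
            counts = {s: Counter() for s in samples}
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, value in zip(samples, fields[9:]):
            gt = value.split(":")[gt_index]
            counts[sample][classify_gt(gt)] += 1
    return counts
```

For a compressed file, the lines could be supplied with `gzip.open(path, "rt")`. A sample that reports both `phased` and `unphased` counts on the same chromosome is the "partially phased" case asked about in Q1.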

Somewhat related: http://biostar.stackexchange.com/questions/5315/phased-and-unphased-genotypes-in-vcf-files-does-the-order-of-alleles-matter

genome • 4.2k views

ADD COMMENT • link updated 14.1 years ago by lh3 33k • written 14.2 years ago by Chronos ▴ 620

0

You should check the documentation of the programs used to phase the data. It almost certainly has a section on phasing chr X and haploid genotypes. Chr X is hemizygous at certain positions, so phasing it can be problematic.

ADD REPLY • link 14.2 years ago by Jarretinha 3.5k

0

They used Beagle (judging from the chrX filename), and I'll have to read its manual sooner or later. Jarretinha, did you mean that the 1000 Genomes Project treats the pseudo-autosomal segments of Y as diploid segments at the corresponding chrX coordinates? That would make sense (as there is no Y chromosome anywhere in the data), but then why are there no non-diploid genotypes on chrX? (They would have no slash or pipe in them, just a number or a dot.)

ADD REPLY • link 14.2 years ago by Chronos ▴ 620

0

Please be more descriptive!

ADD REPLY • link 14.1 years ago by Thaman ★ 3.3k

0

+1 I actually think the question is very clear.

ADD REPLY • link 14.1 years ago by lh3 33k

0

ok, the question is very clear now after the edit. I've removed the -1.

ADD REPLY • link 13.0 years ago by Giovanni M Dall'Olio 28k

score 3 · Answer 1 · 2011-03-15

3

14.1 years ago

lh3 33k

You should not trust the X chromosome genotype calls. They were not made in the proper way. On autosomes, the majority of SNPs are arbitrarily phased, but some are unphased due to call set merging. The upcoming release will be much cleaner.

As to phasing itself, you can always arbitrarily phase heterozygotes. The question is how many switch errors we make. The answer to this question is largely unknown on a data set like 1000g.
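To make "switch errors" concrete, here is a minimal sketch of how they could be counted for one sample. It assumes a trusted reference phasing is available at the same sites (for example from family data), which is exactly what is hard to obtain for a data set like 1000g. The function name `switch_errors` and the input layout (two parallel lists of phased diploid GT strings) are assumptions made for illustration.

```python
def switch_errors(truth, inferred):
    """Count switch errors between two phasings of the same sample.

    truth, inferred: lists of phased diploid GT strings (e.g. "0|1")
    at the same sites, in genomic order.
    Returns (switches, opportunities), where opportunities is the
    number of adjacent pairs of heterozygous sites.
    """
    orientations = []
    for t, i in zip(truth, inferred):
        t_a, t_b = t.split("|")
        i_a, i_b = i.split("|")
        if t_a == t_b:
            continue  # homozygous sites carry no phase information
        if (i_a, i_b) == (t_a, t_b):
            orientations.append(0)  # same haplotype order as the truth
        elif (i_a, i_b) == (t_b, t_a):
            orientations.append(1)  # haplotypes swapped relative to the truth
        else:
            raise ValueError(f"genotype mismatch: {t} vs {i}")
    # A switch error is a change of orientation between consecutive het sites.
    switches = sum(1 for a, b in zip(orientations, orientations[1:]) if a != b)
    opportunities = max(len(orientations) - 1, 0)
    return switches, opportunities
```

Note that an inferred phasing that is flipped at every heterozygous site scores zero switch errors: the labelling of the two haplotypes is arbitrary, and only changes of orientation along the chromosome count.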

ADD COMMENT • link 14.1 years ago by lh3 33k

2

A more consistent data set is now available at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20101123/interim_phase1_release/

The genotypes there are complete and phased for all 1094 individuals.

ADD REPLY • link 13.9 years ago by Laura ★ 1.8k