1000 Genomes: Phased Or Not?
13.7 years ago
Chronos ▴ 620

Q1. Running

zcat ALL.chrX.BI_Beagle.20100804.genotypes.vcf.gz | grep -v ^## | cut -f 345 | cut -d ':' -f 1 | grep -v '\./\.' | grep -v '|' | head

yields

NA18981
0/0
0/0
1/1
0/0
0/0
0/0
0/0
0/0
0/0

while running

zcat ALL.chrX.BI_Beagle.20100804.genotypes.vcf.gz | grep -v ^## | cut -f 345 | cut -d ':' -f 1 | grep -v '\./\.' | head

yields

NA18981
0|0
0|0
0|0
0|0
0|0
0|0
0|0
1|0
0|0

Is NA18981 phased or not? Is it partially phased? If so, what rule or convention explains the partial phasing? (I know that microsatellite calls are left unphased in otherwise phased genomes, but I don't believe I have seen any in this file.)
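The two pipelines in Q1 can be folded into a single pass that tallies every genotype class for one sample column. The following is only a sketch: the function name `gt_tally` is made up for illustration, and it assumes a tab-delimited VCF in which GT is the first colon-separated subfield of each sample column.

```shell
# Tally genotype classes for one sample column of a VCF read from stdin.
# gt_tally is a hypothetical helper name, not part of any existing tool.
# Usage: zcat file.vcf.gz | gt_tally 345
gt_tally() {
  awk -F'\t' -v col="$1" '
    /^#/ { next }                                  # skip meta-information and header lines
    {
      split($col, f, ":")                          # assumption: GT is the first subfield
      gt = f[1]
      if (index(gt, ".") > 0)      missing++       # ./. or .|. or a lone dot
      else if (index(gt, "|") > 0) phased++        # pipe separator: phased call
      else if (index(gt, "/") > 0) unphased++      # slash separator: unphased call
      else                         haploid++       # no separator at all, e.g. "0" or "1"
    }
    END {
      printf "phased=%d unphased=%d missing=%d haploid=%d\n",
             phased, unphased, missing, haploid
    }'
}
```

For the file in the question this would be run as `zcat ALL.chrX.BI_Beagle.20100804.genotypes.vcf.gz | gt_tally 345`. A nonzero count in both the phased and unphased classes would confirm that the column mixes the two separators.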

Q2. On autosomes (I've only checked chr1 and chr2, but I assume the pattern holds for all autosomes), all 629 samples appear to be phased: their genotypes at every position are either missing (./.) or phased (e.g. 0|1). So are all 629 samples really phased on all autosomes?
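The per-chromosome check described in Q2 can be done for all sample columns at once instead of one column at a time. This is a rough sketch under the same assumptions as a standard VCF layout (sample columns start at column 10, GT is the first subfield); `unphased_samples` is a hypothetical name chosen here.

```shell
# List samples that have at least one called but unphased genotype.
# unphased_samples is a hypothetical helper name. Reads a VCF on stdin and
# prints "SAMPLE<TAB>COUNT" per offending sample; prints nothing if every
# called genotype is phased.
unphased_samples() {
  awk -F'\t' '
    /^##/ { next }                                 # skip meta-information lines
    /^#CHROM/ {                                    # header line: remember sample names
      for (i = 10; i <= NF; i++) name[i] = $i
      n = NF
      next
    }
    {
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                          # assumption: GT is the first subfield
        # called (no dot) and written with a slash: an unphased genotype
        if (index(f[1], "/") > 0 && index(f[1], ".") == 0) bad[i]++
      }
    }
    END {
      for (i = 10; i <= n; i++)
        if (bad[i] > 0) print name[i] "\t" bad[i]
    }'
}
```

Empty output from `zcat some_chr.vcf.gz | unphased_samples` (file name is a placeholder) would mean that every sample is either missing or phased at every site in that file.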

Somewhat related: http://biostar.stackexchange.com/questions/5315/phased-and-unphased-genotypes-in-vcf-files-does-the-order-of-alleles-matter

genome • 4.0k views

You should check the documentation of the programs used to phase the data. They certainly contain a section about phasing chr X and haploids. Chr X is hemizygous in certain positions. Phasing can be problematic.


They used Beagle (judging from the chrX filename), and I'll have to read its manual sooner or later. Jarretinha, did you mean that the 1000 Genomes Project treats the pseudo-autosomal segments of Y as diploid segments at the corresponding chrX coordinates? That would make sense (as there is no Y chromosome anywhere in the data), but then why are there no non-diploid genotypes on chrX? (They should have no slash or pipe in them, just a number or a dot.)


Please be more descriptive!


+1 I actually think the question is very clear.


OK, the question is very clear now after the edit. I've removed the -1.

13.7 years ago
lh3 33k

You should not trust the X chromosome genotype calls. They were not made in the proper way. On autosomes, the majority of SNPs are arbitrarily phased, but some are unphased due to call set merging. The upcoming release will be much cleaner.

As to phasing itself, you can always arbitrarily phase heterozygotes. The question is how many switch errors we make. The answer to this question is largely unknown on a data set like 1000g.
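To make the notion of a switch error concrete, here is a small sketch that counts switches between two phasings of the same individual. It assumes a trusted reference phasing is available at all (which, as noted above, is generally not the case for this data set), that both phasings carry the same biallelic genotypes, and that the input has been reduced to two tab-separated GT columns; `count_switches` is a made-up name.

```shell
# Count switch errors between two phasings of the same individual.
# count_switches is a hypothetical helper name.
# Input on stdin: one site per line, "REFERENCE_GT<TAB>TEST_GT", e.g. "0|1<TAB>1|0".
count_switches() {
  awk -F'\t' '
    {
      # sites that are unphased in either column carry no phase information
      if (index($1, "|") == 0 || index($2, "|") == 0) next
      split($1, a, "[|]"); split($2, b, "[|]")
      # homozygous sites carry no phase information either
      if (a[1] == a[2] || b[1] == b[2]) next
      # orientation 0: same allele order in both phasings; 1: flipped
      orient = (a[1] == b[1]) ? 0 : 1
      # a switch is a change of orientation between consecutive het sites
      if (seen && orient != prev) switches++
      prev = orient; seen = 1; hets++
    }
    END { printf "het_sites=%d switches=%d\n", hets, switches }'
}
```

Because only changes of orientation are counted, flipping every heterozygote in the test phasing gives zero switches, which matches the point that the absolute labelling of the two haplotypes is arbitrary.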


A more consistent data set is now available at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20101123/interim_phase1_release/

The genotypes there are complete and phased for all 1094 individuals.
