How can you interpret a phased VCF with no "phase set" (PS) field?


10 weeks ago

cmdcolin ★ 4.2k

I am trying to figure out how to interpret phased variant calls, e.g. a genotype of 0|1. It seems that an additional PS ("phase set") field is needed to know to what extent the phase can be extrapolated to other variants.
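To make the question concrete, here is a minimal sketch of how the relative phase of two heterozygous calls could be read when a PS value is present. The helper names `parse_gt` and `relative_phase` are hypothetical, the code only handles diploid biallelic het calls, and treating two PS-less phased calls as one set is an assumption about how the spec should be read rather than a statement about any particular caller.

```python
def parse_gt(sample_field, format_field):
    """Split one sample column into (alleles, phased, phase_set).

    Hypothetical helper: assumes the FORMAT column contains GT.
    """
    record = dict(zip(format_field.split(":"), sample_field.split(":")))
    gt = record["GT"]
    phased = "|" in gt
    alleles = gt.replace("|", "/").split("/")
    ps = record.get("PS")
    if ps == ".":
        ps = None  # missing value is treated like an absent PS subfield
    return alleles, phased, ps


def relative_phase(call_a, call_b):
    """Return 'cis', 'trans', or None if the phase cannot be compared."""
    alleles_a, phased_a, ps_a = call_a
    alleles_b, phased_b, ps_b = call_b
    # Both calls must be phased and sit in the same phase set.
    # Assumption: two calls with no PS at all count as the same set.
    if not (phased_a and phased_b) or ps_a != ps_b:
        return None
    # Only simple heterozygous 0/1 calls are handled in this sketch.
    if set(alleles_a) != {"0", "1"} or set(alleles_b) != {"0", "1"}:
        return None
    # Same allele order means both ALT alleles are on the same haplotype.
    return "cis" if alleles_a == alleles_b else "trans"


a = parse_gt("0|1:12345", "GT:PS")
b = parse_gt("1|0:12345", "GT:PS")
print(relative_phase(a, b))
```

With different PS values on the two calls the function returns `None`, which is the case where the `|` separator alone does not say anything about how the two variants relate.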

The VCFs I've stumbled upon while looking for information about this do not have any PS field, including the 1000 Genomes VCF from what I can tell, e.g. at https://ftp-trace.ncbi.nlm.nih.gov/1000genomes/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

...yet they provide phased variant calls. Does the phase help at all in this case?

Blog post that describes the phase set: https://www.goldenhelix.com/blog/the-power-of-phased-genotypes-in-variant-analysis/

variants vcf phasing genotyping • 554 views

ADD COMMENT • link 9 weeks ago by cmdcolin ★ 4.2k


Note that the VCF spec says "All phased genotypes that do not contain a PS subfield are assumed to belong to the same phased set", but I would be surprised if something like the 1000 Genomes VCF was "completely phased".
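A rough sketch of what that rule means in practice: group the phased calls of one sample by phase set, and put every phased call without a PS value into a single shared set. The function name `phase_sets` is hypothetical, and keying the groups by chromosome as well as PS is my own assumption (phase is not normally carried across chromosomes), not something the quoted sentence states.

```python
from collections import defaultdict


def phase_sets(vcf_lines, sample_index=0):
    """Group positions of phased genotypes by (chromosome, phase set).

    Phased calls with no PS subfield, or PS set to '.', share the key
    (chrom, None), following the quoted spec sentence.
    """
    groups = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, fmt = cols[0], int(cols[1]), cols[8]
        sample = cols[9 + sample_index]
        fields = dict(zip(fmt.split(":"), sample.split(":")))
        if "|" not in fields.get("GT", "."):
            continue  # unphased or missing genotype, not in any phase set
        ps = fields.get("PS", ".")
        groups[(chrom, None if ps == "." else ps)].append(pos)
    return dict(groups)


# Toy records made up for illustration, not taken from any real file.
toy = [
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:PS\t0|1:100",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:PS\t1|0:100",
    "chr1\t900\t.\tG\tA\t.\tPASS\t.\tGT:PS\t0|1:900",
    "chr1\t950\t.\tG\tA\t.\tPASS\t.\tGT\t0/1",
]
print(phase_sets(toy))
```

For a file with no PS field at all, every phased call on a chromosome lands in the single `(chrom, None)` group, which is the "one big phase set" reading of the spec.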

ADD REPLY • link 10 weeks ago by cmdcolin ★ 4.2k


Random update: I was assuming that a full haplotype-phased assembly was the "only way" to achieve complete phasing, but that might not be true. Phasing can also be done by reference to imputation and ancestry, so the 1000 Genomes calls might actually be "completely phased" to a good extent.

ADD REPLY • link 10 weeks ago by cmdcolin ★ 4.2k


At least one VCF that uses the phase set field is located here: https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/analysis/MPG_WhatsHap_phasing_07202017/

ADD REPLY • link 9 weeks ago by cmdcolin ★ 4.2k
