Despite the detailed explanation of VCF format on the 1000Genomes site, it is still not clear to me how the data should be interpreted with respect to sample results.
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00002
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 1|0:48:8:51,51
20 1230237 . T C 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|1:3:5:65,3
For individual NA00002, the vertical bar ("|") between the two digits of the GT field indicates that the genotype is phased. But is there any significance as to which side of the bar each digit occurs on?
E.g. for position 14370, does the first digit "1" in "1|0" (>A) relate to a particular parent (mother or father)? And does the second digit, "0" (>G), to the right of the bar indicate the base from the other parent? Similarly, at position 1230237 the first digit is "0" (>T) and the second digit, to the right of the bar, is "1" (>C).
If so, then the left chromosome will read AT and the right chromosome GC. Correct? Or is it impossible to tell from the order of the alleles with respect to the vertical bar?
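The reading proposed here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every genotype is phased and biallelic, and the left and right sides of "|" refer to the same two haplotypes at both positions. The `records` list and the `haplotypes` helper are hypothetical names, not part of any VCF library.

```python
# The two example records for NA00002: (POS, REF, ALT, GT)
records = [
    (14370, "G", "A", "1|0"),
    (1230237, "T", "C", "0|1"),
]

def haplotypes(records):
    """Return (left, right) allele strings.

    Assumes every GT is phased ("|") and that the left/right sides
    denote the same haplotype across all records given.
    """
    left, right = [], []
    for pos, ref, alt, gt in records:
        alleles = [ref, alt]      # index 0 = REF, index 1 = ALT
        a, b = gt.split("|")      # raises ValueError on an unphased "/" genotype
        left.append(alleles[int(a)])
        right.append(alleles[int(b)])
    return "".join(left), "".join(right)

print(haplotypes(records))  # ('AT', 'GC')
```

Whether the left side corresponds to a particular parent cannot be read from the GT field alone; the sketch only shows which alleles sit together.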
Thank you in advance
From what you're saying, the position of the value with respect to "|" is significant. So in your example, the last three positions have ALT to the left of "|", implying those alleles are on the same chromosome, whilst the values to the right of "|" are REF and on the other chromosome. However, as the first position is not phased, is it possible to associate either of its alleles with those below? I.e. doesn't the block (chromosome segment) actually start with line 2?
From 1000Genomes:
I did try to sort this out myself looking at the gene NPC1 on chr18 but in all cases of supposed family trios the child had been redacted, so not possible to check phased formatting.
Thanks brentp.
"/" indicates that it is not phased with anything before it.
"|" indicates that it is phased with (at least) the line before it.
So a block starts with "/" and ends 1 line before the next "/".
So if all you have are unphased genotypes ("/"), each line is the start and end of its own block.
So, to answer your first question: yes, you can tell that all 4 variants, even the first, are phased together.
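As a rough sketch of the block rule described above (not an official parser), the following Python groups a sample's GT strings into blocks: "/" opens a new block and "|" attaches the line to the block of the line before it. The `phase_blocks` helper and the genotype list are hypothetical, with the list modelled on the four-variant case discussed here (first line unphased, next three phased). It does not look at any phase-set annotation a file might carry.

```python
def phase_blocks(genotypes):
    """Group consecutive GT strings into phase blocks.

    "/" starts a new block; "|" joins the line to the block
    of the line before it (the convention described above).
    """
    blocks = []
    for gt in genotypes:
        if "/" in gt or not blocks:
            blocks.append([gt])       # start a new block
        else:
            blocks[-1].append(gt)     # phased with the previous line
    return blocks

# Hypothetical GT column: first line unphased, next three phased
gts = ["0/1", "1|0", "1|0", "1|0"]
print(phase_blocks(gts))                    # [['0/1', '1|0', '1|0', '1|0']]

# All-unphased input: every line is its own block
print(phase_blocks(["0/1", "0/1", "1/1"]))  # [['0/1'], ['0/1'], ['1/1']]
```

Under this rule the four lines fall into a single block, which matches the "all 4 variants" reading.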
OK, so the phasing is with the line(s) before rather than after the "|". I wish that had been made explicit in the 1000Genomes page. Thanks again.