Question

VCF - does mother or father come first in the comma separated ALT

0

5.4 years ago

LayneSadler ▴ 90

Does either the mother or father's chromosome always come first in the comma separated ALT of a VCF?

Or is it impossible to tell which chromosome/ ploid came from which parent?

ALT = [A, T]

Are the slash separated "0/0" and "1" in any way indicative of the actual chromosome that the variant falls under?

vcf gvcf heterozygous • 2.1k views

updated 5.1 years ago by Biostar 20 • written 5.4 years ago by LayneSadler ▴ 90

score 6 · Answer 1 · 2019-06-29

6

5.4 years ago

Ram 44k

Comma-separated ALT alleles mark a multi-allelic site: at that position, one or more samples carry more than one distinct allele that differs from the reference. The order of the ALT alleles says nothing about which parent they came from. The reference genome could have an A, for example, and one sample might be compound heterozygous, such as C/G. Or two different samples could each carry a different change: one might be C/C and the other G/G, for instance.
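To make the two scenarios above concrete, here is a minimal Python sketch that turns a GT string into the alleles it points to. The function name `decode_genotype` and the example site (REF=A, ALT=C,G) are made up for illustration, and the sketch assumes a plain GT with no missing (`.`) calls.

```python
def decode_genotype(ref, alt, gt):
    """Translate a GT string such as '1/2' into the alleles it points to."""
    # Index 0 is REF; indices 1..n follow the order of the comma-separated ALT.
    alleles = [ref] + alt.split(",")
    separator = "|" if "|" in gt else "/"
    return [alleles[int(i)] for i in gt.split(separator)]

# Hypothetical site: REF=A, ALT=C,G
print(decode_genotype("A", "C,G", "1/2"))  # ['C', 'G'] one sample, compound het
print(decode_genotype("A", "C,G", "1/1"))  # ['C', 'C'] first of two samples
print(decode_genotype("A", "C,G", "2/2"))  # ['G', 'G'] second of two samples
```

The position of an allele in ALT only fixes its index number. Which samples carry it is recorded per sample in GT.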

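Alleles separated by / (like A/T, written as allele indices such as 0/1 in the GT field) are unphased, which means their order does not tell you whether the first is from the mother or the father. Phased genotypes are separated by | (like A|T). Phasing tells you which alleles sit together on the same haplotype across neighboring variants, but the VCF spec does not define which side of the | is maternal or paternal. That depends on the tool that did the phasing, so check its documentation before assuming father-first or mother-first.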
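A small Python sketch of what the separator means in practice: with /, allele order carries no information, so 0/1 and 1/0 describe the same genotype, while with | the order is meaningful. The helper `same_genotype` is a hypothetical name, and it treats a phased/unphased pair as unordered, which is a simplifying assumption.

```python
def same_genotype(gt_a, gt_b):
    """Compare two GT strings, ignoring allele order unless both are phased."""
    both_phased = "|" in gt_a and "|" in gt_b
    a = gt_a.replace("|", "/").split("/")
    b = gt_b.replace("|", "/").split("/")
    if both_phased:
        # Order matters: each side of '|' is a specific haplotype.
        return a == b
    # Unphased: only the set of alleles carried is known.
    return sorted(a) == sorted(b)

print(same_genotype("0/1", "1/0"))  # True
print(same_genotype("0|1", "1|0"))  # False
```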

5.4 years ago by Ram 44k

0

Wow, thanks for teaching me about phased | vs. unphased /.

5.4 years ago by LayneSadler ▴ 90

0

I guess if ploids are split and read one at a time... the first position would at least always correspond to the same sequence for the length of the read?

But how can you trust a codon type when you don't know what sample comes from what index order in the alt column? https://en.wikipedia.org/wiki/DNA_codon_table

5.4 years ago by LayneSadler ▴ 90

0

Sorry, I've only ever worked with diploid organisms, so I do not know anything about polyploid VCFs. I don't think read length has anything to do with VCF content directly. If anything, read length correlates with read quality, and read quality goes into calculating mapping quality, I think. Read length should not affect allele placement.

I also don't know how codon type correlates to alleles. Samples are linked to alleles through the GT (genotype) field. The GT notation is determined by ploidy: it has one allele index per chromosome copy, so diploid organisms have two numbers and triploid organisms have three. These numbers are separated by a / or | based on phasing. The values range from 0 (the REF allele) through n, where n is the number of ALT alleles for that variant, so 1 is the first ALT allele, 2 the second, and so on.
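As a rough illustration of the GT rules described above, this Python sketch splits a GT string into allele indices and reports ploidy and phasing. `parse_gt` is a hypothetical helper, not part of any VCF library, and it assumes a genotype counts as phased if any | is present.

```python
import re

def parse_gt(gt, n_alt):
    """Split a GT string into allele indices; return (indices, ploidy, phased)."""
    phased = "|" in gt
    indices = []
    for field in re.split(r"[/|]", gt):
        if field == ".":
            indices.append(None)  # missing call for this chromosome copy
            continue
        index = int(field)
        # Valid indices run from 0 (REF) up to the number of ALT alleles.
        if index > n_alt:
            raise ValueError(f"allele index {index} exceeds {n_alt} ALT allele(s)")
        indices.append(index)
    return indices, len(indices), phased

print(parse_gt("0/1", n_alt=1))    # ([0, 1], 2, False) diploid, unphased
print(parse_gt("0|1|2", n_alt=2))  # ([0, 1, 2], 3, True) triploid, phased
```

For real files, an established parser is a safer choice than hand-rolled string splitting.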

I recommend you read the VCF spec - it will answer any questions you have about how a variant may be represented, across types of variants and across samples, ploidies, etc.

5.4 years ago by Ram 44k