Hello
I'm very confused. Can anyone briefly explain what "left-aligned" and "parsimonious" mean, with a simple example?
Thanks!
The paper "Unified representation of genetic variants", or this wiki page from the same authors, explains normalization very nicely.
The paper is clear:
A VCF entry is left aligned if and only if its base position is smallest among all potential VCF entries having the same allele length and representing the same variant
In Fig. 1, compare A and D. Both variants have the same length (a 2-base deletion) and produce the same effect: deleting the `CA` at positions 4-5, which joins the `G` at pos 3 to the `C` at pos 6. However, A picks pos 6 as the variant's pos, whereas D picks the base on the left side, the `G` at pos 3. D is the accurate one: its anchor base at pos 3 does not change even after the mutation happens, and it has the smallest position of the equivalent entries, so it is left-aligned.
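To make this concrete, here is a minimal Python sketch of left-aligning and trimming a single REF/ALT pair, in the spirit of the normalization procedure the paper describes. The reference string `GGGCACACACAGGG` and the helper names `normalize` and `apply_variant` are my own assumptions, picked so the positions match the ones discussed above; this is not the code of any real tool.

```python
def apply_variant(ref_seq, pos, ref, alt):
    """Return the sequence after replacing REF with ALT at 1-based pos."""
    start = pos - 1
    if ref_seq[start:start + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference")
    return ref_seq[:start] + alt + ref_seq[start + len(ref):]


def normalize(ref_seq, pos, ref, alt):
    """Left-align and trim one REF/ALT pair (hypothetical helper, 1-based pos)."""
    if ref == alt:
        raise ValueError("REF and ALT are identical, nothing to normalize")
    changed = True
    while changed:
        changed = False
        # Right-trim: drop the last base while both alleles end with the same one.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # An allele became empty: extend both alleles one base to the left.
        if not ref or not alt:
            if pos <= 1:
                raise ValueError("cannot extend past the start of the reference")
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Left-trim: drop shared leading bases, but keep at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


reference = "GGGCACACACAGGG"  # assumed toy reference, positions are 1-based

# The entry at pos 6 (CAC > C) shifts left until it can go no further.
print(normalize(reference, 6, "CAC", "C"))

# Both entries describe the same deletion, so they give the same sequence.
print(apply_variant(reference, 6, "CAC", "C") == apply_variant(reference, 3, "GCA", "G"))
```

Each pass through the first loop is exactly the "prefix one base on the left, truncate one base on the right" move: the entry slides one position to the left without changing the resulting sequence. The final left-trim is what makes the result parsimonious, i.e. written with as few bases as possible while keeping every allele non-empty.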
In this write-up (maybe by the same person), they say about the example in this thread: "green variant is not left aligned as you can prefix an A nucleotide on the left side of the variant's alleles and truncate the C on the right side of the variant's alleles."
https://genome.sph.umich.edu/wiki/Variant_Normalization
What exactly do they mean by "you can prefix an A nucleotide on the left side of the variant's alleles and truncate the C on the right side of the variant's alleles"?
This should be a new question, not an answer on an existing question.
The comments refer to the image that follows them, where the REF is `CAC` and the ALT is `C`, showing a change from `CAC` to `C`, essentially a `CA` deletion from the reference sequence. Because this deletion happens in a repeat region, the deleted locus should be placed at the left-most (most 5') position, which it is not in this case. An `ACA`>`A` change made one base to the left would have the same effect but be denoted further to the left than the current notation. Thus, the mutation cannot be denoted by fewer bases (it is parsimonious) but can be denoted by something further left on the sequence (so it is not left-aligned).

The shown change is `c.6CAC>C`, whereas a more left-aligned notation would be `c.5ACA>A`. Both of these would cause one `CA` to be removed from the reference sequence.

Ah OK, so they essentially mean shifting one position to the left on both the REF and the ALT: first a prefix of length 1 is added, and then a suffix of length 1 is removed. Thanks. As for this being a new question: should I just add a comment when it is related to the existing question, like mine is? I wouldn't want to post this as a completely new thread, correct?
You could add a comment or open a new question and reference this post there. We want discussion, but not extensive offshoots. In your case, adding a comment would have been better as you only want clarification, and you don't really have a related question.