What do "left-aligned" and "parsimonious" mean (VCF normalisation)?
6.8 years ago
pinn ▴ 210

Hello

I'm very confused. Can anyone briefly explain what left-aligned and parsimonious mean, with a simple example?

Thanks!

genome SNP next-gen sequence • 5.1k views

In this wiki page, where they review the topic (maybe the same person), they say about the example in this thread: "green variant is not left aligned as you can prefix an A nucleotide on the left side of the variant's alleles and truncate the C on the right side of the variant's alleles."

https://genome.sph.umich.edu/wiki/Variant_Normalization

What do they mean exactly by "you can prefix an A nucleotide on the left side of the variant's alleles and truncate the C on the right side of the variant's alleles"?


This should be a new question, not an answer on an existing question.

The comments refer to the image that follows them, where the REF is CAC and the ALT is C, showing a change from CAC to C, essentially a CA deletion from the reference sequence. Because this deletion happens in a repeat region, the locus to delete should be the most 5' (left-most) one, which it is not in this case. An ACA > A change made one base to the left would have the same effect but be denoted further 5' than the current notation. Thus, the mutation cannot be denoted by fewer bases (it is most parsimonious) but can be denoted by something that is further 5' on the sequence (thus it is not left-aligned).

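The shown change is c.6CAC>C, whereas a more left-aligned version would be c.5ACA>A. Both of these remove one CA from the reference sequence. Repeating the same shift moves the variant further left until it can go no further, which for this example is GCA>G at position 3.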
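To see that these notations really describe the same deletion, here is a minimal sketch that applies each one to a toy reference. The reference string GGGCACACACAGGG is my assumption (it is consistent with the positions mentioned in this thread), and apply_variant is a hypothetical helper, not part of any tool.

```python
# Toy reference, assumed for illustration; positions are 1-based as in VCF.
REF_SEQ = "GGGCACACACAGGG"

def apply_variant(seq, pos, ref, alt):
    """Replace the REF allele starting at 1-based `pos` with the ALT allele."""
    start = pos - 1
    if seq[start:start + len(ref)] != ref:
        raise ValueError(f"REF allele {ref!r} does not match the reference at position {pos}")
    return seq[:start] + alt + seq[start + len(ref):]

# Three ways of writing the same CA deletion.
shown = apply_variant(REF_SEQ, 6, "CAC", "C")      # the notation in the image
shifted = apply_variant(REF_SEQ, 5, "ACA", "A")    # one base further left
leftmost = apply_variant(REF_SEQ, 3, "GCA", "G")   # as far left as it goes

print(shown, shifted, leftmost)
print(shown == shifted == leftmost)
```

All three calls should give the same mutated sequence, which is why only one of the notations should be kept as the normalized one.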


Ah OK, so they essentially mean shifting one position to the left on both the REF and the ALT: first add a prefix of length 1, then remove a suffix of length 1. Thanks. As for this being a new question: do I just add a comment if it is related to the question, like mine? I wouldn't want to post this as a completely new thread, correct?
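That "prefix one base, then drop one base from the end" step can be sketched as below. The reference GGGCACACACAGGG is assumed for illustration, and shift_left_once is a hypothetical name of mine, not a function from vt or bcftools.

```python
# Toy reference, assumed for illustration; positions are 1-based as in VCF.
REF_SEQ = "GGGCACACACAGGG"

def shift_left_once(seq, pos, ref, alt):
    """Shift a variant one base to the left, if that keeps it equivalent.

    The shift is only valid when both alleles end in the same base,
    because that shared base is the one being truncated.
    """
    if pos <= 1 or ref[-1] != alt[-1]:
        return pos, ref, alt               # cannot shift; return unchanged
    prev_base = seq[pos - 2]               # reference base just left of `pos`
    ref = (prev_base + ref)[:-1]           # prefix on the left, truncate on the right
    alt = (prev_base + alt)[:-1]
    return pos - 1, ref, alt

step = shift_left_once(REF_SEQ, 6, "CAC", "C")
print(step)
```

Starting from CAC>C at position 6, the prefix is the A at position 5 and the truncated base is the trailing C, which is the move the wiki sentence describes.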


You could add a comment or open a new question and reference this post there. We want discussion, but not extensive offshoots. In your case, adding a comment would have been better as you only want clarification, and you don't really have a related question.

6.8 years ago

This paper, "Unified representation of genetic variants", or this wiki page from the same authors explains normalization very nicely.


I read it and performed my analysis, but I still don't understand the left-alignment concept.


The paper is clear:

"A VCF entry is left aligned if and only if its base position is smallest among all potential VCF entries having the same allele length and representing the same variant."

In Fig. 1, compare A and D. Both represent the same 2-base deletion with alleles of the same length, and both have the same effect: deleting the CA at positions 4-5 so that the G at position 3 joins the C at position 6. However, A anchors the variant at the C at position 6, whereas D anchors it at the G at position 3, immediately to the left of the deleted bases. D has the smallest possible position among the equivalent entries, so it is the left-aligned one.
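If it helps, here is a rough sketch of a normalization loop in the spirit of the procedure described in the paper, for a single REF/ALT pair. It is my own simplified version, not code from vt or bcftools; the reference GGGCACACACAGGG and the function name normalize are assumptions for illustration.

```python
# Toy reference, assumed for illustration; positions are 1-based as in VCF.
REF_SEQ = "GGGCACACACAGGG"

def normalize(seq, pos, ref, alt):
    """Left-align and trim one biallelic variant (simplified sketch)."""
    changed = True
    while changed:
        changed = False
        # Right-trim: drop the last base while both alleles end in the same base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both with the base to the left.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Left-trim: drop shared leading bases, keeping at least one base per allele.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt

# A (not left-aligned), an entry with extra bases on the right,
# and an entry with an extra base on the left.
for entry in [(6, "CAC", "C"), (3, "GCACA", "GCA"), (2, "GGCA", "GG")]:
    print(entry, "->", normalize(REF_SEQ, *entry))
```

With this toy reference, all three starting entries should end up as the same entry at position 3, i.e. representation D.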
