Meaning of 2/1, 3/1 in frameshift
9.6 years ago
basalganglia ▴ 40

Can anyone help me understand the meaning of 2/1 and 1/1 in a VCF? I know that 0 is the reference allele, 1 is the first alternate allele, and 2 is the second alternate allele.

However, I have seen both 1/2 and 2/1. Are they the same or not?

Thanks

genotype • 1.6k views
9.6 years ago
John 13k

Awesome username :)

After sequencing, if enough reads with a variation to the reference sequence pile up at a particular location, a variant is called - but it is very difficult to know if that variant is from the maternal or paternal chromosome.

So a 0/1 means some reference-sequence reads were seen, but also some non-reference-sequence reads. It is impossible at this stage to tell if the non-ref sequence came from the maternal or paternal chromosome.

A 1/1 means only non-reference reads were seen, so both maternal and paternal chromosomes have the SAME variant (although occasionally this can also occur if there's a large deletion in one of the chromosomes, and the other has a variant.)

A 1/2 means only non-reference reads were seen, but the maternal and paternal chromosomes have DIFFERENT variants. But again it is impossible to tell whether, say, the maternal chromosome had variant 1 or 2.
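To make the numbering concrete, here is a minimal Python sketch that turns a GT string into the alleles it points at. The function name `describe_genotype` and the labels it returns are made up for illustration; the only thing taken from VCF itself is that index 0 is REF and indices 1, 2, ... are the ALT alleles in order.

```python
import re

def describe_genotype(gt, ref, alts):
    """Translate a VCF GT string such as '0/1' into allele sequences and a label.

    Hypothetical helper: `ref` is the REF column, `alts` the ALT alleles in order.
    """
    alleles = [ref] + list(alts)          # index 0 = REF, 1.. = ALT alleles
    indices = re.split(r"[/|]", gt)       # accept either separator
    if "." in indices:                    # './.' means no call was made
        return None, "missing"
    called = [alleles[int(i)] for i in indices]
    distinct = set(indices)
    if distinct == {"0"}:
        label = "homozygous reference"
    elif len(distinct) == 1:
        label = "homozygous alternate"
    elif "0" in distinct:
        label = "heterozygous (ref + alt)"
    else:
        label = "heterozygous (two different alts)"
    return called, label

print(describe_genotype("0/1", "A", ["C", "T"]))
print(describe_genotype("1/1", "A", ["C", "T"]))
print(describe_genotype("1/2", "A", ["C", "T"]))
```

With REF=A and ALT=C,T, a 1/2 resolves to one C allele and one T allele, with no reference allele called.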

If the SNP-calling program detects more than two different kinds of variant at the same site, it usually assumes the sequencing is crazy and doesn't call anything. Callers generally expect something close to a 50/50 read ratio at heterozygous sites.

Anyway, if you have data from not only the individual but also their relatives, then you can 'phase' the variants and say 'these variants all occur on the same chromosome'. This is trivial when you have the parents, and generally doable to quite a high degree if you just have sibs - although the more the merrier. Phasing your SNPs is cool, because you can then use it to detect chromosomal deletions (impossible SNP combinations), and crossing over events - but again you'll need the pedigree to figure that kind of stuff out.
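As a rough illustration of why having the parents makes phasing easy, here is a small Python sketch for a single site. The function `phase_with_parents` is hypothetical and deliberately naive: it takes genotypes as tuples of allele indices, assumes a diploid site with no genotyping errors, and returns the child's alleles as (maternal, paternal) only when exactly one assignment is possible.

```python
def phase_with_parents(child, mother, father):
    """Phase one child genotype using the parents' genotypes (single site).

    All arguments are tuples of allele indices, e.g. (0, 1).
    Returns (maternal_allele, paternal_allele), or None when the site is
    ambiguous or inconsistent with Mendelian inheritance.
    """
    a, b = child
    options = set()
    if a in mother and b in father:   # a inherited from mother, b from father
        options.add((a, b))
    if b in mother and a in father:   # the other way round
        options.add((b, a))
    if len(options) == 1:
        return options.pop()
    return None                       # ambiguous (e.g. all three het) or impossible

print(phase_with_parents((0, 1), (0, 0), (0, 1)))  # alt allele must be paternal
print(phase_with_parents((0, 1), (0, 1), (0, 1)))  # everyone het: cannot tell
print(phase_with_parents((0, 1), (0, 0), (0, 0)))  # impossible combination
```

The last case is the kind of "impossible SNP combination" mentioned above: no assignment fits, which could point to a genotyping error or to something like a deletion.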

Back in my day, we had SNP arrays/probes and not sequencing, so perhaps these days you could phase SNPs locally by catching two variants in the same read/read pair. During my Masters I was a big advocate for probabilistic phasing using the dbSNP database of variant minor/major frequencies (since some variants will phase with others in a population), but I don't know if modern-day programs use any of that information. Who knows... maybe there are even biochemical ways to do it now...

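Anyhoo, when variant data IS phased, the order of the alleles matters, so a 2|1 is different from a 1|2 - it tells you that the left-hand-side alleles are all on the same chromosome. Note that the VCF spec marks phased genotypes with '|' and unphased ones with '/', so a 2/1 written with a slash is, strictly speaking, unphased and means the same as 1/2; some tools simply don't sort the alleles. If you see a 2|1, it should follow that the other '|' genotypes in the file are phased too (the 1|2, 1|0 and 0|1), although obviously I can't say with any confidence that that is true for your data.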
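For the original question (are 1/2 and 2/1 the same?), here is a minimal Python sketch, assuming the VCF convention that '/' separates unphased alleles and '|' separates phased ones. The helpers `parse_gt` and `same_genotype` are made-up names, and treating a phased/unphased pair as unordered is my own simplification, not something the spec prescribes.

```python
def parse_gt(gt):
    """Split a GT string into its allele indices and a phased flag."""
    phased = "|" in gt
    alleles = gt.replace("|", "/").split("/")
    return alleles, phased

def same_genotype(a, b):
    """Compare two GT strings.

    Order only matters when BOTH genotypes are phased; otherwise the
    alleles are compared as an unordered pair (an assumption of this sketch).
    """
    alleles_a, phased_a = parse_gt(a)
    alleles_b, phased_b = parse_gt(b)
    if phased_a and phased_b:
        return alleles_a == alleles_b
    return sorted(alleles_a) == sorted(alleles_b)

print(same_genotype("1/2", "2/1"))  # True: unphased, order carries no meaning
print(same_genotype("1|2", "2|1"))  # False: phased, the alleles sit on different haplotypes
```

Keep in mind that comparing phased genotypes this way only makes sense for sites that were phased together (same sample, same phase block).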

Hope that helps clear things up :)


Thank you for your long explanation :) It is so useful for me
