In the Variant Call Format (VCF) v4.1 specification, one example shows a variant record as follows:
REF: G, ALT: A, NA00001: 0|0, NA00002: 1|0, NA00003: 1/1
Every SNP position in my variants looks like the one above, and I am confused. Is there no SNP position where the reference changes to some other base, such as C or T? We have many samples at this position, but I could not find any alteration other than the one shown above.
Short answer: No, you won't see any nucleotide other than REF or ALT at that position in any of your samples.
Long answer: VCF stores entries like so:
Each line is a position in the ref genome that sees a difference in at least one of your samples. If a sample has REF/REF, you'd see 0/0. ALT/ALT is 1/1. REF/ALT is 0/1 - these are the genotypes (homozygous and heterozygous).
Multi-allelic variants are sites where more than two alleles are seen at the same locus across the samples. Multi-allelic variants usually have a comma-separated list of ALT alleles.
So, if you see only 1 REF and 1 ALT allele, you can rest assured that all your samples either contain REF/REF, ALT/ALT or REF/ALT. No third nucleotide is involved at that position.
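To make the 0/1 notation concrete, here is a minimal sketch in plain Python that turns the GT strings from the example in the question back into bases. The function name decode_genotype is made up for illustration and is not part of any VCF library; it assumes the GT value has already been pulled out of the sample column.

```python
import re

def decode_genotype(gt, ref, alts):
    """Translate a VCF GT string such as '1|0' into the bases it stands for.

    Hypothetical helper for illustration only.
    """
    alleles = [ref] + alts      # index 0 is REF, indexes 1.. are the ALT alleles
    phased = "|" in gt          # '|' separates phased calls, '/' unphased ones
    bases = []
    for idx in re.split(r"[|/]", gt):
        # '.' marks a missing call and has no allele index
        bases.append("." if idx == "." else alleles[int(idx)])
    return bases, phased

# The record from the question: REF G, ALT A, three samples
ref, alts = "G", ["A"]
samples = {"NA00001": "0|0", "NA00002": "1|0", "NA00003": "1/1"}

decoded = {name: decode_genotype(gt, ref, alts) for name, gt in samples.items()}
for name, (bases, phased) in decoded.items():
    print(name, "/".join(bases), "phased" if phased else "unphased")
```

With only one ALT allele listed, the only indexes that can appear are 0 and 1, so every sample decodes to some combination of G and A.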
Yes, an SNV has a single-character REF and ALT. There is a single ALT entry for any bi-allelic variant, even indels. The length of the REF or ALT might vary, but the entry is still just one.
For example: at one specific position, the reference genotype is C and the alteration genotype is G. At this same position, NA1 genotype is CC, NA2 genotype is CG, NA3 genotype is CA, NA4 genotype is CT. So NA1 may be represented with 0/0, NA2 with 0/1, NA3 with 0/x, NA4 with 0/z. The question is: what are x and z? 0, 1, or something else?
The REF allele (not genotype) is C, and the ALT allele is not just G but, from your example, G,A,T. Alleles are numbered in increasing order from 0: REF, ALT1, ALT2 and so on. In this case, you'd have 0,1,2,3 - where 0 is the REF allele and 1, 2 and 3 are the ALT alleles in the order they are listed. So NA3 would be 0/2 and NA4 would be 0/3.
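As a rough sketch of that numbering, the snippet below builds the ALT list and the GT strings for the four samples in the example. The function encode_site is hypothetical, and it simply lists ALT alleles in the order they are first seen; a real variant caller decides the ALT order by its own rules, so treat the order here as an assumption.

```python
def encode_site(ref, sample_genotypes):
    """Build an ALT list and unphased GT strings for one site.

    Hypothetical helper: ALT alleles are listed in first-seen order,
    which is an assumption and not what any particular caller does.
    """
    alts = []
    for bases in sample_genotypes.values():
        for base in bases:
            if base != ref and base not in alts:
                alts.append(base)
    alleles = [ref] + alts      # 0 = REF, 1.. = ALT alleles in listed order
    encoded = {
        name: "/".join(str(alleles.index(base)) for base in bases)
        for name, bases in sample_genotypes.items()
    }
    return alts, encoded

# The example from the question: REF C, samples CC, CG, CA, CT
alts, encoded = encode_site("C", {"NA1": "CC", "NA2": "CG", "NA3": "CA", "NA4": "CT"})
print("ALT:", ",".join(alts))
for name, gt in encoded.items():
    print(name, gt)
```

Here the ALT column comes out as G,A,T, so x is 2 and z is 3.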
Maybe the region had only 3 alleles, or the 4th allele fell below the threshold frequency and was deemed a sequencing error rather than an actual variant. How do you know it is a known variant - what is your source of information that you're checking the VCF against?
Thank you for your reply. I have understood the problem.
Plus, I ran the same pipeline on the same dataset, but with different versions of the software, and the variant positions were seldom the same. Do you think that is normal?
You say you call variants yourself - so the VCF is the source of info on the variants, but you also mention that your samples have 4 alleles at a locus. How do you know they have 4 alleles at that locus if the VCF is your only source of information?
Are you saying that all your REFs and ALTs are either Gs or As?
No, but at this particular position that is the case. So I want to know whether G could change to any base other than A. We had many samples, but I could not find any other alteration at this position besides the one I mentioned above.