variant call format
10.1 years ago
897598644

Excuse me:

In the Variant Call Format (VCF) v4.1 specification, one example shows a variant record like this:

REF: G, ALT: A, NA00001: 0|0, NA00002: 1|0, NA00003: 1/1

Every SNP position in my variants looks like the one above, which confuses me. Is there no SNP position where the base changes to some other allele, such as C or T? We have many samples at this position, but I could not find any other alteration like the ones I mentioned above.

Thanks in advance!

Are you saying that all your REFs and ALTs are either Gs or As?

No, but at this particular position that is the case, so I want to know whether G could change to any allele other than A. We have many samples, but I could not find any other alteration at this position like the ones I mentioned above.

10.1 years ago
Ram

Short answer: No. At that position you won't see any ALT nucleotide other than the one listed, in any of your samples.

Long answer: VCF stores entries like so:

Each data line is a position in the reference genome where at least one of your samples differs from the reference. If a sample is REF/REF, you'd see 0/0; ALT/ALT is 1/1; REF/ALT is 0/1. These are the genotypes (homozygous and heterozygous). A | separator instead of / (as in 0|0 or 1|0) means the genotype is phased.
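
As a minimal Python sketch of how the GT values from the VCF 4.1 example map back to bases: the helper name `decode_gt` is hypothetical, and a real pipeline would normally use a VCF parsing library rather than splitting strings by hand. Index 0 is the REF allele, 1 is the first ALT allele, `|` marks a phased call, `/` an unphased one, and `.` a missing allele.

```python
from typing import List, Optional, Tuple


def decode_gt(gt: str, ref: str, alts: List[str]) -> Tuple[List[Optional[str]], bool]:
    """Translate a GT string such as '0|0' or '1/1' into allele sequences.

    Returns the alleles (None for a missing '.') and whether the call is phased.
    """
    phased = "|" in gt
    sep = "|" if phased else "/"
    alleles = [ref] + alts  # index 0 = REF, 1 = first ALT, 2 = second ALT, ...
    decoded = [None if idx == "." else alleles[int(idx)] for idx in gt.split(sep)]
    return decoded, phased


# REF/ALT and genotypes taken from the example record quoted in the question
for sample, gt in [("NA00001", "0|0"), ("NA00002", "1|0"), ("NA00003", "1/1")]:
    alleles, phased = decode_gt(gt, "G", ["A"])
    print(sample, gt, alleles, "phased" if phased else "unphased")
```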

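Multi-allelic variants are sites where more than two alleles (the REF plus two or more ALT alleles) are seen at the same locus across the samples. They usually have a comma-separated list of ALT alleles, although some tools split them into several bi-allelic lines instead.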

So, if you see only one REF and one ALT allele, you can rest assured that every sample is REF/REF, ALT/ALT or REF/ALT. No third nucleotide is involved at that position.
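
The reasoning above can be checked by enumerating every unphased diploid genotype a record can hold. This is a small Python sketch with a hypothetical helper `possible_genotypes`: with one ALT allele only 0/0, 0/1 and 1/1 exist, while a site with three ALT alleles already allows ten combinations. The order produced here is plain lexicographic order, not the ordering VCF uses for per-genotype fields such as GL or PL.

```python
from itertools import combinations_with_replacement
from typing import List


def possible_genotypes(n_alt: int) -> List[str]:
    """List all unphased diploid genotypes for a record with n_alt ALT alleles."""
    indices = range(n_alt + 1)  # 0 = REF, 1..n_alt = ALT alleles in ALT-column order
    return [f"{j}/{k}" for j, k in combinations_with_replacement(indices, 2)]


print(possible_genotypes(1))       # ['0/0', '0/1', '1/1'] for a bi-allelic site
print(len(possible_genotypes(3)))  # 10, e.g. REF C with ALT G,A,T
```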

If the mutation type is SNV, I see only one REF and one ALT allele. But generally speaking, I think this is incompatible with my case.

Yes, an SNV has a single-character REF and ALT. Any bi-allelic variant, even an indel, gets a single entry. The length of the REF or ALT string can vary, but there is still only one entry.
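
A rough Python sketch of telling these cases apart from the REF and ALT strings alone; `variant_type` is a hypothetical helper. It assumes plain sequence alleles with the usual VCF padding base for indels (for example REF GTC with ALT G for a deletion) and ignores symbolic alleles such as `<DEL>` as well as complex events.

```python
def variant_type(ref: str, alt: str) -> str:
    """Roughly classify one REF/ALT pair from a VCF record by allele length."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"  # several adjacent bases substituted, same length
    # Remaining cases differ in length; treat them as simple indels
    return "insertion" if len(alt) > len(ref) else "deletion"


# REF GTC with ALT alleles G and GTCT: one deletion and one insertion at the same locus
for alt in "G,GTCT".split(","):
    print("GTC", alt, variant_type("GTC", alt))
print("G", "A", variant_type("G", "A"))
```

Whichever type a REF/ALT pair is, a bi-allelic variant still occupies one line in the file.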

Also, what case are you referring to?

For example: at one specific position, the reference genotype is C and the alteration genotype is G. At the same position, NA1's genotype is CC, NA2's is CG, NA3's is CA and NA4's is CT. So NA1 may be represented as 0/0, NA2 as 0/1, NA3 as 0/x and NA4 as 0/z. The question is: what are x and z? 0, 1 or something else?

Thanks in advance!

The REF allele (not genotype) is C, and from your example the ALT alleles are not just G but G,A,T. Allele indices count up from 0 in the order REF, ALT1, ALT2, ALT3. In this case you'd have 0, 1, 2 and 3, where 0 is the REF allele and 1, 2 and 3 are the ALT alleles, so with the ALT field listed as G,A,T you get x = 2 and z = 3.
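
To make the numbering concrete, here is a Python sketch that builds the ALT list and GT strings from the bases observed in the example (NA1 CC, NA2 CG, NA3 CA, NA4 CT). The helper `build_gt` is hypothetical, and numbering ALT alleles in order of first appearance is an assumption: each variant caller decides its own ALT order, and the GT indices always follow whatever order the ALT column uses.

```python
from typing import Dict, List, Tuple


def build_gt(ref: str, observed: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Derive the ALT list and unphased GT strings from observed diploid bases."""
    alts: List[str] = []
    for bases in observed.values():
        for base in bases:
            if base != ref and base not in alts:
                alts.append(base)  # assumption: ALT order = order of first appearance
    index = {ref: 0}
    index.update({alt: i + 1 for i, alt in enumerate(alts)})
    gts = {
        sample: "/".join(str(i) for i in sorted(index[b] for b in bases))
        for sample, bases in observed.items()
    }
    return alts, gts


alts, gts = build_gt("C", {"NA1": "CC", "NA2": "CG", "NA3": "CA", "NA4": "CT"})
print("ALT =", ",".join(alts))  # ALT = G,A,T
print(gts)  # {'NA1': '0/0', 'NA2': '0/1', 'NA3': '0/2', 'NA4': '0/3'}
```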

But I only found 1 and 2, not 3, in the VCF file. What could be the reason?

Best!

Maybe the region really had only three alleles, or the fourth allele fell below the frequency threshold and was treated as a sequencing error rather than a real variant. How do you know it is a known variant? What source of information are you checking the VCF against?
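
A deliberately simplified Python sketch of the second explanation: an ALT allele is only reported if its share of the reads at the site reaches a cutoff. The function `alleles_passing`, the read counts and the 5% cutoff are all hypothetical; real callers use more elaborate statistical models and their own parameters, but a cutoff of this kind can explain why an ALT index such as 3 never shows up.

```python
from typing import Dict, List


def alleles_passing(counts: Dict[str, int], ref: str, min_fraction: float = 0.05) -> List[str]:
    """Return ALT alleles whose fraction of reads at the site is at least min_fraction.

    min_fraction = 0.05 is a hypothetical cutoff chosen only for illustration.
    """
    total = sum(counts.values())
    if total == 0:
        return []
    return [base for base, n in counts.items() if base != ref and n / total >= min_fraction]


# Hypothetical read counts at one site with REF C
counts = {"C": 500, "G": 120, "A": 60, "T": 3}
print(alleles_passing(counts, "C"))  # ['G', 'A']: T (3 of 683 reads) is dropped
```

With T filtered out, the record would list ALT as G,A, so only the indices 1 and 2 could appear in the GT column.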

I called the variants myself.

Hi 897598644,

Could you move this to a reply on my comment please? That would involve the following steps:

  • Copy the contents of your reply from this answer
  • Click on "Add Reply" on my comment here: variant call format
  • Paste the copied text
  • Click on the green "Add Comment" button

Thank you!

Thank you for your reply. I have understood the problem.

Also, I ran the same pipeline and software on the same dataset, only with different software versions, and the variant positions were seldom the same. Do you think that is normal?

Best!

That should not happen. Most of the results should overlap, with only a few variants gained or lost, especially when only the software versions changed.
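
One way to put a number on the overlap is to compare the two call sets by (CHROM, POS, REF, ALT). Below is a minimal Python sketch with hypothetical helpers `variant_keys` and `compare_call_sets`; it assumes uncompressed VCF text and does no normalization, so differences in left-alignment or multi-allelic splitting between versions will show up as mismatches. For real files, normalizing first (for example with `bcftools norm`) and comparing with `bcftools isec` is the usual route.

```python
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str]) -> Set[Key]:
    """Collect (CHROM, POS, REF, ALT) from the data lines of an uncompressed VCF."""
    keys: Set[Key] = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_call_sets(old_lines: Iterable[str], new_lines: Iterable[str]) -> Dict[str, float]:
    """Count shared and version-specific variants plus their Jaccard similarity."""
    old, new = variant_keys(old_lines), variant_keys(new_lines)
    union = old | new
    return {
        "shared": len(old & new),
        "only_old": len(old - new),
        "only_new": len(new - old),
        "jaccard": len(old & new) / len(union) if union else 1.0,
    }


# Usage with real files (the paths are hypothetical placeholders):
# with open("calls_old.vcf") as a, open("calls_new.vcf") as b:
#     print(compare_call_sets(a, b))
```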

You say you called the variants yourself, so the VCF is your source of information on the variants, but you also mention that your samples have four alleles at a locus. How do you know they have four alleles at that locus if the VCF is your only source of information?