3.3 years ago
Vic
Hello everyone,
I have a simple question about genotypes in a VCF file. The file was produced with GATK and then filtered with vcftools. Here is an example record from it:
NC_037638.1 143865 . C *,G 7980.77 . 0/0:40,0,0:40:99:.:.:0,108,1472,108,1472,1472:. 1|2:6,54,47:107:99:0|1:143858_C_*:3960,1714,1811,2016,0,2126:143858
This record has two samples. The first is homozygous for the reference allele C, represented by 0/0. The second has the genotype 1|2, meaning it carries both alternate alleles: one is an overlapping (spanning) deletion, written as *, and the other is the allele G.
Am I correct in thinking that 1 = overlapping/spanning deletion and 2 = G?
Additionally, how would the * be represented in a PED file? Would it just be coded as 0?
Many thanks!
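Regarding the ALT alleles: as far as I understand, you are reading the GT field correctly. The GT values are indexes into the allele list, with 0 for REF and 1, 2, ... for the ALT alleles in the order they are listed. So yes, 0 = C, 1 = * and 2 = G.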
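If it helps to check this on other records, here is a minimal Python sketch of that index-to-allele mapping. The function name `decode_gt` is just something I made up for illustration, not part of GATK or vcftools, and it assumes you pass in the REF, ALT and GT strings yourself rather than parsing the whole line.

```python
import re

def decode_gt(ref, alt, gt):
    """Map a VCF GT string such as '1|2' to the allele strings it indexes.

    Hypothetical helper for illustration only.
    """
    # Index 0 is REF; indexes 1, 2, ... are the ALT alleles in listed order.
    alleles = [ref] + alt.split(",")
    decoded = []
    # '/' separates unphased alleles, '|' separates phased alleles.
    for idx in re.split(r"[/|]", gt):
        # '.' marks a missing call and has no allele to look up.
        decoded.append("." if idx == "." else alleles[int(idx)])
    return decoded

# The two samples from the record above (REF = C, ALT = *,G):
print(decode_gt("C", "*,G", "0/0"))  # ['C', 'C']
print(decode_gt("C", "*,G", "1|2"))  # ['*', 'G']
```

This only decodes the genotype. It does not decide how * should be written in a PED file.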
Awesome, thanks!