Why does the AD field in my VCF file show three values?
Asked 4 weeks ago by mgranada3

I have been trying to calculate allele frequencies in my data, following the instructions I was given.

However, one variant in my sample has three values in the AD (allelic depth) field instead of two.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  FR06_AU1_150_S12
ChrA_C_glabrata_CBS138  185059  .   T   G   400.64  QD_filter   AC=1;AF=0.500;AN=2;BaseQRankSum=-5.514;DP=323;ExcessHet=0.0000;FS=51.624;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=1.71;ReadPosRankSum=-2.736;SOR=0.040;ANN=G|5_prime_UTR_variant|MODIFIER|CAGL0A01738g|CAGL0A01738g|transcript|CAGL0A01738g-T|protein_coding|1/1|c.-529T>G|||||529|,G|upstream_gene_variant|MODIFIER|CAGL0A01672g|CAGL0A01672g|transcript|CAGL0A01672g-T|protein_coding||c.-3491A>C|||||3464|,G|upstream_gene_variant|MODIFIER|CAGL0A01694g|CAGL0A01694g|transcript|CAGL0A01694g-T|protein_coding||c.-1244A>C|||||1194|,G|downstream_gene_variant|MODIFIER|CAGL0A01716g|CAGL0A01716g|transcript|CAGL0A01716g-T|protein_coding||c.*209T>G|||||89|,G|downstream_gene_variant|MODIFIER|CAGL0A01760g|CAGL0A01760g|transcript|CAGL0A01760g-T|protein_coding||c.*2197A>C|||||2145|,G|downstream_gene_variant|MODIFIER|CAGL0A01782g|CAGL0A01782g|transcript|CAGL0A01782g-T|protein_coding||c.*4607A>C|||||4338| GT:AD:DP:GQ:PL  0/1:170,64:234:99:408,0,4106
ChrA_C_glabrata_CBS138  357752  .   T   G   232.64  FS_filter;QD_filter;ReadPosRankSum_filter;SOR_filter    AC=1;AF=0.500;AN=2;BaseQRankSum=-14.586;DP=1779;ExcessHet=0.0000;FS=135.229;MLEAC=1;MLEAF=0.500;MQ=59.98;MQRankSum=-2.448;QD=0.15;ReadPosRankSum=-11.495;SOR=10.961;ANN=G|upstream_gene_variant|MODIFIER|CAGL0A03300g|CAGL0A03300g|transcript|CAGL0A03300g-T|protein_coding||c.-5428A>C|||||4544|,G|downstream_gene_variant|MODIFIER|CAGL0A03322g|CAGL0A03322g|transcript|CAGL0A03322g-T|protein_coding||c.*3236T>G|||||2944|,G|downstream_gene_variant|MODIFIER|CAGL0A03344g|CAGL0A03344g|transcript|CAGL0A03344g-T|protein_coding||c.*726T>G|||||324|,G|downstream_gene_variant|MODIFIER|CAGL0A03366g|CAGL0A03366g|transcript|CAGL0A03366g-T|protein_coding||c.*883A>C|||||482|,G|downstream_gene_variant|MODIFIER|CAGL0A03388g|CAGL0A03388g|transcript|CAGL0A03388g-T|protein_coding||c.*3644A>C|||||3475|,G|non_coding_transcript_exon_variant|MODIFIER|CAGL0A03355r|CAGL0A03355r|transcript|CAGL0A03355r-T|lincRNA|1/1|n.250T>G||||||  GT:AD:DP:GQ:PL  0/1:1304,239:1543:99:240,0,42733
ChrA_C_glabrata_CBS138  526966  .   T   G   810.64  PASS    AC=1;AF=0.500;AN=2;BaseQRankSum=-13.367;DP=484;ExcessHet=0.0000;FS=36.698;MLEAC=1;MLEAF=0.500;MQ=56.97;MQRankSum=-6.884;QD=2.00;ReadPosRankSum=9.343;SOR=0.040;ANN=G|downstream_gene_variant|MODIFIER|CAGL0A04851g|CAGL0A04851g|transcript|CAGL0A04851g-T|protein_coding||c.*3083T>G|||||3083|,G|intergenic_region|MODIFIER|CAGL0A04851g-CHR_END|CAGL0A04851g-CHR_END|intergenic_region|CAGL0A04851g-CHR_END|||n.526966T>G||||||    GT:AD:DP:GQ:PL  0/1:325,80:405:99:818,0,5895
ChrB_C_glabrata_CBS138  334637  .   C   T   80876.06    PASS    AC=2;AF=1.00;AN=2;BaseQRankSum=-1.727;DP=2189;ExcessHet=0.0000;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.99;MQRankSum=0.000;QD=27.11;ReadPosRankSum=-1.068;SOR=0.315;ANN=T|missense_variant|MODERATE|CAGL0B03399g|CAGL0B03399g|transcript|CAGL0B03399g-T|protein_coding|1/1|c.790C>T|p.His264Tyr|1091/5453|790/4995|264/1664||,T|upstream_gene_variant|MODIFIER|CAGL0B03355g|CAGL0B03355g|transcript|CAGL0B03355g-T|protein_coding||c.-4414G>A|||||4360|,T|downstream_gene_variant|MODIFIER|CAGL0B03377g|CAGL0B03377g|transcript|CAGL0B03377g-T|protein_coding||c.*1559C>T|||||1402|,T|downstream_gene_variant|MODIFIER|CAGL0B03421g|CAGL0B03421g|transcript|CAGL0B03421g-T|protein_coding||c.*4444G>A|||||4341|    GT:AD:DP:GQ:PL  1/1:1,2013:2014:99:80890,6011,0
ChrD_C_glabrata_CBS138  851 .   A   *,C 76.01   QD_filter   AC=1,1;AF=0.500,0.500;AN=2;BaseQRankSum=-5.304;DP=161;ExcessHet=0.0000;FS=11.483;MLEAC=1,1;MLEAF=0.500,0.500;MQ=42.31;MQRankSum=-3.151;QD=0.60;ReadPosRankSum=-3.681;SOR=0.053;ANN=C|downstream_gene_variant|MODIFIER|CAGL0D00143g|CAGL0D00143g|transcript|CAGL0D00143g-T|protein_coding||c.*4326T>G|||||4326|,C|intergenic_region|MODIFIER|CHR_START-CAGL0D00143g|CHR_START-CAGL0D00143g|intergenic_region|CHR_START-CAGL0D00143g|||n.851A>C|||||| GT:AD:DP:GQ:PL  1/2:23,88,16:127:73:3125,293,73,2140,0,2177

Specifically, this happens at position 851 on ChrD, where the sample shows:

GT:AD:DP:GQ:PL

1/2:23,88,16:127:73:3125,293,73,2140,0,2177
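The FORMAT column names the sample fields in order, so pairing its keys with the colon-separated sample values makes the layout explicit. Below is a minimal Python sketch; parse_sample is a hypothetical helper name and the values are copied from the ChrD record above. Assuming AD follows the VCF convention of one depth per allele (REF first, then each ALT), this record has two ALT alleles (* and C), so three AD values are expected; here they also add up to DP=127.

```python
def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair each colon-separated FORMAT key with the value at the same position."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


# Values copied from the ChrD_C_glabrata_CBS138:851 record (REF=A, ALT=*,C)
alts = "*,C".split(",")
sample = parse_sample("GT:AD:DP:GQ:PL", "1/2:23,88,16:127:73:3125,293,73,2140,0,2177")
ad = [int(x) for x in sample["AD"].split(",")]

# AD holds one depth per allele: REF first, then each ALT in the order listed
assert len(ad) == 1 + len(alts)
print(sample)
print("AD per allele:", ad, "sum =", sum(ad), "DP =", sample["DP"])
```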

Is this an error, and if not, how should the allele frequency be calculated? The formula I was told to use only takes two values into account:

[image of the formula not available]

Tags: GATK, vcf, SnpEff
Answer, 4 weeks ago:

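The ALT value *,C uses a GATK convention (also part of the VCF specification from v4.2 onward): * is the spanning-deletion allele. It means that some reads carry a C at this position, while others have no base here at all because the position lies inside a larger deletion that starts upstream. AD lists one depth per allele, REF first and then each ALT in the order given, so I think the allelic depths are 23 reads for A (REF), 88 reads for * (inside the deletion) and 16 reads for C.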
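The formula from the question is not visible, so the following is only a sketch under an assumption: that "allele frequency" here means each allele's share of the reads counted in AD, which reduces to alt / (ref + alt) when there is a single ALT. The helper name allele_fractions is hypothetical. It labels each AD value with its allele (REF first, then the ALTs in VCF order) and divides by the AD total.

```python
def allele_fractions(ref: str, alt_field: str, ad_field: str) -> dict:
    """Return each allele's share of the AD total (REF first, then ALTs in VCF order)."""
    alleles = [ref] + alt_field.split(",")
    depths = [int(d) for d in ad_field.split(",")]
    if len(alleles) != len(depths):
        raise ValueError("AD must have one value per allele (REF + each ALT)")
    total = sum(depths)
    if total == 0:
        # No informative reads: report zero rather than dividing by zero
        return {allele: 0.0 for allele in alleles}
    return {allele: depth / total for allele, depth in zip(alleles, depths)}


# ChrD_C_glabrata_CBS138:851, REF=A, ALT=*,C, AD=23,88,16
fractions = allele_fractions("A", "*,C", "23,88,16")
for allele, frac in fractions.items():
    print(f"{allele}\t{frac:.3f}")
```

Whether reads falling inside the upstream deletion (*) should count toward the denominator is a judgment call that depends on what the frequency is meant to describe; dropping the * entry before summing would give C as 16 / (23 + 16) instead.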
