Hello,
I need some suggestions on how to get the read counts for the reference and alternate alleles from a GATK-derived VCF file. In a Freebayes-derived VCF file, this information is available in the INFO field (see below):
##INFO=<ID=RO,Number=1,Type=Integer,Description="Count of full observations of the reference haplotype.">
##INFO=<ID=AO,Number=A,Type=Integer,Description="Count of full observations of this alternate haplotype.">
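For reference, here is a minimal sketch of how RO and AO could be pulled out of a Freebayes record. It assumes a plain tab-separated VCF data line and uses only the header definitions above. Because AO is declared Number=A, it holds one value per ALT allele. The helper names and the example record are hypothetical.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column (KEY=VALUE;FLAG;...) into a dict."""
    fields = {}
    for entry in info.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            fields[key] = value
        else:
            # Flag entries (no "=") are recorded as True
            fields[entry] = True
    return fields


def freebayes_counts(vcf_line: str):
    """Return (RO, [AO for each ALT allele]) from one Freebayes VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th fixed VCF column
    ref_obs = int(info["RO"])
    # AO is Number=A: one comma-separated value per ALT allele, in ALT order
    alt_obs = [int(x) for x in str(info["AO"]).split(",")]
    return ref_obs, alt_obs


# Hypothetical multi-allelic record (ALT = G,T)
line = "chr1\t1000\t.\tA\tG,T\t50\t.\tDP=30;RO=18;AO=8,4\tGT\t0/1"
print(freebayes_counts(line))  # (18, [8, 4])
```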
However, I am calling SNPs with GATK, and its VCF has no such INFO field for reference and alternate allele counts. I also noticed that the depth values in the DP field differ substantially between Freebayes and GATK.
Could you suggest how I should get the allele counts from a VCF file generated by GATK?
Also, do you have any insight into why Freebayes and GATK report such different depth values?
Thank you. I extracted the AD field for each SNP, which gives two numbers separated by a comma. I guess the first one is for the reference allele and the second one is for the alternate allele. When I compared these with the reference and alternate allele counts from Freebayes, the two estimates differ hugely. Is that expected?
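A minimal sketch of reading AD from a GATK record, assuming an uncompressed, tab-separated VCF line where the first sample is the 10th column. In the VCF specification AD is declared Number=R, so the guess above is right: the first value is the reference depth, followed by one value per ALT allele in ALT order. The function names and the example line are hypothetical.

```python
def sample_fields(vcf_line: str, sample_index: int = 0) -> dict:
    """Map FORMAT keys to one sample's values for a VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    keys = columns[8].split(":")                   # FORMAT column, e.g. GT:AD:DP:GQ:PL
    values = columns[9 + sample_index].split(":")  # first sample is the 10th column
    # zip() stops at the shorter list, so trailing fields dropped from the sample are skipped
    return dict(zip(keys, values))


def gatk_allele_depths(vcf_line: str, sample_index: int = 0):
    """Return (ref_depth, [alt_depths]) from FORMAT/AD, or None if AD is missing."""
    ad = sample_fields(vcf_line, sample_index).get("AD")
    if ad is None:
        return None
    parts = ad.split(",")
    if "." in parts:
        return None  # missing value, e.g. a no-call
    # AD is Number=R: reference depth first, then one value per ALT allele
    depths = [int(x) for x in parts]
    return depths[0], depths[1:]


line = "chr1\t1000\t.\tA\tG\t50\tPASS\tDP=25\tGT:AD:DP:GQ:PL\t0/1:10,12:25:99:300,0,250"
print(gatk_allele_depths(line))  # (10, [12])
```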
Maybe downsampling? Or reads failing QC?
But I used the same BAM files for both Freebayes and GATK. Even for SNPs called by both tools, the reference and alternate counts differ between the GATK AD field and the Freebayes RO and AO fields.
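One way to check this systematically is to join the two call sets on (CHROM, POS, REF, ALT) and put the counts side by side. The sketch below assumes single-sample, uncompressed VCFs and keeps only biallelic sites. As far as I know, Freebayes INFO RO/AO are totals across all samples, so in a multi-sample VCF you would compare against its per-sample FORMAT fields instead. Freebayes may also report complex or multi-nucleotide alleles that GATK represents differently, so an exact key match can miss some shared variants unless both files are normalized first. All function names are hypothetical.

```python
def _site_and_columns(line):
    cols = line.rstrip("\n").split("\t")
    # CHROM, POS, REF and ALT together identify a variant in both files
    return (cols[0], int(cols[1]), cols[3], cols[4]), cols


def _records(lines):
    """Yield (site, columns) for biallelic data lines, skipping headers."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        site, cols = _site_and_columns(line)
        if "," not in cols[4]:  # assumption: keep biallelic sites only
            yield site, cols


def _freebayes_ref_alt(cols):
    # Site-level RO/AO from the INFO column (8th column)
    info = dict(e.split("=", 1) for e in cols[7].split(";") if "=" in e)
    return int(info["RO"]), int(info["AO"])


def _gatk_ref_alt(cols, sample_index=0):
    # Per-sample AD from the FORMAT column; None when AD is absent or missing
    fmt = dict(zip(cols[8].split(":"), cols[9 + sample_index].split(":")))
    parts = fmt.get("AD", ".").split(",")
    if "." in parts:
        return None
    return int(parts[0]), int(parts[1])


def compare_shared_sites(freebayes_lines, gatk_lines, sample_index=0):
    """Return {site: ((RO, AO), (AD_ref, AD_alt))} for sites called by both tools."""
    fb = {site: _freebayes_ref_alt(cols) for site, cols in _records(freebayes_lines)}
    gatk = {}
    for site, cols in _records(gatk_lines):
        counts = _gatk_ref_alt(cols, sample_index)
        if counts is not None:
            gatk[site] = counts
    return {site: (fb[site], gatk[site]) for site in fb.keys() & gatk.keys()}
```

Both arguments accept any iterable of lines, so open file handles work directly, e.g. `compare_shared_sites(open("freebayes.vcf"), open("gatk.vcf"))` with hypothetical file names (use `gzip.open(path, "rt")` for `.vcf.gz`).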
I think GATK counts only informative reads in AD. Do you have a DP field? DP should be closer to what FreeBayes reports. You can also check the GATK documentation on AD and DP here: https://gatk.broadinstitute.org/hc/en-us/articles/360035532252-Allele-Depth-AD-is-lower-than-expected
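If I read the linked GATK article correctly, the per-sample FORMAT/DP also counts reads that were not informative for either allele, while AD leaves them out. Under that assumption, FORMAT/DP minus the sum of AD gives a rough count of uninformative reads at a site, which is a quick way to see how much of the gap to Freebayes comes from that. This sketch assumes a single uncompressed VCF line; the function name and example record are hypothetical.

```python
def uninformative_reads(vcf_line: str, sample_index: int = 0):
    """Compare sum(FORMAT/AD) with FORMAT/DP for one sample; None if either is missing."""
    cols = vcf_line.rstrip("\n").split("\t")
    fmt = dict(zip(cols[8].split(":"), cols[9 + sample_index].split(":")))
    ad_values = fmt.get("AD", ".").split(",")
    dp_value = fmt.get("DP", ".")
    if dp_value == "." or "." in ad_values:
        return None
    ad_total = sum(int(x) for x in ad_values)
    dp = int(dp_value)
    # Assumption: the difference approximates reads GATK considered uninformative
    return {"AD_sum": ad_total, "DP": dp, "uninformative": dp - ad_total}


line = "chr1\t1000\t.\tA\tG\t50\tPASS\tDP=31\tGT:AD:DP\t0/1:10,12:30"
print(uninformative_reads(line))  # {'AD_sum': 22, 'DP': 30, 'uninformative': 8}
```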
OK, thank you for the reply. I will check the DP field then.