Dear vg team,
I have used vg call -a to genotype variants in a number of samples based on vg giraffe alignments to a pangenome graph. I am unsure how to interpret the resulting quality scores and the genotype likelihood/probability scores in the VCF, as I have had trouble finding information on them in the vg wiki and the vg call paper. I have a few related questions that would help me decide how best to filter my call set:
1) How are the QUAL scores inferred? They do not seem to be solely based on read depth. What else do they take into account?
2) What are the criteria to receive a "PASS" FILTER flag? Is it simply having a read depth >= 4?
3) I noticed that "lowad" sites do have calls (mostly, or maybe exclusively, 0/0 and 1/1), but the genotype likelihoods are identical for hom. ref., hom. alt., and het. So how are these calls inferred? Are they just random, or based on some other read mapping information (e.g., with respect to haplotypes)?
4) Ultimately, what would be your recommendations for filtering the VCF file? Simply keeping only PASS sites, or using a certain QUAL threshold?
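To make the two options in question 4 concrete, here is a minimal sketch of both filters applied to plain VCF text lines. It relies only on the standard VCF column order (QUAL is the 6th column, FILTER the 7th). The function names and the threshold of 20 in the usage note are hypothetical placeholders for illustration, not recommended values and not part of vg.

```python
def keep_record(line, pass_only=True, min_qual=None):
    """Return True if a tab-separated VCF data line survives the chosen filter."""
    fields = line.rstrip("\n").split("\t")
    qual, filt = fields[5], fields[6]  # standard VCF columns: QUAL, FILTER
    if pass_only and filt != "PASS":
        return False
    if min_qual is not None:
        # A missing QUAL (".") cannot meet a threshold, so drop the record.
        if qual == "." or float(qual) < min_qual:
            return False
    return True


def filter_vcf_lines(lines, pass_only=True, min_qual=None):
    """Keep all header lines plus the data lines that pass the filter."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#") or keep_record(line, pass_only, min_qual):
            kept.append(line)
    return kept
```

Calling filter_vcf_lines(lines) keeps only PASS sites, filter_vcf_lines(lines, pass_only=False, min_qual=20) applies only a QUAL cutoff, and passing both arguments combines the two.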
Any help would be greatly appreciated!
Best,
Tobias