How to interpret and filter vg call output VCF
20 days ago
Tobias ▴ 40

Dear vg team,

I have used vg call -a to genotype variants in a number of samples, based on vg giraffe alignments to a pangenome graph. I am now unsure how to interpret the resulting QUAL values and the genotype likelihood/probability scores in the VCF, as I have had trouble finding information on them in the vg wiki and the vg call paper. I have a few related questions that should help me decide how best to filter my call set:

1) How are the QUAL scores inferred? They do not seem to be solely based on read depth. What else do they take into account?

2) What are the criteria to receive a "PASS" FILTER flag? Is it simply having a read depth >= 4?

3) I noticed that "lowad" sites do have calls (mostly, or maybe exclusively, 0/0 and 1/1), but their genotype likelihoods are identical for the hom-ref, het, and hom-alt genotypes. So how are these calls inferred? Are they just random, or are they based on some other read mapping information (e.g., with respect to haplotypes)?

4) Ultimately, what would be your recommendations for filtering the VCF file? Simply keeping only PASS sites, or using a certain QUAL threshold?
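
To make question 4 concrete, here is a minimal sketch of the filtering options I am weighing, written in plain Python without a VCF library. It assumes a single-sample VCF and that the genotype likelihoods sit in a FORMAT field named GL; the function names and the QUAL cutoff of 30 in the usage example are hypothetical placeholders of mine, not values taken from the vg documentation.

```python
from typing import Iterable, Iterator, Optional


def parse_record(line: str) -> dict:
    """Pull QUAL, FILTER and the first sample's FORMAT values out of a VCF data line."""
    cols = line.rstrip("\n").split("\t")
    record = {
        "qual": None if cols[5] == "." else float(cols[5]),
        "filter": cols[6],
        "sample": {},
    }
    # Column 9 (FORMAT) names the colon-separated values in the first sample column.
    if len(cols) > 9:
        record["sample"] = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return record


def has_flat_likelihoods(record: dict, field: str = "GL") -> bool:
    """True if all values in the likelihood field are equal, i.e. uninformative.

    The field name "GL" is an assumption; pass another name if the VCF differs.
    """
    raw = record["sample"].get(field)
    if raw is None:
        return False
    values = [float(v) for v in raw.split(",") if v != "."]
    return len(values) > 1 and len(set(values)) == 1


def keep(record: dict, require_pass: bool = True,
         min_qual: Optional[float] = None, drop_flat: bool = False) -> bool:
    """Decide whether a record survives the chosen combination of filters."""
    if require_pass and record["filter"] != "PASS":
        return False
    if min_qual is not None and (record["qual"] is None or record["qual"] < min_qual):
        return False
    if drop_flat and has_flat_likelihoods(record):
        return False
    return True


def filter_vcf_lines(lines: Iterable[str], **options) -> Iterator[str]:
    """Yield header lines unchanged and only those data lines that pass keep()."""
    for line in lines:
        if line.startswith("#") or keep(parse_record(line), **options):
            yield line
```

For example, `filter_vcf_lines(open("calls.vcf"), require_pass=True, min_qual=30)` would keep only PASS sites with QUAL of at least 30 (with `calls.vcf` standing in for my real file), while `drop_flat=True` would additionally discard sites like the "lowad" ones from question 3. Which of these switches makes sense is exactly what I am asking about.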

Any help would be greatly appreciated!

Best,

Tobias

graph genotyping pangenome vg VCF • 397 views