How is the QUAL score calculated in a multigenome VCF file?
8.6 years ago
MAPK ★ 2.1k

Suppose I have two VCF files with one sample in each and want to merge them using Python scripts. Should the QUAL scores from the two files be summed or averaged to get the final QUAL score? Can someone also please explain how this score differs when merging two files that contain different numbers of samples?

vcf • 3.0k views
8.6 years ago
Len Trigg ★ 1.6k

In simple terms, QUAL is defined as a (scaled, etc.) probability measure that the site is variant in any of the samples. (Note that it is therefore not a good measure of whether a particular sample in a multisample VCF is variant; for that you are probably better off with GQ.)

So, say you know the probability that the site is variant for sample A from one VCF, P(A), and similarly for the second sample B from the other VCF, P(B). What you need for the combined VCF is P(A ∪ B) = P(A) + P(B) - P(A ∩ B). See that extra term that isn't available? That's the probability that the site is variant in both samples, and it will vary according to how independent A and B are. If the samples are unrelated, P(A ∩ B) is approximately P(A)P(B), so it can be computed from the two values you already have. If the samples are highly related (e.g. same family), that approximation should definitely not be used. Variant callers like the RTG pedigree-aware callers incorporate this information into their scoring, and it can be quite hard to bolt on after the fact, so you'll have to make simplifying assumptions depending on your samples.
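As a rough illustration of the formula above, here is a minimal Python sketch. It assumes QUAL is Phred-scaled as in the VCF specification, i.e. QUAL = -10 log10 of the probability that the site is not variant; your caller may compute it differently. The function names (`qual_to_prob`, `prob_to_qual`, `merge_qual`) and the `p_both` parameter are made up for this example and do not come from any library. If `p_both` is not supplied, the samples are treated as independent, which is the simplifying assumption for unrelated samples.

```python
import math


def qual_to_prob(qual):
    """Phred-scaled QUAL -> probability that the site is variant.

    Assumes QUAL = -10 * log10(P(site is not variant)).
    """
    return 1.0 - 10.0 ** (-qual / 10.0)


def prob_to_qual(p_variant):
    """Probability that the site is variant -> Phred-scaled QUAL."""
    p_not_variant = 1.0 - p_variant
    if p_not_variant <= 0.0:
        # Floating point cannot represent the remaining error probability.
        return float("inf")
    return -10.0 * math.log10(p_not_variant)


def merge_qual(qual_a, qual_b, p_both=None):
    """Combine two single-sample QUALs via P(A u B) = P(A) + P(B) - P(A n B).

    p_both is P(A n B). If it is None, the samples are ASSUMED independent,
    so P(A n B) = P(A) * P(B). That assumption is not valid for relatives.
    """
    p_a = qual_to_prob(qual_a)
    p_b = qual_to_prob(qual_b)
    if p_both is None:
        p_both = p_a * p_b  # independence assumption
    p_union = min(p_a + p_b - p_both, 1.0)
    return prob_to_qual(p_union)


if __name__ == "__main__":
    # Unrelated samples (independence assumed).
    print(merge_qual(30.0, 40.0))
    # Extreme dependence: B is variant whenever A is, so P(A n B) = P(A).
    print(merge_qual(30.0, 40.0, p_both=qual_to_prob(30.0)))
```

Under these assumptions, the independent case works out to the sum of the two QUALs (about 70 here), because the probabilities of "not variant" multiply. The fully dependent case collapses to the larger of the two (about 40 here). So for positively correlated samples, a merged QUAL should land somewhere between max and sum, and averaging is not supported by either extreme. Note that for very large QUAL values the probability arithmetic loses precision in floating point; in the independent case you can simply add the QUALs instead.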
