What is the ExcessHet key in the INFO field of a VCF file?
5.2 years ago
Kash ▴ 110

Hi,

The following is a single line from a VCF file I have (there are about 90 samples, so I omitted the per-sample columns):

chr1    3463    .       C       T       59.40   .    AC=2;AF=0.143;AN=14;DP=13;ExcessHet=0.1703;FS=0.000;MLEAC=2;MLEAF=0.143;MQ=26.38;QD=29.70;SOR=2.303      GT:AD:DP:GQ:PGT:PID:PL
  1. I would like to know what information is given by the ExcessHet key in the INFO field of a VCF file.
  2. How can I use this key to filter out positions with excess heterozygotes? Is there a cutoff value that I need to use for this?
SNP next-gen sequencing • 4.1k views

I would like to know what information is given by the ExcessHet key under INFO field in a VCF file.

It should be defined in the VCF header, in the ##INFO line for ExcessHet.
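If you would rather not scroll through the header by hand, here is a rough sketch of pulling the INFO descriptions out of the meta-lines. The function name `info_descriptions` and the example header are made up for illustration (the Number/Type values are an assumption), so check them against your own file.

```python
import re

def info_descriptions(vcf_lines):
    """Map each INFO ID to its Description, read from '##INFO=<...>' meta-lines."""
    pattern = re.compile(r'^##INFO=<ID=([^,]+),.*Description="([^"]*)"')
    found = {}
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta-lines stop at the '#CHROM' column header
        match = pattern.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found

# Illustrative header only; your file's header will contain many more lines.
example_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=ExcessHet,Number=1,Type=Float,'
    'Description="Phred-scaled p-value for exact test of excess heterozygosity">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
print(info_descriptions(example_header)["ExcessHet"])
```

For a real file you would pass the open file handle (or the lines of a decompressed .vcf.gz) instead of `example_header`.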

5.2 years ago
Dave Carlson ★ 1.9k

I just checked a VCF file, and the header indicates that the ExcessHet field is the

Phred-scaled p-value for exact test of excess heterozygosity

This VCF was created using GATK's HaplotypeCaller, so you can find more information about the ExcessHet statistic here.

Since the ExcessHet score is Phred-scaled (that is, -10 × log10 of the p-value), a low score (e.g. zero) indicates a high p-value, while a large score indicates a low p-value. This GATK forum post (where I got the above information) has some nice suggestions for options you could use when filtering on ExcessHet.
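To make the Phred scaling concrete, here is a small sketch that converts the ExcessHet value from the line in the question back to a p-value. The helper names `parse_info` and `phred_to_p` are my own, not part of any VCF library.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    result = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result

def phred_to_p(phred):
    """Invert the Phred scaling: p = 10 ** (-Q / 10)."""
    return 10 ** (-phred / 10)

# INFO column copied from the record in the question.
info = parse_info(
    "AC=2;AF=0.143;AN=14;DP=13;ExcessHet=0.1703;FS=0.000;"
    "MLEAC=2;MLEAF=0.143;MQ=26.38;QD=29.70;SOR=2.303"
)
excess_het = float(info["ExcessHet"])
p_value = phred_to_p(excess_het)
print(f"ExcessHet={excess_het} -> p = {p_value:.2f}")
```

An ExcessHet of 0.1703 works out to a p-value of roughly 0.96, so this site shows no evidence of excess heterozygosity.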


Thank you. My understanding is that I need to look at the distribution of the ExcessHet values in my VCF file and then decide on a z-score. Based on that z-score, I can find the ExcessHet cutoff value. Please let me know if I am wrong.
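One way to read the z-score idea, sketched below: pick a z-score, take the one-sided lower-tail probability of a standard normal at that z-score as the p-value, and Phred-scale it to get the ExcessHet cutoff. The one-sided normal tail is an assumption on my part, and `phred_cutoff_from_z` is a made-up name. If I recall correctly, the GATK hard-filtering guide uses a z-score of -4.5, which gives a cutoff of about 54.69.

```python
import math

def phred_cutoff_from_z(z):
    """Phred-scaled p-value for the lower tail of a standard normal at z."""
    # Standard normal CDF written via the complementary error function.
    p = 0.5 * math.erfc(-z / math.sqrt(2))
    return -10 * math.log10(p)

cutoff = phred_cutoff_from_z(-4.5)
print(f"z = -4.5 -> ExcessHet cutoff = {cutoff:.2f}")
```

Sites with ExcessHet above the cutoff would then be the ones flagged for excess heterozygosity.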


Assuming for the moment that you do want to filter based on ExcessHet, then yes, that sounds about right to me.

Here is some more information from the Broad that might be helpful:

https://gatkforums.broadinstitute.org/gatk/discussion/23216/how-to-filter-variants-either-with-vqsr-or-by-hard-filtering

Notably:

ExcessHet filtering applies only to callsets with a large number of samples, e.g. hundreds of unrelated samples. Small cohorts should not trigger ExcessHet filtering as values should remain small. Note cohorts of consanguinous samples will inflate ExcessHet, and it is possible to limit the annotation to founders for such cohorts by providing a pedigree file during variant calling.
