Identifying Heterozygous Variants in VCF file using vcftools or bcftools
6.7 years ago
oars ▴ 200

To count the heterozygous variants in my .vcf file, I ran the following vcftools command on Linux:

 $ vcftools --vcf SRR1611183.gatk.vcf --het

It returned the following:

INDV     O(HOM)  E(HOM)  N_SITES  F
NA12878  0       0.0     1101     0.00000

This is the same as the 'het_RA_count' => 1101 found in the vcf-stats feature; however, vcf-stats also lists 'het_AA_count' => 2, which is not included in the --het command. If I'm seeking the total number of heterozygous variants should I just total them up (to 1103)? Perhaps het_AA is not truly heterozygous?

I'm open to a bcftools solution if it's more straightforward.

Heterozygous vcftools bcftools • 7.8k views
6.7 years ago

You can try my 'independent' script and see what it returns: A: calculate Per variant Heterozygosity from VCF file

That just counts the occurrences of 0/1, 1/0, 0|1 and 1|0 per line. The total number of heterozygous calls in the file is the sum of the final output column.
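For illustration, here is a minimal Python sketch of the same counting idea. It is not the linked script; the function names `het_class` and `count_hets` are hypothetical. It assumes an uncompressed, tab-delimited VCF with diploid calls and a GT entry in the FORMAT column. It also assumes that vcf-stats' 'het_AA' means a call with two different alternate alleles (for example 1/2), which a search for 0/1-style patterns alone would not pick up, so the two kinds are tallied separately.

```python
import re


def het_class(gt):
    """Classify a diploid GT string as 'het_RA', 'het_AA', or None.

    'het_RA' = one reference and one alternate allele (e.g. 0/1, 1|0).
    'het_AA' = two different alternate alleles (e.g. 1/2); this reading
    of the vcf-stats label is an assumption.
    """
    alleles = re.split(r"[/|]", gt)
    # Skip non-diploid calls and calls with a missing allele ('.').
    if len(alleles) != 2 or "." in alleles:
        return None
    a, b = alleles
    if a == b:
        return None  # homozygous
    return "het_RA" if "0" in (a, b) else "het_AA"


def count_hets(vcf_lines):
    """Tally heterozygous calls over all samples in an iterable of VCF lines."""
    counts = {"het_RA": 0, "het_AA": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no genotype columns
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            if gt_idx >= len(parts):
                continue
            kind = het_class(parts[gt_idx])
            if kind is not None:
                counts[kind] += 1
    return counts


# Small made-up records, purely for demonstration.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1|0:9",
    "chr1\t300\t.\tG\tA,T\t50\tPASS\t.\tGT:DP\t1/2:15",
    "chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT:DP\t1/1:20",
    "chr1\t500\t.\tA\tC\t50\tPASS\t.\tGT:DP\t./.:0",
]
print(count_hets(demo))  # {'het_RA': 2, 'het_AA': 1}
```

To run it on a real file, pass an open file handle instead of the demo list, e.g. `with open("SRR1611183.gatk.vcf") as fh: print(count_hets(fh))`. Whether to add the two tallies together depends on whether calls with two different alternate alleles should count as heterozygous for your purposes.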

Another, more complex version of this is here: A: How to get sample names and genotype for SNP in multi-sample VCF file

Note that you should indeed be using BCFtools and not VCFtools. I believe that this advisory is even given by the VCFtools author (Adam).

Kevin


Kevin - this is brilliant! Works perfectly; all I needed to do was sum up the 1's in Excel.


Great - happy boating (oars... boats...)
