How to compute heterozygosity for each individual?
Currently I use:
plink --bfile file --hardy --out file --noweb
but I need .hwe files for each individual separately in a dataset of hundreds of individuals.
--hardy requires many individuals to judge deviation from Hardy-Weinberg equilibrium. Single-individual .hwe files will not be useful.
If you are trying to perform quality control on samples, consider plotting top principal components (computed with EIGENSOFT 6, or plink --pca) and removing extreme outliers.
VCFtools has a --het option, for example:
vcftools --vcf input.vcf --het --out output
The manual describes it as follows:
--het
Calculates a measure of heterozygosity on a per-individual basis. Specifically, the inbreeding coefficient, F, is estimated for each individual using a method of moments. The resulting file has the suffix ".het".
Check the full manual: https://vcftools.github.io/man_latest.html
Check this post if you have difficulty interpreting the results: "Is the heterozygosity flag (--het) in vcftools calculate observed and expected heterozygosity?"
I need to compute heterozygosity per individual from genotype data. Not for quality control.
Ah. --het (https://www.cog-genomics.org/plink2/basic_stats#ibc) should come in handy, then.
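The .het file reports homozygous genotype counts, not heterozygosity rates, so a small conversion step is needed. Here is a minimal sketch, assuming the plink 1.9 column layout (FID, IID, O(HOM), E(HOM), N(NM), F) with one header line. The function name het_rates and the file names are made up for illustration.

```shell
# Convert plink --het output (homozygous counts) into heterozygosity rates.
# Assumed input columns: FID IID O(HOM) E(HOM) N(NM) F, one header line.
# Output columns (tab-separated): FID IID observed_het expected_het
het_rates() {
    awk 'NR > 1 && $5 > 0 {
        obs_het = ($5 - $3) / $5   # (non-missing - observed homozygous) / non-missing
        exp_het = ($5 - $4) / $5   # (non-missing - expected homozygous) / non-missing
        printf "%s\t%s\t%.4f\t%.4f\n", $1, $2, obs_het, exp_het
    }' "$1"
}
# Usage (hypothetical file names): het_rates file.het > file.het_rates.tsv
```

Samples with no non-missing genotypes are skipped to avoid dividing by zero.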
Thanks. Two problems:
1. --het computes observed and expected autosomal homozygous genotype counts for each sample. However, I need heterozygosity for the X chromosome too (separately). Running --chr 23 --het gives zero o(hom) and e(hom). My sample is females only.
2. --het gives an observed heterozygosity of 40% for my example individual, while if I calculate the average o(het) of the autosomal SNPs of the same individual from a .hwe file, o(het) is 30%. For expected heterozygosity the numbers are 36% vs 16%. Thus, the two methods yield very different answers, although if I understand correctly, they should measure the same thing.
As a context: the goal is to compare heterozygosity of X and autosomes between groups, and in order to do a t-test, I need X and autosomal het per individual. I already have average X and autosomal het per group.
It's a dirty hack, but if your .bim file has numeric chromosome codes, you can force the X chromosome to be treated like an autosome by specifying a species with more chromosomes (e.g. "--dog").
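A rough sketch of that workaround, assuming the X chromosome is coded 23 in the .bim file and plink 1.9 is on the PATH. The wrapper function x_het and the PLINK variable are hypothetical names used only for illustration. With --dog, code 23 falls inside the autosome range, so --het no longer skips it.

```shell
# Run --het on chromosome code 23 while telling plink the species is dog,
# so that code 23 is treated as an autosome instead of as X.
# $1 = input bfile prefix, $2 = output prefix.
# PLINK can be overridden to point at a specific binary (assumption: defaults to "plink").
x_het() {
    "${PLINK:-plink}" --bfile "$1" --dog --chr 23 --het --out "$2"
}
# Usage (hypothetical prefixes): x_het file file_chrX
```

Since this treats X genotypes as diploid autosomal calls, it only makes sense for an all-female sample such as the one described here.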
That works, but I still need to compute this using --hardy too, per individual. Creating a text file with one person and using --keep myfile.txt is not possible because of the high number of individuals.
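One way to avoid per-person --keep files is to skip --hardy altogether. For a single individual, the average o(het) across SNPs is just the fraction of non-missing genotypes that are heterozygous, and that can be counted directly from an additive-coded export. This is a different technique from --hardy, not a drop-in replacement. The sketch below assumes a .raw file made with something like plink --bfile file --recode A --out file (add --chr 23 for X only), where columns 1-6 are FID, IID, PAT, MAT, SEX, PHENOTYPE and genotype columns hold 0, 1, 2 or NA. The function name per_individual_het is made up for illustration.

```shell
# Per-individual observed heterozygosity from a plink additive-coded .raw file.
# Assumed layout: header line, then FID IID PAT MAT SEX PHENOTYPE followed by
# one column per SNP holding 0, 1, 2 (allele count) or NA (missing).
# Output columns (tab-separated): FID IID n_het n_called het_rate
per_individual_het() {
    awk 'NR > 1 {
        het = 0; called = 0
        for (i = 7; i <= NF; i++) {
            if ($i == "NA") continue   # skip missing genotypes
            called++
            if ($i == 1) het++         # allele count 1 means heterozygous
        }
        if (called > 0)
            printf "%s\t%s\t%d\t%d\t%.4f\n", $1, $2, het, called, het / called
    }' "$1"
}
# Usage (hypothetical file names): per_individual_het file.raw > file.ind_het.tsv
```

This handles every individual in one pass, so the number of samples is not a problem. Running it once on an autosomal export and once on an X export gives the two per-individual values needed for the t-test.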