I want to use vcftools to calculate nucleotide diversity across a set of individuals in a VCF file with the --site-pi and --window-pi options. However, my VCF, like any other, has missing data at some genomic sites. I want to know, but can't seem to find out, how vcftools accounts for missing data when calculating nucleotide diversity. If 5% of individuals have a missing genotype at position 1000 on chromosome 1, does vcftools throw the entire site out of the calculation? Does it calculate nucleotide diversity from only the non-missing genotypes at that site (which is what I want)? Or does it count the missing genotypes as variants?
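To make the behaviour I'm hoping for concrete, here is a minimal Python sketch of per-site nucleotide diversity computed from non-missing alleles only. This is not vcftools' code and I don't know whether vcftools does the same thing; the function name `site_pi` and the input format (a list of GT strings for one site) are just my assumptions for illustration.

```python
def site_pi(genotypes):
    """Per-site nucleotide diversity using only non-missing alleles.

    genotypes: GT strings for one site, e.g. ["0/1", "1|1", "./."].
    Returns None when fewer than two alleles are called.
    """
    # Flatten genotypes into allele calls, skipping missing ones (".").
    alleles = [
        allele
        for gt in genotypes
        for allele in gt.replace("|", "/").split("/")
        if allele != "."
    ]
    n = len(alleles)
    if n < 2:
        return None  # no pairwise comparison is possible

    # Count how often each allele was observed.
    counts = {}
    for allele in alleles:
        counts[allele] = counts.get(allele, 0) + 1

    # pi = fraction of allele pairs that differ
    #    = 1 - (pairs that match) / (all pairs)
    matching_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - matching_pairs / (n * (n - 1))


# Three called diploid individuals and one missing individual:
# the missing genotype is ignored, so n = 6 alleles rather than 8.
print(site_pi(["0/0", "0/1", "1/1", "./."]))
```

Here the individual with the missing genotype simply drops out of the denominator for that site, rather than the whole site being discarded or the missing calls being treated as a separate allele.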