I have a VCF file with all genomic sites, including invariant ones, which I generated with the --ALL-SITES option in GATK GenotypeGVCFs. I want to filter that file for quality, coverage, etc., but the VCFtools command below excludes all invariant sites. Why? It doesn't seem like it should. I want to keep them (if they pass the filters).
module load vcftools/0.1.14/INTEL-18.0.0
vcftools --gzvcf raw_unfiltered_ALLSITES.vcf --max-missing 0.5 --minQ 30 --minDP 5 --recode --recode-INFO-all --out temp
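One way to see which filter is removing the invariant sites is to count, before filtering, how many invariant records have a missing or low QUAL. The helper below is a hypothetical sketch (`count_invariant_qual` is not part of vcftools or GATK). It assumes the file is gzipped, as --gzvcf implies, and that invariant sites are written with ALT = ".", so check a few records of your own file first. Records with a missing QUAL (".") are counted separately, because how --minQ treats them is worth checking rather than assuming.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: count invariant records (ALT == ".") in a gzipped
# VCF and how many of them have a missing QUAL or a QUAL below a threshold.
# Assumption: invariant sites carry ALT = "." (verify on your own file).
count_invariant_qual() (
  vcf_gz=$1
  min_q=${2:-30}
  gzip -dc "$vcf_gz" | awk -F'\t' -v minq="$min_q" '
    /^#/ { next }                      # skip meta and header lines
    $5 == "." {                        # column 5 is ALT
      inv++
      if ($6 == ".") missing++         # column 6 is QUAL; "." means missing
      else if ($6 + 0 < minq + 0) low++
    }
    END { printf "invariant=%d qual_missing=%d qual_below=%d\n", inv, missing, low }'
)
```

Usage: `count_invariant_qual raw_unfiltered_ALLSITES.vcf 30`. If most invariant records land in qual_missing or qual_below, the --minQ 30 threshold alone would explain why they disappear.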
Consider using bcftools instead. VCFtools has not been updated for a long time, and it won't be updated (according to its author). bcftools behaves more sensibly than VCFtools.
Could you describe in more detail why you want to keep invariant sites in a variant call file? I think VCFs and vcftools are designed on the assumption that these files describe variants.
I too came across this thread looking for the same answer. As for why: pixy (https://pixy.readthedocs.io/en/latest/about.html) requires invariant sites (https://pixy.readthedocs.io/en/latest/generating_invar/generating_invar.html#generating-allsites-vcfs-using-gatk), and once you have the all-sites output file, it needs to be filtered. (Yes, I could have, and should have, filtered all the g.vcf files first.) But the GATK GenomicsDBImport step took weeks to run, so I don't want to have to go back a step.
I will look into bcftools as an alternative.
It seems "--minQ 30" filtered out most of your invariant sites. QUAL has a different meaning for invariant sites than it does for variant sites, so a threshold chosen for variants is not appropriate for them. You should follow the pixy guide: split the all-sites VCF into invariant.vcf and variant.vcf and filter them separately.
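The split-filter-merge approach from the pixy guide can be sketched as one shell function. This is an illustration under assumptions, not pixy's own script: `filter_allsites_vcf` is a hypothetical name, vcftools, bgzip, tabix and bcftools are assumed to be on PATH, and the thresholds are just the ones from the question, with --minQ applied only to the variant half. Following the pixy documentation, invariant sites are selected with --max-maf 0 (no minor allele) and variant sites with --mac 1 (at least one copy of the minor allele).

```bash
#!/usr/bin/env bash
# Hypothetical sketch: split an all-sites VCF into invariant and variant sites,
# filter each half with its own settings, then merge them back into one file.
# Thresholds are placeholders copied from the question, not recommendations.
filter_allsites_vcf() (
  set -euo pipefail
  in_vcf=$1      # gzipped all-sites VCF from GenotypeGVCFs
  prefix=$2      # output prefix

  # Invariant sites: no minor allele. QUAL is deliberately not thresholded here.
  vcftools --gzvcf "$in_vcf" --max-maf 0 \
    --max-missing 0.5 --minDP 5 \
    --recode --recode-INFO-all --stdout | bgzip -c > "${prefix}_invariant.vcf.gz"

  # Variant sites: at least one minor allele copy; QUAL filter applies here only.
  vcftools --gzvcf "$in_vcf" --mac 1 \
    --max-missing 0.5 --minQ 30 --minDP 5 \
    --recode --recode-INFO-all --stdout | bgzip -c > "${prefix}_variant.vcf.gz"

  # concat --allow-overlaps needs indexed inputs.
  tabix -p vcf "${prefix}_invariant.vcf.gz"
  tabix -p vcf "${prefix}_variant.vcf.gz"

  # Interleave the two position-sorted halves into a single all-sites VCF.
  bcftools concat --allow-overlaps \
    "${prefix}_variant.vcf.gz" "${prefix}_invariant.vcf.gz" \
    -O z -o "${prefix}_filtered.vcf.gz"
  tabix -p vcf "${prefix}_filtered.vcf.gz"
)
```

Usage: `filter_allsites_vcf raw_unfiltered_ALLSITES.vcf temp` writes temp_invariant.vcf.gz, temp_variant.vcf.gz and the merged temp_filtered.vcf.gz. Note that with --max-maf 0 a site where every sample is homozygous for the ALT allele also counts as invariant; that is how the pixy guide splits the file, but it is worth knowing when choosing the per-half filters.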