Is there a quicker way of counting the number of positions in a VCF file using bcftools query?
When I run
bcftools query -l <file>.vcf.gz | wc
it returns the number of samples very quickly. However, when I run bcftools query -f '%POS\n' <file>.vcf.gz | wc, as described here: https://samtools.github.io/bcftools/howtos/query.html, it takes forever.
bcftools query -l needs to read only the header lines, whereas the latter query needs to parse the whole VCF file, so naturally it takes a lot longer than the header-only command.
You could try zgrep "^[^#]" file.vcf.gz | awk '{print $2}' instead, which might be a little faster.
Thanks for the suggestion, but it doesn't seem to be any quicker unfortunately. My file is huge, so I guess this is to be expected!
Indeed. VCF files are heavy and take a while to parse.
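One thing that may help if you only need the count rather than the positions themselves: when the file is bgzip-compressed and has a .csi or .tbi index next to it (an assumption, since the thread doesn't say whether yours is indexed), bcftools index -n prints the number of records from the index without reading the whole file. Below is a rough sketch of that idea with a fallback to a full scan. count_vcf_records is just a hypothetical helper name, and an index written by an older tool may not carry the record count, in which case the fast path fails and the slow path is used.

```shell
# Hypothetical helper: print the number of variant records in a compressed VCF.
count_vcf_records() {
    vcf="$1"

    # Fast path (assumes bcftools is installed and an index sits next to the file):
    # the record count is read from the index, so the VCF body is never scanned.
    if command -v bcftools >/dev/null 2>&1 && { [ -f "$vcf.csi" ] || [ -f "$vcf.tbi" ]; }; then
        bcftools index -n "$vcf" && return 0
    fi

    # Slow path: decompress everything and count the lines that do not start with '#'.
    # This still reads the whole file, so expect it to take a while on a huge VCF.
    gzip -dc "$vcf" | grep -c '^[^#]'
}

# Usage (file name is a placeholder):
#   count_vcf_records file.vcf.gz
```

Note that this gives the number of records, which is the same as the number of lines printed by bcftools query -f '%POS\n'. If you need the number of distinct positions, you would still have to scan the file.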