Quick way of returning list of positions from vcf file with bcftools query
4.7 years ago

Hi,

Is there a quicker way of extracting the number of positions from a VCF file using bcftools query?

When I run bcftools query -l <file>.vcf.gz | wc, it returns the list/number of samples very rapidly. However, when I run bcftools query -f '%POS\n' <file>.vcf.gz | wc, as described here: https://samtools.github.io/bcftools/howtos/query.html, it takes forever.

Thanks

vcf bcftools
4.7 years ago
Ram 44k

bcftools query -l only needs to read the header lines, whereas bcftools query -f '%POS\n' has to decompress and parse every record in the whole VCF file, so naturally it takes much longer than the header-only command.
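To make the difference concrete, here is a minimal Python sketch (not how bcftools itself is implemented) of a header-only read: it returns the sample IDs from the #CHROM line and stops there, so its cost does not depend on how many variant records follow. The function name sample_names and the toy VCF content are hypothetical, chosen only for illustration.

```python
import gzip
import os
import tempfile


def sample_names(path):
    """Return sample IDs from the #CHROM header line of a (b)gzipped VCF.

    Only header lines are read: the loop returns as soon as it reaches
    #CHROM, so the cost does not grow with the number of variant records.
    """
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT,
                # so sample IDs start at index 9
                return line.rstrip("\n").split("\t")[9:]
            break  # reached a data line without finding #CHROM
    return []


if __name__ == "__main__":
    # Hypothetical toy VCF written to a temporary gzip file
    toy = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
        "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n"
    )
    with tempfile.NamedTemporaryFile(suffix=".vcf.gz", delete=False) as tmp:
        tmp.write(gzip.compress(toy.encode()))
    print(sample_names(tmp.name))  # ['S1', 'S2']
    os.remove(tmp.name)
```

Python's gzip module reads multi-member gzip streams, so a bgzip-compressed .vcf.gz should open the same way as a plain gzip file.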

You could try zgrep "^[^#]" file.vcf.gz | awk '{print $2}' (append | wc -l if you only need the count); that might be a little faster.
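As a rough illustration of why any such pipeline still scales with file size, here is a minimal Python sketch of the same idea as zgrep "^[^#]" | awk '{print $2}': every line must be decompressed and inspected before its POS column can be emitted. The function name record_positions and the toy data are hypothetical; this is not a faster replacement for bcftools.

```python
import gzip
import os
import tempfile


def record_positions(path):
    """Yield POS for every data line of a (b)gzipped VCF.

    Rough equivalent of: zgrep "^[^#]" file.vcf.gz | awk '{print $2}'
    Every record has to be decompressed and scanned, so run time grows
    with file size regardless of which tool does the scanning.
    """
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            yield int(line.split("\t", 2)[1])  # POS is the 2nd column


if __name__ == "__main__":
    # Hypothetical sites-only toy VCF
    toy = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "1\t100\t.\tA\tG\t.\tPASS\t.\n"
        "1\t250\t.\tC\tT\t.\tPASS\t.\n"
    )
    with tempfile.NamedTemporaryFile(suffix=".vcf.gz", delete=False) as tmp:
        tmp.write(gzip.compress(toy.encode()))
    positions = list(record_positions(tmp.name))
    print(positions)       # [100, 250]
    print(len(positions))  # 2, the same count wc -l would give
    os.remove(tmp.name)
```

Using a generator keeps memory flat on a huge file when only the count is needed, e.g. sum(1 for _ in record_positions(path)).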


Thanks for the suggestion, but it doesn't seem to be any quicker unfortunately. My file is huge, so I guess this is to be expected!


Indeed. VCF files are heavy and take a while to parse.
