Hi,
Apologies for the newbie question! I am trying to get some summary numbers from my VCF file. I want:
- Summary numbers for all variants that PASS and are INDELS (per sample)
- Summary numbers for all variants that PASS and are SNVs (per sample)
I tried the command:
bcftools query -i 'FILTER=="PASS" && TYPE=="INDEL"' vcf_file.vcf > pass.indel.vcf
but that seems to fail: instead of running, bcftools prints its usage message.
EDIT (to include error(?)/console output):
About: Extracts fields from VCF/BCF file and prints them in user-defined format
Usage: bcftools query [options] <A.vcf.gz> [<B.vcf.gz> [...]]
Options:
-e, --exclude <expr> exclude sites for which the expression is true (see man page for details)
-f, --format <string> see man page for details
-H, --print-header print header
-i, --include <expr> select sites for which the expression is true (see man page for details)
-l, --list-samples print the list of samples and exit
-o, --output-file <file> output file name [stdout]
-r, --regions <region> restrict to comma-separated list of regions
-R, --regions-file <file> restrict to regions listed in a file
-s, --samples <list> list of samples to include
-S, --samples-file <file> file of samples to include
-t, --targets <region> similar to -r but streams rather than index-jumps
-T, --targets-file <file> similar to -R but streams rather than index-jumps
-u, --allow-undef-tags print "." for undefined tags
-v, --vcf-list <file> process multiple VCFs listed in the file
Examples:
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n' file.vcf.gz
So, my questions:
What is the mistake in this query?
Would piping it to 'bcftools stats' (instead of writing to a file) give the per-sample count (for indels), or is there an easier way to get that info?
How should I rework the query to select all SNVs (instead of the indels)?
Many thanks!
What is the error? Can you add it to the question?
It's probably that bcftools query expects a format option. query is a formatter after all.
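To make that concrete, here is a rough sketch of how the three questions might be handled. It assumes the usage message appeared because -f was missing, as suggested above. The function names are made up for illustration, the format string is borrowed from the example in the usage output, and the lowercase type names (indel, snp) follow the spelling in the bcftools manual. The per-sample part relies on the PSC section that bcftools stats prints when run with -s -. Check the column meanings against the header lines of your own stats output rather than trusting this sketch.

```shell
# Sketch only: function names are hypothetical; pass your own VCF as $1.

# 1. query needs a format string (-f). This prints one tab-separated
#    line per PASS indel site. The output is plain text, not VCF.
pass_indel_sites() {
    bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' \
        -i 'FILTER="PASS" && TYPE="indel"' "$1"
}

# 2. Same idea for SNVs: only the TYPE in the expression changes.
pass_snv_sites() {
    bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' \
        -i 'FILTER="PASS" && TYPE="snp"' "$1"
}

# 3. Per-sample counts without an intermediate file:
#    -f PASS keeps only PASS records, -s - includes all samples,
#    and the PSC rows hold the per-sample counts.
pass_per_sample_counts() {
    bcftools stats -f PASS -s - "$1" | grep '^PSC'
}

# Usage (file name taken from the question):
#   pass_indel_sites vcf_file.vcf > pass.indel.tsv
#   pass_snv_sites vcf_file.vcf > pass.snv.tsv
#   pass_per_sample_counts vcf_file.vcf
```

If you want the filtered records back as VCF rather than as a table, bcftools view accepts the same -i expression and writes VCF, so it is probably a better fit than query for a file named pass.indel.vcf.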