bcftools filtering error
6.3 years ago
bsmith030465 ▴ 240

Hi,

Apologies for the newbie question! I was trying to get some summary numbers from my vcf file. I wanted:

  1. Summary numbers for all variants that PASS and are INDELS (per sample)
  2. Summary numbers for all variants that PASS and are SNVs (per sample)

I tried the command:

bcftools query -i 'FILTER=="PASS" && TYPE=="INDEL"' vcf_file.vcf > pass.indel.vcf

but that seems to give an error (usage document pops up).

EDIT (to include error(?)/console output):

About:   Extracts fields from VCF/BCF file and prints them in user-defined format
Usage:   bcftools query [options] <A.vcf.gz> [<B.vcf.gz> [...]]

Options:
-e, --exclude <expr>              exclude sites for which the expression is true (see man page for details)
-f, --format <string>             see man page for details
-H, --print-header                print header
-i, --include <expr>              select sites for which the expression is true (see man page for details)
-l, --list-samples                print the list of samples and exit
-o, --output-file <file>          output file name [stdout]
-r, --regions <region>            restrict to comma-separated list of regions
-R, --regions-file <file>         restrict to regions listed in a file
-s, --samples <list>              list of samples to include
-S, --samples-file <file>         file of samples to include
-t, --targets <region>            similar to -r but streams rather than index-jumps
-T, --targets-file <file>         similar to -R but streams rather than index-jumps
-u, --allow-undef-tags            print "." for undefined tags
-v, --vcf-list <file>             process multiple VCFs listed in the file

Examples:
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n' file.vcf.gz

So, my questions:

  1. What is the mistake in this query?

  2. Would piping it to 'bcftools stats' (instead of writing to a file) give the sample-wise count (for indels), or is there an easier way to get that info?

  3. How should I rework the query to include all SNVs (instead of the indels)?

many thanks!

next-gen samtools bcftools • 4.3k views

What is the error? Can you add it to the question?


It's probably that bcftools query expects a format option. query is a formatter after all.
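To make that concrete, here is a minimal sketch of the original command with a format string added. `list_pass_by_type` is a hypothetical helper name, the chosen output columns are only an example, and the lowercase type names (`indel`, `snp`) follow the spelling used in the bcftools expression documentation. Note that `bcftools query` prints plain text rather than VCF, so the output is better named `.tsv` than `.vcf`.

```shell
# Sketch only: list_pass_by_type is a hypothetical helper, not part of bcftools.
list_pass_by_type() {
    # $1 = variant type as written in bcftools expressions: "indel" or "snp"
    # $2 = input VCF/BCF file
    bcftools query \
        -i "FILTER=\"PASS\" && TYPE=\"$1\"" \
        -f '%CHROM\t%POS\t%REF\t%ALT\n' \
        "$2"
}

# Usage (one tab-separated line per matching site):
#   list_pass_by_type indel vcf_file.vcf > pass.indel.tsv
#   list_pass_by_type snp   vcf_file.vcf > pass.snp.tsv
```

Swapping `indel` for `snp` in the first argument would cover the SNV case from the question.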

6.3 years ago
Ram 44k

Why are you using query? You are trying to subset variants, so you should be using bcftools view. The -v option lets you pick SNVs or indels, and -f PASS keeps only records whose FILTER is PASS.

bcftools view -v snps -f PASS vcf_file.vcf >snps_vcf.vcf
bcftools view -v indels -f PASS vcf_file.vcf >indels_vcf.vcf
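For the per-sample part of the question, one possible approach is to pipe the subset straight into `bcftools stats` with `-s -` (all samples) and keep the per-sample count (PSC) lines. This is a sketch under assumptions: `per_sample_pass_counts` is a hypothetical helper name, and the exact PSC columns can differ between bcftools versions, so the commented header lines are kept in the output for reference.

```shell
# Sketch only: per_sample_pass_counts is a hypothetical helper, not part of bcftools.
per_sample_pass_counts() {
    # $1 = "snps" or "indels" (values accepted by `bcftools view -v`)
    # $2 = input VCF/BCF file
    bcftools view -f PASS -v "$1" "$2" |
        # "-s -" asks stats for per-sample numbers on all samples;
        # the trailing "-" tells it to read the VCF from stdin.
        bcftools stats -s - - |
        # Keep the per-sample count rows and their "# PSC" header lines.
        grep -E '^(# )?PSC'
}

# Usage:
#   per_sample_pass_counts indels vcf_file.vcf
#   per_sample_pass_counts snps   vcf_file.vcf
```

This avoids writing the intermediate VCF files; read the `# PSC` header line in the output to see which column holds which count in your version.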

I was trying to replicate the example in the manual (bcftools filtering).

Example in link:

$ bcftools query -i'QUAL>20 && DP>10' -f'%CHROM %POS %QUAL %DP\n' file.bcf

The -f option is worth noting. bcftools query is principally a formatter. I'm not sure why the -i filter won't work though.


So, according to the example, the -f flag would specify which columns to include, right?

Shouldn't '-f PASS vcf_file.vcf' also include which column we want to apply the 'PASS' to?

Sorry for the newbie questions!


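In bcftools query, -f takes a format string assembled from predefined fields, so you cannot put arbitrary values such as PASS there; the examples show that each field in a format string starts with a %. In bcftools view, -f is a different option (--apply-filters), which is why -f PASS works in the view commands above: it keeps records whose FILTER column is PASS.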
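A small side-by-side sketch may help here, since the same letter means different things in the two subcommands. Both function names are hypothetical helpers, and the columns in the format string are just one possible choice.

```shell
# Sketch only: both helpers are hypothetical, not part of bcftools.

# In `bcftools query`, -f is a format string: it chooses WHAT to print.
show_filter_column() {
    # $1 = input VCF/BCF file
    bcftools query -f '%CHROM\t%POS\t%FILTER\t%TYPE\n' "$1"
}

# In `bcftools view`, -f is --apply-filters: it chooses WHICH records to keep.
keep_pass_records() {
    # $1 = input VCF/BCF file
    bcftools view -f PASS "$1"
}

# Usage:
#   show_filter_column vcf_file.vcf | cut -f3,4 | sort | uniq -c
#   keep_pass_records vcf_file.vcf > pass.vcf
```

The first usage line tallies sites by FILTER value and variant type, which is a quick way to check what the file contains before filtering.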
