Question

I cannot run bcftools stats (help)

5.7 years ago

zion22 ▴ 70

Hi, I would like to generate statistics from vcf.gz files using bcftools stats, but when I run the following command it only produces empty (0-byte) output files. My command is:

> bcftools stats -F "My_reference_genome.fasta" -s "My_vcf.gz_file.vcf.gz" > "/T1_.vcf.stats"

As soon as I run it, the following is printed in the terminal:

About:   Parses VCF or BCF and produces stats which can be plotted using plot-vcfstats.
     When two files are given, the program generates separate stats for intersection
     and the complements. By default only sites are compared, -s/-S must given to include
     also sample columns.
Usage:   bcftools stats [options] <A.vcf.gz> [<B.vcf.gz>]

Options:
        --af-bins <list>               allele frequency bins, a list (0.1,0.5,1) or a file (0.1\n0.5\n1)
        --af-tag <string>              allele frequency tag to use, by default estimated from AN,AC or GT
    -1, --1st-allele-only              include only 1st allele at multiallelic sites
    -c, --collapse <string>            treat as identical records with <snps|indels|both|all|some|none>, see man page for details [none]
    -d, --depth <int,int,int>          depth distribution: min,max,bin size [0,500,1]
    -e, --exclude <expr>               exclude sites for which the expression is true (see man page for details)
    -E, --exons <file.gz>              tab-delimited file with exons for indel frameshifts (chr,from,to; 1-based, inclusive, bgzip compressed)
    -f, --apply-filters <list>         require at least one of the listed FILTER strings (e.g. "PASS,.")
    -F, --fasta-ref <file>             faidx indexed reference sequence file to determine INDEL context
    -i, --include <expr>               select sites for which the expression is true (see man page for details)
    -I, --split-by-ID                  collect stats for sites with ID separately (known vs novel)
    -r, --regions <region>             restrict to comma-separated list of regions
    -R, --regions-file <file>          restrict to regions listed in a file
    -s, --samples <list>               list of samples for sample stats, "-" to include all samples
    -S, --samples-file <file>          file of samples to include
    -t, --targets <region>             similar to -r but streams rather than index-jumps
    -T, --targets-file <file>          similar to -R but streams rather than index-jumps
    -u, --user-tstv <TAG[:min:max:n]>  collect Ts/Tv stats for any tag using the given binning [0:1:100]
        --threads <int>                number of extra decompression threads [0]
    -v, --verbose                      produce verbose per-site and per-sample output

If anyone could help me, I'd be very grateful. Thanks!

genome • 2.7k views

Any reason why you deleted your question, zion22? - I have undeleted it. prasundutta87 went to the trouble of providing an answer and you should respect that.

5.7 years ago by Kevin Blighe 88k

Ram · Answer 1 · 2019-03-17

5.7 years ago

prasundutta87 ▴ 670

'-s' takes a list of samples for the per-sample stats.

It expects sample names, not the VCF file as you have written. Because your VCF path was consumed as the argument to -s, bcftools was left with no input file at all.
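This also explains why the output file still appeared, but empty: the shell creates (and truncates) the redirect target before bcftools even starts. With no input file on the command line, bcftools stats printed its usage text to stderr (which is why it showed up on screen despite the redirect) and exited with a non-zero status, writing nothing to stdout. A minimal sketch reproducing that effect, using a hypothetical stand-in function fake_stats instead of bcftools:

```shell
#!/usr/bin/env bash
# fake_stats is a hypothetical stand-in for a tool that, like bcftools stats
# called without an input file, prints usage to stderr and exits with status 1.
fake_stats() {
  echo "Usage:   bcftools stats [options] <A.vcf.gz> [<B.vcf.gz>]" >&2
  return 1
}

out_dir=$(mktemp -d)
out="$out_dir/T1_.vcf.stats"

# The shell opens (and truncates) "$out" before fake_stats runs,
# so the file is created even though the command fails.
status=0
fake_stats > "$out" || status=$?

echo "exit status: $status"
if [ -e "$out" ] && [ ! -s "$out" ]; then
  echo "$out exists but is empty"
fi
```

Running it shows the usage line on the terminal, reports exit status 1, and leaves an empty file behind: the same symptom as in the question.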

The correct command is:

bcftools stats -F "My_reference_genome.fasta" -s - "My_vcf.gz_file.vcf.gz" > "/T1_.vcf.stats"

Using '-' as the sample list gives you stats for all your samples.
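To avoid being left with an empty stats file whenever a command fails, one option is to write to a temporary file and only move it into place if the command exits successfully. A minimal sketch, where the helper name run_to_file is hypothetical (it is not part of bcftools):

```shell
#!/usr/bin/env bash
# run_to_file OUT CMD [ARGS...]  (hypothetical helper)
# Runs CMD with stdout sent to a temporary file next to OUT and renames it
# to OUT only if CMD succeeds. On failure the temp file is deleted and CMD's
# exit status is returned. Note: mktemp creates the file with mode 0600.
run_to_file() {
  local out=$1
  shift
  local tmp
  tmp=$(mktemp "${out}.XXXXXX") || return 1
  if "$@" > "$tmp"; then
    mv -- "$tmp" "$out"
  else
    local status=$?
    rm -f -- "$tmp"
    echo "command failed (exit $status); $out not written" >&2
    return "$status"
  fi
}
```

For example, with a relative output path (the original /T1_.vcf.stats points at the filesystem root, which normally requires administrator rights to write to):

run_to_file T1_.vcf.stats bcftools stats -F "My_reference_genome.fasta" -s - "My_vcf.gz_file.vcf.gz"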

updated 5.7 years ago by Ram 44k • written 5.7 years ago by prasundutta87 ▴ 670