Bcftools view options for SNPs calling.
1
3
Entering edit mode
9.4 years ago
Eddie_I ▴ 30

Hi there,

Newbie to the bioinformatic area and looking for some help.

I'm trying to use bcftools to do some SNPs calling from a bcf file created out of samtools mpileup.

In a lot of posts I see the bcftools view options as -bvcg, but when I try it I get the error

bcftools: unknown option -- b

In reading I see that bcftools have changed their view options with the more recent versions (I'm running 1.2), so was wondering if any one can translate the old options into what to use for the new options values. I'm trying :-

-O b -v snps -g het

but since I have never used bcftools before I don't know if that would get me the same output.

Thanks for your assistance in advance.

Eddie

bcftools samtools SNP mpileup • 29k views
ADD COMMENT
2
Entering edit mode

Hi!

There is no -b option in the bcftools manual.

-O b (in your second command line) means to get output as a compressed BCF file.

To do SNP calling, I usually use the command line:

bcftools call --skip-variants indels --multiallelic-caller --variants-only  -O v <output.bcf> -o <output.vcf>
ADD REPLY
0
Entering edit mode

Thanks for the reply. I will give bcftools call a try.

Yes, I was trying to use the command bcftools view -bvcg ..., but it seem those were the valid options for an older version of bcftools.

I found this list of options for bcftools view (an old version, which explained the -bvcg to me):

Usage: bcftools view [options] <in.bcf> [reg]

Input/output options:

       -A        keep all possible alternate alleles at variant sites
       -b        output BCF instead of VCF
       -D FILE   sequence dictionary for VCF->BCF conversion [null]
       -F        PL generated by r921 or before (which generate old ordering)
       -G        suppress all individual genotype information
       -l FILE   list of sites (chr pos) or regions (BED) to output [all sites]
       -L        calculate LD for adjacent sites
       -N        skip sites where REF is not A/C/G/T
       -Q        output the QCALL likelihood format
       -s FILE   list of samples to use [all samples]
       -S        input is VCF
       -u        uncompressed BCF output (force -b)

Consensus/variant calling options:

       -c        SNP calling (force -e)
       -d FLOAT  skip loci where less than FLOAT fraction of samples covered [0]
       -e        likelihood based analyses
       -g        call genotypes at variant sites (force -c)
       -i FLOAT  indel-to-substitution ratio [-1]
       -I        skip indels
       -m FLOAT  alternative model for multiallelic and rare-variant calling, include if P(chi^2)>=FLOAT
       -p FLOAT  variant if P(ref|D)<FLOAT [0.5]
       -P STR    type of prior: full, cond2, flat [full]
       -t FLOAT  scaled substitution mutation rate [0.001]
       -T STR    constrained calling; STR can be: pair, trioauto, trioxd and trioxs (see manual) [null]
       -v        output potential variant sites only (force -c)

Contrast calling and association test options:

       -1 INT    number of group-1 samples [0]
       -C FLOAT  posterior constrast for LRT<FLOAT and P(ref|D)<0.5 [1]
       -U INT    number of permutations for association testing (effective with -1) [0]
       -X FLOAT  only perform permutations for P(chi^2)<FLOAT [0.01]

but the version 1.2 bcftools view options are :

$ bcftools view

About:   VCF/BCF conversion, view, subset and filter VCF/BCF files.
Usage:   bcftools view [options] <in.vcf.gz> [region1 [...]]

Output options:
    -G,   --drop-genotypes              drop individual genotype information (after subsetting if -s option set)
    -h/H, --header-only/--no-header     print the header only/suppress the header in VCF output
    -l,   --compression-level [0-9]     compression level: 0 uncompressed, 1 best speed, 9 best compression [-1]
    -o,   --output-file <file>          output file name [stdout]
    -O,   --output-type <b|u|z|v>       b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
    -r, --regions <region>              restrict to comma-separated list of regions
    -R, --regions-file <file>           restrict to regions listed in a file
    -t, --targets [^]<region>           similar to -r but streams rather than index-jumps. Exclude regions with "^" prefix
    -T, --targets-file [^]<file>        similar to -R but streams rather than index-jumps. Exclude regions with "^" prefix

Subset options:
    -a, --trim-alt-alleles        trim alternate alleles not seen in the subset
    -I, --no-update               do not (re)calculate INFO fields for the subset (currently INFO/AC and INFO/AN)
    -s, --samples [^]<list>       comma separated list of samples to include (or exclude with "^" prefix)
    -S, --samples-file [^]<file>  file of samples to include (or exclude with "^" prefix)
        --force-samples           only warn about unknown subset samples

Filter options:
    -c/C, --min-ac/--max-ac <int>[:<type>]      minimum/maximum count for non-reference (nref), 1st alternate (alt1), least frequent
                                                   (minor), most frequent (major) or sum of all but most frequent (nonmajor) alleles [nref]
    -f,   --apply-filters <list>                require at least one of the listed FILTER strings (e.g. "PASS,.")
    -g,   --genotype [^]<hom|het|miss>          require one or more hom/het/missing genotype or, if prefixed with "^", exclude sites with hom/het/missing genotypes
    -i/e, --include/--exclude <expr>            select/exclude sites for which the expression is true (see man page for details)
    -k/n, --known/--novel                       select known/novel sites only (ID is not/is '.')
    -m/M, --min-alleles/--max-alleles <int>     minimum/maximum number of alleles listed in REF and ALT (e.g. -m2 -M2 for biallelic sites)
    -p/P, --phased/--exclude-phased             select/exclude sites where all samples are phased
    -q/Q, --min-af/--max-af <float>[:<type>]    minimum/maximum frequency for non-reference (nref), 1st alternate (alt1), least frequent
                                                   (minor), most frequent (major) or sum of all but most frequent (nonmajor) alleles [nref]
    -u/U, --uncalled/--exclude-uncalled         select/exclude sites without a called genotype
    -v/V, --types/--exclude-types <list>        select/exclude comma-separated list of variant types: snps,indels,mnps,other [null]
    -x/X, --private/--exclude-private           select/exclude sites where the non-reference alleles are exclusive (private) to the subset samples

So just trying to find the new options variable that will get me the same output.

ADD REPLY
0
Entering edit mode

Hi Evgeniia,

Gave bcftools call a try and it worked for me.

Thanks for your help.

Eddie

ADD REPLY
0
Entering edit mode

You are welcome! :)

ADD REPLY
1
Entering edit mode
6.3 years ago

For anybody else arriving at this thread:

# 1, call variants
    bcftools mpileup --redo-BAQ --min-BQ 30 --per-sample-mF \
      --output-tags DP,AD \
      --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR \
      -f "${Ref_FASTA}" \
      --BCF "${repBAM1}" "${repBAM2}" "${repBAM3}" "${repBAM4}" | \
    bcftools call --multiallelic-caller --variants-only -Ob > out.bcf ;

#2, filter the variants further
    #Options:
    #   -Q INT  minimum RMS mapping quality for SNPs [10]
    #   -d INT  minimum read depth [2]
    #   -D INT  maximum read depth [10000000]
    #   -a INT  minimum number of alternate bases [2]
    #   -w INT  SNP within INT bp around a gap to be filtered [3]
    #   -W INT  window size for filtering adjacent gaps [10]
    #   -1 FLOAT    min P-value for strand bias (given PV4) [0.0001]
    #   -2 FLOAT    min P-value for baseQ bias [1e-100]
    #   -3 FLOAT    min P-value for mapQ bias [0]
    #   -4 FLOAT    min P-value for end distance bias [0.0001]
    #   -e FLOAT    min P-value for HWE (plus F<0) [0.0001]
    #   -p      print filtered variants

    bcftools view -Ov out.bcf | misc/vcfutils.pl varFilter -d 18 -w 1 -W 3 -a 1 \
    -1 0.05 -2 0.05 -3 0.05 -4 0.05 \
    -e 0.05 -p > out.filt.vcf ;

misc/vcfutils.pl should come bundled with BCFtools.

You can then add further tags to the VCF/BCF by following this: How to use bcftools to calculate AF INFO field from AC and AN in VCF?

Kevin

ADD COMMENT

Login before adding your answer.

Traffic: 2269 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6