SNPs only output from bcftools
1
0
Entering edit mode
10.2 years ago
biogirl ▴ 210

Hi there,

I'm trying to call SNPs from an alignment (BWA) to a reference genome using bcftools (0.1.18). However, my output contains INDELs too. I'm looking for just the SNPs. Here's what I've done:

samtools mpileup -uf reference.fasta output.bam | 
bcftools view -bvcg - > variants.bcf
bcftools view variants.bcf | vcfutils.pl varFilter > filtered_variants.vcf

And the output looks like this:

#USUAL PREAMBLE
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    WGS
Chrom1 4    .    C    T    125    .   DP=37;VDB=0.0026;AF1=1;AC1=2;DP4=0,0,35,2;MQ=59;FQ=-138    GT:PL:GQ    1/1:158,111,0:99
Chrom1 19    .    CAAAAAAAAAAAA    CAAAAAAAAAA,CAAAAAAAAA    145    .    INDEL;DP=162;VDB=0.0404;AF1=1;AC1=2;DP4=0,0,65,76;MQ=55;FQ=-290    GT:PL:GQ    1/1:186,255,0,177,255,168:99

I want a vcf with just the SNPs i.e. the ones like Chrom1 position 4, not the INDELs. There isn't an 'INDEL' column, so can't just remove that column.......I'm clearly missing something! Ideas would be greatly appreciated.

Thanks

sequencing snp bcftools • 14k views
ADD COMMENT
0
Entering edit mode

bcftools 0.1.18 is quite old. Try updating both it and samtools to 1.0.

ADD REPLY
0
Entering edit mode

So you can't call just SNPs in older versions of bcftools?

ADD REPLY
1
Entering edit mode

No one is going to offer support for such an old version. 0.1.18 may well have had a bug in this regard, though I have zero interest in checking.

ADD REPLY
0
Entering edit mode

Alright, only asking!

ADD REPLY
3
Entering edit mode
10.2 years ago
poisonAlien ★ 3.2k

Add -I argument to skip Indels.

bcftools view -I -bvcg - > variants.bcf 

Consensus/variant calling options:
       -c  SNP calling (force -e)
       -d FLOAT  skip loci where less than FLOAT fraction of samples covered [0]
       -e        likelihood based analyses
       -g        call genotypes at variant sites (force -c)
       -i FLOAT  indel-to-substitution ratio [-1]
       -I        skip indels
       -m FLOAT  alternative model for multiallelic and rare-variant calling, include if P(chi^2)>=FLOAT
       -p FLOAT  variant if P(ref|D)<FLOAT [0.5]
       -P STR    type of prior: full, cond2, flat [full]
       -t FLOAT  scaled substitution mutation rate [0.001]
       -T STR    constrained calling; STR can be: pair, trioauto, trioxd and trioxs (see manual) [null]
       -v        output potential variant sites only (force -c)
ADD COMMENT
0
Entering edit mode

Thanks, that works. I'd clearly skipped past that and focused just on the -c flag.

ADD REPLY

Login before adding your answer.

Traffic: 1830 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6