Question

VCF filtering with a maximum coverage threshold

10.4 years ago

rwn ▴ 610

Hello all,

I am working on finding SNPs in a small number of highly similar Pseudomonas genomes. I've used freebayes to call variants with something like:

freebayes -f myREF.fasta --ploidy 1 --standard-filters -F 0.95 -C 5 myBams.sorted.bam > freebayes.vcf

I've already applied some filters (above), but now I'd like to filter further using the vcffilter program. My question is what might be a "sensible" set of filtering criteria, in particular for setting a maximum coverage cut-off (i.e. something along the lines of "DP < 250"). I'm worried about including SNPs from regions of the genome with very high coverage, such as insertion sequences and other TEs/repeated regions (or at least I'd like to see what effect filtering out these regions has).
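To make concrete what a maximum-coverage cut-off such as "DP < 250" does to the records, here is a minimal Python sketch (not vcffilter itself). It reads plain-text VCF lines, passes header lines through, and keeps only records whose INFO DP value is below a chosen maximum. The function names `info_dp` and `filter_by_max_dp` are hypothetical, 250 is just the example threshold from the question, and dropping records that have no DP field is an assumption you may want to change.

```python
from typing import Iterable, Iterator, Optional


def info_dp(info_field: str) -> Optional[int]:
    """Return the integer DP value from a VCF INFO column, or None if absent."""
    for entry in info_field.split(";"):
        # Flag-type entries (no "=") yield an empty value and are skipped.
        key, _, value = entry.partition("=")
        if key == "DP" and value:
            return int(value)
    return None


def filter_by_max_dp(lines: Iterable[str], max_dp: int = 250) -> Iterator[str]:
    """Yield header lines unchanged, plus records whose INFO DP is below max_dp.

    Assumption: records with no DP in INFO are dropped.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        dp = info_dp(fields[7])  # INFO is the 8th fixed VCF column
        if dp is not None and dp < max_dp:
            yield line
```

For example, `with open("freebayes.vcf") as fh: kept = list(filter_by_max_dp(fh, max_dp=250))`; comparing the number of non-header lines in `kept` with the number in the input is one quick way to see how many SNPs the cut-off removes.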

I realise it's a bit of a how-long-is-a-piece-of-string type question, but I was just wondering what people's thoughts were...

Cheers!

VCF freebayes SNPs vcffilter bacteria

Written 10.4 years ago by rwn; last updated 3.0 years ago by Ram.

Ram · Answer 1 · 2014-07-14

10.4 years ago

brentp 24k

You might start with Heng Li's paper and the associated script(s).

(Hopefully someone will implement and distribute a Python/C/Perl-based version of those filters.)
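Rather than picking a round number for the maximum, one common heuristic is to derive it from the average depth: if per-site depth is treated as roughly Poisson with mean d, its standard deviation is sqrt(d), and sites far above d + k·sqrt(d) may reflect collapsed repeats such as the insertion sequences mentioned in the question. The sketch below is a hypothetical Python helper for that calculation, not a port of Li's scripts; `mean_depth` is assumed to be the genome-wide average coverage computed separately, and k = 3.0 is only an illustrative choice.

```python
import math
from typing import Iterable, List


def max_depth_cutoff(mean_depth: float, k: float = 3.0) -> float:
    """Upper depth threshold d + k*sqrt(d) for a mean depth d.

    Under a Poisson model the standard deviation of depth is sqrt(d), so this
    marks sites lying more than k standard deviations above the mean.
    """
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    return mean_depth + k * math.sqrt(mean_depth)


def flag_high_depth(depths: Iterable[int], mean_depth: float, k: float = 3.0) -> List[bool]:
    """Return True for each site depth strictly above the derived cut-off."""
    cutoff = max_depth_cutoff(mean_depth, k)
    return [d > cutoff for d in depths]
```

With a hypothetical average depth of 100x and k = 3, the cut-off is 100 + 3 × 10 = 130, so a fixed threshold of DP < 250 would be considerably more permissive for that sample.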

Written 10.4 years ago by brentp; last updated 3.0 years ago by Ram.

Thanks for the link to the paper, brentp :)

Reply written 10.4 years ago by rwn.