VCF filtering with a maximum coverage threshold
10.4 years ago
rwn ▴ 610

Hello all,

I am working on finding SNPs in a small number of highly similar Pseudomonas genomes. I've used freebayes to call variants with something like:

freebayes -f myREF.fasta --ploidy 1 --standard-filters -F 0.95 -C 5 myBams.sorted.bam > freebayes.vcf

I've already applied some filters, as above, but now I'd like to filter further using the vcffilter program. My question is what a "sensible" set of filtering criteria might be, particularly for a maximum coverage cut-off (i.e. something along the lines of "DP < 250"). I'm worried about including SNPs from regions of the genome with very high coverage, such as insertion sequences and other TEs/repeated regions (or at least I'd like to see what effect filtering out these regions has).
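To make the kind of cut-off described above concrete, here is a minimal Python sketch of a maximum-depth filter. It assumes the total depth is stored as a DP key in the INFO column (column 8) of each record, which is where freebayes normally reports it; the function names, the demo records, and the choice to drop records without a DP value are illustrative assumptions, and the 250 threshold is only the example figure from the question, not a recommendation.

```python
def info_depth(vcf_line):
    """Return the INFO DP value of a VCF data line, or None if DP is absent."""
    info = vcf_line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
    for field in info.split(";"):
        if field.startswith("DP="):
            return int(field[3:])
    return None


def filter_max_depth(lines, max_dp):
    """Yield header lines unchanged, plus records whose INFO DP is below max_dp.

    Records without a DP value are dropped (an assumption of this sketch).
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        dp = info_depth(line)
        if dp is not None and dp < max_dp:
            yield line


# Hypothetical records for illustration only (not real data).
DEMO = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr\t100\t.\tA\tG\t500\t.\tDP=60;AF=1\n",
    "chr\t200\t.\tC\tT\t800\t.\tDP=900;AF=1\n",
    "chr\t300\t.\tG\tA\t450\t.\tDP=55;AF=1\n",
]

# Keep only records with DP < 250, the example threshold from the question.
kept = list(filter_max_depth(DEMO, max_dp=250))
```

With vcflib's vcffilter, the equivalent expression would be along the lines of vcffilter -f "DP < 250" freebayes.vcf. Comparing the call set before and after such a filter is one way to see how many SNPs fall in the high-coverage regions.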

I realise it's a bit of a how-long-is-a-piece-of-string type question, but was just wondering what people's thoughts were...

Cheers!

VCF freebayes SNPs vcffilter bacteria • 4.1k views
10.4 years ago
brentp 24k

You might start with Heng Li's paper

and the associated script(s)

(hopefully someone will implement and distribute a Python/C/Perl-based version of those filters)

Thanks for the link to the paper brentp :)
