Hello all,
I am working on finding SNPs in a small number of highly similar Pseudomonas genomes. I've used freebayes to call variants with something like:
freebayes -f myREF.fasta --ploidy 1 --standard-filters -F 0.95 -C 5 myBams.sorted.bam > freebayes.vcf
I've already applied some filters, as above, but now I'd like to filter further using the vcffilter program. My question is what a "sensible" set of filtering criteria might be, particularly for setting a maximum coverage cut-off (i.e. something along the lines of "DP < 250"). I'm worried about including SNPs from regions of the genome with super-high coverage, like insertion sequences and other TEs/repeated regions (or at least I'd like to see what the effect of filtering out these regions is).
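To make the idea concrete, here's a rough Python sketch of the kind of depth cut-off I have in mind. It's only an illustration under my own assumptions: the function names are made up, the rule (median DP plus k times its square root, with k = 3) is just one heuristic I could try rather than an established threshold, and it reads DP from the INFO column at variant sites only, which isn't the same as genome-wide coverage.

```python
import math
import statistics


def info_depth(vcf_line):
    """Return the DP value from the INFO column of a VCF data line, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    for entry in fields[7].split(";"):
        if entry.startswith("DP="):
            return int(entry[3:])
    return None


def max_depth_cutoff(depths, k=3.0):
    """Assumed heuristic: median depth plus k times its square root."""
    d = statistics.median(depths)
    return d + k * math.sqrt(d)


def filter_by_depth(vcf_lines, k=3.0):
    """Return (cutoff, kept_lines); headers are kept, records need DP <= cutoff."""
    lines = list(vcf_lines)
    depths = [
        d for d in (info_depth(l) for l in lines if not l.startswith("#"))
        if d is not None
    ]
    if not depths:
        # Nothing to base a cut-off on, so leave the input untouched.
        return None, lines
    cutoff = max_depth_cutoff(depths, k)
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        depth = info_depth(line)
        if depth is not None and depth <= cutoff:
            kept.append(line)
    return cutoff, kept
```

The resulting number could then be plugged into a vcffilter expression like the "DP < 250" above in place of a hand-picked value.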
I realise it's a bit of a how-long-is-a-piece-of-string type question, but I was just wondering what people's thoughts are...
Cheers!
Thanks for the link to the paper brentp :)