I want to call variants from Illumina single-end reads of canine origin. Although, the reads were obtained after an initial loci-enrichment using Nimblegen sequence capture array, I aligned them to the canine whole genome. Now, I want to avoid variant calls that originate from reads mapped to chromosomes that were not involved in the enrichment. So I plan to use -q and -D options in the mpileup and varFilter respectively. My questions are: 1. What thresholds to use for the above filters. 2. Can you suggest any additional filters? 3. Is there an intuitive way to look at the read-depths and then decide on a threshold?
Thanks Zev. The bed file filter did not work in mpileup for some reason (I am trying to redo it). However I was able to use the same bed file to filter my original vcf file (no filters) in vcftools.
Oops...spoke too soon. The BED file from Nimblegen does not work in either mpileup or vcftools. I am able to load the BED file on UCSC Browser and see it. It is tab-delimited and looks like this:
track name=target_region description="Target Regions"
chr1 11398197 11398414 chr1 11453999 11454327 chr1 11456061 11456496 chr1 11537462 11537663 Any ideas as to what might be wrong?
What is the Samtools error?
No error specified, it just freezes.
In vcftools, I changed the first column to just integers (removed chr). Vcftools now seems to do the filtering (as seen from log file) but does not write the filtered sites to a new file even when I gave the --out specification.