Hey All,
I only used so far three filters for my whole exome pipeline (aligning to hg19) for a HapMap sample. I tried it on the NA19240 Hapmap sample from paper below (Table 3) which shows ~196 variants (SNPs and INDELs). http://www.nature.com/nmeth/journal/v9/n2/extref/nmeth.1810-S1.pdf
However, using my filters as below I get = ~15000 (just NONSYNONYMOUSCODING alterations). If you add INDELS, it's going to be much higher number. What am I doing wrong?
My list of filters are:
1) vcfutils varFilter -D1000 2) snpEff -minQ 20 -minCoverage 30
Could they have different filters like frequency of variants etc.? If so, how do I set these up? Any help? What are the default parameters for # of reads (minimum) and frequency in bwa,samtools?
Below is my pipeline:
- bwa aln hg19.fa S375R1.fastq > S3751.sai
- bwa aln hg19.fa S375R2.fastq > S3752.sai
- bwa sampe hg19.fa S3751.sai S3752.sai S375R1.fastq S375R2.fastq > S375NoIndexL007.sam
- samtools view -bS S375NoIndexL007.sam > S375NoIndexL007.bam
- samtools sort S375NoIndexL007.bam S375NoIndexL007.sorted
- Marked duplicates using picard
- samtools index S375NoIndexL007.marked.bam
- samtools mpileup -uf hg19.fa S375NoIndexL007.marked.bam | bcftools view -bvcg - > S375NoIndexL007.raw.bcf
- bcftools view S375NoIndexL007.raw.bcf | vcfutils.pl varFilter -D1000 > S375NoIndexL007vard200.flt.vcf
HI Zev, thanks very much for your reply.
Q1 Does it matter if I re-align after marking duplicates. Just to avoid repeating the duplication?
Q2 How do I filter on maping quality. I found this link and I didn't understand th filter in the example. What filter on mapping quality do you recommend?
I would remove the duplicates and then do the realignment; it will go faster. Samtools view: '-q INT Skip alignments with MAPQ smaller than INT [0]'. 20 is what we use. This mean those reads that map with a quality below twenty are eliminated.
Thanks very much Zevm will try it. Currently, I am getting a couple of errors with re-alignment but will keep trying.
email me your error zev (dot) kronenberg (at) gmail.com