How To Set Filter For Frequency Of Reads And Hapmap Exome Sample Results:
1
0
Entering edit mode
12.8 years ago
Angel ▴ 220

Hey All,

I only used so far three filters for my whole exome pipeline (aligning to hg19) for a HapMap sample. I tried it on the NA19240 Hapmap sample from paper below (Table 3) which shows ~196 variants (SNPs and INDELs). http://www.nature.com/nmeth/journal/v9/n2/extref/nmeth.1810-S1.pdf

However, using my filters as below I get = ~15000 (just NONSYNONYMOUSCODING alterations). If you add INDELS, it's going to be much higher number. What am I doing wrong?

My list of filters are:

1) vcfutils varFilter -D1000 2) snpEff -minQ 20 -minCoverage 30

Could they have different filters like frequency of variants etc.? If so, how do I set these up? Any help? What are the default parameters for # of reads (minimum) and frequency in bwa,samtools?

Below is my pipeline:

  • bwa aln hg19.fa S375R1.fastq > S3751.sai
  • bwa aln hg19.fa S375R2.fastq > S3752.sai
  • bwa sampe hg19.fa S3751.sai S3752.sai S375R1.fastq S375R2.fastq > S375NoIndexL007.sam
  • samtools view -bS S375NoIndexL007.sam > S375NoIndexL007.bam
  • samtools sort S375NoIndexL007.bam S375NoIndexL007.sorted
  • Marked duplicates using picard
  • samtools index S375NoIndexL007.marked.bam
  • samtools mpileup -uf hg19.fa S375NoIndexL007.marked.bam | bcftools view -bvcg - > S375NoIndexL007.raw.bcf
  • bcftools view S375NoIndexL007.raw.bcf | vcfutils.pl varFilter -D1000 > S375NoIndexL007vard200.flt.vcf
exome filter hapmap snp indel • 3.9k views
ADD COMMENT
1
Entering edit mode
12.8 years ago

Filter your bam on mapping quality -q 20. Filter your variant quality using varfilter via samtools. Filter low depth sites using varfilter via samtools.

Realigning INDELS is very important! You should consider doing it if you care about calling INDELS correctly.

ADD COMMENT
0
Entering edit mode

HI Zev, thanks very much for your reply.

Q1 Does it matter if I re-align after marking duplicates. Just to avoid repeating the duplication?

Q2 How do I filter on maping quality. I found this link and I didn't understand th filter in the example. What filter on mapping quality do you recommend?

ADD REPLY
0
Entering edit mode

I would remove the duplicates and then do the realignment; it will go faster. Samtools view: '-q INT Skip alignments with MAPQ smaller than INT [0]'. 20 is what we use. This mean those reads that map with a quality below twenty are eliminated.

ADD REPLY
0
Entering edit mode

Thanks very much Zevm will try it. Currently, I am getting a couple of errors with re-alignment but will keep trying.

ADD REPLY
0
Entering edit mode

email me your error zev (dot) kronenberg (at) gmail.com

ADD REPLY

Login before adding your answer.

Traffic: 1589 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6