Question

HaplotypeCaller: filter out variants covered in only one direction

0

7.3 years ago

stanedav ▴ 50

Hello,

I would like to ask whether anyone has experience filtering out variants that are not covered by reads in both directions (WES and gene panels). I am using the GATK workflow with HaplotypeCaller and hard filtering, but after all steps some variants remain that are covered by only 2+ 0- or 3+ 0- reads. I want variants with zero reads in either direction not to be called.

Any ideas? Thank you very much

Here is my HaplotypeCaller command:

java -jar $gatk -T HaplotypeCaller -R hg19.fasta -L bedfile -I sample.bam -o output.g.vcf -nct 16 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 --dontUseSoftClippedBases

variant calling GATK Haplotype Caller • 3.0k views

ADD COMMENT • link updated 7.3 years ago by pfs ▴ 280 • written 7.3 years ago by stanedav ▴ 50

score 0 · Answer 1 · 2017-08-22

0

7.3 years ago

pfs ▴ 280

The StrandBiasBySample annotation discussed at https://gatkforums.broadinstitute.org/gatk/discussion/4939/haplotypecaller-strandbiasbysample-annotation may help. This option will probably require post-processing of the VCF output file. I would recommend going back in your pipeline and removing any steps that remove duplicates, then calculating a strand bias ratio and using that ratio as a filter.
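As a rough illustration of that post-processing step, here is a minimal Python sketch. It assumes a single-sample VCF whose FORMAT column carries the `SB` field written by StrandBiasBySample, with four comma-separated counts in the order ref-forward, ref-reverse, alt-forward, alt-reverse. The function names, the `min_per_strand` threshold, and the ratio definition are hypothetical choices for this example, not part of GATK. Records without a usable `SB` value are kept rather than dropped, which is also an assumption you may want to change.

```python
# Sketch only: drop VCF records whose ALT allele is supported on one strand.
# Assumes a single-sample VCF with a per-sample SB field laid out as
# ref_fwd,ref_rev,alt_fwd,alt_rev. All names here are hypothetical.

def parse_sb(format_col, sample_col):
    """Return the four SB counts as ints, or None if SB is absent or missing."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    if "SB" not in keys:
        return None
    idx = keys.index("SB")
    if idx >= len(values):  # trailing FORMAT fields may be omitted
        return None
    parts = values[idx].split(",")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def alt_strand_ratio(sb):
    """One possible strand ratio: minor-strand ALT reads / all ALT reads."""
    _, _, alt_fwd, alt_rev = sb
    total = alt_fwd + alt_rev
    return min(alt_fwd, alt_rev) / total if total else 0.0


def alt_on_both_strands(sb, min_per_strand=1):
    """True if the ALT allele has at least min_per_strand reads per strand."""
    _, _, alt_fwd, alt_rev = sb
    return alt_fwd >= min_per_strand and alt_rev >= min_per_strand


def filter_vcf_lines(lines, min_per_strand=1):
    """Yield header lines and records passing the ALT strand check."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:  # no FORMAT/sample columns: keep unchanged
            yield line
            continue
        sb = parse_sb(cols[8], cols[9])  # first sample column only
        if sb is None or alt_on_both_strands(sb, min_per_strand):
            yield line
```

To use it, open the uncompressed VCF, pass the file object to `filter_vcf_lines`, and write the yielded lines to a new file. Note that `SB` counts only the reads HaplotypeCaller considered informative, so they can differ from the raw forward/reverse coverage seen in a pileup.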

ADD COMMENT • link 7.3 years ago by pfs ▴ 280

1

You do want to calculate StrandBias. However, unless you are doing amplicon sequencing, you absolutely do not want to skip duplicate removal. Skipping it just adds noise and bias to your data.

ADD REPLY • link 7.3 years ago by DG 7.3k

0

Good afternoon Dan. It is unclear to me why you would not calculate a strand bias ratio for amplicon sequencing; there should be reads associated with both the forward and reverse primers. With respect to removing duplicates: if you have computational or time constraints, I understand removing them; otherwise, I would argue that you are introducing bias and removing signal by doing so. Given the low read depth the poster reported, why would you remove signal? And how would you calculate an accurate strand bias ratio if you are removing duplicates?

ADD REPLY • link 7.3 years ago by pfs ▴ 280

0

I think you misinterpreted my meaning. In all cases the OP wants to use StrandBias as a filter. However, if you are working with any form of hybridization data, you absolutely should be doing duplicate removal. If you are working with amplicon data, you should skip duplicate removal because you would be throwing out legitimate data. The original poster stated they are working with both WES and gene panel data; the gene panel data could come from amplicon or hybrid capture approaches. If they are not doing amplicon sequencing, they should never skip PCR duplicate removal.

ADD REPLY • link 7.3 years ago by DG 7.3k