Hi, I am facing a strange issue with GATK 3.4. I have a set of PE fastq files. I first ran the variant calling pipeline using the following steps:
bwa mem --> sam to bam --> mark duplicates using picard --> RealignerTargetCreator GATK 3.4 --> IndelRealigner --> HaplotypeCaller --> GenotypeGVCFs
Then I saw that the probe sequences that were used to design the region of interest (its a custom-designed panel) overlapped with the some variants of interest. So, the genotypes of some of these variants was 0/1 (due to probes seqs in those positions) instead of the the true 1/1 (I am using a gold standard sample).
So, then I clipped the first 27 bases of each read (the probes are all around that range 22-31 nts with 27 being the most common length). And then I ran the same above pipeline but some of the variants were not called in spite of all reads having those variants.
I am attaching a snapshot of the bam file alignment with the variant in orange in the center. Top track is the first case (before clipping the probes) where the probes are present in the upper reference block without the variant and the lower block with the insert that carry the variant and hence the 0/1 call.
Lower track is after the clipping of the probe sequences and so only the insert with the variant are left. But the variant is still not called in the VCF file.
I read in another thread that this happens because the tool needs 50bp on either side to do proper reassembly. Is that correct? If so, how do I get around it? Should I not use the IndelRealignment? Any other suggestion to solve this issue?
Thanks a bunch in advance!!
Just to let you all know that using Unified Genotyper called the variants that HC missed. So using UG now.
Great - thanks for the feedback. Be wary of the Unified Genotyper, though (not even recommended for use anymore, as far as I'm aware).