Hello. I ran into an issue while interpreting human variants. In some cases, variants that appear in IGV as one long variant are split into several variants in the VCF file produced by GATK4 or other variant callers.
Here is an example:
IGV clearly shows an 11-base deletion, but in the GATK4 output (on the right), this variant is split into two adjacent variants.
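If two adjacent records come from the same haplotype, they can be collapsed back into the single event that IGV suggests. The sketch below assumes the records are in cis (on the same chromosome copy), do not overlap, and use VCF-style 1-based positions, and that `reference` holds the reference sequence for the region. `Record` and `merge_adjacent` are hypothetical helpers for illustration, not part of GATK or any VCF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal stand-in for one VCF record (not a GATK or pysam type)."""
    pos: int  # 1-based position of the first REF base, as in VCF
    ref: str
    alt: str


def merge_adjacent(reference: str, records: list[Record]) -> Record:
    """Collapse non-overlapping records, assumed to be in cis, into one REF/ALT pair."""
    recs = sorted(records, key=lambda r: r.pos)
    start = recs[0].pos
    cursor = start  # next reference position not yet copied or replaced
    alt_parts = []
    for r in recs:
        if r.pos < cursor:
            raise ValueError(f"record at {r.pos} overlaps the previous one")
        if reference[r.pos - 1 : r.pos - 1 + len(r.ref)] != r.ref:
            raise ValueError(f"REF does not match the reference at {r.pos}")
        alt_parts.append(reference[cursor - 1 : r.pos - 1])  # unchanged bases in between
        alt_parts.append(r.alt)
        cursor = r.pos + len(r.ref)
    ref = reference[start - 1 : cursor - 1]  # every reference base the records span
    return Record(start, ref, "".join(alt_parts))


reference = "ACGTTTTGCA"
split = [Record(3, "GTT", "G"), Record(6, "T", "A")]  # a 2-bp deletion next to a SNP
print(merge_adjacent(reference, split))  # Record(pos=3, ref='GTTT', alt='GA')
```

The merged record may still carry a shared leading base (here the G), which is harmless but could be trimmed off. Whether merging is correct at all depends on phasing: two nearby variants on different chromosome copies must stay separate records.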
Another, slightly different example: IGV shows a 2-base deletion and a single-base substitution. GATK4 reports these as two variants (on the right), but if I compare the WT and mutant sequences, I get a single variant, delTTTinsA:
WT TTTCCGAAGCGAACCTCTCTGCCTTTTGTTTCTTTCTTTCGCTTCTTTTTCACTTTATTTTTTTCCTTCTGCTGCTTGGCATTAGAA
Mut TTTCCGAAGCGAACCTCTCTGCCTTTTGTTTCTTTCTTTCGCTTCTTTTTCACTTTAATTTTCCTTCTGCTGCTTGGCATTAGAA
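The WT/mutant comparison above can be automated by trimming the bases the two sequences share at each end; whatever remains is a single REF/ALT replacement. This is a minimal sketch, assuming both strings cover exactly the same region and differ by one contiguous event; `minimal_variant` is a hypothetical helper. It trims the common suffix before the common prefix so that a pure indel inside a repeat lands at its left-most position, similar in spirit to the left-alignment `bcftools norm` applies to individual records.

```python
def minimal_variant(wt: str, mut: str) -> tuple[int, str, str]:
    """Return (1-based position, REF, ALT) of one replacement turning wt into mut."""
    if wt == mut:
        raise ValueError("sequences are identical")
    shorter = min(len(wt), len(mut))
    # 1. Trim the common suffix first; doing this before the prefix leaves
    #    indels in repeats at their left-most position.
    s = 0
    while s < shorter and wt[-1 - s] == mut[-1 - s]:
        s += 1
    # 2. Trim the common prefix from what is left.
    p = 0
    while p < shorter - s and wt[p] == mut[p]:
        p += 1
    ref = wt[p : len(wt) - s]
    alt = mut[p : len(mut) - s]
    # 3. A pure insertion or deletion leaves REF or ALT empty; VCF then
    #    requires the preceding reference base as an anchor.
    if not ref or not alt:
        if p == 0:
            raise ValueError("event at the first base; this sketch needs a left anchor")
        p -= 1
        ref = wt[p] + ref
        alt = wt[p] + alt
    return p + 1, ref, alt  # 1-based position, as in VCF


wt = "TTTCCGAAGCGAACCTCTCTGCCTTTTGTTTCTTTCTTTCGCTTCTTTTTCACTTTATTTTTTTCCTTCTGCTGCTTGGCATTAGAA"
mut = "TTTCCGAAGCGAACCTCTCTGCCTTTTGTTTCTTTCTTTCGCTTCTTTTTCACTTTAATTTTCCTTCTGCTGCTTGGCATTAGAA"
pos, ref, alt = minimal_variant(wt, mut)
print(pos, ref, alt)  # REF 'TTT', ALT 'A': the delTTTinsA described above
```

Note that this only answers "what single event explains the difference"; a caller is free to describe the same haplotype as several smaller records, so both representations can be correct.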
Why does GATK interpret variants this way? Thank you.
Hi, did you figure out what is causing this split?
To troubleshoot that sort of thing, re-run HaplotypeCaller with the -bamout argument so you can see what it was "thinking": the output BAM contains the haplotypes it assembled and the reads realigned to them. If the call still doesn't make sense, post in the GATK forum so the GATK support team can look at it. Good luck!
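A small sketch of that re-run, restricted with `-L` to the region around the split call so it finishes quickly. The file names and the interval are hypothetical placeholders; `-R`, `-I`, `-L`, `-O` and `-bamout` are standard GATK4 HaplotypeCaller arguments, and the helper name `haplotypecaller_debug_cmd` is made up for illustration.

```python
import shlex


def haplotypecaller_debug_cmd(reference: str, bam: str, interval: str,
                              out_vcf: str, out_bam: str) -> list[str]:
    """Build a GATK4 HaplotypeCaller command for one region, with -bamout enabled."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", interval,      # only re-call the region around the split variant
        "-O", out_vcf,
        "-bamout", out_bam,  # assembled haplotypes plus reads realigned to them
    ]


# Hypothetical inputs; replace with your own reference, BAM and locus.
cmd = haplotypecaller_debug_cmd("ref.fasta", "sample.bam", "chr1:1000-1200",
                                "debug.vcf.gz", "debug.bamout.bam")
print(shlex.join(cmd))
```

Passing the same list to `subprocess.run(cmd, check=True)` would execute it, assuming `gatk` is on the PATH. Loading the resulting BAM in IGV next to the original alignments shows which haplotype the split calls were derived from.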