My understanding is that HaplotypeCaller is specifically designed to call variants from DNA-Seq data. However, RNA-Seq data has different (more complicated) allele frequencies than DNA-Seq data. So my questions are:
- How come HaplotypeCaller is still used in RNA-Seq variant calling pipelines?
- Considering the complications in allele frequency in RNA-Seq data (genomic imprinting or allelic imbalance), how can a variant caller confidently call a variant as homozygous or heterozygous?
- If a tumor sample is used, the allele frequency becomes even more complicated (purity, heterogeneity and structural variants). No currently available tool can make a confident call, right?
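To make the second point concrete, here is a minimal sketch of why allelic imbalance is a problem for genotyping. It is a deliberately simplified binomial model of read counts at a single site, not HaplotypeCaller's actual algorithm; the function names and the 1% error rate are my own assumptions for illustration.

```python
from math import comb, log10


def genotype_log10_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Log10 likelihood of the read counts under each diploid genotype.

    Assumes a toy binomial model: the expected fraction of alt reads is
    error_rate for 0/0, 0.5 for 0/1 and 1 - error_rate for 1/1.
    The 0.5 for heterozygotes is exactly the assumption that
    allele-specific expression in RNA-Seq can violate.
    """
    n = ref_reads + alt_reads
    expected_alt_fraction = {
        "0/0": error_rate,
        "0/1": 0.5,
        "1/1": 1.0 - error_rate,
    }
    return {
        genotype: log10(comb(n, alt_reads))
        + alt_reads * log10(p)
        + ref_reads * log10(1.0 - p)
        for genotype, p in expected_alt_fraction.items()
    }


def best_genotype(ref_reads, alt_reads, error_rate=0.01):
    """Return the genotype with the highest likelihood under the toy model."""
    likelihoods = genotype_log10_likelihoods(ref_reads, alt_reads, error_rate)
    return max(likelihoods, key=likelihoods.get)


# Balanced expression: a heterozygous site with 10 ref and 10 alt reads.
print(best_genotype(10, 10))
# Strong allelic imbalance: a truly heterozygous site where the alt allele
# dominates expression, giving 1 ref and 19 alt reads.
print(best_genotype(1, 19))
```

Under this toy model the balanced site is called 0/1, but the imbalanced site is called 1/1 even though the underlying genotype is heterozygous, which is the kind of miscall I am worried about.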
Any comments on these questions are really appreciated!
Isn't the third point (regarding purity, heterogeneity and structural variants) equally applicable to DNA-seq and RNA-seq?
Yes, it is. My point is that there are too many complications involved to accurately call variants from RNA-Seq of a tumor sample.
Even on DNA-seq data, GATK misses many genuine variant calls ('genuine' = confirmed by Sanger). On the other hand, samtools / bcftools mpileup can easily call these. The GATK 'engine' has never been quite right, but they stuck with it without benchmarking against gold-standard technologies in clinical genetics. From that came Google's DeepVariant, which I would argue, at face value, is worse than GATK.
See answer here: A: Inferring genotype based on RNA sequnces