Hi,
I am analysing human WGS data to diagnose rare inherited diseases. I followed the GATK Best Practices Guidelines for "Germline short variants discovery" for single-sample data to generate a VCF using HaplotypeCaller.
The guidelines then point to the use of CNNScoreVariants and FilterVariantTranches to eventually get an analysis-ready VCF. However, I have also read of pipelines using VQSR and ApplyVQSR at this stage to recalibrate the quality of variants and apply filters.
Has anybody tried both and can point to the pros and cons of either workflow? It seems from the documentation that the GATK team is pushing towards the more recent CNNScoreVariants tools for this step, but I was wondering whether there are any known drawbacks to keep in mind.
According to this publication, VQSR is not necessary, and the raw unfiltered VCF actually gives a better balance between precision and recall. Any thoughts?