Hello all
I have 600 samples for which I have called SNPs with GATK HaplotypeCaller.
My goal is to find rare variants across these 600 samples and visualize them.
For example, a heatmap with the samples on the X axis and the variant sites on the Y axis.
This means I have to filter out common variant sites, i.e. those present in 70% or more of the samples, and keep the remaining, rarer sites.
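The filtering described above could be sketched roughly as follows. This is a minimal, stdlib-only sketch that assumes an uncompressed multi-sample (joint-called) VCF. It counts a sample as a "carrier" of a site if its GT contains any non-reference allele, and it treats missing genotypes as non-carriers. The function names and the 0.70 cutoff constant are hypothetical illustrations, not part of GATK. Each kept row is a 0/1 vector over samples, which can be fed directly into a heatmap.

```python
# Minimal sketch (hypothetical helper names): keep sites whose carrier
# fraction is below a cutoff, returning a sites x samples 0/1 matrix.
# Assumptions: uncompressed multi-sample VCF lines; GT is the first FORMAT
# field (as the VCF spec requires when GT is present); missing GT ("./.")
# is counted as a non-carrier but still included in the denominator.
from typing import Dict, List, Tuple

MAX_CARRIER_FRACTION = 0.70  # sites carried by >= 70% of samples are dropped


def carries_alt(gt: str) -> bool:
    """True if a GT string such as '0/1', '1|1' or '0/2' has a non-ref allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def rare_site_matrix(
    vcf_lines: List[str], max_frac: float = MAX_CARRIER_FRACTION
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (sample names, {site_id: 0/1 carrier row}) for sites below max_frac."""
    samples: List[str] = []
    matrix: Dict[str, List[int]] = {}
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        site = f"{fields[0]}:{fields[1]}:{fields[3]}>{fields[4]}"
        gts = [col.split(":")[0] for col in fields[9:]]
        row = [1 if carries_alt(gt) else 0 for gt in gts]
        if sum(row) / len(row) < max_frac:
            matrix[site] = row
    return samples, matrix


if __name__ == "__main__":
    toy_vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/1",  # 3/3 carriers
        "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/0:10\t0/1:12\t./.:0",  # 1/3
    ]
    names, rare = rare_site_matrix(toy_vcf)
    print(names, rare)
```

On the toy input, the site at chr1:100 is carried by all three samples and is dropped, while chr1:200 (one carrier out of three) is kept. For 600 real samples, a dedicated VCF library or bcftools would likely be faster than parsing text line by line.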
Before variant calling, GATK has a step called BQSR (Base Quality Score Recalibration), where I provided a set of gold-standard known polymorphic sites so that they are masked when modelling base-quality errors. I have already done this.
If I understand correctly, this is exactly like the GATK joint genotyping step on GVCFs, except that I want the rare variants occurring in individual samples rather than the common ones, i.e. probably the data that the GVCF step skips?
What is currently the best way to do this?