I am relatively new to structural variant calling so I apologise in advance if I seem unaware of certain things.
I am currently trying to call structural variants with multiple callers (Manta, smoove, and GRIDSS) and merge their individual outputs using SURVIVOR. I am comparing the merged output against the HG002_v0.6 truth set.
I want to see whether any structural variants in the samples I am calling have a stronger link to, or impact on, certain genes than others, but I am also trying to filter out as many false positives as possible.
I have also tried filtering the individual VCF files in three ways: using the annotations added by duphold (with the filters recommended on its GitHub page), keeping only PASS records, and removing structural variants shorter than 50 bp.
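For reference, here is a minimal sketch of the PASS-only and 50 bp size filters described above, written in plain Python with no VCF library. It is an illustration under assumptions, not the exact commands I ran: the function names are made up for this example, the length is read from the SVLEN INFO field, and records without SVLEN (for example breakends) are kept rather than dropped. The duphold-based filtering is not included here.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag fields map to True."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def keep_record(line, min_len=50):
    """Return True if a VCF data line is PASS and at least min_len bp long."""
    fields = line.rstrip("\n").split("\t")
    if fields[6] != "PASS":  # column 7 of a VCF record is FILTER
        return False
    svlen = parse_info(fields[7]).get("SVLEN")
    if svlen is None or svlen is True:
        # Assumption: records without a usable SVLEN (e.g. BND) are kept.
        return True
    # SVLEN is usually negative for deletions, so compare the magnitude.
    return abs(int(svlen.split(",")[0])) >= min_len


def filter_vcf_lines(lines, min_len=50):
    """Yield header lines unchanged, plus data lines that pass the filter."""
    for line in lines:
        if line.startswith("#") or keep_record(line, min_len):
            yield line
```

Passing the lines of an uncompressed VCF file through `filter_vcf_lines` and writing the result out would give a filtered VCF. Whether to keep or drop records that lack SVLEN is a choice that depends on the caller, so that branch would need adjusting per tool.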
For GRIDSS, I also annotated the calls using the simple_event_annotation.R script from the GRIDSS GitHub repository, and genotyped them with SVTyper so that the GRIDSS VCF file could be merged with SURVIVOR.
In your opinion, which metric matters more for judging whether structural variant calling is tuned properly: higher recall or higher precision? Currently I am achieving ~15-20% recall and >90% precision.
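To make the two metrics concrete, here is a minimal sketch of how precision and recall follow from the true positive, false positive, and false negative counts of a truth set comparison. The counts below are hypothetical, chosen only to fall in the range quoted above, and are not my actual results. The F1 score is included as one common way to summarise both numbers; it is an addition for illustration.

```python
def precision(tp, fp):
    """Fraction of reported calls that match the truth set."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of truth set variants that were recovered."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: 180 matched calls, 15 unmatched calls,
# and 820 truth set variants that were missed.
tp, fp, fn = 180, 15, 820
print(f"precision={precision(tp, fp):.3f}")
print(f"recall={recall(tp, fn):.3f}")
print(f"F1={f1_score(tp, fp, fn):.3f}")
```

With counts like these, precision is high because few reported calls are wrong, while recall is low because most truth set variants are never reported, which is the situation described above.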
Hello, sorry to bother you. I am a beginner, and after reading your post I have a question: did you manage to genotype the GRIDSS VCF file with SVTyper? I ran into errors when I tried and could not find a solution.
Looking forward to your reply. Wishing you all the best.