I want to evaluate how many variants from a high-confidence short read consensus callset are called by long-read callers (with ONT data).
So far I have tried bcftools isec, and bedtools jaccard and intersect, all with default parameters, but these feel a bit primitive.
For tools such as these, what sort of parameters are recommended, e.g. requiring reciprocal overlap or filtering based on MAF? Given that this compares two different sequencing technologies, I'm unsure how strict to be when deciding that two variant calls agree.
For parameters such as reciprocal overlap, would people recommend altering the threshold based on variant size? For example, a multi-megabase deletion may need a more stringent % overlap, as it is "easier" for any variant to overlap such a large deletion by chance.
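To make the size-dependent idea concrete, here is a minimal Python sketch of what I have in mind. The function names and the size bins and thresholds (0.5 / 0.7 / 0.9) are placeholders I made up for illustration, not recommended values, and intervals are assumed to be 0-based half-open (start, end) pairs on the same chromosome.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for two half-open intervals."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0  # intervals do not overlap at all
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def min_overlap_for_size(length):
    """Hypothetical thresholds: larger variants need stricter overlap."""
    if length < 1_000:
        return 0.5
    if length < 1_000_000:
        return 0.7
    return 0.9


def is_match(truth, query):
    """True if two (start, end) calls agree under the size-scaled threshold."""
    ro = reciprocal_overlap(truth[0], truth[1], query[0], query[1])
    # Use the longer of the two calls to pick the threshold.
    longest = max(truth[1] - truth[0], query[1] - query[0])
    return ro >= min_overlap_for_size(longest)


# Small deletion: 50% reciprocal overlap is enough under these thresholds.
print(is_match((100, 200), (150, 250)))
# Multi-megabase deletion: 85% reciprocal overlap is rejected.
print(is_match((0, 2_000_000), (0, 1_700_000)))
```

With a fixed threshold, I believe the same comparison can be done with bedtools intersect using -f together with -r, but I don't know of a built-in way to vary the fraction by variant size.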
Are there other tools or methods I could use? I'm struggling to find standard methods in the literature...
Hey, not a direct answer, but I recommend you read this paper: Krusche, Peter, et al. "Best practices for benchmarking germline small-variant calls in human genomes." Nature Biotechnology 37.5 (2019): 555-560.