Comparing variant calls
19 months ago
eebloom ▴ 90

I want to evaluate how many variants from a high-confidence short read consensus callset are called by long-read callers (with ONT data).

So far I have tried bcftools isec, plus bedtools jaccard and intersect, all with default parameters, but these feel a bit primitive.

For tools such as these, what sort of parameters are recommended, e.g. requiring reciprocal overlap or filtering on MAF? Given that this compares two different sequencing technologies, I'm unsure how strict to be when deciding whether two variant calls agree.

For parameters such as reciprocal overlap, would people recommend altering the threshold based on variant size? For example, a multi-megabase deletion may require a more stringent % overlap, as it is "easier" for any variant to overlap such a large deletion by chance.
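
To make the idea concrete, here is a minimal sketch of reciprocal overlap with a size-dependent threshold. The function names and the cut-off values are hypothetical and purely illustrative, not a recommended standard; bedtools intersect with -f and -r applies the same reciprocal test but with a single fixed fraction.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for half-open intervals [start, end)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def required_overlap(length):
    """Hypothetical size-dependent threshold; these cut-offs are assumptions."""
    if length >= 1_000_000:   # multi-megabase events: be strict
        return 0.8
    if length >= 10_000:
        return 0.5
    return 0.3


def is_match(truth, call):
    """truth and call are (start, end) tuples on the same chromosome."""
    longest = max(truth[1] - truth[0], call[1] - call[0])
    return reciprocal_overlap(*truth, *call) >= required_overlap(longest)
```

With these assumed cut-offs, a 1 kb call sitting inside a 2 Mb deletion is not counted as a match, because the overlap covers only a tiny fraction of the larger event even though it covers all of the smaller one.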

Are there other tools or methods I could use? I'm struggling to find standard methods in the literature...

bedtools VCF bcftools • 819 views

Hey, not a direct answer, but I recommend you read this paper: Krusche, Peter, et al. "Best practices for benchmarking germline small-variant calls in human genomes." Nature biotechnology 37.5 (2019): 555-560.

19 months ago

The Real Time Genomics toolset (RTG Tools) has the vcfeval tool and other related ones that seem to be the most commonly used in papers dealing with variant call comparisons:

https://github.com/RealTimeGenomics/rtg-tools

Manual:

https://cdn.rawgit.com/RealTimeGenomics/rtg-tools/master/installer/resources/tools/RTGOperationsManual.pdf

Search for papers on evaluating variant calls - there are tons of these around; in my recollection, the process is surprisingly subjective.
