Comparing two VCFs and drawing a precision-recall curve
7.4 years ago
eric.kai0918 ▴ 10

Hi,

I'm trying to validate my variant calling pipeline using the NA12878 genome. I downloaded the NA12878 VCF file from GIAB, sequenced the NA12878 cell line in my lab, and called variants with my pipeline. I now have two VCFs and want to draw a PR curve to assess the pipeline's performance.

From what I found by searching, it seems I can draw a PR curve with an R package such as ROCR or PRROC. How do I build the input data for them from the two VCFs?

Thanks.

vcf pr curve • 3.3k views
7.4 years ago
Len Trigg ★ 1.6k

To keep it short and sweet, correctly comparing VCFs is non-trivial, so you should use one of the haplotype-aware methods such as vcfeval or hap.py that can take into account differences in variant representation. Both of these tools can give you P/R curves.
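To make the idea of a P/R curve concrete, here is a minimal sketch of how the points can be derived once each call has already been labelled true or false positive by a haplotype-aware comparison. It is not how vcfeval or hap.py are implemented; the function name `pr_curve`, the input layout, and the toy scores are assumptions made for illustration, and it simplifies by counting true positives on the call side only.

```python
from typing import List, Tuple


def pr_curve(
    scored_calls: List[Tuple[float, bool]], n_truth: int
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) points for a set of labelled calls.

    scored_calls: (score, is_true_positive) per call, e.g. QUAL or GQ as score.
                  The labels are assumed to come from a proper VCF comparison.
    n_truth:      total number of variants in the truth set.
    """
    if n_truth <= 0:
        raise ValueError("n_truth must be positive")

    # Sweep the threshold from the most confident call downwards.
    ordered = sorted(scored_calls, key=lambda c: c[0], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(ordered):
        score = ordered[i][0]
        # Calls sharing the same score are admitted together.
        while i < len(ordered) and ordered[i][0] == score:
            if ordered[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((score, tp / (tp + fp), tp / n_truth))
    return points


# Hypothetical toy data: four calls, four variants in the truth set.
calls = [(99.0, True), (80.0, True), (60.0, False), (40.0, True)]
for threshold, precision, recall in pr_curve(calls, n_truth=4):
    print(f"score>={threshold:g}\tprecision={precision:.2f}\trecall={recall:.2f}")
```

Each point answers "if I kept only calls scoring at least this much, what precision and recall would I get?", which is why the choice of score field matters.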

RTG-Tools is easy to install and use, and has a nice interactive ROC/PR curve viewer (in my opinion; disclosure: I help develop it). hap.py is a little harder to install and use, but it adds reporting breakdowns by stratification region, and it can use vcfeval as the underlying comparison engine. I would start with vcfeval and move on to hap.py if you need the extra metric breakdowns.
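As a rough starting point, a vcfeval run could be assembled along these lines. This is only a sketch: the helper `vcfeval_command` and all file names are hypothetical, the reference is assumed to have been converted to RTG's SDF format beforehand (`rtg format`), and the option names should be checked against `rtg vcfeval --help` for your installed version.

```python
import shlex
from typing import List, Optional


def vcfeval_command(
    baseline: str,
    calls: str,
    template_sdf: str,
    output_dir: str,
    evaluation_regions: Optional[str] = None,
    score_field: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) an `rtg vcfeval` command line."""
    cmd = [
        "rtg", "vcfeval",
        "--baseline", baseline,      # truth set, e.g. the GIAB NA12878 VCF
        "--calls", calls,            # VCF produced by your own pipeline
        "--template", template_sdf,  # reference genome in SDF format
        "--output", output_dir,      # directory for results and ROC data
    ]
    if evaluation_regions:
        # Restrict scoring to the truth set's high-confidence regions (BED).
        cmd += ["--evaluation-regions", evaluation_regions]
    if score_field:
        # Field used to rank calls when building the ROC/PR curve, e.g. QUAL.
        cmd += ["--vcf-score-field", score_field]
    return cmd


# Hypothetical file names, for illustration only.
cmd = vcfeval_command(
    "giab_NA12878.vcf.gz",
    "my_pipeline_NA12878.vcf.gz",
    "reference_sdf",
    "vcfeval_out",
    evaluation_regions="giab_confident_regions.bed",
    score_field="QUAL",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

To actually execute it you could pass the list to `subprocess.run(cmd, check=True)`. The ROC data written to the output directory can then be opened with `rtg rocplot` to view the curve.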

Thank you for your kind answer. vcfeval seems to be good.
