The result of Illumina/hap.py using the same file.
4.0 years ago
qhtjrmin ▴ 30

I am trying to use the tool 'Illumina/hap.py' from GA4GH to compare the results of variant calling tools (VCF files). Before comparing my results with this tool, I wanted to make sure that its output was reliable.

So I ran hap.py using the same VCF file as both truth and query. That is, I carried out the test with the command shown below.

$ {HAPPY}/bin/hap.py  \
    example/happy/NA12878_chr21.vcf.gz \
    example/happy/NA12878_chr21.vcf.gz \
    -r example/Ref/reference.fa \
    -o test

I expected the F1 score to be 1.0, but it wasn't. Even the TRUTH.TOTAL and QUERY.TOTAL counts were different.

I don't understand these results. Can anyone explain why they came out this way? Is it wrong to test hap.py using the same file?

Thank you for reading.

hap.py genome • 2.1k views
15 months ago

I believe this topic is covered in the following GitHub issue (for hap.py):

https://github.com/Illumina/hap.py/issues/143

I could reproduce the issue. However, it only affected a relatively small number of variants in the VCF file that I tested.

It appears that this can happen for a larger fraction of variants in other VCF files, but I have not observed that myself.

15 months ago
DBScan ▴ 450

You should use the vcfeval engine from RTG Tools; it gives the expected results (e.g. an F1 score of 1 when comparing identical VCFs).
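If I remember correctly, hap.py can invoke vcfeval itself via its --engine option, so you don't have to run RTG Tools separately. A sketch of the original command adapted this way (the input paths are the example files from the question; check your hap.py version's help for the exact engine-related options, e.g. those pointing to the RTG installation):

```shell
# Rerun the self-comparison, but with the vcfeval comparison engine
# instead of hap.py's default xcmp engine. ${HAPPY} is assumed to be
# the hap.py install directory, as in the original command.
${HAPPY}/bin/hap.py \
    example/happy/NA12878_chr21.vcf.gz \
    example/happy/NA12878_chr21.vcf.gz \
    -r example/Ref/reference.fa \
    -o test_vcfeval \
    --engine=vcfeval
```

With identical truth and query inputs, this run should report no FP/FN calls and an F1 score of 1.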
