I am trying to use the tool 'Illumina/hap.py' from GA4GH to compare the results of variant calling tools (VCF files). Before comparing my actual results with this tool, I wanted to make sure its output is reliable.
As a sanity check, I ran hap.py with the same VCF file as both truth and query, using the command below.
$ {HAPPY}/bin/hap.py \
example/happy/NA12878_chr21.vcf.gz \
example/happy/NA12878_chr21.vcf.gz \
-r example/Ref/reference.fa \
-o test
I expected the F1 score to be 1.0, but it wasn't. Even the TRUTH.TOTAL and QUERY.TOTAL counts differed.
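For context, my understanding of how the F1 score is derived from the TP/FP/FN counts (this is a sketch of the standard precision/recall/F1 definitions, not hap.py's actual code):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)  # fraction of query calls that match truth
    recall = tp / (tp + fn)     # fraction of truth calls found in query
    return 2 * precision * recall / (precision + recall)

# If truth and query were counted identically, FP = FN = 0, so F1 = 1.0:
print(f1_score(tp=100, fp=0, fn=0))  # 1.0
```

By this reasoning, comparing a file against itself should give FP = FN = 0 and F1 = 1.0, which is why the observed result confused me.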
I don't understand these results. Can anyone explain why this happens? Is it wrong to test hap.py with the same file as both truth and query?
Thank you for reading.