6.1 years ago
ofonov
I ran hap.py twice on the same VCF file: first without a stratification file, and then with a stratification file for GC content and low-complexity regions. I am puzzled by the variation in the output. I get different precision and recall values for the same stratification category, one that hap.py calculates by default (no additional stratification files are needed for it).
Why do I observe this variation?
Type   Subtype   Subset       Filter  Genotype  QQ.Field  QQ  METRIC.Recall  METRIC.Precision
INDEL  I16_PLUS  TS_boundary  PASS    *         QUAL      *   0.5            0.857143
INDEL  I16_PLUS  TS_boundary  PASS    *         QUAL      *   0.546392       0.928571
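Recall and precision are both ratios of variant counts (recall = TP/(TP+FN), precision = TP/(TP+FP)), so the six-decimal values above can be traced back to fractions. The sketch below uses a hypothetical helper, `smallest_fraction`, which is not part of hap.py. It assumes the values were rounded to six decimals and searches for the smallest denominator consistent with each one. The true counts could be any multiple of the fraction it returns.

```python
from fractions import Fraction


def smallest_fraction(value: float, decimals: int = 6, max_den: int = 10_000) -> Fraction:
    """Return the fraction p/q with the smallest denominator q that rounds to `value`.

    A rounded ratio such as 0.857143 is consistent with many count pairs
    (6/7, 12/14, ...); this only finds the one with the smallest denominator.
    """
    tol = 0.5 * 10 ** -decimals  # maximum rounding error at `decimals` places
    for q in range(1, max_den + 1):
        p = round(value * q)  # best numerator for this denominator
        if abs(p / q - value) <= tol:
            return Fraction(p, q)
    raise ValueError(f"no fraction with denominator <= {max_den} matches {value}")


# Recall and precision values copied from the two INDEL I16_PLUS TS_boundary rows.
for label, value in [("recall, row 1", 0.5), ("precision, row 1", 0.857143),
                     ("recall, row 2", 0.546392), ("precision, row 2", 0.928571)]:
    print(f"{label}: {value} -> {smallest_fraction(value)}")
```

For the two rows above it returns 1/2 and 6/7 for the first row, and 53/97 and 13/14 for the second. If those smallest fractions reflect the real counts, the precision values rest on very few query variants. In that case, a difference of a single variant between runs would be enough to move them visibly.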
The following command was used for the first run:
sudo docker run -it \
-v `pwd`:/data \
pkrusche/hap.py \
/opt/hap.py/bin/hap.py \
/data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz \
/data/SAMPLE.HG001-NA12878.vcf.gz \
-f /data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed \
-r /data/references/GRCh37_Homo_sapiens/Homo_sapiens.GRCh37.dna.primary_assembly.fa \
--verbose \
--logfile /data/out_dir/log.txt \
-o /data/out_dir/SAMPLE
The following command was used for the second run:
sudo docker run -it \
-v `pwd`:/data \
pkrusche/hap.py \
/opt/hap.py/bin/hap.py \
/data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz \
/data/SAMPLE.HG001-NA12878.vcf.gz \
-f /data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed \
-r /data/references/GRCh37_Homo_sapiens/Homo_sapiens.GRCh37.dna.primary_assembly.fa \
--stratification /data/LowComplexity_GC.tsv \
--verbose \
--logfile /data/out_dir/log.txt \
-o /data/out_dir/SAMPLE
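Both commands pass the same `-o /data/out_dir/SAMPLE` and `--logfile /data/out_dir/log.txt`. Unless the files were moved between runs, the second run therefore writes over the first run's reports. To compare the two runs side by side, each one needs its own output prefix.

The sketch below assumes the two runs were saved under separate prefixes. It lines up rows by the key columns shown in the table and lists the metrics that changed. The helpers `load_rows` and `diff_metrics` are hypothetical and use only Python's standard `csv` module. The file they read is assumed to be hap.py's per-stratum CSV report (`<prefix>.extended.csv`) carrying the column names from the table.

```python
import csv
from typing import Dict, List, Tuple

# Key and metric columns, named as in the table above.
KEY_COLS = ["Type", "Subtype", "Subset", "Filter", "Genotype", "QQ.Field", "QQ"]
METRIC_COLS = ["METRIC.Recall", "METRIC.Precision"]


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read a CSV report into a list of row dicts keyed by header name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def diff_metrics(rows_a: List[Dict[str, str]],
                 rows_b: List[Dict[str, str]],
                 tol: float = 1e-9) -> List[Tuple[Tuple[str, ...], str, str, str]]:
    """Return (key, metric, value_a, value_b) for every metric that differs."""
    index_b = {tuple(r[c] for c in KEY_COLS): r for r in rows_b}
    diffs = []
    for row in rows_a:
        key = tuple(row[c] for c in KEY_COLS)
        other = index_b.get(key)
        if other is None:
            continue  # this stratum only appears in run A
        for col in METRIC_COLS:
            a, b = row.get(col, ""), other.get(col, "")
            try:
                changed = abs(float(a) - float(b)) > tol
            except ValueError:  # empty or non-numeric cell: compare as text
                changed = a != b
            if changed:
                diffs.append((key, col, a, b))
    return diffs
```

For example, `diff_metrics(load_rows("nostrat/SAMPLE.extended.csv"), load_rows("strat/SAMPLE.extended.csv"))` compares the two runs. Both directory names here are hypothetical. Strata that exist only in the stratified run (such as `gc15`) are skipped, because only rows present in both files are compared.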
Here is a sample of the stratification file:
gc15 /data/GA4GH/benchmarking-tools/resources/stratification-bed-files/GCcontent/human_g1k_v37_l100_gc15_slop50.bed.gz
AllRepeats_51to200bp_gt95identity_merged /data/GA4GH/benchmarking-tools/resources/stratification-bed-files/LowComplexity/AllRepeats_51to200bp_gt95identity_merged.bed.gz
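Each line of the stratification file pairs a region name with a BED path, and hap.py's documentation describes the file as TSV. A quick check before running hap.py can catch malformed lines or duplicate region names.

The sketch below uses two hypothetical helpers, `parse_stratification` and `missing_files`, which are not part of hap.py. To accept the lines exactly as pasted above, it splits on any whitespace and expects exactly two fields per non-empty line. Because the paths are given as `/data/...`, the existence check is only meaningful inside the container where `/data` is mounted.

```python
import os
from typing import List, Tuple


def parse_stratification(text: str) -> List[Tuple[str, str]]:
    """Parse 'name <whitespace> bed_path' lines into (name, path) pairs."""
    entries: List[Tuple[str, str]] = []
    seen = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split()  # tab or space separated
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        name, path = fields
        if name in seen:
            raise ValueError(f"line {lineno}: duplicate region name {name!r}")
        seen.add(name)
        entries.append((name, path))
    return entries


def missing_files(entries: List[Tuple[str, str]]) -> List[str]:
    """Return the BED paths that do not exist on the current filesystem."""
    return [path for _, path in entries if not os.path.exists(path)]
```

Reading the file with `open("LowComplexity_GC.tsv").read()` and passing the text to `parse_stratification` either returns the (name, path) pairs or reports the first malformed line.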