Hello, I have sequenced the HG002 (Genome in a Bottle) sample using short-read sequencing. The reads have been mapped to GRCh37, and I have created a BAM file using the GATK Best Practices workflow.
I would like to do some benchmarking of structural variants larger than 20 bp. I downloaded the VCF file from https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_UnionSVs_12122017/
I would like to compare my BAM file against this VCF. However, I see that the variant calls do not line up correctly with my BAM file. Is this the right way to do it, or is there some preprocessing that needs to be done before I make this comparison?
Thanks!
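One thing that may be worth ruling out first is a contig naming difference between the two files (for example `1` versus `chr1`). Nothing in this thread confirms that this is the cause; it is only a quick sanity check. The sketch below uses plain Python on made-up example lines. The function names `vcf_contigs` and `missing_from_bam` are hypothetical, and the BAM contig names would have to be copied from the `@SQ` lines of your own BAM header.

```python
def vcf_contigs(vcf_lines):
    """Collect the CHROM value of every data line, skipping '#' header lines."""
    contigs = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        contigs.add(line.split("\t", 1)[0])
    return contigs


def missing_from_bam(bam_contigs, vcf_lines):
    """Return VCF contig names that do not occur among the BAM contig names."""
    return sorted(vcf_contigs(vcf_lines) - set(bam_contigs))


# Made-up records for illustration only; not taken from the GIAB file.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t10000\t.\tN\t<DEL>\t.\t.\tSVTYPE=DEL;SVLEN=-300",
    "2\t20000\t.\tN\t<INS>\t.\t.\tSVTYPE=INS;SVLEN=50",
]

# Assumed BAM contig names (hypothetical), as they might appear in @SQ lines.
example_bam_contigs = ["chr1", "chr2"]

print(missing_from_bam(example_bam_contigs, example_vcf))
```

An empty list would mean every contig named in the VCF also exists in the BAM header, so the mismatch would have to come from somewhere else.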
Do you mean you bought the sample from NIST and sequenced it yourself?
Yes, exactly.
Did you see the README in the folder you linked above? It says the following
These do not appear to be SV calls for one sample.
They are for the trio, but the VCFs should be able to identify the variants for the individual genomes as well. Or is there a way to link the VCFs?
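If a trio VCF carries one genotype column per individual, picking out the records for a single genome could look roughly like the sketch below. This is an assumption-laden illustration, not a description of the GIAB file: whether the union VCF has per-sample columns at all, and whether a column is named `HG002`, would need to be checked in its header. The function name `carried_by` and the example records are made up.

```python
def carried_by(vcf_lines, sample):
    """Keep data lines where `sample` has at least one non-reference allele."""
    col = None
    kept = []
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9 in a VCF with genotypes.
            if sample not in fields[9:]:
                raise ValueError(f"sample {sample!r} not found in VCF header")
            col = fields.index(sample, 9)
            continue
        if col is None:
            raise ValueError("data line seen before the #CHROM header line")
        # Locate the GT entry using the FORMAT column (index 8).
        gt_index = fields[8].split(":").index("GT")
        genotype = fields[col].split(":")[gt_index]
        alleles = genotype.replace("|", "/").split("/")
        # "0" is the reference allele and "." is a missing call.
        if any(allele not in ("0", ".") for allele in alleles):
            kept.append(line)
    return kept


# Made-up trio records for illustration only; sample names are assumed.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG002\tHG003\tHG004"
example_trio_vcf = [
    "##fileformat=VCFv4.2",
    header,
    "1\t10000\tsv1\tN\t<DEL>\t.\t.\tSVTYPE=DEL\tGT\t0/1\t0/0\t0/1",
    "1\t20000\tsv2\tN\t<INS>\t.\t.\tSVTYPE=INS\tGT\t0/0\t1/1\t0/1",
    "2\t30000\tsv3\tN\t<DEL>\t.\t.\tSVTYPE=DEL\tGT\t./.\t0/1\t0/0",
]

for record in carried_by(example_trio_vcf, "HG002"):
    print(record.split("\t")[2])
```

If the file turns out to have no per-sample genotype columns, this approach does not apply, and the README would be the place to look for how calls are attributed to individuals.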