I am using the HPRC human pangenome as the reference for aligning my whole-genome sequencing data. I am mainly interested in the HLA region, where I expect the pangenome reference to improve alignment. I aligned the reads with Giraffe using the provided HPRC index files (.gbwt, .gg, .dist, and .min), and I now have a GAM file.
I would like to generate a VCF file containing all the variants, with coordinates relative to the hg38 (GRCh38) genome, which I understand should be one of the paths in the graph. Can I use the existing files directly with "vg call", or do they need additional preprocessing? What parameters should I use for the vg call command?
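For orientation, here is a minimal sketch of the usual two-step route from a GAM to a VCF: `vg pack` to compute read support, then `vg call` against a chosen reference path. It only assembles the command lines so they can be inspected before running anything. The file names (`hprc.xg`, `sample.gam`), the sample name, the mapping-quality cutoff of 5, and the path name `GRCh38.chr6` are placeholders I am assuming, not values taken from the HPRC release. It also assumes a graph file that `vg pack` and `vg call` can read directly, such as an .xg.

```python
import shlex


def build_call_commands(graph, gam, sample, ref_path, threads=4):
    """Return the `vg pack` and `vg call` command lines as argument lists.

    All arguments are placeholders supplied by the caller; nothing is run here.
    """
    pack_file = f"{sample}.pack"
    # Step 1: compute read support from the GAM, ignoring low-MAPQ reads
    # (the cutoff of 5 is an assumed, commonly used value).
    pack_cmd = ["vg", "pack", "-x", graph, "-g", gam,
                "-Q", "5", "-t", str(threads), "-o", pack_file]
    # Step 2: call variants relative to one reference path (-p);
    # the VCF is written to standard output.
    call_cmd = ["vg", "call", graph, "-k", pack_file,
                "-p", ref_path, "-s", sample, "-t", str(threads)]
    return pack_cmd, call_cmd


if __name__ == "__main__":
    # Hypothetical file and path names, for illustration only.
    pack_cmd, call_cmd = build_call_commands(
        "hprc.xg", "sample.gam", "sample", "GRCh38.chr6")
    print(shlex.join(pack_cmd))
    print(shlex.join(call_cmd) + " > sample.vcf")
```

Printing the commands first makes it easy to check the reference path name against the graph before starting a long run.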
Thank you, that is very helpful! Where in your commands do you define GRCh38 as the reference? Is it in -r $snarls? How can I find out the path name? Should I run vg snarls first?
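As far as I know, `-r` in `vg call` only supplies a precomputed snarls file (from `vg snarls`) so the snarls are not recomputed. The reference is chosen separately with `-p`, and the available path names can be listed with `vg paths -L`. Below is a small sketch of picking the GRCh38 chromosome 6 path from that list. The graph file name `hprc.xg` and the naming patterns in the comments are assumptions, since path naming differs between graph releases, so check the real output.

```python
import subprocess


def list_paths(graph):
    """Return all path names in the graph, as printed by `vg paths -L`."""
    result = subprocess.run(["vg", "paths", "-L", "-x", graph],
                            check=True, capture_output=True, text=True)
    return result.stdout.splitlines()


def reference_paths(names, assembly="GRCh38", contig="chr6"):
    """Keep names that mention the assembly and end with the contig name.

    Matches styles such as 'GRCh38.chr6' or 'GRCh38#0#chr6' (assumed
    naming patterns; verify against your own graph).
    """
    return [n for n in names if assembly in n and n.endswith(contig)]


if __name__ == "__main__":
    # 'hprc.xg' is a hypothetical file name; requires vg on the PATH.
    for name in reference_paths(list_paths("hprc.xg")):
        print(name)
```

The name printed here is what would go after `-p` in `vg call`.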