Hi, I am building a personalized pangenome (vg haplotype). I have WGS data from a mixture of individuals (let's say HG001 + HG002 + HG003). HG001 makes up 90% of the mixture and the other two make up 5% each. In this case, can I expect the personalized pangenome to approximate the HG001 assembly? Or will it approximate all three individuals' assemblies with equal weight?
Thanks! I am looking at the documentation. How are the absent/present/heterozygous calls determined in this process? Is it based on each kmer's frequency in the whole kmer library?
vg first estimates kmer coverage from the kmer counts. If you have 30x 150 bp reads, kmer coverage should be 21 or 22 with the default minimizer parameters. If the total frequency of a kmer and its reverse complement is around that value, the kmer is classified as homozygous. Kmers with frequency close to 50% of the coverage are considered heterozygous, with the threshold between heterozygous and homozygous being somewhere around 70% of the coverage. Kmers with frequency below 10% of the coverage are considered absent, and those above 250% are ignored.
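To make the cutoffs above concrete, here is a rough Python sketch of the classification. It is not vg's actual code: the function name `classify_kmer` and its parameters are made up for illustration, the cutoffs are the approximate figures quoted above, and treating everything between 10% and 70% of the coverage as heterozygous is a simplifying assumption.

```python
def classify_kmer(count, coverage,
                  absent_frac=0.10,    # assumed cutoff: below this, absent
                  het_hom_frac=0.70,   # assumed cutoff between het and hom
                  ignore_frac=2.50):   # assumed cutoff: above this, ignored
    """Classify a kmer by its count relative to the estimated kmer coverage.

    `count` is the combined count of the kmer and its reverse complement.
    Hypothetical helper; the cutoffs are approximations, not vg's exact values.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    ratio = count / coverage
    if ratio < absent_frac:
        return "absent"
    if ratio > ignore_frac:
        return "ignored"
    if ratio < het_hom_frac:
        return "heterozygous"
    return "homozygous"


if __name__ == "__main__":
    coverage = 22  # e.g. roughly what 30x 150 bp reads would give
    for count in (1, 11, 22, 60):
        print(count, classify_kmer(count, coverage))
```

Under these assumed cutoffs, with a kmer coverage of 22, a kmer seen only once (about 5% of the coverage) comes out as absent, 11 as heterozygous, 22 as homozygous, and 60 (above 250%) as ignored.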