Whether personalized pangenome considers allele frequency
5 weeks ago
Wang Cong ▴ 20

Hi, I am building a personalized pangenome (vg haplotype). I have WGS data from a mixture of individuals (say HG001 + HG002 + HG003). HG001 makes up 90% of the mixture, and the other two make up 5% each. In this case, can I expect the personalized pangenome to approximate the HG001 assembly? Or will it approximate all three individuals' assemblies with equal weight?

pangenome vg • 423 views
4 weeks ago
Jouni Sirén ▴ 540

The sampling algorithm only sees the k-mer counts in the reads. If 90% of the reads are from the same sample, the result will be close to a personalized pangenome for that sample. The biggest impact is probably from k-mers that are absent from the primary sample but homozygous in the other two samples. Their frequency will often be high enough that they will be classified as heterozygous.
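To make the mixture effect concrete, here is a rough back-of-the-envelope sketch. It is not vg code: the function `expected_fraction` and the dictionary layout are hypothetical, and it assumes diploid samples, reads drawn in proportion to the mixture shares, and no sequencing errors. Frequencies are expressed relative to the k-mer coverage, so a k-mer that is homozygous in a pure sample sits at about 1.0.

```python
def expected_fraction(mixture, copies):
    """Expected k-mer frequency as a fraction of the total k-mer coverage.

    mixture: sample name -> share of the reads (shares sum to 1).
    copies:  sample name -> number of haplotypes (0, 1 or 2) carrying the k-mer.
    Assumes diploid samples and error-free reads (illustrative only).
    """
    return sum(share * copies.get(sample, 0) / 2
               for sample, share in mixture.items())


# Mixture from the question: 90% HG001, 5% HG002, 5% HG003.
mixture = {"HG001": 0.90, "HG002": 0.05, "HG003": 0.05}

# Absent from the primary sample, homozygous in both minor samples.
minor_only = expected_fraction(mixture, {"HG002": 2, "HG003": 2})

# Heterozygous in the primary sample, absent from the minor samples.
major_het = expected_fraction(mixture, {"HG001": 1})

print(f"minor-only k-mer: {minor_only:.2f} of coverage")
print(f"HG001 het k-mer:  {major_het:.2f} of coverage")
```

Under these assumptions, a k-mer found only in the two minor samples reaches about 10% of the coverage, while a heterozygous k-mer in HG001 drops only slightly, from 50% to 45%.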


Thanks! I am looking at the documentation. How is absent/present/heterozygous determined in this process? Is it through the frequency in the whole k-mer library?



vg first estimates k-mer coverage from the k-mer counts. With 30x coverage from 150 bp reads, the k-mer coverage should be 21 or 22 with the default minimizer parameters. If the total frequency of a k-mer and its reverse complement is around that value, the k-mer is classified as homozygous. K-mers with frequency close to 50% of the coverage are considered heterozygous, with the threshold between heterozygous and homozygous somewhere around 70%. K-mers with frequency below 10% of the coverage are considered absent, and those above 250% are ignored.
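The rules above could be sketched roughly as follows. This is an illustration, not the actual vg implementation: the function name `classify_kmer` is hypothetical, and the cutoffs (10%, 70%, 250%) are the approximate values quoted in this reply, so the real thresholds may differ.

```python
def classify_kmer(count, coverage):
    """Classify a k-mer by its count relative to the estimated k-mer coverage.

    count:    combined count of the k-mer and its reverse complement.
    coverage: estimated k-mer coverage (e.g. about 21 or 22 for 30x 150 bp reads).
    The cutoffs are approximate values taken from the discussion, not from vg.
    """
    ratio = count / coverage
    if ratio < 0.10:
        return "absent"
    if ratio > 2.50:
        return "ignored"        # too frequent to be informative
    if ratio < 0.70:
        return "heterozygous"
    return "homozygous"


coverage = 22
for count in (1, 11, 22, 60):
    print(count, classify_kmer(count, coverage))
```

With a coverage of 22, a count of 11 falls in the heterozygous range and a count of 22 in the homozygous range. A k-mer at roughly 10% of the coverage sits right on the absent/heterozygous boundary, so small fluctuations in the count decide its class.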
