Question

Identify most and least abundant allele (nMAJ and nMIN) for pooled heterozygosity (hp) analysis from vcf file

0

4.4 years ago

tothepoint ▴ 940

I am trying to calculate pooled heterozygosity (Hp) from a VCF file using a 150 kb sliding window, which requires identifying nMAJ and nMIN. I am confused after reading papers that give the formula but no particular method for obtaining nMAJ and nMIN. Could you please share how to calculate them? I would be grateful to you all.

Hp = 2 ΣnMAJ ΣnMIN / (ΣnMAJ + ΣnMIN)^2

Thank you

selection wgs vcf gatk • 1.7k views

ADD COMMENT • link updated 4.1 years ago by kk.mahsa ▴ 150 • written 4.4 years ago by tothepoint ▴ 940

0

Hi tothepoint,

I have the same question. It has been about three months since you asked it. If you have received an answer in that time, please share it so that we can use it as well.

Thanks

ADD REPLY • link 4.1 years ago by kk.mahsa ▴ 150

score 0 · Answer 1 · 2021-07-14

Hi Devarora,

As far as I understand, for every SNP you have to find the count of the most common allele (nMAJ) and sum these counts over all SNPs in your 150 kb window. The same goes for the least common allele at every SNP (nMIN): again, you sum the counts over the window.
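To make the summing step concrete, here is a minimal Python sketch. It assumes you already have, for one chromosome, a list of (position, count of allele 1, count of allele 2) per biallelic SNP. The function names, the 75 kb step, and the convention that windows start at multiples of the step are my own assumptions for illustration, not something taken from a specific paper.

```python
from collections import defaultdict


def pooled_heterozygosity(sum_maj, sum_min):
    """Hp = 2 * sum(nMAJ) * sum(nMIN) / (sum(nMAJ) + sum(nMIN))^2."""
    total = sum_maj + sum_min
    if total == 0:
        return None  # no informative counts in this window
    return 2 * sum_maj * sum_min / total ** 2


def hp_by_window(snps, window=150_000, step=75_000):
    """Return {window_start: Hp} for one chromosome.

    snps: iterable of (position, count_a, count_b) for biallelic SNPs.
    Windows are [start, start + window) with start = k * step
    (an assumed convention; adjust to match the paper you follow).
    """
    sums = defaultdict(lambda: [0, 0])  # start -> [sum nMAJ, sum nMIN]
    for pos, count_a, count_b in snps:
        n_maj, n_min = max(count_a, count_b), min(count_a, count_b)
        # Indices k of every window whose interval contains this position.
        first = max(0, (pos - window) // step + 1)
        last = pos // step
        for k in range(first, last + 1):
            sums[k * step][0] += n_maj
            sums[k * step][1] += n_min
    return {start: pooled_heterozygosity(*s) for start, s in sorted(sums.items())}
```

Note that the major allele is decided per SNP before summing, so it can be the reference allele at one site and the alternate allele at the next. Setting `step` equal to `window` gives non-overlapping windows.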

You can extract the allele counts from the VCF (https://www.internationalgenome.org/wiki/Analysis/vcf4.0/). For this, you will probably need to write a small script in bash, awk, Perl, or Python to extract the right column and retrieve the numbers that you need to sum.
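As a rough illustration of the extraction step, the Python sketch below reads VCF lines and pulls per-allele read counts from the AD entry of the FORMAT column. This assumes a GATK-style VCF in which each pool is one sample column and AD holds the reference and alternate read depths. If your samples are individually genotyped, the AC and AN INFO fields may be the more appropriate source. The function name and the `sample_index` parameter are hypothetical.

```python
def major_minor_counts(vcf_lines, sample_index=0):
    """Yield (chrom, pos, nMAJ, nMIN) for biallelic SNPs.

    Assumes the AD FORMAT field holds "ref_depth,alt_depth" for the
    chosen sample column (the pool). Other records are skipped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 + sample_index:
            continue  # no FORMAT/sample columns for this index
        chrom, pos = fields[0], int(fields[1])
        ref, alt, fmt = fields[3], fields[4], fields[8]
        # Keep only biallelic single-nucleotide variants.
        if len(ref) != 1 or len(alt) != 1 or alt not in "ACGT":
            continue
        keys = fmt.split(":")
        values = fields[9 + sample_index].split(":")
        if "AD" not in keys or keys.index("AD") >= len(values):
            continue
        try:
            ref_n, alt_n = (int(x) for x in values[keys.index("AD")].split(","))
        except ValueError:
            continue  # missing value (".") or unexpected number of entries
        yield chrom, pos, max(ref_n, alt_n), min(ref_n, alt_n)
```

For an uncompressed file, this could be called as `major_minor_counts(open("pool.vcf"))`, where `pool.vcf` is a placeholder name. The resulting nMAJ and nMIN values are what you then sum within each 150 kb window.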

Best,

Guenole