Question

Calculation of heterozygosity at multi allelic region from 1000 Genomes data

0

9.5 years ago

suimye • 0

Hi all,

I would like to ask how to calculate heterozygosity from 1000 Genomes data.

In the ENCODE study, one of the diversity indices shown in Figure 1 was calculated from the YRI population. The authors wrote that "Heterozygosity was calculated basewise as 2pq, where p and q are allele frequencies estimated from the pilot sample of the 1000 Genomes YRI population". However, the 1000 Genomes data contain many multiallelic SNVs, such as

22      16051453        rs62224611      A       C,G     100     PASS    AC=478,17;AF=0.0954473,0.00339457;AN=5008;NS=2504;DP=22548;EAS_AF=0.0744,0;AMR_AF=0.1239,0;AFR_AF=0.003,0;EUR_AF=0.0746,0.003;SAS_AF=0.2434,0.0143;AA=.|||;VT=SNP;MULTI_ALLELIC

How can I calculate 2pq from this?

I assume that the possible heterozygous genotypes in this case are "AC", "AG" and "CG".

To calculate the heterozygosity "H", let the allele frequencies be

Allele A: p

Allele C: q

Allele G: r,

Then,

H = 2pq + 2pr + 2rq.

Is this OK?
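
To make the formula concrete, here is a minimal Python sketch. It assumes Hardy-Weinberg proportions and that the reference allele frequency is one minus the sum of the AF values in the record. The function names are my own, not from the ENCODE paper or any 1000 Genomes tool. It computes H both as the sum of 2 x (frequency product) over all allele pairs and as one minus the sum of squared frequencies; the two should agree whenever the frequencies sum to 1.

```python
from itertools import combinations


def allele_freqs(alt_freqs):
    """Return [ref, alt1, alt2, ...], assuming ref = 1 - sum(alt frequencies)."""
    return [1.0 - sum(alt_freqs)] + list(alt_freqs)


def heterozygosity_pairwise(alt_freqs):
    """Sum of 2 * p_i * p_j over every pair of distinct alleles (2pq + 2pr + 2qr ...)."""
    freqs = allele_freqs(alt_freqs)
    return sum(2.0 * a * b for a, b in combinations(freqs, 2))


def heterozygosity_one_minus_homozygosity(alt_freqs):
    """Equivalent form: 1 - sum of squared allele frequencies."""
    freqs = allele_freqs(alt_freqs)
    return 1.0 - sum(f * f for f in freqs)


# Global AF values for the C and G alleles, copied from the VCF record above.
global_af = [0.0954473, 0.00339457]
h_pairwise = heterozygosity_pairwise(global_af)
h_alt_form = heterozygosity_one_minus_homozygosity(global_af)
print(h_pairwise, h_alt_form)
```

With a single ALT allele this reduces to the usual 2pq.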

Thanks a lot!

suimye

snp genome • 3.1k views

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 9.5 years ago by suimye • 0

2

I am not familiar with that paper but they may have used only bi-allelic sites. Please note that your variant above is in fact bi-allelic in YRI because the AFR super-population lacks a carrier for the G allele (INFO:AFR_AF=0.003,0;).
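
One way to check this for a given record is to read the per-population frequencies out of the INFO column and keep only the ALT alleles with a non-zero frequency. The sketch below is illustrative only: the helper names are hypothetical, and it uses the AFR_AF super-population field as a stand-in, since the record carries no YRI-specific frequency (that would have to be computed from the YRI samples' genotypes, which this does not attempt).

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def segregating_alts(alts, info, key):
    """Return (allele, frequency) pairs for ALT alleles with frequency > 0 under `key`."""
    freqs = [float(x) for x in parse_info(info)[key].split(",")]
    return [(allele, freq) for allele, freq in zip(alts, freqs) if freq > 0]


# ALT and INFO columns copied from the record in the question.
alts = "C,G".split(",")
info = (
    "AC=478,17;AF=0.0954473,0.00339457;AN=5008;NS=2504;DP=22548;"
    "EAS_AF=0.0744,0;AMR_AF=0.1239,0;AFR_AF=0.003,0;EUR_AF=0.0746,0.003;"
    "SAS_AF=0.2434,0.0143;AA=.|||;VT=SNP;MULTI_ALLELIC"
)

afr_alts = segregating_alts(alts, info, "AFR_AF")
print(afr_alts)  # only C remains, so the site is bi-allelic (A/C) in AFR
```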

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 9.5 years ago by trausch ★ 2.0k

0

Thanks for reading and commenting!

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 9.5 years ago by suimye • 0