Calculation of heterozygosity at multiallelic sites from 1000 Genomes data
8.9 years ago
suimye • 0

Hi all,

I would like to ask how to calculate heterozygosity from 1000 Genomes data.

In the ENCODE study, one of the diversity indices shown in Figure 1 was calculated from the YRI population. The authors wrote that "Heterozygosity was calculated basewise as 2pq, where p and q are allele frequencies estimated from the pilot sample of the 1000 Genomes YRI population". However, the 1000 Genomes data contain many multiallelic SNVs, such as:

22      16051453        rs62224611      A       C,G     100     PASS    AC=478,17;AF=0.0954473,0.00339457;AN=5008;NS=2504;DP=22548;EAS_AF=0.0744,0;AMR_AF=0.1239,0;AFR_AF=0.003,0;EUR_AF=0.0746,0.003;SAS_AF=0.2434,0.0143;AA=.|||;VT=SNP;MULTI_ALLELIC

How can I calculate 2pq from this?

I assume that the possible heterozygous genotypes in this case are "AC", "AG" and "CG".

To calculate the heterozygosity "H", let the allele frequencies be

Allele A: p

Allele C: q

Allele G: r,

Then,

H = 2pq + 2pr + 2rq.

Is this OK?
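
As a quick check of the formula above, here is a minimal Python sketch. It assumes Hardy-Weinberg proportions, and the function names are hypothetical, made up for illustration. Because p + q + r = 1, the pairwise sum 2pq + 2pr + 2rq should equal 1 - (p^2 + q^2 + r^2), so the same function also covers the biallelic 2pq case. The input is the list of ALT frequencies as given in an AF-style INFO field; the REF frequency is taken as the remainder.

```python
from itertools import combinations


def all_allele_freqs(alt_freqs):
    """Return [REF, ALT1, ALT2, ...] frequencies, with REF as 1 - sum(ALT)."""
    ref = 1.0 - sum(alt_freqs)
    return [ref, *alt_freqs]


def expected_heterozygosity(alt_freqs):
    """Expected heterozygosity as 1 - sum of squared allele frequencies."""
    return 1.0 - sum(f * f for f in all_allele_freqs(alt_freqs))


def pairwise_heterozygosity(alt_freqs):
    """Same quantity written out as the sum of 2*p_i*p_j over allele pairs."""
    return sum(2 * a * b for a, b in combinations(all_allele_freqs(alt_freqs), 2))


# Global AF values for C and G copied from the VCF record above.
alt = [0.0954473, 0.00339457]
print(expected_heterozygosity(alt))
print(pairwise_heterozygosity(alt))
```

Both functions should print the same value up to floating-point rounding, which suggests the three-allele formula is a consistent extension of 2pq.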

Thanks a lot!

suimye

snp genome • 2.8k views

I am not familiar with that paper, but they may have used only biallelic sites. Note that your variant above is in fact biallelic in YRI, because the AFR super-population has no carriers of the G allele (INFO: AFR_AF=0.003,0).
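
To illustrate the point, here is a rough Python sketch that pulls the AFR_AF values out of the INFO string of the record above and computes the expected heterozygosity from them. The helper names are hypothetical. Note that AFR_AF is the super-population frequency, so it is only a stand-in here; a YRI-specific value would need allele counts from the YRI samples themselves.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def alt_freqs(info, key):
    """Return the comma-separated ALT frequencies stored under `key` as floats."""
    return [float(x) for x in parse_info(info)[key].split(",")]


# INFO column copied from the rs62224611 record in the question.
INFO = ("AC=478,17;AF=0.0954473,0.00339457;AN=5008;NS=2504;DP=22548;"
        "EAS_AF=0.0744,0;AMR_AF=0.1239,0;AFR_AF=0.003,0;EUR_AF=0.0746,0.003;"
        "SAS_AF=0.2434,0.0143;AA=.|||;VT=SNP;MULTI_ALLELIC")

afr = alt_freqs(INFO, "AFR_AF")   # frequencies of C and G in AFR
ref = 1.0 - sum(afr)              # frequency of the reference allele A
h_afr = 1.0 - ref ** 2 - sum(f * f for f in afr)
print(afr, h_afr)
```

Since the G frequency is 0 in AFR, the result reduces to the plain 2pq for alleles A and C.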


Thanks for reading and commenting!
