Question

How To Calculate Genetic Heterogeneity From Genotype Data - How Useful Is This Measure?

13.8 years ago

Larry_Parnell

After providing answers to over 100 questions here, I now have one of my own. Actually, this is a two-part question. First, what tool(s) do you use to calculate genetic heterogeneity (heterozygosity) from SNP genotype data collected across an entire chromosome or genome? If the measure of heterogeneity is at or near zero, then the individual (human, animal, plant) is a product of inbreeding. This number will rise as the parents come from increasingly divergent genetic backgrounds.

That brings up the second part of the question. For those of you who have looked into such measures of genetic diversity or heterogeneity, how useful are they, and what kinds of values can I expect from the genome-wide human SNP genotypes I have? A preliminary and crude analysis showed 69% of SNPs across chromosome 7 to be homozygous, but that value rises to over 92% across two small HLA loci. That seems interesting, but I don't know where to go with this.

Thanks in advance for any insight or advice.
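The crude percentage described above amounts to counting homozygous calls among all called SNPs in a region. A minimal sketch, assuming the calls are available as hypothetical `(chromosome, position, genotype)` tuples with two-letter genotype strings such as `"AG"` and no-calls written as `"--"`; the positions in the example are made up:

```python
from typing import Iterable, Optional, Tuple

# Hypothetical input format: (chromosome, position, genotype), e.g. ("7", 200, "AG")
Call = Tuple[str, int, str]
VALID_ALLELES = set("ACGT")


def homozygous_fraction(calls: Iterable[Call], chrom: str,
                        start: int = 0, end: Optional[int] = None) -> Optional[float]:
    """Fraction of called SNPs in [start, end] on `chrom` that are homozygous.

    No-calls ("--") or anything that is not two A/C/G/T alleles are skipped,
    so they count in neither the numerator nor the denominator.
    Returns None if the region contains no usable calls.
    """
    called = homozygous = 0
    for c, pos, gt in calls:
        if c != chrom or pos < start or (end is not None and pos > end):
            continue  # outside the requested region
        if len(gt) != 2 or not set(gt) <= VALID_ALLELES:
            continue  # no-call or malformed genotype
        called += 1
        if gt[0] == gt[1]:
            homozygous += 1
    return homozygous / called if called else None


calls = [("7", 100, "AA"), ("7", 200, "AG"), ("7", 300, "GG"),
         ("7", 400, "--"), ("6", 500, "CT")]
print(homozygous_fraction(calls, "7"))            # 2 of 3 called SNPs homozygous
print(homozygous_fraction(calls, "7", 150, 350))  # 0.5
```

This raw fraction depends heavily on the allele frequencies of the SNPs on the chip, so on its own it says little about inbreeding.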

Tags: genetics, snp, genotyping

Written 13.8 years ago by Larry_Parnell; last updated 2.7 years ago by Ram.

I know that without any doubt. My feeling is that the MHC will have high homogeneity. I'm interested in any tools that can do the calculations across any range of input SNPs, provided they are in genome order, and I'm curious about others' experiences with these calculations. Thanks, Al.

Reply by Larry_Parnell, 13.8 years ago.

Hi Larry, I am not sure that comparing the HLA loci to the genome as a whole is a fair comparison. It would be more interesting to compare them to other loci in the MHC, which are likely to have undergone similar historical selection.

Reply by Alastair Kerr, 13.8 years ago.

I think this level of homozygosity is quite odd in a population sense. Are you using the entire HapMap? How many haplotype blocks?

Reply by Jarretinha, 13.8 years ago.

I have genotype data for an individual across the entire genome. So, I could look at heterozygosity vs homozygosity (or rates of heterogeneity) for that individual across a chromosome or gene region or region of any size.
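For a single individual, the same count can be run along a chromosome in sliding windows to see where heterozygosity drops. A sketch, assuming positions are already in genome order and that windows are defined by a fixed number of called SNPs rather than by base pairs (so every window has the same denominator despite uneven SNP density); the `window` and `step` values are arbitrary illustrations, not recommendations:

```python
from typing import List, Sequence, Tuple


def windowed_heterozygosity(positions: Sequence[int], genotypes: Sequence[str],
                            window: int = 50, step: int = 25) -> List[Tuple[int, int, float]]:
    """Heterozygosity in windows of `window` consecutive called SNPs.

    Assumes `positions` is sorted (genome order) and genotypes[i] is the
    two-letter call at positions[i]. Returns (first_pos, last_pos, het_rate).
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have the same length")
    valid = set("ACGT")
    # Drop no-calls so each window holds exactly `window` usable genotypes
    usable = [(p, g) for p, g in zip(positions, genotypes)
              if len(g) == 2 and set(g) <= valid]
    result = []
    for i in range(0, max(len(usable) - window, 0) + 1, step):
        chunk = usable[i:i + window]
        if len(chunk) < window:
            break  # not enough SNPs left for a full window
        het = sum(1 for _, g in chunk if g[0] != g[1])
        result.append((chunk[0][0], chunk[-1][0], het / window))
    return result


positions = list(range(1, 11))
genotypes = ["AG", "AA", "CT", "GG", "AC", "TT", "AA", "CC", "GG", "TT"]
for first, last, het in windowed_heterozygosity(positions, genotypes, window=4, step=2):
    print(f"{first}-{last}: {het:.2f}")
```

Stretches of windows near zero are candidate runs of homozygosity; PLINK's `--homozyg` option is one established tool that searches for such runs more formally.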

Reply by Larry_Parnell, 13.8 years ago.

As a follow-up question: are there ways to also quantify heterogeneity from RNA-seq (or transcriptomics) data?

Reply by cwwong13, 3.4 years ago.

score 2 · Answer 1 · 2011-02-05

13.8 years ago

Jan Oosting

For each SNP, you will have to correct the observed homozygosity for the level of homozygosity expected in the population of interest. When I have few samples I use the HapMap frequencies for that, but with many samples it is probably better to calculate the population frequency of each allele from your own data.
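One standard way to make this correction is to compare each individual's observed homozygous count with the count expected from per-SNP allele frequencies under Hardy-Weinberg equilibrium, giving the method-of-moments inbreeding estimate F = (O - E) / (N - E) that PLINK's `--het` reports. This sketch is not Jan's R script; it assumes genotypes coded as 0/1/2 copies of an alternate allele (`None` for missing), with frequencies estimated from the cohort itself. With few samples, a reference panel's frequencies (such as HapMap) could be passed in as `freqs` instead:

```python
from typing import List, Optional, Sequence

# Assumed coding: 0, 1 or 2 copies of the alternate allele; None = missing call
Genotype = Optional[int]


def allele_frequencies(cohort: Sequence[Sequence[Genotype]]) -> List[Optional[float]]:
    """Alternate-allele frequency per SNP from all non-missing calls (rows = samples)."""
    freqs: List[Optional[float]] = []
    for j in range(len(cohort[0])):
        calls = [row[j] for row in cohort if row[j] is not None]
        freqs.append(sum(calls) / (2 * len(calls)) if calls else None)
    return freqs


def inbreeding_f(sample: Sequence[Genotype], freqs: Sequence[Optional[float]]) -> float:
    """Method-of-moments inbreeding estimate F = (O - E) / (N - E).

    O: observed homozygous calls; E: homozygous calls expected under
    Hardy-Weinberg, summing 1 - 2p(1 - p) over SNPs; N: informative SNPs used.
    """
    observed = expected = 0.0
    n = 0
    for g, p in zip(sample, freqs):
        if g is None or p is None or p in (0.0, 1.0):
            continue  # missing call, or monomorphic SNP carrying no information
        n += 1
        if g != 1:
            observed += 1
        expected += 1 - 2 * p * (1 - p)
    if n == 0:
        raise ValueError("no informative SNPs")
    return (observed - expected) / (n - expected)


cohort = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [0, 2, 2, 0],
]
freqs = allele_frequencies(cohort)  # [0.375, 0.5, 0.625, 0.5]
for i, sample in enumerate(cohort):
    print(i, round(inbreeding_f(sample, freqs), 3))
```

The fully homozygous fourth sample gets F = 1, while the other three sit slightly below 0, i.e. close to what their allele frequencies predict. Frequencies estimated from four samples are far too noisy for real use; the toy cohort only shows the arithmetic.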

For several chips I've noticed that the minor allele frequencies of the HLA region SNPs are quite low. This will give a high rate of homozygous SNPs if you do not correct for it.
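A quick Hardy-Weinberg calculation shows why. With minor allele frequency q and p = 1 - q, the expected homozygous fraction at a SNP is p^2 + q^2; the two q values below are illustrative, not taken from any particular chip:

$$
P(\text{homozygous}) = p^2 + q^2 = 1 - 2pq, \qquad p = 1 - q
$$

$$
q = 0.05:\; 1 - 2(0.95)(0.05) = 0.905 \qquad\qquad q = 0.40:\; 1 - 2(0.60)(0.40) = 0.52
$$

So an outbred individual would be expected to be homozygous at roughly 90% of SNPs whose minor allele frequency is around 5%, without any inbreeding at all.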

I have an R script that takes population frequencies into account, but I will have to polish it up a bit before I can post it here.

Posted 13.8 years ago by Jan Oosting.

Thanks, Jan, for your comments and insight. If you wish to share your script, please contact me as I would be curious to give it a try.

Reply by Larry_Parnell, 13.8 years ago.

Ram · Answer 2 · 2011-02-04

Well,

As I understand it, genetic heterogeneity is a population-level measure. For haplotype imputation, I favor BEAGLE. I think getting good and sufficient data is the hard part of the business.

Honestly, I don't know of a tool that can really calculate genetic diversity/heterogeneity in the population-genetics sense. Only R has useful packages (DEMEtics, popgen, genetics, pegas), and even those usually have to be hacked to accept SNP data. So I normally develop my own approach based on the ideas in this paper. Nevertheless, there are many problems with such analyses. The effective number of genes per locus is highly variable across a chromosome/genome, and the discrepancy is even larger between regions with very different recombination rates. Low diversity could simply reflect insufficient population sampling or biased haplotype reconstruction.
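For the population-level view, two standard per-locus summaries are the effective number of alleles, n_e = 1 / Σ p_i^2, and Nei's unbiased gene diversity (expected heterozygosity), H = 2n / (2n - 1) · (1 - Σ p_i^2) for 2n sampled gene copies. The sketch below computes both from a sample of diploid genotypes at one locus; it is not the approach from the paper referred to in this answer, and representing genotypes as allele pairs is an assumption:

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def locus_diversity(genotypes: Sequence[Tuple[str, str]]) -> Dict[str, float]:
    """Per-locus diversity from one (allele1, allele2) pair per diploid individual."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())  # 2n sampled gene copies
    if total < 2:
        raise ValueError("need at least one diploid genotype")
    sum_p2 = sum((c / total) ** 2 for c in counts.values())  # sum of squared allele frequencies
    return {
        "n_alleles": len(counts),
        "effective_alleles": 1 / sum_p2,
        # Nei's unbiased gene diversity: 2n / (2n - 1) * (1 - sum p_i^2)
        "gene_diversity": total / (total - 1) * (1 - sum_p2),
    }


balanced = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "G")]
skewed = [("A", "A")] * 9 + [("A", "G")]
for name, locus in (("balanced", balanced), ("skewed", skewed)):
    print(name, locus_diversity(locus))
```

Running it locus by locus along a chromosome makes the variability easy to see: a biallelic SNP tops out at n_e = 2, while a locus dominated by one allele sits close to 1.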

In addition, biased gene conversion and/or genetic hitchhiking could give you the same impression. In other words, low diversity could be the result of excess recombination in the presence of homology, selection at linked loci, or a low effective population size at that locus. You cannot distinguish between these without a linkage map or something similar.