Question

Please help me understand linkage disequilibrium

3.2 years ago

? ▴ 60

I have read many explanations of LD, but I'm still not comfortable with them.

https://www.youtube.com/watch?v=iH8b-5BxtuY

As you can see in videos like this one, most of them explain LD as if it were the relationship between the genotype frequency of a parental cell and that of the gametes made from it. But if so, I would need the genotype frequency of the parental cell and the frequencies of all the gametes it makes, which is impossible.

Or is it the relationship between the frequencies at each locus in the parental population (such as 0.6 for A and 0.4 for a at one locus, and 0.6 for B and 0.4 for b at another locus) and the combined frequencies in the offspring population (such as 0.36 for AB and 0.16 for ab)? If so, isn't it also impossible to get the frequencies for the parental population?

I suppose I'm not getting it right. I'm so confused.

In my case, the VCF file I made has a genotype of 0|0, 1|0, 0|1 or 1|1 for each SNP.

For simplicity, suppose there are two samples (A and B) and I want to see how linked two SNP positions, 1 and 2, are. How do I find out?

Let's say sample A has a genotype of 0|0 at SNP position 1 and 1|0 at position 2, and sample B has a genotype of 0|1 at SNP position 1 and 1|1 at position 2.

Is it possible to calculate the linkage relationship, or are other values required?

Please help me

LD Linkage disequilibrium SNP • 3.6k views

updated 3.2 years ago by i.sudbery 20k • written 3.2 years ago by ? ▴ 60

score 6 · Accepted Answer · 2021-09-08

In the beginning was the Gene. As originally envisaged, genes were atomic (i.e. indivisible) and inherited independently. That means if gene 1 has alleles A and a, and gene 2 has alleles B and b, then the allele you inherit for gene 1 should not depend on the allele you inherit for gene 2. But this isn't true, because in physical reality genes are linked to each other on chromosomes.

In the case of a single cross, independent inheritance clearly doesn't hold. Consider the following parents:

Mother:
chromosome copy 1:  ----A--------B-----
chromosome copy 2:  ----A--------B-----

Father:
chromosome copy 1:  ----a--------b-----
chromosome copy 2:  ----A--------B-----

Possible offspring:
----A--------B-----      or     ----A--------B-----
----A--------B-----             ----a--------b-----

Offspring from this cross will always inherit an A and a B from the mother. From the father, 50% will inherit A and 50% a, and likewise 50% will inherit B and 50% b.

Under independent inheritance, the genotypes AABB, AaBB, AABb and AaBb should be equally likely (you can draw a Punnett square to check). But that isn't the case, because the offspring inherits either chromosome copy 1 or chromosome copy 2 from the father, so the only possible genotypes for the offspring are AABB (if the offspring inherits copy 2 from the father) and AaBb (if the offspring inherits copy 1 from the father). This is the phenomenon of linkage.
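The difference between the two cases can be checked by brute-force enumeration. This is only a sketch of the cross drawn above: the function names are made up for illustration, and recombination is ignored in the linked case.

```python
from collections import Counter
from itertools import product

# Haplotypes carried by each parent in the diagram above
# (first letter: allele at gene 1, second letter: allele at gene 2).
mother = ["AB", "AB"]
father = ["ab", "AB"]


def genotype(hap1, hap2):
    """Unphased two-gene genotype such as 'AaBb' built from two haplotypes."""
    gene1 = "".join(sorted(hap1[0] + hap2[0]))  # uppercase sorts before lowercase
    gene2 = "".join(sorted(hap1[1] + hap2[1]))
    return gene1 + gene2


def linked_offspring(mother, father):
    """Each parent passes on one whole chromosome copy (no recombination)."""
    return Counter(genotype(m, f) for m, f in product(mother, father))


def independent_offspring(mother, father):
    """Each parent passes on the allele at each gene independently."""
    counts = Counter()
    for m1, m2, f1, f2 in product(mother, mother, father, father):
        counts[genotype(m1[0] + m2[1], f1[0] + f2[1])] += 1
    return counts


linked = linked_offspring(mother, father)
independent = independent_offspring(mother, father)
print("linked:     ", dict(linked))
print("independent:", dict(independent))
```

With linkage only AABB and AaBb appear, whereas the independent model produces all four genotypes in equal numbers.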

All of this assumes that recombination won't form a ------A-------b----- chromosome in the father. While this is unlikely to happen in one cross, over a population and across evolutionary time the association of a with b and A with B will break down, and you will get lots of -----A--------b---- and -----a------B---- chromosomes. When the probability of being heterozygous at both loci is equal to the probability of being heterozygous at gene 1 multiplied by the probability of being heterozygous at gene 2, the loci are said to be in linkage equilibrium.

If this is not the case, and the probability of having b rather than B at locus 2 depends on whether you have a rather than A at locus 1, then the loci are said to be in linkage disequilibrium. At its extreme, for 2 loci in complete LD, if you tell me what the allele at locus 1 is, I can tell you what the allele at locus 2 is.
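To get a feel for how quickly that breakdown happens, here is a rough sketch using the standard textbook result for a large, randomly mating population: the LD coefficient D (the frequency of the AB haplotype minus the product of the A and B allele frequencies) shrinks by a factor of (1 - c) each generation, where c is the recombination rate between the two loci. The starting value and the recombination rate below are arbitrary illustrative numbers, not taken from any data.

```python
def ld_decay(d0, recomb_rate, generations):
    """Return D for generations 0..generations using D_t = D_0 * (1 - c)**t.

    Assumes a large, randomly mating population with no selection or drift.
    """
    return [d0 * (1 - recomb_rate) ** t for t in range(generations + 1)]


# Hypothetical starting point: only AB and ab chromosomes exist, each at 50%,
# which gives D = 0.5 - 0.5 * 0.5 = 0.25.
trajectory = ld_decay(d0=0.25, recomb_rate=0.01, generations=500)
for t in (0, 10, 100, 500):
    print(t, round(trajectory[t], 4))
```

Tightly linked loci (small c) keep their association for many generations, which is why nearby SNPs tend to stay in LD while distant ones do not.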
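In terms of numbers, LD is usually measured on haplotype frequencies with the coefficient D = freq(AB) - freq(A) * freq(B) and its normalised form r². The sketch below uses the 0.6/0.4 allele frequencies from the question; the two sets of haplotype frequencies are invented to show the two extremes, and `ld_stats` is just an illustrative name.

```python
def ld_stats(hap_freqs):
    """Return (D, r2) from haplotype frequencies.

    hap_freqs: dict with keys 'AB', 'Ab', 'aB', 'ab' whose values sum to 1.
    """
    p_a = hap_freqs["AB"] + hap_freqs["Ab"]  # frequency of allele A
    p_b = hap_freqs["AB"] + hap_freqs["aB"]  # frequency of allele B
    d = hap_freqs["AB"] - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Linkage equilibrium: every haplotype frequency is the product of its allele frequencies.
equilibrium = {"AB": 0.36, "Ab": 0.24, "aB": 0.24, "ab": 0.16}
# Complete LD: A is only ever seen with B, and a only ever with b.
complete_ld = {"AB": 0.6, "Ab": 0.0, "aB": 0.0, "ab": 0.4}

d_eq, r2_eq = ld_stats(equilibrium)
d_ld, r2_ld = ld_stats(complete_ld)
print("equilibrium: D = %.3f, r2 = %.3f" % (d_eq, r2_eq))
print("complete LD: D = %.3f, r2 = %.3f" % (d_ld, r2_ld))
```

Both populations have exactly the same allele frequencies at each locus; only the way the alleles are combined on chromosomes differs.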

EDIT: after rereading the question.

What I've drawn above are haplotypes, not genotypes. In your question you write 0|1, which suggests your data is phased. Unphased data would normally be written 1/0. So if your phased genotypes at the two SNPs are 0|1 and 0|1, then your haplotypes are 00 and 11.
LD is a property of a population, not of a single individual, or even 2 individuals. You can only calculate LD if you have the genotypes of a large population. However, linkage is not the same as linkage disequilibrium, and you can calculate linkage from a collection of parents and their offspring (how often do you see a haplotype in the offspring that is not present in the parents?). But no one really calculates linkage any more, since genome sequences made genetic mapping unnecessary.
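As a minimal sketch of the mechanics, assuming phased, biallelic genotypes with no missing calls, you would split each sample's genotypes into two haplotypes, count haplotypes across all samples, and compute r² from those counts. The function names are illustrative, and the two samples are the ones from the question, used only to show the bookkeeping: four haplotypes are far too few for the result to mean anything.

```python
from collections import Counter


def haplotypes(gt_snp1, gt_snp2):
    """Split phased genotypes such as '0|1' and '1|1' into two haplotypes ('01', '11')."""
    a1, a2 = gt_snp1.split("|")
    b1, b2 = gt_snp2.split("|")
    return a1 + b1, a2 + b2


def r_squared(samples):
    """r2 between two SNPs from a list of (gt_snp1, gt_snp2) phased genotype pairs."""
    counts = Counter(h for gts in samples for h in haplotypes(*gts))
    n = sum(counts.values())
    p1 = (counts["10"] + counts["11"]) / n  # frequency of allele 1 at SNP 1
    p2 = (counts["01"] + counts["11"]) / n  # frequency of allele 1 at SNP 2
    d = counts["11"] / n - p1 * p2
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    return d * d / denom if denom > 0 else float("nan")


# Sample A: SNP 1 = 0|0, SNP 2 = 1|0; sample B: SNP 1 = 0|1, SNP 2 = 1|1.
two_samples = [("0|0", "1|0"), ("0|1", "1|1")]
haps_a = haplotypes(*two_samples[0])
haps_b = haplotypes(*two_samples[1])
r2 = r_squared(two_samples)
print(haps_a, haps_b, r2)
```

The same function works unchanged on a list with hundreds of samples, which is the scale at which the number starts to say something about the population.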