Dear all,
my goal is to infer relationships within a large cohort of samples. I use two measures for that: percent of shared genotype and Identity by State 2 (IBS2). I do not have the other data that e.g. KING uses (I can get them, but that would lead to huge computational and network costs), so I cannot "just run KING".
How IBS2 and shared genotype are defined: imagine we have two individuals, X and Y, and one biallelic variant with alleles A and B.
If X and Y are homozygous and equal (AA and AA or BB and BB) then IBS2 += 1.
If X and Y share one allele, their shared genotype is increased by 1. If they are homozygous and equal, it is increased by 2.
Both values are normalised by the maximum possible value.
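A minimal sketch of the two measures as described above, in Python with NumPy. The function name `pairwise_measures` and the genotype encoding (count of allele B: 0 = AA, 1 = AB, 2 = BB, -1 = missing) are assumptions made for illustration. The treatment of AB vs AB is also an assumption: it is scored as 2 shared alleles, which is what a ~100% shared genotype for a duplicated sample implies.

```python
import numpy as np

def pairwise_measures(gx, gy):
    """Return (ibs2, shared) for two samples, each as a fraction of its maximum.

    gx, gy: genotypes per variant, coded as the count of allele B
            (0 = AA, 1 = AB, 2 = BB); -1 marks a missing call (assumed encoding).
    """
    gx = np.asarray(gx)
    gy = np.asarray(gy)

    # Keep only variants called in both samples.
    called = (gx >= 0) & (gy >= 0)
    gx, gy = gx[called], gy[called]
    n = gx.size
    if n == 0:
        return float("nan"), float("nan")

    # IBS2 as defined in the post: both homozygous and equal (AA/AA or BB/BB).
    ibs2 = np.count_nonzero((gx == gy) & (gx != 1)) / n

    # Shared alleles per variant: 2 - |gx - gy| (assumption: AB/AB counts as 2).
    # The maximum possible value is 2 per variant.
    shared = np.sum(2 - np.abs(gx - gy)) / (2 * n)

    return float(ibs2), float(shared)


x = [0, 1, 2, 1, 2]
y = [0, 1, 0, 2, 2]
ibs2, shared = pairwise_measures(x, y)
```

For the toy pair above, two of five variants are homozygous and equal, and 7 of 10 possible alleles are shared.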
The question:
The same sample, if sequenced twice, has ~100% shared genotype and around 35% IBS2 (since only around 35% of mutations are homozygous).
Parent-child pairs and siblings have high IBS2 (around 27%) and a high percentage of shared genotype (around 78%).
Unrelated individuals have around 60% shared genotype and around 20% IBS2.
But I have a clear cluster of samples with low pairwise IBS2 (around 17%) but really high shared genotype (around 70%)! This suggests they are cousins, or maybe grandparents/grandchildren.
Is it normal for cousins to have fewer identical homozygous positions than two unrelated individuals while having quite high genomic overlap (low IBS2 with high shared genotype)?
(The differences are noticeable by eye and are definitely statistically significant)
(I have trios and siblings and checked the values for them, but do not have cousins in my database to check)
Your measures are frequency-dependent. If your sample contains individuals of different ancestries or admixed individuals, your percentages will be off from what you expect. Contamination could also explain the unexpected percentages (contaminated samples show high heterozygosity). Otherwise, if your sample is homogeneous in terms of ancestry, your IBS2 measure should be higher in related pairs (of any type) than in unrelated pairs.
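To illustrate the frequency dependence, here is a rough sketch of the expected per-variant values for two unrelated individuals. It assumes Hardy-Weinberg proportions, random mating, and a single homogeneous population; the function name `expected_unrelated` is made up for this example, and AB/AB is assumed to count as 2 shared alleles.

```python
def expected_unrelated(p):
    """Expected (IBS2, shared genotype) at one biallelic variant for an
    unrelated pair, given frequency p of allele A.

    Assumes Hardy-Weinberg proportions in one homogeneous population.
    IBS2 follows the question's definition: both homozygous and equal.
    """
    q = 1.0 - p

    hom_equal = p**4 + q**4          # AA/AA or BB/BB
    het_equal = 4 * p**2 * q**2      # AB/AB (2 shared alleles, assumed)
    ibs0 = 2 * p**2 * q**2           # AA/BB or BB/AA (0 shared alleles)
    ibs1 = 1.0 - hom_equal - het_equal - ibs0  # exactly 1 shared allele

    # Shared genotype, normalised by the maximum of 2 alleles per variant.
    shared = (2 * (hom_equal + het_equal) + ibs1) / 2
    return hom_equal, shared


for p in (0.1, 0.3, 0.5):
    ibs2, shared = expected_unrelated(p)
    print(f"p={p:.1f}  IBS2={ibs2:.3f}  shared={shared:.3f}")
```

Under these assumptions, both expected values change with p even though the pair is unrelated, so thresholds calibrated on one set of variants or one population need not transfer to another.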
Thanks a lot. Indeed, I see different populations: they have really low shared genotype, and I see clear clouds on the plots. But the ones with high shared genotype may still have low IBS2 =( I have no idea why.