Question

How accurate is the IBD calculation by plink?

9.0 years ago

MAPK ★ 2.1k

I was trying to calculate IBD values for about 100 individuals, all likely to be unrelated. I tried the plink tool ( http://pngu.mgh.harvard.edu/~purcell/plink/ibdibs.shtml ), but it looks like it generates too many false positives (i.e. high IBD for unrelated individuals). I have one sample that has IBD = 1 with at least 5 other samples (I am looking at Z0 values). Can someone please explain to me what these values mentioned on their website are:

Z0  P(IBD=0)
Z1  P(IBD=1)
Z2  P(IBD=2)
PI_HAT  Proportion IBD, i.e. P(IBD=2) + 0.5*P(IBD=1)
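As a quick check on these definitions, here is a minimal Python sketch (not part of the original post) that computes PI_HAT from Z0, Z1 and Z2 exactly as the formula above states. The function name `pi_hat` is hypothetical, and the tolerance on the sum is an assumption to allow for the rounding in plink's printed output. Note that Z0 = 1 gives PI_HAT = 0.

```python
def pi_hat(z0: float, z1: float, z2: float) -> float:
    """Proportion IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    # Z0, Z1, Z2 are probabilities over the three IBD states, so they should sum to ~1.
    # The 1e-3 tolerance is an assumption covering rounding in printed output.
    if abs(z0 + z1 + z2 - 1.0) > 1e-3:
        raise ValueError("Z0 + Z1 + Z2 should sum to 1")
    return z2 + 0.5 * z1

# Z0 = 1: no alleles IBD at any locus, i.e. an unrelated pair.
print(pi_hat(1.0, 0.0, 0.0))  # 0.0
# Z2 = 1: both alleles IBD everywhere, as for duplicate samples or MZ twins.
print(pi_hat(0.0, 0.0, 1.0))  # 1.0
```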

Tags: plink, IBD

Is it possible that your supposedly unrelated individuals are actually related, or that there were sample swaps?

Reply by Matt Shirley · 9.0 years ago

It's also known that PLINK's IBS calculations aren't very accurate; the kcoeff paper includes some comparisons.

Reply by Matt Shirley · 9.0 years ago

Have you carefully QC'ed your genotypes, as you would for a GWAS? Poor-quality genotypes will give you wrong estimates, but that is not the fault of the IBD calculation.

Reply by Zhenyu Zhang · 8.0 years ago

score 6 · Answer 1 · 2016-04-27

9.0 years ago

leekaiinthesky ▴ 180

These are not false positives!

In fact, they are not positives at all. As you yourself wrote, Z0 is the probability that at a given locus 0 alleles are identical by descent. In other words, if your samples are unrelated, you should expect a Z0 close to 1.

PI_HAT is a measure of overall IBD alleles. If your samples are unrelated, you should expect a PI_HAT close to 0.

Z0, Z1, and Z2 break down the probabilities of sharing 0, 1, or 2 alleles IBD across the loci, which gives you a way of discriminating between relationship types. An ideal parent-offspring pair has (Z0, Z1, Z2) = (0, 1, 0), i.e. all loci have one allele identical by descent; ideal full siblings have (1/4, 1/2, 1/4), i.e. 25% of loci have 0 alleles IBD, 50% have 1 allele IBD, and 25% have 2 alleles IBD; and so on.
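To make the relationship-type argument concrete, the sketch below (a hypothetical helper, not something plink provides) lists textbook (Z0, Z1, Z2) expectations for a few relationships in an outbred, non-inbred population and picks the closest one by Euclidean distance. Real estimates are noisy, so treat this as an illustration of why the three coefficients carry more information than PI_HAT alone: parent-offspring and full siblings both have PI_HAT = 0.5 but very different Z vectors.

```python
import math

# Idealized (Z0, Z1, Z2) for common relationships, assuming no inbreeding.
EXPECTED = {
    "duplicate / MZ twin": (0.0, 0.0, 1.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full sibling": (0.25, 0.5, 0.25),
    "second-degree (half-sib, grandparent, avuncular)": (0.5, 0.5, 0.0),
    "first cousin": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}

def nearest_relationship(z0, z1, z2):
    """Return the idealized relationship whose (Z0, Z1, Z2) is closest to the observed values."""
    observed = (z0, z1, z2)
    return min(EXPECTED, key=lambda rel: math.dist(observed, EXPECTED[rel]))

print(nearest_relationship(0.02, 0.96, 0.02))  # parent-offspring
print(nearest_relationship(0.26, 0.49, 0.25))  # full sibling
print(nearest_relationship(0.98, 0.02, 0.0))   # unrelated
```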

Thanks. So should I compare PI_HAT values, which range from 0 to 1, to get the actual relationships between the individuals?

Reply by MAPK · 9.0 years ago

Yes, PI_HAT is a summary statistic that will give you overall IBD proportion. But Z0, Z1, and Z2 are also helpful to understand for distinguishing between relationship types, so it's useful to take the time to understand what all four measures mean.

Reply by leekaiinthesky · 9.0 years ago

Thanks, but the PI_HAT values don't make sense at all (unless I am doing something wrong). I am getting PI_HAT = 0 for the same individuals where I expected 1 (IBD = 1, as when comparing a sample with itself or with a monozygotic twin?).

Reply by MAPK · 9.0 years ago

You may be confusing Z0, Z1, Z2, and PI_HAT. First take some time to understand their relationship.

Reply by leekaiinthesky · 9.0 years ago

Where is a good place to start to understand this relationship? It seems like the plink documentation would rather give five hints than one explanation.

Reply by bjwiley23 · 4.3 years ago

Also, I am using only 27000 SNPs (LD pruned and quality filtered) for 150 samples. Do you think the number of SNPs is the issue here?

Reply by MAPK · 9.0 years ago

Supposedly, that should be enough.

Reply by leekaiinthesky · 9.0 years ago

score 1 · Answer 2 · 2016-04-27

9.0 years ago

Matt Shirley 10k

If you want an independent method for comparison, I suggest trying kcoeff, which estimates k0, k1, and k2, the proportions of the genome shared IBS0/1/2.
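For readers unfamiliar with IBS states, the sketch below counts, for one pair of samples, the fraction of loci sharing 0, 1 or 2 alleles identical by state. It is a hypothetical illustration only (`ibs_proportions` is not part of kcoeff or plink) and does not reproduce either tool's algorithm: IBS is what you observe directly, whereas IBD has to be estimated from it, taking allele frequencies into account. Genotypes are assumed to be coded as 0/1/2 copies of one allele at biallelic SNPs.

```python
def ibs_proportions(g1, g2):
    """Fraction of loci where two samples share 0, 1 or 2 alleles identical by state.

    g1, g2: dosages (0, 1 or 2 copies of one allele) at the same biallelic SNPs;
    None marks a missing call, and such loci are skipped.
    Returns (IBS0, IBS1, IBS2).
    """
    counts = [0, 0, 0]
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        # |a - b| = 0 -> IBS2, 1 -> IBS1, 2 -> IBS0 (e.g. AA vs BB)
        counts[2 - abs(a - b)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no loci called in both samples")
    return tuple(c / total for c in counts)

print(ibs_proportions([0, 1, 2, 1], [0, 1, 0, 2]))  # (0.25, 0.25, 0.5)
```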

Thanks. I am getting IBD = 1 between one sample and several others, which can't be true unless the samples are duplicates.

Reply by MAPK · 9.0 years ago

You should be able to tell if the samples are exactly duplicated by looking at the data. Otherwise, they might have been duplicated during sample handling before the genotyping.

Reply by Matt Shirley · 9.0 years ago
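One simple way to check for duplicated samples directly in the data is to compute pairwise genotype discordance. The sketch below is a hypothetical helper, not a plink feature; the 1% cut-off is an assumption, chosen only to allow for a few genotyping errors between true duplicates.

```python
from itertools import combinations

def likely_duplicates(samples, max_discordance=0.01):
    """Flag sample pairs whose genotypes are (nearly) identical.

    samples: dict of sample ID -> list of dosages (0, 1, 2) or None for missing,
    all over the same SNPs. The default threshold is an illustrative assumption.
    Returns a list of (id1, id2, discordance) tuples.
    """
    flagged = []
    for (id1, g1), (id2, g2) in combinations(samples.items(), 2):
        # Only compare loci called in both samples.
        pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
        if not pairs:
            continue
        discordance = sum(a != b for a, b in pairs) / len(pairs)
        if discordance <= max_discordance:
            flagged.append((id1, id2, discordance))
    return flagged

samples = {
    "S1": [0, 1, 2, 1, 0],
    "S2": [0, 1, 2, 1, 0],
    "S3": [2, 1, 0, 1, 2],
}
print(likely_duplicates(samples))  # [('S1', 'S2', 0.0)]
```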