Question

How are plink's hardcalls calculated?

1

Entering edit mode

6.1 years ago

JourneyToAbyss ▴ 250

According to plink documentation, a hardcall is saved when the distance from the nearest hardcall, defined as 0.5 * sum_i |x_i - round(x_i)| (where the x_i's are 0..2 allele dosages), is not greater than 0.1.

Truthfully, this doesn't make sense to me (especially why x_i is being summed) and was hoping if someone could help break down math.

Many thanks!

plink • 1.9k views

ADD COMMENT • link 6.1 years ago by JourneyToAbyss ▴ 250

0

Entering edit mode

So lets say you had single dosages representing only 2 alleles.

Example 1:

A/T = 1.900

0.5* (|1.900-2| + |0.100 - 0|)

= 0.5 * (0.100 + 0.100)

= 0.5 * 0.200

= 0.100

As this is not greater than 0.1, we would code the dosage as 2 (for TT).

Example 2:

A/T = 0.980

0.5 * (|0.980 -1| + |1.02 - 1|)

=0.5 * (0.020 + 0.020)

=0.5 * 0.040

= 0.020

In this example, we would code it as A and T (heterogeneous).

My concern is how do you "decide" which alleles are present. Looking at the examples and my first example, it seems obvious. But I want to be able to program logic in a python script to check my understanding and compare my results against Plink.

I appreciate your help!

ADD REPLY • link 6.1 years ago by JourneyToAbyss ▴ 250

1

Entering edit mode

Just look at when round(x_i) isn't zero.

ADD REPLY • link 6.1 years ago by chrchang523 11k

0

Entering edit mode

Clever. =)

Thank you

ADD REPLY • link 6.1 years ago by JourneyToAbyss ▴ 250

GenoMax · Accepted Answer · 2019-06-08

Suppose there are three alleles, with dosages 1.01, 0.92, and 0.07. Then the formula yields:

  0.5 * (|1.01 - 1| + |0.92 - 1| + |0.07 - 0|)
= 0.5 * (0.01 + 0.08 + 0.07)
= 0.5 * 0.16
= 0.08

which is not greater than 0.1, so the nearest hardcall (one copy of the first allele and one copy of the second allele) is saved.

The formula is a little bit complicated because it has to generalize to multiallelic variants. It does exactly what you’d expect when there are only two alleles, though.