How are plink's hardcalls calculated?
1
1
Entering edit mode
5.4 years ago

According to plink documentation, a hardcall is saved when the distance from the nearest hardcall, defined as 0.5 * sum_i |x_i - round(x_i)| (where the x_i's are 0..2 allele dosages), is not greater than 0.1.

Truthfully, this doesn't make sense to me (especially why x_i is being summed) and was hoping if someone could help break down math.

Many thanks!

plink • 1.6k views
ADD COMMENT
0
Entering edit mode

So lets say you had single dosages representing only 2 alleles.

Example 1:

A/T = 1.900

0.5* (|1.900-2| + |0.100 - 0|)

= 0.5 * (0.100 + 0.100)

= 0.5 * 0.200

= 0.100

As this is not greater than 0.1, we would code the dosage as 2 (for TT).

Example 2:

A/T = 0.980

0.5 * (|0.980 -1| + |1.02 - 1|)

=0.5 * (0.020 + 0.020)

=0.5 * 0.040

= 0.020

In this example, we would code it as A and T (heterogeneous).

My concern is how do you "decide" which alleles are present. Looking at the examples and my first example, it seems obvious. But I want to be able to program logic in a python script to check my understanding and compare my results against Plink.

I appreciate your help!

ADD REPLY
1
Entering edit mode

Just look at when round(x_i) isn't zero.

ADD REPLY
0
Entering edit mode

Clever. =)

Thank you

ADD REPLY
6
Entering edit mode
5.4 years ago

Suppose there are three alleles, with dosages 1.01, 0.92, and 0.07. Then the formula yields:

  0.5 * (|1.01 - 1| + |0.92 - 1| + |0.07 - 0|)
= 0.5 * (0.01 + 0.08 + 0.07)
= 0.5 * 0.16
= 0.08

which is not greater than 0.1, so the nearest hardcall (one copy of the first allele and one copy of the second allele) is saved.

The formula is a little bit complicated because it has to generalize to multiallelic variants. It does exactly what you’d expect when there are only two alleles, though.

ADD COMMENT

Login before adding your answer.

Traffic: 2386 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6