I am trying to understand how to choose an optimal pi_hat threshold for a dataset. Many articles use 0.2 as the pi_hat cutoff, and any pair of samples above it is treated as cryptically related or as duplicates.
I've tested IBD estimation on HapMap; the files I use can be found here: ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/plink_format/. I first removed all annotated offspring from HapMap. Then I ran the IBD estimation to see whether it still finds samples with cryptic relatedness to each other. The steps I performed are the following (in PLINK):
1) LD-prune:
plink --file hapmap --indep-pairwise 50 5 0.2
plink --file hapmap --extract plink.prune.in --recode --out hapmap_pruned
2) IBD:
plink --file hapmap_pruned --genome --min 0.2
The results show that many cryptically related sample pairs are still found with a pi_hat threshold of 0.2, even though all offspring were removed beforehand. My question is: is this normal behavior? Or should one increase the pi_hat threshold? How can one find a "good" pi_hat for a custom dataset?
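One way to look at this question is to count how many pairs exceed several candidate cutoffs instead of only 0.2. The following is a minimal Python sketch, not part of the original workflow. It assumes the default `plink.genome` output with its usual whitespace-delimited header containing `IID1`, `IID2` and `PI_HAT` columns; the function names, the candidate thresholds and the sample rows are hypothetical and only for illustration. Note that `--min 0.2` already drops every pair below 0.2, so the full distribution is only visible if `--genome` is rerun without `--min`.

```python
# Sketch: tabulate PLINK --genome pairs against several PI_HAT cutoffs.
# Function names, thresholds and SAMPLE rows are illustrative assumptions.

def read_pi_hat(lines):
    """Yield (iid1, iid2, pi_hat) from the lines of a PLINK .genome file."""
    it = iter(lines)
    header = next(it).split()
    i1 = header.index("IID1")
    i2 = header.index("IID2")
    ip = header.index("PI_HAT")
    for line in it:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        yield fields[i1], fields[i2], float(fields[ip])


def count_at_or_above(pi_hats, thresholds=(0.125, 0.1875, 0.2, 0.25, 0.5)):
    """Return {threshold: number of pairs with PI_HAT >= threshold}."""
    values = list(pi_hats)
    return {t: sum(p >= t for p in values) for t in thresholds}


# Made-up rows in .genome layout, used only to demonstrate the functions.
SAMPLE = """\
 FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO
 F1 A F2 B UN NA 0.0000 0.9976 0.0024 0.5012 -1 0.85 1.0 NA
 F1 A F3 C UN NA 0.5774 0.4226 0.0000 0.2113 -1 0.78 1.0 NA
 F2 B F3 C UN NA 0.7400 0.2600 0.0000 0.1300 -1 0.76 0.9 NA
 F2 B F4 D UN NA 1.0000 0.0000 0.0000 0.0000 -1 0.74 0.5 NA
"""

pairs = list(read_pi_hat(SAMPLE.splitlines()))
counts = count_at_or_above(p for _, _, p in pairs)
for threshold, n in counts.items():
    print(f"PI_HAT >= {threshold}: {n} pair(s)")

# With a real file (path is an assumption):
# with open("plink.genome") as fh:
#     pairs = list(read_pi_hat(fh))
```

If the count barely changes between 0.2 and 0.25 but rises sharply below 0.2, the cutoff sits in a gap of the distribution; a smooth rise suggests the flagged pairs are not clearly separated from background.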
See this post, "How large would inbreeding coefficient be to be anomalous?": "Third-degree relatives are 12.5% equal IBD (Pihat 0.125)".
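The quoted figure follows from the idealised expectation that the shared IBD proportion halves with each degree of relatedness. A small Python sketch of that arithmetic is given below; it assumes an outbred, homogeneous population, and the function names are hypothetical. Under that assumption, the midpoint between second- and third-degree relatives is 0.1875, which is close to the commonly used 0.2.

```python
# Sketch: idealised PI_HAT per degree of relatedness and midpoint cutoffs.
# Assumes an outbred, homogeneous population; real estimates are noisy.

def expected_pi_hat(degree):
    """Expected proportion IBD: degree 0 = duplicate/MZ twin, 1 = first-degree, ..."""
    return 0.5 ** degree


def midpoint_cutoff(degree):
    """Halfway between the expected PI_HAT of `degree` and of `degree + 1`."""
    return (expected_pi_hat(degree) + expected_pi_hat(degree + 1)) / 2


for d in range(4):
    print(f"degree {d}: expected PI_HAT = {expected_pi_hat(d)}, "
          f"cutoff to exclude degree {d + 1} = {midpoint_cutoff(d)}")
```

A cutoff chosen this way only separates degrees as cleanly as the estimates allow; pairs with PI_HAT near a midpoint cannot be assigned to one degree with confidence.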