I know it is not ideal, but I am trying to detect any cryptic relatedness in my whole-exome sequencing (WES) data from ~350 individuals. To do so, I first QC the WES data in PLINK with MAF > 0.01 and call rate > 98% filters, which leaves ~200K high-quality variants for analysis (again, not ideal for assessing relatedness, but I think it should still give a rough idea). I then estimate IBD in PLINK with the --genome option.
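As a rough illustration of the two QC filters (not PLINK's own code), here is a minimal Python sketch that applies MAF > 0.01 and call rate > 98% to one variant at a time. It assumes genotypes coded as alternate-allele counts (0/1/2) with None for a missing call. In PLINK itself these filters are normally written as `--maf 0.01` and `--geno 0.02`, though boundary handling (> versus >=) may differ slightly from this sketch.

```python
from typing import List, Optional

# Assumed coding: 0/1/2 = number of alternate alleles, None = missing call.
Genotype = Optional[int]


def call_rate(genotypes: List[Genotype]) -> float:
    """Fraction of samples with a non-missing call at this variant."""
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    """MAF over called genotypes only (diploid: two alleles per sample)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(genotypes: List[Genotype], min_maf: float = 0.01,
              min_call_rate: float = 0.98) -> bool:
    """Keep a variant only if MAF > min_maf and call rate > min_call_rate."""
    return (minor_allele_frequency(genotypes) > min_maf
            and call_rate(genotypes) > min_call_rate)


if __name__ == "__main__":
    # Hypothetical toy variants, 100 samples each.
    common = [0] * 60 + [1] * 30 + [2] * 10       # MAF 0.25, no missing calls
    rare = [0] * 99 + [1]                          # MAF 0.005
    missing = [0] * 50 + [1] * 45 + [None] * 5     # call rate 0.95
    for name, variant in [("common", common), ("rare", rare), ("missing", missing)]:
        print(name, passes_qc(variant))  # common True, rare False, missing False
```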
With this approach we have already identified a first-degree relative pair (PI-HAT = 0.58, confirmed by fingerprinting [i.e. high similarity of STR marker lengths across the genome, measured from the stock sample tubes] and by clinical records), and new sequencing results indicate another pair (PI-HAT = 0.5). The interesting thing, however, is that two samples appear to be related to every other individual in the cohort. I should point out that the cohort comes from multiple centers across Europe and is presumably unrelated, so this observation must be a false positive.
Sample 1 -> related to all 346 others, with 0.25 < PI-HAT < 0.38.
Sample 2 -> related to all 346 others, with 0.12 < PI-HAT < 0.22.
All other pairs (apart from the relative pairs above) have PI-HAT < 0.1.
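To check this pattern systematically, the Python sketch below counts, for each sample, how many pairs exceed a PI-HAT threshold and flags samples that look "related" to more than half of the cohort. It assumes a whitespace-delimited table whose header contains IID1, IID2 and PI_HAT columns, as in PLINK's .genome output; the 0.1 threshold, the one-half fraction and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def high_pihat_counts(lines: Iterable[str], threshold: float = 0.1) -> Counter:
    """Count, per sample (IID), the pairs whose PI_HAT exceeds `threshold`.

    Assumes whitespace-delimited text with a header row containing
    IID1, IID2 and PI_HAT (the column names used in a PLINK .genome file).
    """
    rows = iter(lines)
    header = next(rows).split()
    i1, i2, ip = header.index("IID1"), header.index("IID2"), header.index("PI_HAT")
    counts: Counter = Counter()
    for line in rows:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        if float(fields[ip]) > threshold:
            counts[fields[i1]] += 1
            counts[fields[i2]] += 1
    return counts


def flag_related_to_many(counts: Counter, n_samples: int,
                         frac: float = 0.5) -> Dict[str, int]:
    """Samples with high PI_HAT to more than `frac` of the other samples.

    In a presumably unrelated multi-center cohort, such a sample is far more
    likely to reflect a technical artifact than genuine relatedness.
    """
    return {s: c for s, c in counts.items() if c > frac * (n_samples - 1)}


if __name__ == "__main__":
    # Hypothetical 4-sample table in which sample A looks related to everyone.
    toy = """FID1 IID1 FID2 IID2 PI_HAT
    F A F B 0.30
    F A F C 0.28
    F A F D 0.31
    F B F C 0.02
    F B F D 0.01
    F C F D 0.03"""
    counts = high_pihat_counts(toy.splitlines())
    print(flag_related_to_many(counts, n_samples=4))  # {'A': 3}
```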
What could be causing this? The first thing that comes to mind is, of course, contamination, but I would not expect it, since these samples were run in batches of 12 and their libraries were prepared independently.
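One common way to probe the contamination hypothesis (not something reported in the original post) is to compare per-sample heterozygosity: cross-sample contamination tends to turn homozygous sites into apparent heterozygous calls, so a contaminated sample may stand out with excess heterozygosity. The Python sketch below assumes 0/1/2 genotype coding with None for missing calls and flags samples more than k standard deviations above the cohort mean; k and the function names are illustrative assumptions. PLINK's --het option reports a related per-sample statistic (an inbreeding coefficient F, where strongly negative values indicate excess heterozygosity).

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional


def het_rate(genotypes: List[Optional[int]]) -> float:
    """Fraction of called genotypes that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called) if called else 0.0


def het_outliers(samples: Dict[str, List[Optional[int]]],
                 k: float = 3.0) -> Dict[str, float]:
    """Samples whose heterozygosity exceeds the cohort mean by more than k SD.

    Excess heterozygosity is consistent with contamination but is only
    suggestive; it does not prove contamination on its own.
    """
    rates = {s: het_rate(g) for s, g in samples.items()}
    values = list(rates.values())
    mu, sd = mean(values), pstdev(values)
    return {s: r for s, r in rates.items() if sd > 0 and r > mu + k * sd}


if __name__ == "__main__":
    normal = [1, 1, 1, 0, 0, 0, 0, 2, 2, 2]  # 3/10 heterozygous
    cohort = {f"S{i}": normal for i in range(9)}
    cohort["X"] = [1] * 6 + [0] * 4          # 6/10 heterozygous
    # k=2 only because this toy cohort is tiny.
    print(het_outliers(cohort, k=2.0))       # {'X': 0.6}
```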
TL;DR: In my whole-exome sequencing data, two individuals show apparent cryptic relatedness to all 346 other participants; what could be the reason?
Thank you for your reply! I had actually already checked this with plink2 --make-king-table as well, and got the following:
Kinship coefficient between the confirmed siblings: 0.29
Kinship coefficient between the possible new pair: 0.19
Kinship coefficient between Sample 1 (from my original post) and all others: varies between ~0.04 and ~0.08
Kinship coefficient between Sample 2 (from my original post) and all others: below ~0.04
Kinship coefficient between Sample 1 and Sample 2: 0.1682! (I didn't mention it before, but this pair had PI-HAT = 0.38.)
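To read these KING values, a common convention bins the kinship coefficient at powers of two between the expected values for each degree: > 0.354 duplicate/MZ twin, 0.177 to 0.354 first-degree, 0.0884 to 0.177 second-degree, 0.0442 to 0.0884 third-degree. Because PI-HAT estimates the proportion of the genome shared IBD, which is twice the kinship coefficient, halving PI-HAT gives a rough kinship-scale comparison. The Python sketch below encodes these rules of thumb; the function names are hypothetical and the cut-offs are conventions, not hard biological boundaries.

```python
from typing import List, Tuple

# Conventional KING kinship cut-offs (rule of thumb, not hard boundaries).
KING_BINS: List[Tuple[float, str]] = [
    (2 ** -1.5, "duplicate / MZ twin"),  # > 0.354
    (2 ** -2.5, "1st-degree"),           # 0.177 to 0.354
    (2 ** -3.5, "2nd-degree"),           # 0.0884 to 0.177
    (2 ** -4.5, "3rd-degree"),           # 0.0442 to 0.0884
]


def king_degree(kinship: float) -> str:
    """Map a KING kinship coefficient to the conventional degree bin."""
    for cutoff, label in KING_BINS:
        if kinship > cutoff:
            return label
    return "unrelated"


def pihat_to_kinship(pi_hat: float) -> float:
    """PI_HAT = P(IBD2) + 0.5 * P(IBD1), which is twice the kinship coefficient."""
    return pi_hat / 2


if __name__ == "__main__":
    for value in (0.29, 0.19, 0.1682, 0.06, 0.03):
        print(value, king_degree(value))
```

Under these bins, the siblings (0.29) and the possible new pair (0.19) fall in the first-degree range, Sample 1 vs Sample 2 (0.1682) falls just below the first-degree cut-off in the second-degree bin, Sample 1 vs everyone else (~0.04 to ~0.08) runs from the unrelated/third-degree boundary into the third-degree bin, and Sample 2 vs everyone else (below ~0.04) counts as unrelated. Halving the earlier PI-HAT values (0.58, 0.5, 0.38) gives 0.29, 0.25 and 0.19 for comparison with the KING estimates.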
I am not familiar with the KING output; how would you interpret these values? Do they tell us anything new? The pattern seems to persist.
Hi. Were you able to find out what was causing this? I have a similar problem. I have 5 plates jointly called in GenomeStudio. 14 samples from one plate have PI-HAT values of 0.3-0.5 with all samples in the cohort, and 4 samples from another plate show the same pattern. I'm trying to investigate the cause.
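One hedged way to check whether the affected samples cluster by plate is to summarize each sample's median PI-HAT against everyone else and then tabulate the flagged fraction per plate. This Python sketch assumes you already have (sample, sample, PI_HAT) tuples and a sample-to-plate mapping; the plate names, the 0.2 cut-off and the function names are hypothetical. A strong plate concentration would be consistent with a plate-level technical issue (calling, handling or contamination) rather than true relatedness, but it does not by itself identify the cause.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def median_pihat_per_sample(pairs: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Median PI_HAT of each sample against all the others.

    The median (rather than the mean) keeps a handful of problematic
    samples from inflating everyone else's summary.
    """
    values: Dict[str, List[float]] = defaultdict(list)
    for a, b, pi_hat in pairs:
        values[a].append(pi_hat)
        values[b].append(pi_hat)
    return {s: median(v) for s, v in values.items()}


def flagged_fraction_by_plate(sample_median: Dict[str, float],
                              plate_of: Dict[str, str],
                              cutoff: float = 0.2) -> Dict[str, float]:
    """Fraction of samples on each plate whose median PI_HAT exceeds `cutoff`."""
    totals: Dict[str, int] = defaultdict(int)
    flagged: Dict[str, int] = defaultdict(int)
    for sample, med in sample_median.items():
        plate = plate_of[sample]
        totals[plate] += 1
        if med > cutoff:
            flagged[plate] += 1
    return {plate: flagged[plate] / totals[plate] for plate in totals}


if __name__ == "__main__":
    # Hypothetical data: a1/a2 (plate P1) look related to everyone.
    samples = ["a1", "a2", "b1", "b2", "b3", "b4"]
    plate_of = {"a1": "P1", "a2": "P1", "b1": "P2", "b2": "P2", "b3": "P2", "b4": "P2"}
    pairs = [(samples[i], samples[j],
              0.4 if "a" in (samples[i][0], samples[j][0]) else 0.02)
             for i in range(len(samples)) for j in range(i + 1, len(samples))]
    print(flagged_fraction_by_plate(median_pihat_per_sample(pairs), plate_of))
    # {'P1': 1.0, 'P2': 0.0}
```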