Question

PLINK IBD calculation for given samples against the rest of the data

7.5 years ago

gaow • 0

PLINK's IBD calculation via --genome computes IBD/IBS for every pair of samples in the dataset. Is there a way to list M specific samples and compute them only against the rest of the data, so that the output has (N - M) * M pairs rather than all N * (N - 1) / 2 pairs?

SNP PLINK • 2.4k views

ADD COMMENT • link updated 7.5 years ago by chrchang523 11k • written 7.5 years ago by gaow • 0

I've added the PLINK tag, to help those watching this tag to find this question.

ADD REPLY • link 7.5 years ago by Kevin Blighe 89k

score 0 · Answer 1 · 2018-01-28

This isn't directly supported by plink 1.9. However, if the dataset is large enough that it isn't reasonable to just perform the entire computation and then filter for the lines of interest, the following hack will help:

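1. Create a file (I'll call this id_order.txt) which lists the M sample IDs of interest at the bottom, with the other (N-M) samples above them.
2. Use "plink --bfile ... --indiv-sort f id_order.txt --make-bed --out reordered" to create a new fileset with the desired sample order.
3. Run "plink --bfile reordered --genome --parallel k k --out ...", where k is the largest integer which isn't greater than N/(2M). This computes only the last of k roughly equal-sized pieces of the pairwise matrix, which is the piece containing the samples at the bottom of the order.
4. The resulting .genome.[k] file will still have a few extra lines (e.g. pairs where both samples, or neither sample, are in the list of interest), so you may want to use e.g. a Python script to filter them out.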
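
The bookkeeping around these steps can be sketched in Python. This is only a rough sketch under a few assumptions: the helper names (read_fam_ids, write_id_order, parallel_pieces, filter_genome) are made up for illustration, samples are identified by (FID, IID) pairs taken from the first two columns of the .fam file, id_order.txt is written with FID and IID on each line, and the .genome piece has FID1, IID1, FID2, IID2 as its first four columns. Since a later --parallel piece may not carry a header line, the filter passes a header through only if it sees one.

```python
def read_fam_ids(fam_path):
    """Return (FID, IID) tuples in the order they appear in a .fam file."""
    ids = []
    with open(fam_path) as fh:
        for line in fh:
            fields = line.split()
            if fields:
                ids.append((fields[0], fields[1]))
    return ids


def write_id_order(all_ids, targets, out_path):
    """Write the non-target samples first and the M targets at the bottom.

    Returns (number of non-targets, number of targets) actually written.
    """
    targets = set(targets)
    rest = [s for s in all_ids if s not in targets]
    last = [s for s in all_ids if s in targets]
    with open(out_path, "w") as fh:
        for fid, iid in rest + last:
            fh.write(f"{fid}\t{iid}\n")
    return len(rest), len(last)


def parallel_pieces(n, m):
    """Largest integer k with k <= N/(2M), but never less than 1."""
    return max(1, n // (2 * m))


def filter_genome(genome_path, targets, out_path):
    """Keep only pairs where exactly one of the two samples is a target.

    Returns the number of pair lines kept (a header line is not counted).
    """
    targets = set(targets)
    kept = 0
    with open(genome_path) as src, open(out_path, "w") as dst:
        for line in src:
            f = line.split()
            if len(f) < 4:
                continue
            if f[0] == "FID1":  # header line, if this piece has one
                dst.write(line)
                continue
            first_is_target = (f[0], f[1]) in targets
            second_is_target = (f[2], f[3]) in targets
            if first_is_target != second_is_target:
                dst.write(line)
                kept += 1
    return kept
```

For example, with N = 1000 and M = 50, parallel_pieces(1000, 50) gives k = 10, so the --genome run would use "--parallel 10 10" and the file to filter would be the .genome.10 output.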