The answers on this post helped a bit, but I figured it out myself. I found the process somewhat confusing, and the documentation slightly lacking, so I will explain my solution in some detail.
I have some VCF files, and a separate file that describes child, mother and father relationships, including the sex of each individual. To add all this information, I basically need to use the --update-parents and --update-sex options in plink. These options accept a file format that is not entirely clear, so I'll explain it below.
In my case, the individuals didn't have family and within-family IDs that lent themselves to this process. So first I had to change the family ID (FID) and the individual ID (IID, also called the within-family ID), such that the full ID of an individual is the FID followed directly by the IID. To do that I called plink --file data --update-ids update_ids.txt --recode --out data_updated_ids.
I shall explain how I made the update_ids.txt file mentioned above to fix my formatting problems. Say we have an individual with the ID ABC01, ABC being the family ID and 01 being the within-family ID. My PED file said both the FID and IID were ABC01. To fix this problem, update_ids.txt has to contain the row:
ABC01 ABC01 ABC 01
where the columns are the original family ID, the original within-family ID, the new family ID, and the new within-family ID.
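With many individuals, writing update_ids.txt by hand gets tedious. Here is a minimal Python sketch that builds it from the first two columns of a PED file. It assumes a hypothetical ID rule (my assumption, not something plink requires): every combined ID is a run of letters followed by a run of digits, as with ABC01. Adjust ID_PATTERN if your IDs look different; the function names are also made up for this example.

```python
import re

# Hypothetical ID rule (an assumption for this example): a combined ID such as
# "ABC01" is a letter prefix (new family ID) followed by a digit suffix
# (new within-family ID).
ID_PATTERN = re.compile(r"^([A-Za-z]+)(\d+)$")


def split_id(combined_id):
    """Split a combined ID into (new FID, new IID) using ID_PATTERN."""
    match = ID_PATTERN.match(combined_id)
    if match is None:
        raise ValueError(f"ID does not match the expected pattern: {combined_id!r}")
    return match.group(1), match.group(2)


def update_ids_rows(ped_lines):
    """Yield one update_ids.txt row per individual in a PED file.

    Only the first two whitespace-separated PED columns (FID, IID) are read.
    Each row is: old FID, old IID, new FID, new IID.
    """
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        old_fid, old_iid = fields[0], fields[1]
        new_fid, new_iid = split_id(old_iid)
        yield f"{old_fid} {old_iid} {new_fid} {new_iid}"


def write_update_ids(ped_path, out_path):
    """Read a PED file and write the matching update_ids.txt."""
    with open(ped_path) as ped, open(out_path, "w") as out:
        for row in update_ids_rows(ped):
            out.write(row + "\n")
```

For example, write_update_ids("data.ped", "update_ids.txt") would write the row ABC01 ABC01 ABC 01 for the individual above, ready for the --update-ids command.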
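The --update-parents and --update-sex options use a similar format: each row starts with the family ID and within-family ID of an individual (the updated IDs, if you already ran --update-ids), followed by the new values. For --update-parents these are the within-family IDs of the father and the mother (0 if unknown); for --update-sex it is the sex code (1 = male, 2 = female, 0 = unknown). These options are also described here: https://www.cog-genomics.org/plink2/data#update_indiv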