Hi all,
I am trying to update my .fam file with newly acquired phenotypes for my subjects, but am running into a few problems.
I am trying to use plink 2 to update these phenotypes, which are listed in a txt file. A few example lines showing the format of the txt file:
FID IID HC MDD PTSD Bipolar
82 PB001 MDD
3 PB005 MDD
I am trying to update my .fam file phenotype column using this plink2 script:
plink2 --bfile chr1_ped --no-fid --pheno phenoPRS6_widFID.txt --pheno-name
I used --no-fid because my .fam file uses subject IDs as FIDs. Example of the .fam file format:
1_PB068 1_PB068 0 0 0 -9
1_PB286 1_PB286 0 0 0 -9
2_PB039 2_PB039 0 0 0 -9
This error was obtained upon running plink2:
Error: No entries in phenoPRS6_widFID.txt correspond to loaded sample IDs.
I'm sure this error is occurring because my .fam file uses subject IDs as FIDs, and also because the subject IDs have the format #_subjectID. See the example .fam lines above.
Questions:
- What's the best way I can update my subject IDs excluding the #_ ?
- How then can I update my FIDs to match the FIDs in the txt pheno file?
- Should my script to update phenotypes work after fixing these two columns? If not, what should I change in the script?
- Does plink2 truly work on categorical phenotypes? Do phenotypes in fact have to be listed as numeric values (despite the guide saying they can be categorical)?
Lots of questions, I know. Perhaps simple answers. Please help, if you can!
Thanks.
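On the first two questions, one possible approach is to build an ID-mapping file and let plink rename the samples with --update-ids, which takes four columns per line: old FID, old IID, new FID, new IID. The following is only a minimal sketch, assuming that the real subject ID is everything after the first underscore, and that the FID should come from the phenotype file when the subject appears there. The function name and the PB068 phenotype row in the demo are hypothetical, made up for illustration.

```python
def build_update_ids(fam_lines, pheno_lines):
    """Return 'old_FID old_IID new_FID new_IID' rows for plink --update-ids.

    Assumptions (not confirmed by the original post):
      * the real subject ID is the part of the .fam IID after the first "_"
      * the phenotype file has one header line, then FID in column 1 and IID in column 2
    """
    # Map bare subject ID -> FID as listed in the phenotype file (header skipped).
    pheno_fid = {}
    for line in pheno_lines[1:]:
        fields = line.split()
        if len(fields) >= 2:
            pheno_fid[fields[1]] = fields[0]

    rows = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        old_fid, old_iid = fields[0], fields[1]
        # "1_PB068" -> "PB068"; an ID without "_" is left unchanged.
        new_iid = old_iid.split("_", 1)[-1]
        # Use the phenotype file's FID when the subject is listed there,
        # otherwise keep the old FID (an arbitrary choice for this sketch).
        new_fid = pheno_fid.get(new_iid, old_fid)
        rows.append(f"{old_fid} {old_iid} {new_fid} {new_iid}")
    return rows


if __name__ == "__main__":
    fam = ["1_PB068 1_PB068 0 0 0 -9", "2_PB039 2_PB039 0 0 0 -9"]
    # Hypothetical phenotype row for PB068, for illustration only.
    pheno = ["FID IID HC MDD PTSD Bipolar", "82 PB068 MDD"]
    for row in build_update_ids(fam, pheno):
        print(row)
```

If the rows are written to a file such as update_ids.txt (a made-up name), something like `plink --bfile chr1_ped --update-ids update_ids.txt --make-bed --out chr1_ped_newids` should produce a renamed fileset; check the result against your own data before relying on it.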
Thank you greatly. Does plink not have an ignore-FID option? I couldn't find one in my search. Otherwise, I will try the pgen format. Do you know whether switching to pgen will cause issues when I then try to run a covariate analysis such as this:
Can I seamlessly go from pgen to PED?
plink2 --make-pgen has an option to omit the FID column from its output, and from that point on you can provide --pheno/--covar/etc. input files without an FID column when working with the dataset. What plink2 won't do is ignore a --pheno/--covar/etc. FID column you provide; it's your responsibility to provide an FID column that makes sense if you're providing one at all.
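If you go the FID-less route, the phenotype file needs its FID column removed as well. Here is a minimal sketch of that step, assuming a whitespace-delimited file with a header line and no empty cells (splitting on whitespace would silently shift columns if a cell were blank). The function name is hypothetical. My understanding is that plink 2.0 recognizes an FID-less file by a header starting with IID or #IID, but treat that as an assumption and confirm it against the current documentation.

```python
def drop_fid_column(lines):
    """Remove the first (FID) column from whitespace-delimited lines.

    Assumes every row, including the header, has FID in column 1 and no
    empty cells. Output is tab-delimited.
    """
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        out.append("\t".join(fields[1:]))
    return out


if __name__ == "__main__":
    pheno = ["FID IID HC MDD PTSD Bipolar", "82 PB001 MDD", "3 PB005 MDD"]
    for row in drop_fid_column(pheno):
        print(row)
```

The IIDs in the resulting file still have to match the sample IDs in the dataset exactly, so any #_ prefix needs to be dealt with separately.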
Note that this functionality requires plink 2.0. You're using plink 1.9 and referring to it as "plink2", which made sense back in the day, but now the plink 2.0 binary is named "plink2" and the plink 1.9 binary is named "plink". Incidentally, plink 2.0 --linear (actually, the flag has been renamed to --glm, but --linear will still work) is often hundreds or even thousands of times faster than plink 1.9 --linear.
The ability to "seamlessly go from pgen to PED", or vice versa, has been DELIBERATELY EXCLUDED from all plink 2.0 builds so far. PED is an inefficient format that has been obsolete for around a decade; VCF has become the standard text interchange format. PED has essentially no place in new scripts, or even most older scripts that are worth maintaining. If PED still has a place in your workflow, stick to plink 1.9, and just zero out the FID columns in all the files you provide to plink.
(You can seamlessly go from pgen to plink .bed, of course.)
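For the plink 1.9 route, "zeroing out the FID columns" can be done with a few lines of scripting. A minimal sketch, assuming whitespace-delimited files with FID in the first column; the function name and the has_header parameter are hypothetical, and the same function would be run over the .fam file (no header) and over each --pheno/--covar file (header kept as-is).

```python
def zero_fid_column(lines, has_header=False):
    """Replace the first (FID) column of each data line with "0".

    If has_header is True, the first non-blank line is passed through
    unchanged. Output is space-delimited.
    """
    out = []
    header_pending = has_header
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header_pending:
            out.append(" ".join(fields))
            header_pending = False
            continue
        fields[0] = "0"
        out.append(" ".join(fields))
    return out


if __name__ == "__main__":
    fam = ["1_PB068 1_PB068 0 0 0 -9", "1_PB286 1_PB286 0 0 0 -9"]
    print("\n".join(zero_fid_column(fam)))
    pheno = ["FID IID HC MDD PTSD Bipolar", "82 PB001 MDD"]
    print("\n".join(zero_fid_column(pheno, has_header=True)))
```

This only makes the FIDs agree; the IIDs must still be identical across files, so the .fam IIDs of the form #_subjectID would not yet match bare IDs like PB001.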