Question

Update FAM file Plink for fid and pheno

3.7 years ago

kstafford32 • 0

Hi all,

I am trying to update my .fam file with newly acquired phenotypes for my subjects, but am running into a few problems.

I am trying to use plink2 to load these phenotypes from a .txt file. Here are a few example columns and rows showing the format of that file:

FID IID HC  MDD PTSD    Bipolar     
82  PB001       MDD             
3   PB005       MDD

I am trying to update the phenotype column of my .fam file using this plink2 script:

plink2 --bfile chr1_ped --no-fid --pheno phenoPRS6_widFID.txt --pheno-name

I used --no-fid because my .fam file uses subject IDs as FIDs. Example of the .fam file format:

1_PB068 1_PB068 0 0 0 -9
1_PB286 1_PB286 0 0 0 -9
2_PB039 2_PB039 0 0 0 -9

This error was obtained upon running plink2:

 Error: No entries in phenoPRS6_widFID.txt correspond to loaded sample IDs.

I'm sure this error is occurring because my .fam file uses subject IDs as FIDs, and because the subject IDs are formatted as #_subjectID. See the example of the .fam format above.

Questions:

1. What's the best way to update my subject IDs so that they exclude the #_ prefix?
2. How can I then update my FIDs to match the FIDs in the phenotype .txt file?
3. Should my script to update phenotypes work after fixing these two columns? If not, what should I change in the script?
4. Does plink2 truly work on categorical phenotypes? Do phenotypes in fact have to be listed as numerical (despite the guide saying they can be categorical)?

Lots of questions, I know. Perhaps simple answers. Please help, if you can!

Thanks.

FID fam plink phenotype • 7.1k views

ADD COMMENT • link updated 3.7 years ago by chrchang523 11k • written 3.7 years ago by kstafford32 • 0

score 1 · Answer 1 · 2021-03-23

3.7 years ago

chrchang523 11k

--no-fid says there is no FID column in your .fam file, which is not true in your case. I realize you want it to mean "ignore the FID in my --pheno .txt file", but that is not what it means. As for your specific questions:

1. See 2.
2. It's up to you how you want to do this, but the bottom line is that your sample-info file and your --pheno file need to use the same IDs. You might want to switch from --bfile/--make-bed to plink2's --pfile/--make-pgen format, since you can then have a .psam file with just a single #IID column, and your --pheno file can also have only IIDs without FIDs.
3. I don't know, because you didn't post your full command line.
4. --glm automatically expands categorical covariates into the appropriate number of dummy variables. However, multinomial logistic regression on categorical phenotypes is not currently supported.
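
One possible way to make the two files agree on IDs (point 2) is to build an old-to-new ID mapping with awk. This is only a sketch under assumptions: the demo rows are invented in the same layout as the files in the question, the file names (demo.fam, demo_pheno.txt, update_ids.txt) are hypothetical, and it assumes every .fam ID is a numeric prefix, an underscore, and then the IID used in the phenotype file.

```shell
# Demo inputs: invented rows that mimic the layouts shown in the question.
printf '%s\n' '1_PB001 1_PB001 0 0 0 -9' '2_PB005 2_PB005 0 0 0 -9' > demo.fam
printf 'FID\tIID\tHC\tMDD\n82\tPB001\t\tMDD\n3\tPB005\t\tMDD\n' > demo_pheno.txt

# Pass 1 (phenotype file): remember each FID by its IID, skipping the header.
# Pass 2 (.fam file): strip the leading "<digits>_" from the IID, look up the
# matching FID, and print: old FID, old IID, new FID, new IID.
awk '
  NR == FNR { if (FNR > 1) fid[$2] = $1; next }
  {
    new_iid = $2
    sub(/^[0-9]+_/, "", new_iid)
    if (new_iid in fid) print $1, $2, fid[new_iid], new_iid
    else print "no phenotype row for " $2 > "/dev/stderr"
  }
' demo_pheno.txt demo.fam > update_ids.txt

cat update_ids.txt
```

The resulting four-column layout (old FID, old IID, new FID, new IID) is what plink 1.9's --update-ids expects, so the mapping could then be applied with something like `plink --bfile chr1_ped --update-ids update_ids.txt --make-bed --out chr1_ped_fixed` (output name hypothetical). Samples with no phenotype row are reported on stderr and left out of the mapping.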

ADD COMMENT • link 3.7 years ago by chrchang523 11k

Thank you greatly. Does Plink not have an ignore-FID option? I couldn't find one in my search. Otherwise, I will try the pgen format. Do you know whether switching to pgen will cause issues when I go on to run a covariate analysis such as this:

plink --bfile data --covar covar_data.txt --pheno phenoPRS6_widFID.txt --all-pheno --linear hide-covar --out logistics_data_results

Can I seamlessly go from pgen to PED?

ADD REPLY • link 3.7 years ago by kstafford32 • 0

plink2 --make-pgen has an option to omit the FID column from its output, and from that point on you can provide --pheno/--covar/etc. input files without an FID column when working with the dataset. What plink2 won't do is ignore a --pheno/--covar/etc. FID column you provide; it's your responsibility to provide an FID column that makes sense if you're providing one at all.

Note that this functionality requires plink 2.0. You're using plink 1.9 and referring to it as "plink2", which made sense back in the day, but now the plink 2.0 binary is named "plink2" and the plink 1.9 binary is named "plink". Incidentally, plink 2.0 --linear (actually, the flag has been renamed to --glm, but --linear will still work) is often hundreds or even thousands of times faster than plink 1.9 --linear.

The ability to "seamlessly go from pgen to PED", or vice versa, has been DELIBERATELY EXCLUDED from all plink 2.0 builds so far. PED is an inefficient format that has been obsolete for around a decade; VCF has become the standard text interchange format. PED has essentially no place in new scripts, or even most older scripts that are worth maintaining. If PED still has a place in your workflow, stick to plink 1.9, and just zero out the FID columns in all the files you provide to plink.

(You can seamlessly go from pgen to plink .bed, of course.)
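
For the plink 1.9 route, zeroing out the FID column in each input file could look roughly like the sketch below. The demo rows and file names are invented for illustration, the phenotype file is assumed to be tab-delimited with a header line, and stripping the numeric prefix from the .fam IID is an extra step added here so that the IIDs also match; it is not something plink does for you.

```shell
# Demo inputs: invented rows that mimic the layouts shown in the question.
printf '%s\n' '1_PB001 1_PB001 0 0 0 -9' '2_PB005 2_PB005 0 0 0 -9' > demo2.fam
printf 'FID\tIID\tHC\tMDD\n82\tPB001\t\tMDD\n3\tPB005\t\tMDD\n' > demo2_pheno.txt

# .fam: set FID to 0 and drop the leading "<digits>_" from the IID.
# Row order is left untouched, since it must stay in sync with the .bed file.
awk '{ $1 = 0; sub(/^[0-9]+_/, "", $2); print }' demo2.fam > demo2_zeroed.fam

# Phenotype file: keep the header, set FID to 0 on every data row.
# Tab is set explicitly as the separator so that empty cells are preserved.
awk 'BEGIN { FS = OFS = "\t" } NR > 1 { $1 = 0 } { print }' \
  demo2_pheno.txt > demo2_pheno_zeroed.txt

cat demo2_zeroed.fam demo2_pheno_zeroed.txt
```

Writing to new files rather than editing in place keeps the original .fam available in case the ID pattern turns out not to hold for every sample.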

ADD REPLY • link 3.7 years ago by chrchang523 11k