How should male and female be coded in plink covariate files? Is it (male = 1, female = 0), or ('1' = male, '2' = female, '0' = unknown)? Is the sex coding different between covariate files and .fam files?
I'm asking because of the inconsistent statements in the plink 1.9 manual, as shown below. Thank you in advance.
The expected coding in the .fam file is male='1', female='2'; plink 1.x then recodes this as male=1, female=0 for the --linear/--logistic regression. In other words, you would get the same results (outside of chrX) with --linear sex as you would with --linear combined with a covariate file that contains a male=1, female=0 sex covariate.
Yes, this is counterintuitive, so I got rid of this discrepancy in plink 2.0. From its --glm documentation: "Note that PLINK 2.0 encodes the .fam/.psam sex covariate as male = 1, female = 2, to match the actual numbers in the input file. This is a minor change from PLINK 1.x." So with PLINK 2.0, if you use male=1, female=2 coding in both file types, you don't have to worry about the sign of the sex beta coefficient changing on you with .fam vs. --covar.
(With that said, even with PLINK 1.x, you don't have to worry about any p-values being affected by the 1/2 vs. 1/0 coding.)
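To illustrate the point about signs and p-values, here is a minimal sketch in plain Python. It is not PLINK's implementation: it fits an ordinary least-squares regression of a phenotype on sex alone, once with male=1/female=0 and once with male=1/female=2. The helper `simple_ols` and the toy data are made up for illustration.

```python
import math

def simple_ols(x, y):
    """Return (slope, t_statistic) for the model y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the standard error of the slope.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se

# Hypothetical toy data: 4 males, 4 females, one quantitative phenotype.
is_male = [True, True, True, False, False, False, True, False]
pheno = [5.1, 4.8, 5.6, 6.2, 6.0, 6.9, 5.0, 6.4]

sex_10 = [1 if m else 0 for m in is_male]  # male=1, female=0
sex_12 = [1 if m else 2 for m in is_male]  # male=1, female=2

beta_10, t_10 = simple_ols(sex_10, pheno)
beta_12, t_12 = simple_ols(sex_12, pheno)

print(f"male=1/female=0: beta={beta_10:+.4f}, t={t_10:+.4f}")
print(f"male=1/female=2: beta={beta_12:+.4f}, t={t_12:+.4f}")
```

The two codings differ by the transformation x' = 2 - x, so the slope and the t-statistic flip sign while keeping the same magnitude. A two-sided p-value depends only on |t| and the degrees of freedom, so it is the same under either coding.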
Thank you so much, chrchang523!