Hi,
I am running associations in PLINK, and I have a covariate I would like to correct for, which encodes 4 different diseases. When I include this covariate as a numeric variable (e.g. 1, 2, 3, 4), PLINK works fine. However, the manual suggests that categorical variables should be recoded as binary dummy variables, e.g. 0/1 for disease 1, 0/1 for disease 2, and so on. When I do that, PLINK gives me NA results, maybe because of small variation in the phenotypes. My question is: how do these two methods differ? Is it OK for me to use the numeric variable instead?
Thank you for any advice.
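To make the difference between the two encodings concrete, here is a minimal Python sketch on made-up data (the values and the helper names dummy_code, numeric_fit and dummy_fit are hypothetical, not PLINK functions). It compares them in an ordinary least-squares model with an intercept: the numeric code forces the four disease effects onto a straight line, whereas dummy coding with one reference category lets each disease have its own mean shift.

```python
from statistics import mean

# Hypothetical toy data: disease category (1-4) and a quantitative phenotype.
# The category effects are deliberately NOT evenly spaced in the order 1, 2, 3, 4.
categories = [1, 1, 2, 2, 3, 3, 4, 4]
phenotype = [1.0, 1.2, 5.0, 5.2, 2.0, 2.2, 3.0, 3.2]


def dummy_code(x, reference):
    """One 0/1 column for every level except the reference level."""
    levels = sorted(set(x) - {reference})
    return [[1 if xi == lvl else 0 for lvl in levels] for xi in x]


def numeric_fit(x, y):
    """OLS fit of y = a + b*x, treating the category code as a number."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [a + b * xi for xi in x]


def dummy_fit(x, y):
    """OLS with an intercept plus k-1 dummies: each fitted value equals
    the mean phenotype of that individual's category."""
    group_means = {c: mean(yi for xi, yi in zip(x, y) if xi == c) for c in set(x)}
    return [group_means[xi] for xi in x]


def rss(y, fitted):
    """Residual sum of squares."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


print(dummy_code(categories, reference=1)[:3])  # [[0, 0, 0], [0, 0, 0], [1, 0, 0]]
print("numeric coding RSS:", round(rss(phenotype, numeric_fit(categories, phenotype)), 3))
print("dummy coding RSS:  ", round(rss(phenotype, dummy_fit(categories, phenotype)), 3))
```

The dummy-coded fit can never be worse than the numeric one, because a straight line through the category codes is a special case of four separate means; the numeric coding only matches it when the disease effects happen to be evenly spaced in the order the codes were assigned.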
It has been a long time since I last used PLINK, but here are my thoughts.
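1) I think it is not correct to use a categorical covariate as a numeric variable, because PLINK will treat it as a quantitative variable. That implicitly assumes the diseases are ordered and equally spaced, i.e. that going from disease 1 to disease 2 has the same effect as going from disease 3 to disease 4.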
2) By coding the four diseases as a series of binary (0/1) variables, you should be fine. What do you mean by "gives NA results"? Does it return an NA p-value for all the association tests you are performing? Besides the NA, did PLINK print any warning? Maybe there is a formatting issue.
3) What do you mean by "small variation in the phenotypes"? Did you try entering one covariate at a time to see what happens? Maybe one of them is causing the problem (I don't see why, but it is worth a try).
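One possible source of NA values worth ruling out, offered as an assumption rather than a diagnosis: if all four 0/1 dummy columns are supplied alongside the intercept that a regression model normally includes, the columns are perfectly collinear (the four dummies always sum to 1), so the coefficients cannot be estimated. The Python sketch below checks this with an exact rank computation on hypothetical data; the helper names design and matrix_rank are made up for illustration.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via Gaussian elimination on exact fractions (no rounding issues)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # find a row at or below the current rank with a non-zero entry in this column
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # eliminate this column from every other row
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design(x, levels):
    """Intercept column followed by one 0/1 column per listed level."""
    return [[1] + [1 if xi == lvl else 0 for lvl in levels] for xi in x]


# Hypothetical disease codes for 10 individuals.
categories = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]

full = design(categories, [1, 2, 3, 4])  # intercept + 4 dummies: 5 columns
reduced = design(categories, [2, 3, 4])  # intercept + 3 dummies, disease 1 = reference

print(matrix_rank(full), len(full[0]))        # 4 5 -> rank-deficient
print(matrix_rank(reduced), len(reduced[0]))  # 4 4 -> full rank
```

Dropping one dummy does not remove that disease from the adjustment: its individuals become the reference group absorbed by the intercept, and the other disease coefficients are estimated relative to it.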
Thank you, Fabio.
What do you use instead of PLINK? I am curious. I was thinking of a mixed model in GCTA, but I imagine this covariate would still cause a problem, for example if there are not enough individuals in each group.
1) OK, so I should not use it as numeric. Thank you.
2) The results have NA in the p-value column (no warning), which seems to happen when some categories do not have enough individuals.
3) Yes, there is one category that contains 1% of the individuals. Removing this category (when coded as binary) solves the issue, but that is not what I want: it doesn't make sense to adjust for all categories except this one, and I cannot collapse it with the other categories. That is why coding it as numeric could have been a nice solution.
Thank you for your feedback; I really appreciate it.
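Regarding the 1% category: if the phenotype is binary (case/control), which the thread does not state and is an assumption here, a very rare category can break the fit through separation. When every individual in that category is a case (or every one is a control), the logistic-regression coefficient for its dummy has no finite estimate. A quick per-category count can flag this before running the association. The function name flag_sparse_categories and the min_count threshold below are hypothetical choices for illustration, not PLINK settings.

```python
from collections import Counter


def flag_sparse_categories(categories, phenotypes, min_count=10):
    """Return {category: reasons} for categories whose dummy may be unestimable.

    min_count is an arbitrary illustrative threshold, not a PLINK setting.
    """
    # count phenotype values (e.g. 0 = control, 1 = case) within each category
    by_cat = {}
    for c, y in zip(categories, phenotypes):
        by_cat.setdefault(c, Counter())[y] += 1

    flagged = {}
    for c, counts in sorted(by_cat.items()):
        n = sum(counts.values())
        reasons = []
        if n < min_count:
            reasons.append(f"only {n} individual(s)")
        if len(counts) < 2:
            # all cases or all controls within this category -> separation
            reasons.append("no phenotype variation")
        if reasons:
            flagged[c] = reasons
    return flagged


# Hypothetical data: 100 individuals, disease 4 carried by a single case (1%).
categories = [1] * 33 + [2] * 33 + [3] * 33 + [4]
phenotypes = ([0, 1] * 50)[:99] + [1]

print(flag_sparse_categories(categories, phenotypes))
# {4: ['only 1 individual(s)', 'no phenotype variation']}
```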
I do not perform GWAS anymore. That said, many people use EMMAX (or similar tools). I would try an alternative approach (GCTA could also work). In theory, a rare covariate should have little or no effect, so I don't see why it should produce NA results. I am sorry I cannot help you more; my experience is too limited.