Is there a reason to limit the dummy coding of these alleles to 0 or 1 in this regression model example, instead of allowing values of 0, 1, or 2?
4.1 years ago
curious ▴ 820

This is not homework; it's just an example I found online.

They are testing for association between HLA alleles and a binary disease outcome here.

In the data frame, the columns DQDRa1 and DQDRa2 hold the two haplotypes of the HLA gene DQDR. Each can be one of 10 different alleles, which they recode into a number of dummy variables D1, D6, D7, etc., where Di = 1 if the subject has at least one 'i' allele, and Di = 0 otherwise. I don't get why they don't let the Di values go up to 2, as for the example at index 460. This individual carries the same DQDR allele on both haplotypes, but they set the D15 column to 1 instead of 2. Why do this?
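To make the coding concrete, here is a minimal sketch on a made-up three-row data frame. The column names DQDRa1 and DQDRa2 come from the example; the toy values and the helper name `has_allele` are assumptions for illustration, not part of the original data or script. It shows that a subject carrying allele 15 on both haplotypes still gets D15 = 1 under this scheme:

```r
# Toy data: values are invented for illustration, not taken from diabetes.txt
toy <- data.frame(
  DQDRa1 = c(15, 6, 7),
  DQDRa2 = c(15, 15, 9)
)

# Hypothetical helper: 1 if the subject carries at least one copy of `allele`,
# 0 otherwise (carrier / non-carrier coding)
has_allele <- function(df, allele) {
  as.integer(df$DQDRa1 == allele | df$DQDRa2 == allele)
}

toy$D15 <- has_allele(toy, 15)
toy$D6  <- has_allele(toy, 6)
print(toy)
```

The first row (15/15) and the second row (6/15) both end up with D15 = 1, which is exactly the behavior being asked about.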

I think the regression is supposed to look like this, if that helps:

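library(data.table)  # fread() comes from data.table, not "data.frame"
data <- fread("http://www.math.chalmers.se/Stat/Grundutb/CTH/tms121/1011/diabetes.txt")
# Logistic regression of disease status Y on the allele dummy variables
fit <- glm(Y ~ D6 + D7 + D9 + D10 + D11 + D12 + D13 + D14 + D15 + D99, family="binomial", data=data)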
logistic regression gwas • 874 views

I assume because they test association for a dominant model, where 0/1 supposedly has the same effect as 1/1.
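The difference between the two codings can be sketched as follows. The genotypes are invented and the variable names `D15_dominant` and `D15_additive` are assumptions for illustration. Under the dominant coding, heterozygous and homozygous carriers share one effect on the log-odds; under the additive coding, each extra copy of the allele shifts the log-odds by the same amount:

```r
# Invented genotypes for illustration only
a1 <- c(15, 6, 7, 15)
a2 <- c(15, 15, 9, 6)

# Dominant coding: carrier (1) vs non-carrier (0)
D15_dominant <- as.integer(a1 == 15 | a2 == 15)

# Additive coding: number of copies of allele 15 (0, 1, or 2)
D15_additive <- as.integer(a1 == 15) + as.integer(a2 == 15)

print(cbind(a1, a2, D15_dominant, D15_additive))
```

Either column could be passed to `glm(..., family = "binomial")`; which one is appropriate depends on the genetic model you want to test, not on anything in the software.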


That would make too much sense :)
