This is not homework; it's just an example I found online.
They are testing for association between HLA alleles and a binary disease outcome. The data frame has two columns, DQDRa1 and DQDRa2, holding the two haplotypes of the HLA gene DQDR. Each can be one of 10 different alleles, which they recode into a number of dummy variables D1, D6, D7, etc., where Di = 1 if the subject has at least one "i" allele and Di = 0 otherwise. I don't get why they don't let the Di values go up to 2. Take the individual at index 460, for example: this person appears to carry two copies of allele 15 (homozygous at DQDR), but they set the D15 column to 1 instead of 2. Why do this?
I think the regression is supposed to look like this, if that helps:
library(data.table)  # provides fread()
data <- fread("http://www.math.chalmers.se/Stat/Grundutb/CTH/tms121/1011/diabetes.txt")
fit <- glm(Y ~ D6 + D7 + D9 + D10 + D11 + D12 + D13 + D14 + D15 + D99, family="binomial", data=data)
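I assume it's because they are testing association under a dominant model, where the 0/1 genotype (one copy of the allele) is assumed to have the same effect as 1/1 (two copies). Under that assumption only presence or absence of the allele matters, so the dummy variable stops at 1.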
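To make the difference concrete, here is a minimal sketch in R on made-up data. The four toy rows and the column names D15_dom and D15_add are my own inventions, and I'm assuming the alleles are stored as numeric codes in DQDRa1 and DQDRa2, which I haven't checked against the actual file.

```r
# Toy data: two allele columns per subject (hypothetical values, not the real file)
toy <- data.frame(
  DQDRa1 = c(15, 15, 6, 7),
  DQDRa2 = c(15, 6, 6, 9)
)

# Dominant coding (as in the example): 1 if the subject has at least one copy of allele 15
toy$D15_dom <- as.integer(toy$DQDRa1 == 15 | toy$DQDRa2 == 15)

# Additive coding: number of copies of allele 15 (0, 1 or 2)
toy$D15_add <- (toy$DQDRa1 == 15) + (toy$DQDRa2 == 15)

print(toy)
```

The first subject is homozygous for allele 15 and gets 1 under the dominant coding but 2 under the additive one. If you wanted to test an additive (per-copy) effect instead, you could probably build count variables like D15_add for each allele and put those into the glm in place of the 0/1 dummies.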
That would make too much sense :)