Question

Regression model summary in R: variable names are combined with category labels - what is the reason and how can I fix it?

2.0 years ago

ohtang7 ▴ 40

Hello,

I am working with a linear model (and also a GLM) on my data in R.

I fitted my model, but one thing in the output looks odd to me.

Below is my data format.

[Image: data format]

And my model summary is as below.

[Image: model summary]

In the variable column of the summary, you can see that the category labels 'None', 'Non-Central', 'urban' and 'Yes' are appended to the variable names.

None of the example models I have seen while studying showed this kind of output.

Did I miss some option in the model script?

What is the reason for this, and how can I fix it in the model output?

Thank you.

interpretation Model GLM Regression Summary LM

This is just how R shows results for coefficients following the automatic construction of contrasts from a design.

For instance, "HomelessNone" means that for the factor "Homeless" you are testing the difference between the level "None" and the reference level, which I take to be "Exist" for your data.

So what this actually means is that the estimate for the coefficient shows the change in shannon_entropy when "Homeless" is "None" compared to "Exist", while taking all the other covariates into account. Same goes for other variables: Non-Central vs Central, Yes vs No, etc.

This is the case when a factor has only two levels: the reference is taken to be the first level of the factor, and a contrast is built for the other level. If a factor had more levels, you would get one coefficient per non-reference level, e.g. "subway_stationYes" (vs No) and "subway_stationMaybe" (again, vs No).
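To see the naming rule in isolation, here is a minimal sketch on made-up data. The data frame `toy` and its values are hypothetical; only the column names mimic those in the question, and `lm` is used where the original model may have been a `glm`.

```r
# Hypothetical toy data: values are random, only the column names
# imitate the ones shown in the question.
set.seed(1)
toy <- data.frame(
  shannon_entropy = rnorm(12),
  Homeless        = factor(rep(c("Exist", "None"), times = 6)),
  subway_station  = factor(rep(c("No", "Yes", "Maybe"), each = 4),
                           levels = c("No", "Yes", "Maybe"))
)

# The first level of each factor is the reference level.
levels(toy$Homeless)        # "Exist" comes first, so it is the reference
levels(toy$subway_station)  # "No" comes first, so it is the reference

fit <- lm(shannon_entropy ~ Homeless + subway_station, data = toy)

# Each coefficient name is the variable name pasted to a non-reference level.
coef_names <- names(coef(fit))
print(coef_names)
```

The reference levels ("Exist" and "No") never appear in the coefficient names, because every other level is compared against them.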

If you want to change your reference level, and assuming that your alpha$Homeless and other variables are of class factor, you can do

alpha$Homeless = relevel(alpha$Homeless, ref = "None")

In that case, once you refit the model, the summary of the glm will show the coefficient name as HomelessExist, and you know you have to interpret the coefficient as Exist vs None.
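The effect of `relevel` can be checked on a small example. This is only a sketch: the data frame `toy` and its numbers are invented for illustration, and `lm` stands in for whatever model was actually fitted.

```r
# Hypothetical data with a two-level factor, as in the question.
toy <- data.frame(
  shannon_entropy = c(2.1, 3.4, 2.5, 3.0, 2.2, 3.6),
  Homeless        = factor(c("Exist", "None", "Exist", "None", "Exist", "None"))
)

# Default reference level is "Exist" (levels are sorted alphabetically).
fit_before <- lm(shannon_entropy ~ Homeless, data = toy)

# Change the reference level, then refit: relevel() only changes the data,
# so an already fitted model is not updated.
toy$Homeless <- relevel(toy$Homeless, ref = "None")
fit_after <- lm(shannon_entropy ~ Homeless, data = toy)

names_before <- names(coef(fit_before))
names_after  <- names(coef(fit_after))
print(names_before)
print(names_after)

# Same comparison seen from the other side: the estimate flips sign.
slope_before <- unname(coef(fit_before)[2])
slope_after  <- unname(coef(fit_after)[2])
```

Changing the reference level does not change the fit itself, only which level the coefficient is expressed relative to.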