Regression factor levels question

Posted by Jára, 20 months ago

Hi all,

I am experimenting with logistic regression for combinations of two SNPs to assess their combined effects on risk of disease.

I have several genotype combinations of these two SNPs (het_het, hom_het, etc.) and I've coded them as levels of a factor, with one level (ref_ref) set as the reference. I then run my regression and get a coefficient for each factor level relative to the reference. My issue is that if I perform separate analyses, say one that is just het_het vs. ref_ref, the coefficients differ from those I get when I include all levels together.

My understanding is that this shouldn't happen, so I'm a bit puzzled. The covariates are always the same and the reference group is always the same, yet the coefficients differ considerably depending on whether I include all factor levels at once or test each level against the reference one at a time.
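The setup described above corresponds to a model along these lines, assuming the covariates are collected in a vector $\mathbf{x}_i$ (they are not named in the question) and $g_i$ is individual $i$'s SNP combination:

$$
\operatorname{logit} \Pr(Y_i = 1) = \beta_0 + \sum_{k \ne \text{ref\_ref}} \beta_k \, \mathbb{1}[g_i = k] + \boldsymbol{\gamma}^\top \mathbf{x}_i
$$

Here $\beta_k$ is the log odds ratio for combination $k$ versus ref_ref at fixed covariate values. Because $\boldsymbol{\gamma}$ is shared by every level, it is estimated from all individuals in the full model but only from the ref_ref and het_het individuals in a het_het-only fit, so $\hat\beta_{\text{het\_het}}$ can change between the two. Only without covariates does $\hat\beta_k$ reduce to $\operatorname{logit}\hat p_k - \operatorname{logit}\hat p_{\text{ref\_ref}}$, where $\hat p_k$ is the observed proportion of cases with combination $k$; that quantity depends on those two groups alone.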

Any ideas or pointers?

Tags: logistic-regression, snp

Written 20 months ago by Jára; updated 20 months ago by Ram.

When you refer to separate analyses, do you mean splitting your dataset and then fitting a separate regression to each subset? If so, you would expect the coefficients to differ from those of the full-dataset fit, since each fit only sees a reduced subset of your data, and the covariate coefficients are re-estimated from that subset alone.
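A minimal Python sketch of that effect, using only the standard library. Everything in it is hypothetical and for illustration only: the simulated data, the single covariate `x`, the effect sizes, and the helper names `simulate`, `design`, `solve` and `fit_logistic` (a hand-rolled Newton-Raphson fit standing in for R's `glm(..., family = binomial)`). The covariate's effect is deliberately made to differ between genotype groups, so the slope pooled over all four levels is not the slope a ref_ref + het_het fit sees. It compares the het_het coefficient from the all-levels fit with the one from the subset fit, with and without the covariate:

```python
import math
import random

LEVELS = ["ref_ref", "het_het", "hom_het", "hom_hom"]  # first level = reference


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson (hypothetical helper)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                score[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def simulate(n=2000, seed=1):
    """Hypothetical data: (genotype combination, covariate x, 0/1 disease status)."""
    rng = random.Random(seed)
    effect = {"ref_ref": 0.0, "het_het": 0.4, "hom_het": 0.8, "hom_hom": 1.2}
    slope = {"ref_ref": 0.2, "het_het": 0.3, "hom_het": 1.0, "hom_hom": 1.2}
    rows = []
    for _ in range(n):
        g = rng.choice(LEVELS)
        x = rng.gauss(0.5 * LEVELS.index(g), 1.0)  # covariate mean differs by group
        eta = -1.0 + effect[g] + slope[g] * x
        rows.append((g, x, 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0))
    return rows


def design(rows, levels, with_covariate):
    """Intercept + one dummy per non-reference level (+ x); drops unlisted levels."""
    X, y = [], []
    for g, x, status in rows:
        if g in levels:
            xi = [1.0] + [1.0 if g == lev else 0.0 for lev in levels[1:]]
            X.append(xi + [x] if with_covariate else xi)
            y.append(status)
    return X, y


rows = simulate()
results = {}
for with_cov in (False, True):
    full = fit_logistic(*design(rows, LEVELS, with_cov))[1]        # het_het, all levels
    subset = fit_logistic(*design(rows, LEVELS[:2], with_cov))[1]  # het_het, ref_ref only
    results[with_cov] = (full, subset)
    print(f"covariate={with_cov}: full={full:.4f} subset={subset:.4f}")
```

In this sketch the two estimates agree when there are no covariates (each is then just the log odds ratio from the ref_ref/het_het 2x2 table) and diverge once the covariate is included, because its coefficient is estimated from different individuals in the two fits.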

Reply by rpolicastro, 20 months ago

Hm, yes, I think that's actually what I'm doing. Would it be better to just include all the factor levels together? Or would the separate analyses work out if I added another factor level to capture all "other" combinations, so that the dataset isn't reduced?
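One way to probe the "other" idea is a small simulation. The Python sketch below is hypothetical throughout (the data, covariate `x`, effect sizes, and the helpers `simulate`, `design`, `solve`, `fit_logistic`, a hand-rolled Newton-Raphson fit); here the covariate has the same true slope in every group. It compares the het_het coefficient from a model with all four levels against one where hom_het and hom_hom are lumped into a single `other` level, so no individuals are dropped:

```python
import math
import random

LEVELS = ["ref_ref", "het_het", "hom_het", "hom_hom"]  # first level = reference
LUMPED = ["ref_ref", "het_het", "other"]                # hom_het, hom_hom -> "other"


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson (hypothetical helper)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                score[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def simulate(n=2000, seed=2):
    """Hypothetical data with one covariate whose mean differs by genotype group."""
    rng = random.Random(seed)
    effect = {"ref_ref": 0.0, "het_het": 0.4, "hom_het": 0.8, "hom_hom": 2.0}
    rows = []
    for _ in range(n):
        g = rng.choice(LEVELS)
        x = rng.gauss(0.5 * LEVELS.index(g), 1.0)
        eta = -1.0 + effect[g] + 0.8 * x  # same covariate slope in every group
        rows.append((g, x, 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0))
    return rows


def design(rows, levels, with_covariate):
    """Dummy-code `levels` (first = reference); unlisted genotypes map to 'other'."""
    X, y = [], []
    for g, x, status in rows:
        label = g if g in levels else "other"
        xi = [1.0] + [1.0 if label == lev else 0.0 for lev in levels[1:]]
        X.append(xi + [x] if with_covariate else xi)
        y.append(status)
    return X, y


rows = simulate()
results = {}
for with_cov in (False, True):
    full = fit_logistic(*design(rows, LEVELS, with_cov))[1]    # het_het, four levels
    lumped = fit_logistic(*design(rows, LUMPED, with_cov))[1]  # het_het, with "other"
    results[with_cov] = (full, lumped)
    print(f"covariate={with_cov}: full={full:.4f} lumped={lumped:.4f}")
```

In this simulation the lumped model reproduces the all-levels het_het estimate only when there are no covariates. With the covariate, forcing hom_het and hom_hom to share one coefficient also shifts the estimated covariate slope, and with it the het_het estimate. This is one simulated case, not a general proof.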

Reply by Jára, 20 months ago

I would include them all in one model.

As a side note, if you have a complex model design, it's often easier to use a package like the R package emmeans to work with and explore your fitted model.
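emmeans itself is an R package; the Python sketch below only illustrates, on hypothetical data, the kind of comparison such a package automates. It fits a single model with ref_ref as the reference and compares two non-reference levels (het_het vs. hom_het) as a contrast of their coefficients, with a Wald standard error from the inverse Fisher information. The data, covariate `x`, effect sizes and helper names (`simulate`, `design`, `solve`, `score_info`, `fit_logistic`) are hypothetical, and no multiple-comparison adjustment is applied:

```python
import math
import random

LEVELS = ["ref_ref", "het_het", "hom_het", "hom_hom"]  # first level = reference


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def score_info(X, y, beta):
    """Score vector and Fisher information of the logistic log-likelihood at beta."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        for j in range(p):
            score[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
    return score, info


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Newton-Raphson fit; returns coefficients and their estimated covariance."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        score, info = score_info(X, y, beta)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_info(X, y, beta)
    p = len(beta)
    # Column j of the inverse information solves info @ c = e_j.
    cols = [solve(info, [float(i == j) for i in range(p)]) for j in range(p)]
    cov = [[cols[j][i] for j in range(p)] for i in range(p)]
    return beta, cov


def simulate(n=2000, seed=3):
    """Hypothetical data: (genotype combination, covariate x, 0/1 disease status)."""
    rng = random.Random(seed)
    effect = {"ref_ref": 0.0, "het_het": 0.4, "hom_het": 0.8, "hom_hom": 1.2}
    rows = []
    for _ in range(n):
        g = rng.choice(LEVELS)
        x = rng.gauss(0.0, 1.0)
        eta = -1.0 + effect[g] + 0.5 * x
        rows.append((g, x, 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0))
    return rows


def design(rows, levels):
    """Intercept + dummies for levels[1:] (levels[0] is the reference) + x."""
    X = [[1.0] + [float(g == lev) for lev in levels[1:]] + [x] for g, x, _ in rows]
    return X, [status for _, _, status in rows]


rows = simulate()

# One model, ref_ref as reference: het_het vs hom_het is a difference of coefficients.
beta, cov = fit_logistic(*design(rows, LEVELS))
i, j = LEVELS.index("het_het"), LEVELS.index("hom_het")
est = beta[i] - beta[j]
se = math.sqrt(cov[i][i] + cov[j][j] - 2.0 * cov[i][j])
print(f"het_het vs hom_het: log OR = {est:.4f} (SE {se:.4f}), OR = {math.exp(est):.3f}")

# Cross-check: relevel so hom_het is the reference and read het_het off directly.
relevel = ["hom_het", "ref_ref", "het_het", "hom_hom"]
beta2, cov2 = fit_logistic(*design(rows, relevel))
k = relevel.index("het_het")
est2, se2 = beta2[k], math.sqrt(cov2[k][k])
print(f"releveled fit:      log OR = {est2:.4f} (SE {se2:.4f})")
```

Refitting with hom_het as the reference gives the same estimate and standard error, which is a quick way to confirm that a contrast computed from one model answers the het_het vs. hom_het question without splitting the data.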

Reply by rpolicastro, 20 months ago