Hi all,
I am experimenting with logistic regression for combinations of two SNPs to assess their combined effects on risk of disease.
I have several genotype combinations of these two SNPs (`het_het`, `hom_het`, etc.), which I've coded as levels of a factor, with `ref_ref` set as the reference level. When I run the regression, I get a coefficient for each factor level relative to the reference. My issue is that if I run separate analyses, say one with just `het_het` vs. `ref_ref`, the coefficients are different from what I get when I include all the levels together.
My understanding is that this shouldn't happen, so I'm a bit puzzled. The covariates are always the same and the reference group is always the same, yet the estimates differ substantially depending on whether I include all factor levels at once or test them one at a time against the reference.
Any ideas or pointers?
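When you say separate analyses, do you mean splitting your dataset and fitting a separate regression to each subset? If so, you would expect the coefficients to differ from the full-data fit, because each fit sees only part of the data. In particular, the intercept and covariate effects are estimated from different observations in each fit, and the genotype coefficients are adjusted for those estimates. (Without covariates, a model containing only the genotype factor would give the same `het_het` coefficient either way, since each level's log-odds is then estimated from that level's own observations.)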
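To make this concrete, here is a minimal sketch. `emmeans` is an R package, but the sketch uses plain Python so it runs without dependencies. It simulates hypothetical data: three made-up genotype levels, made-up true effects, and one standardized covariate whose name `age` is an assumption. It then fits logistic regression by Newton-Raphson and compares the `het_het` coefficient from the full model with the one from a `ref_ref`/`het_het`-only subset, with and without the covariate.

```python
import math
import random

LEVELS = ["ref_ref", "het_het", "hom_het"]  # hypothetical genotype combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (X has an intercept column)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X^T W X
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def simulate(n=3000, seed=1):
    """Hypothetical data: made-up true effects and one standardized covariate."""
    rng = random.Random(seed)
    effect = {"ref_ref": 0.0, "het_het": 0.5, "hom_het": 1.0}  # assumed log odds ratios
    rows = []
    for _ in range(n):
        g = rng.choice(LEVELS)
        age = rng.gauss(0.0, 1.0)
        eta = -1.0 + effect[g] + 0.8 * age
        rows.append((g, age, 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0))
    return rows


def design(rows, levels, with_age):
    """Treatment (dummy) coding against levels[0], plus the covariate if requested."""
    X = [[1.0] + [float(g == lvl) for lvl in levels[1:]] + ([age] if with_age else [])
         for g, age, _ in rows]
    return X, [y for _, _, y in rows]


rows = simulate()
subset = [r for r in rows if r[0] in ("ref_ref", "het_het")]
results = {}
for with_age in (False, True):
    full = fit_logistic(*design(rows, LEVELS, with_age))
    sub = fit_logistic(*design(subset, LEVELS[:2], with_age))
    results[with_age] = (full[1], sub[1])  # index 1 is the het_het coefficient
    print(f"covariate={with_age}: full model {full[1]:.4f}, subset {sub[1]:.4f}")
```

Without the covariate, the two `het_het` estimates agree to numerical precision. With it, they differ because the covariate slope (and hence the adjustment) is estimated from different rows. Newton-Raphson is used only to keep the sketch self-contained; any standard GLM routine gives the same maximum-likelihood estimates.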
Hm, yes, I think that's what I'm doing. Would it be better to just include all the factor levels together? Or would the separate analyses work if I added a level that lumps together all the "other" combinations, so that the dataset isn't reduced?
I would include them all in one model.
As a side note, with complex model designs it's often easier to use a package like `emmeans` to work with and explore your fitted model.
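With everything in one treatment-coded model, every pairwise comparison between genotype levels (e.g. `hom_het` vs. `het_het`) is a linear combination of the fitted coefficients. On the logit scale, in a model without genotype-by-covariate interactions, this is essentially what a pairwise comparison of estimated marginal means reduces to. The sketch below computes those log odds ratios and Wald standard errors from a coefficient vector and its covariance matrix. `pairwise_log_or` is a hypothetical helper, not part of `emmeans`, and the numbers are made up purely for illustration, not output from any real fit.

```python
import math
from itertools import combinations


def pairwise_log_or(levels, coef, vcov):
    """
    Pairwise log odds ratios from one treatment-coded logistic model (hypothetical helper).

    levels: factor levels, reference first
    coef:   coefficient per level (reference fixed at 0.0)
    vcov:   covariance matrix of coef (reference row/column all zeros)
    """
    out = []
    for i, j in combinations(range(len(levels)), 2):
        est = coef[j] - coef[i]                           # log OR of levels[j] vs levels[i]
        var = vcov[j][j] + vcov[i][i] - 2.0 * vcov[i][j]  # Var(b_j - b_i)
        se = math.sqrt(var)
        out.append((f"{levels[j]} vs {levels[i]}", est, se, math.exp(est), est / se))
    return out


# Made-up illustrative numbers, NOT results from any real analysis.
levels = ["ref_ref", "het_het", "hom_het"]
coef = [0.0, 0.40, 0.90]
vcov = [
    [0.0, 0.0, 0.0],
    [0.0, 0.010, 0.004],
    [0.0, 0.004, 0.012],
]

for label, est, se, odds_ratio, z in pairwise_log_or(levels, coef, vcov):
    print(f"{label}: log OR {est:.3f} (SE {se:.3f}), OR {odds_ratio:.2f}, z {z:.2f}")
```

Note that the `hom_het` vs. `het_het` standard error needs the covariance between the two coefficients, which is only available when both come from the same model. `emmeans` handles this bookkeeping for you and, by default, applies a multiplicity adjustment to pairwise comparisons; this sketch does not.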