Question

Regression models on genetic data

0

5.6 years ago

mel22 ▴ 100

Hello,

I am using unconditional logistic regression to model the genetic effect and the gene-by-environment (SNP × exposure) interaction effect on my outcome.

My results are a bit strange:

When modeling only the main effect of each variant, no SNP is associated.

When modeling the exposure:SNP interaction term together with the additive SNP term, I get a strong, significant signal only for the additive term (two SNPs with p << 10e-8) and nothing for the interaction.

I am using 0, 1 and 2 codes for the SNP (count of the effect allele) and a continuous exposure variable.

I am working on a case-control study (2300 subjects) and testing 7000 SNPs.

Can this be a reliable result? How could it be explained?

Thank you very much !

R SNP • 1000 views

ADD COMMENT • link updated 5.6 years ago by Lemire ▴ 940 • written 5.6 years ago by mel22 ▴ 100

2

What, precisely, is your model formula? Is it outcome ~ exposure:SNP + SNP?

Working with regression models can be difficult (and 'risky') - basically, it is possible to find a statistically significant p-value by messing around with the model formula; however, the models may be meaningless. Without also looking at the standard errors, the beta coefficients, and odds ratios, one cannot really make any interpretation based solely on the p-value. Also, should you be adjusting for population stratification?
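As a rough illustration of looking beyond the p-value, the sketch below turns a log-odds coefficient and its standard error into an odds ratio with a Wald 95% confidence interval and a two-sided Wald p-value. The function names and the numbers are hypothetical and chosen only for illustration; they are not taken from the study discussed here.

```python
import math

def odds_ratio_with_ci(beta, se, z_crit=1.96):
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error."""
    return (
        math.exp(beta),
        math.exp(beta - z_crit * se),
        math.exp(beta + z_crit * se),
    )

def wald_p_value(beta, se):
    """Two-sided Wald p-value using a standard normal reference distribution."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical coefficient and standard error, for illustration only
beta, se = 0.40, 0.07
or_, lo, hi = odds_ratio_with_ci(beta, se)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f}), p = {wald_p_value(beta, se):.2e}")
```

A very small p-value paired with an implausibly large odds ratio or a very wide interval would be a reason to check the model before interpreting it.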

ADD REPLY • link 5.6 years ago by Kevin Blighe 89k

0

Thank you Kevin. I am adjusting for principal components (PCA), and this is the result of a meta-analysis of two different studies. My model is outcome ~ exposure:SNP + SNP + exposure + other cofactors; I then ran the meta-analysis, from which I get the significant result.

ADD REPLY • link 5.6 years ago by mel22 ▴ 100

0

Ah, a model formula like this:

outcome ~ exposure:SNP + SNP + exposure

...is the same as:

outcome ~ exposure * SNP

i.e., it is a multiplicative model, also sometimes called the 'log-additive model'. Perhaps this may assist in the interpretation? As an example, I conducted a similar study in 2016 (but it used conditional regression with Family ID as the matched strata) and I also used a multiplicative model. How are the standard errors?

Lemire has provided an answer, below.

ADD REPLY • link 5.6 years ago by Kevin Blighe 89k

score 2 · Answer 1 · 2019-08-26

In your regression equation, you have the following terms:

beta_s * SNP + beta_i * SNP * exposure (ignoring the other ones you may have)

The estimate for beta_s (from which you derived your significance) is the slope of the effect of the SNP on your outcome, on the log-odds scale, when the exposure variable is equal to 0. That's how you need to interpret your result. The only thing you can say from your output is that your SNP has a significant effect when the exposure is 0. If your exposure were equal to, e.g., 2, then the effect (slope) of the SNP would be beta_s + 2*beta_i (which would have a different standard error and thus a different significance level). Don't overinterpret each coefficient taken separately.
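To make the point concrete, here is a minimal sketch that computes the SNP slope at a chosen exposure value together with its standard error. It uses the usual variance formula for a linear combination of two estimates, which also needs their covariance. The function name and all numbers are hypothetical; in practice the variances and covariance would be read from the fitted model's coefficient covariance matrix (in R, `vcov(fit)`).

```python
import math

def snp_effect_at(exposure, beta_s, beta_i, var_s, var_i, cov_si):
    """Log-odds slope of the SNP at a given exposure value, with SE and Wald z.

    slope      = beta_s + exposure * beta_i
    Var(slope) = Var(beta_s) + exposure^2 * Var(beta_i)
                 + 2 * exposure * Cov(beta_s, beta_i)
    """
    slope = beta_s + exposure * beta_i
    variance = var_s + exposure ** 2 * var_i + 2 * exposure * cov_si
    se = math.sqrt(variance)
    return slope, se, slope / se

# Hypothetical estimates, for illustration only
beta_s, beta_i = 0.60, -0.20
var_s, var_i, cov_si = 0.010, 0.002, -0.004

for e in (0, 2, 3):
    slope, se, z = snp_effect_at(e, beta_s, beta_i, var_s, var_i, cov_si)
    print(f"exposure={e}: slope={slope:.3f}, SE={se:.3f}, z={z:.2f}")
```

With these made-up numbers the SNP slope is large at exposure 0 and shrinks toward zero as the exposure increases, even though beta_s itself never changes. If 0 is not a meaningful exposure value in your data, centering the exposure (e.g. subtracting its mean) before fitting makes beta_s the SNP effect at the average exposure instead.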