Hi,
I'd like to calculate the increase in qualitative disease score per risk allele. I have a cohort of 1000 individuals and a single SNP, and for each individual I have a score from 0 to 100. It was recommended that I use a linear regression model.
Can anyone elaborate on why you might use a linear regression model for this? Are there any other models that would fit this task? Suggestions on R packages I could use are welcome.
Thanks
OK, I'm kind of new to this area, so I am coming in a little blind. I have a score that varies between 0 and 100 for 1000 people. For each of those, I have the genotype coded with GG (most common homozygote) as 0, GA as 1 and AA (least common homozygote) as 2. What I am gathering from online is that I need to break each genotype down into its own covariate in the model that I use, in order to get the differences between each group in terms of the score.

So, I figured that I would look into a multiple linear regression model in SPSS, in which there is, say, a column for the score and then one for each of the genotypes, the score being the dependent variable and the genotypes being the three explanatory variables.

Firstly, there will be a different number of values for each genotype, for example GG n=500, GA n=350 and AA n=150. If I were to set the score as my dependent variable (n=1000), how does that work, since the score wouldn't correspond across the row? This differs from a more standard model, where you would, say, have earnings as the dependent variable (n=1000) and education level (n=1000) and experience (n=1000) as your explanatory variables. Do you get what I mean?
I'll give an example using R, since I don't use SPSS. There are actually two ways to go about this:
The other way is to make this an additive model, so you can estimate a homozygous A interaction (i.e., whether there's a simple additive effect of the genotype or whether two copies of A produce a non-linear effect).
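As a rough sketch of what the two approaches can look like in base R: the data below are simulated, and the column names (score, n_alt, genotype, hom_AA) and effect sizes are assumptions made for illustration, not anything from the original cohort. Note that each individual occupies one row, so the score column and the genotype column both have 1000 entries, however many people carry each genotype.

```r
# Simulated data for illustration only; names and effect sizes are assumptions.
set.seed(1)
n <- 1000

# One row per individual: genotype coded as the number of A alleles
# (GG = 0, GA = 1, AA = 2), drawn with roughly the frequencies in the question.
n_alt <- sample(0:2, n, replace = TRUE, prob = c(0.50, 0.35, 0.15))
score <- 50 + 5 * n_alt + rnorm(n, sd = 10)

dat <- data.frame(
  score    = score,
  n_alt    = n_alt,
  genotype = factor(c("GG", "GA", "AA")[n_alt + 1],
                    levels = c("GG", "GA", "AA"))
)

# Way 1: genotype as a categorical factor. GG is the reference level, so the
# two coefficients are the mean score differences GA - GG and AA - GG.
fit_factor <- lm(score ~ genotype, data = dat)

# Way 2: additive model. The n_alt coefficient is the change in score
# per copy of the A allele.
fit_additive <- lm(score ~ n_alt, data = dat)

# Additive term plus an indicator for AA homozygotes. The hom_AA coefficient
# measures how far AA departs from a purely additive effect.
dat$hom_AA <- as.integer(dat$n_alt == 2)
fit_dev <- lm(score ~ n_alt + hom_AA, data = dat)

coef(summary(fit_additive))   # per-allele estimate, standard error, t, p
confint(fit_additive)         # 95% confidence interval for the per-allele effect
anova(fit_additive, fit_dev)  # does the AA term improve on the additive fit?
```

With only three genotype groups, the factor model and the additive-plus-AA model fit the same group means; they just parameterise the differences in different ways.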
I actually prefer R, so thank you very much for this. I believe that it is the additive model that I am looking for right now, since I would like to see what the interaction with the homozygous A is.
Would you be able to suggest the function/library that is used to do that kind of analysis?
Data is currently organised something like the following:
I'll play around with the R implementation and try to get that working, but I am new to it.
Running both lm and gam, I find that I am not getting the results that I need from the summary/anova output.
(per G allele IRR 0.89, 95% confidence interval [95% CI] 0.82, 0.97; PLR ? 0.002)
So, in my case, I should be seeing a percentage lower risk of the disease for AA compared with GA and GG.
So in the above case, GA individuals have an 11% lower risk than AA individuals.
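The 11% figure is just the quoted IRR turned into a percentage. A quick check in R, using only the numbers quoted above (the variable names are made up for illustration):

```r
irr <- 0.89            # per G allele IRR quoted above
ci  <- c(0.82, 0.97)   # quoted 95% CI

# Percentage reduction per G allele: (1 - IRR) * 100, about 11%
pct_lower <- (1 - irr) * 100

# The same conversion applied to the CI limits (order flips): about 3% to 18%
pct_lower_ci <- (1 - rev(ci)) * 100

# Assuming the per-allele effect is multiplicative, two G alleles compound:
irr_two <- irr^2       # about 0.79, i.e. roughly 21% lower for GG than AA

round(c(pct_lower, pct_lower_ci, irr_two), 2)
```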
Any idea on where I should look for that information?
Not off-hand, no.