Struggling with biostats and association study designs.
I initially wanted to do an association study comparing two populations to see which SNPs are significant. For example, I am looking at variation between centenarians (people who live to more than 100 years) and a control group. Should I be including age as a covariate? I am interested in detecting longevity variants or anything that suggests a difference from the control group. I do not think age is necessary: it is not a confounder, because it is not independent of the outcome (it is what defines case status). I also remember reading that adding unnecessary covariates can decrease the power of the study.
When do you guys start considering modeling with linear or logistic regression instead of GWAS? Is it when you have a dependent variable and a predictor variable you are interested in? Do you guys think I should have added covariates?
Or is it possible to instead change the case and control populations to groups that reflect covariate status? For example, if I were interested in centenarians with Alzheimer's compared to a control population with Alzheimer's, would a logistic regression with Alzheimer's status be more appropriate than running a GWAS on them?
Sorry for so many questions.
I think your reasoning is correct. Age should not be a covariate because it is dependent on your response variable. An alternative could be to fit a Poisson regression on age instead of a logistic regression, which would estimate how each SNP genotype changes the expected age (as a multiplicative effect, since the model works on the log scale) instead of using the categories centenarian/non-centenarian. Another approach is to model the logistic regression as the probability of reaching the maximum age (e.g. follow the example here: http://www.r-bloggers.com/generalised-linear-models-in-r/ )
Other factors that can be included as covariates, apart from the principal components (PCs), are: 1) the sequencing center; 2) the technology used for sequencing, if these differ between samples; and 3) the location where the samples were taken.
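For what it's worth, here is a minimal sketch of how a covariate such as the sequencing center enters a logistic regression alongside genotype. It is plain Python with a hand-rolled Newton-Raphson fit so that it runs without extra packages. The function names (`solve`, `fit_logistic`), the 0/1 coding of the center, and all the counts are made up for illustration and do not come from any real dataset. In practice you would use an established tool such as R's `glm` rather than this.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_logistic(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson.

    Each row of X must start with 1.0 for the intercept; y holds 0/1 outcomes.
    Returns the list of fitted coefficients (log odds ratios).
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))   # predicted probability of being a case
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical counts: (minor-allele count, sequencing center, cases, controls)
cells = [(0, 0, 10, 30), (1, 0, 12, 18), (2, 0, 6, 4),
         (0, 1, 20, 20), (1, 1, 18, 12), (2, 1, 8, 2)]
X, y = [], []
for genotype, center, n_cases, n_controls in cells:
    row = [1.0, float(genotype), float(center)]
    X += [row] * (n_cases + n_controls)
    y += [1] * n_cases + [0] * n_controls

beta = fit_logistic(X, y)
print("intercept, log-OR per allele, log-OR for center:",
      [round(b, 3) for b in beta])
```

Here `math.exp(beta[1])` would be the odds ratio per copy of the minor allele, adjusted for sequencing center. PCs or sampling location would be added as further columns of `X` in the same way.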
I really appreciate the links to further reading since I am currently teaching myself as well!
Regarding regression vs. association study, I think what I meant was doing a simple chi-squared or Fisher's exact test on the allele frequencies versus a linear/logistic regression. Sorry for the confusion. So the correct approach would be to do an association study using Fisher's exact test on my centenarians against controls. However, if I wish to add disease status into the analysis, would I use logistic regression with, for example, Alzheimer's as my outcome variable, genotype as my predictor, and centenarian status as my covariate?
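To make the allele-frequency comparison above concrete, here is a minimal sketch of a two-sided Fisher's exact test for a single SNP in plain Python. The function name `fisher_exact_two_sided` and the allele counts are hypothetical. The two-sided p-value uses one common convention: sum the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Add up every table that is no more likely than the observed one
    # (small tolerance guards against floating-point ties)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Hypothetical allele counts for one SNP (two alleles per person):
#                minor  major
# centenarians     30    170
# controls         15    185
p_value = fisher_exact_two_sided(30, 170, 15, 185)
print("two-sided p-value:", p_value)
```

In a genome-wide setting this test would be repeated for every SNP, so the resulting p-values would still need a multiple-testing correction.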
Thanks!