This is actually a generic question which does not have to be related to biology.
Given a numeric score "y" that looks normally distributed and a categorical variable "x" with two categories, 0 and 1, I want to test whether "y" differs between the two categories. I understand I can just do a t-test and get a p-value. But what if I want to use linear regression instead, e.g. glm(y ~ x)? I assumed the t-test and the glm would return the same p-value, but they didn't.
Thanks!
Could you post the code that you used, please? Also, could you state whether your experiment is balanced, that is, is there the same number of samples for category 0 as for category 1?
The code is simple:
d <- read.delim("mydata.txt")
d1 <- subset(d$score, d$category == 0)  # scores in category 0
d2 <- subset(d$score, d$category == 1)  # scores in category 1
# t.test
t.test(d1, d2, var.equal = TRUE)
# glm
summary(glm(score ~ category, data = d))
With var.equal=FALSE (Welch's test, the t.test default), t.test and glm gave different p-values. With var.equal=TRUE they gave the same p-value.
My experiment is not balanced and not paired.
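That pattern is what one would expect: a two-sample t-test with pooled variance is the same model as a linear regression on a 0/1 predictor, while var.equal=FALSE runs Welch's test, which does not assume equal variances. A minimal sketch on simulated data follows; the data frame, group sizes, and seed are made up for illustration and are not the poster's data.

```r
set.seed(1)
# Hypothetical unbalanced data: 30 samples in category 0, 50 in category 1
d <- data.frame(
  category = rep(c(0, 1), times = c(30, 50)),
  score    = c(rnorm(30, mean = 10, sd = 1), rnorm(50, mean = 10.5, sd = 2))
)

# Student's t-test (pooled variance) and Welch's t-test (R's default)
p_pooled <- t.test(score ~ category, data = d, var.equal = TRUE)$p.value
p_welch  <- t.test(score ~ category, data = d, var.equal = FALSE)$p.value

# Linear regression; glm() with its default gaussian family fits the same model
fit_lm  <- lm(score ~ category, data = d)
fit_glm <- glm(score ~ category, data = d)
p_lm  <- summary(fit_lm)$coefficients["category", "Pr(>|t|)"]
p_glm <- summary(fit_glm)$coefficients["category", "Pr(>|t|)"]

all.equal(p_pooled, p_lm)  # pooled t-test vs. lm
all.equal(p_lm, p_glm)     # lm vs. gaussian glm
c(pooled = p_pooled, welch = p_welch, regression = p_lm)
```

If equal variances are not a safe assumption, the Welch p-value is usually the one to report; the regression p-value inherits the equal-variance assumption of the pooled test.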
If the y variable takes only the values 0 and 1, it would be more appropriate to do a logistic regression, e.g. summary(glm(y ~ x, family='binomial')). This also gives you an odds ratio (the exponentiated coefficient), an estimate of how much a one-unit increase in x raises or lowers the odds of getting y==1.
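As a rough illustration of the logistic case, the sketch below fits the model on simulated 0/1 data and recovers the odds ratio by exponentiating the coefficient. The variables and probabilities are invented for the example. With a numeric 0/1 outcome, glm models the odds of y == 1.

```r
set.seed(2)
# Hypothetical data: binary outcome y, binary predictor x
x <- rbinom(200, size = 1, prob = 0.5)
y <- rbinom(200, size = 1, prob = ifelse(x == 1, 0.7, 0.4))

fit <- glm(y ~ x, family = binomial)

# Coefficients are on the log-odds scale; exponentiate to get the odds ratio
odds_ratio <- exp(coef(fit)["x"])

# Cross-check against the odds ratio computed from the 2x2 table
tab <- table(x, y)
or_table <- (tab["1", "1"] / tab["1", "0"]) / (tab["0", "1"] / tab["0", "0"])

odds_ratio
or_table
```

With a single binary predictor the two numbers agree up to the fitting tolerance, which is a quick way to check which direction the odds ratio is pointing.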
In general I think regression has two advantages over a t-test: 1) you get an effect estimate in addition to a p-value (an odds ratio for logistic regression, the difference in group means for linear regression), and 2) you can easily add more factors if there are other variables.
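To make the second point concrete, here is a small sketch of adding a covariate to the linear model. The variable "age" and all the numbers are hypothetical, chosen only to show the syntax.

```r
set.seed(3)
# Hypothetical data: score depends on category and on a second variable, age
n <- 100
d <- data.frame(
  category = rbinom(n, size = 1, prob = 0.4),
  age      = round(runif(n, min = 20, max = 70))
)
d$score <- 10 + 0.5 * d$category + 0.05 * d$age + rnorm(n)

# Category effect alone, then adjusted for age
fit1 <- lm(score ~ category, data = d)
fit2 <- lm(score ~ category + age, data = d)

# Unadjusted vs. adjusted difference between categories
c(unadjusted = unname(coef(fit1)["category"]),
  adjusted   = unname(coef(fit2)["category"]))

# Estimate, standard error, t value, and p-value for each term
summary(fit2)$coefficients
# Confidence interval for the adjusted difference
confint(fit2)["category", ]
```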
Yeah, good points. thanks.