I ran a survival analysis in two ways. First, I dichotomised the survival data into two groups:
- Dead from cancer within one year.
- Alive more than four years, with no sign of relapse.
Then, I used limma to build a linear model and find differentially expressed probes.
Secondly, I used Cox proportional hazards regression. I fitted one model per probe.
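library(survival)
# survivalData is assumed to be a Surv object; one Cox model is fitted per row (probe) of expression
hazardModels <- lapply(seq_len(nrow(expression)), function(probe) coxph(survivalData ~ expression[probe, ]))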
I plotted the coefficients from the two methods against each other to check concordance. Most probes have similar coefficients, and the scatterplot is roughly linear. I then plotted the raw expression values for some of the probes on which the two methods disagree. Surprisingly, the Cox proportional hazards model detects many probes that are highly expressed but show no expression difference between the two groups. These probes are not detected by the first method. What is the statistical explanation for this?
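For readers who want to reproduce this kind of comparison, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the toy `expression` matrix, the `time` and `event` vectors, and the use of a plain difference in group means as a stand-in for the limma coefficient (limma itself is not called here). When reading the result, keep in mind that the two coefficients are on different scales (a difference in mean expression versus a log hazard ratio per unit of expression) and that the Cox fit uses every sample, while the dichotomised comparison drops everyone outside the two groups.

```r
library(survival)

set.seed(1)
# Hypothetical toy data standing in for the real expression matrix and follow-up.
nProbes <- 50
nSamples <- 60
expression <- matrix(rnorm(nProbes * nSamples, mean = 8), nrow = nProbes,
                     dimnames = list(paste0("probe", seq_len(nProbes)), NULL))
time  <- rexp(nSamples, rate = 0.3)   # follow-up in years (assumed unit)
event <- rbinom(nSamples, 1, 0.7)     # 1 = died of cancer, 0 = censored
survivalData <- Surv(time, event)

# Dichotomised groups as in the question; relapse status is not simulated here.
poor <- event == 1 & time < 1         # dead from cancer within one year
good <- time > 4                      # still alive after four years

# Stand-in for the two-group coefficient: difference in mean expression (poor minus good).
groupDiff <- rowMeans(expression[, poor, drop = FALSE]) -
  rowMeans(expression[, good, drop = FALSE])

# Cox log hazard ratio per probe, fitted on all samples.
coxCoef <- sapply(seq_len(nProbes), function(probe) {
  x <- expression[probe, ]
  unname(coef(coxph(survivalData ~ x)))
})

comparison <- data.frame(probe = rownames(expression), groupDiff, coxCoef)
samplesUsed <- c(dichotomised = sum(poor | good), cox = nSamples)

head(comparison)
samplesUsed
```

Calling `plot(comparison$groupDiff, comparison$coxCoef)` on the result gives the concordance scatterplot, and `samplesUsed` shows how many samples each approach actually draws on.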
It'd be rather helpful to see an example.
Do you get similar results for different probes from the same gene?
Are there outliers in expression[probe, ] that may adversely affect the survival analysis?
Have you log-transformed (or similarly transformed) expression[probe, ], and do the histograms of the transformed values look approximately normal?
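The outlier and transformation checks suggested above could be sketched roughly as follows for a single probe. This is only an illustration on simulated data: `rawProbe`, `time`, and `event` are hypothetical, one extreme value is planted deliberately, and the 3 x MAD cut-off is an arbitrary choice rather than a recommended threshold.

```r
library(survival)

set.seed(2)
nSamples <- 60
# Hypothetical raw-scale intensities for one probe, with one extreme value planted.
rawProbe <- 2^rnorm(nSamples, mean = 8, sd = 1)
rawProbe[1] <- 2^16
time  <- rexp(nSamples, rate = 0.3)   # follow-up in years (assumed unit)
event <- rbinom(nSamples, 1, 0.7)     # 1 = died of cancer, 0 = censored

probeData <- data.frame(time, event, logProbe = log2(rawProbe))

# Flag values more than 3 robust standard deviations (MAD-based) from the median.
isOutlier <- abs(probeData$logProbe - median(probeData$logProbe)) >
  3 * mad(probeData$logProbe)

# Refit the Cox model with and without the flagged samples.
fitAll  <- coxph(Surv(time, event) ~ logProbe, data = probeData)
fitTrim <- coxph(Surv(time, event) ~ logProbe, data = probeData[!isOutlier, ])

coefComparison <- c(all = unname(coef(fitAll)), trimmed = unname(coef(fitTrim)))
coefComparison
```

If the two coefficients differ substantially, a handful of extreme samples may be driving the Cox result for that probe. `hist(probeData$logProbe)` gives the histogram asked about above.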