The way I understand Cox regression, it assumes that the hazards for the groups being compared are proportional over time, and as such the curves should not cross on a plot.
I have an experiment looking at the effect of low or high expression levels of gene 1 and gene 2 on the survival of cancer patients, using Cox regression.
I am using low expression of gene 1 and gene 2 as my reference level (red curve on plot below) to compare all the other curves against.
I make my plot:
There is extensive crossover between the red and blue curves, which makes me worry that I am breaking the proportional hazards assumption when comparing those curves with Cox regression :(
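For reference, here is a minimal sketch of how curves like these could be drawn and inspected. The data are simulated stand-ins (the real all_clinical_df is not shown), so the hazards, the censoring scheme and the name sim_df are all assumptions; only the column names and group labels are taken from the post. The second plot uses the complementary log-log scale, on which the curves would be roughly parallel if the hazards were proportional.

```r
library(survival)

# Hypothetical stand-in data: simulated, NOT the real all_clinical_df
set.seed(1)
n <- 200
groups <- c("low_gene1_low_gene2", "high_gene1_low_gene2",
            "low_gene1_high_gene2", "high_gene1_high_gene2")
sim_df <- data.frame(
  event_rna = factor(sample(groups, n, replace = TRUE), levels = groups)
)

# Assumed hazards: higher when gene 1 is high (illustration only)
rate <- ifelse(grepl("^high_gene1", sim_df$event_rna), 0.03, 0.01)
event_time  <- rexp(n, rate)
censor_time <- runif(n, 0, 150)          # assumed uniform censoring
sim_df$new_death   <- pmin(event_time, censor_time)
sim_df$death_event <- as.integer(event_time <= censor_time)

# Kaplan-Meier curves, one per group; the reference group is drawn in red
km <- survfit(Surv(new_death, death_event) ~ event_rna, data = sim_df)
cols <- c("red", "blue", "darkgreen", "purple")
plot(km, col = cols, xlab = "Time", ylab = "Survival probability")
legend("topright", legend = groups, col = cols, lty = 1, cex = 0.7)

# Complementary log-log plot: roughly parallel curves are consistent
# with proportional hazards
plot(km, col = cols, fun = "cloglog",
     xlab = "Time (log scale)", ylab = "log(-log(S(t)))")
```

Crossing of Kaplan-Meier curves in the early or late tails, where few patients are at risk, can be noise rather than a real departure from proportionality, which is one reason to look at the log-log version as well.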
I run the cox.zph function, which I understand to be a statistical test for proportional hazards. None of the p values for my groups are <0.05, which makes me think I am not breaking the assumption, regardless of the visual crossover on the plot.
This is my general code approach:
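library(survival)
# Fit the Cox model; low_gene1_low_gene2 is the reference level of event_rna
res.cox <- coxph(Surv(new_death, death_event) ~ event_rna, data = all_clinical_df)
res.cox.extended <- summary(res.cox)
# Test the proportional hazards assumption (based on scaled Schoenfeld residuals)
test_proportional_hazard <- cox.zph(res.cox)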
This is the output of cox.zph:
                                     rho   chisq      p
event_rna high_gene1_low_gene2   -0.1651  1.5135  0.219
event_rna low_gene1_high_gene2   -0.0422  0.0981  0.754
event_rna high_gene1_high_gene2  -0.1251  0.8244  0.364
GLOBAL                                NA  1.6660  0.645
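As a complement to the printed test, the cox.zph result can also be inspected programmatically and plotted. The sketch below again uses simulated stand-in data (sim_df and its hazards are assumptions, not the real data), and compares the p values against the conventional 0.05 threshold. Note that the columns printed by cox.zph depend on the version of the survival package: the rho column shown above comes from older versions, while newer ones report a df column instead, so the code only relies on the p column.

```r
library(survival)

# Hypothetical stand-in data: simulated, NOT the real all_clinical_df
set.seed(42)
n <- 170
groups <- c("low_gene1_low_gene2", "high_gene1_low_gene2",
            "low_gene1_high_gene2", "high_gene1_high_gene2")
sim_df <- data.frame(
  event_rna = factor(sample(groups, n, replace = TRUE), levels = groups)
)
hazard  <- ifelse(grepl("^high_gene1", sim_df$event_rna), 0.03, 0.01)
t_event <- rexp(n, hazard)
t_cens  <- runif(n, 0, 100)              # assumed uniform censoring
sim_df$new_death   <- pmin(t_event, t_cens)
sim_df$death_event <- as.integer(t_event <= t_cens)

# Same model form as in the post, fitted to the simulated data
fit <- coxph(Surv(new_death, death_event) ~ event_rna, data = sim_df)
zp  <- cox.zph(fit)
print(zp)

# Flag any row (including GLOBAL) with evidence against proportional hazards
p_values <- zp$table[, "p"]
print(p_values < 0.05)

# Smoothed scaled Schoenfeld residuals against time; a roughly flat
# trend is what proportional hazards would look like
plot(zp)
```

A non-significant test does not prove that the hazards are proportional, especially with a modest number of events, so the residual plot is worth reading alongside the p values.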
This is the summary of the coxph output:
Call:
coxph(formula = Surv(new_death, death_event) ~ event_rna, data = all_clinical_df)

  n= 170, number of events= 56

                                     coef exp(coef) se(coef)     z Pr(>|z|)
event_rna high_gene1_low_gene2   1.217946  3.380239 0.440276 2.766  0.00567 **
event_rna low_gene1_high_gene2   0.008347  1.008382 0.571929 0.015  0.98836
event_rna high_gene1_high_gene2  1.237366  3.446522 0.404320 3.060  0.00221 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                                exp(coef) exp(-coef) lower .95 upper .95
event_rna high_gene1_low_gene2      3.380     0.2958    1.4262     8.011
event_rna low_gene1_high_gene2      1.008     0.9917    0.3287     3.094
event_rna high_gene1_high_gene2     3.447     0.2901    1.5604     7.613

Concordance= 0.648  (se = 0.035 )
Likelihood ratio test= 17.57  on 3 df,   p=5e-04
Wald test            = 14.8  on 3 df,   p=0.002
Score (logrank) test = 16.66  on 3 df,   p=8e-04
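To make the link between the two tables in the summary explicit: the hazard ratio is exp(coef), and the 95% limits are exp(coef ± z × se(coef)) with z the 0.975 normal quantile. A small sketch reproducing that arithmetic from the coefficients and standard errors printed above (the object names coefs, ses and hr_table are just illustrative):

```r
# Coefficients and standard errors copied from the coxph summary above
coefs <- c(high_gene1_low_gene2  = 1.217946,
           low_gene1_high_gene2  = 0.008347,
           high_gene1_high_gene2 = 1.237366)
ses   <- c(0.440276, 0.571929, 0.404320)

z <- qnorm(0.975)                      # two-sided 95% normal quantile

# Hazard ratios relative to the low_gene1_low_gene2 reference group,
# with Wald confidence limits computed on the log scale
hr_table <- data.frame(
  hazard_ratio = exp(coefs),
  lower_95     = exp(coefs - z * ses),
  upper_95     = exp(coefs + z * ses)
)
print(round(hr_table, 3))
```

With a fitted model object, exp(coef(res.cox)) and exp(confint(res.cox)) should give the same quantities directly.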
I have been on this for a while and would be extremely grateful for any suggestions.