Hi, I want to perform survival analysis on a TCGA dataset using the “survival” package in R. For each gene, the model is “coxph(Surv(time,censor) ~ exprs)”, where time is the survival time (for dead patients) or the last follow-up time (for alive patients), censor is the vital status of each cancer sample (alive=0 and dead=1), and exprs is the gene expression value. I have about 1000 genes, so I fit the model 1000 times.
I also tried almost the same model, changing only the coding of censor from “alive=0 and dead=1” to “alive=1 and dead=0”. The p-values change a lot. The number of significant genes is almost the same, but the overlap of significant genes between the two codings is quite small (~30%).
From my understanding, the coding of alive or dead should not affect anything. So why does it affect the result?
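For reference, the per-gene screen described above might look roughly like the sketch below. Everything in it is simulated for illustration: the matrix `expr_mat`, the gene names, and the sample and gene counts are hypothetical and are not taken from TCGA. Only `coxph()` and `Surv()` come from the survival package.

```r
library(survival)

set.seed(42)
n_samples <- 100
n_genes   <- 20   # the question uses about 1000; kept small here

# Hypothetical expression matrix: rows are genes, columns are samples
expr_mat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

time   <- rexp(n_samples, rate = 0.1)   # simulated survival / last follow-up time
censor <- rbinom(n_samples, 1, 0.6)     # status: 1 = dead (event), 0 = alive (censored)

# Fit one univariate Cox model per gene and keep the Wald p-value for exprs
pvals <- apply(expr_mat, 1, function(exprs) {
  fit <- coxph(Surv(time, censor) ~ exprs)
  summary(fit)$coefficients["exprs", "Pr(>|z|)"]
})

head(sort(pvals))
```

The result is one p-value per row of `expr_mat`, which is what gets compared between the two codings in the question.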
Did you read the help pages for coxph and Surv to see exactly how the variables passed to these should be encoded? At the console, type ?coxph and ?Surv. I even gave an example here: Survival analysis with gene expression. Be aware that there can be a world of difference between a number encoded as numeric and one coded as a factor.
Thanks, Kevin. I did read the help page "?Surv". It recommends alive=0 and dead=1. I just want to know why. I am reading your post, thanks~~
Hey again. I do not really see your point of view... I mean, the survival status of the patient is critical to how the statistical calculations are performed. It is 'hard-coded' in the program to expect that alive=0 and dead=1. So, that is how you must encode them in your input data.
Thanks Kevin, you really helped me a lot.
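To illustrate why the coding matters: with a 0/1 status, Surv() treats 1 as the event and 0 as censored. Swapping the codes therefore makes coxph treat the last follow-up of alive patients as the event and every death as censored, which is a different model rather than a relabelling of the same one. Below is a minimal sketch on simulated data. The effect size, rates, and variable names are assumptions chosen for illustration.

```r
library(survival)

set.seed(1)
n <- 200
exprs <- rnorm(n)                                       # simulated expression of one gene

# Assumption for this sketch: hazard of death rises with expression,
# while length of follow-up is independent of expression
death_time  <- rexp(n, rate = 0.1 * exp(0.5 * exprs))
censor_time <- rexp(n, rate = 0.1)

time <- pmin(death_time, censor_time)                   # observed time
dead <- as.integer(death_time <= censor_time)           # 1 = dead (event), 0 = alive (censored)

fit_ok      <- coxph(Surv(time, dead) ~ exprs)          # models time to death
fit_flipped <- coxph(Surv(time, 1 - dead) ~ exprs)      # models time to last follow-up

summary(fit_ok)$coefficients
summary(fit_flipped)$coefficients
```

The two fits count different patients as events (`fit_ok$nevent` versus `fit_flipped$nevent`), so their coefficients and p-values have no reason to agree. That would fit the small overlap of significant genes reported in the question.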