I have a question regarding this C: Survival analysis of TCGA patients integrating gene expression (RNASeq) data
run survival analysis
s <- survfit(Surv(as.numeric(as.character(all_clin$new_death))[ind_clin],all_clin$death_event[ind_clin])~event_rna[ind_gene,ind_tum])
This can be calculated without any problem. However, I am wondering why one needs to create the event_rna
s1 <- tryCatch(survdiff(Surv(as.numeric(as.character(all_clin$new_death))[ind_clin],all_clin$death_event[ind_clin])~event_rna[ind_gene,ind_tum]), error = function(e) return(NA))
This gives me an empty list. What does it mean? why should it be empty?
# extraect the p.value
pv <- ifelse ( is.na(s1),next,(round(1 - pchisq(s1$chisq, length(s1$n) - 1),3)))[[1]]
I get this error for this command which is because I don't have the s1. So would it not be OK ?
Error in ifelseis.na(s1), next, (round(1 - pchisq(s1$chisq, length(s1$n) - :
no loop for break/next, jumping to top level
.
# plot the data
plot(survfit(Surv(as.numeric(as.character(all_clin$new_death))[ind_clin],all_clin$death_event[ind_clin])~event_rna[ind_gene,ind_tum]),
col=c(1:3), frame=F, lwd=2,main=paste('KIRK',rownames(z_rna)[ind_gene],sep='\n'))
This gives me three graph. Red, Green and Black. I guess normally it should give black and red, right?
# add lines for the median survival
x1 <- ifelse ( is.na(as.numeric(summary(s)$table[,'median'][1])),'NA',as.numeric(summary(s)$table[,'median'][1]))
x2 <- as.numeric(summary(s)$table[,'median'][2])
if(x1 != 'NA' & x2 != 'NA'){
lines(c(0,x1),c(0.5,0.5),col='blue')
lines(c(x1,x1),c(0,0.5),col='black')
lines(c(x2,x2),c(0,0.5),col='red')
}
I get this error
Error in summary(s)$table[, "median"] : incorrect number of dimensions
Please use below for an example
how i do survival analysis I am trying to give the data in R so that one can explain how to use it
df<- structure(list(sample = structure(c(3L, 3L, 1L, 2L, 3L, 3L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Down",
"Up", "UP"), class = "factor"), new_time = c(42L, 52L, 99L, 41L,
20L, 6L, 4L, 11L, 47L, 29L, 4L, 8L, 10L, 28L, 16L, 18L, 32L,
26L), new_death = c(71L, 2L, 35L, 58L, 5L, 6L, 44L, 11L, 18L,
33L, 42L, 40L, 53L, 28L, 24L, 18L, 45L, 26L), death_event = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L,
0L)), .Names = c("sample", "new_time", "new_death", "death_event"
), class = "data.frame", row.names = c(NA, -18L))
@Kevin Blighe thank you for your amazing help. do you think that I need more information than the example data I provided? sometimes I see people make it so fancy in their paper. generally we should make a curve between the time and event. am I right? is the above plot statistically valid? can you please tell me your opinion the graph? it means that down regulation survive better :-D as an example if you look above post , you can see he involved a dummy variable into the model ~event_rna[ind_gene,ind_tum] , why such thing should be incorporated ? do you have any idea?
Hi, well, if you generate it as a curve, then a reviewer may just tell you to put it back as a step-wise / straight line plot, as I have it above.
I do not believe that you have to make it too fancy. The km.coxph.plot generates a curve based on the Cox Proportional Hazards and therefore generates a p-value and hazard ratio for your categories being plotted (however, you have to choose a reference level via the command that I used above, i.e.,
df$sample <- factor(gsub("UP", "Up", df$sample), levels=c("Down", "Up"))
). There are other p-values that can be quoted, such as that from the Wald test (not seen these too much in survival analyses) and the log rank.What I did in a recent manuscript was just added the hazard ratios, their confidence intervals, and the p-values as text to the plot using
mtext()
There are many tutorials online about survival analysis. I notice now that there's even an implementation for ggplot2.
Finally, I am not sure that the
km.coxph.plot
function calculates the no. at risk correctly. I recall using it in the past and re-writing the function to make it work.@Kevin Blighe is there a way to incorporate the new-time too ? or I must use only the death time? can you share your paper with me? I can at least refer to that
Let me check when I get home later today!
@Kevin Blighe ok thanks. There is another way to also do that dft<- Surv(time, event) then fit <- survfit(dft ~ 1) and plot(fit) . It always makes me wonder which way is best or how to interpret it based on different functions :-D
Yes, there is 1 billion different ways to do survival in R. You could try this:
To see the stats, just call the function like this. To do this properly, though, I would generate the p-values from the original and not the merged data object, i.e., separately for new time and new death. The first set of P values listed are from a Cox proportional hazards regression model.
To generate the plot:
@Kevin Blighe Thank you very much. What are these numbers in the bottom (No. At Risk? ) and what are these HR on the figures ? can you please explian ?
I would encourage you to read more about the number at risk part (via online tutorials). You can easily remove this by using
show.n.risk=TRUE
The 'HR' relates to 'Hazard Ratio', which is derived from the Cox Proportional Hazards model coefficients (see exp(coef), lower .95, and upper .95 in the stats output). In my example, I have not matched the numbers in the plot to the numbers in the output - this is just an example.
There is lots to survival analysis, as you can see! If you are struggling to stay on top of it all, then best to keep it very simple for now and use the more simple functions that you found.
@Kevin Blighe do you know why I am getting this error ?
Call: + survival::coxph(formula = formula.s, data = data.s, weights = weight.s) Error: object 'Call' not found
Looks strange. What is the exact command that you are running?
At the end of the day, most of these R survival functions are just wrappers for the basic functions, i.e., Surv and coxph.
@Kevin Blighe This is what I run exactly
I'm not sure that's an error - a bit strange, in fact. Nothing was output t terminal? If you are running on a cluster, have you got X11 set-up so that plot widows can be displayed?
Load these functions and then re-try: