Hello, I am trying to do a survival analysis with TCGA data following this tutorial: Survival analysis of TCGA patients integrating gene expression (RNASeq) data, but I want to integrate methylation data instead of RNA-seq data. What I am doing is selecting the gene of interest and averaging the beta values of its CpGs for each sample (column). Then I create a vector for that gene whose value is 1 for samples where the average beta value is >0.3 and 0 where it is <0.3. My survival functions are then as follows:
s <- survfit(Surv(as.numeric(as.character(all_clin$new_death))[ind_clin], all_clin$death_event[ind_clin]) ~ event_r[ind_clin])
s1 <- tryCatch(survdiff(Surv(as.numeric(as.character(all_clin$new_death))[ind_clin], all_clin$death_event[ind_clin]) ~ event_r[ind_clin]), error = function(e) return(NA))
where event_r is the vector of 1s and 0s for the gene of interest, matched to the corresponding clinical samples.
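For reference, the grouping step described above could be sketched roughly as follows. This is only a minimal sketch on made-up data: the matrix `beta` (CpG probes of one gene in rows, samples in columns) is hypothetical and stands in for the real TCGA methylation matrix, and samples whose average is exactly 0.3 are assumed to fall in group 0.

```r
# Hypothetical toy data: rows are CpG probes of one gene, columns are samples.
set.seed(1)
beta <- matrix(runif(5 * 8), nrow = 5,
               dimnames = list(paste0("cg", 1:5), paste0("sample", 1:8)))

# Mean beta value per sample (column), ignoring missing probe values.
avg_beta <- colMeans(beta, na.rm = TRUE)

# 1 = average beta above 0.3, 0 = otherwise.
# Assumption: an average of exactly 0.3 is assigned to group 0.
event_r <- as.integer(avg_beta > 0.3)
names(event_r) <- colnames(beta)

# Check how many samples end up in each group before running survfit().
table(event_r)
```

Looking at the group sizes first is useful because a very unbalanced split makes the log-rank test from survdiff unreliable.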
Do you think averaging the beta-values for the whole gene is a good approach to integrate methylation with survival? Any other suggestions? Thank you
I agree with @Kevin: you should fit a Cox model for each probe separately. This will help you identify potentially prognostic CpG sites.
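A per-probe Cox analysis might look something like the sketch below. It assumes the `survival` package is installed and uses simulated data: `beta`, `time` and `status` are hypothetical stand-ins for your methylation matrix, `new_death` and `death_event`. The beta value enters each model as a continuous covariate rather than a 0/1 split, and the `p.adjust` step at the end is my own addition to account for testing many probes.

```r
library(survival)

set.seed(42)
n_samples <- 60
n_probes <- 4

# Hypothetical data standing in for the real TCGA objects.
beta <- matrix(runif(n_probes * n_samples), nrow = n_probes,
               dimnames = list(paste0("cg", seq_len(n_probes)), NULL))
time <- rexp(n_samples, rate = 0.01)               # follow-up time, e.g. days
status <- rbinom(n_samples, size = 1, prob = 0.6)  # 1 = death, 0 = censored

# Fit one univariate Cox model for a single probe's beta values.
fit_probe <- function(b) {
  fit <- summary(coxph(Surv(time, status) ~ b))
  c(hazard_ratio = unname(fit$coefficients[1, "exp(coef)"]),
    p_value = unname(fit$coefficients[1, "Pr(>|z|)"]))
}

# Apply the model to every probe (row) and collect one result row per probe.
res <- as.data.frame(t(apply(beta, 1, fit_probe)))

# Benjamini-Hochberg adjustment across all tested probes.
res$fdr <- p.adjust(res$p_value, method = "BH")
res[order(res$p_value), ]
```

With real data you would restrict `beta` to the samples that have clinical information (the role `ind_clin` plays in the question) before fitting.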