Using a Cox model, we can estimate survival probabilities, but I want to predict a patient's survival in months using machine learning. Most of the TCGA datasets include overall survival months for patients. Do I still need to account for censoring when predicting survival months?
The censoring rate is very high (85%) and there are only around 700 patients, so I won't have many uncensored patients to train on. Another thing: if the TCGA follow-up data are incomplete, I think a model predicting overall survival months (a continuous outcome) should incorporate censoring. If I treat the data as complete, the whole analysis will be wrong. Do you agree with that, Kevin?
I would try it both with and without the censored patients. For many of the censored TCGA patients, it is impossible to know whether they eventually died, or how long after the study that happened. Check the other variable, Vital Status, too.
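A minimal sketch of that comparison, assuming each patient record carries overall survival months and a Vital Status of "Dead" or "Alive". The six records below are invented for illustration and are not TCGA data; `to_event` and `kaplan_meier` are hypothetical helper names. The sketch derives an event indicator from Vital Status and computes a Kaplan-Meier estimate twice: once with the censored patients kept as censored, and once with them dropped.

```python
# Hypothetical records: (overall survival months, vital status).
# Values are invented for illustration only.
patients = [
    (5.0, "Dead"),
    (8.0, "Alive"),
    (12.0, "Dead"),
    (20.0, "Alive"),
    (24.0, "Alive"),
    (30.0, "Dead"),
]


def to_event(vital_status):
    """Return 1 if death was observed, 0 if the patient is censored."""
    return 1 if vital_status.strip().lower() in ("dead", "deceased") else 0


def kaplan_meier(times, events):
    """Return [(t, S(t))] evaluated at each distinct observed death time."""
    curve = []
    surv = 1.0
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        # Everyone still under observation at time t, censored or not.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


times = [months for months, _ in patients]
events = [to_event(status) for _, status in patients]

# Estimate 1: censored patients stay in the risk set until last follow-up.
km_all = kaplan_meier(times, events)

# Estimate 2: censored patients are dropped entirely.
dead_times = [t for t, e in zip(times, events) if e == 1]
km_dead_only = kaplan_meier(dead_times, [1] * len(dead_times))

for (t, s_all), (_, s_dead) in zip(km_all, km_dead_only):
    print(f"t={t:>4}: with censored={s_all:.3f}  deaths only={s_dead:.3f}")
```

On this toy data the estimated probability of surviving past 12 months is 0.625 when the censored patients are kept and about 0.33 when they are dropped, which illustrates how discarding censored patients can pull survival estimates downward.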
Depending on the model or classifier that you are aiming to use, you may even be able to include the censored patients and encode them as censored.
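One possible encoding, sketched under assumptions: some survival learners (accelerated failure time style models, for example) take a lower and an upper bound on the survival time instead of a single value. For a patient with an observed death both bounds equal the observed months; for a right-censored patient the lower bound is the last follow-up and the upper bound is infinite. The function name `encode_interval` and the example values are hypothetical, and the exact target format depends on the library you end up using.

```python
import math


def encode_interval(months, event):
    """Encode one patient as (lower_bound, upper_bound) on survival months.

    event == 1: death observed, so survival time is known exactly.
    event == 0: right-censored, so survival time is at least `months`.
    """
    lower = float(months)
    upper = float(months) if event == 1 else math.inf
    return lower, upper


# Hypothetical cohort: (overall survival months, event indicator).
cohort = [(14.0, 1), (22.5, 0), (3.0, 1), (40.0, 0)]

targets = [encode_interval(months, event) for months, event in cohort]
for (months, event), (lower, upper) in zip(cohort, targets):
    print(f"months={months:>5} event={event} -> [{lower}, {upper}]")
```

With this encoding the censored patients still contribute information (the patient lived at least this long) without being treated as if they died at the last follow-up.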
Another idea: use the censored patients as a second validation cohort, to which you then apply your classifier.
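A rough sketch of what such a check could look like, assuming the model outputs predicted survival months. For a censored patient the true survival time is unknown, but it is at least as long as the recorded follow-up, so a prediction shorter than the follow-up is certainly too low. The function name `lower_bound_violations` and all numbers are hypothetical.

```python
def lower_bound_violations(predicted_months, followup_months):
    """Return (violation rate, list of violating (predicted, follow-up) pairs).

    A violation is a prediction shorter than the time the censored patient
    was already known to be alive.
    """
    pairs = list(zip(predicted_months, followup_months))
    violations = [(p, f) for p, f in pairs if p < f]
    return len(violations) / len(pairs), violations


# Hypothetical predictions for four censored patients.
predicted = [10.0, 30.0, 18.0, 50.0]
followup = [8.0, 36.0, 20.0, 24.0]

rate, bad = lower_bound_violations(predicted, followup)
print(f"violation rate: {rate:.2f}")
print("violating pairs:", bad)
```

This check is one-sided: it can show that predictions are too short, but a model that always predicts very long survival would pass it, so it complements rather than replaces validation on patients with observed deaths.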
Your intuition is as good as mine, akutasame.