Hello everyone,
I recently downloaded the TCGA-BRCA gene expression and phenotype data from UCSC Xena (https://xenabrowser.net/datapages/?cohort=TCGA%20Breast%20Cancer%20(BRCA)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443). While analyzing the data, I noticed there are three types of survival time: OS (Overall Survival), DFI (Disease-Free Interval), and PFI (Progression-Free Interval). For each type, some samples with a survival outcome of 1 (the event occurred; for OS, death) have longer survival times than samples with a survival outcome of 0 (for OS, indicating survival).
For instance, sample [TCGA-A1-A0SJ-01] has a survival outcome of 0 with an OS time of 416 days, whereas sample [TCGA-A1-A0SK-01] has a survival outcome of 1 with an OS time of 967 days. In other words, the sample marked as deceased has a longer recorded survival time than the sample marked as alive.
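To make the comparison concrete, here is a minimal sketch of the check I have in mind. It assumes the survival table has columns named sample, OS, and OS.time; those column names and the helper longer_lived_events are assumptions for illustration, and the two rows are just the samples quoted above rather than the full dataset.

```python
# Two rows mirroring the samples quoted above. The column names
# (sample, OS, OS.time) are assumed, not confirmed against the file.
rows = [
    {"sample": "TCGA-A1-A0SJ-01", "OS": 0, "OS.time": 416},
    {"sample": "TCGA-A1-A0SK-01", "OS": 1, "OS.time": 967},
]


def longer_lived_events(rows, event_col="OS", time_col="OS.time"):
    """Return (event_sample, non_event_sample) pairs in which the sample
    with outcome 1 has a longer recorded time than the sample with outcome 0.

    Hypothetical helper, written only to illustrate the observation.
    """
    events = [r for r in rows if r[event_col] == 1]
    non_events = [r for r in rows if r[event_col] == 0]
    return [
        (e["sample"], n["sample"])
        for e in events
        for n in non_events
        if e[time_col] > n[time_col]
    ]


pairs = longer_lived_events(rows)
print(pairs)  # [('TCGA-A1-A0SK-01', 'TCGA-A1-A0SJ-01')]
```

The same function could in principle be pointed at the DFI or PFI columns by changing event_col and time_col, assuming they follow the same naming pattern.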
I'm seeking insights on how to interpret these observations. Additionally, I am curious about the potential impact of these discrepancies on: (1) the accuracy of Kaplan-Meier survival curves, and (2) the development of machine learning models for predicting survival time and outcomes based on gene expression patterns.
Thank you.