Interpreting Varied Survival Times in TCGA-BRCA Data
12 weeks ago
Siqi • 0

Hello everyone,

I recently downloaded the TCGA-BRCA gene expression and phenotype data from UCSC Xena (https://xenabrowser.net/datapages/?cohort=TCGA%20Breast%20Cancer%20(BRCA)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443). While analyzing the data, I noticed there are 3 types of survival times: OS (Overall Survival), DFI (Disease-Free Interval), and PFI (Progression-Free Interval). Within each kind of survival time, there are instances where samples with a survival outcome of 1 (indicating death) have longer survival times than those where the survival outcome is 0 (indicating survival).

For instance, sample [TCGA-A1-A0SJ-01] has a survival outcome of 0 with an OS time of 416 days, whereas sample [TCGA-A1-A0SK-01], which shows a survival outcome of 1, has an OS time of 967 days. This suggests that the sample marked as deceased had a longer survival time compared to the sample marked as alive.

I'm seeking insights on how to interpret these observations. I am also curious about how these apparent discrepancies might affect: (1) the accuracy of Kaplan-Meier survival curves, and (2) the development of machine learning models that predict survival time and outcome from gene expression patterns.

Thank you.

Survival Kaplan-Meier TCGA analysis survival curves • 697 views

Sorry to interrupt your question, but may I ask where you found the information on which samples come from healthy patients and which come from cancer patients?

The TCGA barcode has sample type information: https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/
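
As a minimal sketch of how that could be read off in practice: the fourth dash-separated field of the barcode begins with a two-digit sample type code, where 01-09 are tumor types, 10-19 are normal types, and 20-29 are control samples. The function name `sample_type` below is hypothetical and only for illustration.

```python
def sample_type(barcode):
    """Classify a TCGA barcode by its two-digit sample type code.

    Hypothetical helper: the code is the first two characters of the
    fourth dash-separated field (e.g. "01" in TCGA-A1-A0SJ-01).
    """
    code = int(barcode.split("-")[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


# Sample ID from the question, plus a made-up "-11" ID for contrast.
print(sample_type("TCGA-A1-A0SJ-01"))
print(sample_type("TCGA-A1-A0SJ-11"))
```

Note that "normal" here describes the tissue type. As far as I know, TCGA normal samples are generally matched normal tissue or blood taken from cancer patients, not samples from healthy individuals.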

10 weeks ago
Zhenyu Zhang ★ 1.2k

Why did you download these data from Xena instead of going to GDC and grabbing the latest and most accurate TCGA survival data? Even this 2018 publication has more accurate clinical data than most third-party data sources: https://gdc.cancer.gov/about-data/publications/PanCan-Clinical-2018

10 weeks ago
wenbinm ▴ 40

I don't see why "samples with a survival outcome of 1" can't "have longer survival times than those where the survival outcome is 0". Samples marked 1 have an annotated event date, while samples marked 0 do not. The latter were censored: no event had been observed by their last follow-up (for OS, the patient was still alive at last contact). Survival analysis models are designed to handle this.

"Sample [TCGA-A1-A0SJ-01] has a survival outcome of 0 with an OS time of 416 days" means the patient was known to be event-free through day 416. What happened after that day is simply not recorded.

For the definitions of the different survival endpoints, you can refer to https://pmc.ncbi.nlm.nih.gov/articles/PMC6066282/
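
To make the censoring point concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not the implementation of any particular package. The first two rows are the samples from the question; the other three rows are made up purely for illustration, and the function name `kaplan_meier` is hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up time in days
    events: 1 = event observed, 0 = censored at that time
    """
    observed = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Only observed events lower the curve.
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        # Censored samples shrink the risk set without lowering the curve.
        at_risk -= leaving[t]
    return curve


# Rows 1-2: samples from the question. Rows 3-5: made-up illustration data.
times = [416, 967, 300, 1200, 1500]
events = [0, 1, 1, 0, 1]

for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

The sample censored at day 416 still counts in the denominator at day 300, then drops out of the risk set before the event at day 967. That is how a censored record with a short follow-up and an event record with a long one coexist without biasing the curve, under the usual assumption that censoring is unrelated to the outcome.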
