Question

TCGA clinical data: stage of tumor at time biopsy was taken

5

8.7 years ago

thomaskuilman ▴ 850

I have recently discovered a potential biomarker and would like to validate its prognostic value in the TCGA dataset on late-stage melanoma. I realized that one can make survival curves from the days_to_last_followup and days_to_death fields, but the problem is that those survival data do not fully correspond to the associated sequencing data. For instance, a patient recorded as stage I melanoma can have a submitted_tumor_site of "Regional Lymph Node", which is incompatible with stage I. In other words, the staging was done at the time of the original (earliest) diagnosis, whereas the submitted sample came from a relapsing tumor at a later date (most likely at a higher stage). If I were to apply my biomarker to this set, the above-mentioned sample would in my opinion be mis-staged: the sequenced tumor has stage III/IV characteristics but is recorded as stage I.
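
For reference, the naive approach described above (survival time taken straight from days_to_death and days_to_last_followup) could be sketched roughly as follows in Python. This is only an illustration: the helper names parse_days and survival_record are made up here, and it assumes each patient is a dict of strings in which missing values appear as bracketed placeholders such as "[Not Available]".

```python
def parse_days(value):
    """Return the value as an int number of days, or None if it is not a plain integer.

    Assumption: missing values are non-numeric placeholders such as '[Not Available]'.
    """
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def survival_record(row):
    """Map one clinical row to (time_in_days, event_observed).

    Returns None when neither days_to_death nor days_to_last_followup is usable.
    """
    death = parse_days(row.get("days_to_death"))
    if death is not None:
        return death, True  # death observed: not censored
    followup = parse_days(row.get("days_to_last_followup"))
    if followup is not None:
        return followup, False  # alive at last follow-up: censored
    return None
```

Both times here are counted from the initial diagnosis, which is exactly why they do not line up with a biopsy taken later.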

An alternative approach would be to select samples based on the site of the submitted biopsy (for instance including only tumors that have spread into the regional lymph node), taking into account the fact that the biopsy was taken a number of days after the earliest diagnosis (the days_to_submitted_specimen_dx would provide me with that number). The problem with this is that (again) the staging should be taken into account, as staging obviously is a major determinant of outcome. Therefore, my question is whether the stage at the time the submitted biopsy was taken is available, and if so where I can find that (I have checked https://tcga-data.nci.nih.gov/docs/dictionary/ but did not find it there). If not, could anyone suggest to me what would be a fair alternative for coupling sequencing data to survival?

Thanks!

P.S. Sorry for being verbose, but I found that survival- and staging-related questions about the TCGA database are underrepresented, and other Biostarrers might benefit from a slightly longer version of this post.

TCGA clinical data staging survival Kaplan–Meier • 7.1k views

0

I don't have any great suggestions here. TCGA tumors were cobbled together from whoever was willing to provide samples, and the clinical data is generally pretty lacking.

ADD REPLY • link 8.7 years ago by Chris Miller 22k

0

Thanks for your answer; I have sent an email to TCGA to check whether I am missing something, but I fear that this kind of data is indeed not available. If anything comes of that, I will update the post.

ADD REPLY • link 8.7 years ago by thomaskuilman ▴ 850

0

I would try looking directly at the pathology report PDFs, which should contain data based on the biopsy.

ADD REPLY • link 8.7 years ago by minio.cz ▴ 10

0

That is indeed something I had not thought of yet, but it seems feasible only for cohorts of up to medium size.

ADD REPLY • link 8.7 years ago by thomaskuilman ▴ 850

0

Hi, for others who may also need to figure out this information: explanations of these terms can be found in the Clinical and Biospecimen section of the GDC documentation viewer.

ADD REPLY • link 7.7 years ago by solo7773 ▴ 90

score 5 · Accepted Answer · 2016-03-14

I have submitted my question to the TCGA, and I am pasting their entire answer below:

We may not have the exact time interval with corresponding staging as you request, but below is an explanation of each clinical variable. I hope that you can use this information for your analysis:

The only overall stage that TCGA collected for SKCM is the "ajcc_pathologic_tumor_stage" in the clinical_patient_skcm.txt file. As you indicated, this reflects the stage at initial pathologic diagnosis and this diagnosis is not necessarily the event that yielded the biospecimen sent to the BCR. Unfortunately, TCGA did not collect the stage specifically at the time that the specimen sent to the BCR was obtained.

The "days_to_initial_pathologic_diagnosis" indicates the date of initial melanoma diagnosis. The "submitted_tumor_dx_days_to" indicates the date of diagnosis for the sample submitted to the BCR (actually days from the initial melanoma diagnosis).

There is also a "days_to_sample_procurement" in the nationwidechildrens.org_ssf_tumor_samples_skcm.txt file. This indicates the days to cancer sample procurement for the sample submitted to the BCR for TCGA in relation to the date of initial melanoma diagnosis.

If you filter "days_to_sample_procurement" for 0 (or within a number of days) and use primary tumor (submitted_tumor_site) samples, the "ajcc_pathologic_tumor_stage" should reflect the stage at the time the submitted biopsy was taken.

Indeed, as suggested by the TCGA, days_to_sample_procurement is the more accurate field for defining the date on which the tumor sample was obtained (rather than the days_to_submitted_specimen_dx I mentioned in my original post).
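
The filter that the TCGA reply describes could look roughly like this in Python. It is a sketch under several assumptions: the clinical and sample files have already been joined into one dict of strings per patient, the label for primary tumors in submitted_tumor_site is "Primary Tumor", and the 30-day window is an arbitrary choice. The function names are hypothetical.

```python
def parse_days(value):
    """Return the value as an int number of days, or None if it is not a plain integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def stage_matches_biopsy(row, max_days=30, primary_label="Primary Tumor"):
    """True if ajcc_pathologic_tumor_stage can be taken as the stage of the submitted sample.

    Requires a primary tumor sample procured within max_days of the initial diagnosis.
    Assumptions: primary_label matches the file's wording; max_days=30 is arbitrary.
    """
    days = parse_days(row.get("days_to_sample_procurement"))
    if days is None:
        return False  # procurement date unknown: cannot tell
    return abs(days) <= max_days and row.get("submitted_tumor_site") == primary_label
```

Applied to merged rows, `[r for r in rows if stage_matches_biopsy(r)]` would keep only the patients whose recorded stage plausibly describes the sequenced sample.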

Without wanting to dive into the pathology reports (yet), I see a number of possibilities:

1. Filter based on the site of the biopsy. For instance, if submitted_tumor_site is "Distant Metastasis", the sample is by definition from a stage IV tumor; if it is "Regional Lymph Node", it should be stage III or stage IV. In this case, the number of days to use for the survival curves is last_contact_days_to - days_to_sample_procurement (censored) or death_days_to - days_to_sample_procurement (not censored).
2. Filter for days_to_sample_procurement around 0 days. As suggested in the reply by the TCGA team, the stage in ajcc_pathologic_tumor_stage should then reflect the stage at the time the biopsy was taken. The above-mentioned formulas for calculating days for the survival curves can be used here too.
3. Ignore the mis-staging of samples (not my favourite option!).
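
The survival-time formulas from the first two options can be sketched in Python as below, together with a bare-bones Kaplan-Meier (product-limit) estimate. This is an illustration, not a validated pipeline: the function names are hypothetical, each patient is assumed to be a dict of strings with the sample and follow-up fields already merged, and rows with missing values or with follow-up ending before procurement are simply dropped.

```python
def parse_days(value):
    """Return the value as an int number of days, or None if it is not a plain integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def time_from_procurement(row):
    """Return (days since sample procurement, event_observed), or None if unusable."""
    start = parse_days(row.get("days_to_sample_procurement"))
    if start is None:
        return None
    death = parse_days(row.get("death_days_to"))
    if death is not None:
        end, event = death, True  # death observed: not censored
    else:
        end, event = parse_days(row.get("last_contact_days_to")), False  # censored
    if end is None or end < start:
        return None  # missing or inconsistent dates
    return end - start, event


def kaplan_meier(records):
    """Product-limit estimate from (time, event) pairs.

    Returns a list of (time, survival_probability), one entry per distinct event time.
    """
    records = sorted(records)
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = leaving = 0
        # Group all patients sharing the same time point.
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

For option 1 one would first keep only rows with the desired submitted_tumor_site, then pass `[rec for rec in map(time_from_procurement, rows) if rec is not None]` to `kaplan_meier`, once per biomarker group.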