Question

TCGAbiolinks TCGA-BRCA RNA-seq clinical data

3

6.5 years ago

Matina ▴ 250

Hi all,

I have downloaded the TCGA-BRCA RNA-seq data and the associated clinical information using the code below.

library(TCGAbiolinks)

CancerProject <- "TCGA-BRCA"

# Query all HTSeq count files for the project
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

# Barcodes of every sample returned by the query
samplesDown <- getResults(query, cols = c("cases"))

# Keep primary solid tumor (TP) samples only
dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")

# Repeat the query, restricted to the TP barcodes
queryDown <- GDCquery(project = CancerProject,
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - Counts",
                      barcode = dataSmTP)

GDCdownload(query = queryDown, directory = "BRC_RESULTS/TCGA/htseq_data/")

# Build a SummarizedExperiment; clinical columns end up in its colData
dataPrep <- GDCprepare(query = queryDown,
                       save = TRUE,
                       directory = "BRC_RESULTS/TCGA/htseq_data/",
                       save.filename = "htseq_counts.rda",
                       summarizedExperiment = TRUE)

In the clinical data there are several columns such as days_to_death or days_to_last_follow_up and other columns such as subtype_OS.Time or subtype_OS.event.

What is the difference between the columns that have subtype_ at the beginning and the rest, and which ones should I use for survival analysis? At the moment I have used the subtype_ columns for my analysis and I am wondering if this is correct.

Thanks a lot,

Matina

TCGA BRCA TCGAbiolinks RNA-Seq • 4.6k views

ADD COMMENT • link updated 6.5 years ago by igor 13k • written 6.5 years ago by Matina ▴ 250

2

Dear Matina,

What is your purpose with the RNA-Seq data? DE analysis? Inspecting the expression of specific genes, for example? Or looking for molecular subtype patterns and survival analysis? I think you already got an answer from one of the creators of the R package on the GitHub account, correct?

https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues/227

Best,

Efstathios

ADD REPLY • link 6.5 years ago by svlachavas ▴ 790

0

Hi Efstathios,

I have a set of genes that I am interested in and I want to see if they are associated with clinical outcomes and molecular subtype patterns. You are right, I got an answer in the GitHub account.

Thanks a lot for your answer! Matina

ADD REPLY • link 6.5 years ago by Matina ▴ 250

score 4 · Answer 1 · 2018-06-12

4

6.5 years ago

atakanekiz ▴ 310

Hi Matina,

I would go with days_to_death (for deceased patients) and days_to_last_follow_up (for patients still alive) for survival analyses. I think the columns that start with subtype_ might be manually curated data. I'm not 100% sure, but subtype_OS.Time sounds like the time period during which the tumor was classified as a certain subtype (primary, metastatic, stage i/ii/iii, etc.). I think days_to_death is a more straightforward data type.
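To illustrate that suggestion, here is a minimal sketch of how an overall-survival time and event indicator could be derived from those two columns. The toy data frame, its values, and the names clin, os_time and os_event are assumptions made for illustration; with real data the columns would come from colData(dataPrep), and the exact vital_status labels may differ between GDC releases.

```r
# Toy clinical table; the values are made up for illustration only.
clin <- data.frame(
  patient = c("P1", "P2", "P3", "P4"),
  vital_status = c("Dead", "Alive", "Alive", "Dead"),
  days_to_death = c(420, NA, NA, 1310),
  days_to_last_follow_up = c(NA, 950, 2100, NA),
  stringsAsFactors = FALSE
)

# Event indicator: 1 = death observed, 0 = censored at last follow-up.
# tolower() guards against "Dead" vs "dead" spelling differences.
clin$os_event <- as.integer(tolower(clin$vital_status) == "dead")

# Follow-up time: days_to_death for deceased patients,
# days_to_last_follow_up for patients still alive.
clin$os_time <- ifelse(clin$os_event == 1,
                       clin$days_to_death,
                       clin$days_to_last_follow_up)

# Keep only rows with a usable, positive follow-up time.
clin_os <- clin[!is.na(clin$os_time) & clin$os_time > 0, ]
```

The resulting columns can then be passed to survival::Surv(clin_os$os_time, clin_os$os_event) for a Kaplan-Meier or Cox model.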

Atakan

ADD COMMENT • link 6.5 years ago by atakanekiz ▴ 310

0

Hi Atakan,

This is correct. I got an answer from one of the developers of TCGAbiolinks on GitHub saying that everything that starts with subtype_ is actually metadata from the papers that analyzed the samples, and they suggested using days_to_death. In any case, what is strange is that the subtype_ column for OS has clinical info for patients whose days_to_last_follow_up is missing, or the two columns report a completely different number of days.
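One way to quantify that discrepancy is to tabulate, per patient, whether the two sources agree. The sketch below uses a made-up data frame with the column names from this thread. It assumes both columns are expressed in days, which should be checked against the source paper, and os_source_check is a hypothetical helper column.

```r
# Hypothetical values; only the column names come from the thread.
clin <- data.frame(
  patient = c("P1", "P2", "P3", "P4"),
  days_to_death = c(420, NA, NA, NA),
  days_to_last_follow_up = c(NA, 950, NA, 300),
  subtype_OS.Time = c(420, 950, 1800, 2200),
  stringsAsFactors = FALSE
)

# Follow-up time derived from the GDC clinical fields.
gdc_time <- ifelse(is.na(clin$days_to_death),
                   clin$days_to_last_follow_up,
                   clin$days_to_death)

# Label each patient by how the two sources relate.
clin$os_source_check <- ifelse(
  is.na(gdc_time), "missing in GDC fields",
  ifelse(is.na(clin$subtype_OS.Time), "missing in subtype_ column",
         ifelse(gdc_time == clin$subtype_OS.Time, "agree", "differ")))

# Summary of how many patients fall into each category.
print(table(clin$os_source_check))
```

Patients labelled "differ" or "missing in GDC fields" are the ones where the choice of column changes the survival analysis.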

Thanks again, Matina

ADD REPLY • link 6.5 years ago by Matina ▴ 250

zx8754 · Answer 2 · 2018-06-12

3

6.5 years ago

igor 13k

You could also consider using the Pan-Cancer Atlas curated survival data from Xena:

Survival_SupplementalTable_S1_20171025_xena_sp
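As a rough sketch of how such a curated table could be attached to the expression samples: TCGA patient IDs are the first 12 characters of a sample barcode, so the two tables can be joined on that key. Everything below is illustrative. The barcodes and values are made up, and the column names patient, OS and OS.time are assumptions about the table's layout that should be checked against the downloaded file.

```r
# Hypothetical sample barcodes standing in for colnames(dataPrep).
expr_samples <- data.frame(
  barcode = c("TCGA-XX-0001-01A-11R-0000-07",
              "TCGA-XX-0002-01A-11R-0000-07",
              "TCGA-XX-0003-01A-11R-0000-07"),
  stringsAsFactors = FALSE
)

# The patient-level ID is the first 12 characters of a TCGA barcode.
expr_samples$patient <- substr(expr_samples$barcode, 1, 12)

# Stand-in for the curated survival table (assumed column names).
curated <- data.frame(
  patient = c("TCGA-XX-0001", "TCGA-XX-0002"),
  OS = c(0, 1),
  OS.time = c(259, 548),
  stringsAsFactors = FALSE
)

# Left join: keep every expression sample, even without curated survival.
merged <- merge(expr_samples, curated, by = "patient", all.x = TRUE)
```

Samples with NA in OS.time after the merge have no entry in the curated table and would need to be dropped or handled separately.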

ADD COMMENT • link updated 6.5 years ago by zx8754 12k • written 6.5 years ago by igor 13k

0

Thank you very much Igor! I will have a look at this!

ADD REPLY • link 6.5 years ago by Matina ▴ 250