Hi all,
I have downloaded the TCGA-BRCA RNA-seq data and the associated clinical information using the code below.
library(TCGAbiolinks)

CancerProject <- "TCGA-BRCA"

# Query all HTSeq count files for the project
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

# Keep only the primary tumor (TP) barcodes
samplesDown <- getResults(query, cols = c("cases"))
dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")

# Repeat the query, restricted to the primary tumor samples
queryDown <- GDCquery(project = CancerProject,
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - Counts",
                      barcode = dataSmTP)

# Download the files and assemble them into a SummarizedExperiment
GDCdownload(query = queryDown, directory = "BRC_RESULTS/TCGA/htseq_data/")
dataPrep <- GDCprepare(query = queryDown,
                       save = TRUE,
                       directory = "BRC_RESULTS/TCGA/htseq_data/",
                       save.filename = "htseq_counts.rda",
                       summarizedExperiment = TRUE)
In the clinical data there are several columns such as days_to_death or days_to_last_follow_up and other columns such as subtype_OS.Time or subtype_OS.event.
What is the difference between the columns with subtype_ at the beginning and the rest, and which ones should I use for survival analysis? At the moment I have used the subtype_ columns for my analysis and I am wondering if this is correct.
Thanks a lot,
Matina
Dear Matina,
What is your goal with the RNA-seq data: differential expression analysis, inspecting the expression of specific genes, or looking for molecular subtype patterns and survival analysis? I think you already got an answer from one of the creators of the R package on the GitHub account, correct?
https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues/227
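In case it helps, here is a minimal sketch of how an overall-survival time and event indicator could be derived from the days_to_death and days_to_last_follow_up columns. The table below is toy data with made-up values, the os_time and os_event names are hypothetical, and the "Dead"/"Alive" coding of vital_status is an assumption that should be checked against your own clinical table.

```r
library(survival)

# Toy clinical table; all values are invented for illustration only.
clin <- data.frame(
  barcode = c("P1", "P2", "P3", "P4"),
  vital_status = c("Dead", "Alive", "Alive", "Dead"),  # assumed coding
  days_to_death = c(400, NA, NA, 1200),
  days_to_last_follow_up = c(NA, 900, 1500, NA),
  stringsAsFactors = FALSE
)

# Event indicator: 1 = death observed, 0 = censored at last follow-up.
clin$os_event <- as.integer(clin$vital_status == "Dead")

# Time: days_to_death for deceased patients, otherwise days_to_last_follow_up.
clin$os_time <- ifelse(clin$os_event == 1,
                       clin$days_to_death,
                       clin$days_to_last_follow_up)

# Kaplan-Meier estimate over the whole (toy) cohort.
fit <- survfit(Surv(os_time, os_event) ~ 1, data = clin)
print(fit)
```

Comparing os_time and os_event built this way against subtype_OS.Time and subtype_OS.event for the same patients is one way to see where the two sources agree or differ.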
Best,
Efstathios
Hi Efstathios,
I have a set of genes that I am interested in and I want to see if they are associated with clinical outcomes and molecular subtype patterns. You are right, I got an answer on GitHub.
Thanks a lot for your answer! Matina
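For anyone landing here with the same goal, one common way to check whether a gene is associated with outcome is to split patients at the median expression and compare the groups with a log-rank test. This is only a sketch on toy data: the expr, os_time and os_event columns are hypothetical names with made-up values, and the median split is one choice among several (a Cox model on continuous expression is another).

```r
library(survival)

# Toy data: invented expression values and outcomes for eight patients.
dat <- data.frame(
  expr     = c(2.1, 8.4, 3.3, 9.0, 1.5, 7.2, 6.8, 2.9),
  os_time  = c(300, 1500, 450, 1800, 200, 1300, 1600, 500),
  os_event = c(1, 0, 1, 0, 1, 1, 0, 1)
)

# Split patients into high/low groups at the median expression.
dat$group <- ifelse(dat$expr > median(dat$expr), "high", "low")

# Log-rank test comparing survival between the two groups.
lr <- survdiff(Surv(os_time, os_event) ~ group, data = dat)

# p-value from the chi-square statistic (groups - 1 degrees of freedom).
p_value <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)
print(lr)
```

With real data, expr would come from a normalized expression matrix rather than raw counts, and the same test would be repeated per gene with a multiple-testing correction.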