Hello, I am new to bioinformatic analyses and I am trying to analyse the TCGA dataset to plot a survival curve based on the expression of a gene of interest (say TP53). I have written the following code to analyse the TCGA data, but I am unable to proceed further in my attempt to plot a KM plot for survival for my gene, in this case TP53. Following is the code I have written so far. I would be grateful if someone can help me proceed with my analysis.
Load Packages
library(TCGAbiolinks) library(SummarizedExperiment) library(EDASeq) library(edgeR) library(survminer)
setwd()
Build a query to retrieve gene expression data of required samples
query <- GDCquery(project = "TCGA-PAAD", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", workflow.type = "STAR - Counts")
samplesDown <- getResults(query,cols=c("cases"))
Selection of tumor samples "TP"
dataTP <- TCGAquery_SampleTypes(barcode = samplesDown,typesample = "TP")
Selection of normal samples "NT"
dataNT <- TCGAquery_SampleTypes(barcode = samplesDown,typesample = "NT")
query.selected.samples <- GDCquery(project = "TCGA-PAAD", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", workflow.type = "STAR - Counts", barcode = c(dataTP, dataNT))
Download data
GDCdownload(query = query.selected.samples)
Prepare expression matrix with geneID in the rows and samples (barcode) in the columns
dataPrep <- GDCprepare(query = query.selected.samples, save = TRUE, save.filename = "TCGA_PAAD.rda")
paad_matrix <- assay(dataPrep, 'fpkm_unstrand')
For gene expression if you need to see a boxplot correlation and AAIC plot to define outliers you can run
dataPrep <- TCGAanalyze_Preprocessing(object = dataPrep, cor.cut = 0.6)
Normalization of genes
dataNorm <- TCGAanalyze_Normalization(tabDF = dataPrep, geneInfo = geneInfoHT, method = "gcContent")
boxplot(dataPrep, outline = FALSE) boxplot(dataNorm, outline = FALSE)
Quantile filter of genes
dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm, method = "quantile", qnt.cut = 0.25)
Differential Expression Analysis (DEA)
dataDEGs_0.01 <- TCGAanalyze_DEA(mat1 = dataFilt[,dataTP], mat2 = dataFilt[,dataNT], Cond1type = "Normal", Cond2type = "Tumor", fdr.cut = 0.01, logFC.cut = 1, method = "glmLRT") dataDEGs_0.05 <- TCGAanalyze_DEA(mat1 = dataFilt[,dataTP], mat2 = dataFilt[,dataNT], Cond1type = "Normal", Cond2type = "Tumor", fdr.cut = 0.05, logFC.cut = 1, method = "glmLRT")
DEGs table with expression values in normal and tumor samples
dataDEGsFiltLevel_0.01 <- TCGAanalyze_LevelTab(FC_FDR_table_mRNA = dataDEGs_0.01, typeCond1 = "Tumor", typeCond2 = "Normal", TableCond1 = dataFilt[,dataNT], TableCond2 = dataFilt[,dataTP]) dataDEGsFiltLevel_0.05 <- TCGAanalyze_LevelTab(FC_FDR_table_mRNA = dataDEGs_0.05, typeCond1 = "Tumor", typeCond2 = "Normal", TableCond1 = dataFilt[,dataNT], TableCond2 = dataFilt[,dataTP])
Thank you for your valuable time and advice. Manav
Hi! I don't think it is necessary to carry a DEA to conduct a survival analysis. All you need is the expression quantification of the gene of interest, normalize the expression values, establish a cut-off value to classify the samples as
high
(high expression of TP53 according to your cut-off) orlow
(you can also create three categories (high, medium, low)). Maybe Survival analysis of TCGA patients integrating gene expression (RNASeq) data helps :).Thanks, tried working with the dataset and workflow. It worked.