KM Plot for gene of interest (e.g. TP53) using TCGA-PAAD dataset
0
0
Entering edit mode
20 months ago
Manav • 0

Hello, I am new to bioinformatic analyses and I am trying to analyse the TCGA dataset to plot a survival curve based on the expression of a gene of interest (say TP53). I have written the following code to analyse the TCGA data, but I am unable to proceed further in my attempt to plot a KM plot for survival for my gene, in this case TP53. Following is the code I have written so far. I would be grateful if someone can help me proceed with my analysis.

Load Packages

library(TCGAbiolinks) library(SummarizedExperiment) library(EDASeq) library(edgeR) library(survminer)

setwd()

Build a query to retrieve gene expression data of required samples

query <- GDCquery(project = "TCGA-PAAD", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", workflow.type = "STAR - Counts")

samplesDown <- getResults(query,cols=c("cases"))

Selection of tumor samples "TP"

dataTP <- TCGAquery_SampleTypes(barcode = samplesDown,typesample = "TP")

Selection of normal samples "NT"

dataNT <- TCGAquery_SampleTypes(barcode = samplesDown,typesample = "NT")

query.selected.samples <- GDCquery(project = "TCGA-PAAD", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", workflow.type = "STAR - Counts", barcode = c(dataTP, dataNT))

Download data

GDCdownload(query = query.selected.samples)

Prepare expression matrix with geneID in the rows and samples (barcode) in the columns

dataPrep <- GDCprepare(query = query.selected.samples, save = TRUE, save.filename = "TCGA_PAAD.rda")

paad_matrix <- assay(dataPrep, 'fpkm_unstrand')

For gene expression if you need to see a boxplot correlation and AAIC plot to define outliers you can run

dataPrep <- TCGAanalyze_Preprocessing(object = dataPrep, cor.cut = 0.6)

Normalization of genes

dataNorm <- TCGAanalyze_Normalization(tabDF = dataPrep, geneInfo = geneInfoHT, method = "gcContent")

boxplot(dataPrep, outline = FALSE) boxplot(dataNorm, outline = FALSE)

Quantile filter of genes

dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm, method = "quantile", qnt.cut = 0.25)

Differential Expression Analysis (DEA)

dataDEGs_0.01 <- TCGAanalyze_DEA(mat1 = dataFilt[,dataTP], mat2 = dataFilt[,dataNT], Cond1type = "Normal", Cond2type = "Tumor", fdr.cut = 0.01, logFC.cut = 1, method = "glmLRT") dataDEGs_0.05 <- TCGAanalyze_DEA(mat1 = dataFilt[,dataTP], mat2 = dataFilt[,dataNT], Cond1type = "Normal", Cond2type = "Tumor", fdr.cut = 0.05, logFC.cut = 1, method = "glmLRT")

DEGs table with expression values in normal and tumor samples

dataDEGsFiltLevel_0.01 <- TCGAanalyze_LevelTab(FC_FDR_table_mRNA = dataDEGs_0.01, typeCond1 = "Tumor", typeCond2 = "Normal", TableCond1 = dataFilt[,dataNT], TableCond2 = dataFilt[,dataTP]) dataDEGsFiltLevel_0.05 <- TCGAanalyze_LevelTab(FC_FDR_table_mRNA = dataDEGs_0.05, typeCond1 = "Tumor", typeCond2 = "Normal", TableCond1 = dataFilt[,dataNT], TableCond2 = dataFilt[,dataTP])

Thank you for your valuable time and advice. Manav

PAAD KM Plot TCGA • 815 views
ADD COMMENT
3
Entering edit mode

Hi! I don't think it is necessary to carry a DEA to conduct a survival analysis. All you need is the expression quantification of the gene of interest, normalize the expression values, establish a cut-off value to classify the samples as high (high expression of TP53 according to your cut-off) or low (you can also create three categories (high, medium, low)). Maybe Survival analysis of TCGA patients integrating gene expression (RNASeq) data helps :).

ADD REPLY
0
Entering edit mode

Thanks, tried working with the dataset and workflow. It worked.

ADD REPLY

Login before adding your answer.

Traffic: 2620 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6