Question

need some help on how to use DESeq2 for TCGA data

3.0 years ago

Leo • 0

Hello, I am sorry for this newbie question, but I spent all morning looking and could not find a clear answer anywhere.

I want to normalise RNA-seq data from TCGA using DESeq2. I use the TCGA-Assembler R package to download RNA-seq data with the assay platform "gene_RNAseq", which gives a spreadsheet with raw_count and scaled_estimate for each patient sample and gene. Should I use the "ProcessRNASeqData" function from the TCGA-Assembler package, or just go with the file produced by the "DownloadRNASeqData" function?

Then before I can start using DESeq2, I have to create a count matrix. How do I do this? Does anyone perhaps have any code ready that I can use for this type of data?
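A minimal sketch of the count-matrix step, assuming each sample is available as a table with gene_id and raw_count columns. The sample barcodes, gene list, and values below are made up for illustration, and labelling only sample-type code 11 as normal is a simplification. RSEM raw_count values are expected counts and are usually not whole numbers, so they are rounded here before being passed to DESeq2.

```r
# Toy stand-ins for per-sample tables (hypothetical barcodes and values).
samples <- list(
  "TCGA-AA-0001-01A" = data.frame(gene_id = c("A1BG", "TP53", "EGFR"),
                                  raw_count = c(10.00, 250.37, 0.00)),
  "TCGA-AA-0002-11A" = data.frame(gene_id = c("TP53", "A1BG", "EGFR"),
                                  raw_count = c(310.80, 12.00, 4.49))
)

# Union of all gene IDs, in a fixed order
genes <- sort(unique(unlist(lapply(samples, function(s) s$gene_id))))

# One column per sample, rows aligned by gene_id
count_matrix <- sapply(samples, function(s) s$raw_count[match(genes, s$gene_id)])
rownames(count_matrix) <- genes

# Round the RSEM expected counts and store them as integers
count_matrix <- round(count_matrix)
storage.mode(count_matrix) <- "integer"

# Sample table: characters 14-15 of a TCGA barcode hold the sample-type code
# (01 = primary tumour, 11 = solid tissue normal)
sample_type <- substr(colnames(count_matrix), 14, 15)
col_data <- data.frame(
  condition = factor(ifelse(sample_type == "11", "normal", "tumor"),
                     levels = c("normal", "tumor")),
  row.names = colnames(count_matrix)
)

# Build the DESeq2 object only if the package is installed
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = count_matrix,
                                        colData = col_data,
                                        design = ~ condition)
}
```

The columns of count_matrix and the rows of col_data must be in the same sample order, which is why col_data is built from colnames(count_matrix) here.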

I really would appreciate any help because I have no idea how to continue. Thanks!

Edit: I read somewhere that I should download "illuminahiseq_rnaseqv2-RSEM_genes_normalized (MD5)" from Firehose. How can I convert this data so that I can use it with DESeq2?
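One caveat on the normalized file: DESeq2 is designed to take un-normalized counts and applies its own size-factor normalization, so already-normalized values are probably not a suitable input. A quick sanity check, sketched with a hypothetical helper (looks_like_counts is not part of any package), is to test whether a matrix contains only non-negative whole numbers:

```r
# Hypothetical helper: TRUE if every entry is a non-negative whole number
looks_like_counts <- function(m, tol = 1e-8) {
  m <- as.matrix(m)
  is.numeric(m) && !anyNA(m) && all(m >= 0) && all(abs(m - round(m)) < tol)
}

raw_like  <- matrix(c(0, 5, 12, 3), nrow = 2)           # whole numbers
norm_like <- matrix(c(0.5, 3.27, 10, 2.0), nrow = 2)    # fractional values

looks_like_counts(raw_like)
looks_like_counts(norm_like)
```

Passing this check does not prove a matrix holds raw counts, and RSEM raw_count values fail it until they are rounded; it only flags input that is clearly normalized or otherwise transformed.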

DESeq2 TCGA R TCGA-Assembler • 4.4k views

ADD COMMENT • link updated 2.1 years ago by Hamid Ghaedi 3.3k • written 3.0 years ago by Leo • 0

In this repository, I provide R code to download bladder cancer data from TCGA (TCGA-BLCA) using the TCGAbiolinks package and then analyze it with DESeq2. If you want to give it a try, just replace "TCGA-BLCA" with the TCGA abbreviation for your cancer of interest.

ADD REPLY • link 3.0 years ago by Hamid Ghaedi 3.3k

Hamid Ghaedi, the repository you provided has been really helpful. However, I have a problem with this line:

write.csv(res_df, file= paste0(resultsNames(dds)[2], ".csv")

I first corrected it by adding the missing closing bracket. But after running the code, this is the file that was saved:

[screenshot: saved_csv]

I ran this before the write.csv call:

df <- ensid_symbol(row.names(res_output))

result_df <- as.data.frame(res_output)
result_df$ensembl_gene_id <- row.names(result_df)
result_df <- merge(df, result_df, by = "ensembl_gene_id")
resOrdered <- result_df[with(result_df, order(abs(log2FoldChange), padj, decreasing = TRUE)), ]

I have tried everything I can think of, but I am not getting a solution. Is there something I am doing wrong?

Kindly assist.
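One possible cause worth ruling out (an assumption, since the full code and the output of ensid_symbol() are not shown): merge() returns zero rows when the two tables share no key values. Gene IDs in GDC-derived TCGA data typically carry a version suffix such as ".17", while annotation tables often use unversioned Ensembl IDs. The toy tables below are made up to illustrate that mismatch; the version numbers are illustrative only.

```r
# Toy result table with versioned Ensembl IDs (values are made up)
result_df <- data.frame(
  ensembl_gene_id = c("ENSG00000141510.17", "ENSG00000146648.19"),
  log2FoldChange  = c(2.1, -1.8),
  padj            = c(0.001, 0.03)
)

# Hypothetical annotation table with unversioned IDs,
# standing in for whatever ensid_symbol() returns
df <- data.frame(
  ensembl_gene_id = c("ENSG00000141510", "ENSG00000146648"),
  hgnc_symbol     = c("TP53", "EGFR")
)

# No shared keys, so the merged table is empty
n_before <- nrow(merge(df, result_df, by = "ensembl_gene_id"))

# Strip the trailing ".<version>" and merge again
result_df$ensembl_gene_id <- sub("\\.[0-9]+$", "", result_df$ensembl_gene_id)
merged <- merge(df, result_df, by = "ensembl_gene_id")
n_after <- nrow(merged)
```

Comparing head(row.names(res_output)) with head(df$ensembl_gene_id) before the merge shows quickly whether the two ID formats agree.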

ADD REPLY • link 2.1 years ago by Jakpa ▴ 50

Jakpa, thanks for mentioning the closing bracket :). Can you change the values passed to alpha = (try 0.1) and/or lfcThreshold = (try 0.5) in the following chunk of your code? Then rerun the code and inspect the result.

res <- results(dds, alpha = 0.05, altHypothesis = "greaterAbs", lfcThreshold = 1.5) # alpha sets the FDR cutoff
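To see how much these two arguments change the outcome, the number of genes passing the cutoff can be counted under both settings. The sketch below uses simulated data from DESeq2's makeExampleDESeqDataSet() rather than the TCGA object from the thread, and n_significant is a hypothetical helper written for this example. With lfcThreshold = 1.5 and altHypothesis = "greaterAbs", each gene is tested against |log2 fold change| > 1.5, which is a much stricter hypothesis than a test against zero.

```r
# Hypothetical helper: count rows whose adjusted p-value is below alpha,
# ignoring the NA values that DESeq2 assigns to filtered genes
n_significant <- function(res, alpha) {
  sum(!is.na(res$padj) & res$padj < alpha)
}

# Run the comparison only if DESeq2 is installed
if (requireNamespace("DESeq2", quietly = TRUE)) {
  set.seed(1)
  # Simulated counts: 500 genes, 8 samples, with true fold changes (betaSD = 1)
  dds_demo <- DESeq2::makeExampleDESeqDataSet(n = 500, m = 8, betaSD = 1)
  dds_demo <- DESeq2::DESeq(dds_demo, quiet = TRUE)

  strict  <- DESeq2::results(dds_demo, alpha = 0.05,
                             altHypothesis = "greaterAbs", lfcThreshold = 1.5)
  relaxed <- DESeq2::results(dds_demo, alpha = 0.1,
                             altHypothesis = "greaterAbs", lfcThreshold = 0.5)

  message("strict: ", n_significant(strict, 0.05),
          "; relaxed: ", n_significant(relaxed, 0.1))
}
```

If the count is already zero straight after results(), the thresholds are the likely culprit; if it is non-zero there but the saved file is empty, the problem is more likely in a later step such as the merge.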

ADD REPLY • link 2.1 years ago by Hamid Ghaedi 3.3k

Hamid Ghaedi, thank you for your response. But the output is still the same, i.e. empty rows.

I couldn't figure out the reason :)

Any suggestions?

ADD REPLY • link 2.1 years ago by Jakpa ▴ 50

Then open a new question and provide all the code you are using, and you will get feedback.

ADD REPLY • link 2.1 years ago by Hamid Ghaedi 3.3k

Leo, do not delete posts that have received feedback.

ADD REPLY • link 2.3 years ago by Ram 44k

score 3 · Accepted Answer · 2021-12-03

Here is an .Rmd file for downloading miRNA and mRNA expression data from TCGA-PRAD and running the downstream DESeq2 analysis using TCGAbiolinks, which should do what you need. Take the parts you need and substitute your cancer of interest at the TCGAbiolinks step.

https://github.com/BarryDigby/TCGA_Biolinks/blob/master/TCGA_Biolinks.Rmd

A word of caution: you will be missing ~10% of the ~60,000 genes (the same happens with Firehose).