TCGA: which files to download for analyzing differentially expressed miRNAs?

4.2 years ago

ginny • 0

I have just started working on TCGA data, and I noticed that the RNA-seq (HTSeq counts) files also contain ENSEMBL gene IDs for miRNAs. Does that mean the expression values of miRNA genes are also present in the RNA-seq files?

So then why does TCGA have a separate miRNA quantification dataset (files ending with .mirbase.mirna.quantification)?

I am confused because I plan to find both the differentially expressed genes and the differentially expressed miRNAs, and I don't know which dataset to use with DESeq2.

Please help! :(

RNA-Seq sequencing mirna tcga DESEQ2 • 1.1k views

You need to download them separately.

The mirbase.mirna.quantification files are what you want for miRNA DE analysis. If your HTSeq counts files contain roughly 50,000 rows (harmonized data), you will also want to subset them to keep only the coding genes (~20,000).
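For the miRNA side, here is a minimal sketch of turning per-sample quantification files into a raw count matrix, since DESeq2 expects unnormalized integer counts rather than reads-per-million values. It is written in plain Python only to show the logic. It assumes each file is tab-separated with `miRNA_ID` and `read_count` columns, so check the header of your own files first. The sample names and toy values are hypothetical.

```python
import csv
import io


def read_mirna_counts(handle):
    """Return {miRNA_ID: raw read count} from one quantification file.

    Assumes a tab-separated file with 'miRNA_ID' and 'read_count' columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["miRNA_ID"]: int(row["read_count"]) for row in reader}


def build_count_matrix(samples):
    """Combine per-sample dicts into (mirna_ids, sample_names, rows).

    rows[i][j] is the raw count of mirna_ids[i] in sample_names[j];
    a miRNA missing from a sample is filled with 0.
    """
    sample_names = sorted(samples)
    mirna_ids = sorted(set().union(*(s.keys() for s in samples.values())))
    rows = [[samples[name].get(m, 0) for name in sample_names] for m in mirna_ids]
    return mirna_ids, sample_names, rows


# Toy stand-ins for two downloaded files (hypothetical values).
header = "miRNA_ID\tread_count\treads_per_million_miRNA_mapped\tcross-mapped\n"
file_a = io.StringIO(header + "hsa-let-7a-1\t10\t100.0\tN\nhsa-mir-21\t250\t2500.0\tN\n")
file_b = io.StringIO(header + "hsa-let-7a-1\t7\t90.0\tN\n")

samples = {
    "sampleA": read_mirna_counts(file_a),
    "sampleB": read_mirna_counts(file_b),
}
mirna_ids, sample_names, rows = build_count_matrix(samples)
```

The resulting matrix (miRNAs as rows, samples as columns) has the shape DESeq2 takes as count input. In practice you would open the real files instead of the `io.StringIO` stand-ins.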

Reply by Barry Digby, 4.2 years ago

Thank you so much! Any idea how I can keep only the coding genes?

Reply by ginny, 4.2 years ago

I have code here (https://github.com/BarryDigby/TCGA_Biolinks/blob/master/TCGA_Biolinks.Rmd) that does everything you want: downloading the data, preparing the metadata, filtering for coding genes, and running the differential expression analysis. It's a good basic template to start with.

It was conducted on TCGA PRAD. Install packages as required, change PRAD to your tissue type of interest and you are good to go.
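Separately from the linked template, the coding-gene filtering step itself can be sketched as follows. This is not the code from the repository, just an illustration of the idea in plain Python. It assumes a GTF annotation whose `gene` records carry a `gene_type` (GENCODE style) or `gene_biotype` (Ensembl style) attribute, and an HTSeq counts file with two tab-separated columns (gene ID, count) plus summary rows starting with `__`. The gene IDs below are toy values.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')
GENE_TYPE = re.compile(r'gene_(?:type|biotype) "([^"]+)"')


def protein_coding_ids(gtf_lines):
    """Collect unversioned IDs of genes annotated as protein_coding."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are needed
        gid = GENE_ID.search(fields[8])
        gtype = GENE_TYPE.search(fields[8])
        if gid and gtype and gtype.group(1) == "protein_coding":
            ids.add(gid.group(1).split(".")[0])  # drop the version suffix
    return ids


def filter_counts(count_lines, keep_ids):
    """Keep only count rows whose (unversioned) gene ID is in keep_ids."""
    kept = {}
    for line in count_lines:
        if not line.strip():
            continue
        gene, count = line.rstrip("\n").split("\t")
        if gene.startswith("__"):
            continue  # HTSeq summary rows such as __no_feature
        if gene.split(".")[0] in keep_ids:
            kept[gene] = int(count)
    return kept


# Toy annotation and counts (hypothetical IDs and values).
gtf = [
    "\t".join(["chr1", "HAVANA", "gene", "1", "100", ".", "+", ".",
               'gene_id "ENSG00000000001.1"; gene_type "protein_coding";']),
    "\t".join(["chr1", "ENSEMBL", "gene", "200", "280", ".", "+", ".",
               'gene_id "ENSG00000000002.1"; gene_type "miRNA";']),
]
counts = ["ENSG00000000001.1\t120", "ENSG00000000002.1\t3", "__no_feature\t500"]

coding = protein_coding_ids(gtf)
filtered = filter_counts(counts, coding)
```

Matching on the unversioned ID avoids silent mismatches when the counts file and the annotation use different version suffixes for the same gene.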

Reply by Barry Digby, 4.2 years ago
