How to download a full matrix of gene expression of a TCGA dataset
10.5 years ago
fbrundu

Hi all,

I don't understand how to download the full matrix of a TCGA subset (e.g. Colon Adenocarcinoma).

I selected Download Data > Data matrix > COAD Data matrix, but I get a lot of (level 3) files.

I only want the genes x samples matrix. Do I have to assemble it myself, or can I download it somewhere?

Thanks

Tags: tcga gene expression cancer samples
10.5 years ago

Hi,

You were searching in the right place. Indeed, you will find plenty of txt files when you select "RNA-Seq" in the data matrix: there is a separate gene expression file for each individual. In your case, I assume you want the files ending with "gene.quantification.txt", as they contain the gene and its RPKM.
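If you go the per-file route, you do have to assemble the matrix yourself. Below is a minimal Python sketch of that step, assuming each per-sample file is tab-separated with a header row that has a `gene` column and an `RPKM` column (check the header of your own files; these column names are an assumption). The function names are hypothetical, and you supply the mapping from sample ID to file path yourself.

```python
import csv
import math


def read_quantification(path, value_column="RPKM"):
    """Return {gene: value} from one per-sample, tab-separated file with a header row.

    The column names "gene" and "RPKM" are assumptions; adjust them to your files.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["gene"]: float(row[value_column]) for row in reader}


def build_matrix(sample_files, value_column="RPKM"):
    """Combine {sample_id: path} into a genes x samples matrix.

    Returns (genes, samples, rows) where rows[i][j] is the value of genes[i]
    in samples[j]; a gene missing from a sample is filled with NaN.
    """
    per_sample = {sample: read_quantification(path, value_column)
                  for sample, path in sample_files.items()}
    samples = sorted(per_sample)
    # Union of all gene names seen in any sample.
    genes = sorted(set().union(*per_sample.values()))
    rows = [[per_sample[s].get(g, math.nan) for s in samples] for g in genes]
    return genes, samples, rows


def write_matrix(out_path, genes, samples, rows):
    """Write the matrix as a tab-separated file: one row per gene, one column per sample."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene"] + samples)
        for gene, values in zip(genes, rows):
            writer.writerow([gene] + values)
```

Usage would look like `write_matrix("COAD_matrix.tsv", *build_matrix({"sample_A": "a.gene.quantification.txt", "sample_B": "b.gene.quantification.txt"}))`, where the sample IDs and file names are placeholders. Work out which file belongs to which TCGA barcode before building the mapping.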

Another possibility is https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/. Select your tumor and analysis type of interest (button "Add datasets"); on the left side (on mouse-over) you will find a symbol that downloads the data.

I hope that helps,

Sebastian

Thanks Sebastian, I got it. Just a little question: is there a reason why I have to select RNA-Seq and not RNA-SeqV2 or TotalRNA-SeqV2?

Sorry for the late reply. That is actually a very good question. The difference between RNA-Seq and RNA-SeqV2 is the (computational) processing used to determine expression levels (e.g. RSEM instead of RPKM); see https://wiki.nci.nih.gov/display/TCGA/RNASeq+Version+2

So, as far as I understand, the (raw) sequencing data are the same (you can only access them once your access application has been approved), but the processing differs.
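To make the RPKM side of that difference concrete: RPKM is a simple normalization of a gene's read count by gene length (in kilobases) and library size (in millions of mapped reads). RSEM instead estimates counts with an expectation-maximization model that also distributes reads mapping to several genes or isoforms, so it cannot be reduced to a one-line formula. A minimal sketch of the RPKM calculation only (the function name is illustrative, and how "gene length" is defined varies between pipelines):

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of gene per Million mapped reads.

    RPKM = reads / (length in kb * mapped reads in millions)
         = reads * 1e9 / (length in bp * mapped reads)
    """
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# 500 reads on a 2,000 bp gene in a library of 10 million mapped reads:
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```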

I would recommend UCSC Xena. The same team developed both Xena and the Cancer Browser mentioned above. The last data release on the UCSC Cancer Genomics Browser was in Feb 2015; since then, all new data from our team are released only to Xena at http://xena.ucsc.edu . Drill down to the cohort you are interested in, then to the dataset you are interested in, and click the download button on the dataset detail page for bulk download.

For example, the COAD (colon cancer, TCGA) RNAseq gene expression estimation dataset page is at this URL:

https://genome-cancer.soe.ucsc.edu/proj/site/xena/datapages/?dataset=TCGA.COAD.sampleMap/HiSeqV2&host=https://tcga.xenahubs.net
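Once downloaded, such a file is already the genes x samples matrix. A minimal Python sketch for loading it, assuming the file is tab-separated (possibly gzip-compressed), that the first header cell labels the gene column and the remaining header cells are sample IDs, and that missing values appear as empty cells or `NA` (all assumptions; check your file). `load_xena_matrix` is a hypothetical name:

```python
import csv
import gzip
import math


def load_xena_matrix(path):
    """Read a genes x samples TSV (plain or .gz) into (samples, {gene: [values]}).

    Assumes row 1 is a header whose first cell labels the gene column and whose
    other cells are sample IDs; each later row is one gene. Empty or "NA" cells
    become NaN.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        samples = next(reader)[1:]
        matrix = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            matrix[row[0]] = [math.nan if v in ("", "NA") else float(v)
                              for v in row[1:]]
    return samples, matrix
```

Note that the values are whatever unit the dataset page states (for this dataset presumably a log2-transformed RSEM estimate); the loader does not transform them.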

Hope this is helpful.

There no longer seems to be the "download button" you mentioned, and the link doesn't work either; but this answer is 3 years old, so I'm not surprised.

There is a download link: once you select the cancer dataset you are interested in and choose the study (e.g. HTSeq counts), you will find the download link at the top.
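If the HTSeq count matrix you download is stored as log2(count + 1), which is an assumption you should confirm against the unit shown on the dataset page, you can recover integer read counts for tools that expect raw counts. A minimal sketch (the function name is hypothetical):

```python
import math


def log2_plus_one_to_counts(values):
    """Undo an assumed log2(count + 1) transform and round back to integer counts."""
    return [round(2 ** v - 1) for v in values]


print(log2_plus_one_to_counts([0.0, 1.0, math.log2(6)]))  # [0, 1, 5]
```

Rounding absorbs the small floating-point error introduced by storing the log values as decimals.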
