TCGA has recently migrated to the Genomic Data Commons (GDC). Following this migration, many tools convenient for retrieving TCGA data, such as TCGA-Assembler, no longer apply. So, with the new GDC, I'd like to download RNA-Seq data (in bulk) for tumor samples as well as normal control samples. How might I accomplish this?
I know how to download data from the GDC, but I need to know whether a specific RNA-seq data file comes from a tumor and, if so, how to obtain its "matched normal tissue" RNA-seq data. I'd like to do this in bulk, for, say, the HCC project.
I know there's a "legacy portal" for the old TCGA data on the GDC website, but I want to use the newest GDC portal.
Thanks in advance.
On the search page you can use the "Add cases filter" link to add sample_type as a filter, and then limit to "blood derived normal" or "solid tissue normal"; then, on the files tab, select transcriptome profiling. However, there are far fewer normal samples than tumor samples (e.g. only ~100 breast cancer samples are from normals, as opposed to ~1000 from tumors), and none of these seem to have the raw sequencing data associated with them. Perhaps they haven't finished all the processing yet?
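If you'd rather script this than click through the portal, the same filters can be sent to the GDC API. Below is a minimal sketch, not a tested pipeline. Assumptions on my part: the project ID "TCGA-LIHC" (my guess for the "HCC project"), the sample type labels "Primary Tumor" and "Solid Tissue Normal", the shape of the returned hits (file_id plus nested cases and samples), and the helper names build_filters, build_params, fetch_hits and pair_by_case, which are just illustrative.

```python
import json
import urllib.parse
import urllib.request

# Public GDC files endpoint.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

TUMOR = "Primary Tumor"          # assumed sample_type label for tumors
NORMAL = "Solid Tissue Normal"   # assumed sample_type label for matched normals


def build_filters(project_id, sample_types):
    """Build a GDC filter: one project, given sample types, transcriptome files."""
    def is_in(field, values):
        return {"op": "in", "content": {"field": field, "value": list(values)}}

    return {
        "op": "and",
        "content": [
            is_in("cases.project.project_id", [project_id]),
            is_in("cases.samples.sample_type", sample_types),
            is_in("data_category", ["Transcriptome Profiling"]),
        ],
    }


def build_params(project_id, sample_types, size=1000):
    """Query parameters for a GET request to the files endpoint."""
    return {
        "filters": json.dumps(build_filters(project_id, sample_types)),
        "fields": "file_id,file_name,cases.submitter_id,cases.samples.sample_type",
        "format": "JSON",
        "size": str(size),
    }


def fetch_hits(project_id="TCGA-LIHC", sample_types=(TUMOR, NORMAL)):
    """Call the API (needs network access) and return the list of file hits."""
    query = urllib.parse.urlencode(build_params(project_id, sample_types))
    with urllib.request.urlopen(GDC_FILES_ENDPOINT + "?" + query) as response:
        return json.load(response)["data"]["hits"]


def pair_by_case(hits):
    """Group file IDs by case and sample type; keep cases with tumor AND normal."""
    by_case = {}
    for hit in hits:
        for case in hit.get("cases", []):
            for sample in case.get("samples", []):
                per_type = by_case.setdefault(case["submitter_id"], {})
                per_type.setdefault(sample["sample_type"], []).append(hit["file_id"])
    return {
        case_id: per_type
        for case_id, per_type in by_case.items()
        if TUMOR in per_type and NORMAL in per_type
    }
```

Calling pair_by_case(fetch_hits()) should give, for each case, the file IDs of its tumor files and of its matched normal files. Those IDs can then be pasted into a manifest or passed to the GDC download tool. If a project has more files than the size parameter, you would need to page through the results.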
Any solution? I'm also interested in retrieving the associated normal tissue data.
The legacy portal contains the raw FASTQ files, which may be handy and may be what you are after.