I'm confused about the relationship of TCGA, Firehose and cBioportal data.
Are the data from each database totally different, or partly/largely overlapping?
Some links or references?
TCGA was a project to collect cancer genome data in a standardised way from many centres.
The official repository for this data is the Genomic Data Commons (GDC), which also stores data from other projects. It is the only place where completely raw data can be downloaded (although you generally need access authorisation to do so).
Firehose is a service that provides analysis of the TCGA data. Some of the analyses that you can download from Firehose (e.g. gene expression measurements) are also available from GDC, but are created with different tools. Other analyses on Firehose are only available there. Firehose is the most user-friendly way to access complete TCGA datasets processed sufficiently for interpretive analysis.
cBioPortal is another service that provides analysis. It combines data from many different projects (including TCGA) to produce high-level overviews and analyses of the data. You probably wouldn't go to cBioPortal to download whole datasets, but rather to produce summaries or run tests on them.
To summarise:
TCGA is the data.
GDC, Firehose and cBioPortal are sites that provide access to the data with varying levels of pre-processing and/or analysis.
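If you want to check for yourself what the GDC holds under TCGA, one option is to ask its public API to list the projects in the TCGA program. The following is only a rough sketch: the endpoint `https://api.gdc.cancer.gov/projects` and the `program.name` filter are what I understand the GDC API to accept, and the function names are made up for illustration, so check the GDC API documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed GDC API endpoint for listing projects (verify against the GDC docs).
GDC_PROJECTS_ENDPOINT = "https://api.gdc.cancer.gov/projects"


def build_tcga_projects_url(size=100):
    """Build a query URL asking the GDC for projects in the TCGA program.

    Hypothetical helper: the filter field `program.name` is an assumption
    about the GDC project schema.
    """
    filters = {
        "op": "in",
        "content": {"field": "program.name", "value": ["TCGA"]},
    }
    params = {
        "filters": json.dumps(filters),  # the filter is sent as a JSON string
        "fields": "project_id,name",     # only ask for the fields we print
        "format": "JSON",
        "size": str(size),               # maximum number of projects returned
    }
    return GDC_PROJECTS_ENDPOINT + "?" + urlencode(params)


def fetch_tcga_projects(size=100):
    """Fetch the project list (needs network access; no authorisation
    should be required for this open metadata)."""
    with urlopen(build_tcga_projects_url(size)) as response:
        payload = json.load(response)
    # The response is assumed to nest results under data -> hits.
    return payload["data"]["hits"]
```

Calling `fetch_tcga_projects()` should return one dictionary per TCGA project, each with a `project_id` and a `name`, which you can then compare against the cohorts listed on Firehose or the TCGA studies in cBioPortal.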