5.1 years ago
H.Hasani
Hello all,
Almost three years later, I'm facing the same issue. The mutation data downloaded from Firehose contain more mutations than cBioPortal reports. This is especially true for the "provisional" datasets, which according to the cBioPortal FAQ should contain all data available from the Broad Firehose. Any idea why? I suspect cBioPortal applies some internal filtering, but I'm not sure whether it also applies that filtering to the provisional datasets.
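One way to narrow down where the counts diverge is to compare per-sample mutation counts between the two downloads. The following is a minimal sketch, not a description of either provider's pipeline. It assumes both files are tab-separated MAF-style tables with a `Tumor_Sample_Barcode` column, and that Firehose MAFs may start with `#` comment lines. It also assumes that truncating barcodes to their first 15 characters (e.g. `TCGA-02-0001-01`) matches the sample IDs cBioPortal uses. The function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_mutations_per_sample(lines: Iterable[str], id_length: int = 15) -> Counter:
    """Count MAF rows per sample, skipping '#' comment lines.

    Assumption: the first `id_length` characters of Tumor_Sample_Barcode
    identify the sample in both sources (TCGA sample-level barcode).
    """
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    counts: Counter = Counter()
    for row in reader:
        counts[row["Tumor_Sample_Barcode"][:id_length]] += 1
    return counts


def compare_counts(a: Counter, b: Counter) -> Dict[str, Tuple[int, int]]:
    """Return {sample: (count_a, count_b)} only for samples whose counts differ."""
    samples = sorted(set(a) | set(b))
    # Counter returns 0 for missing keys, so samples absent from one side show up as 0.
    return {s: (a[s], b[s]) for s in samples if a[s] != b[s]}
```

For example, with hypothetical file names, passing `open("firehose.maf")` and `open("cbioportal_mutations.txt")` through `count_mutations_per_sample` and then into `compare_counts` lists only the samples whose counts disagree. Those samples are where any filtering differences would have to be looked for.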
This is just a comment: I no longer expect any of these datasets to line up perfectly. I would also regard MSKCC (cBioPortal) and the Broad Institute (Firehose) as third-party providers of TCGA data: they take the primary data from the Genomic Data Commons (GDC) and may re-process or filter it. They should date-stamp and clearly report every step they take. One complication, of course, is that even the data at the GDC evolve over time.
The best you can do is date-stamp your own data, e.g., for publication.
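As a minimal sketch of what date-stamping your own data could look like, the following writes a small JSON manifest. The manifest records where the files came from, when they were retrieved (in UTC), and a SHA-256 checksum of each file. Together these let you later confirm exactly which version of the data an analysis used. The manifest layout and function names are assumptions for illustration, not a standard used by GDC, Firehose, or cBioPortal.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large MAF/TSV downloads need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(files: Iterable[Path], source: str, manifest_path: Path) -> dict:
    """Record the data source, UTC retrieval time, and per-file checksums (hypothetical layout)."""
    entry = {
        "source": source,
        "retrieved_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "files": {str(p): sha256_of(Path(p)) for p in files},
    }
    manifest_path.write_text(json.dumps(entry, indent=2))
    return entry
```

Keeping one manifest per download (rather than overwriting a single file) preserves a history of data versions, which matters because the upstream data keep changing.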