Hi
I am struggling with selecting MAF files for mutation analysis of TCGA patients. Here is the actual problem.
Say I want to do somatic mutation analysis for GBM patients. There are two sources I can get the data from:
(i) From the TCGA Data Matrix (https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm): I selected 'somatic mutations' under "Data Type", clicked Apply, and followed the emailed link to download the data. After untarring the archive I got a Level 2 MAF file named "broad.mit.edu__Illumina_Genome_Analyzer_DNA_Sequencing_level2.maf" with 21495 lines (the first line being the header).
(ii) From the Broad Institute's "MAF+Dashboard" facility (https://confluence.broadinstitute.org/display/GDAC/MAF+Dashboard): I got two files for GBM listed under the main section of that page ("MAFs Available from the DCC as of 26 February 2015").
These two files are:
(a) gbm_liftover.aggregated.capture.tcga.uuid.somatic.maf (22169 lines)
(b) step4_gbm_liftover.aggregated.capture.tcga.uuid.maf2.4.migrated.somatic.maf (22171 lines)
My confusion is: which file should I use for mutation analysis in GBM?
Thanks in advance
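The line counts quoted above are not mutation counts by themselves: a MAF can start with '#' comment lines (for example a '#version' line) followed by one header line, so a plain line count overcounts. A minimal Python sketch that counts only data rows, assuming that layout (comments first, then one tab-separated header, then one mutation per line). For the Level 2 file, if it has no comment lines, this gives 21495 - 1 = 21494.

```python
from typing import Iterable


def count_maf_records(lines: Iterable[str]) -> int:
    """Count mutation rows in a MAF, skipping '#' comment lines, blank lines and the header."""
    header_seen = False
    records = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # metadata such as '#version 2.4', or a blank line
        if not header_seen:
            header_seen = True  # the first non-comment line holds the column names
            continue
        records += 1
    return records


# Usage with one of the files from the question:
# with open("broad.mit.edu__Illumina_Genome_Analyzer_DNA_Sequencing_level2.maf") as fh:
#     print(count_maf_records(fh))
```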
Try here. Generally all versions are available, including the obsolete ones. Look at the deploy date to select the most recent one.
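Picking the most recent deployment should be done by parsing the DD-MON-YY deploy dates rather than comparing them as text (as text, '27-JUN-13' sorts after '26-MAR-14', which is wrong). A minimal Python sketch; the (file name, deploy date) pairs are assumed to be copied by hand from the page, and the file names below are hypothetical:

```python
from datetime import datetime
from typing import Iterable, Tuple


def parse_deploy_date(text: str) -> datetime:
    """Parse a deploy date such as '27-JUN-13' (day-MON-two-digit-year)."""
    # strptime matches month names case-insensitively, so 'JUN' parses like 'Jun'.
    return datetime.strptime(text.strip(), "%d-%b-%y")


def most_recent(entries: Iterable[Tuple[str, str]]) -> str:
    """Return the file name with the latest deploy date from (name, date) pairs."""
    name, _ = max(entries, key=lambda entry: parse_deploy_date(entry[1]))
    return name


# Hypothetical entries, typed in from the page:
print(most_recent([("file_a.maf", "27-JUN-13"), ("file_b.maf", "26-MAR-14")]))  # file_b.maf
```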
Thanks, poisonAlien.
I downloaded the most recent GBM file (deployed 27-JUN-13; the last link in the GBM section) and found it is identical to file (ii)(b) from my question. Can I proceed with this one?
My other question is:
The MAF+Dashboard page (https://confluence.broadinstitute.org/display/GDAC/MAF+Dashboard) is divided into two sections:
A) MAFs Ingested into Broad GDAC Firehose as of 05 February 2015
B) MAFs Available from the DCC as of 26 February 2015
What does section A) on that page mean? In section A), gbm_liftover.aggregated.capture.tcga.uuid.somatic.maf.txt is listed, whereas the file we selected as the most recent deployment is step4_gbm_liftover.aggregated.capture.tcga.uuid.maf2.4.migrated.somatic.maf.
Thanks
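Checking that two downloads are byte-for-byte identical, as with the 27-JUN-13 file and file (ii)(b), is easiest with a checksum. A minimal Python sketch using only hashlib; the file paths in the usage line are placeholders:

```python
import hashlib
from typing import Iterable, Iterator


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's raw bytes in 1 MiB chunks so large MAFs are not loaded at once."""
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            yield chunk


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """SHA-256 hex digest of a stream of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def identical_files(path_a: str, path_b: str) -> bool:
    """True if the SHA-256 digests match, i.e. the files are byte-identical for all practical purposes."""
    return sha256_hex(read_chunks(path_a)) == sha256_hex(read_chunks(path_b))


# Usage (placeholder paths):
# print(identical_files("downloaded_27-JUN-13.maf",
#                       "step4_gbm_liftover.aggregated.capture.tcga.uuid.maf2.4.migrated.somatic.maf"))
```

The standard library's filecmp.cmp(path_a, path_b, shallow=False) does a direct byte comparison as well, if a checksum is not needed for later reference.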
Both files you mention are the same, with the same mutations; the only difference is how they are annotated. The first one (gbm_liftover*.maf) is annotated with Oncotator v0.5.25.0, whereas the second one (step4_gbm_*.maf) uses Oncotator v1.0.0.0rc20. Look at the comment lines at the beginning of each file (beginning with #). Apart from this, both files contain the same 22167 mutations. These are the differences between the two Oncotator versions:
Older version:
Note how some of the annotation columns change (for example, the COSMIC version used in each case). More information on Oncotator here.
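The claim that the two GBM files differ only in annotation can be checked directly: collect the '#' comment lines (where the Oncotator version is recorded) and compare only the columns that identify a mutation, ignoring annotation columns such as COSMIC. A minimal Python sketch; the choice of KEY_COLUMNS is my assumption (standard MAF column names, matched case-insensitively because older MAFs spell them "Start_position"/"End_position"). Being a set comparison, it collapses exact duplicate rows.

```python
from typing import Iterable, List, Set, Tuple

# Assumed identifying columns (MAF spec names), compared in lowercase.
KEY_COLUMNS = ("chromosome", "start_position", "end_position",
               "reference_allele", "tumor_seq_allele2", "tumor_sample_barcode")


def read_maf(lines: Iterable[str]) -> Tuple[List[str], Set[tuple]]:
    """Split a MAF into its '#' comment lines and a set of mutation keys."""
    comments: List[str] = []
    keys: Set[tuple] = set()
    idx = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            comments.append(line)  # version / annotation metadata
            continue
        if not line.strip():
            continue
        fields = line.split("\t")
        if idx is None:  # first non-comment line is the header
            header = [f.strip().lower() for f in fields]
            idx = [header.index(col) for col in KEY_COLUMNS]
            continue
        keys.add(tuple(fields[i] for i in idx))
    return comments, keys


def compare_mafs(lines_a: Iterable[str], lines_b: Iterable[str]) -> Tuple[int, int, int]:
    """Return (shared, only in A, only in B) mutation counts, ignoring annotation columns."""
    _, a = read_maf(lines_a)
    _, b = read_maf(lines_b)
    return len(a & b), len(a - b), len(b - a)


# Usage with the two GBM files:
# with open("gbm_liftover.aggregated.capture.tcga.uuid.somatic.maf") as fa, \
#      open("step4_gbm_liftover.aggregated.capture.tcga.uuid.maf2.4.migrated.somatic.maf") as fb:
#     print(compare_mafs(fa, fb))
```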
If I scroll down that page (https://wiki.nci.nih.gov/display/TCGA/TCGA+MAF+Files#TCGAMAFFiles-GBM:Glioblastomamultiforme) to the HNSC section
HNSC: Head and Neck squamous cell carcinoma
there are 3 files in total for HNSC, and the file with the latest deploy date has 279:279 (tumor:normal) samples.
If I select this file based on deploy date, as you suggested earlier, am I missing samples? The 2nd file ("PR_TCGA_HNSC_PAIR_Capture_All_Pairs_QCPASS_v4.aggregated.capture.tcga.uuid.automated.somatic.maf") contains 509:567 samples, but its deploy date is 26-MAR-14.
How should I go about this?
Thanks
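One way to answer "am I missing samples?" is to compare the sample barcodes in the two HNSC files directly instead of relying on the counts shown on the page. A minimal Python sketch, assuming the files use the standard MAF columns Tumor_Sample_Barcode and Matched_Norm_Sample_Barcode and that those columns hold TCGA barcodes (in the tcga.uuid files they may hold UUIDs instead, in which case the patients() step does not apply). A MAF only lists samples with at least one reported mutation, so a sample with no calls will not appear in either set.

```python
from typing import Iterable, Set, Tuple


def sample_sets(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return the distinct (tumor, normal) sample barcodes in a MAF's data rows."""
    tumors: Set[str] = set()
    normals: Set[str] = set()
    cols = None
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        fields = line.rstrip("\r\n").split("\t")
        if cols is None:  # first non-comment line is the header
            header = [f.strip().lower() for f in fields]
            cols = (header.index("tumor_sample_barcode"),
                    header.index("matched_norm_sample_barcode"))
            continue
        tumors.add(fields[cols[0]])
        normals.add(fields[cols[1]])
    return tumors, normals


def patients(barcodes: Iterable[str]) -> Set[str]:
    """Collapse TCGA barcodes to participant IDs (the first 12 characters, e.g. 'TCGA-AA-0001')."""
    return {b[:12] for b in barcodes}


# Usage (hypothetical paths for the newest and the 26-MAR-14 HNSC files):
# with open("hnsc_newest.maf") as f_new, open("hnsc_26-MAR-14.maf") as f_old:
#     new_t, _ = sample_sets(f_new)
#     old_t, _ = sample_sets(f_old)
# missing = patients(old_t) - patients(new_t)  # participants only in the older file
# print(len(new_t), len(old_t), len(missing))
```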
Hi, monukmr98. Because the TCGA DCC has been shut down, I can no longer access TCGA data through the Broad Institute's "MAF+Dashboard" facility (the files in that section). How can I get these files other than through GDC / GDC_TCGA? Thanks in advance.