5.6 years ago
rajesh
Hi everyone,
I have downloaded the "Masked Somatic Mutation" file for pancreatic adenocarcinoma from TCGA.
This file contains the somatic mutations present in the tumor sample when compared to the reference and the matched normal sample.
My questions are:
- Each row begins with a gene name, and the sixth and seventh columns give coordinates. Do the mutations in the MAF file cover only the protein-coding part of the genome, or do they also include non-protein-coding regions?
- I need to map mutations onto the non-coding part of the genome, especially enhancer regions.
- If non-coding mutations are not present in the MAF file, where can I download a mutation file that includes them?
I am also attaching a sample of the file.
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer Tumor_Sample_UUID Matched_Norm_Sample_UUID HGVSc HGVSp HGVSp_Short Transcript_ID Exon_Number t_depth t_ref_count t_alt_count n_depth n_ref_count n_alt_count all_effects Allele Gene Feature Feature_type One_Consequence Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation ALLELE_NUM DISTANCE TRANSCRIPT_STRAND SYMBOL SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL CCDS ENSP SWISSPROT TREMBL UNIPARC RefSeq SIFT PolyPhen EXON INTRON DOMAINS GMAF AFR_MAF AMR_MAF ASN_MAF EAS_MAF EUR_MAF SAS_MAF AA_MAF EA_MAF CLIN_SIG SOMATIC PUBMED MOTIF_NAME MOTIF_POS HIGH_INF_POS MOTIF_SCORE_CHANGE IMPACT PICK VARIANT_CLASS TSL HGVS_OFFSET PHENO MINIMISED ExAC_AF ExAC_AF_Adj ExAC_AF_AFR ExAC_AF_AMR ExAC_AF_EAS ExAC_AF_FIN ExAC_AF_NFE ExAC_AF_OTH ExAC_AF_SAS GENE_PHENO FILTER CONTEXT src_vcf_id tumor_bam_uuid normal_bam_uuid case_id GDC_FILTER COSMIC MC3_Overlap GDC_Validation_Status
BCAN 63827 BI GRCh38 chr1 156651635 156651635 + Missense_Mutation SNP G G A rs770559603 byFrequency TCGA-2L-AAQJ-01A-12D-A397-08 TCGA-2L-AAQJ-11A-11D-A39A-08 Somatic Illumina HiSeq 2000 de369dbb-736e-4970-998d-a0470029653f cb472e98-8801-40f4-9c2c-6ebb03b41c40 c.1243G>A p.Gly415Arg p.G415R ENST00000329117 7/14 623 524 99 103 BCAN,missense_variant,p.G415R,ENST00000329117,NM_021948.4,c.1243G>A,MODERATE,YES,tolerated(0.16),benign(0.013),1;BCAN,missense_variant,p.G415R,ENST00000361588,NM_198427.1,c.1243G>A,MODERATE,,tolerated(0.21),benign(0.022),1;BCAN,downstream_gene_variant,,ENST00000424639,,,MODIFIER,,,,1;BCAN,downstream_gene_variant,,ENST00000457777,,,MODIFIER,,,,1;BCAN,downstream_gene_variant,,ENST00000441358,,,MODIFIER,,,,1;RP11-284F21.7,intron_variant,,ENST00000448869,,n.111-4481C>T,MODIFIER,YES,,,-1;BCAN,3_prime_UTR_variant,,ENST00000479949,,c.*477G>A,MODIFIER,,,,1;BCAN,downstream_gene_variant,,ENST00000491823,,,MODIFIER,,,,1 A ENSG00000132692 ENST00000329117 Transcript missense_variant missense_variant 1579/3466 1243/2736 415/911 G/R Gga/Aga rs770559603 1 1 BCAN HGNC HGNC:23059 protein_coding YES CCDS1149.1 ENSP00000331210 Q96GW7 UPI000006F0E9 NM_021948.4 tolerated(0.16) benign(0.013) 7/14 PROSITE_profiles:PS50313 MODERATE 1 SNV 1 1 5.766e-05 5.822e-05 0 0 0 0 0.0001059 0 0 panel_of_normals ACGGAGGAGGT bd948014-be86-4c11-8061-a96b8c73fa83 9f9d28db-babf-4851-a32f-f00f97c523f8 81dd6131-efa9-4bad-9539-93e15b8100a6 f96ab3fe-bb11-4585-a35e-52d400e55ab7 gdc_pon True Unknown
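One way to see for yourself whether a given MAF contains non-coding calls is to tally the Variant_Classification column. The snippet below is only a rough sketch: it assumes the file is tab-delimited with optional leading '#' comment lines, the set of labels treated as non-coding is my own choice for illustration, and the rows in the in-memory example (other than the BCAN coordinates from the sample above) are made up.

```python
import csv
import io
from collections import Counter

# Labels treated as non-coding here. This grouping is an assumption made
# for illustration; adjust it to whatever definition your analysis needs.
NONCODING = {"3'UTR", "5'UTR", "3'Flank", "5'Flank", "Intron", "IGR", "RNA"}


def tally_classifications(handle):
    """Count Variant_Classification values in a tab-delimited MAF stream."""
    # Skip leading '#' comment lines so the first remaining line is the header.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["Variant_Classification"] for row in reader)


def noncoding_fraction(counts):
    """Fraction of calls whose classification is in NONCODING."""
    total = sum(counts.values())
    noncoding = sum(n for label, n in counts.items() if label in NONCODING)
    return noncoding / total if total else 0.0


# Tiny made-up example; with a real file use open("your_file.maf") instead.
example = io.StringIO(
    "# hypothetical comment line\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tEnd_Position\tVariant_Classification\n"
    "BCAN\tchr1\t156651635\t156651635\tMissense_Mutation\n"
    "GENE_A\tchr2\t1000\t1000\tIntron\n"
    "GENE_B\tchr3\t2000\t2000\t3'UTR\n"
    "GENE_C\tchr4\t3000\t3000\tSilent\n"
)
counts = tally_classifications(example)
frac = noncoding_fraction(counts)
for label, n in counts.most_common():
    print(label, n)
print("non-coding fraction:", frac)
```

If nearly every row is a coding or splice-site class, the file is unlikely to help with enhancer analysis.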
Thanks for the reply, but I still do not have my answer: is the MAF file for WES mutations or for WGS mutations? Please clarify.
Could you also tell me how to download WGS mutation files for TCGA from the GDC?
A MAF file is just a list of mutations found in a given sample(s). You can find mutations in either WES or WGS data, so you can have a MAF file (or a VCF file) for either type of experiment. For WGS, you are looking at mutations across the whole genome, so you will likely have more mutations listed in the MAF compared to WES. To find WGS projects through the GDC, you can click on Projects or Repository, look under "Experimental Strategies," and check the box "WGS."
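Once you have a MAF that does cover non-coding regions, mapping mutations onto enhancers is an interval-overlap problem. Here is a minimal sketch, assuming you have enhancer coordinates in BED format on the same genome build as the MAF (the sample above is GRCh38). The function names and the enhancer interval are hypothetical; only the BCAN position comes from the sample row. Note that MAF positions are 1-based and inclusive while BED intervals are 0-based and half-open, so the start needs shifting by one.

```python
from collections import defaultdict


def load_bed(lines):
    """Read BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions[chrom].append((int(start), int(end)))
    return regions


def in_enhancer(regions, chrom, start_1based, end_1based):
    """True if a MAF interval (1-based, inclusive) overlaps any BED region."""
    start0 = start_1based - 1  # convert to 0-based, half-open [start0, end)
    return any(
        start0 < r_end and end_1based > r_start
        for r_start, r_end in regions.get(chrom, [])
    )


# Hypothetical enhancer interval, for illustration only.
enhancers = load_bed(["chr1\t156651600\t156651700\tenhancer_1\n"])

# The BCAN mutation from the sample row: chr1:156651635 (Start = End).
hit = in_enhancer(enhancers, "chr1", 156651635, 156651635)
miss = in_enhancer(enhancers, "chr2", 156651635, 156651635)
print(hit, miss)
```

The linear scan per chromosome is fine for a few thousand regions; for genome-wide enhancer sets an interval tree or a dedicated tool such as bedtools would scale better.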
But the WGS files are not open access; they are protected and controlled. Am I right?
Yes, any potentially identifiable data in GDC falls under controlled access. From GDC's documentation:
You can find information on accessing controlled data here: https://gdc.cancer.gov/node/205/