Mutations Not Recognized in MuSiC
9.3 years ago

Hi,

I am trying to use MuSiC to analyse mutation rates in novel, non-coding genes. I am able to successfully run the relevant commands in MuSiC and the coverage statistics look correct, but the results show no mutations in any genes (which I know isn't true). My guess is that there is probably some formatting issue with the .maf file containing somatic mutations, which is causing the output of the "bmr calc-bmr" to be inaccurate.

Here are the first few lines of my .maf file

#version 2.3
Hugo_Symbol    Entrez_Gene_Id    Center    NCBI_Build    Chromosome    Start_Position    End_Position    Strand    Variant_Classification    Variant_Type    Reference_Allele    Tumor_Seq_Allele1    Tumor_Seq_Allele2    dbSNP_RS    dbSNP_Val_Status    Tumor_Sample_Barcode    Matched_Norm_Sample_Barcode    Match_Norm_Seq_Allele1    Match_Norm_Seq_Allele2    Tumor_Validation_Allele1    Tumor_Validation_Allele2    Match_Norm_Validation_Allele1    Match_Norm_Validation_Allele2    Verification_Status    Validation_Status    Mutation_Status    Sequencing_Phase    Sequence_Source    Validation_Method    Score    BAM_File    Sequencer    Tumor_Sample_UUID    Matched_Norm_Sample_UUID
Unknown    0    genome.wustl.edu    GRCh37-lite    1    322115    322115    +    Targeted_Region    SNP    G    A    G    NA    NA    TCGA-E2-A15K    TCGA-E2-A15K    G    G    NA    NA    NA    NA    Unknown    Unknown    Somatic    PhaseI    WGS    No    NA    NA    Illumina    f289e8b7-68db-48b9-8dcc-1349269eb54b    c24945be-a051-4797-b7e6-09b32396f354
Unknown    0    genome.wustl.edu    GRCh37-lite    1    328193    328193    +    Targeted_Region    SNP    A    A    G    NA    NA    TCGA-E2-A15K    TCGA-E2-A15K    A    A    NA    NA    NA    NA    Unknown    Unknown    Somatic    PhaseI    WGS    No    NA    NA    Illumina    f289e8b7-68db-48b9-8dcc-1349269eb54b    c24945be-a051-4797-b7e6-09b32396f354
Unknown    0    genome.wustl.edu    GRCh37-lite    1    384901    384901    +    Targeted_Region    SNP    G    A    G    NA    NA    TCGA-E2-A15K    TCGA-E2-A15K    G    G    NA    NA    NA    NA    Unknown    Unknown    Somatic    PhaseI    WGS    No    NA    NA    Illumina    f289e8b7-68db-48b9-8dcc-1349269eb54b    c24945be-a051-4797-b7e6-09b32396f354
Unknown    0    genome.wustl.edu    GRCh37-lite    1    390657    390657    +    Targeted_Region    SNP    A    A    G    NA    NA    TCGA-E2-A15K    TCGA-E2-A15K    A    A    NA    NA    NA    NA    Unknown    Unknown    Somatic    PhaseI    WGS    No    NA    NA    Illumina    f289e8b7-68db-48b9-8dcc-1349269eb54b    c24945be-a051-4797-b7e6-09b32396f354
Unknown    0    genome.wustl.edu    GRCh37-lite    1    404577    404577    +    Targeted_Region    SNP    G    A    G    NA    NA    TCGA-E2-A15K    TCGA-E2-A15K    G    G    NA    NA    NA    NA    Unknown    Unknown    Somatic    PhaseI    WGS    No    NA    NA    Illumina    f289e8b7-68db-48b9-8dcc-1349269eb54b    c24945be-a051-4797-b7e6-09b32396f354

Here are the music commands that I am using:

genome music bmr calc-covg --bam-list /path/to/bam.list --output-dir /path/to/output_folder --reference-sequence /path/to/GRCh37-lite.fa --roi-file /path/to/gene_coordinates.bed
genome music bmr calc-bmr --bam-list /tcga/users/cdwarden/wgs/BRCA/MuSiC/bam.list --maf-file /path/to/somatic.maf --output-dir /path/to/output_folder --reference-sequence /path/to/GRCh37-lite.fa --roi-file /path/to/gene_coordinates.bed
genome music smg --gene-mr-file /path/to/gene_mrs --output-file /path/to/smgs

I have also tried adding the transcript ID to the first mutation in the .maf file (so that I would expect to see one mutation in the smgs_detailed file), but that gene still is reported to have 0 mutations.

Can you please help me troubleshoot this issue?

Thanks,
Charles

maf mutation music DNA-Seq • 3.2k views

I think it's because the Hugo_Symbol values are Unknown in your MAF file.


I changed the transcript ID for the first mutation to match the corresponding gene, and that gene was still reported to not have any mutations. Also, I used "Unknown" (instead of NA, etc.) because that is what I thought the .maf format required for such genes.

Is there something else that should be changed besides "Unknown"?


I used this program a while back, and as I understand it, the gene names in the MAF file must match the gene names in your ROI file, which you use for the calc-covg step. Also, it skips silent variants (based on the Variant_Classification column) unless you tell it not to. In your example, I see that the variants have Hugo_Symbol set to Unknown and Variant_Classification set to Targeted_Region, which might be one reason.
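To check the name-matching point, something like the following sketch could list the Hugo_Symbol values that never appear in the ROI file. It assumes the ROI file is tab-delimited with the gene name in the fourth column (chromosome, start, stop, name); the function names are made up for illustration and are not part of MuSiC.

```python
import csv

def maf_records(lines):
    """Yield one dict per MAF row, skipping '#' comment lines such as '#version 2.3'."""
    data = (line for line in lines if not line.startswith("#"))
    yield from csv.DictReader(data, delimiter="\t", quoting=csv.QUOTE_NONE)

def roi_gene_names(lines):
    """Collect gene names from the 4th column of the ROI file (assumed layout)."""
    names = set()
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) >= 4 and not fields[0].startswith("#"):
            names.add(fields[3])
    return names

def unmatched_symbols(maf_lines, roi_lines):
    """Return the sorted Hugo_Symbol values with no same-named region in the ROI file."""
    symbols = {row["Hugo_Symbol"] for row in maf_records(maf_lines)}
    return sorted(symbols - roi_gene_names(roi_lines))
```

Both arguments can be open file handles, for example `unmatched_symbols(open("somatic.maf"), open("gene_coordinates.bed"))`. Any name it returns (such as Unknown) is a symbol that cannot be matched back to a region by name.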


This is correct. The Hugo_Symbol needs to be properly defined. These calls seem to be annotated incorrectly as Targeted_Region, which is something that MuSiC skips as intergenic. Considering that the MAF says WGS, these might be legitimately intergenic calls. Check in a genome browser.
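Before opening a genome browser, a quick tally of the Variant_Classification column can show how many calls carry a label such as Targeted_Region. This is only a rough sketch; `classification_counts` is a hypothetical helper, and it assumes a tab-delimited MAF whose header row contains Variant_Classification.

```python
from collections import Counter

def classification_counts(maf_lines):
    """Count how often each Variant_Classification value occurs in a MAF."""
    counts, column = Counter(), None
    for line in maf_lines:
        # Skip '#' comment lines (e.g. '#version 2.3') and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if column is None:
            # The first non-comment line is the header row.
            column = fields.index("Variant_Classification")
            continue
        counts[fields[column]] += 1
    return counts
```

Calling it on an open file handle, e.g. `classification_counts(open("somatic.maf"))`, returns a `Counter` keyed by classification.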


Yes - I want to characterize mutation rates in ncRNAs (most of which will not be covered in exome designs, and many of which are novel).

What would you recommend for the Variant_Classification and Variant_Type, in this situation?


You can refer to the documentation here. When you run music bmr calc-bmr, enable the option --noskip-non-coding. You'll still need to annotate each variant with a symbol that can be matched back to a region in your ROI file. The MAF format does not distinguish between ncRNA types, so Variant_Classification will always say RNA. But if you give the genes distinct names, using an annotator like VEP, you should be fine. Have you tried the maf2maf tool?
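For novel genes that an annotator does not know about, one alternative (not VEP or maf2maf, just a plain coordinate lookup) would be to copy the region name from the ROI file into Hugo_Symbol. The sketch below assumes a four-column, tab-delimited ROI file (chromosome, start, stop, name) with 1-based inclusive coordinates; `load_rois` and `annotate_symbols` are hypothetical names.

```python
def load_rois(lines):
    """Parse ROI lines into (chrom, start, stop, gene) tuples (assumed 4-column layout)."""
    rois = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 4 and not line.startswith("#"):
            rois.append((fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return rois

def annotate_symbols(maf_lines, rois):
    """Return MAF lines with Hugo_Symbol set to the first overlapping ROI name, if any."""
    out, header = [], None
    for line in maf_lines:
        # Pass comment and blank lines through unchanged.
        if line.startswith("#") or not line.strip():
            out.append(line)
            continue
        fields = line.rstrip("\r\n").split("\t")
        if header is None:
            # First non-comment line: map column names to positions.
            header = {name: i for i, name in enumerate(fields)}
            out.append(line)
            continue
        chrom = fields[header["Chromosome"]]
        start = int(fields[header["Start_Position"]])
        end = int(fields[header["End_Position"]])
        for roi_chrom, roi_start, roi_stop, gene in rois:
            # Closed-interval overlap test between the variant and the region.
            if chrom == roi_chrom and start <= roi_stop and end >= roi_start:
                fields[header["Hugo_Symbol"]] = gene
                break
        out.append("\t".join(fields) + "\n")
    return out
```

Variants outside every region keep their original symbol. The lookup is a linear scan per variant, which is fine for a small ROI list but would need an interval index for genome-wide sets.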


Thank you very much!


I have also been wondering how to prioritize such intergenic/intronic SNVs.

9.2 years ago

Thanks to Cyriac, I found the solution is as follows:

1) Set Variant_Classification to RNA

2) Use the "--noskip-non-coding" option when running music bmr calc-bmr
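Step 1 can be scripted rather than edited by hand. The sketch below rewrites Variant_Classification to RNA only for rows currently labelled Targeted_Region (the label in the example MAF above); the `relabel` parameter and the function name are assumptions for illustration, so adjust them to whatever labels your own file uses.

```python
def relabel_as_rna(maf_lines, relabel=("Targeted_Region",)):
    """Return MAF lines with Variant_Classification set to 'RNA' for the chosen labels."""
    out, column = [], None
    for line in maf_lines:
        # Keep '#' comment lines (e.g. '#version 2.3') and blank lines as they are.
        if line.startswith("#") or not line.strip():
            out.append(line)
            continue
        fields = line.rstrip("\r\n").split("\t")
        if column is None:
            # Header row: locate the Variant_Classification column.
            column = fields.index("Variant_Classification")
            out.append(line)
            continue
        if fields[column] in relabel:
            fields[column] = "RNA"
        out.append("\t".join(fields) + "\n")
    return out
```

The returned lines can be written to a new file with `writelines`, and that file is then what gets passed to --maf-file when running music bmr calc-bmr with --noskip-non-coding (step 2).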
