I am pretty new to the cancer field and am now trying to run MuSiC from WashU to detect significantly mutated genes in our exome-seq samples. Since I am not sure whether I am doing the right things, I would appreciate it if someone could take a look at the tools I am using. There is probably a lot of room for improvement. Here are the steps I am running to get significantly mutated genes:
1. FastQ files
2. Map to the reference genome with BWA
3. Mark duplicates with Picard tools
4. Use GATK for indel realignment and base recalibration
5. samtools mpileup for each normal and tumor sample
6. Somatic variant calling. I use VarScan somatic for this.
7. snpEff to annotate the mutations and their consequences
8. Convert the VCF file from the previous step to a MAF file and filter for consequences that very likely change the gene function (e.g. missense)
9. Combine the filtered MAF files from all samples
10. Run MuSiC bmr calc-covg
11. Run MuSiC bmr calc-bmr
12. Run MuSiC smg
Steps 8 and 9 especially seem to be tricky. I am not sure whether I am missing some straightforward way of getting from called mutations to MAF files.
Looks good. In step 8, do not "filter for consequences that very likely change the gene function". That's too stringent, and you might miss something novel or non-coding. Rather, reduce the noise from false-positive variants using tools like this. MuSiC's calc-bmr step will exclude Silent (synonymous) SNVs by default. Steps 7 and 8 can be solved using this script. And step 9 shouldn't be tricky... simply concatenate the MAFs.
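For the "simply concatenate the MAFs" part, here is a minimal Python sketch. It assumes each per-sample MAF is tab-delimited, may start with `#` comment lines such as `#version 2.4`, and has a column header row beginning with `Hugo_Symbol`. The function name `concat_mafs` and the file names in the usage note are made up for illustration; this is not part of MuSiC.

```python
def concat_mafs(maf_paths, out_path):
    """Concatenate MAF files, writing the column header only once.

    Assumes (not verified here) that every input has the same columns,
    in the same order, and a header row starting with "Hugo_Symbol".
    Returns the number of variant rows written.
    """
    header = None
    n_rows = 0
    with open(out_path, "w") as out:
        for path in maf_paths:
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    # Skip blank lines and "#version"-style comment lines.
                    if not line.strip() or line.startswith("#"):
                        continue
                    if line.startswith("Hugo_Symbol"):
                        if header is None:
                            header = line
                            out.write(line + "\n")
                        elif line != header:
                            raise ValueError(f"header mismatch in {path}")
                        continue  # drop repeated headers
                    out.write(line + "\n")
                    n_rows += 1
    return n_rows
```

Usage would look like `concat_mafs(["sample1.maf", "sample2.maf"], "cohort.maf")`. The header check is there because silently stacking MAFs with different column orders would give MuSiC a file that parses but is wrong.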
If any of the resulting SMGs (significantly mutated genes) don't make sense, then take a closer look at their variants. This is a good way to weed out recurrent false-positives - usually germline calls that are incorrectly called somatic for reasons like amplification bias, or artifacts in the reference sequence like misplaced paralogs. You can also try calc-bmr with an option called --separate-truncations, which prioritizes truncating variants in the math. "Truncations" include frame-shift, nonsense, and splice-site mutations.
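One way to "take a closer look" at an odd-looking SMG is to list variants that recur at exactly the same position and allele across samples, since those are the ones most worth checking by eye. The sketch below is only an illustration: it assumes the standard MAF column names `Hugo_Symbol`, `Chromosome`, `Start_Position`, `Reference_Allele`, `Tumor_Seq_Allele2` and `Tumor_Sample_Barcode`, and the function name `recurrent_variants` and the `min_samples` threshold are hypothetical choices, not anything MuSiC provides.

```python
import csv
from collections import defaultdict


def recurrent_variants(maf_path, gene, min_samples=2):
    """List variants in `gene` that occur in at least `min_samples` samples.

    Returns (variant, sample_count) pairs sorted by descending count, where
    variant is a (chromosome, start, ref_allele, alt_allele) tuple of strings.
    """
    samples_by_variant = defaultdict(set)
    with open(maf_path, newline="") as handle:
        # Drop "#version"-style comment lines before the header row.
        rows = (line for line in handle if not line.startswith("#"))
        for row in csv.DictReader(rows, delimiter="\t"):
            if row["Hugo_Symbol"] != gene:
                continue
            variant = (
                row["Chromosome"],
                row["Start_Position"],
                row["Reference_Allele"],
                row["Tumor_Seq_Allele2"],
            )
            samples_by_variant[variant].add(row["Tumor_Sample_Barcode"])
    hits = [
        (variant, len(samples))
        for variant, samples in samples_by_variant.items()
        if len(samples) >= min_samples
    ]
    return sorted(hits, key=lambda hit: -hit[1])
```

A recurrent variant is not automatically a false positive (real hotspots recur too), so this only narrows down what to inspect in the alignments.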
Sorry for asking this as a comment. I have data from 50 patients, all from targeted capture of around 120 genes. My question is: is it OK to use MuSiC for targeted capture data? I guess it would be biased to study SMGs from targeted capture; I just wanted an opinion on this.
Yea, that's totally fine. MuSiC's SMG test was meant to shortlist genes significantly altered in exome-seq, so that you could then target them for capture on larger cohorts. But when your ROI file (regions of interest) lists only about 120 genes, then the SMG test will at least help you rank them in order of significance.
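Before running the SMG test on a small panel, it can help to check what the ROI file actually covers. The sketch below assumes the ROI file is tab-delimited with four columns (chromosome, start, stop, gene name), with 1-based inclusive coordinates and no overlapping regions within a gene; please check these assumptions against the MuSiC documentation for your version. The function name `roi_gene_lengths` is hypothetical.

```python
from collections import defaultdict


def roi_gene_lengths(roi_path):
    """Sum the covered bases per gene in a 4-column ROI file.

    Assumes columns chrom, start, stop, gene with 1-based inclusive
    coordinates, and that regions of the same gene do not overlap.
    """
    lengths = defaultdict(int)
    with open(roi_path) as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            chrom, start, stop, gene = line.rstrip("\r\n").split("\t")[:4]
            lengths[gene] += int(stop) - int(start) + 1
    return dict(lengths)
```

`len(roi_gene_lengths("panel.roi"))` then gives the number of genes in the panel (the file name is just an example), and very short genes stand out as ones where a single mutation can swing the ranking.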
I have mutation calls for both a set of matched tumor-normal pairs and a set of tumor-only samples (called using a panel-of-normals approach). I doubt that MuSiC2 can be used to assess mutational significance for tumor-only samples, but if you have any suggestions for doing so, or for comparable tools such as OncodriveFM, that would be a great help.
Thanks,
Samir