I'm a beginner with MutSigCV. I have run it successfully with the example files, but I really don't know how to produce the MAF, coverage, and covariates tables from raw sequencing data. I have 40 CRC whole-genome samples (cancer-normal pairs) and want to detect significantly mutated genes. Which software should I use, step by step, and what is the pipeline for running MutSigCV?
Thank you for your answer. I'm sorry to tell you that I can't access the script you provided for generating the MAF file. So I have two questions:
Thanks again for your help.
I assume that you have fastq files but nothing else? How comfortable are you with using Unix?
I only have the fastq files, and I work on Linux. What should I do next?
Okay. Since it sounds like you have just started dealing with NGS data, I would suggest you start by reading about some basic file formats that you will commonly encounter, like FASTQ and SAM. You will also be using a lot of tools, and some of them are essential (like samtools).
Anyway, the first step is to align your fastq files to the reference genome. There are many aligners, but the most commonly used one is bwa (at least for WGS and WES). This produces the BAM files (bwa's SAM output, sorted and converted with samtools) that you will use for detecting somatic variants.
There is a great tool that does all this for you in 2 or 3 commands: check out speedseq.
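If you would rather do the alignment step by hand instead of through speedseq, here is a rough Python sketch that only builds the shell commands for one sample (bwa mem piped into samtools sort, then samtools index). The function name, the sample and file names, and the read-group values are placeholders I made up for illustration, and the reference is assumed to have been indexed with bwa index beforehand. You would run it once for the tumor fastq files and once for the matched normal, which gives you one BAM for each.

```python
import shlex


def alignment_commands(sample, ref_fasta, fastq1, fastq2, threads=4):
    """Build (not run) shell commands that align one paired-end sample.

    All names here are hypothetical; adapt them to your own files.
    """
    # bwa expects the tab separators of the read group as a literal "\t".
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    bam = f"{sample}.sorted.bam"
    align = (
        f"bwa mem -t {threads} -R {shlex.quote(read_group)} "
        f"{shlex.quote(ref_fasta)} {shlex.quote(fastq1)} {shlex.quote(fastq2)} "
        f"| samtools sort -@ {threads} -o {shlex.quote(bam)} -"
    )
    index = f"samtools index {shlex.quote(bam)}"
    return [align, index]


# One call per sample: tumor and matched normal get separate BAM files.
for name, r1, r2 in [
    ("P01_tumor", "P01_tumor_1.fq.gz", "P01_tumor_2.fq.gz"),
    ("P01_normal", "P01_normal_1.fq.gz", "P01_normal_2.fq.gz"),
]:
    for cmd in alignment_commands(name, "ref.fa", r1, r2, threads=8):
        print(cmd)
```

The printed commands can be pasted into a shell or a job script. Keeping the SM tag different for tumor and normal matters, because downstream callers use it to tell the two samples apart.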
Thank you for your reply, poisonAlien. I have already called SNPs for another data set (51 cancer-normal pairs), using BWA and GATK for each sample.
As the next step, should I use VEP for annotation?
And then, how can I concatenate these files into one VCF, and should I use vcf2maf?
Hi, you need to give more info. As far as I know, GATK does not call somatic variants. Maybe try a dedicated somatic caller such as VarScan2 or MuTect. Then annotate the calls using annovar or VEP (I would suggest annovar since it's simple and easy to use). After annotation, the usual protocol is to remove variants commonly found in the general population (such as those found in the 1000 Genomes Project). Once you do this, what you are left with are candidate somatic variants, which you will use for MutSig.
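To make the population-filtering step concrete, here is a minimal Python sketch. It assumes the annotated variants have already been loaded as dictionaries and that the population allele frequency sits in a field I am calling pop_af; that field name, the 0.01 cutoff, and the toy records are all assumptions for illustration, not something annovar or VEP prescribes. A missing value (written here as ".") is treated as "not seen in the population database", so the variant is kept.

```python
def drop_common_variants(variants, max_pop_af=0.01, af_key="pop_af"):
    """Keep variants whose population allele frequency is missing or below the cutoff.

    af_key and max_pop_af are hypothetical defaults; match them to your annotation table.
    """
    kept = []
    for variant in variants:
        raw = variant.get(af_key)
        if raw in (None, "", "."):
            kept.append(variant)  # absent from the population database
        elif float(raw) < max_pop_af:
            kept.append(variant)  # present, but rare enough to keep
    return kept


# Toy records, made up for this example.
variants = [
    {"gene": "GENE_A", "pos": 100, "pop_af": "."},
    {"gene": "GENE_B", "pos": 200, "pop_af": "0.38"},
    {"gene": "GENE_C", "pos": 300, "pop_af": "0.0002"},
]
rare = drop_common_variants(variants)
print([v["gene"] for v in rare])  # ['GENE_A', 'GENE_C']
```

Using a frequency cutoff rather than dropping everything that has any database entry is a design choice: some genuine somatic hotspots are also listed in variant databases, so removing every annotated site can throw away real signal.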
Thanks for your reply again. I have 102 VCF files, one per sample, with SNPs called for each sample individually, so they are not somatic SNPs. Can I use these files for MutSigCV after annotating with Annovar, or should I do some other work first?
There is no point in using these for MutSig. These are not somatic variants (present in the cancer sample but not in the matched normal). You need to identify somatic variants first.
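Just to illustrate what "present in the cancer sample but not in the matched normal" means, here is a toy Python sketch that subtracts the normal sample's calls from the tumor sample's calls. This is only a picture of the definition, not a replacement for a somatic caller: tools like MuTect or VarScan2 look at the reads of both samples together, while a plain subtraction of per-sample calls will, for example, report a germline variant as somatic whenever the normal had too little coverage to call it. The function names and the example lines are made up.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # multi-allelic sites list ALTs comma-separated
            keys.add((chrom, int(pos), ref, allele))
    return keys


def tumor_only(tumor_lines, normal_lines):
    """Variants called in the tumor but not in the matched normal (naive subtraction)."""
    return sorted(variant_keys(tumor_lines) - variant_keys(normal_lines))


# Toy VCF lines, made up for this example.
tumor = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr2\t200\t.\tC\tT",
]
normal = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
]
print(tumor_only(tumor, normal))  # [('chr2', 200, 'C', 'T')]
```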
Thanks a lot, I have learned so much from you these days. SpeedSeq is a good tool for analyzing sequencing data. I will learn SpeedSeq and MuTect first, and if I have questions I will ask you. Thank you very much.
Dear poisonAlien,
I took your advice and used the SpeedSeq tool to call somatic SNPs, but for the speedseq somatic command I don't know how to create the tumor and normal BAM files from the raw WGS samples. Can you help me? Thanks.
Hello,
I stumbled across this thread while searching for something similar.
So if I understood you correctly, after SNP annotation (I'm using Haplotype Caller), I'm left with a VCF file, from which I need to remove all the annotated SNPs. This leaves me with only un-annotated variants (which are the putative somatic variants present in the tumor samples). This will be my input into MutSig. Am I right? Appreciate your reply. Thanks!
Hi, Alien
I am having some difficulties using MutSig. I was looking for solutions and found your answer here. I think you must be an expert in bioinformatics. Could you help me?
I don't have a coverage file, so I used the full coverage file provided by MutSig. I also followed the guide and ran MutSig with 6 arguments, like this:
But the program still tells me it cannot finish the categ discovery:
even though I included the chr files.
Do you know how to solve this problem?
Thank you very much!