Question

How to use MutSigCV correctly

9.5 years ago

bioxujintian

I'm a beginner with MutSigCV. I ran it successfully with the example files, but I really don't know how to produce the MAF, coverage, and covariates tables from raw sequencing data. I have 40 CRC whole-genome (cancer-normal) samples and want to detect significantly mutated genes. Which software should I use, step by step, and what is the pipeline for using MutSigCV?

Written 9.5 years ago by bioxujintian; updated 7.6 years ago by achristofferson

Ram · Answer 1 · 2015-11-04

9.5 years ago

poisonAlien

You don't strictly need your own coverage and covariates tables (MutSigCV ships with default versions of these files in case you don't have them). The MAF file, however, is required. Read about the MAF specification here.

For MutSig, 9 fields are necessary:

Hugo_Symbol, Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele1, Tumor_Seq_Allele2, Variant_Classification, Tumor_Sample_Barcode

You can use this script to generate a MAF file.
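In case the linked script is not reachable, here is a minimal Python sketch of writing those nine columns for simple substitutions. It assumes you already have annotated somatic calls, i.e. a gene symbol and a MAF-style Variant_Classification (such as `Missense_Mutation`) for each call; the gene and sample names in the example are hypothetical. Indels are deliberately rejected because VCF and MAF anchor them differently; for real data, a dedicated converter such as vcf2maf is the safer choice.

```python
import csv
import io

# The nine MAF columns listed above as required by MutSig.
MAF_COLUMNS = [
    "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
    "Reference_Allele", "Tumor_Seq_Allele1", "Tumor_Seq_Allele2",
    "Variant_Classification", "Tumor_Sample_Barcode",
]


def snv_to_maf_row(gene, chrom, pos, ref, alt, classification, sample):
    """Build one MAF row for an equal-length substitution (SNV/MNP).

    Assumes `pos` is the 1-based VCF POS and `classification` already
    comes from your annotation as a MAF Variant_Classification value.
    """
    if len(ref) != len(alt):
        raise ValueError("only equal-length substitutions are handled here")
    return {
        "Hugo_Symbol": gene,
        "Chromosome": chrom,
        "Start_Position": pos,
        "End_Position": pos + len(ref) - 1,
        "Reference_Allele": ref,
        # Heterozygous call: allele 1 is the reference, allele 2 the variant.
        "Tumor_Seq_Allele1": ref,
        "Tumor_Seq_Allele2": alt,
        "Variant_Classification": classification,
        "Tumor_Sample_Barcode": sample,
    }


def write_maf(rows, handle):
    """Write rows as a tab-separated MAF with a single header line."""
    writer = csv.DictWriter(handle, fieldnames=MAF_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical call, for illustration only.
    buf = io.StringIO()
    write_maf([snv_to_maf_row("GENE_A", "1", 1000, "C", "T",
                              "Missense_Mutation", "sample01_tumor")], buf)
    print(buf.getvalue(), end="")
```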

Written 9.5 years ago by poisonAlien; updated 6.5 years ago by Ram

Thank you for your answer. Unfortunately, I can't access the script you linked for generating the MAF file, so I have two questions:

1. How can I produce a MAF table from raw (cancer-normal) sequencing data? Should I use BWA, GATK, and other software? Can you tell me the pipeline for producing the MAF table?
2. How can I get the coverage and covariates tables?

Thanks again for your help.

Written 9.5 years ago by bioxujintian; updated 6.5 years ago by Ram

I assume you have FASTQ files but nothing else? How comfortable are you with Unix?

Written 9.5 years ago by poisonAlien

I only have the FASTQ files, and I work on Linux. What should I do next?

Written 9.5 years ago by bioxujintian

Okay. Since it sounds like you have just started working with NGS data, I suggest you start by reading about the basic file formats you will commonly encounter, such as FASTQ and SAM. You will also be using a lot of tools, some of which are essential (like samtools).

Anyway, the first step is to align your FASTQ files to the reference genome. There are many aligners, but the most commonly used one is BWA (at least for WGS and WES). BWA writes SAM output, which you sort and convert to BAM files with samtools; you will use these BAM files to detect somatic variants.
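The alignment step can be sketched in Python as below, assuming paired-end FASTQ files and a BWA-indexed reference at a hypothetical path `ref/genome.fa`. Tumor and matched normal are aligned separately, each with its own read group, so you end up with one sorted BAM per sample. All sample and file names are made up, and the script only prints the commands unless you call `run_alignment` yourself.

```python
import shlex
import subprocess

REFERENCE = "ref/genome.fa"  # hypothetical path; index it first with `bwa index`


def align_cmds(sample, fq1, fq2, threads=8):
    """Return (bwa_cmd, sort_cmd) argument lists for one sample.

    bwa mem writes SAM to stdout; samtools sort reads it from stdin ('-')
    and writes a coordinate-sorted BAM named after the sample.
    """
    # bwa expands the literal "\t" escapes in the read-group string.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    bwa = ["bwa", "mem", "-t", str(threads), "-R", read_group,
           REFERENCE, fq1, fq2]
    sort = ["samtools", "sort", "-o", f"{sample}.sorted.bam", "-"]
    return bwa, sort


def run_alignment(sample, fq1, fq2, threads=8):
    """Run `bwa mem ... | samtools sort ...` for one sample."""
    bwa, sort = align_cmds(sample, fq1, fq2, threads)
    p1 = subprocess.Popen(bwa, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(sort, stdin=p1.stdout)
    p1.stdout.close()  # lets bwa get SIGPIPE if samtools exits early
    sort_rc = p2.wait()
    bwa_rc = p1.wait()
    if sort_rc or bwa_rc:
        raise RuntimeError(f"alignment failed for {sample}")


if __name__ == "__main__":
    # Hypothetical tumor/normal FASTQ pairs for one patient.
    pairs = {
        "CRC01_tumor": ("CRC01_T_R1.fastq.gz", "CRC01_T_R2.fastq.gz"),
        "CRC01_normal": ("CRC01_N_R1.fastq.gz", "CRC01_N_R2.fastq.gz"),
    }
    for sample, (r1, r2) in pairs.items():
        bwa, sort = align_cmds(sample, r1, r2)
        print(shlex.join(bwa), "|", shlex.join(sort))
```

After sorting, index each BAM with `samtools index` before passing the tumor and normal BAMs to a somatic caller.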

There is a great tool that does all of this for you in two or three commands: check out SpeedSeq.

Written 9.5 years ago by poisonAlien; updated 5.4 years ago by Ram

Thank you for your reply, poisonAlien. I have called SNPs for another dataset (51 cancer-normal pairs) using BWA and GATK on each sample.

As a next step, should I use VEP for annotation?

Then, how can I concatenate these files into a single VCF, and should I use vcf2maf?

Written 9.5 years ago by bioxujintian; updated 5.4 years ago by Ram

Hi, you need to give more information. As far as I know, GATK does not call somatic variants. Maybe try a more sophisticated somatic caller such as VarScan2 or MuTect. Then annotate the calls with ANNOVAR or VEP (I would suggest ANNOVAR since it's simple and easy to use). After annotation, the usual protocol is to remove variants commonly found in the general population (such as those found by the 1000 Genomes Project). What you are left with are candidate somatic variants, which you will use for MutSig.
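The population-filtering step can be sketched in Python as below. It assumes both your somatic calls and the population sites (for example, parsed from a 1000 Genomes sites VCF) are already reduced to (chromosome, position, ref, alt) tuples; loading those files is left out. A real filter would usually also apply a population allele-frequency cutoff rather than dropping every known site.

```python
def variant_key(chrom, pos, ref, alt):
    """Normalize a variant to a comparable key (drops any 'chr' prefix)."""
    chrom = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (chrom, int(pos), ref.upper(), alt.upper())


def remove_population_variants(candidates, population_keys):
    """Keep candidate calls whose normalized key is not a known population site.

    `candidates` are (chrom, pos, ref, alt) tuples; `population_keys` is a
    set built with variant_key() so that 'chr1' and '1' compare equal.
    """
    return [v for v in candidates if variant_key(*v) not in population_keys]
```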

Written 9.5 years ago by poisonAlien; updated 5.4 years ago by Ram

Thanks again for your reply. I have 102 VCF files, one per sample, with SNPs called on each sample individually; they are not somatic SNPs. Can I use these files for MutSigCV after annotating them with ANNOVAR, or do I need to do something else first?

Written 9.5 years ago by bioxujintian

There is no point in using these for MutSig. They are not somatic variants (variants present in the cancer sample but not in the matched normal). You need to identify somatic variants first.
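To make that definition concrete, here is a naive Python illustration: a call is kept only if it appears in the tumor but not in the matched normal. This is a sketch of the idea only, not a replacement for MuTect or VarScan2. A site can be absent from the normal's calls simply because it had low coverage there, and dedicated somatic callers model exactly that kind of evidence.

```python
def naive_somatic(tumor_calls, normal_calls):
    """Return calls seen in the tumor but not in its matched normal.

    Calls are hashable (chrom, pos, ref, alt) tuples. Illustrates the
    definition only: coverage and low allele fractions are ignored.
    """
    return sorted(set(tumor_calls) - set(normal_calls))
```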

Written 9.5 years ago by poisonAlien; updated 5.4 years ago by Ram

Thanks a lot, I have learned so much from you these days. SpeedSeq looks like a good tool for analyzing sequencing data. I will learn SpeedSeq and MuTect first, and if I have questions, I will ask you. Thank you very much.

Written 9.5 years ago by bioxujintian

Dear poisonAlien,

I took your advice to use SpeedSeq to call somatic SNPs, but when running the speedseq somatic command, I don't know how to create the separate tumor and normal BAM files from the raw WGS samples. Can you help me? Thanks.

Written 9.5 years ago by bioxujintian

Hello,

I stumbled across this thread while searching for something similar.

So if I understood you correctly, after SNP annotation (I'm using HaplotypeCaller), I'm left with a VCF file from which I need to remove all the annotated SNPs. This leaves me with only un-annotated variants (which are the putative somatic variants present in the tumor samples), and these will be my input to MutSig. Am I right? I'd appreciate your reply. Thanks!

Written 7.9 years ago by apuhegde

Hi Alien,
I'm having some difficulty using MutSig. While looking for solutions, I found your answer here. You seem to be an expert in bioinformatics; could you help me?
I don't have a coverage file, so I used the full coverage file provided with MutSig. Following the guide, I ran MutSig with 6 arguments like this:

MutSigCV('F:LUSC.MutSigCV.input.data.v1.0\LUSC.maf', 'F:exome_full192.coverage.txt', 'F:LUSC.MutSigCV.input.data.v1.0\gene.covariates.txt', 'F:LUSC.MutSigCV.input.data.v1.0\output.txt', 'F:mutation_type_dictionary_file.txt', 'F:chr_files_hg19')

But the program still reports that it cannot perform category discovery:

NOTE: unable to perform category discovery, because no chr_files available.Will use two categories: missense and null+indel

even though I included the chr files.
Do you know how to solve this problem?
Thank you very much!
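One thing that may be worth ruling out (not confirmed as the cause here): on Windows, a path such as `F:chr_files_hg19` has a drive letter but no backslash after it, so it is resolved against the current directory on drive F: rather than the root of F:. The Python sketch below flags such drive-relative paths and lists the `chr*.txt` files in a directory. The `chr*.txt` naming is an assumption based on the chr_files_hg19 archive distributed with MutSigCV, so check it against your copy.

```python
import ntpath
import os


def is_drive_relative(path):
    """True for Windows paths like 'F:chr_files_hg19' (drive but no root).

    Such a path resolves against the current directory on that drive.
    """
    drive, rest = ntpath.splitdrive(path)
    return bool(drive) and not rest.startswith(("\\", "/"))


def list_chr_files(chr_dir):
    """Return sorted chr*.txt file names in chr_dir (empty if it is missing)."""
    if not os.path.isdir(chr_dir):
        return []
    return sorted(f for f in os.listdir(chr_dir)
                  if f.startswith("chr") and f.endswith(".txt"))


if __name__ == "__main__":
    chr_dir = "F:chr_files_hg19"  # the argument used in the post above
    if is_drive_relative(chr_dir):
        print(f"{chr_dir!r} is drive-relative; try an absolute path such as F:\\chr_files_hg19")
    print("chr files found:", list_chr_files(chr_dir) or "none")
```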

Written 7.7 years ago by shiyang93; updated 6.5 years ago by Ram

score 0 · Answer 2 · 2017-08-28

7.6 years ago

achristofferson

CovGen can help with making a capture/target-specific coverage table. The default coverage table that MutSig provides may not be appropriate for all cohorts.

Written 7.6 years ago by achristofferson

Thank you for the pointer to CovGen. I am working with canine data, and I was able to generate the MutSigCV-formatted coverage file using CovGen.

Do you have any directions for generating the gene.covariates.txt file required by MutSig? Also, my data is WGS, so any pointers on that would be much appreciated.

Written 6.8 years ago by sutturka

Could you please tell me how to generate coverage files using CovGen? Thank you very much!

Written 5.4 years ago by 220173404