MutSigCV input files
haiying.kong, 7.7 years ago

From the MutSigCV documentation: http://software.broadinstitute.org/cancer/software/genepattern/modules/docs/MutSigCV

I know you can use the datasets that come with the software, but the results will not be as good as when you provide details from your own data. So I am trying to provide that information from my own data.

But the category definitions are very confusing. How are these defined?

  1. CpG transitions

  2. CpG transversions

  3. C:G transitions

  4. C:G transversions

  5. A:T transitions

  6. A:T transversions

  7. null+indel mutations

Category 7 is clear. (1) How is CpG defined? Does a site count as CpG as long as ref_allele is C/G and an adjacent nucleotide is G/C, or does it have to be in a CpG island? (2) What do C:G and A:T mean?
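To make question (1) concrete, here is a minimal sketch of the "dinucleotide" reading: a C counts as CpG when the next base on the same strand is G, and a G counts as CpG when the previous base is C, with no CpG-island annotation involved. This is an assumption about what MutSigCV means, not something the documentation confirms; the function name `in_cpg` is hypothetical.

```python
def in_cpg(seq: str, i: int) -> bool:
    """Return True if seq[i] is part of a CpG dinucleotide on this strand.

    Assumed rule (hypothetical, not taken from the MutSigCV docs):
    C followed by G, or G preceded by C. CpG islands are not considered.
    """
    seq = seq.upper()
    base = seq[i]
    if base == "C":
        return i + 1 < len(seq) and seq[i + 1] == "G"
    if base == "G":
        return i > 0 and seq[i - 1] == "C"
    return False


ref = "ACGTTGCA"
print([i for i in range(len(ref)) if in_cpg(ref, i)])  # [1, 2]
```

Under this reading, the G at index 5 (preceded by T) and the C at index 6 (followed by A) are ordinary C:G positions, not CpG.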

Looking at the dataset that comes with the software, exome_full192.coverage.txt:

    gene    effect      categ       coverage
    A1BG    noncoding   A(A->C)A    12
    A1BG    noncoding   A(A->C)C    14
    A1BG    noncoding   A(A->C)G    15
    A1BG    noncoding   A(A->C)T    9
    A1BG    noncoding   A(A->G)A    12
    A1BG    noncoding   A(A->G)C    14
    A1BG    noncoding   A(A->G)G    15
    A1BG    noncoding   A(A->G)T    9

The categ column here does not follow the category scheme used in the other input datasets.
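The categ labels in this file appear to encode a trinucleotide context: left neighbor, then (ref->alt), then right neighbor. Assuming that reading (my interpretation, not stated in the docs), enumerating every combination gives 4 x 4 x 3 x 4 = 192 labels, which would match the "full192" in the filename, and the first eight come out in the same order as the rows quoted above. The helper names `full192_labels` and `parse_label` are hypothetical.

```python
import re
from itertools import product

BASES = "ACGT"
LABEL_RE = re.compile(r"([ACGT])\(([ACGT])->([ACGT])\)([ACGT])")


def full192_labels() -> list[str]:
    """All 'L(R->A)R' labels: left, ref, alt (!= ref), right; right varies fastest."""
    return [
        f"{left}({ref}->{alt}){right}"
        for left, ref, alt, right in product(BASES, repeat=4)
        if alt != ref
    ]


def parse_label(label: str) -> tuple[str, str, str, str]:
    """Split e.g. 'A(A->C)G' into (left, ref, alt, right)."""
    m = LABEL_RE.fullmatch(label)
    if m is None:
        raise ValueError(f"not a full192-style label: {label!r}")
    return m.group(1), m.group(2), m.group(3), m.group(4)


labels = full192_labels()
print(len(labels))              # 192
print(parse_label("A(A->C)G"))  # ('A', 'A', 'C', 'G')
```

If this reading is right, each full192 label can be collapsed into one of the seven categories listed above by looking at the ref base, the alt base, and the two neighbors.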

What does coverage mean here? Is it the tumor alternative-allele count? The documentation is very confusing on this point.
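One way to probe the coverage question using only the eight rows quoted above: group them by context (left, ref, right) while ignoring the alt allele, and check whether the value changes between A->C and A->G. In these rows it never does, which is at least consistent with the column counting reference positions in each context rather than alt-allele reads. Eight rows from one gene are a tiny sample, so treat this as a hint to check against the documentation, not an answer. The function name `coverage_by_context` is hypothetical.

```python
from collections import defaultdict

# The eight rows quoted above from exome_full192.coverage.txt.
ROWS = """A1BG noncoding A(A->C)A 12
A1BG noncoding A(A->C)C 14
A1BG noncoding A(A->C)G 15
A1BG noncoding A(A->C)T 9
A1BG noncoding A(A->G)A 12
A1BG noncoding A(A->G)C 14
A1BG noncoding A(A->G)G 15
A1BG noncoding A(A->G)T 9"""


def coverage_by_context(text: str) -> dict[tuple[str, str, str], set[int]]:
    """Map (gene, effect, left+ref+right) to the set of coverage values seen."""
    seen: dict[tuple[str, str, str], set[int]] = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        gene, effect, categ, cov = line.split()
        # categ looks like 'A(A->C)G': index 0 = left, 2 = ref, 7 = right.
        context = categ[0] + categ[2] + categ[7]
        seen[(gene, effect, context)].add(int(cov))
    return dict(seen)


result = coverage_by_context(ROWS)
for key in sorted(result):
    print(key, sorted(result[key]))
# One value per context means the number did not depend on the alt allele.
print(all(len(values) == 1 for values in result.values()))  # True
```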

haiying.kong, 7.7 years ago

To assign categ to the mutations, I interpreted the terms as follows:

  1. CpG: reference allele C/G with an adjacent G/C.

  2. C:G: reference allele in a C:G base pair.

  3. A:T: reference allele in an A:T base pair.
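The interpretation above can be sketched as a small classifier for single-base substitutions. Two assumptions are mine, not MutSigCV's: "adjacent" is read strand-aware (ref C with G on its 3' side, or ref G with C on its 5' side), and whether a mutation is null or an indel (category 7) is decided by the caller from the effect annotation. Transitions are A<->G and C<->T; every other substitution is a transversion. The function name `categ7` is hypothetical.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def categ7(left: str, ref: str, alt: str, right: str,
           null_or_indel: bool = False) -> int:
    """Map one mutation to categories 1-7 as numbered in the question.

    Hypothetical interpretation: CpG = ref C followed by G, or ref G preceded
    by C; C:G = any other ref C or G; A:T = ref A or T.
    """
    if null_or_indel:
        return 7  # null+indel, decided by the caller
    left, ref, alt, right = (b.upper() for b in (left, ref, alt, right))
    if ref == alt:
        raise ValueError("ref and alt must differ")
    transition = (ref, alt) in TRANSITIONS
    cpg = (ref == "C" and right == "G") or (ref == "G" and left == "C")
    if cpg:
        return 1 if transition else 2  # CpG transition / transversion
    if ref in ("C", "G"):
        return 3 if transition else 4  # C:G transition / transversion
    return 5 if transition else 6      # A:T transition / transversion


print(categ7("A", "C", "T", "G"))  # 1: C->T with G on the 3' side
print(categ7("A", "C", "A", "A"))  # 4: C->A outside a CpG
print(categ7("A", "T", "G", "A"))  # 6: T->G
```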

For the coverage data, I removed all mutations with the value "null" in the effect column. Following the instructions on the website (http://software.broadinstitute.org/cancer/software/genepattern/modules/docs/MutSigCV), I built a wide coverage table in which each patient has its own coverage column. Then I got an error message saying something like "table is too wide" (the machine is down, so I cannot copy the exact error). So I tried a long table instead, with the columns gene, effect, categ, patient, coverage. The machine crashed, and I have to wait until the admin is back on Monday.
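For the wide-to-long step, a streaming conversion avoids holding the whole table in memory at once: read one wide row, emit one long row per patient, and move on. This sketch assumes a tab-delimited wide file whose header is gene, effect, categ followed by one column per patient; that layout and the function name `wide_to_long` are hypothetical, and whether MutSigCV accepts the long layout at all is not established here.

```python
import csv
import io
from typing import Iterator, TextIO


def wide_to_long(src: TextIO) -> Iterator[tuple[str, str, str, str, str]]:
    """Yield (gene, effect, categ, patient, coverage) one row at a time.

    Assumed header (hypothetical): gene, effect, categ, <patient1>, <patient2>, ...
    """
    reader = csv.reader(src, delimiter="\t")
    header = next(reader)
    patients = header[3:]
    for row in reader:
        gene, effect, categ = row[:3]
        for patient, cov in zip(patients, row[3:]):
            yield gene, effect, categ, patient, cov


# Usage with an in-memory file; with real data, pass open(...) handles instead.
wide = io.StringIO("gene\teffect\tcateg\tP1\tP2\n"
                   "A1BG\tnoncoding\tA(A->C)A\t12\t10\n")
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(["gene", "effect", "categ", "patient", "coverage"])
writer.writerows(wide_to_long(wide))
print(out.getvalue(), end="")
```

Because `wide_to_long` is a generator, `writerows` pulls one record at a time, so memory use stays roughly constant regardless of how many genes are in the file.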


If anyone has succeeded in using their own coverage data rather than the file that comes with the software, could you please tell me what the coverage file should look like? (1) What are the column names? (2) Which number do you use for coverage? Is it the count of the alternative allele in the tumor? (3) How do you treat duplicates? For the same gene, effect, categ, and patient there can be multiple rows. Do you take the maximum coverage?
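For question (3), whatever rule turns out to be correct, the mechanics of collapsing duplicates are the same: group rows by (gene, effect, categ, patient) and combine the values. Taking the max versus the sum is exactly the open question, so this sketch leaves the combining function as a parameter (defaulting to `max` only because that is what the question proposes). The names `collapse` and `combine` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable

Key = tuple[str, str, str, str]  # (gene, effect, categ, patient)
Row = tuple[str, str, str, str, int]


def collapse(rows: Iterable[Row],
             combine: Callable[[list[int]], int] = max) -> dict[Key, int]:
    """Merge rows sharing (gene, effect, categ, patient) with `combine`."""
    groups: dict[Key, list[int]] = defaultdict(list)
    for gene, effect, categ, patient, cov in rows:
        groups[(gene, effect, categ, patient)].append(cov)
    return {key: combine(values) for key, values in groups.items()}


rows = [
    ("A1BG", "noncoding", "A(A->C)A", "P1", 12),
    ("A1BG", "noncoding", "A(A->C)A", "P1", 14),  # duplicate key
    ("A1BG", "noncoding", "A(A->C)C", "P1", 9),
]
print(collapse(rows))            # duplicate key -> 14 (max)
print(collapse(rows, sum))       # duplicate key -> 26 (sum)
```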



There is a tool, CovGen, that can generate the sample- or experiment-specific coverage files that MutSigCV requires. I have not used it yet, but I will try it soon. In the meantime, if you have figured out how to run MutSigCV, please post any suggestions that might be useful to others.
