dNdScv: "Zero coding substitutions found in this dataset"
5.1 years ago
L. A. Liggett

I have been trying to run Inigo Martincorena's dNdScv on some of my exome-seq data, and I am getting an error saying that zero coding substitutions were found in the dataset. From what I can tell, this problem is usually caused by formatting the input differently from the example dataset, but my data appears to be formatted exactly the same way and I still get the error. My data is also aligned to hg19, so I assume there should be no reference genome mismatch either.

Below is a sample of my data:

   sampleID chr    pos ref mut
1  Sample_1   1 808631   G   A
2  Sample_1   1 808922   G   A
3  Sample_1   1 808928   C   T
4  Sample_1   1 809876   A   G
5  Sample_1   1 865219   G   A
software error • 1.0k views
5.1 years ago

This error is triggered when no coding substitutions are found in the input table. It often happens when the chromosome names in the input differ from those in the reference genome, or when the input contains only indels. Neither of these common problems appears to apply to your case (unless there are spaces before the chromosome names, which would not be visible in your example). Are you sure that your input dataset contains coding substitutions? For example, running your example table yields this error because these mutations fall in a non-coding region, although I appreciate that this is just a sample of 5 mutations.
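To rule out the formatting causes mentioned above before calling dndscv, the input table can be inspected directly. The following is a minimal base-R sketch, assuming the five-column layout from the question. The helper name check_mutations is hypothetical and not part of dNdScv, and the example rows reuse the positions from the question with two chromosome names deliberately altered (a leading space and a "chr" prefix) to show the clean-up.

```r
# Hypothetical helper (not part of dNdScv): basic sanity checks on a
# mutation table with columns sampleID, chr, pos, ref, mut.
check_mutations <- function(mutations) {
  # Strip stray whitespace that would stop chromosome names from matching.
  mutations$chr <- trimws(as.character(mutations$chr))
  # Drop a leading "chr" prefix, on the assumption that the reference
  # database names chromosomes "1", "2", ... rather than "chr1", "chr2", ...
  mutations$chr <- sub("^chr", "", mutations$chr)
  ref <- as.character(mutations$ref)
  mut <- as.character(mutations$mut)
  # A single-base substitution has exactly one nucleotide in ref and in mut.
  bases <- c("A", "C", "G", "T")
  is_sub <- ref %in% bases & mut %in% bases
  list(
    mutations = mutations,
    n_substitutions = sum(is_sub),
    n_other = sum(!is_sub),
    chromosomes = sort(unique(mutations$chr))
  )
}

# Example rows: positions from the question, with two chromosome names
# deliberately altered for illustration.
mutations <- data.frame(
  sampleID = "Sample_1",
  chr = c(" 1", "chr1", "1", "1", "1"),
  pos = c(808631, 808922, 808928, 809876, 865219),
  ref = c("G", "G", "C", "A", "G"),
  mut = c("A", "A", "T", "G", "A"),
  stringsAsFactors = FALSE
)
res <- check_mutations(mutations)
print(res$chromosomes)
print(res$n_substitutions)
```

This only checks formatting. It does not tell you whether a substitution falls in a coding region, which depends on the reference database that dndscv uses.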


Thanks for the response, Inigo. I will confirm that there are indeed coding mutations in the data, but my full dataset is about 200 exome-sequenced samples containing a bit over 10 million identified variants, so I assume there should be plenty of coding substitutions.

I realize this is not much information to troubleshoot...


So, I was not paying close attention to the console, and it turns out the samples contained too many variants. In the notebook file the error only said that no coding substitutions were found, but the console output also reported that each sample had too many variants.

Passing max_muts_per_gene_per_sample = Inf, max_coding_muts_per_sample = Inf to dndscv solved the problem.
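For anyone hitting the same issue, here is a rough sketch of how this might look in R. The two argument names are the ones quoted above. The wrapper run_dndscv_uncapped, the helper count_per_sample and the toy table are hypothetical and only for illustration. As far as I know, the default caps are 3 mutations per gene per sample and 3000 coding mutations per sample, but please check the package documentation for your installed version.

```r
# Hypothetical helper: number of variants per sample, largest first.
# Samples with very high counts are the likely candidates for exclusion
# by the per-sample cap.
count_per_sample <- function(mutations) {
  sort(table(mutations$sampleID), decreasing = TRUE)
}

# Hypothetical wrapper: call dndscv with both caps disabled, as in the
# fix described above. Returns NULL if the package is not installed.
run_dndscv_uncapped <- function(mutations) {
  if (!requireNamespace("dndscv", quietly = TRUE)) {
    message("dndscv is not installed; skipping the call")
    return(NULL)
  }
  dndscv::dndscv(
    mutations,
    max_muts_per_gene_per_sample = Inf,
    max_coding_muts_per_sample = Inf
  )
}

# Toy table (made-up rows, not real data) to show the per-sample count.
toy <- data.frame(
  sampleID = c("S1", "S2", "S2", "S2"),
  chr = c("1", "1", "2", "3"),
  pos = c(100, 200, 300, 400),
  ref = c("G", "C", "A", "T"),
  mut = c("A", "T", "G", "C"),
  stringsAsFactors = FALSE
)
counts <- count_per_sample(toy)
print(counts)

# With a real mutation table:
# dndsout <- run_dndscv_uncapped(mutations)
```

Note that the caps exist to limit the influence of hypermutated samples, so removing them changes which samples contribute to the analysis. It is worth reading the console messages about excluded samples before deciding.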

Thanks Inigo.
