Question

Multi-Sample Somatic Mutation Calling

1

10.7 years ago

jockbanan ▴ 440

Hi all! I have 4 pairs of matched tumor/normal exome sequencing experiments. These are from 4 patients with the same type of tumor. I want to detect tumor-specific somatic mutations.

Looking at the documentation of SomaticSniper, VarScan, GATK somaticIndelDetector, and other tools, it seems they can all process only one pair (one patient) at a time. I was wondering whether there is a tool capable of multi-sample analysis, one that uses the information from all the patients when reporting tumor-specific variants. I can always process these 4 pairs separately and then compare the results myself, but if some tool could use its statistical model to process multiple samples directly, I would like to try it. Do you have any suggestions? Thanks.

snp variant-calling somatic mutation variant cancer • 4.7k views

updated 10.7 years ago by DG 7.3k • written 10.7 years ago by jockbanan ▴ 440

0

What gains do you think will come from analyzing multiple samples concurrently? Though there are hotspots in a few driver genes, most cancer samples have a largely unique somatic mutation profile.

10.7 years ago by Chris Miller 22k

0

I think there could be some value in eliminating false-positive calls by looking at their presence in unpaired normals. But I'm not sure how much better an integrated analysis would be compared to a post-calling heuristic filter.

10.7 years ago by Christian ★ 3.1k

0

This is true, but to really get a reliable feel for false-positive sites from the normals, I'd want way more than 4 samples.

10.7 years ago by Chris Miller 22k

0

Exactly, false-positives are the reason. And, well, yes, I would also like to have way more samples...

10.7 years ago by jockbanan ▴ 440

0

Yes, there is definitely value in this. It is probably best to stick to downstream tools. You might also want to maintain some sort of "master" VCF, for instance a merged VCF, with data about all the samples you collect. You can then use tabix and other tools to quickly see how many times specific mutations were seen in your normal samples and apply that data to downstream heuristics and filters as appropriate.
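
To make the "count how often a call shows up in the normals" idea concrete, here is a minimal sketch in plain Python. It is not tabix and not any particular tool's method: it assumes each VCF has already been read into a list of tab-separated lines, keys a variant on (CHROM, POS, REF, ALT), and uses made-up records. The function names and the `max_normals` threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter


def variant_keys(vcf_lines):
    """Yield (chrom, pos, ref, alt) for each record in an iterable of VCF lines."""
    for line in vcf_lines:
        # Skip blank lines and header lines (## meta lines and the #CHROM line).
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # A record may list several ALT alleles; treat each one separately.
        for alt in alts.split(","):
            yield (chrom, int(pos), ref, alt)


def normal_counts(normal_vcfs):
    """Count in how many normal samples each variant appears."""
    counts = Counter()
    for vcf_lines in normal_vcfs:
        # set() so a variant is counted at most once per sample.
        counts.update(set(variant_keys(vcf_lines)))
    return counts


def filter_somatic(somatic_lines, counts, max_normals=0):
    """Keep somatic calls seen in at most max_normals normal samples."""
    return [k for k in variant_keys(somatic_lines) if counts[k] <= max_normals]


# Made-up example records (assumption: one list of lines per sample).
normal_a = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\t.\tA\tG\n",
    "chr2\t500\t.\tC\tT\n",
]
normal_b = ["chr1\t100\t.\tA\tG\n"]
tumor_calls = ["chr1\t100\t.\tA\tG\n", "chr3\t42\t.\tG\tA\n"]

counts = normal_counts([normal_a, normal_b])
kept = filter_somatic(tumor_calls, counts)
print(kept)
```

With only a handful of normals the counts are a weak signal, so a threshold like `max_normals` is best treated as a heuristic to tune rather than a fixed rule.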

10.7 years ago by DG 7.3k

0

Consider leveraging publicly available data from TCGA, or even 1000 Genomes if you're just looking at the normals anyway.

10.7 years ago by Chris Miller 22k

score 1 · Answer 1 · 2014-03-14

1

10.7 years ago

DG 7.3k

I think it is generally a limitation of both computational overhead (you are already comparing two datasets in a run) and not wanting to deal with the potential complexities of parsing multi-sample matched data. That said, there are plenty of downstream tools for merging, comparing, and annotating VCF files to get to the shared somatic variants. snpEff and GEMINI, for instance, are great tools for annotating and data mining your results.
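
As a rough illustration of the "compare per-patient results to get shared somatic variants" step, the sketch below does the comparison in plain Python. It does not use snpEff or GEMINI. It assumes each patient's somatic calls have already been reduced to a set of (chrom, pos, ref, alt) tuples by whichever paired caller was run; the patient IDs, coordinates, and the `recurrent_variants` helper are all hypothetical.

```python
from collections import defaultdict


def recurrent_variants(calls_by_patient, min_patients=2):
    """Return variants called as somatic in at least min_patients patients.

    calls_by_patient maps a patient ID to a set of (chrom, pos, ref, alt) tuples.
    The result maps each recurrent variant to the sorted list of patients carrying it.
    """
    seen = defaultdict(set)
    for patient, variants in calls_by_patient.items():
        for variant in variants:
            seen[variant].add(patient)
    return {
        variant: sorted(patients)
        for variant, patients in seen.items()
        if len(patients) >= min_patients
    }


# Made-up somatic call sets for four patients (illustrative values only).
calls_by_patient = {
    "P1": {("chr17", 7577120, "C", "T"), ("chr1", 1000, "A", "G")},
    "P2": {("chr2", 2000, "G", "A")},
    "P3": {("chr17", 7577120, "C", "T")},
    "P4": {("chr5", 5000, "T", "C")},
}

recurrent = recurrent_variants(calls_by_patient)
print(recurrent)
```

Exact-position matching is strict; since most somatic profiles are sample-specific, grouping by gene after annotation may be the more informative comparison in practice.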