Multi-Sample Somatic Mutation Calling
10.7 years ago
jockbanan ▴ 440

Hi all! I have 4 pairs of matched tumor/normal exome sequencing experiments. These are from 4 patients with the same type of tumor. I want to detect tumor-specific somatic mutations.

Looking at the documentation of SomaticSniper, VarScan, GATK SomaticIndelDetector and other tools, it seems they can all only process one pair (one patient) at a time. I was wondering whether there is a tool capable of multi-sample analysis, one that uses the information from all the patients when reporting tumor-specific variants. I can always process the 4 pairs separately and compare the results myself, but if a tool could apply its statistical model to multiple samples directly, I would like to try it. Do you have any suggestions? Thanks.

snp variant-calling somatic mutation variant cancer • 4.7k views

What gains do you expect from analyzing multiple samples together? Although there are hotspots in a few driver genes, most cancer samples have a largely unique somatic mutation profile.

I think there could be some value in eliminating false-positive calls by checking whether they are present in the unpaired normals, but I'm not sure how much better an integrated analysis would be than a post-calling heuristic filter.
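A post-calling filter of that kind could look roughly like this: drop any tumor call whose (CHROM, POS, REF, ALT) also appears in more than a chosen number of unpaired normals. This is only a sketch. The helper names `read_variant_keys` and `filter_against_normals` and the `max_normal_hits` threshold are hypothetical, and it assumes plain-text VCFs whose variants are already normalized (left-aligned, multi-allelic sites split) so that identical events get identical keys.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# Hypothetical key for a variant: (CHROM, POS, REF, ALT). Assumes calls were
# normalized beforehand, otherwise the same event may get different keys.
VariantKey = Tuple[str, int, str, str]


def read_variant_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect (CHROM, POS, REF, ALT) keys from plain-text VCF lines."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        for alt in fields[4].split(","):  # one key per ALT allele
            keys.add((chrom, pos, ref, alt))
    return keys


def filter_against_normals(
    tumor_calls: Iterable[VariantKey],
    normal_sets: Sequence[Set[VariantKey]],
    max_normal_hits: int = 0,
) -> List[VariantKey]:
    """Keep tumor calls seen in at most `max_normal_hits` unpaired normals."""
    kept: List[VariantKey] = []
    for key in tumor_calls:
        hits = sum(1 for normal in normal_sets if key in normal)
        if hits <= max_normal_hits:
            kept.append(key)
    return kept
```

Exact key matching keeps the sketch simple; note that a variant absent from a low-coverage normal is not strong evidence that it is truly somatic, so read depth in the normals would matter in practice.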

This is true, but to really get a reliable feel for false-positive sites from the normals, I'd want way more than 4 samples.

Exactly, false positives are the reason. And, well, yes, I would also like to have way more samples...

Yes, there is definitely value in this, but it's probably best to stick to downstream tools. You might also want to maintain some sort of "master" VCF, for instance a merged VCF, holding data on all the samples you collect. You can then use tabix and other tools to quickly see, for example, how many of your normal samples carry a specific mutation, and feed that into downstream heuristics and filters as appropriate.
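As a rough illustration of the counting step, the sketch below reads records of a merged multi-sample VCF (for example the lines printed by `tabix -h merged.vcf.gz chr1:100-100` on a bgzipped, indexed file) and counts, per record, how many of a chosen set of normal samples have a genotype with a non-reference allele. The function name `count_carriers` is hypothetical, and it assumes the merged VCF has FORMAT/GT columns and one ALT per record; it is not how any particular tool does this.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical key: (CHROM, POS, REF, ALT) exactly as written in the record.
VariantKey = Tuple[str, int, str, str]


def count_carriers(vcf_lines: Iterable[str], samples: List[str]) -> Dict[VariantKey, int]:
    """For each record of a merged multi-sample VCF, count how many of
    `samples` have a GT containing at least one non-reference allele."""
    header_seen = False
    columns: List[int] = []
    counts: Dict[VariantKey, int] = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            names = fields[9:]
            # Map requested sample names to their column indices.
            columns = [9 + names.index(s) for s in samples]
            header_seen = True
            continue
        if not header_seen:
            raise ValueError("data line found before #CHROM header")
        gt_index = fields[8].split(":").index("GT")  # assumes GT is in FORMAT
        carriers = 0
        for col in columns:
            parts = fields[col].split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            # Any called allele other than 0 (REF) counts as carrying a variant;
            # missing calls (".") count as non-carriers.
            if any(a not in ("0", ".") for a in alleles):
                carriers += 1
        counts[(fields[0], int(fields[1]), fields[3], fields[4])] = carriers
    return counts
```

The resulting counts could then serve as a simple "seen in N normals" annotation for the heuristic filters mentioned above.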

Consider leveraging publicly available data from TCGA, or even 1000 Genomes if you're only looking at the normals anyway.

10.7 years ago
DG 7.3k

I think this is generally down to both computational overhead (you are already comparing two datasets in each run) and a reluctance to deal with the complexities of parsing multi-sample matched data. That said, there are plenty of downstream tools for merging, comparing, and annotating VCF files to get at the shared somatic variants. snpEff and GEMINI, for instance, are great tools for annotating and mining your results.
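To make the "compare the per-patient results" step concrete, here is a small sketch that, given each patient's somatic calls as (CHROM, POS, REF, ALT) keys, reports the calls shared by at least a minimum number of patients. It is an illustration of the idea only, not how snpEff or GEMINI work; the function name `recurrent_somatic_variants` and the `min_patients` parameter are hypothetical, and it assumes the calls were already parsed and normalized.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set, Tuple

# Hypothetical key: (CHROM, POS, REF, ALT), assumed normalized across patients.
VariantKey = Tuple[str, int, str, str]


def recurrent_somatic_variants(
    calls_by_patient: Mapping[str, Iterable[VariantKey]],
    min_patients: int = 2,
) -> Dict[VariantKey, List[str]]:
    """Return somatic calls found in at least `min_patients` patients,
    each mapped to the sorted list of patients carrying it."""
    carriers: Dict[VariantKey, Set[str]] = defaultdict(set)
    for patient, calls in calls_by_patient.items():
        for key in calls:
            carriers[key].add(patient)  # a set, so duplicate calls count once
    return {
        key: sorted(patients)
        for key, patients in carriers.items()
        if len(patients) >= min_patients
    }
```

With only 4 patients, a call shared by two or more of them is more likely to be a recurrent artifact or a known hotspot than a coincidence, so such hits are worth inspecting rather than accepting as is.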

Thanks for the reply; I'll stick to downstream tools then.
