Hi all! I have 4 pairs of matched tumor/normal exome sequencing experiments. These are from 4 patients with the same type of tumor. I want to detect tumor-specific somatic mutations.
Looking at the documentation of SomaticSniper, VarScan, GATK somaticIndelDetector and other tools, it seems they can all only process one pair (one patient) at a time. I was wondering whether there is a tool capable of multi-sample analysis, one that uses the information from all the patients and reports tumor-specific variants. I can always process these 4 pairs separately and then compare the results myself, but if some tool could apply its statistical model to multiple samples directly, I would like to try it. Do you have any suggestions? Thanks.
What do you expect to gain from analyzing multiple samples concurrently? Although there are hotspots in a few driver genes, most cancer samples have a largely unique somatic mutation profile.
I think there could be some value in eliminating false-positive calls by looking at their presence in unpaired normals. But I'm not sure how much better an integrated analysis would be than a post-calling heuristic filter.
This is true, but to really get a reliable feel for false-positive sites from the normals, I'd want way more than 4 samples.
Exactly, false-positives are the reason. And, well, yes, I would also like to have way more samples...
Yes, there is definitely value in this. It is probably best to stick to downstream tools. You might also want to maintain some sort of "master" VCF, for instance a merged VCF, with data about all the samples you collect. You can then use tabix and other tools to quickly see how many times a specific mutation was seen in your normal samples, and apply that information to downstream heuristics and filters as appropriate.
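To make the idea concrete, here is a rough sketch of that kind of post-calling filter in plain Python. It is only an illustration: it assumes you have one single-sample VCF per normal, already read in as text lines, and the function names (`variant_keys`, `count_in_normals`, `filter_somatic`) and the `max_normals` threshold are made up for this example. With real data you would more likely query a merged, bgzipped and tabix-indexed VCF instead of holding everything in memory.

```python
from collections import Counter


def variant_keys(vcf_lines):
    """Yield (chrom, pos, ref, alt) for every ALT allele in VCF text lines."""
    for line in vcf_lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts = fields[:5]
        # Multi-allelic records list ALT alleles separated by commas.
        for alt in alts.split(","):
            yield (chrom, int(pos), ref, alt)


def count_in_normals(normal_vcfs):
    """Count in how many normal samples each variant was called."""
    counts = Counter()
    for lines in normal_vcfs:
        # set() so a variant counts at most once per sample.
        counts.update(set(variant_keys(lines)))
    return counts


def filter_somatic(tumor_lines, normal_counts, max_normals=0):
    """Keep tumor calls seen in at most `max_normals` normal samples."""
    return [
        key for key in variant_keys(tumor_lines)
        if normal_counts[key] <= max_normals
    ]


# Toy data (hypothetical records, not real calls).
normals = [
    [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t.\tPASS\t.",
        "chr2\t200\t.\tC\tT\t.\tPASS\t.",
    ],
    ["chr1\t100\t.\tA\tG\t.\tPASS\t."],
]
tumor_calls = [
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr3\t300\t.\tG\tA,C\t.\tPASS\t.",
]

normal_counts = count_in_normals(normals)
kept = filter_somatic(tumor_calls, normal_counts)
print(kept)
```

With only 4 normals the counts will be a weak filter, as noted above; the same counting works unchanged if you add normals from other projects to the list.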
Consider leveraging publicly available data from TCGA, or even the 1000 Genomes Project if you're just looking at the normals anyway.