If they all gave the same answer, there wouldn't need to be so many of them. They each have different strengths and weaknesses, which is a feature, not a bug; it means that for a given type of analysis, one or another (or a combination of more than one) is the right tool for the job.
I agree with Chris. More specifically, I would recommend filtering the VarScan results.
For example, try requiring a minimum of 10 reads total coverage (in both tumor and normal), minimum of 4 reads with the variant in the tumor sample, minimum of 30% tumor allele frequency, and maximum of 5% normal allele frequency.
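Those thresholds can be expressed as a tiny per-site filter. This is a minimal sketch of the logic only — the function name and the idea of having per-sample ref/alt read counts already parsed out are my assumptions, not VarScan's actual output format, so adapt the parsing to your columns.

```python
# Sketch of the filtering thresholds above, applied to per-site read
# counts. Field names are illustrative; parse them from your own
# VarScan output before calling this.

def passes_somatic_filter(
    tumor_ref: int, tumor_alt: int,
    normal_ref: int, normal_alt: int,
    min_coverage: int = 10,      # required total depth in both samples
    min_var_reads: int = 4,      # variant-supporting reads in tumor
    min_tumor_vaf: float = 0.30, # minimum tumor allele frequency
    max_normal_vaf: float = 0.05 # maximum normal allele frequency
) -> bool:
    tumor_cov = tumor_ref + tumor_alt
    normal_cov = normal_ref + normal_alt
    if tumor_cov < min_coverage or normal_cov < min_coverage:
        return False
    if tumor_alt < min_var_reads:
        return False
    if tumor_alt / tumor_cov < min_tumor_vaf:
        return False
    if normal_alt / normal_cov > max_normal_vaf:
        return False
    return True
```

For example, a site with 6/16 variant reads in the tumor and 0/20 in the normal passes, while one with only 3 variant reads in the tumor does not.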
I recently tried those parameters on some somatic VarScan variants (from WGS data) and they yielded decent results (although the Strelka 'passed' variants were better for small indels).
In the case of single-sample analysis, there are benchmarks for justifying a similar set of parameters in this paper:
If I'm looking for a rare subclone that contributes to therapy resistance, setting a 30% threshold is going to be a bad idea. Similarly, if I have an impure tumor, there may be nothing above 30%. Or if I have 500x coverage, I can expect to reliably detect variants at far lower VAFs. The point here is that the parameters you use should be chosen intelligently, based on the details of your experiment.
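The coverage point can be made concrete with a rough power calculation: under a simple binomial model of independent reads, how likely are you to see at least some minimum number of variant-supporting reads at a given depth and true VAF? The 4-read threshold below is just the filter value discussed earlier, not any caller's actual statistical model.

```python
# Rough detection-power sketch: probability of observing at least
# `min_reads` variant-supporting reads when sequencing to `depth` at a
# true variant allele fraction `vaf`, assuming independent reads
# (Binomial model). Illustrative only; real callers model error rates too.
from math import comb

def detection_prob(depth: int, vaf: float, min_reads: int = 4) -> float:
    """P(X >= min_reads) for X ~ Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_reads)
    )
    return 1 - p_below

# At 500x, a 2% subclone almost always yields 4+ variant reads;
# at 30x, the same subclone is nearly invisible by read count alone.
high_depth = detection_prob(500, 0.02)
low_depth = detection_prob(30, 0.02)
```

This is why a fixed VAF cutoff that is sensible for 30x exome data can throw away perfectly detectable subclonal variants in a 500x panel.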
True - that is a poor choice of parameters for studying subclones.
However, I would expect those parameters to identify variants that show a greater concordance rate with the Strelka / MuTect variant lists, if the user is interested in defining a conservative set of somatic variants.
That actually sounds about right, depending on the parameters that you used. Some notes:
I'd expect the concordance to be fairly high for high-VAF variants, but when you get down to rare subclonal variants, where only a small amount of read support exists, callers can handle those cases very differently.
Do you want high specificity or high sensitivity? Each caller has made its own set of tradeoffs between the two, and figuring out the 'sweet spot' for you will depend on your experiment.
Intersecting the data will generally improve specificity (I'd expect 90%+ if something is called by all three callers), but will lose you a lot of the true positives at low VAF that may only be picked up by one of the statistical models. Again, just depends on what you're hoping to accomplish with your experiment.
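A naive version of that intersection is just a set operation on normalized variant keys. The sketch below assumes each caller's output has already been parsed into (chrom, pos, ref, alt) tuples; with real VCFs you also need allele normalization (left-alignment, splitting multiallelics) before keys from different callers are comparable.

```python
# Sketch: three-way consensus of variant callsets, keyed by
# (chrom, pos, ref, alt). Assumes the VCFs are already parsed and
# normalized into these tuples.

def consensus_calls(*callsets: set) -> set:
    """Variants reported by every caller (high specificity, lower sensitivity)."""
    return set.intersection(*callsets)

# Toy callsets standing in for parsed VarScan / Strelka / MuTect output
varscan = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}
strelka = {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")}
mutect  = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}

shared = consensus_calls(varscan, strelka, mutect)
# Variants seen by some but not all callers: often where the low-VAF
# true positives (and the caller-specific artifacts) live
disputed = set.union(varscan, strelka, mutect) - shared
```

The `disputed` set is exactly where the sensitivity cost shows up: a low-VAF variant picked up by only one statistical model lands there and gets discarded by a strict three-way intersection.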
I'm responding here since I somehow can't add a comment or reply; the button is grayed out and unavailable.
Thanks to your helpful advice, I found that each caller has its own drawbacks and preferences for mutation detection in tumor tissues, and all of them need parameter optimization and further filtering. By manual inspection in IGV, I found that MuTect generally produced the most reliable candidates, while quite a few Strelka calls were plagued by low-quality mapping, and VarScan2 was not very sensitive to low-VAF calls.
Yes, MuTect is tuned to be reasonably sensitive and highly specific. You could certainly do worse if you're going to use a single-caller approach. FWIW, intersecting callers in intelligent ways can give better results, but requires thinking carefully about how to do so.