Question

Why do different somatic mutation callers agree so poorly on tumor sample pairs?

5

9.9 years ago

heartheone ▴ 70

I used MuTect, Strelka, and VarScan2 on multiple normal-tumor pairs of sequencing data. Default parameters and the recommended filtering were applied.

To my disappointment, they had really bad concordance. Only about 10%-30% of the calls made by any one caller were also made by the other tools.
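
A minimal sketch of how such a concordance figure might be computed, assuming each caller's output has already been reduced to a set of `(chrom, pos, ref, alt)` tuples. The function names and the toy call sets are hypothetical and are not output from any of the tools mentioned.

```python
from itertools import combinations


def concordance(calls_a, calls_b):
    """Return (fraction of A also in B, fraction of B also in A, Jaccard index)."""
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    frac_a = len(shared) / len(a) if a else 0.0
    frac_b = len(shared) / len(b) if b else 0.0
    jaccard = len(shared) / len(union) if union else 0.0
    return frac_a, frac_b, jaccard


def pairwise_concordance(calls_by_caller):
    """Compute concordance for every pair of callers in a dict of call sets."""
    return {
        (x, y): concordance(calls_by_caller[x], calls_by_caller[y])
        for x, y in combinations(sorted(calls_by_caller), 2)
    }


# Toy data only: variants keyed by (chrom, pos, ref, alt).
toy_calls = {
    "mutect": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "C", "T")},
    "strelka": {("chr1", 100, "A", "T"), ("chr3", 10, "G", "A")},
    "varscan": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")},
}

for pair, (frac_a, frac_b, jaccard) in pairwise_concordance(toy_calls).items():
    print(pair, round(frac_a, 2), round(frac_b, 2), round(jaccard, 2))
```

Reporting both directional fractions as well as the Jaccard index helps when one caller emits far more calls than another, since a single percentage can hide that asymmetry.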

Do you have any suggestions? I would be very grateful for any help!

Robert

snp next-gen-sequencing software-error • 6.3k views

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by heartheone ▴ 70

1

If they all gave the same answer, there wouldn't need to be so many of them. They each have different strengths and weaknesses, which is a feature, not a bug; it means that for a given type of analysis, one or another (or a combination of more than one) is the right tool for the job.

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by Jonathan Dursi ▴ 270

Ram · Answer 1 · 2014-12-23

4

9.9 years ago

Charles Warden 8.3k

I agree with Chris. More specifically, I would recommend filtering the VarScan results.

For example, try requiring a minimum of 10 reads total coverage (in both tumor and normal), minimum of 4 reads with the variant in the tumor sample, minimum of 30% tumor allele frequency, and maximum of 5% normal allele frequency.
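
As a rough illustration, the thresholds above could be applied with a small post-filter like the one below. This is a sketch, not VarScan's own filter: the record layout (`tumor_ref`, `tumor_alt`, `normal_ref`, `normal_alt` read counts) is a hypothetical one, and total coverage is approximated as reference plus variant reads.

```python
def passes_somatic_filter(
    variant,
    min_depth=10,           # minimum total reads in both tumor and normal
    min_tumor_alt_reads=4,  # minimum variant-supporting reads in the tumor
    min_tumor_vaf=0.30,     # minimum tumor variant allele frequency
    max_normal_vaf=0.05,    # maximum normal variant allele frequency
):
    """Return True if a variant record meets all four thresholds.

    `variant` is assumed to be a dict of read counts (hypothetical layout).
    """
    tumor_depth = variant["tumor_ref"] + variant["tumor_alt"]
    normal_depth = variant["normal_ref"] + variant["normal_alt"]

    # Guard against empty sites so the VAF divisions below are always safe.
    if tumor_depth == 0 or normal_depth == 0:
        return False
    if tumor_depth < min_depth or normal_depth < min_depth:
        return False
    if variant["tumor_alt"] < min_tumor_alt_reads:
        return False

    tumor_vaf = variant["tumor_alt"] / tumor_depth
    normal_vaf = variant["normal_alt"] / normal_depth
    return tumor_vaf >= min_tumor_vaf and normal_vaf <= max_normal_vaf


candidates = [
    {"tumor_ref": 12, "tumor_alt": 8, "normal_ref": 20, "normal_alt": 0},
    {"tumor_ref": 90, "tumor_alt": 10, "normal_ref": 40, "normal_alt": 0},
]
kept = [v for v in candidates if passes_somatic_filter(v)]
```

Keeping the thresholds as keyword arguments makes it easy to loosen them for other experimental designs.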

I recently tried using those parameters for some somatic VarScan variants (for WGS data) and I thought they yielded decent results (although I thought the Strelka 'passed' variants were better for small indels).

In the case of single-sample analysis, this paper provides benchmarks that justify a similar set of parameters:

https://peerj.com/articles/600/

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by Charles Warden 8.3k

1

If I'm looking for a rare subclone that contributes to therapy resistance, setting a 30% threshold is going to be a bad idea. Similarly, if I have an impure tumor, there may be nothing above 30%. Or if I have 500x coverage, we can expect to reliably detect variants at far lower VAFs. The point here is that the parameters you use should be chosen intelligently, based on the details of your experiment.
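
To make the coverage and purity argument concrete, here is a back-of-the-envelope sketch. It assumes a simple binomial read-sampling model with no sequencing error, and a clonal heterozygous variant in a diploid, copy-neutral region; both function names are hypothetical, and real callers use more elaborate models.

```python
from math import comb


def prob_at_least_k_alt_reads(depth, vaf, k):
    """P(X >= k) for X ~ Binomial(depth, vaf): chance of seeing k or more variant reads."""
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


def expected_clonal_het_vaf(purity):
    """Expected VAF of a clonal heterozygous variant (diploid, copy-neutral assumption)."""
    return purity / 2


# Compare the chance of collecting at least 4 variant reads for a 5% VAF variant
# at modest versus deep coverage.
for depth in (30, 100, 500):
    print(depth, prob_at_least_k_alt_reads(depth, vaf=0.05, k=4))

# A tumor at 40% purity would put even clonal heterozygous variants below a 30% VAF cutoff.
print(expected_clonal_het_vaf(0.40))
```

Under these assumptions, the same fixed read-count and VAF cutoffs behave very differently depending on depth and purity, which is why they are worth recomputing per experiment.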

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by Chris Miller 22k

0

True - that is a poor choice of parameters for studying subclones.

However, I would expect those parameters to identify variants that show a greater concordance rate with the Strelka / MuTect variant lists, if the user is interested in defining a conservative set of somatic variants.

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by Charles Warden 8.3k

Ram · Answer 2 · 2014-12-23

3

9.9 years ago

Chris Miller 22k

That actually sounds about right, depending on the parameters that you used. Some notes:

I'd expect the concordance to be fairly high for high-VAF variants, but when you get down to rare subclonal variants, where only a small amount of read support exists, callers can handle those cases very differently.
Do you want high specificity or high sensitivity? Each caller has made its own set of tradeoffs between the two, and figuring out the 'sweet spot' for you will depend on your experiment.
Intersecting the data will generally improve specificity (I'd expect 90%+ if something is called by all three callers), but will lose you a lot of the true positives at low VAF that may only be picked up by one of the statistical models. Again, it just depends on what you're hoping to accomplish with your experiment.
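
The specificity versus sensitivity tradeoff of intersecting call sets can be sketched as a simple vote count. This assumes each caller's output is already a set of `(chrom, pos, ref, alt)` tuples; `consensus_calls` and the toy data are hypothetical.

```python
from collections import Counter


def consensus_calls(calls_by_caller, min_callers):
    """Return variants reported by at least `min_callers` of the callers."""
    support = Counter()
    for variants in calls_by_caller.values():
        # set() ensures each caller votes at most once per variant.
        support.update(set(variants))
    return {variant for variant, n in support.items() if n >= min_callers}


# Toy data only, not real caller output.
toy_calls = {
    "mutect": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "C", "T")},
    "strelka": {("chr1", 100, "A", "T"), ("chr3", 10, "G", "A")},
    "varscan": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")},
}

strict = consensus_calls(toy_calls, min_callers=3)    # intersection: fewest calls
majority = consensus_calls(toy_calls, min_callers=2)  # called by at least two tools
union = consensus_calls(toy_calls, min_callers=1)     # union: most calls
```

Raising `min_callers` moves toward the strict intersection, while lowering it moves toward the union and keeps low-VAF calls that only one model detects.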

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by Chris Miller 22k

1

Thank you very much for your advice!

I am responding here because, for some reason, I can't add a comment or reply. The button is grayed out and unavailable.

Thanks to your help, I found that each caller has its own drawbacks and preferences for mutation detection in tumor tissue, and that all of them need parameter optimization and further filtering. By manually checking calls in IGV, I found that MuTect generally produced the most reliable candidates, quite a few Strelka calls suffered from low-quality mapping, and VarScan2 was not very sensitive to low-VAF calls.

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by heartheone ▴ 70

0

Yes, MuTect is tuned to be reasonably sensitive and highly specific. You could certainly do worse if you're going to use a single-caller approach. FWIW, intersecting callers in intelligent ways can give better results, but requires thinking carefully about how to do so.

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by Chris Miller 22k