Deciding on a Variant Caller
7.7 years ago

How should I determine which is the best variant caller to use for a cancer mutations dataset? I'm working with about 70% average tumor purity so it's not great.

MuSE performs very well with a similar dataset but at 90%+ purity so I'm not sure how it will perform with this data. It seems MuSE outperforms MuTect2 generally but I'm still unsure...

Tumor purity seems to confound the results, so I'm leaning towards VarScan: it sidesteps this because it doesn't use a probabilistic framework (such as Bayesian statistics) to detect variants and assess confidence in them. However, it struggles with sensitivity and fails to pick up somatic SNVs of low allelic fraction, which is a major problem.

I would really appreciate some advice on what to look at when deciding what to use...

next-gen sequencing mutation vcf variant call • 3.5k views

"MuSE outperforms MuTect2"

Do you have a reference for this statement? (I'm genuinely interested; I don't mean to argue for or against it.)


https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1029-6

This is a good comparison. Maybe I'm overgeneralising, but in the context I'm working with this seems to be the case.


Of course, that is a paper from the MuSE developers. Every variant caller that gets published claims to be better than all the previous ones.


It's pretty easy to outperform 5 other callers when you get to select the data set, the truth set, and the callers to compare against.


In fact I was hoping for a reference other than the authors' paper...


Good point, I will get back to you if I find anything worthwhile.

7.7 years ago

Any modern caller should be "good enough" for most high-VAF calls in relatively pure samples (70% counts as relatively pure, in my book). When you need low-VAF data or are worried about tricky regions, my preferred approach is to run several callers, merge the calls, then do some post-filtering.

See Figure 4 and the supplement of our paper for a comparison on one very deeply sequenced tumor: http://www.cell.com/cell-systems/abstract/S2405-4712(15)00113-1. It does not include newer callers like MuSE or MuTect2, but it does show that different callers have different strengths and weaknesses.
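As a rough illustration of the "run several callers, merge the calls, then post-filter" idea, here is a minimal Python sketch. It assumes each caller's output has already been parsed into a set of (chrom, pos, ref, alt) tuples; the caller names, the example variants, and the "keep calls seen by at least two callers" rule are all hypothetical choices for illustration, not the filtering used in the paper.

```python
from collections import defaultdict


def merge_calls(calls_by_caller):
    """Map each variant (chrom, pos, ref, alt) to the set of callers reporting it."""
    support = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].add(caller)
    return dict(support)


def post_filter(support, min_callers=2):
    """Keep variants reported by at least `min_callers` callers.

    The threshold is an illustrative assumption; real post-filtering usually
    also looks at depth, strand bias, mapping quality and so on.
    """
    return sorted(v for v, callers in support.items() if len(callers) >= min_callers)


# Hypothetical, already-parsed call sets from three callers.
calls_by_caller = {
    "caller_a": {("chr1", 12345, "A", "T"), ("chr2", 500, "G", "C")},
    "caller_b": {("chr1", 12345, "A", "T"), ("chr3", 42, "C", "T")},
    "caller_c": {("chr1", 12345, "A", "T"), ("chr2", 500, "G", "C")},
}

support = merge_calls(calls_by_caller)
consensus = post_filter(support, min_callers=2)

for variant in consensus:
    print(variant, sorted(support[variant]))
```

Keeping the per-variant caller set around (rather than only the consensus list) makes it easy to look separately at calls unique to one caller, which is often where the low-VAF candidates are.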


Thanks, this is excellent advice and really useful. One question: what would you consider high-VAF?

7.7 years ago
igor 13k

Different variant callers will perform differently for different samples. You have to find one that works best for yours. This means calling variants with different callers and checking which ones can be successfully validated.
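One way to make that comparison concrete is to tabulate, per caller, how many of its calls were confirmed among the sites you actually tested. The sketch below assumes you have a set of sites that validated as real and a set that failed validation (for example by orthogonal sequencing); the variant identifiers, caller names, and the `validation_summary` helper are hypothetical.

```python
def validation_summary(calls, validated_true, validated_false):
    """Summarise one caller's calls against validation results.

    precision   = confirmed calls / calls that were tested at all
    sensitivity = confirmed calls / all sites that validated as real
    Calls that were never tested are ignored rather than counted as errors.
    """
    tested = calls & (validated_true | validated_false)
    confirmed = calls & validated_true
    return {
        "tested": len(tested),
        "confirmed": len(confirmed),
        "precision": len(confirmed) / len(tested) if tested else None,
        "sensitivity": len(confirmed) / len(validated_true) if validated_true else None,
    }


# Hypothetical validation results and call sets.
validated_true = {"chr1:100:A>T", "chr1:200:G>C", "chr2:300:C>T", "chr3:400:T>A"}
validated_false = {"chr4:500:G>A"}

calls_by_caller = {
    "caller_a": {"chr1:100:A>T", "chr1:200:G>C", "chr4:500:G>A", "chr5:600:C>G"},
    "caller_b": {"chr1:100:A>T", "chr1:200:G>C", "chr2:300:C>T"},
}

summary = {
    caller: validation_summary(calls, validated_true, validated_false)
    for caller, calls in calls_by_caller.items()
}

for caller, stats in summary.items():
    print(caller, stats)
```

Note that sensitivity here is only relative to the sites you chose to validate, so it is biased towards whatever callers nominated those sites in the first place.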

It sounds like you are basing your assessment on tumor purity alone. Purity is a complicated measurement, and it's often difficult to estimate the tumor fraction accurately. How confident are you in that 70% estimate? Additionally, tumors are heterogeneous, so there is never actually a pure tumor. Tumor fraction aside, there are a lot of other factors, such as the quality of input DNA and sequencing depth, that will have a huge impact on your results.
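To get a feel for how much the purity estimate and heterogeneity matter, it can help to compute the expected variant allele fraction under simple assumptions. The sketch below uses the standard mixture relation VAF = (purity × multiplicity × CCF) / (purity × tumor copy number + (1 − purity) × normal copy number); the function name, parameter names, and default values (diploid tumor and normal, one mutated copy, fully clonal) are illustrative assumptions.

```python
def expected_vaf(purity, tumor_cn=2, normal_cn=2, multiplicity=1, ccf=1.0):
    """Expected variant allele fraction for a somatic variant.

    purity       : fraction of cells in the sample that are tumor cells
    tumor_cn     : total copy number at the locus in tumor cells
    normal_cn    : total copy number at the locus in normal cells
    multiplicity : number of mutated copies per mutated tumor cell
    ccf          : cancer cell fraction (1.0 = clonal, <1.0 = subclonal)
    """
    mutant_alleles = purity * multiplicity * ccf
    total_alleles = purity * tumor_cn + (1 - purity) * normal_cn
    return mutant_alleles / total_alleles


# Clonal heterozygous variant in a diploid region at several purities.
for purity in (0.5, 0.7, 0.9):
    print(f"purity={purity:.1f}  expected VAF={expected_vaf(purity):.3f}")

# Same locus, but the variant is present in only 20% of tumor cells.
print(f"subclonal (CCF=0.2) at 70% purity: {expected_vaf(0.7, ccf=0.2):.3f}")
```

Under these assumptions a clonal heterozygous variant sits at roughly half the purity, while subclonal variants fall much lower, which is where callers tend to differ most.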

7.7 years ago
d-cameron ★ 2.9k

Have a look at the results of the DREAM Somatic Mutation Calling Challenge [1]. There are a number of somatic-only callers that perform well on their benchmarks.

[1] http://dreamchallenges.org/project/icgc-tcga-dream-somatic-mutation-calling-challenge/
