Question

Mixed opinions on somatic variant calling method

1

9.0 years ago

umn_bist ▴ 390

I have searched previous posts (post1, post2) on how to call somatic variants, and the general practice seems to be to intersect multiple callers to ensure that low-VAF mutations are being called.

My approach was to use samtools, MuTect2, SomaticSniper, and VarScan2, but I found an interesting post saying that as long as read placements are perfect, any caller suffices (even samtools mpileup). I should mention that I am working with RNA-Seq data from cancer samples (matched with normal).

"In general, my view is as long as read placement is perfect, even naive methods work sufficiently well... To me, the simplest yet most effective strategy is to use two distinct alignment algorithms, such as bwa and bwa-sw, which have distinct error modes. You only consider mutations shared between the two alignments... Another complication is structural variations, in which I am less experienced. In some sense, false mutations caused by structural variations are still indication of something different between normal and tumor... In all, I think you do not need to worry about which software to use for detecting somatic mutations - anything reasonable is fine. You should pay more attention to mismapping and structural variations."

First, does read placement mean how well aligners align our samples to the reference? Does working with RNA-Seq introduce higher error rates in read placements? Has the consensus changed or is intersecting multiple callers still recommended? Thank you very much for your help.

RNA-Seq somatic mutation variant caller • 3.3k views

ADD COMMENT • link 9.0 years ago by umn_bist ▴ 390

Ram · Accepted Answer · 2016-01-26

3

9.0 years ago

Chris Miller 22k

If someone tells you that somatic mutation calling is easy or a solved problem, they have never really tried to do somatic mutation calling.

There is a host of issues to contend with: sequencing artifacts, problems with the reference, differential coverage, mismappings (and yes, they are common!), etc. Your approach of using multiple callers seems sensible. The tricky part is figuring out how to combine them. Straight intersections will give you high specificity, but low sensitivity. Taking the union will result in the opposite. A more nuanced approach has been explored by recent tools like SomaticSeq. I haven't used that one in particular, but I am convinced that an approach of that nature is most likely to succeed.
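To make the intersection/union trade-off concrete, here is a minimal Python sketch. It assumes each caller's output has already been parsed into a set of `(chrom, pos, ref, alt)` tuples; the caller names and variants below are made-up toy data, and `consensus_calls` is a hypothetical helper, not part of any of these tools.

```python
from collections import Counter


def consensus_calls(calls_by_caller, min_callers):
    """Return the variants reported by at least `min_callers` callers.

    min_callers == 1 gives the union (most sensitive, least specific);
    min_callers == number of callers gives the strict intersection
    (most specific, least sensitive).
    """
    support = Counter()
    for variants in calls_by_caller.values():
        # set() guards against a caller listing the same variant twice
        support.update(set(variants))
    return {variant for variant, n in support.items() if n >= min_callers}


# Hypothetical toy call sets keyed by (chrom, pos, ref, alt)
calls = {
    "mutect2": {
        ("chr1", 100, "A", "T"),
        ("chr2", 200, "G", "C"),
        ("chr3", 300, "C", "T"),
    },
    "somaticsniper": {
        ("chr1", 100, "A", "T"),
        ("chr2", 200, "G", "C"),
    },
    "varscan2": {
        ("chr1", 100, "A", "T"),
        ("chr4", 400, "T", "G"),
    },
}

intersection = consensus_calls(calls, min_callers=len(calls))
union = consensus_calls(calls, min_callers=1)
majority = consensus_calls(calls, min_callers=2)

print(len(intersection), len(majority), len(union))
```

A "called by at least k of n" rule like `majority` sits between the two extremes. Tools such as SomaticSeq go further and learn how to weight each caller, which this sketch does not attempt.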

ADD COMMENT • link updated 5.1 years ago by Ram 44k • written 9.0 years ago by Chris Miller 22k

0

Thank you for your reply. The reference you provided will be a great help. One thing - can you clarify this part of your response:

Straight intersections will give you high specificity, but low specificity.

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.0 years ago by umn_bist ▴ 390

1

Whoops - intersections will give you high *specificity* but low *sensitivity*. I'll edit my answer to fix that!

ADD REPLY • link 9.0 years ago by Chris Miller 22k

0

No problem. Thank you for the clarification. Question: for these callers (specifically samtools mpileup), is there any documentation of common/established hard filters used for somatic variants?

ADD REPLY • link 9.0 years ago by umn_bist ▴ 390

2

samtools mpileup will not be a good approach without some significant downstream work to determine the evidence for a normal genotype in the normal and a different (mutant) genotype in the tumor. I'd stick to somatic variant callers for calling somatic variants.
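As a rough illustration of the kind of downstream work meant here, the sketch below compares alt/ref read counts at one site between tumor and normal using a one-sided Fisher's exact test, and also requires the normal to look reference-like. The function names and the thresholds (`max_normal_vaf`, `alpha`) are assumptions chosen for illustration only. Dedicated somatic callers also model base and mapping quality, strand bias, contamination, and more, so treat this as a toy, not a replacement for them.

```python
from math import comb


def fisher_one_sided(tumor_alt, tumor_ref, normal_alt, normal_ref):
    """One-sided Fisher's exact test p-value.

    Probability of seeing at least `tumor_alt` alt reads in the tumor
    if alt reads were distributed independently of sample
    (hypergeometric upper tail).
    """
    n_tumor = tumor_alt + tumor_ref
    total_alt = tumor_alt + normal_alt
    total = n_tumor + normal_alt + normal_ref
    upper = min(total_alt, n_tumor)
    tail = sum(
        comb(total_alt, k) * comb(total - total_alt, n_tumor - k)
        for k in range(tumor_alt, upper + 1)
    )
    return tail / comb(total, n_tumor)


def looks_somatic(tumor_alt, tumor_ref, normal_alt, normal_ref,
                  max_normal_vaf=0.02, alpha=0.01):
    """Toy somatic filter; thresholds are illustrative assumptions."""
    tumor_depth = tumor_alt + tumor_ref
    normal_depth = normal_alt + normal_ref
    if tumor_depth == 0 or normal_depth == 0:
        return False  # no evidence in one of the samples
    normal_vaf = normal_alt / normal_depth
    p_value = fisher_one_sided(tumor_alt, tumor_ref, normal_alt, normal_ref)
    # Require a reference-like normal AND significant alt enrichment in tumor
    return normal_vaf <= max_normal_vaf and p_value < alpha


# Alt allele present in tumor only: passes the toy filter
print(looks_somatic(tumor_alt=12, tumor_ref=48, normal_alt=0, normal_ref=60))
# Alt allele at ~50% in both samples (germline-like): rejected
print(looks_somatic(tumor_alt=15, tumor_ref=15, normal_alt=14, normal_ref=16))
```

The explicit check on the normal sample is the point: a site with strong alt support in the tumor is not somatic evidence unless the normal has enough depth to show that the allele is absent there.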

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.0 years ago by Sean Davis 27k

0

That is unfortunate. I am a new trainee and samtools was what I was most comfortable with. I may just follow up on SomaticSeq and use their pipeline, considering the limitations of using any single caller. Thank you for your help.

Do you have any recommended papers that explain the challenges of analyzing somatic variants?

ADD REPLY • link 9.0 years ago by umn_bist ▴ 390

1

SomaticSeq uses a machine learning approach. Therefore, to put it to best use, you need a training set. I suspect that you don't have such a set, so you might want to start by running some tools like Strelka, MuTect, LoFreq, VarScan2, etc.

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.0 years ago by Sean Davis 27k

0

There's some info that will be useful to you in our recent paper here:

Optimizing Cancer Genome Sequencing and Analysis

See Figure 4 and the supplement for some detailed info on specific variant callers and how they performed on this ultra-deep, highly validated tumor.

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.0 years ago by Chris Miller 22k