I have searched previous posts (post1, post2) on how to call somatic variants, and it seems the general practice is to intersect multiple callers to ensure that low-VAF mutations are being called.
My approach was to use samtools, MuTect2, SomaticSniper, and VarScan2, but I found an interesting post saying that as long as read placement is perfect, any caller suffices (even samtools mpileup). I should mention that I am working with RNA-seq data from cancer samples (matched with normal).
In general, my view is as long as read placement is perfect, even naive methods work sufficiently well... To me, the simplest yet most effective strategy is to use two distinct alignment algorithms, such as bwa and bwa-sw, which have distinct error modes. You only consider mutations shared between the two alignments... Another complication is structural variations, in which I am less experienced. In some sense, false mutations caused by structural variations are still indication of something different between normal and tumor... In all, I think you do not need to worry about which software to use for detecting somatic mutations - anything reasonable is fine. You should pay more attention to mismapping and structural variations.
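To make the "only consider mutations shared between the two alignments" idea from the quoted post concrete, here is a minimal sketch in Python. It assumes each alignment has already been run through a caller and written out as VCF text; the helper names (`parse_vcf_lines`, `shared_calls`) and the toy records are made up for illustration. It matches on exact chrom/pos/ref/alt only, so indels would need to be left-aligned and normalized first.

```python
from typing import Iterable, Set, Tuple

# A variant is identified by (chrom, pos, ref, alt).
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from VCF text lines, splitting multi-allelic records."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            variants.add((chrom, pos, ref, alt))
    return variants


def shared_calls(first: Set[Variant], second: Set[Variant]) -> Set[Variant]:
    """Keep only the calls supported by both alignments."""
    return first & second


# Toy records (hypothetical): calls made from two different alignments.
aligner_a_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t200\t.\tG\tC,T\t.\tPASS\t.",
    "chr2\t50\t.\tC\tG\t.\tPASS\t.",
]
aligner_b_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t200\t.\tG\tT\t.\tPASS\t.",
    "chr3\t10\t.\tT\tA\t.\tPASS\t.",
]

shared = shared_calls(parse_vcf_lines(aligner_a_vcf), parse_vcf_lines(aligner_b_vcf))
for variant in sorted(shared):
    print(variant)
```

Calls that appear in only one alignment (here the chr2 and chr3 records) are dropped, which is the point: they are the ones most likely to come from aligner-specific mismapping.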
First, does read placement mean how well aligners align our samples to the reference? Does working with RNA-Seq introduce higher error rates in read placements? Has the consensus changed or is intersecting multiple callers still recommended? Thank you very much for your help.
Thank you for your reply. The reference you provided will be a great help. One thing - can you clarify this part of your response:
Whoops - intersections will give you high *specificity* but low *sensitivity*. I'll edit my answer to fix that!
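A toy illustration of that trade-off, as a rough sketch rather than a benchmark: the site IDs, the truth set, and both call sets below are invented. Precision is used here as a stand-in for specificity, since counting true negatives would mean counting every non-variant site in the genome.

```python
from typing import Set, Tuple


def sensitivity_and_precision(calls: Set[int], truth: Set[int]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of a call set against a truth set."""
    true_positives = len(calls & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(calls) if calls else 0.0
    return sensitivity, precision


# Hypothetical data: sites 1-6 are real somatic mutations.
truth = {1, 2, 3, 4, 5, 6}
caller_a = {1, 2, 3, 4, 7, 8}   # finds four real sites plus two false positives
caller_b = {1, 2, 3, 5, 9, 10}  # finds four real sites plus two different false positives

intersection = caller_a & caller_b
union = caller_a | caller_b

inter_sens, inter_prec = sensitivity_and_precision(intersection, truth)
union_sens, union_prec = sensitivity_and_precision(union, truth)

print(f"intersection: sensitivity={inter_sens:.2f} precision={inter_prec:.2f}")
print(f"union:        sensitivity={union_sens:.2f} precision={union_prec:.2f}")
```

In this made-up case the intersection keeps only real sites but misses half of them, while the union recovers five of the six at the cost of four false positives.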
No problem. Thank you for the clarification. Question: for these callers (specifically samtools mpileup), is there any documentation of common/established hard filters used for somatic variants?
samtools mpileup will not be a good approach without some significant downstream work to determine the evidence for a normal genotype in the normal and a different (mutant) genotype in the tumor. I'd stick to somatic variant callers for calling somatic variants.
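To give a feel for what that downstream work involves, here is a deliberately naive sketch in Python. It assumes you have already extracted per-site reference and alternate read counts for the tumor and the matched normal (for example from pileup output). The `AlleleCounts` class, the `looks_somatic` function, and every threshold are hypothetical placeholders, not established filters. Dedicated somatic callers model this statistically and also account for things like tumor contamination of the normal, strand bias, and mapping quality.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Reference and alternate read counts at one site in one sample."""
    ref: int
    alt: int

    @property
    def depth(self) -> int:
        return self.ref + self.alt

    @property
    def vaf(self) -> float:
        # Variant allele fraction; defined as 0 when there is no coverage.
        return self.alt / self.depth if self.depth else 0.0


def looks_somatic(
    tumor: AlleleCounts,
    normal: AlleleCounts,
    min_tumor_alt: int = 4,        # hypothetical threshold
    min_tumor_vaf: float = 0.05,   # hypothetical threshold
    min_normal_depth: int = 10,    # hypothetical threshold
    max_normal_vaf: float = 0.01,  # hypothetical threshold
) -> bool:
    """Crude check: variant evidence in the tumor, reference evidence in the normal."""
    if normal.depth < min_normal_depth:
        return False  # too little coverage to trust a reference genotype in the normal
    if normal.vaf > max_normal_vaf:
        return False  # alternate allele also seen in the normal: likely germline or artifact
    return tumor.alt >= min_tumor_alt and tumor.vaf >= min_tumor_vaf


print(looks_somatic(AlleleCounts(ref=90, alt=10), AlleleCounts(ref=50, alt=0)))  # tumor-only variant
print(looks_somatic(AlleleCounts(ref=50, alt=50), AlleleCounts(ref=25, alt=25)))  # germline-like het
```

With RNA-seq the normal-depth check matters even more, because a gene that is not expressed in the normal sample gives no evidence either way.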
That is unfortunate. I am a new trainee and samtools was what I was most comfortable with. I may just follow up on SomaticSeq and use their pipeline, given the limitations of relying on any single caller. Thank you for your help.
Do you have any recommended papers that explain the challenges of analyzing somatic variants?
SomaticSeq uses a machine learning approach, so to put it to best use you need a training set. I suspect that you don't have such a set, so you might want to start by running some tools like Strelka, MuTect, LoFreq, VarScan2, etc.
There's some info that will be useful to you in our recent paper here:
Optimizing Cancer Genome Sequencing and Analysis
See Figure 4 and the supplement for some detailed info on specific variant callers and how they performed on this ultra-deep, highly validated tumor.