Hello,
I have been using VarScan and MuTect to call somatic variants. Lately, out of curiosity, I also tried calling variants separately for the tumor sample (vs. hg19) and the normal sample (also vs. hg19) using GATK, and then compared the two outputs. I thought the difference would, in theory, also give the somatic variants. However, the number of variants found this way is far larger than the number found by VarScan or MuTect.
My question is: what's wrong with doing so? Is it because systematic errors in the algorithm are somehow doubled when variants are called separately?
Thank you very much.
Hi Chris, thank you very much for the quick reply. I'm going to read your paper right now. I did search briefly before this post, but couldn't obtain a satisfying answer.
So just to confirm: if I don't worry about purity, ploidy, heterogeneity or AFs, calling variants separately would be appropriate then? Thank you:)
Sort of. There are also sequencing artifacts to consider, which are easier to detect with joint calling (because they'll appear in both samples). There's a nice overview of this in the Strelka paper.
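To make the artifact point concrete, here is a toy Python sketch comparing "call each sample alone, then subtract" with a filter that looks at the normal reads directly. It is only an illustration: the read counts, the site names, and every threshold (`min_alt`, `min_vaf`, `max_normal_alt`, `min_normal_depth`) are made up, and the hard cutoffs are a stand-in for the statistical models that real callers such as VarScan, MuTect, or Strelka use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pileup:
    """Hypothetical per-site read counts for one sample."""
    alt: int    # reads supporting the alternate allele
    depth: int  # total reads covering the site

    @property
    def vaf(self) -> float:
        return self.alt / self.depth if self.depth else 0.0


def called_alone(p: Pileup, min_alt: int = 4, min_vaf: float = 0.05) -> bool:
    """Toy single-sample caller: hard cutoffs, chosen arbitrarily."""
    return p.alt >= min_alt and p.vaf >= min_vaf


def naive_somatic(tumor: dict, normal: dict) -> set:
    """Separate calling: sites called in tumor minus sites called in normal."""
    normal_calls = {s for s, p in normal.items() if called_alone(p)}
    return {s for s, p in tumor.items() if called_alone(p)} - normal_calls


def joint_somatic(tumor: dict, normal: dict,
                  max_normal_alt: int = 1, min_normal_depth: int = 8) -> set:
    """Paired filter: inspect the normal reads even when normal has no call."""
    out = set()
    for site, t in tumor.items():
        if not called_alone(t):
            continue
        n = normal.get(site)
        if n is None or n.depth < min_normal_depth:
            continue  # too little normal coverage to rule out a germline variant
        if n.alt > max_normal_alt:
            continue  # alt reads in normal: germline variant or shared artifact
        out.add(site)
    return out


# Made-up example sites.
tumor = {
    "chr1:1000": Pileup(12, 40),  # real somatic variant
    "chr2:2000": Pileup(5, 50),   # recurrent artifact, just above threshold
    "chr3:3000": Pileup(20, 40),  # germline het, normal poorly covered
    "chr4:4000": Pileup(18, 40),  # germline het, well covered in both
}
normal = {
    "chr1:1000": Pileup(0, 35),
    "chr2:2000": Pileup(3, 45),   # same artifact, just below threshold
    "chr3:3000": Pileup(1, 3),
    "chr4:4000": Pileup(16, 38),
}

naive = naive_somatic(tumor, normal)
joint = joint_somatic(tumor, normal)
print(sorted(naive))
print(sorted(joint))
```

In this toy data the subtraction approach keeps the artifact site and the under-covered germline site, because "not called in normal" is not the same as "absent in normal". The paired filter drops both, which is one reason separate calling tends to report many more candidates.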
Thank you very much for the information. I really appreciate it.
How about flipping it around? Given the availability of tools that perform joint calling of paired tumor-normal samples and take these factors into account, why would you want to run a pipeline that does not?
Even in germline calling, there is a benefit to calling multiple samples jointly (or pooled, depending on your terminology) rather than calling the samples independently (see here), and you get a further benefit by incorporating pedigree information directly into the calling, as done in RTG's pedigree-aware calling. (BTW, RTG also has a paired tumor-normal variant caller you may want to try.)
Like I said, it was out of curiosity :) Thank you very much for the information.