I'm having trouble selecting somatic mutation calling software for non-cancer cases, and would really appreciate some suggestions. Basically, we are comparing multiple small chunks of healthy tissue to a blood sample, which serves as the germline reference. (I won't go into all the details here.) I have briefly looked into and tried the following software, and none of it is completely satisfying:
MuTect 2. It is too sensitive, because it considers "varying allelic fraction for each variant, as is often seen in tumors with purity less than 100%, multiple subclones, and/or copy number variation" (reference). That's not to say my tissue samples are perfectly pure, but rather than looking into the impurity, I would prefer to discard such variants for now. Also, MuTect 2 does not work very well on heterogeneity changes. I observed situations similar to the one described in this post: http://gatkforums.broadinstitute.org/gatk/discussion/7619/how-does-mutect2-treat-heterozygous-mutant-to-homozygous-mutant
Samtools. It is quite flexible, especially if I stop at the pileup step and do all the filtering on my own. However, it is very slow and generates huge output files.
VarScan 2. It seems to work, but it gives me many more mutation calls than I was expecting, even after hard filtering. I do like the detailed output. I haven't yet compared its calls against any other software, so I'm not sure whether the results are reliable.
GATK (applied to each sample individually). I know it is not designed for somatic mutation calling, but I tried it just because I'm familiar with it. (In fact, if I assume each of my samples is pure, then the traditional diploid assumption still applies and GATK should be okay to use, I think.) It is also too sensitive. The results are not at all comparable to those from VarScan 2.
Is there any other software you would recommend for my case? (I don't mind filtering out bad or even not-so-good reads and positions at this point, so I guess I care more about specificity than sensitivity. I also care a lot about changes of heterogeneity at single sites, since they are supposed to be common in healthy tissue.) Thanks a lot in advance.
I am not really recommending samtools, but if your only problem with samtools is the huge output file, you can use a Unix pipe like:
samtools mpileup sample1.bam sample2.bam | post-filter.py
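For illustration only, here is a minimal sketch of what a script like post-filter.py could contain; it is not an existing tool. It assumes that sample1.bam is the tissue and sample2.bam is the blood, that mpileup was given a reference (the -f option) so that column 3 holds the real reference base, and that the output is the default nine-column layout for two samples. The function names and all thresholds (min_depth, min_alt_reads, min_alt_frac) are made up and would need tuning for real data.

```python
import sys
from collections import Counter


def count_bases(bases, ref):
    """Count the bases seen in one mpileup read-base string."""
    counts = Counter()
    i, n = 0, len(bases)
    while i < n:
        c = bases[i]
        if c == "^":
            # Read-start marker: skip it and the mapping-quality character.
            i += 2
        elif c in "+-":
            # Indel: sign, length digits, then that many bases. Skipped here.
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:
            if c in ".,":
                counts[ref.upper()] += 1      # match to the reference
            elif c.upper() in "ACGT":
                counts[c.upper()] += 1        # mismatch, either strand
            # '$', '*', '<', '>' and N are ignored.
            i += 1
    return counts


def somatic_candidate(line, min_depth=20, min_alt_reads=4, min_alt_frac=0.1):
    """Return (chrom, pos, ref, alt, alt_reads, depth) or None.

    Hypothetical rule: the alt allele must be well supported in the
    tissue (sample 1) and completely absent from the blood (sample 2).
    """
    f = line.rstrip("\n").split("\t")
    if len(f) < 9:
        return None
    ref = f[2].upper()
    tissue = count_bases(f[4], ref)
    blood = count_bases(f[7], ref)
    t_depth = sum(tissue.values())
    b_depth = sum(blood.values())
    if t_depth < min_depth or b_depth < min_depth:
        return None
    for alt, reads in tissue.most_common():
        if alt == ref:
            continue
        if (reads >= min_alt_reads
                and reads / t_depth >= min_alt_frac
                and blood[alt] == 0):
            return (f[0], int(f[1]), ref, alt, reads, t_depth)
    return None


def filter_stream(lines, out=sys.stdout):
    """Write one tab-separated line per candidate site."""
    for line in lines:
        hit = somatic_candidate(line)
        if hit is not None:
            out.write("\t".join(map(str, hit)) + "\n")
```

To use it in the pipe above, the script would end with a call to filter_stream(sys.stdin). Because only the candidate sites are written out, the full pileup never has to be stored on disk.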
Wow, thanks so much for your reply. I really like your software! Yes, I will use piping if I decide to use samtools eventually, but what makes you think samtools is not recommended?