In light of a new article in PLOS ONE, "Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data" (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0151664), which recommends MuTect, Strelka and Virmid for somatic mutation calling, I was wondering whether you could share your experience with these three algorithms, or whether you have any other recommendations. Thanks a lot.
I always take results like these with a healthy grain of salt, mainly because the true set of mutations usually isn't defined, so there is no sound analysis of precision, recall, and similar measures. The authors also generally assumed that the four callers that tended to converge on one another, and agreed with each other the most, were also the correct callers. I happen to agree with them, but that kind of reasoning can be flawed if the callers share a systematic bias due to a similar calling method. That said, of the recommended callers I've only used MuTect extensively. I'm somewhat surprised that FreeBayes wasn't included, as it is widely considered an excellent haplotype-based caller as well. MuTect, in my experience, works quite well, although the old version is now somewhat of a pain to find and download, and good documentation for it is hard to come by. I believe there are also issues with making sure you are using the correct Java version, which is a perennial problem for Java software in general and for anything that uses the GATK in particular.
I'm now looking at testing EBCall and Virmid in my own pipeline, since they aren't in it yet. So far I've been using MuTect, FreeBayes, Scalpel, VarDict, Pindel, and Platypus.
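To make the point about truth sets and caller agreement concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the caller names, the variants, and the `min_callers` cutoff are assumptions, not data from the paper. It shows how precision and recall are computed once a truth set exists, and how two callers with the same bias can push a false positive into the consensus.

```python
from collections import Counter


def precision_recall(calls, truth):
    """Return (precision, recall, F1) for a call set against a truth set."""
    calls, truth = set(calls), set(truth)
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def consensus(callsets, min_callers=2):
    """Return the variants reported by at least `min_callers` callers."""
    counts = Counter(v for calls in callsets.values() for v in set(calls))
    return {v for v, n in counts.items() if n >= min_callers}


# Toy variants keyed as (chrom, pos, ref, alt); all hypothetical.
v1 = ("chr1", 100, "A", "T")
v2 = ("chr1", 200, "G", "C")
v3 = ("chr2", 50, "C", "T")
v4 = ("chr3", 10, "G", "A")
fp = ("chr5", 999, "T", "G")  # artefact that is not in the truth set

truth = {v1, v2, v3, v4}
callsets = {
    "caller_a": {v1, v2, fp},  # caller_a and caller_b share the same bias
    "caller_b": {v1, v2, fp},
    "caller_c": {v1, v3},
}

consensus_calls = consensus(callsets, min_callers=2)
precision, recall, f1 = precision_recall(consensus_calls, truth)
print(f"consensus size={len(consensus_calls)} "
      f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case the shared artefact survives the two-caller consensus, which is exactly why agreement between callers is not the same thing as correctness.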
You could also have a look at this paper, which compared mutation callers:
Alioto, T. S., Buchhalter, I., Derdak, S., Hutter, B., Eldridge, M. D., Hovig, E., … Gut, I. G. (2015). A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nature Communications, 6, 10001. doi:10.1038/ncomms10001 (http://www.ncbi.nlm.nih.gov/pubmed/26647970)
Hi Bogdan,
Can you please share how you ran Virmid?
I am trying to run Virmid, but I keep getting an error, as posted here: "Virmid mutation calling, error".
Thanks!
Hi Chirag, I will have to check my notes and folders, and I can email you later. One note to add, though: Virmid does not call indels, it only calls SNVs. I would recommend using MuTect2 and VarScan, as you can obtain both SNV and indel lists from them.
Thank you! Looking forward to your reply in the post above. I have already used MuTect and VarScan. I wanted to try Virmid, but have not managed to run it.
Yes, that is nice.
I have two questions.
The called somatic variants are classified as PASS (high confidence) or as Tier1 to Tier5. Which of these do you think is reasonable to include for downstream filtering?
These are the total counts I got in my cohort of around 100 patients:
PASS 17846
Tier1 5836
Tier2 3604
Tier3 1638
Tier4 138
Tier5 2038
What filtering threshold did you use?
Second, which command did you use to filter somatic calls based on AD?
PASS contains the most stringently called variants. For WES, the recommendation is that Tier5 can be excluded, and it is also suggested to compare the calls with other tools, as that might reduce false positives. Another thing: MuSE reports LOH as somatic events, while VarScan2 separates LOH from somatic events.
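On the AD question, I can't say which command was used, but here is a minimal Python sketch of the general idea. It assumes the VCF has an `AD` entry in the FORMAT column given as "ref,alt" read depths, and that the tumor is the first sample column. The function names and the thresholds `min_alt_reads` and `min_vaf` are hypothetical and need tuning for your data.

```python
def read_ad(vcf_line, sample_index=0):
    """Return (ref_depth, alt_depth) from the AD field of one sample, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    ad = dict(zip(format_keys, sample_values)).get("AD")
    if ad is None:
        return None
    depths = [0 if x == "." else int(x) for x in ad.split(",")]
    # First value is the reference depth; sum the rest for multi-allelic sites.
    return depths[0], sum(depths[1:])


def filter_by_ad(lines, sample_index=0, min_alt_reads=5, min_vaf=0.05):
    """Yield header lines plus records passing the alt-depth and VAF cutoffs."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        depths = read_ad(line, sample_index)
        if depths is None:
            continue  # no AD field: dropped in this sketch
        ref_depth, alt_depth = depths
        total = ref_depth + alt_depth
        if total > 0 and alt_depth >= min_alt_reads and alt_depth / total >= min_vaf:
            yield line


# Toy records (hypothetical); columns are joined with tabs as in a real VCF.
header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                    "INFO", "FORMAT", "TUMOR", "NORMAL"])
records = [
    "\t".join(["chr1", "100", ".", "A", "T", ".", "PASS", ".", "GT:AD", "0/1:30,12", "0/0:40,0"]),
    "\t".join(["chr1", "200", ".", "G", "C", ".", "PASS", ".", "GT:AD", "0/1:50,2", "0/0:45,0"]),
    "\t".join(["chr2", "300", ".", "C", "T", ".", "Tier1", ".", "GT:AD", "0/1:200,6", "0/0:90,0"]),
]

kept = list(filter_by_ad([header] + records))
for line in kept:
    print(line)
```

With these cutoffs only the first record is kept: the second has too few alt reads, and the third has enough alt reads but too low an allele fraction.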
Thanks for sharing. I was actually wondering how MuTect2 handles LOH cases. In my somatic mutation calling project, heterogeneity changes turn out to be important. Thank you.
Are you talking about changes in tumour heterogeneity or LOH (Loss of Heterozygosity)?
I mean heterogeneity at single sites, not a range of LOH. E.g. read percentages for normal: 40% A, 60% T, and for tumor: 15% A, 85% T. Would MuTect mark this as a variant, or just consider it sample impurity? Thank you.
MuTect should discard it. The variant in this case is present in the normal tissue, so it isn't a somatic mutation.
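As a rough illustration of that reasoning, here is a small Python sketch. It is not MuTect's actual statistical model, just a simplified rule under assumed cutoffs (`min_tumor_fraction` and `max_normal_fraction` are hypothetical): an allele is only a somatic candidate if it is supported in the tumor and essentially absent from the normal.

```python
def candidate_somatic_alleles(tumor_fractions, normal_fractions,
                              min_tumor_fraction=0.05,
                              max_normal_fraction=0.02):
    """Return alleles seen in the tumor but (almost) absent from the normal."""
    return sorted(
        allele
        for allele, tumor_fraction in tumor_fractions.items()
        if tumor_fraction >= min_tumor_fraction
        and normal_fractions.get(allele, 0.0) <= max_normal_fraction
    )


# The site from the question: both alleles are already present in the normal.
normal = {"A": 0.40, "T": 0.60}
tumor = {"A": 0.15, "T": 0.85}
shifted_site = candidate_somatic_alleles(tumor, normal)

# A contrasting toy site: T appears only in the tumor.
novel_site = candidate_somatic_alleles({"A": 0.70, "T": 0.30}, {"A": 1.00})

print(shifted_site, novel_site)
```

Under this rule the 40/60 to 15/85 shift yields no somatic candidate, because neither allele is new in the tumor; a shift in allele balance like that is what LOH-aware tools report separately.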
I see. Thanks a lot!