Question

about somatic mutation calling

1

9.1 years ago

Bogdan ★ 1.4k

Dear all,

In light of a new article in PLOS ONE, "Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data" (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0151664), which recommends MuTect, Strelka and Virmid for somatic mutation calling, I am wondering whether you could share your experience with these three algorithms, or whether you have any other recommendations. Thanks a lot,

bogdan

SNP genome sequencing snp • 5.4k views

ADD COMMENT • link updated 9.1 years ago by Tabea ▴ 30 • written 9.1 years ago by Bogdan ★ 1.4k

2

9.1 years ago

Tabea ▴ 30

You could also have a look at this paper, which compared mutation callers:

Alioto, T. S., Buchhalter, I., Derdak, S., Hutter, B., Eldridge, M. D., Hovig, E., … Gut, I. G. (2015). A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nature Communications, 6, 10001. doi:10.1038/ncomms10001 (http://www.ncbi.nlm.nih.gov/pubmed/26647970)

They created a curated "gold" set of mutations.

ADD COMMENT • link 9.1 years ago by Tabea ▴ 30

0

Thanks, gentlemen! We were quite impressed with VIRMID: it provides very clean and very reliable results.

ADD REPLY • link 9.1 years ago by Bogdan ★ 1.4k

0

Hi Bogdan, can you please share how you ran VIRMID? I am trying to run it, but I keep getting an error, as posted here: "Virmid mutation calling, error". Thanks!

ADD REPLY • link 7.3 years ago by Chirag Nepal ★ 2.4k

0

Hi Chirag, I will have to check my notes and folders, and I can email you later. One note to add, though: VIRMID does not call indels; it calls only SNVs. I would recommend using MuTect2 and VarScan, as you can obtain both SNV and indel lists.

ADD REPLY • link 7.3 years ago by Bogdan ★ 1.4k

0

Thank you!! Looking forward to your reply in the above post. I have already used MuTect and VarScan. I wanted to try Virmid, but have not been successful in running it.

ADD REPLY • link 7.3 years ago by Chirag Nepal ★ 2.4k

0

Oh, well, the algorithm has not been updated for a long while. If you'd like, please also take a look at MuSE: http://bioinformatics.mdanderson.org/main/MuSE

ADD REPLY • link 7.3 years ago by Bogdan ★ 1.4k

0

Hi Chirag, talking about VIRMID :

-- I used version 1.2.0 a while ago (approx. 2 years ago), and it worked fine;
-- on the command line:

java -jar Virmid.jar -R hg19.fa -D tumor.bam -N normal.bam -c1 20 -C1 100 -c2 20 -C2 100

ADD REPLY • link 7.3 years ago by Bogdan ★ 1.4k

0

Thanks, Bogdan! I did use a similar command, as mentioned in the post linked above. However, I get an error. I tried MuSE, and it works fine.

ADD REPLY • link 7.3 years ago by Chirag Nepal ★ 2.4k

0

Glad to know that MuSE is working for you ;)

ADD REPLY • link 7.3 years ago by Bogdan ★ 1.4k

0

Hi Chirag, if I may ask, would it be possible to share 2-3 lines of variants from the VCF output by MuSE? Thanks a lot!

ADD REPLY • link 7.3 years ago by Bogdan ★ 1.4k

1

Yes, sure. I removed the comment lines. Here are the first 5 lines of called somatic variants:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR NORMAL
chr1 1274696 . G T . Tier5 SOMATIC GT:DP:AD:BQ:SS 0/1:42:39,3:35,37:2 0/0:15:15,0:36,0:.
chr1 2451820 . C T . Tier2 SOMATIC GT:DP:AD:BQ:SS 0/1:38:34,4:36,36:2 0/0:31:31,0:34,0:.
chr1 12855752 . G A . Tier1 SOMATIC GT:DP:AD:BQ:SS 0/1:241:231,10:35,35:2 0/0:280:280,0:35,0:.
chr1 12888364 . G A . PASS SOMATIC GT:DP:AD:BQ:SS 0/1:98:89,9:34,36:2 0/0:32:32,0:35,0:.
chr1 27277248 . C A . PASS SOMATIC GT:DP:AD:BQ:SS 0/1:89:77,12:35,37:2 0/0:87:87,0:34,0:.
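
For anyone reading these records, here is a minimal Python sketch of how the AD field (ref,alt read counts) can be turned into an allele fraction for the tumor and the normal sample. It assumes the sample columns are in the order given by the header line above (TUMOR first, then NORMAL); the function names are made up for illustration and are not part of MuSE.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys (e.g. GT:DP:AD:BQ:SS) with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def alt_allele_fraction(sample):
    """Return alt reads / (ref + alt reads) from AD, or None if there are no reads."""
    ref_reads, alt_reads = (int(x) for x in sample["AD"].split(",")[:2])
    total = ref_reads + alt_reads
    return alt_reads / total if total else None


# One record copied from the MuSE output above. It is space-separated here;
# a real VCF file is tab-separated, which str.split() handles as well.
line = ("chr1 27277248 . C A . PASS SOMATIC GT:DP:AD:BQ:SS "
        "0/1:89:77,12:35,37:2 0/0:87:87,0:34,0:.")
fields = line.split()

# Assumption: column 10 is the tumor and column 11 the normal, as in the header.
tumor = parse_sample(fields[8], fields[9])
normal = parse_sample(fields[8], fields[10])

tumor_vaf = alt_allele_fraction(tumor)
normal_vaf = alt_allele_fraction(normal)
print(f"{fields[0]}:{fields[1]} tumor VAF={tumor_vaf:.3f} normal VAF={normal_vaf:.3f}")
```

Here the tumor has 12 alt reads out of 89, and the normal has none.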

ADD REPLY • link 7.3 years ago by Chirag Nepal ★ 2.4k

0

Thanks a lot, Chirag. I am glad that they report the AD (allelic depth) field.

ADD REPLY • link 7.3 years ago by Bogdan ★ 1.4k

0

Yes, that is nice. I have two questions. First, the called somatic variants are classified as PASS (high confidence) or as Tier1 to Tier5. Which of these do you think it is reasonable to keep for downstream analysis? These are the total counts in my cohort of around 100 patients:

PASS 17846
Tier1 5836
Tier2 3604
Tier3 1638
Tier4 138
Tier5 2038

What filtering threshold did you use?

Second, which command did you use to filter somatic calls based on AD?

ADD REPLY • link 7.2 years ago by Chirag Nepal ★ 2.4k

0

Hi Chirag, for filtering you can use VCFtools or vcflib. As for tiers 1 to 5, we could ask the author of the algorithm.

ADD REPLY • link 7.2 years ago by Bogdan ★ 1.4k

0

PASS calls are the most stringent. For WES, it is recommended that Tier5 be excluded, and it is also suggested to compare the calls with those of other tools, as that might reduce false positives. Another thing: MuSE reports LOH as somatic events, while VarScan2 separates LOH from somatic events.
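
In case it helps, the two filters discussed here (by tier and by AD) can be sketched in a few lines of Python without extra tools. This is only a sketch under stated assumptions: the read-count cutoffs and the set of kept tiers are hypothetical values chosen for the example, the TUMOR column is assumed to come before NORMAL as in the header posted earlier, and it is not the way VCFtools or vcflib do it.

```python
import io

# Assumptions for illustration only, not MuSE recommendations:
KEEP_FILTERS = {"PASS", "Tier1", "Tier2", "Tier3", "Tier4"}  # i.e. drop Tier5
MIN_TUMOR_ALT_READS = 5   # hypothetical minimum alt-supporting reads in tumor
MAX_NORMAL_ALT_READS = 0  # hypothetical maximum alt-supporting reads in normal


def alt_reads(format_field, sample_field):
    """Return the alt read count from the AD entry (ref,alt) of one sample column."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    return int(values["AD"].split(",")[1])


def filter_records(handle):
    """Yield header lines unchanged, and only the records that pass all cutoffs."""
    for line in handle:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        tier, fmt, tumor, normal = fields[6], fields[8], fields[9], fields[10]
        if tier not in KEEP_FILTERS:
            continue
        if alt_reads(fmt, tumor) < MIN_TUMOR_ALT_READS:
            continue
        if alt_reads(fmt, normal) > MAX_NORMAL_ALT_READS:
            continue
        yield line


# The five records posted earlier in this thread, rebuilt as tab-separated text.
rows = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR NORMAL",
    "chr1 1274696 . G T . Tier5 SOMATIC GT:DP:AD:BQ:SS 0/1:42:39,3:35,37:2 0/0:15:15,0:36,0:.",
    "chr1 2451820 . C T . Tier2 SOMATIC GT:DP:AD:BQ:SS 0/1:38:34,4:36,36:2 0/0:31:31,0:34,0:.",
    "chr1 12855752 . G A . Tier1 SOMATIC GT:DP:AD:BQ:SS 0/1:241:231,10:35,35:2 0/0:280:280,0:35,0:.",
    "chr1 12888364 . G A . PASS SOMATIC GT:DP:AD:BQ:SS 0/1:98:89,9:34,36:2 0/0:32:32,0:35,0:.",
    "chr1 27277248 . C A . PASS SOMATIC GT:DP:AD:BQ:SS 0/1:89:77,12:35,37:2 0/0:87:87,0:34,0:.",
]
vcf_text = "".join("\t".join(row.split()) + "\n" for row in rows)

kept = [line for line in filter_records(io.StringIO(vcf_text))
        if not line.startswith("#")]
kept_positions = [line.split("\t")[1] for line in kept]
print(kept_positions)
```

With these example cutoffs the Tier5 record is dropped by tier and the Tier2 record is dropped for having only 4 alt reads in the tumor. For a real file, the `io.StringIO(...)` handle would be replaced by an opened VCF file.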

ADD REPLY • link 7.2 years ago by Chirag Nepal ★ 2.4k

score 3 · Accepted Answer · 2016-03-30

3

9.1 years ago

DG 7.3k

I always take some of these results with a healthy grain of salt, in particular because the actual truth set of mutations usually isn't defined, so there is no good analysis of precision, recall, and other measures. The authors also generally assumed that the four callers that tended to converge, and had the most agreement with each other, were also the correct callers. I happen to agree with them, but that kind of reasoning can be flawed if the callers suffer from a systematic bias due to similar calling methods. That said, of the recommended callers I've only used MuTect extensively. I'm somewhat surprised that FreeBayes wasn't included, as it is widely considered to be an excellent haplotype-based caller as well. MuTect, in my experience, works quite well, although I've found that the old version is now somewhat of a pain to find and download, and good documentation for it is hard to get. I believe there are also issues with making sure you are using the correct Java version, which is a perennial problem for Java software in general, and for anything that uses the GATK in particular.

I'm now looking at testing EBCall and Virmid in my own pipeline as they weren't in there. I've been using MuTect, FreeBayes, Scalpel, VarDict, Pindel, and Platypus.
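
To make the point about truth sets and caller agreement concrete, here is a small Python sketch. The variants and call sets are toy data invented for the example (they do not come from the paper or from any real caller), and the function names are hypothetical. It computes precision and recall against a known truth set, and shows how a false positive shared by several callers survives a simple "called by at least N callers" consensus.

```python
from collections import Counter


def precision_recall(calls, truth):
    """Return (precision, recall) of a call set against a truth set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def consensus(call_sets, min_callers):
    """Return the variants reported by at least min_callers of the call sets."""
    counts = Counter(variant for calls in call_sets for variant in calls)
    return {variant for variant, n in counts.items() if n >= min_callers}


# Toy data (made up): variants are (chrom, pos, ref, alt) tuples.
truth = {("chr1", 100, "G", "T"), ("chr1", 200, "C", "T"), ("chr2", 300, "G", "A")}
caller_a = {("chr1", 100, "G", "T"), ("chr1", 200, "C", "T"), ("chr3", 50, "A", "C")}
caller_b = {("chr1", 100, "G", "T"), ("chr3", 50, "A", "C")}
caller_c = {("chr1", 100, "G", "T"), ("chr2", 300, "G", "A"), ("chr3", 50, "A", "C")}

agreed = consensus([caller_a, caller_b, caller_c], min_callers=2)
agreed_precision, agreed_recall = precision_recall(agreed, truth)
a_precision, a_recall = precision_recall(caller_a, truth)

print("consensus:", sorted(agreed), agreed_precision, agreed_recall)
print("caller_a alone:", a_precision, a_recall)
```

In this toy case all three callers share the same false positive at chr3:50, so the consensus keeps it and ends up with lower precision and recall than caller_a alone. Agreement between callers only says something about correctness when their errors are independent.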

ADD COMMENT • link 9.1 years ago by DG 7.3k

0

Thanks for sharing. I was actually wondering how MuTect2 handles LOH cases? In my somatic mutation calling project, heterogeneity changes turn out to be important. Thank you.

ADD REPLY • link 9.0 years ago by DVA ▴ 640

0

Are you talking about changes in tumour heterogeneity or LOH (Loss of Heterozygosity)?

ADD REPLY • link 9.0 years ago by DG 7.3k

0

I mean heterogeneity at single sites, not a range of LOH. E.g. read counts for normal: 40% A, 60% T, and for tumor: 15% A, 85% T. Would MuTect mark this as a variant, or just consider it sample impurity? Thank you.

ADD REPLY • link 9.0 years ago by DVA ▴ 640

0

MuTect should discard it. The variant in this case is present in the normal tissue so it isn't a somatic mutation.
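
As a rough illustration of that reasoning, here is a tiny Python sketch that checks whether an allele is already present in the normal sample. The fixed 2% cutoff and the function name are assumptions made up for the example; MuTect's actual model is statistical and is not reproduced here.

```python
def present_in_normal(normal_allele_fraction, noise_cutoff=0.02):
    """True if the allele is seen in the normal above a (hypothetical) noise cutoff."""
    return normal_allele_fraction > noise_cutoff


# The example from the question: normal 40% A / 60% T, tumor 15% A / 85% T.
normal_fractions = {"A": 0.40, "T": 0.60}
tumor_fractions = {"A": 0.15, "T": 0.85}

for allele, fraction in normal_fractions.items():
    status = "present in normal" if present_in_normal(fraction) else "absent from normal"
    print(f"{allele}: normal={fraction:.2f} tumor={tumor_fractions[allele]:.2f} ({status})")
```

Both alleles are well above any plausible noise level in the normal, so under this simple check neither would be reported as somatic, even though the allele balance shifts in the tumor.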