Question

Mutect2-VarDict

3.7 years ago

sarastrafella.ss ▴ 20

Hi everyone, I am analysing NGS data to discover somatic variants in a targeted sequencing experiment. The data come from Ion Torrent, and I ran variant calling with two different tools (VarDict and Mutect2). I chose two open-source tools because I do not have the proprietary TVC software. The results are very discordant: few variants are shared between the two callers, and Mutect2 detected more variants overall. Could someone tell me why there is so much discordance? Thanks in advance.

Mutect2 VarDict NGS • 2.3k views

updated 3.6 years ago by Kevin Blighe 88k • written 3.7 years ago by sarastrafella.ss ▴ 20

score 2 · Answer 1 · 2021-04-04

3.7 years ago

Kevin Blighe 88k

Hi sarastrafella,

Discordance among these and other variant callers for both somatic and germline variants is expected and well documented, unfortunately. Please take a read of just these 3 examples:

Systematic comparison of somatic variant calling performance among different sequencing depth and mutation frequency
Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data
An open resource for accurately benchmarking small variant and reference calls

With so many parameter configurations for these programs, and also while considering the differences in sequencing depth and error rates of reads coming from different instruments and library preparation kits, benchmarking is difficult. One would probably require a discussion by the developers of these programs in order to begin to elucidate why they disagree on some calls.

Kevin

3.7 years ago by Kevin Blighe 88k

Thank you very much Kevin, I had read several articles about it. So it seems normal to have so many differences. It is even more difficult to understand which of the two tools is telling the truth! Without a reference, could a good strategy be to treat the variants in the intersection as valid?

3.6 years ago by sarastrafella.ss ▴ 20

Hi again, yes, taking the intersection is how some people do it. I found, however, in my own work, that random read sub-sampling, followed by variant calling on each sub-sample, was sufficient to recover all known variants, although this was for germline variants and using samtools / bcftools mpileup: https://github.com/kevinblighe/ClinicalGradeDNAseq
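For the intersection approach, a minimal sketch in Python might look like the following. It assumes uncompressed VCF text and matches calls on exact CHROM, POS, REF and ALT; the function names and the toy records are hypothetical, not output from either caller. Indels in particular can be represented differently by different callers, so normalising both VCFs first (for example with bcftools norm) is probably advisable before comparing.

```python
import io


def load_variant_keys(handle):
    """Collect (CHROM, POS, REF, ALT) keys from an uncompressed VCF text stream."""
    keys = set()
    for line in handle:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each ALT allele is compared on its own.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref.upper(), allele.upper()))
    return keys


def compare_callsets(keys_a, keys_b):
    """Return the shared calls and the calls private to each caller."""
    return {
        "shared": keys_a & keys_b,
        "only_a": keys_a - keys_b,
        "only_b": keys_b - keys_a,
    }


# Toy records (made up for illustration only).
header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
mutect2_vcf = io.StringIO(
    header
    + "chr1\t100\t.\tA\tT\t.\tPASS\t.\n"
    + "chr1\t200\t.\tG\tC,T\t.\tPASS\t.\n"
    + "chr2\t300\t.\tC\tG\t.\tPASS\t.\n"
)
vardict_vcf = io.StringIO(
    header
    + "chr1\t100\t.\tA\tT\t.\tPASS\t.\n"
    + "chr1\t200\t.\tG\tT\t.\tPASS\t.\n"
    + "chr3\t50\t.\tT\tTA\t.\tPASS\t.\n"
)

result = compare_callsets(load_variant_keys(mutect2_vcf), load_variant_keys(vardict_vcf))
for label in ("shared", "only_a", "only_b"):
    print(label, sorted(result[label]))
```

With real files, the `io.StringIO` objects would be replaced by `open("calls.vcf")` handles (file names hypothetical). Looking at the calls private to each caller, rather than only the shared set, may also help to see whether the discordance is concentrated in low-frequency or low-depth calls.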

3.6 years ago by Kevin Blighe 88k

OK, thank you Kevin! Another, different question: do you know how to obtain the average total reads per sample, the average coverage per amplicon, and the coverage of targeted bases? I used bedtools multicov, which gave me the coverage per amplicon for each sample, but no information about the mean. I would like to have statistics across multiple samples.

3.6 years ago by sarastrafella.ss ▴ 20

Hi again! Hmm, I am not sure. Is this what you need: Compute mean depth coverage for exome data with paired end, overlapping, features?
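If the bedtools multicov table is already available, the means could also be computed directly from it. The sketch below is only an illustration: it assumes the usual multicov layout (the original BED columns followed by one read-count column per BAM file), and the function name, sample names and toy counts are hypothetical. Note that multicov reports read counts per interval rather than per-base depth, and a read overlapping two amplicons is counted in both, so the per-sample totals are approximate.

```python
import io
from statistics import mean


def summarize_multicov(handle, n_bed_columns, sample_names):
    """Summarise a multicov table: mean count per amplicon and total count per sample.

    n_bed_columns is the number of columns in the original BED file;
    the remaining columns are assumed to be one read count per BAM.
    """
    per_amplicon_mean = {}
    per_sample_total = {name: 0 for name in sample_names}
    for line in handle:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts = [int(value) for value in fields[n_bed_columns:]]
        if len(counts) != len(sample_names):
            raise ValueError("number of count columns does not match sample_names")
        amplicon = f"{fields[0]}:{fields[1]}-{fields[2]}"
        # Mean read count for this amplicon across all samples.
        per_amplicon_mean[amplicon] = mean(counts)
        for name, count in zip(sample_names, counts):
            per_sample_total[name] += count
    return per_amplicon_mean, per_sample_total


# Toy table (made up): 4 BED columns, then counts for two samples.
table = io.StringIO(
    "chr1\t100\t200\tamp1\t120\t80\n"
    "chr1\t300\t400\tamp2\t60\t90\n"
)
amplicon_means, sample_totals = summarize_multicov(table, 4, ["S1", "S2"])
mean_total_reads = mean(sample_totals.values())

print(amplicon_means)
print(sample_totals)
print(mean_total_reads)
```

Coverage of targeted bases (the fraction of bases above a depth threshold) cannot be derived from these interval counts; it would need per-base depth, which is a separate calculation.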

3.6 years ago by Kevin Blighe 88k