Hi everyone, I am analysing NGS data to discover somatic variants in a targeted sequencing experiment. I used Ion Torrent data and performed variant calling with two different tools (VarDict and Mutect2). I chose two open-source tools because I do not have the proprietary TVC software. The results are very discordant: the two call sets have few variants in common, and Mutect2 detected more variants overall. Could someone tell me why there is so much discordance?
Thanks in advance.
Discordance among these and other variant callers for both somatic and germline variants is expected and well documented, unfortunately. Please take a read of just these 3 examples:
With so many parameter configurations for these programs, and also while considering the differences in sequencing depth and error rates of reads coming from different instruments and library preparation kits, benchmarking is difficult. One would probably require a discussion by the developers of these programs in order to begin to elucidate why they disagree on some calls.
Thank you very much Kevin, I had read several articles about it. So it seems normal to have so many differences. It is even more difficult to work out which of the two tools is telling the truth! Without a reference (truth set), could a good strategy be to treat the variants in the intersection as valid?
Hi again, yes, taking the intersection is how some people do it. I found, however, in my own work, that random read sub-sampling, followed by variant calling on each sub-sample, was sufficient to recover all known variants, although this was for germline variants and using samtools / bcftools mpileup: https://github.com/kevinblighe/ClinicalGradeDNAseq
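For the intersection approach, here is a minimal Python sketch of how two call sets could be compared. It is only an illustration under some assumptions: the function name `variant_keys` and the tiny inline VCF records are made up for the example, variants are matched on (chromosome, position, REF, ALT) only, and both VCFs are assumed to be normalised the same way beforehand (for example with `bcftools norm`), since the two callers may represent the same indel differently. On real files `bcftools isec` does the same job more robustly.

```python
def variant_keys(vcf_text):
    """Return the set of (chrom, pos, ref, alt) keys found in VCF-formatted text."""
    keys = set()
    for line in vcf_text.splitlines():
        # Skip blank lines and header lines (## meta lines and the #CHROM line).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Multi-allelic records list several ALT alleles separated by commas;
        # each allele is treated as its own variant.
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


# Hypothetical records, standing in for the real VarDict and Mutect2 output.
vardict_vcf = "\n".join([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr2\t500\t.\tG\tC\t.\tPASS\t.",
    "chr3\t42\t.\tC\tCT\t.\tPASS\t.",
])
mutect2_vcf = "\n".join([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr2\t500\t.\tG\tA\t.\tPASS\t.",
    "chr3\t42\t.\tC\tCT\t.\tPASS\t.",
    "chr4\t7\t.\tT\tG\t.\tPASS\t.",
])

vardict_calls = variant_keys(vardict_vcf)
mutect2_calls = variant_keys(mutect2_vcf)

shared = vardict_calls & mutect2_calls        # called by both tools
vardict_only = vardict_calls - mutect2_calls  # unique to VarDict
mutect2_only = mutect2_calls - vardict_calls  # unique to Mutect2

print("shared:", sorted(shared))
print("VarDict only:", sorted(vardict_only))
print("Mutect2 only:", sorted(mutect2_only))
```

Note that chr2:500 ends up in both "only" sets because the ALT alleles differ, so matching on position alone would give a larger intersection than matching on the full allele.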
OK, thank you Kevin! Another, unrelated question: do you know how to obtain the average total reads per sample, the average coverage per amplicon, and the coverage of targeted bases? I used bedtools multicov, which gave me the coverage per amplicon for each sample but no information about the mean. I would like to have summary statistics across multiple samples.
Hi again! Hmm, I am not sure, is this what you need: Compute mean depth coverage for exome data with paired end, overlapping, features ?
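If the bedtools multicov table is already in hand, the means across samples can be computed from it directly. Below is a rough Python sketch, assuming a tab-separated table with three BED columns (chrom, start, end) followed by one read-count column per BAM file; the function name `summarise_multicov` and the example numbers are invented for illustration. Keep in mind that multicov reports read counts per interval, not per-base depth, and a read overlapping two amplicons is counted in both, so the per-sample totals are approximate. For depth over targeted bases a per-base tool such as `samtools depth` or `bedtools coverage` would be needed instead.

```python
from statistics import mean


def summarise_multicov(text, n_bed_cols=3):
    """Summarise a bedtools multicov table (BED columns, then one count per sample)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Label each amplicon by its coordinates: chrom:start-end.
        name = f"{fields[0]}:{fields[1]}-{fields[2]}"
        counts = [int(x) for x in fields[n_bed_cols:]]
        rows.append((name, counts))

    n_samples = len(rows[0][1])
    # Sum of amplicon read counts for each sample (column-wise totals).
    total_per_sample = [sum(counts[i] for _, counts in rows) for i in range(n_samples)]
    # Mean read count of each amplicon across all samples (row-wise means).
    mean_per_amplicon = {name: mean(counts) for name, counts in rows}
    return {
        "total_per_sample": total_per_sample,
        "mean_total_per_sample": mean(total_per_sample),
        "mean_per_amplicon": mean_per_amplicon,
    }


# Hypothetical multicov output for two amplicons and two samples.
multicov_text = "\n".join([
    "chr1\t100\t200\t50\t70",
    "chr1\t300\t400\t10\t30",
])

summary = summarise_multicov(multicov_text)
print(summary["total_per_sample"])
print(summary["mean_total_per_sample"])
print(summary["mean_per_amplicon"])
```

If the BED file has extra columns (amplicon name, gene, and so on), `n_bed_cols` should be raised to match so that only the count columns are parsed as numbers.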