I have matched samples from three tissues of one individual, and I am trying to prioritize pathogenic variants based on phenotype terms encoded in HPO. For variant calling I used Mutect2 and VarScan v2.3.9, but the two tools have no somatic calls in common: the VCF from VarScan classifies 3000 variants as somatic, while only 31 variants passed the Mutect2 filters and were called somatic. Both tools were run with their default settings.
It would be really helpful if you could suggest tools that work well alongside Mutect2, and methods for filtering the called variants to reduce the number of false positives.
Thanks in advance
Hi!
Have you already tried annotating your variants, for example with SnpEff? This gives you a predicted impact for most of your variants, which you can use to filter out the (potentially) less interesting ones (using SnpSift, for example).
Thanks for the reply. I have annotated the variants generated by Mutect2 using the web version of ANNOVAR, and I am planning to do the same with the variants generated by VarScan. But the number of variants in the VarScan output is my real cause for worry.
Looking at the annotation, you can find the variants with a stronger predicted impact, such as stop codons or frameshifts, and focus on those. There is generally a column in the annotation that helps with filtering, e.g. the HIGH, MODERATE, or LOW variant impact from SnpEff.
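As a rough illustration of the impact-based filtering described above, here is a minimal Python sketch. It assumes the VCF was annotated by SnpEff with the standard `ANN` INFO field, where each annotation is pipe-delimited and the impact is the third sub-field. The function names `ann_impacts` and `keep_by_impact` are made up for this example, and the default choice of HIGH and MODERATE is only a starting point, not a recommendation.

```python
def ann_impacts(info):
    """Return the set of SnpEff impact labels found in a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            # Annotations are comma-separated; within each one the fields are
            # pipe-delimited and the impact is the third field.
            return {
                ann.split("|")[2]
                for ann in entry[4:].split(",")
                if ann.count("|") >= 2
            }
    return set()


def keep_by_impact(vcf_lines, wanted=("HIGH", "MODERATE")):
    """Yield header lines unchanged, plus records with at least one wanted impact."""
    wanted = set(wanted)
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # INFO is the 8th column of a VCF record.
        if len(fields) >= 8 and ann_impacts(fields[7]) & wanted:
            yield line
```

In practice you would pass an open file handle as `vcf_lines` and write the yielded lines to a new VCF. SnpSift can do the same kind of filtering directly on the command line.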
Have you tried ensemble mutation calling? I am using ICGC cancer data for my project, and I clearly see that keeping only the passed calls shared between callers can be very stringent. ICGC has its own pipeline for getting a consensus from multiple callers.
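To make the consensus idea concrete, here is a small Python sketch that keeps variants reported by at least `min_callers` tools. This is not the ICGC pipeline, just a simple vote on (chromosome, position, REF, ALT) keys. The function names and the rule of accepting only `PASS` or `.` in the FILTER column are assumptions for illustration. VarScan, for instance, also records somatic status in its INFO field, which this sketch does not inspect.

```python
from collections import Counter


def variant_keys(vcf_lines, pass_only=True):
    """Collect (chrom, pos, ref, alt) keys from the records of one caller's VCF."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, flt = (
            fields[0], int(fields[1]), fields[3], fields[4], fields[6]
        )
        # Assumption: a record counts only if FILTER is PASS or unset (".").
        if pass_only and flt not in ("PASS", "."):
            continue
        # Split multi-allelic records so each ALT allele is compared separately.
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


def consensus(call_sets, min_callers=2):
    """Return the keys present in at least `min_callers` of the given call sets."""
    counts = Counter(key for keys in call_sets.values() for key in keys)
    return {key for key, n in counts.items() if n >= min_callers}
```

With `call_sets = {"mutect2": variant_keys(f1), "varscan": variant_keys(f2)}`, setting `min_callers=2` gives the strict intersection, and adding a third caller lets you relax to two out of three. Note that indels can be represented differently by different callers, so normalizing the VCFs first (for example with `bcftools norm`) makes this comparison more reliable.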
Hi,
In the end Mutect2 is more stringent than VarScan. That is normal, because I think the two tools don't have the same default parameters. The ways to reduce your variants are:
- a cut-off on depth of coverage (the number of reads required to accept a somatic variant; you should consider strand bias too)
- annotation of your variants
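A depth and strand-bias cut-off like the one in the first point could look roughly like this in Python. The function name, the way the per-strand read counts are passed in, and all thresholds are hypothetical and purely illustrative. How you obtain forward and reverse counts depends on which fields your caller writes, so check the VCF header first.

```python
def passes_depth_and_strand(ref_fwd, ref_rev, alt_fwd, alt_rev,
                            min_depth=20, min_alt_reads=4,
                            min_alt_per_strand=1):
    """Crude read-support filter for a single variant in a single sample.

    All thresholds are illustrative defaults, not recommended values.
    """
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    alt_reads = alt_fwd + alt_rev
    if depth < min_depth:
        return False  # too little coverage to trust the call
    if alt_reads < min_alt_reads:
        return False  # too few reads support the variant allele
    # Simple strand-bias check: require ALT support on both strands.
    return min(alt_fwd, alt_rev) >= min_alt_per_strand
```

Requiring ALT reads on both strands is a blunt rule. A Fisher's exact test on the four counts is a common, more graded alternative.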
Best
Thanks for all your replies.
Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your post, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked. In addition, if an answer was helpful you should upvote it.
Anecdotally (from speaking to other researchers and from our own work), it seems that VarScan2 produces a lot of false positives. I have seen the same as you in terms of overlap with MuTect(2). One idea is to use Platypus following MuTect, or to call with FreeBayes and overlap these results (a nice implementation is available in the SpeedSeq pipeline).