Question

Somatic variant call

7.6 years ago

sktbanerjee1 ▴ 30

I have matched samples from three tissues of one individual, and I am trying to prioritize pathogenic variants according to phenotype terms encoded in HPO. For variant calling I used Mutect2 and VarScan v2.3.9, but the two tools have no somatic calls in common: the VCF generated by VarScan classifies 3000 variants as somatic, while only 31 variants pass the Mutect2 filters and are called somatic. Both tools were run with their default settings.

It would be really helpful if you could suggest tools that work well alongside Mutect2, and methods for filtering the resulting variants to reduce the number of false positives.

Thanks in advance

Mutect2 Varscan variant-filtering SNP • 3.6k views

ADD COMMENT • link updated 20 months ago by Ram 44k • written 7.6 years ago by sktbanerjee1 ▴ 30

Hi!

Have you already tried annotating your variants, for example with SnpEff? This gives you a predicted impact for most of your variants, which you can use to filter out the (potentially) less interesting ones (using SnpSift, for example).

SNPeff

ADD REPLY • link 7.6 years ago by VHahaut ★ 1.2k

Thanks for the reply. I have annotated the variants generated by Mutect2 using the web version of ANNOVAR, and I am planning to do the same with the variants generated by VarScan. But the number of variants in the VarScan output is my real cause of worry.

ADD REPLY • link 7.6 years ago by sktbanerjee1 ▴ 30

Looking at the annotation, you can find the variants with a stronger impact, such as stop codons or frameshifts, and focus on those. There is generally a column in the annotation that helps with filtering, e.g. the HIGH, MODERATE or LOW variant impact assigned by SnpEff.
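As a rough illustration of filtering on that impact column, here is a minimal Python sketch. It assumes the VCF was annotated by SnpEff with its standard `ANN` INFO field, where the third pipe-separated subfield of each annotation is the impact. The function names `max_impact` and `filter_by_impact` are made up for this example; SnpSift can do the same job directly.

```python
def max_impact(info):
    """Return the most severe SnpEff impact in a VCF INFO string, or None."""
    order = ["HIGH", "MODERATE", "LOW", "MODIFIER"]
    for field in info.split(";"):
        if not field.startswith("ANN="):
            continue
        impacts = set()
        # One ANN entry per transcript/effect, separated by commas.
        for ann in field[4:].split(","):
            parts = ann.split("|")
            if len(parts) > 2:
                impacts.add(parts[2])  # assumed position of Annotation_Impact
        for level in order:
            if level in impacts:
                return level
    return None


def filter_by_impact(vcf_lines, keep=("HIGH", "MODERATE")):
    """Yield header lines plus records whose worst impact is in `keep`."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 7 and max_impact(cols[7]) in keep:
            yield line
```

Typical use would be `filter_by_impact(open("annotated.vcf"))`, where `annotated.vcf` is a placeholder file name.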

ADD REPLY • link 7.6 years ago by VHahaut ★ 1.2k

Have you considered giving ensemble mutation calling a chance? I am using ICGC cancer data for my project, and I clearly see that keeping only the calls that pass in every caller can be very stringent. They have their own pipeline for getting a consensus from multiple callers.

1) https://github.com/bioinform/somaticseq : takes MuTect and VarScan output and combines it using machine learning methods.
2) The bcbio toolkit used to have a standalone ensemble add-on that combines VCF output from multiple callers, but I couldn't find the link. (Maybe you can find it by opening an issue on GitHub; the author is very responsive.) https://github.com/chapmanb/bcbio-nextgen/
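To make the consensus idea concrete, here is a minimal Python sketch of a plain vote across callers. This is not what SomaticSeq or bcbio do internally (SomaticSeq trains a classifier); it only counts how many call sets contain the same chromosome, position, REF and ALT. The names `variant_keys` and `consensus` are hypothetical, and treating FILTER equal to `PASS` or `.` as "called" is a simplification, since each caller marks somatic status differently.

```python
from collections import Counter


def variant_keys(vcf_lines, pass_only=True):
    """Collect (chrom, pos, ref, alt) keys from the records of one VCF."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, filt = cols[0], cols[1], cols[3], cols[4], cols[6]
        if pass_only and filt not in ("PASS", "."):
            continue
        # Split multi-allelic records so each ALT allele is compared on its own.
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def consensus(callsets, min_callers=2):
    """Return the keys reported by at least `min_callers` call sets."""
    votes = Counter()
    for keys in callsets:
        votes.update(keys)  # each call set votes at most once per variant
    return {key for key, n in votes.items() if n >= min_callers}
```

With two callers, `min_callers=2` is a strict intersection and `min_callers=1` is the union; with three or more callers a majority vote is usually less harsh than requiring every caller to agree.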

ADD REPLY • link 7.6 years ago by morovatunc ▴ 560

Hi,

In the end you get more stringency with Mutect2 than with VarScan. That is normal, because I think the two tools do not have the same default parameters. Ways to reduce your variants:

- a depth-of-coverage cutoff (the number of reads needed to accept or reject a somatic variant; you should consider strand bias too)
- annotation of your variants
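A depth and strand-bias cutoff of this kind could look roughly like the sketch below. The function name and all thresholds (20 total reads, 4 supporting reads, at most 90% of supporting reads on one strand) are placeholder assumptions for illustration, not recommended values; the per-strand counts would have to come from whatever your caller or a read-count tool reports.

```python
def passes_read_filters(alt_fwd, alt_rev, total_depth,
                        min_depth=20, min_alt_reads=4,
                        max_strand_fraction=0.9):
    """Return True if a call has enough depth and is not strand-biased.

    alt_fwd / alt_rev: variant-supporting reads on the forward / reverse strand.
    total_depth: all reads covering the site.
    """
    alt_reads = alt_fwd + alt_rev
    if alt_reads == 0:
        return False
    if total_depth < min_depth or alt_reads < min_alt_reads:
        return False
    # Fraction of supporting reads that come from the dominant strand.
    strand_fraction = max(alt_fwd, alt_rev) / alt_reads
    return strand_fraction <= max_strand_fraction
```

For example, a variant supported by 10 forward reads and no reverse reads is rejected however deep the coverage is, which is the usual signature of a strand artifact.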

Best

ADD REPLY • link 7.6 years ago by Titus ▴ 910

Thanks for all your replies.

ADD REPLY • link 7.6 years ago by sktbanerjee1 ▴ 30

Please use ADD COMMENT or ADD REPLY to respond to previous comments, so that this thread remains logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should only be used for providing a solution to the question asked.

In addition, if an answer was helpful you should upvote it.

ADD REPLY • link 7.6 years ago by WouterDeCoster 47k

Anecdotally (from speaking to other researchers, and from our own work), it seems that VarScan2 produces a lot of false positives. I have seen the same as you in terms of overlap with MuTect(2). One idea is to use Platypus following MuTect, or to call with FreeBayes, and then overlap these results; a nice implementation is available in the SpeedSeq pipeline.

ADD REPLY • link 7.6 years ago by bruce.moran ▴ 970

score 0 · Answer 1 · 2018-01-04

the number of variants in the varscan output is my real cause of worry.

According to the author of VarScan, you should filter the results:

Because these are usually rare events, their call sets are often enriched for false positives.

Software:

The bam-readcount utility (https://github.com/genome/bam-readcount)

The fpfilter.pl accessory script (https://sourceforge.net/projects/varscan/files/scripts/)
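Both tools work from the positions of your candidate variants, so a first step is usually to turn the VarScan calls into a site list. The Python sketch below does only that. It assumes the site list is a tab-separated file of chromosome, start and end with 1-based coordinates, which is my understanding of what bam-readcount accepts; please check the README of your version before relying on it. The helper name `readcount_sites` is made up for this example.

```python
def readcount_sites(vcf_lines):
    """Return one 'chrom<TAB>pos<TAB>pos' line per distinct variant position."""
    sites = []
    seen = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split("\t")
        site = (cols[0], int(cols[1]))
        if site not in seen:  # several ALT records can share one position
            seen.add(site)
            sites.append(f"{site[0]}\t{site[1]}\t{site[1]}")
    return sites
```

Writing `"\n".join(readcount_sites(open("varscan.somatic.vcf")))` to a file (the file name is a placeholder) gives a list that can be handed to bam-readcount, whose per-site output is then what fpfilter.pl reads alongside the variant file.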