I am quite new to the sequencing world, so I apologise in advance for the basic question.
I am looking for a good variant caller for my tumor/normal DNA-seq data (exome sequencing). I have read some papers about it and this post:
Best Software For Detection Of Somatic Mutations From Matched Tumor:Normal Ngs Data, but I'm not sure which tool performs best. I have tried calling variants with Mutect2, Mutect1 (SNVs only) and Varscan2. Each of them calls a lot of variants, but they have only a few sites in common. Do you have any suggestions?
Welcome to the world of variant calling in tumour samples. The callers should overlap fairly substantially if you are comparing the same types of variants (only SNVs, for instance, since MuTect doesn't call indels). You may find Brad Chapman's posts on the subject quite informative. MuTect2 wasn't yet available when most of those comparisons were done, though. It is supposed to be good, but it is still in beta and I haven't yet seen it compared against good "truth" datasets. In particular, you may like the idea of ensemble variant calling, which combines the output of several callers, to produce the most robust call set possible.
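A minimal sketch of the simplest form of ensemble calling: keep a variant only if at least `min_callers` tools report it. It assumes each caller's output has already been reduced to a set of `(chrom, pos, ref, alt)` keys with consistent indel normalisation; the caller names and toy calls below are hypothetical, and real ensemble approaches may weight callers or use extra features rather than a plain vote.

```python
from collections import Counter

# A variant is identified by (chrom, pos, ref, alt); normalising indel
# representation across callers is assumed to have been done beforehand.
Variant = tuple[str, int, str, str]


def ensemble_calls(callsets: dict[str, set[Variant]], min_callers: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_callers` of the input callers."""
    support: Counter = Counter()
    for calls in callsets.values():
        # Each caller's calls are a set, so a caller votes at most once per variant.
        support.update(calls)
    return {v for v, n in support.items() if n >= min_callers}


if __name__ == "__main__":
    # Hypothetical toy call sets, not real data.
    callsets = {
        "mutect1": {("chr1", 100, "C", "T"), ("chr2", 500, "G", "A")},
        "mutect2": {("chr1", 100, "C", "T"), ("chr3", 42, "A", "G")},
        "varscan2": {("chr1", 100, "C", "T"), ("chr3", 42, "A", "G"), ("chr4", 7, "T", "C")},
    }
    print(sorted(ensemble_calls(callsets, min_callers=2)))
```

Raising `min_callers` trades sensitivity for precision: with three callers, `min_callers=3` keeps only the unanimous calls.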
Thank you for your reply and help! The tools don't have many variants in common: for example, Varscan2 found 4910 sites and Mutect2 found 7120, but they shared only 73 sites. I think I have to filter some of these variants, because several of them have a very low read depth (for example, the alternative allele is supported by 7/8 reads, which I guess is not enough) and a low MAF. What do you think?
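To put the counts above in perspective, a small calculation of the overlap, assuming the 73 shared sites were matched on the same basis (position and alleles) in both call sets. With 4910 and 7120 calls and 73 shared, the union is 11957 sites, so the shared sites are roughly 1.5% of the Varscan2 calls and about 1% of the Mutect2 calls.

```python
def overlap_stats(n_a: int, n_b: int, n_shared: int) -> dict:
    """Summarise the overlap between two call sets from their sizes."""
    union = n_a + n_b - n_shared  # sites called by at least one tool
    return {
        "union": union,
        "jaccard": n_shared / union,  # shared / union
        "frac_of_a": n_shared / n_a,  # fraction of caller A's sites also found by B
        "frac_of_b": n_shared / n_b,  # fraction of caller B's sites also found by A
    }


if __name__ == "__main__":
    # Counts from the thread: Varscan2, Mutect2, shared.
    stats = overlap_stats(n_a=4910, n_b=7120, n_shared=73)
    for name, value in stats.items():
        print(name, value if name == "union" else f"{value:.2%}")
```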
Yes. Usually, to determine the overlap between callers, you should first apply whatever reasonable filters you would apply in your workflow anyway. Depending on your depth, you likely have a minimum AF you would consider, a minimum read depth, etc. Filter first and then compare. Every caller has its own region of the "edge space" of variant calling where it calls false positives. Chances are those 73 variants are real, though, and there will also be other real variants that individual callers miss.
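A minimal sketch of "filter first, then compare", assuming each caller's output has already been parsed into per-site tumour depth and ALT-supporting read counts (the relevant VCF fields differ between callers, so that parsing step is left out). The `Call` class, the cut-off values and the toy calls are hypothetical and for illustration only, not recommended thresholds.

```python
from dataclasses import dataclass

Key = tuple[str, int, str, str]


@dataclass(frozen=True)
class Call:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int      # total tumour read depth at the site
    alt_reads: int  # tumour reads supporting the ALT allele

    @property
    def key(self) -> Key:
        return (self.chrom, self.pos, self.ref, self.alt)

    @property
    def af(self) -> float:
        # Allele fraction = ALT-supporting reads / total depth.
        return self.alt_reads / self.depth if self.depth else 0.0


def passes(call: Call, min_depth: int, min_alt_reads: int, min_af: float) -> bool:
    """Apply the same hard filters regardless of which caller produced the call."""
    return call.depth >= min_depth and call.alt_reads >= min_alt_reads and call.af >= min_af


def passing_keys(calls: list[Call], min_depth: int, min_alt_reads: int, min_af: float) -> set[Key]:
    return {c.key for c in calls if passes(c, min_depth, min_alt_reads, min_af)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical cut-offs for illustration only.
    cutoffs = dict(min_depth=20, min_alt_reads=5, min_af=0.05)
    # Toy call sets; in practice these would be parsed from each caller's VCF.
    varscan2 = [Call("chr1", 100, "C", "T", 60, 15), Call("chr2", 500, "G", "A", 9, 4)]
    mutect2 = [Call("chr1", 100, "C", "T", 58, 14), Call("chr5", 900, "A", "C", 40, 1)]
    raw_a, raw_b = {c.key for c in varscan2}, {c.key for c in mutect2}
    flt_a, flt_b = passing_keys(varscan2, **cutoffs), passing_keys(mutect2, **cutoffs)
    print(f"raw Jaccard: {jaccard(raw_a, raw_b):.2f}")
    print(f"filtered Jaccard: {jaccard(flt_a, flt_b):.2f}")
```

In this toy example the low-depth and low-support calls are removed before comparing, so the overlap rises from one shared site out of three to one out of one. Applying identical cut-offs to every caller keeps the comparison fair.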
Thank you for your help :)