Hi all,
I have a set of genes and want to investigate their mutation status in cancer, so I annotated my VCF file (somatic variants) with ANNOVAR using the following databases: icgc28, nci60, Noncoding_CosmicV92, Coding_CosmicV92, cadd13gt10, dann, clinvar_20210501, avsnp150, gnomad211_exome, gnomad211_genome, hrcr1, cg46, cg69, kaviar_20150923, refGeneWithVer, knownGene, ensGene, cytoBand, genomicSuperDups, tfbsConsSites, wgRna, gwasCatalog, abraom, dbnsfp41a, eigen, esp6500siv2_all, exac03, gme, intervar_20180118. However, I'm not sure whether I need all of them.
Which of these are helpful and preferred for annotating a VCF file of somatic variants?
Thanks for any help or suggestions.
If you are interested in driver mutations that occur in protein-coding regions, then cadd/dann/eigen have suboptimal performance, as they were not made for somatic mutations in cancer or optimized for protein-coding alterations. Combining multiple predictors won't alleviate this problem (I've tried). You would be much better off using variant predictors that were designed for somatic mutations in cancer, or that have at least been benchmarked to perform well on them (see https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-01954-z ). That benchmark would suggest CHASM, CTAT-cancer, DEOGEN2, or PrimateAI. Since I'm the developer of CHASMplus, a considerably better version of CHASM that was included in the benchmark, I would suggest CHASMplus, but you would need to either submit the VCF file to the OpenCRAVAT web server or use the downloadable command-line tool (see https://open-cravat.readthedocs.io/en/latest/ ).
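Whichever predictor you end up using, it may help to first narrow the ANNOVAR table down to protein-altering variants in your genes of interest. Below is a minimal Python sketch of that step, not a definitive implementation. It assumes a tab-delimited ANNOVAR output whose header contains `Gene.refGeneWithVer` and `ExonicFunc.refGeneWithVer` (the usual naming pattern for the refGeneWithVer protocol, but check your own header). The function name, the set of ExonicFunc values treated as protein-altering, and the toy rows are all illustrative assumptions, not real data.

```python
import csv
import io

# Assumed ANNOVAR column names for the refGeneWithVer protocol;
# adjust these to match the header of your own output file.
GENE_COL = "Gene.refGeneWithVer"
EXONIC_FUNC_COL = "ExonicFunc.refGeneWithVer"

# ExonicFunc values treated here as protein-altering (an illustrative choice).
PROTEIN_ALTERING = {
    "nonsynonymous SNV",
    "stopgain",
    "stoploss",
    "frameshift insertion",
    "frameshift deletion",
    "nonframeshift insertion",
    "nonframeshift deletion",
}


def filter_coding_variants(handle, genes_of_interest):
    """Return rows that hit a gene of interest and alter the protein.

    `handle` is any open text file (or file-like object) holding a
    tab-delimited ANNOVAR table with a header line.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        # One field may list several genes, separated by ';' or ','.
        genes = set(row[GENE_COL].replace(",", ";").split(";"))
        if genes & genes_of_interest and row[EXONIC_FUNC_COL] in PROTEIN_ALTERING:
            kept.append(row)
    return kept


# Toy table for illustration only (made-up rows, not real calls).
demo = (
    "Chr\tStart\tEnd\tRef\tAlt\tGene.refGeneWithVer\tExonicFunc.refGeneWithVer\n"
    "17\t7578406\t7578406\tC\tT\tTP53\tnonsynonymous SNV\n"
    "12\t25398284\t25398284\tC\tT\tKRAS\tsynonymous SNV\n"
    "7\t140453136\t140453136\tA\tT\tBRAF\tnonsynonymous SNV\n"
)

hits = filter_coding_variants(io.StringIO(demo), {"TP53", "KRAS"})
for row in hits:
    print(row["Chr"], row["Start"], row[GENE_COL], row[EXONIC_FUNC_COL])
```

For a real file, you would pass an open file handle instead of the `io.StringIO` demo, for example `open("sample.hg19_multianno.txt")` (a hypothetical file name). The filtered rows can then be scored with a cancer-focused predictor rather than with cadd/dann/eigen.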