I have analyzed normal-tumor samples with VarScan2 and now have annotated variant data in VCF format:
##fileformat=VCFv4.1
##source=VarScan2
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
chr1 27107650 . TA T . PASS DP=543;SS=1;SSC=3;GPV=1.5919E-53;SPV=4.6338E-1;ANNOVAR_DATE=2016-02-01;Func.refGene=UTR3;Gene.refGene=ARID1A;GeneDetail.refGene=NM_006015:c.*404delA,NM_139135:c.*404delA;ExonicFunc.refGene=.;AAChange.refGene=.;snp142=rs533673675;CLINSIG=.;CLNDBN=.;CLNACC=.;CLNDSDB=.;CLNDSDBID=.;cosmic70=.;ExAC_ALL=.;ExAC_AFR=.;ExAC_AMR=.;ExAC_EAS=.;ExAC_FIN=.;ExAC_NFE=.;ExAC_OTH=.;ExAC_SAS=.;ALLELE_END GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:56:30:16:34.78%:30,0,16,0 0/1:.:487:232:135:36.78%:194,38,91,44
...
I would like to visualize my results with a coMut plot. To do this I need to summarize my variant data somehow. I have successfully generated coMut plots by summarizing the data by hand in the following format:
Patient Gene Effect ...
A APC synonymous
A BRCA1 synonymous
B BRCA2 frameshift deletion
B KEAP1 nonsynonymous
C MDM2 NA
C PALB2 NA
...
However, summarizing the data by hand takes a huge amount of time, so I am looking for a way to do it with command line tools. Do you have any suggestions for how this could be done, or are there tools that can generate the desired output, or output in a very similar format?
I am going to do the visualization with the ggplot2 package (R), and my only requirement (or wish) for the output format is that it should be easy to handle in R.
Thank you in advance!
Not a complete answer, but by chance I saw something very similar to what you are asking for, produced by this tool: https://github.com/griffithlab/GenVisR#waterfall-mutation-overview-graphic (scroll down a bit). That can probably give you some pointers and ideas on how to go forward.
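For the summarizing step itself, here is a minimal Python sketch rather than a tested pipeline. It assumes one tab-delimited, ANNOVAR-annotated VCF per patient and takes the patient ID from the file name (my assumption, adjust as needed). It reads the Gene.refGene, ExonicFunc.refGene and Func.refGene keys that appear in the INFO column of the example in the question. The function names and the script name are made up for illustration.

```python
import csv
import os
import sys


def parse_info(info):
    """Split a VCF INFO string into a dict; keys without '=' become True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def summarize_vcf(lines, patient):
    """Yield (patient, gene, effect) tuples, one per variant line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        info = parse_info(cols[7])
        gene = str(info.get("Gene.refGene", "NA"))
        effect = str(info.get("ExonicFunc.refGene", "."))
        if effect == ".":
            # No exonic annotation: fall back to the region (e.g. UTR3).
            effect = str(info.get("Func.refGene", "NA"))
        yield patient, gene, effect.replace("_", " ")


def main(paths, out=sys.stdout):
    """Write a Patient/Gene/Effect table for all given VCF files."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Patient", "Gene", "Effect"])
    for path in paths:
        # Assumption: the file name (minus extension) is the patient ID.
        patient = os.path.splitext(os.path.basename(path))[0]
        with open(path) as handle:
            writer.writerows(summarize_vcf(handle, patient))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as, say, summarize_vcf.py, it could be run as `python summarize_vcf.py A.vcf B.vcf > comut_input.tsv` (all file names are placeholders), and the resulting table loaded in R with read.delim(). Variants without an exonic annotation fall back to the Func.refGene value, so the example record in the question would come out as A, ARID1A, UTR3 if the file were named A.vcf. The sketch lists every variant line and does no filtering on somatic status.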