I've written a new pipeline for my lab to process and call variants on whole exome sequencing data, built around bwa-mem, samtools, Picard, and GATK. The variant calling is done using Mutect2, and I've filtered and annotated the SNP/indel calls using FilterMutect2 and Funcotator. Whole exome sequencing is absolutely not my specialty, so I'm somewhat at a loss as to what I should do next. I'd like to end up with some tables / visualizations detailing which genes are mutated across samples. I've been loosely following this 2017 paper, which has some great visualizations such as this one that show how genes of interest are mutated.
So, my main question is: how do I extract specifically which genes are mutated in my samples? I tried using VariantsToTable, which returned a table containing chromosome and position, as well as whether the mutation was a SNP or an indel. Could I use the genomic coordinates to obtain the gene name?
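Since the calls were already annotated with Funcotator, the gene name should not need to be looked up from coordinates; it is normally carried in the FUNCOTATION entry of the INFO column. Below is a minimal Python sketch of pulling it out. It assumes the layout described in the GATK Funcotator documentation: the header line for FUNCOTATION lists the sub-field names separated by `|` after the text "Funcotation fields are:", each annotation in a record is wrapped in `[...]`, and the gene symbol lives in a sub-field whose name ends in `hugoSymbol`. The function names and the demo record are made up for illustration, so check them against a real header before relying on this.

```python
import re


def funcotation_fields(header_lines):
    """Return the FUNCOTATION sub-field names from the VCF header, or [] if absent."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=FUNCOTATION"):
            # Assumed header wording: "... Funcotation fields are: A|B|C"
            match = re.search(r"Funcotation fields are:\s*([^\">]+)", line)
            if match:
                return [name.strip() for name in match.group(1).split("|")]
    return []


def genes_in_info(info, fields):
    """Collect the gene symbols found in one record's INFO column."""
    # Fall back to the first sub-field if no *hugoSymbol name is found (assumption).
    symbol_idx = next(
        (i for i, name in enumerate(fields) if name.endswith("hugoSymbol")), 0
    )
    genes = set()
    for entry in info.split(";"):
        if not entry.startswith("FUNCOTATION="):
            continue
        # Each annotation is assumed to be wrapped in square brackets.
        for block in re.findall(r"\[([^\]]*)\]", entry[len("FUNCOTATION="):]):
            values = block.split("|")
            if symbol_idx < len(values) and values[symbol_idx]:
                genes.add(values[symbol_idx])
    return genes


def mutated_genes(vcf_lines):
    """Return a list of (chrom, pos, sorted gene symbols) for every record."""
    lines = [line.rstrip("\n") for line in vcf_lines]
    fields = funcotation_fields(line for line in lines if line.startswith("##"))
    records = []
    for line in lines:
        if not line or line.startswith("#"):
            continue  # skip header and blank lines
        cols = line.split("\t")
        chrom, pos, info = cols[0], int(cols[1]), cols[7]
        records.append((chrom, pos, sorted(genes_in_info(info, fields))))
    return records


if __name__ == "__main__":
    # Tiny made-up VCF, only to show the shape of the input and the output.
    demo = [
        '##INFO=<ID=FUNCOTATION,Number=A,Type=String,Description="Functional '
        "annotation from the Funcotator tool.  Funcotation fields are: "
        'Gencode_34_hugoSymbol|Gencode_34_ncbiBuild|Gencode_34_variantClassification">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr17\t7674220\t.\tC\tT\t.\tPASS\tDP=50;FUNCOTATION=[TP53|hg38|MISSENSE]",
    ]
    for chrom, pos, genes in mutated_genes(demo):
        print(chrom, pos, ",".join(genes))
```

For a real file, something like `mutated_genes(open("sample.vcf"))` would give one row per variant; collecting the gene sets per sample is then a short step toward a gene-by-sample table.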
Also, the VCF files are a nightmare to read using less, so I haven't been able to inspect the annotations I added. Are there any programs other than IGV for inspecting VCFs? (I'm from a computational background, so manually inspecting a genome is somewhat out of my realm of expertise.)
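One low-tech option for reading records is to reflow each one so that every INFO entry sits on its own line (and `less -S` at least stops long lines from wrapping). The sketch below is a hypothetical helper, not an existing tool: it assumes a plain-text, tab-delimited VCF with the standard eight fixed columns, and it truncates very long values such as FUNCOTATION to a chosen width.

```python
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]


def format_record(line, max_width=80):
    """Render one VCF data line as a vertical, human-readable block."""
    cols = line.rstrip("\n").split("\t")
    out = [f"{name:<8}{value}" for name, value in zip(FIXED_COLUMNS, cols)]
    out.append("INFO")
    for entry in cols[7].split(";"):
        key, _, value = entry.partition("=")
        if len(value) > max_width:
            value = value[:max_width] + "..."  # keep long annotations readable
        # Flag-type INFO entries have no "=value" part.
        out.append(f"  {key} = {value}" if value else f"  {key}")
    return "\n".join(out)


def view(path, limit=5):
    """Print the first `limit` records of an uncompressed VCF file."""
    shown = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            print(format_record(line))
            print("-" * 40)
            shown += 1
            if shown >= limit:
                break
```

Calling `view("sample.vcf")` (file name assumed) prints the first few variants in this layout, which is usually enough to check that the annotations were written as expected.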
What does the VCF look like after annotation with Funcotator?
Sorry, I'm a little unsure how to describe it. I could attach a screenshot of the file while viewing it with less, but I'm not sure how helpful that would be. Running head ${sample}.vcf returns:
Thanks much for the assistance; I'm aware that I'm not doing an excellent job of describing my problems.
I wrote VCF2table : http://lindenb.github.io/jvarkit/VcfToTable.html
I'll clone the repo and give it a shot.