Question

How should I extract mutated genes from a VCF file?

4.6 years ago

jrleary ▴ 210

I've written a new pipeline for my lab to process and call variants on whole-exome sequencing data, built around bwa-mem, samtools, Picard, and GATK. Variant calling is done with Mutect2, and I've filtered and annotated the SNP/indel calls using FilterMutectCalls and Funcotator. Whole-exome sequencing is absolutely not my specialty, so I'm somewhat at a loss as to what I should do next. I'd like to end up with some tables and visualizations detailing which genes are mutated across samples. I've been loosely following this 2017 paper, which has some great visualizations, such as this one, that show how genes of interest are mutated.

So, my main question is: how do I extract specifically which genes are mutated in my samples? I tried using VariantsToTable, which returned a table containing the chromosome and position of each variant, as well as whether the mutation was a SNP or an indel. Could I use the genomic coordinates to obtain the gene name?
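One possible approach, sketched here under assumptions rather than taken from the thread: when Funcotator writes VCF output, the gene annotations should already be in the INFO column under a FUNCOTATION key, and the matching ##INFO header line lists the pipe-separated field names. The sketch below looks up the field ending in "_hugoSymbol" from that header and counts PASS variants per gene. The function names and the toy field names (e.g. Gencode_34_hugoSymbol) are illustrative, so check the header of your own file, and note that the parsing assumes Funcotator escapes any literal "|" or "]" inside values.

```python
import re
from collections import defaultdict


def funcotation_fields(header_lines):
    """Return the ordered Funcotator field names declared in the VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=FUNCOTATION,"):
            match = re.search(r'Funcotation fields are:\s*([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    raise ValueError("no FUNCOTATION header line found")


def genes_in_record(info, fields):
    """Return the set of gene symbols annotated on one record's INFO string."""
    # Assumption: the gene symbol field name ends with "_hugoSymbol".
    idx = next((i for i, f in enumerate(fields) if f.endswith("_hugoSymbol")), None)
    if idx is None:
        raise ValueError("no *_hugoSymbol field in the FUNCOTATION header")
    genes = set()
    for entry in info.split(";"):
        if not entry.startswith("FUNCOTATION="):
            continue
        # Each annotation (one per transcript/allele) is wrapped in [...].
        for block in re.findall(r"\[([^\]]*)\]", entry[len("FUNCOTATION="):]):
            values = block.split("|")
            if idx < len(values) and values[idx]:
                genes.add(values[idx])
    return genes


def mutated_genes(vcf_lines, pass_only=True):
    """Count variants per gene from an iterable of uncompressed VCF lines."""
    header, fields = [], None
    counts = defaultdict(int)
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        if fields is None:
            fields = funcotation_fields(header)
        cols = line.split("\t")
        # Column 7 is FILTER, column 8 is INFO (1-based, per the VCF layout).
        if pass_only and cols[6] != "PASS":
            continue
        for gene in genes_in_record(cols[7], fields):
            counts[gene] += 1
    return dict(counts)
```

For a real file this could be called as mutated_genes(open("sample.vcf")), where sample.vcf is a placeholder path. Running it once per sample and combining the resulting dictionaries would give a gene-by-sample table. Funcotator can also write MAF instead of VCF (--output-file-format MAF), which stores the gene symbol in its own column and may be easier to tabulate.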

Also, the VCF files are a nightmare to read using less, so I haven't been able to inspect the annotations I added. Are there any programs other than IGV for inspecting VCFs? I'm from a computational background, so manually inspecting a genome is somewhat out of my realm of expertise.

WES exome gatk • 2.3k views

ADD COMMENT • link 4.6 years ago by jrleary ▴ 210

What does the VCF look like after annotation with Funcotator?

ADD REPLY • link 4.6 years ago by Pierre Lindenbaum 164k

Sorry, I'm a little unsure how to describe it. I could attach a screenshot of the file while viewing it with less, but I'm not sure how helpful that would be. Running head ${sample}.vcf returns:

##fileformat=VCFv4.2
##FILTER=<ID=base_qual,Description="alt median base quality">
##FILTER=<ID=clustered_events,Description="Clustered events observed in the tumor">
##FILTER=<ID=contamination,Description="contamination">
##FILTER=<ID=duplicate,Description="evidence for alt allele is overrepresented by apparent duplicates">
##FILTER=<ID=fragment,Description="abs(ref - alt) median fragment length">
##FILTER=<ID=germline,Description="Evidence indicates this site is germline, not somatic">
##FILTER=<ID=haplotype,Description="Variant near filtered variant on same haplotype.">
##FILTER=<ID=low_allele_frac,Description="Allele fraction is below specified threshold">
##FILTER=<ID=map_qual,Description="ref - alt median mapping quality">

Thanks very much for the assistance. I'm aware that I'm not doing an excellent job of describing my problem.
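The head output above contains only "##" meta-information lines, since a VCF begins with a header that can run to many lines before the first variant record. A minimal sketch for getting at the records instead is to skip those lines, read the "#CHROM" line for column names, and look at the first few records as dictionaries. The function names here are made up for illustration, and the sketch assumes a tab-delimited VCF that is either plain text or gzip-compressed.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def preview_records(lines, n=5):
    """Return (column names, first n records as dicts) from VCF lines."""
    columns, records = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            continue  # meta-information lines (FILTER, INFO, contig, ...)
        if line.startswith("#CHROM"):
            columns = line.lstrip("#").split("\t")
            continue
        if columns is None or not line:
            continue
        records.append(dict(zip(columns, line.split("\t"))))
        if len(records) == n:
            break
    return columns, records
```

Something like columns, records = preview_records(open_vcf("sample.vcf")) (placeholder path) followed by printing records[0]["INFO"] would then show whether the Funcotator annotations are present on the first variant.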

ADD REPLY • link 4.6 years ago by jrleary ▴ 210

Also, the VCF files are a nightmare to read using less, so I haven't been able to inspect the annotations I added. Are there any programs other than IGV used to inspect VCFs

I wrote VCF2table: http://lindenb.github.io/jvarkit/VcfToTable.html

ADD REPLY • link 4.6 years ago by Pierre Lindenbaum 164k

I'll clone the repo and give it a shot.

ADD REPLY • link 4.6 years ago by jrleary ▴ 210