Is there any standard tool out there that can convert a VCF file to Mutation Annotation Format (MAF)?
Thanks -Kasthuri
I recently posted a VCF->MAF conversion script at: github.com/ckandoth/vcf2maf. It's thoroughly documented, so you can see what information is lost in translation.
Briefly: each VCF variant must be annotated to only one of all the possible gene transcripts/isoforms that it might affect. This selection of a single affected transcript/isoform per variant is often subjective. For now, the script tries to follow best practices: it chooses the "worst" effect on the "best" transcript. If there are multiple such candidates, it annotates the variant effect on the canonical "best" transcript.
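To make the selection rule concrete, here is a minimal Python sketch of one way to read it. This is not the script's actual code: the rank tables, the dictionary keys, the function name pick_annotation, and the order of the sort keys (effect severity first, then transcript biotype, then canonical status as the tie-breaker) are all assumptions made for illustration.

```python
# Hypothetical severity ranks (lower = worse effect). Illustrative only,
# not the table used by vcf2maf.
EFFECT_RANK = {
    "frameshift_variant": 1,
    "stop_gained": 1,
    "missense_variant": 2,
    "synonymous_variant": 3,
    "intron_variant": 4,
}

# Hypothetical transcript biotype ranks (lower = "better" transcript).
BIOTYPE_RANK = {"protein_coding": 1, "lincRNA": 3, "non_coding": 3}

UNKNOWN = 99  # anything not in the tables sorts last


def pick_annotation(annotations):
    """Pick one annotation per variant from a list of dicts with the
    (assumed) keys: transcript, effect, biotype, canonical."""

    def sort_key(ann):
        return (
            EFFECT_RANK.get(ann["effect"], UNKNOWN),    # "worst" effect first
            BIOTYPE_RANK.get(ann["biotype"], UNKNOWN),  # then "best" transcript
            0 if ann["canonical"] else 1,               # canonical breaks ties
        )

    return min(annotations, key=sort_key)


candidates = [
    {"transcript": "T1", "effect": "synonymous_variant",
     "biotype": "protein_coding", "canonical": True},
    {"transcript": "T2", "effect": "missense_variant",
     "biotype": "protein_coding", "canonical": False},
    {"transcript": "T3", "effect": "missense_variant",
     "biotype": "protein_coding", "canonical": True},
]
print(pick_annotation(candidates)["transcript"])
```

With these made-up ranks, T2 and T3 tie on effect and biotype, and the canonical flag decides between them. A different key order would give a different, equally defensible answer, which is the subjectivity mentioned above.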
That's a great tool, thanks! I added a command-line parameter for the name of the snpEff VCF; feel free to use it if interested. (https://github.com/dakl/vcf2maf)
Yea that makes sense - to give the user the option to run snpEff themselves. Actually, the first version of this script was a "converter of a pre-annotated VCF" :) Then I wanted to package it all-in-one.
Update: I released vcf2maf v1.1 that allows you to use a VCF that is already annotated with snpEff or Ensembl's VEP.
FYI, I recently started getting this error: ERROR: Unrecognized biotype "non_coding". Please update your hashes! at vcf2maf.pl line 287, <GEN0> line 171.
I added it with priority 3, which already had other non-coding RNAs in it. Just so you know.
Info on the biotype from here: http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html
Thanks. Which transcript database are you using? I don't see non_coding as a valid transcript biotype in the Ensembl 74 GTF, but I do see it listed in the GENCODE specs. I have now updated the script to handle all the GENCODE biotypes.
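For anyone hitting the same error, the underlying pattern is a lookup table that refuses to guess. A rough Python sketch follows; the table contents and the name biotype_priority are hypothetical, and only the priority 3 for non_coding comes from this thread.

```python
# Hypothetical priority table (lower = preferred). Only the value for
# "non_coding" is taken from the thread; the other entries are placeholders.
BIOTYPE_PRIORITY = {
    "protein_coding": 1,
    "lincRNA": 3,
    "non_coding": 3,
}


def biotype_priority(biotype):
    """Return the priority for a transcript biotype, failing loudly on
    anything the table does not know about."""
    try:
        return BIOTYPE_PRIORITY[biotype]
    except KeyError:
        raise ValueError(
            f'Unrecognized biotype "{biotype}". Please update the priority table.'
        ) from None
```

Failing on an unknown biotype, rather than silently assigning a default, is what surfaces a mismatch between the annotation database and the table.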
Please see the fork of the code mentioned above by @Danielk. Alternatively, my script skips snpEff annotation for an input VCF named file.vcf if it finds an annotated VCF in the same folder named file.anno.vcf.
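The naming convention described here can be checked with a few lines of Python. This is only a sketch of the file.vcf / file.anno.vcf lookup, not the script's own logic; the function name find_annotated_vcf is made up, and compressed inputs such as file.vcf.gz are not handled.

```python
from pathlib import Path


def find_annotated_vcf(input_vcf):
    """Return the sibling '<name>.anno.vcf' for '<name>.vcf' if it exists
    in the same folder, otherwise None."""
    path = Path(input_vcf)
    candidate = path.with_name(path.stem + ".anno.vcf")
    return candidate if candidate.is_file() else None
```

If this returns a path, the annotation step can be skipped and the annotated file used directly; if it returns None, the input still needs to be annotated.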
MAF contains annotation about the variant effects on transcripts/proteins, while VCF typically does not. You might find that tools like ANNOVAR, snpEff, and the Ensembl Variant Effect Predictor get you pretty close. I'm not aware of a script that applies one or more of these tools to a VCF file to produce MAF directly.
FWIW, MAF is a "standard" format within the TCGA project. Here's documentation: https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+(MAF)+Specification
See www.biostars.org/p/74822/ and seqanswers.com/forums/showthread.php?t=16740
I'm afraid your pointers are not useful here:
I have snpEff-annotated VCF files and I am converting these to MAF format. When I run vcf2maf I get the error
Can you please point out the reason for this error.
Please open a new question, and use tags and keywords like vcf, maf, vcf2maf... so the relevant folks can find it.