I'm looking for a general variant annotator ... one that would call a variant as synonymous, non-synonymous, non-genic, UTR, frame-shift, truncation, etc. By general I mean that we should be able to supply a reference genome sequence (fasta), gene models (say, in GFF), and vcf-formatted SNPs or indels, and that's it.
Many or all of the tools I've looked at (Annovar, Ensembl's Variant Effect Predictor, etc.) depend on an external database (e.g. NCBI or EMBL), or require you to build a local database, or only handle SNPs, or ... you get the point. We need to look at the effect of both indels and SNPs, and we cannot use one of the publicly released builds for our organism. Any takers? (Thanks in advance ...)
I also like snpEff, but it can't handle larger indels where the variant is not encoded in the ALT column. Agreed, Pablo is very responsive, and he mentioned that accepting BED format may be in the pipeline, which (IMHO) would make snpEff more generally useful as a variant annotator.
Brent, thanks for the clarification; great to know. Hopefully this support will make it into a future version.
snpEff does seem to fit the bill. Brent, the main page lists many changes for the 1.8 version, including pileup support (which should allow encoding of any size indel). Is your experience with snpEff prior to 1.8, or would you say there are still issues with support for large indels?
The latest version of snpEff (1.9) was just released and has support for BED format: snpeff.sourceforge.net/features.html
Can snpEff handle a single mutation that is not included in any database? For example, one found when I sequence some patients' genomes.
Yes. The effect predictions are based on the position of a mutation within transcripts, not on a pre-computed set of known mutations.
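To illustrate what position-based prediction means, here is a minimal Python sketch. It is not snpEff's code or API; the function `classify_snp`, the toy genome, and the coordinates are all made up for illustration. It assumes a single forward-strand transcript, SNPs only, and treats any transcript position outside the CDS span as UTR. The point is only that a reference sequence plus a gene model is enough to label a novel variant, with no lookup in a database of known mutations.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(pos, alt, genome, tx_start, tx_end, cds_exons):
    """Classify a SNP from its position alone (hypothetical helper).

    pos, tx_start, tx_end are 1-based; cds_exons is a sorted list of
    1-based inclusive (start, end) CDS segments on the forward strand.
    """
    if pos < tx_start or pos > tx_end:
        return "non-genic"

    # Find the offset of the variant within the spliced coding sequence.
    offset = 0
    for start, end in cds_exons:
        if start <= pos <= end:
            offset += pos - start
            break
        offset += end - start + 1
    else:
        # Inside the transcript but not in any CDS segment.
        if pos < cds_exons[0][0] or pos > cds_exons[-1][1]:
            return "UTR"  # simplification: ignores introns within UTRs
        return "intron"

    # Build the spliced CDS and compare reference and mutated codons.
    cds = "".join(genome[s - 1:e] for s, e in cds_exons)
    codon_start = offset - offset % 3
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset % 3] + alt + ref_codon[offset % 3 + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "truncation"  # premature stop codon
    return "non-synonymous"


# Toy data: flank, 5' UTR, exon 1 (ATG GCT), intron, exon 2 (TGG TAA), 3' UTR, flank.
GENOME = "GGGGG" + "CC" + "ATGGCT" + "GTAAGT" + "TGGTAA" + "CC" + "GGGGG"
TX_START, TX_END = 6, 27
CDS_EXONS = [(8, 13), (20, 25)]

if __name__ == "__main__":
    for pos, alt in [(3, "A"), (6, "T"), (15, "C"), (13, "C"), (12, "A"), (22, "A")]:
        print(pos, alt, classify_snp(pos, alt, GENOME, TX_START, TX_END, CDS_EXONS))
```

A real annotator also has to deal with reverse-strand genes, multiple transcripts per gene, indels (frame-shifts), and splice sites, none of which this sketch attempts.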