I often work with draft genomes of non-model species (mostly fishes) that we need to annotate. In cases like this, we do not really care about putative proteins predicted from ORFs or by ab initio methods.
What we really need is a GFF3 annotation file listing known proteins (from Swiss-Prot, for example), plus an accompanying .csv file that gives more information about each protein (scaffold, position, protein name, etc.).
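Whatever pipeline produces the GFF3, the companion .csv can be derived from it afterwards. Below is a minimal Python sketch, assuming that each transcript is an `mRNA` feature and that the protein name was copied into a `Name` attribute. The function name `gff3_to_csv`, the column layout and the example record are hypothetical. Real EVM or PASA output may use different feature types or attribute keys, so check your own files and adjust `feature_type`/`name_key`.

```python
import csv
import io
from urllib.parse import unquote


def parse_attributes(field):
    """Parse GFF3 column 9 ('key=value;key=value') into a dict.

    GFF3 values are percent-encoded, so they are decoded with unquote.
    """
    attrs = {}
    for pair in field.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attrs[key] = unquote(value)
    return attrs


def gff3_to_csv(gff_lines, out, feature_type="mRNA", name_key="Name"):
    """Write one CSV row per feature of `feature_type`; return the row count.

    Assumption (hypothetical): the protein name is stored under `name_key`.
    """
    writer = csv.writer(out)
    writer.writerow(["id", "scaffold", "start", "end", "strand", "protein_name"])
    n = 0
    for line in gff_lines:
        # Skip comments/directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Keep only well-formed 9-column records of the requested type
        # (this also skips sequence lines after a ##FASTA directive).
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        writer.writerow([attrs.get("ID", ""), cols[0], cols[3], cols[4],
                         cols[6], attrs.get(name_key, "")])
        n += 1
    return n


# Usage with a tiny made-up record:
example = (
    "##gff-version 3\n"
    "scaffold_1\tEVM\tgene\t1000\t5000\t.\t+\t.\tID=gene1\n"
    "scaffold_1\tEVM\tmRNA\t1000\t5000\t.\t+\t.\t"
    "ID=mRNA1;Parent=gene1;Name=Myosin%20heavy%20chain\n"
)
buf = io.StringIO()
n_rows = gff3_to_csv(example.splitlines(), buf)
print(buf.getvalue())
```

Filtering on a single transcript-level feature type keeps the table at one row per protein. Exon, CDS and UTR lines stay in the GFF3, where their parent/child structure is preserved.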
What would be the simplest approach to achieve that goal while handling introns and exons properly and producing the usual feature types (gene, CDS, exon, UTR, ...)?
Right now, I am considering a workflow like this:
- RepeatMasker
- PASA
- EVidenceModeler (EVM)
and skipping anything to do with ab initio detection (Augustus, Exonerate, ...).
Am I missing a simpler approach? It needs to work for eukaryotic genomes of roughly 1-3 Gbp.
EDIT: Ah well... Please do not suggest MAKER 1 or 2. I am not going to use MAKER unless my actual survival depends on it ;)
In the end, it looks like MAKER is still the best (or at least the correct) approach... Eukaryotic genome annotation needs some serious streamlining.
Aren't "eukaryotic genome annotation" and "simple" an oxymoron (unless there is a "not" between them)?
I have never used them (and they seem to be anything but simple), but do you know JAMg and JAMp?
Yes, I know firsthand that genome annotation and simple don't go hand in hand ;)
I'm investigating JAMg. Thanks for the suggestion!
JAMg looks a bit more complex than our current pipeline (which fails) and depends on some of the same software that fails on our genomes... ¯\_(ツ)_/¯