Question

Simplest genome protein annotation pipeline possible

7.8 years ago

Eric Normandeau

I often work with draft genomes of non-model species (mostly fishes), and we need to annotate these genomes. In cases like this, we do not really care about putative proteins predicted from ORFs or by any ab initio method.

What we really need is a GFF3 annotation file listing known proteins (from Swiss-Prot, for example), with an accompanying .csv file that gives more information about each protein (scaffold, position, protein name, etc.).
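As a rough illustration of the second output (the accompanying table), here is a minimal Python sketch that turns the `gene` features of a GFF3 file into a CSV with scaffold, position, strand, ID and protein name. It is not part of any pipeline mentioned in this thread. It assumes the protein name is stored in the `Name` attribute (a hypothetical choice; set `name_key` to whatever attribute your annotation actually uses), and the record in the `__main__` block is a made-up toy example.

```python
import csv
import sys
from urllib.parse import unquote

COLUMNS = ["scaffold", "start", "end", "strand", "gene_id", "protein_name"]


def parse_attributes(field: str) -> dict:
    """Parse GFF3 column 9 (e.g. 'ID=gene1;Name=ACTB') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, _, value = pair.partition("=")
        attrs[key] = unquote(value)  # GFF3 percent-encodes ';', '=', ',' etc.
    return attrs


def gff3_to_rows(lines, feature_type="gene", name_key="Name"):
    """Yield one summary dict per feature of `feature_type`."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more features
        if not line or line.startswith("#"):
            continue  # blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        yield {
            "scaffold": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "gene_id": attrs.get("ID", ""),
            "protein_name": attrs.get(name_key, ""),  # hypothetical attribute
        }


def write_csv(rows, out):
    """Write the summary rows as CSV (csv quotes names containing commas)."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Toy GFF3 record, made up for illustration.
    example = (
        "##gff-version 3\n"
        "scaffold_1\tsketch\tgene\t1000\t5000\t.\t+\t.\tID=gene1;Name=Actin%2C cytoplasmic 1\n"
        "scaffold_1\tsketch\tmRNA\t1000\t5000\t.\t+\t.\tID=gene1.t1;Parent=gene1\n"
    )
    write_csv(gff3_to_rows(example.splitlines()), sys.stdout)
```

Only one row per gene is written; filtering on `mRNA` instead (via `feature_type`) would give one row per transcript.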

What would be the simplest approach to achieve that goal while handling introns/exons properly and producing standard feature annotations (gene, CDS, exon, UTR, ...)?
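For reference, handling introns/exons properly at the GFF3 level means that each gene carries an mRNA whose exons, CDS pieces and UTRs are split at intron boundaries, with a phase on every CDS line. The Python sketch below (the function name and the `gene1.t1` ID scheme are hypothetical) derives those features from exon coordinates plus the genomic CDS bounds. It assumes one transcript per gene and a complete CDS, and it only shows the expected structure, not how PASA or EVM compute it.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates, as in GFF3


def gene_model_to_gff3(seqid: str, strand: str, exons: List[Interval],
                       cds_start: int, cds_end: int, gene_id: str,
                       source: str = "sketch") -> List[str]:
    """Emit gene/mRNA/exon/CDS/UTR GFF3 lines for one single-transcript gene.

    Assumes a complete CDS whose genomic bounds fall inside the exons.
    """
    exons = sorted(exons)
    # Left/right of the CDS in genome coordinates maps to 5'/3' by strand.
    if strand == "-":
        left_utr, right_utr = "three_prime_UTR", "five_prime_UTR"
    else:
        left_utr, right_utr = "five_prime_UTR", "three_prime_UTR"
    mrna_id = f"{gene_id}.t1"  # hypothetical transcript ID scheme
    start, end = exons[0][0], exons[-1][1]
    rows = [(start, end, "gene", ".", f"ID={gene_id}"),
            (start, end, "mRNA", ".", f"ID={mrna_id};Parent={gene_id}")]
    cds_parts, utr_parts = [], []
    for ex_start, ex_end in exons:
        rows.append((ex_start, ex_end, "exon", ".", f"Parent={mrna_id}"))
        if ex_start < cds_start:  # exon overhangs the CDS on the left
            utr_parts.append((ex_start, min(ex_end, cds_start - 1), left_utr))
        if ex_end > cds_end:  # exon overhangs the CDS on the right
            utr_parts.append((max(ex_start, cds_end + 1), ex_end, right_utr))
        lo, hi = max(ex_start, cds_start), min(ex_end, cds_end)
        if lo <= hi:  # coding part of this exon (introns are simply skipped)
            cds_parts.append((lo, hi))
    # Phase = bases to skip before the next codon, counted in transcription order.
    ordered = cds_parts[::-1] if strand == "-" else cds_parts
    coding_done = 0
    cds_rows = []
    for lo, hi in ordered:
        phase = (3 - coding_done % 3) % 3
        cds_rows.append((lo, hi, "CDS", str(phase),
                         f"ID={mrna_id}.cds;Parent={mrna_id}"))
        coding_done += hi - lo + 1
    rows += sorted(cds_rows)  # back to genomic order for output
    rows += [(s, e, t, ".", f"Parent={mrna_id}") for s, e, t in utr_parts]
    return ["\t".join([seqid, source, ftype, str(s), str(e), ".", strand, ph, attrs])
            for s, e, ftype, ph, attrs in rows]


if __name__ == "__main__":
    # Toy two-exon gene, made up for illustration.
    for line in gene_model_to_gff3("scaffold_1", "+", [(100, 200), (300, 400)],
                                   151, 354, "gene1"):
        print(line)
```

Because phase is counted in transcription order, on the minus strand the rightmost CDS piece is the one that gets phase 0.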

Right now, I am considering a workflow like this:

RepeatMasker
PASA
EVidenceModeler (EVM)

And I would skip anything to do with ab initio gene prediction (AUGUSTUS, Exonerate, ...).

Am I missing a simpler approach? The approach needs to work for eukaryote genomes (~1-3 Gbp).

EDIT: Ah well... Please do not suggest MAKER 1 or 2. I am not going to use MAKER unless my actual survival depends on it ;)

Tags: genome, annotation, proteins

7.7 years ago by Eric Normandeau

In the end, it looks like MAKER is still the best (or correct) approach... Eukaryotic genome annotation needs some serious streamlining.

Reply by Eric Normandeau, 7.8 years ago

Aren't "eukaryotic genome annotation" and "simple" an oxymoron (unless there is a "not" between them)?

I have never used them (and they seem to be anything but simple), but do you know JAMg and JAMp?

Reply by h.mon, 7.8 years ago

Yes, I know firsthand that genome annotation and simple don't go hand in hand ;)

I'm investigating JAMg. Thanks for the suggestion!

Reply by Eric Normandeau, 7.8 years ago

JAMg looks a bit more complex than our current pipeline (which fails) and depends on some of the same software that fails on our genomes... ¯\_(ツ)_/¯

Reply by Eric Normandeau, 7.8 years ago

Answer 1 · 2017-07-28

7.7 years ago

Eric Normandeau

I ended up developing a genome annotation pipeline based on suggestions from a colleague. You can find more about it in this Biostars post: GAWN - Genome Annotation Without Nightmares
