I'm planning to use Augustus to annotate a genome. It needs some initial gene models to train on, and the suggested approach is to build these from ESTs with PASA. However, PASA is a bit complicated and requires things like MySQL, which seems unnecessary. Are there any simpler alternatives I could use?
Not EST-based, but used in a few studies of novel genomes: CEGMA
Its advantage is a selected gene set (i.e. you do not overtrain Augustus on the 100 most highly expressed kinases), but it is a bit dated (gene/protein sequences have been updated since), and the set of "core genes" present in all eukaryotes seems to be shrinking.
In the case of Augustus (speaking of a plant genome), the most important part was having as accurate a set of hints as possible from RNA-Seq data, followed by repeat masking. Even with Arabidopsis-trained parameters, Augustus was surprisingly accurate, even for some giant plant genes. The situation may obviously be very different for other groups/organisms.
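
To make the hints idea a bit more concrete, here is a minimal Python sketch, assuming you already have splice junctions with read counts extracted from your RNA-Seq alignments. It formats them as intron hints in the nine-column GFF layout that, as far as I understand it, Augustus reads as extrinsic evidence. The function name `junctions_to_hints`, the `rnaseq` source label, the example coordinates, and the `min_support` threshold are all made up for illustration, so check the attribute syntax against the Augustus documentation before relying on it.

```python
def junctions_to_hints(junctions, min_support=2, src="E"):
    """Format splice junctions as intron hint lines (GFF-style, tab-separated).

    junctions: iterable of (seqid, start, end, strand, read_count) tuples,
               with 1-based inclusive intron coordinates (an assumption here).
    min_support: illustrative threshold; junctions with fewer reads are dropped.
    src: hint source label, assumed to match a source in the extrinsic config.
    """
    lines = []
    for seqid, start, end, strand, count in junctions:
        if count < min_support:
            continue  # weakly supported junctions tend to be noise
        if start > end:
            raise ValueError(f"start > end for junction on {seqid}")
        attributes = f"mult={count};src={src}"
        fields = [seqid, "rnaseq", "intron", str(start), str(end),
                  "0", strand, ".", attributes]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Hypothetical junctions, only to show the call.
    example = [("chr1", 1201, 1850, "+", 12), ("chr1", 5000, 5300, "-", 1)]
    for line in junctions_to_hints(example):
        print(line)
```

Filtering on read support is a design choice rather than a requirement: the point made above is that hint accuracy matters more than hint quantity, so dropping singleton junctions is one simple way to lean in that direction.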