Training Set With Augustus
12.5 years ago

Hi, I have recently been working on annotating a plant genome, and I chose AUGUSTUS to predict genes. I read the documentation on training sets, but I can't understand it.

First, the "retraining AUGUSTUS" protocol needs a training set, a test set and meta parameters. Are the training set and the test set complete sequences? How can I get them? From NCBI? And how do I configure the *.cfg file for the meta parameters?

Second, where does the hints file come from, and how is it generated?

Is there any relationship between "retraining AUGUSTUS" and the hints?

http://augustus.gobics.de/binaries/retraining.html
http://bioinf.uni-greifswald.de/augustus/binaries/README.autoAug

Do these two web pages describe the same thing?

Thanks in advance for any reply!

12.2 years ago

The training set is a file of genes in GenBank format that is used for training. The test set is also a file of genes in GenBank format, which you can use to assess the quality of the training. The meta parameters are various parameters used by AUGUSTUS for prediction.
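
If you start from a single GenBank file of curated genes, one simple way to get both sets is to hold out a random subset as the test set. Below is a minimal Python sketch of that split. It assumes each record ends with the standard "//" terminator line; it is only an illustration, not a tool that ships with AUGUSTUS, and the demo records are placeholders rather than real genes.

```python
import random


def split_genbank_records(text):
    """Split a multi-record GenBank flat file into individual records.

    Each GenBank record ends with a line containing only "//".
    Any trailing text after the last "//" is ignored.
    """
    records = []
    current = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    return records


def train_test_split(records, test_size, seed=0):
    """Randomly hold out `test_size` records; return (train, test)."""
    if test_size > len(records):
        raise ValueError("test_size exceeds the number of records")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


if __name__ == "__main__":
    # Three tiny placeholder records; real ones carry LOCUS, FEATURES and ORIGIN sections.
    demo = "".join(f"LOCUS       gene{i}\nORIGIN\n//\n" for i in range(3))
    train, test = train_test_split(split_genbank_records(demo), test_size=1)
    print(len(train), len(test))  # 2 1
```

The two lists can then be written back out as two GenBank files (e.g. with "".join(train)). Keeping the test genes out of the training file matters, otherwise the accuracy you measure on the test set will be too optimistic.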

You must choose your own training and test set of genes. The "retraining AUGUSTUS" page suggests a number of possible sources:

  • GenBank
  • Spliced alignments of ESTs against the assembled genomic sequence, e.g. PASA
  • Spliced alignments of protein sequences of a related species against the assembled genomic sequence, e.g. GeneWise
  • Data from a related species
  • Iterate retraining with predicted genes

The meta parameters should be based on the generic ones that come with AUGUSTUS in generic_parameters.cfg and generic_weightmatrix.cfg.
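
When you start editing a copy of those generic files for your own species, it can help to load them programmatically. Here is a minimal Python sketch, assuming the .cfg format is one whitespace-separated "key value" pair per line with "#" starting a comment. The key names in the demo are hypothetical placeholders, not real AUGUSTUS parameter names, and writing the file back this way drops the comments.

```python
def read_cfg(text):
    """Parse AUGUSTUS-style .cfg text into an ordered dict of key -> value.

    Assumption: one "key value" pair per line, separated by whitespace,
    with anything after '#' treated as a comment.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def write_cfg(params, overrides):
    """Return cfg text with selected values replaced, keeping key order.

    Keys in `overrides` that are not already present are appended at the end.
    """
    merged = dict(params)
    merged.update(overrides)
    return "".join(f"{key} {value}\n" for key, value in merged.items())


if __name__ == "__main__":
    # Hypothetical keys; real AUGUSTUS files use their own parameter names.
    text = "# example\nsome_key 10   # a comment\nother_key false\n"
    params = read_cfg(text)
    print(params)  # {'some_key': '10', 'other_key': 'false'}
    print(write_cfg(params, {"other_key": "true"}), end="")
```

For most users, copying the generic files and editing a few values by hand is enough; a script like this only pays off if you generate many parameter variants.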

The first link you provide (http://augustus.gobics.de/binaries/retraining.html) describes how to perform the training of AUGUSTUS manually. The second link (http://bioinf.uni-greifswald.de/augustus/binaries/README.autoAug) describes another program, autoAug.pl, that can automate this training for you.

Training AUGUSTUS can seem intimidating at first, but if you follow the retraining document it is reasonably straightforward. In particular, the steps in the section 3. RUN THE SCRIPT optimize_augustus.pl are easy to follow.
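
To make the order of the manual steps concrete, here is a rough Python sketch that only assembles the command lines for one retraining round: create the species parameter files from the generic templates, run etraining, check accuracy with augustus on the test set, run optimize_augustus.pl, then retrain. It assumes the AUGUSTUS binaries and scripts (including new_species.pl) are on your PATH; the species name my_plant and the files train.gb and test.gb are hypothetical. Check the commands against the retraining document for your AUGUSTUS version before running anything.

```python
import shlex
import subprocess


def retraining_commands(species, train_gb, test_gb, cpus=1):
    """Build the command lines for one manual retraining round (sketch)."""
    sp = f"--species={species}"
    return [
        # Create the species parameter directory from the generic templates.
        ["new_species.pl", sp],
        # Initial training on the training set.
        ["etraining", sp, train_gb],
        # Measure prediction accuracy on the held-out test set.
        ["augustus", sp, test_gb],
        # Optimize the meta parameters; this is by far the slowest step.
        ["optimize_augustus.pl", sp, f"--cpus={cpus}", train_gb],
        # Retrain with the optimized parameters.
        ["etraining", sp, train_gb],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Dry run: only prints the commands.
    run(retraining_commands("my_plant", "train.gb", "test.gb", cpus=4))
```

The dry-run default is deliberate: optimize_augustus.pl can run for a long time, so it is worth confirming the exact commands first. After the final etraining, running augustus on the test set again shows whether the optimization actually improved accuracy.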


Dear David,

I'm running optimize_augustus.pl with a training set of 1000 genes and the parameter -cpus=20 on a 650 Mb genome, for 5 rounds (the default). A week has passed: all augustus processes have stopped except one, which is still running with no sign of finishing, and the nohup output file has not gained any new information.

It's quite a dilemma for me now. Can you give me some advice? Thanks.

   Sincerely,
         Du Kang
