Dear All,
I need to start training Augustus. I have three hundred curated genes obtained from Artemis, and I need to use 5 scaffolds for the training. Could you help me with how to train Augustus?
Thanks.
To summarize:
Download the GTF file of the investigated genome.
To get a subset of this data (only one chromosome, as an example), do:
awk '$1 == "Chromosome"' file.gtf > file_chromosome.gtf
Download the genome FASTA file of the investigated genome.
Use the script gff2gbSmallDNA.pl from the Augustus package to convert the annotation and the genome sequence to GenBank format:
gff2gbSmallDNA.pl gff-file seq-file max-size-of-gene-flanking-DNA output-file
Split the gene set obtained in the previous step at random with the script randomSplit.pl from the Augustus package:
randomSplit.pl geneSet.gb nbGene
(You should get 2 files, geneSet.gb.train and geneSet.gb.test: one gene set for training, the other to test the model obtained by the training. nbGene is the number of genes put into the test set.)
Prepare the Augustus directories and files for the new species (drosophila as an example):
new_species.pl --species=drosophila
Train Augustus on the training set:
etraining --species=drosophila geneSet.gb.train
Run Augustus on the test set:
augustus --gff3=on --species=drosophila geneSet.gb.test | tee drosophila_test.out
Look at the results:
grep -A 22 Evaluation drosophila_test.out
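For the random-split step above, a quick sanity check before training is to count how many genes ended up in each file. The sketch below assumes the GenBank files written by gff2gbSmallDNA.pl contain one line starting with LOCUS per gene locus; the helper name count_loci is only illustrative and is not part of the Augustus package.

```shell
# Count GenBank records in a file (assumes one LOCUS line per gene locus).
count_loci() {
    grep -c '^LOCUS' "$1"
}

# Report the size of each split, skipping files that do not exist yet.
for f in geneSet.gb.train geneSet.gb.test; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_loci "$f")"
    fi
done
```

The two counts should add up to the number of loci in the unsplit file, and the test count should equal the number given to randomSplit.pl.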
Hi! I am trying to train Augustus for Glycine and I am following the steps you mentioned.
While using gff2gbSmallDNA.pl, I get the error: Couldn't open gff.
While using autoAug.pl, I get the following error:
Can you please help me with this?
Hi
Please make sure that you have prepared your GFF files correctly.
You must also define the Augustus path correctly in your Unix/Linux environment.
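As a rough sketch of what defining the path can look like: Augustus reads its species parameters from the directory named by the AUGUSTUS_CONFIG_PATH environment variable, and the binaries and helper scripts need to be on PATH. The install location ($HOME/augustus) and the variable AUGUSTUS_HOME are assumptions for illustration only, and check_augustus_env is a hypothetical helper, not an Augustus command.

```shell
# Hypothetical install location; adjust to where Augustus lives on your system.
AUGUSTUS_HOME="${AUGUSTUS_HOME:-$HOME/augustus}"

# Augustus looks for species parameters under $AUGUSTUS_CONFIG_PATH/species.
export AUGUSTUS_CONFIG_PATH="$AUGUSTUS_HOME/config"

# Make the binaries and the helper Perl scripts callable by name.
export PATH="$AUGUSTUS_HOME/bin:$AUGUSTUS_HOME/scripts:$PATH"

# Report anything that still cannot be found; returns non-zero if something is missing.
check_augustus_env() {
    rc=0
    if [ ! -d "$AUGUSTUS_CONFIG_PATH/species" ]; then
        echo "missing directory: $AUGUSTUS_CONFIG_PATH/species" >&2
        rc=1
    fi
    for tool in augustus etraining gff2gbSmallDNA.pl randomSplit.pl new_species.pl; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not on PATH: $tool" >&2
            rc=1
        fi
    done
    return "$rc"
}

check_augustus_env || echo "Augustus environment looks incomplete" >&2
```

Putting the two export lines in your shell startup file keeps the settings across sessions.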
Hi,
Look at these webpages; they contain everything you need to know.
Thanks. Today I was able to finish my work :) If someone wants to know how to train Augustus, I can help :)
Hi,
When I run gff2gbSmallDNA.pl, it finishes without any error but does not produce any output.
gff-file: a GFF file with a list of intron coordinates
example:
seq-file: a list of contigs in FASTA format
I don't know how to proceed. Could you please help me with this?
My Regards,
Prasoon
Hi,
Can you please post the command you used? Did you check the sequence names in the FASTA file? They should be the same as in the GFF file. Which options did you use in your command?
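One way to check the sequence names is sketched below: it lists every name found in column 1 of the GFF that has no matching header in the FASTA. It assumes a tab-separated GFF and takes the FASTA name to be the header text up to the first whitespace; the function name names_missing_from_fasta is made up for this example.

```shell
# Print sequence names that occur in the GFF (column 1) but not in the FASTA.
# Prints nothing when every annotated sequence is present in the FASTA.
names_missing_from_fasta() {
    gff="$1"
    fasta="$2"
    gff_names=$(mktemp)
    fasta_names=$(mktemp)

    # Column 1 of each non-comment, non-empty GFF line is the sequence name.
    awk -F '\t' '!/^#/ && NF > 0 { print $1 }' "$gff" | sort -u > "$gff_names"

    # A FASTA name is the header text after ">" up to the first whitespace.
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$fasta" | sort -u > "$fasta_names"

    # comm -23 keeps only the lines unique to the first sorted file.
    comm -23 "$gff_names" "$fasta_names"

    rm -f "$gff_names" "$fasta_names"
}
```

For example, `names_missing_from_fasta 1.gff 1.fa` should print nothing if the two files agree; any name it does print is a sequence the GFF refers to that is not in the FASTA.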
Here is my command:
And the files used in the command:
head 1.gff
head 1.fa
Thank you very much.