Augustus training and Artemis
9.9 years ago
Mehmet ▴ 820

Dear All,

I need to start Augustus training. I have 300 curated genes from Artemis, and I want to use 5 scaffolds to train Augustus. Could you tell me how to do the training?

Thanks.

Tags: assembly, genome, gene, alignment
9.9 years ago
Juke34 8.9k

To summarize:

  1. Download the GTF annotation file of the investigated genome

    1. To get a subset of this data (for example, a single chromosome), keep only the lines whose first column is that sequence name (here "Chromosome"):

      awk '$1 == "Chromosome"' file.gtf > file_chromosome.gtf
  2. Download the genome FASTA file of the investigated genome

  3. Convert the annotation to GenBank format with the gff2gbSmallDNA.pl script from the Augustus package:

    gff2gbSmallDNA.pl gff-file seq-file max-size-of-gene-flanking-DNA output-file
    
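  4. Split the resulting GenBank file (here genesSet.gb) randomly into a training set and a test set with the randomSplit.pl script from the Augustus package, where nbGene is the number of genes put in the test set: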

    randomSplit.pl genesSet.gb nbGene
    

    (This gives two files: genesSet.gb.train, used for training, and genesSet.gb.test, containing nbGene genes, used to test the trained model.)

  5. Create the Augustus directories and parameter files for the species (drosophila is used as the example name):

    new_species.pl --species=drosophila
    
  6. Train Augustus:

    etraining --species=drosophila genesSet.gb.train
    
  7. Run Augustus on the test set:

    augustus --gff3=on --species=drosophila genesSet.gb.test | tee drosophila_test.out
    
  8. Look at the results (the accuracy evaluation at the end of the output):

    grep -A 22 Evaluation drosophila_test.out
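The steps above can be chained into one script. This is a minimal sketch, assuming bash and that the Augustus programs and scripts (gff2gbSmallDNA.pl, randomSplit.pl, new_species.pl, etraining, augustus) are on PATH. The file names, the sequence name, the flank size (1000) and the test-set size (100) are placeholders, not recommended values, and run, subset_gtf, evaluate and train_pipeline are hypothetical helper names. By default it only prints each step; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-8 as one script. File names, the species name and the
# numeric values are placeholders (assumptions), not fixed values.
set -euo pipefail

GTF=file.gtf          # step 1: annotation of the investigated genome
CHROM=Chromosome      # sequence name to keep (column 1 of the GTF)
GENOME=genome.fa      # step 2: genome FASTA (hypothetical name)
FLANK=1000            # max-size-of-gene-flanking-DNA (example value)
NTEST=100             # genes held out for the test set (example value)
SPECIES=drosophila    # name used by new_species.pl, etraining and augustus

# Print each step; only execute it when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  fi
}

# Step 1: keep one sequence; '>' (not '>>') so a rerun does not append duplicates.
subset_gtf() { awk -v c="$CHROM" '$1 == c' "$GTF" > file_chromosome.gtf; }

# Steps 7-8: predict on the held-out genes, keep a copy, show the evaluation.
evaluate() {
  augustus --gff3=on --species="$SPECIES" genesSet.gb.test | tee "${SPECIES}_test.out"
  grep -A 22 Evaluation "${SPECIES}_test.out"
}

train_pipeline() {
  run subset_gtf
  run gff2gbSmallDNA.pl file_chromosome.gtf "$GENOME" "$FLANK" genesSet.gb
  run randomSplit.pl genesSet.gb "$NTEST"   # writes genesSet.gb.train and genesSet.gb.test
  run new_species.pl --species="$SPECIES"
  run etraining --species="$SPECIES" genesSet.gb.train
  run evaluate
}

train_pipeline
```

For example, DRY_RUN=0 bash train_augustus.sh (hypothetical script name) runs every step; printing first makes it easy to check file names before a long etraining run.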
    

Hi! I am trying to train Augustus for Glycine, following the steps you described.

When I run gff2gbSmallDNA.pl, I get the error "Couldn't open gff".

When I run autoAug.pl, I get the following error:

Error: The environment variable AUGUSTUS_CONFIG_PATH is not define

Can you please help me with this?


Hi

Please make sure that you have prepared your GFF files correctly.

You must also define the Augustus paths (the AUGUSTUS_CONFIG_PATH environment variable) correctly in your Unix/Linux environment.
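Both errors usually come down to paths. A minimal sketch, assuming bash and an Augustus installation under a directory held in the hypothetical variable AUGUSTUS_DIR (default $HOME/augustus): it exports AUGUSTUS_CONFIG_PATH to that installation's config directory, puts its bin/ and scripts/ directories on PATH, and defines a hypothetical helper, check_inputs, that reports unreadable input files (one cause of a "Couldn't open" message) and a config path without a species/ subdirectory.

```shell
#!/usr/bin/env bash
# Sketch: set the Augustus paths and sanity-check inputs before running the scripts.
# AUGUSTUS_DIR is a hypothetical install location; adjust it to your system.
set -eu

AUGUSTUS_DIR="${AUGUSTUS_DIR:-$HOME/augustus}"
export AUGUSTUS_CONFIG_PATH="$AUGUSTUS_DIR/config"
export PATH="$AUGUSTUS_DIR/bin:$AUGUSTUS_DIR/scripts:$PATH"

# check_inputs FILE...: report unreadable files and a config dir without species/.
# Returns 0 only when every check passes.
check_inputs() {
  local status=0 f
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "cannot read: $f" >&2
      status=1
    fi
  done
  if [ ! -d "$AUGUSTUS_CONFIG_PATH/species" ]; then
    echo "no species/ under AUGUSTUS_CONFIG_PATH=$AUGUSTUS_CONFIG_PATH" >&2
    status=1
  fi
  return "$status"
}
```

For example, call check_inputs my.gff genome.fa (hypothetical file names) before gff2gbSmallDNA.pl. To keep the variable across sessions, the export line can be added to ~/.bashrc.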


Thanks. Today I could finish my work :) If anyone wants to know how to train Augustus, I can help :)


Hi,

When I run gff2gbSmallDNA.pl, it finishes without any error but produces no output.

gff-file: a GFF file with a list of intron coordinates, for example:

contig100067    b2h intron  730 761 0   .   .   grp=TRINITY_DN11987_c1_g42_i1;pri=4;src=E
contig100067    b2h intron  1773    1807    0   .   .   grp=TRINITY_DN11987_c1_g42_i1;pri=4;src=E

seq-file: the contigs in FASTA format

I don't know how to proceed. Could you please help?

My Regards,
Prasoon
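One way to see what the script has to work with is to count the feature types (column 3) in the GFF file. A minimal sketch with two hypothetical helpers, feature_counts and has_cds; it assumes a tab-separated GFF file and, as an assumption not stated in this thread, that gff2gbSmallDNA.pl builds its gene records from CDS features, so an intron-only file like the one above would leave it nothing to convert.

```shell
#!/usr/bin/env bash
# Sketch: inspect which feature types a GFF file contains.
# Assumption (not from the thread): gff2gbSmallDNA.pl needs CDS features,
# so a file with only intron lines gives it no genes to convert.
set -eu

# feature_counts GFF: print "<type> <count>" for each column-3 value, sorted.
feature_counts() {
  awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print t, n[t] }' "$1" | sort
}

# has_cds GFF: exit status 0 if at least one CDS line is present.
has_cds() {
  awk -F'\t' '$3 == "CDS" { found = 1; exit } END { exit !found }' "$1"
}
```

For example, feature_counts introns.gff (hypothetical file name) on the two lines shown above would report only intron features, and has_cds introns.gff would fail.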


Hi,

Can you please post the command you used, including any options? Did you check the sequence names in the FASTA file? They must be the same as the sequence names (first column) in the GFF file.

/scripts/gff2gbSmallDNA.pl gff-file seq-file max-size-of-gene-flanking-DNA output-file [options] 

Here is my command:

/scripts/gff2gbSmallDNA.pl 1.gff 1.fasta 100 1.small.gb

And here are the first lines of the files used in the command (head 1.gff):

1   artemis gene    6521    7714    .   -   .   ID=1.1
1   artemis mRNA    6521    7714    .   -   .   ID=1.1t;Parent=1.1
1   artemis exon    6521    6634    .   -   .   ID=1.1.1;Parent=1.1t
1   artemis CDS 6521    6634    .   -   .   ID=cds.1.1.1;Parent=1.1t
1   artemis exon    6711    6939    .   -   .   ID=1.1.2;Parent=1.1t
1   artemis CDS 6711    6939    .   -   .   ID=cds.1.1.2;Parent=1.1t
1   artemis exon    6982    7130    .   -   .   ID=1.1.3;Parent=1.1t
1   artemis CDS 6982    7130    .   -   .   ID=cds.1.1.3;Parent=1.1t
1   artemis exon    7215    7353    .   -   .   ID=1.1.4;Parent=1.1t
1   artemis CDS 7215    7353    .   -   .   ID=cds.1.1.4;Parent=1.1t

head 1.fa

>1 all_bases
tccttactggatcatgaaaaatcaatggggggaagaatggggagagagcggattctttag
agctatccgcggcactaacaacatgctggttggggactggaactaccaagttcaacttta
aaattttcttaaatctttattatgatacaattttatggataaagcgctaaatttctattg
actttggatcatctgttgaggcagaaatgttgtgtgtttaaaactgtcttgtatttaact
aaataaatctactgaaacttttttgtagctccggtgaaaccagctgcttttaggctgtag
ggagaacgataaaaaattttgtctctgcagcgacaaaaagtggaaatcgcataaaacaca
aacattgccacgacggaaaaaaagacgttgtatgactgtttaataaacaaaactatttaa
ttttctagtttcgtcaaattccaaattgtcgctcatcttccgacagctatcttcattaaa
attcaaaaattttacttgcgaattgaggaaatcccgtttcttatcttcccgaagagaaag
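The name check suggested in this reply can be scripted. A minimal sketch with a hypothetical helper, missing_seqids, that prints every sequence name in column 1 of the GFF file that has no matching FASTA record. It assumes the FASTA ID is the header text up to the first whitespace, so the header ">1 all_bases" above gives the ID "1", which matches column 1 of 1.gff.

```shell
#!/usr/bin/env bash
# Sketch: list GFF sequence names (column 1) that have no FASTA record.
# Assumption: a FASTA ID is the header up to the first whitespace (">1 all_bases" -> "1").
set -eu

# missing_seqids GFF FASTA: print each unmatched GFF sequence name once.
missing_seqids() {
  awk 'FILENAME == ARGV[1] { if (/^>/) ids[substr($1, 2)] = 1; next }
       !/^#/ && NF && !($1 in ids) { print $1 }' "$2" "$1" | sort -u
}
```

For example, missing_seqids 1.gff 1.fa; empty output means every sequence name in the GFF file has a FASTA record.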

Thank you very much.
