Question

GeneMark-ES GTF Output Conversion to Mod

10.9 years ago

rob234king ▴ 610

I'm running MAKER2 to annotate a genome, but I need to train GeneMark-ES first. I ran eukaryotic GeneMark.hmm using the Perl script below. It finished and produced the GTF file, but MAKER2 requires a mod file, and the mod folder is empty.

 /home/apps/scripts/gm_es.pl ../RR_1.7b.fasta

How do I convert the GTF output file to the mod format, or have I missed a step or hit an error? The documentation doesn't seem to explain this.

Some errors are reported during the run. The first part of the output has no errors (see below):

    running hmm2nt.a2
4 files IN
Clusters were defined as:
 0 <= GC% <= 99
99 < GC% <= 99
99 < GC% <= 100

Parsing dna.fa.good.cod

Program complete
----------------
6442 sequences found
6443 dna.fa.good.ini
first order for Ini
GC Range: (0,99)
6442 sequences of length 12 used from 6442
total sequences in dna.fa.good.ini
Generating model...
TT     0.25 0.35 0.16 0.19 0.27 0.15 0.00 0.00 0.00 0.00 0.28 0.27
TC     0.25 0.35 0.55 0.23 0.36 0.58 0.00 0.00 0.00 0.00 0.53 0.32
TA     0.25 0.16 0.15 0.29 0.20 0.14 1.00 0.00 0.00 0.00 0.10 0.13
TG     0.25 0.14 0.14 0.30 0.17 0.13 0.00 0.00 1.00 0.00 0.08 0.28
CT     0.25 0.30 0.20 0.11 0.28 0.14 0.00 0.00 0.00 0.00 0.31 0.35
CC     0.25 0.30 0.35 0.10 0.26 0.40 0.00 0.00 0.00 0.00 0.30 0.25
CA     0.25 0.26 0.32 0.62 0.27 0.33 1.00 0.00 0.00 0.00 0.23 0.21
CG     0.25 0.14 0.13 0.17 0.19 0.13 0.00 0.00 0.00 0.00 0.15 0.20
AT     0.25 0.27 0.20 0.15 0.23 0.18 0.00 1.00 0.00 0.00 0.24 0.23
AC     0.25 0.27 0.35 0.16 0.32 0.30 0.00 0.00 0.00 0.00 0.27 0.27
AA     0.25 0.29 0.23 0.43 0.36 0.30 1.00 0.00 0.00 0.00 0.31 0.22
AG     0.25 0.18 0.21 0.25 0.10 0.23 0.00 0.00 0.00 0.00 0.19 0.29
GT     0.25 0.24 0.25 0.21 0.19 0.21 0.00 0.00 0.00 0.18 0.15 0.30
GC     0.25 0.33 0.34 0.23 0.35 0.34 0.00 0.00 0.00 0.20 0.41 0.35
GA     0.25 0.28 0.25 0.34 0.30 0.28 1.00 0.00 0.00 0.26 0.29 0.21
GG     0.25 0.16 0.16 0.22 0.16 0.18 0.00 0.00 0.00 0.36 0.15 0.15
Done
6443 lines read from dna.fa.good.ter
6442 sequences obtained
1 comment lines
0 lines contained no sequence (or improperly formatted seq)
2445 sequences used TAA
1802 sequences used TAG
2195 sequences used TGA
0 sequences did not begin with a stop codon
All lines accounted for
Done
2445 dna.fa.good.taa
first order for TAA
GC Range: (0,99)
2445 sequences of length 12 used from 2445
total sequences in dna.fa.good.taa
Generating model...
TT     1.00 0.00 0.00 0.00 0.32 0.30 0.28 0.30 0.32 0.29 0.30 0.28
TC     0.00 0.00 0.00 0.00 0.16 0.19 0.15 0.20 0.21 0.21 0.21 0.18
TA     0.00 1.00 0.00 0.00 0.26 0.26 0.25 0.25 0.21 0.24 0.26 0.26
TG     0.00 0.00 0.00 0.00 0.25 0.25 0.32 0.25 0.26 0.26 0.23 0.28
CT     1.00 0.00 0.00 0.00 0.26 0.26 0.27 0.28 0.33 0.30 0.26 0.27
CC     0.00 0.00 0.00 0.00 0.17 0.15 0.20 0.20 0.19 0.20 0.18 0.20
CA     0.00 0.00 0.00 0.00 0.37 0.34 0.35 0.38 0.30 0.33 0.37 0.39
CG     0.00 0.00 0.00 0.00 0.20 0.25 0.19 0.14 0.19 0.17 0.18 0.14
AT     1.00 0.00 0.00 0.20 0.26 0.31 0.26 0.28 0.29 0.31 0.29 0.33
AC     0.00 0.00 0.00 0.15 0.22 0.15 0.18 0.22 0.22 0.23 0.22 0.22
AA     0.00 0.00 1.00 0.29 0.26 0.28 0.30 0.29 0.27 0.24 0.30 0.23
AG     0.00 0.00 0.00 0.36 0.25 0.25 0.25 0.21 0.22 0.23 0.19 0.22
GT     1.00 0.00 0.00 0.00 0.46 0.22 0.22 0.25 0.25 0.23 0.25 0.26
GC     0.00 0.00 0.00 0.00 0.20 0.17 0.19 0.19 0.19 0.21 0.18 0.22
GA     0.00 0.00 0.00 0.00 0.20 0.36 0.34 0.31 0.33 0.34 0.31 0.32
GG     0.00 0.00 0.00 0.00 0.15 0.26 0.25 0.25 0.22 0.22 0.26 0.20
Done
1802 dna.fa.good.tag
zero order for TAG
GC Range: (0,99)
1802 sequences of length 12 used from 1802
total sequences in dna.fa.good.tag
Generating model...
T    1.00 0.00 0.00 0.17 0.33 0.28 0.25 0.28 0.31 0.29 0.29 0.32
C    0.00 0.00 0.00 0.13 0.20 0.15 0.17 0.18 0.18 0.18 0.19 0.17
A    0.00 1.00 0.00 0.48 0.26 0.30 0.30 0.30 0.26 0.29 0.29 0.28
G    0.00 0.00 1.00 0.22 0.21 0.27 0.27 0.24 0.25 0.25 0.24 0.23
Done
2195 dna.fa.good.tga
first order for TGA
GC Range: (0,99)
2195 sequences of length 12 used from 2195
total sequences in dna.fa.good.tga
Generating model...
TT     1.00 0.00 0.00 0.00 0.30 0.32 0.31 0.29 0.30 0.32 0.28 0.31
TC     0.00 0.00 0.00 0.00 0.15 0.14 0.13 0.18 0.14 0.22 0.19 0.16
TA     0.00 0.00 0.00 0.00 0.24 0.23 0.26 0.22 0.25 0.21 0.26 0.28
TG     0.00 1.00 0.00 0.00 0.31 0.31 0.30 0.31 0.30 0.25 0.28 0.25
CT     1.00 0.00 0.00 0.00 0.29 0.29 0.29 0.30 0.28 0.28 0.28 0.28
CC     0.00 0.00 0.00 0.00 0.22 0.14 0.19 0.20 0.20 0.18 0.15 0.15
CA     0.00 0.00 0.00 0.00 0.32 0.37 0.34 0.36 0.31 0.37 0.35 0.39
CG     0.00 0.00 0.00 0.00 0.17 0.20 0.18 0.14 0.21 0.18 0.21 0.18
AT     1.00 0.00 0.00 0.31 0.20 0.27 0.30 0.29 0.28 0.30 0.25 0.31
AC     0.00 0.00 0.00 0.12 0.22 0.21 0.18 0.22 0.24 0.21 0.19 0.21
AA     0.00 0.00 0.00 0.24 0.27 0.26 0.29 0.25 0.26 0.24 0.28 0.27
AG     0.00 0.00 0.00 0.33 0.31 0.27 0.23 0.24 0.23 0.25 0.28 0.22
GT     1.00 0.00 0.00 0.00 0.28 0.23 0.21 0.24 0.25 0.26 0.25 0.27
GC     0.00 0.00 0.00 0.00 0.19 0.18 0.15 0.20 0.18 0.18 0.19 0.19
GA     0.00 0.00 1.00 0.00 0.26 0.35 0.36 0.30 0.33 0.34 0.32 0.34

But further on in the output I get "Error: unknown line format":

3294 dna.fa.good.gb.acc.ph2
first order for ACC 2
Error: unknown line format
       GC%      Intron  Accession        (File generated at 2014/02/15 Sat 13:38:23 GMT)

GC Range: (0,99)
3293 sequences of length 21 used from 3294
total sequences in dna.fa.good.gb.acc.ph2
Generating model...

Can anyone help me with this error:

error, file not found info/training.fna

I ran the following command to generate the gmhmm.mod file for my model:

gmes_petap.pl --ES  --cores 4 --sequence test_genome.fasta

updated 5.0 years ago by Ram 44k • written 7.8 years ago by S AR ▴ 80

Hi angelshiza,

I recently ran into the same error ("error, file not found info/training.fna"). Have you found a way to fix it? Any help is appreciated. Thanks in advance.

7.5 years ago by gauravdube007 ▴ 20

I had the same error. There are two solutions: change the perl path in all the .pl files by hand, or use the script change_path_in_perl_scripts.pl.

I did it as follows, from ~/bin/gm_et_linux_64/gmes_petap:

./change_path_in_perl_scripts.pl /home/victor/bin/perl

5.8 years ago by victorcana1991 ▴ 10
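Before rewriting the interpreter path, it may help to confirm that a stale perl path really is the problem. The sketch below is only a diagnostic under that assumption: it lists each .pl file in a directory whose first line (the `#!` line) names an interpreter that is missing or not executable. The function name `list_broken_shebangs` is made up for this example and is not part of GeneMark-ES.

```shell
# Hypothetical helper, not shipped with GeneMark-ES.
# Prints "<file><TAB><interpreter>" for every *.pl file in the given
# directory whose #! line points at an interpreter that is not executable.
list_broken_shebangs() {
    dir=${1:-.}
    for f in "$dir"/*.pl; do
        # Skip the unexpanded glob when the directory has no .pl files
        [ -f "$f" ] || continue
        # Take the first word after "#!" on line 1; empty if there is no #! line
        interp=$(sed -n '1s/^#![[:space:]]*\([^[:space:]]*\).*/\1/p' "$f")
        [ -n "$interp" ] || continue
        # Report only interpreters that cannot be executed
        [ -x "$interp" ] || printf '%s\t%s\n' "$f" "$interp"
    done
}
```

Running `list_broken_shebangs ~/bin/gm_et_linux_64/gmes_petap` (the directory from the comment above) should print nothing once every script points at a working perl. If it prints nothing and the error persists, the cause is probably something else.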

Hi, I was wondering if you found a solution to the above problem? I am now facing exactly the same problem and have no idea how to fix it.

Thanks

updated 5.0 years ago by Ram 44k • written 10.1 years ago by tuanduonganh • 0

Ram · Answer 1 · 2015-12-21

Hopefully you have this figured out by now, but the mod file that MAKER requires is in GeneMark's 'output' directory; by default it is called gmhmm.mod. With the most recent version, GeneMark-ES 4.32, you would run something like the following command:

gmes_petap.pl --ES  --cores 4 --sequence test_genome.fasta

The file for MAKER is then in the output folder: output/gmhmm.mod.
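As a minimal sketch of the step after training, the helper below checks that output/gmhmm.mod exists and is non-empty under a run directory, then prints its absolute path. The function name `find_gmhmm_mod` is made up for this example. The `gmhmm=` entry in MAKER's maker_opts.ctl is where I would expect this path to go, but check the control file generated by your MAKER version.

```shell
# Hypothetical helper, not part of GeneMark-ES or MAKER.
# Assumes the layout described above: <run_dir>/output/gmhmm.mod
find_gmhmm_mod() {
    run_dir=${1:-.}
    mod="$run_dir/output/gmhmm.mod"
    if [ -s "$mod" ]; then
        # Print an absolute path so it stays valid from any working directory
        printf '%s/gmhmm.mod\n' "$(cd "$run_dir/output" && pwd)"
    else
        echo "gmhmm.mod not found or empty under $run_dir/output" >&2
        return 1
    fi
}
```

For example, after running gmes_petap.pl in the current directory, `find_gmhmm_mod .` prints the path to paste after `gmhmm=` in maker_opts.ctl. A non-zero exit status means training did not produce a usable model file, so look at the GeneMark log first.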