GeneMark-ES GTF output conversion to mod
10.9 years ago
rob234king ▴ 610

I'm running Maker2 to annotate a genome, but I need to train GeneMark-ES first. I ran eukaryotic GeneMark.hmm using the Perl script; it finished and produced the GTF file, but Maker2 requires a mod file and the mod folder is empty.

 /home/apps/scripts/gm_es.pl ../RR_1.7b.fasta

How do I convert the GTF output file to the mod format, or have I missed a step or overlooked an error? The documentation doesn't seem to explain this.

Some errors are reported during the run. The first part of the output has no errors (see below):

    running hmm2nt.a2
4 files IN
Clusters were defined as:
 0 <= GC% <= 99
99 < GC% <= 99
99 < GC% <= 100

Parsing dna.fa.good.cod

Program complete
----------------
6442 sequences found
6443 dna.fa.good.ini
first order for Ini
GC Range: (0,99)
6442 sequences of length 12 used from 6442
total sequences in dna.fa.good.ini
Generating model...
TT     0.25 0.35 0.16 0.19 0.27 0.15 0.00 0.00 0.00 0.00 0.28 0.27
TC     0.25 0.35 0.55 0.23 0.36 0.58 0.00 0.00 0.00 0.00 0.53 0.32
TA     0.25 0.16 0.15 0.29 0.20 0.14 1.00 0.00 0.00 0.00 0.10 0.13
TG     0.25 0.14 0.14 0.30 0.17 0.13 0.00 0.00 1.00 0.00 0.08 0.28
CT     0.25 0.30 0.20 0.11 0.28 0.14 0.00 0.00 0.00 0.00 0.31 0.35
CC     0.25 0.30 0.35 0.10 0.26 0.40 0.00 0.00 0.00 0.00 0.30 0.25
CA     0.25 0.26 0.32 0.62 0.27 0.33 1.00 0.00 0.00 0.00 0.23 0.21
CG     0.25 0.14 0.13 0.17 0.19 0.13 0.00 0.00 0.00 0.00 0.15 0.20
AT     0.25 0.27 0.20 0.15 0.23 0.18 0.00 1.00 0.00 0.00 0.24 0.23
AC     0.25 0.27 0.35 0.16 0.32 0.30 0.00 0.00 0.00 0.00 0.27 0.27
AA     0.25 0.29 0.23 0.43 0.36 0.30 1.00 0.00 0.00 0.00 0.31 0.22
AG     0.25 0.18 0.21 0.25 0.10 0.23 0.00 0.00 0.00 0.00 0.19 0.29
GT     0.25 0.24 0.25 0.21 0.19 0.21 0.00 0.00 0.00 0.18 0.15 0.30
GC     0.25 0.33 0.34 0.23 0.35 0.34 0.00 0.00 0.00 0.20 0.41 0.35
GA     0.25 0.28 0.25 0.34 0.30 0.28 1.00 0.00 0.00 0.26 0.29 0.21
GG     0.25 0.16 0.16 0.22 0.16 0.18 0.00 0.00 0.00 0.36 0.15 0.15
Done
6443 lines read from dna.fa.good.ter
6442 sequences obtained
1 comment lines
0 lines contained no sequence (or improperly formatted seq)
2445 sequences used TAA
1802 sequences used TAG
2195 sequences used TGA
0 sequences did not begin with a stop codon
All lines accounted for
Done
2445 dna.fa.good.taa
first order for TAA
GC Range: (0,99)
2445 sequences of length 12 used from 2445
total sequences in dna.fa.good.taa
Generating model...
TT     1.00 0.00 0.00 0.00 0.32 0.30 0.28 0.30 0.32 0.29 0.30 0.28
TC     0.00 0.00 0.00 0.00 0.16 0.19 0.15 0.20 0.21 0.21 0.21 0.18
TA     0.00 1.00 0.00 0.00 0.26 0.26 0.25 0.25 0.21 0.24 0.26 0.26
TG     0.00 0.00 0.00 0.00 0.25 0.25 0.32 0.25 0.26 0.26 0.23 0.28
CT     1.00 0.00 0.00 0.00 0.26 0.26 0.27 0.28 0.33 0.30 0.26 0.27
CC     0.00 0.00 0.00 0.00 0.17 0.15 0.20 0.20 0.19 0.20 0.18 0.20
CA     0.00 0.00 0.00 0.00 0.37 0.34 0.35 0.38 0.30 0.33 0.37 0.39
CG     0.00 0.00 0.00 0.00 0.20 0.25 0.19 0.14 0.19 0.17 0.18 0.14
AT     1.00 0.00 0.00 0.20 0.26 0.31 0.26 0.28 0.29 0.31 0.29 0.33
AC     0.00 0.00 0.00 0.15 0.22 0.15 0.18 0.22 0.22 0.23 0.22 0.22
AA     0.00 0.00 1.00 0.29 0.26 0.28 0.30 0.29 0.27 0.24 0.30 0.23
AG     0.00 0.00 0.00 0.36 0.25 0.25 0.25 0.21 0.22 0.23 0.19 0.22
GT     1.00 0.00 0.00 0.00 0.46 0.22 0.22 0.25 0.25 0.23 0.25 0.26
GC     0.00 0.00 0.00 0.00 0.20 0.17 0.19 0.19 0.19 0.21 0.18 0.22
GA     0.00 0.00 0.00 0.00 0.20 0.36 0.34 0.31 0.33 0.34 0.31 0.32
GG     0.00 0.00 0.00 0.00 0.15 0.26 0.25 0.25 0.22 0.22 0.26 0.20
Done
1802 dna.fa.good.tag
zero order for TAG
GC Range: (0,99)
1802 sequences of length 12 used from 1802
total sequences in dna.fa.good.tag
Generating model...
T    1.00 0.00 0.00 0.17 0.33 0.28 0.25 0.28 0.31 0.29 0.29 0.32
C    0.00 0.00 0.00 0.13 0.20 0.15 0.17 0.18 0.18 0.18 0.19 0.17
A    0.00 1.00 0.00 0.48 0.26 0.30 0.30 0.30 0.26 0.29 0.29 0.28
G    0.00 0.00 1.00 0.22 0.21 0.27 0.27 0.24 0.25 0.25 0.24 0.23
Done
2195 dna.fa.good.tga
first order for TGA
GC Range: (0,99)
2195 sequences of length 12 used from 2195
total sequences in dna.fa.good.tga
Generating model...
TT     1.00 0.00 0.00 0.00 0.30 0.32 0.31 0.29 0.30 0.32 0.28 0.31
TC     0.00 0.00 0.00 0.00 0.15 0.14 0.13 0.18 0.14 0.22 0.19 0.16
TA     0.00 0.00 0.00 0.00 0.24 0.23 0.26 0.22 0.25 0.21 0.26 0.28
TG     0.00 1.00 0.00 0.00 0.31 0.31 0.30 0.31 0.30 0.25 0.28 0.25
CT     1.00 0.00 0.00 0.00 0.29 0.29 0.29 0.30 0.28 0.28 0.28 0.28
CC     0.00 0.00 0.00 0.00 0.22 0.14 0.19 0.20 0.20 0.18 0.15 0.15
CA     0.00 0.00 0.00 0.00 0.32 0.37 0.34 0.36 0.31 0.37 0.35 0.39
CG     0.00 0.00 0.00 0.00 0.17 0.20 0.18 0.14 0.21 0.18 0.21 0.18
AT     1.00 0.00 0.00 0.31 0.20 0.27 0.30 0.29 0.28 0.30 0.25 0.31
AC     0.00 0.00 0.00 0.12 0.22 0.21 0.18 0.22 0.24 0.21 0.19 0.21
AA     0.00 0.00 0.00 0.24 0.27 0.26 0.29 0.25 0.26 0.24 0.28 0.27
AG     0.00 0.00 0.00 0.33 0.31 0.27 0.23 0.24 0.23 0.25 0.28 0.22
GT     1.00 0.00 0.00 0.00 0.28 0.23 0.21 0.24 0.25 0.26 0.25 0.27
GC     0.00 0.00 0.00 0.00 0.19 0.18 0.15 0.20 0.18 0.18 0.19 0.19
GA     0.00 0.00 1.00 0.00 0.26 0.35 0.36 0.30 0.33 0.34 0.32 0.34

But further on I get "Error: unknown line format":

3294 dna.fa.good.gb.acc.ph2
first order for ACC 2
Error: unknown line format
       GC%      Intron  Accession        (File generated at 2014/02/15 Sat 13:38:23 GMT)

GC Range: (0,99)
3293 sequences of length 21 used from 3294
total sequences in dna.fa.good.gb.acc.ph2
Generating model...

Can anyone help me with this error:

error, file not found info/training.fna

I ran the following command to generate the gmhmm.mod file for my model:

gmes_petap.pl --ES  --cores 4 --sequence test_genome.fasta

Hi angelshiza,

I recently ran into the same error ("error, file not found info/training.fna"). Have you found a solution? Any help is appreciated. Thanks in advance.


I also had the same error. There are two solutions: change the perl path in all the .pl files by hand, or run the script change_path_in_perl_scripts.pl.

I did it in the following way:

victorc:~/bin/gm_et_linux_64/gmes_petap $ ./change_path_in_perl_scripts.pl /home/victor/bin/perl
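Before rewriting anything, it can help to see which scripts actually point at an interpreter that is not there. This is only a minimal sketch, assuming the failure really does come from the `#!` line as described in this reply; `check_shebangs` is a hypothetical helper name and not part of GeneMark.

```shell
# Hypothetical helper (not shipped with GeneMark): list every .pl script in a
# directory whose shebang names an interpreter that is missing or not executable.
check_shebangs() {
    dir="${1:-.}"
    for f in "$dir"/*.pl; do
        [ -f "$f" ] || continue
        first=$(head -n 1 "$f")
        # Skip files that do not start with a shebang line.
        case "$first" in
            '#!'*) ;;
            *) continue ;;
        esac
        interp=${first#'#!'}   # drop the leading "#!"
        interp=${interp# }     # drop one optional space after it
        interp=${interp%% *}   # drop interpreter arguments such as "-w"
        # Print "script<TAB>interpreter" only when the interpreter is unusable.
        [ -x "$interp" ] || printf '%s\t%s\n' "$f" "$interp"
    done
}
```

For example, `check_shebangs ~/bin/gm_et_linux_64/gmes_petap` prints nothing if every script points at a usable interpreter. Scripts that use `#!/usr/bin/env perl` are checked against `/usr/bin/env` only, so this does not confirm that perl itself or its modules are installed.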


Hi, I was wondering if you found a solution to the above problem? I am now facing exactly the same problem and have no idea how to fix it.

Thanks

9.0 years ago
Jon ▴ 360

Hopefully you have figured this out by now, but the mod file required by Maker is in the GeneMark 'output' directory; by default it is called gmhmm.mod. With the most recent version, GeneMark-ES 4.32, you would run something like the following command:

gmes_petap.pl --ES  --cores 4 --sequence test_genome.fasta

The file for maker would be in the output folder, so output/gmhmm.mod.
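To double-check the result before handing it to Maker, something like the sketch below may help. It assumes the default output/gmhmm.mod location mentioned above, and that the model is passed to Maker through the `gmhmm=` option in maker_opts.ctl; `gmhmm_line` is a hypothetical helper name, not part of either tool.

```shell
# Hypothetical helper: confirm the trained model exists and is non-empty,
# then print the line to put in maker_opts.ctl.
gmhmm_line() {
    mod_file="${1:-output/gmhmm.mod}"
    if [ ! -s "$mod_file" ]; then
        echo "missing or empty model file: $mod_file" >&2
        return 1
    fi
    # Resolve the containing directory to an absolute path so the printed
    # line does not depend on where Maker is later started.
    mod_dir=$(cd "$(dirname "$mod_file")" && pwd) || return 1
    printf 'gmhmm=%s/%s\n' "$mod_dir" "$(basename "$mod_file")"
}
```

Run from the directory where gmes_petap.pl was started, `gmhmm_line` with no argument checks output/gmhmm.mod. A non-zero exit status means the training run did not leave a usable model there.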
