I'm running Maker2 to annotate a genome, but I need to train GeneMark-ES first. I ran eukaryotic GeneMark.hmm with the Perl script below. It finished and produced the GTF file, but Maker2 requires a mod file and the mod folder is empty.
/home/apps/scripts/gm_es.pl ../RR_1.7b.fasta
How do I convert the GTF output file to the mod format, or have I missed a step or overlooked an error? The documentation doesn't seem to explain this.
Some errors are reported during the run. The first part of the output has no errors (see below):
running hmm2nt.a2
4 files IN
Clusters were defined as:
0 <= GC% <= 99
99 < GC% <= 99
99 < GC% <= 100
Parsing dna.fa.good.cod
Program complete
----------------
6442 sequences found
6443 dna.fa.good.ini
first order for Ini
GC Range: (0,99)
6442 sequences of length 12 used from 6442
total sequences in dna.fa.good.ini
Generating model...
TT 0.25 0.35 0.16 0.19 0.27 0.15 0.00 0.00 0.00 0.00 0.28 0.27
TC 0.25 0.35 0.55 0.23 0.36 0.58 0.00 0.00 0.00 0.00 0.53 0.32
TA 0.25 0.16 0.15 0.29 0.20 0.14 1.00 0.00 0.00 0.00 0.10 0.13
TG 0.25 0.14 0.14 0.30 0.17 0.13 0.00 0.00 1.00 0.00 0.08 0.28
CT 0.25 0.30 0.20 0.11 0.28 0.14 0.00 0.00 0.00 0.00 0.31 0.35
CC 0.25 0.30 0.35 0.10 0.26 0.40 0.00 0.00 0.00 0.00 0.30 0.25
CA 0.25 0.26 0.32 0.62 0.27 0.33 1.00 0.00 0.00 0.00 0.23 0.21
CG 0.25 0.14 0.13 0.17 0.19 0.13 0.00 0.00 0.00 0.00 0.15 0.20
AT 0.25 0.27 0.20 0.15 0.23 0.18 0.00 1.00 0.00 0.00 0.24 0.23
AC 0.25 0.27 0.35 0.16 0.32 0.30 0.00 0.00 0.00 0.00 0.27 0.27
AA 0.25 0.29 0.23 0.43 0.36 0.30 1.00 0.00 0.00 0.00 0.31 0.22
AG 0.25 0.18 0.21 0.25 0.10 0.23 0.00 0.00 0.00 0.00 0.19 0.29
GT 0.25 0.24 0.25 0.21 0.19 0.21 0.00 0.00 0.00 0.18 0.15 0.30
GC 0.25 0.33 0.34 0.23 0.35 0.34 0.00 0.00 0.00 0.20 0.41 0.35
GA 0.25 0.28 0.25 0.34 0.30 0.28 1.00 0.00 0.00 0.26 0.29 0.21
GG 0.25 0.16 0.16 0.22 0.16 0.18 0.00 0.00 0.00 0.36 0.15 0.15
Done
6443 lines read from dna.fa.good.ter
6442 sequences obtained
1 comment lines
0 lines contained no sequence (or improperly formatted seq)
2445 sequences used TAA
1802 sequences used TAG
2195 sequences used TGA
0 sequences did not begin with a stop codon
All lines accounted for
Done
2445 dna.fa.good.taa
first order for TAA
GC Range: (0,99)
2445 sequences of length 12 used from 2445
total sequences in dna.fa.good.taa
Generating model...
TT 1.00 0.00 0.00 0.00 0.32 0.30 0.28 0.30 0.32 0.29 0.30 0.28
TC 0.00 0.00 0.00 0.00 0.16 0.19 0.15 0.20 0.21 0.21 0.21 0.18
TA 0.00 1.00 0.00 0.00 0.26 0.26 0.25 0.25 0.21 0.24 0.26 0.26
TG 0.00 0.00 0.00 0.00 0.25 0.25 0.32 0.25 0.26 0.26 0.23 0.28
CT 1.00 0.00 0.00 0.00 0.26 0.26 0.27 0.28 0.33 0.30 0.26 0.27
CC 0.00 0.00 0.00 0.00 0.17 0.15 0.20 0.20 0.19 0.20 0.18 0.20
CA 0.00 0.00 0.00 0.00 0.37 0.34 0.35 0.38 0.30 0.33 0.37 0.39
CG 0.00 0.00 0.00 0.00 0.20 0.25 0.19 0.14 0.19 0.17 0.18 0.14
AT 1.00 0.00 0.00 0.20 0.26 0.31 0.26 0.28 0.29 0.31 0.29 0.33
AC 0.00 0.00 0.00 0.15 0.22 0.15 0.18 0.22 0.22 0.23 0.22 0.22
AA 0.00 0.00 1.00 0.29 0.26 0.28 0.30 0.29 0.27 0.24 0.30 0.23
AG 0.00 0.00 0.00 0.36 0.25 0.25 0.25 0.21 0.22 0.23 0.19 0.22
GT 1.00 0.00 0.00 0.00 0.46 0.22 0.22 0.25 0.25 0.23 0.25 0.26
GC 0.00 0.00 0.00 0.00 0.20 0.17 0.19 0.19 0.19 0.21 0.18 0.22
GA 0.00 0.00 0.00 0.00 0.20 0.36 0.34 0.31 0.33 0.34 0.31 0.32
GG 0.00 0.00 0.00 0.00 0.15 0.26 0.25 0.25 0.22 0.22 0.26 0.20
Done
1802 dna.fa.good.tag
zero order for TAG
GC Range: (0,99)
1802 sequences of length 12 used from 1802
total sequences in dna.fa.good.tag
Generating model...
T 1.00 0.00 0.00 0.17 0.33 0.28 0.25 0.28 0.31 0.29 0.29 0.32
C 0.00 0.00 0.00 0.13 0.20 0.15 0.17 0.18 0.18 0.18 0.19 0.17
A 0.00 1.00 0.00 0.48 0.26 0.30 0.30 0.30 0.26 0.29 0.29 0.28
G 0.00 0.00 1.00 0.22 0.21 0.27 0.27 0.24 0.25 0.25 0.24 0.23
Done
2195 dna.fa.good.tga
first order for TGA
GC Range: (0,99)
2195 sequences of length 12 used from 2195
total sequences in dna.fa.good.tga
Generating model...
TT 1.00 0.00 0.00 0.00 0.30 0.32 0.31 0.29 0.30 0.32 0.28 0.31
TC 0.00 0.00 0.00 0.00 0.15 0.14 0.13 0.18 0.14 0.22 0.19 0.16
TA 0.00 0.00 0.00 0.00 0.24 0.23 0.26 0.22 0.25 0.21 0.26 0.28
TG 0.00 1.00 0.00 0.00 0.31 0.31 0.30 0.31 0.30 0.25 0.28 0.25
CT 1.00 0.00 0.00 0.00 0.29 0.29 0.29 0.30 0.28 0.28 0.28 0.28
CC 0.00 0.00 0.00 0.00 0.22 0.14 0.19 0.20 0.20 0.18 0.15 0.15
CA 0.00 0.00 0.00 0.00 0.32 0.37 0.34 0.36 0.31 0.37 0.35 0.39
CG 0.00 0.00 0.00 0.00 0.17 0.20 0.18 0.14 0.21 0.18 0.21 0.18
AT 1.00 0.00 0.00 0.31 0.20 0.27 0.30 0.29 0.28 0.30 0.25 0.31
AC 0.00 0.00 0.00 0.12 0.22 0.21 0.18 0.22 0.24 0.21 0.19 0.21
AA 0.00 0.00 0.00 0.24 0.27 0.26 0.29 0.25 0.26 0.24 0.28 0.27
AG 0.00 0.00 0.00 0.33 0.31 0.27 0.23 0.24 0.23 0.25 0.28 0.22
GT 1.00 0.00 0.00 0.00 0.28 0.23 0.21 0.24 0.25 0.26 0.25 0.27
GC 0.00 0.00 0.00 0.00 0.19 0.18 0.15 0.20 0.18 0.18 0.19 0.19
GA 0.00 0.00 1.00 0.00 0.26 0.35 0.36 0.30 0.33 0.34 0.32 0.34
But further on I get "Error: unknown line format":
3294 dna.fa.good.gb.acc.ph2
first order for ACC 2
Error: unknown line format
GC% Intron Accession (File generated at 2014/02/15 Sat 13:38:23 GMT)
GC Range: (0,99)
3293 sequences of length 21 used from 3294
total sequences in dna.fa.good.gb.acc.ph2
Generating model...
Can anyone help me with this error:
I ran the following command to generate the gmhmm.mode file for my desired model
Hi angelshiza,
I recently ran into the same error ("error, file not found info/training.fna"). Have you found a solution? Any help is appreciated. Thanks in advance.
I also had the same error. There are two solutions: either change the perl path in all of the .pl files by hand, or run the change_path_in_perl_scripts.pl script to do it for you.
I did it in the following way:
victorc:~/bin/gm_et_linux_64/gmes_petap $ ./change_path_in_perl_scripts.pl /home/victor/bin/perl
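For anyone who wants to see what the first option (editing every .pl file by hand) amounts to, here is a rough Python sketch. It assumes that "changing the perl path" means rewriting the `#!` interpreter line at the top of each script. It is not the code of change_path_in_perl_scripts.pl, and the function name `set_perl_shebang` is made up for this illustration. Back up the folder before trying anything like this.

```python
from pathlib import Path


def set_perl_shebang(script_dir, perl_path):
    """Point the '#!' line of every .pl file in script_dir at perl_path.

    Hypothetical helper for illustration only; returns the names of the
    files that were actually rewritten.
    """
    changed = []
    for script in sorted(Path(script_dir).glob("*.pl")):
        data = script.read_bytes()
        first, sep, rest = data.partition(b"\n")
        # Only touch files whose first line is a perl interpreter line.
        if not first.startswith(b"#!") or b"perl" not in first:
            continue
        tokens = first[2:].decode("latin-1").split()
        # Keep interpreter switches such as "-w"; drop the old path
        # (and the "perl" argument of a "/usr/bin/env perl" line).
        flags = [t for t in tokens[1:] if t.startswith("-")]
        new_first = ("#!" + " ".join([perl_path] + flags)).encode("latin-1")
        if new_first != first:
            script.write_bytes(new_first + sep + rest)
            changed.append(script.name)
    return changed
```

With the paths from my example, the call would be set_perl_shebang("/home/victor/bin/gm_et_linux_64/gmes_petap", "/home/victor/bin/perl"). Only files directly inside that folder are touched, and anything without a perl `#!` line is left alone.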
Hi, I was wondering if you found a solution to the above problem? I am now facing exactly the same problem and have no idea how to fix it.
Thanks