How To Build A Custom Model File In GeneMark?
14.5 years ago
Panos ★ 1.8k

I want to use GeneMark (2.7d) to do gene prediction in a soup of sequences. However, I don't want to use the general bacterial/archaeal model; I want to create a custom model file instead. This is easy in Glimmer (just select a bunch of long ORFs, for example), but I cannot find out how to do it with GeneMark. It's probably the 'probuild' program, but I don't know anything beyond that.

Edit: I downloaded GeneMark from GeneMarkS - Linux64. After expanding the zipped file, there is a program called 'probuild' that is used for building custom models, but the documentation is really poor (or "hidden" somewhere I cannot easily find!). The contents of one of the prebuilt model files look like this:

PHMM 2.5
NAME Aeropyrum_pernix
ORDM 2
ATG_ 0.298
GTG_ 0.279
TTG_ 0.423
CTG_ 0
TAA_ 1
TAG_ 1
TGA_ 1
MINC 40
MAXC 12000
MAXN 12000
NDEC 150
CDEC 300
CDCD 0.0
CD1P 1
CD2P 1
COD1 
0.00780 0.00540 0.00895
...Lots of numbers.....
COD2
...Lots of numbers.....
NONC
...Lots of numbers.....
RBSM
0.132    0.167    0.431    0.270
...Some more numbers...
RBSL 34

RBSD
0.016    0.008    0.024    0.032    0.12    0.174    0.128    0.086    0.094    0.08    0.03    0.022    0.012    0.012    0.012    0.006    0.002    0.006    0.01    0.01    0.008    0.012    0.004    0.008    0.008    0.008    0.002    0.012    0.002    0.012    0.004    0.002    0.01    0.01
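
For anyone who wants to inspect a file like this programmatically, here is a minimal Python sketch. It assumes only the layout visible in the excerpt above: a four-character keyword followed either by a value on the same line or by one or more lines of numbers. The function name `parse_model` is hypothetical, and nothing here comes from the GeneMark documentation, so the meaning of each keyword is left uninterpreted.

```python
import re

# A keyword is four characters (letter first), optionally followed by a value.
KEYWORD = re.compile(r"^([A-Z][A-Z0-9_]{3})(?:\s+(.*))?$")


def parse_model(text):
    """Split a model file into single-value fields and numeric sections.

    Assumption: the layout matches the excerpt in this thread; this is
    not derived from any official format description.
    """
    scalars = {}   # e.g. "NAME" -> "Aeropyrum_pernix"
    tables = {}    # e.g. "RBSD" -> [0.016, 0.008, ...]
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = KEYWORD.match(line)
        if match:
            key = match.group(1)
            value = (match.group(2) or "").strip()
            if value:
                scalars[key] = value
                current = None
            else:
                tables[key] = []
                current = key
            continue
        if current is None:
            continue
        try:
            numbers = [float(tok) for tok in line.split()]
        except ValueError:
            continue  # skip lines that are not purely numeric
        tables[current].extend(numbers)
    return scalars, tables
```

One quick sanity check this allows: in the excerpt, the four start-codon values (ATG_, GTG_, TTG_, CTG_) sum to 1.0, which can be confirmed with `sum(float(scalars[k]) for k in ("ATG_", "GTG_", "TTG_", "CTG_"))`.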

Can you post a link to the program and maybe an example of the file you want to generate?

14.5 years ago
User 59 13k

GeneMarkS will do this for you. In fact, this is what GeneMarkS is designed to do: self-train its own ORF prediction on an anonymous genome.

To generate the GeneMark.mat files simply run something like:

gmsn.pl -euk your_genome.fasta

(the -euk switch is great for intronless eukaryotes, but probably not for prokaryotes)

Then try

gm -m GeneMark.mat -R  -lo -op your_genome.fasta

The output options are, of course, up to you. I like an ORF output so that I can easily run genemark2artemis on it.
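
If you want to script the two steps, a rough Python sketch could look like the following. The command names and switches are copied verbatim from the commands above; everything else is an assumption: that `gmsn.pl` and `gm` are on your PATH, that the training step writes `GeneMark.mat` into the working directory, and the helper names `build_commands` and `run_pipeline` are hypothetical.

```python
import subprocess


def build_commands(genome, mode_flag="-euk", model="GeneMark.mat"):
    """Return the training and prediction commands as argument lists.

    The switches are taken from this thread, not from the GeneMark
    documentation; adjust mode_flag and the output options as needed.
    """
    train = ["gmsn.pl", mode_flag, genome]
    predict = ["gm", "-m", model, "-R", "-lo", "-op", genome]
    return train, predict


def run_pipeline(genome, mode_flag="-euk"):
    """Run self-training, then prediction; stop at the first failure."""
    for command in build_commands(genome, mode_flag):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

Calling `build_commands("your_genome.fasta")` on its own lets you print and check the exact command lines before anything is executed.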

Thanks a lot, Daniel! I checked gmsn.pl and there appears to be a --prok option, too. When I use it, I get both the mat and the mod files. What's the difference between them? Is the mod file intended for use with prokaryotes?

The mat is definitely the gene model file. I think it is converted directly from the mod file (an HMM profile, I assume) by mkmat, after the probuild step. Because this is all abstracted away by gmsn, this is just a hunch!
