I want to carry out gene prediction for an isolated strain of the fungus Cochliobolus sativus. As there is no fungal training model available in GlimmerHMM, I am creating one using genomic data from JGI for C. sativus ND90Pr, C. victoriae, C. miyabeanus ATCC 44560 v1.0, C. lunatus, C. heterostrophus, and C. carbonum. When I execute trainGlimmerHMM <multifasta_file> <exon_file>, I get an error for specific lines in my dummy exon file. From what I have observed, the error occurs only for reverse-strand lines. As described in the README file, I have separated them with a blank line and given the coordinates in descending order. The error I get is:
ERROR 27: Wrong exon coordinates file. Exon file line: scaffold_0 exon 3002 2420
Below is the dummy exon file:
scaffold_0 3002 2420
scaffold_0 2422 2420
scaffold_0 3933 3078
scaffold_0 4219 3995
scaffold_0 4304 4267
scaffold_0 4397 4357
scaffold_0 4699 4450
scaffold_0 5213 5115
scaffold_0 5575 5264
scaffold_0 5724 5633
scaffold_0 5812 5778
scaffold_0 5921 5864
scaffold_0 5921 5919
scaffold_0 6144 6190
scaffold_0 6144 6146
scaffold_0 6247 6394
scaffold_0 6452 6598
scaffold_0 6596 6598
scaffold_0 7222 7310
scaffold_0 7222 7224
scaffold_0 7365 7461
scaffold_0 7526 7927
scaffold_0 7925 7927
scaffold_0 8253 9230
scaffold_0 8253 8255
scaffold_0 9228 9230
If I run the 'train' command with only the forward-strand exon coordinates, the training set is created successfully. Can anyone please point out where I am going wrong?
The length of scaffold_0 is 870365 bases.
The error is most probably generated by this file: https://sourceforge.net/u/djinnome/jamg/ci/85b33b51b8ccdd6eadc8f5c7b8155baa119f4af4/tree/3rd_party/GlimmerHMM/train/trainGlimmerHMM
Search for
ERROR 27: Wrong exon coordinates file. Exon file line
I am not very good at Perl, so I can't say much, but look at this line:
my ($anum,$ex1,$ex2)=/^(\S+)\s*([\>|\<]*\d+)\s*([\>|\<]*\d+\s*)$/;
In this line, either $anum, $ex1, or $ex2 has not been set properly. Hope it helps somehow.
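To see which input lines that pattern accepts, here is a minimal sketch that ports the regular expression to Python. The helper name parse_exon_line is hypothetical and not part of GlimmerHMM, and the character class [\>|\<] is written as the equivalent [><|]. It only reproduces the match itself; what trainGlimmerHMM does with the captured values afterwards is not checked here.

```python
import re

# Python port of the pattern quoted from trainGlimmerHMM.
# Assumption: [\>|\<] in the Perl source is equivalent to [><|] here.
EXON_LINE = re.compile(r"^(\S+)\s*([><|]*\d+)\s*([><|]*\d+\s*)$")


def parse_exon_line(line):
    """Return (seq_name, first, second) as strings, or None if no match.

    Hypothetical helper for illustration only; not part of GlimmerHMM.
    """
    m = EXON_LINE.match(line)
    if m is None:
        return None
    # The third group may carry trailing whitespace, so strip it.
    return m.group(1), m.group(2), m.group(3).strip()


if __name__ == "__main__":
    # First line: as shown in the dummy exon file.
    # Second line: as quoted in the ERROR 27 message.
    for line in ("scaffold_0 3002 2420", "scaffold_0 exon 3002 2420"):
        print(repr(line), "->", parse_exon_line(line))
```

The three-column line matches, while the line quoted in the error message, which carries an extra "exon" token, does not. If the error message echoes the input line verbatim, that extra column may be worth checking in the actual exon file.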