Hi,
I have used GeneMark-ES in the past, and it produced a correctly formatted GTF file. However, I now want to run only GeneMark-E, using the training (model) file I previously produced with GeneMark-ES.
The trouble is that GeneMark-E does not produce correct GFF3 or GTF output: the first column does not give the contig number. Instead, it looks as though all the sequences have been joined into one sequence of 2122129 bp. For instance, the last gene starts at position 2121656, which is not particularly useful for graphical display (e.g. in GenomeView).
Is there a way to fix this within GeneMark, or is there an outside tool or other reasonable way to rebuild the GFF file with the correct contig names and coordinates?
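Here is a minimal Python sketch of one possible workaround. It assumes (this is not verified) that gmhmme3 simply concatenated the contigs in file order with no spacer characters. If contig i has length L_i and O_i = L_1 + ... + L_i, then a concatenated position p with O_(i-1) < p <= O_i maps to position p - O_(i-1) on contig i. The script reads contig names and lengths from the same FASTA file, then rewrites columns 1, 4 and 5. It drops the ##sequence-region line, which describes the joined sequence, and skips any feature that crosses a contig junction, printing a warning. The script name is hypothetical. Before trusting the output, check that the contig lengths add up to the 2122129 bp reported in the header.

```python
import bisect
import sys


def read_fasta_lengths(lines):
    """Return [(contig_id, length), ...] in file order from FASTA lines."""
    contigs, name, length = [], None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                contigs.append((name, length))
            # Use the first word of the header as the contig ID (e.g. ">1" -> "1").
            name, length = line[1:].split()[0], 0
        elif line:
            length += len(line)
    if name is not None:
        contigs.append((name, length))
    return contigs


def remap_gff(gff_lines, contigs):
    """Yield GFF lines with concatenated coordinates converted to per-contig ones.

    ASSUMPTION: contigs were concatenated in file order with no spacer characters.
    """
    # starts[i] is the 1-based concatenated coordinate where contig i begins.
    starts, pos = [], 1
    for _, length in contigs:
        starts.append(pos)
        pos += length
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##sequence-region"):
            continue  # describes the single concatenated sequence, so drop it
        if not line or line.startswith("#"):
            yield line
            continue
        # GFF is tab-separated; fall back to whitespace if tabs were lost in copying.
        cols = line.split("\t") if "\t" in line else line.split(None, 8)
        start, end = int(cols[3]), int(cols[4])
        i = bisect.bisect_right(starts, start) - 1
        j = bisect.bisect_right(starts, end) - 1
        if i != j:
            # A feature crossing a contig junction cannot be placed on one contig.
            print(f"WARNING: skipping {cols[2]} {start}-{end}: spans contigs "
                  f"{contigs[i][0]} and {contigs[j][0]}", file=sys.stderr)
            continue
        offset = starts[i] - 1
        cols[0] = contigs[i][0]
        cols[3], cols[4] = str(start - offset), str(end - offset)
        yield "\t".join(cols)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python remap_gff.py assembly.fa genemark.gff3 > out.gff3
    with open(sys.argv[1]) as fa:
        contigs = read_fasta_lengths(fa)
    total = sum(length for _, length in contigs)
    # Compare this with the "Sequence length" line in the GeneMark header.
    print(f"{len(contigs)} contigs, {total} bp in total", file=sys.stderr)
    with open(sys.argv[2]) as gff:
        for out in remap_gff(gff, contigs):
            print(out)
```

Usage would be something like: python remap_gff.py bac_assembly_no_70.fa.masked bac_assembly_no_70.masked.7.gff3 > remapped.gff3. A skipped gene can leave orphaned mRNA or CDS lines behind, so any warnings are worth checking by hand.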
Here is the command I used and the output:
bsub "./gmhmme3 -m rum_parent_k71.mod -o bac_assembly_no_70.masked.7.gff3 -p -f gff bac_assembly_no_70.fa.masked"
##gff-version 3
# Eukariotyc GeneMark.hmm version bp 3.9d April 16, 2009
# Sequence name: bac_assembly_no_70.fa.masked
# Sequence length: 2122129 bp
# G+C content: 33.81%
# Matrices file:rum_parent_k71.mod
# Thu Jan 23 15:37:09 2014
# FASTA definition line: >1
##sequence-region seq 1 2122129
seq GeneMark.hmm3 gene 1270 11847 . + . ID=gene00001;Name=1
seq GeneMark.hmm3 mRNA 1270 11847 . + . ID=mRNA00001;Parent=gene00001;Name=1
seq GeneMark.hmm3 CDS 1270 1320 . + 0 ID=cds00001;Parent=mRNA00001;Name=1
seq GeneMark.hmm3 CDS 3449 3559 . + 0 ID=cds00001;Parent=mRNA00001;Name=1
seq GeneMark.hmm3 CDS 3986 4174 . + 0 ID=cds00001;Parent=mRNA00001;Name=1
seq GeneMark.hmm3 CDS 4793 4924 . + 0 ID=cds00001;Parent=mRNA00001;Name=1
seq GeneMark.hmm3 CDS 9577 9680 . + 0 ID=cds00001;Parent=mRNA00001;Name=1
seq GeneMark.hmm3 CDS 9890 10260 . + 1 ID=cds00001;Parent=mRNA00001;Name=1
seq GeneMark.hmm3 CDS 10839 11146 . + 2 ID=cds00001;Parent=mRNA00001;Name=1
seq GeneMark.hmm3 CDS 11530 11847 . + 0 ID=cds00001;Parent=mRNA00001;Name=1
seq GeneMark.hmm3 gene 12603 13670 . + . ID=gene00002;Name=2
seq GeneMark.hmm3 mRNA 12603 13670 . + . ID=mRNA00002;Parent=gene00002;Name=2
seq GeneMark.hmm3 CDS 12603 12639 . + 0 ID=cds00001;Parent=mRNA00002;Name=2
seq GeneMark.hmm3 CDS 13261 13670 . + 2 ID=cds00001;Parent=mRNA00002;Name=2
seq GeneMark.hmm3 gene 15040 18590 . - . ID=gene00003;Name=3
seq GeneMark.hmm3 mRNA 15040 18590 . - . ID=mRNA00003;Parent=gene00003;Name=3
seq GeneMark.hmm3 CDS 15040 15476 . - 0 ID=cds00001;Parent=mRNA00003;Name=3
seq GeneMark.hmm3 CDS 17352 17582 . - 1 ID=cds00001;Parent=mRNA00003;Name=3
seq GeneMark.hmm3 CDS 18524 18590 . - 1 ID=cds00001;Parent=mRNA00003;Name=3
seq GeneMark.hmm3 gene 20830 30008 . - . ID=gene00004;Name=4
seq GeneMark.hmm3 mRNA 20830 30008 . - . ID=mRNA00004;Parent=gene00004;Name=4
seq GeneMark.hmm3 CDS 20830 20980 . - 0 ID=cds00001;Parent=mRNA00004;Name=4
seq GeneMark.hmm3 CDS 21844 21914 . - 2 ID=cds00001;Parent=mRNA00004;Name=4
seq GeneMark.hmm3 CDS 21952 22019 . - 0 ID=cds00001;Parent=mRNA00004;Name=4
seq GeneMark.hmm3 CDS 22407 22651 . - 1 ID=cds00001;Parent=mRNA00004;Name=4
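A more conservative route avoids guessing how the sequences were joined: split the multi-FASTA into one file per contig, run gmhmme3 on each file with the same model and flags as the command above, and then replace the placeholder "seq" in column 1 with the contig ID. This sketch assumes gmhmme3 reports correct coordinates when it is given a single sequence. The script name, the output file names and the split/relabel modes are all hypothetical.

```python
import re
import sys


def split_fasta(lines):
    """Group a multi-FASTA into {contig_id: [lines]} (header included), in file order."""
    records, current = {}, None
    for line in lines:
        if not line.endswith("\n"):
            line += "\n"
        if line.startswith(">"):
            # First word of the header is used as the contig ID.
            current = line[1:].split()[0]
            records[current] = []
        if current is not None:
            records[current].append(line)
    return records


def safe_name(contig_id):
    """Make a contig ID safe to use in a file name."""
    return re.sub(r"[^A-Za-z0-9_.-]", "_", contig_id)


def relabel_gff(gff_lines, contig_id):
    """Replace the placeholder seqid ('seq') written by gmhmme3 with contig_id."""
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##sequence-region"):
            parts = line.split()
            if len(parts) >= 2:
                parts[1] = contig_id
            yield " ".join(parts)
        elif line and not line.startswith("#"):
            # GFF is tab-separated; fall back to whitespace if tabs were lost.
            cols = line.split("\t") if "\t" in line else line.split(None, 8)
            cols[0] = contig_id
            yield "\t".join(cols)
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) == 4:
    mode, a, b = sys.argv[1:]
    if mode == "split":
        # Hypothetical: python split_contigs.py split <multi.fa> <model.mod>
        with open(a) as fh:
            records = split_fasta(fh)
        for cid, rec in records.items():
            fa = f"contig_{safe_name(cid)}.fa"
            with open(fa, "w") as out:
                out.writelines(rec)
            # Same gmhmme3 flags as the original command, one run per contig.
            print(f"./gmhmme3 -m {b} -o {fa}.gff3 -p -f gff {fa}")
    elif mode == "relabel":
        # Hypothetical: python split_contigs.py relabel <contig_id> <per-contig.gff3>
        with open(b) as fh:
            for out in relabel_gff(fh, a):
                print(out)
```

The split mode only writes the per-contig FASTA files and prints the gmhmme3 commands, which could be wrapped in bsub as before. The relabel mode then fixes one per-contig GFF at a time. When the relabelled files are concatenated, the repeated ##gff-version 3 lines after the first should be removed. IDs such as gene00001 will presumably restart in every file, so they may need a contig prefix to stay unique.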