GeneMark-E GFF3 Incorrect Format (Conversion Tool Available?)
10.9 years ago
jomaco ▴ 200

Hi,

I have used GeneMark-ES in the past, and it produced a correctly formatted GTF file. However, I now want to run only GeneMark-E, using the training file I previously produced with GeneMark-ES.

The trouble is that GeneMark-E does not produce correct GFF3 or GTF output: the first column does not give the contig number. Instead, all the sequence data appears to have been treated as a single sequence of 2122129 bp, so the last gene, for instance, starts at position 2121656. That is not much use for graphical display (e.g. GenomeView).

Is there a way to resolve this within GeneMark, or is there an outside tool or other reasonable way to recreate the GFF file correctly?

Here is the command I used and the output:

bsub "./gmhmme3 -m rum_parent_k71.mod -o bac_assembly_no_70.masked.7.gff3 -p -f gff bac_assembly_no_70.fa.masked"

##gff-version 3
# Eukariotyc GeneMark.hmm version bp 3.9d April 16, 2009
# Sequence name: bac_assembly_no_70.fa.masked
# Sequence length: 2122129 bp
# G+C content: 33.81%
# Matrices file:rum_parent_k71.mod
# Thu Jan 23 15:37:09 2014
# FASTA definition line: >1
##sequence-region seq 1 2122129
seq     GeneMark.hmm3   gene    1270    11847   .       +       .       ID=gene00001;Name=1
seq     GeneMark.hmm3   mRNA    1270    11847   .       +       .       ID=mRNA00001;Parent=gene00001;Name=1
seq     GeneMark.hmm3   CDS     1270    1320    .       +       0       ID=cds00001;Parent=mRNA00001;Name=1
seq     GeneMark.hmm3   CDS     3449    3559    .       +       0       ID=cds00001;Parent=mRNA00001;Name=1
seq     GeneMark.hmm3   CDS     3986    4174    .       +       0       ID=cds00001;Parent=mRNA00001;Name=1
seq     GeneMark.hmm3   CDS     4793    4924    .       +       0       ID=cds00001;Parent=mRNA00001;Name=1
seq     GeneMark.hmm3   CDS     9577    9680    .       +       0       ID=cds00001;Parent=mRNA00001;Name=1
seq     GeneMark.hmm3   CDS     9890    10260   .       +       1       ID=cds00001;Parent=mRNA00001;Name=1
seq     GeneMark.hmm3   CDS     10839   11146   .       +       2       ID=cds00001;Parent=mRNA00001;Name=1
seq     GeneMark.hmm3   CDS     11530   11847   .       +       0       ID=cds00001;Parent=mRNA00001;Name=1
seq     GeneMark.hmm3   gene    12603   13670   .       +       .       ID=gene00002;Name=2
seq     GeneMark.hmm3   mRNA    12603   13670   .       +       .       ID=mRNA00002;Parent=gene00002;Name=2
seq     GeneMark.hmm3   CDS     12603   12639   .       +       0       ID=cds00001;Parent=mRNA00002;Name=2
seq     GeneMark.hmm3   CDS     13261   13670   .       +       2       ID=cds00001;Parent=mRNA00002;Name=2
seq     GeneMark.hmm3   gene    15040   18590   .       -       .       ID=gene00003;Name=3
seq     GeneMark.hmm3   mRNA    15040   18590   .       -       .       ID=mRNA00003;Parent=gene00003;Name=3
seq     GeneMark.hmm3   CDS     15040   15476   .       -       0       ID=cds00001;Parent=mRNA00003;Name=3
seq     GeneMark.hmm3   CDS     17352   17582   .       -       1       ID=cds00001;Parent=mRNA00003;Name=3
seq     GeneMark.hmm3   CDS     18524   18590   .       -       1       ID=cds00001;Parent=mRNA00003;Name=3
seq     GeneMark.hmm3   gene    20830   30008   .       -       .       ID=gene00004;Name=4
seq     GeneMark.hmm3   mRNA    20830   30008   .       -       .       ID=mRNA00004;Parent=gene00004;Name=4
seq     GeneMark.hmm3   CDS     20830   20980   .       -       0       ID=cds00001;Parent=mRNA00004;Name=4
seq     GeneMark.hmm3   CDS     21844   21914   .       -       2       ID=cds00001;Parent=mRNA00004;Name=4
seq     GeneMark.hmm3   CDS     21952   22019   .       -       0       ID=cds00001;Parent=mRNA00004;Name=4
seq     GeneMark.hmm3   CDS     22407   22651   .       -       1       ID=cds00001;Parent=mRNA00004;Name=4
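
One possible workaround is to remap the coordinates afterwards. The sketch below is not a GeneMark feature; it assumes (unverified) that gmhmme3 joined the contigs of the multi-FASTA input end to end, in file order and with nothing inserted between them. If that holds, each feature can be assigned to a contig by subtracting the cumulative length of the contigs before it. The function names `contig_offsets`, `remap_gff_line` and `remap_gff` are made up for this illustration.

```python
import bisect


def contig_offsets(fasta_handle):
    """Read a multi-FASTA and return (names, starts, lengths).

    starts[i] is the 0-based offset of contig i in the ASSUMED
    end-to-end concatenation of all contigs, in file order.
    """
    names, lengths = [], []
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            names.append(line[1:].split()[0])
            lengths.append(0)
        elif line and names:
            lengths[-1] += len(line)
    starts, total = [], 0
    for length in lengths:
        starts.append(total)
        total += length
    return names, starts, lengths


def remap_gff_line(line, names, starts, lengths):
    """Rewrite one GFF line so columns 1, 4 and 5 refer to a contig."""
    if line.startswith("#") or not line.strip():
        return line.rstrip("\n")  # comments and blank lines pass through
    cols = line.split(None, 8)  # tolerate tabs or runs of spaces
    if len(cols) != 9:
        return line.rstrip("\n")
    cols[8] = cols[8].rstrip("\n")
    start, end = int(cols[3]), int(cols[4])
    # Find the last contig whose offset is <= the 0-based start position.
    i = bisect.bisect_right(starts, start - 1) - 1
    offset = starts[i]
    if end - offset > lengths[i]:
        # A feature crossing a contig boundary cannot be remapped safely.
        raise ValueError(f"feature {start}-{end} spans a contig boundary")
    cols[0] = names[i]
    cols[3] = str(start - offset)
    cols[4] = str(end - offset)
    return "\t".join(cols)


def remap_gff(fasta_handle, gff_handle):
    """Yield remapped GFF lines for a whole file."""
    names, starts, lengths = contig_offsets(fasta_handle)
    for line in gff_handle:
        yield remap_gff_line(line, names, starts, lengths)
```

To use it, open bac_assembly_no_70.fa.masked and bac_assembly_no_70.masked.7.gff3, pass both handles to `remap_gff`, and write the yielded lines to a new file. Before trusting the result, check that the summed contig lengths equal the 2122129 bp reported in the "Sequence length" header; if they do not, the concatenation assumption is wrong. Comment lines, including `##sequence-region`, are passed through unchanged and would still need editing by hand. A gene predicted across a contig boundary raises an error rather than being silently split.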