Dear all,
I have a GFF3 file that looks like this:
# start gene g1
scaf00001 AUGUSTUS gene 6504 8593 . + . ID=g1
scaf00001 AUGUSTUS transcript 6504 8593 . + . ID=g1.t1;Parent=g1;Ontology_term=GO:0055085,GO:0016021
scaf00001 AUGUSTUS intron 6625 6675 . + . Parent=g1.t1,g1
scaf00001 AUGUSTUS intron 6797 6841 . + . Parent=g1.t1,g1
scaf00001 AUGUSTUS intron 6924 6966 . + . Parent=g1.t1,g1
scaf00001 AUGUSTUS intron 7119 7161 . + . Parent=g1.t1,g1
scaf00001 AUGUSTUS intron 7245 7286 . + . Parent=g1.t1,g1
scaf00001 AUGUSTUS intron 7423 7476 . + . Parent=g1.t1,g1
scaf00001 AUGUSTUS intron 7630 7673 . + . Parent=g1.t1,g1
scaf00001 AUGUSTUS intron 7750 7962 . + . Parent=g1.t1,g1
scaf00001 AUGUSTUS intron 8110 8158 . + . Parent=g1.t1,g1
scaf00001 AUGUSTUS intron 8225 8265 . + . Parent=g1.t1,g1
scaf00001 AUGUSTUS intron 8365 8407 . + . Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 6504 6624 . + 0 Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 6676 6796 . + 2 Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 6842 6923 . + 1 Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 6967 7118 . + 0 Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 7162 7244 . + 1 Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 7287 7422 . + 2 Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 7477 7629 . + 1 Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 7674 7749 . + 1 Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 7963 8109 . + 0 Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 8159 8224 . + 0 Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 8266 8364 . + 0 Parent=g1.t1,g1
scaf00001 AUGUSTUS exon 8408 8593 . + 0 Parent=g1.t1,g1
I want to convert this file to the EMBL flat file format.
To do the conversion, I have been running EMBLmyGFF3 as follows:
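For reference, the attribute column of lines like these can be read with a few lines of plain Python. This is only a sketch: the helper names (`parse_attributes`, `parse_gff3_line`) are hypothetical and not part of EMBLmyGFF3 or any library. It makes one detail of the file easy to see: each exon and intron lists two parents, the transcript `g1.t1` and the gene `g1`.

```python
# Minimal GFF3 feature-line parser (a sketch; not part of EMBLmyGFF3).
# Column 9 holds key=value pairs separated by ';', and multi-valued
# attributes such as Parent separate their values with ','.


def parse_attributes(field):
    """Return a dict mapping each attribute key to a list of values."""
    attrs = {}
    for pair in field.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attrs[key] = value.split(",")
    return attrs


def parse_gff3_line(line):
    """Split one feature line into its main columns and parsed attributes."""
    # Real GFF3 is tab-separated; splitting on any whitespace (at most 8
    # times) also handles space-separated copies such as the excerpt above.
    cols = line.split("\t") if "\t" in line else line.split(None, 8)
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": parse_attributes(attrs),
    }


line = "scaf00001 AUGUSTUS exon 6504 6624 . + 0 Parent=g1.t1,g1"
exon = parse_gff3_line(line)
print(exon["type"], exon["start"], exon["end"], exon["attributes"]["Parent"])
# exon 6504 6624 ['g1.t1', 'g1']
```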
EMBLmyGFF3 -i XXX -m "genomic DNA" -p XXX --rg "XXX" -t linear -x "INV" -s "XXX" -r 1 -o all-annotations.embl all-annotations.gff3 genome.fna
The output has two problems.
First, the exons are duplicated in the EMBL output: every exon appears twice. Here is an example:
FT   exon            6504..6624
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            6676..6796
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            6842..6923
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            6967..7118
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            7162..7244
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            7287..7422
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            7477..7629
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            7674..7749
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            7963..8109
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            8159..8224
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            8266..8364
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            8408..8593
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            6504..6624
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            6676..6796
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            6842..6923
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            6967..7118
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            7162..7244
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            7287..7422
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            7477..7629
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            7674..7749
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            7963..8109
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            8159..8224
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            8266..8364
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
FT   exon            8408..8593
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
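The counts suggest a possible explanation: the GFF3 above has 12 exons, each with `Parent=g1.t1,g1`, and the EMBL excerpt has 24 exon entries, exactly two per exon. A plausible but unconfirmed hypothesis is that the converter writes one exon record per parent. The sketch below (helper names are hypothetical) checks that arithmetic and counts repeated exon locations in EMBL text, so the same check can be run on the full output file.

```python
import re
from collections import Counter

# Matches the feature-key line of an exon, e.g. "FT   exon            6504..6624".
# Only the first line of a location is captured, which is enough for the
# single-interval exons in this file.
EXON_RE = re.compile(r"^FT\s+exon\s+(\S+)", re.MULTILINE)


def exon_location_counts(embl_text):
    """Count how many times each exon location appears in EMBL text."""
    return Counter(EXON_RE.findall(embl_text))


def duplicated_locations(embl_text):
    """Return the exon locations that occur more than once."""
    return {loc: n for loc, n in exon_location_counts(embl_text).items() if n > 1}


# Hypothesis check: 12 GFF3 exons, each with Parent=g1.t1,g1 (two parents).
gff_exons = 12
parents_per_exon = len("g1.t1,g1".split(","))
expected_records = gff_exons * parents_per_exon
print(expected_records)  # 24, the number of exon entries in the excerpt

sample = (
    'FT   exon            6504..6624\n'
    'FT                   /locus_tag="LOCUS1"\n'
) * 2
print(duplicated_locations(sample))  # {'6504..6624': 2}
```

On the real output, `duplicated_locations(open("all-annotations.embl").read())` would list every exon location that is written more than once.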
Second, the introns are merged into a single feature with a join() location, which makes no biological sense. Here is an example:
FT   intron          join(6625..6675,6797..6841,6924..6966,7119..7161,
FT                   7245..7286,7423..7476,7630..7673,7750..7962,8110..8158,
FT                   8225..8265,8365..8407)
FT                   /locus_tag="LOCUS1"
FT                   /note="source:AUGUSTUS"
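The joined intron location looks like the way a multi-exon CDS is normally written: all intervals that share a parent are combined into one `join(...)`. Assuming (not confirmed from EMBLmyGFF3's code) that the introns sharing the parent `g1.t1` are grouped the same way, the sketch below reproduces the location above, next to the one-feature-per-intron form one would expect. The function names are hypothetical.

```python
# Intron coordinates from the GFF3 excerpt; all share the parent g1.t1.
introns = [
    (6625, 6675, "g1.t1"), (6797, 6841, "g1.t1"), (6924, 6966, "g1.t1"),
    (7119, 7161, "g1.t1"), (7245, 7286, "g1.t1"), (7423, 7476, "g1.t1"),
    (7630, 7673, "g1.t1"), (7750, 7962, "g1.t1"), (8110, 8158, "g1.t1"),
    (8225, 8265, "g1.t1"), (8365, 8407, "g1.t1"),
]


def embl_location(intervals):
    """Format 1-based inclusive (start, end) intervals as an EMBL location."""
    parts = [f"{start}..{end}" for start, end in sorted(intervals)]
    return parts[0] if len(parts) == 1 else "join(" + ",".join(parts) + ")"


def group_by_parent(features):
    """Collect (start, end) intervals under their parent ID."""
    groups = {}
    for start, end, parent in features:
        groups.setdefault(parent, []).append((start, end))
    return groups


# CDS-style grouping: one feature per parent, so all introns become one join().
merged = {p: embl_location(ivs) for p, ivs in group_by_parent(introns).items()}

# One feature per intron: eleven separate single-interval locations.
separate = [embl_location([(start, end)]) for start, end, _ in introns]

print(merged["g1.t1"])
print(len(separate), separate[0])  # 11 6625..6675
```

The first print gives the same `join(...)` string as the intron feature in the EMBL output, which is consistent with (but does not prove) the grouping hypothesis.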
Does anyone know why I am getting these errors?
Many thanks, Sophie