EMBOSS seqret - problem converting GFF + FASTA to EMBL
9.3 years ago
Juke34 8.9k

Hi everyone,

I'm trying to use the seqret tool from EMBOSS, but I'm running into some difficulties.

I would like to create an EMBL-formatted file from a GFF3 file and a FASTA file.

I'm using the following command:

seqret -sequence genome.fasta -feature -fformat gff -fopenfile annotation.gff -osformat embl

My fasta file contains several sequences.

The problem is that the tool writes the complete set of GFF3 features once for every sequence in the FASTA file (before each sequence), so the features are repeated as many times as there are sequences.
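One possible way around this, sketched here only as an illustration, is to group the GFF3 feature lines by the sequence they belong to (column 1, the seqid), so that each sequence can be paired with its own feature set before conversion. The function name `split_gff3_by_seqid` is hypothetical, and whether seqret then behaves as desired on per-sequence inputs is an assumption that has not been verified here.

```python
from collections import defaultdict


def split_gff3_by_seqid(gff3_text):
    """Group GFF3 feature lines by their seqid (column 1).

    Returns a dict mapping each seqid to the list of feature lines
    that refer to it, in their original order.
    """
    groups = defaultdict(list)
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature table
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        seqid = line.split("\t", 1)[0]
        groups[seqid].append(line)
    return dict(groups)


if __name__ == "__main__":
    # annotation.gff is the file name used in the command above.
    with open("annotation.gff") as handle:
        per_sequence = split_gff3_by_seqid(handle.read())
    for seqid, lines in per_sequence.items():
        print(seqid, len(lines), "feature lines")
```

Each group could then be written to its own file and converted together with the matching FASTA record. Note that seqids may contain characters that are awkward in file names.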

Has anyone already experienced this and found a way to avoid the problem?

Or does anyone know of another tool that can do this conversion?

Thank you

emboss genome sequence software-error • 7.1k views
7.7 years ago
Juke34 8.9k

After spending a lot of time on this, I concluded that no existing tool currently works properly for this purpose (GFF3 to EMBL). My group was not the only one facing this problem: a converter of this kind was recently released for Prokka GFF3 output (https://github.com/sanger-pathogens/gff3toembl). We also developed our own tool, but we implemented something more general that can be applied to any kind of GFF3. We hope to release it publicly in the next few weeks.


Here is the tool we developed: https://github.com/NBISweden/EMBLmyGFF3

It works for any type of GFF3 annotation.


Thank you for the tool!

9.3 years ago
Juke34 8.9k

Someone had already asked about this conversion; I found answers in the thread "Gff3 + Fasta To Genbank (Augustus Training Set)".

An easy way to do the conversion using BioPerl is also proposed here: http://ratt.sourceforge.net/transform.html

Now my problem has changed: I have an issue with the LOCUS name. BioPerl says:

--------------------- WARNING ---------------------
MSG: Bad LOCUS name? Changing [NODE_57_length_618_cov_40.4969_ID_247618] to 'unknown' and length to NODE_57_length_618_cov_40.4969_ID_247618

Any suggestions about what kind of LOCUS name is expected, so that it is not replaced by "unknown"?


OK, I have now found information about the expected LOCUS format here: Locus Field Format On Genbank
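If the length of the name is what triggers the warning (an assumption: the traditional fixed-column GenBank LOCUS line reserves at most 16 characters for the name), one option is to rename the sequences consistently in both the FASTA and the GFF3 before converting. The sketch below is illustrative only; the helper names, the `ctg` prefix and the 16-character default are all hypothetical choices, not something BioPerl or EMBOSS prescribes.

```python
def make_short_names(seq_ids, prefix="ctg", max_len=16):
    """Map each original ID to a short unique name such as ctg000001."""
    mapping = {}
    for seq_id in seq_ids:
        if seq_id in mapping:
            continue  # keep the first name assigned to a repeated ID
        name = f"{prefix}{len(mapping) + 1:06d}"
        if len(name) > max_len:
            raise ValueError(f"{name!r} is longer than {max_len} characters")
        mapping[seq_id] = name
    return mapping


def rename_fasta(lines, mapping):
    """Replace the ID (first word) of each FASTA header line."""
    out = []
    for line in lines:
        if line.startswith(">"):
            seq_id, sep, rest = line[1:].partition(" ")
            line = ">" + mapping.get(seq_id, seq_id) + sep + rest
        out.append(line)
    return out


def rename_gff3(lines, mapping):
    """Replace the seqid (column 1) of each GFF3 feature line.

    Directive lines such as ##sequence-region are left untouched
    in this sketch and would need separate handling.
    """
    out = []
    for line in lines:
        if line.startswith("#") or "\t" not in line:
            out.append(line)
            continue
        seqid, rest = line.split("\t", 1)
        out.append(mapping.get(seqid, seqid) + "\t" + rest)
    return out


if __name__ == "__main__":
    long_id = "NODE_57_length_618_cov_40.4969_ID_247618"
    names = make_short_names([long_id])
    print(rename_fasta([">" + long_id, "ACGT"], names))
```

Keeping the mapping (for example written out as a two-column table) makes it possible to translate the short names back to the original assembler IDs later.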
