Merging gff/gbk to create a 'full' gbk
9.0 years ago
Joe 21k

Hi,

Apologies if there is an existing answer to this, but I'm not exactly sure what I'm looking for...

I have a number of genomes that are annotated by the bacterial annotation program prokka. The output of the program is a number of sequence files:

-rw-r--r-- 1 wms_joe wms_joe 834K Jan  4 14:21 PLJXUR_01042016.err
-rw-r--r-- 1 wms_joe wms_joe 1.6M Jan  4 14:21 PLJXUR_01042016.faa
-rw-r--r-- 1 wms_joe wms_joe 4.3M Jan  4 14:21 PLJXUR_01042016.ffn
-rw-r--r-- 1 wms_joe wms_joe 5.3M Jan  4 14:06 PLJXUR_01042016.fna
-rw-r--r-- 1 wms_joe wms_joe 5.3M Jan  4 14:21 PLJXUR_01042016.fsa
-rw-r--r-- 1 wms_joe wms_joe  12M Jan  4 14:21 PLJXUR_01042016.gbk
-rw-r--r-- 1 wms_joe wms_joe 6.9M Jan  4 14:21 PLJXUR_01042016.gff
-rw-r--r-- 1 wms_joe wms_joe  60K Jan  4 14:21 PLJXUR_01042016.log
-rw-r--r-- 1 wms_joe wms_joe  18M Jan  4 14:21 PLJXUR_01042016.sqn
-rw-r--r-- 1 wms_joe wms_joe 1.2M Jan  4 14:21 PLJXUR_01042016.tbl
-rw-r--r-- 1 wms_joe wms_joe  151 Jan  4 14:21 PLJXUR_01042016.txt

As you might be able to tell from the extensions, there are FASTA files, plain-text files, tabular files, and a gff and a gbk (among others).

For some reason (a quirk of the program, I guess, or maybe an option I'm missing), the gbk file contains no annotations, so when browsing in the Artemis genome browser (from the Sanger Institute) the sequence is available but no genes are shown. Consequently, I use the gff for examining the genomes. I'm not actually sure why this works, since as far as I'm aware a gff is supposed to be a feature file only and contain no sequence.

So my actual question is:

Can I somehow combine the annotation information into the GenBank file, to create a "full" GenBank file like you might get from NCBI, or alternatively combine the sequence information into the gff?

I just want one file that can be browsed that has both the sequence and annotation. Does anyone know of any scripts or programs that already take care of that?

sequence annotation • 4.5k views

GFF3 files can contain sequence information in a sequence section, see http://gmod.org/wiki/GFF3#GFF3_Sequence_Section

That would explain why it works in the genome browser. Alternatively, it works because you loaded the sequence first and then added the annotation file in the same session.
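To check whether a given GFF3 file really carries its own sequence, you can look for the `##FASTA` directive that starts the sequence section. The following is only a minimal sketch using the Python standard library; the function name `split_gff3` and the toy record are made up for illustration, and it assumes a well-formed file where everything after `##FASTA` is plain FASTA.

```python
from typing import Dict, List, Tuple


def split_gff3(text: str) -> Tuple[List[str], Dict[str, str]]:
    """Split GFF3 text into feature lines and a dict of embedded sequences."""
    features: List[str] = []
    chunks: Dict[str, List[str]] = {}
    in_fasta = False
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not in_fasta:
            if line.startswith("##FASTA"):
                in_fasta = True          # everything below is FASTA
            elif line and not line.startswith("#"):
                features.append(line)    # keep feature rows, skip comments/directives
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""   # sequence ID is the first word
            chunks[current] = []
        elif line and current is not None:
            chunks[current].append(line)
    sequences = {name: "".join(parts) for name, parts in chunks.items()}
    return features, sequences


# Toy example (hypothetical data, not real Prokka output)
example = (
    "##gff-version 3\n"
    "contig1\tprokka\tCDS\t1\t9\t.\t+\t0\tID=gene1\n"
    "##FASTA\n"
    ">contig1\n"
    "ATGAAA\n"
    "TAA\n"
)
features, sequences = split_gff3(example)
print(len(features), "feature line(s);", len(sequences), "embedded sequence(s)")
```

If the returned sequence dict is empty, the file is a features-only GFF and a browser would need the sequence loaded separately.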

If you want to focus on making a GenBank file from the GFF, the answer can already be found here: Converting Gff/Gtf + Reference To Embl Or Genbank ...Any Tools Available?
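For the other route mentioned in the question (putting the sequence into the gff rather than building a GenBank file), a features-only GFF3 can be given a sequence section by appending a `##FASTA` line followed by the FASTA records. This is a rough sketch, not a tested tool: `embed_fasta` is a hypothetical helper, and it assumes the FASTA headers use the same sequence IDs as column 1 of the GFF.

```python
def embed_fasta(gff_text: str, fasta_text: str) -> str:
    """Return GFF3 text with the FASTA records appended after a ##FASTA line."""
    # Leave the file alone if it already has a sequence section.
    if any(line.startswith("##FASTA") for line in gff_text.splitlines()):
        return gff_text
    body = gff_text.rstrip("\n")
    return body + "\n##FASTA\n" + fasta_text.strip() + "\n"


# Toy inputs (hypothetical data)
gff_only = "##gff-version 3\ncontig1\tprokka\tCDS\t1\t9\t.\t+\t0\tID=gene1\n"
fasta = ">contig1\nATGAAATAA\n"
combined = embed_fasta(gff_only, fasta)
print(combined)
```

With real files you would read the `.gff` and `.fna` contents into strings first and write the result to a new file, keeping the originals untouched.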


Ah, excellent. They do indeed seem to include the sequence. The errors I saw claiming there was no sequence must have come from trying to use them with programs that do not yet support GFF.

I know I wasn't adding the annotations after loading the sequence, as it works with just the gffs alone. Thanks for the links.
