Genbank File Format Question
12.3 years ago
Lee Katz ★ 3.2k

Hi, I am writing a script to concatenate contigs in GenBank format. The file contains many annotations. Between each pair of contigs is a linker sequence that is bounded by Ns and internally has start and stop codons in every reading frame. The objective is to be able to view the result as a single contig in a genome browser such as Apollo.

My question is, how would I correctly annotate the artificial linker sequence in GenBank? I found several fields that are allowed in GenBank format as feature keys, but none seem to qualify as "artificial linker sequence." I found the "unsure" feature key which is as good as any, but is there one that Apollo will recognize and that is allowed in GenBank format?

For reference: http://www.insdc.org/documents/feature_table.html#7.3

genbank format • 3.2k views
12.2 years ago
Torst ▴ 980

I tend to use "unsure" for stuff like that. But you could use "assembly_gap", which, while not strictly true to the INSDC definition, is in the spirit of what it is! I think "misc_feature" is also valid here. I've even seen "-" as a feature type in some GenBank files.

If you are happy to use GFF3 + SOFA feature types, then "assembly_component" is appropriate.
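
For the GenBank route, here is a rough sketch of how the linker coordinates and feature lines could be generated while concatenating. It is plain Python with no libraries. The function names `join_contigs` and `linker_feature_lines`, the `LINKER` string and the note text are all made up for illustration, and the linker shown is only a placeholder, not a vetted design. The feature key is written `misc_feature` (with an underscore, as in the INSDC feature table); swap in "unsure" or another key if your browser handles it better.

```python
# Hypothetical placeholder linker; substitute the linker your script really uses.
LINKER = "NNNNNCACACACTTAATTAATTAAGTGTGTGNNNNN"


def join_contigs(contigs, linker=LINKER):
    """Concatenate contigs, inserting the linker between each adjacent pair.

    Returns the joined sequence and a list of 1-based, inclusive
    (start, end) coordinates, one tuple per inserted linker.
    """
    parts = []
    spans = []
    pos = 0  # length of the sequence assembled so far
    for i, contig in enumerate(contigs):
        if i > 0:
            spans.append((pos + 1, pos + len(linker)))
            parts.append(linker)
            pos += len(linker)
        parts.append(contig)
        pos += len(contig)
    return "".join(parts), spans


def linker_feature_lines(spans, key="misc_feature",
                         note="artificial linker sequence"):
    """Format one GenBank feature-table entry per linker span.

    Layout follows the flat-file convention: the feature key starts in
    column 6 and the location and qualifiers start in column 22.
    """
    lines = []
    for start, end in spans:
        lines.append(f"     {key:<16}{start}..{end}")
        lines.append(" " * 21 + f'/note="{note}"')
    return lines


if __name__ == "__main__":
    seq, spans = join_contigs(["ACGTACGT", "GGCCGGCC", "TTAA"])
    print("\n".join(linker_feature_lines(spans)))
```

The existing annotations on every contig after the first need their coordinates shifted by the same running offset that `pos` tracks here; this sketch does not do that.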
