Tool: Goodbye, Genbank: A Python package that salvages feature annotations from GenBank records
8.6 years ago

Hi,

While building a parts library for internal use, I ran into the quirks of the GenBank format and found that almost no GenBank file is up to spec. I started building a tool to iron out those quirks and salvage only the usable parts of GenBank feature annotations for use elsewhere. It turned into a larger task than I had anticipated, and I thought other people might find it useful or want to contribute, so I made it open source:

https://biosustain.github.io/goodbye-genbank/

In summary:

  • It is a Python package for use with Biopython.
  • It maps GenBank feature keys (and, in some cases, qualifiers) to Sequence Ontology terms.
  • It fixes or normalizes GenBank feature qualifiers (annotations) and discards qualifiers that cannot be fixed. This step is customizable, so you can add your own salvaging code for specific qualifiers.
  • The output is a set of clean, predictable features that can be used elsewhere.
  • Masochists can also use the package simply to clean up GenBank feature annotations into valid GenBank.
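
To give a rough idea of the mapping and salvaging steps listed above, here is a minimal sketch in plain Python. It does not use the package's actual API: the names `FEATURE_KEY_TO_SO`, `QUALIFIER_SALVAGERS`, `salvage_feature` and the `Feature` stand-in class are all hypothetical, and the mapping table is a small illustrative subset rather than the package's real one. The only assumption about Biopython is that a feature exposes a `type` string and a `qualifiers` dict whose values are lists of strings.

```python
import re
from dataclasses import dataclass, field

# Illustrative subset only; NOT the package's actual mapping table.
FEATURE_KEY_TO_SO = {
    "gene": "gene",
    "CDS": "CDS",
    "promoter": "promoter",
    "terminator": "terminator",
    "rep_origin": "origin_of_replication",
}


def clean_text(values):
    """Collapse whitespace left by GenBank line wrapping; drop empty values."""
    cleaned = [re.sub(r"\s+", " ", v).strip() for v in values]
    cleaned = [v for v in cleaned if v]
    return cleaned or None


def clean_codon_start(values):
    """Keep /codon_start only if it is one of the valid values 1, 2 or 3."""
    for v in values:
        if v.strip() in {"1", "2", "3"}:
            return [v.strip()]
    return None


# Hypothetical registry: qualifier name -> salvaging function.
# Qualifiers without an entry are discarded; add your own entries to keep more.
QUALIFIER_SALVAGERS = {
    "gene": clean_text,
    "product": clean_text,
    "note": clean_text,
    "codon_start": clean_codon_start,
}


@dataclass
class Feature:
    """Stand-in with the same `type` and `qualifiers` attributes as a Biopython feature."""
    type: str
    qualifiers: dict = field(default_factory=dict)


def salvage_feature(feature):
    """Return (so_term, cleaned_qualifiers), or None if the feature key is unmapped."""
    so_term = FEATURE_KEY_TO_SO.get(feature.type)
    if so_term is None:
        return None
    qualifiers = {}
    for name, values in feature.qualifiers.items():
        salvager = QUALIFIER_SALVAGERS.get(name)
        if salvager is None:
            continue  # no salvaging rule for this qualifier: discard it
        if isinstance(values, str):
            values = [values]
        cleaned = salvager(values)
        if cleaned is not None:
            qualifiers[name] = cleaned
    return so_term, qualifiers
```

With Biopython installed, the same function could be applied to the `features` of a record loaded with `SeqIO.read(path, "genbank")`, since those objects carry the same two attributes. The registry approach is one simple way to make salvaging customizable; the package itself may do this differently.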

(A GFF3 exporter is also planned, but it may be a while.)

sequence genome python biopython • 1.9k views