I'm trying to work out the best way to retrieve data from many different genome databases and unify it into a common format I can work with locally.
I'm doing comparative genomics analysis and have used EnsEMBL as my primary source to date, since it has excellent APIs for accessing the various annotation features. Now I would also like to retrieve sequences from other databases such as FlyBase and GenBank, and I'm not sure how best to approach it.
Perhaps I should write separate wrappers/parsers and pull the data into a SQLite database, a GFF file or a Chaos-XML file? I'd like something portable too, if possible!
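To make the wrapper-plus-SQLite idea concrete, here is a minimal sketch in Python using only the standard library. It assumes each source database can be exported as GFF3. The function names (`parse_gff3`, `load_features`), the `feature` table layout and the demo record are all hypothetical, made up for illustration rather than taken from FlyBase or any other database.

```python
import sqlite3


def parse_gff3(lines, source_db):
    """Yield one tuple per GFF3 feature line, tagged with its source database."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments and directives such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        seqid, _src, ftype, start, end, _score, strand, _phase, attrs = cols
        yield (source_db, seqid, ftype, int(start), int(end), strand, attrs)


def load_features(conn, rows):
    """Create the (hypothetical) common table if needed and insert the rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS feature (
               source_db  TEXT,
               seqid      TEXT,
               type       TEXT,
               feat_start INTEGER,
               feat_end   INTEGER,
               strand     TEXT,
               attributes TEXT
           )"""
    )
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


# Made-up demo record; a real run would read lines from a downloaded GFF3 file.
demo_lines = [
    "##gff-version 3\n",
    "2L\tFlyBase\tgene\t100\t500\t.\t+\t.\tID=example_gene_1\n",
]

conn = sqlite3.connect(":memory:")  # use a file path for a portable database
load_features(conn, parse_gff3(demo_lines, "FlyBase"))
```

With this layout, each additional source would only need its own small parser that yields tuples of the same shape, and the resulting SQLite file is a single portable file that can be copied between machines.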
Lol! Thanks Pierre, I'll take a look at that :-)