Question

Genome Specific Database (Gbrowse / Ensembl Type)

4

14.7 years ago

Darked89 4.7k

I am interested in your opinions about database systems used to store, query and visualize genomic sequence and annotations. I am talking about a ca. 600-700 Mb draft genome with a large number of contigs outside scaffolds. Yes, I know that annotating anything before reaching some quality milestones may be considered pointless, but I want to get the back end (DB) and the pipeline working well before that.

So far I have started testing GBrowse (1.70), have been impressed by Ensembl as an end user, and have looked at GenomeProjector, which is eye candy but unsuitable: http://www.g-language.org/GenomeProjector/

I would appreciate any thoughts about ease of installation/maintenance and integration with annotation tools such as Apollo / Artemis.

Thanks

darked89

PS There is no way to add proper tags (genome annotation database) to this post

genome-annotation-database sequence • 4.6k views

ADD COMMENT • link updated 13 months ago by Ram 44k • written 14.7 years ago by Darked89 4.7k

0

Now fixed: the tagging rules have been relaxed. Please try again, thanks.

ADD REPLY • link 14.7 years ago by Istvan Albert 101k

Ram · Answer 1 · 2010-03-02

Whether it is better to store sequences in a database or in simple flat files is a much-debated topic. I have never had to annotate a draft genome as you are doing, so I can't tell you which approach is best for you, but I would recommend flat files: you will have more support and tools, setup will take less time, and I have the feeling that this is the direction most projects are taking.

If you want to use a database, have a look at this post and at this column type, datatype-geometric.
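
To give a feel for the database route, here is a minimal sketch of storing features and querying them by coordinate range. It uses Python's built-in sqlite3 with two plain integer columns rather than the geometric column type mentioned above, and the table name, column names and contig names are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: one row per annotated feature,
# with 1-based inclusive coordinates on a contig or scaffold.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE feature (
        seqid     TEXT    NOT NULL,
        start_pos INTEGER NOT NULL,
        end_pos   INTEGER NOT NULL,
        type      TEXT    NOT NULL,
        name      TEXT
    )
    """
)
# An index on (seqid, start_pos, end_pos) helps range queries on large tables.
conn.execute("CREATE INDEX feature_range ON feature (seqid, start_pos, end_pos)")

# Made-up example features.
features = [
    ("contig_0001", 100, 500, "gene", "geneA"),
    ("contig_0001", 450, 900, "gene", "geneB"),
    ("contig_0001", 2000, 2600, "gene", "geneC"),
    ("contig_0002", 100, 300, "gene", "geneD"),
]
conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?)", features)


def overlapping(conn, seqid, start, end):
    """Return names of features on `seqid` that overlap [start, end]."""
    rows = conn.execute(
        "SELECT name FROM feature "
        "WHERE seqid = ? AND start_pos <= ? AND end_pos >= ? "
        "ORDER BY start_pos",
        (seqid, end, start),
    )
    return [row[0] for row in rows]


hits = overlapping(conn, "contig_0001", 480, 1000)
print(hits)
```

Two intervals overlap when each one starts before the other ends, which is what the WHERE clause checks. A dedicated range or geometric type can index this kind of query more efficiently, but the logic is the same.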

If you want to try flat files, you will have to study the BED and GFF formats, and maybe BAM, along with VCF if you have SNPs. For example, if you use BED, you will be able to use BEDTools, which lets you merge and otherwise work with genomic features and is very fast. You may be surprised to learn that GBrowse can run directly from GFF files, without a separate database server (it can also use database backends).
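
To show how little is needed to work with such flat files, here is a rough pure-Python sketch of the kind of operation a merge of BED features performs: overlapping or directly adjacent intervals on the same sequence are collapsed into one. This is an illustration only, not how BEDTools is implemented; the function names and the sample contig names are hypothetical.

```python
from collections import defaultdict


def parse_bed(lines):
    """Yield (chrom, start, end) from BED lines (0-based, half-open)."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield chrom, int(start), int(end)


def merge_intervals(records):
    """Merge overlapping or book-ended intervals per sequence."""
    by_chrom = defaultdict(list)
    for chrom, start, end in records:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps or touches the current block: extend it.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


# Made-up BED content for illustration.
bed_text = (
    "contig_0001\t100\t200\n"
    "contig_0001\t150\t300\n"
    "contig_0001\t400\t500\n"
    "contig_0002\t10\t20\n"
)
merged = merge_intervals(parse_bed(bed_text.splitlines()))
for chrom, start, end in merged:
    print(chrom, start, end, sep="\t")
```

For real data the dedicated tools will be much faster, but being able to read and write these formats with a few lines of code is a large part of the appeal of flat files.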

Another alternative is HDF5, about which you may find some questions here. So, you have a lot of homework here :-)

Ram · Answer 2 · 2010-03-28

3

14.7 years ago

Yannick Wurm ★ 2.5k

What are your needs?

http://stackoverflow.com/questions/1890285/are-there-any-existing-solutions-for-creating-a-generic-dna-sequence-database-wit/1893358

I think chado/Apollo is the way to go.

ADD COMMENT • link updated 13 months ago by Ram 44k • written 14.7 years ago by Yannick Wurm ★ 2.5k