Question

What Are The Advantages Of Data Management In Databases?

0

11.6 years ago

jobinv ★ 1.1k

I recently described our group's current status in this post: We have the minimum of everything required for bioinformatics analysis; why do we need more?

This is a follow-up to one of those points, namely the data management issue. Could someone give me good arguments for why it is better to switch to database-based data management? What would it let me do that I cannot do by just keeping everything in files?

database • 4.1k views

ADD COMMENT • link updated 11.6 years ago by Istvan Albert 102k • written 11.6 years ago by jobinv ★ 1.1k

2

Well, you will have to define more clearly what you would want to store in such a database. Generally speaking, databases are good for relational data.

Also, I don't think it makes sense to use either one exclusively.

ADD REPLY • link 11.6 years ago by David Westergaard ★ 1.5k

1

Perhaps not a strictly bioinformatics-related question, this, but it is so tightly connected to what we need to do in bioinformatics that I still consider it appropriate for this forum. Please let me know if I am wrong about this.

ADD REPLY • link 11.6 years ago by jobinv ★ 1.1k

0

You posed and answered your question in the same sentence: "not a strictly bioinformatics-related question" yet "so tightly connected to what we need to do in bioinformatics". I consider activities connected to bioinformatics to be the subject of bioinformatics questions. So, I think this question is completely appropriate here.

As to the question itself, without a database, what method would you suggest for making queries across all your projects? A script that scans directories and reads standardized flat files? The "management" part implies the ability to gain and navigate some kind of overview. I employ both methods due to a generally un-directed and historically messy design process, but seldom hear about how to "manage" data overviews without a database.
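For illustration, the flat-file approach mentioned above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: every project directory is assumed to contain a `samples.csv` with `sample_id` and `condition` columns, and the function name `find_samples`, the file name, and the toy data are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def find_samples(root, condition):
    """Scan every project directory under `root` for a samples.csv file
    and return (project, sample_id) pairs whose condition matches."""
    hits = []
    for table in sorted(Path(root).glob("*/samples.csv")):
        with table.open(newline="") as handle:
            for row in csv.DictReader(handle):
                if row["condition"] == condition:
                    hits.append((table.parent.name, row["sample_id"]))
    return hits


# Build two toy projects in a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    toy_projects = {
        "project_a": [("s1", "control"), ("s2", "heat_shock")],
        "project_b": [("s3", "heat_shock")],
    }
    for project, samples in toy_projects.items():
        project_dir = Path(root) / project
        project_dir.mkdir()
        with (project_dir / "samples.csv").open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["sample_id", "condition"])
            writer.writerows(samples)

    heat_shock_samples = find_samples(root, "heat_shock")

print(heat_shock_samples)
```

A script like this only works as long as every project sticks to the same file name and column names, which is exactly the kind of convention a database schema would enforce for you.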

ADD REPLY • link 11.6 years ago by seidel 11k

1

A friend suggested to me earlier today that if I only rarely need to run a query across projects, it might be better to just stick with flat files. He thought that maintaining a database, with all its hassles, might be a bit excessive for what I would need it for.

ADD REPLY • link 11.6 years ago by jobinv ★ 1.1k

1

http://stackoverflow.com/questions/2356851 "database vs. flat files" ; http://stackoverflow.com/questions/6853482 "Flat file vs database - speed?"; etc...

ADD REPLY • link 11.6 years ago by Pierre Lindenbaum 166k

score 1 · Answer 1 · 2013-10-13

I guess by "database" you mean a relational database.

Pros:
1. less data redundancy (if normalized), which:
  - reduces errors
  - enables consistent data changes (e.g. renaming of one experimental condition across multiple experiments)
2. a standard query and reporting language across all your data
3. error checking on data entry (completeness of records, wrong data types)
4. integrity on data changes (ACID http://en.wikipedia.org/wiki/ACID) for most relational databases
5. tools allow relatively easy construction of GUIs from database models
6. most programming languages have drivers for RDBMSs, so you have one data model and can query/report/update with R, Java, Python etc.
7. all data in one place (compared to data in folders). This allows you to integrate data across experiments for checking of systematic trends e.g. quality control
8. changes to the data model are applied consistently across all the data. That is a little more difficult than an ad hoc change within the current project, but the consistency pays off.

Cons:
1. some data structures are more difficult to represent in a relational database, e.g. trees
2. needs some thought (and experience) at the beginning to implement well
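
To make a few of the pros concrete (one place to rename a condition, error checking on entry, one query across experiments), here is a minimal sketch. It assumes SQLite through Python's built-in `sqlite3` module purely because it needs no server; the table and column names (`condition`, `sample`, `read_count`) and the data are hypothetical. Since SQLite is loosely typed, a `CHECK` constraint stands in for stricter type checking here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
CREATE TABLE condition (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE sample (
    id           INTEGER PRIMARY KEY,
    experiment   TEXT NOT NULL,
    condition_id INTEGER NOT NULL REFERENCES condition(id),
    read_count   INTEGER NOT NULL CHECK (read_count >= 0)
);
""")

# The condition name is stored once and referenced by id (normalized).
conn.execute("INSERT INTO condition (id, name) VALUES (1, 'heatshock')")
conn.executemany(
    "INSERT INTO sample (experiment, condition_id, read_count) VALUES (?, ?, ?)",
    [("exp1", 1, 1200), ("exp2", 1, 900)],
)

# Error checking on entry: an incomplete record is rejected by NOT NULL.
try:
    conn.execute(
        "INSERT INTO sample (experiment, condition_id, read_count) "
        "VALUES ('exp3', 1, NULL)"
    )
    incomplete_record_rejected = False
except sqlite3.IntegrityError:
    incomplete_record_rejected = True

# Consistent change: renaming the condition once updates it for every experiment.
conn.execute("UPDATE condition SET name = 'heat_shock' WHERE id = 1")

# One query language across all experiments.
rows = conn.execute("""
    SELECT s.experiment, c.name, s.read_count
    FROM sample AS s
    JOIN condition AS c ON c.id = s.condition_id
    ORDER BY s.experiment
""").fetchall()
print(rows)
```

With flat files, the same rename would mean editing every CSV that mentions the condition and hoping none is missed.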

For most of my analysis projects I also have a sample CSV table. But as a project grows I start to feel the pain (mostly data inconsistencies). We also have some RDBMSs for the real stuff of course, but the additional data that I have for the individual projects (additional sample annotation from the researcher) is not entered into them, because it has no place there. I do query the RDBMSs to check for consistency, though.
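
A consistency check of that kind might look something like the sketch below. Everything in it is an assumption for illustration: an in-memory SQLite database stands in for the central RDBMS, and the table `sample`, the columns `sample_id`, `organism` and `researcher_note`, and the data are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for the central database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (sample_id TEXT PRIMARY KEY, organism TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [("s1", "human"), ("s2", "mouse")],
)

# Stand-in for a per-project CSV with extra annotation from the researcher.
project_csv = io.StringIO(
    "sample_id,organism,researcher_note\n"
    "s1,human,good RNA\n"
    "s2,Mouse,re-run\n"
    "s9,human,new\n"
)

# Compare every CSV row against the database and collect disagreements.
problems = []
for row in csv.DictReader(project_csv):
    match = conn.execute(
        "SELECT organism FROM sample WHERE sample_id = ?",
        (row["sample_id"],),
    ).fetchone()
    if match is None:
        problems.append((row["sample_id"], "not in database"))
    elif match[0] != row["organism"]:
        problems.append(
            (row["sample_id"],
             f"organism is {row['organism']!r}, database says {match[0]!r}")
        )

print(problems)
```

The project-specific column (`researcher_note`) stays in the CSV; only the fields shared with the database are compared.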

score 1 · Answer 2 · 2013-10-13

Be careful not to think in terms of (false) choices. Storing data in databases does not preclude you from also keeping them around in flat files.

Databases are designed to represent and query information stored in a predetermined format. They work best when used in a specialized context to solve a well-defined use case.

In fact, you would probably need to create different databases for different use cases.

The more "unified" and "global" your database, the more untenable and difficult the task of creating and maintaining it.