Forum:Which database software is better for bioinformatics web development? MySQL or MongoDB?
3
0
Entering edit mode
6.2 years ago
mcclintock ▴ 10

Once I did a bioinformatics project using MongoDB, but I'm kind of confused which database software is more suitable for bioinformatics web development. Sure, MySQL, relational database, is great. However, what's the difference between the performance of these two kinds of database, relational and non-relational? PS: Does it depends on whether we use PHP or not?

database mysql mongodb software • 4.6k views
ADD COMMENT
10
Entering edit mode
6.2 years ago
Michael 55k

This is a very general question, in particular what is bioinformatics web development? It really depends on the application case and requirements, and certainly the type of data to store. Note: Database Management Systems (DBMS) are a back-end technology meaning they are used for storing and retrieving the data in a reliable and performant way. web development refers to front-end technology, that is the presentation of the application to the user and how the application is interacting with the user. Deciding on the technology should be the last step in the design process of a software.

Some questions to guide the decision process:

  • How is your data structured, can the data be represented in relational format?
  • Have you modeled the data using UML/ER diagrams?
  • Volume of your data?
  • Do you update/read often or rarely?
  • Do you update/read in parallel?
  • Do you need transactions, atomic actions, ACID, to ensure consistency?
  • Do you need foreign keys, indexing, stored procedures, views?
  • Distributed?
  • Other performance boosting required?
  • Replication required?
  • Do you need a well-defined Query language (SQL, XPATH) or can you hardcode a few queries?
  • can you setup a DB server in client server infrastructure?
  • do you need remote connections (CLI, TCP, web-services) to the database?
  • which authentication/authorization model do you use for your application?
  • which backup strategy do you prefer/policies are in place at the site of deployment?
  • Do language bindings to preferred frame-work/coding tools/languages exist at the appropriate level of abstraction?
  • Which support model do you prefer? (Some might prefer a commercial support model)

Candidates:

  • PostgreSQL (most feature-rich, covers practically everything named above, richest SQL, including Document database/NoSQL functions, will work most of the time)
  • MySQL, MariaDB, etc. (very similar in feature richness, maybe a bit higher performance)
  • SQLite (file based SQL database management, no server required, great for testing, should fit most bioinformatics needs, unless many concurrent updates, easy to share as a file - no dump required, easiest way to start and to learn SQL)
  • Oracle (if you need a commercial DBMS)
  • Large variable matrices (microarrays, RNA-seq data, other measurements) or array-shaped data with variable dimensions or loosely defined structure are not such a good fit for Relational Databases, HDF5 or similar formats could be a reasonable alternative, possibly keeping metadata in MongoDB.
ADD COMMENT
0
Entering edit mode

Great idea! There are so many things I never learned. Thanks, it really help me a lot.

ADD REPLY
0
Entering edit mode

in light of the 'bioinformatics' side of the question, I think it also becomes important to consider where you plan to run the app from. A lot of bioinformatics work takes place on shared systems like HPC's, which have restricted user-rights and software access; in these cases, I end up going with SQLite a lot because its often the easiest to get running there.

ADD REPLY
3
Entering edit mode
6.2 years ago

You could use PostgreSQL and JsonB to get the best of both worlds: relational and non-relational features.

ADD COMMENT
0
Entering edit mode

Thank you very much! I will give it a try.

ADD REPLY
2
Entering edit mode
6.2 years ago
Eric Lim ★ 2.2k

Micheal has given and asked some really excellent questions, so I'll just share what we've been doing.

Our approach is developing and deploying events driven web applications using serverless technology (https://aws.amazon.com/lambda/ ) and couple these applications with various serverless database services (https://aws.amazon.com/dynamodb/, https://aws.amazon.com/rds/aurora/, and https://cloud.google.com/bigquery/ ). The idea behind this approach is to reduce the group's burden in maintaining running servers so we can focus on the development and analytics.

ADD COMMENT

Login before adding your answer.

Traffic: 1766 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6