Database Documentation
5
8
Entering edit mode
13.8 years ago
Dave Bridges ★ 1.4k

I keep my experimental data in a MySQL database with a Django frontend. Although the django code is documented at the model level, the database isnt documented at all. I think it might be a good idea to document the database schema separately so I wonder what the BioStars suggest would be a good way to go about getting database documentation. I guess what I am looking for is a way to document what is in tables, what the fields mean, and how the relations are defined.

database mysql • 6.6k views
ADD COMMENT
6
Entering edit mode
13.8 years ago

This is in fact one of the advantages of the use of the Resource Description Framework, where databases can be self-descriptive, using ontologies, describing the meaning of the equivalents of rows and columns in relational database tables. You might be interested in the work of the RDB2RDF working group and the D2RQ tool in particular.

It is important to realize that the ontology can be very simple (perhaps not even to be called an ontology), just complex enough to reflect the meaning of tables, columns, and rows. For example, if you have a table with proteins, listing a uniprot ID and a sequence, the matching ontology can be as simple as something like:

:Protein rdf:type rdfs:Class ;
   rdfs:label "Protein Record" .
:uniProtID rdf:type rdfs:Property;
   rdfs:label "has the uniprot ID" ;
   rdfs:domain :Protein ;
   rdfs:range rdfs:Literal .
:sequence rdf:type rdfs:Property;
   rdfs:label "has the sequence" ;
   rdfs:domain :Protein ;
   rdfs:range rdfs:Literal .

Then, rows can either be represented as instances:

:1CRN a :Protein .

Or as subclasses:

:1CRN rdfs:subClassOf :Protein .

This could be used independently from whether or not you would expose the database as RDF instead. A full table row would look like (taking the row-as-subclass route):

:1CRN rdfs:subClassOf :Protein ;
    :uniProtID "P01542" ;
    :sequence "TTCCPSIVAR SNFNVCRLPG TPEALCATYT GCIIIPGATC PGDYAN" .

Further reading:

  1. Astrova, I., 2009. Rules for mapping SQL relational databases to OWL ontologies. In: Sicilia, M.-A., Lytras, M. D. (Eds.), Metadata and Semantics. Springer US, Boston, MA, Ch. 40, pp. 415-424. DOI:10.1007/978-0-387-77745-0_40
  2. Bob DuCharme, SPARQL and relational databases: getting started, 2008. http://www.snee.com/bobdc.blog/2008/10/sparql-and-relational-database.html
ADD COMMENT
2
Entering edit mode

Are there tools available that apply this idea?

ADD REPLY
2
Entering edit mode

I understand that, I've used Protege a lot. Ontology development is a specialised skill - if you're recommending this approach, I hope that I wouldn't have to first devise and test an ontology, then write a tool to map my relational model to OWL (or whatever) and then write a rendering tool to give human-readable documentation (in this case).

ADD REPLY
1
Entering edit mode

No, you don't. Ontologies can be complex, using upper ontologies, etc. But despite what some want people to believe, the can be small, simple, and just reflecting your schema. In fact, that is what it recommended by people in the Health Care and Life Sciences working group for the adoption of semantic web technologies (like OWL). KISS is perfectly fine in RDF. Just look at FOAF, or the RDF used by Google and Yahoo to enrich search results.

ADD REPLY
1
Entering edit mode

Well strictly speaking, RDF is completely schemaless and DOES NOT require ANY ontology....it just helps to find your way round...

ADD REPLY
0
Entering edit mode

The documentation of what is what in the relational database is done via an ontology, e.g. tables would be classes, rows could be instances or subclasses, and columns are predicates. There are many tools availale, Protege would be the best know to define the ontology used to describe the database schema.

ADD REPLY
0
Entering edit mode

Do you express the semantics of your RDB schema as a specialised ontology (created e.g. with Protege) and then use that to generate your SQL DDL and accompanying documentation? Perhaps I should open this as a question...

ADD REPLY
0
Entering edit mode

I added a very basic approach in my answer.

ADD REPLY
0
Entering edit mode

I added a very basic approach in my answer. No need for Protege, but some people like GUIs. The example is, in fact, not using OWL, but merely RDF Schema.

ADD REPLY
0
Entering edit mode

Thanks for your patience. That's a lot clearer to me now.

ADD REPLY
0
Entering edit mode

My apologies for being too brief to start with :)

ADD REPLY
0
Entering edit mode

Oh, also added a link to the D2RQ tool (first paragraph) which my might find interesting.

ADD REPLY
0
Entering edit mode

Yes, important point. RDF gives flexibility here.

ADD REPLY
3
Entering edit mode
13.8 years ago

Pragmatically, I would use Emacs org-mode to write a master document and 'tangle' the SQL DDL, LaTeX and HTML documentation out of the single source. The advantage is that it's very simple to do (and org-mode is included in Emacs 23).

First ensure that SQL is enabled for org-mode, in the .emacs file:

(org-babel-do-load-languages 'org-babel-load-languages
                             '((dot . t)    ; Graphviz dot
                               (sh . t)
                               (python . t) 
                               (perl . t)
                               (ruby . t)
                               (R . t)
                               (sql . t)))  ; add this pair if necessary

Here's the text input:

* My example schema
  Schema overview goes here.

** Organism table
   Explain the purpose of the 'organism' table.

#+begin_src sql :tangle schema.ddl :comments org
CREATE TABLE organism (
  id INTEGER PRIMARY KEY,
  taxid INTEGER NOT NULL,
  name varchar(512) NOT NULL
);
#+end_src

** Chromosome table
   Explain the purpose of the 'chromosome' table.

#+begin_src sql :tangle schema.ddl :comments org
CREATE TABLE chromosome (
  id INTEGER PRIMARY KEY,
  name varchar(128) NOT NULL,
  length INTEGER NOT NULL
);
#+end_src

*** Organism constraint
    Explain the purpose of a foreign key.

#+begin_src sql :tangle schema.ddl :comments org
ALTER TABLE chromosome ADD COLUMN organism INTEGER NOT NULL;
ALTER TABLE chromosome ADD CONSTRAINT chromosome_organism_fk \
FOREIGN KEY (organism) REFERENCES organism(id);
#+end_src

I can now generate the DDL (in this case, including docs as SQL comments), HTML and PDF via LaTeX. Since you get all the org-mode features, you can insert or generate other elements such as diagrams, cross-references and so on. Here's the DDL generated in the schema.ddl file when running M-x org-babel-tangle.

-- Organism table
--    Explain the purpose of the 'organism' table.

CREATE TABLE organism (
  id INTEGER PRIMARY KEY,
  taxid INTEGER NOT NULL,
  name varchar(512) NOT NULL
);

-- ** Chromosome table
--    Explain the purpose of the 'chromosome' table.

CREATE TABLE chromosome (
  id INTEGER PRIMARY KEY,
  name varchar(128) NOT NULL,
  length INTEGER NOT NULL
);

-- *** Organism constraint
--     Explain the purpose of a foreign key.

ALTER TABLE chromosome ADD COLUMN organism INTEGER NOT NULL;
ALTER TABLE chromosome ADD CONSTRAINT chromosome_organism_fk \
FOREIGN KEY (organism) REFERENCES organism(id);

You can also go the other way and include executable SQL queries in the org-mode file which insert their results back into that file when they are run (instead of producing source code as above). This way real examples and results may be merged back into the documentation source.

ADD COMMENT
2
Entering edit mode
13.8 years ago
Neilfws 49k

SQLFairy is a nice set of tools for database inspection. From their webpage:

SQL::Translator is a group of Perl modules that manipulate structured data definitions (mostly database schemas) in interesting ways, such as converting among different dialects of CREATE syntax (e.g., MySQL-to-Oracle), visualizations of schemas (pseudo-ER diagrams: GraphViz or GD), automatic code generation (using Class::DBI), converting non-RDBMS files to SQL schemas (xSV text files, Excel spreadsheets), serializing parsed schemas (via Storable, YAML and XML), creating documentation (HTML and POD), and more.

One simple use is creation of an Entity Relationship Diagram (ERD). For example, if you've installed the Debian/Ubuntu sqlfairy package, you can run:

sqlt-graph -f MySQL -o mydatabase.png -t png mydatabase.sql

Apart from that as Pierre suggests, if you use a ORM then in effect, your code documents the database schema. I prefer Ruby for this task, for which there are multiple ORMs: Sequel and DataMapper are popular choices.

ADD COMMENT
0
Entering edit mode

SQLFairy has saved me many hours of work porting from MySQL to Postgres.

ADD REPLY
1
Entering edit mode
13.8 years ago
  • You could use DIA to simply draw the schema.
  • You could use a format used by an Object Relational Mapping (ORM) tool.

    For example, the XML format used by hibernate: can be used to describe the tables and their relationships. Although those tools are mainly used by java, the format remains valid for describing your schema.

    <hibernate-mapping>
    <class name="Snp" table="snp">
    <id name="id" column="id"> <generator class="increment"/></id>
    <property name="rsNumber" column="name" >
    <property name="chromosome" column="chrom">
    </property>
    <property name="pos" column="pos">
    <meta attribute="description">The 0 based position</meta>
    </property>
    <set name="courses" table="SNP_GENE" cascade="all">
          <key column="GENE_ID" />
          <many-to-many column="GENE_ID"  class="Gene" />
    </set>
    </class>
    (...)
    </hibernate-mapping>
    
ADD COMMENT
0
Entering edit mode
ADD REPLY
1
Entering edit mode
13.8 years ago
Ngn ▴ 130

There are quite a number of standard tools to document relational databases (Entity-relationsip modeling).

MySQL Workbench might be the most straightforward tool to document an existing MySQL database.

http://dev.mysql.com/downloads/workbench/5.2.html

It provides means to reverse engineer the database design (e.g. retrieve tables and relationships of an existing database) and autogenerate the entity relationship diagrams.

Screenshots: http://wb.mysql.com/?page_id=35

ADD COMMENT

Login before adding your answer.

Traffic: 4320 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6