Relational Database For De Novo Variants
12.4 years ago
Davy ▴ 410

I have been asked to look into creating a database to store data on de novo variants found during a sequencing project at my institution. The database would then be used to centrally store all the information amassed on the discovered variants, and would be accessed via a website by other members of my institution to download, upload, and modify information relating to these variants. I am somewhat familiar with Python, Django, and MySQL (I've been using Python for a number of years for simple scripting, and I've been through the Django and MySQL tutorials).

I have been thinking about the database design. The database will need to store things like chromosome, bp, gene, exon, strand, cytogenetic band, individual in which the variant was discovered, validation status, GERP score, PolyPhen prediction, and perhaps more information that I haven't thought of yet. Does anyone have any ideas for an optimal db design in this instance? Should I use separate tables for Variant, Gene, and Individual, and then use relational tables (is that what they are called?) to put all the info together for the end user?

Any suggestions are welcome, and also if you know some better tools than python, django, and mysql, please let me know. Cheers, Davy.

denovo database python mysql • 3.4k views

If you want a subjective opinion, I am currently partial to Flask instead of Django and PostgreSQL rather than MySQL.

12.4 years ago

In my bookmarks: LOVD:

"Leiden Open (source) Variation Database."

LOVD's purpose: to provide a flexible, freely available tool for gene-centered collection and display of DNA variations.

http://www.lovd.nl/2.0/

12.4 years ago

Not knowing your level of expertise, here are some thoughts from my own database experience. First, you should try answering the following questions:

  • which fields belong to the same category (i.e. describe the characteristics of another field)?
  • which fields of the db will be queried by users?
  • which combination of fields will be queried most frequently?

These answers should lead you to the database design that is most appropriate for your situation.

Briefly, each table should describe the features of one kind of entity. For instance, if you have a 'Gene' table, each row should describe one gene: its chromosome, strand, ORF position, ... It may sound obvious, but conceptually separating each entity's description into its own table will give you a sane design. Then, depending on the expected usage, try to avoid having to join tables for routine queries. If a particular join is needed frequently, consider keeping an intermediate table that stores the result of that join.
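To make the above concrete, here is a minimal sketch of that layout. It uses Python's built-in sqlite3 module purely so the example is self-contained; the question mentions MySQL, and the same idea carries over. All table names, column names, and sample values (gene, individual, variant, variant_report, GENE_A, IND_001) are made up for illustration and are not taken from any real project.

```python
import sqlite3

# Hypothetical schema: one table per kind of entity, linked by foreign keys.
SCHEMA = """
CREATE TABLE gene (
    gene_id    INTEGER PRIMARY KEY,
    symbol     TEXT NOT NULL UNIQUE,
    chromosome TEXT NOT NULL,
    strand     TEXT NOT NULL CHECK (strand IN ('+', '-'))
);
CREATE TABLE individual (
    individual_id INTEGER PRIMARY KEY,
    label         TEXT NOT NULL UNIQUE
);
CREATE TABLE variant (
    variant_id        INTEGER PRIMARY KEY,
    gene_id           INTEGER NOT NULL REFERENCES gene(gene_id),
    individual_id     INTEGER NOT NULL REFERENCES individual(individual_id),
    position_bp       INTEGER NOT NULL,
    validation_status TEXT NOT NULL DEFAULT 'unvalidated'
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up gene, individual and variant."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO gene (gene_id, symbol, chromosome, strand) VALUES (1, 'GENE_A', '1', '+')"
    )
    conn.execute("INSERT INTO individual (individual_id, label) VALUES (1, 'IND_001')")
    conn.execute(
        "INSERT INTO variant (gene_id, individual_id, position_bp, validation_status) "
        "VALUES (1, 1, 123456, 'validated')"
    )
    # Intermediate table that stores the result of the frequently needed join.
    # It is a snapshot: it has to be rebuilt whenever the base tables change.
    conn.execute(
        """
        CREATE TABLE variant_report AS
        SELECT v.variant_id, g.symbol, g.chromosome, g.strand,
               v.position_bp, i.label AS individual, v.validation_status
        FROM variant AS v
        JOIN gene AS g ON g.gene_id = v.gene_id
        JOIN individual AS i ON i.individual_id = v.individual_id
        """
    )
    conn.commit()
    return conn


def variants_for_gene(conn, symbol):
    """Return (chromosome, position, individual, status) rows for one gene symbol."""
    cursor = conn.execute(
        "SELECT chromosome, position_bp, individual, validation_status "
        "FROM variant_report WHERE symbol = ? ORDER BY position_bp",
        (symbol,),
    )
    return cursor.fetchall()
```

Calling variants_for_gene(build_demo_db(), "GENE_A") reads from the pre-joined table without touching the three base tables. If keeping a snapshot up to date is a burden, a database view over the same SELECT is a possible alternative that trades query speed for freshness.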

12.4 years ago
Christian ★ 3.1k

You might find this useful:

Human variation database: an open-source database template for genomic discovery

http://bioinformatics.oxfordjournals.org/content/27/8/1155.abstract


This is an old question but I am looking at setting up a local DB for variations from local exome sequencing projects which will allow us to see rare variants that are due to our local population and are not causal disease variants. I looked at using this tool but following the website's instructions it really seems like it just doesn't work. Anyone successfully using this?


Have you tried contacting the author?


Not yet, decided to cut my losses and just implement my own database. Was going to be quicker and easier for my needs anyway.

12.4 years ago

Before going off to build something from scratch, you should look at using BioMart. It may suit your needs and will be less work than a do-it-yourself solution, though it is also less flexible.
