I have been asked to look into creating a database to store data relating to de novo variants found during a sequencing project at my institution. The database would then be used to centrally store all the information amassed on the discovered variants. Other members of my institution would access it via a website to download, upload, and modify information relating to these variants. I am somewhat familiar with Python, Django, and MySQL (as in I've been using Python for a number of years for simple scripting, and I've been through the Django and MySQL tutorials).
I have been thinking about the database design. The database will need to store things like chromosome, bp, gene, exon, strand, cytogenetic band, the individual in which the variant was discovered, validation status, GERP score, PolyPhen prediction, and perhaps more information that I haven't thought of yet. Does anyone have any ideas for an optimal DB design in this instance? Should I use separate tables for Variant, Gene, and Individual, and then use relational tables (is that what they are called?) to put all the info together for the end user?
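The "relational tables" in the question are usually called junction (or link) tables. Below is a minimal sketch of the separate-tables design, using Python's built-in `sqlite3` module so it runs without a MySQL server. All table and column names (`gene`, `individual`, `variant`, `variant_individual`) and the sample values are hypothetical placeholders. Putting validation status on the junction table is also an assumption, based on the idea that a de novo call is made per individual.

```python
import sqlite3

# Hypothetical normalized schema: one table per entity, plus a junction
# table linking each variant to the individual(s) it was found in.
SCHEMA = """
CREATE TABLE gene (
    id INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL UNIQUE,
    strand TEXT CHECK (strand IN ('+', '-'))
);
CREATE TABLE individual (
    id INTEGER PRIMARY KEY,
    sample_id TEXT NOT NULL UNIQUE
);
CREATE TABLE variant (
    id INTEGER PRIMARY KEY,
    chromosome TEXT NOT NULL,
    bp INTEGER NOT NULL,
    gene_id INTEGER REFERENCES gene(id),
    exon INTEGER,
    cytoband TEXT,
    gerp_score REAL,
    polyphen TEXT
);
CREATE TABLE variant_individual (
    variant_id INTEGER NOT NULL REFERENCES variant(id),
    individual_id INTEGER NOT NULL REFERENCES individual(id),
    validation_status TEXT,  -- assumption: status is per individual call
    PRIMARY KEY (variant_id, individual_id)
);
"""

# Joins the tables back into one flat row per (variant, individual) pair,
# which is the "put all the info together" view for the end user.
REPORT_SQL = """
SELECT v.chromosome, v.bp, g.symbol, v.exon, g.strand, v.cytoband,
       v.gerp_score, v.polyphen, i.sample_id, vi.validation_status
FROM variant v
JOIN variant_individual vi ON vi.variant_id = v.id
JOIN individual i ON i.id = vi.individual_id
LEFT JOIN gene g ON g.id = v.gene_id
ORDER BY v.chromosome, v.bp
"""


def build_demo_db():
    """Create an in-memory database with one placeholder record per table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gene (id, symbol, strand) VALUES (1, 'GENE_A', '+')")
    conn.execute("INSERT INTO individual (id, sample_id) VALUES (1, 'IND_001')")
    conn.execute(
        "INSERT INTO variant (id, chromosome, bp, gene_id, exon, cytoband, "
        "gerp_score, polyphen) VALUES (1, '1', 12345, 1, 3, '1p36', 2.5, 'benign')"
    )
    conn.execute("INSERT INTO variant_individual VALUES (1, 1, 'validated')")
    conn.commit()
    return conn


if __name__ == "__main__":
    for row in build_demo_db().execute(REPORT_SQL):
        print(row)
```

In Django, the same structure would typically be written as models using `ForeignKey` (variant to gene) and a `ManyToManyField` with `through=` pointing at the junction model. The ORM would then generate the tables for you.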
Any suggestions are welcome. Also, if you know of better tools than Python, Django, and MySQL, please let me know. Cheers, Davy.
If you want a subjective opinion, I am currently partial to Flask rather than Django, and PostgreSQL rather than MySQL.
cross-posted on SO: http://stackoverflow.com/questions/11181142