I need to store our NGS data in a database. Are there any open source solutions? I would prefer not to design the database and write all of the import/analysis scripts! A relational database solution would be preferable.
Thanks for your help
Sorry for the lack of clarity. I have no intention of storing the reads in a database. I would like to store variants, annotate them, and then supply users with a web interface to view them.
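To make the clarified requirement concrete, here is a minimal sketch of what a relational layout for variants and their annotations could look like, using Python's built-in sqlite3 module. The table and column names (sample, variant, genotype, annotation) and the demo row are hypothetical and chosen only for illustration; they are not taken from any of the tools mentioned in this thread.

```python
import sqlite3

# Hypothetical schema: one row per sample, one row per distinct variant,
# a genotype table linking the two, and free-form annotations per variant.
SCHEMA = """
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE variant (
    variant_id INTEGER PRIMARY KEY,
    chrom      TEXT NOT NULL,
    pos        INTEGER NOT NULL,
    ref        TEXT NOT NULL,
    alt        TEXT NOT NULL,
    UNIQUE (chrom, pos, ref, alt)
);
CREATE TABLE genotype (
    sample_id  INTEGER NOT NULL REFERENCES sample(sample_id),
    variant_id INTEGER NOT NULL REFERENCES variant(variant_id),
    genotype   TEXT NOT NULL,
    PRIMARY KEY (sample_id, variant_id)
);
CREATE TABLE annotation (
    variant_id  INTEGER NOT NULL REFERENCES variant(variant_id),
    gene        TEXT,
    consequence TEXT
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up variant."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO sample (name) VALUES (?)", ("sample_A",))
    conn.execute(
        "INSERT INTO variant (chrom, pos, ref, alt) VALUES (?, ?, ?, ?)",
        ("chr1", 12345, "A", "G"),
    )
    conn.execute(
        "INSERT INTO genotype (sample_id, variant_id, genotype) VALUES (1, 1, '0/1')"
    )
    conn.execute(
        "INSERT INTO annotation (variant_id, gene, consequence) "
        "VALUES (1, 'GENE_X', 'missense')"
    )
    conn.commit()
    return conn


def variants_for_sample(conn, sample_name):
    """Return annotated variants for one sample, ordered by position."""
    query = """
        SELECT v.chrom, v.pos, v.ref, v.alt, g.genotype, a.gene, a.consequence
        FROM genotype AS g
        JOIN sample  AS s ON s.sample_id = g.sample_id
        JOIN variant AS v ON v.variant_id = g.variant_id
        LEFT JOIN annotation AS a ON a.variant_id = v.variant_id
        WHERE s.name = ?
        ORDER BY v.chrom, v.pos
    """
    return conn.execute(query, (sample_name,)).fetchall()


conn = build_demo_db()
rows = variants_for_sample(conn, "sample_A")
```

A web front-end would then only need to run queries like the one in variants_for_sample and render the rows. The same layout should carry over to MySQL or PostgreSQL with minor type changes.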
You should probably specify what line of research you wish to pursue. NGS is too broad a term; it is unlikely that you could find a database appropriate for all types of analyses.
Do you mean to store your actual sequence data in a database? Or is it more of a LIMS-type application, where you want to store details about the samples, libraries, runs, and analyses done, with links to where the actual sequence files reside on a file server?
Good question. I wish I knew of such a tool already available. It's on my planning list, in nice MySQL... with an elegant easy-to-use GUI front-end... post on github for download for all interested... but only so many hours in the day... Has anyone already done this?
Thanks for your clarification. If you want an open source database for storing and displaying variants on the web, then the Leiden Open Variation Database might fit the bill. They "provide a flexible, freely available tool for Gene-centered collection and display of DNA variations."
Also, check out the Human Variation Database. They provide "an open-source (PostgreSQL) database for the storage and analysis of thousands of next-generation sequencing variations, a Java API to perform common functions, such as generation of standard experimental reports and graphical summaries of modifications to genes, and libraries to allow adopters of the database to quickly develop their own queries."
Your question (once clarified) was actually asked before on Biostar. See the suggestion there to adopt Ensembl's variation schema and API.
Thanks for posting this link, I wasn't aware of this project. Looks very, very interesting. How does the ISAtab format compare to standard SQL? One key feature that I would look for is the ability to pipe data from the db to various tools in Perl, Python, R, etc. for further analysis. Is there a DBI for ISAtab (I don't see one on CPAN, for example...)?
Do you really intend to store actual NGS reads in an RDBMS and access them through DBI? In my opinion, storing NGS data in a database makes no sense whatsoever: the more layers of software you introduce, the slower things usually get. The primary purpose of an RDBMS is not sequential I/O. In my experience, DBI is terribly slow, and I avoid using it for any form of data import into database tables. Stay as close as possible to the operating system, and you will generally achieve much higher throughput. Perhaps if you could specify what exactly you are planning to do, people could come up with better suggestions :-)
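For the much smaller case of variant tables (not raw reads), one common way to reduce per-row overhead during import is to batch all rows into a single transaction. The sketch below illustrates that idea with Python's sqlite3 module; it is not a benchmark and says nothing about Perl DBI specifically. The four-column tab-delimited input (chrom, pos, ref, alt) and the function name bulk_load_variants are assumptions made for the example, not a real file format or API.

```python
import sqlite3


def bulk_load_variants(conn, lines):
    """Parse tab-delimited lines and insert them in one transaction.

    Assumes a hypothetical 4-column layout: chrom, pos, ref, alt.
    Blank lines and lines starting with '#' are skipped.
    Returns the number of rows inserted.
    """
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, ref, alt = line.rstrip("\n").split("\t")[:4]
        rows.append((chrom, int(pos), ref, alt))

    # "with conn" wraps the batch in a single transaction: it commits on
    # success and rolls back if any insert raises.
    with conn:
        conn.executemany(
            "INSERT INTO variant (chrom, pos, ref, alt) VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variant (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT)")

demo_lines = [
    "#chrom\tpos\tref\talt\n",
    "chr1\t100\tA\tG\n",
    "chr2\t200\tC\tT\n",
]
loaded = bulk_load_variants(conn, demo_lines)
```

For very large imports, the database's native bulk loader (for example COPY in PostgreSQL or LOAD DATA in MySQL) is usually the closer match to the "stay close to the operating system" advice above.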