If so, please answer with a review and how you use it.
If you've used another ETL (extract, transform and load) system, please feel free to post.
Related Links:
- http://en.wikipedia.org/wiki/Pentaho
- Pentaho Data Integration (Kettle)
I don't know about ETL tools from other domains, but this is the process we go through when loading data into InterMine.
InterMine provides a set of scripts for reading from many standard biological data formats. These read XML, flat files or databases and translate the data into the InterMine model (based on the Sequence Ontology). You can also add your own sources, each of which provides a script plus any new classes/fields you want to add to the data model.
You configure the sources you wish to include in your data warehouse, and each one is loaded in turn. Integration and conflict resolution are both configurable.
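To make the "loaded in turn, with configurable conflict resolution" idea concrete, here is a rough sketch in plain Python. It is not InterMine code or configuration: the source names, the `PRIORITY` table and the `integrate` function are all made up for illustration, and it assumes records from different sources are matched on a shared `id` field.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]

# Hypothetical per-field priority: the source listed first wins a conflict.
PRIORITY: Dict[str, List[str]] = {
    "name": ["uniprot", "ensembl"],
    "length": ["ensembl", "uniprot"],
}


def rank(source: str, order: List[str]) -> int:
    """Position of a source in a priority list; unlisted sources rank last."""
    return order.index(source) if source in order else len(order)


def integrate(
    sources: List[Tuple[str, List[Record]]],
    priority: Dict[str, List[str]],
) -> Dict[str, Record]:
    """Load each source in turn, merging records that share an 'id'."""
    warehouse: Dict[str, Record] = {}            # id -> merged fields
    provenance: Dict[str, Dict[str, str]] = {}   # id -> field -> winning source
    for source_name, records in sources:
        for rec in records:
            key = str(rec["id"])
            merged = warehouse.setdefault(key, {})
            origin = provenance.setdefault(key, {})
            for field, value in rec.items():
                if field == "id":
                    continue
                order = priority.get(field, [])
                # Keep the stored value unless the new source ranks strictly higher.
                if field not in merged or rank(source_name, order) < rank(origin[field], order):
                    merged[field] = value
                    origin[field] = source_name
    return warehouse


# Made-up example data: two sources disagree about the same object.
sources = [
    ("ensembl", [{"id": "P1", "name": "geneA", "length": 100}]),
    ("uniprot", [{"id": "P1", "name": "GENE_A", "length": 101}]),
]
warehouse = integrate(sources, PRIORITY)
print(warehouse)  # {'P1': {'name': 'GENE_A', 'length': 100}}
```

Because priority is set per field rather than per source, the load order does not decide who wins: here `name` comes from the second source and `length` from the first.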
Why use business tools for data integration when there are better alternatives such as BioMart, InterMine and DAS?
Extraction and loading are normally done with wrappers that depend on the data source: each project will have its own associated scripts. The Bio-* projects (e.g. BioRuby, Biopython, BioPerl, BioJava) can take this role via their modules, including transforming the data. Depending on the data source, standardisation is often achieved via a reference sequence, typically a genome build or a UniProt reference.
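As a rough illustration of what such a wrapper script looks like, here is a minimal extract/transform/load sketch in Python. In practice a Bio-* module would do the parsing (Biopython's `Bio.SeqIO.parse`, for instance); the hand-written FASTA reader below is only there so the example runs with the standard library alone. The function names and the `sequence` table layout are invented for this example, and it assumes every FASTA header is non-empty.

```python
import io
import sqlite3
from typing import Iterable, Iterator, TextIO, Tuple


def extract_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Extract: yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def transform(header: str, sequence: str) -> Tuple[str, str, int]:
    """Transform: first header token becomes the id, residues are uppercased."""
    seq_id = header.split()[0]  # assumes a non-empty header
    residues = sequence.upper()
    return seq_id, residues, len(residues)


def load(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str, int]]) -> None:
    """Load: write the normalised rows into a (hypothetical) sequence table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequence "
        "(id TEXT PRIMARY KEY, residues TEXT, length INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO sequence VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(handle: TextIO, conn: sqlite3.Connection) -> None:
    """Chain the three steps for one input stream."""
    load(conn, (transform(h, s) for h, s in extract_fasta(handle)))


# Usage with in-memory stand-ins for a real file and database.
conn = sqlite3.connect(":memory:")
run_etl(io.StringIO(">seq1 demo record\nacgt\nAC\n>seq2\nggg\n"), conn)
rows = conn.execute("SELECT id, residues, length FROM sequence ORDER BY id").fetchall()
print(rows)  # [('seq1', 'ACGTAC', 6), ('seq2', 'GGG', 3)]
```

Keeping the three steps as separate functions means a new data source usually only needs a new extract function, which is roughly the role the per-project wrapper scripts play.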
Istvan, I disagree. I wish there were more BI (i.e. those kinds of tools and thinking) in bioinformatics. You'd have higher quality science. Most pharma companies use some very high quality data warehouses (sometimes built from open source tools), and they function far better than what you find in most academic research.
I think you will find that so-called "business intelligence" and bioinformatics have little in common.
Istvan Albert: Is Perl the most common method for extracting, transforming and loading data? Not looking to use Pentaho, just Kettle for data integration automation.
ETL sounds like something more applicable to medical informatics. But I imagine some of the bigger repositories do use something like that.