Tool: ObjTables: tools for creating and reusing high-quality spreadsheets (e.g., supplementary tables to articles)
4.3 years ago
jonrkarr ▴ 110

Dear colleagues,

We invite you to use ObjTables (https://objtables.org), a free and open-source toolkit, to create and reuse high-quality spreadsheets, such as supplementary tables to articles.

Comparing and integrating data is essential to science. However, it is difficult to reuse many data sets, including spreadsheets, one of the most common formats.

ObjTables makes spreadsheets reusable by combining spreadsheets with schemas, an object-relational mapping system, numerous data types for scientific information, and high-level software tools. First, ObjTables enables authors to create spreadsheets with Excel and similar programs and then check those spreadsheets for errors. Second, ObjTables extends the impact of sharing data by helping other investigators compare, merge, and translate spreadsheets into data structures that can be analyzed with tools such as Python.

ObjTables is available as a web application, a command-line program, a web service, and a Python package.

We hope that you join this initiative to make supplementary tables more reusable. Please contact us to share feedback or get involved. Together, we believe we can create a robust ecosystem of reusable data for research!

More information: https://objtables.org

Web application: https://objtables.org/app

Command line program and Python package: https://pypi.org/project/obj-tables/

Web service: https://objtables.org/api

Issues: https://github.com/KarrLab/obj_tables/issues

schema-validation CSV spreadsheet XLSX • 1.2k views

This looks super interesting. Can you compare ObjTables to Frictionless Data?


Okay, I'll bite. What does it do when it sees "3-Mar" as a gene name?


ObjTables makes it possible to define validation to automatically recognize that this isn't a gene name. For some organisms, this could be done with regular expression patterns using the Regex attribute type. A more semantically-aware attribute could be added to use MyGene or similar to verify gene names. We're very interested in community feedback on types of attributes that would be helpful.
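
As a rough illustration of the regular-expression idea, here is a minimal sketch in plain Python. It uses only the standard `re` module, not the ObjTables Regex attribute API; the pattern and the function names are assumptions made up for this example, and a real pattern would depend on the organism's naming rules.

```python
import re

# Hypothetical rule: a gene symbol starts with a letter and is followed by
# letters, digits, or hyphens. Real nomenclature rules vary by organism.
GENE_SYMBOL = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_plausible_gene_symbol(value):
    """Return True if value is a string that matches the assumed pattern."""
    return isinstance(value, str) and GENE_SYMBOL.fullmatch(value) is not None


def find_suspect_cells(column):
    """Return (row_index, value) pairs for cells that fail the pattern."""
    return [(i, v) for i, v in enumerate(column)
            if not is_plausible_gene_symbol(v)]


if __name__ == "__main__":
    # "3-Mar" is what a spreadsheet program may display after converting MARCH3 to a date.
    print(find_suspect_cells(["BRCA1", "3-Mar", "SEPT2"]))
```

A pattern like this only catches values that start with a digit; a date-like value such as "Mar-3" would still pass, which is why a lookup against a gene database would be a stronger check.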

4.3 years ago
jonrkarr ▴ 110

Table Schema (Frictionless Data) is probably one of the closest general-purpose tools. Other similar domain-specific tools include IDEOM, ISA-Tab, MAGE-TAB, and SBtab. ObjTables has a few differences from Table Schema:

  • ObjTables is more focused on authors of supplementary tables to articles than on software developers.
  • ObjTables schemas directly cover spreadsheets that include multiple worksheets. Table Schema handles this more indirectly.
  • ObjTables places more emphasis on human readability. It does this by supporting multiple layout conventions, including multi-level headings (basically, this allows *-to-one relationships to be encoded into a group of columns within a table) and transposed tables (records are columns rather than rows). It also supports grammars for encoding *-to-many relationships into columns (e.g., reaction equations, mathematical equations, chemical formulae).
  • ObjTables can pretty-print spreadsheets for publication. This leverages some of the features of the XLSX format to help readers best utilize Excel and similar programs. This includes a table of contents, inline help embedded into notes, XLSX validation, highlighted header rows, etc.
  • ObjTables can also capture structured metadata about each dataset and each individual worksheet. For example, this can be used to note methods, units, the author, last updated date, etc.
  • ObjTables provides a few ways of handling comments in datasets. This includes syntax for comment rows and syntax for distinguishing between worksheets and columns controlled by the schema and additional associated unstructured worksheets and columns.
  • ObjTables provides two ways of defining schemas:
    • A simpler tabular format for simple schemas, which may be easier for many scientists. These schemas can be placed in a worksheet alongside the data.
    • A more Pythonic way similar to ORMs such as Django or SQLAlchemy.
  • ObjTables can map to higher-level Python classes such as NumPy arrays, SymPy expressions, Open Babel molecules, pint units, etc.
  • Similar to other ORMs, validation can be defined at multiple levels: attributes, instances, classes, datasets. This enables more complex validation such as checking for unique tuples, checking mass balance of chemical reactions, checking for acyclic networks, etc.
  • ObjTables provides methods for comparing, merging, and migrating datasets.
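
To make the multi-level validation point more concrete, here is a minimal sketch of three of the checks mentioned above (unique tuples, mass balance, acyclic networks), written in plain Python. This is not the ObjTables API; all function names and data layouts are assumptions chosen for illustration.

```python
from collections import Counter


def duplicate_tuples(records, keys):
    """Class-level check: return key tuples that occur in more than one record."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return sorted(t for t, n in counts.items() if n > 1)


def is_mass_balanced(reactants, products):
    """Instance-level check. Each side is a list of
    (coefficient, {element: count}) pairs."""
    def totals(side):
        total = Counter()
        for coeff, formula in side:
            for element, count in formula.items():
                total[element] += coeff * count
        return total
    return totals(reactants) == totals(products)


def has_cycle(edges):
    """Dataset-level check: depth-first search over a dict that maps each
    node to a list of its children. Returns True if a cycle exists."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for child in edges.get(node, ()):
            state = color.get(child, WHITE)
            if state == GRAY:
                return True  # back edge: child is on the current path
            if state == WHITE and visit(child):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(edges))


if __name__ == "__main__":
    h2, o2, h2o = {"H": 2}, {"O": 2}, {"H": 2, "O": 1}
    print(is_mass_balanced([(2, h2), (1, o2)], [(2, h2o)]))
    print(has_cycle({"a": ["b"], "b": ["a"]}))
```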

Maybe the Table Schema team can help clarify the differences.
