Question

CWL: reading files within an expressionTool

8.7 years ago

karl.nordstrom ▴ 90

I am trying to convert a csv-file to a set of arrays with an expressionTool and have a piece of javascript that executes as intended when calling:

node javaScript.js

Because I lack experience with JavaScript, I rely on solutions found by searching, and when the script is executed as part of a CWL pipeline it crashes. The problematic line is:

var fs = require('fs')

It results in a ReferenceError for require. The reason I have found seems to point toward fs being a server side feature, and I can only guess, but perhaps cwl runs the script as a client-script?

The alternative method I found included FileReader, but that doesn't seem to be part of the node environment.

Is there a correct way of doing this? I'm at a loss...

cwl common-workflow-language javascript • 6.1k views

updated 8.7 years ago by alaindomissy ▴ 160 • written 8.7 years ago by karl.nordstrom ▴ 90

score 10 · Accepted Answer · 2016-12-09

The require function is a feature available in nodejs ("server side javascript") to import other javascript modules into the current javascript file.

When the InlineJavascriptRequirement requirement is used in a CWL CommandLineTool or ExpressionTool, the CWL engine tries to locate a JavaScript interpreter. If you use cwltool and have nodejs installed, the JavaScript code included in your CommandLineTool or ExpressionTool is passed to nodejs for execution. However, I do not think that such JavaScript code can import other nodejs modules by calling the require function.
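To illustrate why the call fails, here is a small sketch that can be run with plain node. It assumes the expression is evaluated in a fresh, empty context, similar in spirit to what a CWL runner may do; the actual sandbox used by cwltool may differ in detail. The name sandboxInputs is made up for this illustration and only stands in for the inputs object a runner would provide.

```javascript
// Sketch: code evaluated in a fresh context does not see Node's
// module-level helpers such as `require`.
const vm = require('vm');

// Hypothetical stand-in for the `inputs` object a CWL runner would provide.
const sandboxInputs = { datafile: { basename: 'data.csv', contents: 'A,B\nC,D' } };

// `require` is not a global inside the new context.
const requireType = vm.runInNewContext('typeof require', { inputs: sandboxInputs });

// Calling it there therefore throws a ReferenceError.
let errorName = null;
try {
  vm.runInNewContext("var fs = require('fs')", { inputs: sandboxInputs });
} catch (err) {
  errorName = err.name;
}

// Plain JavaScript that only touches the provided `inputs` object still works.
const firstLine = vm.runInNewContext(
  "inputs.datafile.contents.split('\\n')[0]",
  { inputs: sandboxInputs }
);

console.log(requireType, errorName, firstLine);
```

The last call shows the direction taken in the rest of this answer: anything that can be computed from the inputs object with built-in string and array methods needs no module import.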

One way to work around the lack of the require function is to implement the needed processing entirely in JavaScript code included directly as an expression in your CommandLineTool or ExpressionTool.

Here is an example in which a piece of JavaScript code parses the contents of a csv file into an object whose keys are line numbers and whose values are arrays of strings, one array per line of the csv file.

Let's assume this csv file:

data.csv

A,B,C,D
E,F,G,H
I,J,K,L

The cwl job file is:

expression.yaml

#!/usr/bin/env cwltool

cwl:tool: expression.cwl

datafile:
  class: File
  path: data.csv

The expression tool file is:

expression.cwl

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: ExpressionTool

requirements:
  - class: InlineJavascriptRequirement

inputs:
  datafile:
    type: File
    inputBinding:
      loadContents: true

outputs:
  filename:
    type: string
  filecontent:
    type: string
  processedoutput:
    type: Any

expression: |
  ${
    /* inputs.datafile.contents is populated because loadContents is true */
    var lines = inputs.datafile.contents.split('\n');
    var nblines = lines.length;
    var arrayofarrays = [];
    var setofarrays = {};
    for (var i = 0; i < nblines; i++) {
      arrayofarrays.push(lines[i].split(','));
      setofarrays[i] = lines[i].split(',');
    }
    /* filename and filecontent are optional and only aid exploration */
    return { 'processedoutput': setofarrays,
             'filename': inputs.datafile.basename,
             'filecontent': inputs.datafile.contents };
  }

This will produce the following results:

Final process status is success
{
    "processedoutput": {
        "1": [
            "E", 
            "F", 
            "G", 
            "H"
        ], 
        "0": [
            "A", 
            "B", 
            "C", 
            "D"
        ], 
        "2": [
            "I", 
            "J", 
            "K", 
            "L"
        ]
    }, 
    "filecontent": "A,B,C,D\nE,F,G,H\nI,J,K,L", 
    "filename": "data.csv"
}

The two outputs filename and filecontent are not necessary, but may help with exploring how this works.

The question described the desired data structure for the result as a "set of arrays". An example of the csv file and of the desired result would help. As it stands, I am not sure whether "set" referred to the Set class available in ECMAScript 6 (a recent version of JavaScript). The JSON types available for CWL outputs include arrays and objects, so the example shows how to convert the csv file content into an object whose keys are the line numbers and whose property values are arrays of strings. If an array of arrays is desired instead, change the return statement so that processedoutput is set to arrayofarrays instead of setofarrays.
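For trying both shapes outside of CWL, the same logic can be sketched as a standalone function and run with plain node. The helper name parseCsv is hypothetical and not part of CWL or cwltool. Dropping a trailing newline and accepting Windows line endings are assumptions added here, and the simple comma split does not handle quoted fields.

```javascript
// Sketch: the parsing logic from the expression as a standalone function.
// `parseCsv` is a hypothetical helper name, not part of CWL or cwltool.
function parseCsv(contents) {
  // Assumption: a trailing newline should not produce an empty last row.
  var lines = contents.replace(/\r?\n$/, '').split(/\r?\n/);
  var arrayofarrays = [];
  var setofarrays = {};
  for (var i = 0; i < lines.length; i++) {
    var fields = lines[i].split(',');
    arrayofarrays.push(fields);   // array of arrays, in line order
    setofarrays[i] = fields;      // object keyed by line number
  }
  return { arrayofarrays: arrayofarrays, setofarrays: setofarrays };
}

var parsed = parseCsv('A,B,C,D\nE,F,G,H\nI,J,K,L\n');
console.log(JSON.stringify(parsed.arrayofarrays));
console.log(JSON.stringify(parsed.setofarrays));
```

Since the function body uses only built-in string and array methods, it should carry over into the expression with inputs.datafile.contents passed in place of the literal string.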

I hope this helps...