CWL: reading files within an expressionTool
8.0 years ago

I am trying to convert a CSV file to a set of arrays with an ExpressionTool. I have a piece of JavaScript that runs as intended when called with:

node javaScript.js

Since I have little experience with JavaScript, I rely on solutions found via Google, and when the script runs as part of a CWL pipeline it crashes. The problematic line is:

var fs = require('fs')

It results in a ReferenceError for require. What I have found suggests that fs is a server-side feature. I can only guess, but perhaps CWL runs the script as a client-side script?

The alternative method I found included FileReader, but that doesn't seem to be part of the node environment.

Is there a correct way of doing this? I'm at a loss...

cwl common-workflow-language javascript • 5.7k views
8.0 years ago
alaindomissy ▴ 160

The require function is a feature available in nodejs ("server side javascript") to import other javascript modules into the current javascript file.

When InlineJavascriptRequirement is used in a CWL CommandLineTool or ExpressionTool, the CWL engine tries to locate a JavaScript interpreter. If you use cwltool and have nodejs installed, the JavaScript code included in your CommandLineTool or ExpressionTool is passed to nodejs for execution. However, I do not think such code can import other nodejs modules by calling the require function.

One way to work around the missing require function is to implement the needed processing entirely in the JavaScript code included directly as an expression in your CommandLineTool or ExpressionTool.

Here is an example in which a piece of JavaScript parses the contents of a CSV file into an object whose keys are line numbers and whose values are arrays of strings, one array per line of the CSV.

Let's assume this CSV file:

data.csv

A,B,C,D
E,F,G,H
I,J,K,L

The cwl job file is:

expression.yaml

#!/usr/bin/env cwltool

cwl:tool: expression.cwl

datafile:
  class: File
  path: data.csv

The expression tool file is:

expression.cwl

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: ExpressionTool

requirements:
  - class: InlineJavascriptRequirement

inputs:
  filename:
    type: string
    outputBinding:
      outputEval: $(inputs.datafile.basename)
  filecontent:
    type: string
    outputBinding:
      outputEval: $(inputs.datafile.contents)
  datafile:
    type: File
    inputBinding:
      loadContents: true

outputs:
  processedoutput:
    type: Any

expression: |
  ${
    // Split the loaded file contents into lines, then each line into fields.
    var lines = inputs.datafile.contents.split('\n');
    var nblines = lines.length;
    var arrayofarrays = [];  // one array of strings per line
    var setofarrays = {};    // same data, keyed by line number
    for (var i = 0; i < nblines; i++) {
      arrayofarrays.push(lines[i].split(','));
      setofarrays[i] = lines[i].split(',');
    }
    return { 'processedoutput': setofarrays } ;
  }

This will produce the following results:

Final process status is success
{
    "processedoutput": {
        "1": [
            "E", 
            "F", 
            "G", 
            "H"
        ], 
        "0": [
            "A", 
            "B", 
            "C", 
            "D"
        ], 
        "2": [
            "I", 
            "J", 
            "K", 
            "L"
        ]
    }, 
    "filecontent": "A,B,C,D\nE,F,G,H\nI,J,K,L", 
    "filename": "data.csv"
}

The two outputs filename and filecontent are not necessary, but may help with exploring how this works.

The question described the desired data structure as a "set of arrays". An example CSV file and the desired result would help. As it stands, I am not sure whether "set" referred to the Set class available in ECMAScript 6 (a recent version of JavaScript). The JSON types available for CWL outputs include arrays and objects, so the example shows how to convert the CSV file content into an object whose keys are the line numbers and whose values are arrays of strings. If an array of arrays is desired instead, change the last line of the expression from return { 'processedoutput': setofarrays } ; to return { 'processedoutput': arrayofarrays } ;
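To try the parsing logic outside of cwltool, the expression body can be wrapped in a plain function and run with node. What follows is only a sketch: the name csvToArrays and the names of the returned properties are made up for illustration and are not part of CWL or cwltool. Unlike the expression above, it also tolerates Windows line endings and skips empty lines (such as the one a trailing newline produces). It still splits naively on commas, so quoted fields containing commas are not handled.

```javascript
// Hypothetical helper (not part of CWL or cwltool): mirrors the expression
// body so that it can be exercised with plain node.
function csvToArrays(contents) {
  // Accept both Unix (\n) and Windows (\r\n) line endings.
  var lines = contents.split(/\r?\n/);
  var arrayofarrays = [];
  var setofarrays = {};
  for (var i = 0; i < lines.length; i++) {
    // Skip empty lines, e.g. the one produced by a trailing newline.
    if (lines[i] === '') { continue; }
    var fields = lines[i].split(',');
    // Key by position among the kept lines so the keys stay consecutive.
    setofarrays[arrayofarrays.length] = fields;
    arrayofarrays.push(fields);
  }
  return { setofarrays: setofarrays, arrayofarrays: arrayofarrays };
}

var result = csvToArrays('A,B,C,D\nE,F,G,H\nI,J,K,L\n');
console.log(JSON.stringify(result.arrayofarrays));
// prints [["A","B","C","D"],["E","F","G","H"],["I","J","K","L"]]
```

Keep in mind that in CWL v1.0 loadContents only reads up to the first 64 KiB of the file, so this approach is best suited to small CSV files.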

I hope this helps...


This solution works very well. I wasn't aware of the loadContents option.

I aimed for something like processedoutput when I spoke of "set of arrays".

Thank you very much.


Great example, thanks a lot! Just one question: why are filename and filecontent returned alongside processedoutput, even though you did not put them into the returned object explicitly?


I would guess that was from an earlier version of the expression that included it for debugging purposes


How do you save this JS somewhere so that it can be re-used in different places in your workflow?
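One option, as far as I know, is the expressionLib field of InlineJavascriptRequirement. It takes a list of JavaScript fragments that are made available to the expressions, and a fragment can be pulled in from a separate file with the $include directive, for example expressionLib: [ { $include: csvlib.js } ]. The sketch below shows what such a file might contain. The file name csvlib.js and the function name parseCsvContents are assumptions made up for this illustration.

```javascript
// csvlib.js (hypothetical file name), meant to be listed under expressionLib.
// Defines one helper that expressions in the including tool can call.
function parseCsvContents(contents) {
  var rows = [];
  var lines = contents.split('\n');
  for (var i = 0; i < lines.length; i++) {
    // Ignore empty lines, such as the one after a trailing newline.
    if (lines[i].length > 0) {
      rows.push(lines[i].split(','));
    }
  }
  return rows;
}
```

The expression in the tool would then shrink to something like ${ return { 'processedoutput': parseCsvContents(inputs.datafile.contents) }; }. Each tool that wants to call the helper still needs its own InlineJavascriptRequirement entry that includes the file.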
