The require function is a feature available in Node.js ("server-side JavaScript") for importing other JavaScript modules into the current JavaScript file.
When you use the InlineJavascriptRequirement requirement in a CWL CommandLineTool or ExpressionTool, the CWL engine tries to locate a JavaScript interpreter. If you use cwltool and have Node.js installed, the JavaScript code included in your CommandLineTool or ExpressionTool is passed to Node.js for execution. However, I do not think that such JavaScript code can import other Node.js modules by calling the require function.
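To illustrate why require may be missing, here is a minimal sketch. It assumes the engine evaluates each expression in a fresh, isolated context, similar to what Node's built-in vm module provides; whether cwltool does exactly this is an assumption here, not something the answer confirms. The helper name evalInSandbox is hypothetical.

```javascript
// Sketch only: evalInSandbox is a hypothetical helper, not cwltool's actual code.
const vm = require("vm");

function evalInSandbox(expression, inputs) {
  // Only the names placed in this object are visible to the expression,
  // so Node.js globals such as require are not defined inside it.
  return vm.runInNewContext(expression, { inputs: inputs });
}

// An expression that only touches `inputs` works as expected.
console.log(evalInSandbox("inputs.datafile.basename", { datafile: { basename: "data.csv" } }));

// Inside the isolated context, require does not exist.
console.log(evalInSandbox("typeof require", {}));
```

Under that assumption, an expression can only use plain JavaScript plus whatever the engine passes in, such as inputs.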
One way to work around the lack of the require function is to implement the needed processing entirely in the JavaScript code included directly as an expression in your CommandLineTool or ExpressionTool.
Here is an example in which a piece of JavaScript code parses the contents of a CSV file into an object whose keys are line numbers and whose values are arrays of strings, one array per line of the CSV file.
Let's assume this CSV file:
data.csv
A,B,C,D
E,F,G,H
I,J,K,L
The cwl job file is:
expression.yaml
#!/usr/bin/env cwltool
cwl:tool: expression.cwl
datafile:
  class: File
  path: data.csv
The expression tool file is:
expression.cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: ExpressionTool
requirements:
  - class: InlineJavascriptRequirement
inputs:
  filename:
    type: string
    outputBinding:
      outputEval: $(inputs.datafile.basename)
  filecontent:
    type: string
    outputBinding:
      outputEval: $(inputs.datafile.contents)
  datafile:
    type: File
    inputBinding:
      loadContents: true
outputs:
  processedoutput:
    type: Any
expression: "${var lines = inputs.datafile.contents.split('\\n');
  var nblines = lines.length;
  var arrayofarrays = [];
  var setofarrays = {};
  for (var i = 0; i < nblines; i++) {
    arrayofarrays.push(lines[i].split(','));
    setofarrays[i] = lines[i].split(',');
  }
  return { 'processedoutput': setofarrays } ;
  }"
This will produce the following results:
Final process status is success
{
    "processedoutput": {
        "1": [
            "E",
            "F",
            "G",
            "H"
        ],
        "0": [
            "A",
            "B",
            "C",
            "D"
        ],
        "2": [
            "I",
            "J",
            "K",
            "L"
        ]
    },
    "filecontent": "A,B,C,D\nE,F,G,H\nI,J,K,L",
    "filename": "data.csv"
}
The two outputs filename and filecontent are not necessary, but may help with exploring how this works.
The question described the desired data structure for the result as a "set of arrays". An example of a CSV file and the desired result might help. As it is, I am not sure whether "set" referred to the Set class available in ECMAScript 6 (a recent version of JavaScript). The JSON types available for CWL outputs include arrays and objects, so the example shows how to convert the CSV file content into an object whose keys are the line numbers and whose property values are arrays of strings. If an array of arrays is desired instead, change the return statement from return { 'processedoutput': setofarrays } ;
to return { 'processedoutput': arrayofarrays } ;
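For experimenting outside of CWL, the same parsing logic can be sketched as a plain function and run directly with Node.js. The function name parseCsvLines and its two result keys are hypothetical and only serve to show both shapes side by side; the input string is the content of data.csv from above.

```javascript
// Standalone sketch of the parsing logic used in the expression.
// parseCsvLines is a hypothetical name, not part of CWL or cwltool.
function parseCsvLines(contents) {
  var lines = contents.split("\n");
  var arrayOfArrays = [];   // [[...], [...], ...]
  var byLineNumber = {};    // { "0": [...], "1": [...], ... }
  for (var i = 0; i < lines.length; i++) {
    var fields = lines[i].split(",");
    arrayOfArrays.push(fields);
    byLineNumber[i] = fields;
  }
  return { arrayOfArrays: arrayOfArrays, byLineNumber: byLineNumber };
}

var parsed = parseCsvLines("A,B,C,D\nE,F,G,H\nI,J,K,L");
console.log(JSON.stringify(parsed.byLineNumber));
console.log(JSON.stringify(parsed.arrayOfArrays));
```

Note that this naive split does not handle quoted fields, and a file ending with a newline yields one extra entry containing a single empty string.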
I hope this helps...
This solution works very well. I wasn't aware of the loadContents option.
I aimed for something like processedoutput when I spoke of "set of arrays".
Thank you very much.
Great example, thanks a lot! Just one question: why are filename and filecontents returned in the body of processedoutput, though you did not push them into this object explicitly?
I would guess that was from an earlier version of the expression that included them for debugging purposes.
How do you save this JS somewhere so that it can be re-used in different places in your workflow?