CWL join two-dimensional file array output
5.5 years ago
silverspanch ▴ 20

I have a CWL workflow that scatters a list of variables over a nested workflow which returns a file array. The output is then an array of file arrays, but it needs to be flattened for the next step. How do I gather the inputs from one step, and join the multidimensional file array into a single flat array?

cwl workflow scatter • 2.7k views

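Did you consider using StepInputExpressionRequirement (https://www.commonwl.org/v1.0/Workflow.html#StepInputExpressionRequirement)? It lets you set a valueFrom expression that transforms a step input; for example, $(self[0]) turns [ [ 1, 2 ] ] into [ 1, 2 ]. Note that self[0] keeps only the first inner array, so it flattens the input only when the outer array has a single element.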
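A minimal sketch of a general version of that idea, as a fragment of the workflow document. It assumes a hypothetical downstream tool process_files.cwl with a File[] input named files and a hypothetical output named processed; make2dArray/array2d stands for the upstream nested-array output. Instead of $(self[0]), it uses a ${...} expression that concatenates every inner array, which also needs InlineJavascriptRequirement. CWL expressions are ECMAScript 5.1, so Array.prototype.flat is not available.

```yaml
requirements:
  InlineJavascriptRequirement: {}
  StepInputExpressionRequirement: {}

steps:
  processFiles:
    run: process_files.cwl   # hypothetical tool with a File[] input "files"
    in:
      files:
        source: make2dArray/array2d
        # self is the incoming File[][]; concatenating the inner arrays
        # produces a flat File[] (ES5 syntax, hence concat.apply)
        valueFrom: ${ return [].concat.apply([], self); }
    out: [processed]         # hypothetical output name
```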


Thanks, I didn't know about this functionality.


You can also use linkMerge: merge_flattened on the input of the second step.
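For reference, a sketch of the syntax. Per the CWL spec, linkMerge describes how multiple inbound links listed in source are merged, merge_flattened concatenates array-valued sources instead of nesting them, and multiple sources require MultipleInputFeatureRequirement. The step names stepA and stepB, their files outputs, and process_files.cwl are hypothetical.

```yaml
requirements:
  MultipleInputFeatureRequirement: {}

steps:
  processFiles:
    run: process_files.cwl   # hypothetical tool with a File[] input "files"
    in:
      files:
        source: [stepA/files, stepB/files]   # each source is a File[]
        linkMerge: merge_flattened            # merged into one flat File[]
    out: [processed]         # hypothetical output name
```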

5.5 years ago
Tom ▴ 540

I believe there is no built-in functionality for what you are asking.

I would suggest adding an ExpressionTool between the two workflow steps. I just tested the following and it seems to work:

  # [in the steps section of your workflow]
  arrayBusiness:
    run:
      class: ExpressionTool
      requirements:
        # ${...} expressions are only evaluated when this is in effect
        InlineJavascriptRequirement: {}
      inputs:
        arrayTwoDim:
          type:
            type: array
            items:
              type: array
              items: File
      outputs:
        array1d:
          type: File[]
      # Concatenate every inner File array into one flat list.
      # A literal block (|) keeps the JavaScript line breaks intact.
      expression: |
        ${
          var newArray = [];
          for (var i = 0; i < inputs.arrayTwoDim.length; i++) {
            for (var k = 0; k < inputs.arrayTwoDim[i].length; k++) {
              newArray.push(inputs.arrayTwoDim[i][k]);
            }
          }
          return { 'array1d': newArray };
        }
    in:
      arrayTwoDim: make2dArray/array2d
    out: [array1d]
  # [...]

It took me a while to figure out how CWL wants its nested arrays described. Make sure to set the type of the previous step's output as:

type:
  type: array
  items:
    type: array
    items: File

Otherwise cwltool will throw a fit, because the output is not compatible with the input of the ExpressionTool above. If you declare type: File[] between the workflow steps, cwltool will realize at runtime that it is actually passing a nested array to a step whose input is supposed to be an array of files, and abort.
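Putting the pieces together, here is a minimal sketch of the whole pattern. It assumes a hypothetical subworkflow per_sample.cwl that takes a sample string and returns a File[] output named files, plus a hypothetical samples workflow input. Because the step is scattered, its output is already an array of File[], which matches the nested declaration on the ExpressionTool input. The expression uses [].concat.apply in place of the explicit loops; both flatten exactly one level.

```yaml
cwlVersion: v1.0
class: Workflow

requirements:
  ScatterFeatureRequirement: {}
  SubworkflowFeatureRequirement: {}
  InlineJavascriptRequirement: {}   # inherited by the inline ExpressionTool

inputs:
  samples: string[]                 # hypothetical list of scatter values

outputs:
  allFiles:
    type: File[]
    outputSource: flatten/array1d

steps:
  perSample:
    run: per_sample.cwl   # hypothetical: input "sample", output "files" (File[])
    scatter: sample
    in:
      sample: samples
    out: [files]          # scattered, so this output is File[][]

  flatten:
    run:
      class: ExpressionTool
      inputs:
        arrayTwoDim:
          type:
            type: array
            items:
              type: array
              items: File
      outputs:
        array1d:
          type: File[]
      expression: |
        ${
          return { 'array1d': [].concat.apply([], inputs.arrayTwoDim) };
        }
    in:
      arrayTwoDim: perSample/files
    out: [array1d]
```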


Thanks Tom, this worked great! I haven't used the ExpressionTool class before, but this is exactly what I was looking for.


I'm a big fan of CWL, but I think that arrayTwoDim: type: type: array items: type: array items: File is by far the stupidest way I've ever seen for a language to declare a 2D array. Why is File[][] so hard?
