Hi -
I would like to have a workflow with the following setup:
Step 1: creates an array of N files with specific naming conventions
Step 2: scatters over the output of step 1
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
requirements:
  - class: ScatterFeatureRequirement
inputs:
  input_file: File
steps:
  step1:
    run: step1.cwl
    in:
      input_file: input_file
    out: [output_files]
  step2:
    run: step2.cwl
    scatter: input_file
    in:
      input_file: step1/output_files
    out: [output_files]
outputs:
  final_out:
    type: File[]
    outputSource: step2/output_files
Step 1 looks something like this. The command is just a shell script that splits the input file into 4 independent files, each with a specific naming convention:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: split_file.sh
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
outputs:
  output_files:
    type: File[]
    outputBinding:
      glob:
        - $(inputs.input_file.basename).a.tmp
        - $(inputs.input_file.basename).b.txt
        - $(inputs.input_file.basename).c.fastq
        - $(inputs.input_file.basename).d.fasta
I know that I could just use glob: "*" to gather all these outputs, but I want to specifically check for the existence of each output before moving on to Step 2. When I tried the above, Step 1 returned an empty array as its output, even though the script being called did produce each output in the temp directory. If I use secondaryFiles, the workflow doesn't scatter across them. Is it currently possible to achieve something like this with CWL, and what would be the best way? As a note, I cannot currently use ExpressionTool, as it isn't supported yet by the runner we are using.
Thanks!
Thanks for confirming that the array output is set up correctly, that helps! And you were correct about the script placing its output into another directory: the output files were made, but not collected. I originally pointed glob to the files with a relative directory path or the absolute path, and those did not work, but as it turns out, I just didn't fully understand where to place things like $(runtime.outdir) and how to use directories in my script setup. Thanks so much for the response! I just started learning CWL and this really helped clarify things!
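For anyone who lands here with the same problem, here is a minimal sketch of the kind of setup this refers to. It is not the exact tool from this thread. It assumes that split_file.sh accepts a hypothetical second argument naming the directory to write into, and that it writes its four files directly into that directory. Under that assumption, the glob patterns can stay as plain file names, because they are matched relative to the tool's output directory.

```yaml
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: split_file.sh
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
arguments:
  # Assumption: split_file.sh takes the target directory as its second
  # argument. $(runtime.outdir) is the directory the runner collects
  # outputs from.
  - position: 2
    valueFrom: $(runtime.outdir)
outputs:
  output_files:
    type: File[]
    outputBinding:
      # Patterns are relative to the output directory, so no directory
      # prefix is needed as long as the script writes there.
      glob:
        - $(inputs.input_file.basename).a.tmp
        - $(inputs.input_file.basename).b.txt
        - $(inputs.input_file.basename).c.fastq
        - $(inputs.input_file.basename).d.fasta
```

If the script cannot take a directory argument, the other option is to have it write to its current working directory, since the runner starts the command inside the output directory.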