I have a CWL workflow with lots of outputs. All of those are just written flat to the output directory. Therefore, I would like to organise outputs into subdirectories. Is anything like that possible with CWL?
Note: The two toy workflows I am playing with atm are here: https://github.com/mareq/cwl-tutor/tree/master/workflows
However, these are just toy examples I play with. This is meant as a generic question, so that I can eventually apply the answer to the real workflow.
I don't think what you ask for is possible. I tackle the issue at tool level: every tool that might provide a workflow's final output returns a directory whose name is derived from the tool's input. Your other option is to append a tool to the workflow that creates the folder structure you want.
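For the tool-level approach, a minimal sketch could look like the following. It is only an illustration: the `sh -c` command, the input name `sample`, the `_results` suffix and the `linecount.txt` file are all made up for this example and do not come from any real tool.

```yaml
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
label: Hypothetical tool that writes its results into a directory named after its input.
requirements:
  InlineJavascriptRequirement: {}
baseCommand: [sh, -c]
arguments:
  # The script creates the directory passed as $1 and writes one file into it.
  - position: 0
    valueFrom: 'mkdir -p "$1" && wc -l "$2" > "$1/linecount.txt"'
  # Placeholder that becomes $0 inside the script.
  - position: 1
    valueFrom: sh
  # Directory name derived from the input file (assumed naming scheme).
  - position: 2
    valueFrom: $(inputs.sample.nameroot + "_results")
  - position: 3
    valueFrom: $(inputs.sample.path)
inputs:
  sample:
    type: File
    label: Input file; no inputBinding, it is only passed via the arguments above.
outputs:
  results:
    type: Directory
    outputBinding:
      glob: $(inputs.sample.nameroot + "_results")
```

If such a tool is a final step, the workflow output is a single directory per input instead of loose files in the output directory.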
I realized I have a use for the latter option as well, so I wrote an ExpressionTool that can be stitched to the end of a workflow. Using scatter, it can group files and directories at the end of a workflow.
[Warning: I have not yet tested what happens if an empty array is passed to this tool as input for either or both of the arrays.]
Code redacted because it was a horrible mess. Please use the tool posted below.
I improved this; circumstances forced my hand. It turns out that passing arrays with fewer than two entries breaks CWL. I have not found a really elegant solution, and I'm inclined to think CWL simply won't allow one. So there are now additional optional inputs for single files and directories. The newname input has also become optional because it didn't play well with scatter. If you don't provide a newname, one is chosen from the names of the other inputs you feed into the tool.
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: ExpressionTool
label: Returns a directory named after inputs.newname, containing all input files and directories.
requirements:
  InlineJavascriptRequirement: {}
inputs:
  file_single:
    type: File?
    label: A single file which will be placed in the output directory.
  file_array:
    type: File[]?
    label: An array of files which will be placed in the output directory.
  directory_single:
    type: Directory?
    label: A single directory which will be placed in the output directory as a subdirectory.
  directory_array:
    type: Directory[]?
    label: An array of directories which will be placed in the output directory as subdirectories.
  newname:
    type: string?
    label: Name of the output directory. If not provided, the tool falls back to the nameroot of file_single, the basename of directory_single, the nameroot of file_array[0], or the basename of directory_array[0] (in this order).
outputs:
  pool_directory:
    type: Directory
    label: Directory where all input files and subdirectories will be pooled. Named after inputs.newname.
expression: |
  ${
    // Check if an input for newname was provided. If yes, use this as new directory name.
    var newName = "";
    var needName = true;
    if ( inputs.newname != undefined ) {
      newName = inputs.newname;
      needName = false;
    }
    // Check which input files / directories are present. Add them to the new directory.
    // If no input for newname was provided, use the name of one of the files or directories.
    var outputList = [];
    if ( inputs.file_single != undefined ) {
      outputList.push( inputs.file_single );
      if ( needName ) {
        newName = inputs.file_single.nameroot;
        needName = false;
      }
    }
    if ( inputs.directory_single != undefined ) {
      outputList.push( inputs.directory_single );
      if ( needName ) {
        newName = inputs.directory_single.basename;
        needName = false;
      }
    }
    if ( inputs.file_array != undefined ) {
      for ( var count = 0; count < inputs.file_array.length; count++ ) {
        var nextfile = inputs.file_array[count];
        outputList.push( nextfile );
      }
      // Only read element 0 if the array is not empty.
      if ( needName && inputs.file_array.length > 0 ) {
        newName = inputs.file_array[0].nameroot;
        needName = false;
      }
    }
    if ( inputs.directory_array != undefined ) {
      for ( var count = 0; count < inputs.directory_array.length; count++ ) {
        var nextdir = inputs.directory_array[count];
        outputList.push( nextdir );
      }
      // Only read element 0 if the array is not empty.
      if ( needName && inputs.directory_array.length > 0 ) {
        newName = inputs.directory_array[0].basename;
        needName = false;
      }
    }
    return {
      "pool_directory": {
        "class": "Directory",
        "basename": newName,
        "listing": outputList
      }
    };
  }
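To show how the tool above might be stitched onto the end of a workflow with scatter, here is a rough sketch. The file names `analyse.cwl` and `pool_outputs.cwl`, the step names, and the `sample`, `report` and `logs` ports are hypothetical. Only `file_single`, `directory_single` and `pool_directory` come from the tool itself.

```yaml
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
inputs:
  samples:
    type: File[]
outputs:
  grouped:
    type: Directory[]
    outputSource: pool/pool_directory
steps:
  analyse:
    # Hypothetical tool producing one File (report) and one Directory (logs) per sample.
    run: analyse.cwl
    scatter: sample
    in:
      sample: samples
    out: [report, logs]
  pool:
    # Assumed file name under which the ExpressionTool above was saved.
    run: pool_outputs.cwl
    # Pair the n-th report with the n-th logs directory.
    scatter: [file_single, directory_single]
    scatterMethod: dotproduct
    in:
      file_single: analyse/report
      directory_single: analyse/logs
    out: [pool_directory]
```

Since newname is not wired in here, each pooled directory should end up named after the nameroot of its report file, so the workflow output would be one subdirectory per sample rather than a flat list of files.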