I want to stream the output of one command as the input to the second command. How do I do that using CWL?
For example:
zcat sample.fastq.gz | grep ...
When I tried using stdout, the output was captured to a temporary file instead of being streamed.
Thank you, Manisha
If you want to put it all together in a single CommandLineTool, you need to include ShellCommandRequirement. See:
http://www.commonwl.org/v1.0/CommandLineTool.html#ShellCommandRequirement
For pipes and other characters that the shell interprets, you also have to set shellQuote: false on those arguments so that they reach the shell unquoted.
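As a minimal sketch of that single-tool approach (not tested against any particular runner), the pipe can be written as an argument with shellQuote: false. The input names gzipFile, pattern and outFileName are assumptions chosen for illustration; they do not come from the spec.

```yaml
cwlVersion: v1.0
class: CommandLineTool

requirements:
  - class: ShellCommandRequirement   # run the assembled command line through a shell

inputs:
  gzipFile:          # illustrative name
    type: File
  pattern:           # illustrative name
    type: string
  outFileName:       # illustrative name
    type: string

# No baseCommand: the whole pipeline is built from the arguments, in order.
arguments:
  - zcat
  - $(inputs.gzipFile.path)
  - valueFrom: "|"
    shellQuote: false                # pass the pipe to the shell unquoted
  - grep
  - $(inputs.pattern)

stdout: $(inputs.outFileName)

outputs:
  grepOut:
    type: stdout
```

Only the "|" argument has quoting turned off; the file path and the pattern keep the default quoting, so a pattern containing spaces or shell metacharacters is still passed to grep as a single argument.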
An alternative is to write a workflow. You can mark the File inputs and outputs as streamable (streamable: true) in the tool descriptions. In principle that should give the desired behavior, but it depends on the implementation, and I'm not sure how far along runner support is.
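A hedged sketch of what the streamable flag looks like in a tool description. These are fragments, not a complete tool, and the parameter names (fileToSearch, unzippedFile, unzippedFileName) are illustrative. As far as I understand the v1.0 spec, the flag is only a hint that the file is read or written sequentially; whether a runner actually streams through a pipe is up to the implementation.

```yaml
# Fragment of the consuming tool: mark the File input as read sequentially.
inputs:
  fileToSearch:
    type: File
    streamable: true
    inputBinding:
      position: 2

# Fragment of the producing tool: mark the File output as written sequentially.
outputs:
  unzippedFile:
    type: File
    streamable: true
    outputBinding:
      glob: $(inputs.unzippedFileName)
```

My reading of the v1.0 spec is that the type: stdout shorthand already expands to a File output with streamable: true, so the explicit form is mainly needed on the input side.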
It seems the proper way is to write a workflow, which lets you keep the zcat and grep commands separate. From the intro to CWL doc: "CWL tasks are isolated and you must be explicit about your inputs and outputs." First, create tool wrappers for zcat and grep (saved as zcat.cwl and grep.cwl, the names the workflow below refers to):
# zcat.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: zcat
stdout: $(inputs.unzippedFileName)
inputs:
  gzipFile:
    type: File
    inputBinding:
      position: 1
  unzippedFileName:
    type: string
outputs:
  unzippedFile:
    type: stdout

# grep.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: grep
stdout: $(inputs.outFileName)
inputs:
  pattern:
    type: string
    inputBinding:
      position: 1
  fileToSearch:
    type: File
    inputBinding:
      position: 2
  outFileName:
    type: string
outputs:
  grepOut:
    type: stdout

Then make a workflow to put them together:
cwlVersion: v1.0
class: Workflow

############
inputs:
  GZIPFILE:
    type: File
  UNZIPPEDFILENAME:
    type: string
    default: blah  # doesn't really matter; not a permanent output
  PATTERN:
    type: string
  OUTFILENAME:
    type: string

############
outputs:
  grepOutput:
    type: File
    outputSource: grep/grepOut

############
steps:
  zcat:
    run: zcat.cwl
    in:
      gzipFile: GZIPFILE
      unzippedFileName: UNZIPPEDFILENAME
    out: [unzippedFile]
  grep:
    run: grep.cwl
    in:
      pattern: PATTERN
      fileToSearch: zcat/unzippedFile
      outFileName: OUTFILENAME
    out: [grepOut]

And finally a YML file to describe your inputs:
GZIPFILE:
  class: File
  path: test.txt.gz
# UNZIPPEDFILENAME: not needed, default given in workflow.
PATTERN: two
OUTFILENAME: zcatPipeGrepWorkflowOutput.txt

It is a bit of work this way, but once you have the wrappers you can reuse them. My test.txt.gz file contains just four lines (one, two, three, four), and the returned file contains only the matching line, 'two'. An easier way would be to write a bash script and make a tool wrapper for it, but that doesn't keep your tools isolated.
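For comparison, the bash-script route could look roughly like the sketch below. The script name zcat_grep.sh, its contents, and the input names are all hypothetical; the script is passed in as a File input so the tool still declares everything it uses.

```yaml
# Assumes a helper script (hypothetical, here called zcat_grep.sh) containing:
#   #!/bin/bash
#   zcat "$1" | grep "$2"
cwlVersion: v1.0
class: CommandLineTool
baseCommand: bash
stdout: $(inputs.outFileName)
inputs:
  script:            # the helper script itself, e.g. zcat_grep.sh
    type: File
    inputBinding:
      position: 1
  gzipFile:          # becomes $1 inside the script
    type: File
    inputBinding:
      position: 2
  pattern:           # becomes $2 inside the script
    type: string
    inputBinding:
      position: 3
  outFileName:
    type: string
outputs:
  grepOut:
    type: stdout
```

Here the pipe lives inside the script, so no ShellCommandRequirement is needed, at the cost of hiding zcat and grep behind a single opaque step.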
Can you provide an example that actually uses ShellCommandRequirement and shellQuote: false?
I know that the "right" way is to use "streamable: true", but what do you do if your CWL runner doesn't support that and you don't want to save the intermediate outputs?
I am new to CWL, so I might have the formatting/syntax wrong, but would something like this work?