cwlexec and cwltool not interpreting baseCommand the same way
6.5 years ago
drkennetz ▴ 560

I wrote a workflow to see where a BAM file intersects with multiple BED files; each BED file corresponds to regions on a different chromosome. The output from step 2 (which uses bedtools intersect) is in the following format:

chr1    1    2    400    0
chr1    3    4    176    0
...
...
chr12    500    501    300    1
chr12    501    502    176    1

The first column is the chromosome name. The positions come from bedtools genomecov -ibam input.bam -a, which outputs all regions. Then I have another tool that runs:

bedtools intersect -a outgenome.txt -b chr12.bed -c

which outputs all positions of outgenome.txt, with column 4 being the number of reads at each position and column 5 being 0 if the position does not intersect with the BED file and 1 if it does. I then wrote a final tool to extract only the rows where column 5 is 1 (those that intersect with the BED file). The tool is:

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
requirements:
 - class: ShellCommandRequirement
inputs:
  atoBComparison:
    type: File
    inputBinding:
      position: 1
outputs:
  bRegionsNonZero:
    type: stdout
stdout: $(inputs.atoBComparison.basename)_nonZero.txt
baseCommand: [awk, "$5!=0"]
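
For reference, the filter this tool wraps can be tried outside CWL. This is a minimal sketch with made-up rows in the five-column layout described above; the variable names are only for illustration.

```shell
# Made-up sample rows (tab-separated): chr, start, end, reads, intersect flag.
sample=$(printf 'chr1\t1\t2\t400\t0\nchr12\t500\t501\t300\t1\nchr12\t501\t502\t176\t1\n')

# Keep only the rows whose fifth column is non-zero.
# The single quotes stop the shell from touching $5, so awk sees it as a field reference.
filtered=$(printf '%s\n' "$sample" | awk '$5!=0')

printf '%s\n' "$filtered"
```

Only the two chr12 rows should be printed, because their fifth column is 1.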

cwltool prints the command as it should (so awk works):

'awk' '$5!=0' '/users/dkennetz/tmp7l0ehcvy/stg2d08cfbe-1cd1-4ce2-a39a-f9c1ae65f504/PromChr12.bed_AtoB.txt' > /users/dkennetz/tmp9us6fg_b/PromChr12.bed_AtoB.txt_nonZero.txt

but when I run the workflow with cwlexec, the awk command is built differently:

/bin/sh -c 'awk $5!=0 filename'

which is read differently by the interpreter. It thinks the filename is in the awk command, causing it to fail. Any ideas about changing the tool to fix this, or does this seem like a bug?
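
A likely explanation, sketched under the assumption that the string really reaches /bin/sh as shown: inside sh -c the unquoted $5 is the shell's own fifth positional parameter, which is unset, so awk receives the program !=0 instead of $5!=0. The temporary file and variable names below are hypothetical stand-ins.

```shell
# Hypothetical stand-in for the staged input file.
tmp=$(mktemp)
printf 'chr1\t1\t2\t400\t0\nchr12\t500\t501\t300\t1\n' > "$tmp"

# What the inner shell turns the unquoted word into: $5 is unset, so it expands to nothing.
seen=$(sh -c 'printf "%s" $5!=0')
printf 'awk program after expansion: %s\n' "$seen"

# The same unquoted form handed to awk: the program is no longer valid awk.
bad_cmd='awk $5!=0 "$1"'
if sh -c "$bad_cmd" sh "$tmp" >/dev/null 2>&1; then
  bad_status=ok
else
  bad_status=failed
fi
printf 'unquoted awk call: %s\n' "$bad_status"

rm -f "$tmp"
```

If this is what happens, the failure comes from the missing quotes around the awk program rather than from awk or from the file name.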

cwl bed • 1.6k views

Thanks for this question; it has been turned into a new conformance test for the CWL standard! https://github.com/common-workflow-language/common-workflow-language/pull/701

6.5 years ago

Hello @drkennetz,

Your CWL is valid, so this is likely a bug in cwlexec that should be reported at https://github.com/IBMSpectrumComputing/cwlexec/issues

I tried the following:

  1. Removing ShellCommandRequirement, as it isn't needed → No change.
  2. Moving the $5!=0 to the arguments section → The $5!=0 appears to be quoted, but I get the same awk: line 1: syntax error at or near != error.
  3. Restoring ShellCommandRequirement with an explicit shellQuote: true (the default value) → Same as above.

This reveals that cwlexec is allowing arguments to be evaluated by the shell instead of passing them verbatim to the tool, which is contrary to the CWL standard; so please file a bug report with IBM. In the meantime, here is a workaround that works with both cwlexec and other compliant CWL implementations:

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
requirements:
 ShellCommandRequirement: {}
inputs:
  atoBComparison:
    type: File
    inputBinding:
      position: 1
outputs:
  bRegionsNonZero:
    type: stdout
stdout: $(inputs.atoBComparison.basename)_nonZero.txt
baseCommand: awk
arguments:
 - valueFrom: "'$5!=0'"
   shellQuote: false
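
With shellQuote: false, the single quotes written inside valueFrom should reach the shell unchanged, so the shell itself protects $5. Here is a rough sketch of the command line this is expected to produce, assuming the runner joins the pieces into one shell string; the temporary file stands in for the staged input and is hypothetical.

```shell
# Hypothetical stand-in for the staged atoBComparison file.
tmp=$(mktemp)
printf 'chr12\t500\t501\t300\t1\nchr12\t502\t503\t0\t0\n' > "$tmp"

# The command string as the workaround is expected to render it:
# the quotes are part of the string, so the inner shell passes $5!=0 to awk untouched.
cmd="awk '\$5!=0' $tmp"
kept=$(sh -c "$cmd")

printf '%s\n' "$kept"
rm -f "$tmp"
```

Only the row with 1 in the fifth column should survive, which is the behavior the original tool was after.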

Thanks for your diligence with the CWL stuff, Michael! I think you are a part of something pretty awesome, and the dedication you guys put into making CWL better is noticeable. Keep up the good work, and thanks for the complete answer.
