I wrote a workflow to see where a BAM file intersects with multiple BED files; each BED file corresponds to regions in a different chromosome. The output from step 2 (which uses bedtools intersect) is in the following format:
chr1 1 2 400 0
chr1 3 4 176 0
...
...
chr12 500 501 300 1
chr12 501 502 176 1
The first column is the chromosome name. The input to this step comes from bedtools genomecov -ibam input.bam -a, which outputs all regions. I then have another tool that runs:
bedtools intersect -a outgenome.txt -b chr12.bed -c
which outputs every position in outgenome.txt, with column 4 being the number of reads at that position and column 5 being 0 if the position does not intersect the BED file and 1 if it does. I then wrote a final tool to keep only the rows where column 5 is 1 (those that intersect the BED file). The tool is:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
requirements:
  - class: ShellCommandRequirement
inputs:
  atoBComparison:
    type: File
    inputBinding:
      position: 1
outputs:
  bRegionsNonZero:
    type: stdout
stdout: $(inputs.atoBComparison.basename)_nonZero.txt
baseCommand: [awk, "$5!=0"]
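For reference, the filter this tool wraps can be tried directly in a shell. This is only a minimal sketch: the temporary file and its rows are made-up sample data in the five-column format shown above, not real output from the workflow.

```shell
# Hypothetical sample data in the five-column format from the question.
sample=$(mktemp)
printf '%s\n' \
  'chr1 1 2 400 0' \
  'chr1 3 4 176 0' \
  'chr12 500 501 300 1' \
  'chr12 501 502 176 1' > "$sample"

# Keep only rows whose fifth column is non-zero (rows that overlap the BED file).
# The single quotes stop the shell from touching $5, so awk sees it as a field.
filtered=$(awk '$5!=0' "$sample")
printf '%s\n' "$filtered"

rm -f "$sample"
```

Only the two chr12 rows should be printed, since they are the only ones with a non-zero fifth column.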
The tool runner prints the base command as it should (so awk works):
'awk' '$5!=0' '/users/dkennetz/tmp7l0ehcvy/stg2d08cfbe-1cd1-4ce2-a39a-f9c1ae65f504/PromChr12.bed_AtoB.txt' > /users/dkennetz/tmp9us6fg_b/PromChr12.bed_AtoB.txt_nonZero.txt
but when I run the workflow, cwlexec builds the awk command differently:
/bin/sh -c 'awk $5!=0 filename'
which is read differently by the interpreter. It thinks the filename is in the awk command, causing it to fail. Any ideas about changing the tool to fix this, or does this seem like a bug?
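The difference between the two command lines can be reproduced outside CWL. The sketch below is an illustration under assumptions, not a description of what cwlexec does internally: it uses a made-up one-row file and passes the file name as a positional parameter to /bin/sh -c. With the awk program unquoted, the inner shell expands $5 itself (an unset positional parameter, so it becomes empty) and awk is handed the broken program !=0.

```shell
# Hypothetical one-row input file, for illustration only.
row_file=$(mktemp)
printf 'chr12 500 501 300 1\n' > "$row_file"

# Quoted program: the inner shell passes $5!=0 to awk untouched.
# The file name is supplied as positional parameter $1 of the inner shell.
quoted=$(/bin/sh -c "awk '\$5!=0' \"\$1\"" sh "$row_file")

# Unquoted program: the inner shell expands $5 to nothing before awk runs,
# so awk receives "!=0" as its program and exits with an error.
if /bin/sh -c 'awk $5!=0 "$1"' sh "$row_file" >/dev/null 2>&1; then
  unquoted_status=ok
else
  unquoted_status=failed
fi

printf 'quoted: %s\nunquoted: %s\n' "$quoted" "$unquoted_status"
rm -f "$row_file"
```

If this reflects what happens in the workflow, the fix would be on the quoting side: the awk program has to reach the inner shell still wrapped in single quotes.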
Thanks for this question; it has been turned into a new conformance test for the CWL standard! https://github.com/common-workflow-language/common-workflow-language/pull/701