CWL: Getting output file of a CommandLineTool using glob
6.4 years ago
skanwal ▴ 50

Hi,

I am trying to run the following CommandLineTool description:

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool

hints:
 DockerRequirement:
  dockerPull: quay.io/biocontainers/kallisto:0.44.0--h7d86c95_2

inputs:
 fastqs:
   type: File[]
   inputBinding: {}

 index:
   type: File
   inputBinding:
     prefix: "--index"

baseCommand: [ "kallisto", "quant" ]

arguments:
  - valueFrom: $(runtime.outdir)
    prefix: --output-dir

outputs:
 quantification:
  type: File
  outputBinding:
   glob: abundance.tsv

Using this command:

cwltool kallisto-quant.cwl kallisto-quant.json

It throws the following error:

Error collecting output for parameter 'quantification':
kallisto-quant.cwl:31:4: Did not find output file with glob pattern: '['abundance.tsv']'

The complete docker command looks fine to me and runs successfully without cwltool (with the same arguments).

The complete output of the cwltool run command is:

(cwl) 4180L-137952-M ~/Documents/UMCCR/Play/cwl-metrics/kallisto $ cwltool kallisto-quant.cwl kallisto-quant.json
/Users/kanwals/virtualenvironment/cwl/bin/cwltool 1.0.20180719140605
Resolved 'kallisto-quant.cwl' to 'file:///Users/kanwals/Documents/UMCCR/Play/cwl-metrics/kallisto/kallisto-quant.cwl'
[job kallisto-quant.cwl] /private/tmp/docker_tmp1tk42cjr$ docker \
    run \
    -i \
    --volume=/private/tmp/docker_tmp1tk42cjr:/var/spool/cwl:rw \
    --volume=/private/var/folders/_v/g24brqws13v3gtrvcz2j57ldbdw8jg/T/tmp8t7700g9:/tmp:rw \
    --volume=/Users/kanwals/Documents/UMCCR/Play/cwl-metrics/kallisto/input/fusion-1_1.fq:/var/lib/cwl/stg4024e429-2f7e-40f2-b701-5b12dd6c3cad/fusion-1_1.fq:ro \
    --volume=/Users/kanwals/Documents/UMCCR/Play/cwl-metrics/kallisto/input/fusion-1_2.fq:/var/lib/cwl/stg38cb4274-de7f-41ba-bd37-050cd3825377/fusion-1_2.fq:ro \
    --volume=/Users/kanwals/Documents/UMCCR/Play/cwl-metrics/kallisto/index/GRCh37.idx:/var/lib/cwl/stg3bc0fbee-9d95-480b-8cb6-23a2862b6f0d/GRCh37.idx:ro \
    --workdir=/var/spool/cwl \
    --read-only=true \
    --user=1457398319:2094513965 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/var/spool/cwl \
    --memory=1024m \
    quay.io/biocontainers/kallisto:0.44.0--h7d86c95_2 \
    kallisto \
    quant \
    --output-dir \
    out \
    /var/lib/cwl/stg4024e429-2f7e-40f2-b701-5b12dd6c3cad/fusion-1_1.fq \
    /var/lib/cwl/stg38cb4274-de7f-41ba-bd37-050cd3825377/fusion-1_2.fq \
    --index \
    /var/lib/cwl/stg3bc0fbee-9d95-480b-8cb6-23a2862b6f0d/GRCh37.idx

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 196,501
[index] number of k-mers: 116,739,414
[job kallisto-quant.cwl] Job error:
Error collecting output for parameter 'quantification':
kallisto-quant.cwl:31:4: Did not find output file with glob pattern: '['abundance.tsv']'
[job kallisto-quant.cwl] completed permanentFail
{}
Final process status is permanentFail

Any help to resolve this issue will be highly appreciated.

Thanks.

Common-Workflow-Language cwl • 3.1k views
6.4 years ago
inutano ▴ 30

Recent versions of cwltool set a default memory limit of 1024 MB (see "--memory=1024m" in the log), and kallisto cannot run within that amount of memory, which would explain why no abundance.tsv is found. As far as I know, cwltool 1.0.20180403145700 did not set a default resource value, but 1.0.20180711112827 does.

So you need to set the minimum memory in the CWL file, like this:

requirements:
  ResourceRequirement:
    ramMin: 4096

And it worked for me. Try it!
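
For anyone unsure where that block goes, here is a minimal sketch of the tool description from the question with the requirement merged in. It assumes nothing else in the file changes. In CWL v1.0, ramMin is given in mebibytes; the value 4096 is simply the one suggested above and may need tuning for a different index. With this in place, cwltool should pass a correspondingly larger --memory value to docker instead of the 1024m default.

```yaml
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool

requirements:
  ResourceRequirement:
    ramMin: 4096   # mebibytes; value taken from the answer, adjust as needed

hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/kallisto:0.44.0--h7d86c95_2

inputs:
  fastqs:
    type: File[]
    inputBinding: {}

  index:
    type: File
    inputBinding:
      prefix: "--index"

baseCommand: [ "kallisto", "quant" ]

arguments:
  - valueFrom: $(runtime.outdir)
    prefix: --output-dir

outputs:
  quantification:
    type: File
    outputBinding:
      glob: abundance.tsv
```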


Thanks @inutano. It did the trick :)

6.4 years ago

Hi! On the face of it, the tool is not producing abundance.tsv. When you run the tool directly (outside cwltool), do you see that .tsv file produced?

6.4 years ago
skanwal ▴ 50

Hi Kaushik,

Thanks for replying.

Yes, I have run this tool a few times, using a conda installation and also its Docker image. The tool produces three files:

abundance.h5
abundance.tsv
run_info.json

The docker command produced by cwltool (provided in the original question) also looks fine to me.

I used the following command to run kallisto using its docker image:

docker run -v $PWD:/home/kallisto quay.io/biocontainers/kallisto:0.44.0--h7d86c95_2 kallisto quant --output-dir /home/kallisto/out --index /home/kallisto/index/GRCh37.idx /home/kallisto/input/fusion-1_1.fq /home/kallisto/input/fusion-1_2.fq

And this did produce the three output files. So I am confused about what I am missing from the cwltool definition, or what the difference is between this command and the one produced by the cwltool run.


Could you try "*.tsv" for the glob please?


Same error:

[job kallisto-quant.cwl] Job error:
Error collecting output for parameter 'quantification':
kallisto-quant.cwl:31:4: Did not find output file with glob pattern: '['*.tsv']'
[job kallisto-quant.cwl] completed permanentFail
{}
Final process status is permanentFail
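
A side note for readers comparing the runs: the logged command in the question passes a relative --output-dir of out, whereas the posted description uses $(runtime.outdir). Glob patterns are resolved relative to the job's output directory, so if the tool really is run with a subdirectory such as out, the pattern has to include that directory. A minimal sketch of that variant, assuming the directory name out seen in the log (an illustration only, not a confirmed fix for the failure in this thread):

```yaml
arguments:
  - prefix: --output-dir
    valueFrom: out              # relative to the working directory, as in the log

outputs:
  quantification:
    type: File
    outputBinding:
      glob: out/abundance.tsv   # path relative to the job's output directory
```

If the tool writes straight into $(runtime.outdir), as in the posted description, the plain abundance.tsv pattern is the appropriate one.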

Are the files being produced in the working directory? Do you have Rabix Executor (https://github.com/rabix/bunny) installed, by any chance? It might be worth trying with that too.
