Perform one baseCommand for multiple files and get output for each file in CWL
5.5 years ago

Dear all,

I have a Docker image with the gzip utility, built from this Dockerfile:

FROM alpine

RUN apk update && apk add gzip

ENTRYPOINT ["unzip"]

Suppose I have two fastq.gz files and I want to decompress them using my image.

cwlVersion: v1.0
class: CommandLineTool
label: "gzip wrapper"

baseCommand: [unzip, -p]

requirements:
  - class: InlineJavascriptRequirement
  - class: DockerRequirement
    dockerImageId: gzip_wrapper
    dockerFile: 
      $import: Dockerfile
inputs:
  forward:
    type: File
    inputBinding:
      position: 0
  reverse:
    type: File
    inputBinding:
      position: 1
  output_file_name: string?

outputs:
  extracted_fastq: stdout

How can I define stdout for this purpose? Can I run a given base command on an array of files and receive an array of result files?

Many thanks.

CWL Pipeline Workflow • 1.6k views
5.4 years ago
Tom ▴ 540

Hello fellow pipeline enthusiast!

To my knowledge, it is not possible to do what you ask using only a command line tool and stdout. Two alternate solutions come to mind:

  1. Change your command line tool so that it decompresses and returns only a single file (or use this one). Embed that command line tool in a workflow and use the scatter field of the workflow step (enabled by ScatterFeatureRequirement) to scatter over an array of zipped files and receive an array of unzipped files.

  2. Let unzip output decompressed files instead of passing everything to stdout. Then return those files using glob.

Personally, I would prefer solution number one. It seems more in line with the CWL ideal of keeping everything simple and modular.
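
For comparison, solution number two might look roughly like the sketch below. It is untested and rests on several assumptions: that sh, basename and gunzip are available in the execution environment, that the inputs are named *.fastq.gz, and that the names infiles and extracted (which I made up) suit you. Backticks are used instead of $(...) in the shell loop because CWL would otherwise try to evaluate $(...) as a parameter reference.

```yaml
cwlVersion: v1.0
class: CommandLineTool
label: "gunzip several files into the output directory (sketch)"

# Assumption: sh, basename and gunzip exist in the execution environment.
baseCommand: [sh, -c]

arguments:
  # The loop writes one decompressed file per input into the working directory.
  - position: 0
    valueFrom: |
      for f in "$@"; do out=`basename "$f" .gz`; gunzip -c "$f" > "$out"; done
  # Placeholder that becomes $0 of the inline script, so "$@" holds only the files.
  - position: 1
    valueFrom: sh

inputs:
  infiles:          # hypothetical input name
    type: File[]
    inputBinding:
      position: 2   # each file is appended as a separate argument

outputs:
  extracted:        # hypothetical output name
    type: File[]
    outputBinding:
      glob: "*.fastq"   # assumes the inputs are named *.fastq.gz
```

The tool needs a shell loop and a glob pattern that depends on the file naming, which is part of why the scatter approach feels cleaner to me.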

Don't hesitate to ask further questions if there are any problems with implementing this.


Dear Tom,

Thanks for the advice.

I have decided to implement the first strategy from your list.

As a result, I have:

1) Unzip wrapper:

cwlVersion: v1.0
class: CommandLineTool
label: "unzip wrapper"

baseCommand: [gunzip, -c]
stdout: $(inputs.infile.nameroot)

requirements:
  - class: DockerRequirement
    dockerImageId: gzip_wrapper
    dockerFile: 
      $import: Dockerfile

inputs:
  infile:
    type: File
    inputBinding:
      prefix: --file

outputs:
  outfile:
    type: File
    outputBinding:
      glob: $(inputs.infile.nameroot)

2) Workflow:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
  -class: ScatterFeatureRequirement

inputs:
  in_files: File[]

outputs:
  compiled_class:
    type: File
    outputSource: unzip/outfile

steps:
  unzip:
    scatter: infile
    run: unzip.cwl
    in:
      infile: in_files
    out: [outfile]

3) YAML file:

in_files: 
  - input/test_R1.fastq.gz
  - input/test_R2.fastq.gz

Unfortunately, I received this error message:

INFO /home/maria/miniconda3/bin/cwltool 1.0.20190607183319
INFO Resolved 'workflow.cwl' to 'file:///home/maria/Bioinformatics/CWL/workflow.cwl'
ERROR Tool definition failed validation:
mapSubject '-class' value 'None' is not a dict and does not have a mapPredicate.

What's wrong with my implementation? Hope you can help me.

Best wishes, Maria.


Hi Maria,

The workflow is missing a space character between "-" and "class" in the requirements section. Also, since the unzip step uses scatter, it will return an array as output. For this reason the compiled_class entry in the outputs section of the workflow needs to be of type: File[].
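
With both changes applied, the workflow from your post would read roughly as follows. This is a sketch in which only the two commented lines differ from what you posted:

```yaml
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
  - class: ScatterFeatureRequirement   # space added between "-" and "class"

inputs:
  in_files: File[]

outputs:
  compiled_class:
    type: File[]                       # scatter yields one File per input
    outputSource: unzip/outfile

steps:
  unzip:
    scatter: infile
    run: unzip.cwl
    in:
      infile: in_files
    out: [outfile]
```

One more thing that may or may not apply to your setup: as far as I know, CWL job files describe File inputs as objects with class and path rather than as bare strings. If the runner complains about in_files, a job file along these lines might help:

```yaml
in_files:
  - class: File
    path: input/test_R1.fastq.gz
  - class: File
    path: input/test_R2.fastq.gz
```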

Cheers, Tom


I'm grateful for your assistance, Tom!

I've edited my workflow following your remarks, and now it works. :) After that, I changed my final workflow implementation: the steps are unzip_forward, unzip_reverse and target_tool_for_unzipped_reads, and the inputs are gzipped files.
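
A layout like that could be sketched as follows. This is an illustration only: the step names come from the description above, but target_tool.cwl, its inputs reads_1 and reads_2, its output result, and the workflow input and output names are hypothetical placeholders, not the actual files from this thread.

```yaml
cwlVersion: v1.0
class: Workflow

inputs:
  forward_gz: File    # hypothetical name for the gzipped forward reads
  reverse_gz: File    # hypothetical name for the gzipped reverse reads

outputs:
  final_result:       # hypothetical output name
    type: File
    outputSource: target_tool_for_unzipped_reads/result

steps:
  unzip_forward:
    run: unzip.cwl
    in:
      infile: forward_gz
    out: [outfile]

  unzip_reverse:
    run: unzip.cwl
    in:
      infile: reverse_gz
    out: [outfile]

  target_tool_for_unzipped_reads:
    run: target_tool.cwl            # hypothetical downstream tool
    in:
      reads_1: unzip_forward/outfile
      reads_2: unzip_reverse/outfile
    out: [result]
```

Because each read file gets its own step here, no scatter (and so no ScatterFeatureRequirement) is needed in this variant.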
