Simple(?) workflow, can't sort out what output goes where
8.0 years ago
starkruzr ▴ 20

Hi folks,

I am trying to feed simple results from Muscle into RaxML for processing. The use case here is for cwl-runner to be used as a workflow engine for the Airavata science gateway software. We'll have Airavata run, "cwl-runner muscle-raxml.cwl --infile file.fa [--diags] --model BINGAMMA" (for example) and then have it return the result. The problem right now is that cwl-runner doesn't understand my outputs, which makes sense, because I don't really understand how CWL keeps track of outputs either! When I run it right now, I get the following:

Fornacis:science-gateway-experiment-code jtd$ cwl-runner muscle-raxml.cwl --infile unaligned.fa --diags --model BINGAMMA
/usr/local/bin/cwl-runner 1.0.20161128202906
Resolved 'muscle-raxml.cwl' to 'file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl'
Tool definition failed validation:
While checking field `outputs`
  While checking object `file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#classout`
    Field `outputSource` contains undefined reference to `raxmloutput`, tried [u'file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#classout/raxmloutput', u'file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#raxmloutput']
While checking field `steps`
  While checking object `file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#raxml`
    While checking field `in`
      While checking object `file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#raxml/raxmlinfile`
        Field `source` contains undefined reference to `intermediatefile`, tried [u'file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#intermediatefile']

The idea is for Muscle to generate a file called "intermediatefile", which is then fed into RaxML for processing. RaxML then produces several files, all of which will end in ".out" because of the arguments we provided. Sounds sort of logical, but doesn't actually work.

Here are the contents of my three CWL files.

muscle-raxml.cwl:

cwlVersion: v1.0
class: Workflow
inputs:
  infile: File
  diags: boolean
  model: string

outputs:
  classout:
    type: File
    outputSource: raxmloutput

steps:
  muscle:
    run: muscleraxml-muscle.cwl
    in:
      muscleinfile: infile
      diagsflag: diags
    out: [intermediatefile]

  raxml:
    run: muscleraxml-raxml.cwl
    in:
      raxmlinfile: intermediatefile
      raxml_model: model
    out: [raxmloutput]

muscleraxml-muscle.cwl:

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [muscle]
arguments: ["-out intermediatefile"]
inputs:
  muscleinfile:
    type: File
    inputBinding:
      position: 1
      prefix: -in
  diagsflag:
    type: boolean
    inputBinding:
      position: 2
      prefix: -diags

outputs:
  intermediatefile:
    type: File
    outputBinding:
      glob: intermediatefile

muscleraxml-raxml.cwl:

cwlVersion: v1.0
class: CommandLineTool
label: RaxML wrapper
baseCommand: raxml
arguments: ["-n out -T 2"]
inputs:
  raxmlinfile:
    type: File
    inputBinding:
      position: 1
      prefix: -s
  raxml_model:
    type: string
    inputBinding:
      position: 2
      prefix: -m

outputs:
  raxmloutput:
    type: File
    outputBinding:
      glob: "*.out"

Help?

Thanks!

8.0 years ago
alaindomissy ▴ 160

In your workflow level file (muscle-raxml.cwl):

  • raxmloutput would refer to a workflow level input (which does not exist)
  • intermediatefile would refer to a workflow level input (which does not exist)

You need to specify the workflow step where these outputs come from:

  • instead of outputSource: raxmloutput you need outputSource: raxml/raxmloutput
  • instead of raxmlinfile: intermediatefile you need raxmlinfile: muscle/intermediatefile
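
For reference, the short form raxmlinfile: muscle/intermediatefile is shorthand for a mapping with a source field, which is the field named in the validation error in the question. A minimal sketch of the raxml step written out in that long form, using the same names as the question:

```yaml
steps:
  raxml:
    run: muscleraxml-raxml.cwl
    in:
      raxmlinfile:
        # "<step id>/<output id>" points at an output of another step
        source: muscle/intermediatefile
      raxml_model:
        # a bare name points at a workflow-level input
        source: model
    out: [raxmloutput]
```

The workflow-level outputSource follows the same naming rule, which is why it becomes raxml/raxmloutput.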

ALSO: your example "cwl-runner muscle-raxml.cwl --infile file.fa [--diags] --model BINGAMMA" indicates that --diags is optional. So you need to make the corresponding input optional at the workflow level (muscle-raxml.cwl):

  • instead of diags: boolean you need diags: boolean?

as well as the tool level (muscleraxml-muscle.cwl):

  • instead of: diagsflag: type: boolean you need: diagsflag: type: boolean?
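
As far as I know, the trailing ? is just shorthand for a union with null, so the spelling below should be equivalent to boolean? in CWL v1.0. A small fragment of the tool's inputs, with the same names as in the question:

```yaml
inputs:
  diagsflag:
    # same meaning as "type: boolean?"
    # a null (omitted) or false value adds nothing to the command line
    type: ["null", boolean]
    inputBinding:
      position: 2
      prefix: -diags
```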

ALSO: the arguments field in muscleraxml-muscle.cwl needs to be a list of two strings instead of just one, since each list entry is passed to the command as a single argument:

  • instead of: ["-out intermediatefile"] you need: ["-out", "intermediatefile"]

Here are the contents of the modified CWL files.

muscle-raxml.cwl:

cwlVersion: v1.0 
class: Workflow
inputs:
  infile: File
  diags: boolean?
  model: string
outputs:
  classout:
    type: File
    outputSource: raxml/raxmloutput
steps:
  muscle:
    run: muscleraxml-muscle.cwl
    in:
      muscleinfile: infile
      diagsflag: diags
    out: [intermediatefile]
  raxml:
    run: muscleraxml-raxml.cwl
    in:
      raxmlinfile: muscle/intermediatefile
      raxml_model: model
    out: [raxmloutput]

muscleraxml-muscle.cwl:

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [muscle]
arguments: ["-out", "intermediatefile"]
inputs:
  muscleinfile:
    type: File
    inputBinding:
      position: 1
      prefix: -in
  diagsflag:
    type: boolean?
    inputBinding:
      position: 2
      prefix: -diags
outputs:
  intermediatefile:
    type: File
    outputBinding:
      glob: intermediatefile
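
The answer does not include a revised muscleraxml-raxml.cwl. Its arguments line has the same one-string problem, so a sketch of that file along the same lines might look like the following. Switching the output to File[] is an assumption on my part, not something from the answer: the question says RaxML writes several files ending in ".out", and a glob that matches more than one file would presumably not fit a single File.

```yaml
cwlVersion: v1.0
class: CommandLineTool
label: RaxML wrapper
baseCommand: raxml
# one list entry per command-line token
arguments: ["-n", "out", "-T", "2"]
inputs:
  raxmlinfile:
    type: File
    inputBinding:
      position: 1
      prefix: -s
  raxml_model:
    type: string
    inputBinding:
      position: 2
      prefix: -m
outputs:
  raxmloutput:
    # assumption: collect every matching file as an array
    type: File[]
    outputBinding:
      glob: "*.out"
```

If you go this route, classout in muscle-raxml.cwl would need type: File[] as well so that the workflow output matches the step output.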