Is there a schema for CWL inputs/job files?
5.6 years ago
karl.sebby ▴ 100

One thing I really like about CWL is the ability to load CWL files into a CommandLineTool or Workflow object after generating the classes with schema-salad-tool --codegen=python CommonWorkflowLanguage.yml > cwl_classes.py. Is there a schema file that describes CWL inputs/job files, similar to CommonWorkflowLanguage.yml? I have been working with these files as dicts, but it would be super nice to be able to load them straight into a class object.

5.6 years ago

Hello Karl,

Yes, the inputs section of a CWL document is a schema for the input job object.
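To illustrate what "a schema for the input job object" means in practice, here is a minimal Python sketch (not how cwltool validates job files). It assumes the job file has already been parsed into a dict and that the inputs section is in map form; check_job and _matches are hypothetical names, and only primitive types, File/Directory, arrays, unions, defaults and the ? / [] shorthands are handled.

```python
from typing import Any, Dict, List

# Python types a loaded YAML/JSON job value may have for each CWL primitive.
# Simplification: int/long and float/double are not range-checked.
_PRIMITIVES = {
    "string": str,
    "int": int,
    "long": int,
    "float": (int, float),
    "double": (int, float),
    "boolean": bool,
}


def _matches(cwl_type: Any, value: Any) -> bool:
    """Return True if value is acceptable for a (simplified) CWL type."""
    if isinstance(cwl_type, list):  # union, e.g. ["null", "string"]
        return any(_matches(t, value) for t in cwl_type)
    if isinstance(cwl_type, dict):
        if cwl_type.get("type") == "array":
            return isinstance(value, list) and all(
                _matches(cwl_type["items"], v) for v in value)
        return False  # records and enums are out of scope for this sketch
    if not isinstance(cwl_type, str):
        return False
    if cwl_type.endswith("?"):  # shorthand for ["null", T]
        return value is None or _matches(cwl_type[:-1], value)
    if cwl_type.endswith("[]"):  # shorthand for an array of T
        return _matches({"type": "array", "items": cwl_type[:-2]}, value)
    if cwl_type == "null":
        return value is None
    if cwl_type in ("File", "Directory"):
        return isinstance(value, dict) and value.get("class") == cwl_type
    if cwl_type not in _PRIMITIVES:
        return False
    # bool is a subclass of int in Python, so reject True/False for numbers.
    if isinstance(value, bool) and cwl_type != "boolean":
        return False
    return isinstance(value, _PRIMITIVES[cwl_type])


def check_job(inputs: Dict[str, Any], job: Dict[str, Any]) -> List[str]:
    """Check a job object against an inputs section given in map form."""
    errors: List[str] = []
    for name, spec in inputs.items():
        # Map form allows "name: type" or "name: {type: ..., ...}".
        cwl_type = spec["type"] if isinstance(spec, dict) else spec
        if name not in job and isinstance(spec, dict) and "default" in spec:
            continue  # a default fills in the missing value
        if not _matches(cwl_type, job.get(name)):
            errors.append(f"{name}: {job.get(name)!r} does not match {cwl_type!r}")
    return errors


if __name__ == "__main__":
    # The inputs section of 1st-tool.cwl and the contents of echo-job.yml.
    inputs = {"message": {"type": "string", "inputBinding": {"position": 1}}}
    print(check_job(inputs, {"message": "Hello world!"}))  # → []
    print(check_job(inputs, {}))  # one error: message is missing
```

The job file is read against the inputs section the same way a record instance is read against its record type: each input name is a field, and its type constrains the value.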

Thanks! I remember coming across this at some point now...

I am testing dnanexus/dxCompiler and get this error. Is this something "CWL-like" that I can fix, or is it something I cannot do until they reply to my issue?

Command: java -jar dxCompiler-2.4.7.jar compile /Users/****/****/UKBB/analysis-workflows/definitions/pipelines/CH_exome_Final2.cwl -language CWL

Error: custom types fail with "missing definition for schema":

[error] Error creating translator for /Users///UKBB/analysis-workflows/definitions/pipelines/CH_exome_Final2.cwl java.lang.RuntimeException: missing definition for schema file:///Users/brian/Bolton/UKBB/analysis-workflows/definitions/types/labelled_file.yml#labelled_file at dx.cwl.CwlType$.inner$1(CwlType.scala:117)

I see that you've already opened an issue at that repo: https://github.com/dnanexus/dxCompiler/issues/149

That's probably the best place to ask, yes.

Does the CWL reference runner, cwltool, accept your inputs?

5.6 years ago
peter.amstutz ▴ 300

To expand a bit on what Michael said: the "inputs" section of every tool or workflow is a schema for the input object (and "outputs" is a schema for the output object), so, although I have not tried it, it is probably not much more complicated than dumping the inputs section and running the code generator on it.

Thanks. Will give it a try!

OK. I've finally gotten around to trying this, and I'm hitting some issues. To keep things simple I'm playing with the echo example, 1st-tool.cwl, and echo-job.yml from the user guide: https://github.com/common-workflow-language/user_guide/tree/gh-pages/_includes/cwl/02-1st-example. I've tried to validate the inputs section by running schema-salad-tool inputs.yml, where inputs.yml contains the inputs section exactly as it is written:

message:
  type: string
  inputBinding:
    position: 1

or after it has been loaded and then dumped/saved using the generated Python classes:

- id: file:///sandbox/echo.cwl#message
  inputBinding:
    position: 1
  type: string

Both forms fail in the same way: they are not a valid SaladRecordSchema, SaladEnumSchema, or Documentation, which led me to create the following schema, which does validate:

- name: Inputs
  documentRoot: true
  type: record
  fields:
    inputs:
      type:
        type: array
        items: Input


- name: Input
  type: record
  fields:
    message:
      type: string

Does it seem like I'm going down the right path, or am I making things more complicated than they need to be?

Hey Karl.

Yep, you are very close to a valid Schema Salad representation of the input section of 1st-tool.cwl. You'll need to add - $import: "schema_salad/metaschema/metaschema_base.yml" to the beginning of the document and leave out the Inputs record; then the following works:

$ schema-salad-tool biostars-383396.yml  echo-job.yml 
/home/michael/schema_salad/env3.7/bin/schema-salad-tool Current version: 4.5.20190815125611
Document `echo-job.yml` is valid
$ schema-salad-tool --codegen python biostars-383396.yml > cwl_utils/first_tool.py
/home/michael/schema_salad/env3.7/bin/schema-salad-tool Current version: 4.5.20190815125611
$ python -c "from cwl_utils.first_tool import load_document; print(load_document('echo-job.yml').message)"
INFO:rdflib:RDFLib Version: 4.2.2
Hello world!

Now we need to find an automated method of doing that. I've opened a feature request at https://github.com/common-workflow-language/schema_salad/issues/276 for that (maybe belongs in cwltool, we'll see).
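One possible shape for that automation, as a minimal Python sketch: it wraps a {name: {"type": ...}} field map in a schema-salad document consisting of the $import of metaschema_base.yml suggested above plus a single record marked documentRoot: true, which declares it a valid top-level type for documents (here, the job file). build_job_schema, write_job_schema, the record name JobInputs and the METASCHEMA path are all assumptions for illustration; inputs of type File or Directory would additionally need those record definitions in the schema.

```python
import json
from typing import Any, Dict, List

# Assumed location of metaschema_base.yml, as in the $import suggested above.
METASCHEMA = "schema_salad/metaschema/metaschema_base.yml"


def build_job_schema(fields: Dict[str, Any],
                     root_name: str = "JobInputs",
                     metaschema: str = METASCHEMA) -> List[Dict[str, Any]]:
    """Wrap a {name: {"type": ...}} map in a minimal schema-salad document.

    root_name is a hypothetical record name; documentRoot marks the record
    as a valid root type for documents such as the job file.
    """
    return [
        {"$import": metaschema},
        {
            "name": root_name,
            "type": "record",
            "documentRoot": True,
            "fields": fields,
        },
    ]


def write_job_schema(path: str, fields: Dict[str, Any], **kwargs: Any) -> None:
    """Write the schema as JSON, which YAML parsers generally also accept."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_job_schema(fields, **kwargs), fh, indent=2)


if __name__ == "__main__":
    # Field map for the echo example: one required string input.
    print(json.dumps(build_job_schema({"message": {"type": "string"}}), indent=2))
```

The written file would then take the place of biostars-383396.yml in the commands above: once with echo-job.yml to validate a job file, and once with --codegen python to generate the loader classes.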

Awesome! I'll give this a try and then try out some File and Directory inputs.

5.3 years ago
karl.sebby ▴ 100

Here's what I ended up with; it has been working for the cases I've tested so far.

$base: "https://w3id.org/cwl/cwl#"

$namespaces:
  cwl: "https://w3id.org/cwl/cwl#"
  sld: "https://w3id.org/cwl/salad#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"

$graph:

# items from Process.yml

- $import: metaschema_base.yml

- name: CWLType
  type: enum
  extends: "sld:PrimitiveType"
  symbols:
    - cwl:File
    - cwl:Directory

- name: File
  type: record
  docParent: "#CWLType"
  doc:
  fields:
    - name: class
      type:
        type: enum
        name: File_class
        symbols:
          - cwl:File
      jsonldPredicate:
        _id: "@type"
        _type: "@vocab"

    - name: location
      type: string?
      jsonldPredicate:
        _id: "@id"
        _type: "@id"

    - name: path
      type: string?
      jsonldPredicate:
        "_id": "cwl:path"
        "_type": "@id"

    - name: basename
      type: string?
      jsonldPredicate: "cwl:basename"

    - name: dirname
      type: string?

    - name: nameroot
      type: string?

    - name: nameext
      type: string?

    - name: checksum
      type: string?

    - name: size
      type: long?

    - name: "secondaryFiles"
      type:
        - "null"
        - type: array
          items: [File, Directory]
      jsonldPredicate: "cwl:secondaryFiles"

    - name: format
      type: string?
      jsonldPredicate:
        _id: cwl:format
        _type: "@id"
        identity: true

    - name: contents
      type: string?

- name: Directory
  type: record
  fields:
    - name: class
      type:
        type: enum
        name: Directory_class
        symbols:
          - cwl:Directory
      jsonldPredicate:
        _id: "@type"
        _type: "@vocab"

    - name: location
      type: string?
      jsonldPredicate:
        _id: "@id"
        _type: "@id"

    - name: path
      type: string?
      jsonldPredicate:
        _id: "cwl:path"
        _type: "@id"

    - name: basename
      type: string?
      jsonldPredicate: "cwl:basename"

    - name: listing
      type:
        - "null"
        - type: array
          items: [File, Directory]
      jsonldPredicate:
        _id: "cwl:listing"



- name: InputsField
  type: record
  documentRoot: true
  fields: ~

I then populate InputsField.fields with a map of name: type taken from the CWL file. For example, for a single optional input called inFiles that expects an array of Files:

- name: InputsField
  type: record
  documentRoot: true
  fields:
    inFiles:
      type:
        - "null"
        - type: array
          items: File
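The inFiles type above is what the CWL shorthand File[]? expands to. If the CWL files use the ? and [] shorthands, a small helper can expand them before populating fields. A minimal Python sketch; expand_type is a hypothetical name, nested unions are flattened, and record or enum type definitions are passed through untouched.

```python
from typing import Any, List


def expand_type(cwl_type: Any) -> Any:
    """Expand CWL type shorthands into explicit schema-salad form.

    "File[]?" -> ["null", {"type": "array", "items": "File"}]
    """
    if isinstance(cwl_type, list):  # union: expand members, flatten nested unions
        out: List[Any] = []
        for member in cwl_type:
            expanded = expand_type(member)
            for item in expanded if isinstance(expanded, list) else [expanded]:
                if item not in out:
                    out.append(item)
        return out
    if isinstance(cwl_type, dict):
        if cwl_type.get("type") == "array":
            return {**cwl_type, "items": expand_type(cwl_type["items"])}
        return cwl_type  # records/enums: left as-is in this sketch
    if isinstance(cwl_type, str):
        if cwl_type.endswith("?"):  # optional: union with "null"
            return ["null", expand_type(cwl_type[:-1])]
        if cwl_type.endswith("[]"):  # array of the base type
            return {"type": "array", "items": expand_type(cwl_type[:-2])}
    return cwl_type


if __name__ == "__main__":
    print(expand_type("File[]?"))  # → ['null', {'type': 'array', 'items': 'File'}]
```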