Warning message: [outdir] is not a file, ignoring
7.4 years ago

I am familiarizing myself with Docker, Common Workflow Language, and Dockstore. I've created a simple wrapper for the MEGAHIT assembler, and followed Dockstore's instructions for building a Docker image, describing the tool in CWL, and specifying inputs and outputs with JSON.

Here is the tool description in CWL.

class: CommandLineTool
doc: MEGAHIT assembler
id: megahit-v1.1.1
label: "MEGAHIT v1.1.1"

cwlVersion: v1.0

dct:creator:
  "@id": "http://orcid.org/0000-0003-0342-8531"
  foaf:name: Daniel Standage
  foaf:mbox: "mailto:daniel.standage@gmail.com"

requirements:
  - class: DockerRequirement
    dockerPull: "quay.io/standage/megahit:latest"

hints:
  - class: ResourceRequirement
    coresMin: 4
    ramMin: 2048

inputs:
  intfasta:
    type: File
    doc: "Interleaved Fastq file to be assembled"
    inputBinding:
      position: 1
      prefix: --12
  threads:
    type: int
    doc: "Number of threads"
    inputBinding:
      position: 2
      prefix: --num-cpu-threads

outputs:
  megahit_out:
    type: Directory
    doc: "Assembly output"
    outputBinding:
      glob: megahit_out

baseCommand: ["megahit"]

And here is my parameter configuration.

{
  "intfasta": {
    "path": "/tmp/reads.fastq.gz",
    "class": "File"
  },
  "megahit_out": {
    "path": "/tmp/megahit_out",
    "class": "Directory"
  },
  "threads": 4
}

Using the command recommended by Dockstore, I can successfully execute the workflow.

dockstore tool launch --entry quay.io/standage/megahit-dockstore:latest --json Dockstore.json

However, at the end of the output I see the following message.

--- [STAT] 117 contigs, total 4577548 bp, min 220 bp, max 246618 bp, avg 39124 bp, N50 105709 bp                                    
        --- [Fri Jun 23 19:40:56 2017] ALL DONE. Time elapsed: 1562.705876 seconds ---
        [job temp6452337804155837687.cwl] completed success           
        Final process status is success

Saving copy of cwltool stdout to: /Users/standage/Software/megahit-dockstore/./datastore/launcher-aaf3f88d-7e32-4bdd-821e-c59b82aa75bf/outputs/cwltool.stdout.txt
Saving copy of cwltool stderr to: /Users/standage/Software/megahit-dockstore/./datastore/launcher-aaf3f88d-7e32-4bdd-821e-c59b82aa75bf/outputs/cwltool.stderr.txt

Provisioning your output files to their final destinations
/Users/standage/Software/megahit-dockstore/./datastore/launcher-aaf3f88d-7e32-4bdd-821e-c59b82aa75bf/outputs/megahit_out is not a file, ignoring

The last line is the bit that concerns me. I assume Directory is a valid output type. I couldn't confirm this from the documentation, but the Dockstore and CWL tooling seemed to handle it fine. Why is the output directory not being moved or copied to the destination specified in the JSON file?

cwl dockstore • 2.1k views
7.4 years ago
denis.yuen ▴ 100

Hi, that's actually a missing Dockstore feature that we encountered recently and have been working on adding. Simply put, the Directory type was added in v1.0 of CWL, and we didn't notice the ability to designate a whole directory as an output (as opposed to, say, an array of files or a file with secondary files).

The temporary workaround would be to use an array of files matched with a wildcard (http://www.commonwl.org/v1.0/UserGuide.html#Array_outputs) or secondary files (https://github.com/ga4gh/dockstore/blob/1.2.3/dockstore-client/src/test/resources/integrate/cwl/samtools_index.cwl), if you have some understanding of the tool's output.

Alternatively, the tool will still work as a basic CWL tool, just without file provisioning.
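
As a rough sketch of the array-of-files workaround, the `outputs` section of the tool description in the question could be changed along the following lines. The output name `megahit_files` and the glob pattern are illustrative assumptions, not taken from Dockstore's documentation; the pattern assumes the files of interest sit directly inside `megahit_out` and end in `.fa`.

```yaml
outputs:
  # Hypothetical output name; replaces the Directory-typed megahit_out output.
  megahit_files:
    type: File[]
    doc: "FASTA files written to the top level of megahit_out"
    outputBinding:
      # Assumed pattern: widen or narrow it to match the files you want provisioned.
      glob: "megahit_out/*.fa"
```

A pattern that also matches subdirectories may be rejected, since those entries are not of type File, so it is safer to glob for specific file names or extensions. The parameter JSON would need an entry for the new output name in place of `megahit_out`.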

7.4 years ago

Hey Daniel Standage,

While Directory type outputs are a real timesaver, it would be more idiomatic to label each of the outputs, especially when they are as different as in this case (as opposed to a tool that outputs many files that all hold the same type of content).

I offer the following as both a conceptual and technical fix to your problem:

outputs:
  contigs:
    type: File
    outputBinding:
      glob: megahit_out/final.contigs.fa
  # and so on for each component of interest

Yeah, I was afraid this might be the "correct" answer. Indeed, the contigs are the output of primary interest. But in cases where an assembler behaves strangely, it's useful to have all of the other ancillary and intermediate files around as well for troubleshooting. Declaring each of these files explicitly 1) requires a familiarity with the assembler that I don't yet have or need, and 2) has the potential to communicate an interest in these files that under normal circumstances isn't there.

But maybe I'm thinking too hard, and just grabbing the contigs should probably be sufficient for the majority of cases.

+1. Thanks!
