I am familiarizing myself with Docker, Common Workflow Language, and Dockstore. I've created a simple wrapper for the MEGAHIT assembler, and followed Dockstore's instructions for building a Docker image, describing the tool in CWL, and specifying inputs and outputs with JSON.
Here is the tool description in CWL.
class: CommandLineTool
doc: MEGAHIT assembler
id: megahit-v1.1.1
label: "MEGAHIT v1.1.1"
cwlVersion: v1.0
dct:creator:
  "@id": "http://orcid.org/0000-0003-0342-8531"
  foaf:name: Daniel Standage
  foaf:mbox: "mailto:daniel.standage@gmail.com"
requirements:
  - class: DockerRequirement
    dockerPull: "quay.io/standage/megahit:latest"
hints:
  - class: ResourceRequirement
    coresMin: 4
    ramMin: 2048
inputs:
  intfasta:
    type: File
    doc: "Interleaved Fastq file to be assembled"
    inputBinding:
      position: 1
      prefix: --12
  threads:
    type: int
    doc: "Number of threads"
    inputBinding:
      position: 2
      prefix: --num-cpu-threads
outputs:
  megahit_out:
    type: Directory
    doc: "Assembly output"
    outputBinding:
      glob: megahit_out
baseCommand: ["megahit"]
And here is my parameter configuration.
{
  "intfasta": {
    "path": "/tmp/reads.fastq.gz",
    "class": "File"
  },
  "megahit_out": {
    "path": "/tmp/megahit_out",
    "class": "Directory"
  },
  "threads": 4
}
Using the command recommended by Dockstore, I can successfully execute the workflow.
dockstore tool launch --entry quay.io/standage/megahit-dockstore:latest --json Dockstore.json
However, at the end of the output I see the following message.
--- [STAT] 117 contigs, total 4577548 bp, min 220 bp, max 246618 bp, avg 39124 bp, N50 105709 bp
--- [Fri Jun 23 19:40:56 2017] ALL DONE. Time elapsed: 1562.705876 seconds ---
[job temp6452337804155837687.cwl] completed success
Final process status is success
Saving copy of cwltool stdout to: /Users/standage/Software/megahit-dockstore/./datastore/launcher-aaf3f88d-7e32-4bdd-821e-c59b82aa75bf/outputs/cwltool.stdout.txt
Saving copy of cwltool stderr to: /Users/standage/Software/megahit-dockstore/./datastore/launcher-aaf3f88d-7e32-4bdd-821e-c59b82aa75bf/outputs/cwltool.stderr.txt
Provisioning your output files to their final destinations
/Users/standage/Software/megahit-dockstore/./datastore/launcher-aaf3f88d-7e32-4bdd-821e-c59b82aa75bf/outputs/megahit_out is not a file, ignoring
The last line is the bit that concerns me. I assume Directory is a valid output type; I couldn't confirm this from the documentation, but the Dockstore and CWL tooling seemed to handle it fine. Why is the output directory not being moved/copied to the destination specified in the JSON file?
Yeah, I was afraid this might be the "correct" answer. Indeed, the contigs are the output of primary interest. But in cases where an assembler behaves strangely, it's useful to have all of the other ancillary and intermediate files around as well for troubleshooting. Declaring each of these files explicitly 1) requires a familiarity with the assembler that I don't yet have/need, and 2) has the potential to communicate an interest in these files that under normal circumstances isn't there.
But maybe I'm overthinking this, and just grabbing the contigs is probably sufficient for the majority of cases.
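For reference, a minimal sketch of what "just grabbing the contigs" might look like as a replacement for the outputs section of the tool description. It assumes MEGAHIT writes its final assembly to final.contigs.fa inside its default megahit_out directory; the output name contigs is hypothetical and not part of the original wrapper.

```yaml
# Hypothetical replacement for the `outputs` section of the CWL tool description.
# Assumes MEGAHIT's default output directory (megahit_out) and its default
# contig file name (final.contigs.fa).
outputs:
  contigs:                  # hypothetical output name
    type: File              # a single File rather than a Directory
    doc: "Assembled contigs"
    outputBinding:
      glob: megahit_out/final.contigs.fa
```

With this in place, the JSON parameter file would need a matching "contigs" entry of class File (with whatever destination path is wanted) in place of the "megahit_out" Directory entry.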
+1. Thanks!