I am trying to write a wrapper for kraken2 and am struggling to express the database as a `File` with `secondaryFiles`. The database is a directory containing three files: `hash.k2d`, `opts.k2d` and `taxo.k2d`. Because of this structure, the usual `secondaryFiles` pattern syntax, which assumes the files share at least a `nameroot`, does not work. However, the specification states that `secondaryFiles` can be an expression that "must return a filename string relative to the path to the primary File, a File or Directory object with either path or location and basename fields set, or an array consisting of strings or File or Directory objects."
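For comparison, the non-expression form works by pattern. Per the CWL v1.0 spec, each leading `^` removes the last extension from the primary file's path, and the rest of the pattern is then appended. The Python sketch below implements that rule (the function name is hypothetical, and it works on the basename only for simplicity). It shows why no pattern can turn `hash.k2d` into `opts.k2d`: the result always keeps at least the nameroot `hash` as a prefix.

```python
def apply_secondary_pattern(primary_basename: str, pattern: str) -> str:
    """Apply a CWL v1.0 secondaryFiles pattern string (not an expression).

    Each leading '^' strips the last extension (the last '.' and everything
    after it); if there is no extension left, the name is unchanged.
    The remainder of the pattern is appended to what is left.
    """
    name = primary_basename
    while pattern.startswith("^"):
        pattern = pattern[1:]
        dot = name.rfind(".")
        if dot != -1:
            name = name[:dot]
    return name + pattern


if __name__ == "__main__":
    for pat in ("opts.k2d", "^opts.k2d", "^^.opts.k2d"):
        print(pat, "->", apply_secondary_pattern("hash.k2d", pat))
    # opts.k2d -> hash.k2dopts.k2d
    # ^opts.k2d -> hashopts.k2d
    # ^^.opts.k2d -> hash.opts.k2d
```

Only a name sharing the `hash` prefix is reachable, which is why the kraken2 layout needs the expression form.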
I thus tried:
```yaml
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
id: kraken2
baseCommand:
  - kraken2
inputs:
  database:
    type:
      - Directory
      - File
    label: "Kraken 2 DB"
    inputBinding:
      position: 0
      prefix: --db
      valueFrom: $(self.dirname)
    secondaryFiles: |
      ${
        let dirname = self.location.split('/').slice(0,-1).join('/');
        return [
          { class: "File", location: dirname + '/opts.k2d' },
          { class: "File", location: dirname + '/taxo.k2d' }
        ]
      }
```
But this yields the following cwltool error (cwltool version 1.0.20190831161204):

```
cwltool kraken2.cwl input1.yml
INFO /home/pvh/.virtualenvs/cwltool/bin/cwltool 1.0.20190831161204
INFO Resolved 'kraken2.cwl' to 'file:///home/pvh/Documents/code/SANBI/pvh-forks/bio-cwl-tools/kraken2/kraken2.cwl'
ERROR Got workflow error
Traceback (most recent call last):
  File "/home/pvh/.virtualenvs/cwltool/lib/python3.7/site-packages/cwltool/executors.py", line 168, in run_jobs
    for job in jobiter:
  File "/home/pvh/.virtualenvs/cwltool/lib/python3.7/site-packages/cwltool/command_line_tool.py", line 430, in job
    builder = self._init_job(job_order, runtimeContext)
  File "/home/pvh/.virtualenvs/cwltool/lib/python3.7/site-packages/cwltool/process.py", line 718, in _init_job
    discover_secondaryFiles=getdefault(runtime_context.toplevel, False)))
  File "/home/pvh/.virtualenvs/cwltool/lib/python3.7/site-packages/cwltool/builder.py", line 276, in bind_input
    bindings.extend(self.bind_input(f, datum[f["name"]], lead_pos=lead_pos, tail_pos=f["name"], discover_secondaryFiles=discover_secondaryFiles))
  File "/home/pvh/.virtualenvs/cwltool/lib/python3.7/site-packages/cwltool/builder.py", line 251, in bind_input
    self.bind_input(schema, datum, lead_pos=lead_pos, tail_pos=tail_pos, discover_secondaryFiles=discover_secondaryFiles)
  File "/home/pvh/.virtualenvs/cwltool/lib/python3.7/site-packages/cwltool/builder.py", line 332, in bind_input
    sf_location = datum["location"][0:datum["location"].rindex("/")+1]+sfname
TypeError: can only concatenate str (not "dict") to str
ERROR Workflow error, try again with --debug for more information:
can only concatenate str (not "dict") to str
```
So I presume the CWL I have is incorrect. Is there a way to specify this structure of files?
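Reading only the last frame of the traceback (`builder.py`, line 332), this cwltool release seems to assume that every value returned by a `secondaryFiles` expression is a string: it slices the primary file's `location` up to and including the last `/` and appends the value. A `File` object arrives as a dict, hence the `TypeError`. The sketch below reproduces just that one line outside cwltool; `resolve_secondary_string` and the location `file:///data/k2db/hash.k2d` are hypothetical, chosen for illustration, and are not part of cwltool's API.

```python
def resolve_secondary_string(primary_location: str, sfname: str) -> str:
    # Same slicing as the traceback line: keep the primary file's location up to
    # and including its last "/", then append the value from the expression.
    return primary_location[0:primary_location.rindex("/") + 1] + sfname


if __name__ == "__main__":
    primary = "file:///data/k2db/hash.k2d"  # hypothetical location

    # A bare filename string resolves to a sibling of the primary file.
    print(resolve_secondary_string(primary, "opts.k2d"))
    # -> file:///data/k2db/opts.k2d

    # A File object (a dict here) cannot be concatenated onto a string.
    file_obj = {"class": "File", "location": "file:///data/k2db/taxo.k2d"}
    try:
        resolve_secondary_string(primary, file_obj)
    except TypeError as err:
        print("TypeError:", err)
        # On Python 3.7+ -> TypeError: can only concatenate str (not "dict") to str
```

If this reading is right, returning plain names such as `"opts.k2d"` from the expression, which the spec describes as "relative to the path to the primary File", would avoid both the error and the need to build paths from `location`. This is an inference from the traceback, not something checked against other cwltool versions.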
And even if this can be done with some form of expression, the specification warns: "To work on non-filename-preserving storage systems, portable tool descriptions should avoid constructing new values from location, but should construct relative references using basename or nameroot instead."

Yet since `dirname` is seemingly not available to the expression, I am forced to rely on `location`. I do not know what a non-filename-preserving storage system (e.g. S3?) would look like from this perspective.
Thanks for the clarification about using expressions, as opposed to bare strings, with `secondaryFiles`. I ended up with:

which allows the `database` parameter to be either a `File` or a `Directory`. In the case of the `File` input, it looks like:

and in the case of the `Directory` it is: