Hi everybody,
I have a problem with hard-coded paths in scripts that would require changing the directory before executing a command line tool.
Let's consider the following minimal example:
○ → tree
.
├── code
│   └── script.py
├── data
│   └── hi.txt
└── tool.cwl
with script.py:
#! /usr/bin/env python3
with open('../data/hi.txt', 'r') as f:
    for line in f:
        print(line)
When I run script.py from the code directory, it prints the content of hi.txt. I then try to wrap this into a CWL tool:
cwlVersion: v1.0
class: CommandLineTool
inputs:
  script:
    type: File
    inputBinding:
      position: 1
    default:
      class: File
      path: code/script.py
  data:
    type: File
    default:
      class: File
      path: data/hi.txt
outputs: []
requirements:
  InitialWorkDirRequirement:
    listing:
      - entry: $(inputs.script)
        entryname: code/script.py
      - entry: $(inputs.data)
        entryname: data/hi.txt
Running this tool fails when the script tries to open hi.txt. I can fix this by changing into the right directory at the beginning of the script:
import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))
Now I'm wondering if CWL offers a way to change the directory before executing a tool instead. I am aware that using arguments instead of hard-coded paths, or at least making the paths relative to the root of my dummy project, would help here. But let's just assume that I cannot modify the script at all.
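For illustration, one workaround that leaves script.py untouched is a small launcher that sets the working directory from the outside. The sketch below is hypothetical (the function name run_in_script_dir is made up for this example) and assumes the wrapped script is a Python script that can be run by the same interpreter as the launcher:

```python
#!/usr/bin/env python3
"""Hypothetical launcher: run a script with its own directory as the working directory."""
import os
import subprocess
import sys


def run_in_script_dir(script_path, *args):
    """Run script_path with this interpreter, using the script's directory as cwd."""
    # abspath (not realpath): if the file is a symlink, keep the directory
    # it was placed in rather than the directory of the link target.
    script_path = os.path.abspath(script_path)
    return subprocess.run(
        [sys.executable, script_path, *args],
        cwd=os.path.dirname(script_path),  # relative paths in the script resolve from here
        capture_output=True,
        text=True,
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: launcher.py path/to/script.py [args...]
    result = run_in_script_dir(*sys.argv[1:])
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    sys.exit(result.returncode)
```

In a CWL tool, such a launcher would presumably be the command that is run, with the path to script.py passed as its first argument. I have not tested that setup, so treat it as an idea rather than a confirmed solution.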
Thanks!
Hi Tom,
thanks for your remark. Yes, the two directories are created (at least when using the reference implementation, I haven't tried with any other CWL runner). The content of the execution directory for my above example looks like this:
Cheers, Andreas
Alright, good to know! Have you tried passing the script's name as an argument (like in the example above) instead of using inputBinding? Because if the directories are created as planned and no job input for script is provided, I would expect cwltool to:
1. use default values for script and data, which means passing script.py and hi.txt from the subdirectory to the tool
2. create both directories as specified
3. place script.py and hi.txt in the respective subdirectories
4. invoke the command line tool and pass script.py as the only argument
The last step is where I assume stuff goes wrong. The default value for script is a file, not the path to a file. So the command line argument is "script.py" and not "code/script.py". The script is therefore not executed in the "code" subdirectory.
In my own experience, combining relative paths with CWL tends to cause a horrible dumpster fire. Necat was the last tool which forced that stuff upon me, and what should have been a single tool wrapper became two bash scripts and a four step workflow.
Hi Tom, thanks for that suggestion, but adding the script as an argument instead of an input doesn't help.
In fact, in my above attempt the path of the script appears in the command invocation as it should (under path/to/tmp/exec-dir/code/script.py). But that doesn't change the fact that the script is executed in path/to/tmp/exec-dir and there seems to be no way to force CWL to execute the script in path/to/tmp/exec-dir/code except for changing the path inside the script.
I completely agree that hard-coded relative paths are a bad idea in CWL. I was just trying to find out to what extent any existing (potentially poorly written) codebase could be used in CWL pipelines without any modification of the code itself.
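To make the point about the working directory concrete outside of CWL, here is a small self-contained sketch. It rebuilds the dummy project in a temporary directory and runs the same script twice by its absolute path, once from the project root and once from the code directory. The helper names run_from and demo are made up for this illustration:

```python
import os
import subprocess
import sys
import tempfile

# Same body as script.py from the first post.
SCRIPT = """\
with open('../data/hi.txt', 'r') as f:
    for line in f:
        print(line)
"""


def run_from(cwd, script_path):
    """Run the script by absolute path with the given working directory; return its exit code."""
    return subprocess.run(
        [sys.executable, script_path], cwd=cwd, capture_output=True, text=True
    ).returncode


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Nest the project one level down so that "../data" seen from the
        # project root points into the fresh, empty temporary directory.
        root = os.path.join(tmp, "project")
        os.makedirs(os.path.join(root, "code"))
        os.makedirs(os.path.join(root, "data"))
        script = os.path.join(root, "code", "script.py")
        with open(script, "w") as f:
            f.write(SCRIPT)
        with open(os.path.join(root, "data", "hi.txt"), "w") as f:
            f.write("hi\n")
        from_root = run_from(root, script)  # '../data/hi.txt' is looked up outside the project
        from_code = run_from(os.path.join(root, "code"), script)  # resolves to project/data/hi.txt
        return from_root, from_code
```

Calling demo() should return a non-zero exit code for the run from the project root and 0 for the run from code, even though the script path on the command line is identical in both cases. The relative path is resolved against the working directory of the process, not against the location of the script.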