Question

How can I specify globally that the shell sessions for all tasks must source (e.g.) $HOME/.profile?

Asked 15 months ago by kynnjo

As far as I can tell, at least by default, the shell sessions that run the command section of a WDL task do not source any of the user-specific shell initialization files, such as $HOME/.profile, $HOME/.bash_profile, or $HOME/.bashrc.

This makes it difficult to implement set-up steps that should run uniformly in every task.

Of course, one can always include the boilerplate line

source $HOME/.profile

at the beginning of each and every task, but this is tedious and therefore error-prone.

Hence, I am looking for some alternative way to achieve the same thing globally for all the tasks in a WDL.

I stress that I am looking for a solution that (a) applies to all the tasks within the lexical scope of a single WDL file, and (b) can be achieved within the WDL itself (as opposed to through external files such as an options.json file). The rationale is the same for both these requirements: namely, to keep the WDL self-contained.

One additional requirement is that the solution should be consistent with running the workflows through Terra.

(I do not rule out a solution at the level of the Docker image itself, but I don't think one is likely. I need to source a configuration file at the beginning of every shell session because it implements settings that cannot be known at build time and therefore must be deferred until run time.)

EDIT: $HOME/.profile is just an example. I don't care what the name of the file is. $HOME/setup, $HOME/foo42, or whatever else you like. The only requirement is that I don't have to include something like source $WHATEVER at the beginning of every task.

Tags: cromwell, terra, wdl, docker

Last updated 15 months ago by Patrick Magee

Is there any chance you could have the workflow use login sessions (bash -l)?

Comment by Ram, 15 months ago
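To illustrate the distinction behind this suggestion, here is a small local sketch (not Cromwell- or Terra-specific; the variable name MY_SETUP is hypothetical). It uses a throwaway HOME that contains only a .profile: a plain non-interactive bash ignores the file, while a shell started with -l behaves as a login shell and reads it. Whether a given engine or backend can be made to launch task commands as login shells is not shown here.

```shell
# Throwaway HOME containing only a .profile, so the real profile is untouched
demo_home=$(mktemp -d)
printf 'export MY_SETUP=loaded\n' > "$demo_home/.profile"

# Plain non-interactive bash: reads no startup files (BASH_ENV is unset)
non_login=$(env -u MY_SETUP -u BASH_ENV HOME="$demo_home" bash -c 'echo "${MY_SETUP:-unset}"')

# Login shell (-l): reads /etc/profile, then the first of
# ~/.bash_profile, ~/.bash_login, ~/.profile that exists
login=$(env -u MY_SETUP -u BASH_ENV HOME="$demo_home" bash -l -c 'echo "${MY_SETUP:-unset}"' | tail -n 1)

echo "bash -c:    $non_login"
echo "bash -l -c: $login"
rm -rf "$demo_home"
```

The `tail -n 1` keeps only the final line in case the system's /etc/profile prints anything of its own.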

Have you tried adding something to the Cromwell configuration for your particular backend? See https://github.com/broadinstitute/cromwell/blob/develop/cromwell.example.backends/cromwell.examples.conf

Comment by Jeremy Leipzig, 15 months ago

score 0 · Answer 1 · 2023-08-23

I would avoid using $HOME/.profile because it acts as a hidden input, which hurts reproducibility. Instead, I would make the setup an explicit input that is part of the WDL:

version 1.0

workflow my_workflow {
  input { String? task_setup }
  call my_task { input: setup = task_setup }
  output { Array[File] files = my_task.files }
}

task my_task {
  input { String? setup }
  command <<<
    # Run the caller-supplied setup commands first (expands to nothing if task_setup is omitted)
    ~{setup}
  >>>
  output { Array[File] files = glob("*") }
  runtime { docker: "debian:bookworm-slim" }
}

Then running it with miniwdl:

$ miniwdl run setup.wdl 'task_setup=touch foo.txt && touch bar.txt'
#### MUCH OUTPUT REMOVED ####
{
  "dir": "/Users/username/wdl_setup/20230823_144257_my_workflow",
  "outputs": {
    "my_workflow.files": [
      "/Users/username/wdl_setup/20230823_144257_my_workflow/out/files/0/bar.txt",
      "/Users/username/wdl_setup/20230823_144257_my_workflow/out/files/1/foo.txt"
    ]
  }
}

The task_setup input is, of course, a form of code injection, so it has security implications. You can't run this as a service where arbitrary people can provide inputs, but if only trusted users run it, it may be acceptable.

I am not a Terra user, so I don't know what limitations they might impose on your WDLs.

score 0 · Answer 2 · 2023-08-23

@kynnjo your only possible remedy here is to modify the docker images that your tasks use.

In the Docker image you can set the user that the container runs as with the USER directive, and then configure the bash profile for that user in the Dockerfile. This approach will definitely work with Terra, since you are in full control of the container images that your WDL uses.
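One caveat when baking a profile into the image: bash reads ~/.bash_profile or ~/.profile only for login shells. When a non-interactive bash runs a script, the startup file it consults is the one named by the BASH_ENV environment variable, so setting BASH_ENV (for example with an ENV line in the Dockerfile) is one possible way to apply this approach. Because the file is evaluated each time such a shell starts, it can compute values at run time, which matters for the settings described in the question. The sketch below assumes the engine runs the task's command with bash (not sh, and not in POSIX mode); the setup file contents and MY_SETUP are hypothetical.

```shell
# Hypothetical setup file; in an image it would live at the path BASH_ENV names
setup_file=$(mktemp)
cat > "$setup_file" <<'EOF'
# Evaluated each time a non-interactive bash starts, so values are computed at run time
export MY_SETUP=loaded
export SETUP_LOADED_AT="$(date +%s)"
EOF

# Stand-in for the script an engine generates from a task's command section
task_script=$(mktemp)
printf '%s\n' 'echo "${MY_SETUP:-unset}"' > "$task_script"

# Same script, run by a non-interactive bash without and with BASH_ENV
without_bash_env=$(env -u BASH_ENV -u MY_SETUP bash "$task_script")
with_bash_env=$(env -u MY_SETUP BASH_ENV="$setup_file" bash "$task_script")

echo "without BASH_ENV: $without_bash_env"
echo "with BASH_ENV:    $with_bash_env"
rm -f "$setup_file" "$task_script"
```

Note that this still lives in the image rather than in the WDL file, so it does not meet the question's goal of keeping the workflow self-contained.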

WDL is first and foremost a language that was designed to handle bioinformatics workloads in a distributed system. Part of this is a heavy reliance on containerized environments (although not strictly necessary). There is no concept of a shared user profile because tasks are meant to be completely independent of one another.

Most workflows I run in production use a variety of different docker images, most of which do not have the same binaries installed or even the same user, so the concept of a shared user session simply does not make sense.