I would like to apologize in advance for any ignorance on my part regarding documentation I might have missed. It is not my intention to ask about things I could have found by reading the documentation more carefully; I am merely trying to grasp the concepts that are abstracted in the Cromwell metadata, as described in the paragraph about metadata in the Cromwell docs.
When a workflow written in WDL is executed with Cromwell (the scientific workflow engine), one can extract metadata from the Cromwell database. Within this metadata, the following "executionEvents" are available for each "workflow.task" entry in the "calls" object (a sample excerpt follows the list below):
- Pending
- RequestingExecutionToken
- WaitingForValueStore
- PreparingJob
- CallCacheReading
- RunningJob
- UpdatingCallCache
- UpdatingJobStore
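For reference, in the metadata JSON these events appear per call attempt roughly like this (a hand-trimmed excerpt; the workflow/task names and timestamps are invented):

```json
{
  "calls": {
    "myWorkflow.myTask": [
      {
        "executionEvents": [
          {
            "description": "Pending",
            "startTime": "2019-01-01T12:00:00.000Z",
            "endTime": "2019-01-01T12:00:00.100Z"
          },
          {
            "description": "RequestingExecutionToken",
            "startTime": "2019-01-01T12:00:00.100Z",
            "endTime": "2019-01-01T12:00:01.500Z"
          }
        ]
      }
    ]
  }
}
```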
From the documentation:
> Call Caching allows Cromwell to detect when a job has been run in the past so that it doesn't have to re-compute results, saving both time and money. The main purpose of the Job Store table is to support resuming execution of a workflow when Cromwell is restarted by recovering the outputs of completed jobs.
I couldn't find a description of either the Execution Token or the Value Store in the docs.
My questions are the following:
- What is the engine waiting on when a task/job is "Pending"?
- Is Requesting an Execution Token something that happens for every task because of security reasons, or does it have to do with the allowed capacity for Cromwell? What types of token are we talking about?
- What happens during the Value Store event? Which values are stored where, and why are we waiting on it rather than doing it? For example:
  - is this collecting default environment variables that should be set before running the workflow; or
  - is it collecting the values of the variables that are used in the workflow, as provided with the `inputs.json` (see the minimal example after this list)?
- Does "PreparingJob" actually mean preparing the environment, or is it digesting all inputs for the task such that it can create the actual script to run/submit to the backend?
- Is it possible to split RunningJob into finer-grained fragments than just the time between submitting a task to the backend and the time the task returns from the backend, or does everyone have to build such a construct for their own backend themselves? (Often one is interested in the time a task takes to execute, rather than in the sum of the time the backend spends queueing the task and the time the task spends executing; the sketch below shows how I currently compute per-event durations from the metadata.)
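For clarity, by `inputs.json` I mean the usual workflow-inputs file with fully qualified names; a minimal, entirely hypothetical example:

```json
{
  "myWorkflow.myTask.myInput": "some value",
  "myWorkflow.myTask.threads": 4
}
```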
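And for context on the RunningJob question: this is roughly how I derive per-event durations from the metadata at the moment (a minimal sketch, assuming the metadata JSON was saved to `metadata.json`, e.g. after fetching it from Cromwell's `/api/workflows/v1/{id}/metadata` endpoint; the file name and the field handling are my assumptions):

```python
import json
from datetime import datetime

def parse_ts(ts):
    # Cromwell timestamps look like "2019-01-01T12:00:00.000Z";
    # datetime.fromisoformat() (Python 3.7+) does not accept the
    # trailing "Z", so rewrite it as an explicit UTC offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

with open("metadata.json") as fh:  # assumed file name
    metadata = json.load(fh)

# "calls" maps "workflow.task" names to a list of attempts, each of
# which carries its own list of executionEvents.
for call_name, attempts in metadata["calls"].items():
    for attempt in attempts:
        for event in attempt.get("executionEvents", []):
            if "startTime" in event and "endTime" in event:
                duration = parse_ts(event["endTime"]) - parse_ts(event["startTime"])
                print(f"{call_name}\t{event['description']}\t{duration}")
```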