Hello
I have been using Singularity within my Cromwell configuration and now want to enable call caching. None of the cluster nodes have internet access, and call caching still shows as turned off. Is this because Cromwell cannot pull the Docker image (since there is no internet on any of the nodes)?
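From what I understand, call caching can also be switched on or off per run through the workflow options file. This is roughly what I think that file should look like (just my reading of the standard Cromwell option names, so please correct me if these are wrong):

{
  "read_from_cache": true,
  "write_to_cache": true
}

As far as I can tell this would be passed to Cromwell with the --options flag.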
Here is my cromwell.conf:
# This line is required. It pulls in default overrides from the embedded cromwell
# `reference.conf` (in core/src/main/resources) needed for proper performance of cromwell.
include required(classpath("application"))
# Cromwell HTTP server settings
webservice {
#port = 8000
#interface = 0.0.0.0
#binding-timeout = 5s
#instance.name = "reference"
}
# Cromwell "system" settings
system {
# If 'true', a SIGINT will trigger Cromwell to attempt to abort all currently running jobs before exiting
#abort-jobs-on-terminate = false
# If 'true', a SIGTERM or SIGINT will trigger Cromwell to attempt to gracefully shutdown in server mode,
# in particular clearing up all queued database writes before letting the JVM shut down.
# The shutdown is a multi-phase process, each phase having its own configurable timeout. See the Dev Wiki for more details.
#graceful-server-shutdown = true
# Cromwell will cap the number of running workflows at N
#max-concurrent-workflows = 5000
# Cromwell will launch up to N submitted workflows at a time, regardless of how many open workflow slots exist
#max-workflow-launch-count = 50
# Number of seconds between workflow launches
#new-workflow-poll-rate = 20
# Since the WorkflowLogCopyRouter is initialized in code, this is the number of workers
#number-of-workflow-log-copy-workers = 10
# Default number of cache read workers
#number-of-cache-read-workers = 25
io {
# throttle {
# # Global Throttling - This is mostly useful for GCS and can be adjusted to match
# # the quota available on the GCS API
# #number-of-requests = 100000
# #per = 100 seconds
# }
# Number of times an I/O operation should be attempted before giving up and failing it.
#number-of-attempts = 5
}
# Maximum number of input file bytes allowed in order to read each type.
# If exceeded a FileSizeTooBig exception will be thrown.
input-read-limits {
#lines = 128000
#bool = 7
#int = 19
#float = 50
#string = 128000
#json = 128000
#tsv = 128000
#map = 128000
#object = 128000
}
abort {
# These are the default values in Cromwell, in most circumstances there should not be a need to change them.
# How frequently Cromwell should scan for aborts.
scan-frequency: 30 seconds
# The cache of in-progress aborts. Cromwell will add entries to this cache once a WorkflowActor has been messaged to abort.
# If on the next scan an 'Aborting' status is found for a workflow that has an entry in this cache, Cromwell will not ask
# the associated WorkflowActor to abort again.
cache {
enabled: true
# Guava cache concurrency.
concurrency: 1
# How long entries in the cache should live from the time they are added to the cache.
ttl: 20 minutes
# Maximum number of entries in the cache.
size: 100000
}
}
# Cromwell reads this value into the JVM's `networkaddress.cache.ttl` setting to control DNS cache expiration
dns-cache-ttl: 3 minutes
}
docker {
hash-lookup {
# Set this to match your available quota against the Google Container Engine API
#gcr-api-queries-per-100-seconds = 1000
# Time in minutes before an entry expires from the docker hashes cache and needs to be fetched again
#cache-entry-ttl = "20 minutes"
# Maximum number of elements to be kept in the cache. If the limit is reached, old elements will be removed from the cache
#cache-size = 200
# How should docker hashes be looked up. Possible values are "local" and "remote"
# "local": Lookup hashes on the local docker daemon using the cli
# "remote": Lookup hashes on docker hub, gcr, gar, quay
#method = "remote"
enabled = "false"
}
}
# Here is where you can define the backend providers that Cromwell understands.
# The default is a local provider.
# To add additional backend providers, you should copy paste additional backends
# of interest that you can find in the cromwell.example.backends folder
# folder at https://www.github.com/broadinstitute/cromwell
# Other backend providers include SGE, SLURM, Docker, udocker, Singularity, etc.
# Don't forget you will need to customize them for your particular use case.
backend {
# Override the default backend.
default = slurm
# The list of providers.
providers {
# Copy paste the contents of a backend provider in this section
# Examples in cromwell.example.backends include:
# LocalExample: What you should use if you want to define a new backend provider
# AWS: Amazon Web Services
# BCS: Alibaba Cloud Batch Compute
# TES: protocol defined by GA4GH
# TESK: the same, with kubernetes support
# Google Pipelines, v2 (PAPIv2)
# Docker
# Singularity: a container safe for HPC
# Singularity+Slurm: and an example on Slurm
# udocker: another rootless container solution
# udocker+slurm: also exemplified on slurm
# HTCondor: workload manager at UW-Madison
# LSF: the Platform Load Sharing Facility backend
# SGE: Sun Grid Engine
# SLURM: workload manager
# Note that these other backend examples will need tweaking and configuration.
# Please open an issue https://www.github.com/broadinstitute/cromwell if you have any questions
slurm {
actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
config {
# Root directory where Cromwell writes job results in the container. This value
# can be used to specify where the execution folder is mounted in the container.
# It is used for the construction of the docker_cwd string in the submit-docker
# value below.
dockerRoot = "/cromwell-executions"
concurrent-job-limit = 10
# If an 'exit-code-timeout-seconds' value is specified:
# - check-alive will be run at this interval for every job
# - if a job is found to be not alive, and no RC file appears after this interval
# - Then it will be marked as Failed.
## Warning: If set, Cromwell will run 'check-alive' for every job at this interval
exit-code-timeout-seconds = 360
filesystems {
local {
localization: [
# soft link does not work for docker with --contain. Hard links won't work
# across file systems
"copy", "hard-link", "soft-link"
]
caching {
duplication-strategy: ["copy", "hard-link", "soft-link"]
hashing-strategy: "xxh64"
}
}
}
#
runtime-attributes = """
Int runtime_minutes = 600
Int cpus = 2
Int requested_memory_mb_per_core = 30000
String? docker
String? account
String? IMAGE
"""
submit = """
sbatch \
--wait \
--job-name=${job_name} \
--chdir=${cwd} \
--output=${out} \
--error=${err} \
--time=${runtime_minutes} \
${"--cpus-per-task=" + cpus} \
--mem-per-cpu=${requested_memory_mb_per_core} \
--account=${account} \
--wrap "/bin/bash ${script}"
"""
submit-docker = """
# SINGULARITY_CACHEDIR needs to point to a directory accessible by
# the jobs (i.e. not lscratch). Might want to use a workflow local
# cache dir like in run.sh
source /scratch/asmab/set_singularity_cachedir.sh
# export so that apptainer/singularity child processes see the cache dir
export SINGULARITY_CACHEDIR=/scratch/asmab/singularity-cache
echo "SINGULARITY_CACHEDIR $SINGULARITY_CACHEDIR"
if [ -z "$SINGULARITY_CACHEDIR" ]; then
CACHE_DIR=$HOME/.singularity
else
CACHE_DIR=$SINGULARITY_CACHEDIR
fi
mkdir -p $CACHE_DIR
echo "SINGULARITY_CACHEDIR $SINGULARITY_CACHEDIR"
LOCK_FILE=$CACHE_DIR/singularity_pull_flock
# we want to avoid all the cromwell tasks hammering each other trying
# to pull the container into the cache for the first time. flock works
# on GPFS, netapp, and vast (of course only for processes on the same
# machine which is the case here since we're pulling it in the master
# process before submitting).
#flock --exclusive --timeout 1200 $LOCK_FILE \
# singularity exec --containall docker://${docker} \
# echo "successfully pulled ${docker}!" &> /dev/null
# Ensure singularity is loaded if it's installed as a module
#module load Singularity/3.0.1
module load apptainer/1.2.4
# Build the Docker image into a Singularity/Apptainer image.
# Replace '/' and ':' in the image name so the resulting .sif filename is valid.
IMAGE=$(echo ${docker} | tr '/:' '_').sif
apptainer build $IMAGE docker://${docker}
# Submit the script to SLURM
sbatch \
--wait \
--job-name=${job_name} \
--chdir=${cwd} \
--output=${cwd}/execution/stdout \
--error=${cwd}/execution/stderr \
--time=${runtime_minutes} \
${"--cpus-per-task=" + cpus} \
--mem-per-cpu=${requested_memory_mb_per_core} \
--account=${account} \
--wrap "apptainer exec --containall --bind ${cwd}:${docker_cwd} ${IMAGE} ${job_shell} ${docker_script}"
"""
kill = "scancel ${job_id}"
check-alive = "squeue -j ${job_id}"
job-id-regex = "Submitted batch job (\\d+).*"
}
}
}
}
call-caching {
enabled = true
invalidate-bad-cache-results = true
}
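I also wondered whether the docker hash-lookup section above is part of the problem, since I currently have it disabled. Going only by the comments in the example config, I am guessing that enabling it for an offline cluster would look roughly like the sketch below, but I am not sure whether the "local" method even applies here, given there is no Docker daemon (only Singularity/Apptainer) on the nodes:

docker {
  hash-lookup {
    enabled = true
    # per the example config comments, "local" looks hashes up via the local
    # docker CLI/daemon instead of going out to Docker Hub / GCR / quay
    method = "local"
  }
}

Is this the right direction, or does call caching need something else entirely when the nodes have no internet access?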