Is it better to define time/cpus/memory in the nextflow.config file?
Nextflow provides process selectors for the config file. This means that it may be easier to set process directives, including resource requests, in the nextflow.config file. You could use the label directive in your processes (in the script.nf file) to mark them as needing a large, medium, or small amount of resources. Then, in the nextflow.config file, you would have something like:
    process {
        cpus = 16
        queue = 'long'
        withLabel: big_mem {
            memory = 64.GB
        }
        withLabel: small_mem {
            memory = 2.GB
        }
    }
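On the script side, a process opts into one of these labels with the label directive. Here is a minimal sketch, assuming DSL2; the process name ALIGN_READS, the my_aligner command, the output file name, and params.reads are hypothetical and only there for illustration:

```groovy
// script.nf (illustrative sketch, not a real pipeline)
process ALIGN_READS {
    // Matched by `withLabel: big_mem` in nextflow.config
    label 'big_mem'

    input:
    path reads

    output:
    path 'aligned.bam'

    script:
    // task.cpus resolves to whatever the config assigned to this process
    """
    my_aligner --threads ${task.cpus} ${reads} > aligned.bam
    """
}

workflow {
    // params.reads is a hypothetical parameter pointing at the input files
    ALIGN_READS(Channel.fromPath(params.reads))
}
```

The process itself carries no memory or CPU values, so changing the resources for every process labelled big_mem only requires editing the config.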
Besides, using nextflow.config for this makes your pipeline easier to port to other users and infrastructures. They only have to change the nextflow.config file instead of hunting for the directives in your scripts. In a simple case you might have only one script.nf, but in other scenarios there could be many script files.
Is Nextflow better able to estimate the necessary resources?
Nextflow doesn't do any resource estimation, but Nextflow Tower does. It will estimate better resource configurations based on previous runs and runs from other users of the same pipeline.
Is it necessary to define time for longer processes?
Well, it's up to you. If you think a process shouldn't run for longer than N minutes, and you want Nextflow to abort the task if that happens, set the time directive for it.
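As a rough sketch of what that could look like in the config, reusing the big_mem label from above (the 6-hour limit is an arbitrary value chosen for illustration):

```groovy
// nextflow.config (illustrative sketch)
process {
    withLabel: big_mem {
        memory = 64.GB
        // Upper bound on run time; a task exceeding it is terminated.
        // The value here is an assumption, pick one that fits your data.
        time = 6.h
    }
}
```

On cluster executors the time limit is typically also passed on to the scheduler as the job's wall-time request, so a tight value can help jobs get scheduled sooner, while one that is too tight will get tasks killed.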