How to convert a for loop to a Job-array in LSF cluster
2.2 years ago
LDT ▴ 340

I have 100 files, and I want to parallelise my submission to save time instead of running the jobs one by one. How can I change this script into a job array in LSF using the bsub submission system?

#BSUB -J ExampleJob1         #Set the job name to "ExampleJob1"
#BSUB -L /bin/bash           #Uses the bash login shell to initialize the job's execution environment.
#BSUB -W 2:00                #Set the wall clock limit to 2hr
#BSUB -n 1                   #Request 1 core
#BSUB -R "span[ptile=1]"     #Request 1 core per node.
#BSUB -R "rusage[mem=5000]"  #Request 5000MB per process (CPU) for the job
#BSUB -M 5000                #Set the per process enforceable memory limit to 5000MB.
#BSUB -o Example1Out.%J      #Send stdout and stderr to "Example1Out.[jobID]"

path=./home/

for each in *.bam 
do 
samtools coverage ${each} -o ${each}_coverage.txt
done

Thank you for your time; any help is appreciated. I am new to LSF and quite confused.

lsf cluster-computing Job-array hpc • 1.8k views

How about just submitting the jobs in a for loop? That is, remove the loop from the script, replace ${each} with $1, and call bsub once per file.
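One way to read this suggestion, as a minimal sketch rather than a tested recipe: a small wrapper that calls bsub once per BAM file. The options from the #BSUB lines are given on the bsub command line here so that the file name can be part of the submitted command. The function name submit_all, the BSUB override variable (used only for a dry run), and the cov_* job and output names are hypothetical and not part of the original script.

```shell
#!/bin/bash
# submit_all.sh (hypothetical name): one bsub call per BAM file.
# Dry run that only prints the commands:
#   BSUB="echo bsub" ./submit_all.sh /path/to/bams

submit_all() {
    local dir=${1:-.}
    local bsub_cmd=${BSUB:-bsub}      # assumption: override hook for dry runs
    local each name
    for each in "${dir}"/*.bam
    do
        [ -e "${each}" ] || continue  # no match: the pattern stays literal, skip it
        name=$(basename "${each}" .bam)
        # Same limits as the #BSUB lines in the question, passed as options.
        ${bsub_cmd} -J "cov_${name}" -L /bin/bash -W 2:00 -n 1 \
            -R "rusage[mem=5000]" -M 5000 \
            -o "cov_${name}.%J" \
            "samtools coverage ${each} -o ${each}_coverage.txt"
    done
}

submit_all "$@"
```

Each file becomes an independent job, so the scheduler decides how many run at the same time. File names containing spaces would need extra quoting inside the submitted command string.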


I'll try this one, 5heikki; it seems very promising. I am not sure, though, how I can define the range so that only 10 jobs run at a time.


It shouldn't be your responsibility. Whoever set up the queue manager should be responsible for such things. For example, you submit 10k jobs, but only 10 run in parallel (if there are available slots) because that's how the queue manager was set up.
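For reference, LSF job arrays also let the submitter cap concurrency: the array specification in -J accepts a %N suffix that limits how many elements run at once. Below is a minimal sketch of the original script as a 100-element array capped at 10 running jobs. The list file bam_list.txt, the helper nth_line, and the job name coverage are assumptions made for illustration; LSB_JOBINDEX and the %I placeholder are provided by LSF.

```shell
#!/bin/bash
#BSUB -J "coverage[1-100]%10"   # 100 array elements, at most 10 running at once
#BSUB -L /bin/bash
#BSUB -W 2:00
#BSUB -n 1
#BSUB -R "rusage[mem=5000]"
#BSUB -M 5000
#BSUB -o coverage.%J.%I         # %J = job ID, %I = array index

# Print the N-th line (1-based) of a list file.
nth_line() {
    sed -n "${1}p" "${2}"
}

# bam_list.txt is a hypothetical file with one BAM path per line,
# created beforehand with e.g.: ls *.bam > bam_list.txt
# Guard: do nothing unless LSF has set a positive array index.
if [ -n "${LSB_JOBINDEX:-}" ] && [ "${LSB_JOBINDEX}" -gt 0 ]; then
    each=$(nth_line "${LSB_JOBINDEX}" bam_list.txt)
    samtools coverage "${each}" -o "${each}_coverage.txt"
fi
```

Saved as, say, coverage_array.sh, this would be submitted once with `bsub < coverage_array.sh`; each array element then picks its own line from the list. The range in -J should match the number of lines in the list file.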

2.2 years ago

ok, second answer with a Makefile using option -j

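SHELL=/bin/bash
BAMS=$(shell ls -1 /path/to/*.bam)

# Generate one rule per BAM: /path/to/x.bam.out depends on /path/to/x.bam
define run

$$(addsuffix .out,$(1)): $(1)
	samtools coverage -o $$@ $$<
endef

# Target names must match the ones generated by 'run' (x.bam.out, not x.out)
all: $(addsuffix .out,${BAMS})

$(eval $(foreach B,${BAMS},$(call run,${B})))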

and run with:

#BSUB -J ExampleJob1         #Set the job name to "ExampleJob1"
#BSUB -L /bin/bash           #Uses the bash login shell to initialize the job's execution environment.
#BSUB -W 2:00                #Set the wall clock limit to 2hr
#BSUB -n 10                  #Request 10 cores <====================
#BSUB -R "span[hosts=1]"     #Keep all 10 cores on one node, since make -j runs its jobs locally.
#BSUB -R "rusage[mem=5000]"  #Request 5000MB per process (CPU) for the job
#BSUB -M 5000                #Set the per process enforceable memory limit to 5000MB.
#BSUB -o Example1Out.%J      #Send stdout and stderr to "Example1Out.[jobID]"


cd /path/to/your/makefile/dir && make -j 10
2.2 years ago

Use a workflow manager like Nextflow, e.g. (not tested):

params.bams="NO_FILE"

workflow  {
        ch1= ST_COVERAGE(params,Channel.fromPath(params.bams).splitText().map{it.trim()})
        MAKE_LIST(params,ch1.output.collect())
        }
process ST_COVERAGE {
tag "${bam}"
input:
     val(meta)
     val(bam)
output:
    path("coverage.txt"),emit:output
script:
"""
samtools coverage -o coverage.txt "${bam}"
"""
}

process MAKE_LIST {
input:
   val(meta)
   val(L)
output:
  path("all.list"),emit:output
script:
"""
cat << EOF > all.list
${L.join("\n")}
EOF
"""
}

and run, something like

nextflow -C lsf.config run biostars9540675.nf -resume --bams bam_paths.txt

That's a nice suggestion, Pierre. I appreciate it. I am a bit behind on Nextflow, but your post motivates me to move in that direction even faster.
