I am new to bioinformatics (trainee), with only a data analytics background.
This is my current job script for running a GATK germline short variant discovery workflow for a single sample.
Note: I also ran it with 10 cores and 4 GB of memory per core.
#!/bin/bash
#BSUB -J GATK_Test_Run
#BSUB -P 4kajdatx
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 2:00
#BSUB -q normal
#BSUB -n 40
#BSUB -R "rusage[mem=2G]"
#BSUB -B
#BSUB -N
#BSUB -u <email>
ml anaconda3
conda activate gatk
cd ~/GATKproj/SCRIPTS
./variant_calling.sh
I am testing out the commands individually on the HPC for an initial evaluation. When I run bwa mem with 10 threads on my MacBook Pro M3 (36 GB RAM, 12 cores = 12 threads), it takes 11 minutes.
If I do the same on the HPC, the job execution time stays at about 1 hour no matter how much I increase the memory or cores, and I am not able to reduce it. I can see that the job is spread across multiple hosts and I suspect this could be the reason. Also, should I submit bwa as a serial or a parallel job?
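For reference, one way to confirm how LSF spread the slots across hosts is to ask the scheduler for the job details (the job ID below is a placeholder):
# Long-format report for a running job; the execution host lines show
# how many hosts the allocated slots landed on.
bjobs -l 123456
# Same information for a job that has already finished.
bhist -l 123456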
Thank you so much for your input and advice.
You need to be careful about allocating memory. Most programs in bioinformatics are not using parallel programming, and when they use multiple cores they need an adequate amount of RAM associated with each core. 10 cores with 4 GB of memory each may starve the individual cores, so it may be better to run fewer cores with more RAM per core. You will also want to allocate those cores on the same physical node to avoid network overhead, so alongside -n NN you should also be requesting a single host (span[hosts=1] in LSF; -N 1 is the SLURM equivalent). Are the steps happening serially inside your .sh script? Not all steps are going to have the same compute requirements, so submitting them as individual jobs (with dependencies so the next one starts after the first completes) may be something to look into; see the sketch below.
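A rough sketch of both ideas, assuming LSF: the core and memory numbers and the step script names are illustrative, and whether rusage[mem=...] is counted per slot or per job depends on your site's configuration.
# Keep all slots on one physical host and give each slot more memory.
#BSUB -n 8
#BSUB -R "span[hosts=1]"
#BSUB -R "rusage[mem=4G]"
# Splitting the workflow into separate jobs with a dependency:
# the second job only starts once the first completes successfully.
bsub -J bwa_align ./01_bwa_align.sh
bsub -J mark_dups -w "done(bwa_align)" ./02_mark_duplicates.sh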
Thank you so much for that! I will alter the job script and test it now.
Yes, they are happening serially. But I am just testing this for one step at the moment, a single command.
So the -t option is where I set the thread count, and I wanted to know whether using multiple threads has anything to do with submitting the job as a parallel job. Or will it be parallel only if I submit a job that runs multiple inputs through the same command? Thank you.
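As an aside on how -t relates to the scheduler request: the thread count is usually set to however many slots the job was actually granted. A minimal sketch, assuming your LSF exports LSB_DJOB_NUMPROC and with placeholder file paths:
# LSB_DJOB_NUMPROC is set by LSF to the number of slots allocated to the job;
# fall back to 4 for interactive tests outside the scheduler.
THREADS="${LSB_DJOB_NUMPROC:-4}"
bwa mem -t "$THREADS" ref/genome.fa sample_R1.fastq.gz sample_R2.fastq.gz > sample.paired.sam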
This is the architecture of our HPC:
This is a multi-threaded job (i.e. using more than one core), but it is not a parallel job in the sense of MPI. If you are using 10 cores for the job, allocate 30-40 GB of RAM (with --mem=40g in SLURM; in LSF the rusage[mem=...] request plays that role). You should also sort and create the BAM file directly in one step, so there is no need to keep the intermediate SAM; see the pipeline sketched below.
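A minimal sketch of that one-step alignment and sort, assuming samtools is available in the same environment (thread counts and file names are placeholders):
# Align and coordinate-sort in one pipeline; no intermediate SAM hits the disk.
# -t sets bwa's threads, -@ sets samtools' additional sorting threads.
bwa mem -t 8 ref/genome.fa sample_R1.fastq.gz sample_R2.fastq.gz \
  | samtools sort -@ 4 -o sample.sorted.bam -
samtools index sample.sorted.bam
Whether coordinate sorting at this point fits the rest of the pipeline (for example the MarkDuplicatesSpark step discussed below) is a separate workflow decision.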
Interesting. This appears to be a POWER-based (IBM?) server. What OS are you running? Did you compile bwa for this architecture?
Understood, thank you so much. I run
$gatk MarkDuplicatesSpark -I ${aligned_reads}/$sample_name.paired.sam -O ${aligned_reads}/$sample_name"_sorted_dedup_reads.bam"
to mark duplicates and sort. Should I use the samtools option instead?
You should keep the GATK markdup step since you are using other tools from the suite.
I assumed mem was memory per core, and since my MacBook has only 36 GB RAM with 12 cores, I expected 10 cores with 4 GB RAM per core to give 40 GB RAM in total and similar compute power to my Mac. Maybe it is different on the HPC.
I got it now. Thank you once again. It has been really helpful!
The difference you see in runtime may also originate from the different CPU architecture. If you are running an x86-64 binary on POWER, it may be running under some sort of translation/emulation. You should re-compile bwa from source, if you are indeed using POWER CPUs, to get the best speed.
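One way to check what the installed binary was built for, and to rebuild it natively, is sketched below; note that upstream bwa contains x86 SSE intrinsics, so a plain make on POWER may need a patched or ported source tree.
# Show the target architecture of the bwa binary currently on PATH
# (e.g. x86-64 vs 64-bit PowerPC).
file "$(which bwa)"
# Rebuild from source on the cluster so the compiler targets the local CPU.
git clone https://github.com/lh3/bwa.git
cd bwa && make
file ./bwa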
I did compile it myself from source. The file command for the bwa executable gives the following output: