I am new to bioinformatics (trainee), with only a data analytics background.
This is my current job script for running a GATK germline short variant discovery workflow for a single sample.
Note: I also ran it with 10 cores and 4 GB of memory per core.
#!/bin/bash
#BSUB -J GATK_Test_Run
#BSUB -P 4kajdatx
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 2:00
#BSUB -q normal
#BSUB -n 40
#BSUB -R "rusage[mem=2G]"
#BSUB -B
#BSUB -N
#BSUB -u <email>
ml anaconda3
conda activate gatk
cd ~/GATKproj/SCRIPTS
./variant_calling.sh
I am testing out the commands individually on the HPC for an initial evaluation. When I run bwa mem with 10 threads on my MacBook Pro M3 (36 GB RAM, 12 cores = 12 threads), it takes 11 minutes.
If I do the same on the HPC, the job execution time stays at about 1 hour no matter how much I increase the memory or cores, and I am not able to reduce it. I can see that the job is spread across multiple hosts and I suspect this could be the reason. Also, should I submit bwa as a serial or a parallel job?
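For reference, one way to confirm how LSF spread the slots across hosts is to ask the scheduler for the job details (the job ID below is a placeholder):
# Long-format report for a running job; the execution host lines show
# how many hosts the allocated slots landed on.
bjobs -l 123456
# Same information for a job that has already finished.
bhist -l 123456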
Thank you so much for your input and advice.
You need to be careful about allocating memory. Most programs in bioinformatics are not using parallel programming, and when they use multiple cores they need an adequate amount of RAM associated with each core. 10 cores with 4 GB of memory each may starve the individual cores, so it may be better to run fewer cores with more RAM per core. You will also want to allocate those cores on the same physical node to avoid network overhead, so alongside -n NN you should also be requesting a single host (span[hosts=1] in LSF; -N 1 is the SLURM equivalent). Are the steps happening serially inside your .sh script? Not all steps are going to have the same compute requirements, so submitting them as individual jobs (with dependencies so the next one starts after the first completes) may be something to look into; see the sketch below.
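A rough sketch of both ideas, assuming LSF: the core and memory numbers and the step script names are illustrative, and whether rusage[mem=...] is counted per slot or per job depends on your site's configuration.
# Keep all slots on one physical host and give each slot more memory.
#BSUB -n 8
#BSUB -R "span[hosts=1]"
#BSUB -R "rusage[mem=4G]"
# Splitting the workflow into separate jobs with a dependency:
# the second job only starts once the first completes successfully.
bsub -J bwa_align ./01_bwa_align.sh
bsub -J mark_dups -w "done(bwa_align)" ./02_mark_duplicates.sh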
Thank you so much for that! I will alter the job script and test it now.
Yes, they are happening serially. But I am just testing this for one step at the moment, a single command.
So the -t option is where I set the thread count, and I wanted to know whether using multiple threads has anything to do with submitting the job as a parallel job. Or will it be parallel only if I submit a job that runs multiple inputs through the same command? Thank you.
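As an aside on how -t relates to the scheduler request: the thread count is usually set to however many slots the job was actually granted. A minimal sketch, assuming your LSF exports LSB_DJOB_NUMPROC and with placeholder file paths:
# LSB_DJOB_NUMPROC is set by LSF to the number of slots allocated to the job;
# fall back to 4 for interactive tests outside the scheduler.
THREADS="${LSB_DJOB_NUMPROC:-4}"
bwa mem -t "$THREADS" ref/genome.fa sample_R1.fastq.gz sample_R2.fastq.gz > sample.paired.sam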
This is the architecture of our HPC:
This is a multi-threaded job (i.e. using more than one core), but it is not a parallel job in the sense of MPI. If you are using 10 cores for the job, allocate 30-40 GB of RAM (with --mem=40g in SLURM; in LSF the rusage[mem=...] request plays that role). You should also sort and create the BAM file directly in one step, so there is no need to keep the intermediate SAM; see the pipeline sketched below.
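A minimal sketch of that one-step alignment and sort, assuming samtools is available in the same environment (thread counts and file names are placeholders):
# Align and coordinate-sort in one pipeline; no intermediate SAM hits the disk.
# -t sets bwa's threads, -@ sets samtools' additional sorting threads.
bwa mem -t 8 ref/genome.fa sample_R1.fastq.gz sample_R2.fastq.gz \
  | samtools sort -@ 4 -o sample.sorted.bam -
samtools index sample.sorted.bam
Whether coordinate sorting at this point fits the rest of the pipeline (for example the MarkDuplicatesSpark step discussed below) is a separate workflow decision.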
Interesting. This appears to be a POWER-based (IBM?) server. What OS are you running? Did you compile bwa for this architecture?
Understood, thank you so much. I run
$gatk MarkDuplicatesSpark -I ${aligned_reads}/$sample_name.paired.sam -O ${aligned_reads}/$sample_name"_sorted_dedup_reads.bam"
to mark duplicates and sort. Should I use the samtools option instead?
You should keep the GATK markdup step since you are using other tools from the suite.
I assumed mem was memory per core, and since my MacBook has only 36 GB RAM with 12 cores, I expected 10 cores with 4 GB RAM per core to give 40 GB RAM in total and similar compute power to my Mac. Maybe it is different on the HPC.
I got it now. Thank you once again. It has been really helpful!
The difference you see in runtime may also originate from the different CPU architecture. If you are running an x86-64 binary on POWER, it may be running under some sort of translation/emulation. You should re-compile bwa from source, if you are indeed using POWER CPUs, to get the best speed.
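One way to check what the installed binary was built for, and to rebuild it natively, is sketched below; note that upstream bwa contains x86 SSE intrinsics, so a plain make on POWER may need a patched or ported source tree.
# Show the target architecture of the bwa binary currently on PATH
# (e.g. x86-64 vs 64-bit PowerPC).
file "$(which bwa)"
# Rebuild from source on the cluster so the compiler targets the local CPU.
git clone https://github.com/lh3/bwa.git
cd bwa && make
file ./bwa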
I did compile it myself from source. The file command for the bwa executable gives the following output: