Hello, my Ubuntu machine has 40 CPUs. For my bioinformatics analyses, for instance running an assembly with SPAdes, I would like to parallelize tasks using "&", but I don't know how to do that. What will happen when my computer reaches full capacity? Say I have 4 samples, should I give each of them 10 CPUs?
For now, I run my samples in a for loop and give the full CPU capacity to each sample, without parallelization, but I think there is a cleverer way to optimize it.
spades.py -1 {sample}_R1.fastq -2 {sample}_R2.fastq -t 40 -o {out_dir}
Thank you!
You'd be better off using a workflow manager such as Snakemake or Nextflow.
Thank you for your answer. You are right, but I am building a Streamlit app that allows the user to see the progress of each sample and process. If I ran the whole workflow in Nextflow, I wouldn't be able to display that progress.
If you have 40 cores and you are already using all of them for one job, you can't "parallelize" any further. If you start multiple jobs sharing the same 40 cores, you will simply end up with contention and an overall poor experience.
You could give each job 10 cores and start 4 jobs in parallel, but depending on your hardware there may be bottlenecks with input/output etc., with the end result being much the same as above.
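If you do want to try the 4 × 10 split, a minimal sketch with "&" and `wait` could look like the following (the sample names and output directories are placeholders, and I'm assuming paired files named `<sample>_R1.fastq`/`<sample>_R2.fastq` in the current directory):

```shell
#!/usr/bin/env bash
# Start 4 SPAdes jobs in parallel, 10 threads each (4 x 10 = 40 CPUs).
samples=(sampleA sampleB sampleC sampleD)   # hypothetical sample names

for sample in "${samples[@]}"; do
    spades.py -1 "${sample}_R1.fastq" -2 "${sample}_R2.fastq" \
              -t 10 -o "assembly_${sample}" \
              > "${sample}.log" 2>&1 &      # '&' puts the job in the background
done

wait   # block until every background job has finished
echo "all assemblies done"
```

Each job writes to its own log file, which also gives you a simple per-sample progress signal to poll from outside the script. Note that SPAdes is memory-hungry, so four concurrent assemblies may hit RAM before they hit CPU; you may want to cap each job with `-m`.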
Sometimes it is worth running jobs serially, allowing each job to complete while using the available resources to their full potential.
Understood! Thank you very much for the help