Hi all,
I have a variant calling pipeline containing multiple steps from different tools. Currently I am working on a Cray system. I have already run the commands individually on a single sample. Now I want to scale to multiple samples (24 samples at a time). I want to run my complete variant calling pipeline on the high-end server using the MPI module. My commands are in a Python script and I want to modify it for mpi4py. A minimal example:
When run individually:
import os
os.system("command 1")
But when running all the commands together for multiple samples on multiple cores:
from mpi4py import MPI
import os
Sample = ["1","2","3"]
for a in Sample:
    os.system("command1..input="+a+", output="+a+"_1")
    os.system("command2..input="+a+"_1, output="+a+"_2")
    os.system("command3..input="+a+"_2, output="+a+"_3")
    os.system("command4..input="+a+"_3, output="+a+"_4")
    os.system("command5..input="+a+"_4, output="+a+"_5")
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
This script is not working at all.
Can anyone please help me? I just want to run my Python script (using import os) on multiple processors at a time (20 samples on 20 cores), and I have to use only the MPI module.
Thank you
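One way to make the attempt above work, as a minimal sketch (the command names and the input=/output= syntax are placeholders copied from the question, not real tools): get the rank and size *before* the loop, then slice the sample list so each rank processes only its own samples; the five steps stay sequential within a sample, since each consumes the previous step's output. The print is a stand-in so the sketch runs anywhere, with or without mpi4py; swap it for the real calls.

```python
try:
    from mpi4py import MPI          # launch with: mpiexec -n 20 python pipeline.py
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:                 # falls back to one process if mpi4py is absent
    rank, size = 0, 1

samples = [str(i) for i in range(1, 21)]    # 20 samples

# Each rank takes every size-th sample: with 20 ranks, one sample per rank.
plan = []
for s in samples[rank::size]:
    prev = s
    for step in range(1, 6):                # the five chained tools
        out = f"{s}_{step}"
        plan.append(f"command{step} input={prev} output={out}")
        prev = out

for cmd in plan:
    print(cmd)  # replace with subprocess.run(cmd, shell=True, check=True)
```

Note that os.system gives no error checking; subprocess.run with check=True stops a sample's chain when a step fails.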
Please don't. Use a workflow manager like Nextflow or Snakemake.
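For example, a Snakemake sketch of the same five-step chain (the tool names and the input=/output= flags are placeholders from the question): Snakemake infers the dependency graph from filenames and runs independent samples in parallel with `snakemake -j 20`, restarting only the failed steps on a rerun.

```
# Snakefile (sketch)
SAMPLES = [str(i) for i in range(1, 21)]

rule all:
    input: expand("{s}_5", s=SAMPLES)

rule step1:
    input: "{s}"
    output: "{s}_1"
    shell: "command1 input={input} output={output}"

rule step2:
    input: "{s}_1"
    output: "{s}_2"
    shell: "command2 input={input} output={output}"

# rules step3 to step5 follow the same pattern
```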
Agree with what Pierre Lindenbaum said.
Use GNU parallel instead.
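For instance (a sketch, assuming 20 samples named 1..20; the echo commands are placeholders for the real tools): one parallel invocation runs the whole five-step chain per sample, up to 20 jobs at once.

```shell
# each {} is one sample id; the five steps are chained with && so a
# failing step stops that sample without affecting the others
seq 1 20 | parallel -j 20 '
  echo "command1 input={} output={}_1" &&
  echo "command2 input={}_1 output={}_2" &&
  echo "command3 input={}_2 output={}_3" &&
  echo "command4 input={}_3 output={}_4" &&
  echo "command5 input={}_4 output={}_5"'
```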