Hi all,
I have a variant calling pipeline containing multiple steps from different tools. Currently I am working on a Cray system. I have already run the commands individually on a single sample. Now I want to scale to multiple samples (24 samples at a time). I want to run my complete variant calling pipeline on the high-end server using the MPI module. My commands are in a Python script and I want to modify it for mpi4py. A minimal example:
When run individually:
import os
os.system("command 1")
But when running all the commands together for multiple samples on multiple cores:
from mpi4py import MPI
import os
Sample = ["1","2","3"]
for a in Sample:
    os.system("command1..input="+a+", output="+a+"_1")
    os.system("command2..input="+a+"_1, output="+a+"_2")
    os.system("command3..input="+a+"_2, output="+a+"_3")
    os.system("command4..input="+a+"_3, output="+a+"_4")
    os.system("command5..input="+a+"_4, output="+a+"_5")
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
This script is not working at all.
Can anyone please help me? I just want to run my Python script (using import os) on multiple processors at a time (20 samples on 20 cores), and I have to use only the MPI module.
Thank you
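One way to make the attempt above work, as a minimal sketch (the command names and the input=/output= syntax are placeholders copied from the question, not real tools): get the rank and size *before* the loop, then slice the sample list so each rank processes only its own samples; the five steps stay sequential within a sample, since each consumes the previous step's output. The print is a stand-in so the sketch runs anywhere, with or without mpi4py; swap it for the real calls.

```python
try:
    from mpi4py import MPI          # launch with: mpiexec -n 20 python pipeline.py
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:                 # falls back to one process if mpi4py is absent
    rank, size = 0, 1

samples = [str(i) for i in range(1, 21)]    # 20 samples

# Each rank takes every size-th sample: with 20 ranks, one sample per rank.
plan = []
for s in samples[rank::size]:
    prev = s
    for step in range(1, 6):                # the five chained tools
        out = f"{s}_{step}"
        plan.append(f"command{step} input={prev} output={out}")
        prev = out

for cmd in plan:
    print(cmd)  # replace with subprocess.run(cmd, shell=True, check=True)
```

Note that os.system gives no error checking; subprocess.run with check=True stops a sample's chain when a step fails.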
Please don't. Use a workflow manager like Nextflow or Snakemake.
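For example, a Snakemake sketch of the same five-step chain (the tool names and the input=/output= flags are placeholders from the question): Snakemake infers the dependency graph from filenames and runs independent samples in parallel with `snakemake -j 20`, restarting only the failed steps on a rerun.

```
# Snakefile (sketch)
SAMPLES = [str(i) for i in range(1, 21)]

rule all:
    input: expand("{s}_5", s=SAMPLES)

rule step1:
    input: "{s}"
    output: "{s}_1"
    shell: "command1 input={input} output={output}"

rule step2:
    input: "{s}_1"
    output: "{s}_2"
    shell: "command2 input={input} output={output}"

# rules step3 to step5 follow the same pattern
```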
Agree with what Pierre Lindenbaum said.
Use GNU parallel instead.
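For instance (a sketch, assuming 20 samples named 1..20; the echo commands are placeholders for the real tools): one parallel invocation runs the whole five-step chain per sample, up to 20 jobs at once.

```shell
# each {} is one sample id; the five steps are chained with && so a
# failing step stops that sample without affecting the others
seq 1 20 | parallel -j 20 '
  echo "command1 input={} output={}_1" &&
  echo "command2 input={}_1 output={}_2" &&
  echo "command3 input={}_2 output={}_3" &&
  echo "command4 input={}_3 output={}_4" &&
  echo "command5 input={}_4 output={}_5"'
```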