Question

Speed up PHASE software for haplotype inference


8.9 years ago

kshitijtayal ▴ 40

I am a bioinformatician and am stuck on a problem that needs some scripting to speed up my workflow. We use a program called PHASE, and the command I type on the command line to run it is

./PHASE test.inp test.out

where PHASE is the name of the program, test.inp is the input file, and test.out is the output file. The run uses a single core and takes approximately 3 hours to complete.

Now I have 1000 input files, test1.inp, test2.inp, test3.inp, ... up to test1000.inp, and I want to generate all 1000 output files (test1.out, test2.out, ..., test1000.out) using the full capacity of my system, which has 4 cores.

To use the full capacity of my system, I want to start 4 instances of the above command at once, each taking a different input file and producing its own output file, like this:

./PHASE test1.inp test1.out
./PHASE test2.inp test2.out
./PHASE test3.inp test3.out
./PHASE test4.inp test4.out

As jobs finish and their output files are written, the script should start the remaining input files, until all of them have been processed:

./PHASE test5.inp test5.out
./PHASE test6.inp test6.out
./PHASE test7.inp test7.out
./PHASE test8.inp test8.out

and so on.

How do I write a script for this process that takes advantage of all 4 cores and speeds up the whole run?

multithreading unix haplotype • 1.9k views


score 0 · Answer 1 · 2024-03-22


8 months ago

WANG • 0

GNU parallel might be an option here.
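A minimal sketch of how that could look, assuming the inputs are named test*.inp and sit in the current directory next to the PHASE executable. The names PHASE_BIN, JOBS and run_phase_batch are made up for this example. If GNU parallel is not installed, the sketch falls back to `xargs -P` (available in GNU and BSD xargs), which behaves similarly: a new run starts as soon as one of the 4 slots frees up, rather than waiting for a whole batch of 4 to finish.

```shell
#!/bin/sh
# Sketch only: PHASE_BIN, JOBS and run_phase_batch are hypothetical names
# introduced for this example; they are not part of PHASE.
PHASE_BIN=${PHASE_BIN:-./PHASE}   # path to the PHASE executable
JOBS=${JOBS:-4}                   # number of simultaneous runs (one per core)

run_phase_batch() {
    if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        # GNU parallel: {} is the input file, {.} is the same name without
        # its extension, so test7.inp is written to test7.out.
        parallel -j "$JOBS" "$PHASE_BIN" {} {.}.out ::: test*.inp
    else
        # Fallback without GNU parallel: xargs -P keeps up to $JOBS
        # processes running and starts the next file when one exits.
        export PHASE_BIN
        printf '%s\n' test*.inp |
            xargs -P "$JOBS" -n 1 sh -c '"$PHASE_BIN" "$1" "${1%.inp}.out"' sh
    fi
}
```

Sourcing this file and calling `run_phase_batch` in the directory that holds the inputs should keep four runs going until every test*.inp has a matching .out file. At about 3 hours per file, 1000 files on 4 cores still comes to roughly 750 hours of wall-clock time, assuming the runs do not slow each other down.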
