I am a bioinformatician and have recently run into a problem that needs some scripting to speed up my workflow. We use a program called PHASE, and the command I type on the command line to start it is:
./PHASE test.inp test.out
where PHASE is the name of the program, test.inp is the input file, and test.out is the output file. The process uses one core and takes approximately 3 hours to complete.
Now I have 1000 input files, say test1.inp, test2.inp, test3.inp, ... and so on up to test1000.inp, and I want to generate all 1000 output files (test1.out, test2.out, ..., test1000.out) using the full capacity of my system, which has 4 cores.
To use the full capacity of my system, I want to start 4 instances of the above command at once, each taking a different input file and generating a different output file, like this:
./PHASE test1.inp test1.out
./PHASE test2.inp test2.out
./PHASE test3.inp test3.out
./PHASE test4.inp test4.out
After each job has finished and its output file has been generated, the script should start jobs for the remaining input files, until all of them are done:
./PHASE test5.inp test5.out
./PHASE test6.inp test6.out
./PHASE test7.inp test7.out
./PHASE test8.inp test8.out
and so on.
How do I write a script for the above process that takes advantage of all 4 cores and speeds up my processing?
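
For illustration, the behaviour described above (at most 4 PHASE processes at a time, with the next input started as soon as one finishes) could be sketched with xargs. This is only a rough sketch under assumptions: it relies on an xargs that supports the -P option (GNU and BSD xargs do), and the names PHASE_BIN, JOBS and run_all are made up for this example, not part of PHASE.

```shell
#!/bin/sh
# Sketch: run PHASE on testN.inp -> testN.out, at most $JOBS jobs at once.
# Assumptions: xargs supports -P (GNU and BSD versions do).
# PHASE_BIN, JOBS and run_all are hypothetical names used for illustration.

PHASE_BIN="${PHASE_BIN:-./PHASE}"   # path to the PHASE executable
JOBS="${JOBS:-4}"                   # simultaneous processes (one per core)

run_all() {
    # $1 = first file number, $2 = last file number
    i="$1"
    last="$2"
    # Print one file number per line; xargs turns each line into one command.
    while [ "$i" -le "$last" ]; do
        echo "$i"
        i=$((i + 1))
    done | xargs -P "$JOBS" -I{} "$PHASE_BIN" "test{}.inp" "test{}.out"
    # -I{} substitutes the number into both file names;
    # -P keeps up to $JOBS commands running and starts the next
    # one as soon as a running one exits.
}

# Run only when called with two arguments, e.g.: sh run_phase.sh 1 1000
if [ "$#" -eq 2 ]; then
    run_all "$1" "$2"
fi
```

Saved as, say, run_phase.sh (a hypothetical file name) in the directory that holds PHASE and the input files, it would be started with: sh run_phase.sh 1 1000. With 1000 inputs at about 3 hours each on 4 cores, the whole run would still take roughly 750 hours.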