The simplest way to do this sort of task is with GNU parallel, a very powerful tool for launching and coordinating multiple independent tasks:
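#!/bin/bash
# Pair line N of file $1 with line N of file $2 (--xapply), one job per pair.
parallel --xapply -a "$1" -a "$2" echo {1}.1111 {2}.1111
# This second batch starts only after every first-stage job has finished.
parallel --xapply -a "$1" -a "$2" echo {1}.2222 {2}.2222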
This runs all of the first jobs in parallel, and then all of the second jobs in parallel. It guarantees your constraint is met, but it's a bit heavy-handed, since it waits until _all_ of the first jobs are done before _any_ of the second jobs can start:
$ ./parallel-script capital.txt small.txt
A.1111 a.1111
B.1111 b.1111
C.1111 c.1111
D.1111 d.1111
A.2222 a.2222
B.2222 b.2222
C.2222 c.2222
D.2222 d.2222
You could also have each processor run both dependent jobs in order:
#!/bin/bash
parallel --xapply -a $1 -a $2 "echo {1}.1111 {2}.1111; echo {1}.2222 {2}.2222"
$ ./script capital.txt small.txt
A.1111 a.1111
A.2222 a.2222
B.1111 b.1111
B.2222 b.2222
C.1111 c.1111
C.2222 c.2222
D.1111 d.1111
D.2222 d.2222
The tutorial on biostars shows how this can be used to run across nodes, and how to set the number of processes to run on.
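As a rough sketch of setting the number of processes yourself, the per-pair version above can be wrapped in a function that passes a job count through -j. The function name run_pairs and the default of 2 jobs are my own choices for illustration; -k only makes the output come back in input order. Newer GNU parallel releases document --xapply under the name --link, as far as I know, so you may need to swap that in.

```shell
#!/bin/bash
# run_pairs FILE1 FILE2 [NJOBS]   (hypothetical helper name)
# Runs both stages for each line pair, in order, with at most NJOBS
# pairs in flight at once. -k prints results in input order.
run_pairs() {
    local njobs=${3:-2}
    parallel -j "$njobs" -k --xapply -a "$1" -a "$2" \
        "echo {1}.1111 {2}.1111; echo {1}.2222 {2}.2222"
}
```

Calling run_pairs capital.txt small.txt 3 would then keep at most three letter pairs running at a time, assuming GNU parallel is the parallel on your PATH.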
It's possible to do this without GNU parallel, of course, but it's instructive to see how much more complicated it is. make -j is of course an old standby, made a little more complicated to use here because we need to get two arguments to the makefile. Here we write a script to build a Makefile (for which we'll require a fairly new GNU make):
#!/bin/bash
makefile="Makefile"
jobs=("1111" "2222")
items=$( paste -d_ $1 $2 )
njobs=${#jobs[@]}
let lastjob=${njobs}-1
echo -n "all: " > ${makefile}
for (( job=0; job<=${lastjob}; job++ ))
do
    for item in ${items}
    do
        echo -n "${item}.${jobs[$job]} " >> ${makefile}
    done
done
echo "" >> ${makefile}
echo "" >> ${makefile}
echo 'firstfile = $(firstword $(subst _, , $1))$(strip $(2))' >> ${makefile}
echo 'secondfile = $(word 2,$(subst _, , $1))$(strip $(2))' >> ${makefile}
echo "" >> ${makefile}
for (( job=0; job<=${lastjob}; job++ ))
do
    let pjob=${job}-1
    if [ $job == 0 ]
    then
        echo "%.${jobs[$job]}:" >> ${makefile}
    else
        echo "%.${jobs[$job]}: %.${jobs[$pjob]}" >> ${makefile}
    fi
    # Recipe lines must begin with a tab, so emit it explicitly with printf.
    printf '\t@echo $(call firstfile, $(basename $@), $(suffix $@)) \\\n' >> ${makefile}
    printf '\t      $(call secondfile, $(basename $@), $(suffix $@))\n' >> ${makefile}
    printf '\ttouch $@\n' >> ${makefile}
    echo "" >> ${makefile}
done
Running it gives:
$ ./makemakefile capital.txt small.txt
$ cat Makefile
all: A_a.1111 B_b.1111 C_c.1111 D_d.1111 A_a.2222 B_b.2222 C_c.2222 D_d.2222
firstfile = $(firstword $(subst _, , $1))$(strip $(2))
secondfile = $(word 2,$(subst _, , $1))$(strip $(2))
%.1111:
@echo $(call firstfile, $(basename $@), $(suffix $@)) \
$(call secondfile, $(basename $@), $(suffix $@))
touch $@
%.2222: %.1111
@echo $(call firstfile, $(basename $@), $(suffix $@)) \
$(call secondfile, $(basename $@), $(suffix $@))
touch $@
$ make -j 3
A.1111 a.1111
B.1111 b.1111
touch A_a.1111
touch B_b.1111
C.1111 c.1111
touch C_c.1111
D.1111 d.1111
touch D_d.1111
A.2222 a.2222
touch A_a.2222
B.2222 b.2222
touch B_b.2222
C.2222 c.2222
touch C_c.2222
D.2222 d.2222
touch D_d.2222
$ rm *_*
Note that the targets here are just empty marker files created with touch; your workflow will likely produce real files that you can use as dependencies instead.
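To make the "real files" point concrete, here is a small sketch in the same spirit as the generator above: each stage writes an actual output file, and the second stage reads the first stage's file as its input. The function name write_makefile, the target names, and the "stage 1"/"stage 2" contents are all made up for illustration.

```shell
#!/bin/bash
# write_makefile PATH   (hypothetical helper; target names are examples only)
# Writes a Makefile whose rules produce real output files, so the second
# stage depends on, and reads, the file the first stage produced.
write_makefile() {
    {
        printf 'all: A_a.1111 B_b.1111 A_a.2222 B_b.2222\n\n'
        printf '%%.1111:\n'
        # $* is the pattern stem (e.g. A_a), $@ is the target file
        printf '\techo "stage 1 for $*" > $@\n\n'
        printf '%%.2222: %%.1111\n'
        # $< is the first prerequisite, i.e. the matching .1111 file
        printf '\tsed "s/stage 1/stage 2/" $< > $@\n'
    } > "$1"
}
```

After write_makefile Makefile, running make -j 2 should build each .1111 file before the matching .2222 file, and the touch step is no longer needed because the outputs themselves serve as the dependencies.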
Finally, you can even just launch multiple processes on the same machine with ampersand, and wait for them to complete. You might need to stick another wait in there to make sure the jobs you need are complete:
$ cat ./makerunscript
#!/bin/bash
NPROCS=3
jobscript="jobscript.sh"
echo "#!/bin/bash" > $jobscript
let count=0
for job in "1111" "2222"
do
    for item in $( paste -d_ $1 $2 )
    do
        left=$( echo $item | sed -e 's/^\([^_]*\)_.*/\1/' )
        right=$( echo $item | sed -e 's/[^_]*_\(.*\)/\1/' )
        echo "echo ${left}.${job} ${right}.${job} &" >> $jobscript
        let count=count+1
        if [ $(( count % NPROCS )) == 0 ]
        then
            echo "wait" >> $jobscript
        fi
    done
    echo "wait # make sure all earlier jobs done" >> $jobscript
done
echo "wait # make sure all jobs done" >> $jobscript
$ ./makerunscript capital.txt small.txt
$ cat jobscript.sh
#!/bin/bash
echo A.1111 a.1111 &
echo B.1111 b.1111 &
echo C.1111 c.1111 &
wait
echo D.1111 d.1111 &
wait # make sure all earlier jobs done
echo A.2222 a.2222 &
echo B.2222 b.2222 &
wait
echo C.2222 c.2222 &
echo D.2222 d.2222 &
wait # make sure all earlier jobs done
wait # make sure all jobs done
$ source jobscript.sh
A.1111 a.1111
[1] Done echo A.1111 a.1111
B.1111 b.1111
[2]- Done echo B.1111 b.1111
C.1111 c.1111
[3]+ Done echo C.1111 c.1111
D.1111 d.1111
[1]+ Done echo D.1111 d.1111
A.2222 a.2222
[1]- Done echo A.2222 a.2222
B.2222 b.2222
[2]+ Done echo B.2222 b.2222
C.2222 c.2222
D.2222 d.2222
[1]- Done echo C.2222 c.2222
[2]+ Done echo D.2222 d.2222
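The same ampersand-and-wait idea can also be written directly, without generating an intermediate script, by putting each pair's two stages into one background chain. This is only a sketch under my own assumptions: the function names run_pair and run_all are hypothetical, and the echo lines stand in for whatever the real jobs are.

```shell
#!/bin/bash
# run_pair LEFT RIGHT: both stages for one pair, strictly in order.
run_pair() {
    echo "$1.1111 $2.1111"
    echo "$1.2222 $2.2222"
}

# run_all FILE1 FILE2 [NPROCS]   (hypothetical helper names throughout)
# Starts one background chain per line pair, in batches of NPROCS.
run_all() {
    local nprocs=${3:-3} count=0 left right
    while IFS=$'\t' read -r left right
    do
        run_pair "$left" "$right" &    # one background chain per pair
        count=$(( count + 1 ))
        if [ $(( count % nprocs )) -eq 0 ]
        then
            wait    # let the current batch of chains finish
        fi
    done < <( paste "$1" "$2" )
    wait    # make sure all chains are done
}
```

With run_all capital.txt small.txt 3, each pair's second stage starts as soon as its own first stage is done, rather than waiting for every first-stage job. The order of lines between different pairs is not guaranteed.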
So it's certainly possible to do this with tools other than GNU parallel, but it sure is a lot easier to just make sure GNU parallel is installed.
"that it is important to run each letter in series because it is dependent on the other" sounds like a job for a Makefile, with the option '-j'.