shell commands in parallel

categories: code

I needed a way to execute a list of commands in parallel. Existing tools like parallel from the moreutils Debian package and pexec only accept their work items as command-line arguments. This becomes a problem when there are more commands than exec() can handle; you can find that limit with getconf ARG_MAX. Another issue is that they only take a list of arguments to append to one given command, not a list of complete commands to run in parallel, and each invocation gets only a single argument. They also can only execute one command, not a chain of commands separated by semicolons.
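For reference, you can check the limit yourself. The numbers and the exact error wording below are only illustrative of a typical Linux box:

$ getconf ARG_MAX
2097152
$ /bin/echo $(seq 1 400000)
sh: /bin/echo: Argument list too long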

What I needed was a program that sequentially reads commands, or chains of commands separated by semicolons, from a file, one command or chain per line, and executes each one as soon as the number of currently running processes drops below a threshold.
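As an illustration, such a file (here a hypothetical command_list) could look like this, with one command or semicolon-separated chain per line:

gzip -9 logs/access.log.1
gzip -9 logs/access.log.2
cd build; make clean; make
convert in.png -resize 50% out.png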

The following script reads a file from stdin and does exactly what I want:

#!/bin/sh

NUM=0
QUEUE=""
MAX_NPROC=6

while read -r CMD; do
    # run the command (or semicolon-separated chain) in the background
    sh -c "$CMD" &

    PID=$!
    QUEUE="$QUEUE $PID"
    NUM=$(($NUM+1))

    # if enough processes were created, poll until one finishes
    while [ $NUM -ge $MAX_NPROC ]; do
        # check whether any process finished;
        # /proc/$PID disappears when a process exits (Linux-specific)
        for PID in $QUEUE; do
            if [ ! -d /proc/$PID ]; then
                TMPQUEUE=$QUEUE
                QUEUE=""
                NUM=0
                # rebuild new queue from processes
                # that are still alive
                for PID in $TMPQUEUE; do
                    if [ -d /proc/$PID ]; then
                        QUEUE="$QUEUE $PID"
                        NUM=$(($NUM+1))
                    fi
                done
                break
            fi
        done
        # avoid busy-waiting
        sleep 0.5
    done
done
wait
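Saved as, say, parallel.sh (the name is arbitrary) and made executable, it is used like this:

./parallel.sh < command_list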

EDIT: much too late I figured out that what I wanted to do is far easier with xargs:

cat command_list | xargs --delimiter='\n' --max-args=1 --max-procs=4 sh -c

where --max-procs (the long form of -P) runs up to four instances of sh in parallel.
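The same thing with the short options of GNU xargs, reading the file directly instead of piping it through cat:

xargs -d '\n' -n 1 -P 4 sh -c < command_list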
