shell commands in parallel

    Tags: code

    I needed a way to execute a list of commands in parallel. Existing tools like parallel from the moreutils Debian package and pexec only allow passing the arguments on the command line. This becomes a problem when the list is longer than exec() can handle. The system limits the total size in bytes of the arguments plus the environment, and you can find out that limit with getconf NCARGS (ARG_MAX is the POSIX name). Another issue with them is that they take a list of arguments and append them to a given command, rather than a list of commands to be run in parallel. Also, each invocation of that command receives only one of those arguments. Finally, they can only execute a single command, not a chain of commands separated by semicolons.
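The limit can also be printed from a script. This is a minimal sketch; the variable name LIMIT is only for illustration. It uses the POSIX name ARG_MAX, and the value differs between systems, so a long enough command list will eventually exceed it:

```shell
#!/bin/sh
# Query the exec() limit on the combined size (in bytes) of the
# arguments plus the environment. ARG_MAX is the POSIX name.
LIMIT=$(getconf ARG_MAX)
echo "exec() argument limit: $LIMIT bytes"
```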

    What I needed was a program that sequentially reads commands from a file, one command (or a chain of commands separated by semicolons) per line, and starts each line as soon as the number of currently running processes is below a threshold.
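To make the input format concrete, here is a hypothetical command list; the file name command_list and the commands in it are only examples. The whole line is handed to sh -c, so the semicolon-separated parts of one line run one after another, while different lines run in parallel:

```shell
#!/bin/sh
# Write an example command list: one command or ;-separated chain per line.
cat > command_list <<'EOF'
sleep 2; echo first chain done
echo second line
cd /tmp; ls > /dev/null; echo third chain done
EOF
```

The list is then fed to the script on stdin, for example sh parallel.sh < command_list, where parallel.sh is a hypothetical name for the saved script.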

    The following script reads the command list from stdin and does exactly what I want:

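    #!/bin/sh
    # Read one command (or a ;-separated chain of commands) per line from
    # stdin and run at most MAX_NPROC of them at the same time.
    
    NUM=0          # number of commands currently running
    QUEUE=""       # space-separated PIDs of the running commands
    MAX_NPROC=6    # maximum number of concurrent commands
    
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the -n test also handles a last line without a trailing newline
    while IFS= read -r CMD || [ -n "$CMD" ]; do
        sh -c "$CMD" &
    
        PID=$!
        QUEUE="$QUEUE $PID"
        NUM=$((NUM + 1))
    
        # if enough processes were created, wait until one of them finishes
        while [ "$NUM" -ge "$MAX_NPROC" ]; do
            # check whether any process finished (/proc/PID is Linux-specific)
            for PID in $QUEUE; do
                if [ ! -d "/proc/$PID" ]; then
                    TMPQUEUE=$QUEUE
                    QUEUE=""
                    NUM=0
                    # rebuild new queue from processes
                    # that are still alive
                    for PID in $TMPQUEUE; do
                        if [ -d "/proc/$PID" ]; then
                            QUEUE="$QUEUE $PID"
                            NUM=$((NUM + 1))
                        fi
                    done
                    break
                fi
            done
            # sleep only if no slot was freed; fractional seconds work with
            # GNU coreutils sleep but are not required by POSIX
            if [ "$NUM" -ge "$MAX_NPROC" ]; then
                sleep 0.5
            fi
        done
    done
    # wait for the commands that are still running
    wait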
    
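The script notices that a command has finished when its /proc/$PID directory disappears. That only works on Linux, and it relies on the shell reaping the exited child, because an exited but unreaped child (a zombie) still has a /proc entry. A minimal sketch of that check in isolation, with hypothetical variable names BEFORE and AFTER:

```shell
#!/bin/sh
# Start a short background job and remember its PID.
sleep 1 &
PID=$!

# While the job runs, Linux exposes it as a directory under /proc.
if [ -d "/proc/$PID" ]; then BEFORE=running; else BEFORE=finished; fi

# wait blocks until the job exits and reaps it, which removes /proc/$PID.
wait "$PID"
if [ -d "/proc/$PID" ]; then AFTER=running; else AFTER=finished; fi

echo "before: $BEFORE, after: $AFTER"
```

On Linux this prints before: running, after: finished.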

    EDIT: Much too late, I figured out that what I wanted is much easier to do with xargs:

    cat command_list | xargs --delimiter='\n' --max-args=1 --max-procs=4 sh -c
    

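    Here --max-args=1 (-n 1) passes each line to sh -c as a single command string, and --max-procs=4 (-P 4) keeps up to four instances of sh running in parallel. The long options and --delimiter (-d) are GNU xargs extensions.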
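To check that a semicolon-separated chain runs as one unit under xargs, here is a small sketch that uses two inline lines instead of a file. It assumes GNU xargs, and OUT is only an illustrative variable:

```shell
#!/bin/sh
# The first line is a chain of two commands, the second a single command.
# Each line becomes one "sh -c" invocation; at most 2 run at the same time.
OUT=$(printf '%s\n' 'echo a; echo b' 'echo c' |
    xargs --delimiter='\n' --max-args=1 --max-procs=2 sh -c)
printf '%s\n' "$OUT"
```

a is always printed before b, but c may appear before, between, or after them, because the two lines run concurrently.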