A Weird Imagination

Kill child jobs on script exit

The problem

When writing a shell script that starts background jobs, sometimes running those jobs past the lifetime of the script doesn't make sense. (Of course, sometimes background jobs really should keep going after the script completes, but that's not the case this post is concerned with.) If the background jobs do computation that only matters to the script, or if the script can conceptually be thought of as a collection of processes, it makes sense for killing the script to also kill any background jobs it started.

The solution

At the start of the script, add

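cleanup() {
    # kill all processes whose parent is this process
    pkill -P $$
}

# On a fatal signal, run cleanup, then re-send the signal to ourselves with
# the default action restored so the script still dies from that signal.
for sig in INT QUIT HUP TERM; do
  trap "
    cleanup
    trap - $sig EXIT
    kill -s $sig "'"$$"' "$sig"
done
# On a normal exit, just run cleanup.
trap cleanup EXIT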

If you really want to kill only jobs and not all child processes, use the kill_child_jobs() function from all.sh or look at the other versions in the kill-child-jobs repository.

The details

Running cleanup on exit

This StackExchange answer gives example code to run a cleanup function on exit, even when exiting because of a signal that would normally halt the script immediately (except, of course, SIGKILL, which cannot be trapped):

for sig in INT QUIT HUP TERM ALRM USR1; do
  trap "
    cleanup
    trap - $sig EXIT
    kill -s $sig "'"$$"' "$sig"
done
trap cleanup EXIT
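To see what re-sending the signal buys us, here is a small experiment. It is a sketch that assumes mktemp is available and uses a hypothetical cleanup that only prints a message. The demo script sends itself SIGTERM: the trap runs cleanup once, removes the EXIT trap so cleanup doesn't run a second time, and then lets the default action of SIGTERM kill the script. The caller therefore sees an exit status indicating death by signal (128 + 15 = 143 in bash and dash) rather than an ordinary exit code.

```shell
# Write a hypothetical demo script that uses the trap pattern from above.
demo=$(mktemp)
cat > "$demo" <<'EOF'
cleanup() { echo "cleanup ran"; }    # stand-in for real cleanup work

for sig in INT QUIT HUP TERM ALRM USR1; do
  trap "
    cleanup
    trap - $sig EXIT
    kill -s $sig "'"$$"' "$sig"
done
trap cleanup EXIT

kill -s TERM $$      # simulate being killed; the trap runs next
echo "not reached"
EOF

# Run it, capturing both its output and its exit status.
exit_status=0
output=$(sh "$demo") || exit_status=$?
rm -f "$demo"

echo "$output"                    # cleanup ran
echo "exit status: $exit_status"  # 143 = 128 + 15 (SIGTERM) in bash and dash
```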

With how to run the cleanup function solved, we just have to figure out what to put in it.

Which processes to kill?

On closer inspection, I realized the problem was underspecified: it isn't obvious exactly which processes count as the script's "child jobs".

The solution I ended up using kills all immediate child processes of the script by killing all processes whose parent is the script:

pkill -P $$

Another option is killing all descendants using rkill:

rkill $$
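The difference between those two options is easy to observe. The following sketch assumes pkill from procps is installed (plus mktemp for a scratch file), and its variable names are made up for the demo. It starts a child shell, which in turn starts a background sleep (a grandchild of the current shell), and then runs pkill -P $$. The child is killed, but the grandchild is not, because its parent is the child rather than this shell; catching it too is what killing all descendants, as with rkill $$, is for.

```shell
# Scratch file for passing the grandchild's PID back to this shell.
pidfile=$(mktemp)

# Direct child: a shell that starts a background sleep (our grandchild),
# writes the grandchild's PID to the scratch file, and waits for it.
sh -c 'sleep 30 >/dev/null 2>&1 & echo "$!" > "$1"; wait' sh "$pidfile" &
child=$!

# Poll until the PID has been written (fractional sleep is a GNU/BusyBox extension).
while [ ! -s "$pidfile" ]; do sleep 0.1; done
grandchild=$(cat "$pidfile")
rm -f "$pidfile"

pkill -P $$    # SIGTERM to every process whose parent PID is this shell

# The grandchild's parent is the child, not us, so pkill -P left it alone.
survived=no
if kill -0 "$grandchild" 2>/dev/null; then
    survived=yes
    kill "$grandchild"    # tidy up by hand
fi

# The direct child died from SIGTERM: 128 + 15 = 143 in bash and dash.
child_status=0
wait "$child" || child_status=$?
echo "child status: $child_status, grandchild survived: $survived"
```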

Alternatively, the OS provides various process grouping mechanisms, and you can kill all processes of the same group. These have various complications, including that if you don't make sure your script is the root of its own process group (or session, etc.), then that group may include parent processes of the script, not just child processes.

Lastly, as a variant of the first option, we can kill all child jobs using the shell's concept of "jobs", which are not quite the same as child processes.

Background jobs

Unix shells have a feature called job control for working with sub-processes they create. Normally a background job is created by running a command with & at the end:

/path/to/cmd some_arg &

The jobs builtin can be used to inspect the active jobs:

$ sleep 100 &
[1] 1730089
$ jobs
[1]+  Running                 sleep 100 &
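In a script, the PID of the most recently started background job is also available in $!, which can be passed to kill or wait. A minimal sketch using only shell builtins and sleep (the variable names are illustrative):

```shell
# Start a background job; $! holds the PID of the most recent one.
sleep 30 >/dev/null 2>&1 &
job_pid=$!
echo "started background job $job_pid"

# The script keeps running alongside the job. Kill the job and collect its
# exit status with wait; a job killed by SIGTERM reports 128 + 15 = 143
# in bash and dash.
kill "$job_pid"
job_status=0
wait "$job_pid" || job_status=$?
echo "job exited with status $job_status"
```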

Using & is the only way for a shell script to have an immediate child process other than one it is running in the foreground and waiting on. So the set of child processes and the set of jobs are generally the same, except for foreground jobs. Those are generally short-running tasks, so it's probably not a big deal to let them complete. (Actually, bash includes foreground jobs in the list of jobs, although other shells do not.)

Complicating matters, bash and zsh have a disown builtin that removes a job from the list of jobs without killing it, so the set of jobs can differ from the set of background child processes. (To add to the confusion, ksh has a builtin named disown, but its semantics are different and do not include actually removing the job from the jobs list.) See this StackExchange answer for a more in-depth explanation.

This gives another possible interpretation of killing all child jobs: perhaps what we want is to kill all jobs that have not been disowned. Unfortunately, this quickly runs into differences between shells.
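As a small illustration, assuming bash is installed (dash and BusyBox ash have no disown), the following runs a bash snippet in which a disowned job disappears from jobs -p even though its process keeps running:

```shell
report=$(bash -c '
  sleep 30 >/dev/null 2>&1 &
  pid=$!
  echo "jobs before disown: [$(jobs -p)]"
  disown                                  # drop the current job from the job table
  echo "jobs after disown: [$(jobs -p)]"  # the job list is now empty
  if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid still running"
  fi
  kill "$pid"                             # tidy up the demo process
')
echo "$report"
```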

Killing all jobs in all shells

There is a short, straightforward way to kill all jobs that works in both bash and zsh:

while kill %% 2>/dev/null; do sleep 0; done

(The original without the sleep 0 works in zsh but makes bash hang.)

Unfortunately, this doesn't actually kill the jobs in dash (the default shell for scripts in Debian and Ubuntu) and, worse, hangs in BusyBox's ash (the default shell on many embedded platforms).
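Here is the loop in action under bash, as a sketch assuming bash is installed: two background sleeps are started, the loop kills them, and kill -0 then confirms that neither PID exists any more.

```shell
report=$(bash -c '
  sleep 30 >/dev/null 2>&1 &
  first=$!
  sleep 30 >/dev/null 2>&1 &
  second=$!

  # Kill the current job (%%) until none are left; per the post, bash
  # needs the "sleep 0" for this loop to terminate.
  while kill %% 2>/dev/null; do sleep 0; done

  for pid in "$first" "$second"; do
    if kill -0 "$pid" 2>/dev/null; then echo "$pid alive"; else echo "$pid gone"; fi
  done
')
echo "$report"
```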

Another alternative that almost works is

kill $(jobs -p)

It works in bash, but dash has a bug that requires a workaround: write the output of jobs -p to a file and read that file back. With that change it works in every shell I tested except zsh, where the jobs builtin does not have a -p option; zsh requires this workaround of parsing the output of jobs instead, which doesn't work in bash or dash. Combining those two solutions gives all.sh, which works in all of the shells I tested, but is much longer than the solutions that work in only a subset of the shells.
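A sketch of that dash workaround might look like the following. The function name is hypothetical, and this is not the post's all.sh (which also handles zsh): the PIDs printed by jobs -p go straight to a temporary file from the current shell and are read back afterwards, instead of being captured with $(jobs -p).

```shell
# Hypothetical helper: send SIGTERM to every job listed by jobs -p.
kill_jobs_via_file() {
    pidfile=$(mktemp) || return 1
    jobs -p > "$pidfile"     # jobs runs in this shell, writing straight to the file
    pids=$(cat "$pidfile")
    rm -f "$pidfile"
    if [ -n "$pids" ]; then
        # $pids is deliberately unquoted so each PID becomes its own argument.
        kill $pids 2>/dev/null
    fi
}

# Usage: start a job, kill all jobs, and check how the job ended.
sleep 30 >/dev/null 2>&1 &
job_pid=$!
kill_jobs_via_file
job_status=0
wait "$job_pid" || job_status=$?
echo "job exit status: $job_status"   # 143 = 128 + 15 (SIGTERM) in bash and dash
```

Because kill receives all the PIDs at once, a job that has already exited only makes kill report an error; the remaining PIDs are still signaled.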

Testing scripts

Since figuring all this out involved running a lot of scripts in a lot of shells and checking their output, I wrote a script to do so, which outputs the following table:

    ∞ = script does not halt (after 1 second timeout)
    X = disown unsupported by shell
    ☠ = all children killed
    🏃 = all children still running
    ✔️ = expected result (job killed, disowned child alive)

| script     | bash | sh  | ash | dash | zsh | ksh |
|------------|------|-----|-----|------|-----|-----|
| all.sh     | ✔️   | X☠  | X☠  | X☠   | ✔️  | ☠   |
| bash.sh    | ✔️   | X🏃 | ∞X☠ | X🏃  | ✔️  | ☠   |
| dash.sh    | ✔️   | X☠  | X☠  | X☠   | 🏃  | ☠   |
| noop.sh    | 🏃   | X🏃 | X🏃 | X🏃  | 🏃  | 🏃  |
| pkill-P.sh | ☠    | X☠  | X☠  | X☠   | ☠   | ☠   |
| zsh.sh     | ∞✔️  | X🏃 | ∞X☠ | X🏃  | ✔️  | ☠   |

test-all.sh prints the table; it loops over the kill_child_jobs implementations and shells and runs test-kill-child-jobs.sh for each combination. That script uses the specified shell to run make-and-kill-child-jobs.sh, which starts two jobs and disowns one of them before calling the specified kill_child_jobs implementation. Both jobs it starts run wait_for_pid_exit.sh, which is just a simple loop that constantly checks whether the specified PID has exited (and therefore whether the job outlived that process). test-kill-child-jobs.sh interprets the output of the script to determine which jobs outlived the script and prints the summary string that goes into the table.
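wait_for_pid_exit.sh itself isn't reproduced in this post. As a rough sketch of the idea, assuming it polls with kill -0 (the function name, the 0.1-second interval, and the message are illustrative assumptions), a watcher might look like the following. One caveat: kill -0 also succeeds on a zombie, so the loop only ends once the watched process has been reaped by its parent.

```shell
# Hypothetical stand-in for wait_for_pid_exit.sh: poll until the PID is gone.
wait_for_pid_exit() {
    # kill -0 sends no signal; it only checks whether the PID can be signaled.
    while kill -0 "$1" 2>/dev/null; do
        sleep 0.1    # fractional sleep is a GNU coreutils/BusyBox extension
    done
    echo "outlived $1"
}

# Usage: watch a short-lived background process started by this shell.
sleep 1 &
wait_for_pid_exit "$!"
```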

