199
behavioral concurrency

Supervisor-Worker

Reference: Wikipedia
[Figure: Plate 199, Supervisor-Worker sequence diagram]

The Supervisor-Worker pattern addresses the challenge of keeping long-running processes or tasks running reliably. A Supervisor component monitors and controls one or more Worker components. The Workers perform the actual work. The Supervisor checks that they are still alive, restarts them when they fail, and decides what happens when a Worker keeps failing (for example, giving up after a restart limit and reporting the error). Separating the work from its supervision makes the system more reliable and resilient.
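The monitor-restart-give-up cycle described above can be sketched in a few lines of plain Kotlin. This is a minimal sketch, not a reference implementation: it assumes a worker is simply a function that either returns or throws, and that the supervisor stops after a fixed restart budget. The names `Outcome`, `supervise`, and `maxRestarts` are hypothetical and do not come from any library.

```kotlin
// Hypothetical result type reported by the supervisor.
sealed class Outcome {
    data class Completed(val restarts: Int) : Outcome()
    data class GaveUp(val restarts: Int, val lastError: Throwable) : Outcome()
}

// Hypothetical supervisor: runs `work`, restarting it after each failure until it
// succeeds or the restart budget `maxRestarts` is used up. `attempt` counts restarts so far.
fun supervise(name: String, maxRestarts: Int, work: (attempt: Int) -> Unit): Outcome {
    require(maxRestarts >= 0) { "maxRestarts must be non-negative" }
    var lastError: Throwable? = null
    for (attempt in 0..maxRestarts) {
        try {
            work(attempt)
            return Outcome.Completed(restarts = attempt)
        } catch (e: Exception) {
            // Monitoring: the supervisor observes the failure instead of letting it crash the caller.
            println("Supervisor: $name failed on attempt $attempt (${e.message})")
            lastError = e
        }
    }
    // Graceful failure handling: report the last error rather than restarting forever.
    return Outcome.GaveUp(restarts = maxRestarts, lastError = checkNotNull(lastError))
}

fun main() {
    // A flaky worker that crashes on its first two attempts, then succeeds.
    val flaky = supervise("flaky", maxRestarts = 3) { attempt ->
        if (attempt < 2) throw IllegalStateException("crash on attempt $attempt")
        println("flaky: done on attempt $attempt")
    }
    // A worker that always fails exhausts its budget of one restart.
    val broken = supervise("broken", maxRestarts = 1) { throw RuntimeException("boom") }
    println(flaky)
    println(broken)
}
```

The restart budget is the design choice that keeps "restart on failure" from turning into an endless crash loop: once it is exhausted, the supervisor reports the failure upward instead of hiding it.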

Usage

This pattern is widely used in distributed systems, microservices architectures, and any scenario requiring asynchronous task processing that must keep making progress despite individual failures. It is especially beneficial when tasks are time-consuming, workers may fail unpredictably, resilience is crucial for system stability, or the state of workers needs to be monitored and controlled. Common applications include background job processing, data ingestion pipelines, and managing worker nodes in a cluster.

Examples

  1. Kubernetes: Kubernetes applies the Supervisor-Worker pattern at several levels. On each node, the kubelet (Supervisor) restarts a Pod's failed containers (Workers) according to the Pod's restartPolicy. In the Control Plane, controllers such as the ReplicaSet controller replace Pods that are lost, so that the desired number of replicas keeps running. Liveness probes define when a running container is considered failed and should be restarted.

  2. Celery (Python): Celery is a distributed task queue. Clients send tasks through a broker (often Redis or RabbitMQ), and worker processes pick them up and execute them; results, if tracked, are stored in a result backend. Inside each worker, the main process acts as a Supervisor for its pool of child processes (with the default prefork pool): if a child process dies, the main process replaces it. Restarting a whole worker process that crashes is usually delegated to an external supervisor such as systemd or supervisord.

  3. systemd (Linux): systemd is the system and service manager used by many Linux distributions. It acts as a Supervisor for services (Workers), each described by a unit file. The unit's Restart= setting (e.g., Restart=on-failure) determines whether systemd restarts a service that terminates unexpectedly, which embodies the Supervisor-Worker pattern at the OS level.
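The restart policies in these examples reduce to a small decision function. The sketch below is simplified and assumes a worker's termination is summarized by its exit code alone. systemd's on-failure also reacts to signals, timeouts, and watchdog expiry, which this ignores. The `RestartPolicy` enum and `shouldRestart` function are hypothetical, though the three values mirror systemd's no/on-failure/always and Kubernetes' Never/OnFailure/Always.

```kotlin
// Hypothetical policy type; values mirror systemd's Restart=no / on-failure / always.
enum class RestartPolicy { NO, ON_FAILURE, ALWAYS }

// Decide whether a supervisor should restart a worker that exited with `exitCode`.
// Simplification: exit code 0 means success, anything else means failure.
fun shouldRestart(policy: RestartPolicy, exitCode: Int): Boolean = when (policy) {
    RestartPolicy.NO -> false
    RestartPolicy.ON_FAILURE -> exitCode != 0
    RestartPolicy.ALWAYS -> true
}

fun main() {
    for (policy in RestartPolicy.values()) {
        println("$policy: clean exit -> ${shouldRestart(policy, 0)}, crash -> ${shouldRestart(policy, 1)}")
    }
}
```

ON_FAILURE suits batch-style workers that are expected to finish, while ALWAYS suits long-running services that should never stay down.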

Specimens

10 implementations
Specimen 199.01 Kotlin

The Supervisor-Worker pattern manages a team of worker coroutines from a supervising scope. The Kotlin implementation uses supervisorScope to create the supervising context and starts each worker with launch inside it. In a supervisorScope, the failure of one child does not cancel its siblings or the scope itself, so one crashing worker cannot take down the others. The failed coroutine's own children are still cancelled. supervisorScope does not restart failed workers or retry their work; any restart logic has to be written explicitly. An exception that escapes a worker is reported to the CoroutineExceptionHandler in that worker's context, or to the global handler if none is installed. This is idiomatic Kotlin because it relies on structured concurrency: supervisorScope does not return until every worker has finished, so explicit join() calls are optional, and the code stays concise and readable.

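import kotlinx.coroutines.*

fun main() = runBlocking {
    // Reports failures that escape a worker. Without it, supervisorScope children
    // fall back to the global handler (a stack trace on stderr).
    val handler = CoroutineExceptionHandler { _, e ->
        println("Supervisor: worker failed - ${e.message}")
    }

    // supervisorScope: a failing child cancels neither its siblings nor the scope.
    supervisorScope {
        // Worker 1 fails on its third iteration. The exception is not caught inside
        // the worker, so it reaches the supervisor's handler.
        launch(handler) {
            repeat(5) {
                println("Worker 1: Doing work ${it + 1}")
                delay(500)
                if (it == 2) throw IllegalStateException("Worker 1 failed!")
            }
            println("Worker 1: Finished")
        }

        // Workers 2 and 3 keep running even after Worker 1 fails.
        launch {
            println("Worker 2: Starting")
            delay(1000)
            println("Worker 2: Finished")
        }

        launch {
            repeat(3) {
                println("Worker 3: Doing work ${it + 1}")
                delay(750)
            }
            println("Worker 3: Finished")
        }
        // No join() needed: supervisorScope returns only after all children complete.
    }
    println("All workers completed (or supervisor finished handling failures).")
}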
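Because supervisorScope only isolates failures, restarting a worker has to be coded by hand. The following is a minimal sketch, assuming a fixed restart budget and a fixed delay between attempts. `restartOnFailure`, `maxRestarts`, and `backoffMs` are hypothetical names, not part of kotlinx.coroutines. CancellationException is rethrown rather than treated as a failure, so cancelling the supervisor still stops the worker.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: reruns `work` after each failure, up to `maxRestarts` times.
suspend fun restartOnFailure(
    name: String,
    maxRestarts: Int,
    backoffMs: Long,
    work: suspend (attempt: Int) -> Unit
) {
    for (attempt in 0..maxRestarts) {
        try {
            work(attempt)
            return // finished normally; no restart needed
        } catch (e: CancellationException) {
            throw e // cancellation is not a failure: never restart on it
        } catch (e: Exception) {
            println("Supervisor: $name failed on attempt $attempt - ${e.message}")
            if (attempt == maxRestarts) throw e // budget exhausted: let the failure escape
            delay(backoffMs) // wait before restarting
        }
    }
}

fun main() = runBlocking {
    val handler = CoroutineExceptionHandler { _, e -> println("Supervisor: giving up - ${e.message}") }
    supervisorScope {
        // Fails twice, then succeeds on its third attempt.
        launch(handler) {
            restartOnFailure("Worker 1", maxRestarts = 3, backoffMs = 100) { attempt ->
                delay(200)
                if (attempt < 2) throw IllegalStateException("crash #${attempt + 1}")
                println("Worker 1: Finished on attempt $attempt")
            }
        }
        // Always fails; its final exception reaches the handler while Worker 1 keeps running.
        launch(handler) {
            restartOnFailure("Worker 2", maxRestarts = 1, backoffMs = 100) {
                throw IllegalStateException("unrecoverable")
            }
        }
    }
    println("Supervisor: all workers settled")
}
```

Catching CancellationException before the generic Exception clause matters: kotlinx.coroutines' CancellationException is a subclass of IllegalStateException, so a single catch-all would otherwise "restart" a worker that was deliberately cancelled.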