RubyGems - async-container-supervisor - Versions diffs - 0.6.4 → 0.8.0 - Mend

async-container-supervisor 0.6.4 → 0.8.0

Files changed (25)

checksums.yaml +4 -4
checksums.yaml.gz.sig +0 -0
data/bake/async/container/supervisor.rb +19 -0
data/context/getting-started.md +51 -13
data/context/index.yaml +8 -0
data/context/memory-monitor.md +129 -0
data/context/process-monitor.md +91 -0
data/lib/async/container/supervisor/client.rb +3 -0
data/lib/async/container/supervisor/connection.rb +117 -0
data/lib/async/container/supervisor/dispatchable.rb +8 -0
data/lib/async/container/supervisor/endpoint.rb +4 -0
data/lib/async/container/supervisor/environment.rb +11 -0
data/lib/async/container/supervisor/memory_monitor.rb +25 -1
data/lib/async/container/supervisor/process_monitor.rb +89 -0
data/lib/async/container/supervisor/server.rb +78 -1
data/lib/async/container/supervisor/service.rb +11 -0
data/lib/async/container/supervisor/supervised.rb +7 -0
data/lib/async/container/supervisor/version.rb +4 -1
data/lib/async/container/supervisor/worker.rb +86 -2
data/lib/async/container/supervisor.rb +1 -10
data/readme.md +13 -0
data/releases.md +9 -0
data.tar.gz.sig +0 -0
metadata +32 -1
metadata.gz.sig +0 -0

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: d8b6fc36efbaa903110cf96b7a36dad650dcd0f872699e04cd0f7f49e75eb492
-  data.tar.gz: b48de3c7330a916104fcfb451efd9de6957fb6760b55713fa35307c0c1959eff
+  metadata.gz: 39abccaf400a7b793d8f0094e32ccee9a4a9fdad6c6f570e361cace376ebd611
+  data.tar.gz: 2f135ee3b0979a16a899a07c760e8aeb46f2474635f9f2862c4eef43b7744961
 SHA512:
-  metadata.gz: e060d79b2dd7eb36eb10bf7866a62ed7bb62085dec6ec2b34829417a7f0c4e71ba3961868bcffe9732a3469b3233dcb669698d5bb4c73f1c369c7331dd4ad7ee
-  data.tar.gz: 0e27ebb20d7379e6584d6a6abbbe6afe82dc1053a78625faf28cb13e6358020ec1eeb4f53d72a623e636c23b3abeb4a76547c5edaa00d0777782190689f5c7f7
+  metadata.gz: ffe7ddc8855501a0c30e35e925596a2aaf262d34a373608e16882295603bb1137f30be80f533adcfc480c957208c96e42907006d0564270bf209763b7d3d81a5
+  data.tar.gz: 7b71f2cdcf3f75973fffdaf676f430e34270a6b1be3b684439e50f202854bffddf9ca8d4983a441d9e4fb01319414e80343e9f2a78e4698e964b01f94c71c58e

checksums.yaml.gz.sig CHANGED

Binary file

data/bake/async/container/supervisor.rb CHANGED

@@ -29,6 +29,25 @@ def status
 	end
 end
+# Sample memory allocations from a worker over a time period.
+#
+# This is useful for identifying memory leaks by tracking allocations
+# that are retained after garbage collection.
+#
+# @parameter duration [Integer] The duration in seconds to sample for (default: 10).
+# @parameter connection_id [String] The connection ID to target a specific worker.
+def memory_sample(duration: 10, connection_id:)
+	client do |connection|
+		Console.info(self, "Sampling memory from worker...", duration: duration, connection_id: connection_id)
+		# Build the operation request:
+		operation = {do: :memory_sample, duration: duration}
+		# Use the forward operation to proxy the request to a worker:
+		return connection.call(do: :forward, operation: operation, connection_id: connection_id)
+	end
+end
 private
 def endpoint
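
The `memory_sample` task above wraps the worker operation inside a `forward` request addressed by `connection_id`. A minimal sketch of that request shape, using plain hashes only; `forward_request` and the `"example-connection-id"` value are hypothetical names for illustration and are not part of the gem:

```ruby
require "json"

# Hypothetical helper mirroring the request the bake task builds.
# The outer message selects the `forward` operation; the inner
# `operation` hash is what the targeted worker eventually receives.
def forward_request(operation, connection_id:)
	{do: :forward, operation: operation, connection_id: connection_id}
end

# Placeholder connection ID; real IDs are assigned by the supervisor.
request = forward_request(
	{do: :memory_sample, duration: 30},
	connection_id: "example-connection-id"
)

puts JSON.pretty_generate(request)
```

Keeping the worker operation nested under `operation` means the supervisor only needs to understand `forward` and `connection_id`; the payload itself can stay opaque to it.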

data/context/getting-started.md CHANGED

@@ -35,12 +35,6 @@ graph TD
     Worker1 -.->|connects via IPC| Supervisor
     Worker2 -.->|connects via IPC| Supervisor
     WorkerN -.->|connects via IPC| Supervisor
-    style Controller fill:#e1f5ff
-    style Supervisor fill:#fff4e1
-    style Worker1 fill:#e8f5e9
-    style Worker2 fill:#e8f5e9
-    style WorkerN fill:#e8f5e9
 ```
 **Important:** The supervisor process is itself just another process managed by the root controller. If the supervisor crashes, the controller will restart it, and all worker processes will automatically reconnect to the new supervisor. This design ensures high availability and fault tolerance.
@@ -115,7 +109,13 @@ This will start:
 ### Adding Health Monitors
-You can add monitors to detect and respond to unhealthy conditions. For example, to add a memory monitor:
+You can add monitors to observe worker health and automatically respond to issues. Monitors are useful for:
+- **Memory leak detection**: Automatically restart workers consuming excessive memory.
+- **Performance monitoring**: Track CPU and memory usage trends.
+- **Capacity planning**: Understand resource requirements.
+For example, to add monitoring:
 ```ruby
 service "supervisor" do
@@ -123,29 +123,67 @@ service "supervisor" do
 	monitors do
 		[
-			# Restart workers that exceed 500MB of memory:
+			# Log process metrics for observability:
+			Async::Container::Supervisor::ProcessMonitor.new(
+				interval: 60
+			),
+			# Restart workers exceeding memory limits:
 			Async::Container::Supervisor::MemoryMonitor.new(
-				interval: 10,  # Check every 10 seconds
-				limit: 1024 * 1024 * 500  # 500MB limit
+				interval: 10,
+				maximum_size_limit: 1024 * 1024 * 500  # 500MB limit per process
 			)
 		]
 	end
 end
 ```
-The {ruby Async::Container::Supervisor::MemoryMonitor} will periodically check worker memory usage and restart any workers that exceed the configured limit.
+See the {ruby Async::Container::Supervisor::MemoryMonitor Memory Monitor} and {ruby Async::Container::Supervisor::ProcessMonitor Process Monitor} guides for detailed configuration options and best practices.
 ### Collecting Diagnostics
 The supervisor can collect various diagnostics from workers on demand:
-- **Memory dumps**: Full heap dumps for memory analysis
-- **Thread dumps**: Stack traces of all threads
+- **Memory dumps**: Full heap dumps for memory analysis via `ObjectSpace.dump_all`.
+- **Memory samples**: Lightweight sampling to identify memory leaks.
+- **Thread dumps**: Stack traces of all threads.
 - **Scheduler dumps**: Async fiber hierarchy
 - **Garbage collection profiles**: GC performance data
 These can be triggered programmatically or via command-line tools (when available).
+#### Memory Leak Diagnosis
+To identify memory leaks, you can use the memory sampling feature which is much lighter weight than a full memory dump. It tracks allocations over a time period and focuses on retained objects.
+**Using the bake task:**
+```bash
+# Sample for 30 seconds and print report to console
+$ bake async:container:supervisor:memory_sample duration=30 connection_id=<connection-id>
+```
+**Programmatically:**
+```ruby
+# Assuming you have a connection to a worker:
+result = connection.call(do: :memory_sample, duration: 30)
+puts result[:data]
+```
+This will sample memory allocations for the specified duration, then force a garbage collection and return a JSON report showing what objects were allocated during that period and retained after GC. Late-lifecycle allocations that are retained are likely memory leaks.
+The JSON report includes:
+- `total_allocated`: Total allocated memory and count
+- `total_retained`: Total retained memory and count
+- `by_gem`: Breakdown by gem/library
+- `by_file`: Breakdown by source file
+- `by_location`: Breakdown by specific file:line locations
+- `by_class`: Breakdown by object class
+- `strings`: String allocation analysis
+This is much more efficient than `do: :memory_dump` which uses `ObjectSpace.dump_all` and can be slow and blocking on large heaps. The JSON format also makes it easy to integrate with monitoring and analysis tools.
 ## Advanced Usage
 ### Custom Monitors

data/context/index.yaml CHANGED

@@ -10,3 +10,11 @@ files:
   title: Getting Started
   description: This guide explains how to get started with `async-container-supervisor`
     to supervise and monitor worker processes in your Ruby applications.
+- path: memory-monitor.md
+  title: Memory Monitor
+  description: This guide explains how to use the <code class="language-ruby">Async::Container::Supervisor::MemoryMonitor</code>
+    to detect and restart workers that exceed memory limits or develop memory leaks.
+- path: process-monitor.md
+  title: Process Monitor
+  description: This guide explains how to use the <code class="language-ruby">Async::Container::Supervisor::ProcessMonitor</code>
+    to log CPU and memory metrics for your worker processes.

data/context/memory-monitor.md ADDED

@@ -0,0 +1,129 @@
+# Memory Monitor
+This guide explains how to use the {ruby Async::Container::Supervisor::MemoryMonitor} to detect and restart workers that exceed memory limits or develop memory leaks.
+## Overview
+Long-running worker processes often accumulate memory over time, either through legitimate growth or memory leaks. Without intervention, workers can consume all available system memory, causing performance degradation or system crashes. The `MemoryMonitor` solves this by automatically detecting and restarting problematic workers before they impact system stability.
+Use the `MemoryMonitor` when you need:
+- **Memory leak protection**: Automatically restart workers that continuously accumulate memory.
+- **Resource limits**: Enforce maximum memory usage per worker.
+- **System stability**: Prevent runaway processes from exhausting system memory.
+- **Leak diagnosis**: Capture memory samples when leaks are detected for debugging.
+The monitor uses the `memory-leak` gem to track process memory usage over time, detecting abnormal growth patterns that indicate leaks.
+## Usage
+Add a memory monitor to your supervisor service to automatically restart workers that exceed 500MB:
+```ruby
+service "supervisor" do
+	include Async::Container::Supervisor::Environment
+	monitors do
+		[
+			Async::Container::Supervisor::MemoryMonitor.new(
+				# Check worker memory every 10 seconds:
+				interval: 10,
+				# Restart workers exceeding 500MB:
+				maximum_size_limit: 1024 * 1024 * 500
+			)
+		]
+	end
+end
+```
+When a worker exceeds the limit:
+1. The monitor logs the leak detection.
+2. Optionally captures a memory sample for debugging.
+3. Sends `SIGINT` to gracefully shut down the worker.
+4. The container automatically spawns a replacement worker.
+## Configuration Options
+The `MemoryMonitor` accepts the following options:
+### `interval`
+The interval (in seconds) at which to check for memory leaks. Default: `10` seconds.
+```ruby
+Async::Container::Supervisor::MemoryMonitor.new(interval: 30)
+```
+### `maximum_size_limit`
+The maximum memory size (in bytes) per process. When a process exceeds this limit, it will be restarted.
+```ruby
+# 500MB limit
+Async::Container::Supervisor::MemoryMonitor.new(maximum_size_limit: 1024 * 1024 * 500)
+# 1GB limit
+Async::Container::Supervisor::MemoryMonitor.new(maximum_size_limit: 1024 * 1024 * 1024)
+```
+### `total_size_limit`
+The total size limit (in bytes) for all monitored processes combined. If not specified, only per-process limits are enforced.
+```ruby
+# Total limit of 2GB across all workers
+Async::Container::Supervisor::MemoryMonitor.new(
+	maximum_size_limit: 1024 * 1024 * 500,  # 500MB per process
+	total_size_limit: 1024 * 1024 * 1024 * 2  # 2GB total
+)
+```
+### `memory_sample`
+Options for capturing memory samples when a leak is detected. If `nil`, memory sampling is disabled.
+Default: `{duration: 30, timeout: 120}`
+```ruby
+# Customize memory sampling:
+Async::Container::Supervisor::MemoryMonitor.new(
+	memory_sample: {
+		duration: 60,  # Sample for 60 seconds
+		timeout: 180   # Timeout after 180 seconds
+	}
+)
+# Disable memory sampling:
+Async::Container::Supervisor::MemoryMonitor.new(
+	memory_sample: nil
+)
+```
+## Memory Leak Detection
+When a memory leak is detected, the monitor will:
+1. Log the leak detection with process details.
+2. If `memory_sample` is configured, capture a memory sample from the worker.
+3. Send a `SIGINT` signal to gracefully restart the worker.
+4. The container will automatically restart the worker process.
+### Memory Sampling
+When a memory leak is detected and `memory_sample` is configured, the monitor requests a lightweight memory sample from the worker. This sample:
+- Tracks allocations during the sampling period.
+- Forces a garbage collection.
+- Returns a JSON report showing retained objects.
+The report includes:
+- `total_allocated`: Total allocated memory and object count.
+- `total_retained`: Total retained memory and count after GC.
+- `by_gem`: Breakdown by gem/library.
+- `by_file`: Breakdown by source file.
+- `by_location`: Breakdown by specific file:line locations.
+- `by_class`: Breakdown by object class.
+- `strings`: String allocation analysis.
+This is much more efficient than a full heap dump using `ObjectSpace.dump_all`.
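
Both `maximum_size_limit` and `total_size_limit` are given in bytes, so the multiplications in the guide are easy to misread. A small sketch of the arithmetic, assuming hypothetical helper names (`megabytes`, `gigabytes`) that are not provided by the gem:

```ruby
# Hypothetical helpers for expressing limits in bytes.
MEGABYTE = 1024 * 1024
GIGABYTE = 1024 * MEGABYTE

def megabytes(count)
	count * MEGABYTE
end

def gigabytes(count)
	count * GIGABYTE
end

# Same values as `1024 * 1024 * 500` and `1024 * 1024 * 1024 * 2`:
per_process_limit = megabytes(500)
total_limit = gigabytes(2)

# How many workers sitting exactly at the per-process limit
# fit under the total limit (integer division):
workers_at_limit = total_limit / per_process_limit

puts per_process_limit  # 524288000
puts total_limit        # 2147483648
puts workers_at_limit   # 4
```

With these example values, a fifth worker at its per-process limit would push the combined usage past the total limit, so the two settings constrain each other.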

data/context/process-monitor.md ADDED

@@ -0,0 +1,91 @@
+# Process Monitor
+This guide explains how to use the {ruby Async::Container::Supervisor::ProcessMonitor} to log CPU and memory metrics for your worker processes.
+## Overview
+Understanding how your workers consume resources over time is essential for performance optimization, capacity planning, and debugging. Without visibility into CPU and memory usage, you can't identify bottlenecks, plan infrastructure scaling, or diagnose production issues effectively.
+The `ProcessMonitor` provides this observability by periodically capturing and logging comprehensive metrics for your entire application process tree.
+Use the `ProcessMonitor` when you need:
+- **Performance analysis**: Identify which workers consume the most CPU or memory.
+- **Capacity planning**: Determine optimal worker counts and memory requirements.
+- **Trend monitoring**: Track resource usage patterns over time.
+- **Debugging assistance**: Correlate resource usage with application behavior.
+- **Cost optimization**: Right-size infrastructure based on actual usage.
+Unlike the {ruby Async::Container::Supervisor::MemoryMonitor}, which takes action when limits are exceeded, the `ProcessMonitor` is purely observational - it logs metrics without interfering with worker processes.
+## Usage
+Add a process monitor to log resource usage every minute:
+```ruby
+service "supervisor" do
+	include Async::Container::Supervisor::Environment
+	monitors do
+		[
+			# Log CPU and memory metrics for all processes:
+			Async::Container::Supervisor::ProcessMonitor.new(
+				interval: 60  # Capture metrics every minute
+			)
+		]
+	end
+end
+```
+Each process is logged as a separate entry with its metrics under the `general` key, which allows you to search and filter by specific fields:
+- `general.process_id = 12347` - Find metrics for a specific process.
+- `general.command = "worker-1"` - Find all metrics for worker processes.
+- `general.processor_utilization > 50` - Find high CPU usage processes.
+- `general.resident_size > 500000` - Find processes using more than 500MB.
+## Configuration Options
+### `interval`
+The interval (in seconds) at which to capture and log process metrics. Default: `60` seconds.
+```ruby
+# Log every 30 seconds
+Async::Container::Supervisor::ProcessMonitor.new(interval: 30)
+# Log every 5 minutes
+Async::Container::Supervisor::ProcessMonitor.new(interval: 300)
+```
+## Captured Metrics
+The `ProcessMonitor` captures the following metrics for each process:
+### Core Metrics
+- **process_id**: Unique identifier for the process.
+- **parent_process_id**: The parent process that spawned this one.
+- **process_group_id**: Process group identifier.
+- **command**: The command name.
+- **processor_utilization**: CPU usage percentage.
+- **resident_size**: Physical memory used (KB).
+- **total_size**: Total memory space including shared memory (KB).
+- **processor_time**: Total CPU time used (seconds).
+- **elapsed_time**: How long the process has been running (seconds).
+### Detailed Memory Metrics
+When available (OS-dependent), additional memory details are captured:
+- **map_count**: Number of memory mappings (stacks, libraries, etc.).
+- **proportional_size**: Memory usage accounting for shared memory (KB).
+- **shared_clean_size**: Unmodified shared memory (KB).
+- **shared_dirty_size**: Modified shared memory (KB).
+- **private_clean_size**: Unmodified private memory (KB).
+- **private_dirty_size**: Modified private memory (KB).
+- **referenced_size**: Active page-cache (KB).
+- **anonymous_size**: Memory not backed by files (KB).
+- **swap_size**: Memory swapped to disk (KB).
+- **proportional_swap_size**: Proportional swap usage (KB).
+- **major_faults**: The number of page faults requiring I/O.
+- **minor_faults**: The number of page faults that don't require I/O (e.g. CoW).

data/lib/async/container/supervisor/client.rb CHANGED

@@ -11,6 +11,9 @@ module Async
 		module Supervisor
 			# A client provides a mechanism to connect to a supervisor server in order to execute operations.
 			class Client
+				# Initialize a new client.
+				#
+				# @parameter endpoint [IO::Endpoint] The supervisor endpoint to connect to.
 				def initialize(endpoint: Supervisor.endpoint)
 					@endpoint = endpoint
 				end

data/lib/async/container/supervisor/connection.rb CHANGED

@@ -8,8 +8,19 @@ require "json"
 module Async
 	module Container
 		module Supervisor
+			# Represents a bidirectional communication channel between supervisor and worker.
+			#
+			# Handles message passing, call/response patterns, and connection lifecycle.
 			class Connection
+				# Represents a remote procedure call over a connection.
+				#
+				# Manages the call lifecycle, response queueing, and completion signaling.
 				class Call
+					# Initialize a new call.
+					#
+					# @parameter connection [Connection] The connection this call belongs to.
+					# @parameter id [Integer] The unique call identifier.
+					# @parameter message [Hash] The call message/parameters.
 					def initialize(connection, id, message)
 						@connection = connection
 						@id = id
@@ -18,10 +29,16 @@ module Async
 						@queue = ::Thread::Queue.new
 					end
+					# Convert the call to a JSON-compatible hash.
+					#
+					# @returns [Hash] The message hash.
 					def as_json(...)
 						@message
 					end
+					# Convert the call to a JSON string.
+					#
+					# @returns [String] The JSON representation.
 					def to_json(...)
 						as_json.to_json(...)
 					end
@@ -32,14 +49,24 @@ module Async
 					# @attribute [Hash] The message that initiated the call.
 					attr :message
+					# Access a parameter from the call message.
+					#
+					# @parameter key [Symbol] The parameter name.
+					# @returns [Object] The parameter value.
 					def [] key
 						@message[key]
 					end
+					# Push a response into the call's queue.
+					#
+					# @parameter response [Hash] The response data to push.
 					def push(**response)
 						@queue.push(response)
 					end
+					# Pop a response from the call's queue.
+					#
+					# @returns [Hash, nil] The next response or nil if queue is closed.
 					def pop(...)
 						@queue.pop(...)
 					end
@@ -49,12 +76,20 @@ module Async
 						@queue.close
 					end
+					# Iterate over all responses from the call.
+					#
+					# @yields {|response| ...} Each response from the queue.
 					def each(&block)
 						while response = self.pop
 							yield response
 						end
 					end
+					# Finish the call with a final response.
+					#
+					# Closes the response queue after pushing the final response.
+					#
+					# @parameter response [Hash] The final response data.
 					def finish(**response)
 						# If the remote end has already closed the connection, we don't need to send a finished message:
 						unless @queue.closed?
@@ -63,14 +98,51 @@ module Async
 						end
 					end
+					# Finish the call with a failure response.
+					#
+					# @parameter response [Hash] The error response data.
 					def fail(**response)
 						self.finish(failed: true, **response)
 					end
+					# Check if the call's queue is closed.
+					#
+					# @returns [Boolean] True if the queue is closed.
 					def closed?
 						@queue.closed?
 					end
+					# Forward this call to another connection, proxying all responses back.
+					#
+					# This provides true streaming forwarding - intermediate responses flow through
+					# in real-time rather than being buffered. The forwarding runs asynchronously
+					# to avoid blocking the dispatcher.
+					#
+					# @parameter target_connection [Connection] The connection to forward the call to.
+					# @parameter operation [Hash] The operation request to forward (must include :do key).
+					def forward(target_connection, operation)
+						# Forward the operation in an async task to avoid blocking
+						Async do
+							# Make the call to the target connection and stream responses back:
+							Call.call(target_connection, **operation) do |response|
+								# Push each response through our queue:
+								self.push(**response)
+							end
+						ensure
+							# Close our queue to signal completion:
+							@queue.close
+						end
+					end
+					# Dispatch a call to a target handler.
+					#
+					# Creates a call, dispatches it to the target, and streams responses back
+					# through the connection.
+					#
+					# @parameter connection [Connection] The connection to dispatch on.
+					# @parameter target [Dispatchable] The target handler.
+					# @parameter id [Integer] The call identifier.
+					# @parameter message [Hash] The call message.
 					def self.dispatch(connection, target, id, message)
 						Async do
 							call = self.new(connection, id, message)
@@ -91,6 +163,15 @@ module Async
 						end
 					end
+					# Make a call on a connection and wait for responses.
+					#
+					# If a block is provided, yields each response. Otherwise, buffers intermediate
+					# responses and returns the final response.
+					#
+					# @parameter connection [Connection] The connection to call on.
+					# @parameter message [Hash] The call message/parameters.
+					# @yields {|response| ...} Each intermediate response if block given.
+					# @returns [Hash, Array] The final response or array of intermediate responses.
 					def self.call(connection, **message, &block)
 						id = connection.next_id
 						call = self.new(connection, id, message)
@@ -128,6 +209,11 @@ module Async
 					end
 				end
+				# Initialize a new connection.
+				#
+				# @parameter stream [IO] The underlying IO stream.
+				# @parameter id [Integer] The starting call ID (default: 0).
+				# @parameter state [Hash] Initial connection state.
 				def initialize(stream, id = 0, **state)
 					@stream = stream
 					@id = id
@@ -143,15 +229,26 @@ module Async
 				# @attribute [Hash(Symbol, Object)] State associated with this connection, for example the process ID, etc.
 				attr_accessor :state
+				# Generate the next unique call ID.
+				#
+				# @returns [Integer] The next call identifier.
 				def next_id
 					@id += 2
 				end
+				# Write a message to the connection stream.
+				#
+				# @parameter message [Hash] The message to write.
 				def write(**message)
 					@stream.write(JSON.dump(message) << "\n")
 					@stream.flush
 				end
+				# Make a synchronous call and wait for a single response.
+				#
+				# @parameter timeout [Numeric, nil] Optional timeout for the call.
+				# @parameter message [Hash] The call message.
+				# @returns [Hash] The response.
 				def call(timeout: nil, **message)
 					id = next_id
 					calls[id] = ::Thread::Queue.new
@@ -163,22 +260,34 @@ module Async
 					calls.delete(id)
 				end
+				# Read a message from the connection stream.
+				#
+				# @returns [Hash, nil] The parsed message or nil if stream is closed.
 				def read
 					if line = @stream&.gets
 						JSON.parse(line, symbolize_names: true)
 					end
 				end
+				# Iterate over all messages from the connection.
+				#
+				# @yields {|message| ...} Each message read from the stream.
 				def each
 					while message = self.read
 						yield message
 					end
 				end
+				# Make a synchronous call and wait for a single response.
 				def call(...)
 					Call.call(self, ...)
 				end
+				# Run the connection, processing incoming messages.
+				#
+				# Dispatches incoming calls to the target and routes responses to waiting calls.
+				#
+				# @parameter target [Dispatchable] The target to dispatch calls to.
 				def run(target)
 					self.each do |message|
 						if id = message.delete(:id)
@@ -198,12 +307,20 @@ module Async
 					end
 				end
+				# Run the connection in a background task.
+				#
+				# @parameter target [Dispatchable] The target to dispatch calls to.
+				# @parameter parent [Async::Task] The parent task.
+				# @returns [Async::Task] The background reader task.
 				def run_in_background(target, parent: Task.current)
 					@reader ||= parent.async do
 						self.run(target)
 					end
 				end
+				# Close the connection and clean up resources.
+				#
+				# Stops the background reader, closes the stream, and closes all pending calls.
 				def close
 					if @reader
 						@reader.stop
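
The `write` and `read` methods in this file frame each message as one JSON document per line. A self-contained sketch of that framing, using `StringIO` in place of the real socket; `write_message` and `read_message` are hypothetical stand-ins for `Connection#write` and `Connection#read`:

```ruby
require "json"
require "stringio"

# Write one JSON message per line, as `Connection#write` does:
def write_message(stream, **message)
	stream.write(JSON.dump(message) << "\n")
	stream.flush
end

# Read one message, as `Connection#read` does (nil at end of stream):
def read_message(stream)
	if line = stream.gets
		JSON.parse(line, symbolize_names: true)
	end
end

# An in-memory stream stands in for the Unix socket:
stream = StringIO.new
write_message(stream, id: 2, do: :memory_sample, duration: 30)
write_message(stream, id: 2, finished: true)
stream.rewind

messages = []
while message = read_message(stream)
	messages << message
end

messages.each {|entry| puts entry.inspect}
```

Keys come back as symbols because of `symbolize_names: true`, but symbol values such as `:memory_sample` arrive as strings after the JSON round trip, which is why dispatch builds method names by string interpolation.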

data/lib/async/container/supervisor/dispatchable.rb CHANGED

@@ -9,7 +9,15 @@ require_relative "endpoint"
 module Async
 	module Container
 		module Supervisor
+			# A mixin for objects that can dispatch calls.
+			#
+			# Provides automatic method dispatch based on the call's `:do` parameter.
 			module Dispatchable
+				# Dispatch a call to the appropriate method.
+				#
+				# Routes calls to methods named `do_#{operation}` based on the call's `:do` parameter.
+				#
+				# @parameter call [Connection::Call] The call to dispatch.
 				def dispatch(call)
 					method_name = "do_#{call.message[:do]}"
 					self.public_send(method_name, call)
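
The routing rule documented above can be exercised in isolation. In this sketch the module body repeats the two lines shown in the diff, while `ExampleCall` and `ExampleTarget` are hypothetical stand-ins (the real `Connection::Call` also carries a connection and a response queue):

```ruby
# Reduced copy of the routing rule shown in the diff:
module Dispatchable
	def dispatch(call)
		method_name = "do_#{call.message[:do]}"
		self.public_send(method_name, call)
	end
end

# Hypothetical call object exposing only `message`:
ExampleCall = Struct.new(:message)

class ExampleTarget
	include Dispatchable
	
	# Reachable as `do: :status`:
	def do_status(call)
		{status: "ok"}
	end
	
	private
	
	# Not reachable, because `public_send` respects visibility:
	def do_secret(call)
		{secret: true}
	end
end

target = ExampleTarget.new
result = target.dispatch(ExampleCall.new({do: :status}))

begin
	target.dispatch(ExampleCall.new({do: :secret}))
	rejected = false
rescue NoMethodError
	rejected = true
end

puts result.inspect
puts rejected
```

Using `public_send` rather than `send` means only public `do_*` methods form the remotely callable surface; private helpers and unknown operations raise `NoMethodError`.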

data/lib/async/container/supervisor/endpoint.rb CHANGED

@@ -8,6 +8,10 @@ require "io/endpoint/unix_endpoint"
 module Async
 	module Container
 		module Supervisor
+			# Get the supervisor IPC endpoint.
+			#
+			# @parameter path [String] The path for the Unix socket (default: "supervisor.ipc").
+			# @returns [IO::Endpoint] The Unix socket endpoint.
 			def self.endpoint(path = "supervisor.ipc")
 				::IO::Endpoint.unix(path)
 			end

data/lib/async/container/supervisor/environment.rb CHANGED

@@ -10,6 +10,9 @@ require_relative "service"
 module Async
 	module Container
 		module Supervisor
+			# An environment mixin for supervisor services.
+			#
+			# Provides configuration and setup for supervisor processes that monitor workers.
 			module Environment
 				# The service class to use for the supervisor.
 				# @returns [Class]
@@ -40,10 +43,18 @@ module Async
 					{restart: true, count: 1, health_check_timeout: 30}
 				end
+				# Get the list of monitors to run in the supervisor.
+				#
+				# Override this method to provide custom monitors.
+				#
+				# @returns [Array] The list of monitor instances.
 				def monitors
 					[]
 				end
+				# Create the supervisor server instance.
+				#
+				# @returns [Server] The supervisor server.
 				def make_server(endpoint)
 					Server.new(endpoint: endpoint, monitors: self.monitors)
 				end

data/lib/async/container/supervisor/memory_monitor.rb CHANGED

@@ -9,16 +9,23 @@ require "set"
 module Async
 	module Container
 		module Supervisor
+			# Monitors worker memory usage and restarts workers that exceed limits.
+			#
+			# Uses the `memory-leak` gem to track process memory and detect leaks.
 			class MemoryMonitor
+				MEMORY_SAMPLE = {duration: 30, timeout: 30*4}
 				# Create a new memory monitor.
 				#
 				# @parameter interval [Integer] The interval at which to check for memory leaks.
 				# @parameter total_size_limit [Integer] The total size limit of all processes, or nil for no limit.
 				# @parameter options [Hash] Options to pass to the cluster when adding processes.
-				def initialize(interval: 10, total_size_limit: nil, **options)
+				def initialize(interval: 10, total_size_limit: nil, memory_sample: MEMORY_SAMPLE, **options)
 					@interval = interval
 					@cluster = Memory::Leak::Cluster.new(total_size_limit: total_size_limit)
+					@memory_sample = memory_sample
 					# We use these options when adding processes to the cluster:
 					@options = options
@@ -74,6 +81,23 @@ module Async
 				# @parameter monitor [Memory::Leak::Monitor] The monitor that detected the memory leak.
 				# @returns [Boolean] True if the process was killed.
 				def memory_leak_detected(process_id, monitor)
+					Console.info(self, "Memory leak detected!", child: {process_id: process_id}, monitor: monitor)
+					if @memory_sample
+						Console.info(self, "Capturing memory sample...", child: {process_id: process_id}, memory_sample: @memory_sample)
+						# We are tracking multiple connections to the same process:
+						connections = @processes[process_id]
+						# Try to capture a memory sample:
+						connections.each do |connection|
+							result = connection.call(do: :memory_sample, **@memory_sample)
+							Console.info(self, "Memory sample completed:", child: {process_id: process_id}, result: result)
+						end
+					end
+					# Kill the process gently:
 					Console.info(self, "Killing process!", child: {process_id: process_id})
 					Process.kill(:INT, process_id)

data/lib/async/container/supervisor/process_monitor.rb ADDED

@@ -0,0 +1,89 @@
+# frozen_string_literal: true
+# Released under the MIT License.
+# Copyright, 2025, by Samuel Williams.
+require "process/metrics"
+module Async
+	module Container
+		module Supervisor
+			# Monitors process metrics and logs them periodically.
+			#
+			# Uses the `process-metrics` gem to capture CPU and memory metrics for a process tree.
+			# Unlike {MemoryMonitor}, this monitor captures metrics for the entire process tree
+			# by tracking the parent process ID (ppid), which is more efficient than tracking
+			# individual processes.
+			class ProcessMonitor
+				# Create a new process monitor.
+				#
+				# @parameter interval [Integer] The interval in seconds at which to log process metrics.
+				# @parameter ppid [Integer] The parent process ID to monitor. If nil, defaults to the parent of the current process (`Process.ppid`), so that its children are captured.
+				def initialize(interval: 60, ppid: nil)
+					@interval = interval
+					@ppid = ppid || Process.ppid
+				end
+				# @attribute [Integer] The parent process ID being monitored.
+				attr :ppid
+				# Register a connection with the process monitor.
+				#
+				# This is provided for consistency with {MemoryMonitor}, but since we monitor
+				# the entire process tree via ppid, we don't need to track individual connections.
+				#
+				# @parameter connection [Connection] The connection to register.
+				def register(connection)
+					Console.debug(self, "Connection registered.", connection: connection, state: connection.state)
+				end
+				# Remove a connection from the process monitor.
+				#
+				# This is provided for consistency with {MemoryMonitor}, but since we monitor
+				# the entire process tree via ppid, we don't need to track individual connections.
+				#
+				# @parameter connection [Connection] The connection to remove.
+				def remove(connection)
+					Console.debug(self, "Connection removed.", connection: connection, state: connection.state)
+				end
+				# Capture current process metrics for the entire process tree.
+				#
+				# @returns [Hash] A hash mapping process IDs to their metrics.
+				def metrics
+					Process::Metrics::General.capture(ppid: @ppid)
+				end
+				# Dump the current status of the process monitor.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				def status(call)
+					metrics = self.metrics
+					call.push(process_monitor: {ppid: @ppid, metrics: metrics})
+				end
+				# Run the process monitor.
+				#
+				# Periodically captures and logs process metrics for the entire process tree.
+				#
+				# @returns [Async::Task] The task that is running the process monitor.
+				def run
+					Async do
+						while true
+							metrics = self.metrics
+							# Log each process individually for better searchability in log platforms:
+							metrics.each do |process_id, general|
+								Console.info(self, "Process metrics captured.", general: general)
+							end
+							sleep(@interval)
+						end
+					end
+				end
+			end
+		end
+	end
+end
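
Annotation on the `ProcessMonitor` added above: a minimal sketch of the same capture-and-log step, using only the standard library. `SketchProcessMonitor` and its `capture:` callable are hypothetical stand-ins (the diff itself calls `Process::Metrics::General.capture(ppid: @ppid)`), and the sample data is made up. As in the diff, a missing `ppid` falls back to `Process.ppid`, which is the parent of the current process.

```ruby
# Hypothetical stand-in for ProcessMonitor; not the gem's implementation.
class SketchProcessMonitor
	# capture: a callable standing in for Process::Metrics::General.capture(ppid:).
	def initialize(interval: 60, ppid: nil, capture:)
		@interval = interval
		# Mirrors the diff: with no ppid given, use the parent of the current process.
		@ppid = ppid || Process.ppid
		@capture = capture
	end
	
	attr :ppid
	
	# Returns a hash mapping process IDs to their metrics.
	def metrics
		@capture.call(ppid: @ppid)
	end
	
	# Emit one log line per process (easier to search than one large line).
	def log_once(output = $stdout)
		metrics.map do |process_id, general|
			line = "Process metrics captured. pid=#{process_id} #{general.inspect}"
			output.puts(line)
			line
		end
	end
end

# Made-up metrics for two child processes of the monitored parent.
fake_capture = lambda do |ppid:|
	{101 => {ppid: ppid, memory: 1024}, 102 => {ppid: ppid, memory: 2048}}
end

monitor = SketchProcessMonitor.new(interval: 1, ppid: 100, capture: fake_capture)
lines = monitor.log_once
```

The real monitor wraps this step in `Async do ... end` and sleeps for `@interval` seconds between captures.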

data/lib/async/container/supervisor/server.rb CHANGED

@@ -3,6 +3,8 @@
 # Released under the MIT License.
 # Copyright, 2025, by Samuel Williams.
+require "securerandom"
 require_relative "connection"
 require_relative "endpoint"
 require_relative "dispatchable"
@@ -14,18 +16,36 @@ module Async
 			#
 			# There are various tasks that can be executed by the server, such as restarting the process group, and querying the status of the processes. The server is also responsible for managing the lifecycle of the monitors, which can be used to monitor the status of the connected workers.
 			class Server
+				# Initialize a new supervisor server.
+				#
+				# @parameter monitors [Array] The monitors to run.
+				# @parameter endpoint [IO::Endpoint] The endpoint to listen on.
 				def initialize(monitors: [], endpoint: Supervisor.endpoint)
 					@monitors = monitors
 					@endpoint = endpoint
+					@connections = {}
 				end
 				attr :monitors
+				attr :connections
 				include Dispatchable
+				# Register a worker connection with the supervisor.
+				#
+				# Assigns a unique connection ID and notifies all monitors of the new connection.
+				#
+				# @parameter call [Connection::Call] The registration call.
+				# @parameter call[:state] [Hash] The worker state to merge (e.g. process_id).
 				def do_register(call)
 					call.connection.state.merge!(call.message[:state])
+					connection_id = SecureRandom.uuid
+					call.connection.state[:connection_id] = connection_id
+					@connections[connection_id] = call.connection
 					@monitors.each do |monitor|
 						monitor.register(call.connection)
 					rescue => error
@@ -35,6 +55,35 @@ module Async
 					call.finish
 				end
+				# Forward an operation to a worker connection.
+				#
+				# This allows clients to invoke operations on specific worker processes by
+				# providing a connection_id. The operation is proxied through to the worker
+				# and responses are streamed back to the client.
+				#
+				# @parameter call [Connection::Call] The call to handle.
+				# @parameter call[:operation] [Hash] The operation to forward, must include :do key.
+				# @parameter call[:connection_id] [String] The connection ID to target.
+				def do_forward(call)
+					operation = call[:operation]
+					connection_id = call[:connection_id]
+					unless connection_id
+						call.fail(error: "Missing 'connection_id' parameter")
+						return
+					end
+					connection = @connections[connection_id]
+					unless connection
+						call.fail(error: "Connection not found", connection_id: connection_id)
+						return
+					end
+					# Forward the call to the target connection
+					call.forward(connection, operation)
+				end
 				# Restart the current process group, usually including the supervisor and any other processes.
 				#
 				# @parameter signal [Symbol] The signal to send to the process group.
@@ -47,15 +96,38 @@ module Async
 					::Process.kill(signal, ::Process.ppid)
 				end
+				# Query the status of the supervisor and all connected workers.
+				#
+				# Returns information about all registered connections and delegates to
+				# monitors to provide additional status information.
+				#
+				# @parameter call [Connection::Call] The status call.
 				def do_status(call)
+					connections = @connections.map do |connection_id, connection|
+						{
+							connection_id: connection_id,
+							process_id: connection.state[:process_id],
+							state: connection.state,
+						}
+					end
 					@monitors.each do |monitor|
 						monitor.status(call)
 					end
-					call.finish
+					call.finish(connections: connections)
 				end
+				# Remove a worker connection from the supervisor.
+				#
+				# Notifies all monitors and removes the connection from tracking.
+				#
+				# @parameter connection [Connection] The connection to remove.
 				def remove(connection)
+					if connection_id = connection.state[:connection_id]
+						@connections.delete(connection_id)
+					end
 					@monitors.each do |monitor|
 						monitor.remove(connection)
 					rescue => error
@@ -63,6 +135,11 @@ module Async
 					end
 				end
+				# Run the supervisor server.
+				#
+				# Starts all monitors and accepts connections from workers.
+				#
+				# @parameter parent [Async::Task] The parent task to run under.
 				def run(parent: Task.current)
 					parent.async do |task|
 						@monitors.each do |monitor|
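
Annotation on the `server.rb` changes above: a minimal sketch of the connection tracking that `do_register`, `do_forward`, `do_status` and `remove` now share. `SketchRegistry` and `SketchConnection` are hypothetical names; the real server works with `Connection::Call` objects and answers through `call.fail`, `call.forward` and `call.finish` rather than returning values.

```ruby
require "securerandom"

# Hypothetical stand-in for a worker connection; only the state hash matters here.
SketchConnection = Struct.new(:state)

# Hypothetical registry mirroring the @connections hash added in the diff.
class SketchRegistry
	def initialize
		@connections = {}
	end
	
	attr :connections
	
	# Merge the worker state, assign a unique ID, and track the connection.
	def register(connection, state)
		connection.state.merge!(state)
		connection_id = SecureRandom.uuid
		connection.state[:connection_id] = connection_id
		@connections[connection_id] = connection
		connection_id
	end
	
	# Resolve the target of a forwarded operation, or describe why it failed.
	def lookup(connection_id)
		return {error: "Missing 'connection_id' parameter"} unless connection_id
		@connections[connection_id] || {error: "Connection not found", connection_id: connection_id}
	end
	
	# Summarize every tracked connection, as do_status does.
	def status
		@connections.map do |connection_id, connection|
			{connection_id: connection_id, process_id: connection.state[:process_id], state: connection.state}
		end
	end
	
	# Stop tracking a connection, if it was ever registered.
	def remove(connection)
		if connection_id = connection.state[:connection_id]
			@connections.delete(connection_id)
		end
	end
end

registry = SketchRegistry.new
worker = SketchConnection.new({})
connection_id = registry.register(worker, {process_id: 1234})
```

Keying the registry by a random UUID rather than by `process_id` means a client can only target a worker it has first discovered through the status listing.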

data/lib/async/container/supervisor/service.rb CHANGED

@@ -10,6 +10,9 @@ require "io/endpoint/bound_endpoint"
 module Async
 	module Container
 		module Supervisor
+			# The supervisor service implementation.
+			#
+			# Manages the lifecycle of the supervisor server and its monitors.
 			class Service < Async::Service::Generic
 				# Initialize the supervisor using the given environment.
 				# @parameter environment [Build::Environment]
@@ -32,10 +35,18 @@ module Async
 					super
 				end
+				# Get the name of the supervisor service.
+				#
+				# @returns [String] The service name.
 				def name
 					@evaluator.name
 				end
+				# Set up the supervisor service in the container.
+				#
+				# Creates and runs the supervisor server with configured monitors.
+				#
+				# @parameter container [Async::Container::Generic] The container to set up in.
 				def setup(container)
 					container_options = @evaluator.container_options
 					health_check_timeout = container_options[:health_check_timeout]

data/lib/async/container/supervisor/supervised.rb CHANGED

@@ -8,6 +8,9 @@ require "async/service/environment"
 module Async
 	module Container
 		module Supervisor
+			# An environment mixin for supervised worker services.
+			#
+			# Enables workers to connect to and be supervised by the supervisor.
 			module Supervised
 				# The IPC path to use for communication with the supervisor.
 				# @returns [String]
@@ -21,6 +24,10 @@ module Async
 					::IO::Endpoint.unix(supervisor_ipc_path)
 				end
+				# Create a supervised worker for the given instance.
+				#
+				# @parameter instance [Async::Container::Instance] The container instance.
+				# @returns [Worker] The worker client.
 				def make_supervised_worker(instance)
 					Worker.new(instance, endpoint: supervisor_endpoint)
 				end

data/lib/async/container/supervisor/version.rb CHANGED

@@ -3,10 +3,13 @@
 # Released under the MIT License.
 # Copyright, 2025, by Samuel Williams.
+# @namespace
 module Async
+	# @namespace
 	module Container
+		# @namespace
 		module Supervisor
-			VERSION = "0.6.4"
+			VERSION = "0.8.0"
 		end
 	end
 end

data/lib/async/container/supervisor/worker.rb CHANGED

@@ -13,10 +13,18 @@ module Async
 			#
 			# There are various tasks that can be executed by the worker, such as dumping memory, threads, and garbage collection profiles.
 			class Worker < Client
+				# Run a worker with the given state.
+				#
+				# @parameter state [Hash] The worker state (e.g. process_id, instance info).
+				# @parameter endpoint [IO::Endpoint] The supervisor endpoint to connect to.
 				def self.run(...)
 					self.new(...).run
 				end
+				# Initialize a new worker.
+				#
+				# @parameter state [Hash] The worker state to register with the supervisor.
+				# @parameter endpoint [IO::Endpoint] The supervisor endpoint to connect to.
 				def initialize(state, endpoint: Supervisor.endpoint)
 					@state = state
 					@endpoint = endpoint
@@ -39,12 +47,25 @@ module Async
 					end
 				end
+				# Dump the current fiber scheduler hierarchy.
+				#
+				# Generates a hierarchical view of all running fibers and their relationships.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				# @parameter call[:path] [String] Optional file path to save the dump.
 				def do_scheduler_dump(call)
 					dump(call) do |file|
 						Fiber.scheduler.print_hierarchy(file)
 					end
 				end
+				# Dump the entire object space to a file.
+				#
+				# This is a heavyweight operation that dumps all objects in the heap.
+				# Consider using {do_memory_sample} for lighter weight memory leak detection.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				# @parameter call[:path] [String] Optional file path to save the dump.
 				def do_memory_dump(call)
 					require "objspace"
@@ -53,6 +74,58 @@ module Async
 					end
 				end
+				# Sample memory allocations over a time period to identify potential leaks.
+				#
+				# This method is much lighter weight than {do_memory_dump} and focuses on
+				# retained objects allocated during the sampling period. Late-lifecycle
+				# allocations that are retained are likely memory leaks.
+				#
+				# The method samples allocations for the specified duration, forces a garbage
+				# collection, and returns a JSON report showing allocated vs retained memory
+				# broken down by gem, file, location, and class.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				# @parameter call[:duration] [Numeric] The duration in seconds to sample for.
+				def do_memory_sample(call)
+					require "memory"
+					unless duration = call[:duration] and duration.positive?
+						raise ArgumentError, "Positive duration is required!"
+					end
+					Console.info(self, "Starting memory sampling...", duration: duration)
+					# Create a sampler to track allocations
+					sampler = Memory::Sampler.new
+					# Start sampling
+					sampler.start
+					# Sample for the specified duration
+					sleep(duration)
+					# Stop sampling
+					sampler.stop
+					report = sampler.report
+					# This is a temporary log to help with debugging:
+					buffer = StringIO.new
+					report.print(buffer)
+					Console.info(self, "Memory sample completed.", report: buffer.string)
+					# Generate a report focused on retained objects (likely leaks):
+					call.finish(report: report)
+				ensure
+					GC.start
+				end
+				# Dump information about all running threads.
+				#
+				# Includes thread inspection and backtraces for debugging.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				# @parameter call[:path] [String] Optional file path to save the dump.
 				def do_thread_dump(call)
 					dump(call) do |file|
 						Thread.list.each do |thread|
@@ -62,17 +135,28 @@ module Async
 					end
 				end
+				# Start garbage collection profiling.
+				#
+				# Enables the GC profiler to track garbage collection performance.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
 				def do_garbage_profile_start(call)
 					GC::Profiler.enable
 					call.finish(started: true)
 				end
+				# Stop garbage collection profiling and return results.
+				#
+				# Disables the GC profiler and returns collected profiling data.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				# @parameter call[:path] [String] Optional file path to save the profile.
 				def do_garbage_profile_stop(call)
-					GC::Profiler.disable
 					dump(connection, message) do |file|
 						file.puts GC::Profiler.result
 					end
+				ensure
+					GC::Profiler.disable
 				end
 				protected def connected!(connection)
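
Annotation on the `do_garbage_profile_stop` change above: moving `GC::Profiler.disable` into an `ensure` clause means the profiler is switched off even when writing the result raises. A minimal sketch of that pattern with the standard `GC::Profiler` API follows; `with_gc_profile` is a hypothetical helper, not part of the gem.

```ruby
require "stringio"

# Hypothetical helper: profile GC activity around a block and write the result.
def with_gc_profile(output)
	GC::Profiler.enable
	yield
	output.puts GC::Profiler.result
ensure
	# Runs even if the block or the write raises, so profiling never stays enabled.
	GC::Profiler.disable
end

buffer = StringIO.new
with_gc_profile(buffer) do
	# Allocate some short-lived strings, then force a collection.
	10_000.times { "x" * 100 }
	GC.start
end

# A failing block still leaves the profiler disabled.
begin
	with_gc_profile(StringIO.new) { raise "boom" }
rescue RuntimeError
	# Expected: the error propagates after the ensure clause has run.
end
```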

data/lib/async/container/supervisor.rb CHANGED

@@ -10,16 +10,7 @@ require_relative "supervisor/worker"
 require_relative "supervisor/client"
 require_relative "supervisor/memory_monitor"
+require_relative "supervisor/process_monitor"
 require_relative "supervisor/environment"
 require_relative "supervisor/supervised"
-# @namespace
-module Async
-	# @namespace
-	module Container
-		# @namespace
-		module Supervisor
-		end
-	end
-end

data/readme.md CHANGED

@@ -18,10 +18,23 @@ Please see the [project documentation](https://socketry.github.io/async-containe
   - [Getting Started](https://socketry.github.io/async-container-supervisor/guides/getting-started/index) - This guide explains how to get started with `async-container-supervisor` to supervise and monitor worker processes in your Ruby applications.
+  - [Memory Monitor](https://socketry.github.io/async-container-supervisor/guides/memory-monitor/index) - This guide explains how to use the <code class="language-ruby">Async::Container::Supervisor::MemoryMonitor</code> to detect and restart workers that exceed memory limits or develop memory leaks.
+  - [Process Monitor](https://socketry.github.io/async-container-supervisor/guides/process-monitor/index) - This guide explains how to use the <code class="language-ruby">Async::Container::Supervisor::ProcessMonitor</code> to log CPU and memory metrics for your worker processes.
 ## Releases
 Please see the [project releases](https://socketry.github.io/async-container-supervisor/releases/index) for all releases.
+### v0.8.0
+  - Add `Async::Container::Supervisor::ProcessMonitor` for logging CPU and memory metrics periodically.
+  - Fix documentation to use correct `maximum_size_limit:` parameter name for `MemoryMonitor` (was incorrectly documented as `limit:`).
+### v0.7.0
+  - If a memory leak is detected, sample memory usage for 60 seconds before exiting.
 ### v0.6.4
   - Make client task (in supervised worker) transient, so that it doesn't keep the reactor alive unnecessarily. It also won't be stopped by default when SIGINT is received, so that the worker will remain connected to the supervisor until the worker is completely terminated.

data/releases.md CHANGED

@@ -1,5 +1,14 @@
 # Releases
+## v0.8.0
+  - Add `Async::Container::Supervisor::ProcessMonitor` for logging CPU and memory metrics periodically.
+  - Fix documentation to use correct `maximum_size_limit:` parameter name for `MemoryMonitor` (was incorrectly documented as `limit:`).
+## v0.7.0
+  - If a memory leak is detected, sample memory usage for 60 seconds before exiting.
 ## v0.6.4
   - Make client task (in supervised worker) transient, so that it doesn't keep the reactor alive unnecessarily. It also won't be stopped by default when SIGINT is received, so that the worker will remain connected to the supervisor until the worker is completely terminated.

data.tar.gz.sig CHANGED

Binary file

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: async-container-supervisor
 version: !ruby/object:Gem::Version
-  version: 0.6.4
+  version: 0.8.0
 platform: ruby
 authors:
 - Samuel Williams
@@ -66,6 +66,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: memory
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.7'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.7'
 - !ruby/object:Gem::Dependency
   name: memory-leak
   requirement: !ruby/object:Gem::Requirement
@@ -80,6 +94,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '0.5'
+- !ruby/object:Gem::Dependency
+  name: process-metrics
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 executables: []
 extensions: []
 extra_rdoc_files: []
@@ -87,6 +115,8 @@ files:
 - bake/async/container/supervisor.rb
 - context/getting-started.md
 - context/index.yaml
+- context/memory-monitor.md
+- context/process-monitor.md
 - lib/async/container/supervisor.rb
 - lib/async/container/supervisor/client.rb
 - lib/async/container/supervisor/connection.rb
@@ -94,6 +124,7 @@ files:
 - lib/async/container/supervisor/endpoint.rb
 - lib/async/container/supervisor/environment.rb
 - lib/async/container/supervisor/memory_monitor.rb
+- lib/async/container/supervisor/process_monitor.rb
 - lib/async/container/supervisor/server.rb
 - lib/async/container/supervisor/service.rb
 - lib/async/container/supervisor/supervised.rb
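
Annotation on the two runtime dependencies added above (`memory` at `~> 0.7`, `process-metrics` at `>= 0`): a short sketch of how RubyGems evaluates those constraints, using the standard `Gem::Requirement` and `Gem::Version` classes. The candidate version numbers are made up for illustration and are not a list of real releases.

```ruby
require "rubygems"

# "~> 0.7" is the pessimistic constraint: at least 0.7, but below 1.0.
memory = Gem::Requirement.new("~> 0.7")

# ">= 0" accepts any released version.
process_metrics = Gem::Requirement.new(">= 0")

# Illustrative version numbers only.
candidates = ["0.6.9", "0.7.0", "0.12.1", "1.0.0"]

accepted = candidates.select do |version|
	memory.satisfied_by?(Gem::Version.new(version))
end

puts "memory accepts: #{accepted.join(', ')}"
```

The open-ended `>= 0` constraint leaves `process-metrics` free to resolve to any version, so a lockfile is the only thing pinning it in a given application.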

metadata.gz.sig CHANGED

Binary file