async-container-supervisor 0.7.0 → 0.8.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/context/getting-started.md +16 -11
- data/context/index.yaml +8 -0
- data/context/memory-monitor.md +129 -0
- data/context/process-monitor.md +91 -0
- data/lib/async/container/supervisor/client.rb +3 -0
- data/lib/async/container/supervisor/connection.rb +97 -1
- data/lib/async/container/supervisor/dispatchable.rb +8 -0
- data/lib/async/container/supervisor/endpoint.rb +4 -0
- data/lib/async/container/supervisor/environment.rb +11 -0
- data/lib/async/container/supervisor/memory_monitor.rb +5 -2
- data/lib/async/container/supervisor/process_monitor.rb +89 -0
- data/lib/async/container/supervisor/server.rb +32 -2
- data/lib/async/container/supervisor/service.rb +11 -0
- data/lib/async/container/supervisor/supervised.rb +7 -0
- data/lib/async/container/supervisor/version.rb +4 -1
- data/lib/async/container/supervisor/worker.rb +50 -4
- data/lib/async/container/supervisor.rb +1 -10
- data/readme.md +9 -0
- data/releases.md +5 -0
- data.tar.gz.sig +0 -0
- metadata +18 -1
- metadata.gz.sig +0 -0
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 39abccaf400a7b793d8f0094e32ccee9a4a9fdad6c6f570e361cace376ebd611
+  data.tar.gz: 2f135ee3b0979a16a899a07c760e8aeb46f2474635f9f2862c4eef43b7744961
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ffe7ddc8855501a0c30e35e925596a2aaf262d34a373608e16882295603bb1137f30be80f533adcfc480c957208c96e42907006d0564270bf209763b7d3d81a5
+  data.tar.gz: 7b71f2cdcf3f75973fffdaf676f430e34270a6b1be3b684439e50f202854bffddf9ca8d4983a441d9e4fb01319414e80343e9f2a78e4698e964b01f94c71c58e
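For readers verifying the new checksums by hand, the values above are standard SHA256/SHA512 hex digests of the gem's archive members; a minimal Ruby sketch (the string stands in for the real `metadata.gz` bytes):

```ruby
require "digest"

# Compute a SHA256 digest the same way a checksums.yaml entry is produced:
digest = Digest::SHA256.hexdigest("example contents")

# SHA256 hex digests are always 64 characters:
puts digest.length  # => 64
```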
checksums.yaml.gz.sig
CHANGED

Binary file
data/context/getting-started.md
CHANGED

@@ -35,12 +35,6 @@ graph TD
     Worker1 -.->|connects via IPC| Supervisor
     Worker2 -.->|connects via IPC| Supervisor
     WorkerN -.->|connects via IPC| Supervisor
-
-    style Controller fill:#e1f5ff
-    style Supervisor fill:#fff4e1
-    style Worker1 fill:#e8f5e9
-    style Worker2 fill:#e8f5e9
-    style WorkerN fill:#e8f5e9
 ```
 
 **Important:** The supervisor process is itself just another process managed by the root controller. If the supervisor crashes, the controller will restart it, and all worker processes will automatically reconnect to the new supervisor. This design ensures high availability and fault tolerance.

@@ -115,7 +109,13 @@ This will start:
 
 ### Adding Health Monitors
 
-You can add monitors to
+You can add monitors to observe worker health and automatically respond to issues. Monitors are useful for:
+
+- **Memory leak detection**: Automatically restart workers consuming excessive memory.
+- **Performance monitoring**: Track CPU and memory usage trends.
+- **Capacity planning**: Understand resource requirements.
+
+For example, to add monitoring:
 
 ```ruby
 service "supervisor" do

@@ -123,17 +123,22 @@ service "supervisor" do
 
 	monitors do
 		[
-			#
+			# Log process metrics for observability:
+			Async::Container::Supervisor::ProcessMonitor.new(
+				interval: 60
+			),
+
+			# Restart workers exceeding memory limits:
 			Async::Container::Supervisor::MemoryMonitor.new(
-				interval: 10,
-
+				interval: 10,
+				maximum_size_limit: 1024 * 1024 * 500 # 500MB limit per process
 			)
 		]
 	end
 end
 ```
 
-
+See the {ruby Async::Container::Supervisor::MemoryMonitor Memory Monitor} and {ruby Async::Container::Supervisor::ProcessMonitor Process Monitor} guides for detailed configuration options and best practices.
 
 ### Collecting Diagnostics
 
data/context/index.yaml
CHANGED

@@ -10,3 +10,11 @@ files:
     title: Getting Started
    description: This guide explains how to get started with `async-container-supervisor`
      to supervise and monitor worker processes in your Ruby applications.
+  - path: memory-monitor.md
+    title: Memory Monitor
+    description: This guide explains how to use the <code class="language-ruby">Async::Container::Supervisor::MemoryMonitor</code>
+      to detect and restart workers that exceed memory limits or develop memory leaks.
+  - path: process-monitor.md
+    title: Process Monitor
+    description: This guide explains how to use the <code class="language-ruby">Async::Container::Supervisor::ProcessMonitor</code>
+      to log CPU and memory metrics for your worker processes.
data/context/memory-monitor.md
ADDED

@@ -0,0 +1,129 @@
+# Memory Monitor
+
+This guide explains how to use the {ruby Async::Container::Supervisor::MemoryMonitor} to detect and restart workers that exceed memory limits or develop memory leaks.
+
+## Overview
+
+Long-running worker processes often accumulate memory over time, either through legitimate growth or memory leaks. Without intervention, workers can consume all available system memory, causing performance degradation or system crashes. The `MemoryMonitor` solves this by automatically detecting and restarting problematic workers before they impact system stability.
+
+Use the `MemoryMonitor` when you need:
+
+- **Memory leak protection**: Automatically restart workers that continuously accumulate memory.
+- **Resource limits**: Enforce maximum memory usage per worker.
+- **System stability**: Prevent runaway processes from exhausting system memory.
+- **Leak diagnosis**: Capture memory samples when leaks are detected for debugging.
+
+The monitor uses the `memory-leak` gem to track process memory usage over time, detecting abnormal growth patterns that indicate leaks.
+
+## Usage
+
+Add a memory monitor to your supervisor service to automatically restart workers that exceed 500MB:
+
+```ruby
+service "supervisor" do
+	include Async::Container::Supervisor::Environment
+
+	monitors do
+		[
+			Async::Container::Supervisor::MemoryMonitor.new(
+				# Check worker memory every 10 seconds:
+				interval: 10,
+
+				# Restart workers exceeding 500MB:
+				maximum_size_limit: 1024 * 1024 * 500
+			)
+		]
+	end
+end
+```
+
+When a worker exceeds the limit:
+1. The monitor logs the leak detection.
+2. Optionally captures a memory sample for debugging.
+3. Sends `SIGINT` to gracefully shut down the worker.
+4. The container automatically spawns a replacement worker.
+
+## Configuration Options
+
+The `MemoryMonitor` accepts the following options:
+
+### `interval`
+
+The interval (in seconds) at which to check for memory leaks. Default: `10` seconds.
+
+```ruby
+Async::Container::Supervisor::MemoryMonitor.new(interval: 30)
+```
+
+### `maximum_size_limit`
+
+The maximum memory size (in bytes) per process. When a process exceeds this limit, it will be restarted.
+
+```ruby
+# 500MB limit
+Async::Container::Supervisor::MemoryMonitor.new(maximum_size_limit: 1024 * 1024 * 500)
+
+# 1GB limit
+Async::Container::Supervisor::MemoryMonitor.new(maximum_size_limit: 1024 * 1024 * 1024)
+```
+
+### `total_size_limit`
+
+The total size limit (in bytes) for all monitored processes combined. If not specified, only per-process limits are enforced.
+
+```ruby
+# Total limit of 2GB across all workers
+Async::Container::Supervisor::MemoryMonitor.new(
+	maximum_size_limit: 1024 * 1024 * 500, # 500MB per process
+	total_size_limit: 1024 * 1024 * 1024 * 2 # 2GB total
+)
+```
+
+### `memory_sample`
+
+Options for capturing memory samples when a leak is detected. If `nil`, memory sampling is disabled.
+
+Default: `{duration: 30, timeout: 120}`
+
+```ruby
+# Customize memory sampling:
+Async::Container::Supervisor::MemoryMonitor.new(
+	memory_sample: {
+		duration: 60, # Sample for 60 seconds
+		timeout: 180 # Timeout after 180 seconds
+	}
+)
+
+# Disable memory sampling:
+Async::Container::Supervisor::MemoryMonitor.new(
+	memory_sample: nil
+)
+```
+
+## Memory Leak Detection
+
+When a memory leak is detected, the monitor will:
+
+1. Log the leak detection with process details.
+2. If `memory_sample` is configured, capture a memory sample from the worker.
+3. Send a `SIGINT` signal to gracefully restart the worker.
+4. The container will automatically restart the worker process.
+
+### Memory Sampling
+
+When a memory leak is detected and `memory_sample` is configured, the monitor requests a lightweight memory sample from the worker. This sample:
+
+- Tracks allocations during the sampling period.
+- Forces a garbage collection.
+- Returns a JSON report showing retained objects.
+
+The report includes:
+- `total_allocated`: Total allocated memory and object count.
+- `total_retained`: Total retained memory and count after GC.
+- `by_gem`: Breakdown by gem/library.
+- `by_file`: Breakdown by source file.
+- `by_location`: Breakdown by specific file:line locations.
+- `by_class`: Breakdown by object class.
+- `strings`: String allocation analysis.
+
+This is much more efficient than a full heap dump using `ObjectSpace.dump_all`.
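The byte arithmetic in the limit examples above is worth sanity-checking; a small standalone sketch (the `MEGABYTE`/`GIGABYTE` constants are illustrative, not part of the gem):

```ruby
# Illustrative helpers for the byte limits used in the guide above:
MEGABYTE = 1024 * 1024
GIGABYTE = 1024 * MEGABYTE

maximum_size_limit = 500 * MEGABYTE # same as 1024 * 1024 * 500
total_size_limit = 2 * GIGABYTE     # same as 1024 * 1024 * 1024 * 2

puts maximum_size_limit  # => 524288000
puts total_size_limit    # => 2147483648
```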
data/context/process-monitor.md
ADDED

@@ -0,0 +1,91 @@
+# Process Monitor
+
+This guide explains how to use the {ruby Async::Container::Supervisor::ProcessMonitor} to log CPU and memory metrics for your worker processes.
+
+## Overview
+
+Understanding how your workers consume resources over time is essential for performance optimization, capacity planning, and debugging. Without visibility into CPU and memory usage, you can't identify bottlenecks, plan infrastructure scaling, or diagnose production issues effectively.
+
+The `ProcessMonitor` provides this observability by periodically capturing and logging comprehensive metrics for your entire application process tree.
+
+Use the `ProcessMonitor` when you need:
+
+- **Performance analysis**: Identify which workers consume the most CPU or memory.
+- **Capacity planning**: Determine optimal worker counts and memory requirements.
+- **Trend monitoring**: Track resource usage patterns over time.
+- **Debugging assistance**: Correlate resource usage with application behavior.
+- **Cost optimization**: Right-size infrastructure based on actual usage.
+
+Unlike the {ruby Async::Container::Supervisor::MemoryMonitor}, which takes action when limits are exceeded, the `ProcessMonitor` is purely observational - it logs metrics without interfering with worker processes.
+
+## Usage
+
+Add a process monitor to log resource usage every minute:
+
+```ruby
+service "supervisor" do
+	include Async::Container::Supervisor::Environment
+
+	monitors do
+		[
+			# Log CPU and memory metrics for all processes:
+			Async::Container::Supervisor::ProcessMonitor.new(
+				interval: 60 # Capture metrics every minute
+			)
+		]
+	end
+end
+```
+
+This allows you to easily search and filter by specific fields:
+- `general.process_id = 12347` - Find metrics for a specific process.
+- `general.command = "worker-1"` - Find all metrics for worker processes.
+- `general.processor_utilization > 50` - Find high CPU usage processes.
+- `general.resident_size > 500000` - Find processes using more than 500MB.
+
+## Configuration Options
+
+### `interval`
+
+The interval (in seconds) at which to capture and log process metrics. Default: `60` seconds.
+
+```ruby
+# Log every 30 seconds
+Async::Container::Supervisor::ProcessMonitor.new(interval: 30)
+
+# Log every 5 minutes
+Async::Container::Supervisor::ProcessMonitor.new(interval: 300)
+```
+
+## Captured Metrics
+
+The `ProcessMonitor` captures the following metrics for each process:
+
+### Core Metrics
+
+- **process_id**: Unique identifier for the process.
+- **parent_process_id**: The parent process that spawned this one.
+- **process_group_id**: Process group identifier.
+- **command**: The command name.
+- **processor_utilization**: CPU usage percentage.
+- **resident_size**: Physical memory used (KB).
+- **total_size**: Total memory space including shared memory (KB).
+- **processor_time**: Total CPU time used (seconds).
+- **elapsed_time**: How long the process has been running (seconds).
+
+### Detailed Memory Metrics
+
+When available (OS-dependent), additional memory details are captured:
+
+- **map_count**: Number of memory mappings (stacks, libraries, etc.).
+- **proportional_size**: Memory usage accounting for shared memory (KB).
+- **shared_clean_size**: Unmodified shared memory (KB).
+- **shared_dirty_size**: Modified shared memory (KB).
+- **private_clean_size**: Unmodified private memory (KB).
+- **private_dirty_size**: Modified private memory (KB).
+- **referenced_size**: Active page-cache (KB).
+- **anonymous_size**: Memory not backed by files (KB).
+- **swap_size**: Memory swapped to disk (KB).
+- **proportional_swap_size**: Proportional swap usage (KB).
+- **major_faults**: The number of page faults requiring I/O.
+- **minor_faults**: The number of page faults that don't require I/O (e.g. CoW).
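The log-query examples above can be mimicked over an in-memory hash; the field names follow the guide, but the exact shape of the captured metrics is an assumption here:

```ruby
# Hypothetical captured metrics, keyed by process ID (shape assumed):
metrics = {
	12345 => {command: "supervisor", processor_utilization: 2.5, resident_size: 120_000},
	12347 => {command: "worker-1", processor_utilization: 75.0, resident_size: 650_000},
}

# Equivalent of the `general.processor_utilization > 50` log query:
hot = metrics.select {|process_id, general| general[:processor_utilization] > 50}
puts hot.keys.inspect  # => [12347]

# Equivalent of `general.resident_size > 500000` (sizes are in KB):
large = metrics.select {|process_id, general| general[:resident_size] > 500_000}
puts large.keys.inspect  # => [12347]
```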
data/lib/async/container/supervisor/client.rb
CHANGED

@@ -11,6 +11,9 @@ module Async
 		module Supervisor
 			# A client provides a mechanism to connect to a supervisor server in order to execute operations.
 			class Client
+				# Initialize a new client.
+				#
+				# @parameter endpoint [IO::Endpoint] The supervisor endpoint to connect to.
 				def initialize(endpoint: Supervisor.endpoint)
 					@endpoint = endpoint
 				end
data/lib/async/container/supervisor/connection.rb
CHANGED

@@ -8,8 +8,19 @@ require "json"
 module Async
 	module Container
 		module Supervisor
+			# Represents a bidirectional communication channel between supervisor and worker.
+			#
+			# Handles message passing, call/response patterns, and connection lifecycle.
 			class Connection
+				# Represents a remote procedure call over a connection.
+				#
+				# Manages the call lifecycle, response queueing, and completion signaling.
 				class Call
+					# Initialize a new call.
+					#
+					# @parameter connection [Connection] The connection this call belongs to.
+					# @parameter id [Integer] The unique call identifier.
+					# @parameter message [Hash] The call message/parameters.
 					def initialize(connection, id, message)
 						@connection = connection
 						@id = id

@@ -18,10 +29,16 @@ module Async
 						@queue = ::Thread::Queue.new
 					end
 
+					# Convert the call to a JSON-compatible hash.
+					#
+					# @returns [Hash] The message hash.
 					def as_json(...)
 						@message
 					end
 
+					# Convert the call to a JSON string.
+					#
+					# @returns [String] The JSON representation.
 					def to_json(...)
 						as_json.to_json(...)
 					end

@@ -32,14 +49,24 @@ module Async
 					# @attribute [Hash] The message that initiated the call.
 					attr :message
 
+					# Access a parameter from the call message.
+					#
+					# @parameter key [Symbol] The parameter name.
+					# @returns [Object] The parameter value.
 					def [] key
 						@message[key]
 					end
 
+					# Push a response into the call's queue.
+					#
+					# @parameter response [Hash] The response data to push.
 					def push(**response)
 						@queue.push(response)
 					end
 
+					# Pop a response from the call's queue.
+					#
+					# @returns [Hash, nil] The next response or nil if queue is closed.
 					def pop(...)
 						@queue.pop(...)
 					end

@@ -49,12 +76,20 @@ module Async
 						@queue.close
 					end
 
+					# Iterate over all responses from the call.
+					#
+					# @yields {|response| ...} Each response from the queue.
 					def each(&block)
 						while response = self.pop
 							yield response
 						end
 					end
 
+					# Finish the call with a final response.
+					#
+					# Closes the response queue after pushing the final response.
+					#
+					# @parameter response [Hash] The final response data.
 					def finish(**response)
 						# If the remote end has already closed the connection, we don't need to send a finished message:
 						unless @queue.closed?

@@ -63,10 +98,16 @@ module Async
 						end
 					end
 
+					# Finish the call with a failure response.
+					#
+					# @parameter response [Hash] The error response data.
 					def fail(**response)
 						self.finish(failed: true, **response)
 					end
 
+					# Check if the call's queue is closed.
+					#
+					# @returns [Boolean] True if the queue is closed.
 					def closed?
 						@queue.closed?
 					end

@@ -74,7 +115,8 @@ module Async
 					# Forward this call to another connection, proxying all responses back.
 					#
 					# This provides true streaming forwarding - intermediate responses flow through
-					# in real-time rather than being buffered.
+					# in real-time rather than being buffered. The forwarding runs asynchronously
+					# to avoid blocking the dispatcher.
 					#
 					# @parameter target_connection [Connection] The connection to forward the call to.
 					# @parameter operation [Hash] The operation request to forward (must include :do key).

@@ -92,6 +134,15 @@ module Async
 						end
 					end
 
+					# Dispatch a call to a target handler.
+					#
+					# Creates a call, dispatches it to the target, and streams responses back
+					# through the connection.
+					#
+					# @parameter connection [Connection] The connection to dispatch on.
+					# @parameter target [Dispatchable] The target handler.
+					# @parameter id [Integer] The call identifier.
+					# @parameter message [Hash] The call message.
 					def self.dispatch(connection, target, id, message)
 						Async do
 							call = self.new(connection, id, message)

@@ -112,6 +163,15 @@ module Async
 						end
 					end
 
+					# Make a call on a connection and wait for responses.
+					#
+					# If a block is provided, yields each response. Otherwise, buffers intermediate
+					# responses and returns the final response.
+					#
+					# @parameter connection [Connection] The connection to call on.
+					# @parameter message [Hash] The call message/parameters.
+					# @yields {|response| ...} Each intermediate response if block given.
+					# @returns [Hash, Array] The final response or array of intermediate responses.
 					def self.call(connection, **message, &block)
 						id = connection.next_id
 						call = self.new(connection, id, message)

@@ -149,6 +209,11 @@ module Async
 					end
 				end
 
+				# Initialize a new connection.
+				#
+				# @parameter stream [IO] The underlying IO stream.
+				# @parameter id [Integer] The starting call ID (default: 0).
+				# @parameter state [Hash] Initial connection state.
 				def initialize(stream, id = 0, **state)
 					@stream = stream
 					@id = id

@@ -164,15 +229,26 @@ module Async
 				# @attribute [Hash(Symbol, Object)] State associated with this connection, for example the process ID, etc.
 				attr_accessor :state
 
+				# Generate the next unique call ID.
+				#
+				# @returns [Integer] The next call identifier.
 				def next_id
 					@id += 2
 				end
 
+				# Write a message to the connection stream.
+				#
+				# @parameter message [Hash] The message to write.
 				def write(**message)
 					@stream.write(JSON.dump(message) << "\n")
 					@stream.flush
 				end
 
+				# Make a synchronous call and wait for a single response.
+				#
+				# @parameter timeout [Numeric, nil] Optional timeout for the call.
+				# @parameter message [Hash] The call message.
+				# @returns [Hash] The response.
 				def call(timeout: nil, **message)
 					id = next_id
 					calls[id] = ::Thread::Queue.new

@@ -184,22 +260,34 @@ module Async
 					calls.delete(id)
 				end
 
+				# Read a message from the connection stream.
+				#
+				# @returns [Hash, nil] The parsed message or nil if stream is closed.
 				def read
 					if line = @stream&.gets
 						JSON.parse(line, symbolize_names: true)
 					end
 				end
 
+				# Iterate over all messages from the connection.
+				#
+				# @yields {|message| ...} Each message read from the stream.
 				def each
 					while message = self.read
 						yield message
 					end
 				end
 
+				# Make a synchronous call and wait for a single response.
 				def call(...)
 					Call.call(self, ...)
 				end
 
+				# Run the connection, processing incoming messages.
+				#
+				# Dispatches incoming calls to the target and routes responses to waiting calls.
+				#
+				# @parameter target [Dispatchable] The target to dispatch calls to.
 				def run(target)
 					self.each do |message|
 						if id = message.delete(:id)

@@ -219,12 +307,20 @@ module Async
 					end
 				end
 
+				# Run the connection in a background task.
+				#
+				# @parameter target [Dispatchable] The target to dispatch calls to.
+				# @parameter parent [Async::Task] The parent task.
+				# @returns [Async::Task] The background reader task.
 				def run_in_background(target, parent: Task.current)
 					@reader ||= parent.async do
 						self.run(target)
 					end
 				end
 
+				# Close the connection and clean up resources.
+				#
+				# Stops the background reader, closes the stream, and closes all pending calls.
 				def close
 					if @reader
 						@reader.stop
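The connection changes above document a newline-delimited JSON protocol: `Connection#write` dumps one JSON object per line, and `Connection#read` parses one line per message. A self-contained sketch of that framing over a `StringIO`; the remark about ID parity is an inference from the `id = 0` default and the `+= 2` step, not something the diff states:

```ruby
require "json"
require "stringio"

# Writing: one JSON object per line, as in Connection#write:
stream = StringIO.new
stream.write(JSON.dump({do: "status", id: 2}) << "\n")
stream.flush

# Reading: one line per message, as in Connection#read:
stream.rewind
message = JSON.parse(stream.gets, symbolize_names: true)
puts message[:do]  # => status

# ID allocation as in Connection#next_id: stepping by 2 means two endpoints
# seeded with different parities would never produce colliding IDs:
id = 0
puts 3.times.map {id += 2}.inspect  # => [2, 4, 6]
```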
data/lib/async/container/supervisor/dispatchable.rb
CHANGED

@@ -9,7 +9,15 @@ require_relative "endpoint"
 module Async
 	module Container
 		module Supervisor
+			# A mixin for objects that can dispatch calls.
+			#
+			# Provides automatic method dispatch based on the call's `:do` parameter.
 			module Dispatchable
+				# Dispatch a call to the appropriate method.
+				#
+				# Routes calls to methods named `do_#{operation}` based on the call's `:do` parameter.
+				#
+				# @parameter call [Connection::Call] The call to dispatch.
 				def dispatch(call)
 					method_name = "do_#{call.message[:do]}"
 					self.public_send(method_name, call)
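The Dispatchable documentation above describes dispatch-by-name: the `:do` parameter selects a `do_*` method via `public_send`. A minimal standalone sketch of the same pattern (`Handler` and `do_status` are hypothetical names, not part of the gem):

```ruby
# Sketch of the dispatch-by-name pattern used by Dispatchable:
class Handler
	def dispatch(message)
		# The :do parameter selects the handler method:
		public_send("do_#{message[:do]}", message)
	end

	def do_status(message)
		"status for #{message[:name]}"
	end
end

puts Handler.new.dispatch({do: "status", name: "worker-1"})
# => status for worker-1
```

Because only `public_send` is used, private helper methods cannot be reached by a remote `:do` value.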
data/lib/async/container/supervisor/endpoint.rb
CHANGED

@@ -8,6 +8,10 @@ require "io/endpoint/unix_endpoint"
 module Async
 	module Container
 		module Supervisor
+			# Get the supervisor IPC endpoint.
+			#
+			# @parameter path [String] The path for the Unix socket (default: "supervisor.ipc").
+			# @returns [IO::Endpoint] The Unix socket endpoint.
 			def self.endpoint(path = "supervisor.ipc")
 				::IO::Endpoint.unix(path)
 			end
data/lib/async/container/supervisor/environment.rb
CHANGED

@@ -10,6 +10,9 @@ require_relative "service"
 module Async
 	module Container
 		module Supervisor
+			# An environment mixin for supervisor services.
+			#
+			# Provides configuration and setup for supervisor processes that monitor workers.
 			module Environment
 				# The service class to use for the supervisor.
 				# @returns [Class]

@@ -40,10 +43,18 @@
 					{restart: true, count: 1, health_check_timeout: 30}
 				end
 
+				# Get the list of monitors to run in the supervisor.
+				#
+				# Override this method to provide custom monitors.
+				#
+				# @returns [Array] The list of monitor instances.
 				def monitors
 					[]
 				end
 
+				# Create the supervisor server instance.
+				#
+				# @returns [Server] The supervisor server.
 				def make_server(endpoint)
 					Server.new(endpoint: endpoint, monitors: self.monitors)
 				end
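The `monitors` method documented above returns `[]` by default and is meant to be overridden. A plain-Ruby sketch of that override pattern (module and symbol names are illustrative; symbols stand in for real monitor instances):

```ruby
# Default behaviour mirrors Environment#monitors returning []:
module BaseEnvironment
	def monitors
		[]
	end
end

# An including environment overrides it to supply monitor instances:
module MyEnvironment
	include BaseEnvironment

	def monitors
		[:process_monitor, :memory_monitor]
	end
end

class Configuration
	include MyEnvironment
end

puts Configuration.new.monitors.inspect  # => [:process_monitor, :memory_monitor]
```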
data/lib/async/container/supervisor/memory_monitor.rb
CHANGED

@@ -9,8 +9,11 @@ require "set"
 module Async
 	module Container
 		module Supervisor
+			# Monitors worker memory usage and restarts workers that exceed limits.
+			#
+			# Uses the `memory` gem to track process memory and detect leaks.
 			class MemoryMonitor
-				MEMORY_SAMPLE = {duration:
+				MEMORY_SAMPLE = {duration: 30, timeout: 30*4}
 
 				# Create a new memory monitor.
 				#

@@ -82,7 +85,7 @@
 
 					if @memory_sample
 						Console.info(self, "Capturing memory sample...", child: {process_id: process_id}, memory_sample: @memory_sample)
-
+
 						# We are tracking multiple connections to the same process:
 						connections = @processes[process_id]
 
@@ -0,0 +1,89 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
# Released under the MIT License.
|
|
4
|
+
# Copyright, 2025, by Samuel Williams.
|
|
5
|
+
|
|
6
|
+
require "process/metrics"
|
|
7
|
+
|
|
8
|
+
module Async
|
|
9
|
+
module Container
|
|
10
|
+
module Supervisor
|
|
11
|
+
# Monitors process metrics and logs them periodically.
|
|
12
|
+
#
|
|
13
|
+
# Uses the `process-metrics` gem to capture CPU and memory metrics for a process tree.
|
|
14
|
+
# Unlike {MemoryMonitor}, this monitor captures metrics for the entire process tree
|
|
15
|
+
# by tracking the parent process ID (ppid), which is more efficient than tracking
|
|
16
|
+
# individual processes.
|
|
17
|
+
class ProcessMonitor
|
|
18
|
+
# Create a new process monitor.
|
|
19
|
+
#
|
|
20
|
+
# @parameter interval [Integer] The interval in seconds at which to log process metrics.
|
|
21
|
+
# @parameter ppid [Integer] The parent process ID to monitor. If nil, uses the current process to capture its children.
|
|
22
|
+
def initialize(interval: 60, ppid: nil)
|
|
23
|
+
@interval = interval
|
|
24
|
+
@ppid = ppid || Process.ppid
|
|
25
|
+
end
|
|
26
|
+
|
|
27
|
+
# @attribute [Integer] The parent process ID being monitored.
|
|
28
|
+
attr :ppid
|
|
29
|
+
|
|
30
|
+
# Register a connection with the process monitor.
|
|
31
|
+
#
|
|
32
|
+
# This is provided for consistency with {MemoryMonitor}, but since we monitor
|
|
33
|
+
# the entire process tree via ppid, we don't need to track individual connections.
+				#
+				# @parameter connection [Connection] The connection to register.
+				def register(connection)
+					Console.debug(self, "Connection registered.", connection: connection, state: connection.state)
+				end
+
+				# Remove a connection from the process monitor.
+				#
+				# This is provided for consistency with {MemoryMonitor}, but since we monitor
+				# the entire process tree via ppid, we don't need to track individual connections.
+				#
+				# @parameter connection [Connection] The connection to remove.
+				def remove(connection)
+					Console.debug(self, "Connection removed.", connection: connection, state: connection.state)
+				end
+
+				# Capture current process metrics for the entire process tree.
+				#
+				# @returns [Hash] A hash mapping process IDs to their metrics.
+				def metrics
+					Process::Metrics::General.capture(ppid: @ppid)
+				end
+
+				# Dump the current status of the process monitor.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				def status(call)
+					metrics = self.metrics
+
+					call.push(process_monitor: {ppid: @ppid, metrics: metrics})
+				end
+
+				# Run the process monitor.
+				#
+				# Periodically captures and logs process metrics for the entire process tree.
+				#
+				# @returns [Async::Task] The task that is running the process monitor.
+				def run
+					Async do
+						while true
+							metrics = self.metrics
+
+							# Log each process individually for better searchability in log platforms:
+							metrics.each do |process_id, general|
+								Console.info(self, "Process metrics captured.", general: general)
+							end
+
+							sleep(@interval)
+						end
+					end
+				end
+			end
+		end
+	end
+end
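The new `ProcessMonitor` is intended to be configured alongside other monitors in the supervisor's service definition. A configuration sketch follows; the `monitors` block mirrors the gem's getting-started conventions, but the `interval:` keyword and block shape are assumptions here, not confirmed by this diff:

```ruby
# Hypothetical service configuration sketch (names follow the gem's guides):
service "supervisor" do
	include Async::Container::Supervisor::Environment
	
	monitors do
		[
			# Log CPU/memory metrics for the whole process tree periodically
			# (interval: is assumed to be seconds between captures):
			Async::Container::Supervisor::ProcessMonitor.new(interval: 60),
		]
	end
end
```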
data/lib/async/container/supervisor/server.rb
CHANGED

@@ -16,6 +16,10 @@ module Async
 			#
 			# There are various tasks that can be executed by the server, such as restarting the process group, and querying the status of the processes. The server is also responsible for managing the lifecycle of the monitors, which can be used to monitor the status of the connected workers.
 			class Server
+				# Initialize a new supervisor server.
+				#
+				# @parameter monitors [Array] The monitors to run.
+				# @parameter endpoint [IO::Endpoint] The endpoint to listen on.
 				def initialize(monitors: [], endpoint: Supervisor.endpoint)
 					@monitors = monitors
 					@endpoint = endpoint
@@ -28,6 +32,12 @@ module Async
 
 				include Dispatchable
 
+				# Register a worker connection with the supervisor.
+				#
+				# Assigns a unique connection ID and notifies all monitors of the new connection.
+				#
+				# @parameter call [Connection::Call] The registration call.
+				# @parameter call[:state] [Hash] The worker state to merge (e.g. process_id).
 				def do_register(call)
 					call.connection.state.merge!(call.message[:state])
 
@@ -47,9 +57,13 @@ module Async
 
 				# Forward an operation to a worker connection.
 				#
+				# This allows clients to invoke operations on specific worker processes by
+				# providing a connection_id. The operation is proxied through to the worker
+				# and responses are streamed back to the client.
+				#
 				# @parameter call [Connection::Call] The call to handle.
-				# @parameter operation [Hash] The operation to forward, must include :do key.
-				# @parameter connection_id [String] The connection ID to target.
+				# @parameter call[:operation] [Hash] The operation to forward, must include :do key.
+				# @parameter call[:connection_id] [String] The connection ID to target.
 				def do_forward(call)
 					operation = call[:operation]
 					connection_id = call[:connection_id]
@@ -82,6 +96,12 @@ module Async
 					::Process.kill(signal, ::Process.ppid)
 				end
 
+				# Query the status of the supervisor and all connected workers.
+				#
+				# Returns information about all registered connections and delegates to
+				# monitors to provide additional status information.
+				#
+				# @parameter call [Connection::Call] The status call.
 				def do_status(call)
 					connections = @connections.map do |connection_id, connection|
 						{
@@ -98,6 +118,11 @@ module Async
 					call.finish(connections: connections)
 				end
 
+				# Remove a worker connection from the supervisor.
+				#
+				# Notifies all monitors and removes the connection from tracking.
+				#
+				# @parameter connection [Connection] The connection to remove.
 				def remove(connection)
 					if connection_id = connection.state[:connection_id]
 						@connections.delete(connection_id)
@@ -110,6 +135,11 @@ module Async
 					end
 				end
 
+				# Run the supervisor server.
+				#
+				# Starts all monitors and accepts connections from workers.
+				#
+				# @parameter parent [Async::Task] The parent task to run under.
 				def run(parent: Task.current)
 					parent.async do |task|
 						@monitors.each do |monitor|
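The registration and removal flow above (merge worker state, track the connection under a unique ID, notify every monitor) can be sketched in plain Ruby, independent of the gem's connection and call classes. All names below are illustrative, not the gem's actual API:

```ruby
# Minimal sketch of the supervisor's connection-registry pattern:
# each registered connection gets a unique ID, and every monitor is
# notified on registration and removal.
class Registry
	def initialize(monitors)
		@monitors = monitors
		@connections = {}
		@next_id = 0
	end
	
	def register(state)
		connection_id = (@next_id += 1)
		@connections[connection_id] = state.merge(connection_id: connection_id)
		@monitors.each {|monitor| monitor.register(@connections[connection_id])}
		connection_id
	end
	
	def remove(connection_id)
		if state = @connections.delete(connection_id)
			@monitors.each {|monitor| monitor.remove(state)}
		end
	end
	
	def status
		@connections.map {|id, state| {connection_id: id, state: state}}
	end
end

# A trivial monitor that records lifecycle events:
EventLogger = Struct.new(:events) do
	def register(state)
		events << [:register, state[:process_id]]
	end
	
	def remove(state)
		events << [:remove, state[:process_id]]
	end
end

logger = EventLogger.new([])
registry = Registry.new([logger])

connection_id = registry.register(process_id: 1234)
registry.remove(connection_id)

puts logger.events.inspect # => [[:register, 1234], [:remove, 1234]]
```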
data/lib/async/container/supervisor/service.rb
CHANGED

@@ -10,6 +10,9 @@ require "io/endpoint/bound_endpoint"
 module Async
 	module Container
 		module Supervisor
+			# The supervisor service implementation.
+			#
+			# Manages the lifecycle of the supervisor server and its monitors.
 			class Service < Async::Service::Generic
 				# Initialize the supervisor using the given environment.
 				# @parameter environment [Build::Environment]
@@ -32,10 +35,18 @@ module Async
 					super
 				end
 
+				# Get the name of the supervisor service.
+				#
+				# @returns [String] The service name.
 				def name
 					@evaluator.name
 				end
 
+				# Set up the supervisor service in the container.
+				#
+				# Creates and runs the supervisor server with configured monitors.
+				#
+				# @parameter container [Async::Container::Generic] The container to set up in.
 				def setup(container)
 					container_options = @evaluator.container_options
 					health_check_timeout = container_options[:health_check_timeout]
data/lib/async/container/supervisor/supervised.rb
CHANGED

@@ -8,6 +8,9 @@ require "async/service/environment"
 module Async
 	module Container
 		module Supervisor
+			# An environment mixin for supervised worker services.
+			#
+			# Enables workers to connect to and be supervised by the supervisor.
 			module Supervised
 				# The IPC path to use for communication with the supervisor.
 				# @returns [String]
@@ -21,6 +24,10 @@ module Async
 					::IO::Endpoint.unix(supervisor_ipc_path)
 				end
 
+				# Create a supervised worker for the given instance.
+				#
+				# @parameter instance [Async::Container::Instance] The container instance.
+				# @returns [Worker] The worker client.
 				def make_supervised_worker(instance)
 					Worker.new(instance, endpoint: supervisor_endpoint)
 				end
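The `supervisor_endpoint` above wraps a UNIX-domain socket (via the `io-endpoint` gem). The underlying IPC channel can be illustrated with plain stdlib sockets; the one-JSON-message-per-line shape here is purely illustrative, not the gem's actual wire protocol:

```ruby
require "socket"
require "json"
require "tmpdir"

# Plain-Ruby sketch of a worker-to-supervisor IPC channel:
# a UNIX-domain socket at a well-known path, carrying JSON messages.
Dir.mktmpdir do |directory|
	path = File.join(directory, "supervisor.ipc")
	server = UNIXServer.new(path)
	
	# Worker side: connect and announce itself.
	client = UNIXSocket.new(path)
	client.puts(JSON.generate({do: "register", state: {process_id: Process.pid}}))
	
	# Supervisor side: accept and decode the registration.
	connection = server.accept
	message = JSON.parse(connection.gets)
	puts message["do"] # => "register"
	
	connection.close
	client.close
	server.close
end
```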
data/lib/async/container/supervisor/worker.rb
CHANGED

@@ -13,10 +13,18 @@ module Async
 			#
 			# There are various tasks that can be executed by the worker, such as dumping memory, threads, and garbage collection profiles.
 			class Worker < Client
+				# Run a worker with the given state.
+				#
+				# @parameter state [Hash] The worker state (e.g. process_id, instance info).
+				# @parameter endpoint [IO::Endpoint] The supervisor endpoint to connect to.
 				def self.run(...)
 					self.new(...).run
 				end
 
+				# Initialize a new worker.
+				#
+				# @parameter state [Hash] The worker state to register with the supervisor.
+				# @parameter endpoint [IO::Endpoint] The supervisor endpoint to connect to.
 				def initialize(state, endpoint: Supervisor.endpoint)
 					@state = state
 					@endpoint = endpoint
@@ -39,12 +47,25 @@ module Async
 					end
 				end
 
+				# Dump the current fiber scheduler hierarchy.
+				#
+				# Generates a hierarchical view of all running fibers and their relationships.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				# @parameter call[:path] [String] Optional file path to save the dump.
 				def do_scheduler_dump(call)
 					dump(call) do |file|
 						Fiber.scheduler.print_hierarchy(file)
 					end
 				end
 
+				# Dump the entire object space to a file.
+				#
+				# This is a heavyweight operation that dumps all objects in the heap.
+				# Consider using {do_memory_sample} for lighter weight memory leak detection.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				# @parameter call[:path] [String] Optional file path to save the dump.
 				def do_memory_dump(call)
 					require "objspace"
 
@@ -59,8 +80,12 @@ module Async
 				# retained objects allocated during the sampling period. Late-lifecycle
 				# allocations that are retained are likely memory leaks.
 				#
+				# The method samples allocations for the specified duration, forces a garbage
+				# collection, and returns a JSON report showing allocated vs retained memory
+				# broken down by gem, file, location, and class.
+				#
 				# @parameter call [Connection::Call] The call to respond to.
-				# @parameter duration [Numeric] The duration in seconds to sample for
+				# @parameter call[:duration] [Numeric] The duration in seconds to sample for.
 				def do_memory_sample(call)
 					require "memory"
 
@@ -82,15 +107,25 @@ module Async
 					# Stop sampling
 					sampler.stop
 
-
+					report = sampler.report
+
+					# This is a temporary log to help with debugging:
+					buffer = StringIO.new
+					report.print(buffer)
+					Console.info(self, "Memory sample completed.", report: buffer.string)
 
 					# Generate a report focused on retained objects (likely leaks):
-					report
-					call.finish(report: report.as_json)
+					call.finish(report: report)
 				ensure
 					GC.start
 				end
 
+				# Dump information about all running threads.
+				#
+				# Includes thread inspection and backtraces for debugging.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				# @parameter call[:path] [String] Optional file path to save the dump.
 				def do_thread_dump(call)
 					dump(call) do |file|
 						Thread.list.each do |thread|
@@ -100,11 +135,22 @@ module Async
 					end
 				end
 
+				# Start garbage collection profiling.
+				#
+				# Enables the GC profiler to track garbage collection performance.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
 				def do_garbage_profile_start(call)
 					GC::Profiler.enable
 					call.finish(started: true)
 				end
 
+				# Stop garbage collection profiling and return results.
+				#
+				# Disables the GC profiler and returns collected profiling data.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				# @parameter call[:path] [String] Optional file path to save the profile.
 				def do_garbage_profile_stop(call)
 					dump(connection, message) do |file|
 						file.puts GC::Profiler.result
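The two garbage-profile handlers above are thin wrappers around Ruby's built-in `GC::Profiler`. The same enable/result/disable cycle can be exercised directly, outside the supervisor:

```ruby
# Demonstrates the stdlib GC::Profiler API that the worker's
# do_garbage_profile_start and do_garbage_profile_stop handlers wrap.
GC::Profiler.enable

# Generate some garbage so at least one collection is profiled:
10.times do
	Array.new(100_000) { Object.new }
	GC.start
end

result = GC::Profiler.result
GC::Profiler.disable
GC::Profiler.clear

# The result is a plain-text table, one row per GC invocation;
# print just its header line:
puts result.lines.first
```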
data/lib/async/container/supervisor.rb
CHANGED

@@ -10,16 +10,7 @@ require_relative "supervisor/worker"
 require_relative "supervisor/client"
 
 require_relative "supervisor/memory_monitor"
+require_relative "supervisor/process_monitor"
 
 require_relative "supervisor/environment"
 require_relative "supervisor/supervised"
-
-# @namespace
-module Async
-	# @namespace
-	module Container
-		# @namespace
-		module Supervisor
-		end
-	end
-end
data/readme.md
CHANGED

@@ -18,10 +18,19 @@ Please see the [project documentation](https://socketry.github.io/async-containe
 
 - [Getting Started](https://socketry.github.io/async-container-supervisor/guides/getting-started/index) - This guide explains how to get started with `async-container-supervisor` to supervise and monitor worker processes in your Ruby applications.
 
+- [Memory Monitor](https://socketry.github.io/async-container-supervisor/guides/memory-monitor/index) - This guide explains how to use the `Async::Container::Supervisor::MemoryMonitor` to detect and restart workers that exceed memory limits or develop memory leaks.
+
+- [Process Monitor](https://socketry.github.io/async-container-supervisor/guides/process-monitor/index) - This guide explains how to use the `Async::Container::Supervisor::ProcessMonitor` to log CPU and memory metrics for your worker processes.
+
 ## Releases
 
 Please see the [project releases](https://socketry.github.io/async-container-supervisor/releases/index) for all releases.
 
+### v0.8.0
+
+- Add `Async::Container::Supervisor::ProcessMonitor` for logging CPU and memory metrics periodically.
+- Fix documentation to use correct `maximum_size_limit:` parameter name for `MemoryMonitor` (was incorrectly documented as `limit:`).
+
 ### v0.7.0
 
 - If a memory leak is detected, sample memory usage for 60 seconds before exiting.
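The `maximum_size_limit:` documentation fix mentioned above affects how the memory monitor is configured. A configuration sketch follows; `maximum_size_limit:` is confirmed by the release notes, while the `monitors` block and `interval:` keyword follow the gem's guides and the size value is a placeholder:

```ruby
# Hypothetical service configuration sketch:
service "supervisor" do
	include Async::Container::Supervisor::Environment
	
	monitors do
		[
			# Restart any worker whose resident set exceeds ~512MiB:
			Async::Container::Supervisor::MemoryMonitor.new(
				interval: 10,
				# This keyword was previously documented (incorrectly) as `limit:`:
				maximum_size_limit: 512 * 1024 * 1024
			),
		]
	end
end
```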
data/releases.md
CHANGED

@@ -1,5 +1,10 @@
 # Releases
 
+## v0.8.0
+
+- Add `Async::Container::Supervisor::ProcessMonitor` for logging CPU and memory metrics periodically.
+- Fix documentation to use correct `maximum_size_limit:` parameter name for `MemoryMonitor` (was incorrectly documented as `limit:`).
+
 ## v0.7.0
 
 - If a memory leak is detected, sample memory usage for 60 seconds before exiting.
data.tar.gz.sig
CHANGED

Binary file

metadata
CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: async-container-supervisor
 version: !ruby/object:Gem::Version
-  version: 0.7.0
+  version: 0.8.0
 platform: ruby
 authors:
 - Samuel Williams
@@ -94,6 +94,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '0.5'
+- !ruby/object:Gem::Dependency
+  name: process-metrics
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 executables: []
 extensions: []
 extra_rdoc_files: []
@@ -101,6 +115,8 @@ files:
 - bake/async/container/supervisor.rb
 - context/getting-started.md
 - context/index.yaml
+- context/memory-monitor.md
+- context/process-monitor.md
 - lib/async/container/supervisor.rb
 - lib/async/container/supervisor/client.rb
 - lib/async/container/supervisor/connection.rb
@@ -108,6 +124,7 @@ files:
 - lib/async/container/supervisor/endpoint.rb
 - lib/async/container/supervisor/environment.rb
 - lib/async/container/supervisor/memory_monitor.rb
+- lib/async/container/supervisor/process_monitor.rb
 - lib/async/container/supervisor/server.rb
 - lib/async/container/supervisor/service.rb
 - lib/async/container/supervisor/supervised.rb

metadata.gz.sig
CHANGED

Binary file