async-container-supervisor 0.6.3 → 0.7.0
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/bake/async/container/supervisor.rb +19 -0
- data/context/getting-started.md +35 -2
- data/lib/async/container/supervisor/client.rb +1 -1
- data/lib/async/container/supervisor/connection.rb +21 -0
- data/lib/async/container/supervisor/memory_monitor.rb +22 -1
- data/lib/async/container/supervisor/server.rb +48 -1
- data/lib/async/container/supervisor/version.rb +1 -1
- data/lib/async/container/supervisor/worker.rb +40 -2
- data/readme.md +8 -0
- data/releases.md +8 -0
- data.tar.gz.sig +0 -0
- metadata +15 -1
- metadata.gz.sig +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a2da6a39261568dcfdcd067bcbd7364397df19aad34826ec9d7b6745e74aa198
+  data.tar.gz: b8510b8f17ac2fea393f12223604c96a8f9303d6e87b1fc2ac54fc6b0cdfdaeb
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bf79321c826f009edac43b3c1bf1313710ed6881863aa56dea8c7909c7c231cbb1e07b92b0ed2241f1f76cd137e934ec17fb0b6ed00b30ebe8778c430c61735c
+  data.tar.gz: c469d508ec02830abe705ec935b6778b45f78923ed592e9af652f07189ad1929e2c61be389bed2ab2076697e7349398e199a43dd238b486e6ebbc899e4b5f9f7
checksums.yaml.gz.sig
CHANGED
Binary file
data/bake/async/container/supervisor.rb
CHANGED
@@ -29,6 +29,25 @@ def status
 	end
 end
 
+# Sample memory allocations from a worker over a time period.
+#
+# This is useful for identifying memory leaks by tracking allocations
+# that are retained after garbage collection.
+#
+# @parameter duration [Integer] The duration in seconds to sample for (default: 10).
+# @parameter connection_id [String] The connection ID to target a specific worker.
+def memory_sample(duration: 10, connection_id:)
+	client do |connection|
+		Console.info(self, "Sampling memory from worker...", duration: duration, connection_id: connection_id)
+
+		# Build the operation request:
+		operation = {do: :memory_sample, duration: duration}
+
+		# Use the forward operation to proxy the request to a worker:
+		return connection.call(do: :forward, operation: operation, connection_id: connection_id)
+	end
+end
+
 private
 
 def endpoint
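The bake task above wraps the worker operation inside a `forward` request addressed by `connection_id`. A minimal sketch of that envelope, using plain hashes only: the helper name `forward_request` and the sample connection ID are hypothetical and not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): wrap a worker operation in a
# forward request, mirroring the hash shapes used by the bake task.
def forward_request(operation, connection_id)
	raise ArgumentError, "operation must include :do" unless operation.key?(:do)
	
	{do: :forward, operation: operation, connection_id: connection_id}
end

# The inner operation is what the worker eventually receives:
operation = {do: :memory_sample, duration: 30}

# "hypothetical-connection-id" stands in for an ID reported by the supervisor:
request = forward_request(operation, "hypothetical-connection-id")
puts request.inspect
```

The supervisor only inspects the outer keys; the nested `operation` hash is passed through to the worker unchanged.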
data/context/getting-started.md
CHANGED
@@ -139,13 +139,46 @@ The {ruby Async::Container::Supervisor::MemoryMonitor} will periodically check w
 
 The supervisor can collect various diagnostics from workers on demand:
 
-- **Memory dumps**: Full heap dumps for memory analysis
-- **
+- **Memory dumps**: Full heap dumps for memory analysis via `ObjectSpace.dump_all`.
+- **Memory samples**: Lightweight sampling to identify memory leaks.
+- **Thread dumps**: Stack traces of all threads.
 - **Scheduler dumps**: Async fiber hierarchy
 - **Garbage collection profiles**: GC performance data
 
 These can be triggered programmatically or via command-line tools (when available).
 
+#### Memory Leak Diagnosis
+
+To identify memory leaks, you can use the memory sampling feature which is much lighter weight than a full memory dump. It tracks allocations over a time period and focuses on retained objects.
+
+**Using the bake task:**
+
+```bash
+# Sample for 30 seconds and print report to console
+$ bake async:container:supervisor:memory_sample duration=30
+```
+
+**Programmatically:**
+
+```ruby
+# Assuming you have a connection to a worker:
+result = connection.call(do: :memory_sample, duration: 30)
+puts result[:data]
+```
+
+This will sample memory allocations for the specified duration, then force a garbage collection and return a JSON report showing what objects were allocated during that period and retained after GC. Late-lifecycle allocations that are retained are likely memory leaks.
+
+The JSON report includes:
+- `total_allocated`: Total allocated memory and count
+- `total_retained`: Total retained memory and count
+- `by_gem`: Breakdown by gem/library
+- `by_file`: Breakdown by source file
+- `by_location`: Breakdown by specific file:line locations
+- `by_class`: Breakdown by object class
+- `strings`: String allocation analysis
+
+This is much more efficient than `do: :memory_dump` which uses `ObjectSpace.dump_all` and can be slow and blocking on large heaps. The JSON format also makes it easy to integrate with monitoring and analysis tools.
+
 ## Advanced Usage
 
 ### Custom Monitors
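The "allocated during the period and retained after GC" idea described in the documentation above can be illustrated with Ruby's standard `objspace` library. This is a swapped-in technique for illustration only: it does not use `Memory::Sampler`, and the helper name `count_retained_strings` and the `LEAK` array are hypothetical.

```ruby
require "objspace"

# Count strings allocated while the block runs that are still alive after a
# garbage collection. Uses stdlib allocation tracing, not the memory gem.
def count_retained_strings
	GC.start
	ObjectSpace.trace_object_allocations_start
	yield
	GC.start
	
	retained = 0
	ObjectSpace.each_object(String) do |string|
		# Allocation metadata exists only for objects allocated while tracing:
		retained += 1 if ObjectSpace.allocation_sourcefile(string)
	end
	retained
ensure
	ObjectSpace.trace_object_allocations_stop
	ObjectSpace.trace_object_allocations_clear
end

# A deliberately "leaky" array that keeps references alive:
LEAK = []

retained = count_retained_strings do
	100.times {|i| LEAK << "leaked-#{i}"}
	100.times {|i| "temporary-#{i}"}
end

puts "Strings allocated during sampling and still alive: #{retained}"
```

The count is a lower bound on the leak rather than an exact figure, since the garbage collector may conservatively keep some temporary objects alive.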
data/lib/async/container/supervisor/connection.rb
CHANGED
@@ -71,6 +71,27 @@ module Async
 						@queue.closed?
 					end
 
+					# Forward this call to another connection, proxying all responses back.
+					#
+					# This provides true streaming forwarding - intermediate responses flow through
+					# in real-time rather than being buffered.
+					#
+					# @parameter target_connection [Connection] The connection to forward the call to.
+					# @parameter operation [Hash] The operation request to forward (must include :do key).
+					def forward(target_connection, operation)
+						# Forward the operation in an async task to avoid blocking
+						Async do
+							# Make the call to the target connection and stream responses back:
+							Call.call(target_connection, **operation) do |response|
+								# Push each response through our queue:
+								self.push(**response)
+							end
+						ensure
+							# Close our queue to signal completion:
+							@queue.close
+						end
+					end
+
 					def self.dispatch(connection, target, id, message)
 						Async do
 							call = self.new(connection, id, message)
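The streaming behaviour of `forward` above (push each response as it arrives, then close the queue in `ensure`) can be sketched with the standard library alone. This sketch assumes a plain `Thread` and `Thread::Queue` in place of Async tasks and the gem's `Call` class; the top-level `forward` helper here is hypothetical.

```ruby
# Hypothetical stand-in for Call#forward: relay each response into a queue as
# it arrives, and always close the queue so the consumer knows we are done.
def forward(responses, queue)
	Thread.new do
		responses.each {|response| queue.push(response)}
	ensure
		# Runs even if relaying fails part way through:
		queue.close
	end
end

queue = Thread::Queue.new
forward([{progress: 1}, {progress: 2}, {finished: true}], queue)

# Thread::Queue#pop returns nil once the queue is closed and drained:
received = []
while response = queue.pop
	received << response
end

puts received.inspect
```

Closing in `ensure` is what prevents the consumer loop from waiting forever when the forwarded call raises.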
data/lib/async/container/supervisor/memory_monitor.rb
CHANGED
@@ -10,15 +10,19 @@ module Async
 	module Container
 		module Supervisor
 			class MemoryMonitor
+				MEMORY_SAMPLE = {duration: 60, timeout: 60+20}
+
 				# Create a new memory monitor.
 				#
 				# @parameter interval [Integer] The interval at which to check for memory leaks.
 				# @parameter total_size_limit [Integer] The total size limit of all processes, or nil for no limit.
 				# @parameter options [Hash] Options to pass to the cluster when adding processes.
-				def initialize(interval: 10, total_size_limit: nil, **options)
+				def initialize(interval: 10, total_size_limit: nil, memory_sample: MEMORY_SAMPLE, **options)
 					@interval = interval
 					@cluster = Memory::Leak::Cluster.new(total_size_limit: total_size_limit)
 
+					@memory_sample = memory_sample
+
 					# We use these options when adding processes to the cluster:
 					@options = options
 
@@ -74,6 +78,23 @@ module Async
 				# @parameter monitor [Memory::Leak::Monitor] The monitor that detected the memory leak.
 				# @returns [Boolean] True if the process was killed.
 				def memory_leak_detected(process_id, monitor)
+					Console.info(self, "Memory leak detected!", child: {process_id: process_id}, monitor: monitor)
+
+					if @memory_sample
+						Console.info(self, "Capturing memory sample...", child: {process_id: process_id}, memory_sample: @memory_sample)
+
+						# We are tracking multiple connections to the same process:
+						connections = @processes[process_id]
+
+						# Try to capture a memory sample:
+						connections.each do |connection|
+							result = connection.call(do: :memory_sample, **@memory_sample)
+
+							Console.info(self, "Memory sample completed:", child: {process_id: process_id}, result: result)
+						end
+					end
+
+					# Kill the process gently:
 					Console.info(self, "Killing process!", child: {process_id: process_id})
 					Process.kill(:INT, process_id)
 
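In the hunk above, the `memory_sample` option doubles as an on/off switch and as the keyword arguments splatted into each `memory_sample` call. A rough sketch of that pattern follows; `RecordingConnection`, `capture_samples` and `DEFAULT_MEMORY_SAMPLE` are hypothetical stand-ins, not the gem's classes.

```ruby
# Mirrors the shape of the default options added in this release:
DEFAULT_MEMORY_SAMPLE = {duration: 60, timeout: 60 + 20}

# Hypothetical stand-in for a worker connection that records what it was sent.
class RecordingConnection
	attr_reader :calls
	
	def initialize
		@calls = []
	end
	
	def call(**message)
		@calls << message
		{report: {}}
	end
end

# Ask every connection for a sample, or do nothing when sampling is disabled
# by passing nil (or false) instead of an options hash.
def capture_samples(connections, memory_sample)
	return [] unless memory_sample
	
	connections.map do |connection|
		connection.call(do: :memory_sample, **memory_sample)
	end
end

connection = RecordingConnection.new
capture_samples([connection], DEFAULT_MEMORY_SAMPLE)
puts connection.calls.inspect
```

Because the default timeout is 20 seconds longer than the sampling duration, the worker has some headroom to build its report before the call times out.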
data/lib/async/container/supervisor/server.rb
CHANGED
@@ -3,6 +3,8 @@
 # Released under the MIT License.
 # Copyright, 2025, by Samuel Williams.
 
+require "securerandom"
+
 require_relative "connection"
 require_relative "endpoint"
 require_relative "dispatchable"
@@ -17,15 +19,23 @@ module Async
 				def initialize(monitors: [], endpoint: Supervisor.endpoint)
 					@monitors = monitors
 					@endpoint = endpoint
+
+					@connections = {}
 				end
 
 				attr :monitors
+				attr :connections
 
 				include Dispatchable
 
 				def do_register(call)
 					call.connection.state.merge!(call.message[:state])
 
+					connection_id = SecureRandom.uuid
+					call.connection.state[:connection_id] = connection_id
+
+					@connections[connection_id] = call.connection
+
 					@monitors.each do |monitor|
 						monitor.register(call.connection)
 					rescue => error
@@ -35,6 +45,31 @@ module Async
 					call.finish
 				end
 
+				# Forward an operation to a worker connection.
+				#
+				# @parameter call [Connection::Call] The call to handle.
+				# @parameter operation [Hash] The operation to forward, must include :do key.
+				# @parameter connection_id [String] The connection ID to target.
+				def do_forward(call)
+					operation = call[:operation]
+					connection_id = call[:connection_id]
+
+					unless connection_id
+						call.fail(error: "Missing 'connection_id' parameter")
+						return
+					end
+
+					connection = @connections[connection_id]
+
+					unless connection
+						call.fail(error: "Connection not found", connection_id: connection_id)
+						return
+					end
+
+					# Forward the call to the target connection
+					call.forward(connection, operation)
+				end
+
 				# Restart the current process group, usually including the supervisor and any other processes.
 				#
 				# @parameter signal [Symbol] The signal to send to the process group.
@@ -48,14 +83,26 @@ module Async
 				end
 
 				def do_status(call)
+					connections = @connections.map do |connection_id, connection|
+						{
+							connection_id: connection_id,
+							process_id: connection.state[:process_id],
+							state: connection.state,
+						}
+					end
+
 					@monitors.each do |monitor|
 						monitor.status(call)
 					end
 
-					call.finish
+					call.finish(connections: connections)
 				end
 
 				def remove(connection)
+					if connection_id = connection.state[:connection_id]
+						@connections.delete(connection_id)
+					end
+
 					@monitors.each do |monitor|
 						monitor.remove(connection)
 					rescue => error
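The server changes above amount to a small registry keyed by a generated UUID: register on connect, look up on forward, delete on removal. A minimal sketch under that reading, where `ConnectionRegistry` and its method names are hypothetical and the error hashes copy the messages from the diff:

```ruby
require "securerandom"

# Hypothetical registry illustrating the bookkeeping added to the server.
class ConnectionRegistry
	def initialize
		@connections = {}
	end
	
	# Assign a fresh ID to the connection and remember it:
	def register(connection)
		connection_id = SecureRandom.uuid
		@connections[connection_id] = connection
		connection_id
	end
	
	# Forget a connection, e.g. when the worker disconnects:
	def remove(connection_id)
		@connections.delete(connection_id)
	end
	
	# Resolve an ID to a connection, or to an error hash shaped like the
	# failures reported by do_forward:
	def resolve(connection_id)
		return {error: "Missing 'connection_id' parameter"} unless connection_id
		
		connection = @connections[connection_id]
		return {error: "Connection not found", connection_id: connection_id} unless connection
		
		{connection: connection}
	end
end

registry = ConnectionRegistry.new
worker_id = registry.register(:worker_connection)
puts registry.resolve(worker_id).inspect
```

Using a random UUID rather than the process ID keeps IDs unique even when one process holds several connections.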
data/lib/async/container/supervisor/worker.rb
CHANGED
@@ -53,6 +53,44 @@ module Async
 					end
 				end
 
+				# Sample memory allocations over a time period to identify potential leaks.
+				#
+				# This method is much lighter weight than {do_memory_dump} and focuses on
+				# retained objects allocated during the sampling period. Late-lifecycle
+				# allocations that are retained are likely memory leaks.
+				#
+				# @parameter call [Connection::Call] The call to respond to.
+				# @parameter duration [Numeric] The duration in seconds to sample for (default: 10).
+				def do_memory_sample(call)
+					require "memory"
+
+					unless duration = call[:duration] and duration.positive?
+						raise ArgumentError, "Positive duration is required!"
+					end
+
+					Console.info(self, "Starting memory sampling...", duration: duration)
+
+					# Create a sampler to track allocations
+					sampler = Memory::Sampler.new
+
+					# Start sampling
+					sampler.start
+
+					# Sample for the specified duration
+					sleep(duration)
+
+					# Stop sampling
+					sampler.stop
+
+					Console.info(self, "Memory sampling completed, generating report...", sampler: sampler)
+
+					# Generate a report focused on retained objects (likely leaks):
+					report = sampler.report
+					call.finish(report: report.as_json)
+				ensure
+					GC.start
+				end
+
 				def do_thread_dump(call)
 					dump(call) do |file|
 						Thread.list.each do |thread|
@@ -68,11 +106,11 @@ module Async
 				end
 
 				def do_garbage_profile_stop(call)
-					GC::Profiler.disable
-
 					dump(connection, message) do |file|
 						file.puts GC::Profiler.result
 					end
+				ensure
+					GC::Profiler.disable
 				end
 
 				protected def connected!(connection)
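The second worker hunk moves `GC::Profiler.disable` into an `ensure` clause, so the profiler is switched off even when writing the result fails, and the result is read before profiling stops. A small sketch of that pattern using only core Ruby; the helper name `with_gc_profile` is hypothetical.

```ruby
# Hypothetical helper: profile GC activity for the duration of a block and
# always disable the profiler afterwards, even if the block raises.
def with_gc_profile
	GC::Profiler.enable
	yield
	# Read the result while the profiler is still enabled:
	GC::Profiler.result
ensure
	GC::Profiler.disable
end

profile = with_gc_profile do
	# Force at least one collection so there is something to report:
	GC.start
end

puts profile
```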
data/readme.md
CHANGED
@@ -22,6 +22,14 @@ Please see the [project documentation](https://socketry.github.io/async-containe
 
 Please see the [project releases](https://socketry.github.io/async-container-supervisor/releases/index) for all releases.
 
+### v0.7.0
+
+- If a memory leak is detected, sample memory usage for 60 seconds before exiting.
+
+### v0.6.4
+
+- Make client task (in supervised worker) transient, so that it doesn't keep the reactor alive unnecessarily. It also won't be stopped by default when SIGINT is received, so that the worker will remain connected to the supervisor until the worker is completely terminated.
+
 ### v0.6.3
 
 - Add agent context documentation.
data/releases.md
CHANGED
@@ -1,5 +1,13 @@
 # Releases
 
+## v0.7.0
+
+- If a memory leak is detected, sample memory usage for 60 seconds before exiting.
+
+## v0.6.4
+
+- Make client task (in supervised worker) transient, so that it doesn't keep the reactor alive unnecessarily. It also won't be stopped by default when SIGINT is received, so that the worker will remain connected to the supervisor until the worker is completely terminated.
+
 ## v0.6.3
 
 - Add agent context documentation.
data.tar.gz.sig
CHANGED
Binary file
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: async-container-supervisor
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.7.0
 platform: ruby
 authors:
 - Samuel Williams
@@ -66,6 +66,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: memory
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.7'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.7'
 - !ruby/object:Gem::Dependency
   name: memory-leak
   requirement: !ruby/object:Gem::Requirement
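The new runtime dependency on `memory` uses the pessimistic constraint `~> 0.7`. As a quick check of which versions that admits, the following sketch uses RubyGems' own `Gem::Requirement`; the version numbers tested are arbitrary examples, not actual releases of the `memory` gem.

```ruby
require "rubygems"

# "~> 0.7" admits any version >= 0.7 and < 1.0:
requirement = Gem::Requirement.new("~> 0.7")

# Arbitrary example versions, chosen to sit on both sides of the bounds:
results = ["0.6.9", "0.7.0", "0.9.1", "1.0.0"].to_h do |version|
	[version, requirement.satisfied_by?(Gem::Version.new(version))]
end

results.each do |version, satisfied|
	puts "#{version}: #{satisfied ? 'satisfied' : 'rejected'}"
end
```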
metadata.gz.sig
CHANGED
Binary file