async-service-chaos_kitty 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: f3ea2295d1cc85dec15cac5131cc571726ecbc1b1ca17ece44598d47f66a6feb
4
+ data.tar.gz: 35bc5bb1c4eaf3f889413525653c931a115f71b07170ea8cd6733500207f4f83
5
+ SHA512:
6
+ metadata.gz: b41e245341e3fb7140592b7782aa8561648555db543c241e26afb0a054d0029bb99efb8b0ab0fea42ef664504a65d7003517e8bc0ccdf2212b8f4579bdc6a721
7
+ data.tar.gz: bcaa93b95324b9c6461f8abfebabc1fe55e2255d839d7db9cd0a7c2cbe58a23bd98162a339bfa932203812e32845db6a083c4101d5a764a57a381ffdbcf14ce7
data/architecture.md ADDED
@@ -0,0 +1,153 @@
1
+ # Architecture
2
+
3
+ This document describes the architecture of async-service-chaos_kitty, which follows the same pattern as async-service-supervisor.
4
+
5
+ ## Overview
6
+
7
+ ChaosKitty is a chaos monkey system for testing service resilience. It uses a client-server architecture where workers connect to a central chaos server, and various chaos operations are unleashed on the connected workers.
8
+
9
+ ## Components
10
+
11
+ ### Core Components
12
+
13
+ #### Server (`server.rb`)
14
+ The main chaos server that:
15
+ - Accepts connections from workers (victims)
16
+ - Manages chaos operations
17
+ - Coordinates between chaos operations and connected victims
18
+ - Tracks all connected victims via controllers
19
+
20
+ #### Worker (`worker.rb`)
21
+ A worker process that:
22
+ - Connects to the chaos server
23
+ - Registers itself as a victim
24
+ - Exposes victim controller methods for chaos operations
25
+ - Runs the main application logic
26
+
27
+ #### Client (`client.rb`)
28
+ Base client class for connecting to the chaos server. Extended by Worker.
29
+
30
+ ### Controllers
31
+
32
+ #### ChaosController (`chaos_controller.rb`)
33
+ Server-side controller that:
34
+ - Manages victim registration
35
+ - Provides access to victim proxies
36
+ - Handles status queries
37
+ - Tracks victim metadata (ID, process ID, connection)
38
+
39
+ #### VictimController (`victim_controller.rb`)
40
+ Client-side controller that:
41
+ - Exposes methods that can be invoked by chaos operations
42
+ - Implements chaos actions: delay, raise_error, allocate_memory, cpu_spin, trigger_gc
43
+ - Logs chaos events
44
+
45
+ ### Chaos Operations
46
+
47
+ All chaos operations follow the same pattern:
48
+ - `register(chaos_controller)`: Called when a new victim connects
49
+ - `remove(chaos_controller)`: Called when a victim disconnects
50
+ - `status()`: Returns current status
51
+ - `run()`: Starts the chaos operation loop
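
As a rough illustration of the four-method pattern listed above, here is a minimal sketch in plain Ruby. `NapTime`, its `tick` method, the `random:` option, and the `FakeController` struct are hypothetical names introduced for this example and are not part of the gem. The real operations run inside an Async task and reach victims through async-bus proxies, which this sketch leaves out.

```ruby
require "set"

# Hypothetical stand-in for a ChaosController; only the `id` attribute is modelled.
FakeController = Struct.new(:id)

# Hypothetical chaos operation following the register/remove/status/run pattern.
class NapTime
	def initialize(probability: 0.5, random: Random.new)
		@probability = probability
		@random = random
		@victims = Set.new.compare_by_identity
	end
	
	# Called when a new victim connects.
	def register(chaos_controller)
		@victims.add(chaos_controller)
	end
	
	# Called when a victim disconnects.
	def remove(chaos_controller)
		@victims.delete(chaos_controller)
	end
	
	# Returns the current status as a Hash.
	def status
		{nap_time: {victims: @victims.size, probability: @probability}}
	end
	
	# One loop iteration: pick a random victim, then check the probability.
	# Returns the chosen victim, or nil when no chaos happens.
	def tick
		victim = @victims.to_a.sample(random: @random)
		return nil unless victim
		return nil unless @random.rand < @probability
		victim
	end
	
	# Runs a bounded number of iterations (an assumption made so the sketch terminates).
	def run(iterations: 3)
		Array.new(iterations) {tick}
	end
end

operation = NapTime.new(probability: 1.0)
controller = FakeController.new(1)
operation.register(controller)
puts operation.status.inspect
```

Keeping the victim set keyed by identity means two controllers that happen to compare equal are still tracked separately.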
52
+
53
+ #### Hairball (`hairball.rb`)
54
+ Causes random delays and blocking operations.
55
+
56
+ **Parameters:**
57
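+ - `interval`: Seconds between checks for chaos opportunities (default: 30)
58
+ - `probability`: Chance of causing chaos on each check, from 0.0 to 1.0 (default: 0.3)
59
+ - `min_delay`: Minimum delay duration in seconds (default: 0.5)
60
+ - `max_delay`: Maximum delay duration in seconds (default: 5.0)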
61
+
62
+ **Effect:** Calls `victim.delay(duration:)` on random victims
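
The delay passed to `victim.delay(duration:)` has to fall between `min_delay` and `max_delay`. A minimal sketch of one way to draw such a value uniformly is shown here; `sample_delay` and its `random:` option are hypothetical and only illustrate the arithmetic.

```ruby
# Hypothetical helper: draw a delay uniformly between min_delay and max_delay.
# `random.rand` returns a Float in [0, 1), so the result is at least min_delay
# and, up to floating-point rounding, below max_delay.
def sample_delay(min_delay, max_delay, random: Random.new)
	min_delay + random.rand * (max_delay - min_delay)
end

delays = Array.new(1000) {sample_delay(0.5, 5.0)}
puts delays.minmax.inspect
```

When `min_delay` equals `max_delay` the scaled term is zero, so the helper returns that fixed delay.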
63
+
64
+ #### Scratch (`scratch.rb`)
65
+ Randomly terminates victim processes.
66
+
67
+ **Parameters:**
68
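+ - `interval`: Seconds between checks for chaos opportunities (default: 60)
69
+ - `probability`: Chance of causing chaos on each check, from 0.0 to 1.0 (default: 0.1)
70
+ - `signal`: Signal to send to the victim process (default: `:TERM`)
71
+
72
+ **Effect:** Sends the configured signal to the victim's process ID using `Process.kill`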
73
+
74
+ #### Floop (`floop.rb`)
75
+ Creates random memory spikes.
76
+
77
+ **Parameters:**
78
+ - `interval`: Seconds between checks for chaos opportunities (default: 30)
79
+ - `probability`: Chance of causing chaos on each check, from 0.0 to 1.0 (default: 0.2)
80
+ - `min_size_mb`: Minimum memory allocation in megabytes (default: 10)
81
+ - `max_size_mb`: Maximum memory allocation in megabytes (default: 100)
82
+ - `hold_duration`: How long to hold the allocation (default: 2)
83
+
84
+ **Effect:** Calls `victim.allocate_memory(size_mb:, hold_duration:)`
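
The `size_mb` argument is a value between `min_size_mb` and `max_size_mb`. One way to draw a whole number of megabytes from that range, with both ends included, is the range form of Ruby's `rand`. The helper name `sample_size_mb` and its `random:` option are assumptions made for this sketch.

```ruby
# Hypothetical helper: draw an Integer from the inclusive range
# min_size_mb..max_size_mb. Not part of the gem.
def sample_size_mb(min_size_mb, max_size_mb, random: Random.new)
	random.rand(min_size_mb..max_size_mb)
end

sizes = Array.new(1000) {sample_size_mb(10, 100)}
puts sizes.minmax.inspect
```

The range form stays well defined when both bounds are equal, in which case it returns that single value.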
85
+
86
+ #### Zoomies (`zoomies.rb`)
87
+ Generates random CPU spikes.
88
+
89
+ **Parameters:**
90
+ - `interval`: How often to check for chaos opportunities
91
+ - `probability`: Chance of causing chaos (0.0-1.0)
92
+ - `min_duration`: Minimum CPU spin duration
93
+ - `max_duration`: Maximum CPU spin duration
94
+
95
+ **Effect:** Calls `victim.cpu_spin(duration:)`
96
+
97
+ #### Yowl (`yowl.rb`)
98
+ Raises random exceptions.
99
+
100
+ **Parameters:**
101
+ - `interval`: How often to check for chaos opportunities
102
+ - `probability`: Chance of causing chaos (0.0-1.0)
103
+ - `messages`: Array of possible error messages
104
+
105
+ **Effect:** Calls `victim.raise_error(message:)`
106
+
107
+ ## Communication Flow
108
+
109
+ 1. **Worker Startup:**
110
+ - Worker creates a connection to chaos server
111
+ - Worker creates VictimController and binds it
112
+ - Worker calls `chaos.register(victim_proxy, process_id:)`
113
+ - Server allocates ID and calls `chaos_operation.register()` for each operation
114
+
115
+ 2. **Chaos Execution:**
116
+ - Chaos operation runs in a loop at specified interval
117
+ - On each iteration, randomly selects a victim
118
+ - Checks probability to determine if chaos should occur
119
+ - Invokes remote method on victim via proxy
120
+ - Victim controller executes the chaos action
121
+
122
+ 3. **Worker Shutdown:**
123
+ - Connection closes
124
+ - Server calls `chaos_operation.remove()` for each operation
125
+ - Controller is removed from tracking
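
The startup and shutdown steps above can be sketched with an in-memory model. This is a simplification under stated assumptions: `ToyServer`, `RecordingOperation`, `connect`, and `disconnect` are hypothetical names, controllers are plain hashes, and no sockets or async-bus calls are involved.

```ruby
# Hypothetical operation that records the notifications it receives.
class RecordingOperation
	attr_reader :events
	
	def initialize
		@events = []
	end
	
	def register(controller)
		@events << [:register, controller[:id]]
	end
	
	def remove(controller)
		@events << [:remove, controller[:id]]
	end
end

# Hypothetical server: allocates sequential IDs and tracks controllers by ID.
class ToyServer
	attr_reader :controllers
	
	def initialize(operations)
		@operations = operations
		@controllers = {}
		@next_id = 0
	end
	
	# Worker startup: allocate an ID, track the controller, notify every operation.
	def connect(process_id:)
		controller = {id: (@next_id += 1), process_id: process_id}
		@controllers[controller[:id]] = controller
		@operations.each {|operation| operation.register(controller)}
		controller[:id]
	end
	
	# Worker shutdown: notify every operation, then stop tracking the controller.
	def disconnect(id)
		controller = @controllers[id]
		return unless controller
		@operations.each {|operation| operation.remove(controller)}
		@controllers.delete(id)
	end
end

operation = RecordingOperation.new
server = ToyServer.new([operation])
id = server.connect(process_id: Process.pid)
server.disconnect(id)
puts operation.events.inspect
```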
126
+
127
+ ## IPC Mechanism
128
+
129
+ - Uses Unix domain sockets for inter-process communication
130
+ - Default socket path: `chaos_kitty.ipc`
131
+ - Built on async-bus for RPC capabilities
132
+ - Supports multi-hop forwarding for proxy calls
133
+
134
+ ## Threading Model
135
+
136
+ - Built on the Async framework, which provides cooperative, fiber-based concurrency rather than preemptive threads
137
+ - Each chaos operation runs in its own Async task
138
+ - Connection handling is concurrent
139
+ - Chaos operations execute independently
140
+
141
+ ## Comparison with async-service-supervisor
142
+
143
+ | Supervisor | ChaosKitty | Purpose |
144
+ |------------|------------|---------|
145
+ | Monitor | Chaos Operation | Watches/affects workers |
146
+ | MemoryMonitor | Floop | Memory-related |
147
+ | ProcessMonitor | Scratch | Process-related |
148
+ | Worker | Worker | Connects to server |
149
+ | SupervisorController | ChaosController | Server-side RPC |
150
+ | WorkerController | VictimController | Client-side RPC |
151
+ | Monitors health | Causes chaos | Core function |
152
+
153
+ Both systems share the same architectural pattern but serve opposite purposes: one monitors and maintains health, the other intentionally causes problems to test resilience.
@@ -0,0 +1,113 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Released under the MIT License.
4
+ # Copyright, 2026, by Samuel Williams.
5
+
6
+ require "async/bus/controller"
7
+
8
+ module Async
9
+ module Service
10
+ module ChaosKitty
11
+ # Controller for chaos operations.
12
+ #
13
+ # Handles registration of victims, victim lookup, and status queries.
14
+ class ChaosController < Async::Bus::Controller
15
+ def initialize(server, connection)
16
+ @server = server
17
+ @connection = connection
18
+
19
+ @id = nil
20
+ @process_id = nil
21
+ @victim = nil
22
+ end
23
+
24
+ # @attribute [Server] The server instance.
25
+ attr :server
26
+
27
+ # @attribute [Connection] The connection instance.
28
+ attr :connection
29
+
30
+ # @attribute [Integer] The ID assigned to this victim.
31
+ attr :id
32
+
33
+ # @attribute [Integer] The process ID of the victim.
34
+ attr :process_id
35
+
36
+ # @attribute [Proxy] The proxy to the victim controller.
37
+ attr :victim
38
+
39
+ # Register a victim connection with the chaos server.
40
+ #
41
+ # Allocates a unique sequential ID, stores the victim controller proxy,
42
+ # and notifies all chaos operations of the new connection.
43
+ #
44
+ # @parameter victim [Proxy] The proxy to the victim controller.
45
+ # @parameter process_id [Integer] The process ID of the victim.
46
+ # @returns [Integer] The connection ID assigned to the victim.
47
+ def register(victim, process_id:)
48
+ raise RuntimeError, "Already registered" if @id
49
+
50
+ @id = @server.next_id
51
+ @process_id = process_id
52
+ @victim = victim
53
+
54
+ @server.add(self)
55
+
56
+ return @id
57
+ end
58
+
59
+ # Get a victim controller proxy by connection ID.
60
+ #
61
+ # Returns a proxy to the victim controller that can be used to invoke
62
+ # operations directly on the victim. The proxy uses multi-hop forwarding
63
+ # to route calls through the chaos server to the victim.
64
+ #
65
+ # @parameter id [Integer] The ID of the victim.
66
+ # @returns [Proxy] A proxy to the victim controller.
67
+ # @raises [ArgumentError] If the ID is missing, or if no victim is registered under it.
68
+ def [](id)
69
+ unless id
70
+ raise ArgumentError, "Missing 'id' parameter"
71
+ end
72
+
73
+ chaos_controller = @server.controllers[id]
74
+
75
+ unless chaos_controller
76
+ raise ArgumentError, "Connection not found: #{id}"
77
+ end
78
+
79
+ victim = chaos_controller.victim
80
+
81
+ unless victim
82
+ raise ArgumentError, "Victim controller not found for connection: #{id}"
83
+ end
84
+
85
+ return victim
86
+ end
87
+
88
+ # List all registered victim IDs.
89
+ #
90
+ # @returns [Array(Integer)] An array of IDs for all registered victims.
91
+ def keys
92
+ @server.controllers.keys
93
+ end
94
+
95
+ # Query the status of the chaos server and all connected victims.
96
+ #
97
+ # Returns an array of status information from each chaos operation.
98
+ # Each chaos operation provides its own status representation; if one raises while reporting, the error itself is returned in its place.
99
+ #
100
+ # @returns [Array] An array of status information from each chaos operation.
101
+ def status
102
+ @server.chaos_operations.map do |chaos|
103
+ begin
104
+ chaos.status
105
+ rescue => error
106
+ error
107
+ end
108
+ end.compact
109
+ end
110
+ end
111
+ end
112
+ end
113
+ end
@@ -0,0 +1,22 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Released under the MIT License.
4
+ # Copyright, 2026, by Samuel Williams.
5
+
6
+ require "async/bus/client"
7
+
8
+ module Async
9
+ module Service
10
+ module ChaosKitty
11
+ # A client provides a mechanism to connect to a chaos server in order to execute operations.
12
+ class Client < Async::Bus::Client
13
+ # Initialize a new client.
14
+ #
15
+ # @parameter endpoint [IO::Endpoint] The chaos endpoint to connect to.
16
+ def initialize(endpoint: ChaosKitty.endpoint, **options)
17
+ super(endpoint, **options)
18
+ end
19
+ end
20
+ end
21
+ end
22
+ end
@@ -0,0 +1,20 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Released under the MIT License.
4
+ # Copyright, 2026, by Samuel Williams.
5
+
6
+ require "io/endpoint/unix_endpoint"
7
+
8
+ module Async
9
+ module Service
10
+ module ChaosKitty
11
+ # Get the chaos kitty IPC endpoint.
12
+ #
13
+ # @parameter path [String] The path for the Unix socket (default: "chaos_kitty.ipc").
14
+ # @returns [IO::Endpoint] The Unix socket endpoint.
15
+ def self.endpoint(path = "chaos_kitty.ipc")
16
+ ::IO::Endpoint.unix(path)
17
+ end
18
+ end
19
+ end
20
+ end
@@ -0,0 +1,104 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Released under the MIT License.
4
+ # Copyright, 2026, by Samuel Williams.
5
+
6
+ require "set"
7
+ require_relative "loop"
8
+
9
+ module Async
10
+ module Service
11
+ module ChaosKitty
12
+ # Floop causes random memory spikes in victim processes.
13
+ #
14
+ # Like a cat flopping over dramatically, this chaos operation randomly
15
+ # allocates large amounts of memory to test memory handling and limits.
16
+ class Floop
17
+ # Create a new floop chaos operation.
18
+ #
19
+ # @parameter interval [Integer] How often to check for chaos opportunities, in seconds.
20
+ # @parameter probability [Float] Probability (0.0 to 1.0) of causing chaos on each check.
21
+ # @parameter min_size_mb [Integer] Minimum memory allocation in megabytes.
22
+ # @parameter max_size_mb [Integer] Maximum memory allocation in megabytes.
23
+ # @parameter hold_duration [Numeric] How long to hold the allocation.
24
+ def initialize(interval: 30, probability: 0.2, min_size_mb: 10, max_size_mb: 100, hold_duration: 2)
25
+ @interval = interval
26
+ @probability = probability
27
+ @min_size_mb = min_size_mb
28
+ @max_size_mb = max_size_mb
29
+ @hold_duration = hold_duration
30
+ @victims = Set.new.compare_by_identity
31
+ end
32
+
33
+ # @attribute [Set] The set of registered victims.
34
+ attr_reader :victims
35
+
36
+ # Register a victim with the floop chaos.
37
+ #
38
+ # @parameter chaos_controller [ChaosController] The chaos controller for the victim.
39
+ def register(chaos_controller)
40
+ Console.debug(self, "😺 Registering victim for floop chaos.", id: chaos_controller.id)
41
+ @victims.add(chaos_controller)
42
+ end
43
+
44
+ # Remove a victim from the floop chaos.
45
+ #
46
+ # @parameter chaos_controller [ChaosController] The chaos controller for the victim.
47
+ def remove(chaos_controller)
48
+ @victims.delete(chaos_controller)
49
+ end
50
+
51
+ # Get status for the floop chaos.
52
+ #
53
+ # @returns [Hash] Status including victim count and configuration.
54
+ def status
55
+ {
56
+ floop: {
57
+ victims: @victims.size,
58
+ probability: @probability,
59
+ size_range_mb: [@min_size_mb, @max_size_mb],
60
+ hold_duration: @hold_duration
61
+ }
62
+ }
63
+ end
64
+
65
+ # Unleash a floop on a random victim.
66
+ def unleash_floop
67
+ return if @victims.empty?
68
+
69
+ # Pick a random victim
70
+ victim = @victims.to_a.sample
71
+ return unless victim
72
+
73
+ # Check probability
74
+ return unless rand < @probability
75
+
76
+ # Calculate a random size in the inclusive range (also well defined when min == max, where `rand(0)` would return a Float):
77
+ size_mb = rand(@min_size_mb..@max_size_mb)
78
+
79
+ Console.info(self, "😾 *FLOOP* Memory spike incoming!", id: victim.id, size_mb: size_mb)
80
+
81
+ begin
82
+ victim_proxy = victim.connection[:victim]
83
+ if victim_proxy
84
+ victim_proxy.allocate_memory(size_mb: size_mb, hold_duration: @hold_duration)
85
+ end
86
+ rescue => error
87
+ Console.error(self, "Failed to unleash floop!", id: victim.id, exception: error)
88
+ end
89
+ end
90
+
91
+ # Run the floop chaos operation.
92
+ #
93
+ # @returns [Async::Task] The task that is running the floop chaos.
94
+ def run
95
+ Async do
96
+ Loop.run(interval: @interval) do
97
+ unleash_floop
98
+ end
99
+ end
100
+ end
101
+ end
102
+ end
103
+ end
104
+ end
@@ -0,0 +1,101 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Released under the MIT License.
4
+ # Copyright, 2026, by Samuel Williams.
5
+
6
+ require "set"
7
+ require_relative "loop"
8
+
9
+ module Async
10
+ module Service
11
+ module ChaosKitty
12
+ # Hairball causes random delays and blocking in victim processes.
13
+ #
14
+ # Like a cat hacking up a hairball, this chaos operation randomly
15
+ # blocks victims, simulating slow responses or stuck operations.
16
+ class Hairball
17
+ # Create a new hairball chaos operation.
18
+ #
19
+ # @parameter interval [Integer] How often to check for chaos opportunities, in seconds.
20
+ # @parameter probability [Float] Probability (0.0 to 1.0) of causing chaos on each check.
21
+ # @parameter min_delay [Numeric] Minimum delay duration in seconds.
22
+ # @parameter max_delay [Numeric] Maximum delay duration in seconds.
23
+ def initialize(interval: 30, probability: 0.3, min_delay: 0.5, max_delay: 5.0)
24
+ @interval = interval
25
+ @probability = probability
26
+ @min_delay = min_delay
27
+ @max_delay = max_delay
28
+ @victims = Set.new.compare_by_identity
29
+ end
30
+
31
+ # @attribute [Set] The set of registered victims.
32
+ attr_reader :victims
33
+
34
+ # Register a victim with the hairball chaos.
35
+ #
36
+ # @parameter chaos_controller [ChaosController] The chaos controller for the victim.
37
+ def register(chaos_controller)
38
+ Console.debug(self, "😺 Registering victim for hairball chaos.", id: chaos_controller.id)
39
+ @victims.add(chaos_controller)
40
+ end
41
+
42
+ # Remove a victim from the hairball chaos.
43
+ #
44
+ # @parameter chaos_controller [ChaosController] The chaos controller for the victim.
45
+ def remove(chaos_controller)
46
+ @victims.delete(chaos_controller)
47
+ end
48
+
49
+ # Get status for the hairball chaos.
50
+ #
51
+ # @returns [Hash] Status including victim count and configuration.
52
+ def status
53
+ {
54
+ hairball: {
55
+ victims: @victims.size,
56
+ probability: @probability,
57
+ delay_range: [@min_delay, @max_delay]
58
+ }
59
+ }
60
+ end
61
+
62
+ # Unleash a hairball on a random victim.
63
+ def unleash_hairball
64
+ return if @victims.empty?
65
+
66
+ # Pick a random victim
67
+ victim = @victims.to_a.sample
68
+ return unless victim
69
+
70
+ # Check probability
71
+ return unless rand < @probability
72
+
73
+ # Calculate random delay
74
+ delay = @min_delay + rand * (@max_delay - @min_delay)
75
+
76
+ Console.info(self, "😾 *HACK* *HACK* Hairball time!", id: victim.id, delay: delay)
77
+
78
+ begin
79
+ victim_proxy = victim.connection[:victim]
80
+ if victim_proxy
81
+ victim_proxy.delay(duration: delay)
82
+ end
83
+ rescue => error
84
+ Console.error(self, "Failed to unleash hairball!", id: victim.id, exception: error)
85
+ end
86
+ end
87
+
88
+ # Run the hairball chaos operation.
89
+ #
90
+ # @returns [Async::Task] The task that is running the hairball chaos.
91
+ def run
92
+ Async do
93
+ Loop.run(interval: @interval) do
94
+ unleash_hairball
95
+ end
96
+ end
97
+ end
98
+ end
99
+ end
100
+ end
101
+ end
@@ -0,0 +1,39 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Released under the MIT License.
4
+ # Copyright, 2026, by Samuel Williams.
5
+
6
+ module Async
7
+ module Service
8
+ module ChaosKitty
9
+ # A helper for running loops at aligned intervals.
10
+ module Loop
11
+ # A robust loop that executes a block at aligned intervals.
12
+ #
13
+ # Executions are aligned to wall-clock multiples of the interval (the current time in seconds, modulo the interval).
14
+ #
15
+ # If an error occurs during the execution of the block, it is logged and the loop continues.
16
+ #
17
+ # @parameter interval [Integer] The interval in seconds between executions of the block.
18
+ def self.run(interval: 60, &block)
19
+ while true
20
+ # Compute the wait time to the next interval:
21
+ wait = interval - (Time.now.to_f % interval)
22
+ if wait.positive?
23
+ # Sleep until the next interval boundary:
24
+ sleep(wait)
25
+ end
26
+
27
+ begin
28
+ yield
29
+ rescue => error
30
+ Console.error(self, "Loop error:", error)
31
+ end
32
+ end
33
+ end
34
+ end
35
+
36
+ private_constant :Loop
37
+ end
38
+ end
39
+ end
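
To make the alignment arithmetic in `Loop.run` concrete, the following sketch applies the same expression, `interval - (now % interval)`, to explicit timestamps instead of `Time.now`. The helper name `wait_until_next_boundary` is hypothetical and exists only for this example.

```ruby
# Hypothetical helper that reproduces the wait computation for a given timestamp.
def wait_until_next_boundary(now, interval)
	interval - (now % interval)
end

# 125s is 5s past the 120s boundary, so the next boundary (150s) is 25s away.
puts wait_until_next_boundary(125.0, 30)

# Exactly on a boundary, the computed wait is one full interval.
puts wait_until_next_boundary(120.0, 30)
```

Because the remainder is always smaller than the interval, the computed wait is always positive and never exceeds one interval.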
@@ -0,0 +1,98 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Released under the MIT License.
4
+ # Copyright, 2026, by Samuel Williams.
5
+
6
+ require "set"
7
+ require_relative "loop"
8
+
9
+ module Async
10
+ module Service
11
+ module ChaosKitty
12
+ # Scratch randomly kills victim processes.
13
+ #
14
+ # Like a cat scratching furniture, this chaos operation randomly
15
+ # terminates victim processes to test resilience and recovery.
16
+ class Scratch
17
+ # Create a new scratch chaos operation.
18
+ #
19
+ # @parameter interval [Integer] How often to check for chaos opportunities, in seconds.
20
+ # @parameter probability [Float] Probability (0.0 to 1.0) of causing chaos on each check.
21
+ # @parameter signal [Symbol] The signal to send when scratching.
22
+ def initialize(interval: 60, probability: 0.1, signal: :TERM)
23
+ @interval = interval
24
+ @probability = probability
25
+ @signal = signal
26
+ @victims = Set.new.compare_by_identity
27
+ end
28
+
29
+ # @attribute [Set] The set of registered victims.
30
+ attr_reader :victims
31
+
32
+ # Register a victim with the scratch chaos.
33
+ #
34
+ # @parameter chaos_controller [ChaosController] The chaos controller for the victim.
35
+ def register(chaos_controller)
36
+ Console.debug(self, "😺 Registering victim for scratch chaos.", id: chaos_controller.id)
37
+ @victims.add(chaos_controller)
38
+ end
39
+
40
+ # Remove a victim from the scratch chaos.
41
+ #
42
+ # @parameter chaos_controller [ChaosController] The chaos controller for the victim.
43
+ def remove(chaos_controller)
44
+ @victims.delete(chaos_controller)
45
+ end
46
+
47
+ # Get status for the scratch chaos.
48
+ #
49
+ # @returns [Hash] Status including victim count and configuration.
50
+ def status
51
+ {
52
+ scratch: {
53
+ victims: @victims.size,
54
+ probability: @probability,
55
+ signal: @signal
56
+ }
57
+ }
58
+ end
59
+
60
+ # Unleash a scratch on a random victim.
61
+ def unleash_scratch
62
+ return if @victims.empty?
63
+
64
+ # Pick a random victim
65
+ victim = @victims.to_a.sample
66
+ return unless victim
67
+
68
+ # Check probability
69
+ return unless rand < @probability
70
+
71
+ process_id = victim.process_id
72
+ return unless process_id
73
+
74
+ Console.info(self, "😾 *SCRATCH* Taking down a victim!", id: victim.id, process_id: process_id, signal: @signal)
75
+
76
+ begin
77
+ Process.kill(@signal, process_id)
78
+ rescue Errno::ESRCH
79
+ Console.warn(self, "Process already gone!", process_id: process_id)
80
+ rescue => error
81
+ Console.error(self, "Failed to scratch victim!", process_id: process_id, exception: error)
82
+ end
83
+ end
84
+
85
+ # Run the scratch chaos operation.
86
+ #
87
+ # @returns [Async::Task] The task that is running the scratch chaos.
88
+ def run
89
+ Async do
90
+ Loop.run(interval: @interval) do
91
+ unleash_scratch
92
+ end
93
+ end
94
+ end
95
+ end
96
+ end
97
+ end
98
+ end