semian 0.27.1 → 0.28.0
- checksums.yaml +4 -4
- data/README.md +71 -0
- data/lib/semian/adapter.rb +1 -1
- data/lib/semian/adaptive_circuit_breaker.rb +136 -0
- data/lib/semian/circuit_breaker.rb +25 -23
- data/lib/semian/circuit_breaker_behaviour.rb +64 -0
- data/lib/semian/configuration_validator.rb +1 -0
- data/lib/semian/dual_circuit_breaker.rb +165 -0
- data/lib/semian/mysql2.rb +2 -2
- data/lib/semian/net_http.rb +3 -3
- data/lib/semian/pid_controller.rb +217 -0
- data/lib/semian/pid_controller_thread.rb +72 -0
- data/lib/semian/protected_resource.rb +1 -1
- data/lib/semian/simple_exponential_smoother.rb +137 -0
- data/lib/semian/unprotected_resource.rb +3 -3
- data/lib/semian/version.rb +1 -1
- data/lib/semian.rb +64 -4
- metadata +8 -2
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: c6423964e3bf474c3f1c6c31ab4c52a14fc28a35d51d91480c537f46f0c1e5f2
|
|
4
|
+
data.tar.gz: a7ec4ad1154ceef88bdb3dd2bbd40d419d8dc3162a81a17804afdd9573d380be
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: ccc4f1217740efde74e84acf26c2f73de7ac9c088575ac03d7be5408bc1c2711cd95ffd27a9b524071db95de3ce24ad3673b5a4f960a4d96b6b6c1c083c85183
|
|
7
|
+
data.tar.gz: 1f94bc2139a6397ba5a963a230496ab694f98cb1ae29e42b69017972d2268afc97c8fdde0d49ffd8d6514036199c15a5d62f67439a6790ed9d4696771ddf7b8e
|
data/README.md
CHANGED
|
@@ -607,6 +607,77 @@ It is possible to disable Circuit Breaker with environment variable
|
|
|
607
607
|
For more information about configuring these parameters, please read
|
|
608
608
|
[this post](https://shopify.engineering/circuit-breaker-misconfigured).
|
|
609
609
|
|
|
610
|
+
#### Adaptive Circuit Breaker (Experimental)
|
|
611
|
+
|
|
612
|
+
Semian also includes an experimental adaptive circuit breaker that uses a [PID controller](https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller)
|
|
613
|
+
to dynamically adjust the rejection rate based on real-time error rates. Unlike the
|
|
614
|
+
traditional circuit breaker with fixed thresholds, the adaptive circuit breaker continuously
|
|
615
|
+
monitors error rates and adjusts its behavior accordingly.
|
|
616
|
+
|
|
617
|
+
##### How It Works
|
|
618
|
+
|
|
619
|
+
The adaptive circuit breaker has two components:
|
|
620
|
+
|
|
621
|
+
1. An ideal error rate estimator that determines when the service is starting to become unhealthy
|
|
622
|
+
2. A PID controller that opens the circuit fully or partially based on how bad the situation is.
|
|
623
|
+
|
|
624
|
+
The ideal error rate estimator uses a "simple exponential smoother", which means it simply takes the average error rate
|
|
625
|
+
that it observes as the ideal, with the following caveats:
|
|
626
|
+
|
|
627
|
+
1. It excludes observations that are too high from its calculations. For example, we know that a 20% error rate is an anomalous
|
|
628
|
+
observation so we ignore it.
|
|
629
|
+
1. It starts with an educated guess about the ideal error rate,
|
|
630
|
+
and then converges down quickly if it observes a lower error rate, and slowly if it observes a higher error rate.
|
|
631
|
+
1. After 30 minutes, it becomes more confident of its guess, and thus converges even more slowly in either direction.
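
The three caveats above can be sketched as a small asymmetric smoother. This is an illustration only, not the gem's `SimpleExponentialSmoother`: the class name `IdealErrorRateSketch`, the smoothing factors (`0.1` downwards, `0.01` upwards, halved after warm-up) and the `warmup_observations` parameter are assumptions chosen for readability.

```ruby
# Hypothetical sketch of an asymmetric exponential smoother with a cap.
class IdealErrorRateSketch
  attr_reader :forecast

  def initialize(initial_value: 0.05, cap_value: 0.1, warmup_observations: 1800)
    @forecast = initial_value
    @cap_value = cap_value
    # Assumed: 1800 observations is roughly 30 minutes at one observation per second.
    @warmup_observations = warmup_observations
    @observations = 0
  end

  def add_observation(error_rate)
    # Caveat 1: observations above the cap are excluded entirely.
    return @forecast if error_rate > @cap_value

    @observations += 1
    # Caveat 2: converge quickly downwards, slowly upwards (assumed factors).
    alpha = error_rate < @forecast ? 0.1 : 0.01
    # Caveat 3: after the warm-up period, move more slowly in both directions.
    alpha /= 2.0 if @observations > @warmup_observations

    @forecast += alpha * (error_rate - @forecast)
  end
end
```

Feeding a 20% observation into a fresh instance leaves the forecast at its initial guess, while a 0% observation pulls it down faster than a 10% observation pushes it up.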
|
|
632
|
+
|
|
633
|
+
The PID controller uses the following equation to determine whether to open or close the circuit:
|
|
634
|
+
|
|
635
|
+
```
|
|
636
|
+
P = (error_rate - ideal_error_rate) - (1 - (error_rate - ideal_error_rate)) * rejection_rate
|
|
637
|
+
```
|
|
638
|
+
|
|
639
|
+
Or, more simply, if you define `delta_error = error_rate - ideal_error_rate` then:
|
|
640
|
+
|
|
641
|
+
```
|
|
642
|
+
P = delta_error - (1 - delta_error) * rejection_rate
|
|
643
|
+
```
|
|
644
|
+
|
|
645
|
+
In simple terms, this equation says: open more when the error rate is higher than the rejection rate,
|
|
646
|
+
and less when the opposite is true. The multiplier of `(1 - delta_error)` is called the aggressiveness multiplier.
|
|
647
|
+
It allows the circuit to open more aggressively depending on how bad the situation is.
|
|
648
|
+
|
|
649
|
+
This P is fed into a typical PID equation, and is used to control the rejection rate of the circuit breaker.
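
As a rough sketch of how P could drive the rejection rate, the following mirrors the update steps visible in `pid_controller.rb` later in this diff (proportional, integral and derivative terms, then clamping). The names `PidState`, `p_value` and `pid_step` are hypothetical, and the ideal error rate is passed in directly rather than estimated.

```ruby
# Hypothetical container for the controller state between updates.
PidState = Struct.new(:rejection_rate, :integral, :previous_p, keyword_init: true)

# P = delta_error - (1 - delta_error) * rejection_rate
def p_value(error_rate:, ideal_error_rate:, rejection_rate:)
  delta_error = error_rate - ideal_error_rate
  delta_error - (1 - delta_error) * rejection_rate
end

# One controller step; defaults follow the documented kp/ki/kd and integral caps.
def pid_step(state, error_rate:, ideal_error_rate:, kp: 1.0, ki: 0.2, kd: 0.0, dt: 1.0,
             integral_lower_cap: -10.0, integral_upper_cap: 10.0)
  p_val = p_value(error_rate: error_rate, ideal_error_rate: ideal_error_rate,
                  rejection_rate: state.rejection_rate)

  state.integral += p_val * dt
  derivative = kd * (p_val - state.previous_p) / dt

  # The control signal is the change applied to the rejection rate.
  control_signal = kp * p_val + ki * state.integral + derivative
  state.rejection_rate = (state.rejection_rate + control_signal).clamp(0.0, 1.0)

  # Capping the integral limits windup.
  state.integral = state.integral.clamp(integral_lower_cap, integral_upper_cap)
  state.previous_p = p_val
  state
end

state = PidState.new(rejection_rate: 0.0, integral: 0.0, previous_p: 0.0)
pid_step(state, error_rate: 0.55, ideal_error_rate: 0.05) # unhealthy window
pid_step(state, error_rate: 0.05, ideal_error_rate: 0.05) # recovered window
```

Note how the rejection rate itself feeds back into P: once requests are being rejected and the error rate returns to the ideal, P turns negative and the rejection rate is pulled back down.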
|
|
650
|
+
|
|
651
|
+
##### Adaptive Circuit Breaker Configuration
|
|
652
|
+
|
|
653
|
+
To enable the adaptive circuit breaker, simply set **adaptive_circuit_breaker** to true.
|
|
654
|
+
|
|
655
|
+
Example configuration:
|
|
656
|
+
```ruby
|
|
657
|
+
Semian.register(
|
|
658
|
+
:my_service,
|
|
659
|
+
adaptive_circuit_breaker: true, # Use adaptive instead of traditional
|
|
660
|
+
bulkhead: false # Can be combined with bulkhead
|
|
661
|
+
)
|
|
662
|
+
```
|
|
663
|
+
|
|
664
|
+
**Note**: When `adaptive_circuit_breaker: true` is set, traditional circuit breaker
|
|
665
|
+
parameters (`error_threshold`, `error_timeout`, etc.) are ignored.
|
|
666
|
+
|
|
667
|
+
|
|
668
|
+
We **_highly_** recommend just setting that configuration and not any other.
|
|
669
|
+
One of the main goals of the adaptive circuit breaker is that it "just works".
|
|
670
|
+
Configuring it might be difficult and not provide much value. That said, here are the configurations you can set:
|
|
671
|
+
* **kp**: The contribution of P in the PID equation. Increasing it means you react more quickly to the latest data. Defaults to 1.0
|
|
672
|
+
* **ki**: The contribution of the integral in the PID equation. Increasing it means adding more "memory", which is useful for ignoring noise. Defaults to 0.2
|
|
673
|
+
* **kd**: The contribution of the derivative in the PID equation. Its behaviour can be complex because of our complex P equation. Defaults to 0.0
|
|
674
|
+
* **integral_upper_cap**: Maximum value of the integral, prevents integral windup. Defaults to 10.0
|
|
675
|
+
* **integral_lower_cap**: Minimum value of the integral, prevents integral windup. Defaults to -10.0
|
|
676
|
+
* **window_size**: How many seconds of observations to take into account. Note that this is a sliding window with a 1-second sliding interval. To control the sliding interval, set the environment variable SEMIAN_ADAPTIVE_CIRCUIT_BREAKER_SLIDING_INTERVAL (shared among all adaptive circuit breakers). Defaults to 10 seconds
|
|
677
|
+
* **dead_zone_ratio**: An error percentage above the ideal_error_rate to ignore. This helps remove noise. Defaults to 0.25
|
|
678
|
+
* **initial_error_rate**: The guess to start with for the ideal error rate. Defaults to 0.05 (5%)
|
|
679
|
+
* **ideal_error_rate_estimator_cap_value**: The value above which we ignore observations for the ideal error rate. Defaults to 0.1 (10%)
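
To make `dead_zone_ratio` more concrete, here is a minimal sketch of dead-zone noise suppression. The helper name `delta_with_dead_zone` is hypothetical, and treating in-zone deviations as exactly zero is an assumption; the gem's actual handling may differ.

```ruby
# Hypothetical helper: suppress small deviations above the ideal error rate.
def delta_with_dead_zone(error_rate, ideal_error_rate, dead_zone_ratio: 0.25)
  raw_delta = error_rate - ideal_error_rate
  dead_zone = ideal_error_rate * dead_zone_ratio

  # Assumption: positive deviations inside the dead zone count as zero;
  # negative deviations and larger positive ones pass through unchanged.
  return 0.0 if raw_delta.positive? && raw_delta <= dead_zone

  raw_delta
end

delta_with_dead_zone(0.06, 0.05) # inside the dead zone of 0.0125
delta_with_dead_zone(0.10, 0.05) # outside the dead zone
```

With the default ratio of 0.25 and an ideal error rate of 5%, error rates up to 6.25% would be treated as noise under this sketch.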
|
|
680
|
+
|
|
610
681
|
### Bulkheading
|
|
611
682
|
|
|
612
683
|
For some applications, circuit breakers are not enough. This is best illustrated
|
data/lib/semian/adapter.rb
CHANGED
|
@@ -45,7 +45,7 @@ module Semian
|
|
|
45
45
|
end
|
|
46
46
|
rescue ::Semian::OpenCircuitError => error
|
|
47
47
|
last_error = semian_resource.circuit_breaker.last_error
|
|
48
|
-
message = "#{error.message} caused by #{last_error
|
|
48
|
+
message = "#{error.message} caused by #{last_error&.message}"
|
|
49
49
|
last_error = nil unless last_error.is_a?(Exception) # Net::HTTPServerError is not an exception
|
|
50
50
|
raise self.class::CircuitOpenError.new(semian_identifier, message), cause: last_error
|
|
51
51
|
rescue ::Semian::BaseError => error
|
data/lib/semian/adaptive_circuit_breaker.rb
ADDED
|
@@ -0,0 +1,136 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require_relative "circuit_breaker_behaviour"
|
|
4
|
+
require_relative "pid_controller_thread"
|
|
5
|
+
|
|
6
|
+
module Semian
|
|
7
|
+
# Adaptive Circuit Breaker that uses PID controller for dynamic rejection
|
|
8
|
+
class AdaptiveCircuitBreaker
|
|
9
|
+
include CircuitBreakerBehaviour
|
|
10
|
+
|
|
11
|
+
attr_reader :pid_controller, :update_thread, :sliding_interval, :pid_controller_thread, :stopped
|
|
12
|
+
|
|
13
|
+
@pid_controller_thread = nil
|
|
14
|
+
|
|
15
|
+
def initialize(name:, exceptions:, kp:, ki:, kd:, window_size:, initial_error_rate:, implementation:,
|
|
16
|
+
sliding_interval:, dead_zone_ratio:, ideal_error_rate_estimator_cap_value:, integral_upper_cap:,
|
|
17
|
+
integral_lower_cap:)
|
|
18
|
+
initialize_behaviour(name: name)
|
|
19
|
+
|
|
20
|
+
@exceptions = exceptions
|
|
21
|
+
@stopped = false
|
|
22
|
+
|
|
23
|
+
@pid_controller = implementation::PIDController.new(
|
|
24
|
+
kp: kp,
|
|
25
|
+
ki: ki,
|
|
26
|
+
kd: kd,
|
|
27
|
+
window_size: window_size,
|
|
28
|
+
implementation: implementation,
|
|
29
|
+
sliding_interval: sliding_interval,
|
|
30
|
+
initial_error_rate: initial_error_rate,
|
|
31
|
+
dead_zone_ratio: dead_zone_ratio,
|
|
32
|
+
ideal_error_rate_estimator_cap_value: ideal_error_rate_estimator_cap_value,
|
|
33
|
+
integral_upper_cap: integral_upper_cap,
|
|
34
|
+
integral_lower_cap: integral_lower_cap,
|
|
35
|
+
)
|
|
36
|
+
|
|
37
|
+
@pid_controller_thread = PIDControllerThread.instance.register_resource(self)
|
|
38
|
+
end
|
|
39
|
+
|
|
40
|
+
def acquire(resource = nil, scope: nil, adapter: nil, &block)
|
|
41
|
+
unless request_allowed?
|
|
42
|
+
mark_rejected(scope:, adapter:)
|
|
43
|
+
raise OpenCircuitError, "Rejected by adaptive circuit breaker"
|
|
44
|
+
end
|
|
45
|
+
|
|
46
|
+
result = nil
|
|
47
|
+
begin
|
|
48
|
+
result = block.call
|
|
49
|
+
rescue *@exceptions => error
|
|
50
|
+
if !error.respond_to?(:marks_semian_circuits?) || error.marks_semian_circuits?
|
|
51
|
+
mark_failed(error, scope:, adapter:)
|
|
52
|
+
end
|
|
53
|
+
raise error
|
|
54
|
+
else
|
|
55
|
+
mark_success(scope:, adapter:)
|
|
56
|
+
end
|
|
57
|
+
result
|
|
58
|
+
end
|
|
59
|
+
|
|
60
|
+
def reset(scope: nil, adapter: nil)
|
|
61
|
+
@last_error = nil
|
|
62
|
+
@pid_controller.reset
|
|
63
|
+
end
|
|
64
|
+
|
|
65
|
+
def stop
|
|
66
|
+
destroy
|
|
67
|
+
end
|
|
68
|
+
|
|
69
|
+
def destroy
|
|
70
|
+
@stopped = true
|
|
71
|
+
PIDControllerThread.instance.unregister_resource(self)
|
|
72
|
+
@pid_controller.reset
|
|
73
|
+
end
|
|
74
|
+
|
|
75
|
+
def metrics
|
|
76
|
+
@pid_controller.metrics
|
|
77
|
+
end
|
|
78
|
+
|
|
79
|
+
def open?
|
|
80
|
+
@pid_controller.rejection_rate == 1
|
|
81
|
+
end
|
|
82
|
+
|
|
83
|
+
def closed?
|
|
84
|
+
@pid_controller.rejection_rate == 0
|
|
85
|
+
end
|
|
86
|
+
|
|
87
|
+
# Compatibility with ProtectedResource - Adaptive circuit breaker does not have a half open state
|
|
88
|
+
def half_open?
|
|
89
|
+
!open? && !closed?
|
|
90
|
+
end
|
|
91
|
+
|
|
92
|
+
def mark_failed(error, scope: nil, adapter: nil)
|
|
93
|
+
@last_error = error
|
|
94
|
+
@pid_controller.record_request(:error)
|
|
95
|
+
end
|
|
96
|
+
|
|
97
|
+
def mark_success(scope: nil, adapter: nil)
|
|
98
|
+
@pid_controller.record_request(:success)
|
|
99
|
+
end
|
|
100
|
+
|
|
101
|
+
def mark_rejected(scope: nil, adapter: nil)
|
|
102
|
+
@pid_controller.record_request(:rejected)
|
|
103
|
+
end
|
|
104
|
+
|
|
105
|
+
def request_allowed?
|
|
106
|
+
!@pid_controller.should_reject?
|
|
107
|
+
end
|
|
108
|
+
|
|
109
|
+
def in_use?
|
|
110
|
+
true
|
|
111
|
+
end
|
|
112
|
+
|
|
113
|
+
def pid_controller_update
|
|
114
|
+
@pid_controller.update
|
|
115
|
+
notify_metrics_update(@pid_controller.metrics(full: false))
|
|
116
|
+
end
|
|
117
|
+
|
|
118
|
+
private
|
|
119
|
+
|
|
120
|
+
def notify_metrics_update(metrics)
|
|
121
|
+
Semian.notify(
|
|
122
|
+
:adaptive_update,
|
|
123
|
+
self,
|
|
124
|
+
nil,
|
|
125
|
+
nil,
|
|
126
|
+
rejection_rate: metrics[:rejection_rate],
|
|
127
|
+
error_rate: metrics[:error_rate],
|
|
128
|
+
ideal_error_rate: metrics[:ideal_error_rate],
|
|
129
|
+
p_value: metrics[:p_value],
|
|
130
|
+
integral: metrics[:integral],
|
|
131
|
+
derivative: metrics[:derivative],
|
|
132
|
+
previous_p_value: metrics[:previous_p_value],
|
|
133
|
+
)
|
|
134
|
+
end
|
|
135
|
+
end
|
|
136
|
+
end
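
The partial opening used by the adaptive breaker comes down to rejecting each request with probability equal to the current rejection rate. The sketch below is a standalone illustration with hypothetical names (`PartialBreakerSketch`, `RejectedError`); it does not record outcomes or run a controller, it only shows the per-request decision and the derived state predicates.

```ruby
# Hypothetical stand-in for a breaker whose rejection rate is set by a controller.
class PartialBreakerSketch
  class RejectedError < StandardError; end

  attr_accessor :rejection_rate

  def initialize(rejection_rate: 0.0, random: Random.new)
    @rejection_rate = rejection_rate
    @random = random
  end

  # Reject a fraction of calls equal to the current rejection rate.
  def acquire
    raise RejectedError, "Rejected by adaptive circuit breaker" if @random.rand < @rejection_rate

    yield
  end

  def open?
    @rejection_rate == 1
  end

  def closed?
    @rejection_rate == 0
  end

  # "Half open" here only means partially rejecting; there is no probing state.
  def half_open?
    !open? && !closed?
  end
end
```

Because `rand` returns a value in `[0, 1)`, a rate of `0.0` never rejects and a rate of `1.0` always rejects; anything in between rejects that share of traffic on average.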
|
data/lib/semian/circuit_breaker.rb
CHANGED
|
@@ -1,17 +1,18 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
|
|
3
|
+
require_relative "circuit_breaker_behaviour"
|
|
4
|
+
|
|
3
5
|
module Semian
|
|
4
6
|
class CircuitBreaker # :nodoc:
|
|
7
|
+
include CircuitBreakerBehaviour
|
|
5
8
|
extend Forwardable
|
|
6
9
|
|
|
7
10
|
def_delegators :@state, :closed?, :open?, :half_open?
|
|
8
11
|
|
|
9
12
|
attr_reader(
|
|
10
|
-
:name,
|
|
11
13
|
:half_open_resource_timeout,
|
|
12
14
|
:error_timeout,
|
|
13
15
|
:state,
|
|
14
|
-
:last_error,
|
|
15
16
|
:error_threshold_timeout_enabled,
|
|
16
17
|
:exponential_backoff_error_timeout,
|
|
17
18
|
:exponential_backoff_initial_timeout,
|
|
@@ -23,13 +24,14 @@ module Semian
|
|
|
23
24
|
error_threshold_timeout: nil, error_threshold_timeout_enabled: true,
|
|
24
25
|
lumping_interval: 0, exponential_backoff_error_timeout: false,
|
|
25
26
|
exponential_backoff_initial_timeout: 1, exponential_backoff_multiplier: 2)
|
|
26
|
-
|
|
27
|
+
initialize_behaviour(name: name)
|
|
28
|
+
|
|
29
|
+
@exceptions = exceptions
|
|
27
30
|
@success_count_threshold = success_threshold
|
|
28
31
|
@error_count_threshold = error_threshold
|
|
29
32
|
@error_threshold_timeout = error_threshold_timeout || error_timeout
|
|
30
33
|
@error_threshold_timeout_enabled = error_threshold_timeout_enabled.nil? ? true : error_threshold_timeout_enabled
|
|
31
34
|
@error_timeout = error_timeout
|
|
32
|
-
@exceptions = exceptions
|
|
33
35
|
@half_open_resource_timeout = half_open_resource_timeout
|
|
34
36
|
@lumping_interval = lumping_interval
|
|
35
37
|
@exponential_backoff_error_timeout = exponential_backoff_error_timeout
|
|
@@ -44,8 +46,8 @@ module Semian
|
|
|
44
46
|
reset
|
|
45
47
|
end
|
|
46
48
|
|
|
47
|
-
def acquire(resource = nil, &block)
|
|
48
|
-
transition_to_half_open if transition_to_half_open?
|
|
49
|
+
def acquire(resource = nil, scope: nil, adapter: nil, &block)
|
|
50
|
+
transition_to_half_open(scope: scope, adapter: adapter) if transition_to_half_open?
|
|
49
51
|
|
|
50
52
|
raise OpenCircuitError unless request_allowed?
|
|
51
53
|
|
|
@@ -54,11 +56,11 @@ module Semian
|
|
|
54
56
|
result = maybe_with_half_open_resource_timeout(resource, &block)
|
|
55
57
|
rescue *@exceptions => error
|
|
56
58
|
if !error.respond_to?(:marks_semian_circuits?) || error.marks_semian_circuits?
|
|
57
|
-
mark_failed(error)
|
|
59
|
+
mark_failed(error, scope: scope, adapter: adapter)
|
|
58
60
|
end
|
|
59
61
|
raise error
|
|
60
62
|
else
|
|
61
|
-
mark_success
|
|
63
|
+
mark_success(scope: scope, adapter: adapter)
|
|
62
64
|
end
|
|
63
65
|
result
|
|
64
66
|
end
|
|
@@ -71,26 +73,26 @@ module Semian
|
|
|
71
73
|
closed? || half_open? || transition_to_half_open?
|
|
72
74
|
end
|
|
73
75
|
|
|
74
|
-
def mark_failed(error)
|
|
76
|
+
def mark_failed(error, scope: nil, adapter: nil)
|
|
75
77
|
push_error(error)
|
|
76
78
|
if closed?
|
|
77
|
-
transition_to_open if error_threshold_reached?
|
|
79
|
+
transition_to_open(scope: scope, adapter: adapter) if error_threshold_reached?
|
|
78
80
|
elsif half_open?
|
|
79
|
-
transition_to_open
|
|
81
|
+
transition_to_open(scope: scope, adapter: adapter)
|
|
80
82
|
end
|
|
81
83
|
end
|
|
82
84
|
|
|
83
|
-
def mark_success
|
|
85
|
+
def mark_success(scope: nil, adapter: nil)
|
|
84
86
|
return unless half_open?
|
|
85
87
|
|
|
86
88
|
@successes.increment
|
|
87
|
-
transition_to_close if success_threshold_reached?
|
|
89
|
+
transition_to_close(scope: scope, adapter: adapter) if success_threshold_reached?
|
|
88
90
|
end
|
|
89
91
|
|
|
90
|
-
def reset
|
|
92
|
+
def reset(scope: nil, adapter: nil)
|
|
91
93
|
@errors.clear
|
|
92
94
|
@successes.reset
|
|
93
|
-
transition_to_close
|
|
95
|
+
transition_to_close(scope: scope, adapter: adapter)
|
|
94
96
|
end
|
|
95
97
|
|
|
96
98
|
def destroy
|
|
@@ -105,8 +107,8 @@ module Semian
|
|
|
105
107
|
|
|
106
108
|
private
|
|
107
109
|
|
|
108
|
-
def transition_to_close
|
|
109
|
-
notify_state_transition(:closed)
|
|
110
|
+
def transition_to_close(scope: nil, adapter: nil)
|
|
111
|
+
notify_state_transition(:closed, scope: scope, adapter: adapter)
|
|
110
112
|
log_state_transition(:closed)
|
|
111
113
|
@state.close!
|
|
112
114
|
@errors.clear
|
|
@@ -114,14 +116,14 @@ module Semian
|
|
|
114
116
|
@current_error_timeout = @exponential_backoff_error_timeout ? @exponential_backoff_initial_timeout : @error_timeout
|
|
115
117
|
end
|
|
116
118
|
|
|
117
|
-
def transition_to_open
|
|
118
|
-
notify_state_transition(:open)
|
|
119
|
+
def transition_to_open(scope: nil, adapter: nil)
|
|
120
|
+
notify_state_transition(:open, scope: scope, adapter: adapter)
|
|
119
121
|
log_state_transition(:open)
|
|
120
122
|
@state.open!
|
|
121
123
|
end
|
|
122
124
|
|
|
123
|
-
def transition_to_half_open
|
|
124
|
-
notify_state_transition(:half_open)
|
|
125
|
+
def transition_to_half_open(scope: nil, adapter: nil)
|
|
126
|
+
notify_state_transition(:half_open, scope: scope, adapter: adapter)
|
|
125
127
|
log_state_transition(:half_open)
|
|
126
128
|
@state.half_open!
|
|
127
129
|
@successes.reset
|
|
@@ -178,8 +180,8 @@ module Semian
|
|
|
178
180
|
Semian.logger.info(str)
|
|
179
181
|
end
|
|
180
182
|
|
|
181
|
-
def notify_state_transition(new_state)
|
|
182
|
-
Semian.notify(:state_change, self,
|
|
183
|
+
def notify_state_transition(new_state, scope: nil, adapter: nil)
|
|
184
|
+
Semian.notify(:state_change, self, scope, adapter, state: new_state)
|
|
183
185
|
end
|
|
184
186
|
|
|
185
187
|
def maybe_with_half_open_resource_timeout(resource, &block)
|
data/lib/semian/circuit_breaker_behaviour.rb
ADDED
|
@@ -0,0 +1,64 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Semian
|
|
4
|
+
module CircuitBreakerBehaviour
|
|
5
|
+
attr_reader :name, :last_error
|
|
6
|
+
attr_accessor :exceptions
|
|
7
|
+
|
|
8
|
+
# Initialize common circuit breaker attributes
|
|
9
|
+
def initialize_behaviour(name:)
|
|
10
|
+
@name = name.to_sym
|
|
11
|
+
@last_error = nil
|
|
12
|
+
end
|
|
13
|
+
|
|
14
|
+
# Main method to execute a block with circuit breaker protection
|
|
15
|
+
def acquire(resource = nil, scope: nil, adapter: nil, &block)
|
|
16
|
+
raise NotImplementedError, "#{self.class} must implement #acquire"
|
|
17
|
+
end
|
|
18
|
+
|
|
19
|
+
# Reset the circuit breaker to its initial state
|
|
20
|
+
def reset(scope: nil, adapter: nil)
|
|
21
|
+
raise NotImplementedError, "#{self.class} must implement #reset"
|
|
22
|
+
end
|
|
23
|
+
|
|
24
|
+
# Clean up resources
|
|
25
|
+
def destroy
|
|
26
|
+
raise NotImplementedError, "#{self.class} must implement #destroy"
|
|
27
|
+
end
|
|
28
|
+
|
|
29
|
+
# Check if the circuit is open (rejecting requests)
|
|
30
|
+
def open?
|
|
31
|
+
raise NotImplementedError, "#{self.class} must implement #open?"
|
|
32
|
+
end
|
|
33
|
+
|
|
34
|
+
# Check if the circuit is closed (allowing requests)
|
|
35
|
+
def closed?
|
|
36
|
+
raise NotImplementedError, "#{self.class} must implement #closed?"
|
|
37
|
+
end
|
|
38
|
+
|
|
39
|
+
# Check if the circuit is half-open (testing if service recovered)
|
|
40
|
+
def half_open?
|
|
41
|
+
raise NotImplementedError, "#{self.class} must implement #half_open?"
|
|
42
|
+
end
|
|
43
|
+
|
|
44
|
+
# Check if requests are currently allowed
|
|
45
|
+
def request_allowed?
|
|
46
|
+
raise NotImplementedError, "#{self.class} must implement #request_allowed?"
|
|
47
|
+
end
|
|
48
|
+
|
|
49
|
+
# Mark a request as failed
|
|
50
|
+
def mark_failed(error, scope: nil, adapter: nil)
|
|
51
|
+
raise NotImplementedError, "#{self.class} must implement #mark_failed"
|
|
52
|
+
end
|
|
53
|
+
|
|
54
|
+
# Mark a request as successful
|
|
55
|
+
def mark_success(scope: nil, adapter: nil)
|
|
56
|
+
raise NotImplementedError, "#{self.class} must implement #mark_success"
|
|
57
|
+
end
|
|
58
|
+
|
|
59
|
+
# Check if the circuit breaker is actively tracking failures
|
|
60
|
+
def in_use?
|
|
61
|
+
raise NotImplementedError, "#{self.class} must implement #in_use?"
|
|
62
|
+
end
|
|
63
|
+
end
|
|
64
|
+
end
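
The module above acts as an interface: shared attributes plus methods that raise until a class overrides them. A reduced, standalone sketch of that pattern follows; `BreakerBehaviourSketch`, `AlwaysClosedBreaker` and `IncompleteBreaker` are hypothetical names and only one interface method is shown.

```ruby
# Hypothetical, reduced version of an interface-style module.
module BreakerBehaviourSketch
  attr_reader :name, :last_error

  def initialize_behaviour(name:)
    @name = name.to_sym
    @last_error = nil
  end

  def request_allowed?
    raise NotImplementedError, "#{self.class} must implement #request_allowed?"
  end
end

# Overrides the interface method, so calls succeed.
class AlwaysClosedBreaker
  include BreakerBehaviourSketch

  def initialize(name:)
    initialize_behaviour(name: name)
  end

  def request_allowed?
    true
  end
end

# Forgets to override, so the module's default raises.
class IncompleteBreaker
  include BreakerBehaviourSketch

  def initialize(name:)
    initialize_behaviour(name: name)
  end
end
```

One detail worth keeping in mind: `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` will not catch it and a missing override fails loudly.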
|
data/lib/semian/configuration_validator.rb
CHANGED
|
@@ -66,6 +66,7 @@ module Semian
|
|
|
66
66
|
def validate_circuit_breaker_configuration!
|
|
67
67
|
return if ENV.key?("SEMIAN_CIRCUIT_BREAKER_DISABLED")
|
|
68
68
|
return unless @configuration.fetch(:circuit_breaker, true)
|
|
69
|
+
return if @configuration[:adaptive_circuit_breaker] # Skip traditional validation if using adaptive
|
|
69
70
|
|
|
70
71
|
require_keys!([:success_threshold, :error_threshold, :error_timeout], @configuration)
|
|
71
72
|
validate_thresholds!
|
data/lib/semian/dual_circuit_breaker.rb
ADDED
|
@@ -0,0 +1,165 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Semian
|
|
4
|
+
# DualCircuitBreaker wraps both classic and adaptive circuit breakers,
|
|
5
|
+
# allowing runtime switching between them via a callable that determines which to use.
|
|
6
|
+
class DualCircuitBreaker
|
|
7
|
+
include CircuitBreakerBehaviour
|
|
8
|
+
|
|
9
|
+
# Module to synchronize mark_success and mark_failed calls between sibling circuit breakers
|
|
10
|
+
# and reduce code duplication
|
|
11
|
+
module SiblingSync
|
|
12
|
+
attr_writer :sibling
|
|
13
|
+
|
|
14
|
+
def mark_success(scope: nil, adapter: nil)
|
|
15
|
+
super
|
|
16
|
+
@sibling.method(:mark_success).super_method.call(scope:, adapter:)
|
|
17
|
+
end
|
|
18
|
+
|
|
19
|
+
def mark_failed(error, scope: nil, adapter: nil)
|
|
20
|
+
super
|
|
21
|
+
@sibling.method(:mark_failed).super_method.call(error, scope:, adapter:)
|
|
22
|
+
end
|
|
23
|
+
end
|
|
24
|
+
|
|
25
|
+
class ChildClassicCircuitBreaker < CircuitBreaker
|
|
26
|
+
include SiblingSync
|
|
27
|
+
end
|
|
28
|
+
|
|
29
|
+
class ChildAdaptiveCircuitBreaker < AdaptiveCircuitBreaker
|
|
30
|
+
include SiblingSync
|
|
31
|
+
end
|
|
32
|
+
|
|
33
|
+
attr_reader :classic_circuit_breaker, :adaptive_circuit_breaker, :active_circuit_breaker
|
|
34
|
+
|
|
35
|
+
# The selector registered via DualCircuitBreaker.adaptive_circuit_breaker_selector should be a
|
|
36
|
+
# callable (Proc/lambda) that returns true/false; if it returns true, the adaptive breaker is used.
|
|
37
|
+
def initialize(name:, classic_circuit_breaker:, adaptive_circuit_breaker:)
|
|
38
|
+
initialize_behaviour(name: name)
|
|
39
|
+
|
|
40
|
+
@classic_circuit_breaker = classic_circuit_breaker
|
|
41
|
+
@adaptive_circuit_breaker = adaptive_circuit_breaker
|
|
42
|
+
|
|
43
|
+
@classic_circuit_breaker.sibling = @adaptive_circuit_breaker
|
|
44
|
+
@adaptive_circuit_breaker.sibling = @classic_circuit_breaker
|
|
45
|
+
|
|
46
|
+
@active_circuit_breaker = @classic_circuit_breaker
|
|
47
|
+
end
|
|
48
|
+
|
|
49
|
+
def self.adaptive_circuit_breaker_selector(selector) # rubocop:disable Style/ClassMethodsDefinitions
|
|
50
|
+
@@adaptive_circuit_breaker_selector = selector # rubocop:disable Style/ClassVars
|
|
51
|
+
end
|
|
52
|
+
|
|
53
|
+
def active_breaker_type
|
|
54
|
+
@active_circuit_breaker.is_a?(Semian::AdaptiveCircuitBreaker) ? :adaptive : :classic
|
|
55
|
+
end
|
|
56
|
+
|
|
57
|
+
def acquire(resource = nil, scope: nil, adapter: nil, &block)
|
|
58
|
+
# NOTE: This assignment is not thread-safe, but this is acceptable for now:
|
|
59
|
+
# - Each request gets its own decision based on the selector at that moment
|
|
60
|
+
# - The worst case is a brief inconsistency where a thread reads a stale value,
|
|
61
|
+
# which just means it uses the previous circuit breaker type for that one request
|
|
62
|
+
old_type = active_breaker_type
|
|
63
|
+
@active_circuit_breaker = get_active_circuit_breaker(resource)
|
|
64
|
+
if old_type != active_breaker_type
|
|
65
|
+
Semian.notify(:circuit_breaker_mode_change, self, nil, nil, old_mode: old_type, new_mode: active_breaker_type)
|
|
66
|
+
end
|
|
67
|
+
|
|
68
|
+
@active_circuit_breaker.acquire(resource, scope:, adapter:, &block)
|
|
69
|
+
end
|
|
70
|
+
|
|
71
|
+
def open?
|
|
72
|
+
@active_circuit_breaker.open?
|
|
73
|
+
end
|
|
74
|
+
|
|
75
|
+
def closed?
|
|
76
|
+
@active_circuit_breaker.closed?
|
|
77
|
+
end
|
|
78
|
+
|
|
79
|
+
def half_open?
|
|
80
|
+
@active_circuit_breaker.half_open?
|
|
81
|
+
end
|
|
82
|
+
|
|
83
|
+
def request_allowed?
|
|
84
|
+
@active_circuit_breaker.request_allowed?
|
|
85
|
+
end
|
|
86
|
+
|
|
87
|
+
def mark_failed(error, scope: nil, adapter: nil)
|
|
88
|
+
@active_circuit_breaker&.mark_failed(error, scope: nil, adapter: nil)
|
|
89
|
+
end
|
|
90
|
+
|
|
91
|
+
def mark_success(scope: nil, adapter: nil)
|
|
92
|
+
@active_circuit_breaker&.mark_success(scope: nil, adapter: nil)
|
|
93
|
+
end
|
|
94
|
+
|
|
95
|
+
def stop
|
|
96
|
+
@adaptive_circuit_breaker&.stop
|
|
97
|
+
end
|
|
98
|
+
|
|
99
|
+
def reset(scope: nil, adapter: nil)
|
|
100
|
+
@classic_circuit_breaker&.reset(scope:, adapter:)
|
|
101
|
+
@adaptive_circuit_breaker&.reset(scope:, adapter:)
|
|
102
|
+
end
|
|
103
|
+
|
|
104
|
+
def destroy
|
|
105
|
+
@classic_circuit_breaker&.destroy
|
|
106
|
+
@adaptive_circuit_breaker&.destroy
|
|
107
|
+
end
|
|
108
|
+
|
|
109
|
+
def in_use?
|
|
110
|
+
@classic_circuit_breaker&.in_use? || @adaptive_circuit_breaker&.in_use?
|
|
111
|
+
end
|
|
112
|
+
|
|
113
|
+
def last_error
|
|
114
|
+
@active_circuit_breaker.last_error
|
|
115
|
+
end
|
|
116
|
+
|
|
117
|
+
def metrics
|
|
118
|
+
{
|
|
119
|
+
active: active_breaker_type,
|
|
120
|
+
classic: classic_metrics,
|
|
121
|
+
adaptive: adaptive_metrics,
|
|
122
|
+
}
|
|
123
|
+
end
|
|
124
|
+
|
|
125
|
+
private
|
|
126
|
+
|
|
127
|
+
def classic_metrics
|
|
128
|
+
return {} unless @classic_circuit_breaker
|
|
129
|
+
|
|
130
|
+
{
|
|
131
|
+
state: @classic_circuit_breaker.state&.value,
|
|
132
|
+
open: @classic_circuit_breaker.open?,
|
|
133
|
+
closed: @classic_circuit_breaker.closed?,
|
|
134
|
+
half_open: @classic_circuit_breaker.half_open?,
|
|
135
|
+
}
|
|
136
|
+
end
|
|
137
|
+
|
|
138
|
+
def adaptive_metrics
|
|
139
|
+
return {} unless @adaptive_circuit_breaker
|
|
140
|
+
|
|
141
|
+
@adaptive_circuit_breaker.metrics.merge(
|
|
142
|
+
open: @adaptive_circuit_breaker.open?,
|
|
143
|
+
closed: @adaptive_circuit_breaker.closed?,
|
|
144
|
+
half_open: @adaptive_circuit_breaker.half_open?,
|
|
145
|
+
)
|
|
146
|
+
end
|
|
147
|
+
|
|
148
|
+
def get_active_circuit_breaker(resource)
|
|
149
|
+
if use_adaptive?(resource)
|
|
150
|
+
@adaptive_circuit_breaker
|
|
151
|
+
else
|
|
152
|
+
@classic_circuit_breaker
|
|
153
|
+
end
|
|
154
|
+
end
|
|
155
|
+
|
|
156
|
+
def use_adaptive?(resource = nil)
|
|
157
|
+
return false unless defined?(@@adaptive_circuit_breaker_selector)
|
|
158
|
+
|
|
159
|
+
@@adaptive_circuit_breaker_selector.call(resource)
|
|
160
|
+
rescue => e
|
|
161
|
+
Semian.logger&.warn("[#{@name}] use_adaptive check failed: #{e.message}. Defaulting to classic circuit breaker.")
|
|
162
|
+
false
|
|
163
|
+
end
|
|
164
|
+
end
|
|
165
|
+
end
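
The `SiblingSync` module relies on `Method#super_method` to update the sibling through its parent class's implementation, which skips the sibling's own override and so avoids the two breakers calling each other forever. A toy reproduction of that technique, using hypothetical classes (`Counter`, `SiblingSyncSketch`, `ChildA`, `ChildB`) in place of the real breakers:

```ruby
# Hypothetical parent class standing in for a circuit breaker.
class Counter
  attr_reader :successes

  def initialize
    @successes = 0
  end

  def mark_success
    @successes += 1
  end
end

module SiblingSyncSketch
  attr_writer :sibling

  def mark_success
    super # run the parent implementation on self
    # Run the sibling's parent implementation directly, bypassing this
    # override on the sibling so the call does not bounce back here.
    @sibling.method(:mark_success).super_method.call
  end
end

class ChildA < Counter
  include SiblingSyncSketch
end

class ChildB < Counter
  include SiblingSyncSketch
end

a = ChildA.new
b = ChildB.new
a.sibling = b
b.sibling = a

a.mark_success # updates both a and b exactly once
```

Since the module sits between the child class and `Counter` in the ancestor chain, `method(:mark_success)` resolves to the module's version and `super_method` to the one defined in `Counter`.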
|
data/lib/semian/mysql2.rb
CHANGED
|
@@ -126,11 +126,11 @@ module Semian
|
|
|
126
126
|
acquire_semian_resource(adapter: :mysql, scope: :connection) { raw_connect(*args) }
|
|
127
127
|
end
|
|
128
128
|
|
|
129
|
-
def acquire_semian_resource(**)
|
|
129
|
+
def acquire_semian_resource(adapter: nil, scope: nil, **)
|
|
130
130
|
super
|
|
131
131
|
rescue ::Mysql2::Error => error
|
|
132
132
|
if error.is_a?(PingFailure) || (!error.is_a?(::Mysql2::SemianError) && error.message.match?(CONNECTION_ERROR))
|
|
133
|
-
semian_resource.mark_failed(error)
|
|
133
|
+
semian_resource.mark_failed(error, scope: scope, adapter: adapter)
|
|
134
134
|
error.semian_identifier = semian_identifier
|
|
135
135
|
end
|
|
136
136
|
raise
|
data/lib/semian/net_http.rb
CHANGED
|
@@ -106,7 +106,7 @@ module Semian
|
|
|
106
106
|
return super if disabled?
|
|
107
107
|
|
|
108
108
|
acquire_semian_resource(adapter: :http, scope: :query) do
|
|
109
|
-
handle_error_responses(super)
|
|
109
|
+
handle_error_responses(super, adapter: :http, scope: :query)
|
|
110
110
|
end
|
|
111
111
|
end
|
|
112
112
|
end
|
|
@@ -126,9 +126,9 @@ module Semian
|
|
|
126
126
|
|
|
127
127
|
private
|
|
128
128
|
|
|
129
|
-
def handle_error_responses(result)
|
|
129
|
+
def handle_error_responses(result, scope:, adapter:)
|
|
130
130
|
if raw_semian_options.fetch(:open_circuit_server_errors, false)
|
|
131
|
-
semian_resource.mark_failed(result) if result.is_a?(::Net::HTTPServerError)
|
|
131
|
+
semian_resource.mark_failed(result, scope: scope, adapter: adapter) if result.is_a?(::Net::HTTPServerError)
|
|
132
132
|
end
|
|
133
133
|
result
|
|
134
134
|
end
|
data/lib/semian/pid_controller.rb
ADDED
|
@@ -0,0 +1,217 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require "thread"
|
|
4
|
+
require_relative "simple_exponential_smoother"
|
|
5
|
+
|
|
6
|
+
module Semian
|
|
7
|
+
module Simple
|
|
8
|
+
# PID Controller for adaptive circuit breaking
|
|
9
|
+
# Based on the error function:
|
|
10
|
+
# P = (error_rate - ideal_error_rate) - (1 - (error_rate - ideal_error_rate)) * rejection_rate
|
|
11
|
+
# Note: P increases when error_rate increases
|
|
12
|
+
# P decreases when rejection_rate increases (providing feedback)
|
|
13
|
+
class PIDController
|
|
14
|
+
attr_reader :rejection_rate
|
|
15
|
+
|
|
16
|
+
def initialize(kp:, ki:, kd:, window_size:, sliding_interval:, implementation:, initial_error_rate:,
|
|
17
|
+
dead_zone_ratio:, ideal_error_rate_estimator_cap_value:, integral_upper_cap:, integral_lower_cap:)
|
|
18
|
+
@kp = kp
|
|
19
|
+
@ki = ki
|
|
20
|
+
@kd = kd
|
|
21
|
+
@dead_zone_ratio = dead_zone_ratio
|
|
22
|
+
@integral_upper_cap = integral_upper_cap
|
|
23
|
+
@integral_lower_cap = integral_lower_cap
|
|
24
|
+
|
|
25
|
+
@rejection_rate = 0.0
|
|
26
|
+
@integral = 0.0
|
|
27
|
+
@derivative = 0.0
|
|
28
|
+
@previous_p_value = 0.0
|
|
29
|
+
@last_ideal_error_rate = initial_error_rate
|
|
30
|
+
|
|
31
|
+
@window_size = window_size
|
|
32
|
+
@sliding_interval = sliding_interval
|
|
33
|
+
@smoother = SimpleExponentialSmoother.new(
+          cap_value: ideal_error_rate_estimator_cap_value,
+          initial_value: initial_error_rate,
+          observations_per_minute: 60 / sliding_interval,
+        )
+
+        @errors = implementation::SlidingWindow.new(max_size: 200 * window_size)
+        @successes = implementation::SlidingWindow.new(max_size: 200 * window_size)
+        @rejections = implementation::SlidingWindow.new(max_size: 200 * window_size)
+
+        @last_error_rate = 0.0
+        @last_p_value = 0.0
+      end
+
+      def record_request(outcome)
+        case outcome
+        when :error
+          @errors.push(current_time)
+        when :success
+          @successes.push(current_time)
+        when :rejected
+          @rejections.push(current_time)
+        end
+      end
+
+      def update
+        # Store the last window's P value so that we can serve it up in the metrics snapshots
+        @previous_p_value = @last_p_value
+
+        @last_error_rate = calculate_error_rate
+
+        store_error_rate(@last_error_rate)
+
+        dt = @sliding_interval
+
+        @last_p_value = calculate_p_value(@last_error_rate)
+
+        proportional = @kp * @last_p_value
+        @integral += @last_p_value * dt
+        integral = @ki * @integral
+        @derivative = @kd * (@last_p_value - @previous_p_value) / dt
+
+        # Calculate the control signal (change in rejection rate)
+        control_signal = proportional + integral + @derivative
+
+        # Calculate what the new rejection rate would be
+        new_rejection_rate = @rejection_rate + control_signal
+
+        # Update rejection rate (clamped between 0 and 1)
+        @rejection_rate = new_rejection_rate.clamp(0.0, 1.0)
+
+        @integral = @integral.clamp(@integral_lower_cap, @integral_upper_cap)
+
+        @rejection_rate
+      end
+
+      # Should we reject this request based on current rejection rate?
+      def should_reject?
+        rand < @rejection_rate
+      end
+
+      # Reset the controller state
+      def reset
+        @rejection_rate = 0.0
+        @integral = 0.0
+        @previous_p_value = 0.0
+        @derivative = 0.0
+        @last_p_value = 0.0
+        @errors.clear
+        @successes.clear
+        @rejections.clear
+        @last_error_rate = 0.0
+        @smoother.reset
+        @last_ideal_error_rate = @smoother.forecast
+      end
+
+      # Get current metrics for monitoring/debugging
+      def metrics(full: true)
+        result = {
+          rejection_rate: @rejection_rate,
+          error_rate: @last_error_rate,
+          ideal_error_rate: @last_ideal_error_rate,
+          dead_zone_ratio: @dead_zone_ratio,
+          p_value: @last_p_value,
+          previous_p_value: @previous_p_value,
+          integral: @integral,
+          derivative: @derivative,
+        }
+
+        if full
+          result[:smoother_state] = @smoother.state
+          result[:current_window_requests] = {
+            success: @successes.size,
+            error: @errors.size,
+            rejected: @rejections.size,
+          }
+        end
+
+        result
+      end
+
+      private
+
+      # Calculate the current P value with dead-zone noise suppression.
+      # The dead zone prevents the controller from reacting to small, noisy
+      # deviations from the ideal error rate. Only deviations exceeding
+      # ideal_error_rate * dead_zone_ratio trigger a response.
+      def calculate_p_value(current_error_rate)
+        @last_ideal_error_rate = calculate_ideal_error_rate
+
+        raw_delta = current_error_rate - @last_ideal_error_rate
+        dead_zone = @last_ideal_error_rate * @dead_zone_ratio
+
+        delta_error = if raw_delta <= 0
+          # Below or at ideal: pass through for recovery
+          raw_delta
+        elsif raw_delta <= dead_zone
+          # Within dead zone: suppress noise
+          0.0
+        else
+          # Above dead zone: full signal, dead zone only silences noise
+          raw_delta
+        end
+
+        delta_error - (1 - delta_error) * @rejection_rate
+      end
+
+      def calculate_error_rate
+        # Clean up old observations
+        current_timestamp = current_time
+        cutoff_time = current_timestamp - @window_size
+        @errors.reject! { |timestamp| timestamp < cutoff_time }
+        @successes.reject! { |timestamp| timestamp < cutoff_time }
+        @rejections.reject! { |timestamp| timestamp < cutoff_time }
+
+        total_requests = @successes.size + @errors.size
+        return 0.0 if total_requests == 0
+
+        @errors.size.to_f / total_requests
+      end
+
+      def store_error_rate(error_rate)
+        @smoother.add_observation(error_rate)
+      end
+
+      def calculate_ideal_error_rate
+        @smoother.forecast
+      end
+
+      def current_time
+        Process.clock_gettime(Process::CLOCK_MONOTONIC)
+      end
+    end
+  end
+
+  module ThreadSafe
+    # Thread-safe version of PIDController
+    class PIDController < Simple::PIDController
+      def initialize(**kwargs)
+        super(**kwargs)
+        @lock = Mutex.new
+      end
+
+      def record_request(outcome)
+        @lock.synchronize { super }
+      end
+
+      def update
+        @lock.synchronize { super }
+      end
+
+      def should_reject?
+        @lock.synchronize { super }
+      end
+
+      def reset
+        @lock.synchronize { super }
+      end
+
+      # NOTE: metrics, calculate_error_rate are not overridden
+      # to avoid deadlock. calculate_error_rate is private method
+      # only called internally from update (synchronized) and metrics (not synchronized).
+    end
+  end
+end
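The `update` and `calculate_p_value` methods above can be condensed into a single pure function for experimentation. The following is a minimal sketch, not the gem's API: `PidState`, `p_value`, and `pid_step` are hypothetical names, the ideal error rate is passed in directly instead of being forecast by the smoother, and a single symmetric `integral_cap` stands in for the separate upper and lower caps. The default gains mirror the defaults that appear in `create_adaptive_circuit_breaker` elsewhere in this diff.

```ruby
# Hypothetical, simplified restatement of the update step in the diff above.
PidState = Struct.new(:rejection_rate, :integral, :previous_p, keyword_init: true)

# P value with dead-zone suppression: positive deviations no larger than
# ideal_error_rate * dead_zone_ratio are treated as noise and zeroed.
def p_value(error_rate, ideal_error_rate, dead_zone_ratio, rejection_rate)
  raw_delta = error_rate - ideal_error_rate
  dead_zone = ideal_error_rate * dead_zone_ratio
  delta = (raw_delta > 0 && raw_delta <= dead_zone) ? 0.0 : raw_delta
  delta - (1 - delta) * rejection_rate
end

# One controller tick. Returns a new state; the input state is not mutated.
def pid_step(state, error_rate:, ideal_error_rate:, kp: 1.0, ki: 0.2, kd: 0.0,
             dt: 1.0, dead_zone_ratio: 0.25, integral_cap: 10.0)
  p_val = p_value(error_rate, ideal_error_rate, dead_zone_ratio, state.rejection_rate)
  integral = state.integral + p_val * dt
  derivative = kd * (p_val - state.previous_p) / dt
  control = kp * p_val + ki * integral + derivative

  PidState.new(
    rejection_rate: (state.rejection_rate + control).clamp(0.0, 1.0),
    # As in the diff, the integral is clamped after it has been used.
    integral: integral.clamp(-integral_cap, integral_cap),
    previous_p: p_val,
  )
end

start = PidState.new(rejection_rate: 0.0, integral: 0.0, previous_p: 0.0)
during_outage = pid_step(start, error_rate: 0.5, ideal_error_rate: 0.05)
after_recovery = pid_step(during_outage, error_rate: 0.0, ideal_error_rate: 0.05)
```

The `(1 - delta) * rejection_rate` term acts as a restoring force: once the observed error rate falls back to the baseline, a non-zero rejection rate by itself produces a negative P value and pulls the rejection rate back down.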
@@ -0,0 +1,72 @@
+# frozen_string_literal: true
+
+require "singleton"
+require_relative "pid_controller"
+
+module Semian
+  class PIDControllerThread
+    include Singleton
+
+    def initialize
+      @stopped = true
+      @update_thread = nil
+      @circuit_breakers = Concurrent::Map.new
+      @sliding_interval = ENV.fetch("SEMIAN_ADAPTIVE_CIRCUIT_BREAKER_SLIDING_INTERVAL", 1).to_i
+    end
+
+    # As per the singleton pattern, this is called only once
+    def start
+      @stopped = false
+
+      update_proc = proc do
+        loop do
+          break if @stopped
+
+          wait_for_window
+
+          # Update PID controller state for each registered circuit breaker
+          @circuit_breakers.each do |_, circuit_breaker|
+            circuit_breaker.pid_controller_update
+          end
+        rescue => e
+          Semian.logger&.warn("[#{@name}] PID controller update thread error: #{e.message}")
+        end
+      end
+
+      @update_thread = Thread.new(&update_proc)
+    end
+
+    def stop
+      @stopped = true
+      @update_thread&.kill
+      @update_thread = nil
+    end
+
+    def register_resource(circuit_breaker)
+      # Track every registered circuit breaker in a Concurrent::Map
+
+      # Start the thread if it's not already running
+      if @circuit_breakers.empty? && @stopped
+        start
+      end
+
+      # Add the circuit breaker to the map
+      @circuit_breakers[circuit_breaker.name] = circuit_breaker
+      self
+    end
+
+    def unregister_resource(circuit_breaker)
+      # Remove the circuit breaker from the map
+      @circuit_breakers.delete(circuit_breaker.name)
+
+      # Stop the thread if there are no more circuit breakers
+      if @circuit_breakers.empty?
+        stop
+      end
+    end
+
+    def wait_for_window
+      Kernel.sleep(@sliding_interval)
+    end
+  end
+end
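The register/unregister lifecycle of `PIDControllerThread` above (start the background thread with the first registration, stop it when the last target is removed) can be sketched in isolation. This is an illustration under stated assumptions, not the gem's code: `TickerSketch` is a hypothetical name, a plain `Mutex` and `Hash` replace `Concurrent::Map`, and targets are any objects responding to `call`.

```ruby
# Hypothetical sketch of a shared ticker thread with lazy start and stop.
class TickerSketch
  def initialize(interval)
    @interval = interval
    @targets = {}
    @lock = Mutex.new
    @thread = nil
  end

  # Adds a target and starts the thread if it is not already running.
  def register(name, target)
    @lock.synchronize do
      @targets[name] = target
      @thread ||= Thread.new { run }
    end
    self
  end

  # Removes a target and stops the thread once no targets remain.
  def unregister(name)
    thread = @lock.synchronize do
      @targets.delete(name)
      next unless @targets.empty?

      stale = @thread
      @thread = nil
      stale
    end
    thread&.kill
  end

  def running?
    @lock.synchronize { !@thread.nil? }
  end

  private

  def run
    loop do
      sleep(@interval)
      # Iterate over a snapshot so that targets run outside the lock.
      snapshot = @lock.synchronize { @targets.values }
      snapshot.each do |target|
        target.call
      rescue => e
        # One failing target must not stop updates for the others.
        warn("ticker target failed: #{e.message}")
      end
    end
  end
end
```

Rescuing per target is a design choice of this sketch; the diff above rescues once per loop iteration, so an exception in one circuit breaker skips the remaining updates for that tick.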
@@ -0,0 +1,137 @@
+# frozen_string_literal: true
+
+module Semian
+  # SimpleExponentialSmoother implements Simple Exponential Smoothing (SES) for forecasting
+  # a stable baseline error rate in adaptive circuit breakers.
+  #
+  # SES focuses on the level component only (no trend or seasonality), using the formula:
+  #   smoothed = alpha * value + (1 - alpha) * previous_smoothed
+  #
+  # Key characteristics:
+  # - Drops extreme values above cap to prevent outliers from distorting the forecast
+  # - Runs in two periods: low confidence (first 30 minutes) and high confidence (after 30 minutes)
+  # - During the low confidence period, we converge faster towards observed value than during the high confidence period
+  # - The choice of alphas follows the following criteria:
+  #   - During low confidence:
+  #     - If we are observing 2x our current estimate, we need to converge towards it in 30 minutes
+  #     - If we are observing 0.5x our current estimate, we need to converge towards it in 5 minutes
+  #   - During high confidence:
+  #     - If we are observing 2x our current estimate, we need to converge towards it in 1 hour
+  #     - If we are observing 0.5x our current estimate, we need to converge towards it in 10 minutes
+  # The following code snippet can be used to calculate the alphas:
+  # def find_alpha(name, start_point, multiplier, convergence_duration)
+  #   target = start_point * multiplier
+  #   desired_distance = 0.003
+  #   alpha_ceil = 0.5
+  #   alpha_floor = 0.0
+  #   alpha = 0.25
+  #   while true
+  #     smoothed_value = start_point
+  #     step_size = convergence_duration / 10
+  #     converged_too_fast = false
+  #     10.times do |step|
+  #       step_size.times do
+  #         smoothed_value = alpha * target + (1 - alpha) * smoothed_value
+  #       end
+  #       if step < 9 and (smoothed_value - target).abs < desired_distance
+  #         converged_too_fast = true
+  #       end
+  #     end
+  #
+  #     if converged_too_fast
+  #       alpha_ceil = alpha
+  #       alpha = (alpha + alpha_floor) / 2
+  #       next
+  #     end
+  #
+  #     if (smoothed_value - target).abs > desired_distance
+  #       alpha_floor = alpha
+  #       alpha = (alpha + alpha_ceil) / 2
+  #       next
+  #     end
+  #
+  #     break
+  #   end
+  #
+  #   print "#{name} is #{alpha}\n"
+  # end
+  #
+  # initial_error_rate = 0.05
+  #
+  # find_alpha("low confidence upward convergence alpha", initial_error_rate, 2, 1800)
+  # find_alpha("low confidence downward convergence alpha", initial_error_rate, 0.5, 300)
+  # find_alpha("high confidence upward convergence alpha", initial_error_rate, 2, 3600)
+  # find_alpha("high confidence downward convergence alpha", initial_error_rate, 0.5, 600)
+  class SimpleExponentialSmoother
+    LOW_CONFIDENCE_ALPHA_UP = 0.0017
+    LOW_CONFIDENCE_ALPHA_DOWN = 0.078
+    HIGH_CONFIDENCE_ALPHA_UP = 0.0009
+    HIGH_CONFIDENCE_ALPHA_DOWN = 0.039
+    LOW_CONFIDENCE_THRESHOLD_MINUTES = 30
+
+    # Validate all alpha constants at class load time
+    [
+      LOW_CONFIDENCE_ALPHA_UP,
+      LOW_CONFIDENCE_ALPHA_DOWN,
+      HIGH_CONFIDENCE_ALPHA_UP,
+      HIGH_CONFIDENCE_ALPHA_DOWN,
+    ].each do |alpha|
+      if alpha <= 0 || alpha >= 0.5
+        raise ArgumentError, "alpha constant must be in range (0, 0.5), got: #{alpha}"
+      end
+    end
+
+    attr_reader :alpha, :cap_value, :initial_value, :smoothed_value, :observations_per_minute
+
+    def initialize(cap_value:, initial_value:, observations_per_minute:)
+      @alpha = LOW_CONFIDENCE_ALPHA_DOWN # Start with low confidence, converging down
+      @cap_value = cap_value
+      @initial_value = initial_value
+      @observations_per_minute = observations_per_minute
+      @smoothed_value = initial_value
+      @observation_count = 0
+    end
+
+    def add_observation(value)
+      raise ArgumentError, "value must be non-negative, got: #{value}" if value < 0
+
+      return @smoothed_value if value > cap_value
+
+      @observation_count += 1
+
+      low_confidence = @observation_count < (@observations_per_minute * LOW_CONFIDENCE_THRESHOLD_MINUTES)
+      converging_up = value > @smoothed_value
+
+      @alpha = if low_confidence
+        converging_up ? LOW_CONFIDENCE_ALPHA_UP : LOW_CONFIDENCE_ALPHA_DOWN
+      else
+        converging_up ? HIGH_CONFIDENCE_ALPHA_UP : HIGH_CONFIDENCE_ALPHA_DOWN
+      end
+
+      @smoothed_value = (@alpha * value) + ((1.0 - @alpha) * @smoothed_value)
+      @smoothed_value
+    end
+
+    def forecast
+      @smoothed_value
+    end
+
+    def state
+      {
+        smoothed_value: @smoothed_value,
+        alpha: @alpha,
+        cap_value: @cap_value,
+        initial_value: @initial_value,
+        observations_per_minute: @observations_per_minute,
+        observation_count: @observation_count,
+      }
+    end
+
+    def reset
+      @smoothed_value = initial_value
+      @observation_count = 0
+      @alpha = LOW_CONFIDENCE_ALPHA_DOWN
+      self
+    end
+  end
+end
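The effect of using a much smaller alpha for upward moves than for downward moves in `SimpleExponentialSmoother` above can be seen with a single-step helper. This is a minimal sketch under stated assumptions: `smooth` is a hypothetical function, and the switch between low-confidence and high-confidence alphas is omitted. The alpha values are the low-confidence constants from the diff.

```ruby
# Hypothetical helper: one smoothing step with a direction-dependent alpha.
def smooth(previous, value, alpha_up:, alpha_down:, cap:)
  # Observations above the cap are dropped, leaving the estimate unchanged.
  return previous if value > cap

  alpha = value > previous ? alpha_up : alpha_down
  (alpha * value) + ((1.0 - alpha) * previous)
end

# Feed 60 identical observations (one minute at one observation per second).
def run_minute(start, value)
  60.times.reduce(start) do |estimate, _|
    smooth(estimate, value, alpha_up: 0.0017, alpha_down: 0.078, cap: 0.1)
  end
end

after_spike = run_minute(0.05, 0.10)  # error rate doubles for a minute
after_dip = run_minute(0.05, 0.025)   # error rate halves for a minute
```

With these constants the baseline barely moves during a one-minute spike but almost fully follows a one-minute dip, so a short incident does not inflate the baseline that the controller treats as the ideal error rate.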
@@ -35,7 +35,7 @@ module Semian
       0
     end
 
-    def reset
+    def reset(**)
     end
 
     def open?
@@ -54,10 +54,10 @@ module Semian
       true
     end
 
-    def mark_failed(_error)
+    def mark_failed(_error, **)
     end
 
-    def mark_success
+    def mark_success(**)
     end
 
     def bulkhead
data/lib/semian/version.rb
CHANGED
data/lib/semian.rb
CHANGED
@@ -11,6 +11,8 @@ require "semian/instrumentable"
 require "semian/platform"
 require "semian/resource"
 require "semian/circuit_breaker"
+require "semian/adaptive_circuit_breaker"
+require "semian/dual_circuit_breaker"
 require "semian/protected_resource"
 require "semian/unprotected_resource"
 require "semian/simple_sliding_window"
@@ -197,7 +199,7 @@ module Semian
   # +exceptions+: An array of exception classes that should be accounted as resource errors. Default [].
   # (circuit breaker)
   #
-  # +exponential_backoff_error_timeout+: When set to true, instead of opening the circuit for the full
+  # # +exponential_backoff_error_timeout+: When set to true, instead of opening the circuit for the full
   # error_timeout duration, it starts with a smaller timeout and increases exponentially on each subsequent
   # opening up to error_timeout. This helps avoid over-opening the circuit for temporary issues.
   # Default false. (circuit breaker)
@@ -209,6 +211,20 @@ module Semian
   # when exponential backoff is enabled. Only valid when exponential_backoff_error_timeout is true.
   # Default 2. (circuit breaker)
   #
+  # +adaptive_circuit_breaker+: Enable adaptive circuit breaker using PID controller. Default false.
+  # When enabled, this replaces the traditional circuit breaker with an adaptive version
+  # that dynamically adjusts rejection rates based on service health. (adaptive circuit breaker)
+  #
+  # +dual_circuit_breaker+: Enable dual circuit breaker mode where both legacy and adaptive
+  # circuit breakers are initialized. Default false. When enabled, both circuit breakers track
+  # requests, but only one is used for decision-making based on use_adaptive.
+  # (dual circuit breaker)
+  #
+  # +use_adaptive+: A callable (Proc/lambda) that returns true to use adaptive circuit breaker
+  # or false to use legacy. Only used when dual_circuit_breaker is enabled. Default: ->() { false }.
+  # Example: ->() { MyFeatureFlag.enabled?(:adaptive_circuit_breaker) }
+  # (dual circuit breaker)
+  #
   # Returns the registered resource.
   def register(name, **options)
     return UnprotectedResource.new(name) if ENV.key?("SEMIAN_DISABLED")
@@ -216,7 +232,14 @@ module Semian
     # Validate configuration before proceeding
     ConfigurationValidator.new(name, options).validate!
 
-    circuit_breaker =
+    circuit_breaker = if options[:dual_circuit_breaker]
+      create_dual_circuit_breaker(name, **options)
+    elsif options[:adaptive_circuit_breaker]
+      create_adaptive_circuit_breaker(name, **options)
+    else
+      create_circuit_breaker(name, **options)
+    end
+
     bulkhead = create_bulkhead(name, **options)
 
     resources[name] = ProtectedResource.new(name, bulkhead, circuit_breaker)
@@ -312,12 +335,49 @@ module Semian
 
   private
 
-  def
+  def create_dual_circuit_breaker(name, **options)
+    return if ENV.key?("SEMIAN_CIRCUIT_BREAKER_DISABLED")
+
+    classic_cb = create_circuit_breaker(name, is_child: true, **options)
+    adaptive_cb = create_adaptive_circuit_breaker(name, is_child: true, **options)
+
+    DualCircuitBreaker.new(
+      name: name,
+      classic_circuit_breaker: classic_cb,
+      adaptive_circuit_breaker: adaptive_cb,
+    )
+  end
+
+  def create_adaptive_circuit_breaker(name, is_child: false, **options)
+    return if ENV.key?("SEMIAN_CIRCUIT_BREAKER_DISABLED")
+
+    exceptions = options[:exceptions] || []
+    cls = is_child ? DualCircuitBreaker::ChildAdaptiveCircuitBreaker : AdaptiveCircuitBreaker
+    cls.new(
+      name: name,
+      exceptions: Array(exceptions) + [::Semian::BaseError],
+      kp: options[:kp] || 1.0,
+      ki: options[:ki] || 0.2,
+      kd: options[:kd] || 0.0,
+      window_size: options[:window_size] || 10,
+      initial_error_rate: options[:initial_error_rate] || 0.05,
+      dead_zone_ratio: options[:dead_zone_ratio] || 0.25,
+      # We use an environment vraiable for the sliding interval because it is shared among all circuit breakers
+      sliding_interval: ENV.fetch("SEMIAN_ADAPTIVE_CIRCUIT_BREAKER_SLIDING_INTERVAL", 1).to_i,
+      ideal_error_rate_estimator_cap_value: options[:ideal_error_rate_estimator_cap_value] || 0.1,
+      integral_upper_cap: options[:integral_upper_cap] || 10.0,
+      integral_lower_cap: options[:integral_lower_cap] || -10.0,
+      implementation: implementation(**options),
+    )
+  end
+
+  def create_circuit_breaker(name, is_child: false, **options)
     return if ENV.key?("SEMIAN_CIRCUIT_BREAKER_DISABLED")
     return unless options.fetch(:circuit_breaker, true)
 
     exceptions = options[:exceptions] || []
-    CircuitBreaker
+    cls = is_child ? DualCircuitBreaker::ChildClassicCircuitBreaker : CircuitBreaker
+    cls.new(
       name,
       success_threshold: options[:success_threshold],
       error_threshold: options[:error_threshold],
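The `dual_circuit_breaker` and `use_adaptive` comments in the `register` documentation above describe a wrapper in which both breakers record every outcome while a callable decides which one makes the accept/reject decision. A minimal sketch of that idea follows. It assumes nothing about the real `DualCircuitBreaker` class beyond what those comments state: `FakeBreaker`, `DualSketch`, `allow?`, and `record` are hypothetical names introduced for illustration.

```ruby
# Hypothetical stand-in for a breaker: remembers outcomes, answers allow?.
class FakeBreaker
  attr_reader :outcomes

  def initialize(allow)
    @allow = allow
    @outcomes = []
  end

  def allow?
    @allow
  end

  def record(outcome)
    @outcomes << outcome
  end
end

# Hypothetical wrapper: both children observe traffic, one of them decides.
class DualSketch
  def initialize(classic:, adaptive:, use_adaptive: -> { false })
    @classic = classic
    @adaptive = adaptive
    @use_adaptive = use_adaptive
  end

  # The callable is evaluated on every request, so a feature flag can
  # switch the deciding breaker at runtime.
  def allow?
    active.allow?
  end

  # Outcomes go to both children so that the inactive one stays warm.
  def record(outcome)
    [@classic, @adaptive].each { |breaker| breaker.record(outcome) }
  end

  private

  def active
    @use_adaptive.call ? @adaptive : @classic
  end
end
```

Keeping the inactive breaker fed with outcomes means that flipping the flag does not hand decisions to a breaker with empty state.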
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: semian
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.28.0
 platform: ruby
 authors:
 - Scott Francis
@@ -51,13 +51,18 @@ files:
 - lib/semian/activerecord_postgresql_adapter.rb
 - lib/semian/activerecord_trilogy_adapter.rb
 - lib/semian/adapter.rb
+- lib/semian/adaptive_circuit_breaker.rb
 - lib/semian/circuit_breaker.rb
+- lib/semian/circuit_breaker_behaviour.rb
 - lib/semian/configuration_validator.rb
+- lib/semian/dual_circuit_breaker.rb
 - lib/semian/grpc.rb
 - lib/semian/instrumentable.rb
 - lib/semian/lru_hash.rb
 - lib/semian/mysql2.rb
 - lib/semian/net_http.rb
+- lib/semian/pid_controller.rb
+- lib/semian/pid_controller_thread.rb
 - lib/semian/platform.rb
 - lib/semian/protected_resource.rb
 - lib/semian/rails.rb
@@ -65,6 +70,7 @@ files:
 - lib/semian/redis/v5.rb
 - lib/semian/redis_client.rb
 - lib/semian/resource.rb
+- lib/semian/simple_exponential_smoother.rb
 - lib/semian/simple_integer.rb
 - lib/semian/simple_sliding_window.rb
 - lib/semian/simple_state.rb
@@ -94,7 +100,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-rubygems_version: 4.0.
+rubygems_version: 4.0.8
 specification_version: 4
 summary: Bulkheading for Ruby with SysV semaphores
 test_files: []