semian 0.23.0 → 0.24.0
- checksums.yaml +4 -4
- data/README.md +72 -66
- data/lib/semian/circuit_breaker.rb +12 -7
- data/lib/semian/version.rb +1 -1
- data/lib/semian.rb +5 -0
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: aaba2c88b92196d855605e58c4525a4a8b7c2bb0a4d0b865cab244e059750bb1
+  data.tar.gz: cbd8a6b46fb102b15e9ed3b3d212502013687bcf1e857d924d453086527848c7
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 490119185621f843c028549cf213d806c2306bfaf0bb10ecf5a4de0134cd470dcde5c849b3119f3f1162fcde0b354045483ce929f3be180157abee76acfab350
+  data.tar.gz: 7afc74eaeb93dfd3f609801a44a4c8267134660172a1614de59747f0efe50371a8e817d40d4f4e97668853fbb30029fb8745da0f9c211f40d5906291bf56d86b
data/README.md
CHANGED
@@ -15,15 +15,15 @@ allowing you to handle errors gracefully.** Semian does this by intercepting
 resource access through heuristic patterns inspired by [Hystrix][hystrix] and
 [Release It][release-it]:
 
-
+- [**Circuit breaker**](#circuit-breaker). A pattern for limiting the
 amount of requests to a dependency that is having issues.
-
+- [**Bulkheading**](#bulkheading). Controlling the concurrent access to
 a single resource, access is coordinated server-wide with [SysV
 semaphores][sysv].
 
 Resource drivers are monkey-patched to be aware of Semian, these are called
 [Semian Adapters](#adapters). Thus, every time resource access is requested
-Semian is queried for status on the resource first.
+Semian is queried for status on the resource first. If Semian, through the
 patterns above, deems the resource to be unavailable it will raise an exception.
 **The ultimate outcome of Semian is always an exception that can then be rescued
 for a graceful fallback**. Instead of waiting for the timeout, Semian raises
@@ -60,7 +60,7 @@ section](#configuration) on how to configure adapters.
 
 Semian works by intercepting resource access. Every time access is requested,
 Semian is queried, and it will raise an exception if the resource is unavailable
-according to the circuit breaker or bulkheads.
+according to the circuit breaker or bulkheads. This is done by monkey-patching
 the resource driver. **The exception raised by the driver always inherits from
 the Base exception class of the driver**, meaning you can always simply rescue
 the base class and catch both Semian and driver errors in the same rescue for
@@ -69,11 +69,11 @@ fallbacks.
 The following adapters are in Semian and tested heavily in production, the
 version is the version of the public gem with the same name:
 
-
-
-
-
-
+- [`semian/mysql2`][mysql-semian-adapter] (~> 0.3.16)
+- [`semian/redis`][redis-semian-adapter] (~> 3.2.1)
+- [`semian/net_http`][nethttp-semian-adapter]
+- [`semian/activerecord_trilogy_adapter`][activerecord-trilogy-semian-adapter]
+- [`semian-postgres`][postgres-semian-adapter]
 
 ### Creating Adapters
 
@@ -158,27 +158,27 @@ should be adequate in most environments with reasonably low timeouts.
 
 Internally, semian uses `SEM_UNDO` for several sysv semaphore operations:
 
-
-
-
+- Acquire
+- Worker registration
+- Semaphore metadata state lock
 
 The intention behind `SEM_UNDO` is that a semaphore operation is automatically undone when the process exits. This
 is true even if the process exits abnormally - crashes, receives a `SIG_KILL`, etc, because it is handled by
 the operating system and not the process itself.
 
 If, however, a thread performs a semop, the `SEM_UNDO` is on its parent process. This means that the operation
-
+_will not_ be undone when the thread exits. This can result in the following unfavorable behavior when using
 threads:
 
-
-ticket would be released by `SEM_UNDO`, but since it's a thread there is the potential for ticket starvation.
-This can result in deadlock on the resource.
-
-count would be automatically decremented by `SEM_UNDO`, but for threads the worker count will continue to increment,
-only being undone when the parent process dies. This can cause the number of tickets to dramatically exceed the quota.
-
-attempting to acquire the metadata lock until the thread's parent process exits. This can prevent the ticket count
-from being updated.
+- Threads acquire a resource, but are killed and the resource ticket is never released. For a process, the
+ticket would be released by `SEM_UNDO`, but since it's a thread there is the potential for ticket starvation.
+This can result in deadlock on the resource.
+- Threads that register workers on a resource but are killed and never unregistered. For a process, the worker
+count would be automatically decremented by `SEM_UNDO`, but for threads the worker count will continue to increment,
+only being undone when the parent process dies. This can cause the number of tickets to dramatically exceed the quota.
+- If a thread acquires the semaphore metadata lock and dies before releasing it, semian will deadlock on anything
+attempting to acquire the metadata lock until the thread's parent process exits. This can prevent the ticket count
+from being updated.
 
 Moreover, a strategy that utilizes `SEM_UNDO` is not compatible with a strategy that attempts to the semaphores tickets manually.
 In order to support threads, operations that currently use `SEM_UNDO` would need to use no semaphore flag, and the calling process
@@ -214,17 +214,19 @@ calculate and adjust ticket counts.
 
 - You must pass **exactly** one of options: `tickets` or `quota`.
 - Tickets available will be the ceiling of the quota ratio to the number of workers
-
+- So, with one worker, there will always be a minimum of 1 ticket
 - Workers in different processes will automatically unregister when the process exits.
 - If you have a small number of workers (exactly 2) it's possible that the bulkhead will be too sensitive using quotas.
 - If you use a forking web server (like unicorn) you should call `Semian.unregister_all_resources` before/after forking.
 
 #### Net::HTTP
+
 For the `Net::HTTP` specific Semian adapter, since many external libraries may create
 HTTP connections on the user's behalf, the parameters are instead provided
 by associating callback functions with `Semian::NetHTTP`, perhaps in an initialization file.
 
 ##### Naming and Options
+
 To give Semian parameters, assign a `proc` to `Semian::NetHTTP.semian_configuration`
 that takes a two parameters, `host` and `port` like `127.0.0.1`,`443` or `github_com`,`80`,
 and returns a `Hash` with configuration parameters as follows. The `proc` is used as a
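Note on the quota bullets in the hunk above: the rule "tickets available will be the ceiling of the quota ratio to the number of workers" can be sketched in a few lines of Ruby. `tickets_for_quota` is a hypothetical helper written for illustration, not part of Semian's API, and it assumes the quota is a ratio multiplied by the number of registered workers.

```ruby
# Hypothetical helper (not Semian's API): derive a ticket count from a quota.
# Assumes `quota` is a ratio and `workers` is the registered worker count.
def tickets_for_quota(quota, workers)
  raise ArgumentError, "workers must be positive" unless workers.positive?

  # Rounding up is what guarantees at least one ticket for a single worker.
  (quota * workers).ceil
end

puts tickets_for_quota(0.5, 1) # => 1
puts tickets_for_quota(0.5, 2) # => 1
puts tickets_for_quota(0.5, 5) # => 3
```

With two workers and a quota of 0.5 only one ticket exists, which may be why the README warns that quotas can be too sensitive for very small worker counts.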
@@ -300,6 +302,7 @@ behavior can be changed to blacklisting or even be completely disabled by varyin
 the use of returning `nil` in the assigned closure.
 
 ##### Additional Exceptions
+
 Since we envision this particular adapter can be used in combination with many
 external libraries, that can raise additional exceptions, we added functionality to
 expand the Exceptions that can be tracked as part of Semian's circuit breaker.
@@ -513,22 +516,23 @@ all workers on a server.
 
 There are four configuration parameters for circuit breakers in Semian:
 
-
-
+- **circuit_breaker**. Enable or Disable Circuit Breaker. Defaults to `true` if not set.
+- **error_threshold**. The amount of errors a worker encounters within `error_threshold_timeout`
 amount of time before opening the circuit,
 that is to start rejecting requests instantly.
-
+- **error_threshold_timeout**. The amount of time in seconds that `error_threshold`
 errors must occur to open the circuit.
 Defaults to `error_timeout` seconds if not set.
-
+- **error_timeout**. The amount of time in seconds until trying to query the resource
 again.
-
+- **error_threshold_timeout_enabled**. If set to false it will disable
 the time window for evicting old exceptions. `error_timeout` is still used and
 will reset the circuit. Defaults to `true` if not set.
-
+- **success_threshold**. The amount of successes on the circuit until closing it
 again, that is to start accepting all requests to the circuit.
-
+- **half_open_resource_timeout**. Timeout for the resource in seconds when
 the circuit is half-open (supported for MySQL, Net::HTTP and Redis).
+- **lumping_interval**. If provided, errors within this timeframe (in seconds) will be lumped and recorded as one.
 
 It is possible to disable Circuit Breaker with environment variable
 `SEMIAN_CIRCUIT_BREAKER_DISABLED=1`.
@@ -587,13 +591,13 @@ graph TD;
 ReleaseTicket[Release Ticket]
 FailRequest[Fail Request]
 OpenCircuit[Open Circuit Breaker]
-
+
 Start --> CheckConnection
 CheckConnection -->|Ticket Available| AllocateTicket
 AllocateTicket --> AccessResource
 AccessResource --> ReleaseTicket
 ReleaseTicket --> CheckConnection
-
+
 CheckConnection -->|No Ticket Available| BlockTimeout
 BlockTimeout -->|Timeout| FailRequest
 BlockTimeout -->|Ticket Available| AccessResource
@@ -614,9 +618,9 @@ still experimenting with ways to figure out optimal ticket numbers. Generally
 something below half the number of workers on the server for endpoints that are
 queried frequently has worked well for us.
 
-
-
-
+- **bulkhead**. Enable or Disable Bulkhead. Defaults to `true` if not set.
+- **tickets**. Number of workers that can concurrently access a resource.
+- **timeout**. Time to wait in seconds to acquire a ticket if there are no tickets left.
 We recommend this to be `0` unless you have very few workers running (i.e.
 less than ~5).
 
@@ -626,11 +630,11 @@ It is possible to disable Bulkhead with environment variable
 Note that there are system-wide limitations on how many tickets can be allocated
 on a system. `cat /proc/sys/kernel/sem` will tell you.
 
-> System-wide limit on the number of semaphore sets.
-
-
-
-
+> System-wide limit on the number of semaphore sets. On Linux
+> systems before version 3.19, the default value for this limit
+> was 128. Since Linux 3.19, the default value is 32,000. On
+> Linux, this limit can be read and modified via the fourth
+> field of `/proc/sys/kernel/sem`.
 
 #### Bulkhead debugging on linux
 
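Note on the quoted limit in the hunk above: `/proc/sys/kernel/sem` holds four whitespace-separated integers, in the order SEMMSL, SEMMNS, SEMOPM, SEMMNI, so the "fourth field" is SEMMNI. A minimal Ruby sketch of reading it follows; `parse_kernel_sem` and the sample string are illustrative assumptions, not something shipped with Semian.

```ruby
# Field order of /proc/sys/kernel/sem on Linux.
SEM_FIELDS = %i[semmsl semmns semopm semmni].freeze

# Hypothetical helper: turn the file's contents into a labelled hash.
def parse_kernel_sem(text)
  values = text.split.map { |value| Integer(value) }
  raise ArgumentError, "expected 4 fields, got #{values.size}" unless values.size == 4

  SEM_FIELDS.zip(values).to_h
end

# Illustrative sample; on a Linux host use File.read("/proc/sys/kernel/sem").
sample = "32000\t1024000000\t500\t32000\n"
limits = parse_kernel_sem(sample)
puts "system-wide semaphore sets (SEMMNI): #{limits[:semmni]}"
```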
@@ -668,10 +672,10 @@ semnum value ncount zcount pid
 In the above example, we can see each of the semaphores. Looking at the enum code
 in `ext/semian/sysv_semaphores.h` we can see that:
 
-
-
-
-
+- 0: is the semian meta lock (mutex) protecting updates to the other resources. It's currently free
+- 1: is the number of available tickets - currently no tickets are in use because it's the same as 2
+- 2: is the configured (maximum) number of tickets
+- 3: is the number of registered workers (processes) that would be considered if using the quota strategy.
 
 ## Defense line
 
@@ -884,45 +888,47 @@ $ cd semian
 ```
 
 ## Visual Studio Code
-- Open semian in vscode
-- Install recommended extensions (one off requirement)
-- Click `reopen in container` (first boot might take about a minute)
 
-
+- Open semian in vscode
+- Install recommended extensions (one off requirement)
+- Click `reopen in container` (first boot might take about a minute)
 
+See https://code.visualstudio.com/docs/remote/containers for more details
 
-
+If you make any changes to `.devcontainer/` you'd need to recreate the containers:
 
-
+- Select `Rebuild Container` from the command palette
 
+Running Tests:
 
-
-- `$ bundle exec rake` Run with `SKIP_FLAKY_TESTS=true` to skip flaky tests (CI runs all tests)
+- `$ bundle exec rake` Run with `SKIP_FLAKY_TESTS=true` to skip flaky tests (CI runs all tests)
 
 ## Everything else
 
-
-- `$ docker-compose -f .devcontainer/docker-compose.yml up -d`
-- `$ docker exec -it semian bash`
+Test semian in containers:
 
-
+- `$ docker-compose -f .devcontainer/docker-compose.yml up -d`
+- `$ docker exec -it semian bash`
 
-
+If you make any changes to `.devcontainer/` you'd need to recreate the containers:
 
-
+- `$ docker-compose -f .devcontainer/docker-compose.yml up -d --force-recreate`
+
+Run tests in containers:
+
+```shell
+$ docker-compose -f ./.devcontainer/docker-compose.yml run --rm test
+```
 
-
-$ docker-compose -f ./.devcontainer/docker-compose.yml run --rm test
-```
+Running Tests:
 
-
-- `$ bundle exec rake` Run with `SKIP_FLAKY_TESTS=true` to skip flaky tests (CI runs all tests)
+- `$ bundle exec rake` Run with `SKIP_FLAKY_TESTS=true` to skip flaky tests (CI runs all tests)
 
 ### Running tests in batches
 
-
-It uses to identify a total number of batches, that would be run in parallel.
-
+- _TEST_WORKERS_ - Total number of workers or batches.
+It uses to identify a total number of batches, that would be run in parallel. _Default: 1_
+- _TEST_WORKER_NUM_ - Specify which batch to run. The value is between 1 and _TEST_WORKERS_. _Default: 1_
 
 ```shell
 $ bundle exec rake test:parallel TEST_WORKERS=5 TEST_WORKER_NUM=1
data/lib/semian/circuit_breaker.rb
CHANGED
@@ -17,7 +17,8 @@ module Semian
 
     def initialize(name, exceptions:, success_threshold:, error_threshold:,
       error_timeout:, implementation:, half_open_resource_timeout: nil,
-      error_threshold_timeout: nil, error_threshold_timeout_enabled: true
+      error_threshold_timeout: nil, error_threshold_timeout_enabled: true,
+      lumping_interval: 0)
       @name = name.to_sym
       @success_count_threshold = success_threshold
       @error_count_threshold = error_threshold
@@ -26,6 +27,11 @@ module Semian
       @error_timeout = error_timeout
       @exceptions = exceptions
       @half_open_resource_timeout = half_open_resource_timeout
+      @lumping_interval = lumping_interval
+
+      if @lumping_interval > @error_threshold_timeout
+        raise ArgumentError, "lumping_interval (#{@lumping_interval}) must be less than error_threshold_timeout (#{@error_threshold_timeout})"
+      end
 
       @errors = implementation::SlidingWindow.new(max_size: @error_count_threshold)
       @successes = implementation::Integer.new
@@ -63,7 +69,6 @@ module Semian
 
     def mark_failed(error)
       push_error(error)
-      push_time
       if closed?
         transition_to_open if error_threshold_reached?
       elsif half_open?
@@ -132,16 +137,16 @@ module Semian
     end
 
     def push_error(error)
-      @last_error = error
-    end
-
-    def push_time
       time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+
       if error_threshold_timeout_enabled
         @errors.reject! { |err_time| err_time + @error_threshold_timeout < time }
       end
 
-      @errors
+      if @errors.empty? || @errors.last <= time - @lumping_interval
+        @last_error = error
+        @errors << time
+      end
     end
 
     def log_state_transition(new_state)
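Note on the `push_error` change above: the lumping behaviour can be tried in isolation with the sketch below. `LumpingErrorWindow` is a simplified stand-in written for illustration, not Semian's `CircuitBreaker`. It assumes a plain Array in place of `implementation::SlidingWindow`, always applies the eviction window, and takes an injectable `time:` argument so the run is deterministic.

```ruby
# Simplified stand-in for the error window; NOT Semian's CircuitBreaker class.
class LumpingErrorWindow
  attr_reader :errors, :last_error

  def initialize(error_threshold_timeout:, lumping_interval: 0)
    # Same comparison as the diff: only a strictly larger interval is rejected.
    if lumping_interval > error_threshold_timeout
      raise ArgumentError, "lumping_interval (#{lumping_interval}) exceeds " \
                           "error_threshold_timeout (#{error_threshold_timeout})"
    end

    @error_threshold_timeout = error_threshold_timeout
    @lumping_interval = lumping_interval
    @errors = []
    @last_error = nil
  end

  # `time:` is an assumption for testability; the diff reads the monotonic clock.
  def push_error(error, time: Process.clock_gettime(Process::CLOCK_MONOTONIC))
    # Evict timestamps that fell out of the threshold window.
    @errors.reject! { |err_time| err_time + @error_threshold_timeout < time }

    # Record only if the previous recorded error is at least one interval old.
    if @errors.empty? || @errors.last <= time - @lumping_interval
      @last_error = error
      @errors << time
    end
  end
end

window = LumpingErrorWindow.new(error_threshold_timeout: 10, lumping_interval: 1)
window.push_error(:timeout_a, time: 100.0) # recorded
window.push_error(:timeout_b, time: 100.4) # within 1s of the last entry: lumped
window.push_error(:timeout_c, time: 101.5) # recorded
p window.errors     # => [100.0, 101.5]
p window.last_error # => :timeout_c
```

Two details follow from the diff as written: a lumped error no longer updates `@last_error`, and the constructor check uses `>`, so a `lumping_interval` equal to `error_threshold_timeout` passes even though the message says "must be less than".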
data/lib/semian/version.rb
CHANGED
data/lib/semian.rb
CHANGED
@@ -310,6 +310,11 @@ module Semian
       else
         options[:error_threshold_timeout_enabled]
       end,
+      lumping_interval: if options[:lumping_interval].nil?
+        0
+      else
+        options[:lumping_interval]
+      end,
       exceptions: Array(exceptions) + [::Semian::BaseError],
       half_open_resource_timeout: options[:half_open_resource_timeout],
       implementation: implementation(**options),
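Note on the `lumping_interval` default above: the `nil?` check means an option that is present but explicitly `nil` also falls back to `0`, which a plain `Hash#fetch` default would not do. The sketch below isolates that pattern; `lumping_interval_from` is a hypothetical helper name used only for this illustration.

```ruby
# Hypothetical helper mirroring the nil check in the hunk above.
def lumping_interval_from(options)
  options[:lumping_interval].nil? ? 0 : options[:lumping_interval]
end

p lumping_interval_from({})                        # => 0
p lumping_interval_from({ lumping_interval: nil }) # => 0
p lumping_interval_from({ lumping_interval: 2 })   # => 2

# For comparison: fetch only applies its default when the key is missing.
p({ lumping_interval: nil }.fetch(:lumping_interval, 0)) # => nil
```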
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: semian
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.24.0
 platform: ruby
 authors:
 - Scott Francis
@@ -77,7 +77,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.6.
+rubygems_version: 3.6.9
 specification_version: 4
 summary: Bulkheading for Ruby with SysV semaphores
 test_files: []