semian 0.23.0 → 0.25.0
- checksums.yaml +4 -4
- data/README.md +151 -70
- data/lib/semian/circuit_breaker.rb +8 -7
- data/lib/semian/configuration_validator.rb +233 -0
- data/lib/semian/lru_hash.rb +2 -2
- data/lib/semian/version.rb +1 -1
- data/lib/semian.rb +11 -14
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: c46f4408d72d0fe86b9d1199429942007e48dc5a43dc75dc8f361d3d85e369c9
|
4
|
+
data.tar.gz: c297a18a78e2fc02e3c3823c6488413bc6170d8bd718acdcc8995c96bf68b070
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 115fe761cbc5e3cf7bdaabb2e28ae04c1ac594d9bad3b69b8c63c996d9b52bf2306e6a3de6892a65b069ed4033f0f53ca09f215b19ebde54e4616059bbb6784d
|
7
|
+
data.tar.gz: 94b77f1de5c869ce44384dc4e6b5fb2acbde52f7074a786130b52b48185ff367cd19541da1ecc298828fda06c679f431fe6fd11c7eee9565b882a6fe1432cb7c
|
data/README.md
CHANGED
@@ -15,15 +15,15 @@ allowing you to handle errors gracefully.** Semian does this by intercepting
|
|
15
15
|
resource access through heuristic patterns inspired by [Hystrix][hystrix] and
|
16
16
|
[Release It][release-it]:
|
17
17
|
|
18
|
-
|
18
|
+
- [**Circuit breaker**](#circuit-breaker). A pattern for limiting the
|
19
19
|
amount of requests to a dependency that is having issues.
|
20
|
-
|
20
|
+
- [**Bulkheading**](#bulkheading). Controlling the concurrent access to
|
21
21
|
a single resource, access is coordinated server-wide with [SysV
|
22
22
|
semaphores][sysv].
|
23
23
|
|
24
24
|
Resource drivers are monkey-patched to be aware of Semian, these are called
|
25
25
|
[Semian Adapters](#adapters). Thus, every time resource access is requested
|
26
|
-
Semian is queried for status on the resource first.
|
26
|
+
Semian is queried for status on the resource first. If Semian, through the
|
27
27
|
patterns above, deems the resource to be unavailable it will raise an exception.
|
28
28
|
**The ultimate outcome of Semian is always an exception that can then be rescued
|
29
29
|
for a graceful fallback**. Instead of waiting for the timeout, Semian raises
|
@@ -60,7 +60,7 @@ section](#configuration) on how to configure adapters.
|
|
60
60
|
|
61
61
|
Semian works by intercepting resource access. Every time access is requested,
|
62
62
|
Semian is queried, and it will raise an exception if the resource is unavailable
|
63
|
-
according to the circuit breaker or bulkheads.
|
63
|
+
according to the circuit breaker or bulkheads. This is done by monkey-patching
|
64
64
|
the resource driver. **The exception raised by the driver always inherits from
|
65
65
|
the Base exception class of the driver**, meaning you can always simply rescue
|
66
66
|
the base class and catch both Semian and driver errors in the same rescue for
|
@@ -69,11 +69,11 @@ fallbacks.
|
|
69
69
|
The following adapters are in Semian and tested heavily in production, the
|
70
70
|
version is the version of the public gem with the same name:
|
71
71
|
|
72
|
-
|
73
|
-
|
74
|
-
|
75
|
-
|
76
|
-
|
72
|
+
- [`semian/mysql2`][mysql-semian-adapter] (~> 0.3.16)
|
73
|
+
- [`semian/redis`][redis-semian-adapter] (~> 3.2.1)
|
74
|
+
- [`semian/net_http`][nethttp-semian-adapter]
|
75
|
+
- [`semian/activerecord_trilogy_adapter`][activerecord-trilogy-semian-adapter]
|
76
|
+
- [`semian-postgres`][postgres-semian-adapter]
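As a minimal sketch of the rescue pattern described above (the exception raised on behalf of Semian inherits from the driver's base exception class), the following uses a made-up `FakeDriver` module rather than a real adapter. All class and method names here are hypothetical and only illustrate the inheritance relationship.

```ruby
# Hypothetical driver, used only for illustration; it is not a real Semian adapter.
module FakeDriver
  # Base exception class of the driver.
  class Error < StandardError; end

  # An ordinary driver failure.
  class ConnectionError < Error; end

  # Stands in for the error an adapter would raise when Semian rejects the call.
  class CircuitOpenError < Error; end
end

# Runs the block and returns the fallback if the driver or the
# (simulated) Semian layer raises.
def with_fallback(fallback)
  yield
rescue FakeDriver::Error
  # One rescue covers both kinds of failure because they share a base class.
  fallback
end

driver_failure = with_fallback(:cached) { raise FakeDriver::ConnectionError, "connection refused" }
semian_failure = with_fallback(:cached) { raise FakeDriver::CircuitOpenError, "circuit open" }
success        = with_fallback(:cached) { :fresh }
```

Both failing calls end up on the fallback value, while the successful call returns its own result.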
|
77
77
|
|
78
78
|
### Creating Adapters
|
79
79
|
|
@@ -113,6 +113,10 @@ Semian.maximum_lru_size = 0
|
|
113
113
|
|
114
114
|
# Minimum time in seconds a resource should be resident in the LRU cache (default: 300s)
|
115
115
|
Semian.minimum_lru_time = 60
|
116
|
+
|
117
|
+
# If true, raise exceptions in case of a validation / constraint failure
|
118
|
+
# Otherwise, log in output
|
119
|
+
Semian.default_force_config_validation = false
|
116
120
|
```
|
117
121
|
|
118
122
|
Note: `minimum_lru_time` is a stronger guarantee than `maximum_lru_size`. That
|
@@ -120,6 +124,10 @@ is, if a resource has been updated more recently than `minimum_lru_time` it
|
|
120
124
|
will not be garbage collected, even if it would cause the LRU cache to grow
|
121
125
|
larger than `maximum_lru_size`.
|
122
126
|
|
127
|
+
Note: `default_force_config_validation` set to `true` is a
|
128
|
+
**_potentially breaking change_**. Misconfigured Semians will raise errors, so
|
129
|
+
make sure that this is what you want. See more in [Configuration Validation](#configuration-validation).
|
130
|
+
|
123
131
|
When instantiating a resource it now needs to be configured for Semian. This is
|
124
132
|
done by passing `semian` as an argument when initializing the client. Examples
|
125
133
|
built in adapters:
|
@@ -132,7 +140,8 @@ client = Mysql2::Client.new(host: "localhost", username: "root", semian: {
|
|
132
140
|
tickets: 8, # See the Understanding Semian section on picking these values
|
133
141
|
success_threshold: 2,
|
134
142
|
error_threshold: 3,
|
135
|
-
error_timeout: 10
|
143
|
+
error_timeout: 10,
|
144
|
+
force_config_validation: false
|
136
145
|
})
|
137
146
|
|
138
147
|
# Redis client
|
@@ -145,6 +154,32 @@ client = Redis.new(semian: {
|
|
145
154
|
})
|
146
155
|
```
|
147
156
|
|
157
|
+
#### Configuration Validation
|
158
|
+
|
159
|
+
Semian now provides a flag to specify log-based and exception-based configuration validation. To
|
160
|
+
explicitly force Semian to validate its configuration, pass `force_config_validation: true`
|
161
|
+
into your resource. This will raise an error in the case of a misconfigured or illegal Semian. Otherwise,
|
162
|
+
if it is set to `false`, it will log misconfigured parameters verbosely in output.
|
163
|
+
|
164
|
+
If not specified, it will use `Semian.default_force_config_validation` as
|
165
|
+
the flag.
|
166
|
+
|
167
|
+
##### Migration Strategy for Force Config Validation
|
168
|
+
|
169
|
+
When migrating to use `force_config_validation: true`, follow these steps:
|
170
|
+
|
171
|
+
1. **Deploy with it turned off**: Start with `force_config_validation: false` in your configuration
|
172
|
+
2. **Look for logs with prefix**: Monitor your application logs for entries with the `[SEMIAN_CONFIG_WARNING]:` prefix. These logs will indicate misconfigured Semian resources
|
173
|
+
3. **Iterate to fix**: Address each configuration issue identified in the logs by updating your Semian configurations
|
174
|
+
4. **Enable**: Once all configuration issues are resolved, set `force_config_validation: true` to enable strict validation
|
175
|
+
|
176
|
+
Example log entries to look for:
|
177
|
+
```
|
178
|
+
[SEMIAN_CONFIG_WARNING]: Missing required arguments for Semian: [:success_threshold, :error_threshold, :error_timeout]
|
179
|
+
[SEMIAN_CONFIG_WARNING]: Both bulkhead and circuitbreaker cannot be disabled.
|
180
|
+
[SEMIAN_CONFIG_WARNING]: Bulkhead configuration require either the :tickets or :quota parameter, you provided neither
|
181
|
+
```
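The difference between the two modes can be sketched as follows. This is a simplified stand-in and not Semian's validator: `SketchValidator` and `MemoryLogger` are hypothetical names, only the missing-keys check is modelled, and the fallback to a default flag is an assumption based on the description above.

```ruby
# Hypothetical logger that keeps warnings in memory so they can be inspected.
class MemoryLogger
  attr_reader :warnings

  def initialize
    @warnings = []
  end

  def warn(message)
    @warnings << message
  end
end

# Simplified stand-in for a configuration validator (not Semian's own class).
class SketchValidator
  REQUIRED_KEYS = [:success_threshold, :error_threshold, :error_timeout].freeze

  def initialize(configuration, logger:, default_force: false)
    @configuration = configuration
    @logger = logger
    # Assumption: a per-resource flag wins, otherwise the global default applies.
    @force = configuration.fetch(:force_config_validation, default_force)
  end

  def validate!
    missing = REQUIRED_KEYS - @configuration.keys
    return if missing.empty?

    message = "Missing required arguments for Semian: #{missing}"
    raise ArgumentError, message if @force

    @logger.warn("[SEMIAN_CONFIG_WARNING]: #{message}")
  end
end

logger = MemoryLogger.new

# Log mode: the misconfiguration is reported but nothing is raised.
SketchValidator.new({ success_threshold: 2 }, logger: logger).validate!

# Strict mode: the same misconfiguration raises ArgumentError.
strict_error =
  begin
    SketchValidator.new({ success_threshold: 2, force_config_validation: true }, logger: logger).validate!
    nil
  rescue ArgumentError => e
    e
  end
```

In log mode the warning lands in `logger.warnings` with the `[SEMIAN_CONFIG_WARNING]:` prefix; in strict mode the same message is carried by the raised `ArgumentError`.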
|
182
|
+
|
148
183
|
#### Thread Safety
|
149
184
|
|
150
185
|
Semian's circuit breaker implementation is thread-safe by default as of
|
@@ -158,27 +193,27 @@ should be adequate in most environments with reasonably low timeouts.
|
|
158
193
|
|
159
194
|
Internally, semian uses `SEM_UNDO` for several sysv semaphore operations:
|
160
195
|
|
161
|
-
|
162
|
-
|
163
|
-
|
196
|
+
- Acquire
|
197
|
+
- Worker registration
|
198
|
+
- Semaphore metadata state lock
|
164
199
|
|
165
200
|
The intention behind `SEM_UNDO` is that a semaphore operation is automatically undone when the process exits. This
|
166
201
|
is true even if the process exits abnormally - crashes, receives a `SIG_KILL`, etc, because it is handled by
|
167
202
|
the operating system and not the process itself.
|
168
203
|
|
169
204
|
If, however, a thread performs a semop, the `SEM_UNDO` is on its parent process. This means that the operation
|
170
|
-
|
205
|
+
_will not_ be undone when the thread exits. This can result in the following unfavorable behavior when using
|
171
206
|
threads:
|
172
207
|
|
173
|
-
|
174
|
-
ticket would be released by `SEM_UNDO`, but since it's a thread there is the potential for ticket starvation.
|
175
|
-
This can result in deadlock on the resource.
|
176
|
-
|
177
|
-
count would be automatically decremented by `SEM_UNDO`, but for threads the worker count will continue to increment,
|
178
|
-
only being undone when the parent process dies. This can cause the number of tickets to dramatically exceed the quota.
|
179
|
-
|
180
|
-
attempting to acquire the metadata lock until the thread's parent process exits. This can prevent the ticket count
|
181
|
-
from being updated.
|
208
|
+
- Threads acquire a resource, but are killed and the resource ticket is never released. For a process, the
|
209
|
+
ticket would be released by `SEM_UNDO`, but since it's a thread there is the potential for ticket starvation.
|
210
|
+
This can result in deadlock on the resource.
|
211
|
+
- Threads that register workers on a resource but are killed and never unregistered. For a process, the worker
|
212
|
+
count would be automatically decremented by `SEM_UNDO`, but for threads the worker count will continue to increment,
|
213
|
+
only being undone when the parent process dies. This can cause the number of tickets to dramatically exceed the quota.
|
214
|
+
- If a thread acquires the semaphore metadata lock and dies before releasing it, semian will deadlock on anything
|
215
|
+
attempting to acquire the metadata lock until the thread's parent process exits. This can prevent the ticket count
|
216
|
+
from being updated.
|
182
217
|
|
183
218
|
Moreover, a strategy that utilizes `SEM_UNDO` is not compatible with a strategy that attempts to the semaphores tickets manually.
|
184
219
|
In order to support threads, operations that currently use `SEM_UNDO` would need to use no semaphore flag, and the calling process
|
@@ -214,17 +249,19 @@ calculate and adjust ticket counts.
|
|
214
249
|
|
215
250
|
- You must pass **exactly** one of options: `tickets` or `quota`.
|
216
251
|
- Tickets available will be the ceiling of the quota ratio to the number of workers
|
217
|
-
|
252
|
+
- So, with one worker, there will always be a minimum of 1 ticket
|
218
253
|
- Workers in different processes will automatically unregister when the process exits.
|
219
254
|
- If you have a small number of workers (exactly 2) it's possible that the bulkhead will be too sensitive using quotas.
|
220
255
|
- If you use a forking web server (like unicorn) you should call `Semian.unregister_all_resources` before/after forking.
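To make the quota arithmetic above concrete, here is a small sketch. It assumes the ticket count is computed as `ceil(quota * workers)`, which is how the bullet about the ceiling reads; `tickets_for` is a hypothetical helper and not part of Semian's API.

```ruby
# Hypothetical helper: tickets available for a given quota and worker count.
# Assumes tickets = ceil(quota * workers), as described in the list above.
def tickets_for(quota:, workers:)
  (quota * workers).ceil
end

one_worker   = tickets_for(quota: 0.5, workers: 1)   # a single worker still gets a ticket
five_workers = tickets_for(quota: 0.5, workers: 5)
ten_workers  = tickets_for(quota: 0.25, workers: 10)
```

With these inputs the helper yields 1, 3 and 3 tickets respectively, which shows why a quota never rounds down to zero tickets as long as at least one worker is registered.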
|
221
256
|
|
222
257
|
#### Net::HTTP
|
258
|
+
|
223
259
|
For the `Net::HTTP` specific Semian adapter, since many external libraries may create
|
224
260
|
HTTP connections on the user's behalf, the parameters are instead provided
|
225
261
|
by associating callback functions with `Semian::NetHTTP`, perhaps in an initialization file.
|
226
262
|
|
227
263
|
##### Naming and Options
|
264
|
+
|
228
265
|
To give Semian parameters, assign a `proc` to `Semian::NetHTTP.semian_configuration`
|
229
266
|
that takes two parameters, `host` and `port`, like `127.0.0.1`,`443` or `github_com`,`80`,
|
230
267
|
and returns a `Hash` with configuration parameters as follows. The `proc` is used as a
|
@@ -282,11 +319,11 @@ Semian::NetHTTP.semian_configuration = proc do |host, port|
|
|
282
319
|
SEMIAN_PARAMETERS.merge(name: name)
|
283
320
|
end
|
284
321
|
|
285
|
-
# Two requests to
|
322
|
+
# Two requests to shopify.com can use two different semian resources,
|
286
323
|
# as long as `CurrentSemianSubResource.sub_name` is set accordingly:
|
287
|
-
# CurrentSemianSubResource.set(sub_name: "sub_resource_1") { Net::HTTP.get_response(URI("http://
|
324
|
+
# CurrentSemianSubResource.set(sub_name: "sub_resource_1") { Net::HTTP.get_response(URI("http://shopify.com")) }
|
288
325
|
# and:
|
289
|
-
# CurrentSemianSubResource.set(sub_name: "sub_resource_2") { Net::HTTP.get_response(URI("http://
|
326
|
+
# CurrentSemianSubResource.set(sub_name: "sub_resource_2") { Net::HTTP.get_response(URI("http://shopify.com")) }
|
290
327
|
```
|
291
328
|
|
292
329
|
For most purposes, `"#{host}_#{port}"` is a good default `name`. Custom `name` formats
|
@@ -300,6 +337,7 @@ behavior can be changed to blacklisting or even be completely disabled by varyin
|
|
300
337
|
the use of returning `nil` in the assigned closure.
|
301
338
|
|
302
339
|
##### Additional Exceptions
|
340
|
+
|
303
341
|
Since we envision this particular adapter can be used in combination with many
|
304
342
|
external libraries, that can raise additional exceptions, we added functionality to
|
305
343
|
expand the Exceptions that can be tracked as part of Semian's circuit breaker.
|
@@ -513,22 +551,23 @@ all workers on a server.
|
|
513
551
|
|
514
552
|
There are several configuration parameters for circuit breakers in Semian:
|
515
553
|
|
516
|
-
|
517
|
-
|
554
|
+
- **circuit_breaker**. Enable or Disable Circuit Breaker. Defaults to `true` if not set.
|
555
|
+
- **error_threshold**. The amount of errors a worker encounters within `error_threshold_timeout`
|
518
556
|
amount of time before opening the circuit,
|
519
557
|
that is to start rejecting requests instantly.
|
520
|
-
|
558
|
+
- **error_threshold_timeout**. The amount of time in seconds that `error_threshold`
|
521
559
|
errors must occur to open the circuit.
|
522
560
|
Defaults to `error_timeout` seconds if not set.
|
523
|
-
|
561
|
+
- **error_timeout**. The amount of time in seconds until trying to query the resource
|
524
562
|
again.
|
525
|
-
|
563
|
+
- **error_threshold_timeout_enabled**. If set to false it will disable
|
526
564
|
the time window for evicting old exceptions. `error_timeout` is still used and
|
527
565
|
will reset the circuit. Defaults to `true` if not set.
|
528
|
-
|
566
|
+
- **success_threshold**. The amount of successes on the circuit until closing it
|
529
567
|
again, that is to start accepting all requests to the circuit.
|
530
|
-
|
568
|
+
- **half_open_resource_timeout**. Timeout for the resource in seconds when
|
531
569
|
the circuit is half-open (supported for MySQL, Net::HTTP and Redis).
|
570
|
+
- **lumping_interval**. If provided, errors within this timeframe (in seconds) will be lumped and recorded as one.
|
532
571
|
|
533
572
|
It is possible to disable Circuit Breaker with environment variable
|
534
573
|
`SEMIAN_CIRCUIT_BREAKER_DISABLED=1`.
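As an illustration of `lumping_interval`, the sketch below records error timestamps and skips any error that arrives less than `lumping_interval` seconds after the last recorded one. `LumpingWindow` is a hypothetical class written for this example; timestamps are passed in explicitly so the behaviour is deterministic, and the numbers are arbitrary.

```ruby
# Hypothetical illustration of error lumping; not Semian's CircuitBreaker class.
class LumpingWindow
  attr_reader :timestamps

  def initialize(lumping_interval:, error_threshold_timeout:)
    @lumping_interval = lumping_interval
    @error_threshold_timeout = error_threshold_timeout
    @timestamps = []
  end

  # Records an error observed at time `now` (in seconds).
  # Returns true if it was recorded, false if it was lumped into the previous one.
  def record_error(now)
    # Evict errors that fell out of the error_threshold_timeout window.
    @timestamps.reject! { |time| time + @error_threshold_timeout < now }

    if @timestamps.empty? || @timestamps.last <= now - @lumping_interval
      @timestamps << now
      true
    else
      false
    end
  end
end

window = LumpingWindow.new(lumping_interval: 4, error_threshold_timeout: 10)
results = [0, 1, 3, 4, 8].map { |second| window.record_error(second) }
```

Five errors at seconds 0, 1, 3, 4 and 8 are recorded as three errors (at 0, 4 and 8), so an `error_threshold` of 3 would be reached within a 10 second window.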
|
@@ -587,13 +626,13 @@ graph TD;
|
|
587
626
|
ReleaseTicket[Release Ticket]
|
588
627
|
FailRequest[Fail Request]
|
589
628
|
OpenCircuit[Open Circuit Breaker]
|
590
|
-
|
629
|
+
|
591
630
|
Start --> CheckConnection
|
592
631
|
CheckConnection -->|Ticket Available| AllocateTicket
|
593
632
|
AllocateTicket --> AccessResource
|
594
633
|
AccessResource --> ReleaseTicket
|
595
634
|
ReleaseTicket --> CheckConnection
|
596
|
-
|
635
|
+
|
597
636
|
CheckConnection -->|No Ticket Available| BlockTimeout
|
598
637
|
BlockTimeout -->|Timeout| FailRequest
|
599
638
|
BlockTimeout -->|Ticket Available| AccessResource
|
@@ -614,9 +653,9 @@ still experimenting with ways to figure out optimal ticket numbers. Generally
|
|
614
653
|
something below half the number of workers on the server for endpoints that are
|
615
654
|
queried frequently has worked well for us.
|
616
655
|
|
617
|
-
|
618
|
-
|
619
|
-
|
656
|
+
- **bulkhead**. Enable or Disable Bulkhead. Defaults to `true` if not set.
|
657
|
+
- **tickets**. Number of workers that can concurrently access a resource.
|
658
|
+
- **timeout**. Time to wait in seconds to acquire a ticket if there are no tickets left.
|
620
659
|
We recommend this to be `0` unless you have very few workers running (i.e.
|
621
660
|
less than ~5).
|
622
661
|
|
@@ -626,11 +665,11 @@ It is possible to disable Bulkhead with environment variable
|
|
626
665
|
Note that there are system-wide limitations on how many tickets can be allocated
|
627
666
|
on a system. `cat /proc/sys/kernel/sem` will tell you.
|
628
667
|
|
629
|
-
> System-wide limit on the number of semaphore sets.
|
630
|
-
|
631
|
-
|
632
|
-
|
633
|
-
|
668
|
+
> System-wide limit on the number of semaphore sets. On Linux
|
669
|
+
> systems before version 3.19, the default value for this limit
|
670
|
+
> was 128. Since Linux 3.19, the default value is 32,000. On
|
671
|
+
> Linux, this limit can be read and modified via the fourth
|
672
|
+
> field of `/proc/sys/kernel/sem`.
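A small sketch for reading these limits from Ruby. The four fields of `/proc/sys/kernel/sem` are, in order, SEMMSL, SEMMNS, SEMOPM and SEMMNI; `parse_sem_limits` is a hypothetical helper, and the sample string is only an example value, not output from a particular machine.

```ruby
# Hypothetical helper: turn the contents of /proc/sys/kernel/sem into a hash.
# Field order on Linux: SEMMSL, SEMMNS, SEMOPM, SEMMNI.
def parse_sem_limits(contents)
  semmsl, semmns, semopm, semmni = contents.split.map { |field| Integer(field) }
  {
    max_semaphores_per_set: semmsl,
    max_semaphores_system_wide: semmns,
    max_operations_per_semop: semopm,
    max_semaphore_sets: semmni,
  }
end

# Example value only; on a Linux host you could pass File.read("/proc/sys/kernel/sem").
limits = parse_sem_limits("32000 1024000000 500 32000\n")
```

The fourth field, exposed here as `:max_semaphore_sets`, is the limit the quoted passage refers to.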
|
634
673
|
|
635
674
|
#### Bulkhead debugging on linux
|
636
675
|
|
@@ -668,10 +707,10 @@ semnum value ncount zcount pid
|
|
668
707
|
In the above example, we can see each of the semaphores. Looking at the enum code
|
669
708
|
in `ext/semian/sysv_semaphores.h` we can see that:
|
670
709
|
|
671
|
-
|
672
|
-
|
673
|
-
|
674
|
-
|
710
|
+
- 0: is the semian meta lock (mutex) protecting updates to the other resources. It's currently free
|
711
|
+
- 1: is the number of available tickets - currently no tickets are in use because it's the same as 2
|
712
|
+
- 2: is the configured (maximum) number of tickets
|
713
|
+
- 3: is the number of registered workers (processes) that would be considered if using the quota strategy.
|
675
714
|
|
676
715
|
## Defense line
|
677
716
|
|
@@ -884,45 +923,87 @@ $ cd semian
|
|
884
923
|
```
|
885
924
|
|
886
925
|
## Visual Studio Code
|
887
|
-
- Open semian in vscode
|
888
|
-
- Install recommended extensions (one off requirement)
|
889
|
-
- Click `reopen in container` (first boot might take about a minute)
|
890
926
|
|
891
|
-
|
927
|
+
- Open semian in vscode
|
928
|
+
- Install recommended extensions (one off requirement)
|
929
|
+
- Click `reopen in container` (first boot might take about a minute)
|
930
|
+
|
931
|
+
See https://code.visualstudio.com/docs/remote/containers for more details
|
932
|
+
|
933
|
+
If you make any changes to `.devcontainer/` you'd need to recreate the containers:
|
892
934
|
|
935
|
+
- Select `Rebuild Container` from the command palette
|
893
936
|
|
894
|
-
|
937
|
+
Running Tests:
|
895
938
|
|
896
|
-
|
939
|
+
- `$ bundle exec rake` Run with `SKIP_FLAKY_TESTS=true` to skip flaky tests (CI runs all tests)
|
897
940
|
|
941
|
+
### Interactive Test Debugging
|
898
942
|
|
899
|
-
|
900
|
-
|
943
|
+
To use the interactive debugger on vscode:
|
944
|
+
- Open semian in vscode
|
945
|
+
- Create an `.env` file (if it doesn't exist)
|
946
|
+
- Set up a `DEBUG` ENV variable (e.g. `DEBUG=true`)
|
947
|
+
- Under the `.vscode/` subdirectory, create a `launch.json` file, and include the following:
|
948
|
+
|
949
|
+
```json
|
950
|
+
{
|
951
|
+
"configurations": [
|
952
|
+
{
|
953
|
+
"type": "rdbg",
|
954
|
+
"name": "Attach to Ruby rdbg",
|
955
|
+
"request": "attach",
|
956
|
+
"debugPort": "12345",
|
957
|
+
}
|
958
|
+
]
|
959
|
+
}
|
960
|
+
```
|
961
|
+
|
962
|
+
- For universal support, for any lines you would like to add breakpoints to in your `_test.rb` file (under `test/`), include the following snippet near the line of interest:
|
963
|
+
|
964
|
+
```rb
|
965
|
+
require "debug"
|
966
|
+
binding.break if ENV["DEBUG"]
|
967
|
+
```
|
968
|
+
|
969
|
+
**Note:** unless you are using a vscode extension such as [Dev Container](https://code.visualstudio.com/docs/devcontainers/tutorial), **do not use the built-in vscode breakpoints -- they will not work!**
|
970
|
+
|
971
|
+
- Start up the test container
|
972
|
+
|
973
|
+
```shell
|
974
|
+
$ docker-compose -f .devcontainer/docker-compose.yml --profile test up -d
|
975
|
+
```
|
976
|
+
|
977
|
+
- When the process indicates that it is waiting for the debugger connection, go to the `Run and Debug` tab, and execute the `Attach to Ruby rdbg` debugger
|
978
|
+
|
979
|
+
- Use the vscode debugging tools (such as step in, step out, pause, resume) as normal
|
901
980
|
|
902
981
|
## Everything else
|
903
982
|
|
904
|
-
|
905
|
-
- `$ docker-compose -f .devcontainer/docker-compose.yml up -d`
|
906
|
-
- `$ docker exec -it semian bash`
|
983
|
+
Test semian in containers:
|
907
984
|
|
908
|
-
|
985
|
+
- `$ docker-compose -f .devcontainer/docker-compose.yml up -d`
|
986
|
+
- `$ docker exec -it semian bash`
|
909
987
|
|
910
|
-
|
988
|
+
If you make any changes to `.devcontainer/` you'd need to recreate the containers:
|
911
989
|
|
912
|
-
|
990
|
+
- `$ docker-compose -f .devcontainer/docker-compose.yml up -d --force-recreate`
|
991
|
+
|
992
|
+
Run tests in containers:
|
993
|
+
|
994
|
+
```shell
|
995
|
+
$ docker-compose -f ./.devcontainer/docker-compose.yml --profile test run --rm test
|
996
|
+
```
|
913
997
|
|
914
|
-
|
915
|
-
$ docker-compose -f ./.devcontainer/docker-compose.yml run --rm test
|
916
|
-
```
|
998
|
+
Running Tests:
|
917
999
|
|
918
|
-
|
919
|
-
- `$ bundle exec rake` Run with `SKIP_FLAKY_TESTS=true` to skip flaky tests (CI runs all tests)
|
1000
|
+
- `$ bundle exec rake` Run with `SKIP_FLAKY_TESTS=true` to skip flaky tests (CI runs all tests)
|
920
1001
|
|
921
1002
|
### Running tests in batches
|
922
1003
|
|
923
|
-
|
924
|
-
It uses to identify a total number of batches, that would be run in parallel.
|
925
|
-
|
1004
|
+
- _TEST_WORKERS_ - Total number of workers or batches.
|
1005
|
+
It is used to determine the total number of batches that run in parallel. _Default: 1_
|
1006
|
+
- _TEST_WORKER_NUM_ - Specify which batch to run. The value is between 1 and _TEST_WORKERS_. _Default: 1_
|
926
1007
|
|
927
1008
|
```shell
|
928
1009
|
$ bundle exec rake test:parallel TEST_WORKERS=5 TEST_WORKER_NUM=1
|
data/lib/semian/circuit_breaker.rb
CHANGED
@@ -17,7 +17,8 @@ module Semian
|
|
17
17
|
|
18
18
|
def initialize(name, exceptions:, success_threshold:, error_threshold:,
|
19
19
|
error_timeout:, implementation:, half_open_resource_timeout: nil,
|
20
|
-
error_threshold_timeout: nil, error_threshold_timeout_enabled: true
|
20
|
+
error_threshold_timeout: nil, error_threshold_timeout_enabled: true,
|
21
|
+
lumping_interval: 0)
|
21
22
|
@name = name.to_sym
|
22
23
|
@success_count_threshold = success_threshold
|
23
24
|
@error_count_threshold = error_threshold
|
@@ -26,6 +27,7 @@ module Semian
|
|
26
27
|
@error_timeout = error_timeout
|
27
28
|
@exceptions = exceptions
|
28
29
|
@half_open_resource_timeout = half_open_resource_timeout
|
30
|
+
@lumping_interval = lumping_interval
|
29
31
|
|
30
32
|
@errors = implementation::SlidingWindow.new(max_size: @error_count_threshold)
|
31
33
|
@successes = implementation::Integer.new
|
@@ -63,7 +65,6 @@ module Semian
|
|
63
65
|
|
64
66
|
def mark_failed(error)
|
65
67
|
push_error(error)
|
66
|
-
push_time
|
67
68
|
if closed?
|
68
69
|
transition_to_open if error_threshold_reached?
|
69
70
|
elsif half_open?
|
@@ -132,16 +133,16 @@ module Semian
|
|
132
133
|
end
|
133
134
|
|
134
135
|
def push_error(error)
|
135
|
-
@last_error = error
|
136
|
-
end
|
137
|
-
|
138
|
-
def push_time
|
139
136
|
time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
|
137
|
+
|
140
138
|
if error_threshold_timeout_enabled
|
141
139
|
@errors.reject! { |err_time| err_time + @error_threshold_timeout < time }
|
142
140
|
end
|
143
141
|
|
144
|
-
@errors
|
142
|
+
if @errors.empty? || @errors.last <= time - @lumping_interval
|
143
|
+
@last_error = error
|
144
|
+
@errors << time
|
145
|
+
end
|
145
146
|
end
|
146
147
|
|
147
148
|
def log_state_transition(new_state)
|
data/lib/semian/configuration_validator.rb
ADDED
@@ -0,0 +1,233 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
module Semian
|
4
|
+
class ConfigurationValidator
|
5
|
+
def initialize(name, configuration)
|
6
|
+
@name = name
|
7
|
+
@configuration = configuration
|
8
|
+
@adapter = configuration[:adapter]
|
9
|
+
@force_config_validation = force_config_validation?
|
10
|
+
|
11
|
+
unless @force_config_validation
|
12
|
+
Semian.logger.warn(
|
13
|
+
"Semian is running in log-mode for configuration validation. This means that Semian will not raise an error if the configuration is invalid. This is not recommended for production environments.\n\n[IMPORTANT] IN FUTURE RELEASES, STRICT CONFIGURATION VALIDATION WILL BE THE DEFAULT BEHAVIOR. PLEASE UPDATE YOUR CONFIGURATION TO USE `force_config_validation: true` TO ENABLE STRICT CONFIGURATION VALIDATION. ALLOWING MISCONFIGURATIONS IN FUTURE RELEASES WILL BREAK YOUR SEMIAN.\n---\n",
|
14
|
+
)
|
15
|
+
end
|
16
|
+
end
|
17
|
+
|
18
|
+
def validate!
|
19
|
+
validate_circuit_breaker_or_bulkhead!
|
20
|
+
validate_bulkhead_configuration!
|
21
|
+
validate_circuit_breaker_configuration!
|
22
|
+
validate_resource_name!
|
23
|
+
end
|
24
|
+
|
25
|
+
private
|
26
|
+
|
27
|
+
def hint_format(message)
|
28
|
+
"\n\nHINT: #{message}\n---"
|
29
|
+
end
|
30
|
+
|
31
|
+
def raise_or_log_validation_required!(message)
|
32
|
+
if @force_config_validation
|
33
|
+
raise ArgumentError, message
|
34
|
+
else
|
35
|
+
Semian.logger.warn("[SEMIAN_CONFIG_WARNING]: #{message}")
|
36
|
+
end
|
37
|
+
end
|
38
|
+
|
39
|
+
def require_keys!(required, options)
|
40
|
+
diff = required - options.keys
|
41
|
+
unless diff.empty?
|
42
|
+
raise_or_log_validation_required!("Missing required arguments for Semian: #{diff}")
|
43
|
+
end
|
44
|
+
end
|
45
|
+
|
46
|
+
def validate_circuit_breaker_or_bulkhead!
|
47
|
+
if (@configuration[:circuit_breaker] == false || ENV.key?("SEMIAN_CIRCUIT_BREAKER_DISABLED")) && (@configuration[:bulkhead] == false || ENV.key?("SEMIAN_BULKHEAD_DISABLED"))
|
48
|
+
raise_or_log_validation_required!("Both bulkhead and circuitbreaker cannot be disabled.")
|
49
|
+
end
|
50
|
+
end
|
51
|
+
|
52
|
+
def validate_bulkhead_configuration!
|
53
|
+
return if ENV.key?("SEMIAN_BULKHEAD_DISABLED")
|
54
|
+
return unless @configuration.fetch(:bulkhead, true)
|
55
|
+
|
56
|
+
tickets = @configuration[:tickets]
|
57
|
+
quota = @configuration[:quota]
|
58
|
+
|
59
|
+
if tickets.nil? && quota.nil?
|
60
|
+
raise_or_log_validation_required!("Bulkhead configuration require either the :tickets or :quota parameter, you provided neither")
|
61
|
+
end
|
62
|
+
|
63
|
+
if tickets && quota
|
64
|
+
raise_or_log_validation_required!("Bulkhead configuration require either the :tickets or :quota parameter, you provided both")
|
65
|
+
end
|
66
|
+
|
67
|
+
validate_quota!(quota) if quota
|
68
|
+
validate_tickets!(tickets) if tickets
|
69
|
+
end
|
70
|
+
|
71
|
+
def validate_circuit_breaker_configuration!
|
72
|
+
return if ENV.key?("SEMIAN_CIRCUIT_BREAKER_DISABLED")
|
73
|
+
return unless @configuration.fetch(:circuit_breaker, true)
|
74
|
+
|
75
|
+
require_keys!([:success_threshold, :error_threshold, :error_timeout], @configuration)
|
76
|
+
validate_thresholds!
|
77
|
+
validate_timeouts!
|
78
|
+
end
|
79
|
+
|
80
|
+
def validate_thresholds!
|
81
|
+
success_threshold = @configuration[:success_threshold]
|
82
|
+
error_threshold = @configuration[:error_threshold]
|
83
|
+
|
84
|
+
unless success_threshold.is_a?(Integer) && success_threshold > 0
|
85
|
+
err = "success_threshold must be a positive integer, got #{success_threshold}"
|
86
|
+
|
87
|
+
if success_threshold == 0
|
88
|
+
err += hint_format("Are you sure that this is what you want? This will close the circuit breaker immediately after `error_timeout` seconds without checking the resource!")
|
89
|
+
end
|
90
|
+
|
91
|
+
raise_or_log_validation_required!(err)
|
92
|
+
end
|
93
|
+
|
94
|
+
unless error_threshold.is_a?(Integer) && error_threshold > 0
|
95
|
+
err = "error_threshold must be a positive integer, got #{error_threshold}"
|
96
|
+
|
97
|
+
if error_threshold == 0
|
98
|
+
err += hint_format("Are you sure that this is what you want? This can result in the circuit opening up at unpredictable times!")
|
99
|
+
end
|
100
|
+
|
101
|
+
raise_or_log_validation_required!(err)
|
102
|
+
end
|
103
|
+
end
|
104
|
+
|
105
|
+
def validate_timeouts!
|
106
|
+
error_timeout = @configuration[:error_timeout]
|
107
|
+
error_threshold_timeout_enabled = @configuration[:error_threshold_timeout_enabled].nil? ? true : @configuration[:error_threshold_timeout_enabled]
|
108
|
+
error_threshold = @configuration[:error_threshold]
|
109
|
+
lumping_interval = @configuration[:lumping_interval]
|
110
|
+
half_open_resource_timeout = @configuration[:half_open_resource_timeout]
|
111
|
+
|
112
|
+
unless error_timeout.is_a?(Numeric) && error_timeout > 0
|
113
|
+
err = "error_timeout must be a positive number, got #{error_timeout}"
|
114
|
+
|
115
|
+
if error_timeout == 0
|
116
|
+
err += hint_format("Are you sure that this is what you want? This will close the circuit breaker immediately after opening it!")
|
117
|
+
end
|
118
|
+
|
119
|
+
raise_or_log_validation_required!(err)
|
120
|
+
end
|
121
|
+
|
122
|
+
# This state checks for contradictions between error_threshold_timeout_enabled and error_threshold_timeout.
|
123
|
+
unless error_threshold_timeout_enabled || !@configuration[:error_threshold_timeout]
|
124
|
+
err = "error_threshold_timeout_enabled and error_threshold_timeout must not contradict each other, got error_threshold_timeout_enabled: #{error_threshold_timeout_enabled}, error_threshold_timeout: #{@configuration[:error_threshold_timeout]}"
|
125
|
+
err += hint_format("Are you sure this is what you want? This will set error_threshold_timeout_enabled to #{error_threshold_timeout_enabled} while error_threshold_timeout is #{@configuration[:error_threshold_timeout] ? "truthy" : "falsy"}")
|
126
|
+
|
127
|
+
raise_or_log_validation_required!(err)
|
128
|
+
end
|
129
|
+
|
130
|
+
# Only set this after we have checked the error_threshold_timeout_enabled condition
|
131
|
+
error_threshold_timeout = @configuration[:error_threshold_timeout] || error_timeout
|
132
|
+
unless error_threshold_timeout.is_a?(Numeric) && error_threshold_timeout > 0
|
133
|
+
err = "error_threshold_timeout must be a positive number, got #{error_threshold_timeout}"
|
134
|
+
|
135
|
+
if error_threshold_timeout == 0
|
136
|
+
err += hint_format("Are you sure that this is what you want? This will almost never open the circuit breaker since the time interval to catch errors is 0!")
|
137
|
+
end
|
138
|
+
|
139
|
+
raise_or_log_validation_required!(err)
|
140
|
+
end
|
141
|
+
|
142
|
+
unless half_open_resource_timeout.nil? || (half_open_resource_timeout.is_a?(Numeric) && half_open_resource_timeout > 0)
|
143
|
+
err = "half_open_resource_timeout must be a positive number, got #{half_open_resource_timeout}"
|
144
|
+
|
145
|
+
if half_open_resource_timeout == 0
|
146
|
+
err += hint_format("Are you sure that this is what you want? This will never half-open the circuit breaker! If that's what you want, you can omit the option instead")
|
147
|
+
end
|
148
|
+
|
149
|
+
raise_or_log_validation_required!(err)
|
150
|
+
end
|
151
|
+
|
152
|
+
unless lumping_interval.nil? || (lumping_interval.is_a?(Numeric) && lumping_interval > 0)
|
153
|
+
err = "lumping_interval must be a positive number, got #{lumping_interval}"
|
154
|
+
|
155
|
+
if lumping_interval == 0
|
156
|
+
err += hint_format("Are you sure that this is what you want? This will never lump errors! If that's what you want, you can omit the option instead")
|
157
|
+
end
|
158
|
+
|
159
|
+
raise_or_log_validation_required!(err)
|
160
|
+
end
|
161
|
+
|
162
|
+
# You might be wondering why we do not just check lumping_interval * error_threshold <= error_threshold_timeout
|
163
|
+
# The reason is that, since the lumping_interval starts at the first error, we count the first error
|
164
|
+
# at second 0. So we need to subtract 1 from the error_threshold to get the correct minimum time to reach the
|
165
|
+
# error threshold. error_threshold_timeout cannot be less than this minimum time.
|
166
|
+
#
|
167
|
+
# For example,
|
168
|
+
#
|
169
|
+
# error_threshold = 3
|
170
|
+
# error_threshold_timeout = 10
|
171
|
+
# lumping_interval = 4
|
172
|
+
#
|
173
|
+
# The first error could be counted at second 0, the second error could be counted at second 4, and the third
|
174
|
+
# error could be counted at second 8. So this is a valid configuration.
|
175
|
+
|
176
|
+
unless lumping_interval.nil? || error_threshold_timeout.nil? || lumping_interval * (error_threshold - 1) <= error_threshold_timeout
|
177
|
+
err = "constraint violated, this circuit breaker can never open! lumping_interval * (error_threshold - 1) should be <= error_threshold_timeout, got lumping_interval: #{lumping_interval}, error_threshold: #{error_threshold}, error_threshold_timeout: #{error_threshold_timeout}"
|
178
|
+
err += hint_format("lumping_interval starts from the first error and not in a fixed window. So you can fit n errors in n-1 seconds, since error 0 starts at 0 seconds. Ensure that you can fit `error_threshold` errors lumped in `lumping_interval` seconds within `error_threshold_timeout` seconds.")
|
179
|
+
|
180
|
+
raise_or_log_validation_required!(err)
|
181
|
+
end
|
182
|
+
end
|
183
|
+
+    def validate_quota!(quota)
+      unless quota.is_a?(Numeric) && quota > 0 && quota < 1
+        err = "quota must be a decimal between 0 and 1, got #{quota}"
+
+        if quota == 0
+          err += hint_format("Are you sure that this is what you want? This is the same as assigning no workers to the resource, disabling the resource!")
+        elsif quota == 1
+          err += hint_format("Are you sure that this is what you want? This is the same as assigning all workers to the resource, disabling the bulkhead!")
+        end
+
+        raise_or_log_validation_required!(err)
+      end
+    end
+
+    def validate_tickets!(tickets)
+      unless tickets.is_a?(Integer) && tickets > 0 && tickets < Semian::MAX_TICKETS
+        err = "ticket count must be a positive integer and less than #{Semian::MAX_TICKETS}, got #{tickets}"
+
+        if tickets == 0
+          err += hint_format("Are you sure that this is what you want? This is the same as assigning no workers to the resource, disabling the resource!")
+        elsif tickets == Semian::MAX_TICKETS
+          err += hint_format("Are you sure that this is what you want? This is the same as assigning all workers to the resource, disabling the bulkhead!")
+        end
+
+        raise_or_log_validation_required!(err)
+      end
+    end
+
+    def validate_resource_name!
+      unless @name.is_a?(String) || @name.is_a?(Symbol)
+        raise_or_log_validation_required!("name must be a symbol or string, got #{@name}")
+      end
+
+      if Semian.resources[@name]
+        err = "Resource with name #{@name} is already registered"
+        err += hint_format("Are you sure that this is what you want? This will override an existing resource with the same name!")
+
+        raise_or_log_validation_required!(err)
+      end
+    end
+
+    def force_config_validation?
+      if @configuration[:force_config_validation].nil?
+        Semian.default_force_config_validation
+      else
+        @configuration[:force_config_validation]
+      end
+    end
+  end
+end
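The comment block in the validator above explains why the constraint uses `error_threshold - 1` rather than `error_threshold`. A minimal sketch of that arithmetic follows. The helper name `breaker_can_open?` is hypothetical and is not part of Semian; only the inequality itself is taken from the diff.

```ruby
# Hypothetical helper (not Semian API): restates the constraint from the diff,
#   lumping_interval * (error_threshold - 1) <= error_threshold_timeout
def breaker_can_open?(error_threshold:, error_threshold_timeout:, lumping_interval:)
  # The first error is counted at second 0; each later error can be counted
  # no sooner than lumping_interval seconds after the previous one.
  earliest_last_error = lumping_interval * (error_threshold - 1)
  earliest_last_error <= error_threshold_timeout
end

# Values from the comment in the diff: errors can land at seconds 0, 4 and 8.
valid = breaker_can_open?(error_threshold: 3, error_threshold_timeout: 10, lumping_interval: 4)

# Illustrative counter-example: the third error cannot be counted before second 12.
invalid = breaker_can_open?(error_threshold: 3, error_threshold_timeout: 10, lumping_interval: 6)

puts "valid=#{valid} invalid=#{invalid}"
```

With the naive check `lumping_interval * error_threshold`, the first configuration (4 * 3 = 12 > 10) would be rejected even though three errors fit inside the 10-second window.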
data/lib/semian/lru_hash.rb
CHANGED
data/lib/semian/version.rb
CHANGED
data/lib/semian.rb
CHANGED
@@ -16,6 +16,7 @@ require "semian/simple_sliding_window"
 require "semian/simple_integer"
 require "semian/simple_state"
 require "semian/lru_hash"
+require "semian/configuration_validator"

 #
 # === Overview
@@ -102,11 +103,12 @@ module Semian
   OpenCircuitError = Class.new(BaseError)
   SemaphoreMissingError = Class.new(BaseError)

-  attr_accessor :maximum_lru_size, :minimum_lru_time, :default_permissions, :namespace
+  attr_accessor :maximum_lru_size, :minimum_lru_time, :default_permissions, :namespace, :default_force_config_validation

   self.maximum_lru_size = 500
   self.minimum_lru_time = 300 # 300 seconds / 5 minutes
   self.default_permissions = 0660
+  self.default_force_config_validation = false

   def issue_disabled_semaphores_warning
     return if defined?(@warning_issued)
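The new `default_force_config_validation` accessor pairs with the `force_config_validation?` lookup in `configuration_validator.rb`, which falls back to the global default only when the per-resource option is `nil`. The sketch below illustrates why a `nil` check is used instead of `||`. The function name `resolve_force_validation` is hypothetical and stands in for the lookup; it is not Semian's API.

```ruby
# Hypothetical stand-in for the per-resource / global lookup (not Semian API).
def resolve_force_validation(resource_options, global_default)
  value = resource_options[:force_config_validation]
  # Only a missing (nil) option falls back to the global default,
  # so an explicit `false` on a resource is still respected.
  value.nil? ? global_default : value
end

unset     = resolve_force_validation({}, true)                                   # falls back to the global value
opted_out = resolve_force_validation({ force_config_validation: false }, true)   # explicit false wins
opted_in  = resolve_force_validation({ force_config_validation: true }, false)   # explicit true wins

puts [unset, opted_out, opted_in].inspect
```

Writing the lookup as `resource_options[:force_config_validation] || global_default` would turn the explicit `false` in the second call back into `true`, which is presumably what the `nil?` branch in the diff avoids.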
@@ -184,13 +186,12 @@ module Semian
   def register(name, **options)
     return UnprotectedResource.new(name) if ENV.key?("SEMIAN_DISABLED")

+    # Validate configuration before proceeding
+    ConfigurationValidator.new(name, options).validate!
+
     circuit_breaker = create_circuit_breaker(name, **options)
     bulkhead = create_bulkhead(name, **options)

-    if circuit_breaker.nil? && bulkhead.nil?
-      raise ArgumentError, "Both bulkhead and circuitbreaker cannot be disabled."
-    end
-
     resources[name] = ProtectedResource.new(name, bulkhead, circuit_breaker)
   end

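`register` now runs the validator before building the circuit breaker and bulkhead. The body of `raise_or_log_validation_required!` is not part of this excerpt, so the sketch below assumes, from its name only, that it raises when validation is forced and logs a warning otherwise. `QuotaCheckSketch` and its `report` method are hypothetical names for illustration, not Semian code; only the quota bounds and the error message are copied from the diff.

```ruby
require "logger"
require "stringio"

# Hypothetical sketch of a validate-then-raise-or-log flow (not Semian's class).
class QuotaCheckSketch
  def initialize(options, logger:)
    @options = options
    @logger = logger
  end

  def validate!
    quota = @options[:quota]
    return if quota.nil?
    # Same bounds as validate_quota! in the diff: strictly between 0 and 1.
    return if quota.is_a?(Numeric) && quota > 0 && quota < 1

    report("quota must be a decimal between 0 and 1, got #{quota}")
  end

  private

  # Assumed behavior: strict mode raises, lenient mode only warns.
  def report(message)
    raise ArgumentError, message if @options[:force_config_validation]

    @logger.warn(message)
  end
end

log = StringIO.new

# Lenient mode: the bad quota is logged and registration could continue.
QuotaCheckSketch.new({ quota: 1.5 }, logger: Logger.new(log)).validate!

# Strict mode: the same configuration raises before anything is created.
begin
  QuotaCheckSketch.new({ quota: 1.5, force_config_validation: true }, logger: Logger.new(log)).validate!
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Under this assumption, existing callers with questionable settings keep working with a warning by default, since `default_force_config_validation` is `false` in this release.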
@@ -296,8 +297,6 @@ module Semian
     return if ENV.key?("SEMIAN_CIRCUIT_BREAKER_DISABLED")
     return unless options.fetch(:circuit_breaker, true)

-    require_keys!([:success_threshold, :error_threshold, :error_timeout], options)
-
     exceptions = options[:exceptions] || []
     CircuitBreaker.new(
       name,
@@ -310,6 +309,11 @@ module Semian
       else
         options[:error_threshold_timeout_enabled]
       end,
+      lumping_interval: if options[:lumping_interval].nil?
+        0
+      else
+        options[:lumping_interval]
+      end,
       exceptions: Array(exceptions) + [::Semian::BaseError],
       half_open_resource_timeout: options[:half_open_resource_timeout],
       implementation: implementation(**options),
@@ -346,13 +350,6 @@ module Semian
       timeout: timeout,
     )
   end
-
-  def require_keys!(required, options)
-    diff = required - options.keys
-    unless diff.empty?
-      raise ArgumentError, "Missing required arguments for Semian: #{diff}"
-    end
-  end
 end

 if Semian.semaphores_enabled?
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: semian
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.25.0
 platform: ruby
 authors:
 - Scott Francis
@@ -36,6 +36,7 @@ files:
 - lib/semian/activerecord_trilogy_adapter.rb
 - lib/semian/adapter.rb
 - lib/semian/circuit_breaker.rb
+- lib/semian/configuration_validator.rb
 - lib/semian/grpc.rb
 - lib/semian/instrumentable.rb
 - lib/semian/lru_hash.rb
@@ -77,7 +78,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.7.1
 specification_version: 4
 summary: Bulkheading for Ruby with SysV semaphores
 test_files: []