semian 0.23.0 → 0.25.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 0b38abe4839593005920d41c13352540c219f93d7995f13ddf3f49cf7b51bdb8
4
- data.tar.gz: 5f929e537cfb7f3a8b65c9fc3b43565d8891ad7f0cb30db1f72c633d89343cf1
3
+ metadata.gz: c46f4408d72d0fe86b9d1199429942007e48dc5a43dc75dc8f361d3d85e369c9
4
+ data.tar.gz: c297a18a78e2fc02e3c3823c6488413bc6170d8bd718acdcc8995c96bf68b070
5
5
  SHA512:
6
- metadata.gz: '09901480b2863949adda640b361d9026988e77fb36656593569f54bcf46651dad00c793a1d201111479122adaa0f584e33606811ef2ec35ff924ba6fd518e395'
7
- data.tar.gz: 4e8cd13ce2bb3844669befd3610893161f2e97652871856302e1661745f70a56bf6a43c77c96b3cd49024ec93cddbe504e5dfc64ccdf8d63c52eb86bb81ffd4a
6
+ metadata.gz: 115fe761cbc5e3cf7bdaabb2e28ae04c1ac594d9bad3b69b8c63c996d9b52bf2306e6a3de6892a65b069ed4033f0f53ca09f215b19ebde54e4616059bbb6784d
7
+ data.tar.gz: 94b77f1de5c869ce44384dc4e6b5fb2acbde52f7074a786130b52b48185ff367cd19541da1ecc298828fda06c679f431fe6fd11c7eee9565b882a6fe1432cb7c
data/README.md CHANGED
@@ -15,15 +15,15 @@ allowing you to handle errors gracefully.** Semian does this by intercepting
15
15
  resource access through heuristic patterns inspired by [Hystrix][hystrix] and
16
16
  [Release It][release-it]:
17
17
 
18
- * [**Circuit breaker**](#circuit-breaker). A pattern for limiting the
18
+ - [**Circuit breaker**](#circuit-breaker). A pattern for limiting the
19
19
  amount of requests to a dependency that is having issues.
20
- * [**Bulkheading**](#bulkheading). Controlling the concurrent access to
20
+ - [**Bulkheading**](#bulkheading). Controlling the concurrent access to
21
21
  a single resource, access is coordinated server-wide with [SysV
22
22
  semaphores][sysv].
23
23
 
24
24
  Resource drivers are monkey-patched to be aware of Semian, these are called
25
25
  [Semian Adapters](#adapters). Thus, every time resource access is requested
26
- Semian is queried for status on the resource first. If Semian, through the
26
+ Semian is queried for status on the resource first. If Semian, through the
27
27
  patterns above, deems the resource to be unavailable it will raise an exception.
28
28
  **The ultimate outcome of Semian is always an exception that can then be rescued
29
29
  for a graceful fallback**. Instead of waiting for the timeout, Semian raises
@@ -60,7 +60,7 @@ section](#configuration) on how to configure adapters.
60
60
 
61
61
  Semian works by intercepting resource access. Every time access is requested,
62
62
  Semian is queried, and it will raise an exception if the resource is unavailable
63
- according to the circuit breaker or bulkheads. This is done by monkey-patching
63
+ according to the circuit breaker or bulkheads. This is done by monkey-patching
64
64
  the resource driver. **The exception raised by the driver always inherits from
65
65
  the Base exception class of the driver**, meaning you can always simply rescue
66
66
  the base class and catch both Semian and driver errors in the same rescue for
@@ -69,11 +69,11 @@ fallbacks.
69
69
  The following adapters are in Semian and tested heavily in production, the
70
70
  version is the version of the public gem with the same name:
71
71
 
72
- * [`semian/mysql2`][mysql-semian-adapter] (~> 0.3.16)
73
- * [`semian/redis`][redis-semian-adapter] (~> 3.2.1)
74
- * [`semian/net_http`][nethttp-semian-adapter]
75
- * [`semian/activerecord_trilogy_adapter`][activerecord-trilogy-semian-adapter]
76
- * [`semian-postgres`][postgres-semian-adapter]
72
+ - [`semian/mysql2`][mysql-semian-adapter] (~> 0.3.16)
73
+ - [`semian/redis`][redis-semian-adapter] (~> 3.2.1)
74
+ - [`semian/net_http`][nethttp-semian-adapter]
75
+ - [`semian/activerecord_trilogy_adapter`][activerecord-trilogy-semian-adapter]
76
+ - [`semian-postgres`][postgres-semian-adapter]
77
77
 
78
78
  ### Creating Adapters
79
79
 
@@ -113,6 +113,10 @@ Semian.maximum_lru_size = 0
113
113
 
114
114
  # Minimum time in seconds a resource should be resident in the LRU cache (default: 300s)
115
115
  Semian.minimum_lru_time = 60
116
+
117
+ # If true, raise exceptions in case of a validation / constraint failure
118
+ # Otherwise, log in output
119
+ Semian.default_force_config_validation = false
116
120
  ```
117
121
 
118
122
  Note: `minimum_lru_time` is a stronger guarantee than `maximum_lru_size`. That
@@ -120,6 +124,10 @@ is, if a resource has been updated more recently than `minimum_lru_time` it
120
124
  will not be garbage collected, even if it would cause the LRU cache to grow
121
125
  larger than `maximum_lru_size`.
122
126
 
127
+ Note: `default_force_config_validation` set to `true` is a
128
+ **_potentially breaking change_**. Misconfigured Semians will raise errors, so
129
+ make sure that this is what you want. See more in [Configuration Validation](#configuration-validation).
130
+
123
131
  When instantiating a resource it now needs to be configured for Semian. This is
124
132
  done by passing `semian` as an argument when initializing the client. Examples
125
133
  built in adapters:
@@ -132,7 +140,8 @@ client = Mysql2::Client.new(host: "localhost", username: "root", semian: {
132
140
  tickets: 8, # See the Understanding Semian section on picking these values
133
141
  success_threshold: 2,
134
142
  error_threshold: 3,
135
- error_timeout: 10
143
+ error_timeout: 10,
144
+ force_config_validation: false
136
145
  })
137
146
 
138
147
  # Redis client
@@ -145,6 +154,32 @@ client = Redis.new(semian: {
145
154
  })
146
155
  ```
147
156
 
157
+ #### Configuration Validation
158
+
159
+ Semian now provides a flag to specify log-based and exception-based configuration validation. To
160
+ explicitly force Semian to validate its configuration, pass `force_config_validation: true`
161
+ into your resource. This will raise an error in the case of a misconfigured or illegal Semian. Otherwise,
162
+ if it is set to `false`, it will log misconfigured parameters verbosely in output.
163
+
164
+ If not specified, it will use `Semian.default_force_config_validation` as
165
+ the flag.
166
+
167
+ ##### Migration Strategy for Force Config Validation
168
+
169
+ When migrating to use `force_config_validation: true`, follow these steps:
170
+
171
+ 1. **Deploy with it turned off**: Start with `force_config_validation: false` in your configuration
172
+ 2. **Look for logs with prefix**: Monitor your application logs for entries with the `[SEMIAN_CONFIG_WARNING]:` prefix. These logs will indicate misconfigured Semian resources
173
+ 3. **Iterate to fix**: Address each configuration issue identified in the logs by updating your Semian configurations
174
+ 4. **Enable**: Once all configuration issues are resolved, set `force_config_validation: true` to enable strict validation
175
+
176
+ Example log entries to look for:
177
+ ```
178
+ [SEMIAN_CONFIG_WARNING]: Missing required arguments for Semian: [:success_threshold, :error_threshold, :error_timeout]
179
+ [SEMIAN_CONFIG_WARNING]: Both bulkhead and circuitbreaker cannot be disabled.
180
+ [SEMIAN_CONFIG_WARNING]: Bulkhead configuration require either the :tickets or :quota parameter, you provided neither
181
+ ```
182
+
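As a rough illustration of the two validation modes described above, the sketch below mimics the raise-or-log decision with a stand-in class. `ValidationSketch` and `check_bulkhead!` are hypothetical names made up for this example and are not Semian's API; only the `[SEMIAN_CONFIG_WARNING]:` prefix, the use of `ArgumentError`, and the message text are taken from this diff.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for illustration; not Semian's actual class.
class ValidationSketch
  def initialize(force_config_validation:, logger:)
    @force = force_config_validation
    @logger = logger
  end

  # Reports a bulkhead config that has neither :tickets nor :quota.
  def check_bulkhead!(config)
    return unless config[:tickets].nil? && config[:quota].nil?

    report("Bulkhead configuration require either the :tickets or :quota parameter, you provided neither")
  end

  private

  # Strict mode raises; lenient mode logs with the warning prefix.
  def report(message)
    raise ArgumentError, message if @force

    @logger.warn("[SEMIAN_CONFIG_WARNING]: #{message}")
  end
end

# Lenient mode: the problem is logged and execution continues.
log_output = StringIO.new
lenient = ValidationSketch.new(force_config_validation: false, logger: Logger.new(log_output))
lenient.check_bulkhead!({})

# Strict mode: the same problem raises ArgumentError.
strict = ValidationSketch.new(force_config_validation: true, logger: Logger.new(StringIO.new))
strict_error = begin
  strict.check_bulkhead!({})
  nil
rescue ArgumentError => e
  e
end
```

Running in lenient mode first and searching the captured log for the prefix corresponds to steps 1 and 2 of the migration strategy; switching the flag to `true` corresponds to step 4.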
148
183
  #### Thread Safety
149
184
 
150
185
  Semian's circuit breaker implementation is thread-safe by default as of
@@ -158,27 +193,27 @@ should be adequate in most environments with reasonably low timeouts.
158
193
 
159
194
  Internally, semian uses `SEM_UNDO` for several sysv semaphore operations:
160
195
 
161
- * Acquire
162
- * Worker registration
163
- * Semaphore metadata state lock
196
+ - Acquire
197
+ - Worker registration
198
+ - Semaphore metadata state lock
164
199
 
165
200
  The intention behind `SEM_UNDO` is that a semaphore operation is automatically undone when the process exits. This
166
201
  is true even if the process exits abnormally - crashes, receives a `SIGKILL`, etc, because it is handled by
167
202
  the operating system and not the process itself.
168
203
 
169
204
  If, however, a thread performs a semop, the `SEM_UNDO` is on its parent process. This means that the operation
170
- *will not* be undone when the thread exits. This can result in the following unfavorable behavior when using
205
+ _will not_ be undone when the thread exits. This can result in the following unfavorable behavior when using
171
206
  threads:
172
207
 
173
- * Threads acquire a resource, but are killed and the resource ticket is never released. For a process, the
174
- ticket would be released by `SEM_UNDO`, but since it's a thread there is the potential for ticket starvation.
175
- This can result in deadlock on the resource.
176
- * Threads that register workers on a resource but are killed and never unregistered. For a process, the worker
177
- count would be automatically decremented by `SEM_UNDO`, but for threads the worker count will continue to increment,
178
- only being undone when the parent process dies. This can cause the number of tickets to dramatically exceed the quota.
179
- * If a thread acquires the semaphore metadata lock and dies before releasing it, semian will deadlock on anything
180
- attempting to acquire the metadata lock until the thread's parent process exits. This can prevent the ticket count
181
- from being updated.
208
+ - Threads acquire a resource, but are killed and the resource ticket is never released. For a process, the
209
+ ticket would be released by `SEM_UNDO`, but since it's a thread there is the potential for ticket starvation.
210
+ This can result in deadlock on the resource.
211
+ - Threads that register workers on a resource but are killed and never unregistered. For a process, the worker
212
+ count would be automatically decremented by `SEM_UNDO`, but for threads the worker count will continue to increment,
213
+ only being undone when the parent process dies. This can cause the number of tickets to dramatically exceed the quota.
214
+ - If a thread acquires the semaphore metadata lock and dies before releasing it, semian will deadlock on anything
215
+ attempting to acquire the metadata lock until the thread's parent process exits. This can prevent the ticket count
216
+ from being updated.
182
217
 
183
218
  Moreover, a strategy that utilizes `SEM_UNDO` is not compatible with a strategy that attempts to manage the semaphore tickets manually.
184
219
  In order to support threads, operations that currently use `SEM_UNDO` would need to use no semaphore flag, and the calling process
@@ -214,17 +249,19 @@ calculate and adjust ticket counts.
214
249
 
215
250
  - You must pass **exactly** one of options: `tickets` or `quota`.
216
251
  - Tickets available will be the ceiling of the quota ratio to the number of workers
217
- - So, with one worker, there will always be a minimum of 1 ticket
252
+ - So, with one worker, there will always be a minimum of 1 ticket
218
253
  - Workers in different processes will automatically unregister when the process exits.
219
254
  - If you have a small number of workers (exactly 2) it's possible that the bulkhead will be too sensitive using quotas.
220
255
  - If you use a forking web server (like unicorn) you should call `Semian.unregister_all_resources` before/after forking.
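To make the quota arithmetic in the list above concrete, here is a minimal sketch. It assumes tickets are computed as the ceiling of `quota * workers`, which is how the second bullet reads; `tickets_for` is a hypothetical helper and not part of Semian.

```ruby
# Hypothetical helper: ceiling of the quota ratio times the registered workers.
def tickets_for(quota, workers)
  (quota * workers).ceil
end

one_worker   = tickets_for(0.5, 1)   # a single worker still gets 1 ticket
two_workers  = tickets_for(0.5, 2)   # two workers share 1 ticket
ten_workers  = tickets_for(0.5, 10)  # ten workers share 5 tickets
```

With exactly two workers and a quota of 0.5 only one ticket exists, which illustrates why the bulkhead can feel too sensitive at very small worker counts.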
221
256
 
222
257
  #### Net::HTTP
258
+
223
259
  For the `Net::HTTP` specific Semian adapter, since many external libraries may create
224
260
  HTTP connections on the user's behalf, the parameters are instead provided
225
261
  by associating callback functions with `Semian::NetHTTP`, perhaps in an initialization file.
226
262
 
227
263
  ##### Naming and Options
264
+
228
265
  To give Semian parameters, assign a `proc` to `Semian::NetHTTP.semian_configuration`
229
266
  that takes two parameters, `host` and `port` like `127.0.0.1`,`443` or `github_com`,`80`,
230
267
  and returns a `Hash` with configuration parameters as follows. The `proc` is used as a
@@ -282,11 +319,11 @@ Semian::NetHTTP.semian_configuration = proc do |host, port|
282
319
  SEMIAN_PARAMETERS.merge(name: name)
283
320
  end
284
321
 
285
- # Two requests to example.com can use two different semian resources,
322
+ # Two requests to shopify.com can use two different semian resources,
286
323
  # as long as `CurrentSemianSubResource.sub_name` is set accordingly:
287
- # CurrentSemianSubResource.set(sub_name: "sub_resource_1") { Net::HTTP.get_response(URI("http://example.com")) }
324
+ # CurrentSemianSubResource.set(sub_name: "sub_resource_1") { Net::HTTP.get_response(URI("http://shopify.com")) }
288
325
  # and:
289
- # CurrentSemianSubResource.set(sub_name: "sub_resource_2") { Net::HTTP.get_response(URI("http://example.com")) }
326
+ # CurrentSemianSubResource.set(sub_name: "sub_resource_2") { Net::HTTP.get_response(URI("http://shopify.com")) }
290
327
  ```
291
328
 
292
329
  For most purposes, `"#{host}_#{port}"` is a good default `name`. Custom `name` formats
@@ -300,6 +337,7 @@ behavior can be changed to blacklisting or even be completely disabled by varyin
300
337
  the use of returning `nil` in the assigned closure.
301
338
 
302
339
  ##### Additional Exceptions
340
+
303
341
  Since we envision this particular adapter can be used in combination with many
304
342
  external libraries, that can raise additional exceptions, we added functionality to
305
343
  expand the Exceptions that can be tracked as part of Semian's circuit breaker.
@@ -513,22 +551,23 @@ all workers on a server.
513
551
 
514
552
  There are four configuration parameters for circuit breakers in Semian:
515
553
 
516
- * **circuit_breaker**. Enable or Disable Circuit Breaker. Defaults to `true` if not set.
517
- * **error_threshold**. The amount of errors a worker encounters within `error_threshold_timeout`
554
+ - **circuit_breaker**. Enable or Disable Circuit Breaker. Defaults to `true` if not set.
555
+ - **error_threshold**. The amount of errors a worker encounters within `error_threshold_timeout`
518
556
  amount of time before opening the circuit,
519
557
  that is to start rejecting requests instantly.
520
- * **error_threshold_timeout**. The amount of time in seconds that `error_threshold`
558
+ - **error_threshold_timeout**. The amount of time in seconds that `error_threshold`
521
559
  errors must occur to open the circuit.
522
560
  Defaults to `error_timeout` seconds if not set.
523
- * **error_timeout**. The amount of time in seconds until trying to query the resource
561
+ - **error_timeout**. The amount of time in seconds until trying to query the resource
524
562
  again.
525
- * **error_threshold_timeout_enabled**. If set to false it will disable
563
+ - **error_threshold_timeout_enabled**. If set to false it will disable
526
564
  the time window for evicting old exceptions. `error_timeout` is still used and
527
565
  will reset the circuit. Defaults to `true` if not set.
528
- * **success_threshold**. The amount of successes on the circuit until closing it
566
+ - **success_threshold**. The amount of successes on the circuit until closing it
529
567
  again, that is to start accepting all requests to the circuit.
530
- * **half_open_resource_timeout**. Timeout for the resource in seconds when
568
+ - **half_open_resource_timeout**. Timeout for the resource in seconds when
531
569
  the circuit is half-open (supported for MySQL, Net::HTTP and Redis).
570
+ - **lumping_interval**. If provided, errors within this timeframe (in seconds) will be lumped and recorded as one.
532
571
 
533
572
  It is possible to disable Circuit Breaker with environment variable
534
573
  `SEMIAN_CIRCUIT_BREAKER_DISABLED=1`.
@@ -587,13 +626,13 @@ graph TD;
587
626
  ReleaseTicket[Release Ticket]
588
627
  FailRequest[Fail Request]
589
628
  OpenCircuit[Open Circuit Breaker]
590
-
629
+
591
630
  Start --> CheckConnection
592
631
  CheckConnection -->|Ticket Available| AllocateTicket
593
632
  AllocateTicket --> AccessResource
594
633
  AccessResource --> ReleaseTicket
595
634
  ReleaseTicket --> CheckConnection
596
-
635
+
597
636
  CheckConnection -->|No Ticket Available| BlockTimeout
598
637
  BlockTimeout -->|Timeout| FailRequest
599
638
  BlockTimeout -->|Ticket Available| AccessResource
@@ -614,9 +653,9 @@ still experimenting with ways to figure out optimal ticket numbers. Generally
614
653
  something below half the number of workers on the server for endpoints that are
615
654
  queried frequently has worked well for us.
616
655
 
617
- * **bulkhead**. Enable or Disable Bulkhead. Defaults to `true` if not set.
618
- * **tickets**. Number of workers that can concurrently access a resource.
619
- * **timeout**. Time to wait in seconds to acquire a ticket if there are no tickets left.
656
+ - **bulkhead**. Enable or Disable Bulkhead. Defaults to `true` if not set.
657
+ - **tickets**. Number of workers that can concurrently access a resource.
658
+ - **timeout**. Time to wait in seconds to acquire a ticket if there are no tickets left.
620
659
  We recommend this to be `0` unless you have very few workers running (i.e.
621
660
  less than ~5).
622
661
 
@@ -626,11 +665,11 @@ It is possible to disable Bulkhead with environment variable
626
665
  Note that there are system-wide limitations on how many tickets can be allocated
627
666
  on a system. `cat /proc/sys/kernel/sem` will tell you.
628
667
 
629
- > System-wide limit on the number of semaphore sets. On Linux
630
- systems before version 3.19, the default value for this limit
631
- was 128. Since Linux 3.19, the default value is 32,000. On
632
- Linux, this limit can be read and modified via the fourth
633
- field of `/proc/sys/kernel/sem`.
668
+ > System-wide limit on the number of semaphore sets. On Linux
669
+ > systems before version 3.19, the default value for this limit
670
+ > was 128. Since Linux 3.19, the default value is 32,000. On
671
+ > Linux, this limit can be read and modified via the fourth
672
+ > field of `/proc/sys/kernel/sem`.
634
673
 
635
674
  #### Bulkhead debugging on linux
636
675
 
@@ -668,10 +707,10 @@ semnum value ncount zcount pid
668
707
  In the above example, we can see each of the semaphores. Looking at the enum code
669
708
  in `ext/semian/sysv_semaphores.h` we can see that:
670
709
 
671
- * 0: is the semian meta lock (mutex) protecting updates to the other resources. It's currently free
672
- * 1: is the number of available tickets - currently no tickets are in use because it's the same as 2
673
- * 2: is the configured (maximum) number of tickets
674
- * 3: is the number of registered workers (processes) that would be considered if using the quota strategy.
710
+ - 0: is the semian meta lock (mutex) protecting updates to the other resources. It's currently free
711
+ - 1: is the number of available tickets - currently no tickets are in use because it's the same as 2
712
+ - 2: is the configured (maximum) number of tickets
713
+ - 3: is the number of registered workers (processes) that would be considered if using the quota strategy.
675
714
 
676
715
  ## Defense line
677
716
 
@@ -884,45 +923,87 @@ $ cd semian
884
923
  ```
885
924
 
886
925
  ## Visual Studio Code
887
- - Open semian in vscode
888
- - Install recommended extensions (one off requirement)
889
- - Click `reopen in container` (first boot might take about a minute)
890
926
 
891
- See https://code.visualstudio.com/docs/remote/containers for more details
927
+ - Open semian in vscode
928
+ - Install recommended extensions (one off requirement)
929
+ - Click `reopen in container` (first boot might take about a minute)
930
+
931
+ See https://code.visualstudio.com/docs/remote/containers for more details
932
+
933
+ If you make any changes to `.devcontainer/` you'd need to recreate the containers:
892
934
 
935
+ - Select `Rebuild Container` from the command palette
893
936
 
894
- If you make any changes to `.devcontainer/` you'd need to recreate the containers:
937
+ Running Tests:
895
938
 
896
- - Select `Rebuild Container` from the command palette
939
+ - `$ bundle exec rake` Run with `SKIP_FLAKY_TESTS=true` to skip flaky tests (CI runs all tests)
897
940
 
941
+ ### Interactive Test Debugging
898
942
 
899
- Running Tests:
900
- - `$ bundle exec rake` Run with `SKIP_FLAKY_TESTS=true` to skip flaky tests (CI runs all tests)
943
+ To use the interactive debugger on vscode:
944
+ - Open semian in vscode
945
+ - Create an `.env` file (if it doesn't exist)
946
+ - Set up a `DEBUG` ENV variable (e.g. `DEBUG=true`)
947
+ - Under the `.vscode/` subdirectory, create a `launch.json` file, and include the following:
948
+
949
+ ```json
950
+ {
951
+ "configurations": [
952
+ {
953
+ "type": "rdbg",
954
+ "name": "Attach to Ruby rdbg",
955
+ "request": "attach",
956
+ "debugPort": "12345"
957
+ }
958
+ ]
959
+ }
960
+ ```
961
+
962
+ - For universal support, for any lines you would like to add breakpoints to in your `_test.rb` file (under `test/`), include the following snippet near the line of interest:
963
+
964
+ ```rb
965
+ require "debug"
966
+ binding.break if ENV["DEBUG"]
967
+ ```
968
+
969
+ **Note:** unless you are using a vscode extension such as [Dev Container](https://code.visualstudio.com/docs/devcontainers/tutorial), **do not use the built-in vscode breakpoints -- they will not work!**
970
+
971
+ - Start up the test container
972
+
973
+ ```shell
974
+ $ docker-compose -f .devcontainer/docker-compose.yml --profile test up -d
975
+ ```
976
+
977
+ - When the process indicates that it is waiting for the debugger connection, go to the `Run and Debug` tab, and execute the `Attach to Ruby rdbg` debugger
978
+
979
+ - Use the vscode debugging tools (such as step in, step out, pause, resume) as normal
901
980
 
902
981
  ## Everything else
903
982
 
904
- Test semian in containers:
905
- - `$ docker-compose -f .devcontainer/docker-compose.yml up -d`
906
- - `$ docker exec -it semian bash`
983
+ Test semian in containers:
907
984
 
908
- If you make any changes to `.devcontainer/` you'd need to recreate the containers:
985
+ - `$ docker-compose -f .devcontainer/docker-compose.yml up -d`
986
+ - `$ docker exec -it semian bash`
909
987
 
910
- - `$ docker-compose -f .devcontainer/docker-compose.yml up -d --force-recreate`
988
+ If you make any changes to `.devcontainer/` you'd need to recreate the containers:
911
989
 
912
- Run tests in containers:
990
+ - `$ docker-compose -f .devcontainer/docker-compose.yml up -d --force-recreate`
991
+
992
+ Run tests in containers:
993
+
994
+ ```shell
995
+ $ docker-compose -f ./.devcontainer/docker-compose.yml --profile test run --rm test
996
+ ```
913
997
 
914
- ```shell
915
- $ docker-compose -f ./.devcontainer/docker-compose.yml run --rm test
916
- ```
998
+ Running Tests:
917
999
 
918
- Running Tests:
919
- - `$ bundle exec rake` Run with `SKIP_FLAKY_TESTS=true` to skip flaky tests (CI runs all tests)
1000
+ - `$ bundle exec rake` Run with `SKIP_FLAKY_TESTS=true` to skip flaky tests (CI runs all tests)
920
1001
 
921
1002
  ### Running tests in batches
922
1003
 
923
- * *TEST_WORKERS* - Total number of workers or batches.
924
- It uses to identify a total number of batches, that would be run in parallel. *Default: 1*
925
- * *TEST_WORKER_NUM* - Specify which batch to run. The value is between 1 and *TEST_WORKERS*. *Default: 1*
1004
+ - _TEST_WORKERS_ - Total number of workers or batches.
1005
+ It is used to determine the total number of batches that run in parallel. _Default: 1_
1006
+ - _TEST_WORKER_NUM_ - Specify which batch to run. The value is between 1 and _TEST_WORKERS_. _Default: 1_
926
1007
 
927
1008
  ```shell
928
1009
  $ bundle exec rake test:parallel TEST_WORKERS=5 TEST_WORKER_NUM=1
@@ -17,7 +17,8 @@ module Semian
17
17
 
18
18
  def initialize(name, exceptions:, success_threshold:, error_threshold:,
19
19
  error_timeout:, implementation:, half_open_resource_timeout: nil,
20
- error_threshold_timeout: nil, error_threshold_timeout_enabled: true)
20
+ error_threshold_timeout: nil, error_threshold_timeout_enabled: true,
21
+ lumping_interval: 0)
21
22
  @name = name.to_sym
22
23
  @success_count_threshold = success_threshold
23
24
  @error_count_threshold = error_threshold
@@ -26,6 +27,7 @@ module Semian
26
27
  @error_timeout = error_timeout
27
28
  @exceptions = exceptions
28
29
  @half_open_resource_timeout = half_open_resource_timeout
30
+ @lumping_interval = lumping_interval
29
31
 
30
32
  @errors = implementation::SlidingWindow.new(max_size: @error_count_threshold)
31
33
  @successes = implementation::Integer.new
@@ -63,7 +65,6 @@ module Semian
63
65
 
64
66
  def mark_failed(error)
65
67
  push_error(error)
66
- push_time
67
68
  if closed?
68
69
  transition_to_open if error_threshold_reached?
69
70
  elsif half_open?
@@ -132,16 +133,16 @@ module Semian
132
133
  end
133
134
 
134
135
  def push_error(error)
135
- @last_error = error
136
- end
137
-
138
- def push_time
139
136
  time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
137
+
140
138
  if error_threshold_timeout_enabled
141
139
  @errors.reject! { |err_time| err_time + @error_threshold_timeout < time }
142
140
  end
143
141
 
144
- @errors << time
142
+ if @errors.empty? || @errors.last <= time - @lumping_interval
143
+ @last_error = error
144
+ @errors << time
145
+ end
145
146
  end
146
147
 
147
148
  def log_state_transition(new_state)
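The effect of the new `lumping_interval` check in `push_error` can be sketched outside Semian. The snippet below is a stand-alone approximation: a plain `Array` stands in for Semian's `SlidingWindow`, explicit timestamps replace the monotonic clock, and `record_error` is a hypothetical name used only here.

```ruby
# Stand-alone sketch of the lumping check; not Semian's implementation.
def record_error(errors, time, lumping_interval:, error_threshold_timeout:)
  # Evict timestamps that fell out of the error threshold window.
  errors.reject! { |err_time| err_time + error_threshold_timeout < time }

  # Record the error only if the last recorded one is at least
  # lumping_interval seconds old; otherwise it is lumped into that one.
  errors << time if errors.empty? || errors.last <= time - lumping_interval
  errors
end

errors = []
[0.0, 1.0, 3.9, 4.0, 5.0, 8.0].each do |t|
  record_error(errors, t, lumping_interval: 4, error_threshold_timeout: 10)
end

errors # => [0.0, 4.0, 8.0]
```

Six errors arrive but only three are recorded, at seconds 0, 4 and 8. With `error_threshold: 3` that is just enough to reach the threshold inside a 10 second window.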
@@ -0,0 +1,233 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Semian
4
+ class ConfigurationValidator
5
+ def initialize(name, configuration)
6
+ @name = name
7
+ @configuration = configuration
8
+ @adapter = configuration[:adapter]
9
+ @force_config_validation = force_config_validation?
10
+
11
+ unless @force_config_validation
12
+ Semian.logger.warn(
13
+ "Semian is running in log-mode for configuration validation. This means that Semian will not raise an error if the configuration is invalid. This is not recommended for production environments.\n\n[IMPORTANT] IN FUTURE RELEASES, STRICT CONFIGURATION VALIDATION WILL BE THE DEFAULT BEHAVIOR. PLEASE UPDATE YOUR CONFIGURATION TO USE `force_config_validation: true` TO ENABLE STRICT CONFIGURATION VALIDATION. ALLOWING MISCONFIGURATIONS IN FUTURE RELEASES WILL BREAK YOUR SEMIAN.\n---\n",
14
+ )
15
+ end
16
+ end
17
+
18
+ def validate!
19
+ validate_circuit_breaker_or_bulkhead!
20
+ validate_bulkhead_configuration!
21
+ validate_circuit_breaker_configuration!
22
+ validate_resource_name!
23
+ end
24
+
25
+ private
26
+
27
+ def hint_format(message)
28
+ "\n\nHINT: #{message}\n---"
29
+ end
30
+
31
+ def raise_or_log_validation_required!(message)
32
+ if @force_config_validation
33
+ raise ArgumentError, message
34
+ else
35
+ Semian.logger.warn("[SEMIAN_CONFIG_WARNING]: #{message}")
36
+ end
37
+ end
38
+
39
+ def require_keys!(required, options)
40
+ diff = required - options.keys
41
+ unless diff.empty?
42
+ raise_or_log_validation_required!("Missing required arguments for Semian: #{diff}")
43
+ end
44
+ end
45
+
46
+ def validate_circuit_breaker_or_bulkhead!
47
+ if (@configuration[:circuit_breaker] == false || ENV.key?("SEMIAN_CIRCUIT_BREAKER_DISABLED")) && (@configuration[:bulkhead] == false || ENV.key?("SEMIAN_BULKHEAD_DISABLED"))
48
+ raise_or_log_validation_required!("Both bulkhead and circuitbreaker cannot be disabled.")
49
+ end
50
+ end
51
+
52
+ def validate_bulkhead_configuration!
53
+ return if ENV.key?("SEMIAN_BULKHEAD_DISABLED")
54
+ return unless @configuration.fetch(:bulkhead, true)
55
+
56
+ tickets = @configuration[:tickets]
57
+ quota = @configuration[:quota]
58
+
59
+ if tickets.nil? && quota.nil?
60
+ raise_or_log_validation_required!("Bulkhead configuration require either the :tickets or :quota parameter, you provided neither")
61
+ end
62
+
63
+ if tickets && quota
64
+ raise_or_log_validation_required!("Bulkhead configuration require either the :tickets or :quota parameter, you provided both")
65
+ end
66
+
67
+ validate_quota!(quota) if quota
68
+ validate_tickets!(tickets) if tickets
69
+ end
70
+
71
+ def validate_circuit_breaker_configuration!
72
+ return if ENV.key?("SEMIAN_CIRCUIT_BREAKER_DISABLED")
73
+ return unless @configuration.fetch(:circuit_breaker, true)
74
+
75
+ require_keys!([:success_threshold, :error_threshold, :error_timeout], @configuration)
76
+ validate_thresholds!
77
+ validate_timeouts!
78
+ end
79
+
80
+ def validate_thresholds!
81
+ success_threshold = @configuration[:success_threshold]
82
+ error_threshold = @configuration[:error_threshold]
83
+
84
+ unless success_threshold.is_a?(Integer) && success_threshold > 0
85
+ err = "success_threshold must be a positive integer, got #{success_threshold}"
86
+
87
+ if success_threshold == 0
88
+ err += hint_format("Are you sure that this is what you want? This will close the circuit breaker immediately after `error_timeout` seconds without checking the resource!")
89
+ end
90
+
91
+ raise_or_log_validation_required!(err)
92
+ end
93
+
94
+ unless error_threshold.is_a?(Integer) && error_threshold > 0
95
+ err = "error_threshold must be a positive integer, got #{error_threshold}"
96
+
97
+ if error_threshold == 0
98
+ err += hint_format("Are you sure that this is what you want? This can result in the circuit opening up at unpredictable times!")
99
+ end
100
+
101
+ raise_or_log_validation_required!(err)
102
+ end
103
+ end
104
+
105
+ def validate_timeouts!
106
+ error_timeout = @configuration[:error_timeout]
107
+ error_threshold_timeout_enabled = @configuration[:error_threshold_timeout_enabled].nil? ? true : @configuration[:error_threshold_timeout_enabled]
108
+ error_threshold = @configuration[:error_threshold]
109
+ lumping_interval = @configuration[:lumping_interval]
110
+ half_open_resource_timeout = @configuration[:half_open_resource_timeout]
111
+
112
+ unless error_timeout.is_a?(Numeric) && error_timeout > 0
113
+ err = "error_timeout must be a positive number, got #{error_timeout}"
114
+
115
+ if error_timeout == 0
116
+ err += hint_format("Are you sure that this is what you want? This will close the circuit breaker immediately after opening it!")
117
+ end
118
+
119
+ raise_or_log_validation_required!(err)
120
+ end
121
+
122
+ # This state checks for contradictions between error_threshold_timeout_enabled and error_threshold_timeout.
123
+ unless error_threshold_timeout_enabled || !@configuration[:error_threshold_timeout]
124
+ err = "error_threshold_timeout_enabled and error_threshold_timeout must not contradict each other, got error_threshold_timeout_enabled: #{error_threshold_timeout_enabled}, error_threshold_timeout: #{@configuration[:error_threshold_timeout]}"
125
+ err += hint_format("Are you sure this is what you want? This will set error_threshold_timeout_enabled to #{error_threshold_timeout_enabled} while error_threshold_timeout is #{@configuration[:error_threshold_timeout] ? "truthy" : "falsy"}")
126
+
127
+ raise_or_log_validation_required!(err)
128
+ end
129
+
130
+ # Only set this after we have checked the error_threshold_timeout_enabled condition
131
+ error_threshold_timeout = @configuration[:error_threshold_timeout] || error_timeout
132
+ unless error_threshold_timeout.is_a?(Numeric) && error_threshold_timeout > 0
133
+ err = "error_threshold_timeout must be a positive number, got #{error_threshold_timeout}"
134
+
135
+ if error_threshold_timeout == 0
136
+ err += hint_format("Are you sure that this is what you want? This will almost never open the circuit breaker since the time interval to catch errors is 0!")
137
+ end
138
+
139
+ raise_or_log_validation_required!(err)
140
+ end
141
+
142
+ unless half_open_resource_timeout.nil? || (half_open_resource_timeout.is_a?(Numeric) && half_open_resource_timeout > 0)
143
+ err = "half_open_resource_timeout must be a positive number, got #{half_open_resource_timeout}"
144
+
145
+ if half_open_resource_timeout == 0
146
+ err += hint_format("Are you sure that this is what you want? This will never half-open the circuit breaker! If that's what you want, you can omit the option instead")
147
+ end
148
+
149
+ raise_or_log_validation_required!(err)
150
+ end
151
+
152
+ unless lumping_interval.nil? || (lumping_interval.is_a?(Numeric) && lumping_interval > 0)
153
+ err = "lumping_interval must be a positive number, got #{lumping_interval}"
154
+
155
+ if lumping_interval == 0
156
+ err += hint_format("Are you sure that this is what you want? This will never lump errors! If that's what you want, you can omit the option instead")
157
+ end
158
+
159
+ raise_or_log_validation_required!(err)
160
+ end
161
+
162
+ # You might be wondering why we don't just check lumping_interval * error_threshold <= error_threshold_timeout.
163
+ # The reason is that the lumping_interval starts at the first error, so we count the first error
164
+ # at second 0. So we need to subtract 1 from the error_threshold to get the correct minimum time to reach the
165
+ # error threshold. error_threshold_timeout cannot be less than this minimum time.
166
+ #
167
+ # For example,
168
+ #
169
+ # error_threshold = 3
170
+ # error_threshold_timeout = 10
171
+ # lumping_interval = 4
172
+ #
173
+ # The first error could be counted at second 0, the second error could be counted at second 4, and the third
174
+ # error could be counted at second 8. So this is a valid configuration.
175
+
176
+ unless lumping_interval.nil? || error_threshold_timeout.nil? || lumping_interval * (error_threshold - 1) <= error_threshold_timeout
177
+ err = "constraint violated, this circuit breaker can never open! lumping_interval * (error_threshold - 1) should be <= error_threshold_timeout, got lumping_interval: #{lumping_interval}, error_threshold: #{error_threshold}, error_threshold_timeout: #{error_threshold_timeout}"
178
+ err += hint_format("lumping_interval starts from the first error and not in a fixed window. So you can fit n errors in n-1 seconds, since error 0 starts at 0 seconds. Ensure that you can fit `error_threshold` errors lumped in `lumping_interval` seconds within `error_threshold_timeout` seconds.")
179
+
180
+ raise_or_log_validation_required!(err)
181
+ end
182
+ end
183
+
184
+ def validate_quota!(quota)
185
+ unless quota.is_a?(Numeric) && quota > 0 && quota < 1
186
+ err = "quota must be a decimal between 0 and 1, got #{quota}"
187
+
188
+ if quota == 0
189
+ err += hint_format("Are you sure that this is what you want? This is the same as assigning no workers to the resource, disabling the resource!")
190
+ elsif quota == 1
191
+ err += hint_format("Are you sure that this is what you want? This is the same as assigning all workers to the resource, disabling the bulkhead!")
192
+ end
193
+
194
+ raise_or_log_validation_required!(err)
195
+ end
196
+ end
197
+
198
+ def validate_tickets!(tickets)
199
+ unless tickets.is_a?(Integer) && tickets > 0 && tickets < Semian::MAX_TICKETS
200
+ err = "ticket count must be a positive integer and less than #{Semian::MAX_TICKETS}, got #{tickets}"
201
+
202
+ if tickets == 0
203
+ err += hint_format("Are you sure that this is what you want? This is the same as assigning no workers to the resource, disabling the resource!")
204
+ elsif tickets == Semian::MAX_TICKETS
205
+ err += hint_format("Are you sure that this is what you want? This is the same as assigning all workers to the resource, disabling the bulkhead!")
206
+ end
207
+
208
+ raise_or_log_validation_required!(err)
209
+ end
210
+ end
211
+
212
+ def validate_resource_name!
213
+ unless @name.is_a?(String) || @name.is_a?(Symbol)
214
+ raise_or_log_validation_required!("name must be a symbol or string, got #{@name}")
215
+ end
216
+
217
+ if Semian.resources[@name]
218
+ err = "Resource with name #{@name} is already registered"
219
+ err += hint_format("Are you sure that this is what you want? This will override an existing resource with the same name!")
220
+
221
+ raise_or_log_validation_required!(err)
222
+ end
223
+ end
224
+
225
+ def force_config_validation?
226
+ if @configuration[:force_config_validation].nil?
227
+ Semian.default_force_config_validation
228
+ else
229
+ @configuration[:force_config_validation]
230
+ end
231
+ end
232
+ end
233
+ end
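The lumping constraint in the validator above can be checked by hand. The following is a minimal sketch of that arithmetic only. The helper names `min_seconds_to_open` and `can_open?` are hypothetical and not part of Semian; they restate the inequality `lumping_interval * (error_threshold - 1) <= error_threshold_timeout` taken from the diff.

```ruby
# Hypothetical helpers for illustration; Semian does not define these.

# Minimum time needed to count `error_threshold` errors when each counted
# error must be at least `lumping_interval` seconds after the previous one
# and the first error is counted at second 0.
def min_seconds_to_open(lumping_interval, error_threshold)
  lumping_interval * (error_threshold - 1)
end

# True when the errors can all fall inside the error_threshold_timeout window,
# which is the condition the validator requires.
def can_open?(lumping_interval:, error_threshold:, error_threshold_timeout:)
  min_seconds_to_open(lumping_interval, error_threshold) <= error_threshold_timeout
end

# The example from the code comment: errors counted at seconds 0, 4 and 8.
puts can_open?(lumping_interval: 4, error_threshold: 3, error_threshold_timeout: 10)

# A larger interval: errors counted at seconds 0, 6 and 12, which exceeds 10.
puts can_open?(lumping_interval: 6, error_threshold: 3, error_threshold_timeout: 10)
```

The first configuration satisfies the constraint and the second does not, so under this reading the validator would flag the second one as a circuit breaker that can never open.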
@@ -14,11 +14,11 @@ class LRUHash
14
14
  yield
15
15
  end
16
16
 
17
- def try_lock
17
+ def try_lock # rubocop:disable Naming/PredicateMethod
18
18
  true
19
19
  end
20
20
 
21
- def unlock
21
+ def unlock # rubocop:disable Naming/PredicateMethod
22
22
  true
23
23
  end
24
24
 
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Semian
4
- VERSION = "0.23.0"
4
+ VERSION = "0.25.0"
5
5
  end
data/lib/semian.rb CHANGED
@@ -16,6 +16,7 @@ require "semian/simple_sliding_window"
16
16
  require "semian/simple_integer"
17
17
  require "semian/simple_state"
18
18
  require "semian/lru_hash"
19
+ require "semian/configuration_validator"
19
20
 
20
21
  #
21
22
  # === Overview
@@ -102,11 +103,12 @@ module Semian
102
103
  OpenCircuitError = Class.new(BaseError)
103
104
  SemaphoreMissingError = Class.new(BaseError)
104
105
 
105
- attr_accessor :maximum_lru_size, :minimum_lru_time, :default_permissions, :namespace
106
+ attr_accessor :maximum_lru_size, :minimum_lru_time, :default_permissions, :namespace, :default_force_config_validation
106
107
 
107
108
  self.maximum_lru_size = 500
108
109
  self.minimum_lru_time = 300 # 300 seconds / 5 minutes
109
110
  self.default_permissions = 0660
111
+ self.default_force_config_validation = false
110
112
 
111
113
  def issue_disabled_semaphores_warning
112
114
  return if defined?(@warning_issued)
@@ -184,13 +186,12 @@ module Semian
184
186
  def register(name, **options)
185
187
  return UnprotectedResource.new(name) if ENV.key?("SEMIAN_DISABLED")
186
188
 
189
+ # Validate configuration before proceeding
190
+ ConfigurationValidator.new(name, options).validate!
191
+
187
192
  circuit_breaker = create_circuit_breaker(name, **options)
188
193
  bulkhead = create_bulkhead(name, **options)
189
194
 
190
- if circuit_breaker.nil? && bulkhead.nil?
191
- raise ArgumentError, "Both bulkhead and circuitbreaker cannot be disabled."
192
- end
193
-
194
195
  resources[name] = ProtectedResource.new(name, bulkhead, circuit_breaker)
195
196
  end
196
197
 
@@ -296,8 +297,6 @@ module Semian
296
297
  return if ENV.key?("SEMIAN_CIRCUIT_BREAKER_DISABLED")
297
298
  return unless options.fetch(:circuit_breaker, true)
298
299
 
299
- require_keys!([:success_threshold, :error_threshold, :error_timeout], options)
300
-
301
300
  exceptions = options[:exceptions] || []
302
301
  CircuitBreaker.new(
303
302
  name,
@@ -310,6 +309,11 @@ module Semian
310
309
  else
311
310
  options[:error_threshold_timeout_enabled]
312
311
  end,
312
+ lumping_interval: if options[:lumping_interval].nil?
313
+ 0
314
+ else
315
+ options[:lumping_interval]
316
+ end,
313
317
  exceptions: Array(exceptions) + [::Semian::BaseError],
314
318
  half_open_resource_timeout: options[:half_open_resource_timeout],
315
319
  implementation: implementation(**options),
@@ -346,13 +350,6 @@ module Semian
346
350
  timeout: timeout,
347
351
  )
348
352
  end
349
-
350
- def require_keys!(required, options)
351
- diff = required - options.keys
352
- unless diff.empty?
353
- raise ArgumentError, "Missing required arguments for Semian: #{diff}"
354
- end
355
- end
356
353
  end
357
354
 
358
355
  if Semian.semaphores_enabled?
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: semian
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.23.0
4
+ version: 0.25.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Scott Francis
@@ -36,6 +36,7 @@ files:
36
36
  - lib/semian/activerecord_trilogy_adapter.rb
37
37
  - lib/semian/adapter.rb
38
38
  - lib/semian/circuit_breaker.rb
39
+ - lib/semian/configuration_validator.rb
39
40
  - lib/semian/grpc.rb
40
41
  - lib/semian/instrumentable.rb
41
42
  - lib/semian/lru_hash.rb
@@ -77,7 +78,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
77
78
  - !ruby/object:Gem::Version
78
79
  version: '0'
79
80
  requirements: []
80
- rubygems_version: 3.6.8
81
+ rubygems_version: 3.7.1
81
82
  specification_version: 4
82
83
  summary: Bulkheading for Ruby with SysV semaphores
83
84
  test_files: []