tobox 0.5.2 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: b8f6e85f173d4636a64617ede574141d4b2365799b774a6e56a76b70a59db6ba
-   data.tar.gz: 49bb80cc99034ba4db846964988c8d1fc2a1d8e062decbe57d5b786e892afb8b
+   metadata.gz: 1d1d0e33f4747993a500d22dddbda2eaf2118e0351342a9d09acdf9b6516f86c
+   data.tar.gz: e75a3699edf170a1711cc749a8df6ab6b6a4a829057246e1fd690065d89224c5
  SHA512:
-   metadata.gz: e1e3f588dbe62ceaf50d6d1e67e3fbcecf0798027728907961d2bd92d18ad5cf779d215c6cad614ace6bdf0c6bc00e8c32a2e3f0ddd30ae1c7f6c2f6524b3133
-   data.tar.gz: 6170813875d41ed03012ded3a418290446abfeac34a80fb5a7e21df3741032df23b05261d7b87c4c9ff3c2d230738ff20e93c7d178969606e092617d6f91357c
+   metadata.gz: d097b80a68d2ec806f521f0a36113bf74dac00fd0e7080e95496ff0b93a2f8e2e1f594b9d13a4b02e6d094ff260ed839d76bd1116f774c8672c01fda5761f5d6
+   data.tar.gz: cf138ca863803ae62e165cbd43846c6e35f6b500b344b62fbb1d0a906839cd4d8cc00f720ead1de220dc34802d70baca3a3bae730d01b99fb7d7f550015a6f1e
data/CHANGELOG.md CHANGED
@@ -1,5 +1,29 @@
  ## [Unreleased]

+ ## [0.6.0] - 2024-10-25
+
+ ### Features
+
+ #### Batch Events handling
+
+ It's now possible to handle N events at a time.
+
+ ```ruby
+ # tobox.rb
+
+ batch_size 10 # fetching 10 events at a time
+
+ on("user_created", "user_updated") do |*events| # 10 events at most
+   if events.size == 1
+     DataLakeService.user_created(events.first)
+   else
+     DataLakeService.batch_users_created(events)
+   end
+ end
+ ```
+
+ This also supports raising errors only for a subset of the events which failed processing (more info in the README).
+
  ## [0.5.2] - 2024-10-22

  ## Bugfixes
data/README.md CHANGED
@@ -21,6 +21,8 @@ Simple, data-first events processing framework based on the [transactional outbo
  - [Sentry](#sentry)
  - [Datadog](#datadog)
  - [Stats](#stats)
+ - [Advanced](#advanced)
+ - [Batch Events Handling](#batch-events)
  - [Supported Rubies](#supported-rubies)
  - [Rails support](#rails-support)
  - [Why?](#why)
@@ -101,8 +103,8 @@ database Sequel.connect("postgres://user:pass@dbhost/database")
  # concurrency 8
  on("user_created") do |event|
    puts "created user #{event[:after]["id"]}"
-   DataLakeService.user_created(user_data_hash)
-   BillingService.bill_user_account(user_data_hash)
+   DataLakeService.user_created(event)
+   BillingService.bill_user_account(event)
  end
  on("user_updated") do |event|
    # ...
@@ -660,6 +662,56 @@ c.on_stats(5) do |stats_collector, db|
  end
  ```

+ <a id="markdown-advanced" name="advanced"></a>
+ ## Advanced
+
+ <a id="markdown-batch-events" name="batch-events"></a>
+ ### Batch Events Handling
+
+ At a certain scale, the workload generated by `tobox` may overwhelm the primary database. With PostgreSQL in particular, which isn't optimized for writes, this manifests as CPU usage spikes due to index bypasses, or as lock contention when accessing shared buffers.
+
+ One way to alleviate this is to handle events in batches. By fetching N events at a time, the database can drain events more efficiently, while you can still handle them one by one or, where possible, batch them. For instance, the AWS SDK contains batch variants of several APIs, including the SNS publish API.
+
+ You can do so by setting a batch size in your configuration, and by using a splat parameter in the event handler to receive all the events in the batch:
+
+ ```ruby
+ # tobox.rb
+
+ batch_size 10 # fetching 10 events at a time
+
+ on("user_created", "user_updated") do |*events| # 10 events at most
+   if events.size == 1
+     DataLakeService.user_created(events.first)
+   else
+     DataLakeService.batch_users_created(events)
+   end
+ end
+ ```
+
+ If you're using a batch API which may fail for only a subset of the events, you can communicate which events from the batch failed by using the `Tobox.raise_batch_errors` API:
+
+ ```ruby
+ on("user_created", "user_updated") do |*events| # 10 events at most
+   if events.size == 1
+     DataLakeService.user_created(events.first)
+   else
+     success, failed_events_with_errors = DataLakeService.batch_users_created(events)
+
+     # handle success first
+
+     batch_errors = failed_events_with_errors.to_h do |event, exception|
+       [
+         events.index(event),
+         exception
+       ]
+     end
+
+     # events identified by the batch index will be retried.
+     Tobox.raise_batch_errors(batch_errors)
+   end
+ end
+ ```
+
  <a id="markdown-supported-rubies" name="supported-rubies"></a>
  ## Supported Rubies

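Note on the README change above: the example maps failed events to batch positions with `events.index(event)`, which relies on the batch API handing back the same event hashes. When a batch client reports failures by event id instead, the index-keyed hash could be built as in the sketch below. This is only an illustration: `batch_errors_by_index` is a hypothetical helper (not part of tobox), and reporting failures as an `id => exception` hash is an assumption about the client.

```ruby
# Hypothetical helper (not part of tobox): converts failures keyed by event id
# into the index-keyed hash described for Tobox.raise_batch_errors.
def batch_errors_by_index(events, errors_by_event_id)
  # position of each event inside the batch, looked up by its :id
  index_by_id = events.each_with_index.to_h { |event, idx| [event[:id], idx] }

  errors_by_event_id.to_h do |event_id, exception|
    # fetch raises KeyError if the client reports an id that is not in the batch
    [index_by_id.fetch(event_id), exception]
  end
end

events = [{ id: 10 }, { id: 11 }, { id: 12 }]
errors = { 12 => RuntimeError.new("publish failed") } # assumed client output
batch_errors = batch_errors_by_index(events, errors)
```

The resulting hash has Integer keys and Exception values, which is the shape the validation in `Tobox.raise_batch_errors` checks for.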
data/lib/tobox/fetcher.rb CHANGED
@@ -93,9 +93,10 @@ module Tobox
  events, error = yield_events(event_ids, &blk)

  events.each do |event|
-   if error
-     event.merge!(mark_as_error(event, error))
-     handle_error_event(event, error)
+   event_error = error || event[:error]
+   if event_error
+     event.merge!(mark_as_error(event, event_error))
+     handle_error_event(event, event_error)
    else
      handle_after_event(event)
    end
@@ -109,9 +110,28 @@ module Tobox
  begin
    events = events_ds.all

-   yield events
+   unless events.empty?
+     errors_by_id = catch(:tobox_batch_errors) do
+       yield events
+       nil
+     end
+
+     # some events from batch errored
+     if errors_by_id
+       failed = events.values_at(*errors_by_id.keys)
+       successful = events - failed
+
+       # fill in with batch error
+       failed.each do |ev|
+         ev[:error] = errors_by_id[events.index(ev)]
+       end

-   events_ds.delete unless events.empty?
+       # delete successful
+       @ds.where(id: successful.map { |ev| ev[:id] }).delete unless successful.empty?
+     else
+       events_ds.delete
+     end
+   end
  rescue StandardError => e
    error = e
  end
@@ -129,7 +149,7 @@ module Tobox
      seconds: @exponential_retry_factor**(event[:attempts] - 1)),
    # run_at: Sequel.date_add(Sequel::CURRENT_TIMESTAMP,
    # seconds: Sequel.function(:POWER, Sequel[@table][:attempts] + 1, 4)),
-   last_error: "#{error.message}\n#{error.backtrace.join("\n")}"
+   last_error: error.full_message(highlight: false)
  }

  set_event_retry_attempts(event, update_params)
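Note on the `last_error` change above: batch errors are typically exception objects that were instantiated but never raised, so their `backtrace` is `nil`. The sketch below contrasts the removed and the added formatting on such an object. The helper names `legacy_last_error` and `last_error` are hypothetical, and this being the motivation for the change is an assumption; the diff itself does not state one.

```ruby
# Formatting from the removed line: assumes a backtrace is present.
def legacy_last_error(error)
  "#{error.message}\n#{error.backtrace.join("\n")}"
end

# Formatting from the added line: Exception#full_message copes with
# a missing backtrace.
def last_error(error)
  error.full_message(highlight: false)
end

# An exception that was built but never raised has no backtrace.
unraised = RuntimeError.new("failed handling process batch")

legacy_failed = false
begin
  legacy_last_error(unraised)
rescue NoMethodError
  legacy_failed = true # backtrace is nil, so calling join on it failed
end

message = last_error(unraised)
```

For raised exceptions, `full_message` also includes the backtrace, so the stored `last_error` keeps that information.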
data/lib/tobox/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Tobox
-   VERSION = "0.5.2"
+   VERSION = "0.6.0"
  end
data/lib/tobox.rb CHANGED
@@ -34,6 +34,32 @@ module Tobox
      h.synchronize { h[name] = mod }
    end
  end
+
+   # when using batch sizes higher than 1, this method can be used to signal multiple errors
+   # for a subset of the events which may have failed processing; these events are identified
+   # by the index inside the batch.
+   #
+   #   on(:event_type) do |*events|
+   #     successful, failed = handle_event_batch(events)
+   #
+   #     deal_with_success(successful)
+   #
+   #     batch_errors = failed.to_h do |failed_event|
+   #       [
+   #         events.index(failed_event),
+   #         MyException.new("failed handling process batch")
+   #       ]
+   #     end
+   #
+   #     Tobox.raise_batch_errors(batch_errors)
+   #   end
+   def self.raise_batch_errors(batch_errors)
+     unless batch_errors.respond_to?(:to_hash) && batch_errors.all? { |k, v| k.is_a?(Integer) && v.is_a?(Exception) }
+       raise "batch errors must be an array of index-to-exception tuples"
+     end
+
+     throw(:tobox_batch_errors, batch_errors.to_h)
+   end
  end

  require_relative "tobox/fetcher"
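Note on the change above: `raise_batch_errors` pairs with the `catch(:tobox_batch_errors)` block added to the fetcher. The sketch below replays that round trip without a database. `process_batch` and the local `raise_batch_errors` are hypothetical stand-ins written for this note, not tobox API, and the `ArgumentError` with its message is an assumption of the sketch. Since the check uses `respond_to?(:to_hash)`, a plain array of pairs is rejected, despite the wording of the error message in the diff.

```ruby
# Stand-in for Tobox.raise_batch_errors: validates, then unwinds via throw.
def raise_batch_errors(batch_errors)
  unless batch_errors.respond_to?(:to_hash) &&
         batch_errors.all? { |k, v| k.is_a?(Integer) && v.is_a?(Exception) }
    raise ArgumentError, "batch errors must be a hash of index => exception"
  end

  throw(:tobox_batch_errors, batch_errors.to_h)
end

# Stand-in for the fetcher logic: runs the handler, then splits the batch.
def process_batch(events)
  errors_by_index = catch(:tobox_batch_errors) do
    yield events
    nil # reached only when the handler did not throw
  end

  return [events, []] unless errors_by_index

  failed = events.values_at(*errors_by_index.keys)
  successful = events - failed
  # attach each error to the event at that batch position
  errors_by_index.each { |idx, err| events[idx][:error] = err }
  [successful, failed]
end

events = [{ id: 1 }, { id: 2 }, { id: 3 }]
successful, failed = process_batch(events) do |_batch|
  raise_batch_errors({ 1 => RuntimeError.new("boom") })
end
```

Here `successful` holds the events with ids 1 and 3, which the fetcher would delete, while the event with id 2 carries the exception under `:error` and would go through the retry path.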
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: tobox
  version: !ruby/object:Gem::Version
-   version: 0.5.2
+   version: 0.6.0
  platform: ruby
  authors:
  - HoneyryderChuck
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2024-10-24 00:00:00.000000000 Z
+ date: 2024-10-25 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: logger