journaled 6.2.1 → 6.2.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +33 -0
- data/app/models/journaled/outbox/event.rb +12 -4
- data/lib/journaled/kinesis_batch_sender.rb +80 -63
- data/lib/journaled/kinesis_failed_event.rb +18 -0
- data/lib/journaled/kinesis_sequential_sender.rb +91 -0
- data/lib/journaled/outbox/batch_processor.rb +23 -14
- data/lib/journaled/outbox/metric_emitter.rb +74 -64
- data/lib/journaled/outbox/worker.rb +5 -14
- data/lib/journaled/version.rb +1 -1
- data/lib/journaled.rb +5 -2
- metadata +3 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ed85996fe76daec652ed49ec5c128e27f906c9ce91193de1b394843e11ed3971
+  data.tar.gz: '09335faed12c2732e425535b849f36d7973a2a614ffd52bd1a77ac0e5b251ba5'
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e0f867425b1f9a033644b007b97f144665d1bf5c30f498621ad2a805d3d607158a219fe46e3ef189389a89b77c038289dc45943e659519c9932b1bcc5832b1fa
+  data.tar.gz: 1788bf801f0c3a3d2dafb5108532e350dce725319312367351fd0b9505cc554ce7a03d85c15b36a2c2cddbef313818d3770beeee9aaf1c8550e812925830fdc9
data/README.md
CHANGED
@@ -164,6 +164,25 @@ Journaling provides a number of different configuation options that can be set i
 Journaled.outbox_base_class_name = 'EventsRecord'
 ```
 
+#### `Journaled.outbox_processing_mode` (default: `:batch`)
+
+**Only relevant when using `Journaled::Outbox::Adapter`.**
+
+Controls how events are sent to Kinesis. Two modes are available:
+
+- **`:batch`** (default) - Uses the Kinesis `put_records` batch API for high throughput. Events are sent in parallel batches, allowing multiple workers to run concurrently. Best for most use cases where strict ordering is not required.
+
+- **`:guaranteed_order`** - Uses the Kinesis `put_record` single-event API to send events sequentially. Events are processed one at a time in order, stopping on the first transient failure to preserve ordering. Use this when you need strict ordering guarantees per partition key. Note: The current implementation requires single-threaded processing, but future optimizations may support batching and multi-threading by partition key.
+
+Example:
+```ruby
+# For high throughput (default)
+Journaled.outbox_processing_mode = :batch
+
+# For guaranteed ordering
+Journaled.outbox_processing_mode = :guaranteed_order
+```
+
 #### ActiveJob `set` options
 
 Both model-level directives accept additional options to be passed into ActiveJob's `set` method:
@@ -182,6 +201,8 @@ journal_attributes :email, enqueue_with: { priority: 20, queue: 'journaled' }
 
 Journaled includes a built-in Outbox-style delivery adapter with horizontally scalable workers.
 
+By default, the Outbox adapter uses the Kinesis `put_records` batch API for high-throughput event processing, allowing multiple workers to process events in parallel. If you require strict ordering guarantees per partition key, you can configure sequential processing mode (see configuration options below).
+
 **Setup:**
 
 This feature requires creating database tables and is completely optional. Existing users are unaffected.
@@ -207,6 +228,16 @@ Journaled.delivery_adapter = Journaled::Outbox::Adapter
 # Optional: Customize worker behavior (these are the defaults)
 Journaled.worker_batch_size = 500  # Max events per Kinesis batch (Kinesis API limit)
 Journaled.worker_poll_interval = 5 # Seconds between polls
+
+# Optional: Configure processing mode (default: :batch)
+# - :batch - Uses Kinesis put_records batch API for high throughput (default)
+#            Events are sent in parallel batches. Multiple workers can run concurrently.
+# - :guaranteed_order - Uses Kinesis put_record single-event API for sequential processing
+#                       Events are sent one at a time in order. Use this if you need
+#                       strict ordering guarantees per partition key. The current
+#                       implementation processes events single-threaded, though future
+#                       optimizations may support batching/multi-threading by partition key.
+Journaled.outbox_processing_mode = :batch
 ```
 
 **Note:** When using the Outbox adapter, you do **not** need to configure an ActiveJob queue adapter (skip step 1 of Installation). The Outbox adapter uses the `journaled_outbox_events` table for event storage and its own worker daemons for processing, making it independent of ActiveJob. Transactional batching still works seamlessly with the Outbox adapter.
@@ -217,6 +248,8 @@ Journaled.worker_poll_interval = 5 # Seconds between polls
 bundle exec rake journaled_worker:work
 ```
 
+**Note:** In `:batch` mode (the default), you can run multiple worker processes concurrently for horizontal scaling. In `:guaranteed_order` mode, the current implementation is optimized for running a single worker to maintain ordering guarantees.
+
 4. **Monitoring:**
 
    The system emits `ActiveSupport::Notifications` events:
data/app/models/journaled/outbox/event.rb
CHANGED
@@ -29,12 +29,20 @@ module Journaled
 
       # Fetch a batch of events for processing using SELECT FOR UPDATE
       #
+      # In :guaranteed_order mode, uses blocking lock to ensure sequential processing.
+      # In :batch mode, uses SKIP LOCKED to allow parallel workers.
+      #
       # @return [Array<Journaled::Outbox::Event>] Events locked for processing
       def self.fetch_batch_for_update
-        ready_to_process
-
-
-
+        query = ready_to_process.limit(Journaled.worker_batch_size)
+
+        lock_clause = if Journaled.outbox_processing_mode == :guaranteed_order
+          'FOR UPDATE'
+        else
+          'FOR UPDATE SKIP LOCKED'
+        end
+
+        query.lock(lock_clause).to_a
       end
 
       # Requeue a failed event for processing
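The lock-clause switch in `fetch_batch_for_update` can be sketched in isolation. This is a minimal illustration, not gem code: `lock_clause_for` is a hypothetical helper, while the mode symbols and SQL lock clauses come from the diff above.

```ruby
# Hypothetical helper mirroring the lock-clause choice in
# Event.fetch_batch_for_update (not part of the gem's API).
def lock_clause_for(mode)
  if mode == :guaranteed_order
    # Blocking row locks: a second worker waits, preserving global order.
    'FOR UPDATE'
  else
    # SKIP LOCKED: a second worker skips claimed rows and works in parallel.
    'FOR UPDATE SKIP LOCKED'
  end
end

lock_clause_for(:batch)            # => "FOR UPDATE SKIP LOCKED"
lock_clause_for(:guaranteed_order) # => "FOR UPDATE"
```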
data/lib/journaled/kinesis_batch_sender.rb
CHANGED
@@ -1,98 +1,115 @@
 # frozen_string_literal: true
 
 module Journaled
-  # Sends batches of events to Kinesis using the
+  # Sends batches of events to Kinesis using the PutRecords batch API
   #
   # This class handles:
-  # - Sending events
+  # - Sending events in batches to improve throughput
   # - Handling failures on a per-event basis
   # - Classifying errors as transient vs permanent
   #
   # Returns structured results for the caller to handle event state management.
   class KinesisBatchSender
-
-
-
-    end
-
-    def permanent?
-      !transient
-    end
-    end
-
-    PERMANENT_ERROR_CLASSES = [
-      Aws::Kinesis::Errors::ValidationException,
+    # Per-record error codes that indicate permanent failures (bad event data)
+    PERMANENT_ERROR_CODES = [
+      'ValidationException',
     ].freeze
 
     # Send a batch of database events to Kinesis
     #
-    #
+    # Uses put_records batch API. Groups events by stream and sends each group as a batch.
     #
     # @param events [Array<Journaled::Outbox::Event>] Events to send
     # @return [Hash] Result with:
     #   - succeeded: Array of successfully sent events
-    #   - failed: Array of FailedEvent structs (
+    #   - failed: Array of FailedEvent structs (both transient and permanent failures)
     def send_batch(events)
-
-
-
-
-
-
-
-
-
-
+      # Group events by stream since put_records requires all records to go to the same stream
+      events.group_by(&:stream_name).each_with_object({ succeeded: [], failed: [] }) do |(stream_name, stream_events), result|
+        batch_result = send_stream_batch(stream_name, stream_events)
+        result[:succeeded].concat(batch_result[:succeeded])
+        result[:failed].concat(batch_result[:failed])
+      end
+    end
+
+    private
+
+    def send_stream_batch(stream_name, stream_events)
+      records = build_records(stream_events)
+
+      begin
+        response = kinesis_client.put_records(stream_name:, records:)
+        process_response(response, stream_events)
+      rescue Aws::Kinesis::Errors::ValidationException
+        # Re-raise batch-level validation errors (configuration issues)
+        # These indicate invalid stream name, batch too large, etc.
+        # Not event data problems - requires manual intervention
+        raise
+      rescue StandardError => e
+        # Handle transient errors (throttling, network issues, service unavailable)
+        handle_transient_batch_error(e, stream_events)
+      end
+    end
+
+    def build_records(stream_events)
+      stream_events.map do |event|
+        {
+          data: event.event_data.merge(id: event.id).to_json,
+          partition_key: event.partition_key,
+        }
+      end
+    end
+
+    def process_response(response, stream_events)
+      succeeded = []
+      failed = []
+
+      response.records.each_with_index do |record_result, index|
+        event = stream_events[index]
+
+        if record_result.error_code
+          failed << create_failed_event(
+            event,
+            error_code: record_result.error_code,
+            error_message: record_result.error_message,
+            transient: PERMANENT_ERROR_CODES.exclude?(record_result.error_code),
+          )
         else
-
+          succeeded << event
         end
       end
 
-
+      { succeeded:, failed: }
     end
 
-
-
-    # Send a single event to Kinesis
-    #
-    # @param event [Journaled::Outbox::Event] Event to send
-    # @return [Journaled::Outbox::Event, FailedEvent] The event on success, or FailedEvent on failure
-    def send_event(event)
-      # Merge the DB-generated ID into the event data before sending to Kinesis
-      event_data_with_id = event.event_data.merge(id: event.id)
-
-      kinesis_client.put_record(
-        stream_name: event.stream_name,
-        data: event_data_with_id.to_json,
-        partition_key: event.partition_key,
-      )
+    def create_failed_event(event, error_code:, error_message:, transient:)
+      Outbox::MetricEmitter.emit_kinesis_failure(event:, error_code:)
 
-
-    rescue *PERMANENT_ERROR_CLASSES => e
-      Rails.logger.error("Kinesis event send failed (permanent): #{e.class} - #{e.message}")
-      FailedEvent.new(
-        event:,
-        error_code: e.class.to_s,
-        error_message: e.message,
-        transient: false,
-      )
-    rescue StandardError => e
-      Rails.logger.error("Kinesis event send failed (transient): #{e.class} - #{e.message}")
-      FailedEvent.new(
+      Journaled::KinesisFailedEvent.new(
         event:,
-        error_code
-        error_message
-        transient
+        error_code:,
+        error_message:,
+        transient:,
       )
     end
 
-    def
-
+    def handle_transient_batch_error(error, stream_events)
+      Rails.logger.error("Kinesis batch send failed (transient): #{error.class} - #{error.message}")
+
+      failed = stream_events.map do |event|
+        create_failed_event(
+          event,
+          error_code: error.class.to_s,
+          error_message: error.message,
+          transient: true,
+        )
+      end
+
+      { succeeded: [], failed: }
    end
 
-    def
-
+    def kinesis_client
+      @kinesis_client ||= KinesisClientFactory.build
    end
  end
 end
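The group-by-stream merge in the new `send_batch` can be exercised with a stubbed per-stream sender in place of the real Kinesis call. A minimal sketch; `StubEvent` and `send_stream_batch_stub` are illustrative stand-ins, not gem classes.

```ruby
# Stand-in for Journaled::Outbox::Event, carrying only what the grouping needs.
StubEvent = Struct.new(:id, :stream_name)

# Stand-in for send_stream_batch: pretend every event in the group succeeds.
def send_stream_batch_stub(_stream_name, stream_events)
  { succeeded: stream_events, failed: [] }
end

# Same merge structure as KinesisBatchSender#send_batch in the diff above:
# one sub-batch per stream, results concatenated into a single hash.
def send_batch(events)
  events.group_by(&:stream_name).each_with_object({ succeeded: [], failed: [] }) do |(stream_name, stream_events), result|
    batch_result = send_stream_batch_stub(stream_name, stream_events)
    result[:succeeded].concat(batch_result[:succeeded])
    result[:failed].concat(batch_result[:failed])
  end
end

events = [StubEvent.new(1, 'a'), StubEvent.new(2, 'b'), StubEvent.new(3, 'a')]
# Events regroup by stream ('a' first, then 'b'), so ids come back as [1, 3, 2]:
send_batch(events)[:succeeded].map(&:id) # => [1, 3, 2]
```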
data/lib/journaled/kinesis_failed_event.rb
ADDED
@@ -0,0 +1,18 @@
+# frozen_string_literal: true
+
+module Journaled
+  # Represents a failed event from Kinesis send operations
+  #
+  # Used by both KinesisBatchSender and KinesisSequentialSender to represent
+  # events that failed to send to Kinesis, along with error details and whether
+  # the failure is transient (retriable) or permanent.
+  KinesisFailedEvent = Struct.new(:event, :error_code, :error_message, :transient, keyword_init: true) do
+    def transient?
+      transient
+    end
+
+    def permanent?
+      !transient
+    end
+  end
+end
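The struct is plain Ruby and can be exercised standalone; the definition below is copied from the added file, with a nil `event` standing in for a real outbox record.

```ruby
# Definition copied from data/lib/journaled/kinesis_failed_event.rb above.
KinesisFailedEvent = Struct.new(:event, :error_code, :error_message, :transient, keyword_init: true) do
  def transient?
    transient
  end

  def permanent?
    !transient
  end
end

failure = KinesisFailedEvent.new(
  event: nil, # a real Journaled::Outbox::Event in practice
  error_code: 'ProvisionedThroughputExceededException',
  error_message: 'Rate exceeded',
  transient: true,
)
failure.transient? # => true
failure.permanent? # => false
```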
data/lib/journaled/kinesis_sequential_sender.rb
ADDED
@@ -0,0 +1,91 @@
+# frozen_string_literal: true
+
+module Journaled
+  # Sends batches of events to Kinesis using the PutRecord single-event API
+  #
+  # This class handles:
+  # - Sending events individually in order to support guaranteed ordering
+  # - Stopping on first transient failure to preserve ordering
+  # - Classifying errors as transient vs permanent
+  #
+  # Returns structured results for the caller to handle event state management.
+  class KinesisSequentialSender
+    PERMANENT_ERROR_CLASSES = [
+      Aws::Kinesis::Errors::ValidationException,
+    ].freeze
+
+    # Send a batch of database events to Kinesis
+    #
+    # Sends events one at a time to guarantee ordering. Stops on first transient failure.
+    #
+    # @param events [Array<Journaled::Outbox::Event>] Events to send
+    # @return [Hash] Result with:
+    #   - succeeded: Array of successfully sent events
+    #   - failed: Array of FailedEvent structs (only permanent failures)
+    def send_batch(events)
+      result = { succeeded: [], failed: [] }
+
+      events.each do |event|
+        event_result = send_event(event)
+        if event_result.is_a?(Journaled::KinesisFailedEvent)
+          if event_result.transient?
+            emit_transient_failure_metric
+            break
+          else
+            result[:failed] << event_result
+          end
+        else
+          result[:succeeded] << event_result
+        end
+      end
+
+      result
+    end
+
+    private
+
+    # Send a single event to Kinesis
+    #
+    # @param event [Journaled::Outbox::Event] Event to send
+    # @return [Journaled::Outbox::Event, FailedEvent] The event on success, or FailedEvent on failure
+    def send_event(event)
+      kinesis_client.put_record(
+        stream_name: event.stream_name,
+        data: event.event_data.merge(id: event.id).to_json,
+        partition_key: event.partition_key,
+      )
+
+      event
+    rescue *PERMANENT_ERROR_CLASSES => e
+      Rails.logger.error("[Journaled] Kinesis event send failed (permanent): #{e.class} - #{e.message}")
+      error_code = e.class.to_s
+      Outbox::MetricEmitter.emit_kinesis_failure(event:, error_code:)
+
+      Journaled::KinesisFailedEvent.new(
+        event:,
+        error_code:,
+        error_message: e.message,
+        transient: false,
+      )
+    rescue StandardError => e
+      Rails.logger.error("[Journaled] Kinesis event send failed (transient): #{e.class} - #{e.message}")
+      error_code = e.class.to_s
+      Outbox::MetricEmitter.emit_kinesis_failure(event:, error_code:)
+
+      Journaled::KinesisFailedEvent.new(
+        event:,
+        error_code:,
+        error_message: e.message,
+        transient: true,
+      )
+    end
+
+    def kinesis_client
+      @kinesis_client ||= KinesisClientFactory.build
+    end
+
+    def emit_transient_failure_metric
+      ActiveSupport::Notifications.instrument('journaled.kinesis_sequential_sender.transient_failure')
+    end
+  end
+end
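The stop-on-first-transient-failure rule in `send_batch` can be sketched with stand-in values: `StubFailure` stands in for `Journaled::KinesisFailedEvent`, and plain symbols stand in for events.

```ruby
# Stand-in for Journaled::KinesisFailedEvent.
StubFailure = Struct.new(:transient) do
  def transient?
    transient
  end
end

# Mirrors the control flow of KinesisSequentialSender#send_batch above:
# permanent failures are recorded and processing continues; the first
# transient failure stops the loop so later events keep their order.
def send_sequentially(outcomes)
  result = { succeeded: [], failed: [] }
  outcomes.each do |outcome|
    if outcome.is_a?(StubFailure)
      break if outcome.transient?
      result[:failed] << outcome
    else
      result[:succeeded] << outcome
    end
  end
  result
end

r = send_sequentially([:e1, StubFailure.new(false), StubFailure.new(true), :e2])
r[:succeeded]    # => [:e1]  (:e2 is never attempted, preserving order)
r[:failed].length # => 1     (the permanent failure)
```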
data/lib/journaled/outbox/batch_processor.rb
CHANGED
@@ -6,31 +6,36 @@ module Journaled
     #
     # This class handles the core business logic of:
     # - Fetching events from the database (with FOR UPDATE)
-    # - Sending them to Kinesis
+    # - Sending them to Kinesis (batch API or sequential)
     # - Handling successful deliveries (deleting events)
     # - Handling permanent failures (marking with failed_at)
-    # - Handling
+    # - Handling transient failures (leaving unlocked for retry)
     #
-    #
-    #
-    #
+    # Supports two modes based on Journaled.outbox_processing_mode:
+    # - :batch - Uses put_records API for high throughput with parallel workers
+    # - :guaranteed_order - Uses put_record API for sequential processing
     #
     # All operations happen within a single database transaction for consistency.
     # The Worker class delegates to this for actual event processing.
     class BatchProcessor
       def initialize
-        @batch_sender =
+        @batch_sender = if Journaled.outbox_processing_mode == :guaranteed_order
+          KinesisSequentialSender.new
+        else
+          KinesisBatchSender.new
+        end
       end
 
       # Process a single batch of events
       #
       # Wraps the entire batch processing in a single transaction:
       # 1. SELECT FOR UPDATE (claim events)
-      # 2. Send to Kinesis (batch
+      # 2. Send to Kinesis (batch API or sequential, based on mode)
       # 3. Delete successful events
-      # 4. Mark failed events
+      # 4. Mark permanently failed events
+      # 5. Leave transient failures untouched (will be retried)
       #
-      # @return [Hash] Statistics with :succeeded, :failed_permanently counts
+      # @return [Hash] Statistics with :succeeded, :failed_permanently, :failed_transiently counts
       def process_batch
         ActiveRecord::Base.transaction do
           events = Event.fetch_batch_for_update
@@ -38,20 +43,24 @@ module Journaled
 
           result = batch_sender.send_batch(events)
 
-          # Delete successful events
           Event.where(id: result[:succeeded].map(&:id)).delete_all if result[:succeeded].any?
 
-
-
+          permanent_failures = result[:failed].select(&:permanent?)
+          transient_failures = result[:failed].select(&:transient?)
+
+          mark_events_as_failed(permanent_failures) if permanent_failures.any?
 
           Rails.logger.info(
             "[journaled] Batch complete: #{result[:succeeded].count} succeeded, " \
-            "#{
+            "#{permanent_failures.count} permanently failed, " \
+            "#{transient_failures.count} transiently failed (will retry) " \
+            "(batch size: #{events.count})",
           )
 
           {
             succeeded: result[:succeeded].count,
-            failed_permanently:
+            failed_permanently: permanent_failures.count,
+            failed_transiently: transient_failures.count,
           }
         end
       end
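The permanent/transient split in `process_batch` is a simple partition over the sender's failure structs. A sketch with a stand-in struct (`StubFailed` is illustrative; the gem uses `Journaled::KinesisFailedEvent`):

```ruby
# Stand-in for Journaled::KinesisFailedEvent.
StubFailed = Struct.new(:transient) do
  def transient?
    transient
  end

  def permanent?
    !transient
  end
end

failed = [StubFailed.new(true), StubFailed.new(false), StubFailed.new(true)]

# Same two selects as BatchProcessor#process_batch in the diff above.
permanent_failures = failed.select(&:permanent?) # marked with failed_at
transient_failures = failed.select(&:transient?) # left in the table for retry

permanent_failures.count # => 1
transient_failures.count # => 2
```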
data/lib/journaled/outbox/metric_emitter.rb
CHANGED
@@ -2,82 +2,92 @@
 
 module Journaled
   module Outbox
-    # Handles metric emission for the Worker
+    # Handles metric emission for the Worker and Kinesis senders
     #
-    # This class
+    # This class provides utility methods for collecting and emitting metrics.
     class MetricEmitter
-
-
-
+      class << self
+        # Emit batch processing metrics
+        #
+        # @param stats [Hash] Processing statistics with :succeeded, :failed_permanently, :failed_transiently
+        # @param worker_id [String] ID of the worker processing the batch
+        def emit_batch_metrics(stats, worker_id:)
+          total_events = stats[:succeeded] + stats[:failed_permanently] + stats[:failed_transiently]
 
-
-
-
-
-
+          emit_metric('journaled.outbox_event.processed', value: total_events, worker_id:)
+          emit_metric('journaled.outbox_event.sent', value: stats[:succeeded], worker_id:)
+          emit_metric('journaled.outbox_event.failed', value: stats[:failed_permanently], worker_id:)
+          emit_metric('journaled.outbox_event.errored', value: stats[:failed_transiently], worker_id:)
+        end
 
-
-
-
-
+        # Collect and emit queue metrics
+        #
+        # This calculates various queue statistics and emits individual metrics for each.
+        # @param worker_id [String] ID of the worker collecting metrics
+        def emit_queue_metrics(worker_id:)
+          metrics = calculate_queue_metrics
 
-
-
-
-
-      metrics = calculate_queue_metrics
+          emit_metric('journaled.worker.queue_total_count', value: metrics[:total_count], worker_id:)
+          emit_metric('journaled.worker.queue_workable_count', value: metrics[:workable_count], worker_id:)
+          emit_metric('journaled.worker.queue_failed_count', value: metrics[:failed_count], worker_id:)
+          emit_metric('journaled.worker.queue_oldest_age_seconds', value: metrics[:oldest_age_seconds], worker_id:)
 
-
-
-
+          Rails.logger.info(
+            "Queue metrics: total=#{metrics[:total_count]}, " \
+            "workable=#{metrics[:workable_count]}, " \
+            "failed=#{metrics[:failed_count]}, " \
+            "oldest_age=#{metrics[:oldest_age_seconds].round(2)}s",
+          )
+        end
 
-
-
-
-
-
-
-
+        # Emit a metric notification for a Kinesis send failure
+        #
+        # @param event [Journaled::Outbox::Event] The failed event
+        # @param error_code [String] The error code (e.g., 'ProvisionedThroughputExceededException')
+        def emit_kinesis_failure(event:, error_code:)
+          emit_metric(
+            'journaled.kinesis.send_failure',
+            partition_key: event.partition_key,
+            error_code:,
+            stream_name: event.stream_name,
+            event_type: event.event_type,
+          )
+        end
 
-
+        private
 
-
-
-
-
-
-
-
-      ActiveSupport::Notifications.instrument(
-        event_name,
-        payload.merge(worker_id:),
-      )
-    end
+        # Emit a single metric notification
+        #
+        # @param event_name [String] The name of the metric event
+        # @param payload [Hash] Additional payload data (event_count, value, etc.)
+        def emit_metric(event_name, payload)
+          ActiveSupport::Notifications.instrument(event_name, payload)
+        end
 
-
-
-
-
-
-
-
-
-
-
-
-
-
+        # Calculate queue metrics
+        #
+        # @return [Hash] Metrics including counts and oldest event timestamp
+        def calculate_queue_metrics
+          # Use a single query with COUNT(*) FILTER to calculate all counts in one table scan
+          result = Event.connection.select_one(
+            Event.select(
+              'COUNT(*) AS total_count',
+              'COUNT(*) FILTER (WHERE failed_at IS NULL) AS workable_count',
+              'COUNT(*) FILTER (WHERE failure_reason IS NOT NULL AND failed_at IS NULL) AS failed_count',
+              'MIN(created_at) FILTER (WHERE failed_at IS NULL) AS oldest_non_failed_timestamp',
+            ).to_sql,
+          )
 
-
-
+          oldest_timestamp = result['oldest_non_failed_timestamp']
+          oldest_age_seconds = oldest_timestamp ? Time.current - oldest_timestamp : 0
 
-
-
-
-
-
-
+          {
+            total_count: result['total_count'],
+            workable_count: result['workable_count'],
+            failed_count: result['failed_count'],
+            oldest_age_seconds:,
+          }
+        end
       end
     end
   end
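The `COUNT(*) FILTER` query above computes four aggregates in one table scan. Its logic can be mirrored in plain Ruby over in-memory rows; this is an illustrative stand-in (hash rows in place of `journaled_outbox_events` records, `Time.now` in place of `Time.current`), not the gem's code path.

```ruby
# In-memory mirror of the FILTER conditions in calculate_queue_metrics:
#   workable  = failed_at IS NULL
#   failed    = failure_reason IS NOT NULL AND failed_at IS NULL
#   oldest    = MIN(created_at) WHERE failed_at IS NULL
def calculate_queue_metrics(rows, now:)
  non_failed = rows.reject { |r| r[:failed_at] }
  oldest = non_failed.map { |r| r[:created_at] }.min
  {
    total_count: rows.size,
    workable_count: non_failed.size,
    failed_count: non_failed.count { |r| r[:failure_reason] },
    oldest_age_seconds: oldest ? now - oldest : 0,
  }
end

now = Time.now
rows = [
  { created_at: now - 30, failed_at: nil, failure_reason: nil },    # workable
  { created_at: now - 10, failed_at: nil, failure_reason: 'boom' }, # errored, still workable
  { created_at: now - 99, failed_at: now, failure_reason: 'dead' }, # permanently failed
]
calculate_queue_metrics(rows, now: now)
# => { total_count: 3, workable_count: 2, failed_count: 1, oldest_age_seconds: 30.0 }
```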
data/lib/journaled/outbox/worker.rb
CHANGED
@@ -18,7 +18,6 @@ module Journaled
       @worker_id = "#{Socket.gethostname}-#{Process.pid}"
       self.running = false
       @processor = BatchProcessor.new
-      @metric_emitter = MetricEmitter.new(worker_id: @worker_id)
       self.shutdown_requested = false
       @last_metrics_emission = Time.current
     end
@@ -50,7 +49,7 @@ module Journaled
 
     private
 
-    attr_reader :worker_id, :processor
+    attr_reader :worker_id, :processor
     attr_accessor :shutdown_requested, :running, :last_metrics_emission
 
     def run_loop
@@ -60,9 +59,8 @@ module Journaled
           break
         end
 
-        events_processed = 0
         begin
-
+          process_batch
           emit_metrics_if_needed
         rescue StandardError => e
           Rails.logger.error("Worker error: #{e.class} - #{e.message}")
@@ -71,21 +69,14 @@ module Journaled
 
         break if shutdown_requested
 
-
-        sleep(Journaled.worker_poll_interval) if events_processed.zero?
+        sleep(Journaled.worker_poll_interval)
       end
     end
 
     def process_batch
       stats = processor.process_batch
 
-
-
-      stats[:succeeded] + stats[:failed_permanently]
-    end
-
-    def instrument_batch_results(stats)
-      metric_emitter.emit_batch_metrics(stats)
+      MetricEmitter.emit_batch_metrics(stats, worker_id:)
     end
 
     def check_prerequisites!
@@ -128,7 +119,7 @@ module Journaled
 
     # Collect and emit queue metrics
     def collect_and_emit_metrics
-
+      MetricEmitter.emit_queue_metrics(worker_id:)
     end
   end
 end
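After this change the worker's loop is simpler: process a batch, emit metrics, then always sleep for the poll interval (the old `events_processed.zero?` guard is gone). A minimal stand-alone sketch of that shape, with an iteration cap standing in for the real shutdown flag:

```ruby
# Hypothetical stand-alone mirror of the worker's run loop, not gem code.
def run_loop(poll_interval:, max_batches:)
  batches = 0
  loop do
    yield                           # stand-in for process_batch + emit_metrics_if_needed
    batches += 1
    break if batches >= max_batches # stand-in for `break if shutdown_requested`
    sleep(poll_interval)            # always sleep between polls now
  end
  batches
end

processed = 0
run_loop(poll_interval: 0, max_batches: 3) { processed += 1 }
processed # => 3
```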
data/lib/journaled/version.rb
CHANGED
data/lib/journaled.rb
CHANGED
@@ -12,7 +12,9 @@ require 'journaled/delivery_adapter'
 require 'journaled/delivery_adapters/active_job_adapter'
 require 'journaled/outbox/adapter'
 require 'journaled/kinesis_client_factory'
+require 'journaled/kinesis_failed_event'
 require 'journaled/kinesis_batch_sender'
+require 'journaled/kinesis_sequential_sender'
 require 'journaled/outbox/batch_processor'
 require 'journaled/outbox/metric_emitter'
 require 'journaled/outbox/worker'
@@ -31,8 +33,9 @@ module Journaled
   mattr_writer(:transactional_batching_enabled) { true }
 
   # Worker configuration (for Outbox-style event processing)
-  mattr_accessor(:worker_batch_size) {
-  mattr_accessor(:worker_poll_interval) {
+  mattr_accessor(:worker_batch_size) { 500 }
+  mattr_accessor(:worker_poll_interval) { 0.5 } # seconds
+  mattr_accessor(:outbox_processing_mode) { :batch } # :batch or :guaranteed_order
 
   def self.transactional_batching_enabled?
     Thread.current[:journaled_transactional_batching_enabled] || @@transactional_batching_enabled
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: journaled
 version: !ruby/object:Gem::Version
-  version: 6.2.1
+  version: 6.2.3
 platform: ruby
 authors:
 - Jake Lipson
@@ -274,6 +274,8 @@ files:
 - lib/journaled/errors.rb
 - lib/journaled/kinesis_batch_sender.rb
 - lib/journaled/kinesis_client_factory.rb
+- lib/journaled/kinesis_failed_event.rb
+- lib/journaled/kinesis_sequential_sender.rb
 - lib/journaled/outbox/adapter.rb
 - lib/journaled/outbox/batch_processor.rb
 - lib/journaled/outbox/metric_emitter.rb