prosody 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50)
  1. checksums.yaml +7 -0
  2. data/.cargo/config.toml +2 -0
  3. data/.release-please-manifest.json +3 -0
  4. data/.rspec +3 -0
  5. data/.ruby-version +1 -0
  6. data/.standard.yml +9 -0
  7. data/.taplo.toml +6 -0
  8. data/ARCHITECTURE.md +591 -0
  9. data/CHANGELOG.md +92 -0
  10. data/Cargo.lock +3513 -0
  11. data/Cargo.toml +77 -0
  12. data/LICENSE +21 -0
  13. data/Makefile +36 -0
  14. data/README.md +946 -0
  15. data/Rakefile +26 -0
  16. data/ext/prosody/Cargo.toml +38 -0
  17. data/ext/prosody/extconf.rb +6 -0
  18. data/ext/prosody/src/admin.rs +171 -0
  19. data/ext/prosody/src/bridge/callback.rs +60 -0
  20. data/ext/prosody/src/bridge/mod.rs +332 -0
  21. data/ext/prosody/src/client/config.rs +819 -0
  22. data/ext/prosody/src/client/mod.rs +379 -0
  23. data/ext/prosody/src/gvl.rs +149 -0
  24. data/ext/prosody/src/handler/context.rs +436 -0
  25. data/ext/prosody/src/handler/message.rs +144 -0
  26. data/ext/prosody/src/handler/mod.rs +338 -0
  27. data/ext/prosody/src/handler/trigger.rs +93 -0
  28. data/ext/prosody/src/lib.rs +82 -0
  29. data/ext/prosody/src/logging.rs +353 -0
  30. data/ext/prosody/src/scheduler/cancellation.rs +67 -0
  31. data/ext/prosody/src/scheduler/handle.rs +50 -0
  32. data/ext/prosody/src/scheduler/mod.rs +169 -0
  33. data/ext/prosody/src/scheduler/processor.rs +166 -0
  34. data/ext/prosody/src/scheduler/result.rs +197 -0
  35. data/ext/prosody/src/tracing_util.rs +56 -0
  36. data/ext/prosody/src/util.rs +219 -0
  37. data/lib/prosody/configuration.rb +333 -0
  38. data/lib/prosody/handler.rb +177 -0
  39. data/lib/prosody/native_stubs.rb +417 -0
  40. data/lib/prosody/processor.rb +321 -0
  41. data/lib/prosody/sentry.rb +36 -0
  42. data/lib/prosody/version.rb +10 -0
  43. data/lib/prosody.rb +42 -0
  44. data/release-please-config.json +10 -0
  45. data/sig/configuration.rbs +252 -0
  46. data/sig/handler.rbs +79 -0
  47. data/sig/processor.rbs +100 -0
  48. data/sig/prosody.rbs +171 -0
  49. data/sig/version.rbs +9 -0
  50. metadata +193 -0
data/README.md ADDED
@@ -0,0 +1,946 @@
# Prosody: Ruby Bindings for Kafka

Prosody offers Ruby bindings to the [Prosody Kafka client](https://github.com/prosody-events/prosody), providing
features for message production and consumption, including configurable retry mechanisms, failure handling
strategies, and integrated OpenTelemetry support for distributed tracing.

## Features

- **Kafka Consumer**: Per-key ordering with cross-key concurrency, offset management, consumer groups
- **Kafka Producer**: Idempotent delivery with configurable retries
- **Timer System**: Persistent scheduled execution backed by Cassandra or in-memory store
- **Quality of Service**: Fair scheduling limits concurrency and prevents failures from starving fresh traffic. Pipeline mode adds deferred retry and monopolization detection
- **Distributed Tracing**: OpenTelemetry integration for tracing message flow across services
- **Backpressure**: Pauses partitions when handlers fall behind
- **Mocking**: In-memory Kafka broker for tests (`mock: true`)
- **Failure Handling**: Pipeline (retry forever), Low-Latency (dead letter), Best-Effort (log and skip)

## Installation

Add this line to your application's Gemfile:

```ruby
gem "prosody"
```

Or install directly:

```bash
gem install prosody
```

## Quick Start

```ruby
require "prosody"

# Initialize the client with Kafka bootstrap server, consumer group, and topics
client = Prosody::Client.new(
  # Bootstrap servers should normally be set using the PROSODY_BOOTSTRAP_SERVERS environment variable
  bootstrap_servers: "localhost:9092",

  # To allow loopbacks, the source_system must be different from the group_id.
  # Normally, the source_system would be left unspecified, which would default to the group_id.
  source_system: "my-application-source",

  # The group_id should be set to the name of your application
  group_id: "my-application",

  # Topics the client should subscribe to
  subscribed_topics: "my-topic"
)

# Define a custom message handler
class MyHandler < Prosody::EventHandler
  def on_message(context, message)
    # Process the received message
    puts "Received message: #{message.payload.inspect}"

    # Schedule a timer for delayed processing (requires Cassandra unless mock: true)
    if message.payload["schedule_followup"]
      future_time = Time.now + 30 # 30 seconds from now
      context.schedule(future_time)
    end
  end

  def on_timer(context, timer)
    # Handle timer firing
    puts "Timer fired for key: #{timer.key} at #{timer.time}"
  end
end

# Subscribe to messages using the custom handler
client.subscribe(MyHandler.new)

# Send a message to a topic
client.send_message("my-topic", "message-key", { content: "Hello, Kafka!" })

# Ensure proper shutdown when done
client.unsubscribe
```

## Architecture

Prosody enables efficient, parallel processing of Kafka messages while maintaining order for messages with the same key:

- **Partition-Level Parallelism**: Separate management of each Kafka partition
- **Key-Based Queuing**: Ordered processing for each key within a partition
- **Concurrent Processing**: Simultaneous processing of different keys
- **Backpressure Management**: Pause consumption from backed-up partitions

## Quality of Service

All modes use **fair scheduling** to limit concurrency and distribute execution time. Pipeline mode adds **deferred
retry** and **monopolization detection**.

### Fair Scheduling (All Modes)

The scheduler controls which message runs next and how many run concurrently.

**Virtual Time (VT):** Each key accumulates VT equal to its handler execution time. The scheduler picks the key with the
lowest VT. A key that runs for 500ms accumulates 500ms of VT; a key that hasn't run recently has zero VT and gets
priority.

**Two-Class Split:** Normal messages and failure retries have separate VT pools. The scheduler allocates execution time
between them (default: 70% normal, 30% failure). During a failure spike, retries get at most 30% of execution time;
fresh messages continue processing.

**Starvation Prevention:** Tasks receive a quadratic priority boost based on wait time. A task waiting 2 minutes
(configurable) gets maximum boost, overriding VT disadvantage.
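
The scheduling rules above can be sketched in plain Ruby. This is an illustration of the policy only, not the gem's Rust implementation; the class name and the exact boost formula are assumptions:

```ruby
# Illustrative virtual-time fair scheduler (not Prosody's actual internals).
# Each key accumulates VT equal to its handler runtime; the key with the
# lowest effective VT runs next, and waiting tasks earn a quadratic boost
# (an assumed formula) so a long wait overrides a VT disadvantage.
class FairScheduler
  def initialize(max_wait: 120.0, wait_weight: 200.0)
    @vt = Hash.new(0.0) # per-key virtual time, in seconds
    @max_wait = max_wait
    @wait_weight = wait_weight
  end

  # Record that `key`'s handler ran for `seconds`.
  def charge(key, seconds)
    @vt[key] += seconds
  end

  # Pick the next key to run. `waiting` maps key => seconds its oldest
  # message has been queued.
  def pick(waiting)
    waiting.min_by { |key, wait| @vt[key] - boost(wait) }&.first
  end

  private

  # Quadratic in wait time, saturating at max_wait.
  def boost(wait)
    @wait_weight * ([wait, @max_wait].min / @max_wait)**2
  end
end

sched = FairScheduler.new
sched.charge("busy", 0.5)                       # busy ran for 500ms
sched.pick({ "busy" => 1.0, "fresh" => 1.0 })   # => "fresh"
sched.charge("fresh", 500.0)
sched.pick({ "busy" => 120.0, "fresh" => 0.0 }) # => "busy" (waited 2 minutes)
```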

### Deferred Retry (Pipeline Mode)

Moves failing keys to timer-based retry so the partition can continue processing other keys.

On transient failure: store the message offset in Cassandra, schedule a timer, return success. The partition advances.
When the timer fires, reload the message from Kafka and retry.

```ruby
# Configure defer behavior
client = Prosody::Client.new(
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic",
  defer_enabled: true,          # Enable deferral (default: true)
  defer_base: 1.0,              # Wait 1s before first retry
  defer_max_delay: 86400.0,     # Cap at 24 hours
  defer_failure_threshold: 0.9  # Disable when >90% failing
)
```
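
With `defer_base` and `defer_max_delay` as above, a plausible shape for the deferred-retry delay is exponential growth from the base, capped at the maximum. The doubling formula below is an assumption for illustration; the gem's exact curve may differ:

```ruby
# Assumed exponential backoff for deferred retries: the delay before
# attempt n doubles from defer_base and is capped at defer_max_delay.
def defer_delay(attempt, base: 1.0, max_delay: 86_400.0)
  [base * (2**attempt), max_delay].min
end

(0..4).map { |n| defer_delay(n) } # => [1.0, 2.0, 4.0, 8.0, 16.0]
defer_delay(30)                   # => 86400.0 (capped at 24 hours)
```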

**Failure Rate Gating:** When >90% of recent messages fail, deferral is disabled. The retry middleware blocks the
partition, applying backpressure upstream.

### Monopolization Detection (Pipeline Mode)

Rejects keys that consume too much execution time.

The middleware tracks per-key execution time in 5-minute rolling windows. Keys exceeding 90% of window time are rejected
with a transient error, routing them through defer.

```ruby
# Configure monopolization detection
client = Prosody::Client.new(
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic",
  monopolization_enabled: true,   # Enable detection (default: true)
  monopolization_threshold: 0.9,  # Reject keys using >90% of window
  monopolization_window: 300.0    # 5-minute window
)
```
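
The windowed accounting can be sketched as follows. The real middleware lives in the Rust extension; this Ruby model (names assumed) only illustrates the rejection rule, where a key is rejected once its handler time exceeds `monopolization_threshold` of the window:

```ruby
# Illustrative hot-key guard (the real middleware is in the Rust extension).
# Per-key handler time is summed over a rolling window; a key whose share
# exceeds threshold * window is rejected and routed through defer.
class MonopolizationGuard
  def initialize(window: 300.0, threshold: 0.9)
    @window = window
    @threshold = threshold
    @samples = Hash.new { |h, k| h[k] = [] } # key => [[timestamp, seconds]]
  end

  def record(key, seconds, now:)
    @samples[key] << [now, seconds]
  end

  def reject?(key, now:)
    in_window = @samples[key].select { |t, _| now - t <= @window }
    in_window.sum { |_, secs| secs } > @threshold * @window
  end
end

guard = MonopolizationGuard.new
guard.record("hot", 280.0, now: 100.0)
guard.reject?("hot", now: 100.0) # => true (280s > 0.9 * 300s = 270s)
guard.reject?("hot", now: 500.0) # => false (sample aged out of the window)
```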

### Handler Timeout

Handlers are automatically cancelled if they exceed a deadline:

```ruby
client = Prosody::Client.new(
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic",
  timeout: 30.0,        # Cancel after 30 seconds
  stall_threshold: 60.0 # Report unhealthy after 60 seconds
)
```

When a handler times out, `context.should_cancel?` returns `true`. The handler should exit promptly. If not specified,
`timeout` defaults to 80% of `stall_threshold`.
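
Cancellation is cooperative: long-running handlers should poll `context.should_cancel?` between units of work so they can exit promptly once the deadline passes. A sketch with a stand-in context object (the real context comes from the gem):

```ruby
# FakeContext stands in for Prosody's handler context, whose
# should_cancel? turns true once the timeout fires.
FakeContext = Struct.new(:cancelled) do
  def should_cancel?
    cancelled
  end
end

# Cooperative handler loop: check the cancellation flag between items.
def process_batch(context, items)
  done = []
  items.each do |item|
    break if context.should_cancel? # exit promptly on cancellation
    done << item.upcase             # stand-in for real per-item work
  end
  done
end

process_batch(FakeContext.new(false), %w[a b c]) # => ["A", "B", "C"]
process_batch(FakeContext.new(true), %w[a b c])  # => []
```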
166
+
167
+ ## Configuration
168
+
169
+ Configure via constructor options or environment variables. Options fall back to environment variables when unset.
170
+
171
+ ### Core
172
+
173
+ | Option / Environment Variable | Description | Default |
174
+ |-----------------------------------------|---------------------------------------------------|--------------|
175
+ | `bootstrap_servers` / `PROSODY_BOOTSTRAP_SERVERS` | Kafka servers to connect to | - |
176
+ | `group_id` / `PROSODY_GROUP_ID` | Consumer group name | - |
177
+ | `subscribed_topics` / `PROSODY_SUBSCRIBED_TOPICS` | Topics to read from | - |
178
+ | `allowed_events` / `PROSODY_ALLOWED_EVENTS` | Only process events matching these prefixes | (all) |
179
+ | `source_system` / `PROSODY_SOURCE_SYSTEM` | Tag for outgoing messages (prevents reprocessing)| `<group_id>` |
180
+ | `mock` / `PROSODY_MOCK` | Use in-memory Kafka for testing | false |
181
+
182
+ ### Consumer
183
+
184
+ | Option / Environment Variable | Description | Default |
185
+ |-----------------------------------------|------------------------------------------------------|------------------------|
186
+ | `max_concurrency` / `PROSODY_MAX_CONCURRENCY` | Max messages being processed simultaneously | 32 |
187
+ | `max_uncommitted` / `PROSODY_MAX_UNCOMMITTED` | Max queued messages before pausing consumption | 64 |
188
+ | `timeout` / `PROSODY_TIMEOUT` | Cancel handler if it runs longer than this | 80% of stall threshold |
189
+ | `commit_interval` / `PROSODY_COMMIT_INTERVAL` | How often to save progress to Kafka | 1s |
190
+ | `poll_interval` / `PROSODY_POLL_INTERVAL` | How often to fetch new messages from Kafka | 100ms |
191
+ | `shutdown_timeout` / `PROSODY_SHUTDOWN_TIMEOUT` | Shutdown budget; handlers run freely until cancellation fires near the end of the timeout | 30s |
192
+ | `stall_threshold` / `PROSODY_STALL_THRESHOLD` | Report unhealthy if no progress for this long | 5m |
193
+ | `probe_port` / `PROSODY_PROBE_PORT` | HTTP port for health checks (nil to disable) | 8000 |
194
+ | `failure_topic` / `PROSODY_FAILURE_TOPIC` | Send unprocessable messages here (dead letter queue) | - |
195
+ | `idempotence_cache_size` / `PROSODY_IDEMPOTENCE_CACHE_SIZE` | Global shared cache capacity across all partitions for message deduplication (0 disables the entire deduplication middleware, both in-memory and persistent) | 8192 |
196
+ | `idempotence_version` / `PROSODY_IDEMPOTENCE_VERSION` | Version string for cache-busting dedup hashes | 1 |
197
+ | `idempotence_ttl` / `PROSODY_IDEMPOTENCE_TTL` | TTL for dedup records in Cassandra | 7d (604800 seconds) |
198
+ | `slab_size` / `PROSODY_SLAB_SIZE` | Timer storage granularity (rarely needs changing) | 1h |
199
+ | `message_spans` / `PROSODY_MESSAGE_SPANS` | Span linking for message execution: `child` (child-of) or `follows_from` | `child` |
200
+ | `timer_spans` / `PROSODY_TIMER_SPANS` | Span linking for timer execution: `child` (child-of) or `follows_from` | `follows_from` |
201
+
202
+ ### Producer
203
+
204
+ | Option / Environment Variable | Description | Default |
205
+ |-----------------------------------------|---------------------------------|---------|
206
+ | `send_timeout` / `PROSODY_SEND_TIMEOUT` | Give up sending after this long | 1s |
207
+
208
+ ### Retry
209
+
210
+ When a handler fails, retry with exponential backoff:
211
+
212
+ | Option / Environment Variable | Description | Default |
213
+ |-----------------------------------------|-----------------------------------|---------|
214
+ | `max_retries` / `PROSODY_MAX_RETRIES` | Give up after this many attempts | 3 |
215
+ | `retry_base` / `PROSODY_RETRY_BASE` | Wait this long before first retry | 20ms |
216
+ | `max_retry_delay` / `PROSODY_RETRY_MAX_DELAY` | Never wait longer than this | 5m |
217
+
218
+ ### Deferral (Pipeline Mode)
219
+
220
+ | Option / Environment Variable | Description | Default |
221
+ |-----------------------------------------|---------------------------------------------------|---------|
222
+ | `defer_enabled` / `PROSODY_DEFER_ENABLED` | Enable deferral for new messages | true |
223
+ | `defer_base` / `PROSODY_DEFER_BASE` | Wait this long before first deferred retry | 1s |
224
+ | `defer_max_delay` / `PROSODY_DEFER_MAX_DELAY` | Never wait longer than this | 24h |
225
+ | `defer_failure_threshold` / `PROSODY_DEFER_FAILURE_THRESHOLD` | Disable deferral when failure rate exceeds this | 0.9 |
226
+ | `defer_failure_window` / `PROSODY_DEFER_FAILURE_WINDOW` | Measure failure rate over this time window | 5m |
227
+ | `defer_cache_size` / `PROSODY_DEFER_CACHE_SIZE` | Track this many deferred keys in memory | 1024 |
228
+ | `defer_seek_timeout` / `PROSODY_DEFER_SEEK_TIMEOUT` | Timeout when loading deferred messages | 30s |
229
+ | `defer_discard_threshold` / `PROSODY_DEFER_DISCARD_THRESHOLD` | Read optimization (rarely needs changing) | 100 |
230
+
231
+ ### Monopolization Detection (Pipeline Mode)
232
+
233
+ | Option / Environment Variable | Description | Default |
234
+ |-----------------------------------------|-----------------------------------------|---------|
235
+ | `monopolization_enabled` / `PROSODY_MONOPOLIZATION_ENABLED` | Enable hot key protection | true |
236
+ | `monopolization_threshold` / `PROSODY_MONOPOLIZATION_THRESHOLD` | Max handler time as fraction of window | 0.9 |
237
+ | `monopolization_window` / `PROSODY_MONOPOLIZATION_WINDOW` | Measurement window | 5m |
238
+ | `monopolization_cache_size` / `PROSODY_MONOPOLIZATION_CACHE_SIZE` | Max distinct keys to track | 8192 |
239
+
240
+ ### Fair Scheduling (All Modes)
241
+
242
+ | Option / Environment Variable | Description | Default |
243
+ |-----------------------------------------|------------------------------------------------------------------|---------|
244
+ | `scheduler_failure_weight` / `PROSODY_SCHEDULER_FAILURE_WEIGHT` | Fraction of processing time reserved for retries | 0.3 |
245
+ | `scheduler_max_wait` / `PROSODY_SCHEDULER_MAX_WAIT` | Messages waiting this long get maximum priority | 2m |
246
+ | `scheduler_wait_weight` / `PROSODY_SCHEDULER_WAIT_WEIGHT` | Priority boost for waiting messages (higher = more aggressive) | 200.0 |
247
+ | `scheduler_cache_size` / `PROSODY_SCHEDULER_CACHE_SIZE` | Max distinct keys to track | 8192 |
248
+
249
+ ### Cassandra
250
+
251
+ Persistent storage for timers and deferred retries (not needed if `mock: true`):
252
+
253
+ | Option / Environment Variable | Description | Default |
254
+ |-----------------------------------------|------------------------------------|---------|
255
+ | `cassandra_nodes` / `PROSODY_CASSANDRA_NODES` | Servers to connect to (host:port) | - |
256
+ | `cassandra_keyspace` / `PROSODY_CASSANDRA_KEYSPACE` | Keyspace name | prosody |
257
+ | `cassandra_user` / `PROSODY_CASSANDRA_USER` | Username | - |
258
+ | `cassandra_password` / `PROSODY_CASSANDRA_PASSWORD` | Password | - |
259
+ | `cassandra_datacenter` / `PROSODY_CASSANDRA_DATACENTER` | Prefer this datacenter for queries | - |
260
+ | `cassandra_rack` / `PROSODY_CASSANDRA_RACK` | Prefer this rack for queries | - |
261
+ | `cassandra_retention` / `PROSODY_CASSANDRA_RETENTION` | Delete data older than this | 1y |
262
+
263
+ ### Telemetry Emitter
264
+
265
+ Prosody can emit internal processing events (message lifecycle, timer events) to a Kafka topic for observability:
266
+
267
+ | Option / Environment Variable | Description | Default |
268
+ |-----------------------------------------|------------------------------------------------|----------------------------|
269
+ | `telemetry_topic` / `PROSODY_TELEMETRY_TOPIC` | Kafka topic to produce telemetry events to | `prosody.telemetry-events` |
270
+ | `telemetry_enabled` / `PROSODY_TELEMETRY_ENABLED` | Enable or disable the telemetry emitter | true |
271
+

## Logging

Prosody exposes a module-level logger used by both the native Rust extension and the Ruby async processor. By default it
writes to `$stdout` at the `INFO` level.

```ruby
# Read the current logger
Prosody.logger
# => #<Logger:... @level=1 ...>

# Assign a custom logger
Prosody.logger = Logger.new("log/prosody.log", level: Logger::DEBUG)

# Or silence logging entirely
Prosody.logger = Logger.new(File::NULL)
```

Set `Prosody.logger` **before** creating a `Prosody::Client`. The Rust runtime reads the logger on first client
initialization and will use whatever logger is configured at that point.

Setting the logger back to `nil` restores the default:

```ruby
Prosody.logger = nil
Prosody.logger.level # => Logger::INFO
```

## Liveness and Readiness Probes

Prosody includes a built-in probe server for consumer-based applications that provides health check endpoints. The probe
server is tied to the consumer's lifecycle and offers two main endpoints:

1. `/readyz`: A readiness probe that checks if any partitions are assigned to the consumer. Returns a success status
   only when the consumer has at least one partition assigned, indicating it's ready to process messages.

2. `/livez`: A liveness probe that checks if any partitions have stalled (haven't processed a message within a
   configured time threshold).

Configure the probe server using either the client constructor:

```ruby
client = Prosody::Client.new(
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic",
  probe_port: 8000,     # Set to false to disable
  stall_threshold: 15.0 # Seconds before considering a partition stalled
)
```

Or via environment variables:

```bash
PROSODY_PROBE_PORT=8000     # Set to 'none' to disable
PROSODY_STALL_THRESHOLD=15s # Default stall detection threshold
```

### Important Notes

1. The probe server starts automatically when the consumer is subscribed and stops when unsubscribed.
2. A partition is considered "stalled" if it hasn't processed a message within the `stall_threshold` duration.
3. The stall threshold should be set based on your application's message processing latency and expected message
   frequency.
4. Setting the threshold too low might cause false positives, while setting it too high could delay detection of actual
   issues.
5. The probe server is only active when consuming messages (not for producer-only usage).

You can monitor the stall state programmatically using the client's methods:

```ruby
# Get the number of partitions currently assigned to this consumer
partition_count = client.assigned_partitions

# Check if the consumer has stalled partitions
if client.is_stalled?
  warn "Consumer has stalled partitions"
end
```

## Advanced Usage

### Pipeline Mode

Pipeline mode is the default. It ensures ordered processing, retrying failed operations indefinitely:

```ruby
# Initialize client in pipeline mode
client = Prosody::Client.new(
  mode: :pipeline, # Explicitly set pipeline mode (this is the default)
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic"
)
```

### Low-Latency Mode

Prioritizes quick processing, sending persistently failing messages to a failure topic:

```ruby
# Initialize client in low-latency mode
client = Prosody::Client.new(
  mode: :low_latency, # Set low-latency mode
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic",
  failure_topic: "failed-messages" # Specify a topic for failed messages
)
```

### Best-Effort Mode

Optimized for development environments or services where message processing failures are acceptable:

```ruby
# Initialize client in best-effort mode
client = Prosody::Client.new(
  mode: :best_effort, # Set best-effort mode
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic"
)
```

## Event Type Filtering

Prosody supports filtering messages based on event type prefixes, allowing your consumer to process only specific types of events:

```ruby
# Process only events with types starting with "user." or "account."
client = Prosody::Client.new(
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic",
  allowed_events: ["user.", "account."]
)
```

Or via environment variables:

```bash
PROSODY_ALLOWED_EVENTS=user.,account.
```

### Matching Behavior

Prefixes must match exactly from the start of the event type:

Matches:
- `{"type": "user.created"}` matches prefix `user.`
- `{"type": "account.deleted"}` matches prefix `account.`

No match:
- `{"type": "admin.user.created"}` doesn't match `user.`
- `{"type": "my.account.deleted"}` doesn't match `account.`
- `{"type": "notification"}` doesn't match any prefix

If no prefixes are configured, all messages are processed. Messages without a `type` field are always processed.
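
The matching rules above amount to a simple `start_with?` check. A sketch reproducing the documented behavior (the function name is illustrative, not the gem's API):

```ruby
# Documented filtering rule: process a message when it has no type, when no
# prefixes are configured, or when its type starts with a configured prefix.
def allowed_event?(payload, prefixes)
  type = payload["type"]
  return true if type.nil? || prefixes.empty?
  prefixes.any? { |prefix| type.start_with?(prefix) }
end

prefixes = ["user.", "account."]
allowed_event?({ "type" => "user.created" }, prefixes)       # => true
allowed_event?({ "type" => "admin.user.created" }, prefixes) # => false
allowed_event?({ "content" => "no type field" }, prefixes)   # => true
allowed_event?({ "type" => "notification" }, [])             # => true
```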

## Source System Deduplication

Prosody prevents processing loops in distributed systems by tracking the source of each message:

```ruby
# Consumer and producer in one application
client = Prosody::Client.new(
  group_id: "my-service",
  source_system: "my-service-producer", # Must differ from group_id to allow loopbacks; defaults to group_id
  subscribed_topics: "my-topic"
)
```

Or via environment variable:

```bash
PROSODY_SOURCE_SYSTEM=my-service-producer
```

### How It Works

1. **Producers** add a `source-system` header to all outgoing messages.
2. **Consumers** check this header on incoming messages.
3. If a message's source system matches the consumer's group ID, the message is skipped.

This prevents endless loops where a service consumes its own produced messages.
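
The skip decision reduces to a header comparison. A sketch of the documented check (header access is shown on a plain hash; the real consumer reads Kafka message headers):

```ruby
# Documented loop guard: a consumer skips messages whose source-system
# header matches its own group ID.
def skip_own_message?(headers, group_id)
  headers["source-system"] == group_id
end

skip_own_message?({ "source-system" => "my-service" }, "my-service")          # => true
skip_own_message?({ "source-system" => "my-service-producer" }, "my-service") # => false
skip_own_message?({}, "my-service")                                           # => false
```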

## Message Deduplication

Prosody automatically deduplicates messages using the `id` field in their JSON payload. Consecutive messages with the
same ID and key are processed only once.

The deduplication system uses:
- A **global in-memory cache** shared across all partitions, surviving partition reassignments within a process
- A **Cassandra-backed persistent store** for cross-restart deduplication

```ruby
# Messages with IDs are deduplicated per key
client.send_message("my-topic", "key1", {
  id: "msg-123", # Message will be processed
  content: "Hello!"
})

client.send_message("my-topic", "key1", {
  id: "msg-123", # Message will be skipped (duplicate)
  content: "Hello again!"
})

client.send_message("my-topic", "key2", {
  id: "msg-123", # Message will be processed (different key)
  content: "Hello!"
})
```

Setting `idempotence_cache_size` to `0` disables the **entire** deduplication middleware (both the in-memory cache and the Cassandra-backed persistent store):

```ruby
client = Prosody::Client.new(
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic",
  idempotence_cache_size: 0 # Disable all deduplication (both in-memory and persistent)
)
```

Or via environment variable:

```bash
PROSODY_IDEMPOTENCE_CACHE_SIZE=0
```

To invalidate all previously recorded dedup entries (e.g. after a data migration), change the version string:

```ruby
client = Prosody::Client.new(
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic",
  idempotence_version: "2" # Changing this invalidates all existing dedup records
)
```

The `idempotence_ttl` option controls how long dedup records are retained in Cassandra (default: 7 days):

```ruby
client = Prosody::Client.new(
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic",
  idempotence_ttl: 86400.0 # Keep dedup records for 1 day
)
```

Note that the in-memory cache is best-effort. Duplicates can still occur across different process instances.
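
The in-memory behavior described above (consecutive same-ID messages per key are skipped) can be modeled with a last-seen map. This sketch deliberately ignores the bounded cache size and the Cassandra store:

```ruby
# Model of consecutive per-key dedup: remember the last message ID seen for
# each key and skip immediate repeats. The real middleware bounds the cache
# (idempotence_cache_size) and also persists records to Cassandra.
class DedupCache
  def initialize
    @last_id = {}
  end

  # Returns true if the message should be processed.
  def process?(key, id)
    return true if id.nil?           # messages without an id are not deduped
    return false if @last_id[key] == id
    @last_id[key] = id
    true
  end
end

cache = DedupCache.new
cache.process?("key1", "msg-123") # => true
cache.process?("key1", "msg-123") # => false (consecutive duplicate)
cache.process?("key2", "msg-123") # => true  (different key)
```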

## Timer Functionality

Prosody supports timer-based delayed execution within message handlers. When a timer fires, your handler's `on_timer` method will be called:

```ruby
class MyHandler < Prosody::EventHandler
  def on_message(context, message)
    # Schedule a timer to fire in 30 seconds
    future_time = Time.now + 30
    context.schedule(future_time)

    # Schedule multiple timers
    one_minute = Time.now + 60
    two_minutes = Time.now + 120
    context.schedule(one_minute)
    context.schedule(two_minutes)

    # Check what's scheduled
    scheduled_times = context.scheduled
    puts "Scheduled timers: #{scheduled_times.length}"
  end

  def on_timer(context, timer)
    puts "Timer fired!"
    puts "Key: #{timer.key}"
    puts "Scheduled time: #{timer.time}"
  end
end
```

### Timer Methods

The context provides timer scheduling methods that allow you to delay execution or implement timeout behavior:

- `schedule(time)`: Schedules a timer to fire at the specified time
- `clear_and_schedule(time)`: Clears all timers and schedules a new one
- `unschedule(time)`: Removes a timer scheduled for the specified time
- `clear_scheduled`: Removes all scheduled timers
- `scheduled`: Returns an array of all scheduled timer times

### Timer Object

When a timer fires, the `on_timer` method receives a timer object with these properties:

- `key` (String): The entity key identifying what this timer belongs to
- `time` (Time): The time when this timer was scheduled to fire

**Note**: Timer precision is limited to seconds due to the underlying storage format. Sub-second precision in scheduled times will be rounded to the nearest second.
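
The timer API above behaves like a per-key set of times with second granularity. An in-memory model of those methods (the real timers persist to Cassandra; deduplicating on `schedule` is an assumption here):

```ruby
# In-memory model of the context's timer methods, rounding to whole seconds
# as documented. Real timers are persisted; this only mirrors the API shape.
class TimerSet
  def initialize
    @times = []
  end

  def schedule(time)
    t = time.round # sub-second precision is rounded to the nearest second
    @times << t unless @times.include?(t)
    t
  end

  def unschedule(time)
    @times.delete(time.round)
  end

  def clear_scheduled
    @times.clear
  end

  def clear_and_schedule(time)
    clear_scheduled
    schedule(time)
  end

  def scheduled
    @times.dup
  end
end

timers = TimerSet.new
timers.schedule(Time.at(100.4)) # stored as Time.at(100)
timers.schedule(Time.at(200))
timers.scheduled.size           # => 2
timers.clear_and_schedule(Time.at(300))
timers.scheduled                # => [Time.at(300)]
```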

### Timer Configuration

Timer functionality requires Cassandra for persistence unless running in mock mode. Configure the Cassandra connection via environment variable:

```bash
PROSODY_CASSANDRA_NODES=localhost:9042 # Required for timer persistence
```

Or programmatically when creating the client:

```ruby
client = Prosody::Client.new(
  bootstrap_servers: "localhost:9092",
  group_id: "my-application",
  subscribed_topics: "my-topic",
  cassandra_nodes: "localhost:9042" # Required unless mock: true
)
```

For testing, you can use mock mode to avoid the Cassandra dependency:

```ruby
# Mock mode for testing (timers work but aren't persisted)
client = Prosody::Client.new(
  bootstrap_servers: "localhost:9092",
  group_id: "my-application",
  subscribed_topics: "my-topic",
  mock: true # No Cassandra required in mock mode
)
```

## OpenTelemetry Tracing

Prosody supports OpenTelemetry tracing, allowing you to monitor and analyze the performance of your Kafka-based
applications. The library will emit traces using the OTLP protocol if the `OTEL_EXPORTER_OTLP_ENDPOINT` environment
variable is defined.

Note: Prosody emits its own traces from a separate tracing runtime in the native extension, since forwarding every
trace through Ruby would be prohibitively expensive.

### Required Gems

To use OpenTelemetry tracing with Prosody, install the following gems:

```ruby
gem 'opentelemetry-sdk', '~> 1.10'
gem 'opentelemetry-api', '~> 1.7'
gem 'opentelemetry-exporter-otlp', '~> 0.31'
```

### Initializing Tracing

To initialize tracing in your application:

```ruby
require 'opentelemetry/sdk'
require 'opentelemetry/exporter/otlp'

OpenTelemetry::SDK.configure do |c|
  c.service_name = 'my-service-name'
  c.add_span_processor(
    OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new(
      OpenTelemetry::Exporter::OTLP::Exporter.new
    )
  )
end

tracer = OpenTelemetry.tracer_provider.tracer('my-service-name')
```

### Setting OpenTelemetry Environment Variables

Set the following standard OpenTelemetry environment variables:

```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_SERVICE_NAME=my-service-name
```

For more information on these and other OpenTelemetry environment variables, refer to
the [OpenTelemetry specification](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#general-sdk-configuration).

### Using Tracing in Your Application

After initializing tracing, you can define spans in your application, and they will be properly propagated through
Kafka:

```ruby
class MyHandler < Prosody::EventHandler
  def initialize
    @tracer = OpenTelemetry.tracer_provider.tracer('my-service-name')
  end

  def on_message(context, message)
    @tracer.in_span('process-message') do |span|
      # Process the received message
      span.add_event('message.received', attributes: {
        'message.payload' => message.payload.to_json
      })
    end
  end
end
```

### Span Linking

By default, message execution spans use **`child`** (child-of relationship: the execution span is part of
the same trace as the producer). Timer execution spans use **`follows_from`** (the execution span starts a
new trace with a span link back to the scheduling span, since timer execution is causally related but not part of
the same operation).

Both strategies are configurable via the `message_spans` / `PROSODY_MESSAGE_SPANS` and `timer_spans` /
`PROSODY_TIMER_SPANS` options. Accepted values: `'child'`, `'follows_from'`.
681
+
682
## Best Practices

### Ensuring Thread-Safe Handlers

Your event handler methods will be called concurrently. Avoid mutable state shared across event handler calls.
If you must share state, protect it with appropriate synchronization primitives.

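A minimal sketch of that advice, using a `Mutex` to guard a shared counter. In a real application this class would subclass `Prosody::EventHandler`; it stands alone here so the pattern is runnable anywhere:

```ruby
# Handler methods may run concurrently, so all access to the shared
# counter goes through a Mutex.
class CountingHandler
  def initialize
    @mutex = Mutex.new
    @processed = 0
  end

  def on_message(_context, _message)
    @mutex.synchronize { @processed += 1 }
  end

  def processed_count
    @mutex.synchronize { @processed }
  end
end
```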
### Ensuring Idempotent Message Handlers

Idempotent message handlers are crucial for maintaining data consistency, fault tolerance, and scalability when working
with distributed, event-based systems. They ensure that processing a message multiple times has the same effect as
processing it once, which is essential for recovering from failures.

Strategies for achieving idempotence:

1. **Natural Idempotence**: Use inherently idempotent operations (e.g., setting a value in a key-value store).

2. **Deduplication with Unique Identifiers**:
   - Kafka messages can be uniquely identified by their partition and offset.
   - Before processing, check if the message has been handled before.
   - Store processed message identifiers with an appropriate TTL.

3. **Database Upserts**: Use upsert operations for database writes (e.g., `INSERT ... ON CONFLICT DO UPDATE` in
   PostgreSQL).

4. **Partition Offset Tracking**:
   - Store the latest processed offset for each partition.
   - Only process messages with higher offsets than the last processed one.
   - Critically, store these offsets transactionally with other state updates to ensure consistency.

5. **Idempotency Keys for External APIs**: Utilize idempotency keys when supported by external APIs.

6. **Check-then-Act Pattern**:
   - For non-idempotent external systems, verify if an operation was previously completed before execution.
   - Maintain a record of completed operations, keyed by a unique message identifier.

7. **Saga Pattern**:
   - Implement a state machine in your database for multi-step operations.
   - Each message advances the state machine, allowing for idempotent processing and easy failure recovery.
   - Particularly useful for complex, distributed transactions across multiple services.

### Proper Shutdown

Always unsubscribe from topics before exiting your application:

```ruby
# Ensure proper shutdown
client.unsubscribe
```

This ensures:

1. Completion and commitment of all in-flight work
2. Quick rebalancing, allowing other consumers to take over partitions
3. Proper release of resources

Implement shutdown handling in your application using signal handlers:

```ruby
require "prosody"

client = Prosody::Client.new(
  bootstrap_servers: "localhost:9092",
  group_id: "my-consumer-group",
  subscribed_topics: "my-topic"
)

# Set up a shutdown queue
shutdown = Queue.new

# Configure signal handlers to trigger shutdown
Signal.trap("INT") { shutdown.push(nil) }
Signal.trap("TERM") { shutdown.push(nil) }

# Subscribe to messages
client.subscribe(MyHandler.new)

# Block until a signal is received
shutdown.pop # This blocks until something is pushed to the queue by a signal handler

# Clean shutdown
puts "Shutting down gracefully..."
client.unsubscribe
```

### Error Handling

Prosody classifies errors as transient (temporary, can be retried) or permanent (won't be resolved by retrying). By
default, all errors are considered transient.

Use the `Prosody::EventHandler` error classification methods:

```ruby
class MyHandler < Prosody::EventHandler
  # Mark TypeErrors and NoMethodErrors as permanent (not retryable)
  permanent :on_message, TypeError, NoMethodError

  # Mark JSON::ParserError as transient (retryable)
  transient :on_message, JSON::ParserError

  def on_message(context, message)
    # Your message handling logic here
    # TypeError and NoMethodError will be treated as permanent
    # JSON::ParserError will be treated as transient
    # All other exceptions will be treated as transient (default behavior)
  end
end
```

Best practices:

- Use permanent errors for issues like malformed data or business logic violations.
- Use transient errors for temporary issues like network problems.
- Be cautious with permanent errors, as they prevent retries and can result in data loss.
- Consider system reliability and data consistency when classifying errors.

### Handling Task Cancellation

Prosody cancels tasks during partition rebalancing, timeout, or shutdown. During shutdown, handlers run freely for
most of the `shutdown_timeout` before the cancellation signal fires, giving in-flight work time to complete. When
cancelled, your handler receives `Async::Stop` at the next yield point (an I/O operation, sleep, etc.).

Best practices:

1. Use `ensure` blocks for resource cleanup; they run even when `Async::Stop` is raised.
2. For CPU-bound loops that don't yield, check `context.should_cancel?` periodically.
3. Exit promptly when cancelled to avoid rebalancing delays.

```ruby
class MyHandler < Prosody::EventHandler
  def on_message(context, message)
    resource = acquire_resource
    begin
      items = message.payload["items"]
      items.each do |item|
        # For CPU-bound work, check cancellation periodically
        return if context.should_cancel?

        process_item(item)
      end
    ensure
      # Always runs, even on Async::Stop
      release_resource(resource)
    end
  end
end
```

If you catch `Async::Stop` and don't re-raise it, Prosody considers the task successful, so re-raise after any
custom cleanup:

```ruby
def on_message(context, message)
  do_work
rescue Async::Stop
  # Custom cleanup on cancellation
  cleanup
  raise # Re-raise to signal cancellation to Prosody
end
```

Failing to handle cancellation properly can lead to resource leaks or delayed rebalancing.

## Release Process

Prosody uses an automated release process managed by GitHub Actions. Here's an overview of how releases are handled:

1. **Trigger**: The release process is triggered automatically on pushes to the `main` branch.

2. **Release Please**: The process starts with the "Release Please" action, which:
   - Analyzes commit messages since the last release.
   - Creates or updates a release pull request with changelog updates and version bumps.
   - When the PR is merged, it creates a GitHub release and a git tag.

3. **Build Process**: If a new release is created, the following build jobs are triggered:
   - Linux builds for x86_64 and aarch64 architectures.
   - Linux musl builds for the same architectures.
   - macOS builds for x86_64 and arm64 architectures.
   - Windows builds for the x64 architecture.

4. **Artifact Upload**: Each build job uploads its output (Ruby native extensions) as GitHub Actions artifacts.

5. **Publication**: If all builds are successful, the final step publishes the built gems.

### Contributing to Releases

To contribute to a release:

1. Make your changes in a feature branch.
2. Use [Conventional Commits](https://www.conventionalcommits.org/) syntax for your commit messages. This helps Release
   Please determine the next version number and generate the changelog.
3. Create a pull request to merge your changes into the `main` branch.
4. Once your PR is approved and merged, Release Please will include your changes in the next release PR.

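For example (the messages below are illustrative, not from this repository), Conventional Commits types map to semantic version bumps as follows:

```
fix(client): handle nil payloads gracefully     -> patch release
feat(scheduler): add debounce helper            -> minor release
feat!: rename subscribe callback arguments      -> major release (breaking change)
```

A `!` after the type, or a `BREAKING CHANGE:` footer, signals a major bump.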
### Manual Releases

While the process is automated, manual intervention may sometimes be necessary:

- You can manually trigger the release workflow from the GitHub Actions tab if needed.
- If you need to make changes to the release PR created by Release Please, you can do so before merging it.

Ensure you have thoroughly tested your changes before merging to `main`.

## API Reference

### Prosody::Client

- `new(**config)`: Initialize a new Prosody client with the given configuration.
- `send_message(topic, key, payload)`: Send a message to a specified topic.
- `consumer_state`: Get the current state of the consumer (`:unconfigured`, `:configured`, or `:running`).
- `source_system`: Get the source system identifier configured for the client.
- `subscribe(handler)`: Subscribe to messages using the provided handler.
- `unsubscribe`: Unsubscribe from messages and shut down the consumer.
- `assigned_partitions`: Get the number of partitions currently assigned to this consumer.
- `is_stalled?`: Check whether the consumer has stalled partitions.

### Prosody::EventHandler

A base class for user-defined handlers:

```ruby
class MyHandler < Prosody::EventHandler
  # Optional error classification
  permanent :on_message, TypeError
  transient :on_message, JSON::ParserError

  def on_message(context, message)
    # Implement your message handling logic here
  end

  def on_timer(context, timer)
    # Implement your timer handling logic here
  end
end
```

### Prosody::Message

Represents a Kafka message with the following attributes:

- `topic` (String): The name of the topic.
- `partition` (Integer): The partition number.
- `offset` (Integer): The message offset within the partition.
- `timestamp` (Time): The timestamp when the message was created or sent.
- `key` (String): The message key.
- `payload` (Hash/Array/String): The message payload as a JSON-deserialized value.

### Prosody::Context

Represents the context of message processing:

- `should_cancel?`: Check if cancellation has been requested (includes timeout and shutdown).
- `on_cancel`: Block until cancellation is signaled.

Timer scheduling methods:

- `schedule(time)`: Schedule a timer to fire at the specified time.
- `clear_and_schedule(time)`: Clear all timers and schedule a new one.
- `unschedule(time)`: Remove a timer scheduled for the specified time.
- `clear_scheduled`: Remove all scheduled timers.
- `scheduled`: Return an array of all scheduled timer times.

### Prosody::Timer

Represents a timer that has fired, provided to the `on_timer` method:

- `key` (String): The entity key identifying what this timer belongs to.
- `time` (Time): The time when this timer was scheduled to fire.