jetstream_bridge 4.6.1 → 5.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 45f0a65b80e45040eb7dad01d30fc1eecb22a523132dd4bbccb3c36f1e7d61d7
4
- data.tar.gz: 5b059406d2787d097d443f0b6b2211b6e544b641f35cb1dfea24727368a47f29
3
+ metadata.gz: 6b75d900f4ede2b0dbc787641f532cfcbe07619435eaf3a1673b163d8f87658b
4
+ data.tar.gz: 7295f58824d037c799c6aa3487b650fe3030b50acd24fe858878dfbafa8c3cba
5
5
  SHA512:
6
- metadata.gz: aea59cbd54ad871b81518fca9d2d9e2a647f12b9c5cd2884ba92336a989b790ae312d0a63684e83418f86b0addc5b6382ff6f4c3c5e9332845a6e545e0b1af08
7
- data.tar.gz: 2950483c92f5164497409a08d4a7bb0578dd5c4cedb01c1d49532eb8941767d203d7bcb767ccb4557d5fb9911f4ebd75b12fa3d88bf31ad8c756a79ef9ed9e66
6
+ metadata.gz: 5788fd8947c2d9d3f6e7dc9c7da0bd9fe072c9682fb13a220b12977e4c53e3ec9b373917f96a7ee415488b3081c3dd939afedd56a9ec98f324e0e31919a48106
7
+ data.tar.gz: a8223affe27e1cf99bf2db07bf8672b49ecece81e410b40f3727366813ba6d39e18ae828052b6c9cf68693354af2eb382a3566338fb4312ceb78008c2a993b4b
data/README.md CHANGED
@@ -34,7 +34,7 @@ Production-ready NATS JetStream bridge for Ruby/Rails with outbox, inbox, DLQ, a
34
34
 
35
35
  ```ruby
36
36
  # Gemfile
37
- gem "jetstream_bridge", "~> 4.5"
37
+ gem "jetstream_bridge", "~> 5.0"
38
38
  ```
39
39
 
40
40
  ```bash
@@ -62,10 +62,11 @@ consumer.run!
62
62
 
63
63
  ## Documentation
64
64
 
65
- - [Getting Started](docs/GETTING_STARTED.md)
66
- - [Production Guide](docs/PRODUCTION.md)
67
- - [Restricted Permissions & Provisioning](docs/RESTRICTED_PERMISSIONS.md)
68
- - [Testing with Mock NATS](docs/TESTING.md)
65
+ - [Getting Started](docs/GETTING_STARTED.md) - Setup, configuration, and basic usage
66
+ - [Architecture & Topology](docs/ARCHITECTURE.md) - Internal architecture, message flow, and patterns
67
+ - [Production Guide](docs/PRODUCTION.md) - Production deployment and monitoring
68
+ - [Restricted Permissions & Provisioning](docs/RESTRICTED_PERMISSIONS.md) - Manual provisioning and security
69
+ - [Testing with Mock NATS](docs/TESTING.md) - Fast, no-infra testing
69
70
 
70
71
  ## License
71
72
 
@@ -0,0 +1,1135 @@
1
+ # Architecture & Topology
2
+
3
+ This document explains the internal architecture, topology patterns, and message flow of JetStream Bridge.
4
+
5
+ ## Table of Contents
6
+
7
+ - [Overview](#overview)
8
+ - [Core Components](#core-components)
9
+ - [Topology Model](#topology-model)
10
+ - [Subject Naming & Routing](#subject-naming--routing)
11
+ - [Message Flow](#message-flow)
12
+ - [Reliability Patterns](#reliability-patterns)
13
+ - [Consumer Modes](#consumer-modes)
14
+ - [Configuration & Lifecycle](#configuration--lifecycle)
15
+ - [Error Handling](#error-handling)
16
+ - [Thread Safety](#thread-safety)
17
+
18
+ ---
19
+
20
+ ## Overview
21
+
22
+ JetStream Bridge provides reliable, production-ready message passing between Ruby/Rails services using NATS JetStream. The architecture is designed around:
23
+
24
+ 1. **Single stream per application pair** - One JetStream stream handles bidirectional communication
25
+ 2. **Durable consumers** - Each application has a durable consumer (`{app_name}-workers`)
26
+ 3. **Subject-based routing** - Messages routed via subjects: `{source}.sync.{destination}`
27
+ 4. **Optional reliability patterns** - Outbox (publisher), Inbox (consumer), and DLQ (both)
28
+ 5. **Flexible deployment** - Pull or push consumers, with auto or manual provisioning
29
+
30
+ ### Architecture Diagram
31
+
32
+ ```text
33
+ ┌─────────────────┐ ┌─────────────────┐
34
+ │ Application │ │ Application │
35
+ │ "api" │ │ "worker" │
36
+ ├─────────────────┤ ├─────────────────┤
37
+ │ │ │ │
38
+ │ Publisher │ │ Publisher │
39
+ │ (Optional │ │ (Optional │
40
+ │ Outbox) │ │ Outbox) │
41
+ │ │ │ │
42
+ └────────┬────────┘ └────────┬────────┘
43
+ │ │
44
+ │ Publish to: │ Publish to:
45
+ │ api.sync.worker │ worker.sync.api
46
+ │ │
47
+ └──────────────┐ ┌────────────┘
48
+ │ │
49
+ ▼ ▼
50
+ ┌──────────────────────┐
51
+ │ NATS JetStream │
52
+ │ │
53
+ │ Stream: "my-stream" │
54
+ │ Subjects: │
55
+ │ - api.sync.worker │
56
+ │ - worker.sync.api │
57
+ │ - api.sync.dlq │
58
+ │ - worker.sync.dlq │
59
+ │ │
60
+ │ Consumers: │
61
+ │ - api-workers │
62
+ │ - worker-workers │
63
+ └──────────────────────┘
64
+ │ │
65
+ ┌──────────────┘ └────────────┐
66
+ │ │
67
+ │ Subscribe to: │ Subscribe to:
68
+ │ worker.sync.api │ api.sync.worker
69
+ │ │
70
+ ┌────────▼────────┐ ┌────────▼────────┐
71
+ │ Consumer │ │ Consumer │
72
+ │ (Optional │ │ (Optional │
73
+ │ Inbox) │ │ Inbox) │
74
+ │ │ │ │
75
+ │ DLQ Handler │ │ DLQ Handler │
76
+ └─────────────────┘ └─────────────────┘
77
+ ```
78
+
79
+ ---
80
+
81
+ ## Core Components
82
+
83
+ ### Connection (`lib/jetstream_bridge/core/connection.rb`)
84
+
85
+ Thread-safe singleton managing NATS connections:
86
+
87
+ - Validates NATS URLs and JetStream availability
88
+ - Automatic reconnection with configurable retry logic
89
+ - Health check API with caching (30s TTL)
90
+ - Reconnect handlers for post-fork scenarios (Puma, Sidekiq)
91
+ - State management: disconnected → connecting → connected → reconnecting
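The reconnection bullet above can be pictured as a bounded retry loop. Below is a minimal sketch. It assumes the attempt count and delay map to the `connect_retry_attempts` and `connect_retry_delay` settings from the configuration example later in this document, and that the delay stays fixed between attempts. `connect_with_retry` is a hypothetical helper, not the gem's API.

```ruby
# Hypothetical helper: retries a connection block a bounded number of times.
# attempts/delay mirror connect_retry_attempts / connect_retry_delay (fixed delay assumed).
def connect_with_retry(attempts:, delay:, sleeper: ->(seconds) { sleep(seconds) })
  raise ArgumentError, "attempts must be >= 1" if attempts < 1

  tries = 0
  begin
    tries += 1
    yield(tries) # the block performs one connection attempt and raises on failure
  rescue StandardError => e
    raise e if tries >= attempts # out of attempts: surface the last error

    sleeper.call(delay)
    retry
  end
end
```

In an application the block would open the NATS connection and check JetStream. The injectable `sleeper` exists only so the loop can be exercised without actually waiting.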
92
+
93
+ **Key Methods:**
94
+
95
+ - `Connection.instance` - Get singleton connection
96
+ - `connection.connect!` - Establish NATS connection
97
+ - `connection.nats` - Access raw NATS client
98
+ - `connection.jetstream` - Access JetStream context
99
+ - `connection.healthy?` - Check connection health
100
+ - `connection.reconnect!` - Force reconnection
101
+
102
+ ### Publisher (`lib/jetstream_bridge/publisher/publisher.rb`)
103
+
104
+ Publishes events to JetStream:
105
+
106
+ - Event envelope construction (event_id, timestamps, schema_version, etc.)
107
+ - Resource ID extraction from payload
108
+ - Optional outbox pattern for transactional guarantees
109
+ - Retry logic with exponential backoff
110
+ - Duplicate detection via NATS message ID header
111
+
112
+ **Usage:**
113
+
114
+ ```ruby
115
+ JetstreamBridge.publish(
116
+ resource_type: "user",
117
+ event_type: "user.created",
118
+ payload: { id: 1, email: "user@example.com" }
119
+ )
120
+ ```
121
+
122
+ **Envelope Structure:**
123
+
124
+ ```json
125
+ {
126
+ "event_id": "uuid",
127
+ "event_type": "user.created",
128
+ "resource_type": "user",
129
+ "resource_id": "1",
130
+ "payload": { "id": 1, "email": "user@example.com" },
131
+ "produced_at": "2024-01-01T00:00:00Z",
132
+ "producer": "api",
133
+ "schema_version": "1.0",
134
+ "trace_id": "trace-uuid"
135
+ }
136
+ ```
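Here is a minimal sketch of envelope construction in plain Ruby. It assumes two things: `resource_id` is taken from the payload's `id` key (string or symbol), and `trace_id` is a fresh UUID when none is supplied. `build_envelope` is a hypothetical helper, not the Publisher's actual method.

```ruby
require "json"
require "securerandom"
require "time"

# Hypothetical envelope builder mirroring the fields in the JSON above.
def build_envelope(resource_type:, event_type:, payload:, producer:, trace_id: nil)
  # Assumption: resource_id comes from payload[:id] or payload["id"], stringified.
  raw_id = payload[:id] || payload["id"]
  {
    "event_id" => SecureRandom.uuid,           # also used as the NATS dedup message ID
    "event_type" => event_type,
    "resource_type" => resource_type,
    "resource_id" => raw_id&.to_s,
    "payload" => payload,
    "produced_at" => Time.now.utc.iso8601,     # e.g. "2024-01-01T00:00:00Z"
    "producer" => producer,
    "schema_version" => "1.0",
    "trace_id" => trace_id || SecureRandom.uuid
  }
end

envelope = build_envelope(
  resource_type: "user",
  event_type: "user.created",
  payload: { id: 1, email: "user@example.com" },
  producer: "api"
)
puts JSON.pretty_generate(envelope)
```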
137
+
138
+ ### Consumer (`lib/jetstream_bridge/consumer/consumer.rb`)
139
+
140
+ Subscribes to and processes messages:
141
+
142
+ - Durable consumer creation and subscription binding
143
+ - Batch fetching (pull mode) or delivery subject subscription (push mode)
144
+ - Message parsing and handler invocation
145
+ - Optional inbox pattern for exactly-once processing
146
+ - Automatic DLQ routing for unrecoverable errors
147
+ - Graceful shutdown with signal handlers
148
+
149
+ **Usage:**
150
+
151
+ ```ruby
152
+ consumer = JetstreamBridge::Consumer.new do |event|
153
+ User.upsert({
154
+ id: event.resource_id,
155
+ email: event.payload["email"]
156
+ })
157
+ end
158
+
159
+ consumer.run! # Blocks and processes messages
160
+ ```
161
+
162
+ ### Provisioner (`lib/jetstream_bridge/provisioner.rb`)
163
+
164
+ Creates and updates JetStream streams and consumers:
165
+
166
+ - Stream creation with work-queue retention
167
+ - Subject management and overlap detection
168
+ - Consumer creation with delivery policies
169
+ - Idempotent operations (safe to re-run)
170
+ - Can run at deploy-time with admin credentials or at runtime
171
+
172
+ **Stream Configuration:**
173
+
174
+ ```ruby
175
+ {
176
+ name: "jetstream-bridge-stream",
177
+ retention: :workqueue,
178
+ storage: :file,
179
+ subjects: [
180
+ "api.sync.worker",
181
+ "worker.sync.api",
182
+ "api.sync.dlq",
183
+ "worker.sync.dlq"
184
+ ]
185
+ }
186
+ ```
187
+
188
+ **Consumer Configuration:**
189
+
190
+ ```ruby
191
+ {
192
+ durable_name: "api-workers",
193
+ filter_subject: "worker.sync.api",
194
+ ack_policy: :explicit,
195
+ deliver_policy: :all,
196
+ max_deliver: 5,
197
+ ack_wait: 30_000_000_000, # 30s in nanoseconds
198
+ backoff: [1_000_000_000, 5_000_000_000, ...]
199
+ }
200
+ ```
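The consumer configuration stores `ack_wait` and `backoff` in nanoseconds. The application config later in this document uses strings such as `'30s'` instead. Below is a sketch of that conversion. It assumes only the suffixes `ms`, `s`, `m` and `h` need support. `duration_to_ns` is hypothetical, and the gem's own parser may accept other formats.

```ruby
# Hypothetical converter from duration strings like "30s" to nanoseconds.
NS_PER_UNIT = {
  "ms" => 1_000_000,
  "s" => 1_000_000_000,
  "m" => 60_000_000_000,
  "h" => 3_600_000_000_000
}.freeze

def duration_to_ns(value)
  # Bare numbers are rejected because their unit would be ambiguous.
  match = /\A(\d+(?:\.\d+)?)(ms|s|m|h)\z/.match(value.to_s.strip)
  raise ArgumentError, "invalid duration: #{value.inspect}" unless match

  (Float(match[1]) * NS_PER_UNIT.fetch(match[2])).round
end

ack_wait_ns = duration_to_ns("30s")
backoff_ns = %w[1s 5s 15s 30s 60s].map { |d| duration_to_ns(d) }
puts ack_wait_ns
puts backoff_ns.inspect
```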
201
+
202
+ ### Topology (`lib/jetstream_bridge/topology/topology.rb`)
203
+
204
+ Orchestrates stream and consumer provisioning:
205
+
206
+ - Overlap guard prevents subject conflicts between streams
207
+ - Stream support retries when conflicts occur
208
+ - Validates and normalizes subjects
209
+ - Coordinates provisioning across components
210
+
211
+ ---
212
+
213
+ ## Topology Model
214
+
215
+ ### Stream Structure
216
+
217
+ **One stream per application pair** (or shared stream for multiple apps):
218
+
219
+ ```text
220
+ Stream: "jetstream-bridge-stream"
221
+ ├── Subjects:
222
+ │ ├── api.sync.worker (api publishes, worker consumes)
223
+ │ ├── worker.sync.api (worker publishes, api consumes)
224
+ │ ├── api.sync.dlq (api's dead letter queue)
225
+ │ └── worker.sync.dlq (worker's dead letter queue)
226
+ │
227
+ ├── Retention: workqueue (messages deleted after ack)
228
+ ├── Storage: file (persistent on disk)
229
+ └── Consumers:
230
+ ├── api-workers (filters: worker.sync.api)
231
+ └── worker-workers (filters: api.sync.worker)
232
+ ```
233
+
234
+ ### Subject Pattern
235
+
236
+ **IMPORTANT:** `app_name` should not include environment identifiers (e.g., use "api" not "api-production"). Consumer names are shared across environments.
237
+
238
+ #### Source Subject (Publisher)
239
+
240
+ ```text
241
+ {app_name}.sync.{destination_app}
242
+ ```
243
+
244
+ Example: `api.sync.worker`
245
+
246
+ #### Destination Subject (Consumer)
247
+
248
+ ```text
249
+ {destination_app}.sync.{app_name}
250
+ ```
251
+
252
+ Example: `worker.sync.api` (reverse of source)
253
+
254
+ #### DLQ Subject
255
+
256
+ ```text
257
+ {app_name}.sync.dlq
258
+ ```
259
+
260
+ Example: `api.sync.dlq`
261
+
262
+ ### Consumer Naming
263
+
264
+ **Durable consumer name:**
265
+
266
+ ```text
267
+ {app_name}-workers
268
+ ```
269
+
270
+ Example: `api-workers`
271
+
272
+ **Filter subject:**
273
+
274
+ ```text
275
+ {destination_app}.sync.{app_name}
276
+ ```
277
+
278
+ Example: `worker.sync.api`
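Every subject and the durable name in this section derive from just `app_name` and `destination_app`. Below is a sketch. It assumes both values are already-valid single tokens. `topology_names` is a hypothetical helper.

```ruby
# Hypothetical helper deriving the names described above from two settings.
def topology_names(app_name:, destination_app:)
  {
    source_subject: "#{app_name}.sync.#{destination_app}",      # this app publishes here
    destination_subject: "#{destination_app}.sync.#{app_name}", # this app's filter subject
    dlq_subject: "#{app_name}.sync.dlq",
    durable_name: "#{app_name}-workers"
  }
end

api = topology_names(app_name: "api", destination_app: "worker")
worker = topology_names(app_name: "worker", destination_app: "api")
# One side's source subject is the other side's filter subject.
puts api[:source_subject] == worker[:destination_subject]
```

Each side's filter subject is built from the other side's `app_name`. Both applications must therefore agree on the exact names.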
279
+
280
+ ---
281
+
282
+ ## Subject Naming & Routing
283
+
284
+ ### Subject Validation
285
+
286
+ Subjects must:
287
+
288
+ - Not contain NATS wildcards (`*`, `>`)
289
+ - Not contain spaces or control characters
290
+ - Not exceed 255 characters
291
+ - Use valid characters: alphanumeric, hyphen, underscore, period
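The rules above fit in one regular expression plus a length check. The sketch below assumes empty tokens (`api..worker`) and leading or trailing periods are also invalid, as they are for NATS subjects. `valid_subject?` is hypothetical.

```ruby
# Hypothetical validator for the subject rules listed above.
# Each token: one or more of A-Z, a-z, 0-9, "_" or "-"; tokens joined by single periods.
# The character class alone excludes wildcards, spaces and control characters.
SUBJECT_PATTERN = /\A[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*\z/
MAX_SUBJECT_LENGTH = 255

def valid_subject?(subject)
  subject.is_a?(String) &&
    subject.length <= MAX_SUBJECT_LENGTH &&
    SUBJECT_PATTERN.match?(subject)
end

puts valid_subject?("api.sync.worker")
puts valid_subject?("api.*.worker")
```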
292
+
293
+ ### Subject Matching
294
+
295
+ JetStream Bridge implements NATS wildcard matching:
296
+
297
+ - `*` - Matches exactly one token (e.g., `api.*.worker` matches `api.sync.worker`)
298
+ - `>` - Matches one or more tokens (e.g., `api.>` matches `api.sync.worker`)
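Both wildcard rules can be checked token by token. Below is a sketch that assumes valid patterns. `subject_matches?` is hypothetical and not necessarily how JetStream Bridge implements its matcher.

```ruby
# Hypothetical matcher for NATS-style wildcards in a pattern against a concrete subject.
def subject_matches?(pattern, subject)
  p_tokens = pattern.split(".")
  s_tokens = subject.split(".")
  p_tokens.each_with_index do |token, i|
    # ">" must be the last token and needs at least one remaining subject token.
    return s_tokens.length > i if token == ">"
    return false if i >= s_tokens.length
    # "*" matches any single token; literals must be equal.
    return false unless token == "*" || token == s_tokens[i]
  end
  p_tokens.length == s_tokens.length # no leftover subject tokens
end

puts subject_matches?("api.*.worker", "api.sync.worker") # prints true
puts subject_matches?("api.>", "api.sync.worker")        # prints true
puts subject_matches?("api.*", "api.sync.worker")        # prints false
```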
299
+
300
+ ### Overlap Detection
301
+
302
+ The `OverlapGuard` prevents subject conflicts:
303
+
304
+ ```ruby
305
+ # Existing stream has: "orders.>"
306
+ # New stream wants: "orders.created"
307
+ # Result: OVERLAP - "orders.created" would be captured by "orders.>"
308
+ ```
309
+
310
+ Overlap detection ensures messages route to exactly one stream.
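Two subject patterns overlap when at least one concrete subject matches both. Below is a token-by-token sketch of that check. It assumes each pattern is already valid, with `>` only as the last token. `subjects_overlap?` is hypothetical and may differ from `OverlapGuard`'s actual logic.

```ruby
# Hypothetical overlap check: true if some concrete subject could match both patterns.
def subjects_overlap?(a, b)
  ta = a.split(".")
  tb = b.split(".")
  [ta.length, tb.length].min.times do |i|
    # The other pattern still has a token here, so ">" can cover it and everything after.
    return true if ta[i] == ">" || tb[i] == ">"
    # "*" matches any single token; identical literals match each other.
    next if ta[i] == "*" || tb[i] == "*" || ta[i] == tb[i]

    return false # two different literal tokens can never both match
  end
  ta.length == tb.length # same token count and no mismatch so far
end

existing = ["orders.>", "api.sync.worker"]
candidate = "orders.created"
conflicts = existing.select { |pattern| subjects_overlap?(pattern, candidate) }
puts conflicts.inspect
```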
311
+
312
+ ---
313
+
314
+ ## Message Flow
315
+
316
+ ### Publishing Flow
317
+
318
+ ```text
319
+ ┌──────────────────────────────────────────────────────────────┐
320
+ │ 1. Application calls JetstreamBridge.publish(...) │
321
+ └──────────────────────┬───────────────────────────────────────┘
322
+ │
323
+ ▼
324
+ ┌──────────────────────────────────────────────────────────────┐
325
+ │ 2. Publisher builds envelope │
326
+ │ - Generate event_id (UUID) │
327
+ │ - Extract resource_id from payload │
328
+ │ - Add timestamps, producer, schema_version │
329
+ └──────────────────────┬───────────────────────────────────────┘
330
+ │
331
+ ▼
332
+ ┌──────────────────────────────────────────────────────────────┐
333
+ │ 3. [OPTIONAL] Outbox pattern │
334
+ │ - OutboxRepository.persist_pre() │
335
+ │ - State: "publishing" │
336
+ │ - Database transaction commits │
337
+ └──────────────────────┬───────────────────────────────────────┘
338
+ │
339
+ ▼
340
+ ┌──────────────────────────────────────────────────────────────┐
341
+ │ 4. Publish to NATS JetStream │
342
+ │ - Subject: {app_name}.sync.{destination_app} │
343
+ │ - Header: nats-msg-id = event_id (deduplication) │
344
+ │ - Retry with exponential backoff on transient errors │
345
+ └──────────────────────┬───────────────────────────────────────┘
346
+ │
347
+ ▼
348
+ ┌──────────────────────────────────────────────────────────────┐
349
+ │ 5. [OPTIONAL] Outbox update │
350
+ │ - Success: OutboxRepository.persist_success() │
351
+ │ - Failure: OutboxRepository.persist_failure() │
352
+ └──────────────────────┬───────────────────────────────────────┘
353
+ │
354
+ ▼
355
+ ┌──────────────────────────────────────────────────────────────┐
356
+ │ 6. Return PublishResult │
357
+ │ - success: true/false │
358
+ │ - event_id: UUID │
359
+ │ - duplicate: true/false (if seen before) │
360
+ └──────────────────────────────────────────────────────────────┘
361
+ ```
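Steps 3 to 6 can be sketched without a database or NATS server. The sketch uses an in-memory hash in place of the outbox table and a block in place of the JetStream publish. `MemoryOutbox`, `publish_with_outbox` and this `PublishResult` struct are hypothetical. In the real flow the `persist_pre` write happens inside a database transaction, which a hash cannot model.

```ruby
# Hypothetical result object with the fields listed in step 6.
PublishResult = Struct.new(:success, :event_id, :duplicate, keyword_init: true)

# Hypothetical in-memory stand-in for the outbox table.
class MemoryOutbox
  attr_reader :rows

  def initialize
    @rows = {}
  end

  def persist_pre(envelope)
    @rows[envelope.fetch(:event_id)] = { status: "publishing", envelope: envelope }
  end

  def persist_success(event_id)
    @rows.fetch(event_id)[:status] = "sent"
  end

  def persist_failure(event_id, error)
    @rows.fetch(event_id).merge!(status: "failed", last_error: error.message)
  end
end

# Steps 3-6; the block stands in for the JetStream publish (step 4)
# and must return a hash such as { duplicate: false }.
def publish_with_outbox(outbox, envelope)
  event_id = envelope.fetch(:event_id)
  outbox.persist_pre(envelope) # step 3: record intent before publishing
  begin
    ack = yield(envelope)                      # step 4
    outbox.persist_success(event_id)           # step 5 (success)
    PublishResult.new(success: true, event_id: event_id, duplicate: ack.fetch(:duplicate, false))
  rescue StandardError => e
    outbox.persist_failure(event_id, e)        # step 5 (failure)
    PublishResult.new(success: false, event_id: event_id, duplicate: false)
  end
end
```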
362
+
363
+ ### Consuming Flow
364
+
365
+ ```text
366
+ ┌──────────────────────────────────────────────────────────────┐
367
+ │ 1. Application creates Consumer.new { |event| ... } │
368
+ └──────────────────────┬───────────────────────────────────────┘
369
+ │
370
+ ▼
371
+ ┌──────────────────────────────────────────────────────────────┐
372
+ │ 2. SubscriptionManager ensures durable consumer │
373
+ │ - Consumer: {app_name}-workers │
374
+ │ - Filter: {destination_app}.sync.{app_name} │
375
+ │ - Create if not exists (idempotent) │
376
+ └──────────────────────┬───────────────────────────────────────┘
377
+ │
378
+ ▼
379
+ ┌──────────────────────────────────────────────────────────────┐
380
+ │ 3. Subscribe to consumer │
381
+ │ - Pull mode: $JS.API.CONSUMER.MSG.NEXT.{stream}.{durable}│
382
+ │ - Push mode: {delivery_subject} │
383
+ └──────────────────────┬───────────────────────────────────────┘
384
+ │
385
+ ▼
386
+ ┌──────────────────────────────────────────────────────────────┐
387
+ │ 4. Consumer.run! starts main loop │
388
+ │ - Fetch batch of messages │
389
+ │ - Process each message sequentially │
390
+ │ - Idle backoff when no messages (0.05s → 1.0s) │
391
+ └──────────────────────┬───────────────────────────────────────┘
392
+ │
393
+ ▼
394
+ ┌──────────────────────────────────────────────────────────────┐
395
+ │ 5. [OPTIONAL] Inbox deduplication check │
396
+ │ - InboxRepository.find_or_build(event_id) │
397
+ │ - If already processed → skip and ack │
398
+ │ - If new → InboxRepository.persist_pre() │
399
+ │ - State: "processing" │
400
+ └──────────────────────┬───────────────────────────────────────┘
401
+ │
402
+ ▼
403
+ ┌──────────────────────────────────────────────────────────────┐
404
+ │ 6. MessageProcessor.handle_message() │
405
+ │ - Parse JSON envelope → Event object │
406
+ │ - Run middleware chain │
407
+ │ - Call user handler block │
408
+ │ - Return ActionResult (:ack or :nak) │
409
+ └──────────────────────┬───────────────────────────────────────┘
410
+ │
411
+ ▼
412
+ ┌──────────────────────────────────────────────────────────────┐
413
+ │ 7. Error handling │
414
+ │ - Unrecoverable (ArgumentError, TypeError) → DLQ + ack │
415
+ │ - Recoverable (StandardError) → nak with backoff │
416
+ │ - Malformed JSON → DLQ + ack │
417
+ └──────────────────────┬───────────────────────────────────────┘
418
+ │
419
+ ▼
420
+ ┌──────────────────────────────────────────────────────────────┐
421
+ │ 8. [OPTIONAL] Inbox update │
422
+ │ - Success: InboxRepository.persist_post() │
423
+ │ - Failure: InboxRepository.persist_failure() │
424
+ └──────────────────────┬───────────────────────────────────────┘
425
+ │
426
+ ▼
427
+ ┌──────────────────────────────────────────────────────────────┐
428
+ │ 9. Acknowledge message │
429
+ │ - :ack → msg.ack (removes from stream) │
430
+ │ - :nak → msg.nak(delay: backoff) (requeue for retry) │
431
+ └──────────────────────────────────────────────────────────────┘
432
+ ```
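The decision logic in steps 5 to 9 reduces to a function from a raw message to an action. Below is a sketch with these assumptions:

- A `Set` of processed event IDs stands in for `InboxRepository`.
- An invalid envelope (anything that is not a JSON object with `event_id`) is treated like malformed JSON.

`process_message` and its return values are hypothetical. Unlike the real inbox, this sketch records an ID only after the handler succeeds and does no row locking.

```ruby
require "json"
require "set"

# Errors the flow above treats as unrecoverable (DLQ + ack instead of retry).
UNRECOVERABLE_ERRORS = [ArgumentError, TypeError, NameError].freeze

# Hypothetical: returns [action, reason] for one raw message body.
def process_message(raw, processed_ids:, &handler)
  begin
    envelope = JSON.parse(raw)
    event_id = envelope.fetch("event_id")
  rescue JSON::ParserError, KeyError, TypeError, NoMethodError
    # Not JSON, or JSON that is not an object carrying an event_id.
    return [:dlq_and_ack, "malformed_json"]
  end

  # Inbox check: an already-processed event is acked without rerunning the handler.
  return [:ack, "duplicate"] if processed_ids.include?(event_id)

  begin
    handler.call(envelope)
    processed_ids << event_id
    [:ack, "processed"]
  rescue *UNRECOVERABLE_ERRORS
    [:dlq_and_ack, "unrecoverable_error"]
  rescue StandardError
    [:nak, "retry"] # JetStream redelivers after the backoff delay
  end
end
```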
433
+
434
+ ---
435
+
436
+ ## Reliability Patterns
437
+
438
+ ### Outbox Pattern (Publisher Side)
439
+
440
+ **Purpose:** Guarantee at-least-once delivery by persisting events to the database before publishing.
441
+
442
+ **Configuration:**
443
+
444
+ ```ruby
445
+ config.use_outbox = true
446
+ config.outbox_model = 'JetstreamBridge::OutboxEvent'
447
+ ```
448
+
449
+ **States:**
450
+
451
+ - `pending` - Event queued but not yet published
452
+ - `publishing` - Currently being published to NATS
453
+ - `sent` - Successfully published
454
+ - `failed` - Failed after retries
455
+ - `exception` - Unexpected error
456
+
457
+ **Recovery:**
458
+
459
+ ```ruby
460
+ # Retry failed events via background job
461
+ JetstreamBridge::OutboxEvent.where(status: 'failed').find_each do |event|
462
+ JetstreamBridge.publish(
463
+ event_id: event.event_id,
464
+ resource_type: event.resource_type,
465
+ event_type: event.event_type,
466
+ payload: event.payload
467
+ )
468
+ end
469
+ ```
470
+
471
+ ### Inbox Pattern (Consumer Side)
472
+
473
+ **Purpose:** Achieve effectively exactly-once processing by tracking received events in the database, so redelivered copies of an already-processed event are skipped.
474
+
475
+ **Configuration:**
476
+
477
+ ```ruby
478
+ config.use_inbox = true
479
+ config.inbox_model = 'JetstreamBridge::InboxEvent'
480
+ ```
481
+
482
+ **States:**
483
+
484
+ - `received` - Event received but not yet processed
485
+ - `processing` - Currently being processed
486
+ - `processed` - Successfully processed
487
+ - `failed` - Failed processing
488
+
489
+ **Deduplication:**
490
+
491
+ - Uses `event_id` for primary deduplication
492
+ - Falls back to `stream_seq` if event_id not available
493
+ - Database row locking prevents concurrent processing
494
+
495
+ **Example:**
496
+
497
+ ```ruby
498
+ # First delivery
499
+ inbox = InboxRepository.find_or_build(event_id: "abc123")
500
+ inbox.new_record? # => true
501
+ inbox.processed_at # => nil
502
+ # Process message...
503
+
504
+ # Second delivery (redelivery)
505
+ inbox = InboxRepository.find_or_build(event_id: "abc123")
506
+ inbox.new_record? # => false
507
+ inbox.processed_at # => 2024-01-01 00:00:00
508
+ # Skip processing, already done
509
+ ```
510
+
511
+ ### Dead Letter Queue (DLQ)
512
+
513
+ **Purpose:** Route unrecoverable messages to a separate subject for manual intervention.
514
+
515
+ **Configuration:**
516
+
517
+ ```ruby
518
+ config.use_dlq = true
519
+ ```
520
+
521
+ **Triggered by:**
522
+
523
+ 1. **Malformed JSON** - Cannot parse event envelope
524
+ 2. **Max deliveries exceeded** - Message failed `config.max_deliver` times
525
+ 3. **Unrecoverable errors** - ArgumentError, TypeError, NameError
526
+
527
+ **DLQ Message Headers:**
528
+
529
+ ```json
530
+ {
531
+ "x-dead-letter": "true",
532
+ "x-dlq-reason": "max_deliveries_exceeded",
533
+ "x-deliveries": "5",
534
+ "x-dlq-context": {
535
+ "event_id": "abc123",
536
+ "error_class": "StandardError",
537
+ "error_message": "Something went wrong",
538
+ "original_subject": "worker.sync.api",
539
+ "stream_sequence": 42,
540
+ "consumer_sequence": 10,
541
+ "timestamp": "2024-01-01T00:00:00Z"
542
+ }
543
+ }
544
+ ```
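Below is a sketch of assembling those headers. It assumes the nested `x-dlq-context` is JSON-encoded, because NATS header values are plain strings. Whether the gem encodes it this way is an assumption, and `dlq_headers` is a hypothetical helper name.

```ruby
require "json"
require "time"

# Hypothetical builder for the DLQ headers shown above; every value is a String.
def dlq_headers(reason:, deliveries:, event_id:, error:, original_subject:,
                stream_sequence:, consumer_sequence:)
  context = {
    "event_id" => event_id,
    "error_class" => error.class.name,
    "error_message" => error.message,
    "original_subject" => original_subject,
    "stream_sequence" => stream_sequence,
    "consumer_sequence" => consumer_sequence,
    "timestamp" => Time.now.utc.iso8601
  }
  {
    "x-dead-letter" => "true",
    "x-dlq-reason" => reason,
    "x-deliveries" => deliveries.to_s,
    "x-dlq-context" => JSON.generate(context) # nested data serialized into one header
  }
end

headers = dlq_headers(
  reason: "max_deliveries_exceeded", deliveries: 5, event_id: "abc123",
  error: StandardError.new("Something went wrong"), original_subject: "worker.sync.api",
  stream_sequence: 42, consumer_sequence: 10
)
puts headers.inspect
```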
545
+
546
+ **DLQ Subject:**
547
+
548
+ ```text
549
+ {app_name}.sync.dlq
550
+ ```
551
+
552
+ **Monitoring DLQ:**
553
+
554
+ ```bash
555
+ # View DLQ messages
556
+ nats sub 'api.sync.dlq'
557
+
558
+ # Check DLQ consumer
559
+ nats consumer info jetstream-bridge-stream api-dlq-consumer
560
+ ```
561
+
562
+ ### Retry Strategy
563
+
564
+ **Configuration:**
565
+
566
+ ```ruby
567
+ config.max_deliver = 5 # Max delivery attempts, including the first
568
+ config.ack_wait = '30s' # Time before JetStream redelivers
569
+ config.backoff = ['1s', '5s', '15s', '30s', '60s']
570
+ ```
571
+
572
+ **Backoff Calculation:**
573
+
574
+ - Base delay: 0.5s (transient errors) or 2.0s (other errors)
575
+ - Exponential multiplier: 2^(attempt - 1)
576
+ - Min delay: 1 second
577
+ - Max delay: 60 seconds
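Read literally, the four rules above give `delay = clamp(base * 2**(attempt - 1), 1, 60)`. Below is a sketch of that formula. This section does not say how it interacts with the `config.backoff` list, so treat `nak_delay` as a hypothetical illustration of the rules rather than the consumer's code path.

```ruby
TRANSIENT_BASE = 0.5 # seconds, per the rules above
DEFAULT_BASE = 2.0
MIN_DELAY = 1.0
MAX_DELAY = 60.0

# Hypothetical: NAK delay in seconds for a 1-based attempt number.
def nak_delay(attempt, transient:)
  raise ArgumentError, "attempt must be >= 1" if attempt < 1

  base = transient ? TRANSIENT_BASE : DEFAULT_BASE
  (base * 2**(attempt - 1)).clamp(MIN_DELAY, MAX_DELAY)
end

(1..7).each do |attempt|
  puts format("attempt %d: transient %.1fs, other %.1fs",
              attempt, nak_delay(attempt, transient: true), nak_delay(attempt, transient: false))
end
```

Under these rules, transient failures start at the 1-second floor. Other failures reach the 60-second cap at the sixth attempt.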
578
+
579
+ **Example Timeline:**
580
+
581
+ ```text
582
+ Attempt 1 → Fail → NAK with delay 1s
583
+ Attempt 2 (1s later) → Fail → NAK with delay 5s
584
+ Attempt 3 (5s later) → Fail → NAK with delay 15s
585
+ Attempt 4 (15s later) → Fail → NAK with delay 30s
586
+ Attempt 5 (30s later) → Fail → DLQ + ACK (max_deliver = 5 reached)
588
+ ```
589
+
590
+ ---
591
+
592
+ ## Consumer Modes
593
+
594
+ ### Pull Mode (Default)
595
+
596
+ **Configuration:**
597
+
598
+ ```ruby
599
+ config.consumer_mode = :pull # default
600
+ ```
601
+
602
+ **How it works:**
603
+
604
+ 1. Consumer publishes request to `$JS.API.CONSUMER.MSG.NEXT.{stream}.{durable}`
605
+ 2. JetStream responds with batch of messages (up to `batch_size`)
606
+ 3. Consumer processes messages and requests next batch
607
+
608
+ **Advantages:**
609
+
610
+ - **Backpressure control** - Consumer pulls when ready
611
+ - **Restricted permissions** - At runtime the consumer only sends pull requests for its own durable; no stream or consumer management APIs are needed
612
+ - **Scalability** - Multiple workers pull at their own pace
613
+
614
+ **Message Fetch:**
615
+
616
+ ```ruby
617
+ # Pull request
618
+ {
619
+ "batch": 10,
620
+ "max_bytes": 1048576, # 1MB
621
+ "idle_heartbeat": 5000000000 # 5s
622
+ }
623
+ ```
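Below is a sketch of the request in step 1, using the field names and values from the block above. `pull_request` is hypothetical, and a NATS client library normally builds this request for you. Only the subject format and the JSON keys come from this document.

```ruby
require "json"

# Hypothetical helper: subject and JSON body of one pull request.
def pull_request(stream:, durable:, batch: 10, max_bytes: 1_048_576, idle_heartbeat_ns: 5_000_000_000)
  subject = "$JS.API.CONSUMER.MSG.NEXT.#{stream}.#{durable}"
  body = JSON.generate("batch" => batch, "max_bytes" => max_bytes, "idle_heartbeat" => idle_heartbeat_ns)
  [subject, body]
end

subject, body = pull_request(stream: "jetstream-bridge-stream", durable: "api-workers")
puts subject
puts body
```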
624
+
625
+ **Use cases:**
626
+
627
+ - High-throughput processing
628
+ - Variable processing time per message
629
+ - Restricted production environments
630
+ - Multiple consumer instances
631
+
632
+ ### Push Mode
633
+
634
+ **Configuration:**
635
+
636
+ ```ruby
637
+ config.consumer_mode = :push
638
+ config.delivery_subject = 'worker.sync.api.worker' # optional
639
+ ```
640
+
641
+ **How it works:**
642
+
643
+ 1. JetStream automatically pushes messages to delivery subject
644
+ 2. Consumer subscribes to delivery subject
645
+ 3. Messages arrive as soon as available
646
+
647
+ **Advantages:**
648
+
649
+ - **Lower latency** - No request/response roundtrip
650
+ - **Simpler model** - JetStream delivers messages as they arrive; the consumer never issues fetch requests (acks are still required)
651
+ - **Good for real-time** - Immediate delivery
652
+
653
+ **Default Delivery Subject:**
654
+
655
+ ```text
656
+ {destination_subject}.worker
657
+ ```
658
+
659
+ Example: `worker.sync.api.worker`
660
+
661
+ **Use cases:**
662
+
663
+ - Low-latency requirements
664
+ - Event-driven architectures
665
+ - Moderate message volume
666
+ - Single consumer instance
667
+
668
+ ### Comparison Table
669
+
670
+ | Feature | Pull Mode | Push Mode |
671
+ |---------|-----------|-----------|
672
+ | **Control** | Consumer-driven | Server-driven |
673
+ | **Latency** | Slightly higher (request/response) | Lower (immediate push) |
674
+ | **Backpressure** | Natural (consumer controls fetch) | Requires management |
675
+ | **Permissions** | Works with restricted permissions | Standard permissions |
676
+ | **Scalability** | Better for high throughput | Good for moderate load |
677
+ | **Complexity** | More API calls | Simpler |
678
+ | **Best for** | Batch processing, high volume | Real-time, low volume |
679
+
680
+ ---
681
+
682
+ ## Configuration & Lifecycle
683
+
684
+ ### Configuration Flow
685
+
686
+ ```ruby
687
+ # config/initializers/jetstream_bridge.rb
688
+ JetstreamBridge.configure do |config|
689
+ # Connection
690
+ config.nats_urls = ENV.fetch('NATS_URLS', 'nats://localhost:4222')
691
+ config.stream_name = 'jetstream-bridge-stream'
692
+
693
+ # Application identity (no environment suffix!)
694
+ config.app_name = 'api'
695
+ config.destination_app = 'worker'
696
+
697
+ # Reliability
698
+ config.use_outbox = true
699
+ config.use_inbox = true
700
+ config.use_dlq = true
701
+
702
+ # Consumer tuning
703
+ config.max_deliver = 5
704
+ config.ack_wait = '30s'
705
+ config.backoff = %w[1s 5s 15s 30s 60s]
706
+
707
+ # Consumer mode
708
+ config.consumer_mode = :pull # or :push
709
+
710
+ # Provisioning
711
+ config.auto_provision = true # false for restricted environments
712
+
713
+ # Connection management
714
+ config.connect_retry_attempts = 3
715
+ config.connect_retry_delay = 2
716
+ config.lazy_connect = false
717
+ end
718
+ ```
719
+
720
+ ### Startup Lifecycle
721
+
722
+ **Automatic (Rails):**
723
+
724
+ ```ruby
725
+ # Railtie automatically calls after initialization:
726
+ JetstreamBridge.startup!
727
+ ```
728
+
729
+ **Manual:**
730
+
731
+ ```ruby
732
+ # Non-Rails or custom boot
733
+ JetstreamBridge.startup!
734
+ ```
735
+
736
+ **Lazy Connect:**
737
+
738
+ ```ruby
739
+ config.lazy_connect = true
740
+ # OR
741
+ ENV['JETSTREAM_BRIDGE_DISABLE_AUTOSTART'] = '1'
742
+
743
+ # Connection happens on first publish/subscribe
744
+ ```
745
+
746
+ **Startup Steps:**
747
+
748
+ 1. Validate configuration
749
+ 2. Connect to NATS
750
+ 3. Verify JetStream availability
751
+ 4. Ensure stream topology (if `auto_provision=true`)
752
+ 5. Cache connection for reuse
753
+
754
+ ### Provisioning Modes
755
+
756
+ #### Auto Provisioning (Default)
757
+
758
+ **Configuration:**
759
+
760
+ ```ruby
761
+ config.auto_provision = true
762
+ ```
763
+
764
+ **Behavior:**
765
+
766
+ - Creates streams and consumers at runtime
767
+ - Requires JetStream API permissions
768
+ - Idempotent (safe to re-run)
769
+
770
+ #### Manual Provisioning
771
+
772
+ **Configuration:**
773
+
774
+ ```ruby
775
+ config.auto_provision = false
776
+ ```
777
+
778
+ **Provisioning at deploy time:**
779
+
780
+ ```bash
781
+ # Using rake task with admin credentials
782
+ NATS_URLS=nats://admin:pass@host:4222 \
783
+ bundle exec rake jetstream_bridge:provision
784
+ ```
785
+
786
+ **Benefits:**
787
+
788
+ - Runtime credentials don't need admin permissions
789
+ - Separate provisioning from application lifecycle
790
+ - Better security posture
791
+
792
+ See [RESTRICTED_PERMISSIONS.md](RESTRICTED_PERMISSIONS.md) for details.
793
+
794
+ ### Reconnection Handling
795
+
796
+ **Automatic reconnection:**
797
+
798
+ ```ruby
799
+ # NATS client auto-reconnects on network failures
800
+ # JetstreamBridge preserves JetStream context
801
+ ```
802
+
803
+ **Manual reconnection:**
804
+
805
+ ```ruby
806
+ # After forking (Puma, Sidekiq)
807
+ JetstreamBridge.reconnect!
808
+
809
+ # Example: Puma config
810
+ on_worker_boot do
811
+ JetstreamBridge.reconnect!
812
+ end
813
+ ```
814
+
815
+ ---
816
+
817
+ ## Error Handling
818
+
819
+ ### Error Categories
820
+
821
+ #### Unrecoverable Errors
822
+
823
+ **Types:**
824
+
825
+ - `ArgumentError` - Invalid arguments
826
+ - `TypeError` - Type mismatch
827
+ - `NameError` - Undefined constant/method
828
+
829
+ **Handling:**
830
+
831
+ 1. Log error with full context
832
+ 2. Publish to DLQ with `x-dlq-reason: unrecoverable_error`
833
+ 3. ACK message (remove from stream)
834
+
835
+ #### Recoverable Errors
836
+
837
+ **Types:**
838
+
839
+ - `StandardError` (default)
840
+ - Transient failures (network, timeouts)
841
+ - Retryable business logic errors
842
+
843
+ **Handling:**
844
+
845
+ 1. Log error with delivery count
846
+ 2. NAK message with backoff delay
847
+ 3. JetStream redelivers after delay
848
+ 4. After `max_deliver` attempts → DLQ
849
+
850
+ #### Malformed Messages
851
+
852
+ **Types:**
853
+
854
+ - JSON parse errors
855
+ - Invalid envelope structure
856
+
857
+ **Handling:**
858
+
859
+ 1. Log raw message data
860
+ 2. Publish to DLQ with `x-dlq-reason: malformed_json`
861
+ 3. ACK message (remove from stream)
862
+
863
+ ### Error Context
864
+
865
+ **Logged Information:**
866
+
867
+ ```ruby
868
+ {
869
+ error_class: "StandardError",
870
+ error_message: "Database connection lost",
871
+ event_id: "abc123",
872
+ resource_type: "user",
873
+ event_type: "user.created",
874
+ delivery_count: 3,
875
+ stream_sequence: 42,
876
+ consumer_sequence: 10,
877
+ subject: "worker.sync.api",
878
+ backtrace: [...]
879
+ }
880
+ ```
881
+
882
+ ### Custom Error Handling
883
+
884
+ **Middleware approach:**
885
+
886
+ ```ruby
887
+ class CustomErrorHandler
888
+ def call(event, next_middleware)
889
+ next_middleware.call(event)
890
+ rescue CustomRetryableError => e
891
+ # Return ActionResult with custom delay
892
+ JetstreamBridge::Consumer::ActionResult.new(:nak, delay: 10)
893
+ rescue CustomPermanentError => e
894
+ # Log and move to DLQ (logger and publish_to_custom_dlq are app-defined)
895
+ logger.error("Permanent error: #{e.message}")
896
+ publish_to_custom_dlq(event, e)
897
+ JetstreamBridge::Consumer::ActionResult.new(:ack)
898
+ end
899
+ end
900
+
901
+ consumer.use(CustomErrorHandler.new)
902
+ ```
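Middleware written in the `call(event, next_middleware)` style above composes by wrapping: each layer receives a callable for everything after it. Below is a sketch of how such a chain could be assembled. It assumes the first middleware registered with `consumer.use` runs outermost. `build_chain` and `Tracer` are hypothetical, not the gem's internals.

```ruby
# Hypothetical middleware that logs before and after the rest of the chain runs.
class Tracer
  def initialize(name, log)
    @name = name
    @log = log
  end

  def call(event, next_middleware)
    @log << "#{@name}:before"
    result = next_middleware.call(event)
    @log << "#{@name}:after"
    result
  end
end

# Wrap the handler so middlewares[0] is outermost and the handler runs last.
def build_chain(middlewares, handler)
  middlewares.reverse.reduce(handler) do |next_step, middleware|
    ->(event) { middleware.call(event, next_step) }
  end
end

log = []
handler = ->(event) { log << "handler:#{event[:id]}"; :ack }
chain = build_chain([Tracer.new("outer", log), Tracer.new("inner", log)], handler)
result = chain.call({ id: 1 })
puts log.inspect
```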
903
+
904
+ ---
905
+
906
+ ## Thread Safety
907
+
908
+ ### Connection Singleton
909
+
910
+ **Thread-safe initialization:**
911
+
912
+ ```ruby
913
+ @@connection = nil # must be initialized: reading an unset class variable raises NameError
+ @@connection_lock = Mutex.new
914
+
915
+ def self.instance
916
+ return @@connection if @@connection
917
+
918
+ @@connection_lock.synchronize do
919
+ @@connection ||= new
920
+ end
921
+
922
+ @@connection
923
+ end
924
+ ```
925
+
926
+ **Health check cache:**
927
+
928
+ ```ruby
929
+ # Thread-safe cache updates
930
+ @health_cache_lock.synchronize do
931
+ @health_cache = { data: health_data, cached_at: Time.now }
932
+ end
933
+ ```
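The 30-second health-check cache from the Connection component can be sketched as a mutex-guarded TTL cache. The sketch makes two assumptions. First, it uses a monotonic clock, injectable so the example is testable. Second, the health check runs while the lock is held, which serializes concurrent callers. `HealthCache` is hypothetical.

```ruby
# Hypothetical TTL cache for health-check results, safe to share across threads.
class HealthCache
  def initialize(ttl: 30, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @lock = Mutex.new
    @entry = nil
  end

  # Returns cached data unless it is older than ttl or skip_cache is true.
  def fetch(skip_cache: false)
    @lock.synchronize do
      now = @clock.call
      if skip_cache || @entry.nil? || now - @entry[:cached_at] >= @ttl
        @entry = { data: yield, cached_at: now } # block computes fresh health data
      end
      @entry[:data]
    end
  end
end
```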
934
+
935
+ ### Consumer Processing
936
+
937
+ **Single-threaded by design:**
938
+
939
+ - Fetch batch → Process sequentially → Fetch next batch
940
+ - No concurrent message processing within one consumer instance
941
+ - Multiple consumer instances for parallelism
942
+
943
+ **Inbox Row Locking:**
944
+
945
+ ```ruby
946
+ # Prevents concurrent processing of same event_id
947
+ InboxEvent.lock.find_or_create_by!(event_id: event.event_id) do |inbox|
948
+ inbox.status = 'processing'
949
+ end
950
+ ```
951
+
952
+ ### Publisher
953
+
954
+ **Thread-safe publishing:**
955
+
956
+ - No global state mutation
957
+ - Independent envelope generation per call
958
+ - Outbox uses AR transactions for atomicity
959
+
960
+ **Concurrent publishing:**
961
+
962
+ ```ruby
963
+ # Safe to call from multiple threads
964
+ threads = 10.times.map do |i|
965
+ Thread.new do
966
+ JetstreamBridge.publish(
967
+ resource_type: 'user',
968
+ event_type: 'user.created',
969
+ payload: { id: i }
970
+ )
971
+ end
972
+ end
973
+
974
+ threads.each(&:join)
975
+ ```
976
+
977
+ ### Best Practices
978
+
979
+ 1. **One consumer per process** - Avoid multiple consumer loops in one process
980
+ 2. **Fork safety** - Call `JetstreamBridge.reconnect!` after forking
981
+ 3. **Database connections** - ActiveRecord handles connection pooling
982
+ 4. **Signal handling** - Consumer handles INT/TERM for graceful shutdown
983
+
984
+ ---
985
+
986
+ ## Performance Considerations
987
+
988
+ ### Batch Size
989
+
990
+ **Pull mode:**
991
+
992
+ ```ruby
993
+ consumer = JetstreamBridge::Consumer.new(batch_size: 10) do |event|
994
+ # Process event
995
+ end
996
+ ```
997
+
998
+ **Trade-offs:**
999
+
1000
+ - **Small batch (1-5):** Lower latency, more API calls
1001
+ - **Medium batch (10-50):** Balanced latency and throughput
1002
+ - **Large batch (50+):** Higher throughput, risk of processing timeouts
1003
+
1004
+ ### Idle Backoff
1005
+
1006
+ **Exponential backoff when no messages:**
1007
+
1008
+ ```text
1009
+ 0.05s → 0.1s → 0.2s → 0.4s → 0.8s → 1.0s (max)
1010
+ ```
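The idle schedule above is a doubling delay capped at 1.0s. Below is a sketch. It assumes the delay resets to 0.05s as soon as a fetch returns messages. `IdleBackoff` is hypothetical.

```ruby
# Hypothetical idle backoff: doubles after each empty fetch, capped at MAX.
class IdleBackoff
  MIN = 0.05
  MAX = 1.0

  def initialize
    @current = MIN
  end

  # Delay to sleep after an empty fetch; the following call returns double, up to MAX.
  def next_delay
    delay = @current
    @current = [@current * 2, MAX].min
    delay
  end

  # Call after a fetch that returned messages.
  def reset
    @current = MIN
  end
end

backoff = IdleBackoff.new
delays = Array.new(7) { backoff.next_delay }
puts delays.inspect
```

In a pull loop, call `sleep(backoff.next_delay)` after an empty fetch and `backoff.reset` after a non-empty one.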
1011
+
1012
+ **Benefit:** Reduces CPU and network usage during idle periods
1013
+
1014
+ ### Connection Pooling
1015
+
1016
+ **Single connection per process:**
1017
+
1018
+ - The NATS client multiplexes all publishes and subscriptions over this one connection
1019
+ - JetStream context cached for reuse
1020
+ - No need for application-level pooling
1021
+
1022
+ ### Memory Management
1023
+
1024
+ **Long-running consumers:**
1025
+
1026
+ - Periodic health checks every 10 minutes
1027
+ - Memory monitoring can be added via middleware
1028
+ - Graceful shutdown lets in-flight work finish and releases the connection before exit
1029
+
1030
+ ---
1031
+
1032
+ ## Observability
1033
+
1034
+ ### Health Checks
1035
+
1036
+ ```ruby
1037
+ health = JetstreamBridge.health_check(skip_cache: false)
1038
+ # Returns a hash shaped like:
1039
+ {
1040
+ healthy: true,
1041
+ connection: {
1042
+ status: "connected",
1043
+ servers: ["nats://localhost:4222"],
1044
+ connected_at: "2024-01-01T00:00:00Z"
1045
+ },
1046
+ jetstream: {
1047
+ streams: 1,
1048
+ consumers: 2,
1049
+ memory_bytes: 104857600,
1050
+ storage_bytes: 1073741824
1051
+ },
1052
+ config: {
1053
+ stream_name: "jetstream-bridge-stream",
1054
+ app_name: "api",
1055
+ destination_app: "worker",
1056
+ use_outbox: true,
1057
+ use_inbox: true,
1058
+ use_dlq: true
1059
+ },
1060
+ performance: {
1061
+ message_processing_time_ms: 45.2,
1062
+ last_health_check_ms: 12.5
1063
+ }
1064
+ }
1065
+ ```
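To expose this as an HTTP health endpoint, one option is a small Rack-compatible app that returns 200 when `healthy` is true and 503 otherwise. This is a sketch. The `HealthEndpoint` class is hypothetical. The check is injected as a callable so the example runs without a NATS server; in an app it could be `-> { JetstreamBridge.health_check }`.

```ruby
require 'json'

# Hypothetical Rack-style endpoint. A Rack app is any object that responds to
# call(env) and returns [status, headers, body].
class HealthEndpoint
  def initialize(check)
    @check = check
  end

  def call(_env)
    report = @check.call
    status = report[:healthy] ? 200 : 503
    [status, { 'content-type' => 'application/json' }, [JSON.generate(report)]]
  rescue StandardError => e
    # A failing check must not crash the endpoint; report it as unhealthy.
    [503, { 'content-type' => 'application/json' },
     [JSON.generate({ healthy: false, error: e.class.name })]]
  end
end
```

In a `config.ru`, this could be mounted with `map('/health') { run HealthEndpoint.new(-> { JetstreamBridge.health_check }) }`.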
1066
+
1067
+ ### Logging
1068
+
1069
+ **Structured logging:**
1070
+
1071
+ ```text
1072
+ # Publisher
1073
+ INFO [JetstreamBridge::Publisher] Published api.sync.worker event_id=abc123
1074
+ DEBUG [JetstreamBridge::Publisher] Envelope: {...}
1075
+
1076
+ # Consumer
1077
+ INFO [JetstreamBridge::Consumer] Processing message event_id=abc123
1078
+ WARN [JetstreamBridge::Consumer] Retry 3/5 for event_id=abc123
1079
+ ERROR [JetstreamBridge::Consumer] Unrecoverable error: ArgumentError
1080
+ ```
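If you want your application's own log lines to match the ones above, Ruby's standard `Logger` can produce the same `LEVEL [Component] message key=value` shape with a custom formatter. This is a sketch. The formatter, the `kv` helper, and `build_logger` are assumptions for illustration and are not the gem's logger configuration.

```ruby
require 'logger'
require 'stringio'

# Formats entries as "LEVEL [progname] message", matching the sample lines above.
BRIDGE_STYLE_FORMATTER = proc do |severity, _time, progname, msg|
  "#{severity} [#{progname}] #{msg}\n"
end

# Hypothetical helper: appends key=value pairs so log lines stay grep-friendly.
def kv(message, **fields)
  ([message] + fields.map { |k, v| "#{k}=#{v}" }).join(' ')
end

def build_logger(io, component)
  logger = Logger.new(io, progname: component)
  logger.formatter = BRIDGE_STYLE_FORMATTER
  logger
end

out = StringIO.new
log = build_logger(out, 'JetstreamBridge::Consumer')
log.info(kv('Processing message', event_id: 'abc123'))
# out.string => "INFO [JetstreamBridge::Consumer] Processing message event_id=abc123\n"
```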
1081
+
1082
+ ### Metrics Points
1083
+
1084
+ **Consider tracking:**
1085
+
1086
+ - Message publish rate and latency
1087
+ - Message processing rate and latency
1088
+ - Error rates by type
1089
+ - DLQ message count
1090
+ - Inbox/outbox table sizes
1091
+ - Consumer lag (JetStream consumer info)
1092
+
1093
+ **Example with middleware:**
1094
+
1095
+ ```ruby
1096
+ class MetricsMiddleware
1097
+ def call(event, next_middleware)
1098
+ start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
1099
+ result = next_middleware.call(event)
1100
+ duration_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000
1101
+
1102
+ StatsD.increment('jetstream.messages.processed')
1103
+ StatsD.histogram('jetstream.processing_time_ms', duration_ms)
1104
+
1105
+ result
1106
+ rescue => e
1107
+ StatsD.increment('jetstream.messages.failed', tags: ["error:#{e.class}"])
1108
+ raise
1109
+ end
1110
+ end
1111
+ ```
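To check metric names and failure tagging without a StatsD agent, you can run the middleware against an in-memory stand-in. This is a self-contained sketch. `FakeStatsD` is a test double defined here. The middleware is restated with the stats client injected through the constructor, a monotonic clock, and a millisecond histogram. The `call(event, next_middleware)` shape comes from the example above, and any callable works as `next_middleware`.

```ruby
# In-memory test double that records calls instead of sending them.
class FakeStatsD
  attr_reader :calls

  def initialize
    @calls = []
  end

  def increment(name, tags: [])
    @calls << [:increment, name, tags]
  end

  def histogram(name, value)
    @calls << [:histogram, name, value]
  end
end

class MetricsMiddleware
  def initialize(stats)
    @stats = stats
  end

  def call(event, next_middleware)
    # Monotonic clock: immune to wall-clock jumps (NTP, DST).
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = next_middleware.call(event)
    duration_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000

    @stats.increment('jetstream.messages.processed')
    @stats.histogram('jetstream.processing_time_ms', duration_ms)
    result
  rescue StandardError => e
    @stats.increment('jetstream.messages.failed', tags: ["error:#{e.class}"])
    raise # re-raise so the consumer's retry/DLQ handling still applies
  end
end
```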
1112
+
1113
+ ---
1114
+
1115
+ ## Best Practices
1116
+
1117
+ 1. **App name without environment** - Use "api" not "api-production" for consumer name consistency
1118
+ 2. **Idempotent handlers** - Design handlers to be safely retried
1119
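+ 3. **Enable outbox in production** - Events are written in the same transaction as your data, so a crash before publishing does not lose them
1120
+ 4. **Enable inbox for critical flows** - Deduplicates redelivered messages for effectively-once processing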
1121
+ 5. **Monitor DLQ** - Set up alerts for messages in dead letter queue
1122
+ 6. **Provision separately** - Use manual provisioning in locked-down environments
1123
+ 7. **Health check endpoint** - Expose `JetstreamBridge.health_check` for monitoring
1124
+ 8. **Graceful shutdown** - Consumer handles signals automatically
1125
+ 9. **Test with Mock NATS** - Fast tests without a running NATS server (see [TESTING.md](TESTING.md))
1126
+ 10. **Tune batch size** - Balance latency vs throughput for your workload
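Idempotent handlers usually key their side effects on the event's unique ID, so a redelivery becomes a no-op. This sketch makes several assumptions. Events are hashes that carry an `'event_id'`. An in-memory `Set` stands in for a durable store such as a unique-indexed table; the gem's inbox provides that persistence for you. The `IdempotentHandler` class is hypothetical.

```ruby
require 'set'

# Hypothetical idempotent handler: applies each event_id's side effect at most once.
# The Set is an in-memory stand-in; production code needs a durable store
# checked in the same transaction as the side effect.
class IdempotentHandler
  def initialize(&effect)
    @effect = effect
    @seen = Set.new
    @mutex = Mutex.new
  end

  # Returns true if the event was applied, false if it was a duplicate.
  def handle(event)
    id = event.fetch('event_id')
    @mutex.synchronize do
      return false if @seen.include?(id)

      @effect.call(event)
      # Record only after success, so a failed attempt can be retried.
      @seen << id
      true
    end
  end
end
```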
1127
+
1128
+ ---
1129
+
1130
+ ## Next Steps
1131
+
1132
+ - [Getting Started Guide](GETTING_STARTED.md) - Basic setup and usage
1133
+ - [Production Guide](PRODUCTION.md) - Production deployment patterns
1134
+ - [Restricted Permissions](RESTRICTED_PERMISSIONS.md) - Manual provisioning and security
1135
+ - [Testing Guide](TESTING.md) - Testing with Mock NATS
@@ -6,7 +6,7 @@ This guide covers installation, Rails setup, configuration, and basic publish/co
6
6
 
7
7
  ```ruby
8
8
  # Gemfile
9
- gem "jetstream_bridge", "~> 4.5"
9
+ gem "jetstream_bridge", "~> 5.0"
10
10
  ```
11
11
 
12
12
  ```bash
@@ -662,7 +662,7 @@ subscribe: { allow: ["pwas.>"] }
662
662
 
663
663
  If you have any influence over NATS permissions, you have two options:
664
664
 
665
- ### Option 1: Pull Consumers (Default)
665
+ ### Option 1: Pull Consumers (Default) - Summary
666
666
 
667
667
  Request minimal JetStream API permissions:
668
668
 
@@ -16,7 +16,9 @@ JetstreamBridge.configure do |config|
16
16
  # Stream name (required) - managed separately from runtime credentials
17
17
  config.stream_name = ENV.fetch('JETSTREAM_STREAM_NAME', 'jetstream-bridge-stream')
18
18
 
19
- # Application name (used in subject routing)
19
+ # Application name (used in subject routing and consumer naming)
20
+ # IMPORTANT: Do not include environment identifiers (e.g., use "myapp" not "myapp-production")
21
+ # Consumer names are shared across environments for the same application
20
22
  config.app_name = ENV.fetch('APP_NAME', Rails.application.class.module_parent_name.underscore)
21
23
 
22
24
  # Destination app for cross-app sync (REQUIRED for publishing/consuming)
@@ -288,7 +288,7 @@ module JetstreamBridge
288
288
  # Push subscriptions don't have a fetch method, so we use next_msg
289
289
  messages = []
290
290
  @batch_size.times do
291
- msg = @psub.next_msg(FETCH_TIMEOUT_SECS)
291
+ msg = @psub.next_msg(timeout: FETCH_TIMEOUT_SECS)
292
292
  messages << msg if msg
293
293
  rescue NATS::Timeout, NATS::IO::Timeout
294
294
  break
@@ -8,6 +8,10 @@ module JetstreamBridge
8
8
  # Holds all configuration settings including NATS connection details,
9
9
  # application identifiers, reliability features, and consumer tuning.
10
10
  #
11
+ # IMPORTANT: app_name should not include environment identifiers
12
+ # (e.g., use "api" not "api-production") as consumer names are
13
+ # shared across environments for the same application.
14
+ #
11
15
  # @example Basic configuration
12
16
  # JetstreamBridge.configure do |config|
13
17
  # config.nats_urls = "nats://localhost:4222"
@@ -54,7 +58,8 @@ module JetstreamBridge
54
58
  # JetStream stream name (required)
55
59
  # @return [String]
56
60
  attr_accessor :stream_name
57
- # Application name for subject routing
61
+ # Application name for subject routing and consumer naming.
62
+ # Should not include environment identifiers (e.g., use "api" not "api-production").
58
63
  # @return [String]
59
64
  attr_accessor :app_name
60
65
  # Maximum delivery attempts before moving to DLQ
@@ -193,6 +198,15 @@ module JetstreamBridge
193
198
 
194
199
  # Get the durable consumer name for this application.
195
200
  #
201
+ # Returns the app_name with "-workers" suffix. Consumer names are
202
+ # shared across environments, so app_name should not include
203
+ # environment identifiers (e.g., use "myapp" not "myapp-production").
204
+ #
205
+ # @return [String] Durable consumer name
206
+ # @example
207
+ # config.app_name = "notifications"
208
+ # config.durable_name # => "notifications-workers"
209
+ #
196
210
  def durable_name
197
211
  "#{app_name}-workers"
198
212
  end
@@ -4,5 +4,5 @@
4
4
  #
5
5
  # Version constant for the gem.
6
6
  module JetstreamBridge
7
- VERSION = '4.6.1'
7
+ VERSION = '5.0.1'
8
8
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: jetstream_bridge
3
3
  version: !ruby/object:Gem::Version
4
- version: 4.6.1
4
+ version: 5.0.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Mike Attara
@@ -121,6 +121,7 @@ files:
121
121
  - CHANGELOG.md
122
122
  - LICENSE
123
123
  - README.md
124
+ - docs/ARCHITECTURE.md
124
125
  - docs/GETTING_STARTED.md
125
126
  - docs/PRODUCTION.md
126
127
  - docs/RESTRICTED_PERMISSIONS.md