sidekiq-assured-jobs 1.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 213608148534e316bf24588d51b181572130d503b664db631ce2a0bb5be53f1a
4
+ data.tar.gz: 5e158bcc84da84d90ed15918ba1216ad6eddcf555aa3f9ca923bf840551e57ad
5
+ SHA512:
6
+ metadata.gz: e63bf1b133743c43def063eede4e64a94a19d10ab90775f5cf0b36e0cdf9a50dc0597c78e349a4df2c650119b02db9cc8fb165b911ffff4af5cdd7a570df249a
7
+ data.tar.gz: f41c16f24cf7273f41a76f39c301f4398ed989496355cbe37097ebe7fedf03b798bf623e48c632026d46996caf9e41cf134a19fb6479e914365e026f47df70e3
data/CHANGELOG.md ADDED
@@ -0,0 +1,52 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [1.0.0] - 2025-06-20
9
+
10
+ ### Added
11
+ - **Initial Release**: First public release of sidekiq-assured-jobs
12
+ - **Job Assurance**: Guarantees that critical Sidekiq jobs are never lost due to worker crashes
13
+ - **Automatic Recovery**: Detects and re-enqueues orphaned jobs from crashed workers
14
+ - **Delayed Recovery System**: Configurable additional recovery passes for enhanced reliability
15
+ - **Production Ready**: Designed for high-throughput production environments
16
+ - **Zero Configuration**: Works out of the box with sensible defaults
17
+ - **Sidekiq Integration**: Uses Sidekiq's existing Redis connection pool
18
+ - **Distributed Locking**: Prevents duplicate recovery operations
19
+ - **Minimal Overhead**: Lightweight tracking with configurable heartbeat intervals
20
+
21
+ ### Features
22
+ - **Job Tracking**: Tracks in-flight jobs in Redis with instance identification
23
+ - **Heartbeat System**: Monitors worker instance health with configurable intervals
24
+ - **Orphan Recovery**: Automatically re-enqueues jobs from crashed instances
25
+ - **Background Recovery Threads**: Automatic spawning of delayed recovery threads after startup
26
+ - **Configurable Safety Net**: Multiple recovery passes to catch edge cases and network partition scenarios
27
+ - **Selective Tracking**: Only tracks workers that include the AssuredJobs::Worker module
28
+ - **SidekiqUniqueJobs Integration**: Automatically handles unique job lock clearing
29
+ - **Flexible Redis Configuration**: Uses Sidekiq's Redis by default, supports custom Redis
30
+
31
+ ### Configuration Options
32
+ - `ASSURED_JOBS_INSTANCE_ID`: Unique instance identifier (auto-generated if not set)
33
+ - `ASSURED_JOBS_NS`: Redis namespace for keys (default: "sidekiq_assured_jobs")
34
+ - `ASSURED_JOBS_HEARTBEAT_INTERVAL`: Seconds between heartbeats (default: 15)
35
+ - `ASSURED_JOBS_HEARTBEAT_TTL`: Instance timeout threshold (default: 45)
36
+ - `ASSURED_JOBS_RECOVERY_LOCK_TTL`: Recovery operation lock duration (default: 300)
37
+ - `ASSURED_JOBS_DELAYED_RECOVERY_INTERVAL`: Seconds between delayed recovery passes (default: 300)
38
+ - `ASSURED_JOBS_DELAYED_RECOVERY_COUNT`: Number of delayed recovery passes to run (default: 1)
39
+
40
+ ### Dependencies
41
+ - Ruby >= 2.6.0
42
+ - Sidekiq >= 6.0, < 7
43
+ - Redis ~> 4.0
44
+
45
+ ### Breaking Changes from sidekiq-processing-tracker
46
+ - **Gem Name**: Changed from `sidekiq-processing-tracker` to `sidekiq-assured-jobs`
47
+ - **Module Name**: Changed from `Sidekiq::ProcessingTracker` to `Sidekiq::AssuredJobs`
48
+ - **Worker Mixin**: Changed from `ProcessingTracker::Worker` to `AssuredJobs::Worker`
49
+ - **Sidekiq Option**: Changed from `processing: true` to `assured_jobs: true`
50
+ - **Environment Variables**: All prefixed with `ASSURED_JOBS_` instead of `PROCESSING_`
51
+ - **Default Namespace**: Changed from `sidekiq_processing` to `sidekiq_assured_jobs`
52
+ - **Logging**: Reduced verbose logging for production use
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Sidekiq Assured Jobs Team
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,475 @@
1
+ # Sidekiq Assured Jobs
2
+
3
+ Reliable job execution guarantee for Sidekiq with automatic orphan recovery.
4
+
5
+ ## Overview
6
+
7
+ Sidekiq Assured Jobs ensures that your critical Sidekiq jobs are never lost due to worker crashes, pod restarts, or unexpected shutdowns. It provides a robust tracking system that monitors in-flight jobs and automatically recovers any work that was interrupted.
8
+
9
+ **Perfect for:**
10
+ - Critical business processes that cannot be lost
11
+ - Financial transactions and payment processing
12
+ - Data synchronization and ETL operations
13
+ - Email delivery and notification systems
14
+ - Any job where reliability is paramount
15
+
16
+ ## Key Features
17
+
18
+ - **🛡️ Job Assurance**: Guarantees that tracked jobs will complete or be automatically retried
19
+ - **🔄 Automatic Recovery**: Detects and re-enqueues orphaned jobs from crashed workers
20
+ - **⏰ Delayed Recovery**: Configurable additional recovery passes for enhanced reliability
21
+ - **⚡ Zero Configuration**: Works out of the box with sensible defaults
22
+ - **🏗️ Production Ready**: Designed for high-throughput production environments
23
+ - **🔗 Sidekiq Integration**: Uses Sidekiq's existing Redis connection pool
24
+ - **🔒 Distributed Locking**: Prevents duplicate recovery operations
25
+ - **📊 Minimal Overhead**: Lightweight tracking with configurable heartbeat intervals
26
+
27
+ ## The Problem
28
+
29
+ When Sidekiq workers crash or are forcefully terminated (SIGKILL), jobs that were being processed are lost forever:
30
+
31
+ ```mermaid
32
+ sequenceDiagram
33
+ participant Client
34
+ participant Queue as Sidekiq Queue
35
+ participant Worker as Worker Process
36
+ participant Redis
37
+
38
+ Client->>Queue: Enqueue Critical Job
39
+ Queue->>Worker: Fetch Job
40
+ Worker->>Redis: Job starts processing
41
+ Note over Worker: Worker crashes (SIGKILL)
42
+ Worker--xRedis: Job lost forever
43
+ Note over Queue: No retry, no error handling
44
+ Note over Client: Critical work never completed
45
+ ```
46
+
47
+ ## The Solution
48
+
49
+ Sidekiq Assured Jobs tracks in-flight jobs and automatically recovers them:
50
+
51
+ ```mermaid
52
+ graph TB
53
+ subgraph Cluster["Production Environment"]
54
+ subgraph W1["Worker Instance 1"]
55
+ SW1[Sidekiq Worker]
56
+ HB1[Heartbeat]
57
+ MW1[Tracking Middleware]
58
+ end
59
+
60
+ subgraph W2["Worker Instance 2"]
61
+ SW2[Sidekiq Worker]
62
+ HB2[Heartbeat]
63
+ MW2[Tracking Middleware]
64
+ end
65
+ end
66
+
67
+ subgraph Redis["Redis Storage"]
68
+ HK["Heartbeats<br/>instance:worker-1<br/>instance:worker-2"]
69
+ JT["Job Tracking<br/>jobs:worker-1<br/>jobs:worker-2"]
70
+ JP["Job Payloads<br/>job:abc123<br/>job:def456"]
71
+ RL["Recovery Lock"]
72
+ end
73
+
74
+ Queue[Sidekiq Queue]
75
+
76
+ HB1 -->|Every 15s| HK
77
+ HB2 -->|Every 15s| HK
78
+
79
+ MW1 -->|Track Start/End| JT
80
+ MW1 -->|Store Payload| JP
81
+ MW2 -->|Track Start/End| JT
82
+ MW2 -->|Store Payload| JP
83
+
84
+ SW2 -->|On Startup| RL
85
+ SW2 -->|Detect Orphans| JT
86
+ SW2 -->|Re-enqueue| Queue
87
+
88
+ style HK fill:#e8f5e8
89
+ style JT fill:#fff3e0
90
+ style JP fill:#e3f2fd
91
+ style RL fill:#ffebee
92
+ ```
93
+
94
+ ## Installation
95
+
96
+ Add this line to your application's Gemfile:
97
+
98
+ ```ruby
99
+ gem 'sidekiq-assured-jobs'
100
+ ```
101
+
102
+ And then execute:
103
+
104
+ ```bash
105
+ bundle install
106
+ ```
107
+
108
+ ## Quick Start
109
+
110
+ ### 1. Basic Setup
111
+
112
+ The gem auto-configures itself when required:
113
+
114
+ ```ruby
115
+ # In your application (e.g., config/application.rb or config/initializers/sidekiq.rb)
116
+ require 'sidekiq-assured-jobs'
117
+ ```
118
+
119
+ ### 2. Enable Job Tracking
120
+
121
+ Include the `AssuredJobs::Worker` module in workers you want to track:
122
+
123
+ ```ruby
124
+ class PaymentProcessor
125
+ include Sidekiq::Worker
126
+ include Sidekiq::AssuredJobs::Worker # Enables job assurance
127
+
128
+ def perform(payment_id, amount)
129
+ # This job will be tracked and recovered if the worker crashes
130
+ process_payment(payment_id, amount)
131
+ end
132
+ end
133
+
134
+ class LogCleanupWorker
135
+ include Sidekiq::Worker
136
+ # No AssuredJobs::Worker - not tracked (fine for non-critical work)
137
+
138
+ def perform
139
+ # This job won't be tracked
140
+ cleanup_old_logs
141
+ end
142
+ end
143
+ ```
144
+
145
+ ### 3. That's It!
146
+
147
+ Your critical jobs are now protected. If a worker crashes while processing a tracked job, another instance re-enqueues it on its next recovery pass (shortly after startup or during a delayed pass).
148
+
149
+ ## Configuration
150
+
151
+ The gem works with zero configuration but provides extensive customization options. See the [Complete Configuration Reference](#complete-configuration-reference) below for all available options.
152
+
153
+
154
+
155
+ ## Complete Configuration Reference
156
+
157
+ ### Core Configuration Options
158
+
159
+ | Option | Environment Variable | Default | Description |
160
+ |--------|---------------------|---------|-------------|
161
+ | `instance_id` | `ASSURED_JOBS_INSTANCE_ID` | Auto-generated | Unique identifier for this worker instance |
162
+ | `namespace` | `ASSURED_JOBS_NS` | `sidekiq_assured_jobs` | Redis namespace for all keys |
163
+ | `heartbeat_interval` | `ASSURED_JOBS_HEARTBEAT_INTERVAL` | `15` | Seconds between heartbeat updates |
164
+ | `heartbeat_ttl` | `ASSURED_JOBS_HEARTBEAT_TTL` | `45` | Seconds before instance considered dead |
165
+ | `recovery_lock_ttl` | `ASSURED_JOBS_RECOVERY_LOCK_TTL` | `300` | Seconds to hold recovery lock |
166
+ | `delayed_recovery_interval` | `ASSURED_JOBS_DELAYED_RECOVERY_INTERVAL` | `300` | Seconds between delayed recovery passes |
167
+ | `delayed_recovery_count` | `ASSURED_JOBS_DELAYED_RECOVERY_COUNT` | `1` | Number of delayed recovery passes to run |
168
+
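The defaults in the table can be illustrated with a small stand-alone sketch. It follows the `ENV.fetch(name, default).to_i` pattern visible in the package's `setup_defaults`, but `ASSURED_DEFAULTS` and `resolve_settings` are hypothetical names, and a plain Hash stands in for `ENV` so the example is easy to run.

```ruby
# Hypothetical sketch: resolve settings from an env-like Hash using the
# documented defaults. These names are illustrative, not part of the gem.
ASSURED_DEFAULTS = {
  "ASSURED_JOBS_NS" => "sidekiq_assured_jobs",
  "ASSURED_JOBS_HEARTBEAT_INTERVAL" => "15",
  "ASSURED_JOBS_HEARTBEAT_TTL" => "45",
  "ASSURED_JOBS_RECOVERY_LOCK_TTL" => "300",
  "ASSURED_JOBS_DELAYED_RECOVERY_INTERVAL" => "300",
  "ASSURED_JOBS_DELAYED_RECOVERY_COUNT" => "1"
}.freeze

def resolve_settings(env)
  ASSURED_DEFAULTS.each_with_object({}) do |(name, default), settings|
    raw = env.fetch(name, default)
    # Only the namespace stays a string; every other option becomes an integer.
    settings[name] = name == "ASSURED_JOBS_NS" ? raw : raw.to_i
  end
end

settings = resolve_settings("ASSURED_JOBS_HEARTBEAT_INTERVAL" => "30")
puts settings["ASSURED_JOBS_HEARTBEAT_INTERVAL"] # value taken from the env-like Hash
puts settings["ASSURED_JOBS_HEARTBEAT_TTL"]      # documented default
```

One consequence of this pattern: `String#to_i` returns `0` for non-numeric text, so a value such as `"thirty"` silently becomes `0`. A stricter variant could use `Integer(raw)`, which raises instead.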
169
+ ### Configuration Methods
170
+
171
+ #### Environment Variables (Recommended for Production)
172
+ ```bash
173
+ export ASSURED_JOBS_INSTANCE_ID="worker-pod-1"
174
+ export ASSURED_JOBS_NS="myapp_assured_jobs"
175
+ export ASSURED_JOBS_HEARTBEAT_INTERVAL="30"
176
+ export ASSURED_JOBS_HEARTBEAT_TTL="90"
177
+ export ASSURED_JOBS_RECOVERY_LOCK_TTL="600"
178
+ export ASSURED_JOBS_DELAYED_RECOVERY_INTERVAL="600"
179
+ export ASSURED_JOBS_DELAYED_RECOVERY_COUNT="2"
180
+ ```
181
+
182
+ #### Programmatic Configuration
183
+ ```ruby
184
+ Sidekiq::AssuredJobs.configure do |config|
185
+ config.namespace = "myapp_assured_jobs"
186
+ config.heartbeat_interval = 30
187
+ config.heartbeat_ttl = 90
188
+ config.recovery_lock_ttl = 600
189
+ config.delayed_recovery_interval = 600
190
+ config.delayed_recovery_count = 2
191
+
192
+ # Advanced: Custom Redis configuration
193
+ config.redis_options = {
194
+ url: ENV['ASSURED_JOBS_REDIS_URL'],
195
+ db: 2,
196
+ timeout: 5
197
+ }
198
+ end
199
+ ```
200
+
201
+ ### Configuration Guidelines
202
+
203
+ #### Heartbeat Settings
204
+ - **`heartbeat_interval`**: How often workers send "I'm alive" signals
205
+ - Lower values = faster orphan detection, higher Redis load
206
+ - Recommended: 15-30 seconds for production
207
+ - **`heartbeat_ttl`**: How long to wait before considering an instance dead
208
+ - Should be 2-3x the heartbeat interval
209
+ - Accounts for network delays and Redis latency
210
+
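The heartbeat guideline above can be turned into a quick sanity check. This is a minimal sketch under the assumption that a TTL below twice the interval is worth flagging; `heartbeat_warnings` is a hypothetical helper, not something the gem provides.

```ruby
# Hypothetical validator for the "TTL should be 2-3x the interval" guideline.
# Returns a list of warning strings; an empty list means the pair looks sane.
def heartbeat_warnings(interval:, ttl:)
  warnings = []
  warnings << "heartbeat_interval must be positive" unless interval.positive?
  warnings << "heartbeat_ttl must exceed heartbeat_interval" if ttl <= interval
  if interval.positive? && ttl > interval && ttl < 2 * interval
    # With less than 2x headroom, a single missed heartbeat lets the key expire.
    warnings << "heartbeat_ttl is below 2x the interval"
  end
  warnings
end

puts heartbeat_warnings(interval: 15, ttl: 45).inspect
puts heartbeat_warnings(interval: 30, ttl: 40).inspect
```

The first call uses the documented defaults and produces no warnings; the second leaves too little headroom and is flagged.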
211
+ #### Recovery Settings
212
+ - **`recovery_lock_ttl`**: How long one instance holds the recovery lock
213
+ - Prevents multiple instances from recovering the same jobs
214
+ - Should be longer than expected recovery time
215
+ - **`delayed_recovery_interval`**: Time between additional recovery passes
216
+ - Provides safety net for missed orphans
217
+ - Recommended: 5-10 minutes for most applications
218
+ - **`delayed_recovery_count`**: Number of additional recovery attempts
219
+ - Balance between reliability and resource usage
220
+ - Recommended: 1-3 passes for most applications
221
+
222
+ ### Production Recommendations
223
+
224
+ #### High-Availability Setup
225
+ ```ruby
226
+ Sidekiq::AssuredJobs.configure do |config|
227
+ config.namespace = "#{Rails.application.class.module_parent_name.downcase}_assured_jobs"
228
+ config.heartbeat_interval = 30 # Balanced load vs detection speed
229
+ config.heartbeat_ttl = 90 # 3x heartbeat interval
230
+ config.recovery_lock_ttl = 900 # 15 minutes for large recovery operations
231
+ config.delayed_recovery_interval = 600 # 10 minutes between passes
232
+ config.delayed_recovery_count = 2 # Two additional safety passes
233
+ end
234
+ ```
235
+
236
+ #### Resource-Constrained Environment
237
+ ```ruby
238
+ Sidekiq::AssuredJobs.configure do |config|
239
+ config.heartbeat_interval = 60 # Reduce Redis load
240
+ config.heartbeat_ttl = 180 # 3x heartbeat interval
241
+ config.delayed_recovery_count = 1 # Single delayed pass
242
+ end
243
+ ```
244
+
245
+ #### Critical Systems (Maximum Reliability)
246
+ ```ruby
247
+ Sidekiq::AssuredJobs.configure do |config|
248
+ config.heartbeat_interval = 15 # Fast orphan detection
249
+ config.heartbeat_ttl = 45 # Quick failure detection
250
+ config.delayed_recovery_interval = 300 # 5 minutes between passes
251
+ config.delayed_recovery_count = 3 # Three additional passes
252
+ end
253
+ ```
254
+
255
+ ### Delayed Recovery System
256
+
257
+ In addition to immediate orphan recovery on startup, the gem provides a configurable delayed recovery system for enhanced reliability:
258
+
259
+ ```ruby
260
+ Sidekiq::AssuredJobs.configure do |config|
261
+ # Run 2 additional recovery passes, 10 minutes apart
262
+ config.delayed_recovery_count = 2
263
+ config.delayed_recovery_interval = 600 # 10 minutes
264
+ end
265
+ ```
266
+
267
+ **How Delayed Recovery Works:**
268
+
269
+ 1. **Immediate Recovery**: Shortly after startup, each worker instance attempts orphan recovery; the recovery lock ensures only one instance runs it at a time
270
+ 2. **Delayed Passes**: After startup, a background thread runs additional recovery passes
271
+ 3. **Configurable Timing**: Control both the interval between passes and total number of passes
272
+ 4. **Error Resilience**: Each delayed recovery pass is wrapped in error handling to prevent thread crashes
273
+
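The four steps above can be sketched as a bounded loop in a background thread. This is an illustration only: `run_delayed_passes` and the injected `sleeper` are hypothetical, chosen so the example finishes instantly instead of sleeping for minutes.

```ruby
# Hypothetical sketch of a delayed-recovery loop. The sleeper is injected so
# the example runs instantly; a real implementation would call Kernel#sleep.
def run_delayed_passes(count:, interval:, sleeper:, &pass)
  Thread.new do
    count.times do |i|
      sleeper.call(interval)
      begin
        pass.call(i + 1)
      rescue StandardError => e
        # One failing pass must not end the thread; later passes still run.
        warn "delayed pass ##{i + 1} failed: #{e.message}"
      end
    end
  end
end

slept = []
completed = []
thread = run_delayed_passes(count: 3, interval: 600, sleeper: ->(s) { slept << s }) do |n|
  raise "simulated Redis outage" if n == 2
  completed << n
end
thread.join

puts slept.inspect
puts completed.inspect
```

Pass 2 fails in this run, yet pass 3 still executes, which is the point of wrapping each pass in its own error handling.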
274
+ **Benefits:**
275
+ - **Enhanced Reliability**: Catches jobs that might be missed during startup recovery
276
+ - **Network Partition Recovery**: Handles cases where Redis connectivity issues cause temporary orphaning
277
+ - **Race Condition Mitigation**: Provides additional safety net for edge cases
278
+ - **Zero Application Impact**: Runs in background threads without affecting job processing
279
+
280
+ **Use Cases:**
281
+ - **High-Availability Systems**: Where maximum job recovery reliability is critical
282
+ - **Network-Unstable Environments**: Where Redis connectivity might be intermittent
283
+ - **Large-Scale Deployments**: Where startup recovery might miss some edge cases
284
+
285
+ ## Advanced Features
286
+
287
+ ### Redis Integration
288
+
289
+ The gem provides flexible Redis integration options:
290
+
291
+ #### Default Configuration (Recommended)
292
+ By default, the gem uses Sidekiq's existing Redis connection pool:
293
+
294
+ ```ruby
295
+ # Uses Sidekiq's Redis configuration automatically
296
+ Sidekiq::AssuredJobs.configure do |config|
297
+ config.namespace = "my_app_assured_jobs"
298
+ end
299
+ ```
300
+
301
+ #### Custom Redis Configuration (Advanced)
302
+ For advanced use cases requiring Redis isolation:
303
+
304
+ ```ruby
305
+ Sidekiq::AssuredJobs.configure do |config|
306
+ config.namespace = "my_app_assured_jobs"
307
+ config.redis_options = {
308
+ url: ENV['ASSURED_JOBS_REDIS_URL'],
309
+ db: 2,
310
+ timeout: 5
311
+ }
312
+ end
313
+ ```
314
+
315
+ #### Benefits
316
+ - **Connection Efficiency**: Reuses Sidekiq's connection pool by default
317
+ - **Custom Namespacing**: Efficient key prefixing without external dependencies
318
+ - **Configuration Consistency**: Inherits Sidekiq's Redis settings
319
+ - **Flexible Options**: Support for custom Redis when needed
320
+
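To make the namespacing concrete, here is a sketch of the key layout suggested by the architecture diagram (`instance:`, `jobs:` and `job:` prefixes under the namespace). The helper names and signatures are hypothetical. The sketch also contrasts two ways of reading an instance ID back out of a key: the package's recovery code takes the last `:`-separated segment, which would truncate an ID that itself contains a colon, whereas stripping the known prefix does not.

```ruby
# Hypothetical helpers illustrating the namespaced key layout.
def namespaced_key(namespace, key)
  "#{namespace}:#{key}"
end

# Recover the instance ID by removing the known prefix rather than splitting.
def instance_id_from(namespace, kind, key)
  prefix = "#{namespace}:#{kind}:"
  key.start_with?(prefix) ? key.delete_prefix(prefix) : nil
end

key = namespaced_key("myapp_assured_jobs", "jobs:host-a:4242")
puts key
puts instance_id_from("myapp_assured_jobs", "jobs", key) # full ID survives
puts key.split(":").last                                 # only the final segment
```

With pod names or hex IDs (no colons) both approaches agree, so this only matters for custom `ASSURED_JOBS_INSTANCE_ID` values.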
321
+ ### SidekiqUniqueJobs Integration
322
+
323
+ The gem automatically integrates with [sidekiq-unique-jobs](https://github.com/mhenrixon/sidekiq-unique-jobs) to ensure orphaned unique jobs can be recovered immediately:
324
+
325
+ ```ruby
326
+ class UniquePaymentProcessor
327
+ include Sidekiq::Worker
328
+ include Sidekiq::AssuredJobs::Worker
329
+
330
+ sidekiq_options unique: :until_executed
331
+
332
+ def perform(payment_id)
333
+ # This job will be tracked and can be recovered even with unique constraints
334
+ process_payment(payment_id)
335
+ end
336
+ end
337
+ ```
338
+
339
+ **Benefits:**
340
+ - **Immediate Recovery**: Orphaned unique jobs are re-enqueued immediately (no waiting period)
341
+ - **Automatic Detection**: Works seamlessly whether SidekiqUniqueJobs is present or not
342
+ - **Surgical Precision**: Only clears locks for confirmed orphaned jobs
343
+ - **Error Resilience**: Continues operation even if lock clearing fails
344
+
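The "works whether the library is present or not" behaviour relies on a runtime `defined?` check. The sketch below shows that pattern in isolation. `FakeUniqueJobs` is a made-up stand-in and is not the sidekiq-unique-jobs API; `clear_lock_if_available` is likewise a hypothetical name.

```ruby
# Hypothetical sketch of optional-dependency handling with `defined?`.
def clear_lock_if_available(job, log)
  digest = job["unique_digest"]
  return :no_digest unless digest

  if defined?(FakeUniqueJobs)
    FakeUniqueJobs.delete(digest)
    :cleared
  else
    log << "lock library not loaded, skipping #{job['jid']}"
    :skipped
  end
rescue StandardError => e
  # A failure to clear the lock is logged, not raised.
  log << "lock clearing failed for #{job['jid']}: #{e.message}"
  :failed
end

log = []
job = { "jid" => "abc123", "unique_digest" => "digest-1" }
before = clear_lock_if_available(job, log)

# Simulate the optional library being loaded later.
fake = Module.new do
  def self.deleted
    @deleted ||= []
  end

  def self.delete(digest)
    deleted << digest
  end
end
Object.const_set(:FakeUniqueJobs, fake)

after = clear_lock_if_available(job, log)
puts before
puts after
```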
345
+ ## Production Deployment
346
+
347
+ ### Kubernetes Example
348
+
349
+ ```yaml
350
+ apiVersion: apps/v1
351
+ kind: Deployment
352
+ metadata:
353
+   name: sidekiq-workers
354
+ spec:
355
+   replicas: 3
356
+   template:
357
+     spec:
358
+       containers:
359
+       - name: worker
360
+         image: myapp:latest
361
+         env:
362
+         - name: ASSURED_JOBS_INSTANCE_ID
363
+           valueFrom:
364
+             fieldRef:
365
+               fieldPath: metadata.name # Use pod name as instance ID
366
+         - name: ASSURED_JOBS_NS
367
+           value: "myapp_assured_jobs"
368
+         - name: ASSURED_JOBS_HEARTBEAT_INTERVAL
369
+           value: "30"
370
+         - name: ASSURED_JOBS_HEARTBEAT_TTL
371
+           value: "90"
372
+         - name: ASSURED_JOBS_RECOVERY_LOCK_TTL
373
+           value: "600"
374
+         - name: ASSURED_JOBS_DELAYED_RECOVERY_INTERVAL
375
+           value: "600" # 10 minutes
376
+         - name: ASSURED_JOBS_DELAYED_RECOVERY_COUNT
377
+           value: "2"
378
+ ```
379
+
380
+ ### Docker Compose Example
381
+
382
+ ```yaml
383
+ version: '3.8'
384
+ services:
385
+   worker:
386
+     image: myapp:latest
387
+     environment:
388
+       - ASSURED_JOBS_INSTANCE_ID=${HOSTNAME}
389
+       - ASSURED_JOBS_NS=myapp_assured_jobs
390
+       - ASSURED_JOBS_HEARTBEAT_INTERVAL=30
391
+       - ASSURED_JOBS_HEARTBEAT_TTL=90
392
+       - ASSURED_JOBS_RECOVERY_LOCK_TTL=600
393
+       - ASSURED_JOBS_DELAYED_RECOVERY_INTERVAL=600
394
+       - ASSURED_JOBS_DELAYED_RECOVERY_COUNT=2
395
+     deploy:
396
+       replicas: 3
397
+ ```
398
+
399
+ ## How It Works
400
+
401
+ 1. **Instance Registration**: Each worker instance generates a unique ID and sends periodic heartbeats to Redis
402
+ 2. **Job Tracking**: When a tracked job starts, the middleware records the job ID and payload in Redis
403
+ 3. **Job Cleanup**: When a job completes (success or failure), tracking data is removed
404
+ 4. **Immediate Recovery**: On startup, workers check for jobs tracked by dead instances (no recent heartbeat)
405
+ 5. **Safe Recovery**: Using distributed locking, one worker re-enqueues orphaned jobs back to Sidekiq
406
+ 6. **Delayed Recovery**: Background threads run additional recovery passes at configurable intervals
407
+ 7. **Cleanup**: Orphaned tracking data is removed after successful re-enqueuing
408
+
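The detection step in the list above can be modelled without Redis. In this sketch plain Hashes stand in for the heartbeat keys, the per-instance job sets and the stored payloads; `find_orphans` is a hypothetical name, and the timestamps are arbitrary numbers chosen for the example.

```ruby
require "set"

# Hypothetical in-memory model of orphan detection: any instance whose last
# heartbeat is older than the TTL is treated as dead, and its tracked jobs
# are returned for re-enqueuing.
def find_orphans(heartbeats, tracked_jobs, payloads, now:, ttl:)
  live = heartbeats.select { |_id, seen_at| now - seen_at < ttl }.keys.to_set
  tracked_jobs
    .reject { |instance_id, _jids| live.include?(instance_id) }
    .flat_map { |_instance_id, jids| jids.map { |jid| payloads[jid] } }
    .compact
end

heartbeats   = { "worker-1" => 100.0, "worker-2" => 40.0 } # last heartbeat times
tracked_jobs = { "worker-1" => ["abc123"], "worker-2" => ["def456"] }
payloads     = {
  "abc123" => { "class" => "PaymentProcessor", "jid" => "abc123" },
  "def456" => { "class" => "DataSyncWorker", "jid" => "def456" }
}

orphans = find_orphans(heartbeats, tracked_jobs, payloads, now: 110.0, ttl: 45)
puts orphans.map { |job| job["jid"] }.inspect
```

In Redis the expiry happens on the server: the heartbeat key simply disappears once its TTL elapses, so "dead" means "no heartbeat key exists".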
409
+ ## Use Cases
410
+
411
+ ### Financial Services
412
+ ```ruby
413
+ class PaymentProcessor
414
+ include Sidekiq::Worker
415
+ include Sidekiq::AssuredJobs::Worker
416
+
417
+ def perform(payment_id, amount)
418
+ # Critical: Payment must be processed
419
+ process_payment(payment_id, amount)
420
+ end
421
+ end
422
+ ```
423
+
424
+ ### Data Synchronization
425
+ ```ruby
426
+ class DataSyncWorker
427
+ include Sidekiq::Worker
428
+ include Sidekiq::AssuredJobs::Worker
429
+
430
+ def perform(sync_batch_id)
431
+ # Important: Data consistency depends on completion
432
+ sync_data_batch(sync_batch_id)
433
+ end
434
+ end
435
+ ```
436
+
437
+ ### Email Delivery
438
+ ```ruby
439
+ class CriticalEmailWorker
440
+ include Sidekiq::Worker
441
+ include Sidekiq::AssuredJobs::Worker
442
+
443
+ def perform(email_id)
444
+ # Must deliver: Password resets, order confirmations, etc.
445
+ deliver_critical_email(email_id)
446
+ end
447
+ end
448
+ ```
449
+
450
+ ## Testing
451
+
452
+ Run the test suite:
453
+
454
+ ```bash
455
+ bundle exec rspec
456
+ ```
457
+
458
+ ## Dependencies
459
+
460
+ ### Runtime Dependencies
461
+ - `sidekiq` (>= 6.0, < 7)
462
+ - `redis` (~> 4.0)
463
+
464
+ ### Development Dependencies
465
+ - `rspec` (~> 3.0)
466
+ - `bundler` (~> 2.0)
467
+ - `rubocop` (~> 1.0)
468
+
469
+ ## Contributing
470
+
471
+ Bug reports and pull requests are welcome on GitHub at https://github.com/example/sidekiq-assured-jobs.
472
+
473
+ ## License
474
+
475
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
data/lib/sidekiq/assured_jobs/middleware.rb ADDED
@@ -0,0 +1,68 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Sidekiq
4
+ module AssuredJobs
5
+ class Middleware
6
+ def call(worker, job, queue)
7
+ # Only track jobs that have assured_jobs: true option
8
+ should_track = should_track_job?(worker, job)
9
+
10
+ return yield unless should_track
11
+
12
+ jid = job["jid"]
13
+ instance_id = AssuredJobs.instance_id
14
+ logger = AssuredJobs.logger
15
+
16
+ # Create tracking keys (using custom namespacing)
17
+ job_tracking_key = AssuredJobs.send(:namespaced_key, "jobs:#{instance_id}")
18
+ job_data_key = AssuredJobs.send(:namespaced_key, "job:#{jid}")
19
+
20
+ begin
21
+ # Add job to tracking set and store job payload
22
+ begin
23
+ AssuredJobs.redis_sync do |conn|
24
+ conn.multi do |multi|
25
+ multi.sadd(job_tracking_key, jid)
26
+ multi.set(job_data_key, job.to_json)
27
+ end
28
+ end
29
+ logger.debug "AssuredJobs started tracking job #{jid} on instance #{instance_id}"
30
+ rescue => e
31
+ logger.error "AssuredJobs failed to start tracking job #{jid}: #{e.message}"
32
+ logger.error e.backtrace.join("\n")
33
+ end
34
+
35
+ # Execute the job
36
+ yield
37
+ ensure
38
+ # Remove job from tracking
39
+ begin
40
+ AssuredJobs.redis_sync do |conn|
41
+ conn.multi do |multi|
42
+ multi.srem(job_tracking_key, jid)
43
+ multi.del(job_data_key)
44
+ end
45
+ end
46
+ logger.debug "AssuredJobs stopped tracking job #{jid} on instance #{instance_id}"
47
+ rescue => e
48
+ logger.error "AssuredJobs failed to stop tracking job #{jid}: #{e.message}"
49
+ end
50
+ end
51
+ end
52
+
53
+ private
54
+
55
+ def should_track_job?(worker, job)
56
+ # Check if the worker class has assured_jobs: true option
57
+ worker_class = worker.is_a?(Class) ? worker : worker.class
58
+
59
+ unless worker_class.respond_to?(:sidekiq_options)
60
+ return false
61
+ end
62
+
63
+ options = worker_class.sidekiq_options
64
+ options["assured_jobs"] == true
65
+ end
66
+ end
67
+ end
68
+ end
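The middleware above follows a "record, run, always clean up" shape. The sketch below reproduces that shape with an in-memory store so it can be run without Sidekiq or Redis; `TrackingStore` and `track` are hypothetical names. Note that `ensure` runs on normal completion and on exceptions, but not when the process is killed with SIGKILL, which is exactly the case where tracking data is left behind for recovery.

```ruby
require "json"
require "set"

# Hypothetical in-memory tracker illustrating the begin/ensure pattern.
class TrackingStore
  attr_reader :job_sets, :payloads

  def initialize
    @job_sets = Hash.new { |hash, key| hash[key] = Set.new }
    @payloads = {}
  end

  def track(instance_id, job)
    jid = job.fetch("jid")
    job_sets[instance_id] << jid
    payloads[jid] = JSON.generate(job)
    yield
  ensure
    # Runs whether the block succeeded or raised.
    job_sets[instance_id].delete(jid)
    payloads.delete(jid)
  end
end

store = TrackingStore.new
seen_during = nil
store.track("worker-1", { "jid" => "abc123", "class" => "PaymentProcessor" }) do
  seen_during = store.job_sets["worker-1"].to_a
end

begin
  store.track("worker-1", { "jid" => "def456" }) { raise "job failed" }
rescue RuntimeError
  # The job's own error still propagates after cleanup.
end

puts seen_during.inspect
puts store.payloads.empty?
```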
data/lib/sidekiq/assured_jobs/version.rb ADDED
@@ -0,0 +1,7 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Sidekiq
4
+ module AssuredJobs
5
+ VERSION = "1.0.0"
6
+ end
7
+ end
data/lib/sidekiq/assured_jobs/worker.rb ADDED
@@ -0,0 +1,16 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Sidekiq
4
+ module AssuredJobs
5
+ module Worker
6
+ def self.included(base)
7
+ base.extend(ClassMethods)
8
+ base.sidekiq_options assured_jobs: true
9
+ end
10
+
11
+ module ClassMethods
12
+ # Additional class methods can be added here if needed in the future
13
+ end
14
+ end
15
+ end
16
+ end
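The mixin above works by setting a class-level option from the `included` hook, which the middleware later reads. The following sketch shows the mechanism with stand-ins: `FakeWorker` imitates only the `sidekiq_options` class method and is not Sidekiq's implementation, and `Assured` imitates the mixin.

```ruby
# Hypothetical stand-in for the part of Sidekiq::Worker used here.
module FakeWorker
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Merges new options (if any) and returns the current option Hash.
    def sidekiq_options(opts = nil)
      @options ||= {}
      opts&.each { |key, value| @options[key.to_s] = value }
      @options
    end
  end
end

# Hypothetical stand-in for the mixin: flips the opt-in flag on inclusion.
module Assured
  def self.included(base)
    base.sidekiq_options assured_jobs: true
  end
end

class PaymentJob
  include FakeWorker
  include Assured
end

class CleanupJob
  include FakeWorker
end

puts PaymentJob.sidekiq_options["assured_jobs"].inspect
puts CleanupJob.sidekiq_options["assured_jobs"].inspect
```

Include order matters in this sketch: `Assured` calls `sidekiq_options` on the including class, so the module that defines that method has to be included first, as in the README examples.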
data/lib/sidekiq-assured-jobs.rb ADDED
@@ -0,0 +1,267 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "sidekiq"
4
+ require "redis"
5
+ require "logger"
6
+ require "securerandom"
7
+ require "set"
8
+
9
+ require_relative "sidekiq/assured_jobs/version"
10
+ require_relative "sidekiq/assured_jobs/middleware"
11
+ require_relative "sidekiq/assured_jobs/worker"
12
+
13
+ module Sidekiq
14
+ module AssuredJobs
15
+ class Error < StandardError; end
16
+
17
+ class << self
18
+ attr_accessor :instance_id, :namespace, :heartbeat_interval, :heartbeat_ttl, :recovery_lock_ttl, :logger, :redis_options, :delayed_recovery_count, :delayed_recovery_interval
19
+
20
+ def configure
21
+ yield self if block_given?
22
+ setup_defaults
23
+ setup_sidekiq_hooks
24
+ end
25
+
26
+ def redis(&block)
27
+ if redis_options
28
+ # Use custom Redis configuration if provided
29
+ redis_client = Redis.new(redis_options)
30
+ if block_given?
31
+ result = yield redis_client
32
+ redis_client.close
33
+ result
34
+ else
35
+ redis_client
36
+ end
37
+ else
38
+ # Use Sidekiq's Redis connection pool
39
+ Sidekiq.redis(&block)
40
+ end
41
+ end
42
+
43
+ def redis_sync(&block)
44
+ # Synchronous Redis operations using Sidekiq's pool or custom config
45
+ redis(&block)
46
+ end
47
+
48
+ # Helper method to add namespace prefix to Redis keys
49
+ def namespaced_key(key)
50
+ "#{namespace}:#{key}"
51
+ end
52
+
53
+ # Clear unique-jobs lock for orphaned jobs to allow immediate re-enqueuing
54
+ def clear_unique_jobs_lock(job_data)
55
+ return unless job_data['unique_digest']
56
+
57
+ begin
58
+ # Check if SidekiqUniqueJobs is available
59
+ if defined?(SidekiqUniqueJobs::Digests)
60
+ SidekiqUniqueJobs::Digests.del(digest: job_data['unique_digest'])
61
+ logger.info "AssuredJobs cleared unique-jobs lock for job #{job_data['jid']} with digest #{job_data['unique_digest']}"
62
+ else
63
+ logger.debug "AssuredJobs: SidekiqUniqueJobs not available, skipping lock cleanup for job #{job_data['jid']}"
64
+ end
65
+ rescue => e
66
+ logger.warn "AssuredJobs failed to clear unique-jobs lock for job #{job_data['jid']}: #{e.message}"
67
+ end
68
+ end
69
+
70
+ def reenqueue_orphans!
71
+ with_recovery_lock do
72
+ logger.info "AssuredJobs starting orphan job recovery"
73
+
74
+ redis_sync do |conn|
75
+ # Get all job keys and instance keys (using custom namespacing)
76
+ job_keys = conn.keys(namespaced_key("jobs:*"))
77
+ instance_keys = conn.keys(namespaced_key("instance:*"))
78
+
79
+ # Extract instance IDs from keys
80
+ live_instances = instance_keys.map { |key| key.split(":").last }.to_set
81
+
82
+ orphaned_jobs = []
83
+
84
+ job_keys.each do |job_key|
85
+ instance_id = job_key.split(":").last
86
+ unless live_instances.include?(instance_id)
87
+ # Get all job IDs for this dead instance
88
+ job_ids = conn.smembers(job_key)
89
+
90
+ job_ids.each do |jid|
91
+ # Get the job payload
92
+ job_data_key = namespaced_key("job:#{jid}")
93
+ job_payload = conn.get(job_data_key)
94
+
95
+ if job_payload
96
+ orphaned_jobs << JSON.parse(job_payload)
97
+ # Clean up the job data key
98
+ conn.del(job_data_key)
99
+ end
100
+ end
101
+
102
+ # Clean up the job tracking key
103
+ conn.del(job_key)
104
+ end
105
+ end
106
+
107
+ if orphaned_jobs.any?
108
+ logger.info "AssuredJobs found #{orphaned_jobs.size} orphaned jobs, re-enqueuing"
109
+ orphaned_jobs.each do |job_data|
110
+ # Clear unique-jobs lock before re-enqueuing to avoid lock conflicts
111
+ clear_unique_jobs_lock(job_data)
112
+
113
+ Sidekiq::Client.push(job_data)
114
+ logger.info "AssuredJobs re-enqueued job #{job_data['jid']} (#{job_data['class']})"
115
+ end
116
+ else
117
+ logger.info "AssuredJobs found no orphaned jobs"
118
+ end
119
+ end
120
+ end
121
+ rescue => e
122
+ logger.error "AssuredJobs orphan recovery failed: #{e.message}"
123
+ logger.error e.backtrace.join("\n")
124
+ end
125
+
126
+ def setup_sidekiq_hooks
127
+ return unless defined?(Sidekiq::VERSION)
128
+
129
+ Sidekiq.configure_server do |config|
130
+ config.server_middleware do |chain|
131
+ chain.add AssuredJobs::Middleware
132
+ end
133
+
134
+ # Add startup hook for heartbeat and orphan recovery
135
+ config.on(:startup) do
136
+ # Ensure configuration is set up
137
+ setup_defaults unless @instance_id
138
+
139
+ logger.info "AssuredJobs starting up on instance #{instance_id}"
140
+
141
+ # Start heartbeat system
142
+ setup_heartbeat
143
+
144
+ # Run orphan recovery shortly after startup, then schedule the delayed passes
145
+ Thread.new do
146
+ sleep 5 # Give the server a moment to fully start
147
+ begin
148
+ reenqueue_orphans!
149
+ spinup_delayed_recovery_thread
150
+ rescue => e
151
+ logger.error "AssuredJobs startup orphan recovery failed: #{e.message}"
152
+ logger.error e.backtrace.join("\n")
153
+ end
154
+ end
155
+ end
156
+
157
+ # Add shutdown hook to clean up
158
+ config.on(:shutdown) do
159
+ logger.info "AssuredJobs shutting down instance #{instance_id}"
160
+ begin
161
+ # Stop heartbeat thread
162
+ if @heartbeat_thread&.alive?
163
+ @heartbeat_thread.kill
164
+ @heartbeat_thread = nil
165
+ end
166
+
167
+ redis_sync do |conn|
168
+ # Only clean up instance heartbeat - let orphan recovery handle job cleanup
169
+ # This ensures that if there are running jobs during shutdown, they will be
170
+ # detected as orphaned and recovered by the next instance
171
+ conn.del(namespaced_key("instance:#{instance_id}"))
172
+
173
+ # Log tracked jobs but don't clean them up - they should be recovered as orphans
174
+ job_tracking_key = namespaced_key("jobs:#{instance_id}")
175
+ tracked_jobs = conn.smembers(job_tracking_key)
176
+
177
+ if tracked_jobs.any?
178
+ logger.warn "AssuredJobs leaving #{tracked_jobs.size} tracked jobs for orphan recovery: #{tracked_jobs.join(', ')}"
179
+ logger.info "AssuredJobs: These jobs will be recovered by the next instance startup"
180
+ else
181
+ logger.info "AssuredJobs: No tracked jobs to leave for recovery"
182
+ end
183
+ end
184
+ rescue => e
185
+ logger.error "AssuredJobs shutdown cleanup failed: #{e.message}"
186
+ end
187
+ end
188
+ end
189
+ end
190
+
191
+ private
192
+
193
+ def setup_defaults
194
+ @instance_id ||= ENV.fetch("ASSURED_JOBS_INSTANCE_ID") { SecureRandom.hex(8) }
195
+ @namespace ||= ENV.fetch("ASSURED_JOBS_NS", "sidekiq_assured_jobs")
196
+ @heartbeat_interval ||= ENV.fetch("ASSURED_JOBS_HEARTBEAT_INTERVAL", "15").to_i
197
+ @heartbeat_ttl ||= ENV.fetch("ASSURED_JOBS_HEARTBEAT_TTL", "45").to_i
198
+ @recovery_lock_ttl ||= ENV.fetch("ASSURED_JOBS_RECOVERY_LOCK_TTL", "300").to_i
199
+ @logger ||= Sidekiq.logger
200
+ @delayed_recovery_count ||= ENV.fetch("ASSURED_JOBS_DELAYED_RECOVERY_COUNT", "1").to_i
201
+ @delayed_recovery_interval ||= ENV.fetch("ASSURED_JOBS_DELAYED_RECOVERY_INTERVAL", "300").to_i
202
+ end
203
+
204
+ def setup_heartbeat
205
+ # Initial heartbeat
206
+ send_heartbeat
207
+
208
+ # Background heartbeat thread
209
+ @heartbeat_thread = Thread.new do
210
+ loop do
211
+ sleep heartbeat_interval
212
+ begin
213
+ send_heartbeat
214
+ rescue => e
215
+ logger.error "AssuredJobs heartbeat failed: #{e.message}"
216
+ end
217
+ end
218
+ end
219
+ end
220
+
221
+ def send_heartbeat
222
+ key = namespaced_key("instance:#{instance_id}")
223
+ redis_sync do |conn|
224
+ conn.setex(key, heartbeat_ttl, Time.now.to_f)
225
+ end
226
+ logger.debug "AssuredJobs heartbeat sent for instance #{instance_id}"
227
+ end
228
+
229
+ def spinup_delayed_recovery_thread
230
+ Thread.new do
231
+ @delayed_recovery_count.times do |i|
232
+ sleep @delayed_recovery_interval
233
+ begin
234
+ reenqueue_orphans!
235
+ rescue => e
236
+ logger.error(
237
+ "[AssuredJobs] delayed recovery ##{i+1} failed: #{e.message}"
238
+ )
239
+ end
240
+ end
241
+ end
242
+ end
243
+ def with_recovery_lock
244
+ lock_key = namespaced_key("recovery_lock")
245
+ lock_acquired = redis_sync do |conn|
246
+ conn.set(lock_key, instance_id, nx: true, ex: recovery_lock_ttl)
247
+ end
248
+
249
+ if lock_acquired
250
+ logger.info "AssuredJobs recovery lock acquired by instance #{instance_id}"
251
+ begin
252
+ yield
253
+ ensure
254
+ redis_sync { |conn| conn.del(lock_key) }
255
+ logger.info "AssuredJobs recovery lock released by instance #{instance_id}"
256
+ end
257
+ else
258
+ logger.debug "AssuredJobs recovery lock not acquired, another instance is handling recovery"
259
+ end
260
+ end
261
+ end
262
+ end
263
+ end
264
+
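Review note on `with_recovery_lock`: the `ensure` block deletes `recovery_lock` unconditionally. If a recovery pass ever outlasts `recovery_lock_ttl`, the key may already have expired and been re-acquired by another instance, in which case the `del` would appear to release that instance's lock. A minimal sketch of an owner-checked release follows. `FakeLockStore`, `with_owned_lock`, and the token strings are hypothetical names used only to make the idea runnable without Redis; none of them are part of the gem.

```ruby
# Hypothetical in-memory stand-in for the Redis operations the lock needs.
# It is NOT part of the gem; it only lets the sketch run without a server.
class FakeLockStore
  def initialize
    @data = {}
  end

  # Mimics SET key value NX: stores the value only if the key is absent.
  def set_nx(key, value)
    return false if @data.key?(key)

    @data[key] = value
    true
  end

  def get(key)
    @data[key]
  end

  # Simulates the lock TTL running out.
  def expire!(key)
    @data.delete(key)
  end

  # Deletes the key only if it still holds the caller's token.
  def delete_if_owner(key, token)
    return false unless @data[key] == token

    @data.delete(key)
    true
  end
end

# Runs the block under the lock, then releases it only if we still own it.
def with_owned_lock(store, key, token)
  return :not_acquired unless store.set_nx(key, token)

  begin
    yield
  ensure
    store.delete_if_owner(key, token)
  end
  :done
end

store = FakeLockStore.new
result = with_owned_lock(store, "recovery_lock", "instance-a") do
  # Simulate recovery outlasting the TTL while a second instance takes the lock.
  store.expire!("recovery_lock")
  store.set_nx("recovery_lock", "instance-b")
end
```

Here `instance-a` finishes without removing the lock that `instance-b` now holds. Against a real Redis server the compare and the delete would need to happen atomically, which is commonly done with a short Lua script; that part is left out of this sketch.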
265
+ # Auto-setup defaults and Sidekiq hooks when gem is required
266
+ Sidekiq::AssuredJobs.send(:setup_defaults)
267
+ Sidekiq::AssuredJobs.setup_sidekiq_hooks
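Review note on `setup_defaults`, which runs at require time: the numeric settings are parsed with `String#to_i`, and `"abc".to_i` is `0`. A mistyped `ASSURED_JOBS_HEARTBEAT_INTERVAL` would therefore become `sleep 0` in the heartbeat loop rather than an error. The sketch below shows one stricter way to parse such values. `positive_int_setting` is a hypothetical helper, not something the gem provides, and it takes a plain hash in place of `ENV` so it can be run in isolation.

```ruby
# Hypothetical helper, not part of the gem: reads a positive integer from an
# ENV-like hash, falling back to the default when the key is unset.
def positive_int_setting(env, name, default)
  raw = env.fetch(name, default.to_s)
  value = Integer(raw, 10) # raises ArgumentError on "abc", unlike String#to_i
  raise ArgumentError, "#{name} must be positive, got #{value}" unless value.positive?

  value
end

# A plain hash stands in for ENV; the variable names and defaults are the
# ones used by setup_defaults.
env = { "ASSURED_JOBS_HEARTBEAT_INTERVAL" => "30" }
interval = positive_int_setting(env, "ASSURED_JOBS_HEARTBEAT_INTERVAL", 15)
ttl = positive_int_setting(env, "ASSURED_JOBS_HEARTBEAT_TTL", 45)
```

The positivity check suits the interval and TTL settings. `ASSURED_JOBS_DELAYED_RECOVERY_COUNT` is different, since `0` there simply means the delayed recovery loop runs zero times, so it would presumably need a non-negative check instead.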
metadata ADDED
@@ -0,0 +1,131 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: sidekiq-assured-jobs
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Manikanta Gopi
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2025-06-30 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: sidekiq
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '6.0'
20
+ - - "<"
21
+ - !ruby/object:Gem::Version
22
+ version: '7'
23
+ type: :runtime
24
+ prerelease: false
25
+ version_requirements: !ruby/object:Gem::Requirement
26
+ requirements:
27
+ - - ">="
28
+ - !ruby/object:Gem::Version
29
+ version: '6.0'
30
+ - - "<"
31
+ - !ruby/object:Gem::Version
32
+ version: '7'
33
+ - !ruby/object:Gem::Dependency
34
+ name: redis
35
+ requirement: !ruby/object:Gem::Requirement
36
+ requirements:
37
+ - - "~>"
38
+ - !ruby/object:Gem::Version
39
+ version: '4.0'
40
+ type: :runtime
41
+ prerelease: false
42
+ version_requirements: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - "~>"
45
+ - !ruby/object:Gem::Version
46
+ version: '4.0'
47
+ - !ruby/object:Gem::Dependency
48
+ name: rspec
49
+ requirement: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - "~>"
52
+ - !ruby/object:Gem::Version
53
+ version: '3.0'
54
+ type: :development
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ requirements:
58
+ - - "~>"
59
+ - !ruby/object:Gem::Version
60
+ version: '3.0'
61
+ - !ruby/object:Gem::Dependency
62
+ name: bundler
63
+ requirement: !ruby/object:Gem::Requirement
64
+ requirements:
65
+ - - "~>"
66
+ - !ruby/object:Gem::Version
67
+ version: '2.0'
68
+ type: :development
69
+ prerelease: false
70
+ version_requirements: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - "~>"
73
+ - !ruby/object:Gem::Version
74
+ version: '2.0'
75
+ - !ruby/object:Gem::Dependency
76
+ name: rubocop
77
+ requirement: !ruby/object:Gem::Requirement
78
+ requirements:
79
+ - - "~>"
80
+ - !ruby/object:Gem::Version
81
+ version: '1.0'
82
+ type: :development
83
+ prerelease: false
84
+ version_requirements: !ruby/object:Gem::Requirement
85
+ requirements:
86
+ - - "~>"
87
+ - !ruby/object:Gem::Version
88
+ version: '1.0'
89
+ description: Ensures Sidekiq jobs are never lost due to worker crashes or restarts
90
+ by tracking in-flight jobs and automatically recovering orphaned work
91
+ email:
92
+ - gopimanikanta50@gmail.com
93
+ executables: []
94
+ extensions: []
95
+ extra_rdoc_files: []
96
+ files:
97
+ - CHANGELOG.md
98
+ - LICENSE.txt
99
+ - README.md
100
+ - lib/sidekiq-assured-jobs.rb
101
+ - lib/sidekiq/assured_jobs/middleware.rb
102
+ - lib/sidekiq/assured_jobs/version.rb
103
+ - lib/sidekiq/assured_jobs/worker.rb
104
+ homepage: https://github.com/praja/sidekiq-assured-jobs
105
+ licenses:
106
+ - MIT
107
+ metadata:
108
+ allowed_push_host: https://rubygems.org
109
+ homepage_uri: https://github.com/praja/sidekiq-assured-jobs
110
+ source_code_uri: https://github.com/praja/sidekiq-assured-jobs
111
+ changelog_uri: https://github.com/praja/sidekiq-assured-jobs/blob/main/CHANGELOG.md
112
+ post_install_message:
113
+ rdoc_options: []
114
+ require_paths:
115
+ - lib
116
+ required_ruby_version: !ruby/object:Gem::Requirement
117
+ requirements:
118
+ - - ">="
119
+ - !ruby/object:Gem::Version
120
+ version: 2.6.0
121
+ required_rubygems_version: !ruby/object:Gem::Requirement
122
+ requirements:
123
+ - - ">="
124
+ - !ruby/object:Gem::Version
125
+ version: '0'
126
+ requirements: []
127
+ rubygems_version: 3.4.17
128
+ signing_key:
129
+ specification_version: 4
130
+ summary: Reliable job execution guarantee for Sidekiq with automatic orphan recovery
131
+ test_files: []
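Review note on the runtime dependencies above: `sidekiq` is constrained to `>= 6.0, < 7` and `redis` to `~> 4.0`. The following sketch uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show which releases those constraints accept. The `allowed?` helper and the version strings passed to it are illustrative assumptions, not values taken from the gemspec.

```ruby
require "rubygems"

# Constraints copied from the gem metadata.
sidekiq_req = Gem::Requirement.new(">= 6.0", "< 7")
redis_req   = Gem::Requirement.new("~> 4.0")

# Hypothetical helper: true when the version string satisfies the requirement.
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

# Illustrative version strings on either side of each boundary.
sidekiq_6 = allowed?(sidekiq_req, "6.5.12")
sidekiq_7 = allowed?(sidekiq_req, "7.0.0")
redis_4   = allowed?(redis_req, "4.8.1")
redis_5   = allowed?(redis_req, "5.0.0")
```

Under these constraints the 6.x and 4.x versions are accepted and the 7.0.0 and 5.0.0 versions are rejected, so an application already on Sidekiq 7 or redis-rb 5 would not be able to resolve this release of the gem.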