smart_message 0.0.8 → 0.0.9
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/.irbrc +24 -0
- data/CHANGELOG.md +96 -0
- data/Gemfile.lock +6 -1
- data/README.md +289 -15
- data/docs/README.md +3 -1
- data/docs/addressing.md +119 -13
- data/docs/architecture.md +68 -0
- data/docs/dead_letter_queue.md +673 -0
- data/docs/dispatcher.md +87 -0
- data/docs/examples.md +59 -1
- data/docs/getting-started.md +8 -1
- data/docs/logging.md +382 -326
- data/docs/message_filtering.md +451 -0
- data/examples/01_point_to_point_orders.rb +54 -53
- data/examples/02_publish_subscribe_events.rb +14 -10
- data/examples/03_many_to_many_chat.rb +16 -8
- data/examples/04_redis_smart_home_iot.rb +20 -10
- data/examples/05_proc_handlers.rb +12 -11
- data/examples/06_custom_logger_example.rb +95 -100
- data/examples/07_error_handling_scenarios.rb +4 -2
- data/examples/08_entity_addressing_basic.rb +18 -6
- data/examples/08_entity_addressing_with_filtering.rb +27 -9
- data/examples/09_dead_letter_queue_demo.rb +559 -0
- data/examples/09_regex_filtering_microservices.rb +407 -0
- data/examples/10_header_block_configuration.rb +263 -0
- data/examples/11_global_configuration_example.rb +219 -0
- data/examples/README.md +102 -0
- data/examples/dead_letters.jsonl +12 -0
- data/examples/performance_metrics/benchmark_results_ractor_20250818_205603.json +135 -0
- data/examples/performance_metrics/benchmark_results_ractor_20250818_205831.json +135 -0
- data/examples/performance_metrics/benchmark_results_test_20250818_204942.json +130 -0
- data/examples/performance_metrics/benchmark_results_threadpool_20250818_204942.json +130 -0
- data/examples/performance_metrics/benchmark_results_threadpool_20250818_204959.json +130 -0
- data/examples/performance_metrics/benchmark_results_threadpool_20250818_205044.json +130 -0
- data/examples/performance_metrics/benchmark_results_threadpool_20250818_205109.json +130 -0
- data/examples/performance_metrics/benchmark_results_threadpool_20250818_205252.json +130 -0
- data/examples/performance_metrics/benchmark_results_unknown_20250819_172852.json +130 -0
- data/examples/performance_metrics/compare_benchmarks.rb +519 -0
- data/examples/performance_metrics/dead_letters.jsonl +3100 -0
- data/examples/performance_metrics/performance_benchmark.rb +344 -0
- data/examples/show_logger.rb +367 -0
- data/examples/show_me.rb +145 -0
- data/examples/temp.txt +94 -0
- data/examples/tmux_chat/bot_agent.rb +4 -2
- data/examples/tmux_chat/human_agent.rb +4 -2
- data/examples/tmux_chat/room_monitor.rb +4 -2
- data/examples/tmux_chat/shared_chat_system.rb +6 -3
- data/lib/smart_message/addressing.rb +259 -0
- data/lib/smart_message/base.rb +121 -599
- data/lib/smart_message/circuit_breaker.rb +2 -1
- data/lib/smart_message/configuration.rb +199 -0
- data/lib/smart_message/dead_letter_queue.rb +27 -10
- data/lib/smart_message/dispatcher.rb +90 -49
- data/lib/smart_message/header.rb +5 -0
- data/lib/smart_message/logger/base.rb +21 -1
- data/lib/smart_message/logger/default.rb +88 -138
- data/lib/smart_message/logger/lumberjack.rb +324 -0
- data/lib/smart_message/logger/null.rb +81 -0
- data/lib/smart_message/logger.rb +17 -9
- data/lib/smart_message/messaging.rb +100 -0
- data/lib/smart_message/plugins.rb +132 -0
- data/lib/smart_message/serializer/base.rb +25 -8
- data/lib/smart_message/serializer/json.rb +5 -4
- data/lib/smart_message/subscription.rb +193 -0
- data/lib/smart_message/transport/base.rb +72 -41
- data/lib/smart_message/transport/memory_transport.rb +7 -5
- data/lib/smart_message/transport/redis_transport.rb +15 -45
- data/lib/smart_message/transport/stdout_transport.rb +18 -8
- data/lib/smart_message/transport.rb +1 -34
- data/lib/smart_message/utilities.rb +142 -0
- data/lib/smart_message/version.rb +1 -1
- data/lib/smart_message/versioning.rb +85 -0
- data/lib/smart_message/wrapper.rb.bak +132 -0
- data/lib/smart_message.rb +74 -28
- data/smart_message.gemspec +3 -0
- metadata +76 -3
- data/lib/smart_message/serializer.rb +0 -10
- data/lib/smart_message/wrapper.rb +0 -43
@@ -0,0 +1,673 @@
|
|
1
|
+
# Dead Letter Queue
|
2
|
+
|
3
|
+
SmartMessage includes a comprehensive file-based Dead Letter Queue (DLQ) system for capturing, storing, and replaying failed messages. The DLQ provides production-grade reliability with automatic integration into the circuit breaker system.
|
4
|
+
|
5
|
+
## Overview
|
6
|
+
|
7
|
+
The Dead Letter Queue serves as a safety net for your messaging system:
|
8
|
+
|
9
|
+
- **Automatic Capture**: Failed messages are automatically stored when circuit breakers trip
|
10
|
+
- **Manual Capture**: Explicitly store messages that fail business logic validation
|
11
|
+
- **Replay Capabilities**: Retry failed messages individually, in batches, or all at once
|
12
|
+
- **Transport Override**: Replay messages through a different transport than originally configured
|
13
|
+
- **Administrative Tools**: Filter, analyze, and export messages for debugging
|
14
|
+
- **Thread-Safe**: All operations are protected by a mutex for safe concurrent access
|
15
|
+
|
16
|
+
## File Format
|
17
|
+
|
18
|
+
The DLQ uses JSON Lines (.jsonl) format - one JSON object per line:
|
19
|
+
|
20
|
+
```json
|
21
|
+
{"timestamp":"2025-08-19T10:30:45Z","header":{...},"payload":"...","error":"Connection timeout","retry_count":0,"transport":"Redis","stack_trace":"..."}
|
22
|
+
{"timestamp":"2025-08-19T10:31:12Z","header":{...},"payload":"...","error":"Circuit breaker open","retry_count":1,"transport":"Redis","stack_trace":"..."}
|
23
|
+
```
|
24
|
+
|
25
|
+
Benefits of JSON Lines:
|
26
|
+
- Append-only for efficient writes
|
27
|
+
- Line-by-line processing for memory efficiency
|
28
|
+
- Human-readable for debugging
|
29
|
+
- Easy to process with standard Unix tools
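
As a rough illustration of the line-by-line benefit, the sketch below streams a JSON Lines file with `File.foreach`, which yields one line at a time instead of loading the whole file. The sample file contents and the `each_entry` helper are hypothetical and not part of SmartMessage.

```ruby
require 'json'
require 'tempfile'

# Hypothetical helper: lazily yields one parsed entry per line.
def each_entry(path)
  return enum_for(:each_entry, path) unless block_given?

  File.foreach(path) do |line|
    line = line.strip
    next if line.empty?

    yield JSON.parse(line, symbolize_names: true)
  end
end

# Sample data standing in for a real DLQ file.
sample = Tempfile.new(['dlq', '.jsonl'])
sample.puts({ timestamp: '2025-08-19T10:30:45Z', error: 'Connection timeout', transport: 'Redis' }.to_json)
sample.puts({ timestamp: '2025-08-19T10:31:12Z', error: 'Circuit breaker open', transport: 'Redis' }.to_json)
sample.flush

# Iteration stops at the first match, so later lines are never parsed.
first_timeout = each_entry(sample.path).find { |e| e[:error] =~ /timeout/i }
puts first_timeout[:timestamp]
```

Because the helper returns an enumerator when called without a block, it composes with `find`, `select`, or `count` without an intermediate array.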
|
30
|
+
|
31
|
+
## Configuration
|
32
|
+
|
33
|
+
### Global Default Configuration
|
34
|
+
|
35
|
+
Configure a default DLQ that all components will use:
|
36
|
+
|
37
|
+
```ruby
|
38
|
+
# Set default path for all DLQ operations
|
39
|
+
SmartMessage::DeadLetterQueue.configure_default('/var/log/app/dlq.jsonl')
|
40
|
+
|
41
|
+
# Access the default instance anywhere
|
42
|
+
dlq = SmartMessage::DeadLetterQueue.default
|
43
|
+
```
|
44
|
+
|
45
|
+
### Environment-Based Configuration
|
46
|
+
|
47
|
+
Use environment variables for different deployments:
|
48
|
+
|
49
|
+
```ruby
|
50
|
+
# In your application initialization
|
51
|
+
SmartMessage::DeadLetterQueue.configure_default(
|
52
|
+
ENV.fetch('SMART_MESSAGE_DLQ_PATH', 'dead_letters.jsonl')
|
53
|
+
)
|
54
|
+
```
|
55
|
+
|
56
|
+
### Per-Environment Configuration
|
57
|
+
|
58
|
+
Configure different paths for each environment:
|
59
|
+
|
60
|
+
```ruby
|
61
|
+
# config/initializers/smart_message.rb (Rails example)
|
62
|
+
case Rails.env
|
63
|
+
when 'production'
|
64
|
+
SmartMessage::DeadLetterQueue.configure_default('/var/log/smart_message/production_dlq.jsonl')
|
65
|
+
when 'staging'
|
66
|
+
SmartMessage::DeadLetterQueue.configure_default('/var/log/smart_message/staging_dlq.jsonl')
|
67
|
+
else
|
68
|
+
SmartMessage::DeadLetterQueue.configure_default('tmp/development_dlq.jsonl')
|
69
|
+
end
|
70
|
+
```
|
71
|
+
|
72
|
+
### Custom Instances
|
73
|
+
|
74
|
+
Create separate DLQ instances for different purposes:
|
75
|
+
|
76
|
+
```ruby
|
77
|
+
# Critical failures need special handling
|
78
|
+
critical_dlq = SmartMessage::DeadLetterQueue.new('/var/log/critical_failures.jsonl')
|
79
|
+
|
80
|
+
# Separate DLQ for payment messages
|
81
|
+
payment_dlq = SmartMessage::DeadLetterQueue.new('/var/log/payment_failures.jsonl')
|
82
|
+
|
83
|
+
# Temporary DLQ for testing
|
84
|
+
test_dlq = SmartMessage::DeadLetterQueue.new('/tmp/test_failures.jsonl')
|
85
|
+
```
|
86
|
+
|
87
|
+
## Core Operations
|
88
|
+
|
89
|
+
### FIFO Queue Operations
|
90
|
+
|
91
|
+
The DLQ operates as a First-In-First-Out queue:
|
92
|
+
|
93
|
+
```ruby
|
94
|
+
dlq = SmartMessage::DeadLetterQueue.default
|
95
|
+
|
96
|
+
# Add a failed message
|
97
|
+
entry = dlq.enqueue(
|
98
|
+
message_header, # SmartMessage::Header object
|
99
|
+
message_payload, # Serialized message string
|
100
|
+
error: "Connection timeout",
|
101
|
+
retry_count: 0,
|
102
|
+
transport: "Redis",
|
103
|
+
stack_trace: exception.backtrace.join("\n")
|
104
|
+
)
|
105
|
+
|
106
|
+
# Check queue size
|
107
|
+
puts "Messages in queue: #{dlq.size}"
|
108
|
+
|
109
|
+
# Peek at the oldest message without removing it
|
110
|
+
next_message = dlq.peek
|
111
|
+
puts "Next for replay: #{next_message[:header][:message_class]}"
|
112
|
+
|
113
|
+
# Remove and get the oldest message
|
114
|
+
message = dlq.dequeue
|
115
|
+
process_message(message) if message
|
116
|
+
|
117
|
+
# Clear all messages
|
118
|
+
dlq.clear
|
119
|
+
```
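
How the gem implements these operations internally is not shown in this document. One plausible way to get FIFO semantics from a JSONL file, sketched with the standard library only: `enqueue` appends a line, `peek` reads the first line, and `dequeue` rewrites the file without it. `TinyFileQueue` is a hypothetical stand-in, not the gem's class.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical file-backed FIFO queue; an illustration, not SmartMessage code.
class TinyFileQueue
  def initialize(path)
    @path = path
    @mutex = Mutex.new
  end

  # Append one entry as a single JSON line.
  def enqueue(entry)
    @mutex.synchronize { File.open(@path, 'a') { |f| f.puts(entry.to_json) } }
  end

  # Oldest entry, left in place; nil when the queue is empty.
  def peek
    @mutex.synchronize { parse(read_lines.first) }
  end

  # Oldest entry, removed by rewriting the remaining lines.
  def dequeue
    @mutex.synchronize do
      first, *rest = read_lines
      File.write(@path, rest.join) if first
      parse(first)
    end
  end

  def size
    @mutex.synchronize { read_lines.size }
  end

  private

  def read_lines
    File.exist?(@path) ? File.readlines(@path) : []
  end

  def parse(line)
    line && JSON.parse(line, symbolize_names: true)
  end
end

queue = TinyFileQueue.new(File.join(Dir.mktmpdir, 'queue.jsonl'))
queue.enqueue({ error: 'first failure' })
queue.enqueue({ error: 'second failure' })
puts queue.peek[:error]     # oldest entry, still queued
puts queue.dequeue[:error]  # same entry, now removed
puts queue.size
```

In this sketch every `dequeue` rewrites the whole file, so its cost grows with queue length; that trade-off is acceptable for small queues but worth measuring for large ones.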
|
120
|
+
|
121
|
+
### Message Structure
|
122
|
+
|
123
|
+
Each DLQ entry contains:
|
124
|
+
|
125
|
+
```ruby
|
126
|
+
{
|
127
|
+
timestamp: "2025-08-19T10:30:45Z", # When the failure occurred
|
128
|
+
header: { # Complete message header
|
129
|
+
uuid: "abc-123",
|
130
|
+
message_class: "OrderMessage",
|
131
|
+
published_at: "2025-08-19T10:30:40Z",
|
132
|
+
publisher_pid: 12345,
|
133
|
+
version: 1,
|
134
|
+
from: "order-service",
|
135
|
+
to: "payment-service",
|
136
|
+
reply_to: "order-service"
|
137
|
+
},
|
138
|
+
payload: '{"order_id":"123","amount":99.99}', # Original message payload
|
139
|
+
payload_format: "json", # Serialization format
|
140
|
+
error: "Connection refused", # Error message
|
141
|
+
retry_count: 2, # Number of retry attempts
|
142
|
+
transport: "Redis", # Transport that failed
|
143
|
+
stack_trace: "..." # Full stack trace (optional)
|
144
|
+
}
|
145
|
+
```
|
146
|
+
|
147
|
+
## Replay Capabilities
|
148
|
+
|
149
|
+
### Individual Message Replay
|
150
|
+
|
151
|
+
Replay the oldest message:
|
152
|
+
|
153
|
+
```ruby
|
154
|
+
result = dlq.replay_one
|
155
|
+
if result[:success]
|
156
|
+
puts "Message replayed successfully"
|
157
|
+
else
|
158
|
+
puts "Replay failed: #{result[:error]}"
|
159
|
+
end
|
160
|
+
```
|
161
|
+
|
162
|
+
### Batch Replay
|
163
|
+
|
164
|
+
Replay multiple messages:
|
165
|
+
|
166
|
+
```ruby
|
167
|
+
# Replay next 10 messages
|
168
|
+
results = dlq.replay_batch(10)
|
169
|
+
puts "Successful: #{results[:success]}"
|
170
|
+
puts "Failed: #{results[:failed]}"
|
171
|
+
results[:errors].each do |error|
|
172
|
+
puts "Error: #{error}"
|
173
|
+
end
|
174
|
+
```
|
175
|
+
|
176
|
+
### Full Queue Replay
|
177
|
+
|
178
|
+
Replay all messages:
|
179
|
+
|
180
|
+
```ruby
|
181
|
+
results = dlq.replay_all
|
182
|
+
puts "Replayed #{results[:success]} messages"
|
183
|
+
puts "Failed to replay #{results[:failed]} messages"
|
184
|
+
```
|
185
|
+
|
186
|
+
### Transport Override
|
187
|
+
|
188
|
+
Replay through a different transport:
|
189
|
+
|
190
|
+
```ruby
|
191
|
+
# Original message used Redis, replay through RabbitMQ
|
192
|
+
rabbit_transport = SmartMessage::Transport.create(:rabbitmq)
|
193
|
+
|
194
|
+
# Replay one with override
|
195
|
+
dlq.replay_one(rabbit_transport)
|
196
|
+
|
197
|
+
# Replay batch with override
|
198
|
+
dlq.replay_batch(10, rabbit_transport)
|
199
|
+
|
200
|
+
# Replay all with override
|
201
|
+
dlq.replay_all(rabbit_transport)
|
202
|
+
```
|
203
|
+
|
204
|
+
## Administrative Functions
|
205
|
+
|
206
|
+
### Message Filtering
|
207
|
+
|
208
|
+
Filter messages for analysis:
|
209
|
+
|
210
|
+
```ruby
|
211
|
+
# Find all failed OrderMessage instances
|
212
|
+
order_failures = dlq.filter_by_class('OrderMessage')
|
213
|
+
puts "Found #{order_failures.size} failed orders"
|
214
|
+
|
215
|
+
# Find all timeout errors
|
216
|
+
timeout_errors = dlq.filter_by_error_pattern(/timeout/i)
|
217
|
+
timeout_errors.each do |entry|
|
218
|
+
puts "Timeout at #{entry[:timestamp]}: #{entry[:error]}"
|
219
|
+
end
|
220
|
+
|
221
|
+
# Find connection errors
|
222
|
+
connection_errors = dlq.filter_by_error_pattern('Connection refused')
|
223
|
+
```
|
224
|
+
|
225
|
+
### Statistics
|
226
|
+
|
227
|
+
Get queue statistics:
|
228
|
+
|
229
|
+
```ruby
|
230
|
+
stats = dlq.statistics
|
231
|
+
puts "Total messages: #{stats[:total]}"
|
232
|
+
|
233
|
+
# Breakdown by message class
|
234
|
+
stats[:by_class].each do |klass, count|
|
235
|
+
puts "#{klass}: #{count} failures"
|
236
|
+
end
|
237
|
+
|
238
|
+
# Breakdown by error type
|
239
|
+
stats[:by_error].sort_by { |_, count| -count }.first(5).each do |error, count|
|
240
|
+
puts "#{error}: #{count} occurrences"
|
241
|
+
end
|
242
|
+
```
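
The same kind of summary can be computed from raw entries, which is handy when analysing an exported file offline. The sketch assumes entries shaped like the ones under Message Structure; `summarize` is a hypothetical helper and not part of the gem.

```ruby
# Hypothetical helper: builds a { total, by_class, by_error } summary
# from an array of DLQ-style entry hashes.
def summarize(entries)
  {
    total: entries.size,
    by_class: entries.map { |e| e.dig(:header, :message_class) }.tally,
    by_error: entries.map { |e| e[:error] }.tally
  }
end

entries = [
  { header: { message_class: 'OrderMessage' },   error: 'Connection timeout' },
  { header: { message_class: 'OrderMessage' },   error: 'Connection refused' },
  { header: { message_class: 'PaymentMessage' }, error: 'Connection timeout' }
]

stats = summarize(entries)
top_error, count = stats[:by_error].max_by { |_, n| n }
puts "#{stats[:total]} entries, most common error: #{top_error} (#{count})"
```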
|
243
|
+
|
244
|
+
### Time-Based Export
|
245
|
+
|
246
|
+
Export messages within a time range:
|
247
|
+
|
248
|
+
```ruby
|
249
|
+
# Get failures from the last hour
|
250
|
+
one_hour_ago = Time.now - 3600
|
251
|
+
recent_failures = dlq.export_range(one_hour_ago, Time.now)
|
252
|
+
|
253
|
+
# Get failures from the past 24 hours (a rolling window, not the calendar day)
|
254
|
+
yesterday_start = Time.now - 86400
|
255
|
+
yesterday_end = Time.now - 1
|
256
|
+
yesterday_failures = dlq.export_range(yesterday_start, yesterday_end)
|
257
|
+
|
258
|
+
# Export for analysis
|
259
|
+
File.write('failures_export.json', recent_failures.to_json)
|
260
|
+
```
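
The second range above covers the preceding 24 hours rather than the previous calendar day. If calendar-day boundaries are wanted, they can be computed from midnight. A minimal sketch: `previous_day_range` and `within?` are hypothetical helpers, the filter assumes ISO 8601 `timestamp` strings as in the File Format section, and daylight-saving transitions are ignored.

```ruby
require 'date'
require 'time'

# Hypothetical helper: start (inclusive) and end (exclusive) of the
# calendar day before `now`, using the UTC offset of `now`.
def previous_day_range(now = Time.now)
  day = now.to_date - 1
  start  = Time.new(day.year, day.month, day.day, 0, 0, 0, now.utc_offset)
  finish = Time.new(now.year, now.month, now.day, 0, 0, 0, now.utc_offset)
  [start, finish]
end

# True when the entry's timestamp falls inside [start, finish).
def within?(entry, start, finish)
  t = Time.iso8601(entry[:timestamp])
  t >= start && t < finish
end

now = Time.utc(2025, 8, 20, 9, 15, 0)
start, finish = previous_day_range(now)

entries = [
  { timestamp: '2025-08-18T23:59:59Z' },
  { timestamp: '2025-08-19T10:30:45Z' },
  { timestamp: '2025-08-20T00:00:00Z' }
]
yesterday = entries.select { |e| within?(e, start, finish) }
puts yesterday.map { |e| e[:timestamp] }.inspect
```

The resulting `start` and `finish` values could then be passed to `export_range` in place of the rolling offsets.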
|
261
|
+
|
262
|
+
### Message Inspection
|
263
|
+
|
264
|
+
Inspect messages without removing them:
|
265
|
+
|
266
|
+
```ruby
|
267
|
+
# Look at next 10 messages
|
268
|
+
messages = dlq.inspect_messages(limit: 10)
|
269
|
+
messages.each do |msg|
|
270
|
+
puts "#{msg[:timestamp]} - #{msg[:header][:message_class]}: #{msg[:error]}"
|
271
|
+
end
|
272
|
+
|
273
|
+
# Default limit is 10
|
274
|
+
dlq.inspect_messages.each do |msg|
|
275
|
+
analyze_failure(msg)
|
276
|
+
end
|
277
|
+
```
|
278
|
+
|
279
|
+
## Integration with Circuit Breakers
|
280
|
+
|
281
|
+
The DLQ is automatically integrated with SmartMessage's circuit breaker system:
|
282
|
+
|
283
|
+
### Automatic Capture
|
284
|
+
|
285
|
+
When circuit breakers trip, messages are automatically sent to the DLQ:
|
286
|
+
|
287
|
+
```ruby
|
288
|
+
class PaymentMessage < SmartMessage::Base
|
289
|
+
config do
|
290
|
+
transport SmartMessage::Transport.create(:redis)
|
291
|
+
# Circuit breaker configured automatically
|
292
|
+
end
|
293
|
+
end
|
294
|
+
|
295
|
+
# If Redis is down, circuit breaker trips and message goes to DLQ
|
296
|
+
message = PaymentMessage.new(amount: 100.00)
|
297
|
+
begin
|
298
|
+
message.publish
|
299
|
+
rescue => e
|
300
|
+
# Message is already in DLQ via circuit breaker
|
301
|
+
puts "Message saved to DLQ"
|
302
|
+
end
|
303
|
+
```
|
304
|
+
|
305
|
+
### Manual Circuit Breaker Integration
|
306
|
+
|
307
|
+
Configure custom circuit breakers with DLQ fallback:
|
308
|
+
|
309
|
+
```ruby
|
310
|
+
class CriticalService
|
311
|
+
include BreakerMachines::DSL
|
312
|
+
|
313
|
+
circuit :external_api do
|
314
|
+
threshold failures: 3, within: 60.seconds
|
315
|
+
reset_after 30.seconds
|
316
|
+
|
317
|
+
# Use custom DLQ for critical failures
|
318
|
+
custom_dlq = SmartMessage::DeadLetterQueue.new('/var/log/critical.jsonl')
|
319
|
+
fallback SmartMessage::CircuitBreaker::Fallbacks.dead_letter_queue(custom_dlq)
|
320
|
+
end
|
321
|
+
|
322
|
+
def call_api(message)
|
323
|
+
circuit(:external_api).wrap do
|
324
|
+
# API call that might fail
|
325
|
+
external_api.send(message)
|
326
|
+
end
|
327
|
+
end
|
328
|
+
end
|
329
|
+
```
|
330
|
+
|
331
|
+
## Monitoring and Alerting
|
332
|
+
|
333
|
+
### Queue Size Monitoring
|
334
|
+
|
335
|
+
Monitor DLQ growth:
|
336
|
+
|
337
|
+
```ruby
|
338
|
+
# Simple monitoring script
|
339
|
+
loop do
|
340
|
+
dlq = SmartMessage::DeadLetterQueue.default
|
341
|
+
size = dlq.size
|
342
|
+
|
343
|
+
if size > 100
|
344
|
+
send_alert("DLQ size critical: #{size} messages")
|
345
|
+
elsif size > 50
|
346
|
+
send_warning("DLQ size warning: #{size} messages")
|
347
|
+
end
|
348
|
+
|
349
|
+
sleep 60 # Check every minute
|
350
|
+
end
|
351
|
+
```
|
352
|
+
|
353
|
+
### Error Pattern Detection
|
354
|
+
|
355
|
+
Detect systematic failures:
|
356
|
+
|
357
|
+
```ruby
|
358
|
+
dlq = SmartMessage::DeadLetterQueue.default
|
359
|
+
stats = dlq.statistics
|
360
|
+
|
361
|
+
# Check for dominant error patterns
|
362
|
+
top_error = stats[:by_error].max_by { |_, count| count }
|
363
|
+
if top_error && top_error[1] > 10
|
364
|
+
alert("Systematic failure detected: #{top_error[0]} (#{top_error[1]} occurrences)")
|
365
|
+
end
|
366
|
+
|
367
|
+
# Check for specific service failures
|
368
|
+
stats[:by_class].each do |klass, count|
|
369
|
+
if count > 5
|
370
|
+
alert("Service degradation: #{klass} has #{count} failures")
|
371
|
+
end
|
372
|
+
end
|
373
|
+
```
|
374
|
+
|
375
|
+
## Best Practices
|
376
|
+
|
377
|
+
### 1. Regular Monitoring
|
378
|
+
|
379
|
+
Set up monitoring for DLQ size and growth rate:
|
380
|
+
|
381
|
+
```ruby
|
382
|
+
# Prometheus metrics example
|
383
|
+
dlq_size = Prometheus::Client::Gauge.new(:dlq_size, docstring: 'Dead letter queue size')
|
384
|
+
dlq_size.set(SmartMessage::DeadLetterQueue.default.size)
|
385
|
+
```
|
386
|
+
|
387
|
+
### 2. Automated Replay
|
388
|
+
|
389
|
+
Schedule periodic replay attempts:
|
390
|
+
|
391
|
+
```ruby
|
392
|
+
# Sidekiq job example
|
393
|
+
class ReplayDLQJob
|
394
|
+
include Sidekiq::Worker
|
395
|
+
|
396
|
+
def perform
|
397
|
+
dlq = SmartMessage::DeadLetterQueue.default
|
398
|
+
|
399
|
+
# Only replay if queue is manageable
|
400
|
+
if dlq.size < 100
|
401
|
+
results = dlq.replay_all
|
402
|
+
log_results(results)
|
403
|
+
else
|
404
|
+
# Replay in smaller batches
|
405
|
+
results = dlq.replay_batch(10)
|
406
|
+
log_results(results)
|
407
|
+
end
|
408
|
+
end
|
409
|
+
|
410
|
+
private
|
411
|
+
|
412
|
+
def log_results(results)
|
413
|
+
Rails.logger.info("DLQ Replay: #{results[:success]} success, #{results[:failed]} failed")
|
414
|
+
end
|
415
|
+
end
|
416
|
+
```
|
417
|
+
|
418
|
+
### 3. Archival Strategy
|
419
|
+
|
420
|
+
Archive old messages:
|
421
|
+
|
422
|
+
```ruby
|
423
|
+
# Archive messages older than 7 days
|
424
|
+
def archive_old_messages
|
425
|
+
dlq = SmartMessage::DeadLetterQueue.default
|
426
|
+
archive_path = "/var/archive/dlq_#{Date.today}.jsonl"
|
427
|
+
|
428
|
+
seven_days_ago = Time.now - (7 * 86400)
|
429
|
+
old_messages = dlq.export_range(Time.at(0), seven_days_ago)
|
430
|
+
|
431
|
+
if old_messages.any?
|
432
|
+
File.write(archive_path, old_messages.map(&:to_json).join("\n") + "\n")
|
433
|
+
# Remove archived messages from active DLQ
|
434
|
+
# (Note: This would require implementing a remove_range method)
|
435
|
+
end
|
436
|
+
end
|
437
|
+
```
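
The removal step mentioned in the comment is left open by the snippet above. One way to approximate it at the file level, sketched under the assumption that nothing else writes to the DLQ file during the rewrite: partition lines by timestamp, append the old ones to the archive, write the rest to a temporary file, and rename it over the original. `archive_older_than` is a hypothetical helper, not a SmartMessage method.

```ruby
require 'json'
require 'time'
require 'tmpdir'

# Hypothetical helper: moves entries older than `cutoff` from the DLQ file
# to the archive file and returns how many were moved.
def archive_older_than(dlq_path, archive_path, cutoff)
  old, recent = File.readlines(dlq_path).partition do |line|
    Time.iso8601(JSON.parse(line)['timestamp']) < cutoff
  end
  return 0 if old.empty?

  File.open(archive_path, 'a') { |f| f.write(old.join) }

  # Write survivors to a temp file, then swap it in with a rename,
  # which is atomic when both paths are on the same filesystem.
  tmp_path = "#{dlq_path}.tmp"
  File.write(tmp_path, recent.join)
  File.rename(tmp_path, dlq_path)
  old.size
end

dir = Dir.mktmpdir
dlq_path = File.join(dir, 'dlq.jsonl')
archive_path = File.join(dir, 'archive.jsonl')
File.write(dlq_path, <<~JSONL)
  {"timestamp":"2025-08-01T00:00:00Z","error":"old failure"}
  {"timestamp":"2025-08-19T10:30:45Z","error":"recent failure"}
JSONL

moved = archive_older_than(dlq_path, archive_path, Time.utc(2025, 8, 12))
puts "Archived #{moved}, kept #{File.readlines(dlq_path).size}"
```

In a running system the rewrite would also need to be coordinated with the DLQ's own writers, for example by pausing producers, since this sketch does not share the queue's mutex.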
|
438
|
+
|
439
|
+
### 4. Error Classification
|
440
|
+
|
441
|
+
Classify errors for better handling:
|
442
|
+
|
443
|
+
```ruby
|
444
|
+
class DLQAnalyzer
|
445
|
+
TRANSIENT_ERRORS = [
|
446
|
+
/connection refused/i,
|
447
|
+
/timeout/i,
|
448
|
+
/temporarily unavailable/i
|
449
|
+
]
|
450
|
+
|
451
|
+
PERMANENT_ERRORS = [
|
452
|
+
/invalid message format/i,
|
453
|
+
/unauthorized/i,
|
454
|
+
/not found/i
|
455
|
+
]
|
456
|
+
|
457
|
+
def self.classify_errors(dlq)
|
458
|
+
transient = []
|
459
|
+
permanent = []
|
460
|
+
|
461
|
+
dlq.inspect_messages(limit: 100).each do |msg|
|
462
|
+
if TRANSIENT_ERRORS.any? { |pattern| msg[:error].to_s.match?(pattern) }
|
463
|
+
transient << msg
|
464
|
+
elsif PERMANENT_ERRORS.any? { |pattern| msg[:error].to_s.match?(pattern) }
|
465
|
+
permanent << msg
|
466
|
+
end
|
467
|
+
end
|
468
|
+
|
469
|
+
{ transient: transient, permanent: permanent }
|
470
|
+
end
|
471
|
+
end
|
472
|
+
```
|
473
|
+
|
474
|
+
## Troubleshooting
|
475
|
+
|
476
|
+
### Common Issues
|
477
|
+
|
478
|
+
#### 1. DLQ File Growing Too Large
|
479
|
+
|
480
|
+
```ruby
|
481
|
+
require 'fileutils'

# Rotate DLQ files
|
482
|
+
def rotate_dlq
|
483
|
+
dlq = SmartMessage::DeadLetterQueue.default
|
484
|
+
timestamp = Time.now.strftime('%Y%m%d_%H%M%S')
|
485
|
+
|
486
|
+
# Move current file
|
487
|
+
FileUtils.mv(dlq.file_path, "#{dlq.file_path}.#{timestamp}")
|
488
|
+
|
489
|
+
# DLQ will create new file automatically
|
490
|
+
end
|
491
|
+
```
|
492
|
+
|
493
|
+
#### 2. Replay Failures
|
494
|
+
|
495
|
+
```ruby
|
496
|
+
# Debug replay failures
|
497
|
+
result = dlq.replay_one
|
498
|
+
unless result[:success]
|
499
|
+
puts "Replay failed: #{result[:error]}"
|
500
|
+
|
501
|
+
# Check if message class still exists
|
502
|
+
message = dlq.peek
|
503
|
+
begin
|
504
|
+
Object.const_get(message[:header][:message_class]) # plain Ruby; no ActiveSupport needed
|
505
|
+
rescue NameError => e
|
506
|
+
puts "Message class no longer exists: #{e.message}"
|
507
|
+
end
|
508
|
+
end
|
509
|
+
```
|
510
|
+
|
511
|
+
#### 3. Corrupted DLQ File
|
512
|
+
|
513
|
+
```ruby
|
514
|
+
# Recover from corrupted file
|
515
|
+
def recover_dlq(corrupted_path)
|
516
|
+
recovered = []
|
517
|
+
|
518
|
+
File.foreach(corrupted_path) do |line|
|
519
|
+
begin
|
520
|
+
entry = JSON.parse(line.strip, symbolize_names: true)
|
521
|
+
recovered << entry
|
522
|
+
rescue JSON::ParserError
|
523
|
+
# Skip corrupted line
|
524
|
+
puts "Skipping corrupted line: #{line[0..50]}..."
|
525
|
+
end
|
526
|
+
end
|
527
|
+
|
528
|
+
# Write recovered entries to new file
|
529
|
+
new_dlq = SmartMessage::DeadLetterQueue.new("#{corrupted_path}.recovered")
|
530
|
+
recovered.each do |entry|
|
531
|
+
new_dlq.enqueue(
|
532
|
+
SmartMessage::Header.new(entry[:header]),
|
533
|
+
entry[:payload],
|
534
|
+
error: entry[:error],
|
535
|
+
retry_count: entry[:retry_count]
|
536
|
+
)
|
537
|
+
end
|
538
|
+
|
539
|
+
puts "Recovered #{recovered.size} messages"
|
540
|
+
end
|
541
|
+
```
|
542
|
+
|
543
|
+
## Performance Considerations
|
544
|
+
|
545
|
+
### File I/O Optimization
|
546
|
+
|
547
|
+
The DLQ uses several optimizations:
|
548
|
+
|
549
|
+
1. **Append-only writes**: New messages are appended, not inserted
|
550
|
+
2. **Immediate sync**: `file.fsync` ensures durability
|
551
|
+
3. **Mutex protection**: Thread-safe but may create contention
|
552
|
+
4. **Line-based processing**: Memory efficient for large files
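
Taken together, the first three points amount to a write path along the following lines. This is a sketch of the general technique, not the gem's source; `DurableAppender` is a hypothetical name.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical illustration of an append-only, mutex-guarded, fsync'd writer.
class DurableAppender
  def initialize(path)
    @path = path
    @mutex = Mutex.new
  end

  def append(entry)
    @mutex.synchronize do           # one writer at a time
      File.open(@path, 'a') do |file|
        file.puts(entry.to_json)    # append, never rewrite
        file.fsync                  # flush Ruby and OS buffers to disk
      end
    end
  end
end

path = File.join(Dir.mktmpdir, 'dlq.jsonl')
appender = DurableAppender.new(path)

# Four threads appending concurrently; the mutex keeps lines intact.
threads = 4.times.map do |i|
  Thread.new { 25.times { |n| appender.append({ worker: i, seq: n }) } }
end
threads.each(&:join)

puts File.readlines(path).size
```

Calling `fsync` on every write trades throughput for durability, which is the contention cost the third point alludes to.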
|
553
|
+
|
554
|
+
### Scaling Strategies
|
555
|
+
|
556
|
+
For high-volume systems:
|
557
|
+
|
558
|
+
```ruby
|
559
|
+
# Use multiple DLQ instances by message type
|
560
|
+
class DLQRouter
|
561
|
+
def self.get_dlq_for(message_class)
|
562
|
+
case message_class
|
563
|
+
when /Payment/
|
564
|
+
@payment_dlq ||= SmartMessage::DeadLetterQueue.new('/var/log/payment_dlq.jsonl')
|
565
|
+
when /Order/
|
566
|
+
@order_dlq ||= SmartMessage::DeadLetterQueue.new('/var/log/order_dlq.jsonl')
|
567
|
+
else
|
568
|
+
SmartMessage::DeadLetterQueue.default
|
569
|
+
end
|
570
|
+
end
|
571
|
+
end
|
572
|
+
```
|
573
|
+
|
574
|
+
### Memory Usage
|
575
|
+
|
576
|
+
For large DLQ files:
|
577
|
+
|
578
|
+
```ruby
|
579
|
+
# Process in chunks to avoid memory issues
|
580
|
+
def process_large_dlq(dlq, chunk_size: 100)
|
581
|
+
processed = 0
|
582
|
+
|
583
|
+
while dlq.size > 0 && processed < 1000
|
584
|
+
# Process only chunk_size at a time
|
585
|
+
chunk_size.times do
|
586
|
+
break if dlq.size == 0
|
587
|
+
|
588
|
+
message = dlq.dequeue
|
589
|
+
process_message(message)
|
590
|
+
processed += 1
|
591
|
+
end
|
592
|
+
|
593
|
+
# Let other operations run
|
594
|
+
sleep(0.1)
|
595
|
+
end
|
596
|
+
|
597
|
+
processed
|
598
|
+
end
|
599
|
+
```
|
600
|
+
|
601
|
+
## Security Considerations
|
602
|
+
|
603
|
+
### File Permissions
|
604
|
+
|
605
|
+
Ensure proper file permissions:
|
606
|
+
|
607
|
+
```ruby
|
608
|
+
# Set restrictive permissions on DLQ files
|
609
|
+
def secure_dlq_file(path)
|
610
|
+
File.chmod(0600, path) if File.exist?(path) # Read/write for owner only
|
611
|
+
end
|
612
|
+
```
|
613
|
+
|
614
|
+
### Sensitive Data
|
615
|
+
|
616
|
+
Be careful with sensitive data in DLQ:
|
617
|
+
|
618
|
+
```ruby
|
619
|
+
# Sanitize sensitive data before storing
|
620
|
+
def sanitize_for_dlq(payload)
|
621
|
+
data = JSON.parse(payload)
|
622
|
+
data['credit_card'] = 'REDACTED' if data['credit_card']
|
623
|
+
data['password'] = 'REDACTED' if data['password']
|
624
|
+
data.to_json
|
625
|
+
end
|
626
|
+
```
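
The helper above only inspects top-level keys. Where payloads nest sensitive fields, a recursive variant may be more appropriate. A sketch follows; the `SENSITIVE_KEYS` list and `deep_redact` are hypothetical and would need adapting to real payloads.

```ruby
require 'json'

# Hypothetical list of keys to mask wherever they appear.
SENSITIVE_KEYS = %w[credit_card password].freeze

# Walks hashes and arrays, replacing values of sensitive keys.
def deep_redact(value)
  case value
  when Hash
    value.to_h { |k, v| [k, SENSITIVE_KEYS.include?(k) ? 'REDACTED' : deep_redact(v)] }
  when Array
    value.map { |item| deep_redact(item) }
  else
    value
  end
end

payload = '{"order_id":"123","customer":{"password":"hunter2","cards":[{"credit_card":"4111"}]}}'
redacted = deep_redact(JSON.parse(payload)).to_json
puts redacted
```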
|
627
|
+
|
628
|
+
### Encryption
|
629
|
+
|
630
|
+
For sensitive environments:
|
631
|
+
|
632
|
+
```ruby
|
633
|
+
# Example: Encrypt DLQ entries
|
634
|
+
require 'openssl'
|
635
|
+
|
636
|
+
class EncryptedDLQ < SmartMessage::DeadLetterQueue
|
637
|
+
def enqueue(header, payload, **options)
|
638
|
+
encrypted_payload = encrypt(payload)
|
639
|
+
super(header, encrypted_payload, **options)
|
640
|
+
end
|
641
|
+
|
642
|
+
def dequeue
|
643
|
+
entry = super
|
644
|
+
return nil unless entry
|
645
|
+
|
646
|
+
entry[:payload] = decrypt(entry[:payload])
|
647
|
+
entry
|
648
|
+
end
|
649
|
+
|
650
|
+
private
|
651
|
+
|
652
|
+
def encrypt(data)
|
653
|
+
# Implement encryption
|
654
|
+
end
|
655
|
+
|
656
|
+
def decrypt(data)
|
657
|
+
# Implement decryption
|
658
|
+
end
|
659
|
+
end
|
660
|
+
```
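
The `encrypt` and `decrypt` stubs could be filled in with authenticated encryption from Ruby's standard `openssl` library. The sketch below uses AES-256-GCM and packs the IV, auth tag, and ciphertext into one Base64 string so the result still fits on a single JSONL line. `PayloadCipher` is hypothetical, and key management (here a random in-memory key) is out of scope.

```ruby
require 'openssl'

# Hypothetical helper wrapping AES-256-GCM; not part of SmartMessage.
class PayloadCipher
  IV_LEN = 12   # GCM's standard nonce length in bytes
  TAG_LEN = 16  # default authentication tag length in bytes

  def initialize(key)
    @key = key  # must be 32 bytes for AES-256
  end

  def encrypt(plaintext)
    cipher = OpenSSL::Cipher.new('aes-256-gcm').encrypt
    cipher.key = @key
    iv = cipher.random_iv
    cipher.auth_data = ''
    ciphertext = cipher.update(plaintext) + cipher.final
    # Strict Base64 (no newlines) of iv + tag + ciphertext.
    [iv + cipher.auth_tag + ciphertext].pack('m0')
  end

  def decrypt(encoded)
    raw = encoded.unpack1('m0')
    cipher = OpenSSL::Cipher.new('aes-256-gcm').decrypt
    cipher.key = @key
    cipher.iv = raw[0, IV_LEN]
    cipher.auth_tag = raw[IV_LEN, TAG_LEN]
    cipher.auth_data = ''
    plaintext = cipher.update(raw[(IV_LEN + TAG_LEN)..]) + cipher.final
    plaintext.force_encoding(Encoding::UTF_8)
  end
end

key = OpenSSL::Random.random_bytes(32)
box = PayloadCipher.new(key)
token = box.encrypt('{"order_id":"123","amount":99.99}')
puts box.decrypt(token)
```

With GCM, a tampered or truncated token fails in `cipher.final` with `OpenSSL::Cipher::CipherError`, so a corrupted entry is rejected rather than silently decrypted to garbage.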
|
661
|
+
|
662
|
+
## Summary
|
663
|
+
|
664
|
+
The SmartMessage Dead Letter Queue provides:
|
665
|
+
|
666
|
+
- **Reliability**: Automatic capture of failed messages
|
667
|
+
- **Flexibility**: Multiple configuration options
|
668
|
+
- **Recoverability**: Comprehensive replay capabilities
|
669
|
+
- **Observability**: Statistics and filtering for analysis
|
670
|
+
- **Integration**: Seamless circuit breaker integration
|
671
|
+
- **Production-Ready**: Thread-safe, performant, and scalable
|
672
|
+
|
673
|
+
The DLQ is designed to keep failed messages from being lost during system failures, and provides the tools needed to analyze, replay, and manage them effectively.
|