e11y 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (157)
  1. checksums.yaml +7 -0
  2. data/.rspec +4 -0
  3. data/.rubocop.yml +69 -0
  4. data/CHANGELOG.md +26 -0
  5. data/CODE_OF_CONDUCT.md +64 -0
  6. data/LICENSE.txt +21 -0
  7. data/README.md +179 -0
  8. data/Rakefile +37 -0
  9. data/benchmarks/run_all.rb +33 -0
  10. data/config/README.md +83 -0
  11. data/config/loki-local-config.yaml +35 -0
  12. data/config/prometheus.yml +15 -0
  13. data/docker-compose.yml +78 -0
  14. data/docs/00-ICP-AND-TIMELINE.md +483 -0
  15. data/docs/01-SCALE-REQUIREMENTS.md +858 -0
  16. data/docs/ADR-001-architecture.md +2617 -0
  17. data/docs/ADR-002-metrics-yabeda.md +1395 -0
  18. data/docs/ADR-003-slo-observability.md +3337 -0
  19. data/docs/ADR-004-adapter-architecture.md +2385 -0
  20. data/docs/ADR-005-tracing-context.md +1372 -0
  21. data/docs/ADR-006-security-compliance.md +4143 -0
  22. data/docs/ADR-007-opentelemetry-integration.md +1385 -0
  23. data/docs/ADR-008-rails-integration.md +1911 -0
  24. data/docs/ADR-009-cost-optimization.md +2993 -0
  25. data/docs/ADR-010-developer-experience.md +2166 -0
  26. data/docs/ADR-011-testing-strategy.md +1836 -0
  27. data/docs/ADR-012-event-evolution.md +958 -0
  28. data/docs/ADR-013-reliability-error-handling.md +2750 -0
  29. data/docs/ADR-014-event-driven-slo.md +1533 -0
  30. data/docs/ADR-015-middleware-order.md +1061 -0
  31. data/docs/ADR-016-self-monitoring-slo.md +1234 -0
  32. data/docs/API-REFERENCE-L28.md +914 -0
  33. data/docs/COMPREHENSIVE-CONFIGURATION.md +2366 -0
  34. data/docs/IMPLEMENTATION_NOTES.md +2804 -0
  35. data/docs/IMPLEMENTATION_PLAN.md +1971 -0
  36. data/docs/IMPLEMENTATION_PLAN_ARCHITECTURE.md +586 -0
  37. data/docs/PLAN.md +148 -0
  38. data/docs/QUICK-START.md +934 -0
  39. data/docs/README.md +296 -0
  40. data/docs/design/00-memory-optimization.md +593 -0
  41. data/docs/guides/MIGRATION-L27-L28.md +692 -0
  42. data/docs/guides/PERFORMANCE-BENCHMARKS.md +434 -0
  43. data/docs/guides/README.md +44 -0
  44. data/docs/prd/01-overview-vision.md +440 -0
  45. data/docs/use_cases/README.md +119 -0
  46. data/docs/use_cases/UC-001-request-scoped-debug-buffering.md +813 -0
  47. data/docs/use_cases/UC-002-business-event-tracking.md +1953 -0
  48. data/docs/use_cases/UC-003-pattern-based-metrics.md +1627 -0
  49. data/docs/use_cases/UC-004-zero-config-slo-tracking.md +728 -0
  50. data/docs/use_cases/UC-005-sentry-integration.md +759 -0
  51. data/docs/use_cases/UC-006-trace-context-management.md +905 -0
  52. data/docs/use_cases/UC-007-pii-filtering.md +2648 -0
  53. data/docs/use_cases/UC-008-opentelemetry-integration.md +1153 -0
  54. data/docs/use_cases/UC-009-multi-service-tracing.md +1043 -0
  55. data/docs/use_cases/UC-010-background-job-tracking.md +1018 -0
  56. data/docs/use_cases/UC-011-rate-limiting.md +1906 -0
  57. data/docs/use_cases/UC-012-audit-trail.md +2301 -0
  58. data/docs/use_cases/UC-013-high-cardinality-protection.md +2127 -0
  59. data/docs/use_cases/UC-014-adaptive-sampling.md +1940 -0
  60. data/docs/use_cases/UC-015-cost-optimization.md +735 -0
  61. data/docs/use_cases/UC-016-rails-logger-migration.md +785 -0
  62. data/docs/use_cases/UC-017-local-development.md +867 -0
  63. data/docs/use_cases/UC-018-testing-events.md +1081 -0
  64. data/docs/use_cases/UC-019-tiered-storage-migration.md +562 -0
  65. data/docs/use_cases/UC-020-event-versioning.md +708 -0
  66. data/docs/use_cases/UC-021-error-handling-retry-dlq.md +956 -0
  67. data/docs/use_cases/UC-022-event-registry.md +648 -0
  68. data/docs/use_cases/backlog.md +226 -0
  69. data/e11y.gemspec +76 -0
  70. data/lib/e11y/adapters/adaptive_batcher.rb +207 -0
  71. data/lib/e11y/adapters/audit_encrypted.rb +239 -0
  72. data/lib/e11y/adapters/base.rb +580 -0
  73. data/lib/e11y/adapters/file.rb +224 -0
  74. data/lib/e11y/adapters/in_memory.rb +216 -0
  75. data/lib/e11y/adapters/loki.rb +333 -0
  76. data/lib/e11y/adapters/otel_logs.rb +203 -0
  77. data/lib/e11y/adapters/registry.rb +141 -0
  78. data/lib/e11y/adapters/sentry.rb +230 -0
  79. data/lib/e11y/adapters/stdout.rb +108 -0
  80. data/lib/e11y/adapters/yabeda.rb +370 -0
  81. data/lib/e11y/buffers/adaptive_buffer.rb +339 -0
  82. data/lib/e11y/buffers/base_buffer.rb +40 -0
  83. data/lib/e11y/buffers/request_scoped_buffer.rb +246 -0
  84. data/lib/e11y/buffers/ring_buffer.rb +267 -0
  85. data/lib/e11y/buffers.rb +14 -0
  86. data/lib/e11y/console.rb +122 -0
  87. data/lib/e11y/current.rb +48 -0
  88. data/lib/e11y/event/base.rb +894 -0
  89. data/lib/e11y/event/value_sampling_config.rb +84 -0
  90. data/lib/e11y/events/base_audit_event.rb +43 -0
  91. data/lib/e11y/events/base_payment_event.rb +33 -0
  92. data/lib/e11y/events/rails/cache/delete.rb +21 -0
  93. data/lib/e11y/events/rails/cache/read.rb +23 -0
  94. data/lib/e11y/events/rails/cache/write.rb +22 -0
  95. data/lib/e11y/events/rails/database/query.rb +45 -0
  96. data/lib/e11y/events/rails/http/redirect.rb +21 -0
  97. data/lib/e11y/events/rails/http/request.rb +26 -0
  98. data/lib/e11y/events/rails/http/send_file.rb +21 -0
  99. data/lib/e11y/events/rails/http/start_processing.rb +26 -0
  100. data/lib/e11y/events/rails/job/completed.rb +22 -0
  101. data/lib/e11y/events/rails/job/enqueued.rb +22 -0
  102. data/lib/e11y/events/rails/job/failed.rb +22 -0
  103. data/lib/e11y/events/rails/job/scheduled.rb +23 -0
  104. data/lib/e11y/events/rails/job/started.rb +22 -0
  105. data/lib/e11y/events/rails/log.rb +56 -0
  106. data/lib/e11y/events/rails/view/render.rb +23 -0
  107. data/lib/e11y/events.rb +18 -0
  108. data/lib/e11y/instruments/active_job.rb +201 -0
  109. data/lib/e11y/instruments/rails_instrumentation.rb +141 -0
  110. data/lib/e11y/instruments/sidekiq.rb +175 -0
  111. data/lib/e11y/logger/bridge.rb +205 -0
  112. data/lib/e11y/metrics/cardinality_protection.rb +172 -0
  113. data/lib/e11y/metrics/cardinality_tracker.rb +134 -0
  114. data/lib/e11y/metrics/registry.rb +234 -0
  115. data/lib/e11y/metrics/relabeling.rb +226 -0
  116. data/lib/e11y/metrics.rb +102 -0
  117. data/lib/e11y/middleware/audit_signing.rb +174 -0
  118. data/lib/e11y/middleware/base.rb +140 -0
  119. data/lib/e11y/middleware/event_slo.rb +167 -0
  120. data/lib/e11y/middleware/pii_filter.rb +266 -0
  121. data/lib/e11y/middleware/pii_filtering.rb +280 -0
  122. data/lib/e11y/middleware/rate_limiting.rb +214 -0
  123. data/lib/e11y/middleware/request.rb +163 -0
  124. data/lib/e11y/middleware/routing.rb +157 -0
  125. data/lib/e11y/middleware/sampling.rb +254 -0
  126. data/lib/e11y/middleware/slo.rb +168 -0
  127. data/lib/e11y/middleware/trace_context.rb +131 -0
  128. data/lib/e11y/middleware/validation.rb +118 -0
  129. data/lib/e11y/middleware/versioning.rb +132 -0
  130. data/lib/e11y/middleware.rb +12 -0
  131. data/lib/e11y/pii/patterns.rb +90 -0
  132. data/lib/e11y/pii.rb +13 -0
  133. data/lib/e11y/pipeline/builder.rb +155 -0
  134. data/lib/e11y/pipeline/zone_validator.rb +110 -0
  135. data/lib/e11y/pipeline.rb +12 -0
  136. data/lib/e11y/presets/audit_event.rb +65 -0
  137. data/lib/e11y/presets/debug_event.rb +34 -0
  138. data/lib/e11y/presets/high_value_event.rb +51 -0
  139. data/lib/e11y/presets.rb +19 -0
  140. data/lib/e11y/railtie.rb +138 -0
  141. data/lib/e11y/reliability/circuit_breaker.rb +216 -0
  142. data/lib/e11y/reliability/dlq/file_storage.rb +277 -0
  143. data/lib/e11y/reliability/dlq/filter.rb +117 -0
  144. data/lib/e11y/reliability/retry_handler.rb +207 -0
  145. data/lib/e11y/reliability/retry_rate_limiter.rb +117 -0
  146. data/lib/e11y/sampling/error_spike_detector.rb +225 -0
  147. data/lib/e11y/sampling/load_monitor.rb +161 -0
  148. data/lib/e11y/sampling/stratified_tracker.rb +92 -0
  149. data/lib/e11y/sampling/value_extractor.rb +82 -0
  150. data/lib/e11y/self_monitoring/buffer_monitor.rb +79 -0
  151. data/lib/e11y/self_monitoring/performance_monitor.rb +97 -0
  152. data/lib/e11y/self_monitoring/reliability_monitor.rb +146 -0
  153. data/lib/e11y/slo/event_driven.rb +150 -0
  154. data/lib/e11y/slo/tracker.rb +119 -0
  155. data/lib/e11y/version.rb +9 -0
  156. data/lib/e11y.rb +283 -0
  157. metadata +452 -0
@@ -0,0 +1,958 @@
# ADR-012: Event Evolution & Versioning

**Status:** Draft
**Date:** January 13, 2026
**Covers:** UC-020 (Event Versioning)
**Depends On:** ADR-001 (Core), ADR-022 (Event Registry)

---

## 📋 Table of Contents

1. [Context & Problem](#1-context--problem)
2. [Solution: Parallel Versions](#2-solution-parallel-versions)
3. [Naming Convention](#3-naming-convention)
4. [Version in Payload](#4-version-in-payload)
5. [Schema Evolution Guidelines](#5-schema-evolution-guidelines)
6. [Event Registry Integration](#6-event-registry-integration)
7. [Migration Strategy](#7-migration-strategy)
8. [Schema Migrations and DLQ Replay (C15 Resolution)](#8-schema-migrations-and-dlq-replay-c15-resolution) ⚠️
9. [Trade-offs](#9-trade-offs)
10. [Summary](#10-summary)

---

## 1. Context & Problem

### 1.1. When Do You Need Versioning?

**90% of changes DON'T need versioning:**

```ruby
# ✅ Just add an optional field - NO versioning!
optional(:currency).filled(:string)
```

**10% of changes DO need versioning:**

```ruby
# ❌ Adding a REQUIRED field breaks old code
required(:currency).filled(:string) # Old code doesn't send this!

# ✅ Solution: Create V2
class Events::OrderPaidV2 < E11y::Event::Base
  schema do
    required(:currency).filled(:string)
  end
end
```

### 1.2. Architecture Decision: Optional Middleware

**Versioning is opt-in (not everyone needs it):**

```ruby
# config/initializers/e11y.rb

# ✅ Enable versioning middleware (optional)
E11y.configure do |config|
  config.middleware.use E11y::Middleware::Versioning
end

# Result: event_name is normalized to the base name, and a `v:` field
# is added (only if version > 1)
{
  event_name: "Events::OrderPaid", # "V2" suffix stripped by the middleware
  v: 2,                            # Added by the middleware
  payload: { ... }
}
```

**Benefits:**
- ✅ **Opt-in:** Only enabled if you need it
- ✅ **Zero overhead:** If disabled, no performance cost
- ✅ **Clean separation:** Versioning logic lives in middleware, not in the Base class

### 1.3. Core Principles

1. **Version ONLY for breaking changes** (add/remove a required field, change a type)
2. **No automatic migration** (just keep V1 and V2 alive in parallel)
3. **Version in payload only if > 1** (reduces noise for V1 events)
4. **Gradual rollout** (deploy V2, update code, delete V1)

---

## 2. Solution: Parallel Versions

### 2.1. Core Concept

**Two versions live in parallel:**

```ruby
# app/events/order_paid.rb
class Events::OrderPaid < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:amount).filled(:float)
  end
end

# app/events/order_paid_v2.rb
class Events::OrderPaidV2 < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:amount).filled(:float)
    required(:currency).filled(:string) # NEW
  end
end

# Old code (unchanged)
Events::OrderPaid.track(order_id: '123', amount: 99.99)

# New code (updated)
Events::OrderPaidV2.track(order_id: '123', amount: 99.99, currency: 'USD')

# Both work! No migration needed!
```

### 2.2. Gradual Rollout

```ruby
# === Phase 1: Deploy V2 (Week 1) ===
# - Add OrderPaidV2 class
# - Keep OrderPaid class (don't delete!)

# === Phase 2: Update Code (Weeks 2-4) ===
# controllers/orders_controller.rb
def create
  order = Order.create!(order_params)

  # Old code (still works)
  # Events::OrderPaid.track(...)

  # New code (updated)
  Events::OrderPaidV2.track(
    order_id: order.id.to_s, # schema expects a string
    amount: order.amount,
    currency: order.currency
  )
end

# === Phase 3: Monitor Usage (Week 5) ===
# Check metrics. Filter on version: with the Versioning middleware enabled,
# V1 and V2 share the event_name "Events::OrderPaid".
E11y::Metrics.get('e11y.events_tracked_total', event_name: 'Events::OrderPaid', version: 1)
# => 0 (V1 no longer used)

# === Phase 4: Delete V1 (Week 6) ===
# Delete app/events/order_paid.rb (V1 class)
# Keep OrderPaidV2 as the only version
```

---

## 3. Naming Convention

### 3.1. Version from Class Name

**Rule:** Version number is implicit from class name.

```ruby
# V1: No suffix (implicit version 1)
class Events::OrderPaid < E11y::Event::Base
  # Version 1 (extracted from class name: OrderPaid → v1)
end

# V2: "V2" suffix (explicit version 2)
class Events::OrderPaidV2 < E11y::Event::Base
  # Version 2 (extracted from class name: OrderPaidV2 → v2)
end

# V3: "V3" suffix (explicit version 3)
class Events::OrderPaidV3 < E11y::Event::Base
  # Version 3 (extracted from class name: OrderPaidV3 → v3)
end
```

### 3.2. Version Extraction Logic

```ruby
# lib/e11y/versioning/version_extractor.rb
module E11y
  module Versioning
    class VersionExtractor
      # Extract version number from class name
      def self.extract_version(class_name)
        # "Events::OrderPaidV2"  → 2
        # "Events::OrderPaid"    → 1
        # "Events::OrderPaidV10" → 10
        if class_name =~ /V(\d+)$/
          $1.to_i
        else
          1 # No suffix = V1
        end
      end

      # Extract base name (without version)
      def self.extract_base_name(class_name)
        # "Events::OrderPaidV2" → "Events::OrderPaid"
        # "Events::OrderPaid"   → "Events::OrderPaid"
        class_name.sub(/V\d+$/, '')
      end
    end
  end
end
```
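The suffix rule has a few edge cases worth testing before relying on it. The sketch below re-implements the two helpers as a standalone module. `VersionExtractorSketch` is a hypothetical name, not part of the gem. It differs from the listing above in two small ways: it anchors with `\z` instead of `$`, and it tolerates `nil`, which is what an anonymous class returns from `name`.

```ruby
# Standalone sketch of the suffix convention from 3.2 (hypothetical module, not
# the gem's code). `\z` anchors at the very end of the string, whereas `$` also
# matches just before a trailing newline.
module VersionExtractorSketch
  SUFFIX = /V(\d+)\z/

  # "Events::OrderPaidV2" => 2, "Events::OrderPaid" => 1, nil (anonymous class) => 1
  def self.extract_version(class_name)
    match = SUFFIX.match(class_name.to_s)
    match ? match[1].to_i : 1
  end

  # "Events::OrderPaidV2" => "Events::OrderPaid"
  def self.extract_base_name(class_name)
    class_name.to_s.sub(SUFFIX, '')
  end
end

%w[Events::OrderPaid Events::OrderPaidV2 Events::OrderPaidV10 Events::UpgradedToV2].each do |name|
  puts format('%-22s version=%-3d base=%s', name,
              VersionExtractorSketch.extract_version(name),
              VersionExtractorSketch.extract_base_name(name))
end
```

The last example shows the convention's main pitfall: `Events::UpgradedToV2` is read as version 2 of `Events::UpgradedTo`. Event names that happen to end in `V` followed by digits need a different spelling.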

---

## 4. Version in Payload

### 4.1. Middleware Implementation

**Versioning as optional middleware:**

```ruby
# lib/e11y/middleware/versioning.rb
module E11y
  module Middleware
    # Optional middleware to normalize event names and add a version field
    #
    # Usage:
    #   E11y.configure do |config|
    #     config.middleware.use E11y::Middleware::Versioning
    #   end
    class Versioning
      def call(event_data)
        class_name = event_data[:event_name]

        # Extract version from class name
        version = extract_version(class_name)

        # Normalize event_name to base name (without version suffix)
        # "Events::OrderPaidV2" → "Events::OrderPaid"
        event_data[:event_name] = extract_base_name(class_name)

        # Only add `v:` field if version > 1 (reduce noise)
        event_data[:v] = version if version > 1

        # Pass to next middleware
        yield event_data
      end

      private

      def extract_version(class_name)
        # "Events::OrderPaidV2" → 2
        # "Events::OrderPaid"   → 1
        class_name =~ /V(\d+)$/ ? $1.to_i : 1
      end

      def extract_base_name(class_name)
        # "Events::OrderPaidV2" → "Events::OrderPaid"
        # "Events::OrderPaid"   → "Events::OrderPaid"
        class_name.sub(/V\d+$/, '')
      end
    end
  end
end
```

### 4.2. Configuration & Middleware Order

**Versioning MUST be the last middleware (immediately before the adapters):**

```ruby
# config/initializers/e11y.rb

E11y.configure do |config|
  # Default middleware stack (in order):
  config.middleware.use E11y::Middleware::TraceContext     # 1. Add trace_id
  config.middleware.use E11y::Middleware::SchemaValidation # 2. Validate schema
  config.middleware.use E11y::Middleware::PIIFiltering     # 3. Filter PII
  config.middleware.use E11y::Middleware::RateLimiting     # 4. Check limits
  config.middleware.use E11y::Middleware::AdaptiveSampling # 5. Sample

  # ✅ Versioning LAST (normalize event_name before adapters)
  config.middleware.use E11y::Middleware::Versioning       # 6. Normalize

  # Then: adapters receive the normalized event_name
end
```

**Why must versioning be last?**

```text
# ✅ Correct order (versioning last):
1. Validation: Uses Events::OrderPaidV2 schema ✅
2. PII Filtering: Uses Events::OrderPaidV2 rules ✅
3. Rate Limiting: Uses Events::OrderPaidV2 limits ✅
4. Sampling: Uses Events::OrderPaidV2 config ✅
5. Versioning: Normalizes to Events::OrderPaid ✅
6. Adapters: Receive normalized name (easy queries) ✅

# ❌ Wrong order (versioning first):
1. Versioning: Normalizes to Events::OrderPaid
2. Validation: Looks up Events::OrderPaid → validates against the V1 schema (needs V2!) ❌
   (or fails outright once the V1 class has been deleted)
3. PII Filtering: Uses wrong V1 rules (needs V2!) ❌
4. Rate Limiting: Uses wrong V1 limits (needs V2!) ❌
```
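The ordering argument can be checked with a toy pipeline. The classes below are simplified, hypothetical stand-ins, not E11y's middleware. They only mirror the yield-based `call(event_data)` shape from 4.1, and `run_pipeline` is an assumed helper that chains them the way a middleware stack would.

```ruby
# Toy sketch: why the Versioning middleware has to run last (hypothetical classes).
class VersioningSketch
  SUFFIX = /V(\d+)\z/

  def call(event_data)
    name = event_data[:event_name]
    match = SUFFIX.match(name)
    event_data[:event_name] = name.sub(SUFFIX, '')
    event_data[:v] = match[1].to_i if match && match[1].to_i > 1
    yield event_data
  end
end

# Stand-in for schema validation: picks the schema by the *current* event_name.
class SchemaLookupSketch
  SCHEMAS = {
    'Events::OrderPaid' => %i[order_id amount],
    'Events::OrderPaidV2' => %i[order_id amount currency]
  }.freeze

  def call(event_data)
    name = event_data[:event_name]
    missing = SCHEMAS.fetch(name) - event_data[:payload].keys # KeyError if unknown
    raise ArgumentError, "#{name}: missing #{missing.inspect}" unless missing.empty?

    event_data[:validated_with] = name # record which schema was applied
    yield event_data
  end
end

# Runs middlewares in order; the innermost step stands in for the adapters.
def run_pipeline(middlewares, event_data)
  adapters = ->(data) { data }
  chain = middlewares.reverse.reduce(adapters) do |next_step, middleware|
    ->(data) { middleware.call(data) { |d| next_step.call(d) } }
  end
  chain.call(event_data)
end

def order_paid_v2_event
  { event_name: 'Events::OrderPaidV2',
    payload: { order_id: '123', amount: 99.99, currency: 'USD' } }
end

right = run_pipeline([SchemaLookupSketch.new, VersioningSketch.new], order_paid_v2_event)
wrong = run_pipeline([VersioningSketch.new, SchemaLookupSketch.new], order_paid_v2_event)

puts "versioning last:  validated with #{right[:validated_with]}" # Events::OrderPaidV2
puts "versioning first: validated with #{wrong[:validated_with]}" # Events::OrderPaid (V1 schema)
```

With versioning first, nothing raises. The V2 payload is silently validated against the V1 schema, which is harder to notice than a loud failure.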

**When to enable:**
- ✅ You have multiple event versions (OrderPaid, OrderPaidV2)
- ✅ Need to track version adoption in analytics
- ✅ Need to differentiate versions in Grafana/Loki

**When to disable (default):**
- ✅ No versioned events yet (all V1)
- ✅ Don't need version tracking
- ✅ Want zero overhead

### 4.3. Payload Examples

```ruby
# V1 event (no `v:` field)
Events::OrderPaid.track(order_id: '123', amount: 99.99)

# Result:
{
  event_name: "Events::OrderPaid", # ✅ Base name (without version)
  payload: { order_id: '123', amount: 99.99 },
  timestamp: "2026-01-13T10:00:00Z",
  trace_id: "trace-abc123"
  # No `v:` field (version 1 implicit)
}

# V2 event (with `v:` field)
Events::OrderPaidV2.track(order_id: '123', amount: 99.99, currency: 'USD')

# Result:
{
  event_name: "Events::OrderPaid", # ✅ Same base name (V2 suffix removed!)
  v: 2,                            # ✅ Version in separate field
  payload: { order_id: '123', amount: 99.99, currency: 'USD' },
  timestamp: "2026-01-13T10:00:00Z",
  trace_id: "trace-abc123"
}
```

**Key Insight:** `event_name` is **normalized to base name** (without version suffix).

- ✅ **Same `event_name`** for all versions → easy to query
- ✅ **Version in `v:` field** → easy to filter
- ✅ **Semantically correct** → it's the same event, just different schema

### 4.4. Querying Events by Version

**Loki queries (simple!):**

```logql
# All OrderPaid events (both V1 and V2)
{event_name="Events::OrderPaid"}

# Only V1 events (no `v` field in the JSON line)
{event_name="Events::OrderPaid"} | json | v = ""

# Only V2 events
{event_name="Events::OrderPaid"} | json | v = "2"

# ✅ No need for: {event_name=~"Events::OrderPaid|Events::OrderPaidV2"}
```

**Prometheus metrics:**

```promql
# Total events by version
sum by(event_name, version) (rate(e11y_events_tracked_total[5m]))

# V2 adoption rate (share of OrderPaid events emitted as V2, in %)
sum(rate(e11y_events_tracked_total{event_name="Events::OrderPaid", version="2"}[5m]))
  /
sum(rate(e11y_events_tracked_total{event_name="Events::OrderPaid"}[5m]))
  * 100

# Example result: 75 → "75% of OrderPaid events are now V2"
```

**Grafana dashboard:**

```sql
-- Single panel for all versions (with version breakdown)
SELECT
  event_name,
  COALESCE(v, 1) AS version, -- NULL = V1
  COUNT(*) AS count
FROM events
WHERE event_name = 'Events::OrderPaid'
GROUP BY event_name, COALESCE(v, 1) -- expression instead of alias: portable across SQL dialects
ORDER BY version
```

### 4.5. Why Not Always Include `v:`?

**Reasons:**
1. ✅ **Reduce noise:** 90% of events will be V1 (no versions needed)
2. ✅ **Backward compatible:** Existing consumers don't expect a `v:` field
3. ✅ **Storage savings:** One less field per event (~5-10 bytes)
4. ✅ **Implicit V1:** If there is no `v:` field → assume V1

**When to use `v:`:**
- ✅ Track version adoption rate (V1 vs V2)
- ✅ Debug: "Which version caused this issue?"
- ✅ Analytics: Compare behavior between versions
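On the consumer side, "implicit V1" comes down to a one-line default. The sketch below uses hypothetical helper names and an in-memory batch of events shaped like the 4.3 examples. It reads the version that way and computes the same adoption share as the PromQL ratio in 4.4.

```ruby
# Consumer-side sketch (hypothetical helpers): events without `v:` count as V1.
def event_version(event)
  event.fetch(:v, 1) # missing `v:` => implicit version 1
end

# Percentage of `base_name` events emitted at `version` (mirrors the PromQL ratio).
def adoption_percent(events, base_name, version)
  matching = events.select { |e| e[:event_name] == base_name }
  return 0.0 if matching.empty?

  (100.0 * matching.count { |e| event_version(e) == version } / matching.size).round(1)
end

def sample_events
  [
    { event_name: 'Events::OrderPaid', payload: { order_id: '1', amount: 10.0 } },
    { event_name: 'Events::OrderPaid', v: 2, payload: { order_id: '2', amount: 5.0, currency: 'USD' } },
    { event_name: 'Events::OrderPaid', v: 2, payload: { order_id: '3', amount: 7.5, currency: 'EUR' } },
    { event_name: 'Events::OrderPaid', v: 2, payload: { order_id: '4', amount: 1.0, currency: 'USD' } }
  ]
end

puts adoption_percent(sample_events, 'Events::OrderPaid', 2) # prints 75.0
puts adoption_percent(sample_events, 'Events::OrderPaid', 1) # prints 25.0
```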

---

## 5. Schema Evolution Guidelines

### 5.1. Non-Breaking Changes (NO versioning needed!)

**Pattern 1: Add Optional Field**

```ruby
# Before
class Events::OrderPaid < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:amount).filled(:float)
  end
end

# After (NO V2 needed!)
class Events::OrderPaid < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:amount).filled(:float)
    optional(:currency).filled(:string) # ✅ Just add it!
  end
end

# Old code still works:
Events::OrderPaid.track(order_id: '123', amount: 99.99)

# New code uses the new field:
Events::OrderPaid.track(order_id: '123', amount: 99.99, currency: 'USD')
```

**Pattern 2: Add Enum Value**

```ruby
# Before
class Events::OrderStatusChanged < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:status).filled(:string) # 'pending', 'paid', 'shipped'
  end
end

# After (NO V2 needed!)
class Events::OrderStatusChanged < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:status).filled(:string) # 'pending', 'paid', 'shipped', 'delivered'
  end
end

# ✅ Non-breaking as long as old consumers tolerate unknown status values
#    (e.g. 'delivered' falls through to a default branch)
```
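Adding an enum value is only non-breaking if consumers tolerate values they don't recognize. The sketch below contrasts a "tolerant reader" with a strict consumer that would break. Both helpers are hypothetical consumer code, not part of E11y.

```ruby
# Sketch of a tolerant-reader consumer (hypothetical helper): unknown status
# values fall through to a default branch, so adding 'delivered' does not break it.
def status_label(event)
  case event[:payload][:status]
  when 'pending' then 'Awaiting payment'
  when 'paid'    then 'Paid'
  when 'shipped' then 'On its way'
  else 'Other' # 'delivered' and any future value land here
  end
end

# A strict consumer like this one WOULD break as soon as 'delivered' appears.
KNOWN_STATUSES = %w[pending paid shipped].freeze

def strict_status_label(event)
  status = event[:payload][:status]
  raise ArgumentError, "unexpected status #{status.inspect}" unless KNOWN_STATUSES.include?(status)

  status.capitalize
end

delivered = { event_name: 'Events::OrderStatusChanged', payload: { order_id: '1', status: 'delivered' } }
puts status_label(delivered) # prints Other
```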

**Pattern 3: Deprecate Field (keep both)**

```ruby
# Before
class Events::UserRegistered < E11y::Event::Base
  schema do
    required(:user_id).filled(:string)
    required(:phone).filled(:string)
  end
end

# After (NO V2 needed!)
class Events::UserRegistered < E11y::Event::Base
  schema do
    required(:user_id).filled(:string)
    optional(:phone).filled(:string)        # @deprecated Use phone_number
    optional(:phone_number).filled(:string) # New field
  end
end

# ✅ Both fields exist, old code still works
```

### 5.2. Breaking Changes (Versioning required!)

**Pattern 1: Add Required Field**

```ruby
# V1
class Events::OrderPaid < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:amount).filled(:float)
  end
end

# V2: Add REQUIRED field
class Events::OrderPaidV2 < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:amount).filled(:float)
    required(:currency).filled(:string) # ✅ Required!
  end
end

# Why V2? Old code can't provide 'currency' → breaks!
```

**Pattern 2: Remove Required Field**

```ruby
# V1
class Events::UserRegistered < E11y::Event::Base
  schema do
    required(:user_id).filled(:string)
    required(:email).filled(:string)
    required(:phone).filled(:string)
  end
end

# V2: GDPR compliance - remove phone
class Events::UserRegisteredV2 < E11y::Event::Base
  schema do
    required(:user_id).filled(:string)
    required(:email).filled(:string)
    # phone REMOVED
  end
end

# Why V2? Consumers expect 'phone' → breaks if removed!
```

**Pattern 3: Change Field Type**

```ruby
# V1: Amount in dollars (float)
class Events::PaymentProcessed < E11y::Event::Base
  schema do
    required(:amount).filled(:float) # 99.99
  end
end

# V2: Amount in cents (integer)
class Events::PaymentProcessedV2 < E11y::Event::Base
  schema do
    required(:amount_cents).filled(:integer) # 9999
  end
end

# Why V2? Type changed, field renamed → breaks consumers!
```

### 5.3. When to Version: Decision Matrix

| Change | Breaking? | Version? | Example |
|--------|-----------|----------|---------|
| Add **optional** field | ❌ No | ❌ No | `optional(:currency)` |
| Add enum value | ❌ No | ❌ No | `'delivered'` status |
| Deprecate field (keep) | ❌ No | ❌ No | Keep `phone` + add `phone_number` |
| Add **required** field | ✅ Yes | ✅ Yes | `required(:currency)` |
| Remove **required** field | ✅ Yes | ✅ Yes | Remove `phone` (GDPR) |
| Change field type | ✅ Yes | ✅ Yes | `float` → `integer` |
| Rename field | ✅ Yes | ✅ Yes | `amount` → `amount_cents` |
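The matrix is mechanical enough to check in CI. The helper below is a hypothetical sketch of that check. It models a schema as a plain hash `{ field => { type:, required: } }` rather than a dry-schema object, and it does not model enum values.

```ruby
# Hypothetical helper applying the decision matrix to two schema descriptions.
# Schemas are plain hashes { field => { type:, required: } }, not dry-schema objects.
def breaking_changes(old_schema, new_schema)
  reasons = []

  new_schema.each do |field, spec|
    old_spec = old_schema[field]
    if old_spec.nil?
      reasons << "added required field #{field}" if spec[:required]
    elsif old_spec[:type] != spec[:type]
      reasons << "changed type of #{field} (#{old_spec[:type]} -> #{spec[:type]})"
    end
  end

  (old_schema.keys - new_schema.keys).each do |field|
    reasons << "removed required field #{field}" if old_schema[field][:required]
  end

  reasons # empty => no new version needed
end

payment_v1 = { amount: { type: :float, required: true } }
payment_v2 = { amount_cents: { type: :integer, required: true } }
puts breaking_changes(payment_v1, payment_v2) # a rename shows up as add + remove

order_v1 = { order_id: { type: :string, required: true }, amount: { type: :float, required: true } }
order_with_currency = order_v1.merge(currency: { type: :string, required: false })
p breaking_changes(order_v1, order_with_currency) # prints []
```

Turning `required: true` into `required: false` (the "deprecate field" row) is deliberately not flagged. This matches the matrix, which treats it as non-breaking.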

---

## 6. Event Registry Integration

### 6.1. Auto-Registration

```ruby
# lib/e11y/event/base.rb
module E11y
  module Event
    class Base
      # Auto-register on class inheritance
      def self.inherited(subclass)
        super

        # Anonymous classes (Class.new(Base)) have no name to parse; skip them
        return if subclass.name.nil?

        # Extract version from class name
        version = E11y::Versioning::VersionExtractor.extract_version(subclass.name)
        base_name = E11y::Versioning::VersionExtractor.extract_base_name(subclass.name)

        # Register in registry
        E11y::Registry.register(
          base_name: base_name,
          version: version,
          event_class: subclass
        )
      end
    end
  end
end
```

### 6.2. Registry API

```ruby
# Get all versions of an event
E11y::Registry.all_versions('Events::OrderPaid')
# => [
#   { version: 1, class: Events::OrderPaid, active: true },
#   { version: 2, class: Events::OrderPaidV2, active: true }
# ]

# Get latest version
E11y::Registry.latest_version('Events::OrderPaid')
# => { version: 2, class: Events::OrderPaidV2 }

# Get specific version
E11y::Registry.get_version('Events::OrderPaid', 1)
# => { version: 1, class: Events::OrderPaid }

# List all events with multiple versions
E11y::Registry.versioned_events
# => ['Events::OrderPaid', 'Events::PaymentProcessed']
```
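The registration hook and the query API fit together in a few dozen lines. In the sketch below, `SketchRegistry` and `SketchEventBase` are hypothetical stand-ins for `E11y::Registry` and `E11y::Event::Base`. It stores classes by base name and version and answers the 6.2 queries. The `active:` flag is omitted.

```ruby
# Self-contained sketch of 6.1 + 6.2 (hypothetical stand-ins, not the gem's classes).
module SketchRegistry
  SUFFIX = /V(\d+)\z/
  @versions = {} # base_name => { version => class }

  class << self
    def register(event_class)
      name = event_class.name
      return if name.nil? # anonymous classes have nothing to parse

      match = SUFFIX.match(name)
      version = match ? match[1].to_i : 1
      (@versions[name.sub(SUFFIX, '')] ||= {})[version] = event_class
    end

    def all_versions(base_name)
      @versions.fetch(base_name, {}).sort.map { |version, klass| { version: version, class: klass } }
    end

    def latest_version(base_name)
      all_versions(base_name).last
    end

    def get_version(base_name, version)
      all_versions(base_name).find { |entry| entry[:version] == version }
    end

    def versioned_events
      @versions.select { |_, by_version| by_version.size > 1 }.keys
    end
  end
end

class SketchEventBase
  def self.inherited(subclass)
    super
    SketchRegistry.register(subclass) # `class X < SketchEventBase` is already named here
  end
end

module Events
  class OrderPaid < SketchEventBase; end
  class OrderPaidV2 < SketchEventBase; end
  class UserRegistered < SketchEventBase; end
end

p SketchRegistry.latest_version('Events::OrderPaid')[:class] # prints Events::OrderPaidV2
p SketchRegistry.versioned_events                            # prints ["Events::OrderPaid"]
```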

### 6.3. Metrics Integration

```ruby
# Track version usage
E11y::Metrics.increment('e11y.events_tracked_total', {
  event_name: 'Events::OrderPaid',
  version: 1
})

# Grafana query:
# sum by(event_name, version) (rate(e11y_events_tracked_total[5m]))

# See V1 vs V2 adoption (both versions share the normalized event_name):
# Events::OrderPaid version=1: 100 events/min (old code)
# Events::OrderPaid version=2:  50 events/min (new code)
```

---

## 7. Migration Strategy

### 7.1. Phase 1: Add V2 (Keep V1)

```ruby
# Week 1: Deploy V2

# app/events/order_paid_v2.rb (NEW FILE)
class Events::OrderPaidV2 < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:amount).filled(:float)
    required(:currency).filled(:string) # NEW
  end
end

# app/events/order_paid.rb (KEEP EXISTING)
class Events::OrderPaid < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:amount).filled(:float)
  end
end

# ✅ Both classes deployed, no breaking changes
```

### 7.2. Phase 2: Update Code Gradually

```ruby
# Weeks 2-4: Update calling code

# controllers/orders_controller.rb
def create
  order = Order.create!(order_params)

  # ❌ Old code (before)
  # Events::OrderPaid.track(
  #   order_id: order.id,
  #   amount: order.amount
  # )

  # ✅ New code (after)
  Events::OrderPaidV2.track(
    order_id: order.id.to_s, # schema expects a string
    amount: order.amount,
    currency: order.currency || 'USD'
  )
end

# Update in batches:
# - Week 2: Update 25% of code
# - Week 3: Update 50% of code
# - Week 4: Update 100% of code
```

### 7.3. Phase 3: Monitor V1 Usage

```ruby
# Week 5: Check whether V1 is still used

# Grafana (PromQL) query. V1 and V2 share event_name, so filter on version:
# sum(rate(e11y_events_tracked_total{event_name="Events::OrderPaid", version="1"}[1d]))
# Result: 0 (no V1 events in the last 24h)

# Or via Rails console:
E11y::Registry.version_usage('Events::OrderPaid')
# => {
#   1 => { count: 0, last_tracked_at: nil },
#   2 => { count: 1234, last_tracked_at: 5.minutes.ago }
# }
```
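The shape returned by `version_usage` can be reproduced from a window of tracked events. The sketch below is a hypothetical in-memory stand-in, not the registry's implementation. It assumes the events passed in already cover the observation window (for example, the last 24 hours) and carry ISO 8601 timestamps as in 4.3.

```ruby
require 'time'

# Hypothetical in-memory stand-in for `E11y::Registry.version_usage` (7.3).
def version_usage(events, base_name, known_versions)
  usage = known_versions.to_h { |v| [v, { count: 0, last_tracked_at: nil }] }

  events.each do |event|
    next unless event[:event_name] == base_name

    entry = usage[event.fetch(:v, 1)] ||= { count: 0, last_tracked_at: nil } # no `v:` => V1
    entry[:count] += 1
    tracked_at = Time.iso8601(event[:timestamp])
    last = entry[:last_tracked_at]
    entry[:last_tracked_at] = tracked_at if last.nil? || tracked_at > last
  end

  usage
end

# A version is only safe to delete once no events of it were seen in the window.
def safe_to_delete?(usage, version)
  usage.fetch(version, { count: 0 })[:count].zero?
end

def window_events
  [
    { event_name: 'Events::OrderPaid', v: 2, timestamp: '2026-01-13T10:00:00Z' },
    { event_name: 'Events::OrderPaid', v: 2, timestamp: '2026-01-13T10:05:00Z' },
    { event_name: 'Events::UserRegistered', timestamp: '2026-01-13T10:06:00Z' }
  ]
end

usage = version_usage(window_events, 'Events::OrderPaid', [1, 2])
puts safe_to_delete?(usage, 1) # prints true
```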

### 7.4. Phase 4: Delete V1 Class

```ruby
# Week 6: Delete V1 class

# ✅ Delete app/events/order_paid.rb
# ✅ V1 class no longer exists
# ✅ Only OrderPaidV2 remains

# Optionally: Rename V2 → V1
# git mv app/events/order_paid_v2.rb app/events/order_paid.rb
# class Events::OrderPaid < E11y::Event::Base
#   # This is now V1 again (for next iteration)
# end
# ⚠️ After the rename, events no longer carry `v: 2`, so version-based
#    queries can no longer tell them apart from the old V1 events.
```

### 7.5. DLQ Replay During Migration

```ruby
# === Scenario: V1 event in DLQ during migration ===

# Phase 2: V1 and V2 both exist
# DLQ has V1 event: { event_name: 'Events::OrderPaid', payload: {...} }

# Replay: event_name is normalized, so rebuild the class name from event_name + v
class_name = dlq_event[:event_name]
class_name += "V#{dlq_event[:v]}" if dlq_event[:v] # V2 entries carry v: 2
event_class = class_name.constantize                # ✅ Class still exists!
event_class.track(**dlq_event[:payload])            # ✅ Replay with its own schema

# Phase 4: V1 deleted
# If DLQ still has V1 events → ❌ Replay fails (NameError from constantize)
# Solution: Wait for the DLQ to empty before deleting V1
```
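A replay loop should also survive the phase-4 case, where the class is gone, without crashing. In the sketch below, the helper names and the DLQ entry shape are assumptions. It rebuilds the versioned class name and leaves entries whose class no longer exists in the DLQ. It uses plain `Object.const_get`, so it runs without ActiveSupport's `constantize`.

```ruby
# Sketch (hypothetical helpers): resolve the versioned class for a DLQ entry and
# keep the entry in the DLQ if that class no longer exists.
def versioned_class_name(entry)
  version = entry.fetch(:v, 1)
  version > 1 ? "#{entry[:event_name]}V#{version}" : entry[:event_name]
end

def resolve_event_class(entry)
  Object.const_get(versioned_class_name(entry))
rescue NameError
  nil # e.g. V1 class deleted in phase 4
end

def replay_entry(entry)
  event_class = resolve_event_class(entry)
  return [:kept_in_dlq, versioned_class_name(entry)] if event_class.nil?

  [:replayed, event_class.track(**entry[:payload])]
end

# Stand-in for a V2 event class; only V2 exists here, V1 has been deleted.
module Events
  class OrderPaidV2
    def self.track(**payload)
      payload
    end
  end
end

v2_entry = { event_name: 'Events::OrderPaid', v: 2, payload: { order_id: '1', amount: 9.99, currency: 'USD' } }
v1_entry = { event_name: 'Events::OrderPaid', payload: { order_id: '2', amount: 5.0 } }

p replay_entry(v2_entry).first # prints :replayed
p replay_entry(v1_entry)       # prints [:kept_in_dlq, "Events::OrderPaid"]
```

The `rescue NameError` wraps only the constant lookup. That way a `NoMethodError` (a `NameError` subclass) raised inside `track` is not mistaken for a missing class.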
739
+
740
+ ---
741
+
742
+ ## 8. Schema Migrations and DLQ Replay (C15 Resolution) ⚠️
743
+
744
+ **Reference:** [CONFLICT-ANALYSIS.md - C15: Event Versioning × DLQ Replay](../researches/CONFLICT-ANALYSIS.md#c15-event-versioning--dlq-replay)
745
+
746
+ ### Decision: User Responsibility (Not an E11y Problem)
747
+
748
+ **E11y Position:**
749
+
750
+ > Schema migrations during DLQ replay are **NOT an E11y responsibility**. This is an **operational edge case** that occurs only when DLQ is poorly managed (events sitting for weeks between deployments).
751
+
752
+ **Why this is NOT a problem:**
753
+
754
+ 1. **DLQ is for transient failures** (minutes/hours, not weeks!)
755
+ - Loki down 30 seconds → retry → success
756
+ - Loki down 2 hours → DLQ → replay after fix (same deployment, same schema!)
757
+
758
+ 2. **DLQ should be cleared between deployments**
759
+ - Replay DLQ before deploying schema changes
760
+ - If DLQ has events sitting for **weeks** → operational failure, not E11y problem
761
+
762
+ 3. **Real-world timeline:**
763
+ ```
764
+ 09:00 - Loki down (transient failure)
765
+ 09:02 - Events go to DLQ
766
+ 09:05 - Loki back online
767
+ 09:10 - DLQ replay ✅ (same schema, same deployment)
768
+ ```
769
+
770
+ **NOT:**
771
+ ```
772
+ Week 1 - Events in DLQ
773
+ Week 2 - Deploy new code with schema changes
774
+ Week 3 - Replay DLQ ❌ (BAD OPERATIONS!)
775
+ ```
776
+
777
+ **If you MUST replay old-schema events (edge case):**
778
+
779
+ This is **app-specific** and requires **user-implemented** migration logic:
780
+
781
+ ```ruby
+ # Option 1: Lenient validation (skip schema validation for replayed events)
+ E11y.configure do |config|
+   config.dlq_replay do
+     skip_validation true # Allow old schemas
+   end
+ end
+
+ # Option 2: Transform old events before replay (user code)
+ E11y::DeadLetterQueue.replay do |old_event|
+   # User implements migration logic
+   if old_event[:event_version] == 1
+     # Transform v1 → v2: amount (float) → amount_cents (integer).
+     # Use .round, not .to_i: 19.99 * 100 is 1998.9999999999998, which .to_i truncates to 1998.
+     {
+       order_id: old_event[:order_id],
+       amount_cents: (old_event[:amount] * 100).round
+     }
+   else
+     old_event
+   end
+ end
+ ```
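Option 2 implies a simple contract: the replay mechanism passes each stored event to the user's block and delivers whatever the block returns. The sketch below models that contract in plain Ruby, so the migration can be unit-tested without a running DLQ. `replay_with_migration`, the `deliver` callable, and the choice to return failed events instead of dropping them are assumptions for illustration. They do not describe how `E11y::DeadLetterQueue.replay` is actually implemented:

```ruby
# Hypothetical model of the replay contract: migrate each stored event, then
# deliver it. Events whose migration or delivery raises are collected and
# returned so they can stay in the DLQ instead of being lost.
def replay_with_migration(events, migrate:, deliver:)
  events.each_with_object([]) do |event, still_failed|
    deliver.call(migrate.call(event))
  rescue StandardError => e
    still_failed << { event: event, error: e.message }
  end
end

# The same v1 → v2 transformation as Option 2, as a standalone lambda.
MIGRATE_ORDER_PAID = lambda do |event|
  next event unless event[:event_version] == 1

  { order_id: event[:order_id], amount_cents: (event[:amount] * 100).round }
end
```

Because the migration is a plain lambda, it can be covered by an ordinary unit test before any real replay runs.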
803
+
804
+ **E11y provides:**
805
+ - ✅ DLQ replay mechanism (UC-021)
806
+ - ✅ Event version metadata (stored with event)
807
+ - ✅ Validation bypass option (`skip_validation`)
808
+
809
+ **User provides:**
810
+ - 🔧 Migration logic (app-specific transformations)
811
+ - 🔧 Operational discipline (drain the DLQ before deploying schema changes)
812
+
813
+ **Trade-off:**
814
+ - ✅ **Pro:** E11y stays simple, no complex migration framework
815
+ - ✅ **Pro:** User has full control over migration logic
816
+ - ⚠️ **Con:** User must implement migrations for edge cases (poorly managed DLQ)
817
+
818
+ ---
819
+
820
+ ## 9. Trade-offs
821
+
822
+ ### 9.1. Key Decisions
823
+
824
+ | Decision | Pro | Con | Rationale |
825
+ |----------|-----|-----|-----------|
826
+ | **Normalize event_name** | Same name for all versions, easy queries | Need to extract base name | Semantically correct |
827
+ | **Optional Middleware** | Opt-in, zero overhead if disabled | Need to enable manually | Not everyone needs versioning |
828
+ | **No auto-migration** | Simple, predictable | Manual code updates | YAGNI - not needed in practice |
829
+ | **Parallel versions** | Zero downtime, gradual rollout | Multiple classes to maintain | Standard practice |
830
+ | **`v:` only if > 1** | Reduces noise, storage | Need to infer V1 | 90% of events are V1 |
831
+ | **Version from class name** | Single source of truth | Can't rename classes | Consistent, explicit |
832
+ | **No dual emission** | Simple | Need to update consumers | Consumers are under our control |
833
+ | **DLQ replay with old schemas: User responsibility (C15)** ⚠️ | Simple gem, no migration framework | Edge case if DLQ poorly managed | Operational discipline > gem complexity |
834
+
835
+ ### 9.2. Alternatives Considered
836
+
837
+ **A) event_name = class name (with version suffix)**
838
+ ```ruby
839
+ # ❌ REJECTED
840
+ {
841
+ event_name: "Events::OrderPaidV2", # Different name for each version
842
+ v: 2
843
+ }
844
+ ```
845
+ - ❌ Need to query multiple names: `OrderPaid OR OrderPaidV2`
846
+ - ❌ Metrics split across different `event_name` labels
847
+ - ❌ Semantically wrong: it's the same event, different schema
848
+ - ✅ **CHOSEN: Normalize to base name** (same `event_name` for all versions)
849
+
850
+ **B) Built-in versioning (always enabled)**
851
+ ```ruby
852
+ # ❌ REJECTED: Always adds v: field in Base class
853
+ E11y::Event::Base # Always extracts version
854
+ ```
855
+ - ❌ Performance overhead for apps without versioning
856
+ - ❌ Not everyone needs versioning
857
+ - ✅ **CHOSEN: Optional middleware** (zero overhead if disabled)
858
+
859
+ **C) Auto-migrate V1→V2**
+ ```ruby
+ # ❌ REJECTED
+ def self.upgrade_from_v1(v1_payload)
+   v1_payload.merge(currency: 'USD')
+ end
+ ```
+ - ❌ Overcomplicated (chained migrations, metadata storage)
+ - ❌ Not needed (just keep V1 alive during migration)
+ - ❌ Edge cases (lossy or impossible migrations)
+
+ **D) Always include `v:` field**
+ ```ruby
+ # ❌ REJECTED
+ { v: 1, payload: {...} } # Even for V1
+ ```
+ - ❌ Noise for 90% of events
+ - ❌ Storage overhead (~5 bytes × billions of events)
+ - ❌ Breaking change for existing consumers
+
+ **E) Dual emission (V1 + V2)**
880
+ ```ruby
+ # ❌ REJECTED
+ def self.emit_legacy_formats
+   { 1 => { adapters: [:loki], downgrade: proc {...} } }
+ end
+ ```
886
+ - ❌ Complex (downgrade logic)
887
+ - ❌ Not needed (just update Grafana dashboard)
888
+ - ❌ Storage overhead (2x events)
889
+
890
+ ---
891
+
892
+ ## 10. Summary
893
+
894
+ ### 10.1. Core Principles
895
+
896
+ **1. Normalize event_name to base name**
897
+ - ✅ `Events::OrderPaidV2` → `event_name: "Events::OrderPaid"`
898
+ - ✅ Same name for all versions → easy to query
899
+ - ✅ Version in separate `v:` field
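Principle 1 amounts to a small string transformation on the class name. Below is a minimal sketch of the `VersionExtractor` listed in the implementation checklist. It assumes the version is always a trailing `V<n>` suffix and that a class without a suffix is V1. The `call` method and its return shape are hypothetical:

```ruby
# Hypothetical sketch: derive the normalized event_name and the version
# from a class name such as "Events::OrderPaidV2".
module VersionExtractor
  SUFFIX = /V(\d+)\z/

  # Returns [base_name, version]; classes without a suffix are version 1.
  def self.call(class_name)
    match = SUFFIX.match(class_name)
    return [class_name, 1] unless match

    [class_name.delete_suffix(match[0]), Integer(match[1])]
  end
end

VersionExtractor.call("Events::OrderPaidV2") # => ["Events::OrderPaid", 2]
VersionExtractor.call("Events::OrderPaid")   # => ["Events::OrderPaid", 1]
```

Because the version is derived from the class name, renaming a class changes its `event_name`. This is the "Can't rename classes" cost listed in the trade-offs table.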
900
+
901
+ **2. Version ONLY for breaking changes**
902
+ - ✅ Add required field → V2
903
+ - ❌ Add optional field → no new version (stay on V1)
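Principle 2 can be applied mechanically when a schema changes. Below is a minimal sketch. It assumes a schema is represented as a hash mapping field name to `:required` or `:optional`, which is a hypothetical representation and not E11y's schema DSL. It flags fields that are required in the new schema but were absent or optional before, since existing V1 emitters would not supply them:

```ruby
# Hypothetical schema representation: { field_name => :required | :optional }.
# Returns the fields that force a new version: required in the new schema,
# but absent or optional in the old one.
def version_bump_reasons(old_schema, new_schema)
  new_schema.filter_map do |field, presence|
    field if presence == :required && old_schema[field] != :required
  end
end

def requires_new_version?(old_schema, new_schema)
  version_bump_reasons(old_schema, new_schema).any?
end
```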
904
+
905
+ **3. Parallel versions for gradual rollout**
906
+ - ✅ Deploy V2, keep V1
907
+ - ✅ Migrate call sites to V2 gradually
908
+ - ✅ Delete V1 when no longer used
909
+
910
+ **4. Version in payload only if > 1**
911
+ - ✅ V1: No `v:` field (implicit)
912
+ - ✅ V2+: Add `v:` field (explicit)
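On the wire, Principle 4 means producers add `v:` conditionally and consumers treat a missing `v:` as version 1. Below is a minimal sketch with hypothetical helper names (`build_payload`, `event_version`):

```ruby
# Producer side: only versions above 1 carry an explicit :v field.
def build_payload(event_name, version, fields)
  payload = { event_name: event_name }
  payload[:v] = version if version > 1
  payload.merge(fields)
end

# Consumer side: a missing :v means the implicit version 1.
def event_version(payload)
  payload.fetch(:v, 1)
end
```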
913
+
914
+ **5. No automatic migration**
915
+ - ❌ No auto-upgrade V1→V2
916
+ - ✅ Just keep both classes alive
917
+
918
+ ### 10.2. Best Practices
919
+
920
+ **✅ DO:**
921
+ 1. Add optional fields freely (no versioning)
922
+ 2. Think twice before adding required fields (forces V2)
923
+ 3. Keep V1 alive during migration
924
+ 4. Monitor V1 usage before deleting
925
+ 5. Use the Registry to track all versions
926
+
927
+ **❌ DON'T:**
928
+ 1. Delete V1 while the DLQ still holds V1 events
929
+ 2. Version for non-breaking changes
930
+ 3. Auto-migrate (keep it simple)
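DO #4 and DON'T #1 combine into one pre-deletion check. Below is a minimal sketch. It assumes you can read the number of V1 emissions over a recent window (for example, from the version metrics in the checklist) and the versions currently held in the DLQ. `safe_to_delete_version?` and its arguments are hypothetical:

```ruby
# Hypothetical pre-deletion check for an old event version class.
# recent_emissions: how often that version was emitted in the lookback window.
# dlq_versions: versions of the events for the same base name still in the DLQ.
def safe_to_delete_version?(version, recent_emissions:, dlq_versions:)
  recent_emissions.zero? && !dlq_versions.include?(version)
end
```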
931
+
932
+ ### 10.3. Implementation Checklist
933
+
934
+ **Phase 1: Core (Week 1)**
935
+ - [ ] Implement `VersionExtractor` (extract version from class name)
936
+ - [ ] Add `v:` field to payload (only if v > 1)
937
+ - [ ] Auto-register versions in Registry
938
+ - [ ] Add version metrics
939
+
940
+ **Phase 2: Tooling (Week 2)**
941
+ - [ ] Add `E11y::Registry.all_versions(base_name)`
942
+ - [ ] Add `E11y::Registry.version_usage(base_name)`
943
+ - [ ] Add Grafana dashboard for version adoption
944
+ - [ ] Add RSpec helpers for versioning
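The two Phase 2 Registry queries could be backed by a small in-memory structure. Only the names `all_versions` and `version_usage` come from the checklist. The `VersionRegistry` class, the `register` and `record_emission` methods, and the return shapes below are assumptions for illustration:

```ruby
# Hypothetical in-memory registry of event versions and their usage counts.
class VersionRegistry
  def initialize
    @versions = Hash.new { |hash, base| hash[base] = {} } # base => { version => class_name }
    @usage = Hash.new(0)                                   # [base, version] => emit count
  end

  # Called once per event class, e.g. when the class is loaded.
  def register(base_name, version, class_name)
    @versions[base_name][version] = class_name
  end

  # Called on every emit.
  def record_emission(base_name, version)
    @usage[[base_name, version]] += 1
  end

  # Sorted list of known versions for one base event name.
  def all_versions(base_name)
    @versions.fetch(base_name, {}).keys.sort
  end

  # { version => emission count }, including registered versions never emitted.
  def version_usage(base_name)
    all_versions(base_name).to_h { |v| [v, @usage[[base_name, v]]] }
  end
end
```

This sketch is not thread-safe. A real registry that is updated on every emit would need a `Mutex` or a concurrent map.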
945
+
946
+ **Phase 3: Documentation (Week 3)**
947
+ - [ ] Document when to version
948
+ - [ ] Document migration strategy
949
+ - [ ] Add migration checklist
950
+ - [ ] Add examples for common scenarios
951
+
952
+ ---
953
+
954
+ **Status:** ✅ Complete (Simplified)
955
+ **Next:** Implementation
956
+ **Estimated Implementation:** 1 week for the core (Phase 1), not 2 weeks; tooling and documentation follow in Weeks 2 and 3 of the checklist
957
+
958
+ **Key Takeaway:** Keep it simple. Parallel versions + gradual rollout is enough. No need for auto-migration magic.