e11y 0.1.0

Files changed (157)
  1. checksums.yaml +7 -0
  2. data/.rspec +4 -0
  3. data/.rubocop.yml +69 -0
  4. data/CHANGELOG.md +26 -0
  5. data/CODE_OF_CONDUCT.md +64 -0
  6. data/LICENSE.txt +21 -0
  7. data/README.md +179 -0
  8. data/Rakefile +37 -0
  9. data/benchmarks/run_all.rb +33 -0
  10. data/config/README.md +83 -0
  11. data/config/loki-local-config.yaml +35 -0
  12. data/config/prometheus.yml +15 -0
  13. data/docker-compose.yml +78 -0
  14. data/docs/00-ICP-AND-TIMELINE.md +483 -0
  15. data/docs/01-SCALE-REQUIREMENTS.md +858 -0
  16. data/docs/ADR-001-architecture.md +2617 -0
  17. data/docs/ADR-002-metrics-yabeda.md +1395 -0
  18. data/docs/ADR-003-slo-observability.md +3337 -0
  19. data/docs/ADR-004-adapter-architecture.md +2385 -0
  20. data/docs/ADR-005-tracing-context.md +1372 -0
  21. data/docs/ADR-006-security-compliance.md +4143 -0
  22. data/docs/ADR-007-opentelemetry-integration.md +1385 -0
  23. data/docs/ADR-008-rails-integration.md +1911 -0
  24. data/docs/ADR-009-cost-optimization.md +2993 -0
  25. data/docs/ADR-010-developer-experience.md +2166 -0
  26. data/docs/ADR-011-testing-strategy.md +1836 -0
  27. data/docs/ADR-012-event-evolution.md +958 -0
  28. data/docs/ADR-013-reliability-error-handling.md +2750 -0
  29. data/docs/ADR-014-event-driven-slo.md +1533 -0
  30. data/docs/ADR-015-middleware-order.md +1061 -0
  31. data/docs/ADR-016-self-monitoring-slo.md +1234 -0
  32. data/docs/API-REFERENCE-L28.md +914 -0
  33. data/docs/COMPREHENSIVE-CONFIGURATION.md +2366 -0
  34. data/docs/IMPLEMENTATION_NOTES.md +2804 -0
  35. data/docs/IMPLEMENTATION_PLAN.md +1971 -0
  36. data/docs/IMPLEMENTATION_PLAN_ARCHITECTURE.md +586 -0
  37. data/docs/PLAN.md +148 -0
  38. data/docs/QUICK-START.md +934 -0
  39. data/docs/README.md +296 -0
  40. data/docs/design/00-memory-optimization.md +593 -0
  41. data/docs/guides/MIGRATION-L27-L28.md +692 -0
  42. data/docs/guides/PERFORMANCE-BENCHMARKS.md +434 -0
  43. data/docs/guides/README.md +44 -0
  44. data/docs/prd/01-overview-vision.md +440 -0
  45. data/docs/use_cases/README.md +119 -0
  46. data/docs/use_cases/UC-001-request-scoped-debug-buffering.md +813 -0
  47. data/docs/use_cases/UC-002-business-event-tracking.md +1953 -0
  48. data/docs/use_cases/UC-003-pattern-based-metrics.md +1627 -0
  49. data/docs/use_cases/UC-004-zero-config-slo-tracking.md +728 -0
  50. data/docs/use_cases/UC-005-sentry-integration.md +759 -0
  51. data/docs/use_cases/UC-006-trace-context-management.md +905 -0
  52. data/docs/use_cases/UC-007-pii-filtering.md +2648 -0
  53. data/docs/use_cases/UC-008-opentelemetry-integration.md +1153 -0
  54. data/docs/use_cases/UC-009-multi-service-tracing.md +1043 -0
  55. data/docs/use_cases/UC-010-background-job-tracking.md +1018 -0
  56. data/docs/use_cases/UC-011-rate-limiting.md +1906 -0
  57. data/docs/use_cases/UC-012-audit-trail.md +2301 -0
  58. data/docs/use_cases/UC-013-high-cardinality-protection.md +2127 -0
  59. data/docs/use_cases/UC-014-adaptive-sampling.md +1940 -0
  60. data/docs/use_cases/UC-015-cost-optimization.md +735 -0
  61. data/docs/use_cases/UC-016-rails-logger-migration.md +785 -0
  62. data/docs/use_cases/UC-017-local-development.md +867 -0
  63. data/docs/use_cases/UC-018-testing-events.md +1081 -0
  64. data/docs/use_cases/UC-019-tiered-storage-migration.md +562 -0
  65. data/docs/use_cases/UC-020-event-versioning.md +708 -0
  66. data/docs/use_cases/UC-021-error-handling-retry-dlq.md +956 -0
  67. data/docs/use_cases/UC-022-event-registry.md +648 -0
  68. data/docs/use_cases/backlog.md +226 -0
  69. data/e11y.gemspec +76 -0
  70. data/lib/e11y/adapters/adaptive_batcher.rb +207 -0
  71. data/lib/e11y/adapters/audit_encrypted.rb +239 -0
  72. data/lib/e11y/adapters/base.rb +580 -0
  73. data/lib/e11y/adapters/file.rb +224 -0
  74. data/lib/e11y/adapters/in_memory.rb +216 -0
  75. data/lib/e11y/adapters/loki.rb +333 -0
  76. data/lib/e11y/adapters/otel_logs.rb +203 -0
  77. data/lib/e11y/adapters/registry.rb +141 -0
  78. data/lib/e11y/adapters/sentry.rb +230 -0
  79. data/lib/e11y/adapters/stdout.rb +108 -0
  80. data/lib/e11y/adapters/yabeda.rb +370 -0
  81. data/lib/e11y/buffers/adaptive_buffer.rb +339 -0
  82. data/lib/e11y/buffers/base_buffer.rb +40 -0
  83. data/lib/e11y/buffers/request_scoped_buffer.rb +246 -0
  84. data/lib/e11y/buffers/ring_buffer.rb +267 -0
  85. data/lib/e11y/buffers.rb +14 -0
  86. data/lib/e11y/console.rb +122 -0
  87. data/lib/e11y/current.rb +48 -0
  88. data/lib/e11y/event/base.rb +894 -0
  89. data/lib/e11y/event/value_sampling_config.rb +84 -0
  90. data/lib/e11y/events/base_audit_event.rb +43 -0
  91. data/lib/e11y/events/base_payment_event.rb +33 -0
  92. data/lib/e11y/events/rails/cache/delete.rb +21 -0
  93. data/lib/e11y/events/rails/cache/read.rb +23 -0
  94. data/lib/e11y/events/rails/cache/write.rb +22 -0
  95. data/lib/e11y/events/rails/database/query.rb +45 -0
  96. data/lib/e11y/events/rails/http/redirect.rb +21 -0
  97. data/lib/e11y/events/rails/http/request.rb +26 -0
  98. data/lib/e11y/events/rails/http/send_file.rb +21 -0
  99. data/lib/e11y/events/rails/http/start_processing.rb +26 -0
  100. data/lib/e11y/events/rails/job/completed.rb +22 -0
  101. data/lib/e11y/events/rails/job/enqueued.rb +22 -0
  102. data/lib/e11y/events/rails/job/failed.rb +22 -0
  103. data/lib/e11y/events/rails/job/scheduled.rb +23 -0
  104. data/lib/e11y/events/rails/job/started.rb +22 -0
  105. data/lib/e11y/events/rails/log.rb +56 -0
  106. data/lib/e11y/events/rails/view/render.rb +23 -0
  107. data/lib/e11y/events.rb +18 -0
  108. data/lib/e11y/instruments/active_job.rb +201 -0
  109. data/lib/e11y/instruments/rails_instrumentation.rb +141 -0
  110. data/lib/e11y/instruments/sidekiq.rb +175 -0
  111. data/lib/e11y/logger/bridge.rb +205 -0
  112. data/lib/e11y/metrics/cardinality_protection.rb +172 -0
  113. data/lib/e11y/metrics/cardinality_tracker.rb +134 -0
  114. data/lib/e11y/metrics/registry.rb +234 -0
  115. data/lib/e11y/metrics/relabeling.rb +226 -0
  116. data/lib/e11y/metrics.rb +102 -0
  117. data/lib/e11y/middleware/audit_signing.rb +174 -0
  118. data/lib/e11y/middleware/base.rb +140 -0
  119. data/lib/e11y/middleware/event_slo.rb +167 -0
  120. data/lib/e11y/middleware/pii_filter.rb +266 -0
  121. data/lib/e11y/middleware/pii_filtering.rb +280 -0
  122. data/lib/e11y/middleware/rate_limiting.rb +214 -0
  123. data/lib/e11y/middleware/request.rb +163 -0
  124. data/lib/e11y/middleware/routing.rb +157 -0
  125. data/lib/e11y/middleware/sampling.rb +254 -0
  126. data/lib/e11y/middleware/slo.rb +168 -0
  127. data/lib/e11y/middleware/trace_context.rb +131 -0
  128. data/lib/e11y/middleware/validation.rb +118 -0
  129. data/lib/e11y/middleware/versioning.rb +132 -0
  130. data/lib/e11y/middleware.rb +12 -0
  131. data/lib/e11y/pii/patterns.rb +90 -0
  132. data/lib/e11y/pii.rb +13 -0
  133. data/lib/e11y/pipeline/builder.rb +155 -0
  134. data/lib/e11y/pipeline/zone_validator.rb +110 -0
  135. data/lib/e11y/pipeline.rb +12 -0
  136. data/lib/e11y/presets/audit_event.rb +65 -0
  137. data/lib/e11y/presets/debug_event.rb +34 -0
  138. data/lib/e11y/presets/high_value_event.rb +51 -0
  139. data/lib/e11y/presets.rb +19 -0
  140. data/lib/e11y/railtie.rb +138 -0
  141. data/lib/e11y/reliability/circuit_breaker.rb +216 -0
  142. data/lib/e11y/reliability/dlq/file_storage.rb +277 -0
  143. data/lib/e11y/reliability/dlq/filter.rb +117 -0
  144. data/lib/e11y/reliability/retry_handler.rb +207 -0
  145. data/lib/e11y/reliability/retry_rate_limiter.rb +117 -0
  146. data/lib/e11y/sampling/error_spike_detector.rb +225 -0
  147. data/lib/e11y/sampling/load_monitor.rb +161 -0
  148. data/lib/e11y/sampling/stratified_tracker.rb +92 -0
  149. data/lib/e11y/sampling/value_extractor.rb +82 -0
  150. data/lib/e11y/self_monitoring/buffer_monitor.rb +79 -0
  151. data/lib/e11y/self_monitoring/performance_monitor.rb +97 -0
  152. data/lib/e11y/self_monitoring/reliability_monitor.rb +146 -0
  153. data/lib/e11y/slo/event_driven.rb +150 -0
  154. data/lib/e11y/slo/tracker.rb +119 -0
  155. data/lib/e11y/version.rb +9 -0
  156. data/lib/e11y.rb +283 -0
  157. metadata +452 -0
@@ -0,0 +1,2127 @@
# UC-013: High Cardinality Protection

**Status:** v1.0 Feature (Critical for Scale)
**Complexity:** Advanced
**Setup Time:** 30-60 minutes
**Target Users:** Engineering Managers, SRE, DevOps, Backend Developers

---

## 📋 Overview

### Problem Statement

**The $68,000/month mistake:**
```ruby
# ❌ CATASTROPHIC: Using user_id as a metric label
E11y.configure do |config|
  config.metrics do
    counter_for pattern: 'user.action',
                name: 'user_actions_total',
                tags: [:user_id, :action_type]  # ← 💸💸💸
  end
end

# 100,000 users × 10 action types = 1,000,000 metric series
# Datadog cost (illustrative): $68/host × 1,000 hosts = $68,000/month
# Prometheus memory: ~200 bytes/series × 1M series = ~200 MB per host
# Query latency: roughly 10x slower due to the cardinality explosion
```
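The arithmetic above is multiplicative because every unique combination of label values becomes its own series. The following is a minimal plain-Ruby sketch of that estimate. `series_count` is an illustrative helper and is not part of E11y. The 200 bytes/series figure is the rough estimate quoted above.

```ruby
# Estimate how many time series a metric can produce: each unique
# combination of label values is a separate series, so counts multiply.
def series_count(label_cardinalities)
  label_cardinalities.values.reduce(1, :*)
end

bad  = { user_id: 100_000, action_type: 10 }  # unbounded identifier as a label
good = { user_segment: 3, action_type: 10 }   # aggregated, bounded label

puts series_count(bad)   # 1000000
puts series_count(good)  # 30

# Memory estimate using the ~200 bytes/series figure from the example above
bytes = series_count(bad) * 200
puts "#{bytes / 1_000_000} MB"  # 200 MB
```

Swapping one unbounded label for a bounded one shrinks the product by orders of magnitude. That is the core idea behind every layer below.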

**Real-world impact:**
- 200 services × 1,000 users × 5 dimensions = **1,000,000 metric series**
- **Datadog cost: $68,000/month**
- **Prometheus OOM crashes** (out of memory)
- **Query timeouts** (PromQL queries take 30+ seconds)
- **Incident during Black Friday** (metrics system collapsed)

### E11y Solution

**4-Layer Defense System + 99% Cost Reduction:**
```ruby
# ✅ SAFE: Aggregate user_id → user_segment
E11y.configure do |config|
  config.metrics do
    # Layer 1: Denylist (hard block)
    forbidden_labels :user_id, :order_id, :session_id, :trace_id

    # Layer 2: Safe aggregation
    counter_for pattern: 'user.action',
                name: 'user_actions_total',
                tags: [:user_segment, :action_type],  # ← 3 segments × 10 actions = 30 series
                tag_extractors: {
                  user_segment: ->(event) {
                    # Runs for every event: cache this lookup (or put
                    # user_segment in the payload) to avoid a DB query per event
                    user = User.find(event.payload[:user_id])
                    user.segment  # 'free', 'paid', 'enterprise'
                  }
                }

    # Layer 3: Per-metric limits
    cardinality_limit_for 'user_actions_total', max: 100

    # Layer 4: Dynamic monitoring
    cardinality_monitoring do
      warn_threshold 0.7   # Alert at 70% of the limit
      auto_aggregate true  # Auto-fix if exceeded
    end
  end
end

# Result (assuming up to 10 segments):
# - 200 services × 10 segments × 5 dimensions = 10,000 series
# - Datadog cost: $680/month
# - Savings: $67,320/month (99% reduction) ✅
```

---

## 🎯 Event-Level Cardinality Protection (NEW - v1.1)

> **🎯 CONTRADICTION_01 Resolution:** Move cardinality config from the global initializer into the event classes.

**Event-level cardinality DSL:**

```ruby
# app/events/user_action.rb
module Events
  class UserAction < E11y::Event::Base
    schema do
      required(:user_id).filled(:string)
      required(:action_type).filled(:string)
      required(:user_segment).filled(:string)
    end

    # ✨ Event-level cardinality protection (right next to the schema!)
    metric :counter,
           name: 'user_actions_total',
           tags: [:user_segment, :action_type],  # ← Safe labels
           cardinality_limit: 100                # Max 100 series

    # Forbidden labels (high cardinality)
    forbidden_metric_labels :user_id, :session_id

    # Safe labels (low cardinality)
    safe_metric_labels :user_segment, :action_type, :status
  end
end
```

**Inheritance for cardinality protection:**

```ruby
# Base class with common cardinality rules
module Events
  class BaseUserEvent < E11y::Event::Base
    # Common to ALL user events
    forbidden_metric_labels :user_id, :email, :ip_address
    safe_metric_labels :user_segment, :country, :plan

    # Default cardinality limit
    default_cardinality_limit 100
  end
end

# Inherit from the base class
class Events::UserAction < Events::BaseUserEvent
  schema do
    required(:user_id).filled(:string)
    required(:action_type).filled(:string)
    required(:user_segment).filled(:string)  # Value for the user_segment tag
  end

  metric :counter,
         name: 'user_actions_total',
         tags: [:user_segment, :action_type]  # ← Uses safe labels
  # ← Inherits: forbidden_metric_labels + safe_metric_labels
end

class Events::UserProfileUpdated < Events::BaseUserEvent
  schema do
    required(:user_id).filled(:string)
    required(:field_name).filled(:string)
    required(:user_segment).filled(:string)  # Value for the user_segment tag
  end

  metric :counter,
         name: 'profile_updates_total',
         tags: [:user_segment, :field_name]
  # ← Inherits: forbidden_metric_labels + safe_metric_labels
end
```
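One way inheritance could work under the hood is a class-level DSL. Each class stores its own rules and unions them with those of its superclass. The following is a hypothetical plain-Ruby sketch. `CardinalityDSL`, `all_forbidden_labels`, and the stand-in classes are illustrative names and are not E11y's implementation.

```ruby
require 'set'

# Hypothetical sketch: class-level label rules that subclasses inherit.
module CardinalityDSL
  # Add labels to this class's own denylist.
  def forbidden_metric_labels(*labels)
    own_forbidden.merge(labels)
  end

  # Set the limit when given an argument; otherwise return this class's
  # limit, falling back to the nearest ancestor that defines one.
  def default_cardinality_limit(limit = nil)
    @default_cardinality_limit = limit if limit
    return @default_cardinality_limit if @default_cardinality_limit

    superclass.default_cardinality_limit if superclass.respond_to?(:default_cardinality_limit)
  end

  # Union of this class's forbidden labels and every ancestor's.
  def all_forbidden_labels
    inherited = superclass.respond_to?(:all_forbidden_labels) ? superclass.all_forbidden_labels : Set.new
    inherited | own_forbidden
  end

  private

  def own_forbidden
    @own_forbidden ||= Set.new
  end
end

class BaseEvent
  extend CardinalityDSL
end

class BaseUserEvent < BaseEvent
  forbidden_metric_labels :user_id, :email, :ip_address
  default_cardinality_limit 100
end

class UserAction < BaseUserEvent
  forbidden_metric_labels :session_id  # adds to the inherited denylist
end

UserAction.all_forbidden_labels       # user_id, email, ip_address, session_id
UserAction.default_cardinality_limit  # 100, inherited from BaseUserEvent
```

Because each class keeps its own set and unions it with its ancestors' sets at lookup time, a subclass can add rules without mutating its parent.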

**Preset modules for cardinality protection:**

```ruby
# lib/e11y/presets/metric_safe_event.rb
module E11y
  module Presets
    module MetricSafeEvent
      extend ActiveSupport::Concern

      included do
        # Common forbidden labels (high cardinality)
        forbidden_metric_labels :user_id, :order_id, :session_id,
                                :trace_id, :request_id, :email,
                                :ip_address, :uuid

        # Common safe labels (low cardinality)
        safe_metric_labels :status, :severity, :country,
                           :plan, :segment, :method

        # Default cardinality limit
        default_cardinality_limit 100

        # Auto-aggregate on limit
        cardinality_monitoring do
          warn_threshold 0.7
          auto_aggregate true
        end
      end
    end
  end
end

# Usage:
class Events::OrderPlaced < E11y::Event::Base
  include E11y::Presets::MetricSafeEvent  # ← Cardinality rules inherited!

  schema do
    required(:order_id).filled(:string)
    required(:user_id).filled(:string)
    required(:status).filled(:string)
  end

  metric :counter,
         name: 'orders_total',
         tags: [:status]  # ← Only safe labels (status)
  # ← Inherits: forbidden_metric_labels (user_id blocked!)
end
```
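In `ActiveSupport::Concern`, the `included do ... end` block runs in the context of the class that includes the module. In plain Ruby, the equivalent is the `self.included(base)` hook. The following hypothetical sketch uses a tiny stand-in DSL. `LabelRules`, `MetricSafePreset`, and `forbidden` are illustrative names, not E11y's API.

```ruby
require 'set'

# Stand-in for the event DSL (hypothetical, for illustration only).
module LabelRules
  def forbidden_metric_labels(*labels)
    (@forbidden ||= Set.new).merge(labels)
  end

  def forbidden
    @forbidden || Set.new
  end
end

# Plain-Ruby equivalent of a preset built with `included do ... end`:
# the hook receives the including class and applies the rules to it.
module MetricSafePreset
  def self.included(base)
    base.extend(LabelRules)
    base.forbidden_metric_labels :user_id, :order_id, :session_id,
                                 :trace_id, :request_id, :email,
                                 :ip_address, :uuid
  end
end

class OrderPlaced
  include MetricSafePreset  # rules are applied at include time
end

OrderPlaced.forbidden.include?(:user_id)  # true
```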

**Tag extractors (aggregation):**

```ruby
# app/events/user_action.rb
module Events
  class UserAction < E11y::Event::Base
    schema do
      required(:user_id).filled(:string)
      required(:action_type).filled(:string)
    end

    # ✨ Event-level tag extractors (aggregate user_id → segment)
    metric :counter,
           name: 'user_actions_total',
           tags: [:user_segment, :action_type],
           tag_extractors: {
             user_segment: ->(event) {
               # Runs for every event: cache this lookup in production
               user = User.find(event.payload[:user_id])
               user.segment  # 'free', 'paid', 'enterprise'
             }
           },
           cardinality_limit: 30  # 3 segments × 10 actions
  end
end
```
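The extractor runs for every event. That creates two risks: a per-event database lookup can dominate its cost, and an unexpected segment value would reintroduce cardinality. The following hypothetical sketch shows a caching, value-clamping extractor. `SegmentExtractor` and its FIFO cache are illustrative and not part of E11y. Under that assumption, it could be wired in as `user_segment: ->(event) { extractor.call(event.payload) }`.

```ruby
# Hypothetical sketch: a tag extractor that caches segment lookups and
# clamps the result to a known, bounded set of values.
class SegmentExtractor
  ALLOWED = %w[free paid enterprise].freeze

  def initialize(lookup, max_cache_size: 10_000)
    @lookup = lookup               # callable: user_id -> segment string
    @cache = {}
    @max_cache_size = max_cache_size
    @mutex = Mutex.new             # extractors may run on many threads
  end

  def call(payload)
    user_id = payload.fetch(:user_id)
    segment = @mutex.synchronize { @cache[user_id] }
    segment ||= fetch_and_cache(user_id)
    ALLOWED.include?(segment) ? segment : 'other'  # never emit an unbounded value
  end

  private

  def fetch_and_cache(user_id)
    segment = @lookup.call(user_id)  # slow path, done outside the lock
    @mutex.synchronize do
      @cache.shift if @cache.size >= @max_cache_size  # evict the oldest-inserted entry
      @cache[user_id] = segment
    end
    segment
  end
end

calls = 0
lookup = ->(id) { calls += 1; id.start_with?('ent_') ? 'enterprise' : 'free' }
extractor = SegmentExtractor.new(lookup)
extractor.call(user_id: 'ent_1')  # "enterprise" (calls the lookup)
extractor.call(user_id: 'ent_1')  # "enterprise" (served from the cache)
```

Mapping anything outside `ALLOWED` to `'other'` keeps the label's cardinality at four, even if the data source returns surprises.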

**Conventions for cardinality (sensible defaults):**

```ruby
# Convention: Default cardinality limit = 100 series per metric
# Convention: Common forbidden labels are blocked automatically

# Zero-config event (uses conventions):
class Events::OrderCreated < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:status).filled(:string)
  end

  metric :counter,
         name: 'orders_total',
         tags: [:status]  # ← Safe (low cardinality)
  # ← Auto: cardinality_limit = 100 (default)
  # ← Auto: order_id blocked (common forbidden label)
end

# Overriding the convention (same event, explicit limit):
class Events::OrderCreated < E11y::Event::Base
  schema do
    required(:order_id).filled(:string)
    required(:status).filled(:string)
  end

  metric :counter,
         name: 'orders_total',
         tags: [:status],
         cardinality_limit: 50  # ← Override: 50 (not 100)
end
```

**Precedence (event-level overrides global):**

```ruby
# Global config (infrastructure):
E11y.configure do |config|
  config.cardinality_protection do
    forbidden_labels :user_id, :order_id  # Global defaults
    default_cardinality_limit 100
  end
end

# Event-level config (overrides global):
class Events::UserAction < E11y::Event::Base
  forbidden_metric_labels :user_id, :session_id  # ← Adds session_id to the denylist
  default_cardinality_limit 50                   # ← Override: 50 (not 100)
end
```
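The following hypothetical sketch shows how these precedence rules could be resolved. It makes two assumptions:

- Forbidden labels are merged with the global list, as the "Adds session_id" comment suggests, while limits are overridden.
- An explicit `cardinality_limit:` on the metric takes precedence over the class default.

`effective_rules` and the two structs are illustrative and are not E11y's API.

```ruby
require 'set'

# Hypothetical config holders for the global and event-level settings.
GlobalConfig = Struct.new(:forbidden_labels, :default_cardinality_limit, keyword_init: true)
EventConfig  = Struct.new(:forbidden_labels, :default_cardinality_limit, :cardinality_limit,
                          keyword_init: true)

# Resolve the rules that actually apply to one event's metric.
def effective_rules(global, event)
  {
    # Denylists are combined: an event can add labels but not drop global ones.
    forbidden: Set.new(global.forbidden_labels) | Set.new(event.forbidden_labels || []),
    # Limits: explicit per-metric limit > event default > global default.
    limit: event.cardinality_limit ||
           event.default_cardinality_limit ||
           global.default_cardinality_limit
  }
end

global = GlobalConfig.new(forbidden_labels: [:user_id, :order_id], default_cardinality_limit: 100)
event  = EventConfig.new(forbidden_labels: [:user_id, :session_id], default_cardinality_limit: 50)
rules  = effective_rules(global, event)
# rules[:forbidden] contains user_id, order_id, session_id; rules[:limit] is 50
```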

**Benefits:**
- ✅ Locality of behavior (cardinality rules next to schema)
- ✅ DRY via inheritance/presets
- ✅ Sensible defaults (100 series limit)
- ✅ Easy to override when needed
- ✅ Tag extractors co-located with metrics

---

## 🎯 The 4-Layer Defense System

### Layer Processing Flow

> **Implementation:** See [ADR-002 Section 4.1: Four-Layer Defense](../ADR-002-metrics-yabeda.md#41-four-layer-defense) for detailed architecture.

**🔑 Critical: Layers execute SEQUENTIALLY (not simultaneously).**

Each label passes through the layers **in order**. As soon as a layer makes a decision (DROP or KEEP), the remaining layers are skipped:

```
┌────────────────────────────────────────────────────┐
│ Incoming Event: { user_id: 123, status: 'paid' }   │
└────────────────────────────────────────────────────┘
                          ↓
           ┌──────────────────────────────┐
           │ For EACH label in event:     │
           └──────────────────────────────┘
                          ↓
      ╔═══════════════════════════════════════╗
      ║ Layer 1: Universal Denylist           ║
      ║ Q: Is label in FORBIDDEN_LABELS?      ║
      ╚═══════════════════════════════════════╝
                          ↓
              ┌───────────┴───────────┐
              │ YES                   │ NO
              ↓                       ↓
           ❌ DROP                 Continue
         (stop here)                  ↓
      ╔═══════════════════════════════════════╗
      ║ Layer 2: Safe Allowlist               ║
      ║ Q: Is label in SAFE_LABELS?           ║
      ╚═══════════════════════════════════════╝
                          ↓
              ┌───────────┴───────────┐
              │ YES                   │ NO
              ↓                       ↓
           ✅ KEEP                 Continue
      (skip Layers 3-4)               ↓
      ╔═══════════════════════════════════════╗
      ║ Layer 3: Per-Metric Cardinality Limit ║
      ║ Q: Is cardinality < limit?            ║
      ╚═══════════════════════════════════════╝
                          ↓
              ┌───────────┴───────────┐
              │ YES                   │ NO
              ↓                       ↓
           ✅ KEEP                 Continue
                                      ↓
      ╔═══════════════════════════════════════╗
      ║ Layer 4: Dynamic Action               ║
      ║ Execute: drop/alert/sample            ║
      ╚═══════════════════════════════════════╝
                          ↓
                  ❌ DROP (or alert)
```

**Example: Processing 3 labels**

```ruby
# Incoming event
Events::OrderPlaced.track(
  user_id: 'user_12345',       # ← Label 1
  status: 'paid',              # ← Label 2
  custom_field: 'special_123'  # ← Label 3
)

# Processing:

# user_id:
# → Layer 1: in FORBIDDEN_LABELS? ✅ YES → ❌ DROP (stop, skip Layers 2-4)
# Result: user_id not included in metric

# status:
# → Layer 1: in FORBIDDEN_LABELS? ❌ NO → continue
# → Layer 2: in SAFE_LABELS? ✅ YES → ✅ KEEP (skip Layers 3-4)
# Result: status='paid' included in metric

# custom_field:
# → Layer 1: in FORBIDDEN_LABELS? ❌ NO → continue
# → Layer 2: in SAFE_LABELS? ❌ NO → continue
# → Layer 3: cardinality < limit? ❌ NO (limit of 100 unique values already reached) → continue
# → Layer 4: action = :drop → ❌ DROP
# Result: custom_field not included in metric

# Final metric:
# order_placed_total{status="paid"} 1
```
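The flow can be sketched as a small filter object that reproduces the three-label example above. `LabelFilter` is a hypothetical, simplified sketch and not E11y's class. It tracks unique values per label and implements only the `:drop` action.

```ruby
require 'set'

# Hypothetical sketch of the sequential 4-layer check for one metric.
class LabelFilter
  attr_reader :dropped

  def initialize(forbidden:, safe:, limit:)
    @forbidden = Set.new(forbidden)
    @safe = Set.new(safe)
    @limit = limit
    @seen = Hash.new { |h, k| h[k] = Set.new }  # label => unique values seen
    @dropped = Hash.new(0)                       # label => overflow drop count
  end

  # Returns only the labels that survive all layers.
  def filter(labels)
    labels.each_with_object({}) do |(name, value), kept|
      next if @forbidden.include?(name)   # Layer 1: DROP, later layers skipped

      if @safe.include?(name)             # Layer 2: KEEP, skip Layers 3-4
        kept[name] = value
      elsif within_limit?(name, value)    # Layer 3: under the per-label limit
        kept[name] = value
      else
        @dropped[name] += 1               # Layer 4: action = :drop
      end
    end
  end

  private

  def within_limit?(name, value)
    values = @seen[name]
    return true if values.include?(value)   # already counted
    return false if values.size >= @limit   # budget exhausted

    values.add(value)
    true
  end
end

filter = LabelFilter.new(forbidden: [:user_id], safe: [:status], limit: 100)
100.times { |i| filter.filter(custom_field: "v#{i}") }  # use up custom_field's budget
result = filter.filter(user_id: 'user_12345', status: 'paid', custom_field: 'special_123')
# result keeps only status: 'paid'
```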

**Key Properties:**

1. **Early Exit Optimization:** If Layer 1 drops a label, Layers 2-4 never execute.
2. **Safe Labels Fast Path:** Layer 2 approval skips the more expensive cardinality tracking (Layers 3-4).
3. **Fallback to Dynamic Action:** Only labels that pass Layers 1-2 but fail Layer 3 reach Layer 4.
4. **Order Matters:** Changing the layer order breaks the protection model. For example, running Layer 3 before Layer 1 would let forbidden labels such as `user_id` use up the cardinality budget before being dropped.

**Performance Impact:**

| Scenario | Layers Executed | Time | Example |
|---|---|---|---|
| Forbidden label | Layer 1 only | ~0.001ms | `user_id` |
| Safe label | Layers 1-2 | ~0.002ms | `status`, `method` |
| New label (under limit) | Layers 1-3 | ~0.01ms | `custom_field` (90th unique value) |
| Overflow label | Layers 1-4 | ~0.02ms | `custom_field` (101st unique value) |

**Why Sequential?**

```ruby
# ❌ WRONG: Parallel layer execution
# Problem: All layers run for every label, wasting CPU on labels that are already dropped

# ✅ CORRECT: Sequential execution
# Benefit: Early exit skips 3 of 4 layers (~75% of the checks) for forbidden labels
```

---

### Layer 1: Denylist (Hard Block)

> **⚠️ CRITICAL: Adapter-Specific Filtering**
>
> **Implementation:** See [ADR-002 Section 4.2: Layer 1 - Universal Denylist](../ADR-002-metrics-yabeda.md#42-layer-1-universal-denylist) for detailed architecture.
>
> **Cardinality protection (denylist/allowlist) applies ONLY to metrics adapters (Yabeda/Prometheus), NOT to other adapters:**
>
> | Adapter Type | Denylist Applied? | Why? |
> |---|---|---|
> | **Metrics (Yabeda/Prometheus)** | ✅ YES | Every unique label combination is a separate series held in memory, so high-cardinality labels cause memory explosions in time-series databases (1M series ≈ 1GB RAM). |
> | **Logs (Loki)** | ❌ NO | Full payload preserved. Loki itself works best with low-cardinality *stream labels*, so high-cardinality fields such as `user_id` belong in the log line, not in stream labels. |
> | **Errors (Sentry)** | ❌ NO | Sentry needs full context for debugging. High cardinality is acceptable for error tracking. |
> | **Audit (File/PostgreSQL)** | ❌ NO | Audit trails require complete, unfiltered data for compliance. |
>
> **Example:**
> ```ruby
> # Event with user_id (forbidden for metrics)
> Events::UserAction.track(user_id: "12345", action: "login")
>
> # What happens:
> # ✅ Prometheus: { action="login" }                   ← user_id DROPPED
> # ✅ Loki:       { user_id="12345", action="login" }  ← user_id PRESERVED
> # ✅ Sentry:     { user_id="12345", action="login" }  ← user_id PRESERVED
> # ✅ Audit:      { user_id="12345", action="login" }  ← user_id PRESERVED
> ```
>
> **Why This Matters:**
> - ✅ **Metrics stay safe:** Prometheus won't OOM due to cardinality explosion
> - ✅ **Debugging stays rich:** Loki/Sentry get full context for troubleshooting
> - ✅ **Compliance stays intact:** Audit logs remain complete and unfiltered
> - ✅ **Best of both worlds:** Safety for metrics + completeness for logs/errors
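
The following hypothetical sketch shows the routing rule in this table. Only the metrics path strips forbidden labels; every other adapter type receives the payload unchanged. `payload_for` and the adapter-type symbols are illustrative and are not E11y's adapter API.

```ruby
# Hypothetical sketch of adapter-specific filtering.
FORBIDDEN_METRIC_LABELS = %i[user_id order_id session_id trace_id].freeze

def payload_for(adapter_type, payload)
  case adapter_type
  when :metrics
    # Time-series backends: strip high-cardinality labels
    payload.reject { |key, _value| FORBIDDEN_METRIC_LABELS.include?(key) }
  else
    # :logs, :errors, :audit: keep the full payload for debugging/compliance
    payload
  end
end

event = { user_id: '12345', action: 'login' }
payload_for(:metrics, event)  # only action: 'login' remains
payload_for(:logs, event)     # full payload, user_id preserved
```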

**Universal denylist - NEVER use these as labels (for metrics adapters):**

```ruby
E11y.configure do |config|
  config.metrics do
    # === UNBOUNDED IDENTIFIERS (FORBIDDEN) ===
    forbidden_labels :user_id, :customer_id, :account_id,
                     :order_id, :transaction_id, :invoice_id,
                     :session_id, :request_id, :trace_id, :span_id

    # === INFRASTRUCTURE (FORBIDDEN) ===
    forbidden_labels :pod_uid, :container_id, :instance_id,
                     :node_name    # If dynamic

    # === NETWORK/HTTP (FORBIDDEN) ===
    forbidden_labels :url,         # With query strings
                     :ip_address,
                     :user_agent,
                     :hostname     # If ephemeral

    # === TIME-BASED / FAST-CHANGING (FORBIDDEN) ===
    forbidden_labels :timestamp, :created_at,
                     :version      # Patch-level: 2.5.7234

    # === ENFORCEMENT (choose one) ===
    enforcement :strict         # Raise an error on forbidden label usage
    # enforcement :warn         # Log a warning but allow
    # enforcement :aggregate    # Auto-aggregate to "_other"
  end
end

# Usage (inside config.metrics):
counter_for pattern: 'user.action',
            tags: [:user_id]  # ← ERROR: "user_id is forbidden!"

# Resulting error (enforcement :strict):
# [E11y ERROR] Metric 'user.action_total' uses forbidden label 'user_id'
#   Cardinality explosion risk! Use 'user_segment' instead.
```
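The following hypothetical sketch shows what the three `enforcement` modes could do when a forbidden label appears. `:aggregate` replaces the offending values with `"_other"`, as described above. `enforce` and `ForbiddenLabelError` are illustrative names and are not E11y's API.

```ruby
# Hypothetical sketch of the three enforcement modes.
class ForbiddenLabelError < StandardError; end

FORBIDDEN = %i[user_id order_id session_id].freeze

# Returns the labels to record, or raises in :strict mode.
def enforce(metric_name, labels, mode:, logger: $stderr)
  bad = labels.keys & FORBIDDEN
  return labels if bad.empty?

  message = "Metric '#{metric_name}' uses forbidden label(s): #{bad.join(', ')}"
  case mode
  when :strict
    raise ForbiddenLabelError, message             # fail loudly
  when :warn
    logger.puts("[E11y WARN] #{message}")          # record, but keep labels as-is
    labels
  when :aggregate
    labels.to_h { |k, v| [k, bad.include?(k) ? '_other' : v] }  # collapse values
  else
    raise ArgumentError, "unknown enforcement mode: #{mode.inspect}"
  end
end
```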

---

### Layer 2: Allowlist (Strict Mode)

**Only allow explicitly safe labels.** With `allowed_labels_only true`, using a label that is not on the allowlist is an error:

```ruby
E11y.configure do |config|
  config.metrics do
    # Strict mode: ONLY these labels allowed
    allowed_labels_only true

    # === BUSINESS DIMENSIONS (< 50 values) ===
    allowed_labels :status,          # pending, paid, failed (4-10 values)
                   :payment_method,  # card, paypal (5-20 values)
                   :plan_tier        # free, pro, enterprise (3-5 values)

    # === INFRASTRUCTURE (< 20 values) ===
    allowed_labels :env,             # production, staging, dev (3 values)
                   :region,          # us-east, eu-west (5-20 values)
                   :cluster,         # main, backup (2-5 values)
                   :availability_zone

    # === HTTP/SERVICE (< 100 values) ===
    allowed_labels :http_method,       # GET, POST, PUT, DELETE (10 values)
                   :http_status_code,  # 200, 404, 500 (50 values)
                   :controller_action  # UsersController#show (20-100 values)
  end
end

# Usage:
counter_for pattern: 'order.paid',
            tags: [:currency]  # ← ERROR: "currency not in allowlist!"

# Must explicitly allow it (inside config.metrics):
allowed_labels :currency  # USD, EUR, GBP (3-20 values)
```

**Rule of thumb:**
- ✅ **< 10 values** - Always safe
- 🟡 **10-100 values** - Usually OK, monitor
- 🔴 **> 100 values** - High risk, aggregate!
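
The bands can be applied to values you actually observe. The following small illustrative helper classifies labels by their count of distinct values. `cardinality_risk` is not part of E11y.

```ruby
# Classify a label by its number of distinct values, using the
# rule-of-thumb bands: < 10 safe, 10-100 monitor, > 100 aggregate.
def cardinality_risk(distinct_values)
  if distinct_values < 10
    :safe
  elsif distinct_values <= 100
    :monitor
  else
    :aggregate
  end
end

observed = {
  status: %w[pending paid failed refunded],
  http_status_code: (200..259).map(&:to_s),
  user_id: (1..5_000).map { |i| "user_#{i}" }
}

observed.each do |label, values|
  count = values.uniq.size
  puts "#{label}: #{count} values -> #{cardinality_risk(count)}"
end
```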

---

### Layer 3: Per-Metric Limits

**Set cardinality limits per metric:**

```ruby
E11y.configure do |config|
  config.metrics do
    # === GLOBAL DEFAULT ===
    default_cardinality_limit 1_000

    # === PER-METRIC LIMITS ===
    cardinality_limit_for 'http.requests' do
      max_cardinality 2_000     # Higher limit for this metric
      overflow_strategy :drop   # → Drop overflow events
      overflow_sample_rate 0.1  # Sample 10% of overflow events
    end

    cardinality_limit_for 'user.actions' do
      max_cardinality 500       # Lower limit
      overflow_strategy :drop   # Drop overflow events
      overflow_alert true       # Alert on overflow
    end

    cardinality_limit_for 'orders.paid' do
      max_cardinality 100
      overflow_strategy :alert  # Alert ops team + drop
    end
  end
end

# How it works:
# 1. Track unique label combinations per metric
# 2. If the limit is exceeded:
#    - :drop  → Discard overflow events (increment drop counter)
#    - :alert → Alert ops team + drop
#
# NOTE: For aggregation/relabeling (e.g., user_id → user_segment),
#       use tag_extractors (see "Aggregation" section below),
#       NOT overflow_strategy.
```

**Overflow strategies:**

| Strategy | Behavior | Use Case |
|----------|----------|----------|
| `:drop` | Discard overflow events | Default, simplest |
| `:alert` | Alert ops team + drop | Critical metrics |
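
The following hypothetical sketch implements Layer 3 as described in "How it works". It caps the number of unique label *combinations* per metric and applies `:drop` or `:alert` beyond the cap. `MetricLimiter` and its `on_alert` callback are illustrative and are not E11y's classes.

```ruby
require 'set'

# Hypothetical sketch: per-metric cap on unique label combinations.
class MetricLimiter
  attr_reader :drop_count

  def initialize(max_cardinality:, overflow_strategy: :drop, on_alert: nil)
    unless %i[drop alert].include?(overflow_strategy)
      raise ArgumentError, "unsupported strategy: #{overflow_strategy.inspect}"
    end

    @max = max_cardinality
    @strategy = overflow_strategy
    @on_alert = on_alert
    @series = Set.new   # label combinations already admitted
    @drop_count = 0
    @mutex = Mutex.new
  end

  # Returns true if the event may be recorded, false if it was dropped.
  def admit?(labels)
    key = labels.sort.freeze  # order-independent identity of the combination
    dropped = @mutex.synchronize do
      return true if @series.include?(key)  # existing series: always allowed

      if @series.size < @max
        @series.add(key)                    # new series, still under the cap
        return true
      end
      @drop_count += 1                      # overflow: count the drop
    end
    @on_alert&.call(dropped) if @strategy == :alert  # :alert = notify + drop
    false
  end
end

limiter = MetricLimiter.new(max_cardinality: 100, overflow_strategy: :alert,
                            on_alert: ->(n) { warn "orders.paid overflow (#{n} dropped)" })
limiter.admit?(status: 'paid', payment_method: 'card')  # true: new combination
```

The callback runs outside the lock, so a slow notifier (for example, an HTTP call to a chat tool) does not block other threads that record metrics.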

#### Thread Safety

> **Implementation:** See [ADR-002 Section 4.4: Layer 3 - Per-Metric Cardinality Limits](../ADR-002-metrics-yabeda.md#44-layer-3-per-metric-cardinality-limits) for detailed architecture.
>
> **Sources:**
> - [Ruby Hash thread safety - Stack Overflow](https://stackoverflow.com/questions/22674498/thread-safety-for-hashes-in-ruby)
> - [Mutex performance overhead - Stack Overflow](https://stackoverflow.com/questions/9761899/why-does-this-code-run-slower-with-multiple-threads-even-on-a-multi-core-mach)
> - [Thread Safety with Mutexes - GoRails](https://gorails.com/episodes/thread-safety-with-mutexes-in-ruby)
> - [Understanding Ruby Threads and Concurrency - Better Stack](https://betterstack.com/community/guides/scaling-ruby/threads-and-concurrency/)

**🔒 Critical: CardinalityTracker is thread-safe by design.**

E11y applications typically handle hundreds of concurrent requests, and each request may emit events with labels. The `CardinalityTracker` uses a **mutex** so that unique label values are tracked correctly across those concurrent requests.

**Why Thread Safety Matters:**

```ruby
# Scenario: 3 concurrent requests tracking the same metric
# (tracker is a CardinalityTracker, shown under "Implementation")
threads = [
  Thread.new { tracker.check_and_track('orders_total', :status, 'paid') },
  Thread.new { tracker.check_and_track('orders_total', :status, 'pending') },
  Thread.new { tracker.check_and_track('orders_total', :status, 'paid') }
]
threads.each(&:join)

# Without a mutex:
# - Check-then-act race: the include?/size checks and the add are separate
#   steps, so Threads 1 & 3 might both think 'paid' is new, and concurrent
#   adds can push a Set past the limit
# - Lost updates: two threads running `@trackers[metric_name] ||= {}` at once
#   can each create a new entry, so Thread 2's 'pending' might be overwritten
# - RESULT: Incorrect cardinality counts and limits that are not enforced

# With a mutex (actual E11y implementation), in whatever order threads get the lock:
# - Thread 1 acquires lock → adds 'paid' → releases (1/limit)
# - Thread 2 acquires lock → adds 'pending' → releases (2/limit)
# - Thread 3 acquires lock → sees 'paid' exists → releases (2/limit)
# - RESULT: Correct cardinality = 2
```

**Implementation:**

```ruby
# From ADR-002 Section 4.4
require 'set'

class CardinalityTracker
  def initialize(limit: 100)
    @limit = limit
    @trackers = {}      # { metric_name => { label_name => Set[values] } }
    @mutex = Mutex.new  # ← Thread safety
  end

  def check_and_track(metric_name, label_name, value)
    @mutex.synchronize do  # ← Only 1 thread executes this block at a time
      @trackers[metric_name] ||= {}
      @trackers[metric_name][label_name] ||= Set.new

      tracker = @trackers[metric_name][label_name]

      if tracker.include?(value)
        true   # Already seen
      elsif tracker.size < @limit
        tracker.add(value)
        true   # Added, under limit
      else
        false  # Rejected, over limit
      end
    end
  end
end
```

**Performance Impact:**

⚠️ **Reality Check:** Mutex synchronization has measurable overhead, especially under high concurrency. The figures below are indicative; measure on your own hardware:

- **Single-threaded baseline:** Hash lookup + Set operation ~0.001ms (1 microsecond)
- **With Mutex (low contention):** ~0.005-0.01ms (5-10 microseconds), i.e. 5-10x slower
- **With Mutex (high contention):** Can degrade significantly as threads wait for the lock and cache lines bounce between cores

**Why slower?** Each `@mutex.synchronize` call has to:
1. Acquire the lock (an atomic operation coordinated with other cores)
2. Under contention, fetch the lock and the shared state from another core's cache instead of the local L1/L2 cache, which is much slower
3. Release the lock (and wake a waiting thread, if any)

**Mitigation:** E11y minimizes overhead by:
- Keeping the critical section **extremely short** (hash lookup + set add only)
- Using simple data structures (Hash + Set, not complex objects)
- Avoiding I/O or heavy computation inside the `synchronize` block

**Real-world impact:** For most applications (100-1,000 concurrent requests), mutex overhead is acceptable compared with the cost of NOT having thread safety: corrupted cardinality counts, trackers growing past their limits, and incorrect metrics.
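
You can measure the uncontended cost of the critical section on your own hardware instead of relying on the figures above. The following illustrative single-threaded micro-benchmark uses only Ruby's core library. `measure` is an ad hoc helper. The benchmark does not model contention, and absolute numbers vary by machine and Ruby version.

```ruby
require 'set'

# Time a block over many iterations; returns microseconds per operation.
def measure(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { |i| yield i }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  elapsed / iterations * 1_000_000
end

N = 200_000
values = Set.new
mutex = Mutex.new

# Same work as the tracker's critical section: lookup, then add if new.
plain  = measure(N) { |i| values.add(i % 100) unless values.include?(i % 100) }
locked = measure(N) do |i|
  mutex.synchronize { values.add(i % 100) unless values.include?(i % 100) }
end

puts format('no lock: %.3f µs/op, with mutex: %.3f µs/op (%.1fx)',
            plain, locked, locked / plain)
```

For contended numbers, run the locked variant from several threads at once and compare tail latencies, which is what the monitoring query below looks for in production.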

**Monitoring Thread Contention:**

If you suspect mutex contention is becoming a bottleneck, monitor these indicators:

```promql
# Built-in E11y metrics (no extra config needed):
#   e11y_cardinality_checks_total             - total cardinality checks
#   e11y_cardinality_checks_duration_seconds  - duration histogram

# Detect contention: if p99 latency >> p50, contention is likely
histogram_quantile(0.99, rate(e11y_cardinality_checks_duration_seconds_bucket[5m]))
  /
histogram_quantile(0.50, rate(e11y_cardinality_checks_duration_seconds_bucket[5m]))
# Ratio > 10 = high contention
```

**If contention becomes critical:**
- Consider using `Concurrent::Map` from the concurrent-ruby gem (lock-free for reads)
- Shard cardinality trackers by metric name (separate mutex per metric)
- Profile with `ruby-prof` to identify the exact bottleneck
673
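The sharding idea can be sketched as one `Mutex` and `Set` per metric name. `ShardedCardinalityTracker` is a hypothetical class, not E11y's implementation; the lock-free shard lookup assumes CRuby, where the GVL makes a single Hash read safe.

```ruby
require 'set'

# Hypothetical sketch (not E11y's implementation): one Mutex + Set per metric,
# so threads updating different metrics never wait on the same lock.
class ShardedCardinalityTracker
  def initialize(limit:)
    @limit = limit
    @shards = {}             # metric name => { mutex:, values: }
    @shards_lock = Mutex.new # guards shard creation only
  end

  # Returns true if the label set is (or becomes) tracked, false if over the limit.
  def track(metric, label_set)
    shard = shard_for(metric)
    shard[:mutex].synchronize do
      values = shard[:values]
      next true if values.include?(label_set)
      next false if values.size >= @limit

      values << label_set
      true
    end
  end

  def cardinality(metric)
    shard = shard_for(metric)
    shard[:mutex].synchronize { shard[:values].size }
  end

  private

  # Lock-free fast path for existing shards; relies on CRuby's GVL making a
  # single Hash read safe. On JRuby/TruffleRuby prefer Concurrent::Map.
  def shard_for(metric)
    @shards[metric] || @shards_lock.synchronize do
      @shards[metric] ||= { mutex: Mutex.new, values: Set.new }
    end
  end
end
```

Threads tracking `api_requests_total` and `orders_total` take different locks; the shard-creation lock is only taken the first time a metric name is seen.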
+
674
+ ---
675
+
676
+ ### Layer 4: Dynamic Monitoring
677
+
678
+ **Auto-detect and alert on high cardinality:**
679
+
680
+ ```ruby
681
+ E11y.configure do |config|
682
+ config.metrics do
683
+ cardinality_monitoring do
684
+ # === THRESHOLDS ===
685
+ warn_threshold 0.7 # Alert at 70% of limit
686
+ critical_threshold 0.9 # Critical alert at 90%
687
+
688
+ # === AUTO-ADJUSTMENT ===
689
+ auto_adjust do
690
+ enabled true
691
+ threshold 0.8 # Trigger at 80%
692
+ action :aggregate # Auto-switch to aggregate strategy
693
+ notify :slack # Notify team
694
+ end
695
+
696
+ # === REPORTING ===
697
+ report_interval 1.minute # Check every minute
698
+ top_violators_count 10 # Track top 10 high-cardinality metrics
699
+
700
+ # === ALERTS ===
701
+ on_high_cardinality do |metric_name, current, limit|
702
+ Rails.logger.warn(
703
+ "[E11y] High cardinality: #{metric_name} at #{current}/#{limit}"
704
+ )
705
+
706
+ # Send to Slack
707
+ SlackNotifier.notify(
708
+ channel: '#observability',
709
+ message: "⚠️ Metric #{metric_name} cardinality: #{current}/#{limit}"
710
+ )
711
+ end
712
+ end
713
+ end
714
+ end
715
+ ```
716
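The threshold levels above can be read as a simple ratio check. This is a hypothetical helper: the constant names are made up, and whether E11y compares with `>=` or `>` is an assumption.

```ruby
# Hypothetical helper mirroring the configured thresholds:
# warn at 70%, auto-adjust at 80%, critical at 90% of the cardinality limit.
WARN_THRESHOLD        = 0.7
AUTO_ADJUST_THRESHOLD = 0.8
CRITICAL_THRESHOLD    = 0.9

def cardinality_level(current, limit)
  ratio = current.to_f / limit
  if ratio >= CRITICAL_THRESHOLD
    :critical
  elsif ratio >= AUTO_ADJUST_THRESHOLD
    :auto_adjust
  elsif ratio >= WARN_THRESHOLD
    :warn
  else
    :ok
  end
end

cardinality_level(85, 100) # => :auto_adjust
```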
+
717
+ #### Action Selection Guide
718
+
719
+ > **Implementation:** See [ADR-002 Section 4.5: Layer 4 - Dynamic Actions](../ADR-002-metrics-yabeda.md#45-layer-4-dynamic-actions) for detailed architecture.
720
+
721
+ **🎯 When cardinality limit is exceeded, which action should you choose?**
722
+
723
+ Use this decision tree to select the right strategy:
724
+
725
+ ```
726
+ ┌─────────────────────────────────────┐
727
+ │     Cardinality Limit Exceeded      │
728
+ └─────────────────────────────────────┘
729
+                    ↓
730
+           ┌──────────────────┐
731
+           │   Critical to    │  ← Question 1
732
+           │   investigate?   │
733
+           └──────────────────┘
734
+               ↙         ↘
735
+          YES              NO
736
+           ↓                ↓
737
+     ┌───────────┐  ┌──────────────┐
738
+     │   ALERT   │  │  Can group   │  ← Question 2
739
+     │     +     │  │  values into │
740
+     │   DROP    │  │  categories? │
741
+     └───────────┘  └──────────────┘
742
+           │             ↙      ↘
743
+           │         YES          NO
744
+           │          ↓           ↓
745
+           │     ┌─────────┐  ┌───────┐
746
+           │     │ RELABEL │  │ DROP  │
747
+           │     └─────────┘  └───────┘
748
+           ↓          ↓           ↓
749
+       PagerDuty   Reduced     Silent
750
+         Alert   Cardinality   Removal
751
+ ```
752
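The decision tree reduces to two boolean questions. A minimal sketch with a hypothetical `choose_overflow_action` method:

```ruby
# Hypothetical encoding of the decision tree.
#   critical:  is an unexpected spike on this label worth investigating?
#   groupable: can its values be mapped to a small set of categories?
def choose_overflow_action(critical:, groupable:)
  return :alert if critical # Question 1: alert (the label is also dropped)

  groupable ? :relabel : :drop # Question 2
end

choose_overflow_action(critical: false, groupable: true) # => :relabel
```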
+
753
+ **Decision Matrix:**
754
+
755
+ | Action | When to Use | Signal Preserved | Cardinality | Example |
756
+ |--------|-------------|------------------|-------------|---------|
757
+ | **DROP** | Label not important for analysis | ❌ None (label removed entirely) | 1 (label dropped) | Drop `request_id`, `trace_id` from metrics (keep in logs) |
758
+ | **RELABEL** | Clear categories exist (e.g., status codes, paths) | ✅✅✅ High (grouped into buckets) | 5-10 (category count) | `http_status: 200` → `status_class: 2xx` |
759
+ | **ALERT** | Unexpected high cardinality, needs investigation | ❌ None + 🚨 (label dropped + ops alerted) | 1 (label dropped) | Sudden spike in unique `customer_id` values |
760
+
761
+ **Practical Examples:**
762
+
763
+ **1. DROP - Default for non-critical labels**
764
+ ```ruby
765
+ # ❌ BAD: request_id creates 1M unique metrics
766
+ counter_for pattern: 'api.request',
767
+ tags: [:request_id, :endpoint] # request_id = high cardinality!
768
+
769
+ # ✅ GOOD: Drop request_id from metrics
770
+ counter_for pattern: 'api.request',
771
+ tags: [:endpoint] # Only low-cardinality tags
772
+
773
+ cardinality_limit_for 'api.request' do
774
+ max_cardinality 100
775
+ overflow_strategy :drop # Silent drop if exceeded
776
+ end
777
+
778
+ # Result: request_id still in logs/traces, just not in metrics
779
+ ```
780
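What "drop `request_id` from metrics" means in practice: the metric sees only an allow-listed subset of the payload, so a thousand distinct request IDs collapse into a single series while the full payload still goes to logs. `metric_labels` and `METRIC_TAGS` are illustrative names, not E11y API.

```ruby
# Hypothetical sketch: metrics receive only an allow-listed subset of the
# payload, while logs and traces keep every field, including request_id.
METRIC_TAGS = %i[endpoint].freeze

def metric_labels(payload)
  payload.slice(*METRIC_TAGS)
end

events = Array.new(1_000) { |i| { request_id: "req-#{i}", endpoint: '/api/users' } }

# 1,000 distinct request IDs, but only one metric series:
series = events.map { |event| metric_labels(event) }.uniq
series.size # => 1
```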
+
781
+ **2. RELABEL - Best for known categories**
782
+ ```ruby
783
+ # ❌ BAD: one series per distinct HTTP status code
784
+ counter_for pattern: 'http.response',
785
+ tags: [:http_status] # 200, 201, 204, 400, 401, 403, ...
786
+
787
+ # ✅ GOOD: Relabel to status classes (1xx-5xx, plus 'unknown')
788
+ counter_for pattern: 'http.response',
789
+ tags: [:status_class],
790
+ tag_extractors: {
791
+ status_class: ->(event) {
792
+ status = event.payload[:http_status].to_i
793
+ case status
794
+ when 100..199 then '1xx'
795
+ when 200..299 then '2xx'
796
+ when 300..399 then '3xx'
797
+ when 400..499 then '4xx'
798
+ when 500..599 then '5xx'
799
+ else 'unknown'
800
+ end
801
+ }
802
+ }
803
+
804
+ # Result: every status code collapses into at most 6 values (1xx-5xx + 'unknown')
805
+ ```
806
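Because the extractor is just a lambda, the bucketing logic can be pulled into a plain method and unit tested on its own. A sketch with an illustrative `status_class` method, calling `to_i` so string codes and `nil` are handled:

```ruby
# Plain-Ruby version of the status_class extractor so it can be unit
# tested on its own (the method name is illustrative).
def status_class(http_status)
  case http_status.to_i
  when 100..199 then '1xx'
  when 200..299 then '2xx'
  when 300..399 then '3xx'
  when 400..499 then '4xx'
  when 500..599 then '5xx'
  else 'unknown'
  end
end

codes = [200, 201, 204, 301, 400, 401, 403, 404, 500, 503, '429', nil]
codes.map { |code| status_class(code) }.uniq.sort
# => ["2xx", "3xx", "4xx", "5xx", "unknown"]
```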
+
807
+ **3. ALERT - For unexpected cardinality spikes**
808
+ ```ruby
809
+ # Payment events should have stable cardinality
810
+ cardinality_limit_for 'payments.processed' do
811
+ max_cardinality 50 # Expect ~10 payment methods
812
+ overflow_strategy :alert # Alert if exceeded
813
+ overflow_sample_rate 0.1 # Sample 10% of overflow events
814
+ end
815
+
816
+ # Scenario: Suddenly 1000 unique payment_method values
817
+ # → Alert sent to PagerDuty
818
+ # → Label dropped from metrics
819
+ # → Ops investigates (possible bug, data corruption, attack)
820
+ ```
821
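The alert strategy's sampling can be sketched as follows, assuming `overflow_sample_rate` means "report this fraction of overflowing events" (an interpretation, not confirmed behaviour). `SampledOverflowAlerter` is hypothetical; a seeded `Random` makes runs reproducible.

```ruby
# Hypothetical sketch of overflow_strategy :alert with overflow_sample_rate.
# Every overflowing event loses the offending label, but only a sampled
# fraction of those events is reported, so an incident yields a handful of
# alerts instead of thousands.
class SampledOverflowAlerter
  attr_reader :alerts

  def initialize(sample_rate:, rng: Random.new)
    @sample_rate = sample_rate
    @rng = rng
    @alerts = []
  end

  # Returns the labels to record for an event that exceeded the limit.
  def handle(metric, labels, overflow_label)
    @alerts << [metric, labels[overflow_label]] if @rng.rand < @sample_rate
    labels.reject { |key, _value| key == overflow_label }
  end
end

alerter = SampledOverflowAlerter.new(sample_rate: 0.1, rng: Random.new(42))
1_000.times do |i|
  alerter.handle('payments.processed', { payment_method: "pm_#{i}", currency: 'USD' }, :payment_method)
end
alerter.alerts.size # roughly 100 (10% of 1,000)
```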
+
822
+ **When NOT to use each action:**
823
+
824
+ | Action | DON'T Use When | Why |
825
+ |--------|---------------|-----|
826
+ | DROP | Label is critical for debugging | You lose all visibility into this dimension |
827
+ | RELABEL | No clear categories exist | Arbitrary bucketing (e.g., hash-based) loses signal |
828
+ | ALERT | High cardinality is expected | Alert fatigue, ops team overwhelmed |
829
+
830
+ **Common Patterns:**
831
+
832
+ ```ruby
833
+ # Pattern 1: DROP non-critical identifiers
834
+ # request_id, session_id, trace_id → DROP (keep in logs)
835
+ overflow_strategy :drop
836
+
837
+ # Pattern 2: RELABEL known enums
838
+ # http_status, country_code, user_tier → RELABEL (aggregate)
839
+ tag_extractors: { status_class: ->(e) { ... } }
840
+
841
+ # Pattern 3: ALERT on unexpected cardinality
842
+ # payment_method, product_sku → ALERT (should be stable)
843
+ overflow_strategy :alert
844
+ ```
845
+
846
+ **Monitoring Your Decisions:**
847
+
848
+ ```ruby
849
+ # Track how often each action triggers
850
+ Yabeda.e11y_internal.cardinality_actions_total.values
851
+ # => { action: 'drop', metric: 'api.request' } => 42
852
+ # => { action: 'alert', metric: 'payments.processed' } => 1
853
+
854
+ # Prometheus query:
855
+ #   rate(e11y_cardinality_actions_total{action="alert"}[5m])
856
+ # → If >0, investigate what's causing unexpected cardinality
857
+ ```
858
+
859
+ ---
860
+
861
+ ## 💻 Advanced Techniques
862
+
863
+ ### 1. Aggregation (Best ROI - 99% Reduction)
864
+
865
+ > **Note:** This section describes **relabeling/normalization** (e.g., `user_id` → `user_segment`) via `tag_extractors`, which is different from `overflow_strategy`. Aggregation reduces cardinality **before** metrics are created, while overflow handling (`drop`/`alert`) deals with exceeding limits **after** creation. See [ADR-002 Section 4.5](../ADR-002-metrics-yabeda.md#45-cardinality-protection) for implementation details.
866
+
867
+ **Problem:** 1M users = 1M metric series
868
+
869
+ **Solution:** Aggregate to segments
870
+
871
+ ```ruby
872
+ # ❌ BAD: 1,000,000 users = 1,000,000 series
873
+ counter_for pattern: 'user.action',
874
+ tags: [:user_id]
875
+
876
+ # ✅ GOOD: 3 segments = 3 series (99.9997% reduction!)
877
+ counter_for pattern: 'user.action',
878
+ tags: [:user_segment],
879
+ tag_extractors: {
880
+ user_segment: ->(event) {
881
+ user_id = event.payload[:user_id]
882
+ user = User.find_by(id: user_id)
883
+ user&.segment || 'unknown' # 'free', 'paid', 'enterprise'
884
+ }
885
+ }
886
+
887
+ # Result:
888
+ # user_actions_total{user_segment="free"} 500000
889
+ # user_actions_total{user_segment="paid"} 400000
890
+ # user_actions_total{user_segment="enterprise"} 100000
891
+ ```
892
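The extractor above calls `User.find_by` for every tracked event. One way to avoid a database round-trip per event is a small LRU cache in front of the lookup. `SegmentCache` is a hypothetical helper; the loader block stands in for the `User` query, and the lookup runs outside the lock so slow I/O never blocks other threads.

```ruby
# Hypothetical sketch: cache user_id → segment so the tag extractor does not
# query the database on every event. The loader block stands in for
# `User.find_by(id: id)&.segment`; eviction is least-recently-used.
class SegmentCache
  def initialize(max_size: 10_000, &loader)
    @max_size = max_size
    @loader = loader
    @cache = {} # insertion order doubles as recency order
    @mutex = Mutex.new
  end

  def segment_for(user_id)
    cached = @mutex.synchronize do
      # Re-insert an existing entry to mark it as most recently used.
      @cache[user_id] = @cache.delete(user_id) if @cache.key?(user_id)
    end
    return cached if cached

    segment = @loader.call(user_id) || 'unknown' # I/O happens outside the lock
    @mutex.synchronize do
      @cache.shift if !@cache.key?(user_id) && @cache.size >= @max_size # evict LRU entry
      @cache[user_id] = segment
    end
  end
end
```

In the extractor this would become something like `user_segment: ->(event) { SEGMENTS.segment_for(event.payload[:user_id]) }`, where `SEGMENTS` is a hypothetical process-wide instance.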
+
893
+ **Common aggregation strategies:**
894
+
895
+ | High-Cardinality Field | Aggregate To | Example Values (approx. count) |
896
+ |------------------------|--------------|--------------------------------|
897
+ | `user_id` (1M) | `user_segment` | free, paid, enterprise (3) |
898
+ | `order_id` (10M) | `order_status` | pending, paid, shipped, ... (4) |
899
+ | `ip_address` (100k) | `country` | US, UK, DE, FR, ... (50) |
900
+ | `version` (1000) | `major_version` | 1.x, 2.x, 3.x (3) |
901
+ | `url` (10k) | `endpoint_pattern` | /api/users/:id, ... (100) |
902
+
903
+ ---
904
+
905
+ ### 2. Relabeling & Normalization
906
+
907
+ **Transform high-cardinality values to low-cardinality:**
908
+
909
+ ```ruby
910
+ counter_for pattern: 'http.request',
911
+ tags: [:http_status, :endpoint, :version],
912
+ tag_extractors: {
913
+ # Aggregate status codes: 200..299 → 2xx
914
+ http_status: ->(event) {
915
+ status = event.payload[:status].to_i
916
+ "#{status / 100}xx" # 200 → "2xx", 404 → "4xx"
917
+ },
918
+
919
+ # Normalize endpoints: /api/users/123 → /api/users/:id
920
+ endpoint: ->(event) {
921
+ path = event.payload[:path]
922
+ path.to_s.gsub(%r{/\d+(?=/|\z)}, '/:id') # Replace whole numeric segments with :id
923
+ },
924
+
925
+ # Major version only: 2.5.7234 → 2.x
926
+ version: ->(event) {
927
+ version = event.payload[:version].to_s
928
+ major = version.split('.').first
929
+ "#{major}.x"
930
+ }
931
+ }
932
+
933
+ # Before relabeling: 50 status codes × 1000 endpoints × 100 versions = 5M series
934
+ # After relabeling: 5 status groups × 100 patterns × 10 major versions = 5k series
935
+ # Reduction: 99.9%
936
+ ```
937
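The three extractors can be written as plain methods and tested in isolation. A sketch with illustrative method names; `endpoint_pattern` only rewrites whole numeric segments, so `/api/v2/users` is left alone:

```ruby
# Plain-Ruby versions of the three extractors (method names are illustrative).
def status_group(status)
  "#{status.to_i / 100}xx" # 200 → "2xx", 404 → "4xx"
end

def endpoint_pattern(path)
  # Replace only whole numeric segments: /users/123 → /users/:id, /v2 is kept
  path.to_s.gsub(%r{/\d+(?=/|\z)}, '/:id')
end

def major_version(version)
  "#{version.to_s.split('.').first}.x" # "2.5.7234" → "2.x"
end

endpoint_pattern('/api/users/123/orders/456') # => "/api/users/:id/orders/:id"
```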
+
938
+ ---
939
+
940
+ ### 3. Exemplars (Best of Both Worlds)
941
+
942
+ **Low-cardinality metrics + high-cardinality exemplars:**
943
+
944
+ ```ruby
945
+ counter_for pattern: 'order.paid',
946
+ name: 'orders_paid_total',
947
+ # LOW-cardinality labels (stored for all events)
948
+ tags: [:currency, :payment_method],
949
+ # HIGH-cardinality exemplars (sampled, not stored as labels)
950
+ exemplars: {
951
+ user_id: ->(event) { event.payload[:user_id] },
952
+ order_id: ->(event) { event.payload[:order_id] },
953
+ trace_id: ->(event) { event.trace_id }
954
+ },
955
+ exemplar_sample_rate: 0.01 # Sample 1% of events
956
+
957
+ # Result in Prometheus:
958
+ # Metric: orders_paid_total{currency="USD",payment_method="stripe"} 1234
959
+ # Exemplar (sampled): {user_id="12345",order_id="ord_abc",trace_id="xyz"}
960
+ #
961
+ # Benefits:
962
+ # - Low cardinality for storage/query (2 labels)
963
+ # - High-cardinality context available via 3 sampled exemplar labels (OpenMetrics caps an exemplar's label names + values at 128 characters combined)
964
+ # - Can jump from metric to trace via trace_id
965
+ ```
966
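How a low-cardinality series can carry high-cardinality context: the series key holds only the labels, and a sampled exemplar is attached alongside it. `ExemplarCounter` is a hypothetical in-memory model, not how Prometheus or Yabeda store exemplars; the 128-character cap on an exemplar's combined label names and values comes from the OpenMetrics specification.

```ruby
# Hypothetical in-memory model: series keyed by low-cardinality labels, each
# carrying at most one sampled exemplar with high-cardinality context.
EXEMPLAR_MAX_CHARS = 128 # OpenMetrics cap on an exemplar's label names + values

class ExemplarCounter
  attr_reader :values, :exemplars

  def initialize(sample_rate:, rng: Random.new)
    @sample_rate = sample_rate
    @rng = rng
    @values = Hash.new(0) # labels => count
    @exemplars = {}       # labels => latest sampled exemplar
  end

  def increment(labels, exemplar)
    @values[labels] += 1
    return unless @rng.rand < @sample_rate

    size = exemplar.sum { |name, value| name.to_s.length + value.to_s.length }
    @exemplars[labels] = exemplar if size <= EXEMPLAR_MAX_CHARS
  end
end

counter = ExemplarCounter.new(sample_rate: 0.01)
10_000.times do |i|
  counter.increment({ currency: 'USD', payment_method: 'stripe' },
                    { user_id: i.to_s, order_id: "ord_#{i}", trace_id: format('%032x', i) })
end
counter.values.size # => 1 (one series, however many users)
```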
+
967
+ ---
968
+
969
+ ### 4. Streaming Aggregation
970
+
971
+ **Aggregate BEFORE sending to metrics backend:**
972
+
973
+ ```ruby
974
+ E11y.configure do |config|
975
+ config.metrics do
976
+ # Pre-aggregate high-cardinality dimensions
977
+ streaming_aggregation do
978
+ # Aggregate all http.* events
979
+ aggregate pattern: 'http.*' do
980
+ # Keep these dimensions
981
+ keep_dimensions [:controller, :action, :http_status]
982
+
983
+ # Drop these dimensions (aggregate out)
984
+ drop_dimensions [:user_id, :session_id, :ip_address]
985
+
986
+ # Aggregation window
987
+ window 10.seconds
988
+
989
+ # Flush interval
990
+ flush_interval 5.seconds
991
+ end
992
+ end
993
+ end
994
+ end
995
+
996
+ # How it works:
997
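+ # 1. Events are buffered in a 10-second aggregation window
998
+ # 2. Events are grouped by keep_dimensions; drop_dimensions are aggregated out
999
+ # 3. Aggregated metrics are flushed every 5 seconds
1000
+ # 4. Result: far fewer metric updates (e.g. ~90%; the actual reduction depends on events per group)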
1001
+ ```
1002
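The buffering and grouping step can be sketched with a Hash keyed by the kept dimensions. `StreamingAggregator` is hypothetical, and the 10-second window and 5-second flush timer are left out; a caller would invoke `flush` on each interval.

```ruby
# Hypothetical sketch of pre-aggregation: events are grouped by the kept
# dimensions (all others are aggregated out) and counted until the next flush.
# The window/flush timer is omitted; call #flush on your flush interval.
class StreamingAggregator
  def initialize(keep_dimensions:)
    @keep = keep_dimensions
    @buffer = Hash.new(0)
    @mutex = Mutex.new
  end

  def add(event)
    key = event.slice(*@keep)
    @mutex.synchronize { @buffer[key] += 1 }
  end

  # Returns { kept_dimensions => count } and starts a fresh buffer.
  def flush
    @mutex.synchronize do
      snapshot = @buffer
      @buffer = Hash.new(0)
      snapshot
    end
  end
end

aggregator = StreamingAggregator.new(keep_dimensions: %i[controller action http_status])
1_000.times do |i|
  aggregator.add({ controller: i.even? ? 'orders' : 'users', action: 'show',
                   http_status: 200, user_id: i, session_id: "s#{i}" })
end
aggregator.flush.size # => 2 metric updates instead of 1,000
```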
+
1003
+ ---
1004
+
1005
+ ### 5. Tiered Retention
1006
+
1007
+ **Different retention for different cardinality:**
1008
+
1009
+ ```ruby
1010
+ E11y.configure do |config|
1011
+ config.metrics do
1012
+ # High-cardinality: short retention
1013
+ retention_for pattern: 'http.request.*',
1014
+ cardinality: :high,
1015
+ duration: 1.hour,
1016
+ aggregation: :mean # Downsample to mean after 1 hour
1017
+
1018
+ # Low-cardinality: long retention
1019
+ retention_for pattern: 'orders.paid.*',
1020
+ cardinality: :low,
1021
+ duration: 90.days,
1022
+ aggregation: :none # Keep raw data
1023
+
1024
+ # Auto-classify by actual cardinality
1025
+ auto_classify_retention true
1026
+ high_cardinality_threshold 1_000
1027
+ low_cardinality_threshold 100
1028
+ end
1029
+ end
1030
+
1031
+ # Result:
1032
+ # - High-cardinality metrics: 1 hour raw, then downsampled to the mean (e.g. kept for 30 days)
1033
+ # - Low-cardinality metrics: 90 days raw
1034
+ # - Cost savings: e.g. ~70% less storage (depends on your mix of high- and low-cardinality series)
1035
+ ```
1036
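`auto_classify_retention` can be read as a threshold check on a metric's observed series count. This sketch is hypothetical: the `:medium` tier for counts between the two thresholds, and the inclusive boundaries, are assumptions the configuration above does not specify.

```ruby
# Hypothetical classifier for auto_classify_retention using the configured thresholds.
# The :medium tier and the inclusive (>= / <=) boundaries are assumptions.
HIGH_CARDINALITY_THRESHOLD = 1_000
LOW_CARDINALITY_THRESHOLD  = 100

def retention_tier(series_count)
  if series_count >= HIGH_CARDINALITY_THRESHOLD
    :high   # short raw retention, then downsample
  elsif series_count <= LOW_CARDINALITY_THRESHOLD
    :low    # long raw retention
  else
    :medium # not specified by the configuration
  end
end

{ 'http.request.latency' => 25_000, 'orders.paid.total' => 40, 'jobs.run' => 600 }
  .transform_values { |count| retention_tier(count) }
# http.request.latency → :high, orders.paid.total → :low, jobs.run → :medium
```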
+
1037
+ ---
1038
+
1039
+ ### 6. Universal Cardinality Protection (C04 Resolution) ⚠️ CRITICAL
1040
+
1041
+ > **⚠️ CRITICAL: C04 Conflict Resolution - Cardinality Protection for ALL Backends**
1042
+ > **See:** [ADR-009 Section 8](../ADR-009-cost-optimization.md#8-cardinality-protection-c04-resolution--critical) for detailed architecture and cost impact analysis.
1043
+ > **Problem:** Original UC-013 cardinality protection applied ONLY to Yabeda/Prometheus metrics, but NOT to OpenTelemetry span attributes or Loki log labels. High-cardinality values (`user_id`, `order_id`) bypassed protection and caused cost explosions in OTLP backends (Datadog, Honeycomb).
1044
+ > **Solution:** Universal `CardinalityFilter` middleware applies protection to **ALL backends** (Yabeda, OpenTelemetry, Loki) with optional per-backend overrides.
1045
+
1046
+ **The Problem - Inconsistent Cardinality Protection:**
1047
+
1048
+ Before C04 resolution, cardinality protection was **metrics-only**:
1049
+
1050
+ ```ruby
1051
+ # ❌ BEFORE C04: Inconsistent protection (cost explosion!)
1052
+ E11y.configure do |config|
1053
+ config.metrics do
1054
+ # Cardinality protection for Yabeda/Prometheus ✅
1055
+ forbidden_labels :user_id, :order_id
1056
+ cardinality_limit_for 'orders_total', max: 100
1057
+ end
1058
+
1059
+ # OpenTelemetry: NO cardinality protection! ❌
1060
+ config.opentelemetry do
1061
+ enabled true
1062
+ export_traces true # Spans include ALL attributes
1063
+ end
1064
+ end
1065
+
1066
+ # Event tracking (10,000 unique users):
1067
+ 10_000.times do |i|
1068
+ Events::OrderCreated.track(
1069
+ order_id: "order-#{i}", # ← 10,000 unique values!
1070
+ user_id: "user-#{i}", # ← 10,000 unique values!
1071
+ amount: 99.99
1072
+ )
1073
+ end
1074
+
1075
+ # Result:
1076
+ # ✅ Prometheus: order_id/user_id PROTECTED (forbidden labels dropped; orders_total capped at 100)
1077
+ # ❌ OpenTelemetry: order_id/user_id NOT PROTECTED (all 10,000 exported!)
1078
+ # ❌ Loki: order_id/user_id NOT PROTECTED (index bloat!)
1079
+
1080
+ # Cost impact:
1081
+ # - Datadog (illustrative pricing): $0.10/span × 10,000 spans/day = $1,000/day = $30,000/month 💸
1082
+ # - Backend cardinality limit exceeded → data loss
1083
+ ```
1084
+
1085
+ **The Solution - Universal Cardinality Protection:**
1086
+
1087
+ After C04 resolution, protection applies to **ALL backends**:
1088
+
1089
+ ```ruby
1090
+ # ✅ AFTER C04: Unified protection (cost savings!)
1091
+ E11y.configure do |config|
1092
+ # GLOBAL cardinality protection (applies to ALL backends)
1093
+ config.cardinality_protection do
1094
+ enabled true
1095
+ max_unique_values 100 # Conservative default (Prometheus-safe)
1096
+ protected_labels [:user_id, :order_id, :session_id, :tenant_id]
1097
+ end
1098
+
1099
+ # Optional: Per-backend overrides (if needed)
1100
+ config.adapters do
1101
+ # Yabeda: Use global settings (default)
1102
+ yabeda do
1103
+ cardinality_protection.inherit_from :global
1104
+ end
1105
+
1106
+ # OpenTelemetry: Higher limits OK (OTLP backends handle more)
1107
+ opentelemetry do
1108
+ cardinality_protection do
1109
+ max_unique_values 1000 # OTLP backends can handle more
1110
+ protected_labels [:user_id, :order_id] # Subset of global
1111
+ end
1112
+ end
1113
+
1114
+ # Loki: Use global settings
1115
+ loki do
1116
+ cardinality_protection.inherit_from :global
1117
+ end
1118
+ end
1119
+ end
1120
+
1121
+ # Same event tracking (10,000 unique users):
1122
+ 10_000.times do |i|
1123
+ Events::OrderCreated.track(
1124
+ order_id: "order-#{i}",
1125
+ user_id: "user-#{i}",
1126
+ amount: 99.99
1127
+ )
1128
+ end
1129
+
1130
+ # Result:
1131
+ # ✅ Prometheus: order_id/user_id → 100 + [OTHER] (protected)
1132
+ # ✅ OpenTelemetry: order_id/user_id → 1000 + [OTHER] (protected)
1133
+ # ✅ Loki: order_id/user_id → 100 + [OTHER] (protected)
1134
+
1135
+ # Cost impact:
1136
+ # - Datadog (illustrative pricing): $0.01/span × 10,000 spans/day = $100/day = $3,000/month ✅
1137
+ # - Monthly savings: $27,000 💰 (90% reduction!)
1138
+ ```
1139
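The `[OTHER]` behaviour described above can be sketched as a filter that lets the first `max_unique_values` distinct values of each protected label through and replaces later new values. `OtherBucketFilter` is hypothetical, not the actual `CardinalityFilter` middleware; one instance per backend gives each backend its own limit.

```ruby
require 'set'

# Hypothetical sketch of [OTHER] grouping: the first `max_unique_values`
# distinct values of each protected label pass through unchanged; later new
# values become "[OTHER]". Unprotected attributes are never touched.
class OtherBucketFilter
  OTHER = '[OTHER]'

  def initialize(max_unique_values:, protected_labels:)
    @max = max_unique_values
    @protected = protected_labels
    @seen = Hash.new { |hash, label| hash[label] = Set.new }
    @mutex = Mutex.new
  end

  def call(attributes)
    @mutex.synchronize do
      attributes.to_h do |label, value|
        next [label, value] unless @protected.include?(label)

        seen = @seen[label]
        if seen.include?(value) || seen.size < @max
          seen << value
          [label, value]
        else
          [label, OTHER]
        end
      end
    end
  end
end

prometheus    = OtherBucketFilter.new(max_unique_values: 100, protected_labels: %i[user_id order_id])
opentelemetry = OtherBucketFilter.new(max_unique_values: 1000, protected_labels: %i[user_id order_id])

exported = { prometheus => Set.new, opentelemetry => Set.new }
10_000.times do |i|
  attributes = { order_id: "order-#{i}", user_id: "user-#{i}", amount: 99.99 }
  exported.each { |filter, ids| ids << filter.call(attributes)[:user_id] }
end
exported.values.map(&:size) # => [101, 1001] (limit + "[OTHER]")
```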
+
1140
+ **Configuration Examples:**
1141
+
1142
+ **1. Production: Strict Limits (Cost-Sensitive)**
1143
+
1144
+ ```ruby
1145
+ # config/environments/production.rb
1146
+ E11y.configure do |config|
1147
+ config.cardinality_protection do
1148
+ enabled true
1149
+ max_unique_values 100 # Prometheus-safe
1150
+ protected_labels [:user_id, :order_id, :session_id, :tenant_id, :ip_address]
1151
+ end
1152
+
1153
+ # OTLP can handle more (optional override)
1154
+ config.adapters.opentelemetry do
1155
+ cardinality_protection.max_unique_values 1000
1156
+ end
1157
+ end
1158
+ ```
1159
+
1160
+ **2. Development: No Limits (Full Visibility)**
1161
+
1162
+ ```ruby
1163
+ # config/environments/development.rb
1164
+ E11y.configure do |config|
1165
+ config.cardinality_protection.enabled false # Unlimited cardinality
1166
+ end
1167
+ ```
1168
+
1169
+ **3. Staging: Moderate Limits (Balance Cost vs Debugging)**
1170
+
1171
+ ```ruby
1172
+ # config/environments/staging.rb
1173
+ E11y.configure do |config|
1174
+ config.cardinality_protection do
1175
+ enabled true
1176
+ max_unique_values 500 # More than prod, less than unlimited
1177
+ protected_labels [:user_id, :order_id]
1178
+ end
1179
+
1180
+ # OTLP backend can handle even more
1181
+ config.adapters.opentelemetry do
1182
+ cardinality_protection.max_unique_values 1000
1183
+ end
1184
+ end
1185
+ ```
1186
+
1187
+ **Per-Backend Cardinality Budgets:**
1188
+
1189
+ Different backends have different cardinality tolerance:
1190
+
1191
+ | Backend | Recommended `max_unique_values` | Why |
1192
+ |---------|----------------------------------|-----|
1193
+ | **Prometheus (Yabeda)** | 100 | Time-series DB, high memory usage per series |
1194
+ | **OpenTelemetry (Datadog)** | 1000 | Columnar storage, better cardinality handling |
1195
+ | **Loki** | 100 | Label cardinality affects index size & query performance |
1196
+ | **Sentry** | Unlimited | Error tracking needs full context (not cost-sensitive) |
1197
+ | **Audit (PostgreSQL)** | Unlimited | Compliance requires complete data |
1198
+
1199
+ **Example: Different Limits per Backend**
1200
+
1201
+ ```ruby
1202
+ E11y.configure do |config|
1203
+ # Global default (applies to Yabeda, Loki)
1204
+ config.cardinality_protection do
1205
+ enabled true
1206
+ max_unique_values 100
1207
+ protected_labels [:user_id, :order_id]
1208
+ end
1209
+
1210
+ # OpenTelemetry: 10× higher limit
1211
+ config.adapters.opentelemetry do
1212
+ cardinality_protection.max_unique_values 1000
1213
+ end
1214
+
1215
+ # Sentry: No limit (need full context for debugging)
1216
+ config.adapters.sentry do
1217
+ cardinality_protection.enabled false
1218
+ end
1219
+
1220
+ # Audit: No limit (compliance)
1221
+ config.adapters.audit do
1222
+ cardinality_protection.enabled false
1223
+ end
1224
+ end
1225
+
1226
+ # Event with high-cardinality fields:
1227
+ Events::OrderCreated.track(
1228
+ order_id: "order-12345", # High-cardinality
1229
+ user_id: "user-67890", # High-cardinality
1230
+ amount: 99.99
1231
+ )
1232
+
1233
+ # Result per backend:
1234
+ # Prometheus: order_id/user_id → [OTHER] (after 100 unique values)
1235
+ # OpenTelemetry: order_id/user_id → [OTHER] (after 1000 unique values)
1236
+ # Loki: order_id/user_id → [OTHER] (after 100 unique values)
1237
+ # Sentry: order_id="order-12345", user_id="user-67890" (full context)
1238
+ # Audit: order_id="order-12345", user_id="user-67890" (full context)
1239
+ ```
1240
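Per-backend overrides can be modelled as global settings merged with backend-specific overrides. A minimal sketch with hypothetical constant and method names, using the values from the example above:

```ruby
# Hypothetical sketch of override resolution: every backend starts from the
# global settings and merges its own overrides on top.
GLOBAL_PROTECTION = {
  enabled: true,
  max_unique_values: 100,
  protected_labels: %i[user_id order_id]
}.freeze

BACKEND_OVERRIDES = {
  yabeda: {},
  loki: {},
  opentelemetry: { max_unique_values: 1000 },
  sentry: { enabled: false },
  audit: { enabled: false }
}.freeze

def effective_protection(backend)
  GLOBAL_PROTECTION.merge(BACKEND_OVERRIDES.fetch(backend, {}))
end

effective_protection(:opentelemetry)[:max_unique_values] # => 1000
```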
+
1241
+ **Monitoring Cardinality Protection:**
1242
+
1243
+ Track cardinality protection effectiveness:
1244
+
1245
+ ```promql
1246
+ # Metrics:
1247
+ e11y_cardinality_filtered_labels_total{backend="all",label="user_id"}
1248
+ e11y_cardinality_unique_values{label="order_id"}
1249
+ e11y_cardinality_limit_breached_total{label="session_id"}
1250
+
1251
+ # Prometheus queries:
1252
+
1253
+ # 1. Cardinality protection rate (% of labels filtered)
1254
+ sum(rate(e11y_cardinality_filtered_labels_total[5m]))
1255
+ /
1256
+ sum(rate(e11y_events_tracked_total[5m])) * 100
1257
+
1258
+ # 2. Labels at risk (approaching limit)
1259
+ e11y_cardinality_unique_values
1260
+ / 100
1261
+ > 0.8 # 80% of max_unique_values (100)
1262
+
1263
+ # 3. Top high-cardinality labels
1264
+ topk(10,
1265
+ sum by (label) (
1266
+ rate(e11y_cardinality_filtered_labels_total[1h])
1267
+ )
1268
+ )
1269
+
1270
+ # 4. Cost savings estimate (assuming an illustrative $0.10 per filtered span attribute)
1271
+ sum(increase(e11y_cardinality_filtered_labels_total[1d])) * 0.10
1272
+ # Result: estimated $ saved over the last 24 hours
1273
+ ```
1274
+
1275
+ **Trade-offs:**
1276
+
1277
+ | Aspect | Pros | Cons | Mitigation |
1278
+ |--------|------|------|------------|
1279
+ | **Unified protection** | Consistent across all backends | One size doesn't fit all backends | Per-backend overrides (`max_unique_values`) |
1280
+ | **[OTHER] grouping** | Prevents cost explosion | Loses context for debugging | Log original values at debug level |
1281
+ | **Global config** | Simple, DRY | May not fit all backend limits | Environment-specific: prod=100, staging=500, dev=unlimited |
1282
+ | **max_unique_values 100** | Conservative, safe for Prometheus | May be too strict for OTLP backends | Per-backend override: OTLP=1000, Yabeda=100 |
1283
+
1284
+ **Cost Impact:**
1285
+
1286
+ Real-world example from C04 analysis:
1287
+
1288
+ ```
1289
+ BEFORE C04 (no OTLP protection):
1290
+ - 10,000 orders/day with unique order_id
1291
+ - Datadog pricing: $0.10/span with high-cardinality attributes
1292
+ - Daily cost: $1,000
1293
+ - Monthly cost: $30,000 ❌
1294
+
1295
+ AFTER C04 (universal protection):
1296
+ - Same 10,000 orders/day
1297
+ - Cardinality protected: 1000 unique + [OTHER]
1298
+ - Datadog pricing: $0.01/span with low-cardinality attributes
1299
+ - Daily cost: $100
1300
+ - Monthly cost: $3,000 ✅
1301
+ - Monthly savings: $27,000 💰 (90% reduction!)
1302
+ ```
1303
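The cost figures above follow from simple arithmetic. This sketch reproduces them; the per-span prices are the example's illustrative assumptions, not published vendor pricing.

```ruby
# Reproduces the illustrative figures; prices per span are the example's
# assumptions, and a month is taken as 30 days.
def monthly_span_cost(spans_per_day:, price_per_span:, days: 30)
  spans_per_day * price_per_span * days
end

before = monthly_span_cost(spans_per_day: 10_000, price_per_span: 0.10)
after  = monthly_span_cost(spans_per_day: 10_000, price_per_span: 0.01)

savings   = before - after
reduction = savings / before

puts format('before: $%.0f/month, after: $%.0f/month', before, after)
puts format('savings: $%.0f/month (%.0f%%)', savings, reduction * 100)
```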
+
1304
+ ---
1305
+
1306
+ ## 📊 Self-Monitoring Metrics
1307
+
1308
+ **E11y tracks its own cardinality:**
1309
+
1310
+ ```promql
1311
+ # === CARDINALITY METRICS ===
1312
+ e11y_internal_metric_cardinality{metric="user_actions_total"} # Current unique series
1313
+ e11y_internal_metric_cardinality_limit{metric="user_actions_total"} # Configured limit
1314
+ e11y_internal_metric_cardinality_ratio{metric="user_actions_total"} # current/limit (0-1)
1315
+
1316
+ # === OVERFLOW METRICS ===
1317
+ e11y_internal_metric_overflow_count{metric="user_actions_total"} # Times limit exceeded
1318
+ e11y_internal_metric_overflow_events_total{metric="user_actions_total"} # Events via overflow path
1319
+
1320
+ # === VIOLATION METRICS ===
1321
+ e11y_internal_forbidden_label_violations_total{label="user_id"} # Denylist violations
1322
+ e11y_internal_label_value_count{metric="orders_paid_total",label="currency"} # Unique values per label
1323
+
1324
+ # === AGGREGATE METRICS ===
1325
+ e11y_internal_high_cardinality_metrics_total # Metrics above threshold
1326
+ e11y_internal_aggregated_series_total # Series using "_other" bucket
1327
+ ```
1328
+
1329
+ **Prometheus alerting:**
1330
+
1331
+ ```yaml
1332
+ # config/prometheus/alerts.yml
1333
+ groups:
1334
+   - name: e11y_cardinality
1335
+     rules:
1336
+       # Alert at 80% of limit
1337
+       - alert: E11yHighCardinality
1338
+         expr: e11y_internal_metric_cardinality_ratio > 0.8
1339
+         for: 5m
1340
+         annotations:
1341
+           summary: "Metric {{ $labels.metric }} at {{ $value | humanizePercentage }} of limit"
1342
+           description: "Consider aggregating or increasing limit"
1343
+
1344
+       # Alert on overflow
1345
+       - alert: E11yCardinalityOverflow
1346
+         expr: rate(e11y_internal_metric_overflow_events_total[5m]) > 10
1347
+         for: 2m
1348
+         annotations:
1349
+           summary: "Metric {{ $labels.metric }} overflowing ({{ $value }} events/sec)"
1350
+
1351
+       # Alert on forbidden label usage
1352
+       - alert: E11yForbiddenLabelViolation
1353
+         expr: increase(e11y_internal_forbidden_label_violations_total[1h]) > 0
1354
+         annotations:
1355
+           summary: "Forbidden label {{ $labels.label }} used!"
1356
+           description: "Check metric configuration"
1357
+ ```
1358
+
1359
+ ---
1360
+
1361
+ ## 💻 Implementation Examples
1362
+
1363
+ ### Example 1: User Analytics (Safe)
1364
+
1365
+ ```ruby
1366
+ # ❌ BEFORE: High cardinality
1367
+ counter_for pattern: 'user.action',
1368
+ tags: [:user_id, :action] # 1M users × 10 actions = 10M series
1369
+
1370
+ # ✅ AFTER: Low cardinality
1371
+ counter_for pattern: 'user.action',
1372
+ tags: [:user_segment, :action, :cohort],
1373
+ tag_extractors: {
1374
+ user_segment: ->(e) {
1375
+ User.find_by(id: e.payload[:user_id])&.segment || 'unknown' # free, paid, enterprise
1376
+ },
1377
+ cohort: ->(e) {
1378
+ User.find_by(id: e.payload[:user_id])&.cohort_month || 'unknown' # 2024-01, 2024-02
1379
+ }
1380
+ }
1381
+ # Result: 3 segments × 10 actions × 12 cohorts = 360 series (99.996% reduction!)
1382
+ ```
1383
+
1384
+ ---
1385
+
1386
+ ### Example 2: HTTP Request Tracking
1387
+
1388
+ ```ruby
1389
+ counter_for pattern: 'http.request',
1390
+ tags: [:controller_action, :http_status_group, :region],
1391
+ tag_extractors: {
1392
+ # Normalize controller#action
1393
+ controller_action: ->(e) {
1394
+ "#{e.payload[:controller]}##{e.payload[:action]}"
1395
+ },
1396
+
1397
+ # Aggregate status codes
1398
+ http_status_group: ->(e) {
1399
+ status = e.payload[:status]
1400
+ case status
1401
+ when 200..299 then '2xx'
1402
+ when 300..399 then '3xx'
1403
+ when 400..499 then '4xx'
1404
+ when 500..599 then '5xx'
1405
+ else 'unknown'
1406
+ end
1407
+ }
1408
+ }
1409
+
1410
+ # With exemplars for debugging
1411
+ histogram_for pattern: 'http.request',
1412
+ value: ->(e) { e.duration_ms / 1000.0 },
1413
+ tags: [:controller_action, :http_status_group],
1414
+ exemplars: {
1415
+ trace_id: ->(e) { e.trace_id },
1416
+ user_id: ->(e) { e.context[:user_id] }
1417
+ },
1418
+ exemplar_sample_rate: 0.01 # 1% sampling
1419
+ ```
1420
+
1421
+ ---
1422
+
1423
+ ### Example 3: E-Commerce Orders
1424
+
1425
+ ```ruby
1426
+ # Orders by status, payment method, country
1427
+ counter_for pattern: 'order.paid',
1428
+ name: 'orders_paid_total',
1429
+ tags: [:status, :payment_method, :country, :amount_bucket],
1430
+ tag_extractors: {
1431
+ # Bucket amounts
1432
+ amount_bucket: ->(e) {
1433
+ amount = e.payload[:amount]
1434
+ case amount
1435
+ when ..50 then 'small' # inclusive upper bounds, no gaps for fractional amounts
1436
+ when ..200 then 'medium'
1437
+ when ..1000 then 'large'
1438
+ else 'xlarge'
1439
+ end
1440
+ },
1441
+
1442
+ # Aggregate country to region
1443
+ country: ->(e) {
1444
+ Country.find(e.payload[:country_code]).region # US, EU, APAC
1445
+ }
1446
+ }
1447
+
1448
+ # Cardinality:
1449
+ # 4 statuses × 5 payment methods × 3 regions × 4 amount buckets = 240 series ✅
1450
+ ```
1451
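With gapped integer ranges such as `0..50` and `51..200`, a fractional amount like 50.5 matches neither and falls through to the `else` branch. A standalone sketch using beginless ranges (Ruby 2.7+), which leave no gaps; `amount_bucket` is an illustrative method name:

```ruby
# Standalone amount bucketing with inclusive upper bounds. Beginless ranges
# (Ruby 2.7+) are checked in order, so every amount lands in exactly one bucket.
def amount_bucket(amount)
  case amount
  when ..50   then 'small'
  when ..200  then 'medium'
  when ..1000 then 'large'
  else 'xlarge'
  end
end

[10, 50, 50.5, 99.99, 200.01, 1500].map { |amount| amount_bucket(amount) }
# => ["small", "small", "medium", "medium", "large", "xlarge"]
```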
+
1452
+ ---
1453
+
1454
+ ## 🧪 Testing
1455
+
1456
+ ```ruby
1457
+ # spec/e11y/cardinality_spec.rb
1458
+ RSpec.describe 'E11y Cardinality Protection' do
1459
+ describe 'forbidden labels' do
1460
+ it 'raises error on forbidden label usage' do
1461
+ E11y.configure do |config|
1462
+ config.metrics do
1463
+ forbidden_labels :user_id
1464
+ enforcement :strict
1465
+ end
1466
+ end
1467
+
1468
+ expect {
1469
+ E11y.configure do |config|
1470
+ config.metrics do
1471
+ counter_for pattern: 'test',
1472
+ tags: [:user_id]
1473
+ end
1474
+ end
1475
+ }.to raise_error(E11y::ForbiddenLabelError, /user_id/)
1476
+ end
1477
+ end
1478
+
1479
+ describe 'cardinality limits' do
1480
+ it 'drops overflow events' do
1481
+ E11y.configure do |config|
1482
+ config.metrics do
1483
+ cardinality_limit_for 'test_metric', max: 3
1484
+ overflow_strategy :drop
1485
+ end
1486
+ end
1487
+
1488
+ # Track 5 unique label values (exceeds limit of 3)
1489
+ 5.times do |i|
1490
+ Events::TestEvent.track(category: "cat_#{i}")
1491
+ end
1492
+
1493
+ metric = Yabeda.test_metric
1494
+ # Expect only 3 unique (2 dropped)
1495
+ expect(metric.values.keys.size).to eq(3)
1496
+
1497
+ # Verify drop counter incremented
1498
+ expect(Yabeda.e11y_internal.metric_overflow_events_total.get({ metric: 'test_metric' })).to be > 0
1499
+ end
1500
+ end
1501
+
1502
+ describe 'self-monitoring' do
1503
+ it 'tracks cardinality ratio' do
1504
+ E11y.configure do |config|
1505
+ config.metrics do
1506
+ cardinality_limit_for 'test_metric', max: 100
1507
+ end
1508
+ end
1509
+
1510
+ 50.times { |i| Events::TestEvent.track(category: "cat_#{i}") }
1511
+
1512
+ ratio = Yabeda.e11y_internal.metric_cardinality_ratio.get(
1513
+ { metric: 'test_metric' }
1514
+ )
1515
+ expect(ratio).to eq(0.5) # 50/100
1516
+ end
1517
+ end
1518
+ end
1519
+ ```
1520
+
1521
+ ---
1522
+
1523
+ ## 💡 Best Practices
1524
+
1525
+ ### ✅ DO
1526
+
1527
+ **1. Use aggregation for high-cardinality dimensions**
1528
+ ```ruby
1529
+ # ✅ GOOD
1530
+ tags: [:user_segment] # free, paid, enterprise (3 values)
1531
+ ```
1532
+
1533
+ **2. Monitor cardinality proactively**
1534
+ ```ruby
1535
+ # ✅ GOOD
1536
+ cardinality_monitoring do
1537
+ warn_threshold 0.7
1538
+ alert_channel '#observability'
1539
+ end
1540
+ ```
1541
+
1542
+ **3. Use exemplars for debugging**
1543
+ ```ruby
1544
+ # ✅ GOOD
1545
+ exemplars: { trace_id: ->(e) { e.trace_id } }
1546
+ exemplar_sample_rate: 0.01
1547
+ ```
1548
+
1549
+ ---
1550
+
1551
+ ### ❌ DON'T
1552
+
1553
+ **1. Don't use unbounded identifiers as labels**
1554
+ ```ruby
1555
+ # ❌ BAD
1556
+ tags: [:user_id, :order_id, :session_id]
1557
+ ```
1558
+
1559
+ **2. Don't ignore cardinality warnings**
1560
+ ```ruby
1561
+ # ❌ BAD: Ignoring production alerts
1562
+ # [E11y WARNING] Metric at 95% of limit
1563
+ # → Action: Aggregate or increase limit immediately!
1564
+ ```
1565
+
1566
+ **3. Don't use timestamps as labels**
1567
+ ```ruby
1568
+ # ❌ BAD
1569
+ tags: [:timestamp, :created_at]
1570
+ # Record durations or ages as histogram values instead!
1571
+ ```
1572
+
1573
+ ---
1574
+
1575
+ ## 💰 Cost Calculator
1576
+
1577
+ ```ruby
1578
+ # Calculate your potential savings
1579
+ def calculate_cardinality_cost(
1580
+ services:,
1581
+ dimensions:,
1582
+ values_per_dimension:,
1583
+ cost_per_series: 0.068 # assumed $ per series per month; check your vendor's current pricing
1584
+ )
1585
+ total_series = dimensions.map { |d| values_per_dimension[d] }.reduce(:*)
1586
+ total_series *= services
1587
+
1588
+ monthly_cost = total_series * cost_per_series
1589
+
1590
+ {
1591
+ total_series: total_series,
1592
+ monthly_cost: monthly_cost,
1593
+ yearly_cost: monthly_cost * 12
1594
+ }
1595
+ end
1596
+
1597
+ # Example: E-commerce app
1598
+ before = calculate_cardinality_cost(
1599
+ services: 50,
1600
+ dimensions: [:user_id, :product_id, :action],
1601
+ values_per_dimension: {
1602
+ user_id: 100_000,
1603
+ product_id: 10_000,
1604
+ action: 10
1605
+ }
1606
+ )
1607
+ # => 500B series (50 × 100,000 × 10,000 × 10), ~$34B/month! 😱
1608
+
1609
+ after = calculate_cardinality_cost(
1610
+ services: 50,
1611
+ dimensions: [:user_segment, :product_category, :action],
1612
+ values_per_dimension: {
1613
+ user_segment: 3,
1614
+ product_category: 20,
1615
+ action: 10
1616
+ }
1617
+ )
1618
+ # => 30k series, $2k/month ✅
1619
+ # SAVINGS: ~$34B/month (>99.9999% reduction; the 'before' case is a theoretical worst case)
1620
+ ```
1621
+
1622
+ ---
1623
+
1624
+ ## ❓ Frequently Asked Questions
1625
+
1626
+ > **Technical Details:** See [ADR-002 Section 11: FAQ & Critical Clarifications](../ADR-002-metrics-yabeda.md#11-faq--critical-clarifications) for architectural rationale.
1627
+
1628
+ ### Q1: Does cardinality protection apply to all my logs and metrics?
1629
+
1630
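+ **A: No. Full payloads always reach logs; limits apply to label-like dimensions: metric labels (Prometheus/Yabeda) and, since the C04 resolution, OpenTelemetry attributes and Loki stream labels.**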
1631
+
1632
+ This is a common source of confusion. Let's clarify:
1633
+
1634
+ ```ruby
1635
+ # Same event, different treatment:
1636
+ Events::OrderCreated.track(
1637
+ user_id: '123', # High-cardinality
1638
+ status: 'paid', # Low-cardinality
1639
+ amount: 99.99
1640
+ )
1641
+ ```
1642
+
1643
+ **What happens:**
1644
+
1645
+ | Adapter | `user_id` | `status` | `amount` | Why |
1646
+ |---------|-----------|----------|----------|-----|
1647
+ | **Prometheus** | ❌ Dropped (denylist) | ✅ Kept | ❌ Dropped (value, not label) | Cardinality protection active |
1648
+ | **Loki (logs)** | ✅ Kept | ✅ Kept | ✅ Kept | No cardinality limits |
1649
+ | **Sentry** | ✅ Kept | ✅ Kept | ✅ Kept | Full context needed for debugging |
1650
+ | **Audit** | ✅ Kept | ✅ Kept | ✅ Kept | Compliance requires full data |
1651
+
1652
+ **Why this design?**
1653
+
1654
+ - **Metrics (Prometheus):** Cardinality explosions are catastrophic (cost, performance, query failures)
1655
+ - **Logs (Loki):** High-cardinality fields are fine in the log body (Loki indexes only stream labels, not log content, so unique values don't create new streams)
1656
+ - **Error tracking (Sentry):** Need full context to debug issues
1657
+ - **Audit trails:** Regulatory compliance requires complete data
1658
+
1659
+ **Practical implication:**
1660
+
1661
+ ```ruby
1662
+ # ✅ This is SAFE and RECOMMENDED:
1663
+ Events::ApiRequest.track(
1664
+ request_id: SecureRandom.uuid, # High-cardinality, but OK!
1665
+ endpoint: '/api/users',
1666
+ user_id: current_user.id
1667
+ )
1668
+
1669
+ # Result:
1670
+ # - Metrics: only endpoint tracked (request_id/user_id dropped)
1671
+ # - Logs: full payload with request_id for debugging
1672
+ # - Best of both worlds!
1673
+ ```
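One way to picture this split is a single payload taking two paths. The sketch below is illustrative only: `metric_labels`, `log_fields` and `DENYLIST` are hypothetical stand-ins, not E11y's adapter code.

```ruby
# Hypothetical illustration (not E11y's adapter code): the same payload
# yields a reduced label set for metrics and the full hash for logs.
DENYLIST = %i[user_id request_id session_id].freeze

# Metrics: keep only configured tags that are not denylisted.
def metric_labels(payload, tags)
  payload.slice(*tags).reject { |key, _value| DENYLIST.include?(key) }
end

# Logs: pass every field through untouched.
def log_fields(payload)
  payload
end

payload = { request_id: 'c0ffee', endpoint: '/api/users', user_id: 42 }
p metric_labels(payload, %i[endpoint request_id user_id])
p log_fields(payload)
```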
1674
+
1675
+ ---
1676
+
1677
+ ### Q2: Are the 4 layers checked simultaneously or one-by-one?
1678
+
1679
+ **A: One-by-one (sequential waterfall), not simultaneously.**
1680
+
1681
+ This is critical to understand for debugging and configuration:
1682
+
1683
+ ```
1684
+ Processing order for each label:
+
+ ┌─────────────────────────────────────────┐
+ │ 1. Layer 1: Denylist Check              │
+ │ ↓ In denylist? → DROP, stop here        │
+ │ ↓ Not in denylist? → Continue to L2     │
+ └─────────────────────────────────────────┘
+                      ↓
+ ┌─────────────────────────────────────────┐
+ │ 2. Layer 2: Allowlist Check             │
+ │ ↓ In allowlist? → KEEP, stop here       │
+ │ ↓ Not in allowlist? → Continue to L3    │
+ └─────────────────────────────────────────┘
+                      ↓
+ ┌─────────────────────────────────────────┐
+ │ 3. Layer 3: Cardinality Limit           │
+ │ ↓ Under limit? → KEEP, stop here        │
+ │ ↓ Over limit? → Continue to L4          │
+ └─────────────────────────────────────────┘
+                      ↓
+ ┌─────────────────────────────────────────┐
+ │ 4. Layer 4: Dynamic Action              │
+ │ ↓ Apply configured action:              │
+ │     drop / alert / relabel              │
+ └─────────────────────────────────────────┘
1709
+ ```
1710
+
1711
+ **Example trace through all layers:**
1712
+
1713
+ ```ruby
1714
+ # Event: { user_id: '123', status: 'paid', tier: 'premium' }
1715
+
1716
+ # Label: user_id
1717
+ # Layer 1: in FORBIDDEN_LABELS → ❌ DROP (stop here, never reaches L2-L4)
+
+ # Label: status
+ # Layer 1: not in FORBIDDEN_LABELS → continue to L2
+ # Layer 2: in SAFE_LABELS → ✅ KEEP (stop here, skip L3-L4)
+
+ # Label: tier
+ # Layer 1: not in FORBIDDEN_LABELS → continue to L2
+ # Layer 2: not in SAFE_LABELS → continue to L3
+ # Layer 3: cardinality 150 > limit 100 → continue to L4
+ # Layer 4: action=drop → ❌ DROP
1728
+
1729
+ # Final metric:
1730
+ # orders_total{status="paid"} 1
1731
+ # (user_id and tier dropped)
1732
+ ```
1733
+
1734
+ **Why sequential (not simultaneous)?**
1735
+
1736
+ - **Performance:** Early exit on denylist (L1) avoids expensive cardinality checks (L3)
1737
+ - **Predictability:** Clear precedence (denylist > allowlist > cardinality > action)
1738
+ - **Debuggability:** Easy to trace which layer made the decision
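The waterfall can be sketched as a small Ruby class. This is a hypothetical illustration, not E11y's implementation: `CardinalityWaterfall` and its parameters are assumptions, Layer 3 counts distinct values per label in memory, and any Layer 4 action other than keep removes the label (relabeling is left out).

```ruby
require 'set'

# Hypothetical sketch of the 4-layer waterfall (not E11y's implementation).
# Each label is checked layer by layer; the first layer that decides wins.
class CardinalityWaterfall
  def initialize(denylist:, allowlist:, limit:, action: :drop)
    @denylist  = denylist
    @allowlist = allowlist
    @limit     = limit
    @action    = action
    @seen      = Hash.new { |hash, label| hash[label] = Set.new } # label => distinct values
  end

  # Returns [decision, layer] for one label/value pair.
  def decide(label, value)
    return [:drop, 1] if @denylist.include?(label)  # Layer 1: denylist
    return [:keep, 2] if @allowlist.include?(label) # Layer 2: allowlist

    values = @seen[label]                            # Layer 3: cardinality limit
    if values.include?(value) || values.size < @limit
      values << value
      return [:keep, 3]
    end

    [@action, 4]                                     # Layer 4: overflow action
  end

  # Applies the waterfall to every label and keeps only :keep results.
  def filter(labels)
    labels.select { |label, value| decide(label, value).first == :keep }
  end
end

waterfall = CardinalityWaterfall.new(denylist: [:user_id], allowlist: [:status], limit: 2)
p waterfall.filter({ user_id: '123', status: 'paid', tier: 'free' })
```

Returning the deciding layer alongside the decision is what makes the "which layer dropped it?" question answerable later.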
1739
+
1740
+ ---
1741
+
1742
+ ### Q3: What should I do when I hit a cardinality limit?
1743
+
1744
+ **A: Use relabeling if possible, otherwise drop the label.**
1745
+
1746
+ Use this decision process:
1747
+
1748
+ **Step 1: Can you group values into clear categories?**
1749
+
1750
+ ```ruby
1751
+ # ✅ YES - Use relabeling (best signal preservation):
1752
+
1753
+ # Example 1: HTTP status codes (200, 201, 204...) → status classes (2xx, 3xx...)
1754
+ tag_extractors: {
1755
+ status_class: ->(event) {
1756
+ case event.payload[:http_status].to_i
1757
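+ when 100..199 then '1xx'
+ when 200..299 then '2xx'
+ when 300..399 then '3xx'
+ when 400..499 then '4xx'
+ when 500..599 then '5xx'
+ else 'unknown' # never nil: extractors that return nil are rejected at runtime
+ end
+ }
+ }
+ # Result: ~50 distinct codes → 5 classes plus 'unknown' (~90% reduction)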
1764
+
1765
+ # Example 2: Paths (/users/123, /users/456...) → endpoint patterns (/users/:id)
1766
+ tag_extractors: {
1767
+ endpoint: ->(event) {
1768
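+ # Replace only whole numeric path segments: '/api/v2/users/123' → '/api/v2/users/:id'
+ event.payload[:path].gsub(%r{/\d+(?=/|\z)}, '/:id')
+ }
+ }
+ # Result: unbounded values → ~100 endpoint patterns (one per route)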
1772
+ ```
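To see what relabeling does to cardinality, compare distinct raw values with distinct relabeled values over a sample. A minimal sketch, assuming a `Struct` stand-in for an event (only `#payload` is used) and the same whole-segment regex as the endpoint extractor; `FakeEvent` and `normalize_endpoint` are hypothetical names.

```ruby
# Stand-in for an E11y event: the extractor only needs #payload.
FakeEvent = Struct.new(:payload)

# Whole numeric path segments become ':id'; segments like 'v2' are kept.
normalize_endpoint = ->(event) { event.payload[:path].gsub(%r{/\d+(?=/|\z)}, '/:id') }

paths  = %w[/users/1 /users/2 /users/3/orders/7 /users/4/orders/9 /api/v2/health]
events = paths.map { |path| FakeEvent.new({ path: path }) }

raw_values       = events.map { |event| event.payload[:path] }.uniq
relabeled_values = events.map(&normalize_endpoint).uniq

puts "raw: #{raw_values.size}, relabeled: #{relabeled_values.size}"
p relabeled_values
```

Every new user or order ID adds a raw value, while the relabeled set grows only when a new route appears.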
1773
+
1774
+ **Step 2: Is this label critical for alerts/dashboards?**
1775
+
1776
+ ```ruby
1777
+ # ❌ NO - Just drop it (keep in logs):
1778
+ cardinality_limit_for 'api.requests' do
1779
+ max_cardinality 100
1780
+ overflow_strategy :drop # Silent drop
1781
+ end
1782
+
1783
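+ # Result: once the label exceeds 100 distinct values, new values are dropped from
+ # metrics; the full value stays in logs. For labels never needed in metrics
+ # (e.g. request_id), remove them from `tags:` entirely instead.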
1784
+ ```
1785
+
1786
+ **Step 3: Is this an unexpected cardinality spike?**
1787
+
1788
+ ```ruby
1789
+ # ✅ YES - Alert ops team:
1790
+ cardinality_limit_for 'payments.processed' do
1791
+ max_cardinality 50
1792
+ overflow_strategy :alert # Alert + drop
1793
+ end
1794
+
1795
+ # Scenario: Suddenly 1000 unique payment_method values
1796
+ # → Alert sent to PagerDuty/Slack
1797
+ # → Ops investigates (possible bug, attack, data corruption)
1798
+ ```
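The difference between `:drop` and `:alert` can be sketched as a guard that tracks the distinct values of one label. `OverflowGuard`, `admit` and the `notifier` callback are hypothetical (in practice the alert would go to PagerDuty/Slack). Both strategies drop the new value once the limit is reached; `:alert` also notifies.

```ruby
require 'set'

# Hypothetical sketch of the two overflow strategies (not E11y's API).
# One guard tracks the distinct values of a single label.
class OverflowGuard
  attr_reader :alerts

  def initialize(max_cardinality:, overflow_strategy:, notifier: nil)
    @max      = max_cardinality
    @strategy = overflow_strategy
    @notifier = notifier || ->(message) { warn(message) }
    @values   = Set.new
    @alerts   = 0
  end

  # Returns the value to use as a label, or nil when it must be dropped.
  def admit(label, value)
    return value if @values.include?(value)

    if @values.size < @max
      @values << value
      return value
    end

    if @strategy == :alert
      @alerts += 1
      @notifier.call("cardinality limit #{@max} exceeded for #{label}: #{value.inspect}")
    end
    nil # both strategies drop the value once the limit is reached
  end
end

messages = []
guard = OverflowGuard.new(max_cardinality: 2, overflow_strategy: :alert,
                          notifier: ->(msg) { messages << msg })
%w[card paypal crypto].each { |payment_method| guard.admit(:payment_method, payment_method) }
puts messages
```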
1799
+
1800
+ **Common patterns:**
1801
+
1802
+ | Your Situation | Recommended Action | Example |
1803
+ |----------------|-------------------|---------|
1804
+ | Label not needed for analysis | **DROP** | `request_id`, `trace_id` → keep in logs only |
1805
+ | Clear categories exist | **RELABEL** | `http_status: 200` → `status_class: 2xx` |
1806
+ | Cardinality should be stable | **ALERT** | Payment methods suddenly spike to 1000 values |
1807
+ | Need debugging context | **Keep in logs** | Drop from metrics, query logs when debugging |
1808
+
1809
+ **Anti-pattern to avoid:**
1810
+
1811
+ ```ruby
1812
+ # ❌ DON'T: Keep high-cardinality labels in metrics
1813
+ counter_for pattern: 'api.request',
1814
+ tags: [:endpoint, :user_id] # user_id = millions of values!
1815
+
1816
+ # ✅ DO: Drop from metrics, keep in logs
1817
+ counter_for pattern: 'api.request',
1818
+ tags: [:endpoint] # Only low-cardinality tags
1819
+
1820
+ # user_id still available in Loki logs for debugging:
1821
+ # 2024-01-15 10:23:45 | api.request | endpoint=/api/users user_id=123 status=200
1822
+ ```
1823
+
1824
+ ---
1825
+
1826
+ ### Q4: How do I debug which layer dropped my label?
1827
+
1828
+ **A: Check E11y's built-in cardinality metrics:**
1829
+
1830
+ ```ruby
1831
+ # See which labels are being dropped:
1832
+ Yabeda.e11y_internal.cardinality_dropped_labels_total.values
1833
+ # => { metric: 'api_requests_total', label: 'user_id', reason: 'denylist' } => 1523
1834
+ # => { metric: 'api_requests_total', label: 'session_id', reason: 'limit_exceeded' } => 42
1835
+
1836
+ # See which layer made the decision:
1837
+ # - reason: 'denylist' → Layer 1
1838
+ # - reason: 'not_in_allowlist' → Layer 2 (if allowlist configured)
1839
+ # - reason: 'limit_exceeded' → Layer 3 (limit exceeded; dropped by the Layer 4 action)
1840
+ ```
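To see which layer is responsible for most drops, the counts can be rolled up by `reason`. A minimal sketch, assuming the metric's values are available as a hash of label sets to counts (the shape shown above); `drops_by_layer` and `REASON_TO_LAYER` are hypothetical names.

```ruby
# Hypothetical helper: groups dropped-label counts by the layer that made
# the decision. Input shape: { { metric:, label:, reason: } => count }.
REASON_TO_LAYER = {
  'denylist'         => 'Layer 1',
  'not_in_allowlist' => 'Layer 2',
  'limit_exceeded'   => 'Layer 3'
}.freeze

def drops_by_layer(values)
  values.each_with_object(Hash.new(0)) do |(tags, count), totals|
    totals[REASON_TO_LAYER.fetch(tags[:reason], 'unknown')] += count
  end
end

dropped = {
  { metric: 'api_requests_total', label: 'user_id',    reason: 'denylist' }       => 1523,
  { metric: 'api_requests_total', label: 'session_id', reason: 'limit_exceeded' } => 42,
  { metric: 'orders_total',       label: 'order_id',   reason: 'denylist' }       => 7
}

p drops_by_layer(dropped)
```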
1841
+
1842
+ **Prometheus queries for debugging:**
1843
+
1844
+ ```promql
1845
+ # Which metrics are dropping labels most frequently?
+ topk(10, sum by (metric) (rate(e11y_cardinality_dropped_labels_total[5m])))
+
+ # Which labels are being dropped (last hour)?
+ sum by (label, reason) (increase(e11y_cardinality_dropped_labels_total[1h]))
1850
+
1851
+ # Alert on unexpected drops:
1852
+ rate(e11y_cardinality_dropped_labels_total{reason="limit_exceeded"}[5m]) > 10
1853
+ ```
1854
+
1855
+ **Development/staging debugging:**
1856
+
1857
+ ```ruby
1858
+ # Temporarily log all cardinality decisions:
1859
+ E11y.configure do |config|
1860
+ config.metrics.cardinality_protection do
1861
+ debug_mode true # Logs every decision (verbose!)
1862
+ end
1863
+ end
1864
+
1865
+ # Output:
1866
+ # [E11y] Cardinality: KEEP label 'status' (Layer 2: allowlist)
1867
+ # [E11y] Cardinality: DROP label 'user_id' (Layer 1: denylist)
1868
+ # [E11y] Cardinality: DROP label 'tier' (Layer 3: limit 150/100, action=drop)
1869
+ ```
1870
+
1871
+ ---
1872
+
1873
+ ## 🔒 Validations (NEW - v1.1)
1874
+
1875
+ > **🎯 Pattern:** Validate cardinality configuration at class load time (tag extractor output is validated at runtime).
1876
+
1877
+ ### Cardinality Limit Validation
1878
+
1879
+ **Problem:** Invalid cardinality limits → metric explosion.
1880
+
1881
+ **Solution:** Validate that `cardinality_limit` is an integer from 1 to 10,000:
1882
+
1883
+ ```ruby
1884
+ # Gem implementation (automatic):
1885
+ def self.cardinality_limit(max)
1886
+ unless max.is_a?(Integer) && max > 0 && max <= 10_000
1887
+ raise ArgumentError, "cardinality_limit must be 1..10_000, got: #{max.inspect}"
1888
+ end
1889
+ self._cardinality_limit = max
1890
+ end
1891
+
1892
+ # Result:
1893
+ class Events::UserAction < E11y::Event::Base
1894
+ metric :counter, name: 'actions_total', cardinality_limit: -100
1895
+ # ← ERROR: "cardinality_limit must be 1..10_000, got: -100"
1896
+ end
1897
+ ```
1898
+
1899
+ ### Forbidden Labels Validation
1900
+
1901
+ **Problem:** Using high-cardinality labels → cost explosion.
1902
+
1903
+ **Solution:** Validate against denylist:
1904
+
1905
+ ```ruby
1906
+ # Gem implementation (automatic):
1907
+ FORBIDDEN_LABELS = [:user_id, :order_id, :session_id, :trace_id, :request_id]
1908
+
1909
+ def self.metric(type, name:, tags:, **options)
1910
+ forbidden = tags & FORBIDDEN_LABELS
1911
+ if forbidden.any?
1912
+ raise ArgumentError, "Forbidden high-cardinality labels: #{forbidden.join(', ')}. Use aggregation instead!"
1913
+ end
1914
+ # ...
1915
+ end
1916
+
1917
+ # Result:
1918
+ class Events::UserAction < E11y::Event::Base
1919
+ metric :counter, name: 'actions_total', tags: [:user_id, :action_type]
1920
+ # ← ERROR: "Forbidden high-cardinality labels: user_id. Use aggregation instead!"
1921
+ end
1922
+ ```
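Both class-load checks can be exercised without the gem. The sketch below uses a hypothetical `SketchEventBase` stand-in (not `E11y::Event::Base`), assumes the convention default of 100 when no limit is given, and raises the same error messages shown above.

```ruby
# Minimal stand-in (hypothetical, not E11y::Event::Base) that runs both
# class-load validations whenever a metric is declared.
class SketchEventBase
  FORBIDDEN_LABELS = %i[user_id order_id session_id trace_id request_id].freeze

  class << self
    attr_reader :metrics

    def metric(type, name:, tags: [], cardinality_limit: 100)
      validate_cardinality_limit!(cardinality_limit)
      forbidden = tags & FORBIDDEN_LABELS
      unless forbidden.empty?
        raise ArgumentError, "Forbidden high-cardinality labels: #{forbidden.join(', ')}. Use aggregation instead!"
      end

      (@metrics ||= []) << { type: type, name: name, tags: tags, cardinality_limit: cardinality_limit }
    end

    private

    def validate_cardinality_limit!(max)
      return if max.is_a?(Integer) && max.between?(1, 10_000)

      raise ArgumentError, "cardinality_limit must be 1..10_000, got: #{max.inspect}"
    end
  end
end

begin
  # The block is class-evaluated, so `metric` runs at "class load" time.
  Class.new(SketchEventBase) { metric :counter, name: 'actions_total', tags: [:user_id] }
rescue ArgumentError => e
  puts e.message
end
```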
1923
+
1924
+ ### Tag Extractors Validation
1925
+
1926
+ **Problem:** Tag extractors returning nil → metric gaps.
1927
+
1928
+ **Solution:** Validate extractor return values:
1929
+
1930
+ ```ruby
1931
+ # Gem implementation (runtime):
1932
+ def extract_tag_value(event, extractor)
1933
+ value = extractor.call(event)
1934
+ if value.nil? || value.to_s.empty?
1935
+ raise ArgumentError, "Tag extractor returned nil/empty for event: #{event.name}"
1936
+ end
1937
+ value.to_s
1938
+ end
1939
+ ```
1940
+
1941
+ ---
1942
+
1943
+ ## 🌍 Environment-Specific Cardinality Protection (NEW - v1.1)
1944
+
1945
+ > **🎯 Pattern:** Different cardinality limits per environment.
1946
+
1947
+ ### Example 1: Stricter Limits in Production
1948
+
1949
+ ```ruby
1950
+ class Events::UserAction < E11y::Event::Base
1951
+ schema do
1952
+ required(:user_id).filled(:string)
1953
+ required(:action_type).filled(:string)
1954
+ end
1955
+
1956
+ # Environment-specific cardinality limits
1957
+ metric :counter,
1958
+ name: 'user_actions_total',
1959
+ tags: [:user_segment, :action_type],
1960
+ cardinality_limit: Rails.env.production? ? 100 : 1_000,
1961
+ tag_extractors: {
1962
+ user_segment: ->(event) {
1963
+ if Rails.env.production?
1964
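+ # Production: strict aggregation
+ # NOTE: User.find runs one DB query per tracked event; on hot paths,
+ # consider passing the segment in the payload instead
+ User.find(event.payload[:user_id]).segment # 'free', 'paid', 'enterprise'
+ else
+ # Dev/test: allow user_id for debugging (the tag is still named user_segment,
+ # so the forbidden-labels check won't flag it)
+ event.payload[:user_id]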
1969
+ end
1970
+ }
1971
+ }
1972
+ end
1973
+ ```
1974
+
1975
+ ### Example 2: Feature Flag for Cardinality Protection
1976
+
1977
+ ```ruby
1978
+ class Events::ApiRequest < E11y::Event::Base
1979
+ schema do
1980
+ required(:endpoint).filled(:string)
1981
+ required(:user_id).filled(:string)
1982
+ end
1983
+
1984
+ # Enable cardinality protection only when flag is on
1985
+ if ENV['ENABLE_CARDINALITY_PROTECTION'] == 'true'
1986
+ metric :counter,
1987
+ name: 'api_requests_total',
1988
+ tags: [:endpoint_group], # Aggregated
1989
+ cardinality_limit: 50,
1990
+ tag_extractors: {
1991
+ endpoint_group: ->(event) {
1992
+ # Group /users/123 → /users/:id
1993
+ event.payload[:endpoint].gsub(%r{/\d+(?=/|\z)}, '/:id') # whole numeric segments only
1994
+ }
1995
+ }
1996
+ else
1997
+ # Flag off (e.g. development): no aggregation, so raw endpoints are unbounded
1998
+ metric :counter,
1999
+ name: 'api_requests_total',
2000
+ tags: [:endpoint] # Full endpoint
2001
+ end
2002
+ end
2003
+ ```
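If more environments need their own limits, one option is to keep them in a single table rather than repeating the ternary in each event class. A sketch with hypothetical names (`CARDINALITY_LIMITS`, `DEFAULT_CARDINALITY_LIMIT`, `limit_for_env`), using the production and non-production values from Example 1:

```ruby
# Hypothetical helper (not part of E11y): one table of per-environment
# limits instead of `Rails.env.production? ? 100 : 1_000` in every event.
CARDINALITY_LIMITS = { 'production' => 100 }.freeze
DEFAULT_CARDINALITY_LIMIT = 1_000 # looser limit for dev/test, as in Example 1

def limit_for_env(env)
  CARDINALITY_LIMITS.fetch(env.to_s, DEFAULT_CARDINALITY_LIMIT)
end

# In a Rails app this would be called as limit_for_env(Rails.env).
puts limit_for_env('production')
puts limit_for_env(:development)
```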
2004
+
2005
+ ---
2006
+
2007
+ ## 📊 Precedence Rules for Cardinality Protection (NEW - v1.1)
2008
+
2009
+ > **🎯 Pattern:** Cardinality configuration precedence (most specific wins).
2010
+
2011
+ ### Precedence Order (Highest to Lowest)
2012
+
2013
+ ```
2014
+ 1. Event-level explicit config (highest priority)
2015
+
2016
+ 2. Preset module config
2017
+
2018
+ 3. Base class config (inheritance)
2019
+
2020
+ 4. Convention-based defaults (100 series)
2021
+
2022
+ 5. Global config (lowest priority)
2023
+ ```
2024
+
2025
+ ### Example: Mixing Inheritance + Presets for Cardinality
2026
+
2027
+ ```ruby
2028
+ # Global config (lowest priority)
2029
+ E11y.configure do |config|
2030
+ config.metrics do
2031
+ cardinality_limit 1_000 # Default for all metrics
2032
+ forbidden_labels :user_id, :session_id
2033
+ end
2034
+ end
2035
+
2036
+ # Base class (medium priority)
2037
+ class Events::BaseUserEvent < E11y::Event::Base
2038
+ # Common cardinality protection
2039
+ metric :counter,
2040
+ name: 'user_events_total',
2041
+ tags: [:user_segment, :event_type],
2042
+ cardinality_limit: 100, # Override global (stricter)
2043
+ tag_extractors: {
2044
+ user_segment: ->(event) { User.find(event.payload[:user_id]).segment }
2045
+ }
2046
+ end
2047
+
2048
+ # Preset module (higher priority)
2049
+ module E11y::Presets::MetricSafeEvent
2050
+ extend ActiveSupport::Concern
2051
+ included do
2052
+ # Override cardinality limit
2053
+ metric :counter,
2054
+ name: 'safe_events_total',
2055
+ tags: [:severity],
2056
+ cardinality_limit: 10 # Very strict!
2057
+ end
2058
+ end
2059
+
2060
+ # Event (highest priority)
2061
+ class Events::UserLogin < Events::BaseUserEvent
2062
+ include E11y::Presets::MetricSafeEvent
2063
+
2064
+ # Override preset (looser limit)
2065
+ metric :counter,
2066
+ name: 'user_logins_total',
2067
+ tags: [:user_segment, :login_method],
2068
+ cardinality_limit: 50 # Override preset
2069
+
2070
+ # Final config:
2071
+ # - cardinality_limit: 50 (event-level override)
2072
+ # - tags: [:user_segment, :login_method] (event-level)
2073
+ # - tag_extractors: inherited from base
2074
+ end
2075
+ ```
2076
+
2077
+ ### Precedence Rules Table
2078
+
2079
+ | Config | Global | Convention | Base Class | Preset | Event-Level | Winner |
2080
+ |--------|--------|------------|------------|--------|-------------|--------|
2081
+ | `cardinality_limit` | `1_000` | `100` | `100` | `10` | `50` | **`50`** (event) |
2082
+ | `tags` | - | - | `[:user_segment, :event_type]` | `[:severity]` | `[:user_segment, :login_method]` | **`[:user_segment, :login_method]`** (event) |
2083
+ | `forbidden_labels` | `[:user_id, :session_id]` | - | - | - | - | **`[:user_id, :session_id]`** (global) |
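The "most specific wins" rule behaves like merging config layers from lowest to highest priority. A minimal sketch, assuming a shallow per-key merge (nested hashes such as `tag_extractors` are replaced as a whole, not deep-merged); `PRECEDENCE` and `resolve_config` are hypothetical, and the values mirror the table above.

```ruby
# Hypothetical sketch: merge config layers from lowest to highest priority,
# so each more specific layer overrides only the keys it sets.
PRECEDENCE = %i[global convention base_class preset event].freeze # lowest → highest

def resolve_config(layers)
  PRECEDENCE.reduce({}) { |resolved, level| resolved.merge(layers.fetch(level, {})) }
end

resolved = resolve_config({
  global:     { cardinality_limit: 1_000, forbidden_labels: %i[user_id session_id] },
  convention: { cardinality_limit: 100 },
  base_class: { cardinality_limit: 100, tags: %i[user_segment event_type],
                tag_extractors: { user_segment: :segment_lookup } },
  preset:     { cardinality_limit: 10, tags: %i[severity] },
  event:      { cardinality_limit: 50, tags: %i[user_segment login_method] }
})
p resolved
```

Per-key merging is what lets `forbidden_labels` survive from the global layer while `cardinality_limit` and `tags` come from the event and `tag_extractors` from the base class.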
2084
+
2085
+ ### Convention-Based Defaults
2086
+
2087
+ **Convention:** If no `cardinality_limit` is specified, it defaults to `100` series:
2088
+
2089
+ ```ruby
2090
+ class Events::ApiRequest < E11y::Event::Base
2091
+ metric :counter, name: 'api_requests_total', tags: [:status]
2092
+ # ← Auto: cardinality_limit = 100 (convention!)
2093
+ end
2094
+ ```
2095
+
2096
+ ---
2097
+
2098
+ ## 📚 Related Use Cases
2099
+
2100
+ - **[UC-003: Pattern-Based Metrics](./UC-003-pattern-based-metrics.md)** - Auto-generate metrics
2101
+ - **[UC-008: OpenTelemetry Integration](./UC-008-opentelemetry-integration.md)** - OTLP cardinality protection (C04)
2102
+ - **[UC-015: Cost Optimization](./UC-015-cost-optimization.md)** - Reduce observability costs
2103
+
2104
+ ---
2105
+
2106
+ ## 🎯 Summary
2107
+
2108
+ ### E11y's Competitive Advantage
2109
+
2110
+ **ONLY Ruby gem with production-grade cardinality protection:**
2111
+
2112
+ | Feature | Yabeda | OTel Ruby | AppSignal | E11y |
2113
+ |---------|--------|-----------|-----------|------|
2114
+ | Forbidden labels | ❌ | ❌ | ❌ | ✅ |
2115
+ | Cardinality limits | ❌ | Basic (2000) | Vendor-specific | ✅ 4-layer defense |
2116
+ | Auto-aggregation | ❌ | ❌ | ❌ | ✅ |
2117
+ | Exemplars | ❌ | ❌ | ❌ | ✅ |
2118
+ | Self-monitoring | ❌ | Partial | Vendor-specific | ✅ 8+ metrics |
2119
+ | Cost reduction | 0% | ~30% | Vendor lock-in | **99%** |
2120
+
2121
+ **Real-world impact:** $67,320/month savings (99% reduction)
2122
+
2123
+ ---
2124
+
2125
+ **Document Version:** 1.1 (Unified DSL)
2126
+ **Last Updated:** January 16, 2026
2127
+ **Status:** ✅ Complete - Consistent with DSL-SPECIFICATION.md v1.1.0