e11y 0.1.0

Files changed (157)
  1. checksums.yaml +7 -0
  2. data/.rspec +4 -0
  3. data/.rubocop.yml +69 -0
  4. data/CHANGELOG.md +26 -0
  5. data/CODE_OF_CONDUCT.md +64 -0
  6. data/LICENSE.txt +21 -0
  7. data/README.md +179 -0
  8. data/Rakefile +37 -0
  9. data/benchmarks/run_all.rb +33 -0
  10. data/config/README.md +83 -0
  11. data/config/loki-local-config.yaml +35 -0
  12. data/config/prometheus.yml +15 -0
  13. data/docker-compose.yml +78 -0
  14. data/docs/00-ICP-AND-TIMELINE.md +483 -0
  15. data/docs/01-SCALE-REQUIREMENTS.md +858 -0
  16. data/docs/ADR-001-architecture.md +2617 -0
  17. data/docs/ADR-002-metrics-yabeda.md +1395 -0
  18. data/docs/ADR-003-slo-observability.md +3337 -0
  19. data/docs/ADR-004-adapter-architecture.md +2385 -0
  20. data/docs/ADR-005-tracing-context.md +1372 -0
  21. data/docs/ADR-006-security-compliance.md +4143 -0
  22. data/docs/ADR-007-opentelemetry-integration.md +1385 -0
  23. data/docs/ADR-008-rails-integration.md +1911 -0
  24. data/docs/ADR-009-cost-optimization.md +2993 -0
  25. data/docs/ADR-010-developer-experience.md +2166 -0
  26. data/docs/ADR-011-testing-strategy.md +1836 -0
  27. data/docs/ADR-012-event-evolution.md +958 -0
  28. data/docs/ADR-013-reliability-error-handling.md +2750 -0
  29. data/docs/ADR-014-event-driven-slo.md +1533 -0
  30. data/docs/ADR-015-middleware-order.md +1061 -0
  31. data/docs/ADR-016-self-monitoring-slo.md +1234 -0
  32. data/docs/API-REFERENCE-L28.md +914 -0
  33. data/docs/COMPREHENSIVE-CONFIGURATION.md +2366 -0
  34. data/docs/IMPLEMENTATION_NOTES.md +2804 -0
  35. data/docs/IMPLEMENTATION_PLAN.md +1971 -0
  36. data/docs/IMPLEMENTATION_PLAN_ARCHITECTURE.md +586 -0
  37. data/docs/PLAN.md +148 -0
  38. data/docs/QUICK-START.md +934 -0
  39. data/docs/README.md +296 -0
  40. data/docs/design/00-memory-optimization.md +593 -0
  41. data/docs/guides/MIGRATION-L27-L28.md +692 -0
  42. data/docs/guides/PERFORMANCE-BENCHMARKS.md +434 -0
  43. data/docs/guides/README.md +44 -0
  44. data/docs/prd/01-overview-vision.md +440 -0
  45. data/docs/use_cases/README.md +119 -0
  46. data/docs/use_cases/UC-001-request-scoped-debug-buffering.md +813 -0
  47. data/docs/use_cases/UC-002-business-event-tracking.md +1953 -0
  48. data/docs/use_cases/UC-003-pattern-based-metrics.md +1627 -0
  49. data/docs/use_cases/UC-004-zero-config-slo-tracking.md +728 -0
  50. data/docs/use_cases/UC-005-sentry-integration.md +759 -0
  51. data/docs/use_cases/UC-006-trace-context-management.md +905 -0
  52. data/docs/use_cases/UC-007-pii-filtering.md +2648 -0
  53. data/docs/use_cases/UC-008-opentelemetry-integration.md +1153 -0
  54. data/docs/use_cases/UC-009-multi-service-tracing.md +1043 -0
  55. data/docs/use_cases/UC-010-background-job-tracking.md +1018 -0
  56. data/docs/use_cases/UC-011-rate-limiting.md +1906 -0
  57. data/docs/use_cases/UC-012-audit-trail.md +2301 -0
  58. data/docs/use_cases/UC-013-high-cardinality-protection.md +2127 -0
  59. data/docs/use_cases/UC-014-adaptive-sampling.md +1940 -0
  60. data/docs/use_cases/UC-015-cost-optimization.md +735 -0
  61. data/docs/use_cases/UC-016-rails-logger-migration.md +785 -0
  62. data/docs/use_cases/UC-017-local-development.md +867 -0
  63. data/docs/use_cases/UC-018-testing-events.md +1081 -0
  64. data/docs/use_cases/UC-019-tiered-storage-migration.md +562 -0
  65. data/docs/use_cases/UC-020-event-versioning.md +708 -0
  66. data/docs/use_cases/UC-021-error-handling-retry-dlq.md +956 -0
  67. data/docs/use_cases/UC-022-event-registry.md +648 -0
  68. data/docs/use_cases/backlog.md +226 -0
  69. data/e11y.gemspec +76 -0
  70. data/lib/e11y/adapters/adaptive_batcher.rb +207 -0
  71. data/lib/e11y/adapters/audit_encrypted.rb +239 -0
  72. data/lib/e11y/adapters/base.rb +580 -0
  73. data/lib/e11y/adapters/file.rb +224 -0
  74. data/lib/e11y/adapters/in_memory.rb +216 -0
  75. data/lib/e11y/adapters/loki.rb +333 -0
  76. data/lib/e11y/adapters/otel_logs.rb +203 -0
  77. data/lib/e11y/adapters/registry.rb +141 -0
  78. data/lib/e11y/adapters/sentry.rb +230 -0
  79. data/lib/e11y/adapters/stdout.rb +108 -0
  80. data/lib/e11y/adapters/yabeda.rb +370 -0
  81. data/lib/e11y/buffers/adaptive_buffer.rb +339 -0
  82. data/lib/e11y/buffers/base_buffer.rb +40 -0
  83. data/lib/e11y/buffers/request_scoped_buffer.rb +246 -0
  84. data/lib/e11y/buffers/ring_buffer.rb +267 -0
  85. data/lib/e11y/buffers.rb +14 -0
  86. data/lib/e11y/console.rb +122 -0
  87. data/lib/e11y/current.rb +48 -0
  88. data/lib/e11y/event/base.rb +894 -0
  89. data/lib/e11y/event/value_sampling_config.rb +84 -0
  90. data/lib/e11y/events/base_audit_event.rb +43 -0
  91. data/lib/e11y/events/base_payment_event.rb +33 -0
  92. data/lib/e11y/events/rails/cache/delete.rb +21 -0
  93. data/lib/e11y/events/rails/cache/read.rb +23 -0
  94. data/lib/e11y/events/rails/cache/write.rb +22 -0
  95. data/lib/e11y/events/rails/database/query.rb +45 -0
  96. data/lib/e11y/events/rails/http/redirect.rb +21 -0
  97. data/lib/e11y/events/rails/http/request.rb +26 -0
  98. data/lib/e11y/events/rails/http/send_file.rb +21 -0
  99. data/lib/e11y/events/rails/http/start_processing.rb +26 -0
  100. data/lib/e11y/events/rails/job/completed.rb +22 -0
  101. data/lib/e11y/events/rails/job/enqueued.rb +22 -0
  102. data/lib/e11y/events/rails/job/failed.rb +22 -0
  103. data/lib/e11y/events/rails/job/scheduled.rb +23 -0
  104. data/lib/e11y/events/rails/job/started.rb +22 -0
  105. data/lib/e11y/events/rails/log.rb +56 -0
  106. data/lib/e11y/events/rails/view/render.rb +23 -0
  107. data/lib/e11y/events.rb +18 -0
  108. data/lib/e11y/instruments/active_job.rb +201 -0
  109. data/lib/e11y/instruments/rails_instrumentation.rb +141 -0
  110. data/lib/e11y/instruments/sidekiq.rb +175 -0
  111. data/lib/e11y/logger/bridge.rb +205 -0
  112. data/lib/e11y/metrics/cardinality_protection.rb +172 -0
  113. data/lib/e11y/metrics/cardinality_tracker.rb +134 -0
  114. data/lib/e11y/metrics/registry.rb +234 -0
  115. data/lib/e11y/metrics/relabeling.rb +226 -0
  116. data/lib/e11y/metrics.rb +102 -0
  117. data/lib/e11y/middleware/audit_signing.rb +174 -0
  118. data/lib/e11y/middleware/base.rb +140 -0
  119. data/lib/e11y/middleware/event_slo.rb +167 -0
  120. data/lib/e11y/middleware/pii_filter.rb +266 -0
  121. data/lib/e11y/middleware/pii_filtering.rb +280 -0
  122. data/lib/e11y/middleware/rate_limiting.rb +214 -0
  123. data/lib/e11y/middleware/request.rb +163 -0
  124. data/lib/e11y/middleware/routing.rb +157 -0
  125. data/lib/e11y/middleware/sampling.rb +254 -0
  126. data/lib/e11y/middleware/slo.rb +168 -0
  127. data/lib/e11y/middleware/trace_context.rb +131 -0
  128. data/lib/e11y/middleware/validation.rb +118 -0
  129. data/lib/e11y/middleware/versioning.rb +132 -0
  130. data/lib/e11y/middleware.rb +12 -0
  131. data/lib/e11y/pii/patterns.rb +90 -0
  132. data/lib/e11y/pii.rb +13 -0
  133. data/lib/e11y/pipeline/builder.rb +155 -0
  134. data/lib/e11y/pipeline/zone_validator.rb +110 -0
  135. data/lib/e11y/pipeline.rb +12 -0
  136. data/lib/e11y/presets/audit_event.rb +65 -0
  137. data/lib/e11y/presets/debug_event.rb +34 -0
  138. data/lib/e11y/presets/high_value_event.rb +51 -0
  139. data/lib/e11y/presets.rb +19 -0
  140. data/lib/e11y/railtie.rb +138 -0
  141. data/lib/e11y/reliability/circuit_breaker.rb +216 -0
  142. data/lib/e11y/reliability/dlq/file_storage.rb +277 -0
  143. data/lib/e11y/reliability/dlq/filter.rb +117 -0
  144. data/lib/e11y/reliability/retry_handler.rb +207 -0
  145. data/lib/e11y/reliability/retry_rate_limiter.rb +117 -0
  146. data/lib/e11y/sampling/error_spike_detector.rb +225 -0
  147. data/lib/e11y/sampling/load_monitor.rb +161 -0
  148. data/lib/e11y/sampling/stratified_tracker.rb +92 -0
  149. data/lib/e11y/sampling/value_extractor.rb +82 -0
  150. data/lib/e11y/self_monitoring/buffer_monitor.rb +79 -0
  151. data/lib/e11y/self_monitoring/performance_monitor.rb +97 -0
  152. data/lib/e11y/self_monitoring/reliability_monitor.rb +146 -0
  153. data/lib/e11y/slo/event_driven.rb +150 -0
  154. data/lib/e11y/slo/tracker.rb +119 -0
  155. data/lib/e11y/version.rb +9 -0
  156. data/lib/e11y.rb +283 -0
  157. metadata +452 -0
@@ -0,0 +1,2617 @@
1
+ # ADR-001: E11y Gem Architecture & Implementation Design
2
+
3
+ **Status:** Draft
4
+ **Date:** January 12, 2026
5
+ **Decision Makers:** Core Team
6
+ **Technical Level:** Implementation
7
+
8
+ ---
9
+
10
+ ## 📋 Table of Contents
11
+
12
+ 1. [Context & Goals](#1-context--goals)
13
+ 2. [Architecture Overview](#2-architecture-overview)
14
+ - 2.0. C4 Diagrams (Context, Container, Component, Code)
15
+ - 2.1. High-Level Architecture
16
+ - 2.2. Layered Architecture
17
+ - 2.6. Component Interaction Diagram
18
+ - 2.7. Memory Layout Diagram
19
+ - 2.8. Thread Model Diagram
20
+ - 2.9. Configuration Lifecycle
21
+ 3. [Core Components](#3-core-components)
22
+ - 3.1. Event Class (Zero-Allocation)
23
+ - 3.2. Pipeline (Middleware Chain)
24
+ - 3.3. Ring Buffer Implementation with Adaptive Memory Management ⚠️ C20
25
+ - 3.3.1. Base Ring Buffer (Lock-Free)
26
+ - 3.3.2. Adaptive Buffer with Memory Limits (C20 Resolution)
27
+ - 3.3.3. Configuration Examples
28
+ - 3.3.4. Trade-offs & Monitoring (C20)
29
+ - 3.4. Request-Scoped Buffer
30
+ - 3.5. Adapter Base Class
31
+ - 3.7. Request-Scoped Debug Buffer Flow
32
+ - 3.8. DLQ & Retry Flow
33
+ - 3.9. Adaptive Sampling Decision Tree
34
+ - 3.10. Cardinality Protection Flow
35
+ 4. [Processing Pipeline](#4-processing-pipeline)
36
+ - 4.1. Middleware Execution Order ⚠️ CRITICAL
37
+ 5. [Memory Optimization Strategy](#5-memory-optimization-strategy)
38
+ 6. [Thread Safety & Concurrency](#6-thread-safety--concurrency)
39
+ 7. [Extension Points](#7-extension-points)
40
+ 8. [Performance Requirements](#8-performance-requirements)
41
+ 9. [Testing Strategy](#9-testing-strategy)
42
+ 10. [Dependencies](#10-dependencies)
43
+ - 10.4. Deployment View
44
+ - 10.5. Multi-Environment Configuration
45
+ 11. [Deployment & Versioning](#11-deployment--versioning)
46
+ 12. [Trade-offs & Decisions](#12-trade-offs--decisions)
47
+
48
+ ---
49
+
50
+ ## 1. Context & Goals
51
+
52
+ ### 1.1. Problem Statement
53
+
54
+ Modern Rails applications need:
55
+ - Structured business event tracking
56
+ - Debug-on-error capabilities
57
+ - Built-in metrics & SLO tracking
58
+ - Multi-adapter routing (Loki, Sentry, etc.)
59
+ - GDPR-compliant PII filtering
60
+ - High performance (<1ms p99, <100MB memory)
61
+
62
+ ### 1.2. Goals
63
+
64
+ **Primary Goals:**
65
+ - ✅ 22 use cases as a unified system
66
+ - ✅ Zero-allocation event tracking (class methods only)
67
+ - ✅ <1ms p99 latency @ 1000 events/sec
68
+ - ✅ <100MB memory footprint
69
+ - ✅ Rails 8.0+ exclusive
70
+ - ✅ Open-source extensibility
71
+
72
+ **Non-Goals:**
73
+ - ❌ Plain Ruby support (Rails only)
74
+ - ❌ Backwards compatibility with Rails 7.x
75
+ - ❌ Hot configuration reload
76
+ - ❌ Distributed tracing coordination (only propagation)
77
+
78
+ ### 1.3. Success Metrics
79
+
80
+ | Metric | Target | Critical? |
81
+ |--------|--------|-----------|
82
+ | **p99 Latency** | <1ms | ✅ Yes |
83
+ | **Memory** | <100MB @ steady state | ✅ Yes |
84
+ | **Throughput** | 1000 events/sec | ✅ Yes |
85
+ | **Test Coverage** | >90% | ✅ Yes |
86
+ | **Documentation** | All APIs documented | ⚠️ Important |
87
+ | **Adoption** | Community feedback | 🟡 Nice-to-have |
88
+
89
+ ---
90
+
91
+ ## 2. Architecture Overview
92
+
93
+ ### 2.0. C4 Diagrams
94
+
95
+ #### Level 1: System Context
96
+
97
+ ```mermaid
98
+ C4Context
99
+ title System Context - E11y Gem
100
+
101
+ Person(developer, "Developer", "Rails developer tracking business events")
102
+
103
+ System(e11y, "E11y Gem", "Event tracking & observability library")
104
+
105
+ System_Ext(loki, "Loki", "Log aggregation")
106
+ System_Ext(elasticsearch, "Elasticsearch", "Search & analytics")
107
+ System_Ext(sentry, "Sentry", "Error tracking")
108
+ System_Ext(prometheus, "Prometheus", "Metrics storage")
109
+ System_Ext(redis, "Redis", "Rate limiting")
110
+
111
+ Rel(developer, e11y, "Tracks events via", "Events::OrderPaid.track(...)")
112
+ Rel(e11y, loki, "Sends logs to", "HTTP/JSON")
113
+ Rel(e11y, elasticsearch, "Sends events to", "HTTP/JSON")
114
+ Rel(e11y, sentry, "Sends errors to", "Sentry SDK")
115
+ Rel(e11y, prometheus, "Exposes metrics to", "Yabeda")
116
+ Rel(e11y, redis, "Uses for rate limiting", "Redis client")
117
+ ```
118
+
119
+ #### Level 2: Container Diagram
120
+
121
+ ```mermaid
122
+ C4Container
123
+ title Container Diagram - E11y Gem Architecture
124
+
125
+ Person(developer, "Developer", "Tracks business events")
126
+
127
+ Container_Boundary(e11y, "E11y Gem") {
128
+ Container(api, "Public API", "Ruby Classes", "Event classes with DSL")
129
+ Container(pipeline, "Pipeline", "Middleware Chain", "Validation, PII filtering, rate limiting")
130
+ Container(buffers, "Buffers", "Ring Buffer + Thread-local", "Request-scoped & main buffers")
131
+ Container(adapters, "Adapters", "Adapter Registry", "Loki, Sentry, File, etc.")
132
+ Container(config, "Configuration", "Dry::Configurable", "Global configuration")
133
+ }
134
+
135
+ System_Ext(loki, "Loki", "Log aggregation")
136
+ System_Ext(sentry, "Sentry", "Error tracking")
137
+ System_Ext(redis, "Redis", "Rate limiting")
138
+
139
+ Rel(developer, api, "Calls", "Events::OrderPaid.track(...)")
140
+ Rel(api, pipeline, "Passes event data to", "Hash")
141
+ Rel(pipeline, buffers, "Routes to", "Ring buffer or thread-local")
142
+ Rel(buffers, adapters, "Flushes batches to", "Array of events")
143
+ Rel(adapters, loki, "Writes to", "HTTP/JSON")
144
+ Rel(adapters, sentry, "Writes to", "Sentry SDK")
145
+ Rel(pipeline, redis, "Checks rate limits", "Redis commands")
146
+ Rel(api, config, "Reads from", "Frozen config")
147
+ ```
148
+
149
+ #### Level 3: Component Diagram - Pipeline
150
+
151
+ ```mermaid
152
+ C4Component
153
+ title Component Diagram - Pipeline Processing
154
+
155
+ Container_Boundary(pipeline, "Pipeline") {
156
+ Component(middleware_chain, "Middleware Chain", "Builder Pattern", "Composable processing steps")
157
+ Component(validation, "Validation Middleware", "Dry::Schema", "Schema validation")
158
+ Component(enrichment, "Enrichment Middleware", "Context Injector", "Add trace_id, user_id")
159
+ Component(pii_filter, "PII Filter Middleware", "Pattern Matcher", "Mask sensitive data")
160
+ Component(rate_limiter, "Rate Limit Middleware", "Token Bucket", "Rate limiting")
161
+ Component(sampler, "Sampling Middleware", "Adaptive Sampler", "Dynamic sampling")
162
+ Component(router, "Routing Middleware", "Router", "Buffer selection")
163
+ }
164
+
165
+ Container_Ext(buffers, "Buffers", "Ring Buffer + Thread-local")
166
+ Container_Ext(config, "Configuration", "Frozen config")
167
+ Container_Ext(redis, "Redis", "Rate limit storage")
168
+
169
+ Rel(middleware_chain, validation, "Calls", "event_data")
170
+ Rel(validation, enrichment, "Calls next", "event_data")
171
+ Rel(enrichment, pii_filter, "Calls next", "event_data")
172
+ Rel(pii_filter, rate_limiter, "Calls next", "event_data")
173
+ Rel(rate_limiter, sampler, "Calls next", "event_data")
174
+ Rel(sampler, router, "Calls next", "event_data")
175
+ Rel(router, buffers, "Routes to", "Buffer write")
176
+
177
+ Rel(validation, config, "Reads schema", "Event class config")
178
+ Rel(pii_filter, config, "Reads rules", "PII config")
179
+ Rel(rate_limiter, redis, "Check/increment", "Redis commands")
180
+ Rel(sampler, config, "Reads rules", "Sampling config")
181
+ ```
182
+
183
+ #### Level 4: Code Diagram - Event Tracking Flow
184
+
185
+ ```mermaid
186
+ sequenceDiagram
187
+ participant App as Application
188
+ participant Event as Events::OrderPaid
189
+ participant Pipeline as Pipeline
190
+ participant Middleware as Middlewares
191
+ participant Buffer as Ring Buffer
192
+ participant Worker as Flush Worker
193
+ participant Adapter as Loki Adapter
194
+ participant Loki as Loki
195
+
196
+ App->>Event: .track(order_id: '123', amount: 99.99)
197
+ Event->>Event: Build event_data hash (no instance!)
198
+ Event->>Pipeline: .process(event_data)
199
+
200
+ Pipeline->>Middleware: ValidationMiddleware.call(event_data)
201
+ Middleware->>Middleware: Validate schema
202
+ Middleware->>Middleware: EnrichmentMiddleware (add trace_id)
203
+ Middleware->>Middleware: PiiFilterMiddleware (mask fields)
204
+ Middleware->>Middleware: RateLimitMiddleware (check Redis)
205
+ Middleware->>Middleware: SamplingMiddleware (decide)
206
+ Middleware->>Middleware: RoutingMiddleware (route to buffer)
207
+
208
+ Middleware->>Buffer: .push(event_data)
209
+ Buffer->>Buffer: Write to ring buffer (lock-free)
210
+
211
+ Note over Worker: Every 200ms
212
+ Worker->>Buffer: .pop(batch_size: 100)
213
+ Buffer-->>Worker: [event1, event2, ...]
214
+
215
+ Worker->>Worker: Batching
216
+ Worker->>Worker: Compression (gzip)
217
+
218
+ Worker->>Adapter: .write_batch(events)
219
+ Adapter->>Adapter: Serialize to JSON
220
+ Adapter->>Loki: HTTP POST /loki/api/v1/push
221
+ Loki-->>Adapter: 204 No Content
222
+ Adapter-->>Worker: Success
223
+ ```
224
+
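As a rough check on the flush numbers in the sequence diagram above, the following sketch works through the arithmetic. It assumes the 1000 events/sec target from section 1.3, the 200 ms timer, and the `batch_size: 100` shown in the diagram; the constant names are illustrative and not part of E11y.

```ruby
# Back-of-the-envelope flush arithmetic (illustrative constants, not E11y API).
EVENTS_PER_SEC    = 1_000 # throughput target from section 1.3
FLUSH_INTERVAL_MS = 200   # timer interval from the diagram
BATCH_SIZE        = 100   # batch_size passed to .pop in the diagram

# Events that accumulate in the ring buffer between two timer ticks.
events_per_interval = EVENTS_PER_SEC * FLUSH_INTERVAL_MS / 1_000

# Batches a single flush has to pop to keep up with the incoming rate.
batches_per_flush = (events_per_interval.to_f / BATCH_SIZE).ceil

puts "events buffered per interval: #{events_per_interval}"
puts "batches needed per flush:     #{batches_per_flush}"
```

Under these assumptions a flush would need to pop about two batches per tick to keep pace with the target rate; popping a single batch of 100 every 200 ms would drain only 500 events/sec.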
225
+ ### 2.1. High-Level Architecture
226
+
227
+ ```
228
+ ┌─────────────────────────────────────────────────────────────────┐
229
+ │ Application Layer │
230
+ │ │
231
+ │ Events::OrderPaid.track(order_id: '123', amount: 99.99) │
232
+ │ │
233
+ └────────────────────────────┬────────────────────────────────────┘
234
+
235
+ ┌─────────────────────────────────────────────────────────────────┐
236
+ │ E11y::Pipeline (Middleware Chain) │
237
+ │ │
238
+ │ 1. TraceContextMiddleware ← Add trace_id, span_id, timestamp │
239
+ │ 2. ValidationMiddleware ← Schema validation (original class)│
240
+ │ 3. PiiFilterMiddleware ← PII filtering (original class) │
241
+ │ 4. RateLimitMiddleware ← Rate limiting (original class) │
242
+ │ 5. SamplingMiddleware ← Adaptive sampling (original class)│
243
+ │ 6. VersioningMiddleware ← Normalize event_name (LAST!) │
244
+ │ 7. RoutingMiddleware ← Buffer routing (debug vs main) │
245
+ │ │
246
+ └────────────────────────────┬────────────────────────────────────┘
247
+
248
+ ┌────────┴────────┐
249
+ │ │
250
+ ┌──────────▼─────┐ ┌──────▼──────────┐
251
+ │ Request Buffer │ │ Main Buffer │
252
+ │ (Thread-local) │ │ (Ring Buffer) │
253
+ │ │ │ │
254
+ │ :debug only │ │ :info+ events │
255
+ │ Flush on error │ │ Flush 200ms │
256
+ └──────────┬─────┘ └──────┬──────────┘
257
+ │ │
258
+ └────────┬───────┘
259
+
260
+ ┌───────────────────────────────────────┐
261
+ │ Flush Worker (Concurrent::TimerTask) │
262
+ │ │
263
+ │ - Batching │
264
+ │ - Payload minimization │
265
+ │ - Compression │
266
+ └───────────────────┬───────────────────┘
267
+
268
+ ┌───────────────────────────────────────┐
269
+ │ Adapter Registry │
270
+ │ │
271
+ │ - Circuit breakers │
272
+ │ - Retry policy │
273
+ │ - Dead letter queue │
274
+ └───────────────────┬───────────────────┘
275
+
276
+ ┌────────────────────────────────────────┐
277
+ │ Adapters (fan-out) │
278
+ │ │
279
+ │ → Loki Adapter │
280
+ │ → Sentry Adapter │
281
+ │ → File Adapter │
282
+ │ → Custom Adapters │
283
+ └─────────────────────────────────────────┘
284
+ ```
285
+
286
+ ### 2.2. Layered Architecture
287
+
288
+ ```
289
+ ┌─────────────────────────────────────────────────────────────────┐
290
+ │ Layer 1: Public API │
291
+ │ - Event classes (DSL) │
292
+ │ - Configuration (E11y.configure) │
293
+ │ - Helper methods (E11y.with_context) │
294
+ └─────────────────────────────────────────────────────────────────┘
295
+ ┌─────────────────────────────────────────────────────────────────┐
296
+ │ Layer 2: Pipeline Processing │
297
+ │ - Middleware chain │
298
+ │ - Validation, PII filtering, rate limiting │
299
+ │ - Context enrichment │
300
+ └─────────────────────────────────────────────────────────────────┘
301
+ ┌─────────────────────────────────────────────────────────────────┐
302
+ │ Layer 3: Buffering & Batching │
303
+ │ - Request-scoped buffer (thread-local) │
304
+ │ - Main ring buffer (concurrent) │
305
+ │ - Flush worker (timer task) │
306
+ └─────────────────────────────────────────────────────────────────┘
307
+ ┌─────────────────────────────────────────────────────────────────┐
308
+ │ Layer 4: Adapter Layer │
309
+ │ - Adapter registry │
310
+ │ - Circuit breakers, retry logic │
311
+ │ - Dead letter queue │
312
+ └─────────────────────────────────────────────────────────────────┘
313
+ ┌─────────────────────────────────────────────────────────────────┐
314
+ │ Layer 5: External Systems │
315
+ │ - Loki, Elasticsearch, Sentry │
316
+ │ - File system, S3 │
317
+ │ - Custom destinations │
318
+ └─────────────────────────────────────────────────────────────────┘
319
+ ```
320
+
321
+ ---
322
+
323
+ ### 2.6. Component Interaction Diagram - Buffers & Workers
324
+
325
+ ```mermaid
326
+ graph TB
327
+ subgraph "Event Sources"
328
+ API[Application Code]
329
+ Debug[Debug Events]
330
+ Info[Info+ Events]
331
+ end
332
+
333
+ subgraph "Pipeline"
334
+ MW1[Validation]
335
+ MW2[PII Filter]
336
+ MW3[Rate Limit]
337
+ MW4[Sampling]
338
+ MW5[Router]
339
+ end
340
+
341
+ subgraph "Buffers"
342
+ RB[Request Buffer<br/>Thread-local<br/>:debug only]
343
+ MB[Main Buffer<br/>Ring Buffer 100k<br/>:info+ events]
344
+ end
345
+
346
+ subgraph "Flush Workers"
347
+ RBF[Request Flush<br/>On error/end]
348
+ MBF[Main Flush<br/>Every 200ms]
349
+ end
350
+
351
+ subgraph "Post-Processing"
352
+ Batch[Batching]
353
+ Comp[Compression]
354
+ end
355
+
356
+ subgraph "Adapters"
357
+ A1[Loki]
358
+ A2[Sentry]
359
+ A3[File]
360
+ end
361
+
362
+ API --> MW1
363
+ MW1 --> MW2
364
+ MW2 --> MW3
365
+ MW3 --> MW4
366
+ MW4 --> MW5
367
+
368
+ MW5 -->|severity=debug| RB
369
+ MW5 -->|severity=info+| MB
370
+
371
+ RB -->|on error| RBF
372
+ RB -->|on success| Discard[Discard]
373
+ MB -->|timer 200ms| MBF
374
+
375
+ RBF --> DD
376
+ MBF --> DD
377
+ DD --> Batch
378
+ Batch --> Comp
379
+
380
+ Comp --> A1
381
+ Comp --> A2
382
+ Comp --> A3
383
+
384
+ style RB fill:#fff3cd
385
+ style MB fill:#d1ecf1
386
+ style DD fill:#d4edda
387
+ style Discard fill:#f8d7da
388
+ ```
389
+
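The request-buffer branch of the diagram above (flush on error, discard on success) can be sketched in plain Ruby. This is a minimal illustration, not the gem's `RequestBuffer`: the helper name `with_debug_buffer` and the `sink` array are hypothetical, and the helper returns a symbol instead of re-raising so the outcome is easy to inspect.

```ruby
# Hypothetical helper: buffers debug events for one unit of work.
# On success the buffered events are dropped; on error they are
# appended to +sink+ (a stand-in for the flush path to the adapters).
def with_debug_buffer(sink)
  buffer = []
  yield buffer
  :discarded
rescue StandardError
  sink.concat(buffer)
  :flushed
end

sink = []

ok = with_debug_buffer(sink) { |buf| buf << { msg: "cache miss" } }

failed = with_debug_buffer(sink) do |buf|
  buf << { msg: "loading order 123" }
  raise "payment gateway timeout"
end

puts ok.inspect
puts failed.inspect
puts sink.inspect
```

Only the debug event from the failing block reaches `sink`; the event from the successful block is never delivered, which is what keeps debug noise out of the main pipeline.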
390
+ ### 2.7. Memory Layout Diagram
391
+
392
+ ```mermaid
393
+ graph TB
394
+ subgraph "Heap Memory < 100MB Total"
395
+ direction TB
396
+
397
+ subgraph RingBuf["Ring Buffer: 50MB"]
398
+ RB1["Slot 0: event_data<br/>~500 bytes"]
399
+ RB2["Slot 1: event_data<br/>~500 bytes"]
400
+ RB3["..."]
401
+ RB4["Slot 99999: event_data<br/>~500 bytes"]
402
+ end
403
+
404
+ subgraph ReqBuf["Request Buffers: 0.5MB"]
405
+ T1["Thread 1<br/>debug events"]
406
+ T2["Thread 2<br/>debug events"]
407
+ T3["Thread N<br/>debug events"]
408
+ end
409
+
410
+ subgraph Registry["Event Registry: 1MB"]
411
+ E1["OrderPaid<br/>class config"]
412
+ E2["UserSignup<br/>class config"]
413
+ E3["100 classes<br/>~10KB each"]
414
+ end
415
+
416
+ subgraph Adapters["Adapters: 10MB"]
417
+ AD1["Loki Adapter<br/>connections"]
418
+ AD2["Sentry Adapter<br/>SDK state"]
419
+ AD3["File Adapter<br/>handles"]
420
+ end
421
+
422
+ subgraph VM["Ruby VM: 35MB"]
423
+ VMCore["Rails + Gems<br/>base overhead"]
424
+ end
425
+ end
426
+
427
+ RB1 -.->|capacity| RB2
428
+ RB2 -.-> RB3
429
+ RB3 -.-> RB4
430
+
431
+ T1 -.->|per-thread| T2
432
+ T2 -.-> T3
433
+
434
+ E1 -.->|frozen| E2
435
+ E2 -.-> E3
436
+
437
+ style RingBuf fill:#d1ecf1
438
+ style ReqBuf fill:#fff3cd
439
+ style Registry fill:#d4edda
440
+ style Adapters fill:#e2e3e5
441
+ style VM fill:#f8d7da
442
+ ```
443
+
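The figures in the memory layout above can be checked with a short calculation. The sketch below only restates the diagram's numbers (100k slots of about 500 bytes, 100 event classes of about 10 KB, and the fixed estimates for the other regions); it uses decimal megabytes and hypothetical variable names.

```ruby
# Memory budget from the diagram, in decimal megabytes (illustrative only).
BYTES_PER_MB = 1_000_000.0

budget_mb = {
  ring_buffer:     100_000 * 500 / BYTES_PER_MB, # 100k slots x ~500 bytes
  request_buffers: 0.5,
  event_registry:  100 * 10_000 / BYTES_PER_MB,  # 100 classes x ~10 KB
  adapters:        10.0,
  ruby_vm:         35.0
}

total_mb = budget_mb.values.sum

budget_mb.each { |name, mb| puts format("%-16s %5.1f MB", name, mb) }
puts format("%-16s %5.1f MB (limit 100 MB)", "total", total_mb)
```

The regions sum to 96.5 MB, which leaves roughly 3.5 MB of headroom under the 100 MB target if the per-event estimate of 500 bytes holds.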
444
+ ### 2.8. Thread Model Diagram
445
+
446
+ ```mermaid
447
+ graph TB
448
+ subgraph "Web Server Threads (Puma)"
449
+ WT1[Worker Thread 1<br/>Request 1]
450
+ WT2[Worker Thread 2<br/>Request 2]
451
+ WT3[Worker Thread N<br/>Request N]
452
+ end
453
+
454
+ subgraph "Thread-Local Storage"
455
+ TL1[Current.trace_id<br/>Current.request_buffer]
456
+ TL2[Current.trace_id<br/>Current.request_buffer]
457
+ TL3[Current.trace_id<br/>Current.request_buffer]
458
+ end
459
+
460
+ subgraph "Shared Resources"
461
+ RB[Ring Buffer<br/>Concurrent::AtomicFixnum]
462
+ Config[Configuration<br/>Frozen/Immutable]
463
+ Registry[Event Registry<br/>Frozen/Immutable]
464
+ end
465
+
466
+ subgraph "E11y Workers"
467
+ FW[Flush Worker<br/>Concurrent::TimerTask<br/>Single Thread]
468
+ end
469
+
470
+ WT1 -.->|thread-local| TL1
471
+ WT2 -.->|thread-local| TL2
472
+ WT3 -.->|thread-local| TL3
473
+
474
+ WT1 -->|write| RB
475
+ WT2 -->|write| RB
476
+ WT3 -->|write| RB
477
+
478
+ WT1 -->|read| Config
479
+ WT2 -->|read| Config
480
+ WT3 -->|read| Config
481
+
482
+ WT1 -->|read| Registry
483
+ WT2 -->|read| Registry
484
+ WT3 -->|read| Registry
485
+
486
+ FW -->|pop| RB
487
+ FW -->|read| Config
488
+
489
+ style TL1 fill:#fff3cd
490
+ style TL2 fill:#fff3cd
491
+ style TL3 fill:#fff3cd
492
+ style RB fill:#d1ecf1
493
+ style Config fill:#d4edda
494
+ style Registry fill:#d4edda
495
+ ```
496
+
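The split between thread-local state and shared immutable state in the diagram above can be illustrated with plain Ruby threads. This is a sketch under assumptions: `SKETCH_CONFIG` and `request_buffer` are hypothetical names, and it uses `Thread#thread_variable_get`/`thread_variable_set` directly rather than whatever `E11y::Current` does internally (`Thread.current[:key]` is fiber-local, which is why the sketch avoids it).

```ruby
# Shared, read-only configuration: safe to read from any thread.
SKETCH_CONFIG = { flush_interval_ms: 200 }.freeze

# Returns the calling thread's private buffer, creating it on first use.
def request_buffer
  current = Thread.current
  unless current.thread_variable?(:sketch_request_buffer)
    current.thread_variable_set(:sketch_request_buffer, [])
  end
  current.thread_variable_get(:sketch_request_buffer)
end

# Each "worker" thread writes a different number of debug events.
results = 3.times.map do |i|
  Thread.new do
    (i + 1).times { |n| request_buffer << { severity: :debug, n: n } }
    [request_buffer.size, SKETCH_CONFIG[:flush_interval_ms]]
  end
end.map(&:value)

puts results.inspect
puts request_buffer.size # the main thread's buffer was never written to
```

Each thread sees only its own buffer, so no locking is needed for debug events, while the frozen configuration hash can be read concurrently without coordination.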
497
+ ### 2.9. Configuration Lifecycle
498
+
499
+ ```mermaid
500
+ stateDiagram-v2
501
+ [*] --> Uninitialized
502
+ Uninitialized --> Configuring: E11y.configure
503
+
504
+ state Configuring {
505
+ [*] --> RegisterAdapters
506
+ RegisterAdapters --> SetupMiddleware
507
+ SetupMiddleware --> ConfigureBuffers
508
+ ConfigureBuffers --> SetupPII
509
+ SetupPII --> ConfigureMetrics
510
+ ConfigureMetrics --> [*]
511
+ }
512
+
513
+ Configuring --> Freezing: config.freeze!
514
+ Freezing --> Frozen
515
+ Frozen --> Running: Rails.application.initialize!
516
+
517
+ state Running {
518
+ [*] --> EagerLoad
519
+ EagerLoad --> FreezeRegistry
520
+ FreezeRegistry --> StartWorkers
521
+ StartWorkers --> Ready
522
+ Ready --> Processing: Events.track(...)
523
+ Processing --> Ready
524
+ }
525
+
526
+ Running --> Shutdown: Rails.application.shutdown
527
+ Shutdown --> [*]
528
+
529
+ note right of Frozen
530
+ Config immutable
531
+ Thread-safe reads
532
+ end note
533
+
534
+ note right of Ready
535
+ Accept events
536
+ Workers running
537
+ end note
538
+ ```
539
+
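The Configuring → Frozen transition above can be sketched as a configuration object that rejects writes once frozen. `SketchConfig` is a hypothetical stand-in, not the gem's configuration class; only the `freeze!` name is taken from the diagram.

```ruby
# Hypothetical configuration object: mutable while configuring,
# immutable (and therefore safe for concurrent reads) after freeze!.
class SketchConfig
  attr_accessor :flush_interval_ms
  attr_reader :adapters

  def initialize
    @adapters = []
    @flush_interval_ms = 200
  end

  # Freeze nested collections first, then the object itself.
  def freeze!
    @adapters.freeze
    freeze
  end
end

config = SketchConfig.new
config.adapters << :loki
config.flush_interval_ms = 100
config.freeze!

late_write =
  begin
    config.flush_interval_ms = 500
    :accepted
  rescue FrozenError
    :rejected
  end

puts late_write.inspect
puts config.flush_interval_ms
```

Because writes after `freeze!` raise `FrozenError`, misconfiguration at runtime fails loudly rather than racing with reader threads, which is consistent with hot reload being listed as a non-goal.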
540
+ ---
541
+
542
+ ## 3. Core Components
543
+
544
+ ### 3.1. Event Class (Zero-Allocation)
545
+
546
+ **Design Decision:** No instance creation, class methods only.
547
+
548
+ ```ruby
549
+ module E11y
550
+ class Event
551
+ class Base
552
+ class << self
553
+ # Class-level configuration storage
554
+ attr_reader :_config
555
+
556
+ def inherited(subclass)
557
+ super
558
+ subclass.instance_variable_set(:@_config, {
559
+ adapters: [],
560
+ schema: nil,
561
+ metrics: [],
562
+ version: 1,
563
+ pii_rules: {},
564
+ retry_policy: {}
565
+ })
566
+
567
+ # Auto-register in registry
568
+ Registry.register(subclass)
569
+ end
570
+
571
+ # DSL methods (store config in @_config)
572
+ def adapters(list = nil)
573
+ return @_config[:adapters] if list.nil?
574
+ @_config[:adapters] = list
575
+ end
576
+
577
+ def schema(&block)
578
+ return @_config[:schema] if block.nil?
579
+ @_config[:schema] = Dry::Schema.define(&block)
580
+ end
581
+
582
+ def version(v = nil)
583
+ return @_config[:version] if v.nil?
584
+ @_config[:version] = v
585
+ end
586
+
587
+ # Main tracking method (NO INSTANCE CREATED!)
588
+ def track(**payload)
589
+ # Create event hash (not object)
590
+ event_data = {
591
+ event_class: self,
592
+ event_name: event_name,
593
+ event_version: @_config[:version],
594
+ payload: payload,
595
+ timestamp: Time.now.utc,
596
+ context: current_context
597
+ }
598
+
599
+ # Pass through pipeline (hash-based)
600
+ Pipeline.process(event_data)
601
+ end
602
+
603
+ def event_name
604
+ @event_name ||= name.demodulize.underscore.tr('_', '.')
605
+ # Events::OrderPaid → 'order.paid'
606
+ # Events::OrderPaidV2 → 'order.paid.v2'
607
+ end
608
+
609
+ private
610
+
611
+ def current_context
612
+ {
613
+ trace_id: Current.trace_id,
614
+ user_id: Current.user_id,
615
+ request_id: Current.request_id
616
+ # ... other context
617
+ }
618
+ end
619
+ end
620
+ end
621
+ end
622
+ end
623
+ ```
624
+
625
+ **Key Points:**
626
+ - ✅ No `new` calls → no event object is instantiated (only the `event_data` Hash is built per call)
627
+ - ✅ All data in Hash (not object)
628
+ - ✅ Class methods only
629
+ - ✅ Thread-safe (@_config frozen after definition)
630
+
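To make the class-methods-only design concrete, here is a minimal plain-Ruby sketch of the same idea. `SketchEvent` is a hypothetical stand-in for `E11y::Event::Base`: it has no registry, pipeline, or ActiveSupport dependency, and it derives the dotted event name with a hand-written regex instead of `demodulize`/`underscore`.

```ruby
# Hypothetical stand-in for E11y::Event::Base; plain Ruby, no Rails.
class SketchEvent
  class << self
    # Give every subclass its own config hash.
    def inherited(subclass)
      super
      subclass.instance_variable_set(:@_config, { version: 1 })
    end

    # DSL: reader without an argument, writer with one.
    def version(v = nil)
      return @_config[:version] if v.nil?
      @_config[:version] = v
    end

    # "Events::OrderPaidV2" -> "order.paid.v2"
    def event_name
      @event_name ||= name.split("::").last
                          .gsub(/([a-z\d])([A-Z])/, '\1_\2')
                          .downcase
                          .tr("_", ".")
    end

    # Builds a plain Hash; no event instance is created.
    def track(**payload)
      {
        event_class: self,
        event_name: event_name,
        event_version: @_config[:version],
        payload: payload,
        timestamp: Time.now.utc
      }
    end
  end
end

module Events
  class OrderPaid < SketchEvent; end

  class OrderPaidV2 < SketchEvent
    version 2
  end
end

event = Events::OrderPaidV2.track(order_id: "123", amount: 99.99)
puts Events::OrderPaid.event_name
puts event[:event_name]
puts event[:event_version]
```

In the real class the returned hash would be handed to `Pipeline.process`; the sketch returns it directly so the shape of `event_data` is visible.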
631
+ ---
632
+
633
+ ### 3.2. Pipeline (Middleware Chain)
634
+
635
+ **Design Decision:** Middleware chain (Rails-familiar, extensible).
636
+
637
+ ```ruby
638
+ module E11y
639
+ class Pipeline
640
+ class << self
641
+ attr_accessor :middlewares
642
+
643
+ def use(middleware_class, *args, **options)
644
+ @middlewares ||= []
645
+ @middlewares << [middleware_class, args, options]
646
+ end
647
+
648
+ def process(event_data)
649
+ # Build middleware chain
650
+ chain = build_chain
651
+
652
+ # Execute chain
653
+ chain.call(event_data)
654
+ rescue => error
655
+ handle_pipeline_error(error, event_data)
656
+ end
657
+
658
+ private
659
+
660
+ def build_chain
661
+ # Reverse to build chain from inside out
662
+ (@middlewares || []).reverse.reduce(final_handler) do |next_middleware, (klass, args, options)|
663
+ klass.new(next_middleware, *args, **options)
664
+ end
665
+ end
666
+
667
+ def final_handler
668
+ ->(event_data) { Router.route(event_data) }
669
+ end
670
+
671
+ def handle_pipeline_error(error, event_data)
672
+ case Config.on_error
673
+ when :raise
674
+ raise error
675
+ when :log
676
+ Rails.logger.error("E11y pipeline error: #{error.class}: #{error.message} (event: #{event_data[:event_name]})")
677
+ when :ignore
678
+ # Silent
679
+ end
680
+
681
+ # Call custom error handler
682
+ Config.error_handler&.call(error, event_data)
683
+ end
684
+ end
685
+ end
686
+
687
+ # Middleware base class
688
+ class Middleware
689
+ def initialize(app)
690
+ @app = app
691
+ end
692
+
693
+ def call(event_data)
694
+ # Subclass implements logic
695
+ @app.call(event_data)
696
+ end
697
+ end
698
+ end
699
+ ```
700
+
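The `reverse.reduce` construction in `build_chain` above is what makes middlewares run in registration order. The following self-contained sketch shows that effect with a hypothetical `Tagger` middleware that records when it is called; none of these names are part of E11y.

```ruby
# Hypothetical middleware used only to show ordering.
class Tagger
  def initialize(app, tag)
    @app = app
    @tag = tag
  end

  def call(event_data)
    (event_data[:seen] ||= []) << @tag
    @app.call(event_data)
  end
end

# Registration order, in the same [class, args] shape as Pipeline.use.
registered = [[Tagger, [:first]], [Tagger, [:second]], [Tagger, [:third]]]

# Innermost callable: receives the event after every middleware has run.
final_handler = ->(event_data) { event_data }

# Wrap from the inside out: the last registered middleware wraps the
# final handler, and the first registered one ends up outermost.
chain = registered.reverse.reduce(final_handler) do |next_app, (klass, args)|
  klass.new(next_app, *args)
end

result = chain.call({ payload: {} })
puts result[:seen].inspect
```

Without the `reverse`, the first registered middleware would be wrapped innermost and would run last.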
701
+ **Built-in Middlewares (in execution order):**
702
+
703
+ ```ruby
704
+ # 1. Trace Context Middleware (Enrichment)
705
+ class TraceContextMiddleware < E11y::Middleware
706
+ def call(event_data)
707
+ event_data[:trace_id] ||= E11y::Current.trace_id || SecureRandom.uuid
708
+ event_data[:span_id] ||= SecureRandom.hex(8)
709
+ event_data[:timestamp] ||= Time.now.utc.iso8601(3)
710
+
711
+ @app.call(event_data)
712
+ end
713
+ end
714
+
715
+ # 2. Validation Middleware
716
+ class ValidationMiddleware < E11y::Middleware
717
+ def call(event_data)
718
+ schema = event_data[:event_class]._config[:schema]
719
+
720
+ if schema
721
+ result = schema.call(event_data[:payload])
722
+
723
+ if result.failure?
724
+ raise E11y::ValidationError, result.errors.to_h
725
+ end
726
+ end
727
+
728
+ @app.call(event_data)
729
+ end
730
+ end
731
+
732
+ # 3. PII Filter Middleware
733
+ class PiiFilterMiddleware < E11y::Middleware
734
+ def call(event_data)
735
+ # Get PII rules for event class (uses ORIGINAL class name!)
736
+ pii_rules = event_data[:event_class]._config[:pii_rules]
737
+
738
+ # Apply filtering
739
+ event_data[:payload] = PiiFilter.filter(
740
+ event_data[:payload],
741
+ rules: pii_rules
742
+ )
743
+
744
+ @app.call(event_data)
745
+ end
746
+ end
747
+
748
+ # 4. Rate Limit Middleware
749
+ class RateLimitMiddleware < E11y::Middleware
750
+ def call(event_data)
751
+ # Check limit for ORIGINAL class name (V1 vs V2 may differ!)
752
+ unless RateLimiter.allowed?(event_data)
753
+ # Drop event
754
+ Metrics.increment('e11y.events.rate_limited')
755
+ return :rate_limited
756
+ end
757
+
758
+ @app.call(event_data)
759
+ end
760
+ end
761
+
762
+ # 5. Sampling Middleware
763
+ class SamplingMiddleware < E11y::Middleware
764
+ def call(event_data)
765
+ unless Sampler.should_sample?(event_data)
766
+ Metrics.increment('e11y.events.sampled')
767
+ return :sampled
768
+ end
769
+
770
+ @app.call(event_data)
771
+ end
772
+ end
773
+
774
+ # 6. Versioning Middleware (last before routing; normalizes names for adapters)
775
+ class VersioningMiddleware < E11y::Middleware
776
+ def call(event_data)
777
+ # Extract version from class name (Events::OrderPaidV2 → 2)
778
+ class_name = event_data[:event_name]
779
+ version = extract_version(class_name)
780
+
781
+ # Normalize event_name (Events::OrderPaidV2 → Events::OrderPaid)
782
+ event_data[:event_name] = extract_base_name(class_name)
783
+
784
+ # Add v: field only if version > 1
785
+ event_data[:payload][:v] = version if version > 1
786
+
787
+ @app.call(event_data)
788
+ end
789
+
790
+ private
791
+
792
+ def extract_version(class_name)
793
+ class_name =~ /V(\d+)$/ ? $1.to_i : 1
794
+ end
795
+
796
+ def extract_base_name(class_name)
797
+ class_name.sub(/V\d+$/, '') # Remove V2, V3, etc.
798
+ end
799
+ end
800
+
801
+ # 7. Routing Middleware (final)
802
+ class RoutingMiddleware < E11y::Middleware
803
+ def call(event_data)
804
+ severity = event_data[:payload][:severity] || :info
805
+
806
+ if severity == :debug
807
+ # Route to request-scoped buffer
808
+ RequestBuffer.add(event_data)
809
+ else
810
+ # Route to main buffer
811
+ MainBuffer.add(event_data)
812
+ end
813
+ end
814
+ end
815
+ ```
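
Each class above holds the next middleware in `@app` and either calls it or returns early. How the chain is assembled is not shown in this section, so the following is only a minimal sketch, assuming a Rack-style builder. `SketchMiddleware`, `Tag`, `DropDebug`, and `build_chain` are hypothetical names for illustration, not part of E11y.

```ruby
# Hypothetical base class: each middleware wraps the next callable in @app.
class SketchMiddleware
  def initialize(app)
    @app = app
  end
end

# Enrichment step: mutates the hash, then passes it on.
class Tag < SketchMiddleware
  def call(event_data)
    event_data[:tagged] = true
    @app.call(event_data)
  end
end

# Filtering step: returns a status symbol instead of calling @app,
# which stops the chain (like :rate_limited / :sampled above).
class DropDebug < SketchMiddleware
  def call(event_data)
    return :dropped if event_data[:severity] == :debug

    @app.call(event_data)
  end
end

# Build the chain back to front so the first class listed runs first.
def build_chain(middleware_classes, terminal)
  middleware_classes.reverse.inject(terminal) { |app, klass| klass.new(app) }
end

delivered = []
terminal = lambda do |event_data|
  delivered << event_data
  :routed
end
chain = build_chain([Tag, DropDebug], terminal)

info_result  = chain.call({ severity: :info })
debug_result = chain.call({ severity: :debug })
```

Because a dropped event never reaches the terminal step, anything placed after a filtering middleware only sees events that survived it.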
816
+
817
+ ---
818
+
819
+ ### 3.3. Ring Buffer Implementation with Adaptive Memory Management
820
+
821
+ > **⚠️ CRITICAL: C20 Resolution - Memory Exhaustion Prevention**
822
+ > **See:** [CONFLICT-ANALYSIS.md C20](researches/CONFLICT-ANALYSIS.md#c20-memory-pressure--high-throughput) for detailed conflict analysis
823
+ > **Problem:** At high throughput (10K+ events/sec), fixed-size buffers can exhaust memory (up to 1GB+ per worker)
824
+ > **Solution:** Adaptive buffering with memory limits + backpressure mechanism
825
+
826
+ **Design Decision:** Lock-free SPSC ring buffer with adaptive memory management.
827
+
828
+ #### 3.3.1. Base Ring Buffer (Lock-Free)
829
+
830
+ ```ruby
831
+ module E11y
832
+ class RingBuffer
833
+ def initialize(capacity = 100_000)
834
+ @capacity = capacity
835
+ @buffer = Array.new(capacity)
836
+ @write_index = Concurrent::AtomicFixnum.new(0)
837
+ @read_index = Concurrent::AtomicFixnum.new(0)
838
+ @size = Concurrent::AtomicFixnum.new(0)
839
+ end
840
+
841
+ # Producer (single thread)
842
+ def push(item)
843
+ current_size = @size.value
844
+
845
+ if current_size >= @capacity
846
+ # Buffer full - handle backpressure
847
+ handle_backpressure(item)
848
+ return false
849
+ end
850
+
851
+ # Write to buffer
852
+ write_pos = @write_index.value % @capacity
853
+ @buffer[write_pos] = item
854
+
855
+ # Increment write index and size
856
+ @write_index.increment
857
+ @size.increment
858
+
859
+ true
860
+ end
861
+
862
+ # Consumer (single thread)
863
+ def pop(batch_size = 100)
864
+ items = []
865
+ current_size = @size.value
866
+
867
+ batch_size = [batch_size, current_size].min
868
+
869
+ batch_size.times do
870
+ read_pos = @read_index.value % @capacity
871
+ item = @buffer[read_pos]
872
+
873
+ items << item if item
874
+
875
+ @buffer[read_pos] = nil # Clear slot
876
+ @read_index.increment
877
+ @size.decrement
878
+ end
879
+
880
+ items
881
+ end
882
+
883
+ def size
884
+ @size.value
885
+ end
886
+
887
+ def empty?
888
+ @size.value.zero?
889
+ end
890
+
891
+ def full?
892
+ @size.value >= @capacity
893
+ end
894
+
895
+ private
896
+
897
+ def handle_backpressure(item)
898
+ case Config.backpressure_strategy
899
+ when :drop_oldest
900
+ pop(1) # Drop one old event
901
+ push(item) # Retry push
902
+ when :drop_new
903
+ # Drop current item
904
+ Metrics.increment('e11y.buffer.overflow')
905
+ when :block
906
+ # Wait until space available (risky!)
907
+ sleep 0.001 while full?
908
+ push(item)
909
+ end
910
+ end
911
+ end
912
+ end
913
+ ```
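
The ring buffer above assumes exactly one producer thread and one consumer thread. If several threads can call `push` (for example, multiple request threads in one process), the separate read and increment steps on `@write_index` and `@size` are no longer atomic as a group. A minimal sketch of a mutex-guarded alternative with a drop-oldest policy follows. `BoundedBuffer` is a hypothetical name, and the lock deliberately trades away the lock-free property for simplicity.

```ruby
# Hypothetical multi-producer-safe bounded buffer (not the E11y implementation).
class BoundedBuffer
  attr_reader :dropped

  def initialize(capacity)
    @capacity = capacity
    @items = []
    @dropped = 0
    @mutex = Mutex.new
  end

  # Always accepts the new item; evicts the oldest one when full.
  def push(item)
    @mutex.synchronize do
      if @items.size >= @capacity
        @items.shift
        @dropped += 1
      end
      @items << item
    end
    true
  end

  # Removes and returns up to batch_size items, oldest first.
  def pop(batch_size = 100)
    @mutex.synchronize { @items.shift(batch_size) }
  end

  def size
    @mutex.synchronize { @items.size }
  end
end

buffer = BoundedBuffer.new(3)
threads = 4.times.map { |t| Thread.new { 5.times { |i| buffer.push([t, i]) } } }
threads.each(&:join)
```

With 20 pushes into a capacity of 3, exactly 17 items are evicted regardless of thread interleaving, which makes the drop counter easy to assert in a test.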
914
+
915
+ #### 3.3.2. Adaptive Buffer with Memory Limits (C20 Resolution)
916
+
917
+ **Key Innovation:** Track memory usage across all buffers and enforce a single global limit.
918
+
919
+ ```ruby
920
+ module E11y
921
+ # Adaptive buffer manager with memory tracking
922
+ class AdaptiveBuffer
923
+ def initialize
924
+ @buffers = {} # Per-adapter buffer (Hash)
925
+ @total_memory_bytes = Concurrent::AtomicFixnum.new(0)
926
+ @memory_limit_bytes = (Config.buffering.memory_limit_mb || 100) * 1024 * 1024
927
+ @memory_warning_threshold = @memory_limit_bytes * 0.8 # 80% threshold
928
+ @flush_mutex = Mutex.new
929
+ end
930
+
931
+ # Add event with memory tracking
932
+ def add_event(event_data)
933
+ event_size = estimate_size(event_data)
934
+ current_memory = @total_memory_bytes.value
935
+
936
+ # Check memory limit
937
+ if current_memory + event_size > @memory_limit_bytes
938
+ return handle_memory_exhaustion(event_data, event_size)
939
+ end
940
+
941
+ # Warning threshold (trigger early flush)
942
+ if current_memory + event_size > @memory_warning_threshold
943
+ trigger_early_flush
944
+ end
945
+
946
+ # Add to appropriate buffer
947
+ adapter_key = event_data[:adapter] || :default
948
+ @buffers[adapter_key] ||= []
949
+ @buffers[adapter_key] << event_data
950
+
951
+ # Track memory
952
+ @total_memory_bytes.update { |v| v + event_size }
953
+
954
+ # Increment metrics
955
+ Metrics.gauge('e11y.buffer.memory_bytes', @total_memory_bytes.value) # Report the updated counter, not the value read before the add
956
+ Metrics.increment('e11y.buffer.events_added')
957
+
958
+ true
959
+ end
960
+
961
+ # Memory estimation (C20 requirement)
962
+ def estimate_size(event_data)
963
+ # Estimate memory footprint:
964
+ # 1. Payload JSON size
965
+ # 2. Ruby object overhead (~200 bytes per Hash)
966
+ # 3. String overhead (~40 bytes per String)
967
+
968
+ payload_size = begin
969
+ event_data[:payload].to_json.bytesize
970
+ rescue
971
+ 500 # Fallback estimate
972
+ end
973
+
974
+ base_overhead = 200 # Hash object
975
+ string_overhead = event_data.keys.size * 40 # Keys
976
+
977
+ payload_size + base_overhead + string_overhead
978
+ end
979
+
980
+ # Flush buffers and return events
981
+ def flush
982
+ @flush_mutex.synchronize do
983
+ events = []
984
+ memory_freed = 0
985
+
986
+ @buffers.each do |adapter_key, buffer|
987
+ events.concat(buffer)
988
+
989
+ # Estimate memory freed
990
+ buffer.each { |event| memory_freed += estimate_size(event) }
991
+
992
+ buffer.clear
993
+ end
994
+
995
+ # Update memory tracking
996
+ @total_memory_bytes.update { |v| [v - memory_freed, 0].max }
997
+
998
+ # Metrics
999
+ Metrics.gauge('e11y.buffer.memory_bytes', @total_memory_bytes.value)
1000
+ Metrics.increment('e11y.buffer.flushes')
1001
+
1002
+ events
1003
+ end
1004
+ end
1005
+
1006
+ # Memory stats for monitoring
1007
+ def memory_stats
1008
+ {
1009
+ current_bytes: @total_memory_bytes.value,
1010
+ limit_bytes: @memory_limit_bytes,
1011
+ utilization: (@total_memory_bytes.value.to_f / @memory_limit_bytes * 100).round(2),
1012
+ buffer_counts: @buffers.transform_values(&:size)
1013
+ }
1014
+ end
1015
+
1016
+ private
1017
+
1018
+ # Handle memory exhaustion (C20 backpressure)
1019
+ def handle_memory_exhaustion(event_data, event_size)
1020
+ strategy = Config.buffering.backpressure.strategy
1021
+
1022
+ case strategy
1023
+ when :block
1024
+ # Block event ingestion until space available
1025
+ max_wait = Config.buffering.backpressure.max_block_time || 1.0
1026
+ wait_start = Time.now
1027
+
1028
+ loop do
1029
+ # Trigger immediate flush
1030
+ flush_all_buffers!
1031
+
1032
+ # Check if space available
1033
+ break if @total_memory_bytes.value + event_size <= @memory_limit_bytes
1034
+
1035
+ # Check timeout
1036
+ if Time.now - wait_start > max_wait
1037
+ # Timeout exceeded - drop event
1038
+ Metrics.increment('e11y.buffer.memory_exhaustion.dropped')
1039
+ Rails.logger.warn "[E11y] Buffer memory exhausted, dropped event: #{event_data[:event_name]}"
1040
+ return false
1041
+ end
1042
+
1043
+ sleep 0.01 # Wait 10ms before retry
1044
+ end
1045
+
1046
+ # Space available - retry add
1047
+ Metrics.increment('e11y.buffer.memory_exhaustion.blocked')
1048
+ add_event(event_data)
1049
+
1050
+ when :drop
1051
+ # Drop new event
1052
+ Metrics.increment('e11y.buffer.memory_exhaustion.dropped')
1053
+ Rails.logger.warn "[E11y] Buffer memory full, dropping event: #{event_data[:event_name]}"
1054
+ false
1055
+
1056
+ when :throttle
1057
+ # Trigger immediate flush, then drop if still full
1058
+ flush_all_buffers!
1059
+
1060
+ if @total_memory_bytes.value + event_size <= @memory_limit_bytes
1061
+ Metrics.increment('e11y.buffer.memory_exhaustion.throttled')
1062
+ add_event(event_data)
1063
+ else
1064
+ Metrics.increment('e11y.buffer.memory_exhaustion.dropped')
1065
+ Rails.logger.warn "[E11y] Buffer memory full after flush, dropping event: #{event_data[:event_name]}"
1066
+ false
1067
+ end
1068
+ end
1069
+ end
1070
+
1071
+ # Trigger early flush (80% threshold)
1072
+ def trigger_early_flush
1073
+ # Notify flush worker to flush NOW (not wait for timer)
1074
+ FlushWorker.trigger_immediate_flush
1075
+ Metrics.increment('e11y.buffer.early_flush_triggered')
1076
+ end
1077
+
1078
+ # Emergency flush (memory exhaustion)
1079
+ def flush_all_buffers!
1080
+ FlushWorker.flush_now!
1081
+ Metrics.increment('e11y.buffer.emergency_flush')
1082
+ end
1083
+ end
1084
+
1085
+ # Main buffer (singleton) with adaptive memory management
1086
+ class MainBuffer
1087
+ class << self
1088
+ def buffer
1089
+ @buffer ||= if Config.buffering.adaptive.enabled
1090
+ AdaptiveBuffer.new
1091
+ else
1092
+ RingBuffer.new(Config.buffer_capacity)
1093
+ end
1094
+ end
1095
+
1096
+ def add(event_data)
1097
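+ # Dispatch on the buffer type; a rescue modifier would also hide errors raised inside add_event
+ buffer.respond_to?(:add_event) ? buffer.add_event(event_data) : buffer.push(event_data)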
1098
+ end
1099
+
1100
+ def flush
1101
+ buffer.respond_to?(:flush) ? buffer.flush : buffer.pop(buffer.size) # RingBuffer exposes pop, not flush
1102
+ end
1103
+
1104
+ # Memory stats (for monitoring)
1105
+ def memory_stats
1106
+ buffer.respond_to?(:memory_stats) ? buffer.memory_stats : {}
1107
+ end
1108
+ end
1109
+ end
1110
+ end
1111
+ ```
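
To make the `estimate_size` arithmetic concrete, here is a small worked example using the same constants as the method above (200 bytes per event Hash, 40 bytes per top-level key). The sample event is invented for illustration, and the rescue fallback is omitted for brevity.

```ruby
require 'json'

# Same arithmetic as estimate_size above, extracted as a standalone method.
def estimate_size(event_data)
  payload_size = event_data[:payload].to_json.bytesize
  base_overhead = 200                        # assumed cost of the Hash object
  key_overhead = event_data.keys.size * 40   # assumed cost per top-level key
  payload_size + base_overhead + key_overhead
end

event = { event_name: 'order.paid', payload: { order_id: 'o-1', amount: 10 } }
payload_json = event[:payload].to_json   # {"order_id":"o-1","amount":10} (30 bytes)
estimated = estimate_size(event)         # 30 + 200 + 2 * 40 = 310
```

Only top-level keys are counted, so nested payloads are charged just for their serialized bytes. The result is a heuristic, not a measurement of actual heap usage.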
1112
+
1113
+ #### 3.3.3. Configuration Examples
1114
+
1115
+ **Production (High Throughput):**
1116
+ ```ruby
1117
+ E11y.configure do |config|
1118
+ config.buffering do
1119
+ adaptive do
1120
+ enabled true
1121
+ memory_limit_mb 100 # Max 100 MB per worker
1122
+
1123
+ # Backpressure strategy
1124
+ backpressure do
1125
+ enabled true
1126
+ strategy :block # Block event ingestion when full
1127
+ max_block_time 1.second # Max wait time before dropping
1128
+ end
1129
+ end
1130
+
1131
+ # Standard flush triggers still apply
1132
+ flush_interval 200.milliseconds
1133
+ max_buffer_size 1000
1134
+ end
1135
+ end
1136
+ ```
1137
+
1138
+ **Load Test Scenario (C20 Validation):**
1139
+ ```ruby
1140
+ # Test setup:
1141
+ # - Throughput: 10,000 events/sec
1142
+ # - Event size: ~5 KB average
1143
+ # - Memory limit: 100 MB
1144
+ # - Expected behavior: Buffer stays under 100 MB, backpressure activates
1145
+
1146
+ require 'benchmark'
1147
+
1148
+ # Generate high-throughput events
1149
+ events_per_second = 10_000
1150
+ duration_seconds = 60
1151
+
1152
+ total_events = events_per_second * duration_seconds
1153
+
1154
+ puts "Starting load test: #{events_per_second} events/sec for #{duration_seconds}s"
1155
+ puts "Memory limit: #{E11y::Config.buffering.memory_limit_mb} MB"
1156
+
1157
+ start_memory = GC.stat(:heap_live_slots) * GC::INTERNAL_CONSTANTS[:RVALUE_SIZE]
1158
+ start_time = Time.now
1159
+
1160
+ # Generate events
1161
+ total_events.times do |i|
1162
+ Events::OrderPaid.track(
1163
+ order_id: "order-#{i}",
1164
+ amount: rand(10..1000),
1165
+ customer_id: "customer-#{rand(1..10_000)}",
1166
+ items: Array.new(rand(1..10)) { { sku: "SKU-#{rand(1000)}", qty: rand(1..5) } }
1167
+ )
1168
+
1169
+ # Report progress every 10k events
1170
+ if (i + 1) % 10_000 == 0
1171
+ stats = E11y::MainBuffer.memory_stats
1172
+ puts "[#{Time.now - start_time}s] Events: #{i + 1}, Memory: #{stats[:current_bytes] / 1024 / 1024} MB (#{stats[:utilization]}%)"
1173
+ end
1174
+
1175
+ # Throttle to match target rate
1176
+ sleep(100.0 / events_per_second) if i % 100 == 0 # One pause per 100 events caps the rate at events_per_second
1177
+ end
1178
+
1179
+ end_time = Time.now
1180
+ end_memory = GC.stat(:heap_live_slots) * GC::INTERNAL_CONSTANTS[:RVALUE_SIZE]
1181
+
1182
+ # Results
1183
+ puts "\n=== Load Test Results ==="
1184
+ puts "Duration: #{(end_time - start_time).round(2)}s"
1185
+ puts "Events: #{total_events}"
1186
+ puts "Rate: #{(total_events / (end_time - start_time)).round} events/sec"
1187
+ puts "Memory increase: #{((end_memory - start_memory) / 1024 / 1024).round(2)} MB"
1188
+
1189
+ stats = E11y::MainBuffer.memory_stats
1190
+ puts "Final buffer memory: #{stats[:current_bytes] / 1024 / 1024} MB (#{stats[:utilization]}%)"
1191
+
1192
+ # Assertions
1193
+ raise "Memory limit exceeded!" if stats[:current_bytes] > 105 * 1024 * 1024 # 5% tolerance
1194
+ puts "\n✅ Load test passed: Memory stayed under 100 MB limit"
1195
+ ```
1196
+
1197
+ #### 3.3.4. Trade-offs & Monitoring (C20)
1198
+
1199
+ **Trade-offs:**
1200
+
1201
+ | Aspect | Pro | Con | Mitigation |
1202
+ |--------|-----|-----|------------|
1203
+ | **Memory Safety** | ✅ Bounded memory usage | ⚠️ May drop events under extreme load | Monitor drop rate, alert if > 1% |
1204
+ | **Backpressure** | ✅ Prevents overload | ⚠️ Can slow request processing | Set max_block_time = 1s, then drop |
1205
+ | **Complexity** | ⚠️ Memory estimation overhead | ⚠️ ~50 bytes overhead per event | Acceptable for safety guarantee |
1206
+ | **Throughput** | ✅ Handles 10K+ events/sec | ⚠️ Early flush may increase I/O | Tune warning threshold (default 80%) |
1207
+
1208
+ **Monitoring (Critical for C20):**
1209
+
1210
+ ```ruby
1211
+ # Prometheus/Yabeda metrics
1212
+ Yabeda.configure do
1213
+ group :e11y_buffer do
1214
+ gauge :memory_bytes, comment: 'Current buffer memory usage in bytes'
1215
+ gauge :memory_utilization, comment: 'Buffer memory utilization %'
1216
+
1217
+ counter :events_added, comment: 'Events added to buffer'
1218
+ counter :flushes, comment: 'Buffer flushes triggered'
1219
+ counter :early_flush_triggered, comment: 'Early flushes (80% threshold)'
1220
+ counter :emergency_flush, comment: 'Emergency flushes (memory exhaustion)'
1221
+
1222
+ counter :memory_exhaustion_blocked, comment: 'Events blocked due to memory limit', tags: [:strategy]
1223
+ counter :memory_exhaustion_dropped, comment: 'Events dropped due to memory limit', tags: [:strategy]
1224
+ counter :memory_exhaustion_throttled, comment: 'Events throttled due to memory limit', tags: [:strategy]
1225
+ end
1226
+ end
1227
+
1228
+ # Alert rules (Grafana)
1229
+ # Alert: Buffer memory utilization > 90%
1230
+ # Alert: Drop rate > 1% of ingestion rate
1231
+ # Alert: Emergency flushes > 10/min
1232
+ ```
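
The "drop rate > 1% of ingestion rate" alert can be expressed as a ratio of two counters. The sketch below assumes that ingestion means accepted plus dropped events over the same window. `drop_rate` and `drop_alert?` are hypothetical helpers, not Yabeda or E11y APIs.

```ruby
# Hypothetical helper: fraction of ingested events that were dropped.
def drop_rate(added:, dropped:)
  total = added + dropped
  return 0.0 if total.zero?

  dropped.to_f / total
end

# True when the drop rate exceeds the threshold (default 1%).
def drop_alert?(added:, dropped:, threshold: 0.01)
  drop_rate(added: added, dropped: dropped) > threshold
end

healthy  = drop_alert?(added: 9_950, dropped: 50)   # 0.5% dropped
degraded = drop_alert?(added: 9_800, dropped: 200)  # 2% dropped
```

In a real alert rule the same ratio would be computed from counter rates over a time window rather than from raw totals.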
1233
+
1234
+ **Related Conflicts:**
1235
+ - **C14:** Development buffer tuning (see ADR-010)
1236
+ - **C18:** Non-failing event tracking in background jobs (see ADR-013)
1237
+
1238
+ ---
1239
+
1240
+ ### 3.4. Request-Scoped Buffer
1241
+
1242
+ **Design Decision:** Per-request, thread-isolated storage using `ActiveSupport::CurrentAttributes`, which Rails resets around each request.
1243
+
1244
+ ```ruby
1245
+ module E11y
1246
+ class Current < ActiveSupport::CurrentAttributes
1247
+ # Thread-local attributes
1248
+ attribute :trace_id
1249
+ attribute :user_id
1250
+ attribute :request_id
1251
+ attribute :request_buffer # Debug events buffer
1252
+ attribute :sampled # Sampling decision
1253
+
1254
+ def request_buffer
1255
+ attributes[:request_buffer] ||= []
1256
+ end
1257
+
1258
+ def add_debug_event(event_data)
1259
+ request_buffer << event_data if request_buffer.size < Config.request_buffer_limit
1260
+ end
1261
+
1262
+ def flush_debug_events
1263
+ events = request_buffer.dup
1264
+ request_buffer.clear
1265
+ events
1266
+ end
1267
+ end
1268
+
1269
+ # Request-scoped buffer manager
1270
+ class RequestBuffer
1271
+ class << self
1272
+ def add(event_data)
1273
+ Current.add_debug_event(event_data)
1274
+ end
1275
+
1276
+ def flush
1277
+ Current.flush_debug_events
1278
+ end
1279
+
1280
+ def setup_rails_integration
1281
+ # Hook into Rails request cycle
1282
+ ActiveSupport::Notifications.subscribe('process_action.action_controller') do |*args|
1283
+ event = ActiveSupport::Notifications::Event.new(*args)
1284
+
1285
+ # Flush on error
1286
+ if event.payload[:exception]
1287
+ flush_to_adapters
1288
+ else
1289
+ # Discard on success
1290
+ flush
1291
+ end
1292
+ end
1293
+ end
1294
+
1295
+ private
1296
+
1297
+ def flush_to_adapters
1298
+ events = flush
1299
+
1300
+ # Send debug events to adapters
1301
+ events.each do |event_data|
1302
+ MainBuffer.add(event_data)
1303
+ end
1304
+ end
1305
+ end
1306
+ end
1307
+ end
1308
+ ```
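
The flush-on-error, discard-on-success behaviour can be exercised without Rails. The following is a minimal sketch, assuming `Thread.current[]` as the storage and a plain block as the "request". `DebugBuffer` and `with_request` are hypothetical stand-ins for `E11y::Current` and the `process_action.action_controller` subscription.

```ruby
# Hypothetical, Rails-free stand-in for the request-scoped buffer.
module DebugBuffer
  KEY = :sketch_debug_events
  LIMIT = 100 # assumed per-request cap

  def self.events
    Thread.current[KEY] ||= []
  end

  def self.add(event_data)
    events << event_data if events.size < LIMIT
  end

  # Returns the buffered events and empties the buffer.
  def self.drain
    drained = events.dup
    events.clear
    drained
  end
end

# Runs a "request": debug events are forwarded only if the block raises.
# The error is swallowed here to keep the sketch short; real code would re-raise.
def with_request(sink)
  yield
  DebugBuffer.drain                 # success: discard
rescue StandardError
  sink.concat(DebugBuffer.drain)    # failure: keep for debugging
end

sink = []
with_request(sink) { DebugBuffer.add(sql: 'SELECT 1') }
after_success = sink.size

with_request(sink) do
  DebugBuffer.add(cache_miss: true)
  raise 'boom'
end
after_failure = sink.size
```

Draining in both branches matters: if the success path left events behind, they would leak into the next request handled by the same thread.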
1309
+
1310
+ ---
1311
+
1312
+ ### 3.5. Adapter Base Class
1313
+
1314
+ **Design Decision:** Abstract base class with contract tests.
1315
+
1316
+ ```ruby
1317
+ module E11y
1318
+ module Adapters
1319
+ class Base
1320
+ # Required interface methods
1321
+ def write_batch(events)
1322
+ raise NotImplementedError, "#{self.class}#write_batch not implemented"
1323
+ end
1324
+
1325
+ def close
1326
+ # Optional: cleanup connections
1327
+ end
1328
+
1329
+ # Contract validation (for tests)
1330
+ def self.validate_contract!
1331
+ raise "#{name} must override #write_batch" if instance_method(:write_batch).owner == Base # Base defines write_batch itself, so checking mere presence would always pass
1332
+ end
1333
+
1334
+ protected
1335
+
1336
+ # Helper for serialization
1337
+ def serialize(events)
1338
+ events.map { |e| serialize_event(e) }
1339
+ end
1340
+
1341
+ def serialize_event(event_data)
1342
+ {
1343
+ '@timestamp' => event_data[:timestamp].iso8601,
1344
+ 'event.name' => event_data[:event_name],
1345
+ 'event.version' => event_data[:event_version],
1346
+ 'trace.id' => event_data[:context][:trace_id],
1347
+ 'user.id' => event_data[:context][:user_id],
1348
+ **event_data[:payload]
1349
+ }
1350
+ end
1351
+ end
1352
+
1353
+ # Example: Loki Adapter
1354
+ class LokiAdapter < Base
1355
+ def initialize(url:, labels: {}, **options)
1356
+ @url = url
1357
+ @labels = labels
1358
+ @http = Faraday.new(url: url) do |f|
1359
+ f.request :json
1360
+ f.response :raise_error
1361
+ f.adapter Faraday.default_adapter
1362
+ end
1363
+ end
1364
+
1365
+ def write_batch(events)
1366
+ payload = {
1367
+ streams: [{
1368
+ stream: @labels,
1369
+ values: events.map { |e|
1370
+ [
1371
+ (e[:timestamp].to_f * 1_000_000_000).to_i.to_s, # Nanoseconds
1372
+ serialize_event(e).to_json
1373
+ ]
1374
+ }
1375
+ }]
1376
+ }
1377
+
1378
+ @http.post('/loki/api/v1/push', payload)
1379
+ rescue => error
1380
+ raise E11y::AdapterError, "Loki write failed: #{error.message}"
1381
+ end
1382
+ end
1383
+ end
1384
+ end
1385
+ ```
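
The body sent by `write_batch` is one stream with a list of `[timestamp, line]` pairs. The sketch below builds that structure with the standard library only and derives the nanosecond timestamp from integer parts, because a Float cannot represent every nanosecond value at current epoch magnitudes. `loki_timestamp`, `loki_push_body`, and the `:line` key are hypothetical names used for illustration.

```ruby
require 'json'

# Integer nanoseconds since the epoch, without going through Float.
def loki_timestamp(time)
  (time.to_i * 1_000_000_000 + time.nsec).to_s
end

# Builds the push body: one stream, one [timestamp, line] pair per event.
def loki_push_body(labels, events)
  {
    streams: [{
      stream: labels,
      values: events.map { |e| [loki_timestamp(e[:timestamp]), e[:line].to_json] }
    }]
  }
end

events = [{ timestamp: Time.at(1, 500, :nsec), line: { 'event.name' => 'order.paid' } }]
body = loki_push_body({ app: 'shop' }, events)
```

Keeping the payload builder separate from the HTTP call also makes it possible to test the serialization without a running Loki instance.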
1386
+
1387
+ ---
1388
+
1389
+ ### 3.7. Request-Scoped Debug Buffer Flow
1390
+
1391
+ ```mermaid
1392
+ sequenceDiagram
1393
+ participant App as Application
1394
+ participant E11y as E11y::Pipeline
1395
+ participant RB as Request Buffer
1396
+ participant MB as Main Buffer
1397
+ participant Rails as Rails
1398
+
1399
+ Note over App,Rails: HTTP Request Starts
1400
+
1401
+ App->>E11y: Events::Debug.track(sql: 'SELECT...')
1402
+ E11y->>RB: Add to thread-local buffer
1403
+ Note over RB: Event buffered (not flushed)
1404
+
1405
+ App->>E11y: Events::Debug.track(cache_miss: true)
1406
+ E11y->>RB: Add to thread-local buffer
1407
+ Note over RB: 2 events buffered
1408
+
1409
+ App->>E11y: Events::Info.track(api_call: '...')
1410
+ E11y->>MB: Add to main buffer
1411
+ Note over MB: Info event → immediate flush
1412
+
1413
+ alt Request Succeeds
1414
+ Rails->>Rails: process_action.action_controller (success)
1415
+ Rails->>RB: Discard debug events
1416
+ Note over RB: Debug events dropped ✅
1417
+ else Request Fails (Error)
1418
+ Rails->>Rails: process_action.action_controller (exception)
1419
+ Rails->>RB: Flush debug events
1420
+ RB->>MB: Transfer all debug events to main buffer
1421
+ Note over MB: Debug events preserved for debugging 🔍
1422
+ end
1423
+ ```
1424
+
1425
+ ### 3.8. DLQ & Retry Flow
1426
+
1427
+ ```mermaid
1428
+ sequenceDiagram
1429
+ participant Buffer as Main Buffer
1430
+ participant Worker as Flush Worker
1431
+ participant Adapter as Loki Adapter
1432
+ participant Retry as Retry Policy
1433
+ participant DLQ as Dead Letter Queue
1434
+ participant Loki as Loki
1435
+
1436
+ Buffer->>Worker: pop(100 events)
1437
+ Worker->>Worker: Batching
1438
+ Worker->>Worker: Compression
1439
+
1440
+ Worker->>Adapter: write_batch(events)
1441
+ Adapter->>Loki: HTTP POST
1442
+ Loki-->>Adapter: ❌ 503 Service Unavailable
1443
+
1444
+ Adapter->>Retry: Handle error
1445
+
1446
+ Note over Retry: Retry #1 (100ms)
1447
+ Retry->>Loki: HTTP POST
1448
+ Loki-->>Retry: ❌ Timeout
1449
+
1450
+ Note over Retry: Retry #2 (200ms)
1451
+ Retry->>Loki: HTTP POST
1452
+ Loki-->>Retry: ❌ Timeout
1453
+
1454
+ Note over Retry: Retry #3 (400ms)
1455
+ Retry->>Loki: HTTP POST
1456
+ Loki-->>Retry: ❌ Timeout
1457
+
1458
+ Retry->>DLQ: Max retries exceeded
1459
+
1460
+ alt DLQ Filter: should_save?
1461
+ DLQ->>DLQ: Check filter rules
1462
+ DLQ->>DLQ: Save events to DLQ file
1463
+ Note over DLQ: Events preserved ✅
1464
+ else DLQ Filter: never_save
1465
+ DLQ->>DLQ: Drop events
1466
+ DLQ->>DLQ: Log to dropped_events.jsonl
1467
+ Note over DLQ: Events dropped but logged 📝
1468
+ end
1469
+
1470
+ Note over DLQ: Later: E11y::DeadLetterQueue.replay_all
1471
+ DLQ->>Loki: Replay when Loki is back
1472
+ Loki-->>DLQ: ✅ 204 Success
1473
+ ```
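
The retry portion of the diagram (three retries at 100ms, 200ms, and 400ms, then hand-off to the DLQ) can be sketched as follows. `deliver_with_retry` and its parameters are hypothetical; the `sleeper` argument exists only so the delays can be observed without actually waiting, and the DLQ filter step is left out.

```ruby
# Hypothetical retry helper: 1 initial attempt + max_retries retries,
# waiting base_delay * 2**n between attempts, then handing off to the DLQ.
def deliver_with_retry(batch, dlq:, max_retries: 3, base_delay: 0.1, sleeper: method(:sleep))
  attempt = 0
  begin
    yield batch
    :delivered
  rescue StandardError
    if attempt < max_retries
      sleeper.call(base_delay * (2**attempt))
      attempt += 1
      retry
    end
    dlq << batch
    :dead_lettered
  end
end

delays = []
dlq = []
calls = 0
result = deliver_with_retry([:e1, :e2], dlq: dlq, sleeper: ->(seconds) { delays << seconds }) do |_batch|
  calls += 1
  raise IOError, '503 Service Unavailable'
end
```

The whole batch is moved to the DLQ as one unit here; whether E11y stores batches or individual events is not specified in this section.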
1474
+
1475
+ ### 3.9. Adaptive Sampling Decision Tree
1476
+
1477
+ ```mermaid
1478
+ graph TD
1479
+ Start[Event arrives] --> Severity{Severity?}
1480
+
1481
+ Severity -->|error/fatal| AlwaysSample[Always sample ✅]
1482
+ Severity -->|warn/info/debug| CheckPattern{Pattern match?}
1483
+
1484
+ CheckPattern -->|payment.*| AlwaysSample
1485
+ CheckPattern -->|audit.*| AlwaysSample
1486
+ CheckPattern -->|other| CheckLoad{System load?}
1487
+
1488
+ CheckLoad -->|<50%| HighRate[Sample 100% ✅]
1489
+ CheckLoad -->|50-80%| MediumRate{Error rate?}
1490
+ CheckLoad -->|>80%| LowRate[Sample 10% ⚠️]
1491
+
1492
+ MediumRate -->|<1%| Sample50[Sample 50%]
1493
+ MediumRate -->|≥1%| Sample75[Sample 75%]
1494
+
1495
+ Sample50 --> Decision{Random < 0.5?}
1496
+ Sample75 --> Decision2{Random < 0.75?}
1497
+ LowRate --> Decision3{Random < 0.1?}
1498
+
1499
+ Decision -->|Yes| Sample[Sample ✅]
1500
+ Decision -->|No| Drop[Drop 🗑️]
1501
+ Decision2 -->|Yes| Sample
1502
+ Decision2 -->|No| Drop
1503
+ Decision3 -->|Yes| Sample
1504
+ Decision3 -->|No| Drop
1505
+
1506
+ AlwaysSample --> Pipeline[Continue pipeline]
1507
+ HighRate --> Pipeline
1508
+ Sample --> Pipeline
1509
+ Drop --> Metrics[Increment sampled metric]
1510
+
1511
+ style AlwaysSample fill:#d4edda
1512
+ style Sample fill:#d4edda
1513
+ style Drop fill:#f8d7da
1514
+ style LowRate fill:#fff3cd
1515
+ ```
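
The decision tree above reduces to a function that returns a sampling probability, followed by one random draw. The sketch below is one possible reading of the diagram. It assumes that load and error rate arrive as percentages, that a load of exactly 50% or 80% falls into the middle band, and that an error rate of exactly 1% takes the 75% branch. `sample_rate` and `sample?` are hypothetical names.

```ruby
ALWAYS_SAMPLE_SEVERITIES = %i[error fatal].freeze
ALWAYS_SAMPLE_PATTERNS = [/\Apayment\./, /\Aaudit\./].freeze

# Returns the sampling probability for an event (1.0 = always keep).
# load_pct and error_rate_pct are percentages supplied by the caller.
def sample_rate(severity:, event_name:, load_pct:, error_rate_pct:)
  return 1.0 if ALWAYS_SAMPLE_SEVERITIES.include?(severity)
  return 1.0 if ALWAYS_SAMPLE_PATTERNS.any? { |pattern| pattern.match?(event_name) }
  return 1.0 if load_pct < 50
  return 0.1 if load_pct > 80

  error_rate_pct < 1 ? 0.5 : 0.75
end

# Keep the event when a uniform random draw falls below the rate.
def sample?(rate, random: Random.new)
  random.rand < rate
end
```

Separating the rate from the draw keeps the rules deterministic and testable; only `sample?` involves randomness.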
1516
+
1517
+ ### 3.10. Cardinality Protection Flow
1518
+
1519
+ ```mermaid
1520
+ graph LR
1521
+ Event[Event with labels] --> Check1{In denylist?}
1522
+
1523
+ Check1 -->|Yes| Drop1[Drop label ❌]
1524
+ Check1 -->|No| Check2{In allowlist?}
1525
+
1526
+ Check2 -->|Yes| Keep[Keep label ✅]
1527
+ Check2 -->|No| Check3{Cardinality<br/>< limit?}
1528
+
1529
+ Check3 -->|Yes| Keep
1530
+ Check3 -->|No| Action{Protection<br/>action?}
1531
+
1532
+ Action -->|drop| Drop2[Drop label ❌]
1533
+ Action -->|alert| Alert[Alert + Drop ⚠️]
1534
+
1535
+ Drop1 --> Metric1[Log denylist hit]
1536
+ Drop2 --> Metric2[Log cardinality exceeded]
1537
+ Keep --> Export[Export metric]
1538
+ Alert --> PagerDuty[PagerDuty alert 🚨]
1539
+
1540
+ style Drop1 fill:#f8d7da
1541
+ style Drop2 fill:#f8d7da
1542
+ style Keep fill:#d4edda
1543
+ style Alert fill:#fff3cd
1544
+ style PagerDuty fill:#f8d7da
1546
+ ```
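
A minimal sketch of the label checks in the flow above: denylist first, then allowlist, then a per-label limit on distinct values. `LabelFilter` is a hypothetical class, and the alerting branch is omitted, so labels over the limit are simply dropped.

```ruby
require 'set'

# Hypothetical label filter following the flow above.
class LabelFilter
  def initialize(denylist:, allowlist:, limit:)
    @denylist = denylist
    @allowlist = allowlist
    @limit = limit
    @seen = Hash.new { |hash, key| hash[key] = Set.new } # label => distinct values
  end

  # Returns only the labels that are safe to export.
  def filter(labels)
    labels.select do |name, value|
      next false if @denylist.include?(name)
      next true if @allowlist.include?(name)

      within_limit?(name, value)
    end
  end

  private

  # A value already seen is always allowed; a new value only while under the limit.
  def within_limit?(name, value)
    values = @seen[name]
    return true if values.include?(value)
    return false if values.size >= @limit

    values << value
    true
  end
end

label_filter = LabelFilter.new(denylist: [:user_id], allowlist: [:status], limit: 2)
first  = label_filter.filter({ user_id: 1, status: 'paid', region: 'eu' })
second = label_filter.filter({ region: 'us' })
third  = label_filter.filter({ region: 'ap' }) # third distinct region exceeds the limit of 2
```

Allowlisted labels bypass the cardinality counter entirely, so the allowlist should only contain labels with a known, small set of values.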
1547
+
1548
+ ---
1549
+
1550
+ ## 4. Processing Pipeline
1551
+
1552
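+ > ⚠️ **CRITICAL:** Middlewares must run in the order shown below; validation, PII filtering, rate limiting, and sampling all read configuration from the original (versioned) event class.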
1553
+ > 📖 **See:** [ADR-015: Middleware Order](ADR-015-middleware-order.md) for detailed reference guide
1554
+
1555
+ ### 4.1. Middleware Execution Order (CRITICAL!)
1556
+
1557
+ ```ruby
1558
+ # config/initializers/e11y.rb
1559
+ E11y.configure do |config|
1560
+ # Pipeline order (CRITICAL: Versioning MUST be the last middleware before routing!)
1561
+ config.pipeline.use TraceContextMiddleware # 1. Add trace_id, timestamp
1562
+ config.pipeline.use ValidationMiddleware # 2. Fail fast (uses original class)
1563
+ config.pipeline.use PiiFilterMiddleware # 3. Security first (uses original class)
1564
+ config.pipeline.use RateLimitMiddleware # 4. System protection (uses original class)
1565
+ config.pipeline.use SamplingMiddleware # 5. Cost optimization (uses original class)
1566
+ config.pipeline.use VersioningMiddleware # 6. Normalize event_name (last before routing)
1567
+ config.pipeline.use RoutingMiddleware # 7. Buffer routing (final)
1568
+ end
1569
+ ```
1570
+
1571
+ ### 4.2. Pipeline Execution Flow
1572
+
1573
+ ```
1574
+ Event.track(payload)
1575
+
1576
+ Pipeline.process(event_data)
1577
+
1578
+ 1. TraceContextMiddleware
1579
+ ├─ Add trace_id (from Current or generate)
1580
+ ├─ Add span_id
1581
+ ├─ Add timestamp
1582
+ └─ next
1583
+
1584
+ 2. ValidationMiddleware (uses ORIGINAL class: Events::OrderPaidV2)
1585
+ ├─ Schema validation
1586
+ ├─ FAIL → raise ValidationError
1587
+ └─ PASS → next
1588
+
1589
+ 3. PiiFilterMiddleware (uses ORIGINAL class: Events::OrderPaidV2)
1590
+ ├─ Get PII rules (class-level, may differ V1 vs V2!)
1591
+ ├─ Apply filtering (per-adapter if configured)
1592
+ └─ next
1593
+
1594
+ 4. RateLimitMiddleware (uses ORIGINAL class: Events::OrderPaidV2)
1595
+ ├─ Check rate limit (Redis or local, may differ V1 vs V2!)
1596
+ ├─ EXCEEDED → return :rate_limited
1597
+ └─ ALLOWED → next
1598
+
1599
+ 5. SamplingMiddleware (uses ORIGINAL class: Events::OrderPaidV2)
1600
+ ├─ Check sampling rules (adaptive, trace-consistent)
1601
+ ├─ NOT SAMPLED → return :sampled
1602
+ └─ SAMPLED → next
1603
+
1604
+ 6. VersioningMiddleware (last before routing; normalizes for adapters)
1605
+ ├─ Extract version from class name (V2 → 2)
1606
+ ├─ Normalize event_name (Events::OrderPaidV2 → Events::OrderPaid)
1607
+ ├─ Add v: 2 to payload (only if > 1)
1608
+ └─ next
1609
+
1610
+ 7. RoutingMiddleware
1611
+ ├─ severity == :debug? → RequestBuffer
1612
+ └─ severity == :info+? → MainBuffer
1613
+
1614
+ Buffer → Adapters (receive normalized event_name)
1615
+ ```
1616
+
1617
+ ---
1618
+
1619
+ ### 4.3. Why Middleware Order Matters
1620
+
1621
+ **Key Rule:** All business logic (validation, PII filtering, rate limiting, sampling) MUST use the **ORIGINAL class name** (e.g., `Events::OrderPaidV2`), not the normalized one.
1622
+
1623
+ **Why?**
1624
+ - V2 may have a different schema than V1
+ - V2 may have different PII rules than V1
+ - V2 may have different rate limits than V1
+ - V2 may have different sample rates than V1
1628
+
1629
+ **Versioning Middleware** is purely cosmetic normalization for external systems (adapters, Loki, Grafana). It MUST be the last middleware before routing.
1630
+
1631
+ **📖 For detailed explanation, examples, and troubleshooting, see:**
1632
+ - **[ADR-015: Middleware Order](ADR-015-middleware-order.md)** - Complete reference guide
1633
+ - **[ADR-012: Event Evolution](ADR-012-event-evolution.md)** - Versioning design
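
The effect of normalizing too early can be shown with a few lines. The sketch below assumes per-class settings are looked up by class name. `normalize_event_name` mirrors the regular expressions in `VersioningMiddleware`, while `RATE_LIMITS` and its values are invented for illustration.

```ruby
# Splits a versioned class name into [normalized_name, version].
def normalize_event_name(class_name)
  match = class_name.match(/V(\d+)\z/)
  return [class_name, 1] unless match

  [match.pre_match, match[1].to_i]
end

# Hypothetical per-class config, keyed by the ORIGINAL class name.
RATE_LIMITS = { 'Events::OrderPaid' => 100, 'Events::OrderPaidV2' => 500 }.freeze

original = 'Events::OrderPaidV2'
limit_before_normalizing = RATE_LIMITS[original]
normalized, version = normalize_event_name(original)
limit_after_normalizing = RATE_LIMITS[normalized]
```

Looking up the limit after normalization silently returns the V1 setting, which is the failure mode the ordering rule is meant to prevent.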
1634
+
1635
+ ---
1636
+
1637
+ ## 5. Memory Optimization Strategy
1638
+
1639
+ ### 5.1. Zero-Allocation Pattern
1640
+
1641
+ **Key Principle:** No per-event object instances; each event is one plain Hash that is passed through the pipeline.
1642
+
1643
+ ```ruby
1644
+ # ❌ BAD (creates instance):
1645
+ class OrderPaid < E11y::Event::Base
1646
+ def self.track(**payload)
1647
+ instance = new(payload) # ← Creates object!
1648
+ instance.process
1649
+ end
1650
+ end
1651
+
1652
+ # ✅ GOOD (zero allocation):
1653
+ class OrderPaid < E11y::Event::Base
1654
+ def self.track(**payload)
1655
+ event_data = { # ← Just a hash!
1656
+ event_class: self,
1657
+ payload: payload,
1658
+ timestamp: Time.now.utc
1659
+ }
1660
+
1661
+ Pipeline.process(event_data) # ← Pass hash through
1662
+ end
1663
+ end
1664
+ ```
1665
+
1666
+ ### 5.2. Memory Budget Breakdown
1667
+
1668
+ > **⚠️ C20 Update:** Memory budget now enforced via adaptive buffering (see §3.3.2)
1669
+
1670
+ **Target: <100MB @ steady state (1000 events/sec)**
1671
+
1672
+ ```
1673
+ Component Breakdown:
1674
+
1675
+ 1. Ring Buffer (main) - ADAPTIVE:
1676
+ - Capacity: Dynamic (memory-limited)
1677
+ - Memory limit: 100 MB (configurable)
1678
+ - Size per event: ~500 bytes (hash)
1679
+ - Max events: ~200k events @ 500 bytes each
1680
+ - Actual usage: Adaptive based on throughput
1681
+ - Total: ~50MB budgeted at steady state (the hard cap is the 100 MB memory limit enforced by AdaptiveBuffer)
1682
+
1683
+ 2. Request Buffers (threads):
1684
+ - Threads: 10 concurrent requests
1685
+ - Events per request: 100 debug events
1686
+ - Size per event: ~500 bytes
1687
+ - Total: 10 × 100 × 500 = 500KB
1688
+
1689
+ 3. Event Classes (registry):
1690
+ - Classes: 100 event types
1691
+ - Size per class: ~10KB (metadata)
1692
+ - Total: 100 × 10KB = 1MB
1693
+
1694
+ 4. Adapters (connections):
1695
+ - Adapters: 5 (Loki, File, Sentry, etc.)
1696
+ - Connection overhead: ~2MB each
1697
+ - Total: 5 × 2MB = 10MB
1698
+
1699
+ 5. Ruby VM overhead:
1700
+ - Base Rails app: ~30MB
1701
+ - E11y gem code: ~5MB
1702
+ - Total: 35MB
1703
+
1704
+ TOTAL: 50 + 0.5 + 1 + 10 + 35 = 96.5MB
1705
+ ```
1706
+
1707
+ ✅ **Within budget: <100MB**
1708
+
1709
+ **C20 Safety Guarantee:**
1710
+ - Adaptive buffer enforces hard memory limit (default 100 MB)
1711
+ - At high throughput (10K+ events/sec), backpressure prevents overflow
1712
+ - Early flush triggered at 80% memory utilization
1713
+ - Emergency flush at 100% memory utilization
1714
+ - Monitoring alerts when memory > 90% for > 1 minute
1715
+
1716
+ **See:** §3.3.2 for adaptive buffer implementation details
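
The budget total can be re-derived from the per-component figures. The sketch below uses decimal megabytes to match the round numbers in the breakdown and takes the 50 MB main-buffer figure as given. All variable names are illustrative.

```ruby
MB = 1_000_000.0 # decimal megabytes, matching the round numbers above

budget_mb = {
  main_buffer:     50.0,                           # figure stated in the breakdown
  request_buffers: 10 * 100 * 500 / MB,            # threads x events x bytes
  event_classes:   100 * 10_000 / MB,              # classes x ~10KB
  adapters:        5 * 2_000_000 / MB,             # adapters x ~2MB
  ruby_vm:         (30_000_000 + 5_000_000) / MB   # Rails base + gem code
}

total_mb = budget_mb.values.sum
headroom_mb = 100 - total_mb
```

Note that the 96.5 MB total depends on the main buffer staying near 50 MB. If the buffer were to fill to the default 100 MB limit, the same sum would come to 146.5 MB, so the limit may need to be lowered where the overall 100 MB target is strict.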
1717
+
1718
+ ### 5.3. GC Optimization
1719
+
1720
+ ```ruby
1721
+ # Minimize GC pressure
1722
+ module E11y
1723
+ class Pipeline
1724
+ # Reuse hash instead of creating new
1725
+ SHARED_CONTEXT = {}
1726
+
1727
+ def self.process(event_data)
1728
+ # Don't merge! Mutate instead (if safe)
1729
+ event_data[:processed_at] = Time.now.utc
1730
+
1731
+ # ...
1732
+ end
1733
+ end
1734
+ end
1735
+
1736
+ # Pool objects where possible
1737
+ module E11y
1738
+ class StringPool
1739
+ @pool = {}
1740
+
1741
+ def self.intern(string)
1742
+ @pool[string] ||= string.freeze
1743
+ end
1744
+ end
1745
+ end
1746
+ ```
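
`StringPool` above keeps every interned string alive for the life of the process and writes to a shared Hash without synchronization. As a possible alternative (not something the original design prescribes), Ruby's built-in `String#-@` returns a frozen, deduplicated string without a hand-rolled pool. A short sketch:

```ruby
# Two distinct, unfrozen String objects with the same content.
first  = String.new('order.paid')
second = String.new('order.paid')

# String#-@ returns a frozen string; on CRuby, equal contents map to one object.
interned_first  = -first
interned_second = -second
```

Unlike a Hash-backed pool, strings deduplicated this way remain eligible for garbage collection once nothing references them.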
1747
+
1748
+ ---
1749
+
1750
+ ## 6. Thread Safety & Concurrency
1751
+
1752
+ ### 6.1. Concurrency Model
1753
+
1754
+ **Components:**
1755
+
1756
+ 1. **Thread-local (no sync needed):**
1757
+ - Request-scoped buffer (Current.request_buffer)
1758
+ - Context (Current.trace_id, etc.)
1759
+
1760
+ 2. **Concurrent (thread-safe):**
1761
+ - Main ring buffer (Concurrent::AtomicFixnum)
1762
+ - Adapter registry (frozen after boot)
1763
+
1764
+ 3. **Single-threaded (no contention):**
1765
+ - Flush worker (one timer task)
1766
+ - Event registry (immutable)
1767
+
1768
+ ### 6.2. Thread Safety Guarantees
1769
+
1770
+ ```ruby
1771
+ module E11y
1772
+ class Config
1773
+ def self.configure
1774
+ raise "Already configured" if @configured
1775
+
1776
+ yield configuration
1777
+
1778
+ # Freeze after configuration
1779
+ configuration.freeze!
1780
+ @configured = true
1781
+ end
1782
+
1783
+ class Configuration
1784
+ def freeze!
1785
+ @adapters.freeze
1786
+ @middlewares.freeze
1787
+ @pii_rules.freeze
1788
+ # ... freeze all config
1789
+ end
1790
+ end
1791
+ end
1792
+
1793
+ class Registry
1794
+ def self.register(event_class)
1795
+ @mutex.synchronize do
1796
+ @events[event_class.event_name] ||= {}
1797
+ @events[event_class.event_name][event_class.version] = event_class
1798
+ end
1799
+ end
1800
+
1801
+ def self.freeze!
1802
+ @events.freeze
1803
+ @mutex = nil # No more registration
1804
+ end
1805
+ end
1806
+ end
1807
+
1808
+ # After Rails boot:
1809
+ Rails.application.config.after_initialize do
1810
+ E11y::Registry.freeze!
1811
+ end
1812
+ ```
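
Two details of the registry above are worth illustrating: `@events.freeze` is shallow, so the nested per-event Hashes stay mutable, and setting `@mutex = nil` turns a late `register` call into a `NoMethodError` rather than a clear "frozen" error. The sketch below shows one way to get a deep freeze and a `FrozenError` instead. `SketchRegistry` is a hypothetical class, not the E11y implementation.

```ruby
# Hypothetical registry: mutable during boot, read-only afterwards.
class SketchRegistry
  def initialize
    @events = {}
    @mutex = Mutex.new
  end

  def register(name, version, event_class)
    @mutex.synchronize do
      (@events[name] ||= {})[version] = event_class
    end
  end

  # Freezes the outer Hash and each per-event Hash.
  def freeze!
    @mutex.synchronize do
      @events.each_value(&:freeze)
      @events.freeze
    end
  end

  def lookup(name, version)
    @events.dig(name, version)
  end
end

registry = SketchRegistry.new
registry.register('order.paid', 1, :OrderPaid)
registry.freeze!

late_registration_error =
  begin
    registry.register('order.paid', 2, :OrderPaidV2)
    nil
  rescue FrozenError => e
    e.class
  end
```

Reads after `freeze!` need no lock because nothing can mutate the frozen Hashes any more.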
1813
+
1814
+ ---
1815
+
1816
+ ## 7. Extension Points
1817
+
1818
+ ### 7.1. Custom Middleware
1819
+
1820
+ ```ruby
1821
+ # Developers can add custom middleware
+ class CustomMiddleware < E11y::Middleware
+   def call(event_data)
+     # Custom logic
+     if event_data[:payload][:user_role] == 'admin'
+       event_data[:payload][:priority] = 'high'
+     end
+
+     @app.call(event_data)
+   end
+ end
+
+ # Register
+ E11y.configure do |config|
+   config.pipeline.use CustomMiddleware
+ end
1837
+ ```
1838
+
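How `@app.call` hands the event to the next link can be sketched without the gem. The classes below (`SketchMiddleware`, `TagAdmins`, `StampProcessedAt`, `MiniPipeline`) are hypothetical stand-ins written for this illustration; the real `E11y::Middleware` base class and `config.pipeline` may differ.

```ruby
# Hypothetical stand-in for a middleware base class: it only stores the next app.
class SketchMiddleware
  def initialize(app)
    @app = app
  end
end

class TagAdmins < SketchMiddleware
  def call(event_data)
    event_data[:payload][:priority] = 'high' if event_data[:payload][:user_role] == 'admin'
    @app.call(event_data)
  end
end

class StampProcessedAt < SketchMiddleware
  def call(event_data)
    event_data[:processed_at] = Time.now.utc
    @app.call(event_data)
  end
end

class MiniPipeline
  def initialize
    @middlewares = []
  end

  def use(klass)
    @middlewares << klass
    self
  end

  # Wrap the terminal handler from the inside out, so the first
  # middleware registered is the first one called.
  def build(terminal)
    @middlewares.reverse.inject(terminal) { |app, klass| klass.new(app) }
  end
end
```

Anything that responds to `call` (a lambda, for instance) can serve as the terminal handler passed to `build`.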
1839
+ ### 7.2. Custom Adapters
1840
+
1841
+ ```ruby
1842
+ # Developers can write custom adapters
+ class MyCustomAdapter < E11y::Adapters::Base
+   def initialize(**options)
+     @options = options
+   end
+
+   def write_batch(events)
+     # Custom logic
+     events.each do |event_data|
+       puts serialize_event(event_data).to_json
+     end
+   end
+ end
+
+ # Register
+ E11y.configure do |config|
+   config.adapters.register :my_custom, MyCustomAdapter.new(...)
+ end
1860
+ ```
1861
+
1862
+ ### 7.3. Custom Event Fields
1863
+
1864
+ ```ruby
1865
+ # Developers can add custom fields to events
+ class OrderPaid < E11y::Event::Base
+   schema do
+     required(:order_id).filled(:string)
+     required(:amount).filled(:decimal)
+
+     # Custom field
+     optional(:internal_metadata).hash
+   end
+
+   # Custom class method
+   def self.track_with_metadata(**payload)
+     track(**payload.merge(
+       internal_metadata: {
+         source: 'api',
+         version: 'v2'
+       }
+     ))
+   end
+ end
1885
+ ```
1886
+
1887
+ ---
1888
+
1889
+ ## 8. Performance Requirements
1890
+
1891
+ ### 8.1. Latency Targets
1892
+
1893
+ | Operation | p50 | p95 | p99 | Max |
1894
+ |-----------|-----|-----|-----|-----|
1895
+ | **Event.track()** | <0.1ms | <0.5ms | <1ms | <5ms |
1896
+ | **Pipeline processing** | <0.05ms | <0.2ms | <0.5ms | <2ms |
1897
+ | **Buffer write** | <0.01ms | <0.05ms | <0.1ms | <1ms |
1898
+ | **Adapter write (batch)** | <10ms | <50ms | <100ms | <500ms |
1899
+
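The percentile columns in the table can be checked against measured samples with a nearest-rank calculation. The helpers `percentile` and `latency_summary` are hypothetical names introduced for this sketch; they are not part of E11y.

```ruby
# Nearest-rank percentile: the smallest sample such that at least
# `pct` percent of all samples are less than or equal to it.
# `pct` is expected to be an Integer in 1..100.
def percentile(samples, pct)
  raise ArgumentError, 'samples must not be empty' if samples.empty?
  raise ArgumentError, 'pct must be an Integer in 1..100' unless pct.is_a?(Integer) && (1..100).cover?(pct)

  sorted = samples.sort
  rank = (pct * sorted.size + 99) / 100 # integer ceiling, 1-based rank
  sorted[rank - 1]
end

# Collects the four columns used in the latency table.
def latency_summary(samples)
  {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
    max: samples.max
  }
end
```

With 1000 samples, p99 under this definition is the 990th sorted value (index 989).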
1900
+ ### 8.2. Throughput Targets
1901
+
1902
+ - **Sustained:** 1000 events/sec
1903
+ - **Burst:** 5000 events/sec (5 seconds)
1904
+ - **Peak:** 10000 events/sec (1 second)
1905
+
1906
+ ### 8.3. Resource Limits
1907
+
1908
+ - **Memory:** <100MB @ steady state
1909
+ - **CPU:** <5% @ 1000 events/sec
1910
+ - **GC time:** <10ms per minor GC
1911
+ - **Threads:** ≤5 (1 main + 4 workers)
1912
+
1913
+ ---
1914
+
1915
+ ## 9. Testing Strategy
1916
+
1917
+ ### 9.1. Test Pyramid
1918
+
1919
+ ```
1920
+ ┌─────────────┐
+ │   Manual    │  1% - Exploratory
+ │   Testing   │
+ ├─────────────┤
+ │     E2E     │  4% - Full pipeline
+ ├─────────────┤
+ │ Integration │ 15% - Multi-component
+ ├─────────────┤
+ │    Unit     │ 80% - Individual components
+ └─────────────┘
1930
+ ```
1931
+
1932
+ ### 9.2. Test Coverage Requirements
1933
+
1934
+ | Component | Coverage | Critical? |
1935
+ |-----------|----------|-----------|
1936
+ | **Pipeline** | 95% | ✅ Yes |
1937
+ | **Buffers** | 90% | ✅ Yes |
1938
+ | **Adapters** | 85% | ✅ Yes |
1939
+ | **Middlewares** | 90% | ✅ Yes |
1940
+ | **Event DSL** | 95% | ✅ Yes |
1941
+ | **Configuration** | 80% | ⚠️ Important |
1942
+
1943
+ ### 9.3. Adapter Contract Tests
1944
+
1945
+ ```ruby
1946
+ # Shared contract tests for all adapters.
+ # Including specs must define `adapter` (for example with `let(:adapter)`).
+ RSpec.shared_examples 'adapter contract' do
+   describe '#write_batch' do
+     it 'accepts array of event hashes' do
+       events = [{ event_name: 'test', payload: {} }]
+       expect { adapter.write_batch(events) }.not_to raise_error
+     end
+
+     it 'returns success result' do
+       events = [{ event_name: 'test', payload: {} }]
+       result = adapter.write_batch(events)
+       expect(result).to be_success
+     end
+
+     it 'handles empty array' do
+       expect { adapter.write_batch([]) }.not_to raise_error
+     end
+
+     it 'raises AdapterError on failure' do
+       allow(adapter).to receive(:http).and_raise(StandardError)
+       events = [{ event_name: 'test', payload: {} }]
+       expect { adapter.write_batch(events) }.to raise_error(E11y::AdapterError)
+     end
+   end
+ end
+
+ # Usage in adapter specs
+ RSpec.describe E11y::Adapters::LokiAdapter do
+   it_behaves_like 'adapter contract'
+
+   # Adapter-specific tests
+ end
1978
+ ```
1979
+
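For reference, a minimal in-memory adapter that would satisfy the first three contract examples might look like the sketch below. Everything here is an assumption made for illustration: `AdapterSketch`, `MemoryAdapter`, `Result`, and the local `AdapterError` are hypothetical and do not reflect the actual `E11y::Adapters::Base` API.

```ruby
module AdapterSketch
  # Stand-in for E11y::AdapterError.
  class AdapterError < StandardError; end

  # Tiny result object so callers (and RSpec's `be_success`) can ask `success?`.
  Result = Struct.new(:written) do
    def success?
      true
    end
  end

  class MemoryAdapter
    attr_reader :events

    def initialize(store: [])
      @events = store
    end

    # Appends the batch to the in-memory store and reports how many were written.
    def write_batch(events)
      @events.concat(events)
      Result.new(events.size)
    rescue StandardError => e
      # Translate any backend failure into the error the contract expects.
      raise AdapterError, "write_batch failed: #{e.message}"
    end
  end
end
```

Wrapping backend failures in a single error class keeps the flush worker's rescue logic independent of each adapter's transport.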
1980
+ ### 9.4. Performance Benchmarks
1981
+
1982
+ ```ruby
1983
+ # spec/performance/event_tracking_spec.rb
+ require 'benchmark'
+
+ RSpec.describe 'Event tracking performance' do
+   it 'tracks 1000 events in <1 second' do
+     elapsed = Benchmark.realtime do
+       1000.times do |i|
+         # order_id is a string in the schema, so convert the counter
+         Events::OrderPaid.track(order_id: i.to_s, amount: 99.99)
+       end
+     end
+
+     expect(elapsed).to be < 1.0
+   end
+
+   it 'has <1ms p99 latency' do
+     latencies = []
+
+     1000.times do
+       latency = Benchmark.realtime do
+         Events::OrderPaid.track(order_id: '123', amount: 99.99)
+       end
+       latencies << latency
+     end
+
+     # Nearest-rank p99: the 990th of 1000 sorted samples (index 989)
+     p99 = latencies.sort[(latencies.size * 0.99).ceil - 1]
+     expect(p99).to be < 0.001 # <1ms
+   end
+
+   it 'uses <100MB memory @ steady state' do
+     # Approximation: counts live heap slots only; malloc'd memory is not included
+     GC.start
+     before = GC.stat(:heap_live_slots) * GC::INTERNAL_CONSTANTS[:RVALUE_SIZE]
+
+     10_000.times do
+       Events::OrderPaid.track(order_id: '123', amount: 99.99)
+     end
+
+     GC.start
+     after = GC.stat(:heap_live_slots) * GC::INTERNAL_CONSTANTS[:RVALUE_SIZE]
+
+     memory_increase = (after - before) / 1024 / 1024 # MB
+     expect(memory_increase).to be < 100
+   end
+ end
2024
+ ```
2025
+
2026
+ ---
2027
+
2028
+ ## 10. Dependencies
2029
+
2030
+ ### 10.1. Required Dependencies
2031
+
2032
+ ```ruby
2033
+ # e11y.gemspec
2034
+ Gem::Specification.new do |spec|
2035
+ spec.name = 'e11y'
2036
+ spec.version = E11y::VERSION
2037
+ spec.required_ruby_version = '>= 3.3.0'
2038
+
2039
+ # Required
2040
+ spec.add_dependency 'rails', '>= 8.0.0'
2041
+ spec.add_dependency 'dry-schema', '~> 1.13'
2042
+ spec.add_dependency 'dry-configurable', '~> 1.1'
2043
+ spec.add_dependency 'concurrent-ruby', '~> 1.2'
2044
+
2045
+ # Development
2046
+ spec.add_development_dependency 'rspec', '~> 3.12'
2047
+ spec.add_development_dependency 'rspec-rails', '~> 6.0'
2048
+ spec.add_development_dependency 'benchmark-ips', '~> 2.12'
2049
+ end
2050
+ ```
2051
+
2052
+ ### 10.2. Optional Dependencies (Features)
2053
+
2054
+ ```ruby
2055
+ # Gemfile
2056
+ group :e11y_optional do
2057
+ gem 'yabeda', '~> 0.12' # Metrics (UC-003)
2058
+ gem 'sentry-ruby', '~> 5.0' # Sentry adapter (UC-005)
2059
+ gem 'faraday', '~> 2.0' # HTTP adapters
2060
+ gem 'redis', '~> 5.0' # Rate limiting (UC-011)
2061
+ end
2062
+ ```
2063
+
2064
+ ### 10.3. Dependency Validation
2065
+
2066
+ ```ruby
2067
+ # Check dry-configurable version (the spec is nil if the gem is not loaded)
+ dry_configurable_spec = Gem.loaded_specs['dry-configurable']
+ if dry_configurable_spec.nil? || dry_configurable_spec.version < Gem::Version.new('1.0.0')
+   raise 'dry-configurable >= 1.0 required'
+ end
2072
+
2073
+ # Verify actively maintained
2074
+ # Last release: 2024-01-15 (✅ active)
2075
+ ```
2076
+
2077
+ ---
2078
+
2079
+ ### 10.4. Deployment View
2080
+
2081
+ ```mermaid
2082
+ graph TB
2083
+ subgraph "Rails Application Pod"
2084
+ subgraph "Rails Process"
2085
+ App[Rails App<br/>Puma Workers]
2086
+ E11y[E11y Gem<br/>Embedded]
2087
+
2088
+ App -->|calls| E11y
2089
+ end
2090
+
2091
+ subgraph "E11y Workers"
2092
+ FW[Flush Worker<br/>Timer 200ms]
2093
+ RB[Request Buffers<br/>Per-thread]
2094
+ MB[Main Buffer<br/>Ring 100k]
2095
+ end
2096
+
2097
+ E11y --> RB
2098
+ E11y --> MB
2099
+ MB --> FW
2100
+ end
2101
+
2102
+ subgraph "External Services"
2103
+ Loki[Grafana Loki<br/>Log Aggregation]
2104
+ ES[Elasticsearch<br/>Analytics]
2105
+ Sentry[Sentry<br/>Error Tracking]
2106
+ Redis[Redis<br/>Rate Limiting]
2107
+ end
2108
+
2109
+ subgraph "Storage"
2110
+ S3[S3 Bucket<br/>DLQ + Archive]
2111
+ Local[Local Disk<br/>DLQ Fallback]
2112
+ end
2113
+
2114
+ FW -->|HTTP/JSON| Loki
2115
+ FW -->|HTTP/JSON| ES
2116
+ FW -->|Sentry SDK| Sentry
2117
+ E11y -->|Rate checks| Redis
2118
+ FW -->|DLQ events| S3
2119
+ FW -->|DLQ fallback| Local
2120
+
2121
+ style App fill:#d1ecf1
2122
+ style E11y fill:#d4edda
2123
+ style FW fill:#fff3cd
2124
+ ```
2125
+
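Combining the 200 ms flush timer and the 100k ring from the diagram with the throughput targets in §8.2 gives a rough sizing check. The sketch assumes "Ring 100k" means 100,000 event slots and that each flush drains the buffer completely; the constant and method names are hypothetical.

```ruby
FLUSH_INTERVAL_SEC = 0.2      # "Timer 200ms" in the diagram
RING_CAPACITY      = 100_000  # "Ring 100k" in the diagram (assumed to be event slots)

# Events that accumulate between two flushes at a given rate.
def events_per_flush(rate_per_sec)
  (rate_per_sec * FLUSH_INTERVAL_SEC).round
end

# Seconds until the ring is full if no flush succeeds at all.
def seconds_until_full(rate_per_sec)
  RING_CAPACITY.fdiv(rate_per_sec)
end
```

At the sustained target of 1000 events/sec, each flush carries about 200 events, and the ring absorbs about 100 seconds of traffic with no successful flush. At the 10000 events/sec peak, that drops to about 2000 events per flush and 10 seconds of headroom.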
2126
+ ### 10.5. Multi-Environment Configuration
2127
+
2128
+ ```mermaid
2129
+ graph TB
2130
+ subgraph "Development"
2131
+ Dev[Rails Dev Server]
2132
+ DevConfig[Config:<br/>- Console adapter<br/>- File adapter<br/>- No rate limits<br/>- Debug level]
2133
+ Dev --> DevConfig
2134
+ end
2135
+
2136
+ subgraph "Test/CI"
2137
+ Test[RSpec Tests]
2138
+ TestConfig[Config:<br/>- Memory adapter<br/>- Strict validation<br/>- No external calls<br/>- Fast flush]
2139
+ Test --> TestConfig
2140
+ end
2141
+
2142
+ subgraph "Staging"
2143
+ Stage[Rails Staging]
2144
+ StageConfig[Config:<br/>- Loki + File<br/>- Relaxed rate limits<br/>- 100% sampling<br/>- Info level]
2145
+ Stage --> StageConfig
2146
+ end
2147
+
2148
+ subgraph "Production"
2149
+ Prod[Rails Production]
2150
+ ProdConfig[Config:<br/>- Loki + ES + Sentry<br/>- Strict rate limits<br/>- Adaptive sampling<br/>- Warn level<br/>- DLQ enabled]
2151
+ Prod --> ProdConfig
2152
+ end
2153
+
2154
+ style Dev fill:#d1ecf1
2155
+ style Test fill:#fff3cd
2156
+ style Stage fill:#ffe5b4
2157
+ style Prod fill:#d4edda
2158
+ ```
2159
+
2160
+ ---
2161
+
2162
+ ## 11. Deployment & Versioning
2163
+
2164
+ ### 11.1. Semantic Versioning
2165
+
2166
+ ```
2167
+ Version Format: MAJOR.MINOR.PATCH
2168
+
2169
+ Examples:
2170
+ - 1.0.0 - Initial release (all 22 use cases)
2171
+ - 1.1.0 - New adapter (backward compatible)
2172
+ - 1.0.1 - Bug fix
2173
+ - 2.0.0 - Breaking change (API change, Rails 9 support)
2174
+ ```
2175
+
2176
+ ### 11.2. Breaking Change Policy
2177
+
2178
+ **Breaking changes only in MAJOR versions:**
2179
+
2180
+ - ✅ Allowed in MAJOR:
2181
+ - Change public API
2182
+ - Remove deprecated features
2183
+ - Change configuration format
2184
+ - Drop Rails version support
2185
+
2186
+ - ❌ Not allowed in MINOR/PATCH:
2187
+ - Remove public methods
2188
+ - Change method signatures
2189
+ - Remove configuration options
2190
+
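Because breaking changes are confined to MAJOR versions, consumers can pin the gem with a pessimistic constraint. The sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` to show what such a constraint accepts; `satisfies?` is a hypothetical helper name.

```ruby
require 'rubygems' # Gem::Requirement and Gem::Version ship with RubyGems

# True when `version` satisfies the Gemfile-style `constraint`.
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end
```

For example, `satisfies?('~> 1.0', '1.1.0')` is true while `satisfies?('~> 1.0', '2.0.0')` is false, so a `~> 1.0` pin picks up new adapters and bug fixes but never a MAJOR release.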
2191
+ ### 11.3. Deprecation Policy
2192
+
2193
+ ```ruby
2194
+ # Deprecate in v1.x, remove in v2.0
2195
+ class OrderPaid < E11y::Event::Base
2196
+ def self.track_legacy(**payload)
2197
+ warn '[DEPRECATED] Use .track instead. Will be removed in v2.0'
2198
+ track(**payload)
2199
+ end
2200
+ end
2201
+ ```
2202
+
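A deprecated method that is called on a hot path would print the same warning for every event. One possible refinement, shown here as a sketch only, is to emit each message once per process. `DeprecationSketch.warn_once` is a hypothetical helper, not an E11y or Rails API.

```ruby
module DeprecationSketch
  @warned = {}
  @lock = Mutex.new

  # Emits each distinct message once per process.
  # Returns true if the message was written, false if it was suppressed.
  def self.warn_once(message, io: $stderr)
    first_time = @lock.synchronize do
      next false if @warned[message]

      @warned[message] = true
    end
    io.puts("[DEPRECATED] #{message}") if first_time
    first_time
  end
end
```

Inside `track_legacy`, the `warn` call could then be replaced with `DeprecationSketch.warn_once('Use .track instead. Will be removed in v2.0')`.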
2203
+ ---
2204
+
2205
+ ## 12. Trade-offs & Decisions
2206
+
2207
+ ### 12.1. Key Trade-offs
2208
+
2209
+ | Decision | Pro | Con | Rationale |
2210
+ |----------|-----|-----|-----------|
2211
+ | **Rails-only** | Simpler code, use Rails features | Smaller audience | Target Rails devs |
2212
+ | **Zero-allocation** | Low memory, fast | More complex code | Performance critical |
2213
+ | **Adaptive buffer (C20)** | Memory safety, prevents exhaustion | May drop events under extreme load | Safety > throughput (see §3.3.2) |
2214
+ | **Ring buffer** | Lock-free, fast | Fixed size, complex | Throughput matters |
2215
+ | **Middleware chain** | Extensible, familiar | Slower than direct | Extensibility > speed |
2216
+ | **Strict validation** | Fail fast | Less forgiving | Correctness matters |
2217
+ | **No hot reload** | Simpler, safer | Requires restart | Config changes rare |
2218
+
2219
+ ### 12.2. Alternative Architectures Considered
2220
+
2221
+ **A) Actor Model (Concurrent::Actor)**
2222
+ - ✅ Pro: Cleaner concurrency
2223
+ - ❌ Con: More complex, unfamiliar
2224
+ - ❌ **Rejected:** Too complex for Ruby devs
2225
+
2226
+ **B) Evented I/O (EventMachine)**
2227
+ - ✅ Pro: High throughput
2228
+ - ❌ Con: Blocking calls problematic
2229
+ - ❌ **Rejected:** EventMachine unmaintained
2230
+
2231
+ **C) Simple Queue (Array + Mutex)**
2232
+ - ✅ Pro: Simple
2233
+ - ❌ Con: Lock contention @ high load
2234
+ - ❌ **Rejected:** Performance target not met
2235
+
2236
+ **D) Sidekiq Jobs**
2237
+ - ✅ Pro: Battle-tested
2238
+ - ❌ Con: Redis dependency, latency
2239
+ - ❌ **Rejected:** <1ms p99 impossible
2240
+
2241
+ ### 12.3. Future Considerations
2242
+
2243
+ **v1.x (stable):**
2244
+ - All 22 use cases
2245
+ - Performance targets met
2246
+ - Documentation complete
2247
+
2248
+ **v2.x (enhancements):**
2249
+ - Rails 9 support
2250
+ - Additional adapters (DataDog, New Relic)
2251
+ - OpenTelemetry full integration
2252
+
2253
+ **v3.x (possible breaking changes):**
2254
+ - Plain Ruby support (non-Rails)
2255
+ - Different buffer strategies
2256
+ - Distributed tracing coordination
2257
+
2258
+ ---
2259
+
2260
+ ## 12. Opt-In Features Pattern (C05: TRIZ #10 Extension)
2261
+
2262
+ > **🎯 CONTRADICTION_05 Resolution Extension:** Extend opt-in pattern to PII filtering and rate limiting for maximum performance flexibility.
2263
+
2264
+ ### 12.1. Motivation
2265
+
2266
+ **Problem:**
2267
+ - PII filtering adds ~0.2ms latency per event
2268
+ - Rate limiting adds ~0.01ms latency per event
2269
+ - Many events don't need these features (e.g., public page views have no PII, rare admin actions don't need rate limiting)
2270
+ - Disabling globally removes protection for events that need it
2271
+
2272
+ **Solution (TRIZ #10: Prior Action):**
2273
+ - Features enabled by default (safety first)
2274
+ - Allow opt-out per event for performance optimization
2275
+ - Follow Rails conventions (simple DSL)
2276
+
2277
+ **Benefit:**
2278
+ - Events without PII save ~0.2ms (20% of 1ms budget!)
2279
+ - Rare events without rate limiting save ~0.01ms
2280
+ - 90% of events use defaults (no code needed)
2281
+ - 10% edge cases opt-out for performance
2282
+
2283
+ ---
2284
+
2285
+ ### 12.2. Opt-Out DSL
2286
+
2287
+ **Pattern:**
2288
+ ```ruby
2289
+ class Events::OptimizedEvent < E11y::Event::Base
2290
+ # Opt-out of PII filtering (no PII in this event)
2291
+ pii_filtering false
2292
+
2293
+ # Opt-out of rate limiting (rare event, no need to limit)
2294
+ rate_limiting false
2295
+
2296
+ # Opt-out of sampling (critical event, always capture)
2297
+ sampling false
2298
+
2299
+ schema do
2300
+ # No PII fields here!
2301
+ required(:page_url).filled(:string)
2302
+ required(:referrer).filled(:string)
2303
+ end
2304
+ end
2305
+ ```
2306
+
2307
+ ---
2308
+
2309
+ ### 12.3. Use Cases
2310
+
2311
+ #### Use Case 1: Public Page View (No PII)
2312
+
2313
+ **Problem:** Page views have no PII, but PII filtering still runs (0.2ms wasted)
2314
+
2315
+ ```ruby
2316
+ class Events::PublicPageView < E11y::Event::Base
2317
+ severity :debug
2318
+ pii_filtering false # ← Opt-out (no PII fields!)
2319
+
2320
+ schema do
2321
+ required(:page_url).filled(:string)
2322
+ required(:referrer).filled(:string)
2323
+ required(:user_agent).filled(:string) # Not PII (public info)
2324
+ end
2325
+ end
2326
+ ```
2327
+
2328
+ **Performance gain:** 0.2ms per event × 1000 events/sec = 200ms of processing time saved per second
2329
+
2330
+ ---
2331
+
2332
+ #### Use Case 2: Rare Admin Action (No Rate Limiting)
2333
+
2334
+ **Problem:** Admin actions are rare (<10/day), but the rate-limiting check still runs for each one (0.01ms wasted)
2335
+
2336
+ ```ruby
2337
+ class Events::AdminServerRestart < E11y::Event::Base
2338
+ severity :warn
2339
+ rate_limiting false # ← Opt-out (rare event, <10/day)
2340
+
2341
+ schema do
2342
+ required(:admin_id).filled(:string)
2343
+ required(:reason).filled(:string)
2344
+ required(:downtime_seconds).filled(:integer)
2345
+ end
2346
+ end
2347
+ ```
2348
+
2349
+ **Performance gain:** 0.01ms per event (small, but adds up)
2350
+
2351
+ ---
2352
+
2353
+ #### Use Case 3: Critical Payment Event (No Sampling)
2354
+
2355
+ **Problem:** Payment events must be 100% captured, but sampling still checks (0.01ms wasted)
2356
+
2357
+ ```ruby
2358
+ class Events::PaymentProcessed < E11y::Event::Base
2359
+ severity :success
2360
+ sampling false # ← Opt-out (NEVER sample payments!)
2361
+
2362
+ schema do
2363
+ required(:payment_id).filled(:string)
2364
+ required(:amount).filled(:decimal)
2365
+ required(:currency).filled(:string)
2366
+ end
2367
+ end
2368
+ ```
2369
+
2370
+ **Performance gain:** 0.01ms per event + 100% capture guarantee
2371
+
2372
+ ---
2373
+
2374
+ ### 12.4. Implementation
2375
+
2376
+ **Middleware checks opt-out flag before processing:**
2377
+
2378
+ ```ruby
2379
+ # E11y::Middleware::PIIFiltering
+ def call(event_data)
+   event_class = event_data[:event_class]
+
+   # Apply PII filtering unless the event class opted out (skipping saves ~0.2ms)
+   filter_pii!(event_data[:payload]) if event_class.pii_filtering_enabled?
+
+   @app.call(event_data)
+ end
2393
+ ```
2394
+
2395
+ **Event DSL:**
2396
+
2397
+ ```ruby
2398
+ class E11y::Event::Base
+   # Default: enabled (safety first)
+   class_attribute :pii_filtering_enabled, default: true
+   class_attribute :rate_limiting_enabled, default: true
+   class_attribute :sampling_enabled, default: true
+
+   def self.pii_filtering(enabled)
+     self.pii_filtering_enabled = enabled
+   end
+
+   def self.rate_limiting(enabled)
+     self.rate_limiting_enabled = enabled
+   end
+
+   def self.sampling(enabled)
+     self.sampling_enabled = enabled
+   end
+ end
2416
+ ```
2417
+
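`class_attribute` comes from ActiveSupport. The inheritance behaviour the DSL relies on (a subclass can opt out without changing its parent, and its own subclasses inherit the opt-out) can be sketched in plain Ruby. `SketchEventBase` is a hypothetical stand-in written for this illustration, not the gem's base class.

```ruby
class SketchEventBase
  FLAGS = %i[pii_filtering rate_limiting sampling].freeze

  class << self
    FLAGS.each do |flag|
      # DSL setter, e.g. `pii_filtering false`
      define_method(flag) do |enabled|
        instance_variable_set(:"@#{flag}_enabled", enabled)
      end

      # Predicate walks up the ancestry until a class has set the flag.
      define_method(:"#{flag}_enabled?") do
        ivar = :"@#{flag}_enabled"
        klass = self
        while klass.respond_to?(:"#{flag}_enabled?")
          return klass.instance_variable_get(ivar) if klass.instance_variable_defined?(ivar)

          klass = klass.superclass
        end
        true # default: enabled (safety first)
      end
    end
  end
end
```

A subclass that calls `pii_filtering false` reports `pii_filtering_enabled?` as false, while `SketchEventBase` itself and unrelated subclasses keep the default of true.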
2418
+ ---
2419
+
2420
+ ### 12.5. Validation & Safety
2421
+
2422
+ **Prevent dangerous opt-outs:**
2423
+
2424
+ ```ruby
2425
+ # Validate PII opt-out (must have no PII fields)
2426
+ class Events::PublicPageView < E11y::Event::Base
2427
+ pii_filtering false
2428
+
2429
+ schema do
2430
+ required(:email).filled(:string) # ← ERROR: PII field with pii_filtering disabled!
2431
+ end
2432
+ end
2433
+
2434
+ # E11y::Validators::PIIOptOutValidator
+ PII_FIELD_PATTERN = /email|phone|ip_address|ssn|passport/
+
+ def validate!
+   return if pii_filtering_enabled? || pii_fields.empty?
+
+   raise ConfigurationError, <<~ERROR
+     Event #{event_class} has pii_filtering disabled but schema contains PII fields: #{pii_fields.join(', ')}
+
+     Either:
+     1. Enable PII filtering: pii_filtering true
+     2. Remove PII fields from schema
+   ERROR
+ end
+
+ # PII field detection (via naming convention)
+ def pii_fields
+   schema.keys.select { |field| field.to_s.match?(PII_FIELD_PATTERN) }
+ end
+
+ def schema_has_pii_fields?
+   pii_fields.any?
+ end
2453
+ ```
2454
+
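Since detection is purely name-based, it is worth exercising the convention against sample field names before relying on it. The sketch below reuses the pattern from the validator; `pii_like_fields` is a hypothetical helper and the field names are made up for illustration.

```ruby
PII_NAME_PATTERN = /email|phone|ip_address|ssn|passport/

# Returns the subset of field names that the naming convention treats as PII.
def pii_like_fields(field_names)
  field_names.select { |field| field.to_s.match?(PII_NAME_PATTERN) }
end

# Field names that would pass the check even though they may carry PII,
# and names that are flagged only because they contain a matching substring.
SAMPLE_FIELDS = %i[email user_phone page_url full_name email_template_id].freeze
```

With these samples, `pii_like_fields(SAMPLE_FIELDS)` flags `:email`, `:user_phone`, and `:email_template_id`. The pattern is a substring match, so `:email_template_id` is a false positive, and `:full_name` is not caught at all; the check is a safety net rather than a guarantee.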
2455
+ ---
2456
+
2457
+ ### 12.6. Already Implemented Opt-Ins
2458
+
2459
+ **1. Versioning Middleware (already opt-in):**
2460
+ ```ruby
2461
+ class Events::ApiRequest < E11y::Event::Base
2462
+ version 1 # ← Explicitly enable versioning (opt-in)
2463
+ end
2464
+ ```
2465
+
2466
+ **2. Adaptive Sampling (opt-in via conventions):**
2467
+ ```ruby
2468
+ class Events::PageView < E11y::Event::Base
2469
+ severity :debug
2470
+ # sample_rate 0.01 ← Auto from severity (convention)
2471
+
2472
+ # Override (opt-in):
2473
+ sample_rate 0.1 # ← Custom rate
2474
+ end
2475
+ ```
2476
+
2477
+ ---
2478
+
2479
+ ### 12.7. Performance Impact
2480
+
2481
+ **Without opt-out:**
2482
+ - PII filtering: 0.2ms per event
2483
+ - Rate limiting: 0.01ms per event
2484
+ - Sampling: 0.01ms per event
2485
+ - **Total:** 0.22ms per event
2486
+
2487
+ **With opt-out (edge cases):**
2488
+ - Public page views (no PII): 0.2ms saved
2489
+ - Rare admin actions (no rate limit): 0.01ms saved
2490
+ - Critical payments (no sampling): 0.01ms saved
2491
+
2492
+ **Performance budget:**
2493
+ - Target: <1ms p99
2494
+ - Middleware chain: 0.15-0.3ms
2495
+ - With all three opt-outs: **saves up to ~70% of middleware overhead** (0.22ms of the 0.3ms upper bound; 0.22 / 0.3 ≈ 0.73)
2496
+
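The savings arithmetic can be reproduced with a few lines of Ruby. The per-feature costs are the estimates quoted in this section; `FEATURE_COST_MS`, `saved_ms`, and `share_of_budget` are hypothetical names used only for this sketch.

```ruby
# Estimated per-event cost of each opt-out-able feature, in milliseconds.
FEATURE_COST_MS = { pii_filtering: 0.2, rate_limiting: 0.01, sampling: 0.01 }.freeze

# Sum of per-event costs (ms) for the features an event opts out of.
def saved_ms(opted_out)
  opted_out.sum { |feature| FEATURE_COST_MS.fetch(feature) }.round(3)
end

# Fraction of a latency budget (default: the 1ms p99 target) that the opt-outs free up.
def share_of_budget(opted_out, budget_ms: 1.0)
  (saved_ms(opted_out) / budget_ms).round(3)
end
```

Opting out of PII filtering alone frees 0.2ms, which is 20% of the 1ms budget; opting out of all three frees 0.22ms.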
2497
+ ---
2498
+
2499
+ ### 12.8. Trade-Offs
2500
+
2501
+ | Decision | Pro | Con | Rationale |
2502
+ |----------|-----|-----|-----------|
2503
+ | **Default: enabled** | Safety first | Slight overhead for edge cases | 90% of events need protection |
2504
+ | **Opt-out pattern** | Performance flexibility | Requires explicit opt-out | Edge cases (10%) benefit most |
2505
+ | **Validation at class load** | Catch errors early | Stricter enforcement | Prevent accidental PII exposure |
2506
+
2507
+ ---
2508
+
2509
+ ### 12.9. Migration Path
2510
+
2511
+ **Identify candidates for opt-out:**
2512
+
2513
+ ```bash
2514
+ # Find events with no PII fields
2515
+ bin/rails runner '
2516
+ events_without_pii = E11y::EventRegistry.all.select do |event_class|
2517
+ !event_class.schema.keys.any? { |field|
2518
+ field.to_s.match?(/email|phone|ip_address|ssn|passport/)
2519
+ }
2520
+ end
2521
+
2522
+ puts "Events without PII (#{events_without_pii.count}):"
2523
+ events_without_pii.each do |event_class|
2524
+ puts "- #{event_class.name}"
2525
+ end
2526
+ '
2527
+
2528
+ # Output:
2529
+ # Events without PII (15):
2530
+ # - Events::PublicPageView
2531
+ # - Events::StaticAssetLoaded
2532
+ # - Events::HealthCheckPing
2533
+ # ...
2534
+ ```
2535
+
2536
+ **Add opt-out incrementally:**
2537
+ ```ruby
2538
+ # Before:
2539
+ class Events::PublicPageView < E11y::Event::Base
2540
+ # PII filtering enabled by default (0.2ms overhead)
2541
+ end
2542
+
2543
+ # After:
2544
+ class Events::PublicPageView < E11y::Event::Base
2545
+ pii_filtering false # ← Opt-out (no PII fields)
2546
+ # 0.2ms saved! ✅
2547
+ end
2548
+ ```
2549
+
2550
+ ---
2551
+
2552
+ ### 12.10. Related
2553
+
2554
+ **See also:**
2555
+ - **ADR-006: Security & Compliance** - PII filtering architecture
2556
+ - **ADR-006 Section 4: Rate Limiting** - Rate limiting implementation
2557
+ - **UC-014: Adaptive Sampling** - Sampling conventions and opt-in overrides
2558
+ - **CONTRADICTION_05** - Performance vs. Features (TRIZ #10: Opt-In Features)
2559
+
2560
+ ---
2561
+
2562
+ ## 13. Next Steps
2563
+
2564
+ ### 13.1. Implementation Plan
2565
+
2566
+ **Phase 1: Core (Weeks 1-2)**
2567
+ - [ ] Event::Base class (zero-allocation)
2568
+ - [ ] Pipeline & middleware chain
2569
+ - [ ] Ring buffer implementation
2570
+ - [ ] Request-scoped buffer
2571
+
2572
+ **Phase 2: Features (Weeks 3-6)**
2573
+ - [ ] All 22 use cases
2574
+ - [ ] Adapters (Loki, File, Sentry, Stdout)
2575
+ - [ ] Configuration (dry-configurable)
2576
+ - [ ] Error handling
2577
+
2578
+ **Phase 3: Polish (Weeks 7-8)**
2579
+ - [ ] Performance optimization
2580
+ - [ ] Documentation
2581
+ - [ ] Testing (>90% coverage)
2582
+ - [ ] Example Rails app
2583
+
2584
+ ### 13.2. Success Criteria
2585
+
2586
+ - ✅ All 22 use cases working
2587
+ - ✅ <1ms p99 latency @ 1000 events/sec
2588
+ - ✅ <100MB memory
2589
+ - ✅ >90% test coverage
2590
+ - ✅ All APIs documented
2591
+ - ✅ Example app demonstrating features
2592
+
2593
+ ---
2594
+
2595
+ ## 📚 Related Documents
2596
+
2597
+ **Architecture & Design:**
2598
+ - **[ADR-015: Middleware Order](ADR-015-middleware-order.md)** ⚠️ CRITICAL - Middleware execution order reference
2599
+ - **[ADR-012: Event Evolution](ADR-012-event-evolution.md)** - Event versioning & schema evolution
2600
+ - **[ADR-002: Metrics & Yabeda](ADR-002-metrics-yabeda.md)** - Metrics integration
2601
+ - **[ADR-004: Adapter Architecture](ADR-004-adapter-architecture.md)** - Adapter design
2602
+ - **[ADR-006: Security & Compliance](ADR-006-security-compliance.md)** - PII filtering, rate limiting
2603
+ - **[ADR-011: Testing Strategy](ADR-011-testing-strategy.md)** - Testing approach
2604
+
2605
+ **Configuration:**
2606
+ - **[COMPREHENSIVE-CONFIGURATION.md](COMPREHENSIVE-CONFIGURATION.md)** - Complete configuration examples
2607
+ - **[CONFLICT-ANALYSIS.md](CONFLICT-ANALYSIS.md)** - Feature conflict resolutions
2608
+
2609
+ **Use Cases:**
2610
+ - **[docs/use_cases/](use_cases/)** - All 22 use cases documented
2611
+
2612
+ ---
2613
+
2614
+ **Status:** ✅ ADR Complete
2615
+ **Ready for:** Implementation
2616
+ **Estimated Effort:** 8 weeks (1 developer)
2617
+