e11y 0.1.0

Files changed (157)
  1. checksums.yaml +7 -0
  2. data/.rspec +4 -0
  3. data/.rubocop.yml +69 -0
  4. data/CHANGELOG.md +26 -0
  5. data/CODE_OF_CONDUCT.md +64 -0
  6. data/LICENSE.txt +21 -0
  7. data/README.md +179 -0
  8. data/Rakefile +37 -0
  9. data/benchmarks/run_all.rb +33 -0
  10. data/config/README.md +83 -0
  11. data/config/loki-local-config.yaml +35 -0
  12. data/config/prometheus.yml +15 -0
  13. data/docker-compose.yml +78 -0
  14. data/docs/00-ICP-AND-TIMELINE.md +483 -0
  15. data/docs/01-SCALE-REQUIREMENTS.md +858 -0
  16. data/docs/ADR-001-architecture.md +2617 -0
  17. data/docs/ADR-002-metrics-yabeda.md +1395 -0
  18. data/docs/ADR-003-slo-observability.md +3337 -0
  19. data/docs/ADR-004-adapter-architecture.md +2385 -0
  20. data/docs/ADR-005-tracing-context.md +1372 -0
  21. data/docs/ADR-006-security-compliance.md +4143 -0
  22. data/docs/ADR-007-opentelemetry-integration.md +1385 -0
  23. data/docs/ADR-008-rails-integration.md +1911 -0
  24. data/docs/ADR-009-cost-optimization.md +2993 -0
  25. data/docs/ADR-010-developer-experience.md +2166 -0
  26. data/docs/ADR-011-testing-strategy.md +1836 -0
  27. data/docs/ADR-012-event-evolution.md +958 -0
  28. data/docs/ADR-013-reliability-error-handling.md +2750 -0
  29. data/docs/ADR-014-event-driven-slo.md +1533 -0
  30. data/docs/ADR-015-middleware-order.md +1061 -0
  31. data/docs/ADR-016-self-monitoring-slo.md +1234 -0
  32. data/docs/API-REFERENCE-L28.md +914 -0
  33. data/docs/COMPREHENSIVE-CONFIGURATION.md +2366 -0
  34. data/docs/IMPLEMENTATION_NOTES.md +2804 -0
  35. data/docs/IMPLEMENTATION_PLAN.md +1971 -0
  36. data/docs/IMPLEMENTATION_PLAN_ARCHITECTURE.md +586 -0
  37. data/docs/PLAN.md +148 -0
  38. data/docs/QUICK-START.md +934 -0
  39. data/docs/README.md +296 -0
  40. data/docs/design/00-memory-optimization.md +593 -0
  41. data/docs/guides/MIGRATION-L27-L28.md +692 -0
  42. data/docs/guides/PERFORMANCE-BENCHMARKS.md +434 -0
  43. data/docs/guides/README.md +44 -0
  44. data/docs/prd/01-overview-vision.md +440 -0
  45. data/docs/use_cases/README.md +119 -0
  46. data/docs/use_cases/UC-001-request-scoped-debug-buffering.md +813 -0
  47. data/docs/use_cases/UC-002-business-event-tracking.md +1953 -0
  48. data/docs/use_cases/UC-003-pattern-based-metrics.md +1627 -0
  49. data/docs/use_cases/UC-004-zero-config-slo-tracking.md +728 -0
  50. data/docs/use_cases/UC-005-sentry-integration.md +759 -0
  51. data/docs/use_cases/UC-006-trace-context-management.md +905 -0
  52. data/docs/use_cases/UC-007-pii-filtering.md +2648 -0
  53. data/docs/use_cases/UC-008-opentelemetry-integration.md +1153 -0
  54. data/docs/use_cases/UC-009-multi-service-tracing.md +1043 -0
  55. data/docs/use_cases/UC-010-background-job-tracking.md +1018 -0
  56. data/docs/use_cases/UC-011-rate-limiting.md +1906 -0
  57. data/docs/use_cases/UC-012-audit-trail.md +2301 -0
  58. data/docs/use_cases/UC-013-high-cardinality-protection.md +2127 -0
  59. data/docs/use_cases/UC-014-adaptive-sampling.md +1940 -0
  60. data/docs/use_cases/UC-015-cost-optimization.md +735 -0
  61. data/docs/use_cases/UC-016-rails-logger-migration.md +785 -0
  62. data/docs/use_cases/UC-017-local-development.md +867 -0
  63. data/docs/use_cases/UC-018-testing-events.md +1081 -0
  64. data/docs/use_cases/UC-019-tiered-storage-migration.md +562 -0
  65. data/docs/use_cases/UC-020-event-versioning.md +708 -0
  66. data/docs/use_cases/UC-021-error-handling-retry-dlq.md +956 -0
  67. data/docs/use_cases/UC-022-event-registry.md +648 -0
  68. data/docs/use_cases/backlog.md +226 -0
  69. data/e11y.gemspec +76 -0
  70. data/lib/e11y/adapters/adaptive_batcher.rb +207 -0
  71. data/lib/e11y/adapters/audit_encrypted.rb +239 -0
  72. data/lib/e11y/adapters/base.rb +580 -0
  73. data/lib/e11y/adapters/file.rb +224 -0
  74. data/lib/e11y/adapters/in_memory.rb +216 -0
  75. data/lib/e11y/adapters/loki.rb +333 -0
  76. data/lib/e11y/adapters/otel_logs.rb +203 -0
  77. data/lib/e11y/adapters/registry.rb +141 -0
  78. data/lib/e11y/adapters/sentry.rb +230 -0
  79. data/lib/e11y/adapters/stdout.rb +108 -0
  80. data/lib/e11y/adapters/yabeda.rb +370 -0
  81. data/lib/e11y/buffers/adaptive_buffer.rb +339 -0
  82. data/lib/e11y/buffers/base_buffer.rb +40 -0
  83. data/lib/e11y/buffers/request_scoped_buffer.rb +246 -0
  84. data/lib/e11y/buffers/ring_buffer.rb +267 -0
  85. data/lib/e11y/buffers.rb +14 -0
  86. data/lib/e11y/console.rb +122 -0
  87. data/lib/e11y/current.rb +48 -0
  88. data/lib/e11y/event/base.rb +894 -0
  89. data/lib/e11y/event/value_sampling_config.rb +84 -0
  90. data/lib/e11y/events/base_audit_event.rb +43 -0
  91. data/lib/e11y/events/base_payment_event.rb +33 -0
  92. data/lib/e11y/events/rails/cache/delete.rb +21 -0
  93. data/lib/e11y/events/rails/cache/read.rb +23 -0
  94. data/lib/e11y/events/rails/cache/write.rb +22 -0
  95. data/lib/e11y/events/rails/database/query.rb +45 -0
  96. data/lib/e11y/events/rails/http/redirect.rb +21 -0
  97. data/lib/e11y/events/rails/http/request.rb +26 -0
  98. data/lib/e11y/events/rails/http/send_file.rb +21 -0
  99. data/lib/e11y/events/rails/http/start_processing.rb +26 -0
  100. data/lib/e11y/events/rails/job/completed.rb +22 -0
  101. data/lib/e11y/events/rails/job/enqueued.rb +22 -0
  102. data/lib/e11y/events/rails/job/failed.rb +22 -0
  103. data/lib/e11y/events/rails/job/scheduled.rb +23 -0
  104. data/lib/e11y/events/rails/job/started.rb +22 -0
  105. data/lib/e11y/events/rails/log.rb +56 -0
  106. data/lib/e11y/events/rails/view/render.rb +23 -0
  107. data/lib/e11y/events.rb +18 -0
  108. data/lib/e11y/instruments/active_job.rb +201 -0
  109. data/lib/e11y/instruments/rails_instrumentation.rb +141 -0
  110. data/lib/e11y/instruments/sidekiq.rb +175 -0
  111. data/lib/e11y/logger/bridge.rb +205 -0
  112. data/lib/e11y/metrics/cardinality_protection.rb +172 -0
  113. data/lib/e11y/metrics/cardinality_tracker.rb +134 -0
  114. data/lib/e11y/metrics/registry.rb +234 -0
  115. data/lib/e11y/metrics/relabeling.rb +226 -0
  116. data/lib/e11y/metrics.rb +102 -0
  117. data/lib/e11y/middleware/audit_signing.rb +174 -0
  118. data/lib/e11y/middleware/base.rb +140 -0
  119. data/lib/e11y/middleware/event_slo.rb +167 -0
  120. data/lib/e11y/middleware/pii_filter.rb +266 -0
  121. data/lib/e11y/middleware/pii_filtering.rb +280 -0
  122. data/lib/e11y/middleware/rate_limiting.rb +214 -0
  123. data/lib/e11y/middleware/request.rb +163 -0
  124. data/lib/e11y/middleware/routing.rb +157 -0
  125. data/lib/e11y/middleware/sampling.rb +254 -0
  126. data/lib/e11y/middleware/slo.rb +168 -0
  127. data/lib/e11y/middleware/trace_context.rb +131 -0
  128. data/lib/e11y/middleware/validation.rb +118 -0
  129. data/lib/e11y/middleware/versioning.rb +132 -0
  130. data/lib/e11y/middleware.rb +12 -0
  131. data/lib/e11y/pii/patterns.rb +90 -0
  132. data/lib/e11y/pii.rb +13 -0
  133. data/lib/e11y/pipeline/builder.rb +155 -0
  134. data/lib/e11y/pipeline/zone_validator.rb +110 -0
  135. data/lib/e11y/pipeline.rb +12 -0
  136. data/lib/e11y/presets/audit_event.rb +65 -0
  137. data/lib/e11y/presets/debug_event.rb +34 -0
  138. data/lib/e11y/presets/high_value_event.rb +51 -0
  139. data/lib/e11y/presets.rb +19 -0
  140. data/lib/e11y/railtie.rb +138 -0
  141. data/lib/e11y/reliability/circuit_breaker.rb +216 -0
  142. data/lib/e11y/reliability/dlq/file_storage.rb +277 -0
  143. data/lib/e11y/reliability/dlq/filter.rb +117 -0
  144. data/lib/e11y/reliability/retry_handler.rb +207 -0
  145. data/lib/e11y/reliability/retry_rate_limiter.rb +117 -0
  146. data/lib/e11y/sampling/error_spike_detector.rb +225 -0
  147. data/lib/e11y/sampling/load_monitor.rb +161 -0
  148. data/lib/e11y/sampling/stratified_tracker.rb +92 -0
  149. data/lib/e11y/sampling/value_extractor.rb +82 -0
  150. data/lib/e11y/self_monitoring/buffer_monitor.rb +79 -0
  151. data/lib/e11y/self_monitoring/performance_monitor.rb +97 -0
  152. data/lib/e11y/self_monitoring/reliability_monitor.rb +146 -0
  153. data/lib/e11y/slo/event_driven.rb +150 -0
  154. data/lib/e11y/slo/tracker.rb +119 -0
  155. data/lib/e11y/version.rb +9 -0
  156. data/lib/e11y.rb +283 -0
  157. metadata +452 -0
@@ -0,0 +1,813 @@
1
+ # UC-001: Request-Scoped Debug Buffering
2
+
3
+ **Status:** Core Feature (MVP)
4
+ **Complexity:** Intermediate
5
+ **Setup Time:** 15-30 minutes
6
+ **Target Users:** DevOps, SRE, Backend Developers
7
+
8
+ ---
9
+
10
+ ## 📋 Overview
11
+
12
+ ### Problem Statement
13
+
14
+ **Current Pain Points:**
15
+ 1. **Debug logs in production = noise**
16
+ - 99% of requests succeed → debug logs useless
17
+ - Searching through millions of debug lines for 1% errors
18
+ - High cost (storage, indexing, querying)
19
+
20
+ 2. **No debug = blind debugging**
21
+ - Production errors lack context
22
+ - Can't reproduce: "What SQL ran before error?"
23
+ - Need to deploy debug code → restart → wait for error → repeat
24
+
25
+ 3. **Trade-off dilemma:**
26
+ - Enable debug → drown in logs + high costs
27
+ - Disable debug → can't debug production issues
28
+ - Current "solutions": sampling, manual toggling (not good enough)
29
+
30
+ ### E11y Solution
31
+
32
+ **Request-scoped buffering (dual-buffer architecture):**
33
+ - **Debug events:** Buffered in thread-local storage (per-request)
34
+ - Happy path (99%): Buffer discarded → zero debug events sent
35
+ - Error path (1%): Buffer flushed → all debug context available
36
+ - **Other events (info/warn/error/success):** Go to main buffer → flush every 200ms
37
+ - **No conflict:** Two separate buffers, different flush logic
38
+
39
+ **Result:** Debug visibility when needed, zero noise when not. Fast delivery for important events.
40
+
41
+ ---
42
+
43
+ ## 🎯 Use Case Scenarios
44
+
45
+ ### Scenario 1: API Request Debugging
46
+
47
+ **Context:** Rails API endpoint processes orders
48
+
49
+ **Without E11y:**
50
+ ```ruby
51
+ # OrdersController
52
+ def create
53
+ Rails.logger.debug "Order validation started" # → Always logged (noise!)
54
+ Rails.logger.debug "Checking inventory for SKU #{params[:sku]}" # → Always logged
55
+ Rails.logger.debug "Inventory available: #{inventory.count}" # → Always logged
56
+
57
+ order = Order.create!(params)
58
+ Rails.logger.info "Order created: #{order.id}" # → Always logged
59
+
60
+ render json: order
61
+ rescue => e
62
+ Rails.logger.error "Order creation failed: #{e.message}" # → Logged
63
+ render json: { error: e.message }, status: 500
64
+ end
65
+
66
+ # Result in logs (for 100 successful + 1 failed request):
67
+ # - 303 debug lines (3 per request × 101 requests) ← 300 are useless!
68
+ # - 101 info lines
69
+ # - 1 error line
70
+ # Total: 405 lines (74% noise)
71
+ ```
72
+
73
+ **With E11y:**
74
+ ```ruby
75
+ # OrdersController
76
+ def create
77
+ Events::OrderValidationStarted.track(severity: :debug) # → Buffered
78
+ Events::InventoryCheck.track(sku: params[:sku], count: inventory.count, severity: :debug) # → Buffered
79
+
80
+ order = Order.create!(params)
81
+ Events::OrderCreated.track(order_id: order.id, severity: :success) # → Main buffer (sent within 200ms)
82
+
83
+ render json: order
84
+ rescue => e
85
+ # Exception triggers flush of ALL buffered debug events!
86
+ raise # E11y middleware catches & flushes buffer
87
+ end
88
+
89
+ # Result in logs (for 100 successful + 1 failed request):
90
+ # - 0 debug lines for 100 successful requests ← Discarded!
91
+ # - 2 debug lines for 1 failed request ← Flushed!
92
+ # - 100 success lines
93
+ # - 1 error line
94
+ # Total: 103 lines (no debug noise from successful requests; ~75% fewer lines than 405)
95
+ ```
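The totals in both result comments follow from simple counting. A short check, assuming the scenario's workload of 100 successful requests and 1 failed one (all variable names are illustrative only):

```ruby
# Assumed workload from the scenario: 100 successful requests + 1 failed one.
successes = 100
failures  = 1
requests  = successes + failures

# Without buffering: 3 debug lines and 1 info line per request, plus 1 error line.
debug_plain  = 3 * requests
total_plain  = debug_plain + requests + failures
wasted_debug = 3 * successes # debug lines written for requests that succeeded

# With request-scoped buffering: 2 debug events per request, kept only on failure.
debug_buffered = 2 * failures
total_buffered = debug_buffered + successes + failures

noise_pct     = (100.0 * wasted_debug / total_plain).round
reduction_pct = (100.0 * (total_plain - total_buffered) / total_plain).round

puts "without buffering: #{total_plain} lines, #{noise_pct}% noise"
puts "with buffering:    #{total_buffered} lines, #{reduction_pct}% fewer lines"
```

The saving grows with the number of debug calls per request, since every one of them is dropped on the happy path.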
96
+
97
+ ---
98
+
99
+ ### Scenario 2: Multi-Step Business Flow
100
+
101
+ **Context:** Payment processing with multiple external API calls
102
+
103
+ **Code:**
104
+ ```ruby
105
+ class ProcessPaymentJob < ApplicationJob
106
+ def perform(order_id)
107
+ order = Order.find(order_id)
108
+
109
+ # Step 1: Validate
110
+ Events::PaymentValidationStarted.track(order_id: order.id, severity: :debug)
111
+ validator = PaymentValidator.new(order)
112
+ validator.validate!
113
+ Events::PaymentValidationCompleted.track(order_id: order.id, severity: :debug)
114
+
115
+ # Step 2: Charge card (external API)
116
+ Events::CardChargeStarted.track(order_id: order.id, amount: order.total, severity: :debug)
117
+ response = StripeClient.charge(order.payment_method, order.total)
118
+ Events::CardChargeCompleted.track(order_id: order.id, charge_id: response.id, severity: :debug)
119
+
120
+ # Step 3: Update inventory (external API)
121
+ Events::InventoryUpdateStarted.track(order_id: order.id, severity: :debug)
122
+ InventoryService.decrement(order.line_items)
123
+ Events::InventoryUpdateCompleted.track(order_id: order.id, severity: :debug)
124
+
125
+ # Step 4: Success
126
+ Events::PaymentProcessed.track(order_id: order.id, severity: :success)
127
+
128
+ rescue PaymentValidationError => e
129
+ # Only 1 debug event flushed (Step 1 failed; Steps 2-3 didn't run)
130
+ raise
131
+ rescue StripeError => e
132
+ # 3 debug events flushed (Step 1 completed, Step 2 failed)
133
+ raise
134
+ rescue InventoryError => e
135
+ # 5 debug events flushed (Steps 1-2 completed, Step 3 failed)
136
+ raise
137
+ end
138
+ end
139
+
140
+ # Result:
141
+ # - 99 successful jobs: 0 debug events → only 99 :success events
142
+ # - 1 failed job (Stripe error): 3 debug events + 1 error event
143
+ # Total: 99 + 4 = 103 events (vs ~700 without buffering)
144
+ ```
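How many debug events reach the adapters depends on which step raises, because a step's "completed" event is never tracked once that step fails. The following is a minimal simulation of that behaviour. It does not use E11y: the step names and the `buffered_debug_events` helper are hypothetical stand-ins for the job above.

```ruby
# Hypothetical stand-ins for the job's three steps. Each step tracks a
# "started" debug event, may fail, then tracks a "completed" debug event.
STEPS = %i[validation card_charge inventory_update].freeze

# Returns the debug events that would be flushed when `failing_step` raises.
# Passing nil simulates a successful job, whose buffer is discarded.
def buffered_debug_events(failing_step)
  buffer = []
  STEPS.each do |step|
    buffer << :"#{step}_started"
    raise "#{step} failed" if step == failing_step

    buffer << :"#{step}_completed"
  end
  [] # success: nothing is flushed
rescue RuntimeError
  buffer # failure: everything tracked so far is flushed
end

STEPS.each do |step|
  puts "#{step} fails: #{buffered_debug_events(step).size} debug events flushed"
end
puts "success: #{buffered_debug_events(nil).size} debug events flushed"
```

The last flushed event is always a `*_started` event without a matching `*_completed`, which is what pinpoints the failing step.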
145
+
146
+ **Why This is Powerful:**
147
+ - Debug events show **exact step** where failure occurred
148
+ - No need to guess: "Did validation run? Did Stripe charge succeed?"
149
+ - Full context without manual instrumentation changes
150
+
151
+ ---
152
+
153
+ ### Scenario 3: Debugging Database N+1 Queries
154
+
155
+ **Context:** Controller action with potential N+1 queries
156
+
157
+ **Code:**
158
+ ```ruby
159
+ class UsersController < ApplicationController
160
+ def index
161
+ @users = User.all
162
+
163
+ # Auto-instrumentation (via Rails instrumentation: ActiveSupport::Notifications → E11y)
164
+ # E11y captures all SQL queries as debug events (unidirectional flow)
165
+ # See: ADR-008 §4.1 for Rails Instrumentation architecture
166
+
167
+ @users.each do |user|
168
+ # N+1 query! Each iteration triggers SELECT from orders
169
+ Events::UserOrderCount.track(
170
+ user_id: user.id,
171
+ count: user.orders.count, # ← N+1 query here
172
+ severity: :debug
173
+ )
174
+ end
175
+
176
+ render json: @users
177
+ end
178
+ end
179
+
180
+ # Result (with N+1 but no error):
181
+ # - All SQL query debug events discarded (request succeeded)
182
+ # - Zero visibility into N+1 problem :(
183
+
184
+ # Solution: Force flush for slow requests
185
+ E11y.configure do |config|
186
+ config.request_scope do
187
+ flush_on :error # Default
188
+ flush_on :slow_request, threshold: 500 # ms
189
+ end
190
+ end
191
+
192
+ # Now:
193
+ # - Fast requests (<500ms): debug events discarded
194
+ # - Slow requests (>500ms): debug events flushed
195
+ # Result: N+1 patterns that make a request slow become visible, because the flushed debug events list every SQL query.
196
+ ```
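A slow-request trigger amounts to timing the request and choosing between flush and discard at the end. The sketch below shows that decision in plain Ruby. The `handle_request` helper and its `threshold_ms:` parameter are assumptions made for illustration and are not E11y's API.

```ruby
# Hypothetical helper: runs a request block, then decides what happens to the
# debug buffer. The threshold is in milliseconds, mirroring the config above.
def handle_request(buffer, threshold_ms:)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield buffer
  elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000

  if elapsed_ms >= threshold_ms
    flushed = buffer.dup # slow request: keep the debug context
    buffer.clear
    flushed
  else
    buffer.clear # fast request: discard
    []
  end
end

fast = handle_request([], threshold_ms: 50) { |b| b << 'SELECT 1' }

slow = handle_request([], threshold_ms: 50) do |b|
  3.times { |i| b << "SELECT COUNT(*) FROM orders WHERE user_id = #{i}" }
  sleep 0.1 # simulate the latency an N+1 pattern adds
end

puts "fast request flushed #{fast.size} events"
puts "slow request flushed #{slow.size} events"
```

A monotonic clock is used so that wall-clock adjustments cannot distort the measured duration.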
197
+
198
+ ---
199
+
200
+ ## 🔧 Configuration
201
+
202
+ ### Basic Setup (Automatic)
203
+
204
+ ```ruby
205
+ # config/initializers/e11y.rb
206
+ E11y.configure do |config|
207
+ config.request_scope do
208
+ enabled true # Default: true
209
+ buffer_limit 100 # Max debug events per request
210
+ flush_on :error # Flush when exception raised
211
+ end
212
+ end
213
+ ```
214
+
215
+ **That's it!** The Rails middleware is installed automatically by the generator.
216
+
217
+ > ⚠️ **CRITICAL: Middleware Order**
218
+ > Request-scoped buffer middleware MUST be positioned correctly in the E11y pipeline. **If you use custom middleware**, ensure buffer routing happens **after** all business logic (validation, PII filtering, rate limiting, sampling) but **before** adapter delivery.
219
+ >
220
+ > **Why:** Buffer routing needs access to fully processed events (with trace context, validated, filtered). If positioned too early, events may be buffered before PII filtering, creating compliance risks.
221
+ >
222
+ > **Consequences of wrong order:**
223
+ > - ❌ Buffered debug events may contain unfiltered PII → GDPR violation
224
+ > - ❌ Rate-limited events may still be buffered → memory waste
225
+ > - ❌ Invalid events may be buffered → validation bypassed
226
+ >
227
+ > **Correct order:**
228
+ > ```ruby
229
+ > config.pipeline.use TraceContextMiddleware # 1. Enrich first
230
+ > config.pipeline.use ValidationMiddleware # 2. Fail fast
231
+ > config.pipeline.use PiiFilterMiddleware # 3. Security (BEFORE buffer)
232
+ > config.pipeline.use RateLimitMiddleware # 4. Protection
233
+ > config.pipeline.use SamplingMiddleware # 5. Cost optimization
234
+ > config.pipeline.use RoutingMiddleware # 6. Buffer routing (LAST!)
235
+ > ```
236
+ >
237
+ > **See:** [ADR-001 Section 4.1: Middleware Execution Order](../ADR-001-architecture.md#41-middleware-execution-order-critical) and [ADR-015: Middleware Order Reference](../ADR-015-middleware-order.md) for detailed explanation.
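The PII risk described above can be reproduced with a toy pipeline. In this sketch, middlewares are plain lambdas applied in order. The `pii_filter`, `buffering_stage` and `run` names are hypothetical and only model the ordering problem, not E11y's actual middleware classes.

```ruby
# Hypothetical middlewares modelled as lambdas: each takes an event hash and
# returns the (possibly transformed) event.
pii_filter = ->(event) { event.merge(email: '[FILTERED]') }

# Builds a stage that stores a copy of whatever event reaches it.
def buffering_stage(buffer)
  lambda do |event|
    buffer << event.dup
    event
  end
end

# Applies the middlewares left to right.
def run(pipeline, event)
  pipeline.reduce(event) { |ev, middleware| middleware.call(ev) }
end

event = { name: 'user.lookup', email: 'jane@example.com', severity: :debug }

correct_buffer = []
run([pii_filter, buffering_stage(correct_buffer)], event)

wrong_buffer = []
run([buffering_stage(wrong_buffer), pii_filter], event)

puts "filter before buffer: #{correct_buffer.first[:email]}"
puts "buffer before filter: #{wrong_buffer.first[:email]}"
```

In the second ordering the raw address sits in the buffer until the request ends, even though the event that leaves the pipeline is filtered.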
238
+
239
+ ---
240
+
241
+ ### Advanced Configuration
242
+
243
+ ```ruby
244
+ E11y.configure do |config|
245
+ config.request_scope do
246
+ enabled true
247
+ buffer_limit 200 # Larger buffer for complex requests
248
+
249
+ # Multiple flush triggers
250
+ flush_on :error # On exception (default)
251
+ flush_on :warn # On any :warn event
252
+ flush_on :slow_request, threshold: 1000 # On requests >1s
253
+
254
+ # Custom flush condition
255
+ flush_if do |events, request|
256
+ # Flush if any event contains "payment" in name
257
+ events.any? { |e| e.name.include?('payment') }
258
+ end
259
+
260
+ # Exclude certain events from buffer (always send)
261
+ exclude_from_buffer do
262
+ severity [:info, :success, :warn, :error, :fatal] # Only buffer :debug
263
+ event_patterns ['security.*', 'audit.*'] # Never buffer security events
264
+ end
265
+
266
+ # Buffer overflow strategy
267
+ overflow_strategy :drop_oldest # or :drop_newest, :flush_immediately
268
+ end
269
+ end
270
+ ```
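The three `overflow_strategy` values differ only in what happens when an event arrives at a full buffer. The following is a minimal sketch of those semantics as they are named in the configuration. The `BoundedBuffer` class and its `sink:` parameter are hypothetical, and the real implementation may behave differently at the edges.

```ruby
# Hypothetical bounded buffer illustrating the three overflow strategies.
# `sink` stands in for whatever receives flushed events.
class BoundedBuffer
  attr_reader :events

  def initialize(limit:, strategy:, sink: [])
    @limit = limit
    @strategy = strategy
    @sink = sink
    @events = []
  end

  def <<(event)
    if @events.size >= @limit
      case @strategy
      when :drop_oldest then @events.shift # make room by evicting the oldest
      when :drop_newest then return self   # ignore the incoming event
      when :flush_immediately              # hand everything over, start fresh
        @sink.concat(@events)
        @events.clear
      else raise ArgumentError, "unknown strategy: #{@strategy}"
      end
    end
    @events << event
    self
  end
end

oldest  = BoundedBuffer.new(limit: 3, strategy: :drop_oldest)
newest  = BoundedBuffer.new(limit: 3, strategy: :drop_newest)
sink    = []
flusher = BoundedBuffer.new(limit: 3, strategy: :flush_immediately, sink: sink)

(1..5).each { |i| [oldest, newest, flusher].each { |b| b << i } }

p oldest.events
p newest.events
p flusher.events
p sink
```

`:drop_oldest` keeps the events closest to a potential failure, which is usually the most useful context, while `:flush_immediately` trades the "zero debug output on success" property for completeness.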
271
+
272
+ ---
273
+
274
+ ## 📊 How It Works (Technical Details)
275
+
276
+ ### Dual-Buffer Architecture
277
+
278
+ **E11y uses TWO independent buffers:**
279
+
280
+ ```
281
+ ┌─────────────────────────────────────────────────────────────────┐
282
+ │ Request Thread (Rack/Rails) │
283
+ │ │
284
+ │ ┌──────────────────────────────────────┐ │
285
+ │ │ E11y::Middleware::Rack │ │
286
+ │ │ - Initialize request-scoped buffer │ │
287
+ │ │ - Store in Thread.current[:e11y_*] │ │
288
+ │ └──────────────────────────────────────┘ │
289
+ │ ↓ │
290
+ │ ┌──────────────────────────────────────┐ │
291
+ │ │ Controller Action │ │
292
+ │ │ │ │
293
+ │ │ Events::DebugEvent.track(...) │ │
294
+ │ │ ↓ │ │
295
+ │ │ severity == :debug? │ │
296
+ │ │ YES ──→ Request-Scoped Buffer ───┐ │
297
+ │ │ (Thread-local) │ │
298
+ │ │ Flush: on error/end │ │
299
+ │ │ │ │
300
+ │ │ Events::InfoEvent.track(...) │ │
301
+ │ │ ↓ │ │
302
+ │ │ severity >= :info? │ │
303
+ │ │ YES ──→ Main Buffer ─────────────┼──→ Flush: every 200ms │
304
+ │ │ (Global, SPSC) │ (Background Thread) │
305
+ │ └──────────────────────────────────────┘ │
306
+ │ ↓ │
307
+ │ ┌──────────────────────────────────────┐ │
308
+ │ │ Response / Exception │ │
309
+ │ │ - Success → Discard debug buffer │ │
310
+ │ │ - Error → Flush debug buffer │ │
311
+ │ └──────────────────────────────────────┘ │
312
+ └─────────────────────────────────────────────────────────────────┘
313
+
314
+ Background Flush Thread (200ms interval):
315
+ Main Buffer → Adapters (Loki, Sentry, etc.)
316
+ ```
317
+
318
+ ### Buffer Routing Logic
319
+
320
+ ```ruby
321
+ # Pseudocode for illustration
322
+ def track_event(event)
323
+ if event.severity == :debug && E11y.request_scope.active?
324
+ # → Request-scoped buffer (Thread-local)
325
+ Thread.current[:e11y_buffer] << event
326
+ else
327
+ # → Main buffer (Global SPSC ring buffer)
328
+ E11y.main_buffer << event
329
+ # The background thread picks it up within 200ms (or sooner if the batch fills up)
330
+ end
331
+ end
332
+ ```
333
+
334
+ ### Implementation Pseudocode
335
+
336
+ ```ruby
337
+ # lib/e11y/middleware/rack.rb
338
+ class E11y::Middleware::Rack
339
+ def call(env)
340
+ # 1. Initialize request-scoped buffer
341
+ E11y::RequestScope.initialize_buffer!
342
+
343
+ # 2. Call application
344
+ status, headers, body = @app.call(env)
345
+
346
+ # 3. Success → discard buffer
347
+ E11y::RequestScope.discard_buffer!
348
+
349
+ [status, headers, body]
350
+
351
+ rescue => exception
352
+ # 4. Error → flush buffer then re-raise
353
+ E11y::RequestScope.flush_buffer!(severity: :error)
354
+ raise
355
+ ensure
356
+ # 5. Cleanup
357
+ E11y::RequestScope.cleanup!
358
+ end
359
+ end
360
+
361
+ # lib/e11y/request_scope.rb
362
+ module E11y::RequestScope
363
+ def self.initialize_buffer!
364
+ Thread.current[:e11y_buffer] = []
365
+ Thread.current[:e11y_request_id] = SecureRandom.uuid
366
+ end
367
+
368
+ def self.buffer_event(event)
369
+ buffer = Thread.current[:e11y_buffer]
370
+ return false unless buffer # Not in request scope
371
+
372
+ if event.severity == :debug
373
+ buffer << event
374
+ true # Event buffered (not sent yet)
375
+ else
376
+ false # Non-debug events sent immediately
377
+ end
378
+ end
379
+
380
+ def self.flush_buffer!(severity: :error)
381
+ buffer = Thread.current[:e11y_buffer]
382
+ return if buffer.nil? || buffer.empty?
383
+
384
+ # Flush all buffered events with specified severity
385
+ buffer.each do |event|
386
+ event.severity = severity if event.severity == :debug
387
+ E11y::Collector.collect(event)
388
+ end
389
+
390
+ buffer.clear
391
+ end
392
+
393
+ def self.discard_buffer!
394
+ Thread.current[:e11y_buffer]&.clear
395
+ end
396
+
397
+ def self.cleanup!
398
+ Thread.current[:e11y_buffer] = nil
399
+ Thread.current[:e11y_request_id] = nil
400
+ end
401
+ end
402
+ ```
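The lifecycle in the pseudocode can be exercised end to end without Rails or the gem. The sketch below is a self-contained approximation. `FakeCollector`, `BufferingMiddleware`, `track` and the `:sketch_buffer` key are hypothetical names chosen for the example, and flushed events keep their `:debug` severity here for simplicity.

```ruby
# Stand-in for the collector (hypothetical): records what would be delivered.
module FakeCollector
  @delivered = []

  class << self
    attr_reader :delivered

    def collect(event)
      @delivered << event
    end
  end
end

Event = Struct.new(:name, :severity)

# Rack-style middleware following the initialize / discard / flush / cleanup steps.
class BufferingMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    Thread.current[:sketch_buffer] = []
    response = @app.call(env)
    Thread.current[:sketch_buffer].clear # success: discard
    response
  rescue StandardError
    Thread.current[:sketch_buffer].each { |e| FakeCollector.collect(e) } # error: flush
    raise
  ensure
    Thread.current[:sketch_buffer] = nil # always clean up
  end
end

# Routes debug events to the request buffer and everything else to the collector.
def track(name, severity)
  event = Event.new(name, severity)
  buffer = Thread.current[:sketch_buffer]
  if severity == :debug && buffer
    buffer << event
  else
    FakeCollector.collect(event)
  end
end

ok_app = lambda do |_env|
  track('cache.hit', :debug)
  track('order.created', :success)
  [200, {}, ['ok']]
end

bad_app = lambda do |_env|
  track('sql.query', :debug)
  raise 'boom'
end

BufferingMiddleware.new(ok_app).call({})
begin
  BufferingMiddleware.new(bad_app).call({})
rescue RuntimeError
  # the middleware re-raises after flushing
end

p FakeCollector.delivered.map(&:name)
```

Only the success event from the first request and the debug event from the failed request are delivered. The `cache.hit` debug event never leaves the process.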
403
+
404
+ ---
405
+
406
+ ## 📈 Performance Impact
407
+
408
+ > **Implementation:** See [ADR-001 Section 8.3: Resource Limits](../ADR-001-architecture.md#83-resource-limits) for architectural details and [ADR-002 Section 6: Self-Monitoring](../ADR-002-metrics-yabeda.md#6-self-monitoring) for metrics implementation.
409
+
410
+ ### Buffer Metrics
411
+
412
+ **E11y automatically tracks request buffer performance:**
413
+
414
+ ```ruby
415
+ # Exposed via Yabeda (auto-configured)
416
+ Yabeda.e11y_request_buffer_size # Gauge: current buffer size per request
417
+ Yabeda.e11y_request_buffer_flushes_total # Counter: buffer flushes by trigger
418
+
419
+ # Accessible via Prometheus metrics endpoint
420
+ # Example queries:
421
+
422
+ # 1. Average buffer size
423
+ avg(e11y_request_buffer_size)
424
+
425
+ # 2. Buffer flush rate by trigger
426
+ rate(e11y_request_buffer_flushes_total{trigger="error"}[5m])
427
+
428
+ # 3. Buffer overflow alerts
429
+ e11y_request_buffer_size >= 100 # Alert if buffer limit reached
430
+ ```
431
+
432
+ **Monitoring Examples:**
433
+
434
+ ```ruby
435
+ # Grafana dashboard panels:
436
+
437
+ # Panel 1: Buffer Size Distribution
438
+ histogram_quantile(0.99,
439
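+ sum(rate(e11y_request_buffer_size_bucket[5m])) by (le)
439
+ )
440
+ # Shows p99 buffer size (assumes the metric is also exported as a histogram with _bucket series; a plain gauge cannot feed histogram_quantile)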
442
+
443
+ # Panel 2: Flush Triggers Breakdown
444
+ sum by (trigger) (
445
+ rate(e11y_request_buffer_flushes_total[5m])
446
+ )
447
+ # Shows why buffers flush (error vs. slow_request vs. custom)
448
+
449
+ # Panel 3: Memory Impact Estimate
450
+ avg(e11y_request_buffer_size) * 500 # bytes per event
451
+ # Estimates per-request memory usage
452
+ ```
453
+
454
+ **What to Monitor:**
455
+
456
+ | Metric | Normal | Warning | Alert |
457
+ |--------|--------|---------|-------|
458
+ | **Buffer Size (p99)** | <20 events | 50-80 events | >80 events |
459
+ | **Flush Rate (error)** | <1% of requests | 1-5% | >5% |
460
+ | **Flush Rate (slow)** | <5% of requests | 5-10% | >10% |
461
+ | **Buffer Overflows** | 0 | >0 | >10/min |
462
+
463
+ ### Memory
464
+
465
+ ```ruby
466
+ # Per-request memory usage
467
+
468
+ # Typical request (10 debug events):
469
+ # - Event objects: ~500 bytes each (10 × 500 bytes ≈ 5KB)
470
+ # - Buffer array: ~100 bytes
471
+ # Total: ~5KB per request
472
+
473
+ # Worst case (100 debug events, limit reached):
474
+ # Total: ~50KB per request
475
+
476
+ # Concurrent requests (100):
477
+ # - Typical: 100 × 5KB = 500KB
478
+ # - Worst: 100 × 50KB = 5MB
479
+
480
+ # Conclusion: Negligible memory impact (<10MB even at high load)
481
+ ```
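The estimates above can be parameterised so that they can be recomputed for other buffer limits and concurrency levels. This is a back-of-the-envelope sketch. The 500-byte and 100-byte figures are the document's assumptions rather than measurements, and the helper names are illustrative.

```ruby
# Figures quoted above, treated as assumptions rather than measured values.
BYTES_PER_EVENT = 500
ARRAY_OVERHEAD  = 100

# Estimated bytes held by one request's debug buffer.
def buffer_bytes(events_per_request)
  events_per_request * BYTES_PER_EVENT + ARRAY_OVERHEAD
end

# Estimated kilobytes held across all in-flight requests.
def fleet_kilobytes(events_per_request, concurrent_requests)
  buffer_bytes(events_per_request) * concurrent_requests / 1000.0
end

typical = fleet_kilobytes(10, 100)
worst   = fleet_kilobytes(100, 100)

puts "typical: #{typical} KB"
puts "worst:   #{worst} KB"
```

For real numbers, Ruby's `ObjectSpace.memsize_of` (from the `objspace` standard library) can be used to measure actual event objects, since payload size varies widely between event types.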
482
+
483
+ ### Latency
484
+
485
+ ```ruby
486
+ # Overhead per track() call
487
+
488
+ # Buffered event (debug):
489
+ # - Check Thread.current: ~1μs
490
+ # - Append to array: ~0.5μs
491
+ # Total: ~1.5μs
492
+
493
+ # Non-buffered event (info/success):
494
+ # - No buffering: 0μs
495
+ # - Send to collector: ~20μs (async, non-blocking)
496
+
497
+ # Conclusion: <2μs overhead for debug events (negligible)
498
+ ```
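The per-call figures above are estimates, and actual overhead depends on the Ruby version and hardware. One way to check the buffered path locally is a micro-benchmark of the two operations it consists of: a `Thread.current` lookup and an array append. The sketch uses only the standard library. The `:bench_buffer` key is an arbitrary name for this example, and the loop leaves out event object construction.

```ruby
require 'benchmark'

ITERATIONS = 100_000
Thread.current[:bench_buffer] = []

elapsed = Benchmark.realtime do
  ITERATIONS.times do |i|
    buffer = Thread.current[:bench_buffer] # thread-local lookup
    buffer << i if buffer                  # append to the request buffer
  end
end

per_call_us = elapsed / ITERATIONS * 1_000_000
puts format('%.3f µs per buffered call', per_call_us)

Thread.current[:bench_buffer] = nil
```

Results from a tight loop like this are a lower bound. In a real request the cost of building the event payload usually dominates.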
499
+
500
+ ---
501
+
502
+ ## 🧪 Testing
503
+
504
+ ### Test Request-Scoped Buffering
505
+
506
+ ```ruby
507
+ # spec/requests/orders_spec.rb
508
+ RSpec.describe 'Orders API' do
509
+ it 'discards debug events on successful request' do
510
+ # Spy on E11y collector
511
+ allow(E11y::Collector).to receive(:collect)
512
+
513
+ post '/orders', params: { sku: 'ABC123' }
514
+
515
+ expect(response).to be_successful
516
+
517
+ # Verify: only :success event sent, no :debug events
518
+ expect(E11y::Collector).to have_received(:collect).once
519
+ expect(E11y::Collector).to have_received(:collect).with(
520
+ have_attributes(severity: :success)
521
+ )
522
+ end
523
+
524
+ it 'flushes debug events on error' do
525
+ allow(E11y::Collector).to receive(:collect)
526
+
527
+ # Trigger error (invalid SKU)
528
+ post '/orders', params: { sku: 'INVALID' }
529
+
530
+ expect(response).to have_http_status(500)
531
+
532
+ # Verify: debug events flushed
533
+ expect(E11y::Collector).to have_received(:collect).at_least(2).times
534
+ expect(E11y::Collector).to have_received(:collect).with(
535
+ have_attributes(severity: :debug)
536
+ ).at_least(:once)
537
+ end
538
+ end
539
+ ```
540
+
541
+ ---
542
+
543
+ ## 💡 Best Practices
544
+
545
+ ### ✅ DO
546
+
547
+ **1. Use :debug for diagnostic events**
548
+ ```ruby
549
+ Events::SqlQuery.track(sql: query, duration: duration, severity: :debug)
550
+ Events::CacheHit.track(key: key, severity: :debug)
551
+ Events::ApiCallStarted.track(service: 'stripe', severity: :debug)
552
+ ```
553
+
554
+ **2. Use :success for business events**
555
+ ```ruby
556
+ Events::OrderPaid.track(order_id: order.id, severity: :success)
557
+ Events::UserRegistered.track(user_id: user.id, severity: :success)
558
+ ```
559
+
560
+ **3. Set reasonable buffer limits**
561
+ ```ruby
562
+ config.request_scope do
563
+ buffer_limit 100 # Typical: 10-50 debug events per request
564
+ end
565
+ ```
566
+
567
+ **4. Flush on custom conditions**
568
+ ```ruby
569
+ config.request_scope do
570
+ flush_on :slow_request, threshold: 500 # ms
571
+ flush_if { |events| events.any? { |e| e.name =~ /payment|security/ } }
572
+ end
573
+ ```
574
+
575
+ ---
576
+
577
+ ### ❌ DON'T
578
+
579
+ **1. Don't buffer non-debug events**
580
+ ```ruby
581
+ # ❌ BAD: Buffering :info events (defeats purpose)
582
+ Events::OrderCreated.track(order_id: order.id, severity: :info) # Should be :success
583
+
584
+ # ✅ GOOD:
585
+ Events::OrderCreated.track(order_id: order.id, severity: :success)
586
+ ```
587
+
588
+ **2. Don't set buffer limits too high**
589
+ ```ruby
590
+ # ❌ BAD: Huge buffer (memory risk)
591
+ config.request_scope { buffer_limit 10_000 }
592
+
593
+ # ✅ GOOD: Reasonable limit
594
+ config.request_scope { buffer_limit 100 }
595
+ ```
596
+
597
+ **3. Don't buffer security events**
598
+ ```ruby
599
+ # ❌ BAD: Security events must be sent immediately!
600
+ Events::LoginAttempt.track(user_id: user.id, severity: :debug)
601
+
602
+ # ✅ GOOD:
603
+ Events::LoginAttempt.track(user_id: user.id, severity: :info)
604
+ # OR explicitly exclude from buffer:
605
+ config.request_scope do
606
+ exclude_from_buffer { event_patterns ['security.*', 'audit.*'] }
607
+ end
608
+ ```
609
+
610
+ ---
611
+
612
+ ## 🔄 Interaction with the Flush Interval (200ms)
613
+
614
+ ### Question: Do the Buffers Conflict?
615
+
616
+ **Answer: No. They are independent.**
617
+
618
+ ### Detailed Logic
619
+
620
+ ```ruby
621
+ # config/initializers/e11y.rb
622
+ E11y.configure do |config|
623
+ # === Main Buffer (Global) ===
624
+ config.buffer do
625
+ capacity 100_000 # Ring buffer size
626
+ flush_interval 200 # ms - for :info/:warn/:error/:success/:fatal
627
+ flush_batch_size 500
628
+ end
629
+
630
+ # === Request-Scoped Buffer (Thread-local) ===
631
+ config.request_scope do
632
+ enabled true
633
+ buffer_limit 100 # Per-request limit for :debug only
634
+ flush_on :error # Flush on exception
635
+ end
636
+ end
637
+ ```
638
+
639
+ ### Event Flow
640
+
641
+ **Scenario 1: Normal request (successful)**
642
+ ```ruby
643
+ # Request starts
644
+ Events::DebugEvent.track(...) # → Request buffer (thread-local)
645
+ Events::DebugEvent.track(...) # → Request buffer (thread-local)
646
+ Events::OrderCreated.track(severity: :success) # → Main buffer → flush in 200ms
647
+ Events::DebugEvent.track(...) # → Request buffer (thread-local)
648
+ # Request ends successfully
649
+ # → Request buffer DISCARDED (debug events lost)
650
+ # → Main buffer flushed every 200ms (success event sent)
651
+ ```
652
+
653
+ **Scenario 2: Request with an error**
654
+ ```ruby
655
+ # Request starts
656
+ Events::DebugEvent.track(...) # → Request buffer
657
+ Events::DebugEvent.track(...) # → Request buffer
658
+ Events::PaymentFailed.track(severity: :error) # → Main buffer → flush in 200ms
659
+ Events::DebugEvent.track(...) # → Request buffer
660
+ # Exception raised!
661
+ # → Request buffer FLUSHED immediately (all 3 debug events sent)
662
+ # → Main buffer continues flush every 200ms (error event sent)
663
+ ```
664
+
665
+ **Scenario 3: High-load service**
666
+ ```ruby
667
+ # 1000 requests/sec, each with 5 debug events
668
+ # → 5000 debug events/sec into request buffers (thread-local)
669
+ # → 99% succeed → 4950 debug events/sec DISCARDED
670
+ # → 1% fail → 50 debug events/sec FLUSHED
671
+ #
672
+ # In parallel:
673
+ # → 1000 info/success events/sec → Main buffer
674
+ # → Flush every 200ms = 5 batches/sec
675
+ # → 200 events per batch (on average)
676
+ ```
677
+
678
+ ### Summary: No Conflict!
679
+
680
+ | Event Type | Buffer | Flush Trigger | Latency |
681
+ |------------|--------|---------------|---------|
682
+ | `:debug` | Request-scoped (Thread-local) | On error or end-of-request | 0ms (discarded) or immediate (on error) |
683
+ | `:info` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
684
+ | `:success` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
685
+ | `:warn` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
686
+ | `:error` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
687
+ | `:fatal` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
688
+
689
+ **Advantages of the dual-buffer design:**
690
+ 1. ✅ Debug events do not clutter the main buffer
691
+ 2. ✅ Important events (info and above) are delivered quickly (within 200ms)
692
+ 3. ✅ Debug events are sent immediately on error (flush triggered)
693
+ 4. ✅ 99% of debug events are never processed at all (discard = zero cost)
694
+ 5. ✅ Thread safety: the request buffer is isolated in Thread.current
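The isolation that `Thread.current` provides can be seen with two threads writing under the same key. This is a plain-Ruby sketch, and the `:sketch_buffer` key is an arbitrary name for this example.

```ruby
# Each thread gets its own buffer under the same key; neither sees the other's events.
results = 2.times.map do |n|
  Thread.new do
    Thread.current[:sketch_buffer] = []
    3.times { |i| Thread.current[:sketch_buffer] << "req#{n}-debug#{i}" }
    Thread.current[:sketch_buffer].dup
  end
end.map(&:value)

p results
p Thread.current[:sketch_buffer] # the main thread never set the key
```

One caveat: values stored via `Thread#[]` are fiber-local in Ruby, so code that runs in a separate `Fiber` within the same request (for example under a fiber-based server) would not see the buffer unless it is propagated explicitly.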
695
+
696
+ ### Visual Diagram
697
+
698
+ ```
699
+ Time: ──────────────────────────────────────────────────>
700
+ 0ms 200ms 400ms 600ms 800ms 1000ms
701
+
702
+ Request Thread 1 (success):
703
+ ┌─────────────────────┐
704
+ │ :debug → [Req Buf] │ ← Discarded at end
705
+ │ :debug → [Req Buf] │ ← Discarded at end
706
+ │ :success → [Main] │ ─┐
707
+ └─────────────────────┘ │
708
+
709
+ Request Thread 2 (error): │
710
+ ┌────────────────────┐│
711
+ │ :debug → [Req Buf] ││ ← Flushed on error!
712
+ │ :error → [Main] ││ ─┐
713
+ │ EXCEPTION! ││ │
714
+ │ Flush req buffer ──┼┼──┼──→ Adapters
715
+ └────────────────────┘│ │
716
+ │ │
717
+ Background Flush Thread: │ │
718
+ Every 200ms: ────────────┴──┴──→ Adapters
719
+ ↑ ↑
720
+ 200ms 400ms
721
+ ```
722
+
723
+ ### Example with Numbers
724
+
725
+ **Load:**
726
+ - 100 requests/sec
727
+ - Each request: 3 debug events + 1 success event
728
+ - Error rate: 1%
729
+
730
+ **What happens:**
731
+
732
+ | Time | Request Buffer (Thread-local) | Main Buffer (Global) | Flush |
733
+ |------|------------------------------|---------------------|-------|
734
+ | 0ms | Req1: [D, D, D] | [S1] | - |
735
+ | 10ms | Req2: [D, D, D] | [S1, S2] | - |
736
+ | 20ms | Req3: [D, D, D] | [S1, S2, S3] | - |
737
+ | ... | ... | ... | - |
738
+ | 200ms | Req20: [D, D, D] | [S1...S20] | **Flush 20 success events** |
739
+ | 210ms | Req21: [D, D, D] ERROR! | [E21, **D, D, D from Req21**] | **Immediate flush debug** |
740
+ | 400ms | - | [S22...S40] | **Flush next batch** |
741
+
742
+ **Result:**
743
+ - Success events: ~100/sec → flushed every 200ms → latency <200ms ✅
744
+ - Debug events (99%): DISCARDED → zero overhead ✅
745
+ - Debug events (1% errors): flushed IMMEDIATELY with error context ✅
746
+
747
+ ---
748
+
749
+ ## 🎯 Success Metrics
750
+
751
+ ### Quantifiable Benefits
752
+
753
+ **1. Log Volume Reduction**
754
+ - Before: 1M debug lines/day
755
+ - After: 10K debug lines/day (only errors)
756
+ - **Reduction: 99%**
757
+
758
+ **2. Storage Cost Savings**
759
+ - Before: $500/month (ELK ingestion)
760
+ - After: $50/month
761
+ - **Savings: $450/month (90%)**
762
+
763
+ **3. Query Performance**
764
+ - Before: "Search last 1M lines" = 30 seconds
765
+ - After: "Search last 10K lines" = 0.5 seconds
766
+ - **Speedup: 60x**
767
+
768
+ **4. Debugging Efficiency**
769
+ - Before: "Guess what happened before error" = 30 minutes
770
+ - After: "See full context in logs" = 2 minutes
771
+ - **Time saved: 28 minutes per incident**
772
+
773
+ ---
774
+
775
+ ## 🚀 Migration Guide
776
+
777
+ ### From Rails.logger (No Buffering)
778
+
779
+ **Before:**
780
+ ```ruby
781
+ def create
782
+ Rails.logger.debug "Starting order creation"
783
+ # ... logic ...
784
+ Rails.logger.info "Order created: #{order.id}"
785
+ end
786
+
787
+ # Problem: Always logs debug (even on success)
788
+ ```
789
+
790
+ **After:**
791
+ ```ruby
792
+ def create
793
+ Events::OrderCreationStarted.track(severity: :debug)
794
+ # ... logic ...
795
+ Events::OrderCreated.track(order_id: order.id, severity: :success)
796
+ end
797
+
798
+ # Solution: Debug events buffered, only flushed on error
799
+ ```
800
+
801
+ ---
802
+
803
+ ## 📚 Related Use Cases
804
+
805
+ - **[UC-002: Business Event Tracking](./UC-002-business-event-tracking.md)** - Define structured events
806
+ - **[UC-010: Background Job Tracking](./UC-010-background-job-tracking.md)** - Buffering in Sidekiq/ActiveJob
807
+ - **[UC-017: Local Development](./UC-017-local-development.md)** - Test buffering locally
808
+
809
+ ---
810
+
811
+ **Document Version:** 1.0
812
+ **Last Updated:** January 12, 2026
813
+ **Status:** ✅ Complete