e11y 0.2.0 → 1.0.0

Files changed (230)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +130 -10
  3. data/CHANGELOG.md +56 -1
  4. data/CLAUDE.md +168 -0
  5. data/CONTRIBUTING.md +640 -0
  6. data/README.md +134 -702
  7. data/RELEASE.md +18 -3
  8. data/Rakefile +108 -29
  9. data/config/README.md +1 -1
  10. data/config/loki-local-config.yaml +12 -0
  11. data/config/otel-collector-config.yaml +44 -0
  12. data/cucumber.yml +1 -0
  13. data/docker-compose.yml +18 -2
  14. data/docs/ADAPTERS.md +76 -0
  15. data/docs/ADAPTIVE_SAMPLING.md +59 -0
  16. data/docs/COMPARISON.md +104 -0
  17. data/docs/CONFIGURATION.md +52 -0
  18. data/docs/DISTRIBUTED_TRACING.md +44 -0
  19. data/docs/LIMITATIONS.md +13 -0
  20. data/docs/METRICS_DSL.md +84 -0
  21. data/docs/PERFORMANCE.md +60 -0
  22. data/docs/PII_FILTERING.md +40 -0
  23. data/docs/PRESETS.md +65 -0
  24. data/docs/QUICK-START.md +546 -587
  25. data/docs/RAILS_INTEGRATION.md +29 -0
  26. data/docs/SCHEMA_VALIDATION.md +63 -0
  27. data/docs/SLO-PROMQL-ALERTS.md +161 -0
  28. data/docs/TESTING.md +69 -0
  29. data/docs/{ADR-001-architecture.md → architecture/ADR-001-architecture.md} +35 -64
  30. data/docs/{ADR-002-metrics-yabeda.md → architecture/ADR-002-metrics-yabeda.md} +62 -236
  31. data/docs/{ADR-003-slo-observability.md → architecture/ADR-003-slo-observability.md} +27 -466
  32. data/docs/{ADR-004-adapter-architecture.md → architecture/ADR-004-adapter-architecture.md} +163 -146
  33. data/docs/{ADR-005-tracing-context.md → architecture/ADR-005-tracing-context.md} +10 -9
  34. data/docs/{ADR-006-security-compliance.md → architecture/ADR-006-security-compliance.md} +184 -191
  35. data/docs/{ADR-007-opentelemetry-integration.md → architecture/ADR-007-opentelemetry-integration.md} +3 -21
  36. data/docs/{ADR-008-rails-integration.md → architecture/ADR-008-rails-integration.md} +209 -339
  37. data/docs/{ADR-009-cost-optimization.md → architecture/ADR-009-cost-optimization.md} +45 -54
  38. data/docs/architecture/ADR-010-developer-experience.md +522 -0
  39. data/docs/{ADR-011-testing-strategy.md → architecture/ADR-011-testing-strategy.md} +41 -83
  40. data/docs/{ADR-013-reliability-error-handling.md → architecture/ADR-013-reliability-error-handling.md} +37 -12
  41. data/docs/{ADR-014-event-driven-slo.md → architecture/ADR-014-event-driven-slo.md} +12 -24
  42. data/docs/{ADR-015-middleware-order.md → architecture/ADR-015-middleware-order.md} +23 -41
  43. data/docs/{ADR-016-self-monitoring-slo.md → architecture/ADR-016-self-monitoring-slo.md} +52 -349
  44. data/docs/{ADR-017-multi-rails-compatibility.md → architecture/ADR-017-multi-rails-compatibility.md} +4 -11
  45. data/docs/architecture/ADR-018-memory-optimization.md +366 -0
  46. data/docs/{ADR-INDEX.md → architecture/ADR-INDEX.md} +11 -6
  47. data/docs/{00-ICP-AND-TIMELINE.md → prd/00-ICP-AND-TIMELINE.md} +6 -6
  48. data/docs/{01-SCALE-REQUIREMENTS.md → prd/01-SCALE-REQUIREMENTS.md} +6 -6
  49. data/docs/prd/01-overview-vision.md +19 -14
  50. data/docs/use_cases/README.md +22 -23
  51. data/docs/use_cases/UC-001-request-scoped-debug-buffering.md +50 -44
  52. data/docs/use_cases/UC-002-business-event-tracking.md +26 -95
  53. data/docs/use_cases/UC-003-event-metrics.md +66 -0
  54. data/docs/use_cases/UC-004-zero-config-slo-tracking.md +42 -101
  55. data/docs/use_cases/UC-005-sentry-integration.md +13 -15
  56. data/docs/use_cases/UC-006-trace-context-management.md +30 -28
  57. data/docs/use_cases/UC-007-pii-filtering.md +35 -87
  58. data/docs/use_cases/UC-008-opentelemetry-integration.md +51 -89
  59. data/docs/use_cases/UC-009-multi-service-tracing.md +4 -4
  60. data/docs/use_cases/UC-010-background-job-tracking.md +5 -5
  61. data/docs/use_cases/UC-011-rate-limiting.md +95 -168
  62. data/docs/use_cases/UC-012-audit-trail.md +21 -46
  63. data/docs/use_cases/UC-013-high-cardinality-protection.md +29 -167
  64. data/docs/use_cases/UC-014-adaptive-sampling.md +2 -2
  65. data/docs/use_cases/UC-015-cost-optimization.md +46 -99
  66. data/docs/use_cases/UC-016-rails-logger-migration.md +39 -213
  67. data/docs/use_cases/UC-017-local-development.md +203 -777
  68. data/docs/use_cases/UC-018-testing-events.md +3 -3
  69. data/docs/use_cases/UC-019-retention-based-routing.md +53 -106
  70. data/docs/use_cases/UC-020-event-versioning.md +8 -9
  71. data/docs/use_cases/UC-021-error-handling-retry-dlq.md +18 -22
  72. data/docs/use_cases/UC-022-event-registry.md +15 -21
  73. data/docs/use_cases/backlog.md +119 -87
  74. data/e11y.gemspec +2 -2
  75. data/gems/e11y-devtools/README.md +136 -0
  76. data/gems/e11y-devtools/config/routes.rb +8 -0
  77. data/gems/e11y-devtools/e11y-devtools.gemspec +25 -0
  78. data/gems/e11y-devtools/exe/e11y +34 -0
  79. data/gems/e11y-devtools/lib/e11y/devtools/mcp/server.rb +96 -0
  80. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tool_base.rb +25 -0
  81. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/clear.rb +31 -0
  82. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/errors.rb +35 -0
  83. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/event_detail.rb +33 -0
  84. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/events_by_trace.rb +33 -0
  85. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/interactions.rb +40 -0
  86. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/recent_events.rb +34 -0
  87. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/search.rb +34 -0
  88. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/stats.rb +30 -0
  89. data/gems/e11y-devtools/lib/e11y/devtools/overlay/assets/overlay.js +115 -0
  90. data/gems/e11y-devtools/lib/e11y/devtools/overlay/controller.rb +54 -0
  91. data/gems/e11y-devtools/lib/e11y/devtools/overlay/engine.rb +26 -0
  92. data/gems/e11y-devtools/lib/e11y/devtools/overlay/middleware.rb +80 -0
  93. data/gems/e11y-devtools/lib/e11y/devtools/overlay/rails_controller.rb +42 -0
  94. data/gems/e11y-devtools/lib/e11y/devtools/tui/app.rb +262 -0
  95. data/gems/e11y-devtools/lib/e11y/devtools/tui/grouping.rb +66 -0
  96. data/gems/e11y-devtools/lib/e11y/devtools/tui/widgets/event_detail.rb +62 -0
  97. data/gems/e11y-devtools/lib/e11y/devtools/tui/widgets/event_list.rb +70 -0
  98. data/gems/e11y-devtools/lib/e11y/devtools/tui/widgets/interaction_list.rb +47 -0
  99. data/gems/e11y-devtools/lib/e11y/devtools/version.rb +8 -0
  100. data/gems/e11y-devtools/lib/e11y/devtools.rb +13 -0
  101. data/gems/e11y-devtools/spec/e11y/devtools/mcp/tools_spec.rb +107 -0
  102. data/gems/e11y-devtools/spec/e11y/devtools/overlay/controller_spec.rb +58 -0
  103. data/gems/e11y-devtools/spec/e11y/devtools/overlay/middleware_spec.rb +46 -0
  104. data/gems/e11y-devtools/spec/e11y/devtools/tui/app_spec.rb +85 -0
  105. data/gems/e11y-devtools/spec/e11y/devtools/tui/grouping_spec.rb +64 -0
  106. data/gems/e11y-devtools/spec/spec_helper.rb +5 -0
  107. data/gems/e11y-devtools/spec/tui/widgets/event_list_spec.rb +44 -0
  108. data/gems/e11y-devtools/spec/tui/widgets/interaction_list_spec.rb +62 -0
  109. data/lib/e11y/adapters/audit_encrypted.rb +53 -11
  110. data/lib/e11y/adapters/base.rb +33 -34
  111. data/lib/e11y/adapters/dev_log/file_store.rb +143 -0
  112. data/lib/e11y/adapters/dev_log/query.rb +219 -0
  113. data/lib/e11y/adapters/dev_log.rb +118 -0
  114. data/lib/e11y/adapters/file.rb +3 -6
  115. data/lib/e11y/adapters/in_memory.rb +52 -5
  116. data/lib/e11y/adapters/in_memory_test.rb +29 -0
  117. data/lib/e11y/adapters/loki.rb +58 -23
  118. data/lib/e11y/adapters/null.rb +82 -0
  119. data/lib/e11y/adapters/opentelemetry_collector.rb +183 -0
  120. data/lib/e11y/adapters/otel_logs.rb +136 -23
  121. data/lib/e11y/adapters/sentry.rb +4 -7
  122. data/lib/e11y/adapters/stdout.rb +73 -7
  123. data/lib/e11y/adapters/yabeda.rb +153 -29
  124. data/lib/e11y/buffers/adaptive_buffer.rb +3 -17
  125. data/lib/e11y/buffers/{request_scoped_buffer.rb → ephemeral_buffer.rb} +72 -58
  126. data/lib/e11y/buffers/ring_buffer.rb +3 -16
  127. data/lib/e11y/configuration.rb +272 -0
  128. data/lib/e11y/console.rb +10 -17
  129. data/lib/e11y/current.rb +53 -1
  130. data/lib/e11y/debug/pipeline_inspector.rb +96 -0
  131. data/lib/e11y/documentation/generator.rb +48 -0
  132. data/lib/e11y/event/base.rb +176 -82
  133. data/lib/e11y/event/value_sampling_config.rb +1 -5
  134. data/lib/e11y/events/rails/database/query.rb +1 -4
  135. data/lib/e11y/events/rails/job/failed.rb +2 -0
  136. data/lib/e11y/instruments/active_job.rb +46 -12
  137. data/lib/e11y/instruments/rails_instrumentation.rb +49 -24
  138. data/lib/e11y/instruments/sidekiq.rb +137 -31
  139. data/lib/e11y/linters/base.rb +11 -0
  140. data/lib/e11y/linters/pii/pii_declaration_linter.rb +120 -0
  141. data/lib/e11y/linters/slo/config_consistency_linter.rb +76 -0
  142. data/lib/e11y/linters/slo/explicit_declaration_linter.rb +36 -0
  143. data/lib/e11y/linters/slo/slo_status_from_linter.rb +41 -0
  144. data/lib/e11y/logger/bridge.rb +26 -7
  145. data/lib/e11y/metrics/cardinality_protection.rb +10 -15
  146. data/lib/e11y/metrics/cardinality_tracker.rb +16 -6
  147. data/lib/e11y/metrics/registry.rb +3 -5
  148. data/lib/e11y/metrics/test_backend.rb +62 -0
  149. data/lib/e11y/metrics.rb +56 -10
  150. data/lib/e11y/middleware/adapter_resolver.rb +40 -0
  151. data/lib/e11y/middleware/audit_signing.rb +43 -6
  152. data/lib/e11y/middleware/baggage_protection.rb +75 -0
  153. data/lib/e11y/middleware/dev_log_source.rb +24 -0
  154. data/lib/e11y/middleware/event_slo.rb +23 -9
  155. data/lib/e11y/middleware/otel_span.rb +23 -0
  156. data/lib/e11y/middleware/pii_filter.rb +104 -75
  157. data/lib/e11y/middleware/rate_limiting.rb +54 -27
  158. data/lib/e11y/middleware/request.rb +70 -23
  159. data/lib/e11y/middleware/routing.rb +78 -21
  160. data/lib/e11y/middleware/sampling.rb +66 -17
  161. data/lib/e11y/middleware/self_monitoring_emit.rb +39 -0
  162. data/lib/e11y/middleware/trace_context.rb +45 -10
  163. data/lib/e11y/middleware/track_latency.rb +34 -0
  164. data/lib/e11y/middleware/validation.rb +7 -16
  165. data/lib/e11y/middleware/versioning.rb +26 -22
  166. data/lib/e11y/opentelemetry/semantic_conventions.rb +109 -0
  167. data/lib/e11y/opentelemetry/span_creator.rb +142 -0
  168. data/lib/e11y/pii/patterns.rb +12 -1
  169. data/lib/e11y/pipeline/builder.rb +1 -1
  170. data/lib/e11y/presets/audit_event.rb +13 -2
  171. data/lib/e11y/railtie.rb +52 -15
  172. data/lib/e11y/registry.rb +306 -0
  173. data/lib/e11y/reliability/circuit_breaker.rb +19 -21
  174. data/lib/e11y/reliability/dlq/base.rb +71 -0
  175. data/lib/e11y/reliability/dlq/file_adapter.rb +301 -0
  176. data/lib/e11y/reliability/dlq/file_storage.rb +63 -34
  177. data/lib/e11y/reliability/dlq/filter.rb +37 -54
  178. data/lib/e11y/reliability/retry_handler.rb +26 -29
  179. data/lib/e11y/reliability/retry_rate_limiter.rb +3 -11
  180. data/lib/e11y/sampling/error_spike_detector.rb +0 -2
  181. data/lib/e11y/sampling/load_monitor.rb +5 -9
  182. data/lib/e11y/sampling/stratified_tracker.rb +18 -0
  183. data/lib/e11y/self_monitoring/buffer_monitor.rb +2 -0
  184. data/lib/e11y/self_monitoring/performance_monitor.rb +19 -61
  185. data/lib/e11y/self_monitoring/reliability_monitor.rb +4 -74
  186. data/lib/e11y/slo/config_loader.rb +40 -0
  187. data/lib/e11y/slo/config_validator.rb +58 -0
  188. data/lib/e11y/slo/dashboard_generator.rb +122 -0
  189. data/lib/e11y/slo/event_driven.rb +8 -0
  190. data/lib/e11y/slo/tracker.rb +31 -4
  191. data/lib/e11y/testing/have_tracked_event_matcher.rb +190 -0
  192. data/lib/e11y/testing/rspec_matchers.rb +21 -0
  193. data/lib/e11y/testing/snapshot_matcher.rb +86 -0
  194. data/lib/e11y/trace_context/sampler.rb +35 -0
  195. data/lib/e11y/tracing/faraday_middleware.rb +31 -0
  196. data/lib/e11y/tracing/net_http_patch.rb +33 -0
  197. data/lib/e11y/tracing/propagator.rb +116 -0
  198. data/lib/e11y/tracing.rb +47 -0
  199. data/lib/e11y/version.rb +1 -1
  200. data/lib/e11y/versioning/version_extractor.rb +32 -0
  201. data/lib/e11y.rb +141 -265
  202. data/lib/generators/e11y/event/event_generator.rb +22 -0
  203. data/lib/generators/e11y/event/templates/event.rb.tt +16 -0
  204. data/lib/generators/e11y/grafana_dashboard/grafana_dashboard_generator.rb +30 -0
  205. data/lib/generators/e11y/grafana_dashboard/templates/e11y_dashboard.json +81 -0
  206. data/lib/generators/e11y/install/install_generator.rb +34 -0
  207. data/lib/generators/e11y/install/templates/e11y.rb +239 -0
  208. data/lib/generators/e11y/prometheus_alerts/prometheus_alerts_generator.rb +29 -0
  209. data/lib/generators/e11y/prometheus_alerts/templates/e11y_alerts.yml +28 -0
  210. data/lib/tasks/e11y_docs.rake +30 -0
  211. data/lib/tasks/e11y_events.rake +71 -0
  212. data/lib/tasks/e11y_lint.rake +91 -0
  213. data/lib/tasks/e11y_slo.rake +29 -0
  214. metadata +129 -39
  215. data/docs/ADR-010-developer-experience.md +0 -2166
  216. data/docs/API-REFERENCE-L28.md +0 -914
  217. data/docs/COMPREHENSIVE-CONFIGURATION.md +0 -2366
  218. data/docs/CONTRIBUTING.md +0 -312
  219. data/docs/IMPLEMENTATION_NOTES.md +0 -2804
  220. data/docs/IMPLEMENTATION_PLAN.md +0 -1971
  221. data/docs/IMPLEMENTATION_PLAN_ARCHITECTURE.md +0 -586
  222. data/docs/PLAN.md +0 -148
  223. data/docs/README.md +0 -296
  224. data/docs/design/00-memory-optimization.md +0 -593
  225. data/docs/guides/MIGRATION-L27-L28.md +0 -692
  226. data/docs/guides/PERFORMANCE-BENCHMARKS.md +0 -434
  227. data/docs/guides/README.md +0 -44
  228. data/docs/use_cases/UC-003-pattern-based-metrics.md +0 -1627
  229. data/lib/e11y/adapters/registry.rb +0 -141
  230. data/docs/{ADR-012-event-evolution.md → architecture/ADR-012-event-evolution.md} +0 -0
@@ -234,7 +234,7 @@ end
234
234
  > config.pipeline.use RoutingMiddleware # 6. Buffer routing (LAST!)
235
235
  > ```
236
236
  >
237
- > **See:** [ADR-001 Section 4.1: Middleware Execution Order](../ADR-001-architecture.md#41-middleware-execution-order-critical) and [ADR-015: Middleware Order Reference](../ADR-015-middleware-order.md) for detailed explanation.
237
+ > **See:** [ADR-001 Section 4.1: Middleware Execution Order](../architecture/ADR-001-architecture.md#41-middleware-execution-order-critical) and [ADR-015: Middleware Order Reference](../architecture/ADR-015-middleware-order.md) for detailed explanation.
238
238
 
239
239
  ---
240
240
 
@@ -275,7 +275,7 @@ end
275
275
 
276
276
  ### Dual-Buffer Architecture
277
277
 
278
- **E11y uses TWO independent buffers:**
278
+ **E11y uses TWO independent buffers:**
279
279
 
280
280
  ```
281
281
  ┌─────────────────────────────────────────────────────────────────┐
@@ -318,15 +318,15 @@ Background Flush Thread (200ms interval):
318
318
  ### Buffer Routing Logic
319
319
 
320
320
  ```ruby
321
- # Pseudo-code for understanding
321
+ # Illustrative pseudo-code
322
322
  def track_event(event)
323
323
  if event.severity == :debug && E11y.request_scope.active?
324
324
  # → Request-scoped buffer (Thread-local)
325
- Thread.current[:e11y_request_buffer] << event
325
+ Thread.current[:e11y_ephemeral_buffer] << event
326
326
  else
327
327
  # → Main buffer (Global SPSC ring buffer)
328
328
  E11y.main_buffer << event
329
- # The background thread will pick it up in 200ms (or earlier if the batch fills)
329
+ # The background thread picks it up within 200ms (or sooner if the batch fills)
330
330
  end
331
331
  end
332
332
  ```
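The routing pseudo-code above can be made runnable in plain Ruby. The sketch models one request scope: debug events wait in a thread-local array and are discarded on success, while a failure pushes an error event plus the buffered debug events into the main buffer. `DualBufferSketch`, its thread-local key, the 100-event cap, and the use of `Queue` in place of the SPSC ring buffer are assumptions for illustration, not E11y's actual classes.

```ruby
# Minimal sketch of dual-buffer routing. Hypothetical names; not E11y's implementation.
class DualBufferSketch
  EPHEMERAL_KEY = :e11y_ephemeral_buffer_sketch # thread-local slot for one request's debug events
  EPHEMERAL_LIMIT = 100                        # per-request cap; extra debug events are dropped

  def initialize
    @main_buffer = Queue.new # thread-safe stand-in for the global ring buffer
  end

  # Wraps one request. Success: buffered debug events are discarded.
  # Failure: an error event plus the buffered debug events go to the main buffer.
  def with_request_scope
    Thread.current[EPHEMERAL_KEY] = []
    yield
  rescue StandardError => e
    track(severity: :error, message: e.message)
    flush_ephemeral
    raise
  ensure
    Thread.current[EPHEMERAL_KEY] = nil # success path: whatever was buffered is dropped here
  end

  def track(event)
    buffer = Thread.current[EPHEMERAL_KEY]
    if event[:severity] == :debug && buffer
      buffer << event if buffer.size < EPHEMERAL_LIMIT
    else
      @main_buffer << event # picked up later by the background flush thread
    end
  end

  # Drains everything currently queued, as one flush tick of the background thread would.
  def drain_main
    events = []
    events << @main_buffer.pop(true) until @main_buffer.empty?
    events
  end

  private

  def flush_ephemeral
    Thread.current[EPHEMERAL_KEY]&.each { |event| @main_buffer << event }
  end
end
```

A real ring buffer has fixed capacity and is drained by a background thread every 200ms; `drain_main` only simulates one such tick.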
@@ -401,11 +401,17 @@ module E11y::RequestScope
401
401
  end
402
402
  ```
403
403
 
404
+ > **DevLog integration (development/test):** When the debug buffer is flushed on request
405
+ > failure, events are delivered to all registered adapters — including
406
+ > `E11y::Adapters::DevLog` (auto-registered in development/test via Railtie). Debug
407
+ > events from failed requests automatically appear in `log/e11y_dev.jsonl` and become
408
+ > visible in the TUI and Browser Overlay. See [UC-017](UC-017-local-development.md).
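Because `log/e11y_dev.jsonl` holds one JSON object per line, a short script can pull out the debug events a failed request flushed, without opening the TUI. A minimal sketch; the `trace_id` and `severity` key names are assumptions about the DevLog line format, not confirmed by this diff, and `dev_log_events` is a hypothetical helper.

```ruby
require "json"

# Hypothetical helper for inspecting log/e11y_dev.jsonl outside the TUI/Overlay.
# ASSUMPTION: each line is one JSON object with "trace_id" and "severity" keys.
def dev_log_events(path, trace_id: nil, severity: nil)
  File.foreach(path).filter_map do |line|
    next if line.strip.empty?

    event = JSON.parse(line)
    next if trace_id && event["trace_id"] != trace_id
    next if severity && event["severity"] != severity.to_s

    event
  end
end

# Usage in a Rails app: dev_log_events("log/e11y_dev.jsonl", severity: :debug)
```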
409
+
404
410
  ---
405
411
 
406
412
  ## 📈 Performance Impact
407
413
 
408
- > **Implementation:** See [ADR-001 Section 8.3: Resource Limits](../ADR-001-architecture.md#83-resource-limits) for architectural details and [ADR-002 Section 6: Self-Monitoring](../ADR-002-metrics-yabeda.md#6-self-monitoring) for metrics implementation.
414
+ > **Implementation:** See [ADR-001 Section 8.3: Resource Limits](../architecture/ADR-001-architecture.md#83-resource-limits) for architectural details and [ADR-002 Section 6: Self-Monitoring](../architecture/ADR-002-metrics-yabeda.md#6-self-monitoring) for the metrics implementation.
409
415
 
410
416
  ### Buffer Metrics
411
417
 
@@ -413,20 +419,20 @@ end
413
419
 
414
420
  ```ruby
415
421
  # Exposed via Yabeda (auto-configured)
416
- Yabeda.e11y_request_buffer_size # Gauge: current buffer size per request
417
- Yabeda.e11y_request_buffer_flushes_total # Counter: buffer flushes by trigger
422
+ Yabeda.e11y_ephemeral_buffer_size # Gauge: current buffer size per request
423
+ Yabeda.e11y_ephemeral_buffer_flushes_total # Counter: buffer flushes by trigger
418
424
 
419
425
  # Accessible via Prometheus metrics endpoint
420
426
  # Example queries:
421
427
 
422
428
  # 1. Average buffer size
423
- avg(e11y_request_buffer_size)
429
+ avg(e11y_ephemeral_buffer_size)
424
430
 
425
431
  # 2. Buffer flush rate by trigger
426
- rate(e11y_request_buffer_flushes_total{trigger="error"}[5m])
432
+ rate(e11y_ephemeral_buffer_flushes_total{trigger="error"}[5m])
427
433
 
428
434
  # 3. Buffer overflow alerts
429
- e11y_request_buffer_size >= 100 # Alert if buffer limit reached
435
+ e11y_ephemeral_buffer_size >= 100 # Alert if buffer limit reached
430
436
  ```
431
437
 
432
438
  **Monitoring Examples:**
@@ -436,18 +442,18 @@ e11y_request_buffer_size >= 100 # Alert if buffer limit reached
436
442
 
437
443
  # Panel 1: Buffer Size Distribution
438
444
  histogram_quantile(0.99,
439
- sum(rate(e11y_request_buffer_size[5m])) by (le)
445
+ sum(rate(e11y_ephemeral_buffer_size[5m])) by (le)
440
446
  )
441
447
  # Shows p99 buffer size
442
448
 
443
449
  # Panel 2: Flush Triggers Breakdown
444
450
  sum by (trigger) (
445
- rate(e11y_request_buffer_flushes_total[5m])
451
+ rate(e11y_ephemeral_buffer_flushes_total[5m])
446
452
  )
447
453
  # Shows why buffers flush (error vs. slow_request vs. custom)
448
454
 
449
455
  # Panel 3: Memory Impact Estimate
450
- avg(e11y_request_buffer_size) * 500 # bytes per event
456
+ avg(e11y_ephemeral_buffer_size) * 500 # bytes per event
451
457
  # Estimates per-request memory usage
452
458
  ```
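Panel 3's estimate is plain arithmetic: average buffer size times roughly 500 bytes per event. The sketch below also scales by the number of in-flight requests; `concurrent_requests` is a hypothetical parameter (for example, your web server's thread count), and 500 bytes is the document's rule of thumb rather than a measured size.

```ruby
# Assumed average in-memory size of one buffered event (rule of thumb from Panel 3).
BYTES_PER_EVENT = 500

# Approximate memory held by ephemeral buffers across in-flight requests.
def ephemeral_buffer_memory_bytes(avg_buffer_size, concurrent_requests: 1)
  avg_buffer_size * BYTES_PER_EVENT * concurrent_requests
end

ephemeral_buffer_memory_bytes(5)                            # → 2500 (one request)
ephemeral_buffer_memory_bytes(100, concurrent_requests: 25) # → 1250000 (~1.25 MB at the 100-event cap)
```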
453
459
 
@@ -609,13 +615,13 @@ end
609
615
 
610
616
  ---
611
617
 
612
- ## 🔄 Interaction with Flush Interval (200ms)
618
+ ## 🔄 Interaction with Flush Interval (200ms)
613
619
 
614
- ### Question: Do the buffers conflict?
620
+ ### Question: Do the buffers conflict?
615
621
 
616
- **Answer: NO. They are independent.**
622
+ **Answer: NO. They are independent.**
617
623
 
618
- ### Detailed Logic
624
+ ### Detailed Logic
619
625
 
620
626
  ```ruby
621
627
  # config/initializers/e11y.rb
@@ -636,9 +642,9 @@ E11y.configure do |config|
636
642
  end
637
643
  ```
638
644
 
639
- ### Event Flow
645
+ ### Event Flow
640
646
 
641
- **Scenario 1: Normal request (successful)**
647
+ **Scenario 1: Normal request (successful)**
642
648
  ```ruby
643
649
  # Request starts
644
650
  Events::DebugEvent.track(...) # → Request buffer (thread-local)
@@ -650,7 +656,7 @@ Events::DebugEvent.track(...) # → Request buffer (thread-local)
650
656
  # → Main buffer flushed every 200ms (success event sent)
651
657
  ```
652
658
 
653
- **Scenario 2: Request with error**
659
+ **Scenario 2: Request with error**
654
660
  ```ruby
655
661
  # Request starts
656
662
  Events::DebugEvent.track(...) # → Request buffer
@@ -662,20 +668,20 @@ Events::DebugEvent.track(...) # → Request buffer
662
668
  # → Main buffer continues flush every 200ms (error event sent)
663
669
  ```
664
670
 
665
- **Scenario 3: High-load service**
671
+ **Scenario 3: High-load service**
666
672
  ```ruby
667
- # 1000 requests/sec, each with 5 debug events
668
- # → 5000 debug events/sec in request buffers (thread-local)
669
- # → 99% successful → 4950 debug events/sec DISCARDED
670
- # → 1% errors → 50 debug events/sec FLUSHED
673
+ # 1000 requests/sec, each with 5 debug events
674
+ # → 5000 debug events/sec in request buffers (thread-local)
675
+ # → 99% successful → 4950 debug events/sec DISCARDED
676
+ # → 1% errors → 50 debug events/sec FLUSHED
671
677
  #
672
- # In parallel:
678
+ # In parallel:
673
679
  # → 1000 info/success events/sec → Main buffer
674
- # → Flush every 200ms = 5 batches/sec
675
- # → 200 events per batch (on average)
680
+ # → Flush every 200ms = 5 batches/sec
681
+ # → 200 events per batch (on average)
676
682
  ```
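The Scenario 3 arithmetic generalizes to a small function that is handy for sizing your own traffic. A back-of-envelope sketch: it assumes every request emits the same number of debug events and exactly one info/success event, and it ignores error events in the main buffer. `dual_buffer_rates` is a hypothetical helper, not part of E11y.

```ruby
# Back-of-envelope throughput model for the dual-buffer design.
def dual_buffer_rates(rps:, debug_per_request:, error_rate:, flush_interval_ms: 200)
  debug_per_sec = rps * debug_per_request
  flushed = (debug_per_sec * error_rate).round # debug events kept because their request failed
  {
    debug_per_sec: debug_per_sec,
    debug_discarded_per_sec: debug_per_sec - flushed,
    debug_flushed_per_sec: flushed,
    batches_per_sec: 1000 / flush_interval_ms,
    main_events_per_batch: rps * flush_interval_ms / 1000 # one info/success event per request
  }
end

dual_buffer_rates(rps: 1000, debug_per_request: 5, error_rate: 0.01)
# → { debug_per_sec: 5000, debug_discarded_per_sec: 4950, debug_flushed_per_sec: 50,
#     batches_per_sec: 5, main_events_per_batch: 200 }
```

With `rps: 100, debug_per_request: 3, error_rate: 0.01` it gives 300 debug events/sec, 297 discarded, 3 flushed, and 20 events per 200ms batch.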
677
683
 
678
- ### Summary: No Conflict!
684
+ ### Summary: No Conflict!
679
685
 
680
686
  | Event Type | Buffer | Flush Trigger | Latency |
681
687
  |------------|--------|---------------|---------|
@@ -686,14 +692,14 @@ Events::DebugEvent.track(...) # → Request buffer
686
692
  | `:error` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
687
693
  | `:fatal` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
688
694
 
689
- **Benefits of the dual buffer:**
690
- 1. ✅ Debug events don't clutter the main buffer
691
- 2. ✅ Important events (info+) go out fast (200ms)
692
- 3. ✅ Debug events go out instantly on error (flush triggered)
693
- 4. ✅ 99% of debug events are not processed at all (discard = zero cost)
694
- 5. ✅ Thread-safety: the request buffer is isolated in Thread.current
695
+ **Benefits of the dual-buffer design:**
696
+ 1. ✅ Debug events don't clutter the main buffer
697
+ 2. ✅ Important events (info+) are delivered within 200ms
698
+ 3. ✅ Debug events are flushed immediately when a request fails
699
+ 4. ✅ At a 1% error rate, 99% of debug events are simply discarded (near-zero cost)
700
+ 5. ✅ Thread safety: the request buffer is isolated in Thread.current
695
701
 
696
- ### Visual Diagram
702
+ ### Visual Diagram
697
703
 
698
704
  ```
699
705
  Time: ──────────────────────────────────────────────────>
@@ -720,14 +726,14 @@ Background Flush Thread: │ │
720
726
  200ms 400ms
721
727
  ```
722
728
 
723
- ### Example with Numbers
729
+ ### Example with Numbers
724
730
 
725
- **Load:**
731
+ **Load:**
726
732
  - 100 requests/sec
727
- - Each request: 3 debug events + 1 success event
733
+ - Each request: 3 debug events + 1 success event
728
734
  - Error rate: 1%
729
735
 
730
- **What happens:**
736
+ **What happens:**
731
737
 
732
738
  | Time | Request Buffer (Thread-local) | Main Buffer (Global) | Flush |
733
739
  |------|------------------------------|---------------------|-------|
@@ -739,8 +745,8 @@ Background Flush Thread: │ │
739
745
  | 210ms | Req21: [D, D, D] ERROR! | [S21, E21, **D, D, D from Req21**] | **Immediate flush debug** |
740
746
  | 400ms | - | [S21...S40] | **Flush next batch** |
741
747
 
742
- **Result:**
743
- - Success events: ~100/sec → flush every 200ms → latency <200ms ✅
748
+ **Result:**
749
+ - Success events: ~100/sec → flush every 200ms → latency <200ms ✅
744
750
  - Debug events (99%): DISCARDED → zero overhead ✅
745
751
  - Debug events (1% errors): flushed IMMEDIATELY with error context ✅
746
752
 
@@ -804,7 +810,7 @@ end
804
810
 
805
811
  - **[UC-002: Business Event Tracking](./UC-002-business-event-tracking.md)** - Define structured events
806
812
  - **[UC-010: Background Job Tracking](./UC-010-background-job-tracking.md)** - Buffering in Sidekiq/ActiveJob
807
- - **[UC-015: Local Development](./UC-015-local-development.md)** - Test buffering locally
813
+ - **[UC-017: Local Development](./UC-017-local-development.md)** - Test buffering locally
808
814
 
809
815
  ---
810
816
 
@@ -42,7 +42,7 @@ Events::OrderPaid.track(
42
42
 
43
43
  # Result:
44
44
  # 1. Structured log in ELK/Loki (JSON)
45
- # 2. Auto-generated metrics (pattern-based)
45
+ # 2. Event metrics (declared in the event class's metrics do block)
46
46
  # 3. Trace context (automatic correlation)
47
47
  ```
48
48
 
@@ -102,22 +102,13 @@ class OrdersController < ApplicationController
102
102
  end
103
103
  end
104
104
 
105
- # Step 3: Configure pattern-based metrics (config/initializers/e11y.rb)
106
- E11y.configure do |config|
107
- config.metrics do
108
- # Counter: orders.created.total
109
- counter_for pattern: 'order.created',
110
- name: 'orders.created.total',
111
- tags: [:currency]
112
-
113
- # Histogram: orders.created.amount
114
- histogram_for pattern: 'order.created',
115
- name: 'orders.created.amount',
116
- value: ->(e) { e.payload[:total_amount] },
117
- tags: [:currency],
118
- buckets: [10, 50, 100, 500, 1000, 5000]
119
- end
120
- end
105
+ # Step 3: Add metrics in event class
106
+ # class Events::OrderCreated < E11y::Event::Base
107
+ # metrics do
108
+ # counter :orders_created_total, tags: [:currency]
109
+ # histogram :orders_created_amount, value: :total_amount, tags: [:currency], buckets: [10, 50, 100, 500]
110
+ # end
111
+ # end
121
112
  ```
122
113
 
123
114
  **Result in Logs (Loki/ELK):**
@@ -231,21 +222,7 @@ class RegistrationsController < ApplicationController
231
222
  end
232
223
  end
233
224
 
234
- # Metrics configuration
235
- E11y.configure do |config|
236
- config.metrics do
237
- # Funnel counter
238
- counter_for pattern: 'registration.*',
239
- name: 'registration.funnel.total',
240
- tags: [:event_name, :source]
241
-
242
- # Time to first login
243
- histogram_for pattern: 'first.login',
244
- name: 'registration.time_to_first_login_hours',
245
- value: ->(e) { e.payload[:time_since_registration_hours] },
246
- buckets: [1, 6, 12, 24, 48, 72, 168] # hours
247
- end
248
- end
225
+ # Metrics: declare a metrics do block in each event class (Events::RegistrationStarted, Events::EmailVerified, etc.)
249
226
  ```
250
227
 
251
228
  **Funnel Analysis (Grafana/Prometheus):**
@@ -353,28 +330,7 @@ class ProcessPaymentJob < ApplicationJob
353
330
  end
354
331
  end
355
332
 
356
- # Metrics
357
- E11y.configure do |config|
358
- config.metrics do
359
- # Success rate (critical metric!)
360
- success_rate_for pattern: 'payment.*',
361
- name: 'payments.success_rate',
362
- tags: [:payment_method]
363
- # Auto-calculates: succeeded / (succeeded + failed) * 100
364
-
365
- # Payment duration (performance)
366
- histogram_for pattern: 'payment.succeeded',
367
- value: ->(e) { e.duration_ms },
368
- name: 'payments.duration_ms',
369
- tags: [:payment_method],
370
- buckets: [100, 250, 500, 1000, 2000, 5000]
371
-
372
- # Failed payments by error code (debugging)
373
- counter_for pattern: 'payment.failed',
374
- name: 'payments.failed.total',
375
- tags: [:error_code, :payment_method]
376
- end
377
- end
333
+ # Metrics: declare a metrics do block in the PaymentSucceeded and PaymentFailed event classes
378
334
  ```
379
335
 
380
336
  **Alerts (Prometheus):**
@@ -505,7 +461,7 @@ module Events
505
461
  rate_limit 1000
506
462
  sample_rate 1.0 # Never sample payments (high-value)
507
463
  retention 7.years # Financial records
508
- adapters [:loki, :sentry, :s3_archive]
464
+ adapters [:loki, :sentry]
509
465
 
510
466
  # Common PII filtering
511
467
  contains_pii true
@@ -605,7 +561,7 @@ module E11y
605
561
  rate_limit 10_000
606
562
  sample_rate 1.0 # Never sample
607
563
  retention 7.years
608
- adapters [:loki, :sentry, :s3_archive]
564
+ adapters [:loki, :sentry]
609
565
  end
610
566
  end
611
567
 
@@ -669,7 +625,7 @@ module Events
669
625
  rate_limit 5000
670
626
  sample_rate 1.0
671
627
  retention 7.years
672
- adapters [:loki, :elasticsearch, :s3_archive, :slack_business]
628
+ adapters [:loki, :elasticsearch, :slack_business]
673
629
 
674
630
  metric :counter,
675
631
  name: 'critical_business_events.total',
@@ -1051,9 +1007,7 @@ E11y.configure do |config|
1051
1007
  config.register_adapter :loki, E11y::Adapters::LokiAdapter.new(
1052
1008
  url: ENV['LOKI_URL']
1053
1009
  )
1054
- config.register_adapter :s3_archive, E11y::Adapters::S3Adapter.new(
1055
- bucket: 'payment-archive'
1056
- )
1010
+ # Archival: external jobs filter Loki by retention_until (ISO8601) for tier migration
1057
1011
  config.default_adapters = [:loki]
1058
1012
 
1059
1013
  when 'staging'
@@ -1082,9 +1036,9 @@ module Events
1082
1036
  required(:amount).filled(:decimal)
1083
1037
  end
1084
1038
 
1085
- # Production: also archive to S3
1039
+ # Production: with retention_period 7.years, retention_until is set in the payload; archival jobs filter on it
1086
1040
  if Rails.env.production?
1087
- adapters [:loki, :s3_archive]
1041
+ adapters [:loki]
1088
1042
  end
1089
1043
  # Other envs: use default_adapters
1090
1044
  end
@@ -1095,36 +1049,13 @@ end
1095
1049
 
1096
1050
  ## 📊 Metrics Configuration
1097
1051
 
1098
- ### Pattern-Based Auto-Metrics
1052
+ Define metrics in each event class:
1099
1053
 
1100
1054
  ```ruby
1101
- E11y.configure do |config|
1102
- config.metrics do
1103
- # Global counter for ALL events
1104
- counter_for pattern: '*',
1105
- name: 'business_events.total',
1106
- tags: [:event_name, :severity]
1107
-
1108
- # Domain-specific counters
1109
- counter_for pattern: 'order.*',
1110
- name: 'orders.events.total',
1111
- tags: [:event_name]
1112
-
1113
- counter_for pattern: 'user.*',
1114
- name: 'users.events.total',
1115
- tags: [:event_name]
1116
-
1117
- # Histograms for amounts/durations
1118
- histogram_for pattern: '*.paid',
1119
- name: 'payments.amount',
1120
- value: ->(e) { e.payload[:amount] },
1121
- tags: [:currency],
1122
- buckets: [10, 50, 100, 500, 1000, 5000, 10000]
1123
-
1124
- # Success rate (special metric type)
1125
- success_rate_for pattern: 'payment.*',
1126
- name: 'payments.success_rate'
1127
- # Automatically calculates from :success and :error events
1055
+ class Events::OrderCreated < E11y::Event::Base
1056
+ metrics do
1057
+ counter :orders_created_total, tags: [:currency]
1058
+ histogram :order_amount, value: :amount, tags: [:currency], buckets: [10, 50, 100, 500]
1128
1059
  end
1129
1060
  end
1130
1061
  ```
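To see how a class-level `metrics do` block can turn declarations into per-event updates, here is a minimal plain-Ruby sketch. `SketchEvent`, `MetricsBuilder`, and the `[type, name, labels, amount]` tuples returned by `track` are hypothetical; they do not describe E11y's internals, which register metrics at boot and deliver them through an adapter such as Yabeda.

```ruby
# Hypothetical sketch of an event-class metrics DSL. Not E11y's implementation.
class SketchEvent
  MetricDef = Struct.new(:type, :name, :tags, :value, :buckets, keyword_init: true)

  # Collects counter/histogram/gauge declarations made inside `metrics do ... end`.
  class MetricsBuilder
    def initialize(definitions)
      @definitions = definitions
    end

    def counter(name, tags: [])
      @definitions << MetricDef.new(type: :counter, name: name, tags: tags)
    end

    def histogram(name, value:, tags: [], buckets: nil)
      @definitions << MetricDef.new(type: :histogram, name: name, tags: tags, value: value, buckets: buckets)
    end

    def gauge(name, value:, tags: [])
      @definitions << MetricDef.new(type: :gauge, name: name, tags: tags, value: value)
    end
  end

  class << self
    def metric_definitions
      @metric_definitions ||= []
    end

    def metrics(&block)
      MetricsBuilder.new(metric_definitions).instance_eval(&block)
    end

    # Returns [type, name, labels, amount] per declared metric instead of sending anything.
    def track(**payload)
      metric_definitions.map do |m|
        labels = payload.slice(*m.tags)                           # tags are read from the payload
        amount = m.type == :counter ? 1 : payload.fetch(m.value) # value: names a payload field
        [m.type, m.name, labels, amount]
      end
    end
  end
end

class OrderCreatedSketch < SketchEvent
  metrics do
    counter :orders_created_total, tags: [:currency]
    histogram :order_amount, value: :amount, tags: [:currency], buckets: [10, 50, 100, 500]
  end
end

OrderCreatedSketch.track(amount: 42.0, currency: "USD")
# → [[:counter, :orders_created_total, { currency: "USD" }, 1],
#    [:histogram, :order_amount, { currency: "USD" }, 42.0]]
```

Keeping definitions on the class (rather than in a global config block) means each event carries its own metrics, which is the shift this diff makes away from pattern-based `counter_for`/`histogram_for`.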
@@ -1542,7 +1473,7 @@ E11y is designed for **high-performance production environments** with strict SL
1542
1473
  ```ruby
1543
1474
  # Benchmark: 1000 events/sec
1544
1475
  Benchmark.ips do |x|
1545
- x.report("E11y.track") do
1476
+ x.report("EventClass.track") do
1546
1477
  Events::OrderPaid.track(
1547
1478
  order_id: 'ORD-123',
1548
1479
  amount: 99.99
@@ -1551,7 +1482,7 @@ Benchmark.ips do |x|
1551
1482
  end
1552
1483
 
1553
1484
  # Results:
1554
- # E11y.track: 100,000 i/s → ~0.01ms per call
1485
+ # EventClass.track: 100,000 i/s → ~0.01ms per call
1555
1486
  # p99 latency: <1ms ✅
1556
1487
  ```
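`Benchmark.ips` (from the benchmark-ips gem) reports throughput, so 100,000 i/s corresponds to 1000 ms / 100,000 = 0.01 ms per call on average. A p99 figure needs per-call samples instead. A minimal sketch using only Ruby's monotonic clock; the block you pass (for example `Events::OrderPaid.track(...)` in a booted app) is up to you, the helper names are hypothetical, and the nearest-rank percentile index is a simplification.

```ruby
# Converts benchmark-ips throughput into mean time per call.
def ms_per_call(iterations_per_second)
  1000.0 / iterations_per_second
end

# Times each call individually and reports selected percentiles in milliseconds.
def latency_percentiles_ms(iterations: 10_000, percentiles: [50, 99])
  samples = Array.new(iterations) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
  end
  samples.sort!

  percentiles.to_h do |p|
    index = ((p / 100.0) * (samples.size - 1)).round # nearest-rank style index
    [p, samples[index]]
  end
end

ms_per_call(100_000) # → 0.01
# latency_percentiles_ms { Events::OrderPaid.track(order_id: "ORD-123", amount: 99.99) }
```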
1557
1488
 
@@ -1918,11 +1849,11 @@ end
1918
1849
  class Events::CriticalPayment < Events::BasePaymentEvent
1919
1850
  include E11y::Presets::HighValueEvent
1920
1851
 
1921
- adapters [:loki, :sentry, :s3_archive] # Override base (add S3)
1852
+ adapters [:loki, :sentry]
1922
1853
 
1923
1854
  # Final config:
1924
1855
  # - severity: :success (from base)
1925
- # - adapters: [:loki, :sentry, :s3_archive] (event-level override)
1856
+ # - adapters: [:loki, :sentry] (event-level override)
1926
1857
  # - sample_rate: 1.0 (from base)
1927
1858
  # - rate_limit: 10_000 (from preset)
1928
1859
  # - retention: 7.years (from preset)
@@ -1943,7 +1874,7 @@ end
1943
1874
  ## 📚 Related Use Cases
1944
1875
 
1945
1876
  - **[UC-001: Request-Scoped Debug Buffering](./UC-001-request-scoped-debug-buffering.md)** - Debug vs business events
1946
- - **[UC-003: Pattern-Based Metrics](./UC-003-pattern-based-metrics.md)** - Auto-generate metrics
1877
+ - **[UC-003: Event Metrics](./UC-003-event-metrics.md)** - Metrics in event classes
1947
1878
  - **[UC-007: PII Filtering](./UC-007-pii-filtering.md)** - Secure event data
1948
1879
 
1949
1880
  ---
@@ -0,0 +1,66 @@
+ # UC-003: Event Metrics
+
+ **Status:** Implemented
+ **Complexity:** Intermediate
+ **Setup Time:** 15-30 minutes
+ **Target Users:** DevOps, SRE, Backend Developers
+
+ ---
+
+ ## Overview
+
+ Define metrics directly in event classes. Metrics are registered at boot and updated automatically whenever an event is tracked.
+
+ ### Event-Level Metrics DSL
+
+ ```ruby
+ class Events::OrderPaid < E11y::Event::Base
+   schema do
+     required(:order_id).filled(:string)
+     required(:amount).filled(:float)
+     required(:currency).filled(:string)
+     required(:payment_method).filled(:string)
+   end
+
+   metrics do
+     counter :orders_paid_total, tags: [:currency, :payment_method]
+     histogram :orders_paid_amount, value: :amount, tags: [:currency], buckets: [10, 50, 100, 500, 1000, 5000]
+   end
+ end
+
+ Events::OrderPaid.track(order_id: '123', amount: 99.99, currency: 'USD', payment_method: 'stripe')
+ # → orders_paid_total{currency="USD",payment_method="stripe"} += 1
+ # → orders_paid_amount_bucket{currency="USD",le="100"} += 1 (and every larger bucket, since buckets are cumulative)
+ ```
+
+ ### Metric Types
+
+ - **counter** — monotonically increasing
+ - **histogram** — distribution (requires `value:` field, optional `buckets:`)
+ - **gauge** — point-in-time value (requires `value:`)
+
+ ### Boot-Time Validation
+
+ At Rails boot, E11y validates metric definitions for label conflicts and type conflicts. Outside Rails, call `E11y::Metrics::Registry.instance.validate_all!` after your event classes are loaded.
+
+ ### Shared Metrics via Inheritance
+
+ ```ruby
+ class BaseOrderEvent < E11y::Event::Base
+   metrics do
+     counter :orders_total, tags: [:currency, :status]
+   end
+ end
+
+ class Events::OrderPaid < BaseOrderEvent
+   metrics do
+     histogram :order_amount, value: :amount, tags: [:currency]
+   end
+ end
+ ```
+
+ ---
+
+ ## Yabeda Integration
+
+ Register the Yabeda adapter in `config.adapters`; metrics then flow to Prometheus through Yabeda.
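Boot-time validation and inheritance can be modeled in the same spirit. The sketch below is hypothetical: `ToyRegistry`, `ToyEvent` and `MetricConflict` are not `E11y::Metrics::Registry`, and the conflict rules are assumptions. It treats a metric name declared with two different types, or with two different tag sets, as a conflict. It also lets a subclass see its parent's definitions alongside its own.

```ruby
# Hypothetical model of boot-time metric validation; not E11y's implementation.
class MetricConflict < StandardError; end

class ToyRegistry
  def initialize
    @by_name = {}
  end

  # Record one definition, e.g. { name: :orders_total, type: :counter, tags: [:currency] }.
  def register(defn, source:)
    (@by_name[defn[:name]] ||= []) << defn.merge(source: source)
  end

  # Raise if any metric name is declared inconsistently across event classes.
  def validate_all!
    @by_name.each do |name, defs|
      types = defs.map { |d| d[:type] }.uniq
      raise MetricConflict, "type conflict for #{name}: #{types.inspect}" if types.size > 1

      tag_sets = defs.map { |d| d[:tags].sort }.uniq
      raise MetricConflict, "label conflict for #{name}: #{tag_sets.inspect}" if tag_sets.size > 1
    end
    true
  end
end

# Inheritance: a subclass sees its parent's metric definitions plus its own.
class ToyEvent
  def self.own_metrics
    @own_metrics ||= []
  end

  def self.metric(defn)
    own_metrics << defn
  end

  def self.all_metrics
    parent = superclass.respond_to?(:all_metrics) ? superclass.all_metrics : []
    parent + own_metrics
  end
end

class BaseOrderEvent < ToyEvent
  metric name: :orders_total, type: :counter, tags: [:currency, :status]
end

class OrderPaid < BaseOrderEvent
  metric name: :order_amount, type: :histogram, tags: [:currency]
end

registry = ToyRegistry.new
[BaseOrderEvent, OrderPaid].each do |klass|
  klass.own_metrics.each { |d| registry.register(d, source: klass.name) }
end
registry.validate_all!
```

Suppose the subclass also declared `counter :orders_total` with only `[:currency]`. The two tag sets for `orders_total` would then differ, and `validate_all!` would raise a label conflict.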
@@ -122,6 +122,45 @@ yabeda_slo_sidekiq_job_duration_seconds{class="ProcessOrderJob"}

  ---

+ ### Event-Level SLO (Business Logic)
+
+ **Event-driven SLO** tracks business-logic reliability (e.g., orders created vs. failed, payments processed vs. rejected). Opt in per event class with `slo { enabled true }`.
+
+ > **Implementation:** See [ADR-014 Event-Driven SLO](../architecture/ADR-014-event-driven-slo.md) for the full architecture.
+
+ **Enable Event SLO in an event class:**
+
+ ```ruby
+ module Events
+   class OrderCreated < E11y::Event::Base
+     schema do
+       required(:order_id).filled(:string)
+       optional(:status).filled(:string)
+     end
+
+     slo do
+       enabled true
+       contributes_to "order_creation_success_rate"
+       slo_status_from do |payload|
+         case payload[:status].to_s
+         when "failed", "cancelled" then "failure"
+         when "pending", "completed" then "success"
+         else "success"
+         end
+       end
+     end
+   end
+ end
+ ```
+
+ **Required when `slo { enabled true }`:**
+ - `contributes_to "slo_name"` — which custom SLO this event feeds
+ - `slo_status_from { |payload| ... }` — returns `"success"`, `"failure"`, or `nil` (not counted)
+
+ **EventSlo middleware** (part of the default pipeline) emits `e11y_slo_event_result_total{slo_name, slo_status}` for every event with SLO enabled.
+
+ ---
+
  ## 📐 Sampling Correction for Accurate SLO (C11 Resolution) ⚠️ CRITICAL

  **Reference:** [ADR-009 Section 3.7: Stratified Sampling for SLO Accuracy (C11 Resolution)](../ADR-009-cost-optimization.md#37-stratified-sampling-for-slo-accuracy-c11-resolution) and [CONFLICT-ANALYSIS.md C11](../researches/CONFLICT-ANALYSIS.md#c11-adaptive-sampling--slo-tracking)
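The `slo_status_from` block maps each payload to `"success"`, `"failure"` or `nil`, and the middleware counts results per `slo_name` and `slo_status`. The plain-Ruby sketch below shows that flow under stated assumptions. `classify` mirrors the `Events::OrderCreated` example. The in-memory `results` hash is a hypothetical stand-in for `e11y_slo_event_result_total`. The ratio is simply success / (success + failure), with no sampling correction applied.

```ruby
# Hypothetical sketch of event-driven SLO counting; not E11y's EventSlo middleware.

# Mirrors the slo_status_from block of Events::OrderCreated.
def classify(payload)
  case payload[:status].to_s
  when "failed", "cancelled" then "failure"
  when "pending", "completed" then "success"
  else "success"
  end
end

# Stand-in for e11y_slo_event_result_total{slo_name, slo_status}.
results = Hash.new(0)
slo_name = "order_creation_success_rate"

payloads = [
  { order_id: "1", status: "completed" },
  { order_id: "2", status: "pending" },
  { order_id: "3", status: "failed" },
  { order_id: "4" } # no status: falls through to "success"
]

payloads.each do |payload|
  status = classify(payload)
  next if status.nil? # nil means "not counted" (this classifier never returns nil)

  results[[slo_name, status]] += 1
end

good = results[[slo_name, "success"]]
bad = results[[slo_name, "failure"]]
success_rate = good.fdiv(good + bad)
```

Assuming the counter is exported under that name, the equivalent PromQL ratio is `sum(rate(e11y_slo_event_result_total{slo_name="order_creation_success_rate",slo_status="success"}[5m])) / sum(rate(e11y_slo_event_result_total{slo_name="order_creation_success_rate"}[5m]))`.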
@@ -499,107 +538,9 @@ resource "grafana_dashboard" "e11y_slo" {

  ---

- ## 🚨 Auto-Generated Alerts
-
- > **Implementation:** See [ADR-003 Section 5: Multi-Window Multi-Burn Rate Alerts](../ADR-003-slo-observability.md#5-multi-window-multi-burn-rate-alerts) for Google SRE best practice alert architecture.
+ ## 🚨 PromQL & Alert Rules

- ### Generate Prometheus Alerts
-
- ```bash
- rails g e11y:prometheus_alerts
-
- # Output: config/prometheus/e11y_slo_alerts.yml
- ```
-
- **Alerts include:**
- - High error rate (>1%)
- - Low availability (<99.9%)
- - High latency (p95 >200ms)
- - Job failure rate (>5%)
-
- **Example alerts.yml:**
- ```yaml
- groups:
-   - name: e11y_slo
-     rules:
-       - alert: HighErrorRate
-         expr: |
-           (
-             sum(rate(yabeda_slo_http_requests_total{status=~"5.."}[5m])) /
-             sum(rate(yabeda_slo_http_requests_total[5m]))
-           ) > 0.01
-         for: 5m
-         annotations:
-           summary: "HTTP error rate >1%"
-
-       - alert: HighLatency
-         expr: histogram_quantile(0.95, rate(yabeda_slo_http_request_duration_seconds_bucket[5m])) > 0.2
-         for: 5m
-         annotations:
-           summary: "HTTP p95 latency >200ms"
- ```
-
- ---
-
- ## 🎯 Error Budget Management
-
- > **Implementation:** See [ADR-003 Section 7: Error Budget Management](../ADR-003-slo-observability.md#7-error-budget-management) for detailed architecture and deployment gates.
-
- **Track your SLO error budget in real-time:**
-
- ```ruby
- # Query error budget for any endpoint
- budget = E11y::SLO::ErrorBudget.new('OrdersController', 'create', slo_config)
-
- budget.total # => 0.001 (0.1% for 99.9% target)
- budget.consumed # => 0.0005 (50% of budget used)
- budget.remaining # => 0.0005 (50% of budget left)
- budget.percent_consumed # => 50.0
- budget.exhausted? # => false
- budget.time_until_exhaustion # => 14.5 days (at current burn rate)
- ```
-
- ### Deployment Gate (Optional)
-
- **Prevent deployments when error budget is exhausted:**
-
- ```ruby
- # config/initializers/e11y.rb
- E11y.configure do |config|
-   config.slo do
-     error_budget do
-       # Block deployments if <20% budget remaining
-       deployment_gate enabled: true, minimum_budget_percent: 20
-     end
-   end
- end
- ```
-
- **CI/CD integration:**
-
- ```bash
- # Before deployment, check error budget
- rails e11y:slo:check_budget
-
- # Exit code 0: ✅ Budget available, deploy
- # Exit code 1: ❌ Budget exhausted, block deploy
- ```
-
- **Example output:**
-
- ```
- Checking SLO Error Budget...
-
- OrdersController#create:
- ✅ Budget: 75% remaining (Target: 99.9%, Actual: 99.925%)
-
- PaymentsController#process:
- ❌ Budget: 5% remaining (Target: 99.95%, Actual: 99.902%)
- ⚠️ DEPLOYMENT BLOCKED: Error budget below 20% threshold
-
- Overall: ❌ FAILED
- Cannot deploy: 1 endpoint(s) below minimum error budget
- ```
+ > See [SLO-PROMQL-ALERTS.md](../SLO-PROMQL-ALERTS.md) for PromQL queries and Prometheus alert rules.

  ---

@@ -720,7 +661,7 @@ end
  ## 📚 Related Use Cases

  - **[UC-002: Business Event Tracking](./UC-002-business-event-tracking.md)** - Events vs SLO metrics
- - **[UC-003: Pattern-Based Metrics](./UC-003-pattern-based-metrics.md)** - Custom metrics
+ - **[UC-003: Event Metrics](./UC-003-event-metrics.md)** - Custom metrics

  ---