e11y 0.2.0 → 1.0.0
- checksums.yaml +4 -4
- data/.rubocop.yml +130 -10
- data/CHANGELOG.md +56 -1
- data/CLAUDE.md +168 -0
- data/CONTRIBUTING.md +640 -0
- data/README.md +134 -702
- data/RELEASE.md +18 -3
- data/Rakefile +108 -29
- data/config/README.md +1 -1
- data/config/loki-local-config.yaml +12 -0
- data/config/otel-collector-config.yaml +44 -0
- data/cucumber.yml +1 -0
- data/docker-compose.yml +18 -2
- data/docs/ADAPTERS.md +76 -0
- data/docs/ADAPTIVE_SAMPLING.md +59 -0
- data/docs/COMPARISON.md +104 -0
- data/docs/CONFIGURATION.md +52 -0
- data/docs/DISTRIBUTED_TRACING.md +44 -0
- data/docs/LIMITATIONS.md +13 -0
- data/docs/METRICS_DSL.md +84 -0
- data/docs/PERFORMANCE.md +60 -0
- data/docs/PII_FILTERING.md +40 -0
- data/docs/PRESETS.md +65 -0
- data/docs/QUICK-START.md +546 -587
- data/docs/RAILS_INTEGRATION.md +29 -0
- data/docs/SCHEMA_VALIDATION.md +63 -0
- data/docs/SLO-PROMQL-ALERTS.md +161 -0
- data/docs/TESTING.md +69 -0
- data/docs/{ADR-001-architecture.md → architecture/ADR-001-architecture.md} +35 -64
- data/docs/{ADR-002-metrics-yabeda.md → architecture/ADR-002-metrics-yabeda.md} +62 -236
- data/docs/{ADR-003-slo-observability.md → architecture/ADR-003-slo-observability.md} +27 -466
- data/docs/{ADR-004-adapter-architecture.md → architecture/ADR-004-adapter-architecture.md} +163 -146
- data/docs/{ADR-005-tracing-context.md → architecture/ADR-005-tracing-context.md} +10 -9
- data/docs/{ADR-006-security-compliance.md → architecture/ADR-006-security-compliance.md} +184 -191
- data/docs/{ADR-007-opentelemetry-integration.md → architecture/ADR-007-opentelemetry-integration.md} +3 -21
- data/docs/{ADR-008-rails-integration.md → architecture/ADR-008-rails-integration.md} +209 -339
- data/docs/{ADR-009-cost-optimization.md → architecture/ADR-009-cost-optimization.md} +45 -54
- data/docs/architecture/ADR-010-developer-experience.md +522 -0
- data/docs/{ADR-011-testing-strategy.md → architecture/ADR-011-testing-strategy.md} +41 -83
- data/docs/{ADR-013-reliability-error-handling.md → architecture/ADR-013-reliability-error-handling.md} +37 -12
- data/docs/{ADR-014-event-driven-slo.md → architecture/ADR-014-event-driven-slo.md} +12 -24
- data/docs/{ADR-015-middleware-order.md → architecture/ADR-015-middleware-order.md} +23 -41
- data/docs/{ADR-016-self-monitoring-slo.md → architecture/ADR-016-self-monitoring-slo.md} +52 -349
- data/docs/{ADR-017-multi-rails-compatibility.md → architecture/ADR-017-multi-rails-compatibility.md} +4 -11
- data/docs/architecture/ADR-018-memory-optimization.md +366 -0
- data/docs/{ADR-INDEX.md → architecture/ADR-INDEX.md} +11 -6
- data/docs/{00-ICP-AND-TIMELINE.md → prd/00-ICP-AND-TIMELINE.md} +6 -6
- data/docs/{01-SCALE-REQUIREMENTS.md → prd/01-SCALE-REQUIREMENTS.md} +6 -6
- data/docs/prd/01-overview-vision.md +19 -14
- data/docs/use_cases/README.md +22 -23
- data/docs/use_cases/UC-001-request-scoped-debug-buffering.md +50 -44
- data/docs/use_cases/UC-002-business-event-tracking.md +26 -95
- data/docs/use_cases/UC-003-event-metrics.md +66 -0
- data/docs/use_cases/UC-004-zero-config-slo-tracking.md +42 -101
- data/docs/use_cases/UC-005-sentry-integration.md +13 -15
- data/docs/use_cases/UC-006-trace-context-management.md +30 -28
- data/docs/use_cases/UC-007-pii-filtering.md +35 -87
- data/docs/use_cases/UC-008-opentelemetry-integration.md +51 -89
- data/docs/use_cases/UC-009-multi-service-tracing.md +4 -4
- data/docs/use_cases/UC-010-background-job-tracking.md +5 -5
- data/docs/use_cases/UC-011-rate-limiting.md +95 -168
- data/docs/use_cases/UC-012-audit-trail.md +21 -46
- data/docs/use_cases/UC-013-high-cardinality-protection.md +29 -167
- data/docs/use_cases/UC-014-adaptive-sampling.md +2 -2
- data/docs/use_cases/UC-015-cost-optimization.md +46 -99
- data/docs/use_cases/UC-016-rails-logger-migration.md +39 -213
- data/docs/use_cases/UC-017-local-development.md +203 -777
- data/docs/use_cases/UC-018-testing-events.md +3 -3
- data/docs/use_cases/UC-019-retention-based-routing.md +53 -106
- data/docs/use_cases/UC-020-event-versioning.md +8 -9
- data/docs/use_cases/UC-021-error-handling-retry-dlq.md +18 -22
- data/docs/use_cases/UC-022-event-registry.md +15 -21
- data/docs/use_cases/backlog.md +119 -87
- data/e11y.gemspec +2 -2
- data/gems/e11y-devtools/README.md +136 -0
- data/gems/e11y-devtools/config/routes.rb +8 -0
- data/gems/e11y-devtools/e11y-devtools.gemspec +25 -0
- data/gems/e11y-devtools/exe/e11y +34 -0
- data/gems/e11y-devtools/lib/e11y/devtools/mcp/server.rb +96 -0
- data/gems/e11y-devtools/lib/e11y/devtools/mcp/tool_base.rb +25 -0
- data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/clear.rb +31 -0
- data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/errors.rb +35 -0
- data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/event_detail.rb +33 -0
- data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/events_by_trace.rb +33 -0
- data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/interactions.rb +40 -0
- data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/recent_events.rb +34 -0
- data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/search.rb +34 -0
- data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/stats.rb +30 -0
- data/gems/e11y-devtools/lib/e11y/devtools/overlay/assets/overlay.js +115 -0
- data/gems/e11y-devtools/lib/e11y/devtools/overlay/controller.rb +54 -0
- data/gems/e11y-devtools/lib/e11y/devtools/overlay/engine.rb +26 -0
- data/gems/e11y-devtools/lib/e11y/devtools/overlay/middleware.rb +80 -0
- data/gems/e11y-devtools/lib/e11y/devtools/overlay/rails_controller.rb +42 -0
- data/gems/e11y-devtools/lib/e11y/devtools/tui/app.rb +262 -0
- data/gems/e11y-devtools/lib/e11y/devtools/tui/grouping.rb +66 -0
- data/gems/e11y-devtools/lib/e11y/devtools/tui/widgets/event_detail.rb +62 -0
- data/gems/e11y-devtools/lib/e11y/devtools/tui/widgets/event_list.rb +70 -0
- data/gems/e11y-devtools/lib/e11y/devtools/tui/widgets/interaction_list.rb +47 -0
- data/gems/e11y-devtools/lib/e11y/devtools/version.rb +8 -0
- data/gems/e11y-devtools/lib/e11y/devtools.rb +13 -0
- data/gems/e11y-devtools/spec/e11y/devtools/mcp/tools_spec.rb +107 -0
- data/gems/e11y-devtools/spec/e11y/devtools/overlay/controller_spec.rb +58 -0
- data/gems/e11y-devtools/spec/e11y/devtools/overlay/middleware_spec.rb +46 -0
- data/gems/e11y-devtools/spec/e11y/devtools/tui/app_spec.rb +85 -0
- data/gems/e11y-devtools/spec/e11y/devtools/tui/grouping_spec.rb +64 -0
- data/gems/e11y-devtools/spec/spec_helper.rb +5 -0
- data/gems/e11y-devtools/spec/tui/widgets/event_list_spec.rb +44 -0
- data/gems/e11y-devtools/spec/tui/widgets/interaction_list_spec.rb +62 -0
- data/lib/e11y/adapters/audit_encrypted.rb +53 -11
- data/lib/e11y/adapters/base.rb +33 -34
- data/lib/e11y/adapters/dev_log/file_store.rb +143 -0
- data/lib/e11y/adapters/dev_log/query.rb +219 -0
- data/lib/e11y/adapters/dev_log.rb +118 -0
- data/lib/e11y/adapters/file.rb +3 -6
- data/lib/e11y/adapters/in_memory.rb +52 -5
- data/lib/e11y/adapters/in_memory_test.rb +29 -0
- data/lib/e11y/adapters/loki.rb +58 -23
- data/lib/e11y/adapters/null.rb +82 -0
- data/lib/e11y/adapters/opentelemetry_collector.rb +183 -0
- data/lib/e11y/adapters/otel_logs.rb +136 -23
- data/lib/e11y/adapters/sentry.rb +4 -7
- data/lib/e11y/adapters/stdout.rb +73 -7
- data/lib/e11y/adapters/yabeda.rb +153 -29
- data/lib/e11y/buffers/adaptive_buffer.rb +3 -17
- data/lib/e11y/buffers/{request_scoped_buffer.rb → ephemeral_buffer.rb} +72 -58
- data/lib/e11y/buffers/ring_buffer.rb +3 -16
- data/lib/e11y/configuration.rb +272 -0
- data/lib/e11y/console.rb +10 -17
- data/lib/e11y/current.rb +53 -1
- data/lib/e11y/debug/pipeline_inspector.rb +96 -0
- data/lib/e11y/documentation/generator.rb +48 -0
- data/lib/e11y/event/base.rb +176 -82
- data/lib/e11y/event/value_sampling_config.rb +1 -5
- data/lib/e11y/events/rails/database/query.rb +1 -4
- data/lib/e11y/events/rails/job/failed.rb +2 -0
- data/lib/e11y/instruments/active_job.rb +46 -12
- data/lib/e11y/instruments/rails_instrumentation.rb +49 -24
- data/lib/e11y/instruments/sidekiq.rb +137 -31
- data/lib/e11y/linters/base.rb +11 -0
- data/lib/e11y/linters/pii/pii_declaration_linter.rb +120 -0
- data/lib/e11y/linters/slo/config_consistency_linter.rb +76 -0
- data/lib/e11y/linters/slo/explicit_declaration_linter.rb +36 -0
- data/lib/e11y/linters/slo/slo_status_from_linter.rb +41 -0
- data/lib/e11y/logger/bridge.rb +26 -7
- data/lib/e11y/metrics/cardinality_protection.rb +10 -15
- data/lib/e11y/metrics/cardinality_tracker.rb +16 -6
- data/lib/e11y/metrics/registry.rb +3 -5
- data/lib/e11y/metrics/test_backend.rb +62 -0
- data/lib/e11y/metrics.rb +56 -10
- data/lib/e11y/middleware/adapter_resolver.rb +40 -0
- data/lib/e11y/middleware/audit_signing.rb +43 -6
- data/lib/e11y/middleware/baggage_protection.rb +75 -0
- data/lib/e11y/middleware/dev_log_source.rb +24 -0
- data/lib/e11y/middleware/event_slo.rb +23 -9
- data/lib/e11y/middleware/otel_span.rb +23 -0
- data/lib/e11y/middleware/pii_filter.rb +104 -75
- data/lib/e11y/middleware/rate_limiting.rb +54 -27
- data/lib/e11y/middleware/request.rb +70 -23
- data/lib/e11y/middleware/routing.rb +78 -21
- data/lib/e11y/middleware/sampling.rb +66 -17
- data/lib/e11y/middleware/self_monitoring_emit.rb +39 -0
- data/lib/e11y/middleware/trace_context.rb +45 -10
- data/lib/e11y/middleware/track_latency.rb +34 -0
- data/lib/e11y/middleware/validation.rb +7 -16
- data/lib/e11y/middleware/versioning.rb +26 -22
- data/lib/e11y/opentelemetry/semantic_conventions.rb +109 -0
- data/lib/e11y/opentelemetry/span_creator.rb +142 -0
- data/lib/e11y/pii/patterns.rb +12 -1
- data/lib/e11y/pipeline/builder.rb +1 -1
- data/lib/e11y/presets/audit_event.rb +13 -2
- data/lib/e11y/railtie.rb +52 -15
- data/lib/e11y/registry.rb +306 -0
- data/lib/e11y/reliability/circuit_breaker.rb +19 -21
- data/lib/e11y/reliability/dlq/base.rb +71 -0
- data/lib/e11y/reliability/dlq/file_adapter.rb +301 -0
- data/lib/e11y/reliability/dlq/file_storage.rb +63 -34
- data/lib/e11y/reliability/dlq/filter.rb +37 -54
- data/lib/e11y/reliability/retry_handler.rb +26 -29
- data/lib/e11y/reliability/retry_rate_limiter.rb +3 -11
- data/lib/e11y/sampling/error_spike_detector.rb +0 -2
- data/lib/e11y/sampling/load_monitor.rb +5 -9
- data/lib/e11y/sampling/stratified_tracker.rb +18 -0
- data/lib/e11y/self_monitoring/buffer_monitor.rb +2 -0
- data/lib/e11y/self_monitoring/performance_monitor.rb +19 -61
- data/lib/e11y/self_monitoring/reliability_monitor.rb +4 -74
- data/lib/e11y/slo/config_loader.rb +40 -0
- data/lib/e11y/slo/config_validator.rb +58 -0
- data/lib/e11y/slo/dashboard_generator.rb +122 -0
- data/lib/e11y/slo/event_driven.rb +8 -0
- data/lib/e11y/slo/tracker.rb +31 -4
- data/lib/e11y/testing/have_tracked_event_matcher.rb +190 -0
- data/lib/e11y/testing/rspec_matchers.rb +21 -0
- data/lib/e11y/testing/snapshot_matcher.rb +86 -0
- data/lib/e11y/trace_context/sampler.rb +35 -0
- data/lib/e11y/tracing/faraday_middleware.rb +31 -0
- data/lib/e11y/tracing/net_http_patch.rb +33 -0
- data/lib/e11y/tracing/propagator.rb +116 -0
- data/lib/e11y/tracing.rb +47 -0
- data/lib/e11y/version.rb +1 -1
- data/lib/e11y/versioning/version_extractor.rb +32 -0
- data/lib/e11y.rb +141 -265
- data/lib/generators/e11y/event/event_generator.rb +22 -0
- data/lib/generators/e11y/event/templates/event.rb.tt +16 -0
- data/lib/generators/e11y/grafana_dashboard/grafana_dashboard_generator.rb +30 -0
- data/lib/generators/e11y/grafana_dashboard/templates/e11y_dashboard.json +81 -0
- data/lib/generators/e11y/install/install_generator.rb +34 -0
- data/lib/generators/e11y/install/templates/e11y.rb +239 -0
- data/lib/generators/e11y/prometheus_alerts/prometheus_alerts_generator.rb +29 -0
- data/lib/generators/e11y/prometheus_alerts/templates/e11y_alerts.yml +28 -0
- data/lib/tasks/e11y_docs.rake +30 -0
- data/lib/tasks/e11y_events.rake +71 -0
- data/lib/tasks/e11y_lint.rake +91 -0
- data/lib/tasks/e11y_slo.rake +29 -0
- metadata +129 -39
- data/docs/ADR-010-developer-experience.md +0 -2166
- data/docs/API-REFERENCE-L28.md +0 -914
- data/docs/COMPREHENSIVE-CONFIGURATION.md +0 -2366
- data/docs/CONTRIBUTING.md +0 -312
- data/docs/IMPLEMENTATION_NOTES.md +0 -2804
- data/docs/IMPLEMENTATION_PLAN.md +0 -1971
- data/docs/IMPLEMENTATION_PLAN_ARCHITECTURE.md +0 -586
- data/docs/PLAN.md +0 -148
- data/docs/README.md +0 -296
- data/docs/design/00-memory-optimization.md +0 -593
- data/docs/guides/MIGRATION-L27-L28.md +0 -692
- data/docs/guides/PERFORMANCE-BENCHMARKS.md +0 -434
- data/docs/guides/README.md +0 -44
- data/docs/use_cases/UC-003-pattern-based-metrics.md +0 -1627
- data/lib/e11y/adapters/registry.rb +0 -141
- data/docs/{ADR-012-event-evolution.md → architecture/ADR-012-event-evolution.md} +0 -0
@@ -234,7 +234,7 @@ end
 > config.pipeline.use RoutingMiddleware # 6. Buffer routing (LAST!)
 > ```
 >
-> **See:** [ADR-001 Section 4.1: Middleware Execution Order](../ADR-001-architecture.md#41-middleware-execution-order-critical) and [ADR-015: Middleware Order Reference](../ADR-015-middleware-order.md) for detailed explanation.
+> **See:** [ADR-001 Section 4.1: Middleware Execution Order](../architecture/ADR-001-architecture.md#41-middleware-execution-order-critical) and [ADR-015: Middleware Order Reference](../architecture/ADR-015-middleware-order.md) for detailed explanation.
 
 ---
 
@@ -275,7 +275,7 @@ end
 
 ### Dual-Buffer Architecture
 
-**E11y
+**E11y uses TWO independent buffers:**
 
 ```
 ┌─────────────────────────────────────────────────────────────────┐
@@ -318,15 +318,15 @@ Background Flush Thread (200ms interval):
 ### Buffer Routing Logic
 
 ```ruby
-# Pseudo-code
+# Pseudo-code for understanding
 def track_event(event)
 if event.severity == :debug && E11y.request_scope.active?
 # → Request-scoped buffer (Thread-local)
-Thread.current[:
+Thread.current[:e11y_ephemeral_buffer] << event
 else
 # → Main buffer (Global SPSC ring buffer)
 E11y.main_buffer << event
-#
+# Background thread will pick up in 200ms (or sooner if batch fills)
 end
 end
 ```
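The routing pseudo-code in the hunk above can be turned into a small runnable sketch. This is an illustration under stated assumptions, not e11y's implementation: `Event` and `DualBufferRouter` are hypothetical names, a plain `Queue` stands in for the global SPSC ring buffer, and only the thread-local key `:e11y_ephemeral_buffer` is taken from the diff.

```ruby
# Hypothetical stand-ins for illustration; not part of the e11y API.
Event = Struct.new(:name, :severity)

class DualBufferRouter
  BUFFER_KEY = :e11y_ephemeral_buffer # key name as it appears in the diff

  attr_reader :main_buffer

  def initialize
    @main_buffer = Queue.new # thread-safe stand-in for the global ring buffer
  end

  # Runs the block with a request-scoped debug buffer on the current thread.
  # Buffered debug events reach the main buffer only if the block raises.
  def with_request_scope
    Thread.current[BUFFER_KEY] = []
    yield
  rescue StandardError
    Thread.current[BUFFER_KEY].each { |event| @main_buffer << event }
    raise
  ensure
    Thread.current[BUFFER_KEY] = nil # on success the debug events are dropped
  end

  # Debug events go to the request buffer when a scope is active;
  # everything else goes straight to the main buffer.
  def track(event)
    buffer = Thread.current[BUFFER_KEY]
    if event.severity == :debug && buffer
      buffer << event
    else
      @main_buffer << event
    end
  end
end
```

A successful `with_request_scope` block leaves only the non-debug events in `main_buffer`; a block that raises also moves its buffered debug events there before the error propagates.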
@@ -401,11 +401,17 @@ module E11y::RequestScope
 end
 ```
 
+> **DevLog integration (development/test):** When the debug buffer is flushed on request
+> failure, events are delivered to all registered adapters — including
+> `E11y::Adapters::DevLog` (auto-registered in development/test via Railtie). Debug
+> events from failed requests automatically appear in `log/e11y_dev.jsonl` and become
+> visible in the TUI and Browser Overlay. See [UC-017](UC-017-local-development.md).
+
 ---
 
 ## 📈 Performance Impact
 
-> **Implementation:** See [ADR-001 Section 8.3: Resource Limits](../ADR-001-architecture.md#83-resource-limits) for architectural details and [ADR-002 Section 6: Self-Monitoring](../ADR-002-metrics-yabeda.md#6-self-monitoring) for metrics implementation.
+> **Implementation:** See [ADR-001 Section 8.3: Resource Limits](../architecture/ADR-001-architecture.md#83-resource-limits) for architectural details and [ADR-002 Section 6: Self-Monitoring](../ADR-002-metrics-yabeda.md#6-self-monitoring) for metrics implementation.
 
 ### Buffer Metrics
 
@@ -413,20 +419,20 @@ end
 
 ```ruby
 # Exposed via Yabeda (auto-configured)
-Yabeda.
-Yabeda.
+Yabeda.e11y_ephemeral_buffer_size # Gauge: current buffer size per request
+Yabeda.e11y_ephemeral_buffer_flushes_total # Counter: buffer flushes by trigger
 
 # Accessible via Prometheus metrics endpoint
 # Example queries:
 
 # 1. Average buffer size
-avg(
+avg(e11y_ephemeral_buffer_size)
 
 # 2. Buffer flush rate by trigger
-rate(
+rate(e11y_ephemeral_buffer_flushes_total{trigger="error"}[5m])
 
 # 3. Buffer overflow alerts
-
+e11y_ephemeral_buffer_size >= 100 # Alert if buffer limit reached
 ```
 
 **Monitoring Examples:**
@@ -436,18 +442,18 @@ e11y_request_buffer_size >= 100 # Alert if buffer limit reached
 
 # Panel 1: Buffer Size Distribution
 histogram_quantile(0.99,
-sum(rate(
+sum(rate(e11y_ephemeral_buffer_size[5m])) by (le)
 )
 # Shows p99 buffer size
 
 # Panel 2: Flush Triggers Breakdown
 sum by (trigger) (
-rate(
+rate(e11y_ephemeral_buffer_flushes_total[5m])
 )
 # Shows why buffers flush (error vs. slow_request vs. custom)
 
 # Panel 3: Memory Impact Estimate
-avg(
+avg(e11y_ephemeral_buffer_size) * 500 # bytes per event
 # Estimates per-request memory usage
 ```
 
@@ -609,13 +615,13 @@ end
 
 ---
 
-## 🔄
+## 🔄 Interaction with Flush Interval (200ms)
 
-###
+### Question: Do the buffers conflict?
 
-
+**Answer: NO. They are independent.**
 
-###
+### Detailed Logic
 
 ```ruby
 # config/initializers/e11y.rb
@@ -636,9 +642,9 @@ E11y.configure do |config|
 end
 ```
 
-###
+### Event Flow
 
-**Scenario 1:
+**Scenario 1: Normal request (successful)**
 ```ruby
 # Request starts
 Events::DebugEvent.track(...) # → Request buffer (thread-local)
@@ -650,7 +656,7 @@ Events::DebugEvent.track(...) # → Request buffer (thread-local)
 # → Main buffer flushed every 200ms (success event sent)
 ```
 
-**Scenario 2:
+**Scenario 2: Request with error**
 ```ruby
 # Request starts
 Events::DebugEvent.track(...) # → Request buffer
@@ -662,20 +668,20 @@ Events::DebugEvent.track(...) # → Request buffer
 # → Main buffer continues flush every 200ms (error event sent)
 ```
 
-**Scenario 3:
+**Scenario 3: High-load service**
 ```ruby
-# 1000 requests/sec,
-# → 5000 debug events/sec
-# → 99%
-# → 1%
+# 1000 requests/sec, each with 5 debug events
+# → 5000 debug events/sec in request buffers (thread-local)
+# → 99% successful → 4950 debug events/sec DISCARDED
+# → 1% errors → 50 debug events/sec FLUSHED
 #
-#
+# In parallel:
 # → 1000 info/success events/sec → Main buffer
-# → Flush
-# → 200 events per batch (
+# → Flush every 200ms = 5 batches/sec
+# → 200 events per batch (on average)
 ```
 
-###
+### Summary: No Conflict!
 
 | Event Type | Buffer | Flush Trigger | Latency |
 |------------|--------|---------------|---------|
@@ -686,14 +692,14 @@ Events::DebugEvent.track(...) # → Request buffer
 | `:error` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
 | `:fatal` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
 
-
-1. ✅ Debug
-2. ✅
-3. ✅ Debug
-4. ✅ 99% debug
-5. ✅ Thread-safety: request buffer
+**Benefits of dual buffer:**
+1. ✅ Debug events don't clutter main buffer
+2. ✅ Important events (info+) go fast (200ms)
+3. ✅ Debug events go instantly on error (flush triggered)
+4. ✅ 99% of debug events are never processed (discard = zero cost)
+5. ✅ Thread-safety: request buffer isolated in Thread.current
 
-###
+### Visual Diagram
 
 ```
 Time: ──────────────────────────────────────────────────>
@@ -720,14 +726,14 @@ Background Flush Thread: │ │
 200ms 400ms
 ```
 
-###
+### Example with Numbers
 
-
+**Load:**
 - 100 requests/sec
--
+- Each request: 3 debug events + 1 success event
 - Error rate: 1%
 
-
+**What happens:**
 
 | Time | Request Buffer (Thread-local) | Main Buffer (Global) | Flush |
 |------|------------------------------|---------------------|-------|
@@ -739,8 +745,8 @@ Background Flush Thread: │ │
 | 210ms | Req21: [D, D, D] ERROR! | [S21, E21, **D, D, D from Req21**] | **Immediate flush debug** |
 | 400ms | - | [S21...S40] | **Flush next batch** |
 
-
-- Success events: ~100/sec → flush
+**Result:**
+- Success events: ~100/sec → flush every 200ms → latency <200ms ✅
 - Debug events (99%): DISCARDED → zero overhead ✅
 - Debug events (1% errors): flushed IMMEDIATELY with error context ✅
 
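The figures in the "Example with Numbers" and "Scenario 3" hunks follow from simple arithmetic, sketched below. The method name `buffer_load` and its result keys are hypothetical and exist only for this illustration. Following the documentation's own simplification, it counts one info/success event per request in the main buffer and ignores the error events and flushed debug events that also land there.

```ruby
# Illustrative arithmetic only; method and key names are hypothetical.
def buffer_load(requests_per_sec:, debug_per_request:, error_rate:, flush_interval_ms: 200)
  debug_total = requests_per_sec * debug_per_request
  debug_flushed = (debug_total * error_rate).round # debug events from failed requests
  batches_per_sec = 1000 / flush_interval_ms

  {
    debug_events_per_sec: debug_total,
    debug_flushed_per_sec: debug_flushed,
    debug_discarded_per_sec: debug_total - debug_flushed,
    main_batches_per_sec: batches_per_sec,
    # one info/success event per request reaches the main buffer
    main_events_per_batch: requests_per_sec / batches_per_sec
  }
end

p buffer_load(requests_per_sec: 100, debug_per_request: 3, error_rate: 0.01)
p buffer_load(requests_per_sec: 1000, debug_per_request: 5, error_rate: 0.01)
```

For the 100 requests/sec load this gives 300 debug events/sec, of which 297 are discarded and 3 flushed, with 20 main-buffer events per 200ms batch. For the 1000 requests/sec scenario it gives 4950 discarded, 50 flushed, and 200 events per batch, matching the comments in the diff.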
@@ -804,7 +810,7 @@ end
 
 - **[UC-002: Business Event Tracking](./UC-002-business-event-tracking.md)** - Define structured events
 - **[UC-010: Background Job Tracking](./UC-010-background-job-tracking.md)** - Buffering in Sidekiq/ActiveJob
-- **[UC-
+- **[UC-017: Local Development](./UC-017-local-development.md)** - Test buffering locally
 
 ---
 
@@ -42,7 +42,7 @@ Events::OrderPaid.track(
 
 # Result:
 # 1. Structured log in ELK/Loki (JSON)
-# 2.
+# 2. Event metrics (from metrics do block)
 # 3. Trace context (automatic correlation)
 ```
 
@@ -102,22 +102,13 @@ class OrdersController < ApplicationController
 end
 end
 
-# Step 3:
-
-
-
-
-
-
-
-# Histogram: orders.created.amount
-histogram_for pattern: 'order.created',
-name: 'orders.created.amount',
-value: ->(e) { e.payload[:total_amount] },
-tags: [:currency],
-buckets: [10, 50, 100, 500, 1000, 5000]
-end
-end
+# Step 3: Add metrics in event class
+# class Events::OrderCreated < E11y::Event::Base
+# metrics do
+# counter :orders_created_total, tags: [:currency]
+# histogram :orders_created_amount, value: :total_amount, tags: [:currency], buckets: [10, 50, 100, 500]
+# end
+# end
 ```
 
 **Result in Logs (Loki/ELK):**
@@ -231,21 +222,7 @@ class RegistrationsController < ApplicationController
 end
 end
 
-#
-E11y.configure do |config|
-config.metrics do
-# Funnel counter
-counter_for pattern: 'registration.*',
-name: 'registration.funnel.total',
-tags: [:event_name, :source]
-
-# Time to first login
-histogram_for pattern: 'first.login',
-name: 'registration.time_to_first_login_hours',
-value: ->(e) { e.payload[:time_since_registration_hours] },
-buckets: [1, 6, 12, 24, 48, 72, 168] # hours
-end
-end
+# Add metrics do in each event class (Events::RegistrationStarted, Events::EmailVerified, etc.)
 ```
 
 **Funnel Analysis (Grafana/Prometheus):**
@@ -353,28 +330,7 @@ class ProcessPaymentJob < ApplicationJob
 end
 end
 
-#
-E11y.configure do |config|
-config.metrics do
-# Success rate (critical metric!)
-success_rate_for pattern: 'payment.*',
-name: 'payments.success_rate',
-tags: [:payment_method]
-# Auto-calculates: succeeded / (succeeded + failed) * 100
-
-# Payment duration (performance)
-histogram_for pattern: 'payment.succeeded',
-value: ->(e) { e.duration_ms },
-name: 'payments.duration_ms',
-tags: [:payment_method],
-buckets: [100, 250, 500, 1000, 2000, 5000]
-
-# Failed payments by error code (debugging)
-counter_for pattern: 'payment.failed',
-name: 'payments.failed.total',
-tags: [:error_code, :payment_method]
-end
-end
+# Add metrics do in PaymentSucceeded, PaymentFailed event classes
 ```
 
 **Alerts (Prometheus):**
@@ -505,7 +461,7 @@ module Events
 rate_limit 1000
 sample_rate 1.0 # Never sample payments (high-value)
 retention 7.years # Financial records
-adapters [:loki, :sentry
+adapters [:loki, :sentry]
 
 # Common PII filtering
 contains_pii true
@@ -605,7 +561,7 @@ module E11y
 rate_limit 10_000
 sample_rate 1.0 # Never sample
 retention 7.years
-adapters [:loki, :sentry
+adapters [:loki, :sentry]
 end
 end
 
@@ -669,7 +625,7 @@ module Events
 rate_limit 5000
 sample_rate 1.0
 retention 7.years
-adapters [:loki, :elasticsearch, :
+adapters [:loki, :elasticsearch, :slack_business]
 
 metric :counter,
 name: 'critical_business_events.total',
@@ -1051,9 +1007,7 @@ E11y.configure do |config|
 config.register_adapter :loki, E11y::Adapters::LokiAdapter.new(
 url: ENV['LOKI_URL']
 )
-
-bucket: 'payment-archive'
-)
+# Archival: external jobs filter Loki by retention_until (ISO8601) for tier migration
 config.default_adapters = [:loki]
 
 when 'staging'
@@ -1082,9 +1036,9 @@ module Events
 required(:amount).filled(:decimal)
 end
 
-# Production:
+# Production: retention_period 7.years → retention_until in payload; archival jobs filter by it
 if Rails.env.production?
-adapters [:loki
+adapters [:loki]
 end
 # Other envs: use default_adapters
 end
@@ -1095,36 +1049,13 @@ end
 
 ## 📊 Metrics Configuration
 
-
+Define metrics in each event class:
 
 ```ruby
-
-
-
-
-name: 'business_events.total',
-tags: [:event_name, :severity]
-
-# Domain-specific counters
-counter_for pattern: 'order.*',
-name: 'orders.events.total',
-tags: [:event_name]
-
-counter_for pattern: 'user.*',
-name: 'users.events.total',
-tags: [:event_name]
-
-# Histograms for amounts/durations
-histogram_for pattern: '*.paid',
-name: 'payments.amount',
-value: ->(e) { e.payload[:amount] },
-tags: [:currency],
-buckets: [10, 50, 100, 500, 1000, 5000, 10000]
-
-# Success rate (special metric type)
-success_rate_for pattern: 'payment.*',
-name: 'payments.success_rate'
-# Automatically calculates from :success and :error events
+class Events::OrderCreated < E11y::Event::Base
+metrics do
+counter :orders_created_total, tags: [:currency]
+histogram :order_amount, value: :amount, tags: [:currency], buckets: [10, 50, 100, 500]
 end
 end
 ```
@@ -1542,7 +1473,7 @@ E11y is designed for **high-performance production environments** with strict SL
 ```ruby
 # Benchmark: 1000 events/sec
 Benchmark.ips do |x|
-x.report("
+x.report("EventClass.track") do
 Events::OrderPaid.track(
 order_id: 'ORD-123',
 amount: 99.99
@@ -1551,7 +1482,7 @@ Benchmark.ips do |x|
 end
 
 # Results:
-#
+# EventClass.track: 100,000 i/s → ~0.01ms per call
 # p99 latency: <1ms ✅
 ```
 
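The "100,000 i/s → ~0.01ms per call" figure in the hunk above is a direct unit conversion, which the following sketch makes explicit. It does not use the benchmark-ips gem from the documented snippet: `ms_per_call` and `measure_ips` are hypothetical helpers built on the core monotonic clock, and the hash workload is a placeholder rather than a real `track` call.

```ruby
# Convert an iterations-per-second figure into milliseconds per call.
def ms_per_call(iterations_per_sec)
  1000.0 / iterations_per_sec
end

# Rough throughput measurement using only the core monotonic clock.
# Hypothetical helper; not a replacement for benchmark-ips warmup and statistics.
def measure_ips(iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  iterations / elapsed
end

puts ms_per_call(100_000) # the documented figure, expressed in ms per call

ips = measure_ips(100_000) { { order_id: "ORD-123", amount: 99.99 }.to_a } # placeholder workload
puts format("%.0f i/s, %.5f ms per call", ips, ms_per_call(ips))
```

A single-pass measurement like this is noisy; it is only meant to show how an i/s number maps to per-call latency.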
@@ -1918,11 +1849,11 @@ end
 class Events::CriticalPayment < Events::BasePaymentEvent
 include E11y::Presets::HighValueEvent
 
-adapters [:loki, :sentry
+adapters [:loki, :sentry]
 
 # Final config:
 # - severity: :success (from base)
-# - adapters: [:loki, :sentry
+# - adapters: [:loki, :sentry] (event-level override)
 # - sample_rate: 1.0 (from base)
 # - rate_limit: 10_000 (from preset)
 # - retention: 7.years (from preset)
@@ -1943,7 +1874,7 @@ end
 ## 📚 Related Use Cases
 
 - **[UC-001: Request-Scoped Debug Buffering](./UC-001-request-scoped-debug-buffering.md)** - Debug vs business events
-- **[UC-003:
+- **[UC-003: Event Metrics](./UC-003-event-metrics.md)** - Metrics in event classes
 - **[UC-005: PII Filtering](./UC-005-pii-filtering.md)** - Secure event data
 
 ---
@@ -0,0 +1,66 @@
|
|
|
1
|
+
# UC-003: Event Metrics
|
|
2
|
+
|
|
3
|
+
**Status:** Implemented
|
|
4
|
+
**Complexity:** Intermediate
|
|
5
|
+
**Setup Time:** 15-30 minutes
|
|
6
|
+
**Target Users:** DevOps, SRE, Backend Developers
|
|
7
|
+
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
## Overview
|
|
11
|
+
|
|
12
|
+
Define metrics directly in event classes. Metrics are registered at boot and updated automatically when events are tracked.
|
|
13
|
+
|
|
14
|
+
### Event-Level Metrics DSL
|
|
15
|
+
|
|
16
|
+
```ruby
|
|
17
|
+
class Events::OrderPaid < E11y::Event::Base
|
|
18
|
+
schema do
|
|
19
|
+
required(:order_id).filled(:string)
|
|
20
|
+
required(:amount).filled(:float)
|
|
21
|
+
required(:currency).filled(:string)
|
|
22
|
+
required(:payment_method).filled(:string)
|
|
23
|
+
end
|
|
24
|
+
|
|
25
|
+
metrics do
|
|
26
|
+
counter :orders_paid_total, tags: [:currency, :payment_method]
|
|
27
|
+
histogram :orders_paid_amount, value: :amount, tags: [:currency], buckets: [10, 50, 100, 500, 1000, 5000]
|
|
28
|
+
end
|
|
29
|
+
end
|
|
30
|
+
|
|
31
|
+
Events::OrderPaid.track(order_id: '123', amount: 99.99, currency: 'USD', payment_method: 'stripe')
|
|
32
|
+
# → orders_paid_total{currency="USD",payment_method="stripe"} += 1
|
|
33
|
+
# → orders_paid_amount_bucket{currency="USD",le="100"} += 1
|
|
34
|
+
```
|
|
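The `_bucket` comment in the example above shows only the smallest matching bucket. Assuming standard Prometheus histogram semantics, where buckets are cumulative, one observation increments every bucket whose upper bound is at least the observed value, plus `+Inf`. The sketch below illustrates that in plain Ruby; `incremented_buckets` is a hypothetical helper and is not part of E11y or Yabeda.

```ruby
# Hypothetical helper (not an E11y or Yabeda API): returns the `le` labels
# of every cumulative bucket that a single observation increments.
def incremented_buckets(value, buckets)
  hit = buckets.sort.select { |upper| value <= upper }.map(&:to_s)
  hit + ["+Inf"] # the +Inf bucket counts every observation
end

# Same bucket boundaries as the histogram declared in the example above.
BUCKETS = [10, 50, 100, 500, 1000, 5000].freeze

puts incremented_buckets(99.99, BUCKETS).inspect
# prints ["100", "500", "1000", "5000", "+Inf"]
```

This matters when choosing `buckets:`: a value above the largest boundary lands only in `+Inf`, so quantile estimates for it lose resolution.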
35
|
+
|
|
36
|
+
### Metric Types
|
|
37
|
+
|
|
38
|
+
- **counter** — monotonically increasing
|
|
39
|
+
- **histogram** — distribution (requires `value:` field, optional `buckets:`)
|
|
40
|
+
- **gauge** — point-in-time value (requires `value:`)
|
|
41
|
+
|
|
42
|
+
### Boot-Time Validation
|
|
43
|
+
|
|
44
|
+
At Rails boot, E11y validates all registered metrics and checks for label conflicts and type conflicts. In non-Rails applications, call `E11y::Metrics::Registry.instance.validate_all!` after loading your event classes.
|
|
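To make the two conflict kinds concrete, here is a minimal plain-Ruby sketch of such a check. `SketchRegistry`, its `register` signature, and the `Events::OrderRefunded` source name are assumptions made for illustration; this is not the actual `E11y::Metrics::Registry` implementation.

```ruby
# Hypothetical registry, written only to illustrate label and type conflicts.
# It does not reproduce E11y::Metrics::Registry.
class SketchRegistry
  ConflictError = Class.new(StandardError)

  def initialize
    @definitions = []
  end

  def register(type:, name:, tags:, source:)
    @definitions << { type: type, name: name, tags: tags.sort, source: source }
  end

  # Raises if one metric name is declared with different types or tag sets.
  def validate_all!
    @definitions.group_by { |d| d[:name] }.each do |name, defs|
      types = defs.map { |d| d[:type] }.uniq
      raise ConflictError, "#{name}: type conflict #{types.inspect}" if types.size > 1

      tag_sets = defs.map { |d| d[:tags] }.uniq
      raise ConflictError, "#{name}: label conflict #{tag_sets.inspect}" if tag_sets.size > 1
    end
    true
  end
end

registry = SketchRegistry.new
registry.register(type: :counter, name: :orders_total, tags: %i[currency status], source: "BaseOrderEvent")
# Same metric name, different tag set: a label conflict.
registry.register(type: :counter, name: :orders_total, tags: %i[currency], source: "Events::OrderRefunded")

begin
  registry.validate_all!
rescue SketchRegistry::ConflictError => e
  puts e.message
end
```

Failing at boot rather than at first `track` call means a misdeclared metric is caught before any traffic is served.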
45
|
+
|
|
46
|
+
### Shared Metrics via Inheritance
|
|
47
|
+
|
|
48
|
+
```ruby
|
|
49
|
+
class BaseOrderEvent < E11y::Event::Base
|
|
50
|
+
metrics do
|
|
51
|
+
counter :orders_total, tags: [:currency, :status]
|
|
52
|
+
end
|
|
53
|
+
end
|
|
54
|
+
|
|
55
|
+
class Events::OrderPaid < BaseOrderEvent
|
|
56
|
+
metrics do
|
|
57
|
+
histogram :order_amount, value: :amount, tags: [:currency]
|
|
58
|
+
end
|
|
59
|
+
end
|
|
60
|
+
```
|
|
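One way such sharing could work is for each class to keep its own definitions and prepend those of its superclass. The sketch below assumes that merge behaviour; `MetricsBuilder`, `SketchEvent`, and the `*Sketch` classes are hypothetical and do not mirror E11y's internals.

```ruby
# Hypothetical DSL collector: records each metric declaration as a Hash.
class MetricsBuilder
  attr_reader :definitions

  def self.collect(&block)
    builder = new
    builder.instance_eval(&block)
    builder.definitions
  end

  def initialize
    @definitions = []
  end

  def counter(name, tags: [])
    @definitions << { type: :counter, name: name, tags: tags }
  end

  def histogram(name, value:, tags: [], buckets: nil)
    @definitions << { type: :histogram, name: name, value: value, tags: tags, buckets: buckets }
  end
end

# Hypothetical base class: `metrics` with a block declares metrics,
# `metrics` without a block returns inherited plus own definitions.
class SketchEvent
  def self.metrics(&block)
    own_metrics.concat(MetricsBuilder.collect(&block)) if block
    parent = superclass.respond_to?(:metrics) ? superclass.metrics : []
    parent + own_metrics
  end

  def self.own_metrics
    @own_metrics ||= [] # per-class storage, not shared with subclasses
  end
end

class BaseOrderSketch < SketchEvent
  metrics do
    counter :orders_total, tags: %i[currency status]
  end
end

class OrderPaidSketch < BaseOrderSketch
  metrics do
    histogram :order_amount, value: :amount, tags: %i[currency]
  end
end

puts OrderPaidSketch.metrics.map { |m| m[:name] }.inspect
# prints [:orders_total, :order_amount]
```

Under this assumption the subclass updates both the shared counter and its own histogram, while the base class keeps only the counter.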
61
|
+
|
|
62
|
+
---
|
|
63
|
+
|
|
64
|
+
## Yabeda Integration
|
|
65
|
+
|
|
66
|
+
Register the Yabeda adapter in `config.adapters`; metrics then flow to Prometheus via Yabeda.
|
|
@@ -122,6 +122,45 @@ yabeda_slo_sidekiq_job_duration_seconds{class="ProcessOrderJob"}
|
|
|
122
122
|
|
|
123
123
|
---
|
|
124
124
|
|
|
125
|
+
### Event-Level SLO (Business Logic)
|
|
126
|
+
|
|
127
|
+
**Event-driven SLO** tracks the reliability of business logic (e.g., order created vs failed, payment processed vs rejected). Opt in via `slo { enabled true }` in Event classes.
|
|
128
|
+
|
|
129
|
+
> **Implementation:** See [ADR-014 Event-Driven SLO](../architecture/ADR-014-event-driven-slo.md) for full architecture.
|
|
130
|
+
|
|
131
|
+
**Enable Event SLO in Event class:**
|
|
132
|
+
|
|
133
|
+
```ruby
|
|
134
|
+
module Events
|
|
135
|
+
class OrderCreated < E11y::Event::Base
|
|
136
|
+
schema do
|
|
137
|
+
required(:order_id).filled(:string)
|
|
138
|
+
optional(:status).filled(:string)
|
|
139
|
+
end
|
|
140
|
+
|
|
141
|
+
slo do
|
|
142
|
+
enabled true
|
|
143
|
+
contributes_to "order_creation_success_rate"
|
|
144
|
+
slo_status_from do |payload|
|
|
145
|
+
case payload[:status].to_s
|
|
146
|
+
when "failed", "cancelled" then "failure"
|
|
147
|
+
when "pending", "completed" then "success"
|
|
148
|
+
else "success"
|
|
149
|
+
end
|
|
150
|
+
end
|
|
151
|
+
end
|
|
152
|
+
end
|
|
153
|
+
end
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
**Required when `slo { enabled true }`:**
|
|
157
|
+
- `contributes_to "slo_name"` — which custom SLO this event feeds
|
|
158
|
+
- `slo_status_from { |payload| ... }` — compute `"success"`, `"failure"`, or `nil` (not counted)
|
|
159
|
+
|
|
160
|
+
**EventSlo middleware** (in default pipeline) emits `e11y_slo_event_result_total{slo_name, slo_status}` for events with SLO enabled.
|
|
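The counting rule described above (a `"success"` or `"failure"` result increments a labelled counter, `nil` is skipped) can be sketched in plain Ruby. `tally` and `success_rate` are hypothetical helpers used only to show the arithmetic; they are not the EventSlo middleware.

```ruby
# Hypothetical status block, equivalent in effect to the slo_status_from
# example above: failed and cancelled count as failures, the rest as success.
SLO_STATUS = lambda do |payload|
  case payload[:status].to_s
  when "failed", "cancelled" then "failure"
  else "success"
  end
end

# Counts results per {slo_name, slo_status} label set, as a counter would.
def tally(payloads, slo_name, status_from)
  counts = Hash.new(0)
  payloads.each do |payload|
    status = status_from.call(payload)
    next if status.nil? # nil means the event is not counted

    counts[{ slo_name: slo_name, slo_status: status }] += 1
  end
  counts
end

# Success ratio for one SLO, or nil when nothing was counted.
def success_rate(counts, slo_name)
  ok = counts[{ slo_name: slo_name, slo_status: "success" }]
  failed = counts[{ slo_name: slo_name, slo_status: "failure" }]
  total = ok + failed
  total.zero? ? nil : ok.to_f / total
end

payloads = [{ status: "completed" }, { status: "failed" }, { status: "pending" }, {}]
counts = tally(payloads, "order_creation_success_rate", SLO_STATUS)
puts success_rate(counts, "order_creation_success_rate")
# prints 0.75
```

Note that a payload without `:status` falls through to `"success"` here, matching the `else` branch of the original block; return `nil` instead if such events should be excluded from the SLO.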
161
|
+
|
|
162
|
+
---
|
|
163
|
+
|
|
125
164
|
## 📐 Sampling Correction for Accurate SLO (C11 Resolution) ⚠️ CRITICAL
|
|
126
165
|
|
|
127
166
|
**Reference:** [ADR-009 Section 3.7: Stratified Sampling for SLO Accuracy (C11 Resolution)](../ADR-009-cost-optimization.md#37-stratified-sampling-for-slo-accuracy-c11-resolution) and [CONFLICT-ANALYSIS.md C11](../researches/CONFLICT-ANALYSIS.md#c11-adaptive-sampling--slo-tracking)
|
|
@@ -499,107 +538,9 @@ resource "grafana_dashboard" "e11y_slo" {
|
|
|
499
538
|
|
|
500
539
|
---
|
|
501
540
|
|
|
502
|
-
## 🚨
|
|
503
|
-
|
|
504
|
-
> **Implementation:** See [ADR-003 Section 5: Multi-Window Multi-Burn Rate Alerts](../ADR-003-slo-observability.md#5-multi-window-multi-burn-rate-alerts) for Google SRE best practice alert architecture.
|
|
541
|
+
## 🚨 PromQL & Alert Rules
|
|
505
542
|
|
|
506
|
-
|
|
507
|
-
|
|
508
|
-
```bash
|
|
509
|
-
rails g e11y:prometheus_alerts
|
|
510
|
-
|
|
511
|
-
# Output: config/prometheus/e11y_slo_alerts.yml
|
|
512
|
-
```
|
|
513
|
-
|
|
514
|
-
**Alerts include:**
|
|
515
|
-
- High error rate (>1%)
|
|
516
|
-
- Low availability (<99.9%)
|
|
517
|
-
- High latency (p95 >200ms)
|
|
518
|
-
- Job failure rate (>5%)
|
|
519
|
-
|
|
520
|
-
**Example alerts.yml:**
|
|
521
|
-
```yaml
|
|
522
|
-
groups:
|
|
523
|
-
- name: e11y_slo
|
|
524
|
-
rules:
|
|
525
|
-
- alert: HighErrorRate
|
|
526
|
-
expr: |
|
|
527
|
-
(
|
|
528
|
-
sum(rate(yabeda_slo_http_requests_total{status=~"5.."}[5m])) /
|
|
529
|
-
sum(rate(yabeda_slo_http_requests_total[5m]))
|
|
530
|
-
) > 0.01
|
|
531
|
-
for: 5m
|
|
532
|
-
annotations:
|
|
533
|
-
summary: "HTTP error rate >1%"
|
|
534
|
-
|
|
535
|
-
- alert: HighLatency
|
|
536
|
-
expr: histogram_quantile(0.95, rate(yabeda_slo_http_request_duration_seconds_bucket[5m])) > 0.2
|
|
537
|
-
for: 5m
|
|
538
|
-
annotations:
|
|
539
|
-
summary: "HTTP p95 latency >200ms"
|
|
540
|
-
```
|
|
541
|
-
|
|
542
|
-
---
|
|
543
|
-
|
|
544
|
-
## 🎯 Error Budget Management
|
|
545
|
-
|
|
546
|
-
> **Implementation:** See [ADR-003 Section 7: Error Budget Management](../ADR-003-slo-observability.md#7-error-budget-management) for detailed architecture and deployment gates.
|
|
547
|
-
|
|
548
|
-
**Track your SLO error budget in real-time:**
|
|
549
|
-
|
|
550
|
-
```ruby
|
|
551
|
-
# Query error budget for any endpoint
|
|
552
|
-
budget = E11y::SLO::ErrorBudget.new('OrdersController', 'create', slo_config)
|
|
553
|
-
|
|
554
|
-
budget.total # => 0.001 (0.1% for 99.9% target)
|
|
555
|
-
budget.consumed # => 0.0005 (50% of budget used)
|
|
556
|
-
budget.remaining # => 0.0005 (50% of budget left)
|
|
557
|
-
budget.percent_consumed # => 50.0
|
|
558
|
-
budget.exhausted? # => false
|
|
559
|
-
budget.time_until_exhaustion # => 14.5 days (at current burn rate)
|
|
560
|
-
```
|
|
561
|
-
|
|
562
|
-
### Deployment Gate (Optional)
|
|
563
|
-
|
|
564
|
-
**Prevent deployments when error budget is exhausted:**
|
|
565
|
-
|
|
566
|
-
```ruby
|
|
567
|
-
# config/initializers/e11y.rb
|
|
568
|
-
E11y.configure do |config|
|
|
569
|
-
config.slo do
|
|
570
|
-
error_budget do
|
|
571
|
-
# Block deployments if <20% budget remaining
|
|
572
|
-
deployment_gate enabled: true, minimum_budget_percent: 20
|
|
573
|
-
end
|
|
574
|
-
end
|
|
575
|
-
end
|
|
576
|
-
```
|
|
577
|
-
|
|
578
|
-
**CI/CD integration:**
|
|
579
|
-
|
|
580
|
-
```bash
|
|
581
|
-
# Before deployment, check error budget
|
|
582
|
-
rails e11y:slo:check_budget
|
|
583
|
-
|
|
584
|
-
# Exit code 0: ✅ Budget available, deploy
|
|
585
|
-
# Exit code 1: ❌ Budget exhausted, block deploy
|
|
586
|
-
```
|
|
587
|
-
|
|
588
|
-
**Example output:**
|
|
589
|
-
|
|
590
|
-
```
|
|
591
|
-
Checking SLO Error Budget...
|
|
592
|
-
|
|
593
|
-
OrdersController#create:
|
|
594
|
-
✅ Budget: 75% remaining (Target: 99.9%, Actual: 99.925%)
|
|
595
|
-
|
|
596
|
-
PaymentsController#process:
|
|
597
|
-
❌ Budget: 5% remaining (Target: 99.95%, Actual: 99.902%)
|
|
598
|
-
⚠️ DEPLOYMENT BLOCKED: Error budget below 20% threshold
|
|
599
|
-
|
|
600
|
-
Overall: ❌ FAILED
|
|
601
|
-
Cannot deploy: 1 endpoint(s) below minimum error budget
|
|
602
|
-
```
|
|
543
|
+
> See [SLO-PROMQL-ALERTS.md](../SLO-PROMQL-ALERTS.md) for PromQL queries and Prometheus alert rules.
|
|
603
544
|
|
|
604
545
|
---
|
|
605
546
|
|
|
@@ -720,7 +661,7 @@ end
|
|
|
720
661
|
## 📚 Related Use Cases
|
|
721
662
|
|
|
722
663
|
- **[UC-002: Business Event Tracking](./UC-002-business-event-tracking.md)** - Events vs SLO metrics
|
|
723
|
-
- **[UC-003:
|
|
664
|
+
- **[UC-003: Event Metrics](./UC-003-event-metrics.md)** - Custom metrics
|
|
724
665
|
|
|
725
666
|
---
|
|
726
667
|
|