e11y 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.rspec +4 -0
- data/.rubocop.yml +69 -0
- data/CHANGELOG.md +26 -0
- data/CODE_OF_CONDUCT.md +64 -0
- data/LICENSE.txt +21 -0
- data/README.md +179 -0
- data/Rakefile +37 -0
- data/benchmarks/run_all.rb +33 -0
- data/config/README.md +83 -0
- data/config/loki-local-config.yaml +35 -0
- data/config/prometheus.yml +15 -0
- data/docker-compose.yml +78 -0
- data/docs/00-ICP-AND-TIMELINE.md +483 -0
- data/docs/01-SCALE-REQUIREMENTS.md +858 -0
- data/docs/ADR-001-architecture.md +2617 -0
- data/docs/ADR-002-metrics-yabeda.md +1395 -0
- data/docs/ADR-003-slo-observability.md +3337 -0
- data/docs/ADR-004-adapter-architecture.md +2385 -0
- data/docs/ADR-005-tracing-context.md +1372 -0
- data/docs/ADR-006-security-compliance.md +4143 -0
- data/docs/ADR-007-opentelemetry-integration.md +1385 -0
- data/docs/ADR-008-rails-integration.md +1911 -0
- data/docs/ADR-009-cost-optimization.md +2993 -0
- data/docs/ADR-010-developer-experience.md +2166 -0
- data/docs/ADR-011-testing-strategy.md +1836 -0
- data/docs/ADR-012-event-evolution.md +958 -0
- data/docs/ADR-013-reliability-error-handling.md +2750 -0
- data/docs/ADR-014-event-driven-slo.md +1533 -0
- data/docs/ADR-015-middleware-order.md +1061 -0
- data/docs/ADR-016-self-monitoring-slo.md +1234 -0
- data/docs/API-REFERENCE-L28.md +914 -0
- data/docs/COMPREHENSIVE-CONFIGURATION.md +2366 -0
- data/docs/IMPLEMENTATION_NOTES.md +2804 -0
- data/docs/IMPLEMENTATION_PLAN.md +1971 -0
- data/docs/IMPLEMENTATION_PLAN_ARCHITECTURE.md +586 -0
- data/docs/PLAN.md +148 -0
- data/docs/QUICK-START.md +934 -0
- data/docs/README.md +296 -0
- data/docs/design/00-memory-optimization.md +593 -0
- data/docs/guides/MIGRATION-L27-L28.md +692 -0
- data/docs/guides/PERFORMANCE-BENCHMARKS.md +434 -0
- data/docs/guides/README.md +44 -0
- data/docs/prd/01-overview-vision.md +440 -0
- data/docs/use_cases/README.md +119 -0
- data/docs/use_cases/UC-001-request-scoped-debug-buffering.md +813 -0
- data/docs/use_cases/UC-002-business-event-tracking.md +1953 -0
- data/docs/use_cases/UC-003-pattern-based-metrics.md +1627 -0
- data/docs/use_cases/UC-004-zero-config-slo-tracking.md +728 -0
- data/docs/use_cases/UC-005-sentry-integration.md +759 -0
- data/docs/use_cases/UC-006-trace-context-management.md +905 -0
- data/docs/use_cases/UC-007-pii-filtering.md +2648 -0
- data/docs/use_cases/UC-008-opentelemetry-integration.md +1153 -0
- data/docs/use_cases/UC-009-multi-service-tracing.md +1043 -0
- data/docs/use_cases/UC-010-background-job-tracking.md +1018 -0
- data/docs/use_cases/UC-011-rate-limiting.md +1906 -0
- data/docs/use_cases/UC-012-audit-trail.md +2301 -0
- data/docs/use_cases/UC-013-high-cardinality-protection.md +2127 -0
- data/docs/use_cases/UC-014-adaptive-sampling.md +1940 -0
- data/docs/use_cases/UC-015-cost-optimization.md +735 -0
- data/docs/use_cases/UC-016-rails-logger-migration.md +785 -0
- data/docs/use_cases/UC-017-local-development.md +867 -0
- data/docs/use_cases/UC-018-testing-events.md +1081 -0
- data/docs/use_cases/UC-019-tiered-storage-migration.md +562 -0
- data/docs/use_cases/UC-020-event-versioning.md +708 -0
- data/docs/use_cases/UC-021-error-handling-retry-dlq.md +956 -0
- data/docs/use_cases/UC-022-event-registry.md +648 -0
- data/docs/use_cases/backlog.md +226 -0
- data/e11y.gemspec +76 -0
- data/lib/e11y/adapters/adaptive_batcher.rb +207 -0
- data/lib/e11y/adapters/audit_encrypted.rb +239 -0
- data/lib/e11y/adapters/base.rb +580 -0
- data/lib/e11y/adapters/file.rb +224 -0
- data/lib/e11y/adapters/in_memory.rb +216 -0
- data/lib/e11y/adapters/loki.rb +333 -0
- data/lib/e11y/adapters/otel_logs.rb +203 -0
- data/lib/e11y/adapters/registry.rb +141 -0
- data/lib/e11y/adapters/sentry.rb +230 -0
- data/lib/e11y/adapters/stdout.rb +108 -0
- data/lib/e11y/adapters/yabeda.rb +370 -0
- data/lib/e11y/buffers/adaptive_buffer.rb +339 -0
- data/lib/e11y/buffers/base_buffer.rb +40 -0
- data/lib/e11y/buffers/request_scoped_buffer.rb +246 -0
- data/lib/e11y/buffers/ring_buffer.rb +267 -0
- data/lib/e11y/buffers.rb +14 -0
- data/lib/e11y/console.rb +122 -0
- data/lib/e11y/current.rb +48 -0
- data/lib/e11y/event/base.rb +894 -0
- data/lib/e11y/event/value_sampling_config.rb +84 -0
- data/lib/e11y/events/base_audit_event.rb +43 -0
- data/lib/e11y/events/base_payment_event.rb +33 -0
- data/lib/e11y/events/rails/cache/delete.rb +21 -0
- data/lib/e11y/events/rails/cache/read.rb +23 -0
- data/lib/e11y/events/rails/cache/write.rb +22 -0
- data/lib/e11y/events/rails/database/query.rb +45 -0
- data/lib/e11y/events/rails/http/redirect.rb +21 -0
- data/lib/e11y/events/rails/http/request.rb +26 -0
- data/lib/e11y/events/rails/http/send_file.rb +21 -0
- data/lib/e11y/events/rails/http/start_processing.rb +26 -0
- data/lib/e11y/events/rails/job/completed.rb +22 -0
- data/lib/e11y/events/rails/job/enqueued.rb +22 -0
- data/lib/e11y/events/rails/job/failed.rb +22 -0
- data/lib/e11y/events/rails/job/scheduled.rb +23 -0
- data/lib/e11y/events/rails/job/started.rb +22 -0
- data/lib/e11y/events/rails/log.rb +56 -0
- data/lib/e11y/events/rails/view/render.rb +23 -0
- data/lib/e11y/events.rb +18 -0
- data/lib/e11y/instruments/active_job.rb +201 -0
- data/lib/e11y/instruments/rails_instrumentation.rb +141 -0
- data/lib/e11y/instruments/sidekiq.rb +175 -0
- data/lib/e11y/logger/bridge.rb +205 -0
- data/lib/e11y/metrics/cardinality_protection.rb +172 -0
- data/lib/e11y/metrics/cardinality_tracker.rb +134 -0
- data/lib/e11y/metrics/registry.rb +234 -0
- data/lib/e11y/metrics/relabeling.rb +226 -0
- data/lib/e11y/metrics.rb +102 -0
- data/lib/e11y/middleware/audit_signing.rb +174 -0
- data/lib/e11y/middleware/base.rb +140 -0
- data/lib/e11y/middleware/event_slo.rb +167 -0
- data/lib/e11y/middleware/pii_filter.rb +266 -0
- data/lib/e11y/middleware/pii_filtering.rb +280 -0
- data/lib/e11y/middleware/rate_limiting.rb +214 -0
- data/lib/e11y/middleware/request.rb +163 -0
- data/lib/e11y/middleware/routing.rb +157 -0
- data/lib/e11y/middleware/sampling.rb +254 -0
- data/lib/e11y/middleware/slo.rb +168 -0
- data/lib/e11y/middleware/trace_context.rb +131 -0
- data/lib/e11y/middleware/validation.rb +118 -0
- data/lib/e11y/middleware/versioning.rb +132 -0
- data/lib/e11y/middleware.rb +12 -0
- data/lib/e11y/pii/patterns.rb +90 -0
- data/lib/e11y/pii.rb +13 -0
- data/lib/e11y/pipeline/builder.rb +155 -0
- data/lib/e11y/pipeline/zone_validator.rb +110 -0
- data/lib/e11y/pipeline.rb +12 -0
- data/lib/e11y/presets/audit_event.rb +65 -0
- data/lib/e11y/presets/debug_event.rb +34 -0
- data/lib/e11y/presets/high_value_event.rb +51 -0
- data/lib/e11y/presets.rb +19 -0
- data/lib/e11y/railtie.rb +138 -0
- data/lib/e11y/reliability/circuit_breaker.rb +216 -0
- data/lib/e11y/reliability/dlq/file_storage.rb +277 -0
- data/lib/e11y/reliability/dlq/filter.rb +117 -0
- data/lib/e11y/reliability/retry_handler.rb +207 -0
- data/lib/e11y/reliability/retry_rate_limiter.rb +117 -0
- data/lib/e11y/sampling/error_spike_detector.rb +225 -0
- data/lib/e11y/sampling/load_monitor.rb +161 -0
- data/lib/e11y/sampling/stratified_tracker.rb +92 -0
- data/lib/e11y/sampling/value_extractor.rb +82 -0
- data/lib/e11y/self_monitoring/buffer_monitor.rb +79 -0
- data/lib/e11y/self_monitoring/performance_monitor.rb +97 -0
- data/lib/e11y/self_monitoring/reliability_monitor.rb +146 -0
- data/lib/e11y/slo/event_driven.rb +150 -0
- data/lib/e11y/slo/tracker.rb +119 -0
- data/lib/e11y/version.rb +9 -0
- data/lib/e11y.rb +283 -0
- metadata +452 -0
|
@@ -0,0 +1,813 @@
|
|
|
1
|
+
# UC-001: Request-Scoped Debug Buffering
|
|
2
|
+
|
|
3
|
+
**Status:** Core Feature (MVP)
|
|
4
|
+
**Complexity:** Intermediate
|
|
5
|
+
**Setup Time:** 15-30 minutes
|
|
6
|
+
**Target Users:** DevOps, SRE, Backend Developers
|
|
7
|
+
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
## 📋 Overview
|
|
11
|
+
|
|
12
|
+
### Problem Statement
|
|
13
|
+
|
|
14
|
+
**Current Pain Points:**
|
|
15
|
+
1. **Debug logs in production = noise**
|
|
16
|
+
- 99% of requests succeed → debug logs useless
|
|
17
|
+
- Searching through millions of debug lines for 1% errors
|
|
18
|
+
- High cost (storage, indexing, querying)
|
|
19
|
+
|
|
20
|
+
2. **No debug = blind debugging**
|
|
21
|
+
- Production errors lack context
|
|
22
|
+
- Can't reproduce: "What SQL ran before error?"
|
|
23
|
+
- Need to deploy debug code → restart → wait for error → repeat
|
|
24
|
+
|
|
25
|
+
3. **Trade-off dilemma:**
|
|
26
|
+
- Enable debug → drown in logs + high costs
|
|
27
|
+
- Disable debug → can't debug production issues
|
|
28
|
+
- Current "solutions": sampling, manual toggling (not good enough)
|
|
29
|
+
|
|
30
|
+
### E11y Solution
|
|
31
|
+
|
|
32
|
+
**Request-scoped buffering (dual-buffer architecture):**
|
|
33
|
+
- **Debug events:** Buffered in thread-local storage (per-request)
|
|
34
|
+
- Happy path (99%): Buffer discarded → zero debug events sent
|
|
35
|
+
- Error path (1%): Buffer flushed → all debug context available
|
|
36
|
+
- **Other events (info/warn/error/success):** Go to main buffer → flush every 200ms
|
|
37
|
+
- **No conflict:** Two separate buffers, different flush logic
|
|
38
|
+
|
|
39
|
+
**Result:** Debug visibility when needed, zero noise when not. Fast delivery for important events.
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
## 🎯 Use Case Scenarios
|
|
44
|
+
|
|
45
|
+
### Scenario 1: API Request Debugging
|
|
46
|
+
|
|
47
|
+
**Context:** Rails API endpoint processes orders
|
|
48
|
+
|
|
49
|
+
**Without E11y:**
|
|
50
|
+
```ruby
|
|
51
|
+
# OrdersController
|
|
52
|
+
def create
|
|
53
|
+
Rails.logger.debug "Order validation started" # → Always logged (noise!)
|
|
54
|
+
Rails.logger.debug "Checking inventory for SKU #{params[:sku]}" # → Always logged
|
|
55
|
+
Rails.logger.debug "Inventory available: #{inventory.count}" # → Always logged
|
|
56
|
+
|
|
57
|
+
order = Order.create!(params)
|
|
58
|
+
Rails.logger.info "Order created: #{order.id}" # → Always logged
|
|
59
|
+
|
|
60
|
+
render json: order
|
|
61
|
+
rescue => e
|
|
62
|
+
Rails.logger.error "Order creation failed: #{e.message}" # → Logged
|
|
63
|
+
render json: { error: e.message }, status: 500
|
|
64
|
+
end
|
|
65
|
+
|
|
66
|
+
# Result in logs (for 100 successful + 1 failed request):
|
|
67
|
+
# - 303 debug lines (3 per request × 101 requests) ← 300 are useless!
|
|
68
|
+
# - 101 info lines
|
|
69
|
+
# - 1 error line
|
|
70
|
+
# Total: 405 lines (74% noise)
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
**With E11y:**
|
|
74
|
+
```ruby
|
|
75
|
+
# OrdersController
|
|
76
|
+
def create
|
|
77
|
+
Events::OrderValidationStarted.track(severity: :debug) # → Buffered
|
|
78
|
+
Events::InventoryCheck.track(sku: params[:sku], count: inventory.count, severity: :debug) # → Buffered
|
|
79
|
+
|
|
80
|
+
order = Order.create!(params)
|
|
81
|
+
Events::OrderCreated.track(order_id: order.id, severity: :success) # → Main buffer (flushed within 200ms)
|
|
82
|
+
|
|
83
|
+
render json: order
|
|
84
|
+
rescue => e
|
|
85
|
+
# Exception triggers flush of ALL buffered debug events!
|
|
86
|
+
raise # E11y middleware catches & flushes buffer
|
|
87
|
+
end
|
|
88
|
+
|
|
89
|
+
# Result in logs (for 100 successful + 1 failed request):
|
|
90
|
+
# - 0 debug lines for 100 successful requests ← Discarded!
|
|
91
|
+
# - 2 debug lines for 1 failed request ← Flushed!
|
|
92
|
+
# - 100 success lines
|
|
93
|
+
# - 1 error line
|
|
94
|
+
# Total: 103 lines (debug lines drop from 303 to 2: a 99% reduction!)
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
---
|
|
98
|
+
|
|
99
|
+
### Scenario 2: Multi-Step Business Flow
|
|
100
|
+
|
|
101
|
+
**Context:** Payment processing with multiple external API calls
|
|
102
|
+
|
|
103
|
+
**Code:**
|
|
104
|
+
```ruby
|
|
105
|
+
class ProcessPaymentJob < ApplicationJob
|
|
106
|
+
def perform(order_id)
|
|
107
|
+
order = Order.find(order_id)
|
|
108
|
+
|
|
109
|
+
# Step 1: Validate
|
|
110
|
+
Events::PaymentValidationStarted.track(order_id: order.id, severity: :debug)
|
|
111
|
+
validator = PaymentValidator.new(order)
|
|
112
|
+
validator.validate!
|
|
113
|
+
Events::PaymentValidationCompleted.track(order_id: order.id, severity: :debug)
|
|
114
|
+
|
|
115
|
+
# Step 2: Charge card (external API)
|
|
116
|
+
Events::CardChargeStarted.track(order_id: order.id, amount: order.total, severity: :debug)
|
|
117
|
+
response = StripeClient.charge(order.payment_method, order.total)
|
|
118
|
+
Events::CardChargeCompleted.track(order_id: order.id, charge_id: response.id, severity: :debug)
|
|
119
|
+
|
|
120
|
+
# Step 3: Update inventory (external API)
|
|
121
|
+
Events::InventoryUpdateStarted.track(order_id: order.id, severity: :debug)
|
|
122
|
+
InventoryService.decrement(order.line_items)
|
|
123
|
+
Events::InventoryUpdateCompleted.track(order_id: order.id, severity: :debug)
|
|
124
|
+
|
|
125
|
+
# Step 4: Success
|
|
126
|
+
Events::PaymentProcessed.track(order_id: order.id, severity: :success)
|
|
127
|
+
|
|
128
|
+
rescue PaymentValidationError => e
|
|
129
|
+
# Only 1 debug event flushed (Step 1 failed before completing; Steps 2-3 didn't run)
raise
rescue StripeError => e
# 3 debug events flushed (Step 1 completed, Step 2 failed)
raise
rescue InventoryError => e
# 5 debug events flushed (Steps 1-2 completed, Step 3 failed)
raise
end
end

# Result:
# - 99 successful jobs: 0 debug events → only 99 :success events
# - 1 failed job (Stripe error): 3 debug events + 1 error event
# Total: 99 + 4 = 103 events (vs ~700 without buffering)
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
**Why This is Powerful:**
|
|
147
|
+
- Debug events show **exact step** where failure occurred
|
|
148
|
+
- No need to guess: "Did validation run? Did Stripe charge succeed?"
|
|
149
|
+
- Full context without manual instrumentation changes
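
To illustrate the first point, here is a minimal sketch of how the failing step can be read off a flushed buffer. It assumes the `<Step>Started` / `<Step>Completed` naming convention used in the job above; `failed_step` is a hypothetical helper, not part of E11y. For a Stripe failure, the job has tracked three debug events by the time `StripeClient.charge` raises.

```ruby
# Hypothetical helper: returns the last step that started but never completed.
# Assumes event names follow the "<Step>Started" / "<Step>Completed" convention.
def failed_step(flushed_event_names)
  started   = flushed_event_names.grep(/Started\z/).map { |n| n.delete_suffix("Started") }
  completed = flushed_event_names.grep(/Completed\z/).map { |n| n.delete_suffix("Completed") }
  (started - completed).last
end

# Debug events tracked before StripeClient.charge raises in the job above.
flushed = %w[PaymentValidationStarted PaymentValidationCompleted CardChargeStarted]

puts failed_step(flushed) # prints "CardCharge"
```

The same helper returns `nil` for an empty buffer, which is what a successful (discarded) request would look like.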
|
|
150
|
+
|
|
151
|
+
---
|
|
152
|
+
|
|
153
|
+
### Scenario 3: Debugging Database N+1 Queries
|
|
154
|
+
|
|
155
|
+
**Context:** Controller action with potential N+1 queries
|
|
156
|
+
|
|
157
|
+
**Code:**
|
|
158
|
+
```ruby
|
|
159
|
+
class UsersController < ApplicationController
|
|
160
|
+
def index
|
|
161
|
+
@users = User.all
|
|
162
|
+
|
|
163
|
+
# Auto-instrumentation (via Rails Instrumentation: ActiveSupport::Notifications → E11y)
|
|
164
|
+
# E11y captures all SQL queries as debug events (unidirectional flow)
|
|
165
|
+
# See: ADR-008 §4.1 for Rails Instrumentation architecture
|
|
166
|
+
|
|
167
|
+
@users.each do |user|
|
|
168
|
+
# N+1 query! Each iteration triggers SELECT from orders
|
|
169
|
+
Events::UserOrderCount.track(
|
|
170
|
+
user_id: user.id,
|
|
171
|
+
count: user.orders.count, # ← N+1 query here
|
|
172
|
+
severity: :debug
|
|
173
|
+
)
|
|
174
|
+
end
|
|
175
|
+
|
|
176
|
+
render json: @users
|
|
177
|
+
end
|
|
178
|
+
end
|
|
179
|
+
|
|
180
|
+
# Result (with N+1 but no error):
|
|
181
|
+
# - All SQL query debug events discarded (request succeeded)
|
|
182
|
+
# - Zero visibility into N+1 problem :(
|
|
183
|
+
|
|
184
|
+
# Solution: Force flush for slow requests
|
|
185
|
+
E11y.configure do |config|
|
|
186
|
+
config.request_scope do
|
|
187
|
+
flush_on :error # Default
|
|
188
|
+
flush_on :slow_request, threshold: 500 # ms ← NEW!
|
|
189
|
+
end
|
|
190
|
+
end
|
|
191
|
+
|
|
192
|
+
# Now:
|
|
193
|
+
# - Fast requests (<500ms): debug events discarded
|
|
194
|
+
# - Slow requests (>500ms): debug events flushed
|
|
195
|
+
# Result: Automatic N+1 detection! Slow request logs show all SQL queries.
|
|
196
|
+
```
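The slow-request trigger can be sketched as a timer around the request. This is only an illustration under stated assumptions: `SlowRequestBuffer`, `around_request`, and the `:sketch_buffer` thread-local key are hypothetical names, not E11y's API, and the error path is left out to keep the example short.

```ruby
# Minimal sketch: deliver buffered debug events only when the request was slow.
class SlowRequestBuffer
  attr_reader :sent

  def initialize(threshold_ms:)
    @threshold_ms = threshold_ms
    @sent = [] # stands in for adapter delivery
  end

  # Buffer a debug event name if a request is currently active on this thread.
  def debug(name)
    buffer = Thread.current[:sketch_buffer]
    buffer << name if buffer
  end

  def around_request
    Thread.current[:sketch_buffer] = []
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
    yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) - started
    # Slow request: flush. Fast request: the buffer is dropped in `ensure`.
    @sent.concat(Thread.current[:sketch_buffer]) if elapsed > @threshold_ms
  ensure
    Thread.current[:sketch_buffer] = nil
  end
end

buffer = SlowRequestBuffer.new(threshold_ms: 20)
buffer.around_request { buffer.debug("sql.fast") }              # fast: discarded
buffer.around_request { buffer.debug("sql.slow"); sleep(0.05) } # slow: flushed
p buffer.sent # prints ["sql.slow"]
```

A monotonic clock is used so that wall-clock adjustments cannot make a request look slow or fast.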
|
|
197
|
+
|
|
198
|
+
---
|
|
199
|
+
|
|
200
|
+
## 🔧 Configuration
|
|
201
|
+
|
|
202
|
+
### Basic Setup (Automatic)
|
|
203
|
+
|
|
204
|
+
```ruby
|
|
205
|
+
# config/initializers/e11y.rb
|
|
206
|
+
E11y.configure do |config|
|
|
207
|
+
config.request_scope do
|
|
208
|
+
enabled true # Default: true
|
|
209
|
+
buffer_limit 100 # Max debug events per request
|
|
210
|
+
flush_on :error # Flush when exception raised
|
|
211
|
+
end
|
|
212
|
+
end
|
|
213
|
+
```
|
|
214
|
+
|
|
215
|
+
**That's it!** The Rails middleware is installed automatically by the generator.
|
|
216
|
+
|
|
217
|
+
> ⚠️ **CRITICAL: Middleware Order**
|
|
218
|
+
> Request-scoped buffer middleware MUST be positioned correctly in the E11y pipeline. **If you use custom middleware**, ensure buffer routing happens **after** all business logic (validation, PII filtering, rate limiting, sampling) but **before** adapter delivery.
|
|
219
|
+
>
|
|
220
|
+
> **Why:** Buffer routing needs access to fully processed events (with trace context, validated, filtered). If positioned too early, events may be buffered before PII filtering, creating compliance risks.
|
|
221
|
+
>
|
|
222
|
+
> **Consequences of wrong order:**
|
|
223
|
+
> - ❌ Buffered debug events may contain unfiltered PII → GDPR violation
|
|
224
|
+
> - ❌ Rate-limited events may still be buffered → memory waste
|
|
225
|
+
> - ❌ Invalid events may be buffered → validation bypassed
|
|
226
|
+
>
|
|
227
|
+
> **Correct order:**
|
|
228
|
+
> ```ruby
|
|
229
|
+
> config.pipeline.use TraceContextMiddleware # 1. Enrich first
|
|
230
|
+
> config.pipeline.use ValidationMiddleware # 2. Fail fast
|
|
231
|
+
> config.pipeline.use PiiFilterMiddleware # 3. Security (BEFORE buffer)
|
|
232
|
+
> config.pipeline.use RateLimitMiddleware # 4. Protection
|
|
233
|
+
> config.pipeline.use SamplingMiddleware # 5. Cost optimization
|
|
234
|
+
> config.pipeline.use RoutingMiddleware # 6. Buffer routing (LAST!)
|
|
235
|
+
> ```
|
|
236
|
+
>
|
|
237
|
+
> **See:** [ADR-001 Section 4.1: Middleware Execution Order](../ADR-001-architecture.md#41-middleware-execution-order-critical) and [ADR-015: Middleware Order Reference](../ADR-015-middleware-order.md) for detailed explanation.
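
Why the order matters can be shown with a toy pipeline. The sketch below is an assumption-laden illustration: stages are plain lambdas over event hashes, and `MASK_EMAIL`, `ROUTE_TO_BUFFER`, and `run_pipeline` are hypothetical names rather than E11y middleware.

```ruby
# Each stage takes an event hash and returns it (possibly changed), or nil to drop it.
MASK_EMAIL = ->(event) { event.merge(email: "[FILTERED]") }

ROUTE_TO_BUFFER = lambda do |event|
  (Thread.current[:sketch_buffer] ||= []) << event
  event
end

# Runs the stages in order; a nil result short-circuits the rest.
def run_pipeline(stages, event)
  stages.reduce(event) { |current, stage| current && stage.call(current) }
end

event = { name: "user.lookup", severity: :debug, email: "jane@example.com" }

# Correct order: PII filtering runs before buffer routing.
Thread.current[:sketch_buffer] = []
run_pipeline([MASK_EMAIL, ROUTE_TO_BUFFER], event)
correct = Thread.current[:sketch_buffer].first

# Wrong order: the event is buffered before it has been filtered.
Thread.current[:sketch_buffer] = []
run_pipeline([ROUTE_TO_BUFFER, MASK_EMAIL], event)
wrong = Thread.current[:sketch_buffer].first

puts correct[:email] # prints "[FILTERED]"
puts wrong[:email]   # prints "jane@example.com"
```

In the wrong order the buffer holds the unfiltered event, so a later flush would ship the raw email address.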
|
|
238
|
+
|
|
239
|
+
---
|
|
240
|
+
|
|
241
|
+
### Advanced Configuration
|
|
242
|
+
|
|
243
|
+
```ruby
|
|
244
|
+
E11y.configure do |config|
|
|
245
|
+
config.request_scope do
|
|
246
|
+
enabled true
|
|
247
|
+
buffer_limit 200 # Larger buffer for complex requests
|
|
248
|
+
|
|
249
|
+
# Multiple flush triggers
|
|
250
|
+
flush_on :error # On exception (default)
|
|
251
|
+
flush_on :warn # On any :warn event
|
|
252
|
+
flush_on :slow_request, threshold: 1000 # On requests >1s
|
|
253
|
+
|
|
254
|
+
# Custom flush condition
|
|
255
|
+
flush_if do |events, request|
|
|
256
|
+
# Flush if any event contains "payment" in name
|
|
257
|
+
events.any? { |e| e.name.include?('payment') }
|
|
258
|
+
end
|
|
259
|
+
|
|
260
|
+
# Exclude certain events from buffer (always send)
|
|
261
|
+
exclude_from_buffer do
|
|
262
|
+
severity [:info, :success, :warn, :error, :fatal] # Only buffer :debug
|
|
263
|
+
event_patterns ['security.*', 'audit.*'] # Never buffer security events
|
|
264
|
+
end
|
|
265
|
+
|
|
266
|
+
# Buffer overflow strategy
|
|
267
|
+
overflow_strategy :drop_oldest # or :drop_newest, :flush_immediately
|
|
268
|
+
end
|
|
269
|
+
end
|
|
270
|
+
```
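The three `overflow_strategy` values can be pictured with a small bounded buffer. This is a sketch, not E11y's implementation: `BoundedBuffer` is a hypothetical class, and the behaviour of `:flush_immediately` (deliver what is buffered, then keep the new event) is an assumption about what the option means.

```ruby
# Hypothetical bounded buffer illustrating the overflow strategies named above.
class BoundedBuffer
  attr_reader :events, :flushed

  def initialize(limit:, overflow_strategy: :drop_oldest)
    @limit = limit
    @overflow_strategy = overflow_strategy
    @events = []
    @flushed = [] # stands in for events handed to adapters
  end

  def <<(event)
    if @events.size < @limit
      @events << event
      return self
    end

    case @overflow_strategy
    when :drop_oldest
      @events.shift
      @events << event
    when :drop_newest
      # Keep what is already buffered; the incoming event is lost.
    when :flush_immediately
      @flushed.concat(@events)
      @events.clear
      @events << event
    else
      raise ArgumentError, "unknown overflow strategy: #{@overflow_strategy}"
    end
    self
  end
end

oldest = BoundedBuffer.new(limit: 2, overflow_strategy: :drop_oldest)
oldest << :a << :b << :c
p oldest.events # prints [:b, :c]

newest = BoundedBuffer.new(limit: 2, overflow_strategy: :drop_newest)
newest << :a << :b << :c
p newest.events # prints [:a, :b]
```

`:drop_oldest` keeps the events closest to a potential failure, which is usually what matters when debugging.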
|
|
271
|
+
|
|
272
|
+
---
|
|
273
|
+
|
|
274
|
+
## 📊 How It Works (Technical Details)
|
|
275
|
+
|
|
276
|
+
### Dual-Buffer Architecture
|
|
277
|
+
|
|
278
|
+
**E11y uses TWO independent buffers:**
|
|
279
|
+
|
|
280
|
+
```
|
|
281
|
+
┌─────────────────────────────────────────────────────────────────┐
|
|
282
|
+
│ Request Thread (Rack/Rails) │
|
|
283
|
+
│ │
|
|
284
|
+
│ ┌──────────────────────────────────────┐ │
|
|
285
|
+
│ │ E11y::Middleware::Rack │ │
|
|
286
|
+
│ │ - Initialize request-scoped buffer │ │
|
|
287
|
+
│ │ - Store in Thread.current[:e11y_*] │ │
|
|
288
|
+
│ └──────────────────────────────────────┘ │
|
|
289
|
+
│ ↓ │
|
|
290
|
+
│ ┌──────────────────────────────────────┐ │
|
|
291
|
+
│ │ Controller Action │ │
|
|
292
|
+
│ │ │ │
|
|
293
|
+
│ │ Events::DebugEvent.track(...) │ │
|
|
294
|
+
│ │ ↓ │ │
|
|
295
|
+
│ │ severity == :debug? │ │
|
|
296
|
+
│ │ YES ──→ Request-Scoped Buffer ───┐ │
|
|
297
|
+
│ │ (Thread-local) │ │
|
|
298
|
+
│ │ Flush: on error/end │ │
|
|
299
|
+
│ │ │ │
|
|
300
|
+
│ │ Events::InfoEvent.track(...) │ │
|
|
301
|
+
│ │ ↓ │ │
|
|
302
|
+
│ │ severity >= :info? │ │
|
|
303
|
+
│ │ YES ──→ Main Buffer ─────────────┼──→ Flush: every 200ms │
|
|
304
|
+
│ │ (Global, SPSC) │ (Background Thread) │
|
|
305
|
+
│ └──────────────────────────────────────┘ │
|
|
306
|
+
│ ↓ │
|
|
307
|
+
│ ┌──────────────────────────────────────┐ │
|
|
308
|
+
│ │ Response / Exception │ │
|
|
309
|
+
│ │ - Success → Discard debug buffer │ │
|
|
310
|
+
│ │ - Error → Flush debug buffer │ │
|
|
311
|
+
│ └──────────────────────────────────────┘ │
|
|
312
|
+
└─────────────────────────────────────────────────────────────────┘
|
|
313
|
+
|
|
314
|
+
Background Flush Thread (200ms interval):
|
|
315
|
+
Main Buffer → Adapters (Loki, Sentry, etc.)
|
|
316
|
+
```
|
|
317
|
+
|
|
318
|
+
### Buffer Routing Logic
|
|
319
|
+
|
|
320
|
+
```ruby
|
|
321
|
+
# Pseudo-code for illustration
|
|
322
|
+
def track_event(event)
|
|
323
|
+
if event.severity == :debug && E11y.request_scope.active?
|
|
324
|
+
# → Request-scoped buffer (Thread-local)
|
|
325
|
+
Thread.current[:e11y_buffer] << event
|
|
326
|
+
else
|
|
327
|
+
# → Main buffer (Global SPSC ring buffer)
|
|
328
|
+
E11y.main_buffer << event
|
|
329
|
+
# The background thread picks it up within 200ms (or sooner if the batch fills up)
|
|
330
|
+
end
|
|
331
|
+
end
|
|
332
|
+
```
|
|
333
|
+
|
|
334
|
+
### Implementation Pseudocode
|
|
335
|
+
|
|
336
|
+
```ruby
|
|
337
|
+
# lib/e11y/middleware/rack.rb
|
|
338
|
+
class E11y::Middleware::Rack
# Rack passes the next app in the chain to the constructor
def initialize(app)
@app = app
end

|
|
339
|
+
def call(env)
|
|
340
|
+
# 1. Initialize request-scoped buffer
|
|
341
|
+
E11y::RequestScope.initialize_buffer!
|
|
342
|
+
|
|
343
|
+
# 2. Call application
|
|
344
|
+
status, headers, body = @app.call(env)
|
|
345
|
+
|
|
346
|
+
# 3. Success → discard buffer
|
|
347
|
+
E11y::RequestScope.discard_buffer!
|
|
348
|
+
|
|
349
|
+
[status, headers, body]
|
|
350
|
+
|
|
351
|
+
rescue => exception
|
|
352
|
+
# 4. Error → flush buffer then re-raise
|
|
353
|
+
E11y::RequestScope.flush_buffer!(severity: :error)
|
|
354
|
+
raise
|
|
355
|
+
ensure
|
|
356
|
+
# 5. Cleanup
|
|
357
|
+
E11y::RequestScope.cleanup!
|
|
358
|
+
end
|
|
359
|
+
end
|
|
360
|
+
|
|
361
|
+
# lib/e11y/request_scope.rb
require 'securerandom'
|
|
362
|
+
module E11y::RequestScope
|
|
363
|
+
def self.initialize_buffer!
|
|
364
|
+
Thread.current[:e11y_buffer] = []
|
|
365
|
+
Thread.current[:e11y_request_id] = SecureRandom.uuid
|
|
366
|
+
end
|
|
367
|
+
|
|
368
|
+
def self.buffer_event(event)
|
|
369
|
+
buffer = Thread.current[:e11y_buffer]
|
|
370
|
+
return false unless buffer # Not in request scope
|
|
371
|
+
|
|
372
|
+
if event.severity == :debug
|
|
373
|
+
buffer << event
|
|
374
|
+
true # Event buffered (not sent yet)
|
|
375
|
+
else
|
|
376
|
+
false # Non-debug events sent immediately
|
|
377
|
+
end
|
|
378
|
+
end
|
|
379
|
+
|
|
380
|
+
def self.flush_buffer!(severity: :error)
|
|
381
|
+
buffer = Thread.current[:e11y_buffer]
|
|
382
|
+
return if buffer.nil? || buffer.empty?
|
|
383
|
+
|
|
384
|
+
# Flush all buffered events with specified severity
|
|
385
|
+
buffer.each do |event|
|
|
386
|
+
event.severity = severity if event.severity == :debug
|
|
387
|
+
E11y::Collector.collect(event)
|
|
388
|
+
end
|
|
389
|
+
|
|
390
|
+
buffer.clear
|
|
391
|
+
end
|
|
392
|
+
|
|
393
|
+
def self.discard_buffer!
|
|
394
|
+
Thread.current[:e11y_buffer]&.clear
|
|
395
|
+
end
|
|
396
|
+
|
|
397
|
+
def self.cleanup!
|
|
398
|
+
Thread.current[:e11y_buffer] = nil
|
|
399
|
+
Thread.current[:e11y_request_id] = nil
|
|
400
|
+
end
|
|
401
|
+
end
|
|
402
|
+
```
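The pseudocode above can be condensed into a self-contained, runnable sketch. All names here (`SketchScope`, `SketchMiddleware`, `SENT`) are illustrative stand-ins, not E11y classes, and unlike the pseudocode this version leaves the severity of flushed events unchanged.

```ruby
SENT = [] # stands in for adapter delivery

module SketchScope
  KEY = :sketch_request_buffer

  def self.start!
    Thread.current[KEY] = []
  end

  # Debug events are held back while a request is active; everything else is delivered.
  def self.track(name, severity:)
    buffer = Thread.current[KEY]
    if severity == :debug && buffer
      buffer << { name: name, severity: severity }
    else
      SENT << { name: name, severity: severity }
    end
  end

  def self.flush!
    SENT.concat(Thread.current[KEY] || [])
  end

  def self.cleanup!
    Thread.current[KEY] = nil
  end
end

class SketchMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    SketchScope.start!
    @app.call(env)     # success: the buffer is dropped in `ensure`
  rescue StandardError
    SketchScope.flush! # error: deliver the buffered debug context, then re-raise
    raise
  ensure
    SketchScope.cleanup!
  end
end

ok_app = lambda do |_env|
  SketchScope.track("cache.lookup", severity: :debug)
  SketchScope.track("order.created", severity: :success)
  [200, {}, ["ok"]]
end

failing_app = lambda do |_env|
  SketchScope.track("inventory.check", severity: :debug)
  raise "out of stock"
end

SketchMiddleware.new(ok_app).call({})
begin
  SketchMiddleware.new(failing_app).call({})
rescue RuntimeError
  # expected: the middleware re-raises after flushing
end

p SENT.map { |e| e[:name] } # prints ["order.created", "inventory.check"]
```

The debug event from the successful request never reaches `SENT`, while the one from the failing request does.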
|
|
403
|
+
|
|
404
|
+
---
|
|
405
|
+
|
|
406
|
+
## 📈 Performance Impact
|
|
407
|
+
|
|
408
|
+
> **Implementation:** See [ADR-001 Section 8.3: Resource Limits](../ADR-001-architecture.md#83-resource-limits) for architectural details and [ADR-002 Section 6: Self-Monitoring](../ADR-002-metrics-yabeda.md#6-self-monitoring) for metrics implementation.
|
|
409
|
+
|
|
410
|
+
### Buffer Metrics
|
|
411
|
+
|
|
412
|
+
**E11y automatically tracks request buffer performance:**
|
|
413
|
+
|
|
414
|
+
```ruby
|
|
415
|
+
# Exposed via Yabeda (auto-configured)
|
|
416
|
+
Yabeda.e11y_request_buffer_size # Gauge: current buffer size per request
|
|
417
|
+
Yabeda.e11y_request_buffer_flushes_total # Counter: buffer flushes by trigger
|
|
418
|
+
|
|
419
|
+
# Accessible via Prometheus metrics endpoint
|
|
420
|
+
# Example queries:
|
|
421
|
+
|
|
422
|
+
# 1. Average buffer size
|
|
423
|
+
avg(e11y_request_buffer_size)
|
|
424
|
+
|
|
425
|
+
# 2. Buffer flush rate by trigger
|
|
426
|
+
rate(e11y_request_buffer_flushes_total{trigger="error"}[5m])
|
|
427
|
+
|
|
428
|
+
# 3. Buffer overflow alerts
|
|
429
|
+
e11y_request_buffer_size >= 100 # Alert if buffer limit reached
|
|
430
|
+
```
|
|
431
|
+
|
|
432
|
+
**Monitoring Examples:**
|
|
433
|
+
|
|
434
|
+
```ruby
|
|
435
|
+
# Grafana dashboard panels:
|
|
436
|
+
|
|
437
|
+
# Panel 1: Buffer Size Distribution
|
|
438
|
+
quantile(0.99, e11y_request_buffer_size)
|
|
441
|
+
# Shows p99 buffer size
|
|
442
|
+
|
|
443
|
+
# Panel 2: Flush Triggers Breakdown
|
|
444
|
+
sum by (trigger) (
|
|
445
|
+
rate(e11y_request_buffer_flushes_total[5m])
|
|
446
|
+
)
|
|
447
|
+
# Shows why buffers flush (error vs. slow_request vs. custom)
|
|
448
|
+
|
|
449
|
+
# Panel 3: Memory Impact Estimate
|
|
450
|
+
avg(e11y_request_buffer_size) * 500 # bytes per event
|
|
451
|
+
# Estimates per-request memory usage
|
|
452
|
+
```
|
|
453
|
+
|
|
454
|
+
**What to Monitor:**
|
|
455
|
+
|
|
456
|
+
| Metric | Normal | Warning | Alert |
|
|
457
|
+
|--------|--------|---------|-------|
|
|
458
|
+
| **Buffer Size (p99)** | <20 events | 50-80 events | >80 events |
|
|
459
|
+
| **Flush Rate (error)** | <1% of requests | 1-5% | >5% |
|
|
460
|
+
| **Flush Rate (slow)** | <5% of requests | 5-10% | >10% |
|
|
461
|
+
| **Buffer Overflows** | 0 | >0 | >10/min |
|
|
462
|
+
|
|
463
|
+
### Memory
|
|
464
|
+
|
|
465
|
+
```ruby
|
|
466
|
+
# Per-request memory usage
|
|
467
|
+
|
|
468
|
+
# Typical request (10 debug events):
|
|
469
|
+
# - Event objects: ~500 bytes each
|
|
470
|
+
# - Buffer array: ~100 bytes
|
|
471
|
+
# Total: ~5KB per request
|
|
472
|
+
|
|
473
|
+
# Worst case (100 debug events, limit reached):
|
|
474
|
+
# Total: ~50KB per request
|
|
475
|
+
|
|
476
|
+
# Concurrent requests (100):
|
|
477
|
+
# - Typical: 100 × 5KB = 500KB
|
|
478
|
+
# - Worst: 100 × 50KB = 5MB
|
|
479
|
+
|
|
480
|
+
# Conclusion: Negligible memory impact (<10MB even at high load)
|
|
481
|
+
```
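The estimate above is simple arithmetic and can be reproduced with a few lines. The byte sizes are the rough figures quoted in the text, not measurements, and `buffer_bytes` is a hypothetical helper.

```ruby
EVENT_BYTES = 500 # assumed size of one buffered event (figure from the text)
ARRAY_BYTES = 100 # assumed overhead of the buffer array itself

# Total bytes held in request buffers for a given load.
def buffer_bytes(events_per_request:, concurrent_requests: 1)
  (events_per_request * EVENT_BYTES + ARRAY_BYTES) * concurrent_requests
end

typical = buffer_bytes(events_per_request: 10, concurrent_requests: 100)
worst   = buffer_bytes(events_per_request: 100, concurrent_requests: 100)

puts format("typical: %.2f MB", typical / 1_000_000.0) # prints "typical: 0.51 MB"
puts format("worst:   %.2f MB", worst / 1_000_000.0)   # prints "worst:   5.01 MB"
```

Raising `buffer_limit` scales the worst case linearly, which is why a limit such as 10,000 is discouraged later in this document.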
|
|
482
|
+
|
|
483
|
+
### Latency
|
|
484
|
+
|
|
485
|
+
```ruby
|
|
486
|
+
# Overhead per track() call
|
|
487
|
+
|
|
488
|
+
# Buffered event (debug):
|
|
489
|
+
# - Check Thread.current: ~1μs
|
|
490
|
+
# - Append to array: ~0.5μs
|
|
491
|
+
# Total: ~1.5μs
|
|
492
|
+
|
|
493
|
+
# Non-buffered event (info/success):
|
|
494
|
+
# - No buffering: 0μs
|
|
495
|
+
# - Send to collector: ~20μs (async, non-blocking)
|
|
496
|
+
|
|
497
|
+
# Conclusion: <2μs overhead for debug events (negligible)
|
|
498
|
+
```
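The figures above are estimates; a rough way to check them on your own hardware is a micro-benchmark of the two operations on the debug path (a thread-local lookup and an array append). This sketch uses only Ruby's standard `benchmark` library, the `:sketch_buffer` key is a placeholder, and results will vary by machine and Ruby version.

```ruby
require "benchmark"

ITERATIONS = 100_000
Thread.current[:sketch_buffer] = []

# Time the thread-local lookup plus append that a buffered debug event costs.
elapsed = Benchmark.realtime do
  ITERATIONS.times do |i|
    buffer = Thread.current[:sketch_buffer]
    buffer << i if buffer
  end
end

per_call_us = elapsed / ITERATIONS * 1_000_000
puts format("~%.3f µs per buffered append", per_call_us)

Thread.current[:sketch_buffer] = nil
```

The loop overhead is included in the measurement, so the printed value is an upper bound on the buffering cost itself.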
|
|
499
|
+
|
|
500
|
+
---
|
|
501
|
+
|
|
502
|
+
## 🧪 Testing
|
|
503
|
+
|
|
504
|
+
### Test Request-Scoped Buffering
|
|
505
|
+
|
|
506
|
+
```ruby
|
|
507
|
+
# spec/requests/orders_spec.rb
|
|
508
|
+
RSpec.describe 'Orders API' do
|
|
509
|
+
it 'discards debug events on successful request' do
|
|
510
|
+
# Spy on E11y collector
|
|
511
|
+
allow(E11y::Collector).to receive(:collect)
|
|
512
|
+
|
|
513
|
+
post '/orders', params: { sku: 'ABC123' }
|
|
514
|
+
|
|
515
|
+
expect(response).to be_successful
|
|
516
|
+
|
|
517
|
+
# Verify: only :success event sent, no :debug events
|
|
518
|
+
expect(E11y::Collector).to have_received(:collect).once
|
|
519
|
+
expect(E11y::Collector).to have_received(:collect).with(
|
|
520
|
+
have_attributes(severity: :success)
|
|
521
|
+
)
|
|
522
|
+
end
|
|
523
|
+
|
|
524
|
+
it 'flushes debug events on error' do
|
|
525
|
+
allow(E11y::Collector).to receive(:collect)
|
|
526
|
+
|
|
527
|
+
# Trigger error (invalid SKU)
|
|
528
|
+
post '/orders', params: { sku: 'INVALID' }
|
|
529
|
+
|
|
530
|
+
expect(response).to have_http_status(500)
|
|
531
|
+
|
|
532
|
+
# Verify: debug events flushed
|
|
533
|
+
expect(E11y::Collector).to have_received(:collect).at_least(2).times
|
|
534
|
+
expect(E11y::Collector).to have_received(:collect).with(
|
|
535
|
+
have_attributes(severity: :debug)
|
|
536
|
+
).at_least(:once)
|
|
537
|
+
end
|
|
538
|
+
end
|
|
539
|
+
```
|
|
540
|
+
|
|
541
|
+
---
|
|
542
|
+
|
|
543
|
+
## 💡 Best Practices
|
|
544
|
+
|
|
545
|
+
### ✅ DO
|
|
546
|
+
|
|
547
|
+
**1. Use :debug for diagnostic events**
|
|
548
|
+
```ruby
|
|
549
|
+
Events::SqlQuery.track(sql: query, duration: duration, severity: :debug)
|
|
550
|
+
Events::CacheHit.track(key: key, severity: :debug)
|
|
551
|
+
Events::ApiCallStarted.track(service: 'stripe', severity: :debug)
|
|
552
|
+
```
|
|
553
|
+
|
|
554
|
+
**2. Use :success for business events**
|
|
555
|
+
```ruby
|
|
556
|
+
Events::OrderPaid.track(order_id: order.id, severity: :success)
|
|
557
|
+
Events::UserRegistered.track(user_id: user.id, severity: :success)
|
|
558
|
+
```
|
|
559
|
+
|
|
560
|
+
**3. Set reasonable buffer limits**
|
|
561
|
+
```ruby
|
|
562
|
+
config.request_scope do
|
|
563
|
+
buffer_limit 100 # Typical: 10-50 debug events per request
|
|
564
|
+
end
|
|
565
|
+
```
|
|
566
|
+
|
|
567
|
+
**4. Flush on custom conditions**
|
|
568
|
+
```ruby
|
|
569
|
+
config.request_scope do
|
|
570
|
+
flush_on :slow_request, threshold: 500 # ms
|
|
571
|
+
flush_if { |events| events.any? { |e| e.name =~ /payment|security/ } }
|
|
572
|
+
end
|
|
573
|
+
```
|
|
574
|
+
|
|
575
|
+
---
|
|
576
|
+
|
|
577
|
+
### ❌ DON'T
|
|
578
|
+
|
|
579
|
+
**1. Don't buffer non-debug events**
|
|
580
|
+
```ruby
|
|
581
|
+
# ❌ BAD: Buffering :info events (defeats purpose)
|
|
582
|
+
Events::OrderCreated.track(order_id: order.id, severity: :info) # Should be :success
|
|
583
|
+
|
|
584
|
+
# ✅ GOOD:
|
|
585
|
+
Events::OrderCreated.track(order_id: order.id, severity: :success)
|
|
586
|
+
```
|
|
587
|
+
|
|
588
|
+
**2. Don't set buffer limits too high**
|
|
589
|
+
```ruby
|
|
590
|
+
# ❌ BAD: Huge buffer (memory risk)
|
|
591
|
+
config.request_scope { buffer_limit 10_000 }
|
|
592
|
+
|
|
593
|
+
# ✅ GOOD: Reasonable limit
|
|
594
|
+
config.request_scope { buffer_limit 100 }
|
|
595
|
+
```
|
|
596
|
+
|
|
597
|
+
**3. Don't buffer security events**
|
|
598
|
+
```ruby
|
|
599
|
+
# ❌ BAD: Security events must be sent immediately!
|
|
600
|
+
Events::LoginAttempt.track(user_id: user.id, severity: :debug)
|
|
601
|
+
|
|
602
|
+
# ✅ GOOD:
|
|
603
|
+
Events::LoginAttempt.track(user_id: user.id, severity: :info)
|
|
604
|
+
# OR explicitly exclude from buffer:
|
|
605
|
+
config.request_scope do
|
|
606
|
+
exclude_from_buffer { event_patterns ['security.*', 'audit.*'] }
|
|
607
|
+
end
|
|
608
|
+
```
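How patterns such as `'security.*'` might be matched is sketched below. It assumes the patterns are shell-style globs applied to a dotted event name; `EXCLUDED_PATTERNS` and `bypass_buffer?` are hypothetical names, and E11y's actual matching rules may differ.

```ruby
# Assumption: patterns are shell-style globs matched against a dotted event name.
EXCLUDED_PATTERNS = ["security.*", "audit.*"].freeze

# True when the event should skip the request-scoped buffer and be sent right away.
def bypass_buffer?(event_name, severity)
  return true unless severity == :debug

  EXCLUDED_PATTERNS.any? { |pattern| File.fnmatch(pattern, event_name) }
end

puts bypass_buffer?("security.login_attempt", :debug) # prints "true"
puts bypass_buffer?("sql.query", :debug)              # prints "false"
puts bypass_buffer?("order.paid", :success)           # prints "true"
```

With a check like this, a security event tracked at `:debug` by mistake is still delivered instead of being discarded with the buffer.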
|
|
609
|
+
|
|
610
|
+
---
|
|
611
|
+
|
|
612
|
+
## 🔄 Interaction with the Flush Interval (200ms)
|
|
613
|
+
|
|
614
|
+
### Question: Do the buffers conflict?
|
|
615
|
+
|
|
616
|
+
**Answer: NO. They are independent.**
|
|
617
|
+
|
|
618
|
+
### Detailed Logic

```ruby
# config/initializers/e11y.rb
E11y.configure do |config|
  # === Main Buffer (Global) ===
  config.buffer do
    capacity 100_000       # Ring buffer size
    flush_interval 200     # ms - for :info/:warn/:error/:success/:fatal
    flush_batch_size 500
  end

  # === Request-Scoped Buffer (Thread-local) ===
  config.request_scope do
    enabled true
    buffer_limit 100       # Per-request limit for :debug only
    flush_on :error        # Flush on exception
  end
end
```
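
To make the main-buffer settings concrete, here is a rough sketch of a bounded buffer drained in batches. It assumes that on overflow the oldest event is overwritten (ring-buffer style) and that each flush slices the pending events by `flush_batch_size`. Both are assumptions, `SketchBuffer` is a hypothetical class, and a `Mutex` stands in for whatever lock-free structure the real buffer uses:

```ruby
# Illustrative only: a bounded buffer drained in fixed-size batches.
# Assumption: on overflow the oldest event is dropped, ring-buffer style.
class SketchBuffer
  def initialize(capacity:, flush_batch_size:)
    @capacity = capacity
    @flush_batch_size = flush_batch_size
    @events = []
    @lock = Mutex.new
  end

  def push(event)
    @lock.synchronize do
      @events.shift if @events.size >= @capacity # drop the oldest entry
      @events << event
    end
  end

  # Removes everything currently buffered and returns it as batches.
  def drain
    pending = @lock.synchronize { @events.shift(@events.size) }
    pending.each_slice(@flush_batch_size).to_a
  end
end

buffer = SketchBuffer.new(capacity: 5, flush_batch_size: 2)
(1..7).each { |n| buffer.push(n) }
batches = buffer.drain # events 1 and 2 were overwritten before the drain
```

A background thread calling `drain` every `flush_interval` milliseconds would complete the picture; it is left out to keep the sketch deterministic.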

### Event Flow

**Scenario 1: Normal request (successful)**
```ruby
# Request starts
Events::DebugEvent.track(...) # → Request buffer (thread-local)
Events::DebugEvent.track(...) # → Request buffer (thread-local)
Events::OrderCreated.track(severity: :success) # → Main buffer → flush in 200ms
Events::DebugEvent.track(...) # → Request buffer (thread-local)
# Request ends successfully
# → Request buffer DISCARDED (debug events lost)
# → Main buffer flushed every 200ms (success event sent)
```

**Scenario 2: Request with an error**
```ruby
# Request starts
Events::DebugEvent.track(...) # → Request buffer
Events::DebugEvent.track(...) # → Request buffer
Events::PaymentFailed.track(severity: :error) # → Main buffer → flush in 200ms
Events::DebugEvent.track(...) # → Request buffer
# Exception raised!
# → Request buffer FLUSHED immediately (all 3 debug events sent)
# → Main buffer continues flush every 200ms (error event sent)
```
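
The two scenarios above can be sketched in plain Ruby. This is a simplified model rather than the e11y implementation: `RequestScopeSketch`, its `delivered` array (standing in for the adapters), and the event names are hypothetical, and non-debug events are delivered directly instead of passing through the main buffer:

```ruby
# Illustrative model of the request-scoped buffer: discard on success,
# flush on exception. Names are hypothetical, not the e11y API.
class RequestScopeSketch
  attr_reader :delivered

  def initialize
    @delivered = [] # stands in for the adapters
  end

  # Runs one request; the block receives a tracker lambda.
  def with_request
    buffer = []
    track = lambda do |name, severity|
      if severity == :debug
        buffer << name     # held back until the request outcome is known
      else
        @delivered << name # the real pipeline would use the main buffer
      end
    end
    yield track
    buffer.clear # success: debug events are discarded
  rescue StandardError
    @delivered.concat(buffer) # error: flush the buffered debug events
    raise
  end
end

scope = RequestScopeSketch.new

# Scenario 1: successful request, debug events are dropped.
scope.with_request do |track|
  track.call('debug.step1', :debug)
  track.call('order.created', :success)
end

# Scenario 2: failing request, debug events are flushed.
begin
  scope.with_request do |track|
    track.call('debug.step2', :debug)
    track.call('payment.failed', :error)
    raise 'boom'
  end
rescue RuntimeError
  # The exception still propagates to the caller after the flush.
end
```

After both requests, `scope.delivered` holds the success event, the error event, and only the debug event from the failing request.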

**Scenario 3: High-load service**
```ruby
# 1000 requests/sec, each with 5 debug events
# → 5000 debug events/sec into request buffers (thread-local)
# → 99% succeed → 4950 debug events/sec DISCARDED
# → 1% fail → 50 debug events/sec FLUSHED
#
# In parallel:
# → 1000 info/success events/sec → Main buffer
# → Flush every 200ms = 5 batches/sec
# → 200 events per batch (on average)
```
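
The Scenario 3 figures follow from simple arithmetic. The snippet below recomputes them using integers only; the variable names are illustrative and the inputs are the scenario's own numbers:

```ruby
# Back-of-the-envelope numbers for Scenario 3 (integer arithmetic only).
requests_per_sec    = 1_000
debug_per_request   = 5
error_rate_percent  = 1
flush_interval_ms   = 200
main_events_per_sec = 1_000

debug_per_sec     = requests_per_sec * debug_per_request      # into request buffers
flushed_per_sec   = debug_per_sec * error_rate_percent / 100  # failing requests
discarded_per_sec = debug_per_sec - flushed_per_sec           # successful requests
batches_per_sec   = 1_000 / flush_interval_ms                 # main buffer flushes
events_per_batch  = main_events_per_sec / batches_per_sec     # average batch size
```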

### Summary: No Conflict!

| Event Type | Buffer | Flush Trigger | Latency |
|------------|--------|---------------|---------|
| `:debug` | Request-scoped (Thread-local) | On error or end-of-request | 0ms (discarded) or immediate (on error) |
| `:info` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
| `:success` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
| `:warn` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
| `:error` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |
| `:fatal` | Main buffer (Global SPSC) | Every 200ms (background thread) | <200ms |

**Benefits of the dual buffer:**
1. ✅ Debug events don't clutter the main buffer
2. ✅ Important events (info and above) are delivered quickly (within 200ms)
3. ✅ Debug events are delivered immediately on error (flush triggered)
4. ✅ 99% of debug events are never processed at all (discard = zero cost)
5. ✅ Thread-safety: the request buffer is isolated in Thread.current
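
The isolation claim in point 5 can be illustrated with a small sketch. The key name `:sketch_request_buffer` and the helper `buffer_event` are hypothetical; the only assumption is that each request thread keeps its buffer under a `Thread.current` key:

```ruby
# Each thread appends to its own buffer stored under a hypothetical key.
# Note: Thread.current[:key] is fiber-local in Ruby; Thread#thread_variable_get
# and #thread_variable_set give true thread-local storage across fibers.
BUFFER_KEY = :sketch_request_buffer

def buffer_event(event)
  (Thread.current[BUFFER_KEY] ||= []) << event
end

threads = Array.new(3) do |i|
  Thread.new do
    5.times { |n| buffer_event("req#{i}-debug#{n}") }
    Thread.current[BUFFER_KEY].dup # this thread's view of its own buffer
  end
end

results = threads.map(&:value) # one array of five events per thread
```

No lock is needed because no two threads ever touch the same array, and the main thread's slot stays empty throughout.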

### Visual Diagram

```
Time: ──────────────────────────────────────────────────>
      0ms    200ms    400ms    600ms    800ms    1000ms

Request Thread 1 (success):
  ┌─────────────────────┐
  │ :debug → [Req Buf]  │ ← Discarded at end
  │ :debug → [Req Buf]  │ ← Discarded at end
  │ :success → [Main]   │ ─┐
  └─────────────────────┘  │
                           │
Request Thread 2 (error):  │
  ┌────────────────────┐   │
  │ :debug → [Req Buf] │   │ ← Flushed on error!
  │ :error → [Main]    │   │ ─┐
  │ EXCEPTION!         │   │  │
  │ Flush req buffer ──┼───┼──┼──→ Adapters
  └────────────────────┘   │  │
                           │  │
Background Flush Thread:   │  │
  Every 200ms: ────────────┴──┴──→ Adapters
                           ↑  ↑
                         200ms 400ms
```

### Example with Numbers

**Load:**
- 100 requests/sec
- Each request: 3 debug events + 1 success event
- Error rate: 1%

**What happens:**

| Time | Request Buffer (Thread-local) | Main Buffer (Global) | Flush |
|------|------------------------------|---------------------|-------|
| 0ms | Req1: [D, D, D] | [S1] | - |
| 10ms | Req2: [D, D, D] | [S1, S2] | - |
| 20ms | Req3: [D, D, D] | [S1, S2, S3] | - |
| ... | ... | ... | - |
| 200ms | Req20: [D, D, D] | [S1...S20] | **Flush 20 success events** |
| 210ms | Req21: [D, D, D] ERROR! | [S21, E21, **D, D, D from Req21**] | **Immediate flush debug** |
| 400ms | - | [S21...S40] | **Flush next batch** |

**Result:**
- Success events: ~100/sec → flushed every 200ms → latency <200ms ✅
- Debug events (99%): DISCARDED → zero overhead ✅
- Debug events (1% errors): flushed IMMEDIATELY with error context ✅

---

## 🎯 Success Metrics

### Quantifiable Benefits

**1. Log Volume Reduction**
- Before: 1M debug lines/day
- After: 10K debug lines/day (only errors)
- **Reduction: 99%**

**2. Storage Cost Savings**
- Before: $500/month (ELK ingestion)
- After: $50/month
- **Savings: $450/month (90%)**

**3. Query Performance**
- Before: "Search last 1M lines" = 30 seconds
- After: "Search last 10K lines" = 0.5 seconds
- **Speedup: 60x**

**4. Debugging Efficiency**
- Before: "Guess what happened before error" = 30 minutes
- After: "See full context in logs" = 2 minutes
- **Time saved: 28 minutes per incident**
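
For readers who want to plug in their own numbers, the headline figures above can be recomputed as follows. The inputs are the illustrative before/after values from this section, and the helper `percent_reduction` is introduced here only for the example:

```ruby
# Recomputes the headline figures from the before/after values listed above.
def percent_reduction(before, after)
  (before - after) * 100 / before
end

log_reduction  = percent_reduction(1_000_000, 10_000) # debug lines/day
cost_reduction = percent_reduction(500, 50)           # USD/month
query_speedup  = 30 / 0.5                             # seconds before / seconds after
minutes_saved  = 30 - 2                               # per incident
```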

---

## 🚀 Migration Guide

### From Rails.logger (No Buffering)

**Before:**
```ruby
def create
  Rails.logger.debug "Starting order creation"
  # ... logic ...
  Rails.logger.info "Order created: #{order.id}"
end

# Problem: Always logs debug (even on success)
```

**After:**
```ruby
def create
  Events::OrderCreationStarted.track(severity: :debug)
  # ... logic ...
  Events::OrderCreated.track(order_id: order.id, severity: :success)
end

# Solution: Debug events buffered, only flushed on error
```

---

## 📚 Related Use Cases

- **[UC-002: Business Event Tracking](./UC-002-business-event-tracking.md)** - Define structured events
- **[UC-010: Background Job Tracking](./UC-010-background-job-tracking.md)** - Buffering in Sidekiq/ActiveJob
- **[UC-015: Local Development](./UC-015-local-development.md)** - Test buffering locally

---

**Document Version:** 1.0
**Last Updated:** January 12, 2026
**Status:** ✅ Complete