e11y 0.2.0 → 1.0.0

Files changed (230)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +130 -10
  3. data/CHANGELOG.md +56 -1
  4. data/CLAUDE.md +168 -0
  5. data/CONTRIBUTING.md +640 -0
  6. data/README.md +134 -702
  7. data/RELEASE.md +18 -3
  8. data/Rakefile +108 -29
  9. data/config/README.md +1 -1
  10. data/config/loki-local-config.yaml +12 -0
  11. data/config/otel-collector-config.yaml +44 -0
  12. data/cucumber.yml +1 -0
  13. data/docker-compose.yml +18 -2
  14. data/docs/ADAPTERS.md +76 -0
  15. data/docs/ADAPTIVE_SAMPLING.md +59 -0
  16. data/docs/COMPARISON.md +104 -0
  17. data/docs/CONFIGURATION.md +52 -0
  18. data/docs/DISTRIBUTED_TRACING.md +44 -0
  19. data/docs/LIMITATIONS.md +13 -0
  20. data/docs/METRICS_DSL.md +84 -0
  21. data/docs/PERFORMANCE.md +60 -0
  22. data/docs/PII_FILTERING.md +40 -0
  23. data/docs/PRESETS.md +65 -0
  24. data/docs/QUICK-START.md +546 -587
  25. data/docs/RAILS_INTEGRATION.md +29 -0
  26. data/docs/SCHEMA_VALIDATION.md +63 -0
  27. data/docs/SLO-PROMQL-ALERTS.md +161 -0
  28. data/docs/TESTING.md +69 -0
  29. data/docs/{ADR-001-architecture.md → architecture/ADR-001-architecture.md} +35 -64
  30. data/docs/{ADR-002-metrics-yabeda.md → architecture/ADR-002-metrics-yabeda.md} +62 -236
  31. data/docs/{ADR-003-slo-observability.md → architecture/ADR-003-slo-observability.md} +27 -466
  32. data/docs/{ADR-004-adapter-architecture.md → architecture/ADR-004-adapter-architecture.md} +163 -146
  33. data/docs/{ADR-005-tracing-context.md → architecture/ADR-005-tracing-context.md} +10 -9
  34. data/docs/{ADR-006-security-compliance.md → architecture/ADR-006-security-compliance.md} +184 -191
  35. data/docs/{ADR-007-opentelemetry-integration.md → architecture/ADR-007-opentelemetry-integration.md} +3 -21
  36. data/docs/{ADR-008-rails-integration.md → architecture/ADR-008-rails-integration.md} +209 -339
  37. data/docs/{ADR-009-cost-optimization.md → architecture/ADR-009-cost-optimization.md} +45 -54
  38. data/docs/architecture/ADR-010-developer-experience.md +522 -0
  39. data/docs/{ADR-011-testing-strategy.md → architecture/ADR-011-testing-strategy.md} +41 -83
  40. data/docs/{ADR-013-reliability-error-handling.md → architecture/ADR-013-reliability-error-handling.md} +37 -12
  41. data/docs/{ADR-014-event-driven-slo.md → architecture/ADR-014-event-driven-slo.md} +12 -24
  42. data/docs/{ADR-015-middleware-order.md → architecture/ADR-015-middleware-order.md} +23 -41
  43. data/docs/{ADR-016-self-monitoring-slo.md → architecture/ADR-016-self-monitoring-slo.md} +52 -349
  44. data/docs/{ADR-017-multi-rails-compatibility.md → architecture/ADR-017-multi-rails-compatibility.md} +4 -11
  45. data/docs/architecture/ADR-018-memory-optimization.md +366 -0
  46. data/docs/{ADR-INDEX.md → architecture/ADR-INDEX.md} +11 -6
  47. data/docs/{00-ICP-AND-TIMELINE.md → prd/00-ICP-AND-TIMELINE.md} +6 -6
  48. data/docs/{01-SCALE-REQUIREMENTS.md → prd/01-SCALE-REQUIREMENTS.md} +6 -6
  49. data/docs/prd/01-overview-vision.md +19 -14
  50. data/docs/use_cases/README.md +22 -23
  51. data/docs/use_cases/UC-001-request-scoped-debug-buffering.md +50 -44
  52. data/docs/use_cases/UC-002-business-event-tracking.md +26 -95
  53. data/docs/use_cases/UC-003-event-metrics.md +66 -0
  54. data/docs/use_cases/UC-004-zero-config-slo-tracking.md +42 -101
  55. data/docs/use_cases/UC-005-sentry-integration.md +13 -15
  56. data/docs/use_cases/UC-006-trace-context-management.md +30 -28
  57. data/docs/use_cases/UC-007-pii-filtering.md +35 -87
  58. data/docs/use_cases/UC-008-opentelemetry-integration.md +51 -89
  59. data/docs/use_cases/UC-009-multi-service-tracing.md +4 -4
  60. data/docs/use_cases/UC-010-background-job-tracking.md +5 -5
  61. data/docs/use_cases/UC-011-rate-limiting.md +95 -168
  62. data/docs/use_cases/UC-012-audit-trail.md +21 -46
  63. data/docs/use_cases/UC-013-high-cardinality-protection.md +29 -167
  64. data/docs/use_cases/UC-014-adaptive-sampling.md +2 -2
  65. data/docs/use_cases/UC-015-cost-optimization.md +46 -99
  66. data/docs/use_cases/UC-016-rails-logger-migration.md +39 -213
  67. data/docs/use_cases/UC-017-local-development.md +203 -777
  68. data/docs/use_cases/UC-018-testing-events.md +3 -3
  69. data/docs/use_cases/UC-019-retention-based-routing.md +53 -106
  70. data/docs/use_cases/UC-020-event-versioning.md +8 -9
  71. data/docs/use_cases/UC-021-error-handling-retry-dlq.md +18 -22
  72. data/docs/use_cases/UC-022-event-registry.md +15 -21
  73. data/docs/use_cases/backlog.md +119 -87
  74. data/e11y.gemspec +2 -2
  75. data/gems/e11y-devtools/README.md +136 -0
  76. data/gems/e11y-devtools/config/routes.rb +8 -0
  77. data/gems/e11y-devtools/e11y-devtools.gemspec +25 -0
  78. data/gems/e11y-devtools/exe/e11y +34 -0
  79. data/gems/e11y-devtools/lib/e11y/devtools/mcp/server.rb +96 -0
  80. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tool_base.rb +25 -0
  81. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/clear.rb +31 -0
  82. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/errors.rb +35 -0
  83. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/event_detail.rb +33 -0
  84. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/events_by_trace.rb +33 -0
  85. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/interactions.rb +40 -0
  86. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/recent_events.rb +34 -0
  87. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/search.rb +34 -0
  88. data/gems/e11y-devtools/lib/e11y/devtools/mcp/tools/stats.rb +30 -0
  89. data/gems/e11y-devtools/lib/e11y/devtools/overlay/assets/overlay.js +115 -0
  90. data/gems/e11y-devtools/lib/e11y/devtools/overlay/controller.rb +54 -0
  91. data/gems/e11y-devtools/lib/e11y/devtools/overlay/engine.rb +26 -0
  92. data/gems/e11y-devtools/lib/e11y/devtools/overlay/middleware.rb +80 -0
  93. data/gems/e11y-devtools/lib/e11y/devtools/overlay/rails_controller.rb +42 -0
  94. data/gems/e11y-devtools/lib/e11y/devtools/tui/app.rb +262 -0
  95. data/gems/e11y-devtools/lib/e11y/devtools/tui/grouping.rb +66 -0
  96. data/gems/e11y-devtools/lib/e11y/devtools/tui/widgets/event_detail.rb +62 -0
  97. data/gems/e11y-devtools/lib/e11y/devtools/tui/widgets/event_list.rb +70 -0
  98. data/gems/e11y-devtools/lib/e11y/devtools/tui/widgets/interaction_list.rb +47 -0
  99. data/gems/e11y-devtools/lib/e11y/devtools/version.rb +8 -0
  100. data/gems/e11y-devtools/lib/e11y/devtools.rb +13 -0
  101. data/gems/e11y-devtools/spec/e11y/devtools/mcp/tools_spec.rb +107 -0
  102. data/gems/e11y-devtools/spec/e11y/devtools/overlay/controller_spec.rb +58 -0
  103. data/gems/e11y-devtools/spec/e11y/devtools/overlay/middleware_spec.rb +46 -0
  104. data/gems/e11y-devtools/spec/e11y/devtools/tui/app_spec.rb +85 -0
  105. data/gems/e11y-devtools/spec/e11y/devtools/tui/grouping_spec.rb +64 -0
  106. data/gems/e11y-devtools/spec/spec_helper.rb +5 -0
  107. data/gems/e11y-devtools/spec/tui/widgets/event_list_spec.rb +44 -0
  108. data/gems/e11y-devtools/spec/tui/widgets/interaction_list_spec.rb +62 -0
  109. data/lib/e11y/adapters/audit_encrypted.rb +53 -11
  110. data/lib/e11y/adapters/base.rb +33 -34
  111. data/lib/e11y/adapters/dev_log/file_store.rb +143 -0
  112. data/lib/e11y/adapters/dev_log/query.rb +219 -0
  113. data/lib/e11y/adapters/dev_log.rb +118 -0
  114. data/lib/e11y/adapters/file.rb +3 -6
  115. data/lib/e11y/adapters/in_memory.rb +52 -5
  116. data/lib/e11y/adapters/in_memory_test.rb +29 -0
  117. data/lib/e11y/adapters/loki.rb +58 -23
  118. data/lib/e11y/adapters/null.rb +82 -0
  119. data/lib/e11y/adapters/opentelemetry_collector.rb +183 -0
  120. data/lib/e11y/adapters/otel_logs.rb +136 -23
  121. data/lib/e11y/adapters/sentry.rb +4 -7
  122. data/lib/e11y/adapters/stdout.rb +73 -7
  123. data/lib/e11y/adapters/yabeda.rb +153 -29
  124. data/lib/e11y/buffers/adaptive_buffer.rb +3 -17
  125. data/lib/e11y/buffers/{request_scoped_buffer.rb → ephemeral_buffer.rb} +72 -58
  126. data/lib/e11y/buffers/ring_buffer.rb +3 -16
  127. data/lib/e11y/configuration.rb +272 -0
  128. data/lib/e11y/console.rb +10 -17
  129. data/lib/e11y/current.rb +53 -1
  130. data/lib/e11y/debug/pipeline_inspector.rb +96 -0
  131. data/lib/e11y/documentation/generator.rb +48 -0
  132. data/lib/e11y/event/base.rb +176 -82
  133. data/lib/e11y/event/value_sampling_config.rb +1 -5
  134. data/lib/e11y/events/rails/database/query.rb +1 -4
  135. data/lib/e11y/events/rails/job/failed.rb +2 -0
  136. data/lib/e11y/instruments/active_job.rb +46 -12
  137. data/lib/e11y/instruments/rails_instrumentation.rb +49 -24
  138. data/lib/e11y/instruments/sidekiq.rb +137 -31
  139. data/lib/e11y/linters/base.rb +11 -0
  140. data/lib/e11y/linters/pii/pii_declaration_linter.rb +120 -0
  141. data/lib/e11y/linters/slo/config_consistency_linter.rb +76 -0
  142. data/lib/e11y/linters/slo/explicit_declaration_linter.rb +36 -0
  143. data/lib/e11y/linters/slo/slo_status_from_linter.rb +41 -0
  144. data/lib/e11y/logger/bridge.rb +26 -7
  145. data/lib/e11y/metrics/cardinality_protection.rb +10 -15
  146. data/lib/e11y/metrics/cardinality_tracker.rb +16 -6
  147. data/lib/e11y/metrics/registry.rb +3 -5
  148. data/lib/e11y/metrics/test_backend.rb +62 -0
  149. data/lib/e11y/metrics.rb +56 -10
  150. data/lib/e11y/middleware/adapter_resolver.rb +40 -0
  151. data/lib/e11y/middleware/audit_signing.rb +43 -6
  152. data/lib/e11y/middleware/baggage_protection.rb +75 -0
  153. data/lib/e11y/middleware/dev_log_source.rb +24 -0
  154. data/lib/e11y/middleware/event_slo.rb +23 -9
  155. data/lib/e11y/middleware/otel_span.rb +23 -0
  156. data/lib/e11y/middleware/pii_filter.rb +104 -75
  157. data/lib/e11y/middleware/rate_limiting.rb +54 -27
  158. data/lib/e11y/middleware/request.rb +70 -23
  159. data/lib/e11y/middleware/routing.rb +78 -21
  160. data/lib/e11y/middleware/sampling.rb +66 -17
  161. data/lib/e11y/middleware/self_monitoring_emit.rb +39 -0
  162. data/lib/e11y/middleware/trace_context.rb +45 -10
  163. data/lib/e11y/middleware/track_latency.rb +34 -0
  164. data/lib/e11y/middleware/validation.rb +7 -16
  165. data/lib/e11y/middleware/versioning.rb +26 -22
  166. data/lib/e11y/opentelemetry/semantic_conventions.rb +109 -0
  167. data/lib/e11y/opentelemetry/span_creator.rb +142 -0
  168. data/lib/e11y/pii/patterns.rb +12 -1
  169. data/lib/e11y/pipeline/builder.rb +1 -1
  170. data/lib/e11y/presets/audit_event.rb +13 -2
  171. data/lib/e11y/railtie.rb +52 -15
  172. data/lib/e11y/registry.rb +306 -0
  173. data/lib/e11y/reliability/circuit_breaker.rb +19 -21
  174. data/lib/e11y/reliability/dlq/base.rb +71 -0
  175. data/lib/e11y/reliability/dlq/file_adapter.rb +301 -0
  176. data/lib/e11y/reliability/dlq/file_storage.rb +63 -34
  177. data/lib/e11y/reliability/dlq/filter.rb +37 -54
  178. data/lib/e11y/reliability/retry_handler.rb +26 -29
  179. data/lib/e11y/reliability/retry_rate_limiter.rb +3 -11
  180. data/lib/e11y/sampling/error_spike_detector.rb +0 -2
  181. data/lib/e11y/sampling/load_monitor.rb +5 -9
  182. data/lib/e11y/sampling/stratified_tracker.rb +18 -0
  183. data/lib/e11y/self_monitoring/buffer_monitor.rb +2 -0
  184. data/lib/e11y/self_monitoring/performance_monitor.rb +19 -61
  185. data/lib/e11y/self_monitoring/reliability_monitor.rb +4 -74
  186. data/lib/e11y/slo/config_loader.rb +40 -0
  187. data/lib/e11y/slo/config_validator.rb +58 -0
  188. data/lib/e11y/slo/dashboard_generator.rb +122 -0
  189. data/lib/e11y/slo/event_driven.rb +8 -0
  190. data/lib/e11y/slo/tracker.rb +31 -4
  191. data/lib/e11y/testing/have_tracked_event_matcher.rb +190 -0
  192. data/lib/e11y/testing/rspec_matchers.rb +21 -0
  193. data/lib/e11y/testing/snapshot_matcher.rb +86 -0
  194. data/lib/e11y/trace_context/sampler.rb +35 -0
  195. data/lib/e11y/tracing/faraday_middleware.rb +31 -0
  196. data/lib/e11y/tracing/net_http_patch.rb +33 -0
  197. data/lib/e11y/tracing/propagator.rb +116 -0
  198. data/lib/e11y/tracing.rb +47 -0
  199. data/lib/e11y/version.rb +1 -1
  200. data/lib/e11y/versioning/version_extractor.rb +32 -0
  201. data/lib/e11y.rb +141 -265
  202. data/lib/generators/e11y/event/event_generator.rb +22 -0
  203. data/lib/generators/e11y/event/templates/event.rb.tt +16 -0
  204. data/lib/generators/e11y/grafana_dashboard/grafana_dashboard_generator.rb +30 -0
  205. data/lib/generators/e11y/grafana_dashboard/templates/e11y_dashboard.json +81 -0
  206. data/lib/generators/e11y/install/install_generator.rb +34 -0
  207. data/lib/generators/e11y/install/templates/e11y.rb +239 -0
  208. data/lib/generators/e11y/prometheus_alerts/prometheus_alerts_generator.rb +29 -0
  209. data/lib/generators/e11y/prometheus_alerts/templates/e11y_alerts.yml +28 -0
  210. data/lib/tasks/e11y_docs.rake +30 -0
  211. data/lib/tasks/e11y_events.rake +71 -0
  212. data/lib/tasks/e11y_lint.rake +91 -0
  213. data/lib/tasks/e11y_slo.rake +29 -0
  214. metadata +129 -39
  215. data/docs/ADR-010-developer-experience.md +0 -2166
  216. data/docs/API-REFERENCE-L28.md +0 -914
  217. data/docs/COMPREHENSIVE-CONFIGURATION.md +0 -2366
  218. data/docs/CONTRIBUTING.md +0 -312
  219. data/docs/IMPLEMENTATION_NOTES.md +0 -2804
  220. data/docs/IMPLEMENTATION_PLAN.md +0 -1971
  221. data/docs/IMPLEMENTATION_PLAN_ARCHITECTURE.md +0 -586
  222. data/docs/PLAN.md +0 -148
  223. data/docs/README.md +0 -296
  224. data/docs/design/00-memory-optimization.md +0 -593
  225. data/docs/guides/MIGRATION-L27-L28.md +0 -692
  226. data/docs/guides/PERFORMANCE-BENCHMARKS.md +0 -434
  227. data/docs/guides/README.md +0 -44
  228. data/docs/use_cases/UC-003-pattern-based-metrics.md +0 -1627
  229. data/lib/e11y/adapters/registry.rb +0 -141
  230. data/docs/{ADR-012-event-evolution.md → architecture/ADR-012-event-evolution.md} +0 -0
data/docs/use_cases/UC-011-rate-limiting.md +95 -168
@@ -381,7 +381,7 @@ end
 
 ### Layer 4: DLQ Filter Integration (C02 Resolution) ⚠️
 
-> **Reference:** See [ADR-013 §4.6: Rate Limiting × DLQ Filter](../ADR-013-reliability-error-handling.md#46-rate-limiting--dlq-filter-interaction-c02-resolution) for full architecture.
+> **Reference:** See [ADR-013 §4.6: Rate Limiting × DLQ Filter](../architecture/ADR-013-reliability-error-handling.md#46-rate-limiting--dlq-filter-interaction-c02-resolution) for full architecture.
 
 **Problem:** Rate limiting drops events BEFORE they reach DLQ filter. Critical events (e.g., payments) may be lost during traffic spikes, even though DLQ filter says "always save payments".
 
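A minimal sketch of the interaction described in the Problem statement, assuming a hypothetical `DlqFilter` with an always-save list and a hypothetical `RateLimitedDropHandler`. None of these names come from E11y. It shows a rate-limited event being diverted to a dead-letter store instead of being dropped silently.

```ruby
# Hypothetical sketch: class and method names are illustrative, not E11y's API.
class DlqFilter
  def initialize(always_save:)
    @always_save = always_save # event-name prefixes that must never be lost
  end

  # True when the event matches one of the always-save prefixes.
  def save?(event_data)
    name = event_data[:event_name].to_s
    @always_save.any? { |prefix| name.start_with?(prefix) }
  end
end

class RateLimitedDropHandler
  attr_reader :dlq

  def initialize(filter)
    @filter = filter
    @dlq = [] # stand-in for a real dead-letter store
  end

  # Called when the rate limiter rejects an event.
  # Returns :dlq if the event was preserved, :dropped otherwise.
  def handle(event_data)
    if @filter.save?(event_data)
      @dlq << event_data
      :dlq
    else
      :dropped
    end
  end
end

handler = RateLimitedDropHandler.new(DlqFilter.new(always_save: ["payment."]))
payment_result = handler.handle({ event_name: "payment.failed", amount: 100 })
debug_result = handler.handle({ event_name: "debug.cache_miss" })
```

The point of the design is ordering: the filter is consulted on the rejection path, so a critical event is preserved even when the limiter has already refused it.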
@@ -472,7 +472,7 @@ end
 
 ### Layer 5: Retry Rate Limiting (C06 Resolution) ⚠️
 
-> **Reference:** See [ADR-013 §3.5: Retry Rate Limiting](../ADR-013-reliability-error-handling.md#35-retry-rate-limiting-c06-resolution) for full architecture.
+> **Reference:** See [ADR-013 §3.5: Retry Rate Limiting](../architecture/ADR-013-reliability-error-handling.md#35-retry-rate-limiting-c06-resolution) for full architecture.
 
 **Problem:** Adapter failures trigger retries. If 1000 events fail → 3000 retry attempts (thundering herd) → buffer overflow.
 
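The amplification in the Problem statement is plain arithmetic: 1000 failed events with 3 retries each give 3000 retry attempts. The sketch below works through it and shows how a per-window retry cap would bound the burst. `max_retries` and `retry_cap_per_window` are illustrative parameters, not E11y settings.

```ruby
# Illustrative arithmetic only; parameter names are assumptions.

# Total retry attempts generated by a batch of failed events.
def retry_attempts(failed_events:, max_retries:)
  failed_events * max_retries
end

# Retries actually admitted in one window when a cap is applied.
def admitted_retries(attempts:, retry_cap_per_window:)
  [attempts, retry_cap_per_window].min
end

attempts = retry_attempts(failed_events: 1_000, max_retries: 3)
admitted = admitted_retries(attempts: attempts, retry_cap_per_window: 50)
deferred = attempts - admitted # retries that must wait or go to a DLQ
```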
@@ -732,130 +732,85 @@ end
 
 ---
 
-## 📊 Implementation with Redis
+## 📊 Implementation: In-Memory Token Bucket
 
-**Production-ready implementation using Redis:**
+**Current implementation uses in-memory token bucket algorithm (no Redis dependency):**
 
 ```ruby
-# lib/e11y/processing/rate_limiter.rb
+# lib/e11y/middleware/rate_limiting.rb
 module E11y
-  module Processing
-    class RateLimiter
-      def initialize(redis: Redis.new)
-        @redis = redis
-        @config = E11y.config.rate_limiting
-      end
-
-      def allowed?(event)
-        # Check bypass rules first
-        return true if bypassed?(event)
-
-        # Check global limit
-        return false unless check_global_limit(event)
-
-        # Check per-event limit
-        return false unless check_per_event_limit(event)
-
-        # Check per-context limits
-        return false unless check_per_context_limits(event)
-
-        true
-      end
-
-      private
-
-      def check_global_limit(event)
-        key = 'e11y:rate_limit:global'
-        limit = @config.global_limit
-        window = @config.global_window
-
-        check_limit(key, limit, window)
-      end
-
-      def check_per_event_limit(event)
-        limit_config = @config.per_event_limits[event.event_name]
-        return true unless limit_config
-
-        key = "e11y:rate_limit:event:#{event.event_name}"
-        check_limit(key, limit_config[:limit], limit_config[:window])
-      end
-
-      def check_per_context_limits(event)
-        @config.per_context_limits.all? do |field, limit_config|
-          value = extract_context_value(event, field, limit_config[:extractor])
-          next true unless value
-
-          key = "e11y:rate_limit:context:#{field}:#{value}"
-          check_limit(key, limit_config[:limit], limit_config[:window])
+  module Middleware
+    class RateLimiting < Base
+      def initialize(app, global_limit: 10_000, per_event_limit: 1_000, window: 1.0)
+        super(app)
+        @global_limit = global_limit
+        @per_event_limit = per_event_limit
+        @window = window
+
+        # Token buckets for rate limiting (in-memory)
+        @global_bucket = TokenBucket.new(
+          capacity: @global_limit,
+          refill_rate: @global_limit,
+          window: @window
+        )
+        @per_event_buckets = Hash.new do |hash, event_name|
+          hash[event_name] = TokenBucket.new(
+            capacity: @per_event_limit,
+            refill_rate: @per_event_limit,
+            window: @window
+          )
         end
+
+        @mutex = Mutex.new
       end
-
-      def check_limit(key, limit, window)
-        # Sliding window counter using Redis sorted sets
-        now = Time.now.to_f
-        window_start = now - window
-
-        # Remove old entries (outside window)
-        @redis.zremrangebyscore(key, 0, window_start)
-
-        # Count current entries
-        current_count = @redis.zcard(key)
-
-        if current_count < limit
-          # Add new entry
-          @redis.zadd(key, now, "#{now}-#{SecureRandom.hex(8)}")
-          @redis.expire(key, window.to_i + 60) # TTL = window + buffer
-          true
-        else
-          # Limit exceeded
-          handle_exceeded(key, current_count, limit)
-          false
+
+      def call(event_data)
+        event_name = event_data[:event_name]
+
+        # Check global rate limit
+        unless @global_bucket.allow?
+          handle_rate_limited(event_data, :global)
+          return nil
         end
-      end
-
-      def handle_exceeded(key, current, limit)
-        # Track metric
-        Yabeda.e11y_internal.rate_limit_hits_total.increment(
-          limit_type: extract_limit_type(key),
-          key: key
-        )
-
-        # Log warning
-        E11y.logger.warn(
-          "[E11y] Rate limit exceeded: #{key} (#{current}/#{limit})"
-        )
-
-        # Alert if configured
-        if @config.alert_on_limit
-          alert_rate_limit_exceeded(key, current, limit)
+
+        # Check per-event rate limit
+        per_event_bucket = @mutex.synchronize { @per_event_buckets[event_name] }
+        unless per_event_bucket.allow?
+          handle_rate_limited(event_data, :per_event)
+          return nil
         end
+
+        # Rate limit not exceeded - continue pipeline
+        event_data
       end
-
-      def bypassed?(event)
-        # Check bypass rules
-        @config.bypass_rules.any? do |rule|
-          case rule[:type]
-          when :event_types
-            rule[:values].include?(event.event_name)
-          when :severities
-            rule[:values].include?(event.severity)
-          when :contexts
-            rule[:values].all? { |k, v| event.context[k] == v }
-          when :custom
-            rule[:condition].call(event)
-          end
-        end
+
+      private
+
+      def handle_rate_limited(event_data, limit_type)
+        # C02 Resolution: Check if event should be saved to DLQ
+        return unless should_save_to_dlq?(event_data)
+
+        save_to_dlq(event_data, limit_type)
       end
     end
   end
 end
 ```
 
+**Why In-Memory Token Bucket?**
+- ✅ **Fast:** No network latency (O(1) operations)
+- ✅ **Simple:** No external dependencies (Redis not required)
+- ✅ **Thread-safe:** Mutex-protected token buckets
+- ✅ **Smooth rate limiting:** Token bucket avoids bursty behavior
+- ⚠️ **Trade-off:** Per-process limits (not shared across instances)
+
+**Note:** In-memory rate limiting is sufficient for most use cases. Each application process maintains its own rate limits, which is appropriate for event tracking workloads.
+
 ---
 
 ## 🔧 Implementation Details
 
-> **Implementation:** See [ADR-006 Section 4.0: Rate Limiting + Retry Policy Resolution](../ADR-006-security-compliance.md#40-rate-limiting--retry-policy-resolution-conflict-14) for detailed architecture.
+> **Implementation:** See [ADR-006 Section 4.0: Rate Limiting + Retry Policy Resolution](../architecture/ADR-006-security-compliance.md#40-rate-limiting--retry-policy-resolution-conflict-14) for detailed architecture.
 
 ### Middleware Flow
 
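Because each process keeps its own buckets, the effective cluster-wide ceiling grows with the number of processes. A rough sketch of that arithmetic, using the `global_limit: 10_000` default from the constructor in this hunk. The process and host counts are made-up deployment figures, and both helper names are hypothetical.

```ruby
# Effective ceiling when limits are per process (not shared across instances).
def cluster_ceiling(per_process_limit:, processes_per_host:, hosts:)
  per_process_limit * processes_per_host * hosts
end

# To approximate a cluster-wide budget, divide it across all processes.
# Integer division rounds down, so the real total stays under the budget.
def per_process_limit_for(cluster_budget:, processes_per_host:, hosts:)
  cluster_budget / (processes_per_host * hosts)
end

ceiling = cluster_ceiling(per_process_limit: 10_000, processes_per_host: 4, hosts: 3)
per_process = per_process_limit_for(cluster_budget: 10_000, processes_per_host: 4, hosts: 3)
```

With 12 processes the default limit admits up to 120,000 events per window across the fleet. Dividing the budget only approximates a shared limit, since traffic is rarely spread evenly across processes.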
@@ -980,53 +935,27 @@ end
 
 ---
 
-### Redis-Based Rate Limiting
+### Token Bucket Algorithm
 
-E11y uses **Redis sorted sets** for distributed rate limiting across multiple application instances.
+E11y uses **in-memory token bucket algorithm** for rate limiting.
 
-**Algorithm: Sliding Window Counter**
+**How Token Bucket Works:**
+- Each bucket has a **capacity** (max tokens) and **refill rate** (tokens per second)
+- When event arrives: Check if token available → consume token if yes
+- Tokens refill continuously based on elapsed time
+- Allows burst traffic up to capacity, then smooth rate limiting
 
-```ruby
-def check_limit(key, limit, window)
-  now = Time.now.to_f
-  window_start = now - window
-
-  # 1. Remove expired entries (outside window)
-  redis.zremrangebyscore(key, 0, window_start)
-
-  # 2. Count current entries
-  current_count = redis.zcard(key)
-
-  # 3. Check limit
-  if current_count < limit
-    # Add new entry (score = timestamp, member = unique ID)
-    redis.zadd(key, now, "#{now}-#{SecureRandom.hex(8)}")
-    redis.expire(key, window.to_i + 60) # TTL cleanup
-    true # Allowed
-  else
-    false # Rate limited
-  end
-end
-```
+**Why Token Bucket?**
+- **Smooth rate limiting:** Avoids bursty behavior
+- **Burst support:** Allows traffic spikes up to capacity
+- **Fast:** O(1) operations (no external dependencies)
+- ✅ **Industry standard:** Used by Nginx, AWS API Gateway, Google Cloud
+- ⚠️ **Per-process limits:** Each application instance has separate limits
 
-**Why Sorted Sets?**
-- **Sliding window:** Accurate counting (no edge cases like fixed window)
-- **Distributed:** Works across multiple app instances
-- **Efficient:** O(log N) for add/remove operations
-- ✅ **Automatic cleanup:** Redis TTL handles old entries
-
-**Redis Keys:**
-```ruby
-# Global limit
-"e11y:rate_limit:global"
-
-# Per-event limit
-"e11y:rate_limit:event:payment.retry"
-
-# Per-context limit
-"e11y:rate_limit:context:user_id:user-123"
-"e11y:rate_limit:context:ip_address:192.168.1.100"
-```
+**Memory Usage:**
+- Global bucket: ~100 bytes (single TokenBucket instance)
+- Per-event buckets: ~100 bytes per unique event type (lazy initialization)
+- Example: 100 event types × 100 bytes = ~10KB total
 
 ---
 
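The middleware in this diff calls `TokenBucket.new(capacity:, refill_rate:, window:)` and `#allow?`, but the class body is not part of the diff. The following is a sketch of how such a bucket could work, not the E11y implementation. It assumes `refill_rate` means tokens added per `window` seconds, and it adds an injectable `clock:` argument (an assumption of this sketch) so the behaviour can be checked deterministically.

```ruby
# Sketch of a token bucket; NOT the E11y implementation.
class TokenBucket
  # capacity:    maximum number of tokens the bucket can hold
  # refill_rate: tokens added per `window` seconds (assumed meaning)
  # window:      length of the refill window in seconds
  # clock:       callable returning seconds (added here for testability)
  def initialize(capacity:, refill_rate:, window: 1.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @capacity = capacity.to_f
    @tokens_per_second = refill_rate.to_f / window
    @clock = clock
    @tokens = @capacity # start full, so an initial burst is allowed
    @last_refill = @clock.call
    @mutex = Mutex.new
  end

  # Consumes one token if available; returns true when the event may pass.
  def allow?
    @mutex.synchronize do
      refill
      if @tokens >= 1.0
        @tokens -= 1.0
        true
      else
        false
      end
    end
  end

  private

  # Adds tokens in proportion to elapsed time, capped at capacity.
  def refill
    now = @clock.call
    elapsed = now - @last_refill
    @tokens = [@capacity, @tokens + elapsed * @tokens_per_second].min
    @last_refill = now
  end
end

fake_now = 0.0
bucket = TokenBucket.new(capacity: 2, refill_rate: 2, window: 1.0, clock: -> { fake_now })
burst = [bucket.allow?, bucket.allow?, bucket.allow?]
fake_now += 0.5 # half a window later: one token has been refilled
after_wait = [bucket.allow?, bucket.allow?]
```

A monotonic clock is used as the default so that wall-clock adjustments cannot produce negative elapsed time. Holding the mutex inside `allow?` keeps the refill and the consumption atomic for each bucket.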
@@ -1095,30 +1024,28 @@ end
 # Overhead: ~0.5μs (5% increase)
 ```
 
-**Redis Latency:**
+**In-Memory Performance:**
 ```ruby
-# Redis operations per event (within limit):
-# 1. ZREMRANGEBYSCORE (cleanup) ~0.1ms
-# 2. ZCARD (count) ~0.05ms
-# 3. ZADD (add entry) ~0.05ms
-# 4. EXPIRE (set TTL) ~0.05ms
-# Total: ~0.25ms per event
+# Token bucket operations per event (within limit):
+# 1. Mutex lock ~0.001ms
+# 2. Refill calculation ~0.001ms
+# 3. Token consumption ~0.001ms
+# Total: ~0.003ms per event (3000x faster than Redis!)
 
 # When rate limited:
-# 1. ZREMRANGEBYSCORE ~0.1ms
-# 2. ZCARD ~0.05ms
-# Total: ~0.15ms (no write)
+# 1. Mutex lock ~0.001ms
+# 2. Refill calculation ~0.001ms
+# 3. Check tokens (0 available) ~0.001ms
+# Total: ~0.003ms (no network overhead)
 ```
 
 **Scaling:**
 ```ruby
-# Redis memory usage:
-# - Global limit (10k events/min): ~500KB
-# - Per-event limit (100/min): ~5KB per event type
-# - Per-context limit (1k/min): ~50KB per user
-#
-# Example: 1000 users × 50KB = 50MB
-# → Acceptable for most deployments
+# Memory usage (in-memory):
+# - Global bucket: ~100 bytes
+# - Per-event bucket: ~100 bytes per unique event type
+# - Example: 100 event types × 100 bytes = ~10KB total
+# → Negligible memory footprint
 ```
 
 ---
@@ -1198,7 +1125,7 @@ end
 
 ## 📊 Self-Monitoring & Metrics
 
-> **Implementation:** See [ADR-006 Section 4: Rate Limiting](../ADR-006-security-compliance.md#4-rate-limiting) for detailed architecture.
+> **Implementation:** See [ADR-006 Section 4: Rate Limiting](../architecture/ADR-006-security-compliance.md#4-rate-limiting) for detailed architecture.
 
 E11y provides comprehensive self-monitoring metrics for rate limiting. These metrics help you understand rate limit behavior, detect attacks, and optimize limits.
 
data/docs/use_cases/UC-012-audit-trail.md +21 -46
@@ -240,7 +240,7 @@ end
 > - ✅ Separate storage (isolated from app DB)
 > - ✅ Long retention (7-10 years)
 >
-> **Implementation:** See [ADR-015 §3.3: Audit Event Pipeline Separation](../ADR-015-middleware-order.md#33-audit-event-pipeline-separation-c01-resolution) for full architecture.
+> **Implementation:** See [ADR-015 §3.3: Audit Event Pipeline Separation](../architecture/ADR-015-middleware-order.md#33-audit-event-pipeline-separation-c01-resolution) for full architecture.
 
 ---
 
@@ -424,15 +424,11 @@ Events::UserDeleted.track(
 E11y.configure do |config|
   config.audit_trail do
     # Separate storage for audit events
-    storage adapter: :postgresql, # OR :s3, :file
+    storage adapter: :postgresql, # OR :file
             table: 'audit_events',
             read_only: true # Can't UPDATE/DELETE
 
-    # S3 with object lock (true immutability)
-    # storage adapter: :s3,
-    # bucket: 'company-audit-trail',
-    # object_lock: true, # WORM (Write Once Read Many)
-    # retention_period: 7.years
+    # Object storage with WORM (external; E11y uses retention_until for archival filtering)
   end
 end
 
@@ -568,14 +564,13 @@ E11y.configure do |config|
 
     # === ARCHIVAL ===
     archive_after 1.year,
-                  to: :s3_glacier, # Cheaper cold storage
-                  bucket: 'company-audit-archive'
+                  to: :archive, # Cold storage (external job filters by retention_until)
   end
 end
 
 # How it works:
-# 1. Events stored in hot storage (PostgreSQL/S3)
-# 2. After 1 year → moved to cold storage (Glacier)
+# 1. Events stored in hot storage (PostgreSQL/File)
+# 2. After 1 year → moved to cold storage (archival job filters by retention_until)
 # 3. After retention period → permanently deleted
 # 4. Deletion logged as audit event (audit the audit!)
 ```
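A sketch of what the selection step of such an external archival job might look like, assuming events are plain hashes carrying ISO8601 `timestamp` and `retention_until` strings. The `classify` helper and the tier names are illustrative. Per the comments in this hunk, E11y's own contribution is the `retention_until` field, and the job itself lives outside the gem.

```ruby
require "time"

ONE_YEAR = 365 * 24 * 60 * 60 # seconds; approximates `archive_after 1.year`

# Returns :expired, :archive, or :hot for a single event hash.
def classify(event, now:, archive_after: ONE_YEAR)
  retention_until = Time.iso8601(event[:retention_until])
  recorded_at = Time.iso8601(event[:timestamp])

  # Past the retention date: eligible for permanent deletion.
  return :expired if now >= retention_until
  # Older than the archive threshold but still retained: cold storage.
  return :archive if now - recorded_at >= archive_after

  :hot
end

now = Time.iso8601("2026-01-01T00:00:00Z")
events = [
  { event_id: "a", timestamp: "2025-11-01T00:00:00Z", retention_until: "2032-11-01T00:00:00Z" },
  { event_id: "b", timestamp: "2024-06-01T00:00:00Z", retention_until: "2031-06-01T00:00:00Z" },
  { event_id: "c", timestamp: "2018-01-01T00:00:00Z", retention_until: "2025-01-01T00:00:00Z" }
]

# Group event ids by the tier each event falls into.
tiers = events.group_by { |e| classify(e, now: now) }
              .transform_values { |list| list.map { |e| e[:event_id] } }
```

Passing `now:` explicitly keeps the classification reproducible, which matters when the decision to delete audit data has to be explained later.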
@@ -1239,7 +1234,7 @@ end
 
 ## 🔧 Implementation Details
 
-> **Implementation:** See [ADR-006 Section 5: Audit Trail](../ADR-006-security-compliance.md#5-audit-trail) for detailed architecture.
+> **Implementation:** See [ADR-006 Section 5: Audit Trail](../architecture/ADR-006-security-compliance.md#5-audit-trail) for detailed architecture.
 
 ### Audit Middleware Architecture
 
@@ -1554,17 +1549,18 @@ module E11y
 end
 ```
 
-**3. S3 Audit Adapter (Cloud, Object Lock WORM)**
+**3. Object Storage Audit Adapter (conceptual; not in E11y)**
+
+> E11y does not provide an S3/object-storage adapter. For cloud WORM storage, use OTel Collector's object-storage exporter, or an external archival job that filters Loki by `retention_until`. Events carry `retention_until` (ISO8601) for easy filtering.
 
 ```ruby
-# lib/e11y/adapters/s3_audit.rb
+# Conceptual: Object storage with WORM (e.g., S3 Object Lock)
+# E11y does NOT implement this — use external archival
 module E11y
   module Adapters
-    class S3Audit < Base
+    class ObjectStorageAudit < Base # Conceptual only
       def initialize(config)
         @bucket = config.bucket
-        @s3_client = Aws::S3::Client.new
-        @object_lock = config.object_lock || true
         @retention_days = config.retention_days || 2555 # 7 years
       end
 
@@ -1573,37 +1569,16 @@ module E11y
       end
 
       def write(event_data)
-        object_key = audit_object_key(event_data)
-
-        @s3_client.put_object(
-          bucket: @bucket,
-          key: object_key,
-          body: event_data.to_json,
-          content_type: 'application/json',
-
-          # WORM: Object Lock prevents deletion
-          object_lock_mode: 'GOVERNANCE', # OR 'COMPLIANCE' (stricter)
-          object_lock_retain_until_date: @retention_days.days.from_now,
-
-          # Metadata for audit
-          metadata: {
-            'event-id' => event_data[:event_id],
-            'event-name' => event_data[:event_name],
-            'signed-at' => event_data[:signed_at],
-            'signature-algorithm' => event_data[:signature_algorithm]
-          }
-        )
+        # Filter by retention_until for archival decisions
+        # object_key = "#{event_data[:retention_until]}/#{event_data[:event_id]}.json"
+        # ... PUT to object storage with WORM ...
       end
 
       private
 
       def audit_object_key(event_data)
-        timestamp = event_data[:timestamp]
-        date = timestamp.to_date
-        hour = timestamp.hour
-
-        # Partition by date and hour for efficient queries
-        "audit/#{date.strftime('%Y/%m/%d')}/hour=#{hour.to_s.rjust(2, '0')}/#{event_data[:event_id]}.json"
+        ts = Time.parse(event_data[:timestamp])
+        "audit/#{ts.strftime('%Y/%m/%d')}/#{event_data[:event_id]}.json"
       end
     end
   end
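The new `audit_object_key` calls `Time.parse`, which needs `require "time"` and expects the timestamp as a string. A small runnable sketch of the same key derivation follows. The `.utc` normalisation is an addition of this sketch, not part of the diff, and the event hashes are made-up examples.

```ruby
require "time"

# Builds a date-partitioned object key from an event hash.
# Normalising to UTC keeps the date partition stable regardless of the
# offset carried by the incoming timestamp (a choice made in this sketch).
def audit_object_key(event_data)
  ts = Time.parse(event_data[:timestamp]).utc
  "audit/#{ts.strftime('%Y/%m/%d')}/#{event_data[:event_id]}.json"
end

key = audit_object_key({ event_id: "evt-1", timestamp: "2024-03-15T10:30:00Z" })
# 01:00 at +09:00 is still 16:00 on the previous day in UTC.
late_key = audit_object_key({ event_id: "evt-2", timestamp: "2024-03-16T01:00:00+09:00" })
```

Without the UTC step, `strftime` uses the offset parsed from the string, so two events from the same instant could land in different date partitions.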
@@ -1715,7 +1690,7 @@ end
 
 ## ⚡ Performance Guarantees
 
-> **Implementation:** See [ADR-006 Section 5.2: Cryptographic Signing](../ADR-006-security-compliance.md#52-cryptographic-signing) for detailed architecture.
+> **Implementation:** See [ADR-006 Section 5.2: Cryptographic Signing](../architecture/ADR-006-security-compliance.md#52-cryptographic-signing) for detailed architecture.
 
 E11y audit trail is designed for **high-performance production environments** with strict SLOs. Audit events must not significantly impact application latency.
 
@@ -1880,10 +1855,10 @@ end
 |-----------------|---------------------|------------|----------|
 | **File (append-only)** | 1-2ms | 10,000/sec | Simple, local, fast |
 | **PostgreSQL** | 2-5ms | 5,000/sec | Queryable, ACID |
-| **S3 (Object Lock)** | 10-50ms | 1,000/sec | Cloud, immutable (WORM) |
+| **Object storage (WORM)** | 10-50ms | 1,000/sec | Cloud, immutable (external archival) |
 | **Elasticsearch** | 5-10ms | 3,000/sec | Full-text search |
 
-**Recommendation:** Use **File adapter** for lowest latency, **PostgreSQL** for queryability, **S3** for compliance (true WORM).
+**Recommendation:** Use **File adapter** for lowest latency, **PostgreSQL** for queryability. For cloud WORM, use external archival (filter by `retention_until`).
 
 ---