e11y 0.1.0

Files changed (157)
  1. checksums.yaml +7 -0
  2. data/.rspec +4 -0
  3. data/.rubocop.yml +69 -0
  4. data/CHANGELOG.md +26 -0
  5. data/CODE_OF_CONDUCT.md +64 -0
  6. data/LICENSE.txt +21 -0
  7. data/README.md +179 -0
  8. data/Rakefile +37 -0
  9. data/benchmarks/run_all.rb +33 -0
  10. data/config/README.md +83 -0
  11. data/config/loki-local-config.yaml +35 -0
  12. data/config/prometheus.yml +15 -0
  13. data/docker-compose.yml +78 -0
  14. data/docs/00-ICP-AND-TIMELINE.md +483 -0
  15. data/docs/01-SCALE-REQUIREMENTS.md +858 -0
  16. data/docs/ADR-001-architecture.md +2617 -0
  17. data/docs/ADR-002-metrics-yabeda.md +1395 -0
  18. data/docs/ADR-003-slo-observability.md +3337 -0
  19. data/docs/ADR-004-adapter-architecture.md +2385 -0
  20. data/docs/ADR-005-tracing-context.md +1372 -0
  21. data/docs/ADR-006-security-compliance.md +4143 -0
  22. data/docs/ADR-007-opentelemetry-integration.md +1385 -0
  23. data/docs/ADR-008-rails-integration.md +1911 -0
  24. data/docs/ADR-009-cost-optimization.md +2993 -0
  25. data/docs/ADR-010-developer-experience.md +2166 -0
  26. data/docs/ADR-011-testing-strategy.md +1836 -0
  27. data/docs/ADR-012-event-evolution.md +958 -0
  28. data/docs/ADR-013-reliability-error-handling.md +2750 -0
  29. data/docs/ADR-014-event-driven-slo.md +1533 -0
  30. data/docs/ADR-015-middleware-order.md +1061 -0
  31. data/docs/ADR-016-self-monitoring-slo.md +1234 -0
  32. data/docs/API-REFERENCE-L28.md +914 -0
  33. data/docs/COMPREHENSIVE-CONFIGURATION.md +2366 -0
  34. data/docs/IMPLEMENTATION_NOTES.md +2804 -0
  35. data/docs/IMPLEMENTATION_PLAN.md +1971 -0
  36. data/docs/IMPLEMENTATION_PLAN_ARCHITECTURE.md +586 -0
  37. data/docs/PLAN.md +148 -0
  38. data/docs/QUICK-START.md +934 -0
  39. data/docs/README.md +296 -0
  40. data/docs/design/00-memory-optimization.md +593 -0
  41. data/docs/guides/MIGRATION-L27-L28.md +692 -0
  42. data/docs/guides/PERFORMANCE-BENCHMARKS.md +434 -0
  43. data/docs/guides/README.md +44 -0
  44. data/docs/prd/01-overview-vision.md +440 -0
  45. data/docs/use_cases/README.md +119 -0
  46. data/docs/use_cases/UC-001-request-scoped-debug-buffering.md +813 -0
  47. data/docs/use_cases/UC-002-business-event-tracking.md +1953 -0
  48. data/docs/use_cases/UC-003-pattern-based-metrics.md +1627 -0
  49. data/docs/use_cases/UC-004-zero-config-slo-tracking.md +728 -0
  50. data/docs/use_cases/UC-005-sentry-integration.md +759 -0
  51. data/docs/use_cases/UC-006-trace-context-management.md +905 -0
  52. data/docs/use_cases/UC-007-pii-filtering.md +2648 -0
  53. data/docs/use_cases/UC-008-opentelemetry-integration.md +1153 -0
  54. data/docs/use_cases/UC-009-multi-service-tracing.md +1043 -0
  55. data/docs/use_cases/UC-010-background-job-tracking.md +1018 -0
  56. data/docs/use_cases/UC-011-rate-limiting.md +1906 -0
  57. data/docs/use_cases/UC-012-audit-trail.md +2301 -0
  58. data/docs/use_cases/UC-013-high-cardinality-protection.md +2127 -0
  59. data/docs/use_cases/UC-014-adaptive-sampling.md +1940 -0
  60. data/docs/use_cases/UC-015-cost-optimization.md +735 -0
  61. data/docs/use_cases/UC-016-rails-logger-migration.md +785 -0
  62. data/docs/use_cases/UC-017-local-development.md +867 -0
  63. data/docs/use_cases/UC-018-testing-events.md +1081 -0
  64. data/docs/use_cases/UC-019-tiered-storage-migration.md +562 -0
  65. data/docs/use_cases/UC-020-event-versioning.md +708 -0
  66. data/docs/use_cases/UC-021-error-handling-retry-dlq.md +956 -0
  67. data/docs/use_cases/UC-022-event-registry.md +648 -0
  68. data/docs/use_cases/backlog.md +226 -0
  69. data/e11y.gemspec +76 -0
  70. data/lib/e11y/adapters/adaptive_batcher.rb +207 -0
  71. data/lib/e11y/adapters/audit_encrypted.rb +239 -0
  72. data/lib/e11y/adapters/base.rb +580 -0
  73. data/lib/e11y/adapters/file.rb +224 -0
  74. data/lib/e11y/adapters/in_memory.rb +216 -0
  75. data/lib/e11y/adapters/loki.rb +333 -0
  76. data/lib/e11y/adapters/otel_logs.rb +203 -0
  77. data/lib/e11y/adapters/registry.rb +141 -0
  78. data/lib/e11y/adapters/sentry.rb +230 -0
  79. data/lib/e11y/adapters/stdout.rb +108 -0
  80. data/lib/e11y/adapters/yabeda.rb +370 -0
  81. data/lib/e11y/buffers/adaptive_buffer.rb +339 -0
  82. data/lib/e11y/buffers/base_buffer.rb +40 -0
  83. data/lib/e11y/buffers/request_scoped_buffer.rb +246 -0
  84. data/lib/e11y/buffers/ring_buffer.rb +267 -0
  85. data/lib/e11y/buffers.rb +14 -0
  86. data/lib/e11y/console.rb +122 -0
  87. data/lib/e11y/current.rb +48 -0
  88. data/lib/e11y/event/base.rb +894 -0
  89. data/lib/e11y/event/value_sampling_config.rb +84 -0
  90. data/lib/e11y/events/base_audit_event.rb +43 -0
  91. data/lib/e11y/events/base_payment_event.rb +33 -0
  92. data/lib/e11y/events/rails/cache/delete.rb +21 -0
  93. data/lib/e11y/events/rails/cache/read.rb +23 -0
  94. data/lib/e11y/events/rails/cache/write.rb +22 -0
  95. data/lib/e11y/events/rails/database/query.rb +45 -0
  96. data/lib/e11y/events/rails/http/redirect.rb +21 -0
  97. data/lib/e11y/events/rails/http/request.rb +26 -0
  98. data/lib/e11y/events/rails/http/send_file.rb +21 -0
  99. data/lib/e11y/events/rails/http/start_processing.rb +26 -0
  100. data/lib/e11y/events/rails/job/completed.rb +22 -0
  101. data/lib/e11y/events/rails/job/enqueued.rb +22 -0
  102. data/lib/e11y/events/rails/job/failed.rb +22 -0
  103. data/lib/e11y/events/rails/job/scheduled.rb +23 -0
  104. data/lib/e11y/events/rails/job/started.rb +22 -0
  105. data/lib/e11y/events/rails/log.rb +56 -0
  106. data/lib/e11y/events/rails/view/render.rb +23 -0
  107. data/lib/e11y/events.rb +18 -0
  108. data/lib/e11y/instruments/active_job.rb +201 -0
  109. data/lib/e11y/instruments/rails_instrumentation.rb +141 -0
  110. data/lib/e11y/instruments/sidekiq.rb +175 -0
  111. data/lib/e11y/logger/bridge.rb +205 -0
  112. data/lib/e11y/metrics/cardinality_protection.rb +172 -0
  113. data/lib/e11y/metrics/cardinality_tracker.rb +134 -0
  114. data/lib/e11y/metrics/registry.rb +234 -0
  115. data/lib/e11y/metrics/relabeling.rb +226 -0
  116. data/lib/e11y/metrics.rb +102 -0
  117. data/lib/e11y/middleware/audit_signing.rb +174 -0
  118. data/lib/e11y/middleware/base.rb +140 -0
  119. data/lib/e11y/middleware/event_slo.rb +167 -0
  120. data/lib/e11y/middleware/pii_filter.rb +266 -0
  121. data/lib/e11y/middleware/pii_filtering.rb +280 -0
  122. data/lib/e11y/middleware/rate_limiting.rb +214 -0
  123. data/lib/e11y/middleware/request.rb +163 -0
  124. data/lib/e11y/middleware/routing.rb +157 -0
  125. data/lib/e11y/middleware/sampling.rb +254 -0
  126. data/lib/e11y/middleware/slo.rb +168 -0
  127. data/lib/e11y/middleware/trace_context.rb +131 -0
  128. data/lib/e11y/middleware/validation.rb +118 -0
  129. data/lib/e11y/middleware/versioning.rb +132 -0
  130. data/lib/e11y/middleware.rb +12 -0
  131. data/lib/e11y/pii/patterns.rb +90 -0
  132. data/lib/e11y/pii.rb +13 -0
  133. data/lib/e11y/pipeline/builder.rb +155 -0
  134. data/lib/e11y/pipeline/zone_validator.rb +110 -0
  135. data/lib/e11y/pipeline.rb +12 -0
  136. data/lib/e11y/presets/audit_event.rb +65 -0
  137. data/lib/e11y/presets/debug_event.rb +34 -0
  138. data/lib/e11y/presets/high_value_event.rb +51 -0
  139. data/lib/e11y/presets.rb +19 -0
  140. data/lib/e11y/railtie.rb +138 -0
  141. data/lib/e11y/reliability/circuit_breaker.rb +216 -0
  142. data/lib/e11y/reliability/dlq/file_storage.rb +277 -0
  143. data/lib/e11y/reliability/dlq/filter.rb +117 -0
  144. data/lib/e11y/reliability/retry_handler.rb +207 -0
  145. data/lib/e11y/reliability/retry_rate_limiter.rb +117 -0
  146. data/lib/e11y/sampling/error_spike_detector.rb +225 -0
  147. data/lib/e11y/sampling/load_monitor.rb +161 -0
  148. data/lib/e11y/sampling/stratified_tracker.rb +92 -0
  149. data/lib/e11y/sampling/value_extractor.rb +82 -0
  150. data/lib/e11y/self_monitoring/buffer_monitor.rb +79 -0
  151. data/lib/e11y/self_monitoring/performance_monitor.rb +97 -0
  152. data/lib/e11y/self_monitoring/reliability_monitor.rb +146 -0
  153. data/lib/e11y/slo/event_driven.rb +150 -0
  154. data/lib/e11y/slo/tracker.rb +119 -0
  155. data/lib/e11y/version.rb +9 -0
  156. data/lib/e11y.rb +283 -0
  157. metadata +452 -0
@@ -0,0 +1,562 @@
1
+ # UC-019: Retention Tagging for Downstream Data Lifecycle
2
+
3
+ **Status:** Cost Optimization Feature (v1.0)
4
+ **Complexity:** Simple (just tagging!)
5
+ **Setup Time:** 10 minutes (E11y config only)
6
+ **Target Users:** Platform Engineers, DevOps, Cost Optimization Teams
7
+
8
+ ---
9
+
10
+ ## 📋 Overview
11
+
12
+ ### Problem Statement
13
+
14
+ **Current Pain Points:**
15
+
16
+ 1. **Downstream systems don't know event retention requirements**
17
+ - Elasticsearch ILM treats all events the same
18
+ - S3 Lifecycle Rules need manual setup per prefix
19
+ - No way to express "this event should be kept 7 years"
20
+
21
+ 2. **Storage costs grow linearly**
22
+ - All events in hot tier (expensive)
23
+ - No differentiation between debug (1 day) and audit (7 years)
24
+
25
+ 3. **Manual configuration hell**
26
+ - Different ES indices for different retention? (nightmare)
27
+ - Different S3 buckets per retention? (management overhead)
28
+ - No single source of truth
29
+
30
+ ### E11y Solution
31
+
32
+ **Just Add Metadata Tags!**
33
+
34
+ E11y simply adds **retention tags** to every event:
35
+ - `retention_days: 7` for debug events
36
+ - `retention_days: 2555` (7 years) for audit events
37
+ - **Downstream systems** (ES ILM, S3 Lifecycle) use these tags
38
+
39
+ **Result:** Simple configuration; downstream does all the work.
40
+
41
+ ---
42
+
43
+ ## 🎯 Use Case Scenarios
44
+
45
+ ### Scenario 1: Standard Observability Events
46
+
47
+ **Context:** Regular application events (logs, metrics)
48
+
49
+ ```ruby
50
+ # Default retention: 30 days
51
+ class OrderCreated < E11y::Event::Base
52
+ # No explicit retention → use default (30 days)
53
+ end
54
+
55
+ # E11y adds metadata:
56
+ Events::OrderCreated.track(order_id: '123')
57
+ # Event written with:
58
+ # {
59
+ # "@timestamp": "2026-01-12T10:30:00Z",
60
+ # "retention_until": "2026-02-11T10:30:00Z", # ← E11y calculates: @timestamp + 30 days
61
+ # "event_name": "order.created",
62
+ # ...
63
+ # }
64
+
65
+ # Downstream ES ILM simply checks:
66
+ # if now > retention_until → delete
67
+ # No calculation needed!
68
+ ```
69
+
70
+ ### Scenario 2: Audit Events (Long Retention)
71
+
72
+ **Context:** Compliance events requiring 7-year retention
73
+
74
+ ```ruby
75
+ class UserPermissionChanged < E11y::AuditEvent
76
+ audit_retention 7.years # Compliance requirement
77
+
78
+ schema do
79
+ required(:user_id).filled(:string)
80
+ required(:old_role).filled(:string)
81
+ required(:new_role).filled(:string)
82
+ end
83
+ end
84
+
85
+ # E11y adds metadata:
86
+ Events::UserPermissionChanged.track(...)
87
+ # Event written with:
88
+ # {
89
+ # "@timestamp": "2026-01-12T10:30:00Z",
90
+ # "retention_until": "2033-01-12T10:30:00Z", # ← @timestamp + 7 years
91
+ # "event_name": "user.permission_changed",
92
+ # ...
93
+ # }
94
+
95
+ # Downstream systems simply check:
96
+ # if now > retention_until → delete
97
+ ```
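The document expresses "7 years" in two ways (`7.years` here, `retention_days: 2555` in the overview), and the two do not land on the same date once leap days are involved. A minimal sketch in plain Ruby, using only the standard library, illustrates the gap. The helper names are hypothetical and not part of E11y.

```ruby
require "date"

# Hypothetical helpers for illustration; not part of the E11y API.

# Calendar-based expiry: same month and day, N years later. This matches
# the "retention_until" value shown in the scenario above.
def calendar_expiry(start_date, years)
  start_date >> (12 * years) # Date#>> shifts by whole months
end

# Fixed-length expiry: a plain day count such as 2555.
def fixed_days_expiry(start_date, days)
  start_date + days
end

start = Date.new(2026, 1, 12)
puts calendar_expiry(start, 7)      # 2033-01-12
puts fixed_days_expiry(start, 2555) # 2033-01-10
```

The two-day difference comes from the leap days in 2028 and 2032, so it may be worth deciding which definition a compliance requirement actually means.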
98
+
99
+ ### Scenario 3: High-Volume Events (Short Retention)
100
+
101
+ **Context:** Debug logs, page views (noise if kept long)
102
+
103
+ ```ruby
104
+ class PageView < E11y::Event::Base
105
+ retention 1.day # Short retention
106
+
107
+ schema do
108
+ required(:path).filled(:string)
109
+ required(:user_id).filled(:string)
110
+ end
111
+ end
112
+
113
+ # E11y adds metadata:
114
+ Events::PageView.track(...)
115
+ # Event written with:
116
+ # {
117
+ # "@timestamp": "2026-01-12T10:30:00Z",
118
+ # "retention_until": "2026-01-13T10:30:00Z", # ← @timestamp + 1 day
119
+ # "event_name": "page.view",
120
+ # ...
121
+ # }
122
+
123
+ # Downstream ES ILM:
124
+ # - Deletes when now > retention_until (1 day later)
125
+ ```
126
+
127
+ ---
128
+
129
+ ## 🏗️ Architecture
130
+
131
+ > **Implementation:** See [ADR-009 Section 6: Tiered Storage](../ADR-009-cost-optimization.md#6-tiered-storage-retention_until-tagging) for retention tagger architecture, retention tiers configuration, and downstream integration (Elasticsearch ILM, S3 Lifecycle).
132
+
133
+ ### E11y's Simple Role: Just Add Expiry Date!
134
+
135
+ ```
136
+ ┌──────────────────────────────────────────────────────────────────┐
136
+ │  E11Y (Dead Simple: Calculate Expiry Date)                       │
137
+ │                                                                  │
138
+ │  Event.track(...)                                                │
139
+ │         ↓                                                        │
140
+ │  Add metadata to event:                                          │
141
+ │  {                                                               │
142
+ │    "@timestamp": "2026-01-12T10:30:00Z",                         │
143
+ │    "retention_until": "2026-02-11T10:30:00Z", ← @timestamp + 30d │
144
+ │    "event_name": "order.created",                                │
145
+ │    ...                                                           │
146
+ │  }                                                               │
147
+ │         ↓                                                        │
148
+ │  Write to adapters (Loki, ES, S3)                                │
149
+ │                                                                  │
150
+ │  THAT'S IT! E11y's job done ✅                                   │
151
+ └──────────────────────────────────────────────────────────────────┘
152
+
153
+ ┌──────────────────────────────────────────────────────────────────┐
154
+ │  DOWNSTREAM SYSTEMS (Trivial Logic)                              │
155
+ │                                                                  │
156
+ │  Elasticsearch ILM / S3 Lifecycle / External job:                │
157
+ │                                                                  │
158
+ │  ┌──────────────────────────────────┐                            │
159
+ │  │ if now > retention_until         │                            │
160
+ │  │   delete(event)                  │                            │
161
+ │  │ end                              │                            │
162
+ │  └──────────────────────────────────┘                            │
163
+ │                                                                  │
164
+ │  No calculations! Just date comparison ✅                        │
165
+ └──────────────────────────────────────────────────────────────────┘
167
+ ```
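The flow in the diagram can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: `InMemorySink` and `track` are hypothetical stand-ins, not E11y's adapters or middleware, and the retention period is passed in as seconds.

```ruby
require "time"

# Hypothetical stand-in for an adapter (Loki, ES, S3); not the E11y API.
class InMemorySink
  attr_reader :events

  def initialize
    @events = []
  end

  def write(event)
    @events << event
  end
end

# Step 1: compute the expiry date once.
# Step 2: hand the same tagged event to every sink.
def track(event, retention_seconds:, sinks:)
  timestamp = Time.iso8601(event.fetch("@timestamp"))
  tagged = event.merge(
    "retention_until" => (timestamp + retention_seconds).utc.iso8601
  )
  sinks.each { |sink| sink.write(tagged) }
  tagged
end

THIRTY_DAYS = 30 * 24 * 60 * 60

loki = InMemorySink.new
es = InMemorySink.new
tagged = track(
  { "@timestamp" => "2026-01-12T10:30:00Z", "event_name" => "order.created" },
  retention_seconds: THIRTY_DAYS,
  sinks: [loki, es]
)
puts tagged["retention_until"] # 2026-02-11T10:30:00Z
```

Because the date is computed before the fan-out, every sink receives an identical `retention_until` value.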
168
+
169
+ ### Benefits of `retention_until` (Absolute Date)
170
+
171
+ **vs. `retention_days` (Relative):**
172
+
173
+ | Approach | Downstream Logic | Clock Skew Issues? | Simple? |
174
+ |----------|------------------|-------------------|---------|
175
+ | `retention_days: 30` | `if now > (@timestamp + 30.days)` | ❌ Yes (if clocks differ) | 🟡 Need calculation |
176
+ | `retention_until: "2026-02-11"` | `if now > retention_until` | ✅ No (date already calculated) | ✅ Trivial comparison |
177
+
178
+ **E11y calculates once, downstream just compares!**
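The two rows of the table correspond roughly to the two checks below. This is a sketch in plain Ruby with hypothetical helper names; it only shows how much work each form leaves to the consumer and does not model clock skew.

```ruby
require "time"

DAY = 24 * 60 * 60

# Relative form: every consumer has to redo the date arithmetic.
def expired_by_days?(event, now)
  expiry = Time.iso8601(event.fetch("@timestamp")) + (event.fetch("retention_days") * DAY)
  now > expiry
end

# Absolute form: a single comparison against a precomputed date.
def expired_by_until?(event, now)
  now > Time.iso8601(event.fetch("retention_until"))
end

event = {
  "@timestamp" => "2026-01-12T10:30:00Z",
  "retention_days" => 30,
  "retention_until" => "2026-02-11T10:30:00Z"
}

just_before = Time.iso8601("2026-02-11T10:29:59Z")
just_after = Time.iso8601("2026-02-11T10:30:01Z")

puts expired_by_until?(event, just_before) # false
puts expired_by_until?(event, just_after)  # true
puts expired_by_days?(event, just_after)   # true
```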
179
+
180
+ ---
181
+
182
+ ## 🔧 Configuration
183
+
184
+ ### E11y Configuration (Dead Simple!)
185
+
186
+ ```ruby
187
+ # config/initializers/e11y.rb
188
+ E11y.configure do |config|
189
+ # Just enable retention tagging!
190
+ config.retention_tagging do
191
+ enabled true
192
+
193
+ # Default retention for events without explicit retention
194
+ default_retention 30.days
195
+
196
+ # Per-pattern retention rules
197
+ retention_by_pattern do
198
+ pattern 'audit.*', retention: 7.years
199
+ pattern 'security.*', retention: 1.year
200
+ pattern 'debug.*', retention: 1.day
201
+ pattern '*.page_view', retention: 7.days
202
+ pattern '*', retention: 30.days # Default
203
+ end
204
+
205
+ # Field name (what E11y adds to each event)
206
+ retention_field :retention_until # ISO8601 timestamp
207
+ end
208
+ end
209
+
210
+ # E11y automatically adds to each event:
211
+ # event["retention_until"] = event["@timestamp"] + retention_period
212
+ ```
213
+
214
+ **That's it for E11y!** Downstream just checks: `now > retention_until`
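How the per-pattern rules might resolve to a retention period can be sketched with glob matching from the Ruby standard library. Two assumptions are baked in that the source does not confirm: rules are checked top to bottom with the first match winning, and patterns follow `File.fnmatch` glob semantics. Durations are approximated in seconds (7 years as 2555 days).

```ruby
DAY = 24 * 60 * 60

# Hypothetical rule table mirroring the config above (durations in seconds).
RETENTION_RULES = [
  ["audit.*", 7 * 365 * DAY],
  ["security.*", 365 * DAY],
  ["debug.*", 1 * DAY],
  ["*.page_view", 7 * DAY],
  ["*", 30 * DAY]
].freeze

# Assumption: rules are checked in order and the first match wins.
def retention_for(event_name, rules = RETENTION_RULES)
  _pattern, seconds = rules.find { |pattern, _| File.fnmatch(pattern, event_name) }
  seconds
end

puts retention_for("audit.user_login") / DAY # 2555
puts retention_for("shop.page_view") / DAY   # 7
puts retention_for("order.created") / DAY    # 30
```

Under first-match-wins, the catch-all `'*'` rule has to stay last, otherwise it would shadow the more specific patterns.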
215
+
216
+ ---
217
+
218
+ ## 🔄 Downstream Configuration
219
+
220
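+ **E11y doesn't migrate data!** Downstream systems act on the retention metadata attached to each event: `retention_until`, or a `retention_days` tag where only fixed day counts are supported (S3).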
221
+
222
+ ### Option 1: Elasticsearch ILM (Recommended)
223
+
224
+ **ILM ages out whole indices, not individual events, so this policy only sets the tiering phases and a maximum age:**
225
+
226
+ ```bash
227
+ # Create ILM policy in Elasticsearch
228
+ PUT _ilm/policy/e11y-events-policy
229
+ {
230
+ "policy": {
231
+ "phases": {
232
+ "hot": {
233
+ "min_age": "0ms",
234
+ "actions": {
235
+ "rollover": {
236
+ "max_primary_shard_size": "50GB",
237
+ "max_age": "1d"
238
+ },
239
+ "set_priority": {
240
+ "priority": 100
241
+ }
242
+ }
243
+ },
244
+ "warm": {
245
+ "min_age": "7d",
246
+ "actions": {
247
+ "shrink": {
248
+ "number_of_shards": 1
249
+ },
250
+ "forcemerge": {
251
+ "max_num_segments": 1
252
+ },
253
+ "set_priority": {
254
+ "priority": 50
255
+ }
256
+ }
257
+ },
258
+ "cold": {
259
+ "min_age": "30d",
260
+ "actions": {
261
+ "searchable_snapshot": {
262
+ "snapshot_repository": "e11y-s3-repository"
263
+ }
264
+ }
265
+ },
266
+ "delete": {
267
+ # IMPORTANT: Delete based on @timestamp + retention_days
268
+ # This requires ES script or external job
269
+ "min_age": "365d", # Max default
270
+ "actions": {
271
+ "delete": {}
272
+ }
273
+ }
274
+ }
275
+ }
276
+ }
277
+
278
+ # NOTE: ES ILM doesn't natively support per-document retention!
279
+ # You need EITHER:
280
+ # 1. Multiple ILM policies per retention period (complex)
281
+ # 2. External cleanup job (reads retention_days, deletes old docs)
282
+ ```
283
+
284
+ **Better approach:** an external cleanup job that deletes documents whose `retention_until` has passed (see Option 3).
285
+
286
+ ### Option 2: S3 Lifecycle Rules
287
+
288
+ **Problem:** S3 Lifecycle works on object creation date, not event @timestamp!
289
+
290
+ **Solution:** E11y can add S3 object tags (if using S3 adapter):
291
+
292
+ ```ruby
293
+ # E11y S3 Adapter adds object tags
294
+ config.adapters do
295
+ register :s3, E11y::Adapters::S3Adapter.new(
296
+ bucket: 'e11y-events',
297
+ tagging: true, # Enable object tagging
298
+ tags_from_event: [:retention_days] # Copy event field to S3 tag
299
+ )
300
+ end
301
+ ```
+
+ ```hcl
302
+ # AWS S3 Lifecycle (using tags)
303
+ resource "aws_s3_bucket_lifecycle_configuration" "e11y_events" {
304
+ bucket = aws_s3_bucket.e11y_events.id
305
+
306
+ # Rule for 7-day retention (debug, page views)
307
+ rule {
308
+ id = "short-retention"
309
+ status = "Enabled"
310
+
311
+ expiration {
312
+ days = 7
313
+ }
314
+
315
+ filter {
316
+ tag {
317
+ key = "retention_days"
318
+ value = "7"
319
+ }
320
+ }
321
+ }
322
+
323
+ # Rule for 30-day retention (standard events)
324
+ rule {
325
+ id = "standard-retention"
326
+ status = "Enabled"
327
+
328
+ transition {
329
+ days = 7
330
+ storage_class = "STANDARD_IA"
331
+ }
332
+
333
+ expiration {
334
+ days = 30
335
+ }
336
+
337
+ filter {
338
+ tag {
339
+ key = "retention_days"
340
+ value = "30"
341
+ }
342
+ }
343
+ }
344
+
345
+ # Rule for 7-year retention (audit)
346
+ rule {
347
+ id = "audit-retention"
348
+ status = "Enabled"
349
+
350
+ transition {
351
+ days = 30
352
+ storage_class = "GLACIER"
353
+ }
354
+
355
+ transition {
356
+ days = 365
357
+ storage_class = "DEEP_ARCHIVE"
358
+ }
359
+
360
+ expiration {
361
+ days = 2555 # 7 years
362
+ }
363
+
364
+ filter {
365
+ tag {
366
+ key = "retention_days"
367
+ value = "2555"
368
+ }
369
+ }
370
+ }
371
+ }
372
+ ```
373
+
374
+ **Note:** Need one rule per retention_days value (manageable for common values like 1, 7, 30, 365, 2555).
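Since each lifecycle rule matches one exact tag value, an event whose retention has no matching rule would never expire. One possible way to guard against that is sketched below. The helper is hypothetical, and rounding up to the next longer rule is an assumption chosen so data is never deleted earlier than requested.

```ruby
# Day counts that have a matching lifecycle rule (from the note above).
RULE_DAYS = [1, 7, 30, 365, 2555].freeze

# Hypothetical helper: picks the tag value for an object.
# Assumption: with no exact match, round up to the next longer rule.
def s3_retention_tag(retention_days, rule_days = RULE_DAYS)
  bucket = rule_days.sort.find { |days| days >= retention_days }
  raise ArgumentError, "no lifecycle rule covers #{retention_days} days" unless bucket

  { "retention_days" => bucket.to_s } # S3 tag values are strings
end

puts s3_retention_tag(7)["retention_days"]  # 7
puts s3_retention_tag(14)["retention_days"] # 30
```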
375
+
376
+ ### Option 3: External Cleanup Job (Recommended!)
377
+
378
+ **Trivial logic with `retention_until`:**
379
+
380
+ ```ruby
381
+ # lib/tasks/e11y_cleanup.rake
382
+ namespace :e11y do
383
+ desc "Delete events past their retention period"
384
+ task cleanup: :environment do
385
+ es_client = Elasticsearch::Client.new(url: ENV['ES_URL'])
386
+
387
+ # Delete expired events (dead simple query!)
388
+ response = es_client.delete_by_query(
389
+ index: 'e11y-events-*',
390
+ body: {
391
+ query: {
392
+ range: {
393
+ retention_until: {
394
+ lte: 'now' # ← That's it! Just: retention_until <= now
395
+ }
396
+ }
397
+ }
398
+ }
399
+ )
400
+
401
+ puts "Deleted #{response['deleted']} expired events"
402
+ end
403
+ end
404
+
405
+ # Schedule daily: 0 2 * * * rake e11y:cleanup
406
+ ```
407
+
408
+ **This approach:**
409
+ - ✅ Works with ANY retention period (no calculation!)
410
+ - ✅ Trivial query: `retention_until <= now`
411
+ - ✅ No Painless scripts (faster, simpler)
412
+ - ✅ Standard Elasticsearch range query
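To keep the cleanup query testable without a cluster, the request arguments could be built by a small pure function. This is a sketch: `expired_events_query` is a hypothetical helper, and the `as_of` parameter is an added assumption that allows a fixed timestamp in place of Elasticsearch's `now` date math.

```ruby
# Hypothetical helper: builds the arguments that the rake task above
# passes to delete_by_query.
def expired_events_query(index: "e11y-events-*", as_of: "now")
  {
    index: index,
    body: {
      query: {
        range: {
          retention_until: { lte: as_of } # "now" is Elasticsearch date math
        }
      }
    }
  }
end

args = expired_events_query
puts args[:index]                                        # e11y-events-*
puts args[:body][:query][:range][:retention_until][:lte] # now
```

In the rake task, the result could then be passed on as `es_client.delete_by_query(**expired_events_query)`.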
413
+
414
+ ---
415
+
416
+ ## 📊 Cost Savings Example
417
+
418
+ ### Before Tiered Storage
419
+
420
+ ```
421
+ All events in Elasticsearch (hot tier):
422
+ - Volume: 1TB/month
423
+ - Retention: 365 days
424
+ - Total storage: 12TB/year
425
+ - ES cost: $0.10/GB/month
426
+ - Annual cost: $0.10 × 12,000 GB × 12 months = $14,400/year
427
+ ```
428
+
429
+ ### After Tiered Storage
430
+
431
+ ```
432
+ Hot tier (ES, 0-7 days):
433
+ - Volume: 1TB/month × 7/30 = 233GB
434
+ - Cost: $0.10 × 233 GB × 12 = $280/year
435
+
436
+ Warm tier (S3 Standard, 7-30 days):
437
+ - Volume: 1TB/month × 23/30 = 767GB
438
+ - Cost: $0.023/GB/month
439
+ - Annual cost: $0.023 × 767 GB × 12 = $212/year
440
+
441
+ Cold tier (S3 Glacier, 30-365 days):
442
+ - Volume: 1TB/month × 335/30 ≈ 11,000GB stored at steady state
443
+ - Cost: $0.004/GB/month
444
+ - Annual cost: $0.004 × 11,000 GB × 12 = $528/year
445
+
446
+ Total cost: $280 + $212 + $528 = $1,020/year
447
+ Savings: $14,400 - $1,020 = $13,380/year (93% reduction!)
448
+ ```
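The per-tier arithmetic above follows one pattern, which can be written as a short Ruby sketch. The helper name is hypothetical. It assumes steady-state storage, where the data held in a tier equals monthly ingest scaled by the tier's window in days over 30, and it reproduces the hot and warm tier figures.

```ruby
# Hypothetical helper mirroring the hot and warm tier lines above:
#   stored GB   = monthly ingest x (days in tier / 30)
#   annual cost = price per GB-month x stored GB x 12
def annual_tier_cost(ingest_gb_per_month:, days_in_tier:, price_per_gb_month:)
  stored_gb = ingest_gb_per_month * days_in_tier / 30.0
  (stored_gb * price_per_gb_month * 12).round
end

hot = annual_tier_cost(ingest_gb_per_month: 1000, days_in_tier: 7, price_per_gb_month: 0.10)
warm = annual_tier_cost(ingest_gb_per_month: 1000, days_in_tier: 23, price_per_gb_month: 0.023)
puts hot  # 280
puts warm # 212
```

Plugging in your own ingest volume and storage prices gives a quick estimate before any downstream lifecycle rules are changed.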
449
+
450
+ ---
451
+
452
+ ## 💡 Best Practices
453
+
454
+ ### ✅ DO
455
+
456
+ **1. Define retention at event level**
457
+ ```ruby
458
+ class AuditEvent < E11y::Event::Base
459
+ retention 7.years # Explicit
460
+ end
461
+ ```
462
+
463
+ **2. Use retention tagging for S3 lifecycle**
464
+ ```ruby
465
+ config.cost_optimization.retention_tagging do
466
+ enabled true
467
+ tag_with_retention true # Adds retention_days to event metadata
468
+ end
469
+ ```
470
+
471
+ **3. Query warm/cold data via Athena/BigQuery**
472
+ ```sql
473
+ -- Query S3 via AWS Athena
474
+ SELECT * FROM e11y_events_warm
475
+ WHERE date = '2024-01-15'
476
+ AND event_name = 'order.created'
477
+ LIMIT 100;
478
+ ```
479
+
480
+ **4. Set up ES ILM for automatic migration**
481
+ ```bash
482
+ # Let Elasticsearch handle hot→warm→cold automatically
483
+ ```
484
+
485
+ ---
486
+
487
+ ### ❌ DON'T
488
+
489
+ **1. Don't expect E11y to migrate data**
490
+ ```ruby
491
+ # ❌ E11y doesn't move data between adapters
491
+ # It only tags events with retention metadata
492
+
493
+ # ✅ Use ES ILM or S3 Lifecycle for migration
495
+ ```
496
+
497
+ **2. Don't keep high-volume data in hot tier long**
498
+ ```ruby
499
+ # ❌ BAD: Debug logs in ES for 30 days
500
+ class DebugEvent < E11y::Event::Base
501
+ retention 30.days # Expensive!
502
+ end
503
+
504
+ # ✅ GOOD: Short retention for debug
505
+ class DebugEvent < E11y::Event::Base
506
+ retention 1.day # Cheap!
507
+ end
508
+ ```
509
+
510
+ **3. Don't forget to configure S3 lifecycle rules**
511
+ ```ruby
512
+ # If you send events to S3, set up lifecycle rules!
513
+ # Otherwise data stays in Standard tier (expensive)
514
+ ```
515
+
516
+ ---
517
+
518
+ ## 🎯 Success Metrics
519
+
520
+ ### Quantifiable Benefits
521
+
522
+ **1. Storage Cost Reduction**
523
+ - Before: $14,400/year (all in ES)
524
+ - After: a small fraction of that (tiered; see Cost Savings Example)
525
+ - **Savings: over 90%**
526
+
527
+ **2. Query Performance**
528
+ - Hot tier (0-7 days): <1s queries ✅
529
+ - Warm tier (7-30 days): 1-5s queries (acceptable)
530
+ - Cold tier (30+ days): Rare access (minutes OK)
531
+
532
+ **3. Compliance**
533
+ - Audit events: 7-year retention ✅
534
+ - PII events: Auto-deleted after 30 days ✅
535
+ - Debug logs: Deleted after 1 day ✅
536
+
537
+ ---
538
+
539
+ ## 📚 Related Use Cases
540
+
541
+ - **[UC-012: Audit Trail](./UC-012-audit-trail.md)** - Long-term retention for compliance
542
+ - **[UC-015: Cost Optimization](./UC-015-cost-optimization.md)** - Overall cost reduction strategies
543
+ - **[UC-002: Business Event Tracking](./UC-002-business-event-tracking.md)** - Event definitions with retention
544
+
545
+ ---
546
+
547
+ ## 🚀 Quick Start Checklist
548
+
549
+ - [ ] Enable tiered storage in E11y config
550
+ - [ ] Configure retention tagging
551
+ - [ ] Set up Elasticsearch ILM policy
552
+ - [ ] Configure S3 lifecycle rules
553
+ - [ ] Define per-event retention policies
554
+ - [ ] Test write-time routing
555
+ - [ ] Monitor storage costs (before/after)
556
+ - [ ] Set up Athena for warm/cold queries (optional)
557
+
558
+ ---
559
+
560
+ **Status:** ✅ Ready for Implementation
561
+ **Priority:** High (significant cost savings)
562
+ **Complexity:** Advanced (requires downstream setup)