datadog-statsd-schema 0.1.2 → 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +1 -0
- data/.rubocop_todo.yml +27 -22
- data/README.md +366 -502
- data/Rakefile +5 -1
- data/docs/img/dss-analyze.png +0 -0
- data/examples/schema/example_marathon.rb +33 -0
- data/examples/schema/web_schema.rb +32 -0
- data/exe/dss +8 -0
- data/lib/datadog/statsd/schema/analyzer.rb +477 -0
- data/lib/datadog/statsd/schema/cli.rb +16 -0
- data/lib/datadog/statsd/schema/commands/analyze.rb +63 -0
- data/lib/datadog/statsd/schema/commands.rb +14 -0
- data/lib/datadog/statsd/schema/namespace.rb +1 -1
- data/lib/datadog/statsd/schema/version.rb +1 -1
- data/lib/datadog/statsd/schema.rb +2 -0
- metadata +25 -4
- data/exe/datadog-statsd-schema +0 -3
data/README.md
CHANGED
@@ -1,638 +1,502 @@
[](https://github.com/kigster/datadog-statsd-schema/actions/workflows/ruby.yml)

- # Datadog
+ # Datadog StatsD Schema

+ A Ruby gem that provides comprehensive schema definition, validation, and cost analysis for Datadog StatsD metrics. This library helps teams prevent metric explosion, control costs, and maintain consistent metric naming conventions.

+ ## Features

+ - **Schema Definition**: Define metric schemas with type safety and validation
+ - **Tag Management**: Centralized tag definitions with inheritance and validation
+ - **Cost Analysis**: Analyze potential custom metric costs before deployment
+ - **Metric Validation**: Runtime validation of metrics against defined schemas
+ - **CLI Tools**: Command-line interface for schema analysis and validation
+ - **Global Configuration**: Centralized configuration for tags and StatsD clients

- - **Marketing** added `statsd.increment('clicks', tags: { campaign_id: campaign.id })` across 10,000 campaigns
- - **DevOps** thought `statsd.gauge('memory', tags: { container_id: container.uuid })` was a great idea
- - **Frontend** started tracking `statsd.timing('page.load', tags: { user_id: current_user.id })` for 2 million users
- - **Everyone** has their own creative naming conventions: `user_signups`, `user.sign.ups`, `users::signups`, `Users.Signups`

+ ## Installation

+ Add this line to your application's Gemfile:

+ ```ruby
+ gem 'datadog-statsd-schema'
+ ```

+ And then execute:

+ ```bash
+ bundle install
+ ```

- 2. **📋 Schema Validation** - Define your metrics upfront, validate everything, prevent metric explosion

+ Or install it yourself as:

+ ```bash
+ gem install datadog-statsd-schema
+ ```

- ## Quick Start
+ ## Quick Start

+ ### Basic Schema Definition

```ruby
require 'datadog/statsd/schema'

- #
+ # Define your metrics schema
+ schema = Datadog::Statsd::Schema.new do
+   namespace :web do
+     tags do
+       tag :environment, values: %w[production staging development]
+       tag :service, values: %w[api web worker]
+       tag :region, values: %w[us-east-1 us-west-2]
+     end
+
+     metrics do
+       counter :requests_total do
+         description "Total HTTP requests"
+         tags required: [:environment, :service], allowed: [:region]
+       end
+
+       gauge :memory_usage do
+         description "Memory usage in bytes"
+         tags required: [:environment], allowed: [:service, :region]
+       end
+
+       distribution :request_duration do
+         description "Request processing time in milliseconds"
+         tags required: [:environment, :service]
+       end
+     end
+   end
+ end
+ ```
+
+ ### Using the Emitter with Schema Validation
+
+ ```ruby
+ # Configure global settings
Datadog::Statsd::Schema.configure do |config|
-   config.tags = { env: 'production', service: 'web-app', version: '1.2.3' }
  config.statsd = Datadog::Statsd.new('localhost', 8125)
+   config.schema = schema
+   config.tags = { environment: 'production' }
end

- # Create an emitter
+ # Create an emitter with validation
+ emitter = Datadog::Statsd::Emitter.new(
+   schema: schema,
+   validation_mode: :strict # :strict, :warn, or :disabled
)

- # Send
+ # Send metrics with automatic validation
+ emitter.increment('web.requests_total', tags: { service: 'api', region: 'us-east-1' })
+ emitter.gauge('web.memory_usage', 512_000_000, tags: { service: 'api' })
+ emitter.distribution('web.request_duration', 45.2, tags: { service: 'api' })
```

- ```ruby
- # Metric: auth_service.login.success
- # Tags: {
- #   env: 'production',        # From global config
- #   service: 'web-app',       # From global config
- #   version: '1.2.3',         # From global config
- #   emitter: 'auth_service',  # Auto-generated from first argument
- #   feature: 'user_auth',     # From emitter constructor
- #   method: 'oauth'           # From method call
- # }
- ```
+ ## CLI Usage

- - Method-level tags override emitter tags
- - Emitter tags override global tags
- - Global tags are always included
+ The gem provides a command-line interface for analyzing schemas and understanding their cost implications.

+ ### Installation

+ After installing the gem, the `dss` (Datadog StatsD Schema) command will be available:

+ ```bash
+ dss --help
+ ```
+
+ ### Schema Analysis
+
+ Create a schema file (e.g., `metrics_schema.rb`):

```ruby
- end
+ namespace :web do
+   tags do
+     tag :environment, values: %w[production staging development]
+     tag :service, values: %w[api web worker]
+     tag :region, values: %w[us-east-1 us-west-2 eu-west-1]
+   end
+
+   namespace :requests do
    metrics do
- tags required: [:signup_method], allowed: [:plan_type, :feature_flag]
+       counter :total do
+         description "Total HTTP requests"
+         tags required: [:environment, :service], allowed: [:region]
      end
- description "
+
+       distribution :duration do
+         description "Request processing time in milliseconds"
+         inherit_tags "web.requests.total"
      end
    end
  end
+
+   metrics do
+     gauge :memory_usage do
+       description "Memory usage in bytes"
+       tags required: [:environment], allowed: [:service]
+     end
+   end
end
+ ```

- user_emitter = Datadog::Statsd::Emitter.new(
-   'UserService',
-   schema: user_metrics_schema,
-   validation_mode: :strict # Explode on invalid metrics (good for development)
- )

+ Analyze the schema to understand metric costs:

+ ```bash
+ dss analyze --file metrics_schema.rb --color
+ ```

- user_emitter.increment('signups', tags: { signup_method: 'facebook' })
+ **Output:**

- user_emitter.increment('user_registrations')
+ ![dss analyze output](docs/img/dss-analyze.png)

- user_emitter.increment('signups', tags: { plan_type: 'free' })
- ```
+ This analysis shows that your schema will generate **342 custom metrics** across **16 unique metric names**. Understanding this before deployment helps prevent unexpected Datadog billing surprises.

- - ❌ Metrics that don't exist
- - ❌ Wrong metric types (counter vs gauge vs distribution)
- - ❌ Missing required tags
- - ❌ Invalid tag values
- - ❌ Tags that aren't allowed on specific metrics
+ ## Advanced Features

+ ### Tag Inheritance

+ Metrics can inherit tag configurations from other metrics to reduce duplication:

```ruby
- downcase: ->(text) { text.downcase }
- end
- namespace :ecommerce do
-   tags do
-     # Finite set of product categories (not product IDs!)
-     tag :category, values: %w[electronics clothing books home_garden]
-
-     # Payment methods you actually support
-     tag :payment_method, values: %w[credit_card paypal apple_pay]
-
-     # Order status progression
-     tag :status, values: %w[pending processing shipped delivered cancelled]
-
-     # A/B test groups (not test IDs!)
-     tag :checkout_flow, values: %w[single_page multi_step express]
-   end
-
-   namespace :orders do
-     metrics do
-       counter :created do
-         description "New orders placed"
-         tags required: [:category], allowed: [:payment_method, :checkout_flow]
-       end
-
-       counter :completed do
-         description "Successfully processed orders"
-         inherit_tags: "ecommerce.orders.created" # Reuse tag definition
-         tags required: [:status]
-       end
-
-       distribution :value do
-         description "Order value distribution in cents"
-         units "cents"
-         tags required: [:category], allowed: [:payment_method]
-       end
-
-       gauge :processing_queue_size do
-         description "Orders waiting to be processed"
-         # No tags - just a simple queue size metric
-       end
-     end
+ namespace :api do
+   metrics do
+     counter :requests_total do
+       tags required: [:environment, :service], allowed: [:region]
    end
- tags required: [:category]
- end
-
- counter :restocked do
-   description "Inventory replenishment events"
-   tags required: [:category]
- end
- end
+
+     # Inherits environment, service (required) and region (allowed) from requests_total
+     distribution :request_duration do
+       inherit_tags "api.requests_total"
+       tags required: [:endpoint] # Adds endpoint as additional required tag
    end
  end
end
-
- # Usage in your order processing service
- order_processor = Datadog::Statsd::Emitter.new(
-   'OrderProcessor',
-   schema: ecommerce_schema,
-   metric: 'ecommerce.orders', # Prefix for all metrics from this emitter
-   tags: { checkout_flow: 'single_page' }
- )
-
- # Process an order - clean, validated metrics
- order_processor.increment('created', tags: {
-   category: 'electronics',
-   payment_method: 'credit_card'
- })
-
- order_processor.distribution('value', 15_99, tags: {
-   category: 'electronics',
-   payment_method: 'credit_card'
- })
-
- order_processor.gauge('processing_queue_size', 12)
```

- ###
+ ### Nested Namespaces
+
+ Organize metrics hierarchically with nested namespaces:

```ruby
+ namespace :application do
+   tags do
+     tag :environment, values: %w[prod staging dev]
+   end
+
+   namespace :database do
    tags do
-     tag :method, values: %w[GET POST PUT PATCH DELETE]
-
-     # Standardized controller names (transformed to snake_case)
-     tag :controller,
-       values: %r{^[a-z_]+$}, # Regex validation
-       transform: [:underscore, :downcase]
-
-     # Standard HTTP status code ranges
-     tag :status_class, values: %w[2xx 3xx 4xx 5xx]
-     tag :status_code,
-       type: :integer,
-       validate: ->(code) { (100..599).include?(code) }
-
-     # Feature flags for A/B testing
-     tag :feature_version, values: %w[v1 v2 experimental]
+       tag :table_name, values: %w[users orders products]
    end
-
-     description "Total API requests"
-     tags required: [:method, :controller],
-          allowed: [:status_class, :feature_version]
-   end
-
-   distribution :duration do
-     description "Request processing time"
-     units "milliseconds"
-     inherit_tags: "api.requests.total"
-     tags required: [:status_code]
-   end
-
-   histogram :response_size do
-     description "Response payload size distribution"
-     units "bytes"
-     tags required: [:method, :controller]
-   end
- end
- end
-
- namespace :errors do
-   metrics do
-     counter :total do
-       description "API errors by type"
-       tags required: [:controller, :status_code]
-     end
-   end
+
+     metrics do
+       counter :queries_total
+       distribution :query_duration
    end
  end
- end

- self.class.name,
- schema: api_schema,
- metric: 'api',
- validation_mode: Rails.env.production? ? :warn : :strict
- )
- end
-
- def track_request
-   controller_name = self.class.name.gsub('Controller', '').underscore
-
-   @api_metrics.increment('requests.total', tags: {
-     method: request.method,
-     controller: controller_name,
-     status_class: "#{response.status.to_s[0]}xx"
-   })
-
-   @api_metrics.distribution('requests.duration',
-     request_duration_ms,
-     tags: {
-       method: request.method,
-       controller: controller_name,
-       status_code: response.status
-     }
-   )
+   namespace :cache do
+     tags do
+       tag :cache_type, values: %w[redis memcached]
+     end
+
+     metrics do
+       counter :hits_total
+       counter :misses_total
+     end
  end
end
```

+ ### Validation Modes

+ Control how validation errors are handled:

```ruby
- #
- 'MyService',
- schema: my_schema,
- validation_mode: :strict # Raises exceptions
- )
+ # Strict mode: Raises exceptions on validation failures
+ emitter = Datadog::Statsd::Emitter.new(schema: schema, validation_mode: :strict)

- #
- 'MyService',
- schema: my_schema,
- validation_mode: :warn # Prints to stderr, continues execution
- )
+ # Warn mode: Logs warnings but continues execution
+ emitter = Datadog::Statsd::Emitter.new(schema: schema, validation_mode: :warn)

- #
- 'MyService',
- schema: my_schema,
- validation_mode: :drop # Silently drops invalid metrics
- )
-
- # Emergency: Turn off validation entirely
- emergency_emitter = Datadog::Statsd::Emitter.new(
-   'MyService',
-   schema: my_schema,
-   validation_mode: :off # No validation at all
- )
+ # Disabled: No validation (production default)
+ emitter = Datadog::Statsd::Emitter.new(schema: schema, validation_mode: :disabled)
```
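
To illustrate the difference between these modes: under `:strict` an invalid emission raises, while `:warn` only logs and `:disabled` skips validation entirely. The sketch below is not from the gem's documentation; it assumes the `web` schema defined earlier and rescues a generic `StandardError`, since the README does not name the exact error class the gem raises.

```ruby
# Hypothetical illustration: 'web.requests_total' requires both :environment
# and :service, so omitting :service should fail validation in :strict mode.
strict_emitter = Datadog::Statsd::Emitter.new(schema: schema, validation_mode: :strict)

begin
  strict_emitter.increment('web.requests_total', tags: { environment: 'production' })
rescue StandardError => e
  warn "Schema validation failed: #{e.message}"
end
```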

+ ### Global Configuration

+ Set up global defaults for your application:

```ruby
- description "User sessions initiated"
- tags required: [:session_type], allowed: [:auth_method, :plan_tier]
- end
-
- counter :ended do
-   description "User sessions terminated"
-   tags required: [:session_type, :auth_method]
- end
-
- distribution :duration do
-   description "How long sessions last"
-   units "minutes"
-   tags required: [:session_type]
- end
- end
- end
+ Datadog::Statsd::Schema.configure do |config|
+   config.statsd = Datadog::Statsd.new(
+     ENV['DATADOG_AGENT_HOST'] || 'localhost',
+     ENV['DATADOG_AGENT_PORT'] || 8125
+   )
+   config.schema = schema
+   config.tags = {
+     environment: ENV['RAILS_ENV'],
+     service: 'my-application',
+     version: ENV['APP_VERSION']
+   }
end

- #
- statsd.gauge('active_users_on_mobile_free_plan_from_usa', 1000) # Way too specific!
+ # These global tags are automatically added to all metrics
+ emitter = Datadog::Statsd::Emitter.new
+ emitter.increment('user.signup') # Automatically includes global tags
```

+ ## Cost Control and Best Practices

+ ### Understanding Metric Expansion

+ Different metric types create different numbers of time series:

+ - **Counter/Set**: 1 time series per unique tag combination
+ - **Gauge**: 5 time series (count, min, max, sum, avg)
+ - **Distribution/Histogram**: 10 time series (count, min, max, sum, avg, p50, p75, p90, p95, p99)
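
To estimate the impact of a single metric, multiply the number of possible tag-value combinations by the per-type multiplier listed above. A rough back-of-the-envelope sketch (this helper is not part of the gem; it only applies the multipliers from the list above and assumes every tag combination actually occurs):

```ruby
# Worst-case time series estimate for one metric, using the per-type
# multipliers listed above.
TIME_SERIES_PER_TYPE = { counter: 1, set: 1, gauge: 5, distribution: 10, histogram: 10 }.freeze

def estimated_time_series(type, tag_values)
  combinations = tag_values.values.map(&:size).inject(1, :*)
  combinations * TIME_SERIES_PER_TYPE.fetch(type)
end

tags = { environment: %w[production staging development], service: %w[api web worker] }

estimated_time_series(:counter, tags)      # => 9  (3 environments x 3 services)
estimated_time_series(:distribution, tags) # => 90 (the same tags cost 10x on a distribution)
```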
+ ### Tag Value Limits

+ Be mindful of tag cardinality:

```ruby
- #
- tag :
- tag :user_id # Millions of possible values!
- tag :session_id # Unique every time!
- tag :timestamp # Infinite values!
- tag :request_path # Thousands of unique URLs!
+ # High cardinality - avoid
+ tag :user_id, type: :string # Could be millions of values
+
+ # Better approach - use bucketing
+ tag :user_tier, values: %w[free premium enterprise]
+ tag :user_cohort, values: %w[new_user returning_user power_user]
```

+ > [!CAUTION]
+ > Be mindful of the number of tags and tag values your schema allows.

+ ### Schema Validation

+ Always validate your schema before deployment:

```ruby
- end
-
- # ✅ Use gauges for current state/levels
- gauge :queue_size do
-   description "Emails waiting to be sent"
- end
-
- # ✅ Use distributions for value analysis (careful - creates 10 metrics!)
- distribution :delivery_time do
-   description "Time from send to delivery"
-   units "seconds"
- end
-
- # ⚠️ Use histograms sparingly (creates 5 metrics each)
- histogram :processing_time do
-   description "Email processing duration"
-   units "milliseconds"
- end
-
- # ⚠️ Use sets very carefully (tracks unique values)
- set :unique_recipients do
-   description "Unique email addresses receiving mail"
- end
- end
+ # Check for common issues
+ errors = schema.validate
+ if errors.any?
+   puts "Schema validation errors:"
+   errors.each { |error| puts "  - #{error}" }
end
```

+ ## Integration Examples

+ ### Sidekiq Job Monitoring

+ Imagine that we are building a Rails application, and we prefer to create our own tracking of jobs performed, failed, and succeeded, as well as their duration.

+ > [!TIP]
+ > A very similar approach would work for tracking, e.g., requests coming to the `ApplicationController` subclasses.

+ First, let's initialize the schema from a file (we'll dive into the schema a bit later):

```ruby
- #
- description "Payment attempts started"
- tags required: [:payment_method], allowed: [:currency, :region]
- end
-
- counter :completed do
-   description "Successful payments"
-   inherit_tags: "payments.initiated" # Reuses the tag configuration
- end
-
- counter :failed do
-   description "Failed payment attempts"
-   inherit_tags: "payments.initiated"
-   tags required: [:failure_reason] # Add specific tags as needed
- end
- end
- end
+ # config/initializers/datadog_statsd.rb
+ SIDEKIQ_SCHEMA = Datadog::Statsd::Schema.load_file(Rails.root.join('config/metrics/sidekiq.rb'))
+
+ Datadog::Statsd::Schema.configure do |config|
+   config.statsd = Datadog::Statsd.new
+   config.schema = SIDEKIQ_SCHEMA
+   config.tags = {
+     environment: Rails.env,
+     service: 'my-rails-app',
+     version: ENV['DEPLOY_SHA']
+   }
end
```

+ #### Adding Statsd Tracking to a Worker

+ In this example, a job monitors itself by submitting relevant StatsD metrics:

```ruby
- # Stock and fulfillment metrics
- end
+ class OrderProcessingJob
+   QUEUE = 'orders'.freeze
+
+   include Sidekiq::Job
+   sidekiq_options queue: QUEUE
+
+   def perform(order_id)
+     start_time = Time.current

+     begin
+       process_order(order_id)
+       emitter.increment('order_processing.success')
+     rescue => error
+       emitter.increment(
+         'order_processing.failure',
+         tags: { error_type: error.class.name }
+       )
+       raise
+     ensure
+       duration = Time.current - start_time
+       emitter.distribution(
+         'order_processing.duration',
+         duration * 1000
+       )
    end
  end

+   # Create an instance of an Emitter equipped with our metric
+   # prefix and the tags.
+   def emitter
+     @emitter ||= Datadog::Statsd::Emitter.new(
+       self,
+       metric: 'job',
+       tags: { queue: QUEUE }
+     )
  end
end
- # ❌ Bad: Flat namespace chaos
- # orders.created
- # orders_completed
- # order::cancelled
- # INVENTORY_LOW
- # db.query.time
- # cache_hits
```

+ The above Emitter will generate the following metrics:

+ * `job.order_processing.success` (counter)
+ * `job.order_processing.failure` (counter)
+ * `job.order_processing.duration.count`
+ * `job.order_processing.duration.min`
+ * `job.order_processing.duration.max`
+ * `job.order_processing.duration.sum`
+ * `job.order_processing.duration.avg`

- ```ruby
- # Set up global configuration in your initializer
- Datadog::Statsd::Schema.configure do |config|
-   # Global tags applied to ALL metrics
-   config.tags = {
-     env: Rails.env,
-     service: 'web-app',
-     version: ENV['GIT_SHA']&.first(7),
-     datacenter: ENV['DATACENTER'] || 'us-east-1'
-   }
-
-   # The actual StatsD client
-   config.statsd = Datadog::Statsd.new(
-     ENV['STATSD_HOST'] || 'localhost',
-     ENV['STATSD_PORT'] || 8125,
-     namespace: ENV['STATSD_NAMESPACE'],
-     tags: [], # Don't double-up tags here
-     delay_serialization: true
-   )
- end
- ```

+ However, you can see that doing this in each job is not practical. Therefore the first question that should be on our mind is: how do we make this behavior automatically apply to any Job we create?

+ #### Tracking All Sidekiq Workers At Once

+ The question posed above is: can we come up with a class design pattern that allows us to write this code once and forget about it?

+ **Let's take Ruby's metaprogramming for a spin.**

+ One of the most flexible methods to add functionality to all jobs is to create a module that the job classes include *instead of* the implementation-specific `Sidekiq::Job`.

+ So let's create our own module, let's call it `BackgroundWorker`, that we'll include into our classes instead. Once created, we'd like for our job classes to look like this:

```ruby
- downcase: ->(text) { text.downcase }
- truncate: ->(text) { text.first(20) }
- end
+ class OrderProcessingJob
+   include BackgroundWorker
+   sidekiq_options queue: 'orders'

- # Controller names get normalized automatically
- tag :controller,
-   values: %r{^[a-z_]+$},
-   transform: [:underscore, :downcase] # Applied in order
-
- # Action names also get cleaned up
- tag :action,
-   values: %w[index show create update destroy],
-   transform: [:downcase]
- end
+   def perform(order_id)
+     # perform the work for the given order ID
  end
end
- # "UserSettingsController" becomes "user_settings_controller"
- # "CreateUser" becomes "create_user"
```

+ So our module, when included, should:

+ * include `Sidekiq::Job` as well
+ * define the `emitter` method so that it's available to all Job instances
+ * wrap the `perform` method in an exception-handling block that emits the corresponding metrics, as in our example before.

+ The only tricky part here is the last one: wrapping the `perform` method in some shared code. This used to require "monkey patching", but no more. These days Ruby gives us an all-powerful `prepend` method that does exactly what we need.
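
If `Module#prepend` is unfamiliar, here is a tiny, generic illustration of the mechanism (plain Ruby, unrelated to Sidekiq or this gem): a prepended module sits in front of the class in the ancestor chain, so its method runs first and can wrap the original implementation via `super`.

```ruby
# Generic Module#prepend illustration: Timed#work wraps Task#work.
module Timed
  def work
    started = Time.now
    super
  ensure
    puts "work took #{Time.now - started} seconds"
  end
end

class Task
  prepend Timed

  def work
    sleep 0.1
  end
end

Task.new.work # prints "work took 0.1... seconds"
```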

+ The final adjustment we'd like to make is the metric naming.

+ While metrics such as:

+ * `job.order_processing.success` (counter)
+ * `job.order_processing.failure` (counter)

+ are easy to understand, one has to ask: do we really need to insert the job's class name into the metric name? Or is there a better way?

+ The truth is: there is! Why create 7 unique metrics **per job** when we can simply submit the same metrics for all jobs, tagged with the job's class name as an "emitter" source?

+ #### Module for Including into the Background Worker

+ So, without further ado, here we go:

```ruby
+ module BackgroundWorker
+   class << self
+     def included(klass)
+       klass.include(Sidekiq::Job)
+       klass.prepend(InstanceMethods)
+     end
+
+     module InstanceMethods
+       def perform(...)
+         start_time = Time.current
+         tags = {}
+         error = nil
+
+         begin
+           super(...)
+           emitter.increment("success")
+         rescue => error
+           tags.merge!({ error_type: error.class.name })
+           emitter.increment("failure", tags:)
+           raise
+         ensure
+           duration = Time.current - start_time
+           emitter.distribution(
+             "duration",
+             duration * 1000,
+             tags:
+           )
+         end
+       end
+
+       def emitter
+         @emitter ||= Datadog::Statsd::Emitter.new(
+           metric: 'sidekiq.job',
+           tags: {
+             queue: sidekiq_options[:queue],
+             job: self.class.name
+           }
+         )
+       end
    end
  end
end
```

+ > [!TIP]
+ > In a nutshell, we created a reusable module that, upon being included into any Job class, provides reliable tracking of job successes and failures, as well as the duration. The duration can be graphed for all successful jobs by ensuring the tag `error_type` does not exist.

- # config/metrics_schema.rb
- Datadog::Statsd::Schema.new do
-   namespace :my_app do
-     # ... schema definition
-   end
- end
+ So, the above strategy will generate the following metrics **FOR ALL** jobs:

- schema = Datadog::Statsd::Schema.load_file('config/metrics_schema.rb')
- ```

+ * `sidekiq.job.success` (counter)
+ * `sidekiq.job.failure` (counter)
+ * `sidekiq.job.duration.count`
+ * `sidekiq.job.duration.min`
+ * `sidekiq.job.duration.max`
+ * `sidekiq.job.duration.sum`
+ * `sidekiq.job.duration.avg`

+ that will be tagged with the following tags:

+ * `queue: ...`
+ * `job: { 'OrderProcessingJob' | ... }`
+ * `environment: { "production" | "staging" | "development" }`
+ * `service: 'my-rails-app'`
+ * `version: { "git-sha" }`

+ ## Development

+ After checking out the repo, run:

```bash
+ bin/setup           # Install dependencies
+ bundle exec rspec   # Run Specs
+ bundle exec rubocop # Run Rubocop
```

- This gem transforms Datadog custom metrics from a "wild west" free-for-all into a disciplined, cost-effective observability strategy:
-
- - **🎯 Intentional Metrics**: Define what you measure before you measure it
- - **💰 Cost Control**: Prevent infinite cardinality and metric explosion
- - **🏷️ Consistent Tagging**: Global and hierarchical tag management
- - **🔍 Better Insights**: Finite tag values enable proper aggregation and analysis
- - **👥 Team Alignment**: Schema serves as documentation and contract
+ To install this gem onto your local machine:

+ ```bash
+ bundle exec rake install
+ ```

## Contributing

- Bug reports and pull requests are welcome on GitHub at
+ Bug reports and pull requests are welcome on GitHub at https://github.com/kigster/datadog-statsd-schema.

## License