dot-metrics 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- dot_metrics-0.1.0/.claude/skills/integrate-dot-metrics/SKILL.md +322 -0
- dot_metrics-0.1.0/.gitignore +23 -0
- dot_metrics-0.1.0/LICENSE +1 -0
- dot_metrics-0.1.0/PKG-INFO +312 -0
- dot_metrics-0.1.0/README.md +292 -0
- dot_metrics-0.1.0/docs/CONTRIBUTING.md +31 -0
- dot_metrics-0.1.0/docs/DEVELOPMENT.md +51 -0
- dot_metrics-0.1.0/docs/PUBLICATION.md +89 -0
- dot_metrics-0.1.0/examples/basic.py +39 -0
- dot_metrics-0.1.0/pyproject.toml +59 -0
- dot_metrics-0.1.0/src/dot_metrics/__init__.py +21 -0
- dot_metrics-0.1.0/src/dot_metrics/display.py +75 -0
- dot_metrics-0.1.0/src/dot_metrics/metric_set.py +280 -0
- dot_metrics-0.1.0/src/dot_metrics/models.py +124 -0
- dot_metrics-0.1.0/tests/__init__.py +0 -0
- dot_metrics-0.1.0/tests/test_display.py +34 -0
- dot_metrics-0.1.0/tests/test_eval_result.py +61 -0
- dot_metrics-0.1.0/tests/test_metric_set.py +426 -0
- dot_metrics-0.1.0/uv.lock +486 -0

dot_metrics-0.1.0/.claude/skills/integrate-dot-metrics/SKILL.md
@@ -0,0 +1,322 @@
---
name: integrate-dot-metrics
description: Integrates the dot-metrics package into an existing project that already has metrics implemented. Use when the user wants to migrate ad-hoc metric functions, quality scores, constraint checks, or evaluation logic to the dot-metrics framework. Covers identifying existing metrics, defining input types, registering metrics and constraints, wiring batch evaluation, and replacing call sites.
---

# Skill: Integrate dot-metrics into an existing project

## Description

Step-by-step guide to migrate an existing metrics implementation to the `dot-metrics` framework. Covers identifying existing metrics, defining an input type, registering metrics and constraints, wiring batch evaluation, and replacing call sites.

## When to use

- A project already computes metrics (quality scores, constraint checks, KPIs, evaluation functions) with ad-hoc code
- You want structured, typed, batch-capable metric evaluation via `dot-metrics`

## Prerequisites

- Python 3.12+
- The project must already have some form of metrics or evaluation logic

---

## Step-by-step integration

### 1. Install dot-metrics

```bash
pip install dot-metrics
# or in pyproject.toml dependencies:
# "dot-metrics>=0.1.0"
```

### 2. Identify the existing metrics landscape

Scan the codebase for:
- Functions that compute numeric scores, rates, counts, or pass/fail checks
- A "compute all metrics" orchestrator function (often returns a list or dict of results)
- The data types those functions receive as input (the "context" or "input" objects)

Typical patterns you'll find:

```python
# Pattern A: Functions returning custom Metric objects
def coverage_score(context, response) -> Metric:
    value = preserved / total * 100
    return Metric(name="coverage_score", value=value, type=MetricType.PERCENTAGE, unit="%", ...)

# Pattern B: Functions returning raw floats
def compute_fulfillment_rate(request, schedule) -> float:
    return fulfilled / total

# Pattern C: Dict-based definitions dispatched by key
METRICS = {"coverage": {"fn": compute_coverage, "higher_is_better": True, ...}}
```

**Inventory what you find.** For each metric, note:
- Its **key/name** (e.g. `"coverage_score"`)
- Whether it's a **metric** (continuous value) or a **constraint** (pass/fail against a threshold)
- Its **unit**, **direction** (`higher_is_better`), and any **threshold** for constraints
- Whether the function returns **debug info** alongside the value

### 3. Define the metric input type

dot-metrics metric functions take **exactly one argument**. If your existing functions take multiple arguments (e.g. `context` and `response`), bundle them into a single object:

```python
from dataclasses import dataclass

@dataclass
class EvalInput:
    context: Schedule
    response: RecommendationResponse
```

Or use a Pydantic model, a TypedDict, a plain dict, a NamedTuple -- anything works. The key is: **one argument in, one number out**.

> **Favour a type that can represent different parametric scenarios.** If you foresee running the same metric set on multiple batches (e.g. different schedules, different parameter configs), a single bundled input type makes `compute_batch()` natural.

### 4. Create the MetricSet

Create a dedicated module (e.g. `metrics/metric_set.py` or `evaluation/metrics.py`):

```python
from dot_metrics import MetricSet, ComputedValue

metric_set: MetricSet[EvalInput] = MetricSet()
```

The generic annotation `MetricSet[EvalInput]` is optional but gives you static type checking on all registered functions.

### 5. Register metrics

Translate each existing metric function. Two styles are available:

#### Decorator style (preferred for standalone metric modules)

```python
@metric_set.metric("coverage_score", unit="%", description="Percentage of initial items preserved")
def coverage_score(data: EvalInput):
    """Percentage of initial items preserved in the final output.

    Range: 0-100
    Interpretation:
    - 90-100: Excellent retention
    - <70: Significant loss
    """
    initial = len(data.context.items)
    preserved = sum(1 for u in data.response.updates if u.action != Action.DELETE)
    return (preserved / initial) * 100 if initial else 100.0
```

#### Imperative style (preferred when migrating many metrics in bulk)

```python
metric_set.add(
    "coverage_score",
    lambda data: (
        sum(1 for u in data.response.updates if u.action != Action.DELETE)
        / len(data.context.items) * 100
        if data.context.items else 100.0
    ),
    unit="%",
    description="Percentage of initial items preserved",
)
```

#### Attaching debug info

If a metric currently returns debug data (e.g. list of violations, IDs of affected items), return a `ComputedValue`:

```python
@metric_set.metric("scheduling_rate")
def scheduling_rate(data: EvalInput):
    scheduled = [e for e in data.response.entries if e.scheduled]
    unscheduled_ids = [e.id for e in data.response.entries if not e.scheduled]
    return ComputedValue(
        value=len(scheduled) / len(data.context.appointments),
        debug={"unscheduled": unscheduled_ids},
    )
```

### 6. Register constraints

Constraints are metrics with a **threshold**. A constraint passes when `value <= threshold`.

```python
@metric_set.constraint("constraint_violations", threshold=0, unit="violations")
def constraint_violations(data: EvalInput):
    """Total count of all constraint violations.

    Range: 0+
    Interpretation:
    - 0: All constraints satisfied
    - >0: Schedule has violations that need attention
    """
    return (
        count_double_bookings(data)
        + count_availability_violations(data)
        + count_schedule_violations(data)
    )
```

The sub-counts (`count_double_bookings`, etc.) can be helper functions, or they can themselves be registered as separate constraints if you want individual pass/fail tracking:

```python
@metric_set.constraint("double_bookings", threshold=0, unit="violations")
def double_bookings(data: EvalInput):
    overlaps = find_overlapping_slots(data)
    return ComputedValue(value=len(overlaps), debug={"overlaps": overlaps})
```

### 7. Replace the orchestrator

**Before** (typical pattern):

```python
def compute_metrics(context, response) -> list[Metric]:
    metrics = []
    metrics.append(coverage_score(context, response))
    metrics.append(constraint_violations(context, response))
    metrics.extend(recommendation_counts(context, response))
    return metrics
```

**After** (single compute call):

```python
from myproject.metrics.metric_set import metric_set, EvalInput

def compute_metrics(context, response):
    return metric_set.compute(EvalInput(context=context, response=response))
```

The returned `EvalResult` gives you:
- `result.score("coverage_score")` -- float value
- `result.metrics["coverage_score"].debug` -- debug dict
- `result.constraints_ok` -- all constraints passed?
- `result.violations` -- list of failed constraints
- `result.assert_constraints()` -- raise if any failed
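
A minimal usage sketch of those accessors, assuming the `compute_metrics()` wrapper above:

```python
result = compute_metrics(context, response)

print(result.score("coverage_score"))          # e.g. 97.5
print(result.metrics["coverage_score"].debug)  # {} unless the metric returned a ComputedValue

if not result.constraints_ok:
    for violation in result.violations:
        print(violation.key, violation.value, violation.threshold)

result.assert_constraints()  # raises if any constraint failed
```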

### 8. Wire up batch evaluation

If your code runs metrics on multiple scenarios (benchmark runs, parameter sweeps, A/B comparisons), use `compute_batch()`:

```python
# Dict-keyed batch (recommended -- keys become result labels)
inputs = {
    "baseline": EvalInput(context=ctx, response=baseline_response),
    "optimized_v1": EvalInput(context=ctx, response=v1_response),
    "optimized_v2": EvalInput(context=ctx, response=v2_response),
}
batch = metric_set.compute_batch(inputs)

batch["baseline"].score("coverage_score")  # single value
batch.scores("coverage_score")             # {"baseline": 95.0, "optimized_v1": 97.2, ...}

# List batch (indexed by position)
batch = metric_set.compute_batch([input1, input2, input3])
batch[0].score("coverage_score")
```

**Even if you only have single-input evaluation today, prefer structuring your code to support batch.** Future visualization features (3D parametric graphs, comparative charts) will consume `BatchResult` natively.
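
One low-effort way to do that, as a sketch: keep your single-scenario entry point but route it through `compute_batch()` with a one-element dict, so callers already receive a `BatchResult`:

```python
def evaluate(context, response):
    # A single labelled scenario today; more keys can be added later without changing callers.
    return metric_set.compute_batch({"current": EvalInput(context=context, response=response)})

batch = evaluate(context, response)
batch["current"].score("coverage_score")
```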

### 9. Update call sites

Find all places that currently call the old orchestrator or access the old metric objects. Common patterns to search for:

```bash
# Search for old call sites
grep -r "compute_metrics" --include="*.py"
grep -r "\.value" --include="*.py" | grep -i metric
```

**Before:**
```python
metrics = compute_metrics(context, response)
coverage = next(m for m in metrics if m.name == "coverage_score")
print(f"Coverage: {coverage.value}%")
violations = [m for m in metrics if m.name == "constraint_violations"]
```

**After:**
```python
result = compute_metrics(context, response)
print(f"Coverage: {result.score('coverage_score')}%")
print(f"Constraints OK: {result.constraints_ok}")
for v in result.violations:
    print(f"  FAILED: {v.key} = {v.value} (threshold: {v.threshold})")
```

### 10. Terminal visualization

```python
from dot_metrics import draw_terminal_chart

result = metric_set.compute(data)
print(draw_terminal_chart(result))
```

For metrics where lower is better (e.g. latency, violation counts), set `higher_is_better=False` during registration. For absolute metrics (unbounded values like time-in-ms), add `metadata={"absolute": True}` -- the chart will normalize them differently.
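
For example, two registrations that use those options (the helper and field names here are hypothetical stand-ins for your own code):

```python
# Lower-is-better metric: the chart treats smaller values as better.
@metric_set.metric("idle_minutes", unit="min", higher_is_better=False)
def idle_minutes(data: EvalInput):
    return total_idle_minutes(data)  # hypothetical project helper

# Absolute (unbounded) metric: normalized differently by the chart.
@metric_set.metric("solve_time_ms", unit="ms", higher_is_better=False, metadata={"absolute": True})
def solve_time_ms(data: EvalInput):
    return data.response.solve_time_ms  # hypothetical field
```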

### 11. Clean up

Once all call sites are migrated:

- Delete the old `Metric` / `MetricType` model classes (if they were project-specific)
- Delete the old orchestrator function
- Delete any manual normalization / chart code that dot-metrics now handles
- Remove metric-related utility functions that are now inlined in metric registrations

---

## File organization recommendations

```
your_project/
    metrics/
        __init__.py        # exports metric_set and EvalInput
        metric_set.py      # MetricSet definition + EvalInput dataclass
        quality.py         # @metric_set.metric / .constraint registrations
        performance.py     # more registrations
```

Keep all registrations as module-level decorators. They execute at import time, so importing the module registers the metrics. Your `__init__.py` just needs:

```python
from your_project.metrics.metric_set import metric_set, EvalInput

# Import modules to trigger registration
import your_project.metrics.quality
import your_project.metrics.performance
```

---

## Migration checklist

- [ ] `dot-metrics` added to dependencies
- [ ] Input type defined (single object bundling all metric inputs)
- [ ] `MetricSet` created with generic annotation
- [ ] All metrics registered (decorator or imperative)
- [ ] All constraints registered with thresholds
- [ ] Debug info attached via `ComputedValue` where needed
- [ ] Docstrings added for metric documentation (Range, Interpretation, Notes)
- [ ] Orchestrator replaced with `metric_set.compute()`
- [ ] Batch evaluation wired where applicable
- [ ] Call sites updated to use `EvalResult` API
- [ ] Old metric models and utilities removed
- [ ] `higher_is_better` and `metadata={"absolute": True}` set correctly for visualization

---

## Common gotchas

**Metric functions must take exactly one argument.** If your existing functions take `(context, response)`, you must bundle them. Don't try to use `functools.partial` or closures to work around this -- define a proper input type.

**Constraints pass when `value <= threshold`.** This is a "lower is better" model for violations. If you have a constraint like "availability must be >= 95%", invert it: register the shortfall (`100 - availability`) with `threshold=5`.
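
A sketch of that inversion, assuming a hypothetical `availability_pct()` helper that returns a value in 0-100:

```python
@metric_set.constraint("availability_shortfall", threshold=5, unit="%")
def availability_shortfall(data: EvalInput):
    # Passes when availability >= 95%, i.e. when the shortfall is <= 5.
    return 100 - availability_pct(data)
```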

**Don't duplicate computation.** If multiple metrics share expensive computation (e.g. building a conflict graph), extract it into a helper that takes the input type and cache or precompute as needed. The metric functions can then call the shared helper.
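
One way to do that, as a sketch (`build_conflict_graph` is a hypothetical project helper):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def conflict_graph(data):
    # Expensive shared computation, built once per distinct input.
    return build_conflict_graph(data)  # hypothetical project helper

@metric_set.constraint("overlap_count", threshold=0)
def overlap_count(data: EvalInput):
    return len(conflict_graph(data).edges)

@metric_set.metric("conflict_density")
def conflict_density(data: EvalInput):
    graph = conflict_graph(data)
    return len(graph.edges) / max(len(graph.nodes), 1)
```

`lru_cache` only works if the input object is hashable (e.g. a frozen dataclass); otherwise memoize manually or precompute the graph before calling `compute()`.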

**Batch inputs must all be the same type.** `compute_batch()` runs the same metric set on each input. If you need different metric sets for different scenarios, use separate `MetricSet` instances.
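
For instance (a sketch; `RoutingInput` and the two input collections are hypothetical):

```python
scheduling_metrics: MetricSet[EvalInput] = MetricSet()
routing_metrics: MetricSet[RoutingInput] = MetricSet()  # hypothetical second input type

scheduling_batch = scheduling_metrics.compute_batch(scheduling_inputs)
routing_batch = routing_metrics.compute_batch(routing_inputs)
```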

dot_metrics-0.1.0/.gitignore
@@ -0,0 +1,23 @@
# Python > Build
*.py[cod]
*.egg-info
**/__pycache__/

# Python > Cache
.pytest_cache/

# Python > Coverage
.coverage
coverage_html_report/

# Environment
.venv/

# Ide
.vscode/

# Local files
local/

# Build artifacts
dist/

dot_metrics-0.1.0/LICENSE
@@ -0,0 +1 @@
TODO: TO BE COMPLETED

dot_metrics-0.1.0/PKG-INFO
@@ -0,0 +1,312 @@
Metadata-Version: 2.4
Name: dot-metrics
Version: 0.1.0
Summary: A lightweight metrics and constraint evaluation framework
Project-URL: Homepage, https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-metrics
Project-URL: Repository, https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-metrics
Project-URL: Issues, https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-metrics/-/issues
Author-email: deepika Team <contact@deepika.ai>
License: TODO: TO BE COMPLETED
License-File: LICENSE
Keywords: constraints,deepika,evaluation,metrics,open-toolbox
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.12
Requires-Dist: pydantic>=2.0
Description-Content-Type: text/markdown

# dot-metrics

**dot-metrics** is a lightweight metrics and constraint evaluation framework. Define metrics and constraints, run them against your data, and get structured results with debug info.

## Install

```bash
pip install dot-metrics
```

## Concept

A `MetricSet` holds metric and constraint definitions. Call `compute(data)` to evaluate them.

```
MetricSet
├── metrics: {"coverage": MetricDefinition}
└── constraints: {"errors": ConstraintDefinition}
        │
        ▼
  set.compute(data)
        │
        ▼
EvalResult
├── metrics: {"coverage": Metric}
└── constraints: {"errors": Constraint}
```

**Metrics** are continuous measurements (e.g. coverage rate, score).
**Constraints** are pass/fail checks against a threshold (e.g. error count ≤ 0).

A constraint passes when `value <= threshold`.

## Quick start

```python
from dot_metrics import MetricSet

metric_set = MetricSet()

@metric_set.metric("coverage")
def coverage(data):
    return data["covered"] / data["total"]

@metric_set.constraint("errors", threshold=0)
def errors(data):
    return data["error_count"]

result = metric_set.compute({"covered": 90, "total": 100, "error_count": 0})

result.score("coverage")  # 0.9
result.constraints_ok     # True
```

## Defining metrics and constraints

### Decorator style

```python
metric_set = MetricSet()

@metric_set.metric("latency_ms", unit="ms", higher_is_better=False)
def latency(data):
    return data["total_ms"] / data["requests"]

@metric_set.constraint("error_rate", threshold=0.01, unit="%")
def error_rate(data):
    return data["errors"] / data["requests"]
```

### Imperative style

```python
metric_set = MetricSet()
metric_set.add("coverage", lambda data: data["covered"] / data["total"])
metric_set.add_constraint("errors", lambda data: data["error_count"], threshold=0)
```

Both styles are equivalent. `add()` and `add_constraint()` accept the same keyword arguments as the decorators.

### Parameters

**`metric_set.metric(key, *, unit="", description="", higher_is_better=True, metadata={})`**
**`metric_set.add(key, fn, *, unit="", description="", higher_is_better=True, metadata={})`**

**`metric_set.constraint(key, *, threshold, unit="", description="", metadata={})`**
**`metric_set.add_constraint(key, fn, *, threshold, unit="", description="", metadata={})`**

- `key` — unique name for the metric/constraint
- `threshold` — constraint passes when `value <= threshold`
- `higher_is_better` — affects terminal chart rendering
- `metadata` — arbitrary dict, passed through to results
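
For example, registrations that use all of the keyword arguments (the keys and lambdas are illustrative only):

```python
metric_set.add(
    "queue_depth",
    lambda data: data["queued"],
    unit="items",
    description="Number of items waiting in the queue",
    higher_is_better=False,
    metadata={"absolute": True},
)

metric_set.add_constraint(
    "dropped",
    lambda data: data["dropped"],
    threshold=0,
    unit="items",
    description="Dropped items must be zero",
)
```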

## Computing results

```python
result = metric_set.compute(data)
```

Every metric and constraint function must accept **exactly one argument** — the data object. `data` can be anything: a dict, dataclass, Pydantic model, etc.

### Accessing results

```python
result.score("coverage")                # float — metric value
result.metrics["coverage"].value        # same
result.metrics["coverage"].unit         # ""
result.metrics["coverage"].debug        # {} by default

result.constraints["errors"].value      # float
result.constraints["errors"].passed     # True/False
result.constraints["errors"].threshold  # 0

result.constraints_ok        # True if all constraints passed
result.violations            # list of failed Constraint objects
result.assert_constraints()  # raises ValueError if any failed
```

## Attaching debug info

Return a `ComputedValue` instead of a plain float to attach structured debug data:

```python
from dot_metrics import MetricSet, ComputedValue

metric_set = MetricSet()

@metric_set.metric("coverage")
def coverage(data):
    missed = [x for x in data if not x["covered"]]
    return ComputedValue(value=1 - len(missed) / len(data), debug={"missed": missed})

result = metric_set.compute(data)
result.metrics["coverage"].debug  # {"missed": [...]}
```

`ComputedValue` works the same way for constraints.

## Batch evaluation

Evaluate a set of inputs in one call:

```python
# dict of inputs
batch = metric_set.compute_batch({"run_1": data1, "run_2": data2})
batch["run_1"].score("coverage")  # 0.9
batch.scores("coverage")          # {"run_1": 0.9, "run_2": 0.85}

# list of inputs
batch = metric_set.compute_batch([data1, data2, data3])
batch[0].score("coverage")  # indexed by position
```

`BatchResult` supports iteration, `len()`, and `.items()`.
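
For example, a sketch of walking over a dict-keyed batch:

```python
for key, result in batch.items():
    status = "ok" if result.constraints_ok else "violations"
    print(f"{key}: coverage={result.score('coverage'):.2f} ({status})")

print(len(batch))  # number of evaluated inputs
```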

## Metric documentation

Add a Google-style docstring to a metric or constraint function and it gets parsed into a `help` dict on both the definition and the result:

```python
@metric_set.metric("coverage", unit="%")
def coverage(data):
    """Percentage of code paths covered by tests.

    Range: 0-100
    Interpretation:
    - 90-100: Excellent
    - 70-90: Good
    - <70: Needs improvement
    Notes:
    - Returns 0 for empty input.
    """
    return sum(1 for x in data if x["covered"]) / len(data)

metric_set.metrics["coverage"].help
# {
#   "summary": "Percentage of code paths covered by tests.",
#   "range": "0-100",
#   "interpretation": "- 90-100: Excellent\n- 70-90: Good\n- <70: Needs improvement",
#   "notes": "- Returns 0 for empty input."
# }

result = metric_set.compute(data)
result.metrics["coverage"].help  # same dict
```

Supported sections: `Range:`, `Interpretation:`, `Notes:`. No docstring → `help` is `{}`.

## Typing

`MetricSet` is generic over the input type `T`. Annotating it lets static type checkers (mypy, pyright) verify that every registered function accepts the right type:

```python
from dataclasses import dataclass
from dot_metrics import MetricSet

@dataclass
class SchedulingData:
    appointments: list
    solution: list

metric_set: MetricSet[SchedulingData] = MetricSet()
metric_set.add("rate", lambda d: len(d.solution) / len(d.appointments))

result = metric_set.compute(SchedulingData(appointments=[...], solution=[...]))
```

The annotation is optional — omitting it is fine and everything still works at runtime.

## Terminal chart

```python
from dot_metrics import draw_terminal_chart

print(draw_terminal_chart(result))
# coverage ████████████████████ 0.90
```

`draw_terminal_chart(result, width=40, char="█")` returns a string — use `print()` to display it.

## Full example

```python
from dot_metrics import MetricSet, ComputedValue

appointments = [
    {"id": "A1", "patient": "Alice", "duration": 30},
    {"id": "A2", "patient": "Bob", "duration": 60},
    {"id": "A3", "patient": "Charlie", "duration": 30},
]

solution = [
    {"appointment_id": "A1", "practitioner": "Dr. Martin", "slot": "09:00", "scheduled": True},
    {"appointment_id": "A2", "practitioner": "Dr. Martin", "slot": "09:00", "scheduled": True},  # conflict!
    {"appointment_id": "A3", "practitioner": "Dr. Martin", "slot": "10:00", "scheduled": True},
]

metric_set = MetricSet()

@metric_set.metric("scheduling_rate")
def scheduling_rate(data):
    scheduled = [e for e in data["solution"] if e["scheduled"]]
    unscheduled = [e["appointment_id"] for e in data["solution"] if not e["scheduled"]]
    return ComputedValue(value=len(scheduled) / len(data["appointments"]), debug={"unscheduled": unscheduled})

@metric_set.constraint("conflicts", threshold=0)
def count_conflicts(data):
    seen = {}
    conflicts = []
    for entry in data["solution"]:
        key = (entry["practitioner"], entry["slot"])
        if key in seen:
            conflicts.append((seen[key], entry["appointment_id"]))
        seen[key] = entry["appointment_id"]
    return ComputedValue(value=len(conflicts), debug={"conflicts": conflicts})

result = metric_set.compute({"appointments": appointments, "solution": solution})

result.score("scheduling_rate")        # 1.0
result.constraints_ok                  # False
result.constraints["conflicts"].debug  # {"conflicts": [("A1", "A2")]}
```

## Reference

| Import | Description |
|--------|-------------|
| `MetricSet` | Main class — holds definitions, runs computation |
| `EvalResult` | Output of `compute()` — holds `Metric` and `Constraint` dicts |
| `BatchResult` | Output of `compute_batch()` — maps keys to `EvalResult` |
| `ComputedValue` | Wraps a float return value with optional debug data |
| `Metric` | Computed metric result |
| `Constraint` | Computed constraint result with `passed` flag |
| `MetricDefinition` | Stored metric definition (in `metric_set.metrics`) |
| `ConstraintDefinition` | Stored constraint definition (in `metric_set.constraints`) |
| `draw_terminal_chart` | Renders a Unicode bar chart from an `EvalResult` |

## Contributing & Development

See [docs/CONTRIBUTING.md](docs/CONTRIBUTING.md) and [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md).

## License

See [LICENSE](LICENSE) for details.

## Contact

deepika Team — contact@deepika.ai
Project: [gitlab.com/deepika6190303/deepika-open-toolbox/dot-metrics](https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-metrics)