ato 2.0.3__py3-none-any.whl → 2.1.1__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- ato/__init__.py +1 -1
- ato/scope.py +1 -0
- ato-2.1.1.dist-info/METADATA +1021 -0
- {ato-2.0.3.dist-info → ato-2.1.1.dist-info}/RECORD +7 -7
- ato-2.0.3.dist-info/METADATA +0 -1314
- {ato-2.0.3.dist-info → ato-2.1.1.dist-info}/WHEEL +0 -0
- {ato-2.0.3.dist-info → ato-2.1.1.dist-info}/licenses/LICENSE +0 -0
- {ato-2.0.3.dist-info → ato-2.1.1.dist-info}/top_level.txt +0 -0
ato-2.0.3.dist-info/METADATA
DELETED
@@ -1,1314 +0,0 @@
Metadata-Version: 2.4
Name: ato
Version: 2.0.3
Summary: A frameworkless integration layer for ML pipelines. Ato doesn't compete with frameworks — it restores freedom between them.
Author: ato contributors
License: MIT
Project-URL: Homepage, https://github.com/yourusername/ato
Project-URL: Repository, https://github.com/yourusername/ato
Project-URL: Documentation, https://github.com/yourusername/ato#readme
Project-URL: Issues, https://github.com/yourusername/ato/issues
Keywords: config management,experiment tracking,hyperparameter optimization,lightweight,composable,namespace isolation,machine learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0
Requires-Dist: toml>=0.10.2
Requires-Dist: sqlalchemy>=2.0
Requires-Dist: numpy>=1.19.0
Provides-Extra: distributed
Requires-Dist: torch>=1.8.0; extra == "distributed"
Dynamic: license-file

# Ato: A Tiny Orchestrator

## A frameworkless integration layer for Python and ML pipelines

**Ato doesn't compete with frameworks — it restores freedom between them.**

Every framework tries to own your workflow.
Ato doesn't. It just gives you the **handles to connect them**.

Unlike Hydra, MLflow, or W&B — which build ecosystems —
Ato provides **structural boundaries** that let you compose, track, and optimize
without surrendering control to any single system.

You can:
- **Chain** configs from multiple sources (Hydra YAML, OpenMMLab, raw Python)
- **Merge** them with explicit priority control — and **debug** the exact merge order
- **Track** experiments in SQLite with zero setup — or sync to MLflow/W&B when you need dashboards
- **Optimize** hyperparameters with built-in Hyperband — or use Optuna/Ray Tune alongside

Ato keeps everything **transparent** and **Pythonic**.
No magic. No vendor lock-in. Just composable pieces you control.

<details>
<summary><strong>Design Philosophy</strong></summary>

Ato was designed from a simple realization:
**frameworks solve composition, but they also create coupling.**

When I built Ato, I didn't know Hydra existed.
But I did know I wanted something that wouldn't **own** my workflow —
something that could **compose** with whatever came next.

That constraint led to three design principles:

1. **Structural neutrality** — Ato has no opinion on your stack.
   It's a layer, not a platform.

2. **Explicit boundaries** — Each module (ADict, Scope, SQLTracker, HyperOpt) is independent.
   Use one, use all, or mix with other tools. No forced dependencies.

3. **Debuggable composition** — When configs merge from 5 sources, you should see **why** a value was set.
   Ato's `manual` command shows the exact merge order — a feature no other tool has.

This isn't minimalism for its own sake.
It's **structural restraint** — interfering only where necessary,
and staying out of the way everywhere else.

</details>

---

## Why Ato?

Ato solves a problem that frameworks create: **workflow ownership**.

| Framework Approach | Ato's Approach |
|-------------------|----------------|
| Hydra owns config composition | Ato **composes** Hydra configs + raw Python + CLI args |
| MLflow owns experiment tracking | Ato tracks locally in SQLite, **or** syncs to MLflow |
| W&B owns hyperparameter search | Ato provides Hyperband, **or** you use Optuna/Ray |
| Each framework wants to be **the** system | Ato is **a layer** you control |

**Ato is for teams who want:**
- Config flexibility without framework lock-in
- Experiment tracking without mandatory cloud platforms
- Hyperparameter optimization without opaque black boxes
- The ability to **change tools** without rewriting pipelines

### What Makes Ato Structurally Different

These aren't features — they're **architectural decisions** that frameworks can't replicate without breaking their own abstractions:

| Capability | Why Frameworks Can't Do This | What It Enables |
|------------|------------------------------|-----------------|
| **MultiScope** (namespace isolation) | Frameworks use global config namespaces | Multiple teams can own separate config scopes without key collisions. No `model_lr` vs `data_lr` prefixing needed. |
| **`manual` command** (merge order debugging) | Frameworks show final configs, not merge logic | See **why** a value was set — trace exact merge order across defaults, named configs, CLI args, and lazy evaluation. |
| **Structural hashing** | Frameworks track values, not structure | Detect when experiment **architecture** changes (not just hyperparameters). Critical for reproducibility. |
| **Offline-first tracking** | Frameworks assume centralized platforms | Zero-setup SQLite tracking. No servers, no auth, no vendor lock-in. Sync to MLflow/W&B only when needed. |

### Developer Experience

- **Zero boilerplate** — Auto-nested configs (`cfg.model.backbone.depth = 50` just works), lazy evaluation, attribute access
- **CLI-first** — Override any config from command line without touching code: `python train.py model.backbone=%resnet101%`
- **Framework agnostic** — Works with PyTorch, TensorFlow, JAX, or pure Python. No framework-specific decorators or magic.

## Quick Start

```bash
pip install ato
```

### 30-Second Example

```python
from ato.scope import Scope

scope = Scope()

@scope.observe(default=True)
def config(cfg):
    cfg.lr = 0.001
    cfg.batch_size = 32
    cfg.model = 'resnet50'

@scope
def train(cfg):
    print(f"Training {cfg.model} with lr={cfg.lr}")
    # Your training code here

if __name__ == '__main__':
    train()  # python train.py
    # Override from CLI: python train.py lr=0.01 model=%resnet101%
```

---

## Table of Contents

- [ADict: Enhanced Dictionary](#adict-enhanced-dictionary)
- [Scope: Configuration Management](#scope-configuration-management)
- [MultiScope: Namespace Isolation](#2-multiscope---multiple-configuration-contexts) ⭐ Unique to Ato
- [Config Documentation & Debugging](#5-configuration-documentation--inspection) ⭐ Unique to Ato
- [SQL Tracker: Experiment Tracking](#sql-tracker-experiment-tracking)
- [Hyperparameter Optimization](#hyperparameter-optimization)
- [Best Practices](#best-practices)
- [Roadmap](#roadmap-expanding-boundaries-without-breaking-neutrality)
- [Working with Existing Tools](#working-with-existing-tools)

---

## ADict: Enhanced Dictionary

`ADict` is an enhanced dictionary designed for managing experiment configurations. It combines the simplicity of Python dictionaries with powerful features for ML workflows.

### Core Features

These are the fundamental capabilities that make ADict powerful for experiment management:

| Feature | Description | Why It Matters |
|---------|-------------|----------------|
| **Structural Hashing** | Hash based on keys + types, not values | Track when experiment structure changes |
| **Nested Access** | Dot notation for nested configs | `config.model.lr` instead of `config['model']['lr']` |
| **Format Agnostic** | Load/save JSON, YAML, TOML, XYZ | Work with any config format |
| **Safe Updates** | `update_if_absent()` method | Prevent accidental overwrites |

### Developer Convenience Features

These utilities maximize developer productivity and reduce boilerplate:

| Feature | Description | Benefit |
|---------|-------------|---------|
| **Auto-nested (`ADict.auto()`)** | Infinite depth lazy creation | `config.a.b.c = 1` just works - no KeyError |
| **Attribute-style Assignment** | `config.lr = 0.1` | Cleaner, more readable code |
| **Conditional Updates** | Only update missing keys | Merge configs safely |

### Quick Examples

```python
from ato.adict import ADict

# Structural hashing - track config structure changes
config1 = ADict(lr=0.1, epochs=100, model='resnet50')
config2 = ADict(lr=0.01, epochs=200, model='resnet101')
print(config1.get_structural_hash() == config2.get_structural_hash())  # True

config3 = ADict(lr=0.1, epochs='100', model='resnet50')  # epochs is str!
print(config1.get_structural_hash() == config3.get_structural_hash())  # False

# Load/save any format
config = ADict.from_file('config.json')
config.dump('config.yaml')

# Safe updates
config.update_if_absent(lr=0.01, scheduler='cosine')  # Only adds scheduler
```

### Convenience Features in Detail

#### Auto-nested: Zero Boilerplate Config Building

The most loved feature - no more manual nesting:

```python
# ❌ Traditional way
config = ADict()
config.model = ADict()
config.model.backbone = ADict()
config.model.backbone.layers = [64, 128, 256]

# ✅ With ADict.auto()
config = ADict.auto()
config.model.backbone.layers = [64, 128, 256]  # Just works!
config.data.augmentation.brightness = 0.2
```

**Perfect for Scope integration**:

```python
from ato.scope import Scope

scope = Scope()

@scope.observe(default=True)
def config(cfg):
    # No pre-definition needed!
    cfg.training.optimizer.name = 'AdamW'
    cfg.training.optimizer.lr = 0.001
    cfg.model.encoder.num_layers = 12
```

**Works with CLI**:

```bash
python train.py model.backbone.resnet.depth=50 data.batch_size=32
```

#### More Convenience Utilities

```python
# Attribute-style access
config.lr = 0.1
print(config.lr)  # Instead of config['lr']

# Nested access
print(config.model.backbone.type)  # Clean and readable

# Conditional updates - merge configs safely
base_config.update_if_absent(**experiment_config)
```

---

## Scope: Configuration Management

Scope solves configuration complexity through **priority-based merging** and **CLI integration**. No more scattered config files or hard-coded parameters.

### Key Concepts

```
Default Configs (priority=0)
    ↓
Named Configs (priority=0+)
    ↓
CLI Arguments (highest priority)
    ↓
Lazy Configs (computed after CLI)
```

### Basic Usage

#### Simple Configuration

```python
from ato.scope import Scope

scope = Scope()

@scope.observe()
def my_config(config):
    config.dataset = 'cifar10'
    config.lr = 0.001
    config.batch_size = 32

@scope
def train(config):
    print(f"Training on {config.dataset}")
    # Your code here

if __name__ == '__main__':
    train()
```

#### Priority-based Merging

```python
@scope.observe(default=True)  # Always applied
def defaults(cfg):
    cfg.lr = 0.001
    cfg.epochs = 100

@scope.observe(priority=1)  # Applied after defaults
def high_lr(cfg):
    cfg.lr = 0.01

@scope.observe(priority=2)  # Applied last
def long_training(cfg):
    cfg.epochs = 300
```

```bash
python train.py                        # lr=0.001, epochs=100
python train.py high_lr                # lr=0.01, epochs=100
python train.py high_lr long_training  # lr=0.01, epochs=300
```

#### CLI Configuration

Override any parameter from command line:

```bash
# Simple values
python train.py lr=0.01 batch_size=64

# Nested configs
python train.py model.backbone=%resnet101% model.depth=101

# Lists and complex types
python train.py layers=[64,128,256,512] dropout=0.5

# Combine with named configs
python train.py my_config lr=0.001 batch_size=128
```

**Note**: Wrap strings with `%` (e.g., `%resnet101%`) instead of quotes.

### Advanced Features

#### 1. Lazy Evaluation - Dynamic Configuration

Sometimes you need configs that depend on other values set via CLI:

```python
@scope.observe()
def base_config(cfg):
    cfg.model = 'resnet50'
    cfg.dataset = 'imagenet'

@scope.observe(lazy=True)  # Evaluated AFTER CLI args
def computed_config(cfg):
    # Adjust based on dataset
    if cfg.dataset == 'imagenet':
        cfg.num_classes = 1000
        cfg.image_size = 224
    elif cfg.dataset == 'cifar10':
        cfg.num_classes = 10
        cfg.image_size = 32
```

```bash
python train.py dataset=%cifar10% computed_config
# Results in: num_classes=10, image_size=32
```

**Python 3.11+ Context Manager**:

```python
@scope.observe()
def my_config(cfg):
    cfg.model = 'resnet50'
    cfg.num_layers = 50

    with Scope.lazy():  # Evaluated after CLI
        if cfg.model == 'resnet101':
            cfg.num_layers = 101
```

#### 2. MultiScope - Multiple Configuration Contexts

**Unique to Ato**: Manage completely separate configuration namespaces. Unlike Hydra's config groups, MultiScope provides true **namespace isolation** with independent priority systems.

##### Why MultiScope?

| Challenge | Hydra's Approach | Ato's MultiScope |
|-----------|------------------|------------------|
| Separate model/data configs | Config groups in one namespace | **Independent scopes with own priorities** |
| Avoid key collisions | Manual prefixing (`model.lr`, `train.lr`) | **Automatic namespace isolation** |
| Different teams/modules | Single config file | **Each scope can be owned separately** |
| Priority conflicts | Global priority system | **Per-scope priority system** |

##### Basic Usage

```python
from ato.scope import Scope, MultiScope

model_scope = Scope(name='model')
data_scope = Scope(name='data')
scope = MultiScope(model_scope, data_scope)

@model_scope.observe(default=True)
def model_config(model):
    model.backbone = 'resnet50'
    model.pretrained = True

@data_scope.observe(default=True)
def data_config(data):
    data.dataset = 'cifar10'
    data.batch_size = 32

@scope
def train(model, data):  # Named parameters match scope names
    print(f"Training {model.backbone} on {data.dataset}")
```

##### Real-world: Team Collaboration

Different team members can own different scopes without conflicts:

```python
# team_model.py - ML team owns this
model_scope = Scope(name='model')

@model_scope.observe(default=True)
def resnet_default(model):
    model.backbone = 'resnet50'
    model.lr = 0.1  # Model-specific learning rate

@model_scope.observe(priority=1)
def resnet101(model):
    model.backbone = 'resnet101'
    model.lr = 0.05  # Different lr for bigger model

# team_data.py - Data team owns this
data_scope = Scope(name='data')

@data_scope.observe(default=True)
def cifar_default(data):
    data.dataset = 'cifar10'
    data.lr = 0.001  # Data augmentation learning rate (no conflict!)

@data_scope.observe(priority=1)
def imagenet(data):
    data.dataset = 'imagenet'
    data.workers = 16

# train.py - Integration point
from team_model import model_scope
from team_data import data_scope

scope = MultiScope(model_scope, data_scope)

@scope
def train(model, data):
    # Both have 'lr' but in separate namespaces!
    print(f"Model LR: {model.lr}, Data LR: {data.lr}")
```

**Key advantage**: `model.lr` and `data.lr` are completely independent. No need for naming conventions like `model_lr` vs `data_lr`.

##### CLI with MultiScope

Override each scope independently:

```bash
# Override model scope only
python train.py model.backbone=%resnet101%

# Override data scope only
python train.py data.dataset=%imagenet%

# Override both
python train.py model.backbone=%resnet101% data.dataset=%imagenet%

# Call named configs per scope
python train.py resnet101 imagenet
```

#### 3. Import/Export Configs

Ato supports importing configs from multiple frameworks:

```python
@scope.observe()
def load_external(config):
    # Load from any format
    config.load('experiments/baseline.json')
    config.load('models/resnet.yaml')

    # Export to any format
    config.dump('output/final_config.toml')

    # Import OpenMMLab configs - handles _base_ inheritance automatically
    config.load_mm_config('mmdet_configs/faster_rcnn.py')
```

**OpenMMLab compatibility** is built-in:
- Automatically resolves `_base_` inheritance chains
- Supports `_delete_` keys for config overriding
- Makes migration from MMDetection/MMSegmentation/etc. seamless

**Hydra-style config composition** is also built-in via `compose_hierarchy`:

```python
from ato.adict import ADict

# Hydra-style directory structure:
# configs/
# ├── config.yaml          # base config
# ├── model/
# │   ├── resnet50.yaml
# │   └── resnet101.yaml
# └── data/
#     ├── cifar10.yaml
#     └── imagenet.yaml

config = ADict.compose_hierarchy(
    root='configs',
    config_filename='config',
    select={
        'model': 'resnet50',  # or ['resnet50', 'resnet101'] for multiple
        'data': 'imagenet'
    },
    overrides={
        'model.lr': 0.01,
        'data.batch_size': 64
    },
    required=['model.backbone', 'data.dataset'],  # Validation
    on_missing='warn'  # or 'error'
)
```

**Key features**:
- Config groups (model/, data/, optimizer/, etc.)
- Automatic file discovery (tries .yaml, .json, .toml, .xyz)
- Dotted overrides (`model.lr=0.01`)
- Required key validation
- Flexible error handling

#### 4. Argparse Integration

Mix Ato with existing argparse code:

```python
from ato.scope import Scope
import argparse

scope = Scope(use_external_parser=True)
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=int, default=0)
parser.add_argument('--seed', type=int, default=42)

@scope.observe(default=True)
def config(cfg):
    cfg.lr = 0.001
    cfg.batch_size = 32

@scope
def train(cfg):
    print(f"GPU: {cfg.gpu}, LR: {cfg.lr}")

if __name__ == '__main__':
    parser.parse_args()  # Merges argparse with scope
    train()
```

#### 5. Configuration Documentation & Inspection

**One of Ato's most powerful features**: Auto-generate documentation AND visualize the exact order of configuration application.

##### Basic Documentation

```python
@scope.manual
def config_docs(cfg):
    cfg.lr = 'Learning rate for optimizer'
    cfg.batch_size = 'Number of samples per batch'
    cfg.model = 'Model architecture (resnet50, resnet101, etc.)'
```

```bash
python train.py manual
```

**Output:**
```
--------------------------------------------------
[Scope "config"]
(The Applying Order of Views)
defaults → (CLI Inputs) → lazy_config → main

(User Manuals)
config.lr: Learning rate for optimizer
config.batch_size: Number of samples per batch
config.model: Model architecture (resnet50, resnet101, etc.)
--------------------------------------------------
```

##### Why This Matters

The **applying order visualization** shows you **exactly** how your configs are merged:
- Which config functions are applied (in order)
- When CLI inputs override values
- Where lazy configs are evaluated
- The final function that uses the config

**This prevents configuration bugs** by making the merge order explicit and debuggable.

##### MultiScope Documentation

For complex projects with multiple scopes, `manual` shows each scope separately:

```python
from ato.scope import Scope, MultiScope

model_scope = Scope(name='model')
train_scope = Scope(name='train')
scope = MultiScope(model_scope, train_scope)

@model_scope.observe(default=True)
def model_defaults(model):
    model.backbone = 'resnet50'
    model.num_layers = 50

@model_scope.observe(priority=1)
def model_advanced(model):
    model.pretrained = True

@model_scope.observe(lazy=True)
def model_lazy(model):
    if model.backbone == 'resnet101':
        model.num_layers = 101

@train_scope.observe(default=True)
def train_defaults(train):
    train.lr = 0.001
    train.epochs = 100

@model_scope.manual
def model_docs(model):
    model.backbone = 'Model backbone architecture'
    model.num_layers = 'Number of layers in the model'

@train_scope.manual
def train_docs(train):
    train.lr = 'Learning rate for optimizer'
    train.epochs = 'Total training epochs'

@scope
def main(model, train):
    print(f"Training {model.backbone} with lr={train.lr}")

if __name__ == '__main__':
    main()
```

```bash
python train.py manual
```

**Output:**
```
--------------------------------------------------
[Scope "model"]
(The Applying Order of Views)
model_defaults → model_advanced → (CLI Inputs) → model_lazy → main

(User Manuals)
model.backbone: Model backbone architecture
model.num_layers: Number of layers in the model
--------------------------------------------------
[Scope "train"]
(The Applying Order of Views)
train_defaults → (CLI Inputs) → main

(User Manuals)
train.lr: Learning rate for optimizer
train.epochs: Total training epochs
--------------------------------------------------
```

##### Real-world Example

This is especially valuable when debugging why a config value isn't what you expect:

```python
@scope.observe(default=True)
def defaults(cfg):
    cfg.lr = 0.001

@scope.observe(priority=1)
def experiment_config(cfg):
    cfg.lr = 0.01

@scope.observe(priority=2)
def another_config(cfg):
    cfg.lr = 0.1

@scope.observe(lazy=True)
def adaptive_lr(cfg):
    if cfg.batch_size > 64:
        cfg.lr = cfg.lr * 2
```

When you run `python train.py manual`, you see:
```
(The Applying Order of Views)
defaults → experiment_config → another_config → (CLI Inputs) → adaptive_lr → main
```

Now it's **crystal clear** why `lr=0.1` (from `another_config`) and not `0.01`!

---

## SQL Tracker: Experiment Tracking

Lightweight experiment tracking using SQLite - no external services, no setup complexity.

### Why SQL Tracker?

- **Zero Setup**: Just a SQLite file, no servers
- **Full History**: Track all runs, metrics, and artifacts
- **Smart Search**: Find similar experiments by config structure
- **Code Versioning**: Track code changes via fingerprints

### Database Schema

```
Project (my_ml_project)
├── Experiment (run_1)
│   ├── config: {...}
│   ├── structural_hash: "abc123..."
│   ├── Metrics: [loss, accuracy, ...]
│   ├── Artifacts: [model.pt, plots/*, ...]
│   └── Fingerprints: [model_forward, train_step, ...]
├── Experiment (run_2)
└── ...
```

### Quick Start

#### Logging Experiments

```python
from ato.db_routers.sql.manager import SQLLogger
from ato.adict import ADict

# Setup config
config = ADict(
    experiment=ADict(
        project_name='image_classification',
        sql=ADict(db_path='sqlite:///experiments.db')
    ),
    # Your hyperparameters
    lr=0.001,
    batch_size=32,
    model='resnet50'
)

# Create logger
logger = SQLLogger(config)

# Start experiment run
run_id = logger.run(tags=['baseline', 'resnet50', 'cifar10'])

# Training loop
for epoch in range(100):
    # Your training code
    train_loss = train_one_epoch()
    val_acc = validate()

    # Log metrics
    logger.log_metric('train_loss', train_loss, step=epoch)
    logger.log_metric('val_accuracy', val_acc, step=epoch)

# Log artifacts
logger.log_artifact(run_id, 'checkpoints/model_best.pt',
                    data_type='model',
                    metadata={'epoch': best_epoch})

# Finish run
logger.finish(status='completed')
```

#### Querying Experiments

```python
from ato.db_routers.sql.manager import SQLFinder

finder = SQLFinder(config)

# Get all runs in project
runs = finder.get_runs_in_project('image_classification')
for run in runs:
    print(f"Run {run.id}: {run.config.model} - {run.status}")

# Find best performing run
best_run = finder.find_best_run(
    project_name='image_classification',
    metric_key='val_accuracy',
    mode='max'  # or 'min' for loss
)
print(f"Best config: {best_run.config}")

# Find similar experiments (same config structure)
similar = finder.find_similar_runs(run_id=123)
print(f"Found {len(similar)} runs with similar config structure")

# Trace statistics (code fingerprints)
stats = finder.get_trace_statistics('image_classification', trace_id='model_forward')
print(f"Model forward pass has {stats['static_trace_versions']} versions")
```

### Real-world Example: Experiment Comparison

```python
# Compare hyperparameter impact
finder = SQLFinder(config)

runs = finder.get_runs_in_project('my_project')
for run in runs:
    # Get final accuracy
    final_metrics = [m for m in run.metrics if m.key == 'val_accuracy']
    best_acc = max(m.value for m in final_metrics) if final_metrics else 0

    print(f"LR: {run.config.lr}, Batch: {run.config.batch_size} → Acc: {best_acc:.2%}")
```

### Features Summary

| Feature | Description |
|---------|-------------|
| **Structural Hash** | Auto-track config structure changes |
| **Metric Logging** | Time-series metrics with step tracking |
| **Artifact Management** | Track model checkpoints, plots, data files |
| **Fingerprint Tracking** | Version control for code (static & runtime) |
| **Smart Search** | Find similar configs, best runs, statistics |

---

## Hyperparameter Optimization

Built-in **Hyperband** algorithm for efficient hyperparameter search with early stopping.

### Extensible Design

Ato's hyperopt module is built for extensibility and reusability:

| Component | Purpose | Benefit |
|-----------|---------|---------|
| `GridSpaceMixIn` | Parameter sampling logic | Reusable across different algorithms |
| `HyperOpt` | Base optimization class | Easy to implement custom strategies |
| `DistributedMixIn` | Distributed training support | Optional, composable |

**This design makes it trivial to implement custom search algorithms**:

```python
from ato.hyperopt.base import GridSpaceMixIn, HyperOpt

class RandomSearch(GridSpaceMixIn, HyperOpt):
    def main(self, func):
        # Reuse GridSpaceMixIn.prepare_distributions()
        configs = self.prepare_distributions(self.config, self.search_spaces)

        # Implement random sampling
        import random
        random.shuffle(configs)

        results = []
        for config in configs[:10]:  # Sample 10 random configs
            metric = func(config)
            results.append((config, metric))

        return max(results, key=lambda x: x[1])
```

### How Hyperband Works

Hyperband uses successive halving:
1. Start with many configs, train briefly
2. Keep top performers, discard poor ones
3. Train survivors longer
4. Repeat until one winner remains

### Basic Usage

```python
from ato.adict import ADict
from ato.hyperopt.hyperband import HyperBand
from ato.scope import Scope

scope = Scope()

# Define search space
search_spaces = ADict(
    lr=ADict(
        param_type='FLOAT',
        param_range=(1e-5, 1e-1),
        num_samples=20,
        space_type='LOG'  # Logarithmic spacing
    ),
    batch_size=ADict(
        param_type='INTEGER',
        param_range=(16, 128),
        num_samples=5,
        space_type='LOG'
    ),
    model=ADict(
        param_type='CATEGORY',
        categories=['resnet50', 'resnet101', 'efficientnet_b0']
    )
)

# Create Hyperband optimizer
hyperband = HyperBand(
    scope,
    search_spaces,
    halving_rate=0.3,    # Keep top 30% each round
    num_min_samples=3,   # Stop when <= 3 configs remain
    mode='max'           # Maximize metric (use 'min' for loss)
)

@hyperband.main
def train(config):
    # Your training code
    model = create_model(config.model)
    optimizer = Adam(lr=config.lr)

    # Use __num_halved__ for early stopping
    num_epochs = compute_epochs(config.__num_halved__)

    # Train and return metric
    val_acc = train_and_evaluate(model, optimizer, num_epochs)
    return val_acc

if __name__ == '__main__':
    # Run hyperparameter search
    best_result = train()
    print(f"Best config: {best_result.config}")
    print(f"Best metric: {best_result.metric}")
```

### Automatic Step Calculation

Let Hyperband compute optimal training steps:

```python
hyperband = HyperBand(scope, search_spaces, halving_rate=0.3, num_min_samples=4)

max_steps = 100000
steps_per_generation = hyperband.compute_optimized_initial_training_steps(max_steps)
# Example output: [27, 88, 292, 972, 3240, 10800, 36000, 120000]

# Use in training
@hyperband.main
def train(config):
    generation = config.__num_halved__
    num_steps = steps_per_generation[generation]

    metric = train_for_n_steps(num_steps)
    return metric
```

### Parameter Types

| Type | Description | Example |
|------|-------------|---------|
| `FLOAT` | Continuous values | Learning rate, dropout |
| `INTEGER` | Discrete integers | Batch size, num layers |
| `CATEGORY` | Categorical choices | Model type, optimizer |

Space types:
- `LOG`: Logarithmic spacing (good for learning rates)
- `LINEAR`: Linear spacing (default)

### Distributed Hyperparameter Search

Ato supports distributed hyperparameter optimization out of the box:

```python
from ato.hyperopt.hyperband import DistributedHyperBand
import torch.distributed as dist

# Initialize distributed training
dist.init_process_group(backend='nccl')
rank = dist.get_rank()
world_size = dist.get_world_size()

# Create distributed hyperband
hyperband = DistributedHyperBand(
    scope,
    search_spaces,
    halving_rate=0.3,
    num_min_samples=3,
    mode='max',
    rank=rank,
    world_size=world_size,
    backend='pytorch'
)

@hyperband.main
def train(config):
    # Your distributed training code
    model = create_model(config)
    model = DDP(model, device_ids=[rank])
    metric = train_and_evaluate(model)
    return metric

if __name__ == '__main__':
    result = train()
    if rank == 0:
        print(f"Best config: {result.config}")
```

**Key features**:
- Automatic work distribution across GPUs
- Synchronized config selection via `broadcast_object_from_root`
- Results aggregation with `all_gather_object`
- Compatible with PyTorch DDP, FSDP, DeepSpeed

---

## Best Practices

### 1. Project Structure

```
my_project/
├── configs/
│   ├── default.py       # Default config with @scope.observe(default=True)
│   ├── models.py        # Model-specific configs
│   └── datasets.py      # Dataset configs
├── train.py             # Main training script
├── experiments.db       # SQLite experiment tracking
└── experiments/
    ├── run_001/
    │   ├── checkpoints/
    │   └── logs/
    └── run_002/
```

### 2. Config Organization

```python
# configs/default.py
from ato.adict import ADict
from ato.scope import Scope

scope = Scope()

@scope.observe(default=True)
def defaults(cfg):
    # Data
    cfg.data = ADict(
        dataset='cifar10',
        batch_size=32,
        num_workers=4
    )

    # Model
    cfg.model = ADict(
        backbone='resnet50',
        pretrained=True
    )

    # Training
    cfg.train = ADict(
        lr=0.001,
        epochs=100,
        optimizer='adam'
    )

    # Experiment tracking
    cfg.experiment = ADict(
        project_name='my_project',
        sql=ADict(db_path='sqlite:///experiments.db')
    )
```

### 3. Combined Workflow

```python
from ato.scope import Scope
from ato.db_routers.sql.manager import SQLLogger
from configs.default import scope

@scope
def train(cfg):
    # Setup experiment tracking
    logger = SQLLogger(cfg)
    run_id = logger.run(tags=[cfg.model.backbone, cfg.data.dataset])

    try:
        # Training loop
        for epoch in range(cfg.train.epochs):
            loss = train_epoch()
            acc = validate()

            logger.log_metric('loss', loss, epoch)
            logger.log_metric('accuracy', acc, epoch)

        logger.finish(status='completed')

    except Exception as e:
        logger.finish(status='failed')
        raise e

if __name__ == '__main__':
    train()
```

### 4. Reproducibility Checklist

- ✅ Use structural hashing to track config changes
- ✅ Log all hyperparameters to SQLLogger
- ✅ Tag experiments with meaningful labels
- ✅ Track artifacts (checkpoints, plots)
- ✅ Use lazy configs for derived parameters
- ✅ Document configs with `@scope.manual`

---

## Requirements

- Python >= 3.7
- SQLAlchemy (for SQL Tracker)
- PyYAML, toml (for config serialization)

See `pyproject.toml` for full dependencies.

---

## License

MIT License

---

## Roadmap: Expanding Boundaries Without Breaking Neutrality

Ato's design constraint is **structural neutrality** — adding capabilities without creating dependencies.

### Planned: Local Dashboard (Optional Module)

A lightweight HTML dashboard for teams that want visual exploration **without** committing to MLflow/W&B:

**What it adds:**
- Metric comparison & trends (read-only view of SQLite data)
- Run history & artifact browsing
- Config diff visualization (including structural hash changes)
- Interactive hyperparameter analysis

**Design constraints:**
- **No hard dependency** — Ato core works 100% without the dashboard
- **Separate process** — Dashboard reads from SQLite; doesn't block or modify runs
- **Zero lock-in** — Remove the dashboard, and your training code doesn't change
- **Composable** — Use it alongside MLflow/W&B, or replace either one

### Why This Fits Ato's Philosophy

The dashboard is **not** a platform — it's a **view** into data you already own (SQLite).

| What It Doesn't Do | Why That Matters |
|--------------------|------------------|
| Doesn't store data | You can delete it without losing experiments |
| Doesn't require auth | No accounts, no vendors, no network calls |
| Doesn't modify configs | Pure read-only visualization |
| Doesn't couple to Ato's core | Works with any SQLite database |

This preserves Ato's design principle: **provide handles, not ownership.**

### Modular Adoption Path

| What You Need | What You Use |
|---------------|--------------|
| Just configs | `ADict` + `Scope` — no DB, no UI |
| Headless tracking | Add SQLTracker — still no UI |
| Local visualization | Add dashboard daemon — run/stop anytime |
| Team collaboration | Sync to MLflow/W&B dashboards |

**Guiding principle:** Ato remains a set of **independent, composable tools** — not a platform you commit to.

---

## Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

### Development Setup

```bash
git clone https://github.com/yourusername/ato.git
cd ato
pip install -e .
```

---

## Working with Existing Tools

**Ato is an integration layer, not a replacement.**

It's designed to **compose** with Hydra, MLflow, W&B, and whatever tools you already use.
The goal isn't to compete — it's to **give you handles** for connecting systems without coupling your workflow to any single platform.

### Composition Strategy

Ato provides three structural capabilities that frameworks don't:

| What Ato Adds | Why It Matters | How It Composes |
|---------------|----------------|-----------------|
| **MultiScope** | True namespace isolation | Multiple config sources (Hydra, raw Python, CLI) coexist without key collisions |
| **`manual` command** | Config merge order visualization | Debug **why** a value was set — see exact merge order across all sources |
| **Offline-first tracking** | Zero-setup SQLite tracking | Track locally, **then** sync to MLflow/W&B only when you need dashboards |

These aren't "features" — they're **structural boundaries** that let you compose tools freely.

### Ato + Hydra: Designed to Compose

Ato has **native Hydra composition** via `compose_hierarchy()`:

```python
from ato.adict import ADict

# Load Hydra-style configs directly
config = ADict.compose_hierarchy(
    root='configs',
    config_filename='config',
    select={'model': 'resnet50', 'data': 'imagenet'},
    overrides={'model.lr': 0.01}
)

# Now add Ato's structural boundaries:
# - MultiScope for independent namespaces
# - `manual` command to debug merge order
# - SQLite tracking without MLflow overhead
```

**You're not replacing Hydra** — you're **extending** it with namespace isolation and debuggable composition.

### Integration Matrix

Ato is designed to work **between** frameworks:

| Tool | What It Owns | What Ato Adds |
|------|--------------|---------------|
| **Hydra** | Config composition from YAML | MultiScope (namespace isolation) + merge debugging |
| **MLflow** | Centralized experiment platform | Local-first SQLite tracking + structural hashing |
| **W&B** | Cloud-based tracking + dashboards | Offline tracking + sync when ready |
| **OpenMMLab** | Config inheritance (`_base_`) | Direct import via `load_mm_config()` |
| **Optuna/Ray Tune** | Advanced hyperparameter search | Built-in Hyperband + composable with their optimizers |

### Composition Patterns

**Pattern 1: Ato as the integration layer**
```
Hydra (config source) → Ato (composition + tracking) → MLflow (dashboards)
```

**Pattern 2: Ato for local development**
```
Local experiments: Ato (full stack)
Production: Ato → MLflow (centralized tracking)
```

**Pattern 3: Gradual adoption**
```
Start: Ato alone (zero dependencies)
Scale: Add Hydra for complex configs
Collaborate: Sync to W&B for team dashboards
```

### When to Use What

**Use Ato alone** when:
- You want zero external dependencies
- You need namespace isolation (MultiScope)
- You want to debug config merge order

**Compose with Hydra** when:
- Your team already has Hydra YAML configs
- You need deep config hierarchies + namespace isolation
- You want to see **why** a Hydra config set a value (`manual` command)

**Compose with MLflow/W&B** when:
- You want local-first tracking with optional cloud sync
- You need structural hashing + offline SQLite
- You're migrating between tracking platforms

**You don't need Ato** if:
- You're fully committed to a single framework ecosystem
- You don't need debuggable config composition
- You never switch between tools

### What Ato Doesn't Do

Ato intentionally **doesn't** build an ecosystem:
- No web dashboards → Use MLflow/W&B
- No model registry → Use MLflow
- No dataset versioning → Use DVC/W&B
- No plugin marketplace → Use Hydra

**Ato's goal:** Stay out of your way. Provide handles. Let you change tools without rewriting code.
|