ato 2.0.4__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- ato/__init__.py +1 -0
- ato/adict.py +582 -0
- ato/db_routers/__init__.py +8 -0
- ato/db_routers/sql/__init__.py +0 -0
- ato/db_routers/sql/manager.py +188 -0
- ato/db_routers/sql/schema.py +83 -0
- ato/hyperopt/__init__.py +0 -0
- ato/hyperopt/base.py +144 -0
- ato/hyperopt/hyperband.py +103 -0
- ato/parser.py +103 -0
- ato/scope.py +492 -0
- ato/utils.py +55 -0
- ato/xyz.py +234 -0
- ato-2.0.4.dist-info/METADATA +978 -0
- ato-2.0.4.dist-info/RECORD +18 -0
- ato-2.0.4.dist-info/WHEEL +5 -0
- ato-2.0.4.dist-info/licenses/LICENSE +21 -0
- ato-2.0.4.dist-info/top_level.txt +1 -0

@@ -0,0 +1,978 @@
Metadata-Version: 2.4
Name: ato
Version: 2.0.4
Summary: Configuration, experimentation, and hyperparameter optimization for Python. No runtime magic. No launcher. Just Python modules you compose.
Author: ato contributors
License: MIT
Project-URL: Homepage, https://github.com/yourusername/ato
Project-URL: Repository, https://github.com/yourusername/ato
Project-URL: Documentation, https://github.com/yourusername/ato#readme
Project-URL: Issues, https://github.com/yourusername/ato/issues
Keywords: config management,experiment tracking,hyperparameter optimization,lightweight,composable,namespace isolation,machine learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0
Requires-Dist: toml>=0.10.2
Requires-Dist: sqlalchemy>=2.0
Requires-Dist: numpy>=1.19.0
Provides-Extra: distributed
Requires-Dist: torch>=1.8.0; extra == "distributed"
Dynamic: license-file

# Ato: A Tiny Orchestrator

**Configuration, experimentation, and hyperparameter optimization for Python.**

No runtime magic. No launcher. No platform.
Just Python modules you compose.

```bash
pip install ato
```

---

## Design Philosophy

Ato was built on three constraints:

1. **Visibility** — When configs merge from multiple sources, you should see **why** a value was set.
2. **Composability** — Each module (ADict, Scope, SQLTracker, HyperOpt) works independently. Use one, use all, or mix with other tools.
3. **Structural neutrality** — Ato is a layer, not a platform. It has no opinion on your stack.

This isn't minimalism for its own sake.
It's **structural restraint** — interfering only where necessary, staying out of the way everywhere else.

**What Ato provides:**
- **Config composition** with explicit priority and merge order debugging
- **Namespace isolation** for multi-team projects (MultiScope)
- **Experiment tracking** in local SQLite with zero setup
- **Hyperparameter search** via Hyperband (or compose with Optuna/Ray Tune)

**What Ato doesn't provide:**
- Web dashboards (use MLflow/W&B)
- Model registry (use MLflow)
- Dataset versioning (use DVC)
- Plugin marketplace

Ato is designed to work **between** tools, not replace them.

---

## Quick Start

### 30-Second Example

```python
from ato.scope import Scope

scope = Scope()

@scope.observe(default=True)
def config(cfg):
    cfg.lr = 0.001
    cfg.batch_size = 32
    cfg.model = 'resnet50'

@scope
def train(cfg):
    print(f"Training {cfg.model} with lr={cfg.lr}")
    # Your training code here

if __name__ == '__main__':
    train()  # python train.py
    # Override from CLI: python train.py lr=0.01 model=%resnet101%
```

**Key features:**
- `@scope.observe()` defines config sources
- `@scope` injects the merged config
- CLI overrides work automatically
- Priority-based merging (defaults → named configs → CLI → lazy evaluation)

---

## Table of Contents

- [ADict: Enhanced Dictionary](#adict-enhanced-dictionary)
- [Scope: Configuration Management](#scope-configuration-management)
- [MultiScope: Namespace Isolation](#multiscope-namespace-isolation)
- [Config Documentation & Debugging](#configuration-documentation--debugging)
- [SQL Tracker: Experiment Tracking](#sql-tracker-experiment-tracking)
- [Hyperparameter Optimization](#hyperparameter-optimization)
- [Best Practices](#best-practices)
- [Contributing](#contributing)
- [Composability](#composability)

---

## ADict: Enhanced Dictionary

`ADict` is an enhanced dictionary for managing experiment configurations.

### Core Features

| Feature | Description | Why It Matters |
|---------|-------------|----------------|
| **Structural Hashing** | Hash based on keys + types, not values | Track when experiment **structure** changes (not just hyperparameters) |
| **Nested Access** | Dot notation for nested configs | `config.model.lr` instead of `config['model']['lr']` |
| **Format Agnostic** | Load/save JSON, YAML, TOML, XYZ | Work with any config format |
| **Safe Updates** | `update_if_absent()` method | Merge configs without accidental overwrites |
| **Auto-nested** | `ADict.auto()` for lazy creation | `config.a.b.c = 1` just works - no KeyError |

### Examples

#### Structural Hashing

```python
from ato.adict import ADict

# Same structure, different values
config1 = ADict(lr=0.1, epochs=100, model='resnet50')
config2 = ADict(lr=0.01, epochs=200, model='resnet101')
print(config1.get_structural_hash() == config2.get_structural_hash())  # True

# Different structure (epochs is str!)
config3 = ADict(lr=0.1, epochs='100', model='resnet50')
print(config1.get_structural_hash() == config3.get_structural_hash())  # False
```

#### Auto-nested Configs

```python
# ❌ Traditional way
config = ADict()
config.model = ADict()
config.model.backbone = ADict()
config.model.backbone.layers = [64, 128, 256]

# ✅ With ADict.auto()
config = ADict.auto()
config.model.backbone.layers = [64, 128, 256]  # Just works!
config.data.augmentation.brightness = 0.2
```

#### Format Agnostic

```python
# Load/save any format
config = ADict.from_file('config.json')
config.dump('config.yaml')

# Safe updates
config.update_if_absent(lr=0.01, scheduler='cosine')  # Only adds scheduler
```

---

## Scope: Configuration Management

Scope manages configuration through **priority-based merging** and **CLI integration**.

### Key Concept: Priority Chain

```
Default Configs (priority=0)
        ↓
Named Configs (priority=0+)
        ↓
CLI Arguments (highest priority)
        ↓
Lazy Configs (computed after CLI)
```

### Basic Usage

#### Simple Configuration

```python
from ato.scope import Scope

scope = Scope()

@scope.observe()
def my_config(config):
    config.dataset = 'cifar10'
    config.lr = 0.001
    config.batch_size = 32

@scope
def train(config):
    print(f"Training on {config.dataset}")
    # Your code here

if __name__ == '__main__':
    train()
```

#### Priority-based Merging

```python
@scope.observe(default=True)  # Always applied
def defaults(cfg):
    cfg.lr = 0.001
    cfg.epochs = 100

@scope.observe(priority=1)  # Applied after defaults
def high_lr(cfg):
    cfg.lr = 0.01

@scope.observe(priority=2)  # Applied last
def long_training(cfg):
    cfg.epochs = 300
```

```bash
python train.py                         # lr=0.001, epochs=100
python train.py high_lr                 # lr=0.01, epochs=100
python train.py high_lr long_training   # lr=0.01, epochs=300
```

#### CLI Configuration

Override any parameter from the command line:

```bash
# Simple values
python train.py lr=0.01 batch_size=64

# Nested configs
python train.py model.backbone=%resnet101% model.depth=101

# Lists and complex types
python train.py layers=[64,128,256,512] dropout=0.5

# Combine with named configs
python train.py my_config lr=0.001 batch_size=128
```

**Note**: Wrap strings with `%` (e.g., `%resnet101%`) instead of quotes.
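
Roughly, the overrides above land in the merged config like this (an illustrative mapping of the CLI syntax, not a dump of Ato's internals):

```
lr=0.01                      →  cfg.lr == 0.01                      (number)
model.backbone=%resnet101%   →  cfg.model.backbone == 'resnet101'   (string)
layers=[64,128,256,512]      →  cfg.layers == [64, 128, 256, 512]   (list)
```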

### Lazy Evaluation

Sometimes you need configs that depend on other values set via CLI:

```python
@scope.observe()
def base_config(cfg):
    cfg.model = 'resnet50'
    cfg.dataset = 'imagenet'

@scope.observe(lazy=True)  # Evaluated AFTER CLI args
def computed_config(cfg):
    # Adjust based on dataset
    if cfg.dataset == 'imagenet':
        cfg.num_classes = 1000
        cfg.image_size = 224
    elif cfg.dataset == 'cifar10':
        cfg.num_classes = 10
        cfg.image_size = 32
```

```bash
python train.py dataset=%cifar10% computed_config
# Results in: num_classes=10, image_size=32
```

**Python 3.11+ Context Manager**:

```python
@scope.observe()
def my_config(cfg):
    cfg.model = 'resnet50'
    cfg.num_layers = 50

    with Scope.lazy():  # Evaluated after CLI
        if cfg.model == 'resnet101':
            cfg.num_layers = 101
```

### MultiScope: Namespace Isolation

Manage completely separate configuration namespaces with independent priority systems.

**Use case**: Different teams own different scopes without key collisions.

```python
from ato.scope import Scope, MultiScope

model_scope = Scope(name='model')
data_scope = Scope(name='data')
scope = MultiScope(model_scope, data_scope)

@model_scope.observe(default=True)
def model_config(model):
    model.backbone = 'resnet50'
    model.lr = 0.1  # Model-specific learning rate

@data_scope.observe(default=True)
def data_config(data):
    data.dataset = 'cifar10'
    data.lr = 0.001  # Data augmentation learning rate (no conflict!)

@scope
def train(model, data):  # Named parameters match scope names
    # Both have 'lr' but in separate namespaces!
    print(f"Model LR: {model.lr}, Data LR: {data.lr}")
```

**Key advantage**: `model.lr` and `data.lr` are completely independent. No need for naming conventions like `model_lr` vs `data_lr`.

**CLI with MultiScope:**

```bash
# Override model scope only
python train.py model.backbone=%resnet101%

# Override data scope only
python train.py data.dataset=%imagenet%

# Override both
python train.py model.backbone=%resnet101% data.dataset=%imagenet%
```

### Configuration Documentation & Debugging

**The `manual` command** visualizes the exact order of configuration application.

```python
@scope.observe(default=True)
def config(cfg):
    cfg.lr = 0.001
    cfg.batch_size = 32
    cfg.model = 'resnet50'

@scope.manual
def config_docs(cfg):
    cfg.lr = 'Learning rate for optimizer'
    cfg.batch_size = 'Number of samples per batch'
    cfg.model = 'Model architecture (resnet50, resnet101, etc.)'
```

```bash
python train.py manual
```

**Output:**
```
--------------------------------------------------
[Scope "config"]
(The Applying Order of Views)
config → (CLI Inputs)

(User Manuals)
lr: Learning rate for optimizer
batch_size: Number of samples per batch
model: Model architecture (resnet50, resnet101, etc.)
--------------------------------------------------
```

**Why this matters:**
When debugging "why is this config value not what I expect?", you can see **exactly** which function set it and in what order.

**Complex example:**

```python
@scope.observe(default=True)
def defaults(cfg):
    cfg.lr = 0.001

@scope.observe(priority=1)
def experiment_config(cfg):
    cfg.lr = 0.01

@scope.observe(priority=2)
def another_config(cfg):
    cfg.lr = 0.1

@scope.observe(lazy=True)
def adaptive_lr(cfg):
    if cfg.batch_size > 64:
        cfg.lr = cfg.lr * 2
```

When you run `python train.py manual`, you see:
```
(The Applying Order of Views)
defaults → experiment_config → another_config → (CLI Inputs) → adaptive_lr
```

Now it's **crystal clear** why `lr` ends up at `0.1` (set by `another_config`) rather than `0.01`.

### Config Import/Export

```python
@scope.observe()
def load_external(config):
    # Load from any format
    config.load('experiments/baseline.json')
    config.load('models/resnet.yaml')

    # Export to any format
    config.dump('output/final_config.toml')
```

**OpenMMLab compatibility:**

```python
# Import OpenMMLab configs - handles _base_ inheritance automatically
config.load_mm_config('mmdet_configs/faster_rcnn.py')
```

**Hierarchical composition:**

```python
from ato.adict import ADict

# Load configs from directory structure
config = ADict.compose_hierarchy(
    root='configs',
    config_filename='config',
    select={
        'model': 'resnet50',
        'data': 'imagenet'
    },
    overrides={
        'model.lr': 0.01,
        'data.batch_size': 64
    },
    required=['model.backbone', 'data.dataset'],  # Validation
    on_missing='warn'  # or 'error'
)
```
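
For intuition, the `select` keys above could map onto a tree like the following (purely an illustrative layout, not a structure Ato prescribes):

```
configs/
├── model/
│   ├── resnet50/config.yaml
│   └── resnet101/config.yaml
└── data/
    ├── imagenet/config.yaml
    └── cifar10/config.yaml
```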

### Argparse Integration

```python
from ato.scope import Scope
import argparse

scope = Scope(use_external_parser=True)
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=int, default=0)
parser.add_argument('--seed', type=int, default=42)

@scope.observe(default=True)
def config(cfg):
    cfg.lr = 0.001
    cfg.batch_size = 32

@scope
def train(cfg):
    print(f"GPU: {cfg.gpu}, LR: {cfg.lr}")

if __name__ == '__main__':
    parser.parse_args()  # Merges argparse with scope
    train()
```

---

## SQL Tracker: Experiment Tracking

Lightweight experiment tracking using SQLite.

### Why SQL Tracker?

- **Zero Setup**: Just a SQLite file, no servers
- **Full History**: Track all runs, metrics, and artifacts
- **Smart Search**: Find similar experiments by config structure
- **Code Versioning**: Track code changes via fingerprints
- **Offline-first**: No network required, sync to cloud tracking later if needed

### Database Schema

```
Project (my_ml_project)
├── Experiment (run_1)
│   ├── config: {...}
│   ├── structural_hash: "abc123..."
│   ├── Metrics: [loss, accuracy, ...]
│   ├── Artifacts: [model.pt, plots/*, ...]
│   └── Fingerprints: [model_forward, train_step, ...]
├── Experiment (run_2)
└── ...
```

### Usage

#### Logging Experiments

```python
from ato.db_routers.sql.manager import SQLLogger
from ato.adict import ADict

# Setup config
config = ADict(
    experiment=ADict(
        project_name='image_classification',
        sql=ADict(db_path='sqlite:///experiments.db')
    ),
    # Your hyperparameters
    lr=0.001,
    batch_size=32,
    model='resnet50'
)

# Create logger
logger = SQLLogger(config)

# Start experiment run
run_id = logger.run(tags=['baseline', 'resnet50', 'cifar10'])

# Training loop
for epoch in range(100):
    # Your training code
    train_loss = train_one_epoch()
    val_acc = validate()

    # Log metrics
    logger.log_metric('train_loss', train_loss, step=epoch)
    logger.log_metric('val_accuracy', val_acc, step=epoch)

# Log artifacts
logger.log_artifact(run_id, 'checkpoints/model_best.pt',
                    data_type='model',
                    metadata={'epoch': best_epoch})

# Finish run
logger.finish(status='completed')
```

#### Querying Experiments

```python
from ato.db_routers.sql.manager import SQLFinder

finder = SQLFinder(config)

# Get all runs in project
runs = finder.get_runs_in_project('image_classification')
for run in runs:
    print(f"Run {run.id}: {run.config.model} - {run.status}")

# Find best performing run
best_run = finder.find_best_run(
    project_name='image_classification',
    metric_key='val_accuracy',
    mode='max'  # or 'min' for loss
)
print(f"Best config: {best_run.config}")

# Find similar experiments (same config structure)
similar = finder.find_similar_runs(run_id=123)
print(f"Found {len(similar)} runs with similar config structure")

# Trace statistics (code fingerprints)
stats = finder.get_trace_statistics('image_classification', trace_id='model_forward')
print(f"Model forward pass has {stats['static_trace_versions']} versions")
```

### Features

| Feature | Description |
|---------|-------------|
| **Structural Hash** | Auto-track config structure changes |
| **Metric Logging** | Time-series metrics with step tracking |
| **Artifact Management** | Track model checkpoints, plots, data files |
| **Fingerprint Tracking** | Version control for code (static & runtime) |
| **Smart Search** | Find similar configs, best runs, statistics |

---

## Hyperparameter Optimization

Built-in **Hyperband** algorithm for efficient hyperparameter search with early stopping.

### How Hyperband Works

Hyperband uses successive halving:
1. Start with many configs, train briefly
2. Keep top performers, discard poor ones
3. Train survivors longer
4. Repeat until one winner remains
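
As a rough sketch of the arithmetic, assuming survivors are simply the pool size times `halving_rate`, clamped at `num_min_samples` (Ato's exact rounding may differ):

```python
# Illustrative only: how a pool of 20 sampled configs shrinks with
# halving_rate=0.3 and num_min_samples=3.
num_configs, halving_rate, num_min_samples = 20, 0.3, 3

generation = 0
while num_configs > num_min_samples:
    survivors = max(int(num_configs * halving_rate), num_min_samples)
    print(f"generation {generation}: {num_configs} -> {survivors} configs")
    num_configs = survivors
    generation += 1
# generation 0: 20 -> 6 configs
# generation 1: 6 -> 3 configs
```

Each surviving generation then gets a larger training budget (see `__num_halved__` below).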

### Basic Usage

```python
from ato.adict import ADict
from ato.hyperopt.hyperband import HyperBand
from ato.scope import Scope

scope = Scope()

# Define search space
search_spaces = ADict(
    lr=ADict(
        param_type='FLOAT',
        param_range=(1e-5, 1e-1),
        num_samples=20,
        space_type='LOG'  # Logarithmic spacing
    ),
    batch_size=ADict(
        param_type='INTEGER',
        param_range=(16, 128),
        num_samples=5,
        space_type='LOG'
    ),
    model=ADict(
        param_type='CATEGORY',
        categories=['resnet50', 'resnet101', 'efficientnet_b0']
    )
)

# Create Hyperband optimizer
hyperband = HyperBand(
    scope,
    search_spaces,
    halving_rate=0.3,    # Keep top 30% each round
    num_min_samples=3,   # Stop when <= 3 configs remain
    mode='max'           # Maximize metric (use 'min' for loss)
)

@hyperband.main
def train(config):
    # Your training code
    model = create_model(config.model)
    optimizer = Adam(lr=config.lr)

    # Use __num_halved__ for early stopping
    num_epochs = compute_epochs(config.__num_halved__)

    # Train and return metric
    val_acc = train_and_evaluate(model, optimizer, num_epochs)
    return val_acc

if __name__ == '__main__':
    # Run hyperparameter search
    best_result = train()
    print(f"Best config: {best_result.config}")
    print(f"Best metric: {best_result.metric}")
```

### Automatic Step Calculation

```python
hyperband = HyperBand(scope, search_spaces, halving_rate=0.3, num_min_samples=4)

max_steps = 100000
steps_per_generation = hyperband.compute_optimized_initial_training_steps(max_steps)
# Example output: [27, 88, 292, 972, 3240, 10800, 36000, 120000]

# Use in training
@hyperband.main
def train(config):
    generation = config.__num_halved__
    num_steps = steps_per_generation[generation]

    metric = train_for_n_steps(num_steps)
    return metric
```

### Parameter Types

| Type | Description | Example |
|------|-------------|---------|
| `FLOAT` | Continuous values | Learning rate, dropout |
| `INTEGER` | Discrete integers | Batch size, num layers |
| `CATEGORY` | Categorical choices | Model type, optimizer |

Space types:
- `LOG`: Logarithmic spacing (good for learning rates)
- `LINEAR`: Linear spacing (default)
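
To see why `LOG` suits learning rates, compare the two spacings over `(1e-5, 1e-1)` with 5 samples. This NumPy snippet only illustrates the idea; Ato's internal sampling may differ in detail:

```python
import numpy as np

# LINEAR puts four of the five samples above 2e-2; LOG spreads them
# evenly across orders of magnitude.
linear = np.linspace(1e-5, 1e-1, num=5)
# → [1.0e-05, ~2.5e-02, ~5.0e-02, ~7.5e-02, 1.0e-01]

log = np.logspace(np.log10(1e-5), np.log10(1e-1), num=5)
# → [1.0e-05, 1.0e-04, 1.0e-03, 1.0e-02, 1.0e-01]
```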

### Distributed Search

```python
from ato.hyperopt.hyperband import DistributedHyperBand
import torch.distributed as dist

# Initialize distributed training
dist.init_process_group(backend='nccl')
rank = dist.get_rank()
world_size = dist.get_world_size()

# Create distributed hyperband
hyperband = DistributedHyperBand(
    scope,
    search_spaces,
    halving_rate=0.3,
    num_min_samples=3,
    mode='max',
    rank=rank,
    world_size=world_size,
    backend='pytorch'
)

@hyperband.main
def train(config):
    # Your distributed training code
    model = create_model(config)
    model = DDP(model, device_ids=[rank])
    metric = train_and_evaluate(model)
    return metric

if __name__ == '__main__':
    result = train()
    if rank == 0:
        print(f"Best config: {result.config}")
```

### Extensible Design

Ato's hyperopt module is built for extensibility:

| Component | Purpose |
|-----------|---------|
| `GridSpaceMixIn` | Parameter sampling logic (reusable) |
| `HyperOpt` | Base optimization class |
| `DistributedMixIn` | Distributed training support (optional) |

**Example: Implement custom search algorithm**

```python
from ato.hyperopt.base import GridSpaceMixIn, HyperOpt

class RandomSearch(GridSpaceMixIn, HyperOpt):
    def main(self, func):
        # Reuse GridSpaceMixIn.prepare_distributions()
        configs = self.prepare_distributions(self.config, self.search_spaces)

        # Implement random sampling
        import random
        random.shuffle(configs)

        results = []
        for config in configs[:10]:  # Sample 10 random configs
            metric = func(config)
            results.append((config, metric))

        return max(results, key=lambda x: x[1])
```

---

## Best Practices

### 1. Project Structure

```
my_project/
├── configs/
│   ├── default.py       # Default config with @scope.observe(default=True)
│   ├── models.py        # Model-specific configs
│   └── datasets.py      # Dataset configs
├── train.py             # Main training script
├── experiments.db       # SQLite experiment tracking
└── experiments/
    ├── run_001/
    │   ├── checkpoints/
    │   └── logs/
    └── run_002/
```

### 2. Config Organization

```python
# configs/default.py
from ato.scope import Scope
from ato.adict import ADict

scope = Scope()

@scope.observe(default=True)
def defaults(cfg):
    # Data
    cfg.data = ADict(
        dataset='cifar10',
        batch_size=32,
        num_workers=4
    )

    # Model
    cfg.model = ADict(
        backbone='resnet50',
        pretrained=True
    )

    # Training
    cfg.train = ADict(
        lr=0.001,
        epochs=100,
        optimizer='adam'
    )

    # Experiment tracking
    cfg.experiment = ADict(
        project_name='my_project',
        sql=ADict(db_path='sqlite:///experiments.db')
    )
```

### 3. Combined Workflow

```python
from ato.scope import Scope
from ato.db_routers.sql.manager import SQLLogger
from configs.default import scope

@scope
def train(cfg):
    # Setup experiment tracking
    logger = SQLLogger(cfg)
    run_id = logger.run(tags=[cfg.model.backbone, cfg.data.dataset])

    try:
        # Training loop
        for epoch in range(cfg.train.epochs):
            loss = train_epoch()
            acc = validate()

            logger.log_metric('loss', loss, epoch)
            logger.log_metric('accuracy', acc, epoch)

        logger.finish(status='completed')

    except Exception as e:
        logger.finish(status='failed')
        raise e

if __name__ == '__main__':
    train()
```

### 4. Reproducibility Checklist

- ✅ Use structural hashing to track config changes
- ✅ Log all hyperparameters to SQLLogger
- ✅ Tag experiments with meaningful labels
- ✅ Track artifacts (checkpoints, plots)
- ✅ Use lazy configs for derived parameters
- ✅ Document configs with `@scope.manual`

---

## Requirements

- Python >= 3.7
- SQLAlchemy (for SQL Tracker)
- PyYAML, toml (for config serialization)
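- NumPy (required; see the package metadata above)
- torch >= 1.8.0 (optional, via `pip install ato[distributed]`, for distributed hyperparameter search)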

See `pyproject.toml` for full dependencies.

---

## Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

### Development Setup

```bash
git clone https://github.com/yourusername/ato.git
cd ato
pip install -e .
```

### Quality Assurance

Ato's design philosophy — **structural neutrality** and **debuggable composition** — extends to our testing practices.

**Release Policy:**
- **All 100+ unit tests must pass before any release**
- No exceptions, no workarounds
- Tests cover every module: ADict, Scope, MultiScope, SQLTracker, HyperBand

**Why this matters:**
When you build on Ato, you're trusting it to stay out of your way. That means zero regressions, predictable behavior, and reliable APIs. Comprehensive test coverage ensures that each component works independently and composes correctly.

Run tests locally:
```bash
python -m pytest unit_tests/
```

---

## Composability

Ato is designed to **compose** with existing tools, not replace them.

### Works Where Other Systems Require Ecosystems

**Config composition:**
- Import OpenMMLab configs: `config.load_mm_config('mmdet_configs/faster_rcnn.py')`
- Load Hydra-style hierarchies: `ADict.compose_hierarchy(root='configs', select={'model': 'resnet50'})`
- Mix with argparse: `Scope(use_external_parser=True)`

**Experiment tracking:**
- Track locally in SQLite (zero setup)
- Sync to MLflow/W&B when you need dashboards
- Or use both: local SQLite + cloud tracking
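
A minimal sketch of the "use both" pattern: keep SQLite as the local source of truth and mirror a selected run into MLflow for dashboards. It assumes an MLflow tracking server is already configured; the `SQLFinder` calls are the ones shown earlier:

```python
import mlflow  # only needed on the dashboard side
from ato.db_routers.sql.manager import SQLFinder

finder = SQLFinder(config)  # same config the SQLLogger used
best_run = finder.find_best_run(
    project_name='image_classification',
    metric_key='val_accuracy',
    mode='max'
)

# Mirror the locally tracked best run into MLflow for visual comparison.
with mlflow.start_run(run_name='ato-best-run'):
    for key, value in best_run.config.items():
        mlflow.log_param(key, value)
```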

**Hyperparameter optimization:**
- Built-in Hyperband
- Or compose with Optuna/Ray Tune — Ato's configs work with any optimizer
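
For example, nothing stops you from building an `ADict` inside an Optuna objective. Here `train_and_evaluate` stands in for your own training loop:

```python
import optuna
from ato.adict import ADict

def objective(trial):
    # Assemble a plain ADict from Optuna's suggestions and hand it to your code.
    cfg = ADict(
        lr=trial.suggest_float('lr', 1e-5, 1e-1, log=True),
        batch_size=trial.suggest_categorical('batch_size', [16, 32, 64, 128]),
        model=trial.suggest_categorical('model', ['resnet50', 'resnet101']),
    )
    return train_and_evaluate(cfg)  # your training/eval function

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)
```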

### Three Capabilities Other Tools Don't Provide

1. **MultiScope** — True namespace isolation with independent priority systems
2. **`manual` command** — Visualize exact config merge order for debugging
3. **Structural hashing** — Track when experiment **architecture** changes, not just values

### When to Use Ato

**Use Ato when:**
- You want zero boilerplate config management
- You need to debug why a config value isn't what you expect
- You're working on multi-team projects with namespace conflicts
- You want local-first experiment tracking
- You're migrating between config/tracking systems

**Ato works alongside:**
- Hydra (config composition)
- MLflow/W&B (cloud tracking)
- Optuna/Ray Tune (advanced hyperparameter search)
- PyTorch/TensorFlow/JAX (any ML framework)

---

## Roadmap

Ato's design constraint is **structural neutrality** — adding capabilities without creating dependencies.

### Planned: Local Dashboard (Optional Module)

A lightweight HTML dashboard for teams that want visual exploration without committing to cloud platforms:

**What it adds:**
- Metric comparison & trends (read-only view of SQLite data)
- Run history & artifact browsing
- Config diff visualization
- Interactive hyperparameter analysis

**Design constraints:**
- No hard dependency — Ato core works 100% without the dashboard
- Separate process — doesn't block or modify runs
- Zero lock-in — delete it anytime, training code doesn't change
- Composable — use alongside MLflow/W&B

**Guiding principle:** Ato remains a set of **independent, composable tools** — not a platform you commit to.

---

## License

MIT License