ato-2.0.0-py3-none-any.whl
- ato/__init__.py +1 -0
- ato/adict.py +582 -0
- ato/db_routers/__init__.py +8 -0
- ato/db_routers/sql/__init__.py +0 -0
- ato/db_routers/sql/manager.py +188 -0
- ato/db_routers/sql/schema.py +83 -0
- ato/hyperopt/__init__.py +0 -0
- ato/hyperopt/base.py +144 -0
- ato/hyperopt/hyperband.py +103 -0
- ato/parser.py +103 -0
- ato/scope.py +491 -0
- ato/utils.py +55 -0
- ato/xyz.py +234 -0
- ato-2.0.0.dist-info/METADATA +1181 -0
- ato-2.0.0.dist-info/RECORD +18 -0
- ato-2.0.0.dist-info/WHEEL +5 -0
- ato-2.0.0.dist-info/licenses/LICENSE +21 -0
- ato-2.0.0.dist-info/top_level.txt +1 -0
@@ -0,0 +1,1181 @@
Metadata-Version: 2.4
Name: ato
Version: 2.0.0
Summary: A Python library for experiment tracking and hyperparameter optimization
Author: ato contributors
License: MIT
Project-URL: Homepage, https://github.com/yourusername/ato
Project-URL: Repository, https://github.com/yourusername/ato
Project-URL: Documentation, https://github.com/yourusername/ato#readme
Project-URL: Issues, https://github.com/yourusername/ato/issues
Keywords: machine learning,experiment tracking,hyperparameter optimization
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0
Requires-Dist: toml>=0.10.2
Requires-Dist: sqlalchemy>=2.0
Requires-Dist: numpy>=1.19.0
Provides-Extra: distributed
Requires-Dist: torch>=1.8.0; extra == "distributed"
Dynamic: license-file

# Ato

Ato is intentionally small — it’s not about lines of code,
it’s about where they belong.
The core fits in a few hundred lines because it doesn’t need to fight Python — it flows with it.

---

**Ato** is a lightweight Python library for experiment management in machine learning and data science.
It provides flexible configuration management, experiment tracking, and hyperparameter optimization —
all without the complexity or overhead of heavy frameworks.

## Why Ato?

### Core Differentiators

- **True Namespace Isolation**: MultiScope provides independent config contexts (unique to Ato!)
- **Configuration Transparency**: Visualize the exact config merge order and debug configs with the `manual` command
- **Built-in Experiment Tracking**: SQLite-based tracking with no external services required
- **Structural Hashing**: Track experiment structure changes automatically

### Developer Experience

- **Zero Boilerplate**: Auto-nested configs, lazy evaluation, attribute access
- **CLI-first Design**: Configure experiments from the command line without touching code
- **Framework Agnostic**: Works with PyTorch, TensorFlow, JAX, or pure Python

## Quick Start

```bash
pip install ato-python
```

### 30-Second Example

```python
from ato.scope import Scope

scope = Scope()

@scope.observe(default=True)
def config(cfg):
    cfg.lr = 0.001
    cfg.batch_size = 32
    cfg.model = 'resnet50'

@scope
def train(cfg):
    print(f"Training {cfg.model} with lr={cfg.lr}")
    # Your training code here

if __name__ == '__main__':
    train()  # python train.py
    # Override from CLI: python train.py lr=0.01 model=%resnet101%
```

---

## Table of Contents

- [ADict: Enhanced Dictionary](#adict-enhanced-dictionary)
- [Scope: Configuration Management](#scope-configuration-management)
- [MultiScope: Namespace Isolation](#2-multiscope---multiple-configuration-contexts) ⭐ Unique to Ato
- [Config Documentation & Debugging](#5-configuration-documentation--inspection) ⭐ Unique to Ato
- [SQL Tracker: Experiment Tracking](#sql-tracker-experiment-tracking)
- [Hyperparameter Optimization](#hyperparameter-optimization)
- [Best Practices](#best-practices)
- [Comparison with Hydra](#ato-vs-hydra)

---

## ADict: Enhanced Dictionary

`ADict` is an enhanced dictionary designed for managing experiment configurations. It combines the simplicity of Python dictionaries with powerful features for ML workflows.

### Core Features

These are the fundamental capabilities that make ADict powerful for experiment management:

| Feature | Description | Why It Matters |
|---------|-------------|----------------|
| **Structural Hashing** | Hash based on keys + types, not values | Track when experiment structure changes |
| **Nested Access** | Dot notation for nested configs | `config.model.lr` instead of `config['model']['lr']` |
| **Format Agnostic** | Load/save JSON, YAML, TOML, XYZ | Work with any config format |
| **Safe Updates** | `update_if_absent()` method | Prevent accidental overwrites |

### Developer Convenience Features

These utilities maximize developer productivity and reduce boilerplate:

| Feature | Description | Benefit |
|---------|-------------|---------|
| **Auto-nested (`ADict.auto()`)** | Infinite depth lazy creation | `config.a.b.c = 1` just works - no KeyError |
| **Attribute-style Assignment** | `config.lr = 0.1` | Cleaner, more readable code |
| **Conditional Updates** | Only update missing keys | Merge configs safely |

### Quick Examples

```python
from ato.adict import ADict

# Structural hashing - track config structure changes
config1 = ADict(lr=0.1, epochs=100, model='resnet50')
config2 = ADict(lr=0.01, epochs=200, model='resnet101')
print(config1.get_structural_hash() == config2.get_structural_hash())  # True

config3 = ADict(lr=0.1, epochs='100', model='resnet50')  # epochs is str!
print(config1.get_structural_hash() == config3.get_structural_hash())  # False

# Load/save any format
config = ADict.from_file('config.json')
config.dump('config.yaml')

# Safe updates
config.update_if_absent(lr=0.01, scheduler='cosine')  # Only adds scheduler
```
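
The structural hash depends only on key names and value types, never on the stored values, which is why `config1` and `config2` above collide while `config3` does not. As a rough mental model (an illustrative sketch, not ato's actual implementation), it behaves like hashing the sorted set of `(dotted key, type name)` pairs:

```python
import hashlib

def structural_signature(d, prefix=''):
    """Collect (dotted key, type name) pairs, ignoring the stored values."""
    pairs = []
    for key in sorted(d):
        value = d[key]
        path = f'{prefix}.{key}' if prefix else str(key)
        if isinstance(value, dict):
            pairs.extend(structural_signature(value, path))  # recurse into nested configs
        else:
            pairs.append((path, type(value).__name__))
    return pairs

def structural_hash(d):
    return hashlib.sha256(repr(structural_signature(d)).encode()).hexdigest()

# Same keys and types, different values → same hash
assert structural_hash({'lr': 0.1, 'epochs': 100}) == structural_hash({'lr': 0.5, 'epochs': 7})
# epochs switches from int to str → different hash
assert structural_hash({'lr': 0.1, 'epochs': 100}) != structural_hash({'lr': 0.1, 'epochs': '100'})
```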

### Convenience Features in Detail

#### Auto-nested: Zero Boilerplate Config Building

The most loved feature - no more manual nesting:

```python
# ❌ Traditional way
config = ADict()
config.model = ADict()
config.model.backbone = ADict()
config.model.backbone.layers = [64, 128, 256]

# ✅ With ADict.auto()
config = ADict.auto()
config.model.backbone.layers = [64, 128, 256]  # Just works!
config.data.augmentation.brightness = 0.2
```

**Perfect for Scope integration**:

```python
from ato.scope import Scope

scope = Scope()

@scope.observe(default=True)
def config(cfg):
    # No pre-definition needed!
    cfg.training.optimizer.name = 'AdamW'
    cfg.training.optimizer.lr = 0.001
    cfg.model.encoder.num_layers = 12
```

**Works with CLI**:

```bash
python train.py model.backbone.resnet.depth=50 data.batch_size=32
```

#### More Convenience Utilities

```python
# Attribute-style access
config.lr = 0.1
print(config.lr)  # Instead of config['lr']

# Nested access
print(config.model.backbone.type)  # Clean and readable

# Conditional updates - merge configs safely
base_config.update_if_absent(**experiment_config)
```

---

## Scope: Configuration Management

Scope solves configuration complexity through **priority-based merging** and **CLI integration**. No more scattered config files or hard-coded parameters.

### Key Concepts

```
Default Configs (priority=0)
    ↓
Named Configs (priority=0+)
    ↓
CLI Arguments (highest priority)
    ↓
Lazy Configs (computed after CLI)
```

### Basic Usage

#### Simple Configuration

```python
from ato.scope import Scope

scope = Scope()

@scope.observe()
def my_config(config):
    config.dataset = 'cifar10'
    config.lr = 0.001
    config.batch_size = 32

@scope
def train(config):
    print(f"Training on {config.dataset}")
    # Your code here

if __name__ == '__main__':
    train()
```

#### Priority-based Merging

```python
@scope.observe(default=True)  # Always applied
def defaults(cfg):
    cfg.lr = 0.001
    cfg.epochs = 100

@scope.observe(priority=1)  # Applied after defaults
def high_lr(cfg):
    cfg.lr = 0.01

@scope.observe(priority=2)  # Applied last
def long_training(cfg):
    cfg.epochs = 300
```

```bash
python train.py                        # lr=0.001, epochs=100
python train.py high_lr                # lr=0.01, epochs=100
python train.py high_lr long_training  # lr=0.01, epochs=300
```

#### CLI Configuration

Override any parameter from the command line:

```bash
# Simple values
python train.py lr=0.01 batch_size=64

# Nested configs
python train.py model.backbone=%resnet101% model.depth=101

# Lists and complex types
python train.py layers=[64,128,256,512] dropout=0.5

# Combine with named configs
python train.py my_config lr=0.001 batch_size=128
```

**Note**: Wrap strings with `%` (e.g., `%resnet101%`) instead of quotes.

### Advanced Features

#### 1. Lazy Evaluation - Dynamic Configuration

Sometimes you need configs that depend on other values set via CLI:

```python
@scope.observe()
def base_config(cfg):
    cfg.model = 'resnet50'
    cfg.dataset = 'imagenet'

@scope.observe(lazy=True)  # Evaluated AFTER CLI args
def computed_config(cfg):
    # Adjust based on dataset
    if cfg.dataset == 'imagenet':
        cfg.num_classes = 1000
        cfg.image_size = 224
    elif cfg.dataset == 'cifar10':
        cfg.num_classes = 10
        cfg.image_size = 32
```

```bash
python train.py dataset=%cifar10% computed_config
# Results in: num_classes=10, image_size=32
```

**Python 3.11+ Context Manager**:

```python
@scope.observe()
def my_config(cfg):
    cfg.model = 'resnet50'
    cfg.num_layers = 50

    with Scope.lazy():  # Evaluated after CLI
        if cfg.model == 'resnet101':
            cfg.num_layers = 101
```

#### 2. MultiScope - Multiple Configuration Contexts

**Unique to Ato**: Manage completely separate configuration namespaces. Unlike Hydra's config groups, MultiScope provides true **namespace isolation** with independent priority systems.

##### Why MultiScope?

| Challenge | Hydra's Approach | Ato's MultiScope |
|-----------|------------------|---------------------|
| Separate model/data configs | Config groups in one namespace | **Independent scopes with own priorities** |
| Avoid key collisions | Manual prefixing (`model.lr`, `train.lr`) | **Automatic namespace isolation** |
| Different teams/modules | Single config file | **Each scope can be owned separately** |
| Priority conflicts | Global priority system | **Per-scope priority system** |

##### Basic Usage

```python
from ato.scope import Scope, MultiScope

model_scope = Scope(name='model')
data_scope = Scope(name='data')
scope = MultiScope(model_scope, data_scope)

@model_scope.observe(default=True)
def model_config(model):
    model.backbone = 'resnet50'
    model.pretrained = True

@data_scope.observe(default=True)
def data_config(data):
    data.dataset = 'cifar10'
    data.batch_size = 32

@scope
def train(model, data):  # Named parameters match scope names
    print(f"Training {model.backbone} on {data.dataset}")
```

##### Real-world: Team Collaboration

Different team members can own different scopes without conflicts:

```python
# team_model.py - ML team owns this
model_scope = Scope(name='model')

@model_scope.observe(default=True)
def resnet_default(model):
    model.backbone = 'resnet50'
    model.lr = 0.1  # Model-specific learning rate

@model_scope.observe(priority=1)
def resnet101(model):
    model.backbone = 'resnet101'
    model.lr = 0.05  # Different lr for bigger model

# team_data.py - Data team owns this
data_scope = Scope(name='data')

@data_scope.observe(default=True)
def cifar_default(data):
    data.dataset = 'cifar10'
    data.lr = 0.001  # Data augmentation learning rate (no conflict!)

@data_scope.observe(priority=1)
def imagenet(data):
    data.dataset = 'imagenet'
    data.workers = 16

# train.py - Integration point
from team_model import model_scope
from team_data import data_scope

scope = MultiScope(model_scope, data_scope)

@scope
def train(model, data):
    # Both have 'lr' but in separate namespaces!
    print(f"Model LR: {model.lr}, Data LR: {data.lr}")
```

**Key advantage**: `model.lr` and `data.lr` are completely independent. No need for naming conventions like `model_lr` vs `data_lr`.

##### CLI with MultiScope

Override each scope independently:

```bash
# Override model scope only
python train.py model.backbone=%resnet101%

# Override data scope only
python train.py data.dataset=%imagenet%

# Override both
python train.py model.backbone=%resnet101% data.dataset=%imagenet%

# Call named configs per scope
python train.py resnet101 imagenet
```

#### 3. Import/Export Configs

Ato supports importing configs from multiple frameworks:

```python
@scope.observe()
def load_external(config):
    # Load from any format
    config.load('experiments/baseline.json')
    config.load('models/resnet.yaml')

    # Export to any format
    config.dump('output/final_config.toml')

    # Import OpenMMLab configs - handles _base_ inheritance automatically
    config.load_mm_config('mmdet_configs/faster_rcnn.py')
```

**OpenMMLab compatibility** is built-in (a sketch of the format it accepts follows this list):
- Automatically resolves `_base_` inheritance chains
- Supports `_delete_` keys for config overriding
- Makes migration from MMDetection/MMSegmentation/etc. seamless
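
For readers unfamiliar with the OpenMMLab format: an MM-style config is a plain Python file in which `_base_` lists parent configs to inherit from and `_delete_=True` drops an inherited subtree before replacing it. The snippet below is a hypothetical example shown only to illustrate those two keys; the paths and values are made up and not taken from ato or MMDetection.

```python
# Hypothetical mmdet_configs/faster_rcnn.py (illustrative contents only)
_base_ = [
    './_base_/models/faster_rcnn_r50.py',   # inherited model definition
    './_base_/datasets/coco_detection.py',  # inherited dataset pipeline
]

model = dict(
    backbone=dict(depth=101),         # overrides the depth inherited from _base_
    neck=dict(_delete_=True,          # _delete_ discards the inherited neck entirely...
              type='PAFPN'),          # ...and replaces it with a fresh definition
)
```

When such a file is passed to `config.load_mm_config(...)`, the `_base_` chain is resolved and `_delete_` keys are applied, as described above.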

**Hydra-style config composition** is also built-in via `compose_hierarchy`:

```python
from ato.adict import ADict

# Hydra-style directory structure:
# configs/
# ├── config.yaml          # base config
# ├── model/
# │   ├── resnet50.yaml
# │   └── resnet101.yaml
# └── data/
#     ├── cifar10.yaml
#     └── imagenet.yaml

config = ADict.compose_hierarchy(
    root='configs',
    config_filename='config',
    select={
        'model': 'resnet50',  # or ['resnet50', 'resnet101'] for multiple
        'data': 'imagenet'
    },
    overrides={
        'model.lr': 0.01,
        'data.batch_size': 64
    },
    required=['model.backbone', 'data.dataset'],  # Validation
    on_missing='warn'  # or 'error'
)
```

**Key features**:
- Config groups (model/, data/, optimizer/, etc.)
- Automatic file discovery (tries .yaml, .json, .toml, .xyz)
- Dotted overrides (`model.lr=0.01`)
- Required key validation
- Flexible error handling

#### 4. Argparse Integration

Mix Ato with existing argparse code:

```python
from ato.scope import Scope
import argparse

scope = Scope(use_external_parser=True)
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=int, default=0)
parser.add_argument('--seed', type=int, default=42)

@scope.observe(default=True)
def config(cfg):
    cfg.lr = 0.001
    cfg.batch_size = 32

@scope
def train(cfg):
    print(f"GPU: {cfg.gpu}, LR: {cfg.lr}")

if __name__ == '__main__':
    parser.parse_args()  # Merges argparse with scope
    train()
```

#### 5. Configuration Documentation & Inspection

**One of Ato's most powerful features**: Auto-generate documentation AND visualize the exact order of configuration application.

##### Basic Documentation

```python
@scope.manual
def config_docs(cfg):
    cfg.lr = 'Learning rate for optimizer'
    cfg.batch_size = 'Number of samples per batch'
    cfg.model = 'Model architecture (resnet50, resnet101, etc.)'
```

```bash
python train.py manual
```

**Output:**
```
--------------------------------------------------
[Scope "config"]
(The Applying Order of Views)
defaults → (CLI Inputs) → lazy_config → main

(User Manuals)
config.lr: Learning rate for optimizer
config.batch_size: Number of samples per batch
config.model: Model architecture (resnet50, resnet101, etc.)
--------------------------------------------------
```

##### Why This Matters

The **applying order visualization** shows you **exactly** how your configs are merged:
- Which config functions are applied (in order)
- When CLI inputs override values
- Where lazy configs are evaluated
- The final function that uses the config

**This prevents configuration bugs** by making the merge order explicit and debuggable.

##### MultiScope Documentation

For complex projects with multiple scopes, `manual` shows each scope separately:

```python
from ato.scope import Scope, MultiScope

model_scope = Scope(name='model')
train_scope = Scope(name='train')
scope = MultiScope(model_scope, train_scope)

@model_scope.observe(default=True)
def model_defaults(model):
    model.backbone = 'resnet50'
    model.num_layers = 50

@model_scope.observe(priority=1)
def model_advanced(model):
    model.pretrained = True

@model_scope.observe(lazy=True)
def model_lazy(model):
    if model.backbone == 'resnet101':
        model.num_layers = 101

@train_scope.observe(default=True)
def train_defaults(train):
    train.lr = 0.001
    train.epochs = 100

@model_scope.manual
def model_docs(model):
    model.backbone = 'Model backbone architecture'
    model.num_layers = 'Number of layers in the model'

@train_scope.manual
def train_docs(train):
    train.lr = 'Learning rate for optimizer'
    train.epochs = 'Total training epochs'

@scope
def main(model, train):
    print(f"Training {model.backbone} with lr={train.lr}")

if __name__ == '__main__':
    main()
```

```bash
python train.py manual
```

**Output:**
```
--------------------------------------------------
[Scope "model"]
(The Applying Order of Views)
model_defaults → model_advanced → (CLI Inputs) → model_lazy → main

(User Manuals)
model.backbone: Model backbone architecture
model.num_layers: Number of layers in the model
--------------------------------------------------
[Scope "train"]
(The Applying Order of Views)
train_defaults → (CLI Inputs) → main

(User Manuals)
train.lr: Learning rate for optimizer
train.epochs: Total training epochs
--------------------------------------------------
```

##### Real-world Example

This is especially valuable when debugging why a config value isn't what you expect:

```python
@scope.observe(default=True)
def defaults(cfg):
    cfg.lr = 0.001

@scope.observe(priority=1)
def experiment_config(cfg):
    cfg.lr = 0.01

@scope.observe(priority=2)
def another_config(cfg):
    cfg.lr = 0.1

@scope.observe(lazy=True)
def adaptive_lr(cfg):
    if cfg.batch_size > 64:
        cfg.lr = cfg.lr * 2
```

When you run `python train.py manual`, you see:
```
(The Applying Order of Views)
defaults → experiment_config → another_config → (CLI Inputs) → adaptive_lr → main
```

Now it's **crystal clear** why `lr` ends up as `0.1` (from `another_config`) rather than `0.01`!

---

## SQL Tracker: Experiment Tracking

Lightweight experiment tracking using SQLite - no external services, no setup complexity.

### Why SQL Tracker?

- **Zero Setup**: Just a SQLite file, no servers
- **Full History**: Track all runs, metrics, and artifacts
- **Smart Search**: Find similar experiments by config structure
- **Code Versioning**: Track code changes via fingerprints

### Database Schema

```
Project (my_ml_project)
├── Experiment (run_1)
│   ├── config: {...}
│   ├── structural_hash: "abc123..."
│   ├── Metrics: [loss, accuracy, ...]
│   ├── Artifacts: [model.pt, plots/*, ...]
│   └── Fingerprints: [model_forward, train_step, ...]
├── Experiment (run_2)
└── ...
```

### Quick Start

#### Logging Experiments

```python
from ato.db_routers.sql.manager import SQLLogger
from ato.adict import ADict

# Setup config
config = ADict(
    experiment=ADict(
        project_name='image_classification',
        sql=ADict(db_path='sqlite:///experiments.db')
    ),
    # Your hyperparameters
    lr=0.001,
    batch_size=32,
    model='resnet50'
)

# Create logger
logger = SQLLogger(config)

# Start experiment run
run_id = logger.run(tags=['baseline', 'resnet50', 'cifar10'])

# Training loop
for epoch in range(100):
    # Your training code
    train_loss = train_one_epoch()
    val_acc = validate()

    # Log metrics
    logger.log_metric('train_loss', train_loss, step=epoch)
    logger.log_metric('val_accuracy', val_acc, step=epoch)

# Log artifacts
logger.log_artifact(run_id, 'checkpoints/model_best.pt',
                    data_type='model',
                    metadata={'epoch': best_epoch})

# Finish run
logger.finish(status='completed')
```

#### Querying Experiments

```python
from ato.db_routers.sql.manager import SQLFinder

finder = SQLFinder(config)

# Get all runs in project
runs = finder.get_runs_in_project('image_classification')
for run in runs:
    print(f"Run {run.id}: {run.config.model} - {run.status}")

# Find best performing run
best_run = finder.find_best_run(
    project_name='image_classification',
    metric_key='val_accuracy',
    mode='max'  # or 'min' for loss
)
print(f"Best config: {best_run.config}")

# Find similar experiments (same config structure)
similar = finder.find_similar_runs(run_id=123)
print(f"Found {len(similar)} runs with similar config structure")

# Trace statistics (code fingerprints)
stats = finder.get_trace_statistics('image_classification', trace_id='model_forward')
print(f"Model forward pass has {stats['static_trace_versions']} versions")
```

### Real-world Example: Experiment Comparison

```python
# Compare hyperparameter impact
finder = SQLFinder(config)

runs = finder.get_runs_in_project('my_project')
for run in runs:
    # Get final accuracy
    final_metrics = [m for m in run.metrics if m.key == 'val_accuracy']
    best_acc = max(m.value for m in final_metrics) if final_metrics else 0

    print(f"LR: {run.config.lr}, Batch: {run.config.batch_size} → Acc: {best_acc:.2%}")
```

### Features Summary

| Feature | Description |
|---------|-------------|
| **Structural Hash** | Auto-track config structure changes |
| **Metric Logging** | Time-series metrics with step tracking |
| **Artifact Management** | Track model checkpoints, plots, data files |
| **Fingerprint Tracking** | Version control for code (static & runtime) |
| **Smart Search** | Find similar configs, best runs, statistics |

---

## Hyperparameter Optimization

Built-in **Hyperband** algorithm for efficient hyperparameter search with early stopping.

### Extensible Design

Ato's hyperopt module is built for extensibility and reusability:

| Component | Purpose | Benefit |
|-----------|---------|---------|
| `GridSpaceMixIn` | Parameter sampling logic | Reusable across different algorithms |
| `HyperOpt` | Base optimization class | Easy to implement custom strategies |
| `DistributedMixIn` | Distributed training support | Optional, composable |

**This design makes it trivial to implement custom search algorithms**:

```python
from ato.hyperopt.base import GridSpaceMixIn, HyperOpt

class RandomSearch(GridSpaceMixIn, HyperOpt):
    def main(self, func):
        # Reuse GridSpaceMixIn.prepare_distributions()
        configs = self.prepare_distributions(self.config, self.search_spaces)

        # Implement random sampling
        import random
        random.shuffle(configs)

        results = []
        for config in configs[:10]:  # Sample 10 random configs
            metric = func(config)
            results.append((config, metric))

        return max(results, key=lambda x: x[1])
```

### How Hyperband Works

Hyperband uses successive halving (a minimal sketch follows the list):
1. Start with many configs, train briefly
2. Keep top performers, discard poor ones
3. Train survivors longer
4. Repeat until one winner remains
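
The loop below is a deliberately simplified, framework-free sketch of that schedule, assuming a `halving_rate` of 0.3 and a user-supplied `evaluate(config, budget)` function; it is not ato's `HyperBand` implementation, just the idea behind it.

```python
import math

def successive_halving(configs, evaluate, halving_rate=0.3, num_min_samples=3, budget=1):
    """Repeatedly train all surviving configs, then keep only the top fraction."""
    while len(configs) > num_min_samples:
        # Evaluate every surviving config with the current (small) budget.
        scored = [(cfg, evaluate(cfg, budget)) for cfg in configs]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # mode='max'

        # Keep the best `halving_rate` fraction and give survivors a larger budget.
        keep = max(num_min_samples, math.ceil(len(scored) * halving_rate))
        configs = [cfg for cfg, _ in scored[:keep]]
        budget = int(budget / halving_rate)  # survivors train roughly 1/halving_rate longer

    return configs  # the finalists, to be trained with the full budget

```

In ato itself this bookkeeping is handled by `HyperBand`, and the current generation is exposed to your training function as `config.__num_halved__`, as the example below shows.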

### Basic Usage

```python
from ato.adict import ADict
from ato.hyperopt.hyperband import HyperBand
from ato.scope import Scope

scope = Scope()

# Define search space
search_spaces = ADict(
    lr=ADict(
        param_type='FLOAT',
        param_range=(1e-5, 1e-1),
        num_samples=20,
        space_type='LOG'  # Logarithmic spacing
    ),
    batch_size=ADict(
        param_type='INTEGER',
        param_range=(16, 128),
        num_samples=5,
        space_type='LOG'
    ),
    model=ADict(
        param_type='CATEGORY',
        categories=['resnet50', 'resnet101', 'efficientnet_b0']
    )
)

# Create Hyperband optimizer
hyperband = HyperBand(
    scope,
    search_spaces,
    halving_rate=0.3,    # Keep top 30% each round
    num_min_samples=3,   # Stop when <= 3 configs remain
    mode='max'           # Maximize metric (use 'min' for loss)
)

@hyperband.main
def train(config):
    # Your training code
    model = create_model(config.model)
    optimizer = Adam(lr=config.lr)

    # Use __num_halved__ for early stopping
    num_epochs = compute_epochs(config.__num_halved__)

    # Train and return metric
    val_acc = train_and_evaluate(model, optimizer, num_epochs)
    return val_acc

if __name__ == '__main__':
    # Run hyperparameter search
    best_result = train()
    print(f"Best config: {best_result.config}")
    print(f"Best metric: {best_result.metric}")
```

### Automatic Step Calculation

Let Hyperband compute optimal training steps:

```python
hyperband = HyperBand(scope, search_spaces, halving_rate=0.3, num_min_samples=4)

max_steps = 100000
steps_per_generation = hyperband.compute_optimized_initial_training_steps(max_steps)
# Example output: [27, 88, 292, 972, 3240, 10800, 36000, 120000]

# Use in training
@hyperband.main
def train(config):
    generation = config.__num_halved__
    num_steps = steps_per_generation[generation]

    metric = train_for_n_steps(num_steps)
    return metric
```

### Parameter Types

| Type | Description | Example |
|------|-------------|---------|
| `FLOAT` | Continuous values | Learning rate, dropout |
| `INTEGER` | Discrete integers | Batch size, num layers |
| `CATEGORY` | Categorical choices | Model type, optimizer |

Space types (a sampling sketch follows this list):
- `LOG`: Logarithmic spacing (good for learning rates)
- `LINEAR`: Linear spacing (default)
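
To make the difference concrete, here is a small NumPy illustration of spacing `num_samples` points over a `param_range` linearly versus logarithmically; it mirrors the idea, not ato's exact sampling code.

```python
import numpy as np

low, high, num_samples = 1e-5, 1e-1, 5

# LINEAR: evenly spaced values; most samples land near the top of the range.
linear = np.linspace(low, high, num_samples)
# → [1.0e-05, 2.5e-02, 5.0e-02, 7.5e-02, 1.0e-01]

# LOG: evenly spaced exponents; every decade gets similar coverage,
# which is usually what you want for learning rates.
log = np.logspace(np.log10(low), np.log10(high), num_samples)
# → [1.0e-05, 1.0e-04, 1.0e-03, 1.0e-02, 1.0e-01]
```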

### Distributed Hyperparameter Search

Ato supports distributed hyperparameter optimization out of the box:

```python
from ato.hyperopt.hyperband import DistributedHyperBand
import torch.distributed as dist

# Initialize distributed training
dist.init_process_group(backend='nccl')
rank = dist.get_rank()
world_size = dist.get_world_size()

# Create distributed hyperband
hyperband = DistributedHyperBand(
    scope,
    search_spaces,
    halving_rate=0.3,
    num_min_samples=3,
    mode='max',
    rank=rank,
    world_size=world_size,
    backend='pytorch'
)

@hyperband.main
def train(config):
    # Your distributed training code
    model = create_model(config)
    model = DDP(model, device_ids=[rank])
    metric = train_and_evaluate(model)
    return metric

if __name__ == '__main__':
    result = train()
    if rank == 0:
        print(f"Best config: {result.config}")
```

**Key features** (a sketch of the communication pattern follows this list):
- Automatic work distribution across GPUs
- Synchronized config selection via `broadcast_object_from_root`
- Results aggregation with `all_gather_object`
- Compatible with PyTorch DDP, FSDP, DeepSpeed
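
For intuition, the broadcast-then-gather pattern those bullets describe looks roughly like the sketch below. It is written against public `torch.distributed` APIs to illustrate the pattern only; it is not ato's internal `broadcast_object_from_root` helper, and `evaluate` is a placeholder for your own training call.

```python
import torch.distributed as dist

def run_generation(configs, evaluate, rank, world_size):
    """One generation: rank 0 decides the surviving configs, every rank reports results back."""
    # Rank 0 selects the configs for this generation and broadcasts them to all ranks.
    payload = [configs if rank == 0 else None]
    dist.broadcast_object_list(payload, src=0)
    configs = payload[0]

    # Each rank evaluates its round-robin slice of the configs.
    local_results = [(cfg, evaluate(cfg)) for cfg in configs[rank::world_size]]

    # Gather every rank's results so rank 0 can pick the top performers for the next round.
    gathered = [None] * world_size
    dist.all_gather_object(gathered, local_results)
    return [item for part in gathered for item in part]
```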

---

## Best Practices

### 1. Project Structure

```
my_project/
├── configs/
│   ├── default.py       # Default config with @scope.observe(default=True)
│   ├── models.py        # Model-specific configs
│   └── datasets.py      # Dataset configs
├── train.py             # Main training script
├── experiments.db       # SQLite experiment tracking
└── experiments/
    ├── run_001/
    │   ├── checkpoints/
    │   └── logs/
    └── run_002/
```

### 2. Config Organization

```python
# configs/default.py
from ato.adict import ADict
from ato.scope import Scope

scope = Scope()

@scope.observe(default=True)
def defaults(cfg):
    # Data
    cfg.data = ADict(
        dataset='cifar10',
        batch_size=32,
        num_workers=4
    )

    # Model
    cfg.model = ADict(
        backbone='resnet50',
        pretrained=True
    )

    # Training
    cfg.train = ADict(
        lr=0.001,
        epochs=100,
        optimizer='adam'
    )

    # Experiment tracking
    cfg.experiment = ADict(
        project_name='my_project',
        sql=ADict(db_path='sqlite:///experiments.db')
    )
```

### 3. Combined Workflow

```python
from ato.scope import Scope
from ato.db_routers.sql.manager import SQLLogger
from configs.default import scope

@scope
def train(cfg):
    # Setup experiment tracking
    logger = SQLLogger(cfg)
    run_id = logger.run(tags=[cfg.model.backbone, cfg.data.dataset])

    try:
        # Training loop
        for epoch in range(cfg.train.epochs):
            loss = train_epoch()
            acc = validate()

            logger.log_metric('loss', loss, epoch)
            logger.log_metric('accuracy', acc, epoch)

        logger.finish(status='completed')

    except Exception as e:
        logger.finish(status='failed')
        raise e

if __name__ == '__main__':
    train()
```

### 4. Reproducibility Checklist

- ✅ Use structural hashing to track config changes
- ✅ Log all hyperparameters to SQLLogger
- ✅ Tag experiments with meaningful labels
- ✅ Track artifacts (checkpoints, plots)
- ✅ Use lazy configs for derived parameters
- ✅ Document configs with `@scope.manual`

---

## Requirements

- Python >= 3.7
- SQLAlchemy (for SQL Tracker)
- PyYAML, toml (for config serialization)

See `pyproject.toml` for full dependencies.

---

## License

MIT License

---

## Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

### Development Setup

```bash
git clone https://github.com/yourusername/ato.git
cd ato
pip install -e .
```

---

## Comparison with Other Tools

| Feature | Ato | MLflow | W&B | Hydra |
|---------|--------|--------|-----|-------|
| **Core Features** | | | | |
| Zero setup | ✅ | ❌ | ❌ | ✅ |
| Offline-first | ✅ | Partial | ❌ | ✅ |
| Config priority system | ✅ Explicit | Partial (Tags) | Partial (Run params) | ✅ Override |
| **True namespace isolation** | **✅ MultiScope** | **❌** | **❌** | **❌ Config groups only** |
| **Config merge visualization** | **✅ `manual`** | **❌** | **❌** | **Partial (`--cfg` tree)** |
| Structural hashing | ✅ | ❌ | ❌ | ❌ |
| Built-in HyperOpt | ✅ Hyperband | ❌ | ✅ Sweeps | Plugins (Optuna) |
| CLI-first design | ✅ | ❌ | ❌ | ✅ |
| **Compatibility** | | | | |
| Framework agnostic | ✅ | ✅ | ✅ | ✅ |
| Distributed training | ✅ Native + DDP/FSDP⁽¹⁾ | ✅ | ✅ | ✅ |
| Distributed HyperOpt | ✅ `DistributedHyperBand` | ❌ | Partial | Plugins |
| Hydra-style composition | ✅ `compose_hierarchy` | N/A | N/A | Native |
| OpenMMLab configs | ✅ `load_mm_config` | ❌ | ❌ | ❌ |
| **Visualization & UI** | | | | |
| Web dashboard | 🔜 Planned | ✅ | ✅ | ❌ |
| Real-time metrics | 🔜 Planned | ✅ | ✅ | ❌ |
| Interactive plots | 🔜 Planned | ✅ | ✅ | ❌ |
| Metric comparison UI | 🔜 Planned | ✅ | ✅ | ❌ |
| **Advanced Features** | | | | |
| Model registry | 🔜 Planned | ✅ | ✅ | ❌ |
| Dataset versioning | 🔜 Planned | Partial | ✅ | ❌ |
| Team collaboration | ✅ MultiScope⁽²⁾ | ✅ Platform | ✅ Platform | ❌ |

⁽¹⁾ Native distributed hyperparameter optimization via `DistributedHyperBand`. Regular training is compatible with any distributed framework (DDP, FSDP, DeepSpeed) - just integrate logging, no special code needed.

⁽²⁾ Team collaboration via MultiScope: separate config ownership per team (e.g., Team A owns the model scope, Team B owns the data scope) without naming conflicts.

**Note on config compatibility**: Ato provides built-in support for other config frameworks:
- **Hydra-style composition**: `compose_hierarchy()` supports config groups, select, overrides - full compatibility
- **OpenMMLab configs**: `load_mm_config()` handles `_base_` inheritance and `_delete_` keys
- Migration from existing projects is seamless - just import your configs and go

### Ato vs. Hydra

While Hydra is excellent for config composition, Ato provides unique features:

| Aspect | Hydra | Ato |
|--------|-------|--------|
| **Namespace isolation** | Config groups share namespace | ✅ MultiScope with independent namespaces<br/>(no key collisions) |
| **Priority system** | Single global override system | ✅ Per-scope priority + lazy evaluation |
| **Config merge debugging** | Tree view (`--cfg`)<br/>Shows final config | ✅ `manual` command<br/>Shows merge order & execution flow |
| **Experiment tracking** | Requires external tools<br/>(MLflow/W&B) | ✅ Built-in SQL tracker |
| **Team workflow** | Single config file ownership | ✅ Separate scope ownership per team⁽³⁾ |

⁽³⁾ Example: Team A defines `model_scope`, Team B defines `data_scope`, and both can use `model.lr` and `data.lr` without conflicts.

**Use Ato over Hydra when:**
- Multiple teams need independent config ownership (MultiScope)
- You want to avoid key collision issues (no manual prefixing needed)
- You need to debug why a config value was set (`manual` command)
- You want experiment tracking without adding MLflow/W&B
- You're migrating from OpenMMLab projects

**Use Hydra when:**
- You have very deep config hierarchies with complex inheritance
- You prefer YAML over Python
- You need the mature plugin ecosystem (Ray, Joblib, etc.)
- You don't need namespace isolation

**Why not both?**
- Ato has **built-in Hydra-style composition** via `compose_hierarchy()`
- You can use Hydra's directory structure and config groups directly in Ato
- Get MultiScope + experiment tracking + merge debugging on top of Hydra's composition
- Migration is literally just replacing `hydra.compose()` with `ADict.compose_hierarchy()`

**Ato is for you if:**
- You want lightweight, offline-first experiment tracking
- You need **true namespace isolation for team collaboration**
- **You want to debug config merge order visually** (unique to Ato!)
- You prefer simple Python over complex frameworks
- You want reproducibility without overhead