ato 2.0.4__py3-none-any.whl → 2.1.4__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,1134 @@
1
+ Metadata-Version: 2.4
2
+ Name: ato
3
+ Version: 2.1.4
4
+ Summary: Configuration, experimentation, and hyperparameter optimization for Python. No runtime magic. No launcher. Just Python modules you compose.
5
+ Author: ato contributors
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/yourusername/ato
8
+ Project-URL: Repository, https://github.com/yourusername/ato
9
+ Project-URL: Documentation, https://github.com/yourusername/ato#readme
10
+ Project-URL: Issues, https://github.com/yourusername/ato/issues
11
+ Keywords: config management,experiment tracking,hyperparameter optimization,lightweight,composable,namespace isolation,machine learning
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: Intended Audience :: Science/Research
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.7
18
+ Classifier: Programming Language :: Python :: 3.8
19
+ Classifier: Programming Language :: Python :: 3.9
20
+ Classifier: Programming Language :: Python :: 3.10
21
+ Classifier: Programming Language :: Python :: 3.11
22
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
23
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
24
+ Requires-Python: >=3.7
25
+ Description-Content-Type: text/markdown
26
+ License-File: LICENSE
27
+ Requires-Dist: pyyaml>=6.0
28
+ Requires-Dist: toml>=0.10.2
29
+ Requires-Dist: sqlalchemy>=2.0
30
+ Requires-Dist: numpy>=1.19.0
31
+ Provides-Extra: distributed
32
+ Requires-Dist: torch>=1.8.0; extra == "distributed"
33
+ Dynamic: license-file
34
+
35
+ # Ato: A Thin Operating Layer
36
+
37
+ **Minimal reproducibility for ML.**
38
+ Tracks config structure, code, and runtime so you can explain why runs differ — without a platform.
39
+
40
+ ```bash
41
+ pip install ato
42
+ ```
43
+
44
+ ```python
45
+ # Your experiment breaks. Here's how to debug it:
+ # (shell) python train.py manual                          # See exactly how configs merged
+ finder.get_trace_statistics('my_project', 'train_step')   # See which code versions ran
+ finder.find_similar_runs(run_id=123)                      # Find experiments with same structure
49
+ ```
50
+
51
+ **One question:** "Why did this result change?"
52
+ Ato fingerprints config structure, function logic, and runtime outputs to answer it.
53
+
54
+ ---
55
+
56
+ ## What Ato Is
57
+
58
+ Ato is a thin layer that fingerprints your config structure, function logic, and runtime outputs.
59
+
60
+ It doesn't replace your stack; it sits beside it to answer one question: **"Why did this result change?"**
61
+
62
+ Three pieces, zero coupling:
63
+
64
+ 1. **ADict** — Structural hashing for configs (tracks architecture changes, not just values)
65
+ 2. **Scope** — Priority-based config merging with reasoning and code fingerprinting
66
+ 3. **SQLTracker** — Local-first experiment tracking in SQLite (zero setup, zero servers)
67
+
68
+ Each works alone. Together, they explain why experiments diverge.
69
+
70
+ **Config is not logging — it's reasoning.**
71
+ Ato makes config merge order, priority, and causality visible.
72
+
73
+ ---
74
+
75
+ ## Config Superpowers (That Make Reproducibility Real)
76
+
77
+ These aren't features. They're how Ato is built:
78
+
79
+ | Capability | What It Does | Why It Matters |
80
+ |------------|--------------|----------------|
81
+ | **Structural hashing** | Hash based on keys + types, not values | Detect when experiment **architecture** changes, not just hyperparameters |
82
+ | **Priority/merge reasoning** | Explicit merge order with `manual` inspection | See **why** a config value won — trace the entire merge path |
83
+ | **Namespace isolation** | Each scope owns its keys | Team/module collisions are impossible — no need for `model_lr` vs `data_lr` prefixes |
84
+ | **Code fingerprinting** | SHA256 of function bytecode, not git commits | Track **logic changes** automatically — refactoring doesn't create false versions |
85
+ | **Runtime fingerprinting** | SHA256 of actual outputs | Detect silent failures when code is unchanged but behavior differs |
86
+
87
+ **No dashboards. No servers. No ecosystems.**
88
+ Just fingerprints and SQLite.
89
+
90
+ ---
91
+
92
+ ## Works With Your Stack (Keep Hydra, MLflow, W&B, ...)
93
+
94
+ Ato doesn't compete with your config system or tracking platform.
95
+ It **observes and fingerprints** what you already use.
96
+
97
+ **Compose configs however you like:**
98
+ - Load Hydra/OmegaConf configs → Ato fingerprints the final merged structure
99
+ - Use argparse → Ato observes and integrates seamlessly
100
+ - Import OpenMMLab configs → Ato handles `_base_` inheritance automatically
101
+ - Mix YAML/JSON/TOML → Ato is format-agnostic
102
+
103
+ **Track experiments however you like:**
104
+ - Log to MLflow/W&B for dashboards → Ato tracks causality in local SQLite
105
+ - Use both together → Cloud tracking for metrics, Ato for "why did this change?"
106
+ - Or just use Ato → Zero-setup local tracking with full history
107
+
108
+ **Ato is a complement, not a replacement.**
109
+ No migration required. No lock-in. Add it incrementally.
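+
+ For example, if your existing tool already writes a merged config to disk, fingerprinting it takes two lines. A minimal sketch, assuming `final_config.yaml` is whatever your stack produced:
+
+ ```python
+ from ato.adict import ADict
+
+ # Load the config your existing stack composed, then fingerprint its structure.
+ config = ADict.from_file('final_config.yaml')
+ print(config.get_structural_hash())
+ ```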
110
+
111
+ ---
112
+
113
+ ## When to Use Ato
114
+
115
+ Use Ato when:
116
+
117
+ - **Experiments diverge occasionally** and you need to narrow down the cause
118
+ - **Config include/override order** changes results in unexpected ways
119
+ - **"I didn't change the code but results differ"** happens repeatedly (dependency/environment/bytecode drift)
120
+ - **Multiple people modify configs** and you need to trace who set what and why
121
+ - **You're debugging non-determinism** and need runtime fingerprints to catch silent failures
122
+
123
+ **Ato is for causality, not compliance.**
124
+ If you need audit trails or dashboards, keep using your existing tracking platform.
125
+
126
+ ---
127
+
128
+ ## Non-Goals
129
+
130
+ Ato is **not**:
131
+
132
+ - A pipeline orchestrator (use Airflow, Prefect, Luigi, ...)
133
+ - A hyperparameter scheduler (use Optuna, Ray Tune, ...)
134
+ - A model registry (use MLflow Model Registry, ...)
135
+ - An experiment dashboard (use MLflow, W&B, TensorBoard, ...)
136
+ - A dataset versioner (use DVC, Pachyderm, ...)
137
+
138
+ **Ato has one job:** Explain why results changed.
139
+ Everything else belongs in specialized tools.
140
+
141
+ ---
142
+
143
+ ## Incremental Adoption (No Migration Required)
144
+
145
+ You don't need to replace anything. Add Ato in steps:
146
+
147
+ **Step 1: Fingerprint config structure (zero code changes)**
148
+ ```python
149
+ from ato.adict import ADict
150
+
151
+ config = ADict(lr=0.001, batch_size=32, model='resnet50')
152
+ print(config.get_structural_hash()) # Tracks structure, not values
153
+ ```
154
+
155
+ **Step 2: Add code fingerprinting to key functions**
156
+ ```python
157
+ from ato.scope import Scope
158
+
159
+ scope = Scope()
160
+
161
+ @scope.trace(trace_id='train_step')
162
+ @scope
163
+ def train_epoch(config):
164
+ # Your training code
165
+ pass
166
+ ```
167
+
168
+ **Step 3: Add runtime fingerprinting to outputs**
169
+ ```python
170
+ import numpy as np
+
+ @scope.runtime_trace(
171
+ trace_id='predictions',
172
+ init_fn=lambda: np.random.seed(42), # Fix randomness
173
+ inspect_fn=lambda preds: preds[:100] # Track first 100
174
+ )
175
+ def evaluate(model, data):
176
+ return model.predict(data)
177
+ ```
178
+
179
+ **Step 4: Inspect config merge order**
180
+ ```bash
181
+ python train.py manual # See exactly how configs merged
182
+ ```
183
+
184
+ **Step 5: Track experiments locally**
185
+ ```python
186
+ from ato.db_routers.sql.manager import SQLLogger
187
+
188
+ logger = SQLLogger(config)
189
+ run_id = logger.run(tags=['baseline'])
190
+ # Your training loop
191
+ logger.log_metric('loss', loss, step=epoch)
192
+ logger.finish(status='completed')
193
+ ```
194
+
195
+ ---
196
+
197
+ ## Quick Start
198
+
199
+ **Three lines to tracked experiments:**
200
+
201
+ ```python
202
+ from ato.scope import Scope
203
+
204
+ scope = Scope()
205
+
206
+ @scope.observe(default=True)
207
+ def config(config):
208
+ config.lr = 0.001
209
+ config.batch_size = 32
210
+ config.model = 'resnet50'
211
+
212
+ @scope
213
+ def train(config):
214
+ print(f"Training {config.model} with lr={config.lr}")
215
+
216
+ if __name__ == '__main__':
217
+ train()
218
+ ```
219
+
220
+ **Run it:**
221
+ ```bash
222
+ python train.py # Uses defaults
223
+ python train.py lr=0.01 # Override from CLI
224
+ python train.py manual # See config merge order
225
+ ```
226
+
227
+ ---
228
+
229
+ ## Table of Contents
230
+
231
+ - [ADict: Structural Hashing](#adict-structural-hashing)
232
+ - [Scope: Config Reasoning](#scope-config-reasoning)
233
+ - [Priority-based Merging](#priority-based-merging)
234
+ - [Config Chaining](#config-chaining)
235
+ - [Lazy Evaluation](#lazy-evaluation)
236
+ - [MultiScope: Namespace Isolation](#multiscope-namespace-isolation)
237
+ - [Config Documentation & Debugging](#config-documentation--debugging)
238
+ - [Code Fingerprinting](#code-fingerprinting)
239
+ - [Runtime Fingerprinting](#runtime-fingerprinting)
240
+ - [SQL Tracker: Local Experiment Tracking](#sql-tracker-local-experiment-tracking)
241
+ - [Hyperparameter Optimization](#hyperparameter-optimization)
242
+ - [Best Practices](#best-practices)
243
+ - [FAQ](#faq)
244
+ - [Quality Signals](#quality-signals)
245
+ - [Contributing](#contributing)
246
+
247
+ ---
248
+
249
+ ## ADict: Structural Hashing
250
+
251
+ `ADict` tracks when experiment **architecture** changes, not just hyperparameter values.
252
+
253
+ ### Core Capabilities
254
+
255
+ | Feature | Description |
256
+ |---------|-------------|
257
+ | **Structural Hashing** | Hash based on keys + types → detect architecture changes |
258
+ | **Nested Access** | Dot notation: `config.model.lr` instead of `config['model']['lr']` |
259
+ | **Format Agnostic** | Load/save JSON, YAML, TOML |
260
+ | **Safe Updates** | `update_if_absent()` → merge without overwrites |
261
+ | **Auto-nested** | `ADict.auto()` → `config.a.b.c = 1` just works |
262
+
263
+ ### Examples
264
+
265
+ #### Structural Hashing
266
+
267
+ ```python
268
+ from ato.adict import ADict
269
+
270
+ # Same structure, different values
271
+ config1 = ADict(lr=0.1, epochs=100, model='resnet50')
272
+ config2 = ADict(lr=0.01, epochs=200, model='resnet101')
273
+ print(config1.get_structural_hash() == config2.get_structural_hash()) # True
274
+
275
+ # Different structure (epochs is str!)
276
+ config3 = ADict(lr=0.1, epochs='100', model='resnet50')
277
+ print(config1.get_structural_hash() == config3.get_structural_hash()) # False
278
+ ```
279
+
280
+ **Why this matters:**
281
+ When results differ, you need to know if the experiment **architecture** changed or just the values.
282
+
283
+ #### Auto-nested Configs
284
+
285
+ ```python
286
+ # ❌ Traditional way
287
+ config = ADict()
288
+ config.model = ADict()
289
+ config.model.backbone = ADict()
290
+ config.model.backbone.layers = [64, 128, 256]
291
+
292
+ # ✅ With ADict.auto()
293
+ config = ADict.auto()
294
+ config.model.backbone.layers = [64, 128, 256] # Just works!
295
+ ```
296
+
297
+ #### Format Agnostic
298
+
299
+ ```python
300
+ # Load/save any format
301
+ config = ADict.from_file('config.json')
302
+ config.dump('config.yaml')
303
+
304
+ # Safe updates
305
+ config.update_if_absent(lr=0.01, scheduler='cosine') # Only adds scheduler
306
+ ```
307
+
308
+ ---
309
+
310
+ ## Scope: Config Reasoning
311
+
312
+ Scope manages configuration through **priority-based merging** with **full reasoning**.
313
+
314
+ **Config is not logging — it's reasoning.**
315
+ Scope makes merge order, priority, and causality visible.
316
+
317
+ ### Priority-based Merging
318
+
319
+ ```
320
+ Default Configs (priority=0)
+         ↓
+ Named Configs (priority=0+)
+         ↓
+ CLI Arguments (highest priority)
+         ↓
+ Lazy Configs (computed after CLI)
327
+ ```
328
+
329
+ #### Example
330
+
331
+ ```python
332
+ from ato.scope import Scope
333
+
334
+ scope = Scope()
335
+
336
+ @scope.observe(default=True) # Always applied
337
+ def defaults(config):
338
+ config.lr = 0.001
339
+ config.epochs = 100
340
+
341
+ @scope.observe(priority=1) # Applied after defaults
342
+ def high_lr(config):
343
+ config.lr = 0.01
344
+
345
+ @scope.observe(priority=2) # Applied last
346
+ def long_training(config):
347
+ config.epochs = 300
348
+ ```
349
+
350
+ ```bash
351
+ python train.py # lr=0.001, epochs=100
352
+ python train.py high_lr # lr=0.01, epochs=100
353
+ python train.py high_lr long_training # lr=0.01, epochs=300
354
+ ```
355
+
356
+ ### Config Chaining
357
+
358
+ Chain configs with dependencies:
359
+
360
+ ```python
361
+ @scope.observe()
362
+ def base_setup(config):
363
+ config.project_name = 'my_project'
364
+ config.data_dir = '/data'
365
+
366
+ @scope.observe(chain_with='base_setup') # Automatically applies base_setup first
367
+ def advanced_training(config):
368
+ config.distributed = True
369
+ config.mixed_precision = True
370
+
371
+ @scope.observe(chain_with=['base_setup', 'gpu_setup']) # Multiple dependencies
372
+ def multi_node_training(config):
373
+ config.nodes = 4
374
+ config.world_size = 16
375
+ ```
376
+
377
+ ```bash
378
+ # Calling advanced_training automatically applies base_setup first
379
+ python train.py advanced_training
380
+ # Results in: base_setup → advanced_training
381
+ ```
382
+
383
+ ### Lazy Evaluation
384
+
385
+ **Note:** Lazy evaluation requires Python 3.8 or higher.
386
+
387
+ Compute configs **after** CLI args are applied:
388
+
389
+ ```python
390
+ @scope.observe()
391
+ def base_config(config):
392
+ config.model = 'resnet50'
393
+ config.dataset = 'imagenet'
394
+
395
+ @scope.observe(lazy=True) # Evaluated AFTER CLI args
396
+ def computed_config(config):
397
+ # Adjust based on dataset
398
+ if config.dataset == 'imagenet':
399
+ config.num_classes = 1000
400
+ config.image_size = 224
401
+ elif config.dataset == 'cifar10':
402
+ config.num_classes = 10
403
+ config.image_size = 32
404
+ ```
405
+
406
+ ```bash
407
+ python train.py dataset=%cifar10% computed_config
408
+ # Results in: num_classes=10, image_size=32
409
+ ```
410
+
411
+ **Python 3.11+ Context Manager:**
412
+
413
+ ```python
414
+ @scope.observe()
415
+ def my_config(config):
416
+ config.model = 'resnet50'
417
+ config.num_layers = 50
418
+
419
+ with Scope.lazy(): # Evaluated after CLI
420
+ if config.model == 'resnet101':
421
+ config.num_layers = 101
422
+ ```
423
+
424
+ ### MultiScope: Namespace Isolation
425
+
426
+ Manage completely separate configuration namespaces with independent priority systems.
427
+
428
+ **Use case:** Different teams own different scopes without key collisions.
429
+
430
+ ```python
431
+ from ato.scope import Scope, MultiScope
432
+
433
+ model_scope = Scope(name='model')
434
+ data_scope = Scope(name='data')
435
+ scope = MultiScope(model_scope, data_scope)
436
+
437
+ @model_scope.observe(default=True)
438
+ def model_config(model):
439
+ model.backbone = 'resnet50'
440
+ model.lr = 0.1 # Model-specific learning rate
441
+
442
+ @data_scope.observe(default=True)
443
+ def data_config(data):
444
+ data.dataset = 'cifar10'
445
+ data.lr = 0.001 # Data augmentation learning rate (no conflict!)
446
+
447
+ @scope
448
+ def train(model, data): # Named parameters match scope names
449
+ # Both have 'lr' but in separate namespaces!
450
+ print(f"Model LR: {model.lr}, Data LR: {data.lr}")
451
+ ```
452
+
453
+ **Key advantage:** `model.lr` and `data.lr` are completely independent. No naming prefixes needed.
454
+
455
+ **CLI with MultiScope:**
456
+
457
+ ```bash
458
+ # Override model scope only
459
+ python train.py model.backbone=%resnet101%
460
+
461
+ # Override both
462
+ python train.py model.backbone=%resnet101% data.dataset=%imagenet%
463
+ ```
464
+
465
+ ### Config Documentation & Debugging
466
+
467
+ **The `manual` command** visualizes the exact order of configuration application.
468
+
469
+ ```python
470
+ @scope.observe(default=True)
471
+ def config(config):
472
+ config.lr = 0.001
473
+ config.batch_size = 32
474
+ config.model = 'resnet50'
475
+
476
+ @scope.manual
477
+ def config_docs(config):
478
+ config.lr = 'Learning rate for optimizer'
479
+ config.batch_size = 'Number of samples per batch'
480
+ config.model = 'Model architecture (resnet50, resnet101, etc.)'
481
+ ```
482
+
483
+ ```bash
484
+ python train.py manual
485
+ ```
486
+
487
+ **Output:**
488
+ ```
489
+ --------------------------------------------------
490
+ [Scope "config"]
491
+ (The Applying Order of Views)
492
+ config → (CLI Inputs)
493
+
494
+ (User Manuals)
495
+ lr: Learning rate for optimizer
496
+ batch_size: Number of samples per batch
497
+ model: Model architecture (resnet50, resnet101, etc.)
498
+ --------------------------------------------------
499
+ ```
500
+
501
+ **Why this matters:**
502
+ When debugging "why is this config value not what I expect?", you see **exactly** which function set it and in what order.
503
+
504
+ **Complex example:**
505
+
506
+ ```python
507
+ @scope.observe(default=True)
508
+ def defaults(config):
509
+ config.lr = 0.001
510
+
511
+ @scope.observe(priority=1)
512
+ def experiment_config(config):
513
+ config.lr = 0.01
514
+
515
+ @scope.observe(priority=2)
516
+ def another_config(config):
517
+ config.lr = 0.1
518
+
519
+ @scope.observe(lazy=True)
520
+ def adaptive_lr(config):
521
+ if config.batch_size > 64:
522
+ config.lr = config.lr * 2
523
+ ```
524
+
525
+ When you run `python train.py manual`, you see:
526
+ ```
527
+ (The Applying Order of Views)
528
+ defaults → experiment_config → another_config → (CLI Inputs) → adaptive_lr
529
+ ```
530
+
531
+ Now it's **crystal clear** why `lr=0.1` (from `another_config`) and not `0.01`!
532
+
533
+ ### Code Fingerprinting
534
+
535
+ Track **logic changes** automatically, ignoring cosmetic edits.
536
+
537
+ #### Static Tracing (`@scope.trace`)
538
+
539
+ Generates a fingerprint of the function's **logic**, not its name or formatting:
540
+
541
+ ```python
542
+ # These three functions have IDENTICAL fingerprints
543
+ @scope.trace(trace_id='train_step')
544
+ @scope
545
+ def train_v1(config):
546
+ loss = model(data)
547
+ return loss
548
+
549
+ @scope.trace(trace_id='train_step')
550
+ @scope
551
+ def train_v2(config):
552
+ # Added comments
553
+ loss = model(data) # Compute loss
554
+ return loss
555
+
556
+ @scope.trace(trace_id='train_step')
557
+ @scope
558
+ def completely_different_name(config):
559
+ loss=model(data) # Different whitespace
560
+ return loss
561
+ ```
562
+
563
+ All three produce the **same fingerprint** because the underlying logic is identical.
564
+
565
+ **When fingerprints change:**
566
+
567
+ ```python
568
+ @scope.trace(trace_id='train_step')
569
+ @scope
570
+ def train_v1(config):
571
+ loss = model(data)
572
+ return loss
573
+
574
+ @scope.trace(trace_id='train_step')
575
+ @scope
576
+ def train_v2(config):
577
+ loss = model(data) * 2 # ← Logic changed!
578
+ return loss
579
+ ```
580
+
581
+ Now fingerprints differ — you've changed the actual computation.
582
+
583
+ **Example: Catching refactoring bugs**
584
+
585
+ ```python
586
+ # Original implementation
587
+ @scope.trace(trace_id='forward_pass')
588
+ @scope
589
+ def forward(model, x):
590
+ out = model(x)
591
+ return out
592
+
593
+ # Safe refactoring: Added comments, changed variable name, different whitespace
594
+ @scope.trace(trace_id='forward_pass')
595
+ @scope
596
+ def forward(model,x):
597
+ # Forward pass through model
598
+ result=model(x) # No spaces
599
+ return result
600
+ ```
601
+
602
+ These have **the same fingerprint** because the underlying logic is identical — only cosmetic changes.
603
+
604
+ ```python
605
+ # Unsafe refactoring: Logic changed
606
+ @scope.trace(trace_id='forward_pass')
607
+ @scope
608
+ def forward(model, x):
609
+ features = model.backbone(x) # Now calling backbone + head separately!
610
+ logits = model.head(features)
611
+ return logits
612
+ ```
613
+
614
+ This has a **different fingerprint** — the logic changed. If you expected them to be equivalent but they have different fingerprints, you've caught a refactoring bug.
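+
+ Under the hood, the fingerprint reflects the function's compiled logic rather than its source text. A rough way to picture it (an illustration of the idea only, not Ato's exact implementation):
+
+ ```python
+ import hashlib
+
+ def logic_fingerprint(fn):
+     # Comments, whitespace, and renamed variables never reach the compiled
+     # bytecode, so cosmetic edits leave this hash unchanged.
+     code = fn.__code__
+     payload = code.co_code + repr(code.co_consts).encode()
+     return hashlib.sha256(payload).hexdigest()
+ ```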
615
+
616
+ ### Runtime Fingerprinting
617
+
618
+ Track what the function **produces**, not what it does.
619
+
620
+ #### Runtime Tracing (`@scope.runtime_trace`)
621
+
622
+ ```python
623
+ import numpy as np
624
+
625
+ # Basic: Track full output
626
+ @scope.runtime_trace(trace_id='predictions')
627
+ @scope
628
+ def evaluate(model, data):
629
+ return model.predict(data)
630
+
631
+ # With init_fn: Fix randomness for reproducibility
632
+ @scope.runtime_trace(
633
+ trace_id='predictions',
634
+ init_fn=lambda: np.random.seed(42) # Initialize before execution
635
+ )
636
+ @scope
637
+ def evaluate_with_dropout(model, data):
638
+ return model.predict(data) # Now deterministic
639
+
640
+ # With inspect_fn: Track specific parts of output
641
+ @scope.runtime_trace(
642
+ trace_id='predictions',
643
+ inspect_fn=lambda preds: preds[:100] # Only hash first 100 predictions
644
+ )
645
+ @scope
646
+ def evaluate_large_output(model, data):
647
+ return model.predict(data)
648
+
649
+ # Advanced: Type-only checking (ignore values)
650
+ @scope.runtime_trace(
651
+ trace_id='predictions',
652
+ inspect_fn=lambda preds: type(preds).__name__ # Track output type only
653
+ )
654
+ @scope
655
+ def evaluate_structure(model, data):
656
+ return model.predict(data)
657
+ ```
658
+
659
+ **Parameters:**
660
+ - `init_fn`: Optional function called before execution (e.g., seed fixing, device setup)
661
+ - `inspect_fn`: Optional function to extract/filter what to track (e.g., first N items, specific fields, types only)
662
+
663
+ Even when the code is unchanged, the runtime fingerprint changes whenever the tracked output differs.
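+
+ Conceptually, the runtime fingerprint is a stable hash of whatever `inspect_fn` returns. A minimal sketch of the idea (Ato's actual serialization may differ):
+
+ ```python
+ import hashlib
+ import numpy as np
+
+ def runtime_fingerprint(output, inspect_fn=lambda x: x):
+     view = inspect_fn(output)
+     # Serialize deterministically, then hash the bytes.
+     data = np.asarray(view).tobytes() if isinstance(view, (np.ndarray, list)) else repr(view).encode()
+     return hashlib.sha256(data).hexdigest()
+ ```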
664
+
665
+ #### When to Use Each
666
+
667
+ **Use `@scope.trace()` when:**
668
+ - You want to track code changes automatically
669
+ - You're refactoring and want to isolate performance impact
670
+ - You need to audit "which code produced this result?"
671
+ - You want to ignore cosmetic changes (comments, whitespace, renaming)
672
+
673
+ **Use `@scope.runtime_trace()` when:**
674
+ - You want to detect **silent failures** (code unchanged, output wrong)
675
+ - You're debugging non-determinism
676
+ - You need to verify model behavior across versions
677
+ - You care about what the function produces, not how it's written
678
+
679
+ **Use both when:**
680
+ - Building production ML systems
681
+ - Running long-term research experiments
682
+ - Multiple people modifying the same codebase
683
+
684
+ ---
685
+
686
+ ## SQL Tracker: Local Experiment Tracking
687
+
688
+ Lightweight experiment tracking using SQLite.
689
+
690
+ ### Why SQL Tracker?
691
+
692
+ - **Zero Setup**: Just a SQLite file, no servers
693
+ - **Full History**: Track all runs, metrics, and artifacts
694
+ - **Smart Search**: Find similar experiments by config structure
695
+ - **Code Versioning**: Track code changes via fingerprints
696
+ - **Offline-first**: No network required
697
+
698
+ ### Database Schema
699
+
700
+ ```
701
+ Project (my_ml_project)
702
+ ├── Experiment (run_1)
703
+ │ ├── config: {...}
704
+ │ ├── structural_hash: "abc123..."
705
+ │ ├── Metrics: [loss, accuracy, ...]
706
+ │ ├── Artifacts: [model.pt, plots/*, ...]
707
+ │ └── Fingerprints: [model_forward, train_step, ...]
708
+ ├── Experiment (run_2)
709
+ └── ...
710
+ ```
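+
+ Because it's a single SQLite file, you can also peek at it with standard tools. A quick sketch using SQLAlchemy (the exact table names are Ato's internal schema):
+
+ ```python
+ from sqlalchemy import create_engine, inspect
+
+ engine = create_engine('sqlite:///experiments.db')
+ print(inspect(engine).get_table_names())  # tables backing projects, runs, metrics, artifacts, ...
+ ```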
711
+
712
+ ### Usage
713
+
714
+ #### Logging Experiments
715
+
716
+ ```python
717
+ from ato.db_routers.sql.manager import SQLLogger
718
+ from ato.adict import ADict
719
+
720
+ # Setup config
721
+ config = ADict(
722
+ experiment=ADict(
723
+ project_name='image_classification',
724
+ sql=ADict(db_path='sqlite:///experiments.db')
725
+ ),
726
+ # Your hyperparameters
727
+ lr=0.001,
728
+ batch_size=32,
729
+ model='resnet50'
730
+ )
731
+
732
+ # Create logger
733
+ logger = SQLLogger(config)
734
+
735
+ # Start experiment run
736
+ run_id = logger.run(tags=['baseline', 'resnet50', 'cifar10'])
737
+
738
+ # Training loop
739
+ for epoch in range(100):
740
+ # Your training code
741
+ train_loss = train_one_epoch()
742
+ val_acc = validate()
743
+
744
+ # Log metrics
745
+ logger.log_metric('train_loss', train_loss, step=epoch)
746
+ logger.log_metric('val_accuracy', val_acc, step=epoch)
747
+
748
+ # Log artifacts
749
+ logger.log_artifact(run_id, 'checkpoints/model_best.pt',
750
+ data_type='model',
751
+ metadata={'epoch': best_epoch})
752
+
753
+ # Finish run
754
+ logger.finish(status='completed')
755
+ ```
756
+
757
+ #### Querying Experiments
758
+
759
+ ```python
760
+ from ato.db_routers.sql.manager import SQLFinder
761
+
762
+ finder = SQLFinder(config)
763
+
764
+ # Get all runs in project
765
+ runs = finder.get_runs_in_project('image_classification')
766
+ for run in runs:
767
+ print(f"Run {run.id}: {run.config.model} - {run.status}")
768
+
769
+ # Find best performing run
770
+ best_run = finder.find_best_run(
771
+ project_name='image_classification',
772
+ metric_key='val_accuracy',
773
+ mode='max' # or 'min' for loss
774
+ )
775
+ print(f"Best config: {best_run.config}")
776
+
777
+ # Find similar experiments (same config structure)
778
+ similar = finder.find_similar_runs(run_id=123)
779
+ print(f"Found {len(similar)} runs with similar config structure")
780
+
781
+ # Trace statistics (code fingerprints)
782
+ stats = finder.get_trace_statistics('image_classification', trace_id='model_forward')
783
+ print(f"Model forward pass has {stats['static_trace_versions']} versions")
784
+ ```
785
+
786
+ ### Features
787
+
788
+ | Feature | Description |
789
+ |---------|-------------|
790
+ | **Structural Hash** | Auto-track config structure changes |
791
+ | **Metric Logging** | Time-series metrics with step tracking |
792
+ | **Artifact Management** | Track model checkpoints, plots, data files |
793
+ | **Fingerprint Tracking** | Version control for code (static & runtime) |
794
+ | **Smart Search** | Find similar configs, best runs, statistics |
795
+
796
+ ---
797
+
798
+ ## Hyperparameter Optimization
799
+
800
+ Built-in **Hyperband** algorithm for efficient hyperparameter search with early stopping.
801
+
802
+ ### How Hyperband Works
803
+
804
+ Hyperband uses successive halving:
805
+ 1. Start with many configs, train briefly
806
+ 2. Keep top performers, discard poor ones
807
+ 3. Train survivors longer
808
+ 4. Repeat until one winner remains
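+
+ For a feel of the numbers, here is a rough sketch of survivor counts with `halving_rate=0.3` (illustration only; exact rounding is an implementation detail):
+
+ ```python
+ import math
+
+ n, num_min_samples, survivors = 100, 3, [100]
+ while n > num_min_samples:
+     n = math.ceil(n * 0.3)   # keep roughly the top 30% each round
+     survivors.append(n)
+ print(survivors)  # [100, 30, 9, 3]
+ ```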
809
+
810
+ ### Basic Usage
811
+
812
+ ```python
813
+ from ato.adict import ADict
814
+ from ato.hyperopt.hyperband import HyperBand
815
+ from ato.scope import Scope
816
+
817
+ scope = Scope()
818
+
819
+ # Define search space
820
+ search_spaces = ADict(
821
+ lr=ADict(
822
+ param_type='FLOAT',
823
+ param_range=(1e-5, 1e-1),
824
+ num_samples=20,
825
+ space_type='LOG' # Logarithmic spacing
826
+ ),
827
+ batch_size=ADict(
828
+ param_type='INTEGER',
829
+ param_range=(16, 128),
830
+ num_samples=5,
831
+ space_type='LOG'
832
+ ),
833
+ model=ADict(
834
+ param_type='CATEGORY',
835
+ categories=['resnet50', 'resnet101', 'efficientnet_b0']
836
+ )
837
+ )
838
+
839
+ # Create Hyperband optimizer
840
+ hyperband = HyperBand(
841
+ scope,
842
+ search_spaces,
843
+ halving_rate=0.3, # Keep top 30% each round
844
+ num_min_samples=3, # Stop when <= 3 configs remain
845
+ mode='max' # Maximize metric (use 'min' for loss)
846
+ )
847
+
848
+ @hyperband.main
849
+ def train(config):
850
+ # Your training code
851
+ model = create_model(config.model)
852
+ optimizer = Adam(lr=config.lr)
853
+
854
+ # Use __num_halved__ for early stopping
855
+ num_epochs = compute_epochs(config.__num_halved__)
856
+
857
+ # Train and return metric
858
+ val_acc = train_and_evaluate(model, optimizer, num_epochs)
859
+ return val_acc
860
+
861
+ if __name__ == '__main__':
862
+ # Run hyperparameter search
863
+ best_result = train()
864
+ print(f"Best config: {best_result.config}")
865
+ print(f"Best metric: {best_result.metric}")
866
+ ```
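+
+ `compute_epochs` above is your own budgeting function. One plausible schedule (an assumption, not part of Ato's API) gives surviving configs more epochs each round:
+
+ ```python
+ def compute_epochs(num_halved, base_epochs=5, growth=3):
+     # num_halved = how many halving rounds this config has survived
+     # round 0: 5 epochs, round 1: 15, round 2: 45, ...
+     return base_epochs * growth ** num_halved
+ ```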
867
+
868
+ ### Parameter Types
869
+
870
+ | Type | Description | Example |
871
+ |------|-------------|---------|
872
+ | `FLOAT` | Continuous values | Learning rate, dropout |
873
+ | `INTEGER` | Discrete integers | Batch size, num layers |
874
+ | `CATEGORY` | Categorical choices | Model type, optimizer |
875
+
876
+ Space types:
877
+ - `LOG`: Logarithmic spacing (good for learning rates)
878
+ - `LINEAR`: Linear spacing (default)
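+
+ To see what the two spacings mean for the `lr` range above, here is a NumPy illustration (Ato builds its own grids internally; this only shows the concept):
+
+ ```python
+ import numpy as np
+
+ log_grid = np.logspace(np.log10(1e-5), np.log10(1e-1), num=5)  # 1e-05, 1e-04, ..., 1e-01
+ lin_grid = np.linspace(1e-5, 1e-1, num=5)                      # evenly spaced, skewed toward large lr
+ print(log_grid)
+ print(lin_grid)
+ ```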
879
+
880
+ ### Distributed Search
881
+
882
+ ```python
883
+ from ato.hyperopt.hyperband import DistributedHyperBand
884
+ import torch.distributed as dist
+ from torch.nn.parallel import DistributedDataParallel as DDP
885
+
886
+ # Initialize distributed training
887
+ dist.init_process_group(backend='nccl')
888
+ rank = dist.get_rank()
889
+ world_size = dist.get_world_size()
890
+
891
+ # Create distributed hyperband
892
+ hyperband = DistributedHyperBand(
893
+ scope,
894
+ search_spaces,
895
+ halving_rate=0.3,
896
+ num_min_samples=3,
897
+ mode='max',
898
+ rank=rank,
899
+ world_size=world_size,
900
+ backend='pytorch'
901
+ )
902
+
903
+ @hyperband.main
904
+ def train(config):
905
+ # Your distributed training code
906
+ model = create_model(config)
907
+ model = DDP(model, device_ids=[rank])
908
+ metric = train_and_evaluate(model)
909
+ return metric
910
+
911
+ if __name__ == '__main__':
912
+ result = train()
913
+ if rank == 0:
914
+ print(f"Best config: {result.config}")
915
+ ```
916
+
917
+ ---
918
+
919
+ ## Best Practices
920
+
921
+ ### 1. Project Structure
922
+
923
+ ```
924
+ my_project/
925
+ ├── configs/
926
+ │ ├── default.py # Default config with @scope.observe(default=True)
927
+ │ ├── models.py # Model-specific configs
928
+ │ └── datasets.py # Dataset configs
929
+ ├── train.py # Main training script
930
+ ├── experiments.db # SQLite experiment tracking
931
+ └── experiments/
932
+ ├── run_001/
933
+ │ ├── checkpoints/
934
+ │ └── logs/
935
+ └── run_002/
936
+ ```
937
+
938
+ ### 2. Config Organization
939
+
940
+ ```python
941
+ # configs/default.py
942
+ from ato.scope import Scope
943
+ from ato.adict import ADict
944
+
945
+ scope = Scope()
946
+
947
+ @scope.observe(default=True)
948
+ def defaults(config):
949
+ # Data
950
+ config.data = ADict(
951
+ dataset='cifar10',
952
+ batch_size=32,
953
+ num_workers=4
954
+ )
955
+
956
+ # Model
957
+ config.model = ADict(
958
+ backbone='resnet50',
959
+ pretrained=True
960
+ )
961
+
962
+ # Training
963
+ config.train = ADict(
964
+ lr=0.001,
965
+ epochs=100,
966
+ optimizer='adam'
967
+ )
968
+
969
+ # Experiment tracking
970
+ config.experiment = ADict(
971
+ project_name='my_project',
972
+ sql=ADict(db_path='sqlite:///experiments.db')
973
+ )
974
+ ```
975
+
976
+ ### 3. Combined Workflow
977
+
978
+ ```python
979
+ from ato.scope import Scope
980
+ from ato.db_routers.sql.manager import SQLLogger
981
+ from configs.default import scope
982
+
983
+ @scope
984
+ def train(config):
985
+ # Setup experiment tracking
986
+ logger = SQLLogger(config)
987
+ run_id = logger.run(tags=[config.model.backbone, config.data.dataset])
988
+
989
+ try:
990
+ # Training loop
991
+ for epoch in range(config.train.epochs):
992
+ loss = train_epoch()
993
+ acc = validate()
994
+
995
+ logger.log_metric('loss', loss, epoch)
996
+ logger.log_metric('accuracy', acc, epoch)
997
+
998
+ logger.finish(status='completed')
999
+
1000
+ except Exception as e:
1001
+ logger.finish(status='failed')
1002
+ raise e
1003
+
1004
+ if __name__ == '__main__':
1005
+ train()
1006
+ ```
1007
+
1008
+ ### 4. Reproducibility Checklist
1009
+
1010
+ - ✅ Use structural hashing to track config changes
1011
+ - ✅ Log all hyperparameters to SQLLogger
1012
+ - ✅ Tag experiments with meaningful labels
1013
+ - ✅ Track artifacts (checkpoints, plots)
1014
+ - ✅ Use lazy configs for derived parameters
1015
+ - ✅ Document configs with `@scope.manual`
1016
+ - ✅ Add code fingerprinting to key functions
1017
+ - ✅ Add runtime fingerprinting to critical outputs
1018
+
1019
+ ---
1020
+
1021
+ ## FAQ
1022
+
1023
+ ### Does Ato replace Hydra?
1024
+
1025
+ No. Hydra is excellent at config composition.
1026
+ Ato is a layer that explains **why** results differ — it observes and fingerprints the final merged config.
1027
+
1028
+ Use them together: Hydra for composition, Ato for causality.
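+
+ A minimal sketch of that combination, assuming your Hydra app already has the composed `cfg` (how `ADict` ingests nested dicts is an assumption here):
+
+ ```python
+ from omegaconf import OmegaConf
+ from ato.adict import ADict
+
+ def fingerprint_hydra_cfg(cfg):
+     # Convert the composed OmegaConf object to a plain dict, then hash its structure.
+     merged = OmegaConf.to_container(cfg, resolve=True)
+     return ADict(**merged).get_structural_hash()
+ ```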
1029
+
1030
+ ### Does Ato conflict with MLflow/W&B?
1031
+
1032
+ No. MLflow/W&B provide dashboards and cloud tracking.
1033
+ Ato provides local causality tracking (config reasoning + code fingerprinting).
1034
+
1035
+ Use them together: MLflow/W&B for metrics/dashboards, Ato for "why did this change?"
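+
+ One way to run them side by side (a sketch; assumes an active MLflow run, an Ato `SQLLogger` named `logger`, and your own `train_one_epoch`):
+
+ ```python
+ import mlflow
+
+ for epoch in range(100):
+     loss = train_one_epoch()
+     mlflow.log_metric('loss', loss, step=epoch)   # cloud dashboard view
+     logger.log_metric('loss', loss, step=epoch)   # local causality trail in SQLite
+ ```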
1036
+
1037
+ ### Do I need a server?
1038
+
1039
+ No. Ato uses local SQLite. Zero setup, zero network calls.
1040
+
1041
+ ### Can I use Ato with my existing config files?
1042
+
1043
+ Yes. Ato is format-agnostic:
1044
+ - Load YAML/JSON/TOML → Ato fingerprints the result
1045
+ - Import OpenMMLab configs → Ato handles `_base_` inheritance
1046
+ - Use argparse → Ato integrates seamlessly
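+
+ For argparse, one simple pattern is to wrap the parsed namespace yourself (a sketch; the conversion below is done by hand rather than through any Ato-specific hook):
+
+ ```python
+ import argparse
+ from ato.adict import ADict
+
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--lr', type=float, default=0.001)
+ parser.add_argument('--model', default='resnet50')
+ args = parser.parse_args()
+
+ config = ADict(**vars(args))         # wrap the parsed namespace
+ print(config.get_structural_hash())  # fingerprint it like any other config
+ ```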
1047
+
1048
+ ### What if I already have experiment tracking?
1049
+
1050
+ Keep it. Ato complements existing tracking:
1051
+ - Your tracking: metrics, artifacts, dashboards
1052
+ - Ato: config reasoning, code fingerprinting, causality
1053
+
1054
+ No migration required.
1055
+
1056
+ ### Is Ato production-ready?
1057
+
1058
+ Yes. Ato has ~100 unit tests that pass on every release.
1059
+ The codebase is ~10 Python files: small, readable, auditable.
1060
+
1061
+ ### What's the performance overhead?
1062
+
1063
+ Minimal:
1064
+ - Config fingerprinting: microseconds
1065
+ - Code fingerprinting: happens once at decoration time
1066
+ - Runtime fingerprinting: depends on `inspect_fn` complexity
1067
+ - SQLite logging: milliseconds per metric
1068
+
1069
+ ### Can I self-host?
1070
+
1071
+ Ato runs entirely locally. There's nothing to host.
1072
+ If you need centralized tracking, use MLflow/W&B alongside Ato.
1073
+
1074
+ ---
1075
+
1076
+ ## Quality Signals
1077
+
1078
+ **Every release passes 100+ unit tests.**
1079
+ No unchecked code. No silent failures.
1080
+
1081
+ This isn't a feature. It's a commitment.
1082
+
1083
+ When you fingerprint experiments, you're trusting the fingerprints are correct.
1084
+ When you merge configs, you're trusting the merge order is deterministic.
1085
+ When you trace code, you're trusting the bytecode hashing is stable.
1086
+
1087
+ Ato has zero tolerance for regressions.
1088
+
1089
+ Tests cover every module — ADict, Scope, MultiScope, SQLTracker, HyperBand — and every edge case we've encountered in production use.
1090
+
1091
+ ```bash
1092
+ python -m pytest unit_tests/ # Run locally. Always passes.
1093
+ ```
1094
+
1095
+ **If a test fails, the release doesn't ship. Period.**
1096
+
1097
+ **Codebase size:** ~10 Python files
1098
+ Small, readable, auditable. No magic, no metaprogramming.
1099
+
1100
+ ---
1101
+
1102
+ ## Requirements
1103
+
1104
+ - Python >= 3.7 (Python >= 3.8 required for lazy evaluation features)
1105
+ - SQLAlchemy (for SQL Tracker)
1106
+ - PyYAML, toml (for config serialization)
1107
+
1108
+ See `pyproject.toml` for full dependencies.
1109
+
1110
+ ---
1111
+
1112
+ ## Contributing
1113
+
1114
+ Contributions are welcome! Please feel free to submit issues or pull requests.
1115
+
1116
+ ### Development Setup
1117
+
1118
+ ```bash
1119
+ git clone https://github.com/Dirac-Robot/ato.git
1120
+ cd ato
1121
+ pip install -e .
1122
+ ```
1123
+
1124
+ ### Running Tests
1125
+
1126
+ ```bash
1127
+ python -m pytest unit_tests/
1128
+ ```
1129
+
1130
+ ---
1131
+
1132
+ ## License
1133
+
1134
+ MIT License