hydraflow 0.17.2__tar.gz → 0.18.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {hydraflow-0.17.2 → hydraflow-0.18.1}/PKG-INFO +1 -5
- {hydraflow-0.17.2 → hydraflow-0.18.1}/README.md +0 -3
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/getting-started/concepts.md +29 -14
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part1-applications/configuration.md +6 -2
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part1-applications/execution.md +18 -7
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part1-applications/index.md +2 -1
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part1-applications/main-decorator.md +11 -4
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part2-advanced/index.md +29 -14
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part2-advanced/job-configuration.md +16 -7
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part2-advanced/sweep-syntax.md +8 -3
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part3-analysis/index.md +12 -5
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part3-analysis/run-class.md +5 -2
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part3-analysis/run-collection.md +43 -41
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/practical-tutorials/advanced.md +6 -6
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/practical-tutorials/analysis.md +46 -24
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/practical-tutorials/applications.md +20 -9
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/practical-tutorials/index.md +5 -2
- {hydraflow-0.17.2 → hydraflow-0.18.1}/pyproject.toml +2 -2
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/collection.py +320 -16
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/main.py +18 -1
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/run.py +33 -6
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/run_collection.py +2 -2
- hydraflow-0.18.1/src/hydraflow/utils/progress.py +90 -0
- hydraflow-0.18.1/tests/core/main/test_dry_run.py +26 -0
- hydraflow-0.18.1/tests/core/main/update.py +35 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/run/test_run.py +13 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/run/test_run_collection.py +8 -1
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/test_collection.py +118 -0
- hydraflow-0.18.1/tests/executor/__init__.py +0 -0
- hydraflow-0.18.1/tests/utils/__init__.py +0 -0
- hydraflow-0.18.1/tests/utils/test_progress.py +34 -0
- hydraflow-0.17.2/tests/core/main/update.py +0 -35
- {hydraflow-0.17.2 → hydraflow-0.18.1}/.devcontainer/devcontainer.json +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/.devcontainer/postCreate.sh +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/.devcontainer/starship.toml +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/.gitattributes +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/.github/workflows/ci.yaml +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/.github/workflows/docs.yaml +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/.github/workflows/publish.yaml +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/.gitignore +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/LICENSE +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/getting-started/index.md +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/getting-started/installation.md +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/index.md +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part3-analysis/updating-runs.md +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/examples/example.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/examples/hydraflow.yaml +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/examples/submit.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/mkdocs.yaml +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/__init__.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/cli.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/__init__.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/context.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/group_by.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/io.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/run_info.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/__init__.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/aio.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/conf.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/io.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/job.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/parser.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/py.typed +0 -0
- {hydraflow-0.17.2/tests → hydraflow-0.18.1/src/hydraflow/utils}/__init__.py +0 -0
- {hydraflow-0.17.2/tests/cli → hydraflow-0.18.1/tests}/__init__.py +0 -0
- {hydraflow-0.17.2/tests/core → hydraflow-0.18.1/tests/cli}/__init__.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/app.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/conftest.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/hydraflow.yaml +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/submit.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/test_run.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/test_setup.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/test_show.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/test_version.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/conftest.py +0 -0
- {hydraflow-0.17.2/tests/core/context → hydraflow-0.18.1/tests/core}/__init__.py +0 -0
- {hydraflow-0.17.2/tests/core/main → hydraflow-0.18.1/tests/core/context}/__init__.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/chdir.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/log_run.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/start_run.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/test_chdir.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/test_log_run.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/test_start_run.py +0 -0
- {hydraflow-0.17.2/tests/core/run → hydraflow-0.18.1/tests/core/main}/__init__.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/default.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/force_new_run.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/match_overrides.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/rerun_finished.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/skip_finished.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_default.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_force_new_run.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_main.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_match_overrides.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_rerun_finished.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_skip_finished.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_update.py +0 -0
- {hydraflow-0.17.2/tests/executor → hydraflow-0.18.1/tests/core/run}/__init__.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/run/run.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/run/test_run_info.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/test_group_by.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/test_io.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/conftest.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/echo.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/read.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_aio.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_args.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_conf.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_io.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_job.py +0 -0
- {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_parser.py +0 -0
PKG-INFO

````diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hydraflow
-Version: 0.17.2
+Version: 0.18.1
 Summary: HydraFlow seamlessly integrates Hydra and MLflow to streamline ML experiment management, combining Hydra's configuration management with MLflow's tracking capabilities.
 Project-URL: Documentation, https://daizutabi.github.io/hydraflow/
 Project-URL: Source, https://github.com/daizutabi/hydraflow
@@ -47,7 +47,6 @@ Requires-Dist: omegaconf>=2.3
 Requires-Dist: polars>=1.26
 Requires-Dist: python-ulid>=3.0.0
 Requires-Dist: rich>=13.9
-Requires-Dist: ruff>=0.11
 Requires-Dist: typer>=0.15
 Description-Content-Type: text/markdown
 
@@ -119,9 +118,6 @@ def app(run: Run, cfg: Config) -> None:
     # Your experiment code here
     print(f"Running with width={cfg.width}, height={cfg.height}")
 
-    # Log metrics
-    hydraflow.log_metric("area", cfg.width * cfg.height)
-
 if __name__ == "__main__":
     app()
 ```
````
README.md

````diff
@@ -66,9 +66,6 @@ def app(run: Run, cfg: Config) -> None:
     # Your experiment code here
     print(f"Running with width={cfg.width}, height={cfg.height}")
 
-    # Log metrics
-    hydraflow.log_metric("area", cfg.width * cfg.height)
-
 if __name__ == "__main__":
     app()
 ```
````
docs/getting-started/concepts.md

````diff
@@ -1,15 +1,20 @@
 # Core Concepts
 
-This page introduces the fundamental concepts of HydraFlow that form the foundation of the framework.
+This page introduces the fundamental concepts of HydraFlow that
+form the foundation of the framework.
 
 ## Design Principles
 
 HydraFlow is built on the following design principles:
 
-1. **Type Safety** - Utilizing Python dataclasses for configuration type checking and IDE support
-2. **Reproducibility** - Automatically tracking all experiment configurations for fully reproducible experiments
-3. **Workflow Integration** - Creating a cohesive workflow by integrating Hydra's configuration management with MLflow's experiment tracking
-4. **Analysis Capabilities** - Providing powerful APIs for easily analyzing experiment results
+1. **Type Safety** - Utilizing Python dataclasses for configuration
+   type checking and IDE support
+2. **Reproducibility** - Automatically tracking all experiment configurations
+   for fully reproducible experiments
+3. **Workflow Integration** - Creating a cohesive workflow by integrating
+   Hydra's configuration management with MLflow's experiment tracking
+4. **Analysis Capabilities** - Providing powerful APIs for easily
+   analyzing experiment results
 
 ## Key Components
 
@@ -17,7 +22,8 @@ HydraFlow consists of the following key components:
 
 ### Configuration Management
 
-HydraFlow uses a hierarchical configuration system based on OmegaConf and Hydra. This provides:
+HydraFlow uses a hierarchical configuration system based on
+OmegaConf and Hydra. This provides:
 
 - Type-safe configuration using Python dataclasses
 - Schema validation to ensure configuration correctness
@@ -36,11 +42,13 @@ class Config:
     epochs: int = 10
 ```
 
-This configuration class defines the structure and default values for your experiment, enabling type checking and auto-completion.
+This configuration class defines the structure and default values
+for your experiment, enabling type checking and auto-completion.
 
 ### Main Decorator
 
-The [`@hydraflow.main`][hydraflow.main] decorator defines the entry point for a HydraFlow application:
+The [`@hydraflow.main`][hydraflow.main] decorator defines the entry
+point for a HydraFlow application:
 
 ```python
 import hydraflow
@@ -64,7 +72,8 @@ This decorator provides:
 
 ### Workflow Automation
 
-HydraFlow allows you to automate experiment workflows using a YAML-based job definition system:
+HydraFlow allows you to automate experiment workflows using a
+YAML-based job definition system:
 
 ```yaml
 jobs:
@@ -98,11 +107,14 @@ python train.py -m "model=(small,large)_(v1,v2)"
 
 ### Analysis Tools
 
-After running experiments, HydraFlow provides powerful tools for accessing and analyzing results. These tools help you track, compare, and derive insights from your experiments.
+After running experiments, HydraFlow provides powerful tools for accessing
+and analyzing results. These tools help you track, compare, and derive
+insights from your experiments.
 
 #### Working with Individual Runs
 
-For individual experiment analysis, HydraFlow provides the `Run` class, which represents a single experiment run:
+For individual experiment analysis, HydraFlow provides the `Run` class,
+which represents a single experiment run:
 
 ```python
 from hydraflow import Run
@@ -139,7 +151,8 @@ print(run.cfg.learning_rate) # IDE auto-completion works
 
 #### Comparing Multiple Runs
 
-For comparing multiple runs, HydraFlow offers the `RunCollection` class, which enables efficient analysis across runs:
+For comparing multiple runs, HydraFlow offers the `RunCollection` class,
+which enables efficient analysis across runs:
 
 ```python
 # Load multiple runs
@@ -164,11 +177,13 @@ Key features of experiment comparison:
 
 ## Summary
 
-These core concepts work together to provide a comprehensive framework for managing machine learning experiments:
+These core concepts work together to provide a comprehensive framework
+for managing machine learning experiments:
 
 1. **Configuration Management** - Type-safe configuration with Python dataclasses
 2. **Main Decorator** - The entry point that integrates Hydra and MLflow
 3. **Workflow Automation** - Reusable experiment definitions and advanced parameter sweeps
 4. **Analysis Tools** - Access, filter, and analyze experiment results
 
-Understanding these fundamental concepts will help you leverage the full power of HydraFlow for your machine learning projects.
+Understanding these fundamental concepts will help you leverage the full power
+of HydraFlow for your machine learning projects.
````
docs/part1-applications/configuration.md

````diff
@@ -83,7 +83,9 @@ def train(run: Run, cfg: Config) -> None:
 
 ## Hydra Integration
 
-HydraFlow integrates closely with Hydra for configuration management. For detailed explanations of Hydra's capabilities, please refer to the [Hydra documentation](https://hydra.cc/docs/intro/).
+HydraFlow integrates closely with Hydra for configuration management.
+For detailed explanations of Hydra's capabilities, please refer to
+the [Hydra documentation](https://hydra.cc/docs/intro/).
 
 HydraFlow leverages the following Hydra features, but does not modify their behavior:
 
@@ -100,7 +102,9 @@ When using HydraFlow, remember that:
 2. HydraFlow automatically registers your top-level dataclass with Hydra
 3. `@hydraflow.main` sets up the connection between your dataclass and Hydra
 
-For advanced Hydra features and detailed usage examples, we recommend consulting the official Hydra documentation after you become familiar with the basic HydraFlow concepts.
+For advanced Hydra features and detailed usage examples, we recommend
+consulting the official Hydra documentation after you become familiar
+with the basic HydraFlow concepts.
 
 ## Best Practices
 
````
docs/part1-applications/execution.md

````diff
@@ -14,13 +14,18 @@ python train.py
 
 This will:
 
-1. Set up an MLflow experiment with the same name as the Hydra job name (using `mlflow.set_experiment`). If the experiment doesn't exist, it will be created automatically
+1. Set up an MLflow experiment with the same name as the Hydra job name
+   (using `mlflow.set_experiment`). If the experiment doesn't exist,
+   it will be created automatically
 2. Create a new MLflow run or reuse an existing one based on the configuration
 3. Save the Hydra configuration as an MLflow artifact
 4. Execute your function decorated with `@hydraflow.main`
 5. Save only `*.log` files from Hydra's output directory as MLflow artifacts
 
-Note that any other artifacts (models, data files, etc.) must be explicitly saved by your code using MLflow's logging functions. The `chdir` option in the `@hydraflow.main` decorator can help with this by changing the working directory to the run's artifact directory, making file operations more convenient.
+Note that any other artifacts (models, data files, etc.) must be explicitly
+saved by your code using MLflow's logging functions. The `chdir` option in
+the `@hydraflow.main` decorator can help with this by changing the working
+directory to the run's artifact directory, making file operations more convenient.
 
 ## Command-line Override Syntax
 
@@ -62,7 +67,8 @@ of the specified parameters (2 learning rates × 3 model types).
 
 ### Advanced Parameter Sweeps
 
-For more complex parameter spaces, HydraFlow provides an extended sweep syntax that goes beyond Hydra's basic capabilities:
+For more complex parameter spaces, HydraFlow provides an extended
+sweep syntax that goes beyond Hydra's basic capabilities:
 
 ```bash
 # Define numerical ranges with start:stop:step
@@ -75,11 +81,13 @@ python train.py -m learning_rate=1:5:m # 0.001 to 0.005
 python train.py -m model=(cnn,transformer)_(small,large)
 ```
 
-See [Extended Sweep Syntax](../part2-advanced/sweep-syntax.md) for a complete reference on these powerful features.
+See [Extended Sweep Syntax](../part2-advanced/sweep-syntax.md) for a
+complete reference on these powerful features.
 
 ### Managing Complex Experiment Workflows
 
-HydraFlow provides CLI tools to work with multirun mode more efficiently than using long command lines:
+HydraFlow provides CLI tools to work with multirun mode more efficiently
+than using long command lines:
 
 ```bash
 # Define jobs in hydraflow.yaml
@@ -94,11 +102,14 @@ jobs:
 hydraflow run train
 ```
 
-This approach helps you organize complex experiments, track execution history, and make experiments more reproducible. For details on these advanced capabilities, see [Job Configuration](../part2-advanced/job-configuration.md) in Part 2.
+This approach helps you organize complex experiments, track execution history,
+and make experiments more reproducible. For details on these advanced capabilities,
+see [Job Configuration](../part2-advanced/job-configuration.md) in Part 2.
 
 ## Output Organization
 
-By default, Hydra organizes outputs in the following directory structure for HydraFlow applications:
+By default, Hydra organizes outputs in the following directory structure
+for HydraFlow applications:
 
 ```
 ROOT_DIR/
````
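The extended range syntax shown in this file's examples (`start:stop:step`, plus SI-style suffixes such as the `m` in `learning_rate=1:5:m`, annotated as expanding to 0.001 to 0.005) can be illustrated with a small sketch. This is a hypothetical re-implementation for illustration only, not HydraFlow's actual parser; the suffix table is an assumption inferred from that one annotated example.

```python
from decimal import Decimal

# Assumed SI-style suffix multipliers, inferred from "1:5:m -> 0.001 to 0.005".
SUFFIXES = {"m": Decimal("0.001"), "k": Decimal("1000")}

def expand_range(expr: str) -> list[float]:
    """Expand a start:stop[:step-or-suffix] expression into explicit values.

    Hypothetical sketch: a trailing suffix scales the generated values,
    otherwise the third field is treated as the step (default 1).
    """
    parts = expr.split(":")
    start, stop = Decimal(parts[0]), Decimal(parts[1])
    scale, step = Decimal(1), Decimal(1)
    if len(parts) == 3:
        if parts[2] in SUFFIXES:
            scale = SUFFIXES[parts[2]]
        else:
            step = Decimal(parts[2])
    values = []
    v = start
    while v <= stop:  # inclusive range, as described in the docs
        values.append(float(v * scale))
        v += step
    return values
```

Under these assumptions, `expand_range("1:5:m")` yields `[0.001, 0.002, 0.003, 0.004, 0.005]`.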
docs/part1-applications/index.md

````diff
@@ -55,7 +55,8 @@ if __name__ == "__main__":
 
 ## Practical Examples
 
-If you prefer learning by example, check out our [Practical Tutorials](../practical-tutorials/index.md) section, which includes:
+If you prefer learning by example, check out our
+[Practical Tutorials](../practical-tutorials/index.md) section, which includes:
 
 - [Creating Your First HydraFlow Application](../practical-tutorials/applications.md): A step-by-step guide to building a basic application
 - [Automating Complex Workflows](../practical-tutorials/advanced.md): How to define and execute complex experiment workflows
````
docs/part1-applications/main-decorator.md

````diff
@@ -135,7 +135,10 @@ This default behavior improves efficiency by:
 
 ## Automatic Skipping of Completed Runs
 
-HydraFlow automatically skips runs that have already completed successfully. This is especially valuable in environments where jobs are automatically restarted after preemption. Without requiring any additional configuration, HydraFlow will:
+HydraFlow automatically skips runs that have already completed successfully.
+This is especially valuable in environments where jobs are automatically
+restarted after preemption. Without requiring any additional configuration,
+HydraFlow will:
 
 1. Identify already completed runs with the same configuration
 2. Skip re-execution of those runs
@@ -161,7 +164,9 @@ This automatic skipping behavior:
 
 ## Advanced Features
 
-The `hydraflow.main` decorator supports several keyword arguments that enhance its functionality. All these options are set to `False` by default and must be explicitly enabled when needed:
+The `hydraflow.main` decorator supports several keyword arguments that
+enhance its functionality. All these options are set to `False` by
+default and must be explicitly enabled when needed:
 
 ### Working Directory Management (`chdir`)
 
@@ -187,7 +192,8 @@ This option is beneficial when:
 
 ### Forcing New Runs (`force_new_run`)
 
-Override the default run identification and reuse behavior by always creating a new run, even when identical configurations exist:
+Override the default run identification and reuse behavior by always
+creating a new run, even when identical configurations exist:
 
 ```python
 @hydraflow.main(Config, force_new_run=True)
@@ -206,7 +212,8 @@ This option is useful when:
 
 ### Rerunning Finished Experiments (`rerun_finished`)
 
-Override the automatic skipping of completed runs by explicitly allowing rerunning of experiments that have already finished:
+Override the automatic skipping of completed runs by explicitly
+allowing rerunning of experiments that have already finished:
 
 ```python
 @hydraflow.main(Config, rerun_finished=True)
````
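The skip/force/rerun semantics described in this file's changes can be modeled with a toy decorator. This is not HydraFlow's implementation (the real library matches runs in the MLflow tracking store); the sketch simply keys completed "runs" on the config contents to show how the three flags interact.

```python
from functools import wraps

# Toy registry of finished "runs", keyed by config contents.
# Assumption: stands in for run matching against the MLflow tracking store.
_finished: dict[str, bool] = {}

def main_sketch(config: dict, force_new_run: bool = False, rerun_finished: bool = False):
    """Hypothetical model of @hydraflow.main's run reuse/skipping behavior."""
    def decorator(func):
        @wraps(func)
        def wrapper() -> str:
            key = repr(sorted(config.items()))
            if not force_new_run and not rerun_finished and _finished.get(key):
                # A completed run with an identical configuration exists.
                return "skipped"
            func(config)
            _finished[key] = True
            return "ran"
        return wrapper
    return decorator
```

Calling a decorated function twice with the same config returns `"ran"` then `"skipped"`, while `rerun_finished=True` or `force_new_run=True` executes it again.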
docs/part2-advanced/index.md

````diff
@@ -1,10 +1,13 @@
 # Automating Workflows
 
-This section covers advanced techniques for automating and structuring multiple experiments in HydraFlow. It provides tools for defining complex parameter spaces and reusable experiment definitions.
+This section covers advanced techniques for automating and structuring
+multiple experiments in HydraFlow. It provides tools for defining complex
+parameter spaces and reusable experiment definitions.
 
 ## Overview
 
-After creating your basic HydraFlow applications, the next step is to automate your experiment workflows. This includes:
+After creating your basic HydraFlow applications, the next step is to
+automate your experiment workflows. This includes:
 
 - Creating parameter sweeps across complex combinations
 - Defining reusable experiment configurations
@@ -14,20 +17,26 @@ After creating your basic HydraFlow applications, the next step is to automate y
 
 The main components for workflow automation in HydraFlow are:
 
-1. **Extended Sweep Syntax**: A powerful syntax for defining parameter spaces beyond simple comma-separated values.
-
-2. **Job Configuration**: A YAML-based definition system for creating reusable experiment workflows.
+1. **Extended Sweep Syntax**: A powerful syntax for defining parameter
+   spaces beyond simple comma-separated values.
+2. **Job Configuration**: A YAML-based definition system for creating
+   reusable experiment workflows.
 
 ## Practical Examples
 
-For hands-on examples of workflow automation, see our [Practical Tutorials](../practical-tutorials/index.md) section, specifically:
+For hands-on examples of workflow automation, see our
+[Practical Tutorials](../practical-tutorials/index.md) section, specifically:
 
-- [Automating Complex Workflows](../practical-tutorials/advanced.md): A tutorial that demonstrates how to use `hydraflow.yaml` to define and execute various types of workflows
-
+- [Automating Complex Workflows](../practical-tutorials/advanced.md): A tutorial
+  that demonstrates how to use `hydraflow.yaml` to define and execute
+  various types of workflows
+- [Analyzing Experiment Results](../practical-tutorials/analysis.md): Learn
+  how to work with results from automated experiment runs
 
 ## Extended Sweep Syntax
 
-HydraFlow extends Hydra's sweep syntax to provide more powerful ways to define parameter spaces:
+HydraFlow extends Hydra's sweep syntax to provide more powerful ways
+to define parameter spaces:
 
 ```bash
 # Range of values (inclusive)
@@ -44,7 +53,8 @@ Learn more about these capabilities in [Sweep Syntax](sweep-syntax.md).
 
 ## Job Configuration
 
-For more complex experiment workflows, you can use HydraFlow's job configuration system:
+For more complex experiment workflows, you can use HydraFlow's job
+configuration system:
 
 ```yaml
 jobs:
@@ -61,7 +71,9 @@ jobs:
     all: test_data=validation
 ```
 
-This approach allows you to define reusable experiment definitions that can be executed with a single command. Learn more in [Job Configuration](job-configuration.md).
+This approach allows you to define reusable experiment definitions that
+can be executed with a single command. Learn more in
+[Job Configuration](job-configuration.md).
 
 ## Executing Workflows
 
@@ -82,7 +94,10 @@ hydraflow run train_models seed=123
 
 In the following pages, we'll explore workflow automation in detail:
 
-- [Sweep Syntax](sweep-syntax.md): Learn about HydraFlow's extended syntax for defining parameter spaces.
-
+- [Sweep Syntax](sweep-syntax.md): Learn about HydraFlow's extended
+  syntax for defining parameter spaces.
+- [Job Configuration](job-configuration.md): Discover how to create
+  reusable job definitions for your experiments.
 
-After automating your experiments, you'll want to analyze the results using the tools covered in [Part 3: Analyzing Results](../part3-analysis/index.md).
+After automating your experiments, you'll want to analyze the results
+using the tools covered in [Part 3: Analyzing Results](../part3-analysis/index.md).
````
docs/part2-advanced/job-configuration.md

````diff
@@ -64,7 +64,8 @@ The specified function will be imported and called with the parameters.
 
 ### `submit`
 
-The `submit` command collects all parameter combinations into a text file and passes this file to the specified command:
+The `submit` command collects all parameter combinations into a text
+file and passes this file to the specified command:
 
 ```yaml
 jobs:
@@ -91,7 +92,9 @@ The key difference between `run` and `submit`:
 - `run`: Executes the command once per parameter combination
 - `submit`: Executes the command once, with all parameter combinations provided in a file
 
-This gives you complete flexibility in how parameter combinations are processed. Your handler script can implement any logic - from simple sequential processing to complex distributed execution across a cluster.
+This gives you complete flexibility in how parameter combinations are
+processed. Your handler script can implement any logic - from simple
+sequential processing to complex distributed execution across a cluster.
 
 ## Parameter Sets
 
@@ -214,14 +217,18 @@ jobs:
       add: hydra/launcher=submitit hydra.launcher.submitit.cpus_per_task=8
 ```
 
-When a set has its own `add` parameter, it is merged with the job-level `add` parameter. If the same parameter key exists in both the job-level and set-level `add`, the set-level value takes precedence.
+When a set has its own `add` parameter, it is merged with
+the job-level `add` parameter.
+If the same parameter key exists in both the job-level and set-level
+`add`, the set-level value takes precedence.
 
 For example, with the configuration above:
+
 - The first set uses: `hydra/launcher=joblib hydra.launcher.n_jobs=2`
 - The second set uses: `hydra/launcher=submitit hydra.launcher.n_jobs=2 hydra.launcher.submitit.cpus_per_task=8`
 
-Notice how `hydra/launcher` is overridden by the set-level value, while `hydra.launcher.n_jobs` from the job-level is retained.
+Notice how `hydra/launcher` is overridden by the set-level value,
+while `hydra.launcher.n_jobs` from the job-level is retained.
@@ -229,11 +236,13 @@ This behavior allows you to:
 2. Override or add specific parameters at the set level
 3. Keep all non-conflicting parameters from both levels
 
-This merging behavior makes it easy to maintain common configuration options while customizing specific aspects for different parameter sets.
+This merging behavior makes it easy to maintain common configuration
+options while customizing specific aspects for different parameter sets.
 
 ## Summary
 
-HydraFlow's job configuration system provides a powerful way to define and manage complex parameter sweeps:
+HydraFlow's job configuration system provides a powerful way to define
+and manage complex parameter sweeps:
 
 1. **Execution Commands**:
 
````
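The `add` merging rules described in the job-configuration changes (set-level keys override job-level keys; everything else is kept) can be sketched in a few lines. This is an illustrative re-implementation, not HydraFlow's code; the key of each override is assumed to be the part before `=`.

```python
def merge_add(job_add: str, set_add: str) -> str:
    """Merge space-separated override strings; set-level values win on key clash."""
    merged: dict[str, str] = {}
    for token in [*job_add.split(), *set_add.split()]:
        key = token.split("=", 1)[0]  # e.g. "hydra/launcher" from "hydra/launcher=joblib"
        merged[key] = token           # later (set-level) tokens overwrite earlier ones
    return " ".join(merged.values())
```

With the documented example, merging `hydra/launcher=joblib hydra.launcher.n_jobs=2` (job level) with `hydra/launcher=submitit hydra.launcher.submitit.cpus_per_task=8` (set level) reproduces the second set's documented result.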
docs/part2-advanced/sweep-syntax.md

````diff
@@ -24,7 +24,8 @@ python train.py -m model=medium
 python train.py -m model=large
 ```
 
-When using multiple parameters with `each`, all possible combinations (cartesian product) will be generated:
+When using multiple parameters with `each`, all possible
+combinations (cartesian product) will be generated:
 
 ```yaml
 jobs:
@@ -275,6 +276,10 @@ HydraFlow's extended sweep syntax provides several powerful features for paramet
 5. **Parentheses grouping** - Create combinations of values and nested structures
 6. **Pipe operator** - Run multiple independent parameter sweeps in the same job
 
-All of these can be combined to create complex, expressive parameter sweeps with minimal configuration.
+All of these can be combined to create complex, expressive parameter sweeps
+with minimal configuration. Remember that using the `each` keyword creates a cartesian
+product of all parameters (all possible combinations), while the pipe
+operator (`|`) creates separate, independent parameter sweeps.
 
-When using these features, HydraFlow will automatically generate the appropriate Hydra multirun commands with the `-m` flag.
+When using these features, HydraFlow will automatically generate the appropriate
+Hydra multirun commands with the `-m` flag.
````
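The `each`-vs-pipe distinction drawn in the sweep-syntax changes above can be illustrated with plain `itertools`. This is a hypothetical sketch of the semantics, not HydraFlow's parser: `each` takes the cartesian product of all parameters, while `|` expands each sweep independently.

```python
from itertools import product

def expand_each(params: dict[str, list[str]]) -> list[dict[str, str]]:
    """Cartesian product of `each` parameters: one dict per combination."""
    keys = list(params)
    return [dict(zip(keys, combo)) for combo in product(*params.values())]

def expand_pipe(sweeps: list[dict[str, list[str]]]) -> list[list[dict[str, str]]]:
    """Pipe operator: each sweep is expanded on its own, never crossed."""
    return [expand_each(sweep) for sweep in sweeps]
```

Two parameters with two values each yield four combinations under `each`, whereas two piped sweeps stay separate.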
docs/part3-analysis/index.md

````diff
@@ -33,9 +33,12 @@ The main components of HydraFlow's analysis tools are:
 
 ## Practical Examples
 
-For hands-on examples of experiment analysis, check out our [Practical Tutorials](../practical-tutorials/index.md) section, specifically:
+For hands-on examples of experiment analysis, check out our
+[Practical Tutorials](../practical-tutorials/index.md) section, specifically:
 
-- [Analyzing Experiment Results](../practical-tutorials/analysis.md): A detailed tutorial demonstrating how to load, filter, group, and analyze experiment data using HydraFlow's APIs
+- [Analyzing Experiment Results](../practical-tutorials/analysis.md): A
+  detailed tutorial demonstrating how to load, filter, group, and analyze
+  experiment data using HydraFlow's APIs
 
 ## Basic Analysis Workflow
 
@@ -66,7 +69,8 @@ best_run = df.sort("accuracy", descending=True).first()
 
 ## Finding and Loading Runs
 
-HydraFlow provides utilities to easily find and load runs from your MLflow tracking directory:
+HydraFlow provides utilities to easily find and load runs from your
+MLflow tracking directory:
 
 ```python
 from hydraflow import Run
@@ -83,7 +87,8 @@ runs = Run.load(iter_run_dirs(tracking_dir, "my_experiment"))
 runs = Run.load(iter_run_dirs(tracking_dir, ["training_*", "finetuning_*"]))
 ```
 
-This approach makes it easy to gather all relevant runs for analysis without having to manually specify each run directory.
+This approach makes it easy to gather all relevant runs for analysis
+without having to manually specify each run directory.
 
 ## Type-Safe Analysis
 
@@ -137,7 +142,9 @@ model = run.impl.load_model()
 results = run.impl.analyze_performance()
 ```
 
-The analysis capabilities covered in Part 3 are designed to work seamlessly with the experiment definitions from [Part 1](../part1-applications/index.md) and the advanced workflow automation from [Part 2](../part2-advanced/index.md).
+The analysis capabilities covered in Part 3 are designed to work
+seamlessly with the experiment definitions from [Part 1](../part1-applications/index.md)
+and the advanced workflow automation from [Part 2](../part2-advanced/index.md).
 
 ## What's Next
 
````
@@ -210,7 +210,8 @@ runs = Run.load(run_dirs, n_jobs=-1) # Use all available CPU cores
 
 ### Finding Runs with `iter_run_dirs`
 
-HydraFlow provides the [`iter_run_dirs`][hydraflow.core.io.iter_run_dirs]
+HydraFlow provides the [`iter_run_dirs`][hydraflow.core.io.iter_run_dirs]
+function to easily discover runs in your MLflow tracking directory:
 
 ```python
 from hydraflow.core.io import iter_run_dirs
@@ -235,7 +236,9 @@ def filter_experiments(name: str) -> bool:
 runs = Run.load(iter_run_dirs(tracking_dir, filter_experiments))
 ```
 
-The `iter_run_dirs` function yields paths to run directories that can be
+The `iter_run_dirs` function yields paths to run directories that can be
+directly passed to `Run.load`. This makes it easy to find and load runs
+based on experiment names or custom filtering criteria.
 
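The discovery pattern described in this hunk can be approximated with the standard library alone. This is an illustrative sketch, not the real `hydraflow.core.io.iter_run_dirs`: the name `iter_run_dirs_sketch` is hypothetical, and it assumes a simplified layout where each experiment is a directory under the tracking directory named after the experiment, with one subdirectory per run.

```python
from __future__ import annotations

from fnmatch import fnmatch
from pathlib import Path
from typing import Callable, Iterator


def iter_run_dirs_sketch(
    tracking_dir: str | Path,
    experiment_names: str | list[str] | Callable[[str], bool] | None = None,
) -> Iterator[Path]:
    """Hypothetical stand-in for run discovery (not HydraFlow's API).

    Simplifying assumption: <tracking_dir>/<experiment_name>/<run_dir>.
    """
    for exp_dir in sorted(Path(tracking_dir).iterdir()):
        if not exp_dir.is_dir():
            continue
        name = exp_dir.name
        if experiment_names is None:
            matched = True  # no filter: accept every experiment
        elif callable(experiment_names):
            matched = experiment_names(name)  # custom predicate on the name
        elif isinstance(experiment_names, str):
            matched = fnmatch(name, experiment_names)  # single glob pattern
        else:
            matched = any(fnmatch(name, p) for p in experiment_names)
        if matched:
            # Yield each run directory inside the matched experiment
            yield from sorted(d for d in exp_dir.iterdir() if d.is_dir())
```

As in the documented API, a glob string, a list of glob patterns, or a name predicate all select experiments; the yielded paths could then be handed to a loader in one expression.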
 ## Best Practices
 
@@ -7,19 +7,27 @@ instances, making it easy to compare and extract insights from your experiments.
 
 ## Architecture
 
-`RunCollection` is built on top of the more general
+`RunCollection` is built on top of the more general
+[`Collection`][hydraflow.core.collection.Collection]
+class, which provides a flexible foundation for working with sequences
+of items. This architecture offers several benefits:
+
+1. **Consistent Interface**: All collection-based classes in HydraFlow
+   share a common interface and behavior
+2. **Code Reuse**: Core functionality is implemented once in the base
+   class and inherited by specialized collections
+3. **Extensibility**: New collection types can easily be created
+   for different item types
+4. **Type Safety**: Generic type parameters ensure type checking
+   throughout the collection hierarchy
+
+The `Collection` class implements the Python `Sequence` protocol,
+allowing it to be used like standard Python collections (lists, tuples)
+while providing specialized methods for filtering, grouping, and data extraction.
+
+`RunCollection` extends this foundation with run-specific functionality,
+particularly for working with MLflow experiment data. This layered
+design separates generic collection behavior from domain-specific operations.
 
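The layered design this hunk describes — a generic `Sequence`-based base class with specialized subclasses — can be sketched in miniature. The class names `MiniCollection` and `MiniRunCollection` are hypothetical stand-ins for illustration, not HydraFlow's actual implementation:

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")


class MiniCollection(Sequence[T]):
    """Illustrative generic base: a Sequence plus a chainable filter."""

    def __init__(self, items: list[T]) -> None:
        self._items = list(items)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def filter(self, predicate: Callable[[T], bool]) -> "MiniCollection[T]":
        # Return the caller's own type so subclasses stay specialized
        return type(self)([i for i in self._items if predicate(i)])


class MiniRunCollection(MiniCollection[dict]):
    """Illustrative specialization adding a run-specific helper."""

    def unique(self, key: str) -> list:
        seen: list = []
        for run in self._items:
            value = run.get(key)
            if value not in seen:
                seen.append(value)
        return seen


runs = MiniRunCollection([{"model": "cnn"}, {"model": "rnn"}, {"model": "cnn"}])
filtered = runs.filter(lambda r: r["model"] == "cnn")
```

Because the base implements `Sequence`, iteration, indexing, and `len` behave like a list, while `filter` returns another `MiniRunCollection` — the same chaining property the documentation attributes to the real `Collection`/`RunCollection` pair.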
 ## Creating a Run Collection
 
@@ -243,17 +251,19 @@ model_types = runs.unique("model_type")
 num_model_types = runs.n_unique("model_type")
 ```
 
-All data extraction methods (`to_list`, `to_numpy`, `to_series`, etc.)
+All data extraction methods (`to_list`, `to_numpy`, `to_series`, etc.)
+support both static and callable default values,
+matching the behavior of the `Run.get` method. When using a callable default,
+the function receives the Run instance as an argument, allowing you to:
 
 - Implement fallback logic for missing parameters
 - Create derived values based on multiple parameters
 - Handle varying configuration schemas across different experiments
 - Apply transformations to the raw parameter values
 
-This makes it much easier to work with heterogeneous collections of
-parameter sets or evolving configuration
+This makes it much easier to work with heterogeneous collections of
+runs that might have different parameter sets or evolving configuration
+schemas.
 
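The static-versus-callable default behavior described in this hunk can be sketched with a plain function over dicts. `get_param` is a hypothetical helper mimicking the documented semantics, not HydraFlow's `Run.get`:

```python
from typing import Any, Callable


def get_param(run: dict, key: str, default: Any = None) -> Any:
    """Illustrative lookup: static defaults are returned as-is;
    callable defaults receive the run itself (hypothetical helper)."""
    if key in run:
        return run[key]
    if callable(default):
        return default(run)  # derive a fallback from the run's other values
    return default


runs = [
    {"model": "cnn", "lr": 0.01, "batch_size": 32},
    {"model": "rnn", "lr": 0.001},  # older run: no batch_size recorded
]

# Static default fills in the missing parameter
sizes = [get_param(r, "batch_size", 64) for r in runs]

# Callable default derives a value from parameters the run does have
derived = [get_param(r, "effective_lr", lambda r: r["lr"] * 2) for r in runs]
```

Here `sizes` becomes `[32, 64]`, and the callable default computes `effective_lr` per run — the same pattern that lets heterogeneous or evolving configurations be extracted uniformly.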
 ## Converting to DataFrame
 
@@ -293,23 +303,6 @@ filled_df = missing_values_df.with_columns(
 )
 ```
 
-The `to_frame` method provides several ways to handle missing data:
-
-1. **defaults parameter**: Provide static or callable default values for specific keys
-   - Static values: `defaults={"param": value}`
-   - Callable values: `defaults={"param": lambda run: computed_value}`
-
-2. **None values**: Parameters without defaults are represented as `None` (null) in the DataFrame
-   - This lets you use Polars operations for handling null values:
-   - Filter: `df.filter(pl.col("param").is_not_null())`
-   - Fill nulls: `df.with_columns(pl.col("param").fill_null(value))`
-   - Aggregations: Most aggregation functions handle nulls appropriately
-
-3. **Special object keys**: Use the special keys `"run"`, `"cfg"`, and `"impl"` to include the actual
-   Run objects, configuration objects, or implementation objects in the DataFrame
-   - This allows direct access to the original objects for further operations
-   - You can combine regular data columns with object columns as needed
-
 ## Grouping Runs
 
 The `group_by` method allows you to organize runs based on parameter values:
@@ -343,16 +336,25 @@ model_avg_loss = model_groups.agg(
 )
 ```
 
-The `group_by` method returns a `GroupBy` instance that maps keys to
+The `group_by` method returns a `GroupBy` instance that maps keys to
+`RunCollection` instances. This design allows you to:
 
-- Work with each group as a separate `RunCollection` with all the
+- Work with each group as a separate `RunCollection` with all the
+  filtering, sorting, and analysis capabilities
+- Perform custom operations on each group that might not be expressible
+  as simple aggregation functions
 - Chain additional operations on specific groups that interest you
-- Implement multi-stage analysis workflows where you need to maintain
+- Implement multi-stage analysis workflows where you need to maintain
+  the full run information at each step
 
-To perform aggregations on the grouped data, use the `agg` method on
+To perform aggregations on the grouped data, use the `agg` method on
+the GroupBy instance. This transforms the grouped data into a DataFrame
+with aggregated results.
+You can define multiple aggregation functions to compute different
+metrics across each group.
 
-This approach preserves all information in each group, giving
+This approach preserves all information in each group, giving
+you maximum flexibility for downstream analysis.
 
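The group-then-aggregate flow this hunk describes — keys mapping to sub-collections, then named aggregation functions producing one row per group — can be sketched with plain dictionaries. `group_by` and `agg` here are hypothetical stand-ins for illustration, not HydraFlow's `GroupBy` API:

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Callable

Run = dict[str, Any]


def group_by(runs: list[Run], key: str) -> dict[Any, list[Run]]:
    """Map each key value to the sub-list of runs sharing it."""
    groups: dict[Any, list[Run]] = defaultdict(list)
    for run in runs:
        groups[run[key]].append(run)
    return dict(groups)


def agg(groups: dict[Any, list[Run]], **fns: Callable[[list[Run]], Any]) -> list[dict]:
    """One output row per group; each keyword names an aggregation column."""
    return [
        {"key": k, **{name: fn(g) for name, fn in fns.items()}}
        for k, g in groups.items()
    ]


runs = [
    {"model": "cnn", "loss": 0.2},
    {"model": "cnn", "loss": 0.4},
    {"model": "rnn", "loss": 0.5},
]
groups = group_by(runs, "model")
rows = agg(groups, avg_loss=lambda g: mean(r["loss"] for r in g), n=len)
```

Because `group_by` returns whole sub-lists rather than pre-aggregated values, arbitrary per-group operations remain possible before (or instead of) calling `agg` — the flexibility the documentation attributes to keeping full `RunCollection` instances per group.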
 ## Type-Safe Run Collections
 