hydraflow 0.17.2.tar.gz → 0.18.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (110)
  1. {hydraflow-0.17.2 → hydraflow-0.18.1}/PKG-INFO +1 -5
  2. {hydraflow-0.17.2 → hydraflow-0.18.1}/README.md +0 -3
  3. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/getting-started/concepts.md +29 -14
  4. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part1-applications/configuration.md +6 -2
  5. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part1-applications/execution.md +18 -7
  6. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part1-applications/index.md +2 -1
  7. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part1-applications/main-decorator.md +11 -4
  8. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part2-advanced/index.md +29 -14
  9. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part2-advanced/job-configuration.md +16 -7
  10. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part2-advanced/sweep-syntax.md +8 -3
  11. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part3-analysis/index.md +12 -5
  12. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part3-analysis/run-class.md +5 -2
  13. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part3-analysis/run-collection.md +43 -41
  14. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/practical-tutorials/advanced.md +6 -6
  15. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/practical-tutorials/analysis.md +46 -24
  16. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/practical-tutorials/applications.md +20 -9
  17. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/practical-tutorials/index.md +5 -2
  18. {hydraflow-0.17.2 → hydraflow-0.18.1}/pyproject.toml +2 -2
  19. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/collection.py +320 -16
  20. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/main.py +18 -1
  21. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/run.py +33 -6
  22. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/run_collection.py +2 -2
  23. hydraflow-0.18.1/src/hydraflow/utils/progress.py +90 -0
  24. hydraflow-0.18.1/tests/core/main/test_dry_run.py +26 -0
  25. hydraflow-0.18.1/tests/core/main/update.py +35 -0
  26. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/run/test_run.py +13 -0
  27. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/run/test_run_collection.py +8 -1
  28. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/test_collection.py +118 -0
  29. hydraflow-0.18.1/tests/executor/__init__.py +0 -0
  30. hydraflow-0.18.1/tests/utils/__init__.py +0 -0
  31. hydraflow-0.18.1/tests/utils/test_progress.py +34 -0
  32. hydraflow-0.17.2/tests/core/main/update.py +0 -35
  33. {hydraflow-0.17.2 → hydraflow-0.18.1}/.devcontainer/devcontainer.json +0 -0
  34. {hydraflow-0.17.2 → hydraflow-0.18.1}/.devcontainer/postCreate.sh +0 -0
  35. {hydraflow-0.17.2 → hydraflow-0.18.1}/.devcontainer/starship.toml +0 -0
  36. {hydraflow-0.17.2 → hydraflow-0.18.1}/.gitattributes +0 -0
  37. {hydraflow-0.17.2 → hydraflow-0.18.1}/.github/workflows/ci.yaml +0 -0
  38. {hydraflow-0.17.2 → hydraflow-0.18.1}/.github/workflows/docs.yaml +0 -0
  39. {hydraflow-0.17.2 → hydraflow-0.18.1}/.github/workflows/publish.yaml +0 -0
  40. {hydraflow-0.17.2 → hydraflow-0.18.1}/.gitignore +0 -0
  41. {hydraflow-0.17.2 → hydraflow-0.18.1}/LICENSE +0 -0
  42. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/getting-started/index.md +0 -0
  43. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/getting-started/installation.md +0 -0
  44. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/index.md +0 -0
  45. {hydraflow-0.17.2 → hydraflow-0.18.1}/docs/part3-analysis/updating-runs.md +0 -0
  46. {hydraflow-0.17.2 → hydraflow-0.18.1}/examples/example.py +0 -0
  47. {hydraflow-0.17.2 → hydraflow-0.18.1}/examples/hydraflow.yaml +0 -0
  48. {hydraflow-0.17.2 → hydraflow-0.18.1}/examples/submit.py +0 -0
  49. {hydraflow-0.17.2 → hydraflow-0.18.1}/mkdocs.yaml +0 -0
  50. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/__init__.py +0 -0
  51. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/cli.py +0 -0
  52. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/__init__.py +0 -0
  53. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/context.py +0 -0
  54. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/group_by.py +0 -0
  55. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/io.py +0 -0
  56. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/core/run_info.py +0 -0
  57. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/__init__.py +0 -0
  58. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/aio.py +0 -0
  59. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/conf.py +0 -0
  60. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/io.py +0 -0
  61. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/job.py +0 -0
  62. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/executor/parser.py +0 -0
  63. {hydraflow-0.17.2 → hydraflow-0.18.1}/src/hydraflow/py.typed +0 -0
  64. {hydraflow-0.17.2/tests → hydraflow-0.18.1/src/hydraflow/utils}/__init__.py +0 -0
  65. {hydraflow-0.17.2/tests/cli → hydraflow-0.18.1/tests}/__init__.py +0 -0
  66. {hydraflow-0.17.2/tests/core → hydraflow-0.18.1/tests/cli}/__init__.py +0 -0
  67. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/app.py +0 -0
  68. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/conftest.py +0 -0
  69. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/hydraflow.yaml +0 -0
  70. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/submit.py +0 -0
  71. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/test_run.py +0 -0
  72. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/test_setup.py +0 -0
  73. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/test_show.py +0 -0
  74. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/cli/test_version.py +0 -0
  75. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/conftest.py +0 -0
  76. {hydraflow-0.17.2/tests/core/context → hydraflow-0.18.1/tests/core}/__init__.py +0 -0
  77. {hydraflow-0.17.2/tests/core/main → hydraflow-0.18.1/tests/core/context}/__init__.py +0 -0
  78. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/chdir.py +0 -0
  79. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/log_run.py +0 -0
  80. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/start_run.py +0 -0
  81. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/test_chdir.py +0 -0
  82. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/test_log_run.py +0 -0
  83. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/context/test_start_run.py +0 -0
  84. {hydraflow-0.17.2/tests/core/run → hydraflow-0.18.1/tests/core/main}/__init__.py +0 -0
  85. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/default.py +0 -0
  86. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/force_new_run.py +0 -0
  87. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/match_overrides.py +0 -0
  88. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/rerun_finished.py +0 -0
  89. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/skip_finished.py +0 -0
  90. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_default.py +0 -0
  91. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_force_new_run.py +0 -0
  92. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_main.py +0 -0
  93. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_match_overrides.py +0 -0
  94. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_rerun_finished.py +0 -0
  95. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_skip_finished.py +0 -0
  96. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/main/test_update.py +0 -0
  97. {hydraflow-0.17.2/tests/executor → hydraflow-0.18.1/tests/core/run}/__init__.py +0 -0
  98. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/run/run.py +0 -0
  99. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/run/test_run_info.py +0 -0
  100. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/test_group_by.py +0 -0
  101. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/core/test_io.py +0 -0
  102. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/conftest.py +0 -0
  103. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/echo.py +0 -0
  104. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/read.py +0 -0
  105. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_aio.py +0 -0
  106. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_args.py +0 -0
  107. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_conf.py +0 -0
  108. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_io.py +0 -0
  109. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_job.py +0 -0
  110. {hydraflow-0.17.2 → hydraflow-0.18.1}/tests/executor/test_parser.py +0 -0
--- hydraflow-0.17.2/PKG-INFO
+++ hydraflow-0.18.1/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hydraflow
-Version: 0.17.2
+Version: 0.18.1
 Summary: HydraFlow seamlessly integrates Hydra and MLflow to streamline ML experiment management, combining Hydra's configuration management with MLflow's tracking capabilities.
 Project-URL: Documentation, https://daizutabi.github.io/hydraflow/
 Project-URL: Source, https://github.com/daizutabi/hydraflow
@@ -47,7 +47,6 @@ Requires-Dist: omegaconf>=2.3
 Requires-Dist: polars>=1.26
 Requires-Dist: python-ulid>=3.0.0
 Requires-Dist: rich>=13.9
-Requires-Dist: ruff>=0.11
 Requires-Dist: typer>=0.15
 Description-Content-Type: text/markdown
 
@@ -119,9 +118,6 @@ def app(run: Run, cfg: Config) -> None:
     # Your experiment code here
     print(f"Running with width={cfg.width}, height={cfg.height}")
 
-    # Log metrics
-    hydraflow.log_metric("area", cfg.width * cfg.height)
-
 if __name__ == "__main__":
     app()
 ```
--- hydraflow-0.17.2/README.md
+++ hydraflow-0.18.1/README.md
@@ -66,9 +66,6 @@ def app(run: Run, cfg: Config) -> None:
     # Your experiment code here
     print(f"Running with width={cfg.width}, height={cfg.height}")
 
-    # Log metrics
-    hydraflow.log_metric("area", cfg.width * cfg.height)
-
 if __name__ == "__main__":
     app()
 ```
--- hydraflow-0.17.2/docs/getting-started/concepts.md
+++ hydraflow-0.18.1/docs/getting-started/concepts.md
@@ -1,15 +1,20 @@
 # Core Concepts
 
-This page introduces the fundamental concepts of HydraFlow that form the foundation of the framework.
+This page introduces the fundamental concepts of HydraFlow that
+form the foundation of the framework.
 
 ## Design Principles
 
 HydraFlow is built on the following design principles:
 
-1. **Type Safety** - Utilizing Python dataclasses for configuration type checking and IDE support
-2. **Reproducibility** - Automatically tracking all experiment configurations for fully reproducible experiments
-3. **Workflow Integration** - Creating a cohesive workflow by integrating Hydra's configuration management with MLflow's experiment tracking
-4. **Analysis Capabilities** - Providing powerful APIs for easily analyzing experiment results
+1. **Type Safety** - Utilizing Python dataclasses for configuration
+   type checking and IDE support
+2. **Reproducibility** - Automatically tracking all experiment configurations
+   for fully reproducible experiments
+3. **Workflow Integration** - Creating a cohesive workflow by integrating
+   Hydra's configuration management with MLflow's experiment tracking
+4. **Analysis Capabilities** - Providing powerful APIs for easily
+   analyzing experiment results
 
 ## Key Components
 
@@ -17,7 +22,8 @@ HydraFlow consists of the following key components:
 
 ### Configuration Management
 
-HydraFlow uses a hierarchical configuration system based on OmegaConf and Hydra. This provides:
+HydraFlow uses a hierarchical configuration system based on
+OmegaConf and Hydra. This provides:
 
 - Type-safe configuration using Python dataclasses
 - Schema validation to ensure configuration correctness
@@ -36,11 +42,13 @@ class Config:
     epochs: int = 10
 ```
 
-This configuration class defines the structure and default values for your experiment, enabling type checking and auto-completion.
+This configuration class defines the structure and default values
+for your experiment, enabling type checking and auto-completion.
 
 ### Main Decorator
 
-The [`@hydraflow.main`][hydraflow.main] decorator defines the entry point for a HydraFlow application:
+The [`@hydraflow.main`][hydraflow.main] decorator defines the entry
+point for a HydraFlow application:
 
 ```python
 import hydraflow
@@ -64,7 +72,8 @@ This decorator provides:
 
 ### Workflow Automation
 
-HydraFlow allows you to automate experiment workflows using a YAML-based job definition system:
+HydraFlow allows you to automate experiment workflows using a
+YAML-based job definition system:
 
 ```yaml
 jobs:
@@ -98,11 +107,14 @@ python train.py -m "model=(small,large)_(v1,v2)"
 
 ### Analysis Tools
 
-After running experiments, HydraFlow provides powerful tools for accessing and analyzing results. These tools help you track, compare, and derive insights from your experiments.
+After running experiments, HydraFlow provides powerful tools for accessing
+and analyzing results. These tools help you track, compare, and derive
+insights from your experiments.
 
 #### Working with Individual Runs
 
-For individual experiment analysis, HydraFlow provides the `Run` class, which represents a single experiment run:
+For individual experiment analysis, HydraFlow provides the `Run` class,
+which represents a single experiment run:
 
 ```python
 from hydraflow import Run
@@ -139,7 +151,8 @@ print(run.cfg.learning_rate)  # IDE auto-completion works
 
 #### Comparing Multiple Runs
 
-For comparing multiple runs, HydraFlow offers the `RunCollection` class, which enables efficient analysis across runs:
+For comparing multiple runs, HydraFlow offers the `RunCollection` class,
+which enables efficient analysis across runs:
 
 ```python
 # Load multiple runs
@@ -164,11 +177,13 @@ Key features of experiment comparison:
 
 ## Summary
 
-These core concepts work together to provide a comprehensive framework for managing machine learning experiments:
+These core concepts work together to provide a comprehensive framework
+for managing machine learning experiments:
 
 1. **Configuration Management** - Type-safe configuration with Python dataclasses
 2. **Main Decorator** - The entry point that integrates Hydra and MLflow
 3. **Workflow Automation** - Reusable experiment definitions and advanced parameter sweeps
 4. **Analysis Tools** - Access, filter, and analyze experiment results
 
-Understanding these fundamental concepts will help you leverage the full power of HydraFlow for your machine learning projects.
+Understanding these fundamental concepts will help you leverage the full power
+of HydraFlow for your machine learning projects.
--- hydraflow-0.17.2/docs/part1-applications/configuration.md
+++ hydraflow-0.18.1/docs/part1-applications/configuration.md
@@ -83,7 +83,9 @@ def train(run: Run, cfg: Config) -> None:
 
 ## Hydra Integration
 
-HydraFlow integrates closely with Hydra for configuration management. For detailed explanations of Hydra's capabilities, please refer to the [Hydra documentation](https://hydra.cc/docs/intro/).
+HydraFlow integrates closely with Hydra for configuration management.
+For detailed explanations of Hydra's capabilities, please refer to
+the [Hydra documentation](https://hydra.cc/docs/intro/).
 
 HydraFlow leverages the following Hydra features, but does not modify their behavior:
 
@@ -100,7 +102,9 @@ When using HydraFlow, remember that:
 2. HydraFlow automatically registers your top-level dataclass with Hydra
 3. `@hydraflow.main` sets up the connection between your dataclass and Hydra
 
-For advanced Hydra features and detailed usage examples, we recommend consulting the official Hydra documentation after you become familiar with the basic HydraFlow concepts.
+For advanced Hydra features and detailed usage examples, we recommend
+consulting the official Hydra documentation after you become familiar
+with the basic HydraFlow concepts.
 
 ## Best Practices
 
--- hydraflow-0.17.2/docs/part1-applications/execution.md
+++ hydraflow-0.18.1/docs/part1-applications/execution.md
@@ -14,13 +14,18 @@ python train.py
 
 This will:
 
-1. Set up an MLflow experiment with the same name as the Hydra job name (using `mlflow.set_experiment`). If the experiment doesn't exist, it will be created automatically
+1. Set up an MLflow experiment with the same name as the Hydra job name
+   (using `mlflow.set_experiment`). If the experiment doesn't exist,
+   it will be created automatically
 2. Create a new MLflow run or reuse an existing one based on the configuration
 3. Save the Hydra configuration as an MLflow artifact
 4. Execute your function decorated with `@hydraflow.main`
 5. Save only `*.log` files from Hydra's output directory as MLflow artifacts
 
-Note that any other artifacts (models, data files, etc.) must be explicitly saved by your code using MLflow's logging functions. The `chdir` option in the `@hydraflow.main` decorator can help with this by changing the working directory to the run's artifact directory, making file operations more convenient.
+Note that any other artifacts (models, data files, etc.) must be explicitly
+saved by your code using MLflow's logging functions. The `chdir` option in
+the `@hydraflow.main` decorator can help with this by changing the working
+directory to the run's artifact directory, making file operations more convenient.
 
 ## Command-line Override Syntax
 
@@ -62,7 +67,8 @@ of the specified parameters (2 learning rates × 3 model types).
 
 ### Advanced Parameter Sweeps
 
-For more complex parameter spaces, HydraFlow provides an extended sweep syntax that goes beyond Hydra's basic capabilities:
+For more complex parameter spaces, HydraFlow provides an extended
+sweep syntax that goes beyond Hydra's basic capabilities:
 
 ```bash
 # Define numerical ranges with start:stop:step
@@ -75,11 +81,13 @@ python train.py -m learning_rate=1:5:m  # 0.001 to 0.005
 
 python train.py -m model=(cnn,transformer)_(small,large)
 ```
-See [Extended Sweep Syntax](../part2-advanced/sweep-syntax.md) for a complete reference on these powerful features.
+See [Extended Sweep Syntax](../part2-advanced/sweep-syntax.md) for a
+complete reference on these powerful features.
 
 ### Managing Complex Experiment Workflows
 
-HydraFlow provides CLI tools to work with multirun mode more efficiently than using long command lines:
+HydraFlow provides CLI tools to work with multirun mode more efficiently
+than using long command lines:
 
 ```bash
 # Define jobs in hydraflow.yaml
@@ -94,11 +102,14 @@ jobs:
 hydraflow run train
 ```
 
-This approach helps you organize complex experiments, track execution history, and make experiments more reproducible. For details on these advanced capabilities, see [Job Configuration](../part2-advanced/job-configuration.md) in Part 2.
+This approach helps you organize complex experiments, track execution history,
+and make experiments more reproducible. For details on these advanced capabilities,
+see [Job Configuration](../part2-advanced/job-configuration.md) in Part 2.
 
 ## Output Organization
 
-By default, Hydra organizes outputs in the following directory structure for HydraFlow applications:
+By default, Hydra organizes outputs in the following directory structure
+for HydraFlow applications:
 
 ```
 ROOT_DIR/
--- hydraflow-0.17.2/docs/part1-applications/index.md
+++ hydraflow-0.18.1/docs/part1-applications/index.md
@@ -55,7 +55,8 @@ if __name__ == "__main__":
 
 ## Practical Examples
 
-If you prefer learning by example, check out our [Practical Tutorials](../practical-tutorials/index.md) section, which includes:
+If you prefer learning by example, check out our
+[Practical Tutorials](../practical-tutorials/index.md) section, which includes:
 
 - [Creating Your First HydraFlow Application](../practical-tutorials/applications.md): A step-by-step guide to building a basic application
 - [Automating Complex Workflows](../practical-tutorials/advanced.md): How to define and execute complex experiment workflows
--- hydraflow-0.17.2/docs/part1-applications/main-decorator.md
+++ hydraflow-0.18.1/docs/part1-applications/main-decorator.md
@@ -135,7 +135,10 @@ This default behavior improves efficiency by:
 
 ## Automatic Skipping of Completed Runs
 
-HydraFlow automatically skips runs that have already completed successfully. This is especially valuable in environments where jobs are automatically restarted after preemption. Without requiring any additional configuration, HydraFlow will:
+HydraFlow automatically skips runs that have already completed successfully.
+This is especially valuable in environments where jobs are automatically
+restarted after preemption. Without requiring any additional configuration,
+HydraFlow will:
 
 1. Identify already completed runs with the same configuration
 2. Skip re-execution of those runs
@@ -161,7 +164,9 @@ This automatic skipping behavior:
 
 ## Advanced Features
 
-The `hydraflow.main` decorator supports several keyword arguments that enhance its functionality. All these options are set to `False` by default and must be explicitly enabled when needed:
+The `hydraflow.main` decorator supports several keyword arguments that
+enhance its functionality. All these options are set to `False` by
+default and must be explicitly enabled when needed:
 
 ### Working Directory Management (`chdir`)
 
@@ -187,7 +192,8 @@ This option is beneficial when:
 
 ### Forcing New Runs (`force_new_run`)
 
-Override the default run identification and reuse behavior by always creating a new run, even when identical configurations exist:
+Override the default run identification and reuse behavior by always
+creating a new run, even when identical configurations exist:
 
 ```python
 @hydraflow.main(Config, force_new_run=True)
@@ -206,7 +212,8 @@ This option is useful when:
 
 ### Rerunning Finished Experiments (`rerun_finished`)
 
-Override the automatic skipping of completed runs by explicitly allowing rerunning of experiments that have already finished:
+Override the automatic skipping of completed runs by explicitly
+allowing rerunning of experiments that have already finished:
 
 ```python
 @hydraflow.main(Config, rerun_finished=True)
--- hydraflow-0.17.2/docs/part2-advanced/index.md
+++ hydraflow-0.18.1/docs/part2-advanced/index.md
@@ -1,10 +1,13 @@
 # Automating Workflows
 
-This section covers advanced techniques for automating and structuring multiple experiments in HydraFlow. It provides tools for defining complex parameter spaces and reusable experiment definitions.
+This section covers advanced techniques for automating and structuring
+multiple experiments in HydraFlow. It provides tools for defining complex
+parameter spaces and reusable experiment definitions.
 
 ## Overview
 
-After creating your basic HydraFlow applications, the next step is to automate your experiment workflows. This includes:
+After creating your basic HydraFlow applications, the next step is to
+automate your experiment workflows. This includes:
 
 - Creating parameter sweeps across complex combinations
 - Defining reusable experiment configurations
@@ -14,20 +17,26 @@ After creating your basic HydraFlow applications, the next step is to automate y
 
 The main components for workflow automation in HydraFlow are:
 
-1. **Extended Sweep Syntax**: A powerful syntax for defining parameter spaces beyond simple comma-separated values.
-
-2. **Job Configuration**: A YAML-based definition system for creating reusable experiment workflows.
+1. **Extended Sweep Syntax**: A powerful syntax for defining parameter
+   spaces beyond simple comma-separated values.
+2. **Job Configuration**: A YAML-based definition system for creating
+   reusable experiment workflows.
 
 ## Practical Examples
 
-For hands-on examples of workflow automation, see our [Practical Tutorials](../practical-tutorials/index.md) section, specifically:
+For hands-on examples of workflow automation, see our
+[Practical Tutorials](../practical-tutorials/index.md) section, specifically:
 
-- [Automating Complex Workflows](../practical-tutorials/advanced.md): A tutorial that demonstrates how to use `hydraflow.yaml` to define and execute various types of workflows
-- [Analyzing Experiment Results](../practical-tutorials/analysis.md): Learn how to work with results from automated experiment runs
+- [Automating Complex Workflows](../practical-tutorials/advanced.md): A tutorial
+  that demonstrates how to use `hydraflow.yaml` to define and execute
+  various types of workflows
+- [Analyzing Experiment Results](../practical-tutorials/analysis.md): Learn
+  how to work with results from automated experiment runs
 
 ## Extended Sweep Syntax
 
-HydraFlow extends Hydra's sweep syntax to provide more powerful ways to define parameter spaces:
+HydraFlow extends Hydra's sweep syntax to provide more powerful ways
+to define parameter spaces:
 
 ```bash
 # Range of values (inclusive)
@@ -44,7 +53,8 @@ Learn more about these capabilities in [Sweep Syntax](sweep-syntax.md).
 
 ## Job Configuration
 
-For more complex experiment workflows, you can use HydraFlow's job configuration system:
+For more complex experiment workflows, you can use HydraFlow's job
+configuration system:
 
 ```yaml
 jobs:
@@ -61,7 +71,9 @@ jobs:
 all: test_data=validation
 ```
 
-This approach allows you to define reusable experiment definitions that can be executed with a single command. Learn more in [Job Configuration](job-configuration.md).
+This approach allows you to define reusable experiment definitions that
+can be executed with a single command. Learn more in
+[Job Configuration](job-configuration.md).
 
 ## Executing Workflows
 
@@ -82,7 +94,10 @@ hydraflow run train_models seed=123
 
 In the following pages, we'll explore workflow automation in detail:
 
-- [Sweep Syntax](sweep-syntax.md): Learn about HydraFlow's extended syntax for defining parameter spaces.
-- [Job Configuration](job-configuration.md): Discover how to create reusable job definitions for your experiments.
+- [Sweep Syntax](sweep-syntax.md): Learn about HydraFlow's extended
+  syntax for defining parameter spaces.
+- [Job Configuration](job-configuration.md): Discover how to create
+  reusable job definitions for your experiments.
 
-After automating your experiments, you'll want to analyze the results using the tools covered in [Part 3: Analyzing Results](../part3-analysis/index.md).
+After automating your experiments, you'll want to analyze the results
+using the tools covered in [Part 3: Analyzing Results](../part3-analysis/index.md).
--- hydraflow-0.17.2/docs/part2-advanced/job-configuration.md
+++ hydraflow-0.18.1/docs/part2-advanced/job-configuration.md
@@ -64,7 +64,8 @@ The specified function will be imported and called with the parameters.
 
 ### `submit`
 
-The `submit` command collects all parameter combinations into a text file and passes this file to the specified command:
+The `submit` command collects all parameter combinations into a text
+file and passes this file to the specified command:
 
 ```yaml
 jobs:
@@ -91,7 +92,9 @@ The key difference between `run` and `submit`:
 - `run`: Executes the command once per parameter combination
 - `submit`: Executes the command once, with all parameter combinations provided in a file
 
-This gives you complete flexibility in how parameter combinations are processed. Your handler script can implement any logic - from simple sequential processing to complex distributed execution across a cluster.
+This gives you complete flexibility in how parameter combinations are
+processed. Your handler script can implement any logic - from simple
+sequential processing to complex distributed execution across a cluster.
 
 ## Parameter Sets
 
@@ -214,14 +217,18 @@ jobs:
 add: hydra/launcher=submitit hydra.launcher.submitit.cpus_per_task=8
 ```
 
-When a set has its own `add` parameter, it is merged with the job-level `add` parameter.
-If the same parameter key exists in both the job-level and set-level `add`, the set-level value takes precedence.
+When a set has its own `add` parameter, it is merged with
+the job-level `add` parameter.
+If the same parameter key exists in both the job-level and set-level
+`add`, the set-level value takes precedence.
 
 For example, with the configuration above:
+
 - The first set uses: `hydra/launcher=joblib hydra.launcher.n_jobs=2`
 - The second set uses: `hydra/launcher=submitit hydra.launcher.n_jobs=2 hydra.launcher.submitit.cpus_per_task=8`
 
-Notice how `hydra/launcher` is overridden by the set-level value, while `hydra.launcher.n_jobs` from the job-level is retained.
+Notice how `hydra/launcher` is overridden by the set-level value,
+while `hydra.launcher.n_jobs` from the job-level is retained.
 
 This behavior allows you to:
 
@@ -229,11 +236,13 @@ This behavior allows you to:
 2. Override or add specific parameters at the set level
 3. Keep all non-conflicting parameters from both levels
 
-This merging behavior makes it easy to maintain common configuration options while customizing specific aspects for different parameter sets.
+This merging behavior makes it easy to maintain common configuration
+options while customizing specific aspects for different parameter sets.
 
 ## Summary
 
-HydraFlow's job configuration system provides a powerful way to define and manage complex parameter sweeps:
+HydraFlow's job configuration system provides a powerful way to define
+and manage complex parameter sweeps:
 
 1. **Execution Commands**:
 
@@ -24,7 +24,8 @@ python train.py -m model=medium
 python train.py -m model=large
 ```
 
-When using multiple parameters with `each`, all possible combinations (cartesian product) will be generated:
+When using multiple parameters with `each`, all possible
+combinations (cartesian product) will be generated:
 
 ```yaml
 jobs:
@@ -275,6 +276,10 @@ HydraFlow's extended sweep syntax provides several powerful features for paramet
 5. **Parentheses grouping** - Create combinations of values and nested structures
 6. **Pipe operator** - Run multiple independent parameter sweeps in the same job
 
-All of these can be combined to create complex, expressive parameter sweeps with minimal configuration. Remember that using the `each` keyword creates a cartesian product of all parameters (all possible combinations), while the pipe operator (`|`) creates separate, independent parameter sweeps.
+All of these can be combined to create complex, expressive parameter sweeps
+with minimal configuration. Remember that using the `each` keyword creates a cartesian
+product of all parameters (all possible combinations), while the pipe
+operator (`|`) creates separate, independent parameter sweeps.
 
-When using these features, HydraFlow will automatically generate the appropriate Hydra multirun commands with the `-m` flag.
+When using these features, HydraFlow will automatically generate the appropriate
+Hydra multirun commands with the `-m` flag.
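The difference between `each` and the pipe operator can be sketched in plain Python; the parameter value lists here are hypothetical examples:

```python
from itertools import product

# Sketch of the two expansion modes described above (illustrative values).
models = ["small", "large"]
lrs = [0.01, 0.1]

# `each` over multiple parameters expands to the cartesian product:
each_runs = [f"model={m} lr={lr}" for m, lr in product(models, lrs)]
assert len(each_runs) == 4  # 2 x 2 combinations

# The pipe operator (`|`) keeps sweeps independent, so run counts add:
piped_runs = [f"model={m}" for m in models] + [f"lr={lr}" for lr in lrs]
assert len(piped_runs) == 4  # 2 + 2 separate runs, never combined
```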
@@ -33,9 +33,12 @@ The main components of HydraFlow's analysis tools are:
 
 ## Practical Examples
 
-For hands-on examples of experiment analysis, check out our [Practical Tutorials](../practical-tutorials/index.md) section, specifically:
+For hands-on examples of experiment analysis, check out our
+[Practical Tutorials](../practical-tutorials/index.md) section, specifically:
 
-- [Analyzing Experiment Results](../practical-tutorials/analysis.md): A detailed tutorial demonstrating how to load, filter, group, and analyze experiment data using HydraFlow's APIs
+- [Analyzing Experiment Results](../practical-tutorials/analysis.md): A
+  detailed tutorial demonstrating how to load, filter, group, and analyze
+  experiment data using HydraFlow's APIs
 
 ## Basic Analysis Workflow
 
@@ -66,7 +69,8 @@ best_run = df.sort("accuracy", descending=True).first()
 
 ## Finding and Loading Runs
 
-HydraFlow provides utilities to easily find and load runs from your MLflow tracking directory:
+HydraFlow provides utilities to easily find and load runs from your
+MLflow tracking directory:
 
 ```python
 from hydraflow import Run
@@ -83,7 +87,8 @@ runs = Run.load(iter_run_dirs(tracking_dir, "my_experiment"))
 runs = Run.load(iter_run_dirs(tracking_dir, ["training_*", "finetuning_*"]))
 ```
 
-This approach makes it easy to gather all relevant runs for analysis without having to manually specify each run directory.
+This approach makes it easy to gather all relevant runs for analysis
+without having to manually specify each run directory.
 
 ## Type-Safe Analysis
 
@@ -137,7 +142,9 @@ model = run.impl.load_model()
 results = run.impl.analyze_performance()
 ```
 
-The analysis capabilities covered in Part 3 are designed to work seamlessly with the experiment definitions from [Part 1](../part1-applications/index.md) and the advanced workflow automation from [Part 2](../part2-advanced/index.md).
+The analysis capabilities covered in Part 3 are designed to work
+seamlessly with the experiment definitions from [Part 1](../part1-applications/index.md)
+and the advanced workflow automation from [Part 2](../part2-advanced/index.md).
 
 ## What's Next
 
@@ -210,7 +210,8 @@ runs = Run.load(run_dirs, n_jobs=-1)  # Use all available CPU cores
 
 ### Finding Runs with `iter_run_dirs`
 
-HydraFlow provides the [`iter_run_dirs`][hydraflow.core.io.iter_run_dirs] function to easily discover runs in your MLflow tracking directory:
+HydraFlow provides the [`iter_run_dirs`][hydraflow.core.io.iter_run_dirs]
+function to easily discover runs in your MLflow tracking directory:
 
 ```python
 from hydraflow.core.io import iter_run_dirs
@@ -235,7 +236,9 @@ def filter_experiments(name: str) -> bool:
 runs = Run.load(iter_run_dirs(tracking_dir, filter_experiments))
 ```
 
-The `iter_run_dirs` function yields paths to run directories that can be directly passed to `Run.load`. This makes it easy to find and load runs based on experiment names or custom filtering criteria.
+The `iter_run_dirs` function yields paths to run directories that can be
+directly passed to `Run.load`. This makes it easy to find and load runs
+based on experiment names or custom filtering criteria.
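As a rough illustration of this discovery pattern (a toy analogue, not the real `iter_run_dirs` implementation), a generator can yield run directories whose parent experiment name matches a glob:

```python
import fnmatch
import tempfile
from pathlib import Path

def iter_dirs_like(tracking_dir, pattern="*"):
    """Toy analogue of run discovery: yield run subdirectories of every
    experiment directory whose name matches the glob pattern."""
    for exp_dir in sorted(Path(tracking_dir).iterdir()):
        if exp_dir.is_dir() and fnmatch.fnmatch(exp_dir.name, pattern):
            yield from sorted(d for d in exp_dir.iterdir() if d.is_dir())

# Throwaway layout: two experiments with one run directory each.
root = Path(tempfile.mkdtemp())
for exp, run in [("training_a", "run1"), ("eval_b", "run2")]:
    (root / exp / run).mkdir(parents=True)

matched = [d.name for d in iter_dirs_like(root, "training_*")]
print(matched)  # ['run1']
```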
 
 ## Best Practices
 
@@ -7,19 +7,27 @@ instances, making it easy to compare and extract insights from your experiments.
 
 ## Architecture
 
-`RunCollection` is built on top of the more general [`Collection`][hydraflow.core.collection.Collection]
-class, which provides a flexible foundation for working with sequences of items. This architecture offers several benefits:
-
-1. **Consistent Interface**: All collection-based classes in HydraFlow share a common interface and behavior
-2. **Code Reuse**: Core functionality is implemented once in the base class and inherited by specialized collections
-3. **Extensibility**: New collection types can easily be created for different item types
-4. **Type Safety**: Generic type parameters ensure type checking throughout the collection hierarchy
-
-The `Collection` class implements the Python `Sequence` protocol, allowing it to be used like standard Python
-collections (lists, tuples) while providing specialized methods for filtering, grouping, and data extraction.
-
-`RunCollection` extends this foundation with run-specific functionality, particularly for working with MLflow
-experiment data. This layered design separates generic collection behavior from domain-specific operations.
+`RunCollection` is built on top of the more general
+[`Collection`][hydraflow.core.collection.Collection]
+class, which provides a flexible foundation for working with sequences
+of items. This architecture offers several benefits:
+
+1. **Consistent Interface**: All collection-based classes in HydraFlow
+   share a common interface and behavior
+2. **Code Reuse**: Core functionality is implemented once in the base
+   class and inherited by specialized collections
+3. **Extensibility**: New collection types can easily be created
+   for different item types
+4. **Type Safety**: Generic type parameters ensure type checking
+   throughout the collection hierarchy
+
+The `Collection` class implements the Python `Sequence` protocol,
+allowing it to be used like standard Python collections (lists, tuples)
+while providing specialized methods for filtering, grouping, and data extraction.
+
+`RunCollection` extends this foundation with run-specific functionality,
+particularly for working with MLflow experiment data. This layered
+design separates generic collection behavior from domain-specific operations.
 
 ## Creating a Run Collection
 
@@ -243,17 +251,19 @@ model_types = runs.unique("model_type")
 num_model_types = runs.n_unique("model_type")
 ```
 
-All data extraction methods (`to_list`, `to_numpy`, `to_series`, etc.) support both static and callable default values,
-matching the behavior of the `Run.get` method. When using a callable default, the function receives
-the Run instance as an argument, allowing you to:
+All data extraction methods (`to_list`, `to_numpy`, `to_series`, etc.)
+support both static and callable default values,
+matching the behavior of the `Run.get` method. When using a callable default,
+the function receives the Run instance as an argument, allowing you to:
 
 - Implement fallback logic for missing parameters
 - Create derived values based on multiple parameters
 - Handle varying configuration schemas across different experiments
 - Apply transformations to the raw parameter values
 
-This makes it much easier to work with heterogeneous collections of runs that might have different
-parameter sets or evolving configuration schemas.
+This makes it much easier to work with heterogeneous collections of
+runs that might have different parameter sets or evolving configuration
+schemas.
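A minimal sketch of static versus callable defaults, assuming a toy `get_value` helper over plain dicts (hypothetical, not HydraFlow's actual `Run.get`):

```python
# Hypothetical helper mirroring the described default semantics.
def get_value(run, key, default=None):
    if key in run:
        return run[key]
    # A callable default receives the run itself, enabling derived values.
    return default(run) if callable(default) else default

run = {"model_type": "cnn", "width": 64}

assert get_value(run, "model_type") == "cnn"
# Static fallback for a key missing from older runs:
assert get_value(run, "depth", default=3) == 3
# Callable fallback that derives a value from other parameters:
assert get_value(run, "total_width", default=lambda r: r["width"] * 2) == 128
```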
 
 ## Converting to DataFrame
 
@@ -293,23 +303,6 @@ filled_df = missing_values_df.with_columns(
 )
 ```
 
-The `to_frame` method provides several ways to handle missing data:
-
-1. **defaults parameter**: Provide static or callable default values for specific keys
-   - Static values: `defaults={"param": value}`
-   - Callable values: `defaults={"param": lambda run: computed_value}`
-
-2. **None values**: Parameters without defaults are represented as `None` (null) in the DataFrame
-   - This lets you use Polars operations for handling null values:
-     - Filter: `df.filter(pl.col("param").is_not_null())`
-     - Fill nulls: `df.with_columns(pl.col("param").fill_null(value))`
-     - Aggregations: Most aggregation functions handle nulls appropriately
-
-3. **Special object keys**: Use the special keys `"run"`, `"cfg"`, and `"impl"` to include the actual
-   Run objects, configuration objects, or implementation objects in the DataFrame
-   - This allows direct access to the original objects for further operations
-   - You can combine regular data columns with object columns as needed
-
 ## Grouping Runs
 
 The `group_by` method allows you to organize runs based on parameter values:
@@ -343,16 +336,25 @@ model_avg_loss = model_groups.agg(
 )
 ```
 
-The `group_by` method returns a `GroupBy` instance that maps keys to `RunCollection` instances. This design allows you to:
+The `group_by` method returns a `GroupBy` instance that maps keys to
+`RunCollection` instances. This design allows you to:
 
-- Work with each group as a separate `RunCollection` with all the filtering, sorting, and analysis capabilities
-- Perform custom operations on each group that might not be expressible as simple aggregation functions
+- Work with each group as a separate `RunCollection` with all the
+  filtering, sorting, and analysis capabilities
+- Perform custom operations on each group that might not be expressible
+  as simple aggregation functions
 - Chain additional operations on specific groups that interest you
-- Implement multi-stage analysis workflows where you need to maintain the full run information at each step
+- Implement multi-stage analysis workflows where you need to maintain
+  the full run information at each step
 
-To perform aggregations on the grouped data, use the `agg` method on the GroupBy instance. This transforms the grouped data into a DataFrame with aggregated results. You can define multiple aggregation functions to compute different metrics across each group.
+To perform aggregations on the grouped data, use the `agg` method on
+the GroupBy instance. This transforms the grouped data into a DataFrame
+with aggregated results.
+You can define multiple aggregation functions to compute different
+metrics across each group.
 
-This approach preserves all information in each group, giving you maximum flexibility for downstream analysis.
+This approach preserves all information in each group, giving
+you maximum flexibility for downstream analysis.
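The group-then-aggregate pattern can be sketched with plain dicts standing in for Run objects (illustrative only, not the `GroupBy` implementation):

```python
from collections import defaultdict

# Toy runs; each dict stands in for a Run's parameters and metrics.
runs = [
    {"model_type": "cnn", "loss": 0.2},
    {"model_type": "cnn", "loss": 0.4},
    {"model_type": "transformer", "loss": 0.1},
]

# Group: map each key to its own sub-collection, as GroupBy does.
groups = defaultdict(list)
for run in runs:
    groups[run["model_type"]].append(run)

# Aggregate: reduce each group to a summary value, as `agg` does.
avg_loss = {k: sum(r["loss"] for r in g) / len(g) for k, g in groups.items()}
print(avg_loss)  # e.g. {'cnn': 0.3..., 'transformer': 0.1}
```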
 
 ## Type-Safe Run Collections