hydraflow-0.15.1.tar.gz → hydraflow-0.16.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {hydraflow-0.15.1 → hydraflow-0.16.0}/PKG-INFO +84 -75
- hydraflow-0.16.0/README.md +150 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part3-analysis/run-class.md +27 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part3-analysis/run-collection.md +165 -9
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part3-analysis/updating-runs.md +23 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/practical-tutorials/advanced.md +27 -2
- {hydraflow-0.15.1 → hydraflow-0.16.0}/mkdocs.yaml +10 -9
- {hydraflow-0.15.1 → hydraflow-0.16.0}/pyproject.toml +1 -1
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/__init__.py +2 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/core/context.py +4 -4
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/core/io.py +6 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/core/main.py +19 -11
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/core/run.py +13 -3
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/core/run_collection.py +119 -12
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/core/run_info.py +16 -17
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/cli/hydraflow.yaml +4 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/cli/test_run.py +10 -0
- hydraflow-0.16.0/tests/core/main/test_update.py +18 -0
- hydraflow-0.16.0/tests/core/main/update.py +35 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/run/test_run.py +26 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/run/test_run_collection.py +31 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/run/test_run_info.py +0 -24
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/test_io.py +6 -0
- hydraflow-0.15.1/README.md +0 -141
- {hydraflow-0.15.1 → hydraflow-0.16.0}/.devcontainer/devcontainer.json +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/.devcontainer/postCreate.sh +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/.devcontainer/starship.toml +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/.gitattributes +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/.github/workflows/ci.yaml +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/.github/workflows/docs.yaml +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/.github/workflows/publish.yaml +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/.gitignore +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/LICENSE +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/getting-started/concepts.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/getting-started/index.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/getting-started/installation.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/index.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part1-applications/configuration.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part1-applications/execution.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part1-applications/index.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part1-applications/main-decorator.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part2-advanced/index.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part2-advanced/job-configuration.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part2-advanced/sweep-syntax.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part3-analysis/index.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/practical-tutorials/analysis.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/practical-tutorials/applications.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/docs/practical-tutorials/index.md +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/examples/example.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/examples/hydraflow.yaml +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/examples/submit.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/cli.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/core/__init__.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/executor/__init__.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/executor/aio.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/executor/conf.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/executor/io.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/executor/job.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/executor/parser.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/src/hydraflow/py.typed +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/__init__.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/cli/__init__.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/cli/app.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/cli/conftest.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/cli/submit.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/cli/test_setup.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/cli/test_show.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/cli/test_version.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/conftest.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/__init__.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/context/__init__.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/context/chdir.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/context/log_run.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/context/start_run.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/context/test_chdir.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/context/test_log_run.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/context/test_start_run.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/__init__.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/default.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/force_new_run.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/match_overrides.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/rerun_finished.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/skip_finished.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/test_default.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/test_force_new_run.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/test_main.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/test_match_overrides.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/test_rerun_finished.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/main/test_skip_finished.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/run/__init__.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/core/run/run.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/executor/__init__.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/executor/conftest.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/executor/echo.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/executor/read.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/executor/test_aio.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/executor/test_args.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/executor/test_conf.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/executor/test_io.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/executor/test_job.py +0 -0
- {hydraflow-0.15.1 → hydraflow-0.16.0}/tests/executor/test_parser.py +0 -0
{hydraflow-0.15.1 → hydraflow-0.16.0}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hydraflow
-Version: 0.15.1
+Version: 0.16.0
 Summary: HydraFlow seamlessly integrates Hydra and MLflow to streamline ML experiment management, combining Hydra's configuration management with MLflow's tracking capabilities.
 Project-URL: Documentation, https://daizutabi.github.io/hydraflow/
 Project-URL: Source, https://github.com/daizutabi/hydraflow
@@ -51,7 +51,7 @@ Requires-Dist: ruff>=0.11
 Requires-Dist: typer>=0.15
 Description-Content-Type: text/markdown
 
-#
+# HydraFlow
 
 [![PyPI Version][pypi-v-image]][pypi-v-link]
 [![Build Status][GHAction-image]][GHAction-link]
@@ -60,6 +60,7 @@ Description-Content-Type: text/markdown
 [![Python Version][python-v-image]][python-v-link]
 
 <!-- Badges -->
+
 [pypi-v-image]: https://img.shields.io/pypi/v/hydraflow.svg
 [pypi-v-link]: https://pypi.org/project/hydraflow/
 [GHAction-image]: https://github.com/daizutabi/hydraflow/actions/workflows/ci.yaml/badge.svg?branch=main&event=push
@@ -73,117 +74,125 @@ Description-Content-Type: text/markdown
 
 ## Overview
 
-
-
-
-
-
-
+HydraFlow seamlessly integrates [Hydra](https://hydra.cc/) and [MLflow](https://mlflow.org/) to streamline machine learning experiment workflows. By combining Hydra's powerful configuration management with MLflow's robust experiment tracking, HydraFlow provides a comprehensive solution for defining, executing, and analyzing machine learning experiments.
+
+## Design Principles
+
+HydraFlow is built on the following design principles:
+
+1. **Type Safety** - Utilizing Python dataclasses for configuration type checking and IDE support
+2. **Reproducibility** - Automatically tracking all experiment configurations for fully reproducible experiments
+3. **Analysis Capabilities** - Providing powerful APIs for easily analyzing experiment results
+4. **Workflow Integration** - Creating a cohesive workflow by integrating Hydra's configuration management with MLflow's experiment tracking
 
 ## Key Features
 
-- **Configuration Management
-
-- **
-
-- **
-
-- **Seamless Integration**: Easily integrate Hydra and MLflow in your machine learning
-  projects with minimal setup.
-- **Rich CLI Interface**: Command-line tools for managing experiments and viewing results.
-- **Cross-Platform Support**: Works consistently across different operating systems.
+- **Type-safe Configuration Management** - Define experiment parameters using Python dataclasses with full IDE support and validation
+- **Seamless Hydra-MLflow Integration** - Automatically register configurations with Hydra and track experiments with MLflow
+- **Advanced Parameter Sweeps** - Define complex parameter spaces using extended sweep syntax for numerical ranges, combinations, and SI prefixes
+- **Workflow Automation** - Create reusable experiment workflows with YAML-based job definitions
+- **Powerful Analysis Tools** - Filter, group, and analyze experiment results with type-aware APIs
+- **Custom Implementation Support** - Extend experiment analysis with domain-specific functionality
 
 ## Installation
 
-You can install Hydraflow via pip:
-
 ```bash
 pip install hydraflow
 ```
 
 **Requirements:** Python 3.13+
 
-## Quick
-
-Here is a simple example to get you started with Hydraflow:
+## Quick Example
 
 ```python
-from __future__ import annotations
-
 from dataclasses import dataclass
-from
-
+from mlflow.entities import Run
 import hydraflow
-import mlflow
 
-
-
+@dataclass
+class Config:
+    width: int = 1024
+    height: int = 768
 
+@hydraflow.main(Config)
+def app(run: Run, cfg: Config) -> None:
+    # Your experiment code here
+    print(f"Running with width={cfg.width}, height={cfg.height}")
+
+    # Log metrics
+    hydraflow.log_metric("area", cfg.width * cfg.height)
 
+if __name__ == "__main__":
+    app()
+```
+
+Execute a parameter sweep with:
+
+```bash
+python app.py -m width=800,1200 height=600,900
+```
+
+## Core Components
+
+HydraFlow consists of the following key components:
+
+### Configuration Management
+
+Define type-safe configurations using Python dataclasses:
+
+```python
 @dataclass
 class Config:
-    """Configuration for the ML training experiment."""
-    # Training hyperparameters
     learning_rate: float = 0.001
     batch_size: int = 32
     epochs: int = 10
+```
 
-
-    hidden_size: int = 128
-    dropout: float = 0.1
-
-    # Dataset parameters
-    train_size: float = 0.8
-    random_seed: int = 42
+### Main Decorator
 
+The `@hydraflow.main` decorator integrates Hydra and MLflow:
 
+```python
 @hydraflow.main(Config)
-def
-
-
-This example demonstrates how to:
+def train(run: Run, cfg: Config) -> None:
+    # Your experiment code
+```
 
-
-2. Use Hydraflow to integrate with MLflow
-3. Track metrics and parameters automatically
+### Workflow Automation
 
-
-    run: MLflow run for the experiment corresponding to the Hydra app.
-        This `Run` instance is automatically created by Hydraflow.
-    cfg: Configuration for the experiment's run.
-        This `Config` instance is originally defined by Hydra, and then
-        automatically passed to the app by Hydraflow.
-    """
-    # Training loop
-    for epoch in range(cfg.epochs):
-        # Simulate training and validation
-        train_loss = 1.0 / (epoch + 1)
-        val_loss = 1.1 / (epoch + 1)
+Define reusable experiment workflows in YAML:
 
-
-
-
-
-
+```yaml
+jobs:
+  train_models:
+    run: python train.py
+    sets:
+      - each: model=small,medium,large
+        all: learning_rate=0.001,0.01,0.1
+```
 
-
+### Analysis Tools
 
+Analyze experiment results with powerful APIs:
 
-
-
-```
+```python
+from hydraflow import Run, iter_run_dirs
 
-
+# Load runs
+runs = Run.load(iter_run_dirs("mlruns"))
 
-
-
-
-- Type-safe configuration with dataclasses
+# Filter and analyze
+best_runs = runs.filter(model_type="transformer").to_frame("learning_rate", "accuracy")
+```
 
 ## Documentation
 
-For detailed documentation,
-
+For detailed documentation, visit our [documentation site](https://daizutabi.github.io/hydraflow/):
+
+- [Getting Started](https://daizutabi.github.io/hydraflow/getting-started/) - Installation and core concepts
+- [Practical Tutorials](https://daizutabi.github.io/hydraflow/practical-tutorials/) - Learn through hands-on examples
+- [User Guide](https://daizutabi.github.io/hydraflow/part1-applications/) - Detailed documentation of HydraFlow's capabilities
+- [API Reference](https://daizutabi.github.io/hydraflow/api/hydraflow/) - Complete API documentation
 
 ## Contributing
 
@@ -191,4 +200,4 @@ We welcome contributions! Please see our [contributing guide](CONTRIBUTING.md) f
 
 ## License
 
-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

(The removed and added lines read identically; the underlying change appears to be only a trailing-whitespace or final-newline difference.)
hydraflow-0.16.0/README.md (new file)

@@ -0,0 +1,150 @@
+# HydraFlow
+
+[![PyPI Version][pypi-v-image]][pypi-v-link]
+[![Build Status][GHAction-image]][GHAction-link]
+[![Coverage Status][codecov-image]][codecov-link]
+[![Documentation Status][docs-image]][docs-link]
+[![Python Version][python-v-image]][python-v-link]
+
+<!-- Badges -->
+
+[pypi-v-image]: https://img.shields.io/pypi/v/hydraflow.svg
+[pypi-v-link]: https://pypi.org/project/hydraflow/
+[GHAction-image]: https://github.com/daizutabi/hydraflow/actions/workflows/ci.yaml/badge.svg?branch=main&event=push
+[GHAction-link]: https://github.com/daizutabi/hydraflow/actions?query=event%3Apush+branch%3Amain
+[codecov-image]: https://codecov.io/github/daizutabi/hydraflow/coverage.svg?branch=main
+[codecov-link]: https://codecov.io/github/daizutabi/hydraflow?branch=main
+[docs-image]: https://img.shields.io/badge/docs-latest-blue.svg
+[docs-link]: https://daizutabi.github.io/hydraflow/
+[python-v-image]: https://img.shields.io/pypi/pyversions/hydraflow.svg
+[python-v-link]: https://pypi.org/project/hydraflow
+
+## Overview
+
+HydraFlow seamlessly integrates [Hydra](https://hydra.cc/) and [MLflow](https://mlflow.org/) to streamline machine learning experiment workflows. By combining Hydra's powerful configuration management with MLflow's robust experiment tracking, HydraFlow provides a comprehensive solution for defining, executing, and analyzing machine learning experiments.
+
+## Design Principles
+
+HydraFlow is built on the following design principles:
+
+1. **Type Safety** - Utilizing Python dataclasses for configuration type checking and IDE support
+2. **Reproducibility** - Automatically tracking all experiment configurations for fully reproducible experiments
+3. **Analysis Capabilities** - Providing powerful APIs for easily analyzing experiment results
+4. **Workflow Integration** - Creating a cohesive workflow by integrating Hydra's configuration management with MLflow's experiment tracking
+
+## Key Features
+
+- **Type-safe Configuration Management** - Define experiment parameters using Python dataclasses with full IDE support and validation
+- **Seamless Hydra-MLflow Integration** - Automatically register configurations with Hydra and track experiments with MLflow
+- **Advanced Parameter Sweeps** - Define complex parameter spaces using extended sweep syntax for numerical ranges, combinations, and SI prefixes
+- **Workflow Automation** - Create reusable experiment workflows with YAML-based job definitions
+- **Powerful Analysis Tools** - Filter, group, and analyze experiment results with type-aware APIs
+- **Custom Implementation Support** - Extend experiment analysis with domain-specific functionality
+
+## Installation
+
+```bash
+pip install hydraflow
+```
+
+**Requirements:** Python 3.13+
+
+## Quick Example
+
+```python
+from dataclasses import dataclass
+from mlflow.entities import Run
+import hydraflow
+
+@dataclass
+class Config:
+    width: int = 1024
+    height: int = 768
+
+@hydraflow.main(Config)
+def app(run: Run, cfg: Config) -> None:
+    # Your experiment code here
+    print(f"Running with width={cfg.width}, height={cfg.height}")
+
+    # Log metrics
+    hydraflow.log_metric("area", cfg.width * cfg.height)
+
+if __name__ == "__main__":
+    app()
+```
+
+Execute a parameter sweep with:
+
+```bash
+python app.py -m width=800,1200 height=600,900
+```
+
+## Core Components
+
+HydraFlow consists of the following key components:
+
+### Configuration Management
+
+Define type-safe configurations using Python dataclasses:
+
+```python
+@dataclass
+class Config:
+    learning_rate: float = 0.001
+    batch_size: int = 32
+    epochs: int = 10
+```
+
+### Main Decorator
+
+The `@hydraflow.main` decorator integrates Hydra and MLflow:
+
+```python
+@hydraflow.main(Config)
+def train(run: Run, cfg: Config) -> None:
+    # Your experiment code
+```
+
+### Workflow Automation
+
+Define reusable experiment workflows in YAML:
+
+```yaml
+jobs:
+  train_models:
+    run: python train.py
+    sets:
+      - each: model=small,medium,large
+        all: learning_rate=0.001,0.01,0.1
+```
+
+### Analysis Tools
+
+Analyze experiment results with powerful APIs:
+
+```python
+from hydraflow import Run, iter_run_dirs
+
+# Load runs
+runs = Run.load(iter_run_dirs("mlruns"))
+
+# Filter and analyze
+best_runs = runs.filter(model_type="transformer").to_frame("learning_rate", "accuracy")
+```
+
+## Documentation
+
+For detailed documentation, visit our [documentation site](https://daizutabi.github.io/hydraflow/):
+
+- [Getting Started](https://daizutabi.github.io/hydraflow/getting-started/) - Installation and core concepts
+- [Practical Tutorials](https://daizutabi.github.io/hydraflow/practical-tutorials/) - Learn through hands-on examples
+- [User Guide](https://daizutabi.github.io/hydraflow/part1-applications/) - Detailed documentation of HydraFlow's capabilities
+- [API Reference](https://daizutabi.github.io/hydraflow/api/hydraflow/) - Complete API documentation
+
+## Contributing
+
+We welcome contributions! Please see our [contributing guide](CONTRIBUTING.md) for details.
+
+## License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
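Reading the README's two snippets together: the `Config` fields swept on the command line become queryable parameters afterwards. A minimal follow-up sketch, assuming the sweep above has been executed and `mlruns` is the tracking directory (only the API shown in the README is used; the file name `analyze.py` is illustrative):

```python
# analyze.py -- run after: python app.py -m width=800,1200 height=600,900
from hydraflow import Run, iter_run_dirs

# Load every run written under the MLflow tracking directory.
runs = Run.load(iter_run_dirs("mlruns"))

# Tabulate the swept configuration parameters as a Polars DataFrame.
print(runs.to_frame("width", "height"))
```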
{hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part3-analysis/run-class.md

@@ -56,9 +56,22 @@ learning_rate = run.get("learning_rate")
 # Nested access with dot notation
 model_type = run.get("model.type")
 
+# Alternatively, use double underscore notation for nested access
+model_type = run.get("model__type")  # Equivalent to "model.type"
+
 # Access implementation attributes or run info
 metric_value = run.get("accuracy")  # From impl or cfg
 run_id = run.get("run_id")  # From RunInfo
+
+# Provide a default value if the key doesn't exist
+batch_size = run.get("batch_size", 32)
+
+# Use a callable as default to dynamically generate values based on the run
+# This is useful for derived parameters or conditional defaults
+lr = run.get("learning_rate", default=lambda r: r.get("base_lr", 0.01) / 10)
+
+# Complex default logic based on other parameters
+steps = run.get("steps", default=lambda r: r.get("epochs", 10) * r.get("steps_per_epoch", 100))
 ```
 
 The `get` method searches for values in the following order:
@@ -69,6 +82,20 @@ The `get` method searches for values in the following order:
 
 This provides a unified access interface regardless of where the data is stored.
 
+The double underscore notation (`__`) is automatically converted to dot notation (`.`) internally,
+making it useful for nested parameter access, especially when using keyword arguments in methods
+that don't allow dots in parameter names.
+
+When providing a default value, you can use either a static value or a callable function.
+If you provide a callable, it will receive the Run instance as an argument, allowing you to
+create context-dependent default values that can access other run parameters or properties.
+This is particularly useful for:
+
+- Creating derived parameters that don't exist in the original configuration
+- Handling schema evolution across different experiment iterations
+- Providing fallbacks that depend on other configuration values
+- Implementing conditional logic for parameter defaults
+
 ## Type-Safe Configuration Access
 
 For better IDE integration and type checking, you can specify the configuration
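The additions above document two `Run.get` behaviors: double-underscore keys are translated to dots, and a callable default receives the `Run` itself. A small sketch combining both, where the parameter names (`optimizer__lr`, `base_lr`) are purely illustrative:

```python
from hydraflow import Run

def effective_lr(run: Run) -> float:
    # "optimizer__lr" is looked up as "optimizer.lr" (double underscore -> dot).
    # If it is absent, the callable default derives a fallback from another
    # (illustrative) parameter of the same run.
    return run.get("optimizer__lr", default=lambda r: r.get("base_lr", 0.01) / 10)
```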
{hydraflow-0.15.1 → hydraflow-0.16.0}/docs/part3-analysis/run-collection.md

@@ -86,6 +86,11 @@ specific_runs = runs.filter(
 # Use a tuple to specify the parameter name and value
 nested_filter = runs.filter(("model.hidden_size", 512))
 
+# Filter with double underscore notation for nested parameters
+# This is often more convenient with keyword arguments
+nested_filter = runs.filter(model__hidden_size=512)  # Equivalent to "model.hidden_size"
+nested_filter = runs.filter(model__encoder__num_layers=6)  # For deeply nested parameters
+
 # Filter with tuple for range values (inclusive)
 lr_range = runs.filter(learning_rate=(0.0001, 0.01))
 
@@ -99,6 +104,11 @@ def is_large_image(run: Run):
 good_runs = runs.filter(predicate=is_large_image)
 ```
 
+The double underscore notation (`__`) is particularly useful for accessing nested
+configuration parameters with keyword arguments, as it's automatically converted to
+dot notation (`.`) internally. This allows you to write more natural and Pythonic
+filtering expressions, especially for deeply nested configurations.
+
 ## Advanced Filtering
 
 The `filter` method supports more complex filtering patterns:
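As a quick recap of the filtering forms documented above, a sketch using illustrative parameter names; everything else is the `RunCollection.filter` API as described in these hunks:

```python
from hydraflow import Run, iter_run_dirs

runs = Run.load(iter_run_dirs("mlruns"))

deep = runs.filter(model__encoder__num_layers=6)     # double-underscore kwarg -> "model.encoder.num_layers"
lr_band = runs.filter(learning_rate=(1e-4, 1e-2))    # inclusive range via tuple value
explicit = runs.filter(("model.hidden_size", 512))   # explicit (key, value) tuple
```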
@@ -113,8 +123,23 @@ complex_filter = runs.filter(
 
 # Chained filtering
 final_runs = runs.filter(model_type="transformer").filter(learning_rate=0.001)
+
+# Advanced filtering using predicate functions with callable defaults
+# This example filters runs based on learning rate efficiency (lr * batch_size)
+# Even if some runs are missing one parameter, the default logic provides values
+def has_efficient_lr(run: Run) -> bool:
+    lr = run.get("learning_rate", default=lambda r: r.get("base_lr", 0.01) * r.get("lr_multiplier", 1.0))
+    batch_size = run.get("batch_size", default=lambda r: r.get("default_batch_size", 32))
+    return lr * batch_size < 0.5
+
+# Apply the complex predicate
+efficient_runs = runs.filter(predicate=has_efficient_lr)
 ```
 
+The combination of predicate functions with callable defaults in `get` enables sophisticated
+filtering logic that can handle missing parameters and varied configuration schemas across
+different experiment runs.
+
 ## Sorting Runs
 
 The `sort` method allows you to sort runs based on specific criteria:
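Chaining and predicates compose, so one way to express "converged transformer runs" under the documented API (the metric names and threshold are illustrative):

```python
from hydraflow import Run, iter_run_dirs

def converged(run: Run) -> bool:
    # Fall back to an illustrative "best_loss" parameter when "final_loss" is missing.
    return run.get("final_loss", default=lambda r: r.get("best_loss", 1.0)) < 0.1

runs = Run.load(iter_run_dirs("mlruns"))
good = runs.filter(model_type="transformer").filter(predicate=converged)
print(len(good))
```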
@@ -154,9 +179,47 @@ RunCollection provides several methods to extract specific data from runs:
 # Extract values for a specific key as a list
 learning_rates = runs.to_list("learning_rate")
 
+# Extract values with a static default for missing values
+batch_sizes = runs.to_list("batch_size", default=32)
+
+# Extract values with a callable default that dynamically computes values
+# This is particularly useful for handling missing parameters or derived values
+accuracies = runs.to_list("accuracy", default=lambda run: run.get("val_accuracy", 0.0) * 0.9)
+
 # Extract values as a NumPy array
 batch_sizes = runs.to_numpy("batch_size")
 
+# Extract with callable default for complex scenarios
+learning_rates = runs.to_numpy(
+    "learning_rate",
+    default=lambda run: run.get("base_lr", 0.01) * run.get("lr_schedule_factor", 1.0)
+)
+
+# Extract values as a Polars Series
+lr_series = runs.to_series("learning_rate")
+
+# Extract with a custom name for the series
+model_series = runs.to_series("model_type", name="Model Architecture")
+
+# Extract with callable default and custom name
+effective_lr = runs.to_series(
+    "learning_rate",
+    default=lambda run: run.get("base_lr", 0.01) * run.get("lr_multiplier", 1.0),
+    name="Effective Learning Rate"
+)
+
+# Use Series for further analysis and operations
+import polars as pl
+# Combine multiple series into a DataFrame
+df = pl.DataFrame([
+    runs.to_series("model_type", name="Model"),
+    runs.to_series("batch_size", default=32, name="Batch Size"),
+    effective_lr
+])
+# Perform operations between Series
+normalized_acc = runs.to_series("accuracy", default=0.0, name="Accuracy")
+efficiency = normalized_acc / effective_lr  # Series division
+
 # Get unique values for a key
 model_types = runs.unique("model_type")
 
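Because `to_numpy` returns a NumPy array, the extracted values feed directly into ordinary NumPy code; a short sketch with illustrative metric names:

```python
import numpy as np
from hydraflow import Run, iter_run_dirs

runs = Run.load(iter_run_dirs("mlruns"))

# Callable default as documented above: missing "accuracy" falls back to "val_accuracy".
acc = runs.to_numpy("accuracy", default=lambda r: r.get("val_accuracy", 0.0))
print(f"mean={np.mean(acc):.3f}  best={np.max(acc):.3f}")
```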
@@ -164,6 +227,18 @@ model_types = runs.unique("model_type")
 num_model_types = runs.n_unique("model_type")
 ```
 
+All data extraction methods (`to_list`, `to_numpy`, `to_series`, etc.) support both static and callable default values,
+matching the behavior of the `Run.get` method. When using a callable default, the function receives
+the Run instance as an argument, allowing you to:
+
+- Implement fallback logic for missing parameters
+- Create derived values based on multiple parameters
+- Handle varying configuration schemas across different experiments
+- Apply transformations to the raw parameter values
+
+This makes it much easier to work with heterogeneous collections of runs that might have different
+parameter sets or evolving configuration schemas.
+
 ## Converting to DataFrame
 
 For advanced analysis, you can convert your runs to a Polars DataFrame:
@@ -175,16 +250,76 @@ df = runs.to_frame()
 # DataFrame with specific configuration parameters
 df = runs.to_frame("model_type", "learning_rate", "batch_size")
 
-#
+# Specify default values for missing parameters using the defaults parameter
+df = runs.to_frame(
+    "model_type",
+    "learning_rate",
+    "batch_size",
+    defaults={"learning_rate": 0.01, "batch_size": 32}
+)
+
+# Use callable defaults for dynamic values based on each run
+df = runs.to_frame(
+    "model_type",
+    "learning_rate",
+    "epochs",
+    defaults={
+        "learning_rate": lambda run: run.get("base_lr", 0.01) * run.get("lr_multiplier", 1.0),
+        "epochs": lambda run: int(run.get("max_steps", 1000) / run.get("steps_per_epoch", 100))
+    }
+)
+
+# Missing values without defaults are represented as None (null) in the DataFrame
+# This allows for standard handling of missing data in Polars
+missing_values_df = runs.to_frame("model_type", "parameter_that_might_be_missing")
+
+# Filter rows with non-null values
+import polars as pl
+valid_rows = missing_values_df.filter(pl.col("parameter_that_might_be_missing").is_not_null())
+
+# Fill null values after creating the DataFrame
+filled_df = missing_values_df.with_columns(
+    pl.col("parameter_that_might_be_missing").fill_null("default_value")
+)
+
+# Using a custom function that returns multiple columns as keyword arguments
 def get_metrics(run: Run) -> dict[str, float]:
     return {
-        "accuracy": run.
-        "precision": run.
+        "accuracy": run.get("accuracy", default=lambda r: r.get("val_accuracy", 0.0) * 0.9),
+        "precision": run.get("precision", default=lambda r: r.get("val_precision", 0.0) * 0.9),
     }
 
+# Add custom columns using a function
 df = runs.to_frame("model_type", metrics=get_metrics)
+
+# Combine defaults with custom column generator functions
+df = runs.to_frame(
+    "model_type",
+    "learning_rate",
+    defaults={"learning_rate": 0.01},
+    metrics=get_metrics
+)
 ```
 
+The `to_frame` method provides several ways to handle missing data:
+
+1. **defaults parameter**: Provide static or callable default values for specific keys
+   - Static values: `defaults={"param": value}`
+   - Callable values: `defaults={"param": lambda run: computed_value}`
+
+2. **None values**: Parameters without defaults are represented as `None` (null) in the DataFrame
+   - This lets you use Polars operations for handling null values:
+   - Filter: `df.filter(pl.col("param").is_not_null())`
+   - Fill nulls: `df.with_columns(pl.col("param").fill_null(value))`
+   - Aggregations: Most aggregation functions handle nulls appropriately
+
+3. **Custom column generators**: Use keyword argument functions to compute complex columns
+   - These functions receive each Run instance and can implement custom logic
+   - They can use `run.get()` with defaults to handle missing parameters
+
+These approaches can be combined to create flexible and robust data extraction pipelines
+that handle different experiment configurations and parameter evolution over time.
+
 ## Grouping Runs
 
 The `group_by` method allows you to organize runs based on parameter values:
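Once `to_frame` has produced a Polars DataFrame, downstream summarisation is plain Polars; a sketch assuming illustrative column names and a recent Polars version:

```python
import polars as pl
from hydraflow import Run, iter_run_dirs

runs = Run.load(iter_run_dirs("mlruns"))

# Static default for a parameter that may be missing in older runs.
df = runs.to_frame("model_type", "learning_rate", defaults={"learning_rate": 0.01})

# Average learning rate per model type (plain Polars from here on).
print(df.group_by("model_type").agg(pl.col("learning_rate").mean()))
```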
@@ -193,6 +328,9 @@ The `group_by` method allows you to organize runs based on parameter values:
 # Group by a single parameter
 model_groups = runs.group_by("model_type")
 
+# Group by nested parameter using dot notation
+architecture_groups = runs.group_by("model.architecture")
+
 # Iterate through groups
 for model_type, group in model_groups.items():
     print(f"Model type: {model_type}, Runs: {len(group)}")
@@ -200,6 +338,9 @@ for model_type, group in model_groups.items():
 # Group by multiple parameters
 param_groups = runs.group_by("model_type", "learning_rate")
 
+# Mix of regular and nested parameters using double underscore notation
+param_groups = runs.group_by("model_type", "model__hidden_size", "optimizer__learning_rate")
+
 # Access a specific group
 transformer_001_group = param_groups[("transformer", 0.001)]
 ```
@@ -218,14 +359,18 @@ This approach preserves all information in each group, giving you maximum flexib
 Combine `group_by` with aggregation for powerful analysis:
 
 ```python
-# Simple aggregation function using get method
+# Simple aggregation function using get method with callable defaults
 def mean_accuracy(runs: RunCollection) -> float:
-    return runs.to_numpy(
+    return runs.to_numpy(
+        "accuracy",
+        default=lambda run: run.get("val_accuracy", 0.0) * 0.9
+    ).mean()
 
-# Complex aggregation from implementation or configuration
+# Complex aggregation from implementation or configuration with fallbacks
 def combined_metric(runs: RunCollection) -> float:
-
-
+    # Use callable defaults to handle missing values consistently
+    accuracies = runs.to_numpy("accuracy", default=lambda r: r.get("val_accuracy", 0.0))
+    precisions = runs.to_numpy("precision", default=lambda r: r.get("val_precision", 0.0))
     return (accuracies.mean() + precisions.mean()) / 2
 
 
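The aggregation helpers above plug into `group_by` as keyword arguments, as the next hunk shows. A compact end-to-end sketch of the documented analysis API, with illustrative metric and parameter names:

```python
from hydraflow import Run, iter_run_dirs

def mean_accuracy(runs) -> float:
    # Callable default so runs lacking "accuracy" fall back to "val_accuracy".
    return runs.to_numpy("accuracy", default=lambda r: r.get("val_accuracy", 0.0)).mean()

runs = Run.load(iter_run_dirs("mlruns"))

# Group the transformer runs by learning rate; the keyword aggregation makes
# group_by return a Polars DataFrame of group keys plus the aggregated column.
table = runs.filter(model_type="transformer").group_by("learning_rate", accuracy=mean_accuracy)
print(table)
```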
@@ -243,9 +388,20 @@ results = runs.group_by(
     accuracy=mean_accuracy,
     combined=combined_metric
 )
+
+# Group by parameters that might be missing in some runs using callable defaults
+def normalize_architecture(run: Run) -> str:
+    # Get architecture with a fallback to model type if not available
+    arch = run.get("architecture", default=lambda r: r.get("model_type", "unknown"))
+    return arch.lower()  # Normalize to lowercase
+
+# Group by the normalized architecture
+arch_results = runs.group_by(normalize_architecture, accuracy=mean_accuracy)
 ```
 
-With the enhanced `get` method
+With the enhanced `get` method and callable defaults support throughout the API, writing aggregation
+functions becomes more straightforward and robust. You can handle missing values consistently and
+implement complex transformations that work across heterogeneous runs.
 
 
 When aggregation functions are provided as keyword arguments, `group_by` returns a Polars DataFrame with the group keys and aggregated values. This design choice offers several advantages:
|