PyPI - expops - Versions diffs - 0.1.16.dev0__tar.gz → 0.1.18.dev0__tar.gz - Mend

expops 0.1.16.dev0__tar.gz → 0.1.18.dev0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (158)

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: expops
-Version: 0.1.16.dev0
+Version: 0.1.18.dev0
 Summary: MLOps Platform with step-based pipeline execution
 License:                     GNU GENERAL PUBLIC LICENSE
                                Version 3, 29 June 2007

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/docs/features/data-parallelism.md RENAMED

@@ -5,7 +5,7 @@ Use it when a process returns row-splittable data (pandas DataFrame, numpy array
 ## Config
-### Split from upstream (no `code_function`)
+### Split from upstream (no `code`)
 ```yaml
 processes:
@@ -17,16 +17,16 @@ processes:
 ```
 - `data_name` tells the system which upstream output key to split.
-- The split process has no `code_function`; it will merge its upstream outputs and split the `data_name` key.
-- `script` is optional for split nodes with no `code_function`.
+- The split process has no `code`; it will merge its upstream outputs and split the `data_name` key.
+- The script path for such helper split nodes is optional.
-### User-defined splitter (`code_function`)
+### User-defined splitter (`code`)
 ```yaml
 processes:
   - name: "nn_data_parallel"
     description: "Custom splitter"
-    code_function: "define_nn_data_parallel"
+    code: "define_nn_data_parallel"
     data_parallelism:
       size: [50, 20, 20]
 ```
@@ -42,7 +42,7 @@ def define_nn_data_parallel():
 - Return a **list of rows**; the framework will split it based on `size`.
 - If `data_name` is omitted and the function returns a list, the data key defaults to `data`.
-- `script_path` is required when using `code_function`.
+- `script_path` is required when using an explicit `code` function.
 ### Size formats
@@ -57,10 +57,10 @@ Add a dedicated aggregation process to collapse the **latest** data-parallel lay
 processes:
   - name: "aggregate_results"
     data_aggregation: true
-    code_function: "define_aggregate_results"
+    code: "define_aggregate_results"
 ```
-Aggregation processes **must** define a `code_function`.
+Aggregation processes **must** define a `code` function reference.
 ### Input shape at aggregation
@@ -76,8 +76,7 @@ def define_aggregate_results(df):
 ## Notes
 - Data parallelism duplicates downstream nodes in the graph and is visible in the UI.
-- Chart probe paths can use selector syntax to automatically expand to data-parallel
-  partitions; see the chart documentation for details.
+- Chart probe paths can use XPath selector syntax over the pipeline tree (including `@partition='p1'` predicates) to automatically expand to data-parallel partitions; see the chart documentation for details.
 - Multiple data-parallel layers are supported. Each aggregation collapses only the
   most recent data-parallel layer; outer layers are still represented by separate
   process nodes.

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/docs/features/pipelines.md RENAMED

@@ -46,12 +46,12 @@ Each process must be explicitly defined with its configuration:
 processes:
   - name: "feature_engineering"
     description: "Load and prepare data"
-    code_function: "define_feature_engineering_process"
+    code: "define_feature_engineering_process"
     environment: "my-project-env"
   - name: "train_model"
     description: "Train the model"
-    code_function: "define_training_process"
+    code: "define_training_process"
     environment: "my-project-env"
     parameters:
       learning_rate: 0.001
@@ -67,8 +67,10 @@ processes:
 - `name`: Unique process identifier (must match names in `process_adjlist`)
 - `description`: Human-readable description
-- `script` (optional): Key from the top-level `scripts` section to use for this process. Defaults to the first key in `scripts` if omitted. See [Configuration](../project-structure/configuration.md) for the `scripts` section and defaults.
-- `code_function`: Name of the Python function that defines the process (see below). Defaults to the process name if omitted, so you can omit it when the function name matches the process name.
+- `code` (optional): Unified code reference for the process:
+  - `code: "script_key.function_name"` uses the script registered under `script_key` in the top-level `scripts` map and calls `function_name`.
+  - `code: "function_name"` uses the first script key in `scripts` and calls `function_name`.
+  - If omitted, ordinary processes default to a function with the same name as the process on the default script, while pure split/aggregation helper nodes can omit `code` entirely.
 - `environment`: Environment name to use (defaults to the first environment if omitted)
 - `parameters`: Optional parameters injected by name into the process function
 - `type`: Optional type (e.g., `"chart"` for chart generation processes)
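The `code` resolution rules listed in this hunk can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `resolve_code_reference`, its parameters, and the sample script paths are hypothetical, and splitting on the last dot mirrors the `rsplit(".", 1)` call visible in the `custom_adapter.py` hunk of this diff.

```python
from typing import Mapping, Optional, Tuple


def resolve_code_reference(
    code: Optional[str],
    process_name: str,
    scripts: Mapping[str, str],
) -> Tuple[Optional[str], str]:
    """Return (script_key, function_name) for one process entry.

    Hypothetical helper: it only illustrates the documented defaults.
    """
    # The first key of the top-level `scripts` map acts as the default script.
    default_key = next(iter(scripts), None)
    code_str = code.strip() if isinstance(code, str) else ""
    if not code_str:
        # `code` omitted: the function name defaults to the process name.
        return default_key, process_name
    if "." in code_str:
        # "script_key.function_name": split on the last dot.
        script_key, function_name = code_str.rsplit(".", 1)
        return (script_key or default_key), function_name
    # Bare "function_name": use the default script.
    return default_key, code_str


# Illustrative `scripts` map; the file paths are made up for this sketch.
scripts = {"main": "models/model.py", "reporting_js": "charts/plot_metrics.js"}
print(resolve_code_reference("reporting_js.nn_losses", "nn_losses", scripts))
print(resolve_code_reference("define_training_process", "train_model", scripts))
print(resolve_code_reference(None, "train_model", scripts))
```

The three calls cover the three documented cases: an explicit script key, a bare function name, and an omitted `code` field.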
@@ -77,7 +79,7 @@ processes:
 ## Process Functions
-Processes are implemented in Python using the `@process()` decorator. The function name must match the `code_function` in the config:
+Processes are implemented in Python using the `@process()` decorator. The function name referenced in `code` (or the process name when `code` is omitted) must match the registered process:
 ```python
 from expops.core import process, step,

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/docs/features/reporting.md RENAMED

@@ -69,8 +69,8 @@ reporting:
   charts:
     - name: "my_chart"
       probe_paths:
-        train: "//*[@name=\"train_model\"]"
-        eval: "//*[@name=\"evaluate_model\"]"
+        train: "//*[@name='train_model']"
+        eval: "//*[@name='evaluate_model']"
 ```
 The chart function receives:
@@ -100,38 +100,15 @@ Common patterns:
 | Goal | XPath pattern |
 |------|----------------|
-| Process by name | `//*[@name="process_name"]` |
-| Process + step | `//*[@name="process_name"]/step[@name="step_name"]` or `//*[@name="process_name"]/*[@name="step_name"]` |
-| Specific partition/seed | `//*[@partition="p1"]/*[@seed="41"]/*[@name="process_name"]` |
-| Any partition/seed | `//*[@partition]/*[@seed]/*[@name="process_name"]` |
-In `project_config.yaml`, probe paths are double-quoted YAML strings, so escape each `"` as `\"` (see examples below).
-**Examples from projects:**
-- **sklearn-basic** (simple pipeline, no steps):
-```yaml
-probe_paths:
-  train: "//*[@name=\"train_model\"]"
-  eval: "//*[@name=\"evaluate_model\"]"
-```
-- **premier-league** (process + step; partition and seed):
-```yaml
-probe_paths:
-  feat: "//*[@name=\"feature_engineering_generic\"]/step[@name=\"feature_analysis\"]"
-  nn_a_p1_seed41: "//*[@partition=\"p1\"]/*[@seed=\"41\"]/*[@name=\"nn_training_a\"]/*[@name=\"train_and_evaluate_nn_classifier\"]"
-  linear: "//*[@partition]/*[@seed]/*[@name=\"linear_inference\"]/*[@name=\"test_inference_classification\"]"
-  nn_best: "//*[@partition]/*[@seed]/*[@name=\"nn_best_inference\"]/step[@name=\"test_inference_classification\"]"
-  ensemble: "//*[@partition]/*[@seed]/*[@name=\"ensemble_inference\"]"
-```
+| Process by name | `//*[@name='process_name']` |
+| Process + step | `//*[@name='process_name']/*[@name='step_name']` or `//*[@name='step_name']` if the step name is unique among process names |
+| Specific partition/seed | `//*[@partition='p1']/*[@seed='41']/*[@name='process_name']` |
+| Any partition/seed | `//*[@partition]/*[@seed]/*[@name='process_name']` |
 #### How keys map to chart metrics
 - **One XPath match**: The config key is preserved (e.g. `train` → `metrics['train']`).
-- **Multiple XPath matches**: Each resolved probe path becomes a key (e.g. `nn_training_a__p1_seed41/train_and_evaluate_nn_classifier`). Chart code can iterate over keys or use prefix/grouping logic to aggregate across partitions or seeds.
+- **Multiple XPath matches**: Each resolved probe path becomes a key. The key is the canonical XPath-style identifier for that process/step (e.g. `"//*[@partition='p1']/*[@seed='41']/*[@name='nn_training_a']/step[@name='train_and_evaluate_nn_classifier']"`). Chart code can iterate over keys or use prefix/grouping logic to aggregate across partitions or seeds.
 - **Literal path**: Single key as in config (e.g. `train: "train_model"` → `metrics['train']`).
 ### Output
@@ -147,7 +124,7 @@ Dynamic charts provide real-time, interactive visualizations.
 **Configuration**: Dynamic charts are defined as **pipeline processes** in `project_config.yaml` (under `experiment.parameters.pipeline.processes`). Each dynamic chart process must have:
-- `script` - a key that resolves to your JS chart script (e.g. `reporting_js`), defined under `scripts:` at the top of the config
+- `code` - a unified code reference that points to your JS chart script and function (e.g. `code: "reporting_js.nn_losses"`), where `reporting_js` is defined under `scripts:` at the top of the config
 - `chart_type: "dynamic"`
 - `probe_paths` - same XPath semantics as static charts (see [Probe paths](#probe-paths) below)
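The probe-path patterns from the table in this file can be tried against a toy tree with the standard library. This is a sketch under assumptions: the element tags `pipeline` and `group` and the tree layout are hypothetical, `xml.etree.ElementTree` supports only a subset of XPath, and the package's own selector engine may behave differently.

```python
import xml.etree.ElementTree as ET


def build_demo_tree() -> ET.Element:
    """Build a hypothetical pipeline tree: partition -> seed -> process -> step."""
    root = ET.Element("pipeline")
    for partition, seeds in (("p1", ("41", "42")), ("p2", ("41",))):
        part_el = ET.SubElement(root, "group", {"partition": partition})
        for seed in seeds:
            seed_el = ET.SubElement(part_el, "group", {"seed": seed})
            proc_el = ET.SubElement(seed_el, "process", {"name": "nn_training_a"})
            ET.SubElement(proc_el, "step", {"name": "train_and_evaluate_nn_classifier"})
    return root


def select(root: ET.Element, probe_path: str) -> list:
    # ElementTree rejects absolute paths on an element, so the leading "//"
    # is anchored at the root with a "." prefix.
    return root.findall("." + probe_path)


root = build_demo_tree()
specific = select(root, "//*[@partition='p1']/*[@seed='41']/*[@name='nn_training_a']")
any_branch = select(root, "//*[@partition]/*[@seed]/*[@name='nn_training_a']")
steps = select(root, "//*[@name='nn_training_a']/step[@name='train_and_evaluate_nn_classifier']")
print(len(specific), len(any_branch), len(steps))
```

A pattern that pins both partition and seed selects a single node, so the config key would be preserved; the attribute-only form matches every duplicated branch, which is the case where each resolved path becomes its own metrics key.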
@@ -158,7 +135,7 @@ Dynamic charts provide real-time, interactive visualizations.
 ```yaml
 # Under experiment.parameters.pipeline.processes:
         - name: "nn_losses"
-          script: "reporting_js"
+          code: "reporting_js.nn_losses"
           environment: "premier-league-env-reporting"
           chart_type: "dynamic"
           probe_paths: ...

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/docs/features/seed-parallelism.md RENAMED

@@ -1,10 +1,10 @@
-# Seed Parallelism
+### Seed Parallelism
 Seed parallelism duplicates the downstream process graph and runs each duplicate under a different random seed. It uses deterministic task-level seeding so that each duplicated branch is reproducible.
 ## Config
-### Predefined seed split (no `code_function`)
+### Predefined seed split (no `code`)
 ```yaml
 processes:
@@ -14,18 +14,17 @@ processes:
       seeds: [41, 42, 43]
 ```
-- The split process has no `code_function`; it will merge upstream outputs and pass them through.
+- The split process has no `code`; it will merge upstream outputs and pass them through.
 - The engine duplicates all downstream nodes until a `seed_aggregation` node is reached.
-- Duplicated nodes use a `_seed<value>` suffix (for example, `train_model_seed41`).
-- `script_path` is optional for split nodes with no `code_function`.
+- Duplicated nodes are tracked via structured `seed_value` metadata on each process instance.
-### User-defined seed split (`code_function`)
+### User-defined seed split (`code`)
 ```yaml
 processes:
   - name: "seed_parallel"
     description: "Custom seed hook"
-    code_function: "define_seed_parallel_process"
+    code: "define_seed_parallel_process"
     seed_parallelism:
       seeds: [41, 42, 43]
 ```
@@ -39,7 +38,6 @@ def define_seed_parallel_process(seeds, **inputs):
 ```
 - The `seeds` list is passed to the user-defined function as a kwarg.
-- `script_path` is required when using `code_function`.
 ## Aggregation
@@ -49,7 +47,7 @@ Add a dedicated aggregation process to collapse the **latest** seed-parallel lay
 processes:
   - name: "aggregate_seeds"
     seed_aggregation: true
-    code_function: "define_aggregate_seeds"
+    code: "define_aggregate_seeds"
 ```
 ### Input shape at aggregation
@@ -74,8 +72,11 @@ def define_aggregate_data_and_seed(metrics):
 ## Notes
-- Seed parallelism duplicates downstream nodes and is visible in the UI.
-- Task-level RNG seeding uses the most recent (innermost) seed layer when multiple
-  seed-parallel layers are nested.
-- Each aggregation collapses only the most recent seed-parallel layer; outer layers
-  remain as separate process nodes.
+- **UI representation**: Seed-parallel duplicates now use canonical XPath-style
+  process IDs in the process graph (for example
+  `//*[@partition='p1']/*[@seed='41']/process[@name='train']`), and human-readable
+  labels such as `train P1 S41` are derived from this metadata.
+- **Task-level RNG seeding**: Uses the most recent (innermost) seed layer when
+  multiple seed-parallel layers are nested.
+- **Aggregation behavior**: Each aggregation collapses only the most recent
+  seed-parallel layer; outer layers remain as separate process nodes.
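As an illustration of the UI note above, a label such as `train P1 S41` could be derived from a canonical process ID roughly like this. The sketch is an assumption throughout: `label_from_process_id` and the regular expression are hypothetical, and the label format is inferred from the single example given in the note.

```python
import re

# Matches XPath attribute predicates of the form @key='value'.
_PREDICATE = re.compile(r"@(\w+)='([^']*)'")


def label_from_process_id(process_id: str) -> str:
    """Turn an XPath-style process ID into a short human-readable label."""
    attrs = dict(_PREDICATE.findall(process_id))
    # Fall back to the raw ID when no name predicate is present.
    parts = [attrs.get("name", process_id)]
    if "partition" in attrs:
        parts.append(attrs["partition"].upper())  # "p1" becomes "P1"
    if "seed" in attrs:
        parts.append(f"S{attrs['seed']}")
    return " ".join(parts)


print(label_from_process_id("//*[@partition='p1']/*[@seed='41']/process[@name='train']"))
```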

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/docs/project-structure/configuration.md RENAMED

@@ -41,8 +41,13 @@ reporting:         # Chart entrypoints, probe paths, reporting environment
 You can omit some process fields and rely on defaults to keep config minimal:
-- **Script**: If a process does not set `script`, the **first key** in the top-level `scripts` section is used. List your main script first so most processes need no `script` field.
-- **code_function**: If omitted, it defaults to the process **name**. When your Python function name matches the process name (e.g. process `train_model` and function `train_model`), you can omit `code_function`.
+- **Scripts section**: The top-level `scripts` map still defines script keys to file paths. The **first key** is treated as the default script for processes that do not explicitly specify a script key in `code`.
+- **Code field**: Each process may define a single `code` field instead of separate `script` and `code_function` fields:
+  - `code: "script_key.function_name"` → use the script registered under `script_key` and call `function_name` from that module.
+  - `code: "function_name"` → use the first script key in `scripts` and call `function_name`.
+- **Omitted code**:
+  - For ordinary processes, if `code` is omitted the system assumes a function with the same name as the process, loaded from the default script.
+  - For data/seed split helper nodes (processes that only define `data_parallelism` or `seed_parallelism` and no `code`), the system treats them as function-less split nodes.
 Example minimal process that uses both defaults:
@@ -88,7 +93,7 @@ For detailed information on each configuration section:
 - **Process & Step Code**: [Model Code](model-code.md)
 - **Caching**: [Caching & Reproducibility](../features/caching.md)
 - **Backends**: [Backends](../advanced/backends.md)
-- **Reporting/Charts**: [Reporting Features](../features/reporting.md) and [Chart Generation](charts.md)
+- **Reporting/Charts**: [Reporting Features](../features/reporting.md)
 - **Cluster Execution**: [Cluster Configuration](../advanced/cluster-config.md) and [Distributed Computing](../features/distributed.md)
 ## Example Configurations

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/docs/project-structure/overview.md RENAMED

@@ -49,8 +49,6 @@ The `charts/` directory contains visualization code:
 - **plot_metrics.py**: Static PNG chart generation
 - **plot_metrics.js**: Dynamic interactive charts
-See [Chart Generation](charts.md) for details.
 ### Dependencies
 - **requirements.txt**: Main dependencies for training/inference

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/src/expops.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: expops
-Version: 0.1.16.dev0
+Version: 0.1.18.dev0
 Summary: MLOps Platform with step-based pipeline execution
 License:                     GNU GENERAL PUBLIC LICENSE
                                Version 3, 29 June 2007

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/src/expops.egg-info/SOURCES.txt RENAMED

@@ -16,7 +16,6 @@ docs/features/reporting.md
 docs/features/seed-parallelism.md
 docs/getting-started/creating-a-project.md
 docs/getting-started/quick-start.md
-docs/project-structure/charts.md
 docs/project-structure/configuration.md
 docs/project-structure/model-code.md
 docs/project-structure/overview.md
@@ -134,13 +133,19 @@ tests/unit/test_core/test_data_hashing.py
 tests/unit/test_core/test_executor_worker.py
 tests/unit/test_core/test_graph_expansion.py
 tests/unit/test_core/test_networkx_parser.py
+tests/unit/test_core/test_networkx_parser_code_field.py
 tests/unit/test_core/test_payload_spill.py
+tests/unit/test_core/test_pipeline_tree_steps.py
 tests/unit/test_core/test_prepare_runner_kwargs.py
+tests/unit/test_core/test_probe_path_selectors_xpath.py
 tests/unit/test_core/test_process_hashing.py
 tests/unit/test_core/test_step_state_manager.py
 tests/unit/test_core/test_step_system.py
 tests/unit/test_managers/__init__.py
 tests/unit/test_managers/test_reproducibility_manager.py
+tests/unit/test_platform/test_dynamic_js_charts.py
+tests/unit/test_platform/test_project_metadata_resilience.py
+tests/unit/test_reporting/test_entrypoint.py
 tests/unit/test_storage/__init__.py
 tests/unit/test_storage/test_factory.py
 tests/unit/test_storage/test_gcp_kv_store.py

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/src/mlops/_version.py RENAMED

@@ -28,7 +28,7 @@ version_tuple: VERSION_TUPLE
 commit_id: COMMIT_ID
 __commit_id__: COMMIT_ID
-__version__ = version = '0.1.16.dev0'
-__version_tuple__ = version_tuple = (0, 1, 16, 'dev0')
+__version__ = version = '0.1.18.dev0'
+__version_tuple__ = version_tuple = (0, 1, 18, 'dev0')
-__commit_id__ = commit_id = 'ga8d9fd60b'
+__commit_id__ = commit_id = 'g45a2e6dab'

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/src/mlops/adapters/custom/custom_adapter.py RENAMED

@@ -181,20 +181,34 @@ class CustomModelAdapter(ModelAdapter):
             probe_paths = proc.get("probe_paths")
             chart_type = proc.get("chart_type")
             is_chart = (proc_type == "chart") or (probe_paths is not None) or (chart_type is not None)
+            # Derive script key from unified `code` field where possible.
+            raw_code = proc.get("code")
+            code_str = str(raw_code).strip() if isinstance(raw_code, str) else ""
+            has_explicit_code = bool(code_str)
+            has_data_parallelism = proc.get("data_parallelism") is not None
+            has_seed_parallelism = proc.get("seed_parallelism") is not None
+            is_default_split = (not has_explicit_code) and (has_data_parallelism or has_seed_parallelism)
+            script_key: str | None = None
+            if has_explicit_code:
+                if "." in code_str:
+                    script_key, _ = code_str.rsplit(".", 1)
+                    script_key = script_key or None
+                else:
+                    # No explicit script key: use default script
+                    script_key = None
             if is_chart:
-                script_key = proc.get("script")
                 script_path = resolve_script_path(script_key, scripts_section, default_script) if scripts_section else None
                 if script_path and str(script_path).lower().endswith(".py"):
                     chart_script_paths.append(str(script_path))
                 continue
-            script_key = proc.get("script")
             script_path = resolve_script_path(script_key, scripts_section, default_script) if scripts_section else None
-            has_code_function = bool(proc.get("code_function"))
-            has_data_parallelism = proc.get("data_parallelism") is not None
-            has_seed_parallelism = proc.get("seed_parallelism") is not None
-            is_default_split = (not has_code_function) and (has_data_parallelism or has_seed_parallelism)
             if not script_path:
                 if is_default_split:
+                    # Default data/seed split nodes without explicit code may omit scripts.
                     continue
                 missing_script_paths.append(str(proc.get("name", "unknown")))
                 continue
@@ -204,7 +218,7 @@ class CustomModelAdapter(ModelAdapter):
             missing = ", ".join(sorted(set(missing_script_paths)))
             raise ValueError(
                 "script must be specified for non-chart processes "
-                "(except default data/seed split nodes without code_function): "
+                "(except default data/seed split nodes without explicit code): "
                 f"{missing}"
             )
         if not script_paths:
@@ -563,15 +577,16 @@ class CustomModelAdapter(ModelAdapter):
                 proc_type = str(proc.get("type", "process"))
                 if proc_type == "chart":
                     continue
-                # script is optional - default script will be used if omitted
-                has_code_function = bool(proc.get("code_function"))
+                raw_code = proc.get("code")
+                code_str = str(raw_code).strip() if isinstance(raw_code, str) else ""
+                has_explicit_code = bool(code_str)
                 has_data_parallelism = proc.get("data_parallelism") is not None
                 has_seed_parallelism = proc.get("seed_parallelism") is not None
-                is_default_split = (not has_code_function) and (has_data_parallelism or has_seed_parallelism)
+                is_default_split = (not has_explicit_code) and (has_data_parallelism or has_seed_parallelism)
                 if is_default_split:
                     continue
-                # For non-split processes, script key is optional (defaults to first script)
-                # Validation passes if script key exists or will use default
+                # For non-split processes, we require either explicit code or default
+                # behavior (code omitted but no split flags). In both cases validation passes.
             return True
         except Exception:
             return False
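The script-validation rule changed in this file (charts and function-less split nodes are exempt; every other process needs a resolvable script) can be condensed into a standalone sketch. It is an approximation under assumptions: `processes_missing_script` is a hypothetical name, and a plain dictionary lookup stands in for the real `resolve_script_path`, whose body is not part of this diff.

```python
from typing import Any, Dict, List, Mapping


def is_default_split(proc: Mapping[str, Any]) -> bool:
    """True for split helper nodes that set no explicit `code`."""
    raw_code = proc.get("code")
    has_explicit_code = isinstance(raw_code, str) and bool(raw_code.strip())
    has_split = (
        proc.get("data_parallelism") is not None
        or proc.get("seed_parallelism") is not None
    )
    return (not has_explicit_code) and has_split


def processes_missing_script(
    processes: List[Dict[str, Any]], scripts: Mapping[str, str]
) -> List[str]:
    """Return names of processes whose script cannot be resolved."""
    default_key = next(iter(scripts), None)  # first key is the default script
    missing = []
    for proc in processes:
        is_chart = (
            proc.get("type") == "chart"
            or proc.get("probe_paths") is not None
            or proc.get("chart_type") is not None
        )
        if is_chart or is_default_split(proc):
            continue
        raw_code = proc.get("code")
        code_str = raw_code.strip() if isinstance(raw_code, str) else ""
        script_key = code_str.rsplit(".", 1)[0] if "." in code_str else None
        # Simplified stand-in for resolve_script_path: plain map lookup.
        if not scripts.get(script_key or default_key):
            missing.append(str(proc.get("name", "unknown")))
    return sorted(set(missing))


procs = [
    {"name": "split", "data_parallelism": {"size": [50, 20, 20]}},
    {"name": "train", "code": "other.train"},
    {"name": "eval"},
    {"name": "chart", "probe_paths": {}},
]
print(processes_missing_script(procs, {"main": "model.py"}))
```

In the sample, only `train` is reported, because its script key `other` is not registered; `eval` falls back to the default script and the split and chart entries are skipped.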

{expops-0.1.16.dev0 → expops-0.1.18.dev0}/src/mlops/core/executor_worker.py RENAMED

@@ -30,7 +30,12 @@ def _apply_hash_overrides(proc_payload: Dict[str, Any], config_hash: Optional[st
 def _strip_internal_keys(value: Any) -> Any:
     if isinstance(value, dict):
-        keep_keys = {"__data_partition_hashes__", "__data_partition_data_name__", "__data_hash__"}
+        keep_keys = {
+            "__data_partition_hashes__",
+            "__data_partition_data_name__",
+            "__data_hash__",
+            "__parallel_context__",
+        }
         return {k: v for k, v in value.items() if (k in keep_keys) or (not str(k).startswith("__"))}
     return value
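The effect of adding `__parallel_context__` to `keep_keys` can be checked with a standalone copy of the new function body. The function logic is taken from the hunk above; the leading underscore is dropped from the name and the sample payload is made up for illustration.

```python
from typing import Any


def strip_internal_keys(value: Any) -> Any:
    """Standalone copy of the 0.1.18.dev0 filtering logic shown above."""
    if isinstance(value, dict):
        keep_keys = {
            "__data_partition_hashes__",
            "__data_partition_data_name__",
            "__data_hash__",
            "__parallel_context__",
        }
        # Keep allow-listed internal keys plus every key without a "__" prefix.
        return {
            k: v
            for k, v in value.items()
            if (k in keep_keys) or (not str(k).startswith("__"))
        }
    return value


# Hypothetical result payload.
payload = {
    "accuracy": 0.91,
    "__data_hash__": "abc123",
    "__parallel_context__": {"seed_layers": [{"source": "seed_parallel", "seed_value": 41}]},
    "__scratch__": "dropped",
}
cleaned = strip_internal_keys(payload)
print(sorted(cleaned))
```

Only `__scratch__` is removed: the user-facing key and the allow-listed internal keys, including the newly added `__parallel_context__`, pass through.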
@@ -559,8 +564,23 @@ def _prepare_runner_kwargs(
                 should_select = True
             if should_select:
                 dep_result = _select_partition_value(dep_result, data_name, int(partition_index))
-        part_idx = _partition_index_from_name(dep_name)
-        seed_val = _seed_value_from_name(dep_name)
+        part_idx = None
+        seed_val = None
+        try:
+            if isinstance(dep_result, dict):
+                part_idx = dep_result.get("__partition_index__")
+                seed_val = dep_result.get("__seed_value__")
+        except Exception:
+            part_idx = None
+            seed_val = None
+        # Fallback to parsing partition/seed indices from the dependency name when
+        # structured metadata is not present on the result payload. This keeps real
+        # metadata authoritative while allowing name-based aggregation for simple
+        # dict results (e.g. in tests or lightweight contexts).
+        if part_idx is None:
+            part_idx = _partition_index_from_name(dep_name)
+        if seed_val is None:
+            seed_val = _seed_value_from_name(dep_name)
         for key, value in dep_result.items():
             if isinstance(key, str) and key.startswith("__"):
                 continue
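The lookup order introduced in this hunk (structured metadata first, dependency name as fallback) can be sketched for the seed case. Note the assumptions: `seed_from_name` is a hypothetical stand-in for `_seed_value_from_name`, whose body is not shown in this diff, and the `_seed<value>` suffix pattern is taken from the naming described in the 0.1.16 documentation.

```python
import re
from typing import Any, Optional

# Assumed naming convention from the older docs, e.g. "train_model_seed41".
_SEED_SUFFIX = re.compile(r"_seed(\d+)$")


def seed_from_name(dep_name: str) -> Optional[int]:
    """Hypothetical name parser; the real helper may differ."""
    match = _SEED_SUFFIX.search(dep_name)
    return int(match.group(1)) if match else None


def resolve_seed(dep_name: str, dep_result: Any) -> Optional[int]:
    """Prefer `__seed_value__` on the payload, then fall back to the name."""
    seed_val = None
    if isinstance(dep_result, dict):
        seed_val = dep_result.get("__seed_value__")
    if seed_val is None:
        seed_val = seed_from_name(dep_name)
    return seed_val


print(resolve_seed("train_model_seed41", {"accuracy": 0.9}))
print(resolve_seed("train_model_seed41", {"__seed_value__": 7}))
print(resolve_seed("train_model", {}))
```

The second call shows why the metadata is described as authoritative: the payload value wins even when the name carries a different suffix.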
@@ -948,8 +968,6 @@ def _execute_process_on_worker(ctx: Any, proc_payload: Dict[str, Any], run_id: O
     seed_override = None
     if isinstance(proc_payload, dict):
         seed_override = proc_payload.get("seed_value")
-    if seed_override is None:
-        seed_override = _seed_value_from_name(process_name or "")
     _seed_rng_for_task(run_id, process_name, None, 0, seed_override=seed_override)
     dp_cfg = _normalize_data_parallelism_cfg(proc_payload.get("data_parallelism") if isinstance(proc_payload, dict) else None)
@@ -960,8 +978,7 @@ def _execute_process_on_worker(ctx: Any, proc_payload: Dict[str, Any], run_id: O
         or (isinstance(parallel_ctx, dict) and parallel_ctx.get("role") == "partition")
     )
     partition_index = proc_payload.get("partition_index") if isinstance(proc_payload, dict) else None
-    if partition_index is None:
-        partition_index = _partition_index_from_name(process_name or "")
+    seed_value = proc_payload.get("seed_value") if isinstance(proc_payload, dict) else None
     from .step_system import get_process_registry as _get_pr, set_current_context as _set_ctx, set_current_process_context as _set_proc
     import io as _io
@@ -1302,8 +1319,6 @@ def _execute_process_on_worker(ctx: Any, proc_payload: Dict[str, Any], run_id: O
                 allow_seed_agg = bool(proc_payload.get("seed_aggregation")) if isinstance(proc_payload, dict) else False
                 extra_inputs = {}
                 seed_value = proc_payload.get("seed_value") if isinstance(proc_payload, dict) else None
-                if seed_value is None:
-                    seed_value = _seed_value_from_name(process_name or "")
                 if seed_value is not None:
                     try:
                         extra_inputs["random_seed"] = int(seed_value)
@@ -1370,9 +1385,61 @@ def _execute_process_on_worker(ctx: Any, proc_payload: Dict[str, Any], run_id: O
             _captured = None
         return ExecutionResult(name=process_name, result=_captured, execution_time=exec_time, was_cached=False, error=f"Process '{process_name}' must return a dictionary, got {type(ret).__name__}.")
-    if data_hash and isinstance(ret, dict):
+    if isinstance(ret, dict):
+        # Attach structured parallelism metadata to the result so downstream
+        # aggregation can rely on it instead of parsing name suffixes.
+        try:
+            if partition_index is not None:
+                ret.setdefault("__partition_index__", partition_index)
+        except Exception:
+            pass
         try:
-            ret.setdefault("__data_hash__", data_hash)
+            if seed_value is not None:
+                ret.setdefault("__seed_value__", seed_value)
+        except Exception:
+            pass
+        try:
+            if data_hash:
+                ret.setdefault("__data_hash__", data_hash)
+        except Exception:
+            pass
+        # Expose the full layered parallel context (data/seed) on the result
+        # payload so downstream consumers can reason about multi-layer
+        # partition/seed structure without relying on process names.
+        try:
+            if isinstance(parallel_ctx, dict):
+                data_layers_src = parallel_ctx.get("data_layers")
+                seed_layers_src = parallel_ctx.get("seed_layers")
+                layers: Dict[str, Any] = {}
+                if isinstance(data_layers_src, list) and data_layers_src:
+                    dl_clean: list[dict[str, Any]] = []
+                    for layer in data_layers_src:
+                        if not isinstance(layer, dict):
+                            continue
+                        dl_clean.append(
+                            {
+                                "source": layer.get("source"),
+                                "partition_index": layer.get("partition_index"),
+                                "data_name": layer.get("data_name"),
+                            }
+                        )
+                    if dl_clean:
+                        layers["data_layers"] = dl_clean
+                if isinstance(seed_layers_src, list) and seed_layers_src:
+                    sl_clean: list[dict[str, Any]] = []
+                    for layer in seed_layers_src:
+                        if not isinstance(layer, dict):
+                            continue
+                        sl_clean.append(
+                            {
+                                "source": layer.get("source"),
+                                "seed_value": layer.get("seed_value"),
+                            }
+                        )
+                    if sl_clean:
+                        layers["seed_layers"] = sl_clean
+                if layers and "__parallel_context__" not in ret:
+                    ret["__parallel_context__"] = layers
         except Exception:
             pass
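A condensed, standalone approximation of the metadata attachment added in this hunk is shown below. It is a sketch, not the shipped code: `attach_parallel_metadata` is a hypothetical name, the `__data_hash__` handling and the defensive `try`/`except` blocks are omitted for brevity, and the sample values are invented.

```python
from typing import Any, Dict, Optional


def attach_parallel_metadata(
    ret: Dict[str, Any],
    partition_index: Optional[int],
    seed_value: Optional[int],
    parallel_ctx: Any,
) -> Dict[str, Any]:
    """Attach partition/seed metadata without overwriting existing keys."""
    if partition_index is not None:
        ret.setdefault("__partition_index__", partition_index)
    if seed_value is not None:
        ret.setdefault("__seed_value__", seed_value)
    layers: Dict[str, Any] = {}
    if isinstance(parallel_ctx, dict):
        wanted = (
            ("data_layers", ("source", "partition_index", "data_name")),
            ("seed_layers", ("source", "seed_value")),
        )
        for key, fields in wanted:
            src = parallel_ctx.get(key)
            if isinstance(src, list):
                # Copy only the known fields and skip non-dict entries.
                clean = [
                    {f: layer.get(f) for f in fields}
                    for layer in src
                    if isinstance(layer, dict)
                ]
                if clean:
                    layers[key] = clean
    if layers and "__parallel_context__" not in ret:
        ret["__parallel_context__"] = layers
    return ret


ctx = {
    "role": "partition",
    "seed_layers": [{"source": "seed_parallel", "seed_value": 41, "extra": "x"}, "bad"],
}
result = attach_parallel_metadata({"loss": 0.1, "__seed_value__": 7}, 0, 41, ctx)
print(result)
```

Because `setdefault` is used, a `__seed_value__` already present on the returned dictionary is kept, and only the whitelisted layer fields are copied into `__parallel_context__`.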
@@ -1487,6 +1554,49 @@ def _run_chart_process_on_worker(ctx: Any, proc_payload: Dict[str, Any], run_id:
                         break
         except Exception:
             pass
+    def _probe_paths_preview(value: Any, max_items: int = 8) -> str:
+        try:
+            if not isinstance(value, dict):
+                return str(value)
+            parts: list[str] = []
+            for idx, (k, v) in enumerate(value.items()):
+                if idx >= max_items:
+                    parts.append("...")
+                    break
+                parts.append(f"{k}=>{str(v)[:140]}")
+            return ", ".join(parts)
+        except Exception:
+            return "<unavailable>"
+    # Upgrade probe_paths to canonical XPath-based keys so they match the
+    # metrics written by the StepStateManager. This mirrors the behavior used
+    # for dynamic charts and the web UI chart-config endpoint.
+    try:
+        if chart_spec.get('probe_paths'):
+            gcfg = getattr(ctx, 'global_config', {}) if ctx else {}
+            pipeline_cfg = (gcfg.get('pipeline') or {}) if isinstance(gcfg, dict) else {}
+            if isinstance(pipeline_cfg, dict) and pipeline_cfg:
+                from mlops.core.networkx_parser import parse_networkx_pipeline_from_config
+                from mlops.core.graph_expansion import expand_process_graph
+                from mlops.core.probe_path_selectors import expand_probe_paths
+                nx_cfg = parse_networkx_pipeline_from_config(pipeline_cfg)
+                expanded_procs = expand_process_graph(nx_cfg.processes or [])
+                if expanded_procs:
+                    chart_spec['probe_paths'] = expand_probe_paths(chart_spec.get('probe_paths') or {}, expanded_procs)
+                    logger.info(
+                        "[Charts] Probe paths resolved for '%s': %s",
+                        name,
+                        _probe_paths_preview(chart_spec.get('probe_paths')),
+                    )
+    except Exception as exc:
+        # Best-effort only; fall back to raw probe_paths if expansion fails.
+        logger.warning(
+            "[Charts] Failed to expand probe_paths for '%s': %s. Using raw probe_paths: %s",
+            name,
+            exc,
+            _probe_paths_preview(chart_spec.get('probe_paths')),
+        )
     def _load_env_python_map() -> dict[str, str]:
         try:
             raw = os.environ.get("MLOPS_ENV_PYTHON_MAP") or ""
@@ -1722,8 +1832,12 @@ def _worker_execute_step_task(step_name: str, process_name: Optional[str], conte
         logger.warning(f"[Distributed] Worker state manager init failed for step {step_name}: {e}")
     set_current_context(ctx)
-    # Deterministic task-level seeding (step scope)
-    seed_override = _seed_value_from_name(process_name or "")
+    # Deterministic task-level seeding (step scope) using structured seed value
+    seed_override = None
+    try:
+        seed_override = getattr(ctx, "shared_state", {}).get("seed_value")
+    except Exception:
+        seed_override = None
     _seed_rng_for_task(run_id, process_name, step_name, iteration, seed_override=seed_override)
     try:
         from .step_system import get_current_state_manager as _get_sm
@@ -1899,8 +2013,6 @@ def _worker_execute_process_task(proc_payload: Dict[str, Any], context_payload:
         seed_override = None
         if isinstance(proc_payload, dict):
             seed_override = proc_payload.get("seed_value")
-        if seed_override is None:
-            seed_override = _seed_value_from_name(process_name or "")
         _seed_rng_for_task(run_id, process_name, None, 0, seed_override=seed_override)
         # Process start is now recorded inside _execute_process_on_worker at the exact timing start
