euler-preprocess 2.1.0.tar.gz → 2.2.0.tar.gz (PyPI)


Files changed (48)

{euler_preprocess-2.1.0 → euler_preprocess-2.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: euler-preprocess
-Version: 2.1.0
+Version: 2.2.0
 Summary: Physics-based preprocessing (fog, etc.) for RGB+depth datasets
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
@@ -49,6 +49,7 @@ Every subcommand takes a **dataset config** JSON that points to the input data a
   "transform_config_path": "configs/run1.json",
   "output_path": "/path/to/output",
   "output_slot": "rgb",
+  "sample": 42,
   "modalities": {
     "rgb": {"path": "/path/to/rgb", "split": "train"},
     "depth": "/path/to/depth",
@@ -78,6 +79,8 @@ Every subcommand takes a **dataset config** JSON that points to the input data a
 | `transform_config_path` | Path to the transform-specific config (see below). `fog_config_path` is also accepted for backward compatibility. |
 | `output_path` | Output root used when no pipeline target overrides it. Optional if `pipeline.output_root` or `pipeline.output_targets[].path` supplies the destination. |
 | `output_slot` | Optional slot selector when `pipeline.output_targets` contains multiple entries. Defaults to `rgb` for `fog`, `depth` for `sky-depth`, and `depth` for `radial`. |
+| `sample` | Optional 0-based euler-loading dataset index. When set, only `dataset[sample]` is transformed, which is useful for small augmented benchmark slices from large datasets. |
+| `samples` | Optional multi-sample selector. Use a list of 0-based indices (`[0, 10, 20]`) or a slice object such as `{"start": 0, "stop": 1000, "step": 2, "count": 100}`. `stop` is exclusive; `count` caps the selected indices after slicing. Do not set both `sample` and `samples`. |
 | `modalities` | Regular modalities that participate in sample-ID intersection. Each value is either a plain path string or an object with a `path` key and an optional `split` key (see below). Which modalities are required depends on the transform (see table below). |
 | `hierarchical_modalities` | Per-scene data (e.g. intrinsics). Same format as `modalities`. Loaded once per scene and cached. |
 | `pipeline` | Optional runtime routing block compatible with `euler-inference` (`output_root`, `outputs_manifest_path`, `output_targets`). |
@@ -86,6 +89,11 @@ Every subcommand takes a **dataset config** JSON that points to the input data a
 When a modality directory contains [ds-crawler](https://github.com/d-rothen/ds-crawler) split files (`.ds_crawler/split_<name>.json`), you can select a subset of the data by setting the `split` key on that modality. Sample IDs are matched by intersection across all modalities, so specifying a split on a single modality is sufficient to restrict the entire dataset.
+For quick slices after euler-loading has matched modalities, set `samples`.
+For example, `{"samples": {"step": 2}}` processes every second matched sample,
+and `{"samples": {"start": 10, "step": 5, "count": 20}}` processes 20 samples
+starting at index 10 with stride 5.
 **Required modalities per transform:**
 | Transform | `modalities` | `hierarchical_modalities` |
@@ -142,6 +150,7 @@ Controls the fog simulation.
   "contrast_threshold": 0.05,
   "device": "cpu",
   "gpu_batch_size": 4,
+  "augmentations": { ... },
   "selection": { ... },
   "models": { ... }
 }
@@ -156,6 +165,7 @@ Controls the fog simulation.
 | `contrast_threshold` | Threshold *C_t* used in the visibility-to-attenuation conversion (default `0.05`). |
 | `device` | `"cpu"`, `"cuda"`, `"mps"`, or `"gpu"` (alias for cuda). |
 | `gpu_batch_size` | Batch size when running on GPU. Uniform-model samples are batched; heterogeneous samples are processed individually. |
+| `augmentations` | Optional stepped augmentation set. When present, every input sample produces every configured augmentation and uses the file-id hierarchy output layout described below. |
 ### Fog Model
@@ -256,6 +266,65 @@ Each model specifies a `visibility_m` distribution from which a visibility dista
 The sampled visibility *V* is converted to the attenuation coefficient: **k = -ln(C_t) / V**.
+### Stepped Augmentations
+For benchmark generation, set `augmentations` in the fog config. This switches
+the fog transform from one sampled output per input to one output per configured
+variant:
+```json
+{
+  "airlight": "from_sky",
+  "seed": 1337,
+  "contrast_threshold": 0.05,
+  "augmentations": {
+    "file_id_hierarchy_name": "file_id",
+    "attribute_key": "fog_augmentation",
+    "models": ["uniform"],
+    "visibility_m": [10, 20, 40, 70, 100],
+    "airlight_methods": ["from_sky"]
+  }
+}
+```
+The matrix form above expands as the Cartesian product of `models`,
+`visibility_m` (MOR in metres), optional `scattering_coefficients` / `beta`, and
+airlight choices. `file_id_hierarchy_name` names the inserted hierarchy level
+when the underlying ds-crawler writer has a hierarchy separator; the directory
+name is the source file id in either case. For tighter control, use explicit
+variants:
+```json
+"augmentations": {
+  "variants": [
+    {
+      "id": "mor_010m_sky",
+      "model": "uniform",
+      "visibility_m": 10,
+      "airlight_method": "from_sky"
+    },
+    {
+      "id": "beta_0.15_white",
+      "model": "heterogeneous_k",
+      "scattering_coefficient": 0.15,
+      "atmospheric_light": [1.0, 1.0, 1.0],
+      "k_hetero": {
+        "scales": "auto",
+        "min_factor": 0.5,
+        "max_factor": 1.5,
+        "normalize_to_mean": true
+      }
+    }
+  ]
+}
+```
+Each output entry receives per-file ds-crawler attributes under
+`fog_augmentation`, including the augmentation id, source id, source full id,
+model, actual scattering coefficient, actual atmospheric light, and configured
+MOR/beta descriptors when available. euler-loading exposes these as
+`sample["attributes"]["rgb"]["fog_augmentation"]`.
 ### Heterogeneous Noise Fields
 Both `k_hetero` and `ls_hetero` use Perlin FBM (fractional Brownian motion) to generate spatially-varying factor fields:
@@ -297,6 +366,22 @@ When a pipeline target is present, `pipeline.output_targets[].path` replaces
 `output_path` entirely. Standalone/direct `FogTransform(...)` usage without the
 CLI still uses the legacy per-model layout with `config.json` sidecars.
+With `augmentations` enabled, source-backed outputs are written one level below
+the source file id instead:
+```
+<output_path>/
+  .ds_crawler/output.json
+  Scene01/
+    Camera_0/
+      00000/
+        mor_10m_airlight_from_sky.png
+        mor_20m_airlight_from_sky.png
+```
+Auxiliary `scattering_coefficient` and `atmospheric_light` pipeline targets use
+the same file-id hierarchy and write matching `.npy` augmentation files.
 ---
 ## Sky-Depth Transform
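
Note on the `visibility_m` steps added above: the README states that visibility *V* is converted to the attenuation coefficient as k = -ln(C_t) / V. The sketch below only works through that arithmetic for the five MOR values in the matrix example; it is not code from the package, and `attenuation_from_visibility` is a hypothetical helper name chosen for illustration.

```python
import math


def attenuation_from_visibility(visibility_m: float,
                                contrast_threshold: float = 0.05) -> float:
    """Apply k = -ln(C_t) / V as written in the README (hypothetical helper)."""
    if visibility_m <= 0:
        raise ValueError("visibility_m must be positive")
    if not 0.0 < contrast_threshold < 1.0:
        raise ValueError("contrast_threshold must lie strictly between 0 and 1")
    return -math.log(contrast_threshold) / visibility_m


# MOR steps taken from the matrix-form example in the README diff.
for visibility in (10, 20, 40, 70, 100):
    k = attenuation_from_visibility(visibility)
    print(f"V = {visibility:>3} m -> k = {k:.4f} 1/m")
```

With the default `contrast_threshold` of 0.05, a 10 m visibility corresponds to k of roughly 0.30 per metre, and k falls in inverse proportion to V across the remaining steps.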

{euler_preprocess-2.1.0 → euler_preprocess-2.2.0}/README.md RENAMED

@@ -35,6 +35,7 @@ Every subcommand takes a **dataset config** JSON that points to the input data a
   "transform_config_path": "configs/run1.json",
   "output_path": "/path/to/output",
   "output_slot": "rgb",
+  "sample": 42,
   "modalities": {
     "rgb": {"path": "/path/to/rgb", "split": "train"},
     "depth": "/path/to/depth",
@@ -64,6 +65,8 @@ Every subcommand takes a **dataset config** JSON that points to the input data a
 | `transform_config_path` | Path to the transform-specific config (see below). `fog_config_path` is also accepted for backward compatibility. |
 | `output_path` | Output root used when no pipeline target overrides it. Optional if `pipeline.output_root` or `pipeline.output_targets[].path` supplies the destination. |
 | `output_slot` | Optional slot selector when `pipeline.output_targets` contains multiple entries. Defaults to `rgb` for `fog`, `depth` for `sky-depth`, and `depth` for `radial`. |
+| `sample` | Optional 0-based euler-loading dataset index. When set, only `dataset[sample]` is transformed, which is useful for small augmented benchmark slices from large datasets. |
+| `samples` | Optional multi-sample selector. Use a list of 0-based indices (`[0, 10, 20]`) or a slice object such as `{"start": 0, "stop": 1000, "step": 2, "count": 100}`. `stop` is exclusive; `count` caps the selected indices after slicing. Do not set both `sample` and `samples`. |
 | `modalities` | Regular modalities that participate in sample-ID intersection. Each value is either a plain path string or an object with a `path` key and an optional `split` key (see below). Which modalities are required depends on the transform (see table below). |
 | `hierarchical_modalities` | Per-scene data (e.g. intrinsics). Same format as `modalities`. Loaded once per scene and cached. |
 | `pipeline` | Optional runtime routing block compatible with `euler-inference` (`output_root`, `outputs_manifest_path`, `output_targets`). |
@@ -72,6 +75,11 @@ Every subcommand takes a **dataset config** JSON that points to the input data a
 When a modality directory contains [ds-crawler](https://github.com/d-rothen/ds-crawler) split files (`.ds_crawler/split_<name>.json`), you can select a subset of the data by setting the `split` key on that modality. Sample IDs are matched by intersection across all modalities, so specifying a split on a single modality is sufficient to restrict the entire dataset.
+For quick slices after euler-loading has matched modalities, set `samples`.
+For example, `{"samples": {"step": 2}}` processes every second matched sample,
+and `{"samples": {"start": 10, "step": 5, "count": 20}}` processes 20 samples
+starting at index 10 with stride 5.
 **Required modalities per transform:**
 | Transform | `modalities` | `hierarchical_modalities` |
@@ -128,6 +136,7 @@ Controls the fog simulation.
   "contrast_threshold": 0.05,
   "device": "cpu",
   "gpu_batch_size": 4,
+  "augmentations": { ... },
   "selection": { ... },
   "models": { ... }
 }
@@ -142,6 +151,7 @@ Controls the fog simulation.
 | `contrast_threshold` | Threshold *C_t* used in the visibility-to-attenuation conversion (default `0.05`). |
 | `device` | `"cpu"`, `"cuda"`, `"mps"`, or `"gpu"` (alias for cuda). |
 | `gpu_batch_size` | Batch size when running on GPU. Uniform-model samples are batched; heterogeneous samples are processed individually. |
+| `augmentations` | Optional stepped augmentation set. When present, every input sample produces every configured augmentation and uses the file-id hierarchy output layout described below. |
 ### Fog Model
@@ -242,6 +252,65 @@ Each model specifies a `visibility_m` distribution from which a visibility dista
 The sampled visibility *V* is converted to the attenuation coefficient: **k = -ln(C_t) / V**.
+### Stepped Augmentations
+For benchmark generation, set `augmentations` in the fog config. This switches
+the fog transform from one sampled output per input to one output per configured
+variant:
+```json
+{
+  "airlight": "from_sky",
+  "seed": 1337,
+  "contrast_threshold": 0.05,
+  "augmentations": {
+    "file_id_hierarchy_name": "file_id",
+    "attribute_key": "fog_augmentation",
+    "models": ["uniform"],
+    "visibility_m": [10, 20, 40, 70, 100],
+    "airlight_methods": ["from_sky"]
+  }
+}
+```
+The matrix form above expands as the Cartesian product of `models`,
+`visibility_m` (MOR in metres), optional `scattering_coefficients` / `beta`, and
+airlight choices. `file_id_hierarchy_name` names the inserted hierarchy level
+when the underlying ds-crawler writer has a hierarchy separator; the directory
+name is the source file id in either case. For tighter control, use explicit
+variants:
+```json
+"augmentations": {
+  "variants": [
+    {
+      "id": "mor_010m_sky",
+      "model": "uniform",
+      "visibility_m": 10,
+      "airlight_method": "from_sky"
+    },
+    {
+      "id": "beta_0.15_white",
+      "model": "heterogeneous_k",
+      "scattering_coefficient": 0.15,
+      "atmospheric_light": [1.0, 1.0, 1.0],
+      "k_hetero": {
+        "scales": "auto",
+        "min_factor": 0.5,
+        "max_factor": 1.5,
+        "normalize_to_mean": true
+      }
+    }
+  ]
+}
+```
+Each output entry receives per-file ds-crawler attributes under
+`fog_augmentation`, including the augmentation id, source id, source full id,
+model, actual scattering coefficient, actual atmospheric light, and configured
+MOR/beta descriptors when available. euler-loading exposes these as
+`sample["attributes"]["rgb"]["fog_augmentation"]`.
 ### Heterogeneous Noise Fields
 Both `k_hetero` and `ls_hetero` use Perlin FBM (fractional Brownian motion) to generate spatially-varying factor fields:
@@ -283,6 +352,22 @@ When a pipeline target is present, `pipeline.output_targets[].path` replaces
 `output_path` entirely. Standalone/direct `FogTransform(...)` usage without the
 CLI still uses the legacy per-model layout with `config.json` sidecars.
+With `augmentations` enabled, source-backed outputs are written one level below
+the source file id instead:
+```
+<output_path>/
+  .ds_crawler/output.json
+  Scene01/
+    Camera_0/
+      00000/
+        mor_10m_airlight_from_sky.png
+        mor_20m_airlight_from_sky.png
+```
+Auxiliary `scattering_coefficient` and `atmospheric_light` pipeline targets use
+the same file-id hierarchy and write matching `.npy` augmentation files.
 ---
 ## Sky-Depth Transform
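
Note on the matrix form documented above: the README says it expands as the Cartesian product of `models`, `visibility_m`, and the airlight choices. The following is a minimal sketch of such an expansion, not the package's implementation. The function name `expand_matrix`, the fallback defaults, and the id pattern (inferred from the `mor_10m_airlight_from_sky.png` file names in the output layout example) are all assumptions, and the optional `scattering_coefficients` / `beta` axis is left out for brevity.

```python
from itertools import product


def expand_matrix(augmentations: dict) -> list[dict]:
    """Expand a matrix-form block into one variant per combination (sketch)."""
    models = augmentations.get("models", ["uniform"])                # assumed default
    visibilities = augmentations.get("visibility_m", [])
    airlights = augmentations.get("airlight_methods", ["from_sky"])  # assumed default

    variants = []
    for model, visibility, airlight in product(models, visibilities, airlights):
        variants.append({
            # Assumed id pattern; the real naming scheme may differ.
            "id": f"mor_{visibility}m_airlight_{airlight}",
            "model": model,
            "visibility_m": visibility,
            "airlight_method": airlight,
        })
    return variants


matrix = {
    "models": ["uniform"],
    "visibility_m": [10, 20, 40, 70, 100],
    "airlight_methods": ["from_sky"],
}
for variant in expand_matrix(matrix):
    print(variant["id"])
```

For the README's example this yields 1 × 5 × 1 = 5 variants per input sample. The assumed id pattern omits the model name, so it would produce colliding ids if more than one model were listed.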

{euler_preprocess-2.1.0 → euler_preprocess-2.2.0}/euler_preprocess/cli.py RENAMED

@@ -8,7 +8,9 @@ from __future__ import annotations
 import argparse
 import inspect
 import json
+from collections.abc import Iterable, Iterator, Sequence
 from pathlib import Path
+from typing import Any
 from euler_preprocess.common.dataset import build_dataset
 from euler_preprocess.common.logging import get_logger, log_dataset_info
@@ -27,6 +29,132 @@ def _resolve(path_str: str, config_dir: Path) -> Path:
     return (config_dir / p).resolve()
+class _SelectedSamples(Sequence):
+    """Lazy view over selected euler-loading dataset entries."""
+    def __init__(self, dataset, indices: Iterable[int]) -> None:
+        self.dataset = dataset
+        self.indices = tuple(indices)
+    def __len__(self) -> int:
+        return len(self.indices)
+    def __iter__(self) -> Iterator[dict]:
+        for index in self.indices:
+            yield self.dataset[index]
+    def __getitem__(self, index: int | slice):
+        if isinstance(index, slice):
+            return [self.dataset[i] for i in self.indices[index]]
+        return self.dataset[self.indices[index]]
+def _validate_sample_index(value: Any, *, key: str, dataset_size: int) -> int:
+    if isinstance(value, bool) or not isinstance(value, int):
+        raise ValueError(f"{key} must be a non-negative integer index")
+    if value < 0:
+        raise ValueError(f"{key} must be a non-negative integer index")
+    if value >= dataset_size:
+        raise IndexError(
+            f"{key} {value} out of range for dataset of length {dataset_size}"
+        )
+    return value
+def _positive_int(value: Any, *, key: str) -> int:
+    if isinstance(value, bool) or not isinstance(value, int):
+        raise ValueError(f"{key} must be a positive integer")
+    if value <= 0:
+        raise ValueError(f"{key} must be a positive integer")
+    return value
+def _non_negative_int(value: Any, *, key: str) -> int:
+    if isinstance(value, bool) or not isinstance(value, int):
+        raise ValueError(f"{key} must be a non-negative integer")
+    if value < 0:
+        raise ValueError(f"{key} must be a non-negative integer")
+    return value
+def _resolve_sample_indices(selection: Any, *, dataset_size: int) -> tuple[int, ...]:
+    if isinstance(selection, list):
+        indices = tuple(
+            _validate_sample_index(value, key="samples[]", dataset_size=dataset_size)
+            for value in selection
+        )
+        if not indices:
+            raise ValueError("samples must select at least one dataset entry")
+        return indices
+    if not isinstance(selection, dict):
+        raise ValueError("samples must be an object or a list of integer indices")
+    allowed = {"start", "stop", "step", "count"}
+    unknown = sorted(set(selection) - allowed)
+    if unknown:
+        raise ValueError(f"samples contains unknown keys: {', '.join(unknown)}")
+    start = _non_negative_int(selection.get("start", 0), key="samples.start")
+    stop_value = selection.get("stop")
+    if stop_value is None:
+        stop = dataset_size
+    else:
+        stop = _non_negative_int(stop_value, key="samples.stop")
+    step = _positive_int(selection.get("step", 1), key="samples.step")
+    if start >= dataset_size:
+        raise IndexError(
+            f"samples.start {start} out of range for dataset of length {dataset_size}"
+        )
+    indices = tuple(range(start, min(stop, dataset_size), step))
+    if "count" in selection:
+        count = _positive_int(selection["count"], key="samples.count")
+        indices = indices[:count]
+    if not indices:
+        raise ValueError("samples must select at least one dataset entry")
+    return indices
+def _select_configured_samples(config: dict, dataset, logger):
+    """Apply optional top-level sample selection from the dataset config."""
+    has_sample = "sample" in config
+    has_samples = "samples" in config
+    if has_sample and has_samples:
+        raise ValueError("Use either sample or samples, not both")
+    if not has_sample and not has_samples:
+        return dataset
+    dataset_size = len(dataset)
+    if has_sample:
+        sample_index = _validate_sample_index(
+            config["sample"],
+            key="sample",
+            dataset_size=dataset_size,
+        )
+        sample = dataset[sample_index]
+        logger.info(
+            "Sample selection: using sample=%d of %d (id=%s, full_id=%s)",
+            sample_index,
+            dataset_size,
+            sample.get("id"),
+            sample.get("full_id"),
+        )
+        return [sample]
+    indices = _resolve_sample_indices(config["samples"], dataset_size=dataset_size)
+    logger.info(
+        "Sample selection: using %d/%d samples (first_index=%d, last_index=%d)",
+        len(indices),
+        dataset_size,
+        indices[0],
+        indices[-1],
+    )
+    return _SelectedSamples(dataset, indices)
 def _run_transform(args: argparse.Namespace, transform_class: type) -> int:
     """Shared logic for all subcommands."""
     logger = get_logger()
@@ -57,6 +185,7 @@ def _run_transform(args: argparse.Namespace, transform_class: type) -> int:
     dataset = build_dataset(config, required_modalities, required_hierarchical)
     output_backends = prepare_output_backends(config, dataset, transform_class)
     primary_backend = next(iter(output_backends.values()))
+    samples = _select_configured_samples(config, dataset, logger)
     dataset_name = config.get("dataset", "dataset")
     raw_modalities = {
@@ -69,7 +198,7 @@ def _run_transform(args: argparse.Namespace, transform_class: type) -> int:
             modality_info[name] = {"path": entry}
         else:
             modality_info[name] = entry
-    log_dataset_info(logger, dataset_name, len(dataset), modality_info, use_gpu)
+    log_dataset_info(logger, dataset_name, len(samples), modality_info, use_gpu)
     for slot, backend in output_backends.items():
         logger.info("Output path [%s]: %s", slot, backend.root)
@@ -98,7 +227,7 @@ def _run_transform(args: argparse.Namespace, transform_class: type) -> int:
         )
     transform = transform_class(**transform_kwargs)
-    saved_paths = transform.run(dataset)
+    saved_paths = transform.run(samples)
     logger.info("Transform complete. Generated %d outputs.", len(saved_paths))
     return 0
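
Note on the slice form of `samples`: in `_resolve_sample_indices` above it reduces to a `range` clipped to the dataset length, followed by an optional `count` cap. The condensed restatement below drops all validation so the index arithmetic can be checked against the README examples. `resolve_slice` is a hypothetical name used only for this sketch.

```python
def resolve_slice(selection: dict, dataset_size: int) -> tuple:
    """Validation-free restatement of the slice branch (illustrative only)."""
    start = selection.get("start", 0)
    stop = selection.get("stop")
    stop = dataset_size if stop is None else stop
    step = selection.get("step", 1)

    # `stop` is exclusive and never exceeds the dataset length.
    indices = tuple(range(start, min(stop, dataset_size), step))

    # `count` truncates the already-sliced indices; it never extends them.
    if "count" in selection:
        indices = indices[: selection["count"]]
    return indices


print(resolve_slice({"step": 2}, 7))
# (0, 2, 4, 6)
print(resolve_slice({"start": 10, "step": 5, "count": 20}, 1000)[:3])
# (10, 15, 20)
print(len(resolve_slice({"start": 10, "step": 5, "count": 20}, 50)))
# 8
```

The last call shows that `count` is an upper bound: the README's "20 samples starting at index 10 with stride 5" holds only when the dataset is long enough to supply them.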
