PyPI - kaparoo-python - Versions diffs - 0.4.0__tar.gz → 0.6.0__tar.gz - Mend

kaparoo-python 0.4.0.tar.gz → 0.6.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (38)

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: kaparoo-python
-Version: 0.4.0
+Version: 0.6.0
 Summary: Personally common and useful Python features
 Keywords: filesystem,pathlib,paths,utilities
 Author: Jaewoo Park
@@ -67,16 +67,17 @@ hook for custom filter kinds.
 `Timer` / `SegmentTimer` context-manager-and-decorator timers (with
 `lap`-split and `measure`-block timings); `Aggregator` for nested,
-pluggable metric aggregation (the batch → epoch → run pattern); plus a
-small family of helpers for working with `Optional[T]` values
-(`replace_if_none`, `unwrap_or_default`, ...).
+pluggable metric aggregation (the batch → epoch → run pattern;
+experimental); plus a small family of helpers for working with
+`Optional[T]` values (`replace_if_none`, `unwrap_or_default`, ...).
 ### [`kaparoo.data`](https://github.com/kaparoo/kaparoo-python/tree/main/kaparoo/data)
 Building blocks for dataset code: `DataSequence[T, M]` ABC (item +
 metadata), composers (`SlicedSequence`, `ConcatSequence`,
-`WindowedSequence`), file-backed templates (`FileFolderSequence`,
-`SingleFileSequence`), and `generate_batches`.
+`TransformedSequence`, `WindowedSequence`, `ZippedSequence`), file-backed
+templates (`FileFolderSequence`, `FileListSequence`, `SingleFileSequence`),
+and `generate_batches`.
 ## 🎯 Quick example

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/README.md RENAMED

@@ -46,16 +46,17 @@ hook for custom filter kinds.
 `Timer` / `SegmentTimer` context-manager-and-decorator timers (with
 `lap`-split and `measure`-block timings); `Aggregator` for nested,
-pluggable metric aggregation (the batch → epoch → run pattern); plus a
-small family of helpers for working with `Optional[T]` values
-(`replace_if_none`, `unwrap_or_default`, ...).
+pluggable metric aggregation (the batch → epoch → run pattern;
+experimental); plus a small family of helpers for working with
+`Optional[T]` values (`replace_if_none`, `unwrap_or_default`, ...).
 ### [`kaparoo.data`](https://github.com/kaparoo/kaparoo-python/tree/main/kaparoo/data)
 Building blocks for dataset code: `DataSequence[T, M]` ABC (item +
 metadata), composers (`SlicedSequence`, `ConcatSequence`,
-`WindowedSequence`), file-backed templates (`FileFolderSequence`,
-`SingleFileSequence`), and `generate_batches`.
+`TransformedSequence`, `WindowedSequence`, `ZippedSequence`), file-backed
+templates (`FileFolderSequence`, `FileListSequence`, `SingleFileSequence`),
+and `generate_batches`.
 ## 🎯 Quick example

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/data/README.md RENAMED

@@ -7,9 +7,10 @@ small set of composers, and ready-to-subclass file-backed templates.
 - [`sequences/base`](./sequences/base.py) — `DataSequence[T, M]` abstract base
 - [`sequences/composers`](./sequences/composers.py) — `SlicedSequence`,
-  `ConcatSequence`, `WindowedSequence`
+  `TransformedSequence`, `ConcatSequence`, `WindowedSequence`,
+  `ZippedSequence`
 - [`sequences/templates`](./sequences/templates.py) — `FileFolderSequence`,
-  `SingleFileSequence`
+  `FileListSequence`, `SingleFileSequence`
 - [`sequences/utils`](./sequences/utils.py) — `generate_batches`
 All public symbols are re-exported from both `kaparoo.data` and
@@ -83,18 +84,49 @@ combined = ConcatSequence(train_a, train_b, train_c)
 len(combined)  # == len(train_a) + len(train_b) + len(train_c)
 ```
+### `TransformedSequence`
+A lazy view that applies a `transform` callable to each item of
+`source`. The transform is called on demand in `get_item` -- nothing
+is computed at construction. `get_meta` passes through `source.get_meta`
+unchanged by default; override it in a subclass when `M_out` differs
+from `M_in`.
+```python
+from kaparoo.data.sequences import TransformedSequence
+# Item transform only -- metadata type is unchanged.
+normalized = TransformedSequence(image_folder, normalize_fn)
+# Meta transform via subclassing:
+class Augmented(TransformedSequence[ndarray, Path, ndarray, AugMeta]):
+    def get_meta(self, index: int) -> AugMeta:
+        return AugMeta(path=self.source.get_meta(index), applied="normalize")
+```
+Chaining two `TransformedSequence` instances applies the transforms in
+order:
+```python
+resized    = TransformedSequence(raw, resize)
+normalized = TransformedSequence(resized, normalize)
+```
+`T_out` and `M_out` default to `T_in` and `M_in` respectively (PEP 696),
+so you only need to specify them when the type actually changes.
 ### `WindowedSequence`
 An abstract sliding-window view: each item is a `tuple[T, ...]` of
 `size` frames from `source`. Per-frame `M_in` and window-level
-`M_out` are independent type parameters, so subclasses decide how
-metadata aggregates.
+`M_out` are independent type parameters (`M_out` defaults to `M_in`),
+so subclasses decide how metadata aggregates.
 ```python
 from pathlib import Path
 from kaparoo.data.sequences import WindowedSequence
-class FirstFrameMeta(WindowedSequence[bytes, Path, Path]):
+class FirstFrameMeta(WindowedSequence[bytes, Path]):
     def get_meta(self, index):
         # window's metadata is its first frame's metadata
         index = self._normalize_index(index)
@@ -109,6 +141,27 @@ windows.get_meta(0)   # frames.get_meta(0)
 `size`, `step`, `skip` follow the same semantics as
 [`generate_batches`](#generate_batches).
+### `ZippedSequence`
+Element-wise zip of two sequences — item `i` is `(first[i], second[i])`
+and metadata `i` is the `(M1, M2)` tuple. This is the "paired image +
+label" pattern that `ConcatSequence` (end-to-end) cannot express. With
+`strict=True` (the default) the lengths must match or construction raises
+`ValueError`; pass `strict=False` to truncate to the shorter length, like
+the builtin `zip`. For a different combined metadata shape, subclass and
+override `get_meta`.
+```python
+from kaparoo.data.sequences import ZippedSequence
+pairs = ZippedSequence(images, labels)
+pairs[0]            # (images[0], labels[0])
+pairs.get_meta(0)   # (images.get_meta(0), labels.get_meta(0))
+```
+For three or more, nest: `ZippedSequence(a, ZippedSequence(b, c))` yields
+`(a[i], (b[i], c[i]))`.
 ## Templates
 ### `FileFolderSequence`
@@ -158,6 +211,30 @@ class GlobFolder(FileFolderSequence[bytes]):
 folder = GlobFolder("data", pattern="*.png", recursive=True)
 ```
+### `FileListSequence`
+Same "one file per item" contract as `FileFolderSequence`, but the files
+are given as an explicit list instead of discovered under a `root` — so
+they may live in unrelated directories (or, on Windows, different drives),
+which `FileFolderSequence` cannot represent. There is no `list_files`;
+subclasses implement only `load_file` and `get_meta`. The input order is
+preserved verbatim (duplicates kept) — sort it yourself if needed.
+```python
+from pathlib import Path
+from kaparoo.data.sequences import FileListSequence
+class BytesList(FileListSequence[bytes]):
+    def get_meta(self, index):
+        return self.get_file(index)
+    def load_file(self, path):
+        return path.read_bytes()
+# Files from anywhere, in the order given:
+data = BytesList(["images/a.png", "/other/disk/b.png"])
+```
 ### `SingleFileSequence`
 Thin ABC for the "one file, many records" pattern (a video with many
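
Note on the `ZippedSequence` section in the README diff above: the strict and truncating modes, and the nesting rule for three or more sources, can be illustrated with a minimal sketch. `zip_pairs` is a hypothetical helper written for this note, not part of kaparoo, and it builds an eager list where the real class is described as a view.

```python
from collections.abc import Sequence
from typing import Any


def zip_pairs(
    first: Sequence[Any], second: Sequence[Any], *, strict: bool = True
) -> list[tuple[Any, Any]]:
    """Eager stand-in for an element-wise zip of two sequences (hypothetical)."""
    if strict and len(first) != len(second):
        # Same failure mode the README describes for strict=True.
        raise ValueError(f"sequences differ in length: {len(first)} != {len(second)}")
    # With strict=False the result is truncated to the shorter input.
    n = min(len(first), len(second))
    return [(first[i], second[i]) for i in range(n)]


a = ["a0", "a1"]
b = [10, 11]
c = [True, False]

# Nesting for three sources: item i has the shape (a[i], (b[i], c[i])).
nested = zip_pairs(a, zip_pairs(b, c))

# Unequal lengths are only accepted when strict=False.
truncated = zip_pairs(["x", "y", "z"], [1, 2], strict=False)
```

The nested form keeps the two-source contract intact at the cost of a nested tuple shape, which is the trade-off the README text describes.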

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/data/__init__.py RENAMED

@@ -2,9 +2,12 @@ __all__ = (
     "ConcatSequence",
     "DataSequence",
     "FileFolderSequence",
+    "FileListSequence",
     "SingleFileSequence",
     "SlicedSequence",
+    "TransformedSequence",
     "WindowedSequence",
+    "ZippedSequence",
     "generate_batches",
 )
@@ -12,8 +15,11 @@ from kaparoo.data.sequences import (
     ConcatSequence,
     DataSequence,
     FileFolderSequence,
+    FileListSequence,
     SingleFileSequence,
     SlicedSequence,
+    TransformedSequence,
     WindowedSequence,
+    ZippedSequence,
     generate_batches,
 )

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/data/sequences/__init__.py RENAMED

@@ -4,9 +4,12 @@ __all__ = (
     "ConcatSequence",
     "DataSequence",
     "FileFolderSequence",
+    "FileListSequence",
     "SingleFileSequence",
     "SlicedSequence",
+    "TransformedSequence",
     "WindowedSequence",
+    "ZippedSequence",
     "generate_batches",
 )
@@ -14,10 +17,13 @@ from kaparoo.data.sequences.base import DataSequence
 from kaparoo.data.sequences.composers import (
     ConcatSequence,
     SlicedSequence,
+    TransformedSequence,
     WindowedSequence,
+    ZippedSequence,
 )
 from kaparoo.data.sequences.templates import (
     FileFolderSequence,
+    FileListSequence,
     SingleFileSequence,
 )
 from kaparoo.data.sequences.utils import generate_batches

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/data/sequences/composers.py RENAMED

@@ -1,15 +1,21 @@
 from __future__ import annotations
-__all__ = ("ConcatSequence", "SlicedSequence", "WindowedSequence")
+__all__ = (
+    "ConcatSequence",
+    "SlicedSequence",
+    "TransformedSequence",
+    "WindowedSequence",
+    "ZippedSequence",
+)
 from abc import abstractmethod
 from bisect import bisect_right
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, cast
 from kaparoo.data.sequences.base import DataSequence
 if TYPE_CHECKING:
-    from collections.abc import Sequence
+    from collections.abc import Callable, Sequence
 class SlicedSequence[T, M](DataSequence[T, M]):
@@ -59,6 +65,61 @@ class SlicedSequence[T, M](DataSequence[T, M]):
         return self._source.get_meta(self._indices[index])
+class TransformedSequence[T_in, M_in, T_out = T_in, M_out = M_in](
+    DataSequence[T_out, M_out]
+):
+    """A view of `source` with `transform` applied lazily to each item.
+    `transform` is called on demand in `get_item`; nothing is loaded or
+    converted at construction time. `get_meta` passes through
+    `source.get_meta` unchanged by default -- override it in a subclass
+    when `M_out` differs from `M_in`.
+    Type Parameters:
+        T_in: Item type of `source`.
+        M_in: Metadata type of `source`.
+        T_out: Item type after the transform. Defaults to `T_in`.
+        M_out: Metadata type exposed by this view. Defaults to `M_in`.
+            When `M_out != M_in`, override `get_meta` in a subclass;
+            the default passthrough is only safe when `M_out == M_in`.
+    Example:
+        >>> # Item-only transform; metadata passes through unchanged.
+        >>> normalized = TransformedSequence(image_folder, normalize)
+        >>> # Meta transform via subclassing:
+        >>> class Augmented(TransformedSequence[ndarray, Path, ndarray, AugMeta]):
+        ...     def get_meta(self, index: int) -> AugMeta:
+        ...         return AugMeta(
+        ...             path=self.source.get_meta(index),
+        ...             applied="normalize",
+        ...         )
+    """
+    def __init__(
+        self,
+        source: DataSequence[T_in, M_in],
+        transform: Callable[[T_in], T_out],
+    ) -> None:
+        self._source = source
+        self._transform = transform
+    @property
+    def source(self) -> DataSequence[T_in, M_in]:
+        """The wrapped sequence."""
+        return self._source
+    def __len__(self) -> int:
+        return len(self._source)
+    def get_item(self, index: int) -> T_out:
+        return self._transform(self._source.get_item(index))
+    def get_meta(self, index: int) -> M_out:
+        # Passthrough by default. Override when M_out != M_in.
+        return cast("M_out", self._source.get_meta(index))
 class ConcatSequence[T, M](DataSequence[T, M]):
     """The end-to-end concatenation of zero or more `sources`.
@@ -112,7 +173,7 @@ class ConcatSequence[T, M](DataSequence[T, M]):
         return source.get_meta(local)
-class WindowedSequence[T, M_in, M_out](DataSequence[tuple[T, ...], M_out]):
+class WindowedSequence[T, M_in, M_out = M_in](DataSequence[tuple[T, ...], M_out]):
     """An abstract sliding-window view over `source`.
     Each item is a tuple of `size` items from `source`, starting at
@@ -130,8 +191,8 @@ class WindowedSequence[T, M_in, M_out](DataSequence[tuple[T, ...], M_out]):
         T: Item type of `source` (also the per-frame type within each
             window).
         M_in: Metadata type of `source` (per-frame metadata).
-        M_out: Metadata type of the window. Determined by the
-            subclass's `get_meta` return.
+        M_out: Metadata type of the window. Defaults to `M_in`.
+            Determined by the subclass's `get_meta` return.
     Args:
         source: The sequence to window over.
@@ -219,3 +280,115 @@ class WindowedSequence[T, M_in, M_out](DataSequence[tuple[T, ...], M_out]):
     @abstractmethod
     def get_meta(self, index: int) -> M_out:
         raise NotImplementedError
+class ZippedSequence[T1, T2, M1 = None, M2 = None](
+    DataSequence[tuple[T1, T2], tuple[M1, M2]]
+):
+    """Element-wise zip of two sequences.
+    Item `i` is `(first[i], second[i])` and metadata `i` is
+    `(first.get_meta(i), second.get_meta(i))` -- the "paired image + label"
+    pattern that `ConcatSequence` (end-to-end) cannot express.
+    With `strict=True` (the default) the two sequences must have the same
+    length; a mismatch raises `ValueError` at construction. With
+    `strict=False` the view is truncated to the shorter length, like the
+    builtin `zip`. For a different combined-metadata shape, subclass and
+    override `get_meta`.
+    Type Parameters:
+        T1: Item type of the first source.
+        T2: Item type of the second source.
+        M1: Metadata type of the first source. Defaults to `None`.
+        M2: Metadata type of the second source. Defaults to `None`.
+    Args:
+        first: The first sequence.
+        second: The second sequence.
+        strict: When True (default), require equal lengths and raise on a
+            mismatch. When False, truncate to the shorter length.
+    Raises:
+        ValueError: If `strict` is True and the sequences differ in length.
+    Example:
+        >>> pairs = ZippedSequence(images, labels)
+        >>> pairs[0]  # (images[0], labels[0])
+        >>> pairs.get_meta(0)  # (images.get_meta(0), labels.get_meta(0))
+    """
+    def __init__(
+        self,
+        first: DataSequence[T1, M1],
+        second: DataSequence[T2, M2],
+        *,
+        strict: bool = True,
+    ) -> None:
+        if strict and len(first) != len(second):
+            msg = f"sequences differ in length: {len(first)} != {len(second)}"
+            raise ValueError(msg)
+        self._first = first
+        self._second = second
+        self._length = len(first) if strict else min(len(first), len(second))
+    @property
+    def first(self) -> DataSequence[T1, M1]:
+        """The first wrapped sequence."""
+        return self._first
+    @property
+    def second(self) -> DataSequence[T2, M2]:
+        """The second wrapped sequence."""
+        return self._second
+    def __len__(self) -> int:
+        return self._length
+    def _normalize_index(self, index: int) -> int:
+        """Normalize a possibly-negative index and validate range.
+        Indices resolve against the zipped length (the shorter source when
+        `strict=False`), so they address the same position in both sources.
+        Raises:
+            IndexError: If `index` is outside `[-len(self), len(self))`.
+        """
+        n = self._length
+        original = index
+        if index < 0:
+            index += n
+        if not 0 <= index < n:
+            msg = f"index {original} out of range for length {n}"
+            raise IndexError(msg)
+        return index
+    def get_item(self, index: int) -> tuple[T1, T2]:
+        index = self._normalize_index(index)
+        return self._first.get_item(index), self._second.get_item(index)
+    def get_items(self, indices: Sequence[int]) -> Sequence[tuple[T1, T2]]:
+        # Normalize, then bulk-delegate so each source's `get_items`
+        # optimization is used.
+        normalized = [self._normalize_index(i) for i in indices]
+        return list(
+            zip(
+                self._first.get_items(normalized),
+                self._second.get_items(normalized),
+                strict=True,
+            )
+        )
+    def get_meta(self, index: int) -> tuple[M1, M2]:
+        index = self._normalize_index(index)
+        return self._first.get_meta(index), self._second.get_meta(index)
+    def get_metas(self, indices: Sequence[int]) -> Sequence[tuple[M1, M2]]:
+        normalized = [self._normalize_index(i) for i in indices]
+        return list(
+            zip(
+                self._first.get_metas(normalized),
+                self._second.get_metas(normalized),
+                strict=True,
+            )
+        )
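
Note on `TransformedSequence` in the diff above: its docstring says the transform runs on demand in `get_item` and that nothing is computed at construction. The sketch below shows that laziness and the order of chained transforms using a toy class. `LazyView`, `resize`, and `normalize` are illustrative names invented here; the toy class uses `__getitem__` where the real class uses `get_item`, and it does not model metadata.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import Any


class LazyView:
    """Toy stand-in for a lazily transformed view (not kaparoo's class)."""

    def __init__(
        self, source: Sequence[Any] | LazyView, transform: Callable[[Any], Any]
    ) -> None:
        # Only references are stored; no item is touched here.
        self._source = source
        self._transform = transform

    def __len__(self) -> int:
        return len(self._source)

    def __getitem__(self, index: int) -> Any:
        # The transform is applied at access time.
        return self._transform(self._source[index])


calls: list[str] = []


def resize(x: int) -> int:
    calls.append("resize")
    return x * 2


def normalize(x: int) -> float:
    calls.append("normalize")
    return x / 10


raw = [1, 2, 3]
resized = LazyView(raw, resize)
normalized = LazyView(resized, normalize)

calls_at_construction = list(calls)  # still empty: nothing has run yet
value = normalized[1]                # inner transform first, then the outer one
```

Because each access re-runs the whole chain, an expensive transform is recomputed on every read; caching, if wanted, would have to be added separately.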

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/data/sequences/templates.py RENAMED

@@ -1,6 +1,6 @@
 from __future__ import annotations
-__all__ = ("FileFolderSequence", "SingleFileSequence")
+__all__ = ("FileFolderSequence", "FileListSequence", "SingleFileSequence")
 from abc import abstractmethod
 from pathlib import Path
@@ -11,14 +11,95 @@ from kaparoo.filesystem.existence import ensure_dir_exists, ensure_file_exists
 from kaparoo.filesystem.utils import stringify_paths, wrap_path
 if TYPE_CHECKING:
-    from kaparoo.filesystem.types import StrPath
+    from kaparoo.filesystem.types import StrPath, StrPaths
-class FileFolderSequence[T, M = Path](DataSequence[T, M]):
-    """A folder-rooted `DataSequence` whose items live in individual files.
+class FileListSequence[T, M = Path](DataSequence[T, M]):
+    """A `DataSequence` over an explicit, ordered list of files.
-    The base class handles file discovery, indexing, and root-relative
-    path bookkeeping. Subclasses are responsible for three things:
+    Items live one-per-file; subclasses implement `load_file` and `get_meta`.
+    The files are given directly rather than discovered under a `root`, so
+    they may live in unrelated directories -- or, on Windows, on different
+    drives. (`FileFolderSequence` is the special case where the list is
+    discovered under a single root and stored relative to it.)
+    The given order is preserved verbatim and duplicates are kept; sort the
+    input yourself (`sorted(files, key=...)`) if a particular order is
+    needed. Paths are not checked for existence at construction; `load_file`
+    is called lazily on each `get_item`.
+    The base exposes:
+    - `files: tuple[Path, ...]` — full paths as an immutable snapshot.
+    - `get_file(index) -> Path` — full path of the i-th file.
+    Type Parameters:
+        T: Item type returned by `get_item`.
+        M: Per-item metadata type. Defaults to `Path`; override when the
+            metadata is something else (label, line number, ...).
+    Args:
+        files: The file paths to expose, in order.
+    Example:
+        >>> from pathlib import Path
+        >>> class BytesList(FileListSequence[bytes]):
+        ...     def get_meta(self, index: int) -> Path:
+        ...         return self.get_file(index)
+        ...
+        ...     def load_file(self, path: Path) -> bytes:
+        ...         return path.read_bytes()
+        >>>
+        >>> data = BytesList(["images/a.png", "/other/b.png"])
+    """
+    def __init__(self, files: StrPaths) -> None:
+        self._files = list(stringify_paths(files))
+    def __len__(self) -> int:
+        return len(self._files)
+    @property
+    def files(self) -> tuple[Path, ...]:
+        """Immutable snapshot of the full file paths, in order.
+        Returns a fresh `tuple[Path, ...]` on each access.
+        """
+        return tuple(self.get_file(i) for i in range(len(self)))
+    def get_file(self, index: int) -> Path:
+        """Full Path of the file at `index`."""
+        return Path(self._files[index])
+    def get_item(self, index: int) -> T:
+        return self.load_file(self.get_file(index))
+    @abstractmethod
+    def get_meta(self, index: int) -> M:
+        raise NotImplementedError
+    @abstractmethod
+    def load_file(self, path: Path) -> T:
+        """Decode a single file into an item of type `T`.
+        Called lazily on each `get_item` -- not at construction time.
+        Subclasses may freely use external libraries (PIL, librosa,
+        cv2, ...) to decode.
+        """
+        raise NotImplementedError
+class FileFolderSequence[T, M = Path](FileListSequence[T, M]):
+    """A `FileListSequence` whose file list is discovered under a root.
+    The special case of `FileListSequence` where every file lives under one
+    base directory. The list is produced by `list_files(root)`, validated to
+    be under `root`, and stored in root-relative form so memory stays low for
+    large datasets and the paths survive a `root` relocation; `get_file`
+    transparently re-prepends `root`. `load_file`, `get_item`, `files`, and
+    `__len__` are inherited unchanged.
+    Subclasses are responsible for three things:
     - **`list_files(self, root)`** (abstract): return the full `Path`
       of every file to expose, in the desired order. Called once from
@@ -33,16 +114,9 @@ class FileFolderSequence[T, M = Path](DataSequence[T, M]):
       to `Path` and `get_meta(i)` can be the one-liner
       `return self.get_file(i)`.
-    The base exposes:
+    The base adds, on top of `FileListSequence`:
     - `root: Path` — the base directory.
-    - `files: tuple[Path, ...]` — full paths as an immutable snapshot.
-    - `get_file(index) -> Path` — full path of the i-th file.
-    Paths are kept internally in their root-relative form so that
-    memory stays low for large datasets and the sequence survives
-    `root` relocations; the conversion is transparent to subclasses
-    and external callers.
     Parameterized subclasses:
         When a subclass needs instance-level options (e.g. `pattern`,
@@ -94,48 +168,20 @@ class FileFolderSequence[T, M = Path](DataSequence[T, M]):
     def __init__(self, root: StrPath) -> None:
         self._root = ensure_dir_exists(root)
-        self._files = list(
-            stringify_paths(self.list_files(self._root), after=self._root)
-        )
-    def __len__(self) -> int:
-        return len(self._files)
+        # `after=root` makes each path root-relative and raises ValueError if
+        # any file is not under `root`. The base then stores the relative
+        # form; `get_file` re-prepends `root`.
+        super().__init__(stringify_paths(self.list_files(self._root), after=self._root))
     @property
     def root(self) -> Path:
         """The base directory the sequence was constructed from."""
         return self._root
-    @property
-    def files(self) -> tuple[Path, ...]:
-        """Immutable snapshot of the full file paths this sequence exposes.
-        Returns a fresh `tuple[Path, ...]` on each access, in the order
-        established by `list_files`.
-        """
-        return tuple(self.get_file(i) for i in range(len(self)))
     def get_file(self, index: int) -> Path:
         """Full Path of the file at `index`."""
         return wrap_path(self._files[index], prepend=self._root)
-    def get_item(self, index: int) -> T:
-        return self.load_file(self.get_file(index))
-    @abstractmethod
-    def get_meta(self, index: int) -> M:
-        raise NotImplementedError
-    @abstractmethod
-    def load_file(self, path: Path) -> T:
-        """Decode a single file into an item of type `T`.
-        Called lazily on each `get_item` -- not at construction time.
-        Subclasses may freely use external libraries (PIL, librosa,
-        cv2, ...) to decode.
-        """
-        raise NotImplementedError
     @abstractmethod
     def list_files(self, root: Path) -> list[Path]:
         """Return the full Path of every file to expose, in order.

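Note on the root-relative bookkeeping described in the `FileFolderSequence` docstring above: paths are stored relative to `root`, validated to be under it, and `get_file` re-prepends `root`. A minimal sketch of that idea using only `pathlib` follows. `RelativeFileList` and its `relocate` method are hypothetical and written for this note; the package itself uses its own `stringify_paths` and `wrap_path` helpers, whose behavior is not reproduced here. `PurePosixPath` is used so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


class RelativeFileList:
    """Toy illustration of root-relative path storage (not kaparoo's class)."""

    def __init__(self, root: PurePosixPath, files: list[PurePosixPath]) -> None:
        self._root = root
        # relative_to raises ValueError for any file that is not under root,
        # so validation and conversion happen in one step.
        self._files = [str(f.relative_to(root)) for f in files]

    def get_file(self, index: int) -> PurePosixPath:
        # Re-prepend the current root on access.
        return self._root / self._files[index]

    def relocate(self, new_root: PurePosixPath) -> None:
        # Hypothetical: the stored relative paths stay valid under a new root.
        self._root = new_root


root = PurePosixPath("/data/v1")
seq = RelativeFileList(root, [root / "a.png", root / "sub" / "b.png"])
before = seq.get_file(1)

seq.relocate(PurePosixPath("/mnt/v2"))
after = seq.get_file(1)
```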
{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/filesystem/README.md RENAMED

@@ -69,8 +69,8 @@ cache_dir = make_dir("var/cache", exist_ok=True)
 # Start from a clean slate: wipe an existing directory's contents and
 # recreate it empty. Destructive, and only ever wipes a *directory* (a
-# non-directory at the path still raises). `clean=True` makes `exist_ok`
-# moot, since the directory is removed and remade.
+# non-directory -- or a symlink -- at the path still raises). `clean=True`
+# makes `exist_ok` moot, since the directory is removed and remade.
 run_dir = make_dir("out/run_42", clean=True)
 # Bulk creation with a shared root

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/filesystem/directory.py RENAMED

@@ -38,6 +38,20 @@ if TYPE_CHECKING:
 # ========================== #
+def _ensure_directory_target(path: Path, *, clean: bool) -> None:
+    """Reject a path that cannot serve as a directory target.
+    Raises `NotADirectoryError` when `path` exists but is not a directory,
+    or when `clean` is requested on a symlink -- cleaning must operate on a
+    real directory, never through a link (which would otherwise reach the
+    link's target). A symlink to a directory is accepted only when `clean`
+    is False.
+    """
+    if (path.exists() and not path.is_dir()) or (clean and path.is_symlink()):
+        msg = f"not a usable directory target: {path}"
+        raise NotADirectoryError(msg)
 @overload
 def make_dir(
     path: StrPath,
@@ -88,9 +102,9 @@ def make_dir(
             Defaults to False.
         clean: Whether to recreate the directory empty when it already exists,
             removing its contents first. Only an existing *directory* is wiped;
-            a non-directory still raises. Because the directory is removed and
-            remade, `clean=True` makes `exist_ok` moot. **Destructive.**
-            Defaults to False.
+            a non-directory -- or a symlink -- still raises. Because the
+            directory is removed and remade, `clean=True` makes `exist_ok`
+            moot. **Destructive.** Defaults to False.
         stringify: Whether to return the path as a string. Defaults to False.
     Returns:
@@ -100,15 +114,14 @@ def make_dir(
     Raises:
         ValueError: If `mode` is outside the range 0o1-0o7777
             (not checked on Windows, where the mode is ignored).
-        NotADirectoryError: If the path exists but is not a directory.
+        NotADirectoryError: If the path exists but is not a directory, or
+            `clean` is True and the path is a symlink.
         OSError: If `exist_ok` is False, `clean` is False, and the path
             already exists.
     """
     _validate_mode(mode)
     path = Path(path)
-    if path.exists() and not path.is_dir():
-        msg = f"not a directory: {path}"
-        raise NotADirectoryError(msg)
+    _ensure_directory_target(path, clean=clean)
     if clean and path.is_dir():
         shutil.rmtree(path)
     path.mkdir(mode=mode, parents=True, exist_ok=exist_ok)
@@ -170,9 +183,9 @@ def make_dirs(
             Defaults to False.
         clean: Whether to recreate each directory empty when it already exists,
             removing its contents first. Only an existing *directory* is wiped;
-            a non-directory still raises. Because the directory is removed and
-            remade, `clean=True` makes `exist_ok` moot. **Destructive.**
-            Defaults to False.
+            a non-directory -- or a symlink -- still raises. Because the
+            directory is removed and remade, `clean=True` makes `exist_ok`
+            moot. **Destructive.** Defaults to False.
         stringify: Whether to return the paths as strings. Defaults to False.
     Returns:
@@ -183,15 +196,26 @@ def make_dirs(
         ValueError: If `mode` is outside the range 0o1-0o7777
             (not checked on Windows, where the mode is ignored).
         DirectoryNotFoundError: If `root` is provided and does not exist.
-        NotADirectoryError: If `root` is provided and is not a directory.
+        NotADirectoryError: If `root` is provided and is not a directory, if
+            any path exists but is not a directory, or `clean` is True and
+            any path is a symlink.
         ValueError: If `root` is provided and any of the paths are absolute.
         OSError: If `exist_ok` is False, `clean` is False, and any of the
             paths already exist.
-        OSError: If any of the paths are not directories.
+    Note:
+        Every path is validated (the non-directory / symlink checks above)
+        *before* any directory is wiped or created, so a deterministically
+        bad entry -- e.g. a file in the list -- fails without partially
+        cleaning earlier entries. Creation/cleanup is otherwise per-path and
+        not transactional, so a runtime failure (a race, a permission error)
+        partway through can still leave earlier entries created or cleaned.
     """
     _validate_mode(mode)
     paths = _join_root_if_provided(paths, root)
     directories = [Path(p) for p in paths]
+    for directory in directories:
+        _ensure_directory_target(directory, clean=clean)
     for directory in directories:
         if clean and directory.is_dir():
             shutil.rmtree(directory)
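
Note on the `make_dirs` change above: the new `Note:` section says every path is validated before any directory is wiped or created. The sketch below reproduces that two-pass shape in isolation. `check_target` copies the condition shown in `_ensure_directory_target`; `make_clean_dirs` is a simplified, hypothetical stand-in that always cleans and ignores `mode`, `root`, and `exist_ok`.

```python
import shutil
import tempfile
from pathlib import Path


def check_target(path: Path, *, clean: bool) -> None:
    """Reject a non-directory, or a symlink when cleaning is requested."""
    if (path.exists() and not path.is_dir()) or (clean and path.is_symlink()):
        raise NotADirectoryError(f"not a usable directory target: {path}")


def make_clean_dirs(paths: list[Path]) -> None:
    # Pass 1: validate every entry before touching the filesystem.
    for p in paths:
        check_target(p, clean=True)
    # Pass 2: only now wipe and recreate each directory.
    for p in paths:
        if p.is_dir():
            shutil.rmtree(p)
        p.mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    good = base / "run_1"
    good.mkdir()
    (good / "keep.txt").write_text("data")
    bad = base / "not_a_dir"
    bad.write_text("I am a file")

    try:
        make_clean_dirs([good, bad])
    except NotADirectoryError:
        failed = True
    else:
        failed = False
    # The earlier, valid entry was not wiped because validation ran first.
    survived = (good / "keep.txt").exists()
```

As the docstring in the diff points out, this only protects against deterministic bad entries; a failure during the second pass can still leave earlier directories cleaned.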

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/filesystem/staged.py RENAMED

@@ -58,6 +58,23 @@ def _default_dir_mode() -> int:
     return _umask_default(0o777)
+def _fsync_parent(path: Path) -> None:
+    """Best-effort fsync of `path`'s parent directory entry.
+    Makes a just-completed rename/link into `path` durable across a crash on
+    POSIX (the file's own data is fsynced separately). A no-op where a
+    directory cannot be opened for fsync, e.g. Windows.
+    """
+    try:
+        fd = os.open(path.parent, os.O_RDONLY)
+    except OSError:
+        return
+    try:
+        os.fsync(fd)
+    finally:
+        os.close(fd)
 class StagedFile[AnyStrT: (str, bytes)]:
     """Write a file safely: stage to a temp file, then commit by atomic move.
@@ -87,10 +104,13 @@ class StagedFile[AnyStrT: (str, bytes)]:
         ```
     With `overwrite=False` (the default) an existing destination is a
-    fail-fast `FileExistsError`, and the commit creates the file atomically --
-    it never clobbers a file that appeared meanwhile. With `overwrite=True`
-    the destination is atomically replaced, inheriting its previous
-    permissions.
+    fail-fast `FileExistsError`, and the commit creates the file atomically
+    via a hardlink -- it never clobbers a file that appeared meanwhile. On a
+    filesystem without hardlink support (FAT/exFAT, some network mounts) the
+    commit falls back to a best-effort existence check plus replace, leaving
+    a small window where a file appearing concurrently could be clobbered.
+    With `overwrite=True` the destination is atomically replaced, inheriting
+    its previous permissions.
     The committed file gets the usual umask-based permissions (not the
     restrictive mode of the internal temp file). The destination's parent
@@ -254,15 +274,26 @@ class StagedFile[AnyStrT: (str, bytes)]:
         if self._overwrite:
             self._temp_path.replace(self._path)
         else:
+            # Atomic exclusive create via hardlink where supported. A
+            # filesystem without hardlinks (FAT/exFAT, some network mounts)
+            # raises a non-`FileExistsError` `OSError`; fall back to a
+            # best-effort existence check plus `replace` (which leaves a
+            # small TOCTOU window where a file appearing meanwhile could be
+            # clobbered -- unavoidable without an atomic no-clobber move).
             try:
                 self._path.hardlink_to(self._temp_path)
-            except FileExistsError:
-                msg = (
-                    f"file already exists, pass overwrite=True to replace: {self._path}"
-                )
-                raise FileExistsError(msg) from None
-            finally:
-                self._temp_path.unlink(missing_ok=True)
+            except OSError as exc:
+                if isinstance(exc, FileExistsError) or self._path.exists():
+                    self._temp_path.unlink(missing_ok=True)
+                    msg = (
+                        "file already exists, pass overwrite=True to replace: "
+                        f"{self._path}"
+                    )
+                    raise FileExistsError(msg) from None
+                self._temp_path.replace(self._path)
+            else:
+                self._temp_path.unlink()
+        _fsync_parent(self._path)
         self._committed = True
         self._finalizer.detach()
         return self._path
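
The no-clobber commit in the hunk above can be sketched on its own as follows. This is a minimal illustration, not the package's code: `commit_no_clobber` is a hypothetical name, and it uses `os.link` in place of `Path.hardlink_to` so it does not depend on the newer pathlib method.

```python
import os
from pathlib import Path


def commit_no_clobber(temp_path: Path, dest: Path) -> Path:
    """Move `temp_path` to `dest` without overwriting an existing file.

    Hypothetical helper mirroring the commit logic shown in the diff.
    """
    try:
        # Creating a hardlink fails if `dest` already exists, so the
        # existence check and the create happen as one atomic step.
        os.link(temp_path, dest)
    except OSError as exc:
        if isinstance(exc, FileExistsError) or dest.exists():
            temp_path.unlink(missing_ok=True)
            msg = f"file already exists: {dest}"
            raise FileExistsError(msg) from None
        # Hardlinks unsupported: best-effort fallback. A file created
        # between the check above and this call would be replaced.
        temp_path.replace(dest)
    else:
        # Both names now refer to the same file; drop the temporary name.
        temp_path.unlink()
    return dest
```

On a filesystem with hardlink support, a second commit to the same destination raises `FileExistsError` and leaves the first file untouched.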
@@ -318,9 +349,9 @@ class StagedDirectory:
     staged directory is moved into place with a single rename, and an existing
     destination is a fail-fast `FileExistsError`. Replacing an existing one
     (`overwrite=True`) is *not* fully atomic -- the old directory is swapped
-    aside and then removed, leaving a brief window where the destination is
-    absent and, on a rare failure mid-swap, the previous contents in a sibling
-    ``<name>.old`` directory for recovery.
+    aside, the staged one moved in, then the old removed. A failed move
+    restores the original; only a crash *between* the two renames leaves the
+    previous contents in a sibling ``<name>.old`` directory for recovery.
     The committed directory gets the usual umask-based permissions. Pass
     `make_parents=True` to create the destination's parent if it is missing.
@@ -395,6 +426,8 @@ class StagedDirectory:
                 appeared after this builder opened.
             NotADirectoryError: If `overwrite` is True and the destination
                 exists but is not a directory.
+            OSError: If replacing an existing directory and moving the staged
+                one into place fails; the original is restored first.
         """
         if self._committed:
             return self._path
@@ -420,16 +453,24 @@ class StagedDirectory:
                 mode = stat.S_IMODE(self._path.stat().st_mode)
         self._workdir.chmod(mode)
         if exists:
-            # Replacing an existing directory. No portable atomic dir replace:
-            # swap the old one aside, move the staged one in, then remove the
-            # old. A failure between the renames leaves the previous contents
-            # in `<name>.old`.
+            # Replacing an existing directory. There is no portable atomic
+            # directory replace, so swap the old one aside, move the staged one
+            # in, then remove the old. If the second move fails, restore the
+            # original; removing the backup is best-effort (the destination is
+            # already correct). A crash *between* the two moves is the residual
+            # non-atomic window -- the previous contents remain in a sibling
+            # `<name>.old` directory for manual recovery.
             backup = self._path.with_name(f"{self._workdir.name}.old")
             self._path.rename(backup)
-            self._workdir.rename(self._path)
-            shutil.rmtree(backup)
+            try:
+                self._workdir.rename(self._path)
+            except OSError:
+                backup.rename(self._path)
+                raise
+            shutil.rmtree(backup, ignore_errors=True)
         else:
             self._workdir.rename(self._path)
+        _fsync_parent(self._path)
         self._committed = True
         self._finalizer.detach()
         return self._path
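
The swap-aside-and-restore sequence for replacing a directory can be sketched in isolation. This is an illustrative sketch under stated assumptions: `replace_directory` is a hypothetical name, and the backup is named after the destination here, whereas the package derives the name from its working directory.

```python
import shutil
from pathlib import Path


def replace_directory(staged: Path, dest: Path) -> Path:
    """Replace the existing directory `dest` with `staged`.

    Hypothetical helper; not atomic, as the diff's comments explain.
    """
    backup = dest.with_name(f"{dest.name}.old")  # assumed backup name
    dest.rename(backup)  # first rename: move the old directory aside
    try:
        staged.rename(dest)  # second rename: move the staged one in
    except OSError:
        # Put the original back before re-raising, so a failed move
        # does not leave the destination missing.
        backup.rename(dest)
        raise
    # The destination is already correct, so cleanup is best-effort.
    shutil.rmtree(backup, ignore_errors=True)
    return dest
```

A crash between the two renames is still not covered: in that case the old contents stay in the backup directory and the destination is absent.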

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/filesystem/utils.py RENAMED

@@ -266,6 +266,10 @@ def reserve_path(
     an exclusive file create, `open(path, "x")` raises the same
     `FileExistsError` directly.
+    A symlink counts as occupying the path -- including a *broken* one,
+    which `Path.exists` alone reports as absent yet still takes the name
+    (so `open(path, "x")` would fail). Such a path is treated as existing.
     Args:
         path: The path that should not yet exist.
         exist_ok: Whether to allow an already-existing path. Defaults to False.
@@ -277,9 +281,13 @@ def reserve_path(
         The path as a Path object or a string, depending on `stringify`.
     Raises:
-        FileExistsError: If the path exists and `exist_ok` is False.
+        FileExistsError: If the path exists (or is a symlink) and `exist_ok`
+            is False.
+        OSError: If `make_parents` is True and the parent cannot be created
+            (e.g. an ancestor along the path is a file).
     """
-    if (path := Path(path)).exists() and not exist_ok:
+    path = Path(path)
+    if (path.exists() or path.is_symlink()) and not exist_ok:
         msg = f"path already exists: {path}"
         raise FileExistsError(msg)
     if make_parents:
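
The reason for the extra `is_symlink()` check can be shown with a dangling symlink. A minimal sketch, assuming a platform where the process may create symlinks (on Windows this can require extra privileges); `is_occupied` is a hypothetical name used only for this illustration.

```python
import os
import tempfile
from pathlib import Path


def is_occupied(path: Path) -> bool:
    """Return True if the name is taken, even by a dangling symlink."""
    return path.exists() or path.is_symlink()


with tempfile.TemporaryDirectory() as tmp:
    link = Path(tmp) / "dangling"
    # The link target is never created, so the symlink is broken.
    os.symlink(Path(tmp) / "missing-target", link)
    follows_link = link.exists()    # follows the link to the missing target
    name_taken = is_occupied(link)  # also inspects the link itself
```

Here `follows_link` is `False` while `name_taken` is `True`, which is the case the 0.6.0 check treats as an existing path.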

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/utils/README.md RENAMED

@@ -164,6 +164,7 @@ print(run.compute())
 | Reduction | Result | Empty |
 | --- | --- | --- |
 | `Mean()` | weighted arithmetic mean | `nan` |
+| `Var()` / `Std()` | weighted population variance / std (Welford) | `nan` |
 | `Sum()` | sum of values (weight ignored) | `0.0` |
 | `Min()` / `Max()` | running min / max (weight ignored) | `nan` |
 | `Last()` | most recent value | `nan` |
@@ -177,8 +178,8 @@ import operator
 Aggregator(Fold(operator.mul, 1.0))           # running product
 ```
-For a reduction with richer state (weighted variance, RMS, ...), subclass
-`Reduction` (or `UnweightedReduction` when weight is irrelevant) and
+For a reduction with richer state (RMS, a weighted geometric mean, ...),
+subclass `Reduction` (or `UnweightedReduction` when weight is irrelevant) and
 implement `identity` / `step` (or `accumulate`) / `merge` / `result`. The
 `merge` method *is* the nesting behavior, so custom reductions nest
 exactly as the built-ins do.

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/utils/__init__.py RENAMED

@@ -8,9 +8,11 @@ __all__ = (
     "Reduction",
     "SegmentRecord",
     "SegmentTimer",
+    "Std",
     "Sum",
     "Timer",
     "UnweightedReduction",
+    "Var",
     "factory_if_none",
     "replace_if_none",
     "unwrap_or_default",
@@ -27,8 +29,10 @@ from kaparoo.utils.aggregate import (
     Mean,
     Min,
     Reduction,
+    Std,
     Sum,
     UnweightedReduction,
+    Var,
 )
 from kaparoo.utils.optional import (
     factory_if_none,

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/kaparoo/utils/aggregate.py RENAMED

@@ -16,10 +16,13 @@ __all__ = (
     "Mean",
     "Min",
     "Reduction",
+    "Std",
     "Sum",
     "UnweightedReduction",
+    "Var",
 )
+import math
 from abc import ABC, abstractmethod
 from dataclasses import dataclass
 from typing import TYPE_CHECKING
@@ -108,6 +111,65 @@ class Mean(Reduction[tuple[float, float]]):
         return state[0] / state[1] if state[1] else float("nan")
+@dataclass(frozen=True)
+class Var(Reduction[tuple[float, float, float]]):
+    """Weighted population variance; state is `(weight, mean, M2)`.
+    Accumulated online (Welford) and merged exactly (Chan's parallel
+    algorithm), so it nests across loop levels like the other reductions.
+    Uses the population convention -- M2 over the total weight, as in
+    numpy's default `ddof=0` -- which stays well-defined under weighting.
+    Empty -> `nan`.
+    """
+    def identity(self) -> tuple[float, float, float]:
+        return (0.0, 0.0, 0.0)
+    def step(
+        self, state: tuple[float, float, float], value: float, weight: float
+    ) -> tuple[float, float, float]:
+        total, mean, m2 = state
+        total += weight
+        delta = value - mean
+        mean += (weight / total) * delta
+        m2 += weight * delta * (value - mean)
+        return (total, mean, m2)
+    def merge(
+        self,
+        a: tuple[float, float, float],
+        b: tuple[float, float, float],
+    ) -> tuple[float, float, float]:
+        total_a, mean_a, m2_a = a
+        total_b, mean_b, m2_b = b
+        total = total_a + total_b
+        if total == 0:
+            return (0.0, 0.0, 0.0)
+        delta = mean_b - mean_a
+        mean = mean_a + delta * total_b / total
+        m2 = m2_a + m2_b + delta * delta * total_a * total_b / total
+        return (total, mean, m2)
+    def result(self, state: tuple[float, float, float]) -> float:
+        total, _mean, m2 = state
+        return m2 / total if total else float("nan")
+@dataclass(frozen=True)
+class Std(Var):
+    """Weighted population standard deviation: the square root of `Var`.
+    Shares `Var`'s online, mergeable moments; only the final projection
+    differs. Empty -> `nan`.
+    """
+    def result(self, state: tuple[float, float, float]) -> float:
+        variance = super().result(state)
+        if math.isnan(variance):  # empty state
+            return variance
+        return max(variance, 0.0) ** 0.5
 @dataclass(frozen=True)
 class Sum(UnweightedReduction[float]):
     """Running sum of values (weight ignored). Empty -> `0.0`."""
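
The `(weight, mean, M2)` updates added in `Var` can be exercised outside the `Reduction` class hierarchy. The sketch below restates the same arithmetic as plain functions (`step`, `merge`, and `variance` are free-standing names chosen for this illustration) and assumes strictly positive weights.

```python
State = tuple[float, float, float]  # (total weight, mean, M2)

EMPTY: State = (0.0, 0.0, 0.0)


def step(state: State, value: float, weight: float = 1.0) -> State:
    """Weighted Welford update for one observation (weight > 0)."""
    total, mean, m2 = state
    total += weight
    delta = value - mean
    mean += (weight / total) * delta
    m2 += weight * delta * (value - mean)  # uses the updated mean
    return (total, mean, m2)


def merge(a: State, b: State) -> State:
    """Combine two partial states (Chan's parallel formula)."""
    total_a, mean_a, m2_a = a
    total_b, mean_b, m2_b = b
    total = total_a + total_b
    if total == 0:
        return EMPTY
    delta = mean_b - mean_a
    mean = mean_a + delta * total_b / total
    m2 = m2_a + m2_b + delta * delta * total_a * total_b / total
    return (total, mean, m2)


def variance(state: State) -> float:
    """Population variance: M2 over the total weight; nan when empty."""
    total, _mean, m2 = state
    return m2 / total if total else float("nan")
```

Folding all observations through `step`, or folding two halves separately and combining them with `merge`, should give the same variance up to floating-point rounding, which is what lets the reduction nest across loop levels.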

{kaparoo_python-0.4.0 → kaparoo_python-0.6.0}/pyproject.toml RENAMED

@@ -12,7 +12,7 @@ build-backend = "uv_build"
 [project]
 name = "kaparoo-python"
-version = "0.4.0"
+version = "0.6.0"
 description = "Personally common and useful Python features"
 readme = "README.md"
 requires-python = ">=3.14"