PyPI - simplezarr - Versions diffs - 0.0.1__tar.gz - Mend

simplezarr 0.0.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

simplezarr-0.0.1/LICENSE +21 -0
simplezarr-0.0.1/PKG-INFO +95 -0
simplezarr-0.0.1/README.md +68 -0
simplezarr-0.0.1/pyproject.toml +54 -0
simplezarr-0.0.1/simplezarr/__init__.py +6 -0
simplezarr-0.0.1/simplezarr/_version.py +227 -0
simplezarr-0.0.1/simplezarr/codecs.py +509 -0
simplezarr-0.0.1/simplezarr/misc.py +4 -0
simplezarr-0.0.1/simplezarr/nodes.py +466 -0
simplezarr-0.0.1/simplezarr/stores.py +504 -0
simplezarr-0.0.1/simplezarr/utils/__init__.py +0 -0
simplezarr-0.0.1/simplezarr/utils/chunkpool.py +447 -0
simplezarr-0.0.1/simplezarr/utils/multiscale.py +255 -0
simplezarr-0.0.1/simplezarr/utils/units.py +44 -0

simplezarr-0.0.1/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Almar Klein
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

simplezarr-0.0.1/PKG-INFO ADDED

@@ -0,0 +1,95 @@
+Metadata-Version: 2.4
+Name: simplezarr
+Version: 0.0.1
+Summary: A simple, elegant, and efficient Zarr implementation.
+Keywords: zarr
+Author: Almar Klein
+Requires-Python: >= 3.12
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy
+Requires-Dist: numcodecs
+Requires-Dist: google-crc32c>=1.5
+Requires-Dist: simplezarr[lint, tests, docs] ; extra == "dev"
+Requires-Dist: sphinx>7.2 ; extra == "docs"
+Requires-Dist: sphinx_rtd_theme ; extra == "docs"
+Requires-Dist: ruff ; extra == "lint"
+Requires-Dist: pre-commit ; extra == "lint"
+Requires-Dist: pytest ; extra == "tests"
+Requires-Dist: pytest-cov ; extra == "tests"
+Project-URL: Homepage, https://github.com/canpute/simplezarr
+Project-URL: Repository, https://github.com/canpute/simplezarr
+Provides-Extra: dev
+Provides-Extra: docs
+Provides-Extra: lint
+Provides-Extra: tests
+# simplezarr
+A simple, elegant, and efficient Zarr implementation
+The core of `simplezarr` implements the Zarr 3 spec in straightforward
+Python, without extra fuss. This makes the code easy to follow and gives
+predictable performance. Extra functionality is provided as functions and classes
+in `simplezarr.utils`.
+Since `simplezarr` is nice and simple, it's easy to adopt in various
+use cases. It supports parallel IO, but does not force the use of asyncio.
+## Status
+* Stores are implemented (except remote stores, which are not supported yet).
+* Codecs are implemented (all except for sharding).
+* Main API can (asynchronously) read and write chunks.
+What is not yet supported:
+* Writing Zarr files.
+* Indexing (wip).
+* Sharding.
+## Motivation
+Zarr 3 is a great file format for large datasets. It's nice and elegant. The
+`simplezarr` lib is what happened when we took the Zarr 3 spec, and implemented
+it as directly as possible.
+Parallelism is achieved using a thread pool and `concurrent.futures.Future`
+objects, in exactly one place: the code that reads a chunk (`ZarrArray.get_chunk_future()`).
+We don't force asyncio. In fact, `simplezarr` does not even import
+asyncio (except in code paths that represent a utility specific to asyncio
+users).
+## Comparison with zarr-python
+Why not use zarr-python? We ran into performance issues, and upon
+investigating what happens under the hood, we found it hard to follow the path
+that the code takes, especially regarding threading and asyncio. Granted, part
+of that complexity is because it must support older Zarr versions as well.
+Another reason is that zarr-python does not seem to have a way to read individual blocks
+asynchronously (`AsyncArray.get_block_selection()` does not exist), which was a
+requirement for our use-case.
+### What zarr-python does
+* The store loads data using `asyncio.to_thread()`. This runs the io-bound reading of bytes in a separate thread (from the loop's default `ThreadPoolExecutor`).
+* It uses `asyncio.gather()` to parallelize concurrent reads/writes.
+* When using `zarr.Array` (not `AsyncArray`), indexing is synchronous. To achieve this:
+  * It uses a dedicated asyncio loop that runs continuously in a dedicated thread.
+  * A dedicated `ThreadPoolExecutor` is set on that loop (it is used to perform the store IO).
+  * It then calls `asyncio.run_coroutine_threadsafe(the_asyncio_coroutine, dedicated_loop)` to turn the asyncio code into a `concurrent.futures.Future`.
+  * Finally, it waits synchronously on that future.
+This complexity appears to be one of the reasons why the performance of ome-zarr is hard to get right. The ome-zarr library wraps zarr-python with Dask, which also uses thread pools, so a lot of threads end up being spawned.
+### What simplezarr does
+* Stores are synchronous.
+* `simplezarr.ZarrArray.get_chunk()` is synchronous (no threading or async).
+* `simplezarr.ZarrArray.get_chunk_future()` uses a `ThreadPoolExecutor` and returns a `concurrent.futures.Future`.
+* This is enough to support concurrent reads.
+* No asyncio anywhere.
+* It can still be used with `asyncio` (and other frameworks), via `await asyncio.wrap_future(f)` or `f.add_done_callback()` combined with `loop.call_soon_threadsafe()`.
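The future-based approach in the "What simplezarr does" list can be illustrated with the standard library alone. This is a minimal sketch, not simplezarr's code: `read_chunk`, `read_chunk_future`, `read_many`, `read_many_async` and `read_with_callback` are hypothetical names, `read_chunk` merely stands in for a synchronous `get_chunk()`, and the single shared pool of four workers is an assumption.

```python
import asyncio
from concurrent.futures import Future, ThreadPoolExecutor

# Assumed: one shared pool for all chunk reads (simplezarr's setup may differ).
_executor = ThreadPoolExecutor(max_workers=4)


def read_chunk(index: tuple[int, ...]) -> bytes:
    """Hypothetical synchronous chunk read, standing in for get_chunk()."""
    return bytes(index)


def read_chunk_future(index: tuple[int, ...]) -> Future:
    """Run the synchronous read in the pool; returns a concurrent.futures.Future."""
    return _executor.submit(read_chunk, index)


def read_many(indices: list[tuple[int, ...]]) -> list[bytes]:
    # Submit all reads first so they overlap, then block on each result.
    futures = [read_chunk_future(i) for i in indices]
    return [f.result() for f in futures]


async def read_many_async(indices: list[tuple[int, ...]]) -> list[bytes]:
    # asyncio.wrap_future() turns each concurrent Future into an awaitable.
    futures = [asyncio.wrap_future(read_chunk_future(i)) for i in indices]
    return await asyncio.gather(*futures)


def read_with_callback(index, loop: asyncio.AbstractEventLoop, callback) -> Future:
    # Callback style: the done-callback runs in a worker thread (or in the
    # calling thread if the read already finished), so hop back to the loop.
    f = read_chunk_future(index)
    f.add_done_callback(lambda fut: loop.call_soon_threadsafe(callback, fut))
    return f


print(read_many([(1, 2), (3,)]))  # [b'\x01\x02', b'\x03']
print(asyncio.run(read_many_async([(4,), (5, 6)])))  # [b'\x04', b'\x05\x06']
```

The synchronous reads stay plain functions; only the thin `submit` wrapper introduces threading, and asyncio enters the picture solely at the call site that chooses to await.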

simplezarr-0.0.1/README.md ADDED

@@ -0,0 +1,68 @@
+# simplezarr
+A simple, elegant, and efficient Zarr implementation
+The core of `simplezarr` implements the Zarr 3 spec in straightforward
+Python, without extra fuss. This makes the code easy to follow and gives
+predictable performance. Extra functionality is provided as functions and classes
+in `simplezarr.utils`.
+Since `simplezarr` is nice and simple, it's easy to adopt in various
+use cases. It supports parallel IO, but does not force the use of asyncio.
+## Status
+* Stores are implemented (except remote stores, which are not supported yet).
+* Codecs are implemented (all except for sharding).
+* Main API can (asynchronously) read and write chunks.
+What is not yet supported:
+* Writing Zarr files.
+* Indexing (wip).
+* Sharding.
+## Motivation
+Zarr 3 is a great file format for large datasets. It's nice and elegant. The
+`simplezarr` lib is what happened when we took the Zarr 3 spec, and implemented
+it as directly as possible.
+Parallelism is achieved using a thread pool and `concurrent.futures.Future`
+objects, in exactly one place: the code that reads a chunk (`ZarrArray.get_chunk_future()`).
+We don't force asyncio. In fact, `simplezarr` does not even import
+asyncio (except in code paths that represent a utility specific to asyncio
+users).
+## Comparison with zarr-python
+Why not use zarr-python? We ran into performance issues, and upon
+investigating what happens under the hood, we found it hard to follow the path
+that the code takes, especially regarding threading and asyncio. Granted, part
+of that complexity is because it must support older Zarr versions as well.
+Another reason is that zarr-python does not seem to have a way to read individual blocks
+asynchronously (`AsyncArray.get_block_selection()` does not exist), which was a
+requirement for our use-case.
+### What zarr-python does
+* The store loads data using `asyncio.to_thread()`. This runs the io-bound reading of bytes in a separate thread (from the loop's default `ThreadPoolExecutor`).
+* It uses `asyncio.gather()` to parallelize concurrent reads/writes.
+* When using `zarr.Array` (not `AsyncArray`), indexing is synchronous. To achieve this:
+  * It uses a dedicated asyncio loop that runs continuously in a dedicated thread.
+  * A dedicated `ThreadPoolExecutor` is set on that loop (it is used to perform the store IO).
+  * It then calls `asyncio.run_coroutine_threadsafe(the_asyncio_coroutine, dedicated_loop)` to turn the asyncio code into a `concurrent.futures.Future`.
+  * Finally, it waits synchronously on that future.
+This complexity appears to be one of the reasons why the performance of ome-zarr is hard to get right. The ome-zarr library wraps zarr-python with Dask, which also uses thread pools, so a lot of threads end up being spawned.
+### What simplezarr does
+* Stores are synchronous.
+* `simplezarr.ZarrArray.get_chunk()` is synchronous (no threading or async).
+* `simplezarr.ZarrArray.get_chunk_future()` uses a `ThreadPoolExecutor` and returns a `concurrent.futures.Future`.
+* This is enough to support concurrent reads.
+* No asyncio anywhere.
+* It can still be used with `asyncio` (and other frameworks), via `await asyncio.wrap_future(f)` or `f.add_done_callback()` combined with `loop.call_soon_threadsafe()`.
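For contrast, the sync-over-async pattern in the "What zarr-python does" list can be reduced to the following minimal sketch. It is not zarr-python's code: `_read_bytes`, `fetch`, `read_all` and `read_all_sync` are hypothetical names, `_read_bytes` stands in for a blocking store read, and the pool size is arbitrary.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# A dedicated event loop that runs forever in a background daemon thread.
_loop = asyncio.new_event_loop()
# A dedicated pool, installed as the loop's default executor (used by to_thread).
_loop.set_default_executor(ThreadPoolExecutor(max_workers=4))
threading.Thread(target=_loop.run_forever, daemon=True).start()


def _read_bytes(key: str) -> bytes:
    """Hypothetical blocking store read."""
    return key.encode()


async def fetch(key: str) -> bytes:
    # asyncio.to_thread() runs the blocking read in the loop's default executor.
    return await asyncio.to_thread(_read_bytes, key)


async def read_all(keys: list[str]) -> list[bytes]:
    # asyncio.gather() runs the reads concurrently.
    return await asyncio.gather(*(fetch(k) for k in keys))


def read_all_sync(keys: list[str]) -> list[bytes]:
    # Hand the coroutine to the background loop, getting a concurrent Future...
    future = asyncio.run_coroutine_threadsafe(read_all(keys), _loop)
    # ...and block the calling thread until it completes.
    return future.result()


print(read_all_sync(["a", "bc"]))  # [b'a', b'bc']
```

In this sketch every synchronous call crosses two thread boundaries (from the caller to the loop thread, then to a pool worker), whereas the future-based approach hands each read directly to a pool worker.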

simplezarr-0.0.1/pyproject.toml ADDED

@@ -0,0 +1,54 @@
+# ===== Project info
+[project]
+dynamic = ["version"]
+name = "simplezarr"
+description = "A simple, elegant, and efficient Zarr implementation."
+readme = "README.md"
+license = { file = "LICENSE" }
+authors = [{ name = "Almar Klein" }]
+keywords = ["zarr"]
+requires-python = ">= 3.12"
+dependencies = ['numpy', 'numcodecs', 'google-crc32c>=1.5']
+[project.optional-dependencies]
+lint = ["ruff", "pre-commit"]
+docs = ["sphinx>7.2", "sphinx_rtd_theme"]
+tests = ["pytest", "pytest-cov"]
+dev = ["simplezarr[lint,tests,docs]"]
+[project.urls]
+Homepage = "https://github.com/canpute/simplezarr"
+Repository = "https://github.com/canpute/simplezarr"
+# Documentation = "https://simplezarr.readthedocs.io"
+# ===== Building
+[build-system]
+requires = ["flit_core >=3.2,<4"]
+build-backend = "flit_core.buildapi"
+# ===== Tooling
+[tool.ruff]
+line-length = 88
+[tool.ruff.lint]
+select = ["F", "E", "W", "N", "B", "RUF"]
+ignore = [
+    "E501",   # Line too long
+    "RUF012", # Mutable class attributes should be annotated with `typing.ClassVar`
+    "E731",   # Do not assign a `lambda` expression, use a `def`
+    "RUF022", # __all__ is not sorted
+]
+[tool.coverage.report]
+exclude_also = [
+    # Have to re-enable the standard pragma, plus a less-ugly flavor
+    "pragma: no cover",
+    "no-cover",
+    "raise NotImplementedError",
+    "raise AssertionError",
+    "if __name__ == .__main__.:",
+]

simplezarr-0.0.1/simplezarr/__init__.py ADDED

@@ -0,0 +1,6 @@
+# ruff: noqa: F401
+from ._version import version_info, __version__
+from .stores import BaseStore, ReadableStore, WritableStore, ListableStore
+from .stores import MemoryStore, LocalStore, WrapperStore, SlowStore
+from .nodes import open_zarr, ZarrNode, ZarrGroup, ZarrArray

simplezarr-0.0.1/simplezarr/_version.py ADDED

@@ -0,0 +1,227 @@
+"""
+_version.py v1.6
+Simple version string management, using a hard-coded version string
+for simplicity and compatibility, while adding git info at runtime.
+See https://github.com/pygfx/_version for more info.
+This code is subject to The Unlicense (public domain).
+Any updates to this file should be done in https://github.com/pygfx/_version
+Usage in short:
+* Add this file to the root of your library (next to the `__init__.py`).
+* On a new release, you just update the __version__.
+"""
+# ruff: noqa: RUF100, S310, PLR2004, D212, D400, D415, S603, BLE001, COM812
+import logging
+import subprocess
+from pathlib import Path
+# This is the base version number, to be bumped before each release.
+# The build system detects this definition when building a distribution.
+__version__ = "0.0.1"
+# Set this to your library name
+project_name = "simplezarr"
+logger = logging.getLogger(project_name)
+# Get whether this is a repo. If so, repo_dir is the path, otherwise None.
+# .git is a dir in a normal repo and a file when in a submodule.
+repo_dir = Path(__file__).parents[1]
+repo_dir = repo_dir if repo_dir.joinpath(".git").exists() else None
+def get_version() -> str:
+    """Get the version string."""
+    try:
+        if repo_dir:
+            release, post, tag, dirty = get_version_info_from_git()
+            result = get_extended_version(release, post, tag, dirty)
+            # Warn if release does not match base_version.
+            # Can happen between bumping and tagging. And also when merging a
+            # version bump into a working branch, because we use --first-parent.
+            if release and release != base_version:
+                release2, _post, _tag, _dirty = get_version_info_from_git(
+                    first_parent=False
+                )
+                if release2 != base_version:
+                    warning(
+                        f"{project_name} version from git ({release})"
+                        f" and __version__ ({base_version}) don't match."
+                    )
+            return result
+    except Exception as err:
+        # Failsafe.
+        warning(f"Error getting refined version: {err}")
+    return base_version
+def get_extended_version(release: str, post: str, tag: str, dirty: str) -> str:
+    """Get an extended version string with information from git."""
+    # Start version string (__version__ string is leading).
+    version = base_version
+    labels = []
+    if release and release != base_version:
+        pre_label = "from_tag_" + release.replace(".", "_")
+        labels = [pre_label, f"post{post}", tag, dirty]
+    elif post and post != "0":
+        version += f".post{post}"
+        labels = [tag, dirty]
+    elif dirty:
+        labels = [tag, dirty]
+    else:
+        # If not post and not dirty, show 'clean' version without git tag.
+        pass
+    # Compose final version (remove empty labels, e.g. when not dirty).
+    # Everything after the '+' is not sortable (does not get in version_info).
+    label_str = ".".join(label for label in labels if label)
+    if label_str:
+        version += "+" + label_str
+    return version
+def get_version_info_from_git(
+    *, first_parent: bool = True
+) -> tuple[str, str, str, str]:
+    """
+    Get (release, post, tag, dirty) from Git.
+    With `release` the version number from the latest tag, `post` the
+    number of commits since that tag, `tag` the git hash, and `dirty` a string
+    that is either empty or says 'dirty'.
+    """
+    # Call out to Git.
+    command = ["git", "describe", "--long", "--always", "--tags", "--dirty"]
+    if first_parent:
+        command.append("--first-parent")
+    try:
+        p = subprocess.run(command, check=False, cwd=repo_dir, capture_output=True)
+    except Exception as e:
+        warning(f"Could not get {project_name} version: {e}")
+        p = None
+    # Parse the result into parts.
+    if p is None:
+        parts = ("", "", "unknown")
+    else:
+        output = p.stdout.decode(errors="ignore")
+        if p.returncode:
+            stderr = p.stderr.decode(errors="ignore")
+            warning(
+                f"Could not get {project_name} version.\n\nstdout: "
+                + output
+                + "\n\nstderr: "
+                + stderr
+            )
+            parts = ("", "", "unknown")
+        else:
+            parts = output.strip().lstrip("v").split("-")
+            if len(parts) <= 2:
+                # No tags (and thus no post). Only git hash and maybe 'dirty'.
+                parts = ("", "", *parts)
+    # Return unpacked parts.
+    release = parts[0]
+    post = parts[1]
+    tag = parts[2]
+    dirty = "dirty" if len(parts) > 3 else ""
+    return release, post, tag, dirty
+def version_to_tuple(v: str) -> tuple:
+    parts = []
+    for part in v.split("+", maxsplit=1)[0].split("."):
+        if not part:
+            pass
+        elif part.startswith("post"):
+            try:
+                parts.extend(["post", int(part[4:])])
+            except ValueError:
+                parts.append(part)
+        else:
+            try:
+                parts.append(int(part))
+            except ValueError:
+                parts.append(part)
+    return tuple(parts)
+def warning(m: str) -> None:
+    logger.warning(m)
+# Apply the versioning.
+base_version = __version__
+__version__ = get_version()
+version_info = version_to_tuple(__version__)
+# The CLI part.
+CLI_USAGE = """
+_version.py
+help            - Show this message.
+version         - Show the current version.
+bump VERSION    - Bump the __version__ to the given VERSION.
+update          - Self-update the _version.py module by downloading the
+                  reference code and replacing version number and project name.
+""".lstrip()
+if __name__ == "__main__":
+    import sys
+    import urllib.request
+    def prnt(m: str) -> None:
+        sys.stdout.write(m + "\n")
+        sys.stdout.flush()
+    _, *args = sys.argv
+    this_file = Path(__file__)
+    if not args or args[0] == "version":
+        prnt(f"{project_name} v{__version__}")
+    elif args[0] == "bump":
+        if len(args) != 2:
+            sys.exit("Expected a version number to bump to.")
+        new_version = args[1].lstrip("v")  # allow '1.2.3' and 'v1.2.3'
+        if new_version.count(".") != 2:
+            sys.exit("Expected two dots in new version string.")
+        if not all(s.isnumeric() for s in new_version.split(".")):
+            sys.exit("Expected only numbers in new version string.")
+        with this_file.open("rb") as f:
+            text = ref_text = f.read().decode()
+        text = text.replace(base_version, new_version, 1)
+        with this_file.open("wb") as f:
+            f.write(text.encode())
+        prnt(f"Bumped version from '{base_version}' to '{new_version}'.")
+    elif args[0] == "update":
+        u = "https://raw.githubusercontent.com/pygfx/_version/main/_version.py"
+        with urllib.request.urlopen(u) as f:
+            text = ref_text = f.read().decode()
+        text = text.replace("0.0.0", base_version, 1)
+        text = text.replace("PROJECT_NAME", project_name, 1)
+        with this_file.open("wb") as f:
+            f.write(text.encode())
+        prnt("Updated to the latest _version.py.")
+    elif args[0].lstrip("-") in ["h", "help"]:
+        prnt(CLI_USAGE)
+    else:
+        prnt(f"Unknown command for _version.py: {args[0]!r}")
+        prnt("Use ``python _version.py help`` to see a list of options.")
+        sys.exit(1)
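The version-string rules in `get_version_info_from_git()` and `get_extended_version()` can be exercised without a Git checkout. The sketch below re-implements the same logic as two pure functions, with the base version passed in as a parameter instead of read from the module; `parse_describe` and `extended_version` are hypothetical names, not part of `_version.py`, and the sample `git describe` strings are made up.

```python
def parse_describe(output: str) -> tuple[str, str, str, str]:
    """Split `git describe --long --always --tags --dirty` output into
    (release, post, tag, dirty), following get_version_info_from_git()."""
    parts = output.strip().lstrip("v").split("-")
    if len(parts) <= 2:
        # No tags (and thus no post): only the hash and maybe 'dirty'.
        parts = ["", "", *parts]
    dirty = "dirty" if len(parts) > 3 else ""
    return parts[0], parts[1], parts[2], dirty


def extended_version(base: str, release: str, post: str, tag: str, dirty: str) -> str:
    """Same rules as get_extended_version(), with the base version passed in."""
    version = base
    labels: list[str] = []
    if release and release != base:
        # Tag and __version__ disagree: keep __version__, record the tag.
        labels = ["from_tag_" + release.replace(".", "_"), f"post{post}", tag, dirty]
    elif post and post != "0":
        # Commits since the release tag: a sortable .postN suffix.
        version += f".post{post}"
        labels = [tag, dirty]
    elif dirty:
        labels = [tag, dirty]
    # Empty labels are dropped; everything after '+' is not sortable.
    label_str = ".".join(label for label in labels if label)
    return f"{version}+{label_str}" if label_str else version


for out in ["v0.0.1-0-gabc1234", "v0.0.1-3-gabc1234-dirty",
            "v0.0.0-5-gabc1234", "abc1234-dirty"]:
    print(out, "->", extended_version("0.0.1", *parse_describe(out)))
# v0.0.1-0-gabc1234 -> 0.0.1
# v0.0.1-3-gabc1234-dirty -> 0.0.1.post3+gabc1234.dirty
# v0.0.0-5-gabc1234 -> 0.0.1+from_tag_0_0_0.post5.gabc1234
# abc1234-dirty -> 0.0.1+abc1234.dirty
```

Keeping the parsing and composition free of I/O makes each branch (clean release, post-release commits, tag mismatch, untagged repo) easy to check in isolation.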