simplezarr 0.0.1__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Almar Klein
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,95 @@
1
+ Metadata-Version: 2.4
2
+ Name: simplezarr
3
+ Version: 0.0.1
4
+ Summary: A simple, elegant, and efficient Zarr implementation.
5
+ Keywords: zarr
6
+ Author: Almar Klein
7
+ Requires-Python: >= 3.12
8
+ Description-Content-Type: text/markdown
9
+ License-File: LICENSE
10
+ Requires-Dist: numpy
11
+ Requires-Dist: numcodecs
12
+ Requires-Dist: google-crc32c>=1.5
13
+ Requires-Dist: simplezarr[lint, tests, docs] ; extra == "dev"
14
+ Requires-Dist: sphinx>7.2 ; extra == "docs"
15
+ Requires-Dist: sphinx_rtd_theme ; extra == "docs"
16
+ Requires-Dist: ruff ; extra == "lint"
17
+ Requires-Dist: pre-commit ; extra == "lint"
18
+ Requires-Dist: pytest ; extra == "tests"
19
+ Requires-Dist: pytest-cov ; extra == "tests"
20
+ Project-URL: Homepage, https://github.com/canpute/simplezarr
21
+ Project-URL: Repository, https://github.com/canpute/simplezarr
22
+ Provides-Extra: dev
23
+ Provides-Extra: docs
24
+ Provides-Extra: lint
25
+ Provides-Extra: tests
26
+
27
+ # simplezarr
28
+ A simple, elegant, and efficient Zarr implementation
29
+
30
+ The core of `simplezarr` implements the Zarr 3 spec in straightforward
31
+ Python, without extra fuss. This makes the code easy to follow and gives
32
+ predictable performance. Extra functionality is available as functions and classes
33
+ in `simplezarr.utils`.
34
+
35
+ Since `simplezarr` is nice and simple, it's easy to adopt in various
36
+ use-cases. It supports parallel io, but does not force the use of asyncio.
37
+
38
+
39
+ ## Status
40
+
41
+ * Stores are implemented (except remote stores, which are not available yet).
42
+ * Codecs are implemented (all except for sharding).
43
+ * Main API can (asynchronously) read and write chunks.
44
+
45
+ What is not yet supported:
46
+
47
+ * Writing Zarr files.
48
+ * Indexing (wip).
49
+ * Sharding.
50
+
51
+
52
+ ## Motivation
53
+
54
+ Zarr 3 is a great file format for large datasets. It's nice and elegant. The
55
+ `simplezarr` lib is what happened when we took the Zarr 3 spec, and implemented
56
+ it as directly as possible.
57
+
58
+ Parallelism is achieved using a thread-pool and `concurrent.futures.Future`
59
+ objects, and in exactly one place: the code that reads a chunk (`ZarrArray.get_chunk_future()`).
60
+
61
+ We don't force asyncio. In fact, `simplezarr` does not even import
62
+ asyncio (except in code paths that represent a utility specific to asyncio
63
+ users).
64
+
65
+ ## Comparison with zarr-python
66
+
67
+ Why not use zarr-python? We ran into performance issues, and upon
68
+ investigating what happens under the hood, we found it hard to follow the path
69
+ that the code takes, especially regarding threading and asyncio. Granted, part
70
+ of that complexity is because it must support older Zarr versions as well.
71
+ Another reason is that zarr-python does not seem to have a way to read individual blocks
72
+ asynchronously (`AsyncArray.get_block_selection()` does not exist), which was a
73
+ requirement for our use-case.
74
+
75
+ ### What zarr-python does
76
+
77
+ * The store loads data using `asyncio.to_thread()`. This runs the io-bound reading of bytes in a separate thread (from the loop's default `ThreadPoolExecutor`).
78
+ * It uses `asyncio.gather()` to parallelize concurrent reads/writes.
79
+ * When using the `zarr.Array` (not `AsyncArray`), indexing is synchronous. To do this:
80
+   * It uses a dedicated asyncio loop that runs continuously in a dedicated thread.
81
+   * A dedicated `ThreadPoolExecutor` is set on that loop (and is used to perform the store IO).
82
+   * Then it calls `asyncio.run_coroutine_threadsafe(the_asyncio_coroutine, dedicated_loop)` to turn the coroutine into a `concurrent.futures.Future`.
83
+   * Then it sync-waits on that future.
84
+
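The sync-over-async pattern described in the list above can be sketched with the standard library alone. This is not zarr-python's code; `read_chunk`, `read_many` and `read_many_sync` are hypothetical names, and the sleeping `blocking_read` is only a stand-in for store IO.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


async def read_chunk(index: int) -> bytes:
    """Hypothetical store read: the blocking part runs in a worker thread."""

    def blocking_read() -> bytes:
        time.sleep(0.01)  # stand-in for io-bound work
        return f"chunk-{index}".encode()

    # to_thread() uses the loop's default executor.
    return await asyncio.to_thread(blocking_read)


async def read_many(indices: list[int]) -> list[bytes]:
    # gather() runs the reads concurrently and keeps the input order.
    return await asyncio.gather(*(read_chunk(i) for i in indices))


# A dedicated loop with its own executor, running in a dedicated thread.
loop = asyncio.new_event_loop()
executor = ThreadPoolExecutor(max_workers=4)
loop.set_default_executor(executor)
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()


def read_many_sync(indices: list[int]) -> list[bytes]:
    # Turn the coroutine into a concurrent.futures.Future, then sync-wait.
    future = asyncio.run_coroutine_threadsafe(read_many(indices), loop)
    return future.result(timeout=10)


chunks = read_many_sync([0, 1, 2])

# Tear down the loop thread and the executor.
loop.call_soon_threadsafe(loop.stop)
thread.join()
executor.shutdown()
loop.close()
```

Even in this reduced form, a single synchronous read passes through the calling thread, the loop thread, and a worker thread.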
85
+ This complexity appears to be one of the reasons why the performance of ome-zarr is hard to get right. The ome-zarr library wraps zarr-python with Dask, which also uses thread pools, so a lot of threads end up being spawned.
86
+
87
+ ### What simplezarr does
88
+
89
+ * Stores are synchronous.
90
+ * `simplezarr.Array.get_chunk()` is synchronous (no threading or async).
91
+ * `simplezarr.Array.get_chunk_future()` uses a `ThreadPoolExecutor`. It returns a `concurrent.futures.Future`.
92
+ * This is enough to support concurrent reads.
93
+ * No asyncio anywhere.
94
+ * But it can be used with `asyncio` (and other frameworks) using `await asyncio.wrap_future(f)` or `f.add_done_callback(call_soon_threadsafe)`.
95
+
@@ -0,0 +1,68 @@
1
+ # simplezarr
2
+ A simple, elegant, and efficient Zarr implementation
3
+
4
+ The core of `simplezarr` implements the Zarr 3 spec in straightforward
5
+ Python, without extra fuss. This makes the code easy to follow and gives
6
+ predictable performance. Extra functionality is available as functions and classes
7
+ in `simplezarr.utils`.
8
+
9
+ Since `simplezarr` is nice and simple, it's easy to adopt in various
10
+ use-cases. It supports parallel io, but does not force the use of asyncio.
11
+
12
+
13
+ ## Status
14
+
15
+ * Stores are implemented (except remote stores, which are not available yet).
16
+ * Codecs are implemented (all except for sharding).
17
+ * Main API can (asynchronously) read and write chunks.
18
+
19
+ What is not yet supported:
20
+
21
+ * Writing Zarr files.
22
+ * Indexing (wip).
23
+ * Sharding.
24
+
25
+
26
+ ## Motivation
27
+
28
+ Zarr 3 is a great file format for large datasets. It's nice and elegant. The
29
+ `simplezarr` lib is what happened when we took the Zarr 3 spec, and implemented
30
+ it as directly as possible.
31
+
32
+ Parallelism is achieved using a thread-pool and `concurrent.futures.Future`
33
+ objects, and in exactly one place: the code that reads a chunk (`ZarrArray.get_chunk_future()`).
34
+
35
+ We don't force asyncio. In fact, `simplezarr` does not even import
36
+ asyncio (except in code paths that represent a utility specific to asyncio
37
+ users).
38
+
39
+ ## Comparison with zarr-python
40
+
41
+ Why not use zarr-python? We ran into performance issues, and upon
42
+ investigating what happens under the hood, we found it hard to follow the path
43
+ that the code takes, especially regarding threading and asyncio. Granted, part
44
+ of that complexity is because it must support older Zarr versions as well.
45
+ Another reason is that zarr-python does not seem to have a way to read individual blocks
46
+ asynchronously (`AsyncArray.get_block_selection()` does not exist), which was a
47
+ requirement for our use-case.
48
+
49
+ ### What zarr-python does
50
+
51
+ * The store loads data using `asyncio.to_thread()`. This runs the io-bound reading of bytes in a separate thread (from the loop's default `ThreadPoolExecutor`).
52
+ * It uses `asyncio.gather()` to parallelize concurrent reads/writes.
53
+ * When using the `zarr.Array` (not `AsyncArray`), indexing is synchronous. To do this:
54
+   * It uses a dedicated asyncio loop that runs continuously in a dedicated thread.
55
+   * A dedicated `ThreadPoolExecutor` is set on that loop (and is used to perform the store IO).
56
+   * Then it calls `asyncio.run_coroutine_threadsafe(the_asyncio_coroutine, dedicated_loop)` to turn the coroutine into a `concurrent.futures.Future`.
57
+   * Then it sync-waits on that future.
58
+
59
+ This complexity appears to be one of the reasons why the performance of ome-zarr is hard to get right. The ome-zarr library wraps zarr-python with Dask, which also uses thread pools, so a lot of threads end up being spawned.
60
+
61
+ ### What simplezarr does
62
+
63
+ * Stores are synchronous.
64
+ * `simplezarr.Array.get_chunk()` is synchronous (no threading or async).
65
+ * `simplezarr.Array.get_chunk_future()` uses a `ThreadPoolExecutor`. It returns a `concurrent.futures.Future`.
66
+ * This is enough to support concurrent reads.
67
+ * No asyncio anywhere.
68
+ * But it can be used with `asyncio` (and other frameworks) using `await asyncio.wrap_future(f)` or `f.add_done_callback(call_soon_threadsafe)`.
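The design in the README's last section (a synchronous read, plus a variant that returns a `concurrent.futures.Future`) can be illustrated roughly as follows. The module-level `get_chunk` and `get_chunk_future` functions here are hypothetical stand-ins, not simplezarr's implementation; only `ThreadPoolExecutor` and `asyncio.wrap_future` are real standard-library APIs.

```python
import asyncio
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical pool; simplezarr's actual pool handling may differ.
_pool = ThreadPoolExecutor(max_workers=4)


def get_chunk(index: int) -> bytes:
    """Synchronous read: no threading, no async."""
    time.sleep(0.01)  # stand-in for store IO and decoding
    return f"chunk-{index}".encode()


def get_chunk_future(index: int) -> Future:
    """Same read, submitted to the thread pool."""
    return _pool.submit(get_chunk, index)


# Plain threaded use: start all reads, then wait for each result.
futures = [get_chunk_future(i) for i in range(3)]
sync_result = [f.result() for f in futures]


# asyncio use: wrap each Future so it can be awaited.
async def main() -> list[bytes]:
    wrapped = [asyncio.wrap_future(get_chunk_future(i)) for i in range(3)]
    return await asyncio.gather(*wrapped)


async_result = asyncio.run(main())
_pool.shutdown()
```

The library side stays free of asyncio; the event loop only appears in the caller's code.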
@@ -0,0 +1,54 @@
1
+ # ===== Project info
2
+
3
+ [project]
4
+ dynamic = ["version"]
5
+ name = "simplezarr"
6
+ description = "A simple, elegant, and efficient Zarr implementation."
7
+ readme = "README.md"
8
+ license = { file = "LICENSE" }
9
+ authors = [{ name = "Almar Klein" }]
10
+ keywords = ["zarr"]
11
+ requires-python = ">= 3.12"
12
+ dependencies = ['numpy', 'numcodecs', 'google-crc32c>=1.5']
13
+ [project.optional-dependencies]
14
+ lint = ["ruff", "pre-commit"]
15
+ docs = ["sphinx>7.2", "sphinx_rtd_theme"]
16
+ tests = ["pytest", "pytest-cov"]
17
+ dev = ["simplezarr[lint,tests,docs]"]
18
+
19
+ [project.urls]
20
+ Homepage = "https://github.com/canpute/simplezarr"
21
+ Repository = "https://github.com/canpute/simplezarr"
22
+ # Documentation = "https://simplezarr.readthedocs.io"
23
+
24
+ # ===== Building
25
+
26
+ [build-system]
27
+ requires = ["flit_core >=3.2,<4"]
28
+ build-backend = "flit_core.buildapi"
29
+
30
+ # ===== Tooling
31
+
32
+ [tool.ruff]
33
+ line-length = 88
34
+
35
+ [tool.ruff.lint]
36
+ select = ["F", "E", "W", "N", "B", "RUF"]
37
+ ignore = [
38
+ "E501", # Line too long
39
+ "RUF012", # Mutable class attributes should be annotated with `typing.ClassVar`
40
+ "E731", # Do not assign a `lambda` expression, use a `def`
41
+ "RUF022", # __all__ is not sorted
42
+ ]
43
+
44
+
45
+ [tool.coverage.report]
46
+
47
+ exclude_also = [
48
+ # Have to re-enable the standard pragma, plus a less-ugly flavor
49
+ "pragma: no cover",
50
+ "no-cover",
51
+ "raise NotImplementedError",
52
+ "raise AssertionError",
53
+ "if __name__ == .__main__.:",
54
+ ]
@@ -0,0 +1,6 @@
1
+ # ruff: noqa: F401
2
+
3
+ from ._version import version_info, __version__
4
+ from .stores import BaseStore, ReadableStore, WritableStore, ListableStore
5
+ from .stores import MemoryStore, LocalStore, WrapperStore, SlowStore
6
+ from .nodes import open_zarr, ZarrNode, ZarrGroup, ZarrArray
@@ -0,0 +1,227 @@
1
+ """
2
+ _version.py v1.6
3
+
4
+ Simple version string management, using a hard-coded version string
5
+ for simplicity and compatibility, while adding git info at runtime.
6
+ See https://github.com/pygfx/_version for more info.
7
+ This code is subject to The Unlicense (public domain).
8
+
9
+ Any updates to this file should be done in https://github.com/pygfx/_version
10
+
11
+ Usage in short:
12
+
13
+ * Add this file to the root of your library (next to the `__init__.py`).
14
+ * On a new release, you just update the __version__.
15
+ """
16
+
17
+ # ruff: noqa: RUF100, S310, PLR2004, D212, D400, D415, S603, BLE001, COM812
18
+
19
+ import logging
20
+ import subprocess
21
+ from pathlib import Path
22
+
23
+ # This is the base version number, to be bumped before each release.
24
+ # The build system detects this definition when building a distribution.
25
+ __version__ = "0.0.1"
26
+
27
+ # Set this to your library name
28
+ project_name = "simplezarr"
29
+
30
+
31
+ logger = logging.getLogger(project_name)
32
+
33
+ # Get whether this is a repo. If so, repo_dir is the path, otherwise None.
34
+ # .git is a dir in a normal repo and a file when in a submodule.
35
+ repo_dir = Path(__file__).parents[1]
36
+ repo_dir = repo_dir if repo_dir.joinpath(".git").exists() else None
37
+
38
+
39
+ def get_version() -> str:
40
+     """Get the version string."""
41
+     try:
42
+         if repo_dir:
43
+             release, post, tag, dirty = get_version_info_from_git()
44
+             result = get_extended_version(release, post, tag, dirty)
45
+
46
+             # Warn if release does not match base_version.
47
+             # Can happen between bumping and tagging. And also when merging a
48
+             # version bump into a working branch, because we use --first-parent.
49
+             if release and release != base_version:
50
+                 release2, _post, _tag, _dirty = get_version_info_from_git(
51
+                     first_parent=False
52
+                 )
53
+                 if release2 != base_version:
54
+                     warning(
55
+                         f"{project_name} version from git ({release})"
56
+                         f" and __version__ ({base_version}) don't match."
57
+                     )
58
+
59
+             return result
60
+
61
+     except Exception as err:
62
+         # Failsafe.
63
+         warning(f"Error getting refined version: {err}")
64
+
65
+     return base_version
66
+
67
+
68
+ def get_extended_version(release: str, post: str, tag: str, dirty: str) -> str:
69
+     """Get an extended version string with information from git."""
70
+     # Start version string (__version__ string is leading).
71
+     version = base_version
72
+     labels = []
73
+
74
+     if release and release != base_version:
75
+         pre_label = "from_tag_" + release.replace(".", "_")
76
+         labels = [pre_label, f"post{post}", tag, dirty]
77
+     elif post and post != "0":
78
+         version += f".post{post}"
79
+         labels = [tag, dirty]
80
+     elif dirty:
81
+         labels = [tag, dirty]
82
+     else:
83
+         # If not post and not dirty, show 'clean' version without git tag.
84
+         pass
85
+
86
+     # Compose final version (remove empty labels, e.g. when not dirty).
87
+     # Everything after the '+' is not sortable (does not get in version_info).
88
+     label_str = ".".join(label for label in labels if label)
89
+     if label_str:
90
+         version += "+" + label_str
91
+     return version
92
+
93
+
94
+ def get_version_info_from_git(
95
+     *, first_parent: bool = True
96
+ ) -> tuple[str, str, str, str]:
97
+     """
98
+     Get (release, post, tag, dirty) from Git.
99
+
100
+     With `release` the version number from the latest tag, `post` the
101
+     number of commits since that tag, `tag` the git hash, and `dirty` a string
102
+     that is either empty or says 'dirty'.
103
+     """
104
+     # Call out to Git.
105
+     command = ["git", "describe", "--long", "--always", "--tags", "--dirty"]
106
+     if first_parent:
107
+         command.append("--first-parent")
108
+     try:
109
+         p = subprocess.run(command, check=False, cwd=repo_dir, capture_output=True)
110
+     except Exception as e:
111
+         warning(f"Could not get {project_name} version: {e}")
112
+         p = None
113
+
114
+     # Parse the result into parts.
115
+     if p is None:
116
+         parts = ("", "", "unknown")
117
+     else:
118
+         output = p.stdout.decode(errors="ignore")
119
+         if p.returncode:
120
+             stderr = p.stderr.decode(errors="ignore")
121
+             warning(
122
+                 f"Could not get {project_name} version.\n\nstdout: "
123
+                 + output
124
+                 + "\n\nstderr: "
125
+                 + stderr
126
+             )
127
+             parts = ("", "", "unknown")
128
+         else:
129
+             parts = output.strip().lstrip("v").split("-")
130
+             if len(parts) <= 2:
131
+                 # No tags (and thus no post). Only git hash and maybe 'dirty'.
132
+                 parts = ("", "", *parts)
133
+
134
+     # Return unpacked parts.
135
+     release = parts[0]
136
+     post = parts[1]
137
+     tag = parts[2]
138
+     dirty = "dirty" if len(parts) > 3 else ""
139
+     return release, post, tag, dirty
140
+
141
+
142
+ def version_to_tuple(v: str) -> tuple:
143
+     parts = []
144
+     for part in v.split("+", maxsplit=1)[0].split("."):
145
+         if not part:
146
+             pass
147
+         elif part.startswith("post"):
148
+             try:
149
+                 parts.extend(["post", int(part[4:])])
150
+             except ValueError:
151
+                 parts.append(part)
152
+         else:
153
+             try:
154
+                 parts.append(int(part))
155
+             except ValueError:
156
+                 parts.append(part)
157
+     return tuple(parts)
158
+
159
+
160
+ def warning(m: str) -> None:
161
+     logger.warning(m)
162
+
163
+
164
+ # Apply the versioning.
165
+ base_version = __version__
166
+ __version__ = get_version()
167
+ version_info = version_to_tuple(__version__)
168
+
169
+
170
+ # The CLI part.
171
+
172
+ CLI_USAGE = """
173
+ _version.py
174
+
175
+ help - Show this message.
176
+ version - Show the current version.
177
+ bump VERSION - Bump the __version__ to the given VERSION.
178
+ update - Self-update the _version.py module by downloading the
179
+ reference code and replacing version number and project name.
180
+ """.lstrip()
181
+
182
+ if __name__ == "__main__":
183
+     import sys
184
+     import urllib.request
185
+
186
+     def prnt(m: str) -> None:
187
+         sys.stdout.write(m + "\n")
188
+         sys.stdout.flush()
189
+
190
+     _, *args = sys.argv
191
+     this_file = Path(__file__)
192
+
193
+     if not args or args[0] == "version":
194
+         prnt(f"{project_name} v{__version__}")
195
+
196
+     elif args[0] == "bump":
197
+         if len(args) != 2:
198
+             sys.exit("Expected a version number to bump to.")
199
+         new_version = args[1].lstrip("v")  # allow '1.2.3' and 'v1.2.3'
200
+         if new_version.count(".") != 2:
201
+             sys.exit("Expected two dots in new version string.")
202
+         if not all(s.isnumeric() for s in new_version.split(".")):
203
+             sys.exit("Expected only numbers in new version string.")
204
+         with this_file.open("rb") as f:
205
+             text = ref_text = f.read().decode()
206
+         text = text.replace(base_version, new_version, 1)
207
+         with this_file.open("wb") as f:
208
+             f.write(text.encode())
209
+         prnt(f"Bumped version from '{base_version}' to '{new_version}'.")
210
+
211
+     elif args[0] == "update":
212
+         u = "https://raw.githubusercontent.com/pygfx/_version/main/_version.py"
213
+         with urllib.request.urlopen(u) as f:
214
+             text = ref_text = f.read().decode()
215
+         text = text.replace("0.0.0", base_version, 1)
216
+         text = text.replace("PROJECT_NAME", project_name, 1)
217
+         with this_file.open("wb") as f:
218
+             f.write(text.encode())
219
+         prnt("Updated to the latest _version.py.")
220
+
221
+     elif args[0].lstrip("-") in ["h", "help"]:
222
+         prnt(CLI_USAGE)
223
+
224
+     else:
225
+         prnt(f"Unknown command for _version.py: {args[0]!r}")
226
+         prnt("Use ``python _version.py help`` to see a list of options.")
227
+         sys.exit(1)
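To see how `_version.py` turns `git describe` output into a version string, here is a condensed, self-contained sketch of the parsing and composing steps. `parse_describe` and `extend` are hypothetical names that mirror the logic of `get_version_info_from_git()` and `get_extended_version()`, and the tags and hashes in the sample inputs are made up.

```python
def parse_describe(output: str) -> tuple[str, str, str, str]:
    """Split `git describe --long --always --tags --dirty` output."""
    parts = output.strip().lstrip("v").split("-")
    if len(parts) <= 2:
        # No tag found: only a hash, and maybe 'dirty'.
        parts = ("", "", *parts)
    dirty = "dirty" if len(parts) > 3 else ""
    return parts[0], parts[1], parts[2], dirty


def extend(base: str, release: str, post: str, tag: str, dirty: str) -> str:
    """Compose the final version; `base` plays the role of __version__."""
    version = base
    labels: list[str] = []
    if release and release != base:
        # The latest tag disagrees with the hard-coded version.
        pre_label = "from_tag_" + release.replace(".", "_")
        labels = [pre_label, f"post{post}", tag, dirty]
    elif post and post != "0":
        version += f".post{post}"
        labels = [tag, dirty]
    elif dirty:
        labels = [tag, dirty]
    label_str = ".".join(label for label in labels if label)
    return version + ("+" + label_str if label_str else "")


samples = [
    "v0.0.1-0-gabc1234",        # exactly on the tag, clean
    "v0.0.1-3-gabc1234-dirty",  # 3 commits after the tag, uncommitted changes
    "v0.0.0-5-gabc1234",        # latest tag is older than __version__
    "abc1234-dirty",            # repository without tags
]
results = [extend("0.0.1", *parse_describe(s)) for s in samples]
for sample, result in zip(samples, results):
    print(f"{sample:28} -> {result}")
```

In all four cases, everything before the `+` is the part that `version_to_tuple()` turns into `version_info`.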