checkpointer 2.11.2__py3-none-any.whl → 2.13.0__py3-none-any.whl

@@ -0,0 +1,260 @@
1
+ Metadata-Version: 2.4
2
+ Name: checkpointer
3
+ Version: 2.13.0
4
+ Summary: checkpointer adds code-aware caching to Python functions, maintaining correctness and speeding up execution as your code changes.
5
+ Project-URL: Repository, https://github.com/Reddan/checkpointer.git
6
+ Author: Hampus Hallman
7
+ License-Expression: MIT
8
+ License-File: ATTRIBUTION.md
9
+ License-File: LICENSE
10
+ Keywords: async,cache,caching,data analysis,data processing,fast,hashing,invalidation,memoization,optimization,performance,workflow
11
+ Classifier: Programming Language :: Python :: 3.11
12
+ Classifier: Programming Language :: Python :: 3.12
13
+ Classifier: Programming Language :: Python :: 3.13
14
+ Requires-Python: >=3.11
15
+ Description-Content-Type: text/markdown
16
+
17
+ # checkpointer · [![License](https://img.shields.io/badge/license-MIT-blue)](https://github.com/Reddan/checkpointer/blob/master/LICENSE) [![pypi](https://img.shields.io/pypi/v/checkpointer)](https://pypi.org/project/checkpointer/) [![pypi](https://img.shields.io/pypi/pyversions/checkpointer)](https://pypi.org/project/checkpointer/)
18
+
19
+ `checkpointer` is a Python library offering a decorator-based API for memoizing (caching) function results with code-aware cache invalidation. It works with sync and async functions, supports multiple storage backends, and refreshes caches automatically when your code or dependencies change - helping you maintain correctness, speed up execution, and smooth out your workflows by skipping redundant, costly operations.
20
+
21
+ ## 📦 Installation
22
+
23
+ ```bash
24
+ pip install checkpointer
25
+ ```
26
+
27
+ ## 🚀 Quick Start
28
+
29
+ Apply the `@checkpoint` decorator to any function:
30
+
31
+ ```python
32
+ from checkpointer import checkpoint
33
+
34
+ @checkpoint
35
+ def expensive_function(x: int) -> int:
36
+ print("Computing...")
37
+ return x ** 2
38
+
39
+ result = expensive_function(4) # Computes and stores the result
40
+ result = expensive_function(4) # Loads from the cache
41
+ ```
42
+
43
+ ## 🧠 How It Works
44
+
45
+ When a `@checkpoint`-decorated function is called, `checkpointer` computes a unique identifier for the call. This identifier derives from the function's source code, its dependencies, captured variables, and the arguments passed.
46
+
47
+ It then tries to retrieve a cached result using this identifier. If a valid cached result is found, it's returned immediately. Otherwise, the original function executes, its result is stored, and then returned.
48
+
49
+ ### 🚨 What Triggers Cache Invalidation?
50
+
51
+ `checkpointer` maintains cache correctness using two types of hashes:
52
+
53
+ #### 1. Function Identity Hash (One-Time per Function)
54
+
55
+ This hash represents the decorated function itself and is computed once (usually on first invocation). It covers:
56
+
57
+ * **Decorated Function's Code:**\
58
+ The function's logic and signature (excluding parameter type annotations) are hashed. Formatting changes like whitespace, newlines, comments, or trailing commas do **not** cause invalidation.
59
+
60
+ * **Dependencies:**\
61
+ All user-defined functions and methods the function calls or uses are included recursively. Dependencies are detected by:
62
+ * Inspecting the function's global scope for referenced functions/objects.
63
+ * Inferring from argument type annotations.
64
+ * Analyzing object constructions and method calls to identify classes and methods used.
65
+
66
+ * **Top-Level Module Code:**\
67
+ Changes unrelated to the function or its dependencies in the module do **not** trigger invalidation.
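One way to see why formatting changes need not invalidate a cache is to fingerprint compiled bytecode rather than source text. The sketch below is an assumption-laden illustration, not how `checkpointer` computes its function identity hash; `code_fingerprint` is a hypothetical helper and it does not follow dependencies.

```python
import hashlib
import types


def code_fingerprint(code: types.CodeType) -> str:
    """Hash bytecode, referenced names, and constants of a code object."""
    digest = hashlib.sha256()
    digest.update(code.co_code)
    digest.update(repr(code.co_names).encode())
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Nested functions/lambdas: recurse instead of hashing their repr.
            digest.update(code_fingerprint(const).encode())
        else:
            digest.update(repr(const).encode())
    return digest.hexdigest()


def area(w, h):
    return w * h


def area_reformatted(w, h):
    # A comment and extra spacing change the text, not the bytecode.
    return (w  *  h)


def area_changed(w, h):
    return w * h + 1


same = code_fingerprint(area.__code__) == code_fingerprint(area_reformatted.__code__)
different = code_fingerprint(area.__code__) != code_fingerprint(area_changed.__code__)
```

Here `same` and `different` are both expected to be true: only the logic change in `area_changed` alters the fingerprint.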
68
+
69
+ #### 2. Call Hash (Computed on Every Function Call)
70
+
71
+ Each function call's cache key (the **call hash**) combines:
72
+
73
+ * **Passed Arguments:**\
74
+ Includes positional and keyword arguments, combined with default values. Changing a default alone triggers invalidation only when it changes the values a call actually receives.
75
+
76
+ * **Captured Global Variables:**\
77
+ When `capture=True` or explicit capture annotations are used, `checkpointer` hashes global variables referenced by the function:
78
+ * `CaptureMe` variables are hashed on every call, so changes trigger invalidation.
79
+ * `CaptureMeOnce` variables are hashed once per session for performance optimization.
80
+
81
+ * **Custom Argument Hashing:**\
82
+ Using `HashBy` annotations, arguments or captured variables can be transformed before hashing (e.g., sorting lists to ignore order), allowing more precise or efficient call hashes.
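To illustrate how arguments can be combined with default values before hashing, here is a small standard-library sketch. `toy_call_hash` is a hypothetical helper and the hashing scheme (SHA-256 over a `repr`) is an assumption chosen for brevity, not the library's call hash.

```python
import hashlib
import inspect


def toy_call_hash(fn, *args, **kwargs) -> str:
    """Hash the fully bound arguments of a call, defaults included."""
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()  # fill in any parameters the caller omitted
    items = sorted(bound.arguments.items())  # order-independent by parameter name
    return hashlib.sha256(repr(items).encode()).hexdigest()


def scale(value: int, factor: int = 2) -> int:
    return value * factor


implicit = toy_call_hash(scale, 10)              # relies on the default
explicit = toy_call_hash(scale, 10, factor=2)    # passes the same value by keyword
positional = toy_call_hash(scale, 10, 2)         # passes the same value by position
other = toy_call_hash(scale, 10, factor=3)       # a genuinely different call
```

The first three hashes agree because the effective call values are identical, while `other` differs.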
83
+
84
+ ## 💡 Usage
85
+
86
+ Once a function is decorated with `@checkpoint`, you can interact with its caching behavior using the following methods:
87
+
88
+ * **`expensive_function(...)`**:\
89
+ Call the function normally. This will compute and cache the result or load it from cache.
90
+
91
+ * **`expensive_function.rerun(...)`**:\
92
+ Force the original function to execute and overwrite any existing cached result.
93
+
94
+ * **`expensive_function.fn(...)`**:\
95
+ Call the undecorated function directly, bypassing the cache (useful in recursion to prevent caching intermediate steps).
96
+
97
+ * **`expensive_function.get(...)`**:\
98
+ Retrieve the cached result without executing the function. Raises `CheckpointError` if no valid cache exists.
99
+
100
+ * **`expensive_function.exists(...)`**:\
101
+ Check if a cached result exists without computing or loading it.
102
+
103
+ * **`expensive_function.delete(...)`**:\
104
+ Remove the cached entry for given arguments.
105
+
106
+ * **`expensive_function.reinit(recursive: bool = False)`**:\
107
+ Recalculate the function identity hash and recapture `CaptureMeOnce` variables, updating the cached function state within the same Python session.
108
+
109
+ ## ⚙️ Configuration & Customization
110
+
111
+ The `@checkpoint` decorator accepts the following parameters:
112
+
113
+ * **`storage`** (Type: `str` or `checkpointer.Storage`, Default: `"pickle"`)\
114
+ Storage backend to use: `"pickle"` (disk-based, persistent), `"memory"` (in-memory, non-persistent), or a custom `Storage` class.
115
+
116
+ * **`directory`** (Type: `str` or `pathlib.Path` or `None`, Default: `~/.cache/checkpoints`)\
117
+ Base directory for disk-based checkpoints (only for `"pickle"` storage).
118
+
119
+ * **`when`** (Type: `bool`, Default: `True`)\
120
+ Enable or disable checkpointing dynamically, useful for environment-based toggling.
121
+
122
+ * **`capture`** (Type: `bool`, Default: `False`)\
123
+ If `True`, includes global variables referenced by the function in call hashes (except those excluded via `NoHash`).
124
+
125
+ * **`should_expire`** (Type: `Callable[[datetime.datetime], bool]`, Default: `None`)\
126
+ A custom callable that receives the `datetime` timestamp of a cached result. It should return `True` if the cached result is considered expired and needs recomputation, or `False` otherwise.
127
+
128
+ * **`fn_hash_from`** (Type: `Any`, Default: `None`)\
129
+ Override the computed function identity hash with any hashable object you provide (e.g., version strings, config IDs). This gives you explicit control over the function's version and when its cache should be invalidated.
130
+
131
+ * **`verbosity`** (Type: `int` (`0`, `1`, or `2`), Default: `1`)\
132
+ Controls the level of logging output from `checkpointer`.
133
+ * `0`: No output.
134
+ * `1`: Shows when functions are computed and cached.
135
+ * `2`: Also shows when cached results are remembered (loaded from cache).
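As an illustration of the `should_expire` parameter, the sketch below builds a callable with the documented `Callable[[datetime.datetime], bool]` shape. `older_than` is a hypothetical helper, and the code assumes the timestamp passed in is a naive local `datetime`; adjust the comparison if it turns out to be timezone-aware.

```python
from collections.abc import Callable
from datetime import datetime, timedelta


def older_than(max_age: timedelta) -> Callable[[datetime], bool]:
    """Return a predicate that is True once a timestamp is older than max_age."""

    def should_expire(stored_at: datetime) -> bool:
        # Assumption: stored_at is a naive local datetime.
        return datetime.now() - stored_at > max_age

    return should_expire


expire_daily = older_than(timedelta(days=1))

fresh = expire_daily(datetime.now() - timedelta(hours=1))  # within one day
stale = expire_daily(datetime.now() - timedelta(days=2))   # older than one day

# Intended use, following the parameter description above:
# @checkpoint(should_expire=expire_daily)
# def fetch_report(...): ...
```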
136
+
137
+ ## 🔬 Customize Argument Hashing
138
+
139
+ You can customize how arguments are hashed, without modifying the actual argument values, to improve cache hit rates or speed up hashing.
140
+
141
+ * **`Annotated[T, HashBy[fn]]`**:\
142
+ Transform the argument via `fn(argument)` before hashing. Useful for normalization (e.g., sorting lists) or optimized hashing for complex inputs.
143
+
144
+ * **`NoHash[T]`**:\
145
+ Exclude the argument from hashing completely, so changes to it won't trigger cache invalidation.
146
+
147
+ **Example:**
148
+
149
+ ```python
150
+ from typing import Annotated
151
+ from checkpointer import checkpoint, HashBy, NoHash
152
+ from pathlib import Path
153
+ import logging
154
+
155
+ def file_bytes(path: Path) -> bytes:
156
+ return path.read_bytes()
157
+
158
+ @checkpoint
159
+ def process(
160
+ numbers: Annotated[list[int], HashBy[sorted]], # Hash by sorted list
161
+ data_file: Annotated[Path, HashBy[file_bytes]], # Hash by file content
162
+ log: NoHash[logging.Logger], # Exclude logger from hashing
163
+ ):
164
+ ...
165
+ ```
166
+
167
+ In this example, the hash for `numbers` ignores order, `data_file` is hashed based on its contents rather than path, and changes to `log` don't affect caching.
168
+
169
+ ## 🎯 Capturing Global Variables
170
+
171
+ `checkpointer` can include **captured global variables** in call hashes - these are globals your function reads during execution that may affect results.
172
+
173
+ Use `capture=True` on `@checkpoint` to capture **all** referenced globals (except those explicitly excluded with `NoHash`).
174
+
175
+ Alternatively, you can **opt-in selectively** by annotating globals with:
176
+
177
+ * **`CaptureMe[T]`**:\
178
+ Capture the variable on every call (triggers invalidation on changes).
179
+
180
+ * **`CaptureMeOnce[T]`**:\
181
+ Capture once per Python session (for expensive, immutable globals).
182
+
183
+ You can also combine these with `HashBy` to customize how captured variables are hashed (e.g., hash by a subset of attributes).
184
+
185
+ **Example:**
186
+
187
+ ```python
188
+ from typing import Annotated
189
+ from checkpointer import checkpoint, CaptureMe, CaptureMeOnce, HashBy
190
+ from pathlib import Path
191
+
192
+ def file_bytes(path: Path) -> bytes:
193
+ return path.read_bytes()
194
+
195
+ captured_data: CaptureMe[Annotated[Path, HashBy[file_bytes]]] = Path("data.txt")
196
+ session_config: CaptureMeOnce[dict] = {"mode": "prod"}
197
+
198
+ @checkpoint
199
+ def process():
200
+ # `captured_data` is included in the call hash on every call, hashed by file content
201
+ # `session_config` is hashed once per session
202
+ ...
203
+ ```
204
+
205
+ ## 🗄️ Custom Storage Backends
206
+
207
+ Implement your own storage backend by subclassing `checkpointer.Storage` and overriding required methods.
208
+
209
+ Within storage methods, `call_hash` identifies a call by its arguments. Use `self.fn_id()` to get the function identity (name plus hash/version), which is useful for organizing checkpoints.
210
+
211
+ **Example:**
212
+
213
+ ```python
214
+ from checkpointer import checkpoint, Storage
215
+ from datetime import datetime
216
+
217
+ class MyCustomStorage(Storage):
218
+ def exists(self, call_hash):
219
+ fn_dir = self.checkpointer.directory / self.fn_id()
220
+ return (fn_dir / call_hash).exists()
221
+
222
+ def store(self, call_hash, data):
223
+ ... # Store serialized data
224
+ return data # Must return data to checkpointer
225
+
226
+ def checkpoint_date(self, call_hash): ...
227
+ def load(self, call_hash): ...
228
+ def delete(self, call_hash): ...
229
+
230
+ @checkpoint(storage=MyCustomStorage)
231
+ def custom_cached_function(x: int):
232
+ return x ** 2
233
+ ```
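To make the five method roles concrete, here is a standalone dictionary-backed sketch. `DictStorageSketch` is hypothetical and deliberately does not subclass `checkpointer.Storage`, so it runs without the library; the method names mirror the example above, but the type hints and return conventions other than `store` returning its data are assumptions.

```python
from datetime import datetime
from typing import Any


class DictStorageSketch:
    """Standalone stand-in; a real backend would subclass checkpointer.Storage."""

    def __init__(self) -> None:
        # Maps call_hash -> (time stored, stored data).
        self._entries: dict[str, tuple[datetime, Any]] = {}

    def exists(self, call_hash: str) -> bool:
        return call_hash in self._entries

    def store(self, call_hash: str, data: Any) -> Any:
        self._entries[call_hash] = (datetime.now(), data)
        return data  # mirrors the "must return data" note in the example above

    def checkpoint_date(self, call_hash: str) -> datetime:
        return self._entries[call_hash][0]

    def load(self, call_hash: str) -> Any:
        return self._entries[call_hash][1]

    def delete(self, call_hash: str) -> None:
        self._entries.pop(call_hash, None)


storage = DictStorageSketch()
returned = storage.store("abc123", {"answer": 42})
```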
234
+
235
+ ## ⚡ Async Support
236
+
237
+ `checkpointer` works with Python's `asyncio` and other async runtimes.
238
+
239
+ ```python
240
+ import asyncio
241
+ from checkpointer import checkpoint
242
+
243
+ @checkpoint
244
+ async def async_compute_sum(a: int, b: int) -> int:
245
+ print(f"Asynchronously computing {a} + {b}...")
246
+ await asyncio.sleep(1)
247
+ return a + b
248
+
249
+ async def main():
250
+ result1 = await async_compute_sum(3, 7)
251
+ print(f"Result 1: {result1}")
252
+
253
+ result2 = await async_compute_sum(3, 7)
254
+ print(f"Result 2: {result2}")
255
+
256
+ result3 = async_compute_sum.get(3, 7)
257
+ print(f"Result 3 (from cache): {result3}")
258
+
259
+ asyncio.run(main())
260
+ ```
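The pattern in the example above, awaiting the decorated coroutine while reading the cache synchronously through `.get(...)`, can be mimicked with a toy wrapper. `toy_async_checkpoint` is a hypothetical name and this sketch keys the cache on positional arguments only; it is not the library's async implementation.

```python
import asyncio
from functools import wraps


def toy_async_checkpoint(fn):
    """Toy async memoizer keyed on positional arguments."""
    cache: dict[tuple, object] = {}

    @wraps(fn)
    async def wrapper(*args):
        if args not in cache:
            cache[args] = await fn(*args)  # await the coroutine once per argument set
        return cache[args]

    # Synchronous lookup of an already stored result; raises KeyError if absent.
    wrapper.get = lambda *args: cache[args]
    return wrapper


runs = []


@toy_async_checkpoint
async def add(a: int, b: int) -> int:
    runs.append((a, b))  # records each real execution
    await asyncio.sleep(0)
    return a + b


async def main():
    first = await add(3, 7)
    second = await add(3, 7)
    return first, second, add.get(3, 7)


results = asyncio.run(main())
```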
@@ -0,0 +1,18 @@
1
+ checkpointer/__init__.py,sha256=l14EbRTgmkPlJUJc-5uAjWmB4dQr6kXXOPAjHcqbKK8,890
2
+ checkpointer/checkpoint.py,sha256=Ylf0Yel9WiyHg7EoP355b2huS_yh0hgBUx2l3bOyI_c,10149
3
+ checkpointer/fn_ident.py,sha256=mZfGPSIidlZVHsG3Kc6jk8rTNDoN_k2tCkJmbSKRhdg,6921
4
+ checkpointer/fn_string.py,sha256=R1evcaBKoVP9SSDd741O1FoaC9SiaA3-kfn6QLxd9No,2532
5
+ checkpointer/import_mappings.py,sha256=ESqWvZTzYAmaVnJ6NulUvn3_8CInOOPmEKUXO2WD_WA,1794
6
+ checkpointer/object_hash.py,sha256=NXlhME87iA9rrMRjF_Au4SKXFVc14j-SiSEsGhH7M8s,8327
7
+ checkpointer/print_checkpoint.py,sha256=uUQ493fJCaB4nhp4Ox60govSCiBTIPbBX15zt2QiRGo,1356
8
+ checkpointer/types.py,sha256=GFqbGACdDxzQX3bb2LmF9UxQVWOEisGvdtobnqCBAOA,1129
9
+ checkpointer/utils.py,sha256=FEabT0jp7Bx2KN-uco0QDAsBJ6hgauKwEjKR4B8IKzk,2977
10
+ checkpointer/storages/__init__.py,sha256=p-r4YrPXn505_S3qLrSXHSlsEtb13w_DFnCt9IiUomk,296
11
+ checkpointer/storages/memory_storage.py,sha256=aQRSOmAfS0UudubCpv8cdfu2ycM8mlsO9tFMcD2kmgo,1133
12
+ checkpointer/storages/pickle_storage.py,sha256=je1LM2lTSs5yzm25Apg5tJ9jU9T6nXCgD9SlqQRIFaM,1652
13
+ checkpointer/storages/storage.py,sha256=9CZquRjw9lWpcCVCiU7E6P-eJxGBUVbNKWncsTRwmoc,916
14
+ checkpointer-2.13.0.dist-info/METADATA,sha256=vyBK4qpjq4c3S2yCCxjHTF2C5QtuQ3WDTWhEj-etz6U,10925
15
+ checkpointer-2.13.0.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
16
+ checkpointer-2.13.0.dist-info/licenses/ATTRIBUTION.md,sha256=WF6L7-sD4s9t9ytVJOhjhpDoZ6TrWpqE3_bMdDIeJxI,1078
17
+ checkpointer-2.13.0.dist-info/licenses/LICENSE,sha256=drXs6vIb7uW49r70UuMz2A1VtOCl626kiTbcmrar1Xo,1072
18
+ checkpointer-2.13.0.dist-info/RECORD,,
@@ -0,0 +1,33 @@
1
+ # Attribution and License Notices
2
+
3
+ This project includes code copied or adapted from third-party open-source projects. The following acknowledges the original sources and complies with their licensing requirements.
4
+
5
+ ---
6
+
7
+ ## Third-Party Code
8
+
9
+ ### more-itertools
10
+ - **Source:** https://github.com/more-itertools/more-itertools
11
+ - **Author:** Erik Rose
12
+ - **Copyright:** (c) 2012 Erik Rose
13
+ - **License:** MIT (https://github.com/more-itertools/more-itertools/blob/master/LICENSE)
14
+
15
+ ### colored
16
+ - **Source:** https://gitlab.com/dslackw/colored
17
+ - **Author:** Dimitris Zlatanidis
18
+ - **Copyright:** (c) 2014-2025 Dimitris Zlatanidis
19
+ - **License:** MIT (https://gitlab.com/dslackw/colored/-/blob/master/LICENSE.txt)
20
+
21
+ ---
22
+
23
+ ## License
24
+
25
+ This project is licensed under the MIT License. See the `LICENSE` file for details.
26
+
27
+ ---
28
+
29
+ ## Notes
30
+
31
+ - Third-party code is included under their original MIT licenses.
32
+ - This file documents those license notices, fulfilling attribution obligations.
33
+ - Source files with copied code may omit individual license headers in favor of this centralized attribution.
@@ -1,3 +1,5 @@
1
+ MIT License
2
+
1
3
  Copyright 2018-2025 Hampus Hallman
2
4
 
3
5
  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
@@ -1,168 +0,0 @@
1
- import asyncio
2
- import pytest
3
- from checkpointer import CheckpointError, checkpoint
4
- from .utils import AttrDict
5
-
6
- def global_multiply(a: int, b: int) -> int:
7
- return a * b
8
-
9
- @pytest.fixture(autouse=True)
10
- def run_before_and_after_tests(tmpdir):
11
- global checkpoint
12
- checkpoint = checkpoint(root_path=tmpdir)
13
- yield
14
-
15
- def test_basic_caching():
16
- @checkpoint
17
- def square(x: int) -> int:
18
- return x ** 2
19
-
20
- result1 = square(4)
21
- result2 = square(4)
22
-
23
- assert result1 == result2 == 16
24
-
25
- def test_cache_invalidation():
26
- @checkpoint
27
- def multiply(a: int, b: int):
28
- return a * b
29
-
30
- @checkpoint
31
- def helper(x: int):
32
- return multiply(x + 1, 2)
33
-
34
- @checkpoint
35
- def compute(a: int, b: int):
36
- return helper(a) + helper(b)
37
-
38
- result1 = compute(3, 4)
39
- assert result1 == 18
40
-
41
- def test_layered_caching():
42
- dev_checkpoint = checkpoint(when=True)
43
-
44
- @checkpoint(format="memory")
45
- @dev_checkpoint
46
- def expensive_function(x: int):
47
- return x ** 2
48
-
49
- assert expensive_function(4) == 16
50
- assert expensive_function(4) == 16
51
-
52
- def test_recursive_caching1():
53
- @checkpoint
54
- def fib(n: int) -> int:
55
- return fib(n - 1) + fib(n - 2) if n > 1 else n
56
-
57
- assert fib(10) == 55
58
- assert fib.get(10) == 55
59
- assert fib.get(5) == 5
60
-
61
- def test_recursive_caching2():
62
- @checkpoint
63
- def fib(n: int) -> int:
64
- return fib.fn(n - 1) + fib.fn(n - 2) if n > 1 else n
65
-
66
- assert fib(10) == 55
67
- assert fib.get(10) == 55
68
- with pytest.raises(CheckpointError):
69
- fib.get(5)
70
-
71
- @pytest.mark.asyncio
72
- async def test_async_caching():
73
- @checkpoint(format="memory")
74
- async def async_square(x: int) -> int:
75
- await asyncio.sleep(0.1)
76
- return x ** 2
77
-
78
- result1 = await async_square(3)
79
- result2 = await async_square(3)
80
- result3 = async_square.get(3)
81
-
82
- assert result1 == result2 == result3 == 9
83
-
84
- def test_force_recalculation():
85
- @checkpoint
86
- def square(x: int) -> int:
87
- return x ** 2
88
-
89
- assert square(5) == 25
90
- square.rerun(5)
91
- assert square.get(5) == 25
92
-
93
- def test_multi_layer_decorator():
94
- @checkpoint(format="memory")
95
- @checkpoint(format="pickle")
96
- def add(a: int, b: int) -> int:
97
- return a + b
98
-
99
- assert add(2, 3) == 5
100
- assert add.get(2, 3) == 5
101
-
102
- def test_capture():
103
- item_dict = AttrDict({"a": 1, "b": 1})
104
-
105
- @checkpoint(capture=True)
106
- def test_whole():
107
- return item_dict
108
-
109
- @checkpoint(capture=True)
110
- def test_a():
111
- return item_dict.a + 1
112
-
113
- init_hash_a = test_a.ident.captured_hash
114
- init_hash_whole = test_whole.ident.captured_hash
115
- item_dict.b += 1
116
- test_whole.reinit()
117
- test_a.reinit()
118
- assert test_whole.ident.captured_hash != init_hash_whole
119
- assert test_a.ident.captured_hash == init_hash_a
120
- item_dict.a += 1
121
- test_a.reinit()
122
- assert test_a.ident.captured_hash != init_hash_a
123
-
124
- def test_depends():
125
- def multiply_wrapper(a: int, b: int) -> int:
126
- return global_multiply(a, b)
127
-
128
- def helper(a: int, b: int) -> int:
129
- return multiply_wrapper(a + 1, b + 1)
130
-
131
- @checkpoint
132
- def test_a(a: int, b: int) -> int:
133
- return helper(a, b)
134
-
135
- @checkpoint
136
- def test_b(a: int, b: int) -> int:
137
- return test_a(a, b) + multiply_wrapper(a, b)
138
-
139
- assert set(test_a.depends) == {test_a.fn, helper, multiply_wrapper, global_multiply}
140
- assert set(test_b.depends) == {test_b.fn, test_a, multiply_wrapper, global_multiply}
141
-
142
- def test_lazy_init_1():
143
- @checkpoint
144
- def fn1(x: object) -> object:
145
- return fn2(x)
146
-
147
- @checkpoint
148
- def fn2(x: object) -> object:
149
- return fn1(x)
150
-
151
- assert set(fn1.depends) == {fn1.fn, fn2}
152
- assert set(fn2.depends) == {fn1, fn2.fn}
153
-
154
- def test_lazy_init_2():
155
- @checkpoint
156
- def fn1(x: object) -> object:
157
- return fn2(x)
158
-
159
- assert set(fn1.depends) == {fn1.fn}
160
-
161
- @checkpoint
162
- def fn2(x: object) -> object:
163
- return fn1(x)
164
-
165
- assert set(fn1.depends) == {fn1.fn}
166
- fn1.reinit()
167
- assert set(fn1.depends) == {fn1.fn, fn2}
168
- assert set(fn2.depends) == {fn1, fn2.fn}