checkpointer 2.12.0__tar.gz → 2.13.0__tar.gz
- checkpointer-2.13.0/PKG-INFO +260 -0
- checkpointer-2.13.0/README.md +244 -0
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/__init__.py +2 -2
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/checkpoint.py +74 -62
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/fn_ident.py +36 -34
- checkpointer-2.13.0/checkpointer/fn_string.py +77 -0
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/object_hash.py +22 -17
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/storages/storage.py +5 -4
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/utils.py +13 -15
- {checkpointer-2.12.0 → checkpointer-2.13.0}/pyproject.toml +6 -8
- {checkpointer-2.12.0 → checkpointer-2.13.0}/uv.lock +32 -22
- checkpointer-2.12.0/PKG-INFO +0 -236
- checkpointer-2.12.0/README.md +0 -220
- {checkpointer-2.12.0 → checkpointer-2.13.0}/.gitignore +0 -0
- {checkpointer-2.12.0 → checkpointer-2.13.0}/.python-version +0 -0
- {checkpointer-2.12.0 → checkpointer-2.13.0}/ATTRIBUTION.md +0 -0
- {checkpointer-2.12.0 → checkpointer-2.13.0}/LICENSE +0 -0
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/import_mappings.py +0 -0
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/print_checkpoint.py +0 -0
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/storages/__init__.py +0 -0
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/storages/memory_storage.py +0 -0
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/storages/pickle_storage.py +0 -0
- {checkpointer-2.12.0 → checkpointer-2.13.0}/checkpointer/types.py +0 -0
@@ -0,0 +1,260 @@
|
|
1
|
+
Metadata-Version: 2.4
|
2
|
+
Name: checkpointer
|
3
|
+
Version: 2.13.0
|
4
|
+
Summary: checkpointer adds code-aware caching to Python functions, maintaining correctness and speeding up execution as your code changes.
|
5
|
+
Project-URL: Repository, https://github.com/Reddan/checkpointer.git
|
6
|
+
Author: Hampus Hallman
|
7
|
+
License-Expression: MIT
|
8
|
+
License-File: ATTRIBUTION.md
|
9
|
+
License-File: LICENSE
|
10
|
+
Keywords: async,cache,caching,data analysis,data processing,fast,hashing,invalidation,memoization,optimization,performance,workflow
|
11
|
+
Classifier: Programming Language :: Python :: 3.11
|
12
|
+
Classifier: Programming Language :: Python :: 3.12
|
13
|
+
Classifier: Programming Language :: Python :: 3.13
|
14
|
+
Requires-Python: >=3.11
|
15
|
+
Description-Content-Type: text/markdown
|
16
|
+
|
17
|
+
# checkpointer
|
18
|
+
|
19
|
+
`checkpointer` is a Python library offering a decorator-based API for memoizing (caching) function results with code-aware cache invalidation. It works with sync and async functions, supports multiple storage backends, and refreshes caches automatically when your code or dependencies change - helping you maintain correctness, speed up execution, and smooth out your workflows by skipping redundant, costly operations.
|
20
|
+
|
21
|
+
## 📦 Installation
|
22
|
+
|
23
|
+
```bash
|
24
|
+
pip install checkpointer
|
25
|
+
```
|
26
|
+
|
27
|
+
## 🚀 Quick Start
|
28
|
+
|
29
|
+
Apply the `@checkpoint` decorator to any function:
|
30
|
+
|
31
|
+
```python
|
32
|
+
from checkpointer import checkpoint
|
33
|
+
|
34
|
+
@checkpoint
|
35
|
+
def expensive_function(x: int) -> int:
|
36
|
+
print("Computing...")
|
37
|
+
return x ** 2
|
38
|
+
|
39
|
+
result = expensive_function(4) # Computes and stores the result
|
40
|
+
result = expensive_function(4) # Loads from the cache
|
41
|
+
```
|
42
|
+
|
43
|
+
## 🧠 How It Works
|
44
|
+
|
45
|
+
When a `@checkpoint`-decorated function is called, `checkpointer` computes a unique identifier for the call. This identifier derives from the function's source code, its dependencies, captured variables, and the arguments passed.
|
46
|
+
|
47
|
+
It then tries to retrieve a cached result using this identifier. If a valid cached result is found, it's returned immediately. Otherwise, the original function executes, its result is stored, and then returned.
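
The lookup flow described above can be sketched with the standard library alone. This is a toy illustration, not `checkpointer`'s implementation: `toy_checkpoint`, its in-memory `cache` dict, and the use of compiled bytecode as a stand-in for the function identity are all assumptions made for the example.

```python
import hashlib
import pickle
from functools import wraps

def toy_checkpoint(fn):
    """Toy memoizer showing the lookup flow only (not checkpointer's code)."""
    cache = {}  # hypothetical in-memory store keyed by (fn_hash, call_hash)
    code = fn.__code__
    # Stand-in for a function identity hash: digest of bytecode and constants.
    fn_hash = hashlib.sha256(code.co_code + repr(code.co_consts).encode()).hexdigest()

    @wraps(fn)
    def wrapper(*args, **kwargs):
        # Stand-in for a call hash: digest of the pickled arguments.
        call_hash = hashlib.sha256(pickle.dumps((args, sorted(kwargs.items())))).hexdigest()
        key = (fn_hash, call_hash)
        if key not in cache:
            # Cache miss: run the original function and store its result.
            cache[key] = fn(*args, **kwargs)
        return cache[key]

    wrapper.cache = cache
    return wrapper

calls = []

@toy_checkpoint
def square(x: int) -> int:
    calls.append(x)  # record real executions
    return x ** 2

first = square(4)   # executes the function
second = square(4)  # served from the dict
```

After both calls, `calls` holds a single entry because the second call never reaches the function body.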
|
48
|
+
|
49
|
+
### 🚨 What Triggers Cache Invalidation?
|
50
|
+
|
51
|
+
`checkpointer` maintains cache correctness using two types of hashes:
|
52
|
+
|
53
|
+
#### 1. Function Identity Hash (One-Time per Function)
|
54
|
+
|
55
|
+
This hash represents the decorated function itself and is computed once (usually on first invocation). It covers:
|
56
|
+
|
57
|
+
* **Decorated Function's Code:**\
|
58
|
+
The function's logic and signature (excluding parameter type annotations) are hashed. Formatting changes like whitespace, newlines, comments, or trailing commas do **not** cause invalidation.
|
59
|
+
|
60
|
+
* **Dependencies:**\
|
61
|
+
All user-defined functions and methods the function calls or uses are included recursively. Dependencies are detected by:
|
62
|
+
* Inspecting the function's global scope for referenced functions/objects.
|
63
|
+
* Inferring from argument type annotations.
|
64
|
+
* Analyzing object constructions and method calls to identify classes and methods used.
|
65
|
+
|
66
|
+
* **Top-Level Module Code:**\
|
67
|
+
Changes unrelated to the function or its dependencies in the module do **not** trigger invalidation.
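
One way to get a hash that ignores formatting is to hash the parsed syntax tree rather than the raw text. The sketch below assumes this approach purely for illustration; `source_fingerprint` is a hypothetical helper, and unlike the behavior described above it does not strip parameter type annotations or follow dependencies.

```python
import ast
import hashlib

def source_fingerprint(source: str) -> str:
    """Hash the AST of `source`; comments and layout never reach the AST."""
    tree = ast.parse(source)
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()

original = "def f(a, b):\n    return a + b\n"
reformatted = (
    "def f(\n"
    "    a,\n"
    "    b,\n"
    "):\n"
    "    # add the inputs\n"
    "    return a  +  b\n"
)
changed = "def f(a, b):\n    return a - b\n"

same_logic = source_fingerprint(original) == source_fingerprint(reformatted)
different_logic = source_fingerprint(original) != source_fingerprint(changed)
```

Here the reformatted source, with extra newlines, a comment, and a trailing comma, yields the same fingerprint as the original, while the changed operator yields a different one.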
|
68
|
+
|
69
|
+
#### 2. Call Hash (Computed on Every Function Call)
|
70
|
+
|
71
|
+
Each function call's cache key (the **call hash**) combines:
|
72
|
+
|
73
|
+
* **Passed Arguments:**\
|
74
|
+
Includes positional and keyword arguments, combined with default values. Changing a default value triggers invalidation only when it changes the values a call actually receives.
|
75
|
+
|
76
|
+
* **Captured Global Variables:**\
|
77
|
+
When `capture=True` or explicit capture annotations are used, `checkpointer` hashes global variables referenced by the function:
|
78
|
+
* `CaptureMe` variables are hashed on every call, so changes trigger invalidation.
|
79
|
+
* `CaptureMeOnce` variables are hashed once per session for performance optimization.
|
80
|
+
|
81
|
+
* **Custom Argument Hashing:**\
|
82
|
+
Using `HashBy` annotations, arguments or captured variables can be transformed before hashing (e.g., sorting lists to ignore order), allowing more precise or efficient call hashes.
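
To make the idea of combining passed arguments with defaults concrete, here is a minimal sketch that normalizes a call through `inspect.signature` before hashing. The helper name `call_hash` and the choice of `pickle` plus SHA-256 are assumptions for the example; the library's actual hashing may differ.

```python
import hashlib
import inspect
import pickle

def call_hash(fn, *args, **kwargs) -> str:
    """Hash a call after binding arguments to parameter names and filling defaults."""
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()
    # Sort by parameter name so positional and keyword spellings hash alike.
    payload = pickle.dumps(sorted(bound.arguments.items()))
    return hashlib.sha256(payload).hexdigest()

def scale(value, factor=2):
    return value * factor

implicit_default = call_hash(scale, 3)
explicit_default = call_hash(scale, 3, 2)
keyword_form = call_hash(scale, factor=2, value=3)
other_factor = call_hash(scale, 3, 5)
```

The first three calls resolve to the same effective arguments and therefore share one hash; only the call with a different `factor` produces a new one.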
|
83
|
+
|
84
|
+
## 💡 Usage
|
85
|
+
|
86
|
+
Once a function is decorated with `@checkpoint`, you can interact with its caching behavior using the following methods:
|
87
|
+
|
88
|
+
* **`expensive_function(...)`**:\
|
89
|
+
Call the function normally. This will compute and cache the result or load it from cache.
|
90
|
+
|
91
|
+
* **`expensive_function.rerun(...)`**:\
|
92
|
+
Force the original function to execute and overwrite any existing cached result.
|
93
|
+
|
94
|
+
* **`expensive_function.fn(...)`**:\
|
95
|
+
Call the undecorated function directly, bypassing the cache (useful in recursion to prevent caching intermediate steps).
|
96
|
+
|
97
|
+
* **`expensive_function.get(...)`**:\
|
98
|
+
Retrieve the cached result without executing the function. Raises `CheckpointError` if no valid cache exists.
|
99
|
+
|
100
|
+
* **`expensive_function.exists(...)`**:\
|
101
|
+
Check if a cached result exists without computing or loading it.
|
102
|
+
|
103
|
+
* **`expensive_function.delete(...)`**:\
|
104
|
+
Remove the cached entry for given arguments.
|
105
|
+
|
106
|
+
* **`expensive_function.reinit(recursive: bool = False)`**:\
|
107
|
+
Recalculate the function identity hash and recapture `CaptureMeOnce` variables, updating the cached function state within the same Python session.
|
108
|
+
|
109
|
+
## ⚙️ Configuration & Customization
|
110
|
+
|
111
|
+
The `@checkpoint` decorator accepts the following parameters:
|
112
|
+
|
113
|
+
* **`storage`** (Type: `str` or `checkpointer.Storage`, Default: `"pickle"`)\
|
114
|
+
Storage backend to use: `"pickle"` (disk-based, persistent), `"memory"` (in-memory, non-persistent), or a custom `Storage` class.
|
115
|
+
|
116
|
+
* **`directory`** (Type: `str` or `pathlib.Path` or `None`, Default: `~/.cache/checkpoints`)\
|
117
|
+
Base directory for disk-based checkpoints (only for `"pickle"` storage).
|
118
|
+
|
119
|
+
* **`when`** (Type: `bool`, Default: `True`)\
|
120
|
+
Enable or disable checkpointing dynamically, useful for environment-based toggling.
|
121
|
+
|
122
|
+
* **`capture`** (Type: `bool`, Default: `False`)\
|
123
|
+
If `True`, includes global variables referenced by the function in call hashes (except those excluded via `NoHash`).
|
124
|
+
|
125
|
+
* **`should_expire`** (Type: `Callable[[datetime.datetime], bool]`, Default: `None`)\
|
126
|
+
A custom callable that receives the `datetime` timestamp of a cached result. It should return `True` if the cached result is considered expired and needs recomputation, or `False` otherwise.
|
127
|
+
|
128
|
+
* **`fn_hash_from`** (Type: `Any`, Default: `None`)\
|
129
|
+
Override the computed function identity hash with any hashable object you provide (e.g., version strings, config IDs). This gives you explicit control over the function's version and when its cache should be invalidated.
|
130
|
+
|
131
|
+
* **`verbosity`** (Type: `int` (`0`, `1`, or `2`), Default: `1`)\
|
132
|
+
Controls the level of logging output from `checkpointer`.
|
133
|
+
* `0`: No output.
|
134
|
+
* `1`: Shows when functions are computed and cached.
|
135
|
+
* `2`: Also shows when cached results are remembered (loaded from cache).
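
As an illustration of the `should_expire` contract described above, the following sketch builds a callable that treats results older than a given age as expired. `older_than` is a hypothetical helper, and the example assumes the timestamp passed in is a naive local `datetime`; if the library supplies timezone-aware values, the comparison would need adjusting.

```python
from datetime import datetime, timedelta

def older_than(max_age: timedelta):
    """Return a callable matching Callable[[datetime], bool]."""
    def should_expire(stored_at: datetime) -> bool:
        # Assumption: `stored_at` is a naive local timestamp.
        return datetime.now() - stored_at > max_age
    return should_expire

expire_daily = older_than(timedelta(days=1))

stale = expire_daily(datetime.now() - timedelta(days=2))
fresh = expire_daily(datetime.now())

# Intended usage, following the parameter list above (not executed here):
# @checkpoint(should_expire=expire_daily)
# def load_report(): ...
```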
|
136
|
+
|
137
|
+
## 🔬 Customize Argument Hashing
|
138
|
+
|
139
|
+
To improve cache hit rates or speed up hashing, you can customize how arguments are hashed without modifying the argument values themselves.
|
140
|
+
|
141
|
+
* **`Annotated[T, HashBy[fn]]`**:\
|
142
|
+
Transform the argument via `fn(argument)` before hashing. Useful for normalization (e.g., sorting lists) or optimized hashing for complex inputs.
|
143
|
+
|
144
|
+
* **`NoHash[T]`**:\
|
145
|
+
Exclude the argument from hashing completely, so changes to it won't trigger cache invalidation.
|
146
|
+
|
147
|
+
**Example:**
|
148
|
+
|
149
|
+
```python
|
150
|
+
from typing import Annotated
|
151
|
+
from checkpointer import checkpoint, HashBy, NoHash
|
152
|
+
from pathlib import Path
|
153
|
+
import logging
|
154
|
+
|
155
|
+
def file_bytes(path: Path) -> bytes:
|
156
|
+
return path.read_bytes()
|
157
|
+
|
158
|
+
@checkpoint
|
159
|
+
def process(
|
160
|
+
numbers: Annotated[list[int], HashBy[sorted]], # Hash by sorted list
|
161
|
+
data_file: Annotated[Path, HashBy[file_bytes]], # Hash by file content
|
162
|
+
log: NoHash[logging.Logger], # Exclude logger from hashing
|
163
|
+
):
|
164
|
+
...
|
165
|
+
```
|
166
|
+
|
167
|
+
In this example, the hash for `numbers` ignores order, `data_file` is hashed based on its contents rather than path, and changes to `log` don't affect caching.
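
The effect of transforming a value before hashing can be shown without the library. In this sketch `hash_by` and `identity` are hypothetical helpers, and `pickle` plus SHA-256 stand in for whatever hashing the library actually uses.

```python
import hashlib
import pickle
from typing import Any, Callable

def hash_by(transform: Callable[[Any], Any], value: Any) -> str:
    """Hash `transform(value)` instead of `value` itself."""
    return hashlib.sha256(pickle.dumps(transform(value))).hexdigest()

def identity(value: Any) -> Any:
    return value

# With sorting applied first, element order no longer affects the hash.
order_ignored = hash_by(sorted, [3, 1, 2]) == hash_by(sorted, [1, 2, 3])
# Without the transform, the two orderings hash differently.
order_matters = hash_by(identity, [3, 1, 2]) != hash_by(identity, [1, 2, 3])
```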
|
168
|
+
|
169
|
+
## 🎯 Capturing Global Variables
|
170
|
+
|
171
|
+
`checkpointer` can include **captured global variables** in call hashes - these are globals your function reads during execution that may affect results.
|
172
|
+
|
173
|
+
Use `capture=True` on `@checkpoint` to capture **all** referenced globals (except those explicitly excluded with `NoHash`).
|
174
|
+
|
175
|
+
Alternatively, you can **opt-in selectively** by annotating globals with:
|
176
|
+
|
177
|
+
* **`CaptureMe[T]`**:\
|
178
|
+
Capture the variable on every call (triggers invalidation on changes).
|
179
|
+
|
180
|
+
* **`CaptureMeOnce[T]`**:\
|
181
|
+
Capture once per Python session (for expensive, immutable globals).
|
182
|
+
|
183
|
+
You can also combine these with `HashBy` to customize how captured variables are hashed (e.g., hash by a subset of attributes).
|
184
|
+
|
185
|
+
**Example:**
|
186
|
+
|
187
|
+
```python
|
188
|
+
from typing import Annotated
|
189
|
+
from checkpointer import checkpoint, CaptureMe, CaptureMeOnce, HashBy
|
190
|
+
from pathlib import Path
|
191
|
+
|
192
|
+
def file_bytes(path: Path) -> bytes:
|
193
|
+
return path.read_bytes()
|
194
|
+
|
195
|
+
captured_data: CaptureMe[Annotated[Path, HashBy[file_bytes]]] = Path("data.txt")
|
196
|
+
session_config: CaptureMeOnce[dict] = {"mode": "prod"}
|
197
|
+
|
198
|
+
@checkpoint
|
199
|
+
def process():
|
200
|
+
# `captured_data` is included in the call hash on every call, hashed by file content
|
201
|
+
# `session_config` is hashed once per session
|
202
|
+
...
|
203
|
+
```
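
The difference between hashing a global on every call and hashing it once per session can be sketched as follows. `EveryCall` and `OncePerSession` are hypothetical classes written for this illustration; they are not the library's `CaptureMe` and `CaptureMeOnce` types.

```python
import hashlib
import pickle

def digest_of(value) -> str:
    return hashlib.sha256(pickle.dumps(value)).hexdigest()

class EveryCall:
    """Re-hashes the current value each time (CaptureMe-like behavior)."""
    def __init__(self, read):
        self._read = read

    def digest(self) -> str:
        return digest_of(self._read())

class OncePerSession:
    """Hashes the value on first use and reuses it (CaptureMeOnce-like behavior)."""
    def __init__(self, read):
        self._read = read
        self._cached = None

    def digest(self) -> str:
        if self._cached is None:
            self._cached = digest_of(self._read())
        return self._cached

settings = {"mode": "prod"}
every = EveryCall(lambda: settings)
once = OncePerSession(lambda: settings)

before = (every.digest(), once.digest())
settings["mode"] = "dev"  # mutate the global mid-session
after = (every.digest(), once.digest())
```

The per-call digest changes after the mutation, while the once-per-session digest stays fixed, which is why the latter suits values that are expensive to hash and not expected to change.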
|
204
|
+
|
205
|
+
## 🗄️ Custom Storage Backends
|
206
|
+
|
207
|
+
Implement your own storage backend by subclassing `checkpointer.Storage` and overriding required methods.
|
208
|
+
|
209
|
+
Within storage methods, `call_hash` identifies a call by its arguments. Use `self.fn_id()` to get the function's identity (name + hash/version), which is useful for organizing checkpoints.
|
210
|
+
|
211
|
+
**Example:**
|
212
|
+
|
213
|
+
```python
|
214
|
+
from checkpointer import checkpoint, Storage
|
215
|
+
from datetime import datetime
|
216
|
+
|
217
|
+
class MyCustomStorage(Storage):
|
218
|
+
def exists(self, call_hash):
|
219
|
+
fn_dir = self.checkpointer.directory / self.fn_id()
|
220
|
+
return (fn_dir / call_hash).exists()
|
221
|
+
|
222
|
+
def store(self, call_hash, data):
|
223
|
+
... # Store serialized data
|
224
|
+
return data # Must return data to checkpointer
|
225
|
+
|
226
|
+
def checkpoint_date(self, call_hash): ...
|
227
|
+
def load(self, call_hash): ...
|
228
|
+
def delete(self, call_hash): ...
|
229
|
+
|
230
|
+
@checkpoint(storage=MyCustomStorage)
|
231
|
+
def custom_cached_function(x: int):
|
232
|
+
return x ** 2
|
233
|
+
```
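
To show what the five methods might do once filled in, here is a standalone dict-backed sketch. `DictStorage` deliberately does not subclass `checkpointer.Storage`, so it runs without the library; the method names mirror the example above, while the return values and the stored timestamp format are assumptions.

```python
from datetime import datetime

class DictStorage:
    """Standalone stand-in mirroring the method names of a storage backend."""

    def __init__(self):
        self._entries = {}  # call_hash -> (stored_at, data)

    def exists(self, call_hash):
        return call_hash in self._entries

    def store(self, call_hash, data):
        self._entries[call_hash] = (datetime.now(), data)
        return data  # hand the data back to the caller

    def checkpoint_date(self, call_hash):
        return self._entries[call_hash][0]

    def load(self, call_hash):
        return self._entries[call_hash][1]

    def delete(self, call_hash):
        self._entries.pop(call_hash, None)

storage = DictStorage()
stored = storage.store("abc123", {"answer": 42})
loaded = storage.load("abc123")
stored_at = storage.checkpoint_date("abc123")
storage.delete("abc123")
still_there = storage.exists("abc123")
```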
|
234
|
+
|
235
|
+
## ⚡ Async Support
|
236
|
+
|
237
|
+
`checkpointer` works with Python's `asyncio` and other async runtimes.
|
238
|
+
|
239
|
+
```python
|
240
|
+
import asyncio
|
241
|
+
from checkpointer import checkpoint
|
242
|
+
|
243
|
+
@checkpoint
|
244
|
+
async def async_compute_sum(a: int, b: int) -> int:
|
245
|
+
print(f"Asynchronously computing {a} + {b}...")
|
246
|
+
await asyncio.sleep(1)
|
247
|
+
return a + b
|
248
|
+
|
249
|
+
async def main():
|
250
|
+
result1 = await async_compute_sum(3, 7)
|
251
|
+
print(f"Result 1: {result1}")
|
252
|
+
|
253
|
+
result2 = await async_compute_sum(3, 7)
|
254
|
+
print(f"Result 2: {result2}")
|
255
|
+
|
256
|
+
result3 = async_compute_sum.get(3, 7)
|
257
|
+
print(f"Result 3 (from cache): {result3}")
|
258
|
+
|
259
|
+
asyncio.run(main())
|
260
|
+
```
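
The same pattern can be sketched for coroutines with the standard library. `toy_async_checkpoint` is a hypothetical decorator written for this illustration: it awaits the wrapped coroutine on a miss and exposes a synchronous `get`, loosely mirroring the example above, and it is not how the library is implemented.

```python
import asyncio
from functools import wraps

def toy_async_checkpoint(fn):
    """Toy async memoizer keyed on positional arguments only."""
    cache = {}

    @wraps(fn)
    async def wrapper(*args):
        if args not in cache:
            cache[args] = await fn(*args)  # await the coroutine on a miss
        return cache[args]

    # Synchronous lookup of an already-stored result.
    wrapper.get = lambda *args: cache[args]
    return wrapper

runs = []

@toy_async_checkpoint
async def add(a: int, b: int) -> int:
    runs.append((a, b))  # record real executions
    await asyncio.sleep(0)
    return a + b

async def main():
    first = await add(3, 7)
    second = await add(3, 7)
    return first, second, add.get(3, 7)

results = asyncio.run(main())
```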
|
@@ -0,0 +1,244 @@
|
|
1
|
+
# checkpointer
|
2
|
+
|
3
|
+
`checkpointer` is a Python library offering a decorator-based API for memoizing (caching) function results with code-aware cache invalidation. It works with sync and async functions, supports multiple storage backends, and refreshes caches automatically when your code or dependencies change - helping you maintain correctness, speed up execution, and smooth out your workflows by skipping redundant, costly operations.
|
4
|
+
|
5
|
+
## 📦 Installation
|
6
|
+
|
7
|
+
```bash
|
8
|
+
pip install checkpointer
|
9
|
+
```
|
10
|
+
|
11
|
+
## 🚀 Quick Start
|
12
|
+
|
13
|
+
Apply the `@checkpoint` decorator to any function:
|
14
|
+
|
15
|
+
```python
|
16
|
+
from checkpointer import checkpoint
|
17
|
+
|
18
|
+
@checkpoint
|
19
|
+
def expensive_function(x: int) -> int:
|
20
|
+
print("Computing...")
|
21
|
+
return x ** 2
|
22
|
+
|
23
|
+
result = expensive_function(4) # Computes and stores the result
|
24
|
+
result = expensive_function(4) # Loads from the cache
|
25
|
+
```
|
26
|
+
|
27
|
+
## 🧠 How It Works
|
28
|
+
|
29
|
+
When a `@checkpoint`-decorated function is called, `checkpointer` computes a unique identifier for the call. This identifier derives from the function's source code, its dependencies, captured variables, and the arguments passed.
|
30
|
+
|
31
|
+
It then tries to retrieve a cached result using this identifier. If a valid cached result is found, it's returned immediately. Otherwise, the original function executes, its result is stored, and then returned.
|
32
|
+
|
33
|
+
### 🚨 What Triggers Cache Invalidation?
|
34
|
+
|
35
|
+
`checkpointer` maintains cache correctness using two types of hashes:
|
36
|
+
|
37
|
+
#### 1. Function Identity Hash (One-Time per Function)
|
38
|
+
|
39
|
+
This hash represents the decorated function itself and is computed once (usually on first invocation). It covers:
|
40
|
+
|
41
|
+
* **Decorated Function's Code:**\
|
42
|
+
The function's logic and signature (excluding parameter type annotations) are hashed. Formatting changes like whitespace, newlines, comments, or trailing commas do **not** cause invalidation.
|
43
|
+
|
44
|
+
* **Dependencies:**\
|
45
|
+
All user-defined functions and methods the function calls or uses are included recursively. Dependencies are detected by:
|
46
|
+
* Inspecting the function's global scope for referenced functions/objects.
|
47
|
+
* Inferring from argument type annotations.
|
48
|
+
* Analyzing object constructions and method calls to identify classes and methods used.
|
49
|
+
|
50
|
+
* **Top-Level Module Code:**\
|
51
|
+
Changes unrelated to the function or its dependencies in the module do **not** trigger invalidation.
|
52
|
+
|
53
|
+
#### 2. Call Hash (Computed on Every Function Call)
|
54
|
+
|
55
|
+
Each function call's cache key (the **call hash**) combines:
|
56
|
+
|
57
|
+
* **Passed Arguments:**\
|
58
|
+
Includes positional and keyword arguments, combined with default values. Changing a default value triggers invalidation only when it changes the values a call actually receives.
|
59
|
+
|
60
|
+
* **Captured Global Variables:**\
|
61
|
+
When `capture=True` or explicit capture annotations are used, `checkpointer` hashes global variables referenced by the function:
|
62
|
+
* `CaptureMe` variables are hashed on every call, so changes trigger invalidation.
|
63
|
+
* `CaptureMeOnce` variables are hashed once per session for performance optimization.
|
64
|
+
|
65
|
+
* **Custom Argument Hashing:**\
|
66
|
+
Using `HashBy` annotations, arguments or captured variables can be transformed before hashing (e.g., sorting lists to ignore order), allowing more precise or efficient call hashes.
|
67
|
+
|
68
|
+
## 💡 Usage
|
69
|
+
|
70
|
+
Once a function is decorated with `@checkpoint`, you can interact with its caching behavior using the following methods:
|
71
|
+
|
72
|
+
* **`expensive_function(...)`**:\
|
73
|
+
Call the function normally. This will compute and cache the result or load it from cache.
|
74
|
+
|
75
|
+
* **`expensive_function.rerun(...)`**:\
|
76
|
+
Force the original function to execute and overwrite any existing cached result.
|
77
|
+
|
78
|
+
* **`expensive_function.fn(...)`**:\
|
79
|
+
Call the undecorated function directly, bypassing the cache (useful in recursion to prevent caching intermediate steps).
|
80
|
+
|
81
|
+
* **`expensive_function.get(...)`**:\
|
82
|
+
Retrieve the cached result without executing the function. Raises `CheckpointError` if no valid cache exists.
|
83
|
+
|
84
|
+
* **`expensive_function.exists(...)`**:\
|
85
|
+
Check if a cached result exists without computing or loading it.
|
86
|
+
|
87
|
+
* **`expensive_function.delete(...)`**:\
|
88
|
+
Remove the cached entry for given arguments.
|
89
|
+
|
90
|
+
* **`expensive_function.reinit(recursive: bool = False)`**:\
|
91
|
+
Recalculate the function identity hash and recapture `CaptureMeOnce` variables, updating the cached function state within the same Python session.
|
92
|
+
|
93
|
+
## ⚙️ Configuration & Customization
|
94
|
+
|
95
|
+
The `@checkpoint` decorator accepts the following parameters:
|
96
|
+
|
97
|
+
* **`storage`** (Type: `str` or `checkpointer.Storage`, Default: `"pickle"`)\
|
98
|
+
Storage backend to use: `"pickle"` (disk-based, persistent), `"memory"` (in-memory, non-persistent), or a custom `Storage` class.
|
99
|
+
|
100
|
+
* **`directory`** (Type: `str` or `pathlib.Path` or `None`, Default: `~/.cache/checkpoints`)\
|
101
|
+
Base directory for disk-based checkpoints (only for `"pickle"` storage).
|
102
|
+
|
103
|
+
* **`when`** (Type: `bool`, Default: `True`)\
|
104
|
+
Enable or disable checkpointing dynamically, useful for environment-based toggling.
|
105
|
+
|
106
|
+
* **`capture`** (Type: `bool`, Default: `False`)\
|
107
|
+
If `True`, includes global variables referenced by the function in call hashes (except those excluded via `NoHash`).
|
108
|
+
|
109
|
+
* **`should_expire`** (Type: `Callable[[datetime.datetime], bool]`, Default: `None`)\
|
110
|
+
A custom callable that receives the `datetime` timestamp of a cached result. It should return `True` if the cached result is considered expired and needs recomputation, or `False` otherwise.
|
111
|
+
|
112
|
+
* **`fn_hash_from`** (Type: `Any`, Default: `None`)\
|
113
|
+
Override the computed function identity hash with any hashable object you provide (e.g., version strings, config IDs). This gives you explicit control over the function's version and when its cache should be invalidated.
|
114
|
+
|
115
|
+
* **`verbosity`** (Type: `int` (`0`, `1`, or `2`), Default: `1`)\
|
116
|
+
Controls the level of logging output from `checkpointer`.
|
117
|
+
* `0`: No output.
|
118
|
+
* `1`: Shows when functions are computed and cached.
|
119
|
+
* `2`: Also shows when cached results are remembered (loaded from cache).
|
120
|
+
|
121
|
+
## 🔬 Customize Argument Hashing
|
122
|
+
|
123
|
+
To improve cache hit rates or speed up hashing, you can customize how arguments are hashed without modifying the argument values themselves.
|
124
|
+
|
125
|
+
* **`Annotated[T, HashBy[fn]]`**:\
|
126
|
+
Transform the argument via `fn(argument)` before hashing. Useful for normalization (e.g., sorting lists) or optimized hashing for complex inputs.
|
127
|
+
|
128
|
+
* **`NoHash[T]`**:\
|
129
|
+
Exclude the argument from hashing completely, so changes to it won't trigger cache invalidation.
|
130
|
+
|
131
|
+
**Example:**
|
132
|
+
|
133
|
+
```python
|
134
|
+
from typing import Annotated
|
135
|
+
from checkpointer import checkpoint, HashBy, NoHash
|
136
|
+
from pathlib import Path
|
137
|
+
import logging
|
138
|
+
|
139
|
+
def file_bytes(path: Path) -> bytes:
|
140
|
+
return path.read_bytes()
|
141
|
+
|
142
|
+
@checkpoint
|
143
|
+
def process(
|
144
|
+
numbers: Annotated[list[int], HashBy[sorted]], # Hash by sorted list
|
145
|
+
data_file: Annotated[Path, HashBy[file_bytes]], # Hash by file content
|
146
|
+
log: NoHash[logging.Logger], # Exclude logger from hashing
|
147
|
+
):
|
148
|
+
...
|
149
|
+
```
|
150
|
+
|
151
|
+
In this example, the hash for `numbers` ignores order, `data_file` is hashed based on its contents rather than path, and changes to `log` don't affect caching.
|
152
|
+
|
153
|
+
## 🎯 Capturing Global Variables
|
154
|
+
|
155
|
+
`checkpointer` can include **captured global variables** in call hashes - these are globals your function reads during execution that may affect results.
|
156
|
+
|
157
|
+
Use `capture=True` on `@checkpoint` to capture **all** referenced globals (except those explicitly excluded with `NoHash`).
|
158
|
+
|
159
|
+
Alternatively, you can **opt-in selectively** by annotating globals with:
|
160
|
+
|
161
|
+
* **`CaptureMe[T]`**:\
|
162
|
+
Capture the variable on every call (triggers invalidation on changes).
|
163
|
+
|
164
|
+
* **`CaptureMeOnce[T]`**:\
|
165
|
+
Capture once per Python session (for expensive, immutable globals).
|
166
|
+
|
167
|
+
You can also combine these with `HashBy` to customize how captured variables are hashed (e.g., hash by a subset of attributes).
|
168
|
+
|
169
|
+
**Example:**
|
170
|
+
|
171
|
+
```python
|
172
|
+
from typing import Annotated
|
173
|
+
from checkpointer import checkpoint, CaptureMe, CaptureMeOnce, HashBy
|
174
|
+
from pathlib import Path
|
175
|
+
|
176
|
+
def file_bytes(path: Path) -> bytes:
|
177
|
+
return path.read_bytes()
|
178
|
+
|
179
|
+
captured_data: CaptureMe[Annotated[Path, HashBy[file_bytes]]] = Path("data.txt")
|
180
|
+
session_config: CaptureMeOnce[dict] = {"mode": "prod"}
|
181
|
+
|
182
|
+
@checkpoint
|
183
|
+
def process():
|
184
|
+
# `captured_data` is included in the call hash on every call, hashed by file content
|
185
|
+
# `session_config` is hashed once per session
|
186
|
+
...
|
187
|
+
```
|
188
|
+
|
189
|
+
## 🗄️ Custom Storage Backends
|
190
|
+
|
191
|
+
Implement your own storage backend by subclassing `checkpointer.Storage` and overriding required methods.
|
192
|
+
|
193
|
+
Within storage methods, `call_hash` identifies a call by its arguments. Use `self.fn_id()` to get the function's identity (name + hash/version), which is useful for organizing checkpoints.
|
194
|
+
|
195
|
+
**Example:**
|
196
|
+
|
197
|
+
```python
|
198
|
+
from checkpointer import checkpoint, Storage
|
199
|
+
from datetime import datetime
|
200
|
+
|
201
|
+
class MyCustomStorage(Storage):
|
202
|
+
def exists(self, call_hash):
|
203
|
+
fn_dir = self.checkpointer.directory / self.fn_id()
|
204
|
+
return (fn_dir / call_hash).exists()
|
205
|
+
|
206
|
+
def store(self, call_hash, data):
|
207
|
+
... # Store serialized data
|
208
|
+
return data # Must return data to checkpointer
|
209
|
+
|
210
|
+
def checkpoint_date(self, call_hash): ...
|
211
|
+
def load(self, call_hash): ...
|
212
|
+
def delete(self, call_hash): ...
|
213
|
+
|
214
|
+
@checkpoint(storage=MyCustomStorage)
|
215
|
+
def custom_cached_function(x: int):
|
216
|
+
return x ** 2
|
217
|
+
```
|
218
|
+
|
219
|
+
## ⚡ Async Support
|
220
|
+
|
221
|
+
`checkpointer` works with Python's `asyncio` and other async runtimes.
|
222
|
+
|
223
|
+
```python
|
224
|
+
import asyncio
|
225
|
+
from checkpointer import checkpoint
|
226
|
+
|
227
|
+
@checkpoint
|
228
|
+
async def async_compute_sum(a: int, b: int) -> int:
|
229
|
+
print(f"Asynchronously computing {a} + {b}...")
|
230
|
+
await asyncio.sleep(1)
|
231
|
+
return a + b
|
232
|
+
|
233
|
+
async def main():
|
234
|
+
result1 = await async_compute_sum(3, 7)
|
235
|
+
print(f"Result 1: {result1}")
|
236
|
+
|
237
|
+
result2 = await async_compute_sum(3, 7)
|
238
|
+
print(f"Result 2: {result2}")
|
239
|
+
|
240
|
+
result3 = async_compute_sum.get(3, 7)
|
241
|
+
print(f"Result 3 (from cache): {result3}")
|
242
|
+
|
243
|
+
asyncio.run(main())
|
244
|
+
```
|
@@ -8,8 +8,8 @@ from .types import AwaitableValue, Captured, CapturedOnce, CaptureMe, CaptureMeO
 
 checkpoint = Checkpointer()
 capture_checkpoint = Checkpointer(capture=True)
-memory_checkpoint = Checkpointer(
-tmp_checkpoint = Checkpointer(
+memory_checkpoint = Checkpointer(storage="memory", verbosity=0)
+tmp_checkpoint = Checkpointer(directory=f"{tempfile.gettempdir()}/checkpoints")
 static_checkpoint = Checkpointer(fn_hash_from=())
 
 def cleanup_all(invalidated=True, expired=True):