cashet 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,15 @@
{
  "permissions": {
    "allow": [
      "Bash(pytest:*)",
      "Bash(uv run:*)",
      "Bash(git add:*)",
      "Bash(git commit:*)",
      "Bash(git push:*)"
    ]
  },
  "sandbox": {
    "enabled": false,
    "autoAllowBashIfSandboxed": false
  }
}
@@ -0,0 +1,19 @@
name: Release

on:
  release:
    types: [published]

permissions:
  contents: read

jobs:
  pypi:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
        with:
          python-version: "3.11"
      - run: uv build
      - run: uv publish --token ${{ secrets.PYPI_API_TOKEN }}
@@ -0,0 +1,12 @@
.cashet/
*.db

__pycache__/
*.py[cod]
*.egg-info/
dist/
build/
.venv/
.pytest_cache/
.coverage
htmlcov/
@@ -0,0 +1 @@
3.11
cashet-0.1.0/AGENTS.md ADDED
@@ -0,0 +1,58 @@
# AGENTS.md

## Project overview

cashet is a content-addressable compute cache with git semantics. Python functions + args are hashed into cache keys, results stored as immutable blobs, identical calls deduplicated. Local-only but protocol-based for future distributed backends.

## Commands

- Install deps: `uv sync`
- Run tests: `uv run pytest tests/ -v`
- Lint: `uv run ruff check src/ tests/`
- Type check: `uv run pyright src/`
- Format fix: `uv run ruff format src/ tests/`
- Run CLI: `uv run cashet --help`

Always run `ruff check`, `pyright`, and `pytest` before committing. All three must pass clean.

## Tech stack

- Python >=3.11, src layout (`src/cashet/`)
- Build: hatchling
- Package manager: uv (no pip directly)
- Linting: ruff (target py311)
- Type checking: pyright strict mode (target 3.11)
- Testing: pytest
- CLI: click + rich

## Architecture

Protocol-based dependency injection via three pluggable protocols in `src/cashet/protocols.py`:

- **Store** — metadata + blob storage. Default: `SQLiteStore` in `store.py`
- **Executor** — runs functions. Default: `LocalExecutor` in `executor.py`
- **Serializer** — serialize/deserialize results. Default: `PickleSerializer` in `hashing.py`

Data flow: `Client.submit()` → `build_task_def()` hashes function source + args → `LocalExecutor.submit()` checks cache → runs if needed → `build_commit()` creates commit with parent lineage → blobs stored via `Store.put_blob()` with zlib compression (256B threshold).
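The blob write path can be sketched as follows (illustrative only: the names, and the choice to hash before compressing, are assumptions rather than the actual `store.py` API):

```python
import hashlib
import zlib

COMPRESS_THRESHOLD = 256  # bytes, matching the 256B threshold above

def encode_blob(payload: bytes) -> tuple[str, bytes, bool]:
    # Content address is the SHA-256 of the uncompressed payload,
    # so identical results dedupe regardless of compression.
    digest = hashlib.sha256(payload).hexdigest()
    if len(payload) >= COMPRESS_THRESHOLD:
        return digest, zlib.compress(payload), True
    return digest, payload, False
```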

## Key design decisions

- Function identity = source code only. Closure mutable state is NOT hashed. Users must pass values as explicit args for cache invalidation.
- Blobs are content-addressable (SHA-256). Identical outputs share one blob on disk.
- `cache=False` opt-out for non-deterministic functions. These get timestamp-salted hashes so they don't overwrite previous commits.
- All source files use `from __future__ import annotations` for union type syntax compatibility with >=3.10.
- `ResultRef` objects enable DAG chaining — pass a ref as an arg to `submit()` and it auto-resolves.

## Code style

- No comments or docstrings. Code is self-documenting through naming.
- `from __future__ import annotations` in every source file.
- Ruff rules: E, F, I, N, W, UP, B, SIM, RUF. Line length 99.
- Pyright strict mode with reportUnknown* set to "none".

## Boundaries

- Never commit secrets or .env files
- Never modify `uv.lock` manually — use `uv` commands
- PyPI-ready package with proper metadata and classifiers
- Import name is `cashet`, CLI entry point is `cashet`
cashet-0.1.0/CLAUDE.md ADDED
@@ -0,0 +1 @@
@AGENTS.md
cashet-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Dušan Jolović

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
cashet-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,530 @@
Metadata-Version: 2.4
Name: cashet
Version: 0.1.0
Summary: Content-addressable compute cache with git semantics
Project-URL: Repository, https://github.com/jolovicdev/cashet
Author-email: jolovicdev <dusan.jolovic@proton.me>
License-Expression: MIT
License-File: LICENSE
Keywords: cache,compute,content-addressable,dag,pipeline
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: click>=8.1
Requires-Dist: rich>=13.0
Description-Content-Type: text/markdown

<h1 align="center">cashet</h1>

<p align="center">
  <strong>Content-addressable compute cache with git semantics</strong><br>
  Run a function once. Get the same result instantly every time after that.
</p>

<p align="center">
  <a href="#install">Install</a> · <a href="#quick-start">Quick Start</a> · <a href="#why">Why</a> · <a href="#use-cases">Use Cases</a> · <a href="#cli">CLI</a> · <a href="#api">API</a> · <a href="#how-it-works">How It Works</a>
</p>

---

## Install

**Use it in any project** (installs the `cashet` CLI globally):

```bash
uv tool install git+https://github.com/jolovicdev/cashet.git
```

```bash
cashet --help
```

**Develop / contribute:**

```bash
git clone https://github.com/jolovicdev/cashet.git
cd cashet
uv sync
uv run pytest
```

## Quick Start

```python
from cashet import Client

client = Client()  # creates .cashet/ in current directory

def expensive_transform(data, scale=1.0):
    # imagine this takes 10 minutes
    return [x * scale for x in data]

# First call: runs the function
ref = client.submit(expensive_transform, [1, 2, 3], scale=2.0)
print(ref.load())  # [2.0, 4.0, 6.0]

# Second call with same args: instant — returns cached result
ref2 = client.submit(expensive_transform, [1, 2, 3], scale=2.0)
print(ref2.load())  # [2.0, 4.0, 6.0] — no re-computation
```

You can also use `Client` as a context manager to ensure the store connection is closed cleanly:

```python
with Client() as client:
    ref = client.submit(expensive_transform, [1, 2, 3], scale=2.0)
    print(ref.load())
```

Chain tasks into a pipeline where each step's output feeds into the next:

```python
from cashet import Client

client = Client()

def load_dataset(path):
    return list(range(100))

def normalize(data):
    max_val = max(data)
    return [x / max_val for x in data]

def train_model(data, lr=0.01):
    return {"loss": 0.05, "lr": lr, "samples": len(data)}

# Step 1: load
raw = client.submit(load_dataset, "data/train.csv")

# Step 2: normalize (receives raw output as input)
normalized = client.submit(normalize, raw)

# Step 3: train (receives normalized output)
model = client.submit(train_model, normalized, lr=0.001)

print(model.load())  # {'loss': 0.05, 'lr': 0.001, 'samples': 100}
```

Re-run the script — everything returns instantly from cache. Change one argument and only that step (and downstream) re-runs.

## Why

You already have caches (`functools.lru_cache`, `joblib.Memory`). Here's what's different:

| | lru_cache | joblib.Memory | **cashet** |
|---|---|---|---|
| Persists across restarts | No | Yes | Yes |
| Content-addressable storage | No | No | Yes (like git blobs) |
| AST-normalized hashing | No | No | Yes (comments/formatting don't break cache) |
| DAG resolution (chain outputs) | No | No | Yes |
| CLI to inspect history | No | No | Yes |
| Diff two runs | No | No | Yes |
| Garbage collection / eviction | No | No | Yes |
| Pluggable serialization | No | No | Yes |
| Explicit cache opt-out | No | Partial | Yes |
| Pluggable store / executor | No | No | Yes |

The core idea: **hash the function's AST-normalized source + arguments = unique cache key**. Comments, docstrings, and formatting changes don't invalidate the cache — only semantic changes do. Same function + same args = same result, stored immutably on disk. The result is a git-like blob you can inspect, diff, and chain.
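A toy sketch of that hashing scheme using only the standard library (illustrative: cashet's real implementation also folds in dependency versions and referenced helper functions):

```python
import ast
import hashlib

def _strip_docstrings(tree: ast.AST) -> ast.AST:
    # Drop leading string constants (docstrings) from modules, classes, functions.
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            first = body[0] if body else None
            if (isinstance(first, ast.Expr)
                    and isinstance(first.value, ast.Constant)
                    and isinstance(first.value.value, str)):
                node.body = body[1:] or [ast.Pass()]
    return tree

def cache_key(func_source: str, args: tuple, kwargs: dict) -> str:
    # Parsing and unparsing normalizes comments, whitespace, and formatting away.
    normalized = ast.unparse(_strip_docstrings(ast.parse(func_source)))
    func_hash = hashlib.sha256(normalized.encode()).hexdigest()
    args_hash = hashlib.sha256(repr((args, sorted(kwargs.items()))).encode()).hexdigest()
    return f"{func_hash}:{args_hash}"
```

Adding a comment or docstring leaves the key unchanged; editing the body or the arguments produces a new one.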
134
+
135
+ ## Use Cases
136
+
137
+ ### 1. ML Experiment Tracking Without the Bloat
138
+
139
+ You run 200 hyperparameter sweeps overnight. Half crash. You fix a bug and re-run. Without cashet, you re-process the dataset 200 times. With cashet:
140
+
141
+ ```python
142
+ from cashet import Client
143
+
144
+ client = Client()
145
+
146
+ def preprocess(dataset_path, image_size):
147
+ # 45 minutes of image resizing
148
+ ...
149
+
150
+ def train(data, learning_rate, dropout):
151
+ ...
152
+
153
+ data = client.submit(preprocess, "s3://my-bucket/images", 224)
154
+
155
+ for lr in [0.01, 0.001, 0.0001]:
156
+ for dropout in [0.2, 0.5]:
157
+ client.submit(train, data, lr, dropout)
158
+ ```
159
+
160
+ `preprocess` runs **once** — all 6 training jobs reuse its cached output. Re-run the script tomorrow and even the training results come from cache (same function + same args = instant).
161
+
162
+ ### 2. Data Pipeline Debugging
163
+
164
+ Your ETL pipeline fails at step 5. You fix a typo. Now you need to re-run steps 5-7 but steps 1-4 are unchanged and expensive:
165
+
166
+ ```python
167
+ from cashet import Client
168
+
169
+ client = Client()
170
+
171
+ raw = client.submit(load_s3, "s3://logs/2024-05-01/")
172
+ clean = client.submit(remove_pii, raw)
173
+ enriched = client.submit(join_crm, clean, "select * from users")
174
+ report = client.submit(generate_report, enriched)
175
+ ```
176
+
177
+ Fix the `join_crm` function and re-run the script. Steps 1-2 return instantly from cache. Only step 3 onward re-executes. This works because cashet tracks which function produced which output — changing a function's source code changes its hash, invalidating downstream cache entries.
178
+
179
+ ### 3. Reproducible Notebook Results
180
+
181
+ Share a result with a colleague and they can verify exactly how it was produced:
182
+
183
+ ```python
184
+ # your notebook
185
+ ref = client.submit(generate_forecast, date="2024-01-01", model="v3")
186
+ print(f"Result hash: {ref.hash}")
187
+ ```
188
+
189
+ ```bash
190
+ # their terminal — inspect provenance
191
+ cashet show <hash>
192
+
193
+ # Output:
194
+ # Hash: a3b4c5d6...
195
+ # Function: generate_forecast
196
+ # Source: def generate_forecast(date, model): ...
197
+ # Args: (('2024-01-01',), {'model': 'v3'})
198
+ # Created: 2024-05-01T10:32:17
199
+
200
+ # Retrieve the actual result
201
+ cashet get <hash> -o forecast.csv
202
+ ```
203
+
204
+ ### 4. Incremental Computation
205
+
206
+ Process a large dataset in chunks. Already-processed chunks return instantly:
207
+
208
+ ```python
209
+ from cashet import Client
210
+
211
+ client = Client()
212
+
213
+ def process_chunk(chunk_id, source_file):
214
+ # expensive per-chunk processing
215
+ ...
216
+
217
+ results = []
218
+ for chunk_id in range(100):
219
+ ref = client.submit(process_chunk, chunk_id, "huge_file.parquet")
220
+ results.append(ref)
221
+ ```
222
+
223
+ First run processes all 100 chunks. Second run (even after restarting Python) returns all 100 results instantly. Add a new chunk? Only that one runs.
224
+
## CLI

```bash
# Show commit history
cashet log

# Filter by function name
cashet log --func "preprocess"

# Filter by tag
cashet log --tag env=prod --tag experiment=run-1

# Show full commit details (source code, args, error)
cashet show <hash>

# Retrieve a result to file
cashet get <hash> -o output.bin

# Compare two commits
cashet diff <hash_a> <hash_b>

# Show lineage of a result (same function+args over time)
cashet history <hash>

# Delete a specific commit
cashet rm <hash>

# Evict old cache entries and orphaned blobs
cashet gc --older-than 30

# Storage statistics
cashet stats
```

## API

### `Client`

```python
from cashet import Client

client = Client(
    store_dir=".cashet",  # where to store blobs + metadata (SQLiteStore)
    store=None,           # or inject any Store implementation
    executor=None,        # or inject any Executor implementation
    serializer=None,      # defaults to PickleSerializer
)
```

### Pluggable Backends

Everything is protocol-based. Swap the store, executor, or serializer without touching your task code:

```python
from pathlib import Path

from cashet import Client, Store, Executor, Serializer
from cashet.store import SQLiteStore
from cashet.executor import LocalExecutor

# These are equivalent (the defaults):
client = Client(store_dir=".cashet")

# Explicit injection:
client = Client(
    store=SQLiteStore(Path(".cashet")),
    executor=LocalExecutor(),
)
```

**Store protocol** — implement this to use RocksDB, Redis, S3, or anything else:

```python
from cashet.protocols import Store

class RedisStore:
    def put_blob(self, data: bytes) -> ObjectRef: ...
    def get_blob(self, ref: ObjectRef) -> bytes: ...
    def put_commit(self, commit: Commit) -> None: ...
    def get_commit(self, hash: str) -> Commit | None: ...
    def find_by_fingerprint(self, fingerprint: str) -> Commit | None: ...
    def list_commits(self, ...) -> list[Commit]: ...
    def get_history(self, hash: str) -> list[Commit]: ...
    def stats(self) -> dict[str, int]: ...
    def evict(self, older_than: datetime) -> int: ...
    def close(self) -> None: ...

client = Client(store=RedisStore("redis://localhost"))
# Everything else works identically
```

**Executor protocol** — implement this for distributed execution (Celery, Kafka, RQ):

```python
from cashet.protocols import Executor

class CeleryExecutor:
    def submit(self, func, args, kwargs, task_def, store, serializer):
        # Push to Celery, poll for result
        ...

client = Client(
    store=RedisStore("redis://localhost"),
    executor=CeleryExecutor(),
)
```

**Serializer protocol** — covered under [Custom Serialization](#custom-serialization) below.

### `client.submit(func, *args, **kwargs) -> ResultRef`

Submit a function for execution. Returns a `ResultRef` — a lazy handle to the result.

```python
ref = client.submit(my_func, arg1, arg2, key="value")
ref.hash    # content hash of the result
ref.size    # size in bytes
ref.load()  # deserialize and return the result
```

If the same function + same arguments have been submitted before, returns the cached result **without re-executing**.

**Opt out of caching:**

```python
import random

# Per-call
ref = client.submit(non_deterministic_func, _cache=False)

# Per-function via decorator
@client.task(cache=False)
def random_score():
    return random.random()
```

**Tag commits:**

```python
# Per-call
ref = client.submit(train, data, lr=0.01, _tags={"experiment": "v1"})

# Per-function via decorator
@client.task(tags={"team": "ml"})
def preprocess(raw):
    ...
```

Tags are not part of the cache key — they are metadata for organization and filtering.

### `@client.task`

Register a function with cashet metadata and make it directly callable:

```python
@client.task
def my_func(x):
    return x * 2

ref = my_func(5)  # Returns ResultRef, same as client.submit(my_func, 5)
ref.load()        # 10

@client.task(cache=False, name="custom_task_name", tags={"env": "prod"})
def other_func(x):
    return x + 1
```

`client.submit(my_func, 5)` still works identically.

### `client.log()`, `client.show()`, `client.get()`, `client.diff()`, `client.history()`, `client.rm()`, `client.gc()`

```python
# List commits
commits = client.log(func_name="preprocess", limit=10)

# Filter by tags
commits = client.log(tags={"experiment": "v1"})

# Get commit details
commit = client.show(hash)
commit.task_def.func_source    # the source code
commit.task_def.args_snapshot  # the serialized args
commit.parent_hash             # previous commit for same func+args
commit.created_at

# Load a result by commit hash
result = client.get(hash)

# Diff two commits
diff = client.diff(hash_a, hash_b)
# {'func_changed': True, 'args_changed': False, 'output_changed': True, ...}

# Get lineage (all runs of same func+args)
history = client.history(hash)

# Evict old entries (default: 30 days)
evicted = client.gc()
# Evict entries older than 7 days
from datetime import timedelta
evicted = client.gc(older_than=timedelta(days=7))

# Storage stats
stats = client.stats()
# {'total_commits': 42, 'completed_commits': 40, 'stored_objects': 38}
```

### `ResultRef`

A lazy reference to a stored result. Pass it as an argument to chain tasks:

```python
step1 = client.submit(func_a, input_data)
step2 = client.submit(func_b, step1)  # step1 auto-resolves to its output
```

### Custom Serialization

```python
from cashet import Client, PickleSerializer, SafePickleSerializer, JsonSerializer

# Default: pickle (handles arbitrary Python objects)
client = Client(serializer=PickleSerializer())

# Safe pickle: restricts deserialization to an allowlist of known types
client = Client(serializer=SafePickleSerializer())

# Allow custom classes through the allowlist
client = Client(serializer=SafePickleSerializer(extra_classes=[MyClass]))

# For JSON-safe data (dicts, lists, primitives)
client = Client(serializer=JsonSerializer())

# Or implement the Serializer protocol
from cashet.hashing import Serializer

class MySerializer:
    def dumps(self, obj) -> bytes:
        ...
    def loads(self, data: bytes):
        ...
```
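The allowlist idea behind `SafePickleSerializer` follows a standard pattern: override `pickle.Unpickler.find_class` so only approved globals resolve. A self-contained sketch of that pattern (not cashet's actual code):

```python
import io
import pickle
from collections import OrderedDict, deque

ALLOWED = {
    ("collections", "OrderedDict"),
    # extend with ("my_module", "MyClass") to allow custom types
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Primitives, lists, dicts, etc. never reach find_class;
        # anything that does must be explicitly allowlisted.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Anything outside the allowlist, such as a `collections.deque` here, raises `UnpicklingError` instead of deserializing.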

## How It Works

```
client.submit(func, arg1, arg2)
            │
            ▼
   ┌─────────────────┐
   │ Hash function   │  SHA256(AST-normalized source + dep versions + referenced user helpers)
   │ Hash arguments  │  SHA256(canonical repr of args/kwargs)
   └────────┬────────┘
            │
            ▼
   ┌─────────────────┐
   │ Fingerprint     │  func_hash:args_hash
   │ cache lookup    │  ← Store protocol (SQLiteStore, RedisStore, ...)
   └────────┬────────┘
            │
      ┌─────┴─────┐
      │           │
   CACHED        MISS
      │           │
      ▼           ▼
 Return ref   Execute function   ← Executor protocol (LocalExecutor, CeleryExecutor, ...)
              Store result as blob → Store protocol
              Record commit with parent lineage
              Return ref
```
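The cached/miss branch above reduces to a few lines. A minimal in-memory model of the flow, using a bytecode hash in place of cashet's AST-normalized source hash and a dict in place of the Store:

```python
import hashlib

class MiniCache:
    def __init__(self):
        self._results = {}  # fingerprint -> stored result (stand-in for Store)
        self.executions = 0

    def submit(self, func, *args, **kwargs):
        func_hash = hashlib.sha256(func.__code__.co_code).hexdigest()
        args_hash = hashlib.sha256(repr((args, sorted(kwargs.items()))).encode()).hexdigest()
        fingerprint = f"{func_hash}:{args_hash}"
        if fingerprint not in self._results:   # MISS: execute and store
            self._results[fingerprint] = func(*args, **kwargs)
            self.executions += 1
        return self._results[fingerprint]      # CACHED: skip execution
```

Submitting the same function with the same arguments twice executes it once; a new argument produces a new fingerprint and a fresh run.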

**Architecture (protocol-based):**

| Protocol | Default | Implement for |
|---|---|---|
| `Store` | `SQLiteStore` | RocksDB, Redis, S3, Postgres |
| `Executor` | `LocalExecutor` | Celery, Kafka, RQ, subprocess |
| `Serializer` | `PickleSerializer` | JSON, MessagePack, custom formats |

**Storage layout** (in `.cashet/`):

```
.cashet/
├── objects/             # content-addressable blobs (like git objects)
│   ├── a3/
│   │   └── b4c5d6...    # compressed result blob
│   └── e7/
│       └── f8a9b0...
└── meta.db              # SQLite: commits, fingerprints, provenance
```
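The object path for a given result follows the git-style fan-out shown above; a sketch (the exact split after the two-character directory is inferred from the tree, not confirmed against `store.py`):

```python
import hashlib
from pathlib import Path

def blob_path(root: Path, payload: bytes) -> Path:
    # First two hex chars of the SHA-256 digest pick the fan-out
    # directory; the remaining 62 name the blob file.
    digest = hashlib.sha256(payload).hexdigest()
    return root / "objects" / digest[:2] / digest[2:]
```

The fan-out keeps any single directory from accumulating every blob, which matters once a cache holds many thousands of objects.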

**Key design decisions:**

- **Closure variables are not hashed** and emit a `ClosureWarning` if present. Function identity is source code, not runtime state. If you need cache invalidation based on a value, pass it as an explicit argument.
- **Referenced user-defined helper functions are hashed recursively.** Change an imported helper in your own code and the caller's cache invalidates correctly. Builtin and third-party library functions are skipped.
- **Blobs are deduplicated by content hash.** Identical results share one blob on disk.
- **Source is hashed as an AST.** Comments, docstrings, and whitespace changes don't invalidate the cache.
- **Non-cached tasks get unique commit hashes** (timestamp salt) so they always re-execute but still record lineage.
- **Parent tracking:** Each commit records the hash of the previous commit for the same function+args, forming a history chain you can traverse.
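The closure caveat is worth a concrete illustration. With a source-derived key, rebinding a closure variable is invisible, while the same value passed as an argument participates in the key (standalone sketch, not cashet's API):

```python
import hashlib

def cache_key(func_source: str, args: tuple) -> str:
    # Simplified: key = hash of source text plus the argument repr
    return hashlib.sha256((func_source + repr(args)).encode()).hexdigest()

closure_src = "def scale(x):\n    return x * factor\n"
# `factor` is rebound from 2 to 3 between runs, but the source,
# and therefore the key, is unchanged: a stale cache hit.
assert cache_key(closure_src, (10,)) == cache_key(closure_src, (10,))

explicit_src = "def scale(x, factor):\n    return x * factor\n"
# As an explicit argument, the new value produces a new key.
assert cache_key(explicit_src, (10, 2)) != cache_key(explicit_src, (10, 3))
```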

## Project Status

**Experimental.** The core (hashing, DAG resolution, fingerprint dedup) is stable. The defaults work reliably for single-machine workflows. The protocol layer (`Store`, `Executor`, `Serializer`) is ready for alternative backends — implementing a Redis store or Celery executor is a single-file job.

Built-in: `SQLiteStore` + `LocalExecutor` + `PickleSerializer`.
Not yet built: Redis, RocksDB, S3 stores; Celery/Kafka executors. PRs welcome.

## License

MIT