withcache-0.2.0-py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- withcache/__init__.py +11 -0
- withcache/_shim.py +134 -0
- withcache/curlwithcache.py +51 -0
- withcache/server.py +901 -0
- withcache/static/htmx.min.js +1 -0
- withcache/static/pico.min.css +4 -0
- withcache/wgetwithcache.py +51 -0
- withcache-0.2.0.data/scripts/curlwithcache +4 -0
- withcache-0.2.0.data/scripts/wgetwithcache +4 -0
- withcache-0.2.0.dist-info/METADATA +271 -0
- withcache-0.2.0.dist-info/RECORD +14 -0
- withcache-0.2.0.dist-info/WHEEL +4 -0
- withcache-0.2.0.dist-info/entry_points.txt +2 -0
- withcache-0.2.0.dist-info/licenses/LICENSE +28 -0

withcache-0.2.0.dist-info/METADATA
@@ -0,0 +1,271 @@
Metadata-Version: 2.4
Name: withcache
Version: 0.2.0
Summary: Operator-curated, URL-keyed artifact cache for a small lab (CUDA/ROCm/DOCA/firmware)
Project-URL: Homepage, https://github.com/safl/withcache
Author-email: "Simon A. F. Lund" <safl@safl.dk>
License: BSD-3-Clause
License-File: LICENSE
Keywords: artifacts,cache,cuda,doca,firmware,lab,rocm
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: System :: Archiving :: Mirroring
Requires-Python: >=3.10
Description-Content-Type: text/markdown

# withcache

A tiny, **operator-curated** artifact cache for a small lab's big vendor
downloads, the ones you re-pull constantly (CUDA, ROCm, DOCA, firmware, drivers),
fronted by **transparent `curl`/`wget` shims** so existing scripts use it with no changes.

Think of it as **"`ccache` for HTTP artifacts, without a proxy."**

```
curl -fsSL https://the/origin/cuda.tar.gz -o cuda.tar.gz   # your script, unchanged
  └─ curlwithcache shim ─ WITHCACHE_SERVER set?
       ├─ cached → served from the cache-host (fast, local)
       └─ miss/unset/unreachable → runs the real curl, exactly as written
```

Artifacts are cached **by their origin URL as a key**; the shim opts in by
re-pointing the URL at the cache. No transparent proxy, no TLS interception, no
client CA. The URL is a lookup key, not a connection target.

By default a miss is **auto-fetched**: the request falls through to origin (so
the caller gets its file straight away), and the cache-host pulls the same
artifact in the background, so the next request hits. Run with **`--curate`** to
require a human instead: an operator reviews the miss list in a small web UI and
presses **Download** (or pre-seeds via *Add from URI*). Either way the cache-host is
the only box that fills the cache, so it is the only one that needs egress (and any
vendor credentials) for caching. Clients never write to it, although a client that
misses still fetches from origin itself.
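The miss handling described above could be modelled roughly as follows. This is a hypothetical, in-memory sketch: `MissTable`, `record_miss`, `approve`, `dismiss` and the `enqueue` callback are names invented for illustration, not withcache's API, and persistence is ignored entirely. It shows the one behavioural switch: auto-fetch queues a pull on the first miss, while `--curate` parks the URL until an operator decides.

```python
from collections import Counter
from typing import Callable, Set


class MissTable:
    """Hypothetical miss table: auto-fetch by default, operator queue when curate=True."""

    def __init__(self, enqueue: Callable[[str], None], curate: bool = False) -> None:
        self._enqueue = enqueue           # hands a URL key to the background downloader
        self.curate = curate
        self.misses: Counter = Counter()  # times each key was requested while uncached
        self.pending: Set[str] = set()    # keys awaiting an operator decision (--curate)
        self._queued: Set[str] = set()    # keys already handed to the downloader

    def _queue_once(self, key: str) -> None:
        # A burst of misses for the same artifact must not start duplicate pulls.
        if key not in self._queued:
            self._queued.add(key)
            self._enqueue(key)

    def record_miss(self, key: str) -> None:
        """Called when a probe misses; the client itself falls through to origin."""
        self.misses[key] += 1
        if self.curate:
            self.pending.add(key)
        else:
            self._queue_once(key)

    def approve(self, key: str) -> None:
        """The operator pressed Download."""
        self.pending.discard(key)
        self._queue_once(key)

    def dismiss(self, key: str) -> None:
        """The operator pressed Dismiss."""
        self.pending.discard(key)
```

For example, `MissTable(pulled.append)` with two misses for the same URL appends it to `pulled` once and counts two misses. The same object with `curate=True` appends nothing until `approve()` is called.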

## Why not just curl + a caching proxy?

For `https://` (in practice, every vendor download) a forward proxy can't cache without
**SSL-bump / MITM**: curl tunnels TLS end-to-end via `CONNECT`, so the proxy
only sees ciphertext. The shim sidesteps that entirely by *re-pointing the URL*
to the cache instead of intercepting the connection. A generic caching proxy also has
no equivalent of the optional **operator-curated** model (`--curate`: a miss queue a human approves).

## Components

| Path                          | What it is                                                  |
|-------------------------------|-------------------------------------------------------------|
| `src/withcache/server.py`     | The cache-host: blob store + miss table + **background download manager** + operator UI (Pico.css + HTMX) |
| `src/withcache/_shim.py`      | Shared shim core (find URL → probe → rewrite → exec) |
| `src/withcache/curlwithcache.py` / `wgetwithcache.py` | The Python `curl` / `wget` shims |
| `shim/shim.zig`               | The native shim: one static binary, both tools via `argv[0]` |
| `deploy/Containerfile`, `deploy/compose.yml` | Single Podman/Docker host deploy |

The cache-host and the Python shims are **stdlib-only** (no third-party runtime
deps); the native shim is a dependency-free static binary.

## Install

The **cache-host** and **Python shims** (work on any box with Python ≥ 3.10):

```sh
pipx install withcache   # or: uv tool install withcache / pip install withcache
# provides: curlwithcache wgetwithcache withcache-server
```

The **native shim** needs no Python, which suits minimal/distroless boxes (it is a
~200 KB static musl binary). Grab it from the [Releases] page; one binary serves both
tools, choosing which one to wrap from the name it is invoked as:

```sh
curl -fL .../releases/.../withcache-shim-x86_64-linux-musl -o /usr/local/bin/curlwithcache
chmod +x /usr/local/bin/curlwithcache
ln -sf curlwithcache /usr/local/bin/wgetwithcache   # same binary, wget mode via argv[0]
```
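Dispatching on the invoked name is the usual multi-call binary trick. The shim itself is written in Zig, so the Python sketch below only illustrates the idea. The `tool_from_argv0` name and the exact matching rule (strip a `withcache` suffix, then accept `curl` or `wget`) are assumptions, not the shim's actual code.

```python
import os


def tool_from_argv0(argv0: str) -> str:
    """Hypothetical: map the name a multi-call shim was invoked as to the tool it wraps.

    Handles both the installed names (curlwithcache, wgetwithcache) and the
    shadowing symlinks (curl, wget) used by the PATH-based setups.
    """
    name = os.path.basename(argv0)
    suffix = "withcache"
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    if name not in ("curl", "wget"):
        raise ValueError(f"don't know which tool to wrap for {argv0!r}")
    return name
```

A shim would call it once at start-up, for example `tool = tool_from_argv0(sys.argv[0])`. The shell passes the name it resolved (`curl`, `wgetwithcache`, ...), so a symlink is enough to switch modes.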

The Python shim is also the tested **oracle** and install-time fallback for
platforms without a prebuilt binary; a [differential test](tests/test_differential.py)
asserts the binary and the Python `plan()` rewrite argv identically.

[Releases]: https://github.com/safl/withcache/releases
[direnv]: https://direnv.net

## Deploy the cache-host

```sh
export WITHCACHE_ADMIN_PASSWORD=change-me      # protects the operator UI
podman compose -f deploy/compose.yml up -d     # or: docker compose -f ...
# operator UI: http://withcache-server:3000/
```

Or without containers:

```sh
WITHCACHE_ADMIN_PASSWORD=change-me withcache-server --data-dir ./data --port 3000
```

Data (blobs + `cache.db` + `session-secret`) lives in the `/data` volume (or
`--data-dir`). Artifacts are assumed immutable per version (a versioned URL always
serves the same bytes), so there is no cache invalidation. `--workers N` sets the
number of concurrent download workers, and `--curate` switches from auto-fetch to
operator-approved pulls.

## Use the shims (transparent `curl` / `wget`)

Every approach uses the same two ingredients: (1) point at the cache with
`WITHCACHE_SERVER`, and (2) route `curl`/`wget` calls through the shim. They differ
only in **how widely the system `curl`/`wget` is shadowed**. Pick the least
invasive one that fits.

> **Safety:** with `WITHCACHE_SERVER` (and any per-tool override) unset, the shim is
> a pure pass-through (it just `exec`s the real tool, zero network/parsing), so even
> the system-wide setup is harmless wherever the cache isn't configured. Worst case
> is always "no caching, `curl` still works."

The symlink recipes below all use `command -v curlwithcache`, so they work whether
you installed the native binary or the Python launcher (both land under that name).

### 1. No shadowing: call the shims by name (least invasive)

Nothing is renamed; you opt in per command. Good for trying it out or a script
you can edit.

```sh
export WITHCACHE_SERVER=http://withcache-server:3000
curlwithcache -fsSL https://the/origin/cuda.tar.gz -o cuda.tar.gz
wgetwithcache https://the/origin/rocm.tar.gz
```

### 2. This shell only: shadow `curl`/`wget` for the session

Put `curl`/`wget` symlinks in a dir and prepend it to `PATH` in the current
shell. Reversible by just closing the shell.

```sh
mkdir -p ~/.withcache/bin
ln -sf "$(command -v curlwithcache)" ~/.withcache/bin/curl
ln -sf "$(command -v wgetwithcache)" ~/.withcache/bin/wget

export WITHCACHE_SERVER=http://withcache-server:3000
export PATH="$HOME/.withcache/bin:$PATH"
hash -r                 # forget any cached curl/wget location

command -v curl         # -> ~/.withcache/bin/curl (verify it's the shim)
curl -fsSL https://the/origin/cuda.tar.gz -o cuda.tar.gz   # existing scripts, unchanged
wget https://the/origin/rocm.tar.gz                        # still saved as rocm.tar.gz
```

### 3. Your user: make it the default for your shells (persistent)

Create the symlinks once, then add the two exports to your shell rc. Affects all
your future interactive shells; undo by deleting the block.

```sh
mkdir -p ~/.withcache/bin
ln -sf "$(command -v curlwithcache)" ~/.withcache/bin/curl
ln -sf "$(command -v wgetwithcache)" ~/.withcache/bin/wget

cat >> ~/.bashrc <<'EOF'

# withcache: transparent curl/wget caching
export WITHCACHE_SERVER=http://withcache-server:3000
export PATH="$HOME/.withcache/bin:$PATH"
EOF
```

### 4. One project only: scope it with direnv

Drop an `.envrc` in a project tree (requires [direnv]); caching applies only
inside that directory.

```sh
# .envrc
export WITHCACHE_SERVER=http://withcache-server:3000
PATH_add ~/.withcache/bin   # assumes the symlinks from approach 2/3 exist
```

Then `direnv allow`.

### 5. The whole machine: every user, every shell (most invasive)

Install the shim as `curl`/`wget` in `/usr/local/bin` (ahead of `/usr/bin` on
the default `PATH`) and set the server globally. This also catches build tools
and package managers that shell out to `curl`/`wget`.

```sh
sudo ln -sf "$(command -v curlwithcache)" /usr/local/bin/curl
sudo ln -sf "$(command -v wgetwithcache)" /usr/local/bin/wget

# A login-shell env file (covers interactive logins; daemons started outside a
# login shell won't see it; set WITHCACHE_SERVER in their unit if you need it).
echo 'export WITHCACHE_SERVER=http://withcache-server:3000' \
  | sudo tee /etc/profile.d/withcache.sh >/dev/null
```

On minimal/distroless hosts use the [native shim binary](#install) here: same
symlink, no Python required. Whichever shim you link, its target must be readable
and executable by every user, so prefer a system-wide install over a per-user
location such as `~/.local/bin`.

### Verify / turn it off

```sh
command -v curl                  # which curl is in effect (the shim, or the real one)
export REAL_CURL=/usr/bin/curl   # optional: pin the wrapped tool (also $REAL_WGET)

unset WITHCACHE_SERVER CURLWITHCACHE_SERVER WGETWITHCACHE_SERVER   # back to plain curl (pass-through)
rm ~/.withcache/bin/curl ~/.withcache/bin/wget   # remove shadowing (approaches 2-4)
# approach 5: sudo rm /usr/local/bin/curl /usr/local/bin/wget /etc/profile.d/withcache.sh
hash -r                          # make the shell re-resolve curl/wget
```

How it works: the shim **scans for the URL, asks the cache, and execs the real tool**:

1. Find the real `curl`/`wget` on `$PATH` (skipping itself; `$REAL_CURL`/`$REAL_WGET` override).
2. With `WITHCACHE_SERVER` set, find the URL (the `scheme://` arg, or `--url`).
3. Probe the cache with that same tool (`curl -I` / `wget --spider`).
   - **Hit** → re-point only the URL at `http://server/b/<base64(origin)>/<basename>` and `exec` the real tool (so `-o`, `-O`, `-L`, `--retry`, … all still apply, and the file is named after the artifact).
   - **Miss / unreachable** → `exec` the real tool with your **arguments untouched** (origin); the miss is recorded for the operator.
4. With no `WITHCACHE_SERVER`, it does **zero** network/parsing, just `exec`s the real tool.
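Steps 2 and 3 above can be sketched as a small function over the argument list. This is a Python approximation written for illustration, not withcache's `_shim.py`. `find_url`, `cache_url`, `curl_probe` and `rewrite_argv` are hypothetical names, and URL-safe base64 is a guess at the `<base64(origin)>` encoding (standard base64 can contain `/`, which would split the path). The sketch also assumes the cache answers a miss with an HTTP error status. For `wget`, the probe would be `wget --spider -q <url>` instead.

```python
import base64
import posixpath
import re
import subprocess
from typing import Callable, List, Optional
from urllib.parse import urlsplit

# Treat an argument that *starts* with "scheme://" as the URL (assumption; an option
# value such as `curl -e https://referrer` would fool this simple scan).
_URL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def find_url(args: List[str]) -> Optional[int]:
    """Index of the URL in args: the value after --url, or the first scheme:// arg."""
    for i, arg in enumerate(args):
        if arg == "--url" and i + 1 < len(args):
            return i + 1
        if _URL_RE.match(arg):
            return i
    return None


def cache_url(server: str, origin: str) -> str:
    """http://server/b/<base64(origin)>/<basename>; URL-safe base64 is an assumption."""
    token = base64.urlsafe_b64encode(origin.encode()).decode("ascii")
    basename = posixpath.basename(urlsplit(origin).path) or "index.html"  # hypothetical fallback
    return f"{server.rstrip('/')}/b/{token}/{basename}"


def curl_probe(real_curl: str, url: str) -> bool:
    """HEAD the cache URL; -f makes an HTTP error (assumed to mean "miss") exit non-zero."""
    result = subprocess.run(
        [real_curl, "-fsI", "--max-time", "5", url],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def rewrite_argv(args: List[str], server: Optional[str],
                 probe: Callable[[str], bool]) -> List[str]:
    """Arguments to exec the real tool with: only the URL is swapped, and only on a hit."""
    if not server:
        return list(args)   # no server: no parsing, no network
    i = find_url(args)
    if i is None:
        return list(args)   # e.g. URL hidden in a -K file: uncached pass-through
    target = cache_url(server, args[i])
    if not probe(target):
        return list(args)   # miss or cache unreachable: origin, arguments untouched
    out = list(args)
    out[i] = target         # -o, -O, -L, --retry, ... are left exactly as given
    return out

# A shim would then do something like:
#   os.execv(real, [real, *rewrite_argv(sys.argv[1:], server, lambda u: curl_probe(real, u))])
```

Passing the probe in as a parameter makes the rewrite a pure function of argv. That is what makes a differential test practical: two implementations can be compared argv-for-argv without a live cache.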

Notes & limits (all degrade gracefully; worst case is "no caching, curl still works"):
- Needs the wrapped tool present (the shim wraps it, it doesn't replace it). The Python shim adds roughly one interpreter start-up of latency per call (the native shim avoids that), and each call with a detectable URL also pays one probe round-trip to the cache.
- URLs hidden in a config or input file (`curl -K`, `wget -i`) or piped via stdin aren't seen, so those calls pass through uncached.
- Per-tool env override: `CURLWITHCACHE_SERVER` / `WGETWITHCACHE_SERVER` take precedence over `WITHCACHE_SERVER`.
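The environment lookups from step 1 and the per-tool override can be sketched like this. `resolve_server` and `find_real_tool` are hypothetical helpers, and treating an empty variable as unset is an assumption. Comparing resolved paths is one way to "skip itself" even when the shim is reached through a `curl` symlink.

```python
import os
from typing import Mapping, Optional


def resolve_server(tool: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Per-tool variable first (CURLWITHCACHE_SERVER / WGETWITHCACHE_SERVER), then WITHCACHE_SERVER."""
    for var in (f"{tool.upper()}WITHCACHE_SERVER", "WITHCACHE_SERVER"):
        value = env.get(var, "").strip()
        if value:  # treating an empty value as unset is an assumption
            return value
    return None


def find_real_tool(tool: str, self_path: str,
                   env: Mapping[str, str] = os.environ) -> Optional[str]:
    """$REAL_CURL / $REAL_WGET wins; else the first executable on PATH that isn't this shim."""
    override = env.get(f"REAL_{tool.upper()}")
    if override:
        return override
    me = os.path.realpath(self_path)
    for directory in env.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory or ".", tool)
        if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                and os.path.realpath(candidate) != me):  # skip ourselves, even via a symlink
            return candidate
    return None
```

With `PATH=~/.withcache/bin:/usr/bin`, the `curl` symlink in the first directory resolves to the shim itself and is skipped, so `/usr/bin/curl` is returned.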

## Operator UI

`http://withcache-server:3000/` (Pico.css + HTMX, bundled so it works offline) shows:
- **Misses**: auto-fetched by default, or (under `--curate`) each with **Download** (queues a background pull) and **Dismiss**.
- **Downloads**: live progress bars, `queued/running/completed/cancelled/failed`, **Cancel**, and **Clear finished**. Downloads run in a background worker pool (modelled on [bty]'s job managers) rather than inside the request, so large pulls never block.
- **Cached artifacts**: URL, size, **hits** (times served) and **misses** (times requested before it was cached), SHA-256, fetched-at.
- **Add from URI**: pre-seed an artifact before anyone misses it.
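A background download manager with the job states listed above can be sketched as a thread pool fed by a FIFO queue. `Job`, `DownloadManager` and the `fetch` callback are hypothetical. The server's real manager is not shown in this README, and a real `fetch` would stream bytes, update `done`/`total` and poll `job.cancel` between chunks.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Job:
    url: str
    state: str = "queued"          # queued -> running -> completed | cancelled | failed
    done: int = 0                  # bytes so far, for a progress bar
    total: Optional[int] = None
    cancel: threading.Event = field(default_factory=threading.Event)


class DownloadManager:
    """Hypothetical worker-pool download manager (cf. the server's --workers N)."""

    def __init__(self, fetch: Callable[[Job], None], workers: int = 2) -> None:
        self._fetch = fetch  # performs the transfer for one job
        self._queue: "queue.Queue[Job]" = queue.Queue()
        self._lock = threading.Lock()
        self.jobs: List[Job] = []
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, url: str) -> Job:
        job = Job(url)
        with self._lock:
            self.jobs.append(job)
        self._queue.put(job)
        return job

    def cancel(self, job: Job) -> None:
        job.cancel.set()  # honoured before the job starts, or by fetch() mid-transfer

    def clear_finished(self) -> None:
        with self._lock:
            self.jobs = [j for j in self.jobs if j.state in ("queued", "running")]

    def wait(self) -> None:
        self._queue.join()  # block until every submitted job has been processed

    def _worker(self) -> None:
        while True:
            job = self._queue.get()
            try:
                if job.cancel.is_set():
                    job.state = "cancelled"
                    continue
                job.state = "running"
                self._fetch(job)
                job.state = "cancelled" if job.cancel.is_set() else "completed"
            except Exception:
                job.state = "failed"
            finally:
                self._queue.task_done()
```

Because requests only call `submit()` and return, a multi-gigabyte pull never ties up an HTTP handler. The UI simply renders `jobs` and their `done`/`total` counters.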

## Auth

Single-tenant session-cookie auth (modelled on [bty]'s approach, env password
instead of PAM). The **read path** (`/blob`, `/b/…`, `/healthz`) is open so shims
never log in; the **operator surface** (`/`, `/admin/*`) is gated.

| Env var                    | Purpose                                                  |
|----------------------------|----------------------------------------------------------|
| `WITHCACHE_SERVER`         | Cache-host URL the shims use |
| `CURLWITHCACHE_SERVER` / `WGETWITHCACHE_SERVER` | Per-tool override of the above |
| `WITHCACHE_ADMIN_PASSWORD` | Operator login password (unset ⇒ UI open, with a warning) |
| `WITHCACHE_SESSION_SECRET` | Override the persisted cookie-signing key (optional) |
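The gating rule and a signed session cookie could look roughly like this. HMAC-SHA256 over a `value.mac` cookie is an assumption; the README only says there is a persisted cookie-signing key, overridable via `WITHCACHE_SESSION_SECRET`. Treating `/blob` and `/healthz` as exact paths and `/b/` as a prefix is an interpretation of the list above. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def is_open(path: str) -> bool:
    """Read path needs no login; everything else (/, /admin/*, ...) is gated."""
    path = path.split("?", 1)[0]
    return path in ("/blob", "/healthz") or path.startswith("/b/")


def check_password(given: str, expected: str) -> bool:
    # Constant-time comparison, so timing doesn't leak how many characters matched.
    return hmac.compare_digest(given.encode(), expected.encode())


def new_secret() -> bytes:
    """A fresh random signing key, e.g. for a first start with nothing persisted."""
    return secrets.token_bytes(32)


def sign(value: str, secret: bytes) -> str:
    mac = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"


def verify(cookie: str, secret: bytes) -> Optional[str]:
    """Return the signed value if the MAC matches, else None."""
    value, sep, mac = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
    # Compare bytes: compare_digest rejects non-ASCII str, which a forged cookie may contain.
    return value if hmac.compare_digest(mac.encode(), expected.encode()) else None
```

The allow-list is deny-by-default: any route not explicitly on the read path needs a session, so a newly added admin page can't be left open by accident. A production cookie would also carry an expiry inside the signed value.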

[bty]: https://github.com/safl/bty

## Cache keys & signed URLs

The key is `scheme://host/path` with the **query string dropped** by default, so
CDN/presigned URLs (whose tokens differ on every request) still match by path. Pass
`--keep-query` to the server when the query is part of what identifies the artifact.
Package-manager repos (`.deb`/`.rpm`) are GPG-signed and verified by the client
regardless of transport, so a tampered cached copy would be rejected; caching them
this way is safe.
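The key normalisation can be sketched with `urllib.parse`. `cache_key` is a hypothetical name. The README does not say whether the server also normalises host case, default ports or userinfo, so this sketch keeps the netloc exactly as given.

```python
from urllib.parse import urlsplit, urlunsplit


def cache_key(url: str, keep_query: bool = False) -> str:
    """scheme://host/path; the query is kept only with keep_query (cf. --keep-query)."""
    parts = urlsplit(url)
    query = parts.query if keep_query else ""
    # The fragment is never sent to the server, so it never belongs in the key.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

This also shows the trade-off behind `--keep-query`. Two different artifacts served from the same path and selected only by a query parameter (say `download?file=a` vs `download?file=b`) collapse onto one key unless the query is kept.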

## Tests

```sh
python -m unittest discover -s tests   # stdlib only, no test deps
```

withcache-0.2.0.dist-info/RECORD
@@ -0,0 +1,14 @@
withcache/__init__.py,sha256=UrNawCAnRv-ZQxFhqGekxYryydYonSi6ZonQyYhFN-M,421
withcache/_shim.py,sha256=_CnLxWNmqfXq9QZTqQ6_NFkWxWu97Sp4nJPAQxj39xc,5165
withcache/curlwithcache.py,sha256=-lFuiVtM8UKQdMP4NPX2TtJAlOfeObHgYvpQpDBpc7k,1796
withcache/server.py,sha256=DpioGl34bW1Zg0-5WWECSKK_pErMxIqGJ4Q-N3mSFIk,35438
withcache/wgetwithcache.py,sha256=ZrRKXh91KAajFSsGZnNuNchRmspevTY2bVgkQdwZaaY,1741
withcache/static/htmx.min.js,sha256=4gndpcgjVHnzFm3vx3UOHbzVpcGAi3eS_C5nM3aPtEc,50917
withcache/static/pico.min.css,sha256=3V_VWRr9ge4h3MEXrYXAFNw_HxncLXt9EB6grMKSdMI,82194
withcache-0.2.0.data/scripts/curlwithcache,sha256=g_bsxFt5xobsFCwsuwj62zsQQJa94BrLG7ttm_qlI4A,78
withcache-0.2.0.data/scripts/wgetwithcache,sha256=QK8imdXLJgxQSyyXIrFJEsdm_IOxH2mq5WAnApPD4xg,78
withcache-0.2.0.dist-info/METADATA,sha256=nefLMRE29sOa8qpi2RJVDkpxHc55K8W_89euTt0a8mg,12405
withcache-0.2.0.dist-info/WHEEL,sha256=mffPy8wBnZQn2VnJUU5jE99KsxaSfiyMHV9Yt0aLVxs,87
withcache-0.2.0.dist-info/entry_points.txt,sha256=ZpOEQxrJ50tSzLARCoEu80pYsWkI11f10EUytK82XIk,59
withcache-0.2.0.dist-info/licenses/LICENSE,sha256=k28p1vD0tdx5EqOG2L93SQ6EGqXDQTWn4RMptaUyFm0,1503
withcache-0.2.0.dist-info/RECORD,,

withcache-0.2.0.dist-info/licenses/LICENSE
@@ -0,0 +1,28 @@
BSD 3-Clause License

Copyright (c) 2026, Simon A. F. Lund

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.