withcache 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,20 @@
+ # local cache-host state
+ data/
+ *.part
+
+ # python
+ __pycache__/
+ *.pyc
+ .venv/
+ venv/
+
+ # packaging / test artifacts
+ build/
+ dist/
+ *.egg-info/
+ .pytest_cache/
+ .tox/
+
+ # zig build artifacts
+ zig-out/
+ .zig-cache/
@@ -0,0 +1,28 @@
+ BSD 3-Clause License
+
+ Copyright (c) 2026, Simon A. F. Lund
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions are met:
+
+ 1. Redistributions of source code must retain the above copyright notice, this
+    list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright notice,
+    this list of conditions and the following disclaimer in the documentation
+    and/or other materials provided with the distribution.
+
+ 3. Neither the name of the copyright holder nor the names of its
+    contributors may be used to endorse or promote products derived from
+    this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,271 @@
+ Metadata-Version: 2.4
+ Name: withcache
+ Version: 0.2.0
+ Summary: Operator-curated, URL-keyed artifact cache for a small lab (CUDA/ROCm/DOCA/firmware)
+ Project-URL: Homepage, https://github.com/safl/withcache
+ Author-email: "Simon A. F. Lund" <safl@safl.dk>
+ License: BSD-3-Clause
+ License-File: LICENSE
+ Keywords: artifacts,cache,cuda,doca,firmware,lab,rocm
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: System Administrators
+ Classifier: License :: OSI Approved :: BSD License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Topic :: System :: Archiving :: Mirroring
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+
+ # withcache
+
+ [![ci](https://github.com/safl/withcache/actions/workflows/ci.yml/badge.svg)](https://github.com/safl/withcache/actions/workflows/ci.yml)
+ [![PyPI](https://img.shields.io/pypi/v/withcache.svg)](https://pypi.org/project/withcache/)
+ [![license](https://img.shields.io/badge/license-BSD--3--Clause-blue.svg)](LICENSE)
+ [![built with Zig](https://img.shields.io/badge/built%20with-Zig%200.16.0-f7a41d.svg)](https://ziglang.org)
+ [![static musl](https://img.shields.io/badge/static%20musl-x86__64%20%7C%20aarch64-blue.svg)](https://github.com/safl/withcache/releases)
+
+ A tiny, **operator-curated** artifact cache for the big vendor downloads a small
+ lab re-pulls constantly (CUDA, ROCm, DOCA, firmware, drivers), fronted by
+ **transparent `curl`/`wget` shims** so existing scripts use it with no changes.
+
+ Think of it as **"`ccache` for HTTP artifacts, without a proxy."**
+
+ ```
+ curl -fsSL https://the/origin/cuda.tar.gz -o cuda.tar.gz   # your script, unchanged
+   └─ curlwithcache shim ─ WITHCACHE_SERVER set?
+        ├─ cached                  → served from the cache-host (fast, local)
+        └─ miss/unset/unreachable  → runs the real curl, exactly as written
+ ```
+
+ Artifacts are **keyed by their origin URL**; on a hit the shim re-points the
+ request at the cache. No transparent proxy, no TLS interception, no client CA:
+ the URL is a lookup key, not a connection target.
+
+ By default a miss is **auto-fetched**: the request falls through to origin (so
+ the caller gets its file straight away), and the cache-host pulls the same
+ artifact in the background so the next request hits. Run with **`--curate`** to
+ require a human instead: an operator reviews the miss list in a small web UI and
+ presses **Download** (or pre-seeds via *Add from URI*). Either way, the cache-host
+ is the only box that needs internet egress (and any vendor credentials), and
+ clients never write to it.
+
+ ## Why not just curl + a caching proxy?
+
+ For `https://` (i.e. practically every vendor download) a forward proxy can't
+ cache without **SSL-bump / MITM**: curl tunnels TLS end-to-end via `CONNECT`, so
+ the proxy only sees ciphertext. The shim sidesteps that entirely by *re-pointing
+ the URL* to the cache instead of intercepting the connection. A stock caching
+ proxy also has no equivalent of the optional **operator-curated** model
+ (`--curate`: a miss queue a human approves).
+
+ ## Components
+
+ | Path                          | What it is                                                  |
+ |-------------------------------|-------------------------------------------------------------|
+ | `src/withcache/server.py`     | The cache-host: blob store + miss table + **background download manager** + operator UI (Pico.css + HTMX) |
+ | `src/withcache/_shim.py`      | Shared shim core (find URL → probe → rewrite → exec)        |
+ | `src/withcache/curlwithcache.py` / `wgetwithcache.py` | The Python `curl` / `wget` shims     |
+ | `shim/shim.zig`               | The native shim: one static binary, both tools via `argv[0]` |
+ | `deploy/Containerfile`, `deploy/compose.yml` | Single-host Podman/Docker deployment         |
+
+ The cache-host and the Python shims are **stdlib-only** (no third-party runtime
+ deps); the native shim is a dependency-free static binary.
+
+ ## Install
+
+ The **cache-host** and **Python shims** (works on any box with Python 3.10+):
+
+ ```sh
+ pipx install withcache   # or: uv tool install withcache / pip install withcache
+ # provides: curlwithcache wgetwithcache withcache-server
+ ```
+
+ The **native shim** needs no Python, which suits minimal/distroless boxes; it is
+ a ~200 KB static musl binary. Grab it from the [Releases] page. One binary serves
+ both tools, picking its curl or wget behaviour from the name it's invoked as, so
+ a `wgetwithcache` symlink to it gives you the wget shim:
+
+ ```sh
+ curl -L .../releases/.../withcache-shim-x86_64-linux-musl -o /usr/local/bin/curlwithcache
+ chmod +x /usr/local/bin/curlwithcache
+ ln -sf curlwithcache /usr/local/bin/wgetwithcache   # same binary, dispatched on argv[0]
+ ```
+
+ The Python shim also serves as the tested **oracle** and as the install-time
+ fallback on platforms without a prebuilt binary; a [differential test](tests/test_differential.py)
+ asserts that the binary and the Python `plan()` rewrite argv identically.
+
+ [Releases]: https://github.com/safl/withcache/releases
+ [direnv]: https://direnv.net
+
+ ## Deploy the cache-host
+
+ ```sh
+ export WITHCACHE_ADMIN_PASSWORD=change-me      # protects the operator UI
+ podman compose -f deploy/compose.yml up -d     # or: docker compose -f ...
+ # operator UI: http://withcache-server:3000/
+ ```
+
+ Or without containers:
+
+ ```sh
+ WITHCACHE_ADMIN_PASSWORD=change-me withcache-server --data-dir ./data --port 3000
+ ```
+
+ Data (blobs + `cache.db` + `session-secret`) lives in the `/data` volume (or
+ `--data-dir`). Artifacts are treated as immutable per versioned URL, so cached
+ entries are never invalidated. `--workers N` sets the number of concurrent
+ download workers, and `--curate` switches from auto-fetch to operator-approved
+ pulls.
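How the blob store persists a finished pull is not described here. A minimal sketch, assuming (not confirmed by the source) that each download is streamed into a `*.part` temp file (the pattern this package's `.gitignore` ignores), hashed with SHA-256 on the way (the digest the UI shows), and atomically renamed into a content-addressed file. `store_blob` and the `blobs/` layout under the data dir are hypothetical names.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import BinaryIO

CHUNK_SIZE = 1 << 20  # 1 MiB reads keep memory flat for multi-GB artifacts


def store_blob(src: BinaryIO, data_dir: Path) -> tuple[str, int]:
    """Stream src into data_dir/blobs/<sha256>; return (hex digest, size).

    Hypothetical layout: the real withcache on-disk format may differ.
    """
    blobs = data_dir / "blobs"
    blobs.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    size = 0
    # Write to a *.part file first so a crash never leaves a half-written blob
    fd, part = tempfile.mkstemp(dir=str(blobs), suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out:
            while chunk := src.read(CHUNK_SIZE):
                digest.update(chunk)
                out.write(chunk)
                size += len(chunk)
            out.flush()
            os.fsync(out.fileno())
        # os.replace is atomic on POSIX when both paths are on one filesystem
        os.replace(part, blobs / digest.hexdigest())
    except BaseException:
        if os.path.exists(part):
            os.unlink(part)  # never leave a stale *.part behind
        raise
    return digest.hexdigest(), size
```

A download worker could pass an HTTP response object (which is file-like) as `src` and then record the returned digest and size in `cache.db`; because the blob is only renamed into place once complete, readers never see a partial artifact.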
+
+ ## Use the shims (transparent `curl` / `wget`)
+
+ Every approach uses the same two ingredients: (1) point at the cache with
+ `WITHCACHE_SERVER`, and (2) make `curl`/`wget` resolve to the shim. They differ
+ only in **how widely the system `curl`/`wget` is shadowed**. Pick the least
+ invasive one that fits.
+
+ > **Safety:** with `WITHCACHE_SERVER` unset the shim is a pure pass-through (it
+ > just `exec`s the real tool, with zero network access or argument parsing), so
+ > even the system-wide setup is harmless wherever the cache isn't configured.
+ > Worst case is always "no caching, `curl` still works."
+
+ The recipes below use `command -v curlwithcache`, so they work whether you
+ installed the native binary or the Python launcher (both land under that name).
+
+ ### 1. No shadowing: call the shims by name (least invasive)
+
+ Nothing is renamed; you opt in per command. Good for trying it out or for a
+ script you can edit.
+
+ ```sh
+ export WITHCACHE_SERVER=http://withcache-server:3000
+ curlwithcache -fsSL https://the/origin/cuda.tar.gz -o cuda.tar.gz
+ wgetwithcache https://the/origin/rocm.tar.gz
+ ```
+
+ ### 2. This shell only: shadow `curl`/`wget` for the session
+
+ Put `curl`/`wget` symlinks in a directory and prepend it to `PATH` in the
+ current shell. Closing the shell undoes it.
+
+ ```sh
+ mkdir -p ~/.withcache/bin
+ ln -sf "$(command -v curlwithcache)" ~/.withcache/bin/curl
+ ln -sf "$(command -v wgetwithcache)" ~/.withcache/bin/wget
+
+ export WITHCACHE_SERVER=http://withcache-server:3000
+ export PATH="$HOME/.withcache/bin:$PATH"
+ hash -r                      # forget any cached curl/wget location
+
+ command -v curl              # -> ~/.withcache/bin/curl (verify it's the shim)
+ curl -fsSL https://the/origin/cuda.tar.gz -o cuda.tar.gz   # existing scripts, unchanged
+ wget https://the/origin/rocm.tar.gz                        # still saved as rocm.tar.gz
+ ```
+
+ ### 3. Your user: make it the default for your shells (persistent)
+
+ Create the symlinks once, then append the two exports to your shell rc file.
+ This affects all your future interactive shells; undo it by deleting the block.
+
+ ```sh
+ mkdir -p ~/.withcache/bin
+ ln -sf "$(command -v curlwithcache)" ~/.withcache/bin/curl
+ ln -sf "$(command -v wgetwithcache)" ~/.withcache/bin/wget
+
+ cat >> ~/.bashrc <<'EOF'
+
+ # withcache: transparent curl/wget caching
+ export WITHCACHE_SERVER=http://withcache-server:3000
+ export PATH="$HOME/.withcache/bin:$PATH"
+ EOF
+ ```
+
+ ### 4. One project only: scope it with direnv
+
+ Drop an `.envrc` in a project tree (requires [direnv]); caching then applies
+ only inside that directory.
+
+ ```sh
+ # .envrc
+ export WITHCACHE_SERVER=http://withcache-server:3000
+ PATH_add ~/.withcache/bin   # assumes the symlinks from approach 2/3 exist
+ ```
+
+ Then run `direnv allow`.
+
+ ### 5. The whole machine: every user, every shell (most invasive)
+
+ Install the shim as `curl`/`wget` in `/usr/local/bin` (ahead of `/usr/bin` on
+ the default `PATH`) and set the server globally. This also catches build tools
+ and package managers that shell out to `curl`/`wget`.
+
+ ```sh
+ sudo ln -sf "$(command -v curlwithcache)" /usr/local/bin/curl
+ sudo ln -sf "$(command -v wgetwithcache)" /usr/local/bin/wget
+
+ # A login-shell env file: covers interactive logins, but daemons started outside
+ # a login shell won't see it (set WITHCACHE_SERVER in their unit if needed).
+ echo 'export WITHCACHE_SERVER=http://withcache-server:3000' \
+   | sudo tee /etc/profile.d/withcache.sh >/dev/null
+ ```
+
+ On minimal/distroless hosts, use the [native shim binary](#install) here: same
+ symlink, no Python required.
+
+ ### Verify / turn it off
+
+ ```sh
+ command -v curl                  # which curl is in effect (the shim, or the real one)
+ export REAL_CURL=/usr/bin/curl   # optional: pin the wrapped tool (also $REAL_WGET)
+
+ unset WITHCACHE_SERVER           # instantly back to plain curl (pass-through)
+ rm ~/.withcache/bin/curl ~/.withcache/bin/wget   # remove shadowing entirely
+ ```
+
+ How it works: the shim **scans for the URL, asks the cache, and execs the real tool**:
+
+ 1. Find the real `curl`/`wget` on `$PATH`, skipping itself (`$REAL_CURL`/`$REAL_WGET` override the search).
+ 2. With `WITHCACHE_SERVER` set, find the URL (the `scheme://` argument, or the value of `--url`).
+ 3. Probe the cache with that same tool (`curl -I` / `wget --spider`).
+    - **Hit** → re-point only the URL at `http://server/b/<base64(origin)>/<basename>` and `exec` the real tool (so `-o`, `-O`, `-L`, `--retry`, … all still apply, and the file is named after the artifact).
+    - **Miss / unreachable** → `exec` the real tool with your **arguments untouched** (i.e. against origin); on a miss, the cache-host records it for the operator.
+ 4. With no `WITHCACHE_SERVER`, it does **zero** network access or parsing and just `exec`s the real tool.
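Steps 2 and 3 above amount to a pure function from argv to argv. The sketch below illustrates that rewrite; it is not the package's actual `plan()`. It assumes URL-safe base64 without padding for `<base64(origin)>` (the README does not say which alphabet is used), uses a naive URL finder that only knows the two forms listed (a bare `scheme://` argument or `--url URL`), and takes the probe result as a `hit` flag instead of running `curl -I`.

```python
from __future__ import annotations

import base64
import posixpath
from urllib.parse import urlsplit

# Hypothetical: arguments starting with one of these count as "the URL"
SCHEMES = ("http://", "https://", "ftp://")


def find_url(argv: list[str]) -> int | None:
    """Index of the origin URL in argv: the value after --url, or a scheme:// arg."""
    for i, arg in enumerate(argv):
        if arg == "--url" and i + 1 < len(argv):
            return i + 1
        if arg.startswith(SCHEMES):
            return i
    return None


def cache_url(server: str, origin: str) -> str:
    """Build <server>/b/<base64(origin)>/<basename> (URL-safe, unpadded: assumed)."""
    token = base64.urlsafe_b64encode(origin.encode()).decode().rstrip("=")
    # Hypothetical fallback name for URLs whose path has no basename
    basename = posixpath.basename(urlsplit(origin).path) or "index"
    return f"{server.rstrip('/')}/b/{token}/{basename}"


def plan(argv: list[str], server: str, hit: bool) -> list[str]:
    """Argv to exec: on a hit only the URL is replaced; otherwise argv is untouched."""
    idx = find_url(argv)
    if not hit or idx is None:
        return list(argv)
    rewritten = list(argv)
    rewritten[idx] = cache_url(server, argv[idx])
    return rewritten
```

For example, `plan(["-fsSL", "https://the/origin/cuda.tar.gz", "-o", "cuda.tar.gz"], "http://withcache-server:3000", hit=True)` keeps `-fsSL` and `-o cuda.tar.gz` and swaps only the second element for a `/b/<token>/cuda.tar.gz` URL on the cache-host, which is why every other curl option keeps working.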
+
+ Notes & limits (all degrade gracefully; worst case is "no caching, curl still works"):
+ - Needs the wrapped tool installed (the shim wraps it rather than replacing it). The Python shims add interpreter start-up latency to every call; the native binary avoids that.
+ - URLs hidden in a curl `-K` config file or a wget `-i` input file, or piped via stdin, aren't seen, so those calls pass through uncached.
+ - Per-tool env override: `CURLWITHCACHE_SERVER` / `WGETWITHCACHE_SERVER` take precedence over `WITHCACHE_SERVER`.
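Step 1 and the per-tool override can be sketched as two small helpers. The precedence (`CURLWITHCACHE_SERVER` before `WITHCACHE_SERVER`, `$REAL_CURL` before a `$PATH` search) follows the text above; treating an empty variable as unset, and detecting "itself" with `os.path.samefile` so a `curl -> curlwithcache` symlink is skipped, are assumptions about how this might be done.

```python
from __future__ import annotations

import os
import shutil
from collections.abc import Mapping


def resolve_server(tool: str, env: Mapping[str, str] | None = None) -> str | None:
    """CURLWITHCACHE_SERVER / WGETWITHCACHE_SERVER win over WITHCACHE_SERVER."""
    env = os.environ if env is None else env
    for name in (f"{tool.upper()}WITHCACHE_SERVER", "WITHCACHE_SERVER"):
        value = env.get(name)
        if value:
            return value
    return None  # unset or empty: pure pass-through, no probing


def find_real_tool(tool: str, self_path: str,
                   env: Mapping[str, str] | None = None) -> str | None:
    """$REAL_<TOOL> if set, else the first `tool` on PATH that is not the shim."""
    env = os.environ if env is None else env
    override = env.get(f"REAL_{tool.upper()}")
    if override:
        return override
    for directory in env.get("PATH", "").split(os.pathsep):
        candidate = shutil.which(tool, path=directory)
        if candidate is None:
            continue
        try:
            if os.path.samefile(candidate, self_path):
                continue  # e.g. ~/.withcache/bin/curl -> curlwithcache: skip ourselves
        except OSError:
            pass  # self_path not stat-able: treat the candidate as a different file
        return candidate
    return None
```

Resolving by file identity rather than by name is what lets the same shim sit in `~/.withcache/bin` or `/usr/local/bin` without exec-ing itself in a loop.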
+
+ ## Operator UI
+
+ `http://withcache-server:3000/` (Pico.css + HTMX, bundled for offline use) shows:
+ - **Misses**: auto-fetched by default, or (under `--curate`) each listed with **Download** (queues a background pull) and **Dismiss**.
+ - **Downloads**: live progress bars, `queued/running/completed/cancelled/failed` states, **Cancel**, and **Clear finished**. Downloads run in a background worker pool (modelled on [bty]'s job managers) rather than in the request handler, so large pulls never block a request.
+ - **Cached artifacts**: URL, size, **hits** (times served), **misses** (times requested before it was cached), SHA-256, and fetched-at time.
+ - **Add from URI**: pre-seed an artifact before anyone misses it.
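A minimal sketch of a background download pool with the five states listed above, using only `threading` and `queue`. The `Job` fields, the `fetch` callable that yields chunks, and cancellation via an event checked before start and between chunks are assumptions for illustration; the real manager (modelled on bty's) may be structured differently.

```python
from __future__ import annotations

import queue
import threading
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field


@dataclass
class Job:
    url: str
    state: str = "queued"  # queued -> running -> completed | cancelled | failed
    done: int = 0          # bytes fetched so far, for the progress bar
    error: str | None = None
    stop: threading.Event = field(default_factory=threading.Event)


class DownloadManager:
    """Hypothetical pool: `workers` daemon threads drain a FIFO of Jobs."""

    def __init__(self, fetch: Callable[[str], Iterable[bytes]], workers: int = 2):
        self._fetch = fetch  # yields the artifact's bytes in chunks
        self._queue: queue.Queue[Job] = queue.Queue()
        self._lock = threading.Lock()
        self.jobs: list[Job] = []
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, url: str) -> Job:
        job = Job(url)
        with self._lock:
            self.jobs.append(job)
        self._queue.put(job)
        return job

    def cancel(self, job: Job) -> None:
        job.stop.set()  # honoured before the job starts or between chunks

    def wait(self) -> None:
        self._queue.join()  # block until every submitted job has finished

    def clear_finished(self) -> None:
        with self._lock:
            self.jobs = [j for j in self.jobs if j.state in ("queued", "running")]

    def _worker(self) -> None:
        while True:
            job = self._queue.get()
            try:
                if job.stop.is_set():
                    job.state = "cancelled"
                    continue
                job.state = "running"
                for chunk in self._fetch(job.url):
                    if job.stop.is_set():
                        job.state = "cancelled"
                        break
                    job.done += len(chunk)  # a real worker would also persist the chunk
                else:
                    job.state = "completed"
            except Exception as exc:  # a failed pull must not kill the worker thread
                job.state, job.error = "failed", str(exc)
            finally:
                self._queue.task_done()
```

Because the request handler only calls `submit()`, it returns immediately; the pool size plays the role of `--workers N`.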
+
+ ## Auth
+
+ Single-tenant session-cookie auth (modelled on [bty]'s approach, but with a
+ password from the environment instead of PAM). The **read path** (`/blob`,
+ `/b/…`, `/healthz`) is open so shims never need to log in; the **operator
+ surface** (`/`, `/admin/*`) is gated.
+
+ | Env var                    | Purpose                                                  |
+ |----------------------------|----------------------------------------------------------|
+ | `WITHCACHE_SERVER`         | Cache-host URL the shims use                             |
+ | `CURLWITHCACHE_SERVER` / `WGETWITHCACHE_SERVER` | Per-tool override of the above      |
+ | `WITHCACHE_ADMIN_PASSWORD` | Operator login password (unset ⇒ UI open, with a warning) |
+ | `WITHCACHE_SESSION_SECRET` | Override the persisted cookie-signing key (optional)     |
+
+ [bty]: https://github.com/safl/bty
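Session-cookie auth with a persisted signing key is commonly built on an HMAC over the session data. The sketch below shows that general technique with only the standard library; the `<expiry>.<signature>` cookie format, the 12-hour lifetime, and every helper name are hypothetical, not withcache's actual wire format. Only the precedence (`WITHCACHE_SESSION_SECRET` over a persisted `session-secret` file) comes from the text.

```python
from __future__ import annotations

import hashlib
import hmac
import os
import secrets
import time
from pathlib import Path

SESSION_TTL = 12 * 3600  # hypothetical session lifetime, in seconds


def load_or_create_secret(path: Path) -> bytes:
    """WITHCACHE_SESSION_SECRET wins; else reuse or create the persisted key file."""
    override = os.environ.get("WITHCACHE_SESSION_SECRET")
    if override:
        return override.encode()
    if path.exists():
        return path.read_bytes()
    key = secrets.token_bytes(32)
    # O_EXCL + 0o600: never clobber an existing key, never world-readable
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
    return key


def check_password(supplied: str, expected: str) -> bool:
    # Constant-time comparison: no early exit on the first differing byte
    return hmac.compare_digest(supplied.encode(), expected.encode())


def issue_cookie(secret: bytes, now: float | None = None) -> str:
    """Cookie value '<expiry>.<hex HMAC-SHA256 of expiry>' (hypothetical format)."""
    expiry = str(int((time.time() if now is None else now) + SESSION_TTL))
    sig = hmac.new(secret, expiry.encode(), hashlib.sha256).hexdigest()
    return f"{expiry}.{sig}"


def verify_cookie(secret: bytes, cookie: str, now: float | None = None) -> bool:
    expiry, _, sig = cookie.partition(".")
    if not expiry.isdigit():
        return False
    expected = hmac.new(secret, expiry.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    return int(expiry) > (time.time() if now is None else now)
```

Persisting the key means sessions survive a restart of the cache-host, while rotating it (deleting the file or changing the env var) invalidates every outstanding cookie at once.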
+
+ ## Cache keys & signed URLs
+
+ The key is `scheme://host/path` with the **query string dropped** by default, so
+ CDN/presigned URLs (whose tokens change on every request) still match by path.
+ Pass `--keep-query` to the server when the query string itself selects the
+ artifact. Package-manager repos (`.deb`/`.rpm`) are GPG-signed, and clients
+ verify what they download regardless of transport, so caching them this way is
+ safe.
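The key rule above fits in a few lines of `urllib.parse`. In this sketch, `cache_key` and its `keep_query` parameter are illustrative names mirroring `--keep-query`, and always dropping the `#fragment` is an assumption the README does not state (fragments are never sent to a server anyway).

```python
from urllib.parse import urlsplit, urlunsplit


def cache_key(url: str, keep_query: bool = False) -> str:
    """scheme://host/path, plus ?query only when keep_query (cf. --keep-query)."""
    parts = urlsplit(url)
    query = parts.query if keep_query else ""
    # The #fragment never reaches the origin server, so it is always dropped here
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

Two presigned links to the same object, differing only in their signature parameters, collapse to one key by default; with `keep_query=True` they would become two separate cache entries.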
+
+ ## Tests
+
+ ```sh
+ python -m unittest discover -s tests   # stdlib only, no test deps
+ ```
@@ -0,0 +1,27 @@
+ # withcache cache-host. Build context is the repo root:
+ #   podman build -f deploy/Containerfile -t withcache .
+ # (deploy/compose.yml sets the context for you.)
+ FROM python:3.12-slim
+
+ # Install the package (no third-party deps) to get the withcache-server command.
+ WORKDIR /app
+ COPY pyproject.toml README.md /app/
+ COPY src /app/src
+ RUN pip install --no-cache-dir /app
+
+ # Run as non-root; /data is the persistent volume for blobs + sqlite.
+ RUN useradd --create-home --uid 10001 app \
+     && mkdir -p /data && chown app:app /data
+ USER app
+
+ EXPOSE 3000
+ VOLUME ["/data"]
+
+ # Set WITHCACHE_ADMIN_PASSWORD at run time to protect the operator UI.
+ # A session-signing key is persisted under /data automatically, or override
+ # with WITHCACHE_SESSION_SECRET.
+
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s \
+     CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:3000/healthz',timeout=2).status==200 else 1)"
+
+ ENTRYPOINT ["withcache-server", "--host", "0.0.0.0", "--port", "3000", "--data-dir", "/data"]
@@ -0,0 +1,23 @@
+ # Single-host deploy (run from the repo root):
+ #   WITHCACHE_ADMIN_PASSWORD=secret podman compose -f deploy/compose.yml up -d
+ #   (or: docker compose -f deploy/compose.yml up -d)
+ services:
+   cache-host:
+     build:
+       context: ..
+       dockerfile: deploy/Containerfile
+     image: withcache:latest
+     container_name: withcache
+     ports:
+       - "3000:3000"
+     environment:
+       # Protect the operator UI. Unset => UI is open (a warning is logged).
+       - WITHCACHE_ADMIN_PASSWORD=${WITHCACHE_ADMIN_PASSWORD:-}
+       # Optional: pin the session-signing key (else persisted under /data).
+       # - WITHCACHE_SESSION_SECRET=${WITHCACHE_SESSION_SECRET:-}
+     volumes:
+       - withcache-data:/data
+     restart: unless-stopped
+
+ volumes:
+   withcache-data: