withcache 0.2.0__tar.gz → 0.3.0__tar.gz

@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: withcache
3
- Version: 0.2.0
3
+ Version: 0.3.0
4
4
  Summary: Operator-curated, URL-keyed artifact cache for a small lab (CUDA/ROCm/DOCA/firmware)
5
5
  Project-URL: Homepage, https://github.com/safl/withcache
6
6
  Author-email: "Simon A. F. Lund" <safl@safl.dk>
@@ -111,8 +111,10 @@ WITHCACHE_ADMIN_PASSWORD=change-me withcache-server --data-dir ./data --port 300
111
111
 
112
112
  Data (blobs + `cache.db` + `session-secret`) lives in the `/data` volume (or
113
113
  `--data-dir`). Artifacts are immutable per version, so there's no cache
114
- invalidation. `--workers N` sets the number of concurrent download workers, and
115
- `--curate` switches from auto-fetch to operator-approved pulls.
114
+ invalidation. `--workers N` sets the number of concurrent download workers,
115
+ `--curate` switches from auto-fetch to operator-approved pulls, and `--max-bytes`
116
+ (e.g. `50G`) caps the cache: when full it refuses new fills (no auto-eviction),
117
+ and you free space by deleting artifacts in the UI.
116
118
 
117
119
  ## Use the shims (transparent `curl` / `wget`)
118
120
 
@@ -238,7 +240,7 @@ Notes & limits (all degrade gracefully; worst case is "no caching, curl still wo
238
240
  `http://withcache-server:3000/` (Pico.css + HTMX, bundled offline) shows:
239
241
  - **Misses**: auto-fetched by default, or (under `--curate`) each with **Download** (queues a background pull) and **Dismiss**.
240
242
  - **Downloads**: live progress bars, `queued/running/completed/cancelled/failed`, **Cancel**, and **Clear finished**. Downloads run in a background worker pool, not in the request, so large pulls never block, modelled on [bty]'s job managers.
241
- - **Cached artifacts**: URL, size, **hits** (times served) and **misses** (times requested before it was cached), SHA-256, fetched-at.
243
+ - **Cached artifacts**: URL, size, **hits** (times served) and **misses** (times requested before it was cached), SHA-256, fetched-at, each with **Delete** to free space.
242
244
  - **Add from URI**: pre-seed an artifact before anyone misses it.
243
245
 
244
246
  ## Auth
@@ -264,6 +266,25 @@ CDN/presigned URLs (whose tokens change every request) still match by path. Pass
264
266
  (`.deb`/`.rpm`) are GPG-signed and verified by the client regardless of
265
267
  transport, so caching them this way is safe.
266
268
 
269
+ ## Consume from another tool (the client library)
270
+
271
+ A tool that already knows its download URLs (e.g. an installer or a provisioner)
272
+ can prefer the cache without shelling out to a shim or re-implementing the `/b/`
273
+ scheme. `withcache.client` is stdlib-only, so importing it adds no dependencies:
274
+
275
+ ```python
276
+ from withcache import client
277
+
278
+ # "use the cache when it's warm, the origin otherwise"
279
+ url = client.serve_url("http://cache:3000", origin) or origin
280
+ ```
281
+
282
+ `serve_url()` returns `None` unless `is_cached()` reports a hit. `is_cached()` is a graceful `HEAD` (a miss, timeout, or unreachable cache all
283
+ return `False`, so you fall back to the origin), and it doubles as a warm-up:
284
+ the probe records the miss and, in auto-fetch mode, enqueues the fill, so the
285
+ next call flips to the cache. The encoding is shared with the shims and server,
286
+ so consumers stay in lockstep with the cache-host.
287
+
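The warm-up behaviour described above can be illustrated without a running cache-host. The following is only a sketch: `FakeCache`, `pick_url`, and the `/fake-blob/` path are hypothetical stand-ins invented for this example (not part of `withcache`), and the fill is assumed to have finished by the time of the second probe.

```python
from typing import Callable, Optional


class FakeCache:
    """Hypothetical stand-in for a cache-host running in auto-fetch mode."""

    def __init__(self, base: str) -> None:
        self.base = base
        self.filled: set[str] = set()

    def serve_url(self, origin: str) -> Optional[str]:
        # A hit returns a serve URL (the path here is a placeholder, not the
        # real /b/ encoding).
        if origin in self.filled:
            return f"{self.base}/fake-blob/{origin.rsplit('/', 1)[-1]}"
        # A miss is "recorded" and the fill "enqueued"; this model pretends the
        # fill completes instantly, so the next probe is a hit.
        self.filled.add(origin)
        return None


def pick_url(probe: Callable[[str], Optional[str]], origin: str) -> str:
    """The README idiom: use the cache when it is warm, the origin otherwise."""
    return probe(origin) or origin


cache = FakeCache("http://cache:3000")
origin = "https://example.com/pkg/cuda.run"
first = pick_url(cache.serve_url, origin)   # miss: falls back to the origin
second = pick_url(cache.serve_url, origin)  # warmed by the first probe
```

With the real library the probe would be `client.serve_url` bound to a cache-host address, and the second call only flips once the background fill has actually completed.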
267
288
  ## Tests
268
289
 
269
290
  ```sh
@@ -93,8 +93,10 @@ WITHCACHE_ADMIN_PASSWORD=change-me withcache-server --data-dir ./data --port 300
93
93
 
94
94
  Data (blobs + `cache.db` + `session-secret`) lives in the `/data` volume (or
95
95
  `--data-dir`). Artifacts are immutable per version, so there's no cache
96
- invalidation. `--workers N` sets the number of concurrent download workers, and
97
- `--curate` switches from auto-fetch to operator-approved pulls.
96
+ invalidation. `--workers N` sets the number of concurrent download workers,
97
+ `--curate` switches from auto-fetch to operator-approved pulls, and `--max-bytes`
98
+ (e.g. `50G`) caps the cache: when full it refuses new fills (no auto-eviction),
99
+ and you free space by deleting artifacts in the UI.
98
100
 
99
101
  ## Use the shims (transparent `curl` / `wget`)
100
102
 
@@ -220,7 +222,7 @@ Notes & limits (all degrade gracefully; worst case is "no caching, curl still wo
220
222
  `http://withcache-server:3000/` (Pico.css + HTMX, bundled offline) shows:
221
223
  - **Misses**: auto-fetched by default, or (under `--curate`) each with **Download** (queues a background pull) and **Dismiss**.
222
224
  - **Downloads**: live progress bars, `queued/running/completed/cancelled/failed`, **Cancel**, and **Clear finished**. Downloads run in a background worker pool, not in the request, so large pulls never block, modelled on [bty]'s job managers.
223
- - **Cached artifacts**: URL, size, **hits** (times served) and **misses** (times requested before it was cached), SHA-256, fetched-at.
225
+ - **Cached artifacts**: URL, size, **hits** (times served) and **misses** (times requested before it was cached), SHA-256, fetched-at, each with **Delete** to free space.
224
226
  - **Add from URI**: pre-seed an artifact before anyone misses it.
225
227
 
226
228
  ## Auth
@@ -246,6 +248,25 @@ CDN/presigned URLs (whose tokens change every request) still match by path. Pass
246
248
  (`.deb`/`.rpm`) are GPG-signed and verified by the client regardless of
247
249
  transport, so caching them this way is safe.
248
250
 
251
+ ## Consume from another tool (the client library)
252
+
253
+ A tool that already knows its download URLs (e.g. an installer or a provisioner)
254
+ can prefer the cache without shelling out to a shim or re-implementing the `/b/`
255
+ scheme. `withcache.client` is stdlib-only, so importing it adds no dependencies:
256
+
257
+ ```python
258
+ from withcache import client
259
+
260
+ # "use the cache when it's warm, the origin otherwise"
261
+ url = client.serve_url("http://cache:3000", origin) or origin
262
+ ```
263
+
264
+ `serve_url()` returns `None` unless `is_cached()` reports a hit. `is_cached()` is a graceful `HEAD` (a miss, timeout, or unreachable cache all
265
+ return `False`, so you fall back to the origin), and it doubles as a warm-up:
266
+ the probe records the miss and, in auto-fetch mode, enqueues the fill, so the
267
+ next call flips to the cache. The encoding is shared with the shims and server,
268
+ so consumers stay in lockstep with the cache-host.
269
+
249
270
  ## Tests
250
271
 
251
272
  ```sh
@@ -4,8 +4,11 @@
4
4
  FROM python:3.12-slim
5
5
 
6
6
  # Install the package (no third-party deps) to get the withcache-server command.
7
+ # hatch_build.py is the wheel build hook (ships the shims); without it the build
8
+ # fails. No zig in this image, so the shims install as Python launchers, which
9
+ # is fine -- the container only runs withcache-server.
7
10
  WORKDIR /app
8
- COPY pyproject.toml README.md /app/
11
+ COPY pyproject.toml README.md hatch_build.py /app/
9
12
  COPY src /app/src
10
13
  RUN pip install --no-cache-dir /app
11
14
 
@@ -2,7 +2,7 @@
2
2
  .name = .withcache_shim,
3
3
  // Zig requires a literal here; keep it in lockstep with the project's
4
4
  // single source (src/withcache/__init__.py) via `make bump` / `make version-check`.
5
- .version = "0.2.0",
5
+ .version = "0.3.0",
6
6
  .fingerprint = 0xd7d96c5ed212ccaa,
7
7
  .minimum_zig_version = "0.16.0",
8
8
  .paths = .{
@@ -0,0 +1,17 @@
1
+ """withcache — operator-curated, URL-keyed artifact cache for a small lab.
2
+
3
+ - ``withcache-server`` (withcache.server:main): the cache-host.
4
+ - ``curlwithcache`` / ``wgetwithcache``: transparent curl/wget shims, shipped
5
+ as a native binary or a Python launcher (see hatch_build.py).
6
+ - ``withcache.client``: a tiny, stdlib-only library for other tools to consume
7
+ a cache-host (build serve URLs, probe what's cached) without re-implementing
8
+ the ``/b/`` URL scheme.
9
+
10
+ All modules are stdlib-only and self-contained.
11
+ """
12
+
13
+ from .client import blob_url, cache_base, is_cached, serve_url
14
+
15
+ __version__ = "0.3.0"
16
+
17
+ __all__ = ["__version__", "blob_url", "cache_base", "is_cached", "serve_url"]
@@ -0,0 +1,62 @@
1
+ """A tiny client for consuming a withcache cache-host from other tools.
2
+
3
+ Lets a consumer (e.g. bty) point downloads at withcache without re-implementing
4
+ the ``/b/`` URL scheme. Stdlib only, so importing it pulls in no third-party
5
+ dependencies.
6
+
7
+ from withcache import client
8
+
9
+ # "use the cache when it's warm, the origin otherwise"
10
+ url = client.serve_url("http://cache:3000", origin) or origin
11
+
12
+ The ``/b/<urlsafe-b64(origin)>/<basename>`` encoding is shared with the shims
13
+ and the server (one definition in :mod:`withcache._shim`), so consumers stay in
14
+ lockstep with the cache-host automatically.
15
+ """
16
+
17
+ from __future__ import annotations
18
+
19
+ import urllib.error
20
+ import urllib.request
21
+
22
+ from . import _shim
23
+
24
+ __all__ = ["PROBE_TIMEOUT", "blob_url", "cache_base", "is_cached", "serve_url"]
25
+
26
+ PROBE_TIMEOUT = 3.0 # seconds; never block the caller on a slow/unreachable cache
27
+
28
+ #: Normalize a server value: accepts 'host', 'host:3000', or 'http://host:3000'.
29
+ cache_base = _shim.cache_base
30
+
31
+
32
+ def blob_url(server: str, origin: str) -> str:
33
+ """The cache-host serve URL for ``origin``:
34
+ ``<server>/b/<urlsafe-b64(origin), unpadded>/<basename>``. The trailing
35
+ basename is cosmetic (so any downloader names the saved file after the
36
+ artifact); the cache keys on the decoded origin URL."""
37
+ return _shim.blob_url(_shim.cache_base(server), origin)
38
+
39
+
40
+ def is_cached(server: str, origin: str, timeout: float = PROBE_TIMEOUT) -> bool:
41
+ """True if the cache-host already holds ``origin`` (a ``HEAD`` on ``/b/``
42
+ returns 200). A miss (404), any other HTTP error, an unreachable host, or a timeout
43
+ returns False, so a caller can safely fall back to the origin. The HEAD
44
+ also *warms* an auto-fetch cache-host: the miss is recorded and the
45
+ background fill enqueued, so a later probe flips to cached."""
46
+ req = urllib.request.Request(blob_url(server, origin), method="HEAD")
47
+ try:
48
+ with urllib.request.urlopen(req, timeout=timeout) as resp:
49
+ return bool(resp.status == 200)
50
+ except urllib.error.HTTPError:
51
+ return False # 404 miss (now recorded + enqueued by the cache-host)
52
+ except (urllib.error.URLError, OSError):
53
+ return False # unreachable / timeout -> caller serves the origin itself
54
+
55
+
56
+ def serve_url(server: str, origin: str, timeout: float = PROBE_TIMEOUT) -> str | None:
57
+ """The cache-host serve URL for ``origin`` if the cache holds it, else
58
+ ``None`` -- the convenience form of "use the cache when warm":
59
+
60
+ url = client.serve_url(cache, origin) or origin
61
+ """
62
+ return blob_url(server, origin) if is_cached(server, origin, timeout) else None
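The docstrings above describe the serve URL as `<server>/b/<urlsafe-b64(origin), unpadded>/<basename>`, but the shared definition in `withcache._shim` is not part of this diff. The following is a minimal sketch of that shape, assuming the basename is taken from the origin's URL path and that the server base has no path prefix; `sketch_blob_url` and `sketch_decode_origin` are hypothetical names, not the library's functions.

```python
import base64
import posixpath
from urllib.parse import urlsplit


def sketch_blob_url(base: str, origin: str) -> str:
    """Build '<base>/b/<urlsafe-b64(origin), unpadded>/<basename>'."""
    token = base64.urlsafe_b64encode(origin.encode("utf-8")).decode("ascii")
    token = token.rstrip("=")  # "unpadded", per the docstring
    # Assumption: the cosmetic basename comes from the origin's URL path.
    name = posixpath.basename(urlsplit(origin).path)
    return f"{base.rstrip('/')}/b/{token}/{name}"


def sketch_decode_origin(blob_url: str) -> str:
    """Recover the origin URL from the token segment of a sketch blob URL."""
    # Path is '/b/<token>/<basename>'; urlsafe base64 never contains '/'.
    token = urlsplit(blob_url).path.split("/")[2]
    padded = token + "=" * (-len(token) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(padded).decode("utf-8")


origin = "https://example.com/dl/cuda_12.run?token=abc"
url = sketch_blob_url("http://cache:3000", origin)
roundtrip = sketch_decode_origin(url)
```

Because the whole origin (query string included) is carried inside the token, the trailing basename can stay purely cosmetic, which matches the docstring's note that the cache keys on the decoded origin URL.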
@@ -60,6 +60,28 @@ def human_size(n: int) -> str:
60
60
  return f"{n} B"
61
61
 
62
62
 
63
+ def parse_size(s: str) -> int:
64
+ """Parse '0', '1024', '50M', '20G', '1.5T' into bytes (suffixes are 1024-based)."""
65
+ s = str(s).strip()
66
+ if not s:
67
+ return 0
68
+ units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
69
+ if s[-1].upper() in units:
70
+ return int(float(s[:-1]) * units[s[-1].upper()])
71
+ return int(s)
72
+
73
+
74
+ def parse_headers(raw: str) -> dict | None:
75
+ """Parse 'Name: Value' lines (e.g. a registry Authorization header that bty
76
+ pre-resolves for an oras blob) into a dict for the origin fetch; None if empty."""
77
+ out = {}
78
+ for line in (raw or "").splitlines():
79
+ name, sep, value = line.partition(":")
80
+ if sep and name.strip():
81
+ out[name.strip()] = value.strip()
82
+ return out or None
83
+
84
+
63
85
  # --------------------------------------------------------------------------
64
86
  # Auth — server-signed session cookie (bty-style, env-password instead of PAM)
65
87
  # --------------------------------------------------------------------------
@@ -135,12 +157,13 @@ class Auth:
135
157
  class Store:
136
158
  """Blobs on disk keyed by hash(normalized url); metadata in SQLite."""
137
159
 
138
- def __init__(self, data_dir: str, keep_query: bool):
160
+ def __init__(self, data_dir: str, keep_query: bool, max_bytes: int = 0):
139
161
  self.data_dir = os.path.abspath(data_dir)
140
162
  self.blob_dir = os.path.join(self.data_dir, "blobs")
141
163
  self.tmp_dir = os.path.join(self.data_dir, "tmp")
142
164
  self.db_path = os.path.join(self.data_dir, "cache.db")
143
165
  self.keep_query = keep_query
166
+ self.max_bytes = max_bytes # cap on total cached bytes; 0 = unlimited
144
167
  os.makedirs(self.blob_dir, exist_ok=True)
145
168
  os.makedirs(self.tmp_dir, exist_ok=True)
146
169
  self._init_db()
@@ -217,6 +240,15 @@ class Store:
217
240
  m = c.execute("SELECT COUNT(*) FROM misses").fetchone()[0]
218
241
  return b, m
219
242
 
243
+ def total_size(self) -> int:
244
+ with self.conn() as c:
245
+ return c.execute("SELECT COALESCE(SUM(size), 0) FROM blobs").fetchone()[0]
246
+
247
+ def has_capacity(self) -> bool:
248
+ """False once stored bytes reach --max-bytes (0 = unlimited). The guard
249
+ refuses *new* fills when full; it never evicts (delete is manual)."""
250
+ return self.max_bytes <= 0 or self.total_size() < self.max_bytes
251
+
220
252
  # -- writes ------------------------------------------------------------
221
253
  def record_miss(self, url: str):
222
254
  key = self.key_of(self.normalize(url))
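The guard added above is checked before a fill starts, not against the size of the incoming artifact, so the total can overshoot the cap by the last artifact admitted. A small in-memory model makes this visible; `TinyStore` is a hypothetical illustration that mirrors the method names in the diff, not the real `Store` (which sums sizes from SQLite).

```python
class CacheFull(Exception):
    """Mirrors the exception name in the diff; raised when the cap is reached."""


class TinyStore:
    """In-memory model of the capacity guard (assumed semantics, no disk I/O)."""

    def __init__(self, max_bytes: int = 0) -> None:
        self.max_bytes = max_bytes  # 0 = unlimited, as in the diff
        self.blobs: dict[str, int] = {}

    def total_size(self) -> int:
        return sum(self.blobs.values())

    def has_capacity(self) -> bool:
        return self.max_bytes <= 0 or self.total_size() < self.max_bytes

    def store(self, key: str, size: int) -> None:
        # The check happens before the fill and ignores the incoming size.
        if not self.has_capacity():
            raise CacheFull(f"cache full (>= {self.max_bytes} bytes)")
        self.blobs[key] = size

    def delete(self, key: str) -> None:
        self.blobs.pop(key, None)  # manual eviction, idempotent


store = TinyStore(max_bytes=100)
store.store("a", 80)
store.store("b", 80)        # admitted: 80 < 100 at check time, total is now 160
full = not store.has_capacity()
store.delete("a")           # freeing space re-opens the cache
reopened = store.has_capacity()
```

This matches the behaviour exercised by the project's own test with `max_bytes=1`, where the first fill is accepted and only the second is refused.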
@@ -243,17 +275,34 @@ class Store:
243
275
  with _DB_WRITE_LOCK, self.conn() as c:
244
276
  c.execute("DELETE FROM misses WHERE key=?", (key,))
245
277
 
246
- def store_from_origin(self, url: str, progress=None, cancel=None) -> sqlite3.Row:
278
+ def delete_blob(self, key: str):
279
+ """Drop a cached artifact (row + bytes). The manual half of eviction."""
280
+ with _DB_WRITE_LOCK, self.conn() as c:
281
+ c.execute("DELETE FROM blobs WHERE key=?", (key,))
282
+ try:
283
+ os.remove(self.blob_path(key))
284
+ except FileNotFoundError:
285
+ pass
286
+
287
+ def store_from_origin(self, url: str, progress=None, cancel=None, headers=None) -> sqlite3.Row:
247
288
  """Operator-triggered: pull the artifact from origin and store it.
248
289
 
249
290
  ``progress(done, total)`` is called as bytes arrive (total may be None);
250
291
  ``cancel()`` is polled between chunks and, if truthy, aborts the pull
251
292
  with :class:`DownloadCancelled` and leaves no partial file behind.
293
+ ``headers`` adds request headers to the origin fetch (e.g. a registry
294
+ bearer token bty pre-resolved for an oras blob). Raises :class:`CacheFull`
295
+ if the cache is already at --max-bytes.
252
296
  """
297
+ if not self.has_capacity():
298
+ raise CacheFull(f"cache full (>= {self.max_bytes} bytes); refusing to fetch {url}")
253
299
  normalized = self.normalize(url)
254
300
  key = self.key_of(normalized)
255
301
  tmp = os.path.join(self.tmp_dir, key + ".part")
256
- req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
302
+ req_headers = {"User-Agent": USER_AGENT}
303
+ if headers:
304
+ req_headers.update(headers)
305
+ req = urllib.request.Request(url, headers=req_headers)
257
306
  sha = hashlib.sha256()
258
307
  size = 0
259
308
  try:
@@ -315,6 +364,10 @@ class DownloadCancelled(Exception):
315
364
  """Raised inside a worker when its job's cancel flag is set."""
316
365
 
317
366
 
367
+ class CacheFull(Exception):
368
+ """Raised when --max-bytes is reached; the fill is refused and nothing is evicted."""
369
+
370
+
318
371
  @dataclass
319
372
  class Job:
320
373
  id: int
@@ -326,6 +379,7 @@ class Job:
326
379
  finished_at: float | None = None
327
380
  error: str | None = None
328
381
  sha256: str | None = None
382
+ headers: dict | None = field(default=None, repr=False) # e.g. registry auth; never logged
329
383
  _cancel: threading.Event = field(default_factory=threading.Event, repr=False)
330
384
 
331
385
 
@@ -345,12 +399,12 @@ class DownloadManager:
345
399
  for _ in range(max(1, workers)):
346
400
  threading.Thread(target=self._worker, daemon=True).start()
347
401
 
348
- def enqueue(self, url: str) -> Job:
402
+ def enqueue(self, url: str, headers: dict | None = None) -> Job:
349
403
  with self._lock:
350
404
  jid = self._active.get(url)
351
405
  if jid is not None and self._jobs[jid].status in PENDING_STATES:
352
406
  return self._jobs[jid] # dedup an already-pending pull
353
- job = Job(id=next(self._ids), url=url)
407
+ job = Job(id=next(self._ids), url=url, headers=headers)
354
408
  self._jobs[job.id] = job
355
409
  self._active[url] = job.id
356
410
  self._q.put(job.id)
@@ -392,6 +446,7 @@ class DownloadManager:
392
446
  job.url,
393
447
  progress=lambda done, total, j=job: _set_progress(j, done, total),
394
448
  cancel=job._cancel.is_set,
449
+ headers=job.headers,
395
450
  )
396
451
  with self._lock:
397
452
  job.status = "completed"
@@ -474,7 +529,13 @@ class Handler(http.server.BaseHTTPRequestHandler):
474
529
  else:
475
530
  self.send_text(404, "")
476
531
 
477
- ADMIN_POST = ("/admin/fetch", "/admin/dismiss", "/admin/cancel", "/admin/clear")
532
+ ADMIN_POST = (
533
+ "/admin/fetch",
534
+ "/admin/dismiss",
535
+ "/admin/delete",
536
+ "/admin/cancel",
537
+ "/admin/clear",
538
+ )
478
539
 
479
540
  def do_POST(self):
480
541
  parsed = urllib.parse.urlsplit(self.path)
@@ -490,9 +551,11 @@ class Handler(http.server.BaseHTTPRequestHandler):
490
551
  if parsed.path == "/admin/fetch":
491
552
  url = form.get("url", "").strip()
492
553
  if url:
493
- self.mgr.enqueue(url)
554
+ self.mgr.enqueue(url, headers=parse_headers(form.get("header", "")))
494
555
  elif parsed.path == "/admin/dismiss":
495
556
  self.store.dismiss(form.get("key", "").strip())
557
+ elif parsed.path == "/admin/delete":
558
+ self.store.delete_blob(form.get("key", "").strip())
496
559
  elif parsed.path == "/admin/cancel":
497
560
  jid = form.get("id", "")
498
561
  if jid.isdigit():
@@ -594,10 +657,12 @@ class Handler(http.server.BaseHTTPRequestHandler):
594
657
  row = self.store.get_blob(url)
595
658
  if row is None:
596
659
  self.store.record_miss(url)
597
- if self.auto_fetch:
660
+ if self.auto_fetch and self.store.has_capacity():
598
661
  # Pull it in the background so the next request hits; the client
599
- # gets this one from origin (the shim falls through on a miss).
600
- # In --curate mode an operator triggers the pull instead.
662
+ # gets this one from origin (the shim, or bty's fallback chain,
663
+ # falls through on a miss). In --curate mode an operator triggers
664
+ # the pull instead; when the cache is full we record the miss but
665
+ # schedule nothing (delete something first).
601
666
  self.mgr.enqueue(url)
602
667
  self.send_text(404, "cache miss (recorded)\n")
603
668
  return
@@ -736,6 +801,10 @@ class Handler(http.server.BaseHTTPRequestHandler):
736
801
  jobs = self.mgr.list()
737
802
  misses = self.store.list_misses()
738
803
  blobs = self.store.list_blobs()
804
+ used = human_size(self.store.total_size())
805
+ if self.store.max_bytes:
806
+ used += f" / {human_size(self.store.max_bytes)}"
807
+ full = "" if self.store.has_capacity() else " &middot; <strong>cache full</strong>"
739
808
 
740
809
  job_rows = (
741
810
  "".join(self._job_row(j) for j in jobs)
@@ -774,14 +843,21 @@ class Handler(http.server.BaseHTTPRequestHandler):
774
843
  <td class="num">{b["misses"]}</td>
775
844
  <td class="mono">{html.escape(b["sha256"][:12])}…</td>
776
845
  <td><small>{html.escape(b["fetched_at"])}</small></td>
846
+ <td>
847
+ <form hx-post="/admin/delete" hx-target="#dash" hx-swap="innerHTML"
848
+ hx-confirm="Delete this cached artifact?">
849
+ <input type="hidden" name="key" value="{html.escape(b["key"], quote=True)}">
850
+ <button type="submit" class="secondary outline">Delete</button>
851
+ </form>
852
+ </td>
777
853
  </tr>"""
778
854
  for b in blobs
779
855
  )
780
- or '<tr><td colspan="6"><em>Cache is empty.</em></td></tr>'
856
+ or '<tr><td colspan="7"><em>Cache is empty.</em></td></tr>'
781
857
  )
782
858
 
783
859
  return f"""
784
- <p><small>{nblobs} cached &middot; {nmisses} pending miss(es)</small></p>
860
+ <p><small>{nblobs} cached ({used}){full} &middot; {nmisses} pending miss(es)</small></p>
785
861
 
786
862
  <div class="row">
787
863
  <h4>Downloads</h4>
@@ -805,7 +881,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
805
881
  <figure><table class="striped">
806
882
  <thead><tr>
807
883
  <th>URL</th><th>Size</th><th class="num">Hits</th><th class="num">Misses</th>
808
- <th>SHA-256</th><th>Fetched</th>
884
+ <th>SHA-256</th><th>Fetched</th><th>Action</th>
809
885
  </tr></thead>
810
886
  <tbody>{blob_rows}</tbody>
811
887
  </table></figure>"""
@@ -869,9 +945,15 @@ def main():
869
945
  help="require an operator to approve each pull (default: auto-fetch a "
870
946
  "missed artifact in the background so the next request hits)",
871
947
  )
948
+ ap.add_argument(
949
+ "--max-bytes",
950
+ default="0",
951
+ help="cap total cached bytes and refuse new fills when full (0 = "
952
+ "unlimited; accepts 1024-based suffixes, e.g. 50G). Eviction is manual.",
953
+ )
872
954
  args = ap.parse_args()
873
955
 
874
- store = Store(args.data_dir, keep_query=args.keep_query)
956
+ store = Store(args.data_dir, keep_query=args.keep_query, max_bytes=parse_size(args.max_bytes))
875
957
  auth = Auth(resolve_secret(store.data_dir), os.environ.get("WITHCACHE_ADMIN_PASSWORD"))
876
958
  mgr = DownloadManager(store, workers=args.workers)
877
959
 
@@ -883,7 +965,8 @@ def main():
883
965
  print(
884
966
  f"withcache cache-host on http://{args.host}:{args.port} "
885
967
  f"(data={store.data_dir}, keep_query={args.keep_query}, workers={args.workers}, "
886
- f"mode={'curate' if args.curate else 'auto-fetch'})",
968
+ f"mode={'curate' if args.curate else 'auto-fetch'}, "
969
+ f"max_bytes={'unlimited' if not store.max_bytes else human_size(store.max_bytes)})",
887
970
  flush=True,
888
971
  )
889
972
  if not auth.enabled:
@@ -19,7 +19,7 @@ import base64 # noqa: E402
19
19
  import urllib.error # noqa: E402
20
20
  import urllib.request # noqa: E402
21
21
 
22
- from withcache import _shim, curlwithcache, server, wgetwithcache # noqa: E402
22
+ from withcache import _shim, client, curlwithcache, server, wgetwithcache # noqa: E402
23
23
 
24
24
 
25
25
  # --------------------------------------------------------------------------
@@ -148,6 +148,23 @@ class TestStoreFromOrigin(unittest.TestCase):
148
148
  got = self.store.get_blob(url)
149
149
  self.assertEqual((got["hits"], got["misses"]), (2, 2))
150
150
 
151
+ def test_delete_blob_removes_row_and_file(self):
152
+ url = f"http://127.0.0.1:{self.port}/artifact.bin"
153
+ row = self.store.store_from_origin(url)
154
+ path = self.store.blob_path(row["key"])
155
+ self.assertTrue(os.path.exists(path))
156
+ self.store.delete_blob(row["key"])
157
+ self.assertIsNone(self.store.get_blob(url))
158
+ self.assertFalse(os.path.exists(path))
159
+
160
+ def test_capacity_guard_refuses_new_fills_when_full(self):
161
+ store = server.Store(tempfile.mkdtemp(), keep_query=False, max_bytes=1)
162
+ self.assertTrue(store.has_capacity()) # empty: room for the first fill
163
+ store.store_from_origin(f"http://127.0.0.1:{self.port}/a.bin")
164
+ self.assertFalse(store.has_capacity()) # now over the 1-byte cap
165
+ with self.assertRaises(server.CacheFull):
166
+ store.store_from_origin(f"http://127.0.0.1:{self.port}/b.bin")
167
+
151
168
 
152
169
  # --------------------------------------------------------------------------
153
170
  # _shim: URL detection, rewrite, real-tool resolution, env, path-encoding
@@ -422,5 +439,105 @@ class TestAutoFetchOnMiss(unittest.TestCase):
422
439
  httpd.server_close()
423
440
 
424
441
 
442
+ # --------------------------------------------------------------------------
443
+ # Fetch-with-headers: a registry blob behind bearer auth (the oras case). bty
444
+ # pre-resolves the token and hands it to withcache for the fill.
445
+ # --------------------------------------------------------------------------
446
+ class _AuthOrigin(http.server.BaseHTTPRequestHandler):
447
+ TOKEN = "Bearer s3cret"
448
+
449
+ def do_GET(self):
450
+ if self.headers.get("Authorization") != self.TOKEN:
451
+ self.send_response(401)
452
+ self.send_header("Content-Length", "0")
453
+ self.end_headers()
454
+ return
455
+ self.send_response(200)
456
+ self.send_header("Content-Length", str(len(PAYLOAD)))
457
+ self.end_headers()
458
+ self.wfile.write(PAYLOAD)
459
+
460
+ def log_message(self, format, *args):
461
+ pass
462
+
463
+
464
+ class TestFetchWithHeaders(unittest.TestCase):
465
+ def setUp(self):
466
+ self.httpd = socketserver.TCPServer(("127.0.0.1", 0), _AuthOrigin)
467
+ threading.Thread(target=self.httpd.serve_forever, daemon=True).start()
468
+ self.url = f"http://127.0.0.1:{self.httpd.server_address[1]}/blob.bin"
469
+ self.store = server.Store(tempfile.mkdtemp(), keep_query=False)
470
+
471
+ def tearDown(self):
472
+ self.httpd.shutdown()
473
+ self.httpd.server_close()
474
+
475
+ def test_fetch_without_header_is_rejected(self):
476
+ with self.assertRaises(urllib.error.HTTPError) as cm:
477
+ self.store.store_from_origin(self.url)
478
+ self.assertEqual(cm.exception.code, 401)
479
+
480
+ def test_fetch_with_bearer_header_succeeds(self):
481
+ row = self.store.store_from_origin(self.url, headers={"Authorization": _AuthOrigin.TOKEN})
482
+ self.assertEqual(row["size"], len(PAYLOAD))
483
+
484
+
485
+ # --------------------------------------------------------------------------
486
+ # Pure helpers
487
+ # --------------------------------------------------------------------------
488
+ class TestParsers(unittest.TestCase):
489
+ def test_parse_size(self):
490
+ self.assertEqual(server.parse_size(""), 0)
491
+ self.assertEqual(server.parse_size("0"), 0)
492
+ self.assertEqual(server.parse_size("1024"), 1024)
493
+ self.assertEqual(server.parse_size("50M"), 50 * 1024**2)
494
+ self.assertEqual(server.parse_size("1.5G"), int(1.5 * 1024**3))
495
+
496
+ def test_parse_headers(self):
497
+ self.assertIsNone(server.parse_headers(""))
498
+ self.assertEqual(
499
+ server.parse_headers("Authorization: Bearer x"), {"Authorization": "Bearer x"}
500
+ )
501
+ self.assertEqual(server.parse_headers("A: 1\nB: 2"), {"A": "1", "B": "2"})
502
+
503
+
504
+ # --------------------------------------------------------------------------
505
+ # Client library: what a consumer (e.g. bty) imports instead of reimplementing
506
+ # the /b/ protocol.
507
+ # --------------------------------------------------------------------------
508
+ class TestClientLibrary(unittest.TestCase):
509
+ def setUp(self):
510
+ self.origin = socketserver.TCPServer(("127.0.0.1", 0), _Origin)
511
+ threading.Thread(target=self.origin.serve_forever, daemon=True).start()
512
+ self.origin_url = f"http://127.0.0.1:{self.origin.server_address[1]}/art.bin"
513
+ self.httpd, self.store = _start_withcache()
514
+ self.base = f"http://127.0.0.1:{self.httpd.server_address[1]}"
515
+
516
+ def tearDown(self):
517
+ for s in (self.origin, self.httpd):
518
+ s.shutdown()
519
+ s.server_close()
520
+
521
+ def test_blob_url_matches_shim_and_normalizes_server(self):
522
+ # accepts a host/host:port/http URL and emits the same /b/ URL as the shim
523
+ self.assertEqual(
524
+ client.blob_url(self.base, self.origin_url),
525
+ _shim.blob_url(_shim.cache_base(self.base), self.origin_url),
526
+ )
527
+
528
+ def test_is_cached_and_serve_url_track_the_cache(self):
529
+ self.assertFalse(client.is_cached(self.base, self.origin_url))
530
+ self.assertIsNone(client.serve_url(self.base, self.origin_url))
531
+ self.store.store_from_origin(self.origin_url) # warm it
532
+ self.assertTrue(client.is_cached(self.base, self.origin_url))
533
+ self.assertEqual(
534
+ client.serve_url(self.base, self.origin_url),
535
+ client.blob_url(self.base, self.origin_url),
536
+ )
537
+
538
+ def test_is_cached_unreachable_is_false(self):
539
+ self.assertFalse(client.is_cached("http://127.0.0.1:9", self.origin_url, timeout=0.5))
540
+
541
+
425
542
  if __name__ == "__main__":
426
543
  unittest.main(verbosity=2)
@@ -1,11 +0,0 @@
1
- """withcache — operator-curated, URL-keyed artifact cache for a small lab.
2
-
3
- Two console entry points (see pyproject.toml):
4
- withcache -> withcache.client:main (the cache-aware downloader)
5
- withcache-server -> withcache.server:main (the cache-host)
6
-
7
- Both modules are stdlib-only and self-contained, so either file can also be
8
- copied and run on its own with a plain ``python3``.
9
- """
10
-
11
- __version__ = "0.2.0"
File without changes