gemkeeper 0.7.2 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +18 -1
  3. data/README.md +11 -11
  4. data/lib/gemkeeper/bundler_mirror_configurator.rb +1 -1
  5. data/lib/gemkeeper/cli/commands/list.rb +2 -2
  6. data/lib/gemkeeper/cli/commands/server/start.rb +4 -4
  7. data/lib/gemkeeper/cli/commands/server/status.rb +3 -3
  8. data/lib/gemkeeper/cli/commands/server/stop.rb +3 -3
  9. data/lib/gemkeeper/cli/commands/sync.rb +1 -1
  10. data/lib/gemkeeper/compact_index_server/cache_meta.rb +34 -0
  11. data/lib/gemkeeper/compact_index_server/cache_store.rb +64 -0
  12. data/lib/gemkeeper/compact_index_server/gem_cache.rb +88 -0
  13. data/lib/gemkeeper/compact_index_server/gem_index.rb +78 -0
  14. data/lib/gemkeeper/compact_index_server/index_merger.rb +81 -0
  15. data/lib/gemkeeper/compact_index_server/response.rb +12 -0
  16. data/lib/gemkeeper/compact_index_server/response_builder.rb +63 -0
  17. data/lib/gemkeeper/compact_index_server/rubygems_client.rb +59 -0
  18. data/lib/gemkeeper/compact_index_server/spec_mapper.rb +38 -0
  19. data/lib/gemkeeper/compact_index_server/upload_handler.rb +36 -0
  20. data/lib/gemkeeper/compact_index_server/upstream_cache.rb +26 -0
  21. data/lib/gemkeeper/compact_index_server.rb +131 -0
  22. data/lib/gemkeeper/configuration.rb +1 -1
  23. data/lib/gemkeeper/gem_syncer.rb +53 -84
  24. data/lib/gemkeeper/gem_uploader.rb +26 -18
  25. data/lib/gemkeeper/rackup_process.rb +12 -7
  26. data/lib/gemkeeper/repo_fetcher.rb +80 -0
  27. data/lib/gemkeeper/server_manager.rb +1 -1
  28. data/lib/gemkeeper/version.rb +1 -1
  29. data/lib/gemkeeper.rb +2 -0
  30. data/specs/20260529-091429-replace-geminabox-compact-proxy/critique-consolidated-v-1.md +168 -0
  31. data/specs/20260529-091429-replace-geminabox-compact-proxy/critique-v-1-claude.md +124 -0
  32. data/specs/20260529-091429-replace-geminabox-compact-proxy/critique-v-1-codex.md +125 -0
  33. data/specs/20260529-091429-replace-geminabox-compact-proxy/critique-v-1-copilot.md +261 -0
  34. data/specs/20260529-091429-replace-geminabox-compact-proxy/spec.md +360 -0
  35. data/specs/20260529-131354-sync-serve-cache-contract/critique-consolidated-v-1.md +95 -0
  36. data/specs/20260529-131354-sync-serve-cache-contract/critique-v-1-claude.md +47 -0
  37. data/specs/20260529-131354-sync-serve-cache-contract/critique-v-1-codex.md +112 -0
  38. data/specs/20260529-131354-sync-serve-cache-contract/critique-v-1-copilot.md +169 -0
  39. data/specs/20260529-131354-sync-serve-cache-contract/implementation-summary.md +59 -0
  40. data/specs/20260529-131354-sync-serve-cache-contract/spec.md +169 -0
  41. metadata +38 -28
@@ -41,7 +41,7 @@ module Gemkeeper
 
     def status
       if running?
-        { running: true, pid: read_pid, url: config.geminabox_url }
+        { running: true, pid: read_pid, url: config.server_url }
       else
         { running: false }
       end
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module Gemkeeper
-  VERSION = "0.7.2"
+  VERSION = "0.8.0"
 end
data/lib/gemkeeper.rb CHANGED
@@ -11,7 +11,9 @@ require_relative "gemkeeper/gem_repo_resolver"
 require_relative "gemkeeper/manifest_builder"
 require_relative "gemkeeper/manifest_validator"
 require_relative "gemkeeper/bundler_mirror_configurator"
+require_relative "gemkeeper/repo_fetcher"
 require_relative "gemkeeper/gem_syncer"
+require_relative "gemkeeper/compact_index_server"
 require_relative "gemkeeper/rackup_process"
 require_relative "gemkeeper/server_readiness_probe"
 require_relative "gemkeeper/config_generator"
@@ -0,0 +1,168 @@
+# Spec 20260529-091429: Consolidated Critique (v1)
+
+## Overview
+
+**Critiques received from:** Claude, Copilot, Codex
+**Critiques missing:** None
+
+## Executive Summary
+
+All three critics reached the same verdict: the direction and architecture are sound, but the spec is not implementation-ready.
+The blocking issues cluster around three areas: the `/versions` merge algorithm is underspecified in ways Bundler will exercise directly; ETag/digest header generation for merged responses is absent; and security input validation is incorrectly dismissed.
+There are also several codebase assumptions that don't match reality (GemUploader response codes, `rubygems-generate_index` dependency, `list_gems` dead method).
+The fixes are additive — the spec doesn't need restructuring, just more precision in the places listed below.
+
+---
+
+## Blocking Issues (must resolve before implementation)
+
+### B-1. `/versions` merge algorithm is underspecified
+
+All three critics flagged this.
+The spec says private gems "take precedence" when a name appears in both, but doesn't define the algorithm.
+
+The problems:
+
+1. **Collision semantics** — does "private takes precedence" mean: replace the entire public entry (suppressing all public versions), replace only matching version entries, or append a private-wins line and rely on Bundler's last-entry behavior? If public versions remain visible, a developer pulling `rails 7.1.0` from the private gem (a fork) will still see public rails versions and Bundler may resolve the wrong one (dependency confusion).
+
+2. **Byte stability** — Bundler caches the byte offset of the last `/versions` line it fetched and issues `Range: bytes=N-` on subsequent requests. If the merge interleaves private gems by insertion date, any new private gem shifts offsets of everything after it, corrupting Bundler's incremental update. The spec must define a stable layout: e.g., upstream public block verbatim first, private gem entries appended at the end.
+
+3. **`VersionsFile` API** — `CompactIndex.versions(versions_file, extra_gems)` takes a `CompactIndex::VersionsFile` object backed by a cached file, not a raw upstream response string. The spec must describe how the upstream body is persisted to disk and surfaced as a `VersionsFile`.
+
+**Recommendation:**
+- Cache the raw upstream `/versions` response verbatim to `cache_dir/rubygems_cache/versions`.
+- Construct a `VersionsFile` from that cached file.
+- Pass private gems as `extra_gems` so they are appended after the public block — this preserves byte stability for the public block.
+- "Precedence" means private gem entries replace the matching public name in `extra_gems` merge (public versions for colliding names are suppressed).
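The byte-stability argument in B-1 can be illustrated with plain strings. This is a minimal sketch, not the `compact_index` API: the helper name `merge_versions` and the sample index lines are hypothetical, and it deliberately ignores collision suppression and growth of the upstream block. It only shows that appending private entries leaves earlier byte offsets untouched, so a ranged request made against an older merged body still starts on a line boundary.

```ruby
# Hypothetical helper: upstream body verbatim first, private lines appended.
def merge_versions(upstream_body, private_lines)
  body = upstream_body.end_with?("\n") ? upstream_body.dup : "#{upstream_body}\n"
  private_lines.each { |line| body << line << "\n" }
  body
end

# Sample data is made up for illustration.
upstream = "created_at: 2024-01-01T00:00:00Z\n---\nrack 3.0.0 abc123\n"
before = merge_versions(upstream, ["mygem 1.0.0 def456"])
after  = merge_versions(upstream, ["mygem 1.0.0 def456", "othergem 0.1.0 0a1b2c"])

# A client that already holds `before` asks for everything past its byte count.
offset = before.bytesize
tail = after.byteslice(offset..)
```

Because `after` begins with the exact bytes of `before`, the `tail` slice contains only the newly appended private line.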
+
+### B-2. ETag and `Repr-Digest` for merged responses must be computed from the merged body
+
+The spec requires these headers (FR-1.3) but does not say how to derive them for merged responses.
+All three critics noted this independently.
+
+Forwarding the upstream `ETag` or `Repr-Digest` header from RubyGems.org is wrong — the merged body differs from the upstream body, so Bundler's SHA256 verification will fail and it will retry unconditionally.
+
+**Recommendation:** For all endpoints serving merged or locally-generated content (`/versions`, `/info/:gemname` for private gems, `/names`), compute `ETag` and `Repr-Digest: sha-256=<base64>` from the final response body. Do not forward upstream headers unchanged.
+Pick SHA256 for ETag (avoids a second hash pass; consistent with `Repr-Digest`).
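A minimal sketch of deriving both validators from the final body with a single hash pass. The helper name `validator_headers` is hypothetical. The colon-wrapped `Repr-Digest` value follows my reading of the RFC 9530 structured-field syntax, which is an assumption beyond the `sha-256=<base64>` shorthand used in the text, so the exact format should be checked against what Bundler accepts.

```ruby
require "digest"

# Hypothetical helper: derive cache validators from the bytes actually sent.
def validator_headers(body)
  raw = Digest::SHA256.digest(body)   # one hash pass, reused for both headers
  hex = raw.unpack1("H*")             # lowercase hex for the ETag
  b64 = [raw].pack("m0")              # strict base64, no trailing newline
  {
    "ETag" => %("#{hex}"),
    # Assumption: RFC 9530 byte-sequence form, base64 wrapped in colons.
    "Repr-Digest" => "sha-256=:#{b64}:"
  }
end

headers = validator_headers("---\nmygem 1.0.0 def456\n")
```

Any change to the merged body, such as a new private gem, changes both header values, which is what lets the client detect a stale cache.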
+
+### B-3. `info_checksum` has a circular dependency
+
+Copilot and Claude both identified this.
+`CompactIndex::GemVersion#info_checksum` is the SHA256 of the `/info/:gemname` response body.
+To populate it in `/versions`, the server must generate the `/info` body first, hash it, and embed the hash.
+An implementer who builds the versions index before the info bodies will produce invalid checksums, causing Bundler to re-fetch unconditionally.
+
+**Recommendation:** Add an AR stating: info bodies for all private gems are generated first; their SHA256 checksums are computed and stored in memory; then the `/versions` index is built referencing those checksums. Checksums are recomputed after each successful upload.
+
+Also note: Codex flagged that `CompactIndex::GemVersion` may use `number` rather than `version` as the field name in 0.15.x. Verify the actual field names against the installed gem before implementation.
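The ordering constraint in B-3 can be sketched without the `compact_index` gem. Everything here is illustrative: the hash shapes, the simplified line formats, and the variable names are hypothetical, and the digest algorithm is taken from the text above, so it should be verified against the compact index format before use.

```ruby
require "digest"

# Hypothetical in-memory shape; a real server would build this from gemspecs.
private_gems = {
  "mygem" => [
    { number: "1.0.0", deps: "rack:>= 2.0", checksum: "aa11" },
    { number: "1.1.0", deps: "", checksum: "bb22" }
  ]
}

# Step 1: render every /info body first (simplified line format).
info_bodies = private_gems.transform_values do |versions|
  lines = versions.map { |v| "#{v[:number]} #{v[:deps]}|checksum:#{v[:checksum]}" }
  "---\n#{lines.join("\n")}\n"
end

# Step 2: hash each rendered body (SHA256 as stated in the text above).
info_checksums = info_bodies.transform_values { |body| Digest::SHA256.hexdigest(body) }

# Step 3: only now build the /versions lines that embed those checksums.
versions_lines = private_gems.map do |name, versions|
  numbers = versions.map { |v| v[:number] }.join(",")
  "#{name} #{numbers} #{info_checksums.fetch(name)}"
end
```

Using `fetch` in step 3 makes a missing checksum fail loudly instead of silently writing an empty value into the index.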
+
+### B-4. `/names` scope is undefined
+
+Copilot and Codex both flagged this.
+RubyGems.org `/names` is ~2.7 MB / ~175,000 entries.
+FR-1.1 says `/names` returns "all gem names (local and proxied)" but doesn't say whether the server fetches, caches, and merges the upstream `/names` file or scopes to a smaller subset.
+
+These produce different behavior:
+- Full public namespace: correct, but expensive.
+- Local-only + cached: bundle install fails on any public gem not already in the info cache.
+
+**Recommendation:** Treat `/names` consistently with `/versions` — fetch and cache the upstream `/names` file; merge with local private gem names; apply the same ETag/cache TTL rules as `/info/:gemname`.
+
+---
+
+## Significant Gaps
+
+### G-1. Path traversal in `/gems/:filename.gem`
+
+All three critics raised this.
+The spec's security checklist marks this N/A because the server binds to 127.0.0.1.
+Localhost binding does not prevent path traversal — any local process can issue a crafted request.
+A filename like `../../gemkeeper.pid` resolves outside `gems_path/gems/`.
+The same `gemname` parameter is used to construct `https://rubygems.org/info/:gemname`, creating an SSRF vector if unvalidated.
+
+**Recommendation:** Add a validation constraint: `:gemname` and `:filename` URL parameters are validated against `/\A[a-zA-Z0-9._-]+\z/` before any filesystem or upstream URL construction. Return 400 otherwise. Additionally, assert the resolved file path is under `gems_path/gems/` before serving.
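A minimal sketch of the two checks recommended in G-1, assuming a hypothetical helper `resolve_gem_path` that returns `nil` when the request should be answered with 400. The pattern is the one quoted above. The containment check matters because the pattern on its own still accepts names made only of dots, such as `..`.

```ruby
SAFE_PARAM = /\A[a-zA-Z0-9._-]+\z/

# Hypothetical helper: returns the absolute path to serve, or nil for a 400.
def resolve_gem_path(gems_dir, filename)
  return nil unless filename.match?(SAFE_PARAM)

  root = File.expand_path(gems_dir)
  path = File.expand_path(filename, root)
  # Containment check: rejects "." and "..", which the pattern alone allows.
  path.start_with?(root + File::SEPARATOR) ? path : nil
end

ok       = resolve_gem_path("/srv/gems", "rack-3.0.0.gem")
escaping = resolve_gem_path("/srv/gems", "../../gemkeeper.pid")
dots     = resolve_gem_path("/srv/gems", "..")
```

Here `ok` resolves inside the directory, `escaping` is rejected by the pattern because of the slashes, and `dots` passes the pattern but is rejected by the containment check.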
+
+### G-2. "Valid gem" definition in FR-1.2 is ambiguous
+
+All three critics flagged this.
+Three interpretations exist: (1) correct `.gem` extension; (2) parseable tar archive containing `metadata.gz`; (3) `metadata.gz` can be loaded as a Ruby gemspec without errors.
+Option 3 is dangerous — loading a gemspec can execute arbitrary Ruby.
+
+**Recommendation:** Specify option 2: validate that the file is a tar archive containing `metadata.gz` and `data.tar.gz`. Parse gemspec metadata from `metadata.gz` using `Gem::Package.new(path).spec` inside a rescue block. If parsing raises, return 422. Do not `load` or `eval` gemspec content.
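A minimal sketch of the "parse inside a rescue block, map failures to 422" rule from G-2. The helper name `inspect_uploaded_gem` and its return shape are hypothetical, and rescuing `StandardError` is a deliberately broad assumption; a real implementation may prefer to rescue narrower RubyGems error classes.

```ruby
require "rubygems/package"
require "tempfile"

# Hypothetical helper: returns [status, spec]; any parse failure maps to 422.
def inspect_uploaded_gem(path)
  spec = Gem::Package.new(path).spec
  [200, spec]
rescue StandardError
  # Assumption: every parse failure is treated as an invalid upload.
  [422, nil]
end

# Exercise the failure path with a file that is clearly not a gem archive.
status, spec = Tempfile.create(["bogus", ".gem"]) do |file|
  file.write("this is not a gem archive")
  file.flush
  inspect_uploaded_gem(file.path)
end
```

The caller can then take the name, version, and dependencies from the returned spec rather than trusting the uploaded filename.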
+
+### G-3. Upload atomicity and `gems_path/gems/` creation
+
+Codex and Claude raised this.
+FR-1.2 does not require `mkdir_p gems_path/gems/` (the subdirectory may not exist on first run), atomic temp-file writes (concurrent uploads of different gems can interleave file writes), or cleanup of temp files after failed validation.
+
+**Recommendation:** Add to FR-1.2: create `gems_path/gems/` if absent before writing; write to a temp file in the same directory first, then `File.rename` to the target path (atomic on POSIX); delete temp file on validation failure.
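The three requirements in G-3 (create the directory, stage in the same directory, rename into place, clean up on failure) can be sketched as follows. The helper name `store_gem`, the temp-file naming scheme, and the optional validation block are all hypothetical.

```ruby
require "fileutils"
require "securerandom"
require "tmpdir"

# Hypothetical helper: stage the upload next to its target, then rename into place.
def store_gem(gems_path, filename, bytes)
  dir = File.join(gems_path, "gems")
  FileUtils.mkdir_p(dir)
  tmp = File.join(dir, ".#{filename}.#{SecureRandom.hex(4)}.tmp")
  File.binwrite(tmp, bytes)
  # An optional block lets the caller validate the staged file before it is visible.
  raise ArgumentError, "validation failed" if block_given? && !yield(tmp)

  target = File.join(dir, filename)
  File.rename(tmp, target) # same directory, so the rename is atomic on POSIX
  target
ensure
  # After a successful rename the temp path no longer exists, so this is a no-op.
  FileUtils.rm_f(tmp) if tmp && File.exist?(tmp)
end

Dir.mktmpdir do |root|
  stored = store_gem(root, "mygem-1.0.0.gem", "payload")
  puts File.basename(stored)
end
```

Staging in the same directory as the target keeps the rename on one filesystem, which is what makes it atomic.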
+
+### G-4. Thread safety for in-memory gem index
+
+Copilot and Codex raised this.
+Puma uses a thread pool. A `GET /info/:gemname` concurrent with an upload rebuilding the in-memory index produces a data race.
+
+**Recommendation:** Add an AR: the in-memory gem index is replaced atomically via a single instance variable assignment after each full rebuild (copy-on-write pattern). Index reads do not acquire a lock; the rebuild completes into a new object before swapping.
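A minimal sketch of the copy-on-write swap described in G-4. The class name `GemIndexHolder` is hypothetical, and the mutex that serializes concurrent rebuilds is an addition of this sketch rather than something the text requires; readers never take it.

```ruby
# Hypothetical index holder: readers never lock, writers publish a finished snapshot.
class GemIndexHolder
  def initialize
    @snapshot = {}.freeze
    @rebuild_lock = Mutex.new
  end

  # Readers see whichever frozen snapshot is current; it never mutates under them.
  def lookup(name)
    @snapshot[name]
  end

  # Build the replacement off to the side, then publish it with one assignment.
  def rebuild(entries)
    @rebuild_lock.synchronize do
      fresh = entries.to_h { |name, versions| [name, versions.dup.freeze] }.freeze
      @snapshot = fresh
    end
  end
end

holder = GemIndexHolder.new
holder.rebuild("mygem" => ["1.0.0"])
old_view = holder.lookup("mygem")
holder.rebuild("mygem" => ["1.0.0", "1.1.0"])
new_view = holder.lookup("mygem")
```

A reader that obtained `old_view` before the second rebuild keeps a consistent, frozen list, while later lookups see the new snapshot.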
+
+### G-5. Upstream 404 vs outage distinction in FR-3.4
+
+Codex and Copilot raised this.
+FR-3.4 says return 404 with `"Upstream unavailable and no local cache"`.
+But if RubyGems.org returns a genuine 404 (the gem does not exist), the spec's response body is misleading.
+The two cases — "upstream reachable, gem not found" and "upstream unreachable" — should produce different responses.
+
+**Recommendation:** Distinguish: upstream reachable + 404 → return 404 with no body; upstream unreachable (connection error, timeout) + no cache → return 503; upstream unreachable + cache exists → serve from cache with appropriate headers.
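The three cases in the G-5 recommendation can be written as a small decision function. This is a sketch under stated assumptions: the name `proxy_outcome`, the action symbols, and the convention that `nil` stands for "the upstream request raised" are all hypothetical.

```ruby
# upstream_status is the integer status from RubyGems.org, or nil when the
# request raised (connection refused, timeout). Returns [status, action].
def proxy_outcome(upstream_status, cache_exists)
  if upstream_status.nil?
    cache_exists ? [200, :serve_cache] : [503, :unavailable]
  elsif upstream_status == 404
    [404, :not_found]
  else
    [upstream_status, :serve_upstream]
  end
end

outage_with_cache    = proxy_outcome(nil, true)
outage_without_cache = proxy_outcome(nil, false)
genuine_missing      = proxy_outcome(404, true)
```

Keeping the decision separate from the HTTP client makes each branch easy to cover in unit tests without a network.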
+
+### G-6. `GemUploader#list_gems` becomes a dead method
+
+Copilot raised this.
+`GemUploader#list_gems` calls `GET /api/v1/gems.json`, a Geminabox-specific endpoint the new server will not implement.
+The method is not called in production code (list reads the filesystem), but it is a public method that silently breaks after migration.
+
+**Recommendation:** Either add `GET /api/v1/gems.json` to the Out of Scope list and note that `list_gems` becomes a broken dead method (acceptable since it is unused), or have the server return a 404 for that path and add a note in the spec that the method should be removed or stubbed.
+
+### G-7. `rubygems-generate_index` dependency not addressed
+
+Copilot and Codex both noted that `gemkeeper.gemspec` and `Gemfile` declare `rubygems-generate_index ~> 1.0`, which exists to support Geminabox's legacy Marshal index generation.
+The spec swaps `geminabox` for `compact_index` but is silent on this.
+
+**Recommendation:** Add `rubygems-generate_index` to the list of dependencies removed in the Integration Points table.
+
+---
+
+## Additional Requirements to Add
+
+| # | Source | Requirement |
+| - | ------ | ----------- |
+| AR-new-1 | All | Info bodies computed before versions index; checksums embedded in versions from pre-computed hashes |
+| AR-new-2 | All | ETag is SHA256 of merged response body for all locally-generated or merged endpoints |
+| AR-new-3 | All | `:gemname` and `:filename` URL params validated to `/\A[a-zA-Z0-9._-]+\z/` before filesystem or upstream URL use |
+| AR-new-4 | Copilot/Codex | Puma's thread count should be set to 1 (or the index swap made atomic) — specify which |
+| FR-new-1 | All | `/names` fetches, caches, and merges upstream `/names` with local gem names under the same TTL rules as `/info` |
+
+---
+
+## Ambiguities to Resolve
+
+1. **ETag algorithm** — spec says "MD5 or SHA256"; pick SHA256 (consistent with `Repr-Digest`, avoids two passes).
+2. **GemUploader response codes** — spec says 201/409; actual `GemUploader#handle_response` also accepts 200 and 302 as success. Align the spec with the real code.
+3. **`/versions` cache storage** — spec is ambiguous about whether the cache stores the raw upstream body (re-merged on request) or the merged result (invalidated on upload). Specify: cache raw upstream body; re-merge with private gems in memory on request; memoize the merged result until next upload or upstream ETag change.
+4. **`CompactIndex::GemVersion` field names** — verify `number` vs `version` against the installed 0.15.x gem before referencing them in the spec.
+5. **AGENTS.md vs CLAUDE.md** — Codex noted the Integration Points table references `CLAUDE.md`, but the actual project instruction file may be `AGENTS.md`. Verify and correct.
+
+---
+
+## Summary of Required Changes
+
+1. **(Blocking)** Specify the `/versions` merge algorithm: upstream verbatim block first, private gems appended via `extra_gems`, collision = suppress public entry, byte-stable layout.
+2. **(Blocking)** Specify ETag and `Repr-Digest` computation from merged body for all generated/merged endpoints.
+3. **(Blocking)** Specify `info_checksum` generation order: info bodies first, checksums embedded into versions index.
+4. **(Blocking)** Define `/names` scope: fetched, cached, merged like `/versions`.
+5. Add input validation constraint for `:gemname` and `:filename` path parameters.
+6. Clarify "valid gem" validation as tar-parseable + gemspec extractable (no eval).
+7. Add upload atomicity requirements: `mkdir_p`, temp-file write, rename.
+8. Add thread safety AR: atomic index swap after rebuild.
+9. Distinguish upstream 404 vs outage in FR-3.4 (404 vs 503).
+10. Note `list_gems` dead method and `rubygems-generate_index` removal in Integration Points.
+11. Correct GemUploader response code list (200/201/302 accepted, not just 201).
+12. Resolve ETag algorithm ambiguity (SHA256 only).
@@ -0,0 +1,124 @@
+# Critique v1 — Claude
+# Spec 20260529-091429: Replace Geminabox with Compact Index Proxy
+
+## Summary
+
+The spec is well-structured and the goal is clear.
+The main risks are: the `/versions` merge strategy is underspecified (blocking); ETag generation rules for the merged response are missing (blocking); the upload flow has a gap around gemspec reading after upload; and the offline cache invalidation strategy needs tightening.
+
+---
+
+## Blocking Issues
+
+### 1. `/versions` merge is underspecified — will block implementation
+
+FR-3.1 says private gems "take precedence" on name collision and that the response is "merged."
+The `compact_index` gem's `CompactIndex.versions(versions_file, extra_gems)` API expects a `VersionsFile` object managing a cached local file — it is not designed to do a one-shot merge of two remote sources.
+
+Missing:
+- How is the RubyGems.org `/versions` file fetched and stored? Is it written to `cache_dir/rubygems_cache/versions`?
+- Is the VersionsFile constructed from that cached file, with private gems passed as `extra_gems`?
+- What does "take precedence" mean concretely — if `rails` appears in both, does the private entry completely replace all public rails versions, or are the version lists merged?
+- The `/versions` file is append-only and chronologically ordered. A "merged" response that re-orders entries by initial release date across two sources is non-trivial. Does the spec require that ordering, or is a simpler concatenation acceptable?
+
+**Recommendation:** Add an AR or explicit note on the merge algorithm: fetch + cache the upstream versions blob, use it as the VersionsFile, inject private gems as extra_gems, let `compact_index` handle the merge. Clarify that "precedence" means the private gem's info checksum wins for any overlapping name, not that public versions are suppressed.
+
+### 2. ETag / `Repr-Digest` generation for the merged `/versions` response
+
+FR-1.3 requires these headers but does not say how they should be computed for the merged response.
+The RubyGems.org versions file has its own ETag; after merging with private gems, that ETag is invalid.
+If the server forwards the upstream ETag unchanged, Bundler will compute a SHA256 mismatch and retry.
+
+Missing: how the server derives the ETag and `Repr-Digest` for the merged response body.
+
+**Recommendation:** Specify that `ETag` and `Repr-Digest` for `/versions` are computed from the final merged response body (not forwarded from upstream), so Bundler's checksum validation passes.
+
+---
+
+## Significant Gaps
+
+### 3. Gemspec reading after upload — fragility risk
+
+FR-2.1 says the server reads gemspec metadata on startup and "after each successful upload."
+But reading dependency metadata from a `.gem` file requires `Gem::Package.new(path).spec`, which loads the full gemspec.
+If a gemspec `require`s a file that isn't present in the server's load path (e.g., `require_relative "lib/my_gem/version"`), spec loading will fail silently or raise.
+
+The spec doesn't address this.
+Geminabox had the same problem and worked around it by parsing gemspecs in a subprocess.
+
+**Recommendation:** Add an AR specifying how gemspec metadata is extracted — either via `Gem::Package.new(path).spec` with rescue, or by shelling out, or by using only the embedded gemspec without loading it. Note that errors here should produce a 422 response, not a server crash.
+
+### 4. Cache invalidation for `/info/:gemname` — unclear boundary
+
+FR-3.2 says `/info/:gemname` entries are "refreshed after 60 minutes" but the spec doesn't say what triggers a refresh — wall-clock age, an upstream ETag check, or both.
+
+Bundler sends `If-None-Match` with the cached ETag.
+If the server forwards that header to RubyGems.org and gets a 304 back, should it reset the 60-minute TTL or not?
+
+**Recommendation:** Clarify: use upstream ETag for conditional GET; if upstream returns 304, update the local TTL; if upstream returns 200, overwrite cache and update TTL.
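The refresh rule recommended above can be sketched as a pure function over a cache entry. The `CacheEntry` struct, the `upstream` hash shape, and the function names are hypothetical; the 60-minute TTL comes from FR-3.2 as quoted in the text. Leaving the entry untouched on any other status is an assumption of this sketch.

```ruby
CacheEntry = Struct.new(:body, :etag, :fetched_at)
TTL_SECONDS = 60 * 60 # the 60-minute refresh window quoted from FR-3.2

# An entry is due for a conditional GET once its age reaches the TTL.
def stale?(entry, now)
  now - entry.fetched_at >= TTL_SECONDS
end

# upstream: { status:, body:, etag: } as returned by a hypothetical fetcher.
def apply_refresh(entry, upstream, now)
  case upstream[:status]
  when 304 then CacheEntry.new(entry.body, entry.etag, now)         # unchanged: reset TTL only
  when 200 then CacheEntry.new(upstream[:body], upstream[:etag], now) # changed: overwrite
  else entry                                                          # assumption: keep as-is
  end
end

t0 = Time.at(0)
entry = CacheEntry.new("---\n1.0.0 |checksum:aa11\n", '"v1"', t0)
revalidated = apply_refresh(entry, { status: 304 }, t0 + TTL_SECONDS)
replaced = apply_refresh(entry, { status: 200, body: "new", etag: '"v2"' }, t0 + TTL_SECONDS)
```

A 304 keeps the cached body and ETag and only moves `fetched_at` forward, while a 200 replaces all three fields.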
+
+### 5. Upload directory creation not specified
+
+FR-1.2 says the server saves uploaded gems to `gems_path/gems/`.
+But the server is started before any gems are synced — the `gems/` subdirectory may not exist yet.
+
+The spec should explicitly require the server to `mkdir_p gems_path/gems/` on startup (or on first upload).
+Currently `RackupProcess#generate_config_ru` creates `gems_path` but not the `gems/` subdirectory.
+
+**Recommendation:** Add to FR-1.2: if `gems_path/gems/` does not exist, create it before writing the uploaded file.
+
+### 6. Path traversal in `/gems/:filename.gem`
+
+FR-1.1 defines `GET /gems/:filename.gem` without addressing input validation.
+A filename like `../../etc/passwd` or `../gemkeeper.pid` could escape the `gems_path/gems/` directory.
+
+The spec's security review marks this N/A because the server binds to 127.0.0.1.
+That's reasonable for the server-to-client threat model, but the server also passes the filename to the filesystem and potentially to a RubyGems.org URL.
+A malformed filename could cause unexpected behavior even from localhost.
+
+**Recommendation:** Add a constraint in AR-4.1 or as an AR under Feature 1: filenames in URL paths are validated to match `/\A[a-zA-Z0-9._-]+-[\d.]+(-[a-z0-9_-]+)?\.gem\z/` before filesystem or proxy operations; return 400 otherwise.
+
+---
+
+## Minor Issues
+
+### 7. `Gem.path` vs `Gem.paths.home` — API clarification
+
+FR-3.3 uses `Gem.path.map { |p| File.join(p, "cache", filename) }`.
+`Gem.path` returns an array of gem search paths (GEM_PATH), not just GEM_HOME.
+This is correct but worth confirming: on a typical mise-managed setup, `Gem.path` includes both the per-version gem home and any global paths.
+The intent (check all system caches) matches the API.
+No change needed, but the implementer should be aware that `Gem.path` may include paths without a `cache/` subdirectory — the lookup should use `File.exist?` before serving.
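A minimal sketch of the lookup described in item 7, extending the `Gem.path.map` expression quoted from FR-3.3 with the `File.exist?` guard. The helper name `find_in_system_cache` and its injectable `gem_paths` parameter are hypothetical; the parameter exists only so the lookup can be exercised without touching the real gem directories.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: first existing cached copy of a gem file, or nil.
def find_in_system_cache(filename, gem_paths = Gem.path)
  gem_paths
    .map { |dir| File.join(dir, "cache", filename) }
    .find { |candidate| File.exist?(candidate) }
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "cache"))
  File.write(File.join(root, "cache", "rack-3.0.0.gem"), "payload")

  # The first path has no cache/ directory at all and is skipped silently.
  found = find_in_system_cache("rack-3.0.0.gem", [File.join(root, "missing"), root])
  puts found
end
```

Paths without a `cache/` subdirectory simply produce candidates that do not exist, so no separate directory check is needed.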
+
+### 8. `gemkeeper list` path discrepancy
+
+The codebase exploration noted that `gemkeeper list` reads `Dir.glob(File.join(gems_path, "gems", "*.gem"))` — meaning it expects gems in a `gems/` subdirectory.
+But `gem_syncer.rb` builds gems directly to `gems_path/<name>-<version>.gem` (no subdirectory) before uploading.
+After upload, Geminabox stores them in `gems_path/gems/`.
+
+FR-4.3 says the custom server must store uploaded gems at `gems_path/gems/` to preserve this layout.
+That's correct, but the spec should also note that `gem_syncer.rb`'s build output path (`gems_path/<name>-<version>.gem`) is a staging location, not a final location — the upload step is what moves it into `gems/`.
+This is implicit today; worth making explicit so implementers don't accidentally change the storage layout.
+
+### 9. `compact_index` gem not in Gemfile.lock yet
+
+The spec requires adding `compact_index ~> 0.15` as a runtime dependency.
+Worth noting that the Homebrew formula bundles gems and will need to be rebuilt and re-pushed to the tap for the dependency change to take effect in production.
+This is already noted in the checklist (Rollout) but not connected to a concrete release step.
+
+### 10. `/versions` response size
+
+RubyGems.org's `/versions` file is currently ~5 MB uncompressed.
+Merging it with private gems on every request (even with a 30-minute cache) means the server allocates this in memory for every `/versions` request.
+The checklist marks performance as an open item — this is the specific risk to quantify.
+
+---
+
+## Summary of Required Changes
+
+1. **(Blocking)** Specify the `/versions` merge algorithm — VersionsFile from cached upstream + private gems as extra_gems; define "precedence" concretely.
+2. **(Blocking)** Specify ETag and `Repr-Digest` computation for merged `/versions` response bodies.
+3. Specify gemspec extraction strategy and error handling for FR-2.1.
+4. Clarify cache refresh trigger for `/info/:gemname` (wall-clock vs ETag-based).
+5. Add `mkdir_p gems_path/gems/` requirement to FR-1.2.
+6. Add filename validation constraint for `/gems/:filename.gem` path parameter.
@@ -0,0 +1,125 @@
+# Critique v1 - Codex
+
+## Overview
+
+This spec replaces Geminabox with a custom Rack app that serves private gems through Bundler's compact index protocol while proxying RubyGems.org and keeping an offline cache.
+The direction is right, but the spec is not implementation-ready yet because the compact index merge rules, cache semantics, and input validation rules are underspecified in places that Bundler will exercise directly.
+
+## Approach Summary
+
+- Add `Gemkeeper::CompactIndexServer` as a Rack app mounted by generated `config.ru`.
+- Keep the existing `GemUploader` multipart `/upload` contract and the `gems_path/gems/*.gem` final storage layout.
+- Generate private gem compact index data with `compact_index`.
+- Proxy RubyGems.org `/names`, `/versions`, `/info/:gemname`, and `/gems/:filename.gem` through `Net::HTTP`.
+- Cache upstream compact index files and gem binaries under `cache_dir/rubygems_cache/` for offline fallback.
+
+The replacement choice is well justified by Geminabox's stale dependency endpoint, but the spec currently treats "proxy plus merge" as a simple composition when it is the hardest part of the implementation.
+
+## Risks
+
+### 1. Compact index merge semantics are not concrete enough
+
+Likelihood: high.
+Severity: high.
+The spec says `/versions` is "merged" and private gems "take precedence", but does not define whether a private/public name collision replaces the entire public name entry, replaces matching versions only, or appends a duplicate line and relies on Bundler parser behavior.
+This is also a dependency-confusion risk: if a private gem name exists on RubyGems.org and public versions are still visible, Bundler may resolve a public version.
+The local `compact_index` 0.15-compatible API also expects `CompactIndex.versions(versions_file, gems = nil, args = {})`, where `versions_file` is a `CompactIndex::VersionsFile`, not a raw upstream response body.
+The spec partially addresses the issue by naming `compact_index`, but it needs an explicit algorithm for `/names`, `/versions`, collision precedence, ordering, and checksum generation.
+
+### 2. Bundler's conditional request contract can fail silently or noisily
+
+Likelihood: medium-high.
+Severity: high.
+Bundler fetches `names`, `versions`, and `info/*` through the same updater path and uses `Range`, `If-None-Match`, `ETag`, and `Repr-Digest`/`Digest` to update local cache files.
+FR-1.3 covers only `/versions` and `/info/:gemname`, leaving `/names` weaker even though RubyGems.org serves `/names` with the same cache headers.
+For merged bodies, upstream `ETag` and digest headers are invalid and must be recomputed from the final response body.
+The spec does not say what to do with malformed ranges, suffix ranges, range starts beyond EOF, weak ETags, quoted ETags, or `If-None-Match` against a stale upstream cache.
+
+### 3. Offline cache behavior conflates outage, missing gems, and stale data
+
+Likelihood: high.
+Severity: medium-high.
+FR-3.2 says "non-2xx" means unreachable, but an upstream 404 for `/info/nonexistent-gem` is a valid upstream answer, not an outage.
+FR-3.4 then asks for a 404 body saying `"Upstream unavailable and no local cache"`, which is wrong when RubyGems.org is reachable and the gem simply does not exist.
+The spec also does not define whether 404s are cached, whether TTL refresh uses wall-clock only or conditional GETs, whether a 304 resets the TTL, or how corrupt/partial cache files are detected and discarded.
+Cache writes need to be atomic because Puma can serve concurrent Bundler requests.
+
+### 4. Upload and gem metadata validation are too loose
+
+Likelihood: medium.
+Severity: high.
+`POST /upload` accepts attacker-controlled multipart data from localhost, and "valid gem" is not defined.
+The server should derive name, version, platform, dependencies, required Ruby/RubyGems versions, and checksum from the embedded gemspec in the `.gem`, not from the uploaded filename.
+The spec does not require `mkdir_p gems_path/gems`, atomic temp-file writes, duplicate handling by parsed gem identity, filename/spec mismatch rejection, upload size limits, malformed multipart handling, or cleanup after failed validation.
+It also says `GemUploader` expects only 201 and 409, but the current code treats 200, 201, 302, and 409 as successful upload outcomes.
+
+### 5. The security checklist is incorrectly marked N/A
+
+Likelihood: high.
+Severity: medium.
+Binding to `127.0.0.1` reduces exposure but does not eliminate security requirements.
+The app accepts URL path input, multipart file input, and emits local gem names and versions.
+The spec needs validation for `/info/:gemname` and `/gems/:filename.gem` before filesystem access or upstream URL construction, including path traversal, percent-encoding, absolute paths, control characters, query strings, and overlong names.
+Unauthenticated local upload is explicitly out of scope, but the spec should still state the accepted local-only threat model and require defensive input validation.
+
+### 6. Performance impact is larger than the checklist suggests
+
+Likelihood: high.
+Severity: medium.
+RubyGems.org `/versions` is currently about 23 MB over the wire, and `/names` is about 2.7 MB.
+Regenerating a merged body, digest, and ETag for every request will allocate large strings and add latency on a 16 GB workstation.
+The spec calls this an open question, but implementation needs a concrete caching strategy for final merged response bodies, invalidation on upload, and upstream refresh cadence.
+Gem binary proxying should stream to disk/client or use bounded buffering rather than reading large `.gem` files fully into memory.
+
74
+ ## Complexity Hotspots
75
+
76
+ ### Compact index generation and collision rules
77
+
78
+ This is the core of the feature.
79
+ The spec needs to say how private gems become `CompactIndex::Gem` and `CompactIndex::GemVersion` objects, how `info_checksum` is calculated, and how public/private name collisions are represented.
80
+ The named API fields are slightly off: the local 0.15-compatible `CompactIndex::GemVersion` struct uses `number`, `platform`, `checksum`, `info_checksum`, `dependencies`, `ruby_version`, and `rubygems_version`, not a `version` field.
81
+
82
+ ### HTTP proxy/cache implementation
83
+
84
+ The proxy has to combine upstream conditional GETs, local TTLs, offline fallback, final-body digest generation, and Bundler's range update behavior.
85
+ This needs a small but explicit cache model: file paths, sidecar metadata, atomic write strategy, status-code handling, and stale-cache rules.
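A cache model along these lines could be sketched as follows. Everything here is hypothetical: the `FileCache` class, the `<key>.meta.json` sidecar naming, and the `etag` and `fetched_at` metadata keys are assumptions for illustration. The sketch shows write-to-temp-then-rename for atomic replacement, a TTL-based freshness flag, and treating a missing or corrupt entry as a miss.

```ruby
require "json"
require "fileutils"
require "securerandom"

# Hypothetical layout: <dir>/<key> holds the body, <dir>/<key>.meta.json the sidecar.
class FileCache
  def initialize(dir, ttl:)
    @dir = dir
    @ttl = ttl
    FileUtils.mkdir_p(dir)
  end

  # Writes body and sidecar; each file is replaced atomically on its own.
  def write(key, body, etag:, now: Time.now)
    atomic_write(body_path(key), body)
    atomic_write(meta_path(key), JSON.generate("etag" => etag, "fetched_at" => now.to_i))
  end

  # Returns [body, meta, fresh] or nil when the entry is missing or corrupt.
  # A stale entry (fresh == false) is still returned so callers can fall back to it.
  def read(key, now: Time.now)
    body = File.binread(body_path(key))
    meta = JSON.parse(File.read(meta_path(key)))
    return nil unless meta.is_a?(Hash) && meta["fetched_at"].is_a?(Integer)

    [body, meta, now.to_i - meta["fetched_at"] < @ttl]
  rescue Errno::ENOENT, JSON::ParserError
    nil
  end

  private

  def body_path(key)
    File.join(@dir, key)
  end

  def meta_path(key)
    File.join(@dir, "#{key}.meta.json")
  end

  def atomic_write(path, data)
    tmp = "#{path}.#{SecureRandom.hex(4)}.tmp"
    File.binwrite(tmp, data)
    File.rename(tmp, path) # atomic when tmp and path share a filesystem
  end
end
```

Note that the body and the sidecar are two separate renames, so a concurrent reader can briefly see a new body with old metadata; a real design would need to decide whether that window is acceptable or whether both should live in one file.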
86
+
87
+ ### Upload path and metadata extraction
88
+
89
+ The server must parse Rack multipart input, validate a gem package, extract metadata, write atomically, and refresh in-memory index state without racing concurrent reads.
90
+ The spec gives the happy path but not the failure and concurrency behavior.
91
+
92
+ ### Test harness
93
+
94
+ The spec relies on "bundle install works" as a verification target, but many failures surface only in exact response headers and cache state.
95
+ Implementation will need unit tests for route behavior and cache transitions, plus at least one integration test that runs Bundler against the local server with both private and public gems.
96
+
97
+ ## Completeness Checklist Audit
98
+
99
+ | Item | Status | Notes |
100
+ | ---- | ------ | ----- |
101
+ | Scope & acceptance criteria | WARN | Each FR has a verify line, but several acceptance criteria are too broad to implement deterministically, especially "merged", "take precedence", "valid gem", and "offline". |
102
+ | Testing strategy | WARN | Existing tests are identified, but tests are missing for upload success, protocol headers, range edge cases, upstream 404 vs outage, corrupt cache files, path validation, and a real Bundler compact-index install. |
103
+ | Existing patterns compared | WARN | The spec references core classes, but misses current `GemUploader#list_gems`, CLI/user-facing Geminabox strings, README/AGENTS updates, `gemkeeper.yml.example`, and the fact that the tracked project doc is `AGENTS.md`, not `CLAUDE.md`. |
104
+ | Dependencies justified | WARN | `compact_index ~> 0.15` is a reasonable choice, but the repo already depends directly on `rubygems-generate_index`, which vendors a 0.15-compatible `compact_index`; Gemfile/Gemfile.lock updates and load-order expectations need to be explicit. |
105
+ | Architecture & interfaces | WARN | The Rack entry point and storage layout are named, but route validation, cache object boundaries, response header helpers, concurrency strategy, and generated `config.ru` require path are not fully specified. |
106
+ | Error handling & failure modes | FAIL | Upstream 404s, malformed ranges, invalid multipart bodies, invalid gem packages, partial downloads, corrupt cache files, filesystem write failures, and cache races are not adequately covered. |
107
+ | Security review | FAIL | Marking security N/A is not adequate because the server accepts path parameters and file uploads and proxies requests based on user-controlled values. |
108
+ | Performance impact | FAIL | The checklist honestly leaves this open, but `/versions` size and per-request merge/digest cost are large enough that the spec needs a concrete design before implementation. |
109
+ | Rollout & migration | WARN | Data migration is probably unnecessary, but the "drop-in" claim omits user-facing renames, docs updates, Gemfile.lock, Homebrew release steps, and whether compatibility endpoints like `/api/v1/gems.json` remain. |
110
+ | Assumptions & risks | WARN | The spec names the strict Bundler cache risk, but understates compact-index API details, public/private name collision behavior, and the incorrect `CLAUDE.md` integration point. |
111
+
112
+ ## Verdict
113
+
114
+ NEEDS WORK.
115
+ The goal and high-level architecture are solid, but as written the spec would leave too many protocol, cache, security, and migration decisions to be made in code.
116
+ Those decisions affect correctness under Bundler, so they should be settled in the spec first.
117
+
118
+ ## Suggested Next Steps
119
+
120
+ 1. Define the exact `/names` and `/versions` merge algorithm, including private/public name collision semantics and dependency-confusion behavior.
121
+ 2. Specify how `ETag`, `Repr-Digest`, `Digest`, `Accept-Ranges`, 206, 304, and invalid range responses are generated for all compact index endpoints, including `/names`.
122
+ 3. Define the proxy cache layout and rules for TTL, conditional upstream refresh, upstream 404 vs outage, corrupt cache recovery, atomic writes, and stale fallback.
123
+ 4. Add explicit validation requirements for gem names, gem filenames, upload files, parsed gemspec identity, and filesystem path construction.
124
+ 5. Correct codebase assumptions: use `AGENTS.md` instead of `CLAUDE.md`, decide whether `/api/v1/gems.json` is still supported for `GemUploader#list_gems`, and include README/CLI/example config wording if Geminabox is truly replaced.
125
+ 6. Expand tests to cover compact-index route units, upload success/failure, offline cache transitions, malicious paths, and a Bundler integration install with one private gem and one public gem.