gemkeeper 0.8.0 → 0.8.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,124 +0,0 @@
1
- # Critique v1 — Claude
2
- # Spec 20260529-091429: Replace Geminabox with Compact Index Proxy
3
-
4
- ## Summary
5
-
6
- The spec is well-structured and the goal is clear.
7
- The main risks are: the `/versions` merge strategy is underspecified (blocking); ETag generation rules for the merged response are missing (blocking); the upload flow has a gap around gemspec reading after upload; and the offline cache invalidation strategy needs tightening.
8
-
9
- ---
10
-
11
- ## Blocking Issues
12
-
13
- ### 1. `/versions` merge is underspecified — will block implementation
14
-
15
- FR-3.1 says private gems "take precedence" on name collision and that the response is "merged."
16
- The `compact_index` gem's `CompactIndex.versions(versions_file, extra_gems)` API expects a `VersionsFile` object managing a cached local file — it is not designed to do a one-shot merge of two remote sources.
17
-
18
- Missing:
19
- - How is the RubyGems.org `/versions` file fetched and stored? Is it written to `cache_dir/rubygems_cache/versions`?
20
- - Is the VersionsFile constructed from that cached file, with private gems passed as `extra_gems`?
21
- - What does "take precedence" mean concretely — if `rails` appears in both, does the private entry completely replace all public rails versions, or are the version lists merged?
22
- - The `/versions` file is append-only and chronologically ordered. A "merged" response that re-orders entries by initial release date across two sources is non-trivial. Does the spec require that ordering, or is a simpler concatenation acceptable?
23
-
24
- **Recommendation:** Add an AR or explicit note on the merge algorithm: fetch + cache the upstream versions blob, use it as the VersionsFile, inject private gems as extra_gems, let `compact_index` handle the merge. Clarify that "precedence" means the private gem's info checksum wins for any overlapping name, not that public versions are suppressed.
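"Precedence" becomes concrete once you see how a compact index client reads a name that appears more than once. The sketch below is a simplified reader. It assumes three things, based on our reading of Bundler's compact index client (worth re-checking against the Bundler version you target):

- later lines for the same name add versions;
- a `-` prefix removes a version;
- the last line's info checksum wins.

`parse_versions` is a hypothetical helper, not a Bundler API.

```ruby
# Simplified reader for a compact index /versions body (hypothetical helper).
# Assumed semantics: repeated names accumulate versions, "-1.0.0" removes a
# version, and the info checksum from the last line for a name wins.
def parse_versions(body)
  lines = body.lines(chomp: true)
  lines.shift until lines.empty? || lines.first == "---" # skip the created_at header
  lines.shift # drop the "---" separator itself

  versions = Hash.new { |hash, name| hash[name] = [] }
  info_checksums = {}

  lines.each do |line|
    name, version_list, checksum = line.split(" ", 3)
    next if version_list.nil?

    info_checksums[name] = checksum.to_s # later lines overwrite earlier ones
    version_list.split(",").each do |version|
      if version.start_with?("-")
        versions[name].delete(version.delete_prefix("-"))
      else
        versions[name] << version
      end
    end
  end

  [versions, info_checksums]
end
```

Under these semantics, appending a private `rails` line after the upstream block has two effects. The public versions stay visible, and the private line's checksum becomes the one the client validates `/info/rails` against. That is the behavior the recommendation describes.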
25
-
26
- ### 2. ETag / `Repr-Digest` generation for the merged `/versions` response
27
-
28
- FR-1.3 requires these headers but does not say how they should be computed for the merged response.
29
- The RubyGems.org versions file has its own ETag; after merging with private gems, that ETag is invalid.
30
- If the server forwards the upstream `ETag` and digest headers unchanged, Bundler's SHA-256 check of the merged body it actually received will fail and it will retry.
31
-
32
- Missing: how the server derives the ETag and `Repr-Digest` for the merged response body.
33
-
34
- **Recommendation:** Specify that `ETag` and `Repr-Digest` for `/versions` are computed from the final merged response body (not forwarded from upstream), so Bundler's checksum validation passes.
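Here is a minimal sketch of that rule. It assumes one SHA-256 digest is used for both validators:

- hash the exact merged bytes once;
- derive a quoted hex `ETag` from that digest;
- derive an RFC 9530 `Repr-Digest` (`sha-256=:<base64>:`) from the same digest.

`compact_index_headers` is a hypothetical helper name.

```ruby
require "digest"

# Derive validators from the bytes actually sent, never from upstream headers.
def compact_index_headers(body)
  digest = Digest::SHA256.digest(body) # a single pass over the (possibly large) body
  {
    "ETag" => %("#{digest.unpack1('H*')}"),
    "Repr-Digest" => "sha-256=:#{[digest].pack('m0')}:" # "m0" = strict Base64, no newlines
  }
end
```

Both values depend only on the body, so they stay stable across server restarts as long as the merged bytes do.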
35
-
36
- ---
37
-
38
- ## Significant Gaps
39
-
40
- ### 3. Gemspec reading after upload — fragility risk
41
-
42
- FR-2.1 says the server reads gemspec metadata on startup and "after each successful upload."
43
- But reading dependency metadata from a `.gem` file requires `Gem::Package.new(path).spec`, which unpacks the archive and deserializes the embedded gemspec.
44
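- The embedded spec in `metadata.gz` is YAML, so a `require_relative "lib/my_gem/version"` in the original `.gemspec` is not re-executed here. The realistic failures are a truncated or corrupt archive, a checksum mismatch, or malformed metadata. Each of these raises (typically `Gem::Package::FormatError`) rather than failing silently.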
45
-
46
- The spec doesn't address this.
47
- Geminabox had the same problem and worked around it by parsing gemspecs in a subprocess.
48
-
49
- **Recommendation:** Add an AR specifying how gemspec metadata is extracted — either via `Gem::Package.new(path).spec` with rescue, or by shelling out, or by using only the embedded gemspec without loading it. Note that errors here should produce a 422 response, not a server crash.
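One way to meet this uses RubyGems' own `Gem::Package` API: read the embedded spec and turn any failure into a 422 instead of an unhandled exception. The `[status, payload]` return shape and the `read_uploaded_spec` name are assumptions for illustration.

```ruby
require "rubygems/package"

# Hypothetical helper: returns [200, spec] or [422, message].
def read_uploaded_spec(path)
  spec = Gem::Package.new(path).spec # reads and verifies metadata.gz inside the .gem
  [200, spec]
rescue StandardError => e # Gem::Package::FormatError and friends are StandardErrors
  [422, "Invalid gem: #{e.class}: #{e.message}"]
end
```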
50
-
51
- ### 4. Cache invalidation for `/info/:gemname` — unclear boundary
52
-
53
- FR-3.2 says `/info/:gemname` entries are "refreshed after 60 minutes" but the spec doesn't say what triggers a refresh — wall-clock age, an upstream ETag check, or both.
54
-
55
- Bundler sends `If-None-Match` with the cached ETag.
56
- If the server forwards that header to RubyGems.org and gets a 304 back, should it reset the 60-minute TTL or not?
57
-
58
- **Recommendation:** Clarify: use upstream ETag for conditional GET; if upstream returns 304, update the local TTL; if upstream returns 200, overwrite cache and update TTL.
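The recommended transitions can be sketched as follows. Assumptions:

- a 60-minute TTL;
- an in-memory entry holding the body, the upstream `ETag`, and the time of the last successful upstream check.

`CacheEntry`, `stale?`, `conditional_headers`, and `apply_upstream` are hypothetical names. The HTTP call itself is left out so the state logic stays testable.

```ruby
CacheEntry = Struct.new(:body, :etag, :checked_at, keyword_init: true)
TTL_SECONDS = 60 * 60

# Wall-clock age decides *when* to ask upstream...
def stale?(entry, now)
  entry.nil? || now - entry.checked_at >= TTL_SECONDS
end

# ...and the cached ETag makes the question cheap.
def conditional_headers(entry)
  entry&.etag ? { "If-None-Match" => entry.etag } : {}
end

# 304 keeps the body and resets the TTL; 200 replaces body and ETag;
# any other status leaves the existing entry untouched.
def apply_upstream(entry, status:, now:, body: nil, etag: nil)
  case status
  when 304 then entry.dup.tap { |e| e.checked_at = now }
  when 200 then CacheEntry.new(body: body, etag: etag, checked_at: now)
  else entry
  end
end
```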
59
-
60
- ### 5. Upload directory creation not specified
61
-
62
- FR-1.2 says the server saves uploaded gems to `gems_path/gems/`.
63
- But the server is started before any gems are synced — the `gems/` subdirectory may not exist yet.
64
-
65
- The spec should explicitly require the server to `mkdir_p gems_path/gems/` on startup (or on first upload).
66
- Currently `RackupProcess#generate_config_ru` creates `gems_path` but not the `gems/` subdirectory.
67
-
68
- **Recommendation:** Add to FR-1.2: if `gems_path/gems/` does not exist, create it before writing the uploaded file.
69
-
70
- ### 6. Path traversal in `/gems/:filename.gem`
71
-
72
- FR-1.1 defines `GET /gems/:filename.gem` without addressing input validation.
73
- A filename like `../../etc/passwd` or `../gemkeeper.pid` could escape the `gems_path/gems/` directory.
74
-
75
- The spec's security review marks this N/A because the server binds to 127.0.0.1.
76
- That's reasonable for the server-to-client threat model, but the server also passes the filename to the filesystem and potentially to a RubyGems.org URL.
77
- A malformed filename could cause unexpected behavior even from localhost.
78
-
79
- **Recommendation:** Add a constraint in AR-4.1 or as an AR under Feature 1: filenames in URL paths are validated to match `/\A[a-zA-Z0-9._-]+-\d[a-zA-Z0-9.]*(-[a-z0-9_-]+)?\.gem\z/` before filesystem or proxy operations; return 400 otherwise. The version part must allow letters so that prerelease files such as `rails-7.1.0.rc1.gem` are accepted.
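Here is a sketch of that check. The pattern lets the version part contain letters so that prerelease files such as `rails-7.1.0.rc1.gem` pass. `GEM_FILENAME` and `valid_gem_filename?` are hypothetical names, and the pattern is a suggestion, not RubyGems' own rule.

```ruby
# Letters, digits, ".", "_" and "-" only: no "/", "%", "?", whitespace or
# control characters can reach File.join or the upstream URL.
GEM_FILENAME = /\A[a-zA-Z0-9._-]+-\d[a-zA-Z0-9.]*(-[a-z0-9_-]+)?\.gem\z/

def valid_gem_filename?(filename)
  filename.is_a?(String) && GEM_FILENAME.match?(filename)
end
```

The route would answer 400 whenever `valid_gem_filename?` returns false, before any path or URL is built.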
80
-
81
- ---
82
-
83
- ## Minor Issues
84
-
85
- ### 7. `Gem.path` vs `Gem.paths.home` — API clarification
86
-
87
- FR-3.3 uses `Gem.path.map { |p| File.join(p, "cache", filename) }`.
88
- `Gem.path` returns an array of gem search paths (GEM_PATH), not just GEM_HOME.
89
- This is correct but worth confirming: on a typical mise-managed setup, `Gem.path` includes both the per-version gem home and any global paths.
90
- The intent (check all system caches) matches the API.
91
- No change needed, but the implementer should be aware that `Gem.path` may include paths without a `cache/` subdirectory — the lookup should use `File.exist?` before serving.
92
-
93
- ### 8. `gemkeeper list` path discrepancy
94
-
95
- The codebase exploration noted that `gemkeeper list` reads `Dir.glob(File.join(gems_path, "gems", "*.gem"))` — meaning it expects gems in a `gems/` subdirectory.
96
- But `gem_syncer.rb` builds gems directly to `gems_path/<name>-<version>.gem` (no subdirectory) before uploading.
97
- After upload, Geminabox stores them in `gems_path/gems/`.
98
-
99
- FR-4.3 says the custom server must store uploaded gems at `gems_path/gems/` to preserve this layout.
100
- That's correct, but the spec should also note that `gem_syncer.rb`'s build output path (`gems_path/<name>-<version>.gem`) is a staging location, not a final location — the upload step is what moves it into `gems/`.
101
- This is implicit today; worth making explicit so implementers don't accidentally change the storage layout.
102
-
103
- ### 9. `compact_index` gem not in Gemfile.lock yet
104
-
105
- The spec requires adding `compact_index ~> 0.15` as a runtime dependency.
106
- Worth noting that the Homebrew formula bundles gems and will need to be rebuilt and re-pushed to the tap for the dependency change to take effect in production.
107
- This is already noted in the checklist (Rollout) but not connected to a concrete release step.
108
-
109
- ### 10. `/versions` response size
110
-
111
- RubyGems.org's `/versions` file is currently ~5 MB uncompressed.
112
- Even if the upstream file is cached for 30 minutes, re-merging it with private gems on every request means the server allocates a body of this size for each `/versions` request.
113
- The checklist marks performance as an open item — this is the specific risk to quantify.
114
-
115
- ---
116
-
117
- ## Summary of Required Changes
118
-
119
- 1. **(Blocking)** Specify the `/versions` merge algorithm — VersionsFile from cached upstream + private gems as extra_gems; define "precedence" concretely.
120
- 2. **(Blocking)** Specify ETag and `Repr-Digest` computation for merged `/versions` response bodies.
121
- 3. Specify gemspec extraction strategy and error handling for FR-2.1.
122
- 4. Clarify cache refresh trigger for `/info/:gemname` (wall-clock vs ETag-based).
123
- 5. Add `mkdir_p gems_path/gems/` requirement to FR-1.2.
124
- 6. Add filename validation constraint for `/gems/:filename.gem` path parameter.
@@ -1,125 +0,0 @@
1
- # Critique v1 - Codex
2
-
3
- ## Overview
4
-
5
- This spec replaces Geminabox with a custom Rack app that serves private gems through Bundler's compact index protocol while proxying RubyGems.org and keeping an offline cache.
6
- The direction is right, but the spec is not implementation-ready yet because the compact index merge rules, cache semantics, and input validation rules are underspecified in places that Bundler will exercise directly.
7
-
8
- ## Approach Summary
9
-
10
- - Add `Gemkeeper::CompactIndexServer` as a Rack app mounted by generated `config.ru`.
11
- - Keep the existing `GemUploader` multipart `/upload` contract and the `gems_path/gems/*.gem` final storage layout.
12
- - Generate private gem compact index data with `compact_index`.
13
- - Proxy RubyGems.org `/names`, `/versions`, `/info/:gemname`, and `/gems/:filename.gem` through `Net::HTTP`.
14
- - Cache upstream compact index files and gem binaries under `cache_dir/rubygems_cache/` for offline fallback.
15
-
16
- The replacement choice is well justified by Geminabox's stale dependency endpoint, but the spec currently treats "proxy plus merge" as a simple composition when it is the hardest part of the implementation.
17
-
18
- ## Risks
19
-
20
- ### 1. Compact index merge semantics are not concrete enough
21
-
22
- Likelihood: high.
23
- Severity: high.
24
- The spec says `/versions` is "merged" and private gems "take precedence", but does not define whether a private/public name collision replaces the entire public name entry, replaces matching versions only, or appends a duplicate line and relies on Bundler parser behavior.
25
- This is also a dependency-confusion risk: if a private gem name exists on RubyGems.org and public versions are still visible, Bundler may resolve a public version.
26
- The local `compact_index` 0.15-compatible API also expects `CompactIndex.versions(versions_file, gems = nil, args = {})`, where `versions_file` is a `CompactIndex::VersionsFile`, not a raw upstream response body.
27
- The spec partially addresses the issue by naming `compact_index`, but it needs an explicit algorithm for `/names`, `/versions`, collision precedence, ordering, and checksum generation.
28
-
29
- ### 2. Bundler's conditional request contract can fail silently or noisily
30
-
31
- Likelihood: medium-high.
32
- Severity: high.
33
- Bundler fetches `names`, `versions`, and `info/*` through the same updater path and uses `Range`, `If-None-Match`, `ETag`, and `Repr-Digest`/`Digest` to update local cache files.
34
- FR-1.3 covers only `/versions` and `/info/:gemname`, leaving `/names` without equivalent header requirements even though RubyGems.org serves `/names` with the same cache headers.
35
- For merged bodies, upstream `ETag` and digest headers are invalid and must be recomputed from the final response body.
36
- The spec does not say what to do with malformed ranges, suffix ranges, range starts beyond EOF, weak ETags, quoted ETags, or `If-None-Match` against a stale upstream cache.
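For the `If-None-Match` part, here is a sketch of weak comparison, which is what RFC 9110 prescribes for `If-None-Match`. It tolerates quoted, `W/`-prefixed, comma-listed, and `*` values. `etag_matches?` is a hypothetical helper. Splitting on commas assumes ETags never contain commas, which holds for hex digests.

```ruby
# Weak comparison: ignore a W/ prefix and surrounding quotes on both sides.
def normalize_etag(tag)
  tag.strip.delete_prefix("W/").delete_prefix('"').delete_suffix('"')
end

# True when the client's cached copy matches the current body (answer 304).
def etag_matches?(if_none_match, current_etag)
  return false if if_none_match.nil? || if_none_match.strip.empty?
  return true if if_none_match.strip == "*"

  target = normalize_etag(current_etag)
  if_none_match.split(",").any? { |tag| normalize_etag(tag) == target }
end
```

`current_etag` must be computed from the final (merged) body. Comparing against an upstream ETag would answer 304 for a body the client never received.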
37
-
38
- ### 3. Offline cache behavior conflates outage, missing gems, and stale data
39
-
40
- Likelihood: high.
41
- Severity: medium-high.
42
- FR-3.2 says "non-2xx" means unreachable, but an upstream 404 for `/info/nonexistent-gem` is a valid upstream answer, not an outage.
43
- FR-3.4 then asks for a 404 body saying `"Upstream unavailable and no local cache"`, which is wrong when RubyGems.org is reachable and the gem simply does not exist.
44
- The spec also does not define whether 404s are cached, whether TTL refresh uses wall-clock only or conditional GETs, whether a 304 resets the TTL, or how corrupt/partial cache files are detected and discarded.
45
- Cache writes need to be atomic because Puma can serve concurrent Bundler requests.
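The three cases can be kept apart with a small classifier. The sketch assumes two rules:

- any exception while talking to upstream (DNS failure, timeout, refused connection), or any 5xx/429 response, counts as an outage;
- 404 and 410 are authoritative answers and are passed through unchanged.

The 503 for "outage and no cache" is a suggested status, not something the spec defines. Both helper names are hypothetical.

```ruby
# Classify one upstream attempt; `error` is the exception raised, if any.
def classify_upstream(status: nil, error: nil)
  return :outage if error

  case status
  when 200..299 then :ok
  when 304 then :not_modified
  when 404, 410 then :not_found # RubyGems.org answered: the gem does not exist
  else :outage                  # 5xx, 429 and anything else unexpected
  end
end

# Response when upstream could not supply a fresh body.
def fallback_response(kind, cached_body)
  case kind
  when :not_found then [404, "Not Found\n"]
  when :outage
    cached_body ? [200, cached_body] : [503, "Upstream unavailable and no local cache\n"]
  else raise ArgumentError, "no fallback needed for #{kind}"
  end
end
```

Whether a cached body should still be served after an authoritative 404 (for example, a gem removed upstream) is a policy choice. This sketch does not serve it.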
46
-
47
- ### 4. Upload and gem metadata validation are too loose
48
-
49
- Likelihood: medium.
50
- Severity: high.
51
- `POST /upload` accepts attacker-controlled multipart data from localhost, and "valid gem" is not defined.
52
- The server should derive name, version, platform, dependencies, required Ruby/RubyGems versions, and checksum from the embedded gemspec in the `.gem`, not from the uploaded filename.
53
- The spec does not require `mkdir_p gems_path/gems`, atomic temp-file writes, duplicate handling by parsed gem identity, filename/spec mismatch rejection, upload size limits, malformed multipart handling, or cleanup after failed validation.
54
- It also says `GemUploader` expects only 201 and 409, but the current code treats 200, 201, 302, and 409 as successful upload outcomes.
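Here is a sketch of deriving identity from the parsed spec rather than from the upload's filename. It uses only `Gem::Specification` accessors (`name`, `version`, `platform`, `file_name`). The helper names are hypothetical. A non-nil mismatch message would become a 422.

```ruby
require "rubygems"

# Identity used for duplicate detection: name, version and platform from the spec.
def gem_identity(spec)
  [spec.name, spec.version.to_s, spec.platform.to_s]
end

# Returns nil when the multipart filename agrees with the embedded spec,
# otherwise an error message for the 422 response.
def filename_mismatch(uploaded_filename, spec)
  expected = spec.file_name # "<name>-<version>[-<platform>].gem"
  return nil if File.basename(uploaded_filename.to_s) == expected

  "filename #{uploaded_filename.inspect} does not match gemspec (#{expected})"
end
```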
55
-
56
- ### 5. The security checklist is incorrectly marked N/A
57
-
58
- Likelihood: high.
59
- Severity: medium.
60
- Binding to `127.0.0.1` reduces exposure but does not eliminate security requirements.
61
- The app accepts URL path input, multipart file input, and emits local gem names and versions.
62
- The spec needs validation for `/info/:gemname` and `/gems/:filename.gem` before filesystem access or upstream URL construction, including path traversal, percent-encoding, absolute paths, control characters, query strings, and overlong names.
63
- Unauthenticated local upload is explicitly out of scope, but the spec should still state the accepted local-only threat model and require defensive input validation.
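Here is a sketch of a gem-name check. It runs on the raw path segment before the segment is used in `File.join` or in an upstream URL. The allowed characters follow the character set RubyGems allows in gem names (letters, digits, `.`, `-`, `_`). Two further rules are assumptions chosen here, not RubyGems limits:

- a 128-byte cap;
- the first character must be a letter or digit, which rejects `..`.

The second rule is stricter than RubyGems.

```ruby
MAX_GEM_NAME_BYTES = 128 # assumed cap; adjust to taste
GEM_NAME = /\A[a-zA-Z0-9][a-zA-Z0-9._-]*\z/

# Validate the raw (still percent-encoded) segment, so "%2e%2e%2f" is rejected
# outright instead of being decoded into "../" later.
def valid_gem_name?(segment)
  segment.is_a?(String) &&
    segment.bytesize <= MAX_GEM_NAME_BYTES &&
    GEM_NAME.match?(segment)
end
```

Running this check before both filesystem access and upstream URL construction covers traversal, percent-encoding, absolute paths, control characters, query strings, and overlong names in one place.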
64
-
65
- ### 6. Performance impact is larger than the checklist suggests
66
-
67
- Likelihood: high.
68
- Severity: medium.
69
- RubyGems.org `/versions` is currently about 23 MB over the wire, and `/names` is about 2.7 MB.
70
- Regenerating a merged body, digest, and ETag for every request will allocate large strings and add noticeable latency, even on a 16 GB workstation.
71
- The spec calls this an open question, but implementation needs a concrete caching strategy for final merged response bodies, invalidation on upload, and upstream refresh cadence.
72
- Gem binary proxying should stream to disk/client or use bounded buffering rather than reading large `.gem` files fully into memory.
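On the serving side, a Rack response body only needs to respond to `each`. A body like the one below can therefore serve a cached `.gem` in fixed-size chunks instead of one large string. `ChunkedFileBody` and the 64 KiB chunk size are assumptions. On the upstream side, `Net::HTTPResponse#read_body` with a block yields chunks the same way, so a download can be written to a temp file without buffering it whole.

```ruby
# Minimal Rack-compatible body: yields the file in bounded chunks.
class ChunkedFileBody
  CHUNK_SIZE = 64 * 1024

  def initialize(path)
    @path = path
  end

  def each
    File.open(@path, "rb") do |file|
      while (chunk = file.read(CHUNK_SIZE))
        yield chunk
      end
    end
  end

  # Lets servers or middleware that understand to_path hand the file off directly.
  def to_path
    @path
  end
end
```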
73
-
74
- ## Complexity Hotspots
75
-
76
- ### Compact index generation and collision rules
77
-
78
- This is the core of the feature.
79
- The spec needs to say how private gems become `CompactIndex::Gem` and `CompactIndex::GemVersion` objects, how `info_checksum` is calculated, and how public/private name collisions are represented.
80
- The named API fields are slightly off: the local 0.15-compatible `CompactIndex::GemVersion` struct uses `number`, `platform`, `checksum`, `info_checksum`, `dependencies`, `ruby_version`, and `rubygems_version`, not a `version` field.
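Here is a sketch of mapping a `Gem::Specification` to the field names listed above. It returns a plain hash rather than constructing `CompactIndex::GemVersion`, so it runs without the gem. `version_fields` is hypothetical. `gem_sha256` is assumed to be the SHA-256 hex digest of the `.gem` file, which is what the `checksum:` field in `/info` carries.

```ruby
require "rubygems"

def version_fields(spec, gem_sha256)
  {
    number: spec.version.to_s,
    platform: spec.platform.to_s,
    checksum: gem_sha256,
    # Runtime dependencies only; development dependencies are left out of /info.
    dependencies: spec.runtime_dependencies.map { |dep| [dep.name, dep.requirement.to_s] },
    ruby_version: spec.required_ruby_version.to_s,
    rubygems_version: spec.required_rubygems_version.to_s
  }
end
```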
81
-
82
- ### HTTP proxy/cache implementation
83
-
84
- The proxy has to combine upstream conditional GETs, local TTLs, offline fallback, final-body digest generation, and Bundler's range update behavior.
85
- This needs a small but explicit cache model: file paths, sidecar metadata, atomic write strategy, status-code handling, and stale-cache rules.
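One possible cache model, sketched:

- write the body and a JSON sidecar (ETag, fetch time, SHA-256 of the body), each through a temp file renamed into place;
- treat a body whose hash disagrees with its sidecar as a miss.

File names, the `.meta.json` suffix, and the helper names are assumptions. `File.rename` is atomic only within one filesystem, which is why the temp file lives next to its target.

```ruby
require "digest"
require "fileutils"
require "json"
require "securerandom"

def atomic_write(path, data)
  FileUtils.mkdir_p(File.dirname(path))
  tmp = "#{path}.#{SecureRandom.hex(6)}.tmp" # same directory as the target
  File.binwrite(tmp, data)
  File.rename(tmp, path) # readers see the old file or the new one, never half of one
ensure
  File.unlink(tmp) if tmp && File.exist?(tmp)
end

def write_cache_entry(path, body, etag)
  atomic_write(path, body)
  meta = { "etag" => etag, "fetched_at" => Time.now.to_i, "sha256" => Digest::SHA256.hexdigest(body) }
  atomic_write("#{path}.meta.json", JSON.generate(meta))
end

# Returns [body, meta], or nil when the entry is missing, partial, or out of sync.
def read_cache_entry(path)
  body = File.binread(path)
  meta = JSON.parse(File.read("#{path}.meta.json"))
  Digest::SHA256.hexdigest(body) == meta["sha256"] ? [body, meta] : nil
rescue Errno::ENOENT, JSON::ParserError
  nil
end
```

A reader that lands between the two writes sees a new body with an old sidecar. The hashes disagree, so it treats the entry as a miss rather than serving inconsistent data.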
86
-
87
- ### Upload path and metadata extraction
88
-
89
- The server must parse Rack multipart input, validate a gem package, extract metadata, write atomically, and refresh in-memory index state without racing concurrent reads.
90
- The spec gives the happy path but not the failure and concurrency behavior.
91
-
92
- ### Test harness
93
-
94
- The spec relies on "bundle install works" as a verification target, but a lot of failures only show up through exact headers and cache state.
95
- Implementation will need unit tests for route behavior and cache transitions, plus at least one integration test that runs Bundler against the local server with both private and public gems.
96
-
97
- ## Completeness Checklist Audit
98
-
99
- | Item | Status | Notes |
100
- | ---- | ------ | ----- |
101
- | Scope & acceptance criteria | WARN | Each FR has a verify line, but several acceptance criteria are too broad to implement deterministically, especially "merged", "take precedence", "valid gem", and "offline". |
102
- | Testing strategy | WARN | Existing tests are identified, but missing tests for upload success, protocol headers, range edge cases, upstream 404 vs outage, corrupt cache files, path validation, and a real Bundler compact-index install. |
103
- | Existing patterns compared | WARN | The spec references core classes, but misses current `GemUploader#list_gems`, CLI/user-facing Geminabox strings, README/AGENTS updates, `gemkeeper.yml.example`, and the fact that the tracked project doc is `AGENTS.md`, not `CLAUDE.md`. |
104
- | Dependencies justified | WARN | `compact_index ~> 0.15` is a reasonable choice, but the repo already depends directly on `rubygems-generate_index`, which vendors a 0.15-compatible `compact_index`; Gemfile/Gemfile.lock updates and load-order expectations need to be explicit. |
105
- | Architecture & interfaces | WARN | The Rack entry point and storage layout are named, but route validation, cache object boundaries, response header helpers, concurrency strategy, and generated `config.ru` require path are not fully specified. |
106
- | Error handling & failure modes | FAIL | Upstream 404s, malformed ranges, invalid multipart bodies, invalid gem packages, partial downloads, corrupt cache files, filesystem write failures, and cache races are not adequately covered. |
107
- | Security review | FAIL | Marking security N/A is not adequate because the server accepts path parameters and file uploads and proxies requests based on user-controlled values. |
108
- | Performance impact | FAIL | The checklist honestly leaves this open, but `/versions` size and per-request merge/digest cost are large enough that the spec needs a concrete design before implementation. |
109
- | Rollout & migration | WARN | Data migration is probably unnecessary, but the "drop-in" claim omits user-facing renames, docs updates, Gemfile.lock, Homebrew release steps, and whether compatibility endpoints like `/api/v1/gems.json` remain. |
110
- | Assumptions & risks | WARN | The spec names the strict Bundler cache risk, but understates compact-index API details, public/private name collision behavior, and the incorrect `CLAUDE.md` integration point. |
111
-
112
- ## Verdict
113
-
114
- NEEDS WORK.
115
- The goal and high-level architecture are solid, but implementation would force too many protocol, cache, security, and migration decisions in code.
116
- Those decisions affect correctness under Bundler, so they should be settled in the spec first.
117
-
118
- ## Suggested Next Steps
119
-
120
- 1. Define the exact `/names` and `/versions` merge algorithm, including private/public name collision semantics and dependency-confusion behavior.
121
- 2. Specify how `ETag`, `Repr-Digest`, `Digest`, `Accept-Ranges`, 206, 304, and invalid range responses are generated for all compact index endpoints, including `/names`.
122
- 3. Define the proxy cache layout and rules for TTL, conditional upstream refresh, upstream 404 vs outage, corrupt cache recovery, atomic writes, and stale fallback.
123
- 4. Add explicit validation requirements for gem names, gem filenames, upload files, parsed gemspec identity, and filesystem path construction.
124
- 5. Correct codebase assumptions: use `AGENTS.md` instead of `CLAUDE.md`, decide whether `/api/v1/gems.json` is still supported for `GemUploader#list_gems`, and include README/CLI/example config wording if Geminabox is truly replaced.
125
- 6. Expand tests to cover compact-index route units, upload success/failure, offline cache transitions, malicious paths, and a Bundler integration install with one private gem and one public gem.
@@ -1,261 +0,0 @@
1
- # Critique: Replace Geminabox with Compact Index Proxy
2
-
3
- Reviewed by: GitHub Copilot (claude-sonnet-4.6)
4
- Date: 2026-05-29
5
-
6
- ## Summary
7
-
8
- The spec is well-scoped and the integration points are clearly identified.
9
- The constraints table and out-of-scope list are unusually precise — useful.
10
- However, several correctness traps exist that would produce a server that passes basic smoke tests but fails under Bundler's actual caching behaviour.
11
- The most serious issues are the `/versions` byte-stability problem (correctness), the missing thread-safety requirement (reliability), and the unspecified `/names` scope (ambiguity with large performance consequences).
12
- The testing section is thin for the volume of new logic being introduced.
13
-
14
- ---
15
-
16
- ## 1. Critical: Correctness Blockers
17
-
18
- ### 1.1 `/versions` byte-stability is unaddressed, and merging breaks Bundler's range fetching
19
-
20
- FR-1.3 requires the server to support `Range: bytes=N-` to serve partial `/versions` responses.
21
- This is how Bundler efficiently updates its local copy: it records the file size it last fetched, then asks for only the bytes after that offset.
22
-
23
- The spec's merge strategy (FR-3.1: "private gem entries take precedence when a name appears in both") does not say where private entries go, and the natural reading is that they are sorted or interleaved into the rubygems.org `/versions` body.
24
- Any time a new private gem is added, the byte offsets of every subsequent line in the merged file shift.
25
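- Bundler's cached offset now points into different content, so the range response no longer lines up with its local copy. At best the digest check fails and Bundler re-downloads the whole file; at worst (if digest headers are missing or wrong) it resolves against corrupted data.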
26
-
27
- The spec does not define a stable layout for the merged `/versions` output.
28
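- The spec is silent on which option to use. One is keeping the upstream body as a byte-for-byte prefix and appending private entries after it. This is stable only until upstream itself grows, because new upstream lines would then have to land before the private block. The other is maintaining a server-side file to which both upstream deltas and private entries are only ever appended.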
29
- Without a stable layout rule, this is a correctness defect, not just a performance issue.
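Here is a sketch of the append-only layout. It assumes the compact index convention that a later line for a name adds versions, so the same name may appear more than once. The merged body only ever grows at its end:

- upstream growth is appended as a delta;
- private entries are appended as new lines.

Any offset a client has cached therefore stays valid. If upstream rewrites its file instead of appending, offsets cannot be preserved: the sketch rebuilds and re-appends the private lines. `AppendOnlyVersions` is a hypothetical class.

```ruby
class AppendOnlyVersions
  attr_reader :body

  def initialize(upstream_body)
    @upstream = upstream_body.b
    @private_lines = []
    @body = @upstream.dup
  end

  # Uploads add a line at the end; existing bytes never move.
  def append_private_line(line)
    @private_lines << line.b
    @body << line.b
  end

  # Returns :unchanged, :appended, or :rewritten.
  def refresh_upstream(new_upstream)
    new_upstream = new_upstream.b
    return :unchanged if new_upstream == @upstream

    if new_upstream.start_with?(@upstream)
      @body << new_upstream.byteslice(@upstream.bytesize..) # append only the delta
      @upstream = new_upstream
      :appended
    else
      # Upstream was compacted or replaced: clients must refetch in full anyway.
      @upstream = new_upstream
      @body = new_upstream.dup
      @private_lines.each { |line| @body << line }
      :rewritten
    end
  end
end
```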
30
-
31
- ### 1.2 `info_checksum` has a circular dependency
32
-
33
- FR-2.2 requires `CompactIndex::GemVersion` to carry an `info_checksum` field.
34
- That checksum is the MD5 hex digest of the `/info/:gemname` response body (the 32-character values in RubyGems.org's `/versions` file).
35
- To populate it in the `/versions` entry, the server must generate the `/info` body first, hash it, and embed the hash in `/versions`.
36
-
37
- The spec never describes this ordering, nor does it mention that the `info_checksum` must be recomputed whenever a new version of a gem is uploaded (because the `/info` body changes).
38
- An implementer who builds the versions index first and the info body second will produce invalid checksums that cause Bundler to re-fetch unconditionally.
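Here is a sketch of the required ordering:

1. build `/info`;
2. hash the exact bytes that will be served;
3. embed that hash in the `/versions` line.

The info builder is a simplified stand-in for `CompactIndex.info`: it emits only the version, dependencies, and the `.gem` checksum. MD5 is used on the assumption that the versions file's third column is an MD5 hex digest of the `/info` body. All names are hypothetical.

```ruby
require "digest"

# Step 1 (simplified): "---" header, then "version deps|checksum:sha256" lines.
def info_body(versions)
  lines = versions.map do |v|
    deps = v[:dependencies].map { |name, reqs| "#{name}:#{reqs.join('&')}" }.join(",")
    "#{v[:number]} #{deps}|checksum:#{v[:checksum]}\n"
  end
  "---\n" + lines.join
end

# Steps 2-3: hash the exact /info bytes, then announce versions with that hash.
def versions_line(name, announced_numbers, info)
  "#{name} #{announced_numbers.join(',')} #{Digest::MD5.hexdigest(info)}\n"
end
```

After an upload, rebuild the full `/info` body first. Then append a `/versions` line that announces only the new version, carrying the new checksum.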
39
-
40
- ### 1.3 `/names` scope is undefined and carries large performance risk
41
-
42
- FR-1.1 says `/names` returns "all gem names (local and proxied)."
43
- The rubygems.org `/names` file currently contains ~175,000 gem names (roughly 2 MB uncompressed).
44
- "Proxied" in this context almost certainly means the full public gem namespace.
45
-
46
- The spec never says whether the server fetches, caches, and merges the rubygems.org `/names` file (like it does for `/versions`), or whether `/names` is scoped only to gems that have been locally requested or cached.
47
- These produce completely different behaviour:
48
- - Full public namespace: bundle install resolves public gems by name — correct, but the endpoint becomes expensive.
49
- - Local-only: bundle install fails on any public gem not already in the info cache.
50
-
51
- The performance checklist item (unchecked) notes only the `/versions` merge cost; it does not mention `/names`.
52
- This is a missing requirement.
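If the full public namespace is the intended scope, merging is a sorted union. The sketch assumes the `/names` body is a `---` line followed by one name per line, in sorted order. Unlike an append-only `/versions`, this output shifts whenever a private name is added, so a client holding an old copy must fall back to a full download of roughly 2 MB. `merge_names` is hypothetical.

```ruby
def merge_names(upstream_body, private_names)
  lines = upstream_body.lines(chomp: true)
  lines.shift until lines.empty? || lines.first == "---" # skip any header
  lines.shift # drop the separator
  merged = (lines + private_names).reject(&:empty?).uniq.sort
  "---\n" + merged.map { |name| "#{name}\n" }.join
end
```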
53
-
54
- ---
55
-
56
- ## 2. Ambiguous Requirements
57
-
58
- ### 2.1 "Valid gem" definition in FR-1.2
59
-
60
- FR-1.2 returns 422 "if the file is not a valid gem" but does not define valid.
61
- Three plausible interpretations:
62
-
63
- 1. File extension is `.gem`
64
- 2. File is a parseable tar archive containing `metadata.gz` and `data.tar.gz`
65
- 3. The gemspec within `metadata.gz` can be loaded without error
66
-
67
- These have substantially different implementation and security implications.
68
- Option 1 is trivially bypassable.
69
- Option 3 can raise if the archive or its embedded metadata is corrupt (the metadata is YAML, so `require` calls in the original `.gemspec` are not re-run).
70
- The spec should specify what validation is expected — likely option 2 at minimum.
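Option 2 can be checked without loading the spec at all. The sketch uses RubyGems' tar reader and assumes "valid" means the archive lists both `metadata.gz` and `data.tar.gz`. Anything unreadable counts as invalid. The helper name is hypothetical.

```ruby
require "rubygems/package"

REQUIRED_MEMBERS = %w[metadata.gz data.tar.gz].freeze

def gem_archive_shape_ok?(path)
  names = []
  File.open(path, "rb") do |io|
    Gem::Package::TarReader.new(io).each { |entry| names << entry.full_name }
  end
  (REQUIRED_MEMBERS - names).empty?
rescue StandardError # corrupt headers, truncated files, non-tar input
  false
end
```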
71
-
72
- ### 2.2 `/versions` cache stores raw upstream or merged output?
73
-
74
- FR-3.2 says cache the `/versions` response.
75
- It is ambiguous whether the cache stores:
76
-
77
- - The raw rubygems.org response (requiring re-merge with private gems on every request), or
78
- - The merged result (requiring cache invalidation on every gem upload)
79
-
80
- Both are valid designs; they have different invalidation logic.
81
- The spec does not specify which, leaving the implementer to decide and potentially choosing the one that breaks ETag/Range behaviour.
82
-
83
- ### 2.3 ETag algorithm: "MD5 or SHA256"
84
-
85
- FR-1.3 says use "MD5 or SHA256" for the ETag.
86
- Giving two options creates inconsistency risk — different code paths might use different algorithms, making ETags non-comparable across restarts.
87
- Pick one.
88
- SHA256 is the better choice (used for `Repr-Digest` too; re-using the same hash avoids a second pass).
89
-
90
- ### 2.4 `handle_response` in `GemUploader` accepts 200 and 302, not just 201
91
-
92
- FR-4.2 states that the new server must return "the same status codes (201, 409) that `GemUploader` expects."
93
- This is inaccurate.
94
- `GemUploader#handle_response` maps `200`, `201`, and `302` as success.
95
- The spec's description of `GemUploader`'s contract is wrong.
96
- While the new server returning 201 will still work (201 is handled), a future implementer auditing the spec against the code will find the discrepancy and may add unnecessary 302 handling or question the spec's accuracy.
97
-
98
- ---
99
-
100
- ## 3. Codebase Assumption Gaps
101
-
102
- ### 3.1 `GemUploader#list_gems` calls `/api/v1/gems.json`
103
-
104
- The spec says "no changes to `GemUploader`" and it is correct that `list_gems` is never called from any production code path (the `list` CLI reads the filesystem directly via `Dir.glob`, per FR-4.3).
105
- But `list_gems` is a public method that calls `GET /api/v1/gems.json`, a Geminabox-specific endpoint.
106
- After this migration, `list_gems` will receive a 404 from the new server.
107
-
108
- The spec should either note that `list_gems` becomes a dead method (and optionally raise `NotImplementedError`), or explicitly call out this known breakage so the implementer doesn't silently leave a broken public method.
109
-
110
- ### 3.2 `rubygems-generate_index` dependency is not addressed
111
-
112
- The gemspec currently declares `rubygems-generate_index ~> 1.0`.
113
- This gem exists to support Geminabox's legacy Marshal index generation (`specs.4.8.gz`, `Marshal.4.8.gz`).
114
- The spec removes Geminabox and explicitly excludes legacy index formats from scope, but says only to swap `geminabox` for `compact_index` in the gemspec.
115
- `rubygems-generate_index` is likely now unused dead weight.
116
- Whether to remove it is a judgment call, but the spec should at least acknowledge it.
117
-
118
- ### 3.3 Integration tests reference Geminabox beyond lines 80–81
119
-
120
- The Integration Points table says to update lines 80–81 of `test_server_lifecycle_integration.rb`.
121
- In the current file, there are two assertions that reference Geminabox:
122
-
123
- ```ruby
124
- assert_match(/Geminabox\.data/, content) # line 80
125
- assert_match(/Geminabox\.rubygems_proxy\s*=\s*true/, content) # line 81
126
- ```
127
-
128
- But the test method is named `test_server_generates_config_ru`, and the test class also has `test_server_status_while_running` which checks `status[:url]` equals `@config.geminabox_url`.
129
- The `geminabox_url` method name on `Configuration` is referenced both here and in `RackupProcess#wait_for_server`.
130
- The spec is silent on this naming — the constraint says "no changes to `configuration.rb`", so the stale method name remains.
131
- The integration test assertion on `geminabox_url` will still pass (the URL format doesn't change), but it should be called out in the spec as an accepted naming inconsistency rather than left for the implementer to discover.
132
-
133
- ### 3.4 `compact_index` gem API is assumed but not verified
134
-
135
- The spec builds on `CompactIndex::GemVersion`, `CompactIndex::Dependency`, `CompactIndex.names()`, and `CompactIndex.info()`.
136
- The spec's own checklist flags this: "key assumption: `compact_index` 0.15.x API is stable."
137
- The `compact_index` gem is primarily an internal RubyGems.org dependency.
138
- Its README is sparse and its public API is not documented for external consumers.
139
- Before implementation begins, the actual gem should be inspected to confirm the class names and method signatures match what the spec assumes.
140
- This is flagged here not as a spec defect, but as a pre-implementation step that is conspicuously absent from the spec.
141
-
142
- ---
143
-
144
- ## 4. Error Handling and Edge Case Gaps
145
-
146
- ### 4.1 Thread safety for the in-memory gem index
147
-
148
- FR-2.1 says the server rescans `gems_path/gems/*.gem` after each successful upload.
149
- Puma (the configured server) uses a thread pool by default.
150
- A concurrent GET `/info/:gemname` while an upload is updating the in-memory index will produce a data race.
151
-
152
- The spec does not require synchronization (a `Mutex` around index reads and writes, or a copy-on-write swap).
153
- In practice, Ruby's GVL limits the impact, but it is not zero — especially during index rebuild where multiple instance variables are updated in sequence.
154
- The spec should specify that the gem index is updated atomically (e.g., replace the entire index object with a new one via a single assignment).
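Here is a sketch of the copy-on-write swap, assuming the index is a hash from gem name to version data. Readers take one reference to a frozen snapshot. Writers build a new snapshot and publish it with a single assignment under a mutex, so two concurrent uploads cannot overwrite each other's change. `GemIndex` is a hypothetical class.

```ruby
class GemIndex
  attr_reader :snapshot # readers never see a half-built index

  def initialize(specs_by_name = {})
    @write_lock = Mutex.new
    @snapshot = freeze_index(specs_by_name)
  end

  # Serializes writers; the block receives the current snapshot and returns a new hash.
  def update
    @write_lock.synchronize do
      @snapshot = freeze_index(yield(@snapshot))
    end
  end

  private

  def freeze_index(specs_by_name)
    specs_by_name.transform_values { |versions| versions.dup.freeze }.freeze
  end
end
```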
155
-
156
- ### 4.2 Range request with explicit end byte is unspecified
157
-
158
- FR-1.3 says handle `Range: bytes=N-` (open-ended).
159
- The HTTP spec also allows `Range: bytes=N-M` (explicit end) and multi-range requests (`Range: bytes=0-99, 200-299`).
160
- Bundler currently only sends open-ended ranges, but the spec should be explicit that multi-range and bounded-range requests return 416 (`Range Not Satisfiable`) or fall back to the full response, rather than leaving this undefined.
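Here is a sketch of one conservative policy:

- honor only the open-ended `bytes=N-` form;
- answer 416 with `content-range: bytes */size` when `N` is at or past the end;
- ignore every other `Range` value (bounded, suffix, multi-range, malformed) by sending the full 200 response, which RFC 9110 permits.

`ranged_response` is a hypothetical helper returning a Rack-style `[status, headers, body]` triple with lowercase header names.

```ruby
OPEN_RANGE = /\Abytes=(\d+)-\z/

def ranged_response(body, range_header)
  size = body.bytesize
  full = [200, { "accept-ranges" => "bytes", "content-length" => size.to_s }, [body]]
  match = range_header && OPEN_RANGE.match(range_header.strip)
  return full unless match # no Range, or a form we choose not to support

  start = match[1].to_i
  return [416, { "content-range" => "bytes */#{size}" }, []] if start >= size

  part = body.byteslice(start, size - start)
  [206, {
    "accept-ranges" => "bytes",
    "content-range" => "bytes #{start}-#{size - 1}/#{size}",
    "content-length" => part.bytesize.to_s
  }, [part]]
end
```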
161
-
162
- ### 4.3 Behaviour when `gems_path/gems/` contains a corrupt `.gem` file
163
-
164
- FR-2.1 scans all `.gem` files on startup and on upload.
165
- If a file is corrupt (truncated download, disk error), extracting gemspec metadata will raise an exception.
166
- The spec does not say whether the server should skip corrupt files with a warning or abort startup.
167
- If startup is aborted, a single bad file makes the server unlaunchable.
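Here is a sketch of the skip-with-warning option, assuming a bad file should shrink the index rather than block startup. `load_private_specs` is hypothetical. It relies on `Gem::Package#spec`, and the `warn_io` parameter exists only so the warning is testable.

```ruby
require "rubygems/package"

# Load every .gem that parses; log and skip the ones that do not.
def load_private_specs(gems_dir, warn_io: $stderr)
  Dir.glob(File.join(gems_dir, "*.gem")).sort.each_with_object([]) do |path, specs|
    specs << Gem::Package.new(path).spec
  rescue StandardError => e
    warn_io.puts("skipping #{File.basename(path)}: #{e.class}: #{e.message}")
  end
end
```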
168
-
169
- ### 4.4 Concurrent upload of the same gem
170
-
171
- FR-1.2 returns 409 if the file already exists.
172
- If two `gemkeeper sync` processes run simultaneously and both upload the same gem at the same time, a TOCTOU race exists between the existence check and the file write.
173
- The spec should specify last-write-wins, or require a file lock.
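An exclusive publish step closes the race without a lock:

1. write the upload to a unique temp file in the target directory;
2. hard-link it to the final name, which fails with `EEXIST` if another upload got there first.

The sketch assumes the filename was already validated and that the filesystem supports hard links. `publish_gem` is hypothetical; its `:created`/`:exists` results would map to 201 and 409. It also creates `gems_path/gems/` on demand.

```ruby
require "fileutils"
require "securerandom"

# Returns :created, or :exists if the final name was already taken.
def publish_gem(gems_dir, filename, data)
  FileUtils.mkdir_p(gems_dir) # gems_path/gems/ may not exist yet
  final_path = File.join(gems_dir, filename)
  tmp_path = File.join(gems_dir, ".#{filename}.#{SecureRandom.hex(6)}.tmp")
  File.binwrite(tmp_path, data)   # readers never see a partially written final file
  File.link(tmp_path, final_path) # atomic, and refuses to overwrite (EEXIST)
  :created
rescue Errno::EEXIST
  :exists
ensure
  File.unlink(tmp_path) if tmp_path && File.exist?(tmp_path)
end
```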
174
-
175
- ### 4.5 System gem cache traversal under `Gem.path`
176
-
177
- FR-3.3 checks `Gem.path.map { |p| File.join(p, "cache", filename) }`.
178
- `Gem.path` includes user-defined paths from `GEM_PATH` environment variable.
179
- A developer with a misconfigured `GEM_PATH` pointing to a path they don't own could cause unexpected file-serving behaviour.
- This is a minor concern given localhost-only binding, but the spec should note that only paths where the file is readable are considered.
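-
- A sketch of that note, with a hypothetical `find_cached_gem` helper whose `search_paths` parameter defaults to `Gem.path`: candidates that are missing, are not regular files, or are not readable by the server process are skipped, so the first usable copy is the one served.
```ruby
# Hypothetical lookup over the cache directories of each Gem.path entry.
def find_cached_gem(filename, search_paths = Gem.path)
  search_paths
    .map { |root| File.join(root, "cache", filename) }
    .find { |candidate| File.file?(candidate) && File.readable?(candidate) }
end
```
- This does not replace filename validation (see 5.1); it only keeps a misconfigured `GEM_PATH` from producing surprising results.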
-
- ---
-
- ## 5. Security Concerns
-
- ### 5.1 Path traversal in `/gems/:filename.gem`
-
- FR-3.3 constructs a filesystem path from the URL parameter `filename`.
- A request to `/gems/../../../../etc/passwd` (URL-decoded by Rack before routing) would traverse outside `gems_path`.
- Even on localhost, any process on the same machine can make this request.
-
- The spec's security checklist dismisses auth and input validation as out of scope because the server "binds to 127.0.0.1 only."
- That reasoning does not cover path traversal — localhost binding doesn't prevent local processes from exploiting it.
- The spec should require that `filename` be validated to contain only safe characters (`[a-zA-Z0-9._-]`) or that the resolved path is asserted to be under `gems_path` before serving.
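-
- A sketch combining both checks, assuming the route hands the handler the full filename including the `.gem` suffix (`resolve_gem_path` and the exact pattern are illustrative, not from the spec). The character whitelist rejects any path separator outright, and the `expand_path` check is a second line of defence. Symlinks inside `gems_path` are not resolved here; `File.realpath` would be needed for that.
```ruby
# Only letters, digits, dot, underscore and hyphen, ending in ".gem".
SAFE_GEM_FILENAME = /\A[a-zA-Z0-9._-]+\.gem\z/

# Hypothetical helper: returns an absolute path directly under gems_dir, or nil.
def resolve_gem_path(gems_dir, filename)
  return nil unless SAFE_GEM_FILENAME.match?(filename)

  root = File.expand_path(gems_dir)
  candidate = File.expand_path(filename, root)
  # Defence in depth: the resolved path must still sit directly under root.
  return nil unless File.dirname(candidate) == root

  candidate
end
```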
-
- ### 5.2 SSRF via cached rubygems.org requests
-
- The server makes outbound requests to `https://rubygems.org/info/:gemname` where `gemname` comes from the incoming request URL.
- A local process could request `/info/../../../../etc/passwd`, and the unvalidated gemname would be used to construct the upstream URL `https://rubygems.org/info/../../../../etc/passwd`, which can address paths on the upstream host outside `/info/`.
- While rubygems.org would return a 404, the spec should require URL-encoding or validation of `:gemname` before constructing the upstream URL.
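-
- A sketch of the validation option, assuming gem names are limited to letters, digits, `.`, `_`, and `-` (the character set used for filenames in 5.1) and additionally requiring at least one letter or digit, so that `.` and `..` cannot be passed through as dot segments. `upstream_info_uri` is a hypothetical name.
```ruby
require "uri"

GEM_NAME_CHARS = /\A[a-zA-Z0-9._-]+\z/

# Hypothetical helper: build the upstream /info URL only for plausible names.
def upstream_info_uri(gemname, base: "https://rubygems.org")
  return nil unless GEM_NAME_CHARS.match?(gemname)
  return nil unless gemname.match?(/[a-zA-Z0-9]/) # rejects ".", "..", "-", "_"

  URI("#{base}/info/#{gemname}")
end
```
- Returning `nil` lets the caller answer with a local 404 without making any outbound request.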
-
- ---
-
- ## 6. Performance Concerns
-
- ### 6.1 In-memory merge of `/versions` on each request (already flagged in checklist)
-
- The spec acknowledges this is an open question.
- A concrete recommendation: cache the fully merged `/versions` body in memory and invalidate it only when a gem is uploaded (cheap) or when the upstream ETag changes (already covered by FR-3.2).
- The spec should promote this from "open question" to a requirement: "merged `/versions` response is memoized in memory; invalidated on upload or upstream ETag change."
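-
- A minimal sketch of that requirement, assuming the server keeps a hypothetical `private_generation` counter that is bumped on every upload and already knows the current upstream ETag from FR-3.2: the merged body is recomputed only when either value changes. `MergedVersionsCache` is an illustrative name.
```ruby
# Hypothetical memo for the merged /versions body.
class MergedVersionsCache
  def initialize(&merge)
    @merge = merge # block that builds the merged body from current state
    @lock = Mutex.new
    @key = nil
    @body = nil
  end

  # Returns the cached body, rebuilding it only when the inputs changed.
  def fetch(private_generation, upstream_etag)
    key = [private_generation, upstream_etag]
    @lock.synchronize do
      unless @key == key
        @body = @merge.call(private_generation, upstream_etag)
        @key = key
      end
      @body
    end
  end
end
```
- Holding the lock during the rebuild means concurrent requests wait for one merge instead of each performing their own.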
-
- ### 6.2 Full gem metadata re-scan on every upload
-
- FR-2.1 says "scans `gems_path/gems/*.gem`" after each upload.
- For a large private gem store, this O(n) re-scan on every upload is unnecessary.
- An incremental approach (add the newly uploaded gem to the in-memory index directly) would be more efficient.
- This is a recommendation rather than a blocker, but if left as a full re-scan, the spec should cap the acceptable gem count or note the known performance characteristic.
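-
- An incremental update can be as small as the sketch below, which assumes a hypothetical in-memory index keyed by gem name that receives the `Gem::Specification` of the newly uploaded file; only that gem's version list is touched. If readers run concurrently, it would still need to honour the atomic-update requirement in 4.1.
```ruby
require "rubygems"

# Hypothetical index: gem name => [[version, platform], ...] sorted by version.
class IncrementalGemIndex
  def initialize
    @by_name = {}
  end

  # Cost depends on the number of versions of this one gem, not the store size.
  def add(spec)
    list = (@by_name[spec.name] ||= [])
    list << [spec.version.to_s, spec.platform.to_s]
    list.sort_by! { |version, _platform| Gem::Version.new(version) }
  end

  def versions(name)
    @by_name.fetch(name, [])
  end
end
```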
-
- ---
-
- ## 7. Testing Strategy Gaps
-
- ### 7.1 No unit tests specified for `CompactIndexServer`
-
- The spec introduces a new 200+ line Rack application implementing 8 endpoints, proxy logic, cache management, and ETag/Range support.
- The testing section mentions only one integration test update (lines 80–81) and four "Verify" lines that describe manual/integration scenarios.
-
- There are no unit tests specified for:
- - Correct `ETag` and `Repr-Digest` header generation
- - 304 response when ETag matches
- - 206 response for range requests
- - The merge logic for `/versions`
- - Corrupt gem handling (see 4.3 above)
- - Offline fallback path (FR-3.4)
-
- Given the project's existing unit test pattern (one `test_*.rb` per class), `test/gemkeeper/test_compact_index_server.rb` should be called out explicitly, even if only to anchor a few key behaviours.
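-
- For illustration, such a file might start like the sketch below. It exercises a hypothetical stand-in app rather than the real `CompactIndexServer`, whose constructor and ETag scheme are assumptions here: the stand-in uses a quoted SHA-256 hex digest as its ETag and matches `If-None-Match` exactly, ignoring weak validators and lists. The Rack env is built by hand, so the test needs nothing beyond Minitest.
```ruby
require "digest"
require "minitest/autorun"

# Hypothetical stand-in for CompactIndexServer: serves one /versions body
# with an ETag and honours an exactly matching If-None-Match.
class StubVersionsApp
  def initialize(body)
    @body = body
    @etag = %("#{Digest::SHA256.hexdigest(body)}")
  end

  def call(env)
    return [404, {}, []] unless env["PATH_INFO"] == "/versions"

    headers = { "etag" => @etag }
    return [304, headers, []] if env["HTTP_IF_NONE_MATCH"] == @etag

    [200, headers, [@body]]
  end
end

class TestCompactIndexServer < Minitest::Test
  def setup
    @app = StubVersionsApp.new("created_at: 2024-01-01\n---\nrails 7.1.0 abc\n")
  end

  # Minimal hand-built Rack env for a GET request.
  def get(path, extra_env = {})
    @app.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => path }.merge(extra_env))
  end

  def test_versions_returns_etag
    status, headers, _body = get("/versions")
    assert_equal 200, status
    refute_nil headers["etag"]
  end

  def test_matching_etag_returns_304
    _status, headers, _body = get("/versions")
    status, _headers, body = get("/versions", "HTTP_IF_NONE_MATCH" => headers["etag"])
    assert_equal 304, status
    assert_empty body
  end
end
```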
-
- ### 7.2 FR-4.2 verify claim is overconfident
-
- FR-4.2 says "the existing `test/gemkeeper/test_gem_uploader.rb` passes without modification."
- The existing tests cover only connection-failure paths; no live server is involved.
- They do not test a successful upload against a real or mock server.
- The tests will indeed pass without modification, but passing them does not show that the upload flow works against the new server.
- A new integration test covering the upload round-trip should be called out here.
-
- ---
-
- ## 8. Spec Completeness Checklist Assessment
-
- | Item | Assessment |
- | ---- | ---------- |
- | Scope & acceptance criteria | ✅ Clear. Out of Scope list is precise and useful. |
- | Testing strategy | ⚠️ Thin. No unit tests for the new Rack app; FR-4.2 verify is misleading. |
- | Existing patterns | ✅ Correctly identifies `GemUploader`, `Dir.glob` list pattern, `ServerReadinessProbe`. |
- | Dependencies | ⚠️ `rubygems-generate_index` not addressed; `faraday`/`faraday-multipart` not mentioned as retained. |
- | Architecture & interfaces | ✅ Rack app interface, config.ru, storage layout clearly specified. |
- | Error handling & failure modes | ⚠️ Corrupt gem files, TOCTOU on upload, thread safety, and range-end handling are missing. |
- | Security review | ❌ Path traversal in filename parameter and SSRF in gemname-to-upstream-URL construction are unaddressed. The localhost-only justification does not cover these. |
- | Performance impact | ⚠️ Acknowledged as open question but not resolved. `/names` scope is a larger risk than the spec recognises. |
- | Rollout & migration | ✅ Drop-in; no data migration; Homebrew rebuild noted. |
- | Assumptions & risks | ⚠️ `compact_index` API stability flagged but no pre-implementation verification step prescribed. |