gemkeeper 0.7.2 → 0.8.0

Files changed (41)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +18 -1
  3. data/README.md +11 -11
  4. data/lib/gemkeeper/bundler_mirror_configurator.rb +1 -1
  5. data/lib/gemkeeper/cli/commands/list.rb +2 -2
  6. data/lib/gemkeeper/cli/commands/server/start.rb +4 -4
  7. data/lib/gemkeeper/cli/commands/server/status.rb +3 -3
  8. data/lib/gemkeeper/cli/commands/server/stop.rb +3 -3
  9. data/lib/gemkeeper/cli/commands/sync.rb +1 -1
  10. data/lib/gemkeeper/compact_index_server/cache_meta.rb +34 -0
  11. data/lib/gemkeeper/compact_index_server/cache_store.rb +64 -0
  12. data/lib/gemkeeper/compact_index_server/gem_cache.rb +88 -0
  13. data/lib/gemkeeper/compact_index_server/gem_index.rb +78 -0
  14. data/lib/gemkeeper/compact_index_server/index_merger.rb +81 -0
  15. data/lib/gemkeeper/compact_index_server/response.rb +12 -0
  16. data/lib/gemkeeper/compact_index_server/response_builder.rb +63 -0
  17. data/lib/gemkeeper/compact_index_server/rubygems_client.rb +59 -0
  18. data/lib/gemkeeper/compact_index_server/spec_mapper.rb +38 -0
  19. data/lib/gemkeeper/compact_index_server/upload_handler.rb +36 -0
  20. data/lib/gemkeeper/compact_index_server/upstream_cache.rb +26 -0
  21. data/lib/gemkeeper/compact_index_server.rb +131 -0
  22. data/lib/gemkeeper/configuration.rb +1 -1
  23. data/lib/gemkeeper/gem_syncer.rb +53 -84
  24. data/lib/gemkeeper/gem_uploader.rb +26 -18
  25. data/lib/gemkeeper/rackup_process.rb +12 -7
  26. data/lib/gemkeeper/repo_fetcher.rb +80 -0
  27. data/lib/gemkeeper/server_manager.rb +1 -1
  28. data/lib/gemkeeper/version.rb +1 -1
  29. data/lib/gemkeeper.rb +2 -0
  30. data/specs/20260529-091429-replace-geminabox-compact-proxy/critique-consolidated-v-1.md +168 -0
  31. data/specs/20260529-091429-replace-geminabox-compact-proxy/critique-v-1-claude.md +124 -0
  32. data/specs/20260529-091429-replace-geminabox-compact-proxy/critique-v-1-codex.md +125 -0
  33. data/specs/20260529-091429-replace-geminabox-compact-proxy/critique-v-1-copilot.md +261 -0
  34. data/specs/20260529-091429-replace-geminabox-compact-proxy/spec.md +360 -0
  35. data/specs/20260529-131354-sync-serve-cache-contract/critique-consolidated-v-1.md +95 -0
  36. data/specs/20260529-131354-sync-serve-cache-contract/critique-v-1-claude.md +47 -0
  37. data/specs/20260529-131354-sync-serve-cache-contract/critique-v-1-codex.md +112 -0
  38. data/specs/20260529-131354-sync-serve-cache-contract/critique-v-1-copilot.md +169 -0
  39. data/specs/20260529-131354-sync-serve-cache-contract/implementation-summary.md +59 -0
  40. data/specs/20260529-131354-sync-serve-cache-contract/spec.md +169 -0
  41. metadata +38 -28
@@ -0,0 +1,261 @@
1
+ # Critique: Replace Geminabox with Compact Index Proxy
2
+
3
+ Reviewed by: GitHub Copilot (claude-sonnet-4.6)
4
+ Date: 2026-05-29
5
+
6
+ ## Summary
7
+
8
+ The spec is well-scoped and the integration points are clearly identified.
9
+ The constraints table and out-of-scope list are unusually precise — useful.
10
+ However, several correctness traps exist that would produce a server that passes basic smoke tests but fails under Bundler's actual caching behaviour.
11
+ The most serious issues are the `/versions` byte-stability problem (correctness), the missing thread-safety requirement (reliability), and the unspecified `/names` scope (ambiguity with large performance consequences).
12
+ The testing section is thin for the volume of new logic being introduced.
13
+
14
+ ---
15
+
16
+ ## 1. Critical: Correctness Blockers
17
+
18
+ ### 1.1 `/versions` byte-stability is unaddressed, and merging breaks Bundler's range fetching
19
+
20
+ FR-1.3 requires the server to support `Range: bytes=N-` to serve partial `/versions` responses.
21
+ This is how Bundler efficiently updates its local copy: it records the file size it last fetched, then asks for only the bytes after that offset.
22
+
23
+ The spec's merge strategy (FR-3.1: "private gem entries take precedence when a name appears in both") sorts or interleaves private gems into the rubygems.org `/versions` body.
24
+ Any time a new private gem is added, the byte offsets of every subsequent line in the merged file shift.
25
+ Bundler's cached offset is now wrong: the range request returns garbled data, and Bundler either fails or silently resolves wrong versions.
26
+
27
+ The spec does not define a stable layout for the merged `/versions` output.
28
+ Options include appending private gems after the public block, or rebuilding the rubygems.org block verbatim and appending private entries — but the spec is silent.
29
+ Without a stable layout rule, this is a correctness defect, not just a performance issue.
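
The offset problem described in 1.1 can be checked with a small sketch. It compares two hypothetical layouts for the merged body and tests whether a client that cached the old body can extend it with an open-ended range request. The gem names, checksums, and helper names are invented for illustration and are not taken from the gemkeeper code.

```ruby
# A tiny stand-in for the upstream /versions body (contents are made up).
PUBLIC_BLOCK = "created_at: 2026-05-29T00:00:00Z\n---\nrack 3.0.0 aaa\nrake 13.0.6 bbb\n"

# Layout 1: sort private lines in among the public ones.
def interleaved(public_block, private_lines)
  header, body = public_block.split("---\n", 2)
  header + "---\n" + (body.lines + private_lines).sort.join
end

# Layout 2: keep the public block verbatim and append private lines.
def appended(public_block, private_lines)
  public_block + private_lines.join
end

# True when a client holding `old` can extend it with `Range: bytes=<old.bytesize>-`.
def range_safe?(old, new)
  new.byteslice(0, old.bytesize) == old
end

before = ["acme-core 1.0.0 ccc\n"]
after  = before + ["acme-auth 0.1.0 ddd\n"]

interleaved_ok = range_safe?(interleaved(PUBLIC_BLOCK, before), interleaved(PUBLIC_BLOCK, after))
appended_ok    = range_safe?(appended(PUBLIC_BLOCK, before), appended(PUBLIC_BLOCK, after))
```

In this sketch the appended layout stays range-safe only because private lines are added in upload order; sorting the private tail would reintroduce the same shift.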
30
+
31
+ ### 1.2 `info_checksum` has a circular dependency
32
+
33
+ FR-2.2 requires `CompactIndex::GemVersion` to carry an `info_checksum` field.
34
+ That checksum is a digest of the `/info/:gemname` response body (MD5 in the compact index protocol, not SHA256).
35
+ To populate it in the `/versions` entry, the server must generate the `/info` body first, hash it, and embed the hash in `/versions`.
36
+
37
+ The spec never describes this ordering, nor does it mention that the `info_checksum` must be recomputed whenever a new version of a gem is uploaded (because the `/info` body changes).
38
+ An implementer who builds the versions index first and the info body second will produce invalid checksums that cause Bundler to re-fetch unconditionally.
39
+
40
+ ### 1.3 `/names` scope is undefined and carries large performance risk
41
+
42
+ FR-1.1 says `/names` returns "all gem names (local and proxied)."
43
+ The rubygems.org `/names` file currently contains ~175,000 gem names (roughly 2 MB uncompressed).
44
+ "Proxied" in this context almost certainly means the full public gem namespace.
45
+
46
+ The spec never says whether the server fetches, caches, and merges the rubygems.org `/names` file (like it does for `/versions`), or whether `/names` is scoped only to gems that have been locally requested or cached.
47
+ These produce completely different behaviour:
48
+ - Full public namespace: bundle install resolves public gems by name — correct, but the endpoint becomes expensive.
49
+ - Local-only: bundle install fails on any public gem not already in the info cache.
50
+
51
+ The performance checklist item (unchecked) notes only the `/versions` merge cost; it does not mention `/names`.
52
+ This is a missing requirement.
53
+
54
+ ---
55
+
56
+ ## 2. Ambiguous Requirements
57
+
58
+ ### 2.1 "Valid gem" definition in FR-1.2
59
+
60
+ FR-1.2 returns 422 "if the file is not a valid gem" but does not define valid.
61
+ Three plausible interpretations:
62
+
63
+ 1. File extension is `.gem`
64
+ 2. File is a parseable tar archive containing `metadata.gz` and `data.tar.gz`
65
+ 3. The gemspec within `metadata.gz` can be loaded without error
66
+
67
+ These have substantially different implementation and security implications.
68
+ Option 1 is trivially bypassable.
69
+ Option 3 can raise arbitrary Ruby exceptions if the gemspec calls `require`.
70
+ The spec should specify what validation is expected — likely option 2 at minimum.
71
+
72
+ ### 2.2 `/versions` cache stores raw upstream or merged output?
73
+
74
+ FR-3.2 says cache the `/versions` response.
75
+ It is ambiguous whether the cache stores:
76
+
77
+ - The raw rubygems.org response (requiring re-merge with private gems on every request), or
78
+ - The merged result (requiring cache invalidation on every gem upload)
79
+
80
+ Both are valid designs; they have different invalidation logic.
81
+ The spec does not specify which, leaving the implementer to decide and potentially choosing the one that breaks ETag/Range behaviour.
82
+
83
+ ### 2.3 ETag algorithm: "MD5 or SHA256"
84
+
85
+ FR-1.3 says use "MD5 or SHA256" for the ETag.
86
+ Giving two options creates inconsistency risk — different code paths might use different algorithms, making ETags non-comparable across restarts.
87
+ Pick one.
88
+ SHA256 is the better choice (used for `Repr-Digest` too; re-using the same hash avoids a second pass).
89
+
90
+ ### 2.4 `handle_response` in `GemUploader` accepts 200 and 302, not just 201
91
+
92
+ FR-4.2 states that the new server must return "the same status codes (201, 409) that `GemUploader` expects."
93
+ This is inaccurate.
94
+ `GemUploader#handle_response` maps `200`, `201`, and `302` as success.
95
+ The spec's description of `GemUploader`'s contract is wrong.
96
+ While the new server returning 201 will still work (201 is handled), a future implementer auditing the spec against the code will find the discrepancy and may add unnecessary 302 handling or question the spec's accuracy.
97
+
98
+ ---
99
+
100
+ ## 3. Codebase Assumption Gaps
101
+
102
+ ### 3.1 `GemUploader#list_gems` calls `/api/v1/gems.json`
103
+
104
+ The spec says "no changes to `GemUploader`" and it is correct that `list_gems` is never called from any production code path (the `list` CLI reads the filesystem directly via `Dir.glob`, per FR-4.3).
105
+ But `list_gems` is a public method that calls `GET /api/v1/gems.json`, a Geminabox-specific endpoint.
106
+ After this migration, calling `list_gems` will return a 404.
107
+
108
+ The spec should either note that `list_gems` becomes a dead method (and optionally raise `NotImplementedError`), or explicitly call out this known breakage so the implementer doesn't silently leave a broken public method.
109
+
110
+ ### 3.2 `rubygems-generate_index` dependency is not addressed
111
+
112
+ The gemspec currently declares `rubygems-generate_index ~> 1.0`.
113
+ This gem exists to support Geminabox's legacy Marshal index generation (`specs.4.8.gz`, `Marshal.4.8.gz`).
114
+ The spec removes Geminabox and explicitly excludes legacy index formats from scope, but says only to swap `geminabox` for `compact_index` in the gemspec.
115
+ `rubygems-generate_index` is likely now unused dead weight.
116
+ Whether to remove it is a judgment call, but the spec should at least acknowledge it.
117
+
118
+ ### 3.3 Integration test has more Geminabox assertions than lines 80–81
119
+
120
+ The Integration Points table says to update lines 80–81 of `test_server_lifecycle_integration.rb`.
121
+ In the current file, there are two assertions that reference Geminabox:
122
+
123
+ ```ruby
124
+ assert_match(/Geminabox\.data/, content) # line 80
125
+ assert_match(/Geminabox\.rubygems_proxy\s*=\s*true/, content) # line 81
126
+ ```
127
+
128
+ But the test method is named `test_server_generates_config_ru`, and the test class also has `test_server_status_while_running` which checks `status[:url]` equals `@config.geminabox_url`.
129
+ The `geminabox_url` method name on `Configuration` is referenced both here and in `RackupProcess#wait_for_server`.
130
+ The spec is silent on this naming — the constraint says "no changes to `configuration.rb`", so the stale method name remains.
131
+ The integration test assertion on `geminabox_url` will still pass (the URL format doesn't change), but it should be called out in the spec as an accepted naming inconsistency rather than left for the implementer to discover.
132
+
133
+ ### 3.4 `compact_index` gem API is assumed but not verified
134
+
135
+ The spec builds on `CompactIndex::GemVersion`, `CompactIndex::Dependency`, `CompactIndex.names()`, and `CompactIndex.info()`.
136
+ The spec's own checklist flags this: "key assumption: `compact_index` 0.15.x API is stable."
137
+ The `compact_index` gem is primarily an internal RubyGems.org dependency.
138
+ Its README is sparse and its public API is not documented for external consumers.
139
+ Before implementation begins, the actual gem should be inspected to confirm the class names and method signatures match what the spec assumes.
140
+ This is flagged here not as a spec defect, but as a pre-implementation step that is conspicuously absent from the spec.
141
+
142
+ ---
143
+
144
+ ## 4. Error Handling and Edge Case Gaps
145
+
146
+ ### 4.1 Thread safety for the in-memory gem index
147
+
148
+ FR-2.1 says the server rescans `gems_path/gems/*.gem` after each successful upload.
149
+ Puma (the configured server) uses a thread pool by default.
150
+ A concurrent GET `/info/:gemname` while an upload is updating the in-memory index will produce a data race.
151
+
152
+ The spec does not require synchronization (a `Mutex` around index reads and writes, or a copy-on-write swap).
153
+ In practice, Ruby's GVL limits the impact, but it is not zero — especially during index rebuild where multiple instance variables are updated in sequence.
154
+ The spec should specify that the gem index is updated atomically (e.g., replace the entire index object with a new one via a single assignment).
155
+
156
+ ### 4.2 Range request with explicit end byte is unspecified
157
+
158
+ FR-1.3 says handle `Range: bytes=N-` (open-ended).
159
+ The HTTP spec also allows `Range: bytes=N-M` (explicit end) and multi-range requests (`Range: bytes=0-99, 200-299`).
160
+ Bundler currently only sends open-ended ranges, but the spec should be explicit that multi-range and bounded-range requests return 416 (`Range Not Satisfiable`) or fall back to the full response, rather than leaving this undefined.
161
+
162
+ ### 4.3 Behaviour when `gems_path/gems/` contains a corrupt `.gem` file
163
+
164
+ FR-2.1 scans all `.gem` files on startup and on upload.
165
+ If a file is corrupt (truncated download, disk error), extracting gemspec metadata will raise an exception.
166
+ The spec does not say whether the server should skip corrupt files with a warning or abort startup.
167
+ If startup is aborted, a single bad file makes the server unlaunchable.
168
+
169
+ ### 4.4 Concurrent upload of the same gem
170
+
171
+ FR-1.2 returns 409 if the file already exists.
172
+ If two `gemkeeper sync` processes run simultaneously and both upload the same gem at the same time, a TOCTOU race exists between the existence check and the file write.
173
+ The spec should specify last-write-wins, or require a file lock.
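
As a third option next to last-write-wins and a file lock, a no-clobber publish step gives first-write-wins without a separate existence check. The sketch below is an assumption, not something the spec prescribes: it relies on `File.link` failing with `EEXIST` when the target exists, which requires a filesystem with hard-link support and a temp file on the same volume. The method name `publish_gem` is hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Publishes `tmp_path` as `target_path` only if the target does not yet exist.
# Returns 201 on success and 409 if another writer got there first.
def publish_gem(tmp_path, target_path)
  File.link(tmp_path, target_path) # raises Errno::EEXIST instead of overwriting
  201
rescue Errno::EEXIST
  409
ensure
  FileUtils.rm_f(tmp_path) # the temp name is removed in both cases
end

# Two uploads of the same gem, one after the other.
statuses = Dir.mktmpdir do |dir|
  target = File.join(dir, "acme-core-1.0.0.gem")
  2.times.map do |i|
    tmp = File.join(dir, "upload-#{i}.tmp")
    File.binwrite(tmp, "gem bytes #{i}")
    publish_gem(tmp, target)
  end
end
```

Because the check and the publish are a single system call, there is no window between them for a second writer to slip through.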
174
+
175
+ ### 4.5 System gem cache traversal under `Gem.path`
176
+
177
+ FR-3.3 checks `Gem.path.map { |p| File.join(p, "cache", filename) }`.
178
+ `Gem.path` includes user-defined paths from `GEM_PATH` environment variable.
179
+ A developer with a misconfigured `GEM_PATH` pointing to a path they don't own could cause unexpected file-serving behaviour.
180
+ This is a minor concern given localhost-only binding, but the spec should note that only paths where the file is readable are considered.
181
+
182
+ ---
183
+
184
+ ## 5. Security Concerns
185
+
186
+ ### 5.1 Path traversal in `/gems/:filename.gem`
187
+
188
+ FR-3.3 constructs a filesystem path from the URL parameter `filename`.
189
+ A request to `/gems/../../../../etc/passwd` (URL-decoded by Rack before routing) would traverse outside `gems_path`.
190
+ Even on localhost, any process on the same machine can make this request.
191
+
192
+ The spec's security checklist dismisses auth and input validation as out of scope because the server "binds to 127.0.0.1 only."
193
+ That reasoning does not cover path traversal — localhost binding doesn't prevent local processes from exploiting it.
194
+ The spec should require that `filename` be validated to contain only safe characters (`[a-zA-Z0-9._-]`) or that the resolved path is asserted to be under `gems_path` before serving.
195
+
196
+ ### 5.2 SSRF via cached rubygems.org requests
197
+
198
+ The server makes outbound requests to `https://rubygems.org/info/:gemname` where `gemname` comes from the incoming request URL.
199
+ A local process could request `/info/../../../../etc/passwd` — the gemname would be used to construct the upstream URL `https://rubygems.org/info/../../../../etc/passwd`.
200
+ While rubygems.org would return a 404, the spec should require URL-encoding or validation of `:gemname` before constructing the upstream URL.
201
+
202
+ ---
203
+
204
+ ## 6. Performance Concerns
205
+
206
+ ### 6.1 In-memory merge of `/versions` on each request (already flagged in checklist)
207
+
208
+ The spec acknowledges this is an open question.
209
+ A concrete recommendation: cache the fully merged `/versions` body in memory and invalidate it only when a gem is uploaded (cheap) or when the upstream ETag changes (already covered by FR-3.2).
210
+ The spec should promote this from "open question" to a requirement: "merged `/versions` response is memoized in memory; invalidated on upload or upstream ETag change."
211
+
212
+ ### 6.2 Full gem metadata re-scan on every upload
213
+
214
+ FR-2.1 says "scans `gems_path/gems/*.gem`" after each upload.
215
+ For a large private gem store, this O(n) re-scan on every upload is unnecessary.
216
+ An incremental approach (add the newly uploaded gem to the in-memory index directly) would be more efficient.
217
+ This is a recommendation rather than a blocker, but if left as a full re-scan, the spec should cap the acceptable gem count or note the known performance characteristic.
218
+
219
+ ---
220
+
221
+ ## 7. Testing Strategy Gaps
222
+
223
+ ### 7.1 No unit tests specified for `CompactIndexServer`
224
+
225
+ The spec introduces a new 200+ line Rack application implementing 8 endpoints, proxy logic, cache management, and ETag/Range support.
226
+ The testing section mentions only one integration test update (lines 80–81) and four "Verify" lines that describe manual/integration scenarios.
227
+
228
+ There are no unit tests specified for:
229
+ - Correct `ETag` and `Repr-Digest` header generation
230
+ - 304 response when ETag matches
231
+ - 206 response for range requests
232
+ - The merge logic for `/versions`
233
+ - Corrupt gem handling (the section 4.3 gap above)
234
+ - Offline fallback path (FR-3.4)
235
+
236
+ Given the project's existing unit test pattern (one `test_*.rb` per class), `test/gemkeeper/test_compact_index_server.rb` should be called out explicitly, even if only to anchor a few key behaviours.
237
+
238
+ ### 7.2 FR-4.2 verify claim is overconfident
239
+
240
+ FR-4.2 says "the existing `test/gemkeeper/test_gem_uploader.rb` passes without modification."
241
+ The existing tests only test connection failure paths (no live server involved).
242
+ They do not test a successful upload against a real or mock server.
243
+ The claim that the tests "pass without modification" is true today, but it does not verify that the upload flow actually works against the new server.
244
+ A new integration test covering the upload round-trip should be called out here.
245
+
246
+ ---
247
+
248
+ ## 8. Spec Completeness Checklist Assessment
249
+
250
+ | Item | Assessment |
251
+ | ---- | ---------- |
252
+ | Scope & acceptance criteria | ✅ Clear. Out of Scope list is precise and useful. |
253
+ | Testing strategy | ⚠️ Thin. No unit tests for the new Rack app; FR-4.2 verify is misleading. |
254
+ | Existing patterns | ✅ Correctly identifies `GemUploader`, `Dir.glob` list pattern, `ServerReadinessProbe`. |
255
+ | Dependencies | ⚠️ `rubygems-generate_index` not addressed; `faraday`/`faraday-multipart` not mentioned as retained. |
256
+ | Architecture & interfaces | ✅ Rack app interface, config.ru, storage layout clearly specified. |
257
+ | Error handling & failure modes | ⚠️ Corrupt gem files, TOCTOU on upload, thread safety, and range-end handling are missing. |
258
+ | Security review | ❌ Path traversal in filename parameter and SSRF in gemname-to-upstream-URL construction are unaddressed. The localhost-only justification does not cover these. |
259
+ | Performance impact | ⚠️ Acknowledged as open question but not resolved. `/names` scope is a larger risk than the spec recognises. |
260
+ | Rollout & migration | ✅ Drop-in; no data migration; Homebrew rebuild noted. |
261
+ | Assumptions & risks | ⚠️ `compact_index` API stability flagged but no pre-implementation verification step prescribed. |
@@ -0,0 +1,360 @@
1
+ # Spec 20260529-091429: Replace Geminabox with Compact Index Proxy
2
+
3
+ ## Overview
4
+
5
+ Replace the Geminabox dependency with a minimal custom Rack application (`Gemkeeper::CompactIndexServer`) that serves locally-built private gems via the Bundler compact index protocol and proxies public gem requests to RubyGems.org.
6
+ The server also falls back to the system gem cache for offline use when RubyGems.org is unreachable.
7
+
8
+ ## Goals
9
+
10
+ - Remove the broken Geminabox proxy (uses the retired `bundler.rubygems.org/api/v1/dependencies` endpoint)
11
+ - Implement the compact index protocol so Bundler uses efficient, cacheable resolution
12
+ - Proxy public gems from RubyGems.org transparently through the same source URL
13
+ - Enable offline use by serving from the system gem cache and a local response cache
14
+
15
+ ---
16
+
17
+ ## Feature 1: Compact Index Rack Application
18
+
19
+ **Who & why:** Developers using gemkeeper configure `source "http://localhost:9292"` as their single Bundler source.
20
+ Today that source proxies through Geminabox, whose upstream API was retired in May 2023, producing four retries on every `bundle install`.
21
+ They need a server that speaks the compact index protocol Bundler has used since 2016, without that noise.
22
+
23
+ ### Functional Requirements
24
+
25
+ #### FR-1.1: Core compact index endpoints
26
+ The server MUST implement the following endpoints:
27
+
28
+ - `GET /names` — sorted, newline-delimited list of all gem names (local + proxied upstream), generated per FR-3.5
29
+ - `GET /versions` — merged versions index combining private gems and the proxied RubyGems.org versions file, generated per FR-3.1
30
+ - `GET /info/:gemname` — per-gem dependency metadata; served from local data for private gems, proxied from RubyGems.org for public gems per FR-3.1
31
+ - `GET /gems/:filename.gem` — serve gem binary (local-first, then system cache, then proxy per FR-3.3)
32
+
33
+ All URL path parameters (`:gemname`, `:filename`) are validated against `/\A[a-zA-Z0-9._-]+\z/` before any filesystem or upstream URL use.
34
+ Return 400 for parameters that do not match.
35
+ For `GET /gems/:filename`, additionally assert the resolved path is under `gems_path/gems/` before serving.
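
A minimal sketch of the two checks, assuming a hypothetical helper `resolve_gem_path` and a `gems_dir` argument standing in for `gems_path/gems/`. It also shows why the containment check matters: the character class on its own still accepts `.` and `..`.

```ruby
SAFE_PARAM = /\A[a-zA-Z0-9._-]+\z/

# Returns the absolute path to serve, or nil when the request should be rejected.
def resolve_gem_path(gems_dir, filename)
  return nil unless SAFE_PARAM.match?(filename)

  root = File.expand_path(gems_dir)
  candidate = File.expand_path(filename, root)
  # The pattern alone accepts "." and "..", so containment is checked as well.
  candidate.start_with?(root + File::SEPARATOR) ? candidate : nil
end

ok        = resolve_gem_path("/srv/gems", "rack-3.0.0.gem")
slashes   = resolve_gem_path("/srv/gems", "../etc/passwd")
dot_dot   = resolve_gem_path("/srv/gems", "..")
single_dot = resolve_gem_path("/srv/gems", ".")
```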
36
+
37
+ **Verify:** `bundle install` against a Gemfile backed by this server completes without retries or `HTTPError` output.
38
+
39
+ #### FR-1.2: Gem upload endpoint
40
+ `POST /upload` accepts a multipart form upload with field name `file` (matching the current Geminabox API consumed by `GemUploader`).
41
+
42
+ Validation: open the uploaded data as a tar archive and confirm it contains `metadata.gz` and `data.tar.gz`.
43
+ Extract gemspec metadata from `metadata.gz` using `Gem::Package` inside a rescue block.
44
+ Return 422 if the archive is malformed or metadata extraction raises.
45
+ Do not `load` or `eval` gemspec content.
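
The structural part of this validation can be sketched with the tar reader that ships with RubyGems. This covers only the "archive contains both entries" check; metadata extraction via `Gem::Package` is not shown. The helper names `gem_archive?` and `build_tar` are hypothetical, and `build_tar` exists only to produce demonstration input.

```ruby
require "rubygems/package"
require "stringio"

REQUIRED_ENTRIES = %w[metadata.gz data.tar.gz].freeze

# Returns true when `io` is a readable tar archive containing both entries.
# Any parse failure counts as "not a valid gem" (the 422 case).
def gem_archive?(io)
  names = []
  Gem::Package::TarReader.new(io) do |tar|
    tar.each { |entry| names << entry.full_name }
  end
  (REQUIRED_ENTRIES - names).empty?
rescue StandardError
  false
end

# Builds an in-memory tar with the given entry names, for demonstration only.
def build_tar(entry_names)
  io = StringIO.new(String.new(encoding: Encoding::BINARY))
  Gem::Package::TarWriter.new(io) do |tar|
    entry_names.each do |name|
      tar.add_file_simple(name, 0o644, 4) { |f| f.write("data") }
    end
  end
  io.rewind
  io
end

complete   = gem_archive?(build_tar(%w[metadata.gz data.tar.gz]))
incomplete = gem_archive?(build_tar(%w[metadata.gz]))
garbage    = gem_archive?(StringIO.new("x" * 1024))
```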
46
+
47
+ On success: create `gems_path/gems/` if absent; write to a temp file in the same directory; rename atomically to `gems_path/gems/<name>-<version>.gem`.
48
+ Delete the temp file if validation fails.
49
+ Response codes: 201 on success, 409 if the target path already exists, 422 on invalid gem.
50
+
51
+ After a successful write, rebuild the in-memory gem index per AR-1.1.
52
+
53
+ **Verify:** `gemkeeper sync` completes successfully; the gem appears in `gems_path/gems/` and in subsequent compact index responses.
54
+
55
+ #### FR-1.3: Conditional and range request support
56
+ All endpoints serving locally-generated or merged content (`/names`, `/versions`, `/info/:gemname`) MUST include:
57
+
58
+ - `ETag: "<sha256-hex>"` — SHA256 hex digest of the final response body
59
+ - `Repr-Digest: sha-256=<base64-encoded-sha256>` — RFC 9530; computed from the same final body
60
+ - `Accept-Ranges: bytes`
61
+
62
+ Do not forward `ETag` or `Repr-Digest` headers from RubyGems.org unchanged for merged responses; recompute from the merged body.
63
+
64
+ The server MUST handle:
65
+ - `If-None-Match` — return 304 if the ETag matches
66
+ - `Range: bytes=N-` (open-ended only) — return 206 with the partial body from byte N onward
67
+ - `Range: bytes=N-M` or multi-range — return 416
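
The header and request rules above can be sketched as a pure function over the final body. This is an illustration under assumptions: the method names are hypothetical, an offset at or beyond the end of the body is also treated as 416, and the `Repr-Digest` value is written with the surrounding colons that RFC 9530 uses for byte sequences.

```ruby
require "digest"

# Validators derived from the final (merged) response body.
def validators(body)
  {
    "ETag" => %("#{Digest::SHA256.hexdigest(body)}"),
    "Repr-Digest" => "sha-256=:#{Digest::SHA256.base64digest(body)}:",
    "Accept-Ranges" => "bytes"
  }
end

# Returns [status, headers, body] for a GET with optional conditional headers.
def respond(body, if_none_match: nil, range: nil)
  headers = validators(body)
  return [304, headers, ""] if if_none_match == headers["ETag"]
  return [200, headers, body] if range.nil?

  match = /\Abytes=(\d+)-\z/.match(range) # open-ended form only
  offset = match && match[1].to_i
  total = body.bytesize
  # Bounded, multi-range, and out-of-bounds requests all get 416 here.
  if offset.nil? || offset >= total
    return [416, headers.merge("Content-Range" => "bytes */#{total}"), ""]
  end

  partial = body.byteslice(offset, total - offset)
  [206, headers.merge("Content-Range" => "bytes #{offset}-#{total - 1}/#{total}"), partial]
end

body = "abcdef"
full      = respond(body)
unchanged = respond(body, if_none_match: validators(body)["ETag"])
tail      = respond(body, range: "bytes=2-")
bounded   = respond(body, range: "bytes=0-1")
multi     = respond(body, range: "bytes=0-1, 3-4")
```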
68
+
69
+ **Verify:** A second `bundle install` produces `304 Not Modified` responses for unchanged index files.
70
+
71
+ #### FR-1.4: Health endpoint
72
+ `GET /` returns `200 OK`.
73
+ Used by `ServerReadinessProbe` (`lib/gemkeeper/server_readiness_probe.rb`).
74
+
75
+ **Verify:** `gemkeeper server start` completes without timing out.
76
+
77
+ ### Architectural Requirements
78
+
79
+ #### AR-1.1: Atomic in-memory gem index
80
+ The server maintains an in-memory gem index (private gem metadata read from `gems_path/gems/`).
81
+ After each successful upload, the index is rebuilt into a new object and swapped via a single instance variable assignment.
82
+ Index reads do not acquire a lock; the swap is atomic at the Ruby object reference level (copy-on-write).
83
+ On startup, create `gems_path/gems/` if absent before scanning.
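
One way to read AR-1.1 is a frozen snapshot that is replaced wholesale. The sketch below assumes a hypothetical `GemIndex` class and a `scanner` callable that returns a hash of gem name to version list; neither name comes from the gemkeeper source.

```ruby
class GemIndex
  # `scanner` is a hypothetical callable returning { name => [versions] }.
  def initialize(scanner)
    @scanner = scanner
    @snapshot = build
  end

  # Readers use whatever snapshot is current; no lock is acquired.
  def versions_for(name)
    @snapshot.fetch(name, [])
  end

  # Build the complete replacement first, then publish it with one assignment.
  def rebuild!
    @snapshot = build
  end

  private

  def build
    @scanner.call.transform_values { |versions| versions.dup.freeze }.freeze
  end
end

store = { "acme-core" => ["1.0.0"] }
index = GemIndex.new(-> { store })

held = index.versions_for("acme-core")   # reference taken before the upload
store["acme-core"] = ["1.0.0", "1.1.0"]  # simulated upload
index.rebuild!
current = index.versions_for("acme-core")
```

A request that needs several lookups should take one snapshot reference and reuse it, so that all of its reads see the same index generation.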
84
+
85
+ ---
86
+
87
+ ## Feature 2: Private Gem Serving
88
+
89
+ **Who & why:** The gems built by `gemkeeper sync` must appear in Bundler's dependency graph with correct version and dependency metadata.
90
+ Without accurate compact index data for private gems, Bundler will either fail to find them or resolve wrong versions.
91
+
92
+ ### Functional Requirements
93
+
94
+ #### FR-2.1: Gem file discovery and metadata extraction
95
+ On startup and after each successful upload, the server scans `gems_path/gems/*.gem`.
96
+ For each file, gemspec metadata is extracted from the embedded `metadata.gz` using `Gem::Package` inside a rescue block.
97
+ Files that raise on extraction are skipped with a warning log entry; they do not abort startup.
98
+ Extracted metadata: gem name, version, platform, runtime dependencies (name + version constraint), SHA256 checksum of the `.gem` file.
99
+
100
+ **Verify:** A gem uploaded after server start appears in `/names`, `/versions`, and `/info/:gemname` without restarting the server.
101
+
102
+ #### FR-2.2: Compact index data generation
103
+ Uses the `compact_index` gem to produce correct response bodies.
104
+
105
+ **`info_checksum` ordering** — info bodies for all private gems must be generated before the `/versions` index is built.
106
+ For each private gem, compute `Digest::MD5.hexdigest(CompactIndex.info(gem_versions_array))` and store it as `info_checksum` in the corresponding `CompactIndex::GemVersion`.
107
+ The versions index is then built referencing those pre-computed checksums.
108
+ Checksums are recomputed after each upload.
109
+
110
+ **Verified `compact_index` 0.15.0 API:**
111
+ - `CompactIndex::GemVersion` — `Struct.new(:number, :platform, :checksum, :info_checksum, :dependencies, :ruby_version, :rubygems_version)`. Field is `number`, not `version`. `checksum` is the SHA256 of the `.gem` file.
112
+ - `CompactIndex::Gem` — `Struct.new(:name, :versions)`.
113
+ - `CompactIndex::Dependency` — `Struct.new(:gem, :version, :platform, :checksum)`. The dependency gem name is field `:gem`; the constraint string is field `:version`.
114
+ - `info_checksum` uses MD5 (not SHA256) per the compact index protocol. Bundler verifies this checksum when it downloads `/info/:gemname`.
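
The ordering rule and the struct fields above can be sketched without loading the gem. The structs below are local stand-ins that mirror the listed fields, and `render_info` is a deliberately simplified, hypothetical renderer; the real body format comes from `CompactIndex.info`.

```ruby
require "digest"

# Local stand-ins mirroring the struct fields listed above.
GemVersion = Struct.new(:number, :platform, :checksum, :info_checksum,
                        :dependencies, :ruby_version, :rubygems_version)
Dependency = Struct.new(:gem, :version, :platform, :checksum)

# Hypothetical, simplified info renderer (not the real compact index format).
def render_info(versions)
  lines = versions.map do |v|
    deps = v.dependencies.map { |d| "#{d.gem}:#{d.version}" }.join(",")
    "#{v.number} #{deps}|checksum:#{v.checksum}\n"
  end
  "---\n" + lines.join
end

# Step 1: render the info body. Step 2: hash it. Step 3: stamp the hash on
# each version so the /versions index can be built afterwards.
def with_info_checksum(versions)
  digest = Digest::MD5.hexdigest(render_info(versions))
  versions.map { |v| v.dup.tap { |copy| copy.info_checksum = digest } }
end

v1 = GemVersion.new("1.0.0", "ruby", "abc123", nil, [Dependency.new("rack", ">= 2.0")], nil, nil)
v2 = GemVersion.new("1.1.0", "ruby", "def456", nil, [], nil, nil)

before_upload = with_info_checksum([v1])
after_upload  = with_info_checksum([v1, v2])
```

The info body is rendered from fields other than `info_checksum`, so there is no circularity, and the checksum changes as soon as a second version is added.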
115
+
116
+ **Verify:** `bundle exec gem dependency <private-gem>` resolves correctly when the Gemfile sources from `http://localhost:9292`.
117
+
118
+ ### Architectural Requirements
119
+
120
+ #### AR-2.1: `compact_index` and `rubygems-generate_index` dependency swap
121
+ `gemkeeper.gemspec` drops `geminabox ~> 3.0` and `rubygems-generate_index ~> 1.0`, and adds `compact_index ~> 0.15`.
122
+ No other runtime dependencies are added for this feature.
123
+
124
+ ---
125
+
126
+ ## Feature 3: Public Gem Proxy with Offline Cache
127
+
128
+ **Who & why:** The Gemfile sources all gems — public and private — from `http://localhost:9292`.
129
+ Public gem resolution must work when online (proxying to RubyGems.org) and degrade gracefully when offline rather than returning 500 errors.
130
+ When offline, gems already installed on the developer's system should be servable directly, avoiding re-download on reconnect.
131
+
132
+ ### Functional Requirements
133
+
134
+ #### FR-3.1: Merge algorithm for `/versions` and `/info/:gemname`
135
+
136
+ **`/versions`:**
137
+ Fetch `https://rubygems.org/versions` and cache the raw response body to `cache_dir/rubygems_cache/versions` (refreshed when the upstream ETag changes or after 30 minutes).
138
+ Construct a `CompactIndex::VersionsFile` from that cached file.
139
+ Pass private gem objects as `extra_gems` to `CompactIndex.versions(versions_file, extra_gems)` so they are appended after the upstream public block.
140
+ When a private gem name collides with a public gem name, the private gem entry takes precedence: suppress the public entry for that name from the merged output so Bundler cannot resolve the public version.
141
+ The public block is never reordered; private entries are appended. This keeps byte offsets of the public block stable across private gem additions, preserving Bundler's incremental range fetching.
142
+ Write the merged response to `cache_dir/rubygems_cache/versions.merged` and keep only the merged body's SHA256 hex digest in memory as the current ETag.
143
+ Regenerate `versions.merged` on each upload or when the upstream ETag changes.
144
+ Serve `/versions` by streaming `versions.merged` from disk; the OS file cache handles hot reads without holding the full body in memory.
145
+
146
+ **`/info/:gemname` for private gems:** generate from local metadata (FR-2.2).
147
+ **`/info/:gemname` for public gems:** proxy `https://rubygems.org/info/:gemname` and cache per FR-3.2.
148
+
149
+ **Verify:** `bundle install` resolves a public gem (e.g., `rake`) and a private gem through the same local source with no errors.
150
+
151
+ #### FR-3.2: Cache proxy responses for offline use
152
+ Cache proxied compact index responses under `cache_dir/rubygems_cache/`:
153
+
154
+ - `/versions` raw upstream body: refreshed when upstream ETag changes or after 30 minutes. Use a conditional GET (`If-None-Match`) to upstream; a 304 resets the local TTL without rewriting the file.
155
+ - `/info/:gemname` per-gem: cached per gem name; refreshed after 60 minutes using the same conditional GET pattern.
156
+ - `.gem` binaries: cached permanently (content-addressed; gem files are immutable once published).
157
+
158
+ Cache files are written atomically (temp file + rename).
159
+
160
+ When RubyGems.org is unreachable (connection error or timeout) and a cached copy exists, serve from cache.
161
+
162
+ **Verify:** After a successful `bundle install` online, disconnecting from the network and running `bundle install` again completes using only cached data.
163
+
164
+ #### FR-3.3: System gem cache fallback for `.gem` files
165
+ Before proxying `GET /gems/:filename.gem` to RubyGems.org, check each path in `Gem.path.map { |p| File.join(p, "cache", filename) }` using `File.exist?` before attempting to read.
166
+ If a matching readable file is found, serve it directly without a network request.
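
A minimal sketch of the lookup, assuming a hypothetical helper `system_cached_gem`. The search directories are a parameter that defaults to `Gem.path`, which keeps the function testable, and the readable check is made explicit in addition to existence.

```ruby
require "fileutils"
require "tmpdir"

# Returns the first readable cached copy of `filename`, or nil.
def system_cached_gem(filename, gem_dirs: Gem.path)
  gem_dirs
    .map { |dir| File.join(dir, "cache", filename) }
    .find { |path| File.file?(path) && File.readable?(path) }
end

# Demonstration against a throwaway directory instead of the real gem path.
found, missing = Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "cache"))
  File.binwrite(File.join(dir, "cache", "rake-13.0.6.gem"), "gem bytes")
  [system_cached_gem("rake-13.0.6.gem", gem_dirs: [dir]),
   system_cached_gem("absent-1.0.0.gem", gem_dirs: [dir])]
end
```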
167
+
168
+ **Verify:** A `.gem` file present in the system gem cache is served without an outbound RubyGems.org request.
169
+
170
+ #### FR-3.4: Response semantics for missing or unreachable upstream
171
+ Distinguish three cases for upstream gem requests:
172
+
173
+ - **Upstream reachable, gem not found** (upstream returns 4xx): return 404 with no body.
174
+ - **Upstream unreachable** (connection error, timeout) **+ cache exists**: serve from cache.
175
+ - **Upstream unreachable + no cache**: return 503 with body `"Upstream unavailable and no local cache. Connect to the internet and run bundle install to warm the cache."`.
176
+
177
+ Do not return 500 in any of these cases.
178
+
179
+ **Verify:** With RubyGems.org blocked and no cache, a request to `/info/nonexistent-gem` returns 503, not 500.
180
+
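The three cases in FR-3.4 map naturally onto a small decision function. This is a sketch only: `upstream_response`, the `:not_found` / `:unreachable` symbols, and the `content-type` header are assumptions, not names taken from the implementation.

```ruby
# 503 body text as specified in FR-3.4.
UNAVAILABLE_BODY = "Upstream unavailable and no local cache. " \
                   "Connect to the internet and run bundle install to warm the cache."

# upstream_status: :not_found (upstream answered 4xx) or :unreachable
#                  (connection error or timeout).
# cached_body:     String with the cached copy, or nil when nothing is cached.
# Returns a Rack-style [status, headers, body] triple.
def upstream_response(upstream_status, cached_body)
  case upstream_status
  when :not_found
    [404, {}, []]
  when :unreachable
    if cached_body
      [200, { "content-type" => "text/plain" }, [cached_body]]
    else
      [503, { "content-type" => "text/plain" }, [UNAVAILABLE_BODY]]
    end
  else
    raise ArgumentError, "unexpected upstream status: #{upstream_status.inspect}"
  end
end
```

None of the branches can produce a 500, which is the property the Verify line checks.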
181
+ #### FR-3.5: `/names` endpoint
182
+ Fetch `https://rubygems.org/names` and cache the raw response body under `cache_dir/rubygems_cache/names` with the same 60-minute TTL and conditional GET refresh as `/info/:gemname`.
183
+ Merge local private gem names with the cached upstream names; sort the combined list alphabetically.
184
+ Write the merged result to `cache_dir/rubygems_cache/names.merged`; keep only its SHA256 hex digest in memory as the current ETag.
185
+ Regenerate `names.merged` on each upload or when the upstream names ETag changes.
186
+ Serve `/names` by streaming `names.merged` from disk.
187
+
188
+ **Verify:** `/names` includes both a known private gem name and a known public gem name.
189
+
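A rough sketch of the merge step in FR-3.5. It assumes the upstream `/names` body is a `---` header line followed by one gem name per line; the function name `merge_names` is hypothetical.

```ruby
require "digest"

# Merge upstream gem names with local private gem names.
# Returns [merged_body, etag], where etag is the SHA256 hex digest of the
# merged body (the only value FR-3.5 keeps in memory).
def merge_names(upstream_body, private_names)
  upstream = upstream_body.lines.map(&:chomp).reject { |line| line.empty? || line == "---" }
  merged = (upstream + private_names).uniq.sort
  body = "---\n" + merged.join("\n") + "\n"
  [body, Digest::SHA256.hexdigest(body)]
end
```

In the flow described above, the merged body would be written to `names.merged` and the digest kept as the current ETag.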
190
+ ### Architectural Requirements
191
+
192
+ #### AR-3.1: HTTP client for proxying
193
+ Use `Net::HTTP` (stdlib) for all outbound RubyGems.org requests.
194
+ Do not add `faraday`, `httpclient`, or other HTTP client gems for proxy use.
195
+
196
+ #### AR-3.2: Proxy timeout
197
+ Outbound requests use a 5-second open timeout and a 10-second read timeout.
198
+ Timeout errors are treated as unreachable (see FR-3.4).
199
+
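The timeouts above and the conditional GET from FR-3.2 might be combined roughly as below. All method names and the returned hash shape are hypothetical. Treating an upstream 5xx as unreachable is an assumption of this sketch; the spec only names connection errors and timeouts.

```ruby
require "net/http"
require "uri"

# Errors treated as "upstream unreachable" (the exact list is an assumption).
UNREACHABLE_ERRORS = [
  Net::OpenTimeout, Net::ReadTimeout, SocketError,
  Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::EHOSTUNREACH, Errno::ENETUNREACH
].freeze

# Build a GET that carries If-None-Match when a cached ETag is known.
def build_conditional_get(uri, etag)
  request = Net::HTTP::Get.new(uri)
  request["If-None-Match"] = etag if etag
  request
end

# Map a Net::HTTP response onto the outcomes the cache layer cares about.
def classify_response(response)
  case response
  when Net::HTTPNotModified then :not_modified
  when Net::HTTPSuccess     then :ok
  when Net::HTTPClientError then :not_found
  else :unreachable # assumption: 5xx and anything unexpected
  end
end

def fetch_upstream(url, etag: nil)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = uri.scheme == "https"
  http.open_timeout = 5   # seconds, per AR-3.2
  http.read_timeout = 10  # seconds, per AR-3.2
  response = http.request(build_conditional_get(uri, etag))
  status = classify_response(response)
  return { status: status } unless status == :ok

  { status: :ok, body: response.body, etag: response["ETag"] }
rescue *UNREACHABLE_ERRORS
  { status: :unreachable }
end
```

A `:not_modified` result corresponds to the 304 case, where only the local TTL is reset.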
200
+ #### AR-3.3: Gem binary streaming
201
+ Proxy and cache responses for `.gem` file downloads using bounded buffering or streaming rather than reading the full binary into memory before sending.
202
+ RubyGems.org gem files range from a few KB to tens of MB.
203
+
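One way to satisfy AR-3.3 on the serving side is a Rack body object that yields a cached `.gem` file in fixed-size chunks. The class name `ChunkedFileBody` and the 16 KB chunk size are assumptions for illustration.

```ruby
# Rack response body that reads a file in bounded chunks instead of
# loading the whole binary into memory.
class ChunkedFileBody
  CHUNK_SIZE = 16 * 1024

  def initialize(path)
    @path = path
  end

  # Rack calls each and writes every yielded String to the client.
  def each
    File.open(@path, "rb") do |file|
      while (chunk = file.read(CHUNK_SIZE))
        yield chunk
      end
    end
  end
end
```

On the download side, `Net::HTTPResponse#read_body` accepts a block and yields the body in fragments, so an upstream gem can be written to a temp file piece by piece in the same bounded fashion.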
204
+ ---
205
+
206
+ ## Feature 4: Gemkeeper Integration
207
+
208
+ **Who & why:** The server is an implementation detail inside gemkeeper.
209
+ All existing CLI commands, upload flow, list command, server lifecycle, and mirror configuration must continue to work without changes to their respective classes.
210
+
211
+ ### Functional Requirements
212
+
213
+ #### FR-4.1: config.ru generation
214
+ `RackupProcess#config_ru_content` (`lib/gemkeeper/rackup_process.rb`) is updated to generate a config.ru that requires `Gemkeeper::CompactIndexServer` and mounts it, passing `gems_path` and `cache_dir`.
215
+ All Geminabox configuration is removed.
216
+
217
+ **Verify:** The generated `config.ru` contains no references to `Geminabox`; the server starts and responds normally.
218
+
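A sketch of what the generated `config.ru` might look like. The standalone method signature and the keyword arguments passed to `Gemkeeper::CompactIndexServer.new` are assumptions; FR-4.1 only states that `gems_path` and `cache_dir` are passed.

```ruby
# Hypothetical standalone version of RackupProcess#config_ru_content.
def config_ru_content(gems_path:, cache_dir:)
  <<~RUBY
    # frozen_string_literal: true

    require "gemkeeper/compact_index_server"

    run Gemkeeper::CompactIndexServer.new(
      gems_path: #{gems_path.inspect},
      cache_dir: #{cache_dir.inspect}
    )
  RUBY
end
```

Using `inspect` for the paths keeps them correctly quoted and escaped in the generated file.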
219
+ #### FR-4.2: Upload API compatibility — no changes to `GemUploader`
220
+ `lib/gemkeeper/gem_uploader.rb` is unchanged.
221
+ The server's `POST /upload` endpoint accepts the same multipart payload and returns status codes compatible with `GemUploader#handle_response`: 200, 201, or 302 for success; 409 for conflict.
222
+ Return 201 for a new upload; 409 if the gem already exists.
223
+
224
+ **Verify:** `gemkeeper sync` uploads gems without error; a second sync of the same version produces a skip (409 → already-exists path).
225
+
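For illustration, the status-code contract in FR-4.2 can be written as a small classifier. This is not the actual `GemUploader#handle_response`; the function name and result symbols are hypothetical.

```ruby
# Classify an upload response code according to the contract in FR-4.2.
def classify_upload_status(code)
  case code
  when 200, 201, 302 then :uploaded
  when 409           then :already_exists
  else :error
  end
end
```

With the new server, a first upload would yield `:uploaded` (201) and a repeat of the same version `:already_exists` (409).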
226
+ #### FR-4.3: List command compatibility — no changes to list
227
+ `gemkeeper list` reads `Dir.glob(File.join(gems_path, "gems", "*.gem"))` directly from the filesystem.
228
+ The custom server stores uploaded gems under `gems_path/gems/`, matching the current directory structure.
229
+
230
+ **Verify:** `gemkeeper list` output is unchanged after migration.
231
+
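The filesystem read that `gemkeeper list` relies on can be sketched like this. The function name `local_gem_names` is hypothetical; only the glob pattern comes from the spec.

```ruby
# List stored gems by reading gems_path/gems/*.gem directly from disk.
def local_gem_names(gems_path)
  Dir.glob(File.join(gems_path, "gems", "*.gem"))
     .map { |path| File.basename(path, ".gem") }
     .sort
end
```

Because the listing never touches the server, it keeps working as long as uploads land in the same directory.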
232
+ #### FR-4.4: Server lifecycle — no changes to `ServerManager`, `ServerReadinessProbe`, `BundlerMirrorConfigurator`
233
+ These classes are Rack-server-agnostic and require no modifications.
234
+
235
+ **Verify:** `gemkeeper server start`, `gemkeeper server stop`, and `gemkeeper server status` all behave identically before and after migration.
236
+
237
+ ### Architectural Requirements
238
+
239
+ #### AR-4.1: New server class location
240
+ `Gemkeeper::CompactIndexServer` is implemented in `lib/gemkeeper/compact_index_server.rb` as a Rack application (responds to `call(env)`).
241
+ It is instantiated and `run` in the generated `config.ru`.
242
+ It is not required anywhere else in the gemkeeper library.
243
+
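A minimal Rack-shaped skeleton, to show what "responds to `call(env)`" means in practice. The class name `SketchApp`, the route symbols, and the placeholder responses are assumptions; the real `Gemkeeper::CompactIndexServer` will differ.

```ruby
# Skeleton Rack application: call(env) returns [status, headers, body].
class SketchApp
  # Parameter pattern quoted in this spec for path parameters.
  NAME_PATTERN = /\A[a-zA-Z0-9._-]+\z/

  def call(env)
    route = route_for(env["REQUEST_METHOD"], env["PATH_INFO"].to_s)
    return [404, {}, []] unless route

    # Placeholder body: a real server would dispatch to a handler here.
    [200, { "content-type" => "text/plain" }, ["#{route}\n"]]
  end

  # Map method and path onto a route symbol, or nil when nothing matches.
  def route_for(method, path)
    return :upload if method == "POST" && path == "/upload"
    return nil unless method == "GET"

    case path
    when "/versions" then :versions
    when "/names" then :names
    when %r{\A/info/([^/]+)\z}
      :info if Regexp.last_match(1).match?(NAME_PATTERN)
    when %r{\A/gems/([^/]+)\.gem\z}
      :gem if Regexp.last_match(1).match?(NAME_PATTERN)
    end
  end
end
```

In a `config.ru`, an instance of such a class is what gets handed to `run`.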
244
+ ---
245
+
246
+ ## Data Requirements
247
+
248
+ The `rubygems_cache/` directory layout under `cache_dir`:
249
+
250
+ ```
251
+ cache_dir/
252
+   rubygems_cache/
253
+     versions # raw upstream /versions body
254
+     versions.merged # merged upstream + private gems (served to Bundler)
255
+     versions.meta # sidecar: upstream ETag + fetched_at timestamp
256
+     names # raw upstream /names body
257
+     names.merged # merged upstream + private gem names (served to Bundler)
258
+     names.meta # sidecar: upstream ETag + fetched_at timestamp
259
+     info/
260
+       <gemname> # raw upstream /info/:gemname body
261
+       <gemname>.meta # sidecar: upstream ETag + fetched_at timestamp
262
+     gems/
263
+       <name>-<version>.gem # cached gem binaries (permanent)
264
+ ```
265
+
266
+ Sidecar `.meta` files are written atomically alongside the body file.
267
+
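The spec does not fix an on-disk format for the `.meta` sidecars. As one possibility, assuming a small JSON document with `etag` and `fetched_at` keys, reading and writing could look like this (both helper names are hypothetical):

```ruby
require "json"
require "time"

# Write the sidecar via temp file + rename so readers never see a partial file.
def write_meta(meta_path, etag:, fetched_at: Time.now)
  payload = JSON.generate("etag" => etag, "fetched_at" => fetched_at.utc.iso8601)
  tmp = "#{meta_path}.tmp"
  File.write(tmp, payload)
  File.rename(tmp, meta_path)
end

# Returns { etag:, fetched_at: } or nil when the sidecar is missing or unreadable.
def read_meta(meta_path)
  data = JSON.parse(File.read(meta_path))
  { etag: data["etag"], fetched_at: Time.iso8601(data["fetched_at"]) }
rescue Errno::ENOENT, JSON::ParserError, ArgumentError, TypeError
  nil
end
```

Treating a missing or corrupt sidecar as "no metadata" lets the cache fall back to an unconditional fetch rather than failing.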
268
+ ---
269
+
270
+ ## Integration Points
271
+
272
+ | File | Change |
273
+ | ---- | ------ |
274
+ | `lib/gemkeeper/rackup_process.rb` | Replace `config_ru_content` |
275
+ | `lib/gemkeeper/compact_index_server.rb` | New file — the Rack app |
276
+ | `gemkeeper.gemspec` | Remove `geminabox ~> 3.0` and `rubygems-generate_index ~> 1.0`; add `compact_index ~> 0.15` |
277
+ | `test/integration/test_server_lifecycle_integration.rb` | Update config.ru content assertions (lines 80–81) |
278
+ | `CLAUDE.md` | Update Architecture section; remove Geminabox references |
279
+ | `AGENTS.md` | Same updates as CLAUDE.md |
280
+
281
+ ### Known dead code after migration
282
+ `GemUploader#list_gems` calls `GET /api/v1/gems.json`, a Geminabox-specific endpoint the new server does not implement.
283
+ The method is unused in production (the list CLI reads the filesystem directly).
284
+ Either remove the method or make it raise `NotImplementedError`; do not leave a silently broken public method.
285
+
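If the raise option is chosen, it might look like the sketch below. `UploaderSketch` is a stand-in class, not the real `GemUploader`, and the message text is an assumption.

```ruby
# Stand-in for the uploader class, showing only the dead method.
class UploaderSketch
  def list_gems
    raise NotImplementedError,
          "GET /api/v1/gems.json is not served by the compact index server; " \
          "read gems_path/gems/*.gem from the filesystem instead"
  end
end
```

Note that in Ruby `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` in calling code will not swallow it.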
286
+ ## Related Specs
287
+
288
+ None — this is a standalone infrastructure replacement.
289
+
290
+ ## Constraints
291
+
292
+ - No changes to `gem_uploader.rb`, `server_manager.rb`, `server_readiness_probe.rb`, `bundler_mirror_configurator.rb`, or `configuration.rb`, apart from the `GemUploader#list_gems` cleanup described under Known dead code
293
+ - No new runtime dependencies beyond `compact_index ~> 0.15`
294
+ - `POST /upload` multipart API must remain compatible with `GemUploader`
295
+ - Gem storage path (`gems_path/gems/*.gem`) must remain unchanged so `gemkeeper list` is unaffected
296
+
297
+ ## Out of Scope
298
+
299
+ - Authentication for uploads or downloads
300
+ - HTTPS/TLS
301
+ - Yanking gems
302
+ - Proxying sources other than rubygems.org
303
+ - The `gemkeeper manifest`, `gemkeeper setup`, or `gemkeeper sync` internals
304
+ - Serving legacy index formats (`Marshal.4.8.gz`, `specs.4.8.gz`)
305
+ - `GET /api/v1/gems.json` (Geminabox-specific; unused in production after `list_gems` removal)
306
+
307
+ ## Spec Completeness Checklist
308
+
309
+ - [x] **Scope & acceptance criteria** — each FR has a Verify line; Out of Scope list is explicit; blocking ambiguities from critique resolved
310
+ - [x] **Testing strategy** — FRs reference existing tests (FR-4.2, FR-4.4); integration verify conditions cover server start, upload round-trip, offline cache, and Bundler resolution; `test_compact_index_server.rb` implied by AR-4.1 convention (one test file per class)
311
+ - [x] **Existing patterns** — references `GemUploader`, `ServerReadinessProbe`, `ServerManager`, `Dir.glob` list pattern, `Gem::Package` extraction, and existing storage path conventions throughout
312
+ - [x] **Dependencies** — `compact_index ~> 0.15` justified in AR-2.1; `rubygems-generate_index` removal explicit; `Net::HTTP` (stdlib) chosen in AR-3.1; no other additions
313
+ - [x] **Architecture & interfaces** — Rack app interface in AR-4.1; storage layout in Data Requirements; proxy HTTP client in AR-3.1/AR-3.2; config.ru generation in FR-4.1; cache layout in Data Requirements; atomic upload in FR-1.2; atomic index swap in AR-1.1
314
+ - [x] **Error handling & failure modes** — FR-3.4 distinguishes upstream 404 vs 503; FR-1.2 covers malformed upload and 422; FR-2.1 covers corrupt gem at startup; AR-1.1 covers concurrent read/write; FR-1.3 covers invalid range (416)
315
+ - [x] **Security review** — FR-1.1 adds path parameter validation (`/\A[a-zA-Z0-9._-]+\z/`) and path-under-gems_path assertion; FR-1.2 prohibits gemspec eval; AR-3.1 scopes proxy to rubygems.org only; localhost-only binding inherited from `RackupProcess`
316
+ - [x] **Performance impact** — merged `/versions` (~23 MB) and `/names` (~2.7 MB) written to disk and streamed; only SHA256 ETag strings held in memory (FR-3.1, FR-3.5); gem binary proxy streamed per AR-3.3; private gem index is small and negligible
317
+ - [x] **Rollout & migration** — drop-in replacement; no data migration; existing `gems_path/gems/` reused; Homebrew formula rebuild required; `list_gems` dead method called out explicitly
318
+ - [x] **Assumptions & risks** — `compact_index` 0.15.x field names flagged for pre-implementation verification (FR-2.2); Bundler `Range`/`Repr-Digest` strictness addressed in FR-1.3; `/versions` byte-stability addressed in FR-3.1
319
+
320
+ ---
321
+
322
+ ## Change Log
323
+
324
+ ### Update from `critique-consolidated-v-1.md`
325
+
326
+ **Applied:**
327
+ - B-1: Specified `/versions` merge algorithm — upstream verbatim block first via `VersionsFile`, private gems as `extra_gems`, collision = suppress public entry, byte-stable layout (FR-3.1)
328
+ - B-2: Specified ETag and `Repr-Digest` computed from merged body; SHA256 only; no forwarding of upstream headers for merged responses (FR-1.3)
329
+ - B-3: Added `info_checksum` generation ordering — info bodies first, checksums embedded before versions index is built (FR-2.2)
330
+ - B-4: Added `/names` as a full fetch/cache/merge endpoint matching `/versions` semantics (FR-3.5)
331
+ - G-1: Added URL parameter validation (`/\A[a-zA-Z0-9._-]+\z/`) and path-containment assertion to FR-1.1
332
+ - G-2: Defined "valid gem" as tar-parseable with extractable metadata; no eval (FR-1.2)
333
+ - G-3: Added `mkdir_p`, atomic temp-file write, and temp cleanup to FR-1.2
334
+ - G-4: Added AR-1.1 specifying atomic index swap (copy-on-write) for thread safety
335
+ - G-5: Replaced FR-3.4 with three-way distinction: upstream 404 → 404; unreachable + cache → serve cache; unreachable + no cache → 503
336
+ - G-6: Added `GemUploader#list_gems` dead-method callout to Integration Points; `/api/v1/gems.json` added to Out of Scope
337
+ - G-7: Added `rubygems-generate_index` to AR-2.1 as dependency to remove; added to Integration Points table
338
+ - Corrected FR-4.2 response codes to match actual `GemUploader#handle_response`: 200/201/302 success, 409 conflict
339
+ - Added AR-3.3 requiring gem binary streaming to avoid full-file memory allocation
340
+ - Added Data Requirements section with `rubygems_cache/` directory layout and sidecar metadata files
341
+ - Added AGENTS.md to Integration Points (both CLAUDE.md and AGENTS.md exist in repo)
342
+ - Clarified FR-3.2 cache write atomicity and conditional GET (If-None-Match) refresh behavior
343
+
344
+ ### Pre-implementation compact_index API verification
345
+
346
+ **Applied:**
347
+ - Corrected `info_checksum` hash algorithm from SHA256 to MD5 — the protocol spec and `compact_index` gem both use `Digest::MD5` for this field; Bundler verifies it on download
348
+ - Confirmed `GemVersion` field is `number` (not `version`); documented full struct signature
349
+ - Confirmed `Dependency` fields: `:gem` for the dep name, `:version` for the constraint
350
+ - Confirmed collision suppression works via last-wins semantics — `VersionsFile#contents` appends `extra_gems` verbatim; no pre-filtering of upstream file needed
351
+ - Improved FR-3.4 503 body to include actionable guidance for cold-start offline case
352
+
353
+ **Rejected:**
354
+ - "Set Puma thread count to 1" — over-specifies implementation; AR-1.1's atomic swap is the correct architectural constraint
355
+ - "Add `test/gemkeeper/test_compact_index_server.rb` as an explicit FR" — the one-test-file-per-class convention is already established in the project; calling it out in the spec over-specifies test structure
356
+
357
+ **Reorganized:**
358
+ - Split old FR-3.1 into FR-3.1 (merge algorithm) and FR-3.5 (/names endpoint) for clarity
359
+ - Moved `gems_path/gems/` creation from an implicit assumption into FR-1.2 and AR-1.1 explicitly
360
+ - Added Data Requirements section to centralize the cache directory layout (previously scattered across FR-3.1 and FR-3.2)