npm - @legioncodeinc/rflectr - Versions diffs - 0.1.0 → 0.1.1 - Mend

@legioncodeinc/rflectr 0.1.0 → 0.1.1


Files changed (34)

package/library/requirements/completed/prd-011-claude-desktop-integration/prd-011-claude-desktop-integration-index.md ADDED

@@ -0,0 +1,228 @@
+# PRD-011: Claude Desktop Integration *(Retroactive)*
+> **Status:** Shipped
+> **Priority:** —
+> **Effort:** —
+> **Written:** June 2026
+> **Retroactive:** Yes — written after implementation (rflectr v0.2.7).
+> **Source:** `src/claude-desktop/*`, `src/claude-app.ts`
+---
+## Overview
+`rflectr claude-app` launches the **Claude Desktop** app in third-party-inference ("3P") mode, pointed at a local rflectr gateway instead of Anthropic's servers. The user picks a provider + model (or a favorites catalog), rflectr starts an in-process gateway server on a random local port, writes a gateway config into Claude Desktop's on-disk 3P config library, and opens (or restarts) the app. On exit, rflectr restores the original config.
+The defining constraint of this surface: **a desktop app cannot inherit environment variables** the way a launched CLI can. The CLI launchers (`rflectr claude`, `codex`, `gemini`) point the host at the proxy purely through child-process env vars and never touch a config file (see [PRD-001 — CLI Core & Launch Orchestration](../prd-001-cli-core-launch-orchestration/prd-001-cli-core-launch-orchestration-index.md)). Claude Desktop has no such hook. So this is the one surface where rflectr **writes the host application's own config file** — and therefore must back it up and restore it on exit, guarded by a lock file to survive crashes. This is the deliberate exception to rflectr's env-only isolation contract.
+Entry point: `runClaudeAppCommand` (`src/claude-app.ts:61`). Supported on **macOS and Windows only** (`claudeAppSupported`, `src/claude-desktop/app-launch.ts:11`).
+> See also: [`harnesses.md`](../../../knowledge/private/integrations/harnesses.md) (private integration notes — the "where each host departs from the pattern" overview) and [`claude-desktop.md`](../../../knowledge/public/guides/claude-desktop.md) (public user-facing setup guide).
+---
+## What Was Built
+- A `rflectr claude-app` command that, on macOS/Windows, drives Claude Desktop into 3P mode against a local gateway with no manual config editing.
+- A **config-write** path: a `<uuid>.json` gateway config is written into the Claude Desktop 3P config library, and `_meta.json`'s `appliedId` is repointed at it so the app picks it up on next launch (`writeRflectrConfig`, `src/claude-desktop/app-config.ts:58`).
+- A **backup + restore** path: `_meta.json` is copied to `_meta.json.bak` before the patch, and a `.rflectr.lock` file records the live session. On clean exit, crash recovery, or `--restore`, the backup is restored and the injected config removed (`src/claude-desktop/app-session.ts`).
+- **Session lifecycle** handling: concurrent-session detection, stale-session recovery on startup, SIGINT/SIGTERM-driven shutdown, and a `process.on('exit')` cleanup hook.
+- **Model selection** reusing the Codex provider/model pickers, plus a favorites-catalog mode driven by saved preferences.
+- The gateway itself is the full `server` gateway (`startServer`, see [PRD-012 — Server Gateway](../prd-012-server-gateway/prd-012-server-gateway-index.md)), serving `/anthropic` with a synthetic model catalog — not a bespoke per-protocol proxy.
+---
+## Goals
+- Let a user point Claude Desktop's **Cowork** and **Code** tabs at registry providers, OpenCode Zen, or OpenCode Go with one command and no manual Developer-menu configuration.
+- Never leave Claude Desktop's config in a broken 3P state after the rflectr session ends — restore the prior config on exit and after crashes.
+- Keep the model catalog under the user's control (single selected model, or a favorites-only catalog).
+- Reuse the existing `server` gateway and SDK translation layer rather than building a Claude-Desktop-specific proxy.
+## Non-Goals
+- **Linux support.** macOS and Windows only; `claudeAppSupported()` throws otherwise (`src/claude-desktop/app-launch.ts:11`).
+- **Restoring Claude Desktop to first-party (Anthropic sign-in / Chat tab) mode.** rflectr restores the *pre-session* 3P config state; full 1P revert is a documented manual procedure in the user guide ([`claude-desktop.md`](../../../knowledge/public/guides/claude-desktop.md) → "Restore Claude Desktop to Anthropic's servers").
+- **The Chat tab.** With 3P inference, Claude Desktop offers only Cowork and Code; Chat is an Anthropic product constraint, not an rflectr limitation.
+- **Editing `settings.json`-style host config beyond the 3P config library.** rflectr only writes the `<uuid>.json` and `_meta.json` under `Claude-3p/configLibrary/`.
+- **Mid-session model switching** inside the running app (the Gemini CLI surface has a `.model` switch; Claude Desktop does not).
+---
+## Features
+| # | Feature | Implementation |
+|---|---------|----------------|
+| F1 | `rflectr claude-app` command + help/restore/trace flags | `runClaudeAppCommand`, `claudeAppHelpText` (`src/claude-app.ts:27`, `src/claude-app.ts:61`) |
+| F2 | Platform gate (macOS/Windows only) | `claudeAppSupported` (`src/claude-desktop/app-launch.ts:11`) |
+| F3 | App discovery (Claude.app / Claude.exe / Start menu) | `findClaudeApp` (`src/claude-desktop/app-launch.ts:68`) |
+| F4 | Launch or restart the app | `launchOrRestartClaudeApp` (`src/claude-desktop/app-launch.ts:198`) |
+| F5 | Provider + model picker (reuses Codex pickers) | `pickCodexProvider` / `pickCodexModel` via `src/claude-app.ts:127`–`134` |
+| F6 | Favorites-catalog mode | `favorites.length > 0` branch + `filterServerModelsByFavorites` (`src/claude-app.ts:121`, `src/claude-app.ts:153`) |
+| F7 | Gateway config write into 3P config library | `writeRflectrConfig` (`src/claude-desktop/app-config.ts:58`) |
+| F8 | `_meta.json` backup before patch | `backupMetaJson` (`src/claude-desktop/app-session.ts:43`) |
+| F9 | Lock file (pid / startedAt / uuid / proxyPort) | `ClaudeSessionLock`, `writeSessionLock` (`src/claude-desktop/app-session.ts:7`, `:28`) |
+| F10 | Concurrent + stale session detection | `isConcurrentLiveSession`, `hasStaleSession` (`src/claude-desktop/app-session.ts:76`, `:67`) |
+| F11 | Shutdown wait (SIGINT/SIGTERM) + exit-hook cleanup | `waitForShutdown`, `setupExitCleanup` (`src/claude-desktop/app-session.ts:94`, `:119`) |
+| F12 | Restore on exit / crash / `--restore` | `cleanupSession`, `recoverSession` (`src/claude-desktop/app-session.ts:113`, `:82`) |
+| F13 | Local gateway (the `server` gateway, `/anthropic`) | `startServer` + `createGatewayModelCatalog` (`src/claude-app.ts:183`) |
+| F14 | Recent-models persistence per provider | `savePreferences` recent-models update (`src/claude-app.ts:206`–`212`) |
+---
+## Architecture & Implementation
+### Where this surface departs from the env-only rule
+CLI hosts are pointed at the proxy through child-process environment variables only (PRD-001). A desktop app cannot inherit environment variables the way a launched CLI can; Claude Desktop instead reads its 3P settings from a **config file** at launch. rflectr therefore writes that config and, to keep the rule "never permanently mutate the host" intact, pairs every write with a backup and a guaranteed restore.
+```mermaid
+flowchart TD
+    cli["CLI hosts (claude, codex, gemini)"] -->|child-process env vars| proxy["translating proxy / gateway"]
+    app["Desktop app (claude-app)"] -->|"writes &lt;uuid&gt;.json + repoints _meta.json"| proxy
+    app -.->|"_meta.json.bak + .rflectr.lock"| restore["restore original config on exit / crash"]
+    proxy --> gw["server gateway /anthropic + SDK adapter"]
+```
+### Gateway config shape
+`buildRflectrConfig(proxyPort)` (`src/claude-desktop/app-config.ts:48`) produces the 3P gateway profile:
+```jsonc
+{
+  "inferenceProvider": "gateway",
+  "inferenceGatewayBaseUrl": "http://127.0.0.1:<port>/anthropic",
+  "inferenceGatewayApiKey": "dummy",
+  "inferenceGatewayAuthScheme": "bearer",
+  "coworkEgressAllowedHosts": ["*"]
+}
+```
+- `inferenceGatewayBaseUrl` ends in `/anthropic` with **no `/v1` suffix** — Claude Desktop appends `/v1/models` and `/v1/messages` itself; a `/anthropic/v1` URL would break discovery and inference (documented in [`claude-desktop.md`](../../../knowledge/public/guides/claude-desktop.md)).
+- The API key is the literal `'dummy'` because the gateway runs in local mode with no server password; the server is started with `apiKey: 'dummy'` / `serverPassword: null` (`src/claude-app.ts:186`–`187`).
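The profile above could be produced by a function along the following lines. This is a minimal sketch, not the code in `src/claude-desktop/app-config.ts`: the `GatewayProfile` type, the `sketchGatewayProfile` name, and the port validation are hypothetical; only the field names and values come from the JSON shown above.

```typescript
// Hypothetical type: field names mirror the JSON profile shown above.
interface GatewayProfile {
  inferenceProvider: "gateway";
  inferenceGatewayBaseUrl: string;
  inferenceGatewayApiKey: string;
  inferenceGatewayAuthScheme: "bearer";
  coworkEgressAllowedHosts: string[];
}

// Illustrative stand-in for buildRflectrConfig(proxyPort); not the shipped code.
function sketchGatewayProfile(proxyPort: number): GatewayProfile {
  // Port validation is an assumption added for the sketch.
  if (!Number.isInteger(proxyPort) || proxyPort < 1 || proxyPort > 65535) {
    throw new RangeError(`invalid port: ${proxyPort}`);
  }
  return {
    inferenceProvider: "gateway",
    // No /v1 suffix: the client appends /v1/models and /v1/messages itself.
    inferenceGatewayBaseUrl: `http://127.0.0.1:${proxyPort}/anthropic`,
    inferenceGatewayApiKey: "dummy",
    inferenceGatewayAuthScheme: "bearer",
    coworkEgressAllowedHosts: ["*"],
  };
}
```

Keeping the base URL construction in one place makes the "no `/v1`" rule hard to violate by accident.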
+### Config write + `_meta.json` repointing
+`writeRflectrConfig(proxyPort)` (`src/claude-desktop/app-config.ts:58`):
+1. Generates a `randomUUID()` and writes `buildRflectrConfig(...)` to `configLibrary/<uuid>.json`.
+2. Reads (or initializes) `_meta.json`, sets `appliedId = uuid`, and appends an `entries` row `{ id: uuid, name: 'Rflectr Gateway' }` if not already present.
+3. Returns the uuid (used as the session/cleanup key).
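Step 2 can be pictured as a pure transformation of the `_meta.json` contents. The sketch below assumes a `_meta.json` shape with only the two fields named above (`appliedId`, `entries`); the `MetaJson` type and `applyGatewayToMeta` name are hypothetical, and the real file may carry other fields, which the spread preserves.

```typescript
// Hypothetical shape of _meta.json, inferred from the steps above.
interface MetaEntry { id: string; name: string }
interface MetaJson { appliedId?: string; entries?: MetaEntry[] }

// Returns a patched copy; the input object is left untouched.
function applyGatewayToMeta(meta: MetaJson, uuid: string): MetaJson {
  const entries = [...(meta.entries ?? [])];
  // Append the entry only if this uuid is not already listed.
  if (!entries.some((e) => e.id === uuid)) {
    entries.push({ id: uuid, name: "Rflectr Gateway" });
  }
  return { ...meta, appliedId: uuid, entries };
}
```

Because the function is idempotent for a given uuid, applying it twice leaves a single `Rflectr Gateway` row.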
+Config roots, by platform (`getClaudeDesktopHome`, `src/claude-desktop/app-config.ts:8`):
+| Platform | 3P config root |
+|---|---|
+| macOS | `~/Library/Application Support/Claude-3p/` |
+| Windows | `%LOCALAPPDATA%\Claude-3p\` |
+The config library and `_meta.json` live under `configLibrary/` within that root (`getConfigLibraryPath`, `getMetaJsonPath`, `src/claude-desktop/app-config.ts:15`, `:19`).
+### Backup / restore via lock files
+The lock and backup are the safety mechanism that makes config-writing reversible:
+- **Backup:** `backupMetaJson()` copies `_meta.json` → `_meta.json.bak` *before* the patch (`src/claude-desktop/app-session.ts:43`). Called at `src/claude-app.ts:181`, before `startServer` and `writeRflectrConfig`.
+- **Lock:** `writeSessionLock({ pid, startedAt, uuid, proxyPort })` writes `.rflectr.lock` in the 3P home (`src/claude-desktop/app-session.ts:28`; shape `ClaudeSessionLock`, `:7`). Written at `src/claude-app.ts:196` right after the config is applied.
+- **Restore:** `restoreMetaJson()` copies the `.bak` back over `_meta.json` and deletes the backup (`src/claude-desktop/app-session.ts:51`). `removeRflectrConfig(uuid)` deletes the injected `<uuid>.json` (`:60`).
+- **Cleanup entry points:** `cleanupSession(uuid)` (clean exit, `:113`) and `recoverSession()` (crash / `--restore`, `:82`) both run restore + config-removal + lock deletion. `setupExitCleanup(uuid)` registers `cleanupSession` on `process.on('exit')` as a last-resort net (`:119`).
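The backup, patch, and cleanup ordering described above can be modelled without touching a real disk. The sketch below is an illustration only: it uses an in-memory `Map` in place of the config directory, flattens the lock and config paths into one namespace, and the names `FileMap`, `backupMetaSketch`, and `cleanupSessionSketch` are hypothetical.

```typescript
// In-memory stand-in for the 3P config files: file name -> contents.
// Paths are flattened for brevity; the real files live in different folders.
type FileMap = Map<string, string>;

// Copy _meta.json to _meta.json.bak before any patch is applied.
function backupMetaSketch(files: FileMap): void {
  const meta = files.get("_meta.json");
  if (meta !== undefined) files.set("_meta.json.bak", meta);
}

// Restore the backup, remove the injected config, delete the lock.
// Every step tolerates missing files, so a second call is a no-op.
function cleanupSessionSketch(files: FileMap, uuid: string): void {
  const bak = files.get("_meta.json.bak");
  if (bak !== undefined) {
    files.set("_meta.json", bak);
    files.delete("_meta.json.bak");
  }
  files.delete(`${uuid}.json`);
  files.delete(".rflectr.lock");
}
```

Making cleanup safe to repeat matters here because more than one path (signal handler, exit hook, `--restore`) may reach it for the same session.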
+### Session lifecycle
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant R as rflectr claude-app
+    participant FS as Claude-3p config
+    participant App as Claude Desktop
+    U->>R: rflectr claude-app
+    R->>R: claudeAppSupported() + TTY check
+    R->>R: isConcurrentLiveSession()? -> abort if live
+    R->>R: hasStaleSession()? -> recoverSession()
+    R->>R: pick provider + model (or favorites)
+    R->>FS: backupMetaJson()  (_meta.json -> .bak)
+    R->>R: startServer() on 127.0.0.1:0
+    R->>FS: writeRflectrConfig(port)  (<uuid>.json + _meta.appliedId)
+    R->>FS: writeSessionLock({pid,uuid,port})
+    R->>R: setupExitCleanup(uuid)
+    R->>App: launchOrRestartClaudeApp()
+    U-->>R: Ctrl+C (SIGINT)
+    R->>R: waitForShutdown() resolves
+    R->>FS: cleanupSession(uuid)  (restore .bak, rm config, rm lock)
+    R->>App: optionally quitClaudeAppGracefully()
+```
+Startup guards (`src/claude-app.ts`):
+- **Interactive-terminal required** — non-TTY aborts (`src/claude-app.ts:84`).
+- **Concurrent session** — `isConcurrentLiveSession()` (lock present + pid alive) aborts with a "stop it with Ctrl+C" message (`src/claude-app.ts:90`; `src/claude-desktop/app-session.ts:76`).
+- **Stale session** — `hasStaleSession()` (lock present + pid dead) triggers `recoverSession()` to clean up a prior crash before proceeding (`src/claude-app.ts:96`).
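The concurrent and stale checks differ only in whether the recorded pid is still alive. A minimal sketch of that decision follows, assuming the lock fields listed for `ClaudeSessionLock`; the `startedAt` type, the `SessionState` union, and `classifySession` are hypothetical, and the liveness probe is injected so the example stays pure.

```typescript
// Fields as listed for ClaudeSessionLock; the startedAt type is an assumption.
interface SessionLock { pid: number; startedAt: string; uuid: string; proxyPort: number }

type SessionState = "none" | "live" | "stale";

// isAlive is injected. In Node, a common probe is process.kill(pid, 0)
// wrapped in try/catch; whether rflectr uses it is not stated here.
function classifySession(
  lock: SessionLock | null,
  isAlive: (pid: number) => boolean,
): SessionState {
  if (lock === null) return "none"; // no lock file: fresh start
  return isAlive(lock.pid) ? "live" : "stale";
}
```

Under this model, `"live"` maps to the abort path and `"stale"` to `recoverSession()` before the normal flow continues.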
+Shutdown ordering (`src/claude-app.ts:232`–`245`): `waitForShutdown()` resolves on SIGINT/SIGTERM, then `cleanupSession(uuid)` runs **before** the optional "close Claude Desktop?" prompt — so the config is restored ASAP and a second Ctrl+C during the prompt finds nothing left to undo (per the inline comment at `src/claude-app.ts:235`).
+### Model selection
+- Providers come from `fetchProviderCatalog({ agent: 'codex-app' })`, filtered by `codexCompatibleProviders(..., 'codex-app')` (`src/claude-app.ts:105`, `:113`). The provider's models are narrowed with `routableModelsForProvider(provider, 'codex-app')` (`providerForClaudePicker`, `src/claude-app.ts:57`).
+- **Single-model mode:** the picked model becomes a one-entry `ServerModelInfo[]` carrying `modelFormat`, `npm`, `apiBaseUrl`, `baseUrl`, `completionsUrl`, `upstreamModelId`, `contextWindow`, and the resolved provider `apiKey` (`src/claude-app.ts:157`–`174`). The credential is resolved via `activeProvider.apiKey` or `resolveProviderCredential(id, authRef)` (`src/claude-app.ts:138`–`148`).
+- **Favorites mode:** when `prefs.favoriteModels.length > 0`, a `__favorites__` picker option loads all server models and filters them with `filterServerModelsByFavorites(allModels, favorites)` (`src/claude-app.ts:153`–`155`).
+- Either way, the model list is wrapped in `createGatewayModelCatalog(serverModels, { maskGatewayIds: true })` and handed to `startServer` (`src/claude-app.ts:188`). Discovery-id masking is on so Claude Desktop's competitor-name filtering doesn't hide models (rationale in [`claude-desktop.md`](../../../knowledge/public/guides/claude-desktop.md) troubleshooting).
+- Recent models per provider are persisted (single-mode only) via `savePreferences` (`src/claude-app.ts:206`–`212`).
+### App discovery & launch (platform specifics)
+`findClaudeApp()` (`src/claude-desktop/app-launch.ts:68`):
+- **macOS:** checks `/Applications/Claude.app` and `~/Applications/Claude.app`, then falls back to `mdfind` by bundle id `com.anthropic.claudefordesktop`.
+- **Windows:** checks `%LOCALAPPDATA%\Programs\Claude` and `%LOCALAPPDATA%\Claude` (including `app-*` subfolders) for `Claude.exe`, then falls back to `Get-StartApps` returning a `shell:AppsFolder\<AppID>` URI.
+`launchOrRestartClaudeApp()` (`src/claude-desktop/app-launch.ts:198`): if the app isn't running, it just opens it; if it is running, it prompts to restart (so the new config is read), quits gracefully (`osascript` on macOS, `CloseMainWindow()` on Windows), waits up to 5s, force-quits Windows PIDs if needed, then reopens. "Is it running?" uses `osascript` (macOS) or PowerShell `Get-Process` / matching-PID checks (Windows) (`isClaudeAppRunning`, `:121`).
+---
+## Acceptance Criteria
+- [x] `rflectr claude-app` launches Claude Desktop in 3P gateway mode on macOS and Windows (`runClaudeAppCommand`, `src/claude-app.ts:61`; `claudeAppSupported`, `src/claude-desktop/app-launch.ts:11`).
+- [x] A `<uuid>.json` gateway config is written into the 3P config library with `inferenceProvider: 'gateway'`, a `/anthropic` base URL (no `/v1`), `bearer` auth, and a `'dummy'` key (`buildRflectrConfig` / `writeRflectrConfig`, `src/claude-desktop/app-config.ts:48`, `:58`).
+- [x] `_meta.json` `appliedId` is repointed at the new uuid and an `entries` row is added (`src/claude-desktop/app-config.ts:65`–`71`).
+- [x] `_meta.json` is backed up to `_meta.json.bak` before the patch (`backupMetaJson`, `src/claude-desktop/app-session.ts:43`; called `src/claude-app.ts:181`).
+- [x] A `.rflectr.lock` records `pid`, `startedAt`, `uuid`, and `proxyPort` (`ClaudeSessionLock` / `writeSessionLock`, `src/claude-desktop/app-session.ts:7`, `:28`).
+- [x] On clean exit (Ctrl+C), the original `_meta.json` is restored, the injected config removed, and the lock deleted (`cleanupSession`, `src/claude-desktop/app-session.ts:113`; invoked `src/claude-app.ts:237`).
+- [x] After a crash, a stale session is detected and cleaned up on next run, and `--restore` performs the same recovery (`hasStaleSession` + `recoverSession`, `src/claude-desktop/app-session.ts:67`, `:82`; `--restore` at `src/claude-app.ts:67`).
+- [x] A concurrent live session is detected and the second invocation aborts (`isConcurrentLiveSession`, `src/claude-desktop/app-session.ts:76`; `src/claude-app.ts:90`).
+- [x] Exit-hook cleanup is registered so an abrupt `process.exit` still restores config (`setupExitCleanup`, `src/claude-desktop/app-session.ts:119`; `src/claude-app.ts:204`).
+- [x] Both a single selected model and a favorites-only catalog can drive the gateway (`src/claude-app.ts:153`–`174`).
+- [x] The gateway is the shared `server` gateway serving `/anthropic`, not a bespoke proxy (`startServer` + `createGatewayModelCatalog`, `src/claude-app.ts:183`–`192`).
+- [x] Non-interactive terminals are rejected (`src/claude-app.ts:84`).
+---
+## Files
+| File | Role |
+|---|---|
+| `src/claude-app.ts` | Command entry: `runClaudeAppCommand`, help text, picker orchestration, server start, session wiring, shutdown |
+| `src/claude-desktop/app-config.ts` | 3P config paths, `buildRflectrConfig`, `writeRflectrConfig`, `_meta.json` read/write |
+| `src/claude-desktop/app-session.ts` | Lock file + backup/restore: `backupMetaJson`, `restoreMetaJson`, `removeRflectrConfig`, lock read/write, stale/concurrent detection, `cleanupSession`, `recoverSession`, `waitForShutdown`, `setupExitCleanup` |
+| `src/claude-desktop/app-launch.ts` | App discovery + launch/restart/quit per platform: `claudeAppSupported`, `findClaudeApp`, `isClaudeAppRunning`, `launchOrRestartClaudeApp`, `quitClaudeAppGracefully` |
+---
+## Risks & Known Limitations
+- **Config-editing exception to the env-only rule.** Unlike every CLI launcher, this surface writes the host application's config file. Reversibility depends entirely on the backup (`_meta.json.bak`) + lock (`.rflectr.lock`) machinery. If the backup or lock is lost, manual recovery (the documented 1P-revert procedure) is required.
+- **Backup granularity.** Only `_meta.json` is backed up; the injected `<uuid>.json` is removed by uuid on cleanup. If `_meta.json` is mutated by Claude Desktop itself between backup and restore, restore overwrites those changes with the pre-session snapshot.
+- **`process.on('exit')` constraints.** The exit hook runs synchronous cleanup; it cannot await async work. A hard kill (`SIGKILL`) bypasses both the SIGINT/SIGTERM handler and the exit hook, leaving a stale session for the next-run / `--restore` recovery to clean up.
+- **Linux unsupported.** `claudeAppSupported()` throws on any non-darwin/non-win32 platform.
+- **No mid-session model switch.** The catalog is fixed at launch; changing models means restarting the session.
+- **Chat tab unavailable in 3P mode** (Anthropic product constraint). Claude in Chrome is also incompatible with a gateway. Both documented in [`claude-desktop.md`](../../../knowledge/public/guides/claude-desktop.md).
+- **Full 1P revert is manual.** rflectr restores the pre-session 3P config but does not return Claude Desktop to Anthropic sign-in; the user guide documents the multi-step manual revert.
+---
+## Related
+- [`harnesses.md`](../../../knowledge/private/integrations/harnesses.md) — private integration notes (the host-departure pattern, platform-differences table).
+- [`claude-desktop.md`](../../../knowledge/public/guides/claude-desktop.md) — public setup, gateway cheat sheet, restore-to-1P, troubleshooting.
+- [PRD-009 — Codex Integration](../prd-009-codex-integration/prd-009-codex-integration-index.md) — sibling desktop-app surface (`codex-app`) using the same config-patch + backup/restore-via-lock pattern.
+- [PRD-005 — Local Proxy & Catalog Routing](../prd-005-local-proxy-catalog-routing/prd-005-local-proxy-catalog-routing-index.md) — the proxy/translation machinery the gateway builds on.
+- [PRD-012 — Server Gateway](../prd-012-server-gateway/prd-012-server-gateway-index.md) — the `startServer` gateway that serves `/anthropic` for Claude Desktop.
+- [PRD-001 — CLI Core & Launch Orchestration](../prd-001-cli-core-launch-orchestration/prd-001-cli-core-launch-orchestration-index.md) — the env-only isolation contract this surface deliberately departs from.

package/library/requirements/completed/prd-012-server-gateway/prd-012-server-gateway-index.md ADDED

@@ -0,0 +1,356 @@
+# PRD-012: Server Gateway *(Retroactive)*
+> **Status:** Shipped
+> **Priority:** —
+> **Effort:** —
+> **Written:** June 2026
+> **Retroactive:** Yes — written after implementation (rflectr v0.2.7).
+> **Source:** `src/server/*`
+---
+## Overview
+Every other rflectr command is short-lived: it starts a proxy, spawns a coding
+agent as a child process, and tears everything down on exit. `rflectr server`
+inverts that model. It runs a **long-lived, foreground HTTP gateway** that
+exposes the same model backends — OpenCode Zen, OpenCode Go, every materialized
+local registry provider, or Claude on Google Vertex AI — behind both an
+**Anthropic-compatible** and an **OpenAI-compatible** endpoint on a single port
+(default **17645**).
+The gateway is the backend for [PRD-011 Claude Desktop Integration](../prd-011-claude-desktop-integration/prd-011-claude-desktop-integration-index.md)
+and for any tool that can be pointed at a base URL (e.g. THE AI Counsel, OpenAI-compatible
+editor extensions). It reuses the same translation core as the CLI proxies
+(PRD-004 / PRD-005): anthropic-format models forward raw to the provider's
+`/v1/messages`; openai-format models route through the shared Vercel AI SDK
+adapter via `createLanguageModel`. There is no second translation path.
+`runServerCommand(options)` (`src/server/index.ts:377`) drives an interactive
+wizard (which providers to expose, optional favorites-only catalog, discovery-id
+masking, local vs network bind, server password) and then `startServer()`
+(`src/server/router.ts:76`) listens until `Ctrl+C`.
+---
+## What Was Built
+- **`rflectr server`** — foreground gateway over Zen/Go cloud models plus every
+  local registry provider, served on one port as both Anthropic and OpenAI APIs
+  (`src/server/index.ts:377`, `src/server/router.ts:76`).
+- **`rflectr server --vertex`** — Claude on Google Vertex AI using local gcloud
+  Application Default Credentials, no OpenCode key required
+  (`src/server/index.ts:290`, `src/server/vertex-config.ts`).
+- **Unified model loading** — `loadServerModels()` merges Zen, Go, and local
+  provider models into a single `ServerModelInfo[]`, enriched with reasoning
+  metadata (`src/server/index.ts:155`).
+- **Per-endpoint routing** — `handleAnthropicMessages` and
+  `handleOpenAIChatCompletions` dispatch by `modelFormat`: anthropic → raw
+  forward; openai → SDK adapter with a per-`(model × npm × baseURL)`
+  `LanguageModel` cache (`src/server/router.ts:155`, `:242`, `:331`).
+- **Auth gate** — Bearer / `x-api-key` comparison against an optional server
+  password; `null` password (local mode) allows all callers
+  (`src/server/auth.ts:10`).
+- **Discovery-id masking** — self-inverse provider/model-slug reversal so vendor
+  names never appear literally in Claude Desktop / Cowork discovery ids
+  (`src/server/vendor-mask.ts:14`).
+- **Provider / favorites filtering** — expose a chosen subset of providers, or
+  only favorite models (`src/server/catalog-filter.ts`).
+- **Credential hygiene** — `GET /models` strips `apiKey` from every model entry;
+  header values are CR/LF-sanitized (`src/server/router.ts:125`, `:397`).
+---
+## Goals
+1. Serve every configured backend (Zen, Go, local registry providers, Vertex)
+   behind **one** local HTTP port that speaks **both** Anthropic and OpenAI wire
+   formats.
+2. Reuse the shared SDK translation core (PRD-004) and upstream-forward helpers
+   (PRD-005) — no gateway-specific translation logic.
+3. Make the gateway safe to expose on a LAN: optional server password, network
+   bind opt-in, credential stripping in catalog responses.
+4. Provide a discovery surface Claude Desktop / Cowork can consume, including
+   optional vendor-name masking.
+5. Offer a zero-OpenCode-key path to Claude via Vertex AI using existing gcloud
+   ADC.
+## Non-Goals
+- Process management / daemonization — the server runs in the foreground and
+  exits on `Ctrl+C` (`waitForShutdown` in `src/server/index.ts:189`). No
+  systemd unit, PID file, or background mode is shipped.
+- TLS termination — the gateway listens over plain HTTP; HTTPS is expected to be
+  handled by a front proxy if needed.
+- Accurate cost reporting for non-Anthropic models (inherited limitation from
+  the translation layer; Claude clients apply their own pricing table).
+- Rate limiting, request quotas, or multi-tenant key management.
+- Live context-window updates on `/model` switch (see Risks).
+---
+## Features
+| # | Feature | Where |
+|---|---------|-------|
+| F1 | Foreground gateway on port 17645, dual Anthropic + OpenAI endpoints | `src/server/router.ts:76`, `src/server/index.ts:450` |
+| F2 | Interactive wizard: start mode, favorites-only, exposed providers, masking | `src/server/index.ts:256`, `src/server/prompts.ts` |
+| F3 | Unified model load (Zen + Go + local providers) with reasoning enrichment | `src/server/index.ts:155`, `:176` |
+| F4 | Anthropic Messages relay (raw forward or SDK adapter by format) | `src/server/router.ts:155` |
+| F5 | OpenAI Chat Completions relay (direct relay or SDK adapter) | `src/server/router.ts:242` |
+| F6 | Gateway alias ids + bidirectional catalog lookup | `src/server/models.ts:114`, `:140` |
+| F7 | Discovery-id masking (self-inverse) | `src/server/vendor-mask.ts:14` |
+| F8 | Bearer / `x-api-key` auth with null-password local mode | `src/server/auth.ts:10` |
+| F9 | `apiKey` stripped from `GET /models` | `src/server/router.ts:125` |
+| F10 | Provider-subset and favorites-only filtering | `src/server/catalog-filter.ts:6`, `:15` |
+| F11 | Vertex AI mode via gcloud ADC | `src/server/index.ts:290`, `src/server/vertex-config.ts` |
+| F12 | Server-password save/reuse, network-bind opt-in | `src/server/index.ts:205`, `src/server/prompts.ts:51` |
+---
+## Architecture & Implementation
+### Request flow
+```mermaid
+flowchart TD
+    client["Claude Desktop / any tool"] --> ep{"method + path"}
+    ep -->|"GET /health"| health["{ ok: true }"]
+    ep -->|other| auth{"isAuthorized?"}
+    auth -->|no| u401["401 Unauthorized"]
+    auth -->|yes| route{"path"}
+    route -->|"GET /models"| list["catalog.list() — apiKey stripped"]
+    route -->|"GET /anthropic/v1/models"| amodels["formatGatewayAnthropicModels (optional mask)"]
+    route -->|"GET /openai/v1/models"| omodels["formatOpenAIModels"]
+    route -->|"POST /anthropic/v1/messages"| anth["handleAnthropicMessages"]
+    route -->|"POST /openai/v1/chat/completions"| oai["handleOpenAIChatCompletions"]
+    anth --> fmt{"model.modelFormat"}
+    oai --> fmt
+    fmt -->|anthropic| raw["raw forward → {baseUrl}/v1/messages"]
+    fmt -->|openai| sdk["createLanguageModel + SDK adapter"]
+```
+`routeRequest` (`src/server/router.ts:109`) handles `/health` **before** the auth
+check, then gates everything else through `isAuthorized` before dispatching by
+method + path.
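The ordering (health first, then auth, then dispatch) can be sketched as a small pure function. The route table below mirrors the flowchart; the `RouteResult` type, the `sketchRoute` name, and the 404 for unknown routes are assumptions, since the fallback behaviour is not described here.

```typescript
type RouteResult =
  | { status: 200; handler: string }
  | { status: 401 | 404 };

function sketchRoute(method: string, path: string, authorized: boolean): RouteResult {
  // Liveness is answered before the auth gate.
  if (method === "GET" && path === "/health") return { status: 200, handler: "health" };
  if (!authorized) return { status: 401 };
  const table: Record<string, string> = {
    "GET /models": "catalog.list",
    "GET /anthropic/v1/models": "formatGatewayAnthropicModels",
    "GET /openai/v1/models": "formatOpenAIModels",
    "POST /anthropic/v1/messages": "handleAnthropicMessages",
    "POST /openai/v1/chat/completions": "handleOpenAIChatCompletions",
  };
  const handler = table[`${method} ${path}`];
  // Assumption: unmatched routes yield 404.
  return handler ? { status: 200, handler } : { status: 404 };
}
```

Answering `/health` ahead of the gate lets a client probe liveness without holding the server password.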
+### Server model loading
+`loadServerModels()` (`src/server/index.ts:155`) calls
+`fetchProviderCatalog({ agent: 'server' })` and assembles one `ServerModelInfo[]`:
+- **Zen** models, filtered by the registry's `subscriptionFilter` (free-only when
+  configured) via `filterZenModelsForServer` (`src/server/index.ts:115`), then
+  mapped by `zenGoModelsToServerModels` (`src/provider-catalog.ts:259`).
+- **Go** models, filtered to drop `modelFormat === 'unsupported'` by
+  `usableGoModels` (`src/server/index.ts:123`), same mapper.
+- **Local registry providers**, mapped by `localProvidersToServerModels`
+  (`src/provider-catalog.ts:228`), each carrying `npm`, `apiBaseUrl`, `baseUrl`,
+  `completionsUrl`, `apiKey`, `authType`, and `oauthAccountId`.
+For Zen/Go, openai-format models get `npm = '@ai-sdk/openai-compatible'` and
+`apiBaseUrl = ${backend.baseUrl}/v1`; anthropic-format models stay raw
+passthrough (no `npm`) — matching the CLI catalog's `zenGoModelToRoute`
+(`src/provider-catalog.ts:273`).
+Every model is then passed through `enrichServerModelReasoning`
+(`src/server/index.ts:176`), which calls `getReasoningCapabilities` and stamps a
+`defaultEffort` fallback for openai-format models that declare one.
+### Catalog & gateway aliases (`src/server/models.ts`)
+`createGatewayModelCatalog(models, opts?)` (`:140`) builds a bidirectional
+lookup keyed by `model.id` **and** by the exposed gateway alias, so Claude
+clients (which only surface `claude-*` / `anthropic-*` ids) can address a model
+by either form.
+- `gatewayAliasId(model)` (`:114`) → `anthropic-{provider}__{model}` via
+  `aliasModelId` (from `src/proxy.ts`).
+- `exposedGatewayAliasId(model, opts?)` (`:118`) → masked alias when
+  `opts.maskGatewayIds`.
+- `gatewayDisplayName(model, opts?)` (`:124`) → `"Model Name"`, or
+  `"Model Name (Provider Label)"` when masking is on.
+- `upstreamModelId(model)` (`:159`) → strips a trailing `[1m]` context suffix
+  for the wire call.
+- `formatGatewayAnthropicModels` / `formatOpenAIModels` (`:129`, `:174`) build
+  the endpoint payloads; `formatAnthropicModelEntry` (`:58`) attaches
+  `context_window` / `max_input_tokens` via `resolveContextWindow`.
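The alias scheme and the two-way lookup can be illustrated as follows. This is a sketch under assumptions: the `ModelRef` type and all function names are hypothetical, and the real `aliasModelId` may normalize slugs in ways not shown here.

```typescript
// Hypothetical minimal model record.
interface ModelRef { providerId: string; id: string }

// anthropic-{provider}__{model}, as described above.
function sketchAliasId(m: ModelRef): string {
  return `anthropic-${m.providerId}__${m.id}`;
}

// Strip a trailing "[1m]" context suffix before calling upstream.
function sketchUpstreamId(id: string): string {
  const suffix = "[1m]";
  return id.endsWith(suffix) ? id.slice(0, -suffix.length) : id;
}

// Register each model under both its own id and its gateway alias.
function sketchCatalog(models: ModelRef[]): Map<string, ModelRef> {
  const byKey = new Map<string, ModelRef>();
  for (const m of models) {
    byKey.set(m.id, m);
    byKey.set(sketchAliasId(m), m);
  }
  return byKey;
}
```

A client that only surfaces `anthropic-*` ids and a client that sends the raw id both resolve to the same record.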
+### Routing — Anthropic messages (`src/server/router.ts:155`)
+1. Parse JSON body; look up the model in the catalog (`lookupModel`, `:307`).
+2. **anthropic format** (`:176`): validate `baseUrl` is `http(s)://`, compute
+   `{baseUrl}/v1/messages` (or the cloud backend's URL via `backendFor`, `:322`),
+   forward the body verbatim — swapping in `upstreamModelId(model)` and relaying
+   the inbound `anthropic-beta` header — through `postJsonUpstream`
+   (shared with PRD-005's `upstream-forward.ts`).
+3. **openai format** (`:192`): guard with `isSdkMigratedNpm(model.npm)`; init or
+   reuse a cached `LanguageModel` (`getOrInitLanguageModel`, `:331`);
+   `sdkTranslateRequest` → `streamAnthropicResponse` (SSE) or
+   `generateAnthropicResponse`. The response `model` field is set to the masked
+   display name when masking is on (`getResponseModelId`, `:363`) so Claude
+   Desktop's status chip shows a human-readable name.
+### Routing — OpenAI chat completions (`src/server/router.ts:242`)
+- `supportsDirectOpenAIChatCompletions(model)` (`src/server/models.ts:165`) is
+  true for openai-format models with a `completionsUrl` or a Zen/Go backend — those
+  relay raw through `relayAnthropicMessages` to `{completionsUrl|backend}/v1/chat/completions`.
+- Otherwise the request goes through the SDK adapter: `translateOpenAiRequest` →
+  `streamOpenAiResponse` / `generateOpenAiResponse` (`src/openai-adapter.ts`).
+### LanguageModel cache (`src/server/router.ts:331`)
+`getOrInitLanguageModel` keys the cache on
+`providerId/sourceBackend ∣ id ∣ upstreamModelId ∣ npm ∣ baseURL` (joined with
+`\x1f`) so a given model is instantiated once per process and reused across
+requests.
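A keyed memoization of this kind might look like the sketch below. The `CacheKeyParts` type and function names are hypothetical; the field order follows the key described above, and the factory is generic so the example does not depend on the SDK's `LanguageModel` type.

```typescript
// Hypothetical record of the fields that make up the cache key.
interface CacheKeyParts {
  source: string; // providerId or sourceBackend
  id: string;
  upstreamModelId: string;
  npm: string;
  baseURL: string;
}

const KEY_SEP = "\x1f"; // ASCII unit separator: unlikely inside ids or URLs

function cacheKey(p: CacheKeyParts): string {
  return [p.source, p.id, p.upstreamModelId, p.npm, p.baseURL].join(KEY_SEP);
}

// Wrap a factory so each distinct key is instantiated once per process.
function makeModelCache<T>(create: (p: CacheKeyParts) => T): (p: CacheKeyParts) => T {
  const cache = new Map<string, T>();
  return (p) => {
    const key = cacheKey(p);
    if (!cache.has(key)) cache.set(key, create(p));
    return cache.get(key) as T;
  };
}
```

Joining on a control character avoids collisions that a visible delimiter such as `/` or `|` could cause when it also appears in a URL or model id.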
+### Auth (`src/server/auth.ts`)
+`isAuthorized(request, serverPassword)` (`:10`) returns `true` immediately when
+`serverPassword === null` (local mode). Otherwise it accepts a `Bearer` token
+(`extractBearerToken`, `:19`) **or** an `x-api-key` header, each passed through
+`sanitizeCredential` (first non-empty line only, `:4`). In **network mode** the
+wizard requires a server password; it is the only gate once the port is reachable
+beyond localhost, so it must be treated as a real secret. Incoming header values
+are CR/LF-stripped in `sanitizeIncomingHeaderValue` (`src/server/router.ts:397`).
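The gate described above can be sketched as a pure function over a header map. This is an illustration, not the code in `src/server/auth.ts`: `HeaderMap`, `firstLine`, `bearerToken`, and `sketchIsAuthorized` are hypothetical names, and header keys are assumed to be lowercased already.

```typescript
// Assumption: header names are already lowercased by the HTTP layer.
type HeaderMap = Record<string, string | undefined>;

// Keep only the first non-empty line, trimmed.
function firstLine(value: string | undefined): string {
  if (!value) return "";
  for (const line of value.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed) return trimmed;
  }
  return "";
}

function bearerToken(headers: HeaderMap): string {
  const match = /^Bearer\s+(.+)$/i.exec(firstLine(headers["authorization"]));
  return match ? match[1].trim() : "";
}

function sketchIsAuthorized(headers: HeaderMap, serverPassword: string | null): boolean {
  if (serverPassword === null) return true; // local mode: no gate
  const candidates = [bearerToken(headers), firstLine(headers["x-api-key"])];
  // Plain equality keeps the sketch short; the comparison used in the
  // shipped code is not described here.
  return candidates.some((c) => c !== "" && c === serverPassword);
}
```

Rejecting empty candidates explicitly prevents a missing header from matching an empty password.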
+### Vendor masking (`src/server/vendor-mask.ts`)
+`maskGatewayModelId(aliasId)` (`:14`) reverses the provider-slug and
+model-suffix segments of `anthropic-{provider}__{model}`. It is **self-inverse**
+— `unmaskGatewayModelId` (`:24`) calls the same function. The masked catalog
+registers all of `model.id`, the masked alias, and the raw alias so chat
+requests resolve regardless of which id the client sends
+(`createGatewayModelCatalog`, `src/server/models.ts:146`).
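One way a self-inverse mask of this kind could work is sketched below. It assumes "reverses" means reversing the characters inside each segment; the source does not spell this out, so treat both that reading and the `maskAliasSketch` name as hypothetical.

```typescript
const ALIAS_PREFIX = "anthropic-";
const ALIAS_SEPARATOR = "__";

function reverseChars(s: string): string {
  return Array.from(s).reverse().join("");
}

// Reverses the provider and model segments of anthropic-{provider}__{model}.
// Ids that do not match the pattern are returned unchanged.
function maskAliasSketch(aliasId: string): string {
  if (!aliasId.startsWith(ALIAS_PREFIX)) return aliasId;
  const rest = aliasId.slice(ALIAS_PREFIX.length);
  const cut = rest.indexOf(ALIAS_SEPARATOR);
  if (cut < 0) return aliasId;
  const provider = rest.slice(0, cut);
  const model = rest.slice(cut + ALIAS_SEPARATOR.length);
  return (
    ALIAS_PREFIX + reverseChars(provider) + ALIAS_SEPARATOR + reverseChars(model)
  );
}

// Because character reversal undoes itself, unmasking is the same call.
const unmaskAliasSketch = maskAliasSketch;
```

For example, `maskAliasSketch("anthropic-openai__gpt-4o")` yields `anthropic-ianepo__o4-tpg`, and applying the function again restores the original. This sketch does not handle segments with underscores next to the `__` separator.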
+### Vertex mode (`src/server/index.ts:290`, `src/server/vertex-config.ts`)
+`runVertexServerCommand` exposes **Claude on Vertex AI** without an OpenCode key:
+- `buildVertexRuntimeConfig(env?)` (`vertex-config.ts:107`) resolves project
+  (`ANTHROPIC_VERTEX_PROJECT_ID` → `GOOGLE_CLOUD_PROJECT` → `GOOGLE_VERTEX_PROJECT`)
+  and location (`GOOGLE_CLOUD_LOCATION` → `CLOUD_ML_REGION` → `GOOGLE_VERTEX_LOCATION`
+  → `global`); returns `null` if no project is set.
+- `hasApplicationDefaultCredentials()` (`:66`) checks
+  `GOOGLE_APPLICATION_CREDENTIALS` or `~/.config/gcloud/application_default_credentials.json`.
+- `vertexModelsToServerModels(config)` (`:118`) builds `ServerModelInfo[]` routed
+  through `@ai-sdk/google-vertex/anthropic` (`VERTEX_ANTHROPIC_NPM`,
+  `modelFormat: 'openai'`, `sourceBackend: 'vertex'`).
+- `createVertexModelCatalog(models)` (`:159`) adds short aliases (`sonnet` /
+  `haiku` / `opus`) and `[1m]` context variants, resolving client lookups via
+  `vertexClientModelLookupCandidates` (`:139`). Defaults: `claude-sonnet-4-6`,
+  `claude-opus-4-6`, `claude-haiku-4-5`; overridable at
+  `~/.rflectr/vertex-models.json`.
+The Vertex server starts with `apiKey: 'vertex-local'` and passes a `vertex`
+config (`{ project, location }`) to `startServer`, which threads it into
+`createLanguageModel` (`src/server/router.ts:331`, `:348`).
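The project and location precedence listed above can be sketched as a first-match lookup over environment variables. The helper names and the choice to treat blank values as unset are assumptions; only the variable names, their order, and the `global` default come from the text.

```typescript
type Env = Record<string, string | undefined>;

// Returns the first variable in `names` that has a non-blank value.
function firstSet(env: Env, names: string[]): string | undefined {
  for (const name of names) {
    const value = env[name]?.trim();
    if (value) return value;
  }
  return undefined;
}

// Resolves { project, location }, or null when no project variable is set.
function resolveVertexTargetSketch(
  env: Env,
): { project: string; location: string } | null {
  const project = firstSet(env, [
    "ANTHROPIC_VERTEX_PROJECT_ID",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_VERTEX_PROJECT",
  ]);
  if (!project) return null;
  const location =
    firstSet(env, [
      "GOOGLE_CLOUD_LOCATION",
      "CLOUD_ML_REGION",
      "GOOGLE_VERTEX_LOCATION",
    ]) ?? "global";
  return { project, location };
}
```

In a Node process this would typically be called as `resolveVertexTargetSketch(process.env)`; taking the environment as a parameter keeps the function testable.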
+---
+## API Surface
+Base URLs for clients:
+- Anthropic: `http://127.0.0.1:17645/anthropic`
+- OpenAI: `http://127.0.0.1:17645/openai/v1`
+> Do **not** append `/v1` to the Anthropic base URL — the Anthropic SDK adds API
+> paths itself.
+| Method + path | Purpose | Source |
+|---|---|---|
+| `GET /health` | Liveness `{ ok: true }` (pre-auth) | `src/server/router.ts:114` |
+| `GET /models` | Raw catalog, `apiKey` stripped | `src/server/router.ts:124` |
+| `GET /anthropic/v1/models` | Anthropic-format list (optionally masked) | `src/server/router.ts:129` |
+| `GET /openai/v1/models` | OpenAI-format list | `src/server/router.ts:134` |
+| `POST /anthropic/v1/messages` | Anthropic Messages relay | `src/server/router.ts:139` |
+| `POST /openai/v1/chat/completions` | OpenAI Chat Completions relay | `src/server/router.ts:144` |
+`POST /anthropic/v1/messages` honors `stream` (SSE when true). Both streaming
+and non-streaming responses are supported for anthropic-format (raw forward) and
+openai-format (SDK adapter) models, and the inbound `anthropic-beta` header is
+relayed on raw forwards. Unknown or unsupported models return `400`; upstream and
+SDK errors surface as `502`.
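As a client-side illustration, a request to the Messages endpoint could be assembled like this. The helper name, the `sonnet` model id, and the choice to send the password via `x-api-key` are assumptions for the example; the body fields follow the public Anthropic Messages format.

```typescript
interface GatewayRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds a POST /anthropic/v1/messages request against a gateway base URL.
// `baseUrl` is the Anthropic base (no trailing /v1); `password` is null in local mode.
function buildMessagesRequestSketch(
  baseUrl: string,
  password: string | null,
  prompt: string,
  stream = false,
): GatewayRequest {
  const headers: Record<string, string> = {
    "content-type": "application/json",
  };
  if (password !== null) headers["x-api-key"] = password;
  return {
    url: baseUrl.replace(/\/+$/, "") + "/v1/messages",
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model: "sonnet", // hypothetical id; use one returned by GET /anthropic/v1/models
        max_tokens: 256,
        stream,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

The result can be passed to `fetch(req.url, req.init)`. Per the status codes above, a `400` suggests an unknown model id, while a `502` points at the upstream provider or SDK.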
+---
+## Acceptance Criteria
+- [x] `rflectr server` starts a foreground HTTP gateway on port 17645 serving
+      both `/anthropic` and `/openai/v1` endpoints (`src/server/index.ts:450`,
+      `src/server/router.ts:76`).
+- [x] `loadServerModels()` merges Zen, Go, and local registry provider models
+      into one `ServerModelInfo[]` (`src/server/index.ts:155`).
+- [x] Local providers are appended carrying `npm` / `apiBaseUrl` / `baseUrl` /
+      `completionsUrl` / `apiKey` (`src/provider-catalog.ts:228`).
+- [x] `handleAnthropicMessages` raw-forwards anthropic-format models to
+      `{baseUrl}/v1/messages` (`src/server/router.ts:176`).
+- [x] openai-format models route through the `isSdkMigratedNpm` guard →
+      `createLanguageModel` + `streamAnthropicResponse` /
+      `generateAnthropicResponse` (`src/server/router.ts:192`).
+- [x] `GET /models` strips `apiKey` from every entry (`src/server/router.ts:125`).
+- [x] Auth accepts `Bearer` or `x-api-key`; `null` password allows all callers
+      (`src/server/auth.ts:10`).
+- [x] Discovery-id masking reverses provider/model segments and is self-inverse
+      (`src/server/vendor-mask.ts:14`).
+- [x] Provider-subset and favorites-only filtering are available in the wizard
+      (`src/server/catalog-filter.ts`, `src/server/index.ts:405`).
+- [x] `rflectr server --vertex` exposes Claude on Vertex AI via gcloud ADC with
+      no OpenCode key (`src/server/index.ts:290`, `src/server/vertex-config.ts:66`).
+- [x] Per-`(model × npm × baseURL)` `LanguageModel` cache reuses instances across
+      requests (`src/server/router.ts:331`).
+- [x] `/health` is reachable without auth (`src/server/router.ts:114`).
+- [x] Network mode requires a server password before binding to `0.0.0.0`
+      (`src/server/index.ts:205`, `:394`).
+---
+## Files
+| File | Role |
+|------|------|
+| `src/server/index.ts` | Command entry, wizard, `loadServerModels`, reasoning enrichment, Vertex command, startup output |
+| `src/server/router.ts` | HTTP server, routing, Anthropic + OpenAI handlers, LanguageModel cache, header sanitization |
+| `src/server/models.ts` | `ServerModelInfo`, catalog builders, gateway aliases, display names, endpoint payload formatters |
+| `src/server/auth.ts` | `isAuthorized`, `extractBearerToken`, `sanitizeCredential` |
+| `src/server/catalog-filter.ts` | Provider-subset / favorites filtering, provider summary |
+| `src/server/provider-select.ts` | Interactive exposed-providers picker |
+| `src/server/vendor-mask.ts` | Self-inverse discovery-id masking |
+| `src/server/vertex-config.ts` | Vertex runtime config, ADC detection, Vertex model catalog |
+| `src/server/prompts.ts` | Wizard prompts (start mode, listen mode, password, masking, favorites) |
+| `src/provider-catalog.ts` | `zenGoModelsToServerModels`, `localProvidersToServerModels` (shared) |
+| `src/upstream-forward.ts` | `postJsonUpstream`, `relayAnthropicMessages` (shared with proxy) |
+---
+## Risks & Known Limitations
+- **No TLS / no daemonization.** Plain HTTP, foreground only. On a LAN
+  deployment the server password is the sole access gate.
+- **Server password is the only network gate.** Once bound to `0.0.0.0`, any
+  caller with the password reaches every exposed provider's upstream key. Treat
+  it as a real secret.
+- **Cost display inaccurate for non-Anthropic models** — Claude clients apply
+  their own pricing table; the gateway cannot correct it.
+- **Context window reflects launch state.** Discovery payloads carry a static
+  `context_window`; a live `/model` switch in a Claude client does not refresh it.
+- **OAuth-only local providers** with no stored key are skipped upstream of this
+  gateway (PRD-002), so they never appear in the catalog.
+- **Vertex auth beyond ADC** (impersonation, workload identity) is not handled —
+  only `GOOGLE_APPLICATION_CREDENTIALS` or the default ADC file are detected.
+- **`::ts::` / `[1m]` string conventions** are inherited from the translation and
+  alias layers; the same edge-case caveats apply.
+---
+## Related
+- [PRD-002: Provider Registry](../prd-002-provider-registry/prd-002-provider-registry-index.md) — local provider discovery feeding `localProvidersToServerModels`.
+- [PRD-003: Model Discovery & Classification](../prd-003-model-discovery-classification/prd-003-model-discovery-classification-index.md) — `ModelInfo` source for `zenGoModelsToServerModels`.
+- [PRD-004: Translation Layer](../prd-004-translation-layer/prd-004-translation-layer-index.md) — the shared SDK adapter (`createLanguageModel`, `streamAnthropicResponse`).
+- [PRD-005: Local Proxy & Catalog Routing](../prd-005-local-proxy-catalog-routing/prd-005-local-proxy-catalog-routing-index.md) — shared `upstream-forward.ts` and `aliasModelId`.
+- [PRD-011: Claude Desktop Integration](../prd-011-claude-desktop-integration/prd-011-claude-desktop-integration-index.md) — primary consumer of this gateway.
+- Knowledge: [Server Gateway (private)](../../../knowledge/private/infrastructure/server-gateway.md) · [API Server guide (public)](../../../knowledge/public/guides/api-server.md)

package/library/requirements/completed/prd-012-server-gateway/qa/.gitkeep ADDED

File without changes

package/library/requirements/in-work/README.md CHANGED

@@ -1,19 +1,19 @@
----
-ai_description: |
-  Contains PRD folders actively being implemented. A folder lives here
-  from the moment implementation begins until the work ships.
-  Structure inside is identical to backlog/: prd-<###>-<slug>/index + sub-PRDs + qa/.
-  To promote: move entire prd-<###>-<slug>/ folder to completed/.
-  Do NOT create new PRD folders here; create them in backlog/ first,
-  then move to in-work/ when implementation starts.
-human_description: |
-  PRDs currently being implemented. Do not start new PRDs here —
-  create them in backlog/ and move the folder here when work begins.
-  When work ships, move the entire folder to completed/.
----
-# Requirements — In Work
-PRDs currently being implemented. Folder location = lifecycle state.
-Move an entire `prd-<###>-<slug>/` folder **from** `backlog/` → here when implementation starts, and **from** here → `completed/` when the work ships.
+---
+ai_description: |
+  Contains PRD folders actively being implemented. A folder lives here
+  from the moment implementation begins until the work ships.
+  Structure inside is identical to backlog/: prd-<###>-<slug>/index + sub-PRDs + qa/.
+  To promote: move entire prd-<###>-<slug>/ folder to completed/.
+  Do NOT create new PRD folders here; create them in backlog/ first,
+  then move to in-work/ when implementation starts.
+human_description: |
+  PRDs currently being implemented. Do not start new PRDs here —
+  create them in backlog/ and move the folder here when work begins.
+  When work ships, move the entire folder to completed/.
+---
+# Requirements — In Work
+PRDs currently being implemented. Folder location = lifecycle state.
+Move an entire `prd-<###>-<slug>/` folder **from** `backlog/` → here when implementation starts, and **from** here → `completed/` when the work ships.