npm - @legioncodeinc/rflectr - Versions diffs - 0.1.1 → 0.1.2 - Mend

@legioncodeinc/rflectr 0.1.1 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (84) hide show

package/library/requirements/completed/prd-003-model-discovery-classification/prd-003-model-discovery-classification-index.md DELETED Viewed

@@ -1,260 +0,0 @@
-# PRD-003: Model Discovery & Classification *(Retroactive)*
-> **Status:** Shipped
-> **Priority:** —
-> **Effort:** —
-> **Written:** June 2026
-> **Retroactive:** Yes — written after implementation (rflectr v0.2.7).
-> **Source:** `src/models.ts`, `src/model-compatibility.ts`, `src/context-window.ts`, `src/reasoning-capabilities.ts`
----
-## Overview
-`rflectr` re-points Claude Code (and the other agents) at alternative model backends. Before a user can pick a model, the launcher has to build a list of models that are actually usable, decide for each one whether it can be forwarded raw to an Anthropic-compatible endpoint or must be translated, and stamp it with the metadata Claude Code needs to render an accurate status bar (context window, cost, reasoning controls).
-This PRD documents that subsystem: how the cloud (OpenCode Zen/Go) model list is assembled from two merged sources, how every model's `modelFormat` is classified, how context windows and reasoning capabilities are resolved, and how unusable models (incompatible, deprecated, stale-free) are filtered out.
-The single most load-bearing output is `modelFormat`. It is the branch point for the entire launch flow — `'anthropic'` means direct passthrough, anything else routes through the SDK adapter proxy (see [PRD-004 — Translation Layer](../prd-004-translation-layer/prd-004-translation-layer-index.md)).
-Knowledge doc: [`model-discovery-classification.md`](../../../knowledge/private/ai/model-discovery-classification.md).
----
-## What Was Built
-- A **two-source merge** that builds the cloud model list from a live API call (`GET {backendUrl}/v1/models`, no auth) enriched by the local OpenCode CLI cache (`~/.cache/opencode/models.json`) — `src/models.ts:131` (`getModels`), `src/models.ts:85` (`mergeModels`).
-- A pure **format classifier** `classifyModelFormat(modelId, providerNpm)` returning `'anthropic' | 'openai' | 'unsupported'` from provider npm first, then ID-prefix heuristics — `src/constants.ts:58`.
-- A **Go-backend override** that demotes any `'anthropic'` classification to `'openai'`, because the Go gateway is OpenAI-compatible and an `@ai-sdk/anthropic` npm in the cache is a metadata error — `src/models.ts:50`, `src/models.ts:95`.
-- **`sourceBackend` stamping** so a combined Zen-free + Go-paid list (the `go` tier) can resolve the correct base URL per selected model — `src/models.ts:51`, `src/models.ts:96`.
-- **Context-window resolution** from cache `limit.context` → ID heuristics → 200K default, plus a `[1m]` suffix convention for million-token variants — `src/context-window.ts`, `src/context-model-id.ts`.
-- **Reasoning-capability metadata** (levels, default level, mode, wire format) resolved per provider npm + model id — `src/reasoning-capabilities.ts`, `src/provider-factory.ts:469`.
-- **Incompatibility filtering** via a curated JSON blacklist plus conservative models.dev capability checks — `src/model-compatibility.ts`, `src/data/model-incompatible.json`.
-- **Graceful degradation**: API failure falls back to the cache, then to caller-supplied fallback models, then errors — `src/models.ts:140`.
----
-## Goals
-- Show the user only models that will actually work in a coding agent (no embedding/image/audio/video/deprecated/managed-agent models).
-- Decide passthrough-vs-translate deterministically and at the model level, not the provider level.
-- Render an accurate remaining-context status bar in Claude Code for non-Anthropic models.
-- Surface reasoning-effort controls where the provider supports them.
-- Never hard-depend on the OpenCode cache: it is enrichment, not a runtime requirement.
-- Degrade gracefully when the model API is unreachable.
-## Non-Goals
-- Live `/model` switching context-window updates — fixed at launch in switch-menu mode (documented limitation).
-- Accurate cost display for non-Anthropic models — Claude Code owns its pricing table; this is unfixable from rflectr.
-- Registry-provider model materialization — owned by [PRD-002 — Provider Registry](../prd-002-provider-registry/prd-002-provider-registry-index.md). This PRD covers the cloud Zen/Go path; registry models arrive pre-stamped.
-- Backend selection / tier logic — owned by [PRD-008 — Preferences, Tiers & Favorites](../prd-008-preferences-tiers-favorites/prd-008-preferences-tiers-favorites-index.md). This PRD consumes `backend.id` / `sourceBackend`.
-- The translation that happens *after* a model is classified `'openai'` — owned by [PRD-004 — Translation Layer](../prd-004-translation-layer/prd-004-translation-layer-index.md).
----
-## Features
-| # | Feature | Source |
-|---|---------|--------|
-| F1 | Two-source merge (live API ids + OpenCode cache enrichment) | `src/models.ts:85`, `src/models.ts:131` |
-| F2 | `classifyModelFormat` (npm → id-prefix heuristic) | `src/constants.ts:58` |
-| F3 | Go-backend anthropic→openai demotion | `src/models.ts:50`, `src/models.ts:95` |
-| F4 | `sourceBackend` per-model stamping | `src/models.ts:51`, `src/models.ts:96` |
-| F5 | `isFree` detection (cost.input == 0 && cost.output == 0) | `src/models.ts:44` |
-| F6 | Brand derivation for grouping | `src/models.ts:23` (`deriveBrand`) |
-| F7 | Context-window resolution (cache → heuristics → 200K) | `src/context-window.ts:131` |
-| F8 | `[1m]` context-suffix convention | `src/context-model-id.ts` |
-| F9 | Reasoning-capability metadata resolution | `src/reasoning-capabilities.ts:24`, `src/provider-factory.ts:469` |
-| F10 | Curated incompatibility blacklist filtering | `src/model-compatibility.ts:46`, `src/data/model-incompatible.json` |
-| F11 | models.dev conservative capability filtering | `src/model-compatibility.ts:60`, `src/registry/models-dev.ts:229` |
-| F12 | Deprecated/stale model exclusion | `src/models.ts:43` (cache `status`), `model-incompatible.json` (`stale_promotion`) |
-| F13 | Graceful fallback chain on API failure | `src/models.ts:140` |
----
-## Architecture & Implementation
-### The two-source merge
-The cloud (OpenCode Zen / Go) model list is built from two sources merged together in `getModels` — `src/models.ts:131`:
-1. **Primary — live API.** `fetchModelsFromApi(backend)` does `GET {backend.baseUrl}/v1/models` (5s abort timeout, no real auth — sends a placeholder `Bearer test`) and returns the available model ids — `src/models.ts:69`. This is the authority on *what exists*.
-2. **Enrichment — OpenCode cache.** `readModelsFromCache(backendId)` reads `~/.cache/opencode/models.json` (path `OPENCODE_CACHE_PATH`, `src/constants.ts:48`) and indexes the entries under the `opencode` (Zen) or `opencode-go` (Go) provider keys — `src/models.ts:31`. Each cache entry supplies `name`, `family`, `cost`, `provider.npm`, and `limit.context`. The cache is optional; a missing or unparseable file simply yields `null` (`loadOpencodeCache`, `src/context-window.ts:63`).
-`mergeModels(apiIds, cache, backendId)` — `src/models.ts:85` — drives the join:
-```
-for each apiId:
-  if shouldHideModel(...) → drop                       (incompatibility filter)
-  if cache has apiId      → spread cached ModelInfo, re-stamp sourceBackend + modelFormat
-  else                    → synthesize a bare ModelInfo (name = id, brand 'Other',
-                            classifyModelFormat(id, undefined), resolveContextWindow(id))
-```
-So the API ids are the spine; the cache only *enriches* ids that already appear in the live response. An id present only in the cache is never shown.
-```mermaid
-flowchart TD
-    api["GET {backendUrl}/v1/models — available ids (no auth)"]
-    cache["~/.cache/opencode/models.json — name, family, cost, npm, limit.context"]
-    api --> filter["shouldHideModel() — blacklist + models.dev"]
-    filter --> merge["mergeModels()"]
-    cache --> merge
-    merge --> out["ModelInfo[] — modelFormat, sourceBackend, contextWindow, cost, isFree"]
-```
-### The classification heuristic
-`classifyModelFormat(modelId, providerNpm)` — `src/constants.ts:58` — is a pure function. Provider npm (from the cache) wins; if absent, an ID-prefix heuristic is applied to the lowercased id:
-| Order | Condition | Result | Rationale |
-|---|---|---|---|
-| 1 | `providerNpm === '@ai-sdk/anthropic'` | `anthropic` | Direct Anthropic passthrough. |
-| 2 | `providerNpm === '@ai-sdk/openai'` | `unsupported` | Cloud Zen/Go proxy can't do model-specific OpenAI endpoints. |
-| 3 | `providerNpm === '@ai-sdk/google'` | `unsupported` | Needs Gemini-specific endpoints the cloud path lacks. |
-| 4 | id starts with `claude-` | `anthropic` | Heuristic fallback when no cache npm. |
-| 5 | id starts with `gpt-` | `unsupported` | Heuristic fallback. |
-| 6 | id starts with `gemini-` | `unsupported` | Heuristic fallback. |
-| 7 | (everything else) | `openai` | Catch-all → SDK adapter via local proxy. |
-**Nuance:** `unsupported` is a *cloud-wizard* restriction, not a global one. The cloud OpenCode Zen/Go proxy layer can't reach GPT/Gemini directly, so those are hidden in the wizard. The same GPT/Gemini models *are* usable through the **local OpenAI / Google provider** (PRD-002), which carries the real `@ai-sdk/openai` / `@ai-sdk/google` npm and routes through the SDK adapter normally.
-### The Go-backend override
-The Go gateway is OpenAI-compatible. If the cache labels a Go model `@ai-sdk/anthropic` (a metadata error), `mergeModels` and `readModelsFromCache` both demote `'anthropic'` → `'openai'` so it routes through the translation proxy rather than attempting a raw Anthropic passthrough that would fail — `src/models.ts:50` and `src/models.ts:95`.
-### `sourceBackend` and the combined-list problem
-Every `ModelInfo` carries `sourceBackend: 'zen' | 'go'`, set from the backend that was queried (`src/models.ts:51`, `:96`). The `go` subscription tier shows Zen *free* models **and** Go *paid* models in one combined list; `sourceBackend` is what lets the launcher set the correct `ANTHROPIC_BASE_URL` per selected model rather than per session.
-### `isFree` and brand grouping
-`isFree` is true when the cache cost is explicitly `input === 0 && output === 0` — `src/models.ts:44`. Bare (cache-miss) models default `isFree: false`. `deriveBrand(family)` maps a family string to a display brand via the `BRAND_MAP` prefix table (Claude, GPT, Gemini, DeepSeek, Qwen, MiniMax, Kimi, GLM, MiMo, Grok, Nemotron, else `'Other'`) — `src/models.ts:9`. `groupModels` then splits free models out and buckets the rest by brand, each sorted by id — `src/models.ts:111`.
-### Context-window resolution
-`resolveContextWindow(modelId, explicit?)` — `src/context-window.ts:131` — resolves in priority order:
-1. An explicit positive value (cache `limit.context`, or a pre-resolved number) wins.
-2. Otherwise `lookupContextWindow` consults a model-id → context map built from the cache. The map prefers the `opencode` / `opencode-go` provider keys (`CACHE_PROVIDER_PRIORITY`), then falls back to the max context seen across any provider for that id — `src/context-window.ts:75`.
-3. Otherwise ID-pattern heuristics (`HEURISTIC_RULES`, ordered most-specific-first) — `src/context-window.ts:30`.
-4. Otherwise `DEFAULT_CONTEXT_WINDOW = 200_000` — Claude Code's own fallback for unknown models — `src/context-window.ts:12`.
-The resolved window is written to `CLAUDE_CODE_MAX_CONTEXT_TOKENS` (by `buildChildEnv`) and emitted in the proxy's synthetic `GET /v1/models` (`context_window` per model) so the host can render remaining context.
-### The `[1m]` suffix convention
-Claude Code treats third-party routes as 200K unless the model id ends with `[1m]` — `src/context-model-id.ts:3`. `claudeCodeClientModelId(modelId, contextWindow?)` strips any existing suffix, resolves the real window, and re-appends `[1m]` when the window exceeds 200K so the host renders the larger window — `src/context-model-id.ts:17`. `routeLookupIds` generates the suffix/`models/`-prefix variants so inbound ids still match catalog aliases — `src/context-model-id.ts:27`.
-### Reasoning-capability metadata
-`resolveReasoningCapabilities({ npm, modelId, ...metadata })` — `src/reasoning-capabilities.ts:24` — delegates to `getReasoningCapabilities(npm, modelId, metadata)` in `src/provider-factory.ts:469`. The result (`ReasoningCapabilities`, `src/provider-factory.ts:206`) carries effort `levels`, a `defaultLevel`, `supportsSummaries`, a `mode` (`none | internal-only | controllable`), a provenance `source`, a `confidence`, and an optional `wireFormat` discriminated union (openrouter / openai-effort / anthropic-thinking / google-thinking-config / mistral-effort / deepseek-thinking) — `src/provider-factory.ts:187`. Per-provider effort vocabularies differ (e.g. OpenAI `low/medium/high/xhigh`, xAI `none/low/medium/high`, DeepSeek `high/max/off`) — `src/provider-factory.ts:216`.
-### Incompatibility filtering
-`shouldHideModel(ctx)` — `src/model-compatibility.ts:68` — hides a model when `hideReason` returns non-null. Two layers:
-1. **Curated blacklist** (`src/data/model-incompatible.json`) keyed by `{provider, modelId, agents?}`. `provider: '*'` matches any provider; an absent/empty `agents` matches every agent. Categories include `image_generation`, `audio_only`, `video_generation`, `embedding`, `managed_agent`, `deprecated`, `gated_access`, and `stale_promotion` — `src/model-compatibility.ts:46`.
-2. **models.dev conservative capabilities** — `shouldHideByModelsDevCapabilities` hides a model only when its models.dev row is explicit: non-text-only output modalities, `tool_call === false`, or `interactions === true && chat === false` — `src/registry/models-dev.ts:229`.
-`mergeModels` calls this with `agent: 'claude'` for the cloud path — `src/models.ts:91`. Cache entries with `status === 'deprecated'` are dropped at read time too — `src/models.ts:43`.
-### Stale-free models
-The knowledge doc and `CLAUDE.md` reference a `STALE_FREE_MODELS` constant for models whose free promotion ended but the API still returns. In the shipped v0.2.7 tree this responsibility lives in the **incompatibility blacklist** instead: `qwen3.6-plus-free` is listed with `category: "stale_promotion"` (`src/data/model-incompatible.json`) and filtered through the same `shouldHideModel` path. There is no separate `STALE_FREE_MODELS` symbol in `src/constants.ts` in this version.
-### Graceful degradation
-`getModels` — `src/models.ts:131` — tries the live API first. On any throw it falls back: cache (if non-empty) → caller `fallbackModels` (if any) → a thrown error instructing the user to check network / OpenCode status. The returned `fromCache` flag tells callers the list is stale.
----
-## Data Shapes
-`ModelInfo` (cloud Zen/Go path) — `src/types.ts:22`:
-| Field | Type | Notes |
-|---|---|---|
-| `id` | `string` | Model id from the API. |
-| `name` | `string` | Display name (cache `name`, else id). |
-| `isFree` | `boolean` | Cache cost both zero. |
-| `brand` | `string` | From `deriveBrand(family)`. |
-| `sourceBackend` | `'zen' \| 'go'` | Backend that was queried; drives base-URL selection. |
-| `modelFormat` | `'anthropic' \| 'openai' \| 'unsupported'` | Launch branch point (`ModelFormat`, `src/types.ts:5`). |
-| `cost` | `ModelCost?` | `{ input, output, cache_read?, cache_write? }`. |
-| `contextWindow` | `number?` | Resolved window for the status bar. |
-`ModelFormat` semantics:
-| Value | Meaning |
-|---|---|
-| `anthropic` | Direct passthrough to the provider's Anthropic endpoint. |
-| `openai` | Routed through the SDK adapter via the local proxy. |
-| `unsupported` | Hidden in the cloud OpenCode wizard only (not a global ban). |
-`ReasoningCapabilities` — `src/provider-factory.ts:206`: `{ levels: string[], defaultLevel: string, supportsSummaries: boolean, mode: 'none'|'internal-only'|'controllable', source, confidence, wireFormat? }`.
-`IncompatibleModelEntry` — `src/model-compatibility.ts:20`: `{ provider, modelId, category, reason, agents?, sources?, verifiedAt? }`.
----
-## Acceptance Criteria
-- [x] Cloud model list is built by merging live API ids with OpenCode cache enrichment — `src/models.ts:131`, `:85`.
-- [x] API ids are authoritative; cache-only ids are not shown — `src/models.ts:90`.
-- [x] OpenCode cache is optional — a missing/unparseable file degrades to `null` without error — `src/context-window.ts:63`.
-- [x] `classifyModelFormat` resolves npm first, then id-prefix heuristics, returning one of three `ModelFormat` values — `src/constants.ts:58`.
-- [x] Go backend demotes `anthropic` → `openai` — `src/models.ts:50`, `:95`.
-- [x] `sourceBackend` is stamped on every model for per-model base-URL resolution — `src/models.ts:51`, `:96`.
-- [x] `isFree` is true only when cache cost input and output are both 0 — `src/models.ts:44`.
-- [x] Context window resolves cache `limit.context` → cache index → id heuristics → 200K default — `src/context-window.ts:131`.
-- [x] `[1m]` suffix is appended for >200K windows so Claude Code renders the larger window — `src/context-model-id.ts:17`.
-- [x] Reasoning capabilities are resolved per npm + model id with a per-provider wire format — `src/reasoning-capabilities.ts:24`, `src/provider-factory.ts:469`.
-- [x] Incompatible models (image/audio/video/embedding/managed-agent/deprecated/gated/stale) are filtered via the curated blacklist — `src/model-compatibility.ts:46`, `src/data/model-incompatible.json`.
-- [x] models.dev capability checks hide non-text/no-tool/interactions-only models conservatively — `src/registry/models-dev.ts:229`.
-- [x] Cache entries marked `status: 'deprecated'` are dropped at read time — `src/models.ts:43`.
-- [x] API failure degrades to cache → fallback models → error, with a `fromCache` flag — `src/models.ts:140`.
----
-## Files
-**Primary**
-- `src/models.ts` — two-source merge, `getModels`, `mergeModels`, `readModelsFromCache`, `fetchModelsFromApi`, `deriveBrand`, `groupModels`, `isFree`, `sourceBackend`, Go override.
-- `src/constants.ts` — `classifyModelFormat` (`:58`), `OPENCODE_CACHE_PATH` (`:48`).
-- `src/context-window.ts` — `resolveContextWindow`, `lookupContextWindow`, `buildContextWindowIndex`, `contextWindowFromHeuristics`, `loadOpencodeCache`, `DEFAULT_CONTEXT_WINDOW`.
-- `src/context-model-id.ts` — `[1m]` suffix handling, `claudeCodeClientModelId`, `routeLookupIds`.
-- `src/reasoning-capabilities.ts` — `resolveReasoningCapabilities`, `effortProviderOptions` (re-exports from `provider-factory`).
-- `src/model-compatibility.ts` — `shouldHideModel`, `hideReason`, `findBlacklistEntry`.
-**Supporting**
-- `src/data/model-incompatible.json` — curated incompatibility blacklist (image/audio/video/embedding/managed-agent/deprecated/gated/stale entries).
-- `src/data/models-dev-cache.json` — relayed models.dev snapshot used for conservative capability filtering and enrichment.
-- `src/data/pricing-cache.json` — relayed pricing snapshot.
-- `src/registry/models-dev.ts` — `findModelsDevModel`, `loadModelsDevCache`, `shouldHideByModelsDevCapabilities`.
-- `src/provider-factory.ts` — `getReasoningCapabilities`, `ReasoningCapabilities` / `ReasoningMetadata` types, effort vocabularies, wire formats.
-- `src/types.ts` — `ModelInfo`, `ModelFormat`, `ModelCost`.
----
-## Risks & Known Limitations
-- **Cost display is inaccurate for non-Anthropic models.** Claude Code applies its own internal pricing table to whatever model id it sees, so the cost shown for a Groq/DeepSeek/Gemini model is wrong. Unfixable from rflectr — the host owns its pricing display.
-- **GPT and Gemini models are `unsupported` in the cloud wizard.** They are hidden from the OpenCode Zen/Go wizard because the cloud proxy layer can't reach those model-specific endpoints. Workaround is the local OpenAI/Google provider (PRD-002), which routes through the SDK adapter.
-- **Context window is fixed at launch in switch-menu mode.** Claude Code's gateway model discovery carries only id + display name (no `context_window`) and fetches `/v1/models` once at startup, so a live `/model` switch does not update the displayed window. Single-model launches show the correct window.
-- **OpenCode cache is enrichment, not authority.** Without it, models still appear (id-only) but lose `name`, `family`, `cost`, and accurate per-model context/format from npm — classification then relies on id-prefix heuristics, which can misclassify novel ids.
-- **Heuristic windows can drift.** `HEURISTIC_RULES` and the brand map are hand-maintained; a new model family with no cache entry falls to the 200K default or a coarse pattern match until the rules are updated.
-- **Blacklist is point-in-time.** Entries carry `verifiedAt` dates (mostly 2026-06); a model that changes capabilities upstream won't be re-evaluated until the JSON is updated.
-- **`STALE_FREE_MODELS` is documented but not present as a constant** in v0.2.7 — the stale-free responsibility moved into `model-incompatible.json` (`category: stale_promotion`). Docs referencing the constant are slightly ahead of/behind the code.
----
-## Related
-- [Knowledge: Model Discovery & Classification](../../../knowledge/private/ai/model-discovery-classification.md)
-- [PRD-002 — Provider Registry](../prd-002-provider-registry/prd-002-provider-registry-index.md) — registry providers arrive with models pre-stamped (`modelFormat`, `npm`, `contextWindow`, `reasoning`).
-- [PRD-004 — Translation Layer](../prd-004-translation-layer/prd-004-translation-layer-index.md) — what happens after a model is classified `'openai'`.
-- [PRD-008 — Preferences, Tiers & Favorites](../prd-008-preferences-tiers-favorites/prd-008-preferences-tiers-favorites-index.md) — subscription tiers that drive which backends are queried and combined.

package/library/requirements/completed/prd-003-model-discovery-classification/qa/.gitkeep DELETED Viewed

File without changes

package/library/requirements/completed/prd-004-translation-layer/prd-004-translation-layer-index.md DELETED Viewed

@@ -1,196 +0,0 @@
-# PRD-004: Translation Layer (Vercel AI SDK Adapter) *(Retroactive)*
-> **Status:** Shipped
-> **Priority:** —
-> **Effort:** —
-> **Written:** June 2026
-> **Retroactive:** Yes — written after implementation (rflectr v0.2.7).
-> **Source:** `src/sdk-adapter.ts`, `src/provider-factory.ts`, `src/openai-adapter.ts`, `src/gemini-parts.ts`
----
-## Overview
-A Claude Code / Codex / Gemini host speaks one wire format (Anthropic `/v1/messages`, OpenAI Responses, or Gemini REST). The alternative model backends rflectr re-points those hosts at speak many — OpenAI Chat Completions, OpenAI Responses, Gemini `v1beta`, xAI, Mistral, DeepSeek, OpenRouter, openai-compatible endpoints, and so on. The translation layer is the single code path that bridges the host's format to whatever the selected model actually wants.
-The defining decision is that there is **exactly one translation path**, not N. Every non-Anthropic provider routes through the Vercel AI SDK (`ai` + `@ai-sdk/*`) — the same packages OpenCode loads — which owns wire format, endpoint selection, and provider quirks. rflectr never hand-rolls an Anthropic→OpenAI or Anthropic→Gemini translator per provider. Adding a provider once makes it usable from every host.
-Anthropic-native models skip the adapter entirely and are forwarded raw; `isSdkMigratedNpm(npm)` (`src/provider-factory.ts:68`) is the gate — true for any npm except `@ai-sdk/anthropic`.
-See the knowledge doc: [`../../../knowledge/private/ai/translation-layer.md`](../../../knowledge/private/ai/translation-layer.md).
-## What Was Built
-- A provider factory (`createLanguageModel`) that maps OpenCode's `api.npm` package name to a Vercel AI SDK `LanguageModel` via dynamic `import(npm)` and `create*`-export discovery (`src/provider-factory.ts:102`).
-- A Responses-vs-Chat API selector (`modelPrefersResponsesApi`) that routes newer OpenAI/xAI reasoning models through the Responses API (`src/provider-factory.ts:34`).
-- The Anthropic-facing adapter: `translateRequest` (request → SDK call params), `streamAnthropicResponse` (SDK `fullStream` → Anthropic SSE), and `generateAnthropicResponse` (non-streaming) (`src/sdk-adapter.ts:222`, `:413`, `:431`).
-- Inline `role:'system'` folding — Claude Code's mid-conversation skills list and system-reminders are merged into the system prompt instead of being dropped (`src/sdk-adapter.ts:90`, `:232`).
-- The `thought_signature` round-trip — smuggled through the tool-use id so reasoning models (especially Gemini) get their signature echoed back verbatim (`src/proxy-shared.ts:93`, `:72`).
-- Per-provider reasoning capability + effort translation (`getReasoningCapabilities`, `effortProviderOptions`, `thinkingProviderOptions`, `deepMergeProviderOptions`) (`src/provider-factory.ts:469`, `:633`, `:741`, `:725`).
-- Two more host-facing directions reusing the same factory + SDK model: the OpenAI Chat Completions adapter (`src/openai-adapter.ts`), the Codex Responses adapter (`src/codex-responses-adapter.ts`), and the Gemini parts translator (`src/gemini-parts.ts`).
-## Goals
-- One translation path for all non-Anthropic providers — eliminate the combinatorial per-provider translator mess.
-- Let the SDK own wire format, endpoint selection, and provider quirks (message ordering, reasoning signatures, tool-call encoding).
-- Preserve host behavior that the Anthropic wire format alone would lose — inline system messages and reasoning signatures.
-- Keep `dist/cli.js` small by loading SDK provider packages on demand from `node_modules`.
-- Make every host (Claude Code, Codex, Gemini, Claude Desktop gateway) converge on the same factory so a provider added once works everywhere.
-## Non-Goals
-- Hand-rolled per-provider wire translation — explicitly rejected in favor of the SDK.
-- Owning the agent tool loop. The adapter is strictly **one turn per request**; Claude Code owns the loop (`src/sdk-adapter.ts:3`).
-- Bundling SDK provider packages into the CLI. They ship as `dependencies` / `external` and resolve at runtime (`src/provider-factory.ts:81`).
-- Routing Anthropic-native models — those bypass the adapter and are forwarded raw (passthrough handled by the proxy, see PRD-005).
-- Accurate cost reporting for non-Anthropic models (Claude Code applies its own pricing table; documented limitation).
-## Features
-| # | Feature | Source |
-|---|---------|--------|
-| F1 | Single-path gate (`isSdkMigratedNpm`): SDK for all npm except `@ai-sdk/anthropic` | `src/provider-factory.ts:68` |
-| F2 | `createLanguageModel({npm,modelId,apiKey,baseURL})` factory via dynamic `import(npm)` + `create*` discovery | `src/provider-factory.ts:102`, `:72`, `:81` |
-| F3 | Responses-vs-Chat API selection (`modelPrefersResponsesApi`) for OpenAI/xAI | `src/provider-factory.ts:34` |
-| F4 | `translateRequest` — messages, tools, `tool_choice`, system | `src/sdk-adapter.ts:222` |
-| F5 | Inline `role:'system'` folding into the system prompt | `src/sdk-adapter.ts:90`, `:232` |
-| F6 | `streamAnthropicResponse` — SDK `fullStream` → Anthropic SSE | `src/sdk-adapter.ts:413`, `:273` |
-| F7 | `generateAnthropicResponse` — non-streaming case | `src/sdk-adapter.ts:431` |
-| F8 | `thought_signature` round-trip via tool-use id (encode/decode) | `src/proxy-shared.ts:93`, `:72`; `src/sdk-adapter.ts:190`, `:352` |
-| F9 | Per-provider reasoning effort/thinking translation | `src/provider-factory.ts:469`, `:633`, `:741` |
-| F10 | OpenAI Chat Completions host direction (reuses factory + SDK model) | `src/openai-adapter.ts:38` |
-| F11 | Gemini parts → Anthropic blocks translation | `src/gemini-parts.ts:16`, `:45` |
-| F12 | Codex Responses host direction (reuses factory + SDK model) | `src/codex-responses-adapter.ts:243` |
-## Architecture & Implementation
-### One translation path
-```mermaid
-flowchart TD
-    host["Host wire format<br/>(Anthropic / Responses / Gemini)"]
-    host --> gate{"isSdkMigratedNpm(npm)?"}
-    gate -- "no (@ai-sdk/anthropic)" --> raw["raw passthrough (PRD-005)"]
-    gate -- "yes" --> trans["translate*Request() — host body → SdkCallParams"]
-    trans --> factory["createLanguageModel({npm, modelId, apiKey, baseURL})"]
-    factory --> model["Vercel AI SDK LanguageModel"]
-    model --> run["streamText / generateText"]
-    run --> back["map SDK fullStream → host SSE / JSON"]
-    back --> host
-```
-The classifier in PRD-003 supplies `npm` and `modelFormat`; the proxy in PRD-005 dispatches into this layer.
-### Request translation (Anthropic → SDK)
-`translateRequest(body, npm, options?)` (`src/sdk-adapter.ts:222`) builds an `SdkCallParams` object:
-- **Messages** — `translateMessages` (`src/sdk-adapter.ts:153`) walks Anthropic blocks into SDK `ModelMessage[]`: text, images (`imagePart`, `:103`), `tool_use` → `tool-call`, `tool_result` → a `tool` role message, and `thinking` → SDK `reasoning` parts. Tool-result messages need a tool name, resolved by `annotateToolNames` (`:116`) which builds an id→name map first.
-- **Tools** — `translateTools` (`:204`) wraps each Anthropic tool as an SDK `tool({ description, inputSchema: jsonSchema(...) })`.
-- **`tool_choice`** — `translateToolChoice` (`:214`) maps `auto`→`'auto'`, `any`→`'required'`, `tool`→`{type:'tool',toolName}`.
-- **System folding** — `systemToString` (`:82`) flattens the top-level `system`; `inlineSystemText` (`:90`) collects mid-conversation `role:'system'` messages (Claude Code injects the skills list and system-reminders this way) and joins them into the system prompt so they are not dropped (`:232`).
-- **Reasoning** — effort is read from `output_config.effort` (`anthropicEffortFromRequest`, `:65`) or `options.defaultEffort` (the Claude Desktop gateway omits effort); `providerOptions` is the deep-merge of `thinkingProviderOptions(npm)` and `effortProviderOptions(...)` (`:244`).
-- **ChatGPT Codex OAuth branch** — when `openAiOAuth` is set, the system prompt moves into `providerOptions.openai.instructions`, `system`/`maxOutputTokens` are cleared, and a default `"You are a coding assistant."` is used if no system text exists (`:235`, `:251`, `:258`).
-### Factory discovery (`createLanguageModel`)
-`createLanguageModel(spec)` (`src/provider-factory.ts:102`) is async and routes by `npm`:
-| npm | Behavior | Source |
-|---|---|---|
-| `@ai-sdk/google-vertex/anthropic` (`VERTEX_ANTHROPIC_NPM`) | Claude on Vertex AI via ADC (no apiKey) | `:105` |
-| `@ai-sdk/openai` | OAuth → ChatGPT Codex backend (`https://chatgpt.com/backend-api/codex`); API key → direct. `modelPrefersResponsesApi()` picks `openai.responses(id)` vs `openai.chat(id)` | `:117` |
-| `@ai-sdk/xai` | Direct; also consults `modelPrefersResponsesApi()` | `:134` |
-| `@ai-sdk/google` | Direct — ignores `baseURL`, uses native `v1beta` (passing the OpenAI-compat discovery URL would 404) | `:142` |
-| `@ai-sdk/anthropic` | Direct; strips a trailing `/v1` from `baseURL`, re-appends `/v1` for the SDK | `:149` |
-| `@ai-sdk/openai-compatible`, `@openrouter/ai-sdk-provider` | Routed via `baseURL` | `:160`, `:167` |
-| anything else | `loadSdkProviderFactory(npm)` → dynamic `import(npm)` → `findCreateFactory` finds the `create*()` export | `:170`, `:81`, `:72` |
-`findCreateFactory` (`:72`) scans the module's exports for a function whose name starts with `create`. `loadSdkProviderFactory` (`:81`) caches the promise per npm and, on `ERR_MODULE_NOT_FOUND`, raises an install hint. Reasoning models matching `/deepseek-r1|think|reasoning|qwq/` are wrapped with `extractReasoningMiddleware({ tagName: 'think' })` (`:176`). The `@ai-sdk/*` packages are npm `dependencies` marked `external` in `tsup.config.ts`, so `dist/cli.js` stays small.
-### Responses-vs-Chat selection
-`modelPrefersResponsesApi(modelId)` (`src/provider-factory.ts:34`) returns true when a model must use `/v1/responses` rather than `/v1/chat/completions`:
-| Pattern | Examples | Matched by |
-|---|---|---|
-| `gpt-5.4` / `gpt-5.5` prefixes | `gpt-5.4`, `gpt-5.5-fast` | `RESPONSES_ONLY_PREFIXES` exact/`-`-prefix (`:12`, `:36`) |
-| `gpt-5-pro` / `gpt-5.2-pro` | `gpt-5-pro` | same |
-| `gpt-5-codex` and versioned codex | `gpt-5-codex`, `gpt-5.3-codex` | prefix + `gpt-*-codex` check (`:40`) |
-| o-series | `o3`, `o4`, `o3-mini` | prefix list (`:18`) |
-| xAI multi-agent | `grok-4.20-multi-agent`, `grok-4.2-multiagent` | `grok-*` + `multi-agent`/`multiagent` (`:42`) |
-`upstreamModelId` (from PRD-003) carries OpenCode's `api.id` because catalog ids may differ from upstream API ids (e.g. `gpt-5.5-fast` → `gpt-5.5`).
-### Response translation (SDK → Anthropic SSE)
-`writeAnthropicStream` (`src/sdk-adapter.ts:273`) consumes the SDK `fullStream` and emits Anthropic SSE events. It tracks one open content block at a time and maps SDK parts:
-- `reasoning-start` / `reasoning-delta` → `thinking` block + `thinking_delta`; `reasoning-end` captures the round-trip signature emitted later as a `signature_delta` on close (`:321`, `:331`, `:302`).
-- `text-start` / `text-delta` → `text` block + `text_delta` (`:337`).
-- `tool-input-start` / `-delta` → `tool_use` block (id encoded with the thought signature) + `input_json_delta`; `tool-call` handles the non-streamed case (`:349`, `:365`).
-- `finish` maps `finishReason` and usage; `error` closes the open block and emits an Anthropic `error` event (`:381`, `:393`).
-`streamAnthropicResponse` (`:413`) wires `streamText` to `writeAnthropicStream` and swallows stream-property rejections; `generateAnthropicResponse` (`:431`) runs `generateText` and builds a single Anthropic message JSON.
-### thought_signature round-trip
-Reasoning models (especially Gemini) require their `thought_signature` echoed back verbatim on the next turn, but the Anthropic wire format has no field for it. rflectr smuggles it through the tool-use id:
-- **Encode** — `encodeToolUseId(rawId, signature)` (`src/proxy-shared.ts:93`) base64url-encodes the signature and appends it after a `__ts__` separator: `{rawId}__ts__{base64url(signature)}`. When emitting blocks, the adapter calls it at `tool-input-start` / `tool-call` with the signature from `grabRoundTripSignature` (`src/sdk-adapter.ts:352`, `:371`).
-- **Decode** — `splitToolUseId(id)` (`src/proxy-shared.ts:72`) recovers `{ rawId, thoughtSignature }`. On the next request, `translateMessages` decodes it and, for Google, sets `providerOptions.google.thoughtSignature` on the `tool-call` part (`src/sdk-adapter.ts:190`); the SDK then handles Gemini's strict echo-back.
-`grabRoundTripSignature` (`src/proxy-shared.ts:22`) reads the signature from provider metadata — `google.thoughtSignature` / `thought_signature` for Gemini, `openai.reasoningEncryptedContent` for OpenAI Responses. On the Gemini host direction, `partThoughtSignature` (`src/gemini-parts.ts:8`) pulls it off the function-call part and `parseGeminiPart` re-encodes it into the tool-use id (`:30`).
-> **Separator note:** the live separator is `__ts__` with a base64url-encoded payload (`src/proxy-shared.ts:38`, `:93`). `::ts::` is retained only as a legacy decode fallback for sessions started before the change (`:82`). The knowledge doc describes the `::ts::` form.
-### Reusing the factory for the other two host directions
-The same `createLanguageModel` + `streamText`/`generateText` underpins all hosts; only the host-facing translation differs:
-- **OpenAI Chat Completions** (`src/openai-adapter.ts`): `translateOpenAiRequest` (`:38`) builds `SdkCallParams` from an OpenAI body; `generateOpenAiResponse` (`:131`) / `streamOpenAiResponse` (`:161`) emit the OpenAI JSON / SSE shape.
-- **Codex Responses API** (`src/codex-responses-adapter.ts`): `translateResponsesRequest` (`:243`) / `translateResponsesInput` (`:166`) / `translateResponsesTools` (`:230`) build SDK params from a Responses body; `streamResponsesResponse` (`:575`) / `generateResponsesResponse` (`:593`) emit Responses SSE/JSON.
-- **Gemini REST** (`src/gemini-parts.ts`): `parseGeminiPart` (`:16`), `collectAnthropicBlocksFromGeminiParts` (`:45`), and `mapGeminiUsage` (`:76`) translate Gemini parts and usage.
-## Acceptance Criteria
-- [x] All non-Anthropic providers route through the Vercel AI SDK; `@ai-sdk/anthropic` is the only npm that bypasses it (`isSdkMigratedNpm`, `src/provider-factory.ts:68`).
-- [x] `createLanguageModel` resolves an SDK `LanguageModel` from `{npm, modelId, apiKey, baseURL}` via dynamic `import(npm)` + `create*` discovery (`src/provider-factory.ts:102`, `:72`, `:81`).
-- [x] Special factory branches exist for Vertex Anthropic, OpenAI (OAuth Codex backend + Responses/chat), xAI, Google `v1beta`, Anthropic `/v1` normalization, openai-compatible, and OpenRouter (`src/provider-factory.ts:105`–`:174`).
-- [x] `modelPrefersResponsesApi` returns true for GPT-5.4+/5.5/pro, `*-codex`, the o-series, and xAI multi-agent models (`src/provider-factory.ts:34`).
-- [x] `translateRequest` maps messages, tools, `tool_choice`, and system into `SdkCallParams` (`src/sdk-adapter.ts:222`).
-- [x] Inline `role:'system'` messages are folded into the system prompt rather than dropped (`src/sdk-adapter.ts:90`, `:232`).
-- [x] `streamAnthropicResponse` maps the SDK `fullStream` to Anthropic SSE and `generateAnthropicResponse` handles the non-streaming case (`src/sdk-adapter.ts:413`, `:431`).
-- [x] `thought_signature` is encoded into the tool-use id and decoded back into `providerOptions.google.thoughtSignature` (`src/proxy-shared.ts:93`, `:72`; `src/sdk-adapter.ts:190`).
-- [x] Per-provider reasoning effort/thinking is translated via `getReasoningCapabilities` / `effortProviderOptions` / `thinkingProviderOptions` / `deepMergeProviderOptions` (`src/provider-factory.ts:469`, `:633`, `:741`, `:725`).
-- [x] The Codex Responses and OpenAI/Gemini host directions reuse the same factory + SDK model (`src/codex-responses-adapter.ts:243`, `src/openai-adapter.ts:38`, `src/gemini-parts.ts:16`).
-- [x] SDK provider packages load on demand and stay `external` so `dist/cli.js` remains small (`src/provider-factory.ts:81`; `tsup.config.ts`).
-## Files
-| File | Role |
-|------|------|
-| `src/sdk-adapter.ts` | Anthropic `/v1/messages` ↔ SDK; `translateRequest`, `translateMessages`, `streamAnthropicResponse`, `generateAnthropicResponse`, inline-system folding |
-| `src/provider-factory.ts` | `createLanguageModel`, `isSdkMigratedNpm`, `modelPrefersResponsesApi`, reasoning capability + effort translation |
-| `src/openai-adapter.ts` | OpenAI Chat Completions host direction (`translateOpenAiRequest`, `generate`/`streamOpenAiResponse`) |
-| `src/gemini-parts.ts` | Gemini content-part → Anthropic block translation + usage mapping |
-| `src/codex-responses-adapter.ts` | Codex Responses API host direction (translation aspects) |
-| `src/proxy-shared.ts` | `encodeToolUseId` / `splitToolUseId` (thought_signature round-trip), `grabRoundTripSignature`, SSE helpers |
-| `tests/sdk-adapter.test.ts`, `tests/provider-factory.test.ts` | Unit coverage for the pure translation functions |
-## Risks & Known Limitations
-- **`thought_signature` separator collision.** The signature is appended after a separator in the tool-use id. The (live) `__ts__` separator carries a base64url payload, so a collision would require the encoded payload to itself contain `__ts__` — extremely unlikely. The legacy `::ts::` form (kept as a decode fallback) would break only if a raw signature literally contained `::ts::` (`src/proxy-shared.ts:38`, `:82`).
-- **Gemini strict echo-back.** Gemini rejects requests that don't echo `thought_signature` verbatim. This is why the hand-rolled Gemini-native path was retired — the SDK handles the echo-back once the signature round-trips (`src/sdk-adapter.ts:190`, `src/gemini-parts.ts:8`).
-- **Cost display inaccuracy.** Claude Code applies its own pricing table, so non-Anthropic model cost is always wrong (documented, by design).
-- **One turn per request.** The adapter never loops; if a host expected the adapter to drive a tool loop it would break. Claude Code owns the loop (`src/sdk-adapter.ts:3`).
-- **`@ai-sdk/github-copilot` is unsupported.** OpenCode loads it from internal `@opencode-ai/core`, not a public npm factory the dynamic `import(npm)` can resolve (documented limitation).
-- **Provider-specific reasoning mappings are heuristic.** Effort levels are mapped per provider (e.g. xAI has no `medium`; DeepSeek `low/medium`→`high`), so a requested effort may snap to the nearest valid value (`src/provider-factory.ts:424`, `:359`).
-## Related
-- [`../../../knowledge/private/ai/translation-layer.md`](../../../knowledge/private/ai/translation-layer.md) — the knowledge doc this PRD documents.
-- [`../prd-003-model-discovery-classification/prd-003-model-discovery-classification-index.md`](../prd-003-model-discovery-classification/prd-003-model-discovery-classification-index.md) — classification supplies the `npm` and `modelFormat` that drive factory routing and the `isSdkMigratedNpm` gate.
-- [`../prd-005-local-proxy-catalog-routing/prd-005-local-proxy-catalog-routing-index.md`](../prd-005-local-proxy-catalog-routing/prd-005-local-proxy-catalog-routing-index.md) — the local proxy dispatches Anthropic-format requests into this adapter (or raw passthrough for `@ai-sdk/anthropic`).
-- [`../prd-009-codex-integration/prd-009-codex-integration-index.md`](../prd-009-codex-integration/prd-009-codex-integration-index.md) — Codex reuses the factory + SDK model via the Responses host direction.
-- [`../prd-010-gemini-cli-integration/prd-010-gemini-cli-integration-index.md`](../prd-010-gemini-cli-integration/prd-010-gemini-cli-integration-index.md) — Gemini CLI reuses the factory via the Gemini REST host direction.

package/library/requirements/completed/prd-004-translation-layer/qa/.gitkeep DELETED Viewed

File without changes