npm - @loicngr/kobo - Versions diffs - 1.7.6 → 1.7.8 - Mend

@loicngr/kobo 1.7.6 → 1.7.8

Files changed (110)

package/AGENTS.md CHANGED

@@ -171,6 +171,35 @@ When adding features touching `notion-service.ts`, remember: **no token = no fea
 See the "Notion integration" section of the README for the end-user setup guide.
+### Agent engines
+Two engines live under `src/server/services/agent/engines/`, both implementing the `AgentEngine` contract in `types.ts`:
+**Claude Code** (`claude-code/`) — uses `@anthropic-ai/claude-agent-sdk` (in-process async iterator). Spawns no subprocess. Auth via `~/.claude.json` or `ANTHROPIC_API_KEY` env var.
+**OpenAI Codex** (`codex/`) — uses the **`codex app-server` JSON-RPC protocol** (line-delimited JSON over stdio with a long-lived `codex` subprocess). The engine layers are:
+- `jsonrpc/transport.ts` + `jsonrpc/peer.ts` — generic JSON-RPC 2.0 stdio peer (request correlation, notifications, server-initiated requests)
+- `client.ts` — typed `AppServerClient` wrapping the peer (initialize / thread.start / thread.resume / turn.start / turn.interrupt)
+- `protocol/types.ts` — hand-written subset of the Codex v2 protocol types (camelCase field names — `agentMessage`, `commandExecution`, etc.). The full canonical bindings are generated by `codex app-server generate-ts` if the protocol drifts.
+- `event-mapper.ts` — translates app-server notifications (`item/started`, `item/completed`, `item/agentMessage/delta`, `turn/completed`, `thread/tokenUsage/updated`, `account/rateLimits/updated`, `error`) into Kōbō `AgentEvent` union
+- `server-requests.ts` — handles server-initiated approval/elicitation requests (`item/commandExecution/requestApproval`, `item/fileChange/requestApproval`, `item/tool/requestUserInput`, `item/permissions/requestApproval`, plus v1 legacy aliases `execCommandApproval` / `applyPatchApproval`)
+- `engine.ts` — `createCodexEngine()` factory wiring everything into `AgentEngine`
+- `spawn.ts` — locates the `codex` binary via `@openai/codex` dependency and spawns `codex app-server`
+Auth: delegated to the `codex` CLI which reads `OPENAI_API_KEY` from env or `~/.codex/auth.json`. Kōbō ships no Codex credentials. The `@openai/codex` package (binary) is a direct dependency.
+Background: the engine was migrated from `@openai/codex-sdk` (one-shot `codex exec`) to `codex app-server` in May 2026 to unlock features the SDK didn't surface (sub-agent visibility, interactive approvals, `request_user_input`, structured rate limits). The original migration plan and wire-capture notes live at `docs/superpowers/plans/2026-05-11-codex-app-server-migration.md` and `2026-05-11-codex-app-server-wire-capture.md`.
+**Protocol gotchas worth remembering** (post-migration findings):
+- **`experimentalApi: true` is mandatory in the `initialize` handshake.** Without it, any turn using experimental fields — most importantly `turn/start.collaborationMode` — is rejected with `-32600: requires experimentalApi capability`. See `client.ts:connect()`.
+- **`collaborationMode` is sticky server-side.** Once a turn ran in `mode: 'plan'`, every subsequent turn on the same thread stays in plan until we explicitly send `mode: 'default'` again. The engine therefore always emits the field on `turn/start` — never omits it — so a Plan → Bypass switch actually takes effect. Mapping: Kōbō `plan` → `plan`, every other Kōbō mode → `default`. Plan mode is the only one that unlocks Codex's internal `request_user_input` tool.
+- **Permission mode vs collaboration mode are independent.** Sandbox + approvalPolicy control *what the agent may do at OS level* (read-only / workspace-write, never / on-request / unless-trusted). `collaborationMode` is a separate session-level flag that gates internal Codex behaviour (notably interactive Q&A). Kōbō hides both behind a single "permission mode" selector and maps them together.
+- **Sub-agents map to `collabAgentToolCall`.** Codex's analogue of Claude's Task tool is `collabAgentToolCall` (`spawnAgent` / `sendInput` / `resumeAgent` / `wait` / `closeAgent`). The mapper emits **both** a `tool:call` named `Task` (chat card) and a `subagent:progress` event (right-hand panel) per call — same dual-emission Claude does. See `event-mapper.ts` `handleItemStarted` / `handleItemCompleted` for the `collabAgentToolCall` branch.
+- **`fileChange` items carry a unified-diff blob.** The protocol shape is `{ path, kind: PatchChangeKind, diff: string }` per change; `kind` is a discriminated union, not a string. The mapper flattens the first change into a Claude-style Edit input (`{ file_path, diff, change_kind, move_path? }`) so the existing `ToolCallItem` renderer picks it up. The client parses the unified diff into `DiffLine[]` via `parseUnifiedDiff` in `inline-diff.ts`.
+- **Streaming bursts trip auto-scroll.** Codex emits one `message:text` event per token-delta (50-200 per message), versus Claude which emits ~1 per content block. The naive `eventCount` watcher in `ActivityFeed.vue` triggered an animated `scrollToBottom(180)` per event, causing stacked animations and visible jank. The fix coalesces requests through `requestAnimationFrame` and only animates the *first* scroll after a quiet period — subsequent scrolls during a burst snap instantly.
+- **`MCP tools` need `default_tools_approval_mode: 'auto'` in `config.mcp_servers`.** Without it Codex flags every MCP tool call as needing user approval ("user cancelled MCP tool call"). Kōbō trusts every tool it spawns, so the options-builder pre-approves the namespace.
 ## Code conventions
 **Service layer** throws descriptive errors; the route layer catches and maps to HTTP status codes. Error messages follow the pattern `` `Workspace '${id}' not found` `` / `` `... is already archived` ``.
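
Note on the AGENTS.md hunk above: the Codex engine is described as speaking line-delimited JSON over stdio, with requests, notifications, and server-initiated requests sharing one stream. The sketch below shows one way such framing and classification could work. It is not the code in `jsonrpc/transport.ts` or `jsonrpc/peer.ts`; `LineDecoder`, `classify`, and the `JsonRpcMessage` shape are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch: none of these names come from the package itself.
type JsonRpcMessage = {
  jsonrpc?: string
  id?: number | string
  method?: string
  params?: unknown
  result?: unknown
  error?: { code: number; message: string }
}

type MessageKind = 'request' | 'notification' | 'response'

class LineDecoder {
  private buffer = ''

  // Accepts an arbitrary stdout chunk and returns every complete JSON line it closes.
  push(chunk: string): JsonRpcMessage[] {
    this.buffer += chunk
    const lines = this.buffer.split('\n')
    // The last element is '' when the chunk ended on a newline, else a partial line.
    this.buffer = lines.pop() ?? ''
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as JsonRpcMessage)
  }
}

// A message with a method and an id expects a reply (server-initiated request),
// a method without an id is a notification, and anything else answers an earlier request.
function classify(msg: JsonRpcMessage): MessageKind {
  const hasId = msg.id !== undefined
  if (msg.method !== undefined) return hasId ? 'request' : 'notification'
  return 'response'
}
```

Buffering the trailing partial line matters because a subprocess pipe can split a JSON object across two reads; only text terminated by a newline is parsed.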

package/README.md CHANGED

@@ -1,6 +1,6 @@
 # Kōbō
-> **Kōbō** (工房) — Japanese for *workshop*. A multi-workspace agent manager for [Claude Code](https://claude.com/claude-code).
+> **Kōbō** (工房) — Japanese for *workshop*. A multi-workspace agent manager for [Claude Code](https://claude.com/claude-code) and [OpenAI Codex](https://developers.openai.com/codex/) *(Codex support is still experimental — see [the section below](#openai-codex-integration))*.
 > [!NOTE]
 > 🚧 **Active development** — breaking changes may still land on `develop`. The database layer ships with forward-only migrations and a timestamped pre-migration backup of `kobo.db` before any schema change, so upgrades preserve your data even across invasive refactors.
@@ -12,7 +12,7 @@ Think of it as an apprentice's hall: you hand out missions, each apprentice sets
 ## Features
 - **Isolated git worktrees** — every workspace runs on its own branch in its own directory, with a configurable global worktrees root for new workspaces, so concurrent Claude sessions never step on each other
-- **Pluggable agent engine** — Kōbō talks to agents through an `AgentEngine` contract with a normalised `AgentEvent` stream (`src/server/services/agent/engines/`). The `claude-code` engine runs on the official [`@anthropic-ai/claude-agent-sdk`](https://github.com/anthropics/claude-agent-sdk-typescript); adding a second runtime (e.g. Codex) only requires a new adapter, not a rewrite of the UI or orchestration layer
+- **Pluggable agent engine — two runtimes shipped** — Kōbō talks to agents through an `AgentEngine` contract with a normalised `AgentEvent` stream (`src/server/services/agent/engines/`). The `claude-code` engine runs on the official [`@anthropic-ai/claude-agent-sdk`](https://github.com/anthropics/claude-agent-sdk-typescript); the `codex` engine speaks the [`codex app-server`](https://github.com/openai/codex) JSON-RPC protocol over stdio with the official [`@openai/codex`](https://www.npmjs.com/package/@openai/codex) binary. Engine is chosen per-workspace at creation time, with a single normalised UI (sub-agents, tool calls, todos, reasoning, permission modes, MCP servers, auto-loop) covering both. Adding a third runtime only requires a new adapter, not a rewrite of the UI or orchestration layer
 - **Interactive `AskUserQuestion`** — when the agent invokes `AskUserQuestion`, Kōbō pauses the session via the SDK's `defer` pattern, surfaces a question panel in the UI, and resumes the agent once the user answers. The session does not occupy any resources while it waits
 - **Rich chat feed** — live streaming text, thinking blocks, inline tool calls with expandable diffs for Edit/Write, per-turn session cards, markdown rendering, jump-to-previous-user-message button, and infinite scroll-up over persisted history
 - **Task & acceptance criteria tracking** — the agent reports progress through a dedicated MCP server (`kobo-tasks`) that reads and updates tasks directly from the SQLite database
@@ -27,7 +27,7 @@ Think of it as an apprentice's hall: you hand out missions, each apprentice sets
 - **Prompt templates** — personal library of reusable prompts with variable substitution (`{working_branch}`, `{commit_count}`, etc.), insertable from the chat input via `/` autocomplete; editable in Settings > Templates
 - **Favorites and tags** — pin workspaces to the top via right-click favourite, organise with per-workspace tags filterable from the sidebar; a global tag catalogue keeps colours consistent across workspaces
 - **Health panel + config export/import** — inspect backend health (agent sessions, migration state, dev servers, DB size) and roundtrip your Kōbō config (settings, templates, skills) between machines via JSON
-- **Account-level quota panel** — a colored mini-bar badge in the chat footer shows the current Claude Code 5-hour and 7-day usage, fed by a backend service that polls Anthropic's OAuth usage endpoint every 60 seconds. Click to open a popover with full bars, reset times, a "Refresh now" button, and a one-click jump to the Stats tab. Pluggable per-provider (Codex-ready), persisted in SQLite so the badge is populated on cold start, and account-level so it's the same across workspaces sharing the same engine
+- **Account-level quota panel** — a colored mini-bar badge in the chat footer shows the current Claude Code 5-hour and 7-day usage (Claude workspaces) or live Codex rate-limit buckets (Codex workspaces — driven by the structured `account/rateLimits/updated` app-server notification). Click to open a popover with full bars, reset times, a "Refresh now" button, and a one-click jump to the Stats tab. Pluggable per-provider, persisted in SQLite so the badge is populated on cold start, and account-level so it's the same across workspaces sharing the same engine
 - **Resizable right drawer** — drag-to-resize horizontally and vertically, with tab state and split ratio persisted to localStorage
 - **Soft interrupt** — pause an agent mid-execution (SIGINT, like pressing Escape in Claude Code) without killing the process; the agent stops the current tool and waits for the next message
 - **Archive instead of delete** — soft-remove workspaces without losing the worktree, branches, or history; unarchive restores the exact pre-archive state
@@ -51,7 +51,9 @@ Think of it as an apprentice's hall: you hand out missions, each apprentice sets
 ### Prerequisites
 - Node.js ≥ 20
-- [Claude Code](https://claude.com/claude-code) authenticated via `claude /login` once. The `claude` CLI is **no longer required at runtime** — Kōbō embeds the official [`@anthropic-ai/claude-agent-sdk`](https://github.com/anthropics/claude-agent-sdk-typescript), which reuses the same login.
+- At least one agent runtime, authenticated:
+  - [Claude Code](https://claude.com/claude-code) — `claude /login` once. The `claude` CLI is **no longer required at runtime** — Kōbō embeds the official [`@anthropic-ai/claude-agent-sdk`](https://github.com/anthropics/claude-agent-sdk-typescript), which reuses the same login.
+  - [OpenAI Codex](https://developers.openai.com/codex/) — `codex login` once, or export `OPENAI_API_KEY` in the env. See [OpenAI Codex integration](#openai-codex-integration). Workspaces pick an engine at creation time, so you only need to set up the one(s) you use.
 - Git
 - Optional: Docker (if you configure per-workspace dev servers)
 - Optional: `gh` CLI (if you use the PR automation)
@@ -155,6 +157,146 @@ If you need to pin a specific version of the Notion MCP server, use a fork, or a
 Without a valid token configured, the Notion import field in the workspace creation form will return an error when you click **Refresh** or submit a Notion URL — the rest of Kōbō (workspaces, agents, tasks, Git integration) keeps working independently.
+## OpenAI Codex integration
+> [!WARNING]
+> 🧪 **Experimental** — the Codex engine has shipped but is still maturing. The Claude Code engine remains the primary, battle-tested path. Expect occasional rough edges on Codex-only flows (tool rendering for less common item types, sub-agent interactions, edge cases around `collaborationMode` and approval prompts). Bugs and feedback welcome on the issue tracker.
+Kōbō ships a second agent engine that runs on top of the official **OpenAI Codex** CLI. Pick `OpenAI Codex` in the engine selector when you create a workspace and the agent talks to the `codex` binary instead of Claude Code, with the same UI surface: streaming text, reasoning blocks, tool cards (Bash, Edit, Read, WebSearch, MCP tools, ImageGeneration), sub-agents (Codex's `collabAgentToolCall` family is mapped onto the same `Task` panel Claude's Task tool feeds), todo list, permission modes, interactive approvals, structured rate limits, auto-loop. **This feature is opt-in and requires you to authenticate the `codex` CLI separately from Claude Code** — Kōbō ships no OpenAI credentials.
+Under the hood, Kōbō spawns a long-lived `codex app-server` subprocess per workspace and speaks the [Codex app-server JSON-RPC protocol](https://github.com/openai/codex/tree/main/codex-rs/app-server) over stdio. The `codex` binary is pulled in via the [`@openai/codex`](https://www.npmjs.com/package/@openai/codex) npm package, which is a direct dependency — no separate install required.
+### Authenticating the Codex CLI
+Two paths, pick one:
+1. **`codex login` (recommended)** — run `codex login` once. The CLI writes a token to `~/.codex/auth.json` which Kōbō's spawned `codex app-server` reuses automatically:
+   ```bash
+   codex login
+   ```
+2. **`OPENAI_API_KEY` env var** — set the variable before launching Kōbō:
+   ```bash
+   OPENAI_API_KEY=sk-your-key-here PORT=9999 npx @loicngr/kobo@latest
+   ```
+Kōbō does not store or proxy the key. If you change the credential or revoke it, Kōbō follows automatically on the next session start.
+### Permission modes (Codex)
+Kōbō's four permission modes (`plan` / `bypass` / `strict` / `interactive`) map to Codex's `sandbox` + `approvalPolicy` pair, plus a separate `collaborationMode` flag that gates interactive questions:
+| Kōbō mode | Codex sandbox | Codex approvalPolicy | Codex collaborationMode | Effect |
+|---|---|---|---|---|
+| `plan` | `read-only` | `never` | `plan` | Read-only sandbox + the agent can ask interactive questions (`request_user_input`) |
+| `bypass` | `workspace-write` | `never` | `default` | Full autonomy in the worktree, no approvals |
+| `strict` | `workspace-write` | `on-request` | `default` | Writes allowed, approval prompted on sensitive commands |
+| `interactive` | `workspace-write` | `unless-trusted` | `default` | Writes allowed, approval prompted on every untrusted action |
+Interactive Q&A (`request_user_input`) is only available in `plan` — this is a constraint of Codex itself, not Kōbō. The typical workflow is: brainstorm in `plan` until the agent has the context it needs, then switch to `bypass`/`strict` for execution.
+### Models and reasoning effort
+The Codex engine exposes the OpenAI model catalogue (`gpt-5-codex`, `gpt-5.4`, `o4-mini`, `o3`) and the standard reasoning-effort scale (`auto` / `minimal` / `low` / `medium` / `high` / `xhigh`). Both selectors switch automatically when you flip the workspace's engine.
+### Sub-agents
+When the Codex agent uses its `spawnAgent` collab tool, Kōbō renders a **Task** card in the chat (like Claude's Task tool) and a live entry in the **SUB-AGENTS** panel of the right drawer — same plumbing the Claude engine uses. The same panel is hidden for engines that don't expose sub-agents.
+### MCP servers
+The `kobo-tasks` MCP server (and any other MCP server you configure on the workspace) is plumbed into Codex through the standard `config.mcp_servers` entry. Tool calls under those servers are pre-approved (`default_tools_approval_mode: 'auto'`) so the agent doesn't get blocked on every call.
+### When the binary is missing
+Without a working `codex` install or a valid credential, creating a `codex`-engine workspace returns a clear error at first turn and the workspace transitions to `error` status. The rest of Kōbō (Claude-engine workspaces, tasks, Git, dev servers) keeps working independently.
+## Voice transcription (local Whisper)
+Kōbō supports local voice transcription with push-to-talk in both:
+- `WorkspacePage` (chat input)
+- `CreatePage` (workspace instructions textarea)
+### Requirements
+- `whisper-cli` from [`whisper.cpp`](https://github.com/ggml-org/whisper.cpp)
+- `ffmpeg`
+- `cmake` (required to build `whisper.cpp` from source)
+- At least one Whisper model downloaded from **Settings → Voice**
+### Install `whisper.cpp` (Linux/macOS)
+```bash
+git clone https://github.com/ggml-org/whisper.cpp.git
+cd whisper.cpp
+cmake -B build
+cmake --build build -j
+```
+This usually produces `build/bin/whisper-cli`.
+You can also download a prebuilt archive from the `whisper.cpp` releases page (for example: <https://github.com/ggml-org/whisper.cpp/releases/tag/v1.8.4>) and point Kōbō to the extracted `whisper-cli` binary path.
+### Install `ffmpeg`
+Ubuntu / Debian:
+```bash
+sudo apt update
+sudo apt install -y cmake build-essential ffmpeg
+```
+Windows:
+- Install `ffmpeg` (for example via Chocolatey: `choco install ffmpeg`, or via Scoop: `scoop install ffmpeg`)
+- Verify in PowerShell:
+```powershell
+where ffmpeg
+ffmpeg -version
+```
+### Windows notes for `whisper.cpp`
+Install CMake and Visual Studio Build Tools (C/C++), then build `whisper.cpp` (or use a prebuilt `whisper-cli`), then verify:
+```powershell
+where whisper-cli
+whisper-cli -h
+```
+### Configure in Kōbō
+Open **Settings → Voice**:
+- Enable voice transcription
+- Optionally set:
+  - **Whisper binary path (optional)**
+  - **ffmpeg binary path (optional)**
+- If left empty, Kōbō falls back to:
+  - `whisper-cli` from `PATH` (or `WHISPER_CPP_COMMAND` if set)
+  - `ffmpeg` from `PATH`
+- Download a model (e.g. `base`) and select it as active
+The Voice panel shows runtime status (`ready/missing`) for both Whisper and ffmpeg so setup issues are visible immediately.
+### Advanced voice parameters
+Kōbō exposes additional transcription settings in **Settings → Voice**:
+- **Temperature** (`0..1`) — decoding stability vs flexibility
+- **Initial prompt** — optional context/jargon for better recognition
+- **Translate to English** — translate non-English speech to English
+- **Suppress non-speech tokens** — reduce non-speech artifacts in output
+Recommended defaults by model:
+- `tiny` / `base` → `0.1`
+- `small` / `medium` / `large-v3` → `0.2`
 ## Sentry integration
 Kōbō can turn a Sentry issue into a dedicated "fix workspace" — you paste the issue URL at workspace creation and Kōbō extracts the stacktrace, culprit, tags, offending spans and extra context, writes them as a local markdown file inside the worktree (`.ai/thoughts/SENTRY-<id>.md`), and primes the Claude agent with a TDD fix workflow that points at that file. The agent also keeps access to the Sentry MCP tools (`search_issue_events`, `get_issue_tag_values`, `get_sentry_resource`) so it can dig deeper on its own. **This feature is opt-in and reuses the Sentry MCP configuration you already have for Claude Code** — Kōbō does not manage a Sentry token separately.
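
Note on the README hunk above: the permission-mode table added under "Permission modes (Codex)" can be read as a fixed lookup from a Kōbō mode to three Codex settings. The sketch below encodes the table rows as given. `KoboMode`, `CodexTurnPolicy`, and `toCodexPolicy` are hypothetical names, and the flat object shape is an assumption; the actual wire fields sent on `turn/start` are not shown in this diff.

```typescript
// Hypothetical sketch of the README table; not the package's options-builder.
type KoboMode = 'plan' | 'bypass' | 'strict' | 'interactive'

interface CodexTurnPolicy {
  sandbox: 'read-only' | 'workspace-write'
  approvalPolicy: 'never' | 'on-request' | 'unless-trusted'
  collaborationMode: 'plan' | 'default'
}

// One row per Kōbō mode, copied from the table in the README diff.
const POLICY_BY_MODE: Record<KoboMode, CodexTurnPolicy> = {
  plan: { sandbox: 'read-only', approvalPolicy: 'never', collaborationMode: 'plan' },
  bypass: { sandbox: 'workspace-write', approvalPolicy: 'never', collaborationMode: 'default' },
  strict: { sandbox: 'workspace-write', approvalPolicy: 'on-request', collaborationMode: 'default' },
  interactive: { sandbox: 'workspace-write', approvalPolicy: 'unless-trusted', collaborationMode: 'default' },
}

// collaborationMode is always present in the result, never omitted, because
// AGENTS.md notes the setting is sticky server-side until 'default' is sent again.
function toCodexPolicy(mode: KoboMode): CodexTurnPolicy {
  return { ...POLICY_BY_MODE[mode] }
}
```

Returning a copy keeps callers from mutating the shared table, and typing the table as `Record<KoboMode, CodexTurnPolicy>` makes the compiler flag a missing row if a fifth mode is ever added.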

package/dist/mcp-server/kobo-tasks-server.js CHANGED

@@ -119,6 +119,7 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
             name: 'list_tasks',
             description: 'CALL FIRST on any non-trivial turn to know what the user wants done and what is already completed. Returns every task and acceptance criterion for the current workspace with its id and status. Re-call periodically (before marking something done, or after the user asks for a status) to stay in sync with user-added or external updates.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'mark_task_done',
@@ -130,11 +131,13 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['task_id'],
             },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'mark_auto_loop_ready',
             description: 'CALL ONLY at the end of a `/kobo-prep-autoloop` grooming session, once all tasks look atomic and implementable in one session. Flips a flag on the workspace that unlocks the auto-loop toggle in the UI. Do NOT call during normal sessions.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'create_task',
@@ -150,6 +153,7 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['title'],
             },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'update_task',
@@ -171,6 +175,7 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['task_id'],
             },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'delete_task',
@@ -182,11 +187,13 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['task_id'],
             },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'get_workspace_info',
             description: 'CALL EARLY in a session to confirm project path, working/source branch, worktree path, model, and notion link. Cheap read — useful when the user refers to "this workspace" or when you need the worktree path to locate files.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'set_workspace_agent_description',
@@ -201,6 +208,7 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['description'],
             },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'cron_create',
@@ -232,6 +240,7 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['expression', 'prompt'],
             },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'cron_delete',
@@ -243,16 +252,19 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['id'],
             },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'cron_list',
             description: 'List all crons currently armed on THIS workspace, including their next and last fire times.',
             inputSchema: { type: 'object', properties: {}, additionalProperties: false },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'get_git_info',
             description: 'CALL BEFORE creating a PR, committing in batches, or reporting progress to the user. Returns commit count ahead of source, files changed, insertions/deletions, and existing PR URL if any.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'set_workspace_status',
@@ -268,26 +280,31 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['status'],
             },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'get_notion_ticket',
             description: 'CALL when the user references "the ticket", "the Notion page", or when you need the source-of-truth text for the mission. Returns the Notion URL + locally-extracted ticket content from .ai/thoughts/.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'get_dev_server_status',
             description: 'CALL BEFORE asking the user whether the app is running, or when your change is dev-server-sensitive. Returns running/stopped/starting/error + URL, port, container names.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'start_dev_server',
             description: 'CALL WHEN the user asks you to test the running app and the dev server is stopped.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'stop_dev_server',
             description: 'CALL WHEN the user explicitly asks to stop the dev server, or before destructive operations that require a clean boot.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'get_dev_server_logs',
@@ -299,11 +316,13 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: [],
             },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'list_workspace_images',
             description: 'CALL WHEN the user mentions "the screenshot", "the attached image", or when you need to reference a previously-uploaded image. Returns uid, originalName, relativePath, createdAt for every image in .ai/images/.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'get_settings',
@@ -318,12 +337,14 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: [],
             },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         // ── Knowledge / context tools ─────────────────────────────────────────────
         {
             name: 'list_documents',
             description: 'CALL EARLY on a new session to discover plans, specs, and thoughts previously written for this workspace. Recursively lists every .md under docs/plans/, docs/superpowers/, and .ai/thoughts/. Before writing a new plan, check if one already exists.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'read_document',
@@ -338,6 +359,7 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['path'],
             },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'log_thought',
@@ -354,6 +376,7 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['title', 'content'],
             },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'search_codebase',
@@ -375,11 +398,13 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['query'],
             },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'get_session_usage',
             description: 'CALL when you need to self-regulate on long missions — returns token/cost totals for the workspace lifetime and for the currently running agent_session. Useful before spawning heavy subagents or deep reasoning on already-expensive sessions.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { readOnlyHint: true, openWorldHint: false },
         },
         {
             name: 'schedule_wakeup',
@@ -402,11 +427,13 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
                 },
                 required: ['delaySeconds', 'prompt'],
             },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
         {
             name: 'cancel_wakeup',
             description: 'CALL to cancel any pending wakeup on this workspace (e.g. the condition you were waiting on resolved early, or you decided not to continue). Idempotent — safe to call when nothing is pending.',
             inputSchema: { type: 'object', properties: {}, required: [] },
+            annotations: { destructiveHint: false, openWorldHint: false },
         },
     ],
 }));
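
Note on the hunks above: every tool in this file gains one of two annotation shapes, `{ readOnlyHint: true, openWorldHint: false }` for reads and `{ destructiveHint: false, openWorldHint: false }` for writes. As far as the MCP specification goes, these hints are advisory, and `destructiveHint` and `openWorldHint` are treated as true when omitted, which is presumably why they are set explicitly here. The sketch below shows how the two shapes could be shared as constants and how a tool list could be audited for entries that lack annotations. `READ_ONLY`, `LOCAL_WRITE`, `ToolDef`, and `findUnannotated` are hypothetical names, not part of the package.

```typescript
// Hypothetical sketch; the package inlines these objects on each tool instead.
interface ToolAnnotations {
  readOnlyHint?: boolean
  destructiveHint?: boolean
  openWorldHint?: boolean
}

interface ToolDef {
  name: string
  annotations?: ToolAnnotations
}

// The two shapes that appear in the diff.
const READ_ONLY: ToolAnnotations = { readOnlyHint: true, openWorldHint: false }
const LOCAL_WRITE: ToolAnnotations = { destructiveHint: false, openWorldHint: false }

// Returns the names of tools that would fall back to a client's default hints.
function findUnannotated(tools: ToolDef[]): string[] {
  return tools.filter((tool) => tool.annotations === undefined).map((tool) => tool.name)
}
```

An audit like this could run in a unit test so that a newly added tool without annotations is caught before release.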

package/dist/server/index.js CHANGED

@@ -19,6 +19,7 @@ import sentryRouter from './routes/sentry.js';
 import settingsRouter from './routes/settings.js';
 import templatesRouter from './routes/templates.js';
 import usageRoutes from './routes/usage.js';
+import voiceRouter from './routes/voice.js';
 import workspacesRouter from './routes/workspaces.js';
 import { getAvailableSkills, reconcileOrphanSessions, restoreRetryCountsFromDb, sendMessage, setBackendPort, startAgent, startWatchdog, stopAgent, stopWatchdog, } from './services/agent/orchestrator.js';
 import * as autoLoopService from './services/auto-loop-service.js';
@@ -83,6 +84,7 @@ app.route('/api/search', searchRouter);
 app.route('/api/health', healthRouter);
 app.route('/api/engines', enginesRouter);
 app.route('/api/migration', migrationRouter);
+app.route('/api/voice', voiceRouter);
 // Skills endpoint
 app.get('/api/skills', (c) => c.json(getAvailableSkills()));
 const PORT = parseInt(process.env.SERVER_PORT || process.env.PORT || '3000', 10);

package/dist/server/routes/health.js CHANGED

@@ -3,6 +3,7 @@ import fs from 'node:fs';
 import { Hono } from 'hono';
 import { getDb } from '../db/index.js';
 import { SCHEMA_VERSION } from '../db/migrations.js';
+import { resolveCodexBinary } from '../services/agent/engines/codex/spawn.js';
 import { getGlobalSettings, getProjectSettings, SETTINGS_SCHEMA_VERSION } from '../services/settings-service.js';
 import { getDbPath, getKoboHome } from '../utils/paths.js';
 import { slugifyProjectName } from '../utils/project-slug.js';
@@ -19,6 +20,18 @@ function checkClaudeCli() {
         return { available: false, version: null };
     }
 }
+function checkCodexCli() {
+    try {
+        const bin = resolveCodexBinary();
+        const r = spawnSync(bin, ['--version'], { encoding: 'utf-8' });
+        if (r.error || r.status !== 0)
+            return { available: false, version: null };
+        return { available: true, version: (r.stdout ?? '').trim() || null };
+    }
+    catch {
+        return { available: false, version: null };
+    }
+}
 function isProcessAlive(pid) {
     try {
         process.kill(pid, 0);
@@ -142,6 +155,7 @@ app.get('/report', (c) => {
         },
         settings: { schemaVersion: SETTINGS_SCHEMA_VERSION },
         claudeCli: checkClaudeCli(),
+        codexCli: checkCodexCli(),
         workspaces: {
             total: settingsRow.n,
             archived: archivedRow.n,

package/dist/server/routes/voice.js ADDED

@@ -0,0 +1,149 @@
+import { Hono } from 'hono';
+import * as settingsService from '../services/settings-service.js';
+import * as transcriptionService from '../services/transcription-service.js';
+import * as workspaceService from '../services/workspace-service.js';
+const app = new Hono();
+const MAX_AUDIO_SIZE = 10 * 1024 * 1024;
+const ALLOWED_AUDIO_MIME = new Set(['audio/webm', 'audio/ogg', 'audio/wav', 'audio/mpeg', 'audio/mp4']);
+const LANGUAGE_RE = /^[a-z-]+$/i;
+function isVoiceLikeError(err) {
+    if (!err || typeof err !== 'object')
+        return false;
+    const e = err;
+    return typeof e.message === 'string' && typeof e.code === 'string' && typeof e.status === 'number';
+}
+function toVoiceHttpStatus(status) {
+    return status === 400 ? 400 : 500;
+}
+async function parseAndTranscribeFromBody(c, config) {
+    const body = await c.req.parseBody();
+    const audio = body.audio;
+    const languageRaw = body.language;
+    const language = typeof languageRaw === 'string' && languageRaw.trim().length > 0 ? languageRaw.trim() : 'auto';
+    if (language !== 'auto' && (!LANGUAGE_RE.test(language) || language.length > 16)) {
+        return c.json({ error: `Invalid language '${language}'`, code: 'LANGUAGE_INVALID' }, 400);
+    }
+    if (!audio || !(audio instanceof File)) {
+        return c.json({ error: 'Missing audio field in multipart body', code: 'MIC_AUDIO_INVALID' }, 400);
+    }
+    if (!ALLOWED_AUDIO_MIME.has(audio.type)) {
+        return c.json({ error: `Unsupported audio type '${audio.type}'`, code: 'MIC_AUDIO_INVALID' }, 400);
+    }
+    const buffer = Buffer.from(await audio.arrayBuffer());
+    if (buffer.length === 0 || buffer.length > MAX_AUDIO_SIZE) {
+        return c.json({ error: 'Invalid audio size', code: 'MIC_AUDIO_INVALID' }, 400);
+    }
+    const result = await transcriptionService.transcribeAudio({
+        audioBuffer: buffer,
+        modelName: config.modelName,
+        language,
+        temperature: config.temperature,
+        prompt: config.prompt,
+        translateToEnglish: config.translateToEnglish,
+        suppressNonSpeechTokens: config.suppressNonSpeechTokens,
+    });
+    return c.json(result);
+}
+app.get('/models', (c) => {
+    try {
+        return c.json(transcriptionService.listVoiceModels());
+    }
+    catch (err) {
+        const message = err instanceof Error ? err.message : String(err);
+        return c.json({ error: message }, 500);
+    }
+});
+app.get('/runtime', async (c) => {
+    try {
+        const status = await transcriptionService.getVoiceRuntimeStatus();
+        return c.json(status);
+    }
+    catch (err) {
+        const message = err instanceof Error ? err.message : String(err);
+        return c.json({ error: message, code: 'VOICE_RUNTIME_CHECK_FAILED' }, 500);
+    }
+});
+app.post('/models/:name/download', async (c) => {
+    try {
+        const name = c.req.param('name');
+        const result = await transcriptionService.downloadVoiceModel(name);
+        return c.json(result, 201);
+    }
+    catch (err) {
+        if (err instanceof transcriptionService.VoiceError || isVoiceLikeError(err)) {
+            return c.json({ error: err.message, code: err.code }, toVoiceHttpStatus(err.status));
+        }
+        const message = err instanceof Error ? err.message : String(err);
+        return c.json({ error: message, code: 'MODEL_DOWNLOAD_FAILED' }, 500);
+    }
+});
+app.delete('/models/:name', (c) => {
+    try {
+        const name = c.req.param('name');
+        transcriptionService.deleteVoiceModel(name);
+        return c.body(null, 204);
+    }
+    catch (err) {
+        if (err instanceof transcriptionService.VoiceError || isVoiceLikeError(err)) {
+            return c.json({ error: err.message, code: err.code }, toVoiceHttpStatus(err.status));
+        }
+        const message = err instanceof Error ? err.message : String(err);
+        return c.json({ error: message, code: 'MODEL_DELETE_FAILED' }, 500);
+    }
+});
+app.post('/workspaces/:id/transcribe', async (c) => {
+    try {
+        const id = c.req.param('id');
+        const workspace = workspaceService.getWorkspace(id);
+        if (!workspace)
+            return c.json({ error: `Workspace '${id}' not found` }, 404);
+        const global = settingsService.getGlobalSettings();
+        if (!global.voiceEnabled) {
+            return c.json({ error: 'Voice transcription is disabled', code: 'VOICE_DISABLED' }, 400);
+        }
+        if (!global.voiceModel) {
+            return c.json({ error: 'No voice model configured', code: 'MODEL_NOT_CONFIGURED' }, 400);
+        }
+        return await parseAndTranscribeFromBody(c, {
+            modelName: global.voiceModel,
+            temperature: global.voiceTemperature,
+            prompt: global.voicePrompt,
+            translateToEnglish: global.voiceTranslateToEnglish,
+            suppressNonSpeechTokens: global.voiceSuppressNonSpeechTokens,
+        });
+    }
+    catch (err) {
+        if (err instanceof transcriptionService.VoiceError || isVoiceLikeError(err)) {
+            return c.json({ error: err.message, code: err.code }, toVoiceHttpStatus(err.status));
+        }
+        const message = err instanceof Error ? err.message : String(err);
+        return c.json({ error: message, code: 'TRANSCRIPTION_FAILED' }, 500);
+    }
+});
+// Draft transcription endpoint used before a workspace exists (Create page).
+app.post('/transcribe', async (c) => {
+    try {
+        const global = settingsService.getGlobalSettings();
+        if (!global.voiceEnabled) {
+            return c.json({ error: 'Voice transcription is disabled', code: 'VOICE_DISABLED' }, 400);
+        }
+        if (!global.voiceModel) {
+            return c.json({ error: 'No voice model configured', code: 'MODEL_NOT_CONFIGURED' }, 400);
+        }
+        return await parseAndTranscribeFromBody(c, {
+            modelName: global.voiceModel,
+            temperature: global.voiceTemperature,
+            prompt: global.voicePrompt,
+            translateToEnglish: global.voiceTranslateToEnglish,
+            suppressNonSpeechTokens: global.voiceSuppressNonSpeechTokens,
+        });
+    }
+    catch (err) {
+        if (err instanceof transcriptionService.VoiceError) {
+            return c.json({ error: err.message, code: err.code }, toVoiceHttpStatus(err.status));
+        }
+        const message = err instanceof Error ? err.message : String(err);
+        return c.json({ error: message, code: 'TRANSCRIPTION_FAILED' }, 500);
+    }
+});
+export default app;
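
The new voice routes read a multipart body with an `audio` file field and an optional `language` field, then reject the request with `LANGUAGE_INVALID` or `MIC_AUDIO_INVALID` before transcription runs. The following is a minimal client-side sketch of that contract, not code from the package: `precheckVoiceUpload` and `buildTranscribeForm` are hypothetical helper names, and the filename `recording.webm` is an arbitrary assumption. The constants are copied from `routes/voice.js` above, and the check order follows `parseAndTranscribeFromBody`.

```javascript
// Constants copied from routes/voice.js in this diff.
const MAX_AUDIO_SIZE = 10 * 1024 * 1024;
const ALLOWED_AUDIO_MIME = new Set(['audio/webm', 'audio/ogg', 'audio/wav', 'audio/mpeg', 'audio/mp4']);
const LANGUAGE_RE = /^[a-z-]+$/i;

// Hypothetical helper (not part of the package): mirrors the server-side
// checks so a client can fail fast. Returns the error code the server
// would send, or null when the upload looks acceptable.
function precheckVoiceUpload({ type, size, language }) {
    // An empty or missing language falls back to 'auto', as on the server.
    const lang = typeof language === 'string' && language.trim().length > 0 ? language.trim() : 'auto';
    if (lang !== 'auto' && (!LANGUAGE_RE.test(lang) || lang.length > 16)) {
        return 'LANGUAGE_INVALID';
    }
    if (!ALLOWED_AUDIO_MIME.has(type)) {
        return 'MIC_AUDIO_INVALID';
    }
    if (size === 0 || size > MAX_AUDIO_SIZE) {
        return 'MIC_AUDIO_INVALID';
    }
    return null;
}

// Hypothetical helper: builds the multipart body with the two field names
// the route reads via c.req.parseBody() ('audio' and 'language').
function buildTranscribeForm(blob, language) {
    const form = new FormData();
    form.append('audio', blob, 'recording.webm'); // filename is an assumption
    if (language) {
        form.append('language', language);
    }
    return form;
}

// Usage sketch (not executed here). The path combines the '/api/voice' mount
// in index.js with the '/transcribe' route; the host and port are assumptions:
//   const res = await fetch('http://localhost:3000/api/voice/transcribe', {
//       method: 'POST',
//       body: buildTranscribeForm(blob, 'fr'),
//   });
```

Because the server tests `audio.type` with an exact `Set` lookup, a type string that carries parameters (for example `audio/webm;codecs=opus`) would not match; this sketch assumes the client sends a bare MIME type.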