npm - @yul-labs/agent-relay - Versions diffs - 0.1.1 - Mend

@yul-labs/agent-relay 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8) hide show

package/LICENSE +21 -0
package/README.md +395 -0
package/agent-relay.config.example.json +38 -0
package/dist/cli.d.ts +1 -0
package/dist/cli.js +2938 -0
package/dist/index.d.ts +1698 -0
package/dist/index.js +2571 -0
package/package.json +83 -0

package/LICENSE ADDED Viewed

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 agent-relay contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED Viewed

@@ -0,0 +1,395 @@
+# agent-relay
+> A vendor-agnostic **interactive** LLM coding-agent session orchestrator.
+`agent-relay` runs a coding agent (**Claude Code**, **Codex**) in its real
+interactive terminal, watches for the approval / choice prompts it raises, and a
+pluggable **Decider** (a rule policy, an LLM, or any function/API) answers them —
+so the agent runs **unattended without you sitting at the keyboard**. Every run
+is tracked as a *session* with a log, and completion is detected so success can
+be judged.
+This is the core idea: **don't suppress the agent's prompts — let them happen and
+have an LLM/policy answer them**, interactively.
+- ✅ Drives the **real interactive TUI** of Claude & Codex over a PTY (node-pty)
+- ✅ A **Decider** answers approvals/choices: `rule` (default), `command` (an LLM
+  CLI like `claude -p` / `codex exec`), `function`/API, or `always-approve`
+- ✅ Both **Claude and Codex** are first-class and **verified end-to-end**
+- ✅ Vendor-agnostic core: every agent is an `AgentAdapter`; nothing in `core/`
+  imports Claude/Codex
+- ✅ Hard timeout, idle timeout, max-interactions, graceful cancel on every run
+- ✅ Deterministic **fake interactive agent** makes the whole loop testable offline
+- ✅ "Done" means **`pnpm typecheck` + `pnpm test` pass** — not "code was written"
+---
+## Requirements
+- Node.js **>= 18.19**
+- [pnpm](https://pnpm.io/)
+- [`node-pty`](https://www.npmjs.com/package/node-pty) — installed automatically;
+  it ships prebuilt binaries for macOS/Linux. agent-relay self-heals its
+  `spawn-helper` permissions at runtime, so it works across install methods.
+- Optional, per adapter:
+  - [`claude`](https://www.npmjs.com/package/@anthropic-ai/claude-code) on PATH
+  - [`codex`](https://www.npmjs.com/package/@openai/codex) on PATH
+Designed for **macOS / Linux** CLI use.
+## Install
+```bash
+# Global CLI (once published):
+npm i -g agent-relay        # or: pnpm add -g agent-relay
+# From a local checkout:
+pnpm install
+pnpm build                  # tsup -> dist/cli.js + dist/index.js + .d.ts
+npm i -g .                  # install this checkout globally
+```
+Run from source without building: `pnpm dev -- <command>`.
+## Quick start
+**Zero-config** — `init` is optional. With no `agent-relay.config.json`, the CLI
+uses built-in defaults and creates its session/log dirs on first run, so you can
+go straight to `npx`:
+```bash
+npx agent-relay run --adapter claude --prompt "현재 프로젝트의 타입 오류를 확인하고 수정해줘"
+```
+```bash
+agent-relay init                                   # OPTIONAL: write a config you can edit
+agent-relay adapters                               # list adapters
+agent-relay doctor                                 # check environment
+# Deterministic, no real agent:
+agent-relay run --adapter fake --prompt "test task"
+# Real agents, driven interactively (the decider answers their prompts):
+agent-relay run --adapter codex  --prompt "현재 프로젝트의 타입 오류를 확인하고 수정해줘"
+agent-relay run --adapter claude --prompt-file ./examples/task.md
+agent-relay resume --session-id <sessionId>
+```
+Run `agent-relay init` only when you want a config file on disk to customize
+(pick a different default adapter, point the decider at an LLM/local model, or
+change the session/log dirs). Pass `--root /path/to/project` to run against
+another project — the real `claude`/`codex` then loads *that* project's
+`CLAUDE.md`, skills, MCP servers, and hooks natively.
+---
+## How it works
+```
+prompt ──> spawn agent in a PTY (its real TUI) ──> watch terminal output
+                                                        │
+                              output settles (idle)     ▼
+                                              ┌──> prompt detected? ──yes──> Decider.decide()
+                                              │                                     │
+                                              │         keystrokes  <───────────────┘
+                                              │  (Enter / arrow+Enter / y/n / text)
+                                              │
+                                              └──> no prompt + sustained idle ──> quit cleanly (completed)
+```
+1. The agent is launched **interactively** in a pseudo-terminal under **pure
+   autonomy** (the project's concept): Claude with `--dangerously-skip-permissions`,
+   Codex with `-s workspace-write -a never`. The agent rarely asks — but the
+   prompts that *still* appear (the directory-trust dialog, the occasional choice)
+   are what the Decider answers. `approvalPolicy: "gated"` makes the agent ask
+   more (Claude `--permission-mode acceptEdits`, Codex `-a on-request`);
+   `"readonly"` sandboxes it (Claude `--permission-mode plan`, Codex `-s read-only`).
+2. When the terminal goes idle, agent-relay strips ANSI and **detects** a prompt:
+   a numbered/▶ **menu** (single choice), a **multi-select** menu (checkboxes
+   `[ ]`/`[x]`/`◯`/`◉`), a yes/no **approval**, or — opt-in — a free-text **input**
+   step. Multi-step flows (menu → menu → input → submit) are handled as a
+   sequence: each settled screen is detected, decided, answered, and the consumed
+   screen is dropped so the next step is read fresh.
+3. The **Decider** decides; the keystrokes are written back into the PTY
+   (Enter to confirm a pre-selected option, arrow-down+Enter to pick another,
+   `y`/`n`, typed text, or — for a multi-select — a single batch that navigates
+   with ↑/↓, toggles the chosen rows with SPACE, and submits with `→`).
+4. When the agent finishes and stays idle for `completionIdleMs` (default 8s,
+   tune with `--completion-idle-ms`), the session quits the TUI and is marked
+   **completed**. While the agent is *working* (its TUI shows a
+   "… esc to interrupt" indicator) the run is **never** treated as done — so a
+   multi-minute think/build is not cut short, however long it stays quiet.
+> Driving a real TUI by scraping its output is inherently heuristic. The detector
+> patterns and keymaps are tuned for the current Claude/Codex TUIs and are
+> configurable per adapter; if a vendor changes its TUI, the patterns may need a
+> tweak. Both agents are verified working today (see Testing).
+**Multi-step menus** work out of the box for the real agents. A free-text
+**input** step, however, is **opt-in**: the detector only recognizes one when you
+give it an `inputPattern`, and it is **off by default for Claude/Codex** because
+their idle main input box would otherwise be misread as "the agent is asking for
+text". If you enable it, anchor the pattern to an unambiguous prompt phrase
+(e.g. `/enter .*name:/i`, not a bare `/enter/i` that would also fire on
+"Press Enter to continue"). Detection order within a screen is **choice →
+approval → input**.
+**Multi-select** (checkbox) menus are detected automatically when most rows carry
+a `[ ]`/`[x]`/`◯`/`◉` box. The decider returns `optionIndexes` (the set that
+should end up checked); the keymap then emits ONE batch — navigate ↑/↓, SPACE to
+toggle only the rows that differ, then submit. The submit key defaults to `→`
+(works for the common step-wizard "advance/Submit"); it is the `multiSelectSubmit`
+arg of the keymap, so a TUI that submits with Enter-on-a-Next-row can be retuned.
+The `rule` decider keeps the current selection and submits (no judgment); an LLM
+decider picks the task-relevant rows.
+## The Decider
+The Decider answers every detected prompt. Choose it in config (`decider`) or
+per run:
+The guiding policy is **proceed by default, redirect only off-task danger**:
+let the agent do its work (so the project actually progresses) and only refuse
+an action that is *both* dangerous *and* unrelated to the task. Judging
+"on-task vs off-task" needs the goal, so it is an **LLM** job — not a regex.
+| Type | What it does |
+| --- | --- |
+| `rule` (default) | Deterministic & offline, no judgment: **approves** every y/n and **confirms the recommended (first) option** on a menu so the task proceeds. (No label guessing by default — the TUI already pre-selects its affirmative choice.) Opt into `denyPatterns` (`rm -rf`, `sudo`, ...) to make it **label-aware**: deny dangerous approvals and steer menus to a negative option — see the caveat below. |
+| `command` | Delegate to an **LLM CLI** — `claude -p`, `codex exec`, etc. The model gets the **task** and approves on-task work (even risky) while denying off-task danger with a redirect comment. |
+| `api` | Same task-aware policy via an **OpenAI-compatible HTTP endpoint** (llama.cpp / vLLM / Ollama / LM Studio / OpenAI). Works with local open models. |
+| `always-approve` | Approve/confirm everything. Maximum autonomy, no judgment. |
+```jsonc
+// agent-relay.config.json — pick ONE "decider":
+// a) local open model (llama.cpp / vLLM / Ollama, OpenAI-compatible):
+"decider": { "type": "api", "url": "http://localhost:9090/v1/chat/completions", "model": "default", "maxTokens": 1024 }
+// b) Claude CLI as the decider:
+"decider": { "type": "command", "command": "claude", "args": ["-p"] }
+// c) Codex CLI as the decider (note: `codex exec` does NOT accept `-a`):
+"decider": { "type": "command", "command": "codex", "args": ["exec", "--skip-git-repo-check", "-s", "read-only"] }
+```
+…or **per run, with no config file** — the `--decider*` flags build the same
+override on the fly (great for `npx`):
+```bash
+# local open model (api):
+agent-relay run -a claude -p "..." \
+  --decider api --decider-url http://localhost:9090/v1/chat/completions \
+  --decider-model default --decider-max-tokens 1024
+# an LLM CLI (command) — pass the WHOLE command line as one quoted string:
+agent-relay run -a claude -p "..." \
+  --decider command --decider-command "codex exec --skip-git-repo-check -s read-only"
+# force a simple policy:
+agent-relay run -a codex -p "..." --decider always-approve
+```
+The type is inferred when obvious (`--decider-url` ⇒ `api`, `--decider-command`
+⇒ `command`), so `--decider` itself is optional in those cases. A `--decider*`
+flag **overrides** `config.decider` for that run; with no flag, the configured
+(or default `rule`) decider is used. Complex command args with shell quoting
+aren't supported on the flag — use a config file for those.
+All four backends are verified. The task-aware policy is the important part:
+given the task *"delete build/ and node_modules"*, an LLM decider **approves**
+`rm -rf build node_modules` ("directly implements the task"); given the task
+*"fix the README typo"*, it **denies** `rm -rf ~/Documents` ("dangerous and
+off-task") with a redirect comment — the *same* command, opposite decisions.
+The local model + rule deciders have also driven real Claude/Codex sessions
+end-to-end. Embedders can plug any model via a `FunctionDecider`
+(`async (request) => decision`) or the `onApprovalRequest` run-hook.
+> A denial can carry a redirect `text`; agent-relay types it back to the agent
+> when its TUI next asks "what should I do instead?" (best-effort, TUI-dependent).
+> **Latency note:** the agent waits at its prompt while the decider thinks, so a
+> slow decider (a large local model can take ~20 s/decision) is fine for menus
+> the agent waits on, but raise `--idle-timeout-ms` for long sessions. The fast
+> `rule` decider is the default for unattended runs.
+> **Reasoning models:** the `api` decider's `maxTokens` (default **2048**) is a
+> *cap*, not a target — a normal decision still costs only what it needs. But a
+> reasoning model emits a long chain-of-thought before its JSON answer, so a
+> too-small cap truncates it mid-thought (empty `content` → unparseable → safe
+> deny). Keep `--decider-max-tokens` generous (≥1024) for such models; `content`
+> is parsed first, with `reasoning_content` as a fallback.
+---
+## Commands
+- **`init`** — write `agent-relay.config.json` + `.agent-relay/{sessions,logs}`.
+- **`adapters`** — list adapters (`claude`, `codex`, `fake`), mode, resume support.
+- **`doctor`** — Node version, config validity, dirs, and whether `claude`/`codex`
+  are on PATH (missing CLIs are warnings, never errors).
+- **`run`** — options: `-a/--adapter`, `-p/--prompt`, `-f/--prompt-file`,
+  `--cwd`, `--max-turns`, `--timeout-ms`, `--idle-timeout-ms`,
+  `--completion-idle-ms`, `--approval <auto|gated|readonly>`, `--ultracode`,
+  `--decider*`, `--verbose`, `--max-log-bytes`, `--dry-run`, `--extra <args...>`,
+  `--root`. Exit code `0` on `completed`, non-zero otherwise. (`--approval gated`
+  keeps the agent asking so the decider answers more; `auto` is pure-autonomy.
+  `--ultracode` (Claude) forces Opus and drives `/effort ultracode` in the TUI
+  before the task — the full preset, unattended; plain `--effort xhigh` is the
+  default otherwise.)
+- **`sessions`** — list recorded sessions with log sizes + total. Prune with
+  `sessions --prune --keep <n>` / `--older-than <days>` / `--all` (deletes the
+  session JSON **and** its log). Logs accumulate forever otherwise.
+- **`resume`** — continue a prior session with a follow-up prompt:
+  `resume --session-id <id> -p "..."`. **Both verified end-to-end** (recall prior
+  context + run the follow-up): **Claude** via `--continue`; **Codex** via its
+  captured NATIVE session id — `codex resume <id> "<prompt>"` (the codex adapter
+  records the rollout UUID from `~/.codex/sessions/…` after each run into the
+  session's `sessionRef`; a legacy session with no captured id falls back to
+  `resume --last`).
+## Configuration
+`agent-relay init` writes `agent-relay.config.json` (validated by a zod schema;
+see [`agent-relay.config.example.json`](./agent-relay.config.example.json)):
+```jsonc
+{
+  "defaultAdapter": "claude",
+  "sessionsDir": ".agent-relay/sessions",
+  "logsDir": ".agent-relay/logs",
+  "defaults": {
+    "maxTurns": 20,            // max interactions (prompts answered) before aborting
+    "timeoutMs": 1800000,      // hard wall-clock cap
+    "idleTimeoutMs": 300000,   // abort if no event for this long
+    "approvalPolicy": "auto",  // "readonly" => sandbox the agent; otherwise it asks and the decider answers
+    "sandbox": "workspace-write"
+  },
+  "decider": { "type": "rule" },
+  "hooks": { "onComplete": "echo \"[$AGENT_RELAY_STATUS] $AGENT_RELAY_SESSION_ID\"" },
+  "adapters": {
+    "claude": { "type": "claude", "mode": "pty", "command": "claude", "args": [] },
+    "codex":  { "type": "codex",  "mode": "pty", "command": "codex",  "args": [] },
+    "fake":   { "type": "fake",   "mode": "test" }
+  }
+}
+```
+- **`decider`** decides every prompt — this is the real approval mechanism (see
+  **The Decider**). `approvalPolicy` no longer "approves"; it only selects
+  `readonly` (Codex `-s read-only`, Claude `--permission-mode plan`) vs. letting
+  the agent ask normally.
+- `adapters.<name>.args` are passed straight through to the underlying CLI
+  (after the adapter's own flags, before the prompt) — e.g.
+  `adapters.claude.args: ["--effort", "max"]` to set Claude's effort level
+  (`low|medium|high|xhigh|max`), or `["--model", "opus"]`. The same flags can be
+  set per-run with `--extra` (e.g. `run -a claude -p "..." --extra --effort max`).
+  `adapters.<name>.env` injects/overrides spawn env vars — the Claude adapter sets
+  `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` by default (override with
+  `adapters.claude.env: { "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "0" }`).
+- `defaults.completionIdleMs` (or `--completion-idle-ms`) tunes how long the
+  finished agent must stay quiet before its TUI is quit (default 8s).
+- For a fully autonomous, no-prompt agent (no decider involvement), set
+  `adapters.claude.args: ["--dangerously-skip-permissions"]` /
+  `adapters.codex.args: ["--full-auto"]`.
+- **`hooks`** are shell commands run at lifecycle points with `AGENT_RELAY_*`
+  env vars (`SESSION_ID`, `STATUS`, `ADAPTER`, `LOG_FILE`, `EXIT_CODE`, `CWD`).
+> ⚠️ **Safety:** the default `rule` decider **approves everything** so the task
+> can progress (danger-regex blocking is opt-in via `denyPatterns`, and is
+> unreliable on mangled TUI text anyway). For real control, use `approvalPolicy:
+> "readonly"` (sandbox), or an LLM `command`/`api` decider that approves on-task
+> work and denies off-task danger. Don't point an auto-approving run at an
+> untrusted task.
+## Sessions, logs, completion
+- `.agent-relay/sessions/<id>.json` — metadata (prompt, adapter, status, times,
+  logFile, exitCode, error, meta). Tiny; list/prune with `agent-relay sessions`.
+- `.agent-relay/logs/<id>.log` — header + one line per event + footer. By default
+  the **raw stdout/stderr stream is omitted** (a TUI redraws constantly — it's
+  ~98% noise), so a run logs only its meaningful events and stays ~1 KB. Use
+  `--verbose` for the full stream when debugging, and `--max-log-bytes` to cap it.
+- Status lifecycle: `created → running → { completed | failed | timeout |
+  cancelled | waiting_approval }`. The `CompletionDetector` maps the run to a
+  terminal status (timeout/idle → `timeout`, cancel → `cancelled`, success/exit
+  0 → `completed`, else `failed`) and is extensible.
+## Architecture
+```
+src/core/        types · config(zod) · session · logger · completion · decider ·
+                 hooks · runner · errors · util/{ansi,line-splitter,which,...}
+src/adapters/    registry · fake-adapter ·
+   interactive/  pty-session (the detect→decide→respond loop) · prompt-detector ·
+                 interactive-adapter · claude-interactive · codex-interactive
+src/commands/    init · adapters · doctor · run · resume
+src/cli.ts       commander CLI       src/index.ts  programmatic API
+```
+**Decoupling rule:** `core/` depends only on the `AgentAdapter` interface and
+receives an adapter factory from `adapters/registry.ts`; it never imports a
+vendor adapter. The `Decider` reaches adapters via the run context, so any
+decision backend works with any agent.
+```ts
+import { runAgent, createAdapterFactory, RuleDecider } from "agent-relay";
+const outcome = await runAgent({
+  config, rootDir: process.cwd(), adapterName: "codex",
+  prompt: "Fix the failing test",
+  resolveAdapter: createAdapterFactory(),
+  decider: new RuleDecider(),                 // or a CommandDecider / FunctionDecider
+});
+```
+## Testing
+```bash
+pnpm typecheck
+pnpm test                    # 100+ offline, deterministic tests
+```
+The default suite is fully offline. Its centerpiece is a **real PTY test**: a
+deterministic fake interactive agent (`tests/fixtures/fake-interactive.cjs`) is
+driven through `runPtySession` and the Decider answers its approval + menu
+prompts — proving the detect → decide → respond → complete loop, plus deny and
+abort paths. A second fixture (`fake-interactive-wizard.cjs`, `tests/wizard.test.ts`)
+drives a **multi-step** flow — menu → menu → free-text input → `[y/n]` submit —
+and asserts each step is detected on its own fresh screen (the per-step buffer
+reset), that a non-sequentially-numbered menu maps to the right keystroke, and
+that the submitted artifact reflects the exact navigated choices. It also covers
+the decider, prompt detector, ANSI cleanup, config, sessions, hooks, completion,
+safety rails, and the commands.
+### Real Claude/Codex integration (opt-in)
+```bash
+AGENT_RELAY_RUN_REAL_AGENT_TESTS=1 pnpm test
+```
+Skipped by default; each adapter self-skips when its CLI is absent; runs in a
+temp dir with a trivial file-creation prompt. **Both Codex and Claude are
+verified** end-to-end: agent-relay drives the real TUI, the rule decider answers
+the trust/permission prompts, and the agent creates the file.
+## Remaining TODO / future work
+- **Structured-interactive backends**: Claude `--input-format stream-json` +
+  permission callbacks and Codex `app-server`/`mcp-server` would be more robust
+  than PTY scraping for those agents (the "structured-first" path). PTY is the
+  universal driver today.
+- **Richer prompt detection / per-agent keymaps** as TUIs evolve.
+- **Native session-id capture in PTY mode** — resume is wired via `--continue` /
+  `resume --last` (most-recent in cwd); capturing the real id would allow
+  resuming a *specific* older session.
+- Add a `repository` URL to `package.json` before publishing.
+## Working in this repo
+See [`CLAUDE.md`](./CLAUDE.md). `.claude/settings.json` ships a Stop hook that
+runs `pnpm typecheck` and blocks finishing on type errors.
+## License
+MIT — see [LICENSE](./LICENSE).

package/agent-relay.config.example.json ADDED Viewed

@@ -0,0 +1,38 @@
+{
+  "defaultAdapter": "claude",
+  "sessionsDir": ".agent-relay/sessions",
+  "logsDir": ".agent-relay/logs",
+  "defaults": {
+    "maxTurns": 20,
+    "timeoutMs": 1800000,
+    "idleTimeoutMs": 300000,
+    "approvalPolicy": "auto",
+    "sandbox": "workspace-write",
+    "requireApprovalOnRiskyActions": false
+  },
+  "adapters": {
+    "claude": {
+      "type": "claude",
+      "mode": "pty",
+      "command": "claude",
+      "args": []
+    },
+    "codex": {
+      "type": "codex",
+      "mode": "pty",
+      "command": "codex",
+      "args": []
+    },
+    "fake": {
+      "type": "fake",
+      "mode": "test",
+      "args": []
+    }
+  },
+  "decider": {
+    "type": "rule"
+  },
+  "hooks": {
+    "onComplete": "echo \"[$AGENT_RELAY_STATUS] $AGENT_RELAY_SESSION_ID\""
+  }
+}

package/dist/cli.d.ts ADDED Viewed

	@@ -0,0 +1 @@
1	+ #!/usr/bin/env node