npm - @yul-labs/agent-relay - Versions diffs - 0.1.1 → 0.1.3 - Mend

@yul-labs/agent-relay 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md CHANGED

@@ -99,11 +99,15 @@ prompt ──> spawn agent in a PTY (its real TUI) ──> watch terminal output
 ```
 1. The agent is launched **interactively** in a pseudo-terminal under **pure
-   autonomy** (the project's concept): Claude with `--dangerously-skip-permissions`,
-   Codex with `-s workspace-write -a never`. The agent rarely asks — but the
+   autonomy** (the project's concept): Claude with `--dangerously-skip-permissions`
+   (+ `--effort xhigh` by default), Codex with `-s danger-full-access -a never`
+   (full bypass by default — tighten with `defaults.sandbox` /
+   `adapters.codex.sandbox`, e.g. `workspace-write`). The agent rarely asks — but the
    prompts that *still* appear (the directory-trust dialog, the occasional choice)
    are what the Decider answers. `approvalPolicy: "gated"` makes the agent ask
-   more (Claude `--permission-mode acceptEdits`, Codex `-a on-request`);
+   more — and routes those asks to the Decider — via Claude `--permission-mode
+   default` (its normal "ask before each edit/command" mode; **not** `acceptEdits`,
+   which would silently auto-approve edits) and Codex `-a on-request`;
    `"readonly"` sandboxes it (Claude `--permission-mode plan`, Codex `-s read-only`).
 2. When the terminal goes idle, agent-relay strips ANSI and **detects** a prompt:
    a numbered/▶ **menu** (single choice), a **multi-select** menu (checkboxes
@@ -144,6 +148,47 @@ arg of the keymap, so a TUI that submits with Enter-on-a-Next-row can be retuned
 The `rule` decider keeps the current selection and submits (no judgment); an LLM
 decider picks the task-relevant rows.
+### Getting a result back (what you can and can't read)
+agent-relay's job is to **drive** the agent and **answer its prompts** — not to
+recover the agent's prose answer from the screen. The TUI text is heavily mangled
+after ANSI stripping (words run together, spinners interleave), so **scraping a
+structured answer out of stdout is not reliable** and is deliberately not
+attempted. Three things you *can* rely on:
+- **For a structured result, have the agent WRITE IT TO A FILE.** Put it in the
+  prompt — e.g. *"...and write the final JSON to `result.json`"* — then read that
+  file after the run. This is the validated pattern for getting data back out.
+- **Token usage is surfaced** as `result.meta.usage`, read from the agent's OWN
+  session transcript (`~/.claude/projects/…` for Claude, `~/.codex/sessions/…` for
+  Codex) — so it works on every machine **regardless of TUI / status-line
+  settings**, and the token counts are the API's real numbers, not a scrape.
+  Token fields: `{ source, model, inputTokens, outputTokens, cachedInputTokens,
+  cacheCreationTokens, reasoningTokens, totalTokens }`.
+- **Cost is COMPUTED, not scraped**: `usage.costUsd` = Σ(tokens × the model's
+  list price) from a built-in table (override via `config.pricing`). It is `null`
+  (with a warning) when the model's price is unknown, so you can tell "unpriced"
+  from a real `0`. The scraped nominal "Session $" (often `0` on Team/Max seats) is
+  kept separately as `usage.subscriptionSessionCostUsd`; `usage.contextPercent` is
+  another status-line extra. `source` is `"transcript"` (authoritative tokens) or
+  `"status-line"`.
+  > **`costUsd` is a REFERENCE (shadow) cost**: what the run *would* bill via the
+  > API. agent-relay drives the **interactive TUI**, which on a Claude subscription
+  > is covered by the plan, so for those runs the real marginal cost is ~0 and
+  > `costUsd` is not the amount charged. Only the non-interactive paths
+  > (`claude -p` / Agent SDK), which per Anthropic fall outside subscription
+  > limits from **2026-06-15**, actually bill at these rates. Staying on
+  > the **interactive** PTY path is what keeps unattended runs on the subscription.
+A **structured-first** backend (Claude `--output-format stream-json`) *would*
+return the answer text + usage + cost natively — but it **requires `-p`/`--print`
+(non-interactive)**, which is a different execution model from the TUI and, on a
+Claude subscription, is **not covered by the plan** (per Anthropic, `claude -p` /
+Agent SDK fall outside subscription limits from 2026-06-15). So driving the
+**interactive TUI** here is deliberate: it keeps unattended runs on the
+subscription. The file-write pattern is the supported way to recover answers
+without leaving that path.
 ## The Decider
 The Decider answers every detected prompt. Choose it in config (`decider`) or
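The cost rule described in the hunk above (`usage.costUsd` = Σ(tokens × the model's list price), `null` when the model is unpriced) can be sketched roughly as below. This is an illustrative sketch, not agent-relay's implementation: the `ModelPrice` shape, the USD-per-million-tokens unit, the `computeCostUsd` name, the choice of which token classes are priced, and the example prices are all assumptions. Only the token field names come from the README text.

```typescript
// Hypothetical shapes: only the token field names are taken from the README diff.
interface TokenUsage {
  model: string;
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens: number;
  cacheCreationTokens: number;
}

// Assumed pricing entry: USD per one million tokens, per token class.
interface ModelPrice {
  input: number;
  output: number;
  cachedInput: number;
  cacheCreation: number;
}

type PricingTable = Record<string, ModelPrice>;

// Returns the summed reference cost in USD, or null when the model has no
// price entry, so callers can tell "unpriced" apart from a real 0.
function computeCostUsd(usage: TokenUsage, pricing: PricingTable): number | null {
  if (!Object.prototype.hasOwnProperty.call(pricing, usage.model)) {
    return null;
  }
  const price = pricing[usage.model];
  // Assumption: the four token classes are disjoint, so each is billed once.
  const weighted =
    usage.inputTokens * price.input +
    usage.outputTokens * price.output +
    usage.cachedInputTokens * price.cachedInput +
    usage.cacheCreationTokens * price.cacheCreation;
  return weighted / 1e6;
}

// Made-up prices for a made-up model name, purely for illustration.
const examplePricing: PricingTable = {
  "demo-model": { input: 2, output: 10, cachedInput: 0.5, cacheCreation: 2.5 },
};

const exampleUsage: TokenUsage = {
  model: "demo-model",
  inputTokens: 1000000,
  outputTokens: 500000,
  cachedInputTokens: 2000000,
  cacheCreationTokens: 0,
};

console.log(computeCostUsd(exampleUsage, examplePricing)); // 8
console.log(computeCostUsd({ ...exampleUsage, model: "unlisted" }, examplePricing)); // null
```

Returning `null` rather than `0` for an unknown model mirrors the distinction the README draws between an unpriced run and a run that cost nothing.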
@@ -315,6 +360,28 @@ see [`agent-relay.config.example.json`](./agent-relay.config.example.json)):
   cancelled | waiting_approval }`. The `CompletionDetector` maps the run to a
   terminal status (timeout/idle → `timeout`, cancel → `cancelled`, success/exit
   0 → `completed`, else `failed`) and is extensible.
+- `meta.completedCleanly` (exit 0, not aborted) is the unambiguous success flag;
+  `meta.settled` is `true` when the run answered a prompt **or** completed cleanly
+  (so a 0-interaction skip-mode run is `settled: true`, not falsely `false`).
+## Operational notes
+- **It writes into the target repo.** `sessionsDir` / `logsDir` default to
+  `<root>/.agent-relay/{sessions,logs}` — add **`.agent-relay/`** to the target
+  repo's `.gitignore`, or point them elsewhere (absolute / out-of-root paths work).
+  `--dry-run` still records a session entry, clearly marked `meta.dryRun: true`.
+- **Pin the binary.** Without `adapters.<name>.command`, the first `claude` /
+  `codex` on `PATH` is used — ambiguous when several exist (version managers, shell
+  wrappers, `cmux`). Set `command` to an absolute path; either way the **resolved
+  path is logged** at run start (`resolved claude → …`).
+- **Effort/cost defaults.** Claude runs at `--effort xhigh` and Codex at
+  `-s danger-full-access` by **default** (full autonomy). The first raises
+  cost/latency, the second widens access; override with `adapters.<name>.args` / `defaults.sandbox`.
+- **Reasoning-model deciders.** An `api` decider pointed at a reasoning model
+  (e.g. gpt-oss) spends tokens on `reasoning_content` before the JSON answer; too
+  small a `--decider-max-tokens` empties `content`. agent-relay clamps it up to a
+  **512 floor** and falls back to parsing `reasoning_content`, so a normal config
+  won't return an empty decision — but give verbose reasoners more headroom.
 ## Architecture
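The status mapping and the `meta.completedCleanly` / `meta.settled` flags added in the hunk above can be read as the small decision table sketched below. It is a hedged illustration only: the `RunOutcome` shape, its field names, both function names, and the precedence order (timeout, then cancel, then exit code) are assumptions, as is treating "not aborted" as "neither timed out nor cancelled". The diff does not show the actual `CompletionDetector` code.

```typescript
type TerminalStatus = "completed" | "failed" | "timeout" | "cancelled";

// Hypothetical summary of how a run ended; field names are illustrative.
interface RunOutcome {
  exitCode: number | null; // null if the process did not exit on its own
  timedOut: boolean;       // a timeout or idle limit was hit
  cancelled: boolean;      // the run was cancelled
  promptsAnswered: number; // prompts the Decider answered during the run
}

// timeout/idle -> "timeout", cancel -> "cancelled", exit 0 -> "completed",
// anything else -> "failed". The check order is an assumption.
function toTerminalStatus(outcome: RunOutcome): TerminalStatus {
  if (outcome.timedOut) return "timeout";
  if (outcome.cancelled) return "cancelled";
  if (outcome.exitCode === 0) return "completed";
  return "failed";
}

// completedCleanly: exit 0 and not aborted.
// settled: answered at least one prompt OR completed cleanly, so a run with
// zero interactions that exits 0 still counts as settled.
function toMetaFlags(outcome: RunOutcome): { completedCleanly: boolean; settled: boolean } {
  const completedCleanly =
    outcome.exitCode === 0 && !outcome.timedOut && !outcome.cancelled;
  const settled = outcome.promptsAnswered > 0 || completedCleanly;
  return { completedCleanly, settled };
}

const zeroInteractionRun: RunOutcome = {
  exitCode: 0,
  timedOut: false,
  cancelled: false,
  promptsAnswered: 0,
};

console.log(toTerminalStatus(zeroInteractionRun)); // "completed"
console.log(toMetaFlags(zeroInteractionRun)); // { completedCleanly: true, settled: true }
```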