@yul-labs/agent-relay 0.1.1 → 0.1.3

package/README.md CHANGED
@@ -99,11 +99,15 @@ prompt ──> spawn agent in a PTY (its real TUI) ──> watch terminal output
  ```
 
  1. The agent is launched **interactively** in a pseudo-terminal under **pure
- autonomy** (the project's concept): Claude with `--dangerously-skip-permissions`,
- Codex with `-s workspace-write -a never`. The agent rarely asks — but the
+ autonomy** (the project's concept): Claude with `--dangerously-skip-permissions`
+ (+ `--effort xhigh` by default), Codex with `-s danger-full-access -a never`
+ (full bypass by default — tighten with `defaults.sandbox` /
+ `adapters.codex.sandbox`, e.g. `workspace-write`). The agent rarely asks — but the
  prompts that *still* appear (the directory-trust dialog, the occasional choice)
  are what the Decider answers. `approvalPolicy: "gated"` makes the agent ask
- more (Claude `--permission-mode acceptEdits`, Codex `-a on-request`);
+ more — and routes those asks to the Decider — via Claude `--permission-mode
+ default` (its normal "ask before each edit/command" mode; **not** `acceptEdits`,
+ which would silently auto-approve edits) and Codex `-a on-request`;
  `"readonly"` sandboxes it (Claude `--permission-mode plan`, Codex `-s read-only`).
  2. When the terminal goes idle, agent-relay strips ANSI and **detects** a prompt:
  a numbered/▶ **menu** (single choice), a **multi-select** menu (checkboxes
@@ -144,6 +148,47 @@ arg of the keymap, so a TUI that submits with Enter-on-a-Next-row can be retuned
  The `rule` decider keeps the current selection and submits (no judgment); an LLM
  decider picks the task-relevant rows.
 
+ ### Getting a result back (what you can and can't read)
+
+ agent-relay's job is to **drive** the agent and **answer its prompts** — not to
+ recover the agent's prose answer from the screen. The TUI text is heavily mangled
+ after ANSI stripping (words run together, spinners interleave), so **scraping a
+ structured answer out of stdout is not reliable** and is deliberately not
+ attempted. Two things you *can* rely on:
+
+ - **For a structured result, have the agent WRITE IT TO A FILE.** Put it in the
+ prompt — e.g. *"...and write the final JSON to `result.json`"* — then read that
+ file after the run. This is the validated pattern for getting data back out.
+ - **Token usage is surfaced** as `result.meta.usage`, read from the agent's OWN
+ session transcript (`~/.claude/projects/…` for Claude, `~/.codex/sessions/…` for
+ Codex) — so it works on every machine **regardless of TUI / status-line
+ settings**, and the token counts are the API's real numbers, not a scrape.
+ Token fields: `{ source, model, inputTokens, outputTokens, cachedInputTokens,
+ cacheCreationTokens, reasoningTokens, totalTokens }`.
+ - **Cost is COMPUTED, not scraped** — `usage.costUsd` = Σ(tokens × the model's
+ list price) from a built-in table (override per `config.pricing`). It is `null`
+ (with a warning) when the model's price is unknown, so you can tell "unpriced"
+ from a real `0`. The scraped nominal "Session $" (often `0` on Team/Max seats) is
+ kept separately as `usage.subscriptionSessionCostUsd`; `usage.contextPercent` is
+ another status-line extra. `source` is `"transcript"` (authoritative tokens) or
+ `"status-line"`.
+ > **`costUsd` is a REFERENCE (shadow) cost** — what the run *would* bill via the
+ > API. agent-relay drives the **interactive TUI**, which on a Claude subscription
+ > is covered by the plan, so for those runs the real marginal cost is ~0 and
+ > `costUsd` is a reference, not the amount charged. Only the non-interactive paths
+ > (`claude -p` / Agent SDK — and, per Anthropic, from **2026-06-15** no longer
+ > counted against subscription limits) actually bill at these rates. Staying on
+ > the **interactive** PTY path is what keeps unattended runs on the subscription.
+
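The cost rule described in the added bullets (sum of tokens times list price, `null` when the model is unpriced) can be sketched roughly as below. This is an illustration under stated assumptions, not agent-relay's code: the `ModelPrice` shape, the per-million-token unit, the `computeCostUsd` helper, the model name and the rates are all hypothetical. Only the token field names and the `null`-when-unknown behaviour come from the README text.

```typescript
// Token counts as surfaced on `result.meta.usage` (field names from the README).
interface TokenUsage {
  model: string;
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens: number;
  cacheCreationTokens: number;
}

// ASSUMPTION: prices are USD per one million tokens, one rate per token class.
interface ModelPrice {
  input: number;
  output: number;
  cachedInput: number;
  cacheCreation: number;
}

// Returns the reference cost, or null when the model has no price entry,
// so "unpriced" stays distinguishable from a real 0.
// ASSUMPTION: the four token classes are disjoint (no double counting).
function computeCostUsd(
  usage: TokenUsage,
  prices: Map<string, ModelPrice>,
): number | null {
  const price = prices.get(usage.model);
  if (price === undefined) {
    console.warn(`no price for model "${usage.model}"; cost left as null`);
    return null;
  }
  const weighted =
    usage.inputTokens * price.input +
    usage.outputTokens * price.output +
    usage.cachedInputTokens * price.cachedInput +
    usage.cacheCreationTokens * price.cacheCreation;
  return weighted / 1e6;
}

// Placeholder model name and rates, NOT real list prices.
const prices = new Map<string, ModelPrice>([
  ["example-model", { input: 3, output: 15, cachedInput: 0.3, cacheCreation: 3.75 }],
]);

const usage: TokenUsage = {
  model: "example-model",
  inputTokens: 1_000_000,
  outputTokens: 100_000,
  cachedInputTokens: 0,
  cacheCreationTokens: 0,
};

console.log(computeCostUsd(usage, prices));
console.log(computeCostUsd({ ...usage, model: "unknown-model" }, prices));
```

Returning `null` rather than `0` for an unknown model keeps a missing price from being mistaken for a free run, which is the distinction the README calls out.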
+ A **structured-first** backend (Claude `--output-format stream-json`) *would*
+ return the answer text + usage + cost natively — but it **requires `-p`/`--print`
+ (non-interactive)**, which is a different execution model from the TUI and, on a
+ Claude subscription, is **not covered by the plan** (per Anthropic, `claude -p` /
+ Agent SDK fall outside subscription limits from 2026-06-15). So driving the
+ **interactive TUI** here is deliberate: it keeps unattended runs on the
+ subscription. The file-write pattern is the supported way to recover answers
+ without leaving that path.
+
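As a rough sketch of the file-write pattern from the caller's side: append the instruction to the prompt, run the agent, then read and validate the file. The helper names `withResultFile` and `readAgentResult` are hypothetical and not part of agent-relay. The demo simulates the agent by writing the file itself, since how a run is started is outside what this section describes.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type AgentResult =
  | { ok: true; value: unknown }
  | { ok: false; reason: string };

// Hypothetical helper: append the file-write instruction to a task prompt.
function withResultFile(prompt: string, fileName: string): string {
  return `${prompt}\n\nWhen you are done, write the final JSON to \`${fileName}\`.`;
}

// Hypothetical helper: read the file the agent was asked to write.
// A missing file or invalid JSON is reported instead of thrown, because
// an agent may skip the instruction or write a partial file.
function readAgentResult(path: string): AgentResult {
  if (!existsSync(path)) {
    return { ok: false, reason: "result file not found" };
  }
  try {
    return { ok: true, value: JSON.parse(readFileSync(path, "utf8")) };
  } catch (err) {
    return { ok: false, reason: `invalid JSON: ${(err as Error).message}` };
  }
}

// Demo: a temp directory stands in for the target repo, and writeFileSync
// stands in for the agent writing its answer during the run.
const workDir = mkdtempSync(join(tmpdir(), "relay-demo-"));
const resultPath = join(workDir, "result.json");
const prompt = withResultFile("List the exported functions.", "result.json");
writeFileSync(resultPath, JSON.stringify({ functions: ["run", "stop"] }));

console.log(prompt);
console.log(readAgentResult(resultPath));
```

Treating the file as untrusted input (it may be absent or malformed) keeps a failed run from crashing the caller.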
  ## The Decider
 
  The Decider answers every detected prompt. Choose it in config (`decider`) or
@@ -315,6 +360,28 @@ see [`agent-relay.config.example.json`](./agent-relay.config.example.json)):
  cancelled | waiting_approval }`. The `CompletionDetector` maps the run to a
  terminal status (timeout/idle → `timeout`, cancel → `cancelled`, success/exit
  0 → `completed`, else `failed`) and is extensible.
+ - `meta.completedCleanly` (exit 0, not aborted) is the unambiguous success flag;
+ `meta.settled` is `true` when the run answered a prompt **or** completed cleanly
+ (so a 0-interaction skip-mode run is `settled: true`, not falsely `false`).
+
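The status mapping and the two flags above can be read as the following sketch. The `RunOutcome` shape and the function names are hypothetical, and two points are assumptions rather than documented behaviour: the precedence order (timeout checked before cancel) and treating "aborted" as "timed out or cancelled".

```typescript
type TerminalStatus = "completed" | "failed" | "timeout" | "cancelled";

// Hypothetical summary of a finished run.
interface RunOutcome {
  exitCode: number | null; // null when the process never exited by itself
  timedOut: boolean;       // timeout or idle limit reached
  cancelled: boolean;
  promptsAnswered: number;
}

// timeout/idle -> timeout, cancel -> cancelled, exit 0 -> completed, else failed.
// ASSUMPTION: timeout takes precedence over cancel when both are set.
function toTerminalStatus(o: RunOutcome): TerminalStatus {
  if (o.timedOut) return "timeout";
  if (o.cancelled) return "cancelled";
  if (o.exitCode === 0) return "completed";
  return "failed";
}

// Exit 0 and not aborted.
function completedCleanly(o: RunOutcome): boolean {
  return o.exitCode === 0 && !o.timedOut && !o.cancelled;
}

// True when the run answered at least one prompt OR completed cleanly.
function settled(o: RunOutcome): boolean {
  return o.promptsAnswered > 0 || completedCleanly(o);
}

// A skip-mode run that never prompted but exited 0.
const skipModeRun: RunOutcome = {
  exitCode: 0,
  timedOut: false,
  cancelled: false,
  promptsAnswered: 0,
};

console.log(toTerminalStatus(skipModeRun), completedCleanly(skipModeRun), settled(skipModeRun));
```

The `||` in `settled` is what keeps a zero-interaction run from being reported as unsettled.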
+ ## Operational notes
+
+ - **It writes into the target repo.** `sessionsDir` / `logsDir` default to
+ `<root>/.agent-relay/{sessions,logs}` — add **`.agent-relay/`** to the target
+ repo's `.gitignore`, or point them elsewhere (absolute / out-of-root paths work).
+ `--dry-run` still records a session entry, clearly marked `meta.dryRun: true`.
+ - **Pin the binary.** Without `adapters.<name>.command`, the first `claude` /
+ `codex` on `PATH` is used — ambiguous when several exist (version managers, shell
+ wrappers, `cmux`). Set `command` to an absolute path; either way the **resolved
+ path is logged** at run start (`resolved claude → …`).
+ - **Effort/cost defaults.** Claude runs at `--effort xhigh` and Codex at
+ `-s danger-full-access` by **default** (full autonomy) — both raise cost/latency
+ and access; override with `adapters.<name>.args` / `defaults.sandbox`.
+ - **Reasoning-model deciders.** An `api` decider pointed at a reasoning model
+ (e.g. gpt-oss) spends tokens on `reasoning_content` before the JSON answer; too
+ small a `--decider-max-tokens` empties `content`. agent-relay clamps it up to a
+ **512 floor** and falls back to parsing `reasoning_content`, so a normal config
+ won't return an empty decision — but give verbose reasoners more headroom.
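The reasoning-model safeguard in the last note (clamp the token limit up to 512, then fall back to `reasoning_content`) could look roughly like this. It is a sketch under assumptions, not the package's implementation: the function names, the message shape, the `{"choice": …}` decision format and the naive brace-matching parser are all hypothetical. Only the 512 floor and the fallback order come from the text.

```typescript
// Floor stated in the README.
const MAX_TOKENS_FLOOR = 512;

// Raise a too-small limit to the floor; leave larger values alone.
function clampMaxTokens(requested: number): number {
  return Math.max(requested, MAX_TOKENS_FLOOR);
}

// ASSUMPTION: the API reply carries these two optional text fields.
interface DeciderMessage {
  content?: string | null;
  reasoning_content?: string | null;
}

// Naive parser: takes the span from the first "{" to the last "}".
// It gives up (returns null) if that span is not valid JSON.
function parseJsonSpan(text: string): Record<string, unknown> | null {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    const parsed: unknown = JSON.parse(text.slice(start, end + 1));
    return typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}

// Prefer `content`; fall back to `reasoning_content` when it is empty
// or holds no parseable JSON.
function extractDecision(msg: DeciderMessage): Record<string, unknown> | null {
  for (const text of [msg.content, msg.reasoning_content]) {
    if (!text) continue;
    const decision = parseJsonSpan(text);
    if (decision !== null) return decision;
  }
  return null;
}

// A reply whose `content` was emptied by a tight token budget.
const truncatedReply: DeciderMessage = {
  content: "",
  reasoning_content: 'Option 2 matches the task. {"choice": 2}',
};

console.log(clampMaxTokens(64));
console.log(extractDecision(truncatedReply));
```

The fallback only helps when the model reached its JSON answer before running out of tokens, which is why the note still recommends extra headroom for verbose reasoners.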
 
  ## Architecture