@yul-labs/agent-relay 0.1.1 → 0.1.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +70 -3
- package/dist/cli.js +317 -72
- package/dist/index.d.ts +212 -16
- package/dist/index.js +312 -67
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -99,11 +99,15 @@ prompt ──> spawn agent in a PTY (its real TUI) ──> watch terminal output
|
|
|
99
99
|
```
|
|
100
100
|
|
|
101
101
|
1. The agent is launched **interactively** in a pseudo-terminal under **pure
|
|
102
|
-
autonomy** (the project's concept): Claude with `--dangerously-skip-permissions
|
|
103
|
-
Codex with `-s
|
|
102
|
+
autonomy** (the project's concept): Claude with `--dangerously-skip-permissions`
|
|
103
|
+
(+ `--effort xhigh` by default), Codex with `-s danger-full-access -a never`
|
|
104
|
+
(full bypass by default — tighten with `defaults.sandbox` /
|
|
105
|
+
`adapters.codex.sandbox`, e.g. `workspace-write`). The agent rarely asks — but the
|
|
104
106
|
prompts that *still* appear (the directory-trust dialog, the occasional choice)
|
|
105
107
|
are what the Decider answers. `approvalPolicy: "gated"` makes the agent ask
|
|
106
|
-
more
|
|
108
|
+
more — and routes those asks to the Decider — via Claude `--permission-mode
|
|
109
|
+
default` (its normal "ask before each edit/command" mode; **not** `acceptEdits`,
|
|
110
|
+
which would silently auto-approve edits) and Codex `-a on-request`;
|
|
107
111
|
`"readonly"` sandboxes it (Claude `--permission-mode plan`, Codex `-s read-only`).
|
|
108
112
|
2. When the terminal goes idle, agent-relay strips ANSI and **detects** a prompt:
|
|
109
113
|
a numbered/▶ **menu** (single choice), a **multi-select** menu (checkboxes
|
|
@@ -144,6 +148,47 @@ arg of the keymap, so a TUI that submits with Enter-on-a-Next-row can be retuned
|
|
|
144
148
|
The `rule` decider keeps the current selection and submits (no judgment); an LLM
|
|
145
149
|
decider picks the task-relevant rows.
|
|
146
150
|
|
|
151
|
+
### Getting a result back (what you can and can't read)
|
|
152
|
+
|
|
153
|
+
agent-relay's job is to **drive** the agent and **answer its prompts** — not to
|
|
154
|
+
recover the agent's prose answer from the screen. The TUI text is heavily mangled
|
|
155
|
+
after ANSI stripping (words run together, spinners interleave), so **scraping a
|
|
156
|
+
structured answer out of stdout is not reliable** and is deliberately not
|
|
157
|
+
attempted. Two things you *can* rely on:
|
|
158
|
+
|
|
159
|
+
- **For a structured result, have the agent WRITE IT TO A FILE.** Put it in the
|
|
160
|
+
prompt — e.g. *"...and write the final JSON to `result.json`"* — then read that
|
|
161
|
+
file after the run. This is the validated pattern for getting data back out.
|
|
162
|
+
- **Token usage is surfaced** as `result.meta.usage`, read from the agent's OWN
|
|
163
|
+
session transcript (`~/.claude/projects/…` for Claude, `~/.codex/sessions/…` for
|
|
164
|
+
Codex) — so it works on every machine **regardless of TUI / status-line
|
|
165
|
+
settings**, and the token counts are the API's real numbers, not a scrape.
|
|
166
|
+
Token fields: `{ source, model, inputTokens, outputTokens, cachedInputTokens,
|
|
167
|
+
cacheCreationTokens, reasoningTokens, totalTokens }`.
|
|
168
|
+
- **Cost is COMPUTED, not scraped** — `usage.costUsd` = Σ(tokens × the model's
|
|
169
|
+
list price) from a built-in table (override per `config.pricing`). It is `null`
|
|
170
|
+
(with a warning) when the model's price is unknown, so you can tell "unpriced"
|
|
171
|
+
from a real `0`. The scraped nominal "Session $" (often `0` on Team/Max seats) is
|
|
172
|
+
kept separately as `usage.subscriptionSessionCostUsd`; `usage.contextPercent` is
|
|
173
|
+
another status-line extra. `source` is `"transcript"` (authoritative tokens) or
|
|
174
|
+
`"status-line"`.
|
|
175
|
+
> **`costUsd` is a REFERENCE (shadow) cost** — what the run *would* bill via the
|
|
176
|
+
> API. agent-relay drives the **interactive TUI**, which on a Claude subscription
|
|
177
|
+
> is covered by the plan, so for those runs the real marginal cost is ~0 and
|
|
178
|
+
> `costUsd` is a reference, not the amount charged. Only the non-interactive paths
|
|
179
|
+
> (`claude -p` / Agent SDK — and, per Anthropic, from **2026-06-15** no longer
|
|
180
|
+
> counted against subscription limits) actually bill at these rates. Staying on
|
|
181
|
+
> the **interactive** PTY path is what keeps unattended runs on the subscription.
|
|
182
|
+
|
|
183
|
+
A **structured-first** backend (Claude `--output-format stream-json`) *would*
|
|
184
|
+
return the answer text + usage + cost natively — but it **requires `-p`/`--print`
|
|
185
|
+
(non-interactive)**, which is a different execution model from the TUI and, on a
|
|
186
|
+
Claude subscription, is **not covered by the plan** (per Anthropic, `claude -p` /
|
|
187
|
+
Agent SDK fall outside subscription limits from 2026-06-15). So driving the
|
|
188
|
+
**interactive TUI** here is deliberate: it keeps unattended runs on the
|
|
189
|
+
subscription. The file-write pattern is the supported way to recover answers
|
|
190
|
+
without leaving that path.
|
|
191
|
+
|
|
147
192
|
## The Decider
|
|
148
193
|
|
|
149
194
|
The Decider answers every detected prompt. Choose it in config (`decider`) or
|
|
@@ -315,6 +360,28 @@ see [`agent-relay.config.example.json`](./agent-relay.config.example.json)):
|
|
|
315
360
|
cancelled | waiting_approval }`. The `CompletionDetector` maps the run to a
|
|
316
361
|
terminal status (timeout/idle → `timeout`, cancel → `cancelled`, success/exit
|
|
317
362
|
0 → `completed`, else `failed`) and is extensible.
|
|
363
|
+
- `meta.completedCleanly` (exit 0, not aborted) is the unambiguous success flag;
|
|
364
|
+
`meta.settled` is `true` when the run answered a prompt **or** completed cleanly
|
|
365
|
+
(so a 0-interaction skip-mode run is `settled: true`, not falsely `false`).
|
|
366
|
+
|
|
367
|
+
## Operational notes
|
|
368
|
+
|
|
369
|
+
- **It writes into the target repo.** `sessionsDir` / `logsDir` default to
|
|
370
|
+
`<root>/.agent-relay/{sessions,logs}` — add **`.agent-relay/`** to the target
|
|
371
|
+
repo's `.gitignore`, or point them elsewhere (absolute / out-of-root paths work).
|
|
372
|
+
`--dry-run` still records a session entry, clearly marked `meta.dryRun: true`.
|
|
373
|
+
- **Pin the binary.** Without `adapters.<name>.command`, the first `claude` /
|
|
374
|
+
`codex` on `PATH` is used — ambiguous when several exist (version managers, shell
|
|
375
|
+
wrappers, `cmux`). Set `command` to an absolute path; either way the **resolved
|
|
376
|
+
path is logged** at run start (`resolved claude → …`).
|
|
377
|
+
- **Effort/cost defaults.** Claude runs at `--effort xhigh` and Codex at
|
|
378
|
+
`-s danger-full-access` by **default** (full autonomy) — both raise cost/latency
|
|
379
|
+
and access; override with `adapters.<name>.args` / `defaults.sandbox`.
|
|
380
|
+
- **Reasoning-model deciders.** An `api` decider pointed at a reasoning model
|
|
381
|
+
(e.g. gpt-oss) spends tokens on `reasoning_content` before the JSON answer; too
|
|
382
|
+
small a `--decider-max-tokens` empties `content`. agent-relay clamps it up to a
|
|
383
|
+
**512 floor** and falls back to parsing `reasoning_content`, so a normal config
|
|
384
|
+
won't return an empty decision — but give verbose reasoners more headroom.
|
|
318
385
|
|
|
319
386
|
## Architecture
|
|
320
387
|
|