npm - maestro-agent-sdk - Versions diffs - 0.1.13 → 0.1.16 - Mend

maestro-agent-sdk 0.1.13 → 0.1.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (52)

package/README.md +214 -163
package/dist/core/agent.d.ts +7 -1
package/dist/core/agent.d.ts.map +1 -1
package/dist/core/agent.js +2 -0
package/dist/core/agent.js.map +1 -1
package/dist/core/loop.d.ts.map +1 -1
package/dist/core/loop.js +77 -15
package/dist/core/loop.js.map +1 -1
package/dist/index.d.ts +7 -6
package/dist/index.d.ts.map +1 -1
package/dist/index.js +6 -5
package/dist/index.js.map +1 -1
package/dist/mcp/pool.d.ts +2 -2
package/dist/mcp/pool.d.ts.map +1 -1
package/dist/mcp/pool.js.map +1 -1
package/dist/platform/env-bootstrap.js +4 -1
package/dist/platform/env-bootstrap.js.map +1 -1
package/dist/platform/logger.d.ts.map +1 -1
package/dist/platform/logger.js.map +1 -1
package/dist/platform/version.d.ts +1 -1
package/dist/platform/version.js +1 -1
package/dist/provider.d.ts +72 -17
package/dist/provider.d.ts.map +1 -1
package/dist/provider.js +119 -29
package/dist/provider.js.map +1 -1
package/dist/providers/anthropic.d.ts +98 -26
package/dist/providers/anthropic.d.ts.map +1 -1
package/dist/providers/anthropic.js +158 -31
package/dist/providers/anthropic.js.map +1 -1
package/dist/providers/deepseek.d.ts +16 -5
package/dist/providers/deepseek.d.ts.map +1 -1
package/dist/providers/deepseek.js +17 -6
package/dist/providers/deepseek.js.map +1 -1
package/dist/sub-agent/runner.d.ts.map +1 -1
package/dist/sub-agent/runner.js +1 -1
package/dist/sub-agent/runner.js.map +1 -1
package/dist/tools/builtin/bash.d.ts.map +1 -1
package/dist/tools/builtin/bash.js.map +1 -1
package/dist/tools/builtin/edit.d.ts.map +1 -1
package/dist/tools/builtin/edit.js.map +1 -1
package/dist/tools/builtin/multi_edit.d.ts.map +1 -1
package/dist/tools/builtin/multi_edit.js.map +1 -1
package/dist/tools/builtin/write.d.ts.map +1 -1
package/dist/tools/builtin/write.js.map +1 -1
package/dist/tools/index.d.ts +36 -0
package/dist/tools/index.d.ts.map +1 -0
package/dist/tools/index.js +40 -0
package/dist/tools/index.js.map +1 -0
package/dist/types.d.ts +78 -0
package/dist/types.d.ts.map +1 -1
package/dist/types.js.map +1 -1
package/package.json +1 -1

package/README.md CHANGED

@@ -4,25 +4,18 @@
 [![npm version](https://img.shields.io/npm/v/maestro-agent-sdk.svg)](https://www.npmjs.com/package/maestro-agent-sdk)
 [![license](https://img.shields.io/npm/l/maestro-agent-sdk.svg)](./LICENSE)
-**Embeddable agent SDK that ships skills, memory, and MCP out of the box.**
+**Embeddable agent SDK — skills, memory, MCP, and host-controlled guardrails out of the box.**
 Anthropic + DeepSeek today, BYO-provider in one file. No CLI, no gateway, no host lock-in.
 > **Status:** Early port (v0.1.x). Active development. API surface may change before 1.0.
-Inspired by [Claude Code](https://www.anthropic.com/claude-code) and [`hermes-agent`](https://github.com/NousResearch/hermes-agent) — same agent-loop shape, repackaged as an embeddable TypeScript library.
-### How it compares
-| | What you get |
-|---|---|
-| **vs [`@anthropic-ai/claude-agent-sdk`](https://github.com/anthropics/claude-agent-sdk-typescript)** | Multi-provider from day one (Anthropic + DeepSeek), with skills (`SKILL.md` / `skill.md` indexing), memory (auto context compaction), and MCP client pool built in — not provided as separate add-ons. |
-| **vs LangChain / LangGraph** | Thin loop, no DSL. A provider is one adapter file; a tool is `{ name, description, schema, run }`. You read the source in an afternoon. |
+A generalizable agent runtime. Swap providers, inject your own logger/MCP resolver/hooks, and embed it in any host process — no framework, no lock-in.
 ## What's in the box
-- **Agent loop** — provider-driven tool-calling loop with iteration cap, abort signal, and event stream.
+- **Agent loop** — provider-driven tool-calling loop with iteration cap, abort signal, LLM pre/post guardrail hooks, and event stream.
 - **Pluggable providers** — first-class adapters for Anthropic (Claude) and DeepSeek V4; provider-neutral message schema so adding OpenAI / Gemini / Ollama is a thin file.
-- **Built-in tools** — `bash`, `Read`, `Write`, `Edit`, `Glob`, `Grep`, `Agent` (sub-agent delegation), `TaskCreate`/`TaskUpdate`/`TaskList`/`TaskGet`, `WebFetch`, `skill_view`, `skill_write`. Bring your own via `ToolRegistry`. Grep shells out to ripgrep (`rg`) so install it if you want the tool active; the SDK surfaces a structured error pointing to the install path when missing.
+- **Built-in tools** — `bash`, `Read`, `Write`, `Edit`, `MultiEdit`, `Glob`, `Grep`, `Agent` (sub-agent delegation), `TaskCreate`/`TaskUpdate`/`TaskList`/`TaskGet`, `WebFetch` (optional SSRF policy via `createWebFetchTool`), `skill_view`, `skill_write`. Bring your own via `ToolRegistry`. Grep shells out to ripgrep (`rg`), so install it if you want the tool active; when `rg` is missing, the SDK surfaces a structured error pointing to the install path. Tool primitives are also importable from the `maestro-agent-sdk/tools` subpath when you don't need the rest of the runtime.
 - **MCP** — built-in client pool (stdio + SSE) so any MCP server (`@modelcontextprotocol/sdk`) shows up as tools.
 - **Skills** — per-workspace `.skills/<skillKey>/<name>/skill.md` packages with FTS-style indexing, on-demand body load (`skill_view`), and agent-autonomous authoring (`skill_write`).
 - **Memory** — automatic context compression (summarization + pruning) when the token budget is hit. Reuses the agent's own model for compaction — no separate model knob.
@@ -107,133 +100,67 @@ for await (const event of runConversation(agent, "Summarize today's news.")) {
 }
 ```
-> **Effort scale.** `effort` drives both the thinking budget _and_ the
-> tool-iteration cap. The model also sees its remaining-iteration count in a
-> `<system-reminder>` block every turn so it can self-pace. Knobs:
+> **Effort scale (v0.1.16+).** `effort` controls two orthogonal knobs:
+>
+> 1. **Reasoning depth** — thinking budget on Anthropic (`thinking.budget_tokens`),
+>    `reasoning_effort` on DeepSeek.
+> 2. **Working-mode persona** — a `## Working mode` block injected into the
+>    system prompt with imperative verbs the model conditions on from turn 1
+>    (e.g. `low` → "answer fast, one Read max", `max` → "exhaustive,
+>    enumerate failure modes"). Pure function of `effort`, prefix-cache stable.
+>
+> The **tool-iteration cap is no longer derived from effort** as of v0.1.16 —
+> it's a single host-tunable default (`DEFAULT_MAX_ITERATIONS = 90`) that you
+> override per call via `AgentQueryOptions.maxIterations`. The default
+> matches v0.1.15's old `xhigh` cap (the "extended exploration" baseline)
+> so existing callers that didn't pin a value see no behavior change. This
+> lets a host mix and match: `effort: "low"` + `maxIterations: 90` (terse,
+> but don't trip on a surprise sub-task), or `effort: "max"` +
+> `maxIterations: 30` (think hard, but stay snappy).
+>
+> | effort  | thinking budget (Anthropic) | DeepSeek `reasoning_effort` |
+> |---------|----------------------------:|:---------------------------:|
+> | `low`   |                      2 048  |          `low`              |
+> | `medium`|                      8 192  |          `medium`           |
+> | `high`  |                     16 384  |          `high`             |
+> | `xhigh` |                     16 384  |          `high`             |
+> | `max`   |                     32 768  |          `max`              |
+>
+> **`xhigh` shares `high`'s thinking ceiling** — the difference is persona,
+> not budget. `xhigh` tells the model to use the same allowance more broadly
+> (hold multiple hypotheses, survey, name edge cases). On sonnet-4-6 /
+> haiku-4-5 the answer-quality return on thinking above ~16K dropped off
+> sharply in practice, so the lever became persona instead of more tokens.
+>
+> **`max` is halved from v0.1.15's 65 536** — 64K thinking is rarely fully
+> utilized in a single turn, so the extra latency bought nothing. 32K is the
+> ceiling for "really chew on this" without paying for headroom the model
+> doesn't reach.
 >
-> | effort  | thinking budget | iteration cap |
-> |---------|----------------:|--------------:|
-> | `low`   |          2 048  |             5 |
-> | `medium`|          8 192  |            20 |
-> | `high`  |         16 384  |            50 |
-> | `xhigh` |         32 768  |            90 |
-> | `max`   |         65 536  |           200 |
+> DeepSeek's API ships four tiers; maestro's `xhigh` maps to DeepSeek `high`
+> (not `max`) so that `max` stays reserved for the explicit "deepest
+> reasoning" opt-in.
+>
+> **Turn-adaptive budget (v0.1.16).** The per-turn thinking budget the loop
+> actually sends to the API is *not* constant — it's resolved through
+> `thinkingBudgetForTurn(base, iter, maxIter)`:
+>
+> - **First turn** (`iter == 0`) — full base. Planning gets the full
+>   allowance because a careful first-turn plan saves tool calls later.
+> - **Middle turns** — full base. Interleaved thinking between tool calls
+>   is what Anthropic's interleaved-thinking beta is for; cutting it
+>   mid-flow defeats the beta.
+> - **Last 3 turns** (wrap-up zone) — `base / 4`, floored at 1024 (the
+>   Anthropic API minimum). The iteration reminder has already flipped to
+>   "finalize NOW"; spending another 16K thinking on a turn that mostly
+>   emits final text is pure latency waste.
+>
+> The model still sees a remaining-iteration count in the per-turn
+> `<system-reminder>` so it can self-pace within the cap you set.
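The effort table and the turn-adaptive rule above can be condensed into a small TypeScript sketch. This is not the SDK's source. `EFFORT_TABLE`, `resolveMaxIterations`, and `thinkingBudgetForTurnSketch` are hypothetical names. The sketch also makes one assumption of its own: when the cap is 3 or less, turn 0 still gets the full base, following the "First turn: full base" rule.

```typescript
// Hypothetical sketch of the v0.1.16 effort / budget rules described above.
// Names are illustrative; this is not maestro-agent-sdk's implementation.
type EffortLevel = "low" | "medium" | "high" | "xhigh" | "max";
type DeepSeekReasoningEffort = "low" | "medium" | "high" | "max";

interface EffortKnobs {
  thinkingBudget: number;            // Anthropic thinking.budget_tokens
  deepseek: DeepSeekReasoningEffort; // DeepSeek reasoning_effort
}

// Values copied from the effort table.
const EFFORT_TABLE: Record<EffortLevel, EffortKnobs> = {
  low: { thinkingBudget: 2048, deepseek: "low" },
  medium: { thinkingBudget: 8192, deepseek: "medium" },
  high: { thinkingBudget: 16384, deepseek: "high" },
  xhigh: { thinkingBudget: 16384, deepseek: "high" }, // same ceiling as high; only the persona differs
  max: { thinkingBudget: 32768, deepseek: "max" },
};

// The iteration cap no longer depends on effort.
const DEFAULT_MAX_ITERATIONS = 90;
const ANTHROPIC_MIN_THINKING = 1024;
const WRAP_UP_TURNS = 3;

function resolveMaxIterations(maxIterations?: number): number {
  return maxIterations ?? DEFAULT_MAX_ITERATIONS;
}

// Full base on the first and middle turns; base / 4 (floored at 1024)
// during the last WRAP_UP_TURNS turns. undefined / 0 pass through unchanged.
function thinkingBudgetForTurnSketch(
  base: number | undefined,
  iter: number,    // 0-based turn index
  maxIter: number, // iteration cap for this run
): number | undefined {
  if (base === undefined || base <= 0) return base;
  // Assumption: turn 0 always keeps the full base, even with a tiny cap.
  const inWrapUp = iter > 0 && iter >= maxIter - WRAP_UP_TURNS;
  if (!inWrapUp) return base;
  return Math.max(ANTHROPIC_MIN_THINKING, Math.floor(base / 4));
}
```

Under this sketch, `effort: "max"` with `maxIterations: 30` would send 32 768 on turns 0 through 26 and 8 192 on turns 27 through 29.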
 More runnable scripts live under [`examples/`](./examples) — Anthropic, DeepSeek,
 a custom-tool walkthrough, and a `skill_write` demo.
-## Skills — per-workspace, agent-autonomous
-Skill catalog routing is deterministic from `(opts.cwd, opts.skillKey)`:
-```
-  skillKey set    → <cwd>/.skills/<skillKey>/
-  skillKey unset  → <cwd>/.skills/default/   (uses MAESTRO_DEFAULT_SKILL_KEY)
-```
-Every skill lives under a **named key** subdirectory. The SDK never reads from
-`<cwd>/.skills/` directly, so a host can list "which profiles exist in this
-workspace?" with one `readdir`. One workspace can host multiple disjoint
-catalogs (e.g. `legal/`, `coding/`, `research/`) and each session selects its
-profile by passing `skillKey`.
-### On-disk layout
-```
-<cwd>/.skills/
-├── default/                          ← skillKey omitted
-│   └── general/note-template/skill.md
-├── legal/                            ← skillKey: "legal"
-│   └── general/
-│       ├── ocr/
-│       │   ├── skill.md
-│       │   ├── scripts/preprocess.py
-│       │   └── references/api.md
-│       └── hearing-report/skill.md
-└── coding/                           ← skillKey: "coding"
-    └── general/code-review/skill.md
-```
-### Manifest format (clawgram-style)
-Two filename conventions are accepted: `SKILL.md` (upstream v0.13.0 with YAML
-frontmatter) and `skill.md` (lowercase, body-based). For new skills the
-clawgram convention is recommended:
-```markdown
-# OCR 텍스트 추출 (English subtitle)
-> **Description**: OCR, 이미지 읽어줘, PDF 텍스트 추출 요청 시 트리거.
-## Required MCP
-- ocr
-- paddleocr
-## 트리거
-- ...
-## 프로세스
-### 1. 이미지 준비
-### 2. paddleocr 실행
-## Gotchas
-- 흐릿한 이미지는 deskew 필요
-```
-The first heading is the display title; the `> **Description**: ...` blockquote
-carries the trigger keywords (this drives system-prompt activation). The
-loader extracts the description from either YAML frontmatter or this
-blockquote — both styles can coexist in the same `.skills/<key>/` tree.
-### Authoring from inside the agent — `skill_write`
-The model can persist new skills mid-session, including adjacent assets
-(scripts, templates, references), in one transactional call:
-```ts
-skill_write({
-  name: "ocr",                       // kebab-case, becomes the folder name
-  content: "# OCR ...\n\n> **Description**: OCR, 이미지 읽어줘\n\n...",
-  files: {
-    "scripts/preprocess.py": "import cv2\n...",
-    "scripts/run.sh": "#!/bin/bash\n...",
-    "templates/report.html": "<!doctype html>...",
-    "references/paddleocr-api.md": "# PaddleOCR API\n...",
-  },
-  overwrite: false,                  // default: refuse to clobber
-});
-```
-Resulting layout under `<skillsDir>/ocr/`:
-```
-ocr/
-├── skill.md            ← from `content`
-├── scripts/
-│   ├── preprocess.py
-│   └── run.sh
-├── templates/report.html
-└── references/paddleocr-api.md
-```
-Safety:
-- kebab-case validation on `name`
-- relative-path validation on every `files` key (rejects `..` escapes,
-  absolute prefixes, backslashes, and the reserved `skill.md` name)
-- `overwrite=false` → batch aborts BEFORE any disk touch if any target
-  already exists (validate-all-then-write)
-- cache invalidation on success → the new skill appears in the NEXT turn's
-  `<available_skills>` catalog (intentionally not the current turn — would
-  break the prompt cache)
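The validate-all-then-write rules in the 0.1.13 README can be sketched as two steps: a pure check, then a commit step that runs only when the check passes. Several parts are assumptions for illustration: the helper names, the injected `exists`/`write` callbacks, the Windows drive-letter check, and the case-insensitive `skill.md` match. Paths are relative to the skill's own folder.

```typescript
// Hypothetical sketch of skill_write's safety rules; not the SDK's source.
// All paths are relative to <skillsDir>/<name>/.
const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

interface SkillIo {
  exists(relPath: string): boolean;
  write(relPath: string, data: string): void;
}

function validateSkillWrite(
  name: string,
  files: Record<string, string>,
  exists: (relPath: string) => boolean,
  overwrite = false,
): string[] {
  const errors: string[] = [];
  if (!KEBAB_CASE.test(name)) errors.push(`name '${name}' is not kebab-case`);
  for (const rel of Object.keys(files)) {
    if (rel.includes("\\")) errors.push(`'${rel}': backslashes not allowed`);
    else if (rel.startsWith("/") || /^[A-Za-z]:/.test(rel)) errors.push(`'${rel}': absolute path`);
    else if (rel.split("/").includes("..")) errors.push(`'${rel}': '..' escape`);
    else if (rel.toLowerCase() === "skill.md") errors.push(`'${rel}': reserved name`); // assumption: case-insensitive
  }
  // overwrite=false: refuse if ANY target already exists, before writing anything.
  if (!overwrite) {
    for (const rel of ["skill.md", ...Object.keys(files)]) {
      if (exists(rel)) errors.push(`'${rel}' already exists`);
    }
  }
  return errors;
}

function writeSkillBatch(
  name: string,
  content: string,
  files: Record<string, string>,
  io: SkillIo,
  overwrite = false,
): string[] {
  const errors = validateSkillWrite(name, files, (p) => io.exists(p), overwrite);
  if (errors.length > 0) return errors; // nothing has touched disk
  io.write("skill.md", content);
  for (const [rel, data] of Object.entries(files)) io.write(rel, data);
  return [];
}
```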
-### Reading from the model side — `skill_view`
-The system prompt only carries name + summary per skill (FTS-style index).
-When the model decides a skill is relevant it calls `skill_view(name)` and
-gets the full body back, with a `[Skill directory: ...]` hint so relative
-paths in the body resolve against the right cwd.
 ## Configuration
 Per-call options on `AgentQueryOptions`:
@@ -241,6 +168,8 @@ Per-call options on `AgentQueryOptions`:
 | Option | Required | Purpose |
 |---|---|---|
 | `cwd` | ✓ | Workspace root. Drives `.skills/` location, rollout `_meta`, and the `mkdir` invariant. |
+| `effort` | — | Reasoning depth + working-mode persona (`low`/`medium`/`high`/`xhigh`/`max`). See the effort table above. |
+| `maxIterations` | — | Tool-iteration cap. Omit for `DEFAULT_MAX_ITERATIONS = 90`. Decoupled from `effort` as of v0.1.16 — controls turn budget, not reasoning depth. |
 | `skillKey` | — | Named skill profile within `<cwd>/.skills/`. Omit for `default`. |
 | `allowedSkills` | — | Per-call name whitelist applied before curation. |
 | `sessionMetadata` | — | Opaque host bag round-tripped via the rollout `_meta` header. |
@@ -322,7 +251,7 @@ Each session JSONL at `~/.maestro/sessions/<sessionId>.jsonl` carries a
 `_meta` header line for forensics and host-side indexing:
 ```jsonl
-{"_meta":{"version":1,"cwd":"/path","skillKey":"legal","userId":"...","createdAt":"2026-05-18T...","sdkVersion":"0.1.5","skillsDir":"...","metadata":{...}}}
+{"_meta":{"version":1,"cwd":"/path","skillKey":"legal","userId":"...","createdAt":"2026-05-18T...","sdkVersion":"0.1.x","skillsDir":"...","metadata":{...}}}
 {"role":"user","content":"..."}
 {"role":"assistant","content":[...]}
 ```
@@ -332,25 +261,18 @@ treats their first line as a regular message. Hosts that want to inspect
 session metadata without reading the full message log can call
 `loadMaestroSessionMeta(sessionId)`.
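The header check itself is cheap because only the first line needs parsing. Here is a minimal sketch, assuming the header shape shown in the JSONL example. `parseMetaHeader`, `splitSession`, and `MaestroSessionMetaSketch` are hypothetical names. To avoid loading the whole log, a host would pair `parseMetaHeader` with a reader that stops at the first newline, such as Node's `readline`.

```typescript
// Hypothetical shape, taken from the `_meta` example above.
interface MaestroSessionMetaSketch {
  version: number;
  cwd: string;
  skillKey?: string;
  userId?: string;
  createdAt?: string;
  sdkVersion?: string;
  skillsDir?: string;
  metadata?: Record<string, unknown>;
}

// Returns the header payload, or null when the first line is an ordinary
// message (or not JSON at all); the caller then treats it as history.
function parseMetaHeader(firstLine: string): MaestroSessionMetaSketch | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(firstLine);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return null;
  const meta = (parsed as Record<string, unknown>)["_meta"];
  if (typeof meta !== "object" || meta === null) return null;
  return meta as MaestroSessionMetaSketch;
}

// Splits an in-memory JSONL string into an optional header plus message lines.
function splitSession(jsonl: string): { meta: MaestroSessionMetaSketch | null; messages: string[] } {
  const lines = jsonl.split("\n").filter((l) => l.trim() !== "");
  const meta = parseMetaHeader(lines[0] ?? "");
  return { meta, messages: meta ? lines.slice(1) : lines };
}
```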
-## Architecture
+## Positioning — a building block, not a product
-```
-src/
-├── core/         AIAgent class + run_conversation loop
-├── tools/        ToolRegistry + builtin tools + PreToolUse/PostToolUse hook surface
-├── providers/    Provider adapters (anthropic, deepseek)
-├── mcp/          MCP client pool (stdio + SSE)
-├── skills/       Skill loader, index builder, usage tracker, curator
-├── memory/       Context compressor, token estimator, reminders, scrubber
-├── state/        Per-session todo store
-├── sub-agent/    Sub-agent runner for the `Agent` tool
-├── platform/     Injectable host adapters (logger, lifecycle, config, jsonl, version, mcp-config)
-├── agents/       Cross-agent rollout helpers + per-agent registry contract
-├── storage/      ConversationReader DI (host supplies past turns for cross-agent forks)
-└── media/        File-event extraction from inline `[FILE:/path]` tags
-```
+maestro-agent-sdk is an agent *runtime*, not an agent *product*. You pick the UI, the provider mix, the guardrail rules, the storage layer.
+| Project | Layer | Key trade-off |
+|---------|-------|---------------|
+| **maestro-agent-sdk** | Embeddable SDK | Agent loop only — no CLI, no UI, no fixed product shape. Host injects logger, MCP resolver, session store, guardrails. |
+| **hermes-agent** | Full-featured app | TUI, web dashboard, gateway, cron, Discord/Feishu. All-in-one — opinionated and coupled to its own host. |
+| **OpenAI Agents SDK** | SDK + scaffold | Strong guardrails/tracing/handoffs, but multi-agent by design — heavier abstraction surface. |
+| **oh-my-claudecode** | Orchestration plugin | Runs on top of the Claude Code agent loop. Its value is team mode, LSP tools, and session replay. |
-The `platform/`, `storage/`, and `agents/contracts` modules expose **injection points** so the SDK never assumes a particular host process.
+**maestro-agent-sdk leaves product decisions to you.** Same `AIAgent` works in a Telegram bot, cron runner, or code review pipeline.
 ## Host integration (DI)
@@ -374,26 +296,155 @@ setMcpResolver((opts) => ({
 setConversationReader((userId, topic, groupId) => myStore.read({ userId, topic, groupId }));
 ```
+## Skills — drop a directory, get indexed context
+Skills are `SKILL.md` (or `skill.md`) files inside `<cwd>/.skills/<skillKey>/<name>/`. The SDK walks that tree on the first turn, parses each file's YAML frontmatter, and appends a `## Skills (mandatory)` block to the system prompt with one `name + 60-char description` line per skill. Bodies stay on disk; the model calls `skill_view(name)` to load the full markdown on demand. The index is cached per (root, mtime, TTL), so subsequent turns skip the walk.
+```ts
+import { maestroProvider } from "maestro-agent-sdk";
+// `maestroProvider` is the batteries-included entry point: it builds the
+// ToolRegistry, wires builtin tools + skills + MCP, and drives the loop.
+for await (const event of maestroProvider({
+  cwd: "/path/to/workspace",  // .skills/ resolved relative to this
+  skillKey: "legal",          // → /path/to/workspace/.skills/legal/<name>/SKILL.md
+  prompt: "Draft a contract clause for ...",
+  userId: "alice",
+  session: "thread-42",
+  // skill_view + skill_write tools are auto-registered; the model picks
+  // which skill body to load per turn.
+})) {
+  if (event.type === "text_delta") process.stdout.write(event.content);
+}
+```
+**Creating skills:** `skill_write(name, body)` → writes `SKILL.md` into the named directory; the index hot-reloads.
+**Loading skills:** `skill_view(name)` → returns the full markdown body to the model.
+**Security:** every `SKILL.md` is scanned at index-time for prompt-injection, exfiltration, and destructive shell patterns. A flagged file is dropped from the catalog with a logged reason.
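The catalog step described above can be sketched as one short line per skill, with the walk result reused while the tree is unchanged. Three details are assumptions and may differ from the SDK's real block layout and cache policy: the line format, the single-`mtimeMs` freshness check, and the injected clock.

```typescript
// Hypothetical sketch of the skills catalog + cache; not the SDK's source.
interface SkillEntry {
  name: string;
  description: string;
}

// One line per skill: name plus a description capped at 60 characters.
function renderSkillsBlock(skills: SkillEntry[], maxDesc = 60): string {
  const lines = skills.map((s) => `- ${s.name}: ${s.description.slice(0, maxDesc)}`);
  return ["## Skills (mandatory)", ...lines].join("\n");
}

// Cache keyed by skills root; an entry is reused while the root's mtime is
// unchanged and the TTL has not expired.
class SkillIndexCache {
  private readonly entries = new Map<string, { mtimeMs: number; builtAt: number; block: string }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(root: string, mtimeMs: number, build: () => string): string {
    const hit = this.entries.get(root);
    if (hit && hit.mtimeMs === mtimeMs && this.now() - hit.builtAt < this.ttlMs) {
      return hit.block; // no directory walk this turn
    }
    const block = build();
    this.entries.set(root, { mtimeMs, builtAt: this.now(), block });
    return block;
  }
}
```

Suppose the mtime comes from the root directory. A directory's mtime changes only when its direct entries change. In that case, the TTL is what eventually picks up edits to an existing `SKILL.md`.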
+## Hooks & Guardrails — LLM pre/post + tool hooks
+### LLM Pre Hook — inspect every API call
+Fires right before every provider call. The host can pass through, replace the user-visible content, or tripwire the entire run. Receives the full message array (system + history + current turn).
+```ts
+import { AIAgent, AnthropicProvider, ToolRegistry } from "maestro-agent-sdk";
+const agent = new AIAgent(provider, tools, {
+  model: "claude-sonnet-4-6",
+  systemPrompt: "...",
+  llmPreHook: async (messages, { abortSignal }) => {
+    const lastUser = messages.filter((m) => m.role === "user").at(-1);
+    const text = typeof lastUser?.content === "string" ? lastUser.content : "";
+    if (/api[_-]?key|password/i.test(text)) {
+      return {
+        decision: "reject_content",
+        message: "Sensitive credential detected — please rephrase without secrets.",
+      };
+    }
+    if (/rm -rf \//.test(text)) {
+      return { decision: "tripwire", message: "Destructive request blocked." };
+    }
+    return { decision: "allow" };
+  },
+});
+```
+### LLM Post Hook — validate the final turn
+Fires when the model has produced a turn-complete response (no pending tool calls), before the `result` event is yielded. Use it for output redaction, API-key leak detection, or final policy enforcement.
+```ts
+const agent = new AIAgent(provider, tools, {
+  // ...
+  llmPostHook: async (text, { messages }) => {
+    if (/sk-[a-zA-Z0-9]{20,}/.test(text)) {
+      return {
+        decision: "reject_content",
+        message: "[redacted: API key leak detected in assistant output]",
+      };
+    }
+    return { decision: "allow" };
+  },
+});
+```
+### Tool hooks — per-tool pre/post
+`ToolRegistry.use({ pre, post })` brackets every `dispatch()`. Pre can `allow` / `modify` / `block`; post sees the actual outcome via `status: "ok" | "blocked" | "error"` (since v0.1.14), so audit/telemetry hooks also observe denied and failed calls.
+```ts
+import { ToolRegistry, type PreToolUseDecision } from "maestro-agent-sdk";
+const tools = new ToolRegistry();
+// ... register builtin tools ...
+tools.use({
+  name: "fs-allowlist",
+  pre: ({ toolName, input }): PreToolUseDecision => {
+    if (toolName !== "Write" && toolName !== "Edit") return { decision: "allow" };
+    const path = String(input.file_path ?? "");
+    if (!path.startsWith("/workspace/")) {
+      return { decision: "block", error: `path '${path}' outside allowlist` };
+    }
+    return { decision: "allow" };
+  },
+  post: ({ toolName, status, error, output }) => {
+    metrics.increment(`tool.${toolName}.${status}`);
+    if (status === "error") logger.warn({ toolName, error }, "tool failed");
+    return {};  // pass output through unchanged
+  },
+});
+```
+### Guardrail decisions
+| Decision | Effect |
+|----------|--------|
+| `allow` | Proceed normally |
+| `reject_content` | Replace the message/result, continue execution |
+| `tripwire` | Abort the entire agent run immediately (LLM hooks only) |
+| `modify` | (Tool pre hooks only) Substitute the tool's `input` before dispatch |
+| `block` | (Tool pre hooks only) Skip tool execution, return the supplied error |
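The decision table above can be read as a small state machine, which this sketch models with local types. The `allow`/`block`/`reject_content`/`tripwire` shapes follow the examples in this README. The `modify` payload field name (`input`) and the function names are assumptions.

```typescript
// Hypothetical sketch of how hook decisions shape a run; not the SDK's source.
type ToolPreDecision =
  | { decision: "allow" }
  | { decision: "modify"; input: Record<string, unknown> } // field name is an assumption
  | { decision: "block"; error: string };

type ToolPreHook = (ctx: { toolName: string; input: Record<string, unknown> }) => ToolPreDecision;

type ToolOutcome =
  | { status: "ok"; output: string }
  | { status: "blocked"; error: string }
  | { status: "error"; error: string };

function dispatchWithHooks(
  toolName: string,
  input: Record<string, unknown>,
  preHooks: ToolPreHook[],
  run: (input: Record<string, unknown>) => string,
): ToolOutcome {
  let current = input;
  for (const hook of preHooks) {
    const d = hook({ toolName, input: current });
    if (d.decision === "block") return { status: "blocked", error: d.error }; // tool never runs
    if (d.decision === "modify") current = d.input; // later hooks and the tool see the new input
  }
  try {
    return { status: "ok", output: run(current) };
  } catch (err) {
    return { status: "error", error: err instanceof Error ? err.message : String(err) };
  }
}

type LlmDecision =
  | { decision: "allow" }
  | { decision: "reject_content"; message: string }
  | { decision: "tripwire"; message: string };

type LlmOutcome = { kind: "continue"; text: string } | { kind: "abort"; reason: string };

// reject_content swaps the text and continues; tripwire ends the whole run.
function applyLlmDecision(text: string, d: LlmDecision): LlmOutcome {
  if (d.decision === "tripwire") return { kind: "abort", reason: d.message };
  if (d.decision === "reject_content") return { kind: "continue", text: d.message };
  return { kind: "continue", text };
}
```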
+## MCP — zero-config client pool
+Wire an `McpResolver` and the SDK lazily spawns, caches, and reuses MCP subprocess clients across turns. Cache key includes `(userId, session, groupId, agentKind, server, specHash)` — two users never share a client, and same-server / same-spec calls within a session reuse the warm process.
+```ts
+import { setMcpResolver } from "maestro-agent-sdk";
+setMcpResolver((opts) => ({
+  playwright: {
+    command: "playwright-mcp",
+    args: ["--user-data-dir", `/tmp/pw-${opts.userId}`],
+  },
+  // SSE transport
+  search: { type: "sse", url: "https://internal.example.com/mcp" },
+}));
+```
+- **Lazy spawn** — servers start on first tool call, not at agent creation.
+- **Pool cache** — `(userId, session, groupId, agentKind, server, specHash)` keyed; idle TTL 5 min, LRU cap 16 (override via `MAESTRO_MCP_POOL_IDLE_TTL_MS` / `MAESTRO_MCP_POOL_MAX`).
+- **In-flight dedup** — concurrent acquires on the same key await one `start()` instead of double-spawning (v0.1.14).
+- **Env values in cache hash** — `{ TOKEN: alice }` and `{ TOKEN: bob }` get separate processes by default; opt high-churn keys out via `setMcpCacheIgnoreEnvKeys(["DEPTH"])` (v0.1.14).
+- **stdio + SSE** — both transports supported via `MaestroMcpServerSpec`.
+- **Graceful shutdown** — `SIGINT` / `SIGTERM` closes every cached client before exit.
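The lazy-spawn, in-flight dedup, idle-TTL and LRU bullets above combine into one small pattern: a map of ready clients plus a separate map of pending start promises. This is a sketch under those assumptions. `ClientPoolSketch`, `Closable`, and the `start` callback are hypothetical. The defaults mirror the documented 5-minute idle TTL and cap of 16.

```typescript
// Hypothetical sketch of a keyed client pool; not the SDK's source.
interface Closable {
  close(): Promise<void> | void;
}

function closeQuietly(c: Closable): Promise<void> {
  // Calling close() inside then() also catches a synchronous throw.
  return Promise.resolve().then(() => c.close()).catch(() => undefined);
}

interface PoolEntry<C> {
  client: C;
  lastUsed: number;
}

class ClientPoolSketch<C extends Closable> {
  private readonly ready = new Map<string, PoolEntry<C>>();
  private readonly inflight = new Map<string, Promise<C>>();
  private readonly idleTtlMs: number;
  private readonly maxSize: number;
  private readonly now: () => number;

  constructor(opts: { idleTtlMs?: number; maxSize?: number; now?: () => number } = {}) {
    this.idleTtlMs = opts.idleTtlMs ?? 5 * 60_000; // documented default: 5 min
    this.maxSize = opts.maxSize ?? 16;              // documented default: 16
    this.now = opts.now ?? Date.now;
  }

  get size(): number {
    return this.ready.size;
  }

  acquire(key: string, start: () => Promise<C>): Promise<C> {
    this.evictIdle();
    const hit = this.ready.get(key);
    if (hit) {
      hit.lastUsed = this.now();
      this.ready.delete(key); // re-insert so Map order = least recently used first
      this.ready.set(key, hit);
      return Promise.resolve(hit.client);
    }
    const pending = this.inflight.get(key);
    if (pending) return pending; // concurrent callers share one start()
    const p = start().then(
      (client) => {
        this.inflight.delete(key);
        this.insert(key, client);
        return client;
      },
      (err: unknown) => {
        this.inflight.delete(key);
        throw err;
      },
    );
    this.inflight.set(key, p);
    return p;
  }

  async closeAll(): Promise<void> {
    const clients = [...this.ready.values()].map((e) => e.client);
    this.ready.clear();
    await Promise.all(clients.map((c) => closeQuietly(c)));
  }

  private insert(key: string, client: C): void {
    this.ready.set(key, { client, lastUsed: this.now() });
    while (this.ready.size > this.maxSize) {
      const [oldestKey, oldest] = this.ready.entries().next().value as [string, PoolEntry<C>];
      this.ready.delete(oldestKey);
      void closeQuietly(oldest.client);
    }
  }

  private evictIdle(): void {
    const t = this.now();
    for (const [key, entry] of this.ready) {
      if (t - entry.lastUsed > this.idleTtlMs) {
        this.ready.delete(key);
        void closeQuietly(entry.client);
      }
    }
  }
}
```

The pending promise is stored before `start()` settles. As a result, a burst of tool calls arriving in the same tick still spawns only one process per key.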
 ## Development
 ```bash
 git clone git@github.com:maestrojeong/maestro-agent-sdk.git
 cd maestro-agent-sdk
-npm install
+bun install         # also supported
+npm install         # alternative
 npm run typecheck   # tsc --noEmit
 npm run build       # tsc + tsc-alias → dist/
-npm test            # vitest, 426 tests (+11 skipped without ripgrep)
+npm test            # vitest, 437 tests (+11 skipped without ripgrep)
 ```
-### Known gaps
-Two test files are currently excluded in `vitest.config.ts`:
-- `maestro-registry.test.ts`
-- `maestro-session-store.test.ts`
-They rely on host-side helpers (`appendConversationEvent`, `getConversationPath`) and on the strict workspace-root check that the SDK loosened. They'll come back online once we wire them through the `ConversationReader` DI hook.
 ## License
-[MIT](./LICENSE). Design influenced by [Claude Code](https://www.anthropic.com/claude-code) and Nous Research's [`hermes-agent`](https://github.com/NousResearch/hermes-agent) (also MIT); see [NOTICE](./NOTICE) for attribution details.
+[MIT](./LICENSE).

package/dist/core/agent.d.ts CHANGED

@@ -1,6 +1,6 @@
 import type { Provider } from "../providers/base.js";
 import type { ToolRegistry } from "../tools/registry.js";
-import type { EffortLevel } from "../types.js";
+import type { EffortLevel, LlmPostHook, LlmPreHook } from "../types.js";
 /**
  * AIAgent — minimal TS port of upstream `run_agent.py::AIAgent`.
  *
@@ -56,6 +56,10 @@ export interface AIAgentConfig {
      * only fires for subsequent tool_result turns.
      */
     buildIterReminder?: (iterationsRemaining: number) => string | null;
+    /** LLM Pre Hook — fires right before every provider API call. */
+    llmPreHook?: LlmPreHook;
+    /** LLM Post Hook — fires on turn-complete (no tool calls) before `result` event. */
+    llmPostHook?: LlmPostHook;
 }
 export declare class AIAgent {
     readonly provider: Provider;
@@ -65,6 +69,8 @@ export declare class AIAgent {
         effort?: EffortLevel;
         abortSignal?: AbortSignal;
         buildIterReminder?: (iterationsRemaining: number) => string | null;
+        llmPreHook?: LlmPreHook;
+        llmPostHook?: LlmPostHook;
     };
     constructor(provider: Provider, tools: ToolRegistry, config: AIAgentConfig);
 }

package/dist/core/agent.d.ts.map CHANGED

@@ -1 +1 @@
-{"version":3,"file":"agent.d.ts","sourceRoot":"","sources":["../../src/core/agent.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EAAE,QAAQ,EAAE,MAAM,kBAAkB,CAAC;AACjD,OAAO,KAAK,EAAE,YAAY,EAAE,MAAM,kBAAkB,CAAC;AACrD,OAAO,KAAK,EAAE,WAAW,EAAE,MAAM,SAAS,CAAC;AAE3C;;;;;;;;;GASG;AAEH,MAAM,WAAW,aAAa;IAC5B,oDAAoD;IACpD,KAAK,EAAE,MAAM,CAAC;IACd,sEAAsE;IACtE,YAAY,EAAE,MAAM,CAAC;IACrB,yEAAyE;IACzE,aAAa,CAAC,EAAE,MAAM,CAAC;IACvB,oDAAoD;IACpD,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB;;;;;;;;OAQG;IACH,cAAc,CAAC,EAAE,MAAM,CAAC;IACxB;;;;;OAKG;IACH,MAAM,CAAC,EAAE,WAAW,CAAC;IACrB,8EAA8E;IAC9E,WAAW,CAAC,EAAE,WAAW,CAAC;IAC1B;;;;;;;;;;;;;;;OAeG;IACH,iBAAiB,CAAC,EAAE,CAAC,mBAAmB,EAAE,MAAM,KAAK,MAAM,GAAG,IAAI,CAAC;CACpE;AAED,qBAAa,OAAO;IAClB,QAAQ,CAAC,QAAQ,EAAE,QAAQ,CAAC;IAC5B,QAAQ,CAAC,KAAK,EAAE,YAAY,CAAC;IAC7B,QAAQ,CAAC,MAAM,EAAE,QAAQ,CACvB,IAAI,CAAC,aAAa,EAAE,OAAO,GAAG,cAAc,GAAG,eAAe,GAAG,WAAW,CAAC,CAC9E,GAAG;QACF,cAAc,CAAC,EAAE,MAAM,CAAC;QACxB,MAAM,CAAC,EAAE,WAAW,CAAC;QACrB,WAAW,CAAC,EAAE,WAAW,CAAC;QAC1B,iBAAiB,CAAC,EAAE,CAAC,mBAAmB,EAAE,MAAM,KAAK,MAAM,GAAG,IAAI,CAAC;KACpE,CAAC;gBAEU,QAAQ,EAAE,QAAQ,EAAE,KAAK,EAAE,YAAY,EAAE,MAAM,EAAE,aAAa;CAgB3E"}
+{"version":3,"file":"agent.d.ts","sourceRoot":"","sources":["../../src/core/agent.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EAAE,QAAQ,EAAE,MAAM,kBAAkB,CAAC;AACjD,OAAO,KAAK,EAAE,YAAY,EAAE,MAAM,kBAAkB,CAAC;AACrD,OAAO,KAAK,EAAE,WAAW,EAAE,WAAW,EAAE,UAAU,EAAE,MAAM,SAAS,CAAC;AAEpE;;;;;;;;;GASG;AAEH,MAAM,WAAW,aAAa;IAC5B,oDAAoD;IACpD,KAAK,EAAE,MAAM,CAAC;IACd,sEAAsE;IACtE,YAAY,EAAE,MAAM,CAAC;IACrB,yEAAyE;IACzE,aAAa,CAAC,EAAE,MAAM,CAAC;IACvB,oDAAoD;IACpD,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB;;;;;;;;OAQG;IACH,cAAc,CAAC,EAAE,MAAM,CAAC;IACxB;;;;;OAKG;IACH,MAAM,CAAC,EAAE,WAAW,CAAC;IACrB,8EAA8E;IAC9E,WAAW,CAAC,EAAE,WAAW,CAAC;IAC1B;;;;;;;;;;;;;;;OAeG;IACH,iBAAiB,CAAC,EAAE,CAAC,mBAAmB,EAAE,MAAM,KAAK,MAAM,GAAG,IAAI,CAAC;IACnE,iEAAiE;IACjE,UAAU,CAAC,EAAE,UAAU,CAAC;IACxB,oFAAoF;IACpF,WAAW,CAAC,EAAE,WAAW,CAAC;CAC3B;AAED,qBAAa,OAAO;IAClB,QAAQ,CAAC,QAAQ,EAAE,QAAQ,CAAC;IAC5B,QAAQ,CAAC,KAAK,EAAE,YAAY,CAAC;IAC7B,QAAQ,CAAC,MAAM,EAAE,QAAQ,CACvB,IAAI,CAAC,aAAa,EAAE,OAAO,GAAG,cAAc,GAAG,eAAe,GAAG,WAAW,CAAC,CAC9E,GAAG;QACF,cAAc,CAAC,EAAE,MAAM,CAAC;QACxB,MAAM,CAAC,EAAE,WAAW,CAAC;QACrB,WAAW,CAAC,EAAE,WAAW,CAAC;QAC1B,iBAAiB,CAAC,EAAE,CAAC,mBAAmB,EAAE,MAAM,KAAK,MAAM,GAAG,IAAI,CAAC;QACnE,UAAU,CAAC,EAAE,UAAU,CAAC;QACxB,WAAW,CAAC,EAAE,WAAW,CAAC;KAC3B,CAAC;gBAEU,QAAQ,EAAE,QAAQ,EAAE,KAAK,EAAE,YAAY,EAAE,MAAM,EAAE,aAAa;CAkB3E"}

package/dist/core/agent.js CHANGED

@@ -16,6 +16,8 @@ export class AIAgent {
             ...(config.effort ? { effort: config.effort } : {}),
             ...(config.abortSignal ? { abortSignal: config.abortSignal } : {}),
             ...(config.buildIterReminder ? { buildIterReminder: config.buildIterReminder } : {}),
+            ...(config.llmPreHook ? { llmPreHook: config.llmPreHook } : {}),
+            ...(config.llmPostHook ? { llmPostHook: config.llmPostHook } : {}),
         };
     }
 }

package/dist/core/agent.js.map CHANGED

@@ -1 +1 @@
-{"version":3,"file":"agent.js","sourceRoot":"","sources":["../../src/core/agent.ts"],"names":[],"mappings":"AA8DA,MAAM,OAAO,OAAO;IACT,QAAQ,CAAW;IACnB,KAAK,CAAe;IACpB,MAAM,CAOb;IAEF,YAAY,QAAkB,EAAE,KAAmB,EAAE,MAAqB;QACxE,IAAI,CAAC,QAAQ,GAAG,QAAQ,CAAC;QACzB,IAAI,CAAC,KAAK,GAAG,KAAK,CAAC;QACnB,IAAI,CAAC,MAAM,GAAG;YACZ,KAAK,EAAE,MAAM,CAAC,KAAK;YACnB,YAAY,EAAE,MAAM,CAAC,YAAY;YACjC,aAAa,EAAE,MAAM,CAAC,aAAa,IAAI,EAAE;YACzC,SAAS,EAAE,MAAM,CAAC,SAAS,IAAI,IAAI;YACnC,GAAG,CAAC,MAAM,CAAC,cAAc,IAAI,MAAM,CAAC,cAAc,GAAG,CAAC;gBACpD,CAAC,CAAC,EAAE,cAAc,EAAE,MAAM,CAAC,cAAc,EAAE;gBAC3C,CAAC,CAAC,EAAE,CAAC;YACP,GAAG,CAAC,MAAM,CAAC,MAAM,CAAC,CAAC,CAAC,EAAE,MAAM,EAAE,MAAM,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC,EAAE,CAAC;YACnD,GAAG,CAAC,MAAM,CAAC,WAAW,CAAC,CAAC,CAAC,EAAE,WAAW,EAAE,MAAM,CAAC,WAAW,EAAE,CAAC,CAAC,CAAC,EAAE,CAAC;YAClE,GAAG,CAAC,MAAM,CAAC,iBAAiB,CAAC,CAAC,CAAC,EAAE,iBAAiB,EAAE,MAAM,CAAC,iBAAiB,EAAE,CAAC,CAAC,CAAC,EAAE,CAAC;SACrF,CAAC;IACJ,CAAC;CACF"}
+{"version":3,"file":"agent.js","sourceRoot":"","sources":["../../src/core/agent.ts"],"names":[],"mappings":"AAkEA,MAAM,OAAO,OAAO;IACT,QAAQ,CAAW;IACnB,KAAK,CAAe;IACpB,MAAM,CASb;IAEF,YAAY,QAAkB,EAAE,KAAmB,EAAE,MAAqB;QACxE,IAAI,CAAC,QAAQ,GAAG,QAAQ,CAAC;QACzB,IAAI,CAAC,KAAK,GAAG,KAAK,CAAC;QACnB,IAAI,CAAC,MAAM,GAAG;YACZ,KAAK,EAAE,MAAM,CAAC,KAAK;YACnB,YAAY,EAAE,MAAM,CAAC,YAAY;YACjC,aAAa,EAAE,MAAM,CAAC,aAAa,IAAI,EAAE;YACzC,SAAS,EAAE,MAAM,CAAC,SAAS,IAAI,IAAI;YACnC,GAAG,CAAC,MAAM,CAAC,cAAc,IAAI,MAAM,CAAC,cAAc,GAAG,CAAC;gBACpD,CAAC,CAAC,EAAE,cAAc,EAAE,MAAM,CAAC,cAAc,EAAE;gBAC3C,CAAC,CAAC,EAAE,CAAC;YACP,GAAG,CAAC,MAAM,CAAC,MAAM,CAAC,CAAC,CAAC,EAAE,MAAM,EAAE,MAAM,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC,EAAE,CAAC;YACnD,GAAG,CAAC,MAAM,CAAC,WAAW,CAAC,CAAC,CAAC,EAAE,WAAW,EAAE,MAAM,CAAC,WAAW,EAAE,CAAC,CAAC,CAAC,EAAE,CAAC;YAClE,GAAG,CAAC,MAAM,CAAC,iBAAiB,CAAC,CAAC,CAAC,EAAE,iBAAiB,EAAE,MAAM,CAAC,iBAAiB,EAAE,CAAC,CAAC,CAAC,EAAE,CAAC;YACpF,GAAG,CAAC,MAAM,CAAC,UAAU,CAAC,CAAC,CAAC,EAAE,UAAU,EAAE,MAAM,CAAC,UAAU,EAAE,CAAC,CAAC,CAAC,EAAE,CAAC;YAC/D,GAAG,CAAC,MAAM,CAAC,WAAW,CAAC,CAAC,CAAC,EAAE,WAAW,EAAE,MAAM,CAAC,WAAW,EAAE,CAAC,CAAC,CAAC,EAAE,CAAC;SACnE,CAAC;IACJ,CAAC;CACF"}

package/dist/core/loop.d.ts.map CHANGED

@@ -1 +1 @@
-{"version":3,"file":"loop.d.ts","sourceRoot":"","sources":["../../src/core/loop.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAK5C,OAAO,KAAK,EAAwB,eAAe,EAAoB,MAAM,kBAAkB,CAAC;AAChG,OAAO,KAAK,EAA2B,YAAY,EAAE,MAAM,SAAS,CAAC;AA0BrE;;;;;;;;;;;;;;;;;;;;;GAqBG;AACH,wBAAuB,eAAe,CACpC,KAAK,EAAE,OAAO,EACd,QAAQ,EAAE,eAAe,EAAE,GAC1B,cAAc,CAAC,YAAY,CAAC,CAgV9B"}
+{"version":3,"file":"loop.d.ts","sourceRoot":"","sources":["../../src/core/loop.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EAAE,OAAO,EAAE,MAAM,cAAc,CAAC;AAM5C,OAAO,KAAK,EAAwB,eAAe,EAAoB,MAAM,kBAAkB,CAAC;AAChG,OAAO,KAAK,EAAc,YAAY,EAAE,MAAM,SAAS,CAAC;AAwBxD;;;;;;;;;;;;;;;;;;;;;GAqBG;AACH,wBAAuB,eAAe,CACpC,KAAK,EAAE,OAAO,EACd,QAAQ,EAAE,eAAe,EAAE,GAC1B,cAAc,CAAC,YAAY,CAAC,CAkZ9B"}

package/dist/core/loop.js CHANGED

@@ -2,13 +2,12 @@ import { extractFileEvents } from "../media/file-events.js";
 import { compressIfNeeded } from "../memory/compressor.js";
 import { StreamingContextScrubber, scrubString } from "../memory/scrubber.js";
 import { logger } from "../platform/logger.js";
-const EFFORT_LEVELS = ["minimal", "low", "medium", "high", "xhigh", "max"];
-function nextEffortLevel(current) {
-    const idx = EFFORT_LEVELS.indexOf(current ?? "medium");
-    if (idx < 0 || idx >= EFFORT_LEVELS.length - 1)
-        return null;
-    return EFFORT_LEVELS[idx + 1];
-}
+import { thinkingBudgetForTurn } from "../providers/anthropic.js";
+// v0.1.16: removed `EFFORT_LEVELS` + `nextEffortLevel`. The previous
+// max_iterations result message recommended bumping effort to get more
+// turns, but the iteration cap is now decoupled from effort and
+// controlled directly by `AgentQueryOptions.maxIterations`. The message
+// surfaces that hint instead, so the helpers are no longer needed.
 /**
  * Cap applied to the `content` field of `tool_result` UnifiedEvents emitted to
  * the dispatcher. Matches the 200-char ceiling already enforced by
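The turn-adaptive thinking budget that `thinkingBudgetForTurn` computes can be illustrated with a small stand-alone sketch. The helper's body is not part of this diff, so `thinkingBudgetForTurnSketch` below is a hypothetical re-implementation built only from the behaviour described at its call site in `runConversation`: the full base budget on first and middle turns, one quarter of the base in the last three turns, a no-op for an undefined or zero base, and Anthropic's 1024-token minimum. Treating `iteration` as 0-based is an assumption.

```javascript
// Hypothetical re-implementation for illustration only; the real helper lives
// in ../providers/anthropic.js and its edge-case handling may differ.
const WRAP_UP_TURNS = 3; // "last 3 turns", per the comment in loop.js
const WRAP_UP_DIVISOR = 4; // wrap-up turns get 1/4 of the base budget
const ANTHROPIC_MIN_BUDGET = 1024; // Anthropic rejects thinking budgets below 1024

function thinkingBudgetForTurnSketch(baseBudget, iteration, maxIter) {
  // No base budget (undefined or 0) means thinking is off. Returning undefined
  // keeps the caller's `turnBudget ? { thinkingBudget } : {}` spread empty.
  if (!baseBudget) return undefined;
  // Assumes `iteration` is 0-based, so the final turn is maxIter - 1.
  const inWrapUp = iteration >= maxIter - WRAP_UP_TURNS;
  const budget = inWrapUp
    ? Math.floor(baseBudget / WRAP_UP_DIVISOR)
    : baseBudget;
  // Clamp to the provider minimum so a small base never produces an invalid request.
  return Math.max(ANTHROPIC_MIN_BUDGET, budget);
}
```

Under these assumptions, a base of 16000 with a 20-turn cap yields 16000 for turns 0 through 16 and 4000 for turns 17 through 19.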
@@ -87,16 +86,49 @@ export async function* runConversation(agent, messages) {
         // progressive typing UX. complete() stays as the fallback for providers
         // that haven't implemented stream() yet (e.g. an early Phase 5 OpenAI
         // adapter could ship stream() in a follow-up).
+        // v0.1.16: thinking budget is turn-adaptive. The base budget on
+        // `agent.config.thinkingBudget` reflects the caller's effort; for the
+        // wire call we resolve it through `thinkingBudgetForTurn` so the
+        // wrap-up zone (last 3 turns) trims down to 1/4 base. First + middle
+        // turns get the full base. See `thinkingBudgetForTurn` for the
+        // rationale; the helper handles the undefined / zero base no-op and
+        // the Anthropic >= 1024 minimum internally.
+        const turnBudget = thinkingBudgetForTurn(agent.config.thinkingBudget, iterations, maxIter);
         const callOpts = {
             model: agent.config.model,
             messages: wireMessages,
             system: agent.config.systemPrompt,
             tools: agent.tools.schemas(),
             maxTokens: agent.config.maxTokens,
-            ...(agent.config.thinkingBudget ? { thinkingBudget: agent.config.thinkingBudget } : {}),
+            ...(turnBudget ? { thinkingBudget: turnBudget } : {}),
             ...(agent.config.effort ? { effort: agent.config.effort } : {}),
             ...(agent.config.abortSignal ? { abortSignal: agent.config.abortSignal } : {}),
         };
+        // ─── LLM Pre Hook ───
+        // Host guardrail runs before the provider sees the request. tripwire
+        // aborts the entire run; reject_content injects a rejection message as
+        // a user turn and lets the model respond to it next iteration.
+        if (agent.config.llmPreHook) {
+            const preResult = await agent.config.llmPreHook(wireMessages, {
+                ...(agent.config.abortSignal ? { abortSignal: agent.config.abortSignal } : {}),
+            });
+            if (preResult.decision === "tripwire") {
+                yield {
+                    type: "error",
+                    content: preResult.message ?? "guardrail: pre-hook tripwire",
+                };
+                return;
+            }
+            if (preResult.decision === "reject_content" && preResult.message) {
+                messages.push({
+                    role: "user",
+                    content: [{ type: "text", text: preResult.message }],
+                });
+                yield { type: "user_message", content: preResult.message };
+                continue;
+            }
+            // allow — fall through to provider call
+        }
         let response;
         let assistantText = "";
         const toolUses = [];
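A host guardrail plugs in through `agent.config.llmPreHook`. The sketch below is a hypothetical hook, `secretScanPreHook`, that stops the run when the latest user message appears to contain an API key. Its shape, `(messages, { abortSignal })` returning `{ decision, message }`, is inferred from how the loop calls the hook and reads `decision`, not taken from the SDK's type definitions; the key pattern is likewise an illustrative assumption. Because the loop `await`s the hook's return value, a synchronous function works as well as an async one.

```javascript
// Hypothetical pre-hook; the (messages, { abortSignal }) -> { decision, message }
// shape is inferred from how loop.js invokes llmPreHook.
const SECRET_PATTERN = /sk-[A-Za-z0-9]{20,}/; // assumed key format, illustration only

function textOf(message) {
  // Content may be a plain string or an array of content blocks; collect text blocks.
  if (typeof message.content === "string") return message.content;
  return message.content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("\n");
}

function secretScanPreHook(messages, { abortSignal } = {}) {
  // Respect cancellation before doing any work.
  if (abortSignal?.aborted) {
    return { decision: "tripwire", message: "guardrail: run aborted before provider call" };
  }
  // Inspect only the most recent user message.
  const lastUser = [...messages].reverse().find((m) => m.role === "user");
  if (lastUser && SECRET_PATTERN.test(textOf(lastUser))) {
    return { decision: "tripwire", message: "guardrail: possible API key in prompt" };
  }
  return { decision: "allow" };
}
```

Tripwire fits this case better than `reject_content`: a rejection only appends a user turn and continues, so the message carrying the key would still be in the history sent on the next iteration.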
@@ -255,10 +287,33 @@ export async function* runConversation(agent, messages) {
         // history exactly once.
         messages.push({ role: "assistant", content: assistantBlocks });
         if (toolUses.length === 0) {
+            // ─── LLM Post Hook ───
+            // Host guardrail validates the final assistant text before the `result`
+            // event. tripwire replaces the result with an error; reject_content
+            // rewrites the content field so the caller surfaces the rejection message.
+            let resultContent = assistantText;
+            if (agent.config.llmPostHook) {
+                // Pass the current conversation. It already includes the assistant
+                // turn pushed above, which is now the last entry in `messages`.
+                const postResult = await agent.config.llmPostHook(assistantText, {
+                    messages,
+                    ...(agent.config.abortSignal ? { abortSignal: agent.config.abortSignal } : {}),
+                });
+                if (postResult.decision === "tripwire") {
+                    yield {
+                        type: "error",
+                        content: postResult.message ?? "guardrail: post-hook tripwire",
+                    };
+                    return;
+                }
+                if (postResult.decision === "reject_content") {
+                    resultContent = postResult.message ?? resultContent;
+                }
+            }
             // No more tools — turn complete.
             yield {
                 type: "result",
-                content: assistantText,
+                content: resultContent,
                 stopReason: response.stopReason,
                 usage: {
                     inputTokens: usageAcc.inputTokens,
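The post-hook receives the final assistant text, and a `reject_content` decision replaces the `result` event's `content` with the hook's `message`. The hypothetical `redactEmailsPostHook` below uses that to rewrite rather than block: it returns the redacted text as the message. The `(assistantText, { messages, abortSignal })` signature is inferred from the call in `runConversation`, and the email pattern is a simplified assumption.

```javascript
// Hypothetical post-hook; the (text, { messages, abortSignal }) -> { decision, message }
// shape is inferred from how loop.js invokes llmPostHook.
const EMAIL_PATTERN = /[\w.+-]+@[\w-]+(\.[\w-]+)+/g; // simplified, illustration only

function redactEmailsPostHook(assistantText, _ctx = {}) {
  const redacted = assistantText.replace(EMAIL_PATTERN, "[redacted email]");
  // Nothing matched: let the original text through unchanged.
  if (redacted === assistantText) return { decision: "allow" };
  // reject_content makes the loop emit `message` as the result content.
  return { decision: "reject_content", message: redacted };
}
```

Note that the assistant turn is pushed into `messages` before the hook runs, so this rewrites only the emitted `result`; the conversation history keeps the unredacted text.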
@@ -354,18 +409,25 @@ export async function* runConversation(agent, messages) {
         messages.push({ role: "user", content: toolResultBlocks });
         iterations++;
     }
-    const nextEffort = nextEffortLevel(agent.config.effort);
+    // v0.1.16: the iteration cap is no longer derived from effort, so the
+    // "raise effort to get more turns" affordance from earlier versions is
+    // misleading. We still surface `effort` in the message for context (the
+    // host may want to raise reasoning depth too), but the actionable hint
+    // is to raise `maxIterations` via AgentQueryOptions, which is the only
+    // knob that controls the cap now.
     yield {
         type: "result",
-        content: nextEffort
-            ? `Task didn't finish within the ${maxIter}-turn budget at effort='${agent.config.effort ?? "default"}'. Try increasing effort to '${nextEffort}' — run: set_effort ${nextEffort}`
-            : `Task didn't finish within the ${maxIter}-turn budget at effort='${agent.config.effort ?? "default"}' (already at max).`,
+        content: `Task didn't finish within the ${maxIter}-turn budget at effort='${agent.config.effort ?? "default"}'. Raise the cap via AgentQueryOptions.maxIterations (currently defaulting to ${maxIter}) — and/or increase effort for deeper per-turn reasoning.`,
         stopReason: "max_iterations",
         usage: {
             inputTokens: usageAcc.inputTokens,
             outputTokens: usageAcc.outputTokens,
-            ...(usageAcc.cacheCreationInputTokens ? { cacheCreationInputTokens: usageAcc.cacheCreationInputTokens } : {}),
-            ...(usageAcc.cacheReadInputTokens ? { cacheReadInputTokens: usageAcc.cacheReadInputTokens } : {}),
+            ...(usageAcc.cacheCreationInputTokens
+                ? { cacheCreationInputTokens: usageAcc.cacheCreationInputTokens }
+                : {}),
+            ...(usageAcc.cacheReadInputTokens
+                ? { cacheReadInputTokens: usageAcc.cacheReadInputTokens }
+                : {}),
         },
     };
 }
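The `usage` payload on the `max_iterations` result relies on conditional object spreads: spreading `{}` adds no keys, so a zero or missing cache counter is omitted entirely instead of appearing as `0` or `undefined`. The sketch below reproduces that pattern; `buildUsage` and `maxIterationsEvent` are hypothetical names, and the message text is abbreviated from the one in the diff.

```javascript
// Hypothetical helpers mirroring the shape of the v0.1.16 max_iterations event.
function buildUsage(usageAcc) {
  return {
    inputTokens: usageAcc.inputTokens,
    outputTokens: usageAcc.outputTokens,
    // A falsy counter spreads {} and leaves no key at all.
    ...(usageAcc.cacheCreationInputTokens
      ? { cacheCreationInputTokens: usageAcc.cacheCreationInputTokens }
      : {}),
    ...(usageAcc.cacheReadInputTokens
      ? { cacheReadInputTokens: usageAcc.cacheReadInputTokens }
      : {}),
  };
}

function maxIterationsEvent(maxIter, effort, usageAcc) {
  return {
    type: "result",
    // Abbreviated message: the actionable hint is now maxIterations, not effort.
    content: `Task didn't finish within the ${maxIter}-turn budget at effort='${effort ?? "default"}'. Raise the cap via AgentQueryOptions.maxIterations.`,
    stopReason: "max_iterations",
    usage: buildUsage(usageAcc),
  };
}
```

A host consuming the event stream can branch on `event.type === "result" && event.stopReason === "max_iterations"` to offer a retry with a higher cap.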