thoughtgear 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/PROMPT_HANDLER.md +689 -0
- package/README.md +207 -0
- package/dist/classes/PromptHandler.d.ts +263 -0
- package/dist/classes/PromptHandler.d.ts.map +1 -0
- package/dist/classes/PromptHandler.js +743 -0
- package/dist/classes/PromptHandler.js.map +1 -0
- package/dist/index.d.ts +3 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +2 -0
- package/dist/index.js.map +1 -0
- package/package.json +62 -0
@@ -0,0 +1,689 @@
# PromptHandler.ts — Deep Reference

A minimal, self-contained emulation of the openclaw agent loop (`src/agents/pi-embedded-runner`). The goal is to expose the **shape** of a real agent loop — stateless iterations, ORM-persisted transcript, pluggable Executor — in something you can read top-to-bottom in one sitting.

This document explains every type, class, method, and field in `PromptHandler.ts`, plus the reasoning behind each design choice.

---

## 1. Mental model

A user sends a prompt. The handler does **not** answer in one shot — it runs a **loop**:

```
user message
     │
     ▼
[iteration 0] build prompt ──▶ call model ──▶ stream blocks
                                                    │
                                          ┌─────────┴─────────┐
                                      text only           tool calls
                                          │                    │
                                         DONE           execute tools
                                                               │
                                                               ▼
                              [iteration 1] build prompt
                                 (now includes tool results)
                                 ──▶ call model ──▶ ...
```

Each iteration is **stateless with respect to in-process memory**: it loads everything it needs from the ORM (history + run state). That property is what makes the loop runnable in two very different environments:

- **Locally**: one Node process, awaits each iteration in sequence.
- **AWS Lambda**: each iteration is a separate Lambda invocation. After saving state to the ORM, the current invocation calls `lambda.invoke(...)` on itself and returns. The next invocation picks up the `runId`, rebuilds context from the ORM, and continues.

The same `continueRun(runId)` method serves both modes. Only the **Executor** differs.

This is exactly the pattern openclaw uses in `pi-embedded-runner`: an outer retry/failover loop wrapping a stateless `runEmbeddedAttempt` that rebuilds the payload from history every iteration.

---

## 2. File layout

The file is organized top-down so you can read it like a story:

1. **Types** — the data flowing through the system.
2. **ORM** — persistence surface (Mongo + SQL behind one façade).
3. **LLM provider** — abstraction over Anthropic / OpenAI / Google / mock.
4. **Executor** — local vs Lambda iteration scheduling.
5. **PromptHandler** — the actual loop.
6. **Lambda entry point** — illustrative routing.

Each section depends only on what comes before it, so there are no forward references.

---

## 3. Types

### `Tool`

```ts
type Tool = { key: string; description: string; content: string };
```

Mirrors an openclaw **skill**: a chunk of instructions the model can pull into context.

- `key` — unique identifier, used as the function name when the model emits a `tool_call`.
- `description` — short text the model sees in the system prompt to decide whether to invoke it.
- `content` — the skill body. When the tool is invoked, this string becomes the `tool_result`. For function-style tools, `executeTool` is where you'd swap this for real dispatch (HTTP call, DB query, etc.).

The reason `content` is a `string` rather than a callback is that this scaffold treats tools as **skill definitions** — the model gets a body of instructions, not a typed function. Function-style tools (`(args) => result`) are easy to add later by extending the type.

### `ModelConfig`

```ts
type ModelConfig = { name; provider: 'anthropic' | 'openai' | 'google' | 'mock'; apiKey; baseUrl? };
```

Everything the LLM layer needs to authenticate and select a backend. `provider` is the discriminant the factory uses to pick a concrete `LLMProvider`. `baseUrl` is optional because most users hit the vendor's default endpoint; it exists for proxies, self-hosted gateways, and local mock servers.

### `FileAttachment`

```ts
type FileAttachment = { name; mimeType; data: string /* base64 */ };
```

A single attachment carried inline. Base64 is the lowest-common-denominator transport — works across HTTP, Lambda payloads, and DB rows without binary handling. For large files you'd swap this for an S3 URL/key, but the type stays the same shape.

### `ContentBlock`

A **discriminated union** of every kind of payload that can appear inside a message:

```ts
type ContentBlock =
  | { type: 'text'; text }
  | { type: 'reasoning'; text }                            // extended thinking
  | { type: 'tool_call'; id; name; input }                 // model asks to call a tool
  | { type: 'tool_result'; toolCallId; output; isError? }  // our reply to a tool_call
  | { type: 'file'; file: FileAttachment };
```

Discriminated unions are deliberate: every consumer narrows on `block.type`, so adding a new block type forces a compile error everywhere it's missed. This is the same shape Anthropic's and OpenAI's SDKs use after normalization, which is why the LLM providers return `ContentBlock[]` directly rather than provider-native types.

`tool_call.input` is `unknown` because the model can emit arbitrary JSON; validation happens at the tool boundary (in `executeTool`), not at the type level.

### `Role` + `Message`

```ts
type Role = 'system' | 'user' | 'assistant' | 'tool';

type Message = {
  id; runId; role; blocks: ContentBlock[]; createdAt; iteration: number;
};
```

A `Message` is the atomic unit persisted in the ORM. Why these fields:

- `id` — stable identifier (UUID). Lets you de-duplicate or look up a single message without composite keys.
- `runId` — groups messages belonging to one logical "conversation turn." Indexed in the ORM for fast history retrieval.
- `role` — standard chat-completions roles. `'tool'` is its own role (rather than a sub-type of `assistant`/`user`) because most providers want tool results in a dedicated slot.
- `blocks` — array, not a single string, because a single message can contain text + tool calls + reasoning simultaneously.
- `createdAt` — for ordering and TTL/cleanup.
- `iteration` — which loop iteration produced this message. Useful for debugging ("why did the model loop 7 times?") and for compaction heuristics.

### `RunStatus` + `RunState`

```ts
type RunStatus = 'pending' | 'streaming' | 'awaiting_tools' | 'compacting' | 'done' | 'failed';

type RunState = { runId; status; iteration; lastStopReason?; lastError?; createdAt; updatedAt };
```

`RunState` is the **only** mutable record per run. Each loop iteration reads it, mutates it, and writes it back. This is the file's central concurrency contract: you can resume a run from any process at any time using just `runId + ORM`.

The status values map to phases inside `continueRun`:

- `pending` — created or just finished a tool round; ready for the next model call.
- `streaming` — currently inside a model stream.
- `awaiting_tools` — model emitted tool_use; we're running tools.
- `compacting` — history was over the budget; summarizing.
- `done` — `stopReason === 'end_turn'`. Terminal.
- `failed` — unrecoverable error. Terminal.

A separate `iteration` counter exists because `status` alone can't gate `maxIterations` (you'd lose count across crashes).

`lastStopReason` and `lastError` are diagnostic. They're optional because they only have meaningful values in some states.

### `StreamCallbacks`

```ts
type StreamCallbacks = {
  onPartialReply?, onReasoningStream?, onBlockReply?, onToolStart?, onToolResult?, onDone?
};
```

The set of hooks the LLM layer and the loop fire as the run progresses. These are exactly the callbacks openclaw exposes upstream (`onPartialReply`, `onReasoningStream`, `onBlockReply`, `onToolStart`, `onToolResult`) and what channels like the Telegram plugin subscribe to in order to stream tokens back to users.

All callbacks are optional. If a consumer doesn't care about reasoning deltas, it just doesn't pass `onReasoningStream` — the field is `?`, so there's nothing to call.
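
The abbreviated type above elides the signatures. One plausible fully-typed expansion — the exact parameter shapes are an assumption made for illustration, chosen to match the `(chunk, runId)` ordering used in the end-to-end example later — looks like this:

```ts
// Hypothetical expansion of the abbreviated StreamCallbacks type (assumed, not authoritative).
type StreamCallbacks = {
  onPartialReply?: (textDelta: string, runId: string) => void;          // streamed text chunks
  onReasoningStream?: (reasoningDelta: string, runId: string) => void;  // extended-thinking chunks
  onBlockReply?: (block: ContentBlock, runId: string) => void;          // a complete block finished
  onToolStart?: (call: { id: string; name: string; input: unknown }, runId: string) => void;
  onToolResult?: (result: { toolCallId: string; output: string; isError?: boolean }, runId: string) => void;
  onDone?: (runId: string) => void;                                     // run reached a terminal state
};
```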

---

## 4. ORM layer

### `DbConfig`

```ts
type DbConfig =
  | { type: 'mongodb'; uri; database }
  | { type: 'sql'; dialect: 'postgres' | 'mysql' | 'sqlite'; uri };
```

Another discriminated union. The `type` tag selects which adapter the `ORM` façade instantiates. SQL has a sub-discriminant (`dialect`) because driver and quoting differ across postgres/mysql/sqlite, even though the schema is identical.

### `OrmAdapter` interface

```ts
interface OrmAdapter {
  saveMessage; getHistory;
  saveRunState; getRunState;
  cacheGet; cacheSet;
  saveMemory; getMemory;
}
```

Defines the **contract** that any backend must satisfy. Why these eight methods specifically:

- `saveMessage` / `getHistory` — write-once append log + read-many for one `runId`. The minimum required to rebuild a transcript each iteration.
- `saveRunState` / `getRunState` — upsert by `runId`. The mutable cursor that drives the loop.
- `cacheGet` / `cacheSet` — opaque KV with TTL. Reserved for prompt caching, model-call memoization, and any other transient deduplication. Real deployments would back this with Redis, but the interface lets a single Mongo collection do it too.
- `saveMemory` / `getMemory` — scoped, long-lived KV. The `scope` parameter (e.g. `"user:123"`, `"channel:xyz"`) namespaces values so you can store per-user facts without colliding. Mirrors openclaw's auto-memory system.

Eight methods is the **smallest** set that supports the rest of the file. Anything else (analytics, audit, vector search) belongs in a separate concern, not bolted on here.
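
For readers who want the elided signatures spelled out, here is one plausible expansion — the parameter and return types are assumptions for illustration, not copied from the package:

```ts
// Hypothetical signatures for the abbreviated interface above (assumed, not authoritative).
interface OrmAdapter {
  saveMessage(msg: Message): Promise<void>;
  getHistory(runId: string): Promise<Message[]>;            // ordered by createdAt
  saveRunState(state: RunState): Promise<void>;             // upsert by runId
  getRunState(runId: string): Promise<RunState | null>;
  cacheGet(key: string): Promise<string | null>;
  cacheSet(key: string, value: string, ttlSeconds?: number): Promise<void>;
  saveMemory(scope: string, key: string, value: string): Promise<void>;
  getMemory(scope: string, key: string): Promise<string | null>;
}
```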

### `ORM` class (façade)

```ts
class ORM implements OrmAdapter {
  private adapter: OrmAdapter;
  constructor(public config: DbConfig) {
    this.adapter = config.type === 'mongodb'
      ? new MongoOrmAdapter(config)
      : new SqlOrmAdapter(config);
  }
  saveMessage(msg) { return this.adapter.saveMessage(msg); }
  // ...delegating each method through
}
```

This is the **Strategy pattern**. The façade picks an adapter once in the constructor, then forwards every call. Consumers depend on `ORM`, not on the concrete adapter — so swapping Mongo for SQL doesn't touch the `PromptHandler`.

Why a class delegating to an adapter, rather than just exporting the adapter directly? Two reasons:

1. **Stable import surface.** Callers always `new ORM(...)`. The internal split is invisible.
2. **Place for cross-cutting concerns.** If you later add tracing, retries, or metric hooks, they live on `ORM` and apply to both backends for free.

`config` is `public readonly` so consumers can inspect it (useful for logging the database target at startup) without being able to mutate it.

### `MongoOrmAdapter` and `SqlOrmAdapter`

Both implement `OrmAdapter` with **no-op stubs**. They exist to lock in the surface and let you compile and run the loop today with the `MockProvider`, then wire up real drivers (`mongodb`, `kysely`, `pg`, `better-sqlite3`) inside each method without touching the rest of the file.

The collection / table layout each is expected to use:

| Concept                 | Mongo collection | SQL table    |
| ----------------------- | ---------------- | ------------ |
| `Message`               | `messages`       | `messages`   |
| `RunState`              | `run_states`     | `run_states` |
| `cacheGet/Set`          | `cache`          | `cache`      |
| `saveMemory/getMemory`  | `memory`         | `memory`     |

Indexes that matter for performance:

- `messages.runId` + `messages.createdAt` — `getHistory` scans by runId, ordered.
- `run_states.runId` — primary key / unique index.
- `cache.key` — primary key; add a TTL index on `expiresAt` for Mongo, a cron sweep for SQL.
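
As a concrete illustration of those indexes, a Mongo-backed adapter might create them once at startup. This is a sketch using the standard `mongodb` driver; the collection names follow the table above, and the `expiresAt` field name is an assumption:

```ts
import { MongoClient } from 'mongodb';

// Sketch: create the indexes listed above on a MongoDB backend (illustrative).
async function ensureIndexes(uri: string, database: string): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db(database);

  await db.collection('messages').createIndex({ runId: 1, createdAt: 1 });            // getHistory scan
  await db.collection('run_states').createIndex({ runId: 1 }, { unique: true });
  await db.collection('cache').createIndex({ key: 1 }, { unique: true });
  await db.collection('cache').createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 }); // TTL sweep
  await db.collection('memory').createIndex({ scope: 1, key: 1 }, { unique: true });

  await client.close();
}
```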

---

## 5. LLM provider layer

### `StreamResult`

```ts
type StreamResult = {
  blocks: ContentBlock[];
  stopReason: 'end_turn' | 'tool_use' | 'max_tokens' | 'error';
};
```

The provider-normalized return value of one model call. The loop only ever inspects these two fields:

- `blocks` — every piece of output the model produced (text, reasoning, tool calls). Persisted verbatim as the assistant message.
- `stopReason` — drives the loop's branching. `tool_use` → run tools and iterate. `end_turn` → finalize. Other values fail or extend depending on policy.

This is the same normalization openclaw does in `stream-resolution.ts` + per-provider wrappers (`openai-stream-wrappers.ts`, etc.).

### `LLMProvider` interface

```ts
interface LLMProvider {
  stream(args: { system; messages; tools; callbacks?; runId }): Promise<StreamResult>;
}
```

A single method. Inputs are everything the model needs to produce a turn; output is the normalized `StreamResult`. The method returns `Promise<StreamResult>`, not an `AsyncIterable`, because **streaming behavior is communicated through callbacks** (`onPartialReply` etc.), and the final accumulated blocks are returned at the end. This shape is easier to compose and to test than an async iterator.

### `createLLMProvider` factory

```ts
function createLLMProvider(model: ModelConfig): LLMProvider {
  switch (model.provider) {
    case 'anthropic': return new AnthropicProvider(model);
    case 'openai':    return new OpenAIProvider(model);
    case 'google':    return new GoogleProvider(model);
    case 'mock':      return new MockProvider();
  }
}
```

A `switch` on the discriminant. TypeScript exhaustiveness-checks this — adding a new provider to `ModelConfig['provider']` without handling it here is a compile error, because under strict settings the function could then fall through without returning an `LLMProvider`.
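
If you prefer the exhaustiveness check to be explicit rather than relying on the missing-return error, the usual pattern is a `never`-typed default. This is a generic TypeScript idiom, not something the package itself ships:

```ts
// Generic exhaustiveness idiom (illustrative): the default branch only
// type-checks while every member of ModelConfig['provider'] is handled above.
function assertNever(x: never): never {
  throw new Error(`Unhandled provider: ${String(x)}`);
}

function createLLMProviderChecked(model: ModelConfig): LLMProvider {
  const provider = model.provider; // local binding so the switch narrows it
  switch (provider) {
    case 'anthropic': return new AnthropicProvider(model);
    case 'openai':    return new OpenAIProvider(model);
    case 'google':    return new GoogleProvider(model);
    case 'mock':      return new MockProvider();
    default:          return assertNever(provider);
  }
}
```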

### `AnthropicProvider`, `OpenAIProvider`, and `GoogleProvider`

Stubs. The body comment in each describes the integration:

- **Anthropic**: `@anthropic-ai/sdk` → `messages.stream(...)`. Map `content_block_delta` events to `onPartialReply` / `onReasoningStream`. Collect `tool_use` blocks into `ContentBlock` of type `'tool_call'`. Finalize on `message_stop`.
- **OpenAI**: equivalent shape with the `responses` API.
- **Google (Gemini)**: `@google/genai` → `ai.models.generateContentStream({...})`. The provider key is `'google'` to match openclaw's naming (`google-stream-wrappers.ts`, `google-prompt-cache.ts`); the underlying models are the Gemini family (`gemini-2.5-pro`, `gemini-2.5-flash`, etc.). A few Gemini-specific normalizations to keep in mind:

| Gemini concept                                                   | Normalized to                                      |
| ---------------------------------------------------------------- | -------------------------------------------------- |
| `systemInstruction` (separate from messages)                     | `args.system` — pass through, don't inline         |
| `parts[].text`                                                   | `ContentBlock('text')` + `onPartialReply`          |
| `parts[].thought`                                                | `ContentBlock('reasoning')` + `onReasoningStream`  |
| `parts[].functionCall { name, args }`                            | `ContentBlock('tool_call')`                        |
| Tool results → `parts[].functionResponse` under `role: 'user'`   | reverse mapping when building `contents`           |
| `finishReason: 'STOP' \| 'MAX_TOKENS' \| 'SAFETY' \| 'TOOL_USE'` | `StreamResult.stopReason`                          |
| `cachedContent` (implicit + explicit caching)                    | use ORM `cacheGet/Set` to track cache IDs          |

All three return an empty text block today so the file compiles end-to-end.

### `MockProvider`

```ts
class MockProvider implements LLMProvider {
  async stream(args) {
    const turn = args.messages.filter((m) => m.role === 'assistant').length;
    if (turn === 0 && args.tools[0]) {
      // emit a tool call on turn 0
    }
    // text-only on subsequent turns
  }
}
```

A **deterministic fake** for end-to-end testing without API keys. The rule "emit one tool call on iteration 0, then text on iteration 1+" is the simplest behavior that exercises every branch of the loop:

1. Iteration 0 → `stopReason: 'tool_use'` → loop runs the tool → schedules iteration 1.
2. Iteration 1 → `stopReason: 'end_turn'` → loop finalizes.

That's the complete state machine in two turns. If you can run `MockProvider` against your ORM and see two messages persisted, the rest of the system is working.

The mock also calls `onPartialReply` / `onBlockReply` so you can verify your callback wiring without a real provider.
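
Filled in, the two-turn behavior might look like the sketch below. It illustrates the rule described above under the assumed callback signatures from section 3; it is not the package's actual implementation, and the reply text follows the "Done." output listed in the end-to-end example.

```ts
// Sketch of the two-turn mock described above (illustrative, not the shipped code).
class SketchMockProvider implements LLMProvider {
  async stream(args: {
    system: string; messages: Message[]; tools: Tool[];
    callbacks?: StreamCallbacks; runId: string;
  }): Promise<StreamResult> {
    const turn = args.messages.filter((m) => m.role === 'assistant').length;

    if (turn === 0 && args.tools[0]) {
      // Turn 0: ask for the first tool, so the loop exercises its tool branch.
      const call = { type: 'tool_call' as const, id: 'mock-call-1', name: args.tools[0].key, input: {} };
      args.callbacks?.onBlockReply?.(call, args.runId);
      return { blocks: [call], stopReason: 'tool_use' };
    }

    // Turn 1+: plain text, which ends the run.
    const text = { type: 'text' as const, text: 'Done.' };
    args.callbacks?.onPartialReply?.(text.text, args.runId);
    args.callbacks?.onBlockReply?.(text, args.runId);
    return { blocks: [text], stopReason: 'end_turn' };
  }
}
```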

---

## 6. Executor layer

This is the piece that **abstracts the runtime**. Local vs Lambda is the whole reason the file is split this way.

### `Executor` interface

```ts
interface Executor {
  scheduleNextIteration(runId: string): Promise<void>;
}
```

Exactly one method. "When the loop has more work to do, here's how you continue it." The implementation decides whether "continue" means "call the next function" or "fire a new Lambda invocation."

### `LocalExecutor`

```ts
class LocalExecutor implements Executor {
  private handler!: PromptHandler;
  bind(handler) { this.handler = handler; }
  async scheduleNextIteration(runId) {
    await new Promise((r) => setImmediate(r));
    await this.handler.continueRun(runId);
  }
}
```

A few subtle things here:

- **`bind(handler)` instead of constructor injection.** There's a chicken-and-egg problem: `PromptHandler` wants to receive an `Executor`, and `LocalExecutor` wants a reference to the `PromptHandler` to call `continueRun`. The handler's constructor calls `executor.bind(this)` to close the loop. This avoids requiring callers to write `new LocalExecutor()` and then manually wire it.
- **`!` non-null assertion on `handler`.** A deliberate decision: `bind` is always called by the `PromptHandler` constructor before `scheduleNextIteration` is ever invoked, so the field is logically initialized. Marking it `!` keeps the type clean without optional-chaining noise everywhere.
- **`setImmediate` yield.** Yields to the event loop before recursing. Without this, a chain of synchronous-style iterations could starve the loop and trip Node's max recursion in pathological cases. Also gives `onDone` / `onToolResult` callbacks a chance to drain.

### `LambdaExecutor`

```ts
type LambdaInvoker = (payload: { runId; action: 'continue' }) => Promise<void>;

class LambdaExecutor implements Executor {
  constructor(private invoke: LambdaInvoker) {}
  async scheduleNextIteration(runId) {
    await this.invoke({ runId, action: 'continue' });
  }
}
```

The invoke callback is **injected**, not imported. That's deliberate: this file doesn't depend on `@aws-sdk/client-lambda`. You build the invoker yourself:

```ts
import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';

const client = new LambdaClient({});
const invoke: LambdaInvoker = async (payload) => {
  await client.send(new InvokeCommand({
    FunctionName: 'my-thoughtgear',
    InvocationType: 'Event', // async — fire and forget
    Payload: Buffer.from(JSON.stringify(payload)),
  }));
};
```

`InvocationType: 'Event'` is the important detail. It tells AWS not to wait for the new invocation to finish — the current invocation returns immediately, releasing its Lambda runtime container. Total billable time stays bounded per iteration regardless of how long the full run takes.

The reason iterating through Lambda invocations is interesting at all: a single Lambda has a hard 15-minute cap. A 10-step agent run with slow tool calls could exceed that. Splitting iterations across invocations turns one long run into many short ones.

---

## 7. PromptHandler — the loop

This is where everything plugs in.

### `PromptHandlerOptions`

```ts
type PromptHandlerOptions = {
  context: string;
  tools: Tool[];
  model: ModelConfig;
  orm: ORM;
  executor?: Executor;
  callbacks?: StreamCallbacks;
  maxIterations?: number;
  compactionCharThreshold?: number;
};
```

A single options object instead of a long positional argument list. Four are optional:

- `executor` — defaults to `LocalExecutor`. Pass `LambdaExecutor` for serverless.
- `callbacks` — defaults to `{}`. No-op for headless runs.
- `maxIterations` — defaults to `16`. Mirrors openclaw's retry-limit; prevents infinite tool loops.
- `compactionCharThreshold` — defaults to `80_000`. Past this, history gets summarized.

### Constructor

```ts
constructor(opts) {
  // ...assign fields
  this.llm = createLLMProvider(opts.model);
  const executor = opts.executor ?? new LocalExecutor();
  if (executor instanceof LocalExecutor) executor.bind(this);
  this.executor = executor;
}
```

Key moves:

1. **Eagerly build the LLM provider** so model errors surface at construction, not on first prompt.
2. **Default executor is `LocalExecutor`** — works out of the box without thinking about Lambda.
3. **`instanceof LocalExecutor` check** for the `bind` call. `LambdaExecutor` doesn't need a back-reference because it doesn't recurse into the handler.

### `handlePrompt({ text, files? })`

Entry point. Four steps:

1. **Generate `runId`** via `randomUUID()`. UUIDv4 from Node's `crypto` module — no external dep.
2. **Persist the user message** with `iteration: 0`. The blocks array is built from the text plus any file attachments.
3. **Persist the initial `RunState`** with `status: 'pending'`, `iteration: 0`.
4. **Schedule the first iteration** via the Executor. Returns `{ runId }` immediately — the loop runs out-of-band.

The return value is `{ runId }`, not the final reply, because:

- In Lambda mode, the reply doesn't exist yet — it'll be produced across future invocations.
- In local mode, the caller can subscribe to `callbacks.onDone(runId)` for completion, or poll `orm.getHistory(runId)` / `orm.getRunState(runId)`.

This asymmetric API (fire-and-track-by-id) is the **only** API shape that works in both runtimes.
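
Put together, the body is roughly the following — a sketch of the four steps above in the document's abbreviated style, assuming `randomUUID` from `node:crypto` and the types from section 3; it is illustrative, not the package's verbatim source.

```ts
// Sketch of handlePrompt's four steps (illustrative).
async handlePrompt({ text, files }: { text: string; files?: FileAttachment[] }) {
  const runId = randomUUID();                                          // 1. new run id

  const blocks: ContentBlock[] = [{ type: 'text', text }];             // 2. user message blocks
  for (const file of files ?? []) blocks.push({ type: 'file', file });
  await this.orm.saveMessage({ id: randomUUID(), runId, role: 'user', blocks, createdAt: new Date(), iteration: 0 });

  await this.orm.saveRunState({                                        // 3. initial run state
    runId, status: 'pending', iteration: 0, createdAt: new Date(), updatedAt: new Date(),
  });

  await this.executor.scheduleNextIteration(runId);                    // 4. first iteration, out-of-band
  return { runId };
}
```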

### `continueRun(runId)`

The body of the loop. Read it as a sequence of well-defined phases — same phases as `pi-embedded-runner/run/attempt.ts`:

```
1. Load state. Bail if terminal. Fail if over maxIterations.
2. Load history. Compact if oversized.
3. Build system prompt.
4. Mark status = 'streaming'.
5. Call llm.stream(...). Catch and fail on error.
6. Persist assistant message (whatever blocks came back).
7. If stopReason === 'tool_use':
   a. Mark status = 'awaiting_tools'.
   b. For each tool_call: execute, fire callbacks, persist tool_result message.
   c. Bump iteration, mark status = 'pending'.
   d. Schedule next iteration via executor.
   e. Return.
   Else:
   Mark status = 'done', fire onDone.
```

Why each phase exists:

- **Terminal-state check.** Prevents zombie iterations after a run was already cleaned up. Lambda re-invocations are idempotent against terminal states.
- **`maxIterations` guard.** Stops runaway tool loops (model keeps invoking the same tool). Hard-fails the run with a clear reason instead of looping forever.
- **Compaction before prompt build.** If history is over budget, the system prompt + history would blow past the model's context window. Compaction trims first.
- **`updateState({ status: 'streaming' })` before the model call.** If the process dies mid-stream, status alone tells you "this run was interrupted during a model call." Good for ops dashboards.
- **Single try/catch around `llm.stream`.** All model errors funnel into `fail(state, message)`. There's no partial recovery — that lives in the openclaw failover system, intentionally out of scope here.
- **Persist assistant message regardless of stop reason.** Even on `tool_use`, the assistant's reasoning and the tool calls themselves go into history. Otherwise the next iteration can't link tool results to their calls.
- **Sequential tool execution.** Tool calls run one at a time in iteration order. Parallel execution is easy to add (`Promise.all`) but means non-deterministic ordering and harder debugging.
- **Bump iteration + reset to `pending` before scheduling.** The next iteration sees a clean state. `awaiting_tools` is only the **transient** state during tool execution.
- **`return` immediately after `scheduleNextIteration`.** Critical for Lambda: the current invocation must end so a new one can take over. In local mode, the `LocalExecutor` resolves the `await` chain in the same process. (A condensed sketch of the `tool_use` branch follows this list.)
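
As a condensed sketch, phases 6–7 might read like this; it leans on the abbreviated types from section 3 and the assumed callback signatures, and is illustrative rather than the package's verbatim source.

```ts
// Sketch of phases 6–7 of continueRun (illustrative).
const result = await this.llm.stream({ system, messages: history, tools: this.tools, callbacks: this.callbacks, runId });

await this.orm.saveMessage({ id: randomUUID(), runId, role: 'assistant', blocks: result.blocks, createdAt: new Date(), iteration: state.iteration });

if (result.stopReason === 'tool_use') {
  await this.updateState(state, { status: 'awaiting_tools' });

  const calls = result.blocks.filter(
    (b): b is Extract<ContentBlock, { type: 'tool_call' }> => b.type === 'tool_call',
  );
  for (const call of calls) {
    this.callbacks.onToolStart?.(call, runId);
    const toolResult = await this.executeTool(call);       // tool_result block
    this.callbacks.onToolResult?.(toolResult, runId);
    await this.orm.saveMessage({ id: randomUUID(), runId, role: 'tool', blocks: [toolResult], createdAt: new Date(), iteration: state.iteration });
  }

  await this.updateState(state, { status: 'pending', iteration: state.iteration + 1 });
  await this.executor.scheduleNextIteration(runId);
  return;
}

await this.updateState(state, { status: 'done', lastStopReason: result.stopReason });
this.callbacks.onDone?.(runId);
```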

### `buildSystemPrompt()`

```ts
private buildSystemPrompt(): string {
  const skills = this.tools
    .map((t) => `## ${t.key}\n${t.description}\n\n${t.content}`)
    .join('\n\n---\n\n');
  return `${this.context}\n\n# Available Tools / Skills\n\n${skills}`;
}
```

Concatenates `context` + a "Tools" section. The format (`## key`, `---` separators) is conventional markdown that LLMs are well-trained on.

Why **not** pass tools as a structured `tools` parameter to the model API? You can, and that's how function-calling works at the API level. This file uses a hybrid: tools are advertised in the system prompt **and** passed to `llm.stream(...)` so the provider sees the schemas. The system-prompt copy is what lets skill-style tools work where the model just consumes the content rather than calling a function.

The order — context first, tools second — is deliberate for **prompt caching**. The context block is fixed across iterations (heartbeat, etc.), so providers can cache the prefix. Tools come second so changes to the tool list don't invalidate the context cache. This is the same ordering rule openclaw enforces in `prompt-cache-retention.ts`.

### `executeTool(call)`

```ts
private async executeTool(call) {
  const tool = this.tools.find((t) => t.key === call.name);
  if (!tool) return { type: 'tool_result', toolCallId: call.id, output: 'Error: ...', isError: true };
  return { type: 'tool_result', toolCallId: call.id, output: tool.content };
}
```

Two paths:

- **Unknown tool** → error result. The model occasionally hallucinates tool names; returning a clear error lets the next iteration self-correct.
- **Known tool** → return its `content` as the result.

The "return content as result" behavior is the **skill** interpretation of tools: the result is a body of instructions the next iteration should follow. For **function-style** tools, replace this line with a dispatch table:

```ts
const result = await toolHandlers[call.name]?.(call.input);
return { type: 'tool_result', toolCallId: call.id, output: JSON.stringify(result) };
```

The `isError` flag on `tool_result` exists because Anthropic's API supports it as a hint to the model. OpenAI silently ignores it.

### `maybeCompact(history, state)`

```ts
private async maybeCompact(history, state) {
  const size = history.reduce((n, m) => n + JSON.stringify(m.blocks).length, 0);
  if (size <= this.compactionCharThreshold) return history;

  await this.updateState(state, { status: 'compacting' });
  const keep = history.slice(-4);
  const dropped = history.slice(0, -4);
  const summary = { /* synthetic system message describing what was dropped */ };
  await this.orm.saveMessage(summary);
  return [summary, ...keep];
}
```

A simplified stand-in for openclaw's compaction (`compact.ts`, `compact.runtime.ts`).

- **Char-based budget.** Real implementations use token counts, but chars are a cheap proxy that needs no tokenizer.
- **Keep last 4 messages.** A common heuristic: recent context is always more valuable than older context. The number is arbitrary; tune it per use case.
- **Synthetic summary system message.** Replaces dropped messages with a single placeholder. A real implementation would call the model to summarize the dropped content; this scaffold just notes "[compacted N messages]" so the model knows context was elided.
- **Status transition.** Sets `compacting` during the pass so an observer can see it. Status is left as `compacting` when we return; the caller (`continueRun`) overwrites it to `streaming` immediately after.

Why a synthetic message rather than mutating existing ones: the ORM is **append-only**. We never rewrite history — we extend it. Compaction adds a summary; the dropped messages still exist in the DB if you need them for forensics.
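
For concreteness, the placeholder `summary` object elided above might be built like this, assuming `randomUUID` from `node:crypto` as elsewhere in the file; the field values are assumptions consistent with the surrounding description, not the package's exact wording.

```ts
// Sketch of the synthetic summary message (illustrative field values).
const summary: Message = {
  id: randomUUID(),
  runId: state.runId,
  role: 'system',
  blocks: [{ type: 'text', text: `[compacted ${dropped.length} messages]` }],
  createdAt: new Date(),
  iteration: state.iteration,
};
```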

### `updateState(state, patch)` and `fail(state, error)`

```ts
private async updateState(state, patch) {
  const next = { ...state, ...patch, updatedAt: new Date() };
  Object.assign(state, next);
  await this.orm.saveRunState(next);
}
```

A small utility. Three things happen:

1. **Spread + override** to produce the next state. Always bumps `updatedAt`.
2. **`Object.assign(state, next)`** mutates the caller's local reference so subsequent reads in the same iteration see the new state without another DB round-trip.
3. **Persist** to the ORM.

`fail` is a thin wrapper that sets `status: 'failed'` + `lastError`. It exists so error paths are syntactically identical and easy to grep.
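
In code, that wrapper is presumably little more than the following — a sketch in the same snippet style as `updateState`; the exact signature is an assumption:

```ts
// Sketch of the fail() helper described above (illustrative).
private async fail(state: RunState, error: string): Promise<void> {
  await this.updateState(state, { status: 'failed', lastError: error });
}
```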

---

## 8. Lambda entry point

```ts
type LambdaEvent =
  | { action: 'start'; text; files? }
  | { action: 'continue'; runId };

function makeLambdaHandler(handler: PromptHandler) {
  return async (event) => {
    if (event.action === 'start') return handler.handlePrompt({ text: event.text, files: event.files });
    if (event.action === 'continue') { await handler.continueRun(event.runId); return { runId: event.runId }; }
  };
}
```

The shape of a Lambda function that fronts the handler. Two routes:

- `start` — external trigger (API Gateway, EventBridge, SQS). Calls `handlePrompt`.
- `continue` — self-invocation from `LambdaExecutor`. Calls `continueRun`.

The discriminated `LambdaEvent` makes routing exhaustive at the type level — if you add a third action, TypeScript forces you to handle it.

This block is illustrative; you'd normally wire your own SQS/API Gateway adapter on top. The key insight is that **the same `PromptHandler` instance** handles both routes — no code duplication between "first turn" and "subsequent turns."

---

## 9. End-to-end example

```ts
import { PromptHandler, ORM, LocalExecutor } from './PromptHandler';

const orm = new ORM({ type: 'mongodb', uri: 'mongodb://localhost:27017', database: 'agent' });

const handler = new PromptHandler({
  context: 'You are a helpful assistant. Current date: 2026-05-14.',
  tools: [
    { key: 'search_docs', description: 'Search internal docs.', content: 'When invoked, search the knowledge base.' },
  ],
  model: { name: 'claude-opus-4-7', provider: 'mock', apiKey: '' },
  orm,
  callbacks: {
    onPartialReply: (chunk, runId) => console.log(`[${runId}]`, chunk),
    onToolStart: (call, runId) => console.log(`[${runId}] tool:`, call.name),
    onDone: (runId) => console.log(`[${runId}] done`),
  },
});

const { runId } = await handler.handlePrompt({ text: 'What docs do we have on retention?' });
```

With `MockProvider`, this run produces:

1. User message persisted (iteration 0).
2. Assistant message with one `tool_call` for `search_docs` (iteration 0).
3. Tool result message containing the skill content (iteration 0).
4. Assistant text "Done." (iteration 1).
5. RunState → `done`.

Swap `provider: 'mock'` for `'anthropic'` + a real API key and the same flow drives a real model.

For Lambda:

```ts
const lambdaExecutor = new LambdaExecutor(invoke);
const handler = new PromptHandler({ /* ...same options as above, */ executor: lambdaExecutor });
export const lambdaHandler = makeLambdaHandler(handler);
```

Same handler, same loop, just a different scheduling primitive.

---

## 10. Mapping to openclaw

For anyone cross-referencing this scaffold to the real `pi-embedded-runner`:

| Scaffold concept            | openclaw equivalent                                                      |
| --------------------------- | ------------------------------------------------------------------------ |
| `handlePrompt`              | `runEmbeddedPiAgent` entry (`src/agents/pi-embedded-runner/run.ts:361`)   |
| `continueRun` (one iter.)   | `runEmbeddedAttempt` (`run/attempt.ts:802`)                               |
| `LLMProvider.stream`        | `subscribeEmbeddedPiSession` + per-provider wrappers                      |
| `buildSystemPrompt`         | `buildEmbeddedSystemPrompt` (`system-prompt.ts:17`)                       |
| `executeTool` + tool result | `runToolLifecycle` (`run/attempt.ts:2761`)                                |
| `maybeCompact`              | `compact.ts` + `compaction-runtime-context.ts`                            |
| `RunState`                  | `run-state.ts`                                                            |
| `ORM`                       | session manager (`session-manager-*.ts`)                                  |
| `Executor`                  | not split out — openclaw runs in-process, but the loop is shaped so it could be |
| `StreamCallbacks`           | `onPartialReply` / `onReasoningStream` / `onBlockReply` etc. in `run/attempt.ts:2710-2718` |
| `maxIterations`             | retry-limit (`run/retry-limit.ts`)                                        |
| Prompt cache ordering rule  | `prompt-cache-retention.ts` + cache-control payload wrappers              |

The scaffold deliberately omits openclaw's:

- Failover / fallback (`run/assistant-failover.ts`, `run/failover-policy.ts`)
- Auth profile rotation (`run/auth-controller.ts`)
- Per-provider quirks (HTML-entity tool argument decoding, Google prompt cache, etc.)
- Idle-timeout breakers and abort signals
- Lane-based concurrency control
- Token-accurate accounting

Those exist because real production loops face real production failure modes. Add them when you hit those failure modes; don't add them speculatively.

---

## 11. What you'd change to harden this

Roughly in order of impact:

1. **Wire real ORM adapters.** Pick `mongodb` or `kysely`, fill in the eight methods, add the indexes listed above.
2. **Wire `AnthropicProvider.stream`** against `@anthropic-ai/sdk`. Map event types to blocks and callbacks. Same for `OpenAIProvider` (`openai` SDK) and `GoogleProvider` (`@google/genai`).
3. **Add an abort signal.** `handlePrompt` accepts an `AbortSignal`; pass it down to `llm.stream` and check it between iterations. Lets you cancel a runaway run.
4. **Add idle timeouts.** Wrap `llm.stream` in a `Promise.race` with a deadline so a stalled provider doesn't hold the Lambda forever (see the sketch after this list).
5. **Real compaction.** Call the model itself to summarize the dropped messages; replace the placeholder summary with the model's output.
6. **Failover.** Catch model errors in `continueRun`, classify them, and either retry, swap model, or fail. This is the largest piece of openclaw not represented here.
7. **Telemetry.** Add a logger plumbed through callbacks; record token counts, tool latencies, iteration durations.
8. **Prompt-cache hits.** Use `cacheGet`/`cacheSet` to memoize the system prompt block hash per model — saves money on long-running loops with stable context.
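
A minimal version of the idle-timeout idea in item 4 — a generic `Promise.race` deadline wrapper, written here as an illustration rather than anything the package ships:

```ts
// Generic deadline wrapper for a slow provider call (illustrative).
async function withDeadline<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} exceeded ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Usage inside continueRun (sketch):
// const result = await withDeadline(this.llm.stream({ ... }), 120_000, 'llm.stream');
```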

Each of these is additive; none requires restructuring the loop.