@onkernel/cua-ai 0.0.1 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (58)

package/CHANGELOG.md +96 -0
package/README.md +341 -65
package/dist/chunk-D7D4PA-g.js +13 -0
package/dist/index.d.ts +576 -10
package/dist/index.js +1992 -11
package/docs/supported-models.md +76 -0
package/examples/quickstart.ts +28 -22
package/package.json +10 -6
package/dist/api-keys.d.ts +0 -8
package/dist/api-keys.d.ts.map +0 -1
package/dist/api-keys.js +0 -48
package/dist/api-keys.js.map +0 -1
package/dist/index.d.ts.map +0 -1
package/dist/index.js.map +0 -1
package/dist/models.d.ts +0 -33
package/dist/models.d.ts.map +0 -1
package/dist/models.js +0 -159
package/dist/models.js.map +0 -1
package/dist/providers/anthropic/index.d.ts +0 -10
package/dist/providers/anthropic/index.d.ts.map +0 -1
package/dist/providers/anthropic/index.js +0 -16
package/dist/providers/anthropic/index.js.map +0 -1
package/dist/providers/common.d.ts +0 -111
package/dist/providers/common.d.ts.map +0 -1
package/dist/providers/common.js +0 -138
package/dist/providers/common.js.map +0 -1
package/dist/providers/gemini/index.d.ts +0 -11
package/dist/providers/gemini/index.d.ts.map +0 -1
package/dist/providers/gemini/index.js +0 -14
package/dist/providers/gemini/index.js.map +0 -1
package/dist/providers/openai/index.d.ts +0 -8
package/dist/providers/openai/index.d.ts.map +0 -1
package/dist/providers/openai/index.js +0 -22
package/dist/providers/openai/index.js.map +0 -1
package/dist/providers/tzafon/index.d.ts +0 -12
package/dist/providers/tzafon/index.d.ts.map +0 -1
package/dist/providers/tzafon/index.js +0 -18
package/dist/providers/tzafon/index.js.map +0 -1
package/dist/providers/tzafon/provider.d.ts +0 -8
package/dist/providers/tzafon/provider.d.ts.map +0 -1
package/dist/providers/tzafon/provider.js +0 -234
package/dist/providers/tzafon/provider.js.map +0 -1
package/dist/providers/yutori/index.d.ts +0 -12
package/dist/providers/yutori/index.d.ts.map +0 -1
package/dist/providers/yutori/index.js +0 -23
package/dist/providers/yutori/index.js.map +0 -1
package/dist/providers/yutori/provider.d.ts +0 -9
package/dist/providers/yutori/provider.d.ts.map +0 -1
package/dist/providers/yutori/provider.js +0 -307
package/dist/providers/yutori/provider.js.map +0 -1
package/dist/providers.d.ts +0 -6
package/dist/providers.d.ts.map +0 -1
package/dist/providers.js +0 -26
package/dist/providers.js.map +0 -1
package/dist/runtime-spec.d.ts +0 -29
package/dist/runtime-spec.d.ts.map +0 -1
package/dist/runtime-spec.js +0 -58
package/dist/runtime-spec.js.map +0 -1

package/CHANGELOG.md CHANGED

@@ -1,5 +1,101 @@
 # Changelog
+## 0.2.0 - 2026-06-10
+### Fixed
+- The published package is now importable under plain Node ESM. 0.1.0 shipped
+  extensionless relative imports in `dist/`, so `import "@onkernel/cua-ai"`
+  failed outside bundlers; `dist/` is now bundled with tsdown.
+- The shipped `examples/quickstart.ts` imports `@onkernel/cua-ai` instead of a
+  `../src` path that does not exist in the tarball, checks `stopReason` so
+  provider errors are no longer silent, resolves its API key via
+  `requireCuaEnvApiKeyForModel`, and switches providers with the `CUA_MODEL`
+  env var.
+- `docs/` (the supported-models list the README links to) is now included in
+  the npm tarball.
+- A malformed Yutori tool call now degrades to an empty-arguments call instead
+  of failing the entire response, matching the existing Tzafon hardening.
+### Breaking changes
+- Provider namespaces follow one convention. Every namespace now exports
+  `computerTools({ actions? })` / `computerToolExecutors({ actions? })`,
+  `createActionSchema`, `coordinateSystem()`, `providerModule`,
+  `<PROVIDER>_CUA_ACTION_TYPES`, `<PROVIDER>_COMPUTER_INSTRUCTIONS`, a
+  `<Provider>Action` type, and `ComputerToolsOptions`. This replaces 0.1.0's
+  `createComputerToolDefinitions(options)` /
+  `CreateComputerToolDefinitionsOptions`, the per-namespace
+  `COMPUTER_TOOL_COORDINATES` constants, `TZAFON_ACTION_TYPES` /
+  `YUTORI_ACTION_TYPES`, and the `OPENAI_BATCH_INSTRUCTIONS` /
+  `GEMINI_INSTRUCTIONS_RAW` / `TZAFON_INSTRUCTIONS_RAW` /
+  `YUTORI_INSTRUCTIONS_RAW` prompt constants.
+- `CUA_BATCH_TOOL_NAME` is now `"computer_batch"` (was
+  `"batch_computer_actions"`), matching the batch tool Anthropic ships by
+  default. `anthropic.ANTHROPIC_BATCH_TOOL_NAME` carries the same new value;
+  the other per-namespace batch aliases (`TZAFON_BATCH_TOOL_NAME`,
+  `YUTORI_BATCH_TOOL_NAME`, `*_BATCH_DESCRIPTION`, `*BatchSchema`,
+  `*BatchInput`) were removed — use `CUA_BATCH_TOOL_NAME`,
+  `CUA_BATCH_TOOL_DESCRIPTION`, `CuaBatchSchema`, and `CuaBatchInput`.
+- Anthropic tools are now the 13 canonical browser actions Anthropic supports
+  (no `back`/`forward`/`url`) plus a `computer_batch` batch tool by default;
+  pass `excludeBatch: true` to omit it. Unsupported `actions` entries throw.
+  `anthropic.ANTHROPIC_CUA_ACTION_TYPES` reflects the supported subset rather
+  than aliasing the full canonical list.
+- Yutori models now use Yutori's documented native `tool_set` request field.
+  `streamYutori` strips canonical action tools from the outbound payload
+  (preserve specific tools via the `keepToolNames` stream option), selects the
+  n1.5 core tool set where applicable, and normalizes native tool calls back
+  to canonical names. `yutori.providerModule.toolDefinitions()` is `[]`;
+  `yutori.computerTools()` builds local mirrors for executor lookup, validates
+  `{ actions }` against the supported subset, and throws on unsupported
+  actions. `yutoriBuiltinToolsOnPayload` was replaced by
+  `yutoriNativeToolSetOnPayload`. The Yutori runtime spec also carries a
+  screenshot policy (append a 1280x800 webp screenshot to the latest message).
+- Family model annotations now match only the family root plus numeric
+  revision or dated-snapshot suffixes (`claude-opus-4-7`,
+  `gpt-5.5-2026-04-23`). Named sibling variants such as `gpt-5.4-mini` are no
+  longer listed by `listCuaModels()` or accepted by `getCuaModel()` without
+  their own annotation.
+- `google:gemini-2.5-computer-use-preview-10-2025` was removed from the
+  catalog: it rejects the standard function declarations this package sends
+  and requires Google's native `tools.computer_use` wrapper. Use
+  `google:gemini-3-flash-preview` or `google:gemini-3-pro-preview`.
+- `streamTzafonResponses` no longer accepts a `maxOutputTokens` option — use
+  the standard `maxTokens` stream option.
+### Added
+- `CuaProviderModule` contract plus a `providerModule` export per namespace,
+  and a richer `CuaRuntimeSpec`: `toolExecutors` (local adapters that turn
+  provider tool calls into canonical `CuaAction`s via `CuaToolExecutorSpec`),
+  `coordinateSystem`, and optional `screenshot` policy alongside the existing
+  tool definitions, default prompt, and payload middleware.
+- `resolveCuaRuntimeSpec(input, options?)` accepts `ComputerToolsOptions` and
+  forwards it to the provider module, so runtime consumers can narrow tool
+  definitions and executors (e.g. `{ actions: ["click"] }`).
+- `registerCuaProviders()` is exported: importing the package still registers
+  the Yutori/Tzafon stream providers automatically, and this restores them
+  after pi-ai registry mutators (`clearApiProviders`, `resetApiProviders`,
+  `unregisterApiProviders`).
+- `parseCuaModelRef` / `getCuaModel` accept `"gemini:"` refs as an alias for
+  `"google:"`, and unsupported-provider errors now list the valid providers.
+- `CuaMouseButton` and `CuaDragMouseButton` closed unions type the `button`
+  field on click/mouse_down/mouse_up and drag actions.
+- `yutori.YutoriOptions` and `tzafon.TzafonResponsesOptions` are exported and
+  aligned; both support `keepToolNames` to preserve caller tools that collide
+  with canonical action names on the wire.
+- Yutori native action vocabulary exports: `YUTORI_N1_ACTION_TYPES`,
+  `YUTORI_N15_CORE_ACTION_TYPES`, `YUTORI_N15_EXPANDED_ACTION_TYPES`,
+  tool-set ids, `yutoriToolSetForModel`, `yutoriNativeActionsForModel`, and
+  `toCanonicalActions`; Tzafon exports `toCanonicalActions`,
+  `TzafonCanonicalAction`, `tzafonComputerUseOnPayload`, and
+  `tzafonToolCallId`.
+- README and JSDoc coverage across the public surface: API key prerequisites
+  and helpers, error handling (`stopReason` semantics), a multi-turn
+  tool-result example, the complete export list, and per-provider canonical
+  action subsets.
 ## 0.1.0
 - Provider-qualified CUA model catalog with support annotations and curated overrides.
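
The family-matching rule in the 0.2.0 entry (family root plus a numeric revision or dated-snapshot suffix) can be pictured with a small predicate. This is a sketch only: `matchesFamily`, both regular expressions, and the example family roots are assumptions made for illustration, not the package's actual matcher.

```typescript
// Hypothetical predicate illustrating the changelog rule; not taken from the package source.
function matchesFamily(root: string, modelId: string): boolean {
  if (modelId === root) return true; // the family root itself
  const prefix = `${root}-`;
  if (!modelId.startsWith(prefix)) return false;
  const suffix = modelId.slice(prefix.length);
  const numericRevision = /^\d+$/; // e.g. "7" in "claude-opus-4-7"
  const datedSnapshot = /^\d{4}-\d{2}-\d{2}$/; // e.g. "2026-04-23"
  return numericRevision.test(suffix) || datedSnapshot.test(suffix);
}

console.log(matchesFamily("gpt-5.5", "gpt-5.5-2026-04-23")); // true
console.log(matchesFamily("claude-opus-4", "claude-opus-4-7")); // true (root is assumed)
console.log(matchesFamily("gpt-5.4", "gpt-5.4-mini")); // false: named sibling variant
```

Under this reading, a named variant such as `gpt-5.4-mini` would need its own annotation before `getCuaModel()` accepts it, which is what the changelog entry describes.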

package/README.md CHANGED

@@ -10,46 +10,190 @@ for building CUA agents on Kernel.
 npm install @onkernel/cua-ai
 ```
+## Prerequisites
+You need an API key for each provider you call. The helpers in this package
+check these environment variables, in order:
+| Provider    | Environment variables (checked in order)    |
+| ----------- | ------------------------------------------- |
+| `openai`    | `OPENAI_API_KEY`                            |
+| `anthropic` | `ANTHROPIC_OAUTH_TOKEN`, `ANTHROPIC_API_KEY` |
+| `google`    | `GOOGLE_API_KEY`, `GEMINI_API_KEY`          |
+| `tzafon`    | `TZAFON_API_KEY`                            |
+| `yutori`    | `YUTORI_API_KEY`                            |
+The exported helpers wrap this table:
+- `cuaApiKeyEnvVarsForProvider(provider)` — the env var names for a provider
+  (accepts `"gemini"` as an alias for `"google"`).
+- `getCuaEnvApiKey(provider)` — read the key, or `undefined` when unset.
+- `requireCuaEnvApiKey(provider)` — read the key, or throw naming the
+  variables to set.
+- `getCuaEnvApiKeyForModel(refOrModel)` / `requireCuaEnvApiKeyForModel(refOrModel)`
+  — the same, keyed by a model ref like `"openai:gpt-5.5"` or a concrete
+  `Model<Api>`.
+Pass the resolved key as the `apiKey` stream option (as in the Quick Start
+below) so a missing key fails loudly before any request is made. If you omit
+`apiKey`, pi-ai's built-in providers fall back to their own env lookup
+(`OPENAI_API_KEY`; `ANTHROPIC_OAUTH_TOKEN`/`ANTHROPIC_API_KEY`; for `google`
+only `GEMINI_API_KEY`, not `GOOGLE_API_KEY`), and this package's Tzafon/Yutori
+stream adapters read `TZAFON_API_KEY`/`YUTORI_API_KEY`.
 ## Quick Start
 ```ts
 import { readFile } from "node:fs/promises";
-import { complete, getCuaModel, openai } from "@onkernel/cua-ai";
-const screenshot = await readFile("examples/screenshot.png");
+import { complete, getCuaModel, openai, requireCuaEnvApiKeyForModel } from "@onkernel/cua-ai";
 const model = getCuaModel("openai:gpt-5.5");
-const response = await complete(model, {
-  systemPrompt: "You are a browser automation agent.",
-  messages: [
-    {
-      role: "user",
-      content: [
-        { type: "text", text: "Click the Login button in this screenshot." },
-        { type: "image", data: screenshot.toString("base64"), mimeType: "image/png" },
-      ],
-      timestamp: Date.now(),
-    },
-  ],
-  tools: openai.createComputerToolDefinitions({ actions: ["click"] }),
-});
+const apiKey = requireCuaEnvApiKeyForModel("openai:gpt-5.5"); // throws unless OPENAI_API_KEY is set
+// Any screenshot of the page you want to act on, resolved relative to this
+// module so the snippet does not depend on the process working directory.
+const screenshot = await readFile(new URL("./screenshot.png", import.meta.url));
+const response = await complete(
+  model,
+  {
+    systemPrompt: "You are a browser automation agent.",
+    messages: [
+      {
+        role: "user",
+        content: [
+          { type: "text", text: "Click the sign in / up link in this screenshot." },
+          { type: "image", data: screenshot.toString("base64"), mimeType: "image/png" },
+        ],
+        timestamp: Date.now(),
+      },
+    ],
+    tools: openai.computerTools({ actions: ["click"] }),
+  },
+  { apiKey },
+);
+if (response.stopReason === "error" || response.stopReason === "aborted") {
+  throw new Error(response.errorMessage ?? `request ended with stopReason "${response.stopReason}"`);
+}
 for (const block of response.content) {
-  if (block.type === "toolCall" && block.name === "click_mouse") {
+  if (block.type === "toolCall" && block.name === "click") {
     console.log("click:", block.arguments);
   }
 }
 ```
+A runnable version ships at [`examples/quickstart.ts`](./examples/quickstart.ts)
+(with a sample screenshot). In this repo, run it from `packages/ai` with
+`npm run example:quickstart`; switch providers with the `CUA_MODEL` env var,
+e.g. `CUA_MODEL=anthropic:claude-opus-4-7`.
+## Error Handling
+pi-ai's `complete()` and `stream()` **resolve instead of throwing** when a
+request fails. The returned `AssistantMessage` carries the outcome on
+`stopReason`:
+- `"stop"`, `"length"`, `"toolUse"` — success; `content` holds the response.
+- `"error"` — the provider call failed (bad API key, no model access, network
+  error, …). `content` is empty and `errorMessage` holds the provider error.
+- `"aborted"` — the request was cancelled via the `signal` stream option.
+Always check `stopReason` before reading `content` — otherwise a typo'd API
+key looks like a successful run that produced nothing:
+```ts
+if (response.stopReason === "error" || response.stopReason === "aborted") {
+  throw new Error(response.errorMessage ?? `request ended with stopReason "${response.stopReason}"`);
+}
+```
+`getCuaModel()`, `requireCuaEnvApiKey*()`, and `computerTools({ actions })`
+validate eagerly and throw regular errors.
+## Continuing the Loop
+[`@onkernel/cua-agent`](https://www.npmjs.com/package/@onkernel/cua-agent)
+runs this loop for you — `CuaAgent`/`CuaAgentHarness` classes with browser
+execution against a Kernel browser. Reach for it first; the rest of this
+section is for driving the loop yourself against your own browser stack.
+A computer-use session is a loop: the model calls a tool, you execute it
+against a real browser, and you send the result (with a fresh screenshot) back
+so the model can plan the next step. Tool results are pi-ai
+`ToolResultMessage`s:
+```ts
+type ToolResultMessage = {
+  role: "toolResult";
+  toolCallId: string; // ToolCall.id from the assistant message
+  toolName: string;   // ToolCall.name
+  content: (TextContent | ImageContent)[];
+  details?: unknown;  // optional executor metadata, not sent to the model
+  isError: boolean;
+  timestamp: number;
+};
+```
+A minimal two-turn loop:
+```ts
+import { complete, getCuaModel, openai, requireCuaEnvApiKeyForModel, type Message } from "@onkernel/cua-ai";
+const model = getCuaModel("openai:gpt-5.5");
+const apiKey = requireCuaEnvApiKeyForModel("openai:gpt-5.5");
+const tools = openai.computerTools({ actions: ["click", "type", "screenshot"] });
+const messages: Message[] = [
+  {
+    role: "user",
+    content: [
+      { type: "text", text: "Click the sign in / up link in this screenshot." },
+      { type: "image", data: screenshotBase64, mimeType: "image/png" },
+    ],
+    timestamp: Date.now(),
+  },
+];
+// Turn 1: the model responds with tool calls.
+const first = await complete(model, { messages, tools }, { apiKey });
+if (first.stopReason === "error" || first.stopReason === "aborted") {
+  throw new Error(first.errorMessage);
+}
+messages.push(first); // the AssistantMessage joins the transcript as-is
+// Execute each tool call against your browser stack, then append a
+// toolResult message carrying a fresh screenshot.
+for (const block of first.content) {
+  if (block.type !== "toolCall") continue;
+  const freshScreenshotBase64 = await runInYourBrowser(block.name, block.arguments);
+  messages.push({
+    role: "toolResult",
+    toolCallId: block.id,
+    toolName: block.name,
+    content: [
+      { type: "text", text: "done" },
+      { type: "image", data: freshScreenshotBase64, mimeType: "image/png" },
+    ],
+    isError: false,
+    timestamp: Date.now(),
+  });
+}
+// Turn 2: the model sees the results and plans the next action.
+const second = await complete(model, { messages, tools }, { apiKey });
+```
 ## Core Concepts
-`@onkernel/cua-ai` re-exports the core primitives of
-[`@earendil-works/pi-ai`](https://github.com/earendil-works/pi/tree/main/packages/ai):
+`@onkernel/cua-ai` re-exports the full surface of
+[`@earendil-works/pi-ai`](https://github.com/earendil-works/pi/tree/main/packages/ai)
+(`export * from "@earendil-works/pi-ai"`), including the core primitives:
 `Model`, `Context`, `Message`, `Tool`, `complete`, `stream`, `completeSimple`,
-`streamSimple`, `Type`, `Static`, `TSchema`, and the event/validation helpers
-that pi-ai exposes. Some familiarity with pi-ai is assumed; Kernel adds the
-computer-use model catalog and provider/tool metadata.
+`streamSimple`, `Type`, `Static`, `TSchema`, and the event/validation helpers.
+Some familiarity with pi-ai is assumed; Kernel adds the computer-use model
+catalog and provider/tool metadata.
 ### Model Refs
@@ -59,13 +203,14 @@ computer-use model catalog and provider/tool metadata.
 ```ts
 getCuaModel("openai:gpt-5.5");
 getCuaModel("anthropic:claude-opus-4-7");
-getCuaModel("google:gemini-2.5-computer-use-preview-10-2025");
+getCuaModel("google:gemini-3-flash-preview");
 getCuaModel("tzafon:tzafon.northstar-cua-fast");
 getCuaModel("yutori:n1.5-latest");
 ```
 `getCuaModel(ref)` returns a pi-ai `Model<Api>` you can pass to `complete()`
-or `stream()`.
+or `stream()`. It throws when the ref names a model without a CUA-support
+annotation.
 See [`docs/supported-models.md`](./docs/supported-models.md) for the current
 list of CUA-supporting models per provider.
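
As a rough illustration of how a `provider:model` ref with the `"gemini:"` alias might be split, here is a minimal sketch. `parseRefSketch` is a hypothetical stand-in for `parseCuaModelRef`; the provider list is copied from the README's API-key table, and the error wording is invented rather than the package's real message.

```typescript
// Provider ids as listed in the README's API-key table; everything else is illustrative.
const PROVIDERS = ["openai", "anthropic", "google", "tzafon", "yutori"] as const;
type Provider = (typeof PROVIDERS)[number];

// Hypothetical stand-in for parseCuaModelRef; the error text is an assumption.
function parseRefSketch(ref: string): { provider: Provider; model: string } {
  const colon = ref.indexOf(":"); // split at the first colon only
  if (colon <= 0 || colon === ref.length - 1) {
    throw new Error(`expected "<provider>:<model>", got "${ref}"`);
  }
  const rawProvider = ref.slice(0, colon);
  const model = ref.slice(colon + 1);
  // "gemini" is treated as an alias for "google".
  const provider = rawProvider === "gemini" ? "google" : rawProvider;
  if (!(PROVIDERS as readonly string[]).includes(provider)) {
    throw new Error(
      `unsupported provider "${rawProvider}"; valid providers: ${PROVIDERS.join(", ")}`,
    );
  }
  return { provider: provider as Provider, model };
}

console.log(parseRefSketch("gemini:gemini-3-flash-preview").provider); // "google"
```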
@@ -95,59 +240,153 @@ interface CuaModelInfo {
 }
 ```
-### Exports
+## Exports
+Everything below is importable from the package root. pi-ai's full surface is
+re-exported alongside (see [Core Concepts](#core-concepts)).
-Top-level exports:
+### Models and refs
 - `getCuaModel(ref: CuaModelRef): Model<Api>`
 - `listCuaModels(provider?: CuaProvider): CuaModelInfo[]`
+- `parseCuaModelRef(ref: string): { provider: CuaProvider; model: string }` —
+  accepts the `"gemini:"` alias
+- `formatCuaModelRef(provider, model): CuaModelRef`
 - `providerForModel(model: Model<Api>): CuaProvider`
-- `resolveCuaRuntimeSpec(input: CuaModelRef | Model<Api>): CuaRuntimeSpec`
+- `isCuaProvider(value: string): value is CuaProvider`
+- `findCuaAnnotation(provider, modelId): CuaModelAnnotation | undefined`
 - `CUA_PROVIDERS: readonly CuaProvider[]`
-- `CuaBatchSchema`, `CuaActionSchema`, `CuaNavigationSchema` TypeBox schemas
-- `createCuaActionSchema(actions?)`, `createCuaBatchSchema(actions?)`
+- `CUA_MODEL_ANNOTATIONS: Record<CuaProvider, readonly CuaModelAnnotation[]>` —
+  the source-cited support table
+- Types: `CuaProvider`, `CuaModelRef`, `CuaModelInfo`, `CuaModelAnnotation`,
+  `CuaModelMatch`
+### API keys
+- `cuaApiKeyEnvVarsForProvider(provider): readonly string[]`
+- `getCuaEnvApiKey(provider): string | undefined`
+- `requireCuaEnvApiKey(provider): string`
+- `getCuaEnvApiKeyForModel(refOrModel): string | undefined`
+- `requireCuaEnvApiKeyForModel(refOrModel): string`
+### Runtime specs
+- `resolveCuaRuntimeSpec(input: CuaModelRef | Model<Api>, options?: ComputerToolsOptions): CuaRuntimeSpec`
+- Types: `CuaRuntimeSpec`, `CuaRuntimeSpecInput`, `CuaProviderModule`,
+  `CuaScreenshotSpec`, `CuaScreenshotTransformSpec`, `CuaPayloadHook`,
+  `CuaPayloadContext`
 `resolveCuaRuntimeSpec()` centralizes provider-specific defaults for
 runtime consumers:
 - canonical provider id
-- canonical CUA tool definitions
+- provider-facing CUA tool definitions used in model requests
+- local execution adapters used by `CuaAgent`/`CuaAgentHarness`
 - default system prompt text
+- provider coordinate convention
+- optional provider screenshot input policy
 - optional provider payload middleware (for protocol quirks)
-Provider namespaces expose `createComputerToolDefinitions({ actions? })` for
-building model-facing pi-ai `Tool[]` definitions. Omit `actions` for the
-provider's default computer tool set, or pass an action subset to narrow the
-schema for a single `complete()` call:
+Pass `options` (e.g. `{ actions: ["click"] }`) to narrow the resolved tool
+definitions and executors; it is forwarded to the provider module's
+`toolDefinitions()`/`toolExecutors()`, so providers with a restricted subset
+(Anthropic, Yutori) throw on unsupported actions.
+### Canonical actions and tools
+- `CUA_ACTION_TYPES: readonly CuaActionType[]` — the 16 canonical action names
+- `computerTools(options?: ComputerToolsOptions): Tool[]` /
+  `createCuaActionToolDefinitions(actions?)` — one `Tool` per canonical action
+  (the full canonical superset; provider namespaces apply provider defaults
+  and validation on top)
+- `computerToolExecutors(options?)` / `createCuaActionToolExecutors(actions?)`
+  — matching `CuaToolExecutorSpec[]` execution adapters
+- `createCuaActionSchema(actions?)`, `CuaActionSchema` — TypeBox union schema
+- `createCuaBatchSchema(actions?)`, `CuaBatchSchema`,
+  `createCuaBatchToolDefinition(actions?, options?)`,
+  `createCuaBatchToolExecutor(actions?, options?)`,
+  `CUA_BATCH_TOOL_NAME` (`"computer_batch"`), `CUA_BATCH_TOOL_DESCRIPTION`
+- `createCuaNavigationToolDefinition()`, `CuaNavigationSchema`,
+  `CUA_NAVIGATION_TOOL_NAME` (`"computer_use_extra"`),
+  `CUA_NAVIGATION_TOOL_DESCRIPTION`
+- `canonicalToolCallName(action)`, `canonicalToolCallArguments(action)` — map
+  a normalized `CuaAction` back to its tool-call name/arguments
+- `normalizeGotoUrl(value)` — prefix bare hostnames with `https://`
+- Types: `CuaAction` (plus the 16 per-action interfaces), `CuaActionType`,
+  `CuaMouseButton`, `CuaDragMouseButton`, `CuaBatchInput`,
+  `CuaNavigationInput`, `CuaToolExecutorSpec`, `ComputerToolsOptions`,
+  `ComputerToolCoordinateSystem`
+### Provider registration
+- `registerCuaProviders(): void` — re-register the Yutori/Tzafon stream
+  providers with pi-ai's global registry (runs automatically on import;
+  idempotent; call it after any pi-ai registry mutator)
+## Provider Tools
+Provider namespaces expose `computerTools({ actions? })` for
+building the provider's default CUA `Tool[]` definitions. These are the tools
+sent to the model when you call `complete()` or `stream()` directly. The
+default set can differ by provider: Anthropic includes its `computer_batch`
+tool from the computer-use best-practices reference, while providers such as
+OpenAI currently expose individual canonical browser actions. Omit `actions`
+for the provider's default computer tool set, or pass an action subset to narrow
+the schema for a single `complete()` call:
 ```ts
 import { openai } from "@onkernel/cua-ai";
-const allComputerTools = openai.createComputerToolDefinitions();
-const clickOnlyTools = openai.createComputerToolDefinitions({ actions: ["click"] });
+const allComputerTools = openai.computerTools();
+const clickOnlyTools = openai.computerTools({ actions: ["click"] });
 ```
-Every provider namespace synthesizes a `batch_computer_actions` tool definition.
-That gives models a consistent way to plan ordered browser actions even when the
-provider's native computer-use API has a different shape. Provider namespaces
-are still used so the definitions can diverge over time where provider protocol
-differences matter.
-Provider namespaces also expose `COMPUTER_TOOL_COORDINATES`, which describes
-the coordinates the provider's computer tool calls are expected to emit:
+When `actions` is provided, it must be a subset of that provider's supported
+canonical action set; unsupported actions throw (e.g.
+`anthropic.computerTools({ actions: ["back"] })` throws
+`unsupported Anthropic canonical action(s): back`).
+Per-provider canonical action subsets (each namespace exports its list as
+`<PROVIDER>_CUA_ACTION_TYPES`):
+| Namespace   | Canonical actions                                                                  |
+| ----------- | ---------------------------------------------------------------------------------- |
+| `openai`    | all 16                                                                              |
+| `anthropic` | 13 — everything except `back`, `forward`, `url`; adds `computer_batch` by default  |
+| `gemini`    | all 16                                                                              |
+| `tzafon`    | all 16 (replaced on the wire by Tzafon's native `computer_use` tool)                |
+| `yutori`    | 13 — everything except `screenshot`, `url`, `cursor_position` (local mirrors only)  |
+Runtime specs also include `toolExecutors`: provider-owned adapters that use
+the same tool-call names as the model-facing tools and translate their
+arguments into canonical CUA actions for `@onkernel/cua-agent`. For most
+providers, `toolDefinitions` and `toolExecutors` line up one-for-one. Some
+providers are different on the wire: Yutori exposes browser actions through its
+documented `tool_set` request field, so its runtime spec has no model-facing
+`toolDefinitions` (`yutori.providerModule.toolDefinitions()` is `[]`) but
+still provides local `toolExecutors` for the canonical actions emitted after
+Yutori's native tool calls are normalized. `yutori.computerTools()` builds
+local mirrors of those canonical tools — they are never sent to the API
+(`streamYutori` strips them from the outbound payload) and exist so the
+normalized tool calls have matching local definitions/executors. Caller-provided
+tools that should remain on the provider payload can be preserved by payload
+middleware via `CuaPayloadContext.keepToolNames`.
+Provider namespaces also expose `coordinateSystem()`, which returns the
+coordinates the provider's computer tool calls are expected to emit:
 ```ts
-openai.COMPUTER_TOOL_COORDINATES
+openai.coordinateSystem()
 // { type: "pixel" }
-gemini.COMPUTER_TOOL_COORDINATES
+gemini.coordinateSystem()
 // { type: "normalized", range: [0, 999] }
 ```
 Current coordinate contracts:
 - `openai`: pixel coordinates
-- `anthropic`: pixel coordinates
+- `anthropic`: pixel coordinates, matching Anthropic's computer-use quickstart
 - `gemini`: normalized coordinates in the 0-999 range ([source](https://ai.google.dev/gemini-api/docs/computer-use))
 - `yutori`: normalized coordinates in the 0-1000 range ([source](https://docs.yutori.com/reference/navigator), [SDK helper](https://github.com/yutori-ai/yutori-sdk-python/blob/main/yutori/navigator/coordinates.py))
 - `tzafon`: normalized coordinates in the 0-999 range ([source](https://docs.lightcone.ai/guides/coordinates/), [model card](https://huggingface.co/Tzafon/Northstar-CUA-Fast))
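
An executor that receives normalized coordinates has to scale them to the viewport before clicking. The sketch below shows one plausible mapping, assuming the range endpoints correspond to the first and last pixel. `toPixels`, `Point`, and `Viewport` are hypothetical names, and each provider's own documentation may prescribe a different scaling or rounding rule.

```typescript
// Local mirror of the shapes shown above; the package's own type is ComputerToolCoordinateSystem.
type CoordinateSystem =
  | { type: "pixel" }
  | { type: "normalized"; range: [number, number] };

type Point = { x: number; y: number };
type Viewport = { width: number; height: number };

// Hypothetical helper: maps the range endpoints onto the first and last pixel.
function toPixels(point: Point, system: CoordinateSystem, viewport: Viewport): Point {
  if (system.type === "pixel") return point; // already in pixels
  const [lo, hi] = system.range;
  const scale = (value: number, size: number): number =>
    Math.round(((value - lo) / (hi - lo)) * (size - 1));
  return { x: scale(point.x, viewport.width), y: scale(point.y, viewport.height) };
}

const viewport: Viewport = { width: 1280, height: 800 };
console.log(toPixels({ x: 999, y: 0 }, { type: "normalized", range: [0, 999] }, viewport));
// { x: 1279, y: 0 }
```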
@@ -182,7 +421,7 @@ type CuaActionClick = {
   type: "click";
   x: number;
   y: number;
-  button?: string;
+  button?: CuaMouseButton; // "left" | "right" | "middle" | "back" | "forward"
   hold_keys?: string[];
 };
@@ -192,8 +431,13 @@ type CuaActionGoto = {
 };
 ```
-The provider namespace `createComputerToolDefinitions()` emits a
-`batch_computer_actions` tool whose input is:
+Mouse buttons are closed unions: `CuaMouseButton` for `click`/`mouse_down`/
+`mouse_up` and `CuaDragMouseButton` (`"left" | "right" | "middle"`) for
+`drag`. Executors coerce anything outside the set to `"left"`. `keys` stays
+`string[]` — the agent-side key-alias table passes unrecognized keys through.
+`createCuaBatchToolDefinition(actions?, options?)` builds a batch tool schema
+whose input is:
 ```ts
 type CuaBatchInput = {
@@ -201,11 +445,14 @@ type CuaBatchInput = {
 };
 ```
-The model can plan several writes and reads in one call. Read actions such as
-`screenshot`, `url`, and `cursor_position` can be interleaved with writes so
-your executor can return fresh state in the same order.
+Providers can include a batch tool when their model is expected to use one.
+Anthropic does this by default with `computer_batch` (also exported as
+`anthropic.ANTHROPIC_BATCH_TOOL_NAME`, equal to the top-level
+`CUA_BATCH_TOOL_NAME`); Yutori does not.
+`createCuaBatchToolExecutor()` is the matching execution adapter for turning
+that provider-defined batch input into canonical CUA actions.
-When `actions` is omitted, the OpenAI namespace also emits a `computer_use_extra`
+`createCuaNavigationToolDefinition()` can synthesize a `computer_use_extra`
 navigation tool whose input is:
 ```ts
@@ -215,15 +462,44 @@ type CuaNavigationInput = {
 };
 ```
-Provider namespaces:
-- `openai`: `createComputerToolDefinitions`, `COMPUTER_TOOL_COORDINATES`, OpenAI CUA action schemas, and `OPENAI_BATCH_INSTRUCTIONS`
-- `anthropic`: `createComputerToolDefinitions`, `COMPUTER_TOOL_COORDINATES`, prompt helpers, and CUA batch schema aliases
-- `gemini`: `createComputerToolDefinitions`, `COMPUTER_TOOL_COORDINATES`, prompt helpers, and CUA batch schema aliases
-- `tzafon`: `createComputerToolDefinitions`, `COMPUTER_TOOL_COORDINATES`, prompt helpers, and local `tzafon-responses` stream adapter
-- `yutori`: Yutori prompt helpers, local `yutori-chat-completions` stream
-  adapter, `createComputerToolDefinitions`, `COMPUTER_TOOL_COORDINATES`, and
-  `yutoriBuiltinToolsOnPayload`
+## Provider Namespaces
+Every provider namespace (`openai`, `anthropic`, `gemini`, `tzafon`,
+`yutori`) follows one convention:
+- `computerTools(options?)` and `computerToolExecutors(options?)`
+- `createActionSchema(actions?)` — TypeBox schema for the provider's subset
+- `coordinateSystem()`
+- `build<Provider>SystemPrompt({ suffix? })` and
+  `<PROVIDER>_COMPUTER_INSTRUCTIONS` (the prompt text)
+- `<PROVIDER>_CUA_ACTION_TYPES` — the supported canonical action subset
+- `<Provider>Action` type — the canonical action union for that subset
+- `ComputerToolsOptions` type (Anthropic's adds `excludeBatch`, also exported
+  as `AnthropicComputerToolsOptions`)
+- `providerModule` — the uniform `CuaProviderModule` object that
+  `resolveCuaRuntimeSpec` looks up
+Provider-specific extras:
+- `openai`: `openaiResponsesStoreOnPayload` payload hook, plus the
+  `computer_use_extra` navigation aliases `OPENAI_EXTRA_TOOL_NAME`,
+  `OPENAI_EXTRA_TOOL_DESCRIPTION`, `OpenAIExtraSchema`, `OpenAIExtraInput`
+- `anthropic`: `ANTHROPIC_BATCH_TOOL_NAME` (`"computer_batch"`)
+- `tzafon`: the `tzafon-responses` stream adapter (`TZAFON_RESPONSES_API`,
+  `streamTzafonResponses`, `streamSimpleTzafonResponses`,
+  `TzafonResponsesOptions` with `keepToolNames`), `tzafonComputerUseOnPayload`
+  payload middleware, `tzafonToolCallId`, and the native-to-canonical
+  normalizer `toCanonicalActions` (+ `TzafonCanonicalAction`)
+- `yutori`: the `yutori-chat-completions` stream adapter
+  (`YUTORI_CHAT_COMPLETIONS_API`, `streamYutori`, `streamSimpleYutori`,
+  `YutoriOptions` with `keepToolNames`), `yutoriNativeToolSetOnPayload`
+  payload middleware, the native Navigator action sets
+  (`YUTORI_N1_ACTION_TYPES`, `YUTORI_N15_CORE_ACTION_TYPES`,
+  `YUTORI_N15_EXPANDED_ACTION_TYPES`, `YUTORI_N15_ACTION_TYPES`, the
+  `YUTORI_N15_CORE_TOOL_SET`/`YUTORI_N15_EXPANDED_TOOL_SET` tool-set ids, and
+  the matching `Yutori*ActionType` types), `yutoriToolSetForModel`,
+  `yutoriNativeActionsForModel`, and the native-to-canonical normalizer
+  `toCanonicalActions`
 This package does not execute browser actions. Use `@onkernel/cua-agent` when
 you want model tool calls executed against a Kernel browser.
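
When driving the loop yourself, the `stopReason` check from the README's Error Handling section can be folded into one helper that either throws or hands back the tool calls to execute. This is a minimal sketch: the types are local stand-ins for the pi-ai shapes the README describes, and `toolCallsOrThrow` is a hypothetical name, not a package export.

```typescript
// Local stand-ins for the pi-ai shapes described in the README; illustrative only.
type ToolCall = { type: "toolCall"; id: string; name: string; arguments: Record<string, unknown> };
type TextBlock = { type: "text"; text: string };
type AssistantReply = {
  content: (ToolCall | TextBlock)[];
  stopReason: "stop" | "length" | "toolUse" | "error" | "aborted";
  errorMessage?: string;
};

// Hypothetical helper: throw on failed requests, otherwise return the tool calls to execute.
function toolCallsOrThrow(reply: AssistantReply): ToolCall[] {
  if (reply.stopReason === "error" || reply.stopReason === "aborted") {
    throw new Error(reply.errorMessage ?? `request ended with stopReason "${reply.stopReason}"`);
  }
  return reply.content.filter((block): block is ToolCall => block.type === "toolCall");
}

const sample: AssistantReply = {
  stopReason: "toolUse",
  content: [
    { type: "text", text: "Clicking the link." },
    { type: "toolCall", id: "call_1", name: "click", arguments: { x: 10, y: 20 } },
  ],
};
console.log(toolCallsOrThrow(sample).map((call) => call.name)); // ["click"]
```

An empty result means the model returned no tool calls, which is a natural point to end the loop.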

package/dist/chunk-D7D4PA-g.js ADDED

@@ -0,0 +1,13 @@
+//#region \0rolldown/runtime.js
+var __defProp = Object.defineProperty;
+var __exportAll = (all, no_symbols) => {
+	let target = {};
+	for (var name in all) __defProp(target, name, {
+		get: all[name],
+		enumerable: true
+	});
+	if (!no_symbols) __defProp(target, Symbol.toStringTag, { value: "Module" });
+	return target;
+};
+//#endregion
+export { __exportAll as t };
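
The added chunk is a bundler runtime helper that turns a map of getter functions into a namespace-like object. The TypeScript restatement below is a sketch for reading purposes: `exportAll` and the sample getters are illustrative names, and only the property-definition logic mirrors the helper shown above.

```typescript
// TypeScript restatement of the helper above; names and sample data are illustrative.
function exportAll<T extends Record<string, () => unknown>>(
  all: T,
  noSymbols = false,
): { readonly [K in keyof T]: ReturnType<T[K]> } {
  const target = {} as { [K in keyof T]: ReturnType<T[K]> };
  for (const name in all) {
    // Each export is a getter, so reads always see the current value (a live binding).
    Object.defineProperty(target, name, { get: all[name], enumerable: true });
  }
  // Tag the object so it stringifies like a module namespace.
  if (!noSymbols) Object.defineProperty(target, Symbol.toStringTag, { value: "Module" });
  return target;
}

let calls = 0;
const ns = exportAll({ calls: () => calls, label: () => "demo" });
calls += 1;
console.log(ns.calls); // 1, read through the getter after the update
console.log(Object.keys(ns).join(",")); // "calls,label"
console.log(Object.prototype.toString.call(ns)); // "[object Module]"
```

Because the properties are defined with getters only, consumers can read the exports but not reassign them, which approximates how ES module namespace objects behave.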