npm - @nightowlsdev/core - Versions diffs - 0.3.0 → 0.4.0 - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/README.md ADDED

@@ -0,0 +1,270 @@
+# @nightowlsdev/core
+> The engine: `defineAgent` / `defineSkill` / `defineSwarm`, the run loop, cost governor, HITL `ask`.
+`@nightowlsdev/core` is the flagship engine of the Night Owls framework. It defines the assembly API you use to declare tools, skills, agents, and a swarm, and the `SwarmEngine` that runs them — streaming a typed `SwarmEvent` log, serializing per-lane turns, metering token spend, gating side-effecting tools behind human approval, and suspending/resuming for human-in-the-loop questions. It is engine-internals aware but keeps an "engine wall": nothing from `@mastra/*` leaks into its public types, so storage, auth, telemetry, models, and the runner are all pluggable adapters around this core.
+## Install
+```bash
+pnpm add @nightowlsdev/core
+```
+Peer dependency (you provide it):
+- `ai` `^6.0.0` — the AI-SDK, used by the model layer and the `./test-utils` mock model.
+`@mastra/core`, `@mastra/memory`, `@nightowlsdev/hooks`, and `zod` are regular dependencies and installed for you.
+## Usage
+A minimal single-agent swarm with one skill, run against the in-memory dev store. (`modelFactory` returns your AI-SDK model for a given `modelId`; a real app passes a provider plugin like `@nightowlsdev/model-openai`.)
+```ts
+import { z } from "zod";
+import {
+  defineTool,
+  defineSkill,
+  defineAgent,
+  defineSwarm,
+  InMemoryStorage,
+} from "@nightowlsdev/core";
+// 1. A tool = a typed, executable capability. `ctx` carries the run's tenant/user/run ids + `ctx.secrets`.
+const getWeather = defineTool({
+  name: "get_weather",
+  description: "Look up the current weather for a city.",
+  inputSchema: z.object({ city: z.string() }),
+  outputSchema: z.object({ tempC: z.number() }),
+  execute: async ({ city }, ctx) => {
+    // ctx.tenantId / ctx.userId / ctx.runId are available here; ctx.secrets.resolve(ref) reaches the vault.
+    return { tempC: 21 };
+  },
+});
+// 2. A skill is a tool an agent is allowed to invoke.
+const weatherSkill = defineSkill(getWeather);
+// 3. An agent = a persona + its skills (+ optional delegates / model pin).
+const concierge = defineAgent({
+  slug: "concierge",
+  role: "orchestrator",
+  personality: "A concise, friendly travel concierge.",
+  skills: [weatherSkill],
+  modelId: "openai/gpt-5.5",
+});
+// 4. A swarm wires agents to storage, the model allow-list + factory, and per-run caps.
+const swarm = defineSwarm({
+  storage: new InMemoryStorage(),
+  agents: [concierge],
+  models: { allow: ["openai/gpt-5.5"] },
+  modelFactory: (modelId) => myAiSdkModel(modelId), // myAiSdkModel is a placeholder you supply: return an AI-SDK LanguageModel for this id
+  cost: { maxSteps: 12, maxCostUsd: 1.0 },
+});
+// 5. Run a turn — the engine yields a typed event stream you render / persist / meter.
+const ctx = {
+  tenantId: "default",
+  userId: "u1",
+  agentSlug: "concierge",
+  runId: crypto.randomUUID(),
+  threadId: crypto.randomUUID(),
+};
+for await (const ev of swarm.engine.run({ message: "Weather in Lisbon?" }, ctx)) {
+  console.log(ev.type, ev); // swarm.status / swarm.message / swarm.tool_call / swarm.usage / swarm.turn_usage / ...
+}
+```
+## API
+### Assembly
+- **`defineTool<I, O>(spec: ToolSpec<I, O>): SwarmTool`** — declare a typed, executable tool. `spec` carries `name`, `inputSchema`/`outputSchema` (Zod), `execute(input, ctx)`, and `needsApproval` / `origin` (`"first-party"` default, or `"mcp"`). The execute body receives a `SwarmToolContext` (`tenantId`/`userId`/`runId` + `secrets`). Wraps a Mastra tool internally; the approval gate is enforced inside this wrapper.
+- **`defineSkill(tool: SwarmTool): SwarmSkill`** — mark a tool as a skill an agent may invoke (identity passthrough; `SwarmSkill` is `SwarmTool`).
+- **`defineAgent(spec: AgentSpec): AgentDef`** — declare an agent: `slug`, `personality`, optional `role` (`"orchestrator"|"specialist"`), `capabilities`, `skills`, `delegates` (slugs it may hand off to), `modelId` (a concrete id or a tier sentinel — see the tier router), and a per-agent `memory` override.
+- **`defineSwarm(cfg: SwarmConfig): Swarm`** — assemble agents into a runnable `Swarm` (`{ engine: SwarmEngine }`). Seeds code-defined agents into storage and builds a per-swarm skill resolver and hook dispatcher (no module-level global state, no cross-swarm leakage).
+- **`buildSkillResolver(agents: AgentDef[]): (name) => SwarmSkill | undefined`** — the per-swarm skill registry builder `defineSwarm` uses internally; exported for advanced/direct-engine assembly.
+- **`defineBundle(spec: BundleSpec): BundleDef`** — compose + closure-validate a reusable **capability bundle** from `defineAgent` outputs (see *Capability bundles* below).
+- **`mergeBundle(cfg: SwarmConfig, bundle: BundleDef): SwarmConfig`** — fold a validated bundle into a swarm config; the result is a drop-in `defineSwarm` input.
+- **`ASK_TOOL_NAME`** — `"ask"`, the name of the built-in human-in-the-loop tool injected on every agent.
+### Capability bundles (`defineBundle` / `mergeBundle`)
+A **capability bundle** is a reusable, closure-validated *composition* of agents and their rules/workflows (plus connector grants) that you author once and reuse across projects; it is the unit that gap-map #4 ("cross-project capability reuse") names. It is a **static, in-process** composition: a bundle is built from `defineAgent` outputs (skill handles present) and folded into a `SwarmConfig` at boot. No DB, no serialization, **no new runtime path**: `mergeBundle` is a front-end to `defineSwarm`.
+```ts
+import { defineAgent, defineBundle, mergeBundle, defineSwarm } from "@nightowlsdev/core";
+// Author a reusable bundle from composed agents (skill handles ride along on each AgentDef).
+const editor   = defineAgent({ slug: "editor",   personality: "Writes drafts.",  skills: [draftSkill] });
+const reviewer = defineAgent({ slug: "reviewer", personality: "Reviews drafts.", skills: [searchSkill] });
+export const contentStudio = defineBundle({
+  slug: "content-studio",
+  title: "Content Studio",
+  agents: [editor, reviewer],
+  // optional swarm-scoped `rules` / `workflows`, validated against the members' handles
+});
+// Reuse it in any host swarm — mergeBundle folds it into the SwarmConfig, then defineSwarm runs it unchanged.
+const swarm = defineSwarm(
+  mergeBundle(
+    { storage, agents: [], models: { allow: ["openai/gpt-5.5"] }, modelFactory, cost: { maxSteps: 12, maxCostUsd: 1 } },
+    contentStudio,
+  ),
+);
+```
+**`defineBundle` closure-validates at author time** — a missing handle or a typo fails loudly here, not as a runtime `run_failed`:
+- every member `skillName` resolves to a first-party skill **handle** present on the bundle, **or** is covered by a BN1 connector grant (below) — a connector-looking `provider.action` name with neither is rejected;
+- every `delegates` target is a bundle member or a declared `requires` dependency;
+- every tool-seam **rule** tool-ref and every **workflow** `step.tool` resolves to a handle or a declared grant (globs and `agent-*` delegation gates are skipped — they aren't single-handle refs);
+- no workflow step embeds a **credential ref** (`secretRef` / `credentialRef` / `connectionId` / `owl_connections`) — a bundle declares *which* capability it needs, never a handle to a credential.
+**Connector grants (BN1).** A member can declare `connectorGrants` — permission to invoke a connector provider's actions (`{ agentSlug, provider, actions }`, **names only**, never creds). `defineBundle` folds the action names into that member's `skillNames` (expanding a short `"post_message"` to `"slack.post_message"`), so the host's `connectorTools` resolver grants the tool by membership at runtime (SP5-gated). Grants are an allowlist — an ungranted connector action is still rejected, and a grant for one member doesn't satisfy another's skills.
+```ts
+defineBundle({
+  slug: "content-studio",
+  agents: [editor, reviewer],
+  connectorGrants: [{ agentSlug: "editor", provider: "slack", actions: ["post_message"] }],
+});
+```
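The closure checks described above can be pictured with a small standalone sketch. It is not the package's implementation: `DemoAgent`, `DemoGrant`, `DemoBundle`, and `validateDemoBundle` are hypothetical names, the agent shape (separate `skillNames` and `handles` arrays) is an assumption, and it returns a list of problems where the README says `defineBundle` fails loudly. Only the skill and delegate rules are modeled; rule, workflow, and credential-ref checks are left out.

```typescript
// Hypothetical shapes for illustration only; the real BundleSpec / AgentDef types differ.
interface DemoAgent { slug: string; skillNames: string[]; handles: string[]; delegates?: string[] }
interface DemoGrant { agentSlug: string; provider: string; actions: string[] }
interface DemoBundle { agents: DemoAgent[]; requires?: string[]; connectorGrants?: DemoGrant[] }

function validateDemoBundle(b: DemoBundle): string[] {
  const errors: string[] = [];
  const members = new Set(b.agents.map((a) => a.slug));
  const deps = new Set(b.requires ?? []);

  // Skill handles present anywhere on the bundle.
  const handles = new Set<string>();
  for (const a of b.agents) for (const h of a.handles) handles.add(h);

  for (const agent of b.agents) {
    // Grants are per member: expand short action names to "provider.action".
    const granted = new Set<string>();
    for (const g of b.connectorGrants ?? []) {
      if (g.agentSlug !== agent.slug) continue;
      for (const action of g.actions) {
        granted.add(action.includes(".") ? action : `${g.provider}.${action}`);
      }
    }
    for (const name of agent.skillNames) {
      if (!handles.has(name) && !granted.has(name)) {
        errors.push(`${agent.slug}: unresolved skill "${name}"`);
      }
    }
    for (const target of agent.delegates ?? []) {
      if (!members.has(target) && !deps.has(target)) {
        errors.push(`${agent.slug}: unknown delegate "${target}"`);
      }
    }
  }
  return errors;
}
```

Because the grant set is rebuilt per member, a grant for `editor` does not cover the same connector action on `reviewer`, which mirrors the allowlist behaviour described above.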
+**`mergeBundle(cfg, bundle)`** appends the bundle's agents + swarm-scoped rules/workflows to `cfg` (per-agent rules/workflows ride on the merged `AgentDef`s and are collected by `defineSwarm` as usual). A bundle whose member shadows an existing agent slug throws. Types: `BundleSpec`, `BundleDef`, `BundleDep`, `ConnectorGrant`.
+> **Scope.** A bundle is *source-level* composition: code you import and pass to `defineSwarm`, so its rules/workflows keep their normal runtime home (engine config) and its skill handles are present in-process. **BN0** (static composition) and **BN1** (connector grants) have shipped; DB-persisted versioning + cross-tenant distribution + *evolve-once → upgrade-downstream* are later slices (BN2 versioning; BN3 apply/upgrade). Design + roadmap: [`docs/superpowers/specs/2026-06-19-capability-bundles-design.md`](../../docs/superpowers/specs/2026-06-19-capability-bundles-design.md) and [`docs/bundles/README.md`](../../docs/bundles/README.md).
+### The run loop — `SwarmEngine`
+- **`new SwarmEngine(opts: EngineOpts)`** — the engine. `defineSwarm` constructs one for you; construct it directly only for advanced/test wiring. `EngineOpts` is the low-level superset of `SwarmConfig` (it also accepts a prebuilt `hooks` `HookDispatcher`, an injectable `floor`, a `resolveSkill`, etc.).
+- **`engine.run(input: RunInput, ctx: SwarmContext): AsyncIterable<SwarmEvent>`** — run one turn, yielding the typed event stream. `input.message` is the prompt; `input.context` is untrusted/advisory page context.
+- **`engine.resume(args, ctx): AsyncIterable<SwarmEvent>`** — resume a suspended run (after a `swarm.question`) with `{ runId, toolCallId, followupId, answer, context? }`. Continues the same thread; requires a durable `mastraStore` to survive across processes.
+- **`engine.history(threadId, ctx, opts?): Promise<SwarmMessage[]>`** — wall-safe persisted conversation (decodes the `[slug]`/`[user:id]` attribution prefixes). `[]` when stateless.
+- **`engine.listThreads(ctx, opts?): Promise<ThreadSummary[]>`** — the user's conversations (participation-based when the adapter supports it).
+- **`engine.listAgents(ctx): Promise<AgentSummary[]>`** — the tenant's agent roster (slug, display name, role, delegate graph).
+- **`engine.threadEvents(threadId, ctx): Promise<SwarmEvent[]>`** — the full ordered event log for a thread's container (rich timeline restore).
+- **`engine.activeRuns(container, ctx)` / `engine.scratchpadPublic(container, ctx)`** — in-flight runs and the public scratchpad for a conversation.
+- **`ReserveDenied`** — the typed error thrown when a `preGeneration` hook vetoes a model launch; mapped to a terminal `run_failed` stage `"reserve"`.
+### Events
+- **`ev(type, base, data)`** — typed `SwarmEvent` constructor; **`isEvent(e, type)`** — typed narrowing guard.
+- **`SwarmEvent`** (exported type) — the discriminated event union: `swarm.status`, `swarm.message`, `swarm.handoff`, `swarm.tool_call`, `swarm.tool_result`, `swarm.question`, `swarm.answer`, `swarm.usage`, `swarm.turn_usage`, `swarm.run_failed`.
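To illustrate how a narrowing guard over a discriminated event union behaves, here is a minimal standalone sketch. `DemoEvent`, `isDemoEvent`, and `totalCost` are hypothetical stand-ins, and the payload fields (`text`, `costUsd`, `stage`) are assumptions; the package's `SwarmEvent` and `isEvent` may be shaped differently.

```typescript
// Hypothetical, simplified event shapes; the real SwarmEvent union has more members and fields.
type DemoEvent =
  | { type: "swarm.message"; text: string }
  | { type: "swarm.usage"; costUsd: number }
  | { type: "swarm.run_failed"; stage: string };

// A typed narrowing guard: after a true result, `e` has the member type matching `type`.
function isDemoEvent<T extends DemoEvent["type"]>(
  e: DemoEvent,
  type: T,
): e is Extract<DemoEvent, { type: T }> {
  return e.type === type;
}

// Sum the cost of usage events; `e.costUsd` only type-checks inside the guarded branch.
function totalCost(events: DemoEvent[]): number {
  let total = 0;
  for (const e of events) {
    if (isDemoEvent(e, "swarm.usage")) total += e.costUsd;
  }
  return total;
}
```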
+### Human-in-the-loop `ask`
+Every agent gets a built-in `ask` tool (`ASK_TOOL_NAME`). When an agent calls it, the run **suspends** and the engine emits a `swarm.question` event (carrying the prompt and an optional rich `AskField` widget spec). The host answers by calling `engine.resume(...)`, which feeds `{ answer }` back. The same suspend/resume machinery powers the tool-approval gate and the cap-that-asks.
+### Storage (dev)
+- **`InMemoryStorage`** — a zero-config, single-process `StorageAdapter` for tests/dev. Implements the read-only `agents` repo plus runs/events/messages/scratchpad and `seedAgent` (so `defineSwarm` can seed code-defined agents). Not durable — use `@nightowlsdev/storage-supabase` in production.
+- Adapter contract types (from `./types`): `StorageAdapter`, `AgentRepo`, `VersionedRepo`, `RunStore`, `EventStore`, `MessageStore`, `ScratchpadStore`, plus `SwarmContext`, `AgentVersion`, `RunInput`, `SwarmMessage`, `ThreadSummary`, `AgentSummary`, `MemoryConfig`, `CompletionVerdict`.
+### Prompt + model helpers
+- **`GUARDRAILS`**, **`composeSystemPrompt(row)`** — the built-in safety preamble and per-agent system-prompt composer.
+- **`allowListModelProvider({ allow })`** — the `ModelProvider` that validates every resolved model id (including tier-routed ones) against the swarm's allow-list.
+### Subpath entry points
+- **`@nightowlsdev/core`** — everything above.
+- **`@nightowlsdev/core/test-utils`** — mock-model helpers for downstream tests (depends only on `ai/test` + `zod`, kept off the main barrel): `scriptedModel(scripts)` (an `ai/test` `MockLanguageModelV3` whose Nth `doStream` returns the Nth script), `textScript(chunks)` (a text-only generation script), `toolCallScript(toolName, toolCallId, input)` (a single tool-call generation), `partsStream(parts)`, and the `USAGE` fixture.
+## Configuration / Environment
+`@nightowlsdev/core` reads **no environment variables** — every behavior is configured via `SwarmConfig` (or the lower-level `EngineOpts`). The seams below are the ones most worth knowing.
+### `toolApproval` — the non-removable safety control (lead with this)
+`SwarmConfig.toolApproval?: ToolApprovalPolicy` forces human approval on side-effecting tools **regardless of a tool's own `needsApproval` flag**. It exists because spend caps limit *cost*, not *harm*: a consumer pack could ship a `needsApproval:false` $0.50 action that causes $50k of damage. It cannot be removed — when unset it defaults to `{ mode: "flag" }`.
+Two modes:
+- **`{ mode: "flag" }` (default)** — only tools declared `needsApproval:true` gate (suspend-and-ask). MCP tools default to `needsApproval:true`; first-party tools default to `false`.
+- **`{ mode: "all-side-effecting" }`** — force-ask **every non-read-only tool**: every MCP tool and every first-party tool not on the read-only allowlist. The safe default for an untrusted consumer pack. The exempt set defaults to `DEFAULT_READ_ONLY_TOOLS` (`ask`, `get_page_context`, `recall_lane`); override it via `readOnly`.
+`defineSwarm` bakes the policy into the hook dispatcher, which resolves each call to `allow` (run it), `deny` (block — the side effect never runs, the model gets a blocked result), or `ask` (suspend → `swarm.question` → resume; approve runs the side effect, reject blocks it). The `swarm.tool_call` event's `needsApproval` reflects whether the policy + flag will gate the call (so the UI can render an approval card).
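The two modes can be sketched as a pure decision function. This is an illustration under stated assumptions, not the dispatcher's code: `gateDecision`, `DemoPolicy`, and `DemoTool` are hypothetical names, `deny` (which comes from hooks) is not modeled, and treating a flagged tool on the read-only list as still gated is an assumption.

```typescript
type DemoOrigin = "first-party" | "mcp";
type DemoPolicy = { mode: "flag" } | { mode: "all-side-effecting"; readOnly?: string[] };
interface DemoTool { name: string; origin?: DemoOrigin; needsApproval?: boolean }

// Mirrors the default exempt set named above.
const DEMO_READ_ONLY = ["ask", "get_page_context", "recall_lane"];

function gateDecision(tool: DemoTool, policy: DemoPolicy = { mode: "flag" }): "allow" | "ask" {
  const origin = tool.origin ?? "first-party";
  // MCP tools default to needsApproval:true, first-party tools to false.
  const flagged = tool.needsApproval ?? (origin === "mcp");

  if (policy.mode === "flag") return flagged ? "ask" : "allow";

  // "all-side-effecting": every MCP tool asks, as does every first-party tool
  // that is not on the read-only allowlist.
  if (origin === "mcp") return "ask";
  const readOnly = policy.readOnly ?? DEMO_READ_ONLY;
  if (readOnly.includes(tool.name)) return flagged ? "ask" : "allow"; // assumption: the flag still gates
  return "ask";
}
```

For example, a hypothetical first-party `send_email` tool with no flag is allowed under `flag` but asks under `all-side-effecting`.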
+### `secrets` — run-scoped secret resolution
+`SwarmConfig.secrets?: SecretResolver` plugs a platform vault into the engine. When set, the engine scopes it per-run on the request context, so a first-party tool body can call `await ctx.secrets.resolve(ref)` (typed `BoundSecrets`) to fetch a tenant-scoped secret **at execution time**. The run's tenant/auth scope is captured by the binding — never passed by the tool — so a tool can never resolve another tenant's secret. Unset ⇒ `ctx.secrets.resolve(...)` yields `undefined` (no vault), never throws.
+### Tier router — Swift / Genius
+`SwarmConfig.models.tier?: TierConfig` enables cheap-default model routing layered over per-agent pinning:
+- `tiers.swift` (required) is the cheap default every non-pinned agent lands on; `tiers.genius` (optional) is the frontier model.
+- An agent opts into routing with a tier-sentinel `modelId` (`"tier:"`, `"tier:swift"`, `"tier:genius"`); a concrete `modelId` is kept verbatim (routing never overrides a pin).
+- `allowGenius` (default `false`) is the **server-enforced paid gate** — a pack/agent config cannot grant itself Genius. A Genius request with the gate closed **downgrades** to Swift so the run still proceeds cheaply.
+- An optional per-task `escalate(ctx)` hook may bump a generation to Genius, still subject to the gate.
+Helpers: **`resolveTier(modelId, cfg, ctx): TierResolution`** (the full routing result — effective `modelId`, landed `tier`, `downgraded`/`escalated`), **`tierModelId(modelId, cfg, ctx): string`** (the engine convenience: effective id; identity when no config), **`isTierSentinel(modelId)`**. Types: `ModelTier`, `ModelRef`, `TierConfig`, `TierResolution`, `TierEscalationContext`. The routed model is always re-validated by the allow-list.
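A rough sketch of the routing rules, assuming a simplified config shape. `resolveDemoTier`, `DemoTierConfig`, and `DemoResolution` are hypothetical; the `escalate` hook is omitted, and mapping every non-Genius sentinel (including `"tier:"`) to Swift is an assumption. The allow-list re-validation that follows routing is also not shown.

```typescript
interface DemoTierConfig { tiers: { swift: string; genius?: string }; allowGenius?: boolean }
interface DemoResolution { modelId: string; tier?: "swift" | "genius"; downgraded: boolean }

function resolveDemoTier(modelId: string, cfg?: DemoTierConfig): DemoResolution {
  // No config, or a concrete pin: keep the id verbatim.
  if (!cfg || !modelId.startsWith("tier:")) return { modelId, downgraded: false };

  const wantsGenius = modelId === "tier:genius";
  // The gate is closed unless the server explicitly opens it.
  if (wantsGenius && cfg.allowGenius === true && cfg.tiers.genius) {
    return { modelId: cfg.tiers.genius, tier: "genius", downgraded: false };
  }
  // Everything else lands on Swift; a refused Genius request is marked as downgraded.
  return { modelId: cfg.tiers.swift, tier: "swift", downgraded: wantsGenius };
}
```

With `allowGenius` unset, `resolveDemoTier("tier:genius", cfg)` lands on the Swift model with `downgraded: true`, so the run proceeds cheaply rather than failing.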
+### `cost` — caps + metering + the cap-that-asks
+`SwarmConfig.cost` carries `maxSteps` and `maxCostUsd` (per-run caps) plus metering:
+- `prices` statically overrides the built-in `PRICE_TABLE`; `priceFeed` is an optional live numbers-only seam; `failOnUnknownModel` (default `false`) makes an unpriced model throw instead of pricing at $0.
+- `perDelegate` adds optional per-delegate USD sub-budgets (`PerDelegateBudget`) so one runaway sub-agent can't burn the whole turn.
+- **`onCapHit`** (default `"stop"`) — with `"ask"`, a global cost/step-cap hit **pauses and asks** ("Budget cap reached — continue?") instead of terminally failing; approve raises `maxCostUsd` by **`capIncrementUsd`** (defaults to the original `maxCostUsd`) and continues, reject stops. A server-side opt-in for consumer runs.
+Cost helpers: **`CostGovernor`** (global step + USD cap enforcement, `addUsage`/`shouldStop`/`raiseCostCap`), **`DelegateBudgets`** (per-delegate sub-budgets), **`PRICE_TABLE`** (built-in, host-overridable model prices), **`priceUsage(prices, modelId, breakdown, opts?)`**, **`sumBreakdowns`**, **`sumTurnUsage`**. Types: `Price`, `PerDelegateBudget`, `UsageBreakdown`, `UsageCost`, `PriceFeed`, `PricingOpts`, `SlugUsage`, `TurnUsage`.
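The cap-that-asks arithmetic can be sketched with a toy governor. `DemoGovernor` is hypothetical and only borrows the method names listed above; counting one step per `addUsage` call and stopping at `>=` the cap are assumptions, not confirmed behaviour of `CostGovernor`.

```typescript
class DemoGovernor {
  private steps = 0;
  private spentUsd = 0;
  private capUsd: number;
  private readonly maxSteps: number;
  private readonly incrementUsd: number;

  constructor(maxSteps: number, maxCostUsd: number, capIncrementUsd?: number) {
    this.maxSteps = maxSteps;
    this.capUsd = maxCostUsd;
    // capIncrementUsd defaults to the original maxCostUsd.
    this.incrementUsd = capIncrementUsd ?? maxCostUsd;
  }

  // Record one generation's cost (assumption: each call counts as one step).
  addUsage(costUsd: number): void {
    this.steps += 1;
    this.spentUsd += costUsd;
  }

  // True once either the step cap or the USD cap has been reached.
  shouldStop(): boolean {
    return this.steps >= this.maxSteps || this.spentUsd >= this.capUsd;
  }

  // Called when the human approves "continue?": raise the USD cap by the increment.
  raiseCostCap(): number {
    this.capUsd += this.incrementUsd;
    return this.capUsd;
  }
}
```

With `maxCostUsd: 1.0` and no explicit increment, an approved continue raises the cap to 2.0, then 3.0 on the next approval, and so on.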
+### Container floor — per-lane turn serialization
+`EngineOpts.floor?: ContainerFloor` serializes runs per lane (so two agents in the same lane don't interleave, while different lanes run in parallel). **`containerFloor`** is the default in-memory process singleton; **`InMemoryContainerFloor`** is its class. For serverless/multi-instance deploys, pass a Postgres-backed floor (`createPostgresFloor` from `@nightowlsdev/storage-supabase`). Types: `ContainerFloor`, `FloorHolder`, `Release`.
+### Rate limiting — fixed-window primitive
+A generic, dependency-free fixed-window limiter — the building block for an abuse gateway. It knows nothing about tenants or billing; you supply the key. Wire it into the `preGeneration` / `preToolCall` hooks to cap a tenant's generations or side-effecting tool calls per window (the layer the per-run caps — `maxSteps`, `maxCostUsd` — can't cover).
+- **`createInMemoryRateLimitStore(): RateLimitStore`** — a real single-instance store (`hit(key, cfg, nowSec)` → decision), pruning expired windows opportunistically. A horizontally-scaled deploy backs the same `RateLimitStore` interface with Redis/Postgres.
+- **`decideFixedWindow(prev, cfg, nowSec)`** — the pure decision (no I/O): returns `{ decision, state }` so you can persist the next window state in any store.
+- **`rateConfig(max, windowSec, fallbackMax)`** — build a `{ windowSec, max }` config from an env value, clamped to a sane positive integer.
+- Types: `RateLimitConfig`, `RateLimitState`, `RateLimitDecision`, `RateLimitStore`.
+```ts
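+import { createInMemoryRateLimitStore, rateConfig, deny } from "@nightowlsdev/core";
+const limiter = createInMemoryRateLimitStore();
+// max is read from GEN_PER_MIN; 60 is windowSec and 30 is fallbackMax.
+const GEN = rateConfig(Number(process.env.GEN_PER_MIN), 60, 30);
+// inside a preGeneration hook (`ev` is the hook's event argument):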
+const rl = await limiter.hit(`gen:${ev.tenantId}`, GEN, Math.floor(Date.now() / 1000));
+if (!rl.allow) return deny(`rate limit — retry in ${rl.resetSec}s`);
+```
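For readers who want to see what a pure fixed-window decision looks like, here is a self-contained sketch. It is not `decideFixedWindow` itself: `decideDemoWindow` and the `windowStart`, `count`, and `remaining` fields are assumptions, and aligning windows to multiples of `windowSec` is one common choice among several.

```typescript
interface DemoRateConfig { windowSec: number; max: number }
interface DemoRateState { windowStart: number; count: number }
interface DemoRateDecision { allow: boolean; remaining: number; resetSec: number }

// Pure function: no I/O, so the caller can persist `state` in any store.
function decideDemoWindow(
  prev: DemoRateState | undefined,
  cfg: DemoRateConfig,
  nowSec: number,
): { decision: DemoRateDecision; state: DemoRateState } {
  // Windows are aligned to multiples of windowSec.
  const windowStart = Math.floor(nowSec / cfg.windowSec) * cfg.windowSec;
  // A state from an earlier window is expired, so its count resets to zero.
  const count = prev && prev.windowStart === windowStart ? prev.count : 0;
  const allow = count < cfg.max;
  const next = allow ? count + 1 : count;
  return {
    decision: {
      allow,
      remaining: Math.max(0, cfg.max - next),
      resetSec: windowStart + cfg.windowSec - nowSec, // seconds until the window rolls over
    },
    state: { windowStart, count: next },
  };
}
```

Keeping the decision pure means the same function could back an in-memory map, Redis, or Postgres; only the load and save of `state` changes.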
+### Decision/observer hooks (re-exported from `@nightowlsdev/hooks`)
+`@nightowlsdev/core` re-exports the engine-free hook substrate so hosts configure `SwarmConfig.hooks` without a second import:
+- **`defineHook(hooks)`** — typed-identity helper to author a `SwarmHooks` bundle.
+- **`HookDispatcher` / `createHookDispatcher(hooks?, policy?)`** — the dispatcher `defineSwarm` builds (combines the tool-approval policy + per-tool flag + the optional `preToolCall` hook); decision hooks are **fail-closed** (a throw ⇒ deny).
+- Decision constructors/constants: **`deny(reason)`**, **`ask(reason?)`**, **`ALLOW`**, **`ALLOW_TOOL`**, **`DEFAULT_READ_ONLY_TOOLS`**.
+- Types: `HookDecision`, `ToolDecision`, `SwarmHooks`, `PreGenerationEvent`/`PreGenerationHook`, `ToolPreCallEvent`/`PreToolCallHook`, `ToolApprovalPolicy`, `GuardMutationEvent`/`GuardMutationHook`.
+`SwarmConfig.hooks` exposes `preGeneration` (awaited before each model launch — the platform billing-reserve seam; a `deny` vetoes the generation), `preToolCall` (the richer action-approval gate on top of `toolApproval`), and `guardDefinitionMutation` (gate who may publish/rollback an agent definition).
+### `onEvent` — the per-event observer
+`SwarmConfig.onEvent?: (ev, ctx) => void | Promise<void>` is a best-effort, transport-agnostic observer fired after each event is persisted (in `run` and `resume`). It is awaited but fail-safe (a throw is swallowed). This is where platform metering lives — debit on `swarm.turn_usage`, settle on a terminal. With `preGeneration` it forms the two halves of a credit ledger.
+### `verifyCompletion` — the completion supervisor
+`SwarmConfig.verifyCompletion?: CompletionVerifier` is a host-supplied check (typically a cheap LLM judge) fired when a turn would end: given the original request + a transcript, it returns `{ complete, missing? }`. If incomplete, the engine re-nudges the orchestrator with the specific gap; if it still can't finish, the run ends `run_failed` stage `"incomplete"` (refundable) instead of a silent `done`. Fail-safe (a throw ⇒ treated as complete). Omit ⇒ the cheap structural "did the root speak last?" fallback nudge.
+### Host-owned billing (reference)
+The engine exposes neutral seams (`preGeneration`, `onEvent`, `verifyCompletion`, `cost`) — it meters quantities but never prices them. Pricing, wallets, credits, and refund policies are host concerns. For a worked reference covering all five patterns (prepaid credit ledger, `onEvent` debit/settle/refund observer, per-agent attribution, model-tier wiring, and completion judge), see [`../../examples/billing-reference/README.md`](../../examples/billing-reference/README.md). Those patterns are **host-owned** (implemented in `apps/getnightowls`) and are not part of any `@nightowlsdev/*` API.
+### `customAuth` — wrap an auth function
+`customAuth(fn)` wraps an `authenticate(req)` function into an `AuthProvider`, the shape the runner uses to resolve server-side identity. Identity always comes from the server, never from tool args or page context.
+### Telemetry
+`SwarmConfig.telemetry?: TelemetryExporter | TelemetryExporter[]` collects `run`/`generation`/`tool`/`recall` spans, batch-exported once per run (best-effort — a throwing exporter never breaks a run). Helpers: **`customTelemetry(fn)`**, **`compositeTelemetry(exporters)`**, **`resolveTelemetry(t)`**, **`CapturingExporter`** (test/in-memory), **`SpanCollector`**.
+### Other config options
+`storage` (required `StorageAdapter`), `models.allow` (required), `modelFactory` (required), `mastraStore` (durable suspend/resume snapshot store — required for cross-process HITL), `memory` (opt-in conversational `MemoryConfig`), `pageContext`, `scratchpad`, and `recallLane` (opt-in collaboration tools). See the inline doc comments on `SwarmConfig`/`EngineOpts` for the full contract.