@namzu/sdk 0.1.5 → 0.1.6
- package/CHANGELOG.md +16 -0
- package/README.md +314 -669
- package/dist/bridge/tools/connector/adapter.d.ts +2 -2
- package/dist/config/runtime.d.ts +52 -52
- package/dist/connector/builtins/webhook.d.ts +1 -1
- package/dist/contracts/a2a.d.ts +125 -125
- package/dist/contracts/schemas.d.ts +34 -34
- package/dist/index.d.ts +2 -0
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +2 -0
- package/dist/index.js.map +1 -1
- package/dist/tools/builtins/__tests__/computer-use.test.d.ts +2 -0
- package/dist/tools/builtins/__tests__/computer-use.test.d.ts.map +1 -0
- package/dist/tools/builtins/__tests__/computer-use.test.js +146 -0
- package/dist/tools/builtins/__tests__/computer-use.test.js.map +1 -0
- package/dist/tools/builtins/__tests__/structuredOutput.example.d.ts +10 -10
- package/dist/tools/builtins/computer-use.d.ts +185 -0
- package/dist/tools/builtins/computer-use.d.ts.map +1 -0
- package/dist/tools/builtins/computer-use.js +151 -0
- package/dist/tools/builtins/computer-use.js.map +1 -0
- package/dist/tools/builtins/index.d.ts +1 -0
- package/dist/tools/builtins/index.d.ts.map +1 -1
- package/dist/tools/builtins/index.js +1 -0
- package/dist/tools/builtins/index.js.map +1 -1
- package/dist/tools/builtins/ls.d.ts +1 -1
- package/dist/types/computer-use/index.d.ts +74 -0
- package/dist/types/computer-use/index.d.ts.map +1 -0
- package/dist/types/computer-use/index.js +35 -0
- package/dist/types/computer-use/index.js.map +1 -0
- package/dist/types/plugin/index.d.ts +14 -14
- package/dist/types/sandbox/index.d.ts +2 -2
- package/dist/types/verification/index.d.ts +18 -18
- package/package.json +19 -21
- package/src/index.ts +5 -0
- package/src/tools/builtins/__tests__/computer-use.test.ts +188 -0
- package/src/tools/builtins/computer-use.ts +165 -0
- package/src/tools/builtins/index.ts +1 -0
- package/src/types/computer-use/index.ts +126 -0
package/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
|
-
# Namzu
|
|
1
|
+
# Namzu — An Operating System for AI Agents
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
**A dependency-free, fully controllable kernel that runs, isolates, schedules, remembers, and coordinates AI agents.** Your UI, chat interface, voice surface, CLI, or automation pipeline sits on top of a stable kernel instead of reinventing the hard parts: sandbox isolation, process lifecycle, signals, checkpoints, memory, protocol interop, and audit.
|
|
4
4
|
|
|
5
5
|
[](https://www.npmjs.com/package/@namzu/sdk)
|
|
6
6
|
[](https://github.com/cogitave/namzu/actions/workflows/ci.yml)
|
|
@@ -8,831 +8,476 @@ Open-source AI agent SDK with a built-in runtime. Nothing between you and your a
|
|
|
8
8
|
[](https://www.typescriptlang.org/)
|
|
9
9
|
[](https://nodejs.org/)
|
|
10
10
|
|
|
11
|
-
|
|
11
|
+
---
|
|
12
12
|
|
|
13
|
-
##
|
|
13
|
+
## The Thesis
|
|
14
14
|
|
|
15
|
-
|
|
15
|
+
Most "agent frameworks" today are really application frameworks. They ship chat UIs, opinionated UI layouts, batteries-included hosted dashboards, vendor-specific fast paths, and integration drivers for a handful of databases. You get something you can demo in an hour, and three months later you own a stack where the same framework dictates your frontend, your database, your observability, and your model vendor.
|
|
16
16
|
|
|
17
|
-
|
|
17
|
+
We think agent software should be layered like Unix. At the bottom there needs to be a **kernel**: something to isolate processes, schedule tool calls, manage memory pressure, propagate signals across a call tree, persist checkpoints so a run can resume after a crash, mediate inter-process communication, and produce an auditable event stream. Above the kernel there is user space — shells, editors, IDEs, voice gateways, React apps. The kernel does not care which shell you pick; the shell cannot break the isolation the kernel provides.
|
|
18
18
|
|
|
19
|
-
**
|
|
19
|
+
**Namzu is the kernel.** It runs agents the way Unix runs processes. It does not render UI, it does not pick your database, it does not favor one LLM vendor. It gives you a surface — typed, versioned, documented — that any UI, any storage backend, and any model can plug into. The surface is small and stable; the guts underneath are deep.
|
|
20
20
|
|
|
21
|
-
|
|
21
|
+
---
|
|
22
22
|
|
|
23
|
-
|
|
23
|
+
## What Namzu Is
|
|
24
|
+
|
|
25
|
+
Namzu is a single-process TypeScript kernel with the following responsibilities:
|
|
26
|
+
|
|
27
|
+
- **Process execution and isolation.** Tools run inside OS-level sandboxes: Seatbelt (SBPL) on macOS, mount + PID namespaces on Linux. Deny-default file I/O, scoped network access, enforced resource limits. No Docker, no container runtime, no daemon, no sidecar.
|
|
28
|
+
- **Agent lifecycle.** Parent/child agent spawn with depth tracking, budget splitting, and causal trace linkage. A supervisor can fork a subtree of agents and get their results back, with each child isolated from its siblings.
|
|
29
|
+
- **Scheduling.** Per-run token, cost, wall-clock, and iteration budgets. Limit checker, task router (cheap model for compaction, expensive for coding), tool tiering (LLM learns to prefer cheaper tools first).
|
|
30
|
+
- **Signals.** `AbortController` tree spanning parent and children. `cancel(taskId)` and `cancelAll(parentRunId)` propagate. Runs can be paused and resumed, aborted cleanly, and emit lifecycle events for every transition.
|
|
31
|
+
- **Memory management.** Working memory via structured compaction to a typed `WorkingState`. Long-term memory via an indexed, tag/query/status-searchable store with disk persistence. No vector database required by default.
|
|
32
|
+
- **Durability.** Atomic per-iteration checkpoints, automatic emergency core-dump on SIGINT/SIGTERM, separate storage for runs, threads, conversations, activities, memories, and tasks.
|
|
33
|
+
- **IPC.** Native A2A (Google agent-to-agent) and MCP (Anthropic Model Context Protocol) — both client and server, one SDK. An internal event bus with circuit breakers, file lock manager, and edit ownership tracking so concurrent agents do not stomp on each other.
|
|
34
|
+
- **Capability system.** Tools are first-class, typed, permissioned, and progressively disclosed. The LLM does not see the full tool catalog; tools start deferred, get activated on demand, and can be suspended. Each tool declares `readOnly`, `destructive`, `concurrencySafe`, `permissions`, `category`.
|
|
35
|
+
- **Syscall filtering.** Every tool call goes through a verification gate — allow / deny / ask, with built-in rules for read-only allowlist and dangerous pattern deny-list, plus custom regex rules. This is separate from sandbox isolation; it is the decision layer, the sandbox is the enforcement layer.
|
|
36
|
+
- **Retrieval-augmented context (RAG).** A full pipeline: chunking, embedding providers, ingestion, knowledge base storage, vector store, retriever, context assembler, and a first-class `rag-tool`.
|
|
37
|
+
- **Skills.** Disclosure-tiered capability bundles that the agent can load on demand, distinct from tools.
|
|
38
|
+
- **Personas.** YAML-defined identity, expertise, reflexes, and output format with inheritance — specialize a base persona by merging a single field, no prompt concatenation.
|
|
39
|
+
- **Advisory system.** Mid-execution consultation with specialized advisors. Provider-agnostic: put a security advisor on Bedrock, an architecture advisor on OpenRouter, and let the main agent decide when to consult whom.
|
|
40
|
+
- **Human-in-the-loop.** Structured plan review, per-tool approval with destructiveness flags, typed decision contracts, checkpoint/resume across sessions.
|
|
41
|
+
- **Plugin system.** Lifecycle-hooked plugin loader with MCP contributions, tool contributions, and manifest-driven resolution.
|
|
42
|
+
- **Multi-tenant isolation from day one.** Connector registries, vaults, config, and stores are tenant-scoped. Two organizations can share a process without cross-contamination.
|
|
43
|
+
- **Provider abstraction.** OpenRouter and AWS Bedrock today; the `Provider` interface is narrow enough that adding another vendor is an afternoon. BYOK everywhere, no hidden hot paths for any vendor.
|
|
44
|
+
- **Telemetry.** OpenTelemetry-native spans and metrics. Cost accounting (input tokens, output tokens, cached tokens, cache write tokens, cache discount) flows from the provider into per-run, per-tenant rollups.
|
|
45
|
+
- **Prompt cache integration.** Hash-based system-prompt cache per thread, integrated with provider cache controls (OpenRouter `cacheControl` today, more planned), plus full cache telemetry in every run.
|
|
46
|
+
- **Vault.** BYOK credentials and secrets, tenant-scoped, pluggable backend.
|
|
47
|
+
- **Thread / Run separation.** Conversations (thread: user ↔ assistant messages across sessions) are cleanly separated from runs (tool calls, iterations, internal state). Multi-turn dialogs carry only the context that matters.
|
|
48
|
+
|
|
49
|
+
Every one of those bullets points at code that exists today in `src/`. The architecture is deep even where the surface is quiet.
|
|
50
|
+
|
|
51
|
+
## What Namzu Is Not
|
|
52
|
+
|
|
53
|
+
Equally important for scoping expectations:
|
|
54
|
+
|
|
55
|
+
- **Not a chat SDK.** There are no React, Svelte, or Vue hooks, no generative UI components, no `useChat`. Your UI framework is your choice; the kernel hands you a typed event stream.
|
|
56
|
+
- **Not a hosted service.** There is no dashboard, no Namzu Cloud, no billing page. You run it in your own process.
|
|
57
|
+
- **Not a deployment adapter.** No Next.js, Hono, Express, or Cloudflare Workers plumbing in the kernel. Those belong in separate packages or your own infra code.
|
|
58
|
+
- **Not a dev studio.** No bundled playground UI. A playground that consumes the kernel's event protocol could exist as a separate tool; it would not live inside `@namzu/sdk`.
|
|
59
|
+
- **Not a vector database.** RAG ships with a pluggable `VectorStore` interface, but the kernel does not embed pgvector or Pinecone. Bring your own.
|
|
60
|
+
- **Not an LLM router service.** Task routing is an in-process policy, not a hosted service.
|
|
61
|
+
- **Not a prompt management UI.** Personas are code-defined (YAML files in your repo), not database rows behind a web form.
|
|
62
|
+
|
|
63
|
+
The goal of that list is not to be minimal — the kernel is plenty rich. The goal is to keep the kernel's **interface surface** small and stable so the layers above can move fast without breaking what is underneath.
|
|
24
64
|
|
|
25
|
-
|
|
65
|
+
---
|
|
26
66
|
|
|
27
|
-
|
|
67
|
+
## The Complete Feature Map
|
|
68
|
+
|
|
69
|
+
How Namzu compares, category by category. The Category row tells you what job each project actually does.
|
|
70
|
+
|
|
71
|
+
| | **Namzu** | LangGraph | CrewAI | Mastra | Vercel AI SDK | OpenAI Agents SDK |
|---|---|---|---|---|---|---|
| Category | **Agent Kernel** | Graph framework | Crew framework | TS app framework | Frontend-first SDK | Vendor SDK |
| Language | TypeScript | Python/JS | Python | TypeScript | TypeScript | Python/JS |
| Process sandbox (OS-level) | ✅ Seatbelt + NS | ❌ | ❌ | ❌ | ❌ | ❌ |
| Multi-tenant from day 1 | ✅ | ❌ | ❌ | partial | ❌ | ❌ |
| Sub-agent spawn (`fork`/`exec`) | ✅ parent/child/depth/budget | via graph | crews | ✅ | ❌ | handoffs |
| Signal propagation tree | ✅ AbortController + cancelAll | ❌ | ❌ | partial | ❌ | ❌ |
| Checkpoint + resume | ✅ per-iteration | ✅ per-superstep | ❌ | partial | ❌ | sessions |
| Emergency save on signal | ✅ `EmergencySaveManager` | ❌ | ❌ | ❌ | ❌ | ❌ |
| Resource quotas (token / cost / time) | ✅ per run + per child | manual | manual | manual | ❌ | manual |
| Provider prompt cache wired | ✅ `ContextCache` + telemetry | ❌ | ❌ | partial | ❌ | ✅ |
| Thread ↔ Run separation | ✅ | ❌ | ❌ | ✅ | ❌ | partial |
| Native A2A protocol | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Native MCP (client + server) | ✅ | plugin | ❌ | ✅ | ❌ | client only |
| RAG built into the kernel | ✅ full pipeline | via integrations | via integrations | plugin | ❌ | via tools |
| Persona inheritance (YAML) | ✅ merge-based | ❌ | role strings | partial | ❌ | instructions |
| Advisory system (multi-advisor) | ✅ provider-agnostic | ❌ | ❌ | ❌ | ❌ | ❌ |
| Structured context compaction | ✅ WorkingState | ❌ | ❌ | partial | ❌ | ❌ |
| Tool tiering (cost-aware) | ✅ user-defined | ❌ | ❌ | ❌ | ❌ | ❌ |
| Task routing (per-task model) | ✅ fallback chains | manual | manual | manual | ❌ | manual |
| Progressive tool disclosure | ✅ deferred/active/suspended | ❌ | ❌ | ❌ | ❌ | ❌ |
| Tool-call verification gate | ✅ allow/deny/ask + custom | ❌ | task-level scope | ❌ | tool approval | ❌ |
| File ownership / edit locking | ✅ `EditOwnershipTracker` | ❌ | ❌ | ❌ | ❌ | ❌ |
| Circuit breakers on the bus | ✅ `CircuitBreaker` | ❌ | ❌ | ❌ | ❌ | ❌ |
| Skills system (separate from tools) | ✅ disclosure-tiered | ❌ | ❌ | ❌ | ❌ | ❌ |
| Plugin system with lifecycle | ✅ | ❌ | ❌ | partial | ❌ | ❌ |
| Vault / BYOK | ✅ tenant-scoped | ❌ | ❌ | ❌ | ❌ | ❌ |
| Telemetry (OpenTelemetry) | ✅ native | via LangSmith | CrewAI+ | partial | ❌ | built-in tracing |
| Provider lock-in | none | low | low | low | low | OpenAI-first |
|
|
28
101
|
|
|
29
|
-
|
|
102
|
+
---
|
|
30
103
|
|
|
31
|
-
|
|
104
|
+
## Architecture in Depth — Every Subsystem
|
|
32
105
|
|
|
33
|
-
|
|
106
|
+
Every folder under `src/` maps to a traditional OS concept. This section walks them one by one, in the order a request actually flows.
|
|
34
107
|
|
|
35
|
-
|
|
108
|
+
### 1. The Boundary: Sandbox (`sandbox/`)
|
|
36
109
|
|
|
37
|
-
|
|
110
|
+
Sandboxing is the foundation. Tools do not execute in the host process; they execute inside an OS-enforced jail with deny-default file I/O and scoped network access. macOS uses Seatbelt profiles in SBPL format (the same mechanism Apple's own apps use). Linux uses lightweight mount + PID namespaces, without requiring Docker, systemd, or any container runtime. The `SandboxProvider` abstraction (`sandbox/factory.ts`, `sandbox/provider/`) means you can swap in a Firecracker or gVisor-backed provider later without touching the rest of the kernel.
|
|
38
111
|
|
|
39
|
-
|
|
112
|
+
The kernel enforces memory, timeout, and max-process limits on top of whatever the sandbox gives you. The goal: a rogue or hallucinated tool call should never wipe your filesystem or exfiltrate arbitrary data, even if the LLM tries very hard.
|
|
40
113
|
|
|
41
|
-
|
|
42
|
-
|---|---|---|---|---|---|
|
|
43
|
-
| Language | TypeScript | Python/JS | Python | Python/JS | TypeScript |
|
|
44
|
-
| Provider lock-in | None (BYOK) | Low | Low | Optimized for OpenAI | Low |
|
|
45
|
-
| **Process sandbox** | **Native (Seatbelt + NS)** | No | No | No | No |
|
|
46
|
-
| Agent patterns | 4 (reactive, pipeline, router, supervisor) | Graph-based | Role-based crews | Handoffs | Single-agent |
|
|
47
|
-
| A2A protocol | Native | No | No | No | No |
|
|
48
|
-
| MCP support | Native (client + server) | Plugin | No | Client only | No |
|
|
49
|
-
| Multi-tenant | Built-in | No | No | No | No |
|
|
50
|
-
| Thread/Run separation | Yes | No | No | Sessions | No |
|
|
51
|
-
| Plan review (HITL) | Structured | Graph interrupts | No | Basic | Tool approval |
|
|
52
|
-
| Persona inheritance | Yes | No | Role strings | Instructions | System prompt |
|
|
53
|
-
| Progressive tool loading | Yes | No | No | No | No |
|
|
54
|
-
| Advisory system | Multi-advisor, provider-agnostic | No | No | No | No |
|
|
55
|
-
| Context compaction | Structured WorkingState | No | No | No | No |
|
|
56
|
-
| Tool tiering | Configurable, user-defined | No | No | No | No |
|
|
57
|
-
| Task routing | Per-task model selection | No | No | No | No |
|
|
58
|
-
| RAG built-in | Full pipeline | Via integrations | Via integrations | Via tools | No |
|
|
59
|
-
| Telemetry | OpenTelemetry | LangSmith | CrewAI+ | Built-in tracing | No |
|
|
60
|
-
|
|
61
|
-
### Architecture Quality Scores
|
|
62
|
-
|
|
63
|
-
> **Note:** These scores are AI-generated based on public documentation, community feedback, and (for Namzu) direct codebase analysis. They are not official benchmarks — treat them as an informed architectural comparison, not a definitive ranking. We tried to be fair; if you disagree with a score, open an issue and let's discuss.
|
|
114
|
+
### 2. Interprocess Communication: Bridge (`bridge/`) and Bus (`bus/`)
|
|
64
115
|
|
|
65
|
-
|
|
66
|
-
|---|---|---|---|---|---|
|
|
67
|
-
| Type Safety | 9 | 5 | 7 | 7 | 9 |
|
|
68
|
-
| Modularity | 9 | 5 | 7 | 8 | 9 |
|
|
69
|
-
| Interface Segregation | 8 | 4 | 6 | 8 | 8 |
|
|
70
|
-
| Extensibility | 9 | 7 | 6 | 6 | 7 |
|
|
71
|
-
| Convention Consistency | 8 | 5 | 7 | 8 | 8 |
|
|
72
|
-
| Dependency Direction | 9 | 4 | 6 | 8 | 8 |
|
|
73
|
-
| **Overall** | **8.7** | **5.0** | **6.5** | **7.5** | **8.2** |
|
|
116
|
+
Two layers here, with different jobs.
|
|
74
117
|
|
|
75
|
-
**
|
|
118
|
+
**Bridge** is cross-process and cross-agent communication. The `bridge/a2a/` folder speaks Google's Agent-to-Agent protocol: your agents can publish agent cards describing their capabilities and can discover and invoke other agents' capabilities. The `bridge/mcp/` folder speaks Anthropic's Model Context Protocol, both as a client (consume MCP servers as tools) and as a server (expose your Namzu tools to any MCP-speaking agent). The `bridge/sse/` folder contains the event mapper that turns in-process events into Server-Sent Events for any consumer on the other side of HTTP. `bridge/tools/` wires it all into the tool system.
|
|
76
119
|
|
|
77
|
-
|
|
120
|
+
**Bus** is in-process. This is where the kernel's internal nervous system lives. The bus emits typed `AgentBusEvent`s for every meaningful transition: run started, iteration begun, checkpoint created, tool call dispatched, tool result returned, agent paused, agent canceled, plan requested, plan approved, error thrown. On top of raw event fan-out, the bus offers three kernel-grade primitives:
|
|
78
121
|
|
|
79
|
-
-
|
|
122
|
+
- **`CircuitBreaker`** (`bus/breaker.ts`) cuts a flapping agent off from the bus. If an agent's run keeps failing, the breaker trips and prevents retry storms. Both the failure threshold and the reset timeout are configurable.
|
|
123
|
+
- **`FileLockManager`** (`bus/lock.ts`) holds locks on files across concurrent agents. A child cannot acquire a lock its parent or sibling already holds. Acquisition timeout is enforced.
|
|
124
|
+
- **`EditOwnershipTracker`** (`bus/ownership.ts`) records which run last claimed ownership of a path, emits events on contention, and lets a HITL layer decide who wins. When two agents try to edit the same file, one of them is told to wait or re-plan.
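To make the breaker behaviour in the list above concrete, here is a minimal sketch of a failure-threshold breaker with a reset timeout. It is not the SDK's `CircuitBreaker`: the class name `SketchBreaker`, its method names, and the injected clock are all assumptions made for illustration.

```typescript
// Illustrative stand-in; not the SDK's `CircuitBreaker` implementation.
class SketchBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  // `now` is injectable so the timeout can be exercised without real waiting.
  constructor(
    failureThreshold: number,
    resetTimeoutMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // Returns true when a new attempt may proceed.
  canAttempt(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.resetTimeoutMs) {
      // Reset timeout elapsed: let traffic through and start counting afresh.
      this.openedAt = null;
      this.failures = 0;
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.failures = 0;
  }

  // Trips the breaker once the consecutive-failure count reaches the threshold.
  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}

// Usage with a fake clock: trip after 3 failures, recover after 1000 ms.
let fakeTime = 0;
const breaker = new SketchBreaker(3, 1_000, () => fakeTime);
breaker.recordFailure();
breaker.recordFailure();
const allowedBeforeTrip = breaker.canAttempt();
breaker.recordFailure();
const allowedAfterTrip = breaker.canAttempt();
fakeTime = 1_000;
const allowedAfterReset = breaker.canAttempt();
```

Injecting the clock is a design choice of this sketch only; it keeps the trip and reset logic deterministic when tested.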
|
|
80
125
|
|
|
81
|
-
|
|
126
|
+
These exist because the moment you have more than one agent running in parallel against a shared filesystem, you need the kernel to arbitrate. Most frameworks either do not have parallelism or leave it to user space; Namzu treats it as a first-class kernel concern.
|
|
82
127
|
|
|
83
|
-
|
|
128
|
+
### 3. Process Lifecycle: Manager (`manager/`)
|
|
84
129
|
|
|
85
|
-
|
|
130
|
+
`manager/agent/lifecycle.ts` is the `fork()` + `exec()` + `waitpid()` of the kernel. When a parent agent (say a `SupervisorAgent`) spawns a child, the lifecycle manager:
|
|
86
131
|
|
|
87
|
-
|
|
132
|
+
- Allocates a slice of the parent's token budget, timeout budget, and cost budget to the child
|
|
133
|
+
- Creates a child `AbortController` linked to the parent's
|
|
134
|
+
- Builds a child config via the agent definition's `configBuilder(factoryOptions)`
|
|
135
|
+
- Stamps the child with `parentAgentId`, `parentRunId`, `threadId`, and `depth`
|
|
136
|
+
- Registers the child task in an internal `TaskRegistry` keyed by `TaskId`
|
|
137
|
+
- Emits `agent_pending` on the bus with parent/child/depth metadata
|
|
138
|
+
- Forwards every child event to the parent's run listener so the supervisor sees what its subtree is doing
|
|
88
139
|
|
|
89
|
-
|
|
140
|
+
When the parent is cancelled (by HITL, by a limit breach, or by an external signal), `cancelAll(parentRunId)` walks the subtree and aborts every descendant. This is the kernel's analogue of sending `SIGKILL` to a Unix process group.
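The subtree walk can be sketched with plain `AbortController`s. This is a hedged illustration rather than the lifecycle manager's code: the `AbortTree` class, its `register` method, and the choice to abort descendants but not the parent itself are assumptions; only the name `cancelAll(parentRunId)` comes from the text above.

```typescript
// Hypothetical sketch; names other than `cancelAll` are illustrative.
type RunId = string;

class AbortTree {
  private readonly controllers = new Map<RunId, AbortController>();
  private readonly children = new Map<RunId, RunId[]>();

  // Register a run and, when it has a parent, link it into the parent's subtree.
  register(runId: RunId, parentRunId?: RunId): AbortSignal {
    const controller = new AbortController();
    this.controllers.set(runId, controller);
    if (parentRunId !== undefined) {
      const siblings = this.children.get(parentRunId) ?? [];
      siblings.push(runId);
      this.children.set(parentRunId, siblings);
    }
    return controller.signal;
  }

  // Abort every descendant of `parentRunId` and return how many were aborted.
  cancelAll(parentRunId: RunId): number {
    let aborted = 0;
    const stack: RunId[] = [...(this.children.get(parentRunId) ?? [])];
    while (stack.length > 0) {
      const id = stack.pop() as RunId;
      const controller = this.controllers.get(id);
      if (controller !== undefined && !controller.signal.aborted) {
        controller.abort();
        aborted += 1;
      }
      stack.push(...(this.children.get(id) ?? []));
    }
    return aborted;
  }
}

const tree = new AbortTree();
const rootSignal = tree.register("run-root");
const childSignal = tree.register("run-child", "run-root");
const grandchildSignal = tree.register("run-grandchild", "run-child");
const abortedCount = tree.cancelAll("run-root");
```

Each tool or provider call inside a run would then take the run's `AbortSignal` and stop when it fires, so cancellation stays cooperative at the leaf level.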
|
|
90
141
|
|
|
91
|
-
|
|
142
|
+
`manager/connector/` manages the lifecycle of external connectors (MCP servers, HTTP connectors). `manager/plan/lifecycle.ts` coordinates HITL plan review. `manager/run/persistence.ts` is the run-level persistence surface, and `manager/run/emergency.ts` is the emergency-save subsystem (see §9 below).
|
|
92
143
|
|
|
93
|
-
###
|
|
144
|
+
### 4. Scheduling: Router (`router/`), Execution (`execution/`), Limit Checker (`run/LimitChecker.ts`)
|
|
94
145
|
|
|
95
|
-
|
|
146
|
+
The router policy (`router/task-router.ts`) decides which model a task should go to. Compaction and summarization go to cheap models; coding and complex reasoning stay on expensive ones. Tiering is user-defined — you decide which models belong in which tier and what guidance the LLM gets about preferring tier-1 tools first.
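As a rough sketch of what such a routing policy could look like, the function below picks the first available model from an ordered fallback chain per task kind. The task kinds, the `RoutingTable` shape, and the placeholder model names are assumptions for illustration and are not taken from `router/task-router.ts`.

```typescript
// Hypothetical routing sketch; types and model names are placeholders.
type TaskKind = "compaction" | "summarization" | "coding" | "reasoning";

// Each task kind maps to an ordered fallback chain of model identifiers.
type RoutingTable = Record<TaskKind, string[]>;

function routeTask(
  kind: TaskKind,
  table: RoutingTable,
  isAvailable: (model: string) => boolean,
): string | null {
  for (const model of table[kind]) {
    if (isAvailable(model)) return model;
  }
  // No model in the chain is usable; the caller decides how to fail.
  return null;
}

const routingTable: RoutingTable = {
  compaction: ["cheap-model-a", "cheap-model-b"],
  summarization: ["cheap-model-a"],
  coding: ["expensive-model-a", "cheap-model-b"],
  reasoning: ["expensive-model-a"],
};

// Pretend one cheap model is currently unavailable.
const unavailable = new Set<string>(["cheap-model-a"]);
const modelIsAvailable = (model: string): boolean => !unavailable.has(model);

const compactionModel = routeTask("compaction", routingTable, modelIsAvailable);
const codingModel = routeTask("coding", routingTable, modelIsAvailable);
const summarizationModel = routeTask("summarization", routingTable, modelIsAvailable);
```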
|
|
96
147
|
|
|
97
|
-
|
|
98
|
-
- **Personal research agent** — feeds documents into the RAG pipeline, builds a knowledge base from your notes/PDFs/bookmarks, and answers questions with citations from your own data.
|
|
99
|
-
- **Code review agent** — watches your repos, reviews PRs with a pipeline agent (extract diff → analyze → write review), posts feedback automatically.
|
|
100
|
-
- **Media organizer** — scans your library, categorizes files, renames based on metadata, deduplicates. A pipeline agent with file tools handles this end to end.
|
|
148
|
+
The execution layer (`execution/base.ts`, `execution/local.ts`) is the concrete executor that invokes the provider, dispatches tool calls, and produces iteration results. Execution is pluggable; you could swap in a remote executor without touching the agent patterns above.
|
|
101
149
|
|
|
102
|
-
|
|
150
|
+
The limit checker (`run/LimitChecker.ts`) is the kernel scheduler's enforcement point. Every iteration it checks: have we exceeded the token budget? The cost budget? The wall-clock timeout? The iteration count? Has the user issued an abort? If any is true, it returns a typed hard-stop decision — `cancelled`, `token_budget_exceeded`, `timeout`, `max_iterations` — and the run ends cleanly with a stop reason recorded in its metadata.
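The per-iteration check described above can be sketched as a pure function that returns a typed stop reason or `null`. The four stop reasons are the ones named in the text; the interface shapes, the order in which limits are checked, and treating "reached" as "exceeded" are assumptions. The cost budget is left out because the text does not name its stop reason.

```typescript
// Sketch only; field names and check order are assumptions.
type StopReason =
  | "cancelled"
  | "token_budget_exceeded"
  | "timeout"
  | "max_iterations";

interface RunLimits {
  tokenBudget: number;
  timeoutMs: number;
  maxIterations: number;
}

interface RunUsage {
  tokensUsed: number;
  elapsedMs: number;
  iterations: number;
  aborted: boolean;
}

// Returns the first limit that has been hit, or null when the run may continue.
function checkLimits(limits: RunLimits, usage: RunUsage): StopReason | null {
  if (usage.aborted) return "cancelled";
  if (usage.tokensUsed >= limits.tokenBudget) return "token_budget_exceeded";
  if (usage.elapsedMs >= limits.timeoutMs) return "timeout";
  if (usage.iterations >= limits.maxIterations) return "max_iterations";
  return null;
}

const runLimits: RunLimits = { tokenBudget: 8192, timeoutMs: 600_000, maxIterations: 10 };

const withinLimits = checkLimits(runLimits, {
  tokensUsed: 1200, elapsedMs: 5_000, iterations: 2, aborted: false,
});
const overTokens = checkLimits(runLimits, {
  tokensUsed: 9000, elapsedMs: 5_000, iterations: 2, aborted: false,
});
const userAborted = checkLimits(runLimits, {
  tokensUsed: 9000, elapsedMs: 5_000, iterations: 2, aborted: true,
});
```

Checking the abort flag first is a choice of this sketch: an explicit cancellation is reported as `cancelled` even when a budget has also run out.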
|
|
103
151
|
|
|
104
|
-
|
|
152
|
+
### 5. The Runtime Query Path (`runtime/`)
|
|
105
153
|
|
|
106
|
-
|
|
107
|
-
- **Document processing pipeline** — ingest contracts, invoices, or reports through RAG. Agents extract key data, flag anomalies, and generate summaries. Human-in-the-loop ensures nothing gets approved without review.
|
|
108
|
-
- **Internal ops bot** — connects to your existing tools (Slack, Jira, databases) via HTTP connectors or MCP servers. Team members ask questions in natural language, the agent queries the right systems and responds.
|
|
109
|
-
- **Compliance checker** — a supervisor agent coordinates specialized sub-agents that each check a different regulation. Results are aggregated, flagged items go through plan review before any action is taken.
|
|
154
|
+
`runtime/query/` is where one iteration of the agent loop actually happens. The pieces:
|
|
110
155
|
|
|
111
|
-
|
|
156
|
+
- `runtime/query/context.ts` assembles the request context: system prompt, persona, skills, tools, messages.
|
|
157
|
+
- `runtime/query/context-cache.ts` implements `ContextCache` — a hash-based system-prompt cache per thread. If the prompt inputs have not changed since last iteration, the cache returns the same text so provider-level prompt caching can hit.
|
|
158
|
+
- `runtime/query/prompt.ts` owns `PromptBuilder` — structured, segment-based prompt assembly (static segment vs dynamic segment) that plays well with provider prompt caches.
|
|
159
|
+
- `runtime/query/guard.ts` runs pre-dispatch guards on the request.
|
|
160
|
+
- `runtime/query/executor.ts` actually calls the provider and streams the result.
|
|
161
|
+
- `runtime/query/result.ts` normalizes the provider's response into the kernel's canonical shape.
|
|
162
|
+
- `runtime/query/checkpoint.ts` writes the iteration's checkpoint.
|
|
163
|
+
- `runtime/query/tooling.ts` bridges the iteration to the tool system, including progressive disclosure state.
|
|
164
|
+
- `runtime/query/iteration/` contains the iteration machinery.
|
|
165
|
+
- `runtime/query/plugin-hooks.ts` lets plugins observe and shape iterations.
|
|
166
|
+
- `runtime/query/events.ts` emits the typed events that feed the bus.
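The hash-based caching idea behind `ContextCache` can be illustrated as follows. This is a sketch under assumptions, not the SDK's implementation: the `PromptInputs` shape, the `SketchContextCache` class, and hashing a JSON serialization with SHA-256 are all illustrative choices.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape; the real context assembler may track other fields.
interface PromptInputs {
  persona: string;
  skills: string[];
  toolNames: string[];
}

class SketchContextCache {
  private readonly entries = new Map<string, { hash: string; prompt: string }>();

  // Returns the cached prompt for a thread when its inputs hash the same,
  // otherwise rebuilds the prompt and stores it.
  get(
    threadId: string,
    inputs: PromptInputs,
    build: (inputs: PromptInputs) => string,
  ): string {
    // JSON.stringify is key-order sensitive; this sketch assumes inputs are
    // always constructed with the same key order.
    const hash = createHash("sha256").update(JSON.stringify(inputs)).digest("hex");
    const cached = this.entries.get(threadId);
    if (cached !== undefined && cached.hash === hash) return cached.prompt;
    const prompt = build(inputs);
    this.entries.set(threadId, { hash, prompt });
    return prompt;
  }
}

let buildCount = 0;
const renderPrompt = (i: PromptInputs): string => {
  buildCount += 1;
  return `Persona: ${i.persona}\nSkills: ${i.skills.join(", ")}\nTools: ${i.toolNames.join(", ")}`;
};

const promptCache = new SketchContextCache();
const baseInputs: PromptInputs = {
  persona: "analyst",
  skills: ["sql"],
  toolNames: ["read_file"],
};

const firstPrompt = promptCache.get("thread-1", baseInputs, renderPrompt);
const secondPrompt = promptCache.get("thread-1", { ...baseInputs }, renderPrompt);
const thirdPrompt = promptCache.get(
  "thread-1",
  { ...baseInputs, toolNames: ["read_file", "glob"] },
  renderPrompt,
);
```

Returning byte-identical text for unchanged inputs is what gives a provider-side prompt cache a stable prefix to match against.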
|
|
112
167
|
|
|
113
|
-
|
|
168
|
+
`runtime/decision/` (with `parser.ts` and `fallback.ts`) parses LLM decisions (tool calls vs final answer vs thinking vs advisory request) and falls back gracefully when the LLM returns malformed output.
|
|
114
169
|
|
|
115
|
-
|
|
116
|
-
- **Agent marketplace** — define agents as portable definitions (info + tools + persona), publish them, let others deploy with their own keys and customize via persona inheritance.
|
|
117
|
-
- **Cross-organization workflows** — agents from different companies discover each other via A2A agent cards and collaborate on shared tasks without a central authority.
|
|
170
|
+
### 6. Memory Management: Compaction (`compaction/`) and Store (`store/`)
|
|
118
171
|
|
|
119
|
-
|
|
172
|
+
Memory in the kernel is two systems cooperating.
|
|
120
173
|
|
|
121
|
-
|
|
122
|
-
npm install @namzu/sdk
|
|
123
|
-
```
|
|
174
|
+
**Working memory** is `compaction/`. When a thread's context approaches the model's window, the kernel does not truncate. It runs the `structured` compaction manager (default in `compaction/managers/structured.ts`, with `slidingWindow.ts` and `null.ts` as alternatives), which incrementally extracts `task / plan / files / decisions / failures` from the message stream into a typed `WorkingState`. The extractor (`compaction/extractor.ts`), verifier (`compaction/verifier.ts`), and serializer (`compaction/serializer.ts`) together produce compact markdown that replaces old messages. The agent keeps context awareness at a fraction of the token cost. `compaction/dangling.ts` handles partial tool-call streams that could otherwise corrupt the conversation state.
|
|
124
175
|
|
|
125
|
-
|
|
176
|
+
**Long-term memory** is `store/memory/`. The `MemoryIndex` (with `InMemoryMemoryIndex` as the default and a disk-backed variant) stores typed `MemoryIndexEntry` records, searchable by free-text query, tag set, and status filter. It persists to disk atomically. There is no required vector database; the default is plain tag and text search. You can layer an embedding-backed index on top if you want, but the kernel does not assume it.
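A tag, text, and status search of this kind needs very little machinery. The sketch below is illustrative only: the `SketchMemoryEntry` fields, the status values, and the `searchMemory` function are assumptions and may not match the SDK's `MemoryIndexEntry` or `MemoryIndex`.

```typescript
// Assumed entry shape; the SDK's `MemoryIndexEntry` may differ.
interface SketchMemoryEntry {
  id: string;
  text: string;
  tags: string[];
  status: "active" | "archived";
}

interface SketchMemoryQuery {
  query?: string;
  tags?: string[];
  status?: SketchMemoryEntry["status"];
}

// Every provided filter must match; omitted filters are ignored.
function searchMemory(
  entries: SketchMemoryEntry[],
  q: SketchMemoryQuery,
): SketchMemoryEntry[] {
  const needle = q.query?.toLowerCase();
  return entries.filter((entry) => {
    if (q.status !== undefined && entry.status !== q.status) return false;
    if (q.tags !== undefined && !q.tags.every((tag) => entry.tags.includes(tag))) return false;
    if (needle !== undefined && !entry.text.toLowerCase().includes(needle)) return false;
    return true;
  });
}

const memories: SketchMemoryEntry[] = [
  { id: "m1", text: "Deploy script fails on Node 18", tags: ["deploy", "bug"], status: "active" },
  { id: "m2", text: "Deploy checklist for releases", tags: ["deploy"], status: "archived" },
  { id: "m3", text: "User prefers concise answers", tags: ["preference"], status: "active" },
];

const activeDeployHits = searchMemory(memories, { query: "deploy", status: "active" });
const deployTagHits = searchMemory(memories, { tags: ["deploy"] });
```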
|
|
126
177
|
|
|
127
|
-
|
|
128
|
-
import { defineTool, ProviderFactory, ReactiveAgent, ToolRegistry } from '@namzu/sdk'
|
|
129
|
-
import { z } from 'zod'
|
|
178
|
+
Alongside memory, `store/` has sibling stores for every kernel concept: `store/run/` (runs, iterations, checkpoints), `store/conversation/` (threads and messages), `store/activity/` (activity log), `store/task/` (task registry), and an in-memory generic `InMemoryStore` for tests and ephemeral workloads.
|
|
130
179
|
|
|
131
|
-
|
|
132
|
-
const searchWeb = defineTool({
|
|
133
|
-
name: 'search_web',
|
|
134
|
-
description: 'Search the web for information',
|
|
135
|
-
inputSchema: z.object({ query: z.string() }),
|
|
136
|
-
category: 'network',
|
|
137
|
-
permissions: ['network_access'],
|
|
138
|
-
readOnly: true,
|
|
139
|
-
destructive: false,
|
|
140
|
-
concurrencySafe: true,
|
|
141
|
-
execute: async ({ query }) => {
|
|
142
|
-
const results = await fetch(`https://api.search.com?q=${query}`)
|
|
143
|
-
return { success: true, output: await results.text() }
|
|
144
|
-
},
|
|
145
|
-
})
|
|
180
|
+
### 7. The Capability System: Tools (`tools/`) and Registry (`registry/`)
|
|
146
181
|
|
|
147
|
-
|
|
148
|
-
const provider = ProviderFactory.createProvider({
|
|
149
|
-
type: 'openrouter',
|
|
150
|
-
apiKey: process.env.OPENROUTER_KEY!,
|
|
151
|
-
})
|
|
182
|
+
Tools in Namzu are first-class typed values, not JSON schemas you have to keep in sync with a handler somewhere else. `defineTool()` takes a Zod `inputSchema`, a Zod `outputSchema` (optional), and an `execute` function. It also takes **declarations** the kernel uses for routing and safety:
|
|
152
183
|
|
|
153
|
-
|
|
154
|
-
|
|
155
|
-
tools.
|
|
184
|
+
- `category` — e.g. `network`, `filesystem`, `compute`, `memory`.
|
|
185
|
+
- `permissions` — e.g. `network_access`, `write_filesystem`. Enforced at dispatch time.
|
|
186
|
+
- `readOnly` — predicate over input; tools that only read get different treatment by the verification gate and tool tiering.
|
|
187
|
+
- `destructive` — boolean flag that triggers HITL approval when true.
|
|
188
|
+
- `concurrencySafe` — whether two concurrent runs can invoke this tool with no interference.
|
|
156
189
|
|
|
157
|
-
|
|
158
|
-
id: 'researcher',
|
|
159
|
-
name: 'Research Assistant',
|
|
160
|
-
version: '1.0.0',
|
|
161
|
-
category: 'research',
|
|
162
|
-
description: 'Finds and synthesizes information from the web',
|
|
163
|
-
})
|
|
190
|
+
`tools/builtins/` ships file I/O, shell, and glob-search tools. `tools/advisory/`, `tools/memory/`, `tools/task/`, and `tools/coordinator/` ship kernel-facing tools that let agents consult advisors, query memory, coordinate siblings, and manage their task registry from inside the agent loop.
|
|
164
191
|
|
|
165
|
-
|
|
166
|
-
{ messages: [{ role: 'user', content: 'Summarize the latest LLM benchmarks' }], workingDirectory: process.cwd() },
|
|
167
|
-
{ model: 'anthropic/claude-sonnet-4-20250514', tokenBudget: 8192, timeoutMs: 600_000, provider, tools },
|
|
168
|
-
)
|
|
169
|
-
```
|
|
192
|
+
**Progressive disclosure** is unique to Namzu. Tools exist in three states — `deferred`, `activated`, `suspended`. The LLM does not see the full tool catalog; it sees the current active set plus a searchable summary of deferred tools. When it needs something specific, it activates it; when it is done, it suspends it. This keeps the context window focused, reduces hallucinated tool calls, and lets a single agent work across dozens of tools without drowning in a prompt.
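The three-state lifecycle can be sketched as a small state table. Only the state names come from the text; the `SketchToolCatalog` class, its methods, and the allowed transitions (activate from `deferred` or `suspended`, suspend only from `activated`) are assumptions made for illustration.

```typescript
type ToolState = "deferred" | "activated" | "suspended";

// Hypothetical catalog; not the SDK's tool registry.
class SketchToolCatalog {
  private readonly states = new Map<string, ToolState>();
  private readonly summaries = new Map<string, string>();

  // New tools start deferred: known to the catalog, hidden from the prompt.
  register(toolName: string, summary: string): void {
    this.states.set(toolName, "deferred");
    this.summaries.set(toolName, summary);
  }

  activate(toolName: string): boolean {
    const state = this.states.get(toolName);
    if (state === undefined || state === "activated") return false;
    this.states.set(toolName, "activated");
    return true;
  }

  suspend(toolName: string): boolean {
    if (this.states.get(toolName) !== "activated") return false;
    this.states.set(toolName, "suspended");
    return true;
  }

  // The set of tools that would be shown to the model in full.
  activeTools(): string[] {
    return Array.from(this.states.entries())
      .filter(([, state]) => state === "activated")
      .map(([toolName]) => toolName);
  }

  // Search the one-line summaries of tools that are still deferred.
  searchDeferred(term: string): string[] {
    const needle = term.toLowerCase();
    return Array.from(this.states.entries())
      .filter(([toolName, state]) =>
        state === "deferred" &&
        (this.summaries.get(toolName) ?? "").toLowerCase().includes(needle))
      .map(([toolName]) => toolName);
  }
}

const catalog = new SketchToolCatalog();
catalog.register("read_file", "Read a file from disk");
catalog.register("glob", "Find files by pattern");
catalog.register("http_get", "Fetch a URL");

const deferredHits = catalog.searchDeferred("file");
catalog.activate("read_file");
const activeAfterActivate = catalog.activeTools();
catalog.suspend("read_file");
const activeAfterSuspend = catalog.activeTools();
```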
|
|
170
193
|
|
|
171
|
-
|
|
194
|
+
**Tool tiering** teaches the LLM a cost hierarchy. You define tiers ("tier-1: local", "tier-2: fast remote", "tier-3: expensive API"), each with its own guidance template, and the kernel instructs the LLM to prefer lower tiers first. Unlike hardcoded approaches, every label, priority, and template is yours.
|
|
172
195
|
|
|
173
|
-
|
|
196
|
+
Registries (`registry/`) are the kernel's object tables. `registry/tool/` is the canonical tool catalog. `registry/agent/` holds agent definitions (the thing you can `AgentManager.spawn()`). `registry/connector/` holds connector catalogs. `registry/plugin/` holds plugins. `ManagedRegistry` is the shared base class with tenant scoping.
|
|
174
197
|
|
|
175
|
-
###
|
|
198
|
+
### 8. The Decision Layer: Verification Gate (`verification/`)
|
|
176
199
|
|
|
177
|
-
|
|
200
|
+
Before any tool call leaves the kernel, it goes through the `VerificationGate` in `verification/gate.ts`. Think of it as the kernel's seccomp: a rule-based decision layer that says *allow*, *deny*, or *ask*.
|
|
178
201
|
|
|
179
|
-
|
|
180
|
-
import { ReactiveAgent } from '@namzu/sdk'
|
|
202
|
+
Built-in rules:
|
|
181
203
|
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
|
|
185
|
-
version: '1.0.0',
|
|
186
|
-
category: 'analysis',
|
|
187
|
-
description: 'Analyzes data with LLM + tools',
|
|
188
|
-
})
|
|
204
|
+
- **`allow_read_only`** — if the tool's `readOnly(input)` returns true, allow.
|
|
205
|
+
- **`deny_dangerous_patterns`** — if the input matches any pattern from `DANGEROUS_PATTERNS` (shell injection, common exfiltration signatures, etc.), deny.
|
|
206
|
+
- **Custom regex rules** — per-tenant, per-agent, or global.
|
|
189
207
|
|
|
190
|
-
|
|
191
|
-
{ messages: [{ role: 'user', content: 'Analyze this dataset and find trends' }], workingDirectory: process.cwd() },
|
|
192
|
-
{
|
|
193
|
-
model: 'anthropic/claude-sonnet-4-20250514',
|
|
194
|
-
tokenBudget: 8192,
|
|
195
|
-
timeoutMs: 600_000,
|
|
196
|
-
provider,
|
|
197
|
-
tools, // ToolRegistry
|
|
198
|
-
systemPrompt: 'You are a data analyst.',
|
|
199
|
-
},
|
|
200
|
-
)
|
|
201
|
-
```
|
|
208
|
+
The `ask` decision hands control to the HITL layer. The verification gate is the kernel layer that makes "destructive tool requires approval" a policy, not a user-space convention.
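
A minimal sketch of a rule chain that yields *allow*, *deny*, or *ask*. The types, the rule order, the deny fallback, and the two example patterns are assumptions for illustration; the contents of the real `DANGEROUS_PATTERNS` list are not shown here.

```typescript
// Hypothetical sketch of a rule-based gate; not the VerificationGate API.
type Decision = 'allow' | 'deny' | 'ask'

interface ToolCall {
  tool: string
  input: string
  readOnly: boolean
  destructive: boolean
}

// A rule returns a decision, or undefined to defer to the next rule.
type Rule = (call: ToolCall) => Decision | undefined

// Illustrative patterns only.
const EXAMPLE_DANGEROUS_PATTERNS: RegExp[] = [/rm\s+-rf\s+\//, /curl[^|]*\|\s*(ba)?sh/]

const denyDangerousPatterns: Rule = (call) =>
  EXAMPLE_DANGEROUS_PATTERNS.some((p) => p.test(call.input)) ? 'deny' : undefined

const allowReadOnly: Rule = (call) => (call.readOnly ? 'allow' : undefined)

// Custom rule: destructive tools need a human decision.
const askOnDestructive: Rule = (call) => (call.destructive ? 'ask' : undefined)

// First rule with an opinion wins; with no opinion, fall back to deny.
function decide(call: ToolCall, rules: Rule[], fallback: Decision = 'deny'): Decision {
  for (const rule of rules) {
    const decision = rule(call)
    if (decision !== undefined) return decision
  }
  return fallback
}

const rules: Rule[] = [denyDangerousPatterns, allowReadOnly, askOnDestructive]
```

Putting the deny rule first means a read-only flag cannot whitelist an input that matches a dangerous pattern; whether the kernel orders its rules this way is not stated in this README.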
|
|
202
209
|
|
|
203
|
-
|
|
210
|
+
Verification is intentionally separate from the sandbox: verification is the *decision*, sandbox is the *enforcement*. If a rule fails to deny and a call somehow gets through, the sandbox is still there to contain the damage. Defense in depth, kernel-style.
|
|
204
211
|
|
|
205
|
-
|
|
212
|
+
### 9. Durability: Checkpoints and Emergency Save
|
|
206
213
|
|
|
207
|
-
|
|
208
|
-
import { PipelineAgent } from '@namzu/sdk'
|
|
214
|
+
The kernel assumes processes crash. Two layers make sure that when they do, you do not lose the run.
|
|
209
215
|
|
|
210
|
-
|
|
211
|
-
id: 'etl',
|
|
212
|
-
name: 'ETL Pipeline',
|
|
213
|
-
version: '1.0.0',
|
|
214
|
-
category: 'pipeline',
|
|
215
|
-
description: 'Extract → transform → load',
|
|
216
|
-
})
|
|
216
|
+
**Checkpoints** (`store/run/disk.ts`) are atomic per-iteration snapshots. Each `IterationCheckpoint` captures the run state at a super-step boundary — messages, working state, tool-call state, usage, cost, iteration index. Writes are atomic via write-temp-rename (Convention #8). You can read them, list them, and delete them. A future `Run.replay(runId, { fromCheckpoint })` API will build on top of this; the storage is already there.
|
|
217
217
|
|
|
218
|
-
|
|
219
|
-
{ messages: [], workingDirectory: process.cwd() },
|
|
220
|
-
{
|
|
221
|
-
model: 'anthropic/claude-sonnet-4-20250514',
|
|
222
|
-
tokenBudget: 8192,
|
|
223
|
-
timeoutMs: 600_000,
|
|
224
|
-
steps: [
|
|
225
|
-
{ name: 'extract', execute: async (inp, ctx) => await readSource('./data') },
|
|
226
|
-
{ name: 'transform', execute: async (data, ctx) => normalize(data) },
|
|
227
|
-
{ name: 'load', execute: async (data, ctx) => await writeToDb(data) },
|
|
228
|
-
],
|
|
229
|
-
},
|
|
230
|
-
)
|
|
231
|
-
```
|
|
218
|
+
**Emergency save** (`manager/run/emergency.ts`) is the kernel's core dump. `EmergencySaveManager` installs handlers for SIGINT and SIGTERM. When the process is dying, every active run gets its `toEmergencySnapshot()` flushed atomically to an `emergency/` directory. On the next boot you can inspect or resume the saved state. There is no reliance on the user remembering to catch signals; the kernel does it.
|
|
232
219
|
|
|
233
|
-
|
|
220
|
+
Together these give Namzu durable execution without requiring a database. Runs resume across crashes, across reboots, across graceful shutdowns.
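
The write-temp-rename technique behind the atomic checkpoint writes can be sketched with Node's standard library. The `CheckpointSketch` shape and the file naming are assumptions, not the kernel's `IterationCheckpoint` format.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical, reduced checkpoint shape for illustration.
interface CheckpointSketch {
  runId: string
  iteration: number
  messages: string[]
}

// Write to a temp file in the same directory, then rename over the target.
// A rename within one filesystem is atomic on POSIX, so a reader sees either
// the previous file or the complete new one, never a half-written file.
function writeCheckpointAtomic(dir: string, checkpoint: CheckpointSketch): string {
  const target = join(dir, `${checkpoint.runId}-${checkpoint.iteration}.json`)
  const temp = `${target}.${process.pid}.tmp`
  writeFileSync(temp, JSON.stringify(checkpoint))
  renameSync(temp, target)
  return target
}

const dir = mkdtempSync(join(tmpdir(), 'checkpoint-sketch-'))
const checkpointPath = writeCheckpointAtomic(dir, { runId: 'run_1', iteration: 3, messages: ['hi'] })
const restored = JSON.parse(readFileSync(checkpointPath, 'utf8')) as CheckpointSketch
```

This sketch does not call `fsync`, so it protects against torn writes but not against data still sitting in the OS cache during a power loss.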
|
|
234
221
|
|
|
235
|
-
|
|
222
|
+
### 10. Retrieval-Augmented Generation: RAG (`rag/`)
|
|
236
223
|
|
|
237
|
-
|
|
238
|
-
import { RouterAgent } from '@namzu/sdk'
|
|
224
|
+
RAG is a full kernel subsystem, not a bolt-on. The pipeline:
|
|
239
225
|
|
|
240
|
-
|
|
241
|
-
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
|
|
245
|
-
|
|
246
|
-
|
|
226
|
+
- `rag/chunking.ts` — text chunking strategies (configurable by `ChunkingConfig`).
|
|
227
|
+
- `rag/embedding.ts` — the `EmbeddingProvider` abstraction. Providers are BYOK and swappable.
|
|
228
|
+
- `rag/ingestion.ts` — end-to-end ingest: document → chunks → embeddings → vector store.
|
|
229
|
+
- `rag/vector-store.ts` — the `VectorStore` interface, tenant-scoped via `TenantId`. Bring your own backend (pgvector, Pinecone, an in-memory impl for tests).
|
|
230
|
+
- `rag/knowledge-base.ts` — a named collection of documents with metadata and config.
|
|
231
|
+
- `rag/retriever.ts` — the retrieval query path with configurable top-k, threshold, and reranking.
|
|
232
|
+
- `rag/context-assembler.ts` — turns retrieval hits into prompt-ready context windows.
|
|
233
|
+
- `rag/rag-tool.ts` — a first-class tool your agent can invoke, not an external integration.
|
|
247
234
|
|
|
248
|
-
|
|
249
|
-
{ messages: [{ role: 'user', content: 'Solve 2x + 3 = 11' }], workingDirectory: process.cwd() },
|
|
250
|
-
{
|
|
251
|
-
model: 'anthropic/claude-sonnet-4-20250514',
|
|
252
|
-
tokenBudget: 4096,
|
|
253
|
-
timeoutMs: 600_000,
|
|
254
|
-
provider,
|
|
255
|
-
routes: [
|
|
256
|
-
{ agentId: 'math-solver', agent: mathAgent, description: 'Solves equations' },
|
|
257
|
-
{ agentId: 'writer', agent: writerAgent, description: 'Writes content' },
|
|
258
|
-
],
|
|
259
|
-
fallbackAgentId: 'writer',
|
|
260
|
-
},
|
|
261
|
-
)
|
|
262
|
-
```
|
|
235
|
+
RAG lives in the kernel because retrieval is a capability every non-trivial agent needs. Making you wire it up from plugins every time was not the right default.
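
The pipeline stages listed above (chunk, embed, store, retrieve) can be sketched end to end with toy components. Everything here is a stand-in: the letter-frequency `embed` function replaces a real `EmbeddingProvider`, and the array replaces a real `VectorStore`.

```typescript
interface StoredChunk {
  text: string
  vector: number[]
}

// Fixed-size word chunking; real chunkers usually add overlap.
function chunk(text: string, wordsPerChunk: number): string[] {
  const words = text.split(/\s+/).filter(Boolean)
  const chunks: string[] = []
  for (let i = 0; i < words.length; i += wordsPerChunk) {
    chunks.push(words.slice(i, i + wordsPerChunk).join(' '))
  }
  return chunks
}

// Toy embedding: a 26-dimensional letter-frequency vector.
function embed(text: string): number[] {
  const vector = new Array<number>(26).fill(0)
  const lower = text.toLowerCase()
  for (let i = 0; i < lower.length; i++) {
    const idx = lower.charCodeAt(i) - 97
    if (idx >= 0 && idx < 26) vector[idx] = (vector[idx] ?? 0) + 1
  }
  return vector
}

function cosine(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB)
}

// Score every chunk, drop those under the threshold, return the best topK.
function retrieve(store: StoredChunk[], query: string, topK: number, threshold = 0): StoredChunk[] {
  const queryVector = embed(query)
  return store
    .map((c) => ({ chunk: c, score: cosine(c.vector, queryVector) }))
    .filter((hit) => hit.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((hit) => hit.chunk)
}

// Ingest: document -> chunks -> embeddings -> store.
const store: StoredChunk[] = chunk('zebra zebra zebra apple banana cherry', 3).map((text) => ({
  text,
  vector: embed(text),
}))
const hits = retrieve(store, 'zebra', 1)
```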
|
|
263
236
|
|
|
264
|
-
###
|
|
237
|
+
### 11. Skills (`skills/`)
|
|
265
238
|
|
|
266
|
-
|
|
239
|
+
Skills are disclosure-tiered capability bundles distinct from tools. A skill is a named body of knowledge, workflow, or policy that the agent can load on demand. `skills/loader.ts` reads them from disk; `skills/registry.ts` holds the active catalog; each skill has a `SkillDisclosureLevel` that decides when the LLM sees it (always visible, searchable-on-demand, explicit-activation-only). Skills and tools together form the two axes of an agent's capability surface.
|
|
267
240
|
|
|
268
|
-
|
|
269
|
-
import { SupervisorAgent, AgentManager } from '@namzu/sdk'
|
|
241
|
+
### 12. Personas (`persona/`)
|
|
270
242
|
|
|
271
|
-
|
|
272
|
-
id: 'lead',
|
|
273
|
-
name: 'Project Lead',
|
|
274
|
-
version: '1.0.0',
|
|
275
|
-
category: 'coordination',
|
|
276
|
-
description: 'Delegates sub-tasks to specialized agents',
|
|
277
|
-
})
|
|
243
|
+
Personas describe who an agent is. `persona/assembler.ts` loads them from YAML and composes them with inheritance: a base `researcher` persona defines identity, expertise areas, output format, and reflexes; an `ml-researcher` child merges a single field (`expertise: [...base, 'ML', 'PyTorch']`) and inherits everything else. The assembler produces a typed `AgentPersona` that flows into the prompt as a structured segment (not a string concatenation, not a template hack), so prompt-cache-friendliness is preserved.
|
|
278
244
|
|
|
279
|
-
|
|
280
|
-
{ messages: [{ role: 'user', content: 'Research, write, and review a Q3 report' }], workingDirectory: process.cwd() },
|
|
281
|
-
{
|
|
282
|
-
model: 'anthropic/claude-sonnet-4-20250514',
|
|
283
|
-
tokenBudget: 32_768,
|
|
284
|
-
timeoutMs: 1_800_000,
|
|
285
|
-
provider,
|
|
286
|
-
agentManager, // resolves agent ids → implementations
|
|
287
|
-
agentIds: ['researcher', 'writer', 'reviewer'],
|
|
288
|
-
systemPrompt: 'You coordinate specialists. Decompose tasks, delegate, and synthesize results.',
|
|
289
|
-
},
|
|
290
|
-
)
|
|
291
|
-
// Child runs tracked via parent_run_id and depth
|
|
292
|
-
```
|
|
245
|
+
Personas are code-defined (YAML files in your repo). There is no database, no admin UI, no runtime mutation. That is deliberate: your agent's identity belongs in version control.
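
The inheritance rule (a child overrides one field and inherits the rest) can be illustrated with plain objects. The `PersonaSketch` shape and `inheritPersona` helper are hypothetical; the real assembler reads YAML and produces a typed `AgentPersona`.

```typescript
// Hypothetical, reduced persona shape for illustration.
interface PersonaSketch {
  identity: string
  expertise: string[]
  outputFormat: string
  reflexes: string[]
}

// The child sees the base and returns only the fields it wants to change;
// anything it omits is inherited unchanged.
function inheritPersona(
  base: PersonaSketch,
  child: (base: PersonaSketch) => Partial<PersonaSketch>,
): PersonaSketch {
  return { ...base, ...child(base) }
}

const researcher: PersonaSketch = {
  identity: 'Research Agent',
  expertise: ['literature review'],
  outputFormat: 'markdown',
  reflexes: ['Always cite sources'],
}

const mlResearcher = inheritPersona(researcher, (base) => ({
  expertise: [...base.expertise, 'ML', 'PyTorch'],
}))
```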
|
|
293
246
|
|
|
294
|
-
|
|
247
|
+
### 13. Advisory System (`advisory/`)
|
|
295
248
|
|
|
296
|
-
|
|
249
|
+
An advisor is a specialized assistant a running agent can consult mid-execution. The main agent is solving a task; halfway through it hits a decision it is not confident about, or a domain it wants a second opinion on. It fires an advisory request with context; the advisory layer evaluates triggers, routes to the right advisor, executes on a (possibly different) provider, and returns a structured answer the main agent can act on.
|
|
297
250
|
|
|
298
|
-
|
|
299
|
-
import { defineTool, ToolRegistry, getBuiltinTools } from '@namzu/sdk'
|
|
300
|
-
import { z } from 'zod'
|
|
251
|
+
Pieces:
|
|
301
252
|
|
|
302
|
-
|
|
303
|
-
|
|
304
|
-
|
|
305
|
-
|
|
306
|
-
category: 'network',
|
|
307
|
-
permissions: ['network_access'],
|
|
308
|
-
readOnly: true,
|
|
309
|
-
destructive: false,
|
|
310
|
-
concurrencySafe: true,
|
|
311
|
-
execute: async ({ url, method }) => {
|
|
312
|
-
const resp = await fetch(url, { method })
|
|
313
|
-
return { success: true, output: await resp.text() }
|
|
314
|
-
},
|
|
315
|
-
})
|
|
253
|
+
- `advisory/registry.ts` — `AdvisorRegistry`, the catalog of available advisors keyed by domain.
|
|
254
|
+
- `advisory/evaluator.ts` — `TriggerEvaluator`, decides whether an advisory should fire given context and config.
|
|
255
|
+
- `advisory/executor.ts` — `AdvisoryExecutor`, runs the advisor, collects its output, and feeds it back.
|
|
256
|
+
- `advisory/context.ts` — `AdvisoryContext`, the payload passed to advisors.
|
|
316
257
|
|
|
317
|
-
|
|
318
|
-
const registry = new ToolRegistry()
|
|
319
|
-
registry.register(getBuiltinTools(), 'deferred')
|
|
320
|
-
registry.register(fetchApi, 'active')
|
|
258
|
+
Unlike Anthropic's advisor tool (Claude-only, single advisor), Namzu's is **provider-agnostic** and **multi-advisor**: put a security advisor on Bedrock, an architecture advisor on OpenRouter, a legal advisor on Anthropic, and the agent decides who to consult. This is one of the things that most cleanly separates Namzu from the pack.
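
A rough sketch of the evaluate, route, execute flow described above. The confidence-threshold trigger, the type names, and the canned `advise` functions are assumptions for illustration; a real advisor would call an LLM through its own provider.

```typescript
// Hypothetical advisory types; not the AdvisorRegistry or AdvisoryExecutor API.
interface Advisor {
  domain: string
  provider: string // e.g. 'bedrock' or 'openrouter'
  advise: (question: string) => string
}

interface AdvisoryRequest {
  domain: string
  question: string
  confidence: number // the main agent's own confidence, 0..1
}

interface AdvisoryResult {
  advisor: string
  provider: string
  answer: string
}

// Catalog keyed by domain.
const advisors = new Map<string, Advisor>()

function registerAdvisor(advisor: Advisor): void {
  advisors.set(advisor.domain, advisor)
}

// Trigger evaluation: only consult when the agent is unsure.
function shouldFire(request: AdvisoryRequest, minConfidence: number): boolean {
  return request.confidence < minConfidence
}

// Route by domain and run the advisor; undefined means no advisory happened.
function consult(request: AdvisoryRequest, minConfidence = 0.7): AdvisoryResult | undefined {
  if (!shouldFire(request, minConfidence)) return undefined
  const advisor = advisors.get(request.domain)
  if (!advisor) return undefined
  return { advisor: advisor.domain, provider: advisor.provider, answer: advisor.advise(request.question) }
}

registerAdvisor({ domain: 'security', provider: 'bedrock', advise: (q) => `security review of: ${q}` })
registerAdvisor({ domain: 'architecture', provider: 'openrouter', advise: (q) => `architecture view on: ${q}` })
```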
|
|
321
259
|
|
|
322
|
-
|
|
323
|
-
registry.activate(['read_file', 'bash'])
|
|
324
|
-
const llmTools = registry.toLLMTools() // Only active + suspended tools
|
|
325
|
-
```
|
|
260
|
+
### 14. Human-in-the-Loop (`types/hitl/`, `manager/plan/lifecycle.ts`, `types/decision/`)
|
|
326
261
|
|
|
327
|
-
|
|
262
|
+
HITL is structured, not just a "pause and wait for input" hook. The kernel defines typed decision contracts: the LLM produces a plan, the plan can be approved / edited / rejected, approval can be per-tool with explicit destructiveness acknowledgment, rejection can carry feedback that re-enters the loop as a new iteration. The plan lifecycle has its own manager so that pending plans persist across checkpoint resumes. The verification gate's `ask` decision routes into this same HITL layer.
|
|
328
263
|
|
|
329
|
-
|
|
264
|
+
The kernel does not render a UI for this — it emits events and exposes a typed API so the UI layer you choose can render them however you like.
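
One way to picture a typed decision contract is a discriminated union that a UI layer fills in and the kernel applies. The names below (`PlanDecision`, `PlanState`, `applyDecision`) are hypothetical, and treating an edited plan as approved is an assumption of this sketch.

```typescript
// Hypothetical decision contract; not the types in types/hitl/ or types/decision/.
type PlanDecision =
  | { kind: 'approve' }
  | { kind: 'edit'; steps: string[] }
  | { kind: 'reject'; feedback: string }

interface PlanState {
  steps: string[]
  status: 'pending' | 'approved' | 'rejected'
  feedback?: string
}

function applyDecision(plan: PlanState, decision: PlanDecision): PlanState {
  switch (decision.kind) {
    case 'approve':
      return { ...plan, status: 'approved' }
    case 'edit':
      // Assumption: an edited plan proceeds with the reviewer's steps.
      return { steps: decision.steps, status: 'approved' }
    case 'reject':
      // Feedback is kept so it can re-enter the loop as a new iteration.
      return { ...plan, status: 'rejected', feedback: decision.feedback }
    default: {
      // Adding a new decision kind without handling it fails to compile here.
      const unreachable: never = decision
      throw new Error(`Unhandled decision: ${JSON.stringify(unreachable)}`)
    }
  }
}

const pendingPlan: PlanState = { steps: ['delete old logs', 'deploy'], status: 'pending' }
const rejectedPlan = applyDecision(pendingPlan, { kind: 'reject', feedback: 'Do not delete logs' })
const editedPlan = applyDecision(pendingPlan, { kind: 'edit', steps: ['deploy'] })
```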
|
|
330
265
|
|
|
331
|
-
|
|
266
|
+
### 15. Providers (`provider/`)
|
|
332
267
|
|
|
333
|
-
|
|
334
|
-
import { PluginLifecycleManager } from '@namzu/sdk'
|
|
335
|
-
|
|
336
|
-
const manager = new PluginLifecycleManager({ pluginRegistry, toolRegistry, log })
|
|
337
|
-
const plugin = await manager.install('/path/to/plugin', 'project')
|
|
338
|
-
await manager.enable(plugin.id)
|
|
339
|
-
// → manifest.tools registered as `${plugin}:${tool}` (deferred)
|
|
340
|
-
// → manifest.hooks attached for run_start/end, iteration_start/end,
|
|
341
|
-
// pre/post_llm_call, pre/post_tool_use
|
|
342
|
-
// → manifest.mcpServers connected via stdio; their tools registered as
|
|
343
|
-
// `${plugin}:mcp__${server}__${tool}` (deferred)
|
|
344
|
-
```
|
|
345
|
-
|
|
346
|
-
Hook handlers can return `continue`, `modify` (rewrite tool input), `skip` (synthesize a tool result), or `error` (fail the run). Modify actions compose — chained hooks each see the previous hook's modified input. The runtime emits `plugin_hook_executing` / `plugin_hook_completed` events around every handler.
|
|
268
|
+
An LLM provider implements a narrow interface: given a typed request, return a typed response (streaming or not) and propagate normalized usage, cost, and cache telemetry. Today `provider/openrouter/` and `provider/bedrock/` are in the box; adding another vendor is adding one directory. `provider/telemetry/` normalizes provider-specific response fields (OpenRouter's `cache_read_input_tokens`, `cache_creation_input_tokens`, `cache_discount`, Bedrock's equivalents) into a single kernel-wide telemetry shape.
|
|
347
269
|
|
|
348
|
-
|
|
270
|
+
`ProviderFactory` is the single entry point. Every run chooses its provider by name; the provider object itself is stateless enough to be shared across runs.
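
Telemetry normalization amounts to mapping vendor field names onto one shape. In this sketch the cache field names come from the paragraph above, while `prompt_tokens`, `completion_tokens`, and the `NormalizedUsage` shape are assumptions rather than the kernel's actual types.

```typescript
// Raw usage as a provider might report it; every field is optional.
interface RawUsage {
  prompt_tokens?: number // assumed field name
  completion_tokens?: number // assumed field name
  cache_read_input_tokens?: number
  cache_creation_input_tokens?: number
  cache_discount?: number
}

// Hypothetical kernel-wide shape.
interface NormalizedUsage {
  inputTokens: number
  outputTokens: number
  cacheReadTokens: number
  cacheWriteTokens: number
  cacheDiscount: number
}

// Missing fields become 0 so downstream metrics never see undefined.
function normalizeUsage(raw: RawUsage): NormalizedUsage {
  return {
    inputTokens: raw.prompt_tokens ?? 0,
    outputTokens: raw.completion_tokens ?? 0,
    cacheReadTokens: raw.cache_read_input_tokens ?? 0,
    cacheWriteTokens: raw.cache_creation_input_tokens ?? 0,
    cacheDiscount: raw.cache_discount ?? 0,
  }
}

const normalized = normalizeUsage({ prompt_tokens: 1200, completion_tokens: 80, cache_read_input_tokens: 1000 })
```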
|
|
349
271
|
|
|
350
|
-
|
|
272
|
+
### 16. Connectors (`connector/`)
|
|
351
273
|
|
|
352
|
-
|
|
353
|
-
import { drainQuery, ToolRegistry, getBuiltinTools } from '@namzu/sdk'
|
|
274
|
+
A connector is how an agent reaches external systems. `connector/BaseConnector.ts` is the abstract base; `connector/mcp/` implements MCP connectors in both `stdio` and `http` transports with a `client.ts` and an `adapter.ts` that turns MCP tools into Namzu `ToolDefinition`s; `connector/builtins/` ships the built-in connectors (HTTP, shell, etc.); `connector/execution/` handles connector-level execution concerns. Plugin contributions can register connectors at runtime.
|
|
354
275
|
|
|
355
|
-
|
|
356
|
-
tools.register(getBuiltinTools(), 'active')
|
|
276
|
+
### 17. Prompt Cache Integration
|
|
357
277
|
|
|
358
|
-
|
|
359
|
-
const result = await drainQuery({
|
|
360
|
-
agentId: 'solver', agentName: 'Solver', threadId,
|
|
361
|
-
provider, tools, runConfig, messages, resumeHandler,
|
|
362
|
-
sandboxProvider,
|
|
363
|
-
})
|
|
364
|
-
```
|
|
278
|
+
The kernel takes prompt caching seriously because token cost is the number-one production constraint for agents. `runtime/query/context-cache.ts` maintains a per-thread `ContextCache` that hashes the inputs (system prompt + persona + skills + tools + base prompt) and only rebuilds when the hash changes. When the provider supports cache controls (OpenRouter's `cacheControl` parameter today, Anthropic and Bedrock cache headers in progress), the kernel attaches them, and the response's cache telemetry (`cache_read_input_tokens`, `cache_creation_input_tokens`, `cache_discount`) flows back into the run's usage metrics.
|
|
365
279
|
|
|
366
|
-
|
|
280
|
+
This is why `PromptBuilder` splits a request into static and dynamic segments: the static segment is the cache target, and the kernel does the bookkeeping to keep it stable across iterations so the cache actually hits.
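
The hash-and-rebuild behaviour can be sketched as follows. `ContextCacheSketch`, `StaticInputs`, and the SHA-256 choice are assumptions for illustration; the real `ContextCache` may hash differently.

```typescript
import { createHash } from 'node:crypto'

// The inputs that make up the static (cacheable) prompt segment.
interface StaticInputs {
  systemPrompt: string
  persona: string
  skills: string[]
  tools: string[]
  basePrompt: string
}

// Serialize in a fixed field order so equal inputs always hash equally.
function hashInputs(inputs: StaticInputs): string {
  const ordered = [inputs.systemPrompt, inputs.persona, inputs.skills, inputs.tools, inputs.basePrompt]
  return createHash('sha256').update(JSON.stringify(ordered)).digest('hex')
}

class ContextCacheSketch {
  private hash: string | undefined
  private segment = ''
  rebuilds = 0

  // Rebuild the static segment only when the input hash changes.
  get(inputs: StaticInputs, build: (inputs: StaticInputs) => string): string {
    const next = hashInputs(inputs)
    if (next !== this.hash) {
      this.segment = build(inputs)
      this.hash = next
      this.rebuilds += 1
    }
    return this.segment
  }
}

const buildSegment = (i: StaticInputs): string => [i.systemPrompt, i.persona, i.basePrompt].join('\n')
const staticInputs: StaticInputs = {
  systemPrompt: 'You are a researcher.',
  persona: 'researcher',
  skills: ['web-search'],
  tools: ['read_file'],
  basePrompt: 'Answer concisely.',
}

const contextCache = new ContextCacheSketch()
contextCache.get(staticInputs, buildSegment)
contextCache.get(staticInputs, buildSegment) // same hash: no rebuild
```

Keeping the segment byte-identical across iterations is what lets a provider-side prompt cache hit, since such caches key on a stable prefix.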
|
|
367
281
|
|
|
368
|
-
|
|
282
|
+
### 18. Vault (`vault/`)
|
|
369
283
|
|
|
370
|
-
|
|
371
|
-
import { drainQuery, SandboxProviderFactory, ToolRegistry, getBuiltinTools, getRootLogger } from '@namzu/sdk'
|
|
284
|
+
The vault holds BYOK credentials and arbitrary secrets. `InMemoryCredentialVault` is the default backend; the `CredentialVault` interface lets you plug in your own. Credentials are tenant-scoped — tenant A cannot see tenant B's keys. Tools, providers, and connectors resolve credentials through the vault rather than reading environment variables directly, so you can rotate without redeploying and you can audit who accessed what.
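
Tenant scoping in a vault can be as simple as keying every lookup by tenant first. The class below is a hypothetical sketch, not `InMemoryCredentialVault` or the `CredentialVault` interface.

```typescript
// Hypothetical tenant-scoped vault; method names are illustrative.
class TenantScopedVaultSketch {
  private readonly byTenant = new Map<string, Map<string, string>>()

  // Store or rotate a secret for one tenant.
  set(tenantId: string, name: string, secret: string): void {
    const scope = this.byTenant.get(tenantId) ?? new Map<string, string>()
    scope.set(name, secret)
    this.byTenant.set(tenantId, scope)
  }

  // Lookups never cross tenants: a missing entry is an error, not a fallback.
  resolve(tenantId: string, name: string): string {
    const secret = this.byTenant.get(tenantId)?.get(name)
    if (secret === undefined) throw new Error(`No credential "${name}" for tenant ${tenantId}`)
    return secret
  }
}

const vault = new TenantScopedVaultSketch()
vault.set('tenant-a', 'openrouter', 'sk-a-1')
```

Because callers resolve the secret on each use, calling `set` again with a new value rotates the key without restarting anything.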
|
|
372
285
|
|
|
373
|
-
|
|
374
|
-
{ enabled: true, provider: 'local', timeoutMs: 60_000, memoryLimitMb: 512, maxProcesses: 16, cleanupOnDestroy: true },
|
|
375
|
-
getRootLogger(),
|
|
376
|
-
)
|
|
286
|
+
### 19. Telemetry (`telemetry/`)
|
|
377
287
|
|
|
378
|
-
|
|
379
|
-
tools.register(getBuiltinTools(), 'active')
|
|
380
|
-
|
|
381
|
-
const result = await drainQuery({
|
|
382
|
-
agentId: 'coder', agentName: 'Coder', threadId,
|
|
383
|
-
provider, tools, runConfig,
|
|
384
|
-
messages: [{ role: 'user', content: 'Write a Python script and run it' }],
|
|
385
|
-
resumeHandler,
|
|
386
|
-
sandboxProvider, // sandbox-aware tools opt in here
|
|
387
|
-
})
|
|
388
|
-
```
|
|
288
|
+
OpenTelemetry-native. `telemetry/attributes.ts` defines the canonical attribute keys; `telemetry/metrics.ts` defines the kernel's metrics surface. Every iteration, every tool call, every provider call emits spans with consistent attributes: `run.id`, `thread.id`, `agent.id`, `tenant.id`, `tool.name`, `provider.name`, `model`, `usage.input_tokens`, `usage.output_tokens`, `usage.cached_tokens`, `cost.usd`. Wire your existing OTel collector, or pipe to LangSmith / Langfuse / Braintrust via their OTel adapters.
|
|
389
289
|
|
|
390
|
-
|
|
290
|
+
### 20. Plugin System (`plugin/`)
|
|
391
291
|
|
|
392
|
-
|
|
393
|
-
|----------|-----------|---------|
|
|
394
|
-
| macOS | `sandbox-exec` with Seatbelt (SBPL) | Deny-default, allow-back for agent workspace |
|
|
395
|
-
| Linux | Namespace isolation | Process + filesystem isolation |
|
|
292
|
+
Plugins extend the kernel at runtime. A plugin manifest declares what it contributes (tools, MCP servers, advisors, connectors), and the kernel's `plugin/loader.ts` reads manifests from disk, `plugin/resolver.ts` namespaces everything safely, and `plugin/lifecycle.ts` hooks plugin init / shutdown into the kernel's own lifecycle. Plugins can subscribe to iteration hooks via `runtime/query/plugin-hooks.ts` and shape what the LLM sees.
|
|
396
293
|
|
|
397
|
-
|
|
294
|
+
Plugins are how a community ecosystem grows around the kernel without the kernel having to ship batteries for every use case.
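
Namespacing is what keeps two plugins from colliding on a tool name. The sketch below uses the `${plugin}:${tool}` and `${plugin}:mcp__${server}__${tool}` formats described in the 0.1.5 README; whether 0.1.6 keeps exactly these formats is an assumption, and `PluginManifestSketch` is a hypothetical shape.

```typescript
// Hypothetical, reduced manifest shape for illustration.
interface PluginManifestSketch {
  name: string
  tools: string[]
  mcpServers: Record<string, string[]> // server name -> tool names
}

// Prefix every contributed tool with the plugin name.
function namespacedToolNames(manifest: PluginManifestSketch): string[] {
  const names = manifest.tools.map((tool) => `${manifest.name}:${tool}`)
  for (const server of Object.keys(manifest.mcpServers)) {
    for (const tool of manifest.mcpServers[server] ?? []) {
      names.push(`${manifest.name}:mcp__${server}__${tool}`)
    }
  }
  return names
}

const toolNames = namespacedToolNames({
  name: 'git',
  tools: ['status'],
  mcpServers: { github: ['create_issue'] },
})
```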
|
|
398
295
|
|
|
399
|
-
|
|
400
|
-
- **Workspace-scoped I/O** — reads and writes only within the agent's `rootDir`
|
|
401
|
-
- **Path canonicalization** — resolves macOS symlinks (`/var` → `/private/var`) so seatbelt rules match real paths
|
|
402
|
-
- **Process isolation** — `same-sandbox` scope for signals and process info
|
|
403
|
-
- **Automatic lifecycle** — sandbox is created before query iteration, destroyed in `finally`
|
|
296
|
+
### 21. Gateway (`gateway/`)
|
|
404
297
|
|
|
405
|
-
|
|
406
|
-
// Direct sandbox API (low-level)
|
|
407
|
-
const sandbox = await sandboxProvider.create({
|
|
408
|
-
workingDirectory: process.cwd(),
|
|
409
|
-
timeoutMs: 30_000,
|
|
410
|
-
memoryLimitMb: 512,
|
|
411
|
-
maxProcesses: 16,
|
|
412
|
-
})
|
|
298
|
+
`gateway/local.ts` is the local-process gateway — a thin translation layer between an external caller (HTTP, WebSocket, stdin, another agent over A2A) and the kernel's run API. Put a real HTTP server in front of it and you have an agent service; wrap it in a CLI and you have an agent shell. The gateway is where your application layer plugs into the kernel.
|
|
413
299
|
|
|
414
|
-
|
|
415
|
-
console.log(result.stdout) // "hello\n"
|
|
416
|
-
console.log(result.exitCode) // 0
|
|
300
|
+
### 22. Agent Patterns (`agents/`)
|
|
417
301
|
|
|
418
|
-
|
|
419
|
-
const content = await sandbox.readFile('script.py')
|
|
302
|
+
Four patterns ship in the kernel. They are not mandatory — you can write your own `AbstractAgent` subclass for custom loops — but these are the shapes most real workloads want.
|
|
420
303
|
|
|
421
|
-
|
|
422
|
-
|
|
304
|
+
- **`ReactiveAgent`** — the canonical agent loop. Prompt → LLM → tool call(s) → iterate → stop. Handles token budget, cost limit, timeout, max iterations, HITL injection, progressive tool disclosure, compaction, and checkpointing automatically.
|
|
305
|
+
- **`PipelineAgent`** — deterministic sequential steps. Each step is a typed function; output of step N is input of step N+1. Rolls back on failure. Useful for ETL, RAG ingestion, multi-stage document processing.
|
|
306
|
+
- **`RouterAgent`** — an LLM classifies the input and delegates to the best-suited agent from a configured set of candidates, with a fallback. Useful for intent routing in customer support, dispatcher bots, and multi-expert systems.
|
|
307
|
+
- **`SupervisorAgent`** — a coordinator that spawns and orchestrates a set of specialized child agents. Tracks the full parent/child/depth hierarchy, aggregates results, handles partial failures, and honors the shared budget tracker.
|
|
423
308
|
|
|
424
|
-
|
|
309
|
+
All four sit on top of the same lifecycle manager, the same limit checker, the same bus, the same verification gate. Switching patterns does not change what safety or durability the kernel provides.
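
The pipeline pattern (output of step N feeds step N+1, roll back on failure) can be sketched without the SDK. The `Step` shape with an optional per-step `rollback` hook is an assumption; the README does not say how `PipelineAgent` expresses rollback.

```typescript
// Hypothetical step shape; PipelineAgent's real step type may differ.
interface Step {
  name: string
  execute: (input: number) => Promise<number>
  rollback?: () => Promise<void>
}

// Run steps in order. On failure, undo completed steps in reverse order,
// then rethrow so the caller still sees the original error.
async function runPipeline(steps: Step[], input: number): Promise<number> {
  const completed: Step[] = []
  let current = input
  try {
    for (const step of steps) {
      current = await step.execute(current)
      completed.push(step)
    }
    return current
  } catch (error) {
    for (const step of completed.reverse()) {
      if (step.rollback) await step.rollback()
    }
    throw error
  }
}

const rollbackLog: string[] = []
const pipelineSteps: Step[] = [
  {
    name: 'extract',
    execute: async (n) => n + 1,
    rollback: async () => {
      rollbackLog.push('undo extract')
    },
  },
  {
    name: 'transform',
    execute: async (n) => n * 2,
    rollback: async () => {
      rollbackLog.push('undo transform')
    },
  },
]
```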
|
|
425
310
|
|
|
426
|
-
|
|
311
|
+
### 23. Multi-Tenant Isolation
|
|
427
312
|
|
|
428
|
-
|
|
313
|
+
Every registry, every store, every vault is tenant-scoped. `TenantId` is a branded ID threaded through the kernel's types. A run for tenant A cannot accidentally read tenant B's knowledge base, invoke tenant B's tools, or resolve tenant B's credentials. This is not a feature you turn on — it is the default, and a single-tenant setup is just a special case.
|
|
429
314
|
|
|
430
|
-
|
|
431
|
-
import { ProviderFactory } from '@namzu/sdk'
|
|
315
|
+
### 24. Thread / Run Separation
|
|
432
316
|
|
|
433
|
-
|
|
434
|
-
const openrouter = ProviderFactory.createProvider({
|
|
435
|
-
type: 'openrouter',
|
|
436
|
-
apiKey: process.env.OPENROUTER_KEY!,
|
|
437
|
-
})
|
|
317
|
+
A **thread** is a conversation: a series of user ↔ assistant messages, possibly spanning many sessions, probably spanning many days. A **run** is a single execution pass: an input, iterations, tool calls, usage, cost, result. One thread has many runs. Most frameworks conflate the two; Namzu keeps them explicit, with separate stores, separate IDs, and separate serialization. Multi-turn dialogs carry only the context the kernel thinks matters (via compaction), and run traces stay auditable without drowning in prior-turn tool chatter.
|
|
438
318
|
|
|
439
|
-
|
|
440
|
-
const bedrock = ProviderFactory.createProvider({
|
|
441
|
-
type: 'bedrock',
|
|
442
|
-
region: 'us-east-1',
|
|
443
|
-
})
|
|
319
|
+
---
|
|
444
320
|
|
|
445
|
-
|
|
446
|
-
for await (const chunk of openrouter.chatStream({
|
|
447
|
-
model: 'anthropic/claude-sonnet-4-20250514',
|
|
448
|
-
messages: [{ role: 'user', content: 'hi' }],
|
|
449
|
-
})) {
|
|
450
|
-
process.stdout.write(chunk.delta?.content ?? '')
|
|
451
|
-
}
|
|
321
|
+
## Install
|
|
452
322
|
|
|
453
|
-
|
|
454
|
-
|
|
323
|
+
```bash
|
|
324
|
+
npm install @namzu/sdk
|
|
455
325
|
```
|
|
456
326
|
|
|
457
|
-
|
|
327
|
+
Requirements: Node ≥ 22, TypeScript strict mode, ESM.
|
|
458
328
|
|
|
459
|
-
|
|
329
|
+
## Quick Start
|
|
460
330
|
|
|
461
331
|
```typescript
|
|
462
|
-
import {
|
|
463
|
-
|
|
464
|
-
|
|
465
|
-
|
|
466
|
-
|
|
467
|
-
|
|
468
|
-
|
|
469
|
-
|
|
470
|
-
|
|
471
|
-
|
|
472
|
-
|
|
473
|
-
|
|
474
|
-
|
|
475
|
-
|
|
476
|
-
|
|
332
|
+
import { defineTool, ProviderFactory, ReactiveAgent, ToolRegistry } from '@namzu/sdk'
|
|
333
|
+
import { z } from 'zod'
|
|
334
|
+
|
|
335
|
+
const searchWeb = defineTool({
|
|
336
|
+
name: 'search_web',
|
|
337
|
+
description: 'Search the web for information',
|
|
338
|
+
inputSchema: z.object({ query: z.string() }),
|
|
339
|
+
category: 'network',
|
|
340
|
+
permissions: ['network_access'],
|
|
341
|
+
readOnly: true,
|
|
342
|
+
destructive: false,
|
|
343
|
+
concurrencySafe: true,
|
|
344
|
+
execute: async ({ query }) => {
|
|
345
|
+
const r = await fetch(`https://api.search.com?q=${encodeURIComponent(query)}`)
|
|
346
|
+
return { success: true, output: await r.text() }
|
|
347
|
+
},
|
|
477
348
|
})
|
|
478
349
|
|
|
479
|
-
|
|
480
|
-
|
|
481
|
-
model: 'openai/text-embedding-3-small',
|
|
350
|
+
const provider = ProviderFactory.createProvider({
|
|
351
|
+
type: 'openrouter',
|
|
482
352
|
apiKey: process.env.OPENROUTER_KEY!,
|
|
483
353
|
})
|
|
484
354
|
|
|
485
|
-
|
|
486
|
-
|
|
487
|
-
const retriever = new DefaultRetriever(vectorStore, embedder)
|
|
488
|
-
|
|
489
|
-
// Knowledge base — pass (config, vectorStore, embeddingProvider)
|
|
490
|
-
const kb = new DefaultKnowledgeBase(
|
|
491
|
-
{ id: 'docs', name: 'API Guides', tenantId: 'default' },
|
|
492
|
-
vectorStore,
|
|
493
|
-
embedder,
|
|
494
|
-
)
|
|
495
|
-
await kb.ingest(apiDoc, { title: 'API Guide', source: 'doc-1' })
|
|
496
|
-
const results = await kb.query({ text: 'How do I authenticate?', config: { topK: 5 } })
|
|
355
|
+
const tools = new ToolRegistry()
|
|
356
|
+
tools.register(searchWeb)
|
|
497
357
|
|
|
498
|
-
|
|
499
|
-
|
|
500
|
-
|
|
501
|
-
|
|
358
|
+
const agent = new ReactiveAgent({
|
|
359
|
+
id: 'researcher',
|
|
360
|
+
name: 'Research Assistant',
|
|
361
|
+
version: '1.0.0',
|
|
362
|
+
category: 'research',
|
|
363
|
+
description: 'Finds and synthesizes information',
|
|
502
364
|
})
|
|
503
|
-
```
|
|
504
365
|
|
|
505
|
-
|
|
506
|
-
|
|
507
|
-
|
|
508
|
-
|
|
509
|
-
```typescript
|
|
510
|
-
import {
|
|
511
|
-
HttpConnector,
|
|
512
|
-
ConnectorManager,
|
|
513
|
-
ConnectorRegistry,
|
|
514
|
-
MCPClient,
|
|
515
|
-
MCPConnectorBridge,
|
|
516
|
-
TenantConnectorManager,
|
|
517
|
-
} from '@namzu/sdk'
|
|
518
|
-
|
|
519
|
-
// HTTP connector — configure via connect()
|
|
520
|
-
const slack = new HttpConnector()
|
|
521
|
-
await slack.connect(
|
|
522
|
-
{ id: 'slack', baseUrl: 'https://slack.com/api' },
|
|
523
|
-
{ type: 'bearer', token: process.env.SLACK_TOKEN! },
|
|
366
|
+
const result = await agent.run(
|
|
367
|
+
{ messages: [{ role: 'user', content: 'Summarize the latest LLM benchmarks' }], workingDirectory: process.cwd() },
|
|
368
|
+
{ model: 'anthropic/claude-sonnet-4-20250514', tokenBudget: 8192, timeoutMs: 600_000, provider, tools },
|
|
524
369
|
)
|
|
525
|
-
|
|
526
|
-
// MCP client (stdio or HTTP-SSE transport)
|
|
527
|
-
const mcpClient = new MCPClient({
|
|
528
|
-
serverName: 'my-tools',
|
|
529
|
-
transport: { type: 'stdio', command: 'node', args: ['server.js'] },
|
|
530
|
-
})
|
|
531
|
-
await mcpClient.connect()
|
|
532
|
-
const tools = await mcpClient.listTools()
|
|
533
|
-
const result = await mcpClient.callTool('my_tool', { input: 'value' })
|
|
534
|
-
|
|
535
|
-
// Bridge MCP as a connector so connector-based code paths can reach it
|
|
536
|
-
const connectorManager = new ConnectorManager({ registry: new ConnectorRegistry() })
|
|
537
|
-
const mcpBridge = new MCPConnectorBridge({ manager: connectorManager })
|
|
538
|
-
const discoveredTools = await mcpBridge.listTools()
|
|
539
|
-
await mcpBridge.callTool('my_tool', { input: 'value' })
|
|
540
|
-
|
|
541
|
-
// Multi-tenant isolation
|
|
542
|
-
const tenantManager = new TenantConnectorManager({ registry: new ConnectorRegistry() })
|
|
543
|
-
tenantManager.registerTenant({ tenantId: 'org-123', name: 'Org 123' })
|
|
544
370
|
```
|
|
545
371
|
|
|
546
|
-
|
|
547
|
-
|
|
548
|
-
## Human-in-the-Loop
|
|
549
|
-
|
|
550
|
-
Pause agent execution for human review of plans and tool calls. Checkpoint and resume runs across sessions.
|
|
372
|
+
That is a complete, sandbox-isolated, checkpointed, instrumented agent run with prompt caching, progressive tool disclosure, structured compaction, and emergency save all wired in by default. Those are not features you enable; they are how the kernel runs.
|
|
551
373
|
|
|
552
|
-
|
|
374
|
+
Examples for `PipelineAgent`, `RouterAgent`, and `SupervisorAgent` are in `src/agents/`.
|
|
553
375
|
|
|
554
|
-
|
|
555
|
-
import { PlanManager, drainQuery, autoApproveHandler } from '@namzu/sdk'
|
|
556
|
-
import type { ResumeHandler } from '@namzu/sdk'
|
|
557
|
-
|
|
558
|
-
// 1. Plan approval — runs when the agent produces a plan
|
|
559
|
-
const planManager = new PlanManager(runId, async (request) => {
|
|
560
|
-
const decision = await showPlanUI(request)
|
|
561
|
-
return {
|
|
562
|
-
approved: decision.approved,
|
|
563
|
-
feedback: decision.feedback,
|
|
564
|
-
modifiedSteps: decision.editedSteps,
|
|
565
|
-
}
|
|
566
|
-
})
|
|
567
|
-
|
|
568
|
-
// 2. Tool review — runs for every pending tool call (required by query/drainQuery)
|
|
569
|
-
const resumeHandler: ResumeHandler = async (request) => {
|
|
570
|
-
if (request.type === 'tool_review') {
|
|
571
|
-
const hasDestructive = request.toolCalls.some((t) => t.isDestructive)
|
|
572
|
-
return hasDestructive
|
|
573
|
-
? { action: 'reject_tools', feedback: 'Destructive tool blocked' }
|
|
574
|
-
: { action: 'approve_tools' }
|
|
575
|
-
}
|
|
576
|
-
if (request.type === 'plan_approval') {
|
|
577
|
-
return { action: 'approve_plan' }
|
|
578
|
-
}
|
|
579
|
-
return { action: 'continue' }
|
|
580
|
-
}
|
|
581
|
-
|
|
582
|
-
await drainQuery({ /* ...runConfig, provider, tools, messages, */ resumeHandler })
|
|
583
|
-
```
|
|
584
|
-
|
|
585
|
-
Checkpoint/resume enables long-running agents to pause and restart without losing state (`CheckpointManager`, `checkpointId` in `QueryParams`).
|
|
586
|
-
|
|
587
|
-
## A2A Protocol
|
|
588
|
-
|
|
589
|
-
Agent-to-Agent protocol support for cross-platform agent interoperability. Publish agent cards, accept A2A messages, and bridge between Namzu runs and A2A tasks.
|
|
590
|
-
|
|
591
|
-
```typescript
|
|
592
|
-
import { buildAgentCard, runToA2ATask, a2aMessageToCreateRun } from '@namzu/sdk'
|
|
593
|
-
|
|
594
|
-
// Publish agent capabilities as an A2A Agent Card
|
|
595
|
-
const card = buildAgentCard(agentInfo, {
|
|
596
|
-
baseUrl: 'https://api.example.com',
|
|
597
|
-
transport: 'rest',
|
|
598
|
-
providerOrganization: 'Cogitave',
|
|
599
|
-
})
|
|
600
|
-
// Serve at /.well-known/agent-card.json
|
|
601
|
-
|
|
602
|
-
// Convert an inbound A2A message-send into run creation params
|
|
603
|
-
const runParams = a2aMessageToCreateRun(agentId, {
|
|
604
|
-
message: a2aMessage,
|
|
605
|
-
contextId: a2aMessage.contextId,
|
|
606
|
-
metadata: { model: 'anthropic/claude-sonnet-4-20250514', tokenBudget: 8192 },
|
|
607
|
-
})
|
|
608
|
-
|
|
609
|
-
// Convert a persisted Run (wire type) + thread messages into an A2A task response
|
|
610
|
-
const a2aTask = runToA2ATask(run, threadMessages)
|
|
611
|
-
```
|
|
612
|
-
|
|
613
|
-
## Streaming (SSE)
|
|
614
|
-
|
|
615
|
-
Map internal agent execution events to Server-Sent Events for real-time client updates.
|
|
616
|
-
|
|
617
|
-
Agents emit `RunEvent`s through the listener passed to `run()` / `drainQuery()`. `mapRunToStreamEvent` translates those into SSE-ready `{ event, data }` tuples (returns `null` for events without a wire mapping, which you should skip):
|
|
618
|
-
|
|
619
|
-
```typescript
|
|
620
|
-
import { mapRunToStreamEvent, drainQuery } from '@namzu/sdk'
|
|
621
|
-
|
|
622
|
-
// Event families: run.*, iteration.*, tool.*, token.*, message.*, review.*,
|
|
623
|
-
// checkpoint.*, activity.*, plan.*, agent.*, task.*, plugin.*, sandbox.*
|
|
624
|
-
const listener = (event) => {
|
|
625
|
-
const mapped = mapRunToStreamEvent(event, runId)
|
|
626
|
-
if (!mapped) return
|
|
627
|
-
response.write(`event: ${mapped.wire}\ndata: ${JSON.stringify(mapped.data)}\n\n`)
|
|
628
|
-
}
|
|
629
|
-
|
|
630
|
-
await drainQuery({ /* ...runConfig, provider, tools, messages */ }, listener)
|
|
631
|
-
```
|
|
632
|
-
|
|
633
|
-
## Persona System
|
|
634
|
-
|
|
635
|
-
Layer-based system prompt assembly with inheritance. Define identity, expertise, reflexes, and output format as structured data.
|
|
636
|
-
|
|
637
|
-
```typescript
|
|
638
|
-
import { assembleSystemPrompt, mergePersonas, withSessionContext } from '@namzu/sdk'
|
|
639
|
-
|
|
640
|
-
const basePersona = {
|
|
641
|
-
identity: { role: 'Research Agent', description: 'Gathers and synthesizes information' },
|
|
642
|
-
expertise: { domains: ['academic research', 'data analysis'] },
|
|
643
|
-
reflexes: { constraints: ['Always cite sources', 'Be concise'] },
|
|
644
|
-
output: { format: 'markdown' },
|
|
645
|
-
}
|
|
646
|
-
|
|
647
|
-
// Specialize via inheritance
|
|
648
|
-
const mlResearcher = mergePersonas(basePersona, {
|
|
649
|
-
expertise: { domains: ['machine learning', 'NLP'] },
|
|
650
|
-
})
|
|
376
|
+
---
|
|
651
377
|
|
|
652
|
-
|
|
653
|
-
const systemPrompt = assembleSystemPrompt(mlResearcher, loadedSkills)
|
|
654
|
-
```
|
|
378
|
+
## Design Principles
|
|
655
379
|
|
|
656
|
-
|
|
380
|
+
Five choices shape every decision in the kernel.
|
|
657
381
|
|
|
658
|
-
|
|
382
|
+
**No workarounds. Fix at the root.** When something is wrong, we fix the pattern, not the symptom. A subtle bug in the lifecycle manager means the lifecycle manager changes — we do not paper over it in the agent pattern that calls it.
|
|
659
383
|
|
|
660
|
-
|
|
661
|
-
import { SkillRegistry, resolveSkillChain } from '@namzu/sdk'
|
|
384
|
+
**Type safety is the foundation.** Every resource ID is branded (`RunId`, `ThreadId`, `TaskId`, `TenantId`, `AgentId`, `ToolId`, `MemoryId`, `ChunkId`...). Every discriminated union has exhaustiveness checks. Every public API has Zod-validated inputs at the boundary. The TypeScript compiler is not a formality; it is the first line of defense.
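
To make the branded-ID and exhaustiveness claims concrete, here is a minimal sketch. The `Brand` helper, the `toRunId` constructor, the `run_` prefix, and the `RunState` union are illustrative assumptions, not code taken from `@namzu/sdk`.

```typescript
// Hypothetical sketch: names echo the README's vocabulary but are not the SDK's types.
type Brand<T, B extends string> = T & { readonly __brand: B }

type RunId = Brand<string, 'RunId'>
type ThreadId = Brand<string, 'ThreadId'>

// Validate at the boundary, then brand. The `run_` prefix is an assumption.
function toRunId(raw: string): RunId {
  if (raw.indexOf('run_') !== 0) throw new Error(`not a run id: ${raw}`)
  return raw as RunId
}

// A discriminated union keyed on `kind`.
type RunState =
  | { kind: 'running'; runId: RunId }
  | { kind: 'paused'; runId: RunId; reason: string }
  | { kind: 'completed'; runId: RunId }

function describeRun(state: RunState): string {
  switch (state.kind) {
    case 'running':
      return `${state.runId} is running`
    case 'paused':
      return `${state.runId} paused: ${state.reason}`
    case 'completed':
      return `${state.runId} completed`
    default: {
      // If a new variant is added to RunState and not handled above,
      // this assignment no longer type-checks.
      const unreachable: never = state
      return unreachable
    }
  }
}

const runId = toRunId('run_demo')
const summary = describeRun({ kind: 'paused', runId, reason: 'awaiting review' })
```

Because `RunId` and `ThreadId` carry different brands, passing a `ThreadId` where a `RunId` is expected is rejected by the compiler even though both are strings at runtime.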
|
|
662
385
|
|
|
663
|
-
|
|
664
|
-
await registry.registerAll('/path/to/skills', 'metadata')
|
|
386
|
+
**Deny by default. Fail fast.** Sandboxes deny file I/O by default. Verification gates deny tool calls by default unless a rule allows them. Limit checkers fail the run the moment a budget is breached. Configuration errors throw at boot, not at the 90-minute mark of a long-running job.
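
The two behaviors described above can be sketched in a few lines. This is an illustration under stated assumptions: `AllowRule`, `checkToolCall`, and `assertWithinBudget` are hypothetical names, not the SDK's verification gate or limit checker.

```typescript
// Hypothetical types for illustration; not the @namzu/sdk verification API.
interface AllowRule {
  name: string
  matches(toolName: string, args: Record<string, unknown>): boolean
}

type GateDecision =
  | { allowed: true; rule: string }
  | { allowed: false; reason: string }

// Deny by default: a call passes only if some rule explicitly allows it.
function checkToolCall(
  rules: readonly AllowRule[],
  toolName: string,
  args: Record<string, unknown>,
): GateDecision {
  for (const rule of rules) {
    if (rule.matches(toolName, args)) return { allowed: true, rule: rule.name }
  }
  return { allowed: false, reason: `no rule allows "${toolName}"` }
}

// Fail fast: throw as soon as usage exceeds the budget instead of continuing the run.
function assertWithinBudget(usedTokens: number, tokenBudget: number): void {
  if (usedTokens > tokenBudget) {
    throw new Error(`token budget breached: ${usedTokens} > ${tokenBudget}`)
  }
}

// Example rule set: only the read-only `ls` tool is allowed.
const exampleRules: AllowRule[] = [
  { name: 'read-only-ls', matches: (tool) => tool === 'ls' },
]
const lsDecision = checkToolCall(exampleRules, 'ls', {})
const rmDecision = checkToolCall(exampleRules, 'rm', { path: '/' })
```

An empty rule list denies every call, which is the property that makes the gate safe to ship unconfigured.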
|
|
665
387
|
|
|
666
|
-
|
|
667
|
-
const loaded = await registry.load('web-search', 'full')
|
|
668
|
-
const skill = loaded?.skill
|
|
669
|
-
|
|
670
|
-
// Resolve inheritance: shared skills + agent-specific overrides
|
|
671
|
-
const chain = await resolveSkillChain(
|
|
672
|
-
'/skills/shared',
|
|
673
|
-
'/skills/agent-specific',
|
|
674
|
-
'metadata',
|
|
675
|
-
)
|
|
676
|
-
```
|
|
388
|
+
**Dependency direction is sacred.** `contracts` knows nothing about `sdk`. `sdk` knows nothing about `agents` or `api`. Circular dependencies are a compile error, not a code-review suggestion. This is what keeps the kernel's interface surface small even as its guts grow.
|
|
677
389
|
|
|
678
|
-
|
|
390
|
+
**Convention over surprise.** Every new feature follows a shared pattern language — Registries, Managers, Stores, Runs, Bridges, Providers. You read one subsystem, you can navigate the next one.
|
|
679
391
|
|
|
680
|
-
|
|
681
|
-
|
|
682
|
-
```typescript
|
|
683
|
-
import { InMemoryConversationStore } from '@namzu/sdk'
|
|
684
|
-
|
|
685
|
-
const store = new InMemoryConversationStore({ maxMessages: 50 })
|
|
392
|
+
---
|
|
686
393
|
|
|
687
|
-
|
|
688
|
-
store.createThread('thd_abc123')
|
|
689
|
-
store.addUserMessage('thd_abc123', 'What is the capital of France?')
|
|
394
|
+
## The Agent Event Protocol (AEP)
|
|
690
395
|
|
|
691
|
-
|
|
692
|
-
store.persistRunResult('thd_abc123', runId, runMessages)
|
|
396
|
+
The kernel's contract with the outside world is a typed, versioned event stream. Any UI, any shell, any observability tool subscribes to AEP and renders what it wants.
|
|
693
397
|
|
|
694
|
-
|
|
695
|
-
const history = store.loadMessages('thd_abc123')
|
|
696
|
-
// → [{ role: 'user', content: '...' }, { role: 'assistant', content: '...' }]
|
|
697
|
-
```
|
|
398
|
+
AEP flows over three transports:
|
|
698
399
|
|
|
699
|
-
|
|
400
|
+
- **Bus** (`bus/`) — in-process, for tightly-coupled consumers.
|
|
401
|
+
- **SSE** (`bridge/sse/mapper.ts`) — cross-process over HTTP, for web UIs and remote observers.
|
|
402
|
+
- **A2A** (`bridge/a2a/`) — cross-agent, for multi-agent meshes.
|
|
700
403
|
|
|
701
|
-
|
|
404
|
+
Every transport emits the same event shape. Event types include run lifecycle (`run_started`, `run_paused`, `run_completed`), iteration events (`iteration_started`, `checkpoint_created`), tool events (`tool_called`, `tool_result`), agent events (`agent_pending`, `agent_canceled`), plan events (`plan_requested`, `plan_approved`), advisory events, and error events. They carry consistent metadata: `runId`, `threadId`, `agentId`, `tenantId`, `timestamp`, `depth`, `parentRunId`.
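
As a rough illustration of "same event shape on every transport", the sketch below models a few of the event types named above with the shared metadata fields and frames one as a Server-Sent Event. The field types, the `toolName` property, and the `toSseFrame` helper are assumptions for illustration; the authoritative shapes live in `bridge/sse/mapper.ts`.

```typescript
// Hypothetical envelope: field names follow the README, field types are assumed.
interface AepMeta {
  runId: string
  threadId: string
  agentId: string
  tenantId: string
  timestamp: string // assumed ISO-8601 string
  depth: number
  parentRunId?: string
}

// A small subset of the event types listed in the README.
type AepEvent =
  | ({ type: 'run_started' } & AepMeta)
  | ({ type: 'tool_called'; toolName: string } & AepMeta)
  | ({ type: 'run_completed' } & AepMeta)

// Frame an event for SSE: an `event:` line, a `data:` line, then a blank line.
function toSseFrame(event: AepEvent): string {
  const { type, ...data } = event
  return `event: ${type}\ndata: ${JSON.stringify(data)}\n\n`
}

const exampleEvent: AepEvent = {
  type: 'tool_called',
  toolName: 'ls',
  runId: 'run_demo',
  threadId: 'thd_abc123',
  agentId: 'researcher',
  tenantId: 'org-1',
  timestamp: '2025-01-01T00:00:00.000Z',
  depth: 0,
}
const exampleFrame = toSseFrame(exampleEvent)
```

An in-process bus consumer would receive the same `AepEvent` object directly; only the framing step differs between transports.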
|
|
702
405
|
|
|
703
|
-
|
|
406
|
+
AEP v1 is being finalized. Until the spec is stamped, treat the event shapes as semver-minor: they may change in a minor release.
|
|
704
407
|
|
|
705
|
-
|
|
706
|
-
import { RunPersistence, DiskTaskStore, getRootLogger } from '@namzu/sdk'
|
|
707
|
-
|
|
708
|
-
// Run persistence with token/cost tracking
|
|
709
|
-
const persistence = new RunPersistence({
|
|
710
|
-
runId,
|
|
711
|
-
agentId: 'researcher',
|
|
712
|
-
agentName: 'Research Assistant',
|
|
713
|
-
providerId: 'openrouter',
|
|
714
|
-
outputDir: './runs',
|
|
715
|
-
runConfig: {
|
|
716
|
-
model: 'anthropic/claude-sonnet-4-20250514',
|
|
717
|
-
tokenBudget: 8192,
|
|
718
|
-
timeoutMs: 600_000,
|
|
719
|
-
temperature: 0.7,
|
|
720
|
-
},
|
|
721
|
-
log: getRootLogger(),
|
|
722
|
-
})
|
|
723
|
-
await persistence.init()
|
|
724
|
-
persistence.accumulateUsage({
|
|
725
|
-
promptTokens: 100,
|
|
726
|
-
completionTokens: 50,
|
|
727
|
-
totalTokens: 150,
|
|
728
|
-
})
|
|
729
|
-
await persistence.persist()
|
|
730
|
-
|
|
731
|
-
// Task store with atomic writes (tenant-aware)
|
|
732
|
-
const taskStore = new DiskTaskStore({
|
|
733
|
-
baseDir: './tasks',
|
|
734
|
-
defaultRunId: runId,
|
|
735
|
-
tenantId: 'org-1',
|
|
736
|
-
})
|
|
737
|
-
```
|
|
738
|
-
|
|
739
|
-
## Telemetry
|
|
408
|
+
---
|
|
740
409
|
|
|
741
|
-
|
|
410
|
+
## What You Can Build
|
|
742
411
|
|
|
743
|
-
|
|
744
|
-
import { initTelemetry, getTracer, createPlatformMetrics } from '@namzu/sdk'
|
|
412
|
+
Namzu is not a toy. It is meant for real workloads.
|
|
745
413
|
|
|
746
|
-
|
|
747
|
-
serviceName: 'agent-platform',
|
|
748
|
-
exporterType: 'otlp',
|
|
749
|
-
otlpEndpoint: 'http://localhost:4318',
|
|
750
|
-
otlpHeaders: { authorization: `Bearer ${process.env.OTLP_TOKEN!}` },
|
|
751
|
-
})
|
|
752
|
-
await telemetry.start()
|
|
414
|
+
**Personal and homelab.** A home-automation agent monitoring logs, restarting services, running health checks. A personal research agent feeding PDFs and notes through the RAG pipeline into a knowledge base, answering with citations from your own data. A code-review agent watching your repos, reviewing PRs with a `PipelineAgent` (extract diff → analyze → write review), and posting feedback automatically. A media organizer scanning your library, categorizing files, renaming based on metadata, deduplicating.
|
|
753
415
|
|
|
754
|
-
|
|
755
|
-
const metrics = createPlatformMetrics()
|
|
416
|
+
**Business and team.** A customer-support triage system where a `RouterAgent` classifies incoming tickets and delegates to specialized children (billing, technical, general), each with its own persona, tools, and knowledge base. A document-processing pipeline ingesting contracts, invoices, and reports through RAG, extracting key data, flagging anomalies, generating summaries, with HITL approval for anything destructive. An internal-ops bot that plugs into Slack, Jira, and your database over MCP. A compliance checker where a `SupervisorAgent` coordinates sub-agents each checking a different regulation, then aggregates results and routes flagged items through plan review.
|
|
756
417
|
|
|
757
|
-
|
|
758
|
-
metrics.recordTokenUsage('anthropic/claude-sonnet-4-20250514', 100, 50)
|
|
759
|
-
metrics.recordToolCall('search_web', true)
|
|
760
|
-
span.end()
|
|
418
|
+
**Platform and SaaS.** This is the shape Namzu was designed for from day one. Agent-as-a-Service — each customer gets isolated agents with their own BYOK keys, connector configs, and knowledge bases; tenant isolation is built in, not bolted on. An agent marketplace — agents are portable definitions (`info + tools + persona + skills`), publishable, deployable by any customer with their own keys, specializable through persona inheritance. Cross-organization workflows where agents from different companies discover each other via A2A agent cards and collaborate without a central authority.
|
|
761
419
|
|
|
762
|
-
|
|
763
|
-
```
|
|
420
|
+
---
|
|
764
421
|
|
|
765
|
-
##
|
|
422
|
+
## Quality Bar
|
|
766
423
|
|
|
767
|
-
|
|
768
|
-
@namzu/sdk
|
|
769
|
-
├── advisory/ Advisor registry, execution, trigger evaluation
|
|
770
|
-
├── agents/ Reactive, Pipeline, Router, Supervisor
|
|
771
|
-
├── bridge/ A2A, SSE, connector→tool adapters
|
|
772
|
-
├── bus/ Agent bus and coordination primitives
|
|
773
|
-
├── compaction/ WorkingState extraction and conversation compaction
|
|
774
|
-
├── config/ Runtime configuration with Zod schemas
|
|
775
|
-
├── connector/ HTTP, webhook, MCP client/server, tenant isolation
|
|
776
|
-
├── constants/ Shared SDK constants
|
|
777
|
-
├── contracts/ External wire types and validation schemas (HTTP/A2A/SSE)
|
|
778
|
-
├── execution/ Base and local execution contexts
|
|
779
|
-
├── gateway/ Local task gateway
|
|
780
|
-
├── manager/ Plan, agent, connector, run lifecycle
|
|
781
|
-
├── persona/ System prompt assembly and merging
|
|
782
|
-
├── plugin/ Manifest discovery, lifecycle, contributions, hooks
|
|
783
|
-
├── provider/ OpenRouter, Bedrock, Mock LLM providers
|
|
784
|
-
├── rag/ Chunking, embedding, vector store, retrieval
|
|
785
|
-
├── registry/ Base, managed, agent, connector, tool, plugin registries
|
|
786
|
-
├── router/ Task→model routing
|
|
787
|
-
├── run/ Reporters and limit checking
|
|
788
|
-
├── runtime/ Query engine, iteration phases, decision parser
|
|
789
|
-
├── sandbox/ Process-level isolation (Seatbelt, namespace)
|
|
790
|
-
├── skills/ Skill registry, discovery, and chaining
|
|
791
|
-
├── store/ In-memory, disk, conversation, activity, task, memory
|
|
792
|
-
├── telemetry/ OpenTelemetry tracing and metrics
|
|
793
|
-
├── tools/ defineTool, built-ins, task / advisory / memory tools
|
|
794
|
-
├── types/ Domain model and internal type definitions
|
|
795
|
-
├── utils/ ID generation, cost calc, hashing, logging, shell
|
|
796
|
-
├── vault/ Credential management
|
|
797
|
-
└── verification/ Verification gate and rules
|
|
798
|
-
```
|
|
424
|
+
On architectural fundamentals, Namzu scores at the top of the open-source agent frameworks compared below, by our own assessment.
|
|
799
425
|
|
|
800
|
-
|
|
426
|
+
| Criterion | Namzu | LangChain/LangGraph | CrewAI | OpenAI Agents SDK | Vercel AI SDK |
|
|
427
|
+
|---|---|---|---|---|---|
|
|
428
|
+
| Type Safety | 9 | 5 | 7 | 7 | 9 |
|
|
429
|
+
| Modularity | 9 | 5 | 7 | 8 | 9 |
|
|
430
|
+
| Interface Segregation | 8 | 4 | 6 | 8 | 8 |
|
|
431
|
+
| Extensibility | 9 | 7 | 6 | 6 | 7 |
|
|
432
|
+
| Convention Consistency | 8 | 5 | 7 | 8 | 8 |
|
|
433
|
+
| Dependency Direction | 9 | 4 | 6 | 8 | 8 |
|
|
434
|
+
| **Overall** | **8.7** | **5.0** | **6.5** | **7.5** | **8.2** |
|
|
801
435
|
|
|
802
|
-
|
|
436
|
+
Scores are informed by public docs, community reports, and direct codebase analysis; they are not a definitive ranking. Where we know we have work to do: test coverage is not yet where the architecture deserves it to be. Helping close that gap is the highest-leverage contribution today.
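
The README does not say how the **Overall** row is derived. The figures are consistent with an unweighted mean of the six criteria, rounded to one decimal place, which the following sketch reproduces. The `scores` object and `overall` helper are illustrative names; the numbers are copied from the table.

```typescript
// Per-framework scores in table order: Type Safety, Modularity,
// Interface Segregation, Extensibility, Convention Consistency, Dependency Direction.
const scores: Record<string, number[]> = {
  'Namzu': [9, 9, 8, 9, 8, 9],
  'LangChain/LangGraph': [5, 5, 4, 7, 5, 4],
  'CrewAI': [7, 7, 6, 6, 7, 6],
  'OpenAI Agents SDK': [7, 8, 8, 6, 8, 8],
  'Vercel AI SDK': [9, 9, 8, 7, 8, 8],
}

// Unweighted mean, formatted to one decimal place.
function overall(values: number[]): string {
  const sum = values.reduce((acc, v) => acc + v, 0)
  return (sum / values.length).toFixed(1)
}

const overallRow = Object.keys(scores).map((name) => `${name}: ${overall(scores[name])}`)
// overallRow matches the table: 8.7, 5.0, 6.5, 7.5, 8.2
```

Any weighting of the criteria would change these values, so readers who care more about one criterion should read the individual rows rather than the aggregate.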
|
|
803
437
|
|
|
804
|
-
|
|
438
|
+
---
|
|
805
439
|
|
|
806
|
-
|
|
440
|
+
## Roadmap
|
|
807
441
|
|
|
808
|
-
|
|
442
|
+
Honest view. The kernel is already deep. The next three releases tighten the consumer surface, add the subsystems that are genuinely missing, and extend the driver model to new I/O shapes.
|
|
809
443
|
|
|
810
|
-
|
|
444
|
+
### v0.2 — Surface Polish (short, mostly wiring + docs)
|
|
811
445
|
|
|
812
|
-
|
|
446
|
+
- `Run.replay(runId, { fromCheckpoint })` API on top of the existing checkpoint store
|
|
447
|
+
- Memory promotion pipeline connecting compaction output to the indexed memory store via a Reflector persona
|
|
448
|
+
- **AEP v1 spec** — version and document the event shapes in `bridge/sse/mapper.ts`
|
|
449
|
+
- Public pattern docs for lifecycle, checkpoints, emergency save, budget / quota, verification gate, context cache, file ownership, and circuit breaker
|
|
450
|
+
- `ContextCache` generalized across providers (OpenRouter today → Anthropic, Bedrock next)
|
|
813
451
|
|
|
814
|
-
|
|
452
|
+
### v0.3 — New Subsystems (the four genuinely missing pieces)
|
|
815
453
|
|
|
816
|
-
|
|
454
|
+
- **Workflow / process-graph DSL** — typed `step / branch / parallel / loop / hitl` builder, durable on top of the existing checkpoint and lifecycle
|
|
455
|
+
- **Evaluation subsystem** — `Dataset` + `Scorer` + `Experiment` primitives with a `namzu eval run` CLI, model-graded / rule-based / statistical scorers, SCD-2 versioning
|
|
456
|
+
- **Content-level guardrails** — a second policy layer next to the verification gate, covering LLM I/O (PII, prompt injection, output schema, toxicity) with per-tenant and per-tool attachment
|
|
457
|
+
- **Semantic cache** and **prompt compression** as opt-in additions next to the existing `ContextCache`
|
|
817
458
|
|
|
818
|
-
|
|
459
|
+
### v0.4 — Drivers and I/O (extending the driver model)
|
|
819
460
|
|
|
820
|
-
**
|
|
461
|
+
- **Voice driver** — unified STT / TTS provider abstraction, duplex streaming, real-time speech-to-speech
|
|
462
|
+
- **Multimodal tool I/O** — MIME-typed binary handles for image, audio, and video inputs and outputs
|
|
463
|
+
- **Computer-use driver** — reference implementation with its own sandbox profile
|
|
464
|
+
- **Deterministic provider replay** — cassette pattern for eval and CI, separate from run-level checkpoints
|
|
821
465
|
|
|
822
|
-
|
|
823
|
-
- **Free to use in your own products** (as long as you're not building a competing agent platform)
|
|
824
|
-
- **Each version converts to MIT after 2 years** — fully open source, no strings attached
|
|
466
|
+
### Explicitly out of scope (community or separate packages)
|
|
825
467
|
|
|
826
|
-
|
|
468
|
+
- `@namzu/react`, `@namzu/svelte`, `@namzu/vue` chat hooks
|
|
469
|
+
- Next.js / Hono / Cloudflare Workers adapters
|
|
470
|
+
- A dev studio playground (would consume AEP, lives in its own repo)
|
|
471
|
+
- A visual observability dashboard in the style of VoltOps or LangSmith
|
|
827
472
|
|
|
828
|
-
|
|
473
|
+
These are valuable — they belong on top of the kernel, not in it. Keeping the kernel's interface surface small is why the kernel can move fast.
|
|
829
474
|
|
|
830
|
-
|
|
475
|
+
---
|
|
831
476
|
|
|
832
|
-
##
|
|
477
|
+
## License and Vision
|
|
833
478
|
|
|
834
|
-
|
|
479
|
+
[FSL-1.1-MIT](./LICENSE.md). Every version becomes fully MIT two years after release.
|
|
835
480
|
|
|
836
|
-
|
|
481
|
+
The vision: an open, community-driven agent kernel that reduces systemic dependencies on proprietary platforms — so everyone can build, own, and run AI agents freely. Namzu works with any LLM provider through BYOK, runs in isolation without container orchestration, and surfaces a stable protocol so the application layer stays yours.
|
|
837
482
|
|
|
838
|
-
|
|
483
|
+
If that resonates, we would love your help. Bug reports, feature ideas, PRs, a kind word on your blog — all of it matters. The fastest way in is to pick a subsystem from `src/` that looks interesting, read its code, and open an issue or a PR.
|