@namzu/sdk 0.1.5 → 0.1.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. package/CHANGELOG.md +16 -0
  2. package/README.md +314 -669
  3. package/dist/bridge/tools/connector/adapter.d.ts +2 -2
  4. package/dist/config/runtime.d.ts +52 -52
  5. package/dist/connector/builtins/webhook.d.ts +1 -1
  6. package/dist/contracts/a2a.d.ts +125 -125
  7. package/dist/contracts/schemas.d.ts +34 -34
  8. package/dist/index.d.ts +2 -0
  9. package/dist/index.d.ts.map +1 -1
  10. package/dist/index.js +2 -0
  11. package/dist/index.js.map +1 -1
  12. package/dist/tools/builtins/__tests__/computer-use.test.d.ts +2 -0
  13. package/dist/tools/builtins/__tests__/computer-use.test.d.ts.map +1 -0
  14. package/dist/tools/builtins/__tests__/computer-use.test.js +146 -0
  15. package/dist/tools/builtins/__tests__/computer-use.test.js.map +1 -0
  16. package/dist/tools/builtins/__tests__/structuredOutput.example.d.ts +10 -10
  17. package/dist/tools/builtins/computer-use.d.ts +185 -0
  18. package/dist/tools/builtins/computer-use.d.ts.map +1 -0
  19. package/dist/tools/builtins/computer-use.js +151 -0
  20. package/dist/tools/builtins/computer-use.js.map +1 -0
  21. package/dist/tools/builtins/index.d.ts +1 -0
  22. package/dist/tools/builtins/index.d.ts.map +1 -1
  23. package/dist/tools/builtins/index.js +1 -0
  24. package/dist/tools/builtins/index.js.map +1 -1
  25. package/dist/tools/builtins/ls.d.ts +1 -1
  26. package/dist/types/computer-use/index.d.ts +74 -0
  27. package/dist/types/computer-use/index.d.ts.map +1 -0
  28. package/dist/types/computer-use/index.js +35 -0
  29. package/dist/types/computer-use/index.js.map +1 -0
  30. package/dist/types/plugin/index.d.ts +14 -14
  31. package/dist/types/sandbox/index.d.ts +2 -2
  32. package/dist/types/verification/index.d.ts +18 -18
  33. package/package.json +19 -21
  34. package/src/index.ts +5 -0
  35. package/src/tools/builtins/__tests__/computer-use.test.ts +188 -0
  36. package/src/tools/builtins/computer-use.ts +165 -0
  37. package/src/tools/builtins/index.ts +1 -0
  38. package/src/types/computer-use/index.ts +126 -0
package/README.md CHANGED
@@ -1,6 +1,6 @@
1
- # Namzu
1
+ # Namzu — An Operating System for AI Agents
2
2
 
3
- Open-source AI agent SDK with a built-in runtime. Nothing between you and your agents.
3
+ **A dependency-free, fully controllable kernel that runs, isolates, schedules, remembers, and coordinates AI agents.** Your UI, chat interface, voice surface, CLI, or automation pipeline sits on top of a stable kernel instead of reinventing the hard parts: sandbox isolation, process lifecycle, signals, checkpoints, memory, protocol interop, and audit.
4
4
 
5
5
  [![npm](https://img.shields.io/npm/v/@namzu/sdk?color=blue)](https://www.npmjs.com/package/@namzu/sdk)
6
6
  [![CI](https://github.com/cogitave/namzu/actions/workflows/ci.yml/badge.svg)](https://github.com/cogitave/namzu/actions/workflows/ci.yml)
@@ -8,831 +8,476 @@ Open-source AI agent SDK with a built-in runtime. Nothing between you and your a
8
8
  [![TypeScript](https://img.shields.io/badge/TypeScript-strict-blue)](https://www.typescriptlang.org/)
9
9
  [![Node](https://img.shields.io/badge/node-%3E%3D22-brightgreen)](https://nodejs.org/)
10
10
 
11
- > **Our goal is to build an open, community-driven agent framework that reduces systemic dependencies on proprietary platforms — so that everyone can build, own, and run AI agents freely.** Namzu is designed to work with any LLM provider through a bring-your-own-key model, and every version becomes fully open source (MIT) after two years. If this vision resonates with you, we'd love your help — whether it's a bug report, a feature idea, a pull request, or just spreading the word. Every contribution matters.
11
+ ---
12
12
 
13
- ## Why Namzu?
13
+ ## The Thesis
14
14
 
15
- There are great agent frameworks out there: LangChain, CrewAI, AutoGen, Vercel AI SDK, OpenAI Agents SDK. Each solves a real problem. Namzu exists because we think some things are still missing.
15
+ Most "agent frameworks" today are really application frameworks. They ship chat UIs, opinionated UI layouts, batteries-included hosted dashboards, vendor-specific fast paths, and integration drivers for a handful of databases. You get something you can demo in an hour, and three months later you own a stack where the same framework dictates your frontend, your database, your observability, and your model vendor.
16
16
 
17
- **Sandboxed execution.** Agents execute tools inside process-level sandboxes. macOS uses Seatbelt (SBPL) profiles for deny-default file-I/O and process isolation. Linux uses lightweight mount + PID namespace isolation for process scoping, with resource limits (memory, timeout, max processes) enforced by the runtime. No Docker, no containers.
17
+ We think agent software should be layered like Unix. At the bottom there needs to be a **kernel**: something to isolate processes, schedule tool calls, manage memory pressure, propagate signals across a call tree, persist checkpoints so a run can resume after a crash, mediate inter-process communication, and produce an auditable event stream. Above the kernel there is user space — shells, editors, IDEs, voice gateways, React apps. The kernel does not care which shell you pick; the shell cannot break the isolation the kernel provides.
18
18
 
19
- **True provider independence.** Most frameworks say they're provider-agnostic but are optimized for one vendor. Namzu treats every provider as a first-class citizen through BYOK (Bring Your Own Key). Switch from OpenRouter to Bedrock by changing one line. No performance penalties, no second-class APIs.
19
+ **Namzu is the kernel.** It runs agents the way Unix runs processes. It does not render UI, it does not pick your database, and it does not favor one LLM vendor. It gives you a typed, versioned, documented surface that any UI, any storage backend, and any model can plug into. The surface is small and stable; the guts underneath are deep.
20
20
 
21
- **Thread/Run separation.** Other frameworks mix conversation history with execution traces. Namzu cleanly separates threads (user↔assistant conversation) from runs (tool calls, iterations, internal state). Multi-turn conversations carry only the context that matters.
21
+ ---
22
22
 
23
- **A2A + MCP in one SDK.** Google's Agent-to-Agent protocol and Anthropic's Model Context Protocol are usually separate integrations. Namzu bridges both natively — your agents can expose capabilities via A2A agent cards and consume external tools via MCP, out of the box.
23
+ ## What Namzu Is
24
+
25
+ Namzu is a single-process TypeScript kernel with the following responsibilities:
26
+
27
+ - **Process execution and isolation.** Tools run inside OS-level sandboxes: Seatbelt (SBPL) on macOS, mount + PID namespaces on Linux. Deny-default file I/O, scoped network access, enforced resource limits. No Docker, no container runtime, no daemon, no sidecar.
28
+ - **Agent lifecycle.** Parent/child agent spawn with depth tracking, budget splitting, and causal trace linkage. A supervisor can fork a subtree of agents and get their results back, with each child isolated from its siblings.
29
+ - **Scheduling.** Per-run token, cost, wall-clock, and iteration budgets. Limit checker, task router (cheap model for compaction, expensive for coding), tool tiering (LLM learns to prefer cheaper tools first).
30
+ - **Signals.** `AbortController` tree spanning parent and children. `cancel(taskId)` and `cancelAll(parentRunId)` propagate. Runs can be paused and resumed, aborted cleanly, and emit lifecycle events for every transition.
31
+ - **Memory management.** Working memory via structured compaction to a typed `WorkingState`. Long-term memory via an indexed, tag/query/status-searchable store with disk persistence. No vector database required by default.
32
+ - **Durability.** Atomic per-iteration checkpoints, automatic emergency core-dump on SIGINT/SIGTERM, separate storage for runs, threads, conversations, activities, memories, and tasks.
33
+ - **IPC.** Native A2A (Google agent-to-agent) and MCP (Anthropic Model Context Protocol) — both client and server, one SDK. An internal event bus with circuit breakers, file lock manager, and edit ownership tracking so concurrent agents do not stomp on each other.
34
+ - **Capability system.** Tools are first-class, typed, permissioned, and progressively disclosed. The LLM does not see the full tool catalog; tools start deferred, get activated on demand, and can be suspended. Each tool declares `readOnly`, `destructive`, `concurrencySafe`, `permissions`, `category`.
35
+ - **Syscall filtering.** Every tool call goes through a verification gate — allow / deny / ask, with built-in rules for a read-only allowlist and a dangerous-pattern deny-list, plus custom regex rules. This is separate from sandbox isolation: the gate is the decision layer; the sandbox is the enforcement layer.
36
+ - **Retrieval-augmented context (RAG).** A full pipeline: chunking, embedding providers, ingestion, knowledge base storage, vector store, retriever, context assembler, and a first-class `rag-tool`.
37
+ - **Skills.** Disclosure-tiered capability bundles that the agent can load on demand, distinct from tools.
38
+ - **Personas.** YAML-defined identity, expertise, reflexes, and output format with inheritance — specialize a base persona by merging a single field, no prompt concatenation.
39
+ - **Advisory system.** Mid-execution consultation with specialized advisors. Provider-agnostic: put a security advisor on Bedrock, an architecture advisor on OpenRouter, and let the main agent decide when to consult whom.
40
+ - **Human-in-the-loop.** Structured plan review, per-tool approval with destructiveness flags, typed decision contracts, checkpoint/resume across sessions.
41
+ - **Plugin system.** Lifecycle-hooked plugin loader with MCP contributions, tool contributions, and manifest-driven resolution.
42
+ - **Multi-tenant isolation from day one.** Connector registries, vaults, config, and stores are tenant-scoped. Two organizations can share a process without cross-contamination.
43
+ - **Provider abstraction.** OpenRouter and AWS Bedrock today; the `Provider` interface is narrow enough that adding another vendor is an afternoon. BYOK everywhere, no hidden hot paths for any vendor.
44
+ - **Telemetry.** OpenTelemetry-native spans and metrics. Cost accounting (input tokens, output tokens, cached tokens, cache write tokens, cache discount) flows from the provider into per-run, per-tenant rollups.
45
+ - **Prompt cache integration.** Hash-based system-prompt cache per thread, integrated with provider cache controls (OpenRouter `cacheControl` today, more planned), plus full cache telemetry in every run.
46
+ - **Vault.** BYOK credentials and secrets, tenant-scoped, pluggable backend.
47
+ - **Thread / Run separation.** Conversations (thread: user ↔ assistant messages across sessions) are cleanly separated from runs (tool calls, iterations, internal state). Multi-turn dialogs carry only the context that matters.
48
+
49
+ Every one of those bullets points at code that exists today in `src/`. The architecture is deep even where the surface is quiet.
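As a concrete illustration of the capability flags and the verification gate described in the list above, here is a minimal, self-contained sketch. The shapes are hypothetical — they mirror the flag names in the bullets, not the actual `@namzu/sdk` types:

```typescript
// Illustrative sketch only — not the actual @namzu/sdk API.
// Flag names (readOnly, destructive, concurrencySafe, permissions, category)
// mirror the capability bullet above.
interface ToolSpec {
  name: string
  readOnly: boolean
  destructive: boolean
  concurrencySafe: boolean
  permissions: string[]
  category: string
}

type GateDecision = 'allow' | 'deny' | 'ask'

// A toy verification gate: custom deny rules win, read-only tools pass,
// destructive tools are escalated to a human ("ask").
function verify(tool: ToolSpec, denyPatterns: RegExp[]): GateDecision {
  if (denyPatterns.some((p) => p.test(tool.name))) return 'deny'
  if (tool.readOnly) return 'allow'
  if (tool.destructive) return 'ask'
  return 'allow'
}

const lsTool: ToolSpec = {
  name: 'ls', readOnly: true, destructive: false,
  concurrencySafe: true, permissions: ['fs:read'], category: 'filesystem',
}
const rmTool: ToolSpec = {
  name: 'rm', readOnly: false, destructive: true,
  concurrencySafe: false, permissions: ['fs:write'], category: 'filesystem',
}
```

The point of the shape: the decision layer only needs declared metadata, never the tool's implementation.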
50
+
51
+ ## What Namzu Is Not
52
+
53
+ Equally important for scoping expectations:
54
+
55
+ - **Not a chat SDK.** There are no React, Svelte, or Vue hooks, no generative UI components, no `useChat`. Your UI framework is your choice; the kernel hands you a typed event stream.
56
+ - **Not a hosted service.** There is no dashboard, no Namzu Cloud, no billing page. You run it in your own process.
57
+ - **Not a deployment adapter.** No Next.js, Hono, Express, or Cloudflare Workers plumbing in the kernel. Those belong in separate packages or your own infra code.
58
+ - **Not a dev studio.** No bundled playground UI. A playground that consumes the kernel's event protocol could exist as a separate tool; it would not live inside `@namzu/sdk`.
59
+ - **Not a vector database.** RAG ships with a pluggable `VectorStore` interface, but the kernel does not embed pgvector or Pinecone. Bring your own.
60
+ - **Not an LLM router service.** Task routing is an in-process policy, not a hosted service.
61
+ - **Not a prompt management UI.** Personas are code-defined (YAML files in your repo), not database rows behind a web form.
62
+
63
+ The goal of that list is not to be minimal — the kernel is plenty rich. The goal is to keep the kernel's **interface surface** small and stable so the layers above can move fast without breaking what is underneath.
24
64
 
25
- **Multi-tenant from day one.** Most frameworks assume single-user, single-process. Namzu ships with tenant isolation, scoped connector registries, and environment-aware configuration. Building a platform where multiple teams run agents? That's the default, not an afterthought.
65
+ ---
26
66
 
27
- **Human-in-the-loop that actually works.** Not just "pause and wait for input." Namzu has structured plan review, per-tool approval with destructiveness flags, checkpoint/resume across sessions, and a decision framework that gives humans real control without breaking the agent loop.
67
+ ## The Complete Feature Map
68
+
69
+ How Namzu compares, category by category. The Category row tells you what job each project actually does.
70
+
71
+ | | **Namzu** | LangGraph | CrewAI | Mastra | Vercel AI SDK | OpenAI Agents SDK |
72
+ |---|---|---|---|---|---|---|
73
+ | Category | **Agent Kernel** | Graph framework | Crew framework | TS app framework | Frontend-first SDK | Vendor SDK |
74
+ | Language | TypeScript | Python/JS | Python | TypeScript | TypeScript | Python/JS |
75
+ | Process sandbox (OS-level) | ✅ Seatbelt + NS | ❌ | ❌ | ❌ | ❌ | ❌ |
76
+ | Multi-tenant from day 1 | ✅ | ❌ | ❌ | partial | ❌ | ❌ |
77
+ | Sub-agent spawn (`fork`/`exec`) | ✅ parent/child/depth/budget | via graph | crews | ✅ | ❌ | handoffs |
78
+ | Signal propagation tree | ✅ AbortController + cancelAll | ❌ | ❌ | partial | ❌ | ❌ |
79
+ | Checkpoint + resume | ✅ per-iteration | ✅ per-superstep | ❌ | partial | ❌ | sessions |
80
+ | Emergency save on signal | ✅ `EmergencySaveManager` | ❌ | ❌ | ❌ | ❌ | ❌ |
81
+ | Resource quotas (token / cost / time) | ✅ per run + per child | manual | manual | manual | ❌ | manual |
82
+ | Provider prompt cache wired | ✅ `ContextCache` + telemetry | ❌ | ❌ | partial | ❌ | ✅ |
83
+ | Thread ↔ Run separation | ✅ | ❌ | ❌ | ✅ | ❌ | partial |
84
+ | Native A2A protocol | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
85
+ | Native MCP (client + server) | ✅ | plugin | ❌ | ✅ | ❌ | client only |
86
+ | RAG built into the kernel | ✅ full pipeline | via integrations | via integrations | plugin | ❌ | via tools |
87
+ | Persona inheritance (YAML) | ✅ merge-based | ❌ | role strings | partial | ❌ | instructions |
88
+ | Advisory system (multi-advisor) | ✅ provider-agnostic | ❌ | ❌ | ❌ | ❌ | ❌ |
89
+ | Structured context compaction | ✅ WorkingState | ❌ | ❌ | partial | ❌ | ❌ |
90
+ | Tool tiering (cost-aware) | ✅ user-defined | ❌ | ❌ | ❌ | ❌ | ❌ |
91
+ | Task routing (per-task model) | ✅ fallback chains | manual | manual | manual | ❌ | manual |
92
+ | Progressive tool disclosure | ✅ deferred/active/suspended | ❌ | ❌ | ❌ | ❌ | ❌ |
93
+ | Tool-call verification gate | ✅ allow/deny/ask + custom | ❌ | task-level scope | ❌ | tool approval | ❌ |
94
+ | File ownership / edit locking | ✅ `EditOwnershipTracker` | ❌ | ❌ | ❌ | ❌ | ❌ |
95
+ | Circuit breakers on the bus | ✅ `CircuitBreaker` | ❌ | ❌ | ❌ | ❌ | ❌ |
96
+ | Skills system (separate from tools) | ✅ disclosure-tiered | ❌ | ❌ | ❌ | ❌ | ❌ |
97
+ | Plugin system with lifecycle | ✅ | ❌ | ❌ | partial | ❌ | ❌ |
98
+ | Vault / BYOK | ✅ tenant-scoped | ❌ | ❌ | ❌ | ❌ | ❌ |
99
+ | Telemetry (OpenTelemetry) | ✅ native | via LangSmith | CrewAI+ | partial | ❌ | built-in tracing |
100
+ | Provider lock-in | none | low | low | low | low | OpenAI-first |
28
101
 
29
- **Persona system with inheritance.** Define agent identity, expertise, reflexes, and output format as structured data. Specialize agents through inheritance — a base researcher persona becomes an ML researcher by merging one field. No prompt string concatenation.
102
+ ---
30
103
 
31
- **Progressive tool disclosure.** Agents don't see all tools at once. Tools start as deferred (searchable but not active), get activated on demand, and can be suspended. This keeps the LLM's context focused and reduces hallucinated tool calls.
104
+ ## Architecture in Depth: Every Subsystem
32
105
 
33
- **Advisory System.** A three-layer advisory architecture where agents can consult specialized advisors mid-execution. Unlike Anthropic's Advisor Tool (Claude-only, single advisor), Namzu's system is provider-agnostic (any model advises any model via BYOK), supports multi-advisor with domain routing, configurable triggers, and agent-autonomous consultation. Define a security advisor on Bedrock, an architecture advisor on OpenRouter, and let the agent decide when to consult whom.
106
+ Every folder under `src/` maps to a traditional OS concept. This section walks them one by one, in the order a request actually flows.
34
107
 
35
- **Structured compaction.** When context reaches capacity, Namzu doesn't just truncate — it incrementally extracts structured data (task, plan, files, decisions, failures) into a typed WorkingState, serializes it as compact markdown, and replaces old messages. The agent continues with full context awareness at a fraction of the token cost.
108
+ ### 1. The Boundary: Sandbox (`sandbox/`)
36
109
 
37
- **Tool tiering.** Configurable tier system that teaches the LLM to prefer cheaper tools first. Unlike hardcoded approaches, Namzu's tiers are fully user-defined: bring your own tier labels, priorities, and guidance templates.
110
+ Sandboxing is the foundation. Tools do not execute in the host process; they execute inside an OS-enforced jail with deny-default file I/O and scoped network access. macOS uses Seatbelt profiles in SBPL format (the same mechanism Apple's own apps use). Linux uses lightweight mount + PID namespaces, without requiring Docker, systemd, or any container runtime. The `SandboxProvider` abstraction (`sandbox/factory.ts`, `sandbox/provider/`) means you can swap in a Firecracker or gVisor-backed provider later without touching the rest of the kernel.
38
111
 
39
- **Task routing.** Route sub-tasks to different models based on task type. Compaction and summarization go to cheap models, coding stays on expensive ones. Provider-agnostic, configurable per task type, with automatic fallback chains.
112
+ The kernel enforces memory, timeout, and max-process limits on top of whatever the sandbox gives you. The goal: a rogue or hallucinated tool call should never wipe your filesystem or exfiltrate arbitrary data, even if the LLM tries very hard.
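The deny-default posture can be illustrated with a toy path policy. Real enforcement happens in the OS (Seatbelt profiles, namespaces), not in TypeScript; this sketch only shows the decision shape and is not the SDK's `SandboxProvider` API:

```typescript
// Illustrative only: a deny-default path policy of the kind a sandbox
// profile encodes. Anything not under an allowed prefix is refused.
interface SandboxPolicy {
  allowRead: string[]   // path prefixes readable inside the jail
  allowWrite: string[]  // path prefixes writable inside the jail
}

function mayAccess(policy: SandboxPolicy, path: string, mode: 'read' | 'write'): boolean {
  const prefixes = mode === 'read' ? policy.allowRead : policy.allowWrite
  // Deny-default: no matching prefix means no access.
  return prefixes.some((p) => path.startsWith(p))
}

const policy: SandboxPolicy = {
  allowRead: ['/workspace', '/tmp'],
  allowWrite: ['/workspace/out'],
}
```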
40
113
 
41
- | | Namzu | LangChain/LangGraph | CrewAI | OpenAI Agents SDK | Vercel AI SDK |
42
- |---|---|---|---|---|---|
43
- | Language | TypeScript | Python/JS | Python | Python/JS | TypeScript |
44
- | Provider lock-in | None (BYOK) | Low | Low | Optimized for OpenAI | Low |
45
- | **Process sandbox** | **Native (Seatbelt + NS)** | No | No | No | No |
46
- | Agent patterns | 4 (reactive, pipeline, router, supervisor) | Graph-based | Role-based crews | Handoffs | Single-agent |
47
- | A2A protocol | Native | No | No | No | No |
48
- | MCP support | Native (client + server) | Plugin | No | Client only | No |
49
- | Multi-tenant | Built-in | No | No | No | No |
50
- | Thread/Run separation | Yes | No | No | Sessions | No |
51
- | Plan review (HITL) | Structured | Graph interrupts | No | Basic | Tool approval |
52
- | Persona inheritance | Yes | No | Role strings | Instructions | System prompt |
53
- | Progressive tool loading | Yes | No | No | No | No |
54
- | Advisory system | Multi-advisor, provider-agnostic | No | No | No | No |
55
- | Context compaction | Structured WorkingState | No | No | No | No |
56
- | Tool tiering | Configurable, user-defined | No | No | No | No |
57
- | Task routing | Per-task model selection | No | No | No | No |
58
- | RAG built-in | Full pipeline | Via integrations | Via integrations | Via tools | No |
59
- | Telemetry | OpenTelemetry | LangSmith | CrewAI+ | Built-in tracing | No |
60
-
61
- ### Architecture Quality Scores
62
-
63
- > **Note:** These scores are AI-generated based on public documentation, community feedback, and (for Namzu) direct codebase analysis. They are not official benchmarks — treat them as an informed architectural comparison, not a definitive ranking. We tried to be fair; if you disagree with a score, open an issue and let's discuss.
114
+ ### 2. Interprocess Communication: Bridge (`bridge/`) and Bus (`bus/`)
64
115
 
65
- | Criterion | Namzu | LangChain/LangGraph | CrewAI | OpenAI Agents SDK | Vercel AI SDK |
66
- |---|---|---|---|---|---|
67
- | Type Safety | 9 | 5 | 7 | 7 | 9 |
68
- | Modularity | 9 | 5 | 7 | 8 | 9 |
69
- | Interface Segregation | 8 | 4 | 6 | 8 | 8 |
70
- | Extensibility | 9 | 7 | 6 | 6 | 7 |
71
- | Convention Consistency | 8 | 5 | 7 | 8 | 8 |
72
- | Dependency Direction | 9 | 4 | 6 | 8 | 8 |
73
- | **Overall** | **8.7** | **5.0** | **6.5** | **7.5** | **8.2** |
116
+ Two layers here, with different jobs.
74
117
 
75
- **What the scores tell us:**
118
+ **Bridge** is cross-process and cross-agent communication. The `bridge/a2a/` folder speaks Google's Agent-to-Agent protocol: your agents can publish agent cards describing their capabilities and can discover and invoke other agents' capabilities. The `bridge/mcp/` folder speaks Anthropic's Model Context Protocol, both as a client (consume MCP servers as tools) and as a server (expose your Namzu tools to any MCP-speaking agent). The `bridge/sse/` folder contains the event mapper that turns in-process events into Server-Sent Events for any consumer on the other side of HTTP. `bridge/tools/` wires it all into the tool system.
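The SSE mapping idea, sketched — the event shape here is invented for illustration; the real `AgentBusEvent` schema is richer:

```typescript
// Illustrative sketch of an event mapper: turn an in-process event into
// the text/event-stream wire format (one `event:` line, one `data:` line,
// and a blank line per frame). Not the SDK's actual mapper.
interface ToyBusEvent {
  type: string
  runId: string
  payload?: unknown
}

function toSse(event: ToyBusEvent): string {
  const data = JSON.stringify({ runId: event.runId, payload: event.payload ?? null })
  return `event: ${event.type}\ndata: ${data}\n\n`
}
```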
76
119
 
77
- - **Namzu** scores highest on type safety, extensibility, and dependency direction: its branded ID system, abstract base patterns, and clean acyclic module graph are genuine strengths.
120
+ **Bus** is in-process. This is where the kernel's internal nervous system lives. The bus emits typed `AgentBusEvent`s for every meaningful transition: run started, iteration begun, checkpoint created, tool call dispatched, tool result returned, agent paused, agent canceled, plan requested, plan approved, error thrown. On top of raw event fan-out, the bus offers three kernel-grade primitives:
78
121
 
79
- - **LangChain/LangGraph** suffers from well-documented abstraction bloat and dependency issues. The community consistently reports difficulty debugging, deep coupling between modules, and frequent breaking changes.
122
+ - **`CircuitBreaker`** (`bus/breaker.ts`) closes the bus to a flapping agent. If an agent's run keeps failing, the breaker trips and prevents retry storms. Configurable failure threshold and reset timeout.
123
+ - **`FileLockManager`** (`bus/lock.ts`) holds locks on files across concurrent agents. A child cannot acquire a lock its parent or sibling already holds. Acquisition timeout is enforced.
124
+ - **`EditOwnershipTracker`** (`bus/ownership.ts`) records which run last claimed ownership of a path, emits events on contention, and lets a HITL layer decide who wins. When two agents try to edit the same file, one of them is told to wait or re-plan.
80
125
 
81
- - **CrewAI** offers clean role-based design with Pydantic validation, but is Python-only with limited extensibility for custom agent patterns beyond crews. Security-conscious with task-level tool scoping.
126
+ These exist because the moment you have more than one agent running in parallel against a shared filesystem, you need the kernel to arbitrate. Most frameworks either do not have parallelism or leave it to user space; Namzu treats it as a first-class kernel concern.
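A minimal sketch of the circuit-breaker idea (failure threshold plus reset timeout) described above. This illustrates the mechanism only; it is not the SDK's actual `CircuitBreaker` API:

```typescript
// Toy circuit breaker: trips after `threshold` consecutive failures and
// half-opens after `resetTimeoutMs`. The clock is injectable for testing.
class ToyCircuitBreaker {
  private failures = 0
  private openedAt: number | null = null

  constructor(
    private readonly threshold: number,
    private readonly resetTimeoutMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  allows(): boolean {
    if (this.openedAt === null) return true
    // Half-open after the reset timeout: let the next attempt through.
    if (this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.openedAt = null
      this.failures = 0
      return true
    }
    return false
  }

  recordFailure(): void {
    this.failures += 1
    if (this.failures >= this.threshold) this.openedAt = this.now()
  }

  recordSuccess(): void {
    this.failures = 0
  }
}
```

The tripped breaker is what stops a flapping agent from triggering retry storms on the bus.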
82
127
 
83
- - **OpenAI Agents SDK** is deliberately minimal — four primitives, clean design, good type safety in its TypeScript variant. Limited by a narrow scope: no RAG, no multi-tenant, no A2A. Optimized for OpenAI models.
128
+ ### 3. Process Lifecycle: Manager (`manager/`)
84
129
 
85
- - **Vercel AI SDK** has the strongest frontend integration and end-to-end type safety from server to client. Clean modular architecture. Focused on web/chat UIs rather than backend agent orchestration.
130
+ `manager/agent/lifecycle.ts` is the `fork()` + `exec()` + `waitpid()` of the kernel. When a parent agent (say a `SupervisorAgent`) spawns a child, the lifecycle manager:
86
131
 
87
- **Where Namzu needs to improve:** Test coverage is the next priority; the focus so far has been getting the architecture and core abstractions right. Now that the foundation is solid, comprehensive testing is next. Contributions are very welcome here.
132
+ - Allocates a slice of the parent's token budget, timeout budget, and cost budget to the child
133
+ - Creates a child `AbortController` linked to the parent's
134
+ - Builds a child config via the agent definition's `configBuilder(factoryOptions)`
135
+ - Stamps the child with `parentAgentId`, `parentRunId`, `threadId`, and `depth`
136
+ - Registers the child task in an internal `TaskRegistry` keyed by `TaskId`
137
+ - Emits `agent_pending` on the bus with parent/child/depth metadata
138
+ - Forwards every child event to the parent's run listener so the supervisor sees what its subtree is doing
88
139
 
89
- ## What Can You Build?
140
+ When the parent is cancelled — by HITL, by a limit breach, or by an external signal — `cancelAll(parentRunId)` walks the subtree and aborts every descendant. This is the Unix `SIGKILL` to a process group.
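The cascade can be sketched with linked `AbortController`s — illustrative only; the SDK's task registry and `cancelAll` signature may differ:

```typescript
// Sketch of signal propagation down a spawn tree. Each child gets its own
// AbortController, linked so that aborting the parent aborts the subtree.
const children = new Map<string, AbortController>()

function spawnChild(parent: AbortController, taskId: string): AbortSignal {
  const child = new AbortController()
  // Link: when the parent aborts, abort the child too.
  parent.signal.addEventListener('abort', () => child.abort(), { once: true })
  children.set(taskId, child)
  return child.signal
}

function cancelAllToy(parent: AbortController): void {
  // Aborting the parent cascades through every linked child.
  parent.abort()
}

const root = new AbortController()
const s1 = spawnChild(root, 't1')
const s2 = spawnChild(root, 't2')
cancelAllToy(root)
```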
90
141
 
91
- Namzu is not a toy framework for chatbot demos. It's designed for real workloads, whether you're automating your homelab, streamlining business operations, or building a full agent platform.
142
+ `manager/connector/` manages the lifecycle of external connectors (MCP servers, HTTP connectors). `manager/plan/lifecycle.ts` coordinates HITL plan review. `manager/run/persistence.ts` is the run-level persistence surface, and `manager/run/emergency.ts` is the emergency-save subsystem (see §9 below).
92
143
 
93
- ### Personal & Homelab
144
+ ### 4. Scheduling: Router (`router/`), Execution (`execution/`), Limit Checker (`run/LimitChecker.ts`)
94
145
 
95
- **Your own AI assistant that actually does things.** It doesn't just answer questions; it executes. Connect it to your file system, your scripts, your local services via MCP, and let it manage your infrastructure through conversation.
146
+ The router policy (`router/task-router.ts`) decides which model a task should go to. Compaction and summarization go to cheap models; coding and complex reasoning stay on expensive ones. Tiering is user-defined — you decide which models belong in which tier and what guidance the LLM gets about preferring tier-1 tools first.
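A hypothetical routing table with fallback chains might look like the following — the task types come from the text above, while the model names and shapes are invented for illustration:

```typescript
// Toy task router: per task type, an ordered fallback chain of models.
type TaskType = 'compaction' | 'summarization' | 'coding'

const routes: Record<TaskType, string[]> = {
  compaction: ['cheap-small', 'cheap-medium'],
  summarization: ['cheap-small'],
  coding: ['expensive-frontier', 'cheap-medium'],
}

// Pick the first model in the chain that is currently available.
function routeTask(task: TaskType, available: Set<string>): string | null {
  for (const model of routes[task]) {
    if (available.has(model)) return model
  }
  return null
}
```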
96
147
 
97
- - **Home automation agent** — monitors logs, restarts services, runs health checks, alerts you when something breaks. Give it bash + read-file tools and a reactive loop.
98
- - **Personal research agent** — feeds documents into the RAG pipeline, builds a knowledge base from your notes/PDFs/bookmarks, and answers questions with citations from your own data.
99
- - **Code review agent** — watches your repos, reviews PRs with a pipeline agent (extract diff → analyze → write review), posts feedback automatically.
100
- - **Media organizer** — scans your library, categorizes files, renames based on metadata, deduplicates. A pipeline agent with file tools handles this end to end.
148
+ The execution layer (`execution/base.ts`, `execution/local.ts`) is the concrete executor that invokes the provider, dispatches tool calls, and produces iteration results. Execution is pluggable; you could swap in a remote executor without touching the agent patterns above.
101
149
 
102
- ### Business & Team
150
+ The limit checker (`run/LimitChecker.ts`) is the kernel scheduler's enforcement point. Every iteration it checks: have we exceeded the token budget? The cost budget? The wall-clock timeout? The iteration count? Has the user issued an abort? If any is true, it returns a typed hard-stop decision — `cancelled`, `token_budget_exceeded`, `timeout`, `max_iterations` — and the run ends cleanly with a stop reason recorded in its metadata.
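The per-iteration check can be sketched like so — the stop-reason strings come from the text above; the struct shapes are assumptions, not `LimitChecker`'s real types:

```typescript
// Toy limit checker: returns a typed hard-stop reason, or null to continue.
type StopReason = 'cancelled' | 'token_budget_exceeded' | 'timeout' | 'max_iterations' | null

interface RunBudget { maxTokens: number; maxMs: number; maxIterations: number }
interface RunUsage { tokens: number; elapsedMs: number; iterations: number; aborted: boolean }

function checkLimits(budget: RunBudget, usage: RunUsage): StopReason {
  if (usage.aborted) return 'cancelled'
  if (usage.tokens >= budget.maxTokens) return 'token_budget_exceeded'
  if (usage.elapsedMs >= budget.maxMs) return 'timeout'
  if (usage.iterations >= budget.maxIterations) return 'max_iterations'
  return null // keep running
}
```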
103
151
 
104
- **Agents that plug into your existing workflows.** Namzu's connector system and multi-tenant isolation mean you can deploy agents for different teams without them stepping on each other.
152
+ ### 5. The Runtime Query Path (`runtime/`)
105
153
 
106
- **Customer support triage** — a router agent classifies incoming tickets and delegates to specialized agents (billing, technical, general). Each agent has its own persona, tools, and knowledge base.
107
- - **Document processing pipeline** — ingest contracts, invoices, or reports through RAG. Agents extract key data, flag anomalies, and generate summaries. Human-in-the-loop ensures nothing gets approved without review.
108
- - **Internal ops bot** — connects to your existing tools (Slack, Jira, databases) via HTTP connectors or MCP servers. Team members ask questions in natural language, the agent queries the right systems and responds.
109
- - **Compliance checker** — a supervisor agent coordinates specialized sub-agents that each check a different regulation. Results are aggregated, flagged items go through plan review before any action is taken.
154
+ `runtime/query/` is where one iteration of the agent loop actually happens. The pieces:
110
155
 
111
- ### Platform & SaaS
156
+ - `runtime/query/context.ts` assembles the request context: system prompt, persona, skills, tools, messages.
157
+ - `runtime/query/context-cache.ts` implements `ContextCache` — a hash-based system-prompt cache per thread. If the prompt inputs have not changed since the last iteration, the cache returns the same text so provider-level prompt caching can hit.
158
+ - `runtime/query/prompt.ts` owns `PromptBuilder` — structured, segment-based prompt assembly (static segment vs dynamic segment) that plays well with provider prompt caches.
159
+ - `runtime/query/guard.ts` runs pre-dispatch guards on the request.
160
+ - `runtime/query/executor.ts` actually calls the provider and streams the result.
161
+ - `runtime/query/result.ts` normalizes the provider's response into the kernel's canonical shape.
162
+ - `runtime/query/checkpoint.ts` writes the iteration's checkpoint.
163
+ - `runtime/query/tooling.ts` bridges the iteration to the tool system, including progressive disclosure state.
164
+ - `runtime/query/iteration/` contains the iteration machinery.
165
+ - `runtime/query/plugin-hooks.ts` lets plugins observe and shape iterations.
166
+ - `runtime/query/events.ts` emits the typed events that feed the bus.
112
167
 
113
- **Build a multi-tenant agent platform for your users.** This is what Namzu was designed for from the start.
168
+ `runtime/decision/` (with `parser.ts` and `fallback.ts`) parses LLM decisions (tool calls vs final answer vs thinking vs advisory request) and falls back gracefully when the LLM returns malformed output.
114
169
 
115
- **Agent-as-a-Service** — each customer gets isolated agents with their own API keys (BYOK), connector configs, and knowledge bases. Tenant isolation is built in, not bolted on.
116
- - **Agent marketplace** — define agents as portable definitions (info + tools + persona), publish them, let others deploy with their own keys and customize via persona inheritance.
117
- - **Cross-organization workflows** — agents from different companies discover each other via A2A agent cards and collaborate on shared tasks without a central authority.
170
+ ### 6. Memory Management: Compaction (`compaction/`) and Store (`store/`)
118
171
 
119
- ## Install
172
+ Memory in the kernel is two systems cooperating.
120
173
 
121
- ```bash
122
- npm install @namzu/sdk
123
- ```
174
+ **Working memory** is `compaction/`. When a thread's context approaches the model's window, the kernel does not truncate. It runs the `structured` compaction manager (default in `compaction/managers/structured.ts`, with `slidingWindow.ts` and `null.ts` as alternatives), which incrementally extracts `task / plan / files / decisions / failures` from the message stream into a typed `WorkingState`. The extractor (`compaction/extractor.ts`), verifier (`compaction/verifier.ts`), and serializer (`compaction/serializer.ts`) together produce compact markdown that replaces old messages. The agent keeps context awareness at a fraction of the token cost. `compaction/dangling.ts` handles partial tool-call streams that could otherwise corrupt the conversation state.
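A minimal sketch of what the extract-then-serialize step produces — the fields follow the task / plan / files / decisions / failures split above, but the type and function here are illustrative, not the SDK's `WorkingState` or serializer:

```typescript
// Hypothetical compacted working state — illustrative field names, not the SDK's types.
interface WorkingState {
  task: string
  plan: string[]
  files: string[]
  decisions: string[]
  failures: string[]
}

// Serialize the extracted state into compact markdown that replaces old messages.
function serializeWorkingState(s: WorkingState): string {
  const section = (title: string, items: string[]) =>
    items.length ? `## ${title}\n${items.map((i) => `- ${i}`).join('\n')}` : ''
  return [
    `# Task\n${s.task}`,
    section('Plan', s.plan),
    section('Files', s.files),
    section('Decisions', s.decisions),
    section('Failures', s.failures),
  ]
    .filter(Boolean)
    .join('\n\n')
}
```

A few hundred tokens of structured markdown stand in for thousands of tokens of raw message history.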
124
175
 
125
- ## Quick Start
176
+ **Long-term memory** is `store/memory/`. The `MemoryIndex` (with `InMemoryMemoryIndex` as the default and a disk-backed variant) stores typed `MemoryIndexEntry` records, searchable by free-text query, tag set, and status filter. It persists to disk atomically. There is no required vector database — the default is plain tag and text search. You can layer an embedding-backed index on top if you want, but the kernel does not assume it.
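A toy version of tag-and-text search over typed entries — the entry shape and method names are illustrative, not the `MemoryIndex` API:

```typescript
// Minimal tag-and-text memory index in the spirit of MemoryIndex — illustrative only.
interface MemoryEntry {
  id: string
  text: string
  tags: string[]
  status: 'active' | 'archived'
}

class TinyMemoryIndex {
  private entries: MemoryEntry[] = []

  add(e: MemoryEntry): void {
    this.entries.push(e)
  }

  // Free-text query, tag set, and status filter combine with AND semantics.
  search(query: string, opts: { tags?: string[]; status?: MemoryEntry['status'] } = {}): MemoryEntry[] {
    const q = query.toLowerCase()
    return this.entries.filter(
      (e) =>
        (q === '' || e.text.toLowerCase().includes(q)) &&
        (opts.tags ?? []).every((t) => e.tags.includes(t)) &&
        (opts.status === undefined || e.status === opts.status),
    )
  }
}
```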
126
177
 
127
- ```typescript
128
- import { defineTool, ProviderFactory, ReactiveAgent, ToolRegistry } from '@namzu/sdk'
129
- import { z } from 'zod'
178
+ Alongside memory, `store/` has sibling stores for every kernel concept: `store/run/` (runs, iterations, checkpoints), `store/conversation/` (threads and messages), `store/activity/` (activity log), `store/task/` (task registry), and an in-memory generic `InMemoryStore` for tests and ephemeral workloads.
130
179
 
131
- // Define a tool
132
- const searchWeb = defineTool({
133
- name: 'search_web',
134
- description: 'Search the web for information',
135
- inputSchema: z.object({ query: z.string() }),
136
- category: 'network',
137
- permissions: ['network_access'],
138
- readOnly: true,
139
- destructive: false,
140
- concurrencySafe: true,
141
- execute: async ({ query }) => {
142
- const results = await fetch(`https://api.search.com?q=${query}`)
143
- return { success: true, output: await results.text() }
144
- },
145
- })
180
+ ### 7. The Capability System: Tools (`tools/`) and Registry (`registry/`)
146
181
 
147
- // Create a provider (model is chosen per-run, not on the provider)
148
- const provider = ProviderFactory.createProvider({
149
- type: 'openrouter',
150
- apiKey: process.env.OPENROUTER_KEY!,
151
- })
182
+ Tools in Namzu are first-class typed values, not JSON schemas you have to keep in sync with a handler somewhere else. `defineTool()` takes a Zod `inputSchema`, a Zod `outputSchema` (optional), and an `execute` function. It also takes **declarations** the kernel uses for routing and safety:
152
183
 
153
- // Register tools and build the agent
154
- const tools = new ToolRegistry()
155
- tools.register(searchWeb)
184
+ - `category` — e.g. `network`, `filesystem`, `compute`, `memory`.
185
+ - `permissions` — e.g. `network_access`, `write_filesystem`. Enforced at dispatch time.
186
+ - `readOnly` — predicate over input; tools that only read get different treatment by the verification gate and tool tiering.
187
+ - `destructive` — boolean flag that triggers HITL approval when true.
188
+ - `concurrencySafe` — whether two concurrent runs can invoke this tool with no interference.
156
189
 
157
- const agent = new ReactiveAgent({
158
- id: 'researcher',
159
- name: 'Research Assistant',
160
- version: '1.0.0',
161
- category: 'research',
162
- description: 'Finds and synthesizes information from the web',
163
- })
190
+ `tools/builtins/` ships file I/O, shell, and glob-search tools. `tools/advisory/`, `tools/memory/`, `tools/task/`, and `tools/coordinator/` ship kernel-facing tools that let agents consult advisors, query memory, coordinate siblings, and manage their task registry from inside the agent loop.
164
191
 
165
- const result = await agent.run(
166
- { messages: [{ role: 'user', content: 'Summarize the latest LLM benchmarks' }], workingDirectory: process.cwd() },
167
- { model: 'anthropic/claude-sonnet-4-20250514', tokenBudget: 8192, timeoutMs: 600_000, provider, tools },
168
- )
169
- ```
192
+ **Progressive disclosure** is unique to Namzu. Tools exist in three states — `deferred`, `activated`, `suspended`. The LLM does not see the full tool catalog; it sees the current active set plus a searchable summary of deferred tools. When it needs something specific, it activates it; when it is done, it suspends it. This keeps the context window focused, reduces hallucinated tool calls, and lets a single agent work across dozens of tools without drowning the prompt.
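The three-state model can be sketched in a few lines — state names from above; the class and methods are illustrative, not the SDK's `ToolRegistry`:

```typescript
// Sketch of the three-state disclosure model — illustrative, not the SDK's ToolRegistry.
type DisclosureState = 'deferred' | 'activated' | 'suspended'

class DisclosureSet {
  private state = new Map<string, DisclosureState>()

  register(name: string, initial: DisclosureState = 'deferred'): void {
    this.state.set(name, initial)
  }
  activate(name: string): void {
    this.state.set(name, 'activated')
  }
  suspend(name: string): void {
    this.state.set(name, 'suspended')
  }

  // Only activated tools are exposed to the LLM as callable schemas.
  visibleToLLM(): string[] {
    return [...this.state].filter(([, s]) => s === 'activated').map(([n]) => n)
  }

  // Deferred tools stay searchable so the agent can pull them in on demand.
  searchDeferred(q: string): string[] {
    return [...this.state]
      .filter(([n, s]) => s === 'deferred' && n.includes(q))
      .map(([n]) => n)
  }
}
```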
170
193
 
171
- ## Agent Types
194
+ **Tool tiering** teaches the LLM a cost hierarchy. You define tiers ("tier-1: local", "tier-2: fast remote", "tier-3: expensive API"), each with its own guidance template, and the kernel instructs the LLM to prefer lower tiers first. Unlike hardcoded approaches, every label, priority, and template is yours.
172
195
 
173
- Namzu provides four agent orchestration patterns, each designed for different execution models.
196
+ Registries (`registry/`) are the kernel's object tables. `registry/tool/` is the canonical tool catalog. `registry/agent/` holds agent definitions (the thing you can `AgentManager.spawn()`). `registry/connector/` holds connector catalogs. `registry/plugin/` holds plugins. `ManagedRegistry` is the shared base class with tenant scoping.
174
197
 
175
- ### Reactive Agent
198
+ ### 8. The Decision Layer: Verification Gate (`verification/`)
176
199
 
177
- The core agentic loop. Sends messages to an LLM, executes tool calls, and iterates until the task is complete or a stop condition is hit (token budget, cost limit, timeout, max iterations).
200
+ Before any tool call leaves the kernel, it goes through `verification/gate.ts`'s `VerificationGate`. Think of it as the kernel's seccomp: a rule-based decision layer that says *allow*, *deny*, or *ask*.
178
201
 
179
- ```typescript
180
- import { ReactiveAgent } from '@namzu/sdk'
202
+ Built-in rules:
181
203
 
182
- const agent = new ReactiveAgent({
183
- id: 'solver',
184
- name: 'Problem Solver',
185
- version: '1.0.0',
186
- category: 'analysis',
187
- description: 'Analyzes data with LLM + tools',
188
- })
204
+ - **`allow_read_only`** — if the tool's `readOnly(input)` returns true, allow.
205
+ - **`deny_dangerous_patterns`** — if the input matches any pattern from `DANGEROUS_PATTERNS` (shell injection, common exfiltration signatures, etc.), deny.
206
+ - **Custom regex rules** — per-tenant, per-agent, or global.
189
207
 
190
- const result = await agent.run(
191
- { messages: [{ role: 'user', content: 'Analyze this dataset and find trends' }], workingDirectory: process.cwd() },
192
- {
193
- model: 'anthropic/claude-sonnet-4-20250514',
194
- tokenBudget: 8192,
195
- timeoutMs: 600_000,
196
- provider,
197
- tools, // ToolRegistry
198
- systemPrompt: 'You are a data analyst.',
199
- },
200
- )
201
- ```
208
+ The `ask` decision hands control to the HITL layer. The verification gate is the kernel layer that makes "destructive tool requires approval" a policy, not a user-space convention.
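A minimal sketch of the allow / deny / ask decision — the flags follow the tool declarations above, but the rule ordering (deny patterns checked first) and the implementation are illustrative:

```typescript
// Rule-based gate sketch — illustrative, not verification/gate.ts.
type GateDecision = 'allow' | 'deny' | 'ask'

interface ToolCall {
  readOnly: boolean
  destructive: boolean
  input: string
}

// Tiny stand-ins for DANGEROUS_PATTERNS (the real set is larger).
const DANGEROUS_PATTERNS = [/rm\s+-rf\s+\//, /curl\s+.*\|\s*sh/]

function verify(call: ToolCall, customDeny: RegExp[] = []): GateDecision {
  // Deny rules run first so a dangerous read-only call is still blocked.
  if ([...DANGEROUS_PATTERNS, ...customDeny].some((p) => p.test(call.input))) return 'deny'
  if (call.readOnly) return 'allow'
  if (call.destructive) return 'ask' // routes into the HITL layer
  return 'allow'
}
```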
202
209
 
203
- ### Pipeline Agent
210
+ Verification is intentionally separate from the sandbox: verification is the *decision*, sandbox is the *enforcement*. If a rule fails to deny and a call somehow gets through, the sandbox is still there to contain the damage. Defense in depth, kernel-style.
204
211
 
205
- Deterministic, sequential step execution. Each step receives the output of the previous one. Supports rollback on failure.
212
+ ### 9. Durability: Checkpoints and Emergency Save
206
213
 
207
- ```typescript
208
- import { PipelineAgent } from '@namzu/sdk'
214
+ The kernel assumes processes crash. Two layers make sure that when they do, you do not lose the run.
209
215
 
210
- const etl = new PipelineAgent({
211
- id: 'etl',
212
- name: 'ETL Pipeline',
213
- version: '1.0.0',
214
- category: 'pipeline',
215
- description: 'Extract → transform → load',
216
- })
216
+ **Checkpoints** (`store/run/disk.ts`) are atomic per-iteration snapshots. Each `IterationCheckpoint` captures the run state at a super-step boundary — messages, working state, tool-call state, usage, cost, iteration index. Writes are atomic via write-temp-rename (Convention #8). You can read them, list them, and delete them. A future `Run.replay(runId, { fromCheckpoint })` API will build on top of this; the storage is already there.
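The write-temp-rename convention itself is small enough to show in full; this standalone sketch is not the SDK's `store/run/disk.ts`, but it follows the same atomicity pattern:

```typescript
// Atomic write via write-temp-rename: a reader never observes a half-written file.
import { writeFileSync, renameSync, readFileSync, mkdtempSync } from 'node:fs'
import { join } from 'node:path'
import { tmpdir } from 'node:os'

function atomicWrite(path: string, data: string): void {
  const tmp = `${path}.tmp-${process.pid}`
  writeFileSync(tmp, data) // write the full payload to a sibling temp file
  renameSync(tmp, path)    // rename is atomic on POSIX filesystems
}

const dir = mkdtempSync(join(tmpdir(), 'ckpt-'))
const ckptPath = join(dir, 'iteration-1.json')
atomicWrite(ckptPath, JSON.stringify({ iteration: 1, cost: 0.002 }))
```

If the process dies mid-write, only the temp file is corrupt; the last good checkpoint is untouched.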
217
217
 
218
- const result = await etl.run(
219
- { messages: [], workingDirectory: process.cwd() },
220
- {
221
- model: 'anthropic/claude-sonnet-4-20250514',
222
- tokenBudget: 8192,
223
- timeoutMs: 600_000,
224
- steps: [
225
- { name: 'extract', execute: async (inp, ctx) => await readSource('./data') },
226
- { name: 'transform', execute: async (data, ctx) => normalize(data) },
227
- { name: 'load', execute: async (data, ctx) => await writeToDb(data) },
228
- ],
229
- },
230
- )
231
- ```
218
+ **Emergency save** (`manager/run/emergency.ts`) is the kernel's core-dump. `EmergencySaveManager` installs handlers for SIGINT and SIGTERM. When the process is dying, every active run gets its `toEmergencySnapshot()` flushed atomically to an `emergency/` directory. On the next boot you can inspect or resume the saved state. There is no reliance on the user remembering to catch signals; the kernel does it.
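The shape of the idea, reduced to a sketch — names like `flushEmergencySnapshots` are hypothetical, not the `EmergencySaveManager` API:

```typescript
// Emergency-save sketch: one flush function shared by signal handlers — illustrative only.
const activeRuns = new Map<string, { toEmergencySnapshot(): object }>()
const savedSnapshots: Record<string, object> = {}

function flushEmergencySnapshots(): number {
  let saved = 0
  for (const [id, run] of activeRuns) {
    // In the kernel this is an atomic write into an emergency/ directory;
    // here we just collect the snapshots in memory.
    savedSnapshots[id] = run.toEmergencySnapshot()
    saved++
  }
  return saved
}

// Installed once at boot; the same flush runs on SIGINT and SIGTERM.
for (const sig of ['SIGINT', 'SIGTERM'] as const) {
  process.once(sig, () => {
    flushEmergencySnapshots()
    process.exit(0)
  })
}
```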
232
219
 
233
- ### Router Agent
220
+ Together these give Namzu durable execution without requiring a database. Runs resume across crashes, across reboots, across graceful shutdowns.
234
221
 
235
- Intelligent delegation. An LLM analyzes the input and routes it to the best-suited agent from a set of candidates.
222
+ ### 10. Retrieval-Augmented Generation: RAG (`rag/`)
236
223
 
237
- ```typescript
238
- import { RouterAgent } from '@namzu/sdk'
224
+ RAG is a full kernel subsystem, not a bolt-on. The pipeline:
239
225
 
240
- const router = new RouterAgent({
241
- id: 'dispatcher',
242
- name: 'Task Router',
243
- version: '1.0.0',
244
- category: 'routing',
245
- description: 'Routes an input to the best-fit agent',
246
- })
226
+ - `rag/chunking.ts` — text chunking strategies (configurable via `ChunkingConfig`).
227
+ - `rag/embedding.ts` — the `EmbeddingProvider` abstraction. Providers are BYOK and swappable.
228
+ - `rag/ingestion.ts` — end-to-end ingest: document → chunks → embeddings → vector store.
229
+ - `rag/vector-store.ts` — the `VectorStore` interface, tenant-scoped via `TenantId`. Bring your own backend (pgvector, Pinecone, an in-memory impl for tests).
230
+ - `rag/knowledge-base.ts` — a named collection of documents with metadata and config.
231
+ - `rag/retriever.ts` — the retrieval query path with configurable top-k, threshold, and reranking.
232
+ - `rag/context-assembler.ts` — turns retrieval hits into prompt-ready context windows.
233
+ - `rag/rag-tool.ts` — a first-class tool your agent can invoke, not an external integration.
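As a reference point, the simplest chunking strategy — fixed size with overlap — fits in a few lines. This is an illustrative sketch (sizes in characters for clarity), not `rag/chunking.ts`:

```typescript
// Fixed-size chunking with overlap — the simplest strategy a chunking config
// might select. Illustrative, not the SDK's implementation.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize')
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize))
    if (start + chunkSize >= text.length) break // last chunk reached the end
  }
  return chunks
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from both sides.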
247
234
 
248
- const result = await router.run(
249
- { messages: [{ role: 'user', content: 'Solve 2x + 3 = 11' }], workingDirectory: process.cwd() },
250
- {
251
- model: 'anthropic/claude-sonnet-4-20250514',
252
- tokenBudget: 4096,
253
- timeoutMs: 600_000,
254
- provider,
255
- routes: [
256
- { agentId: 'math-solver', agent: mathAgent, description: 'Solves equations' },
257
- { agentId: 'writer', agent: writerAgent, description: 'Writes content' },
258
- ],
259
- fallbackAgentId: 'writer',
260
- },
261
- )
262
- ```
235
+ RAG lives in the kernel because retrieval is a capability every non-trivial agent needs. Making you wire it up from plugins every time was not the right default.
263
236
 
264
- ### Supervisor Agent
237
+ ### 11. Skills (`skills/`)
265
238
 
266
- Multi-agent coordinator. Manages child agents, delegates tasks, aggregates results, and tracks the full run hierarchy.
239
+ Skills are disclosure-tiered capability bundles distinct from tools. A skill is a named body of knowledge, workflow, or policy that the agent can load on demand. `skills/loader.ts` reads them from disk; `skills/registry.ts` holds the active catalog; each skill has a `SkillDisclosureLevel` that decides when the LLM sees it (always visible, searchable-on-demand, explicit-activation-only). Skills and tools together form the two axes of an agent's capability surface.
267
240
 
268
- ```typescript
269
- import { SupervisorAgent, AgentManager } from '@namzu/sdk'
241
+ ### 12. Personas (`persona/`)
270
242
 
271
- const supervisor = new SupervisorAgent({
272
- id: 'lead',
273
- name: 'Project Lead',
274
- version: '1.0.0',
275
- category: 'coordination',
276
- description: 'Delegates sub-tasks to specialized agents',
277
- })
243
+ Personas describe who an agent is. `persona/assembler.ts` loads them from YAML and composes them with inheritance: a base `researcher` persona defines identity, expertise areas, output format, and reflexes; an `ml-researcher` child merges a single field (`expertise: [...base, 'ML', 'PyTorch']`) and inherits everything else. The assembler produces a typed `AgentPersona` that flows into the prompt as a structured segment (not a string concatenation, not a template hack), so prompt-cache-friendliness is preserved.
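A sketch of the merge semantics — the field names and the auto-extending `expertise` list are illustrative, not the `AgentPersona` type or the assembler's exact rules:

```typescript
// Persona inheritance sketch: child overrides scalars, list fields extend the base.
// Illustrative field names, not the SDK's AgentPersona.
interface Persona {
  name: string
  identity: string
  expertise: string[]
  outputFormat: string
}

function assemblePersona(base: Persona, child: Partial<Persona>): Persona {
  return {
    ...base,
    ...child,
    // List fields merge instead of replacing, mirroring `[...base, 'ML', 'PyTorch']`.
    expertise: child.expertise ? [...base.expertise, ...child.expertise] : base.expertise,
  }
}
```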
278
244
 
279
- const result = await supervisor.run(
280
- { messages: [{ role: 'user', content: 'Research, write, and review a Q3 report' }], workingDirectory: process.cwd() },
281
- {
282
- model: 'anthropic/claude-sonnet-4-20250514',
283
- tokenBudget: 32_768,
284
- timeoutMs: 1_800_000,
285
- provider,
286
- agentManager, // resolves agent ids → implementations
287
- agentIds: ['researcher', 'writer', 'reviewer'],
288
- systemPrompt: 'You coordinate specialists. Decompose tasks, delegate, and synthesize results.',
289
- },
290
- )
291
- // Child runs tracked via parent_run_id and depth
292
- ```
245
+ Personas are code-defined (YAML files in your repo). There is no database, no admin UI, no runtime mutation. That is deliberate: your agent's identity belongs in version control.
293
246
 
294
- ## Tool System
247
+ ### 13. Advisory System (`advisory/`)
295
248
 
296
- Define tools with Zod schemas, permission declarations, and destructiveness flags. The SDK includes built-in tools for file I/O, shell commands, and glob search.
249
+ An advisor is a specialized assistant a running agent can consult mid-execution. The main agent is solving a task; halfway through it hits a decision it is not confident about, or a domain it wants a second opinion on. It fires an advisory request with context; the advisory layer evaluates triggers, routes to the right advisor, executes on a (possibly different) provider, and returns a structured answer the main agent can act on.
297
250
 
298
- ```typescript
299
- import { defineTool, ToolRegistry, getBuiltinTools } from '@namzu/sdk'
300
- import { z } from 'zod'
251
+ Pieces:
301
252
 
302
- const fetchApi = defineTool({
303
- name: 'fetch_api',
304
- description: 'Call an external API endpoint',
305
- inputSchema: z.object({ url: z.string().url(), method: z.enum(['GET', 'POST']) }),
306
- category: 'network',
307
- permissions: ['network_access'],
308
- readOnly: true,
309
- destructive: false,
310
- concurrencySafe: true,
311
- execute: async ({ url, method }) => {
312
- const resp = await fetch(url, { method })
313
- return { success: true, output: await resp.text() }
314
- },
315
- })
253
+ - `advisory/registry.ts` — `AdvisorRegistry`, the catalog of available advisors keyed by domain.
254
+ - `advisory/evaluator.ts` — `TriggerEvaluator`, decides whether an advisory should fire given context and config.
255
+ - `advisory/executor.ts` — `AdvisoryExecutor`, runs the advisor, collects its output, and feeds it back.
256
+ - `advisory/context.ts` — `AdvisoryContext`, the payload passed to advisors.
316
257
 
317
- // Tool registry with progressive activation
318
- const registry = new ToolRegistry()
319
- registry.register(getBuiltinTools(), 'deferred')
320
- registry.register(fetchApi, 'active')
258
+ Unlike Anthropic's advisor tool (Claude-only, single advisor), Namzu's is **provider-agnostic** and **multi-advisor**: put a security advisor on Bedrock, an architecture advisor on OpenRouter, a legal advisor on Anthropic, and the agent decides who to consult. This is one of the things that most cleanly separates Namzu from the pack.
321
259
 
322
- // Agents can search deferred tools and activate on demand
323
- registry.activate(['read_file', 'bash'])
324
- const llmTools = registry.toLLMTools() // Only active + suspended tools
325
- ```
260
+ ### 14. Human-in-the-Loop (`types/hitl/`, `manager/plan/lifecycle.ts`, `types/decision/`)
326
261
 
327
- Built-in tools: `ReadFileTool`, `WriteFileTool`, `EditTool`, `BashTool`, `GlobTool`, `GrepTool`, `LsTool`, `SearchToolsTool`
262
+ HITL is structured, not just a "pause and wait for input" hook. The kernel defines typed decision contracts: the LLM produces a plan, the plan can be approved / edited / rejected, approval can be per-tool with explicit destructiveness acknowledgment, rejection can carry feedback that re-enters the loop as a new iteration. The plan lifecycle has its own manager so that pending plans persist across checkpoint resumes. The verification gate's `ask` decision routes into this same HITL layer.
328
263
 
329
- ### Plugin Contributions
264
+ The kernel does not render a UI for this — it emits events and exposes a typed API so the UI layer you choose can render them however you like.
330
265
 
331
- Plugins extend the runtime with tools, hooks, and MCP servers via a manifest. `PluginLifecycleManager.enable()` loads contributions on demand and rolls back cleanly on failure.
266
+ ### 15. Providers (`provider/`)
332
267
 
333
- ```typescript
334
- import { PluginLifecycleManager } from '@namzu/sdk'
335
-
336
- const manager = new PluginLifecycleManager({ pluginRegistry, toolRegistry, log })
337
- const plugin = await manager.install('/path/to/plugin', 'project')
338
- await manager.enable(plugin.id)
339
- // → manifest.tools registered as `${plugin}:${tool}` (deferred)
340
- // → manifest.hooks attached for run_start/end, iteration_start/end,
341
- // pre/post_llm_call, pre/post_tool_use
342
- // → manifest.mcpServers connected via stdio; their tools registered as
343
- // `${plugin}:mcp__${server}__${tool}` (deferred)
344
- ```
345
-
346
- Hook handlers can return `continue`, `modify` (rewrite tool input), `skip` (synthesize a tool result), or `error` (fail the run). Modify actions compose — chained hooks each see the previous hook's modified input. The runtime emits `plugin_hook_executing` / `plugin_hook_completed` events around every handler.
268
+ An LLM provider implements a narrow interface: given a typed request, return a typed response (streaming or not) and propagate normalized usage, cost, and cache telemetry. Today `provider/openrouter/` and `provider/bedrock/` are in the box; adding another vendor is adding one directory. `provider/telemetry/` normalizes provider-specific response fields (OpenRouter's `cache_read_input_tokens`, `cache_creation_input_tokens`, `cache_discount`, Bedrock's equivalents) into a single kernel-wide telemetry shape.
347
269
 
348
- ### Sandbox-Aware Execution
270
+ `ProviderFactory` is the single entry point. Every run chooses its provider by name; the provider object itself is stateless enough to be shared across runs.
349
271
 
350
- File and shell built-ins (`ReadFileTool`, `WriteFileTool`, `EditTool`, `BashTool`) route through `sandbox.exec()` / `sandbox.readFile()` / `sandbox.writeFile()` when a sandbox is present in the execution context, and fall back to native operations when not. Use `query()` (streaming generator) with a `ToolRegistry` and a `sandboxProvider`:
272
+ ### 16. Connectors (`connector/`)
351
273
 
352
- ```typescript
353
- import { drainQuery, ToolRegistry, getBuiltinTools } from '@namzu/sdk'
274
+ A connector is how an agent reaches external systems. `connector/BaseConnector.ts` is the abstract base; `connector/mcp/` implements MCP connectors in both `stdio` and `http` transports with a `client.ts` and an `adapter.ts` that turns MCP tools into Namzu `ToolDefinition`s; `connector/builtins/` ships the built-in connectors (HTTP, shell, etc.); `connector/execution/` handles connector-level execution concerns. Plugin contributions can register connectors at runtime.
354
275
 
355
- const tools = new ToolRegistry()
356
- tools.register(getBuiltinTools(), 'active')
276
+ ### 17. Prompt Cache Integration
357
277
 
358
- // With sandbox: file + shell tool calls are isolated to the agent workspace
359
- const result = await drainQuery({
360
- agentId: 'solver', agentName: 'Solver', threadId,
361
- provider, tools, runConfig, messages, resumeHandler,
362
- sandboxProvider,
363
- })
364
- ```
278
+ The kernel takes prompt caching seriously because token cost is the number-one production constraint for agents. `runtime/query/context-cache.ts` maintains a per-thread `ContextCache` that hashes the inputs (system prompt + persona + skills + tools + base prompt) and only rebuilds when the hash changes. When the provider supports cache controls (OpenRouter's `cacheControl` parameter today, Anthropic and Bedrock cache headers in progress), the kernel attaches them, and the response's cache telemetry (`cache_read_input_tokens`, `cache_creation_input_tokens`, `cache_discount`) flows back into the run's usage metrics.
365
279
 
366
- ## Sandbox
280
+ This is why `PromptBuilder` splits a request into static and dynamic segments: the static segment is the cache target, and the kernel does the bookkeeping to keep it stable across iterations so the cache actually hits.
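A toy version of the hash-gated rebuild — illustrative, not the `ContextCache` API:

```typescript
// Hash-gated rebuild in the spirit of ContextCache: rebuild the static prompt
// segment only when its inputs change. Illustrative, not the SDK's API.
import { createHash } from 'node:crypto'

class StaticSegmentCache {
  private hash = ''
  private segment = ''
  rebuilds = 0

  get(inputs: { systemPrompt: string; persona: string; tools: string[] }): string {
    const h = createHash('sha256').update(JSON.stringify(inputs)).digest('hex')
    if (h !== this.hash) {
      this.hash = h
      this.segment = [inputs.systemPrompt, inputs.persona, ...inputs.tools].join('\n')
      this.rebuilds++
    }
    // Returning byte-identical text lets the provider's prompt cache hit.
    return this.segment
  }
}
```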
367
281
 
368
- Process-level isolation for agent tool execution. No Docker, no containers — native OS mechanisms.
282
+ ### 18. Vault (`vault/`)
369
283
 
370
- ```typescript
371
- import { drainQuery, SandboxProviderFactory, ToolRegistry, getBuiltinTools, getRootLogger } from '@namzu/sdk'
284
+ The vault holds BYOK credentials and arbitrary secrets. `InMemoryCredentialVault` is the default backend; the `CredentialVault` interface lets you plug in your own. Credentials are tenant-scoped — tenant A cannot see tenant B's keys. Tools, providers, and connectors resolve credentials through the vault rather than reading environment variables directly, so you can rotate without redeploying and you can audit who accessed what.
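The tenant-scoping contract, reduced to a sketch — the real `CredentialVault` interface is richer than this:

```typescript
// Tenant-scoped credential lookup — illustrative, not the SDK's CredentialVault.
class TinyVault {
  private secrets = new Map<string, string>() // key: `${tenantId}:${name}`

  set(tenantId: string, name: string, value: string): void {
    this.secrets.set(`${tenantId}:${name}`, value)
  }

  // A resolver bound to one tenant can never see another tenant's keys.
  forTenant(tenantId: string) {
    return { get: (name: string) => this.secrets.get(`${tenantId}:${name}`) }
  }
}
```

Handing a tool only the tenant-bound resolver, never the whole vault, is what makes the isolation structural.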
372
285
 
373
- const sandboxProvider = SandboxProviderFactory.create(
374
- { enabled: true, provider: 'local', timeoutMs: 60_000, memoryLimitMb: 512, maxProcesses: 16, cleanupOnDestroy: true },
375
- getRootLogger(),
376
- )
286
+ ### 19. Telemetry (`telemetry/`)
377
287
 
378
- const tools = new ToolRegistry()
379
- tools.register(getBuiltinTools(), 'active')
380
-
381
- const result = await drainQuery({
382
- agentId: 'coder', agentName: 'Coder', threadId,
383
- provider, tools, runConfig,
384
- messages: [{ role: 'user', content: 'Write a Python script and run it' }],
385
- resumeHandler,
386
- sandboxProvider, // sandbox-aware tools opt in here
387
- })
388
- ```
288
+ OpenTelemetry-native. `telemetry/attributes.ts` defines the canonical attribute keys; `telemetry/metrics.ts` defines the kernel's metrics surface. Every iteration, every tool call, every provider call emits spans with consistent attributes: `run.id`, `thread.id`, `agent.id`, `tenant.id`, `tool.name`, `provider.name`, `model`, `usage.input_tokens`, `usage.output_tokens`, `usage.cached_tokens`, `cost.usd`. Wire your existing OTel collector, or pipe to LangSmith / Langfuse / Braintrust via their OTel adapters.
389
289
 
390
- **How it works:**
290
+ ### 20. Plugin System (`plugin/`)
391
291
 
392
- | Platform | Mechanism | Profile |
393
- |----------|-----------|---------|
394
- | macOS | `sandbox-exec` with Seatbelt (SBPL) | Deny-default, allow-back for agent workspace |
395
- | Linux | Namespace isolation | Process + filesystem isolation |
292
+ Plugins extend the kernel at runtime. A plugin manifest declares what it contributes (tools, MCP servers, advisors, connectors); `plugin/loader.ts` reads manifests from disk, `plugin/resolver.ts` namespaces every contribution safely, and `plugin/lifecycle.ts` hooks plugin init / shutdown into the kernel's own lifecycle. Plugins can subscribe to iteration hooks via `runtime/query/plugin-hooks.ts` and shape what the LLM sees.
396
293
 
397
- The sandbox creates a temporary workspace directory, restricts file I/O to that directory, and destroys everything on cleanup. The seatbelt profile is minimal by design:
294
+ Plugins are how a community ecosystem grows around the kernel without the kernel having to ship batteries for every use case.
398
295
 
399
- - **Deny-default** nothing is allowed unless explicitly granted
400
- - **Workspace-scoped I/O** — reads and writes only within the agent's `rootDir`
401
- - **Path canonicalization** — resolves macOS symlinks (`/var` → `/private/var`) so seatbelt rules match real paths
402
- - **Process isolation** — `same-sandbox` scope for signals and process info
403
- - **Automatic lifecycle** — sandbox is created before query iteration, destroyed in `finally`
296
+ ### 21. Gateway (`gateway/`)
404
297
 
405
- ```typescript
406
- // Direct sandbox API (low-level)
407
- const sandbox = await sandboxProvider.create({
408
- workingDirectory: process.cwd(),
409
- timeoutMs: 30_000,
410
- memoryLimitMb: 512,
411
- maxProcesses: 16,
412
- })
298
+ `gateway/local.ts` is the local-process gateway — a thin translation layer between an external caller (HTTP, WebSocket, stdin, another agent over A2A) and the kernel's run API. Put a real HTTP server in front of it and you have an agent service; wrap it in a CLI and you have an agent shell. The gateway is where your application layer plugs into the kernel.
413
299
 
414
- const result = await sandbox.exec('/bin/sh', ['-c', 'echo hello'], { timeoutMs: 5_000 })
415
- console.log(result.stdout) // "hello\n"
416
- console.log(result.exitCode) // 0
300
+ ### 22. Agent Patterns (`agents/`)
417
301
 
418
- await sandbox.writeFile('script.py', 'print("namzu")')
419
- const content = await sandbox.readFile('script.py')
302
+ Four patterns ship in the kernel. They are not mandatory — you can write your own `AbstractAgent` subclass for custom loops — but these are the shapes most real workloads want.
420
303
 
421
- await sandbox.destroy() // Cleanup workspace
422
- ```
304
+ - **`ReactiveAgent`** — the canonical agent loop. Prompt → LLM → tool call(s) → iterate until a stop condition is hit. Handles token budget, cost limit, timeout, max iterations, HITL injection, progressive tool disclosure, compaction, and checkpointing automatically.
305
+ - **`PipelineAgent`** — deterministic sequential steps. Each step is a typed function; output of step N is input of step N+1. Rolls back on failure. Useful for ETL, RAG ingestion, multi-stage document processing.
306
+ - **`RouterAgent`** — an LLM classifies the input and delegates to the best-suited agent from a configured set of candidates, with a fallback. Useful for intent routing in customer support, dispatcher bots, and multi-expert systems.
307
+ - **`SupervisorAgent`** — a coordinator that spawns and orchestrates a set of specialized child agents. Tracks the full parent/child/depth hierarchy, aggregates results, handles partial failures, and honors the shared budget tracker.
423
308
 
424
- ## Providers
309
+ All four sit on top of the same lifecycle manager, the same limit checker, the same bus, the same verification gate. Switching patterns does not change what safety or durability the kernel provides.
425
310
 
426
- Pluggable LLM backends with a unified interface for chat, streaming, and model discovery.
311
+ ### 23. Multi-Tenant Isolation
427
312
 
428
- The provider is constructed once with credentials; the model is selected per chat/run so you can swap models without rebuilding the client.
313
+ Every registry, every store, every vault is tenant-scoped. `TenantId` is a branded ID threaded through the kernel's types. A run for tenant A cannot accidentally read tenant B's knowledge base, invoke tenant B's tools, or resolve tenant B's credentials. This is not a feature you turn on; it is the default, and a single-tenant setup is just a special case.
429
314
 
430
- ```typescript
431
- import { ProviderFactory } from '@namzu/sdk'
315
+ ### 24. Thread / Run Separation
432
316
 
433
- // OpenRouter (BYOK)
434
- const openrouter = ProviderFactory.createProvider({
435
- type: 'openrouter',
436
- apiKey: process.env.OPENROUTER_KEY!,
437
- })
317
+ A **thread** is a conversation: a series of user ↔ assistant messages, possibly spanning many sessions, probably spanning many days. A **run** is a single execution pass: an input, iterations, tool calls, usage, cost, result. One thread has many runs. Most frameworks conflate the two; Namzu keeps them explicit, with separate stores, separate IDs, and separate serialization. Multi-turn dialogs carry only the context the kernel thinks matters (via compaction), and run traces stay auditable without drowning in prior-turn tool chatter.
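The separation as plain types, purely illustrative — the kernel's stores keep much richer records:

```typescript
// One thread, many runs — illustrative field names, not the SDK's types.
interface Thread {
  id: string
  messages: { role: string; content: string }[]
}

interface Run {
  id: string
  threadId: string // every run points back at exactly one thread
  iterations: number
  costUsd: number
}

function runsForThread(runs: Run[], threadId: string): Run[] {
  return runs.filter((r) => r.threadId === threadId)
}
```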
438
318
 
439
- // AWS Bedrock
440
- const bedrock = ProviderFactory.createProvider({
441
- type: 'bedrock',
442
- region: 'us-east-1',
443
- })
319
+ ---
444
320
 
445
- // Streaming — model is part of the per-call params
446
- for await (const chunk of openrouter.chatStream({
447
- model: 'anthropic/claude-sonnet-4-20250514',
448
- messages: [{ role: 'user', content: 'hi' }],
449
- })) {
450
- process.stdout.write(chunk.delta?.content ?? '')
451
- }
321
+ ## Install
452
322
 
453
- // Model discovery
454
- const models = await openrouter.listModels()
323
+ ```bash
324
+ npm install @namzu/sdk
455
325
  ```
456
326
 
457
- ## RAG
327
+ Requirements: Node ≥ 22, TypeScript strict mode, ESM.
458
328
 
459
- End-to-end retrieval-augmented generation: chunk documents, embed them, store and search vectors, and inject context into agent prompts.
329
+ ## Quick Start
460
330
 
461
331
  ```typescript
462
- import {
463
- TextChunker,
464
- OpenRouterEmbeddingProvider,
465
- InMemoryVectorStore,
466
- DefaultRetriever,
467
- DefaultKnowledgeBase,
468
- createRAGTool,
469
- } from '@namzu/sdk'
470
-
471
- // Chunking (fixed-size, sentence, paragraph, or recursive)
472
- const chunker = new TextChunker()
473
- const chunks = chunker.chunk(document, {
474
- strategy: 'recursive',
475
- chunkSize: 512,
476
- chunkOverlap: 64,
332
+ import { defineTool, ProviderFactory, ReactiveAgent, ToolRegistry } from '@namzu/sdk'
333
+ import { z } from 'zod'
334
+
335
+ const searchWeb = defineTool({
336
+ name: 'search_web',
337
+ description: 'Search the web for information',
338
+ inputSchema: z.object({ query: z.string() }),
339
+ category: 'network',
340
+ permissions: ['network_access'],
341
+ readOnly: true,
342
+ destructive: false,
343
+ concurrencySafe: true,
344
+ execute: async ({ query }) => {
345
+ const r = await fetch(`https://api.search.com?q=${encodeURIComponent(query)}`)
346
+ return { success: true, output: await r.text() }
347
+ },
477
348
  })

- // Embedding
- const embedder = new OpenRouterEmbeddingProvider({
- model: 'openai/text-embedding-3-small',
+ const provider = ProviderFactory.createProvider({
+ type: 'openrouter',
  apiKey: process.env.OPENROUTER_KEY!,
  })

- // Vector store and retriever
- const vectorStore = new InMemoryVectorStore()
- const retriever = new DefaultRetriever(vectorStore, embedder)
-
- // Knowledge base — pass (config, vectorStore, embeddingProvider)
- const kb = new DefaultKnowledgeBase(
- { id: 'docs', name: 'API Guides', tenantId: 'default' },
- vectorStore,
- embedder,
- )
- await kb.ingest(apiDoc, { title: 'API Guide', source: 'doc-1' })
- const results = await kb.query({ text: 'How do I authenticate?', config: { topK: 5 } })
+ const tools = new ToolRegistry()
+ tools.register(searchWeb)

- // Attach to agent as a tool
- const ragTool = createRAGTool({
- knowledgeBases: new Map([['docs', kb]]),
- defaultKnowledgeBaseId: 'docs',
+ const agent = new ReactiveAgent({
+ id: 'researcher',
+ name: 'Research Assistant',
+ version: '1.0.0',
+ category: 'research',
+ description: 'Finds and synthesizes information',
  })
- ```
 
- ## Connectors
-
- Unified framework for integrating external services — HTTP APIs, webhooks, and MCP servers — with execution contexts and multi-tenant isolation.
-
- ```typescript
- import {
- HttpConnector,
- ConnectorManager,
- ConnectorRegistry,
- MCPClient,
- MCPConnectorBridge,
- TenantConnectorManager,
- } from '@namzu/sdk'
-
- // HTTP connector — configure via connect()
- const slack = new HttpConnector()
- await slack.connect(
- { id: 'slack', baseUrl: 'https://slack.com/api' },
- { type: 'bearer', token: process.env.SLACK_TOKEN! },
+ const result = await agent.run(
+ { messages: [{ role: 'user', content: 'Summarize the latest LLM benchmarks' }], workingDirectory: process.cwd() },
+ { model: 'anthropic/claude-sonnet-4-20250514', tokenBudget: 8192, timeoutMs: 600_000, provider, tools },
  )
-
- // MCP client (stdio or HTTP-SSE transport)
- const mcpClient = new MCPClient({
- serverName: 'my-tools',
- transport: { type: 'stdio', command: 'node', args: ['server.js'] },
- })
- await mcpClient.connect()
- const tools = await mcpClient.listTools()
- const result = await mcpClient.callTool('my_tool', { input: 'value' })
-
- // Bridge MCP as a connector so connector-based code paths can reach it
- const connectorManager = new ConnectorManager({ registry: new ConnectorRegistry() })
- const mcpBridge = new MCPConnectorBridge({ manager: connectorManager })
- const discoveredTools = await mcpBridge.listTools()
- await mcpBridge.callTool('my_tool', { input: 'value' })
-
- // Multi-tenant isolation
- const tenantManager = new TenantConnectorManager({ registry: new ConnectorRegistry() })
- tenantManager.registerTenant({ tenantId: 'org-123', name: 'Org 123' })
  ```

- MCP servers can also be declared in a plugin manifest (`mcpServers: [{ name, command, args, env }]`). The plugin lifecycle starts each server on enable, discovers its tools, and registers them under the plugin namespace. Disable disconnects the clients before unregistering the tools.
-
- ## Human-in-the-Loop
-
- Pause agent execution for human review of plans and tool calls. Checkpoint and resume runs across sessions.
+ That is a complete agent run: sandbox-isolated, checkpointed, and instrumented with telemetry, with prompt caching, progressive tool disclosure, structured compaction, and emergency save wired in by default. Those are not features you enable; they are how the kernel runs.

- Plan approval and tool review are separate handlers wired at different points:
+ Examples for `PipelineAgent`, `RouterAgent`, and `SupervisorAgent` are in `src/agents/`.
 
- ```typescript
- import { PlanManager, drainQuery, autoApproveHandler } from '@namzu/sdk'
- import type { ResumeHandler } from '@namzu/sdk'
-
- // 1. Plan approval — runs when the agent produces a plan
- const planManager = new PlanManager(runId, async (request) => {
- const decision = await showPlanUI(request)
- return {
- approved: decision.approved,
- feedback: decision.feedback,
- modifiedSteps: decision.editedSteps,
- }
- })
-
- // 2. Tool review — runs for every pending tool call (required by query/drainQuery)
- const resumeHandler: ResumeHandler = async (request) => {
- if (request.type === 'tool_review') {
- const hasDestructive = request.toolCalls.some((t) => t.isDestructive)
- return hasDestructive
- ? { action: 'reject_tools', feedback: 'Destructive tool blocked' }
- : { action: 'approve_tools' }
- }
- if (request.type === 'plan_approval') {
- return { action: 'approve_plan' }
- }
- return { action: 'continue' }
- }
-
- await drainQuery({ /* ...runConfig, provider, tools, messages, */ resumeHandler })
- ```
-
- Checkpoint/resume enables long-running agents to pause and restart without losing state (`CheckpointManager`, `checkpointId` in `QueryParams`).
-
- ## A2A Protocol
-
- Agent-to-Agent protocol support for cross-platform agent interoperability. Publish agent cards, accept A2A messages, and bridge between Namzu runs and A2A tasks.
-
- ```typescript
- import { buildAgentCard, runToA2ATask, a2aMessageToCreateRun } from '@namzu/sdk'
-
- // Publish agent capabilities as an A2A Agent Card
- const card = buildAgentCard(agentInfo, {
- baseUrl: 'https://api.example.com',
- transport: 'rest',
- providerOrganization: 'Cogitave',
- })
- // Serve at /.well-known/agent-card.json
-
- // Convert an inbound A2A message-send into run creation params
- const runParams = a2aMessageToCreateRun(agentId, {
- message: a2aMessage,
- contextId: a2aMessage.contextId,
- metadata: { model: 'anthropic/claude-sonnet-4-20250514', tokenBudget: 8192 },
- })
-
- // Convert a persisted Run (wire type) + thread messages into an A2A task response
- const a2aTask = runToA2ATask(run, threadMessages)
- ```
-
- ## Streaming (SSE)
-
- Map internal agent execution events to Server-Sent Events for real-time client updates.
-
- Agents emit `RunEvent`s through the listener passed to `run()` / `drainQuery()`. `mapRunToStreamEvent` translates those into SSE-ready `{ event, data }` tuples (returns `null` for events without a wire mapping, which you should skip):
-
- ```typescript
- import { mapRunToStreamEvent, drainQuery } from '@namzu/sdk'
-
- // Event families: run.*, iteration.*, tool.*, token.*, message.*, review.*,
- // checkpoint.*, activity.*, plan.*, agent.*, task.*, plugin.*, sandbox.*
- const listener = (event) => {
- const mapped = mapRunToStreamEvent(event, runId)
- if (!mapped) return
- response.write(`event: ${mapped.wire}\ndata: ${JSON.stringify(mapped.data)}\n\n`)
- }
-
- await drainQuery({ /* ...runConfig, provider, tools, messages */ }, listener)
- ```
-
- ## Persona System
-
- Layer-based system prompt assembly with inheritance. Define identity, expertise, reflexes, and output format as structured data.
-
- ```typescript
- import { assembleSystemPrompt, mergePersonas, withSessionContext } from '@namzu/sdk'
-
- const basePersona = {
- identity: { role: 'Research Agent', description: 'Gathers and synthesizes information' },
- expertise: { domains: ['academic research', 'data analysis'] },
- reflexes: { constraints: ['Always cite sources', 'Be concise'] },
- output: { format: 'markdown' },
- }
-
- // Specialize via inheritance
- const mlResearcher = mergePersonas(basePersona, {
- expertise: { domains: ['machine learning', 'NLP'] },
- })
+ ---

- // Assemble final system prompt with skills injected
- const systemPrompt = assembleSystemPrompt(mlResearcher, loadedSkills)
- ```
+ ## Design Principles

- ## Skills System
+ Five choices shape every decision in the kernel.

- Reusable agent behaviors with progressive disclosure (metadata-only full body) and inheritance chains.
+ **No workarounds. Fix at the root.** When something is wrong, we fix the pattern, not the symptom. A subtle bug in the lifecycle manager means the lifecycle manager changes — we do not paper over it in the agent pattern that calls it.

- ```typescript
- import { SkillRegistry, resolveSkillChain } from '@namzu/sdk'
+ **Type safety is the foundation.** Every resource ID is branded (`RunId`, `ThreadId`, `TaskId`, `TenantId`, `AgentId`, `ToolId`, `MemoryId`, `ChunkId`...). Every discriminated union has exhaustiveness checks. Every public API has Zod-validated inputs at the boundary. The TypeScript compiler is not a formality; it is the first line of defense.

- const registry = new SkillRegistry()
- await registry.registerAll('/path/to/skills', 'metadata')
+ **Deny by default. Fail fast.** Sandboxes deny file I/O by default. Verification gates deny tool calls by default unless a rule allows them. Limit checkers fail the run the moment a budget is breached. Configuration errors throw at boot, not at the 90-minute mark of a long-running job.

- // Load full skill content on demand returns SkillLoadResult | undefined
- const loaded = await registry.load('web-search', 'full')
- const skill = loaded?.skill
-
- // Resolve inheritance: shared skills + agent-specific overrides
- const chain = await resolveSkillChain(
- '/skills/shared',
- '/skills/agent-specific',
- 'metadata',
- )
- ```
+ **Dependency direction is sacred.** `contracts` knows nothing about `sdk`. `sdk` knows nothing about `agents` or `api`. Circular dependencies are a compile error, not a code-review suggestion. This is what keeps the kernel's interface surface small even as its guts grow.

- ## Threads & Conversations
+ **Convention over surprise.** Every new feature follows a shared pattern language — Registries, Managers, Stores, Runs, Bridges, Providers. You read one subsystem, you can navigate the next one.
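The branded-ID pattern named under "Type safety is the foundation" can be sketched in plain TypeScript. This illustrates the technique; it is not the SDK's actual type declarations:

```typescript
// A nominal "brand" stops structurally identical strings from being mixed up.
type Brand<T, B extends string> = T & { readonly __brand: B }

type RunId = Brand<string, 'RunId'>
type ThreadId = Brand<string, 'ThreadId'>

const asRunId = (s: string): RunId => s as RunId

function cancelRun(id: RunId): string {
  return `canceled ${id}`
}

const runId = asRunId('run_123')
const message = cancelRun(runId)  // OK: message === 'canceled run_123'
// cancelRun('thd_9')             // compile error: a plain string is not a RunId
```

The brand exists only at the type level; at runtime these are ordinary strings, so the pattern costs nothing while catching ID mix-ups at compile time.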

- Namzu separates **threads** (clean user↔assistant conversation history) from **runs** (full execution traces with tool calls, iterations, and internal state). This means multi-turn conversations carry only the relevant context — no tool noise leaking between runs.
-
- ```typescript
- import { InMemoryConversationStore } from '@namzu/sdk'
-
- const store = new InMemoryConversationStore({ maxMessages: 50 })
+ ---

- // Start a thread
- store.createThread('thd_abc123')
- store.addUserMessage('thd_abc123', 'What is the capital of France?')
+ ## The Agent Event Protocol (AEP)

- // After a run completes, persist only the final assistant response
- store.persistRunResult('thd_abc123', runId, runMessages)
+ The kernel's contract with the outside world is a typed, versioned event stream. Any UI, any shell, any observability tool subscribes to AEP and renders what it wants.

- // Next run loads clean conversation history (no tool calls, no system messages)
- const history = store.loadMessages('thd_abc123')
- // → [{ role: 'user', content: '...' }, { role: 'assistant', content: '...' }]
- ```
+ AEP flows over three transports:

- The `ConversationStore` interface is pluggable swap in SQLite, Postgres, or any backend. `InMemoryConversationStore` is bundled for non-persistent use; applications wire it into the runtime themselves.
+ - **Bus** (`bus/`) — in-process, for tightly-coupled consumers.
+ - **SSE** (`bridge/sse/mapper.ts`) — cross-process over HTTP, for web UIs and remote observers.
+ - **A2A** (`bridge/a2a/`) — cross-agent, for multi-agent meshes.

- ## Persistence
+ Every transport emits the same event shape. Event types include run lifecycle (`run_started`, `run_paused`, `run_completed`), iteration events (`iteration_started`, `checkpoint_created`), tool events (`tool_called`, `tool_result`), agent events (`agent_pending`, `agent_canceled`), plan events (`plan_requested`, `plan_approved`), advisory events, and error events. They carry consistent metadata: `runId`, `threadId`, `agentId`, `tenantId`, `timestamp`, `depth`, `parentRunId`.

- In-memory and disk-backed stores for runs, tasks, conversations, and activities.
+ AEP v1 is being finalized. Until the spec is stamped, expect the event shapes to change in semver-minor releases.
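To make the contract concrete, here is a minimal consumer written against an event shape that follows the field names listed above. The interface is illustrative, not the stamped v1 spec:

```typescript
// Illustrative AEP event shape; field names follow the README's metadata list,
// but this is not the stamped v1 spec.
interface AepEvent {
  type: string        // 'run_started', 'tool_called', 'run_completed', ...
  runId: string
  timestamp: number
  threadId?: string
  agentId?: string
  tenantId?: string
  depth?: number
  parentRunId?: string
}

// Transport-agnostic consumption: group by event family regardless of whether
// the event arrived over the bus, SSE, or A2A.
const family = (e: AepEvent): string => e.type.split('_')[0]

const events: AepEvent[] = [
  { type: 'run_started', runId: 'run_1', timestamp: 1 },
  { type: 'tool_called', runId: 'run_1', timestamp: 2 },
  { type: 'tool_result', runId: 'run_1', timestamp: 3 },
  { type: 'run_completed', runId: 'run_1', timestamp: 4 },
]

const runLifecycle = events.filter((e) => family(e) === 'run')
// runLifecycle.length === 2
```

Because every transport emits the same shape, a consumer like this never cares which of the three transports delivered the event.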
 
- ```typescript
- import { RunPersistence, DiskTaskStore, getRootLogger } from '@namzu/sdk'
-
- // Run persistence with token/cost tracking
- const persistence = new RunPersistence({
- runId,
- agentId: 'researcher',
- agentName: 'Research Assistant',
- providerId: 'openrouter',
- outputDir: './runs',
- runConfig: {
- model: 'anthropic/claude-sonnet-4-20250514',
- tokenBudget: 8192,
- timeoutMs: 600_000,
- temperature: 0.7,
- },
- log: getRootLogger(),
- })
- await persistence.init()
- persistence.accumulateUsage({
- promptTokens: 100,
- completionTokens: 50,
- totalTokens: 150,
- })
- await persistence.persist()
-
- // Task store with atomic writes (tenant-aware)
- const taskStore = new DiskTaskStore({
- baseDir: './tasks',
- defaultRunId: runId,
- tenantId: 'org-1',
- })
- ```
-
- ## Telemetry
+ ---

- OpenTelemetry integration for distributed tracing and metrics across agents, tools, and providers.
+ ## What You Can Build

- ```typescript
- import { initTelemetry, getTracer, createPlatformMetrics } from '@namzu/sdk'
+ Namzu is not a toy. It is meant for real workloads.

- const telemetry = initTelemetry({
- serviceName: 'agent-platform',
- exporterType: 'otlp',
- otlpEndpoint: 'http://localhost:4318',
- otlpHeaders: { authorization: `Bearer ${process.env.OTLP_TOKEN!}` },
- })
- await telemetry.start()
+ **Personal and homelab.** A home-automation agent monitoring logs, restarting services, running health checks. A personal research agent feeding PDFs and notes through the RAG pipeline into a knowledge base, answering with citations from your own data. A code-review agent watching your repos, reviewing PRs with a `PipelineAgent` (extract diff → analyze → write review), and posting feedback automatically. A media organizer scanning your library, categorizing files, renaming based on metadata, deduplicating.

- const tracer = getTracer()
- const metrics = createPlatformMetrics()
+ **Business and team.** A customer-support triage system where a `RouterAgent` classifies incoming tickets and delegates to specialized children (billing, technical, general), each with its own persona, tools, and knowledge base. A document-processing pipeline ingesting contracts, invoices, and reports through RAG, extracting key data, flagging anomalies, generating summaries, with HITL approval for anything destructive. An internal-ops bot that plugs into Slack, Jira, and your database over MCP. A compliance checker where a `SupervisorAgent` coordinates sub-agents each checking a different regulation, then aggregates results and routes flagged items through plan review.

- const span = tracer.startSpan('agent.run')
- metrics.recordTokenUsage('anthropic/claude-sonnet-4-20250514', 100, 50)
- metrics.recordToolCall('search_web', true)
- span.end()
+ **Platform and SaaS.** This is the shape Namzu was designed for from day one. Agent-as-a-Service — each customer gets isolated agents with their own BYOK keys, connector configs, and knowledge bases; tenant isolation is built in, not bolted on. An agent marketplace — agents are portable definitions (`info + tools + persona + skills`), publishable, deployable by any customer with their own keys, specializable through persona inheritance. Cross-organization workflows where agents from different companies discover each other via A2A agent cards and collaborate without a central authority.

- metrics.recordRunDuration('completed', 12.4)
- ```
+ ---

- ## Architecture
+ ## Quality Bar

- ```
- @namzu/sdk
- ├── advisory/ Advisor registry, execution, trigger evaluation
- ├── agents/ Reactive, Pipeline, Router, Supervisor
- ├── bridge/ A2A, SSE, connector→tool adapters
- ├── bus/ Agent bus and coordination primitives
- ├── compaction/ WorkingState extraction and conversation compaction
- ├── config/ Runtime configuration with Zod schemas
- ├── connector/ HTTP, webhook, MCP client/server, tenant isolation
- ├── constants/ Shared SDK constants
- ├── contracts/ External wire types and validation schemas (HTTP/A2A/SSE)
- ├── execution/ Base and local execution contexts
- ├── gateway/ Local task gateway
- ├── manager/ Plan, agent, connector, run lifecycle
- ├── persona/ System prompt assembly and merging
- ├── plugin/ Manifest discovery, lifecycle, contributions, hooks
- ├── provider/ OpenRouter, Bedrock, Mock LLM providers
- ├── rag/ Chunking, embedding, vector store, retrieval
- ├── registry/ Base, managed, agent, connector, tool, plugin registries
- ├── router/ Task→model routing
- ├── run/ Reporters and limit checking
- ├── runtime/ Query engine, iteration phases, decision parser
- ├── sandbox/ Process-level isolation (Seatbelt, namespace)
- ├── skills/ Skill registry, discovery, and chaining
- ├── store/ In-memory, disk, conversation, activity, task, memory
- ├── telemetry/ OpenTelemetry tracing and metrics
- ├── tools/ defineTool, built-ins, task / advisory / memory tools
- ├── types/ Domain model and internal type definitions
- ├── utils/ ID generation, cost calc, hashing, logging, shell
- ├── vault/ Credential management
- └── verification/ Verification gate and rules
- ```
+ On architectural fundamentals, our self-assessment places Namzu at the top of open-source agent frameworks.

- ## Vision
+ | Criterion | Namzu | LangChain/LangGraph | CrewAI | OpenAI Agents SDK | Vercel AI SDK |
+ |---|---|---|---|---|---|
+ | Type Safety | 9 | 5 | 7 | 7 | 9 |
+ | Modularity | 9 | 5 | 7 | 8 | 9 |
+ | Interface Segregation | 8 | 4 | 6 | 8 | 8 |
+ | Extensibility | 9 | 7 | 6 | 6 | 7 |
+ | Convention Consistency | 8 | 5 | 7 | 8 | 8 |
+ | Dependency Direction | 9 | 4 | 6 | 8 | 8 |
+ | **Overall** | **8.7** | **5.0** | **6.5** | **7.5** | **8.2** |

- AI agents shouldn't be locked behind walled gardens. Today, building production-grade agents means choosing a platform and accepting its constraints its models, its pricing, its rules. We believe agent infrastructure should be open, composable, and owned by the people who build on it.
+ Scores are informed by public docs, community reports, and direct codebase analysis; they are a self-assessment, not a definitive ranking. Where we know we have work to do: test coverage is not yet where the architecture deserves it to be. Helping close that gap is the highest-leverage contribution today.

- Namzu exists to make that real. A single SDK that works with any LLM, any tool ecosystem, any deployment model. No vendor lock-in. No surprise pricing changes. No permission needed.
+ ---

- ### Where we're headed
+ ## Roadmap

- **Now** Core SDK with multi-agent orchestration, tool system, RAG, MCP, and A2A support. Everything you need to build and run agents locally or on your own infrastructure.
+ An honest view: the kernel is already deep. The next three releases tighten the consumer surface, add the subsystems that are genuinely missing, and extend the driver model to new I/O shapes.

- **Next**Managed runtime for deploying agents at scale, conversational agent builder (build, configure, and deploy agents entirely through chat), and a marketplace for sharing agent definitions and tool connectors.
+ ### v0.2 Surface Polish (short, mostly wiring + docs)

- **Later** Decentralized agent network where agents discover and collaborate with each other across organizations via A2A, without a central authority.
+ - `Run.replay(runId, { fromCheckpoint })` API on top of the existing checkpoint store
+ - Memory promotion pipeline connecting compaction output to the indexed memory store via a Reflector persona
+ - **AEP v1 spec** — version and document the event shapes in `bridge/sse/mapper.ts`
+ - Public pattern docs for lifecycle, checkpoints, emergency save, budget / quota, verification gate, context cache, file ownership, and circuit breaker
+ - `ContextCache` generalized across providers (OpenRouter today → Anthropic, Bedrock next)
 
- We're building this in the open because we believe the agent layer of the stack should belong to everyone. If you share this belief, come build with us.
+ ### v0.3 New Subsystems (the four genuinely missing pieces)

- ## License
+ - **Workflow / process-graph DSL** — typed `step / branch / parallel / loop / hitl` builder, durable on top of the existing checkpoint and lifecycle
+ - **Evaluation subsystem** — `Dataset` + `Scorer` + `Experiment` primitives with a `namzu eval run` CLI, model-graded / rule-based / statistical scorers, SCD-2 versioning
+ - **Content-level guardrails** — a second policy layer next to the verification gate, covering LLM I/O (PII, prompt injection, output schema, toxicity) with per-tenant and per-tool attachment
+ - **Semantic cache** and **prompt compression** as opt-in additions next to the existing `ContextCache`
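To give the planned evaluation subsystem some shape: the `Dataset` / `Scorer` / `Experiment` primitives do not exist yet, but a rule-based scorer could look roughly like this (every name here is hypothetical):

```typescript
// Hypothetical shapes for the planned eval primitives; nothing here is in the SDK yet.
interface Case { input: string; expected: string }
type Scorer = (output: string, c: Case) => number  // score in [0, 1]

const exactMatch: Scorer = (output, c) => (output === c.expected ? 1 : 0)

// An "experiment" is just: run the agent over the dataset, average the scores.
function runExperiment(
  dataset: Case[],
  run: (input: string) => string,
  score: Scorer,
): number {
  const total = dataset.reduce((sum, c) => sum + score(run(c.input), c), 0)
  return total / dataset.length
}

const dataset: Case[] = [
  { input: '2+2', expected: '4' },
  { input: '3+3', expected: '6' },
]

// A deliberately flawed "agent" that only knows one answer.
const accuracy = runExperiment(dataset, (q) => (q === '2+2' ? '4' : '7'), exactMatch)
// accuracy === 0.5
```

Model-graded and statistical scorers would slot in as alternative `Scorer` implementations over the same `Case` shape.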
 
- This software is licensed under the [Functional Source License, Version 1.1, MIT Future License (FSL-1.1-MIT)](./LICENSE.md).
+ ### v0.4 Drivers and I/O (extending the driver model)

- **What this means:**
+ - **Voice driver** — unified STT / TTS provider abstraction, duplex streaming, real-time speech-to-speech
+ - **Multimodal tool I/O** — MIME-typed binary handles for image, audio, and video inputs and outputs
+ - **Computer-use driver** — reference implementation with its own sandbox profile
+ - **Deterministic provider replay** — cassette pattern for eval and CI, separate from run-level checkpoints
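The cassette pattern named in the replay bullet is well established: record real provider responses once, then replay them byte-for-byte in CI so eval runs are deterministic. A stdlib-only sketch; the `Cassette` class is ours for illustration, not a planned API:

```typescript
// Record/replay sketch: same prompt in, same response out, every run.
class Cassette {
  private tape = new Map<string, string>()

  record(prompt: string, response: string): void {
    this.tape.set(prompt, response)
  }

  replay(prompt: string): string {
    const hit = this.tape.get(prompt)
    if (hit === undefined) throw new Error(`no recording for: ${prompt}`)
    return hit
  }
}

const tape = new Cassette()
tape.record('hello', 'hi there')
const replayed = tape.replay('hello')  // 'hi there', deterministically
```

A real implementation would key on the full request (model, messages, parameters), typically via a hash, and fail loudly in CI when a request has no recording.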
 
- - **Free for internal use, education, and research**
- - **Free to use in your own products** (as long as you're not building a competing agent platform)
- - **Each version converts to MIT after 2 years** — fully open source, no strings attached
+ ### Explicitly out of scope (community or separate packages)

- Enterprise licensing is available for organizations that need to build competing products or services. Contact us at enterprise@cogitave.com.
+ - `@namzu/react`, `@namzu/svelte`, `@namzu/vue` chat hooks
+ - Next.js / Hono / Cloudflare Workers adapters
+ - A dev studio playground (would consume AEP, lives in its own repo)
+ - A visual observability dashboard in the style of VoltOps or LangSmith
 
- ## Contributing
+ These are valuable — they belong on top of the kernel, not in it. Keeping the kernel's interface surface small is why the kernel can move fast.

- We welcome contributions! Please read our contributing guidelines (coming soon) before submitting a PR.
+ ---

- ## Security
+ ## License and Vision

- Found a vulnerability? Please report it responsibly. See [SECURITY.md](./SECURITY.md) for details.
+ [FSL-1.1-MIT](./LICENSE.md). Every version becomes fully MIT two years after release.

- ---
+ The vision: an open, community-driven agent kernel that reduces systemic dependencies on proprietary platforms — so everyone can build, own, and run AI agents freely. Namzu works with any LLM provider through BYOK, runs in isolation without container orchestration, and surfaces a stable protocol so the application layer stays yours.

- Built by [@bahadirarda](https://github.com/bahadirarda) · [Cogitave](https://github.com/Cogitave)
+ If that resonates, we would love your help. Bug reports, feature ideas, PRs, a kind word on your blog — all of it matters. The fastest way in is to pick a subsystem from `src/` that looks interesting, read its code, and open an issue or a PR.