@namzu/sdk 0.1.5 → 0.1.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. package/CHANGELOG.md +16 -0
  2. package/README.md +314 -669
  3. package/dist/bridge/tools/connector/adapter.d.ts +2 -2
  4. package/dist/config/runtime.d.ts +52 -52
  5. package/dist/connector/builtins/webhook.d.ts +1 -1
  6. package/dist/contracts/a2a.d.ts +125 -125
  7. package/dist/contracts/schemas.d.ts +34 -34
  8. package/dist/index.d.ts +2 -0
  9. package/dist/index.d.ts.map +1 -1
  10. package/dist/index.js +2 -0
  11. package/dist/index.js.map +1 -1
  12. package/dist/tools/builtins/__tests__/computer-use.test.d.ts +2 -0
  13. package/dist/tools/builtins/__tests__/computer-use.test.d.ts.map +1 -0
  14. package/dist/tools/builtins/__tests__/computer-use.test.js +146 -0
  15. package/dist/tools/builtins/__tests__/computer-use.test.js.map +1 -0
  16. package/dist/tools/builtins/__tests__/structuredOutput.example.d.ts +10 -10
  17. package/dist/tools/builtins/computer-use.d.ts +185 -0
  18. package/dist/tools/builtins/computer-use.d.ts.map +1 -0
  19. package/dist/tools/builtins/computer-use.js +151 -0
  20. package/dist/tools/builtins/computer-use.js.map +1 -0
  21. package/dist/tools/builtins/index.d.ts +1 -0
  22. package/dist/tools/builtins/index.d.ts.map +1 -1
  23. package/dist/tools/builtins/index.js +1 -0
  24. package/dist/tools/builtins/index.js.map +1 -1
  25. package/dist/tools/builtins/ls.d.ts +1 -1
  26. package/dist/types/computer-use/index.d.ts +74 -0
  27. package/dist/types/computer-use/index.d.ts.map +1 -0
  28. package/dist/types/computer-use/index.js +35 -0
  29. package/dist/types/computer-use/index.js.map +1 -0
  30. package/dist/types/plugin/index.d.ts +14 -14
  31. package/dist/types/sandbox/index.d.ts +2 -2
  32. package/dist/types/verification/index.d.ts +18 -18
  33. package/package.json +19 -21
  34. package/src/index.ts +5 -0
  35. package/src/tools/builtins/__tests__/computer-use.test.ts +188 -0
  36. package/src/tools/builtins/computer-use.ts +165 -0
  37. package/src/tools/builtins/index.ts +1 -0
  38. package/src/types/computer-use/index.ts +126 -0
package/README.md CHANGED
@@ -1,6 +1,6 @@
1
- # Namzu
1
+ # Namzu — An Operating System for AI Agents
2
2
 
3
- Open-source AI agent SDK with a built-in runtime. Nothing between you and your agents.
3
+ **A dependency-free, fully controllable kernel that runs, isolates, schedules, remembers, and coordinates AI agents.** Your UI, chat interface, voice surface, CLI, or automation pipeline sits on top of a stable kernel instead of reinventing the hard parts: sandbox isolation, process lifecycle, signals, checkpoints, memory, protocol interop, and audit.
4
4
 
5
5
  [![npm](https://img.shields.io/npm/v/@namzu/sdk?color=blue)](https://www.npmjs.com/package/@namzu/sdk)
6
6
  [![CI](https://github.com/cogitave/namzu/actions/workflows/ci.yml/badge.svg)](https://github.com/cogitave/namzu/actions/workflows/ci.yml)
@@ -8,831 +8,476 @@ Open-source AI agent SDK with a built-in runtime. Nothing between you and your a
8
8
  [![TypeScript](https://img.shields.io/badge/TypeScript-strict-blue)](https://www.typescriptlang.org/)
9
9
  [![Node](https://img.shields.io/badge/node-%3E%3D22-brightgreen)](https://nodejs.org/)
10
10
 
11
- > **Our goal is to build an open, community-driven agent framework that reduces systemic dependencies on proprietary platforms — so that everyone can build, own, and run AI agents freely.** Namzu is designed to work with any LLM provider through a bring-your-own-key model, and every version becomes fully open source (MIT) after two years. If this vision resonates with you, we'd love your help — whether it's a bug report, a feature idea, a pull request, or just spreading the word. Every contribution matters.
11
+ ---
12
12
 
13
- ## Why Namzu?
13
+ ## The Thesis
14
14
 
15
- There are great agent frameworks out there: LangChain, CrewAI, AutoGen, Vercel AI SDK, OpenAI Agents SDK. Each solves a real problem. Namzu exists because we think some things are still missing.
15
+ Most "agent frameworks" today are really application frameworks. They ship chat UIs, opinionated UI layouts, batteries-included hosted dashboards, vendor-specific fast paths, and integration drivers for a handful of databases. You get something you can demo in an hour, and three months later you own a stack where the same framework dictates your frontend, your database, your observability, and your model vendor.
16
16
 
17
- **Sandboxed execution.** Agents execute tools inside process-level sandboxes. macOS uses Seatbelt (SBPL) profiles for deny-default file-I/O and process isolation. Linux uses lightweight mount + PID namespace isolation for process scoping, with resource limits (memory, timeout, max processes) enforced by the runtime. No Docker, no containers.
17
+ We think agent software should be layered like Unix. At the bottom there needs to be a **kernel**: something to isolate processes, schedule tool calls, manage memory pressure, propagate signals across a call tree, persist checkpoints so a run can resume after a crash, mediate inter-process communication, and produce an auditable event stream. Above the kernel there is user space — shells, editors, IDEs, voice gateways, React apps. The kernel does not care which shell you pick; the shell cannot break the isolation the kernel provides.
18
18
 
19
- **True provider independence.** Most frameworks say they're provider-agnostic but are optimized for one vendor. Namzu treats every provider as a first-class citizen through BYOK (Bring Your Own Key). Switch from OpenRouter to Bedrock by changing one line. No performance penalties, no second-class APIs.
19
+ **Namzu is the kernel.** It runs agents the way Unix runs processes. It does not render UI, it does not pick your database, and it does not favor one LLM vendor. It gives you a typed, versioned, documented surface that any UI, any storage backend, and any model can plug into. The surface is small and stable; the guts underneath are deep.
20
20
 
21
- **Thread/Run separation.** Other frameworks mix conversation history with execution traces. Namzu cleanly separates threads (user↔assistant conversation) from runs (tool calls, iterations, internal state). Multi-turn conversations carry only the context that matters.
21
+ ---
22
22
 
23
- **A2A + MCP in one SDK.** Google's Agent-to-Agent protocol and Anthropic's Model Context Protocol are usually separate integrations. Namzu bridges both natively — your agents can expose capabilities via A2A agent cards and consume external tools via MCP, out of the box.
23
+ ## What Namzu Is
24
+
25
+ Namzu is a single-process TypeScript kernel with the following responsibilities:
26
+
27
+ - **Process execution and isolation.** Tools run inside OS-level sandboxes: Seatbelt (SBPL) on macOS, mount + PID namespaces on Linux. Deny-default file I/O, scoped network access, enforced resource limits. No Docker, no container runtime, no daemon, no sidecar.
28
+ - **Agent lifecycle.** Parent/child agent spawn with depth tracking, budget splitting, and causal trace linkage. A supervisor can fork a subtree of agents and get their results back, with each child isolated from its siblings.
29
+ - **Scheduling.** Per-run token, cost, wall-clock, and iteration budgets. Limit checker, task router (cheap model for compaction, expensive for coding), tool tiering (LLM learns to prefer cheaper tools first).
30
+ - **Signals.** `AbortController` tree spanning parent and children. `cancel(taskId)` and `cancelAll(parentRunId)` propagate. Runs can be paused and resumed, aborted cleanly, and emit lifecycle events for every transition.
31
+ - **Memory management.** Working memory via structured compaction to a typed `WorkingState`. Long-term memory via an indexed, tag/query/status-searchable store with disk persistence. No vector database required by default.
32
+ - **Durability.** Atomic per-iteration checkpoints, automatic emergency core-dump on SIGINT/SIGTERM, separate storage for runs, threads, conversations, activities, memories, and tasks.
33
+ - **IPC.** Native A2A (Google agent-to-agent) and MCP (Anthropic Model Context Protocol) — both client and server, one SDK. An internal event bus with circuit breakers, file lock manager, and edit ownership tracking so concurrent agents do not stomp on each other.
34
+ - **Capability system.** Tools are first-class, typed, permissioned, and progressively disclosed. The LLM does not see the full tool catalog; tools start deferred, get activated on demand, and can be suspended. Each tool declares `readOnly`, `destructive`, `concurrencySafe`, `permissions`, `category`.
35
+ - **Syscall filtering.** Every tool call goes through a verification gate — allow / deny / ask, with built-in rules for a read-only allowlist and a dangerous-pattern deny-list, plus custom regex rules. This is separate from sandbox isolation: the gate is the decision layer; the sandbox is the enforcement layer.
36
+ - **Retrieval-augmented context (RAG).** A full pipeline: chunking, embedding providers, ingestion, knowledge base storage, vector store, retriever, context assembler, and a first-class `rag-tool`.
37
+ - **Skills.** Disclosure-tiered capability bundles that the agent can load on demand, distinct from tools.
38
+ - **Personas.** YAML-defined identity, expertise, reflexes, and output format with inheritance — specialize a base persona by merging a single field, no prompt concatenation.
39
+ - **Advisory system.** Mid-execution consultation with specialized advisors. Provider-agnostic: put a security advisor on Bedrock, an architecture advisor on OpenRouter, and let the main agent decide when to consult whom.
40
+ - **Human-in-the-loop.** Structured plan review, per-tool approval with destructiveness flags, typed decision contracts, checkpoint/resume across sessions.
41
+ - **Plugin system.** Lifecycle-hooked plugin loader with MCP contributions, tool contributions, and manifest-driven resolution.
42
+ - **Multi-tenant isolation from day one.** Connector registries, vaults, config, and stores are tenant-scoped. Two organizations can share a process without cross-contamination.
43
+ - **Provider abstraction.** OpenRouter and AWS Bedrock today; the `Provider` interface is narrow enough that adding another vendor is an afternoon. BYOK everywhere, no hidden hot paths for any vendor.
44
+ - **Telemetry.** OpenTelemetry-native spans and metrics. Cost accounting (input tokens, output tokens, cached tokens, cache write tokens, cache discount) flows from the provider into per-run, per-tenant rollups.
45
+ - **Prompt cache integration.** Hash-based system-prompt cache per thread, integrated with provider cache controls (OpenRouter `cacheControl` today, more planned), plus full cache telemetry in every run.
46
+ - **Vault.** BYOK credentials and secrets, tenant-scoped, pluggable backend.
47
+ - **Thread / Run separation.** Conversations (thread: user ↔ assistant messages across sessions) are cleanly separated from runs (tool calls, iterations, internal state). Multi-turn dialogs carry only the context that matters.
48
+
49
+ Every one of those bullets points at code that exists today in `src/`. The architecture is deep even where the surface is quiet.
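As a concrete illustration of the capability flags and the verification gate described in the list above, here is a minimal, self-contained sketch. The shapes are hypothetical — they mirror the flag names in the bullets, not the actual `@namzu/sdk` types:

```typescript
// Illustrative sketch only — not the actual @namzu/sdk API.
// Flag names (readOnly, destructive, concurrencySafe, permissions, category)
// mirror the capability bullet above.
interface ToolSpec {
  name: string
  readOnly: boolean
  destructive: boolean
  concurrencySafe: boolean
  permissions: string[]
  category: string
}

type GateDecision = 'allow' | 'deny' | 'ask'

// A toy verification gate: custom deny rules win, read-only tools pass,
// destructive tools are escalated to a human ("ask").
function verify(tool: ToolSpec, denyPatterns: RegExp[]): GateDecision {
  if (denyPatterns.some((p) => p.test(tool.name))) return 'deny'
  if (tool.readOnly) return 'allow'
  if (tool.destructive) return 'ask'
  return 'allow'
}

const lsTool: ToolSpec = {
  name: 'ls', readOnly: true, destructive: false,
  concurrencySafe: true, permissions: ['fs:read'], category: 'filesystem',
}
const rmTool: ToolSpec = {
  name: 'rm', readOnly: false, destructive: true,
  concurrencySafe: false, permissions: ['fs:write'], category: 'filesystem',
}
```

The point of the shape: the decision layer only needs declared metadata, never the tool's implementation.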
50
+
51
+ ## What Namzu Is Not
52
+
53
+ Equally important for scoping expectations:
54
+
55
+ - **Not a chat SDK.** There are no React, Svelte, or Vue hooks, no generative UI components, no `useChat`. Your UI framework is your choice; the kernel hands you a typed event stream.
56
+ - **Not a hosted service.** There is no dashboard, no Namzu Cloud, no billing page. You run it in your own process.
57
+ - **Not a deployment adapter.** No Next.js, Hono, Express, or Cloudflare Workers plumbing in the kernel. Those belong in separate packages or your own infra code.
58
+ - **Not a dev studio.** No bundled playground UI. A playground that consumes the kernel's event protocol could exist as a separate tool; it would not live inside `@namzu/sdk`.
59
+ - **Not a vector database.** RAG ships with a pluggable `VectorStore` interface, but the kernel does not embed pgvector or Pinecone. Bring your own.
60
+ - **Not an LLM router service.** Task routing is an in-process policy, not a hosted service.
61
+ - **Not a prompt management UI.** Personas are code-defined (YAML files in your repo), not database rows behind a web form.
62
+
63
+ The goal of that list is not to be minimal — the kernel is plenty rich. The goal is to keep the kernel's **interface surface** small and stable so the layers above can move fast without breaking what is underneath.
24
64
 
25
- **Multi-tenant from day one.** Most frameworks assume single-user, single-process. Namzu ships with tenant isolation, scoped connector registries, and environment-aware configuration. Building a platform where multiple teams run agents? That's the default, not an afterthought.
65
+ ---
26
66
 
27
- **Human-in-the-loop that actually works.** Not just "pause and wait for input." Namzu has structured plan review, per-tool approval with destructiveness flags, checkpoint/resume across sessions, and a decision framework that gives humans real control without breaking the agent loop.
67
+ ## The Complete Feature Map
68
+
69
+ How Namzu compares, category by category. The Category row tells you what job each project actually does.
70
+
71
+ | | **Namzu** | LangGraph | CrewAI | Mastra | Vercel AI SDK | OpenAI Agents SDK |
72
+ |---|---|---|---|---|---|---|
73
+ | Category | **Agent Kernel** | Graph framework | Crew framework | TS app framework | Frontend-first SDK | Vendor SDK |
74
+ | Language | TypeScript | Python/JS | Python | TypeScript | TypeScript | Python/JS |
75
+ | Process sandbox (OS-level) | ✅ Seatbelt + NS | ❌ | ❌ | ❌ | ❌ | ❌ |
76
+ | Multi-tenant from day 1 | ✅ | ❌ | ❌ | partial | ❌ | ❌ |
77
+ | Sub-agent spawn (`fork`/`exec`) | ✅ parent/child/depth/budget | via graph | crews | ✅ | ❌ | handoffs |
78
+ | Signal propagation tree | ✅ AbortController + cancelAll | ❌ | ❌ | partial | ❌ | ❌ |
79
+ | Checkpoint + resume | ✅ per-iteration | ✅ per-superstep | ❌ | partial | ❌ | sessions |
80
+ | Emergency save on signal | ✅ `EmergencySaveManager` | ❌ | ❌ | ❌ | ❌ | ❌ |
81
+ | Resource quotas (token / cost / time) | ✅ per run + per child | manual | manual | manual | ❌ | manual |
82
+ | Provider prompt cache wired | ✅ `ContextCache` + telemetry | ❌ | ❌ | partial | ❌ | ✅ |
83
+ | Thread ↔ Run separation | ✅ | ❌ | ❌ | ✅ | ❌ | partial |
84
+ | Native A2A protocol | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
85
+ | Native MCP (client + server) | ✅ | plugin | ❌ | ✅ | ❌ | client only |
86
+ | RAG built into the kernel | ✅ full pipeline | via integrations | via integrations | plugin | ❌ | via tools |
87
+ | Persona inheritance (YAML) | ✅ merge-based | ❌ | role strings | partial | ❌ | instructions |
88
+ | Advisory system (multi-advisor) | ✅ provider-agnostic | ❌ | ❌ | ❌ | ❌ | ❌ |
89
+ | Structured context compaction | ✅ WorkingState | ❌ | ❌ | partial | ❌ | ❌ |
90
+ | Tool tiering (cost-aware) | ✅ user-defined | ❌ | ❌ | ❌ | ❌ | ❌ |
91
+ | Task routing (per-task model) | ✅ fallback chains | manual | manual | manual | ❌ | manual |
92
+ | Progressive tool disclosure | ✅ deferred/active/suspended | ❌ | ❌ | ❌ | ❌ | ❌ |
93
+ | Tool-call verification gate | ✅ allow/deny/ask + custom | ❌ | task-level scope | ❌ | tool approval | ❌ |
94
+ | File ownership / edit locking | ✅ `EditOwnershipTracker` | ❌ | ❌ | ❌ | ❌ | ❌ |
95
+ | Circuit breakers on the bus | ✅ `CircuitBreaker` | ❌ | ❌ | ❌ | ❌ | ❌ |
96
+ | Skills system (separate from tools) | ✅ disclosure-tiered | ❌ | ❌ | ❌ | ❌ | ❌ |
97
+ | Plugin system with lifecycle | ✅ | ❌ | ❌ | partial | ❌ | ❌ |
98
+ | Vault / BYOK | ✅ tenant-scoped | ❌ | ❌ | ❌ | ❌ | ❌ |
99
+ | Telemetry (OpenTelemetry) | ✅ native | via LangSmith | CrewAI+ | partial | ❌ | built-in tracing |
100
+ | Provider lock-in | none | low | low | low | low | OpenAI-first |
28
101
 
29
- **Persona system with inheritance.** Define agent identity, expertise, reflexes, and output format as structured data. Specialize agents through inheritance — a base researcher persona becomes an ML researcher by merging one field. No prompt string concatenation.
102
+ ---
30
103
 
31
- **Progressive tool disclosure.** Agents don't see all tools at once. Tools start as deferred (searchable but not active), get activated on demand, and can be suspended. This keeps the LLM's context focused and reduces hallucinated tool calls.
104
+ ## Architecture in Depth: Every Subsystem
32
105
 
33
- **Advisory System.** A three-layer advisory architecture where agents can consult specialized advisors mid-execution. Unlike Anthropic's Advisor Tool (Claude-only, single advisor), Namzu's system is provider-agnostic (any model advises any model via BYOK), supports multi-advisor with domain routing, configurable triggers, and agent-autonomous consultation. Define a security advisor on Bedrock, an architecture advisor on OpenRouter, and let the agent decide when to consult whom.
106
+ Every folder under `src/` maps to a traditional OS concept. This section walks them one by one, in the order a request actually flows.
34
107
 
35
- **Structured compaction.** When context reaches capacity, Namzu doesn't just truncate — it incrementally extracts structured data (task, plan, files, decisions, failures) into a typed WorkingState, serializes it as compact markdown, and replaces old messages. The agent continues with full context awareness at a fraction of the token cost.
108
+ ### 1. The Boundary: Sandbox (`sandbox/`)
36
109
 
37
- **Tool tiering.** Configurable tier system that teaches the LLM to prefer cheaper tools first. Unlike hardcoded approaches, Namzu's tiers are fully user-defined: bring your own tier labels, priorities, and guidance templates.
110
+ Sandboxing is the foundation. Tools do not execute in the host process; they execute inside an OS-enforced jail with deny-default file I/O and scoped network access. macOS uses Seatbelt profiles in SBPL format (the same mechanism Apple's own apps use). Linux uses lightweight mount + PID namespaces, without requiring Docker, systemd, or any container runtime. The `SandboxProvider` abstraction (`sandbox/factory.ts`, `sandbox/provider/`) means you can swap in a Firecracker or gVisor-backed provider later without touching the rest of the kernel.
38
111
 
39
- **Task routing.** Route sub-tasks to different models based on task type. Compaction and summarization go to cheap models, coding stays on expensive ones. Provider-agnostic, configurable per task type, with automatic fallback chains.
112
+ The kernel enforces memory, timeout, and max-process limits on top of whatever the sandbox gives you. The goal: a rogue or hallucinated tool call should never wipe your filesystem or exfiltrate arbitrary data, even if the LLM tries very hard.
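The deny-default posture can be illustrated with a toy path policy. Real enforcement happens in the OS (Seatbelt profiles, namespaces), not in TypeScript; this sketch only shows the decision shape and is not the SDK's `SandboxProvider` API:

```typescript
// Illustrative only: a deny-default path policy of the kind a sandbox
// profile encodes. Anything not under an allowed prefix is refused.
interface SandboxPolicy {
  allowRead: string[]   // path prefixes readable inside the jail
  allowWrite: string[]  // path prefixes writable inside the jail
}

function mayAccess(policy: SandboxPolicy, path: string, mode: 'read' | 'write'): boolean {
  const prefixes = mode === 'read' ? policy.allowRead : policy.allowWrite
  // Deny-default: no matching prefix means no access.
  return prefixes.some((p) => path.startsWith(p))
}

const policy: SandboxPolicy = {
  allowRead: ['/workspace', '/tmp'],
  allowWrite: ['/workspace/out'],
}
```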
40
113
 
41
- | | Namzu | LangChain/LangGraph | CrewAI | OpenAI Agents SDK | Vercel AI SDK |
42
- |---|---|---|---|---|---|
43
- | Language | TypeScript | Python/JS | Python | Python/JS | TypeScript |
44
- | Provider lock-in | None (BYOK) | Low | Low | Optimized for OpenAI | Low |
45
- | **Process sandbox** | **Native (Seatbelt + NS)** | No | No | No | No |
46
- | Agent patterns | 4 (reactive, pipeline, router, supervisor) | Graph-based | Role-based crews | Handoffs | Single-agent |
47
- | A2A protocol | Native | No | No | No | No |
48
- | MCP support | Native (client + server) | Plugin | No | Client only | No |
49
- | Multi-tenant | Built-in | No | No | No | No |
50
- | Thread/Run separation | Yes | No | No | Sessions | No |
51
- | Plan review (HITL) | Structured | Graph interrupts | No | Basic | Tool approval |
52
- | Persona inheritance | Yes | No | Role strings | Instructions | System prompt |
53
- | Progressive tool loading | Yes | No | No | No | No |
54
- | Advisory system | Multi-advisor, provider-agnostic | No | No | No | No |
55
- | Context compaction | Structured WorkingState | No | No | No | No |
56
- | Tool tiering | Configurable, user-defined | No | No | No | No |
57
- | Task routing | Per-task model selection | No | No | No | No |
58
- | RAG built-in | Full pipeline | Via integrations | Via integrations | Via tools | No |
59
- | Telemetry | OpenTelemetry | LangSmith | CrewAI+ | Built-in tracing | No |
60
-
61
- ### Architecture Quality Scores
62
-
63
- > **Note:** These scores are AI-generated based on public documentation, community feedback, and (for Namzu) direct codebase analysis. They are not official benchmarks — treat them as an informed architectural comparison, not a definitive ranking. We tried to be fair; if you disagree with a score, open an issue and let's discuss.
114
+ ### 2. Interprocess Communication: Bridge (`bridge/`) and Bus (`bus/`)
64
115
 
65
- | Criterion | Namzu | LangChain/LangGraph | CrewAI | OpenAI Agents SDK | Vercel AI SDK |
66
- |---|---|---|---|---|---|
67
- | Type Safety | 9 | 5 | 7 | 7 | 9 |
68
- | Modularity | 9 | 5 | 7 | 8 | 9 |
69
- | Interface Segregation | 8 | 4 | 6 | 8 | 8 |
70
- | Extensibility | 9 | 7 | 6 | 6 | 7 |
71
- | Convention Consistency | 8 | 5 | 7 | 8 | 8 |
72
- | Dependency Direction | 9 | 4 | 6 | 8 | 8 |
73
- | **Overall** | **8.7** | **5.0** | **6.5** | **7.5** | **8.2** |
116
+ Two layers here, with different jobs.
74
117
 
75
- **What the scores tell us:**
118
+ **Bridge** is cross-process and cross-agent communication. The `bridge/a2a/` folder speaks Google's Agent-to-Agent protocol: your agents can publish agent cards describing their capabilities and can discover and invoke other agents' capabilities. The `bridge/mcp/` folder speaks Anthropic's Model Context Protocol, both as a client (consume MCP servers as tools) and as a server (expose your Namzu tools to any MCP-speaking agent). The `bridge/sse/` folder contains the event mapper that turns in-process events into Server-Sent Events for any consumer on the other side of HTTP. `bridge/tools/` wires it all into the tool system.
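The SSE mapping idea, sketched — the event shape here is invented for illustration; the real `AgentBusEvent` schema is richer:

```typescript
// Illustrative sketch of an event mapper: turn an in-process event into
// the text/event-stream wire format (one `event:` line, one `data:` line,
// and a blank line per frame). Not the SDK's actual mapper.
interface ToyBusEvent {
  type: string
  runId: string
  payload?: unknown
}

function toSse(event: ToyBusEvent): string {
  const data = JSON.stringify({ runId: event.runId, payload: event.payload ?? null })
  return `event: ${event.type}\ndata: ${data}\n\n`
}
```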
76
119
 
77
- - **Namzu** scores highest on type safety, extensibility, and dependency direction: its branded ID system, abstract base patterns, and clean acyclic module graph are genuine strengths.
120
+ **Bus** is in-process. This is where the kernel's internal nervous system lives. The bus emits typed `AgentBusEvent`s for every meaningful transition: run started, iteration begun, checkpoint created, tool call dispatched, tool result returned, agent paused, agent canceled, plan requested, plan approved, error thrown. On top of raw event fan-out, the bus offers three kernel-grade primitives:
78
121
 
79
- - **LangChain/LangGraph** suffers from well-documented abstraction bloat and dependency issues. The community consistently reports difficulty debugging, deep coupling between modules, and frequent breaking changes.
122
+ - **`CircuitBreaker`** (`bus/breaker.ts`) closes the bus to a flapping agent. If an agent's run keeps failing, the breaker trips and prevents retry storms. Configurable failure threshold and reset timeout.
123
+ - **`FileLockManager`** (`bus/lock.ts`) holds locks on files across concurrent agents. A child cannot acquire a lock its parent or sibling already holds. Acquisition timeout is enforced.
124
+ - **`EditOwnershipTracker`** (`bus/ownership.ts`) records which run last claimed ownership of a path, emits events on contention, and lets a HITL layer decide who wins. When two agents try to edit the same file, one of them is told to wait or re-plan.
80
125
 
81
- - **CrewAI** offers clean role-based design with Pydantic validation, but is Python-only with limited extensibility for custom agent patterns beyond crews. Security-conscious with task-level tool scoping.
126
+ These exist because the moment you have more than one agent running in parallel against a shared filesystem, you need the kernel to arbitrate. Most frameworks either do not have parallelism or leave it to user space; Namzu treats it as a first-class kernel concern.
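A minimal sketch of the circuit-breaker idea (failure threshold plus reset timeout) described above. This illustrates the mechanism only; it is not the SDK's actual `CircuitBreaker` API:

```typescript
// Toy circuit breaker: trips after `threshold` consecutive failures and
// half-opens after `resetTimeoutMs`. The clock is injectable for testing.
class ToyCircuitBreaker {
  private failures = 0
  private openedAt: number | null = null

  constructor(
    private readonly threshold: number,
    private readonly resetTimeoutMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  allows(): boolean {
    if (this.openedAt === null) return true
    // Half-open after the reset timeout: let the next attempt through.
    if (this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.openedAt = null
      this.failures = 0
      return true
    }
    return false
  }

  recordFailure(): void {
    this.failures += 1
    if (this.failures >= this.threshold) this.openedAt = this.now()
  }

  recordSuccess(): void {
    this.failures = 0
  }
}
```

The tripped breaker is what stops a flapping agent from triggering retry storms on the bus.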
82
127
 
83
- - **OpenAI Agents SDK** is deliberately minimal — four primitives, clean design, good type safety in its TypeScript variant. Limited by a narrow scope: no RAG, no multi-tenant, no A2A. Optimized for OpenAI models.
128
+ ### 3. Process Lifecycle: Manager (`manager/`)
84
129
 
85
- - **Vercel AI SDK** has the strongest frontend integration and end-to-end type safety from server to client. Clean modular architecture. Focused on web/chat UIs rather than backend agent orchestration.
130
+ `manager/agent/lifecycle.ts` is the `fork()` + `exec()` + `waitpid()` of the kernel. When a parent agent (say a `SupervisorAgent`) spawns a child, the lifecycle manager:
86
131
 
87
- **Where Namzu needs to improve:** Test coverage is the next priority; the focus so far has been getting the architecture and core abstractions right. Now that the foundation is solid, comprehensive testing is next. Contributions are very welcome here.
132
+ - Allocates a slice of the parent's token budget, timeout budget, and cost budget to the child
133
+ - Creates a child `AbortController` linked to the parent's
134
+ - Builds a child config via the agent definition's `configBuilder(factoryOptions)`
135
+ - Stamps the child with `parentAgentId`, `parentRunId`, `threadId`, and `depth`
136
+ - Registers the child task in an internal `TaskRegistry` keyed by `TaskId`
137
+ - Emits `agent_pending` on the bus with parent/child/depth metadata
138
+ - Forwards every child event to the parent's run listener so the supervisor sees what its subtree is doing
88
139
 
89
- ## What Can You Build?
140
+ When the parent is cancelled — by HITL, by a limit breach, or by an external signal — `cancelAll(parentRunId)` walks the subtree and aborts every descendant. This is the Unix `SIGKILL` to a process group.
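The cascade can be sketched with linked `AbortController`s — illustrative only; the SDK's task registry and `cancelAll` signature may differ:

```typescript
// Sketch of signal propagation down a spawn tree. Each child gets its own
// AbortController, linked so that aborting the parent aborts the subtree.
const children = new Map<string, AbortController>()

function spawnChild(parent: AbortController, taskId: string): AbortSignal {
  const child = new AbortController()
  // Link: when the parent aborts, abort the child too.
  parent.signal.addEventListener('abort', () => child.abort(), { once: true })
  children.set(taskId, child)
  return child.signal
}

function cancelAllToy(parent: AbortController): void {
  // Aborting the parent cascades through every linked child.
  parent.abort()
}

const root = new AbortController()
const s1 = spawnChild(root, 't1')
const s2 = spawnChild(root, 't2')
cancelAllToy(root)
```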
90
141
 
91
- Namzu is not a toy framework for chatbot demos. It's designed for real workloads, whether you're automating your homelab, streamlining business operations, or building a full agent platform.
142
+ `manager/connector/` manages the lifecycle of external connectors (MCP servers, HTTP connectors). `manager/plan/lifecycle.ts` coordinates HITL plan review. `manager/run/persistence.ts` is the run-level persistence surface, and `manager/run/emergency.ts` is the emergency-save subsystem (see §9 below).
92
143
 
93
- ### Personal & Homelab
144
+ ### 4. Scheduling: Router (`router/`), Execution (`execution/`), Limit Checker (`run/LimitChecker.ts`)
94
145
 
95
- **Your own AI assistant that actually does things.** It doesn't just answer questions; it executes. Connect it to your file system, your scripts, your local services via MCP, and let it manage your infrastructure through conversation.
146
+ The router policy (`router/task-router.ts`) decides which model a task should go to. Compaction and summarization go to cheap models; coding and complex reasoning stay on expensive ones. Tiering is user-defined — you decide which models belong in which tier and what guidance the LLM gets about preferring tier-1 tools first.
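A hypothetical routing table with fallback chains might look like the following — the task types come from the text above, while the model names and shapes are invented for illustration:

```typescript
// Toy task router: per task type, an ordered fallback chain of models.
type TaskType = 'compaction' | 'summarization' | 'coding'

const routes: Record<TaskType, string[]> = {
  compaction: ['cheap-small', 'cheap-medium'],
  summarization: ['cheap-small'],
  coding: ['expensive-frontier', 'cheap-medium'],
}

// Pick the first model in the chain that is currently available.
function routeTask(task: TaskType, available: Set<string>): string | null {
  for (const model of routes[task]) {
    if (available.has(model)) return model
  }
  return null
}
```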
96
147
 
97
- - **Home automation agent** — monitors logs, restarts services, runs health checks, alerts you when something breaks. Give it bash + read-file tools and a reactive loop.
98
- - **Personal research agent** — feeds documents into the RAG pipeline, builds a knowledge base from your notes/PDFs/bookmarks, and answers questions with citations from your own data.
99
- - **Code review agent** — watches your repos, reviews PRs with a pipeline agent (extract diff → analyze → write review), posts feedback automatically.
100
- - **Media organizer** — scans your library, categorizes files, renames based on metadata, deduplicates. A pipeline agent with file tools handles this end to end.
148
+ The execution layer (`execution/base.ts`, `execution/local.ts`) is the concrete executor that invokes the provider, dispatches tool calls, and produces iteration results. Execution is pluggable; you could swap in a remote executor without touching the agent patterns above.
101
149
 
102
- ### Business & Team
150
+ The limit checker (`run/LimitChecker.ts`) is the kernel scheduler's enforcement point. Every iteration it checks: have we exceeded the token budget? The cost budget? The wall-clock timeout? The iteration count? Has the user issued an abort? If any is true, it returns a typed hard-stop decision — `cancelled`, `token_budget_exceeded`, `timeout`, `max_iterations` — and the run ends cleanly with a stop reason recorded in its metadata.
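The per-iteration check can be sketched like so — the stop-reason strings come from the text above; the struct shapes are assumptions, not `LimitChecker`'s real types:

```typescript
// Toy limit checker: returns a typed hard-stop reason, or null to continue.
type StopReason = 'cancelled' | 'token_budget_exceeded' | 'timeout' | 'max_iterations' | null

interface RunBudget { maxTokens: number; maxMs: number; maxIterations: number }
interface RunUsage { tokens: number; elapsedMs: number; iterations: number; aborted: boolean }

function checkLimits(budget: RunBudget, usage: RunUsage): StopReason {
  if (usage.aborted) return 'cancelled'
  if (usage.tokens >= budget.maxTokens) return 'token_budget_exceeded'
  if (usage.elapsedMs >= budget.maxMs) return 'timeout'
  if (usage.iterations >= budget.maxIterations) return 'max_iterations'
  return null // keep running
}
```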
103
151
 
104
- **Agents that plug into your existing workflows.** Namzu's connector system and multi-tenant isolation mean you can deploy agents for different teams without them stepping on each other.
152
+ ### 5. The Runtime Query Path (`runtime/`)
105
153
 
106
- **Customer support triage** — a router agent classifies incoming tickets and delegates to specialized agents (billing, technical, general). Each agent has its own persona, tools, and knowledge base.
107
- - **Document processing pipeline** — ingest contracts, invoices, or reports through RAG. Agents extract key data, flag anomalies, and generate summaries. Human-in-the-loop ensures nothing gets approved without review.
108
- - **Internal ops bot** — connects to your existing tools (Slack, Jira, databases) via HTTP connectors or MCP servers. Team members ask questions in natural language, the agent queries the right systems and responds.
109
- - **Compliance checker** — a supervisor agent coordinates specialized sub-agents that each check a different regulation. Results are aggregated, flagged items go through plan review before any action is taken.
154
+ `runtime/query/` is where one iteration of the agent loop actually happens. The pieces:
110
155
 
111
- ### Platform & SaaS
156
+ - `runtime/query/context.ts` assembles the request context: system prompt, persona, skills, tools, messages.
157
+ - `runtime/query/context-cache.ts` implements `ContextCache` — a hash-based system-prompt cache per thread. If the prompt inputs have not changed since the last iteration, the cache returns the same text so provider-level prompt caching can hit.
158
+ - `runtime/query/prompt.ts` owns `PromptBuilder` — structured, segment-based prompt assembly (static segment vs dynamic segment) that plays well with provider prompt caches.
159
+ - `runtime/query/guard.ts` runs pre-dispatch guards on the request.
160
+ - `runtime/query/executor.ts` actually calls the provider and streams the result.
161
+ - `runtime/query/result.ts` normalizes the provider's response into the kernel's canonical shape.
162
+ - `runtime/query/checkpoint.ts` writes the iteration's checkpoint.
163
+ - `runtime/query/tooling.ts` bridges the iteration to the tool system, including progressive disclosure state.
164
+ - `runtime/query/iteration/` contains the iteration machinery.
165
+ - `runtime/query/plugin-hooks.ts` lets plugins observe and shape iterations.
166
+ - `runtime/query/events.ts` emits the typed events that feed the bus.
112
167
 
113
- **Build a multi-tenant agent platform for your users.** This is what Namzu was designed for from the start.
168
+ `runtime/decision/` (with `parser.ts` and `fallback.ts`) parses LLM decisions (tool calls vs final answer vs thinking vs advisory request) and falls back gracefully when the LLM returns malformed output.
114
169
 
115
- **Agent-as-a-Service** — each customer gets isolated agents with their own API keys (BYOK), connector configs, and knowledge bases. Tenant isolation is built in, not bolted on.
116
- - **Agent marketplace** — define agents as portable definitions (info + tools + persona), publish them, let others deploy with their own keys and customize via persona inheritance.
117
- - **Cross-organization workflows** — agents from different companies discover each other via A2A agent cards and collaborate on shared tasks without a central authority.
170
+ ### 6. Memory Management: Compaction (`compaction/`) and Store (`store/`)
118
171
 
119
- ## Install
172
+ Memory in the kernel is two systems cooperating.
120
173
 
121
- ```bash
122
- npm install @namzu/sdk
123
- ```
174
+ **Working memory** is `compaction/`. When a thread's context approaches the model's window, the kernel does not truncate. It runs the `structured` compaction manager (default in `compaction/managers/structured.ts`, with `slidingWindow.ts` and `null.ts` as alternatives), which incrementally extracts `task / plan / files / decisions / failures` from the message stream into a typed `WorkingState`. The extractor (`compaction/extractor.ts`), verifier (`compaction/verifier.ts`), and serializer (`compaction/serializer.ts`) together produce compact markdown that replaces old messages. The agent keeps context awareness at a fraction of the token cost. `compaction/dangling.ts` handles partial tool-call streams that could otherwise corrupt the conversation state.
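A minimal sketch of what the extract-then-serialize step produces — the fields follow the task / plan / files / decisions / failures split above, but the type and function here are illustrative, not the SDK's `WorkingState` or serializer:

```typescript
// Hypothetical compacted working state — illustrative field names, not the SDK's types.
interface WorkingState {
  task: string
  plan: string[]
  files: string[]
  decisions: string[]
  failures: string[]
}

// Serialize the extracted state into compact markdown that replaces old messages.
function serializeWorkingState(s: WorkingState): string {
  const section = (title: string, items: string[]) =>
    items.length ? `## ${title}\n${items.map((i) => `- ${i}`).join('\n')}` : ''
  return [
    `# Task\n${s.task}`,
    section('Plan', s.plan),
    section('Files', s.files),
    section('Decisions', s.decisions),
    section('Failures', s.failures),
  ]
    .filter(Boolean)
    .join('\n\n')
}
```

A few hundred tokens of structured markdown stand in for thousands of tokens of raw message history.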
124
175
 
125
- ## Quick Start
176
+ **Long-term memory** is `store/memory/`. The `MemoryIndex` (with `InMemoryMemoryIndex` as the default and a disk-backed variant) stores typed `MemoryIndexEntry` records, searchable by free-text query, tag set, and status filter. It persists to disk atomically. There is no required vector database — the default is plain tag and text search. You can layer an embedding-backed index on top if you want, but the kernel does not assume it.
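A toy version of tag-and-text search over typed entries — the entry shape and method names are illustrative, not the `MemoryIndex` API:

```typescript
// Minimal tag-and-text memory index in the spirit of MemoryIndex — illustrative only.
interface MemoryEntry {
  id: string
  text: string
  tags: string[]
  status: 'active' | 'archived'
}

class TinyMemoryIndex {
  private entries: MemoryEntry[] = []

  add(e: MemoryEntry): void {
    this.entries.push(e)
  }

  // Free-text query, tag set, and status filter combine with AND semantics.
  search(query: string, opts: { tags?: string[]; status?: MemoryEntry['status'] } = {}): MemoryEntry[] {
    const q = query.toLowerCase()
    return this.entries.filter(
      (e) =>
        (q === '' || e.text.toLowerCase().includes(q)) &&
        (opts.tags ?? []).every((t) => e.tags.includes(t)) &&
        (opts.status === undefined || e.status === opts.status),
    )
  }
}
```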
126
177
 
127
- ```typescript
128
- import { defineTool, ProviderFactory, ReactiveAgent, ToolRegistry } from '@namzu/sdk'
129
- import { z } from 'zod'
178
+ Alongside memory, `store/` has sibling stores for every kernel concept: `store/run/` (runs, iterations, checkpoints), `store/conversation/` (threads and messages), `store/activity/` (activity log), `store/task/` (task registry), and an in-memory generic `InMemoryStore` for tests and ephemeral workloads.
130
179
 
131
- // Define a tool
132
- const searchWeb = defineTool({
133
- name: 'search_web',
134
- description: 'Search the web for information',
135
- inputSchema: z.object({ query: z.string() }),
136
- category: 'network',
137
- permissions: ['network_access'],
138
- readOnly: true,
139
- destructive: false,
140
- concurrencySafe: true,
141
- execute: async ({ query }) => {
142
- const results = await fetch(`https://api.search.com?q=${query}`)
143
- return { success: true, output: await results.text() }
144
- },
145
- })
180
+ ### 7. The Capability System: Tools (`tools/`) and Registry (`registry/`)
146
181
 
147
- // Create a provider (model is chosen per-run, not on the provider)
148
- const provider = ProviderFactory.createProvider({
149
- type: 'openrouter',
150
- apiKey: process.env.OPENROUTER_KEY!,
151
- })
182
+ Tools in Namzu are first-class typed values, not JSON schemas you have to keep in sync with a handler somewhere else. `defineTool()` takes a Zod `inputSchema`, a Zod `outputSchema` (optional), and an `execute` function. It also takes **declarations** the kernel uses for routing and safety:
152
183
 
153
- // Register tools and build the agent
154
- const tools = new ToolRegistry()
155
- tools.register(searchWeb)
184
+ - `category` — e.g. `network`, `filesystem`, `compute`, `memory`.
185
+ - `permissions` — e.g. `network_access`, `write_filesystem`. Enforced at dispatch time.
186
+ - `readOnly` — predicate over input; tools that only read get different treatment by the verification gate and tool tiering.
187
+ - `destructive` — boolean flag that triggers HITL approval when true.
188
+ - `concurrencySafe` — whether two concurrent runs can invoke this tool with no interference.
156
189
 
157
- const agent = new ReactiveAgent({
158
- id: 'researcher',
159
- name: 'Research Assistant',
160
- version: '1.0.0',
161
- category: 'research',
162
- description: 'Finds and synthesizes information from the web',
163
- })
190
+ `tools/builtins/` ships file I/O, shell, and glob-search tools. `tools/advisory/`, `tools/memory/`, `tools/task/`, and `tools/coordinator/` ship kernel-facing tools that let agents consult advisors, query memory, coordinate siblings, and manage their task registry from inside the agent loop.
164
191
 
165
- const result = await agent.run(
166
- { messages: [{ role: 'user', content: 'Summarize the latest LLM benchmarks' }], workingDirectory: process.cwd() },
167
- { model: 'anthropic/claude-sonnet-4-20250514', tokenBudget: 8192, timeoutMs: 600_000, provider, tools },
168
- )
169
- ```
192
+ **Progressive disclosure** is unique to Namzu. Tools exist in three states — `deferred`, `activated`, `suspended`. The LLM does not see the full tool catalog; it sees the current active set plus a searchable summary of deferred tools. When it needs something specific, it activates it; when it is done, it suspends it. This keeps the context window focused, reduces hallucinated tool calls, and lets a single agent work across dozens of tools without drowning the prompt.
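The three-state model can be sketched in a few lines — state names from above; the class and methods are illustrative, not the SDK's `ToolRegistry`:

```typescript
// Sketch of the three-state disclosure model — illustrative, not the SDK's ToolRegistry.
type DisclosureState = 'deferred' | 'activated' | 'suspended'

class DisclosureSet {
  private state = new Map<string, DisclosureState>()

  register(name: string, initial: DisclosureState = 'deferred'): void {
    this.state.set(name, initial)
  }
  activate(name: string): void {
    this.state.set(name, 'activated')
  }
  suspend(name: string): void {
    this.state.set(name, 'suspended')
  }

  // Only activated tools are exposed to the LLM as callable schemas.
  visibleToLLM(): string[] {
    return [...this.state].filter(([, s]) => s === 'activated').map(([n]) => n)
  }

  // Deferred tools stay searchable so the agent can pull them in on demand.
  searchDeferred(q: string): string[] {
    return [...this.state]
      .filter(([n, s]) => s === 'deferred' && n.includes(q))
      .map(([n]) => n)
  }
}
```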
170
193
 
171
- ## Agent Types
194
+ **Tool tiering** teaches the LLM a cost hierarchy. You define tiers ("tier-1: local", "tier-2: fast remote", "tier-3: expensive API"), each with its own guidance template, and the kernel instructs the LLM to prefer lower tiers first. Unlike hardcoded approaches, every label, priority, and template is yours.
172
195
 
173
- Namzu provides four agent orchestration patterns, each designed for different execution models.
196
+ Registries (`registry/`) are the kernel's object tables. `registry/tool/` is the canonical tool catalog. `registry/agent/` holds agent definitions (the thing you can `AgentManager.spawn()`). `registry/connector/` holds connector catalogs. `registry/plugin/` holds plugins. `ManagedRegistry` is the shared base class with tenant scoping.
174
197
 
175
- ### Reactive Agent
198
+ ### 8. The Decision Layer: Verification Gate (`verification/`)
176
199
 
177
- The core agentic loop. Sends messages to an LLM, executes tool calls, and iterates until the task is complete or a stop condition is hit (token budget, cost limit, timeout, max iterations).
200
+ Before any tool call leaves the kernel, it goes through `verification/gate.ts`'s `VerificationGate`. Think of it as the kernel's seccomp: a rule-based decision layer that says *allow*, *deny*, or *ask*.
178
201
 
179
- ```typescript
180
- import { ReactiveAgent } from '@namzu/sdk'
202
+ Built-in rules:
181
203
 
182
- const agent = new ReactiveAgent({
183
- id: 'solver',
184
- name: 'Problem Solver',
185
- version: '1.0.0',
186
- category: 'analysis',
187
- description: 'Analyzes data with LLM + tools',
188
- })
204
+ - **`allow_read_only`** — if the tool's `readOnly(input)` returns true, allow.
205
+ - **`deny_dangerous_patterns`** — if the input matches any pattern from `DANGEROUS_PATTERNS` (shell injection, common exfiltration signatures, etc.), deny.
206
+ - **Custom regex rules** — per-tenant, per-agent, or global.
189
207
 
190
- const result = await agent.run(
191
- { messages: [{ role: 'user', content: 'Analyze this dataset and find trends' }], workingDirectory: process.cwd() },
192
- {
193
- model: 'anthropic/claude-sonnet-4-20250514',
194
- tokenBudget: 8192,
195
- timeoutMs: 600_000,
196
- provider,
197
- tools, // ToolRegistry
198
- systemPrompt: 'You are a data analyst.',
199
- },
200
- )
201
- ```
208
+ The `ask` decision hands control to the HITL layer. The verification gate is the kernel layer that makes "destructive tool requires approval" a policy, not a user-space convention.
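A minimal sketch of the allow / deny / ask decision — the flags follow the tool declarations above, but the rule ordering (deny patterns checked first) and the implementation are illustrative:

```typescript
// Rule-based gate sketch — illustrative, not verification/gate.ts.
type GateDecision = 'allow' | 'deny' | 'ask'

interface ToolCall {
  readOnly: boolean
  destructive: boolean
  input: string
}

// Tiny stand-ins for DANGEROUS_PATTERNS (the real set is larger).
const DANGEROUS_PATTERNS = [/rm\s+-rf\s+\//, /curl\s+.*\|\s*sh/]

function verify(call: ToolCall, customDeny: RegExp[] = []): GateDecision {
  // Deny rules run first so a dangerous read-only call is still blocked.
  if ([...DANGEROUS_PATTERNS, ...customDeny].some((p) => p.test(call.input))) return 'deny'
  if (call.readOnly) return 'allow'
  if (call.destructive) return 'ask' // routes into the HITL layer
  return 'allow'
}
```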
202
209
 
203
- ### Pipeline Agent
210
+ Verification is intentionally separate from the sandbox: verification is the *decision*, sandbox is the *enforcement*. If a rule fails to deny and a call somehow gets through, the sandbox is still there to contain the damage. Defense in depth, kernel-style.
204
211
 
205
- Deterministic, sequential step execution. Each step receives the output of the previous one. Supports rollback on failure.
212
+ ### 9. Durability: Checkpoints and Emergency Save
206
213
 
207
- ```typescript
208
- import { PipelineAgent } from '@namzu/sdk'
214
+ The kernel assumes processes crash. Two layers make sure that when they do, you do not lose the run.
209
215
 
210
- const etl = new PipelineAgent({
211
- id: 'etl',
212
- name: 'ETL Pipeline',
213
- version: '1.0.0',
214
- category: 'pipeline',
215
- description: 'Extract → transform → load',
216
- })
216
+ **Checkpoints** (`store/run/disk.ts`) are atomic per-iteration snapshots. Each `IterationCheckpoint` captures the run state at a super-step boundary — messages, working state, tool-call state, usage, cost, iteration index. Writes are atomic via write-temp-rename (Convention #8). You can read them, list them, and delete them. A future `Run.replay(runId, { fromCheckpoint })` API will build on top of this; the storage is already there.
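The write-temp-rename convention itself is small enough to show in full; this standalone sketch is not the SDK's `store/run/disk.ts`, but it follows the same atomicity pattern:

```typescript
// Atomic write via write-temp-rename: a reader never observes a half-written file.
import { writeFileSync, renameSync, readFileSync, mkdtempSync } from 'node:fs'
import { join } from 'node:path'
import { tmpdir } from 'node:os'

function atomicWrite(path: string, data: string): void {
  const tmp = `${path}.tmp-${process.pid}`
  writeFileSync(tmp, data) // write the full payload to a sibling temp file
  renameSync(tmp, path)    // rename is atomic on POSIX filesystems
}

const dir = mkdtempSync(join(tmpdir(), 'ckpt-'))
const ckptPath = join(dir, 'iteration-1.json')
atomicWrite(ckptPath, JSON.stringify({ iteration: 1, cost: 0.002 }))
```

If the process dies mid-write, only the temp file is corrupt; the last good checkpoint is untouched.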
217
217
 
218
- const result = await etl.run(
219
- { messages: [], workingDirectory: process.cwd() },
220
- {
221
- model: 'anthropic/claude-sonnet-4-20250514',
222
- tokenBudget: 8192,
223
- timeoutMs: 600_000,
224
- steps: [
225
- { name: 'extract', execute: async (inp, ctx) => await readSource('./data') },
226
- { name: 'transform', execute: async (data, ctx) => normalize(data) },
227
- { name: 'load', execute: async (data, ctx) => await writeToDb(data) },
228
- ],
229
- },
230
- )
231
- ```
218
+ **Emergency save** (`manager/run/emergency.ts`) is the kernel's core-dump. `EmergencySaveManager` installs handlers for SIGINT and SIGTERM. When the process is dying, every active run gets its `toEmergencySnapshot()` flushed atomically to an `emergency/` directory. On the next boot you can inspect or resume the saved state. There is no reliance on the user remembering to catch signals; the kernel does it.
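The shape of the idea, reduced to a sketch — names like `flushEmergencySnapshots` are hypothetical, not the `EmergencySaveManager` API:

```typescript
// Emergency-save sketch: one flush function shared by signal handlers — illustrative only.
const activeRuns = new Map<string, { toEmergencySnapshot(): object }>()
const savedSnapshots: Record<string, object> = {}

function flushEmergencySnapshots(): number {
  let saved = 0
  for (const [id, run] of activeRuns) {
    // In the kernel this is an atomic write into an emergency/ directory;
    // here we just collect the snapshots in memory.
    savedSnapshots[id] = run.toEmergencySnapshot()
    saved++
  }
  return saved
}

// Installed once at boot; the same flush runs on SIGINT and SIGTERM.
for (const sig of ['SIGINT', 'SIGTERM'] as const) {
  process.once(sig, () => {
    flushEmergencySnapshots()
    process.exit(0)
  })
}
```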
232
219
 
233
- ### Router Agent
220
+ Together these give Namzu durable execution without requiring a database. Runs resume across crashes, across reboots, across graceful shutdowns.
234
221
 
235
- Intelligent delegation. An LLM analyzes the input and routes it to the best-suited agent from a set of candidates.
222
+ ### 10. Retrieval-Augmented Generation: RAG (`rag/`)
236
223
 
237
- ```typescript
238
- import { RouterAgent } from '@namzu/sdk'
224
+ RAG is a full kernel subsystem, not a bolt-on. The pipeline:
239
225
 
240
- const router = new RouterAgent({
241
- id: 'dispatcher',
242
- name: 'Task Router',
243
- version: '1.0.0',
244
- category: 'routing',
245
- description: 'Routes an input to the best-fit agent',
246
- })
226
+ - `rag/chunking.ts` — text chunking strategies (configurable via `ChunkingConfig`).
227
+ - `rag/embedding.ts` — the `EmbeddingProvider` abstraction. Providers are BYOK and swappable.
228
+ - `rag/ingestion.ts` — end-to-end ingest: document → chunks → embeddings → vector store.
229
+ - `rag/vector-store.ts` — the `VectorStore` interface, tenant-scoped via `TenantId`. Bring your own backend (pgvector, Pinecone, an in-memory impl for tests).
230
+ - `rag/knowledge-base.ts` — a named collection of documents with metadata and config.
231
+ - `rag/retriever.ts` — the retrieval query path with configurable top-k, threshold, and reranking.
232
+ - `rag/context-assembler.ts` — turns retrieval hits into prompt-ready context windows.
233
+ - `rag/rag-tool.ts` — a first-class tool your agent can invoke, not an external integration.
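As a reference point, the simplest chunking strategy — fixed size with overlap — fits in a few lines. This is an illustrative sketch (sizes in characters for clarity), not `rag/chunking.ts`:

```typescript
// Fixed-size chunking with overlap — the simplest strategy a chunking config
// might select. Illustrative, not the SDK's implementation.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize')
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize))
    if (start + chunkSize >= text.length) break // last chunk reached the end
  }
  return chunks
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from both sides.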
247
234
 
248
- const result = await router.run(
249
- { messages: [{ role: 'user', content: 'Solve 2x + 3 = 11' }], workingDirectory: process.cwd() },
250
- {
251
- model: 'anthropic/claude-sonnet-4-20250514',
252
- tokenBudget: 4096,
253
- timeoutMs: 600_000,
254
- provider,
255
- routes: [
256
- { agentId: 'math-solver', agent: mathAgent, description: 'Solves equations' },
257
- { agentId: 'writer', agent: writerAgent, description: 'Writes content' },
258
- ],
259
- fallbackAgentId: 'writer',
260
- },
261
- )
262
- ```
235
+ RAG lives in the kernel because retrieval is a capability every non-trivial agent needs. Making you wire it up from plugins every time was not the right default.
263
236
 
264
- ### Supervisor Agent
237
+ ### 11. Skills (`skills/`)
265
238
 
266
- Multi-agent coordinator. Manages child agents, delegates tasks, aggregates results, and tracks the full run hierarchy.
239
+ Skills are disclosure-tiered capability bundles distinct from tools. A skill is a named body of knowledge, workflow, or policy that the agent can load on demand. `skills/loader.ts` reads them from disk; `skills/registry.ts` holds the active catalog; each skill has a `SkillDisclosureLevel` that decides when the LLM sees it (always visible, searchable-on-demand, explicit-activation-only). Skills and tools together form the two axes of an agent's capability surface.
267
240
 
268
- ```typescript
269
- import { SupervisorAgent, AgentManager } from '@namzu/sdk'
241
+ ### 12. Personas (`persona/`)
270
242
 
271
- const supervisor = new SupervisorAgent({
272
- id: 'lead',
273
- name: 'Project Lead',
274
- version: '1.0.0',
275
- category: 'coordination',
276
- description: 'Delegates sub-tasks to specialized agents',
277
- })
243
+ Personas describe who an agent is. `persona/assembler.ts` loads them from YAML and composes them with inheritance: a base `researcher` persona defines identity, expertise areas, output format, and reflexes; an `ml-researcher` child merges a single field (`expertise: [...base, 'ML', 'PyTorch']`) and inherits everything else. The assembler produces a typed `AgentPersona` that flows into the prompt as a structured segment (not a string concatenation, not a template hack), so prompt-cache-friendliness is preserved.
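A sketch of the merge semantics — the field names and the auto-extending `expertise` list are illustrative, not the `AgentPersona` type or the assembler's exact rules:

```typescript
// Persona inheritance sketch: child overrides scalars, list fields extend the base.
// Illustrative field names, not the SDK's AgentPersona.
interface Persona {
  name: string
  identity: string
  expertise: string[]
  outputFormat: string
}

function assemblePersona(base: Persona, child: Partial<Persona>): Persona {
  return {
    ...base,
    ...child,
    // List fields merge instead of replacing, mirroring `[...base, 'ML', 'PyTorch']`.
    expertise: child.expertise ? [...base.expertise, ...child.expertise] : base.expertise,
  }
}
```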
278
244
 
279
- const result = await supervisor.run(
280
- { messages: [{ role: 'user', content: 'Research, write, and review a Q3 report' }], workingDirectory: process.cwd() },
281
- {
282
- model: 'anthropic/claude-sonnet-4-20250514',
283
- tokenBudget: 32_768,
284
- timeoutMs: 1_800_000,
285
- provider,
286
- agentManager, // resolves agent ids → implementations
287
- agentIds: ['researcher', 'writer', 'reviewer'],
288
- systemPrompt: 'You coordinate specialists. Decompose tasks, delegate, and synthesize results.',
289
- },
290
- )
291
- // Child runs tracked via parent_run_id and depth
292
- ```
245
+ Personas are code-defined (YAML files in your repo). There is no database, no admin UI, no runtime mutation. That is deliberate: your agent's identity belongs in version control.
293
246
 
294
- ## Tool System
247
+ ### 13. Advisory System (`advisory/`)
295
248
 
296
- Define tools with Zod schemas, permission declarations, and destructiveness flags. The SDK includes built-in tools for file I/O, shell commands, and glob search.
249
+ An advisor is a specialized assistant a running agent can consult mid-execution. The main agent is solving a task; halfway through it hits a decision it is not confident about, or a domain it wants a second opinion on. It fires an advisory request with context; the advisory layer evaluates triggers, routes to the right advisor, executes on a (possibly different) provider, and returns a structured answer the main agent can act on.
297
250
 
298
- ```typescript
299
- import { defineTool, ToolRegistry, getBuiltinTools } from '@namzu/sdk'
300
- import { z } from 'zod'
251
+ Pieces:
301
252
 
302
- const fetchApi = defineTool({
303
- name: 'fetch_api',
304
- description: 'Call an external API endpoint',
305
- inputSchema: z.object({ url: z.string().url(), method: z.enum(['GET', 'POST']) }),
306
- category: 'network',
307
- permissions: ['network_access'],
308
- readOnly: true,
309
- destructive: false,
310
- concurrencySafe: true,
311
- execute: async ({ url, method }) => {
312
- const resp = await fetch(url, { method })
313
- return { success: true, output: await resp.text() }
314
- },
315
- })
253
+ - `advisory/registry.ts` — `AdvisorRegistry`, the catalog of available advisors keyed by domain.
254
+ - `advisory/evaluator.ts` — `TriggerEvaluator`, decides whether an advisory should fire given context and config.
255
+ - `advisory/executor.ts` — `AdvisoryExecutor`, runs the advisor, collects its output, and feeds it back.
256
+ - `advisory/context.ts` — `AdvisoryContext`, the payload passed to advisors.
316
257
 
317
- // Tool registry with progressive activation
318
- const registry = new ToolRegistry()
319
- registry.register(getBuiltinTools(), 'deferred')
320
- registry.register(fetchApi, 'active')
258
+ Unlike Anthropic's advisor tool (Claude-only, single advisor), Namzu's is **provider-agnostic** and **multi-advisor**: put a security advisor on Bedrock, an architecture advisor on OpenRouter, a legal advisor on Anthropic, and the agent decides who to consult. This is one of the things that most cleanly separates Namzu from the pack.
321
259
 
322
- // Agents can search deferred tools and activate on demand
323
- registry.activate(['read_file', 'bash'])
324
- const llmTools = registry.toLLMTools() // Only active + suspended tools
325
- ```
260
+ ### 14. Human-in-the-Loop (`types/hitl/`, `manager/plan/lifecycle.ts`, `types/decision/`)
326
261
 
327
- Built-in tools: `ReadFileTool`, `WriteFileTool`, `EditTool`, `BashTool`, `GlobTool`, `GrepTool`, `LsTool`, `SearchToolsTool`
262
+ HITL is structured, not just a "pause and wait for input" hook. The kernel defines typed decision contracts: the LLM produces a plan, the plan can be approved / edited / rejected, approval can be per-tool with explicit destructiveness acknowledgment, rejection can carry feedback that re-enters the loop as a new iteration. The plan lifecycle has its own manager so that pending plans persist across checkpoint resumes. The verification gate's `ask` decision routes into this same HITL layer.
328
263
 
329
- ### Plugin Contributions
264
+ The kernel does not render a UI for this — it emits events and exposes a typed API so the UI layer you choose can render them however you like.
330
265
 
331
- Plugins extend the runtime with tools, hooks, and MCP servers via a manifest. `PluginLifecycleManager.enable()` loads contributions on demand and rolls back cleanly on failure.
266
+ ### 15. Providers (`provider/`)
332
267
 
333
- ```typescript
334
- import { PluginLifecycleManager } from '@namzu/sdk'
335
-
336
- const manager = new PluginLifecycleManager({ pluginRegistry, toolRegistry, log })
337
- const plugin = await manager.install('/path/to/plugin', 'project')
338
- await manager.enable(plugin.id)
339
- // → manifest.tools registered as `${plugin}:${tool}` (deferred)
340
- // → manifest.hooks attached for run_start/end, iteration_start/end,
341
- // pre/post_llm_call, pre/post_tool_use
342
- // → manifest.mcpServers connected via stdio; their tools registered as
343
- // `${plugin}:mcp__${server}__${tool}` (deferred)
344
- ```
345
-
346
- Hook handlers can return `continue`, `modify` (rewrite tool input), `skip` (synthesize a tool result), or `error` (fail the run). Modify actions compose — chained hooks each see the previous hook's modified input. The runtime emits `plugin_hook_executing` / `plugin_hook_completed` events around every handler.
268
+ An LLM provider implements a narrow interface: given a typed request, return a typed response (streaming or not) and propagate normalized usage, cost, and cache telemetry. Today `provider/openrouter/` and `provider/bedrock/` are in the box; adding another vendor is adding one directory. `provider/telemetry/` normalizes provider-specific response fields (OpenRouter's `cache_read_input_tokens`, `cache_creation_input_tokens`, `cache_discount`, Bedrock's equivalents) into a single kernel-wide telemetry shape.
347
269
 
348
- ### Sandbox-Aware Execution
270
+ `ProviderFactory` is the single entry point. Every run chooses its provider by name; the provider object itself is stateless enough to be shared across runs.
349
271
 
350
- File and shell built-ins (`ReadFileTool`, `WriteFileTool`, `EditTool`, `BashTool`) route through `sandbox.exec()` / `sandbox.readFile()` / `sandbox.writeFile()` when a sandbox is present in the execution context, and fall back to native operations when not. Use `query()` (streaming generator) with a `ToolRegistry` and a `sandboxProvider`:
272
+ ### 16. Connectors (`connector/`)
351
273
 
352
- ```typescript
353
- import { drainQuery, ToolRegistry, getBuiltinTools } from '@namzu/sdk'
274
+ A connector is how an agent reaches external systems. `connector/BaseConnector.ts` is the abstract base; `connector/mcp/` implements MCP connectors in both `stdio` and `http` transports with a `client.ts` and an `adapter.ts` that turns MCP tools into Namzu `ToolDefinition`s; `connector/builtins/` ships the built-in connectors (HTTP, shell, etc.); `connector/execution/` handles connector-level execution concerns. Plugin contributions can register connectors at runtime.
354
275
 
355
- const tools = new ToolRegistry()
356
- tools.register(getBuiltinTools(), 'active')
276
+ ### 17. Prompt Cache Integration
357
277
 
358
- // With sandbox: file + shell tool calls are isolated to the agent workspace
359
- const result = await drainQuery({
360
- agentId: 'solver', agentName: 'Solver', threadId,
361
- provider, tools, runConfig, messages, resumeHandler,
362
- sandboxProvider,
363
- })
364
- ```
278
+ The kernel takes prompt caching seriously because token cost is the number-one production constraint for agents. `runtime/query/context-cache.ts` maintains a per-thread `ContextCache` that hashes the inputs (system prompt + persona + skills + tools + base prompt) and only rebuilds when the hash changes. When the provider supports cache controls (OpenRouter's `cacheControl` parameter today, Anthropic and Bedrock cache headers in progress), the kernel attaches them, and the response's cache telemetry (`cache_read_input_tokens`, `cache_creation_input_tokens`, `cache_discount`) flows back into the run's usage metrics.
365
279
 
366
- ## Sandbox
280
+ This is why `PromptBuilder` splits a request into static and dynamic segments: the static segment is the cache target, and the kernel does the bookkeeping to keep it stable across iterations so the cache actually hits.
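A toy version of the hash-gated rebuild — illustrative, not the `ContextCache` API:

```typescript
// Hash-gated rebuild in the spirit of ContextCache: rebuild the static prompt
// segment only when its inputs change. Illustrative, not the SDK's API.
import { createHash } from 'node:crypto'

class StaticSegmentCache {
  private hash = ''
  private segment = ''
  rebuilds = 0

  get(inputs: { systemPrompt: string; persona: string; tools: string[] }): string {
    const h = createHash('sha256').update(JSON.stringify(inputs)).digest('hex')
    if (h !== this.hash) {
      this.hash = h
      this.segment = [inputs.systemPrompt, inputs.persona, ...inputs.tools].join('\n')
      this.rebuilds++
    }
    // Returning byte-identical text lets the provider's prompt cache hit.
    return this.segment
  }
}
```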
367
281
 
368
- Process-level isolation for agent tool execution. No Docker, no containers — native OS mechanisms.
282
+ ### 18. Vault (`vault/`)
369
283
 
370
- ```typescript
371
- import { drainQuery, SandboxProviderFactory, ToolRegistry, getBuiltinTools, getRootLogger } from '@namzu/sdk'
284
+ The vault holds BYOK credentials and arbitrary secrets. `InMemoryCredentialVault` is the default backend; the `CredentialVault` interface lets you plug in your own. Credentials are tenant-scoped — tenant A cannot see tenant B's keys. Tools, providers, and connectors resolve credentials through the vault rather than reading environment variables directly, so you can rotate without redeploying and you can audit who accessed what.
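The tenant-scoping contract, reduced to a sketch — the real `CredentialVault` interface is richer than this:

```typescript
// Tenant-scoped credential lookup — illustrative, not the SDK's CredentialVault.
class TinyVault {
  private secrets = new Map<string, string>() // key: `${tenantId}:${name}`

  set(tenantId: string, name: string, value: string): void {
    this.secrets.set(`${tenantId}:${name}`, value)
  }

  // A resolver bound to one tenant can never see another tenant's keys.
  forTenant(tenantId: string) {
    return { get: (name: string) => this.secrets.get(`${tenantId}:${name}`) }
  }
}
```

Handing a tool only the tenant-bound resolver, never the whole vault, is what makes the isolation structural.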
372
285
 
373
- const sandboxProvider = SandboxProviderFactory.create(
374
- { enabled: true, provider: 'local', timeoutMs: 60_000, memoryLimitMb: 512, maxProcesses: 16, cleanupOnDestroy: true },
375
- getRootLogger(),
376
- )
286
+ ### 19. Telemetry (`telemetry/`)
377
287
 
378
- const tools = new ToolRegistry()
379
- tools.register(getBuiltinTools(), 'active')
380
-
381
- const result = await drainQuery({
382
- agentId: 'coder', agentName: 'Coder', threadId,
383
- provider, tools, runConfig,
384
- messages: [{ role: 'user', content: 'Write a Python script and run it' }],
385
- resumeHandler,
386
- sandboxProvider, // sandbox-aware tools opt in here
387
- })
388
- ```
288
+ OpenTelemetry-native. `telemetry/attributes.ts` defines the canonical attribute keys; `telemetry/metrics.ts` defines the kernel's metrics surface. Every iteration, every tool call, every provider call emits spans with consistent attributes: `run.id`, `thread.id`, `agent.id`, `tenant.id`, `tool.name`, `provider.name`, `model`, `usage.input_tokens`, `usage.output_tokens`, `usage.cached_tokens`, `cost.usd`. Wire your existing OTel collector, or pipe to LangSmith / Langfuse / Braintrust via their OTel adapters.
389
289
 
390
- **How it works:**
290
+ ### 20. Plugin System (`plugin/`)
391
291
 
392
- | Platform | Mechanism | Profile |
393
- |----------|-----------|---------|
394
- | macOS | `sandbox-exec` with Seatbelt (SBPL) | Deny-default, allow-back for agent workspace |
395
- | Linux | Namespace isolation | Process + filesystem isolation |
292
+ Plugins extend the kernel at runtime. A plugin manifest declares what it contributes (tools, MCP servers, advisors, connectors); `plugin/loader.ts` reads manifests from disk, `plugin/resolver.ts` namespaces every contribution safely, and `plugin/lifecycle.ts` hooks plugin init / shutdown into the kernel's own lifecycle. Plugins can subscribe to iteration hooks via `runtime/query/plugin-hooks.ts` and shape what the LLM sees.
396
293
 
397
- The sandbox creates a temporary workspace directory, restricts file I/O to that directory, and destroys everything on cleanup. The seatbelt profile is minimal by design:
294
+ Plugins are how a community ecosystem grows around the kernel without the kernel having to ship batteries for every use case.
398
295
 
399
- - **Deny-default** nothing is allowed unless explicitly granted
400
- - **Workspace-scoped I/O** — reads and writes only within the agent's `rootDir`
401
- - **Path canonicalization** — resolves macOS symlinks (`/var` → `/private/var`) so seatbelt rules match real paths
402
- - **Process isolation** — `same-sandbox` scope for signals and process info
403
- - **Automatic lifecycle** — sandbox is created before query iteration, destroyed in `finally`
296
+ ### 21. Gateway (`gateway/`)
404
297
 
405
- ```typescript
406
- // Direct sandbox API (low-level)
407
- const sandbox = await sandboxProvider.create({
408
- workingDirectory: process.cwd(),
409
- timeoutMs: 30_000,
410
- memoryLimitMb: 512,
411
- maxProcesses: 16,
412
- })
298
+ `gateway/local.ts` is the local-process gateway — a thin translation layer between an external caller (HTTP, WebSocket, stdin, another agent over A2A) and the kernel's run API. Put a real HTTP server in front of it and you have an agent service; wrap it in a CLI and you have an agent shell. The gateway is where your application layer plugs into the kernel.
413
299
 
414
- const result = await sandbox.exec('/bin/sh', ['-c', 'echo hello'], { timeoutMs: 5_000 })
415
- console.log(result.stdout) // "hello\n"
416
- console.log(result.exitCode) // 0
300
+ ### 22. Agent Patterns (`agents/`)
417
301
 
418
- await sandbox.writeFile('script.py', 'print("namzu")')
419
- const content = await sandbox.readFile('script.py')
302
+ Four patterns ship in the kernel. They are not mandatory — you can write your own `AbstractAgent` subclass for custom loops — but these are the shapes most real workloads want.
420
303
 
421
- await sandbox.destroy() // Cleanup workspace
422
- ```
304
+ - **`ReactiveAgent`** — the canonical agent loop. Prompt → LLM → tool call(s) → iterate until a stop condition is hit. Handles token budget, cost limit, timeout, max iterations, HITL injection, progressive tool disclosure, compaction, and checkpointing automatically.
305
+ - **`PipelineAgent`** — deterministic sequential steps. Each step is a typed function; output of step N is input of step N+1. Rolls back on failure. Useful for ETL, RAG ingestion, multi-stage document processing.
306
+ - **`RouterAgent`** — an LLM classifies the input and delegates to the best-suited agent from a configured set of candidates, with a fallback. Useful for intent routing in customer support, dispatcher bots, and multi-expert systems.
307
+ - **`SupervisorAgent`** — a coordinator that spawns and orchestrates a set of specialized child agents. Tracks the full parent/child/depth hierarchy, aggregates results, handles partial failures, and honors the shared budget tracker.
423
308
 
424
- ## Providers
309
+ All four sit on top of the same lifecycle manager, the same limit checker, the same bus, the same verification gate. Switching patterns does not change what safety or durability the kernel provides.
425
310
 
426
- Pluggable LLM backends with a unified interface for chat, streaming, and model discovery.
311
+ ### 23. Multi-Tenant Isolation
427
312
 
428
- The provider is constructed once with credentials; the model is selected per chat/run so you can swap models without rebuilding the client.
313
+ Every registry, every store, every vault is tenant-scoped. `TenantId` is a branded ID threaded through the kernel's types. A run for tenant A cannot accidentally read tenant B's knowledge base, invoke tenant B's tools, or resolve tenant B's credentials. This is not a feature you turn on; it is the default, and a single-tenant setup is just a special case.
429
314
 
430
- ```typescript
431
- import { ProviderFactory } from '@namzu/sdk'
315
+ ### 24. Thread / Run Separation
432
316
 
433
- // OpenRouter (BYOK)
434
- const openrouter = ProviderFactory.createProvider({
435
- type: 'openrouter',
436
- apiKey: process.env.OPENROUTER_KEY!,
437
- })
317
+ A **thread** is a conversation: a series of user ↔ assistant messages, possibly spanning many sessions, probably spanning many days. A **run** is a single execution pass: an input, iterations, tool calls, usage, cost, result. One thread has many runs. Most frameworks conflate the two; Namzu keeps them explicit, with separate stores, separate IDs, and separate serialization. Multi-turn dialogs carry only the context the kernel thinks matters (via compaction), and run traces stay auditable without drowning in prior-turn tool chatter.
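The separation as plain types, purely illustrative — the kernel's stores keep much richer records:

```typescript
// One thread, many runs — illustrative field names, not the SDK's types.
interface Thread {
  id: string
  messages: { role: string; content: string }[]
}

interface Run {
  id: string
  threadId: string // every run points back at exactly one thread
  iterations: number
  costUsd: number
}

function runsForThread(runs: Run[], threadId: string): Run[] {
  return runs.filter((r) => r.threadId === threadId)
}
```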
438
318
 
439
- // AWS Bedrock
440
- const bedrock = ProviderFactory.createProvider({
441
- type: 'bedrock',
442
- region: 'us-east-1',
443
- })
319
+ ---
444
320
 
445
- // Streaming — model is part of the per-call params
446
- for await (const chunk of openrouter.chatStream({
447
- model: 'anthropic/claude-sonnet-4-20250514',
448
- messages: [{ role: 'user', content: 'hi' }],
449
- })) {
450
- process.stdout.write(chunk.delta?.content ?? '')
451
- }
321
+ ## Install
452
322
 
453
- // Model discovery
454
- const models = await openrouter.listModels()
323
+ ```bash
324
+ npm install @namzu/sdk
455
325
  ```
456
326
 
457
- ## RAG
327
+ Requirements: Node ≥ 22, TypeScript strict mode, ESM.
458
328
 
459
- End-to-end retrieval-augmented generation: chunk documents, embed them, store and search vectors, and inject context into agent prompts.
329
+ ## Quick Start
460
330
 
461
331
  ```typescript
462
- import {
463
- TextChunker,
464
- OpenRouterEmbeddingProvider,
465
- InMemoryVectorStore,
466
- DefaultRetriever,
467
- DefaultKnowledgeBase,
468
- createRAGTool,
469
- } from '@namzu/sdk'
470
-
471
- // Chunking (fixed-size, sentence, paragraph, or recursive)
472
- const chunker = new TextChunker()
473
- const chunks = chunker.chunk(document, {
474
- strategy: 'recursive',
475
- chunkSize: 512,
476
- chunkOverlap: 64,
332
+ import { defineTool, ProviderFactory, ReactiveAgent, ToolRegistry } from '@namzu/sdk'
333
+ import { z } from 'zod'
334
+
335
+ const searchWeb = defineTool({
336
+ name: 'search_web',
337
+ description: 'Search the web for information',
338
+ inputSchema: z.object({ query: z.string() }),
339
+ category: 'network',
340
+ permissions: ['network_access'],
341
+ readOnly: true,
342
+ destructive: false,
343
+ concurrencySafe: true,
344
+ execute: async ({ query }) => {
345
+ const r = await fetch(`https://api.search.com?q=${encodeURIComponent(query)}`)
346
+ return { success: true, output: await r.text() }
347
+ },
477
348
  })

- // Embedding
- const embedder = new OpenRouterEmbeddingProvider({
- model: 'openai/text-embedding-3-small',
+ const provider = ProviderFactory.createProvider({
+ type: 'openrouter',
  apiKey: process.env.OPENROUTER_KEY!,
  })

- // Vector store and retriever
- const vectorStore = new InMemoryVectorStore()
- const retriever = new DefaultRetriever(vectorStore, embedder)
-
- // Knowledge base — pass (config, vectorStore, embeddingProvider)
- const kb = new DefaultKnowledgeBase(
- { id: 'docs', name: 'API Guides', tenantId: 'default' },
- vectorStore,
- embedder,
- )
- await kb.ingest(apiDoc, { title: 'API Guide', source: 'doc-1' })
- const results = await kb.query({ text: 'How do I authenticate?', config: { topK: 5 } })
+ const tools = new ToolRegistry()
+ tools.register(searchWeb)

- // Attach to agent as a tool
- const ragTool = createRAGTool({
- knowledgeBases: new Map([['docs', kb]]),
- defaultKnowledgeBaseId: 'docs',
+ const agent = new ReactiveAgent({
+ id: 'researcher',
+ name: 'Research Assistant',
+ version: '1.0.0',
+ category: 'research',
+ description: 'Finds and synthesizes information',
  })
- ```
 
- ## Connectors
-
- Unified framework for integrating external services — HTTP APIs, webhooks, and MCP servers — with execution contexts and multi-tenant isolation.
-
- ```typescript
- import {
- HttpConnector,
- ConnectorManager,
- ConnectorRegistry,
- MCPClient,
- MCPConnectorBridge,
- TenantConnectorManager,
- } from '@namzu/sdk'
-
- // HTTP connector — configure via connect()
- const slack = new HttpConnector()
- await slack.connect(
- { id: 'slack', baseUrl: 'https://slack.com/api' },
- { type: 'bearer', token: process.env.SLACK_TOKEN! },
+ const result = await agent.run(
+ { messages: [{ role: 'user', content: 'Summarize the latest LLM benchmarks' }], workingDirectory: process.cwd() },
+ { model: 'anthropic/claude-sonnet-4-20250514', tokenBudget: 8192, timeoutMs: 600_000, provider, tools },
  )
-
- // MCP client (stdio or HTTP-SSE transport)
- const mcpClient = new MCPClient({
- serverName: 'my-tools',
- transport: { type: 'stdio', command: 'node', args: ['server.js'] },
- })
- await mcpClient.connect()
- const tools = await mcpClient.listTools()
- const result = await mcpClient.callTool('my_tool', { input: 'value' })
-
- // Bridge MCP as a connector so connector-based code paths can reach it
- const connectorManager = new ConnectorManager({ registry: new ConnectorRegistry() })
- const mcpBridge = new MCPConnectorBridge({ manager: connectorManager })
- const discoveredTools = await mcpBridge.listTools()
- await mcpBridge.callTool('my_tool', { input: 'value' })
-
- // Multi-tenant isolation
- const tenantManager = new TenantConnectorManager({ registry: new ConnectorRegistry() })
- tenantManager.registerTenant({ tenantId: 'org-123', name: 'Org 123' })
  ```

- MCP servers can also be declared in a plugin manifest (`mcpServers: [{ name, command, args, env }]`). The plugin lifecycle starts each server on enable, discovers its tools, and registers them under the plugin namespace. Disable disconnects the clients before unregistering the tools.
-
- ## Human-in-the-Loop
-
- Pause agent execution for human review of plans and tool calls. Checkpoint and resume runs across sessions.
+ That is a complete agent run: sandbox-isolated, checkpointed, and instrumented with telemetry, with prompt caching, progressive tool disclosure, structured compaction, and emergency save wired in by default. Those are not features you enable; they are how the kernel runs.

- Plan approval and tool review are separate handlers wired at different points:
+ Examples for `PipelineAgent`, `RouterAgent`, and `SupervisorAgent` are in `src/agents/`.
 
- ```typescript
- import { PlanManager, drainQuery, autoApproveHandler } from '@namzu/sdk'
- import type { ResumeHandler } from '@namzu/sdk'
-
- // 1. Plan approval — runs when the agent produces a plan
- const planManager = new PlanManager(runId, async (request) => {
- const decision = await showPlanUI(request)
- return {
- approved: decision.approved,
- feedback: decision.feedback,
- modifiedSteps: decision.editedSteps,
- }
- })
-
- // 2. Tool review — runs for every pending tool call (required by query/drainQuery)
- const resumeHandler: ResumeHandler = async (request) => {
- if (request.type === 'tool_review') {
- const hasDestructive = request.toolCalls.some((t) => t.isDestructive)
- return hasDestructive
- ? { action: 'reject_tools', feedback: 'Destructive tool blocked' }
- : { action: 'approve_tools' }
- }
- if (request.type === 'plan_approval') {
- return { action: 'approve_plan' }
- }
- return { action: 'continue' }
- }
-
- await drainQuery({ /* ...runConfig, provider, tools, messages, */ resumeHandler })
- ```
-
- Checkpoint/resume enables long-running agents to pause and restart without losing state (`CheckpointManager`, `checkpointId` in `QueryParams`).
-
- ## A2A Protocol
-
- Agent-to-Agent protocol support for cross-platform agent interoperability. Publish agent cards, accept A2A messages, and bridge between Namzu runs and A2A tasks.
-
- ```typescript
- import { buildAgentCard, runToA2ATask, a2aMessageToCreateRun } from '@namzu/sdk'
-
- // Publish agent capabilities as an A2A Agent Card
- const card = buildAgentCard(agentInfo, {
- baseUrl: 'https://api.example.com',
- transport: 'rest',
- providerOrganization: 'Cogitave',
- })
- // Serve at /.well-known/agent-card.json
-
- // Convert an inbound A2A message-send into run creation params
- const runParams = a2aMessageToCreateRun(agentId, {
- message: a2aMessage,
- contextId: a2aMessage.contextId,
- metadata: { model: 'anthropic/claude-sonnet-4-20250514', tokenBudget: 8192 },
- })
-
- // Convert a persisted Run (wire type) + thread messages into an A2A task response
- const a2aTask = runToA2ATask(run, threadMessages)
- ```
-
- ## Streaming (SSE)
-
- Map internal agent execution events to Server-Sent Events for real-time client updates.
-
- Agents emit `RunEvent`s through the listener passed to `run()` / `drainQuery()`. `mapRunToStreamEvent` translates those into SSE-ready `{ event, data }` tuples (returns `null` for events without a wire mapping, which you should skip):
-
- ```typescript
- import { mapRunToStreamEvent, drainQuery } from '@namzu/sdk'
-
- // Event families: run.*, iteration.*, tool.*, token.*, message.*, review.*,
- // checkpoint.*, activity.*, plan.*, agent.*, task.*, plugin.*, sandbox.*
- const listener = (event) => {
- const mapped = mapRunToStreamEvent(event, runId)
- if (!mapped) return
- response.write(`event: ${mapped.wire}\ndata: ${JSON.stringify(mapped.data)}\n\n`)
- }
-
- await drainQuery({ /* ...runConfig, provider, tools, messages */ }, listener)
- ```
-
- ## Persona System
-
- Layer-based system prompt assembly with inheritance. Define identity, expertise, reflexes, and output format as structured data.
-
- ```typescript
- import { assembleSystemPrompt, mergePersonas, withSessionContext } from '@namzu/sdk'
-
- const basePersona = {
- identity: { role: 'Research Agent', description: 'Gathers and synthesizes information' },
- expertise: { domains: ['academic research', 'data analysis'] },
- reflexes: { constraints: ['Always cite sources', 'Be concise'] },
- output: { format: 'markdown' },
- }
-
- // Specialize via inheritance
- const mlResearcher = mergePersonas(basePersona, {
- expertise: { domains: ['machine learning', 'NLP'] },
- })
+ ---

- // Assemble final system prompt with skills injected
- const systemPrompt = assembleSystemPrompt(mlResearcher, loadedSkills)
- ```
+ ## Design Principles

- ## Skills System
+ Five choices shape every decision in the kernel.

- Reusable agent behaviors with progressive disclosure (metadata-only full body) and inheritance chains.
+ **No workarounds. Fix at the root.** When something is wrong, we fix the pattern, not the symptom. A subtle bug in the lifecycle manager means the lifecycle manager changes — we do not paper over it in the agent pattern that calls it.

- ```typescript
- import { SkillRegistry, resolveSkillChain } from '@namzu/sdk'
+ **Type safety is the foundation.** Every resource ID is branded (`RunId`, `ThreadId`, `TaskId`, `TenantId`, `AgentId`, `ToolId`, `MemoryId`, `ChunkId`...). Every discriminated union has exhaustiveness checks. Every public API has Zod-validated inputs at the boundary. The TypeScript compiler is not a formality; it is the first line of defense.

- const registry = new SkillRegistry()
- await registry.registerAll('/path/to/skills', 'metadata')
+ **Deny by default. Fail fast.** Sandboxes deny file I/O by default. Verification gates deny tool calls by default unless a rule allows them. Limit checkers fail the run the moment a budget is breached. Configuration errors throw at boot, not at the 90-minute mark of a long-running job.

- // Load full skill content on demand returns SkillLoadResult | undefined
- const loaded = await registry.load('web-search', 'full')
- const skill = loaded?.skill
-
- // Resolve inheritance: shared skills + agent-specific overrides
- const chain = await resolveSkillChain(
- '/skills/shared',
- '/skills/agent-specific',
- 'metadata',
- )
- ```
+ **Dependency direction is sacred.** `contracts` knows nothing about `sdk`. `sdk` knows nothing about `agents` or `api`. Circular dependencies are a compile error, not a code-review suggestion. This is what keeps the kernel's interface surface small even as its guts grow.

- ## Threads & Conversations
+ **Convention over surprise.** Every new feature follows a shared pattern language — Registries, Managers, Stores, Runs, Bridges, Providers. You read one subsystem, you can navigate the next one.
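The branded-ID pattern named under "Type safety is the foundation" can be sketched in plain TypeScript. This illustrates the technique; it is not the SDK's actual type declarations:

```typescript
// A nominal "brand" stops structurally identical strings from being mixed up.
type Brand<T, B extends string> = T & { readonly __brand: B }

type RunId = Brand<string, 'RunId'>
type ThreadId = Brand<string, 'ThreadId'>

const asRunId = (s: string): RunId => s as RunId

function cancelRun(id: RunId): string {
  return `canceled ${id}`
}

const runId = asRunId('run_123')
const message = cancelRun(runId)  // OK: message === 'canceled run_123'
// cancelRun('thd_9')             // compile error: a plain string is not a RunId
```

The brand exists only at the type level; at runtime these are ordinary strings, so the pattern costs nothing while catching ID mix-ups at compile time.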

- Namzu separates **threads** (clean user↔assistant conversation history) from **runs** (full execution traces with tool calls, iterations, and internal state). This means multi-turn conversations carry only the relevant context — no tool noise leaking between runs.
-
- ```typescript
- import { InMemoryConversationStore } from '@namzu/sdk'
-
- const store = new InMemoryConversationStore({ maxMessages: 50 })
+ ---

- // Start a thread
- store.createThread('thd_abc123')
- store.addUserMessage('thd_abc123', 'What is the capital of France?')
+ ## The Agent Event Protocol (AEP)

- // After a run completes, persist only the final assistant response
- store.persistRunResult('thd_abc123', runId, runMessages)
+ The kernel's contract with the outside world is a typed, versioned event stream. Any UI, any shell, any observability tool subscribes to AEP and renders what it wants.

- // Next run loads clean conversation history (no tool calls, no system messages)
- const history = store.loadMessages('thd_abc123')
- // → [{ role: 'user', content: '...' }, { role: 'assistant', content: '...' }]
- ```
+ AEP flows over three transports:

- The `ConversationStore` interface is pluggable swap in SQLite, Postgres, or any backend. `InMemoryConversationStore` is bundled for non-persistent use; applications wire it into the runtime themselves.
+ - **Bus** (`bus/`) — in-process, for tightly-coupled consumers.
+ - **SSE** (`bridge/sse/mapper.ts`) — cross-process over HTTP, for web UIs and remote observers.
+ - **A2A** (`bridge/a2a/`) — cross-agent, for multi-agent meshes.

- ## Persistence
+ Every transport emits the same event shape. Event types include run lifecycle (`run_started`, `run_paused`, `run_completed`), iteration events (`iteration_started`, `checkpoint_created`), tool events (`tool_called`, `tool_result`), agent events (`agent_pending`, `agent_canceled`), plan events (`plan_requested`, `plan_approved`), advisory events, and error events. They carry consistent metadata: `runId`, `threadId`, `agentId`, `tenantId`, `timestamp`, `depth`, `parentRunId`.

- In-memory and disk-backed stores for runs, tasks, conversations, and activities.
+ AEP v1 is being finalized. Until the spec is stamped, expect the event shapes to change in semver-minor releases.
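To make the contract concrete, here is a minimal consumer written against an event shape that follows the field names listed above. The interface is illustrative, not the stamped v1 spec:

```typescript
// Illustrative AEP event shape; field names follow the README's metadata list,
// but this is not the stamped v1 spec.
interface AepEvent {
  type: string        // 'run_started', 'tool_called', 'run_completed', ...
  runId: string
  timestamp: number
  threadId?: string
  agentId?: string
  tenantId?: string
  depth?: number
  parentRunId?: string
}

// Transport-agnostic consumption: group by event family regardless of whether
// the event arrived over the bus, SSE, or A2A.
const family = (e: AepEvent): string => e.type.split('_')[0]

const events: AepEvent[] = [
  { type: 'run_started', runId: 'run_1', timestamp: 1 },
  { type: 'tool_called', runId: 'run_1', timestamp: 2 },
  { type: 'tool_result', runId: 'run_1', timestamp: 3 },
  { type: 'run_completed', runId: 'run_1', timestamp: 4 },
]

const runLifecycle = events.filter((e) => family(e) === 'run')
// runLifecycle.length === 2
```

Because every transport emits the same shape, a consumer like this never cares which of the three transports delivered the event.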
 
- ```typescript
- import { RunPersistence, DiskTaskStore, getRootLogger } from '@namzu/sdk'
-
- // Run persistence with token/cost tracking
- const persistence = new RunPersistence({
- runId,
- agentId: 'researcher',
- agentName: 'Research Assistant',
- providerId: 'openrouter',
- outputDir: './runs',
- runConfig: {
- model: 'anthropic/claude-sonnet-4-20250514',
- tokenBudget: 8192,
- timeoutMs: 600_000,
- temperature: 0.7,
- },
- log: getRootLogger(),
- })
- await persistence.init()
- persistence.accumulateUsage({
- promptTokens: 100,
- completionTokens: 50,
- totalTokens: 150,
- })
- await persistence.persist()
-
- // Task store with atomic writes (tenant-aware)
- const taskStore = new DiskTaskStore({
- baseDir: './tasks',
- defaultRunId: runId,
- tenantId: 'org-1',
- })
- ```
-
- ## Telemetry
+ ---

- OpenTelemetry integration for distributed tracing and metrics across agents, tools, and providers.
+ ## What You Can Build

- ```typescript
- import { initTelemetry, getTracer, createPlatformMetrics } from '@namzu/sdk'
+ Namzu is not a toy. It is meant for real workloads.

- const telemetry = initTelemetry({
- serviceName: 'agent-platform',
- exporterType: 'otlp',
- otlpEndpoint: 'http://localhost:4318',
- otlpHeaders: { authorization: `Bearer ${process.env.OTLP_TOKEN!}` },
- })
- await telemetry.start()
+ **Personal and homelab.** A home-automation agent monitoring logs, restarting services, running health checks. A personal research agent feeding PDFs and notes through the RAG pipeline into a knowledge base, answering with citations from your own data. A code-review agent watching your repos, reviewing PRs with a `PipelineAgent` (extract diff → analyze → write review), and posting feedback automatically. A media organizer scanning your library, categorizing files, renaming based on metadata, deduplicating.

- const tracer = getTracer()
- const metrics = createPlatformMetrics()
+ **Business and team.** A customer-support triage system where a `RouterAgent` classifies incoming tickets and delegates to specialized children (billing, technical, general), each with its own persona, tools, and knowledge base. A document-processing pipeline ingesting contracts, invoices, and reports through RAG, extracting key data, flagging anomalies, generating summaries, with HITL approval for anything destructive. An internal-ops bot that plugs into Slack, Jira, and your database over MCP. A compliance checker where a `SupervisorAgent` coordinates sub-agents each checking a different regulation, then aggregates results and routes flagged items through plan review.

- const span = tracer.startSpan('agent.run')
- metrics.recordTokenUsage('anthropic/claude-sonnet-4-20250514', 100, 50)
- metrics.recordToolCall('search_web', true)
- span.end()
+ **Platform and SaaS.** This is the shape Namzu was designed for from day one. Agent-as-a-Service — each customer gets isolated agents with their own BYOK keys, connector configs, and knowledge bases; tenant isolation is built in, not bolted on. An agent marketplace — agents are portable definitions (`info + tools + persona + skills`), publishable, deployable by any customer with their own keys, specializable through persona inheritance. Cross-organization workflows where agents from different companies discover each other via A2A agent cards and collaborate without a central authority.

- metrics.recordRunDuration('completed', 12.4)
- ```
+ ---

- ## Architecture
+ ## Quality Bar

- ```
- @namzu/sdk
- ├── advisory/ Advisor registry, execution, trigger evaluation
- ├── agents/ Reactive, Pipeline, Router, Supervisor
- ├── bridge/ A2A, SSE, connector→tool adapters
- ├── bus/ Agent bus and coordination primitives
- ├── compaction/ WorkingState extraction and conversation compaction
- ├── config/ Runtime configuration with Zod schemas
- ├── connector/ HTTP, webhook, MCP client/server, tenant isolation
- ├── constants/ Shared SDK constants
- ├── contracts/ External wire types and validation schemas (HTTP/A2A/SSE)
- ├── execution/ Base and local execution contexts
- ├── gateway/ Local task gateway
- ├── manager/ Plan, agent, connector, run lifecycle
- ├── persona/ System prompt assembly and merging
- ├── plugin/ Manifest discovery, lifecycle, contributions, hooks
- ├── provider/ OpenRouter, Bedrock, Mock LLM providers
- ├── rag/ Chunking, embedding, vector store, retrieval
- ├── registry/ Base, managed, agent, connector, tool, plugin registries
- ├── router/ Task→model routing
- ├── run/ Reporters and limit checking
- ├── runtime/ Query engine, iteration phases, decision parser
- ├── sandbox/ Process-level isolation (Seatbelt, namespace)
- ├── skills/ Skill registry, discovery, and chaining
- ├── store/ In-memory, disk, conversation, activity, task, memory
- ├── telemetry/ OpenTelemetry tracing and metrics
- ├── tools/ defineTool, built-ins, task / advisory / memory tools
- ├── types/ Domain model and internal type definitions
- ├── utils/ ID generation, cost calc, hashing, logging, shell
- ├── vault/ Credential management
- └── verification/ Verification gate and rules
- ```
+ On architectural fundamentals, our self-assessment places Namzu at the top of open-source agent frameworks.

- ## Vision
+ | Criterion | Namzu | LangChain/LangGraph | CrewAI | OpenAI Agents SDK | Vercel AI SDK |
+ |---|---|---|---|---|---|
+ | Type Safety | 9 | 5 | 7 | 7 | 9 |
+ | Modularity | 9 | 5 | 7 | 8 | 9 |
+ | Interface Segregation | 8 | 4 | 6 | 8 | 8 |
+ | Extensibility | 9 | 7 | 6 | 6 | 7 |
+ | Convention Consistency | 8 | 5 | 7 | 8 | 8 |
+ | Dependency Direction | 9 | 4 | 6 | 8 | 8 |
+ | **Overall** | **8.7** | **5.0** | **6.5** | **7.5** | **8.2** |

- AI agents shouldn't be locked behind walled gardens. Today, building production-grade agents means choosing a platform and accepting its constraints its models, its pricing, its rules. We believe agent infrastructure should be open, composable, and owned by the people who build on it.
+ Scores are informed by public docs, community reports, and direct codebase analysis; they are a self-assessment, not a definitive ranking. Where we know we have work to do: test coverage is not yet where the architecture deserves it to be. Helping close that gap is the highest-leverage contribution today.

- Namzu exists to make that real. A single SDK that works with any LLM, any tool ecosystem, any deployment model. No vendor lock-in. No surprise pricing changes. No permission needed.
+ ---

- ### Where we're headed
+ ## Roadmap

- **Now** Core SDK with multi-agent orchestration, tool system, RAG, MCP, and A2A support. Everything you need to build and run agents locally or on your own infrastructure.
+ An honest view: the kernel is already deep. The next three releases tighten the consumer surface, add the subsystems that are genuinely missing, and extend the driver model to new I/O shapes.

- **Next**Managed runtime for deploying agents at scale, conversational agent builder (build, configure, and deploy agents entirely through chat), and a marketplace for sharing agent definitions and tool connectors.
+ ### v0.2 Surface Polish (short, mostly wiring + docs)

- **Later** Decentralized agent network where agents discover and collaborate with each other across organizations via A2A, without a central authority.
+ - `Run.replay(runId, { fromCheckpoint })` API on top of the existing checkpoint store
+ - Memory promotion pipeline connecting compaction output to the indexed memory store via a Reflector persona
+ - **AEP v1 spec** — version and document the event shapes in `bridge/sse/mapper.ts`
+ - Public pattern docs for lifecycle, checkpoints, emergency save, budget / quota, verification gate, context cache, file ownership, and circuit breaker
+ - `ContextCache` generalized across providers (OpenRouter today → Anthropic, Bedrock next)
 
- We're building this in the open because we believe the agent layer of the stack should belong to everyone. If you share this belief, come build with us.
+ ### v0.3 New Subsystems (the four genuinely missing pieces)

- ## License
+ - **Workflow / process-graph DSL** — typed `step / branch / parallel / loop / hitl` builder, durable on top of the existing checkpoint and lifecycle
+ - **Evaluation subsystem** — `Dataset` + `Scorer` + `Experiment` primitives with a `namzu eval run` CLI, model-graded / rule-based / statistical scorers, SCD-2 versioning
+ - **Content-level guardrails** — a second policy layer next to the verification gate, covering LLM I/O (PII, prompt injection, output schema, toxicity) with per-tenant and per-tool attachment
+ - **Semantic cache** and **prompt compression** as opt-in additions next to the existing `ContextCache`
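To give the planned evaluation subsystem some shape: the `Dataset` / `Scorer` / `Experiment` primitives do not exist yet, but a rule-based scorer could look roughly like this (every name here is hypothetical):

```typescript
// Hypothetical shapes for the planned eval primitives; nothing here is in the SDK yet.
interface Case { input: string; expected: string }
type Scorer = (output: string, c: Case) => number  // score in [0, 1]

const exactMatch: Scorer = (output, c) => (output === c.expected ? 1 : 0)

// An "experiment" is just: run the agent over the dataset, average the scores.
function runExperiment(
  dataset: Case[],
  run: (input: string) => string,
  score: Scorer,
): number {
  const total = dataset.reduce((sum, c) => sum + score(run(c.input), c), 0)
  return total / dataset.length
}

const dataset: Case[] = [
  { input: '2+2', expected: '4' },
  { input: '3+3', expected: '6' },
]

// A deliberately flawed "agent" that only knows one answer.
const accuracy = runExperiment(dataset, (q) => (q === '2+2' ? '4' : '7'), exactMatch)
// accuracy === 0.5
```

Model-graded and statistical scorers would slot in as alternative `Scorer` implementations over the same `Case` shape.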
 
- This software is licensed under the [Functional Source License, Version 1.1, MIT Future License (FSL-1.1-MIT)](./LICENSE.md).
+ ### v0.4 Drivers and I/O (extending the driver model)

- **What this means:**
+ - **Voice driver** — unified STT / TTS provider abstraction, duplex streaming, real-time speech-to-speech
+ - **Multimodal tool I/O** — MIME-typed binary handles for image, audio, and video inputs and outputs
+ - **Computer-use driver** — reference implementation with its own sandbox profile
+ - **Deterministic provider replay** — cassette pattern for eval and CI, separate from run-level checkpoints
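The cassette pattern named in the replay bullet is well established: record real provider responses once, then replay them byte-for-byte in CI so eval runs are deterministic. A stdlib-only sketch; the `Cassette` class is ours for illustration, not a planned API:

```typescript
// Record/replay sketch: same prompt in, same response out, every run.
class Cassette {
  private tape = new Map<string, string>()

  record(prompt: string, response: string): void {
    this.tape.set(prompt, response)
  }

  replay(prompt: string): string {
    const hit = this.tape.get(prompt)
    if (hit === undefined) throw new Error(`no recording for: ${prompt}`)
    return hit
  }
}

const tape = new Cassette()
tape.record('hello', 'hi there')
const replayed = tape.replay('hello')  // 'hi there', deterministically
```

A real implementation would key on the full request (model, messages, parameters), typically via a hash, and fail loudly in CI when a request has no recording.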
 
- - **Free for internal use, education, and research**
- - **Free to use in your own products** (as long as you're not building a competing agent platform)
- - **Each version converts to MIT after 2 years** — fully open source, no strings attached
+ ### Explicitly out of scope (community or separate packages)

- Enterprise licensing is available for organizations that need to build competing products or services. Contact us at enterprise@cogitave.com.
+ - `@namzu/react`, `@namzu/svelte`, `@namzu/vue` chat hooks
+ - Next.js / Hono / Cloudflare Workers adapters
+ - A dev studio playground (would consume AEP, lives in its own repo)
+ - A visual observability dashboard in the style of VoltOps or LangSmith
 
- ## Contributing
+ These are valuable — they belong on top of the kernel, not in it. Keeping the kernel's interface surface small is why the kernel can move fast.

- We welcome contributions! Please read our contributing guidelines (coming soon) before submitting a PR.
+ ---

- ## Security
+ ## License and Vision

- Found a vulnerability? Please report it responsibly. See [SECURITY.md](./SECURITY.md) for details.
+ [FSL-1.1-MIT](./LICENSE.md). Every version becomes fully MIT two years after release.

- ---
+ The vision: an open, community-driven agent kernel that reduces systemic dependencies on proprietary platforms — so everyone can build, own, and run AI agents freely. Namzu works with any LLM provider through BYOK, runs in isolation without container orchestration, and surfaces a stable protocol so the application layer stays yours.

- Built by [@bahadirarda](https://github.com/bahadirarda) · [Cogitave](https://github.com/Cogitave)
+ If that resonates, we would love your help. Bug reports, feature ideas, PRs, a kind word on your blog — all of it matters. The fastest way in is to pick a subsystem from `src/` that looks interesting, read its code, and open an issue or a PR.