npm - @miller-tech/uap - Versions diffs - 1.40.0 → 1.41.0 - Mend

@miller-tech/uap 1.40.0 → 1.41.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (150)

package/README.md +109 -642
package/dist/.tsbuildinfo +1 -1
package/dist/cli/deliver-defaults.d.ts +23 -0
package/dist/cli/deliver-defaults.d.ts.map +1 -0
package/dist/cli/deliver-defaults.js +121 -0
package/dist/cli/deliver-defaults.js.map +1 -0
package/dist/cli/init.d.ts.map +1 -1
package/dist/cli/init.js +29 -0
package/dist/cli/init.js.map +1 -1
package/dist/cli/setup.d.ts.map +1 -1
package/dist/cli/setup.js +19 -0
package/dist/cli/setup.js.map +1 -1
package/dist/policies/policy-tools.d.ts +7 -0
package/dist/policies/policy-tools.d.ts.map +1 -1
package/dist/policies/policy-tools.js +24 -2
package/dist/policies/policy-tools.js.map +1 -1
package/docs/INDEX.md +48 -286
package/docs/architecture/OVERVIEW.md +328 -0
package/docs/architecture/PROTOCOL.md +204 -0
package/docs/benchmarks/README.md +17 -192
package/docs/getting-started/CONFIGURATION.md +237 -0
package/docs/getting-started/INSTALLATION.md +125 -0
package/docs/getting-started/QUICKSTART.md +115 -0
package/docs/guides/COORDINATION.md +162 -0
package/docs/guides/DELIVER.md +115 -0
package/docs/guides/DEPLOY_BATCHING.md +212 -0
package/docs/guides/DROIDS_AND_SKILLS.md +202 -0
package/docs/guides/LOCAL_MODELS.md +148 -0
package/docs/guides/MCP_ROUTER.md +195 -0
package/docs/guides/MEMORY.md +235 -0
package/docs/guides/MULTI_MODEL.md +223 -0
package/docs/guides/POLICIES.md +190 -0
package/docs/guides/WORKTREE_WORKFLOW.md +185 -0
package/docs/integrations/MCP_ROUTER.md +147 -0
package/docs/integrations/RTK.md +102 -0
package/docs/reference/API.md +485 -0
package/docs/reference/CLI.md +719 -0
package/docs/reference/CONFIGURATION.md +90 -193
package/docs/reference/DATABASE_SCHEMA.md +110 -344
package/docs/reference/FEATURES.md +176 -472
package/docs/reference/PATTERNS.md +102 -0
package/docs/reference/PLATFORMS.md +83 -0
package/package.json +3 -1
package/src/policies/enforcers/7ebbc721-7540-4e9f-879a-770e0213a09b_architecture_review.py +101 -0
package/src/policies/enforcers/__pycache__/_common.cpython-312.pyc +0 -0
package/src/policies/enforcers/_common.py +100 -0
package/src/policies/enforcers/artifact_hygiene.py +52 -0
package/src/policies/enforcers/cluster_routing.py +63 -0
package/src/policies/enforcers/codebase_read_before_plan.py +52 -0
package/src/policies/enforcers/coord_overlap.py +81 -0
package/src/policies/enforcers/delivery_enforcement.py +97 -0
package/src/policies/enforcers/doc_live_over_report.py +50 -0
package/src/policies/enforcers/expert_review_required.py +135 -0
package/src/policies/enforcers/iac_parity.py +53 -0
package/src/policies/enforcers/mcp_router_first.py +37 -0
package/src/policies/enforcers/memory_before_plan.py +61 -0
package/src/policies/enforcers/parallel_reads.py +50 -0
package/src/policies/enforcers/rtk_wrap.py +44 -0
package/src/policies/enforcers/schema_diff_gate.py +80 -0
package/src/policies/enforcers/session_memory_write.py +52 -0
package/src/policies/enforcers/task_required.py +131 -0
package/src/policies/enforcers/test_gate.py +58 -0
package/src/policies/enforcers/validate_plan_before_build.py +75 -0
package/src/policies/enforcers/worktree_required.py +57 -0
package/src/policies/schemas/policies/architecture-review.md +51 -0
package/src/policies/schemas/policies/artifact-hygiene.md +29 -0
package/src/policies/schemas/policies/cluster-routing.md +31 -0
package/src/policies/schemas/policies/codebase-read-before-plan.md +30 -0
package/src/policies/schemas/policies/coord-overlap.md +24 -0
package/src/policies/schemas/policies/delivery-enforcement.md +45 -0
package/src/policies/schemas/policies/doc-live-over-report.md +32 -0
package/src/policies/schemas/policies/expert-review-required.md +60 -0
package/src/policies/schemas/policies/iac-parity.md +31 -0
package/src/policies/schemas/policies/mandatory-testing-deployment.md +147 -0
package/src/policies/schemas/policies/mcp-router-first.md +24 -0
package/src/policies/schemas/policies/memory-before-plan.md +24 -0
package/src/policies/schemas/policies/merge-deploy-monitor-verify.md +145 -0
package/src/policies/schemas/policies/parallel-reads.md +24 -0
package/src/policies/schemas/policies/rtk-wrap.md +26 -0
package/src/policies/schemas/policies/schema-diff-gate.md +30 -0
package/src/policies/schemas/policies/session-memory-write.md +24 -0
package/src/policies/schemas/policies/task-required.md +49 -0
package/src/policies/schemas/policies/test-gate.md +24 -0
package/src/policies/schemas/policies/validate-plan-before-build.md +28 -0
package/src/policies/schemas/policies/worktree-required.md +28 -0
package/templates/hooks/uap-policy-gate.sh +5 -0
package/docs/AGENTS.md +0 -423
package/docs/DOCUMENTATION_AUDIT_REPORT.md +0 -131
package/docs/GETTING_STARTED.md +0 -288
package/docs/PROJECT_ANALYSIS_REPORT.md +0 -510
package/docs/architecture/COMPLETE_ARCHITECTURE.md +0 -748
package/docs/architecture/EXPERT_STACK.md +0 -137
package/docs/architecture/MULTI_MODEL.md +0 -224
package/docs/architecture/PLATFORM_GATING.md +0 -68
package/docs/architecture/SYSTEM_ANALYSIS.md +0 -334
package/docs/architecture/UAP_COMPLIANCE.md +0 -217
package/docs/architecture/UAP_PROTOCOL.md +0 -339
package/docs/architecture/UAP_STRICT_DROIDS.md +0 -172
package/docs/archive/BALLS_MODE_SELF_ANALYSIS.md +0 -260
package/docs/archive/BENCHMARK_GAPS_AND_PLAN.md +0 -146
package/docs/archive/FAILING_TASKS_SOLUTION_PLAN.md +0 -668
package/docs/archive/JINJA2-SYSTEM-MESSAGE-FIX.md +0 -209
package/docs/archive/MODEL_ROUTING_IMPLEMENTATION_SUMMARY.md +0 -281
package/docs/archive/MODEL_ROUTING_OPTIMIZATION_PLAN.md +0 -320
package/docs/archive/NPM-PUBLISH-V0.9.1.md +0 -240
package/docs/archive/OPTIMIZATION_OPTIONS.md +0 -334
package/docs/archive/PARALLELISM_GAPS_AND_OPTIONS.md +0 -422
package/docs/archive/POLICY_GATE_IMPLEMENTATION.md +0 -245
package/docs/archive/SETUP_IMPROVEMENTS.md +0 -213
package/docs/archive/UAP_GENERIC_OPTIMIZATION_PLAN.md +0 -270
package/docs/archive/UAP_OPTIMIZATION_PLAN.md +0 -701
package/docs/archive/UAP_V103_PATTERN_DESIGN.md +0 -315
package/docs/archive/UAP_V104_COMPLIANCE_DESIGN.md +0 -223
package/docs/archive/changelog/2026-03-10_uap-100-compliance.md +0 -77
package/docs/archive/changelog/2026-03-10_uap-full-system-verification.md +0 -109
package/docs/archive/opencode-integration-guide.md +0 -740
package/docs/archive/opencode-integration-quickref.md +0 -180
package/docs/benchmarks/OVERNIGHT_RUNNER.md +0 -341
package/docs/benchmarks/SPECULATIVE_DECODING_JOURNEY_2026-03.md +0 -221
package/docs/benchmarks/VALIDATION_PLAN.md +0 -568
package/docs/blog/SPECULATIVE_DECODING_PRODUCTION_PLAYBOOK.md +0 -139
package/docs/blog/local-coding-agents.md +0 -266
package/docs/blog/x-thread.md +0 -254
package/docs/deployment/DEPLOYMENT.md +0 -895
package/docs/deployment/DEPLOYMENT_STRATEGIES.md +0 -518
package/docs/deployment/DEPLOY_BATCHER_ANALYSIS.md +0 -224
package/docs/deployment/DEPLOY_BATCHING.md +0 -273
package/docs/deployment/DEPLOY_BUCKETING_ANALYSIS.md +0 -420
package/docs/deployment/QWEN35_LLAMA_CPP.md +0 -426
package/docs/deployment/UAP_LLAMA_ANTHROPIC_PROXY_BOOTSTRAP.md +0 -279
package/docs/getting-started/INTEGRATION.md +0 -628
package/docs/getting-started/OVERVIEW.md +0 -324
package/docs/getting-started/SETUP.md +0 -377
package/docs/integrations/MCP_ROUTER_SETUP.md +0 -445
package/docs/integrations/RTK_INTEGRATION.md +0 -468
package/docs/operations/TROUBLESHOOTING.md +0 -660
package/docs/pr/PR_SPECULATIVE_DOCS_TEMPLATE.md +0 -146
package/docs/pr/UPSTREAM_PRS.md +0 -424
package/docs/reference/API_REFERENCE.md +0 -903
package/docs/reference/EXPERT_DROIDS.md +0 -219
package/docs/reference/HARNESS-MATRIX.md +0 -318
package/docs/reference/PATTERN_LIBRARY.md +0 -636
package/docs/reference/UAP_CLI_REFERENCE.md +0 -620
package/docs/research/BEHAVIORAL_PATTERNS.md +0 -228
package/docs/research/DOMAIN_STRATEGIES.md +0 -316
package/docs/research/MEMORY_SYSTEMS_COMPARISON.md +0 -812
package/docs/research/PATTERN_ANALYSIS_2026-01-18.md +0 -436
package/docs/research/PERFORMANCE_ANALYSIS_2026-01-18.md +0 -209
package/docs/research/PERFORMANCE_TEST_PLAN.md +0 -383
package/docs/research/TERMINAL_BENCH_LEARNINGS.md +0 -217

package/docs/architecture/OVERVIEW.md ADDED

@@ -0,0 +1,328 @@
+# UAP Architecture Overview
+`v1.40.0` · 168 TypeScript modules across 18 `src/` subsystems · 117 test suites
+The Universal Agent Protocol (UAP) is a layer that sits **underneath** an AI
+coding agent's harness — Claude Code, Factory, Cursor, OpenCode, Codex, and
+others. It does not replace the model or the harness. Instead it installs
+**hooks** that intercept the harness's tool calls, then mediates each call
+through three services — memory injection, policy enforcement, and tool-output
+compression — before handing control back. On top of that mediation layer it
+ships a rich CLI for memory, delivery, worktrees, tasks, deployment, and
+multi-model routing.
+This document describes the system architecture. For the normative
+harness↔UAP contract, see [PROTOCOL.md](PROTOCOL.md).
+---
+## The hook-mediation model
+A bare agent harness calls a tool (Edit, Write, Bash, a spawned sub-agent, an
+MCP tool) and the model sees the raw result. UAP inserts itself between the
+harness and the tool by registering hooks at the harness's interception points:
+- **Claude Code / VSCode / Factory / Cursor** — `PreToolUse` hooks
+- **OpenCode** — the `tool.execute.before` plugin hook
+- **Codex** — gating via the UAP MCP server (`execute_tool`)
+- **Hermes** — the `pre_tool_call` event
+The same logical lifecycle runs on every harness:
+```
+                          ┌──────────────────────────────────────┐
+                          │            AGENT HARNESS              │
+                          │   (Claude Code / Factory / OpenCode)  │
+                          └───────────────┬──────────────────────┘
+                                          │ tool call
+                                          ▼
+              ┌────────────────────── UAP HOOK LAYER ──────────────────────┐
+              │                                                             │
+   session    │   SessionStart hook                                        │
+   start ─────┼─▶  • inject last-24h memory  (<uap-context> … )            │
+              │    • clean stale agents / work claims                       │
+              │                                                             │
+   per tool   │   PreToolUse / tool.execute.before hook                     │
+   call ──────┼─▶  ┌───────────┐   ┌────────────┐   ┌────────────────────┐ │
+              │    │  MEMORY   │──▶│   POLICY   │──▶│     MCP ROUTER     │ │
+              │    │ injection │   │   gates    │   │ token compression  │ │
+              │    └───────────┘   └─────┬──────┘   └─────────┬──────────┘ │
+              │                          │ deny (exit 2)      │            │
+              │                          ▼                    ▼            │
+              │                     BLOCK call          compressed result  │
+              └─────────────────────────────────────────────┼────────────┘
+                                          │ allow            │
+                                          ▼                  ▼
+                                   ┌──────────────────────────────┐
+                                   │       THE ACTUAL TOOL        │
+                                   │   (fs / shell / MCP server)  │
+                                   └──────────────────────────────┘
+```
+Hooks are **fail-open for context** (memory injection never blocks) and
+**fail-closed for safety** (a required policy violation returns exit code 2 and
+the harness aborts the call). Hook scripts are generated and installed by
+`uap hooks install` (`src/cli/hooks.ts`) from templates in `templates/hooks/`.
+---
+## Component map
+```
+src/
+├── memory/         4-tier memory: working, session, semantic (Qdrant), graph
+├── mcp-router/     hierarchical MCP router — tool hiding + FTS5 output compression
+├── policies/       hook-based policy gates + 20 Python enforcers
+├── delivery/       `uap deliver` convergence loop (15 modules)
+├── coordination/   multi-agent registry, overlap detection, deploy batching
+├── models/         multi-model routing, planning, execution profiles
+├── tasks/          dependency-aware task tracker (SQLite, DAG)
+├── dashboard/      live task / agent / memory / policy visualization
+├── observability/  HALO / OpenInference span export for harness analysis
+├── analyzers/      project structure analysis + metadata generation
+├── generators/     CLAUDE.md / config generation
+├── benchmarks/     Terminal-Bench harness + scoring
+├── browser/        cloaked browser automation for agents
+├── telemetry/      run telemetry
+├── bin/            CLI entry (cli.ts), policy bin, llama-server-optimize
+├── cli/            ~35 command modules wired into bin/cli.ts
+├── types/          shared types
+└── utils/          logging and shared helpers
+```
+---
+## Subsystems
+### Memory (`src/memory/`)
+A four-tier memory system that gives the agent persistent context across
+sessions. The tiers (`src/memory/README.md`):
+| Tier | Backend | Purpose |
+|------|---------|---------|
+| **L1 Working** | SQLite `memories` table (~50 cap, FTS5) | recent actions |
+| **L2 Session** | SQLite `session_memories` table (FTS5) | current-session context, "open loops" |
+| **L3 Semantic** | Qdrant, 768-dim vectors | long-term learnings, semantic recall |
+| **L4 Knowledge** | SQLite entity/relationship graph | entity relationships, N-hop traversal |
+Embeddings (`src/memory/embeddings.ts`) use **`nomic-embed-text-v2-moe`
+(768-dim)** via a llama-server `--embeddings` endpoint, with fallbacks down a
+chain (Ollama `nomic-embed-text` → OpenAI `text-embedding-3-small` → local
+`all-MiniLM-L6-v2` → TF-IDF). The provider is pluggable and cached
+(SHA-256-keyed LRU).
+Key modules:
+- `hierarchical-memory.ts` — in-memory hot/warm/cold tier manager with
+  auto promote/demote, time-decay importance, and token-budget enforcement,
+  persisted to its own SQLite DB.
+- `dynamic-retrieval.ts` — the per-task orchestrator: classifies the task,
+  sets adaptive retrieval depth + token budget, queries all sources, dedups,
+  compresses, and formats the final context block.
+- `memory-consolidator.ts` — summarizes working entries into session memory,
+  extracts lessons, and dedups by content hash + embedding similarity.
+- `write-gate.ts` — a quality filter that scores candidate memories and only
+  persists those above threshold (prevents memory pollution).
+- `knowledge-graph.ts` — the L4 graph: upsert entities, strengthen
+  relationships, recursive-CTE traversal.
+- `context-compressor.ts` / `semantic-compression.ts` — token budgeting and
+  distillation of context into atomic typed facts.
+- `predictive-memory.ts` / `speculative-cache.ts` — prefetch likely-needed
+  memories before they are queried.
+- `task-classifier.ts` — classifies an instruction to drive retrieval hints.
+- `model-router.ts` — benchmark-fingerprint LLM routing with feedback learning
+  (consumed by `src/models/unified-router.ts`).
+### MCP Router (`src/mcp-router/`)
+A hierarchical Model Context Protocol router that achieves large token savings
+by two independent mechanisms (`src/mcp-router/server.ts`,
+`output-compressor.ts`):
+1. **Tool hiding.** Instead of exposing 150+ downstream MCP tool schemas
+   (~500 tokens each) to the model, the router exposes just three meta-tools —
+   `discover_tools`, `execute_tool`, `deliver`. The model issues a
+   natural-language `discover_tools` query, gets back matching tool paths, then
+   calls `execute_tool({path, args})`. Downstream tools live in an in-memory
+   fuzzy search index and are never surfaced as definitions.
+2. **Output compression (FTS5).** `execute_tool` accepts an `intent`. Large
+   tool output is chunked, indexed into an in-memory SQLite **FTS5** virtual
+   table, queried with the intent using **BM25 ranking**, and only the top
+   matching snippets (plus a searchable-vocabulary footer) are returned. Small
+   outputs pass through unchanged; very large outputs without an intent are
+   head+tail truncated.
+The design target documented in source is ~75,000 tokens of tool definitions
+collapsed to ~700 (98%+). Per-output FTS5 savings are computed live per call.
+See [../integrations/MCP_ROUTER.md](../integrations/MCP_ROUTER.md) for setup.
+### Policies (`src/policies/`)
+Project guidelines expressed as **executable hook gates** rather than prose.
+Two layers:
+- **TypeScript middleware** — `policy-gate.ts` (`PolicyGate.executeWithGates`)
+  is the in-process gate used by the MCP server's `execute_tool`. It loads
+  REQUIRED policies from a SQLite store (`policies.db`), evaluates keyword /
+  anti-pattern rules against the operation, and throws `PolicyViolationError`
+  on a REQUIRED violation. Stages: `pre-exec | post-exec | review | always`;
+  completion/merge/deploy operations auto-force a `review` stage.
+- **Shell gate + Python enforcers** — `templates/hooks/uap-policy-gate.sh`
+  binds to harness hook events and invokes the ~20 enforcers in
+  `src/policies/enforcers/`. A blocked verdict is **exit code 2** (hard block).
+Enforcers cover the worktree gate (`worktree_required.py`), task discipline
+(`task_required.py`), delivery routing (`delivery_enforcement.py`), test deltas
+(`test_gate.py`), expert review (`expert_review_required.py`), schema diffs
+(`schema_diff_gate.py`), memory-before-plan, MCP-router-first, RTK wrapping, and
+more. Levels: **REQUIRED** blocks, **RECOMMENDED** logs, **OPTIONAL** informs.
+### Delivery — `uap deliver` (`src/delivery/`)
+A 15-module convergence loop that drives an underlying model against the
+project's **real** completion gates until the work actually passes — the
+mechanism behind UAP's "agents stop declaring victory on broken code." See the
+[deliver flow](#how-uap-deliver-orchestrates) below.
+### Coordination (`src/coordination/`)
+Lets multiple agents work the same repo without colliding. A singleton SQLite
+DB (`database.ts`) backs an agent registry, work announcements, work claims,
+inter-agent messages, and a deploy queue. `service.ts` detects **overlap** when
+agents announce work on the same files and suggests merge order;
+`deploy-batcher.ts` queues git/CI actions with per-type batch windows
+(commit 30s, push 5s, merge 10s, deploy 60s), folds/squashes similar pending
+actions, and executes batches sequentially or in parallel.
+`expert-orchestrator.ts` builds an ordered expert-droid chain across the
+`plan → design → implement → review → release` lifecycle, drawing implement
+droids from `capability-router.ts`. `pattern-router.ts` matches tasks to
+Terminal-Bench patterns (always enforcing **P12** Output Existence and
+**P35** Decoder-First).
+### Models (`src/models/`)
+Multi-model routing. `router.ts` classifies a task (complexity + type from
+keyword scoring) and `selectModel()` picks a model per the routing strategy —
+`performance-first`, `cost-optimized`, `adaptive`, or `balanced` (default,
+which walks priority-ordered routing rules). `unified-router.ts` layers a
+benchmark signal on top: it returns a consensus when the rule-based and
+benchmark routers agree, otherwise trusts the benchmark router only when it has
+enough data. `planner.ts` decomposes a task into a subtask DAG and assigns a
+model per subtask; `executor.ts` runs the plan level-by-level with retries and
+fallback; `execution-profiles.ts` tunes *how* the chosen model runs
+(temperature, budgets); `analytics.ts` records outcomes so routing improves.
+### Tasks (`src/tasks/`)
+A dependency-aware task tracker (a Beads alternative) backed by SQLite
+(`tasks`, `task_dependencies`, `task_history`, `task_activity`,
+`task_summaries`). Tasks form a DAG; closing a task transitions its dependents
+from `blocked` to `open` and emits events on an in-process bus (`event-bus.ts`).
+`coordination.ts` bridges tasks to the multi-agent coordination layer (claim /
+release with overlap detection); `decoder-gate.ts` implements the P35
+Decoder-First pre-execution validator (droid schema, tool availability,
+claim conflicts, worktree requirement, ambiguity).
+### Dashboard (`src/dashboard/`)
+A live visualization layer (`uap dashboard`) over tasks, agents, memory,
+policies, models, and benchmark/session history, with an event stream for
+real-time updates.
+### Observability (`src/observability/`)
+Emits HALO / OpenInference spans for delivery runs and tool calls, consumed by
+`uap harness analyze` to optimize agent execution from real traces.
+---
+## How a tool call flows: memory → policy → MCP Router
+For a representative `execute_tool` call routed through the UAP MCP server:
+```
+1. Tool call arrives at the PreToolUse hook.
+2. MEMORY  — On session start, recent memory is injected as <uap-context>.
+             Per task, dynamic-retrieval has already surfaced relevant
+             long-term learnings into the prompt. (Read/Query stage.)
+3. POLICY  — PolicyGate.executeWithGates loads REQUIRED policies and checks
+             the operation + args. A REQUIRED violation → PolicyViolationError
+             (or exit 2 from the shell enforcer) → the harness ABORTS the call.
+             Otherwise the call proceeds.
+4. ROUTER  — execute_tool resolves the tool path, dispatches to the downstream
+             MCP client (or an expert droid), and captures the raw result.
+5. COMPRESS— output-compressor indexes large output into FTS5 and returns only
+             the top BM25 snippets for the call's `intent`, plus a vocabulary
+             footer. The model sees a compact result, not the full payload.
+6. RECORD  — The agent records the observation to short-term memory; the
+             consolidator later promotes significant lessons to long-term
+             memory through the write-gate. (Record/Promote stage.)
+```
+The five-stage **read → query → act → record → promote** loop is the agent
+decision loop defined normatively in [PROTOCOL.md](PROTOCOL.md#agent-decision-loop).
+---
+## How `uap deliver` orchestrates
+`uap deliver "<instruction>"` runs `ConvergenceLoop.deliver()`
+(`src/delivery/convergence-loop.ts`). The staged flow:
+```
+detect gates ──▶ baseline check ──▶ protect files ──▶ ╔═════ turn loop ═════╗
+(verifier-       (already green?    (snapshot tests/   ║  build prompt       ║
+ ladder reads     skip, no model    oracle + integrity ║  EXECUTE model      ║
+ package.json)    call)             guard)             ║  APPLY file blocks  ║
+                                                       ║  VERIFY (ladder)    ║
+                                                       ║  passed? ─▶ done    ║
+                                                       ║  else: CRITIC +     ║
+                                                       ║        ESCALATE     ║
+                                                       ╚═════════╤═══════════╝
+                                                                 │ repeat until
+                                                                 ▼ gates pass or
+                                                                   budget spent
+```
+- **Verifier ladder** (`verifier-ladder.ts`) — derives gate rungs from
+  `package.json` scripts (build → typecheck via `tsc --noEmit` → test → lint),
+  runs each as a real command in a secret-stripped env, fails fast on required
+  rungs. "Delivered" means all *required* rungs pass; lint is optional.
+- **Explorer + ideation** (`explorer.ts`, `ideation.ts`) — best-of-N: generate
+  N candidates with distinct strategy seeds, apply/verify/rollback each on the
+  same baseline, commit only the winner. A model **judge** (`judge.ts`)
+  tie-breaks candidates with equal gate scores.
+- **Critic** (`critic.ts`) — turns a failed turn's gate output into a
+  file-scoped repair plan fed into the next turn (not a raw compiler dump).
+- **Escalation** (`escalation.ts`) — on score stagnation, climbs a cost ladder:
+  widen exploration → enable critic → switch to a stronger model + raise budget.
+- **Protection** (`spec-imports.ts`, `integrity.ts`, `applier.ts`) — protects
+  pre-existing test/spec files and their transitive oracle imports, and
+  byte-verifies protected files after each gate run, so the model can't satisfy
+  a spec by rewriting what it asserts against.
+- **Auto / optimize** (`auto-optimizer.ts`) — by default, the run classifies
+  task complexity and enables the matching aids automatically; `--optimize`
+  enables every aid at once.
+- **Coordination + observability** (`run-coordinator.ts`, `halo-trace.ts`) —
+  optionally registers the run as a coordination agent, heartbeats, queues
+  applied files into the deploy batcher, and emits HALO spans.
+The default model preset is `qwen35-a3b`; `--until-delivered` (on by default)
+keeps extending the turn budget while the best score is improving, up to a
+ceiling (default 30, hard cap 50), and stops once progress stalls.
+---
+## See also
+- [PROTOCOL.md](PROTOCOL.md) — the harness↔UAP contract and agent loop
+- [../integrations/MCP_ROUTER.md](../integrations/MCP_ROUTER.md) — MCP Router setup
+- [../integrations/RTK.md](../integrations/RTK.md) — RTK (Rust Token Killer)
+- [../../CONTRIBUTING.md](../../CONTRIBUTING.md) — development workflow
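
The task-tracker rule described under `src/tasks/` in OVERVIEW.md (closing a task moves its dependents from `blocked` to `open`) can be sketched as follows. This is an illustrative Python sketch, not the package's TypeScript/SQLite implementation: `TaskGraph`, `add_task`, and `close` are hypothetical names, and it assumes a dependent only opens once all of its dependencies are closed, which the doc does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Hypothetical in-memory stand-in for the SQLite-backed task DAG."""

    status: dict[str, str] = field(default_factory=dict)
    deps: dict[str, set[str]] = field(default_factory=dict)

    def add_task(self, task_id: str, depends_on: tuple[str, ...] = ()) -> None:
        # Dependencies must already exist, so the graph can never contain a cycle.
        unknown = [d for d in depends_on if d not in self.status]
        if unknown:
            raise KeyError(f"unknown dependencies: {unknown}")
        self.deps[task_id] = set(depends_on)
        pending = [d for d in depends_on if self.status[d] != "closed"]
        self.status[task_id] = "blocked" if pending else "open"

    def close(self, task_id: str) -> list[str]:
        """Close a task and return the dependents that became open."""
        self.status[task_id] = "closed"
        unblocked = []
        for other, needed in self.deps.items():
            if self.status[other] != "blocked" or task_id not in needed:
                continue
            # Assumption: every dependency must be closed before opening.
            if all(self.status[d] == "closed" for d in needed):
                self.status[other] = "open"
                unblocked.append(other)
        return unblocked
```

In this sketch the list returned by `close` plays the role of the events the doc says are emitted on the in-process bus; a real implementation would publish them rather than return them.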

package/docs/architecture/PROTOCOL.md ADDED

@@ -0,0 +1,204 @@
+# The UAP Protocol
+`v1.40.0`
+This document specifies the **Universal Agent Protocol** itself: the contract
+between an AI agent harness and the UAP layer beneath it. It is normative —
+where it says *MUST* / *SHOULD* / *MAY*, those carry their usual meaning. For an
+architectural tour of the components that implement this contract, see
+[OVERVIEW.md](OVERVIEW.md).
+UAP is not a wire protocol. It is a **convention enforced by hooks**: a small
+set of interception points the harness exposes, a defined hook lifecycle, an
+agent decision loop, and a set of gates that block work which violates the
+contract. Any harness that can run a hook before tool execution can host UAP.
+---
+## 1. The harness ↔ UAP contract
+A conforming harness MUST provide UAP with:
+1. **A session-start interception point** — a place to run a script when an
+   agent session begins, whose stdout is injected into the agent's context.
+2. **A pre-tool-use interception point** — a place to run a script *before*
+   each tool call, where a non-zero exit (specifically **exit code 2**) aborts
+   the call.
+UAP, in return, guarantees:
+- Memory injection is **advisory and fail-open** — if hydration fails, the
+  session proceeds without it. It never blocks an agent.
+- Policy enforcement is **fail-closed for REQUIRED policies** — a REQUIRED
+  violation blocks the call. RECOMMENDED / OPTIONAL policies only log.
+- Tool output routed through the MCP Router is **compressed, not altered in
+  meaning** — the model receives a faithful, smaller view of the same result.
+### Supported interception points
+| Harness | Session start | Pre-tool-use |
+|---------|---------------|--------------|
+| Claude Code / VSCode | `SessionStart` | `PreToolUse` |
+| Factory | `SessionStart` | `PreToolUse` |
+| Cursor | hooks.json | `preToolUse` |
+| OpenCode | plugin | `tool.execute.before` |
+| Codex | AGENTS.md / MCP | gating via UAP MCP `execute_tool` |
+| Hermes | — | `pre_tool_call` |
+Hook scripts are generated and installed by `uap hooks install`
+(`src/cli/hooks.ts`) from `templates/hooks/`. Verify coverage with
+`uap hooks doctor`.
+---
+## 2. Hook lifecycle
+### 2.1 Session start
+On session start the harness runs the session-start hook, which MUST:
+1. **Inject memory.** Query the short-term store (last-24h top memories plus
+   open "session" loops of type action/goal/decision with importance ≥ 7) and
+   emit it wrapped in a `<uap-context>` block on stdout. The harness places
+   this in the agent's context.
+2. **Clean stale state.** Deregister dead agents and release abandoned work
+   claims so coordination state stays accurate.
+The hook is **self-healing and fail-open**: it auto-creates missing
+coordination/memory DBs and never exits non-zero. Its output is advisory
+context, not a gate.
+### 2.2 Pre-tool-use
+Before each tool call the harness runs the pre-tool-use hook, which runs the
+relevant gates for that tool. Conceptually:
+```
+pre-tool-use(tool, args):
+    if tool in {Edit, Write, MultiEdit}:
+        run worktree gate            # path MUST be under .worktrees/
+    if tool == Bash:
+        run dangerous-command guard  # block terraform apply, force-push, ...
+    run policy gate (DB-driven)      # REQUIRED policies for this tool
+    if any gate denies:
+        exit 2                       # harness ABORTS the call
+    else:
+        allow                        # call proceeds (optionally via MCP Router)
+```
+A gate verdict is binary — **allow** or **deny**. There is no "modify" verdict.
+A denied call returns **exit code 2** (shell enforcers) or throws
+`PolicyViolationError` (in-process MCP gate). Post-tool-use hooks MAY run a
+build gate or backup reminder; pre/post-compact hooks preserve protocol context
+across context compaction; the stop hook runs a completion checklist.
+---
+## 3. Agent decision loop
+A conforming agent SHOULD execute each task through this loop. The TypeScript
+implementation lives in `src/memory/dynamic-retrieval.ts` (query) and the
+short-term store / consolidator (record, promote).
+```
+        ┌──────────────────────────────────────────────┐
+        │                                              │
+        ▼                                              │
+   ┌─────────┐   ┌─────────┐   ┌──────┐   ┌────────┐   ┌─────────┐
+   │  READ   │──▶│  QUERY  │──▶│ ACT  │──▶│ RECORD │──▶│ PROMOTE │
+   │ short-  │   │ long-   │   │ via  │   │ to     │   │ lessons │
+   │ term    │   │ term    │   │ tool │   │ short- │   │ to long-│
+   │ memory  │   │ (semant)│   │      │   │ term   │   │ term    │
+   └─────────┘   └─────────┘   └──────┘   └────────┘   └────┬────┘
+                                                            │ next task
+                                                            ▼
+```
+1. **READ** — read short-term memory for recent context (`uap memory query`).
+2. **QUERY** — semantic search of long-term memory for related learnings.
+3. **ACT** — classify the task, then execute via the appropriate tool
+   (Edit / Write / Bash / a routed MCP tool / `uap deliver`).
+4. **RECORD** — write observations to short-term memory.
+5. **PROMOTE** — promote significant learnings to long-term memory, **through
+   the write-gate** (which scores and rejects low-quality memories).
+A learning is significant enough to PROMOTE when it changes future behavior:
+an important decision with rationale (importance ≥ 7), a pattern that prevents a
+recurring error, or a configuration choice with context. Trivial observations,
+transient debugging state, and secrets MUST NOT be promoted.
+---
+## 4. Worktree convention
+All file edits MUST happen inside a git worktree under `.worktrees/NNN-<slug>/`.
+Edits to the project root are blocked by the worktree gate
+(`worktree_required.py`).
+```bash
+uap worktree ensure --strict     # verify you are inside a worktree (exit 0)
+uap worktree create <slug>       # auto-numbered branch + worktree if not
+# ... edit, stage, commit inside the worktree ...
+uap worktree pr                  # open a PR from the worktree branch → master
+uap worktree finish              # finish + clean up after merge
+```
+Rules:
+- All edit paths MUST be under `.worktrees/NNN-<slug>/`.
+- Version bumps MUST happen on the feature branch, never on `master`.
+- PRs open from the worktree branch against `master`.
+- This applies to **every** file type — `.ts`, `.md`, `.json`, `.sh`,
+  configs, tests, docs. There is no exemption for "small" or "docs-only"
+  changes.
+**Read-only tasks** (analysis, diagnostics, queries) do NOT require a worktree.
+---
+## 5. Completion gates
+Claiming a code change is DONE is prohibited until all gates pass. The gates are
+decomposed across policy enforcers and the `review`-stage policy logic in
+`policy-gate.ts` (auto-forced on completion / merge / deploy operations):
+| Gate | Enforcer / mechanism | Requirement |
+|------|----------------------|-------------|
+| Tests | `test_gate.py` + `npm test` | new tests cover changed behavior; suite passes |
+| Build | post-tool-use build gate + `npm run build` | compiles with zero errors |
+| Type-check | `tsc --noEmit` | passes cleanly |
+| Task discipline | `task_required.py` | a UAP task is `in_progress` before mutating work |
+| Expert review | `expert_review_required.py` | parallel expert review precedes ship |
+| Schema diff | `schema_diff_gate.py` | schema/contract changes pass `uap schema-diff` |
+| Memory lesson | `session_memory_write.py` | code-changing sessions write a lesson |
+| Version bump | `npm run version:patch/minor/major` | bumped on the feature branch |
+The `uap deliver` convergence loop is the programmatic embodiment of these
+gates: its verifier ladder runs build → typecheck → test → lint as real
+commands and iterates the model until every *required* gate passes. See
+[OVERVIEW.md](OVERVIEW.md#how-uap-deliver-orchestrates).
+Completion gates MUST be verified before claiming done. RECOMMENDED practice is
+to verify at least three points: before changes (baseline), after changes, and
+after fixes.
+---
+## 6. Conformance summary
+A harness + agent pair conforms to the UAP protocol when:
+- [ ] Session-start hook injects `<uap-context>` memory and is fail-open.
+- [ ] Pre-tool-use hook runs worktree, command-safety, and policy gates.
+- [ ] A REQUIRED policy denial aborts the tool call (exit 2).
+- [ ] The agent follows the READ → QUERY → ACT → RECORD → PROMOTE loop.
+- [ ] All edits occur inside `.worktrees/NNN-<slug>/`.
+- [ ] Completion gates pass before any DONE claim.
+Install and audit the full stack with:
+```bash
+uap setup           # init + memory + patterns + hooks + policies
+uap hooks doctor    # audit policy-gate coverage across harnesses
+uap compliance check
+```