@miller-tech/uap 1.40.0 → 1.41.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (150)
  1. package/README.md +109 -642
  2. package/dist/.tsbuildinfo +1 -1
  3. package/dist/cli/deliver-defaults.d.ts +23 -0
  4. package/dist/cli/deliver-defaults.d.ts.map +1 -0
  5. package/dist/cli/deliver-defaults.js +121 -0
  6. package/dist/cli/deliver-defaults.js.map +1 -0
  7. package/dist/cli/init.d.ts.map +1 -1
  8. package/dist/cli/init.js +29 -0
  9. package/dist/cli/init.js.map +1 -1
  10. package/dist/cli/setup.d.ts.map +1 -1
  11. package/dist/cli/setup.js +19 -0
  12. package/dist/cli/setup.js.map +1 -1
  13. package/dist/policies/policy-tools.d.ts +7 -0
  14. package/dist/policies/policy-tools.d.ts.map +1 -1
  15. package/dist/policies/policy-tools.js +24 -2
  16. package/dist/policies/policy-tools.js.map +1 -1
  17. package/docs/INDEX.md +48 -286
  18. package/docs/architecture/OVERVIEW.md +328 -0
  19. package/docs/architecture/PROTOCOL.md +204 -0
  20. package/docs/benchmarks/README.md +17 -192
  21. package/docs/getting-started/CONFIGURATION.md +237 -0
  22. package/docs/getting-started/INSTALLATION.md +125 -0
  23. package/docs/getting-started/QUICKSTART.md +115 -0
  24. package/docs/guides/COORDINATION.md +162 -0
  25. package/docs/guides/DELIVER.md +115 -0
  26. package/docs/guides/DEPLOY_BATCHING.md +212 -0
  27. package/docs/guides/DROIDS_AND_SKILLS.md +202 -0
  28. package/docs/guides/LOCAL_MODELS.md +148 -0
  29. package/docs/guides/MCP_ROUTER.md +195 -0
  30. package/docs/guides/MEMORY.md +235 -0
  31. package/docs/guides/MULTI_MODEL.md +223 -0
  32. package/docs/guides/POLICIES.md +190 -0
  33. package/docs/guides/WORKTREE_WORKFLOW.md +185 -0
  34. package/docs/integrations/MCP_ROUTER.md +147 -0
  35. package/docs/integrations/RTK.md +102 -0
  36. package/docs/reference/API.md +485 -0
  37. package/docs/reference/CLI.md +719 -0
  38. package/docs/reference/CONFIGURATION.md +90 -193
  39. package/docs/reference/DATABASE_SCHEMA.md +110 -344
  40. package/docs/reference/FEATURES.md +176 -472
  41. package/docs/reference/PATTERNS.md +102 -0
  42. package/docs/reference/PLATFORMS.md +83 -0
  43. package/package.json +3 -1
  44. package/src/policies/enforcers/7ebbc721-7540-4e9f-879a-770e0213a09b_architecture_review.py +101 -0
  45. package/src/policies/enforcers/__pycache__/_common.cpython-312.pyc +0 -0
  46. package/src/policies/enforcers/_common.py +100 -0
  47. package/src/policies/enforcers/artifact_hygiene.py +52 -0
  48. package/src/policies/enforcers/cluster_routing.py +63 -0
  49. package/src/policies/enforcers/codebase_read_before_plan.py +52 -0
  50. package/src/policies/enforcers/coord_overlap.py +81 -0
  51. package/src/policies/enforcers/delivery_enforcement.py +97 -0
  52. package/src/policies/enforcers/doc_live_over_report.py +50 -0
  53. package/src/policies/enforcers/expert_review_required.py +135 -0
  54. package/src/policies/enforcers/iac_parity.py +53 -0
  55. package/src/policies/enforcers/mcp_router_first.py +37 -0
  56. package/src/policies/enforcers/memory_before_plan.py +61 -0
  57. package/src/policies/enforcers/parallel_reads.py +50 -0
  58. package/src/policies/enforcers/rtk_wrap.py +44 -0
  59. package/src/policies/enforcers/schema_diff_gate.py +80 -0
  60. package/src/policies/enforcers/session_memory_write.py +52 -0
  61. package/src/policies/enforcers/task_required.py +131 -0
  62. package/src/policies/enforcers/test_gate.py +58 -0
  63. package/src/policies/enforcers/validate_plan_before_build.py +75 -0
  64. package/src/policies/enforcers/worktree_required.py +57 -0
  65. package/src/policies/schemas/policies/architecture-review.md +51 -0
  66. package/src/policies/schemas/policies/artifact-hygiene.md +29 -0
  67. package/src/policies/schemas/policies/cluster-routing.md +31 -0
  68. package/src/policies/schemas/policies/codebase-read-before-plan.md +30 -0
  69. package/src/policies/schemas/policies/coord-overlap.md +24 -0
  70. package/src/policies/schemas/policies/delivery-enforcement.md +45 -0
  71. package/src/policies/schemas/policies/doc-live-over-report.md +32 -0
  72. package/src/policies/schemas/policies/expert-review-required.md +60 -0
  73. package/src/policies/schemas/policies/iac-parity.md +31 -0
  74. package/src/policies/schemas/policies/mandatory-testing-deployment.md +147 -0
  75. package/src/policies/schemas/policies/mcp-router-first.md +24 -0
  76. package/src/policies/schemas/policies/memory-before-plan.md +24 -0
  77. package/src/policies/schemas/policies/merge-deploy-monitor-verify.md +145 -0
  78. package/src/policies/schemas/policies/parallel-reads.md +24 -0
  79. package/src/policies/schemas/policies/rtk-wrap.md +26 -0
  80. package/src/policies/schemas/policies/schema-diff-gate.md +30 -0
  81. package/src/policies/schemas/policies/session-memory-write.md +24 -0
  82. package/src/policies/schemas/policies/task-required.md +49 -0
  83. package/src/policies/schemas/policies/test-gate.md +24 -0
  84. package/src/policies/schemas/policies/validate-plan-before-build.md +28 -0
  85. package/src/policies/schemas/policies/worktree-required.md +28 -0
  86. package/templates/hooks/uap-policy-gate.sh +5 -0
  87. package/docs/AGENTS.md +0 -423
  88. package/docs/DOCUMENTATION_AUDIT_REPORT.md +0 -131
  89. package/docs/GETTING_STARTED.md +0 -288
  90. package/docs/PROJECT_ANALYSIS_REPORT.md +0 -510
  91. package/docs/architecture/COMPLETE_ARCHITECTURE.md +0 -748
  92. package/docs/architecture/EXPERT_STACK.md +0 -137
  93. package/docs/architecture/MULTI_MODEL.md +0 -224
  94. package/docs/architecture/PLATFORM_GATING.md +0 -68
  95. package/docs/architecture/SYSTEM_ANALYSIS.md +0 -334
  96. package/docs/architecture/UAP_COMPLIANCE.md +0 -217
  97. package/docs/architecture/UAP_PROTOCOL.md +0 -339
  98. package/docs/architecture/UAP_STRICT_DROIDS.md +0 -172
  99. package/docs/archive/BALLS_MODE_SELF_ANALYSIS.md +0 -260
  100. package/docs/archive/BENCHMARK_GAPS_AND_PLAN.md +0 -146
  101. package/docs/archive/FAILING_TASKS_SOLUTION_PLAN.md +0 -668
  102. package/docs/archive/JINJA2-SYSTEM-MESSAGE-FIX.md +0 -209
  103. package/docs/archive/MODEL_ROUTING_IMPLEMENTATION_SUMMARY.md +0 -281
  104. package/docs/archive/MODEL_ROUTING_OPTIMIZATION_PLAN.md +0 -320
  105. package/docs/archive/NPM-PUBLISH-V0.9.1.md +0 -240
  106. package/docs/archive/OPTIMIZATION_OPTIONS.md +0 -334
  107. package/docs/archive/PARALLELISM_GAPS_AND_OPTIONS.md +0 -422
  108. package/docs/archive/POLICY_GATE_IMPLEMENTATION.md +0 -245
  109. package/docs/archive/SETUP_IMPROVEMENTS.md +0 -213
  110. package/docs/archive/UAP_GENERIC_OPTIMIZATION_PLAN.md +0 -270
  111. package/docs/archive/UAP_OPTIMIZATION_PLAN.md +0 -701
  112. package/docs/archive/UAP_V103_PATTERN_DESIGN.md +0 -315
  113. package/docs/archive/UAP_V104_COMPLIANCE_DESIGN.md +0 -223
  114. package/docs/archive/changelog/2026-03-10_uap-100-compliance.md +0 -77
  115. package/docs/archive/changelog/2026-03-10_uap-full-system-verification.md +0 -109
  116. package/docs/archive/opencode-integration-guide.md +0 -740
  117. package/docs/archive/opencode-integration-quickref.md +0 -180
  118. package/docs/benchmarks/OVERNIGHT_RUNNER.md +0 -341
  119. package/docs/benchmarks/SPECULATIVE_DECODING_JOURNEY_2026-03.md +0 -221
  120. package/docs/benchmarks/VALIDATION_PLAN.md +0 -568
  121. package/docs/blog/SPECULATIVE_DECODING_PRODUCTION_PLAYBOOK.md +0 -139
  122. package/docs/blog/local-coding-agents.md +0 -266
  123. package/docs/blog/x-thread.md +0 -254
  124. package/docs/deployment/DEPLOYMENT.md +0 -895
  125. package/docs/deployment/DEPLOYMENT_STRATEGIES.md +0 -518
  126. package/docs/deployment/DEPLOY_BATCHER_ANALYSIS.md +0 -224
  127. package/docs/deployment/DEPLOY_BATCHING.md +0 -273
  128. package/docs/deployment/DEPLOY_BUCKETING_ANALYSIS.md +0 -420
  129. package/docs/deployment/QWEN35_LLAMA_CPP.md +0 -426
  130. package/docs/deployment/UAP_LLAMA_ANTHROPIC_PROXY_BOOTSTRAP.md +0 -279
  131. package/docs/getting-started/INTEGRATION.md +0 -628
  132. package/docs/getting-started/OVERVIEW.md +0 -324
  133. package/docs/getting-started/SETUP.md +0 -377
  134. package/docs/integrations/MCP_ROUTER_SETUP.md +0 -445
  135. package/docs/integrations/RTK_INTEGRATION.md +0 -468
  136. package/docs/operations/TROUBLESHOOTING.md +0 -660
  137. package/docs/pr/PR_SPECULATIVE_DOCS_TEMPLATE.md +0 -146
  138. package/docs/pr/UPSTREAM_PRS.md +0 -424
  139. package/docs/reference/API_REFERENCE.md +0 -903
  140. package/docs/reference/EXPERT_DROIDS.md +0 -219
  141. package/docs/reference/HARNESS-MATRIX.md +0 -318
  142. package/docs/reference/PATTERN_LIBRARY.md +0 -636
  143. package/docs/reference/UAP_CLI_REFERENCE.md +0 -620
  144. package/docs/research/BEHAVIORAL_PATTERNS.md +0 -228
  145. package/docs/research/DOMAIN_STRATEGIES.md +0 -316
  146. package/docs/research/MEMORY_SYSTEMS_COMPARISON.md +0 -812
  147. package/docs/research/PATTERN_ANALYSIS_2026-01-18.md +0 -436
  148. package/docs/research/PERFORMANCE_ANALYSIS_2026-01-18.md +0 -209
  149. package/docs/research/PERFORMANCE_TEST_PLAN.md +0 -383
  150. package/docs/research/TERMINAL_BENCH_LEARNINGS.md +0 -217
@@ -0,0 +1,328 @@
+ # UAP Architecture Overview
+
+ `v1.40.0` · 168 TypeScript modules across 18 `src/` subsystems · 117 test suites
+
+ The Universal Agent Protocol (UAP) is a layer that sits **underneath** an AI
+ coding agent's harness — Claude Code, Factory, Cursor, OpenCode, Codex, and
+ others. It does not replace the model or the harness. Instead it installs
+ **hooks** that intercept the harness's tool calls, then mediates each call
+ through three services — memory injection, policy enforcement, and tool-output
+ compression — before handing control back. On top of that mediation layer it
+ ships a rich CLI for memory, delivery, worktrees, tasks, deployment, and
+ multi-model routing.
+
+ This document describes the system architecture. For the normative
+ harness↔UAP contract, see [PROTOCOL.md](PROTOCOL.md).
+
+ ---
+
+ ## The hook-mediation model
+
+ A bare agent harness calls a tool (Edit, Write, Bash, a spawned sub-agent, an
+ MCP tool) and the model sees the raw result. UAP inserts itself between the
+ harness and the tool by registering hooks at the harness's interception points:
+
+ - **Claude Code / VSCode / Factory / Cursor** — `PreToolUse` hooks
+ - **OpenCode** — the `tool.execute.before` plugin hook
+ - **Codex** — gating via the UAP MCP server (`execute_tool`)
+ - **Hermes** — the `pre_tool_call` event
+
+ The same logical lifecycle runs on every harness:
+
+ ```
+            ┌──────────────────────────────────────┐
+            │            AGENT HARNESS             │
+            │  (Claude Code / Factory / OpenCode)  │
+            └───────────────┬──────────────────────┘
+                            │ tool call
+                            ▼
+            ┌────────────────────── UAP HOOK LAYER ──────────────────────┐
+            │                                                            │
+ session    │   SessionStart hook                                        │
+ start ─────┼─▶ • inject last-24h memory (<uap-context> … )              │
+            │   • clean stale agents / work claims                       │
+            │                                                            │
+ per tool   │   PreToolUse / tool.execute.before hook                    │
+ call ──────┼─▶ ┌───────────┐   ┌────────────┐   ┌────────────────────┐  │
+            │   │  MEMORY   │──▶│   POLICY   │──▶│     MCP ROUTER     │  │
+            │   │ injection │   │   gates    │   │ token compression  │  │
+            │   └───────────┘   └─────┬──────┘   └─────────┬──────────┘  │
+            │                         │ deny (exit 2)      │             │
+            │                         ▼                    ▼             │
+            │                    BLOCK call        compressed result     │
+            └──────────────────────────────────────────────┼─────────────┘
+                                              │ allow      │
+                                              ▼            ▼
+                                     ┌──────────────────────────────┐
+                                     │       THE ACTUAL TOOL        │
+                                     │  (fs / shell / MCP server)   │
+                                     └──────────────────────────────┘
+ ```
+
+ Hooks are **fail-open for context** (memory injection never blocks) and
+ **fail-closed for safety** (a required policy violation returns exit code 2 and
+ the harness aborts the call). Hook scripts are generated and installed by
+ `uap hooks install` (`src/cli/hooks.ts`) from templates in `templates/hooks/`.
+
+ ---
+
+ ## Component map
+
+ ```
+ src/
+ ├── memory/         4-tier memory: working, session, semantic (Qdrant), graph
+ ├── mcp-router/     hierarchical MCP router — tool hiding + FTS5 output compression
+ ├── policies/       hook-based policy gates + 20 Python enforcers
+ ├── delivery/       uap deliver convergence loop (15 modules)
+ ├── coordination/   multi-agent registry, overlap detection, deploy batching
+ ├── models/         multi-model routing, planning, execution profiles
+ ├── tasks/          dependency-aware task tracker (SQLite, DAG)
+ ├── dashboard/      live task / agent / memory / policy visualization
+ ├── observability/  HALO / OpenInference span export for harness analysis
+ ├── analyzers/      project structure analysis + metadata generation
+ ├── generators/     CLAUDE.md / config generation
+ ├── benchmarks/     Terminal-Bench harness + scoring
+ ├── browser/        cloaked browser automation for agents
+ ├── telemetry/      run telemetry
+ ├── bin/            CLI entry (cli.ts), policy bin, llama-server-optimize
+ ├── cli/            ~35 command modules wired into bin/cli.ts
+ ├── types/          shared types
+ └── utils/          logging and shared helpers
+ ```
+
+ ---
+
+ ## Subsystems
+
+ ### Memory (`src/memory/`)
+
+ A four-tier memory system that gives the agent persistent context across
+ sessions. The tiers (`src/memory/README.md`):
+
+ | Tier | Backend | Purpose |
+ |------|---------|---------|
+ | **L1 Working** | SQLite `memories` table (~50 cap, FTS5) | recent actions |
+ | **L2 Session** | SQLite `session_memories` table (FTS5) | current-session context, "open loops" |
+ | **L3 Semantic** | Qdrant, 768-dim vectors | long-term learnings, semantic recall |
+ | **L4 Knowledge** | SQLite entity/relationship graph | entity relationships, N-hop traversal |
+
+ Embeddings (`src/memory/embeddings.ts`) use **`nomic-embed-text-v2-moe`
+ (768-dim)** via a llama-server `--embeddings` endpoint, with fallbacks down a
+ chain (Ollama `nomic-embed-text` → OpenAI `text-embedding-3-small` → local
+ `all-MiniLM-L6-v2` → TF-IDF). The provider is pluggable and cached
+ (SHA-256-keyed LRU).
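To make the fallback chain and cache concrete, here is a minimal sketch. It assumes a synchronous `embed(text)` provider interface and, for brevity, keys the LRU by the raw text, whereas the text above says the real cache is SHA-256-keyed. `EmbeddingProvider` and `FallbackEmbedder` are hypothetical names, not the `embeddings.ts` API.

```typescript
// Hypothetical provider shape: returns a vector, or throws when unavailable.
interface EmbeddingProvider {
  name: string;
  embed(text: string): number[];
}

interface CachedEmbedding {
  provider: string;
  vector: number[];
}

// Tries providers in priority order and memoizes results in a small LRU.
// Assumption: keyed by raw text here; the real cache keys by SHA-256 digest.
class FallbackEmbedder {
  private readonly cache = new Map<string, CachedEmbedding>();
  private readonly providers: EmbeddingProvider[];
  private readonly capacity: number;

  constructor(providers: EmbeddingProvider[], capacity = 1000) {
    this.providers = providers;
    this.capacity = capacity;
  }

  embed(text: string): CachedEmbedding {
    const hit = this.cache.get(text);
    if (hit) {
      // Map keeps insertion order, so re-inserting marks the entry most recent.
      this.cache.delete(text);
      this.remember(text, hit);
      return hit;
    }
    for (const provider of this.providers) {
      try {
        const result: CachedEmbedding = { provider: provider.name, vector: provider.embed(text) };
        this.remember(text, result);
        return result;
      } catch {
        // Provider unavailable: fall through to the next one in the chain.
      }
    }
    throw new Error("no embedding provider available");
  }

  private remember(key: string, value: CachedEmbedding): void {
    this.cache.set(key, value);
    if (this.cache.size > this.capacity) {
      // Evict the least recently used entry: the first key in insertion order.
      const oldest = this.cache.keys().next().value;
      if (oldest !== undefined) this.cache.delete(oldest);
    }
  }
}
```

A JavaScript `Map` iterates in insertion order, so deleting and re-inserting a key on every hit is enough to track recency without a separate linked list.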
+
+ Key modules:
+
+ - `hierarchical-memory.ts` — in-memory hot/warm/cold tier manager with
+   auto promote/demote, time-decay importance, and token-budget enforcement,
+   persisted to its own SQLite DB.
+ - `dynamic-retrieval.ts` — the per-task orchestrator: classifies the task,
+   sets adaptive retrieval depth + token budget, queries all sources, dedups,
+   compresses, and formats the final context block.
+ - `memory-consolidator.ts` — summarizes working entries into session memory,
+   extracts lessons, and dedups by content hash + embedding similarity.
+ - `write-gate.ts` — a quality filter that scores candidate memories and only
+   persists those above threshold (prevents memory pollution).
+ - `knowledge-graph.ts` — the L4 graph: upsert entities, strengthen
+   relationships, recursive-CTE traversal.
+ - `context-compressor.ts` / `semantic-compression.ts` — token budgeting and
+   distillation of context into atomic typed facts.
+ - `predictive-memory.ts` / `speculative-cache.ts` — prefetch likely-needed
+   memories before they are queried.
+ - `task-classifier.ts` — classifies an instruction to drive retrieval hints.
+ - `model-router.ts` — benchmark-fingerprint LLM routing with feedback learning
+   (consumed by `src/models/unified-router.ts`).
+
+ ### MCP Router (`src/mcp-router/`)
+
+ A hierarchical Model Context Protocol router that achieves large token savings
+ by two independent mechanisms (`src/mcp-router/server.ts`,
+ `output-compressor.ts`):
+
+ 1. **Tool hiding.** Instead of exposing 150+ downstream MCP tool schemas
+    (~500 tokens each) to the model, the router exposes just three meta-tools —
+    `discover_tools`, `execute_tool`, `deliver`. The model issues a
+    natural-language `discover_tools` query, gets back matching tool paths, then
+    calls `execute_tool({path, args})`. Downstream tools live in an in-memory
+    fuzzy search index and are never surfaced as definitions.
+ 2. **Output compression (FTS5).** `execute_tool` accepts an `intent`. Large
+    tool output is chunked, indexed into an in-memory SQLite **FTS5** virtual
+    table, queried with the intent using **BM25 ranking**, and only the top
+    matching snippets (plus a searchable-vocabulary footer) are returned. Small
+    outputs pass through unchanged; very large outputs without an intent are
+    head+tail truncated.
+
+ The design target documented in source is ~75,000 tokens of tool definitions
+ collapsed to ~700 (98%+). Per-output FTS5 savings are computed live per call.
+ See [../integrations/MCP_ROUTER.md](../integrations/MCP_ROUTER.md) for setup.
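The intent-driven compression in mechanism 2 can be approximated without SQLite. This sketch hand-rolls Okapi BM25 over fixed-size line chunks as a stand-in for the FTS5 virtual table. The 20-line chunk size, the 2,000-character pass-through threshold, the `k1`/`b` defaults, and all function names are illustrative assumptions; the vocabulary footer and the head+tail fallback are omitted.

```typescript
// Split raw tool output into chunks of `linesPerChunk` lines each.
function chunkLines(output: string, linesPerChunk: number): string[] {
  const lines = output.split("\n");
  const chunks: string[] = [];
  for (let i = 0; i < lines.length; i += linesPerChunk) {
    chunks.push(lines.slice(i, i + linesPerChunk).join("\n"));
  }
  return chunks;
}

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
}

// Okapi BM25 score of every chunk against the query (the call's intent).
function bm25Scores(chunks: string[], query: string, k1 = 1.2, b = 0.75): number[] {
  const docs = chunks.map(tokenize);
  const n = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / Math.max(n, 1);
  const terms = Array.from(new Set(tokenize(query)));
  // Document frequency: how many chunks contain each query term.
  const df = new Map<string, number>(
    terms.map((t): [string, number] => [t, docs.filter((d) => d.includes(t)).length]),
  );
  return docs.map((doc) => {
    let score = 0;
    for (const term of terms) {
      const tf = doc.filter((w) => w === term).length;
      if (tf === 0) continue;
      const dfT = df.get(term) ?? 0;
      const idf = Math.log(1 + (n - dfT + 0.5) / (dfT + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}

// Small outputs pass through; large ones shrink to the best-matching chunks.
function compressOutput(output: string, intent: string, maxChars = 2000, topK = 3): string {
  if (output.length <= maxChars) return output;
  const chunks = chunkLines(output, 20);
  const scores = bm25Scores(chunks, intent);
  return chunks
    .map((text, i) => ({ text, score: scores[i] ?? 0 }))
    .filter((c) => c.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((c) => c.text)
    .join("\n---\n");
}
```

FTS5 ships its own `bm25()` ranking function, so a real implementation would let SQLite do the scoring; the hand-written version only shows what that ranking rewards.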
+
+ ### Policies (`src/policies/`)
+
+ Project guidelines expressed as **executable hook gates** rather than prose.
+ Two layers:
+
+ - **TypeScript middleware** — `policy-gate.ts` (`PolicyGate.executeWithGates`)
+   is the in-process gate used by the MCP server's `execute_tool`. It loads
+   REQUIRED policies from a SQLite store (`policies.db`), evaluates keyword /
+   anti-pattern rules against the operation, and throws `PolicyViolationError`
+   on a REQUIRED violation. Stages: `pre-exec | post-exec | review | always`;
+   completion/merge/deploy operations auto-force a `review` stage.
+ - **Shell gate + Python enforcers** — `templates/hooks/uap-policy-gate.sh`
+   binds to harness hook events and invokes the ~20 enforcers in
+   `src/policies/enforcers/`. A blocked verdict is **exit code 2** (hard block).
+
+ Enforcers cover the worktree gate (`worktree_required.py`), task discipline
+ (`task_required.py`), delivery routing (`delivery_enforcement.py`), test deltas
+ (`test_gate.py`), expert review (`expert_review_required.py`), schema diffs
+ (`schema_diff_gate.py`), memory-before-plan, MCP-router-first, RTK wrapping, and
+ more. Levels: **REQUIRED** blocks, **RECOMMENDED** logs, **OPTIONAL** informs.
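A rough model of the in-process gate described in the first bullet: stage filtering, the forced `review` stage for completion, merge, and deploy operations, and the three levels. The `Policy` shape, the substring-based anti-pattern matching, and `evaluateGates` are assumptions for illustration; only the stage names, the level semantics, and the `PolicyViolationError` name come from the text.

```typescript
type Level = "REQUIRED" | "RECOMMENDED" | "OPTIONAL";
type Stage = "pre-exec" | "post-exec" | "review" | "always";

interface Policy {
  name: string;
  level: Level;
  stage: Stage;
  antiPatterns: string[]; // hypothetical rule format: forbidden substrings
}

class PolicyViolationError extends Error {}

// Assumption: operation names containing these words force the review stage.
const REVIEW_OPERATIONS = /complet|merge|deploy/i;

// Returns warn/info notes; throws on the first REQUIRED violation.
function evaluateGates(policies: Policy[], operation: string, args: string, stage: Stage): string[] {
  const active = new Set<Stage>([stage, "always"]);
  if (REVIEW_OPERATIONS.test(operation)) active.add("review");
  const haystack = `${operation} ${args}`.toLowerCase();
  const notes: string[] = [];
  for (const policy of policies) {
    if (!active.has(policy.stage)) continue;
    const hit = policy.antiPatterns.find((p) => haystack.includes(p.toLowerCase()));
    if (hit === undefined) continue;
    const detail = `${policy.name}: matched "${hit}"`;
    // REQUIRED blocks the operation outright.
    if (policy.level === "REQUIRED") throw new PolicyViolationError(detail);
    // RECOMMENDED logs a warning; OPTIONAL is informational only.
    notes.push(policy.level === "RECOMMENDED" ? `warn ${detail}` : `info ${detail}`);
  }
  return notes;
}
```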
+
+ ### Delivery — `uap deliver` (`src/delivery/`)
+
+ A 15-module convergence loop that drives an underlying model against the
+ project's **real** completion gates until the work actually passes — the
+ mechanism behind UAP's "agents stop declaring victory on broken code." See the
+ [deliver flow](#how-uap-deliver-orchestrates) below.
+
+ ### Coordination (`src/coordination/`)
+
+ Lets multiple agents work in the same repo without colliding. A singleton SQLite
+ DB (`database.ts`) backs an agent registry, work announcements, work claims,
+ inter-agent messages, and a deploy queue. `service.ts` detects **overlap** when
+ agents announce work on the same files and suggests merge order;
+ `deploy-batcher.ts` queues git/CI actions with per-type batch windows
+ (commit 30s, push 5s, merge 10s, deploy 60s), folds/squashes similar pending
+ actions, and executes batches sequentially or in parallel.
+ `expert-orchestrator.ts` builds an ordered expert-droid chain across the
+ `plan → design → implement → review → release` lifecycle, drawing implement-stage
+ droids from `capability-router.ts`. `pattern-router.ts` matches tasks to
+ Terminal-Bench patterns (always enforcing **P12** Output Existence and
+ **P35** Decoder-First).
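The batching behavior reads naturally as a keyed queue. In this sketch the per-type windows are the ones listed above, while the folding rule (same action type and target collapse into one batch), the `DeployBatcher` API, and the explicit `now` parameter (used so the example is deterministic) are assumptions rather than `deploy-batcher.ts` behavior.

```typescript
type ActionType = "commit" | "push" | "merge" | "deploy";

// Per-type batch windows taken from the description above, in milliseconds.
const WINDOW_MS: Record<ActionType, number> = {
  commit: 30_000,
  push: 5_000,
  merge: 10_000,
  deploy: 60_000,
};

interface PendingBatch {
  type: ActionType;
  target: string; // hypothetical grouping key, e.g. a branch or environment
  payloads: string[]; // folded payloads, e.g. commit messages
  openedAt: number; // ms timestamp of the first queued action
}

class DeployBatcher {
  private readonly pending = new Map<string, PendingBatch>();

  enqueue(type: ActionType, target: string, payload: string, now: number): void {
    const key = `${type}:${target}`;
    const batch = this.pending.get(key);
    if (batch) {
      batch.payloads.push(payload); // fold into the similar pending action
    } else {
      this.pending.set(key, { type, target, payloads: [payload], openedAt: now });
    }
  }

  // Remove and return every batch whose window has elapsed, ready to execute.
  flushDue(now: number): PendingBatch[] {
    const due: PendingBatch[] = [];
    this.pending.forEach((batch, key) => {
      if (now - batch.openedAt >= WINDOW_MS[batch.type]) {
        due.push(batch);
        this.pending.delete(key); // deleting the current entry during forEach is safe
      }
    });
    return due;
  }
}
```

Whether the flushed batches then run sequentially or in parallel is left to the caller in this sketch.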
+
+ ### Models (`src/models/`)
+
+ Multi-model routing. `router.ts` classifies a task (complexity + type from
+ keyword scoring) and `selectModel()` picks a model per the routing strategy —
+ `performance-first`, `cost-optimized`, `adaptive`, or `balanced` (default,
+ which walks priority-ordered routing rules). `unified-router.ts` layers a
+ benchmark signal on top: it returns a consensus when the rule-based and
+ benchmark routers agree, otherwise trusts the benchmark router only when it has
+ enough data. `planner.ts` decomposes a task into a subtask DAG and assigns a
+ model per subtask; `executor.ts` runs the plan level-by-level with retries and
+ fallback; `execution-profiles.ts` tunes *how* the chosen model runs
+ (temperature, budgets); `analytics.ts` records outcomes so routing improves.
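A compact illustration of keyword-scored classification, strategy-based selection, and the consensus rule. The keyword lists, the model fields, the tiered stand-in used for `balanced`/`adaptive` (the real `balanced` strategy walks priority-ordered rules), the 20-sample threshold, and the fallback to the rule-based choice are all invented for the example.

```typescript
type Complexity = "low" | "medium" | "high";
type Strategy = "performance-first" | "cost-optimized" | "adaptive" | "balanced";

interface ModelInfo {
  id: string;
  quality: number; // higher is stronger (hypothetical benchmark score)
  costPerMTok: number; // hypothetical price per million tokens
}

// Hypothetical keyword lists: hard words raise the score, easy words lower it.
const HARD_WORDS = ["architecture", "refactor", "concurrency", "security", "migrate"];
const EASY_WORDS = ["typo", "rename", "comment", "format"];

function classify(task: string): Complexity {
  const text = task.toLowerCase();
  const count = (words: string[]) => words.filter((w) => text.includes(w)).length;
  const score = count(HARD_WORDS) - count(EASY_WORDS);
  return score >= 2 ? "high" : score === 1 ? "medium" : "low";
}

function pick(models: ModelInfo[], index: number): ModelInfo {
  const model = models[index];
  if (!model) throw new Error("no models configured");
  return model;
}

function selectModel(task: string, models: ModelInfo[], strategy: Strategy = "balanced"): ModelInfo {
  const strongest = [...models].sort((a, b) => b.quality - a.quality);
  const cheapest = [...models].sort((a, b) => a.costPerMTok - b.costPerMTok);
  if (strategy === "performance-first") return pick(strongest, 0);
  if (strategy === "cost-optimized") return pick(cheapest, 0);
  // Stand-in for balanced/adaptive: spend more only as complexity rises.
  const complexity = classify(task);
  if (complexity === "high") return pick(strongest, 0);
  if (complexity === "medium") return pick(strongest, Math.floor(strongest.length / 2));
  return pick(cheapest, 0);
}

// Consensus rule: agreement wins; on disagreement, trust the benchmark router
// only with enough samples (falling back to the rule choice is an assumption).
function unifiedPick(ruleChoice: string, benchChoice: string, benchSamples: number, minSamples = 20): string {
  if (ruleChoice === benchChoice) return ruleChoice;
  return benchSamples >= minSamples ? benchChoice : ruleChoice;
}
```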
+
+ ### Tasks (`src/tasks/`)
+
+ A dependency-aware task tracker (a Beads alternative) backed by SQLite
+ (`tasks`, `task_dependencies`, `task_history`, `task_activity`,
+ `task_summaries`). Tasks form a DAG; closing a task transitions its dependents
+ from `blocked` to `open` and emits events on an in-process bus (`event-bus.ts`).
+ `coordination.ts` bridges tasks to the multi-agent coordination layer (claim /
+ release with overlap detection); `decoder-gate.ts` implements the P35
+ Decoder-First pre-execution validator (droid schema, tool availability,
+ claim conflicts, worktree requirement, ambiguity).
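Closing a task and unblocking its dependents is a small graph update. This in-memory sketch assumes a dependent reopens only when all of its blockers are closed, and it uses a plain callback in place of `event-bus.ts`. `TaskGraph` is a hypothetical name, and the SQLite tables are not modeled.

```typescript
type TaskStatus = "open" | "blocked" | "in_progress" | "closed";
type TaskEvent = (event: "closed" | "unblocked", taskId: string) => void;

// Hypothetical in-memory stand-in for the SQLite-backed tracker.
class TaskGraph {
  private readonly status = new Map<string, TaskStatus>();
  private readonly dependsOn = new Map<string, Set<string>>();
  private readonly emit: TaskEvent;

  constructor(emit: TaskEvent = () => {}) {
    this.emit = emit;
  }

  add(id: string, deps: string[] = []): void {
    this.dependsOn.set(id, new Set(deps));
    const blocked = deps.some((d) => this.status.get(d) !== "closed");
    this.status.set(id, blocked ? "blocked" : "open");
  }

  close(id: string): void {
    this.status.set(id, "closed");
    this.emit("closed", id);
    this.dependsOn.forEach((deps, task) => {
      if (!deps.has(id) || this.status.get(task) !== "blocked") return;
      // Assumption: a dependent reopens only once every blocker is closed.
      const ready = Array.from(deps).every((d) => this.status.get(d) === "closed");
      if (ready) {
        this.status.set(task, "open");
        this.emit("unblocked", task);
      }
    });
  }

  get(id: string): TaskStatus | undefined {
    return this.status.get(id);
  }
}
```

Cycle detection, which a DAG store would need on insert, is left out to keep the example short.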
+
+ ### Dashboard (`src/dashboard/`)
+
+ A live visualization layer (`uap dashboard`) over tasks, agents, memory,
+ policies, models, and benchmark/session history, with an event stream for
+ real-time updates.
+
+ ### Observability (`src/observability/`)
+
+ Emits HALO / OpenInference spans for delivery runs and tool calls, consumed by
+ `uap harness analyze` to optimize agent execution from real traces.
+
+ ---
+
+ ## How a tool call flows: memory → policy → MCP Router
+
+ For a representative `execute_tool` call routed through the UAP MCP server:
+
+ ```
+ 1. Tool call arrives at the PreToolUse hook.
+
+ 2. MEMORY   — On session start, recent memory is injected as <uap-context>.
+               Per task, dynamic-retrieval has already surfaced relevant
+               long-term learnings into the prompt. (Read/Query stage.)
+
+ 3. POLICY   — PolicyGate.executeWithGates loads REQUIRED policies and checks
+               the operation + args. A REQUIRED violation → PolicyViolationError
+               (or exit 2 from the shell enforcer) → the harness ABORTS the call.
+               Otherwise the call proceeds.
+
+ 4. ROUTER   — execute_tool resolves the tool path, dispatches to the downstream
+               MCP client (or an expert droid), and captures the raw result.
+
+ 5. COMPRESS — output-compressor indexes large output into FTS5 and returns only
+               the top BM25 snippets for the call's `intent`, plus a vocabulary
+               footer. The model sees a compact result, not the full payload.
+
+ 6. RECORD   — The agent records the observation to short-term memory; the
+               consolidator later promotes significant lessons to long-term
+               memory through the write-gate. (Record/Promote stage.)
+ ```
+
+ The five-stage **read → query → act → record → promote** loop is the agent
+ decision loop defined normatively in [PROTOCOL.md](PROTOCOL.md#3-agent-decision-loop).
+
+ ---
+
+ ## How `uap deliver` orchestrates
+
+ `uap deliver "<instruction>"` runs `ConvergenceLoop.deliver()`
+ (`src/delivery/convergence-loop.ts`). The staged flow:
+
+ ```
+ detect gates ──▶ baseline check ──▶ protect files ─────▶ ╔═════ turn loop ═════╗
+ (verifier-       (already green?    (snapshot tests/     ║ build prompt        ║
+ ladder reads     skip, no model     oracle + integrity   ║ EXECUTE model       ║
+ package.json)    call)              guard)               ║ APPLY file blocks   ║
+                                                          ║ VERIFY (ladder)     ║
+                                                          ║ passed? ─▶ done     ║
+                                                          ║ else: CRITIC +      ║
+                                                          ║       ESCALATE      ║
+                                                          ╚═════════╤═══════════╝
+                                                                    │ repeat until
+                                                                    ▼ gates pass or
+                                                                      budget spent
+ ```
+
+ - **Verifier ladder** (`verifier-ladder.ts`) — derives gate rungs from
+   `package.json` scripts (build → typecheck via `tsc --noEmit` → test → lint),
+   runs each as a real command in a secret-stripped env, fails fast on required
+   rungs. "Delivered" means all *required* rungs pass; lint is optional.
+ - **Explorer + ideation** (`explorer.ts`, `ideation.ts`) — best-of-N: generate
+   N candidates with distinct strategy seeds, apply/verify/rollback each on the
+   same baseline, commit only the winner. A model **judge** (`judge.ts`)
+   tie-breaks candidates with equal gate scores.
+ - **Critic** (`critic.ts`) — turns a failed turn's gate output into a
+   file-scoped repair plan fed into the next turn (not a raw compiler dump).
+ - **Escalation** (`escalation.ts`) — on score stagnation, climbs a cost ladder:
+   widen exploration → enable critic → switch to a stronger model + raise budget.
+ - **Protection** (`spec-imports.ts`, `integrity.ts`, `applier.ts`) — protects
+   pre-existing test/spec files and their transitive oracle imports, and
+   byte-verifies protected files after each gate run, so the model can't satisfy
+   a spec by rewriting what it asserts against.
+ - **Auto / optimize** (`auto-optimizer.ts`) — by default, the run classifies
+   task complexity and enables the matching aids automatically; `--optimize`
+   enables every aid at once.
+ - **Coordination + observability** (`run-coordinator.ts`, `halo-trace.ts`) —
+   optionally registers the run as a coordination agent, heartbeats, queues
+   applied files into the deploy batcher, and emits HALO spans.
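One way to express the ladder's fail-fast rule, with the command runner injected so nothing is actually executed. The script-to-rung mapping, the `hasTsconfig` flag, and the result shape are assumptions drawn from the first bullet (build, `tsc --noEmit`, test, optional lint). This is not the `verifier-ladder.ts` code, and unlike the real ladder it does not strip secrets from the environment.

```typescript
interface Rung {
  name: string;
  command: string;
  required: boolean;
}

// Injected runner so the sketch needs no shell.
type RunCommand = (command: string) => { exitCode: number; output: string };

// Assumed mapping from package.json scripts to rungs, in ladder order.
function deriveRungs(scripts: Record<string, string>, hasTsconfig: boolean): Rung[] {
  const rungs: Rung[] = [];
  if (scripts["build"]) rungs.push({ name: "build", command: "npm run build", required: true });
  if (hasTsconfig) rungs.push({ name: "typecheck", command: "npx tsc --noEmit", required: true });
  if (scripts["test"]) rungs.push({ name: "test", command: "npm test", required: true });
  if (scripts["lint"]) rungs.push({ name: "lint", command: "npm run lint", required: false });
  return rungs;
}

interface LadderResult {
  delivered: boolean;
  failedRung: string | null;
  passed: string[];
}

function runLadder(rungs: Rung[], run: RunCommand): LadderResult {
  const passed: string[] = [];
  for (const rung of rungs) {
    if (run(rung.command).exitCode === 0) {
      passed.push(rung.name);
    } else if (rung.required) {
      // Fail fast: later rungs are pointless once a required rung is red.
      return { delivered: false, failedRung: rung.name, passed };
    }
    // An optional rung (lint) may fail without blocking delivery.
  }
  return { delivered: true, failedRung: null, passed };
}
```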
+
+ The default model preset is `qwen35-a3b`; `--until-delivered` (on by default)
+ keeps extending the turn budget while the best score is improving, up to a
+ ceiling (default 30, hard cap 50), and stops once progress stalls.
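The `--until-delivered` budget rule can be modeled as a loop. The ceiling default of 30 and the hard cap of 50 come from the paragraph above. The initial budget of 5, the one-extra-turn-per-improvement rule, and the `patience`-based stall test are assumptions, so treat this as an illustration rather than `ConvergenceLoop` logic.

```typescript
interface TurnResult {
  score: number; // e.g. fraction of gate rungs passing
  delivered: boolean; // all required gates green
}

const HARD_CAP = 50; // from the text above

// Hypothetical model of --until-delivered: each improving turn buys one more
// turn (never past the ceiling); `patience` flat turns in a row mean a stall.
function runUntilDelivered(
  runTurn: (turn: number) => TurnResult,
  initialBudget = 5,
  ceiling = 30,
  patience = 3,
): { turns: number; delivered: boolean; bestScore: number } {
  const maxTurns = Math.min(ceiling, HARD_CAP);
  let budget = Math.min(initialBudget, maxTurns);
  let best = Number.NEGATIVE_INFINITY;
  let sinceImprovement = 0;
  let turn = 0;
  while (turn < budget) {
    turn += 1;
    const result = runTurn(turn);
    if (result.delivered) return { turns: turn, delivered: true, bestScore: result.score };
    if (result.score > best) {
      best = result.score;
      sinceImprovement = 0;
      budget = Math.min(budget + 1, maxTurns); // still improving: extend
    } else {
      sinceImprovement += 1;
      if (sinceImprovement >= patience) break; // progress stalled
    }
  }
  return { turns: turn, delivered: false, bestScore: best };
}
```

Clamping the ceiling to the hard cap in one place keeps a caller-supplied ceiling from ever exceeding 50 turns.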
+
+ ---
+
+ ## See also
+
+ - [PROTOCOL.md](PROTOCOL.md) — the harness↔UAP contract and agent loop
+ - [../integrations/MCP_ROUTER.md](../integrations/MCP_ROUTER.md) — MCP Router setup
+ - [../integrations/RTK.md](../integrations/RTK.md) — RTK (Rust Token Killer)
+ - [../../CONTRIBUTING.md](../../CONTRIBUTING.md) — development workflow
@@ -0,0 +1,204 @@
+ # The UAP Protocol
+
+ `v1.40.0`
+
+ This document specifies the **Universal Agent Protocol** itself: the contract
+ between an AI agent harness and the UAP layer beneath it. It is normative —
+ where it says *MUST* / *SHOULD* / *MAY*, those keywords carry their RFC 2119
+ meanings. For an architectural tour of the components that implement this
+ contract, see [OVERVIEW.md](OVERVIEW.md).
+
+ UAP is not a wire protocol. It is a **convention enforced by hooks**: a small
+ set of interception points the harness exposes, a defined hook lifecycle, an
+ agent decision loop, and a set of gates that block work which violates the
+ contract. Any harness that can run a hook before tool execution can host UAP.
+
+ ---
+
+ ## 1. The harness ↔ UAP contract
+
+ A conforming harness MUST provide UAP with:
+
+ 1. **A session-start interception point** — a place to run a script when an
+    agent session begins, whose stdout is injected into the agent's context.
+ 2. **A pre-tool-use interception point** — a place to run a script *before*
+    each tool call, where the script exiting with **exit code 2** aborts the
+    call.
+
+ UAP, in return, guarantees:
+
+ - Memory injection is **advisory and fail-open** — if hydration fails, the
+   session proceeds without it. It never blocks an agent.
+ - Policy enforcement is **fail-closed for REQUIRED policies** — a REQUIRED
+   violation blocks the call. RECOMMENDED / OPTIONAL policies only log.
+ - Tool output routed through the MCP Router is **compressed, not altered in
+   meaning** — the model receives a faithful, smaller view of the same result.
+
+ ### Supported interception points
+
+ | Harness | Session start | Pre-tool-use |
+ |---------|---------------|--------------|
+ | Claude Code / VSCode | `SessionStart` | `PreToolUse` |
+ | Factory | `SessionStart` | `PreToolUse` |
+ | Cursor | hooks.json | `preToolUse` |
+ | OpenCode | plugin | `tool.execute.before` |
+ | Codex | AGENTS.md / MCP | gating via UAP MCP `execute_tool` |
+ | Hermes | — | `pre_tool_call` |
+
+ Hook scripts are generated and installed by `uap hooks install`
+ (`src/cli/hooks.ts`) from `templates/hooks/`. Verify coverage with
+ `uap hooks doctor`.
+
+ ---
+
+ ## 2. Hook lifecycle
+
+ ### 2.1 Session start
+
+ On session start the harness runs the session-start hook, which MUST:
+
+ 1. **Inject memory.** Query the short-term store (last-24h top memories plus
+    open "session" loops of type action/goal/decision with importance ≥ 7) and
+    emit it wrapped in a `<uap-context>` block on stdout. The harness places
+    this in the agent's context.
+ 2. **Clean stale state.** Deregister dead agents and release abandoned work
+    claims so coordination state stays accurate.
+
+ The hook is **self-healing and fail-open**: it auto-creates missing
+ coordination/memory DBs and never exits non-zero. Its output is advisory
+ context, not a gate.
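A session-start hook along these lines would meet §2.1. The 24-hour window, the action/goal/decision types, the importance ≥ 7 threshold, the `<uap-context>` wrapper, and the never-fail behavior come from the text. The `Memory` record shape, the `open` flag, the top-10 limit, and the `loadMemories` callback are assumptions.

```typescript
interface Memory {
  content: string;
  type: string; // e.g. "action", "goal", "decision", "observation"
  importance: number; // 1-10
  createdAt: number; // epoch milliseconds
  tier: "working" | "session";
  open?: boolean; // hypothetical flag for an unresolved "open loop"
}

const DAY_MS = 24 * 60 * 60 * 1000;
const LOOP_TYPES = new Set(["action", "goal", "decision"]);

// Builds the hook's stdout. Never throws: any failure yields "" so the session
// simply starts without injected context (fail-open).
function sessionStartContext(loadMemories: () => Memory[], now: number, topN = 10): string {
  try {
    const all = loadMemories();
    const recent = all
      .filter((m) => m.tier === "working" && now - m.createdAt <= DAY_MS)
      .sort((a, b) => b.importance - a.importance)
      .slice(0, topN);
    const openLoops = all.filter(
      (m) => m.tier === "session" && m.open === true && LOOP_TYPES.has(m.type) && m.importance >= 7,
    );
    const lines = recent.concat(openLoops).map((m) => `- [${m.type}] ${m.content}`);
    return lines.length > 0 ? `<uap-context>\n${lines.join("\n")}\n</uap-context>` : "";
  } catch {
    return "";
  }
}
```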
+
+ ### 2.2 Pre-tool-use
+
+ Before each tool call the harness runs the pre-tool-use hook, which runs the
+ relevant gates for that tool. Conceptually:
+
+ ```
+ pre-tool-use(tool, args):
+   if tool in {Edit, Write, MultiEdit}:
+     run worktree gate              # path MUST be under .worktrees/
+   if tool == Bash:
+     run dangerous-command guard    # block terraform apply, force-push, ...
+   run policy gate (DB-driven)      # REQUIRED policies for this tool
+   if any gate denies:
+     exit 2                         # harness ABORTS the call
+   else:
+     allow                          # call proceeds (optionally via MCP Router)
+ ```
+
+ A gate verdict is binary — **allow** or **deny**. There is no "modify" verdict.
+ A denied call returns **exit code 2** (shell enforcers) or throws
+ `PolicyViolationError` (in-process MCP gate). Post-tool-use hooks MAY run a
+ build gate or backup reminder; pre/post-compact hooks preserve protocol context
+ across context compaction; the stop hook runs a completion checklist.
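The conceptual pre-tool-use logic translates into a small decision function. The `file_path`/`command` argument names are assumed to follow Claude Code's Edit and Bash tool inputs, while the dangerous-command patterns and the `policyGate` callback are placeholders. The function returns the exit code instead of exiting so it can be tested; a real hook script would exit with that value.

```typescript
type ExitCode = 0 | 2;

interface ToolArgs {
  file_path?: string; // assumed Edit/Write input field
  command?: string; // assumed Bash input field
}

// Placeholder patterns; the real guard's list is not given in this document.
const DANGEROUS_COMMANDS: RegExp[] = [/\bterraform\s+apply\b/, /\bgit\s+push\b.*(--force\b|\s-f\b)/];
const EDIT_TOOLS = ["Edit", "Write", "MultiEdit"];

function preToolUse(tool: string, args: ToolArgs, policyGate: (tool: string) => boolean): ExitCode {
  const denials: string[] = []; // reasons, useful for the hook's stderr message
  if (EDIT_TOOLS.includes(tool) && !/(^|\/)\.worktrees\//.test(args.file_path ?? "")) {
    denials.push("worktree gate: path is not under .worktrees/");
  }
  if (tool === "Bash" && DANGEROUS_COMMANDS.some((re) => re.test(args.command ?? ""))) {
    denials.push("dangerous-command guard");
  }
  if (!policyGate(tool)) {
    denials.push("policy gate"); // REQUIRED policy denied (DB-driven in UAP)
  }
  return denials.length > 0 ? 2 : 0; // 2 = the harness aborts the call
}
```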
+
+ ---
+
+ ## 3. Agent decision loop
+
+ A conforming agent SHOULD execute each task through this loop. The TypeScript
+ implementation lives in `src/memory/dynamic-retrieval.ts` (query) and the
+ short-term store / consolidator (record, promote).
+
+ ```
+      ┌───────────────────────────────────────────────────┐
+      │                                                   │
+      ▼                                                   │
+ ┌─────────┐   ┌─────────┐   ┌──────┐   ┌────────┐   ┌─────────┐
+ │  READ   │──▶│  QUERY  │──▶│ ACT  │──▶│ RECORD │──▶│ PROMOTE │
+ │ short-  │   │ long-   │   │ via  │   │ to     │   │ lessons │
+ │ term    │   │ term    │   │ tool │   │ short- │   │ to long-│
+ │ memory  │   │ (semant)│   │      │   │ term   │   │ term    │
+ └─────────┘   └─────────┘   └──────┘   └────────┘   └────┬────┘
+                                                          │ next task
+
+ ```
+
+ 1. **READ** — read short-term memory for recent context (`uap memory query`).
+ 2. **QUERY** — semantic search of long-term memory for related learnings.
+ 3. **ACT** — classify the task, then execute via the appropriate tool
+    (Edit / Write / Bash / a routed MCP tool / `uap deliver`).
+ 4. **RECORD** — write observations to short-term memory.
+ 5. **PROMOTE** — promote significant learnings to long-term memory, **through
+    the write-gate** (which scores and rejects low-quality memories).
+
+ A learning is significant enough to PROMOTE when it changes future behavior:
+ an important decision with rationale (importance ≥ 7), a pattern that prevents a
+ recurring error, or a configuration choice with context. Trivial observations,
+ transient debugging state, and secrets MUST NOT be promoted.
+
+ ---
+
+ ## 4. Worktree convention
+
+ All file edits MUST happen inside a git worktree under `.worktrees/NNN-<slug>/`.
+ Edits to the project root are blocked by the worktree gate
+ (`worktree_required.py`).
+
+ ```bash
+ uap worktree ensure --strict  # verify you are inside a worktree (exit 0)
+ uap worktree create <slug>    # if not inside one: auto-numbered branch + worktree
+ # ... edit, stage, commit inside the worktree ...
+ uap worktree pr               # open a PR from the worktree branch → master
+ uap worktree finish           # finish + clean up after merge
+ ```
+
+ Rules:
+
+ - All edit paths MUST be under `.worktrees/NNN-<slug>/`.
+ - Version bumps MUST happen on the feature branch, never on `master`.
+ - PRs open from the worktree branch against `master`.
+ - This applies to **every** file type — `.ts`, `.md`, `.json`, `.sh`,
+   configs, tests, docs. There is no exemption for "small" or "docs-only"
+   changes.
+
+ **Read-only tasks** (analysis, diagnostics, queries) do NOT require a worktree.
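The naming convention can be generated and checked mechanically. This sketch reads `NNN` as exactly three zero-padded digits and slugs as lowercase kebab-case, and it assumes `uap worktree create` picks the next unused number. None of these are documented constraints, and `nextWorktreeName`/`isWorktreePath` are hypothetical helpers.

```typescript
// NNN read as three digits; slug read as lowercase kebab-case (assumptions).
const WORKTREE_DIR = /^(\d{3})-([a-z0-9]+(?:-[a-z0-9]+)*)$/;

// Next auto-numbered worktree directory, e.g. "004-fix-login".
function nextWorktreeName(existingDirs: string[], slug: string): string {
  const clean = slug.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  if (clean === "") throw new Error("slug must contain letters or digits");
  const used = existingDirs
    .map((dir) => WORKTREE_DIR.exec(dir))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => Number(m[1]));
  const next = (used.length > 0 ? Math.max(...used) : 0) + 1;
  return `${String(next).padStart(3, "0")}-${clean}`;
}

// True when a repo-relative path sits inside a well-formed worktree directory.
function isWorktreePath(relPath: string): boolean {
  const [top, dir] = relPath.split("/");
  return top === ".worktrees" && dir !== undefined && WORKTREE_DIR.test(dir);
}
```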
+
+ ---
+
+ ## 5. Completion gates
+
+ An agent MUST NOT claim a code change is DONE until all gates pass. The gates
+ are decomposed across policy enforcers and the `review`-stage policy logic in
+ `policy-gate.ts` (auto-forced on completion / merge / deploy operations):
+
+ | Gate | Enforcer / mechanism | Requirement |
+ |------|----------------------|-------------|
+ | Tests | `test_gate.py` + `npm test` | new tests cover changed behavior; suite passes |
+ | Build | post-tool-use build gate + `npm run build` | compiles with zero errors |
+ | Type-check | `tsc --noEmit` | passes cleanly |
+ | Task discipline | `task_required.py` | a UAP task is `in_progress` before mutating work |
+ | Expert review | `expert_review_required.py` | parallel expert review precedes ship |
+ | Schema diff | `schema_diff_gate.py` | schema/contract changes pass `uap schema-diff` |
+ | Memory lesson | `session_memory_write.py` | code-changing sessions write a lesson |
+ | Version bump | `npm run version:patch/minor/major` | bumped on the feature branch |
+
+ The `uap deliver` convergence loop is the programmatic embodiment of these
+ gates: its verifier ladder runs build → typecheck → test → lint as real
+ commands and iterates the model until every *required* gate passes. See
+ [OVERVIEW.md](OVERVIEW.md#how-uap-deliver-orchestrates).
+
+ Completion gates MUST be verified before claiming done. RECOMMENDED practice is
+ to verify at least three points: before changes (baseline), after changes, and
+ after fixes.
+
+ ---
+
+ ## 6. Conformance summary
+
+ A harness + agent pair conforms to the UAP protocol when:
+
+ - [ ] Session-start hook injects `<uap-context>` memory and is fail-open.
+ - [ ] Pre-tool-use hook runs worktree, command-safety, and policy gates.
+ - [ ] A REQUIRED policy denial aborts the tool call (exit 2).
+ - [ ] The agent follows the READ → QUERY → ACT → RECORD → PROMOTE loop.
+ - [ ] All edits occur inside `.worktrees/NNN-<slug>/`.
+ - [ ] Completion gates pass before any DONE claim.
+
+ Install and audit the full stack with:
+
+ ```bash
+ uap setup             # init + memory + patterns + hooks + policies
+ uap hooks doctor      # audit policy-gate coverage across harnesses
+ uap compliance check
+ ```