@miller-tech/uap 1.40.0 → 1.41.0

Files changed (150)
  1. package/README.md +109 -642
  2. package/dist/.tsbuildinfo +1 -1
  3. package/dist/cli/deliver-defaults.d.ts +23 -0
  4. package/dist/cli/deliver-defaults.d.ts.map +1 -0
  5. package/dist/cli/deliver-defaults.js +121 -0
  6. package/dist/cli/deliver-defaults.js.map +1 -0
  7. package/dist/cli/init.d.ts.map +1 -1
  8. package/dist/cli/init.js +29 -0
  9. package/dist/cli/init.js.map +1 -1
  10. package/dist/cli/setup.d.ts.map +1 -1
  11. package/dist/cli/setup.js +19 -0
  12. package/dist/cli/setup.js.map +1 -1
  13. package/dist/policies/policy-tools.d.ts +7 -0
  14. package/dist/policies/policy-tools.d.ts.map +1 -1
  15. package/dist/policies/policy-tools.js +24 -2
  16. package/dist/policies/policy-tools.js.map +1 -1
  17. package/docs/INDEX.md +48 -286
  18. package/docs/architecture/OVERVIEW.md +328 -0
  19. package/docs/architecture/PROTOCOL.md +204 -0
  20. package/docs/benchmarks/README.md +17 -192
  21. package/docs/getting-started/CONFIGURATION.md +237 -0
  22. package/docs/getting-started/INSTALLATION.md +125 -0
  23. package/docs/getting-started/QUICKSTART.md +115 -0
  24. package/docs/guides/COORDINATION.md +162 -0
  25. package/docs/guides/DELIVER.md +115 -0
  26. package/docs/guides/DEPLOY_BATCHING.md +212 -0
  27. package/docs/guides/DROIDS_AND_SKILLS.md +202 -0
  28. package/docs/guides/LOCAL_MODELS.md +148 -0
  29. package/docs/guides/MCP_ROUTER.md +195 -0
  30. package/docs/guides/MEMORY.md +235 -0
  31. package/docs/guides/MULTI_MODEL.md +223 -0
  32. package/docs/guides/POLICIES.md +190 -0
  33. package/docs/guides/WORKTREE_WORKFLOW.md +185 -0
  34. package/docs/integrations/MCP_ROUTER.md +147 -0
  35. package/docs/integrations/RTK.md +102 -0
  36. package/docs/reference/API.md +485 -0
  37. package/docs/reference/CLI.md +719 -0
  38. package/docs/reference/CONFIGURATION.md +90 -193
  39. package/docs/reference/DATABASE_SCHEMA.md +110 -344
  40. package/docs/reference/FEATURES.md +176 -472
  41. package/docs/reference/PATTERNS.md +102 -0
  42. package/docs/reference/PLATFORMS.md +83 -0
  43. package/package.json +3 -1
  44. package/src/policies/enforcers/7ebbc721-7540-4e9f-879a-770e0213a09b_architecture_review.py +101 -0
  45. package/src/policies/enforcers/__pycache__/_common.cpython-312.pyc +0 -0
  46. package/src/policies/enforcers/_common.py +100 -0
  47. package/src/policies/enforcers/artifact_hygiene.py +52 -0
  48. package/src/policies/enforcers/cluster_routing.py +63 -0
  49. package/src/policies/enforcers/codebase_read_before_plan.py +52 -0
  50. package/src/policies/enforcers/coord_overlap.py +81 -0
  51. package/src/policies/enforcers/delivery_enforcement.py +97 -0
  52. package/src/policies/enforcers/doc_live_over_report.py +50 -0
  53. package/src/policies/enforcers/expert_review_required.py +135 -0
  54. package/src/policies/enforcers/iac_parity.py +53 -0
  55. package/src/policies/enforcers/mcp_router_first.py +37 -0
  56. package/src/policies/enforcers/memory_before_plan.py +61 -0
  57. package/src/policies/enforcers/parallel_reads.py +50 -0
  58. package/src/policies/enforcers/rtk_wrap.py +44 -0
  59. package/src/policies/enforcers/schema_diff_gate.py +80 -0
  60. package/src/policies/enforcers/session_memory_write.py +52 -0
  61. package/src/policies/enforcers/task_required.py +131 -0
  62. package/src/policies/enforcers/test_gate.py +58 -0
  63. package/src/policies/enforcers/validate_plan_before_build.py +75 -0
  64. package/src/policies/enforcers/worktree_required.py +57 -0
  65. package/src/policies/schemas/policies/architecture-review.md +51 -0
  66. package/src/policies/schemas/policies/artifact-hygiene.md +29 -0
  67. package/src/policies/schemas/policies/cluster-routing.md +31 -0
  68. package/src/policies/schemas/policies/codebase-read-before-plan.md +30 -0
  69. package/src/policies/schemas/policies/coord-overlap.md +24 -0
  70. package/src/policies/schemas/policies/delivery-enforcement.md +45 -0
  71. package/src/policies/schemas/policies/doc-live-over-report.md +32 -0
  72. package/src/policies/schemas/policies/expert-review-required.md +60 -0
  73. package/src/policies/schemas/policies/iac-parity.md +31 -0
  74. package/src/policies/schemas/policies/mandatory-testing-deployment.md +147 -0
  75. package/src/policies/schemas/policies/mcp-router-first.md +24 -0
  76. package/src/policies/schemas/policies/memory-before-plan.md +24 -0
  77. package/src/policies/schemas/policies/merge-deploy-monitor-verify.md +145 -0
  78. package/src/policies/schemas/policies/parallel-reads.md +24 -0
  79. package/src/policies/schemas/policies/rtk-wrap.md +26 -0
  80. package/src/policies/schemas/policies/schema-diff-gate.md +30 -0
  81. package/src/policies/schemas/policies/session-memory-write.md +24 -0
  82. package/src/policies/schemas/policies/task-required.md +49 -0
  83. package/src/policies/schemas/policies/test-gate.md +24 -0
  84. package/src/policies/schemas/policies/validate-plan-before-build.md +28 -0
  85. package/src/policies/schemas/policies/worktree-required.md +28 -0
  86. package/templates/hooks/uap-policy-gate.sh +5 -0
  87. package/docs/AGENTS.md +0 -423
  88. package/docs/DOCUMENTATION_AUDIT_REPORT.md +0 -131
  89. package/docs/GETTING_STARTED.md +0 -288
  90. package/docs/PROJECT_ANALYSIS_REPORT.md +0 -510
  91. package/docs/architecture/COMPLETE_ARCHITECTURE.md +0 -748
  92. package/docs/architecture/EXPERT_STACK.md +0 -137
  93. package/docs/architecture/MULTI_MODEL.md +0 -224
  94. package/docs/architecture/PLATFORM_GATING.md +0 -68
  95. package/docs/architecture/SYSTEM_ANALYSIS.md +0 -334
  96. package/docs/architecture/UAP_COMPLIANCE.md +0 -217
  97. package/docs/architecture/UAP_PROTOCOL.md +0 -339
  98. package/docs/architecture/UAP_STRICT_DROIDS.md +0 -172
  99. package/docs/archive/BALLS_MODE_SELF_ANALYSIS.md +0 -260
  100. package/docs/archive/BENCHMARK_GAPS_AND_PLAN.md +0 -146
  101. package/docs/archive/FAILING_TASKS_SOLUTION_PLAN.md +0 -668
  102. package/docs/archive/JINJA2-SYSTEM-MESSAGE-FIX.md +0 -209
  103. package/docs/archive/MODEL_ROUTING_IMPLEMENTATION_SUMMARY.md +0 -281
  104. package/docs/archive/MODEL_ROUTING_OPTIMIZATION_PLAN.md +0 -320
  105. package/docs/archive/NPM-PUBLISH-V0.9.1.md +0 -240
  106. package/docs/archive/OPTIMIZATION_OPTIONS.md +0 -334
  107. package/docs/archive/PARALLELISM_GAPS_AND_OPTIONS.md +0 -422
  108. package/docs/archive/POLICY_GATE_IMPLEMENTATION.md +0 -245
  109. package/docs/archive/SETUP_IMPROVEMENTS.md +0 -213
  110. package/docs/archive/UAP_GENERIC_OPTIMIZATION_PLAN.md +0 -270
  111. package/docs/archive/UAP_OPTIMIZATION_PLAN.md +0 -701
  112. package/docs/archive/UAP_V103_PATTERN_DESIGN.md +0 -315
  113. package/docs/archive/UAP_V104_COMPLIANCE_DESIGN.md +0 -223
  114. package/docs/archive/changelog/2026-03-10_uap-100-compliance.md +0 -77
  115. package/docs/archive/changelog/2026-03-10_uap-full-system-verification.md +0 -109
  116. package/docs/archive/opencode-integration-guide.md +0 -740
  117. package/docs/archive/opencode-integration-quickref.md +0 -180
  118. package/docs/benchmarks/OVERNIGHT_RUNNER.md +0 -341
  119. package/docs/benchmarks/SPECULATIVE_DECODING_JOURNEY_2026-03.md +0 -221
  120. package/docs/benchmarks/VALIDATION_PLAN.md +0 -568
  121. package/docs/blog/SPECULATIVE_DECODING_PRODUCTION_PLAYBOOK.md +0 -139
  122. package/docs/blog/local-coding-agents.md +0 -266
  123. package/docs/blog/x-thread.md +0 -254
  124. package/docs/deployment/DEPLOYMENT.md +0 -895
  125. package/docs/deployment/DEPLOYMENT_STRATEGIES.md +0 -518
  126. package/docs/deployment/DEPLOY_BATCHER_ANALYSIS.md +0 -224
  127. package/docs/deployment/DEPLOY_BATCHING.md +0 -273
  128. package/docs/deployment/DEPLOY_BUCKETING_ANALYSIS.md +0 -420
  129. package/docs/deployment/QWEN35_LLAMA_CPP.md +0 -426
  130. package/docs/deployment/UAP_LLAMA_ANTHROPIC_PROXY_BOOTSTRAP.md +0 -279
  131. package/docs/getting-started/INTEGRATION.md +0 -628
  132. package/docs/getting-started/OVERVIEW.md +0 -324
  133. package/docs/getting-started/SETUP.md +0 -377
  134. package/docs/integrations/MCP_ROUTER_SETUP.md +0 -445
  135. package/docs/integrations/RTK_INTEGRATION.md +0 -468
  136. package/docs/operations/TROUBLESHOOTING.md +0 -660
  137. package/docs/pr/PR_SPECULATIVE_DOCS_TEMPLATE.md +0 -146
  138. package/docs/pr/UPSTREAM_PRS.md +0 -424
  139. package/docs/reference/API_REFERENCE.md +0 -903
  140. package/docs/reference/EXPERT_DROIDS.md +0 -219
  141. package/docs/reference/HARNESS-MATRIX.md +0 -318
  142. package/docs/reference/PATTERN_LIBRARY.md +0 -636
  143. package/docs/reference/UAP_CLI_REFERENCE.md +0 -620
  144. package/docs/research/BEHAVIORAL_PATTERNS.md +0 -228
  145. package/docs/research/DOMAIN_STRATEGIES.md +0 -316
  146. package/docs/research/MEMORY_SYSTEMS_COMPARISON.md +0 -812
  147. package/docs/research/PATTERN_ANALYSIS_2026-01-18.md +0 -436
  148. package/docs/research/PERFORMANCE_ANALYSIS_2026-01-18.md +0 -209
  149. package/docs/research/PERFORMANCE_TEST_PLAN.md +0 -383
  150. package/docs/research/TERMINAL_BENCH_LEARNINGS.md +0 -217
@@ -0,0 +1,223 @@
1
+ # Multi-Model Routing
2
+
3
+ > Applies to UAP **v1.40.0**
4
+
5
+ UAP runs agentic work across multiple LLMs instead of one. A high-capability
6
+ model plans, a cheaper or local model executes, and a reviewer model checks the
7
+ result. Routing decisions are made per task (and per subtask) so you pay for
8
+ expensive reasoning only when the work actually needs it.
9
+
10
+ ## Why multi-model
11
+
12
+ A single frontier model is the simplest setup, but most of the tokens an agent
13
+ spends are on routine execution — applying an edit, running a tool, writing a
14
+ test — not on hard reasoning. Sending all of that to a premium model is
15
+ expensive and slow.
16
+
17
+ Multi-model routing lets you:
18
+
19
+ - Use a strong **planner** (e.g. Claude Opus) for decomposition and review.
20
+ - Use a cheap or **local executor** (e.g. Qwen 3.5 on llama.cpp) for the bulk
21
+ of the work, at near-zero marginal cost.
22
+ - Fall back to another model automatically when the executor struggles.
23
+ - Pick a routing strategy that trades cost against quality on your terms.
24
+
25
+ ## The 3-tier plan → route → execute flow
26
+
27
+ UAP separates planning, routing, and execution into three components:
28
+
29
+ 1. **TaskPlanner** (`src/models/planner.ts`) — decomposes a task into subtasks.
30
+ It classifies the task's complexity, and for non-trivial work it breaks the
31
+ task into ordered subtasks with inputs, outputs, and constraints. Every plan
32
+ is auto-validated by a plan validator (`src/models/plan-validator.ts`) at all
33
+ complexity levels before it is returned.
34
+
35
+ 2. **ModelRouter** (`src/models/router.ts`) — assigns a model to the overall
36
+ task and to each subtask. It classifies the task by complexity and type,
37
+ applies the routing rules, and selects a model for the matched role
38
+ (planner / executor / reviewer / fallback). Classification results are cached
39
+ to avoid repeated work on near-identical task descriptions.
40
+
41
+ 3. **TaskExecutor** (`src/models/executor.ts`) — executes the plan. Subtasks run
42
+ with a bounded level of parallelism, each call has retry-with-backoff logic
43
+ (`retryDelayMs`, default 1000 ms), and failed attempts feed retry context
44
+ into subsequent tries. The executor produces per-subtask results and a
45
+ run summary.
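The executor's retry behavior described above can be sketched as follows. This is a minimal Python illustration, not the code in `src/models/executor.ts`: the name `run_with_retry`, the `max_attempts` limit, the doubling of the delay, and the shape of the retry context are all assumptions. Only `retryDelayMs` and its 1000 ms default come from the text.

```python
import time
from typing import Callable, List


def run_with_retry(
    call: Callable[[List[str]], str],
    retry_delay_ms: int = 1000,   # mirrors the documented retryDelayMs default
    max_attempts: int = 3,        # hypothetical limit, not taken from UAP
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `call` until it succeeds, passing earlier errors as retry context."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    retry_context: List[str] = []  # error messages from failed attempts
    for attempt in range(max_attempts):
        try:
            return call(retry_context)
        except Exception as exc:
            retry_context.append(f"attempt {attempt + 1} failed: {exc}")
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Assumed backoff policy: double the delay after each failure.
            sleep(retry_delay_ms * (2 ** attempt) / 1000)
    raise RuntimeError("unreachable: the loop returns or raises")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller would normally leave it at `time.sleep`.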
46
+
47
+ The planner, router, and executor are wired together by the CLI and the programmatic API
48
+ (`createRouter`, `createPlanner`, `createExecutor` from `src/models/index.js`).
49
+
50
+ ## The model presets
51
+
52
+ The router ships with built-in presets (`ModelPresets` in
53
+ `src/models/types.ts`). These are the ids you reference in role assignments and
54
+ in `uap model` output:
55
+
56
+ | Preset id | Name | Provider | Context | $/1M in | $/1M out | Capabilities |
57
+ | -------------- | ------------------------- | --------- | -------- | ------- | -------- | -------------------------------------------------------------- |
58
+ | `opus-4.6` | Claude Opus 4.6 | anthropic | 200,000 | 7.5 | 37.5 | planning, complex-reasoning, code-generation, review, advanced-planning |
59
+ | `sonnet-4.6` | Claude Sonnet 4.6 | anthropic | 200,000 | 3.0 | 15.0 | code-generation, execution, review, agentic |
60
+ | `haiku` | Claude Haiku (Latest) | anthropic | 200,000 | 0.8 | 4.0 | code-generation, execution, simple-tasks |
61
+ | `qwen35-a3b` | Qwen 3.5 35B A3B (llama.cpp) | custom | 262,144 | 0 | 0 | code-generation, execution, planning, simple-tasks |
62
+ | `gpt-5.4` | GPT 5.4 | openai | 128,000 | 2.5 | 10.0 | planning, code-generation, complex-reasoning |
63
+ | `gpt-5.3-codex` | GPT 5.3 Codex | openai | 192,000 | 3.0 | 12.0 | code-generation, execution, complex-reasoning, agentic |
64
+
65
+ Run `uap model presets` to print the live list.
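To make the price columns concrete, here is a small Python sketch that estimates the cost of a run from the per-million-token prices in the table. The prices are copied from the table; the function name `estimate_cost` and the token counts in the usage note are illustrative assumptions, and this is not how UAP itself computes its cost comparison.

```python
# (input $/1M tokens, output $/1M tokens), copied from the preset table above.
PRESET_PRICES = {
    "opus-4.6": (7.5, 37.5),
    "sonnet-4.6": (3.0, 15.0),
    "haiku": (0.8, 4.0),
    "qwen35-a3b": (0.0, 0.0),
    "gpt-5.4": (2.5, 10.0),
    "gpt-5.3-codex": (3.0, 12.0),
}


def estimate_cost(preset_id: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one run on the given preset."""
    price_in, price_out = PRESET_PRICES[preset_id]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

For a hypothetical run of 200,000 input tokens and 50,000 output tokens, this gives 3.375 on `opus-4.6` (1.5 for input plus 1.875 for output) and 0 on `qwen35-a3b`.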
66
+
67
+ ### Runtime profiles
68
+
69
+ In addition to the presets above, UAP ships seven detailed JSON **profiles** in
70
+ [`config/model-profiles/`](../../config/model-profiles/). These carry richer
71
+ runtime settings the presets lack — pricing tiers, rate limits, tool-calling
72
+ options, extended-thinking budgets, server-optimization flags, and ready-to-run
73
+ launch commands:
74
+
75
+ | Profile (`_profile`) | `model` id | Provider | Context |
76
+ | -------------------- | -------------------------- | --------- | -------- |
77
+ | `claude-opus-4.6` | `claude-opus-4-6-20250616` | anthropic | 200,000 |
78
+ | `claude-sonnet-4.6` | `claude-sonnet-4-6-20250514` | anthropic | 200,000 |
79
+ | `claude-haiku-3.5` | `claude-3-5-haiku-20241022` | anthropic | 200,000 |
80
+ | `gpt-5.4` | `gpt-5.4` | openai | 128,000 |
81
+ | `gpt-5.3-codex` | `gpt-5.3-codex` | openai | 192,000 |
82
+ | `qwen35` | `qwen3.5-a3b-iq4xs` | custom (llama.cpp) | 262,144 |
83
+ | `generic` | `default` | any OpenAI-compatible | 32,768 |
84
+
85
+ The active profile is selected by the `UAP_MODEL_PROFILE` environment variable
86
+ (defaults to `generic`). The loader lives in `src/models/profile-loader.ts`.
87
+
88
+ ## How routing decides
89
+
90
+ The router first classifies the task, then applies rules.
91
+
92
+ **Complexity** is inferred from keywords. Examples (from
93
+ `COMPLEXITY_KEYWORDS` in `src/models/router.ts`):
94
+
95
+ - `critical` — security, authentication, authorization, deployment, migration,
96
+ production, database, encryption, credentials, secrets
97
+ - `high` — architecture, design, refactor, performance, optimization,
98
+ algorithm, distributed, concurrent, multi-step, complex
99
+ - `medium` — feature, implement, add, create, update, integrate, api, endpoint
100
+ - `low` — fix, typo, comment, rename, format, style, simple, minor, quick,
101
+ documentation
102
+
103
+ **Task type** is inferred similarly: `planning`, `coding`, `refactoring`,
104
+ `bug-fix`, `review`, `documentation`.
105
+
106
+ **Routing rules** (`DEFAULT_ROUTING_RULES`) map complexity/type to a role by
107
+ priority (higher wins):
108
+
109
+ | Match | Role | Priority |
110
+ | ---------------------------------------------------------- | ---------- | -------- |
111
+ | complexity `critical` | planner | 100 |
112
+ | keywords: security, authentication, deployment, migration | planner | 90 |
113
+ | complexity `high` | planner | 80 |
114
+ | keywords: architecture, design, refactor | planner | 70 |
115
+ | task type `planning` | planner | 70 |
116
+ | task type `review` | reviewer | 60 |
117
+ | complexity `medium` | executor | 50 |
118
+ | task type `coding` | executor | 50 |
119
+ | task type `bug-fix` | executor | 50 |
120
+ | complexity `low` | executor | 30 |
121
+ | task type `documentation` | executor | 30 |
122
+
123
+ The matched role is resolved to a concrete model via your role assignments.
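The classification and rule matching above can be sketched in Python. This is an illustration under stated assumptions, not the logic in `src/models/router.ts`: the keyword sets and rule rows are copied from the text, while the whole-word tokenizer, the "most severe matching level wins" tie-break, the `medium` default when no keyword matches, and the names `tokenize`, `classify_complexity`, and `route` are all hypothetical. Task-type inference is not reproduced; the task type is passed in by the caller.

```python
import re
from typing import Optional, Set

# Keyword examples from the text, ordered from most to least severe.
COMPLEXITY_KEYWORDS = {
    "critical": {"security", "authentication", "authorization", "deployment",
                 "migration", "production", "database", "encryption",
                 "credentials", "secrets"},
    "high": {"architecture", "design", "refactor", "performance",
             "optimization", "algorithm", "distributed", "concurrent",
             "multi-step", "complex"},
    "medium": {"feature", "implement", "add", "create", "update",
               "integrate", "api", "endpoint"},
    "low": {"fix", "typo", "comment", "rename", "format", "style", "simple",
            "minor", "quick", "documentation"},
}

# (priority, role, predicate) rows mirroring the routing table above.
# Each predicate receives (complexity, task_type, words).
RULES = [
    (100, "planner", lambda c, t, w: c == "critical"),
    (90, "planner", lambda c, t, w: bool(
        w & {"security", "authentication", "deployment", "migration"})),
    (80, "planner", lambda c, t, w: c == "high"),
    (70, "planner", lambda c, t, w: bool(
        w & {"architecture", "design", "refactor"})),
    (70, "planner", lambda c, t, w: t == "planning"),
    (60, "reviewer", lambda c, t, w: t == "review"),
    (50, "executor", lambda c, t, w: c == "medium"),
    (50, "executor", lambda c, t, w: t == "coding"),
    (50, "executor", lambda c, t, w: t == "bug-fix"),
    (30, "executor", lambda c, t, w: c == "low"),
    (30, "executor", lambda c, t, w: t == "documentation"),
]


def tokenize(task: str) -> Set[str]:
    """Lowercase words, keeping hyphenated terms such as 'multi-step' whole."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", task.lower()))


def classify_complexity(words: Set[str]) -> str:
    """Assumed policy: the most severe level with a keyword hit wins."""
    for level, keywords in COMPLEXITY_KEYWORDS.items():
        if words & keywords:
            return level
    return "medium"  # assumed default when nothing matches


def route(task: str, task_type: Optional[str] = None) -> str:
    """Return the role of the highest-priority matching rule."""
    words = tokenize(task)
    complexity = classify_complexity(words)
    matches = [(p, role) for p, role, pred in RULES
               if pred(complexity, task_type, words)]
    # Every complexity level has a rule, so `matches` is never empty.
    return max(matches, key=lambda m: m[0])[1]
```

Because the rule list is plain data, changing a priority or adding a row does not require touching the matching loop.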
124
+
125
+ **Routing strategy** further shapes selection. Four strategies are supported
126
+ (`routingStrategy`, default `balanced`):
127
+
128
+ - `cost-optimized` — minimize cost, use the cheapest capable model
129
+ - `performance-first` — maximize quality, use the best model
130
+ - `balanced` — balance cost and performance (default)
131
+ - `adaptive` — learn from task results over time
132
+
133
+ ## The `uap model` CLI
134
+
135
+ All subcommands are defined in `src/cli/model.ts`.
136
+
137
+ ```bash
138
+ uap model status # show configured models, role assignments, strategy
139
+ uap model route <task> # analyze how a task would be routed
140
+ uap model route <task> -v # + matched rules and cost comparison
141
+ uap model plan <task> # build an execution plan (decomposition + assignments)
142
+ uap model plan <task> -v # + per-subtask detail
143
+ uap model plan <task> -e # execute the plan (mock client unless API keys set)
144
+ uap model compare # compare cost/performance across sample configs
145
+ uap model presets # list all built-in model presets
146
+ uap model select # interactively assign models to each role
147
+ uap model select --save # persist the selection to .uap.json
148
+ uap model export # print current config as JSON
149
+ uap model export -f yaml # ... or YAML
150
+ uap model health # validate that assigned models exist and resolve
151
+ ```
152
+
153
+ Example — see how a task routes:
154
+
155
+ ```bash
156
+ uap model route "add OAuth2 login with JWT sessions" --verbose
157
+ ```
158
+
159
+ This prints the inferred complexity, task type, the selected and fallback
160
+ models, the matched rules, and an estimated cost comparison.
161
+
162
+ ## Configuring profiles
163
+
164
+ ### Role assignments
165
+
166
+ Configure the multi-model setup under `multiModel` in your `.uap.json`. The
167
+ default configuration is:
168
+
169
+ ```json
170
+ {
171
+ "multiModel": {
172
+ "enabled": true,
173
+ "models": ["opus-4.6", "qwen35-a3b"],
174
+ "roles": {
175
+ "planner": "opus-4.6",
176
+ "executor": "qwen35-a3b",
177
+ "fallback": "qwen35-a3b"
178
+ },
179
+ "routingStrategy": "balanced"
180
+ }
181
+ }
182
+ ```
183
+
184
+ A `reviewer` role is also supported; if unset it falls back to the planner.
185
+ You can add `costOptimization` (with `targetReduction`,
186
+ `maxPerformanceDegradation`, and `fallbackThreshold`) when using a
187
+ cost-oriented strategy.
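The role lookup, including the documented reviewer fallback, can be sketched in a few lines of Python. The config literal is the default shown above; the function name `resolve_model` is hypothetical, and the behavior for other unset roles (returning `None`) is an assumption rather than something the text specifies.

```python
import json
from typing import Optional

# The default multiModel block from the section above.
DEFAULT_CONFIG = json.loads("""
{
  "multiModel": {
    "enabled": true,
    "models": ["opus-4.6", "qwen35-a3b"],
    "roles": {
      "planner": "opus-4.6",
      "executor": "qwen35-a3b",
      "fallback": "qwen35-a3b"
    },
    "routingStrategy": "balanced"
  }
}
""")


def resolve_model(config: dict, role: str) -> Optional[str]:
    """Map a role name to a preset id using the roles table."""
    roles = config["multiModel"]["roles"]
    if role == "reviewer":
        # Documented behavior: an unset reviewer falls back to the planner.
        return roles.get("reviewer") or roles.get("planner")
    return roles.get(role)  # assumption: other unset roles resolve to None
```

With the default configuration, `resolve_model(DEFAULT_CONFIG, "reviewer")` resolves to the planner's preset because no reviewer is set.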
188
+
189
+ The fastest way to edit this is interactively:
190
+
191
+ ```bash
192
+ uap model select --save
193
+ ```
194
+
195
+ ### Runtime profile + endpoints
196
+
197
+ Pick a runtime profile and provide credentials/endpoints via environment
198
+ variables (see each file's `running_config` in
199
+ [`config/model-profiles/`](../../config/model-profiles/)):
200
+
201
+ ```bash
202
+ # Anthropic-hosted models
203
+ export ANTHROPIC_API_KEY=<your-key>
204
+ export UAP_MODEL_PROFILE=claude-opus-4.6
205
+
206
+ # OpenAI-hosted models
207
+ export OPENAI_API_KEY=<your-key>
208
+ export UAP_MODEL_PROFILE=gpt-5.4
209
+
210
+ # Local / any OpenAI-compatible server
211
+ export TARGET_URL=http://127.0.0.1:8080
212
+ export UAP_MODEL_PROFILE=generic
213
+ ```
214
+
215
+ To customize a model's runtime behavior — temperature, tool-call batching,
216
+ extended-thinking budget, rate limits, or server-optimization flags — edit the
217
+ corresponding JSON file in `config/model-profiles/`. Each file is documented
218
+ inline with `_comment` fields.
219
+
220
+ ## See also
221
+
222
+ - [Droids and Skills](./DROIDS_AND_SKILLS.md) — specialist agents and reusable
223
+ workflows that run on top of the routed models.
@@ -0,0 +1,190 @@
1
+ # Policies
2
+
3
+ > Applies to UAP v1.40.0
4
+
5
+ UAP policies are **executable gates, not prose**. Each policy can carry a Python
6
+ enforcer that inspects an operation and decides whether it may proceed. A
7
+ `PreToolUse` hook queries the policy store and runs the relevant enforcers
8
+ before a tool call executes; an enforcer that exits with status `2` blocks the
9
+ call.
10
+
11
+ The engine lives in
12
+ [`src/policies/policy-gate.ts`](../../src/policies/policy-gate.ts); enforcers
13
+ live in [`src/policies/enforcers/`](../../src/policies/enforcers/); the CLI is
14
+ in [`src/cli/policy.ts`](../../src/cli/policy.ts).
15
+
16
+ ## The policy-gate model
17
+
18
+ The flow is **hook to DB to enforcer to block**:
19
+
20
+ 1. **Hook** — A `PreToolUse` hook fires before a tool call (Edit, Write, Bash,
21
+ etc.). Tools registered through the enforced tool router
22
+ ([`src/policies/enforced-tool-router.ts`](../../src/policies/enforced-tool-router.ts))
23
+ are automatically routed through the policy gate.
24
+ 2. **DB** — The gate
25
+ ([`PolicyGate`](../../src/policies/policy-gate.ts)) loads all active policies
26
+ from the policy store (a SQLite-backed DB, cached with a short TTL) and
27
+ filters them to the ones matching the current enforcement stage
28
+ (`pre-exec`, `post-exec`, `review`, or `always`).
29
+ 3. **Enforcer** — Each matching policy that has an attached Python enforcer is
30
+ invoked as `python3 <enforcer>.py --operation <op> --args <json>`. Enforcers
31
+ receive the operation name and its arguments and return a JSON verdict.
32
+ 4. **Block** — An enforcer emits `{"allowed": true, ...}` and exits `0` to
33
+ allow, or `{"allowed": false, "reason": ...}` and **exits `2` to block** (see
34
+ the shared `emit()` helper in
35
+ [`src/policies/enforcers/_common.py`](../../src/policies/enforcers/_common.py)).
36
+ When a `REQUIRED` policy blocks, the gate raises a `PolicyViolationError` and
37
+ the tool call never runs. Every check is written to an audit trail.
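As an illustration of the enforcer contract in steps 3 and 4, here is a minimal Python sketch of a worktree-style check. It does not reproduce the shipped `worktree_required.py` or the `emit()` helper in `_common.py`, whose signatures are not shown here. The `--operation` and `--args` flags, the `allowed` and `reason` fields, the exit codes `0` and `2`, and the `UAP_NO_WORKTREE=1` override come from the text; the function names, the `reason` field on allow verdicts, and the `file_path` argument key (borrowed from the `uap policy check` example) are assumptions.

```python
import argparse
import json
import os
from typing import Mapping, Sequence, Tuple

MUTATING_OPS = {"Edit", "Write", "MultiEdit"}


def evaluate(operation: str, args: Mapping, env: Mapping) -> Tuple[dict, int]:
    """Return (verdict, exit_code): exit code 0 allows, exit code 2 blocks."""
    if operation not in MUTATING_OPS:
        return {"allowed": True, "reason": "not a file-mutating operation"}, 0
    if env.get("UAP_NO_WORKTREE") == "1":
        return {"allowed": True, "reason": "override UAP_NO_WORKTREE=1"}, 0
    file_path = str(args.get("file_path", ""))
    # Allow only paths that have a `.worktrees` directory component.
    if ".worktrees" in file_path.replace("\\", "/").split("/"):
        return {"allowed": True, "reason": "path is inside .worktrees/"}, 0
    return {"allowed": False,
            "reason": f"{file_path!r} is outside .worktrees/"}, 2


def main(argv: Sequence[str]) -> int:
    """Parse the documented flags, print the JSON verdict, return the status."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--operation", required=True)
    parser.add_argument("--args", default="{}")
    ns = parser.parse_args(argv)
    verdict, code = evaluate(ns.operation, json.loads(ns.args), os.environ)
    print(json.dumps(verdict))
    return code

# As a script entry point (left as a comment so importing has no side effects):
#   import sys; sys.exit(main(sys.argv[1:]))
```

Keeping the decision in a pure `evaluate` function, separate from argument parsing and process exit, makes the rule easy to unit-test without spawning `python3`.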
38
+
39
+ Task-completion operations (anything that looks like merge / deploy / release /
40
+ "mark done") are automatically re-checked at the `review` stage, so completion
41
+ gates fire even if the operation was issued at `pre-exec`.
42
+
43
+ ### Cooperative-guardrail caveat
44
+
45
+ The policy gate is a **cooperative-agent guardrail, not a hard security
46
+ boundary.** It steers well-behaved agents away from unsafe or out-of-process
47
+ actions; it does not sandbox a hostile process. Enforcers also honor explicit
48
+ overrides (for example the `worktree-required` enforcer respects
49
+ `UAP_NO_WORKTREE=1`). Treat policies as guardrails that keep cooperating agents
50
+ on the rails — not as a containment mechanism against untrusted code.
51
+
52
+ ## The enforcers
53
+
54
+ The enforcers in
55
+ [`src/policies/enforcers/`](../../src/policies/enforcers/) group as follows.
56
+ `_common.py` is shared helper code, not an enforcer.
57
+
58
+ ### Workflow & isolation
59
+
60
+ | Enforcer | What it gates |
61
+ |----------|---------------|
62
+ | `worktree_required` | Edit/Write/MultiEdit must target a `.worktrees/` path |
63
+ | `task_required` | A UAP task must be `in_progress` before mutating work |
64
+ | `coord_overlap` | Checks for in-flight agent path reservations (parallel-agent overlap) |
65
+ | `delivery_enforcement` | Route substantive coding through `uap deliver` |
66
+
67
+ ### Plan discipline
68
+
69
+ | Enforcer | What it gates |
70
+ |----------|---------------|
71
+ | `memory_before_plan` | Plans require a recent `uap memory query` |
72
+ | `codebase_read_before_plan` | Plans require prior reads of the target paths |
73
+ | `validate_plan_before_build` | A plan must be validated before building |
74
+
75
+ ### Quality & review gates
76
+
77
+ | Enforcer | What it gates |
78
+ |----------|---------------|
79
+ | `test_gate` | Changed services need accompanying test deltas |
80
+ | `schema_diff_gate` | Schema/pool changes must pass `uap schema-diff` |
81
+ | `expert_review_required` | A parallel expert review must precede ship |
82
+ | `architecture_review` | Merge / PR-ready operations need an architecture review when the diff warrants it |
83
+
84
+ ### Hygiene & artifacts
85
+
86
+ | Enforcer | What it gates |
87
+ |----------|---------------|
88
+ | `artifact_hygiene` | Block binary artifacts outside curated directories |
89
+ | `doc_live_over_report` | Block new `*_REPORT` / `*_COMPLETE` / `*_SUMMARY` / `*_PLAN` markdown files |
90
+ | `session_memory_write` | Code-changing sessions must write a lesson to memory |
91
+
92
+ ### Tooling & routing
93
+
94
+ | Enforcer | What it gates |
95
+ |----------|---------------|
96
+ | `mcp_router_first` | MCP tools must be loaded on demand |
97
+ | `rtk_wrap` | Heavy CLIs must be invoked via `rtk` |
98
+ | `parallel_reads` | Nudge when serial read fan-out is detected |
99
+
100
+ ### Infrastructure
101
+
102
+ | Enforcer | What it gates |
103
+ |----------|---------------|
104
+ | `iac_parity` | Live-state changes must have a matching infrastructure-as-code diff |
105
+ | `cluster_routing` | Cluster tooling context must match the component domain |
106
+
107
+ > The `architecture_review` enforcer file is stored with a policy-ID prefix
108
+ > (`<uuid>_architecture_review.py`) because it is attached to a specific
109
+ > installed policy; the others are named directly after their policy slug.
110
+
111
+ ## The `uap policy` CLI
112
+
113
+ All commands are subcommands of `uap policy`, implemented in
114
+ [`src/cli/policy.ts`](../../src/cli/policy.ts).
115
+
116
+ ### Inspect
117
+
118
+ ```bash
119
+ uap policy list # list all policies with status, level, category, stage, version
120
+ uap policy status # summary of enabled/disabled plus enforcement stages
121
+ ```
122
+
123
+ ### Install & attach
124
+
125
+ `install` reads a built-in policy markdown file and stores it. If a Python
126
+ enforcer with the matching name lives in `src/policies/enforcers/`, it is
127
+ auto-attached.
128
+
129
+ ```bash
130
+ uap policy install worktree-enforcement
131
+ ```
132
+
133
+ Add a policy from an arbitrary markdown file, or attach tool code to an existing
134
+ policy:
135
+
136
+ ```bash
137
+ uap policy add --file ./my-policy.md --category custom --level RECOMMENDED --tags "a,b"
138
+ uap policy add-tool --policy <id> --tool <name> --code ./enforcer.py
139
+ ```
140
+
141
+ ### Enable / disable / toggle
142
+
143
+ ```bash
144
+ uap policy enable <id> # turn a policy on
145
+ uap policy disable <id> # turn a policy off
146
+ uap policy toggle <id> # flip current state
147
+ uap policy toggle <id> --on
148
+ uap policy toggle <id> --off
149
+ ```
150
+
151
+ ### Tune enforcement
152
+
153
+ ```bash
154
+ uap policy level <id> --level REQUIRED # REQUIRED | RECOMMENDED | OPTIONAL
155
+ uap policy stage <id> --stage pre-exec # pre-exec | post-exec | review | always
156
+ ```
157
+
158
+ Only `REQUIRED` policies can block an operation; `RECOMMENDED` and `OPTIONAL`
159
+ checks are recorded but do not deny the call.
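The blocking rule can be sketched in Python as follows. This is a simplified stand-in, not the logic of `PolicyGate` in `src/policies/policy-gate.ts`: the `PolicyViolationError` class here is a local placeholder with the same name, and the function name, the tuple shape, the audit line format, and the treatment of exit codes other than `0` and `2` as non-denying are assumptions.

```python
from typing import Iterable, List, Tuple


class PolicyViolationError(Exception):
    """Local placeholder for the error the gate raises on a blocking policy."""


def apply_verdicts(results: Iterable[Tuple[str, str, int]]) -> List[str]:
    """Take (policy_id, level, enforcer_exit_code) rows; return audit lines.

    Every check is recorded, but only a REQUIRED policy whose enforcer
    exited with status 2 blocks the operation.
    """
    audit: List[str] = []
    blockers: List[str] = []
    for policy_id, level, exit_code in results:
        denied = exit_code == 2  # assumption: other codes do not deny
        audit.append(f"{policy_id} level={level} denied={denied}")
        if denied and level == "REQUIRED":
            blockers.append(policy_id)
    if blockers:
        raise PolicyViolationError(", ".join(blockers))
    return audit
```

In this sketch a denying `RECOMMENDED` check still appears in the returned audit lines, which matches the "recorded but do not deny" behavior described above.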
160
+
161
+ ### Check & audit
162
+
163
+ ```bash
164
+ uap policy check --operation Write --args '{"file_path":"src/x.ts"}' # dry-run a gate
165
+ uap policy audit --limit 20 # enforcement audit trail
166
+ uap policy audit --policy <id> # filter to one policy
167
+ ```
168
+
169
+ ### Other
170
+
171
+ ```bash
172
+ uap policy get-relevant --task "ship the api" --top 3 # context-relevant policies
173
+ uap policy convert --input <id|file.md> --output out.md # render to CLAUDE.md format
174
+ ```
175
+
176
+ ## How to add, enable, and disable a policy
177
+
178
+ 1. **Author** a policy markdown file (and, optionally, a Python enforcer named
179
+ after the policy slug with hyphens replaced by underscores).
180
+ 2. **Install / add** it: `uap policy install <name>` for a built-in, or
181
+ `uap policy add --file <path>` for a custom one. A co-located enforcer is
182
+ auto-attached on install; otherwise attach it with `uap policy add-tool`.
183
+ 3. **Set its teeth**: `uap policy level <id> --level REQUIRED` so it can block,
184
+ and `uap policy stage <id> --stage <stage>` to choose when it fires.
185
+ 4. **Enable / disable** at any time with `uap policy enable <id>` /
186
+ `uap policy disable <id>` (or `toggle`). Disabled policies are skipped by the
187
+ gate entirely.
188
+
189
+ Changes invalidate the gate's policy cache immediately, so they take effect on
190
+ the next tool call.
@@ -0,0 +1,185 @@
1
+ # Worktree Workflow
2
+
3
+ > Applies to UAP v1.40.0
4
+
5
+ UAP runs agents — often many of them at once — against a single repository. The
6
+ worktree workflow exists to keep every edit an agent makes isolated on its own
7
+ branch and its own checkout, so that agent work never touches the project root
8
+ and parallel agents never collide. This guide explains why that matters, walks
9
+ through the full lifecycle, and documents every `uap worktree` subcommand.
10
+
11
+ The implementation lives in [`src/cli/worktree.ts`](../../src/cli/worktree.ts).
12
+
13
+ ## Why isolation matters
14
+
15
+ When an agent edits files directly in the project root, three problems appear:
16
+
17
+ - **Cross-contamination** — a half-finished change sits in the working tree
18
+ where the next operation (build, test, another agent) can trip over it.
19
+ - **No clean PR boundary** — there is no branch that contains *only* this unit
20
+ of work, so review and rollback become guesswork.
21
+ - **Parallel collisions** — two agents writing to the same files at the same
22
+ time produce corrupt, non-deterministic state.
23
+
24
+ A git worktree solves all three. Each feature gets its own directory under
25
+ `.worktrees/NNN-<slug>/` backed by its own branch (`feature/NNN-<slug>`). An
26
+ agent works entirely inside that directory; the project root stays clean, and
27
+ any number of worktrees can be active simultaneously without interfering.
28
+
29
+ ## The lifecycle
30
+
31
+ ```
32
+ uap worktree create <slug> # 1. isolate: new branch + checkout under .worktrees/NNN-<slug>/
33
+ cd .worktrees/NNN-<slug>/ # 2. work: all edits happen here
34
+ uap worktree pr <id> # 3. publish: sync with master, push, open a PR
35
+ uap worktree finish <id> # 4. land: sync, push, merge the PR, then clean up
36
+ uap worktree cleanup <id> # (or) manual teardown without merging
37
+ ```
38
+
39
+ 1. **Create** — `create` allocates the next numeric ID from a registry,
40
+ builds the branch name `feature/NNN-<slug>`, and runs
41
+ `git worktree add -b <branch> .worktrees/NNN-<slug> <base>`. The base branch
42
+ defaults to your current branch (override with `--from`). The new worktree is
43
+ recorded in a SQLite registry at `.uap/worktree_registry.db` so concurrent
44
+ `create` calls never race on the same ID.
45
+ 2. **Work** — `cd` into the worktree and make changes. Everything stays on the
46
+ feature branch and inside the worktree directory.
47
+ 3. **Publish** — `pr` syncs the branch with `origin/master` (a clean merge, or a
48
+ clear failure asking you to resolve conflicts in the worktree), pushes the
49
+ branch, and opens a PR via the `gh` CLI.
50
+ 4. **Land** — `finish` does the full sync → push → ensure-PR → merge sequence,
51
+ deletes the remote branch, then runs `cleanup` for you.
52
+ 5. **Clean up** — `cleanup` removes the worktree, deletes the local and remote
53
+ branches, and marks the registry entry as `cleaned`.
54
+
55
+ ## The enforcement gate
56
+
57
+ `uap worktree ensure --strict` is the gate used by CI and by the per-edit
58
+ hook. It checks whether the current working directory is inside a
59
+ `.worktrees/` path and exits non-zero if it is not:
60
+
61
+ ```bash
62
+ uap worktree ensure --strict # exit 0 inside a worktree, exit 1 otherwise
63
+ ```
64
+
65
+ In strict mode, when you are *not* in a worktree, it prints the remediation and
66
+ fails hard:
67
+
68
+ ```
69
+ NOT in a worktree. All file edits are prohibited.
70
+ Run: uap worktree create <slug>
71
+ Then: cd .worktrees/<id>-<slug>/
72
+ ```
73
+
74
+ Without `--strict`, `ensure` is advisory: it lists active worktrees (flagging
75
+ any sitting on `master`/`main`) and suggests next steps instead of exiting
76
+ non-zero. The strict variant is what you wire into a CI step or a pre-edit
77
+ check; the advisory variant is for interactive orientation.
78
+
79
+ The same `.worktrees/` containment is enforced at edit time by the
80
+ `worktree-required` policy enforcer — see [POLICIES.md](./POLICIES.md).
81
+
82
+ ## Parallel-agent safety
83
+
84
+ The numeric ID is allocated from the SQLite registry, not from a directory
85
+ scan, so two agents calling `create` at the same moment get distinct IDs and
86
+ distinct branches. Because each agent operates in its own worktree directory on
87
+ its own branch, their edits, builds, and commits are fully isolated — the only
88
+ shared point is `origin/master`, which each branch syncs against at `pr`/`finish`
89
+ time. This is what makes conflict-free parallel agent execution possible.
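The idea of allocating IDs from a database rather than a directory scan can be sketched with Python's built-in `sqlite3` module. The table schema, the function names, and the three-digit padding (inferred from the `NNN` placeholder) are assumptions; the real registry at `.uap/worktree_registry.db` may be laid out differently.

```python
import sqlite3
from typing import Tuple


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a registry with a hypothetical one-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS worktrees ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " slug TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'active')"
    )
    return conn


def allocate(conn: sqlite3.Connection, slug: str) -> Tuple[str, str]:
    """Reserve the next ID and return (branch_name, worktree_path)."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO worktrees (slug) VALUES (?)", (slug,))
        worktree_id = cur.lastrowid
    name = f"{worktree_id:03d}-{slug}"
    return f"feature/{name}", f".worktrees/{name}"
```

Because the database assigns the ID as part of the insert, two callers cannot be handed the same number, whereas scanning `.worktrees/` for the highest existing directory and adding one leaves a window in which both see the same maximum.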
90
+
91
+ ## Command reference
92
+
93
+ All commands are subcommands of `uap worktree`, registered in
94
+ [`src/bin/cli.ts`](../../src/bin/cli.ts) and implemented in
95
+ [`src/cli/worktree.ts`](../../src/cli/worktree.ts).
96
+
97
+ ### `create <slug>`
98
+
99
+ Create a new worktree and feature branch for `<slug>`.
100
+
101
+ | Flag | Description |
102
+ |------|-------------|
103
+ | `-f, --from <branch>` | Base branch (defaults to the current branch) |
104
+ | `-d, --description <description>` | Optional worktree description |
105
+
106
+ ```bash
107
+ uap worktree create add-user-auth
108
+ uap worktree create fix-login-bug --from master
109
+ ```
110
+
111
+ Produces a branch `feature/NNN-add-user-auth` and a checkout at
112
+ `.worktrees/NNN-add-user-auth/`, where `NNN` is the next zero-padded ID.
113
+
114
+ ### `list`
115
+
116
+ List all git worktrees under `.worktrees/`, with their ID, name, branch, and
117
+ path.
118
+
119
+ ```bash
120
+ uap worktree list
121
+ ```
122
+
123
+ ### `pr <id>`
124
+
125
+ Create a pull request from the worktree identified by `<id>`. Syncs the branch
126
+ with `origin/master`, pushes it, then runs `gh pr create --fill`.
127
+
128
+ | Flag | Description |
129
+ |------|-------------|
130
+ | `--draft` | Create the PR as a draft |
131
+
132
+ ```bash
133
+ uap worktree pr 7
134
+ uap worktree pr 7 --draft
135
+ ```
136
+
137
+ ### `finish <id>`
138
+
139
+ End-to-end landing: sync with `origin/master`, push, ensure a PR exists, merge
140
+ it (`gh pr merge --merge`), delete the remote branch, and then clean up the
141
+ worktree.
142
+
143
+ ```bash
144
+ uap worktree finish 7
145
+ ```
146
+
147
+ ### `cleanup <id>`
148
+
149
+ Remove the worktree directory, delete the local and remote branches, and mark the
150
+ registry entry as `cleaned`. Use this to tear down a worktree without merging.
151
+
152
+ ```bash
153
+ uap worktree cleanup 7
154
+ ```
155
+
156
+ ### `ensure`
157
+
158
+ Check whether you are working inside a worktree.
159
+
160
+ | Flag | Description |
161
+ |------|-------------|
162
+ | `--strict` | Exit with code 1 if not in a worktree (for use as a gate) |
163
+
164
+ ```bash
165
+ uap worktree ensure # advisory: list options
166
+ uap worktree ensure --strict # gate: exit non-zero if not in a worktree
167
+ ```
168
+
169
+ ### `prune`
170
+
171
+ Prune stale worktrees from the registry and disk.
172
+
173
+ | Flag | Description | Default |
174
+ |------|-------------|---------|
175
+ | `-o, --older-than <days>` | Only prune worktrees older than N days | `30` |
176
+ | `-f, --force` | Skip the confirmation prompt | off |
177
+ | `-n, --dry-run` | Preview without making changes | off |
178
+
179
+ ```bash
180
+ uap worktree prune --dry-run
181
+ uap worktree prune --older-than 14 --force
182
+ ```
183
+
184
+ Stale worktrees are selected by age from the registry; pruning deletes the
185
+ worktree directory and removes the registry row.