@miller-tech/uap 1.39.0 → 1.40.1
- package/README.md +109 -642
- package/dist/.tsbuildinfo +1 -1
- package/dist/bin/cli.js +2 -2
- package/dist/bin/cli.js.map +1 -1
- package/dist/cli/deliver.d.ts +3 -2
- package/dist/cli/deliver.d.ts.map +1 -1
- package/dist/cli/deliver.js +10 -5
- package/dist/cli/deliver.js.map +1 -1
- package/docs/INDEX.md +48 -286
- package/docs/architecture/OVERVIEW.md +328 -0
- package/docs/architecture/PROTOCOL.md +204 -0
- package/docs/benchmarks/README.md +17 -192
- package/docs/getting-started/CONFIGURATION.md +237 -0
- package/docs/getting-started/INSTALLATION.md +125 -0
- package/docs/getting-started/QUICKSTART.md +115 -0
- package/docs/guides/COORDINATION.md +162 -0
- package/docs/guides/DELIVER.md +115 -0
- package/docs/guides/DEPLOY_BATCHING.md +212 -0
- package/docs/guides/DROIDS_AND_SKILLS.md +202 -0
- package/docs/guides/LOCAL_MODELS.md +148 -0
- package/docs/guides/MCP_ROUTER.md +195 -0
- package/docs/guides/MEMORY.md +235 -0
- package/docs/guides/MULTI_MODEL.md +223 -0
- package/docs/guides/POLICIES.md +190 -0
- package/docs/guides/WORKTREE_WORKFLOW.md +185 -0
- package/docs/integrations/MCP_ROUTER.md +147 -0
- package/docs/integrations/RTK.md +102 -0
- package/docs/reference/API.md +485 -0
- package/docs/reference/CLI.md +719 -0
- package/docs/reference/CONFIGURATION.md +90 -193
- package/docs/reference/DATABASE_SCHEMA.md +110 -344
- package/docs/reference/FEATURES.md +176 -472
- package/docs/reference/PATTERNS.md +102 -0
- package/docs/reference/PLATFORMS.md +83 -0
- package/package.json +1 -1
- package/docs/AGENTS.md +0 -423
- package/docs/DOCUMENTATION_AUDIT_REPORT.md +0 -131
- package/docs/GETTING_STARTED.md +0 -288
- package/docs/PROJECT_ANALYSIS_REPORT.md +0 -510
- package/docs/architecture/COMPLETE_ARCHITECTURE.md +0 -748
- package/docs/architecture/EXPERT_STACK.md +0 -137
- package/docs/architecture/MULTI_MODEL.md +0 -224
- package/docs/architecture/PLATFORM_GATING.md +0 -68
- package/docs/architecture/SYSTEM_ANALYSIS.md +0 -334
- package/docs/architecture/UAP_COMPLIANCE.md +0 -217
- package/docs/architecture/UAP_PROTOCOL.md +0 -339
- package/docs/architecture/UAP_STRICT_DROIDS.md +0 -172
- package/docs/archive/BALLS_MODE_SELF_ANALYSIS.md +0 -260
- package/docs/archive/BENCHMARK_GAPS_AND_PLAN.md +0 -146
- package/docs/archive/FAILING_TASKS_SOLUTION_PLAN.md +0 -668
- package/docs/archive/JINJA2-SYSTEM-MESSAGE-FIX.md +0 -209
- package/docs/archive/MODEL_ROUTING_IMPLEMENTATION_SUMMARY.md +0 -281
- package/docs/archive/MODEL_ROUTING_OPTIMIZATION_PLAN.md +0 -320
- package/docs/archive/NPM-PUBLISH-V0.9.1.md +0 -240
- package/docs/archive/OPTIMIZATION_OPTIONS.md +0 -334
- package/docs/archive/PARALLELISM_GAPS_AND_OPTIONS.md +0 -422
- package/docs/archive/POLICY_GATE_IMPLEMENTATION.md +0 -245
- package/docs/archive/SETUP_IMPROVEMENTS.md +0 -213
- package/docs/archive/UAP_GENERIC_OPTIMIZATION_PLAN.md +0 -270
- package/docs/archive/UAP_OPTIMIZATION_PLAN.md +0 -701
- package/docs/archive/UAP_V103_PATTERN_DESIGN.md +0 -315
- package/docs/archive/UAP_V104_COMPLIANCE_DESIGN.md +0 -223
- package/docs/archive/changelog/2026-03-10_uap-100-compliance.md +0 -77
- package/docs/archive/changelog/2026-03-10_uap-full-system-verification.md +0 -109
- package/docs/archive/opencode-integration-guide.md +0 -740
- package/docs/archive/opencode-integration-quickref.md +0 -180
- package/docs/benchmarks/OVERNIGHT_RUNNER.md +0 -341
- package/docs/benchmarks/SPECULATIVE_DECODING_JOURNEY_2026-03.md +0 -221
- package/docs/benchmarks/VALIDATION_PLAN.md +0 -568
- package/docs/blog/SPECULATIVE_DECODING_PRODUCTION_PLAYBOOK.md +0 -139
- package/docs/blog/local-coding-agents.md +0 -266
- package/docs/blog/x-thread.md +0 -254
- package/docs/deployment/DEPLOYMENT.md +0 -895
- package/docs/deployment/DEPLOYMENT_STRATEGIES.md +0 -518
- package/docs/deployment/DEPLOY_BATCHER_ANALYSIS.md +0 -224
- package/docs/deployment/DEPLOY_BATCHING.md +0 -273
- package/docs/deployment/DEPLOY_BUCKETING_ANALYSIS.md +0 -420
- package/docs/deployment/QWEN35_LLAMA_CPP.md +0 -426
- package/docs/deployment/UAP_LLAMA_ANTHROPIC_PROXY_BOOTSTRAP.md +0 -279
- package/docs/getting-started/INTEGRATION.md +0 -628
- package/docs/getting-started/OVERVIEW.md +0 -324
- package/docs/getting-started/SETUP.md +0 -377
- package/docs/integrations/MCP_ROUTER_SETUP.md +0 -445
- package/docs/integrations/RTK_INTEGRATION.md +0 -468
- package/docs/operations/TROUBLESHOOTING.md +0 -660
- package/docs/pr/PR_SPECULATIVE_DOCS_TEMPLATE.md +0 -146
- package/docs/pr/UPSTREAM_PRS.md +0 -424
- package/docs/reference/API_REFERENCE.md +0 -903
- package/docs/reference/EXPERT_DROIDS.md +0 -219
- package/docs/reference/HARNESS-MATRIX.md +0 -318
- package/docs/reference/PATTERN_LIBRARY.md +0 -636
- package/docs/reference/UAP_CLI_REFERENCE.md +0 -620
- package/docs/research/BEHAVIORAL_PATTERNS.md +0 -228
- package/docs/research/DOMAIN_STRATEGIES.md +0 -316
- package/docs/research/MEMORY_SYSTEMS_COMPARISON.md +0 -812
- package/docs/research/PATTERN_ANALYSIS_2026-01-18.md +0 -436
- package/docs/research/PERFORMANCE_ANALYSIS_2026-01-18.md +0 -209
- package/docs/research/PERFORMANCE_TEST_PLAN.md +0 -383
- package/docs/research/TERMINAL_BENCH_LEARNINGS.md +0 -217
@@ -0,0 +1,223 @@

# Multi-Model Routing

> Applies to UAP **v1.40.0**

UAP runs agentic work across multiple LLMs instead of one. A high-capability
model plans, a cheaper or local model executes, and a reviewer model checks the
result. Routing decisions are made per task (and per subtask) so you pay for
expensive reasoning only when the work actually needs it.

## Why multi-model

A single frontier model is the simplest setup, but most of the tokens an agent
spends are on routine execution (applying an edit, running a tool, writing a
test), not on hard reasoning. Sending all of that to a premium model is
expensive and slow.

Multi-model routing lets you:

- Use a strong **planner** (e.g. Claude Opus) for decomposition and review.
- Use a cheap or **local executor** (e.g. Qwen 3.5 on llama.cpp) for the bulk
  of the work, at near-zero marginal cost.
- Fall back to another model automatically when the executor struggles.
- Pick a routing strategy that trades cost against quality on your terms.

## The 3-tier plan → route → execute flow

UAP separates planning, routing, and execution into three components:

1. **TaskPlanner** (`src/models/planner.ts`): decomposes a task into subtasks.
   It classifies the task's complexity, and for non-trivial work it breaks the
   task into ordered subtasks with inputs, outputs, and constraints. Every plan
   is auto-validated by a plan validator (`src/models/plan-validator.ts`) at all
   complexity levels before it is returned.

2. **ModelRouter** (`src/models/router.ts`): assigns a model to the overall
   task and to each subtask. It classifies the task by complexity and type,
   applies the routing rules, and selects a model for the matched role
   (planner / executor / reviewer / fallback). Classification results are cached
   to avoid repeated work on near-identical task descriptions.

3. **TaskExecutor** (`src/models/executor.ts`): executes the plan. Subtasks run
   with a bounded level of parallelism, each call has retry-with-backoff logic
   (`retryDelayMs`, default 1000 ms), and failed attempts feed retry context
   into subsequent tries. The executor produces per-subtask results and a
   run summary.

The planner, router, and executor are wired together by the CLI and the
programmatic API
(`createRouter`, `createPlanner`, `createExecutor` from `src/models/index.js`).
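
The executor's retry behavior described above can be illustrated with a minimal Python sketch. This is not the code in `src/models/executor.ts`: the function name, the `max_attempts` value, and the doubling backoff schedule are assumptions made for illustration. Only the 1000 ms base delay and the idea of feeding earlier failures into later attempts come from the text.

```python
import time


def run_with_retry(call, max_attempts=3, retry_delay_ms=1000, sleep=time.sleep):
    """Call `call(retry_context)` until it succeeds or attempts run out.

    `retry_context` is a list of messages describing earlier failures, so
    each new attempt can see what went wrong before (hypothetical shape).
    """
    retry_context = []
    for attempt in range(1, max_attempts + 1):
        try:
            return call(retry_context)
        except Exception as exc:
            retry_context.append(f"attempt {attempt}: {exc}")
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Assumed schedule: 1x, 2x, 4x ... the base delay, in seconds
            sleep(retry_delay_ms * 2 ** (attempt - 1) / 1000)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production executor would more likely use an async timer.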

## The model presets

The router ships with built-in presets (`ModelPresets` in
`src/models/types.ts`). These are the ids you reference in role assignments and
in `uap model` output:

| Preset id       | Name                         | Provider  | Context | $/1M in | $/1M out | Capabilities                                                            |
| --------------- | ---------------------------- | --------- | ------- | ------- | -------- | ----------------------------------------------------------------------- |
| `opus-4.6`      | Claude Opus 4.6              | anthropic | 200,000 | 7.5     | 37.5     | planning, complex-reasoning, code-generation, review, advanced-planning |
| `sonnet-4.6`    | Claude Sonnet 4.6            | anthropic | 200,000 | 3.0     | 15.0     | code-generation, execution, review, agentic                             |
| `haiku`         | Claude Haiku (Latest)        | anthropic | 200,000 | 0.8     | 4.0      | code-generation, execution, simple-tasks                                |
| `qwen35-a3b`    | Qwen 3.5 35B A3B (llama.cpp) | custom    | 262,144 | 0       | 0        | code-generation, execution, planning, simple-tasks                      |
| `gpt-5.4`       | GPT 5.4                      | openai    | 128,000 | 2.5     | 10.0     | planning, code-generation, complex-reasoning                            |
| `gpt-5.3-codex` | GPT 5.3 Codex                | openai    | 192,000 | 3.0     | 12.0     | code-generation, execution, complex-reasoning, agentic                  |

Run `uap model presets` to print the live list.
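
To make the pricing columns concrete, here is a small Python sketch of the arithmetic. The prices are copied from the table above; the function name and the token counts are hypothetical, and UAP's own cost comparison may account for more than this.

```python
# (input $/1M tokens, output $/1M tokens), copied from the preset table
PRICES = {
    "opus-4.6": (7.5, 37.5),
    "sonnet-4.6": (3.0, 15.0),
    "haiku": (0.8, 4.0),
    "qwen35-a3b": (0.0, 0.0),
}


def estimate_cost(preset, input_tokens, output_tokens):
    """Estimated dollars for one call: tokens / 1M times the per-1M price."""
    price_in, price_out = PRICES[preset]
    return (input_tokens / 1_000_000) * price_in + (output_tokens / 1_000_000) * price_out


# Hypothetical workload: 100,000 input tokens and 20,000 output tokens
for preset in PRICES:
    print(f"{preset}: ${estimate_cost(preset, 100_000, 20_000):.2f}")
```

For that workload the estimate is $1.50 on `opus-4.6`, $0.60 on `sonnet-4.6`, $0.16 on `haiku`, and $0.00 on the local `qwen35-a3b`, which is the gap that routing execution to a cheaper model exploits.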

### Runtime profiles

In addition to the presets above, UAP ships seven detailed JSON **profiles** in
[`config/model-profiles/`](../../config/model-profiles/). These carry richer
runtime settings the presets lack: pricing tiers, rate limits, tool-calling
options, extended-thinking budgets, server-optimization flags, and ready-to-run
launch commands:

| Profile (`_profile`) | `model` id                   | Provider              | Context |
| -------------------- | ---------------------------- | --------------------- | ------- |
| `claude-opus-4.6`    | `claude-opus-4-6-20250616`   | anthropic             | 200,000 |
| `claude-sonnet-4.6`  | `claude-sonnet-4-6-20250514` | anthropic             | 200,000 |
| `claude-haiku-3.5`   | `claude-3-5-haiku-20241022`  | anthropic             | 200,000 |
| `gpt-5.4`            | `gpt-5.4`                    | openai                | 128,000 |
| `gpt-5.3-codex`      | `gpt-5.3-codex`              | openai                | 192,000 |
| `qwen35`             | `qwen3.5-a3b-iq4xs`          | custom (llama.cpp)    | 262,144 |
| `generic`            | `default`                    | any OpenAI-compatible | 32,768  |

The active profile is selected by the `UAP_MODEL_PROFILE` environment variable
(defaults to `generic`). The loader lives in `src/models/profile-loader.ts`.

## How routing decides

The router first classifies the task, then applies rules.

**Complexity** is inferred from keywords. Examples (from
`COMPLEXITY_KEYWORDS` in `src/models/router.ts`):

- `critical`: security, authentication, authorization, deployment, migration,
  production, database, encryption, credentials, secrets
- `high`: architecture, design, refactor, performance, optimization,
  algorithm, distributed, concurrent, multi-step, complex
- `medium`: feature, implement, add, create, update, integrate, api, endpoint
- `low`: fix, typo, comment, rename, format, style, simple, minor, quick,
  documentation

**Task type** is inferred similarly: `planning`, `coding`, `refactoring`,
`bug-fix`, `review`, `documentation`.

**Routing rules** (`DEFAULT_ROUTING_RULES`) map complexity/type to a role by
priority (higher wins):

| Match                                                     | Role     | Priority |
| --------------------------------------------------------- | -------- | -------- |
| complexity `critical`                                     | planner  | 100      |
| keywords: security, authentication, deployment, migration | planner  | 90       |
| complexity `high`                                         | planner  | 80       |
| keywords: architecture, design, refactor                  | planner  | 70       |
| task type `planning`                                      | planner  | 70       |
| task type `review`                                        | reviewer | 60       |
| complexity `medium`                                       | executor | 50       |
| task type `coding`                                        | executor | 50       |
| task type `bug-fix`                                       | executor | 50       |
| complexity `low`                                          | executor | 30       |
| task type `documentation`                                 | executor | 30       |

The matched role is resolved to a concrete model via your role assignments.
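
The classification and priority rules above can be approximated in a few lines of Python. This is a sketch, not the logic in `src/models/router.ts`: the keyword lists are abbreviated, the keyword-based rule rows are left out, and three behaviors are assumptions (whole-word matching, the most severe matching level winning, and the `medium` / `executor` defaults when nothing matches).

```python
import re

# Abbreviated keyword lists taken from the examples above
COMPLEXITY_KEYWORDS = {
    "critical": ["security", "authentication", "deployment", "migration"],
    "high": ["architecture", "design", "refactor", "performance"],
    "medium": ["feature", "implement", "add", "create", "update"],
    "low": ["fix", "typo", "comment", "rename", "documentation"],
}
LEVELS = ["critical", "high", "medium", "low"]  # checked most severe first

# (kind, value, role, priority) rows from the rules table
RULES = [
    ("complexity", "critical", "planner", 100),
    ("complexity", "high", "planner", 80),
    ("type", "planning", "planner", 70),
    ("type", "review", "reviewer", 60),
    ("complexity", "medium", "executor", 50),
    ("type", "coding", "executor", 50),
    ("type", "bug-fix", "executor", 50),
    ("complexity", "low", "executor", 30),
    ("type", "documentation", "executor", 30),
]


def classify_complexity(task, default="medium"):
    """Return the most severe level whose keywords appear as whole words."""
    words = set(re.findall(r"[a-z0-9-]+", task.lower()))
    for level in LEVELS:
        if words & set(COMPLEXITY_KEYWORDS[level]):
            return level
    return default  # assumed default when no keyword matches


def route(complexity, task_type):
    """Pick the role of the highest-priority rule that matches."""
    facts = {"complexity": complexity, "type": task_type}
    matches = [rule for rule in RULES if facts.get(rule[0]) == rule[1]]
    if not matches:
        return "executor"  # assumed default role
    return max(matches, key=lambda rule: rule[3])[2]
```

With these rules a low-complexity review still goes to the reviewer, because the `review` rule (priority 60) outranks the `low` rule (priority 30).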

**Routing strategy** further shapes selection. Four strategies are supported
(`routingStrategy`, default `balanced`):

- `cost-optimized`: minimize cost, use the cheapest capable model
- `performance-first`: maximize quality, use the best model
- `balanced`: balance cost and performance (default)
- `adaptive`: learn from task results over time

## The `uap model` CLI

All subcommands are defined in `src/cli/model.ts`.

```bash
uap model status              # show configured models, role assignments, strategy
uap model route <task>        # analyze how a task would be routed
uap model route <task> -v     # + matched rules and cost comparison
uap model plan <task>         # build an execution plan (decomposition + assignments)
uap model plan <task> -v      # + per-subtask detail
uap model plan <task> -e      # execute the plan (mock client unless API keys set)
uap model compare             # compare cost/performance across sample configs
uap model presets             # list all built-in model presets
uap model select              # interactively assign models to each role
uap model select --save       # persist the selection to .uap.json
uap model export              # print current config as JSON
uap model export -f yaml      # ... or YAML
uap model health              # validate that assigned models exist and resolve
```

Example (see how a task routes):

```bash
uap model route "add OAuth2 login with JWT sessions" --verbose
```

This prints the inferred complexity, task type, the selected and fallback
models, the matched rules, and an estimated cost comparison.

## Configuring profiles

### Role assignments

Configure the multi-model setup under `multiModel` in your `.uap.json`. The
default configuration is:

```json
{
  "multiModel": {
    "enabled": true,
    "models": ["opus-4.6", "qwen35-a3b"],
    "roles": {
      "planner": "opus-4.6",
      "executor": "qwen35-a3b",
      "fallback": "qwen35-a3b"
    },
    "routingStrategy": "balanced"
  }
}
```

A `reviewer` role is also supported; if unset it falls back to the planner.
You can add `costOptimization` (with `targetReduction`,
`maxPerformanceDegradation`, and `fallbackThreshold`) when using a
cost-oriented strategy.
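
The reviewer fallback can be pictured with a short Python sketch that reads the default configuration shown above. The `resolve_model` helper is hypothetical and exists only to illustrate the lookup; UAP resolves roles inside its own router.

```python
import json

# The default configuration from this section, inlined for the example
CONFIG = json.loads("""
{
  "multiModel": {
    "enabled": true,
    "models": ["opus-4.6", "qwen35-a3b"],
    "roles": {
      "planner": "opus-4.6",
      "executor": "qwen35-a3b",
      "fallback": "qwen35-a3b"
    },
    "routingStrategy": "balanced"
  }
}
""")


def resolve_model(config, role):
    """Map a role to a preset id; an unset reviewer falls back to the planner."""
    roles = config["multiModel"]["roles"]
    if role == "reviewer":
        return roles.get("reviewer", roles["planner"])
    return roles[role]


print(resolve_model(CONFIG, "reviewer"))
```

Because the default configuration has no `reviewer` entry, the lookup returns the planner's preset, `opus-4.6`.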

The fastest way to edit this is interactively:

```bash
uap model select --save
```

### Runtime profile + endpoints

Pick a runtime profile and provide credentials/endpoints via environment
variables (see each file's `running_config` in
[`config/model-profiles/`](../../config/model-profiles/)):

```bash
# Anthropic-hosted models
export ANTHROPIC_API_KEY=<your-key>
export UAP_MODEL_PROFILE=claude-opus-4.6

# OpenAI-hosted models
export OPENAI_API_KEY=<your-key>
export UAP_MODEL_PROFILE=gpt-5.4

# Local / any OpenAI-compatible server
export TARGET_URL=http://127.0.0.1:8080
export UAP_MODEL_PROFILE=generic
```

To customize a model's runtime behavior (temperature, tool-call batching,
extended-thinking budget, rate limits, or server-optimization flags), edit the
corresponding JSON file in `config/model-profiles/`. Each file is documented
inline with `_comment` fields.

## See also

- [Droids and Skills](./DROIDS_AND_SKILLS.md): specialist agents and reusable
  workflows that run on top of the routed models.

@@ -0,0 +1,190 @@

# Policies

> Applies to UAP v1.40.0

UAP policies are **executable gates, not prose**. Each policy can carry a Python
enforcer that inspects an operation and decides whether it may proceed. A
`PreToolUse` hook queries the policy store and runs the relevant enforcers
before a tool call executes; an enforcer that exits with status `2` blocks the
call.

The engine lives in
[`src/policies/policy-gate.ts`](../../src/policies/policy-gate.ts); enforcers
live in [`src/policies/enforcers/`](../../src/policies/enforcers/); the CLI is
in [`src/cli/policy.ts`](../../src/cli/policy.ts).

## The policy-gate model

The flow is **hook to DB to enforcer to block**:

1. **Hook**: A `PreToolUse` hook fires before a tool call (Edit, Write, Bash,
   etc.). Tools registered through the enforced tool router
   ([`src/policies/enforced-tool-router.ts`](../../src/policies/enforced-tool-router.ts))
   are automatically routed through the policy gate.
2. **DB**: The gate
   ([`PolicyGate`](../../src/policies/policy-gate.ts)) loads all active policies
   from the policy store (a SQLite-backed DB, cached with a short TTL) and
   filters them to the ones matching the current enforcement stage
   (`pre-exec`, `post-exec`, `review`, or `always`).
3. **Enforcer**: Each matching policy that has an attached Python enforcer is
   invoked as `python3 <enforcer>.py --operation <op> --args <json>`. Enforcers
   receive the operation name and its arguments and return a JSON verdict.
4. **Block**: An enforcer emits `{"allowed": true, ...}` and exits `0` to
   allow, or `{"allowed": false, "reason": ...}` and **exits `2` to block** (see
   the shared `emit()` helper in
   [`src/policies/enforcers/_common.py`](../../src/policies/enforcers/_common.py)).
   When a `REQUIRED` policy blocks, the gate raises a `PolicyViolationError` and
   the tool call never runs. Every check is written to an audit trail.
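
As an illustration of the enforcer contract in steps 3 and 4, here is a minimal Python sketch of a worktree-style check. It is not one of the shipped enforcers and does not use the real `_common.py` helper, whose signature is not shown here. The function names, the reason strings, and the choice of gated operations are assumptions; the `--operation` / `--args` flags, the `file_path` argument, the JSON verdict, and the exit codes `0` and `2` follow the description above.

```python
import argparse
import json

GATED_OPERATIONS = {"Edit", "Write", "MultiEdit"}  # assumed set of mutating tools


def verdict(operation, args):
    """Return (allowed, reason) for a single tool call."""
    if operation not in GATED_OPERATIONS:
        return True, "operation is not gated"
    path = str(args.get("file_path", ""))
    parts = path.replace("\\", "/").split("/")
    if ".worktrees" in parts:
        return True, "path is inside a worktree"
    return False, f"{path!r} is outside .worktrees/"


def main(argv):
    """Parse the enforcer flags, print the JSON verdict, return the exit code."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--operation", required=True)
    parser.add_argument("--args", default="{}")
    ns = parser.parse_args(argv)
    allowed, reason = verdict(ns.operation, json.loads(ns.args))
    print(json.dumps({"allowed": allowed, "reason": reason}))
    return 0 if allowed else 2  # exit status 2 signals "block"


# A standalone script would end with: sys.exit(main(sys.argv[1:]))
exit_code = main(["--operation", "Write", "--args", '{"file_path": "src/x.ts"}'])
```

Keeping the decision in `verdict()` and the process plumbing in `main()` makes the rule easy to unit-test without spawning a subprocess. The sketch leaves out overrides such as `UAP_NO_WORKTREE=1`.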

Task-completion operations (anything that looks like merge / deploy / release /
"mark done") are automatically re-checked at the `review` stage, so completion
gates fire even if the operation was issued at `pre-exec`.

### Cooperative-guardrail caveat

The policy gate is a **cooperative-agent guardrail, not a hard security
boundary.** It steers well-behaved agents away from unsafe or out-of-process
actions; it does not sandbox a hostile process. Enforcers also honor explicit
overrides (for example the `worktree-required` enforcer respects
`UAP_NO_WORKTREE=1`). Treat policies as guardrails that keep cooperating agents
on the rails, not as a containment mechanism against untrusted code.

## The enforcers

The enforcers in
[`src/policies/enforcers/`](../../src/policies/enforcers/) group as follows.
`_common.py` is shared helper code, not an enforcer.

### Workflow & isolation

| Enforcer | What it gates |
|----------|---------------|
| `worktree_required` | Edit/Write/MultiEdit must target a `.worktrees/` path |
| `task_required` | A UAP task must be `in_progress` before mutating work |
| `coord_overlap` | Checks for in-flight agent path reservations (parallel-agent overlap) |
| `delivery_enforcement` | Route substantive coding through `uap deliver` |

### Plan discipline

| Enforcer | What it gates |
|----------|---------------|
| `memory_before_plan` | Plans require a recent `uap memory query` |
| `codebase_read_before_plan` | Plans require prior reads of the target paths |
| `validate_plan_before_build` | A plan must be validated before building |

### Quality & review gates

| Enforcer | What it gates |
|----------|---------------|
| `test_gate` | Changed services need accompanying test deltas |
| `schema_diff_gate` | Schema/pool changes must pass `uap schema-diff` |
| `expert_review_required` | A parallel expert review must precede ship |
| `architecture_review` | Merge / PR-ready operations need an architecture review when the diff warrants it |

### Hygiene & artifacts

| Enforcer | What it gates |
|----------|---------------|
| `artifact_hygiene` | Block binary artifacts outside curated directories |
| `doc_live_over_report` | Block new `*_REPORT` / `*_COMPLETE` / `*_SUMMARY` / `*_PLAN` markdown files |
| `session_memory_write` | Code-changing sessions must write a lesson to memory |

### Tooling & routing

| Enforcer | What it gates |
|----------|---------------|
| `mcp_router_first` | MCP tools must be loaded on demand |
| `rtk_wrap` | Heavy CLIs must be invoked via `rtk` |
| `parallel_reads` | Nudge when serial read fan-out is detected |

### Infrastructure

| Enforcer | What it gates |
|----------|---------------|
| `iac_parity` | Live-state changes must have a matching infrastructure-as-code diff |
| `cluster_routing` | Cluster tooling context must match the component domain |

> The `architecture_review` enforcer file is stored with a policy-ID prefix
> (`<uuid>_architecture_review.py`) because it is attached to a specific
> installed policy; the others are named directly after their policy slug.

## The `uap policy` CLI

All commands are subcommands of `uap policy`, implemented in
[`src/cli/policy.ts`](../../src/cli/policy.ts).

### Inspect

```bash
uap policy list      # list all policies with status, level, category, stage, version
uap policy status    # summary of enabled/disabled plus enforcement stages
```

### Install & attach

`install` reads a built-in policy markdown file and stores it. If a Python
enforcer with the matching name lives in `src/policies/enforcers/`, it is
auto-attached.

```bash
uap policy install worktree-enforcement
```

Add a policy from an arbitrary markdown file, or attach tool code to an existing
policy:

```bash
uap policy add --file ./my-policy.md --category custom --level RECOMMENDED --tags "a,b"
uap policy add-tool --policy <id> --tool <name> --code ./enforcer.py
```

### Enable / disable / toggle

```bash
uap policy enable <id>     # turn a policy on
uap policy disable <id>    # turn a policy off
uap policy toggle <id>     # flip current state
uap policy toggle <id> --on
uap policy toggle <id> --off
```

### Tune enforcement

```bash
uap policy level <id> --level REQUIRED    # REQUIRED | RECOMMENDED | OPTIONAL
uap policy stage <id> --stage pre-exec    # pre-exec | post-exec | review | always
```

Only `REQUIRED` policies can block an operation; `RECOMMENDED` and `OPTIONAL`
checks are recorded but do not deny the call.
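
How level, stage, and enabled state combine can be shown with a small Python sketch. It is a simplified model, not the `PolicyGate` implementation: the `Policy` dataclass and `gate()` function are hypothetical, and it assumes that a policy staged as `always` applies at every stage.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    level: str            # REQUIRED | RECOMMENDED | OPTIONAL
    stage: str            # pre-exec | post-exec | review | always
    enabled: bool = True


def applies(policy, stage):
    """Disabled policies are skipped; `always` is assumed to match any stage."""
    return policy.enabled and policy.stage in (stage, "always")


def gate(policies, stage, verdicts):
    """Return the names of policies that block the call.

    `verdicts` maps policy name -> allowed (bool), as reported by its enforcer.
    A failed check blocks only when the policy level is REQUIRED.
    """
    blocked_by = []
    for policy in policies:
        if not applies(policy, stage):
            continue
        allowed = verdicts.get(policy.name, True)
        if not allowed and policy.level == "REQUIRED":
            blocked_by.append(policy.name)
    return blocked_by
```

An empty result means the call may proceed; failed `RECOMMENDED` and `OPTIONAL` checks would only be written to the audit trail.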

### Check & audit

```bash
uap policy check --operation Write --args '{"file_path":"src/x.ts"}'    # dry-run a gate
uap policy audit --limit 20                                             # enforcement audit trail
uap policy audit --policy <id>                                          # filter to one policy
```

### Other

```bash
uap policy get-relevant --task "ship the api" --top 3      # context-relevant policies
uap policy convert --input <id|file.md> --output out.md    # render to CLAUDE.md format
```

## How to add, enable, and disable a policy

1. **Author** a policy markdown file (and, optionally, a Python enforcer named
   after the policy slug with hyphens replaced by underscores).
2. **Install / add** it: `uap policy install <name>` for a built-in, or
   `uap policy add --file <path>` for a custom one. A co-located enforcer is
   auto-attached on install; otherwise attach it with `uap policy add-tool`.
3. **Set its teeth**: `uap policy level <id> --level REQUIRED` so it can block,
   and `uap policy stage <id> --stage <stage>` to choose when it fires.
4. **Enable / disable** at any time with `uap policy enable <id>` /
   `uap policy disable <id>` (or `toggle`). Disabled policies are skipped by the
   gate entirely.

Changes invalidate the gate's policy cache immediately, so they take effect on
the next tool call.

@@ -0,0 +1,185 @@

# Worktree Workflow

> Applies to UAP v1.40.0

UAP runs agents, often many of them at once, against a single repository. The
worktree workflow exists to keep every edit an agent makes isolated on its own
branch and its own checkout, so that agent work never touches the project root
and parallel agents never collide. This guide explains why that matters, walks
through the full lifecycle, and documents every `uap worktree` subcommand.

The implementation lives in [`src/cli/worktree.ts`](../../src/cli/worktree.ts).

## Why isolation matters

When an agent edits files directly in the project root, three problems appear:

- **Cross-contamination**: a half-finished change sits in the working tree
  where the next operation (build, test, another agent) can trip over it.
- **No clean PR boundary**: there is no branch that contains *only* this unit
  of work, so review and rollback become guesswork.
- **Parallel collisions**: two agents writing to the same files at the same
  time produce corrupt, non-deterministic state.

A git worktree solves all three. Each feature gets its own directory under
`.worktrees/NNN-<slug>/` backed by its own branch (`feature/NNN-<slug>`). An
agent works entirely inside that directory; the project root stays clean, and
any number of worktrees can be active simultaneously without interfering.

## The lifecycle

```bash
uap worktree create <slug>    # 1. isolate: new branch + checkout under .worktrees/NNN-<slug>/
cd .worktrees/NNN-<slug>/     # 2. work: all edits happen here
uap worktree pr <id>          # 3. publish: sync with master, push, open a PR
uap worktree finish <id>      # 4. land: sync, push, merge the PR, then clean up
# (or) uap worktree cleanup <id>    # manual teardown without merging
```

1. **Create**: `create` allocates the next numeric ID from a registry,
   builds the branch name `feature/NNN-<slug>`, and runs
   `git worktree add -b <branch> .worktrees/NNN-<slug> <base>`. The base branch
   defaults to your current branch (override with `--from`). The new worktree is
   recorded in a SQLite registry at `.uap/worktree_registry.db` so concurrent
   `create` calls never race on the same ID.
2. **Work**: `cd` into the worktree and make changes. Everything stays on the
   feature branch and inside the worktree directory.
3. **Publish**: `pr` syncs the branch with `origin/master` (a clean merge, or a
   clear failure asking you to resolve conflicts in the worktree), pushes the
   branch, and opens a PR via the `gh` CLI.
4. **Land**: `finish` does the full sync → push → ensure-PR → merge sequence,
   deletes the remote branch, then runs `cleanup` for you.
5. **Clean up**: `cleanup` removes the worktree, deletes the local and remote
   branch, and marks the registry entry as `cleaned`.

## The enforcement gate

`uap worktree ensure --strict` is the gate used by CI and by the per-edit
hook. It checks whether the current working directory is inside a
`.worktrees/` path and exits non-zero if it is not:

```bash
uap worktree ensure --strict    # exit 0 inside a worktree, exit 1 otherwise
```

In strict mode, when you are *not* in a worktree, it prints the remediation and
fails hard:

```
NOT in a worktree. All file edits are prohibited.
Run: uap worktree create <slug>
Then: cd .worktrees/<id>-<slug>/
```

Without `--strict`, `ensure` is advisory: it lists active worktrees (flagging
any sitting on `master`/`main`) and suggests next steps instead of exiting
non-zero. The strict variant is what you wire into a CI step or a pre-edit
check; the advisory variant is for interactive orientation.

The same `.worktrees/` containment is enforced at edit time by the
`worktree-required` policy enforcer; see [POLICIES.md](./POLICIES.md).

## Parallel-agent safety

The numeric ID is allocated from the SQLite registry, not from a directory
scan, so two agents calling `create` at the same moment get distinct IDs and
distinct branches. Because each agent operates in its own worktree directory on
its own branch, their edits, builds, and commits are fully isolated. The only
shared point is `origin/master`, which each branch syncs against at `pr`/`finish`
time. This keeps parallel agents from colliding in the working tree; overlapping
changes still surface as merge conflicts at that sync step, where they are
resolved inside the worktree.
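
One way to picture registry-based ID allocation is the Python sketch below. The table name and columns are hypothetical and are not the schema of `.uap/worktree_registry.db`; the sketch only shows why letting the database assign the ID avoids the race that a directory scan would have.

```python
import sqlite3


def open_registry(path):
    """Open (or create) a registry database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS worktrees ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " slug TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'active')"
    )
    return conn


def allocate_id(conn, slug):
    """Insert a row and return the ID that SQLite assigned to it.

    The INSERT runs under SQLite's write lock, so two writers cannot be
    handed the same ID, unlike "scan the directory and add one".
    """
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute("INSERT INTO worktrees (slug) VALUES (?)", (slug,))
    return cursor.lastrowid


registry = open_registry(":memory:")  # in-memory database, for the example only
first = allocate_id(registry, "add-user-auth")
second = allocate_id(registry, "fix-login-bug")
```

Here `first` and `second` are distinct because each comes from its own committed insert.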

## Command reference

All commands are subcommands of `uap worktree`, registered in
[`src/bin/cli.ts`](../../src/bin/cli.ts) and implemented in
[`src/cli/worktree.ts`](../../src/cli/worktree.ts).

### `create <slug>`

Create a new worktree and feature branch for `<slug>`.

| Flag | Description |
|------|-------------|
| `-f, --from <branch>` | Base branch (defaults to the current branch) |
| `-d, --description <description>` | Optional worktree description |

```bash
uap worktree create add-user-auth
uap worktree create fix-login-bug --from master
```

Produces a branch `feature/NNN-add-user-auth` and a checkout at
`.worktrees/NNN-add-user-auth/`, where `NNN` is the next zero-padded ID.
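
The naming scheme can be sketched in Python as follows. The helper names are hypothetical, and the three-digit padding is an assumption read off the `NNN` placeholder; the branch prefix, directory layout, and `git worktree add -b <branch> <path> <base>` form come from this guide.

```python
def worktree_names(worktree_id, slug, width=3):
    """Build the branch name and checkout path for a worktree ID and slug."""
    name = f"{worktree_id:0{width}d}-{slug}"  # e.g. 7 -> "007-<slug>" (assumed width)
    return f"feature/{name}", f".worktrees/{name}"


def create_command(worktree_id, slug, base):
    """Return the git invocation described above as an argument list."""
    branch, path = worktree_names(worktree_id, slug)
    return ["git", "worktree", "add", "-b", branch, path, base]


print(" ".join(create_command(7, "add-user-auth", "master")))
```

Returning an argument list rather than a shell string means the command can be passed to `subprocess.run` without quoting concerns.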

### `list`

List all git worktrees under `.worktrees/`, with their ID, name, branch, and
path.

```bash
uap worktree list
```

### `pr <id>`

Create a pull request from the worktree identified by `<id>`. Syncs the branch
with `origin/master`, pushes it, then runs `gh pr create --fill`.

| Flag | Description |
|------|-------------|
| `--draft` | Create the PR as a draft |

```bash
uap worktree pr 7
uap worktree pr 7 --draft
```

### `finish <id>`

End-to-end landing: sync with `origin/master`, push, ensure a PR exists, merge
it (`gh pr merge --merge`), delete the remote branch, and then clean up the
worktree.

```bash
uap worktree finish 7
```

### `cleanup <id>`

Remove the worktree directory, delete the local and remote branch, and mark the
registry entry as `cleaned`. Use this to tear down a worktree without merging.

```bash
uap worktree cleanup 7
```

### `ensure`

Check whether you are working inside a worktree.

| Flag | Description |
|------|-------------|
| `--strict` | Exit with code 1 if not in a worktree (for use as a gate) |

```bash
uap worktree ensure             # advisory: list options
uap worktree ensure --strict    # gate: exit non-zero if not in a worktree
```

### `prune`

Prune stale worktrees from the registry and disk.

| Flag | Description | Default |
|------|-------------|---------|
| `-o, --older-than <days>` | Only prune worktrees older than N days | `30` |
| `-f, --force` | Skip the confirmation prompt | off |
| `-n, --dry-run` | Preview without making changes | off |

```bash
uap worktree prune --dry-run
uap worktree prune --older-than 14 --force
```

Stale worktrees are selected by age from the registry; pruning deletes the
worktree directory and removes the registry row.
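
The age-based selection can be sketched in Python as below. The `created_at` field and the entry shape are hypothetical; only the "older than N days, default 30" rule comes from the flag table.

```python
from datetime import datetime, timedelta, timezone


def stale_entries(entries, older_than_days=30, now=None):
    """Return the entries created more than `older_than_days` days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=older_than_days)
    return [entry for entry in entries if entry["created_at"] < cutoff]


# Hypothetical registry rows, with a fixed "now" so the example is repeatable
NOW = datetime(2026, 4, 1, tzinfo=timezone.utc)
ENTRIES = [
    {"id": 7, "created_at": datetime(2026, 2, 1, tzinfo=timezone.utc)},
    {"id": 8, "created_at": datetime(2026, 3, 25, tzinfo=timezone.utc)},
]
stale = stale_entries(ENTRIES, older_than_days=30, now=NOW)
```

With the default 30 days only the February entry is selected; a dry run would list that selection and stop before deleting anything.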