@miller-tech/uap 1.20.32 → 1.20.34

Files changed (46)
  1. package/config/model-profiles/qwen35.json +6 -5
  2. package/dist/.tsbuildinfo +1 -1
  3. package/dist/bin/cli.js +6 -1
  4. package/dist/bin/cli.js.map +1 -1
  5. package/dist/cli/hooks.js +30 -7
  6. package/dist/cli/hooks.js.map +1 -1
  7. package/dist/cli/policy.d.ts.map +1 -1
  8. package/dist/cli/policy.js +26 -0
  9. package/dist/cli/policy.js.map +1 -1
  10. package/dist/dashboard/data-seeder.d.ts.map +1 -1
  11. package/dist/dashboard/data-seeder.js +72 -3
  12. package/dist/dashboard/data-seeder.js.map +1 -1
  13. package/dist/dashboard/data-service.js +1 -1
  14. package/dist/dashboard/data-service.js.map +1 -1
  15. package/dist/dashboard/server.js +1 -1
  16. package/dist/dashboard/server.js.map +1 -1
  17. package/dist/index.d.ts +15 -1
  18. package/dist/index.d.ts.map +1 -1
  19. package/dist/index.js +14 -0
  20. package/dist/index.js.map +1 -1
  21. package/dist/types/index.d.ts +20 -0
  22. package/dist/types/index.d.ts.map +1 -1
  23. package/dist/types/index.js +20 -0
  24. package/dist/types/index.js.map +1 -1
  25. package/docs/AGENTS.md +423 -0
  26. package/docs/AGENTS.md</path>CLAUDE.md</path>/home/cogtek/dev/miller-tech/universal-agent-protocol/docs/INDEX.md</path>/home/cogtek/dev/miller-tech/universal-agent-protocol/docs/reference/API_REFERENCE.md</path>/home/cogtek/dev/miller-tech/universal-agent-protocol/docs/reference/UAP_CLI_REFERENCE.md</path>src/index.ts</path>/src/cli/worktree.ts</path>/src/coordination/deploy-batcher.ts</path>/src/policies/policy-gate.ts</path>/src/memory/model-router.ts</path>/src/memory/embeddings.ts</path>/src/models/types.ts</path>/src/types/coordination.ts</path>/src/utils/logger.ts</path>/src/utils/config-loader.ts</path>/src/utils/performance-monitor.ts</path>/src/utils/concurrency.ts</path>/src/utils/concurrency-pool.ts</path>/src/utils/string-similarity.ts</path>/src/utils/rate-limiter.ts</path>/src/utils/system-resources.ts</path>/src/utils/adaptive-cache.ts</path>/src/utils/lazy-imports.ts</path>/src/utils/merge-claude-md.ts</path>/src/utils/stopwords.ts</path>/src/utils/config-loader.ts</path>/src/utils/performance-monitor.ts</path>/src/utils/concurrency.ts</path>/src/utils/concurrency-pool.ts</path>/src/utils/string-similarity.ts</path>/src/utils/rate-limiter.ts</path>/src/utils/system-resources.ts</path>/src/utils/adaptive-cache.ts</path>/src/utils/lazy-imports.ts</path>/src/utils/merge-claude-md.ts</path>/src/utils/stopwords.ts</path> +433 -0
  27. package/docs/DOCUMENTATION_AUDIT_REPORT.md +131 -0
  28. package/docs/GETTING_STARTED.md +288 -0
  29. package/docs/INDEX.md +272 -42
  30. package/docs/PROJECT_ANALYSIS_REPORT.md +510 -0
  31. package/docs/architecture/SYSTEM_ANALYSIS.md +220 -1003
  32. package/docs/blog/local-coding-agents.md +266 -0
  33. package/docs/blog/x-thread.md +254 -0
  34. package/docs/deployment/DEPLOY_BATCHER_ANALYSIS.md +15 -647
  35. package/docs/getting-started/OVERVIEW.md +10 -30
  36. package/docs/getting-started/SETUP.md +183 -9
  37. package/docs/pr/UPSTREAM_PRS.md +424 -0
  38. package/docs/reference/CONFIGURATION.md +208 -0
  39. package/docs/reference/DATABASE_SCHEMA.md +344 -0
  40. package/docs/reference/PATTERN_LIBRARY.md +636 -0
  41. package/package.json +1 -1
  42. package/templates/hooks/uap-policy-gate.sh +36 -0
  43. package/tools/agents/claude_local_agent.py +92 -0
  44. package/tools/agents/opencode_uap_agent.py +3 -0
  45. package/tools/agents/scripts/anthropic_proxy.py +654 -20
  46. package/tools/agents/uap_agent.py +1 -1
@@ -0,0 +1,424 @@
1
+ # UAP Upstream PR Plan
2
+
3
+ Five PRs covering the session stickiness bug, loop protection hardening, per-request speculative decoding control, an OpenAI-compatible endpoint, and the policy engine.
4
+
5
+ ## Dependency graph
6
+
7
+ ```
8
+ PR 1 (session fingerprinting) ── CRITICAL ──► enables PR 2, PR 5
9
+ PR 2 (loop protection) ── depends on PR 1
10
+ PR 3 (spec decoding control) ── independent
11
+ PR 4 (OpenAI /v1/chat/completions) ── depends on PR 2 (via guardrails)
12
+ PR 5 (policy engine) ── depends on PR 1 + PR 2
13
+ ```
14
+
15
+ ---
16
+
17
+ ## PR 1 — `proxy: stable session fingerprinting`
18
+
19
+ **Scope:** Critical bug fix
20
+ **Files:** `tools/agents/scripts/anthropic_proxy.py`
21
+ **Risk:** Low — pure fix, no new surface area
22
+ **Priority:** Highest — every stateful guardrail depends on this
23
+
24
+ ### Problem
25
+
26
+ Session fingerprints were hashed from `remote | model | system | first_user_content`. Two of these inputs carried volatile values:
27
+
28
+ 1. **`tool_use_id`** values in tool_result blocks — random UUIDs regenerated per turn. `_content_fingerprint` included `f"result:{block.get('tool_use_id', '')}"` in the hash.
29
+ 2. **`system` prompt** — clients inject volatile context (timestamps, cwd, session markers) into system prompts.
30
+
31
+ Result: **every single request got a different session ID** → every request spawned a fresh `SessionMonitor` → every stateful guardrail (cycle detection, forced_budget, review_cycles, finalize_hard_stop, unproductive_exhaustion_streak) was effectively stateless per-request.
32
+
33
+ This silently broke every loop protection mechanism ever built on top of the session monitor.
34
+
35
+ ### Diagnostic evidence
36
+
37
+ After adding session ID logging:
38
+
39
+ ```
40
+ sess=fp:9c8f26a802f9f4739f18 msgs=79
41
+ sess=fp:b801857a9e49e21a6599 msgs=81
42
+ sess=fp:aeef638954a390ef7aec msgs=83
43
+ sess=fp:16f908db2e478f31cb91 msgs=85
44
+ ```
45
+
46
+ Every request got a new session ID. `session_count: 35` after 35 requests on what should have been one session.
47
+
48
+ ### Fix
49
+
50
+ 1. `_content_fingerprint` uses a stable content excerpt (`result:<first 64 chars>`) instead of `tool_use_id`
51
+ 2. `resolve_session_id` hashes only the first user message's **text content** and excludes the `system` prompt entirely
52
+
53
+ ```python
54
+ def resolve_session_id(request: Request, anthropic_body: dict) -> str:
55
+     # ... header-based lookup unchanged ...
56
+
57
+     first_user = ""
58
+     for msg in anthropic_body.get("messages", []):
59
+         if msg.get("role") == "user":
60
+             content = msg.get("content", "")
61
+             if isinstance(content, str):
62
+                 first_user = content[:512]
63
+             elif isinstance(content, list):
64
+                 text_parts = [
65
+                     b.get("text", "") for b in content
66
+                     if isinstance(b, dict) and b.get("type") == "text"
67
+                 ]
68
+                 first_user = "\n".join(text_parts)[:512]
69
+             break
70
+
71
+     # Deliberately exclude `system` from fingerprint — clients inject
72
+     # volatile context (timestamps, cwd, session markers).
73
+     digest = hashlib.sha256(
74
+         f"{remote}|{model}|{first_user}".encode("utf-8", errors="ignore")
75
+     ).hexdigest()[:20]
76
+     return f"fp:{digest}"
77
+ ```
78
+
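The first part of the fix can be illustrated with a minimal sketch. This is not the proxy's actual `_content_fingerprint`: the names `content_fingerprint_sketch` and `fingerprint_digest`, the `text:` prefix, and the `|` separator are assumptions made for illustration. Only the `result:<first 64 chars>` form and the 20-character SHA-256 truncation come from the description above.

```python
import hashlib


def content_fingerprint_sketch(content) -> str:
    """Build a fingerprint string from message content without using tool_use_id."""
    if isinstance(content, str):
        return content[:512]
    parts = []
    for block in content or []:
        if not isinstance(block, dict):
            continue
        if block.get("type") == "text":
            parts.append(f"text:{block.get('text', '')[:64]}")
        elif block.get("type") == "tool_result":
            body = block.get("content", "")
            if isinstance(body, list):
                # tool_result content may itself be a list of text blocks
                body = "".join(
                    b.get("text", "") for b in body if isinstance(b, dict)
                )
            # Stable excerpt of the result body; the random tool_use_id is ignored.
            parts.append(f"result:{str(body)[:64]}")
    return "|".join(parts)


def fingerprint_digest(content) -> str:
    """Hash the fingerprint string and truncate it like the session ID digest."""
    raw = content_fingerprint_sketch(content)
    return hashlib.sha256(raw.encode("utf-8", errors="ignore")).hexdigest()[:20]
```

Two tool results with identical bodies but different `tool_use_id` values hash to the same digest under this sketch, which is the property the unit tests below are meant to pin down.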
79
+ ### Impact
80
+
81
+ - Before: 1 request per session
82
+ - After: 170+ requests on the same session (verified with Claude Code + OpenCode + Forge clients)
83
+ - All downstream guardrails suddenly started working — no changes needed to them
84
+
85
+ ### Add session ID logging
86
+
87
+ The REQ line now includes `sess=` for diagnosis:
88
+
89
+ ```
90
+ REQ: client=remote:127.0.0.1 sess=fp:aa5169796b2c39c2a4a4 rate_60s=1 ...
91
+ ```
92
+
93
+ ### Tests
94
+
95
+ - [ ] Unit test: same message with changing tool_use_ids → stable fingerprint
96
+ - [ ] Unit test: same message with changing system timestamps → stable fingerprint
97
+ - [ ] Integration test: 3 sequential requests on same conversation → same session_id
98
+
99
+ ---
100
+
101
+ ## PR 2 — `proxy: loop protection hardening`
102
+
103
+ **Scope:** Medium — new counters + threshold gates
104
+ **Files:** `anthropic_proxy.py`
105
+ **Depends on:** PR 1 (counters only work with sticky sessions)
106
+
107
+ ### Additions
108
+
109
+ 1. **`tool_state_unproductive_exhaustion_streak`**
110
+    - Tracks consecutive `forced_budget_exhausted` events where NEITHER cycling NOR stagnation was detected
111
+    - After `PROXY_UNPRODUCTIVE_EXHAUSTION_LIMIT` (default 4), forces finalize
112
+    - Catches "distinct-but-unproductive tool spam" that defeats per-tool cycle detection
113
+
114
+ 2. **`finalize_hard_stop_count`** (monotonic session-level)
115
+    - NOT reset by `fresh_user_text` / `inactive_loop` paths
116
+    - Incremented in BOTH:
117
+      - `_inject_synthetic_continuation` (synthetic continuation path)
118
+      - `state_choice == "finalize"` handler (tool-stripping path)
119
+    - When `>= PROXY_FINALIZE_SESSION_HARD_CAP` (default 6), synthetic continuation injection is blocked, natural end_turn passes through → client terminates loop cleanly
120
+
121
+ 3. **`finalize_fired` flag in `_completion_blockers()`**
122
+    - When `finalize_hard_stop_count > 0`, suppresses `text_only_after_tool_results` blocker
123
+    - Prevents state machine from re-entering active loop after a finalize wraps up the work
124
+    - That re-entry was causing an infinite `finalize → review → cycle_detected → finalize → review → ...` ping-pong
125
+
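The first two counters above can be sketched as a small state holder. This is a hedged illustration rather than the proxy's code: the class `LoopCountersSketch` and its method names are hypothetical, and resetting the streak whenever cycling or stagnation is detected is an assumption drawn from the word "consecutive". The default limits mirror the env var defaults listed in this section.

```python
from dataclasses import dataclass


@dataclass
class LoopCountersSketch:
    """Hypothetical container for the two session counters described above."""

    unproductive_limit: int = 4  # PROXY_UNPRODUCTIVE_EXHAUSTION_LIMIT default
    finalize_hard_cap: int = 6   # PROXY_FINALIZE_SESSION_HARD_CAP default
    unproductive_exhaustion_streak: int = 0
    finalize_hard_stop_count: int = 0

    def on_budget_exhausted(self, cycling: bool, stagnation: bool) -> bool:
        """Return True when the streak reaches the limit and finalize should be forced."""
        if cycling or stagnation:
            # Assumption: another detector explains this exhaustion, so the streak resets.
            self.unproductive_exhaustion_streak = 0
            return False
        self.unproductive_exhaustion_streak += 1
        return self.unproductive_exhaustion_streak >= self.unproductive_limit

    def on_finalize(self) -> None:
        """Count a finalize event; this counter is never reset within a session."""
        self.finalize_hard_stop_count += 1

    def on_fresh_user_text(self) -> None:
        """Reset per-loop state only; the session-level finalize count survives."""
        self.unproductive_exhaustion_streak = 0

    def may_inject_continuation(self) -> bool:
        """Synthetic continuation is allowed only below the session hard cap."""
        return self.finalize_hard_stop_count < self.finalize_hard_cap
```

Keeping `finalize_hard_stop_count` out of the reset path is the design point: a loop that keeps re-arming itself through fresh user text still runs into the session-level cap.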
126
+ ### New env vars
127
+
128
+ ```
129
+ PROXY_UNPRODUCTIVE_EXHAUSTION_LIMIT=4 # new
130
+ PROXY_FINALIZE_SESSION_HARD_CAP=6 # new
131
+ ```
132
+
133
+ ### Tuned thresholds (mostly tighter defaults)
134
+
135
+ ```
136
+ PROXY_LOOP_REPEAT_THRESHOLD=4 # was 10
137
+ PROXY_FORCED_THRESHOLD=12 # was 18
138
+ PROXY_NO_PROGRESS_THRESHOLD=3 # was 5
139
+ PROXY_TOOL_STATE_STAGNATION_THRESHOLD=4 # was 8
140
+ PROXY_TOOL_STATE_FINALIZE_THRESHOLD=8 # was 18
141
+ PROXY_TOOL_STATE_REVIEW_CYCLE_LIMIT=5 # was 3 (relaxed after tuning)
142
+ PROXY_TOOL_NARROWING_EXPAND_ON_LOOP=off # was on
143
+ PROXY_TOOL_NARROWING_KEEP=8 # was 12
144
+ ```
145
+
146
+ ### Verification
147
+
148
+ A real session that previously looped indefinitely now terminates cleanly:
149
+ ```
150
+ TOOL STATE MACHINE: 4 consecutive unproductive budget exhaustions — forcing finalize
151
+ TOOL STATE MACHINE: phase review -> finalize reason=unproductive_exhaustion
152
+ FINALIZE CONTINUATION: session hard cap reached (6/6) — not injecting, allowing termination
153
+ ```
154
+
155
+ The client received a clean `end_turn` and started a fresh task.
156
+
157
+ ### Tests
158
+
159
+ - [ ] Simulated loop: distinct tool calls with no context growth → triggers unproductive exhaustion
160
+ - [ ] Simulated loop: same tool repeated → triggers per-tool cycle detection (existing)
161
+ - [ ] Finalize → synthetic continuation → reset → new active loop → hard cap at 6 → natural termination
162
+
163
+ ---
164
+
165
+ ## PR 3 — `proxy: per-request speculative decoding control`
166
+
167
+ **Scope:** Small, focused
168
+ **Files:** `anthropic_proxy.py`, README
169
+ **Risk:** Low
170
+
171
+ ### Feature
172
+
173
+ New env var `PROXY_DISABLE_SPEC_ON_TOOL_TURNS` (default off). When on, the proxy sets `openai_body["speculative.n_max"] = 0` on tool-turn requests, telling llama.cpp to skip the draft/spec path for that request only.
174
+
175
+ ### Why
176
+
177
+ Some models (observed: early Qwen3.5-35B-A3B Q4_K_M) produce garbled tool-call output under speculative decoding due to rejected-draft state leakage. Disabling spec on tool turns while keeping it on for plain chat gives the best of both worlds for unstable models. Stable models can leave this off and benefit from spec on every turn.
178
+
179
+ ### Applied in two places
180
+
181
+ 1. Main handler (at the end of `_build_openai_request`)
182
+ 2. Tool starvation breaker early-return path (so the flag is respected on both code paths)
183
+
184
+ ```python
185
+ if PROXY_DISABLE_SPEC_ON_TOOL_TURNS:
186
+     openai_body["speculative.n_max"] = 0
187
+     logger.info("Spec decoding disabled for tool turn (PROXY_DISABLE_SPEC_ON_TOOL_TURNS=on)")
188
+ ```
189
+
190
+ ### Relies on llama.cpp upstream support
191
+
192
+ llama.cpp already supports per-request `speculative.n_max` in `server-task.cpp`:
193
+ ```cpp
194
+ params.speculative.n_max = json_value(data, "speculative.n_max", defaults.speculative.n_max);
195
+ ```
196
+
197
+ Setting it to 0 gates the entire draft path (`if (n_draft_max > 0)` in `server-context.cpp`).
198
+
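The gating logic can be written as a pure function, which makes the three planned test cases easy to express. This is a sketch under assumptions: `apply_spec_control` and its parameters are hypothetical names, and how the proxy decides that a request is a tool turn is not shown in this plan, so it is passed in as a boolean.

```python
def apply_spec_control(openai_body: dict, is_tool_turn: bool, disable_on_tool_turns: bool) -> dict:
    """Return a copy of the request body with speculative decoding disabled on tool turns."""
    body = dict(openai_body)
    if disable_on_tool_turns and is_tool_turn:
        # llama.cpp reads this per-request key; 0 skips the draft path for this request.
        body["speculative.n_max"] = 0
    return body
```

Returning a copy instead of mutating the input keeps both call sites (main handler and early-return path) free of shared state.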
199
+ ### Tests
200
+
201
+ - [ ] Tool-turn request with flag on → `speculative.n_max=0` in forwarded body
202
+ - [ ] Non-tool request with flag on → no speculative field added
203
+ - [ ] Flag off → no speculative field added regardless
204
+
205
+ ---
206
+
207
+ ## PR 4 — `proxy: fully guarded OpenAI /v1/chat/completions endpoint`
208
+
209
+ **Scope:** Medium — new endpoint with full bidirectional conversion
210
+ **Files:** `anthropic_proxy.py`
211
+ **Depends on:** PR 2 (reuses the guardrail pipeline)
212
+
213
+ ### Motivation
214
+
215
+ Clients like **OpenCode**, **Forge**, **Cline**, and many LangChain-based agents expect OpenAI's `/v1/chat/completions` shape. The proxy previously only exposed `/v1/messages` (Anthropic shape), so these clients either:
216
+ 1. Bypassed the proxy and talked directly to llama.cpp (no guardrails), OR
217
+ 2. Couldn't use the proxy at all
218
+
219
+ ### Approach
220
+
221
+ Add `/v1/chat/completions` handler that:
222
+ 1. Receives OpenAI-format request
223
+ 2. Converts to Anthropic format (`openai_to_anthropic_request`)
224
+ 3. Invokes the existing `messages()` handler via synthetic `Request` with Anthropic body
225
+ 4. Converts the Anthropic response back to OpenAI format (`anthropic_to_openai_response`)
226
+ 5. Returns to the client
227
+
228
+ **All guardrails from the `/v1/messages` path apply automatically** — loop detection, tool narrowing, cycle breaking, malformed tool retry, context pruning, profile overrides, activation replay (llama.cpp side).
229
+
230
+ ### Streaming
231
+
232
+ Client stream requests are processed internally as non-stream through the Anthropic pipeline, then re-streamed as OpenAI SSE chunks:
233
+
234
+ ```
235
+ data: {"id":"msg_...","delta":{"role":"assistant"},...}
236
+ data: {"id":"msg_...","delta":{"content":"..."},...}
237
+ data: {"id":"msg_...","delta":{"tool_calls":[...]},...}
238
+ data: {"id":"msg_...","delta":{},"finish_reason":"tool_calls"}
239
+ data: [DONE]
240
+ ```
241
+
242
+ This sacrifices token-by-token streaming granularity in exchange for keeping all guardrails. The difference is invisible to most clients.
243
+
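Re-streaming a completed response can be sketched as a generator that emits one chunk per part of the message. The function name `openai_message_to_sse` is hypothetical, and the chunk layout follows the public OpenAI streaming shape (`choices[0].delta`), which may differ in detail from the abbreviated chunks shown above.

```python
import json
from typing import Iterator, Optional


def openai_message_to_sse(message_id: str, message: dict, finish_reason: str) -> Iterator[str]:
    """Yield OpenAI-style SSE events for an already-complete assistant message."""

    def chunk(delta: dict, finish: Optional[str] = None) -> str:
        payload = {
            "id": message_id,
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish}],
        }
        return f"data: {json.dumps(payload)}\n\n"

    yield chunk({"role": "assistant"})
    if message.get("content"):
        # The whole text goes out as one chunk; token-level granularity is lost.
        yield chunk({"content": message["content"]})
    if message.get("tool_calls"):
        indexed = [dict(call, index=i) for i, call in enumerate(message["tool_calls"])]
        yield chunk({"tool_calls": indexed})
    yield chunk({}, finish_reason)
    yield "data: [DONE]\n\n"
```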
244
+ ### Helper functions added
245
+
246
+ - **`openai_to_anthropic_request(openai_body)`** — full conversion (system prompt, messages, tool_calls, tool_responses, tools, tool_choice, sampling params)
247
+ - **`anthropic_to_openai_response(anthropic_resp)`** — content blocks → message, tool_use → tool_calls, stop_reason → finish_reason, usage mapping
248
+ - **`_parse_anthropic_sse_to_message(raw)`** — SSE fallback parser if inner pipeline returns a stream despite `stream=False`
249
+
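As an illustration of the response-side mapping, here is a minimal sketch. It is not the `anthropic_to_openai_response` helper itself: the name `anthropic_to_openai_sketch`, the fallback to `"stop"` for unknown stop reasons, and the handling of missing fields are assumptions. The field names on both sides follow the public Anthropic Messages and OpenAI Chat Completions formats.

```python
import json

# Anthropic stop_reason -> OpenAI finish_reason
_FINISH_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "tool_use": "tool_calls",
    "max_tokens": "length",
}


def anthropic_to_openai_sketch(resp: dict) -> dict:
    """Convert a non-streaming Anthropic message into an OpenAI chat completion."""
    text_parts, tool_calls = [], []
    for block in resp.get("content", []):
        if block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        elif block.get("type") == "tool_use":
            tool_calls.append({
                "id": block.get("id", ""),
                "type": "function",
                "function": {
                    "name": block.get("name", ""),
                    # OpenAI carries arguments as a JSON string, not an object.
                    "arguments": json.dumps(block.get("input", {})),
                },
            })
    message = {"role": "assistant", "content": "".join(text_parts) or None}
    if tool_calls:
        message["tool_calls"] = tool_calls
    usage = resp.get("usage", {})
    prompt = usage.get("input_tokens", 0)
    completion = usage.get("output_tokens", 0)
    return {
        "id": resp.get("id", ""),
        "object": "chat.completion",
        "model": resp.get("model", ""),
        "choices": [{
            "index": 0,
            "message": message,
            "finish_reason": _FINISH_REASONS.get(resp.get("stop_reason"), "stop"),
        }],
        "usage": {
            "prompt_tokens": prompt,
            "completion_tokens": completion,
            "total_tokens": prompt + completion,
        },
    }
```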
250
+ ### Verification
251
+
252
+ Tested against OpenCode, Forge, and synthetic curl requests:
253
+ - Plain chat: clean text response
254
+ - Tool use: proper `tool_calls` with JSON arguments
255
+ - Streaming: proper SSE chunks with finish_reason
256
+ - All guardrails active (verified via log `CHAT (guarded)` marker)
257
+
258
+ ### Tests
259
+
260
+ - [ ] Round-trip: OpenAI request → Anthropic → OpenAI with matching content
261
+ - [ ] Tool call conversion (both directions)
262
+ - [ ] System prompt extraction from messages
263
+ - [ ] Streaming endpoint emits valid SSE sequence
264
+ - [ ] Profile overrides apply to chat/completions path
265
+
266
+ ---
267
+
268
+ ## PR 5 — `proxy: policy engine with worktree + CI/CD enforcement`
269
+
270
+ **Scope:** Large — new module + hook points
271
+ **Files:** `policies/engine.py`, `policies/rules/*.py`, `anthropic_proxy.py` (hook points), tests
272
+ **Depends on:** PR 1 (session continuity), PR 2 (guardrail infrastructure)
273
+ **Risk:** Medium — new subsystem
274
+
275
+ ### Motivation
276
+
277
+ You can tell a local coding agent to use a git worktree. You can write it in CLAUDE.md, put it in the system prompt, make it the first rule. Local 27–35B models **still commit directly to main**.
278
+
279
+ Policy-as-prompt is not an enforcement mechanism for local coding agents — it's a suggestion. The only reliable way to enforce workflow requirements is to make them non-bypassable at the proxy layer.
280
+
281
+ ### What it enforces
282
+
283
+ - **Worktree routing** — `Edit`, `Write`, `Bash` tool inputs get rewritten to reference the active worktree path. Operations targeting the main working tree are rejected.
284
+ - **Completion gates** — `end_turn` is blocked unless tests ran, memory was queried, and parallel reviewers were invoked.
285
+ - **Pre-commit discipline** — commit tool calls blocked until code-reviewer + security-auditor + architect-reviewer were invoked.
286
+ - **CI/CD deploy bucketing** — each agent session has a deploy bucket tied to its worktree. Concurrent agents don't collide at the pipeline layer.
287
+ - **Per-profile rule sets** — `build` / `plan` / `memory` / `autoaccept` each get a different policy set.
288
+ - **Session start protocol** — mandatory bootstrap checks (memory query, session context load)
289
+ - **Auditable trail** — every policy decision logged with rule ID, context, outcome
290
+
291
+ ### Architecture
292
+
293
+ ```
294
+ client → proxy → [guardrails] → [policy engine] → [tool rewriter] → llama.cpp
295
+                                        ↓
296
+                                    audit log
297
+ ```
298
+
299
+ Every tool call goes through a policy check chain before being forwarded to llama.cpp. Rules can allow, rewrite, or block.
300
+
301
+ ### Rule DSL
302
+
303
+ ```python
304
+ from uap.policies import policy, block, allow, rewrite_to_worktree, MUTATING_TOOLS
305
+
306
+ @policy("worktree.enforce", profile=["build", "autoaccept"])
307
+ def enforce_worktree(request, session):
308
+     if request.tool_name in MUTATING_TOOLS:
309
+         if not session.worktree_active:
310
+             return block("worktree_not_in_use",
311
+                          hint="Create a worktree first with `git worktree add`")
312
+         if "path" in request.tool_input:  # Bash inputs carry `command`, not `path`
313
+             request.tool_input["path"] = rewrite_to_worktree(request.tool_input["path"], session.worktree)
314
+
315
+     return allow()
316
+
317
+ @policy("commit.parallel_review", profile="build")
318
+ def enforce_parallel_review(request, session):
319
+     if request.tool_name == "Bash" and "git commit" in request.tool_input.get("command", ""):
320
+         if not session.review_completed_this_turn:
321
+             return block("parallel_review_required",
322
+                          hint="Invoke code-reviewer + security-auditor + architect-reviewer in parallel before committing")
323
+     return allow()
324
+
325
+ @policy("completion.gates", profile="build")
326
+ def enforce_completion_gates(request, session):
327
+     if request.is_end_turn:
328
+         blockers = []
329
+         if not session.tests_ran:
330
+             blockers.append("tests_not_run")
331
+         if not session.memory_queried:
332
+             blockers.append("memory_not_queried")
333
+         if blockers:
334
+             return block(f"completion_gates_failed: {','.join(blockers)}")
335
+     return allow()
336
+ ```
337
+
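One way the `@policy` decorator and the check chain could fit together is sketched below. Everything here is hypothetical and stands in for the proposed `uap.policies` module: `Decision`, `ToolRequest`, `_RULES`, and `evaluate` are illustrative names, the "first block wins" ordering is an assumption, and the rewrite outcome is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Decision:
    outcome: str  # "allow" or "block"
    reason: str = ""
    hint: str = ""


@dataclass
class ToolRequest:
    tool_name: str
    tool_input: dict = field(default_factory=dict)


def allow() -> Decision:
    return Decision("allow")


def block(reason: str, hint: str = "") -> Decision:
    return Decision("block", reason, hint)


# Registry of (rule_id, profiles, rule_function), filled by the decorator.
_RULES: List[Tuple[str, List[str], Callable]] = []


def policy(rule_id: str, profile):
    """Register a rule for one profile name or a list of profile names."""
    profiles = [profile] if isinstance(profile, str) else list(profile)

    def register(fn: Callable) -> Callable:
        _RULES.append((rule_id, profiles, fn))
        return fn

    return register


def evaluate(request: ToolRequest, session, profile: str):
    """Run the rules registered for a profile in order; the first block wins."""
    audit = []
    for rule_id, profiles, fn in _RULES:
        if profile not in profiles:
            continue
        decision = fn(request, session)
        audit.append(f"POLICY: rule={rule_id} tool={request.tool_name} decision={decision.outcome}")
        if decision.outcome == "block":
            return decision, audit
    return allow(), audit


@policy("tools.read_only", profile="plan")
def read_only(request, session):
    # Example rule: the plan profile may not use mutating tools.
    if request.tool_name in {"Write", "Edit", "Bash"}:
        return block("read_only_profile")
    return allow()
```

With this registry, `evaluate(ToolRequest("Write"), session, "plan")` returns a block decision plus one audit line, while the same request under a profile with no registered rules passes through with an empty audit list.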
338
+ ### Integration with existing `_completion_blockers()`
339
+
340
+ Policy blockers extend the existing completion contract:
341
+
342
+ ```python
343
+ def _completion_blockers(anthropic_body, has_tool_results, phase="", finalize_fired=False):
344
+ blockers = []
345
+ # ... existing checks ...
346
+
347
+ # NEW: policy-level blockers
348
+ policy_blockers = policy_engine.evaluate_completion(anthropic_body, session)
349
+ blockers.extend(policy_blockers)
350
+
351
+ return blockers
352
+ ```
353
+
354
+ ### Per-profile rule sets
355
+
356
+ ```python
357
+ # policies/profiles.py
358
+ BUILD_PROFILE_RULES = [
359
+ "worktree.enforce",
360
+ "commit.parallel_review",
361
+ "commit.message_format",
362
+ "commit.no_secrets",
363
+ "completion.gates",
364
+ "session.bootstrap",
365
+ ]
366
+
367
+ PLAN_PROFILE_RULES = [
368
+ "tools.read_only", # blocks write/edit/bash tools
369
+ "session.bootstrap",
370
+ ]
371
+
372
+ MEMORY_PROFILE_RULES = [
373
+ "tools.memory_only", # only memory read/write tools allowed
374
+ ]
375
+
376
+ AUTOACCEPT_PROFILE_RULES = [
377
+ "worktree.enforce", # same worktree rule
378
+ "commit.no_secrets", # security still enforced
379
+ # no parallel review required (autoaccept is explicit trade-off)
380
+ ]
381
+ ```
382
+
383
+ ### Audit trail
384
+
385
+ Every policy decision is logged with session, rule ID, tool name, decision, and blocker reason:
386
+
387
+ ```
388
+ POLICY: sess=fp:aa51... rule=worktree.enforce tool=Edit decision=rewrite old_path=/home/cogtek/dev/main/app.py new_path=/home/cogtek/dev/.worktrees/feat-x/app.py
389
+ POLICY: sess=fp:aa51... rule=commit.parallel_review tool=Bash decision=block reason=parallel_review_required
390
+ ```
391
+
392
+ ### Tests
393
+
394
+ - [ ] Unit tests for each rule in isolation
395
+ - [ ] Integration: build profile session → attempt commit without review → blocked → invoke review → commit succeeds
396
+ - [ ] Integration: plan profile session → attempt Write → blocked
397
+ - [ ] Multi-agent: two sessions with different worktrees → no collision
398
+ - [ ] Audit log format validation
399
+
400
+ ### Migration path
401
+
402
+ - The PR introduces the policy engine as **opt-in** per profile (the default profile has no policies, so it is fully backward-compatible)
403
+ - Users can enable rules one at a time via profile env vars
404
+ - Existing CLAUDE.md prose instructions can reference policies for context, but policies are now enforced independently of prose
405
+
406
+ ---
407
+
408
+ ## Submission order
409
+
410
+ 1. **PR 1 (session fingerprinting)** — critical bug fix, low risk, unblocks everything else
411
+ 2. **PR 2 (loop protection hardening)** — depends on PR 1, reviewers can verify that PR 1's fix makes these counters functional
412
+ 3. **PR 3 (spec decoding control)** — independent, small, easy to review
413
+ 4. **PR 4 (OpenAI endpoint)** — depends on PR 2 (reuses guardrails), adds major new functionality
414
+ 5. **PR 5 (policy engine)** — depends on PR 1 + PR 2, new subsystem, needs the most review
415
+
416
+ ## Pre-submission checklist (all PRs)
417
+
418
+ - [ ] Unit tests added
419
+ - [ ] Integration tests with real llama.cpp upstream
420
+ - [ ] README / docs updated
421
+ - [ ] Env var reference updated
422
+ - [ ] No breaking changes to existing endpoints (or clearly flagged)
423
+ - [ ] Config migration notes for existing deployments
424
+ - [ ] Diff against current production (`anthropic-proxy.env.*` profiles)
@@ -0,0 +1,208 @@
1
+ # UAP Configuration Reference
2
+
3
+ Complete configuration schema and environment variables for Universal Agent Protocol.
4
+
5
+ ## .uap.json Project Configuration
6
+
7
+ ### Root Schema
8
+
9
+ ```json
10
+ {
11
+ "version": "1.0.0",
12
+ "project": {
13
+ "name": "string (required)",
14
+ "defaultBranch": "string (optional, default: main)"
15
+ },
16
+ "memory": {
17
+ "shortTerm": {
18
+ "enabled": "boolean (default: true)",
19
+ "path": "string (default: ./agents/data/memory/short_term.db)",
20
+ "maxEntries": "integer (default: 50)"
21
+ },
22
+ "longTerm": {
23
+ "enabled": "boolean (default: true)",
24
+ "provider": "string (qdrant | github | local)",
25
+ "endpoint": "string (for Qdrant cloud)",
26
+ "apiKey": "string (for Qdrant cloud)"
27
+ }
28
+ },
29
+ "multiModel": {
30
+ "enabled": "boolean (default: true)",
31
+ "models": "string[] (required)",
32
+ "roles": {
33
+ "planner": "string (model ID)",
34
+ "executor": "string (model ID)",
35
+ "fallback": "string (model ID)"
36
+ },
37
+ "routingStrategy": "string (cost-optimized | performance-first | balanced)"
38
+ },
39
+ "worktrees": {
40
+ "enabled": "boolean (default: true)",
41
+ "directory": "string (default: .worktrees)"
42
+ },
43
+ "policies": {
44
+ "enabled": "boolean (default: true)",
45
+ "auditTrail": "boolean (default: true)"
46
+ },
47
+ "hooks": {
48
+ "sessionStart": "boolean (default: true)",
49
+ "preCompact": "boolean (default: true)"
50
+ }
51
+ }
52
+ ```
53
+
54
+ ### Validation Rules
55
+
56
+ | Field | Type | Required | Default | Description |
57
+ | --------------------------- | -------- | -------- | ---------------------------------- | -------------------------- |
58
+ | version | string | Yes | - | Schema version (1.0.0) |
59
+ | project.name | string | Yes | - | Project identifier |
60
+ | project.defaultBranch | string | No | main | Git default branch |
61
+ | memory.shortTerm.enabled | boolean | No | true | Enable short-term memory |
62
+ | memory.shortTerm.path | string | No | ./agents/data/memory/short_term.db | SQLite path |
63
+ | memory.shortTerm.maxEntries | integer | No | 50 | Max working memory entries |
64
+ | memory.longTerm.provider | string | No | qdrant | Backend provider |
65
+ | multiModel.models | string[] | Yes | - | Available model IDs |
66
+ | multiModel.routingStrategy | string | No | balanced | Routing strategy |
67
+
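The required fields and defaults in the table above can be checked with a few lines of code. The sketch below is an illustration only and is not how `uap` validates configuration: `validate_uap_config`, `DEFAULTS`, and the error message wording are hypothetical, while the field names, required flags, and default values are taken from the table and the root schema.

```python
from copy import deepcopy

# Defaults taken from the validation table above.
DEFAULTS = {
    "project": {"defaultBranch": "main"},
    "memory": {
        "shortTerm": {
            "enabled": True,
            "path": "./agents/data/memory/short_term.db",
            "maxEntries": 50,
        },
        "longTerm": {"provider": "qdrant"},
    },
    "multiModel": {"routingStrategy": "balanced"},
}
REQUIRED = [("version",), ("project", "name"), ("multiModel", "models")]
ROUTING_STRATEGIES = {"cost-optimized", "performance-first", "balanced"}


def _merge(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay user values on top of the defaults."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = _merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def _lookup(cfg, path):
    """Follow a key path through nested dicts; return None if any key is missing."""
    node = cfg
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def validate_uap_config(cfg: dict):
    """Return (config_with_defaults, list_of_error_strings)."""
    errors = [
        f"missing required field: {'.'.join(path)}"
        for path in REQUIRED
        if _lookup(cfg, path) is None
    ]
    merged = _merge(DEFAULTS, cfg)
    strategy = _lookup(merged, ("multiModel", "routingStrategy"))
    if strategy not in ROUTING_STRATEGIES:
        errors.append(f"invalid multiModel.routingStrategy: {strategy}")
    return merged, errors
```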
68
+ ## Environment Variables
69
+
70
+ ### Memory Configuration
71
+
72
+ | Variable | Type | Default | Description |
73
+ | ----------------------------- | ------ | ---------------------------------- | ------------------------- |
74
+ | UAP_MEMORY_SHORT_TERM_PATH | string | ./agents/data/memory/short_term.db | Short-term memory DB path |
75
+ | UAP_MEMORY_LONG_TERM_PROVIDER | string | qdrant | Long-term memory backend |
76
+ | UAP_QDRANT_ENDPOINT | string | - | Qdrant cloud endpoint |
77
+ | UAP_QDRANT_API_KEY | string | - | Qdrant API key |
78
+
79
+ ### Multi-Model Configuration
80
+
81
+ | Variable | Type | Default | Description |
82
+ | -------------------- | ------ | -------- | ---------------------- |
83
+ | UAP_MODEL_PLANNER | string | opus-4.6 | Default planner model |
84
+ | UAP_MODEL_EXECUTOR | string | glm-4.7 | Default executor model |
85
+ | UAP_MODEL_FALLBACK | string | opus-4.5 | Fallback on failure |
86
+ | UAP_ROUTING_STRATEGY | string | balanced | Routing strategy |
87
+
88
+ ### Worktree Configuration
89
+
90
+ | Variable | Type | Default | Description |
91
+ | -------------------- | ------- | ---------- | ----------------------- |
92
+ | UAP_WORKTREE_DIR | string | .worktrees | Worktree directory path |
93
+ | UAP_WORKTREE_ENABLED | boolean | true | Enable worktree system |
94
+
95
+ ### Policy Configuration
96
+
97
+ | Variable | Type | Default | Description |
98
+ | -------------------- | ------- | ------- | ------------------------- |
99
+ | UAP_POLICIES_ENABLED | boolean | true | Enable policy enforcement |
100
+ | UAP_AUDIT_TRAIL | boolean | true | Enable audit logging |
101
+
102
+ ### Debug & Logging
103
+
104
+ | Variable | Type | Default | Description |
105
+ | --------------------- | ------- | ------- | ------------------------------------ |
106
+ | UAP_VERBOSE | boolean | false | Enable verbose logging |
107
+ | UAP_LOG_LEVEL | string | info | Log level (debug, info, warn, error) |
108
+ | UAP_TELEMETRY_ENABLED | boolean | true | Enable telemetry collection |
109
+
110
+ ## Platform-Specific Configurations
111
+
112
+ ### Claude Code Integration
113
+
114
+ ```json
115
+ {
116
+ "hooks": {
117
+ "claude": {
118
+ "sessionStart": "templates/hooks/session-start.sh",
119
+ "preCompact": "templates/hooks/pre-compact.sh"
120
+ }
121
+ }
122
+ }
123
+ ```
124
+
125
+ ### Factory.AI Integration
126
+
127
+ ```json
128
+ {
129
+ "hooks": {
130
+ "factory": {
131
+ "sessionStart": "templates/hooks/session-start.sh",
132
+ "preCompact": "templates/hooks/pre-compact.sh"
133
+ }
134
+ }
135
+ }
136
+ ```
137
+
138
+ ### OpenCode Integration
139
+
140
+ ```json
141
+ {
142
+ "hooks": {
143
+ "opencode": {
144
+ "sessionStart": "templates/hooks/session-start.sh",
145
+ "preCompact": "templates/hooks/pre-compact.sh"
146
+ }
147
+ }
148
+ }
149
+ ```
150
+
151
+ ## Example Configurations
152
+
153
+ ### Minimal Configuration
154
+
155
+ ```json
156
+ {
157
+ "version": "1.0.0",
158
+ "project": { "name": "my-project" },
159
+ "memory": { "shortTerm": { "enabled": true } },
160
+ "multiModel": {
161
+ "enabled": true,
162
+ "models": ["opus-4.6", "qwen35"],
163
+ "roles": { "planner": "opus-4.6", "executor": "qwen35" }
164
+ }
165
+ }
166
+ ```
167
+
168
+ ### Production Configuration
169
+
170
+ ```json
171
+ {
172
+ "version": "1.0.0",
173
+ "project": { "name": "production-app", "defaultBranch": "main" },
174
+ "memory": {
175
+ "shortTerm": { "enabled": true, "maxEntries": 50 },
176
+ "longTerm": { "enabled": true, "provider": "qdrant", "endpoint": "https://qdrant.example.com" }
177
+ },
178
+ "multiModel": {
179
+ "enabled": true,
180
+ "models": ["opus-4.6", "sonnet-4.6", "qwen35"],
181
+ "roles": { "planner": "opus-4.6", "executor": "qwen35", "fallback": "sonnet-4.6" },
182
+ "routingStrategy": "cost-optimized"
183
+ },
184
+ "worktrees": { "enabled": true, "directory": ".worktrees" },
185
+ "policies": { "enabled": true, "auditTrail": true }
186
+ }
187
+ ```
188
+
189
+ ## Configuration Validation
190
+
191
+ Run validation:
192
+
193
+ ```bash
194
+ uap compliance check
195
+ ```
196
+
197
+ This verifies:
198
+
199
+ - Memory database paths exist or can be created
200
+ - Model IDs are valid
201
+ - Worktree directory is accessible
202
+ - Policy enforcement is properly configured
203
+
204
+ ## See Also
205
+
206
+ - [Getting Started](../../docs/getting-started/SETUP.md)
207
+ - [Multi-Model Architecture](../../docs/reference/FEATURES.md#multi-model-architecture)
208
+ - [Memory System](../../docs/reference/FEATURES.md#memory-system)