@miller-tech/uap 1.40.0 → 1.40.1

Files changed (92)
  1. package/README.md +109 -642
  2. package/docs/INDEX.md +48 -286
  3. package/docs/architecture/OVERVIEW.md +328 -0
  4. package/docs/architecture/PROTOCOL.md +204 -0
  5. package/docs/benchmarks/README.md +17 -192
  6. package/docs/getting-started/CONFIGURATION.md +237 -0
  7. package/docs/getting-started/INSTALLATION.md +125 -0
  8. package/docs/getting-started/QUICKSTART.md +115 -0
  9. package/docs/guides/COORDINATION.md +162 -0
  10. package/docs/guides/DELIVER.md +115 -0
  11. package/docs/guides/DEPLOY_BATCHING.md +212 -0
  12. package/docs/guides/DROIDS_AND_SKILLS.md +202 -0
  13. package/docs/guides/LOCAL_MODELS.md +148 -0
  14. package/docs/guides/MCP_ROUTER.md +195 -0
  15. package/docs/guides/MEMORY.md +235 -0
  16. package/docs/guides/MULTI_MODEL.md +223 -0
  17. package/docs/guides/POLICIES.md +190 -0
  18. package/docs/guides/WORKTREE_WORKFLOW.md +185 -0
  19. package/docs/integrations/MCP_ROUTER.md +147 -0
  20. package/docs/integrations/RTK.md +102 -0
  21. package/docs/reference/API.md +485 -0
  22. package/docs/reference/CLI.md +719 -0
  23. package/docs/reference/CONFIGURATION.md +90 -193
  24. package/docs/reference/DATABASE_SCHEMA.md +110 -344
  25. package/docs/reference/FEATURES.md +176 -472
  26. package/docs/reference/PATTERNS.md +102 -0
  27. package/docs/reference/PLATFORMS.md +83 -0
  28. package/package.json +1 -1
  29. package/docs/AGENTS.md +0 -423
  30. package/docs/DOCUMENTATION_AUDIT_REPORT.md +0 -131
  31. package/docs/GETTING_STARTED.md +0 -288
  32. package/docs/PROJECT_ANALYSIS_REPORT.md +0 -510
  33. package/docs/architecture/COMPLETE_ARCHITECTURE.md +0 -748
  34. package/docs/architecture/EXPERT_STACK.md +0 -137
  35. package/docs/architecture/MULTI_MODEL.md +0 -224
  36. package/docs/architecture/PLATFORM_GATING.md +0 -68
  37. package/docs/architecture/SYSTEM_ANALYSIS.md +0 -334
  38. package/docs/architecture/UAP_COMPLIANCE.md +0 -217
  39. package/docs/architecture/UAP_PROTOCOL.md +0 -339
  40. package/docs/architecture/UAP_STRICT_DROIDS.md +0 -172
  41. package/docs/archive/BALLS_MODE_SELF_ANALYSIS.md +0 -260
  42. package/docs/archive/BENCHMARK_GAPS_AND_PLAN.md +0 -146
  43. package/docs/archive/FAILING_TASKS_SOLUTION_PLAN.md +0 -668
  44. package/docs/archive/JINJA2-SYSTEM-MESSAGE-FIX.md +0 -209
  45. package/docs/archive/MODEL_ROUTING_IMPLEMENTATION_SUMMARY.md +0 -281
  46. package/docs/archive/MODEL_ROUTING_OPTIMIZATION_PLAN.md +0 -320
  47. package/docs/archive/NPM-PUBLISH-V0.9.1.md +0 -240
  48. package/docs/archive/OPTIMIZATION_OPTIONS.md +0 -334
  49. package/docs/archive/PARALLELISM_GAPS_AND_OPTIONS.md +0 -422
  50. package/docs/archive/POLICY_GATE_IMPLEMENTATION.md +0 -245
  51. package/docs/archive/SETUP_IMPROVEMENTS.md +0 -213
  52. package/docs/archive/UAP_GENERIC_OPTIMIZATION_PLAN.md +0 -270
  53. package/docs/archive/UAP_OPTIMIZATION_PLAN.md +0 -701
  54. package/docs/archive/UAP_V103_PATTERN_DESIGN.md +0 -315
  55. package/docs/archive/UAP_V104_COMPLIANCE_DESIGN.md +0 -223
  56. package/docs/archive/changelog/2026-03-10_uap-100-compliance.md +0 -77
  57. package/docs/archive/changelog/2026-03-10_uap-full-system-verification.md +0 -109
  58. package/docs/archive/opencode-integration-guide.md +0 -740
  59. package/docs/archive/opencode-integration-quickref.md +0 -180
  60. package/docs/benchmarks/OVERNIGHT_RUNNER.md +0 -341
  61. package/docs/benchmarks/SPECULATIVE_DECODING_JOURNEY_2026-03.md +0 -221
  62. package/docs/benchmarks/VALIDATION_PLAN.md +0 -568
  63. package/docs/blog/SPECULATIVE_DECODING_PRODUCTION_PLAYBOOK.md +0 -139
  64. package/docs/blog/local-coding-agents.md +0 -266
  65. package/docs/blog/x-thread.md +0 -254
  66. package/docs/deployment/DEPLOYMENT.md +0 -895
  67. package/docs/deployment/DEPLOYMENT_STRATEGIES.md +0 -518
  68. package/docs/deployment/DEPLOY_BATCHER_ANALYSIS.md +0 -224
  69. package/docs/deployment/DEPLOY_BATCHING.md +0 -273
  70. package/docs/deployment/DEPLOY_BUCKETING_ANALYSIS.md +0 -420
  71. package/docs/deployment/QWEN35_LLAMA_CPP.md +0 -426
  72. package/docs/deployment/UAP_LLAMA_ANTHROPIC_PROXY_BOOTSTRAP.md +0 -279
  73. package/docs/getting-started/INTEGRATION.md +0 -628
  74. package/docs/getting-started/OVERVIEW.md +0 -324
  75. package/docs/getting-started/SETUP.md +0 -377
  76. package/docs/integrations/MCP_ROUTER_SETUP.md +0 -445
  77. package/docs/integrations/RTK_INTEGRATION.md +0 -468
  78. package/docs/operations/TROUBLESHOOTING.md +0 -660
  79. package/docs/pr/PR_SPECULATIVE_DOCS_TEMPLATE.md +0 -146
  80. package/docs/pr/UPSTREAM_PRS.md +0 -424
  81. package/docs/reference/API_REFERENCE.md +0 -903
  82. package/docs/reference/EXPERT_DROIDS.md +0 -219
  83. package/docs/reference/HARNESS-MATRIX.md +0 -318
  84. package/docs/reference/PATTERN_LIBRARY.md +0 -636
  85. package/docs/reference/UAP_CLI_REFERENCE.md +0 -620
  86. package/docs/research/BEHAVIORAL_PATTERNS.md +0 -228
  87. package/docs/research/DOMAIN_STRATEGIES.md +0 -316
  88. package/docs/research/MEMORY_SYSTEMS_COMPARISON.md +0 -812
  89. package/docs/research/PATTERN_ANALYSIS_2026-01-18.md +0 -436
  90. package/docs/research/PERFORMANCE_ANALYSIS_2026-01-18.md +0 -209
  91. package/docs/research/PERFORMANCE_TEST_PLAN.md +0 -383
  92. package/docs/research/TERMINAL_BENCH_LEARNINGS.md +0 -217
@@ -1,137 +0,0 @@
1
- # Expert Stack: Forward-Design, HALO & Open-Collider
2
-
3
- This document covers the expert-system extensions added on top of the v1.23.0
4
- droid stack: forward-design experts, the activated experts-as-MCP-tools surface,
5
- HALO trace-based harness optimization, open-collider divergent ideation, and the
6
- expert-review hard gate.
7
-
8
- > Scope note: the base 33-droid roster, `ExpertOrchestrator`, `expert-route`
9
- > CLI, and `parallel-expert-review` skill already shipped in v1.23.0. This layer
10
- > closes real gaps in that stack and integrates two external tools.
11
-
12
- ---
13
-
14
- ## 1. Forward-design droids
15
-
16
- The pre-existing roster was review-heavy — the orchestrator's `plan`/`design`
17
- phases produced no up-front design. Three forward-design experts fill that gap:
18
-
19
- | Droid | Phase | Role |
20
- |---|---|---|
21
- | `strategic-architect` | plan | North-star architecture, technology selection (OSS-first), multi-quarter evolution, one-way-door decisions. Forward-design counterpart to `architect-reviewer`. |
22
- | `tactical-architect` | design | Concrete component/module boundaries, interfaces, data shapes, pattern selection, refactor strategy. |
23
- | `implementation-planner` | design | Executable work breakdown: ordered steps, file-level plan (reuse-first), test plan, risk/rollback. Feeds the `validate-plan-before-build` gate. |
24
-
25
- Wiring: `src/coordination/expert-orchestrator.ts` — `PHASE_ROSTER.plan` gains
26
- `strategic-architect`; `PHASE_ROSTER.design` gains `tactical-architect` and
27
- `implementation-planner`; `isRelevantForCapability` maps them to the
28
- `architecture`/`api-design` capabilities so they appear only on relevant tasks.
29
-
30
- ```bash
31
- uap expert-route "Design a new billing subsystem" --files src/types/billing.ts --json
32
- # → plan: strategic-architect … design: tactical-architect, implementation-planner …
33
- ```
34
-
35
- ---
36
-
37
- ## 2. Experts as MCP tools (activated)
38
-
39
- `src/mcp-router/experts/registry.ts` could already convert droids to virtual
40
- `experts.<name>` tools (`loadExpertTools`) but was never wired in. Now:
41
-
42
- - `McpRouter.loadTools()` (`src/mcp-router/server.ts`) calls `loadExpertTools(cwd)`
43
- and adds the experts to the fuzzy search index.
44
- - `handleExecuteTool` (`src/mcp-router/tools/execute.ts`) intercepts
45
- `experts.<droid>` paths and dispatches an in-process `consultExpert()` — it
46
- loads the droid's instructions and returns them wrapped as a prompt (mirroring
47
- `uap_droid_invoke`), instead of routing to an external MCP server.
48
-
49
- Result: `discover_tools "architecture review"` surfaces the right expert and
50
- `execute_tool experts.architect-reviewer` returns a consultation — all within
51
- the 2-tool token-saving router shape.
52
-
53
- ---
54
-
55
- ## 3. HALO — trace-based harness optimization
56
-
57
- [HALO](https://github.com/context-labs/HALO) analyzes large volumes of execution
58
- traces to find *systemic* harness/prompt failure modes (not one-off errors). UAP
59
- integrates it as an exporter + a droid + a CLI.
60
-
61
- **Exporter** (`src/observability/halo-exporter.ts`) — opt-in, zero-overhead when
62
- off. Emits one JSONL span per agent/LLM/tool call in HALO's OTLP/OpenInference
63
- shape: OTLP identity, `resource.attributes."service.name"`, and the four
64
- `inference.*` attributes (`project_id`, `observation_kind`, `export.schema_version`,
65
- `openinference.span.kind`), with nanosecond-precision timestamps.
66
-
67
- Tap points: `execute.ts:handleExecuteTool` (TOOL spans) and
68
- `session-telemetry.ts` `agentComplete`/`agentError` (AGENT spans).
69
-
70
- ```bash
71
- export UAP_HALO_TRACE=1 # enable collection
72
- export UAP_HALO_TRACE_PATH=.uap/halo/traces.jsonl
73
- # … run your workflow …
74
- uap harness status # enabled? path? span count?
75
- uap harness analyze -p "systemic failure modes?" # wraps `halo <file> -p ...`
76
- ```
77
-
78
- **Prerequisite:** `pip install halo-engine` (Python ≥3.10) + an OpenAI-compatible
79
- endpoint. Each analysis run incurs LLM cost. The `harness-optimizer` droid runs
80
- the loop: diagnose → **verify each claim against the repo** → route fixes →
81
- re-measure. Hard rule: *ask HALO about the trace data; never ask it to write code.*
82
-
83
- ---
84
-
85
- ## 4. Open-Collider — divergent ideation
86
-
87
- [open-collider](https://github.com/CL-ML/open-collider) escapes LLM "hivemind"
88
- clustering by colliding structurally distant knowledge domains (Koestler
89
- bisociation), then curating non-trivial ideas. Skill mode is free.
90
-
91
- - `ideation-expert` droid drives the brief → domains → collide → curate flow.
92
- - `uap ideate setup <name>` scaffolds the `projects/<name>/` file contract
93
- (`brief_validated.json`, `input_bank.yaml`, `prompts/`, `texts/`).
94
- - `uap ideate run <name>` drives the brainstorm; `uap ideate ideas <name>` reads
95
- the newest `curated_ideas.json`.
96
- - Orchestrator opt-in: `new ExpertOrchestrator({ includeIdeation: true })`
97
- prepends an `ideate` phase feeding the plan-phase product/strategy droids.
98
- `readCuratedIdeas()` (`src/cli/ideate.ts`) is the consumable artifact.
99
-
100
- Use it only when the solution space is wide; skip for convergent tasks.
101
-
102
- ---
103
-
104
- ## 5. Expert-review hard gate
105
-
106
- The `parallel-expert-review` skill claimed "REQUIRED by policy" but nothing
107
- enforced it. Two policy artifacts close that:
108
-
109
- - `expert-review-required` (`src/policies/schemas/policies/expert-review-required.md`
110
- + `src/policies/enforcers/expert_review_required.py`): blocks ship actions
111
- (`git commit`/`push`, `gh pr create`, merge/pr-ready/signoff) unless
112
- `.uap/reviews/<branch-slug>.json` exists and covers `HEAD` (stale → block).
113
- Fail-open on detached/non-git; override `UAP_NO_REVIEW=1`.
114
- - `architecture-review` (`…/policies/architecture-review.md`): the missing
115
- backing doc for the previously-orphan `architecture_review.py` enforcer
116
- (ADR-or-waiver on architecturally significant diffs).
117
-
118
- The review flow writes the artifact on consolidation:
119
- `{ "head": "<sha>", "verdict": "approve", "reviewers": [...] }`. Install with:
120
-
121
- ```bash
122
- uap policy install expert-review-required # attaches the enforcer to the hook
123
- ```
124
-
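Reviewer note: the gate rule described in the removed section 5 (block unless the artifact exists and covers `HEAD`, fail open on detached/non-git, honor the `UAP_NO_REVIEW=1` override) can be sketched as a pure decision function. This is a minimal illustration, not the shipped enforcer, which is the Python file `expert_review_required.py`. The names `RepoState`, `GateDecision`, and `checkReviewGate` are hypothetical; only the artifact fields `head`, `verdict`, and `reviewers` come from the text.

```typescript
// Artifact shape quoted in the removed doc: { head, verdict, reviewers }.
interface ReviewArtifact {
  head: string;
  verdict: string;
  reviewers: string[];
}

// Hypothetical view of the working tree: null means "not a git repo",
// branch === null means "detached HEAD".
interface RepoState {
  branch: string | null;
  head: string;
}

interface GateDecision {
  allow: boolean;
  reason: string;
}

// Hypothetical helper mirroring the documented rules, in the documented order.
function checkReviewGate(
  artifact: ReviewArtifact | null,
  repo: RepoState | null,
  env: Record<string, string | undefined>
): GateDecision {
  if (env["UAP_NO_REVIEW"] === "1") {
    return { allow: true, reason: "override: UAP_NO_REVIEW=1" };
  }
  if (repo === null || repo.branch === null) {
    return { allow: true, reason: "fail-open: detached or non-git" };
  }
  if (artifact === null) {
    return { allow: false, reason: "no review artifact for this branch" };
  }
  if (artifact.head !== repo.head) {
    return { allow: false, reason: "stale review: artifact does not cover HEAD" };
  }
  return { allow: true, reason: "review covers HEAD" };
}
```

The sketch deliberately ignores `verdict`, because the removed text only states that the artifact must exist and cover `HEAD`; whether a non-approving verdict also blocks is not specified there.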
125
- ---
126
-
127
- ## File map
128
-
129
- | Concern | Path |
130
- |---|---|
131
- | Forward-design droids | `.factory/droids/{strategic-architect,tactical-architect,implementation-planner}.md` |
132
- | Orchestrator wiring | `src/coordination/expert-orchestrator.ts` |
133
- | Experts-MCP dispatch | `src/mcp-router/experts/registry.ts`, `server.ts`, `tools/execute.ts` |
134
- | HALO exporter | `src/observability/halo-exporter.ts` |
135
- | HALO droid + CLI | `.factory/droids/harness-optimizer.md`, `src/cli/harness.ts` |
136
- | Ideation droid + CLI | `.factory/droids/ideation-expert.md`, `src/cli/ideate.ts` |
137
- | Review gate | `src/policies/{schemas/policies/expert-review-required.md,enforcers/expert_review_required.py}` |
@@ -1,224 +0,0 @@
1
- # Multi-Model Agentic Architecture
2
-
3
- ## Executive Summary
4
-
5
- This document proposes a two-tier agentic architecture using separate models for planning and execution, achieving **92-98% cost reduction** while maintaining near-original performance for complex tasks.
6
-
7
- ## Core Concept
8
-
9
- **Separation of Concerns:**
10
- - **Tier 1 (Planner)**: High-level reasoning, task decomposition, orchestration
11
- - **Tier 2 (Executor)**: Concrete implementation following planner's specifications
12
-
13
- ### Research Findings (2026)
14
-
15
- #### Model Candidates
16
-
17
- | Model | Role | Cost (Input/Output) | SWE-Bench | Context | Notes |
18
- |-------|------|----------------------|-----------|---------|-------|
19
- | **Claude Opus 4.5** | Planner (current) | $5/$25 per 1M | Highest | 200K | Premium, but expensive |
20
- | **DeepSeek-V3.2** | Planner | $0.25/$0.38 per 1M | 73.1% | 164K | Best cost/performance ratio |
21
- | **DeepSeek-V3.2-Exp** | Executor | $0.21/$0.32 per 1M | Strong | 164K | 78x cheaper output than Opus |
22
- | **GLM-4.7** | Executor | Very Low | Good | 128K | Current workhorse |
23
-
24
- #### Key Findings
25
-
26
- 1. **DeepSeek-V3.2 Speciale** achieves 73.1% on SWE-Bench Verified (vs Opus's highest scores)
27
- 2. **Cost differential**: DeepSeek is ~23x cheaper for input, ~78x cheaper for output
28
- 3. **Context**: 164K is sufficient for most agentic workflows (vs 200K for Opus)
29
- 4. **Architecture**: MoE with 671B params, activates only 37B per token (high efficiency)
30
-
31
- ## Proposed Architecture
32
-
33
- ### Tier 1: Master Planner
34
-
35
- **Model**: **DeepSeek-V3.2 Speciale** (replacing Opus 4.5)
36
-
37
- **Responsibilities**:
38
- - Task decomposition and planning
39
- - Subtask dependency analysis
40
- - Model selection for each subtask
41
- - Quality assurance routing
42
- - Critical path identification
43
-
44
- **When to invoke:**
45
- - New task request
46
- - Complex multi-step workflows
47
- - Requirements for strategic planning
48
- - Architectural decisions
49
-
50
- **Fallback**: If DeepSeek fails on critical planning, escalate to Opus 4.5 (1% of cases)
51
-
52
- ### Tier 2: Task Executor
53
-
54
- **Model**: **GLM-4.7** (current workhorse) or **DeepSeek-V3.2-Exp**
55
-
56
- **Responsibilities**:
57
- - Implement specific code blocks
58
- - Execute tool calls
59
- - Write tests
60
- - Fix bugs based on planner guidance
61
- - Generate documentation
62
-
63
- **When to invoke:**
64
- - Concrete implementation tasks
65
- - Coding following specifications
66
- - Test writing
67
- - Bug fixes with clear guidance
68
-
69
- ### Route Decision Matrix
70
-
71
- | Task Complexity | Routing Logic | Model Selection |
72
- |----------------|---------------|-----------------|
73
- | **High** (new feature, architecture) | → Planner → Decompose → Executor | DeepSeek-V3.2 → GLM-4.7 |
74
- | **Medium** (refactor, bug fix) | → Direct Executor | GLM-4.7 |
75
- | **Low** (simple change) | → Direct Executor | GLM-4.7 |
76
- | **Critical** (security, deployment) | → Planner → Verify → Executor | DeepSeek-V3.2 → GLM-4.7 |
77
-
78
- ## Implementation Strategy
79
-
80
- ### Phase 1: Router (Week 1)
81
-
82
- ```typescript
83
- interface ModelRouter {
84
- route(task: AgenticTask): ModelSelection;
85
- }
86
-
87
- interface ModelSelection {
88
- model: ModelId;
89
- fallback?: ModelId;
90
- reasoning: string;
91
- }
92
- ```
93
-
94
- **Routing Logic**:
95
- 1. Analyze task complexity (token estimate, dependencies, novelty)
96
- 2. Check for critical keywords (security, architecture, planning)
97
- 3. Select DeepSeek-V3.2 for planning tasks
98
- 4. Select GLM-4.7 for execution tasks
99
- 5. Fallback to Opus 4.5 only on threshold failures
100
-
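Reviewer note: the five routing steps in the removed draft can be sketched against its `ModelRouter` shape as follows. The draft never defines `AgenticTask` or `ModelId`, so the fields, model identifiers, and both thresholds below are assumptions made only for illustration.

```typescript
// Assumed identifiers; the draft names the models but not their IDs.
type ModelId = "deepseek-v3.2" | "deepseek-v3.2-exp" | "glm-4.7" | "opus-4.5";

// Assumed task shape; the draft leaves AgenticTask undefined.
interface AgenticTask {
  description: string;
  estimatedTokens: number;
  priorFailures: number;
}

interface ModelSelection {
  model: ModelId;
  fallback?: ModelId;
  reasoning: string;
}

// Keywords listed in step 2 of the draft.
const CRITICAL_KEYWORDS = ["security", "architecture", "planning"];
const FAILURE_THRESHOLD = 2; // hypothetical value for "threshold failures"
const LARGE_TASK_TOKENS = 50000; // hypothetical complexity cutoff

function route(task: AgenticTask): ModelSelection {
  // Step 5: escalate to Opus only after repeated failures.
  if (task.priorFailures >= FAILURE_THRESHOLD) {
    return { model: "opus-4.5", reasoning: "failure threshold reached" };
  }
  // Steps 1-3: complex or keyword-flagged tasks go to the planner.
  const text = task.description.toLowerCase();
  const keyword = CRITICAL_KEYWORDS.find((k) => text.includes(k));
  if (keyword !== undefined || task.estimatedTokens > LARGE_TASK_TOKENS) {
    const why = keyword !== undefined ? `keyword "${keyword}"` : "large token estimate";
    return { model: "deepseek-v3.2", fallback: "opus-4.5", reasoning: why };
  }
  // Step 4: everything else goes straight to the executor.
  return { model: "glm-4.7", fallback: "deepseek-v3.2-exp", reasoning: "execution task" };
}
```

Checking the failure threshold first keeps the Opus escalation independent of keyword matching; the draft does not say which check should win, so that ordering is a design choice of this sketch.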
101
- ### Phase 2: Planner Integration (Week 2)
102
-
103
- **Planner Interface**:
104
- ```typescript
105
- interface Planner {
106
- plan(task: AgenticTask): ExecutionPlan;
107
- }
108
-
109
- interface ExecutionPlan {
110
- subtasks: Subtask[];
111
- dependencies: DependencyGraph;
112
- modelAssignments: Map<SubtaskId, ModelId>;
113
- }
114
- ```
115
-
116
- **DeepSeek-V3.2 Integration**:
117
- - API endpoint integration
118
- - Context window management (164K)
119
- - Token budget accounting
120
- - Failure detection and escalation
121
-
122
- ### Phase 3: Executor Pool (Week 3)
123
-
124
- **Executor Options**:
125
- 1. **Primary**: GLM-4.7 (existing, low cost, good performance)
126
- 2. **Backup**: DeepSeek-V3.2-Exp (if GLM-4.7 unavailable)
127
- 3. **Fallback**: Opus 4.5 (critical failures only)
128
-
129
- **Load Balancing**:
130
- - Round-robin across multiple executor instances
131
- - Circuit breaker pattern for reliability
132
- - Timeout management per subtask
133
-
134
- ## Cost Analysis
135
-
136
- ### Baseline (Opus 4.5 Only)
137
-
138
- **Assumptions**:
139
- - 100 tasks/day
140
- - Average 50K input tokens, 30K output tokens per task
141
- - $5/input per 1M, $25/output per 1M
142
-
143
- **Daily Cost**:
144
- - Input: 100 * 50K * $5/1M = $25
145
- - Output: 100 * 30K * $25/1M = $75
146
- - **Total: $100/day**
147
-
148
- **Monthly Cost**: $3,000
149
- **Yearly Cost**: $36,500
150
-
151
- ### Proposed (DeepSeek + GLM-4.7)
152
-
153
- **Distribution**:
154
- - 30% complex tasks → DeepSeek-V3.2 planning (10K tokens)
155
- - 70% direct execution → GLM-4.7 (15K input, 5K output)
156
-
157
- **Daily Cost**:
158
- - Planner (DeepSeek): 30 tasks * 10K tokens each of input and output * ($0.25 + $0.38)/1M = $0.19
159
- - Executor (GLM):
160
- - Input: 100 tasks * 15K * $1/1M = $1.50
161
- - Output: 100 tasks * 5K * $2/1M = $1.00
162
- - **Total: $2.69/day**
163
-
164
- **Monthly Cost**: $80.70
165
- **Yearly Cost**: $982
166
-
167
- ### Cost Savings
168
-
169
- | Metric | Baseline | Proposed | Savings |
170
- |--------|----------|----------|---------|
171
- | Daily | $100 | $2.69 | **97.3%** |
172
- | Monthly | $3,000 | $80.70 | **97.3%** |
173
- | Yearly | $36,500 | $982 | **97.3%** |
174
-
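Reviewer note: the cost figures in the removed draft can be reproduced with the short calculation below. The planner line only yields $0.19 if each planned task uses 10K input and 10K output tokens, so the sketch assumes that reading; the GLM-4.7 rates ($1 and $2 per 1M tokens) are the ones implied by the draft's executor lines. Helper names are illustrative.

```typescript
// Prices in USD per 1M tokens.
interface Price {
  inputPerM: number;
  outputPerM: number;
}

// Daily cost for a number of tasks with fixed per-task token counts.
function dailyCost(tasks: number, inputTokens: number, outputTokens: number, p: Price): number {
  const perTask = (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1e6;
  return tasks * perTask;
}

const opus: Price = { inputPerM: 5, outputPerM: 25 };
const deepseek: Price = { inputPerM: 0.25, outputPerM: 0.38 };
const glm: Price = { inputPerM: 1, outputPerM: 2 }; // implied by the draft, not quoted as a price list

// Baseline: 100 tasks/day, 50K input and 30K output tokens each, all on Opus.
const baseline = dailyCost(100, 50000, 30000, opus);

// Proposed: 30 planned tasks (assumed 10K in + 10K out) plus 100 executor runs.
const plannerCost = dailyCost(30, 10000, 10000, deepseek);
const executorCost = dailyCost(100, 15000, 5000, glm);
const proposed = plannerCost + executorCost;

const savingsPct = (1 - proposed / baseline) * 100;

console.log(baseline.toFixed(2), proposed.toFixed(2), savingsPct.toFixed(1));
```

Under these assumptions the daily totals match the table (about $100 versus about $2.69, a 97.3% reduction). The yearly figures in the draft use 365 days while the monthly figures use 30, which is why $36,500 is not exactly twelve times $3,000.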
175
- ### Performance Impact
176
-
177
- Expected SWE-Bench performance:
178
- - **Baseline**: Opus 4.5 (highest scores)
179
- - **Proposed**:
180
- - Planner (DeepSeek-V3.2): 73.1% (verified)
181
- - Executor (GLM-4.7): Strong on straightforward tasks
182
- - **Composite**: Estimated 85-90% of baseline
183
-
184
- **Trade-off**: Accept 10-15% performance drop for 97% cost reduction
185
-
186
- ## Risk Assessment
187
-
188
- ### Risks
189
-
190
- 1. **Routing Errors**: Poor model selection for tasks
191
- - **Mitigation**: Start conservative, 10% fallback to Opus
192
- - **Monitoring**: Track task success rates per model
193
-
194
- 2. **Quality Regression**: Lower code quality
195
- - **Mitigation**: Add review loops, use quality droids
196
- - **Monitoring**: Track test pass rates, bug counts
197
-
198
- 3. **API Reliability**: DeepSeek availability issues
199
- - **Mitigation**: Multi-in redundancy, fallback to Opus
200
- - **Monitoring**: Uptime, latency tracking
201
-
202
- ### Rollback Plan
203
-
204
- If metrics degrade >20%, revert to Opus 4.5-only mode within 24 hours.
205
-
206
- ## Next Steps
207
-
208
- 1. **Week 1**: Implement router with conservative routing (20% direct to Opus)
209
- 2. **Week 2**: Integrate DeepSeek-V3.2 API, test on 10% of tasks
210
- 3. **Week 3**: Shift to 50/50 routing, monitor carefully
211
- 4. **Week 4**: Full deployment, 95% tasks to proposed architecture
212
-
213
- ## Success Metrics
214
-
215
- - Cost reduction: >90% achieved by month 1
216
- - Performance: <20% drop vs baseline
217
- - Reliability: <5% increase in task failures
218
- - ROI: Break-even within 2 weeks
219
-
220
- ---
221
-
222
- **Status**: Draft - Ready for review and implementation
223
- **Created**: 2026-01-21
224
- **Next Review**: 2026-01-28 (after week 1 pilot)
@@ -1,68 +0,0 @@
1
- # Platform Gating
2
-
3
- How UAP's policy gate (the DB-driven enforcement that blocks tool calls via
4
- `policies.db` + `.policy-tools/*.py`) is applied across each supported agent
5
- harness, and where harness limits make it advisory.
6
-
7
- ## Install & validate
8
-
9
- ```bash
10
- uap hooks install # all project platforms (Hermes is global → opt-in)
11
- uap hooks install -t hermes # Hermes (writes global ~/.hermes/config.yaml)
12
- uap hooks doctor # audit coverage; exits non-zero on gaps
13
- uap setup # now also installs hooks (Step 7)
14
- ```
15
-
16
- The gate script `templates/hooks/uap-policy-gate.sh` is copied into each
17
- platform's hooks dir and registered on that platform's pre-tool event. It reads
18
- the tool payload on stdin, runs the active enforcers, and blocks with `exit 2`
19
- (Claude convention). Hermes uses a wrapper (`uap-policy-gate-hermes.sh`) that
20
- translates `exit 2` into a stdout `{"decision":"block"}` JSON.
21
-
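Reviewer note: the translation performed by the Hermes wrapper (gate `exit 2` becomes a stdout block decision, and the wrapper itself always exits 0) can be illustrated as below. The real wrapper is the shell script `uap-policy-gate-hermes.sh`; this TypeScript sketch only models the mapping. The `{"decision":"allow"}` payload and the treatment of exit codes other than 2 are assumptions, since the removed doc only shows the block JSON.

```typescript
// What the wrapper hands back to Hermes: a JSON decision on stdout, exit code 0.
interface HermesHookResult {
  stdout: string;
  exitCode: 0;
}

// Hypothetical model of the wrapper's mapping.
// Exit 2 from the gate means "block" (Claude convention, per the doc).
// Assumption: every other exit code is reported as "allow".
function translateGateExit(gateExitCode: number): HermesHookResult {
  const decision = gateExitCode === 2 ? "block" : "allow";
  // Always exit 0 with valid JSON so Hermes' fail-open handling never
  // swallows a genuine block.
  return { stdout: JSON.stringify({ decision }), exitCode: 0 };
}

console.log(translateGateExit(2).stdout);
```

Because Hermes hooks are fail-open, a non-zero exit from the wrapper would let the tool call proceed; that is why the decision is carried in the stdout payload instead of the exit status.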
22
- ## Coverage matrix
23
-
24
- | Platform | Tier | Pre-tool mechanism | Config |
25
- |---|---|---|---|
26
- | claude | ✅ gated | `PreToolUse` hooks (Edit/Write/MultiEdit, Bash, Task/Agent/…) | `.claude/settings.local.json` |
27
- | vscode | ✅ gated | same (Claude format) | `.claude/settings.local.json` |
28
- | cursor | ✅ gated | `preToolUse` array | `.cursor/hooks.json` |
29
- | factory | ✅ gated | `PreToolUse` hooks | `.factory/settings.local.json` |
30
- | opencode | ✅ gated | `tool.execute.before` plugin hook (throws to abort) | `.opencode/plugin/uap-session-hooks.ts` |
31
- | omp | ✅ gated | `preToolUsePolicyGate` hook | `.uap/omp/settings.json` |
32
- | hermes | ✅ gated | `pre_tool_call` shell hook (stdout block JSON) | `~/.hermes/config.yaml` (global) |
33
- | codex | ⚠️ MCP-gated | no native pre-tool hook event | `.codex/config.toml` `[mcp_servers.uap]` |
34
- | forgecode | ⚠️ advisory | plugin injects policy context; no block path | `.forge/forgecode.plugin.sh` |
35
-
36
- ## Harness limits (why two platforms are not hard-gated)
37
-
38
- - **Codex** has no pre-tool-use *hook event*, so it can't auto-run the gate
39
- before every tool. Gating is **hard** for tools routed through the UAP MCP
40
- server (`execute_tool` runs the PolicyGate) and **advisory** for codex-native
41
- edit/bash (run `bash .codex/hooks/uap-policy-gate.sh` per AGENTS.md). `hooks
42
- doctor` reports codex as MCP-gated.
43
- - **ForgeCode**'s plugin surfaces session/compaction lifecycle and injects the
44
- active-policy list as context, but exposes no pre-tool interception point that
45
- can *block*. Reported as advisory.
46
-
47
- ## Hermes specifics
48
-
49
- - Config is **global** (`$HERMES_HOME` or `~/.hermes/config.yaml`), so it is
50
- excluded from the default `uap hooks install` loop and installed explicitly
51
- with `-t hermes`. `hooks doctor` treats an absent `~/.hermes` as optional, and
52
- a present-but-unwired install as a real gap.
53
- - Hermes hooks are **fail-open** (a crashing/exit-non-zero/bad-JSON hook lets the
54
- tool proceed). The UAP Hermes gate therefore always exits 0 and always emits a
55
- valid decision JSON, so genuine blocks are enforced.
56
- - Hermes prompts once to approve each hook command (stored in
57
- `~/.hermes/shell-hooks-allowlist.json`); approve the UAP gate, or set
58
- `hooks_auto_accept: true`.
59
- - Hermes has no per-file persona registry, so UAP droids are surfaced via a
60
- skills bridge (`~/.hermes/uap-skills/uap-experts/SKILL.md`) that routes to
61
- `uap expert-route` and the MCP `experts.<name>` tools.
62
-
63
- ## Key files
64
-
65
- - Installer + doctor: `src/cli/hooks.ts` (`copyHookScripts`, `installHermesHooks`, `auditPlatform`, `hooksDoctor`, `ALL_TARGETS`).
66
- - Gate scripts: `templates/hooks/uap-policy-gate.sh`, `templates/hooks/uap-policy-gate-hermes.sh`.
67
- - MCP-router gate (codex path): `src/mcp-router/tools/execute.ts:handleExecuteTool`.
68
- - Setup wiring: `src/cli/setup.ts`.