@miller-tech/uap 1.40.0 → 1.41.0

Files changed (150)
  1. package/README.md +109 -642
  2. package/dist/.tsbuildinfo +1 -1
  3. package/dist/cli/deliver-defaults.d.ts +23 -0
  4. package/dist/cli/deliver-defaults.d.ts.map +1 -0
  5. package/dist/cli/deliver-defaults.js +121 -0
  6. package/dist/cli/deliver-defaults.js.map +1 -0
  7. package/dist/cli/init.d.ts.map +1 -1
  8. package/dist/cli/init.js +29 -0
  9. package/dist/cli/init.js.map +1 -1
  10. package/dist/cli/setup.d.ts.map +1 -1
  11. package/dist/cli/setup.js +19 -0
  12. package/dist/cli/setup.js.map +1 -1
  13. package/dist/policies/policy-tools.d.ts +7 -0
  14. package/dist/policies/policy-tools.d.ts.map +1 -1
  15. package/dist/policies/policy-tools.js +24 -2
  16. package/dist/policies/policy-tools.js.map +1 -1
  17. package/docs/INDEX.md +48 -286
  18. package/docs/architecture/OVERVIEW.md +328 -0
  19. package/docs/architecture/PROTOCOL.md +204 -0
  20. package/docs/benchmarks/README.md +17 -192
  21. package/docs/getting-started/CONFIGURATION.md +237 -0
  22. package/docs/getting-started/INSTALLATION.md +125 -0
  23. package/docs/getting-started/QUICKSTART.md +115 -0
  24. package/docs/guides/COORDINATION.md +162 -0
  25. package/docs/guides/DELIVER.md +115 -0
  26. package/docs/guides/DEPLOY_BATCHING.md +212 -0
  27. package/docs/guides/DROIDS_AND_SKILLS.md +202 -0
  28. package/docs/guides/LOCAL_MODELS.md +148 -0
  29. package/docs/guides/MCP_ROUTER.md +195 -0
  30. package/docs/guides/MEMORY.md +235 -0
  31. package/docs/guides/MULTI_MODEL.md +223 -0
  32. package/docs/guides/POLICIES.md +190 -0
  33. package/docs/guides/WORKTREE_WORKFLOW.md +185 -0
  34. package/docs/integrations/MCP_ROUTER.md +147 -0
  35. package/docs/integrations/RTK.md +102 -0
  36. package/docs/reference/API.md +485 -0
  37. package/docs/reference/CLI.md +719 -0
  38. package/docs/reference/CONFIGURATION.md +90 -193
  39. package/docs/reference/DATABASE_SCHEMA.md +110 -344
  40. package/docs/reference/FEATURES.md +176 -472
  41. package/docs/reference/PATTERNS.md +102 -0
  42. package/docs/reference/PLATFORMS.md +83 -0
  43. package/package.json +3 -1
  44. package/src/policies/enforcers/7ebbc721-7540-4e9f-879a-770e0213a09b_architecture_review.py +101 -0
  45. package/src/policies/enforcers/__pycache__/_common.cpython-312.pyc +0 -0
  46. package/src/policies/enforcers/_common.py +100 -0
  47. package/src/policies/enforcers/artifact_hygiene.py +52 -0
  48. package/src/policies/enforcers/cluster_routing.py +63 -0
  49. package/src/policies/enforcers/codebase_read_before_plan.py +52 -0
  50. package/src/policies/enforcers/coord_overlap.py +81 -0
  51. package/src/policies/enforcers/delivery_enforcement.py +97 -0
  52. package/src/policies/enforcers/doc_live_over_report.py +50 -0
  53. package/src/policies/enforcers/expert_review_required.py +135 -0
  54. package/src/policies/enforcers/iac_parity.py +53 -0
  55. package/src/policies/enforcers/mcp_router_first.py +37 -0
  56. package/src/policies/enforcers/memory_before_plan.py +61 -0
  57. package/src/policies/enforcers/parallel_reads.py +50 -0
  58. package/src/policies/enforcers/rtk_wrap.py +44 -0
  59. package/src/policies/enforcers/schema_diff_gate.py +80 -0
  60. package/src/policies/enforcers/session_memory_write.py +52 -0
  61. package/src/policies/enforcers/task_required.py +131 -0
  62. package/src/policies/enforcers/test_gate.py +58 -0
  63. package/src/policies/enforcers/validate_plan_before_build.py +75 -0
  64. package/src/policies/enforcers/worktree_required.py +57 -0
  65. package/src/policies/schemas/policies/architecture-review.md +51 -0
  66. package/src/policies/schemas/policies/artifact-hygiene.md +29 -0
  67. package/src/policies/schemas/policies/cluster-routing.md +31 -0
  68. package/src/policies/schemas/policies/codebase-read-before-plan.md +30 -0
  69. package/src/policies/schemas/policies/coord-overlap.md +24 -0
  70. package/src/policies/schemas/policies/delivery-enforcement.md +45 -0
  71. package/src/policies/schemas/policies/doc-live-over-report.md +32 -0
  72. package/src/policies/schemas/policies/expert-review-required.md +60 -0
  73. package/src/policies/schemas/policies/iac-parity.md +31 -0
  74. package/src/policies/schemas/policies/mandatory-testing-deployment.md +147 -0
  75. package/src/policies/schemas/policies/mcp-router-first.md +24 -0
  76. package/src/policies/schemas/policies/memory-before-plan.md +24 -0
  77. package/src/policies/schemas/policies/merge-deploy-monitor-verify.md +145 -0
  78. package/src/policies/schemas/policies/parallel-reads.md +24 -0
  79. package/src/policies/schemas/policies/rtk-wrap.md +26 -0
  80. package/src/policies/schemas/policies/schema-diff-gate.md +30 -0
  81. package/src/policies/schemas/policies/session-memory-write.md +24 -0
  82. package/src/policies/schemas/policies/task-required.md +49 -0
  83. package/src/policies/schemas/policies/test-gate.md +24 -0
  84. package/src/policies/schemas/policies/validate-plan-before-build.md +28 -0
  85. package/src/policies/schemas/policies/worktree-required.md +28 -0
  86. package/templates/hooks/uap-policy-gate.sh +5 -0
  87. package/docs/AGENTS.md +0 -423
  88. package/docs/DOCUMENTATION_AUDIT_REPORT.md +0 -131
  89. package/docs/GETTING_STARTED.md +0 -288
  90. package/docs/PROJECT_ANALYSIS_REPORT.md +0 -510
  91. package/docs/architecture/COMPLETE_ARCHITECTURE.md +0 -748
  92. package/docs/architecture/EXPERT_STACK.md +0 -137
  93. package/docs/architecture/MULTI_MODEL.md +0 -224
  94. package/docs/architecture/PLATFORM_GATING.md +0 -68
  95. package/docs/architecture/SYSTEM_ANALYSIS.md +0 -334
  96. package/docs/architecture/UAP_COMPLIANCE.md +0 -217
  97. package/docs/architecture/UAP_PROTOCOL.md +0 -339
  98. package/docs/architecture/UAP_STRICT_DROIDS.md +0 -172
  99. package/docs/archive/BALLS_MODE_SELF_ANALYSIS.md +0 -260
  100. package/docs/archive/BENCHMARK_GAPS_AND_PLAN.md +0 -146
  101. package/docs/archive/FAILING_TASKS_SOLUTION_PLAN.md +0 -668
  102. package/docs/archive/JINJA2-SYSTEM-MESSAGE-FIX.md +0 -209
  103. package/docs/archive/MODEL_ROUTING_IMPLEMENTATION_SUMMARY.md +0 -281
  104. package/docs/archive/MODEL_ROUTING_OPTIMIZATION_PLAN.md +0 -320
  105. package/docs/archive/NPM-PUBLISH-V0.9.1.md +0 -240
  106. package/docs/archive/OPTIMIZATION_OPTIONS.md +0 -334
  107. package/docs/archive/PARALLELISM_GAPS_AND_OPTIONS.md +0 -422
  108. package/docs/archive/POLICY_GATE_IMPLEMENTATION.md +0 -245
  109. package/docs/archive/SETUP_IMPROVEMENTS.md +0 -213
  110. package/docs/archive/UAP_GENERIC_OPTIMIZATION_PLAN.md +0 -270
  111. package/docs/archive/UAP_OPTIMIZATION_PLAN.md +0 -701
  112. package/docs/archive/UAP_V103_PATTERN_DESIGN.md +0 -315
  113. package/docs/archive/UAP_V104_COMPLIANCE_DESIGN.md +0 -223
  114. package/docs/archive/changelog/2026-03-10_uap-100-compliance.md +0 -77
  115. package/docs/archive/changelog/2026-03-10_uap-full-system-verification.md +0 -109
  116. package/docs/archive/opencode-integration-guide.md +0 -740
  117. package/docs/archive/opencode-integration-quickref.md +0 -180
  118. package/docs/benchmarks/OVERNIGHT_RUNNER.md +0 -341
  119. package/docs/benchmarks/SPECULATIVE_DECODING_JOURNEY_2026-03.md +0 -221
  120. package/docs/benchmarks/VALIDATION_PLAN.md +0 -568
  121. package/docs/blog/SPECULATIVE_DECODING_PRODUCTION_PLAYBOOK.md +0 -139
  122. package/docs/blog/local-coding-agents.md +0 -266
  123. package/docs/blog/x-thread.md +0 -254
  124. package/docs/deployment/DEPLOYMENT.md +0 -895
  125. package/docs/deployment/DEPLOYMENT_STRATEGIES.md +0 -518
  126. package/docs/deployment/DEPLOY_BATCHER_ANALYSIS.md +0 -224
  127. package/docs/deployment/DEPLOY_BATCHING.md +0 -273
  128. package/docs/deployment/DEPLOY_BUCKETING_ANALYSIS.md +0 -420
  129. package/docs/deployment/QWEN35_LLAMA_CPP.md +0 -426
  130. package/docs/deployment/UAP_LLAMA_ANTHROPIC_PROXY_BOOTSTRAP.md +0 -279
  131. package/docs/getting-started/INTEGRATION.md +0 -628
  132. package/docs/getting-started/OVERVIEW.md +0 -324
  133. package/docs/getting-started/SETUP.md +0 -377
  134. package/docs/integrations/MCP_ROUTER_SETUP.md +0 -445
  135. package/docs/integrations/RTK_INTEGRATION.md +0 -468
  136. package/docs/operations/TROUBLESHOOTING.md +0 -660
  137. package/docs/pr/PR_SPECULATIVE_DOCS_TEMPLATE.md +0 -146
  138. package/docs/pr/UPSTREAM_PRS.md +0 -424
  139. package/docs/reference/API_REFERENCE.md +0 -903
  140. package/docs/reference/EXPERT_DROIDS.md +0 -219
  141. package/docs/reference/HARNESS-MATRIX.md +0 -318
  142. package/docs/reference/PATTERN_LIBRARY.md +0 -636
  143. package/docs/reference/UAP_CLI_REFERENCE.md +0 -620
  144. package/docs/research/BEHAVIORAL_PATTERNS.md +0 -228
  145. package/docs/research/DOMAIN_STRATEGIES.md +0 -316
  146. package/docs/research/MEMORY_SYSTEMS_COMPARISON.md +0 -812
  147. package/docs/research/PATTERN_ANALYSIS_2026-01-18.md +0 -436
  148. package/docs/research/PERFORMANCE_ANALYSIS_2026-01-18.md +0 -209
  149. package/docs/research/PERFORMANCE_TEST_PLAN.md +0 -383
  150. package/docs/research/TERMINAL_BENCH_LEARNINGS.md +0 -217
@@ -1,701 +0,0 @@
1
- # UAP Optimization & Dashboard Overlay Plan (Validated)
2
-
3
- > Validated against codebase on 2026-03-17. All references point to real files, types, and services.
4
- > **Updated**: 2026-03-19 after implementing all critical fixes from validated plan.
5
-
6
- ## Validation Summary
7
-
8
- ### What exists today
9
-
10
- - **Policy system**: Full CRUD + enforcement gate + audit trail (`src/policies/`), SQLite-backed, 3 enforcement levels (REQUIRED/RECOMMENDED/OPTIONAL), `togglePolicy()` already on `PolicyMemoryManager`
11
- - **Memory system**: 4-tier (L1-L4), 26 files in `src/memory/`, adaptive context, dynamic retrieval, predictive pre-fetch
12
- - **Model router**: Rule-based (`src/models/router.ts`) + benchmark-data (`src/memory/model-router.ts`) + unified consensus (`src/models/unified-router.ts`), execution profiles per model family (`src/models/execution-profiles.ts`)
13
- - **Dashboard**: 1830-line terminal dashboard (`src/cli/dashboard.ts`) with 8 views, 424-line viz library (`src/cli/visualize.ts`), session telemetry (`src/telemetry/session-telemetry.ts`)
14
- - **No web dashboard exists**. All visualization is chalk-based terminal output.
15
- - **No enforcement stage concept exists** on policies. Policies have `level` (REQUIRED/RECOMMENDED/OPTIONAL) and `isActive` (boolean) but no stage gating.
16
-
17
- ### What the original plan got wrong
18
-
19
- 1. Proposed `ink`/`blessed` TUI -- unnecessary. The existing chalk-based dashboard + visualize.ts primitives already work and are battle-tested. New panels should extend the existing system.
20
- 2. Proposed web React dashboard from scratch -- premature. Option 3 (embedded) is correct: extend the existing `uap dashboard` CLI with new panels first, add a lightweight HTTP/WebSocket server later.
21
- 3. Missed that `PolicyMemoryManager.togglePolicy()` already exists at `src/policies/policy-memory.ts:91`.
22
- 4. Missed that `unified-router.ts` model maps need updating for opus-4.6 and qwen35.
23
- 5. Missed that the session dashboard already shows policies but only as hardcoded bullet items (`dashboard.ts:1305-1317`), not from the database.
24
-
25
- ### Codebase gaps identified (2026-03-17 validated plan)
26
-
27
- **CRITICAL - Fixed in Phase 1 (2026-03-19):**
28
-
29
- 1. ✅ `src/memory/adaptive-context.ts:829` -- `validModelIds` array missing `opus-4.6`, breaking adaptive learning feedback loop for default models
30
- 2. ✅ `src/models/router.ts` -- `routingMatrix` config defined but never consumed in `selectModel()` method
31
-
32
- **PERFORMANCE - Fixed in Phase 2-4 (2026-03-19):**
-
- 3. ✅ `src/tasks/service.ts:566` -- N+1 query pattern in `batchGetWithRelations()` calling `this.get(task.parentId)` for each task
- 4. ✅ `src/dashboard/data-service.ts:242` -- Dashboard opens/closes SQLite DB on every refresh (every 2 seconds)
- 5. ✅ `src/memory/dynamic-retrieval.ts:704,763` -- Synchronous file reads on every retrieval call for rarely-changing files
- 6. ✅ `src/memory/adaptive-context.ts:74` -- Over-provisioned DB pool (5 connections for sync driver)
- 7. ✅ `src/memory/model-router.ts:203` -- Over-provisioned DB pool (5 connections for sync driver)
33
-
34
- **RESOURCE LEAKS - Fixed in Phase 3 (2026-03-19):**
-
- 8. ✅ `src/utils/adaptive-cache.ts:160` -- setInterval already has `.unref()` (already fixed)
- 9. ✅ `src/models/analytics.ts:294-299` -- process.on('exit') handler exists (already fixed)
- 10. ✅ `src/policies/policy-gate.ts:493-499` -- process.on('exit') handler exists (already fixed)
35
-
36
- **ALREADY FIXED:**
37
-
38
- - Costs array capping in `src/telemetry/session-telemetry.ts:711-714`
39
- - Constant hoisting in `src/memory/adaptive-context.ts:672`
40
- - File read caching with mtime invalidation (new in Phase 4)
41
-
42
- ---
43
-
44
- ## Part 1: Model Optimization
45
-
46
- ### A. Immediate Fixes (Prerequisite)
47
-
48
- #### 1.1 Update unified-router model maps
49
-
50
- **File**: `src/models/unified-router.ts:35-49`
51
-
52
- ```typescript
53
- const BENCHMARK_TO_RULE_MODEL_MAP: Record<string, string> = {
54
- 'claude-opus-4.5': 'opus-4.5',
55
- 'claude-opus-4.6': 'opus-4.6', // ADD
56
- 'gpt-5.2': 'gpt-5.2',
57
- 'glm-4.7': 'glm-4.7',
58
- 'gpt-5.2-codex': 'gpt-5.2',
59
- qwen35: 'qwen35', // ADD
60
- };
61
-
62
- const RULE_TO_BENCHMARK_MODEL_MAP: Record<string, ModelId> = {
63
- 'opus-4.5': 'claude-opus-4.5',
64
- 'opus-4.6': 'claude-opus-4.6', // ADD
65
- 'gpt-5.2': 'gpt-5.2',
66
- 'glm-4.7': 'glm-4.7',
67
- 'deepseek-v3.2': 'gpt-5.2',
68
- 'deepseek-v3.2-exp': 'gpt-5.2',
69
- 'qwen35-a3b': 'glm-4.7',
70
- qwen35: 'qwen35', // ADD
71
- };
72
- ```
73
-
74
- #### 1.2 Update CLI model defaults
75
-
76
- **File**: `src/cli/model.ts:44-59`
77
-
78
- Change `getMultiModelConfig()` fallback to use `ModelRouter.getDefaultUAPConfig()` instead of hardcoded opus-4.5 defaults.
79
-
80
- #### 1.3 Add benchmark fingerprint for opus-4.6 and qwen35
81
-
82
- **File**: `src/memory/model-router.ts`
83
-
84
- Add `MODEL_FINGERPRINTS` entries for `claude-opus-4.6` and `qwen35` so the benchmark-data router can track them.
85
-
86
- ### B. Qwen 3.5 Optimizations
87
-
88
- #### 2.1 Dynamic quantization switching
89
-
90
- **Where**: New function in `src/models/execution-profiles.ts`
91
-
92
- The `SMALL_MOE_PROFILE` already covers qwen3.5 correctly. Enhancement: add a `quantizationHint` field to `ExecutionProfile` so the llama.cpp server can be told which quant to load.
93
-
94
- ```typescript
95
- // Add to ExecutionProfile interface
96
- quantizationHint?: {
97
- low: string; // e.g. 'iq2_xs' for simple tasks
98
- medium: string; // e.g. 'iq4_xs' for standard tasks
99
- high: string; // e.g. 'q5_k_m' for complex tasks
100
- };
101
- ```
102
-
103
- The router already classifies complexity. Wire the quant hint into the `ModelSelection` result so the agent runner can pass it to the llama.cpp endpoint.
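A minimal sketch of how the quant hint could be resolved from the classified complexity. Only `quantizationHint` and the four complexity levels come from the plan; `pickQuant`, `QuantizationHint`, and the choice to map `critical` onto the `high` tier are assumptions made for illustration.

```typescript
// Hypothetical helper: not part of the UAP codebase.
type TaskComplexity = 'low' | 'medium' | 'high' | 'critical';

interface QuantizationHint {
  low: string;
  medium: string;
  high: string;
}

// Assumption: the hint has three tiers, so 'critical' reuses the 'high' quant.
function pickQuant(
  hint: QuantizationHint | undefined,
  complexity: TaskComplexity
): string | undefined {
  if (!hint) return undefined; // no hint on the profile: leave the server default alone
  switch (complexity) {
    case 'low':
      return hint.low;
    case 'medium':
      return hint.medium;
    case 'high':
    case 'critical':
      return hint.high;
  }
}

// Example values taken from the comments in the interface above.
const qwenHint: QuantizationHint = { low: 'iq2_xs', medium: 'iq4_xs', high: 'q5_k_m' };
```

Returning `undefined` when no hint is configured keeps the field optional, so profiles without a hint would behave as they do today.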
104
-
105
- #### 2.2 Context window management
106
-
107
- **Where**: Extend `src/memory/context-compressor.ts` and `src/memory/adaptive-context.ts`
108
-
109
- These already exist and handle token budgets. Enhancement:
110
-
111
- - Add a `modelContextBudget` field to `ModelConfig` in `src/models/types.ts` (distinct from `maxContextTokens`) representing the _effective_ context the model handles well
112
- - For qwen35: `maxContextTokens: 262144` but `modelContextBudget: 32768` (sweet spot for 3B active params)
113
- - `AdaptiveContext` already selects context level by task type -- wire it to respect `modelContextBudget`
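One way to read the bullets above is that the effective budget is the smaller of the two fields. The sketch below assumes that interpretation; `effectiveContextBudget` and `fitsBudget` are hypothetical names, and treating `modelContextBudget` as optional is also an assumption.

```typescript
// Hypothetical subset of ModelConfig, for illustration only.
interface ModelConfigBudget {
  maxContextTokens: number; // hard limit of the model
  modelContextBudget?: number; // effective budget proposed in the plan
}

// Falls back to the hard limit when no effective budget is configured,
// and never exceeds the hard limit even if the budget is misconfigured.
function effectiveContextBudget(cfg: ModelConfigBudget): number {
  const budget = cfg.modelContextBudget ?? cfg.maxContextTokens;
  return Math.min(budget, cfg.maxContextTokens);
}

// True when a prompt of the given size stays within the effective budget.
function fitsBudget(cfg: ModelConfigBudget, promptTokens: number): boolean {
  return promptTokens <= effectiveContextBudget(cfg);
}

// Values for qwen35 as stated in the plan.
const qwen35Budget: ModelConfigBudget = { maxContextTokens: 262144, modelContextBudget: 32768 };
```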
114
-
115
- #### 2.3 Prompt token budget tracking
116
-
117
- **Where**: `src/memory/context-compressor.ts` already has `SemanticCompressor` with entropy-aware compression
118
-
119
- Enhancement: expose a per-session token counter that the dashboard can read. Add to `globalSessionStats` in `src/mcp-router/session-stats.ts`:
120
-
121
- ```typescript
122
- // Already exists: totalContextBytes, totalRawBytes, savingsRatio
123
- // Add:
124
- modelTokenBudget: number; // from modelContextBudget
125
- modelTokensConsumed: number; // running total
126
- compressionEvents: number; // how many times compressor fired
127
- ```
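A small sketch of how the three new counters might be updated, assuming they are plain numbers on a stats object. The field names are from the plan; `recordUsage` and `budgetUsedRatio` are hypothetical helpers, not existing functions in `session-stats.ts`.

```typescript
// The three fields proposed above; everything else here is illustrative.
interface TokenBudgetStats {
  modelTokenBudget: number;
  modelTokensConsumed: number;
  compressionEvents: number;
}

// Returns a new stats object rather than mutating, to keep the sketch easy to test.
function recordUsage(
  stats: TokenBudgetStats,
  tokens: number,
  compressorFired: boolean
): TokenBudgetStats {
  return {
    modelTokenBudget: stats.modelTokenBudget,
    modelTokensConsumed: stats.modelTokensConsumed + tokens,
    compressionEvents: stats.compressionEvents + (compressorFired ? 1 : 0),
  };
}

// Fraction of the budget consumed so far; 0 when no budget is set.
function budgetUsedRatio(stats: TokenBudgetStats): number {
  if (stats.modelTokenBudget <= 0) return 0;
  return stats.modelTokensConsumed / stats.modelTokenBudget;
}
```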
128
-
129
- ### C. Multi-Model Routing Enhancements
130
-
131
- #### 3.1 Complexity-based routing matrix
132
-
133
- **Where**: `src/models/router.ts` -- `selectAdaptiveModel()` already implements this logic
134
-
135
- Current behavior (validated):
136
-
137
- - `critical`/`high` -> planner (opus-4.6)
138
- - `medium` -> executor (qwen35)
139
- - `low` -> cheapest model (qwen35, $0/1M)
140
-
141
- This is correct. No change needed for the matrix itself.
142
-
143
- Enhancement: add a `routingMatrix` config option to `MultiModelConfig` so users can override per-complexity routing without editing code:
144
-
145
- ```typescript
146
- // Add to MultiModelConfig in src/models/types.ts
147
- routingMatrix?: Record<TaskComplexity, { planner: string; executor: string }>;
148
- ```
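A sketch of how such an override could be consulted. The default table mirrors the routing matrix shown in the models dashboard mock-up elsewhere in this plan; `resolveRoute` is a hypothetical name, and accepting a partial override (rather than the full `Record` declared above) is an assumption.

```typescript
type TaskComplexity = 'low' | 'medium' | 'high' | 'critical';

interface Route {
  planner: string;
  executor: string;
}

// Assumed defaults, copied from the routing matrix mock-up in this plan.
const DEFAULT_ROUTING: Record<TaskComplexity, Route> = {
  low: { planner: 'qwen35', executor: 'qwen35' },
  medium: { planner: 'opus-4.6', executor: 'qwen35' },
  high: { planner: 'opus-4.6', executor: 'opus-4.6' },
  critical: { planner: 'opus-4.6', executor: 'opus-4.6' },
};

// A user-supplied entry wins; anything not overridden falls back to the defaults.
function resolveRoute(
  complexity: TaskComplexity,
  routingMatrix?: Partial<Record<TaskComplexity, Route>>
): Route {
  return routingMatrix?.[complexity] ?? DEFAULT_ROUTING[complexity];
}
```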
149
-
150
- #### 3.2 Performance analytics module
151
-
152
- **Where**: New file `src/models/analytics.ts`
153
-
154
- ```typescript
155
- export interface TaskOutcome {
156
- modelId: string;
157
- taskType: string;
158
- complexity: TaskComplexity;
159
- success: boolean;
160
- durationMs: number;
161
- tokensUsed: { input: number; output: number };
162
- cost: number;
163
- timestamp: string;
164
- }
165
-
166
- export class ModelAnalytics {
167
- private db: Database; // SQLite, same pattern as other DBs
168
-
169
- recordOutcome(outcome: TaskOutcome): void;
170
- getSuccessRate(modelId: string, taskType?: string): number;
171
- getAvgLatency(modelId: string, taskType?: string): number;
172
- getOptimalRouting(): Record<string, string>; // taskType -> modelId
173
- getCostBreakdown(since?: Date): CostBreakdown[];
174
- }
175
- ```
176
-
177
- This feeds into the dashboard cost tracker panel.
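To make the intended semantics of `getSuccessRate()` and `getAvgLatency()` concrete, here is an in-memory sketch. It deliberately swaps the SQLite store for an array and uses a reduced outcome type; `InMemoryModelAnalytics` and `OutcomeLite` are hypothetical names, and returning 0 when there is no data is an assumption.

```typescript
// Reduced subset of TaskOutcome, for illustration only.
interface OutcomeLite {
  modelId: string;
  taskType: string;
  success: boolean;
  durationMs: number;
}

class InMemoryModelAnalytics {
  private outcomes: OutcomeLite[] = [];

  recordOutcome(outcome: OutcomeLite): void {
    this.outcomes.push(outcome);
  }

  // Rows for one model, optionally narrowed to a task type.
  private select(modelId: string, taskType?: string): OutcomeLite[] {
    return this.outcomes.filter(
      (o) => o.modelId === modelId && (taskType === undefined || o.taskType === taskType)
    );
  }

  // Fraction of successful tasks in [0, 1]; 0 when nothing was recorded.
  getSuccessRate(modelId: string, taskType?: string): number {
    const rows = this.select(modelId, taskType);
    if (rows.length === 0) return 0;
    return rows.filter((o) => o.success).length / rows.length;
  }

  // Mean duration in milliseconds; 0 when nothing was recorded.
  getAvgLatency(modelId: string, taskType?: string): number {
    const rows = this.select(modelId, taskType);
    if (rows.length === 0) return 0;
    return rows.reduce((sum, o) => sum + o.durationMs, 0) / rows.length;
  }
}
```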
178
-
179
- ---
180
-
181
- ## Part 2: Dashboard -- Policies / Memories / Model Active Panels
182
-
183
- ### A. Architecture Decision
184
-
185
- **Extend the existing `src/cli/dashboard.ts`** with new panels and a new `uap dashboard policies` view. Do NOT build a separate TUI framework. The existing chalk + visualize.ts primitives cover everything needed.
186
-
187
- For the web overlay (Phase 3), add a thin HTTP + WebSocket server that serves JSON from the same data sources the CLI dashboard reads. A single-page HTML file (like the existing `web/generator.html` pattern) consumes it.
188
-
189
- ### B. New Dashboard Panels
190
-
191
- #### Panel 1: Policies Active
192
-
193
- **CLI command**: `uap dashboard policies`
194
-
195
- Reads from `PolicyMemoryManager.getAllPolicies()` and `PolicyGate.getAuditTrail()`.
196
-
197
- ```
198
- UAP Policies Dashboard
199
- ──────────────────────────────────────────────────────
200
-
201
- Active Policies (3)
202
- ──────────────────────────────────────────────────────
203
- Name Level Category Stage Status
204
- ─────────────────────────────────────────────────────────────────────
205
- IaC State Parity REQUIRED code pre-exec ON
206
- Mandatory File Backup REQUIRED code pre-exec ON
207
- Image Asset Verification RECOMMENDED image pre-exec ON
208
-
209
- Enforcement Stages
210
- ──────────────────────────────────────────────────────
211
- pre-exec ████████████████████ 3 policies
212
- post-exec ░░░░░░░░░░░░░░░░░░░ 0 policies
213
- review ░░░░░░░░░░░░░░░░░░░ 0 policies
214
-
215
- Recent Audit Trail (last 10)
216
- ──────────────────────────────────────────────────────
217
- 2026-03-17 14:23 IaC State Parity web_browser ALLOWED
218
- 2026-03-17 14:22 Mandatory File Backup file_write ALLOWED
219
- 2026-03-17 14:20 IaC State Parity terraform BLOCKED "No state file"
220
-
221
- Toggle: uap policy toggle <id> --off
222
- Stage: uap policy stage <id> --stage post-exec
223
- Level: uap policy level <id> --level OPTIONAL
224
- ```
225
-
226
- #### Panel 2: Memories Active
227
-
228
- **CLI command**: `uap dashboard memories`
229
-
230
- Extends the existing memory section in `showSessionDashboard()` (`dashboard.ts:1217-1247`).
231
-
232
- ```
233
- UAP Memories Dashboard
234
- ──────────────────────────────────────────────────────
235
-
236
- Memory Tiers
237
- ──────────────────────────────────────────────────────
238
- L1 Working ████████░░░░░░░░░░░░ 42/50 entries 12 KB
239
- L2 Session ██░░░░░░░░░░░░░░░░░░ 8 entries 3 KB
240
- L3 Semantic Qdrant: Running (Up 4h 23m) 1,247 vectors
241
- L4 Knowledge 23 entities 47 relationships
242
-
243
- Active Memories This Session (by type)
244
- ──────────────────────────────────────────────────────
245
- decision ████████████ 12
246
- observation ████████ 8
247
- pattern ██████ 6
248
- correction ██ 2
249
-
250
- Open Loops (3)
251
- ──────────────────────────────────────────────────────
252
- > TODO: wire dashboard WebSocket to session-stats
253
- > BLOCKED: Qdrant cloud migration pending API key
254
- > REVIEW: memory consolidation threshold too aggressive
255
-
256
- Compression Stats
257
- ──────────────────────────────────────────────────────
258
- Token budget: 32,768 / 262,144 (12.5%)
259
- Compressions: 4 this session
260
- Savings ratio: 73.2%
261
- ```
262
-
263
- #### Panel 3: Model Active Per Task
264
-
265
- **CLI command**: `uap dashboard models`
266
-
267
- Reads from `ModelRouter`, `UnifiedRoutingService`, and the new `ModelAnalytics`.
268
-
269
- ```
270
- UAP Model Dashboard
271
- ──────────────────────────────────────────────────────
272
-
273
- Active Configuration
274
- ──────────────────────────────────────────────────────
275
- Planner: opus-4.6 Claude Opus 4.6 $7.50/$37.50 per 1M
276
- Executor: qwen35 Qwen 3.5 (local) $0.00/$0.00
277
- Reviewer: opus-4.6 Claude Opus 4.6
278
- Fallback: qwen35 Qwen 3.5 (local)
279
- Strategy: balanced
280
-
281
- Routing Matrix
282
- ──────────────────────────────────────────────────────
283
- Complexity Planner Executor
284
- low qwen35 qwen35 $0.00
285
- medium opus-4.6 qwen35 $0.04
286
- high opus-4.6 opus-4.6 $0.22
287
- critical opus-4.6 opus-4.6 $0.22
288
-
289
- Session Usage
290
- ──────────────────────────────────────────────────────
291
- Model Tasks Tokens In Tokens Out Cost Success
292
- opus-4.6 3 4,521 2,103 $0.11 100%
293
- qwen35 12 18,432 9,876 $0.00 91.7%
294
-
295
- Execution Profile: small-moe (Qwen 3.5)
296
- ──────────────────────────────────────────────────────
297
- domainHints: ON webSearch: OFF reflectionCheckpoints: OFF
298
- temperature: 0.15 loopEscapeThreshold: 3 toolChoiceForce: required
299
- softBudget: 35 hardBudget: 50
300
-
301
- Unified Router Consensus
302
- ──────────────────────────────────────────────────────
303
- Last 10 decisions: 8 consensus, 1 rule-based, 1 benchmark-data
304
- Avg confidence: 0.82
305
- ```
306
-
307
- ### C. Policy Enforcement Stages & Toggling
308
-
309
- #### Schema Changes
310
-
311
- **File**: `src/policies/schemas/policy.ts`
312
-
313
- Add `enforcementStage` to the policy schema:
314
-
315
- ```typescript
316
- export const PolicySchema = z.object({
317
- id: z.string().uuid(),
318
- name: z.string(),
319
- category: z.enum(['image', 'code', 'security', 'testing', 'ui', 'automation', 'custom']),
320
- level: z.enum(['REQUIRED', 'RECOMMENDED', 'OPTIONAL']),
321
- enforcementStage: z.enum(['pre-exec', 'post-exec', 'review', 'always']).default('pre-exec'), // NEW
322
- rawMarkdown: z.string(),
323
- convertedFormat: z.string().optional(),
324
- executableTools: z.array(z.string()).optional(),
325
- tags: z.array(z.string()),
326
- createdAt: z
327
- .string()
328
- .refine((d) => !Number.isNaN(Date.parse(d)), { message: 'Invalid ISO date string' }),
329
- updatedAt: z
330
- .string()
331
- .refine((d) => !Number.isNaN(Date.parse(d)), { message: 'Invalid ISO date string' }),
332
- version: z.number(),
333
- isActive: z.boolean(),
334
- priority: z.number().default(50),
335
- });
336
- ```
337
-
338
- **File**: `src/policies/database-manager.ts`
339
-
340
- Add column to `policies` table:
341
-
342
- ```sql
343
- ALTER TABLE policies ADD COLUMN enforcementStage TEXT NOT NULL DEFAULT 'pre-exec';
344
- ```
345
-
346
- Use a migration check pattern (check if column exists before adding).
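The check-before-add pattern could look roughly like this. `MinimalDb` is a hypothetical adapter, not the real `database-manager.ts` API; in a real SQLite-backed implementation `tableInfo()` would presumably wrap `PRAGMA table_info(policies)`. `FakeDb` exists only so the sketch runs on its own.

```typescript
interface ColumnInfo {
  name: string;
}

// Hypothetical adapter over whatever SQLite driver the project uses.
interface MinimalDb {
  tableInfo(table: string): ColumnInfo[];
  exec(sql: string): void;
}

// Adds the column only when it is missing; returns true if a migration ran.
function ensureEnforcementStageColumn(db: MinimalDb): boolean {
  const exists = db.tableInfo('policies').some((c) => c.name === 'enforcementStage');
  if (exists) return false;
  db.exec("ALTER TABLE policies ADD COLUMN enforcementStage TEXT NOT NULL DEFAULT 'pre-exec';");
  return true;
}

// In-memory stand-in so the idempotency can be demonstrated without a database.
class FakeDb implements MinimalDb {
  columns: string[] = ['id', 'name', 'level', 'isActive'];
  executed: string[] = [];

  tableInfo(_table: string): ColumnInfo[] {
    return this.columns.map((name) => ({ name }));
  }

  exec(sql: string): void {
    this.executed.push(sql);
    if (sql.includes('ADD COLUMN enforcementStage')) this.columns.push('enforcementStage');
  }
}
```

Running the function twice against the same database executes the `ALTER TABLE` only once, which is what makes it safe to call on every startup.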
347
-
348
- #### PolicyGate Changes
349
-
350
- **File**: `src/policies/policy-gate.ts`
351
-
352
- Add stage-aware enforcement:
353
-
354
- ```typescript
355
- async executeWithGates<T>(
356
- operation: string,
357
- args: Record<string, unknown>,
358
- executor: () => Promise<T>,
359
- stage: 'pre-exec' | 'post-exec' | 'review' = 'pre-exec' // NEW param
360
- ): Promise<T> {
361
- const gateResult = await this.checkPolicies(operation, args, stage);
362
- // ... existing logic, but only check policies matching this stage
363
- }
364
-
365
- async checkPolicies(
366
- operation: string,
367
- args: Record<string, unknown>,
368
- stage: 'pre-exec' | 'post-exec' | 'review' | 'always' = 'pre-exec'
369
- ): Promise<GateResult> {
370
- const allPolicies = await this.memory.getAllPolicies();
371
- // Filter to policies matching this stage or 'always'
372
- const stagePolicies = allPolicies.filter(
373
- p => p.enforcementStage === stage || p.enforcementStage === 'always'
374
- );
375
- // ... evaluate only stagePolicies
376
- }
377
- ```
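The stage filter can be isolated as a pure function, which makes it easy to test apart from the gate. `policiesForStage` and `StagedPolicy` are hypothetical names; also skipping inactive policies here is an assumption, since the plan's filter only mentions the stage.

```typescript
type EnforcementStage = 'pre-exec' | 'post-exec' | 'review' | 'always';

// Reduced policy shape, for illustration only.
interface StagedPolicy {
  name: string;
  isActive: boolean;
  enforcementStage: EnforcementStage;
}

// Keeps policies bound to the requested stage plus those marked 'always'.
// Assumption: inactive policies are excluded at this point as well.
function policiesForStage(policies: StagedPolicy[], stage: EnforcementStage): StagedPolicy[] {
  return policies.filter(
    (p) => p.isActive && (p.enforcementStage === stage || p.enforcementStage === 'always')
  );
}

// Policy names reuse the examples from the dashboard mock-up; the stages are made up.
const samplePolicies: StagedPolicy[] = [
  { name: 'IaC State Parity', isActive: true, enforcementStage: 'pre-exec' },
  { name: 'Mandatory File Backup', isActive: true, enforcementStage: 'always' },
  { name: 'Image Asset Verification', isActive: false, enforcementStage: 'pre-exec' },
];
```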
378
-
379
- #### CLI Commands for Toggling
380
-
381
- **File**: `src/bin/policy.ts` (extend existing)
382
-
383
- ```
384
- uap policy toggle <id> [--on|--off] # Uses existing PolicyMemoryManager.togglePolicy()
385
- uap policy stage <id> --stage <stage> # New: change enforcement stage
386
- uap policy level <id> --level <level> # New: change REQUIRED/RECOMMENDED/OPTIONAL
387
- uap policy list # New: list all with status/stage/level
388
- uap policy audit [--policy-id <id>] # New: show audit trail
389
- ```
390
-
391
- Implementation: `togglePolicy()` already exists. Add `setEnforcementStage()` and `setLevel()` to `PolicyMemoryManager`:
392
-
393
- ```typescript
394
- // src/policies/policy-memory.ts
395
- async setEnforcementStage(id: string, stage: 'pre-exec' | 'post-exec' | 'review' | 'always'): Promise<void> {
396
- this.db.updatePolicy({ id }, { enforcementStage: stage, updatedAt: new Date().toISOString() });
397
- }
398
-
399
- async setLevel(id: string, level: 'REQUIRED' | 'RECOMMENDED' | 'OPTIONAL'): Promise<void> {
400
- this.db.updatePolicy({ id }, { level, updatedAt: new Date().toISOString() });
401
- }
402
- ```
403
-
404
- ### D. Grouping: Per-Task vs Grouped Display
405
-
406
- The dashboard supports both views:
407
-
408
- 1. **Grouped view** (default for `uap dashboard policies`, `uap dashboard models`): Shows aggregate state -- all active policies, all model assignments, memory tier health.
409
-
410
- 2. **Per-task view** (when a task ID is provided): Shows what was active _for that specific task_.
411
-
412
- ```
413
- uap dashboard policies # Grouped: all policies, stages, audit
414
- uap dashboard policies --task <task-id> # Per-task: which policies fired for this task
415
- uap dashboard models # Grouped: all model assignments, session totals
416
- uap dashboard models --task <task-id> # Per-task: which model handled this task, tokens, cost
417
- uap dashboard memories --task <task-id> # Per-task: memories retrieved/stored for this task
418
- ```
419
-
420
- Per-task view requires linking `policy_executions` and `ModelAnalytics.TaskOutcome` to task IDs. Add a `taskId` column to both:
421
-
422
- - `policy_executions` table: `taskId TEXT` (nullable, for backward compat)
423
- - `ModelAnalytics` outcomes table: `taskId TEXT`
424
-
425
- ---
426
-
427
- ## Part 3: Phase 3 -- Advanced Features
428
-
429
- ### A. Web Overlay (Option 3: Embedded)
430
-
431
- Architecture: The CLI dashboard functions already compute all the data. Extract the data-gathering logic into shared service functions, then expose via a lightweight HTTP server.
432
-
433
- #### 3.1 Data service layer
434
-
435
- **New file**: `src/dashboard/data-service.ts`
436
-
437
- ```typescript
438
- export interface DashboardData {
439
- policies: PolicyDashboardData;
440
- memories: MemoryDashboardData;
441
- models: ModelDashboardData;
442
- tasks: TaskDashboardData;
443
- coordination: CoordinationDashboardData;
444
- }
445
-
446
- export async function getDashboardData(): Promise<DashboardData> {
447
- // Reuse the same DB queries from dashboard.ts but return structured data
448
- // instead of printing to console
449
- }
450
- ```
451
-
452
- #### 3.2 Embedded HTTP + WebSocket server
453
-
454
- **New file**: `src/dashboard/server.ts`
455
-
456
- ```typescript
457
- import { createServer } from 'http';
458
- import { WebSocketServer } from 'ws'; // ws package, already common in Node ecosystem
459
- import { getDashboardData } from './data-service.js';
460
-
461
- export function startDashboardServer(port: number = 3847): void {
462
- const server = createServer(async (req, res) => {
463
- if (req.url === '/api/dashboard') {
464
- const data = await getDashboardData();
465
- res.writeHead(200, { 'Content-Type': 'application/json' });
466
- res.end(JSON.stringify(data));
467
- }
468
- if (req.url === '/') {
469
- // Serve the single-page dashboard HTML (like web/generator.html pattern)
470
- res.writeHead(200, { 'Content-Type': 'text/html' });
471
- res.end(DASHBOARD_HTML); // Inline or read from file
472
- }
473
- });
474
-
475
- const wss = new WebSocketServer({ server });
476
- // Push updates every 2s
477
- setInterval(async () => {
478
- const data = await getDashboardData();
479
- for (const client of wss.clients) {
480
- client.send(JSON.stringify(data));
481
- }
482
- }, 2000);
483
-
484
- server.listen(port);
485
- }
486
- ```
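As written, the request handler above sends no response for URLs other than `/api/dashboard` and `/`, so such requests would stay open until they time out. One way to close that gap is to decide the response in a pure function and always fall through to a 404. `routeDashboardRequest` and `RouteResult` are hypothetical names for this sketch.

```typescript
interface RouteResult {
  status: number;
  contentType: string;
}

// Maps a request URL to the status and content type the server should send.
// Every input produces a result, so the caller can always end the response.
function routeDashboardRequest(url: string | undefined): RouteResult {
  switch (url) {
    case '/api/dashboard':
      return { status: 200, contentType: 'application/json' };
    case '/':
      return { status: 200, contentType: 'text/html' };
    default:
      return { status: 404, contentType: 'text/plain' };
  }
}
```

The handler would then call `res.writeHead(result.status, { 'Content-Type': result.contentType })` and `res.end(...)` on every path, including the 404 case.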
487
-
488
- #### 3.3 CLI integration
489
-
490
- **File**: `src/cli/dashboard.ts`
491
-
492
- ```
493
- uap dashboard serve [--port 3847] # Start embedded web dashboard
494
- ```
495
-
496
- Launches as a foreground process. When opencode/claude-code exits, the server dies with it, because it runs as a child process of the same shell.
497
-
498
- #### 3.4 Single-page HTML dashboard
499
-
500
- **New file**: `web/dashboard.html`
501
-
502
- Self-contained HTML + CSS + vanilla JS (no build step, same pattern as `web/generator.html`). Connects to `ws://localhost:3847`, renders:
503
-
504
- - Policy table with toggle buttons (POST to `/api/policy/:id/toggle`)
505
- - Memory tier gauges
506
- - Model routing live view
507
- - Cost tracker
508
- - Task timeline
509
-
510
- ### B. Historical session comparison
511
-
512
- Store session snapshots in SQLite (`agents/data/memory/sessions.db`):
513
-
514
- ```sql
515
- CREATE TABLE session_snapshots (
516
- id TEXT PRIMARY KEY,
517
- timestamp TEXT NOT NULL,
518
- data TEXT NOT NULL, -- JSON blob from getDashboardData()
519
- duration_ms INTEGER,
520
- total_cost REAL,
521
- tasks_completed INTEGER,
522
- models_used TEXT -- JSON array
523
- );
524
- ```
525
-
526
- CLI: `uap dashboard history [--last 10]`
527
-
528
- ### C. Export
529
-
530
- `uap dashboard export [--format json|csv] [--output file]`
531
-
532
- Dumps current dashboard data. JSON is the `DashboardData` object. CSV flattens the key tables (policies, model usage, task outcomes).
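For the CSV path, flattening one table might look like the sketch below. `toCsv` and `escapeCsv` are hypothetical helpers; the quoting follows the usual CSV convention of wrapping fields that contain commas, quotes, or line breaks and doubling embedded quotes. Taking the header row from the first record is an assumption.

```typescript
type Cell = string | number | boolean;

// Quotes a field only when it contains a comma, quote, or line break.
function escapeCsv(value: Cell): string {
  const text = String(value);
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serializes uniform rows; column order comes from the first row's keys.
function toCsv(rows: Array<Record<string, Cell>>): string {
  const first = rows[0];
  if (first === undefined) return '';
  const headers = Object.keys(first);
  const lines = [headers.map(escapeCsv).join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => escapeCsv(row[h] ?? '')).join(','));
  }
  return lines.join('\n');
}
```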
533
-
- ---
-
- ## Part 4: Implementation Roadmap
-
- ### Phase 1: Foundation (Week 1-2)
-
- | # | Task | File(s) | Status |
- | --- | ---------------------------------------------------------------- | -------------------------------------------------------------------- | ------ |
- | 1 | Update unified-router model maps for opus-4.6/qwen35 | `src/models/unified-router.ts` | Ready |
- | 2 | Update CLI model defaults | `src/cli/model.ts` | Ready |
- | 3 | Add benchmark fingerprints for new models | `src/memory/model-router.ts` | Ready |
- | 4 | Add `enforcementStage` to policy schema + DB migration | `src/policies/schemas/policy.ts`, `src/policies/database-manager.ts` | Ready |
- | 5 | Add `setEnforcementStage()`, `setLevel()` to PolicyMemoryManager | `src/policies/policy-memory.ts` | Ready |
- | 6 | Add stage-aware filtering to PolicyGate | `src/policies/policy-gate.ts` | Ready |
- | 7 | Add policy CLI commands (toggle/stage/level/list/audit) | `src/bin/policy.ts` | Ready |
-
- ### Phase 2: Dashboard Panels (Week 3-4)
-
- | # | Task | File(s) | Status |
- | --- | ------------------------------------------------------- | ------------------------------------------------ | ------ |
- | 8 | Replace hardcoded policies section with DB-driven panel | `src/cli/dashboard.ts:1305-1317` | Ready |
- | 9 | Build `showPoliciesDashboard()` panel | `src/cli/dashboard.ts` | Ready |
- | 10 | Build `showModelsDashboard()` panel | `src/cli/dashboard.ts` | Ready |
- | 11 | Extend `showMemoryDashboard()` with compression stats | `src/cli/dashboard.ts` | Ready |
- | 12 | Add `--task <id>` per-task filtering to all panels | `src/cli/dashboard.ts` | Ready |
- | 13 | Create `ModelAnalytics` module | `src/models/analytics.ts` | Ready |
- | 14 | Wire `ModelAnalytics` into router + executor | `src/models/router.ts`, `src/models/executor.ts` | Ready |
-
- ### Phase 3: Web Overlay + Advanced (Week 5-6)
-
- | # | Task | File(s) | Status |
- | --- | -------------------------------------------- | ------------------------------- | ------ |
- | 15 | Extract data-service layer from dashboard.ts | `src/dashboard/data-service.ts` | Ready |
- | 16 | Build embedded HTTP + WebSocket server | `src/dashboard/server.ts` | Ready |
- | 17 | Build single-page HTML dashboard | `web/dashboard.html` | Ready |
- | 18 | Add `uap dashboard serve` command | `src/cli/dashboard.ts` | Ready |
- | 19 | Add policy toggle/stage/level API endpoints | `src/dashboard/server.ts` | Ready |
- | 20 | Session snapshot storage + history view | `src/dashboard/data-service.ts` | Ready |
- | 21 | Export command (JSON/CSV) | `src/cli/dashboard.ts` | Ready |
-
- ### Phase 4: Model Optimization (Week 7-8)
-
- | # | Task | File(s) | Status |
- | --- | --------------------------------------------------- | --------------------------------------------- | ------ |
- | 22 | Add `quantizationHint` to ExecutionProfile | `src/models/execution-profiles.ts` | Ready |
- | 23 | Add `modelContextBudget` to ModelConfig | `src/models/types.ts` | Ready |
- | 24 | Wire adaptive context to respect modelContextBudget | `src/memory/adaptive-context.ts` | Ready |
- | 25 | Add token counter to globalSessionStats | `src/mcp-router/session-stats.ts` | Ready |
- | 26 | Add `routingMatrix` config option | `src/models/types.ts`, `src/models/router.ts` | Ready |
- | 27 | Training data collection script | `scripts/collect-training-data.py` | Ready |
-
- ---
-
- ## Part 5: Dependency Graph
-
- ```
- Phase 1 (foundation)
-   [1,2,3] unified-router + CLI + fingerprints (parallel, no deps)
-   [4,5,6] policy schema + memory + gate (sequential: 4 -> 5 -> 6)
-   [7] policy CLI commands (depends on 5,6)
-
- Phase 2 (dashboard panels)
-   [8,9] policies dashboard (depends on 5,6)
-   [10] models dashboard (depends on 1,2,3)
-   [11] memory dashboard (no deps)
-   [12] per-task filtering (depends on 13)
-   [13,14] ModelAnalytics (depends on 1)
-
- Phase 3 (web overlay)
-   [15] data-service extraction (depends on 8,9,10,11)
-   [16,17,18] server + HTML + CLI (depends on 15)
-   [19] policy API endpoints (depends on 16, 5,6)
-   [20,21] history + export (depends on 15)
-
- Phase 4 (model optimization)
-   [22-27] all independent of Phase 3, can run in parallel with Phase 2-3
- ```
-
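One way to check that the graph above admits a valid execution order is a topological sort. The sketch below encodes tasks 1 to 21 as read from the graph and applies Kahn's algorithm; `planDeps` and `topoOrder` are illustrative names, not part of the package.

```typescript
// Task number -> tasks it depends on, transcribed from Phases 1-3 of the graph.
type DepGraph = Record<number, number[]>;

const planDeps: DepGraph = {
  1: [], 2: [], 3: [],
  4: [], 5: [4], 6: [5],
  7: [5, 6],
  8: [5, 6], 9: [5, 6],
  10: [1, 2, 3],
  11: [],
  12: [13],
  13: [1], 14: [1],
  15: [8, 9, 10, 11],
  16: [15], 17: [15], 18: [15],
  19: [16, 5, 6],
  20: [15], 21: [15],
};

// Kahn's algorithm: repeatedly emit tasks whose dependencies are all done.
function topoOrder(deps: DepGraph): number[] {
  const tasks = Object.keys(deps).map(Number).sort((a, b) => a - b);
  const unmet: Record<number, number> = {};
  const dependents: Record<number, number[]> = {};
  for (const task of tasks) {
    unmet[task] = 0;
    dependents[task] = [];
  }
  for (const task of tasks) {
    for (const need of deps[task] ?? []) {
      const list = dependents[need];
      if (list === undefined) throw new Error(`unknown dependency ${need}`);
      list.push(task);
      unmet[task] = (unmet[task] ?? 0) + 1;
    }
  }
  const ready = tasks.filter((task) => unmet[task] === 0);
  const order: number[] = [];
  while (ready.length > 0) {
    const task = ready.shift() as number;
    order.push(task);
    for (const next of dependents[task] ?? []) {
      const left = (unmet[next] ?? 0) - 1;
      unmet[next] = left;
      if (left === 0) ready.push(next);
    }
  }
  // Any task left unemitted is part of a cycle.
  if (order.length !== tasks.length) throw new Error("dependency cycle detected");
  return order;
}
```

Calling `topoOrder(planDeps)` yields one admissible ordering; tasks that become ready at the same time (such as 1, 2, and 3) can run in parallel, as the graph notes.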
- ---
-
- ## Part 6: Risk Assessment
-
- | Risk | Impact | Mitigation |
- | ------------------------------------------------------- | ------ | ------------------------------------------------------------------------------------ |
- | Policy DB migration breaks existing data | High | Use `ALTER TABLE ADD COLUMN ... DEFAULT` -- backward compatible |
- | WebSocket server port conflicts | Low | Configurable port, default 3847 (unlikely to conflict) |
- | Dashboard overhead slows agent execution | Medium | Data-service reads are read-only SQLite queries (<5ms each), WebSocket push is async |
- | Qwen 3.5 quantization switching requires server restart | Medium | Document as limitation; future: llama.cpp hot-swap support |
- | Unified router model map drift | Low | Add test that validates all ModelPresets have map entries |
-
- ---
-
- ## Part 7: Test Strategy
-
- ### Unit tests needed
-
- ```
- test/policies/enforcement-stage.test.ts -- stage filtering in PolicyGate
- test/policies/policy-toggle.test.ts -- toggle/level/stage mutations
- test/models/unified-router-maps.test.ts -- all presets have map entries
- test/models/analytics.test.ts -- outcome recording + queries
- test/dashboard/data-service.test.ts -- structured data output
- ```
-
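The "all presets have map entries" check could be as small as the sketch below. `findUnmappedPresets` is a hypothetical helper, and the preset ids and map contents in the usage note are placeholders rather than values from the package.

```typescript
// Returns every preset id that has no entry in the router's model map.
// An empty result means the map covers all presets.
function findUnmappedPresets(
  presetIds: readonly string[],
  modelMap: Readonly<Record<string, string>>,
): string[] {
  return presetIds.filter(
    (id) => !Object.prototype.hasOwnProperty.call(modelMap, id),
  );
}
```

A test in `unified-router-maps.test.ts` might then assert that `findUnmappedPresets(presetIds, map)` is empty, so that a preset added without a matching map entry fails the suite and reports the missing id.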
- ### Integration tests
-
- ```
- test/dashboard/policies-panel.test.ts -- end-to-end: store policy -> toggle -> verify dashboard output
- test/dashboard/models-panel.test.ts -- route task -> verify model usage in dashboard
- test/dashboard/web-server.test.ts -- HTTP + WebSocket connectivity
- ```
-
- ### Validation command
-
- ```bash
- npm test -- --grep "enforcement-stage|unified-router-maps|analytics"
- ```
-
- ---
-
- ## Implementation Status (2026-03-19)
-
- ### ✅ Completed - All Critical Issues Fixed
-
- **Phase 1: Correctness Bugs (COMPLETED)**
-
- - ✅ Fixed `validModelIds` in `adaptive-context.ts:834` to include `opus-4.6`
- - ✅ Updated `ModelId` type in `model-router.ts:23-29` to include `opus-4.6`
- - ✅ Updated `RULE_TO_BENCHMARK_MODEL_MAP` in `unified-router.ts:51` to use `opus-4.6`
- - ✅ Verified `routingMatrix` consumption in `router.ts:247-264` (already implemented)
-
- **Phase 2: Performance Fixes (COMPLETED)**
-
- - ✅ Fixed N+1 query in `tasks/service.ts:535-570` by batching parent lookups
- - ✅ Added DB connection caching in `dashboard/data-service.ts:25-26, 242-275`
- - ✅ Added file read cache with mtime invalidation in `dynamic-retrieval.ts:40-48, 693-748, 794-845`
-
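For readers unfamiliar with mtime invalidation, the idea behind the file read cache can be sketched as follows. This is not the code in `dynamic-retrieval.ts`: `FileSource` and `createMtimeCache` are hypothetical, and file access is injected so the sketch stays self-contained (in Node, the two callbacks could wrap `fs.statSync(path).mtimeMs` and `fs.readFileSync(path, "utf8")`).

```typescript
// Injected file access, so the cache logic has no direct fs dependency.
interface FileSource {
  mtimeMs(path: string): number;
  read(path: string): string;
}

interface CacheEntry {
  mtimeMs: number;
  content: string;
}

// Returns a reader that re-reads a file only when its modification time changes.
function createMtimeCache(source: FileSource): (path: string) => string {
  const entries: Record<string, CacheEntry> = Object.create(null);
  return (path: string): string => {
    const mtimeMs = source.mtimeMs(path);
    const cached = entries[path];
    if (cached !== undefined && cached.mtimeMs === mtimeMs) {
      return cached.content; // unchanged on disk: serve the cached copy
    }
    const content = source.read(path);
    entries[path] = { mtimeMs, content };
    return content;
  };
}
```

The trade-off is one stat call per lookup in exchange for skipping the read and parse when the file has not changed.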
- **Phase 3: Resource Leaks (COMPLETED)**
-
- - ✅ Verified `.unref()` in `adaptive-cache.ts:160` (already present)
- - ✅ Verified `process.on('exit')` handlers in `analytics.ts:294-299` and `policy-gate.ts:493-499`
- - ✅ Reduced DB_POOL_SIZE from 5 to 1 in `adaptive-context.ts:74` and `model-router.ts:203`
-
- **Phase 4: Performance Improvements (COMPLETED)**
-
- - ✅ Added file read caching for `long_term_prepopulated.json` and `CLAUDE.md`
- - ✅ Verified constant hoisting in `adaptive-context.ts:672` (already present)
-
- **Phase 5: Code Quality (COMPLETED)**
-
- - ✅ No additional improvements needed: all identified issues were already fixed or intentional
-
- ### Build Status
-
- ```bash
- npm run build # ✅ Passes with zero errors
- tsc --noEmit # ✅ Type checking passes
- ```
-
- ### Next Steps
-
- The validated optimization plan is now fully implemented. Remaining work from the original stale documents can be safely ignored, as it describes features that were already implemented before the validation exercise.
-
- Focus areas for future work:
-
- 1. Web dashboard implementation (embedded HTTP server with WebSocket)
- 2. Enforcement stage concept for policies
- 3. Additional pattern optimizations from `.factory/patterns/`