@miller-tech/uap 1.40.0 → 1.41.0

Files changed (150)
  1. package/README.md +109 -642
  2. package/dist/.tsbuildinfo +1 -1
  3. package/dist/cli/deliver-defaults.d.ts +23 -0
  4. package/dist/cli/deliver-defaults.d.ts.map +1 -0
  5. package/dist/cli/deliver-defaults.js +121 -0
  6. package/dist/cli/deliver-defaults.js.map +1 -0
  7. package/dist/cli/init.d.ts.map +1 -1
  8. package/dist/cli/init.js +29 -0
  9. package/dist/cli/init.js.map +1 -1
  10. package/dist/cli/setup.d.ts.map +1 -1
  11. package/dist/cli/setup.js +19 -0
  12. package/dist/cli/setup.js.map +1 -1
  13. package/dist/policies/policy-tools.d.ts +7 -0
  14. package/dist/policies/policy-tools.d.ts.map +1 -1
  15. package/dist/policies/policy-tools.js +24 -2
  16. package/dist/policies/policy-tools.js.map +1 -1
  17. package/docs/INDEX.md +48 -286
  18. package/docs/architecture/OVERVIEW.md +328 -0
  19. package/docs/architecture/PROTOCOL.md +204 -0
  20. package/docs/benchmarks/README.md +17 -192
  21. package/docs/getting-started/CONFIGURATION.md +237 -0
  22. package/docs/getting-started/INSTALLATION.md +125 -0
  23. package/docs/getting-started/QUICKSTART.md +115 -0
  24. package/docs/guides/COORDINATION.md +162 -0
  25. package/docs/guides/DELIVER.md +115 -0
  26. package/docs/guides/DEPLOY_BATCHING.md +212 -0
  27. package/docs/guides/DROIDS_AND_SKILLS.md +202 -0
  28. package/docs/guides/LOCAL_MODELS.md +148 -0
  29. package/docs/guides/MCP_ROUTER.md +195 -0
  30. package/docs/guides/MEMORY.md +235 -0
  31. package/docs/guides/MULTI_MODEL.md +223 -0
  32. package/docs/guides/POLICIES.md +190 -0
  33. package/docs/guides/WORKTREE_WORKFLOW.md +185 -0
  34. package/docs/integrations/MCP_ROUTER.md +147 -0
  35. package/docs/integrations/RTK.md +102 -0
  36. package/docs/reference/API.md +485 -0
  37. package/docs/reference/CLI.md +719 -0
  38. package/docs/reference/CONFIGURATION.md +90 -193
  39. package/docs/reference/DATABASE_SCHEMA.md +110 -344
  40. package/docs/reference/FEATURES.md +176 -472
  41. package/docs/reference/PATTERNS.md +102 -0
  42. package/docs/reference/PLATFORMS.md +83 -0
  43. package/package.json +3 -1
  44. package/src/policies/enforcers/7ebbc721-7540-4e9f-879a-770e0213a09b_architecture_review.py +101 -0
  45. package/src/policies/enforcers/__pycache__/_common.cpython-312.pyc +0 -0
  46. package/src/policies/enforcers/_common.py +100 -0
  47. package/src/policies/enforcers/artifact_hygiene.py +52 -0
  48. package/src/policies/enforcers/cluster_routing.py +63 -0
  49. package/src/policies/enforcers/codebase_read_before_plan.py +52 -0
  50. package/src/policies/enforcers/coord_overlap.py +81 -0
  51. package/src/policies/enforcers/delivery_enforcement.py +97 -0
  52. package/src/policies/enforcers/doc_live_over_report.py +50 -0
  53. package/src/policies/enforcers/expert_review_required.py +135 -0
  54. package/src/policies/enforcers/iac_parity.py +53 -0
  55. package/src/policies/enforcers/mcp_router_first.py +37 -0
  56. package/src/policies/enforcers/memory_before_plan.py +61 -0
  57. package/src/policies/enforcers/parallel_reads.py +50 -0
  58. package/src/policies/enforcers/rtk_wrap.py +44 -0
  59. package/src/policies/enforcers/schema_diff_gate.py +80 -0
  60. package/src/policies/enforcers/session_memory_write.py +52 -0
  61. package/src/policies/enforcers/task_required.py +131 -0
  62. package/src/policies/enforcers/test_gate.py +58 -0
  63. package/src/policies/enforcers/validate_plan_before_build.py +75 -0
  64. package/src/policies/enforcers/worktree_required.py +57 -0
  65. package/src/policies/schemas/policies/architecture-review.md +51 -0
  66. package/src/policies/schemas/policies/artifact-hygiene.md +29 -0
  67. package/src/policies/schemas/policies/cluster-routing.md +31 -0
  68. package/src/policies/schemas/policies/codebase-read-before-plan.md +30 -0
  69. package/src/policies/schemas/policies/coord-overlap.md +24 -0
  70. package/src/policies/schemas/policies/delivery-enforcement.md +45 -0
  71. package/src/policies/schemas/policies/doc-live-over-report.md +32 -0
  72. package/src/policies/schemas/policies/expert-review-required.md +60 -0
  73. package/src/policies/schemas/policies/iac-parity.md +31 -0
  74. package/src/policies/schemas/policies/mandatory-testing-deployment.md +147 -0
  75. package/src/policies/schemas/policies/mcp-router-first.md +24 -0
  76. package/src/policies/schemas/policies/memory-before-plan.md +24 -0
  77. package/src/policies/schemas/policies/merge-deploy-monitor-verify.md +145 -0
  78. package/src/policies/schemas/policies/parallel-reads.md +24 -0
  79. package/src/policies/schemas/policies/rtk-wrap.md +26 -0
  80. package/src/policies/schemas/policies/schema-diff-gate.md +30 -0
  81. package/src/policies/schemas/policies/session-memory-write.md +24 -0
  82. package/src/policies/schemas/policies/task-required.md +49 -0
  83. package/src/policies/schemas/policies/test-gate.md +24 -0
  84. package/src/policies/schemas/policies/validate-plan-before-build.md +28 -0
  85. package/src/policies/schemas/policies/worktree-required.md +28 -0
  86. package/templates/hooks/uap-policy-gate.sh +5 -0
  87. package/docs/AGENTS.md +0 -423
  88. package/docs/DOCUMENTATION_AUDIT_REPORT.md +0 -131
  89. package/docs/GETTING_STARTED.md +0 -288
  90. package/docs/PROJECT_ANALYSIS_REPORT.md +0 -510
  91. package/docs/architecture/COMPLETE_ARCHITECTURE.md +0 -748
  92. package/docs/architecture/EXPERT_STACK.md +0 -137
  93. package/docs/architecture/MULTI_MODEL.md +0 -224
  94. package/docs/architecture/PLATFORM_GATING.md +0 -68
  95. package/docs/architecture/SYSTEM_ANALYSIS.md +0 -334
  96. package/docs/architecture/UAP_COMPLIANCE.md +0 -217
  97. package/docs/architecture/UAP_PROTOCOL.md +0 -339
  98. package/docs/architecture/UAP_STRICT_DROIDS.md +0 -172
  99. package/docs/archive/BALLS_MODE_SELF_ANALYSIS.md +0 -260
  100. package/docs/archive/BENCHMARK_GAPS_AND_PLAN.md +0 -146
  101. package/docs/archive/FAILING_TASKS_SOLUTION_PLAN.md +0 -668
  102. package/docs/archive/JINJA2-SYSTEM-MESSAGE-FIX.md +0 -209
  103. package/docs/archive/MODEL_ROUTING_IMPLEMENTATION_SUMMARY.md +0 -281
  104. package/docs/archive/MODEL_ROUTING_OPTIMIZATION_PLAN.md +0 -320
  105. package/docs/archive/NPM-PUBLISH-V0.9.1.md +0 -240
  106. package/docs/archive/OPTIMIZATION_OPTIONS.md +0 -334
  107. package/docs/archive/PARALLELISM_GAPS_AND_OPTIONS.md +0 -422
  108. package/docs/archive/POLICY_GATE_IMPLEMENTATION.md +0 -245
  109. package/docs/archive/SETUP_IMPROVEMENTS.md +0 -213
  110. package/docs/archive/UAP_GENERIC_OPTIMIZATION_PLAN.md +0 -270
  111. package/docs/archive/UAP_OPTIMIZATION_PLAN.md +0 -701
  112. package/docs/archive/UAP_V103_PATTERN_DESIGN.md +0 -315
  113. package/docs/archive/UAP_V104_COMPLIANCE_DESIGN.md +0 -223
  114. package/docs/archive/changelog/2026-03-10_uap-100-compliance.md +0 -77
  115. package/docs/archive/changelog/2026-03-10_uap-full-system-verification.md +0 -109
  116. package/docs/archive/opencode-integration-guide.md +0 -740
  117. package/docs/archive/opencode-integration-quickref.md +0 -180
  118. package/docs/benchmarks/OVERNIGHT_RUNNER.md +0 -341
  119. package/docs/benchmarks/SPECULATIVE_DECODING_JOURNEY_2026-03.md +0 -221
  120. package/docs/benchmarks/VALIDATION_PLAN.md +0 -568
  121. package/docs/blog/SPECULATIVE_DECODING_PRODUCTION_PLAYBOOK.md +0 -139
  122. package/docs/blog/local-coding-agents.md +0 -266
  123. package/docs/blog/x-thread.md +0 -254
  124. package/docs/deployment/DEPLOYMENT.md +0 -895
  125. package/docs/deployment/DEPLOYMENT_STRATEGIES.md +0 -518
  126. package/docs/deployment/DEPLOY_BATCHER_ANALYSIS.md +0 -224
  127. package/docs/deployment/DEPLOY_BATCHING.md +0 -273
  128. package/docs/deployment/DEPLOY_BUCKETING_ANALYSIS.md +0 -420
  129. package/docs/deployment/QWEN35_LLAMA_CPP.md +0 -426
  130. package/docs/deployment/UAP_LLAMA_ANTHROPIC_PROXY_BOOTSTRAP.md +0 -279
  131. package/docs/getting-started/INTEGRATION.md +0 -628
  132. package/docs/getting-started/OVERVIEW.md +0 -324
  133. package/docs/getting-started/SETUP.md +0 -377
  134. package/docs/integrations/MCP_ROUTER_SETUP.md +0 -445
  135. package/docs/integrations/RTK_INTEGRATION.md +0 -468
  136. package/docs/operations/TROUBLESHOOTING.md +0 -660
  137. package/docs/pr/PR_SPECULATIVE_DOCS_TEMPLATE.md +0 -146
  138. package/docs/pr/UPSTREAM_PRS.md +0 -424
  139. package/docs/reference/API_REFERENCE.md +0 -903
  140. package/docs/reference/EXPERT_DROIDS.md +0 -219
  141. package/docs/reference/HARNESS-MATRIX.md +0 -318
  142. package/docs/reference/PATTERN_LIBRARY.md +0 -636
  143. package/docs/reference/UAP_CLI_REFERENCE.md +0 -620
  144. package/docs/research/BEHAVIORAL_PATTERNS.md +0 -228
  145. package/docs/research/DOMAIN_STRATEGIES.md +0 -316
  146. package/docs/research/MEMORY_SYSTEMS_COMPARISON.md +0 -812
  147. package/docs/research/PATTERN_ANALYSIS_2026-01-18.md +0 -436
  148. package/docs/research/PERFORMANCE_ANALYSIS_2026-01-18.md +0 -209
  149. package/docs/research/PERFORMANCE_TEST_PLAN.md +0 -383
  150. package/docs/research/TERMINAL_BENCH_LEARNINGS.md +0 -217
package/docs/benchmarks/SPECULATIVE_DECODING_JOURNEY_2026-03.md
@@ -1,221 +0,0 @@
- # Speculative Decoding Journey (2026-03)
-
- This document records the end-to-end speculative decoding stabilization journey across `llama.cpp` runtime tuning and `uap-anthropic-proxy` guardrails, including fixes, benchmark results, and the production profile now in use.
-
- ## Scope
-
- - Runtime: `llama.cpp` with Qwen3.5 models, CUDA, `ctx-size=262144`.
- - Gateway: Anthropic-compatible proxy (`tools/agents/scripts/anthropic_proxy.py`).
- - Client behavior: agentic coding loops with tool calls (Claude Code style).
-
- ## Goals
-
- 1. Preserve high speculative decoding throughput.
- 2. Eliminate pathological loops and malformed visible output.
- 3. Keep tool-call behavior reliable under long sessions.
- 4. Keep production context window at `262144`.
-
- ## Phase 1 - Llama.cpp Speculative Stability
-
- ### Problems Observed
-
- - Rollback loops and instability under aggressive speculative settings.
- - `find_slot` and related server warnings during long agentic sessions.
- - Throughput regressions compared to known fast baseline.
-
- ### Work Performed
-
- - Implemented and tested multiple rollback strategies in `llama.cpp` worktree branches.
- - Compared baseline fast commit vs newer speculative logic.
- - Restored proven fast runtime path for production service while preserving learned guardrails.
-
- ### Key Runtime Decisions
-
- - Keep production on fast validated binary lineage (`029edcafc` baseline family).
- - Use strict balanced speculative profile for 35B operations:
-   - `speculative.n_max=12`
-   - `speculative.n_min=2`
-   - `speculative.p_min=0.80`
-
- ### Representative Throughput Findings
-
- - Qwen3.5-27B, `ctx=262144`, q4 KV cache:
-   - No spec: ~43 tok/s coding, ~41 tok/s pattern.
-   - Spec (balanced): ~43 tok/s coding, ~102 tok/s pattern.
- - Main uplift appears in pattern-heavy turns, not all coding turns.
-
- ## Phase 2 - Proxy Reasoning Fallback Leak Fix
-
- ### Problems Observed
-
- - Empty visible output (`output_tokens=0`) with large hidden reasoning payloads.
- - Proxy emitted malformed chain-of-thought text as fallback, causing user-visible garbage:
-   - repeated fragments like `</parameter>`, tool schema echoes, policy text loops.
-
- ### Fixes Implemented
-
- - Added explicit streaming fallback policy:
-   - `PROXY_STREAM_REASONING_FALLBACK=off|sanitized|visible`
-   - `PROXY_STREAM_REASONING_MAX_CHARS`
- - Set production default to `off`.
-
- ### Result
-
- - Malformed reasoning fallback leakage is suppressed by default.
- - Debugging remains possible with `sanitized`/`visible` modes when intentionally enabled.
-
- ## Phase 3 - Token Floor and Prune Controls
-
- ### Problems Observed
-
- - Hardcoded `max_tokens` floor (`16384`) forced very long failure turns.
- - Pruning threshold flag alone could trigger pruning path without meaningful message reduction.
-
- ### Fixes Implemented
-
- - Added configurable max token floor:
-   - `PROXY_MAX_TOKENS_FLOOR` (`0` disables floor)
- - Added configurable prune target:
-   - `PROXY_CONTEXT_PRUNE_TARGET_FRACTION`
-
- ### Live A/B Result (Production-Like)
-
- `PROXY_MAX_TOKENS_FLOOR=16384` vs `4096`:
-
- - Silent reasoning-heavy turn:
-   - `16384`: avg `78.749s`
-   - `4096`: avg `19.777s`
-   - Latency reduction: ~`74.9%`
-   - Predicted throughput unchanged (~`208 tok/s` class)
- - Normal tool turns remained stable and slightly faster with `4096`.
-
- ## Phase 4 - Malformed Tool-Loop Hardening
-
- ### Problem Pattern
-
- Under adversarial or degraded prompt states, the model can emit pseudo-tool text instead of valid tool calls, e.g.:
-
- - `</parameter>` fragments
- - echoed policy snippets (`you MUST call a tool...`)
- - long no-progress text with no `tool_calls`
-
- ### Feature Set Added (Flag Controlled)
-
- 1. **Malformed tool guardrail + retry**
-    - `PROXY_MALFORMED_TOOL_GUARDRAIL`
-    - `PROXY_MALFORMED_TOOL_RETRY_MAX`
-    - `PROXY_MALFORMED_TOOL_RETRY_MAX_TOKENS`
-    - `PROXY_MALFORMED_TOOL_RETRY_TEMPERATURE`
-
- 2. **Strict stream guardrail path**
-    - `PROXY_MALFORMED_TOOL_STREAM_STRICT`
-    - For stream+tools requests, proxy runs guarded non-stream upstream call, then replays SSE.
-
- 3. **Tool narrowing (optional)**
-    - `PROXY_TOOL_NARROWING`
-    - `PROXY_TOOL_NARROWING_KEEP`
-    - `PROXY_TOOL_NARROWING_MIN_TOOLS`
-
- 4. **Disable thinking on tool turns (optional)**
-    - `PROXY_DISABLE_THINKING_ON_TOOL_TURNS`
-
- 5. **Session contamination breaker (optional safety net)**
-    - `PROXY_SESSION_CONTAMINATION_BREAKER`
-    - `PROXY_SESSION_CONTAMINATION_THRESHOLD`
-    - `PROXY_SESSION_CONTAMINATION_KEEP_LAST`
-
- 6. **Agentic supplement mode**
-    - `PROXY_AGENTIC_SUPPLEMENT_MODE=clean|legacy`
-
- ### Test Coverage
-
- - Unit tests in `tools/agents/tests/test_anthropic_proxy_streaming.py`
- - Current targeted suite count in this workstream: `16` passing tests.
-
- ## Benchmark Highlights (Per-Option Toggles)
-
- ### Artifact Stress Benchmark (v3)
-
- Source: `/tmp/proxy_visibility_benchmark_v3.json`
-
- | Mode | Key Flags | Outcome Summary |
- | --- | --- | --- |
- | Baseline | none | no tool call, policy-echo text surfaced |
- | Option 1 | malformed guardrail + strict stream | malformed detected and retried; returned `tool_use` with empty visible text |
- | Option 2 | tool narrowing only | not sufficient alone in stress case |
- | Option 3 | disable thinking only | not sufficient alone in stress case |
- | Option 4 | contamination breaker only | not sufficient alone in this synthetic workload |
- | Option 5 | clean supplement only | not sufficient alone in stress case |
-
- ### Practical Conclusion
-
- - Strongest primary mitigation: **Option 1** (malformed guardrail + strict stream + bounded retry).
- - Other options are secondary tuning aids and should not replace Option 1 for this failure class.
-
- ## 10-Turn Live Stability Soak
-
- Source: `/tmp/proxy_10turn_soak_results.json`
-
- - 10 turns, alternating malformed-stress and normal tool-call turns, single live session id.
- - Results:
-   - Error rate: `0.0%`
-   - Malformed visible output rate (stress turns): `0.0%`
-   - Normal tool-call success rate: `100.0%`
-   - Duration p50/p95: `10.2s` / `21.366s`
-   - Stop reasons: `tool_use=6`, `max_tokens=3`, `end_turn=1`
-
- ## Production Profile (Current)
-
- File: `/home/cogtek/.config/uap/anthropic-proxy.env`
-
- ```bash
- PROXY_MAX_TOKENS_FLOOR=4096
- PROXY_STREAM_REASONING_FALLBACK=off
-
- PROXY_MALFORMED_TOOL_GUARDRAIL=on
- PROXY_MALFORMED_TOOL_STREAM_STRICT=on
- PROXY_MALFORMED_TOOL_RETRY_MAX=1
- PROXY_MALFORMED_TOOL_RETRY_MAX_TOKENS=512
- PROXY_MALFORMED_TOOL_RETRY_TEMPERATURE=0
-
- PROXY_TOOL_NARROWING=off
- PROXY_DISABLE_THINKING_ON_TOOL_TURNS=off
- PROXY_SESSION_CONTAMINATION_BREAKER=off
- PROXY_AGENTIC_SUPPLEMENT_MODE=legacy
- ```
-
- Rationale:
-
- - Keep the strongest practical fix enabled (malformed guardrail + strict stream path).
- - Keep latency-optimized floor (`4096`).
- - Keep optional secondary heuristics off unless new evidence warrants enablement.
-
- ## Reproduction Checklist
-
- 1. Restart services:
-
- ```bash
- systemctl --user restart uap-llama-server.service
- systemctl --user restart uap-anthropic-proxy.service
- ```
-
- 2. Run targeted unit tests:
-
- ```bash
- python3 -m pytest tools/agents/tests/test_anthropic_proxy_streaming.py -q
- ```
-
- 3. Run soak script (or equivalent alternating malformed/normal stream sequence).
-
- 4. Validate logs:
-
- - `MALFORMED TOOL PAYLOAD`
- - `MALFORMED RETRY ...`
- - `STRICT STREAM GUARDRAIL`
- - Absence of user-visible malformed fragments.
-
- ## Open Follow-Ups
-
- - Add a dedicated persistent benchmark harness under `scripts/` for this exact soak profile.
- - Add branch/commit links from `llama.cpp` worktrees for cross-repo traceability.
- - Optionally evaluate enabling `PROXY_TOOL_NARROWING` in production only after longer mixed-workload soak data.
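
The removed document names a "malformed tool guardrail + retry" mitigation but shows no code for it. The following is a minimal Python sketch of what a bounded retry of that kind might look like, not the proxy's actual implementation. Only the `PROXY_MALFORMED_TOOL_*` variable names and the profile values come from the document. `Reply`, `looks_malformed`, `guarded_call`, the marker list, the fallback defaults passed to `env.get`, and the choice to return an empty reply once retries are exhausted are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Mapping

# Assumption: substrings the removed document lists as symptoms of pseudo-tool text.
MALFORMED_MARKERS = ("</parameter>", "you MUST call a tool")


@dataclass
class Reply:
    """Hypothetical stand-in for an upstream completion."""
    text: str = ""
    tool_calls: List[dict] = field(default_factory=list)


def looks_malformed(reply: Reply) -> bool:
    """Flag replies that carry pseudo-tool text but no real tool call."""
    if reply.tool_calls:
        return False
    return any(marker in reply.text for marker in MALFORMED_MARKERS)


def guarded_call(
    upstream: Callable[[int, float], Reply],
    max_tokens: int,
    temperature: float,
    env: Mapping[str, str],
) -> Reply:
    """Call upstream once, then retry a bounded number of times if malformed."""
    reply = upstream(max_tokens, temperature)
    if env.get("PROXY_MALFORMED_TOOL_GUARDRAIL", "off") != "on":
        return reply  # guardrail disabled: pass the reply through untouched

    # Assumption: the defaults below mirror the production profile values.
    retry_max = int(env.get("PROXY_MALFORMED_TOOL_RETRY_MAX", "1"))
    retry_tokens = int(env.get("PROXY_MALFORMED_TOOL_RETRY_MAX_TOKENS", "512"))
    retry_temp = float(env.get("PROXY_MALFORMED_TOOL_RETRY_TEMPERATURE", "0"))

    for _ in range(retry_max):
        if not looks_malformed(reply):
            break
        # Retry with a tighter token budget and the configured temperature.
        reply = upstream(retry_tokens, retry_temp)

    # Assumption: when retries are exhausted, suppress the malformed text.
    return Reply() if looks_malformed(reply) else reply
```

Passing `env` as a plain mapping rather than reading `os.environ` directly keeps the sketch easy to exercise with the values from the production profile, for example `{"PROXY_MALFORMED_TOOL_GUARDRAIL": "on", "PROXY_MALFORMED_TOOL_RETRY_MAX": "1"}`.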