npm - @miller-tech/uap - Versions diffs - 1.40.0 → 1.40.1 - Mend

@miller-tech/uap 1.40.0 → 1.40.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (92)

package/README.md +109 -642
package/docs/INDEX.md +48 -286
package/docs/architecture/OVERVIEW.md +328 -0
package/docs/architecture/PROTOCOL.md +204 -0
package/docs/benchmarks/README.md +17 -192
package/docs/getting-started/CONFIGURATION.md +237 -0
package/docs/getting-started/INSTALLATION.md +125 -0
package/docs/getting-started/QUICKSTART.md +115 -0
package/docs/guides/COORDINATION.md +162 -0
package/docs/guides/DELIVER.md +115 -0
package/docs/guides/DEPLOY_BATCHING.md +212 -0
package/docs/guides/DROIDS_AND_SKILLS.md +202 -0
package/docs/guides/LOCAL_MODELS.md +148 -0
package/docs/guides/MCP_ROUTER.md +195 -0
package/docs/guides/MEMORY.md +235 -0
package/docs/guides/MULTI_MODEL.md +223 -0
package/docs/guides/POLICIES.md +190 -0
package/docs/guides/WORKTREE_WORKFLOW.md +185 -0
package/docs/integrations/MCP_ROUTER.md +147 -0
package/docs/integrations/RTK.md +102 -0
package/docs/reference/API.md +485 -0
package/docs/reference/CLI.md +719 -0
package/docs/reference/CONFIGURATION.md +90 -193
package/docs/reference/DATABASE_SCHEMA.md +110 -344
package/docs/reference/FEATURES.md +176 -472
package/docs/reference/PATTERNS.md +102 -0
package/docs/reference/PLATFORMS.md +83 -0
package/package.json +1 -1
package/docs/AGENTS.md +0 -423
package/docs/DOCUMENTATION_AUDIT_REPORT.md +0 -131
package/docs/GETTING_STARTED.md +0 -288
package/docs/PROJECT_ANALYSIS_REPORT.md +0 -510
package/docs/architecture/COMPLETE_ARCHITECTURE.md +0 -748
package/docs/architecture/EXPERT_STACK.md +0 -137
package/docs/architecture/MULTI_MODEL.md +0 -224
package/docs/architecture/PLATFORM_GATING.md +0 -68
package/docs/architecture/SYSTEM_ANALYSIS.md +0 -334
package/docs/architecture/UAP_COMPLIANCE.md +0 -217
package/docs/architecture/UAP_PROTOCOL.md +0 -339
package/docs/architecture/UAP_STRICT_DROIDS.md +0 -172
package/docs/archive/BALLS_MODE_SELF_ANALYSIS.md +0 -260
package/docs/archive/BENCHMARK_GAPS_AND_PLAN.md +0 -146
package/docs/archive/FAILING_TASKS_SOLUTION_PLAN.md +0 -668
package/docs/archive/JINJA2-SYSTEM-MESSAGE-FIX.md +0 -209
package/docs/archive/MODEL_ROUTING_IMPLEMENTATION_SUMMARY.md +0 -281
package/docs/archive/MODEL_ROUTING_OPTIMIZATION_PLAN.md +0 -320
package/docs/archive/NPM-PUBLISH-V0.9.1.md +0 -240
package/docs/archive/OPTIMIZATION_OPTIONS.md +0 -334
package/docs/archive/PARALLELISM_GAPS_AND_OPTIONS.md +0 -422
package/docs/archive/POLICY_GATE_IMPLEMENTATION.md +0 -245
package/docs/archive/SETUP_IMPROVEMENTS.md +0 -213
package/docs/archive/UAP_GENERIC_OPTIMIZATION_PLAN.md +0 -270
package/docs/archive/UAP_OPTIMIZATION_PLAN.md +0 -701
package/docs/archive/UAP_V103_PATTERN_DESIGN.md +0 -315
package/docs/archive/UAP_V104_COMPLIANCE_DESIGN.md +0 -223
package/docs/archive/changelog/2026-03-10_uap-100-compliance.md +0 -77
package/docs/archive/changelog/2026-03-10_uap-full-system-verification.md +0 -109
package/docs/archive/opencode-integration-guide.md +0 -740
package/docs/archive/opencode-integration-quickref.md +0 -180
package/docs/benchmarks/OVERNIGHT_RUNNER.md +0 -341
package/docs/benchmarks/SPECULATIVE_DECODING_JOURNEY_2026-03.md +0 -221
package/docs/benchmarks/VALIDATION_PLAN.md +0 -568
package/docs/blog/SPECULATIVE_DECODING_PRODUCTION_PLAYBOOK.md +0 -139
package/docs/blog/local-coding-agents.md +0 -266
package/docs/blog/x-thread.md +0 -254
package/docs/deployment/DEPLOYMENT.md +0 -895
package/docs/deployment/DEPLOYMENT_STRATEGIES.md +0 -518
package/docs/deployment/DEPLOY_BATCHER_ANALYSIS.md +0 -224
package/docs/deployment/DEPLOY_BATCHING.md +0 -273
package/docs/deployment/DEPLOY_BUCKETING_ANALYSIS.md +0 -420
package/docs/deployment/QWEN35_LLAMA_CPP.md +0 -426
package/docs/deployment/UAP_LLAMA_ANTHROPIC_PROXY_BOOTSTRAP.md +0 -279
package/docs/getting-started/INTEGRATION.md +0 -628
package/docs/getting-started/OVERVIEW.md +0 -324
package/docs/getting-started/SETUP.md +0 -377
package/docs/integrations/MCP_ROUTER_SETUP.md +0 -445
package/docs/integrations/RTK_INTEGRATION.md +0 -468
package/docs/operations/TROUBLESHOOTING.md +0 -660
package/docs/pr/PR_SPECULATIVE_DOCS_TEMPLATE.md +0 -146
package/docs/pr/UPSTREAM_PRS.md +0 -424
package/docs/reference/API_REFERENCE.md +0 -903
package/docs/reference/EXPERT_DROIDS.md +0 -219
package/docs/reference/HARNESS-MATRIX.md +0 -318
package/docs/reference/PATTERN_LIBRARY.md +0 -636
package/docs/reference/UAP_CLI_REFERENCE.md +0 -620
package/docs/research/BEHAVIORAL_PATTERNS.md +0 -228
package/docs/research/DOMAIN_STRATEGIES.md +0 -316
package/docs/research/MEMORY_SYSTEMS_COMPARISON.md +0 -812
package/docs/research/PATTERN_ANALYSIS_2026-01-18.md +0 -436
package/docs/research/PERFORMANCE_ANALYSIS_2026-01-18.md +0 -209
package/docs/research/PERFORMANCE_TEST_PLAN.md +0 -383
package/docs/research/TERMINAL_BENCH_LEARNINGS.md +0 -217

package/docs/benchmarks/SPECULATIVE_DECODING_JOURNEY_2026-03.md DELETED

@@ -1,221 +0,0 @@
-# Speculative Decoding Journey (2026-03)
-This document records the end-to-end speculative decoding stabilization journey across `llama.cpp` runtime tuning and `uap-anthropic-proxy` guardrails, including fixes, benchmark results, and the production profile now in use.
-## Scope
-- Runtime: `llama.cpp` with Qwen3.5 models, CUDA, `ctx-size=262144`.
-- Gateway: Anthropic-compatible proxy (`tools/agents/scripts/anthropic_proxy.py`).
-- Client behavior: agentic coding loops with tool calls (Claude Code style).
-## Goals
-1. Preserve high speculative decoding throughput.
-2. Eliminate pathological loops and malformed visible output.
-3. Keep tool-call behavior reliable under long sessions.
-4. Keep production context window at `262144`.
-## Phase 1 - Llama.cpp Speculative Stability
-### Problems Observed
-- Rollback loops and instability under aggressive speculative settings.
-- `find_slot` and related server warnings during long agentic sessions.
-- Throughput regressions compared to known fast baseline.
-### Work Performed
-- Implemented and tested multiple rollback strategies in `llama.cpp` worktree branches.
-- Compared baseline fast commit vs newer speculative logic.
-- Restored proven fast runtime path for production service while preserving learned guardrails.
-### Key Runtime Decisions
-- Keep production on fast validated binary lineage (`029edcafc` baseline family).
-- Use strict balanced speculative profile for 35B operations:
-  - `speculative.n_max=12`
-  - `speculative.n_min=2`
-  - `speculative.p_min=0.80`
-### Representative Throughput Findings
-- Qwen3.5-27B, `ctx=262144`, q4 KV cache:
-  - No spec: ~43 tok/s coding, ~41 tok/s pattern.
-  - Spec (balanced): ~43 tok/s coding, ~102 tok/s pattern.
-  - Main uplift appears in pattern-heavy turns, not all coding turns.
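To make the uplift in the findings above concrete, the short sketch below turns the quoted approximate tok/s figures into speedup ratios. The dictionary layout and the `speedup` helper are illustrative names, not part of the deleted document.

```python
# Approximate throughput figures quoted in the findings above (tokens/second).
no_spec = {"coding": 43.0, "pattern": 41.0}
balanced_spec = {"coding": 43.0, "pattern": 102.0}


def speedup(baseline: float, candidate: float) -> float:
    """Return candidate throughput as a multiple of the baseline."""
    return candidate / baseline


for workload in no_spec:
    ratio = speedup(no_spec[workload], balanced_spec[workload])
    print(f"{workload}: {ratio:.2f}x")
```

On these numbers the coding turns show no change while the pattern-heavy turns run at roughly two and a half times the baseline, which matches the note that the uplift is concentrated in pattern-heavy turns.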
-## Phase 2 - Proxy Reasoning Fallback Leak Fix
-### Problems Observed
-- Empty visible output (`output_tokens=0`) with large hidden reasoning payloads.
-- Proxy emitted malformed chain-of-thought text as fallback, causing user-visible garbage:
-  - repeated fragments like `</parameter>`, tool schema echoes, policy text loops.
-### Fixes Implemented
-- Added explicit streaming fallback policy:
-  - `PROXY_STREAM_REASONING_FALLBACK=off|sanitized|visible`
-  - `PROXY_STREAM_REASONING_MAX_CHARS`
-- Set production default to `off`.
-### Result
-- Malformed reasoning fallback leakage is suppressed by default.
-- Debugging remains possible with `sanitized`/`visible` modes when intentionally enabled.
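A minimal sketch of how a three-mode fallback policy like the one above could behave. Only the two environment variable names come from the document; the function name, the default character cap of 2000, and the sanitizing rules are assumptions for illustration and may differ from the actual proxy.

```python
import os
import re
from typing import Mapping


def reasoning_fallback(reasoning: str, env: Mapping[str, str] = os.environ) -> str:
    """Decide what, if any, hidden reasoning text is surfaced to the client."""
    mode = env.get("PROXY_STREAM_REASONING_FALLBACK", "off")
    # The default cap below is an assumed value, not taken from the document.
    max_chars = int(env.get("PROXY_STREAM_REASONING_MAX_CHARS", "2000"))

    if mode == "visible":
        text = reasoning
    elif mode == "sanitized":
        # Assumed sanitizer: drop tag-like fragments, then collapse whitespace.
        text = re.sub(r"</?[A-Za-z_][^>]*>", "", reasoning)
        text = re.sub(r"\s+", " ", text).strip()
    else:
        # "off" and any unrecognized value suppress the fallback entirely.
        return ""
    return text[:max_chars]
```

Treating unknown values as `off` keeps a typo in the environment file from re-enabling the leak.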
-## Phase 3 - Token Floor and Prune Controls
-### Problems Observed
-- Hardcoded `max_tokens` floor (`16384`) forced very long failure turns.
-- The pruning threshold flag alone could trigger the pruning path without meaningfully reducing the messages.
-### Fixes Implemented
-- Added configurable max token floor:
-  - `PROXY_MAX_TOKENS_FLOOR` (`0` disables floor)
-- Added configurable prune target:
-  - `PROXY_CONTEXT_PRUNE_TARGET_FRACTION`
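The two controls above can be sketched as small pure functions. The "`0` disables floor" rule is stated in the document; reading `PROXY_CONTEXT_PRUNE_TARGET_FRACTION` as "prune down to this fraction of the context window" is an assumption, and both function names are hypothetical.

```python
def apply_max_tokens_floor(requested: int, floor: int) -> int:
    """Raise the requested max_tokens to the floor; a floor of 0 disables it."""
    if floor <= 0:
        return requested
    return max(requested, floor)


def prune_target_tokens(ctx_size: int, target_fraction: float) -> int:
    """Token budget to prune down to, assuming the fraction is of ctx_size."""
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    return int(ctx_size * target_fraction)
```

With an explicit target, a prune that is triggered has a defined amount of context to remove instead of relying on the threshold alone.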
-### Live A/B Result (Production-Like)
-`PROXY_MAX_TOKENS_FLOOR=16384` vs `4096`:
-- Silent reasoning-heavy turn:
-  - `16384`: avg `78.749s`
-  - `4096`: avg `19.777s`
-  - Latency reduction: ~`74.9%`
-  - Predicted throughput unchanged (~`208 tok/s` class)
-- Normal tool turns remained stable and slightly faster with `4096`.
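The latency reduction quoted above follows directly from the two averages. The helper name below is illustrative.

```python
def latency_reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage by which latency dropped going from before_s to after_s."""
    return (before_s - after_s) / before_s * 100.0


# Averages reported for the silent reasoning-heavy turn.
print(f"{latency_reduction_pct(78.749, 19.777):.1f}%")
```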
-## Phase 4 - Malformed Tool-Loop Hardening
-### Problem Pattern
-Under adversarial or degraded prompt states, the model can emit pseudo-tool text instead of valid tool calls, e.g.:
-- `</parameter>` fragments
-- echoed policy snippets (`you MUST call a tool...`)
-- long no-progress text with no `tool_calls`
-### Feature Set Added (Flag Controlled)
-1. **Malformed tool guardrail + retry**
-   - `PROXY_MALFORMED_TOOL_GUARDRAIL`
-   - `PROXY_MALFORMED_TOOL_RETRY_MAX`
-   - `PROXY_MALFORMED_TOOL_RETRY_MAX_TOKENS`
-   - `PROXY_MALFORMED_TOOL_RETRY_TEMPERATURE`
-2. **Strict stream guardrail path**
-   - `PROXY_MALFORMED_TOOL_STREAM_STRICT`
-   - For stream+tools requests, the proxy runs a guarded non-streaming upstream call, then replays the result as SSE.
-3. **Tool narrowing (optional)**
-   - `PROXY_TOOL_NARROWING`
-   - `PROXY_TOOL_NARROWING_KEEP`
-   - `PROXY_TOOL_NARROWING_MIN_TOOLS`
-4. **Disable thinking on tool turns (optional)**
-   - `PROXY_DISABLE_THINKING_ON_TOOL_TURNS`
-5. **Session contamination breaker (optional safety net)**
-   - `PROXY_SESSION_CONTAMINATION_BREAKER`
-   - `PROXY_SESSION_CONTAMINATION_THRESHOLD`
-   - `PROXY_SESSION_CONTAMINATION_KEEP_LAST`
-6. **Agentic supplement mode**
-   - `PROXY_AGENTIC_SUPPLEMENT_MODE=clean|legacy`
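The first item in the list above (detect a malformed tool payload, then retry with bounded settings) can be sketched as follows. This is a simplified illustration under stated assumptions, not the proxy's implementation: the response shape (`text`, `tool_calls`), the marker patterns, and the names `looks_malformed` and `guarded_call` are all hypothetical. The default retry values mirror the production profile shown later in the document.

```python
import re
from typing import Callable

# Assumed markers, taken from the failure examples described above.
MALFORMED_MARKERS = (
    re.compile(r"</parameter>"),
    re.compile(r"you MUST call a tool", re.IGNORECASE),
)


def looks_malformed(response: dict) -> bool:
    """True when there is no tool call but the text contains pseudo-tool residue."""
    if response.get("tool_calls"):
        return False
    text = response.get("text", "")
    return any(pattern.search(text) for pattern in MALFORMED_MARKERS)


def guarded_call(
    call_upstream: Callable[[dict], dict],
    request: dict,
    retry_max: int = 1,
    retry_max_tokens: int = 512,
    retry_temperature: float = 0.0,
) -> dict:
    """Call upstream once, then retry with tighter settings while malformed."""
    response = call_upstream(request)
    for _ in range(retry_max):
        if not looks_malformed(response):
            break
        retry_request = {
            **request,
            "max_tokens": retry_max_tokens,
            "temperature": retry_temperature,
        }
        response = call_upstream(retry_request)
    return response
```

Because the check needs the whole response, a guard like this fits a non-streaming upstream call, which is consistent with the strict stream path replaying SSE afterwards.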
-### Test Coverage
-- Unit tests in `tools/agents/tests/test_anthropic_proxy_streaming.py`
-- Current targeted suite count in this workstream: `16` passing tests.
-## Benchmark Highlights (Per-Option Toggles)
-### Artifact Stress Benchmark (v3)
-Source: `/tmp/proxy_visibility_benchmark_v3.json`
-| Mode | Key Flags | Outcome Summary |
-| --- | --- | --- |
-| Baseline | none | no tool call, policy-echo text surfaced |
-| Option 1 | malformed guardrail + strict stream | malformed detected and retried; returned `tool_use` with empty visible text |
-| Option 2 | tool narrowing only | not sufficient alone in stress case |
-| Option 3 | disable thinking only | not sufficient alone in stress case |
-| Option 4 | contamination breaker only | not sufficient alone in this synthetic workload |
-| Option 5 | clean supplement only | not sufficient alone in stress case |
-### Practical Conclusion
-- Strongest primary mitigation: **Option 1** (malformed guardrail + strict stream + bounded retry).
-- Other options are secondary tuning aids and should not replace Option 1 for this failure class.
-## 10-Turn Live Stability Soak
-Source: `/tmp/proxy_10turn_soak_results.json`
-- 10 turns, alternating malformed-stress and normal tool-call turns, single live session id.
-- Results:
-  - Error rate: `0.0%`
-  - Malformed visible output rate (stress turns): `0.0%`
-  - Normal tool-call success rate: `100.0%`
-  - Duration p50/p95: `10.2s` / `21.366s`
-  - Stop reasons: `tool_use=6`, `max_tokens=3`, `end_turn=1`
-## Production Profile (Current)
-File: `/home/cogtek/.config/uap/anthropic-proxy.env`
-```bash
-PROXY_MAX_TOKENS_FLOOR=4096
-PROXY_STREAM_REASONING_FALLBACK=off
-PROXY_MALFORMED_TOOL_GUARDRAIL=on
-PROXY_MALFORMED_TOOL_STREAM_STRICT=on
-PROXY_MALFORMED_TOOL_RETRY_MAX=1
-PROXY_MALFORMED_TOOL_RETRY_MAX_TOKENS=512
-PROXY_MALFORMED_TOOL_RETRY_TEMPERATURE=0
-PROXY_TOOL_NARROWING=off
-PROXY_DISABLE_THINKING_ON_TOOL_TURNS=off
-PROXY_SESSION_CONTAMINATION_BREAKER=off
-PROXY_AGENTIC_SUPPLEMENT_MODE=legacy
-```
-Rationale:
-- Keep the strongest practical fix enabled (malformed guardrail + strict stream path).
-- Keep latency-optimized floor (`4096`).
-- Keep optional secondary heuristics off unless new evidence warrants enablement.
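A small sketch for checking a profile like the one above before restarting services. It assumes the file contains plain `KEY=VALUE` lines with optional `#` comments; the helper names are hypothetical and the proxy's own loader may parse the file differently.

```python
def parse_env_profile(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    profile: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        profile[key.strip()] = value.strip()
    return profile


def is_on(profile: dict[str, str], key: str) -> bool:
    """Treat a missing key as 'off'."""
    return profile.get(key, "off").lower() == "on"


sample = """
PROXY_MAX_TOKENS_FLOOR=4096
PROXY_MALFORMED_TOOL_GUARDRAIL=on
PROXY_MALFORMED_TOOL_STREAM_STRICT=on
PROXY_TOOL_NARROWING=off
"""
profile = parse_env_profile(sample)
primary_fix_enabled = is_on(profile, "PROXY_MALFORMED_TOOL_GUARDRAIL") and is_on(
    profile, "PROXY_MALFORMED_TOOL_STREAM_STRICT"
)
print(primary_fix_enabled)
```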
-## Reproduction Checklist
-1. Restart services:
-```bash
-systemctl --user restart uap-llama-server.service
-systemctl --user restart uap-anthropic-proxy.service
-```
-2. Run targeted unit tests:
-```bash
-python3 -m pytest tools/agents/tests/test_anthropic_proxy_streaming.py -q
-```
-3. Run soak script (or equivalent alternating malformed/normal stream sequence).
-4. Validate logs:
-- `MALFORMED TOOL PAYLOAD`
-- `MALFORMED RETRY ...`
-- `STRICT STREAM GUARDRAIL`
-- Absence of user-visible malformed fragments.
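The log validation step above can be partly automated. The sketch below checks captured log text for the three markers and checks client-visible output for leaked fragments. The marker strings come from the checklist; the fragment list and function names are assumptions, and how the log text is collected is left to the reader.

```python
REQUIRED_LOG_MARKERS = (
    "MALFORMED TOOL PAYLOAD",
    "MALFORMED RETRY",
    "STRICT STREAM GUARDRAIL",
)

# Assumed examples of fragments that should never reach the user.
FORBIDDEN_VISIBLE_FRAGMENTS = ("</parameter>", "you MUST call a tool")


def missing_markers(log_text: str) -> list[str]:
    """Return the required markers that do not appear in the log text."""
    return [marker for marker in REQUIRED_LOG_MARKERS if marker not in log_text]


def leaked_fragments(visible_text: str) -> list[str]:
    """Return forbidden fragments found in client-visible output."""
    return [frag for frag in FORBIDDEN_VISIBLE_FRAGMENTS if frag in visible_text]
```

A run passes this check when both functions return empty lists.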
-## Open Follow-Ups
-- Add a dedicated persistent benchmark harness under `scripts/` for this exact soak profile.
-- Add branch/commit links from `llama.cpp` worktrees for cross-repo traceability.
-- Optionally evaluate enabling `PROXY_TOOL_NARROWING` in production only after longer mixed-workload soak data.