@miller-tech/uap 1.29.0 → 1.30.1

package/README.md CHANGED
@@ -15,6 +15,18 @@

  ## Recent Updates

+ **New:** Delivery Harness (`uap deliver`) — a convergence loop that drives an
+ underlying model through execute → apply → verify → feedback against the
+ project's real completion gates until delivery is achieved. Best-of-N
+ exploration, a structured critic, semantically recalled best-practice cards,
+ and a stagnation-driven escalation ladder help weaker or local models reach
+ a verified result. See [Delivery Harness](#delivery-harness).
+
+ ```bash
+ uap deliver "add a parseDuration(str) helper returning seconds" \
+ --candidates 3 --critic --practices --escalate
+ ```
+
  **New:** Expert-stack extensions — forward-design droids (strategic/tactical
  architect, implementation-planner), activated `experts.<name>` MCP tools, HALO
  trace-based harness optimization, open-collider divergent ideation, and a real
@@ -55,6 +67,7 @@ uap setup -p all
  - [Browser Automation](#browser-automation)
  - [MCP Router](#mcp-router)
  - [Multi-Model Architecture](#multi-model-architecture)
+ - [Delivery Harness](#delivery-harness)
  - [Pattern System](#pattern-system)
  - [Droids and Skills](#droids--skills)
  - [Task Management](#task-management)
@@ -78,6 +91,7 @@ uap setup -p all
  | Browser | 1 module | Stealth web automation via CloakBrowser (Playwright drop-in) |
  | MCP Router | 11 modules | 2-tool meta-router + expert-consultation registry (98% token savings) |
  | Models | 10 modules | Multi-model routing, planning, execution, validation, 13 model profiles |
+ | Delivery Harness | 8 modules | `uap deliver`: convergence loop, best-of-N explorer, critic, practice recall, escalation |
  | Patterns | 23 patterns | Battle-tested workflows from Terminal-Bench 2.0 |
  | Droids | 30 experts | Full SDLC expert stack: strategy, design, build, review, release, ops ([reference](docs/reference/EXPERT_DROIDS.md)) |
  | Expert Orchestrator | 1 module | Adaptive droid-chain selection across plan→design→implement→review→release |
@@ -329,6 +343,88 @@ Each profile supports: `dynamic_temperature` (decay per retry), `tool_call_batch

  ---

+ ## Delivery Harness
+
+ `uap deliver` forces an underlying model — including weaker or local models —
+ to reach a **verified** outcome. Instead of trusting a single generation, it
+ loops: the model emits whole files, the harness writes them, runs the
+ project's real completion gates, and feeds the failures back until every gate
+ passes or the turn budget is exhausted. "Done" is defined by the gates, not by
+ the model's say-so.
+
+ ### Pipeline
+
+ ```
+               ┌───────────────────── loop until gates pass ──────────────────────┐
+               │                                                                  │
+ instruction → build prompt → execute → apply files → verify (gates) → feedback ──┘
+               (+ practices)   model      to tree     build/typecheck/test/lint
+               (+ critique)      │                      │
+                       best-of-N candidates      pass → done ✓   fail → critic + escalate
+ ```
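The loop in the diagram can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the code in `src/delivery/convergence-loop.ts`: the `Seams` interface, the `already-green` status, and `runConvergenceLoop` are hypothetical names, and the seams are synchronous only to keep the example short (a real model call would be asynchronous).

```typescript
// Hypothetical types for illustration; not taken from the uap source.
interface GateResult {
  gate: string;   // e.g. "build", "test"
  passed: boolean;
  output: string; // diagnostics fed back to the model on failure
}

// Pluggable seams (hypothetical): each step of the pipeline is injected.
interface Seams {
  buildPrompt(instruction: string, feedback: string): string;
  execute(prompt: string): Map<string, string>; // path -> whole-file content
  apply(files: Map<string, string>): void;       // write files to the tree
  verify(): GateResult[];                        // run the real gates
}

type LoopResult =
  | { status: "already-green"; turns: 0 }
  | { status: "delivered"; turns: number }
  | { status: "exhausted"; turns: number; lastFailures: GateResult[] };

function runConvergenceLoop(instruction: string, seams: Seams, maxTurns = 5): LoopResult {
  // Baseline check: if every gate already passes, make no model call and
  // report that rather than claiming a delivery.
  let results = seams.verify();
  if (results.every((r) => r.passed)) return { status: "already-green", turns: 0 };

  let feedback = "";
  for (let turn = 1; turn <= maxTurns; turn++) {
    const prompt = seams.buildPrompt(instruction, feedback);
    const files = seams.execute(prompt);
    seams.apply(files);
    results = seams.verify();
    const failures = results.filter((r) => !r.passed);
    // "Done" is decided by the gates, not by the model.
    if (failures.length === 0) return { status: "delivered", turns: turn };
    // Failed gate output becomes the next turn's feedback.
    feedback = failures.map((f) => `[${f.gate}] ${f.output}`).join("\n");
  }
  return { status: "exhausted", turns: maxTurns, lastFailures: results.filter((r) => !r.passed) };
}
```

Because every step is injected, fake seams are enough to exercise the baseline short-circuit, a successful delivery, and an exhausted turn budget without calling a model.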
+
+ 1. **Convergence loop** — execute → apply → verify → feedback against real gates. A baseline check short-circuits when the tree is already green (no model call, no false success).
+ 2. **Best-of-N explorer** (`--candidates N`) — generates N candidates per turn under distinct strategy seeds, evaluates each on the same tree via apply→verify→rollback, and commits the winner; a model judge breaks ties.
+ 3. **Structured critic** (`--critic`) — turns a failed turn's gate output into a numbered, file-scoped repair plan via a gate-specific analyst persona.
+ 4. **Best-practice recall** (`--practices`) — injects provenance-safe practice cards learned from past successful deliveries, retrieved by semantic similarity (nomic-768 embeddings, keyword fallback).
+ 5. **Escalation ladder** (`--escalate`) — on stagnation, climbs cheap→expensive: widen exploration → enable the critic → switch to a stronger model.
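A sketch of how a stagnation-driven ladder might hand directives back to the loop, climbing the three rungs of item 5 in cost order. The stagnation rule (no drop in the failing-gate count over a `patience` window), the doubling of candidates, and every name here are assumptions for illustration, not the behavior of `src/delivery/escalation.ts`.

```typescript
// Hypothetical directive shape; escalation.ts may use different names.
type Directive =
  | { kind: "none" }
  | { kind: "widen"; candidates: number }
  | { kind: "enable-critic" }
  | { kind: "switch-model"; model: string };

// Assumed stagnation rule: none of the last `patience` turns has fewer
// failing gates than the turn just before them.
function isStagnant(failingCounts: number[], patience: number): boolean {
  if (failingCounts.length <= patience) return false;
  const before = failingCounts[failingCounts.length - patience - 1];
  return failingCounts.slice(-patience).every((n) => n >= before);
}

class EscalationLadder {
  private tier = 0;
  private lastEscalationAt = 0;
  private readonly strongerModel: string;
  private readonly patience: number;

  constructor(strongerModel: string, patience = 2) {
    this.strongerModel = strongerModel;
    this.patience = patience;
  }

  // Called after each failed turn with the failing-gate count of every turn so far.
  next(failingCounts: number[], currentCandidates: number): Directive {
    // Give each rung `patience` turns to help before judging it.
    const turnsSinceLast = failingCounts.length - this.lastEscalationAt;
    if (turnsSinceLast < this.patience || !isStagnant(failingCounts, this.patience)) {
      return { kind: "none" };
    }
    this.tier++;
    this.lastEscalationAt = failingCounts.length;
    switch (this.tier) {
      case 1: // cheapest rung: widen exploration, clamped to the documented 2-8
        return { kind: "widen", candidates: Math.min(8, Math.max(2, currentCandidates * 2)) };
      case 2: // next rung: structured repair plans
        return { kind: "enable-critic" };
      case 3: // most expensive rung: a stronger model
        return { kind: "switch-model", model: this.strongerModel };
      default: // ladder exhausted
        return { kind: "none" };
    }
  }
}
```

Resetting the window after each climb gives a new rung a couple of turns to help before the next, more expensive one is tried.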
+
+ ### Components (8 modules)
+
+ | Component | File | Purpose |
+ | ----------------- | ------------------------------------- | ----------------------------------------------------------------- |
+ | Convergence Loop | `src/delivery/convergence-loop.ts` | Turn loop with pluggable seams + mutable run-state for escalation |
+ | Verifier Ladder | `src/delivery/verifier-ladder.ts` | Build/typecheck/test/lint gates with fail-fast and diagnostics |
+ | Applier | `src/delivery/applier.ts` | Writes ` ```file:path ` blocks; path-safe, rollback-capable |
+ | Explorer | `src/delivery/explorer.ts` | Best-of-N candidates with strategy seeds + rollback evaluation |
+ | Judge | `src/delivery/judge.ts` | Model tie-break among equally-scored candidates |
+ | Critic | `src/delivery/critic.ts` | Gate-persona repair plans from failed turns |
+ | Practice Store | `src/delivery/practice.ts` | Provenance-safe best-practice cards with semantic recall |
+ | Escalation | `src/delivery/escalation.ts` | Stagnation-driven ladder returning loop directives |
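The Explorer and Judge rows describe scoring every candidate against the same tree and rolling back between tries. A minimal sketch of that evaluation, assuming a hypothetical `Tree` interface with snapshot/restore and a score equal to the number of passing gates; the `judge` callback stands in for the model tie-break, and none of these names come from `explorer.ts` or `judge.ts`.

```typescript
// Hypothetical shapes, not the real explorer.ts / judge.ts interfaces.
type Files = Map<string, string>; // path -> whole-file content

interface Tree {
  snapshot(): Files;          // capture the current tree
  apply(files: Files): void;  // write a candidate's files
  restore(snap: Files): void; // roll back to a snapshot
  passingGates(): number;     // run the gates, count how many pass
}

interface Candidate {
  seed: string; // strategy seed the candidate was generated under
  files: Files;
}

function pickBest(
  tree: Tree,
  candidates: Candidate[],
  judge: (tied: Candidate[]) => Candidate, // stands in for the model tie-break
): { winner: Candidate; score: number } {
  if (candidates.length === 0) throw new Error("no candidates to evaluate");
  const base = tree.snapshot();
  const scored = candidates.map((c) => {
    tree.apply(c.files);
    const score = tree.passingGates();
    tree.restore(base); // every candidate is scored against the same starting tree
    return { c, score };
  });
  const best = Math.max(...scored.map((s) => s.score));
  const tied = scored.filter((s) => s.score === best).map((s) => s.c);
  // Only ask the judge when the gates cannot separate the candidates.
  const winner = tied.length === 1 ? tied[0] : judge(tied);
  tree.apply(winner.files); // commit only the winner
  return { winner, score: best };
}
```

Rolling back before each evaluation keeps one candidate's files from leaking into the next candidate's score.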
+
+ The model is reached through an OpenAI-compatible client
+ (`src/models/openai-compat-client.ts`), so the backend can be the local
+ inference gateway, llama.cpp, vLLM, Ollama, or any other server exposing
+ `/v1/chat/completions`.
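Servers that speak the OpenAI chat-completions protocol accept a POST to `<base>/chat/completions` with a `model` and a `messages` array and reply with the text in `choices[0].message.content`. The sketch below builds such a request and reads the reply; `buildChatRequest`, `extractContent`, and the 0.2 default temperature are illustrative assumptions, not the API of `src/models/openai-compat-client.ts`.

```typescript
// Hypothetical helpers, not the real openai-compat-client.ts API.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// baseUrl is the OpenAI-compatible `/v1` root, e.g. what `--endpoint` takes.
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  apiKey?: string,
  temperature = 0.2, // assumed default for illustration
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Local servers often run without a key; hosted endpoints need one.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages, temperature }) },
  };
}

// Pull the assistant text out of a standard chat-completions response.
function extractContent(response: unknown): string {
  const r = response as { choices?: { message?: { content?: string | null } }[] };
  const content = r.choices?.[0]?.message?.content;
  if (typeof content !== "string") throw new Error("response has no choices[0].message.content");
  return content;
}
```

On Node 18 or later the request can be sent with `fetch(req.url, req.init)` and the parsed JSON body passed to `extractContent`.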
+
+ ### Usage
+
+ ```bash
+ # Single-shot loop against the current project's gates
+ uap deliver "implement src/slugify.js exporting slugify(str)"
+
+ # Full quality stack: 3 candidates/turn, critic, learned practices, escalation
+ uap deliver "add retry-with-backoff to the HTTP client" \
+ --candidates 3 --critic --practices --escalate --escalate-model opus-4.6
+
+ # Preview detected gates and plan without calling the model
+ uap deliver "..." --dry-run
+
+ # Scope to a subset of gates, cap turns, target another project
+ uap deliver "..." --gates build,test --max-turns 8 --project-root ../service
+ ```
+
+ ### Key flags
+
+ | Flag | Effect |
+ | -------------------------- | ---------------------------------------------------------------------- |
+ | `-m, --model <preset>` | Model preset (default `$UAP_DELIVER_MODEL` or `qwen35-a3b`) |
+ | `--max-turns <n>` | Maximum execute→verify iterations (default 5) |
+ | `--gates <ids>` | Gate subset: `build,typecheck,test,lint` |
+ | `--candidates <n>` | Best-of-N exploration (2–8) per turn |
+ | `--critic` | Structured repair plans on failed turns |
+ | `--practices` | Inject and record best-practice cards |
+ | `--no-semantic` | Use keyword (not embedding) practice recall |
+ | `--escalate` | Escalation ladder on stagnation |
+ | `--escalate-model <preset>` | Stronger model for the final escalation tier |
+ | `--endpoint <url>` | Override the model endpoint (OpenAI-compatible `/v1`) |
+ | `--dry-run` / `--json` | Show the plan only / emit machine-readable result |
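`--practices` and `--no-semantic` switch between embedding-based and keyword recall of practice cards. A minimal sketch of that retrieval: rank cards by cosine similarity when the query and every card carry an embedding (the README's nomic-768 vectors; plain number arrays here), otherwise fall back to keyword overlap. The `PracticeCard` shape, the token filter, and both scoring functions are assumptions, not the format or ranking used by `src/delivery/practice.ts`.

```typescript
// Hypothetical card shape; the real practice.ts schema may differ.
interface PracticeCard {
  id: string;
  text: string;
  embedding?: number[]; // e.g. a 768-dimensional vector from an embedding model
}

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Lowercased word tokens longer than two characters.
function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2));
}

// Keyword fallback: fraction of query tokens that also appear in the card.
function keywordScore(query: string, card: string): number {
  const q = tokens(query);
  const c = tokens(card);
  let hits = 0;
  q.forEach((t) => { if (c.has(t)) hits++; });
  return q.size === 0 ? 0 : hits / q.size;
}

function recall(
  query: string,
  cards: PracticeCard[],
  k: number,
  queryEmbedding?: number[], // omit to force keyword mode, as --no-semantic would
): PracticeCard[] {
  const semantic = queryEmbedding !== undefined && cards.every((c) => c.embedding !== undefined);
  const scored = cards.map((card) => ({
    card,
    score: semantic ? cosine(queryEmbedding!, card.embedding!) : keywordScore(query, card.text),
  }));
  return scored
    .filter((s) => s.score > 0) // never inject an unrelated card
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.card);
}
```

Keyword mode only needs the card text, so recall keeps working when no embedding endpoint is reachable.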
+
+ Model output is never run as a command; it is only written as files, which
+ the project's own gates then build and test. The applier refuses writes to
+ executed config (`package.json`, lockfiles), `.git`/hooks/CI paths, and
+ symlinks that escape the project root.
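The applier's two duties, pulling `file:path` fenced blocks out of model output and refusing unsafe targets, can be sketched as follows. The deny-lists are assumptions (the README names `package.json`, lockfiles, `.git`/hooks and CI paths without enumerating them), `.github` and `.husky` are guesses at typical CI and hook locations, and symlink resolution is left out because it needs filesystem access. None of these names come from `src/delivery/applier.ts`.

```typescript
// Hypothetical applier helpers, not the real src/delivery/applier.ts API.
// FENCE is built at runtime so this listing contains no literal fence.
const FENCE = "`".repeat(3);

interface FileBlock {
  path: string;
  content: string;
}

// Extract every fenced block whose info string is `file:<path>`.
function parseFileBlocks(output: string): FileBlock[] {
  const re = new RegExp(`${FENCE}file:([^\\n]+)\\n([\\s\\S]*?)${FENCE}`, "g");
  const blocks: FileBlock[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(output)) !== null) {
    blocks.push({ path: m[1].trim(), content: m[2] });
  }
  return blocks;
}

// Assumed deny-lists for illustration.
const BLOCKED_FILES = ["package.json", "package-lock.json", "yarn.lock", "pnpm-lock.yaml"];
const BLOCKED_TOP_DIRS = [".git", ".github", ".husky"];

// Returns a normalized project-relative path, or null if the write must be refused.
// Symlink checks are omitted: they need filesystem access (e.g. a realpath call).
function safeRelativePath(p: string): string | null {
  // Reject absolute POSIX paths, drive letters, and backslash separators.
  if (p.startsWith("/") || /^[A-Za-z]:/.test(p) || p.indexOf("\\") !== -1) return null;
  const parts: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (parts.length === 0) return null; // would escape the project root
      parts.pop();
    } else {
      parts.push(seg);
    }
  }
  if (parts.length === 0) return null;
  if (BLOCKED_FILES.indexOf(parts[parts.length - 1]) !== -1) return null;
  if (BLOCKED_TOP_DIRS.indexOf(parts[0]) !== -1) return null;
  return parts.join("/");
}
```

Normalizing `..` segments before checking the deny-lists means a path like `src/../package.json` is caught as well.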
+
+ ---
+
  ## Pattern System (23 Patterns)

  Battle-tested patterns from Terminal-Bench 2.0, stored in `.factory/patterns/`.
@@ -478,7 +574,7 @@ pre-tool-use mechanism (claude, vscode, cursor, factory, opencode, omp, hermes).

  ## CLI Reference

- ### 28 Top-Level Commands
+ ### 29 Top-Level Commands

  | Command | Description |
  | ------------------------- | -------------------------------------------- |
@@ -498,6 +594,7 @@ pre-tool-use mechanism (claude, vscode, cursor, factory, opencode, omp, hermes).
  | `uap task <action>` | Task management (15 subcommands) |
  | `uap droids <action>` | Droid management (3 subcommands) |
  | `uap expert-route <task>` | Recommend an expert droid chain for a task |
+ | `uap deliver <task>` | Convergence loop: iterate a model against real gates until they all pass |
  | `uap harness <action>` | HALO trace analysis (analyze, status) |
  | `uap ideate <action>` | Open-collider ideation (setup, run, ideas) |
  | `uap model <action>` | Multi-model management (8 subcommands) |
@@ -511,7 +608,7 @@ pre-tool-use mechanism (claude, vscode, cursor, factory, opencode, omp, hermes).
  | `uap sync` | Sync configuration between platforms |
  | `uap uap-omp <action>` | Oh-My-Pi integration (7 subcommands) |

- **Total: 117 commands and subcommands.**
+ **Total: 118 commands and subcommands.**

  ### Additional Binaries