@miller-tech/uap 1.30.0 → 1.31.0

Files changed (35)
  1. package/README.md +119 -2
  2. package/dist/.tsbuildinfo +1 -1
  3. package/dist/bin/cli.js +6 -0
  4. package/dist/bin/cli.js.map +1 -1
  5. package/dist/cli/deliver.d.ts +12 -0
  6. package/dist/cli/deliver.d.ts.map +1 -1
  7. package/dist/cli/deliver.js +144 -9
  8. package/dist/cli/deliver.js.map +1 -1
  9. package/dist/coordination/deploy-batcher.d.ts.map +1 -1
  10. package/dist/coordination/deploy-batcher.js +9 -3
  11. package/dist/coordination/deploy-batcher.js.map +1 -1
  12. package/dist/delivery/applier.d.ts.map +1 -1
  13. package/dist/delivery/applier.js +4 -0
  14. package/dist/delivery/applier.js.map +1 -1
  15. package/dist/delivery/convergence-loop.d.ts +7 -0
  16. package/dist/delivery/convergence-loop.d.ts.map +1 -1
  17. package/dist/delivery/convergence-loop.js +42 -0
  18. package/dist/delivery/convergence-loop.js.map +1 -1
  19. package/dist/delivery/halo-trace.d.ts +29 -0
  20. package/dist/delivery/halo-trace.d.ts.map +1 -0
  21. package/dist/delivery/halo-trace.js +88 -0
  22. package/dist/delivery/halo-trace.js.map +1 -0
  23. package/dist/delivery/ideation.d.ts +36 -0
  24. package/dist/delivery/ideation.d.ts.map +1 -0
  25. package/dist/delivery/ideation.js +109 -0
  26. package/dist/delivery/ideation.js.map +1 -0
  27. package/dist/delivery/index.d.ts +4 -1
  28. package/dist/delivery/index.d.ts.map +1 -1
  29. package/dist/delivery/index.js +4 -1
  30. package/dist/delivery/index.js.map +1 -1
  31. package/dist/delivery/run-coordinator.d.ts +48 -0
  32. package/dist/delivery/run-coordinator.d.ts.map +1 -0
  33. package/dist/delivery/run-coordinator.js +132 -0
  34. package/dist/delivery/run-coordinator.js.map +1 -0
  35. package/package.json +1 -1
package/README.md CHANGED
@@ -15,6 +15,18 @@
15
15
 
16
16
  ## Recent Updates
17
17
 
18
+ **New:** Delivery Harness (`uap deliver`) — a convergence loop that drives an
19
+ underlying model through execute → apply → verify → feedback against the
20
+ project's real completion gates until delivery is achieved. Best-of-N
21
+ exploration, a structured critic, semantically-recalled best-practice cards,
22
+ and a stagnation-driven escalation ladder turn weaker/local models into
23
+ reliable closers. See [Delivery Harness](#delivery-harness).
24
+
25
+ ```bash
26
+ uap deliver "add a parseDuration(str) helper returning seconds" \
27
+ --candidates 3 --critic --practices --escalate
28
+ ```
29
+
18
30
  **New:** Expert-stack extensions — forward-design droids (strategic/tactical
19
31
  architect, implementation-planner), activated `experts.<name>` MCP tools, HALO
20
32
  trace-based harness optimization, open-collider divergent ideation, and a real
@@ -55,6 +67,7 @@ uap setup -p all
55
67
  - [Browser Automation](#browser-automation)
56
68
  - [MCP Router](#mcp-router)
57
69
  - [Multi-Model Architecture](#multi-model-architecture)
70
+ - [Delivery Harness](#delivery-harness)
58
71
  - [Pattern System](#pattern-system)
59
72
  - [Droids and Skills](#droids--skills)
60
73
  - [Task Management](#task-management)
@@ -78,6 +91,7 @@ uap setup -p all
78
91
  | Browser | 1 module | Stealth web automation via CloakBrowser (Playwright drop-in) |
79
92
  | MCP Router | 11 modules | 2-tool meta-router + expert-consultation registry (98% token savings) |
80
93
  | Models | 10 modules | Multi-model routing, planning, execution, validation, 13 model profiles |
94
+ | Delivery Harness | 11 modules | `uap deliver`: convergence loop, best-of-N explorer, critic, practice recall, escalation, ideation seeds, HALO tracing, coordination + deploy queueing |
81
95
  | Patterns | 23 patterns | Battle-tested workflows from Terminal-Bench 2.0 |
82
96
  | Droids | 30 experts | Full SDLC expert stack: strategy, design, build, review, release, ops ([reference](docs/reference/EXPERT_DROIDS.md)) |
83
97
  | Expert Orchestrator | 1 module | Adaptive droid-chain selection across plan→design→implement→review→release |
@@ -329,6 +343,108 @@ Each profile supports: `dynamic_temperature` (decay per retry), `tool_call_batch
329
343
 
330
344
  ---
331
345
 
346
+ ## Delivery Harness
347
+
348
+ `uap deliver` forces an underlying model — including weaker or local models —
349
+ to reach a **verified** outcome. Instead of trusting a single generation, it
350
+ loops: the model emits whole files, the harness writes them, runs the
351
+ project's real completion gates, and feeds the failures back until every gate
352
+ passes or the turn budget is exhausted. "Done" is defined by the gates, not by
353
+ the model's say-so.
354
+
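The loop described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `Seams`, `GateResult`, `LoopResult`, and `deliver` are hypothetical names, the seams are synchronous for brevity (a real model call would be asynchronous), and the feedback format is an assumption.

```typescript
// Hypothetical names throughout: this is not the package's convergence-loop API.
interface GateResult {
  gate: string; // e.g. "build", "typecheck", "test", "lint"
  passed: boolean;
  output: string; // diagnostics fed back to the model on failure
}

interface Seams {
  execute(prompt: string): string; // model call (synchronous here for brevity)
  apply(modelOutput: string): void; // write the emitted files into the tree
  verify(): GateResult[]; // run the project's completion gates
}

interface LoopResult {
  status: "already-green" | "delivered" | "exhausted";
  turns: number;
  failed: string[];
}

function deliver(instruction: string, seams: Seams, maxTurns = 5): LoopResult {
  // Baseline check: a green tree needs no model call and is not reported as a delivery.
  let failures = seams.verify().filter((r) => !r.passed);
  if (failures.length === 0) return { status: "already-green", turns: 0, failed: [] };

  let feedback = "";
  for (let turn = 1; turn <= maxTurns; turn++) {
    const prompt =
      feedback === "" ? instruction : `${instruction}\n\nFailed gates:\n${feedback}`;
    seams.apply(seams.execute(prompt));
    failures = seams.verify().filter((r) => !r.passed);
    if (failures.length === 0) return { status: "delivered", turns: turn, failed: [] };
    // Only the gates decide success; their output becomes the next turn's feedback.
    feedback = failures.map((f) => `[${f.gate}] ${f.output}`).join("\n");
  }
  return { status: "exhausted", turns: maxTurns, failed: failures.map((f) => f.gate) };
}
```

Keeping `execute`, `apply`, and `verify` behind an interface lets the loop be exercised with fakes, which is presumably what the README means by "pluggable seams".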
355
+ ### Pipeline
356
+
357
+ ```
358
+               ┌───────────────────────── loop until gates pass ─────────────────────────┐
359
+               │                                                                         │
360
+ instruction → build prompt → execute → apply files → verify (gates) → feedback ─────────┘
361
+ (+ practices) (+ critique)    model      to tree     build/typecheck/test/lint
362
+                                 │                           │
363
+                       best-of-N candidates    pass → done ✓   fail → critic + escalate
364
+ ```
365
+
366
+ 1. **Convergence loop** — execute → apply → verify → feedback against real gates. A baseline check short-circuits when the tree is already green (no model call, no false success).
367
+ 2. **Best-of-N explorer** (`--candidates N`) — generates N candidates per turn under distinct strategy seeds, evaluates each on the same tree via apply→verify→rollback, and commits the winner; a model judge breaks ties.
368
+ 3. **Structured critic** (`--critic`) — turns a failed turn's gate output into a numbered, file-scoped repair plan via a gate-specific analyst persona.
369
+ 4. **Best-practice recall** (`--practices`) — injects provenance-safe practice cards learned from past successful deliveries, retrieved by semantic similarity (nomic-768 embeddings, keyword fallback).
370
+ 5. **Escalation ladder** (`--escalate`) — on stagnation, climbs cheap→expensive: widen exploration → enable the critic → switch to a stronger model.
371
+ 6. **Divergent ideation** (`--ideate`, `--ideate-project <name>`) — replaces the static strategy seeds with task-specific, deliberately diverse seeds: generated by a bisociation-style model call, or taken from an open-collider project's curated ideas (`uap ideate`). Implies best-of-N exploration.
372
+ 7. **HALO tracing** (`--halo`) — emits one AGENT span per run and one CHAIN span per turn (scores, strategies, failed gates) so `uap harness analyze` can mine systemic failure modes across runs.
373
+ 8. **Coordination** (`--coordinate`) — registers the run with the multi-agent coordination layer (`uap agent`): announces work on the project, warns about overlapping agents, heartbeats every turn, completes/deregisters on exit.
374
+ 9. **Deploy batching** (`--deploy`) — on success, queues a commit of the applied files into the deploy batcher; execute with `uap deploy flush`.
375
+ 10. **`--optimize`** — one switch for every convergence aid: 4 candidates/turn + critic + practices + escalation + ideation + HALO + coordination (deploy stays explicit).
376
+
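The apply→verify→rollback evaluation in step 2 might look roughly like this. It is a sketch under stated assumptions: an in-memory map stands in for the working tree, the score is simply the number of gates passed, and `pickBest`, `Files`, and the `judge` callback are hypothetical names. The README says a model judge breaks ties; here the judge is any function that picks one index from the tied set.

```typescript
// Hypothetical sketch: an in-memory map stands in for the working tree.
type Files = Record<string, string>;

interface Pick {
  winner: number; // index of the committed candidate
  score: number; // gates passed by the winner
  tied: number[]; // indices sharing the top score
}

function pickBest(
  tree: Files,
  candidates: Files[],
  gatesPassed: (tree: Files) => number,
  judge: (tied: number[]) => number = (tied) => tied[0] ?? 0,
): Pick {
  if (candidates.length === 0) throw new Error("need at least one candidate");
  let score = -1;
  let tied: number[] = [];
  candidates.forEach((files, i) => {
    const snapshot = { ...tree }; // remember the tree before this candidate
    Object.assign(tree, files); // apply
    const s = gatesPassed(tree); // verify
    for (const key of Object.keys(tree)) delete tree[key];
    Object.assign(tree, snapshot); // rollback, so every candidate sees the same tree
    if (s > score) {
      score = s;
      tied = [i];
    } else if (s === score) {
      tied.push(i);
    }
  });
  // The judge is consulted only when more than one candidate shares the top score.
  const winner = tied.length > 1 ? judge(tied) : (tied[0] ?? 0);
  Object.assign(tree, candidates[winner]); // commit the winner
  return { winner, score, tied };
}
```

Rolling back after every evaluation is what makes the comparison fair: no candidate inherits files written by an earlier one.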
377
+ ### Components (11 modules)
378
+
379
+ | Component | File | Purpose |
380
+ | ----------------- | ------------------------------------- | ----------------------------------------------------------------- |
381
+ | Convergence Loop | `src/delivery/convergence-loop.ts` | Turn loop with pluggable seams + mutable run-state for escalation |
382
+ | Verifier Ladder | `src/delivery/verifier-ladder.ts` | Build/typecheck/test/lint gates with fail-fast and diagnostics |
383
+ | Applier | `src/delivery/applier.ts` | Writes ` ```file:path ` blocks; path-safe, rollback-capable |
384
+ | Explorer | `src/delivery/explorer.ts` | Best-of-N candidates with strategy seeds + rollback evaluation |
385
+ | Judge | `src/delivery/judge.ts` | Model tie-break among equally-scored candidates |
386
+ | Critic | `src/delivery/critic.ts` | Gate-persona repair plans from failed turns |
387
+ | Practice Store | `src/delivery/practice.ts` | Provenance-safe best-practice cards with semantic recall |
388
+ | Escalation | `src/delivery/escalation.ts` | Stagnation-driven ladder returning loop directives |
389
+ | Ideation Seeder | `src/delivery/ideation.ts` | Divergent strategy seeds (generated or from curated ideas) |
390
+ | HALO Tracer | `src/delivery/halo-trace.ts` | Run/turn spans for `uap harness analyze` |
391
+ | Run Coordinator | `src/delivery/run-coordinator.ts` | `uap agent` registration/heartbeat + `uap deploy` commit queueing |
392
+
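As an illustration of the Escalation row above, a stagnation-driven ladder could be modelled like this. The tier order (widen exploration, enable the critic, switch model) comes from the README; the stagnation window, the doubling rule, and every name (`RunState`, `Directive`, `isStagnant`, `escalate`) are assumptions made for the sketch.

```typescript
// Hypothetical sketch: tier order follows the README; the stagnation window,
// the doubling rule and all names are assumptions.
interface RunState {
  candidates: number;
  critic: boolean;
  model: string;
  tier: number; // rungs already climbed (0 to 3)
}

type Directive = "widen-exploration" | "enable-critic" | "switch-model" | "none";

// Stagnant when none of the last `window` turn scores beats the best earlier score.
function isStagnant(scores: number[], window = 2): boolean {
  if (scores.length <= window) return false;
  const bestBefore = Math.max(...scores.slice(0, -window));
  return scores.slice(-window).every((s) => s <= bestBefore);
}

// Climbs one rung per call, cheapest first, mutating the run state in place.
function escalate(state: RunState, strongerModel?: string): Directive {
  switch (state.tier) {
    case 0:
      // Stay inside the 2-8 range documented for --candidates.
      state.candidates = Math.min(8, Math.max(2, state.candidates * 2));
      state.tier = 1;
      return "widen-exploration";
    case 1:
      state.critic = true;
      state.tier = 2;
      return "enable-critic";
    case 2:
      if (strongerModel === undefined) return "none"; // no --escalate-model given
      state.model = strongerModel;
      state.tier = 3;
      return "switch-model";
    default:
      return "none";
  }
}
```

A loop would call `isStagnant` on the per-turn scores and, when it returns true, apply the directive from `escalate` before the next turn.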
393
+ The model is reached through an OpenAI-compatible client
394
+ (`src/models/openai-compat-client.ts`) — the local inference gateway,
395
+ llama.cpp, vLLM, Ollama, or any `/v1/chat/completions` endpoint.
396
+
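For readers unfamiliar with the `/v1/chat/completions` convention, the request and response shapes can be sketched as below. This is not the contents of `openai-compat-client.ts`; `buildChatRequest` and `extractContent` are hypothetical helpers, and the endpoint is assumed to already include the `/v1` prefix.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds a chat-completions request; `endpoint` is expected to end in /v1.
function buildChatRequest(
  endpoint: string,
  model: string,
  messages: ChatMessage[],
  apiKey?: string,
): HttpRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Local servers often need no key, so the header is only set when one is given.
  if (apiKey !== undefined) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    url: `${endpoint.replace(/\/+$/, "")}/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages }) },
  };
}

// Pulls the assistant text out of a chat-completions response body.
function extractContent(response: unknown): string {
  const shaped = response as
    | { choices?: Array<{ message?: { content?: unknown } }> }
    | null;
  const content = shaped?.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("no choices[0].message.content in response");
  }
  return content;
}
```

With a runtime that provides a global `fetch` (Node 18+, for example), the pieces combine as `const res = await fetch(req.url, req.init)` followed by `extractContent(await res.json())`.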
397
+ ### Usage
398
+
399
+ ```bash
400
+ # Basic loop (one candidate per turn) against the current project's gates
401
+ uap deliver "implement src/slugify.js exporting slugify(str)"
402
+
403
+ # Full quality stack: 3 candidates/turn, critic, learned practices, escalation
404
+ uap deliver "add retry-with-backoff to the HTTP client" \
405
+ --candidates 3 --critic --practices --escalate --escalate-model opus-4.6
406
+
407
+ # Preview detected gates and plan without calling the model
408
+ uap deliver "..." --dry-run
409
+
410
+ # Scope to a subset of gates, cap turns, target another project
411
+ uap deliver "..." --gates build,test --max-turns 8 --project-root ../service
412
+
413
+ # Everything on: exploration, critic, practices, escalation, ideation, HALO, coordination
414
+ uap deliver "refactor the cache layer to LRU with TTL" --optimize
415
+
416
+ # Divergent ideation seeds + queue a commit into the deploy batcher on success
417
+ uap deliver "..." --ideate --candidates 4 --deploy
418
+ ```
419
+
420
+ ### Key flags
421
+
422
+ | Flag | Effect |
423
+ | -------------------------- | ---------------------------------------------------------------------- |
424
+ | `-m, --model <preset>` | Model preset (default `$UAP_DELIVER_MODEL` or `qwen35-a3b`) |
425
+ | `--max-turns <n>` | Maximum execute→verify iterations (default 5) |
426
+ | `--gates <ids>` | Gate subset: `build,typecheck,test,lint` |
427
+ | `--candidates <n>` | Best-of-N exploration (2–8) per turn |
428
+ | `--critic` | Structured repair plans on failed turns |
429
+ | `--practices` | Inject and record best-practice cards |
430
+ | `--no-semantic` | Use keyword (not embedding) practice recall |
431
+ | `--escalate` | Escalation ladder on stagnation |
432
+ | `--escalate-model <preset>`| Stronger model for the final escalation tier |
433
+ | `--ideate` | Divergent ideation: task-specific strategy seeds (implies exploration) |
434
+ | `--ideate-project <name>` | Seed exploration from `projects/<name>` curated ideas (`uap ideate`) |
435
+ | `--halo` | Emit HALO spans; analyze with `uap harness analyze` |
436
+ | `--coordinate` | Register with `uap agent`: announce, heartbeat, overlap detection |
437
+ | `--deploy` | On success, queue a commit into the deploy batcher (`uap deploy`) |
438
+ | `--optimize` | Enable every convergence aid (deploy excluded) |
439
+ | `--endpoint <url>` | Override the model endpoint (OpenAI-compatible `/v1`) |
440
+ | `--dry-run` / `--json` | Show the plan only / emit machine-readable result |
441
+
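How the flags in the table combine can be made concrete with a small resolver. The rules that `--optimize` means 4 candidates plus every aid except deploy, that `--ideate` implies exploration, that `--candidates` accepts 2-8, and that `--max-turns` defaults to 5 come from the README. Everything else is an assumption of this sketch: the names `DeliverFlags` and `resolveFlags`, an explicit `--candidates` overriding the `--optimize` default, ideation alone selecting 2 candidates, and out-of-range values being rejected rather than clamped.

```typescript
// Hypothetical option bag; field names mirror the CLI flags but are not the package's types.
interface DeliverFlags {
  candidates?: number;
  critic?: boolean;
  practices?: boolean;
  escalate?: boolean;
  ideate?: boolean;
  halo?: boolean;
  coordinate?: boolean;
  deploy?: boolean;
  optimize?: boolean;
  maxTurns?: number;
}

type Resolved = Required<Omit<DeliverFlags, "optimize">>;

function resolveFlags(flags: DeliverFlags): Resolved {
  const all = flags.optimize === true;
  const ideate = all || flags.ideate === true;
  // Assumption: explicit --candidates wins; otherwise 4 under --optimize,
  // 2 when only ideation is on (it implies exploration), else a single candidate.
  const candidates = flags.candidates ?? (all ? 4 : ideate ? 2 : 1);
  if (
    flags.candidates !== undefined &&
    (!Number.isInteger(candidates) || candidates < 2 || candidates > 8)
  ) {
    throw new RangeError("--candidates must be an integer from 2 to 8");
  }
  return {
    candidates,
    critic: all || flags.critic === true,
    practices: all || flags.practices === true,
    escalate: all || flags.escalate === true,
    ideate,
    halo: all || flags.halo === true,
    coordinate: all || flags.coordinate === true,
    deploy: flags.deploy === true, // never implied by --optimize
    maxTurns: flags.maxTurns ?? 5,
  };
}
```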
442
+ Model output is never executed — only written as files and checked by the
443
+ gates. The applier refuses writes to executed config (`package.json`,
444
+ lockfiles), `.git`/hooks/CI paths, and symlinks that escape the project root.
445
+
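A lexical version of the write guard described above could look like this. It is a sketch only: the deny-lists are assumed (the README names the categories, not the exact entries), `normalize` and `checkWritable` are hypothetical names, and the check works on path strings alone. Detecting a symlink that escapes the project root would additionally require resolving the real path on disk, which is omitted here.

```typescript
// Assumed deny-lists; the README names the categories, not the exact entries.
const PROTECTED_FILES = new Set([
  "package.json",
  "package-lock.json",
  "yarn.lock",
  "pnpm-lock.yaml",
]);
const PROTECTED_DIRS = new Set([".git", ".husky", ".github"]);

// Resolves "." and ".." lexically; returns null for absolute, empty, or escaping paths.
function normalize(relPath: string): string[] | null {
  if (/^([\\/]|[A-Za-z]:)/.test(relPath)) return null; // absolute path or drive letter
  const out: string[] = [];
  for (const seg of relPath.split(/[\\/]+/)) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (out.length === 0) return null; // would climb above the project root
      out.pop();
      continue;
    }
    out.push(seg);
  }
  return out.length > 0 ? out : null;
}

function checkWritable(relPath: string): { ok: boolean; reason?: string } {
  const segments = normalize(relPath);
  if (segments === null) {
    return { ok: false, reason: "path is absolute, empty, or escapes the project root" };
  }
  if (segments.some((s) => PROTECTED_DIRS.has(s))) {
    return { ok: false, reason: "protected directory" };
  }
  const name = segments[segments.length - 1] ?? "";
  if (PROTECTED_FILES.has(name)) {
    return { ok: false, reason: "protected config or lockfile" };
  }
  return { ok: true };
}
```

Normalizing before matching matters: a path such as `a/../package.json` only reveals its real target once the `..` segment has been resolved.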
446
+ ---
447
+
332
448
  ## Pattern System (23 Patterns)
333
449
 
334
450
  Battle-tested patterns from Terminal-Bench 2.0, stored in `.factory/patterns/`.
@@ -478,7 +594,7 @@ pre-tool-use mechanism (claude, vscode, cursor, factory, opencode, omp, hermes).
478
594
 
479
595
  ## CLI Reference
480
596
 
481
- ### 28 Top-Level Commands
597
+ ### 29 Top-Level Commands
482
598
 
483
599
  | Command | Description |
484
600
  | ------------------------- | -------------------------------------------- |
@@ -498,6 +614,7 @@ pre-tool-use mechanism (claude, vscode, cursor, factory, opencode, omp, hermes).
498
614
  | `uap task <action>` | Task management (15 subcommands) |
499
615
  | `uap droids <action>` | Droid management (3 subcommands) |
500
616
  | `uap expert-route <task>` | Recommend an expert droid chain for a task |
617
+ | `uap deliver <task>` | Convergence loop: iterate a model against real gates until delivery |
501
618
  | `uap harness <action>` | HALO trace analysis (analyze, status) |
502
619
  | `uap ideate <action>` | Open-collider ideation (setup, run, ideas) |
503
620
  | `uap model <action>` | Multi-model management (8 subcommands) |
@@ -511,7 +628,7 @@ pre-tool-use mechanism (claude, vscode, cursor, factory, opencode, omp, hermes).
511
628
  | `uap sync` | Sync configuration between platforms |
512
629
  | `uap uap-omp <action>` | Oh-My-Pi integration (7 subcommands) |
513
630
 
514
- **Total: 117 commands and subcommands.**
631
+ **Total: 118 commands and subcommands.**
515
632
 
516
633
  ### Additional Binaries
517
634