@miller-tech/uap 1.30.0 → 1.30.1
- package/README.md +99 -2
- package/package.json +1 -1
package/README.md
CHANGED
@@ -15,6 +15,18 @@
 
 ## Recent Updates
 
+**New:** Delivery Harness (`uap deliver`) — a convergence loop that drives an
+underlying model through execute → apply → verify → feedback against the
+project's real completion gates until delivery is achieved. Best-of-N
+exploration, a structured critic, semantically-recalled best-practice cards,
+and a stagnation-driven escalation ladder turn weaker/local models into
+reliable closers. See [Delivery Harness](#delivery-harness).
+
+```bash
+uap deliver "add a parseDuration(str) helper returning seconds" \
+  --candidates 3 --critic --practices --escalate
+```
+
 **New:** Expert-stack extensions — forward-design droids (strategic/tactical
 architect, implementation-planner), activated `experts.<name>` MCP tools, HALO
 trace-based harness optimization, open-collider divergent ideation, and a real
@@ -55,6 +67,7 @@ uap setup -p all
 - [Browser Automation](#browser-automation)
 - [MCP Router](#mcp-router)
 - [Multi-Model Architecture](#multi-model-architecture)
+- [Delivery Harness](#delivery-harness)
 - [Pattern System](#pattern-system)
 - [Droids and Skills](#droids--skills)
 - [Task Management](#task-management)
@@ -78,6 +91,7 @@ uap setup -p all
 | Browser | 1 module | Stealth web automation via CloakBrowser (Playwright drop-in) |
 | MCP Router | 11 modules | 2-tool meta-router + expert-consultation registry (98% token savings) |
 | Models | 10 modules | Multi-model routing, planning, execution, validation, 13 model profiles |
+| Delivery Harness | 8 modules | `uap deliver`: convergence loop, best-of-N explorer, critic, practice recall, escalation |
 | Patterns | 23 patterns | Battle-tested workflows from Terminal-Bench 2.0 |
 | Droids | 30 experts | Full SDLC expert stack: strategy, design, build, review, release, ops ([reference](docs/reference/EXPERT_DROIDS.md)) |
 | Expert Orchestrator | 1 module | Adaptive droid-chain selection across plan→design→implement→review→release |
@@ -329,6 +343,88 @@ Each profile supports: `dynamic_temperature` (decay per retry), `tool_call_batch
 
 ---
 
+## Delivery Harness
+
+`uap deliver` forces an underlying model — including weaker or local models —
+to reach a **verified** outcome. Instead of trusting a single generation, it
+loops: the model emits whole files, the harness writes them, runs the
+project's real completion gates, and feeds the failures back until every gate
+passes or the turn budget is exhausted. "Done" is defined by the gates, not by
+the model's say-so.
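
The loop described above can be sketched in TypeScript as follows. This is a minimal illustration, not the code in `src/delivery/convergence-loop.ts`: the `Seams` interface, the `deliver` function, and the `Outcome` shape are hypothetical names, the seams are synchronous to keep the example short (a real model call would be asynchronous), and reporting an already-green baseline as its own status is only one reading of "no false success".

```typescript
// Hypothetical types for illustration; none of these names come from the uap source.
type FileSet = Record<string, string>; // path -> whole-file contents
type GateResult = { gate: string; passed: boolean; output: string };
type Outcome = {
  status: "already-green" | "delivered" | "exhausted";
  turns: number;
  failures: GateResult[];
};

interface Seams {
  generate(prompt: string): FileSet; // stand-in for the model call
  apply(files: FileSet): void;       // write whole files into the tree
  verify(): GateResult[];            // run the project's gates
}

function deliver(instruction: string, seams: Seams, maxTurns = 5): Outcome {
  // Baseline check: if every gate already passes, skip the model entirely.
  let failures = seams.verify().filter((g) => !g.passed);
  if (failures.length === 0) {
    return { status: "already-green", turns: 0, failures };
  }

  for (let turn = 1; turn <= maxTurns; turn++) {
    // Feed the previous failures back into the next prompt.
    const feedback = failures.map((f) => `[${f.gate}] ${f.output}`).join("\n");
    const prompt = `${instruction}\n\nFailing gates:\n${feedback}`;

    seams.apply(seams.generate(prompt));
    failures = seams.verify().filter((g) => !g.passed);

    // "Done" is decided by the gates, never by the model's own claim.
    if (failures.length === 0) {
      return { status: "delivered", turns: turn, failures };
    }
  }
  return { status: "exhausted", turns: maxTurns, failures };
}
```

The only exit conditions are a green gate run or an exhausted turn budget, which mirrors the definition of "done" given above.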
+
+### Pipeline
+
+```
+          ┌─────────────────────────── loop until gates pass ───────────────────────────┐
+          │                                                                             │
+instruction → build prompt → execute → apply files → verify (gates) → feedback ─────────┘
+(+ practices) (+ critique)    model      to tree     build/typecheck/test/lint
+                                │                          │
+                      best-of-N candidates    pass → done ✓   fail → critic + escalate
+```
+
+1. **Convergence loop** — execute → apply → verify → feedback against real gates. A baseline check short-circuits when the tree is already green (no model call, no false success).
+2. **Best-of-N explorer** (`--candidates N`) — generates N candidates per turn under distinct strategy seeds, evaluates each on the same tree via apply→verify→rollback, and commits the winner; a model judge breaks ties.
+3. **Structured critic** (`--critic`) — turns a failed turn's gate output into a numbered, file-scoped repair plan via a gate-specific analyst persona.
+4. **Best-practice recall** (`--practices`) — injects provenance-safe practice cards learned from past successful deliveries, retrieved by semantic similarity (nomic-768 embeddings, keyword fallback).
+5. **Escalation ladder** (`--escalate`) — on stagnation, climbs cheap→expensive: widen exploration → enable the critic → switch to a stronger model.
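
To make step 2 concrete, here is a small sketch of best-of-N selection with apply, verify, and rollback. It assumes an in-memory tree and a gate runner that returns the number of failing gates; `Tree`, `Candidate`, `evaluate`, and `pickWinner` are hypothetical names, and the real explorer in `src/delivery/explorer.ts` may score and roll back differently.

```typescript
// Hypothetical shapes for illustration only.
type Tree = Record<string, string>; // path -> file contents
interface Candidate {
  seed: string; // strategy seed the candidate was generated under
  files: Tree;  // whole files proposed by the model
}

// Apply one candidate to the shared tree, count failing gates, then roll back.
function evaluate(tree: Tree, c: Candidate, countFailing: (t: Tree) => number): number {
  const snapshot: Tree = { ...tree };
  Object.keys(c.files).forEach((p) => { tree[p] = c.files[p]; }); // apply
  const failing = countFailing(tree);                             // verify
  Object.keys(tree).forEach((p) => { delete tree[p]; });          // rollback
  Object.keys(snapshot).forEach((p) => { tree[p] = snapshot[p]; });
  return failing;
}

// Score every candidate on the same starting tree and commit the best one.
// `judge` stands in for the model tie-break among equally scored candidates.
function pickWinner(
  tree: Tree,
  candidates: Candidate[],
  countFailing: (t: Tree) => number,
  judge: (tied: Candidate[]) => Candidate,
): Candidate {
  if (candidates.length === 0) throw new Error("need at least one candidate");
  const scores = candidates.map((c) => evaluate(tree, c, countFailing));
  const best = Math.min(...scores);
  const tied = candidates.filter((_, i) => scores[i] === best);
  const winner = tied.length === 1 ? tied[0] : judge(tied);
  Object.keys(winner.files).forEach((p) => { tree[p] = winner.files[p]; }); // commit
  return winner;
}
```

Because each candidate is rolled back before the next one is applied, all N are judged against the same starting state, and only the winner's files remain in the tree.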
+
+### Components (8 modules)
+
+| Component         | File                                  | Purpose                                                           |
+| ----------------- | ------------------------------------- | ----------------------------------------------------------------- |
+| Convergence Loop  | `src/delivery/convergence-loop.ts`    | Turn loop with pluggable seams + mutable run-state for escalation |
+| Verifier Ladder   | `src/delivery/verifier-ladder.ts`     | Build/typecheck/test/lint gates with fail-fast and diagnostics    |
+| Applier           | `src/delivery/applier.ts`             | Writes ` ```file:path ` blocks; path-safe, rollback-capable       |
+| Explorer          | `src/delivery/explorer.ts`            | Best-of-N candidates with strategy seeds + rollback evaluation    |
+| Judge             | `src/delivery/judge.ts`               | Model tie-break among equally-scored candidates                   |
+| Critic            | `src/delivery/critic.ts`              | Gate-persona repair plans from failed turns                       |
+| Practice Store    | `src/delivery/practice.ts`            | Provenance-safe best-practice cards with semantic recall          |
+| Escalation        | `src/delivery/escalation.ts`          | Stagnation-driven ladder returning loop directives                |
+
+The model is reached through an OpenAI-compatible client
+(`src/models/openai-compat-client.ts`) — the local inference gateway,
+llama.cpp, vLLM, Ollama, or any `/v1/chat/completions` endpoint.
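
For readers unfamiliar with the `/v1/chat/completions` convention, the sketch below shows the request and response shape such an endpoint expects. It is not the contents of `src/models/openai-compat-client.ts`; `buildChatRequest` and `extractText` are hypothetical helpers, and the endpoint URL in the usage note is an assumed local address.

```typescript
// Hypothetical helpers illustrating the OpenAI-compatible chat completions shape.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildChatRequest(
  endpoint: string, // base URL ending in /v1, for example a local llama.cpp or vLLM server
  model: string,
  messages: ChatMessage[],
  apiKey?: string,  // many local servers do not require a key
) {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    // Trim trailing slashes so both ".../v1" and ".../v1/" work.
    url: `${endpoint.replace(/\/+$/, "")}/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages }) },
  };
}

// Pull the assistant text out of a chat completion response body.
function extractText(response: unknown): string {
  const typed = response as
    | { choices?: Array<{ message?: { content?: unknown } }> }
    | null
    | undefined;
  const content = typed?.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("unexpected chat completion shape");
  }
  return content;
}
```

In practice the pair would be used roughly as `const { url, init } = buildChatRequest("http://localhost:8080/v1", "qwen35-a3b", messages)`, followed by `fetch(url, init)` and `extractText(await res.json())`. Keeping request building separate from the network call makes the shape easy to test without a running server.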
+
+### Usage
+
+```bash
+# Single-shot loop against the current project's gates
+uap deliver "implement src/slugify.js exporting slugify(str)"
+
+# Full quality stack: 3 candidates/turn, critic, learned practices, escalation
+uap deliver "add retry-with-backoff to the HTTP client" \
+  --candidates 3 --critic --practices --escalate --escalate-model opus-4.6
+
+# Preview detected gates and plan without calling the model
+uap deliver "..." --dry-run
+
+# Scope to a subset of gates, cap turns, target another project
+uap deliver "..." --gates build,test --max-turns 8 --project-root ../service
+```
+
+### Key flags
+
+| Flag                       | Effect                                                                 |
+| -------------------------- | ---------------------------------------------------------------------- |
+| `-m, --model <preset>`     | Model preset (default `$UAP_DELIVER_MODEL` or `qwen35-a3b`)            |
+| `--max-turns <n>`          | Maximum execute→verify iterations (default 5)                          |
+| `--gates <ids>`            | Gate subset: `build,typecheck,test,lint`                               |
+| `--candidates <n>`         | Best-of-N exploration (2–8) per turn                                   |
+| `--critic`                 | Structured repair plans on failed turns                                |
+| `--practices`              | Inject and record best-practice cards                                  |
+| `--no-semantic`            | Use keyword (not embedding) practice recall                            |
+| `--escalate`               | Escalation ladder on stagnation                                        |
+| `--escalate-model <preset>`| Stronger model for the final escalation tier                           |
+| `--endpoint <url>`         | Override the model endpoint (OpenAI-compatible `/v1`)                  |
+| `--dry-run` / `--json`     | Show the plan only / emit machine-readable result                      |
+
+Model output is never executed — only written as files and checked by the
+gates. The applier refuses writes to executed config (`package.json`,
+lockfiles), `.git`/hooks/CI paths, and symlinks that escape the project root.
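
A write guard of the kind described above might look like the following sketch. It is an assumption-laden illustration rather than the logic in `src/delivery/applier.ts`: `checkWritePath` is a hypothetical name, the deny lists are guesses at what "lockfiles" and "hooks/CI paths" cover, and symlink escapes are not handled here because detecting them requires resolving real paths on disk.

```typescript
// Hypothetical deny lists; the real applier's lists are not documented here.
const DENIED_FILES = ["package.json", "package-lock.json", "yarn.lock", "pnpm-lock.yaml"];
const DENIED_DIRS = [".git", ".github", ".husky"];

type Verdict = { ok: true; path: string } | { ok: false; reason: string };

// Validate a model-supplied relative path before anything is written.
function checkWritePath(relPath: string): Verdict {
  // Reject POSIX-absolute, backslash-rooted, and drive-letter paths outright.
  if (/^([\\/]|[A-Za-z]:)/.test(relPath)) {
    return { ok: false, reason: "absolute path" };
  }

  // Normalise segment by segment so ".." can never climb above the root.
  const parts: string[] = [];
  const segments = relPath.split(/[\\/]+/);
  for (let i = 0; i < segments.length; i++) {
    const seg = segments[i];
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (parts.length === 0) return { ok: false, reason: "escapes project root" };
      parts.pop();
    } else {
      parts.push(seg);
    }
  }

  if (parts.length === 0) return { ok: false, reason: "empty path" };
  if (DENIED_DIRS.indexOf(parts[0]) !== -1) {
    return { ok: false, reason: "protected directory" };
  }
  if (DENIED_FILES.indexOf(parts[parts.length - 1]) !== -1) {
    return { ok: false, reason: "protected config file" };
  }
  return { ok: true, path: parts.join("/") };
}
```

Checking the normalised path rather than the raw string means inputs such as `src/../package.json` are caught by the same rule as `package.json`.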
+
+---
+
 ## Pattern System (23 Patterns)
 
 Battle-tested patterns from Terminal-Bench 2.0, stored in `.factory/patterns/`.
@@ -478,7 +574,7 @@ pre-tool-use mechanism (claude, vscode, cursor, factory, opencode, omp, hermes).
 
 ## CLI Reference
 
-###
+### 29 Top-Level Commands
 
 | Command                   | Description                                  |
 | ------------------------- | -------------------------------------------- |
@@ -498,6 +594,7 @@ pre-tool-use mechanism (claude, vscode, cursor, factory, opencode, omp, hermes).
 | `uap task <action>`       | Task management (15 subcommands)             |
 | `uap droids <action>`     | Droid management (3 subcommands)             |
 | `uap expert-route <task>` | Recommend an expert droid chain for a task   |
+| `uap deliver <task>`      | Convergence loop: iterate a model against real gates until delivery |
 | `uap harness <action>`    | HALO trace analysis (analyze, status)        |
 | `uap ideate <action>`     | Open-collider ideation (setup, run, ideas)   |
 | `uap model <action>`      | Multi-model management (8 subcommands)       |
@@ -511,7 +608,7 @@ pre-tool-use mechanism (claude, vscode, cursor, factory, opencode, omp, hermes).
 | `uap sync`                | Sync configuration between platforms         |
 | `uap uap-omp <action>`    | Oh-My-Pi integration (7 subcommands)         |
 
-**Total:
+**Total: 118 commands and subcommands.**
 
 ### Additional Binaries
 