npm - @chllming/wave-orchestration - Versions diffs - 0.5.4 → 0.6.0 - Mend

@chllming/wave-orchestration 0.5.4 → 0.6.0

Files changed (126)

package/CHANGELOG.md +46 -3
package/README.md +33 -5
package/docs/README.md +18 -4
package/docs/agents/wave-cont-eval-role.md +36 -0
package/docs/agents/{wave-evaluator-role.md → wave-cont-qa-role.md} +14 -11
package/docs/agents/wave-documentation-role.md +1 -1
package/docs/agents/wave-infra-role.md +1 -1
package/docs/agents/wave-integration-role.md +3 -3
package/docs/agents/wave-launcher-role.md +4 -3
package/docs/agents/wave-security-role.md +40 -0
package/docs/concepts/context7-vs-skills.md +1 -1
package/docs/concepts/what-is-a-wave.md +56 -6
package/docs/evals/README.md +166 -0
package/docs/evals/benchmark-catalog.json +663 -0
package/docs/guides/author-and-run-waves.md +135 -0
package/docs/guides/planner.md +5 -0
package/docs/guides/terminal-surfaces.md +2 -0
package/docs/plans/component-cutover-matrix.json +1 -1
package/docs/plans/component-cutover-matrix.md +1 -1
package/docs/plans/current-state.md +19 -1
package/docs/plans/examples/wave-example-live-proof.md +435 -0
package/docs/plans/migration.md +42 -0
package/docs/plans/wave-orchestrator.md +46 -7
package/docs/plans/waves/wave-0.md +4 -4
package/docs/reference/live-proof-waves.md +177 -0
package/docs/reference/migration-0.2-to-0.5.md +26 -19
package/docs/reference/npmjs-trusted-publishing.md +6 -5
package/docs/reference/runtime-config/README.md +13 -3
package/docs/reference/sample-waves.md +87 -0
package/docs/reference/skills.md +110 -42
package/docs/research/agent-context-sources.md +130 -11
package/docs/research/coordination-failure-review.md +266 -0
package/docs/roadmap.md +6 -2
package/package.json +2 -2
package/releases/manifest.json +20 -2
package/scripts/research/agent-context-archive.mjs +83 -1
package/scripts/research/manifests/agent-context-expanded-2026-03-22.mjs +811 -0
package/scripts/wave-orchestrator/adhoc.mjs +1331 -0
package/scripts/wave-orchestrator/agent-state.mjs +358 -6
package/scripts/wave-orchestrator/artifact-schemas.mjs +173 -0
package/scripts/wave-orchestrator/clarification-triage.mjs +10 -3
package/scripts/wave-orchestrator/config.mjs +48 -12
package/scripts/wave-orchestrator/context7.mjs +2 -0
package/scripts/wave-orchestrator/coord-cli.mjs +51 -19
package/scripts/wave-orchestrator/coordination-store.mjs +26 -4
package/scripts/wave-orchestrator/coordination.mjs +83 -9
package/scripts/wave-orchestrator/dashboard-state.mjs +20 -8
package/scripts/wave-orchestrator/dep-cli.mjs +5 -2
package/scripts/wave-orchestrator/docs-queue.mjs +8 -2
package/scripts/wave-orchestrator/evals.mjs +451 -0
package/scripts/wave-orchestrator/feedback.mjs +15 -1
package/scripts/wave-orchestrator/install.mjs +32 -9
package/scripts/wave-orchestrator/launcher-closure.mjs +281 -0
package/scripts/wave-orchestrator/launcher-runtime.mjs +334 -0
package/scripts/wave-orchestrator/launcher.mjs +709 -601
package/scripts/wave-orchestrator/ledger.mjs +123 -20
package/scripts/wave-orchestrator/local-executor.mjs +99 -12
package/scripts/wave-orchestrator/planner.mjs +177 -42
package/scripts/wave-orchestrator/replay.mjs +6 -3
package/scripts/wave-orchestrator/role-helpers.mjs +84 -0
package/scripts/wave-orchestrator/shared.mjs +75 -11
package/scripts/wave-orchestrator/skills.mjs +637 -106
package/scripts/wave-orchestrator/traces.mjs +71 -48
package/scripts/wave-orchestrator/wave-files.mjs +947 -101
package/scripts/wave.mjs +9 -0
package/skills/README.md +202 -0
package/skills/provider-aws/SKILL.md +111 -0
package/skills/provider-aws/adapters/claude.md +1 -0
package/skills/provider-aws/adapters/codex.md +1 -0
package/skills/provider-aws/references/service-verification.md +39 -0
package/skills/provider-aws/skill.json +50 -1
package/skills/provider-custom-deploy/SKILL.md +59 -0
package/skills/provider-custom-deploy/skill.json +46 -1
package/skills/provider-docker-compose/SKILL.md +90 -0
package/skills/provider-docker-compose/adapters/local.md +1 -0
package/skills/provider-docker-compose/skill.json +49 -1
package/skills/provider-github-release/SKILL.md +116 -1
package/skills/provider-github-release/adapters/claude.md +1 -0
package/skills/provider-github-release/adapters/codex.md +1 -0
package/skills/provider-github-release/skill.json +51 -1
package/skills/provider-kubernetes/SKILL.md +137 -0
package/skills/provider-kubernetes/adapters/claude.md +1 -0
package/skills/provider-kubernetes/adapters/codex.md +1 -0
package/skills/provider-kubernetes/references/kubectl-patterns.md +58 -0
package/skills/provider-kubernetes/skill.json +48 -1
package/skills/provider-railway/SKILL.md +118 -1
package/skills/provider-railway/references/verification-commands.md +39 -0
package/skills/provider-railway/skill.json +67 -1
package/skills/provider-ssh-manual/SKILL.md +91 -0
package/skills/provider-ssh-manual/skill.json +50 -1
package/skills/repo-coding-rules/SKILL.md +84 -0
package/skills/repo-coding-rules/skill.json +30 -1
package/skills/role-cont-eval/SKILL.md +90 -0
package/skills/role-cont-eval/adapters/codex.md +1 -0
package/skills/role-cont-eval/skill.json +36 -0
package/skills/role-cont-qa/SKILL.md +93 -0
package/skills/role-cont-qa/adapters/claude.md +1 -0
package/skills/role-cont-qa/skill.json +36 -0
package/skills/role-deploy/SKILL.md +90 -0
package/skills/role-deploy/skill.json +32 -1
package/skills/role-documentation/SKILL.md +66 -0
package/skills/role-documentation/skill.json +32 -1
package/skills/role-implementation/SKILL.md +62 -0
package/skills/role-implementation/skill.json +32 -1
package/skills/role-infra/SKILL.md +74 -0
package/skills/role-infra/skill.json +32 -1
package/skills/role-integration/SKILL.md +79 -1
package/skills/role-integration/skill.json +32 -1
package/skills/role-research/SKILL.md +58 -0
package/skills/role-research/skill.json +32 -1
package/skills/role-security/SKILL.md +60 -0
package/skills/role-security/skill.json +36 -0
package/skills/runtime-claude/SKILL.md +60 -1
package/skills/runtime-claude/skill.json +32 -1
package/skills/runtime-codex/SKILL.md +52 -1
package/skills/runtime-codex/skill.json +32 -1
package/skills/runtime-local/SKILL.md +39 -0
package/skills/runtime-local/skill.json +32 -1
package/skills/runtime-opencode/SKILL.md +51 -0
package/skills/runtime-opencode/skill.json +32 -1
package/skills/wave-core/SKILL.md +107 -0
package/skills/wave-core/references/marker-syntax.md +62 -0
package/skills/wave-core/skill.json +31 -1
package/wave.config.json +35 -6
package/skills/role-evaluator/SKILL.md +0 -6
package/skills/role-evaluator/skill.json +0 -5

package/CHANGELOG.md CHANGED

@@ -1,5 +1,48 @@
 # Changelog
+## Unreleased
+## 0.6.0 - 2026-03-22
+### Breaking Changes
+- Breaking rename: legacy `evaluator` role/config terminology has been removed in favor of `cont-QA`, and config now rejects `roles.evaluator*`, `skills.byRole.evaluator`, and `runtimePolicy.defaultExecutorByRole.evaluator`.
+- Closure authoring, prompts, starter bundles, validation, and gate parsing now distinguish optional `cont-EVAL` (`E0`) from the mandatory `cont-QA` (`A0`) role instead of treating them as one overloaded evaluator surface.
+### Added
+- Added optional `cont-EVAL` as a first-class closure-stage role for iterative service-output and benchmark tuning, with `## Eval targets`, repo-owned benchmark catalog validation, delegated versus pinned benchmark selection, dedicated `E0` sequencing before integration closure, and a new `scripts/wave-orchestrator/evals.mjs` policy layer.
+- Added `docs/evals/README.md` plus `docs/evals/benchmark-catalog.json` so waves can authorize benchmark families and pinned checks against repo-governed coordination, latency, contradiction-recovery, and quality targets.
+- Added an optional report-only security reviewer role via `docs/agents/wave-security-role.md`, wave parsing support, planner authoring support, a `security-review` executor profile, per-wave security summaries, structured `[wave-security]` markers, and report-path validation that routes fixes back to implementation owners instead of silently folding review into integration.
+- Added transient ad-hoc task flows on top of the launcher substrate with `wave adhoc plan`, `wave adhoc run`, `wave adhoc show`, and `wave adhoc promote`, including stored specs under `.wave/adhoc/runs/`, generated launcher-compatible markdown, and launcher-backed dry-run or live execution.
+- Added dedicated role-helper logic used by planner, launcher, validation, and trace code to reason about `cont-EVAL`, `cont-QA`, and security-review responsibilities.
+- Added dedicated regression suites for eval target parsing and validation, security review validation, ad-hoc run planning and promotion, docs queue behavior, and the expanded research archive topic grouping.
+### Changed
+- Expanded the authored wave surface and starter docs to match the new closure model: updated role prompts, wave examples, migration guidance, current-state docs, roadmap notes, and package docs so `cont-EVAL`, `cont-QA`, and security review are all first-class authoring concepts.
+- Expanded the skills surface substantially: richer `skill.json` manifests, more complete runtime and provider adapters, recursive `references/` material, updated starter role packs, new `skills/README.md`, and clearer runtime-projection/reference docs for role-, runtime-, and deploy-kind-aware skill activation.
+- Expanded provider and operator guidance across the shipped skill packs, including richer Railway, AWS, Kubernetes, Docker Compose, SSH/manual, GitHub Release, repo-coding-rules, role-security, and wave-core references.
+- Expanded proof-first authoring guidance with new sample waves and reference docs for live-proof work, benchmark-driven closure, sticky executor guidance, and richer example wave surfaces.
+- Expanded the local research bibliography and tooling: updated `docs/research/agent-context-sources.md`, added `docs/research/coordination-failure-review.md`, introduced the combined research manifest under `scripts/research/manifests/agent-context-expanded-2026-03-22.mjs`, and taught the archive indexer about planning, skills, blackboard, repo-context, and security topic slices.
+- Curated the README research section so the public-facing bibliography points at the specific papers and practice articles the implementation is based on, rather than only a generic source list.
+### Fixed And Hardened
+- Hardened agent-state, launcher, ledger, replay, traces, local-executor, config, and wave-file validation so `cont-EVAL`, `cont-QA`, and security review all use the correct markers, report ownership, gate sequencing, exit expectations, and replay-visible state.
+- Hardened runtime artifact normalization so versioned dashboard payloads always rewrite stale `kind` and `schemaVersion` fields to the canonical `0.6` metadata contract.
+- Hardened closure-sweep validation so waves that override the integration or documentation steward ids are validated against the same role ids that the launcher actually runs.
+- Hardened coordination and clarification handling so new integration-summary, security-review, and human-follow-up surfaces stay visible in the canonical coordination state, generated board projections, inboxes, and trace artifacts.
+- Hardened `wave coord show` into a read-only inspection path again; artifact materialization stays on `wave coord render` and `wave coord inbox`.
+- Hardened skill and runtime overlays so invalid manifests, mismatched selectors, missing adapters or references, and runtime-specific projection mistakes fail loudly instead of degrading silently at launch time.
+- Hardened ad-hoc planning and promotion so `wave adhoc promote` promotes the stored ad-hoc spec instead of re-reading the current project profile, shared-plan deltas still queue the canonical lane docs correctly, and ownership inference ignores external URL-style hints rather than treating them as repo paths.
+- Hardened install and starter-surface updates so newly seeded workspaces pick up the renamed closure roles, eval catalog, security review role, and expanded skill/reference materials consistently.
+### Testing And Validation
+- Expanded regression coverage across `agent-state`, `config`, `coordination`, `launcher`, `planner`, `skills`, `traces`, `wave-files`, `install`, `local-executor`, and the new `adhoc` and `evals` modules to cover the release's new closure, security, skills, and ad-hoc execution behavior end to end.
+- Added focused regression coverage for dashboard metadata normalization, custom closure-role ids, read-only `wave coord show`, and the per-agent rate-limit retry wrapper.
 ## 0.5.4 - 2026-03-22
 - Added the planner foundation: project bootstrap memory in `.wave/project-profile.json`, `wave project setup|show`, and interactive `wave draft` generation of structured wave specs plus launcher-compatible markdown.
@@ -9,14 +52,14 @@
 ## 0.5.3 - 2026-03-22
-- Deferred integration, documentation, and evaluator agents until the closure sweep whenever implementation work is still pending, so the runtime now matches the documented closure model.
+- Deferred integration, documentation, and cont-QA agents until the closure sweep whenever implementation work is still pending, so the runtime now matches the documented closure model.
 - Scoped wave wait/progress and human-feedback monitoring to the runs actually launched in the current pass, preventing deferred closure agents from surfacing as false pending or missing-status failures.
 - Added regression coverage for mixed implementation/closure waves and for closure-only retry waves.
 - Published `@chllming/wave-orchestration@0.5.3` successfully to npmjs and GitHub Releases.
 ## 0.5.2 - 2026-03-22
-- Hardened structured closure marker parsing so fenced or prose example `[wave-*]` lines no longer satisfy implementation, integration, documentation, or evaluator gates.
+- Hardened structured closure marker parsing so fenced or prose example `[wave-*]` lines no longer satisfy implementation, integration, documentation, or cont-QA gates.
 - Hardened `### Deliverables` so declared outputs must remain repo-relative file paths inside the implementation agent's declared file ownership before the exit contract can pass.
 - Added regression coverage for the fenced-marker false-positive path and for deliverables that escape ownership boundaries.
 - Published `@chllming/wave-orchestration@0.5.2` successfully to npmjs, making npmjs the working public install path instead of a pending rollout target.
@@ -48,7 +91,7 @@
 - Added the Phase 1 and 2 harness runtime: canonical coordination store, compiled inboxes, wave ledger, integration summaries, and clarification triage.
 - Added planning-time runtime profiles, lane runtime policy, hard runtime-mix validation, and retry fallback reassignment recording.
-- Added integration stewardship and staged closure so integration gates documentation and evaluator closure.
+- Added integration stewardship and staged closure so integration gates documentation and cont-QA closure.
 ## 0.2.0 - 2026-03-21
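
The breaking rename in the 0.6.0 entry above means a config that still carries legacy `evaluator` keys is rejected. A minimal pre-upgrade check could look like the sketch below. It is not the package's own validation code: the function name `findLegacyEvaluatorKeys` is hypothetical, and the only inputs taken from the changelog are the three rejected paths (`roles.evaluator*`, `skills.byRole.evaluator`, `runtimePolicy.defaultExecutorByRole.evaluator`).

```javascript
// Hypothetical helper, not part of @chllming/wave-orchestration.
// Exact (non-wildcard) paths the 0.6.0 changelog says are rejected.
const LEGACY_PATHS = [
  ["skills", "byRole", "evaluator"],
  ["runtimePolicy", "defaultExecutorByRole", "evaluator"],
];

function findLegacyEvaluatorKeys(config) {
  const hits = [];
  // `roles.evaluator*` is read here as a prefix match on keys under `roles`.
  for (const key of Object.keys(config.roles ?? {})) {
    if (key.startsWith("evaluator")) hits.push(`roles.${key}`);
  }
  for (const path of LEGACY_PATHS) {
    let node = config;
    for (const segment of path) {
      node = node != null && typeof node === "object" ? node[segment] : undefined;
    }
    if (node !== undefined) hits.push(path.join("."));
  }
  return hits;
}
```

Running this against a parsed `wave.config.json` before upgrading would list the keys to rename; an empty array means none of the three legacy paths is present.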

package/README.md CHANGED

@@ -7,7 +7,7 @@ Wave Orchestration is a repository harness for running multi-agent work in bound
 1. Write shared docs and one or more `docs/plans/waves/wave-<n>.md` files.
 2. Run `wave launch --dry-run` to validate the wave and materialize prompts, inboxes, dashboards, and executor previews.
 3. A real launch runs implementation agents first. Agents post claims, evidence, requests, and decisions into the coordination log and rolling message board.
-4. When implementation gates pass, closure runs in order: integration (`A8`), documentation (`A9`), evaluator (`A0`).
+4. When implementation gates pass, closure runs in order: optional `cont-EVAL` (`E0`), integration (`A8`), documentation (`A9`), and `cont-QA` (`A0`).
 5. Operators use the generated ledgers, inboxes, feedback queue, dependency views, and traces instead of guessing from raw terminal output.
 ## Features
@@ -26,10 +26,24 @@ Wave Orchestration is a repository harness for running multi-agent work in bound
 Representative rolling message board output from a real wave run:
-<img src="./docs/image.png" alt="Example rolling message board output showing claims, evidence, requests, and evaluator closure for a wave run" width="100%" />
+<img src="./docs/image.png" alt="Example rolling message board output showing claims, evidence, requests, and cont-QA closure for a wave run" width="100%" />
 ## Quick Start
+Current release:
+- `@chllming/wave-orchestration@0.6.0`
+- Release tag: [`v0.6.0`](https://github.com/chllming/wave-orchestration/releases/tag/v0.6.0)
+- Public install path: npmjs
+- Authenticated fallback: GitHub Packages
+Highlights in `0.6.0`:
+- `cont-EVAL` (`E0`) is now a first-class optional eval stage before integration, separate from final `cont-QA` closure.
+- Optional security review now has a dedicated role, report path, and `[wave-security]` closure marker.
+- `wave adhoc plan|run|show|promote` now supports transient operator requests on the same launcher substrate.
+- Starter docs and skills now cover the current `0.6.0` closure, benchmark, security, and provider surfaces.
 Requirements:
 - Node.js 22+
@@ -54,7 +68,7 @@ If the repo already has Wave config, plans, or waves you want to keep:
 pnpm exec wave init --adopt-existing
 ```
-Fresh init also seeds a starter `skills/` library. The launcher projects those skill bundles into Codex, Claude, OpenCode, and local executor overlays after the final runtime for each agent is resolved.
+Fresh init also seeds a starter `skills/` library plus `docs/evals/benchmark-catalog.json`. The launcher projects those skill bundles into Codex, Claude, OpenCode, and local executor overlays after the final runtime for each agent is resolved, and waves that include `cont-EVAL` can declare `## Eval targets` against that catalog.
 ## Common Commands
@@ -100,16 +114,30 @@ node scripts/wave.mjs launch --lane main --dry-run --no-dashboard
 Canonical source index:
 - [docs/research/agent-context-sources.md](./docs/research/agent-context-sources.md)
-Key external sources:
+The implementation is based on the following research:
+**Harness and Runtime Surfaces**
 - [Effective harnesses for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
 - [Harness engineering: leveraging Codex in an agent-first world](https://openai.com/index/harness-engineering/)
 - [Unlocking the Codex harness: how we built the App Server](https://openai.com/index/unlocking-the-codex-harness/)
 - [Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned](https://arxiv.org/abs/2603.05344)
 - [VeRO: An Evaluation Harness for Agents to Optimize Agents](https://arxiv.org/abs/2602.22480)
 - [EvoClaw: Evaluating AI Agents on Continuous Software Evolution](https://arxiv.org/abs/2603.13428)
+- [Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution](https://arxiv.org/abs/2603.11445)
+- [Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models](https://arxiv.org/abs/2510.04618)
+**Shared Coordination and Closure**
 - [LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science](https://arxiv.org/abs/2510.01285)
 - [Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture](https://arxiv.org/abs/2507.01701)
 - [DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation](https://arxiv.org/abs/2603.13327)
+- [Why Do Multi-Agent LLM Systems Fail?](https://arxiv.org/abs/2503.13657)
 - [Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems](https://arxiv.org/abs/2603.01045)
-- [SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly](https://arxiv.org/abs/2601.22623)
 - [An Open Agent Architecture](https://cdn.aaai.org/Symposia/Spring/1994/SS-94-03/SS94-03-001.pdf)
+**Skills, Repo Context, and Reusable Operating Knowledge**
+- [SoK: Agentic Skills -- Beyond Tool Use in LLM Agents](https://arxiv.org/abs/2602.20867)
+- [Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward](https://arxiv.org/abs/2602.12430)
+- [SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks](https://arxiv.org/abs/2602.12670)
+- [Agent Workflow Memory](https://arxiv.org/abs/2409.07429)
+- [Agent READMEs: An Empirical Study of Context Files for Agentic Coding](https://arxiv.org/abs/2511.12884)
+- [Context Engineering for AI Agents in Open-Source Software](https://arxiv.org/abs/2510.21413)

package/docs/README.md CHANGED

@@ -1,6 +1,10 @@
 # Wave Documentation
-This repository now uses a layered docs structure so operators, maintainers, and adopting repos can find the right level of detail quickly.
+This repository now uses a layered docs structure, but the useful path is journey-first:
+- start with one core concept doc
+- then use one end-to-end workflow guide
+- then drop into reference or narrower concept pages only when needed
 ## Suggested Structure
@@ -18,13 +22,23 @@ This repository now uses a layered docs structure so operators, maintainers, and
 ## Start Here
 - New to Wave:
-  Read [concepts/what-is-a-wave.md](./concepts/what-is-a-wave.md), [concepts/runtime-agnostic-orchestration.md](./concepts/runtime-agnostic-orchestration.md), and [concepts/context7-vs-skills.md](./concepts/context7-vs-skills.md).
+  Read [concepts/what-is-a-wave.md](./concepts/what-is-a-wave.md). It now covers the core execution model, runtime posture, closure, and state model in one place.
 - Drafting or revising waves:
-  Read [guides/planner.md](./guides/planner.md) and then the operator runbook in [plans/wave-orchestrator.md](./plans/wave-orchestrator.md).
+  Read [guides/author-and-run-waves.md](./guides/author-and-run-waves.md), then use [plans/wave-orchestrator.md](./plans/wave-orchestrator.md) as the operator runbook.
+- Adding a security review pass:
+  Read [plans/wave-orchestrator.md](./plans/wave-orchestrator.md) and the standing reviewer prompt in [agents/wave-security-role.md](./agents/wave-security-role.md).
+- Upgrading an existing repo:
+  Read [plans/migration.md](./plans/migration.md), then review the release notes in [../CHANGELOG.md](../CHANGELOG.md) before running `pnpm exec wave upgrade`.
+- Looking for concrete example waves:
+  Read [reference/sample-waves.md](./reference/sample-waves.md) for showcase-first examples that demonstrate the current authored wave surface.
+- Release notes and shipped deltas:
+  Use [../CHANGELOG.md](../CHANGELOG.md) as the canonical version-by-version surface summary, then use [plans/current-state.md](./plans/current-state.md) to see what the starter workspace assumes today.
 - Running live waves:
-  Read [guides/terminal-surfaces.md](./guides/terminal-surfaces.md), [concepts/operating-modes.md](./concepts/operating-modes.md), and [plans/wave-orchestrator.md](./plans/wave-orchestrator.md).
+  Start with [guides/author-and-run-waves.md](./guides/author-and-run-waves.md), then use [plans/wave-orchestrator.md](./plans/wave-orchestrator.md) for the live operator flow.
 - Tuning runtime behavior:
   Read [reference/runtime-config/README.md](./reference/runtime-config/README.md) and [reference/skills.md](./reference/skills.md).
+- Looking for supporting concept pages:
+  Use [concepts/runtime-agnostic-orchestration.md](./concepts/runtime-agnostic-orchestration.md), [concepts/operating-modes.md](./concepts/operating-modes.md), and [concepts/context7-vs-skills.md](./concepts/context7-vs-skills.md) after the main concept and workflow docs.
 ## Package vs Repo-Owned Material

package/docs/agents/wave-cont-eval-role.md ADDED

@@ -0,0 +1,36 @@
+---
+title: "Wave cont-EVAL Role"
+summary: "Standing prompt for the continuous eval role that tunes service output against declared eval targets and benchmarks."
+---
+# Wave cont-EVAL Role
+Use this prompt when an agent should act as the continuous eval tuning role for a wave.
+## Standing prompt
+```text
+You are the cont-EVAL role for the current wave.
+Your job is to run the relevant service or benchmark surfaces, inspect real outputs, identify quality gaps, and drive iterative improvements until the declared eval targets are satisfied or clearly blocked.
+Operating rules:
+- Read the wave's `## Eval targets` section before doing any tuning work.
+- Treat benchmark choice as a repo-governed decision. If the wave delegates benchmark selection, choose only from the declared benchmark family and record the exact selected set.
+- Re-run the service or eval procedure after each material change. Do not claim improvement from one-off inspection alone.
+- By default, you are report-only. You may directly edit implementation files only when the wave explicitly assigns you non-report owned paths.
+- Stay within your declared file ownership for direct edits. If the required fix belongs to another owner, open explicit follow-up work instead of freelancing across boundaries.
+- Keep regressions explicit. Improvement in one target does not justify silent breakage elsewhere.
+What you must do:
+- select or confirm the benchmark set used for the eval pass
+- run the service, benchmark commands, or output reviews needed to score the targets
+- record the observed gaps, regressions, and next changes after each meaningful iteration
+- when you own non-report files, emit the same final proof, doc-delta, and component markers required of other implementation owners
+- leave an append-only cont-EVAL report with the selected benchmarks, commands run, observed gaps, regressions, and final disposition
+- emit one final structured marker:
+  `[wave-eval] state=<satisfied|needs-more-work|blocked> targets=<n> benchmarks=<n> regressions=<n> target_ids=<csv> benchmark_ids=<csv> detail=<short-note>`
+Use `satisfied` only when the declared eval targets are actually met by observed outputs or benchmark results, not when the code merely looks plausible.
+Use `satisfied` only when `target_ids` exactly matches the wave contract, `benchmark_ids` enumerates the executed benchmark set, and unresolved regressions are zero.
+```
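
The `[wave-eval]` marker and the two `satisfied` rules in the prompt above can be illustrated with a small parser. This is a sketch under stated assumptions, not the logic in `scripts/wave-orchestrator/evals.mjs`: the function names are hypothetical, `detail` is assumed to be free text running to the end of the line, and "enumerates the executed benchmark set" is approximated as "at least one benchmark id is listed". It also does not reproduce the launcher's handling of fenced or prose example markers.

```javascript
// Hypothetical parser for the `[wave-eval]` marker format quoted in the role prompt.
function parseWaveEvalMarker(line) {
  const match = /^\[wave-eval\]\s+(.*)$/.exec(line.trim());
  if (!match) return null;
  const body = match[1];
  // Assumption: `detail=` is the last field and may contain spaces.
  const detailIndex = body.indexOf("detail=");
  const head = detailIndex >= 0 ? body.slice(0, detailIndex) : body;
  const fields = {};
  for (const token of head.trim().split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq > 0) fields[token.slice(0, eq)] = token.slice(eq + 1);
  }
  const csv = (value) => (value ? value.split(",").filter(Boolean) : []);
  return {
    state: fields.state,
    targets: Number(fields.targets),
    benchmarks: Number(fields.benchmarks),
    regressions: Number(fields.regressions),
    targetIds: csv(fields.target_ids),
    benchmarkIds: csv(fields.benchmark_ids),
    detail: detailIndex >= 0 ? body.slice(detailIndex + "detail=".length).trim() : "",
  };
}

// `satisfied` counts only if target ids match the wave contract exactly,
// some benchmark ids are listed, and no regressions remain.
function isSatisfied(marker, declaredTargetIds) {
  if (!marker || marker.state !== "satisfied") return false;
  const declared = [...declaredTargetIds].sort();
  const reported = [...marker.targetIds].sort();
  const sameTargets =
    declared.length === reported.length && declared.every((id, i) => id === reported[i]);
  return sameTargets && marker.benchmarkIds.length > 0 && marker.regressions === 0;
}
```

The target comparison is order-insensitive by choice; the prompt only requires that `target_ids` "exactly matches the wave contract" and does not say whether order matters.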

package/docs/agents/{wave-evaluator-role.md → wave-cont-qa-role.md} RENAMED

@@ -1,43 +1,46 @@
 ---
-title: "Wave Evaluator Role"
-summary: "Standing prompt for the running evaluator that gates a wave through architecture, proof, and documentation closure."
+title: "Wave cont-QA Role"
+summary: "Standing prompt for the continuous QA role that gates a wave through architecture, proof, and documentation closure."
 ---
-# Wave Evaluator Role
+# Wave cont-QA Role
-Use this prompt when an agent should act as the running evaluator for a wave.
+Use this prompt when an agent should act as the continuous QA closure role for a wave.
 ## Standing prompt
 ```text
-You are the running evaluator for the current wave.
+You are the cont-QA role for the current wave.
-Your job is to keep the wave aligned with repository guidance, plan docs, and proof expectations while the wave is still in progress. You are a live gate, not a final cleanup reviewer.
+Your job is to make the final closure judgment after implementation proof, optional cont-EVAL, integration, and documentation closure have all produced their evidence. You are the fail-closed final steward, not an in-progress reviewer.
 Operating rules:
 - Review changed files against the relevant repository docs and plan docs.
 - Read docs/reference/repository-guidance.md and docs/research/agent-context-sources.md before making final judgments.
 - Re-read the compiled shared summary, your inbox, and the generated wave board projection before major decisions, before validation, and before final output.
+- Judge landed evidence, not intent, effort, or ownership handoff language.
 - Require implementation agents to make gaps explicit instead of implying completion.
 - Treat shared-plan documentation closure as a real gate when the wave changes status, sequencing, ownership, or proof expectations.
 - Distinguish landed evidence from intent, future work, or handoff notes.
 What you must do:
-- detect architecture or planning drift while implementation is in progress
-- surface missing proof, missing validation, missing ownership, and missing documentation closure early
 - compare landed evidence to each agent's declared exit contract
 - compare landed evidence to the wave's declared component promotions and required target levels
+- confirm the integration steward's closure recommendation still matches the final landed state
+- confirm documentation closure is actually closed or explicitly `no-change` where allowed
+- keep the final verdict and final `[wave-gate]` marker internally consistent
 - require exact shared-doc deltas and explicit `closed` or `no-change` notes before PASS when shared plan docs are affected
-- publish an append-only evaluator report for the wave
+- report the smallest blocking set that prevents closure
+- publish an append-only cont-QA report for the wave
 Verdict contract:
-- End the evaluator report with exactly one machine-readable line:
+- End the cont-QA report with exactly one machine-readable line:
   `Verdict: PASS`
   `Verdict: CONCERNS`
   or `Verdict: BLOCKED`
 - Also emit one final structured gate marker:
   `[wave-gate] architecture=<pass|concerns|blocked> integration=<pass|concerns|blocked> durability=<pass|concerns|blocked> live=<pass|concerns|blocked> docs=<pass|concerns|blocked> detail=<short-note>`
-Use PASS only when the required proof is actually present.
+Use PASS only when the required proof is actually present and the final gate marker is fully PASS.
 If the wave declares component promotions, PASS requires those components to reach the declared level instead of merely landing adjacent code.
 ```
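
The rule added in this hunk, that `Verdict: PASS` requires a fully passing `[wave-gate]` marker, can be checked mechanically. The sketch below is illustrative and uses hypothetical function names; the five gate dimensions and the three allowed values come from the marker format quoted in the prompt. It checks only the stated PASS rule and makes no claim about how `CONCERNS` or `BLOCKED` verdicts map to gate values.

```javascript
// Gate dimensions as listed in the `[wave-gate]` marker format.
const GATE_DIMENSIONS = ["architecture", "integration", "durability", "live", "docs"];

// Hypothetical parser: returns null unless every dimension carries a known value.
function parseGateMarker(line) {
  const trimmed = line.trim();
  if (!trimmed.startsWith("[wave-gate]")) return null;
  const gate = {};
  for (const name of GATE_DIMENSIONS) {
    const pattern = new RegExp(`(?:^|\\s)${name}=(pass|concerns|blocked)(?=\\s|$)`);
    const match = pattern.exec(trimmed);
    if (!match) return null;
    gate[name] = match[1];
  }
  return gate;
}

// Extract the verdict from a `Verdict: ...` line, or null if the line is not one.
function parseVerdictLine(line) {
  const match = /^Verdict: (PASS|CONCERNS|BLOCKED)$/.exec(line.trim());
  return match ? match[1] : null;
}

// The only consistency rule stated in the prompt: PASS needs every gate to be `pass`.
function passIsConsistent(verdict, gate) {
  const allPass = GATE_DIMENSIONS.every((name) => gate[name] === "pass");
  return verdict !== "PASS" || allPass;
}
```

A report that ends with `Verdict: PASS` while its gate marker reports `durability=concerns` would fail `passIsConsistent`, which is the mismatch the prompt asks the cont-QA role to avoid.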

package/docs/agents/wave-documentation-role.md CHANGED

@@ -17,7 +17,7 @@ Your job is to keep shared plan and status docs aligned with the real landed imp
 Operating rules:
 - Anchor updates to docs/reference/repository-guidance.md.
 - Re-read the compiled shared summary, your inbox, and the generated wave board projection before major decisions, before validation, and before final output.
-- Coordinate with the evaluator and implementation agents, but do not use coordination as an excuse to defer obvious shared-plan updates.
+- Coordinate with the cont-QA and implementation agents, but do not use coordination as an excuse to defer obvious shared-plan updates.
 - Keep subsystem-specific docs with the agents that land those deliverables.
 What you must do:

package/docs/agents/wave-infra-role.md CHANGED

@@ -24,7 +24,7 @@ What you must do:
 - identify the exact infra surface you own for the wave
 - surface missing dependencies, identity gaps, admission blockers, and machine drift early
 - emit durable coordination records when the work depends on another agent or a human decision
-- leave enough exact evidence that the integration steward and evaluator can tell whether the infra surface is conformant, still in setup, or blocked
+- leave enough exact evidence that the integration steward and cont-QA can tell whether the infra surface is conformant, still in setup, or blocked
 - emit structured infra markers whenever the task touches machine validation, workload identity, node admission, deployment bootstrap, or approved machine actions:
   `[infra-status] kind=<conformance|role-drift|dependency|identity|admission|action> target=<machine-or-surface> state=<checking|setup-required|setup-in-progress|conformant|drift|blocked|failed|action-required|action-approved|action-complete> detail=<short-note>`

package/docs/agents/wave-integration-role.md CHANGED

@@ -1,6 +1,6 @@
 ---
 title: "Wave Integration Role"
-summary: "Standing prompt for the integration steward that reconciles cross-agent state before documentation and evaluator closure."
+summary: "Standing prompt for the integration steward that reconciles cross-agent state after cont-EVAL and before documentation and cont-QA closure."
 ---
 # Wave Integration Role
@@ -12,7 +12,7 @@ Use this prompt when an agent should act as the integration steward for a wave.
 ```text
 You are the integration steward for the current wave.
-Your job is to synthesize cross-agent state before the documentation steward and evaluator make their final pass. You do not replace implementation ownership. You decide whether the wave is coherent enough for doc closure.
+Your job is to synthesize cross-agent state after any `cont-EVAL` tuning pass and before the documentation steward and cont-QA make their final pass. You do not replace implementation ownership. You decide whether the wave is coherent enough for doc closure.
 Operating rules:
 - Re-read the generated wave inboxes and coordination board projection before major decisions.
@@ -28,5 +28,5 @@ What you must do:
 - emit one final structured marker:
   `[wave-integration] state=<ready-for-doc-closure|needs-more-work> claims=<n> conflicts=<n> blockers=<n> detail=<short-note>`
-Use `ready-for-doc-closure` only when the remaining work is documentation and evaluator closure, not when material implementation or integration risk still exists.
+Use `ready-for-doc-closure` only when the remaining work is documentation and cont-QA closure, not when material implementation or integration risk still exists.
 ```

package/docs/agents/wave-launcher-role.md CHANGED

@@ -12,7 +12,7 @@ Use this prompt when an agent or human operator should launch waves through the
 ```text
 You are the wave launcher operator.
-Your job is to run wave files safely, one wave at a time by default, while respecting launcher locks, runtime policy, clarification barriers, integration gates, documentation closure, and evaluator closure.
+Your job is to run wave files safely, one wave at a time by default, while respecting launcher locks, runtime policy, clarification barriers, optional `cont-EVAL` gates, integration gates, documentation closure, and cont-QA closure.
 Before launching:
 1. Run `pnpm exec wave doctor`.
@@ -24,8 +24,9 @@ Before launching:
 Completion requires:
 - all agents exit `0`
-- integration must be `ready-for-doc-closure` before documentation and evaluator closure run
-- evaluator verdict is `PASS`
+- if `cont-EVAL` is present, it must report satisfied targets before integration closure runs
+- integration must be `ready-for-doc-closure` before documentation and cont-QA closure run
+- cont-QA verdict is `PASS`
 - prompt hashes still match the current wave definitions
 - shared-plan documentation closure is resolved when required
 - no routed clarification chain or unresolved human escalation remains open
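
The completion list above reads naturally as a conjunction of independent checks. The sketch below restates it as a function that returns every unmet condition instead of a single boolean. The `WaveSnapshot` shape and all of its field names are hypothetical; the diff does not show how the launcher actually stores or evaluates this state.

```typescript
// Hypothetical snapshot of the facts the completion list refers to.
interface WaveSnapshot {
  agentExitCodes: number[];
  contEvalPresent: boolean;
  contEvalTargetsSatisfied: boolean;
  integrationState: string; // value of state= in the [wave-integration] marker
  contQaVerdict: string;
  promptHashesMatch: boolean;
  docClosureRequired: boolean;
  docClosureResolved: boolean;
  openClarificationsOrEscalations: number;
}

// Returns one human-readable entry per unmet completion condition; an empty list means complete.
function completionBlockers(s: WaveSnapshot): string[] {
  const blockers: string[] = [];
  if (s.agentExitCodes.some((code: number) => code !== 0)) {
    blockers.push("an agent exited non-zero");
  }
  // cont-EVAL is optional, so it only gates when the wave declares it.
  if (s.contEvalPresent && !s.contEvalTargetsSatisfied) {
    blockers.push("cont-EVAL targets not satisfied");
  }
  if (s.integrationState !== "ready-for-doc-closure") {
    blockers.push("integration is not ready-for-doc-closure");
  }
  if (s.contQaVerdict !== "PASS") {
    blockers.push("cont-QA verdict is not PASS");
  }
  if (!s.promptHashesMatch) {
    blockers.push("prompt hashes no longer match the wave definitions");
  }
  if (s.docClosureRequired && !s.docClosureResolved) {
    blockers.push("shared-plan documentation closure unresolved");
  }
  if (s.openClarificationsOrEscalations > 0) {
    blockers.push("clarification chain or human escalation still open");
  }
  return blockers;
}
```

Collecting all blockers rather than stopping at the first one gives an operator the full picture before deciding whether to relaunch.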

package/docs/agents/wave-security-role.md ADDED

@@ -0,0 +1,40 @@
+---
+title: "Wave Security Role"
+summary: "Standing prompt for the security reviewer that performs a threat-model-first review before integration closure."
+---
+# Wave Security Role
+Use this prompt when an agent should act as the security reviewer for a wave.
+## Standing prompt
+```text
+You are the wave security reviewer for the current wave.
+Your job is to review the landed change set before integration closure, identify security-sensitive risks, and route exact fixes or approvals while the wave is still active. You are report-only by default. Do not replace implementation ownership.
+Operating rules:
+- Re-read the compiled shared summary, your inbox, the generated wave board projection, and the owned reports before major decisions.
+- Do a threat-model pass before finalizing conclusions. Identify trust boundaries, attacker-controlled inputs, sensitive assets, approval-sensitive operations, and any external execution or data access paths touched by the wave.
+- Prefer exact findings and exact requested fixes over vague warnings.
+- Route fixes to the owning agent when the required change is outside your report path.
+- Keep the final output short enough to drive relaunch decisions and closure gates.
+What you must do:
+- leave a security review report with these sections in order:
+  `Threat Model`
+  `Risky Surfaces`
+  `Findings`
+  `Required Approvals`
+  `Requested Fixes`
+  `Final Disposition`
+- record each finding with severity, concrete file or surface, exploit or failure mode, and the owner expected to fix it
+- record each approval-sensitive action explicitly, even if the wave can proceed without blocking
+- emit one final structured marker:
+  `[wave-security] state=<clear|concerns|blocked> findings=<n> approvals=<n> detail=<short-note>`
+Use `clear` only when no unresolved findings or approvals remain.
+Use `concerns` when findings remain advisory for this wave and do not automatically block progression.
+Use `blocked` only when the wave must stop before integration until a finding or approval is resolved.
+```
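
The three `state` values in the added prompt map onto three gate outcomes: `clear` proceeds, `concerns` proceeds with advisory findings, and `blocked` stops the wave before integration. A minimal sketch of reading the final marker and deriving that outcome is shown below. `parseSecurityMarker` and `securityGate` are hypothetical names, not functions from the package, and treating a missing or malformed marker as a stop is an assumption rather than documented behavior.

```typescript
type SecurityState = "clear" | "concerns" | "blocked";

interface SecurityMarker {
  state: SecurityState;
  findings: number;
  approvals: number;
  detail: string;
}

// Hypothetical parser for one `[wave-security]` line in the field order shown in the role prompt.
function parseSecurityMarker(line: string): SecurityMarker | null {
  const pattern = /^\[wave-security\] state=(clear|concerns|blocked) findings=(\d+) approvals=(\d+) detail=(.*)$/;
  const match = pattern.exec(line.trim());
  if (match === null) {
    return null;
  }
  return {
    state: match[1] as SecurityState,
    findings: parseInt(match[2] || "0", 10),
    approvals: parseInt(match[3] || "0", 10),
    detail: match[4] || "",
  };
}

// Maps the marker to a gate decision following the three "Use ..." rules in the prompt.
function securityGate(marker: SecurityMarker | null): "proceed" | "proceed-with-advisory" | "stop" {
  if (marker === null) {
    // Assumption: a missing or malformed marker does not count as a pass.
    return "stop";
  }
  if (marker.state === "blocked") {
    return "stop";
  }
  return marker.state === "concerns" ? "proceed-with-advisory" : "proceed";
}
```

For example, `securityGate(parseSecurityMarker("[wave-security] state=concerns findings=2 approvals=0 detail=advisory only"))` yields `"proceed-with-advisory"` under these assumptions.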

package/docs/concepts/context7-vs-skills.md CHANGED

@@ -44,7 +44,7 @@ Use skills when the guidance is reusable, repo-owned, and should survive across
 - environment-specific rules
 - Railway, Kubernetes, or GitHub release procedures
 - runtime-specific instructions for Codex, Claude, or OpenCode
-- role-oriented heuristics for implementation, deploy, evaluator, or research agents
+- role-oriented heuristics for implementation, deploy, cont-QA, or research agents
 ## What Remains Authoritative

package/docs/concepts/what-is-a-wave.md CHANGED

@@ -18,7 +18,7 @@ It is not just a prompt file. A wave is a bounded slice of repository work with:
 - Wave
   One numbered work package inside a lane, usually stored as `docs/plans/waves/wave-<n>.md`.
 - Agent
-  One role inside the wave, such as implementation, integration, documentation, evaluator, infra, or deploy.
+  One role inside the wave, such as implementation, `cont-EVAL`, security review, integration, documentation, cont-QA, infra, or deploy.
 - Attempt
   One execution pass of a wave. A wave can have multiple attempts due to retries or fallback.
 - Closure
@@ -44,6 +44,7 @@ Wave markdown is the authored execution surface today. A typical wave can includ
 - reference rule
 - deploy environments
 - component promotions
+- eval targets
 - Context7 defaults
 - one `## Agent ...` block per role
@@ -53,6 +54,11 @@ Inside each agent block, the important sections are:
   Standing role identity imported from `docs/agents/*.md`.
 - `### Executor`
   Runtime selection, profile, model, fallbacks, and budgets.
+- `## Eval targets`
+  Optional wave-level contract for `cont-EVAL`, including benchmark family or pinned benchmarks, objective, and stop condition.
+  See [docs/evals/README.md](../evals/README.md) for guidance on delegated versus pinned targets and the coordination benchmark families.
+- `### Proof artifacts`
+  Optional machine-visible local evidence required for proof-centric waves, especially `pilot-live` and above.
 - `### Context7`
   External library truth to prefetch and inject.
 - `### Skills`
@@ -70,16 +76,20 @@ Inside each agent block, the important sections are:
 ## Standard Roles
-The starter runtime expects three closure roles:
+The starter runtime expects three standard closure roles plus up to two optional review specialists:
 - `A8`
   Integration steward
 - `A9`
   Documentation steward
 - `A0`
-  Evaluator
+  cont-QA
+- `E0`
+  Optional `cont-EVAL` for iterative benchmark or output tuning; report-only by default, implementation-owning only when explicitly assigned non-report files
+- `A7`
+  Optional security reviewer; report-only by default and used to publish a threat-model-first security review before integration closure
-Implementation or specialist agents own the actual work slices. Closure roles do not replace implementation ownership; they decide whether the combined result is closure-ready.
+Implementation or specialist agents own the actual work slices. Closure roles do not replace implementation ownership; they decide whether the combined result is closure-ready. `cont-EVAL` is the one hybrid role: most waves keep it report-only, but human-authored waves may assign explicit tuning files to `E0`, in which case it must satisfy both implementation proof and eval proof.
 ## Lifecycle Of A Wave
@@ -89,21 +99,60 @@ Implementation or specialist agents own the actual work slices. Closure roles do
 4. A live run launches implementation agents first when implementation work remains.
 5. Agents write structured coordination events instead of relying on ad hoc terminal output.
 6. The launcher checks implementation contracts, promoted-component proof, helper assignments, dependencies, and clarification state.
-7. If implementation is ready, closure runs in order: integration, documentation, evaluator.
+7. If implementation is ready, closure runs in order: optional `cont-EVAL`, optional security review, integration, documentation, then cont-QA.
 8. The attempt is captured in per-wave traces, ledgers, inboxes, summaries, and copied artifacts.
+## Runtime And Operating Posture
+Wave is runtime agnostic at the orchestration layer.
+Planning, ownership, closure, durable state, and traces do not depend on whether an agent runs on Codex, Claude Code, OpenCode, or the local smoke executor. Runtime-specific behavior is isolated to executor adapters and overlays.
+That means a wave should usually be authored in runtime-neutral terms:
+- ownership and deliverables
+- proof and validation
+- closure order
+- dependencies and helper flow
+- promoted component expectations
+The runtime choice resolves later, from the agent executor block, profile defaults, lane defaults, CLI overrides, and fallback policy.
+Wave also has an execution posture:
+- `oversight`
+  Human review or intervention is expected for risky or ambiguous work.
+- `dark-factory`
+  The wave is authored for routine execution without normal human intervention.
+Today these postures are planning vocabulary and saved project defaults, not two separate execution engines. Human feedback is still an escalation mechanism inside the orchestration loop, not the definition of the operating mode itself.
+If you need the narrower supporting pages, see [runtime-agnostic-orchestration.md](./runtime-agnostic-orchestration.md) and [operating-modes.md](./operating-modes.md).
+Current live waves are strict about closure artifacts:
+- `cont-EVAL` must emit a structured `[wave-eval]` marker whose `target_ids` matches the declared eval targets and whose `benchmark_ids` enumerates the executed benchmark set.
+- Security reviewers must leave a security review report and emit a final `[wave-security]` marker with `state=<clear|concerns|blocked>`, finding count, and approval count.
+- `cont-QA` must emit both a final `Verdict:` line and a final `[wave-gate]` marker.
+- Replay keeps read-only compatibility with older traces and older evaluator-era artifacts, but live waves do not pass on verdict-only or underspecified closure markers.
 ## What Makes A Wave "Done"
 A wave is not done because an agent said so. It is done only when the runtime surfaces agree:
 - implementation exit contracts pass
 - required deliverables exist and stay within ownership boundaries
+- required proof artifacts exist when the wave declares proof-first live evidence
 - required component proof and promotions pass
 - helper assignments are resolved
 - required dependency tickets are resolved
 - clarification follow-ups or escalations are resolved
+- if present, `cont-EVAL` satisfies its declared eval targets
+- if present, the security reviewer publishes a report plus a final `[wave-security]` marker; `blocked` stops closure while `concerns` stays advisory
 - integration recommends closure
-- documentation and evaluator closure pass
+- documentation and cont-QA closure pass
+For proof-first live-wave examples, see [docs/reference/live-proof-waves.md](../reference/live-proof-waves.md).
 ## Where The State Lives
@@ -115,6 +164,7 @@ The wave file is only part of the story. The runtime writes durable state under
 - rendered message boards
 - compiled inboxes
 - ledger and docs queue
+- security summaries
 - integration summaries
 - dependency snapshots
 - executor overlays