@chllming/wave-orchestration 0.5.4 → 0.6.0

Files changed (126)
  1. package/CHANGELOG.md +46 -3
  2. package/README.md +33 -5
  3. package/docs/README.md +18 -4
  4. package/docs/agents/wave-cont-eval-role.md +36 -0
  5. package/docs/agents/{wave-evaluator-role.md → wave-cont-qa-role.md} +14 -11
  6. package/docs/agents/wave-documentation-role.md +1 -1
  7. package/docs/agents/wave-infra-role.md +1 -1
  8. package/docs/agents/wave-integration-role.md +3 -3
  9. package/docs/agents/wave-launcher-role.md +4 -3
  10. package/docs/agents/wave-security-role.md +40 -0
  11. package/docs/concepts/context7-vs-skills.md +1 -1
  12. package/docs/concepts/what-is-a-wave.md +56 -6
  13. package/docs/evals/README.md +166 -0
  14. package/docs/evals/benchmark-catalog.json +663 -0
  15. package/docs/guides/author-and-run-waves.md +135 -0
  16. package/docs/guides/planner.md +5 -0
  17. package/docs/guides/terminal-surfaces.md +2 -0
  18. package/docs/plans/component-cutover-matrix.json +1 -1
  19. package/docs/plans/component-cutover-matrix.md +1 -1
  20. package/docs/plans/current-state.md +19 -1
  21. package/docs/plans/examples/wave-example-live-proof.md +435 -0
  22. package/docs/plans/migration.md +42 -0
  23. package/docs/plans/wave-orchestrator.md +46 -7
  24. package/docs/plans/waves/wave-0.md +4 -4
  25. package/docs/reference/live-proof-waves.md +177 -0
  26. package/docs/reference/migration-0.2-to-0.5.md +26 -19
  27. package/docs/reference/npmjs-trusted-publishing.md +6 -5
  28. package/docs/reference/runtime-config/README.md +13 -3
  29. package/docs/reference/sample-waves.md +87 -0
  30. package/docs/reference/skills.md +110 -42
  31. package/docs/research/agent-context-sources.md +130 -11
  32. package/docs/research/coordination-failure-review.md +266 -0
  33. package/docs/roadmap.md +6 -2
  34. package/package.json +2 -2
  35. package/releases/manifest.json +20 -2
  36. package/scripts/research/agent-context-archive.mjs +83 -1
  37. package/scripts/research/manifests/agent-context-expanded-2026-03-22.mjs +811 -0
  38. package/scripts/wave-orchestrator/adhoc.mjs +1331 -0
  39. package/scripts/wave-orchestrator/agent-state.mjs +358 -6
  40. package/scripts/wave-orchestrator/artifact-schemas.mjs +173 -0
  41. package/scripts/wave-orchestrator/clarification-triage.mjs +10 -3
  42. package/scripts/wave-orchestrator/config.mjs +48 -12
  43. package/scripts/wave-orchestrator/context7.mjs +2 -0
  44. package/scripts/wave-orchestrator/coord-cli.mjs +51 -19
  45. package/scripts/wave-orchestrator/coordination-store.mjs +26 -4
  46. package/scripts/wave-orchestrator/coordination.mjs +83 -9
  47. package/scripts/wave-orchestrator/dashboard-state.mjs +20 -8
  48. package/scripts/wave-orchestrator/dep-cli.mjs +5 -2
  49. package/scripts/wave-orchestrator/docs-queue.mjs +8 -2
  50. package/scripts/wave-orchestrator/evals.mjs +451 -0
  51. package/scripts/wave-orchestrator/feedback.mjs +15 -1
  52. package/scripts/wave-orchestrator/install.mjs +32 -9
  53. package/scripts/wave-orchestrator/launcher-closure.mjs +281 -0
  54. package/scripts/wave-orchestrator/launcher-runtime.mjs +334 -0
  55. package/scripts/wave-orchestrator/launcher.mjs +709 -601
  56. package/scripts/wave-orchestrator/ledger.mjs +123 -20
  57. package/scripts/wave-orchestrator/local-executor.mjs +99 -12
  58. package/scripts/wave-orchestrator/planner.mjs +177 -42
  59. package/scripts/wave-orchestrator/replay.mjs +6 -3
  60. package/scripts/wave-orchestrator/role-helpers.mjs +84 -0
  61. package/scripts/wave-orchestrator/shared.mjs +75 -11
  62. package/scripts/wave-orchestrator/skills.mjs +637 -106
  63. package/scripts/wave-orchestrator/traces.mjs +71 -48
  64. package/scripts/wave-orchestrator/wave-files.mjs +947 -101
  65. package/scripts/wave.mjs +9 -0
  66. package/skills/README.md +202 -0
  67. package/skills/provider-aws/SKILL.md +111 -0
  68. package/skills/provider-aws/adapters/claude.md +1 -0
  69. package/skills/provider-aws/adapters/codex.md +1 -0
  70. package/skills/provider-aws/references/service-verification.md +39 -0
  71. package/skills/provider-aws/skill.json +50 -1
  72. package/skills/provider-custom-deploy/SKILL.md +59 -0
  73. package/skills/provider-custom-deploy/skill.json +46 -1
  74. package/skills/provider-docker-compose/SKILL.md +90 -0
  75. package/skills/provider-docker-compose/adapters/local.md +1 -0
  76. package/skills/provider-docker-compose/skill.json +49 -1
  77. package/skills/provider-github-release/SKILL.md +116 -1
  78. package/skills/provider-github-release/adapters/claude.md +1 -0
  79. package/skills/provider-github-release/adapters/codex.md +1 -0
  80. package/skills/provider-github-release/skill.json +51 -1
  81. package/skills/provider-kubernetes/SKILL.md +137 -0
  82. package/skills/provider-kubernetes/adapters/claude.md +1 -0
  83. package/skills/provider-kubernetes/adapters/codex.md +1 -0
  84. package/skills/provider-kubernetes/references/kubectl-patterns.md +58 -0
  85. package/skills/provider-kubernetes/skill.json +48 -1
  86. package/skills/provider-railway/SKILL.md +118 -1
  87. package/skills/provider-railway/references/verification-commands.md +39 -0
  88. package/skills/provider-railway/skill.json +67 -1
  89. package/skills/provider-ssh-manual/SKILL.md +91 -0
  90. package/skills/provider-ssh-manual/skill.json +50 -1
  91. package/skills/repo-coding-rules/SKILL.md +84 -0
  92. package/skills/repo-coding-rules/skill.json +30 -1
  93. package/skills/role-cont-eval/SKILL.md +90 -0
  94. package/skills/role-cont-eval/adapters/codex.md +1 -0
  95. package/skills/role-cont-eval/skill.json +36 -0
  96. package/skills/role-cont-qa/SKILL.md +93 -0
  97. package/skills/role-cont-qa/adapters/claude.md +1 -0
  98. package/skills/role-cont-qa/skill.json +36 -0
  99. package/skills/role-deploy/SKILL.md +90 -0
  100. package/skills/role-deploy/skill.json +32 -1
  101. package/skills/role-documentation/SKILL.md +66 -0
  102. package/skills/role-documentation/skill.json +32 -1
  103. package/skills/role-implementation/SKILL.md +62 -0
  104. package/skills/role-implementation/skill.json +32 -1
  105. package/skills/role-infra/SKILL.md +74 -0
  106. package/skills/role-infra/skill.json +32 -1
  107. package/skills/role-integration/SKILL.md +79 -1
  108. package/skills/role-integration/skill.json +32 -1
  109. package/skills/role-research/SKILL.md +58 -0
  110. package/skills/role-research/skill.json +32 -1
  111. package/skills/role-security/SKILL.md +60 -0
  112. package/skills/role-security/skill.json +36 -0
  113. package/skills/runtime-claude/SKILL.md +60 -1
  114. package/skills/runtime-claude/skill.json +32 -1
  115. package/skills/runtime-codex/SKILL.md +52 -1
  116. package/skills/runtime-codex/skill.json +32 -1
  117. package/skills/runtime-local/SKILL.md +39 -0
  118. package/skills/runtime-local/skill.json +32 -1
  119. package/skills/runtime-opencode/SKILL.md +51 -0
  120. package/skills/runtime-opencode/skill.json +32 -1
  121. package/skills/wave-core/SKILL.md +107 -0
  122. package/skills/wave-core/references/marker-syntax.md +62 -0
  123. package/skills/wave-core/skill.json +31 -1
  124. package/wave.config.json +35 -6
  125. package/skills/role-evaluator/SKILL.md +0 -6
  126. package/skills/role-evaluator/skill.json +0 -5
package/CHANGELOG.md CHANGED
@@ -1,5 +1,48 @@
1
1
  # Changelog
2
2
 
3
+ ## Unreleased
4
+
5
+ ## 0.6.0 - 2026-03-22
6
+
7
+ ### Breaking Changes
8
+
9
+ - Breaking rename: legacy `evaluator` role/config terminology has been removed in favor of `cont-QA`, and config now rejects `roles.evaluator*`, `skills.byRole.evaluator`, and `runtimePolicy.defaultExecutorByRole.evaluator`.
10
+ - Closure authoring, prompts, starter bundles, validation, and gate parsing now distinguish optional `cont-EVAL` (`E0`) from the mandatory `cont-QA` (`A0`) role instead of treating them as one overloaded evaluator surface.
11
+
12
+ ### Added
13
+
14
+ - Added optional `cont-EVAL` as a first-class closure-stage role for iterative service-output and benchmark tuning, with `## Eval targets`, repo-owned benchmark catalog validation, delegated versus pinned benchmark selection, dedicated `E0` sequencing before integration closure, and a new `scripts/wave-orchestrator/evals.mjs` policy layer.
15
+ - Added `docs/evals/README.md` plus `docs/evals/benchmark-catalog.json` so waves can authorize benchmark families and pinned checks against repo-governed coordination, latency, contradiction-recovery, and quality targets.
16
+ - Added an optional report-only security reviewer role via `docs/agents/wave-security-role.md`, wave parsing support, planner authoring support, a `security-review` executor profile, per-wave security summaries, structured `[wave-security]` markers, and report-path validation that routes fixes back to implementation owners instead of silently folding review into integration.
17
+ - Added transient ad-hoc task flows on top of the launcher substrate with `wave adhoc plan`, `wave adhoc run`, `wave adhoc show`, and `wave adhoc promote`, including stored specs under `.wave/adhoc/runs/`, generated launcher-compatible markdown, and launcher-backed dry-run or live execution.
18
+ - Added dedicated role-helper logic used by planner, launcher, validation, and trace code to reason about `cont-EVAL`, `cont-QA`, and security-review responsibilities.
19
+ - Added dedicated regression suites for eval target parsing and validation, security review validation, ad-hoc run planning and promotion, docs queue behavior, and the expanded research archive topic grouping.
20
+
21
+ ### Changed
22
+
23
+ - Expanded the authored wave surface and starter docs to match the new closure model: updated role prompts, wave examples, migration guidance, current-state docs, roadmap notes, and package docs so `cont-EVAL`, `cont-QA`, and security review are all first-class authoring concepts.
24
+ - Expanded the skills surface substantially: richer `skill.json` manifests, more complete runtime and provider adapters, recursive `references/` material, updated starter role packs, new `skills/README.md`, and clearer runtime-projection/reference docs for role-, runtime-, and deploy-kind-aware skill activation.
25
+ - Expanded provider and operator guidance across the shipped skill packs, including richer Railway, AWS, Kubernetes, Docker Compose, SSH/manual, GitHub Release, repo-coding-rules, role-security, and wave-core references.
26
+ - Expanded proof-first authoring guidance with new sample waves and reference docs for live-proof work, benchmark-driven closure, sticky executor guidance, and richer example wave surfaces.
27
+ - Expanded the local research bibliography and tooling: updated `docs/research/agent-context-sources.md`, added `docs/research/coordination-failure-review.md`, introduced the combined research manifest under `scripts/research/manifests/agent-context-expanded-2026-03-22.mjs`, and taught the archive indexer about planning, skills, blackboard, repo-context, and security topic slices.
28
+ - Curated the README research section so the public-facing bibliography points at the specific papers and practice articles the implementation is based on, rather than only a generic source list.
29
+
30
+ ### Fixed And Hardened
31
+
32
+ - Hardened agent-state, launcher, ledger, replay, traces, local-executor, config, and wave-file validation so `cont-EVAL`, `cont-QA`, and security review all use the correct markers, report ownership, gate sequencing, exit expectations, and replay-visible state.
33
+ - Hardened runtime artifact normalization so versioned dashboard payloads always rewrite stale `kind` and `schemaVersion` fields to the canonical `0.6` metadata contract.
34
+ - Hardened closure-sweep validation so waves that override the integration or documentation steward ids are validated against the same role ids that the launcher actually runs.
35
+ - Hardened coordination and clarification handling so new integration-summary, security-review, and human-follow-up surfaces stay visible in the canonical coordination state, generated board projections, inboxes, and trace artifacts.
36
+ - Hardened `wave coord show` into a read-only inspection path again; artifact materialization stays on `wave coord render` and `wave coord inbox`.
37
+ - Hardened skill and runtime overlays so invalid manifests, mismatched selectors, missing adapters or references, and runtime-specific projection mistakes fail loudly instead of degrading silently at launch time.
38
+ - Hardened ad-hoc planning and promotion so `wave adhoc promote` promotes the stored ad-hoc spec instead of re-reading the current project profile, shared-plan deltas still queue the canonical lane docs correctly, and ownership inference ignores external URL-style hints rather than treating them as repo paths.
39
+ - Hardened install and starter-surface updates so newly seeded workspaces pick up the renamed closure roles, eval catalog, security review role, and expanded skill/reference materials consistently.
40
+
41
+ ### Testing And Validation
42
+
43
+ - Expanded regression coverage across `agent-state`, `config`, `coordination`, `launcher`, `planner`, `skills`, `traces`, `wave-files`, `install`, `local-executor`, and the new `adhoc` and `evals` modules to cover the release's new closure, security, skills, and ad-hoc execution behavior end to end.
44
+ - Added focused regression coverage for dashboard metadata normalization, custom closure-role ids, read-only `wave coord show`, and the per-agent rate-limit retry wrapper.
45
+
3
46
  ## 0.5.4 - 2026-03-22
4
47
 
5
48
  - Added the planner foundation: project bootstrap memory in `.wave/project-profile.json`, `wave project setup|show`, and interactive `wave draft` generation of structured wave specs plus launcher-compatible markdown.
@@ -9,14 +52,14 @@
9
52
 
10
53
  ## 0.5.3 - 2026-03-22
11
54
 
12
- - Deferred integration, documentation, and evaluator agents until the closure sweep whenever implementation work is still pending, so the runtime now matches the documented closure model.
55
+ - Deferred integration, documentation, and cont-QA agents until the closure sweep whenever implementation work is still pending, so the runtime now matches the documented closure model.
13
56
  - Scoped wave wait/progress and human-feedback monitoring to the runs actually launched in the current pass, preventing deferred closure agents from surfacing as false pending or missing-status failures.
14
57
  - Added regression coverage for mixed implementation/closure waves and for closure-only retry waves.
15
58
  - Published `@chllming/wave-orchestration@0.5.3` successfully to npmjs and GitHub Releases.
16
59
 
17
60
  ## 0.5.2 - 2026-03-22
18
61
 
19
- - Hardened structured closure marker parsing so fenced or prose example `[wave-*]` lines no longer satisfy implementation, integration, documentation, or evaluator gates.
62
+ - Hardened structured closure marker parsing so fenced or prose example `[wave-*]` lines no longer satisfy implementation, integration, documentation, or cont-QA gates.
20
63
  - Hardened `### Deliverables` so declared outputs must remain repo-relative file paths inside the implementation agent's declared file ownership before the exit contract can pass.
21
64
  - Added regression coverage for the fenced-marker false-positive path and for deliverables that escape ownership boundaries.
22
65
  - Published `@chllming/wave-orchestration@0.5.2` successfully to npmjs, making npmjs the working public install path instead of a pending rollout target.
@@ -48,7 +91,7 @@
48
91
 
49
92
  - Added the Phase 1 and 2 harness runtime: canonical coordination store, compiled inboxes, wave ledger, integration summaries, and clarification triage.
50
93
  - Added planning-time runtime profiles, lane runtime policy, hard runtime-mix validation, and retry fallback reassignment recording.
51
- - Added integration stewardship and staged closure so integration gates documentation and evaluator closure.
94
+ - Added integration stewardship and staged closure so integration gates documentation and cont-QA closure.
52
95
 
53
96
  ## 0.2.0 - 2026-03-21
54
97
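Reviewer note: the breaking change in the 0.6.0 changelog entry says config now rejects `roles.evaluator*`, `skills.byRole.evaluator`, and `runtimePolicy.defaultExecutorByRole.evaluator`. The sketch below shows one way such a check could look. It is not taken from `config.mjs`; the function name `findLegacyEvaluatorKeys` and the example key `roles.evaluatorAgentId` are assumptions made for illustration only.

```javascript
// Hypothetical helper, NOT the package's real validation code.
// Returns the config paths that the 0.6.0 changelog lists as rejected.
function findLegacyEvaluatorKeys(config) {
  const offending = [];

  // `roles.evaluator*`: any key under `roles` whose name starts with "evaluator".
  for (const key of Object.keys(config?.roles ?? {})) {
    if (key.startsWith("evaluator")) {
      offending.push(`roles.${key}`);
    }
  }

  // `skills.byRole.evaluator`
  if (Object.hasOwn(config?.skills?.byRole ?? {}, "evaluator")) {
    offending.push("skills.byRole.evaluator");
  }

  // `runtimePolicy.defaultExecutorByRole.evaluator`
  if (Object.hasOwn(config?.runtimePolicy?.defaultExecutorByRole ?? {}, "evaluator")) {
    offending.push("runtimePolicy.defaultExecutorByRole.evaluator");
  }

  return offending;
}

// Example: `evaluatorAgentId` is an assumed key name, used only to exercise the prefix match.
const legacyPaths = findLegacyEvaluatorKeys({
  roles: { evaluatorAgentId: "A0" },
  skills: { byRole: { evaluator: ["role-evaluator"] } },
});
console.log(legacyPaths);
```

A check like this could be run against `wave.config.json` before upgrading, so the rename to `cont-QA` can be applied in one pass instead of being discovered at launch time.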
 
package/README.md CHANGED
@@ -7,7 +7,7 @@ Wave Orchestration is a repository harness for running multi-agent work in bound
7
7
  1. Write shared docs and one or more `docs/plans/waves/wave-<n>.md` files.
8
8
  2. Run `wave launch --dry-run` to validate the wave and materialize prompts, inboxes, dashboards, and executor previews.
9
9
  3. A real launch runs implementation agents first. Agents post claims, evidence, requests, and decisions into the coordination log and rolling message board.
10
- 4. When implementation gates pass, closure runs in order: integration (`A8`), documentation (`A9`), evaluator (`A0`).
10
+ 4. When implementation gates pass, closure runs in order: optional `cont-EVAL` (`E0`), integration (`A8`), documentation (`A9`), and `cont-QA` (`A0`).
11
11
  5. Operators use the generated ledgers, inboxes, feedback queue, dependency views, and traces instead of guessing from raw terminal output.
12
12
 
13
13
  ## Features
@@ -26,10 +26,24 @@ Wave Orchestration is a repository harness for running multi-agent work in bound
26
26
 
27
27
  Representative rolling message board output from a real wave run:
28
28
 
29
- <img src="./docs/image.png" alt="Example rolling message board output showing claims, evidence, requests, and evaluator closure for a wave run" width="100%" />
29
+ <img src="./docs/image.png" alt="Example rolling message board output showing claims, evidence, requests, and cont-QA closure for a wave run" width="100%" />
30
30
 
31
31
  ## Quick Start
32
32
 
33
+ Current release:
34
+
35
+ - `@chllming/wave-orchestration@0.6.0`
36
+ - Release tag: [`v0.6.0`](https://github.com/chllming/wave-orchestration/releases/tag/v0.6.0)
37
+ - Public install path: npmjs
38
+ - Authenticated fallback: GitHub Packages
39
+
40
+ Highlights in `0.6.0`:
41
+
42
+ - `cont-EVAL` (`E0`) is now a first-class optional eval stage before integration, separate from final `cont-QA` closure.
43
+ - Optional security review now has a dedicated role, report path, and `[wave-security]` closure marker.
44
+ - `wave adhoc plan|run|show|promote` now supports transient operator requests on the same launcher substrate.
45
+ - Starter docs and skills now cover the current `0.6.0` closure, benchmark, security, and provider surfaces.
46
+
33
47
  Requirements:
34
48
 
35
49
  - Node.js 22+
@@ -54,7 +68,7 @@ If the repo already has Wave config, plans, or waves you want to keep:
54
68
  pnpm exec wave init --adopt-existing
55
69
  ```
56
70
 
57
- Fresh init also seeds a starter `skills/` library. The launcher projects those skill bundles into Codex, Claude, OpenCode, and local executor overlays after the final runtime for each agent is resolved.
71
+ Fresh init also seeds a starter `skills/` library plus `docs/evals/benchmark-catalog.json`. The launcher projects those skill bundles into Codex, Claude, OpenCode, and local executor overlays after the final runtime for each agent is resolved, and waves that include `cont-EVAL` can declare `## Eval targets` against that catalog.
58
72
 
59
73
  ## Common Commands
60
74
 
@@ -100,16 +114,30 @@ node scripts/wave.mjs launch --lane main --dry-run --no-dashboard
100
114
  Canonical source index:
101
115
  - [docs/research/agent-context-sources.md](./docs/research/agent-context-sources.md)
102
116
 
103
- Key external sources:
117
+ The implementation is based on the following research:
118
+
119
+ **Harness and Runtime Surfaces**
104
120
  - [Effective harnesses for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
105
121
  - [Harness engineering: leveraging Codex in an agent-first world](https://openai.com/index/harness-engineering/)
106
122
  - [Unlocking the Codex harness: how we built the App Server](https://openai.com/index/unlocking-the-codex-harness/)
107
123
  - [Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned](https://arxiv.org/abs/2603.05344)
108
124
  - [VeRO: An Evaluation Harness for Agents to Optimize Agents](https://arxiv.org/abs/2602.22480)
109
125
  - [EvoClaw: Evaluating AI Agents on Continuous Software Evolution](https://arxiv.org/abs/2603.13428)
126
+ - [Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution](https://arxiv.org/abs/2603.11445)
127
+ - [Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models](https://arxiv.org/abs/2510.04618)
128
+
129
+ **Shared Coordination and Closure**
110
130
  - [LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science](https://arxiv.org/abs/2510.01285)
111
131
  - [Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture](https://arxiv.org/abs/2507.01701)
112
132
  - [DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation](https://arxiv.org/abs/2603.13327)
133
+ - [Why Do Multi-Agent LLM Systems Fail?](https://arxiv.org/abs/2503.13657)
113
134
  - [Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems](https://arxiv.org/abs/2603.01045)
114
- - [SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly](https://arxiv.org/abs/2601.22623)
115
135
  - [An Open Agent Architecture](https://cdn.aaai.org/Symposia/Spring/1994/SS-94-03/SS94-03-001.pdf)
136
+
137
+ **Skills, Repo Context, and Reusable Operating Knowledge**
138
+ - [SoK: Agentic Skills -- Beyond Tool Use in LLM Agents](https://arxiv.org/abs/2602.20867)
139
+ - [Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward](https://arxiv.org/abs/2602.12430)
140
+ - [SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks](https://arxiv.org/abs/2602.12670)
141
+ - [Agent Workflow Memory](https://arxiv.org/abs/2409.07429)
142
+ - [Agent READMEs: An Empirical Study of Context Files for Agentic Coding](https://arxiv.org/abs/2511.12884)
143
+ - [Context Engineering for AI Agents in Open-Source Software](https://arxiv.org/abs/2510.21413)
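Reviewer note: the README change above states the new closure order as optional `cont-EVAL` (`E0`), integration (`A8`), documentation (`A9`), then `cont-QA` (`A0`). A minimal sketch of that ordering follows. It assumes the default agent ids from the README (the changelog notes that steward ids can be overridden), and it treats only `cont-QA` as mandatory because that is the only role the changelog calls mandatory. `planClosureSequence` is a hypothetical name, not a launcher function.

```javascript
// Hypothetical sketch of the documented closure order, using default ids only.
const CLOSURE_ORDER = [
  { id: "E0", role: "cont-EVAL", mandatory: false },
  { id: "A8", role: "integration", mandatory: false }, // assumption: skippable in this sketch
  { id: "A9", role: "documentation", mandatory: false }, // assumption: skippable in this sketch
  { id: "A0", role: "cont-QA", mandatory: true },
];

// Given the agent ids declared in a wave, return the closure agents in run order.
function planClosureSequence(waveAgentIds) {
  const declared = new Set(waveAgentIds);
  const sequence = [];
  for (const stage of CLOSURE_ORDER) {
    if (declared.has(stage.id)) {
      sequence.push(stage.id);
    } else if (stage.mandatory) {
      throw new Error(`wave is missing the mandatory ${stage.role} role (${stage.id})`);
    }
  }
  return sequence;
}

// Implementation agents such as "A1" are ignored; only closure roles are ordered.
console.log(planClosureSequence(["A1", "A0", "A9", "E0", "A8"]));
```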
package/docs/README.md CHANGED
@@ -1,6 +1,10 @@
1
1
  # Wave Documentation
2
2
 
3
- This repository now uses a layered docs structure so operators, maintainers, and adopting repos can find the right level of detail quickly.
3
+ This repository now uses a layered docs structure, but the useful path is journey-first:
4
+
5
+ - start with one core concept doc
6
+ - then use one end-to-end workflow guide
7
+ - then drop into reference or narrower concept pages only when needed
4
8
 
5
9
  ## Suggested Structure
6
10
 
@@ -18,13 +22,23 @@ This repository now uses a layered docs structure so operators, maintainers, and
18
22
  ## Start Here
19
23
 
20
24
  - New to Wave:
21
- Read [concepts/what-is-a-wave.md](./concepts/what-is-a-wave.md), [concepts/runtime-agnostic-orchestration.md](./concepts/runtime-agnostic-orchestration.md), and [concepts/context7-vs-skills.md](./concepts/context7-vs-skills.md).
25
+ Read [concepts/what-is-a-wave.md](./concepts/what-is-a-wave.md). It now covers the core execution model, runtime posture, closure, and state model in one place.
22
26
  - Drafting or revising waves:
23
- Read [guides/planner.md](./guides/planner.md) and then the operator runbook in [plans/wave-orchestrator.md](./plans/wave-orchestrator.md).
27
+ Read [guides/author-and-run-waves.md](./guides/author-and-run-waves.md), then use [plans/wave-orchestrator.md](./plans/wave-orchestrator.md) as the operator runbook.
28
+ - Adding a security review pass:
29
+ Read [plans/wave-orchestrator.md](./plans/wave-orchestrator.md) and the standing reviewer prompt in [agents/wave-security-role.md](./agents/wave-security-role.md).
30
+ - Upgrading an existing repo:
31
+ Read [plans/migration.md](./plans/migration.md), then review the release notes in [../CHANGELOG.md](../CHANGELOG.md) before running `pnpm exec wave upgrade`.
32
+ - Looking for concrete example waves:
33
+ Read [reference/sample-waves.md](./reference/sample-waves.md) for showcase-first examples that demonstrate the current authored wave surface.
34
+ - Release notes and shipped deltas:
35
+ Use [../CHANGELOG.md](../CHANGELOG.md) as the canonical version-by-version surface summary, then use [plans/current-state.md](./plans/current-state.md) to see what the starter workspace assumes today.
24
36
  - Running live waves:
25
- Read [guides/terminal-surfaces.md](./guides/terminal-surfaces.md), [concepts/operating-modes.md](./concepts/operating-modes.md), and [plans/wave-orchestrator.md](./plans/wave-orchestrator.md).
37
+ Start with [guides/author-and-run-waves.md](./guides/author-and-run-waves.md), then use [plans/wave-orchestrator.md](./plans/wave-orchestrator.md) for the live operator flow.
26
38
  - Tuning runtime behavior:
27
39
  Read [reference/runtime-config/README.md](./reference/runtime-config/README.md) and [reference/skills.md](./reference/skills.md).
40
+ - Looking for supporting concept pages:
41
+ Use [concepts/runtime-agnostic-orchestration.md](./concepts/runtime-agnostic-orchestration.md), [concepts/operating-modes.md](./concepts/operating-modes.md), and [concepts/context7-vs-skills.md](./concepts/context7-vs-skills.md) after the main concept and workflow docs.
28
42
 
29
43
  ## Package vs Repo-Owned Material
30
44
 
package/docs/agents/wave-cont-eval-role.md ADDED
@@ -0,0 +1,36 @@
1
+ ---
2
+ title: "Wave cont-EVAL Role"
3
+ summary: "Standing prompt for the continuous eval role that tunes service output against declared eval targets and benchmarks."
4
+ ---
5
+
6
+ # Wave cont-EVAL Role
7
+
8
+ Use this prompt when an agent should act as the continuous eval tuning role for a wave.
9
+
10
+ ## Standing prompt
11
+
12
+ ```text
13
+ You are the cont-EVAL role for the current wave.
14
+
15
+ Your job is to run the relevant service or benchmark surfaces, inspect real outputs, identify quality gaps, and drive iterative improvements until the declared eval targets are satisfied or clearly blocked.
16
+
17
+ Operating rules:
18
+ - Read the wave's `## Eval targets` section before doing any tuning work.
19
+ - Treat benchmark choice as a repo-governed decision. If the wave delegates benchmark selection, choose only from the declared benchmark family and record the exact selected set.
20
+ - Re-run the service or eval procedure after each material change. Do not claim improvement from one-off inspection alone.
21
+ - By default, you are report-only. You may directly edit implementation files only when the wave explicitly assigns you non-report owned paths.
22
+ - Stay within your declared file ownership for direct edits. If the required fix belongs to another owner, open explicit follow-up work instead of freelancing across boundaries.
23
+ - Keep regressions explicit. Improvement in one target does not justify silent breakage elsewhere.
24
+
25
+ What you must do:
26
+ - select or confirm the benchmark set used for the eval pass
27
+ - run the service, benchmark commands, or output reviews needed to score the targets
28
+ - record the observed gaps, regressions, and next changes after each meaningful iteration
29
+ - when you own non-report files, emit the same final proof, doc-delta, and component markers required of other implementation owners
30
+ - leave an append-only cont-EVAL report with the selected benchmarks, commands run, observed gaps, regressions, and final disposition
31
+ - emit one final structured marker:
32
+ `[wave-eval] state=<satisfied|needs-more-work|blocked> targets=<n> benchmarks=<n> regressions=<n> target_ids=<csv> benchmark_ids=<csv> detail=<short-note>`
33
+
34
+ Use `satisfied` only when the declared eval targets are actually met by observed outputs or benchmark results, not when the code merely looks plausible.
35
+ Use `satisfied` only when `target_ids` exactly matches the wave contract, `benchmark_ids` enumerates the executed benchmark set, and unresolved regressions are zero.
36
+ ```
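Reviewer note: the standing prompt above defines the `[wave-eval]` marker and says `satisfied` is valid only when `target_ids` exactly matches the wave contract, `benchmark_ids` enumerates the executed set, and unresolved regressions are zero. The sketch below parses one marker line and applies those three conditions. It is an illustration, not the logic in `evals.mjs`; the function names and the sample ids (`latency`, `quality`, `bench-a`) are assumptions.

```javascript
// Hypothetical parser for a single `[wave-eval]` marker line.
function parseWaveEvalMarker(line) {
  const match = /^\[wave-eval\]\s+(.*)$/.exec(line.trim());
  if (!match) return null;
  const body = match[1];

  // `detail=` is free text, so everything after it is kept verbatim.
  const detailIndex = body.indexOf("detail=");
  const head = detailIndex === -1 ? body : body.slice(0, detailIndex);

  const fields = {};
  for (const [, key, value] of head.matchAll(/(\w+)=(\S*)/g)) {
    fields[key] = value;
  }
  const csv = (value) => (value ? value.split(",").filter(Boolean) : []);

  return {
    state: fields.state,
    targets: Number(fields.targets),
    benchmarks: Number(fields.benchmarks),
    regressions: Number(fields.regressions),
    targetIds: csv(fields.target_ids),
    benchmarkIds: csv(fields.benchmark_ids),
    detail: detailIndex === -1 ? "" : body.slice(detailIndex + "detail=".length).trim(),
  };
}

// Applies the three `satisfied` conditions from the standing prompt.
function isEvalSatisfied(marker, declaredTargetIds) {
  if (!marker || marker.state !== "satisfied") return false;
  const declared = [...declaredTargetIds].sort();
  const reported = [...marker.targetIds].sort();
  const sameTargets =
    declared.length === reported.length && declared.every((id, i) => id === reported[i]);
  return sameTargets && marker.benchmarkIds.length > 0 && marker.regressions === 0;
}

const marker = parseWaveEvalMarker(
  "[wave-eval] state=satisfied targets=2 benchmarks=1 regressions=0 " +
    "target_ids=latency,quality benchmark_ids=bench-a detail=all targets met"
);
console.log(isEvalSatisfied(marker, ["quality", "latency"]));
```

The comparison sorts both id lists, so ordering differences between the wave contract and the marker do not cause a false mismatch. Whether the real validator is order-sensitive is not stated in the diff.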
package/docs/agents/{wave-evaluator-role.md → wave-cont-qa-role.md} RENAMED
@@ -1,43 +1,46 @@
1
1
  ---
2
- title: "Wave Evaluator Role"
3
- summary: "Standing prompt for the running evaluator that gates a wave through architecture, proof, and documentation closure."
2
+ title: "Wave cont-QA Role"
3
+ summary: "Standing prompt for the continuous QA role that gates a wave through architecture, proof, and documentation closure."
4
4
  ---
5
5
 
6
- # Wave Evaluator Role
6
+ # Wave cont-QA Role
7
7
 
8
- Use this prompt when an agent should act as the running evaluator for a wave.
8
+ Use this prompt when an agent should act as the continuous QA closure role for a wave.
9
9
 
10
10
  ## Standing prompt
11
11
 
12
12
  ```text
13
- You are the running evaluator for the current wave.
13
+ You are the cont-QA role for the current wave.
14
14
 
15
- Your job is to keep the wave aligned with repository guidance, plan docs, and proof expectations while the wave is still in progress. You are a live gate, not a final cleanup reviewer.
15
+ Your job is to make the final closure judgment after implementation proof, optional cont-EVAL, integration, and documentation closure have all produced their evidence. You are the fail-closed final steward, not an in-progress reviewer.
16
16
 
17
17
  Operating rules:
18
18
  - Review changed files against the relevant repository docs and plan docs.
19
19
  - Read docs/reference/repository-guidance.md and docs/research/agent-context-sources.md before making final judgments.
20
20
  - Re-read the compiled shared summary, your inbox, and the generated wave board projection before major decisions, before validation, and before final output.
21
+ - Judge landed evidence, not intent, effort, or ownership handoff language.
21
22
  - Require implementation agents to make gaps explicit instead of implying completion.
  - Treat shared-plan documentation closure as a real gate when the wave changes status, sequencing, ownership, or proof expectations.
  - Distinguish landed evidence from intent, future work, or handoff notes.
 
  What you must do:
- - detect architecture or planning drift while implementation is in progress
- - surface missing proof, missing validation, missing ownership, and missing documentation closure early
  - compare landed evidence to each agent's declared exit contract
  - compare landed evidence to the wave's declared component promotions and required target levels
+ - confirm the integration steward's closure recommendation still matches the final landed state
+ - confirm documentation closure is actually closed or explicitly `no-change` where allowed
+ - keep the final verdict and final `[wave-gate]` marker internally consistent
  - require exact shared-doc deltas and explicit `closed` or `no-change` notes before PASS when shared plan docs are affected
- - publish an append-only evaluator report for the wave
+ - report the smallest blocking set that prevents closure
+ - publish an append-only cont-QA report for the wave
 
  Verdict contract:
- - End the evaluator report with exactly one machine-readable line:
+ - End the cont-QA report with exactly one machine-readable line:
  `Verdict: PASS`
  `Verdict: CONCERNS`
  or `Verdict: BLOCKED`
  - Also emit one final structured gate marker:
  `[wave-gate] architecture=<pass|concerns|blocked> integration=<pass|concerns|blocked> durability=<pass|concerns|blocked> live=<pass|concerns|blocked> docs=<pass|concerns|blocked> detail=<short-note>`
 
- Use PASS only when the required proof is actually present.
+ Use PASS only when the required proof is actually present and the final gate marker is fully PASS.
  If the wave declares component promotions, PASS requires those components to reach the declared level instead of merely landing adjacent code.
  ```
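
The verdict contract above can be checked mechanically. The following is a minimal sketch, not the package's own validator: `parse_gate` and `check_report` are hypothetical helper names, and the sketch assumes the `Verdict:` line and the `[wave-gate]` marker each sit on their own line of the report.

```python
GATE_FIELDS = ("architecture", "integration", "durability", "live", "docs")
GATE_STATES = {"pass", "concerns", "blocked"}
VERDICTS = {"PASS", "CONCERNS", "BLOCKED"}


def parse_gate(line: str) -> dict:
    """Parse one `[wave-gate] ...` marker line into a field -> value dict."""
    prefix = "[wave-gate] "
    if not line.startswith(prefix):
        raise ValueError("not a [wave-gate] marker")
    # Everything after " detail=" is free text and may contain spaces.
    head, _, detail = line[len(prefix):].partition(" detail=")
    fields = dict(pair.split("=", 1) for pair in head.split())
    for name in GATE_FIELDS:
        if fields.get(name) not in GATE_STATES:
            raise ValueError(f"bad or missing gate field: {name}")
    fields["detail"] = detail
    return fields


def check_report(report: str) -> list:
    """Return consistency problems; an empty list means the report is consistent."""
    lines = [line.strip() for line in report.splitlines() if line.strip()]
    verdicts = [line for line in lines if line.startswith("Verdict: ")]
    gates = [line for line in lines if line.startswith("[wave-gate] ")]
    problems = []
    if len(verdicts) != 1:
        problems.append("expected exactly one Verdict line")
    if len(gates) != 1:
        problems.append("expected exactly one [wave-gate] marker")
    if problems:
        return problems
    verdict = verdicts[0].split(": ", 1)[1]
    if verdict not in VERDICTS:
        return [f"unknown verdict: {verdict}"]
    gate = parse_gate(gates[0])
    # PASS is only allowed when the gate marker is fully pass.
    if verdict == "PASS" and any(gate[name] != "pass" for name in GATE_FIELDS):
        problems.append("PASS verdict requires every gate field to be pass")
    return problems
```

A report that ends with `Verdict: PASS` but carries `docs=concerns` in its gate marker would be flagged by this check, which mirrors the updated PASS rule in the prompt.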
@@ -17,7 +17,7 @@ Your job is to keep shared plan and status docs aligned with the real landed imp
  Operating rules:
  - Anchor updates to docs/reference/repository-guidance.md.
  - Re-read the compiled shared summary, your inbox, and the generated wave board projection before major decisions, before validation, and before final output.
- - Coordinate with the evaluator and implementation agents, but do not use coordination as an excuse to defer obvious shared-plan updates.
+ - Coordinate with the cont-QA and implementation agents, but do not use coordination as an excuse to defer obvious shared-plan updates.
  - Keep subsystem-specific docs with the agents that land those deliverables.
 
  What you must do:
@@ -24,7 +24,7 @@ What you must do:
  - identify the exact infra surface you own for the wave
  - surface missing dependencies, identity gaps, admission blockers, and machine drift early
  - emit durable coordination records when the work depends on another agent or a human decision
- - leave enough exact evidence that the integration steward and evaluator can tell whether the infra surface is conformant, still in setup, or blocked
+ - leave enough exact evidence that the integration steward and cont-QA can tell whether the infra surface is conformant, still in setup, or blocked
  - emit structured infra markers whenever the task touches machine validation, workload identity, node admission, deployment bootstrap, or approved machine actions:
  `[infra-status] kind=<conformance|role-drift|dependency|identity|admission|action> target=<machine-or-surface> state=<checking|setup-required|setup-in-progress|conformant|drift|blocked|failed|action-required|action-approved|action-complete> detail=<short-note>`
 
@@ -1,6 +1,6 @@
  ---
  title: "Wave Integration Role"
- summary: "Standing prompt for the integration steward that reconciles cross-agent state before documentation and evaluator closure."
+ summary: "Standing prompt for the integration steward that reconciles cross-agent state after cont-EVAL and before documentation and cont-QA closure."
  ---
 
  # Wave Integration Role
@@ -12,7 +12,7 @@ Use this prompt when an agent should act as the integration steward for a wave.
  ```text
  You are the integration steward for the current wave.
 
- Your job is to synthesize cross-agent state before the documentation steward and evaluator make their final pass. You do not replace implementation ownership. You decide whether the wave is coherent enough for doc closure.
+ Your job is to synthesize cross-agent state after any `cont-EVAL` tuning pass and before the documentation steward and cont-QA make their final pass. You do not replace implementation ownership. You decide whether the wave is coherent enough for doc closure.
 
  Operating rules:
  - Re-read the generated wave inboxes and coordination board projection before major decisions.
@@ -28,5 +28,5 @@ What you must do:
  - emit one final structured marker:
  `[wave-integration] state=<ready-for-doc-closure|needs-more-work> claims=<n> conflicts=<n> blockers=<n> detail=<short-note>`
 
- Use `ready-for-doc-closure` only when the remaining work is documentation and evaluator closure, not when material implementation or integration risk still exists.
+ Use `ready-for-doc-closure` only when the remaining work is documentation and cont-QA closure, not when material implementation or integration risk still exists.
  ```
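
As an illustration of how a consumer might read the `[wave-integration]` marker, here is a small sketch. The `IntegrationMarker` class and both function names are hypothetical; only the marker layout and the two state values come from the prompt above.

```python
import re
from dataclasses import dataclass

# Field order follows the marker template in the standing prompt.
MARKER = re.compile(
    r"^\[wave-integration\] "
    r"state=(?P<state>ready-for-doc-closure|needs-more-work) "
    r"claims=(?P<claims>\d+) conflicts=(?P<conflicts>\d+) "
    r"blockers=(?P<blockers>\d+) detail=(?P<detail>.*)$"
)


@dataclass
class IntegrationMarker:
    state: str
    claims: int
    conflicts: int
    blockers: int
    detail: str


def parse_integration(line: str) -> IntegrationMarker:
    """Parse a `[wave-integration]` marker line, rejecting malformed input."""
    match = MARKER.match(line.strip())
    if match is None:
        raise ValueError("malformed [wave-integration] marker")
    return IntegrationMarker(
        state=match["state"],
        claims=int(match["claims"]),
        conflicts=int(match["conflicts"]),
        blockers=int(match["blockers"]),
        detail=match["detail"],
    )


def ready_for_doc_closure(marker: IntegrationMarker) -> bool:
    """Documentation and cont-QA closure should only start on this state."""
    return marker.state == "ready-for-doc-closure"
```

The sketch deliberately does not infer readiness from the counts; the prompt makes the `state` field the steward's explicit recommendation.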
@@ -12,7 +12,7 @@ Use this prompt when an agent or human operator should launch waves through the
  ```text
  You are the wave launcher operator.
 
- Your job is to run wave files safely, one wave at a time by default, while respecting launcher locks, runtime policy, clarification barriers, integration gates, documentation closure, and evaluator closure.
+ Your job is to run wave files safely, one wave at a time by default, while respecting launcher locks, runtime policy, clarification barriers, optional `cont-EVAL` gates, integration gates, documentation closure, and cont-QA closure.
 
  Before launching:
  1. Run `pnpm exec wave doctor`.
@@ -24,8 +24,9 @@ Before launching:
 
  Completion requires:
  - all agents exit `0`
- - integration must be `ready-for-doc-closure` before documentation and evaluator closure run
- - evaluator verdict is `PASS`
+ - if `cont-EVAL` is present, it must report satisfied targets before integration closure runs
+ - integration must be `ready-for-doc-closure` before documentation and cont-QA closure run
+ - cont-QA verdict is `PASS`
  - prompt hashes still match the current wave definitions
  - shared-plan documentation closure is resolved when required
  - no routed clarification chain or unresolved human escalation remains open
@@ -0,0 +1,40 @@
+ ---
+ title: "Wave Security Role"
+ summary: "Standing prompt for the security reviewer that performs a threat-model-first review before integration closure."
+ ---
+
+ # Wave Security Role
+
+ Use this prompt when an agent should act as the security reviewer for a wave.
+
+ ## Standing prompt
+
+ ```text
+ You are the wave security reviewer for the current wave.
+
+ Your job is to review the landed change set before integration closure, identify security-sensitive risks, and route exact fixes or approvals while the wave is still active. You are report-only by default. Do not replace implementation ownership.
+
+ Operating rules:
+ - Re-read the compiled shared summary, your inbox, the generated wave board projection, and the owned reports before major decisions.
+ - Do a threat-model pass before finalizing conclusions. Identify trust boundaries, attacker-controlled inputs, sensitive assets, approval-sensitive operations, and any external execution or data access paths touched by the wave.
+ - Prefer exact findings and exact requested fixes over vague warnings.
+ - Route fixes to the owning agent when the required change is outside your report path.
+ - Keep the final output short enough to drive relaunch decisions and closure gates.
+
+ What you must do:
+ - leave a security review report with these sections in order:
+ `Threat Model`
+ `Risky Surfaces`
+ `Findings`
+ `Required Approvals`
+ `Requested Fixes`
+ `Final Disposition`
+ - record each finding with severity, concrete file or surface, exploit or failure mode, and the owner expected to fix it
+ - record each approval-sensitive action explicitly, even if the wave can proceed without blocking
+ - emit one final structured marker:
+ `[wave-security] state=<clear|concerns|blocked> findings=<n> approvals=<n> detail=<short-note>`
+
+ Use `clear` only when no unresolved findings or approvals remain.
+ Use `concerns` when findings remain advisory for this wave and do not automatically block progression.
+ Use `blocked` only when the wave must stop before integration until a finding or approval is resolved.
+ ```
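
The two machine-checkable parts of this role, section order and the final marker, can be sketched as follows. This is an illustration under stated assumptions: the prompt does not say how sections are delimited, so the sketch assumes each one is a markdown heading line such as `## Findings`, and `sections_in_order` and `security_blocks_integration` are hypothetical names.

```python
import re

# Section names and their required order, as listed in the standing prompt.
SECTIONS = [
    "Threat Model",
    "Risky Surfaces",
    "Findings",
    "Required Approvals",
    "Requested Fixes",
    "Final Disposition",
]

MARKER = re.compile(
    r"^\[wave-security\] state=(clear|concerns|blocked) "
    r"findings=(\d+) approvals=(\d+) detail=(.*)$"
)


def sections_in_order(report: str) -> bool:
    """True when all six sections appear exactly once, in the required order.

    Assumption: sections are markdown heading lines (any heading level).
    """
    headings = [
        line.lstrip("#").strip()
        for line in report.splitlines()
        if line.startswith("#")
    ]
    found = [heading for heading in headings if heading in SECTIONS]
    return found == SECTIONS


def security_blocks_integration(marker_line: str) -> bool:
    """Only `blocked` stops the wave; `concerns` stays advisory."""
    match = MARKER.match(marker_line.strip())
    if match is None:
        raise ValueError("malformed [wave-security] marker")
    return match.group(1) == "blocked"
```

Treating `concerns` as non-blocking follows the prompt's wording that such findings "do not automatically block progression".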
@@ -44,7 +44,7 @@ Use skills when the guidance is reusable, repo-owned, and should survive across
  - environment-specific rules
  - Railway, Kubernetes, or GitHub release procedures
  - runtime-specific instructions for Codex, Claude, or OpenCode
- - role-oriented heuristics for implementation, deploy, evaluator, or research agents
+ - role-oriented heuristics for implementation, deploy, cont-QA, or research agents
 
  ## What Remains Authoritative
 
@@ -18,7 +18,7 @@ It is not just a prompt file. A wave is a bounded slice of repository work with:
  - Wave
  One numbered work package inside a lane, usually stored as `docs/plans/waves/wave-<n>.md`.
  - Agent
- One role inside the wave, such as implementation, integration, documentation, evaluator, infra, or deploy.
+ One role inside the wave, such as implementation, `cont-EVAL`, security review, integration, documentation, cont-QA, infra, or deploy.
  - Attempt
  One execution pass of a wave. A wave can have multiple attempts due to retries or fallback.
  - Closure
@@ -44,6 +44,7 @@ Wave markdown is the authored execution surface today. A typical wave can includ
  - reference rule
  - deploy environments
  - component promotions
+ - eval targets
  - Context7 defaults
  - one `## Agent ...` block per role
 
@@ -53,6 +54,11 @@ Inside each agent block, the important sections are:
  Standing role identity imported from `docs/agents/*.md`.
  - `### Executor`
  Runtime selection, profile, model, fallbacks, and budgets.
+ - `## Eval targets`
+ Optional wave-level contract for `cont-EVAL`, including benchmark family or pinned benchmarks, objective, and stop condition.
+ See [docs/evals/README.md](../evals/README.md) for guidance on delegated versus pinned targets and the coordination benchmark families.
+ - `### Proof artifacts`
+ Optional machine-visible local evidence required for proof-centric waves, especially `pilot-live` and above.
  - `### Context7`
  External library truth to prefetch and inject.
  - `### Skills`
@@ -70,16 +76,20 @@ Inside each agent block, the important sections are:
 
  ## Standard Roles
 
- The starter runtime expects three closure roles:
+ The starter runtime expects three standard closure roles plus up to two optional review specialists:
 
  - `A8`
  Integration steward
  - `A9`
  Documentation steward
  - `A0`
- Evaluator
+ cont-QA
+ - `E0`
+ Optional `cont-EVAL` for iterative benchmark or output tuning; report-only by default, implementation-owning only when explicitly assigned non-report files
+ - `A7`
+ Optional security reviewer; report-only by default and used to publish a threat-model-first security review before integration closure
 
- Implementation or specialist agents own the actual work slices. Closure roles do not replace implementation ownership; they decide whether the combined result is closure-ready.
+ Implementation or specialist agents own the actual work slices. Closure roles do not replace implementation ownership; they decide whether the combined result is closure-ready. `cont-EVAL` is the one hybrid role: most waves keep it report-only, but human-authored waves may assign explicit tuning files to `E0`, in which case it must satisfy both implementation proof and eval proof.
 
  ## Lifecycle Of A Wave
 
@@ -89,21 +99,60 @@ Implementation or specialist agents own the actual work slices. Closure roles do
  4. A live run launches implementation agents first when implementation work remains.
  5. Agents write structured coordination events instead of relying on ad hoc terminal output.
  6. The launcher checks implementation contracts, promoted-component proof, helper assignments, dependencies, and clarification state.
- 7. If implementation is ready, closure runs in order: integration, documentation, evaluator.
+ 7. If implementation is ready, closure runs in order: optional `cont-EVAL`, optional security review, integration, documentation, then cont-QA.
  8. The attempt is captured in per-wave traces, ledgers, inboxes, summaries, and copied artifacts.
 
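The updated closure order in step 7 can be pictured as a fixed sequence with two optional slots. The sketch below is illustrative only; the role labels and the `closure_sequence` function are hypothetical and do not reflect the launcher's internal names.

```python
# Closure order from step 7; the first two roles are optional.
CLOSURE_ORDER = ["cont-EVAL", "security", "integration", "documentation", "cont-QA"]
OPTIONAL_ROLES = {"cont-EVAL", "security"}


def closure_sequence(present: set) -> list:
    """Return the closure roles to run, in order, for the roles a wave declares."""
    missing = [
        role
        for role in CLOSURE_ORDER
        if role not in OPTIONAL_ROLES and role not in present
    ]
    if missing:
        raise ValueError(f"missing required closure roles: {missing}")
    # Optional roles run only when declared; relative order never changes.
    return [role for role in CLOSURE_ORDER if role in present]
```

For a wave that declares a security reviewer but no `cont-EVAL`, this yields security review, integration, documentation, then cont-QA.
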
+ ## Runtime And Operating Posture
+
+ Wave is runtime agnostic at the orchestration layer.
+
+ Planning, ownership, closure, durable state, and traces do not depend on whether an agent runs on Codex, Claude Code, OpenCode, or the local smoke executor. Runtime-specific behavior is isolated to executor adapters and overlays.
+
+ That means a wave should usually be authored in runtime-neutral terms:
+
+ - ownership and deliverables
+ - proof and validation
+ - closure order
+ - dependencies and helper flow
+ - promoted component expectations
+
+ The runtime choice resolves later, from the agent executor block, profile defaults, lane defaults, CLI overrides, and fallback policy.
+
+ Wave also has an execution posture:
+
+ - `oversight`
+ Human review or intervention is expected for risky or ambiguous work.
+ - `dark-factory`
+ The wave is authored for routine execution without normal human intervention.
+
+ Today these postures are planning vocabulary and saved project defaults, not two separate execution engines. Human feedback is still an escalation mechanism inside the orchestration loop, not the definition of the operating mode itself.
+
+ If you need the narrower supporting pages, see [runtime-agnostic-orchestration.md](./runtime-agnostic-orchestration.md) and [operating-modes.md](./operating-modes.md).
+
+ Current live waves are strict about closure artifacts:
+
+ - `cont-EVAL` must emit a structured `[wave-eval]` marker whose `target_ids` matches the declared eval targets and whose `benchmark_ids` enumerates the executed benchmark set.
+ - Security reviewers must leave a security review report and emit a final `[wave-security]` marker with `state=<clear|concerns|blocked>`, finding count, and approval count.
+ - `cont-QA` must emit both a final `Verdict:` line and a final `[wave-gate]` marker.
+ - Replay keeps read-only compatibility with older traces and older evaluator-era artifacts, but live waves do not pass on verdict-only or underspecified closure markers.
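
The `[wave-eval]` requirement above can be sketched as a small check. The full marker layout is not shown in this diff, so the sketch assumes `key=value` tokens with comma-separated id lists; that layout, the ids in the usage note, and the helper names `parse_ids` and `eval_marker_ok` are all assumptions for illustration.

```python
def parse_ids(marker: str, key: str) -> list:
    """Extract a comma-separated id list from a `key=a,b,c` token (assumed layout)."""
    for token in marker.split():
        if token.startswith(key + "="):
            value = token[len(key) + 1:]
            return [item for item in value.split(",") if item]
    return []


def eval_marker_ok(marker: str, declared_targets: list) -> bool:
    """True when target_ids match the declared targets and benchmarks are listed."""
    if not marker.startswith("[wave-eval]"):
        return False
    targets = parse_ids(marker, "target_ids")
    benchmarks = parse_ids(marker, "benchmark_ids")
    # Order is ignored; the set of targets must match exactly.
    return sorted(targets) == sorted(declared_targets) and len(benchmarks) > 0
```

With hypothetical ids, `eval_marker_ok("[wave-eval] target_ids=t1,t2 benchmark_ids=b1", ["t2", "t1"])` returns `True`, while a marker that omits `benchmark_ids` is rejected as underspecified.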
+
  ## What Makes A Wave "Done"
 
  A wave is not done because an agent said so. It is done only when the runtime surfaces agree:
 
  - implementation exit contracts pass
  - required deliverables exist and stay within ownership boundaries
+ - required proof artifacts exist when the wave declares proof-first live evidence
  - required component proof and promotions pass
  - helper assignments are resolved
  - required dependency tickets are resolved
  - clarification follow-ups or escalations are resolved
+ - if present, `cont-EVAL` satisfies its declared eval targets
+ - if present, the security reviewer publishes a report plus a final `[wave-security]` marker; `blocked` stops closure while `concerns` stays advisory
  - integration recommends closure
- - documentation and evaluator closure pass
+ - documentation and cont-QA closure pass
+
+ For proof-first live-wave examples, see [docs/reference/live-proof-waves.md](../reference/live-proof-waves.md).
 
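The "done" conditions above amount to a conjunction of required checks plus two optional ones. A minimal sketch of that shape follows; `WaveStatus`, its field names, and `blockers` are hypothetical and only mirror the bullet list, not the runtime's actual data model.

```python
from dataclasses import dataclass
from typing import Optional

REQUIRED_CHECKS = (
    "exit_contracts_pass",
    "deliverables_ok",
    "proof_artifacts_ok",
    "component_promotions_pass",
    "helper_assignments_resolved",
    "dependency_tickets_resolved",
    "clarifications_resolved",
    "integration_recommends_closure",
    "documentation_closed",
    "cont_qa_pass",
)


@dataclass
class WaveStatus:
    exit_contracts_pass: bool
    deliverables_ok: bool
    proof_artifacts_ok: bool  # True when the wave declares no proof artifacts
    component_promotions_pass: bool
    helper_assignments_resolved: bool
    dependency_tickets_resolved: bool
    clarifications_resolved: bool
    integration_recommends_closure: bool
    documentation_closed: bool
    cont_qa_pass: bool
    cont_eval_satisfied: Optional[bool] = None  # None: wave has no cont-EVAL
    security_state: Optional[str] = None  # None: wave has no security reviewer


def blockers(status: WaveStatus) -> list:
    """Return the names of unmet conditions; an empty list means the wave is done."""
    unmet = [name for name in REQUIRED_CHECKS if not getattr(status, name)]
    if status.cont_eval_satisfied is False:
        unmet.append("cont_eval_satisfied")
    # Only `blocked` stops closure; `concerns` stays advisory.
    if status.security_state == "blocked":
        unmet.append("security_state")
    return unmet
```

Returning the list of unmet conditions, rather than a bare boolean, keeps the result usable as the "smallest blocking set" that a closure report would cite.
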
  ## Where The State Lives
 
@@ -115,6 +164,7 @@ The wave file is only part of the story. The runtime writes durable state under
  - rendered message boards
  - compiled inboxes
  - ledger and docs queue
+ - security summaries
  - integration summaries
  - dependency snapshots
  - executor overlays