@chllming/wave-orchestration 0.6.3 → 0.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (118)
  1. package/CHANGELOG.md +82 -1
  2. package/README.md +40 -7
  3. package/docs/agents/wave-orchestrator-role.md +50 -0
  4. package/docs/agents/wave-planner-role.md +39 -0
  5. package/docs/context7/bundles.json +9 -0
  6. package/docs/context7/planner-agent/README.md +25 -0
  7. package/docs/context7/planner-agent/manifest.json +83 -0
  8. package/docs/context7/planner-agent/papers/cooperbench-why-coding-agents-cannot-be-your-teammates-yet.md +3283 -0
  9. package/docs/context7/planner-agent/papers/dova-deliberation-first-multi-agent-orchestration-for-autonomous-research-automation.md +1699 -0
  10. package/docs/context7/planner-agent/papers/dpbench-large-language-models-struggle-with-simultaneous-coordination.md +2251 -0
  11. package/docs/context7/planner-agent/papers/incremental-planning-to-control-a-blackboard-based-problem-solver.md +1729 -0
  12. package/docs/context7/planner-agent/papers/silo-bench-a-scalable-environment-for-evaluating-distributed-coordination-in-multi-agent-llm-systems.md +3747 -0
  13. package/docs/context7/planner-agent/papers/todoevolve-learning-to-architect-agent-planning-systems.md +1675 -0
  14. package/docs/context7/planner-agent/papers/verified-multi-agent-orchestration-a-plan-execute-verify-replan-framework-for-complex-query-resolution.md +1173 -0
  15. package/docs/context7/planner-agent/papers/why-do-multi-agent-llm-systems-fail.md +5211 -0
  16. package/docs/context7/planner-agent/topics/planning-and-orchestration.md +24 -0
  17. package/docs/evals/README.md +96 -1
  18. package/docs/evals/arm-templates/README.md +13 -0
  19. package/docs/evals/arm-templates/full-wave.json +15 -0
  20. package/docs/evals/arm-templates/single-agent.json +15 -0
  21. package/docs/evals/benchmark-catalog.json +7 -0
  22. package/docs/evals/cases/README.md +47 -0
  23. package/docs/evals/cases/wave-blackboard-inbox-targeting.json +73 -0
  24. package/docs/evals/cases/wave-contradiction-conflict.json +104 -0
  25. package/docs/evals/cases/wave-expert-routing-preservation.json +69 -0
  26. package/docs/evals/cases/wave-hidden-profile-private-evidence.json +81 -0
  27. package/docs/evals/cases/wave-premature-closure-guard.json +71 -0
  28. package/docs/evals/cases/wave-silo-cross-agent-state.json +77 -0
  29. package/docs/evals/cases/wave-simultaneous-lockstep.json +92 -0
  30. package/docs/evals/cooperbench/real-world-mitigation.md +341 -0
  31. package/docs/evals/external-benchmarks.json +85 -0
  32. package/docs/evals/external-command-config.sample.json +9 -0
  33. package/docs/evals/external-command-config.swe-bench-pro.json +8 -0
  34. package/docs/evals/pilots/README.md +47 -0
  35. package/docs/evals/pilots/swe-bench-pro-public-full-wave-review-10.json +64 -0
  36. package/docs/evals/pilots/swe-bench-pro-public-pilot.json +111 -0
  37. package/docs/evals/wave-benchmark-program.md +302 -0
  38. package/docs/guides/planner.md +67 -11
  39. package/docs/guides/terminal-surfaces.md +12 -0
  40. package/docs/plans/context7-wave-orchestrator.md +20 -0
  41. package/docs/plans/current-state.md +8 -1
  42. package/docs/plans/examples/wave-benchmark-improvement.md +108 -0
  43. package/docs/plans/examples/wave-example-live-proof.md +1 -1
  44. package/docs/plans/examples/wave-example-rollout-fidelity.md +340 -0
  45. package/docs/plans/migration.md +26 -0
  46. package/docs/plans/wave-orchestrator.md +60 -12
  47. package/docs/plans/waves/reviews/wave-1-benchmark-operator.md +118 -0
  48. package/docs/reference/cli-reference.md +547 -0
  49. package/docs/reference/coordination-and-closure.md +436 -0
  50. package/docs/reference/live-proof-waves.md +25 -3
  51. package/docs/reference/npmjs-trusted-publishing.md +3 -3
  52. package/docs/reference/proof-metrics.md +90 -0
  53. package/docs/reference/runtime-config/README.md +63 -2
  54. package/docs/reference/runtime-config/codex.md +2 -1
  55. package/docs/reference/sample-waves.md +29 -18
  56. package/docs/reference/wave-control.md +164 -0
  57. package/docs/reference/wave-planning-lessons.md +131 -0
  58. package/package.json +5 -4
  59. package/releases/manifest.json +40 -0
  60. package/scripts/research/agent-context-archive.mjs +18 -0
  61. package/scripts/research/manifests/agent-context-expanded-2026-03-22.mjs +17 -0
  62. package/scripts/research/sync-planner-context7-bundle.mjs +133 -0
  63. package/scripts/wave-orchestrator/agent-state.mjs +11 -2
  64. package/scripts/wave-orchestrator/artifact-schemas.mjs +232 -0
  65. package/scripts/wave-orchestrator/autonomous.mjs +7 -0
  66. package/scripts/wave-orchestrator/benchmark-cases.mjs +374 -0
  67. package/scripts/wave-orchestrator/benchmark-external.mjs +1384 -0
  68. package/scripts/wave-orchestrator/benchmark.mjs +972 -0
  69. package/scripts/wave-orchestrator/clarification-triage.mjs +78 -12
  70. package/scripts/wave-orchestrator/config.mjs +175 -0
  71. package/scripts/wave-orchestrator/control-cli.mjs +1216 -0
  72. package/scripts/wave-orchestrator/control-plane.mjs +697 -0
  73. package/scripts/wave-orchestrator/coord-cli.mjs +360 -2
  74. package/scripts/wave-orchestrator/coordination-store.mjs +211 -9
  75. package/scripts/wave-orchestrator/coordination.mjs +84 -0
  76. package/scripts/wave-orchestrator/dashboard-renderer.mjs +120 -5
  77. package/scripts/wave-orchestrator/dashboard-state.mjs +22 -0
  78. package/scripts/wave-orchestrator/evals.mjs +23 -0
  79. package/scripts/wave-orchestrator/executors.mjs +3 -2
  80. package/scripts/wave-orchestrator/feedback.mjs +55 -0
  81. package/scripts/wave-orchestrator/install.mjs +151 -2
  82. package/scripts/wave-orchestrator/launcher-closure.mjs +4 -1
  83. package/scripts/wave-orchestrator/launcher-runtime.mjs +33 -30
  84. package/scripts/wave-orchestrator/launcher.mjs +884 -36
  85. package/scripts/wave-orchestrator/planner-context.mjs +75 -0
  86. package/scripts/wave-orchestrator/planner.mjs +2270 -136
  87. package/scripts/wave-orchestrator/proof-cli.mjs +195 -0
  88. package/scripts/wave-orchestrator/proof-registry.mjs +317 -0
  89. package/scripts/wave-orchestrator/replay.mjs +10 -4
  90. package/scripts/wave-orchestrator/retry-cli.mjs +184 -0
  91. package/scripts/wave-orchestrator/retry-control.mjs +225 -0
  92. package/scripts/wave-orchestrator/shared.mjs +26 -0
  93. package/scripts/wave-orchestrator/swe-bench-pro-task.mjs +1004 -0
  94. package/scripts/wave-orchestrator/terminals.mjs +1 -1
  95. package/scripts/wave-orchestrator/traces.mjs +157 -2
  96. package/scripts/wave-orchestrator/wave-control-client.mjs +532 -0
  97. package/scripts/wave-orchestrator/wave-control-schema.mjs +309 -0
  98. package/scripts/wave-orchestrator/wave-files.mjs +144 -23
  99. package/scripts/wave.mjs +27 -0
  100. package/skills/repo-coding-rules/SKILL.md +1 -0
  101. package/skills/role-cont-eval/SKILL.md +1 -0
  102. package/skills/role-cont-qa/SKILL.md +13 -6
  103. package/skills/role-deploy/SKILL.md +1 -0
  104. package/skills/role-documentation/SKILL.md +4 -0
  105. package/skills/role-implementation/SKILL.md +4 -0
  106. package/skills/role-infra/SKILL.md +2 -1
  107. package/skills/role-integration/SKILL.md +15 -8
  108. package/skills/role-planner/SKILL.md +39 -0
  109. package/skills/role-planner/skill.json +21 -0
  110. package/skills/role-research/SKILL.md +1 -0
  111. package/skills/role-security/SKILL.md +2 -2
  112. package/skills/runtime-claude/SKILL.md +2 -1
  113. package/skills/runtime-codex/SKILL.md +1 -0
  114. package/skills/runtime-local/SKILL.md +2 -0
  115. package/skills/runtime-opencode/SKILL.md +1 -0
  116. package/skills/wave-core/SKILL.md +25 -6
  117. package/skills/wave-core/references/marker-syntax.md +16 -8
  118. package/wave.config.json +45 -0
package/docs/reference/coordination-and-closure.md
@@ -0,0 +1,436 @@
+ ---
+ title: "Coordination And Closure"
+ summary: "How agent-to-agent work, deliverables, integration, and final closure behave end to end in the Wave runtime."
+ ---
+
+ # Coordination And Closure
+
+ This page explains the runtime model behind Wave coordination, helper work, integration, and final closure.
+
+ The short version is:
+
+ - `exit 0` means an agent process finished
+ - it does not mean the wave is ready to close
+ - closure is based on durable coordination state plus the staged closure gates
+
+ ## Core Model
+
+ Wave distinguishes three different things:
+
+ 1. an agent finishing its own owned work
+ 2. an agent asking another agent or lane for follow-up work
+ 3. the wave being globally coherent enough to pass integration, documentation, and cont-QA closure
+
+ Those are related, but they are not the same.
+
+ An implementation agent can be locally complete and still leave the wave blocked if it created open helper work, an unresolved clarification chain, or a required dependency that is still open.
+
+ ## Durable State Surfaces
+
+ The runtime writes several artifacts, and each one does a different job:
+
+ - canonical coordination log:
+   `.tmp/<lane>-wave-launcher/coordination/wave-<n>.jsonl`
+ - helper-assignment snapshot:
+   `.tmp/<lane>-wave-launcher/assignments/wave-<n>.json`
+ - dependency snapshot:
+   `.tmp/<lane>-wave-launcher/dependencies/wave-<n>.json`
+ - shared summary:
+   `.tmp/<lane>-wave-launcher/inboxes/wave-<n>/shared-summary.md`
+ - per-agent inboxes:
+   `.tmp/<lane>-wave-launcher/inboxes/wave-<n>/<agent>.md`
+ - integration summary:
+   `.tmp/<lane>-wave-launcher/integration/wave-<n>.json`
+ - wave dashboard:
+   `.tmp/<lane>-wave-launcher/dashboards/wave-<n>.json`
+ - run-state:
+   `.tmp/<lane>-wave-launcher/run-state.json`
+
+ The important rule is that the JSONL coordination log is the scheduler's source of truth. The markdown board is a projection for humans. See [wave-orchestrator.md](../plans/wave-orchestrator.md).
+
+ Live waves now keep refreshing that derived state while agents are still running. Shared summaries, inboxes, dashboard coordination metrics, and clarification routing are recomputed not only at attempt boundaries but also during active wave execution, so stale clarification and acknowledgement timing becomes machine-visible before the attempt ends.
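To make the "log is truth, board is projection" rule concrete, here is a minimal TypeScript sketch. It assumes each JSONL line is a JSON object with `id`, `kind`, `status`, and `summary` fields, and that a later line for the same `id` supersedes an earlier one; both are illustrative assumptions, not the documented Wave schema, and `foldCoordinationLog` and `renderBoard` are hypothetical names.

```typescript
// Hypothetical record shape: the real coordination schema may use other or extra fields.
interface CoordinationEntry {
  id: string;
  kind: string;
  status: string;
  summary: string;
}

// Fold JSONL text into the latest state per record id, in first-seen order.
// Assumption: a later line with the same id supersedes the earlier one.
function foldCoordinationLog(jsonl: string): CoordinationEntry[] {
  const latest: { [id: string]: CoordinationEntry } = {};
  const order: string[] = [];
  const lines = jsonl.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue; // tolerate blank lines and a trailing newline
    const entry = JSON.parse(line) as CoordinationEntry;
    if (!Object.prototype.hasOwnProperty.call(latest, entry.id)) order.push(entry.id);
    latest[entry.id] = entry;
  }
  return order.map((id) => latest[id]);
}

// Render the human-facing board purely from folded log state, so it can be
// regenerated at any time and never becomes a second source of truth.
function renderBoard(entries: CoordinationEntry[]): string {
  const rows = entries.map((e) => `- [${e.status}] ${e.kind} ${e.id}: ${e.summary}`);
  return ["# Coordination board", ""].concat(rows).join("\n");
}
```

Because the board is derived, refreshing it during a live attempt is just re-running both functions over the current log.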
+
+ ## What Agents Should Use
+
+ Use the coordination log for durable state:
+
+ - `request`
+   Use this when you need another agent or capability owner to do work. Target it explicitly. This is the kind that becomes a helper assignment.
+ - `blocker`
+   Use this when the wave is blocked, but not because the launcher needs to route work to a specific assignee.
+ - `handoff`
+   Use this for continuity and context transfer. On its own it is informational; it is not a blocking helper assignment.
+ - `evidence`
+   Use this for durable facts, artifacts, or proof that another agent may need.
+ - `claim`
+   Use this for assertions that integration should reconcile.
+ - `clarification-request`
+   Use this when an ambiguity must be triaged before work can safely continue.
+
+ ## What Stewards And Orchestrators May Also Use
+
+ - `ack`
+   Acknowledge receipt of a request or clarification. This resets the acknowledgement timer.
+ - `decision`
+   Record a binding decision that downstream agents should follow.
+ - `orchestrator-guidance`
+   Non-binding guidance from the resident orchestrator.
+
+ Implementation agents normally do not need these kinds.
+
+ Practical rule:
+
+ - if you need another agent to take action and you want the wave to stay blocked until it is done, use a targeted `request`
+ - a plain board note or plain `handoff` is not enough
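The guidance above amounts to a small decision procedure. The TypeScript sketch below encodes it under stated assumptions: `PostIntent` and `chooseRecordKind` are hypothetical names, and the precedence order (blocking kinds checked before informational ones) is this sketch's reading of the page, not a documented runtime rule.

```typescript
// Hypothetical description of what an agent wants to record.
interface PostIntent {
  ambiguityBlocksWork: boolean;  // an ambiguity must be triaged before work continues
  needsActionFromOther: boolean; // another agent or capability owner must do work
  waveBlocked: boolean;          // the wave is blocked, but no specific assignee is needed
  isAssertion: boolean;          // a claim that integration should reconcile
  isDurableFact: boolean;        // a fact, artifact, or proof another agent may need
}

type AgentRecordKind = "clarification-request" | "request" | "blocker" | "claim" | "evidence" | "handoff";

// Precedence is an assumption: anything that must keep the wave blocked
// is checked before the informational kinds.
function chooseRecordKind(intent: PostIntent): AgentRecordKind {
  if (intent.ambiguityBlocksWork) return "clarification-request";
  if (intent.needsActionFromOther) return "request"; // target it explicitly
  if (intent.waveBlocked) return "blocker";
  if (intent.isAssertion) return "claim";
  if (intent.isDurableFact) return "evidence";
  return "handoff"; // context transfer only; does not block by itself
}
```

Note that `handoff` is the fallback: if nothing needs to block, nothing should.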
+
+ ## Open Versus Resolved
+
+ Wave treats these coordination statuses as open:
+
+ - `open`
+ - `acknowledged`
+ - `in_progress`
+
+ It treats these as resolved, and therefore non-blocking:
+
+ - `resolved`
+ - `closed`
+ - `superseded`
+ - `cancelled`
+
+ That means a targeted helper request keeps blocking until its status leaves the open set in coordination state.
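The open/resolved split fits in a few lines of TypeScript. This is a minimal sketch: `keepsBlocking` is a hypothetical helper, and rejecting statuses that appear in neither list is an assumption, since this page does not say how the runtime treats unknown values.

```typescript
// Status names come from the two lists above.
const OPEN_COORDINATION_STATUSES = ["open", "acknowledged", "in_progress"];
const RESOLVED_COORDINATION_STATUSES = ["resolved", "closed", "superseded", "cancelled"];

// True while a status keeps a record blocking. Unknown statuses are rejected
// rather than guessed at; that strictness is this sketch's own choice.
function keepsBlocking(status: string): boolean {
  if (OPEN_COORDINATION_STATUSES.indexOf(status) !== -1) return true;
  if (RESOLVED_COORDINATION_STATUSES.indexOf(status) !== -1) return false;
  throw new Error(`unknown coordination status: ${status}`);
}
```

An `acknowledged` request still blocks: acknowledging work only says someone has picked it up, not that it is done.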
+
+ This page documents runtime semantics first. The contract that matters is that closure follows durable coordination state, regardless of which command path a human or agent used to mutate it.
+
+ ## Deliverables Versus Helper Work
+
+ Deliverables prove that an agent landed its own owned outputs.
+
+ For implementation agents with an exit contract, closure validates:
+
+ - `[wave-proof]`
+ - `[wave-doc-delta]`
+ - any required `[wave-component]` markers
+ - declared `### Deliverables`
+ - declared `### Proof artifacts`
+
+ Deliverables and proof artifacts are local ownership proof. They do not replace cross-agent follow-up.
+
+ That distinction matters:
+
+ - if Agent A1 owns `src/foo.ts` and `docs/reviews/foo.md`, those should be modeled as A1 deliverables
+ - if A1 needs Agent A8 to reconcile a cross-component interface or integration contradiction, that is not an A1 deliverable
+ - that second case is coordination work, and it should become a targeted request
+
+ ## End-To-End Example: Agent A1 Needs A8
+
+ Assume:
+
+ - A1 owns implementation files and its review output
+ - A8 is the integration steward
+ - A1 finishes its code and report, but notices an interface contradiction that only A8 can reconcile
+
+ ### Step 1: A1 Lands Its Owned Work
+
+ A1 can still satisfy its own slice by:
+
+ - writing its owned files
+ - emitting a valid `[wave-proof]`
+ - emitting a valid `[wave-doc-delta]`
+ - satisfying any declared deliverables and proof artifacts
+
+ At this point A1 can be locally done.
+
+ ### Step 2: A1 Raises A Durable Request
+
+ Example:
+
+ ```bash
+ pnpm exec wave control task create \
+   --lane main \
+   --wave 4 \
+   --agent A1 \
+   --kind request \
+   --summary "Need integration decision for auth/session interface change" \
+   --detail "A1 landed the auth refactor, but session ownership now spans auth, gateway, and docs surfaces. A8 must reconcile the final contract and closure path." \
+   --target agent:A8 \
+   --priority high
+ ```
+
+ What happens next:
+
+ - the request lands in the canonical coordination log
+ - the launcher derives a helper assignment for `agent:A8`
+ - that assignment is written into the assignment snapshot
+ - the shared summary and A8 inbox now show the open helper work
+
+ `wave control task list` and `wave control task get` surface both blocking and informational coordination kinds. `wave control status` turns only `request`, `blocker`, `clarification-request`, `human-feedback`, and `human-escalation` records into blocking task edges; plain `handoff`, `evidence`, `claim`, and `decision` records stay visible without falsely blocking the owner. When a launcher attempt is already running, `status` scopes the top-level blocking edge to that active attempt, so stale relaunch metadata or unrelated closure tasks cannot dominate the wave-level view.
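The split between "visible" and "blocking" can be sketched as a filter. In this TypeScript sketch the kind names come from the paragraph above, while the `CoordinationTask` shape (including `attemptId`) and the `blockingEdges` helper are hypothetical; narrowing every edge to the active attempt is a simplification of how `status` scopes its top-level edge.

```typescript
// Kind names come from the paragraph above; the task shape is hypothetical.
const BLOCKING_TASK_KINDS = ["request", "blocker", "clarification-request", "human-feedback", "human-escalation"];
const OPEN_TASK_STATUSES = ["open", "acknowledged", "in_progress"];

interface CoordinationTask {
  id: string;
  kind: string;
  status: string;
  attemptId: string | null; // attempt that produced the task, if known
}

// Every task stays listable; only open tasks of a blocking kind become edges.
// With a running attempt, edges are narrowed to that attempt.
function blockingEdges(tasks: CoordinationTask[], activeAttemptId: string | null): CoordinationTask[] {
  return tasks.filter((t) => {
    if (BLOCKING_TASK_KINDS.indexOf(t.kind) === -1) return false; // informational kinds never block
    if (OPEN_TASK_STATUSES.indexOf(t.status) === -1) return false; // resolved work no longer blocks
    return activeAttemptId === null || t.attemptId === activeAttemptId;
  });
}
```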
+
+ ### Step 3: Why A1 Can Be Done But The Wave Is Still Blocked
+
+ This is the important distinction:
+
+ - A1 may be done with the work it owns
+ - the wave is not done
+
+ The launcher will still see:
+
+ - an open helper assignment for the request
+ - an integration summary that is not yet ready for doc closure
+
+ So the wave remains blocked.
+
+ In runtime terms, the blocking reason becomes:
+
+ - `helper-assignment-open` if the request has an assignee
+ - `helper-assignment-unresolved` if no assignee could be found
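The two reason codes differ only in whether some target resolves to an assignee. The TypeScript sketch below assumes targets use the `agent:A8` and `capability:integration` forms shown on this page; the capability-owner map is a hypothetical stand-in for the launcher's routing context, and both function names are illustrative.

```typescript
// The capability-owner map is a hypothetical stand-in for the launcher's routing context.
type HelperAssignmentReason = "helper-assignment-open" | "helper-assignment-unresolved";

function resolveAssignee(target: string, agents: string[], capabilityOwners: { [capability: string]: string }): string | null {
  const sep = target.indexOf(":");
  if (sep === -1) return null;
  const scheme = target.slice(0, sep);
  const value = target.slice(sep + 1);
  if (scheme === "agent") return agents.indexOf(value) !== -1 ? value : null;
  if (scheme === "capability" && Object.prototype.hasOwnProperty.call(capabilityOwners, value)) {
    return capabilityOwners[value];
  }
  return null;
}

// An open request blocks either way; the reason only says whether anyone can pick it up.
function helperAssignmentReason(targets: string[], agents: string[], capabilityOwners: { [capability: string]: string }): HelperAssignmentReason {
  for (let i = 0; i < targets.length; i++) {
    if (resolveAssignee(targets[i], agents, capabilityOwners) !== null) return "helper-assignment-open";
  }
  return "helper-assignment-unresolved";
}
```

Both outcomes keep the wave blocked; `unresolved` is the louder signal, because no one is currently positioned to close it.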
+
+ ### Step 4: A8 Resolves The Follow-Up
+
+ A8 reads the shared summary and inbox, reconciles the issue, and updates the integration state.
+
+ That usually means:
+
+ - closing the targeted follow-up in coordination state
+ - publishing a final integration position
+ - emitting a final `[wave-integration] state=ready-for-doc-closure ...` marker only when no meaningful contradiction or blocker remains
+
+ ### Step 5: Closure Can Continue
+
+ Only after that does the launcher allow the wave to move on to:
+
+ 1. documentation closure
+ 2. cont-QA closure
+
+ So the correct mental model is:
+
+ - A1 can finish first
+ - A8 may still owe wave-level closure work
+ - the wave does not pass just because the original implementation owner exited successfully
+
+ ## End-To-End Example: Clarification Chain
+
+ Assume an agent cannot safely choose between two interpretations of a migration rule.
+
+ The agent should emit a clarification request:
+
+ ```bash
+ pnpm exec wave coord post \
+   --lane main \
+   --wave 6 \
+   --agent A3 \
+   --kind clarification-request \
+   --summary "Need policy answer for backward-compat migration path" \
+   --detail "I checked the current-state doc and migration plan, but the required compatibility window is still ambiguous."
+ ```
+
+ What happens next:
+
+ 1. the launcher triages the clarification from repo policy, ownership, prior decisions, and routing context
+ 2. if it can answer inside the wave, it writes the resolution back into coordination state
+ 3. if another owner can answer it, the launcher opens a targeted follow-up request and keeps the clarification chain blocking
+ 4. only after policy and routed follow-up paths are exhausted does it create human feedback or escalation artifacts
+ 5. until that chain is resolved, the clarification remains a closure barrier, and any routed follow-up also remains blocking helper work
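The triage order above can be sketched as a single pass in TypeScript. The two lookups are hypothetical callbacks standing in for the launcher's policy, ownership, prior-decision, and routing sources, and the outcome names are illustrative. This single-pass version does not model a routed owner later failing to answer, which in Wave would eventually lead to escalation.

```typescript
// Outcome names are illustrative; the lookups are hypothetical callbacks.
type TriageOutcome =
  | { kind: "resolved-in-wave"; answer: string }
  | { kind: "routed"; owner: string } // opens a targeted follow-up request
  | { kind: "human-escalation" };     // only after the other paths are exhausted

interface TriageSources {
  answerFromPolicy(question: string): string | null;
  findOwner(question: string): string | null;
}

function triageClarification(question: string, sources: TriageSources): TriageOutcome {
  const answer = sources.answerFromPolicy(question);
  if (answer !== null) return { kind: "resolved-in-wave", answer: answer };
  const owner = sources.findOwner(question);
  if (owner !== null) return { kind: "routed", owner: owner };
  return { kind: "human-escalation" };
}

// Every outcome except an in-wave answer leaves the chain blocking closure.
function chainStillBlocking(outcome: TriageOutcome): boolean {
  return outcome.kind !== "resolved-in-wave";
}
```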
+
+ Important implications:
+
+ - even if code has landed, an open clarification chain can still block the wave
+ - a routed clarification that stays `open` past the acknowledgement policy can be rerouted during the same live attempt instead of waiting for a full retry cycle
+ - operators can now inspect and intervene through one command surface:
+
+ ```bash
+ pnpm exec wave control status --lane main --wave 10 --agent A7 --json
+ pnpm exec wave control task act reassign --lane main --wave 10 --id clarify-a7-rollout --to A1
+ pnpm exec wave control task act resolve --lane main --wave 10 --id escalation-clarify-a7-rollout --detail "Published command surface covers this question."
+ ```
+
+ That keeps clarification routing, dismissal, escalation, and human-answer handling inside the canonical coordination state instead of forcing ad hoc file edits.
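The in-attempt reroute rule can be sketched as a timer check. In this TypeScript sketch the window length and field names are hypothetical, since the real acknowledgement policy is configured elsewhere; the sketch only encodes two facts from this page: a routed clarification that stays `open` too long can be rerouted, and an `ack` resets the acknowledgement timer.

```typescript
// Window length and field names are hypothetical.
interface RoutedClarification {
  id: string;
  status: string;             // still "open" means nobody has picked it up
  routedAtMs: number;         // when the follow-up was routed
  lastAckAtMs: number | null; // an `ack` resets the acknowledgement timer
}

function shouldReroute(c: RoutedClarification, nowMs: number, ackWindowMs: number): boolean {
  if (c.status !== "open") return false; // acknowledged or in-progress work is left alone
  const timerStart = c.lastAckAtMs === null ? c.routedAtMs : Math.max(c.routedAtMs, c.lastAckAtMs);
  return nowMs - timerStart > ackWindowMs;
}
```

Checking this on every live refresh, rather than only at attempt boundaries, is what lets a stale route be corrected inside the same attempt.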
+
+ ## End-To-End Example: Required Dependency
+
+ Assume the wave needs another lane to land a required API first.
+
+ That should be modeled as a required dependency ticket, not as a local deliverable.
+
+ Example:
+
+ ```bash
+ pnpm exec wave dep post \
+   --owner-lane release \
+   --requester-lane main \
+   --owner-wave 2 \
+   --requester-wave 4 \
+   --agent launcher \
+   --summary "Need release lane to publish session token contract before Wave 4 can close" \
+   --target capability:integration \
+   --required
+ ```
+
+ What happens next:
+
+ - the dependency appears in the per-wave dependency snapshot
+ - integration and inboxes surface it
+ - required inbound or outbound dependencies keep the wave blocked
+
+ This is separate from helper assignment logic:
+
+ - helper assignments are intra-wave follow-up work
+ - dependencies are cross-wave or cross-lane prerequisites
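A dependency check looks at lane and wave coordinates rather than agent targets, which is what separates it from helper assignments. In this TypeScript sketch the fields mirror the `wave dep post` flags above, but the record shape, the status values, and the reading of "inbound or outbound" (the ticket blocks both the requesting wave and the owning wave) are assumptions.

```typescript
// Fields mirror the `wave dep post` flags above; shape and statuses are assumptions.
interface DependencyTicket {
  ownerLane: string;
  ownerWave: number;
  requesterLane: string;
  requesterWave: number;
  required: boolean;
  status: string;
}

const OPEN_DEPENDENCY_STATUSES = ["open", "acknowledged", "in_progress"];

// A required, still-open ticket blocks both the wave that asked for it
// and the wave that owes it.
function blockingDependencies(deps: DependencyTicket[], lane: string, wave: number): DependencyTicket[] {
  return deps.filter((d) => {
    const isRequester = d.requesterLane === lane && d.requesterWave === wave;
    const isOwner = d.ownerLane === lane && d.ownerWave === wave;
    return (isRequester || isOwner) && d.required && OPEN_DEPENDENCY_STATUSES.indexOf(d.status) !== -1;
  });
}
```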
+
+ ## What Integration Actually Does
+
+ Integration is not a generic summary pass. It is the place where Wave asks:
+
+ - are there still unresolved blockers?
+ - do any agent claims contradict each other?
+ - are there still proof gaps?
+ - are there still deploy or infra risks?
+ - are there still documentation gaps?
+ - are helper assignments or dependencies still open?
+
+ If any of those remain material, the recommendation is `needs-more-work`.
+
+ Only when that synthesized state is clean does integration become `ready-for-doc-closure`.
+
+ This is why integration sits between raw implementation success and final documentation and cont-QA closure.
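The six questions reduce to a single recommendation. In this TypeScript sketch the field names and the count-based representation are hypothetical, and deciding what makes an item "material" is left to whoever fills in the counts.

```typescript
// One count per question above; how counts are gathered is hypothetical.
interface IntegrationFindings {
  unresolvedBlockers: number;
  contradictions: number;
  proofGaps: number;
  deployOrInfraRisks: number;
  docGaps: number;
  openHelperAssignmentsOrDeps: number;
}

type IntegrationRecommendation = "needs-more-work" | "ready-for-doc-closure";

// Any remaining material item forces needs-more-work; only a clean state passes.
function recommendIntegration(f: IntegrationFindings): IntegrationRecommendation {
  const anyRemaining = [
    f.unresolvedBlockers,
    f.contradictions,
    f.proofGaps,
    f.deployOrInfraRisks,
    f.docGaps,
    f.openHelperAssignmentsOrDeps,
  ].some((n) => n > 0);
  return anyRemaining ? "needs-more-work" : "ready-for-doc-closure";
}
```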
+
+ ## Why Closure Is Staged
+
+ Closure runs in order:
+
+ 1. `cont-EVAL`
+ 2. optional security review
+ 3. integration
+ 4. documentation
+ 5. `cont-QA`
+
+ That ordering exists to prevent false PASS outcomes.
+
+ Examples:
+
+ - `cont-EVAL` should not PASS if the declared eval contract is still unsatisfied
+ - security should run before final closure if findings could still change integration or rollout readiness
+ - documentation should not close while integration still reports that the wave is not ready for doc closure
+ - cont-QA should be last, because it is supposed to judge the final landed state
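The staging rule is a short-circuiting sequence: run gates in order and stop at the first failure. In this TypeScript sketch the stage names follow the list above, while the `ClosureStage` shape, its `run` callback, and `runClosure` are hypothetical.

```typescript
// Stage names follow the list above; this shape is hypothetical.
interface ClosureStage {
  name: string;
  optional: boolean;
  present: boolean;   // optional stages (security review) may be absent
  run: () => boolean; // true = this gate passed
}

// Run stages strictly in order and stop at the first failure, so a later
// gate never sees (or blesses) state that an earlier gate rejected.
function runClosure(stages: ClosureStage[]): { passed: boolean; failedAt: string | null; ran: string[] } {
  const ran: string[] = [];
  for (let i = 0; i < stages.length; i++) {
    const current = stages[i];
    if (current.optional && !current.present) continue;
    ran.push(current.name);
    if (!current.run()) return { passed: false, failedAt: current.name, ran: ran };
  }
  return { passed: true, failedAt: null, ran: ran };
}
```

Returning `ran` makes it visible that `cont-QA` was never consulted when an earlier gate failed.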
+
+ ## What Each Closure Role Must Prove
+
+ ### Implementation Owners
+
+ Implementation owners must prove their own exit contract, not just exit cleanly.
+
+ That means:
+
+ - proof state is `met`
+ - completion, durability, and proof level meet the contract
+ - documentation impact is reported correctly
+ - all declared deliverables exist
+ - all required proof artifacts exist
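An exit-contract check can report gaps instead of a bare boolean, which makes retries easier to target. In this TypeScript sketch the shapes and `exitContractGaps` are hypothetical. Values such as `live` and `durable` appear in the `wave control proof register` example in live-proof-waves.md, but the level orderings are not described here, so the sketch accepts only exact matches.

```typescript
// Hypothetical shapes; exact-match comparison is a simplification.
interface ExitContract {
  completion: string;
  durability: string;
  proofLevel: string;
  deliverables: string[];
  proofArtifacts: string[];
}

interface ProofReport {
  state: string;             // from the final [wave-proof] marker
  completion: string;
  durability: string;
  proofLevel: string;
  docDeltaReported: boolean; // a valid [wave-doc-delta] was emitted
}

// Returns human-readable gaps; an empty array means the exit contract holds.
function exitContractGaps(contract: ExitContract, report: ProofReport, existingPaths: string[]): string[] {
  const gaps: string[] = [];
  if (report.state !== "met") gaps.push(`proof state is ${report.state}, not met`);
  if (report.completion !== contract.completion) gaps.push("completion does not match contract");
  if (report.durability !== contract.durability) gaps.push("durability does not match contract");
  if (report.proofLevel !== contract.proofLevel) gaps.push("proof level does not match contract");
  if (!report.docDeltaReported) gaps.push("documentation impact not reported");
  contract.deliverables.concat(contract.proofArtifacts).forEach((p) => {
    if (existingPaths.indexOf(p) === -1) gaps.push(`missing ${p}`);
  });
  return gaps;
}
```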
+
+ ### `cont-EVAL`
+
+ `cont-EVAL` must emit a final `[wave-eval]` marker and satisfy the declared target and benchmark contract.
+
+ For live closure, it is not enough to say "looks good." The target ids and benchmark ids must match the declared wave contract.
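"Must match" can be read strictly: every declared id is reported, and nothing undeclared is claimed. This TypeScript sketch takes that reading, which is an assumption. It also leaves out how ids are parsed from the `[wave-eval]` marker, and both function names are hypothetical.

```typescript
// Strict reading of "match"; parsing the [wave-eval] marker is out of scope.
function missingAndUnexpected(declared: string[], reported: string[]): { missing: string[]; unexpected: string[] } {
  return {
    missing: declared.filter((id) => reported.indexOf(id) === -1),
    unexpected: reported.filter((id) => declared.indexOf(id) === -1),
  };
}

function evalContractSatisfied(
  declaredTargets: string[],
  reportedTargets: string[],
  declaredBenchmarks: string[],
  reportedBenchmarks: string[]
): boolean {
  const t = missingAndUnexpected(declaredTargets, reportedTargets);
  const b = missingAndUnexpected(declaredBenchmarks, reportedBenchmarks);
  return t.missing.length + t.unexpected.length + b.missing.length + b.unexpected.length === 0;
}
```

Order does not matter, but a partial or padded report fails; "looks good" with no ids reports every declared id as missing.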
+
+ ### Security Review
+
+ If present, security review must emit a final `[wave-security]` marker and publish its report artifact.
+
+ - `blocked` stops the wave before integration
+ - `concerns` remains visible in summaries and traces
+ - `clear` is only valid when no unresolved findings or approvals remain
+
+ ### Integration
+
+ Integration must reconcile cross-agent state and report `ready-for-doc-closure` only when there is no remaining meaningful contradiction, blocker, proof gap, or deploy risk.
+
+ ### Documentation Steward
+
+ Documentation closure must emit `[wave-doc-closure]`.
+
+ The important distinction is:
+
+ - `closed` means the shared-plan delta was reconciled
+ - `no-change` means no shared-plan changes were required
+ - `delta` means documentation closure is still open
+
+ ### `cont-QA`
+
+ `cont-QA` must emit:
+
+ - a final verdict
+ - a final `[wave-gate]` marker
+
+ Final PASS requires all gate dimensions to pass in the final state.
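The verdict vocabularies above can be checked with small predicates. In this TypeScript sketch the verdict and state names come from the sections above; the function names are hypothetical. Throwing on a `clear` verdict with unresolved items, and treating an empty set of `cont-QA` dimensions as not passed, are this sketch's own choices.

```typescript
// Verdict and state names come from the sections above.
type SecurityVerdict = "blocked" | "concerns" | "clear";
type DocClosureState = "closed" | "no-change" | "delta";

// `concerns` does not stop the wave by itself; it stays visible in summaries and traces.
function securityAllowsIntegration(v: SecurityVerdict, unresolvedFindingsOrApprovals: number): boolean {
  if (v === "blocked") return false;
  if (v === "clear" && unresolvedFindingsOrApprovals > 0) {
    throw new Error("`clear` is invalid while findings or approvals remain unresolved");
  }
  return true;
}

function docClosureDone(s: DocClosureState): boolean {
  return s === "closed" || s === "no-change"; // `delta` means doc closure is still open
}

// Final PASS needs every gate dimension to pass; dimension names are caller-defined.
function contQaPass(dimensions: { [dimension: string]: boolean }): boolean {
  const keys = Object.keys(dimensions);
  return keys.length > 0 && keys.every((k) => dimensions[k]);
}
```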
+
+ ## Why The Closure Model Works
+
+ The closure model is deliberately conservative.
+
+ It works because it refuses to trust weak signals:
+
+ - a process exiting successfully
+ - a board note saying "done"
+ - one agent claiming success while another still reports a contradiction
+ - stale prior attempt output
+
+ Instead, it trusts machine-visible current state:
+
+ - current coordination log state
+ - current assignment and dependency snapshots
+ - current integration summary
+ - current docs closure state
+ - current cont-QA and cont-EVAL markers
+ - current proof artifacts and deliverables
+
+ That gives Wave two useful properties:
+
+ - already-valid work can stay reusable
+ - the wave still refuses to PASS while open follow-up work remains
+
+ ## Targeted Retry Behavior
+
+ When closure fails, the launcher does not always relaunch the entire wave.
+
+ It tries to relaunch only the implicated owners:
+
+ - agents named by the failure
+ - sibling owners that still owe shared promoted-component proof after a landed owner already passed its slice
+ - helper assignees
+ - dependency owners where relevant
+ - the closure stewards needed after that state changes
+
+ That is why the system can safely reuse already-valid implementation slices while still forcing the wave to stay blocked until the right follow-up work is done.
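The relaunch set is essentially a deduplicated union of the five groups listed above. In this TypeScript sketch the input shape and `relaunchSet` are hypothetical, and how each group is computed is out of scope. Any agent not in the result keeps its already-valid slice.

```typescript
// Inputs mirror the bullet list above; how each list is computed is hypothetical.
interface RetryInputs {
  namedByFailure: string[];
  siblingsOwingComponentProof: string[];
  helperAssignees: string[];
  dependencyOwners: string[]; // only those relevant to this wave
  closureStewards: string[];  // e.g. integration, documentation, cont-QA owners
}

// Deduplicated union that preserves first-seen order.
function relaunchSet(r: RetryInputs): string[] {
  const out: string[] = [];
  const groups = [r.namedByFailure, r.siblingsOwingComponentProof, r.helperAssignees, r.dependencyOwners, r.closureStewards];
  groups.forEach((group) => {
    group.forEach((id) => {
      if (out.indexOf(id) === -1) out.push(id);
    });
  });
  return out;
}
```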
+
+ Operators now have a first-class override path for that recovery flow:
+
+ ```bash
+ pnpm exec wave control rerun get --lane main --wave 10 --json
+ pnpm exec wave control rerun request --lane main --wave 10 --agent A2 --agent A7 --clear-reuse A2 --reason "Resume sibling-owned component closure"
+ ```
+
+ The canonical rerun request is written under `.tmp/<lane>-wave-launcher/control-plane/`, projected to `.tmp/<lane>-wave-launcher/control/` for compatibility, consumed by the launcher on the next retry decision, and then cleared by default after one application. This is the supported path for:
+
+ - rerunning only specific owners
+ - preserving explicit reuse selectors such as attempt ids, proof bundle ids, derived-summary reuse, and invalidated component ids through the compatibility projection
+ - clearing reuse for selected agents without wiping the whole wave state
+ - resuming at the real remaining implementation owners instead of restarting or stopping at the wrong sibling
+
+ ## Common Mistakes
+
+ - Treating `exit 0` as wave completion.
+ - Using a board note or `handoff` when the work should be a blocking targeted `request`.
+ - Modeling cross-agent follow-up as a deliverable instead of coordination work.
+ - Declaring integration ready while helper assignments or dependencies are still open.
+ - Treating documentation closure as optional after plan-affecting outcomes.
+ - Treating `cont-QA` as an implementation reviewer instead of the final closure gate.
+
+ ## Practical Rule Of Thumb
+
+ Ask two questions:
+
+ 1. "Did this agent finish its own owned outputs?"
+ 2. "Is the wave globally coherent enough that no other blocking owner still owes follow-up work?"
+
+ Wave closes only when the answer to both is yes.
package/docs/reference/live-proof-waves.md
@@ -7,6 +7,8 @@ summary: "How to author proof-first `pilot-live` and higher-maturity waves with
 
  `pilot-live`, `fleet-ready`, `cutover-ready`, and `deprecation-ready` waves are not normal repo-only implementation waves.
 
+ For the general runtime model behind helper requests, integration, and final staged closure, see [docs/reference/coordination-and-closure.md](./coordination-and-closure.md).
+
  For these waves:
 
  - operator-run commands are part of closure
@@ -148,9 +150,29 @@ When new proof artifacts arrive after an earlier failed attempt, the right respo
  Typical pattern:
 
  1. operator captures the missing proof bundle locally
- 2. the proof owner reruns on the same executor
- 3. any stale synthesis or integration owner reruns if needed
- 4. already-valid implementation slices stay reused
+ 2. operator can register that bundle directly:
+
+ ```bash
+ pnpm exec wave control proof register \
+   --lane main \
+   --wave 8 \
+   --agent A6 \
+   --artifact .tmp/wave-8-learning-proof/learning-plane-before-restart.json \
+   --artifact .tmp/wave-8-learning-proof/learning-plane-after-restart.json \
+   --authoritative \
+   --satisfy-owned-components \
+   --completion live \
+   --durability durable \
+   --proof-level live \
+   --doc-delta owned \
+   --detail "Operator captured and verified restart evidence."
+ ```
+
+ 3. the proof owner reruns on the same executor only if additional synthesis is still needed
+ 4. any stale integration or closure owner reruns if needed
+ 5. already-valid implementation slices stay reused
+
+ Authoritative proof registration is the supported way to make operator-produced evidence visible to A8, A0, rerun control, and hermetic traces without forcing an implementation agent to rediscover the same local artifacts in a fresh session. The canonical proof bundle now lands in `.tmp/<lane>-wave-launcher/control-plane/` and is projected into `.tmp/<lane>-wave-launcher/proof/` for compatibility.
 
  ## Suggested Eval Targets For Live-Proof Waves
 
package/docs/reference/npmjs-trusted-publishing.md
@@ -2,7 +2,7 @@
 
  This repo now includes a dedicated npmjs publish workflow at [publish-npm.yml](../../.github/workflows/publish-npm.yml).
 
- The current `0.6.1` release procedure publishes through a repository Actions secret named `NPM_TOKEN`.
+ The current `0.7.1` release procedure publishes through a repository Actions secret named `NPM_TOKEN`.
 
  ## What This Repo Already Does
 
@@ -18,7 +18,7 @@ The current `0.6.1` release procedure publishes through a repository Actions sec
  - package or scope access for `@chllming/wave-orchestration`
  - `Read and write` permission
  - `Bypass 2FA` enabled
- 2. In the GitHub repo `chllming/wave-orchestration`, add that token as an Actions secret named `NPM_TOKEN`.
+ 2. In the GitHub repo `chllming/agent-wave-orchestrator`, add that token as an Actions secret named `NPM_TOKEN`.
  3. Rotate or revoke the token when no longer needed.
 
  ## GitHub Workflow Behavior
@@ -47,6 +47,6 @@ If this repo later needs private npm dependencies during CI, consider a separate
  1. Confirm [publish-npm.yml](../../.github/workflows/publish-npm.yml) is on the default branch.
  2. Confirm `NPM_TOKEN` exists in the GitHub repo secrets.
  3. Confirm the package version has been bumped and committed.
- 4. Push the release commit and release tag, for example `v0.6.1`.
+ 4. Push the release commit and release tag, for example `v0.7.1`.
  5. Verify both `publish-npm.yml` and `publish-package.yml` start from the tag push.
  6. Verify the npmjs publish completes successfully for the tagged source.
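Steps 3 and 4 only line up if the pushed tag matches the bumped version. The TypeScript sketch below is a pre-push sanity check that assumes the `v<version>` tag convention shown in the example (`v0.7.1`). The function names are hypothetical, and the publish workflows may enforce this differently or not at all.

```typescript
// Assumes the `v<version>` tag convention from the example above.
function expectedReleaseTag(packageVersion: string): string {
  return `v${packageVersion}`;
}

// Compare a tag against the `version` field of package.json text.
function tagMatchesPackage(tag: string, packageJsonText: string): boolean {
  const pkg = JSON.parse(packageJsonText) as { version?: unknown };
  return typeof pkg.version === "string" && tag === expectedReleaseTag(pkg.version);
}
```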
@@ -0,0 +1,90 @@
1
+ ---
2
+ title: "Proof Metrics"
3
+ summary: "How Wave maps README multi-agent failure modes to concrete runtime telemetry and benchmark evidence."
4
+ ---
5
+
6
+ # Proof Metrics
7
+
8
+ This document turns the README failure cases into concrete proof obligations.
9
+
10
+ Wave does not treat these as narrative quality goals. The point of native telemetry is to gather enough durable evidence that we can answer:
11
+
12
+ - did the runtime behave as intended
13
+ - which proof signals back that claim
14
+ - where the system still fails or only partially proves the claim
15
+
16
+ For the event and artifact contract, see [wave-control.md](./wave-control.md).
17
+
+ ## Signal Map
+
+ | Failure case | Native telemetry to inspect | Benchmark evidence to inspect | What success should look like |
+ | --- | --- | --- | --- |
+ | `Cosmetic board, no canonical state` | `coordination_record`, `wave_run`, `attempt`, `artifact`, trace bundle metadata, control-plane raw log | `benchmark_run` attestation plus linked trace metadata for `full-wave` arms | The board, shared summary, and dashboards are projections over a durable JSONL/event trail, not the only record |
+ | `Hidden evidence never gets pooled` | evidence refs in `coordination_record`, proof-bundle artifacts, integration summary artifacts, closure timeline | `benchmark_item` review validity plus linked proof/verification artifacts | Decision-changing evidence can be traced from the owner agent into shared summary, integration, and final closure |
+ | `Communication without global-state reconstruction` | `gate` snapshots, integration summary artifacts, contradiction-repair traces, attempt timeline | distributed-reasoning benchmark items and validity buckets | Shared state converges on the correct integrated recommendation rather than only showing message traffic |
+ | `Simultaneous coordination collapse` | coordination backlog counts, open blockers, request/ack timing from task snapshots, dependency and helper-assignment barriers | `benchmark_item` wall clock, timeout reviews, harness-vs-model validity split | Multiple active blockers and cross-owner dependencies stay visible and closure is blocked until they resolve |
+ | `Expert signal gets averaged away` | targeted routing in assignments, `coordination_record.targets`, final owner on accepted recommendation, reroute history | task-level arm telemetry and benchmark outcome grouped by routing-heavy tasks | The accepted recommendation still comes from the appropriate owner or shows an explicit override reason |
+ | `Contradictions get smoothed over` | `gate` artifacts, contradiction-related coordination records, proof bundle supersession chain, retry/rerun control events | `review` validity and contradiction-oriented benchmark families | Material conflicts remain explicit and either produce repair work or block PASS |
+ | `Premature closure` | `gate` transitions, `proof_bundle`, `attempt`, `review`, final `wave_run` state, trace `outcome.json` | `review` validity buckets such as `proof-blocked`, `benchmark-invalid`, and `trustworthy-model-failure` | PASS only appears after proof completeness, integration, and closure stewardship agree; reopen/rerun remains visible when PASS was premature |
+
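The Signal Map can double as a coverage checklist for a single run. The sketch below is hypothetical: it assumes a run's telemetry can be reduced to the set of record kinds it emitted, and `SIGNAL_MAP` (a three-row subset of the table) and `uncoveredFailureCases` are illustrative names, not part of Wave's API.

```typescript
// Hypothetical three-row subset of the Signal Map: each README failure case
// maps to the native record kinds that must exist before it can be inspected.
const SIGNAL_MAP: Record<string, readonly string[]> = {
  "Cosmetic board, no canonical state": ["coordination_record", "wave_run", "attempt", "artifact"],
  "Communication without global-state reconstruction": ["gate", "attempt"],
  "Premature closure": ["gate", "proof_bundle", "attempt", "review", "wave_run"],
};

// Returns each failure case whose required record kinds are not all present,
// mapped to the kinds that are missing.
function uncoveredFailureCases(
  observedKinds: ReadonlySet<string>,
  signalMap: Record<string, readonly string[]> = SIGNAL_MAP,
): Map<string, string[]> {
  const gaps = new Map<string, string[]>();
  for (const failureCase of Object.keys(signalMap)) {
    const missing = signalMap[failureCase].filter((kind) => !observedKinds.has(kind));
    if (missing.length > 0) gaps.set(failureCase, missing);
  }
  return gaps;
}

// Example: a run that emitted attempts and gates but no proof bundles or reviews.
const observed = new Set(["coordination_record", "wave_run", "attempt", "artifact", "gate"]);
uncoveredFailureCases(observed).forEach((missing, failureCase) => {
  console.log(`${failureCase}: missing ${missing.join(", ")}`);
});
// prints "Premature closure: missing proof_bundle, review"
```

Presence of a record kind is only a precondition. The "What success should look like" column still has to be judged from what those records contain.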
+ ## Native Benchmark Metrics As Proof
+
+ `wave benchmark run` is the native proof surface for the coordination substrate. It matters because it lets us evaluate the Wave mechanics directly, before live-model noise, runtime variance, or external harness issues enter the picture.
+
+ The native metric groups line up with the README claims:
+
+ - evidence pooling:
+ `distributed-info-accuracy`, `global-state-reconstruction-rate`, and `communication-reasoning-gap` tell us whether distributed facts became one correct integration-visible state instead of remaining split across private owner views
+ - projection fidelity:
+ `summary-fact-retention-rate`, `projection-consistency-rate`, `targeted-inbox-recall`, and `integration-coherence-rate` tell us whether the blackboard projections stayed faithful enough to be useful
+ - routing quality:
+ `capability-routing-precision`, `expert-preservation-rate`, and `expert-performance-gap` tell us whether specialization survives routing and synthesis
+ - contradiction handling:
+ `contradiction-detection-rate`, `repair-closure-rate`, and `false-consensus-rate` tell us whether conflicts become explicit repair work instead of narrative consensus
+ - closure discipline:
+ `latent-asymmetry-surfacing-rate` and `premature-convergence-rate` tell us whether the system notices missing evidence and keeps closure blocked until it is integrated
+ - simultaneous coordination:
+ `deadlock-rate`, `contention-resolution-rate`, and `symmetry-breaking-rate` tell us whether the team can coordinate under concurrent blockers rather than collapsing into lockstep failure
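These groups mix higher-is-better rates with lower-is-better guard metrics such as `deadlock-rate`. A minimal sketch of a registry that records each metric's direction and produces a direction-aligned delta, where a positive number always means improvement. The `higher`/`lower` assignments are inferred from the metric names and are an assumption; `METRIC_GROUPS`, `directionOf`, and `alignedDelta` are hypothetical helpers, not Wave's API.

```typescript
// "higher": a larger value is better. "lower": a guard metric where a smaller
// value is better. Assignments below are inferred from the metric names.
type Direction = "higher" | "lower";

const METRIC_GROUPS: Record<string, Record<string, Direction>> = {
  "evidence pooling": {
    "distributed-info-accuracy": "higher",
    "global-state-reconstruction-rate": "higher",
    "communication-reasoning-gap": "lower",
  },
  "projection fidelity": {
    "summary-fact-retention-rate": "higher",
    "projection-consistency-rate": "higher",
    "targeted-inbox-recall": "higher",
    "integration-coherence-rate": "higher",
  },
  "routing quality": {
    "capability-routing-precision": "higher",
    "expert-preservation-rate": "higher",
    "expert-performance-gap": "lower",
  },
  "contradiction handling": {
    "contradiction-detection-rate": "higher",
    "repair-closure-rate": "higher",
    "false-consensus-rate": "lower",
  },
  "closure discipline": {
    "latent-asymmetry-surfacing-rate": "higher",
    "premature-convergence-rate": "lower",
  },
  "simultaneous coordination": {
    "deadlock-rate": "lower",
    "contention-resolution-rate": "higher",
    "symmetry-breaking-rate": "higher",
  },
};

// Looks up a metric's direction across all groups (own keys only).
function directionOf(metric: string): Direction {
  for (const group of Object.keys(METRIC_GROUPS)) {
    const metrics = METRIC_GROUPS[group];
    if (Object.prototype.hasOwnProperty.call(metrics, metric)) return metrics[metric];
  }
  throw new Error(`unknown metric: ${metric}`);
}

// Positive result always means "candidate improved on baseline",
// whether the raw metric is higher- or lower-is-better.
function alignedDelta(metric: string, baseline: number, candidate: number): number {
  const raw = candidate - baseline;
  return directionOf(metric) === "higher" ? raw : -raw;
}

console.log(alignedDelta("repair-closure-rate", 0.5, 0.75)); // 0.25
console.log(alignedDelta("deadlock-rate", 0.5, 0.25)); // 0.25: fewer deadlocks is an improvement
```

Averaging raw values across a group would let a rise in `deadlock-rate` read as progress; aligning each delta first keeps every contribution pointing the same way.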
+
+ These metrics matter because Wave's core promise is more than "many agents talked." The promise is that the system reconstructs shared state, routes work to owners with the right capability, preserves important evidence through projections, and refuses to close while critical uncertainty remains.
+
+ The deterministic runner is strict about that distinction:
+
+ - global reconstruction is scored from integration-visible artifacts, not from the union of every inbox
+ - clarification surfacing is scored from explicit record ids, so the metric only moves when the missing-evidence record is actually preserved in the generated artifacts
+ - family summaries and deltas are direction-aligned, so lower-is-better guard metrics do not invert the headline comparison
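The first two scoring rules can be sketched as follows, under stated assumptions: the `Artifact` shape, its `visibility` field, and the scenario-defined `factKeys` are hypothetical, and the record id `clarify-7` is made up for the example. The point is only that reconstruction is scored over integration-visible artifacts, so a fact that exists only in some owner's inbox does not count.

```typescript
// Hypothetical artifact shape: where the artifact is visible, plus the
// scenario-defined fact keys and record ids it carries.
interface Artifact {
  visibility: "integration" | "inbox";
  factKeys: string[];
  recordIds: string[];
}

// Fraction of required facts present in integration-visible artifacts.
// Inbox-only artifacts are ignored on purpose: a fact sitting in one agent's
// private inbox was never reconstructed into shared state.
function globalReconstructionRate(requiredFacts: string[], artifacts: Artifact[]): number {
  if (requiredFacts.length === 0) return 1; // nothing to reconstruct
  const integrated = new Set<string>();
  for (const artifact of artifacts) {
    if (artifact.visibility === "integration") {
      artifact.factKeys.forEach((key) => integrated.add(key));
    }
  }
  return requiredFacts.filter((fact) => integrated.has(fact)).length / requiredFacts.length;
}

// Clarification surfacing counts only when the exact missing-evidence
// record id survives into some generated artifact.
function clarificationSurfaced(recordId: string, artifacts: Artifact[]): boolean {
  return artifacts.some((artifact) => artifact.recordIds.indexOf(recordId) !== -1);
}

const artifacts: Artifact[] = [
  { visibility: "inbox", factKeys: ["f1", "f2"], recordIds: [] },
  { visibility: "integration", factKeys: ["f1"], recordIds: ["clarify-7"] },
];
console.log(globalReconstructionRate(["f1", "f2"], artifacts)); // 0.5, although the inbox union holds both facts
console.log(clarificationSurfaced("clarify-7", artifacts)); // true
console.log(clarificationSurfaced("clarify-8", artifacts)); // false
```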
+
+ ## Native Views To Build Around
+
+ The minimum useful derived views are:
+
+ - closure fidelity:
+ track gate transitions, proof completeness, blocked reasons, and any rerun after a would-be PASS
+ - evidence pooling:
+ track whether integration and closure cite the proof artifacts and evidence refs that mattered
+ - contradiction handling:
+ track open conflicts, superseded proof bundles, repair work, and unresolved contradiction count at finish
+ - coordination pressure:
+ track open tasks, human escalations, stale clarifications, assignment lag, and dependency barriers
+ - benchmark trust:
+ keep verifier/setup invalidation separate from real capability failure
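Part of the closure-fidelity view can be derived from an ordered event stream. This sketch assumes a simplified, hypothetical `ControlEvent` union in which each `gate` event stands for the run's overall gate verdict; it counts gate transitions, collects blocked reasons, and flags any rerun that follows a would-be PASS. Proof completeness is left out for brevity.

```typescript
// Hypothetical, simplified control events. Each "gate" event stands for the
// run's overall gate verdict; "rerun" marks a new attempt being started.
type ControlEvent =
  | { kind: "gate"; status: "pass" | "blocked"; reason?: string }
  | { kind: "rerun" };

interface ClosureFidelityView {
  transitions: number;      // gate verdicts observed
  blockedReasons: string[]; // reasons recorded on blocked verdicts
  rerunsAfterPass: number;  // reruns that followed a would-be PASS
}

function closureFidelity(events: readonly ControlEvent[]): ClosureFidelityView {
  const view: ClosureFidelityView = { transitions: 0, blockedReasons: [], rerunsAfterPass: 0 };
  let passPending = false; // true while the latest verdict is an unchallenged PASS
  for (const event of events) {
    if (event.kind === "gate") {
      view.transitions += 1;
      passPending = event.status === "pass";
      if (event.status === "blocked" && event.reason) view.blockedReasons.push(event.reason);
    } else if (passPending) {
      // A rerun after PASS means that PASS was premature; keep it visible.
      view.rerunsAfterPass += 1;
      passPending = false;
    }
  }
  return view;
}

const timeline: ControlEvent[] = [
  { kind: "gate", status: "blocked", reason: "proof bundle incomplete" },
  { kind: "gate", status: "pass" },
  { kind: "rerun" },
  { kind: "gate", status: "pass" },
];
const closure = closureFidelity(timeline);
console.log(closure.transitions, closure.rerunsAfterPass); // 3 1
```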
+
+ ## Recommended Success Criteria
+
+ For a run to count as evidence that Wave is working as intended, it should ideally meet all of the following:
+
+ 1. The run has a durable `wave_run` plus `attempt` timeline.
+ 2. The trace bundle contains `run-metadata.json`, `quality.json`, and `outcome.json`.
+ 3. Closure evidence is visible through `gate` and `proof_bundle` events rather than only markdown text.
+ 4. If the run includes a benchmark, the result has explicit `benchmark_run`, `benchmark_item`, `verification`, and `review` records.
+ 5. Invalid or unpublishable benchmark outcomes are still retained, but labeled as such.
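The five criteria lend themselves to a mechanical check. A minimal sketch, assuming a hypothetical `RunEvidence` summary of one run (record kinds emitted, trace-bundle file names, and benchmark items with an optional validity label); the names are illustrative, not Wave's schema. Criterion 5 is only approximated: the sketch can confirm that retained items carry a validity label, but it cannot detect items that were dropped entirely.

```typescript
// Hypothetical summary of one run; field names are illustrative.
interface RunEvidence {
  eventKinds: ReadonlySet<string>; // record kinds the run emitted
  traceFiles: ReadonlySet<string>; // file names present in the trace bundle
  includesBenchmark: boolean;
  benchmarkItems: { id: string; validity?: string }[];
}

const hasAll = (have: ReadonlySet<string>, need: string[]): boolean =>
  need.every((item) => have.has(item));

// Returns the criteria (by number) that the run does not yet meet.
function unmetCriteria(run: RunEvidence): string[] {
  const unmet: string[] = [];
  if (!hasAll(run.eventKinds, ["wave_run", "attempt"])) {
    unmet.push("1: durable wave_run + attempt timeline");
  }
  if (!hasAll(run.traceFiles, ["run-metadata.json", "quality.json", "outcome.json"])) {
    unmet.push("2: trace bundle files");
  }
  if (!hasAll(run.eventKinds, ["gate", "proof_bundle"])) {
    unmet.push("3: closure evidence as gate/proof_bundle events");
  }
  if (run.includesBenchmark &&
      !hasAll(run.eventKinds, ["benchmark_run", "benchmark_item", "verification", "review"])) {
    unmet.push("4: benchmark_run, benchmark_item, verification, and review records");
  }
  // Only checks labeling; items dropped before this point cannot be detected here.
  if (run.benchmarkItems.some((item) => !item.validity)) {
    unmet.push("5: every retained benchmark outcome carries a validity label");
  }
  return unmet;
}

const sampleRun: RunEvidence = {
  eventKinds: new Set(["wave_run", "attempt", "gate", "proof_bundle", "benchmark_run", "benchmark_item", "review"]),
  traceFiles: new Set(["run-metadata.json", "quality.json", "outcome.json"]),
  includesBenchmark: true,
  benchmarkItems: [{ id: "item-1", validity: "benchmark-invalid" }, { id: "item-2" }],
};
console.log(unmetCriteria(sampleRun)); // criteria 4 (no verification record) and 5 (item-2 unlabeled)
```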
+
+ ## Current Limits
+
+ Current telemetry proves more than the old file-by-file reporting, but it still has gaps:
+
+ - v1 tracks evidence refs and artifact lineage at the event/artifact level, not stable fact ids
+ - expert-routing proof currently comes from assignment/reroute ownership and the accepted final owner, not a dedicated expert-override schema
+ - contradiction evidence is visible through gate state, review disposition, and coordination records, but not yet as a standalone normalized contradiction entity
+
+ Those gaps should be treated as visibility work, not as permission to fall back to narrative-only conclusions.