npm - @chllming/wave-orchestration - Versions diffs - 0.6.3 → 0.7.1 - Mend

@chllming/wave-orchestration 0.6.3 → 0.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (118)

package/CHANGELOG.md +82 -1
package/README.md +40 -7
package/docs/agents/wave-orchestrator-role.md +50 -0
package/docs/agents/wave-planner-role.md +39 -0
package/docs/context7/bundles.json +9 -0
package/docs/context7/planner-agent/README.md +25 -0
package/docs/context7/planner-agent/manifest.json +83 -0
package/docs/context7/planner-agent/papers/cooperbench-why-coding-agents-cannot-be-your-teammates-yet.md +3283 -0
package/docs/context7/planner-agent/papers/dova-deliberation-first-multi-agent-orchestration-for-autonomous-research-automation.md +1699 -0
package/docs/context7/planner-agent/papers/dpbench-large-language-models-struggle-with-simultaneous-coordination.md +2251 -0
package/docs/context7/planner-agent/papers/incremental-planning-to-control-a-blackboard-based-problem-solver.md +1729 -0
package/docs/context7/planner-agent/papers/silo-bench-a-scalable-environment-for-evaluating-distributed-coordination-in-multi-agent-llm-systems.md +3747 -0
package/docs/context7/planner-agent/papers/todoevolve-learning-to-architect-agent-planning-systems.md +1675 -0
package/docs/context7/planner-agent/papers/verified-multi-agent-orchestration-a-plan-execute-verify-replan-framework-for-complex-query-resolution.md +1173 -0
package/docs/context7/planner-agent/papers/why-do-multi-agent-llm-systems-fail.md +5211 -0
package/docs/context7/planner-agent/topics/planning-and-orchestration.md +24 -0
package/docs/evals/README.md +96 -1
package/docs/evals/arm-templates/README.md +13 -0
package/docs/evals/arm-templates/full-wave.json +15 -0
package/docs/evals/arm-templates/single-agent.json +15 -0
package/docs/evals/benchmark-catalog.json +7 -0
package/docs/evals/cases/README.md +47 -0
package/docs/evals/cases/wave-blackboard-inbox-targeting.json +73 -0
package/docs/evals/cases/wave-contradiction-conflict.json +104 -0
package/docs/evals/cases/wave-expert-routing-preservation.json +69 -0
package/docs/evals/cases/wave-hidden-profile-private-evidence.json +81 -0
package/docs/evals/cases/wave-premature-closure-guard.json +71 -0
package/docs/evals/cases/wave-silo-cross-agent-state.json +77 -0
package/docs/evals/cases/wave-simultaneous-lockstep.json +92 -0
package/docs/evals/cooperbench/real-world-mitigation.md +341 -0
package/docs/evals/external-benchmarks.json +85 -0
package/docs/evals/external-command-config.sample.json +9 -0
package/docs/evals/external-command-config.swe-bench-pro.json +8 -0
package/docs/evals/pilots/README.md +47 -0
package/docs/evals/pilots/swe-bench-pro-public-full-wave-review-10.json +64 -0
package/docs/evals/pilots/swe-bench-pro-public-pilot.json +111 -0
package/docs/evals/wave-benchmark-program.md +302 -0
package/docs/guides/planner.md +67 -11
package/docs/guides/terminal-surfaces.md +12 -0
package/docs/plans/context7-wave-orchestrator.md +20 -0
package/docs/plans/current-state.md +8 -1
package/docs/plans/examples/wave-benchmark-improvement.md +108 -0
package/docs/plans/examples/wave-example-live-proof.md +1 -1
package/docs/plans/examples/wave-example-rollout-fidelity.md +340 -0
package/docs/plans/migration.md +26 -0
package/docs/plans/wave-orchestrator.md +60 -12
package/docs/plans/waves/reviews/wave-1-benchmark-operator.md +118 -0
package/docs/reference/cli-reference.md +547 -0
package/docs/reference/coordination-and-closure.md +436 -0
package/docs/reference/live-proof-waves.md +25 -3
package/docs/reference/npmjs-trusted-publishing.md +3 -3
package/docs/reference/proof-metrics.md +90 -0
package/docs/reference/runtime-config/README.md +63 -2
package/docs/reference/runtime-config/codex.md +2 -1
package/docs/reference/sample-waves.md +29 -18
package/docs/reference/wave-control.md +164 -0
package/docs/reference/wave-planning-lessons.md +131 -0
package/package.json +5 -4
package/releases/manifest.json +40 -0
package/scripts/research/agent-context-archive.mjs +18 -0
package/scripts/research/manifests/agent-context-expanded-2026-03-22.mjs +17 -0
package/scripts/research/sync-planner-context7-bundle.mjs +133 -0
package/scripts/wave-orchestrator/agent-state.mjs +11 -2
package/scripts/wave-orchestrator/artifact-schemas.mjs +232 -0
package/scripts/wave-orchestrator/autonomous.mjs +7 -0
package/scripts/wave-orchestrator/benchmark-cases.mjs +374 -0
package/scripts/wave-orchestrator/benchmark-external.mjs +1384 -0
package/scripts/wave-orchestrator/benchmark.mjs +972 -0
package/scripts/wave-orchestrator/clarification-triage.mjs +78 -12
package/scripts/wave-orchestrator/config.mjs +175 -0
package/scripts/wave-orchestrator/control-cli.mjs +1216 -0
package/scripts/wave-orchestrator/control-plane.mjs +697 -0
package/scripts/wave-orchestrator/coord-cli.mjs +360 -2
package/scripts/wave-orchestrator/coordination-store.mjs +211 -9
package/scripts/wave-orchestrator/coordination.mjs +84 -0
package/scripts/wave-orchestrator/dashboard-renderer.mjs +120 -5
package/scripts/wave-orchestrator/dashboard-state.mjs +22 -0
package/scripts/wave-orchestrator/evals.mjs +23 -0
package/scripts/wave-orchestrator/executors.mjs +3 -2
package/scripts/wave-orchestrator/feedback.mjs +55 -0
package/scripts/wave-orchestrator/install.mjs +151 -2
package/scripts/wave-orchestrator/launcher-closure.mjs +4 -1
package/scripts/wave-orchestrator/launcher-runtime.mjs +33 -30
package/scripts/wave-orchestrator/launcher.mjs +884 -36
package/scripts/wave-orchestrator/planner-context.mjs +75 -0
package/scripts/wave-orchestrator/planner.mjs +2270 -136
package/scripts/wave-orchestrator/proof-cli.mjs +195 -0
package/scripts/wave-orchestrator/proof-registry.mjs +317 -0
package/scripts/wave-orchestrator/replay.mjs +10 -4
package/scripts/wave-orchestrator/retry-cli.mjs +184 -0
package/scripts/wave-orchestrator/retry-control.mjs +225 -0
package/scripts/wave-orchestrator/shared.mjs +26 -0
package/scripts/wave-orchestrator/swe-bench-pro-task.mjs +1004 -0
package/scripts/wave-orchestrator/terminals.mjs +1 -1
package/scripts/wave-orchestrator/traces.mjs +157 -2
package/scripts/wave-orchestrator/wave-control-client.mjs +532 -0
package/scripts/wave-orchestrator/wave-control-schema.mjs +309 -0
package/scripts/wave-orchestrator/wave-files.mjs +144 -23
package/scripts/wave.mjs +27 -0
package/skills/repo-coding-rules/SKILL.md +1 -0
package/skills/role-cont-eval/SKILL.md +1 -0
package/skills/role-cont-qa/SKILL.md +13 -6
package/skills/role-deploy/SKILL.md +1 -0
package/skills/role-documentation/SKILL.md +4 -0
package/skills/role-implementation/SKILL.md +4 -0
package/skills/role-infra/SKILL.md +2 -1
package/skills/role-integration/SKILL.md +15 -8
package/skills/role-planner/SKILL.md +39 -0
package/skills/role-planner/skill.json +21 -0
package/skills/role-research/SKILL.md +1 -0
package/skills/role-security/SKILL.md +2 -2
package/skills/runtime-claude/SKILL.md +2 -1
package/skills/runtime-codex/SKILL.md +1 -0
package/skills/runtime-local/SKILL.md +2 -0
package/skills/runtime-opencode/SKILL.md +1 -0
package/skills/wave-core/SKILL.md +25 -6
package/skills/wave-core/references/marker-syntax.md +16 -8
package/wave.config.json +45 -0

package/docs/reference/coordination-and-closure.md ADDED

@@ -0,0 +1,436 @@
+---
+title: "Coordination And Closure"
+summary: "How agent-to-agent work, deliverables, integration, and final closure behave end to end in the Wave runtime."
+---
+# Coordination And Closure
+This page explains the runtime model behind Wave coordination, helper work, integration, and final closure.
+The short version is:
+- `exit 0` means an agent process finished
+- it does not mean the wave is ready to close
+- closure is based on durable coordination state plus the staged closure gates
+## Core Model
+Wave distinguishes three different things:
+1. an agent finishing its own owned work
+2. an agent asking another agent or lane for follow-up work
+3. the wave being globally coherent enough to pass integration, documentation, and cont-QA closure
+Those are related, but they are not the same.
+An implementation agent can be locally complete and still leave the wave blocked if it created open helper work, unresolved clarification chains, or required dependencies.
+## Durable State Surfaces
+The runtime writes several different artifacts, but they do different jobs:
+- canonical coordination log:
+  `.tmp/<lane>-wave-launcher/coordination/wave-<n>.jsonl`
+- helper-assignment snapshot:
+  `.tmp/<lane>-wave-launcher/assignments/wave-<n>.json`
+- dependency snapshot:
+  `.tmp/<lane>-wave-launcher/dependencies/wave-<n>.json`
+- shared summary:
+  `.tmp/<lane>-wave-launcher/inboxes/wave-<n>/shared-summary.md`
+- per-agent inboxes:
+  `.tmp/<lane>-wave-launcher/inboxes/wave-<n>/<agent>.md`
+- integration summary:
+  `.tmp/<lane>-wave-launcher/integration/wave-<n>.json`
+- wave dashboard:
+  `.tmp/<lane>-wave-launcher/dashboards/wave-<n>.json`
+- run-state:
+  `.tmp/<lane>-wave-launcher/run-state.json`
+The important rule is that the JSONL coordination log is the scheduler's source of truth. The markdown board is a projection for humans. See [wave-orchestrator.md](../plans/wave-orchestrator.md).
+Live waves now keep refreshing that derived state while agents are still running. Shared summaries, inboxes, dashboard coordination metrics, and clarification routing are not only recomputed at attempt boundaries; they are also refreshed during active wave execution so stale clarification and acknowledgement timing is machine-visible before the attempt ends.
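As a rough illustration of the layout above, the following sketch builds each durable state path for a given lane and wave. `stateSurfaces` is a hypothetical helper written for this page and is not part of the package API; only the path templates come from the list above.

```javascript
// Sketch only: `stateSurfaces` is a hypothetical helper, not a package export.
// The path templates mirror the durable state surfaces listed above.
function stateSurfaces(lane, wave) {
  const root = `.tmp/${lane}-wave-launcher`;
  return {
    coordinationLog: `${root}/coordination/wave-${wave}.jsonl`, // scheduler truth
    assignments: `${root}/assignments/wave-${wave}.json`,
    dependencies: `${root}/dependencies/wave-${wave}.json`,
    sharedSummary: `${root}/inboxes/wave-${wave}/shared-summary.md`,
    // Per-agent inboxes need the agent id as well.
    inbox: (agent) => `${root}/inboxes/wave-${wave}/${agent}.md`,
    integration: `${root}/integration/wave-${wave}.json`,
    dashboard: `${root}/dashboards/wave-${wave}.json`,
    runState: `${root}/run-state.json`, // shared across waves in the lane
  };
}

const surfaces = stateSurfaces("main", 4);
console.log(surfaces.coordinationLog);
console.log(surfaces.inbox("A8"));
```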
+## What Agents Should Use
+Use the coordination log for durable state:
+- `request`
+  Use this when you need another agent or capability owner to do work. Target it explicitly. This is the kind that becomes a helper assignment.
+- `blocker`
+  Use this when the wave is blocked, but not because the launcher needs to route work to a specific assignee.
+- `handoff`
+  Use this for continuity and context transfer. This is informative by itself; it is not the same as a blocking helper assignment.
+- `evidence`
+  Use this for durable facts, artifacts, or proof that another agent may need.
+- `claim`
+  Use this for assertions that integration should reconcile.
+- `clarification-request`
+  Use this when an ambiguity must be triaged before work can safely continue.
+## What Stewards and Orchestrators May Also Use
+- `ack`
+  Acknowledge receipt of a request or clarification. Resets the acknowledgement timer.
+- `decision`
+  Record a binding decision that downstream agents should follow.
+- `orchestrator-guidance`
+  Non-binding guidance from the resident orchestrator.
+Implementation agents normally do not need these kinds.
+Practical rule:
+- if you need another agent to take action and you want the wave to stay blocked until it is done, use a targeted `request`
+- a plain board note or plain `handoff` is not enough
+## Open Versus Resolved
+Wave treats these coordination statuses as open:
+- `open`
+- `acknowledged`
+- `in_progress`
+It treats these as non-blocking:
+- `resolved`
+- `closed`
+- `superseded`
+- `cancelled`
+That means a targeted helper request keeps blocking until the request leaves the open set in coordination state.
+This page documents runtime semantics first. The important contract is that closure follows the durable coordination state, regardless of which command path a human or agent used to mutate it.
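The two status sets above can be expressed as a small classifier. This is a minimal sketch: the record shape (`id`, `status`) and the "latest record per id wins" reduction over the append-only log are assumptions made for illustration, not the runtime's confirmed schema.

```javascript
// Status sets copied from the lists above.
const OPEN_STATUSES = new Set(["open", "acknowledged", "in_progress"]);
const NON_BLOCKING_STATUSES = new Set(["resolved", "closed", "superseded", "cancelled"]);

function isOpenStatus(status) {
  return OPEN_STATUSES.has(status);
}

// Assumption: the log is append-only and the last record for an id carries its
// current status. `id` and `status` are hypothetical field names.
function latestStatusById(records) {
  const latest = new Map();
  for (const record of records) {
    latest.set(record.id, record.status);
  }
  return latest;
}

const log = [
  { id: "req-1", status: "open" },
  { id: "req-2", status: "acknowledged" },
  { id: "req-1", status: "resolved" },
];
const current = latestStatusById(log);
console.log([...current].filter(([, status]) => isOpenStatus(status)));
```

Under these assumptions, `req-1` stops blocking once its latest status leaves the open set, while `req-2` keeps blocking.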
+## Deliverables Versus Helper Work
+Deliverables prove an agent landed its own owned outputs.
+For implementation agents with an exit contract, closure validates:
+- `[wave-proof]`
+- `[wave-doc-delta]`
+- any required `[wave-component]` markers
+- declared `### Deliverables`
+- declared `### Proof artifacts`
+Deliverables and proof artifacts are local ownership proof. They do not replace cross-agent follow-up.
+That distinction matters:
+- if Agent A1 owns `src/foo.ts` and `docs/reviews/foo.md`, those should be modeled as A1 deliverables
+- if A1 needs Agent A8 to reconcile a cross-component interface or integration contradiction, that is not an A1 deliverable
+- that second case is coordination work, and it should become a targeted request
+## End-To-End Example: Agent A1 Needs A8
+Assume:
+- A1 owns implementation files and its review output
+- A8 is the integration steward
+- A1 finishes its code and report, but notices an interface contradiction that only A8 can reconcile
+### Step 1: A1 Lands Its Owned Work
+A1 can still satisfy its own slice by:
+- writing its owned files
+- emitting a valid `[wave-proof]`
+- emitting a valid `[wave-doc-delta]`
+- satisfying any declared deliverables and proof artifacts
+At this point A1 can be locally done.
+### Step 2: A1 Raises A Durable Request
+Example:
+```bash
+pnpm exec wave control task create \
+  --lane main \
+  --wave 4 \
+  --agent A1 \
+  --kind request \
+  --summary "Need integration decision for auth/session interface change" \
+  --detail "A1 landed the auth refactor, but session ownership now spans auth, gateway, and docs surfaces. A8 must reconcile the final contract and closure path." \
+  --target agent:A8 \
+  --priority high
+```
+What happens next:
+- the request lands in the canonical coordination log
+- the launcher derives a helper assignment for `agent:A8`
+- that assignment is written into the assignment snapshot
+- the shared summary and A8 inbox now show the open helper work
+`wave control task list` and `wave control task get` surface both blocking and informative coordination kinds. `wave control status` only turns `request`, `blocker`, `clarification-request`, `human-feedback`, and `human-escalation` into blocking task edges; plain `handoff`, `evidence`, `claim`, and `decision` records stay visible without falsely blocking the owner. When a launcher attempt is already running, status scopes the top-level blocking edge to that active attempt instead of letting stale relaunch metadata or unrelated closure tasks dominate the wave-level view.
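The split between blocking and informative kinds described above can be sketched as a filter. The kind names come from this page; combining the kind check with the open-status check, and the `kind`/`status` field names, are assumptions for illustration rather than the actual `wave control status` implementation.

```javascript
// Kinds that `wave control status` turns into blocking task edges, per the text above.
const BLOCKING_KINDS = new Set([
  "request",
  "blocker",
  "clarification-request",
  "human-feedback",
  "human-escalation",
]);
// Open statuses from the "Open Versus Resolved" section.
const OPEN = new Set(["open", "acknowledged", "in_progress"]);

// Assumption: a record blocks only while its kind is blocking AND its status is open.
function blockingEdges(records) {
  return records.filter((r) => BLOCKING_KINDS.has(r.kind) && OPEN.has(r.status));
}

const records = [
  { id: "r1", kind: "request", status: "open" },
  { id: "h1", kind: "handoff", status: "open" },      // visible, never blocking
  { id: "c1", kind: "clarification-request", status: "resolved" },
  { id: "e1", kind: "evidence", status: "open" },     // visible, never blocking
];
console.log(blockingEdges(records).map((r) => r.id));
```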
+### Step 3: Why A1 Can Be Done But The Wave Is Still Blocked
+This is the important distinction:
+- A1 may be done with A1's ownership
+- the wave is not done
+The launcher will still see:
+- an open helper assignment for the request
+- an integration summary that is not yet ready for doc closure
+So the wave remains blocked.
+In runtime terms, this becomes:
+- `helper-assignment-open` if the request has an assignee
+- `helper-assignment-unresolved` if no assignee could be found
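A minimal sketch of how those two runtime reasons could be derived from an open helper assignment. `helperBlockReason` and the `assignee` field are hypothetical names used only for this example.

```javascript
// Hypothetical helper: map an open helper assignment to the reason named above.
function helperBlockReason(assignment) {
  // `assignee` is an assumed field; a missing or empty value means no owner was found.
  return assignment.assignee
    ? "helper-assignment-open"
    : "helper-assignment-unresolved";
}

console.log(helperBlockReason({ requestId: "r1", assignee: "A8" }));
console.log(helperBlockReason({ requestId: "r2", assignee: null }));
```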
+### Step 4: A8 Resolves The Follow-Up
+A8 reads the shared summary and inbox, reconciles the issue, and updates the integration state.
+That usually means:
+- closing the targeted follow-up in coordination state
+- publishing a final integration position
+- emitting a final `[wave-integration] state=ready-for-doc-closure ...` marker only when no meaningful contradiction or blocker remains
+### Step 5: Closure Can Continue
+Only after that does the launcher allow the wave to move on to:
+1. documentation closure
+2. cont-QA closure
+So the correct mental model is:
+- A1 can finish first
+- A8 may still owe wave-level closure work
+- the wave does not pass just because the original implementation owner exited successfully
+## End-To-End Example: Clarification Chain
+Assume an agent cannot safely choose between two interpretations of a migration rule.
+The agent should emit a clarification request:
+```bash
+pnpm exec wave coord post \
+  --lane main \
+  --wave 6 \
+  --agent A3 \
+  --kind clarification-request \
+  --summary "Need policy answer for backward-compat migration path" \
+  --detail "I checked the current-state doc and migration plan, but the required compatibility window is still ambiguous."
+```
+What happens next:
+1. the launcher triages the clarification from repo policy, ownership, prior decisions, and routing context
+2. if it can answer inside the wave, it writes the resolution back into coordination state
+3. if another owner can answer it, the launcher opens a targeted follow-up request and keeps the clarification chain blocking
+4. only after policy and routed follow-up paths are exhausted does it create human feedback or escalation artifacts
+5. until that chain is resolved, clarification remains a closure barrier and any routed follow-up also remains blocking helper work
+Important implication:
+- even if code is landed, an open clarification chain can still block the wave
+- a routed clarification that stays `open` past the acknowledgement policy can be rerouted during the same live attempt instead of waiting for a full retry cycle
+- operators can now inspect and intervene through one command surface:
+```bash
+pnpm exec wave control status --lane main --wave 10 --agent A7 --json
+pnpm exec wave control task act reassign --lane main --wave 10 --id clarify-a7-rollout --to A1
+pnpm exec wave control task act resolve --lane main --wave 10 --id escalation-clarify-a7-rollout --detail "Published command surface covers this question."
+```
+That keeps clarification routing, dismissal, escalation, and human-answer handling inside the canonical coordination state instead of forcing ad hoc file edits.
+## End-To-End Example: Required Dependency
+Assume the wave needs another lane to land a required API first.
+That should be modeled as a required dependency ticket, not as a local deliverable.
+Example:
+```bash
+pnpm exec wave dep post \
+  --owner-lane release \
+  --requester-lane main \
+  --owner-wave 2 \
+  --requester-wave 4 \
+  --agent launcher \
+  --summary "Need release lane to publish session token contract before Wave 4 can close" \
+  --target capability:integration \
+  --required
+```
+What happens next:
+- the dependency appears in the per-wave dependency snapshot
+- integration and inboxes surface it
+- required inbound or outbound dependencies keep the wave blocked
+This is separate from helper assignment logic:
+- helper assignments are intra-wave follow-up work
+- dependencies are cross-wave or cross-lane prerequisites
+## What Integration Actually Does
+Integration is not a generic summary pass. It is the place where Wave asks:
+- are there still unresolved blockers?
+- do any agent claims contradict each other?
+- are there still proof gaps?
+- are there still deploy or infra risks?
+- are there still documentation gaps?
+- are helper assignments or dependencies still open?
+If any of those remain material, the recommendation is `needs-more-work`.
+Only when that synthesized state is clean does integration become `ready-for-doc-closure`.
+This is why integration sits between raw implementation success and final docs or QA closure.
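The questions integration asks can be read as a set of counters that must all be zero. The sketch below assumes a hypothetical `state` object with one counter per question; the real integration summary schema may differ, and "material" is simplified here to "count greater than zero".

```javascript
// One assumed counter per question integration asks above.
const INTEGRATION_CHECKS = [
  "openBlockers",
  "contradictions",
  "proofGaps",
  "deployRisks",
  "docGaps",
  "openHelperAssignments",
  "openDependencies",
];

// Returns the recommendation plus the checks that still have open items.
function integrationRecommendation(state) {
  const remaining = INTEGRATION_CHECKS.filter((key) => (state[key] ?? 0) > 0);
  return {
    recommendation: remaining.length === 0 ? "ready-for-doc-closure" : "needs-more-work",
    remaining,
  };
}

console.log(integrationRecommendation({ openHelperAssignments: 1 }));
console.log(integrationRecommendation({}));
```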
+## Why Closure Is Staged
+Closure runs in order:
+1. `cont-EVAL`
+2. optional security review
+3. integration
+4. documentation
+5. `cont-QA`
+That ordering exists to prevent false PASS outcomes.
+Examples:
+- `cont-EVAL` should not PASS if the declared eval contract is still unsatisfied
+- security should run before final closure if findings could still change integration or rollout readiness
+- documentation should not close while integration still says the story is unstable
+- cont-QA should be last, because it is supposed to judge the final landed state
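The ordering above can be sketched as a walk over the stages that stops at the first one that has not passed. The stage list follows this page; the `results` map, the `"pass"` value, and the `securityEnabled` option are assumptions for illustration.

```javascript
// Closure order as documented above; security review is optional.
const CLOSURE_STAGES = ["cont-EVAL", "security", "integration", "documentation", "cont-QA"];

// Returns the first stage that has not passed, or null when every stage passed.
// `results` is a hypothetical map from stage name to "pass" or anything else.
function firstBlockingStage(results, { securityEnabled = false } = {}) {
  for (const stage of CLOSURE_STAGES) {
    if (stage === "security" && !securityEnabled) continue;
    if (results[stage] !== "pass") return stage;
  }
  return null;
}

console.log(
  firstBlockingStage({ "cont-EVAL": "pass", integration: "needs-more-work" })
);
```

Because later stages are never consulted once an earlier one blocks, a stale `cont-QA` PASS cannot mask an unstable integration state in this model.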
+## What Each Closure Role Must Prove
+### Implementation Owners
+Implementation owners must prove their own exit contract, not just exit cleanly.
+That means:
+- proof state is `met`
+- completion, durability, and proof level meet the contract
+- documentation impact is reported correctly
+- all declared deliverables exist
+- all required proof artifacts exist
+### `cont-EVAL`
+`cont-EVAL` must emit a final `[wave-eval]` marker and satisfy the declared target and benchmark contract.
+For live closure, it is not enough to say "looks good." The target ids and benchmark ids must match the declared wave contract.
+### Security Review
+If present, security review must emit a final `[wave-security]` marker and publish its report artifact.
+- `blocked` stops the wave before integration
+- `concerns` remains visible in summaries and traces
+- `clear` is only valid when no unresolved findings or approvals remain
+### Integration
+Integration must reconcile cross-agent state and report `ready-for-doc-closure` only when there is no remaining meaningful contradiction, blocker, proof gap, or deploy risk.
+### Documentation Steward
+Documentation closure must emit `[wave-doc-closure]`.
+The important distinction is:
+- `closed` means the shared-plan delta was reconciled
+- `no-change` means no shared-plan changes were required
+- `delta` means documentation closure is still open
+### `cont-QA`
+`cont-QA` must emit:
+- a final verdict
+- a final `[wave-gate]` marker
+Final PASS requires all gate dimensions to pass in the final state.
+## Why The Closure Model Works
+The closure model is deliberately conservative.
+It works because it refuses to trust weak signals:
+- a process exiting successfully
+- a board note saying "done"
+- one agent claiming success while another still reports contradiction
+- stale prior attempt output
+Instead, it trusts machine-visible current state:
+- current coordination log state
+- current assignment and dependency snapshots
+- current integration summary
+- current docs closure state
+- current cont-QA and cont-EVAL markers
+- current proof artifacts and deliverables
+That gives Wave two useful properties:
+- already-valid work can stay reusable
+- the wave still refuses to PASS while open follow-up work remains
+## Targeted Retry Behavior
+When closure fails, the launcher does not always relaunch the entire wave.
+It tries to relaunch only the implicated owners:
+- agents named by the failure
+- sibling owners that still owe shared promoted-component proof after a landed owner already passed its slice
+- helper assignees
+- dependency owners where relevant
+- the closure stewards needed after that state changes
+That is why the system can safely reuse already-valid implementation slices while still forcing the wave to stay blocked until the right follow-up work is done.
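As a sketch of the selection described above, the relaunch set can be modeled as a de-duplicated union of the implicated owner groups. The input field names are hypothetical; how the launcher actually identifies each group is not shown here.

```javascript
// Hypothetical inputs: one array of agent ids per owner group listed above.
function rerunTargets({
  failedAgents = [],
  siblingOwners = [],
  helperAssignees = [],
  dependencyOwners = [],
  closureStewards = [],
} = {}) {
  // A Set keeps first-seen order and drops agents named by more than one group.
  return [
    ...new Set([
      ...failedAgents,
      ...siblingOwners,
      ...helperAssignees,
      ...dependencyOwners,
      ...closureStewards,
    ]),
  ];
}

console.log(
  rerunTargets({
    failedAgents: ["A2"],
    siblingOwners: ["A7"],
    helperAssignees: ["A8"],
    closureStewards: ["A8", "A0"],
  })
);
```

Agents outside the returned list keep their already-valid slices in this model.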
+Operators now have a first-class override path for that recovery flow:
+```bash
+pnpm exec wave control rerun get --lane main --wave 10 --json
+pnpm exec wave control rerun request --lane main --wave 10 --agent A2 --agent A7 --clear-reuse A2 --reason "Resume sibling-owned component closure"
+```
+The canonical rerun request is written under `.tmp/<lane>-wave-launcher/control-plane/`, projected to `.tmp/<lane>-wave-launcher/control/` for compatibility, consumed by the launcher on the next retry decision, and then cleared by default after one application. This is the supported path for:
+- rerunning only specific owners
+- preserving explicit reuse selectors such as attempt ids, proof bundle ids, derived-summary reuse, and invalidated component ids through the compatibility projection
+- clearing reuse for selected agents without wiping the whole wave state
+- resuming at the real remaining implementation owners instead of restarting or stopping at the wrong sibling
+## Common Mistakes
+- Treating `exit 0` as wave completion.
+- Using a board note or `handoff` when the work should be a blocking targeted `request`.
+- Modeling cross-agent follow-up as a deliverable instead of coordination work.
+- Declaring integration ready while helper assignments or dependencies are still open.
+- Treating documentation closure as optional after plan-affecting outcomes.
+- Treating `cont-QA` as an implementation reviewer instead of the final closure gate.
+## Practical Rule Of Thumb
+Ask two questions:
+1. "Did this agent finish its own owned outputs?"
+2. "Is the wave globally coherent enough that no other blocking owner still owes follow-up work?"
+Wave only closes when both are true.

package/docs/reference/live-proof-waves.md CHANGED

@@ -7,6 +7,8 @@ summary: "How to author proof-first `pilot-live` and higher-maturity waves with
 `pilot-live`, `fleet-ready`, `cutover-ready`, and `deprecation-ready` waves are not normal repo-only implementation waves.
+For the general runtime model behind helper requests, integration, and final staged closure, see [docs/reference/coordination-and-closure.md](./coordination-and-closure.md).
 For these waves:
 - operator-run commands are part of closure
@@ -148,9 +150,29 @@ When new proof artifacts arrive after an earlier failed attempt, the right respo
 Typical pattern:
 1. operator captures the missing proof bundle locally
-2. the proof owner reruns on the same executor
-3. any stale synthesis or integration owner reruns if needed
-4. already-valid implementation slices stay reused
+2. operator can register that bundle directly:
+```bash
+pnpm exec wave control proof register \
+  --lane main \
+  --wave 8 \
+  --agent A6 \
+  --artifact .tmp/wave-8-learning-proof/learning-plane-before-restart.json \
+  --artifact .tmp/wave-8-learning-proof/learning-plane-after-restart.json \
+  --authoritative \
+  --satisfy-owned-components \
+  --completion live \
+  --durability durable \
+  --proof-level live \
+  --doc-delta owned \
+  --detail "Operator captured and verified restart evidence."
+```
+3. the proof owner reruns on the same executor only if additional synthesis is still needed
+4. any stale integration or closure owner reruns if needed
+5. already-valid implementation slices stay reused
+Authoritative proof registration is the supported way to make operator-produced evidence visible to A8, A0, rerun control, and hermetic traces without forcing an implementation agent to rediscover the same local artifacts in a fresh session. The canonical proof bundle now lands in `.tmp/<lane>-wave-launcher/control-plane/` and is projected into `.tmp/<lane>-wave-launcher/proof/` for compatibility.
 ## Suggested Eval Targets For Live-Proof Waves

package/docs/reference/npmjs-trusted-publishing.md CHANGED

@@ -2,7 +2,7 @@
 This repo now includes a dedicated npmjs publish workflow at [publish-npm.yml](../../.github/workflows/publish-npm.yml).
-The current `0.6.1` release procedure publishes through a repository Actions secret named `NPM_TOKEN`.
+The current `0.7.1` release procedure publishes through a repository Actions secret named `NPM_TOKEN`.
 ## What This Repo Already Does
@@ -18,7 +18,7 @@ The current `0.6.1` release procedure publishes through a repository Actions sec
    - package or scope access for `@chllming/wave-orchestration`
    - `Read and write` permission
    - `Bypass 2FA` enabled
-2. In the GitHub repo `chllming/wave-orchestration`, add that token as an Actions secret named `NPM_TOKEN`.
+2. In the GitHub repo `chllming/agent-wave-orchestrator`, add that token as an Actions secret named `NPM_TOKEN`.
 3. Rotate or revoke the token when no longer needed.
 ## GitHub Workflow Behavior
@@ -47,6 +47,6 @@ If this repo later needs private npm dependencies during CI, consider a separate
 1. Confirm [publish-npm.yml](../../.github/workflows/publish-npm.yml) is on the default branch.
 2. Confirm `NPM_TOKEN` exists in the GitHub repo secrets.
 3. Confirm the package version has been bumped and committed.
-4. Push the release commit and release tag, for example `v0.6.1`.
+4. Push the release commit and release tag, for example `v0.7.1`.
 5. Verify both `publish-npm.yml` and `publish-package.yml` start from the tag push.
 6. Verify the npmjs publish completes successfully for the tagged source.

package/docs/reference/proof-metrics.md ADDED

@@ -0,0 +1,90 @@
+---
+title: "Proof Metrics"
+summary: "How Wave maps README multi-agent failure modes to concrete runtime telemetry and benchmark evidence."
+---
+# Proof Metrics
+This document turns the README failure cases into concrete proof obligations.
+Wave does not treat these as narrative quality goals. The point of native telemetry is to gather enough durable evidence that we can answer:
+- did the runtime behave as intended
+- which proof signals back that claim
+- where the system still fails or only partially proves the claim
+For the event and artifact contract, see [wave-control.md](./wave-control.md).
+## Signal Map
+| Failure case | Native telemetry to inspect | Benchmark evidence to inspect | What success should look like |
+| --- | --- | --- | --- |
+| `Cosmetic board, no canonical state` | `coordination_record`, `wave_run`, `attempt`, `artifact`, trace bundle metadata, control-plane raw log | `benchmark_run` attestation plus linked trace metadata for `full-wave` arms | The board, shared summary, and dashboards are projections over a durable JSONL/event trail, not the only record |
+| `Hidden evidence never gets pooled` | evidence refs in `coordination_record`, proof-bundle artifacts, integration summary artifacts, closure timeline | `benchmark_item` review validity plus linked proof/verification artifacts | Decision-changing evidence can be traced from the owner agent into shared summary, integration, and final closure |
+| `Communication without global-state reconstruction` | `gate` snapshots, integration summary artifacts, contradiction-repair traces, attempt timeline | distributed-reasoning benchmark items and validity buckets | Shared state converges on the correct integrated recommendation rather than only showing message traffic |
+| `Simultaneous coordination collapse` | coordination backlog counts, open blockers, request/ack timing from task snapshots, dependency and helper-assignment barriers | `benchmark_item` wall clock, timeout reviews, harness-vs-model validity split | Multiple active blockers and cross-owner dependencies stay visible and closure is blocked until they resolve |
+| `Expert signal gets averaged away` | targeted routing in assignments, `coordination_record.targets`, final owner on accepted recommendation, reroute history | task-level arm telemetry and benchmark outcome grouped by routing-heavy tasks | The accepted recommendation still comes from the appropriate owner or shows an explicit override reason |
+| `Contradictions get smoothed over` | `gate` artifacts, contradiction-related coordination records, proof bundle supersession chain, retry/rerun control events | `review` validity and contradiction-oriented benchmark families | Material conflicts remain explicit and either produce repair work or block PASS |
+| `Premature closure` | `gate` transitions, `proof_bundle`, `attempt`, `review`, final `wave_run` state, trace `outcome.json` | `review` validity buckets like `proof-blocked`, `benchmark-invalid`, and `trustworthy-model-failure` | PASS only appears after proof completeness, integration, and closure stewardship agree; reopen/rerun remains visible when PASS was premature |
+## Native Benchmark Metrics As Proof
+`wave benchmark run` is the native proof surface for the coordination substrate. It matters because it lets us evaluate the Wave mechanics directly before live-model noise, runtime variance, or external harness issues enter the picture.
+The native metric groups line up with the README claims:
+- evidence pooling:
+  `distributed-info-accuracy`, `global-state-reconstruction-rate`, and `communication-reasoning-gap` tell us whether distributed facts became one correct integration-visible state instead of remaining split across private owner views
+- projection fidelity:
+  `summary-fact-retention-rate`, `projection-consistency-rate`, `targeted-inbox-recall`, and `integration-coherence-rate` tell us whether the blackboard projections stayed faithful enough to be useful
+- routing quality:
+  `capability-routing-precision`, `expert-preservation-rate`, and `expert-performance-gap` tell us whether specialization survives routing and synthesis
+- contradiction handling:
+  `contradiction-detection-rate`, `repair-closure-rate`, and `false-consensus-rate` tell us whether conflicts become explicit repair work instead of narrative consensus
+- closure discipline:
+  `latent-asymmetry-surfacing-rate` and `premature-convergence-rate` tell us whether the system notices missing evidence and keeps closure blocked until it is integrated
+- simultaneous coordination:
+  `deadlock-rate`, `contention-resolution-rate`, and `symmetry-breaking-rate` tell us whether the team can coordinate under concurrent blockers rather than collapsing into lockstep failure
+These metrics matter because Wave's core promise is not just "many agents talked." The promise is that the system reconstructs shared state, routes work intelligently, preserves important evidence through projections, and refuses to close while critical uncertainty remains.
+The deterministic runner is strict about that distinction:
+- global reconstruction is scored from integration-visible artifacts, not the union of every inbox
+- clarification surfacing is scored from explicit record ids, so a metric only moves when the missing-evidence record is actually preserved in the generated artifacts
+- family summaries and deltas are direction-aligned, so lower-is-better guard metrics do not invert the headline comparison
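Direction alignment can be illustrated with a small helper that flips the sign of the delta for lower-is-better metrics, so a positive value always means "improved". Which metrics count as lower-is-better is an assumption here; the set below is illustrative and uses three of the rate names from the metric groups above.

```javascript
// Assumption: these guard metrics improve as they go down. The set is illustrative.
const LOWER_IS_BETTER = new Set([
  "false-consensus-rate",
  "premature-convergence-rate",
  "deadlock-rate",
]);

// Positive result means the candidate improved on the baseline, whatever the direction.
function alignedDelta(metric, baseline, candidate) {
  const raw = candidate - baseline;
  return LOWER_IS_BETTER.has(metric) ? -raw : raw;
}

console.log(alignedDelta("repair-closure-rate", 0.25, 0.5));  // higher is better
console.log(alignedDelta("deadlock-rate", 0.5, 0.25));        // lower is better
```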
+## Native Views To Build Around
+The minimum useful derived views are:
+- closure fidelity:
+  track gate transitions, proof completeness, blocked reasons, and any rerun after a would-be PASS
+- evidence pooling:
+  track whether integration and closure cite the proof artifacts and evidence refs that mattered
+- contradiction handling:
+  track open conflicts, superseded proof bundles, repair work, and unresolved contradiction count at finish
+- coordination pressure:
+  track open tasks, human escalations, stale clarifications, assignment lag, and dependency barriers
+- benchmark trust:
+  keep verifier/setup invalidation separate from real capability failure
+## Recommended Success Criteria
+For a run to count as evidence that Wave is working as intended, prefer all of the following:
+1. The run has a durable `wave_run` plus `attempt` timeline.
+2. The trace bundle contains `run-metadata.json`, `quality.json`, and `outcome.json`.
+3. Closure evidence is visible through `gate` and `proof_bundle` events rather than only markdown text.
+4. If the run includes a benchmark, the result has explicit `benchmark_run`, `benchmark_item`, `verification`, and `review` records.
+5. Invalid or unpublishable benchmark outcomes are still retained, but labeled as such.
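Criteria 1 through 4 lend themselves to a mechanical check. The sketch below assumes the caller has already collected the event kinds seen in a run and the file names in its trace bundle; `missingRunEvidence` and its input shape are hypothetical, and criterion 5 (labeling invalid outcomes) is not covered.

```javascript
// Names taken from the criteria above.
const REQUIRED_TRACE_FILES = ["run-metadata.json", "quality.json", "outcome.json"];
const REQUIRED_EVENTS = ["wave_run", "attempt", "gate", "proof_bundle"];
const BENCHMARK_EVENTS = ["benchmark_run", "benchmark_item", "verification", "review"];

// Returns which required event kinds and trace files are absent from a run.
function missingRunEvidence({ eventKinds = [], traceFiles = [], hasBenchmark = false } = {}) {
  const kinds = new Set(eventKinds);
  const files = new Set(traceFiles);
  const wanted = hasBenchmark ? [...REQUIRED_EVENTS, ...BENCHMARK_EVENTS] : REQUIRED_EVENTS;
  return {
    events: wanted.filter((kind) => !kinds.has(kind)),
    files: REQUIRED_TRACE_FILES.filter((file) => !files.has(file)),
  };
}

console.log(
  missingRunEvidence({
    eventKinds: ["wave_run", "attempt", "gate"],
    traceFiles: ["run-metadata.json", "outcome.json"],
  })
);
```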
+## Current Limits
+Current telemetry proves more than the old file-by-file reporting, but it still has gaps:
+- v1 tracks evidence refs and artifact lineage at the event/artifact level, not stable fact ids
+- expert-routing proof currently comes from assignment/reroute ownership and accepted final owner, not a dedicated expert-override schema
+- contradiction evidence is visible through gate state, review disposition, and coordination records, but not yet as a standalone normalized contradiction entity
+Those gaps should be treated as visibility work, not as permission to fall back to narrative-only conclusions.