cool-workflow 0.1.78

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (193)

package/.claude-plugin/plugin.json +20 -0
package/.codex-plugin/mcp.json +10 -0
package/.codex-plugin/plugin.json +38 -0
package/.mcp.json +10 -0
package/LICENSE +24 -0
package/README.md +638 -0
package/apps/architecture-review/app.json +51 -0
package/apps/architecture-review/workflow.js +116 -0
package/apps/end-to-end-golden-path/app.json +30 -0
package/apps/end-to-end-golden-path/workflow.js +33 -0
package/apps/pr-review-fix-ci/app.json +59 -0
package/apps/pr-review-fix-ci/workflow.js +90 -0
package/apps/release-cut/app.json +54 -0
package/apps/release-cut/workflow.js +82 -0
package/apps/research-synthesis/app.json +50 -0
package/apps/research-synthesis/workflow.js +76 -0
package/apps/workflow-app-framework-demo/app.json +29 -0
package/apps/workflow-app-framework-demo/workflow.js +44 -0
package/dist/agent-config.js +223 -0
package/dist/candidate-scoring.js +715 -0
package/dist/capability-core.js +630 -0
package/dist/capability-dispatcher.js +86 -0
package/dist/capability-registry.js +523 -0
package/dist/cli.js +1276 -0
package/dist/collaboration.js +727 -0
package/dist/commit.js +570 -0
package/dist/contract-migration.js +234 -0
package/dist/coordinator.js +1163 -0
package/dist/daemon.js +44 -0
package/dist/dispatch.js +201 -0
package/dist/drive.js +503 -0
package/dist/error-feedback.js +415 -0
package/dist/evidence-grounding.js +179 -0
package/dist/evidence-reasoning.js +733 -0
package/dist/execution-backend.js +1279 -0
package/dist/harness.js +61 -0
package/dist/mcp-server.js +1615 -0
package/dist/multi-agent-eval.js +857 -0
package/dist/multi-agent-host.js +764 -0
package/dist/multi-agent-operator-ux.js +537 -0
package/dist/multi-agent-trust.js +366 -0
package/dist/multi-agent.js +1173 -0
package/dist/node-snapshot.js +270 -0
package/dist/observability.js +922 -0
package/dist/operator-ux.js +971 -0
package/dist/orchestrator/audit-operations.js +182 -0
package/dist/orchestrator/candidate-operations.js +117 -0
package/dist/orchestrator/cli-options.js +288 -0
package/dist/orchestrator/collaboration-operations.js +86 -0
package/dist/orchestrator/feedback-operations.js +81 -0
package/dist/orchestrator/host-operations.js +78 -0
package/dist/orchestrator/lifecycle-operations.js +462 -0
package/dist/orchestrator/migration-operations.js +44 -0
package/dist/orchestrator/multi-agent-operations.js +362 -0
package/dist/orchestrator/report.js +369 -0
package/dist/orchestrator/topology-operations.js +84 -0
package/dist/orchestrator.js +874 -0
package/dist/pipeline-contract.js +92 -0
package/dist/pipeline-runner.js +285 -0
package/dist/reclamation.js +882 -0
package/dist/result-normalize.js +194 -0
package/dist/run-export.js +64 -0
package/dist/run-registry.js +1347 -0
package/dist/run-state-schema.js +67 -0
package/dist/sandbox-profile.js +471 -0
package/dist/scheduler.js +266 -0
package/dist/scheduling.js +184 -0
package/dist/schema-validate.js +98 -0
package/dist/state-explosion.js +1213 -0
package/dist/state-migrations.js +463 -0
package/dist/state-node.js +301 -0
package/dist/state.js +308 -0
package/dist/telemetry-attestation.js +156 -0
package/dist/telemetry-ledger.js +145 -0
package/dist/topology.js +527 -0
package/dist/triggers.js +159 -0
package/dist/trust-audit.js +475 -0
package/dist/types/blackboard.js +2 -0
package/dist/types/boundary.js +29 -0
package/dist/types/candidate.js +2 -0
package/dist/types/collaboration.js +2 -0
package/dist/types/core.js +2 -0
package/dist/types/drive.js +10 -0
package/dist/types/error-feedback.js +2 -0
package/dist/types/evidence-reasoning.js +2 -0
package/dist/types/execution-backend.js +2 -0
package/dist/types/multi-agent.js +2 -0
package/dist/types/observability.js +2 -0
package/dist/types/pipeline.js +2 -0
package/dist/types/reclamation.js +8 -0
package/dist/types/result.js +2 -0
package/dist/types/run-registry.js +2 -0
package/dist/types/run.js +2 -0
package/dist/types/sandbox.js +2 -0
package/dist/types/schedule.js +2 -0
package/dist/types/state-node.js +2 -0
package/dist/types/topology.js +2 -0
package/dist/types/trust.js +2 -0
package/dist/types/workbench.js +2 -0
package/dist/types/worker.js +2 -0
package/dist/types/workflow-app.js +2 -0
package/dist/types.js +43 -0
package/dist/verifier-registry.js +46 -0
package/dist/verifier.js +78 -0
package/dist/version.js +8 -0
package/dist/workbench-host.js +172 -0
package/dist/workbench.js +190 -0
package/dist/worker-isolation.js +1028 -0
package/dist/workflow-api.js +98 -0
package/dist/workflow-app-framework.js +626 -0
package/docs/agent-delegation-drive.7.md +190 -0
package/docs/agent-framework.md +176 -0
package/docs/candidate-scoring.7.md +106 -0
package/docs/canonical-workflow-apps.7.md +137 -0
package/docs/capability-topology-registry.7.md +168 -0
package/docs/cli-mcp-parity.7.md +373 -0
package/docs/contract-migration-tooling.7.md +123 -0
package/docs/control-plane-scheduling.7.md +110 -0
package/docs/coordinator-blackboard.7.md +183 -0
package/docs/dogfood/architecture-review-cool-workflow.md +16 -0
package/docs/dogfood-one-real-repo.7.md +168 -0
package/docs/durable-state-and-locking.7.md +107 -0
package/docs/end-to-end-golden-path.7.md +117 -0
package/docs/error-feedback.7.md +153 -0
package/docs/evidence-adoption-reasoning-chain.7.md +270 -0
package/docs/execution-backends.7.md +300 -0
package/docs/getting-started.md +99 -0
package/docs/index.md +41 -0
package/docs/mcp-app-surface.7.md +235 -0
package/docs/multi-agent-cli-mcp-surface.7.md +265 -0
package/docs/multi-agent-eval-replay-harness.7.md +302 -0
package/docs/multi-agent-operator-ux.7.md +314 -0
package/docs/multi-agent-runtime-core.7.md +231 -0
package/docs/multi-agent-topologies.7.md +103 -0
package/docs/multi-agent-trust-policy-audit.7.md +154 -0
package/docs/node-snapshot-diff-replay.7.md +135 -0
package/docs/observability-cost-accounting.7.md +194 -0
package/docs/operator-ux.7.md +180 -0
package/docs/pipeline-runner.7.md +136 -0
package/docs/project-index.md +261 -0
package/docs/real-execution-backends.7.md +142 -0
package/docs/release-and-migration.7.md +280 -0
package/docs/release-tooling.7.md +159 -0
package/docs/routines.md +48 -0
package/docs/run-registry-control-plane.7.md +312 -0
package/docs/run-retention-reclamation.7.md +191 -0
package/docs/sandbox-profiles.7.md +137 -0
package/docs/scheduled-tasks.md +80 -0
package/docs/security-trust-hardening.7.md +117 -0
package/docs/state-explosion-management.7.md +264 -0
package/docs/state-node.7.md +96 -0
package/docs/team-collaboration.7.md +207 -0
package/docs/unix-principles.md +192 -0
package/docs/verifier-gated-commit.7.md +140 -0
package/docs/web-desktop-workbench.7.md +215 -0
package/docs/worker-isolation.7.md +167 -0
package/docs/workflow-app-framework.7.md +274 -0
package/manifest/README.md +43 -0
package/manifest/plugin.manifest.json +316 -0
package/manifest/pricing.policy.json +14 -0
package/package.json +79 -0
package/scripts/agents/claude-p-agent.js +104 -0
package/scripts/agents/claude-p-agent.sh +9 -0
package/scripts/agents/cw-attest-keygen.js +55 -0
package/scripts/agents/cw-attest-wrap.js +143 -0
package/scripts/block-unapproved-tag.sh +39 -0
package/scripts/bump-version.js +249 -0
package/scripts/canonical-apps.js +171 -0
package/scripts/cw.js +4 -0
package/scripts/dist-drift-check.js +79 -0
package/scripts/dogfood-architecture-review.js +237 -0
package/scripts/dogfood-release.js +624 -0
package/scripts/forward-ref-docs.js +73 -0
package/scripts/gen-manifests.js +232 -0
package/scripts/golden-path.js +300 -0
package/scripts/mcp-server.js +4 -0
package/scripts/new-feature.js +121 -0
package/scripts/parity-check.js +213 -0
package/scripts/release-check.js +118 -0
package/scripts/release-flow.js +272 -0
package/scripts/release-gate.sh +85 -0
package/scripts/sync-project-index.js +387 -0
package/scripts/validate-run-state-schema.js +126 -0
package/scripts/verify-container-selfref.js +64 -0
package/scripts/version-sync-check.js +237 -0
package/skills/cool-workflow/SKILL.md +162 -0
package/skills/cool-workflow/references/commands.md +282 -0
package/tsconfig.json +16 -0
package/ui/workbench/app.css +76 -0
package/ui/workbench/app.js +159 -0
package/ui/workbench/index.html +32 -0
package/workflows/architecture-review.workflow.js +84 -0
package/workflows/research-synthesis.workflow.js +47 -0

package/docs/execution-backends.7.md ADDED

@@ -0,0 +1,300 @@
+# EXECUTION-BACKENDS(7)
+## NAME
+Execution Backends - pluggable, swappable execution drivers for Cool Workflow (v0.1.29)
+## SYNOPSIS
+```text
+node dist/cli.js backend list
+node dist/cli.js backend show shell
+node dist/cli.js backend probe container
+node dist/cli.js dispatch <run-id> --sandbox readonly --backend shell
+node dist/cli.js worker manifest <run-id> <worker-id>
+```
+## DESCRIPTION
+An execution backend is a CW driver: a thin adapter that runs a dispatched
+task/worker somewhere, under the requested sandbox profile, and records a
+canonical result envelope plus a sandbox attestation. v0.1.29 lifts execution
+out of the kernel into this driver layer.
+The model is a BSD VFS / device-driver layer. There is ONE narrow
+`ExecutionBackend` interface (the mechanism) and many interchangeable drivers
+(`node`, `bun`, `shell`, `container`, `remote`, `ci`). The kernel —
+orchestrator, dispatch, and pipeline-runner — never learns which backend ran a
+task. WHAT to run and which evidence to record is kernel policy; HOW and WHERE
+it runs is the driver's concern.
+```text
+selected backend -> sandbox attestation -> execution/delegation -> canonical envelope
+```
+The result envelope, evidence refs, and provenance a task produces are
+schema-identical no matter which backend ran it. The backend id and its sandbox
+attestation are recorded AS provenance, so eval/replay, the verifier gates, and
+the v0.1.28 run registry do not care which backend executed a run.
+## THE CONTRACT
+The `ExecutionBackend` interface is three members:
+```text
+descriptor   the capability descriptor: which sandbox dimensions it enforces vs
+             attests, local vs remote, kind (local/delegating), readiness
+probe(ctx)   live, deterministic readiness check
+run(request) execute (or delegate) under a sandbox profile and return a
+             canonical ExecutionResultEnvelope { status, result, evidence, provenance }
+```
+`run` takes a dispatch/worker manifest plus a resolved sandbox profile and
+returns `{ result, evidence }` (byte-stable across backends) and `provenance`
+(backend id + `SandboxAttestation` + optional delegation handle).
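Reviewer's sketch (not part of the diffed file): the three-member contract above could be typed roughly as follows. Only the member names (`descriptor`, `probe`, `run`) and the envelope fields (`status`, `result`, `evidence`, `provenance`) come from the man page; every other name, field, and value here (`BackendDescriptor`, `RunRequest`, `toyBackend`, the `"completed"` status) is an assumption made for illustration.

```typescript
// Illustrative only: member names and envelope fields follow the man page;
// all other names, fields, and values are assumptions.
type Dimension = "read" | "write" | "command" | "network" | "env";
type Support = "enforce" | "attest" | "unsupported";

interface BackendDescriptor {
  id: string;
  kind: "local" | "delegating";
  locality: "local" | "remote";
  dimensions: Record<Dimension, Support>;
}

interface RunRequest {
  command: string[];
  sandboxProfileId: string;
}

interface ExecutionResultEnvelope {
  status: "completed" | "refused"; // "completed" is an assumed value
  result: { summary: string };
  evidence: string[];
  provenance: { backendId: string; attestationStatus: string };
}

interface ExecutionBackend {
  descriptor: BackendDescriptor;
  probe(ctx: { cwd: string }): { ready: boolean };
  run(request: RunRequest): ExecutionResultEnvelope;
}

// A toy driver that executes nothing; it only demonstrates the contract shape.
const toyBackend: ExecutionBackend = {
  descriptor: {
    id: "toy",
    kind: "local",
    locality: "local",
    dimensions: { read: "attest", write: "attest", command: "enforce", network: "attest", env: "enforce" },
  },
  probe: () => ({ ready: true }),
  run: (request) => ({
    status: "completed",
    result: { summary: `would run: ${request.command.join(" ")}` },
    evidence: [`command:${request.command.join(" ")}`],
    provenance: { backendId: "toy", attestationStatus: "enforced" },
  }),
};

const envelope = toyBackend.run({ command: ["node", "dist/cli.js", "list"], sandboxProfileId: "readonly" });
console.log(envelope.status, envelope.provenance.backendId);
```

Keeping the backend id inside `provenance` rather than `result` mirrors the document's point that the result stays the same regardless of which driver ran.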
+## THE SANDBOX PROFILE IS THE CONTRACT
+Every backend MUST honor the five sandbox-profile dimensions: read, write,
+command, network, env. For each dimension a driver declares one of:
+`enforce`
+: the driver actively restricts the dimension at execution time.
+`attest`
+: the driver records a verifiable claim but relies on the host/runner to
+  enforce it (mirrors the existing `sandbox.hostRequired` split).
+`unsupported`
+: the driver can neither enforce nor attest it.
+A profile requires a dimension when it restricts it (`command` when
+`execute.mode != any`, `network` when `network.mode != any`, `env` when
+`env.inherit` is false; read/write are always bounded). If a required dimension
+is `unsupported`, or the backend is not ready, or the command is denied by the
+profile, the backend FAILS CLOSED: `run` returns `status: "refused"` with an
+attestation whose `status` is `refused`. It never silently downgrades to an
+unsandboxed execution.
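Reviewer's sketch (not part of the diffed file): the "required dimension" rule above can be read as a small pure function. The field names `execute.mode`, `network.mode`, and `env.inherit` are taken from the text; the `SandboxProfile` shape, the function names, and mode values other than `any` are assumptions.

```typescript
type Dimension = "read" | "write" | "command" | "network" | "env";
type Support = "enforce" | "attest" | "unsupported";

// Hypothetical profile shape; only the three field paths are from the man page.
interface SandboxProfile {
  execute: { mode: string };
  network: { mode: string };
  env: { inherit: boolean };
}

// read/write are always bounded; the others are required only when restricted.
function requiredDimensions(profile: SandboxProfile): Dimension[] {
  const required: Dimension[] = ["read", "write"];
  if (profile.execute.mode !== "any") required.push("command");
  if (profile.network.mode !== "any") required.push("network");
  if (!profile.env.inherit) required.push("env");
  return required;
}

// Required dimensions that the driver can neither enforce nor attest.
// A non-empty result means the backend must refuse (fail closed).
function unenforceableDimensions(
  profile: SandboxProfile,
  support: Record<Dimension, Support>,
): Dimension[] {
  return requiredDimensions(profile).filter((d) => support[d] === "unsupported");
}

const strictProfile: SandboxProfile = {
  execute: { mode: "allowlist" }, // assumed mode name
  network: { mode: "none" },      // assumed mode name
  env: { inherit: false },
};
const partialDriver: Record<Dimension, Support> = {
  read: "attest", write: "attest", command: "enforce", network: "unsupported", env: "enforce",
};
console.log(unenforceableDimensions(strictProfile, partialDriver));
```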
+## DRIVERS
+`node` (default)
+: Reproduces pre-v0.1.29 behavior exactly. The host runs the worker in-process
+  under CW's worker-output acceptance (a delegate-host execution). When it
+  executes a command it enforces command + env via the Node child process and
+  attests read/write/network to the host.
+`bun`
+: Node-compatible by default, Bun-friendly. Executes via the Node-compatible
+  runtime so evidence is byte-stable with `node`, and attests Bun availability
+  in provenance. Enforces command + env; attests read/write/network.
+`shell`
+: Runs a command/worker via the system shell (`/bin/sh -c`) under the sandbox
+  contract. Enforces command + env; attests read/write/network.
+`container`
+: Delegates to a container runtime (docker/podman) and records the
+  `image@digest` handle + attestation + result. A container can enforce all
+  five dimensions. Fails closed when no image is supplied.
+`remote`
+: Delegates to a remote runner and records the endpoint + job handle +
+  attestation + result. Fails closed when no endpoint is configured
+  (`CW_REMOTE_ENDPOINT` or `--endpoint`).
+`ci`
+: Delegates to a CI runner and records the job handle + attestation + result.
+  Fails closed when no CI job target is configured (`CW_CI_ENDPOINT` or
+  `--job`).
+CW DELEGATES; IT DOES NOT BECOME THE EXECUTOR. The local drivers run a thin
+child process to capture verifiable evidence (exit code + an output digest). The
+container/remote/ci drivers delegate and record a handle + attestation +
+result; they never reimplement a container runtime or a CI system.
+## SELECTION
+Backend selection parallels `--sandbox`:
+```text
+--backend <id>   (flag)   > CW_BACKEND   (env)   > node   (default)
+```
+Selection is recorded in run state (dispatch manifest, worker scope, worker
+manifest, the RunDispatch) and surfaced in the v0.1.28 run registry as the
+record's `backends` field. A per-task `backendId` overrides the run default.
+`backend list|show|probe` and the `--backend` flag are declared once in
+`src/capability-registry.ts`, so `cw <cmd> --json` and `cw_<cmd>` render one
+data source and pass the v0.1.27 parity gate.
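Reviewer's sketch (not part of the diffed file): the precedence described above (flag over `CW_BACKEND` over the `node` default, with a per-task `backendId` overriding the run default) might look like this. The function name, the option names, and the choice to treat empty strings as unset are assumptions.

```typescript
// Hypothetical option bag; only CW_BACKEND, --backend, backendId, and the
// "node" default are named in the man page.
interface SelectionInput {
  flag?: string;                               // value of --backend, if given
  env?: Record<string, string | undefined>;    // e.g. process.env
  taskBackendId?: string;                      // per-task override
}

// Assumption: an empty string counts as "not set".
function nonEmpty(value: string | undefined): string | undefined {
  return value !== undefined && value.length > 0 ? value : undefined;
}

function resolveBackendId(input: SelectionInput): string {
  const runDefault =
    nonEmpty(input.flag) ?? nonEmpty(input.env?.["CW_BACKEND"]) ?? "node";
  return nonEmpty(input.taskBackendId) ?? runDefault;
}

console.log(resolveBackendId({ flag: "shell", env: { CW_BACKEND: "bun" } }));
```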
+## EVIDENCE PARITY
+The canonical evidence a local driver records for a command run is
+backend-independent:
+```text
+command:<command + args>
+exitCode:<code>
+stdoutSha256:sha256:<hex>
+```
+Running CW's own self-verify (`node dist/cli.js list`) through `node`, `shell`,
+and `bun` yields byte-identical `result` and `evidence`; only
+`provenance.backendId` (and the attestation detail) differs. The
+`test/execution-backends-smoke.js` gate proves this, proves the fail-closed
+refusals, proves the recorded provenance and delegation handles, and proves the
+verifier/registry stay backend-agnostic.
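Reviewer's sketch (not part of the diffed file): the three evidence lines above can be derived from a command, its exit code, and its stdout. The `command:`, `exitCode:`, and `stdoutSha256:sha256:` prefixes are from the text; joining the command and args with single spaces, hashing stdout as UTF-8, and the helper names are assumptions.

```typescript
import { createHash } from "node:crypto";

// Builds the backend-independent evidence lines for one command run.
function canonicalEvidence(command: string[], exitCode: number, stdout: string): string[] {
  const hex = createHash("sha256").update(stdout, "utf8").digest("hex");
  return [
    `command:${command.join(" ")}`, // assumed join format
    `exitCode:${exitCode}`,
    `stdoutSha256:sha256:${hex}`,
  ];
}

// Two hypothetical drivers that observed the same command, exit code, and stdout.
function record(backendId: string, stdout: string) {
  return {
    evidence: canonicalEvidence(["node", "dist/cli.js", "list"], 0, stdout),
    provenance: { backendId },
  };
}

const viaNode = record("node", "apps\n");
const viaShell = record("shell", "apps\n");
const sameEvidence = JSON.stringify(viaNode.evidence) === JSON.stringify(viaShell.evidence);
console.log(sameEvidence, viaNode.provenance.backendId, viaShell.provenance.backendId);
```

Because the evidence is a function of the observed output only, any difference between backends shows up in `provenance`, which matches the parity claim above.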
+## ATTESTATION SHAPE
+```json
+{
+  "backendId": "shell",
+  "locality": "local",
+  "kind": "local",
+  "sandboxProfileId": "readonly",
+  "required": ["read", "write", "network", "env"],
+  "enforced": ["command", "env"],
+  "attested": ["read", "write", "network"],
+  "unenforceable": [],
+  "status": "enforced",
+  "enforcedByCW": ["..."],
+  "hostRequired": ["..."]
+}
+```
+A delegating driver additionally records `handle` (e.g.
+`{ "kind": "container", "ref": "img@sha256:..." }`).
+## FILES
+```text
+.cw/runs/<run-id>/state.json
+.cw/runs/<run-id>/dispatches/<dispatch-id>.json
+.cw/runs/<run-id>/workers/<worker-id>/worker.json
+.cw/runs/<run-id>/workers/<worker-id>/manifest.json
+.cw/registry/index.json
+```
+## FAILURE MODES
+Unknown backends fail closed with `backend-not-found` (CLI/dispatch/`CW_BACKEND`).
+`run` returns `status: "refused"` with `attestation.status: "refused"` when:
+- the command is denied by the sandbox profile (`sandbox-command-denied`),
+- a required sandbox dimension is `unsupported` (`sandbox-unenforceable`),
+- a local backend is not ready (`backend-not-ready`),
+- a delegating backend has no delegation target (`delegation-target-missing`).
+CW never silently downgrades a requested backend, and never runs a task
+unsandboxed when the requested profile cannot be honored.
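Reviewer's sketch (not part of the diffed file): the four refusal conditions listed above can be expressed as one preflight check. The refusal codes and the `refused` statuses are from the text; the `Preflight` shape, the order in which the conditions are checked, and the `"completed"` success status are assumptions.

```typescript
type RefusalCode =
  | "sandbox-command-denied"
  | "sandbox-unenforceable"
  | "backend-not-ready"
  | "delegation-target-missing";

// Hypothetical summary of what a driver knows before it runs anything.
interface Preflight {
  kind: "local" | "delegating";
  commandAllowed: boolean;
  unenforceable: string[];   // required dimensions the driver cannot honor
  ready: boolean;
  delegationTarget?: string; // image, endpoint, or job target
}

// Returns the first matching refusal code, or null when the run may proceed.
// The check order is an assumption; the man page does not specify one.
function refusalFor(p: Preflight): RefusalCode | null {
  if (!p.commandAllowed) return "sandbox-command-denied";
  if (p.unenforceable.length > 0) return "sandbox-unenforceable";
  if (p.kind === "local" && !p.ready) return "backend-not-ready";
  if (p.kind === "delegating" && !p.delegationTarget) return "delegation-target-missing";
  return null;
}

interface Outcome {
  status: string;
  attestation: { status: string };
  code?: RefusalCode;
}

// Fail closed: any refusal yields status "refused" and never falls back to an
// unsandboxed run.
function runOrRefuse(p: Preflight): Outcome {
  const code = refusalFor(p);
  if (code !== null) return { status: "refused", attestation: { status: "refused" }, code };
  return { status: "completed", attestation: { status: "enforced" } };
}

console.log(runOrRefuse({ kind: "delegating", commandAllowed: true, unenforceable: [], ready: true }));
```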
+## COMPATIBILITY
+Execution Backends are introduced in CW v0.1.29. The default (`node`) backend
+reproduces pre-v0.1.29 behavior exactly; runs with no backend selected keep
+working and old run state loads unchanged (the backend fields are additive and
+optional). The `ResultEnvelope` schema (`summary`, `findings`, `evidence`) is
+unchanged — the backend id and attestation live in provenance and run state,
+never in the result envelope.
+## SEE ALSO
+sandbox-profiles(7), worker-isolation(7), cli-mcp-parity(7),
+run-registry-control-plane(7), security-trust-hardening(7)
+## Web / Desktop Workbench (v0.1.30)
+v0.1.30 adds the Web / Desktop Workbench: a read-only, localhost-only human
+console that renders this surface (and the other operator panels: run
+graph, blackboard, worker logs, candidate compare, audit timeline) for any run,
+reading the SAME capability `--json` payloads. It is a THIRD FRONT DOOR alongside
+the CLI and MCP that holds no authoritative state and forks no schema: each panel
+equals its `cw <cmd> --json` payload byte-for-byte (parity-gated), and refresh
+re-derives everything from disk. See
+[web-desktop-workbench.7.md](web-desktop-workbench.7.md).
+## Observability + Cost Accounting (v0.1.31)
+v0.1.31 adds Observability + Cost Accounting: `metrics show`/`metrics summary`
+derive durations, failure/verifier/acceptance rates (with sample counts and
+fail-closed `n/a`), and host-attested token/cost from existing durable run state
+— no metrics database, no collector daemon, no hidden counter. Usage is additive
+and optional (absent ⇒ `unreported`, never 0); cost is `attested` (attested usage
+× a recorded pricing policy) or clearly `estimated`, with pricing as policy. Both
+verbs are parity-gated and render read-only in the v0.1.30 Workbench. See
+[observability-cost-accounting.7.md](observability-cost-accounting.7.md).
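Reviewer's sketch (not part of the diffed file): the two fail-closed rules above (a rate with no samples is `n/a`, and absent usage is `unreported` rather than 0) can be illustrated as below. The `n/a`, `unreported`, `attested`, and `estimated` labels are from the text; the type shapes, field names, and per-million pricing units are assumptions.

```typescript
type Rate = { value: number; samples: number } | { value: "n/a"; samples: 0 };

// A rate always carries its sample count; zero samples yields "n/a", not 0.
function rate(hits: number, samples: number): Rate {
  if (samples <= 0) return { value: "n/a", samples: 0 };
  return { value: hits / samples, samples };
}

// Hypothetical usage and pricing shapes.
interface Usage { inputTokens: number; outputTokens: number; attested: boolean }
interface PricingPolicy { inputPerMillion: number; outputPerMillion: number }

type Cost = { kind: "unreported" } | { kind: "attested" | "estimated"; amount: number };

// Cost is usage multiplied by a recorded pricing policy; absent usage is
// reported as "unreported", never as a zero amount.
function cost(usage: Usage | undefined, policy: PricingPolicy): Cost {
  if (usage === undefined) return { kind: "unreported" };
  const amount =
    (usage.inputTokens / 1_000_000) * policy.inputPerMillion +
    (usage.outputTokens / 1_000_000) * policy.outputPerMillion;
  return { kind: usage.attested ? "attested" : "estimated", amount };
}

const policy: PricingPolicy = { inputPerMillion: 3, outputPerMillion: 15 };
console.log(rate(0, 0), cost(undefined, policy));
```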
+## Team Collaboration (v0.1.32)
+v0.1.32 adds Team Collaboration: a host-attested actor and append-only
+approvals/rejections/comments/handoffs provenance-linked to a durable target,
+plus a review gate that STACKS ON the verifier gate — required approvals from
+authorized roles, enforced inside `resolveCommitGate` AFTER the verifier checks
+and never instead of them, failing closed on quorum/authority/self-approval and
+recording who approved the very artifact that shipped. Policy (required approvals,
+authorized roles, self-approval) is data, default off (pre-v0.1.32 behavior
+unchanged). The verbs are parity-gated and render read-only in the v0.1.30
+Workbench. See [Team Collaboration](team-collaboration.7.md).
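Reviewer's sketch (not part of the diffed file): a review gate that stacks on the verifier gate could be ordered as below, with the verifier check first and the approval policy second. This is not the package's `resolveCommitGate`; the function name, the policy and approval shapes, and the reason strings are assumptions, and the sketch folds authority and self-approval failures into a single quorum result for brevity.

```typescript
interface Approval { actor: string; role: string }

// Hypothetical policy-as-data; absent policy means the review gate is off.
interface ReviewPolicy {
  requiredApprovals: number;
  authorizedRoles: string[];
  allowSelfApproval: boolean;
}

type GateResult = { ok: true } | { ok: false; reason: string };

function sketchCommitGate(
  verifierPassed: boolean,
  author: string,
  approvals: Approval[],
  policy?: ReviewPolicy,
): GateResult {
  // The verifier gate always runs first; approvals never replace it.
  if (!verifierPassed) return { ok: false, reason: "verifier-failed" };
  // Default off: with no policy the pre-review behavior is unchanged.
  if (policy === undefined || policy.requiredApprovals <= 0) return { ok: true };

  // Count distinct actors whose role is authorized and who are not
  // approving their own work (unless the policy allows it).
  const counted = new Set<string>();
  for (const approval of approvals) {
    if (!policy.authorizedRoles.includes(approval.role)) continue;
    if (!policy.allowSelfApproval && approval.actor === author) continue;
    counted.add(approval.actor);
  }
  return counted.size >= policy.requiredApprovals
    ? { ok: true }
    : { ok: false, reason: "quorum-not-met" };
}

const twoMaintainers: ReviewPolicy = {
  requiredApprovals: 2,
  authorizedRoles: ["maintainer"],
  allowSelfApproval: false,
};
console.log(sketchCommitGate(true, "bob", [{ actor: "alice", role: "maintainer" }], twoMaintainers));
```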
+## Release Tooling (v0.1.33)
+The per-tag mechanical surfaces (version bump across 17 surfaces, feature scaffold, and the forward-reference docs) become deterministic scripts, with a de-duplicated release gate. See release-tooling(7).
+## Real Execution Backend Integrations (v0.1.34)
+The container/remote/ci backends really execute (docker/podman run, remote/CI POST-and-poll) under the sandbox contract, with byte-stable evidence vs node and fail-closed refusal when a runtime/endpoint is unavailable. See real-execution-backends(7).
+## Node Snapshot / Diff / Replay (v0.1.35)
+Per-node snapshot, structural diff, and isolated deterministic replay over StateNode, reusing the v0.1.23 eval harness; fail-closed on source drift (valid|stale|absent). See node-snapshot-diff-replay(7).
+## Contract Migration Tooling (v0.1.36)
+First-class declared migration registry (run-state + workflow-app) with per-edge compatibility proofs, fail-closed reachability, and a round-trip/non-destruction prover. See contract-migration-tooling(7).
+## Control-Plane Scheduling (v0.1.37)
+Priority + concurrency limits + lease lifecycle + retry/backoff + fail-closed park over the v0.1.28 Run Registry queue; policy-as-data, deterministic. See control-plane-scheduling(7).
+## Agent Delegation Drive (v0.1.38)
+Spawn an external agent process per worker, capture result.md + attestation, and auto-drive plan->dispatch->fulfill->accept->commit.
+## Run Retention & Provable Reclamation (v0.1.39)
+Tiered, append-only, cryptographically-verifiable run reclamation: seal the audit skeleton, free the reconstructable bulk, and prove it.
+## Durable State & Locking (v0.1.40)
+Atomic temp->rename writes + fsync-durability for authoritative stores; a portable stale-stealing file lock serializes the cross-process read-modify-write stores.
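Reviewer's sketch (not part of the diffed file): the temp->rename pattern with fsync named above is commonly written as follows in Node. This is a generic illustration using standard `node:fs` calls, not the package's implementation; the function name, the temp-file naming scheme, and the demo paths are assumptions, and the sketch does not fsync the parent directory or take the file lock the section also mentions.

```typescript
import { closeSync, fsyncSync, mkdtempSync, openSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write to a sibling temp file, flush it to disk, then rename over the target.
// Readers see either the old file or the complete new one, never a partial write.
function atomicWriteFile(target: string, data: string): void {
  const tmp = `${target}.${process.pid}.tmp`; // assumed naming scheme
  const fd = openSync(tmp, "w");
  try {
    writeFileSync(fd, data, "utf8");
    fsyncSync(fd); // flush file contents before the rename makes them visible
  } finally {
    closeSync(fd);
  }
  renameSync(tmp, target); // atomic replace on the same filesystem
}

// Demo in a throwaway directory.
const demoDir = mkdtempSync(join(tmpdir(), "cw-sketch-"));
const statePath = join(demoDir, "state.json");
atomicWriteFile(statePath, JSON.stringify({ revision: 1 }));
atomicWriteFile(statePath, JSON.stringify({ revision: 2 }));
const roundTrip = readFileSync(statePath, "utf8");
console.log(roundTrip);
```

The temp file sits next to the target so the rename stays on one filesystem, which is what makes the replace atomic.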
+## Self-Audit Hardening & Pure-Router Decomposition (v0.1.41)
+Evidence grounding + durable audit append + symlink-hardened containment + deterministic worker ids + recursive redaction; BackendRegistry self-describing drivers (no per-id switches); orchestrator god-object decomposed into per-domain operation modules (pure loadRun->delegate router).
+## Robust Result Ingest (v0.1.42)
+Captures findings/evidence from any reasonable agent shape (alt keys + prose), derives grounded evidence in CW itself, and warns on empty capture. Closes the v0.1.41 live-drive 'accepted with 0 captured' failure.
+## No-False-Green Gate & Launch Prep (v0.1.43)
+Hard gate blocking empty-capture verifier-gated commits, plus quickstart and launch-prep docs.
+## Release-Gate Determinism & Agents Vendor (v0.1.44)
+Release-readiness checks now validate the committed blob (`git show HEAD:<path>`) instead of the mutable working tree — eliminating false-red/false-green from concurrent working-tree writes (iCloud/Spotlight/editor). Adds the `agents` vendor manifest target: a generated `.agents/plugins/cool-workflow/` adapter giving any non-Claude AI agent one common interface to CW.
+## P1-P2 Fixes & CI Content Surfaces (v0.1.49)
+Migration DAG with reversible edges (v0.1.45), capability auto-discovery (v0.1.46), vendor-adapter registry (v0.1.47), state auto-compaction and P2 fixes (v0.1.48), plus CI content-surface determinism hardening (v0.1.49).
+0.1.51
+0.1.76
+0.1.77
+0.1.78

package/docs/getting-started.md ADDED

@@ -0,0 +1,99 @@
+# Getting Started
+From a fresh clone:
+```bash
+cd plugins/cool-workflow
+npm install
+npm run build
+node scripts/cw.js app list
+```
+Create a run with a canonical workflow app:
+```bash
+node scripts/cw.js plan release-cut \
+  --repo "$PWD" \
+  --version 0.1.25 \
+  --previousVersion 0.1.24 \
+  --releaseBranch main \
+  --dryRun true
+```
+Use the returned run id:
+```bash
+node scripts/cw.js status <run-id>
+node scripts/cw.js graph <run-id>
+node scripts/cw.js dispatch <run-id> --limit 1 --sandbox readonly
+node scripts/cw.js worker summary <run-id>
+node scripts/cw.js topology list
+node scripts/cw.js topology apply <run-id> map-reduce --task <task-id>
+node scripts/cw.js topology summary <run-id>
+node scripts/cw.js multi-agent run <run-id> --topology judge-panel --task <task-id>
+node scripts/cw.js multi-agent status <run-id>
+node scripts/cw.js multi-agent graph <run-id>
+node scripts/cw.js multi-agent dependencies <run-id>
+node scripts/cw.js multi-agent failures <run-id>
+node scripts/cw.js multi-agent evidence <run-id>
+node scripts/cw.js multi-agent step <run-id> --sandbox readonly
+node scripts/cw.js multi-agent blackboard <run-id> summary
+node scripts/cw.js multi-agent score <run-id> <candidate-id> --criterion correctness=1 --evidence <ref>
+node scripts/cw.js multi-agent select <run-id> <candidate-id> --reason "verified winner"
+node scripts/cw.js multi-agent summary <run-id>
+node scripts/cw.js blackboard summary <run-id>
+node scripts/cw.js audit summary <run-id>
+node scripts/cw.js audit multi-agent <run-id>
+node scripts/cw.js audit policy <run-id>
+node scripts/cw.js audit blackboard <run-id>
+node scripts/cw.js audit judge <run-id>
+node scripts/cw.js eval snapshot <run-id> --id <suite-id>
+node scripts/cw.js eval replay .cw/evals/<suite-id>/snapshot.json
+node scripts/cw.js eval compare .cw/evals/<suite-id>/snapshot.json .cw/evals/<suite-id>/replay-run.json
+node scripts/cw.js eval score .cw/evals/<suite-id>/replay-run.json
+node scripts/cw.js eval gate .cw/evals/<suite-id>
+node scripts/cw.js eval report .cw/evals/<suite-id>/replay-run.json
+node scripts/cw.js report <run-id> --show
+```
+Run the deterministic regression commands:
+```bash
+npm run check
+npm test
+npm run canonical-apps
+npm run golden-path
+npm run eval:replay
+npm run fixture-compat
+```
+Before cutting a release, run the full dry-run gate:
+```bash
+npm run release:check
+npm run dogfood:release
+```
+The release check is non-destructive. It builds, type-checks, runs tests,
+validates canonical apps and golden path behavior, checks old fixture
+compatibility, verifies docs, runs the dogfood smoke proof, and checks version
+synchronization. It does not tag, push, publish, or rewrite fixture files.
+`npm run dogfood:release` is the real-repository release proof. It uses the
+canonical `release-cut` app against this repository in dry-run mode, records CW
+worker outputs from real command logs, scores and selects a release candidate,
+creates a verifier-gated CW state commit, and writes
+`.cw/runs/<run-id>/dogfood-summary.json`.
+Trust audit records live under `.cw/runs/<run-id>/audit/`. CW records the
+sandbox profile used by each worker, allowed and denied decisions, evidence
+provenance, and why selected candidates or verifier-gated commits were
+accepted. Multi-agent trust records add role policy, blackboard write audit,
+message provenance, judge rationale, and policy violations. Inspect them with
+`audit summary`, `audit worker`, `audit provenance`, `audit multi-agent`,
+`audit policy`, `audit blackboard`, and `audit judge`.
+Eval/replay artifacts live under `.cw/evals/<suite-id>/`. They let a release
+gate prove replay completion, graph/dependency parity, evidence adoption,
+trust/policy/audit parity, judge rationale, candidate scoring, selection, and
+verifier-gated commit readiness without running live agents.

package/docs/index.md ADDED

@@ -0,0 +1,41 @@
+# Cool Workflow Docs
+Read these in order when you are new to CW:
+1. [Getting Started](getting-started.md) - clone, install, run a workflow, inspect it, and run the release check.
+2. [Project Index](project-index.md) - code-derived map of source modules, workflow apps, docs, tests, and sync targets.
+3. [Workflow App framework](workflow-app-framework.7.md) - userland app manifests, entrypoints, compatibility, and validation.
+4. [Sandbox Profiles](sandbox-profiles.7.md) - named worker policy contracts for read/write/execute/network/env handling.
+5. [Security / Trust Hardening](security-trust-hardening.7.md) - audit records, provenance, sandbox attestations, and acceptance rationale.
+6. [Multi-Agent Runtime Core](multi-agent-runtime-core.7.md) - first-class MultiAgentRun, roles, groups, memberships, fanout, fanin, and lifecycle state.
+7. [Coordinator / Blackboard](coordinator-blackboard.7.md) - shared topics, messages, context frames, artifact refs, snapshots, decisions, conflicts, and fanin evidence.
+8. [Multi-Agent Topologies](multi-agent-topologies.7.md) - official map-reduce, debate, and judge-panel recipes built on multi-agent and blackboard records.
+9. [Multi-Agent CLI + MCP Surface](multi-agent-cli-mcp-surface.7.md) - preferred host loop for run, status, step, blackboard, score, and select.
+10. [Multi-Agent Operator UX](multi-agent-operator-ux.7.md) - graph, dependencies, failures, and evidence adoption for topology-backed multi-agent runs.
+11. [Multi-Agent Trust / Policy / Audit](multi-agent-trust-policy-audit.7.md) - role authority, message provenance, blackboard write audit, judge rationale, and policy violations.
+12. [Multi-Agent Eval & Replay Harness](multi-agent-eval-replay-harness.7.md) - snapshots, isolated replays, comparison, scoring, gates, reports, and MCP parity.
+13. [State Explosion Management](state-explosion-management.7.md) - durable summary records, compact and focused graph views, blackboard digests, and stale-aware compaction for large multi-agent runs.
+14. [Evidence Adoption Reasoning Chain](evidence-adoption-reasoning-chain.7.md) - derived, fingerprinted reasoning chains explaining why each evidence item was adopted/rejected with basis, authority, rationale, and counterfactual, and a fail-closed `unexplained` state.
+15. [Run Registry / Control Plane](run-registry-control-plane.7.md) - derived, fingerprinted, fail-closed index over runs across repos: search, resume, archive, durable queue, cross-repo history, and failed-run rerun with provenance.
+16. [Execution Backends](execution-backends.7.md) - the pluggable driver layer (node/bun/shell/container/remote/ci): one narrow `ExecutionBackend` contract, sandbox attestation, identical envelopes across backends, and fail-closed delegation.
+17. [Operator UX](operator-ux.7.md) - `status`, `graph`, report, worker, candidate, feedback, commit, topology, multi-agent, blackboard, coordinator, and trust summaries.
+18. [MCP App Surface](mcp-app-surface.7.md) - JSON tool parity for agent hosts.
+19. [CLI ↔ MCP Parity](cli-mcp-parity.7.md) - the capability registry and fail-closed gate proving the CLI and MCP surfaces render one data source.
+20. [End-to-End Golden Path](end-to-end-golden-path.7.md) - deterministic proof of app, worker, verifier, candidate, commit, and report flow.
+21. [Dogfood One Real Repo](dogfood-one-real-repo.7.md) - dry-run release proof against the real Cool Workflow repository.
+22. [Web / Desktop Workbench](web-desktop-workbench.7.md) - a read-only, localhost-only human console rendering the run graph, blackboard, worker logs, candidate compare, and audit timeline over existing capability payloads — a third front door that holds no authoritative state.
+23. [Observability + Cost Accounting](observability-cost-accounting.7.md) - derived time/duration, failure/verifier/acceptance rates with sample counts and fail-closed `n/a`, plus host-attested token usage and attested-vs-estimated cost with explicit `unreported` coverage; pricing is policy as data.
+24. [Team Collaboration](team-collaboration.7.md) - host-attested actor, append-only approvals/rejections/comments/handoffs provenance-linked to durable targets, and a review gate that stacks on the verifier gate (required approvals from authorized roles, fail-closed quorum/authority/self-approval); policy is data.
+25. [Release And Migration](release-and-migration.7.md) - release and migration discipline for durable run state.
+26. [Release Tooling](release-tooling.7.md) - one-command version bump across every surface, a per-feature scaffolder, forward-reference doc automation, and a de-duplicated release gate.
+27. [Real Execution Backend Integrations](real-execution-backends.7.md) - container/remote/ci backends really execute (docker/podman run, remote/CI POST-and-poll) under the sandbox contract, byte-stable evidence vs node, fail-closed on an unavailable runtime/endpoint.
+28. [Node Snapshot / Diff / Replay](node-snapshot-diff-replay.7.md) - per-node snapshot, structural diff, and isolated deterministic replay over StateNode, reusing the eval harness; sha256-fingerprinted with fail-closed `valid|stale|absent` freshness.
+29. [Contract Migration Tooling](contract-migration-tooling.7.md) - a declared migration registry (run-state + workflow-app) with per-edge compatibility proofs, fail-closed reachability, and a round-trip/non-destruction prover over the existing migrateRunState pipeline.
+30. [Control-Plane Scheduling](control-plane-scheduling.7.md) - priority + hard concurrency ceiling + lease lifecycle + retry/backoff + fail-closed park policy over the v0.1.28 Run Registry queue; policy-as-data, deterministic, with a read-only `sched plan`.
+31. [Agent Delegation Drive](agent-delegation-drive.7.md) - the `agent` backend delegates each worker to an EXTERNAL agent process (claude/codex/HTTP endpoint) and `run --drive` auto-advances plan→dispatch→fulfill→accept→commit; the model runs in the agent's process, never in CW. Two-layer evidence, operator-vs-attested model, fail-closed park, replay without re-spawn.
+32. [Run Retention & Provable Reclamation](run-retention-reclamation.7.md) - tiered, append-only, cryptographically-verifiable disk reclamation over the v0.1.28 archive overlay: seal the audit skeleton, free the reconstructable/scratch bulk, and prove it via a hash-chained tombstone; `gc plan|run|verify`, write-ahead + fail-closed, explicit capability downgrade.
+33. [Durable State & Locking](durable-state-and-locking.7.md) - atomic (temp→rename) writes for every authoritative store with fsync-durability for the audit-essential ones, plus a portable stale-stealing file lock serializing the cross-process read-modify-write stores (home queue, archive overlay, reclamation chain); closes the prior verdict's non-atomic/unlocked P1.
+CW is the base system. Workflow apps are userland. Release and migration rules
+must preserve that line: stable contracts, explicit compatibility checks, and
+inspectable state.

package/docs/mcp-app-surface.7.md ADDED

@@ -0,0 +1,235 @@
+# MCP App Surface
+Cool Workflow v0.1.13 completes the MCP bridge as a runtime surface for agent
+hosts. The CLI remains the reference interface, and MCP exposes the same
+operational contracts as explicit JSON tools.
+The bridge follows CW's base-system discipline:
+- old tool names remain compatible
+- read-only inspection tools do not mutate state
+- state-changing tools write durable run files
+- inputs use stable names such as `runId`, `appId`, `workerId`,
+  `candidateId`, `selectionId`, `profileId`, `cwd`, `reason`, `evidence`, and
+  `criteria`
+- errors fail closed through JSON-RPC errors and durable ErrorFeedback where the
+  runtime already records feedback
+## App Run Flow
+Use `cw_app_list`, `cw_app_show`, and `cw_app_validate` to inspect app
+contracts. `cw_app_package` writes a package artifact. `cw_app_run` creates a
+run from a Workflow App framework app id and structured inputs:
+```json
+{
+  "appId": "end-to-end-golden-path",
+  "cwd": "/repo",
+  "inputs": {
+    "question": "Prove the MCP runtime surface."
+  },
+  "sandbox": "readonly"
+}
+```
+The result includes `runId`, workflow/app id and version, `statePath`,
+`reportPath`, pending task count, compact operator status, next actions, and
+the resolved sandbox profile when one was requested.
+`cw_plan` remains the lower-level planning tool and returns the full run object
+for compatibility.
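As a rough illustration of how a host might send the `cw_app_run` input shown above, the sketch below wraps it in a JSON-RPC 2.0 `tools/call` request, the method MCP clients use to invoke a tool. The helper name `buildToolCall` and the request id are hypothetical, and the line-per-message framing assumes a stdio transport, which this section does not state.

```javascript
// Hypothetical helper: wrap a tool name and its arguments in a JSON-RPC 2.0
// "tools/call" request as used by MCP clients.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Arguments copied from the cw_app_run example in the text.
const request = buildToolCall(1, "cw_app_run", {
  appId: "end-to-end-golden-path",
  cwd: "/repo",
  inputs: { question: "Prove the MCP runtime surface." },
  sandbox: "readonly",
});

// Assumption: a stdio transport, where each message is one line of JSON.
console.log(JSON.stringify(request));
```

Keeping the tool arguments as a plain object means the same payload can be checked against the CLI contract before it is sent.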
+## Worker Inspection
+Worker isolation is first-class over MCP:
+- `cw_worker_list`
+- `cw_worker_show`
+- `cw_worker_manifest`
+- `cw_worker_validate`
+- `cw_worker_output`
+- `cw_worker_fail`
+- `cw_worker_summary`
+Worker records expose the worker id, task id, status, worker directory,
+`input.md`, `result.md`, artifacts/logs directories, sandbox profile id,
+sandbox policy, feedback ids, multi-agent metadata when present, and
+result/verifier node ids.
+An agent host should inspect `cw_worker_manifest`, write worker-local output to
+the manifest `resultPath`, then call `cw_worker_output`. CW validates the
+worker boundary, parses the `cw:result` block, creates result and verifier
+nodes, updates the task, writes reports, and checkpoints state.
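The manifest-then-output sequence described above could look roughly like the following from the host side. This is a sketch under assumptions: `callTool` is a hypothetical host-supplied function that invokes an MCP tool and resolves to its parsed result, the manifest is assumed to expose `resultPath` as a top-level string, and the tools are assumed to take `runId` and `workerId`. The contents of the result file, including the `cw:result` block, are left to the caller because the format is not described here.

```javascript
const fs = require("node:fs/promises");

// callTool is hypothetical: (toolName, args) => Promise<object>.
async function submitWorkerResult(callTool, runId, workerId, resultMarkdown) {
  // 1. Inspect the manifest to learn where the result file belongs.
  const manifest = await callTool("cw_worker_manifest", { runId, workerId });
  if (!manifest || typeof manifest.resultPath !== "string") {
    // Fail closed rather than guessing an output location.
    throw new Error(`worker ${workerId}: manifest has no resultPath`);
  }

  // 2. Write worker-local output to the manifest's resultPath.
  await fs.writeFile(manifest.resultPath, resultMarkdown, "utf8");

  // 3. Ask CW to validate the worker boundary and ingest the result.
  return callTool("cw_worker_output", { runId, workerId });
}
```

Reading the path from the manifest, rather than constructing it, keeps the host inside the worker boundary that CW validates.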
+## Candidate Scoring
+Candidate operations mirror the CLI:
+- `cw_candidate_register`
+- `cw_candidate_list`
+- `cw_candidate_show`
+- `cw_candidate_score`
+- `cw_candidate_rank`
+- `cw_candidate_select`
+- `cw_candidate_reject`
+- `cw_candidate_summary`
+`cw_candidate_score` accepts structured `criteria` and evidence locators:
+```json
+{
+  "runId": "run-id",
+  "candidateId": "candidate-one",
+  "criteria": { "correctness": 4, "evidence": 4, "fit": 2 },
+  "maxTotal": 10,
+  "evidence": ["docs/mcp-app-surface.7.md:1"],
+  "verdict": "pass",
+  "notes": "Evidence-backed candidate."
+}
+```
+`cw_candidate_rank` and `cw_candidate_select` support the same
+evidence/verifier-gate policy as the CLI with `requireEvidence`,
+`requireVerifierGate`, `minNormalized`, and `allowUnverified`. Missing evidence
+or verifier gates fail closed and produce structured feedback through the
+candidate scoring layer.
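To make the fail-closed policy concrete, here is an illustrative sketch of how a host might pre-check a score before calling `cw_candidate_select`. It is not CW's scoring code: the normalization formula (sum of criteria divided by `maxTotal`) is an assumption suggested by the `maxTotal` and `minNormalized` field names, and the function names and reason strings are hypothetical.

```javascript
// Assumed formula: sum the per-criterion scores and divide by maxTotal.
function normalizeScore(criteria, maxTotal) {
  if (!(maxTotal > 0)) throw new Error("maxTotal must be positive");
  const total = Object.values(criteria).reduce((sum, n) => sum + n, 0);
  return { total, normalized: total / maxTotal };
}

// Fail-closed gate: missing evidence or a low score yields a refusal
// with explicit reasons instead of a silent pass.
function gateCandidate(score, policy) {
  const reasons = [];
  const { normalized } = normalizeScore(score.criteria, score.maxTotal);
  const hasEvidence = Array.isArray(score.evidence) && score.evidence.length > 0;
  if (policy.requireEvidence && !hasEvidence) {
    reasons.push("missing evidence");
  }
  if (policy.minNormalized !== undefined && normalized < policy.minNormalized) {
    reasons.push(`normalized ${normalized} is below ${policy.minNormalized}`);
  }
  return { ok: reasons.length === 0, normalized, reasons };
}

// Values taken from the cw_candidate_score example in the text.
const verdict = gateCandidate(
  {
    criteria: { correctness: 4, evidence: 4, fit: 2 },
    maxTotal: 10,
    evidence: ["docs/mcp-app-surface.7.md:1"],
  },
  { requireEvidence: true, minNormalized: 0.7 }
);
console.log(verdict.ok, verdict.normalized);
```

A local pre-check like this only saves a round trip; the authoritative decision and the structured feedback still come from CW.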
+## Sandbox Profiles
+Existing sandbox tools remain:
+- `cw_sandbox_list`
+- `cw_sandbox_show`
+- `cw_sandbox_validate`
+v0.1.13 adds `cw_sandbox_choose` and `cw_sandbox_resolve` as read-only helpers
+that validate and resolve `sandbox`, `sandboxProfile`, `sandboxProfileId`, or
+`profileId` without dispatching work. `cw_dispatch` accepts all three sandbox
+field spellings for compatibility with different hosts.
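A host that receives sandbox settings from several sources may want to collapse the spellings into one id before calling a tool. The sketch below is one way to do that and is not CW's resolver: the function name is hypothetical, and rejecting conflicting values is an assumed policy chosen to match the fail-closed discipline described earlier.

```javascript
// Field spellings named in the text for the resolve/choose helpers.
const SANDBOX_FIELDS = ["sandbox", "sandboxProfile", "sandboxProfileId", "profileId"];

// Hypothetical helper: return the single requested profile id, or undefined
// when no sandbox field was supplied.
function pickSandboxProfileId(args) {
  const supplied = SANDBOX_FIELDS.filter(
    (field) => typeof args[field] === "string" && args[field] !== ""
  );
  if (supplied.length === 0) return undefined;

  const distinct = new Set(supplied.map((field) => args[field]));
  if (distinct.size > 1) {
    // Assumed policy: refuse conflicting spellings instead of picking one.
    throw new Error(`conflicting sandbox fields: ${supplied.join(", ")}`);
  }
  return args[supplied[0]];
}

console.log(pickSandboxProfileId({ sandboxProfile: "readonly" }));
```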
+## Multi-Agent Runtime
+v0.1.17 adds MCP parity for first-class multi-agent state.
+v0.1.20 adds preferred host-facing tools for the full multi-agent loop:
+- `cw_multi_agent_run`
+- `cw_multi_agent_status`
+- `cw_multi_agent_step`
+- `cw_multi_agent_blackboard`
+- `cw_multi_agent_score`
+- `cw_multi_agent_select`
+Use these when an agent host wants to drive `run -> status -> step ->
+blackboard -> score -> select` without manually plumbing topology, blackboard,
+candidate, and audit ids. The lower-level tools below remain advanced
+primitives.
+v0.1.22 adds audit parity for multi-agent trust:
+- `cw_audit_multi_agent`
+- `cw_audit_policy`
+- `cw_audit_role`
+- `cw_audit_blackboard`
+- `cw_audit_judge`
+These tools expose role policies, permission decisions, blackboard write audit,
+message provenance, judge rationales, panel decisions, and policy violations in
+deterministic JSON.
+v0.1.24 adds eval/replay parity for multi-agent regression gates:
+- `cw_eval_snapshot`
+- `cw_eval_replay`
+- `cw_eval_compare`
+- `cw_eval_score`
+- `cw_eval_gate`
+- `cw_eval_report`
+These tools create replay snapshots, run isolated replays, compare normalized
+baseline/replay records, score metrics, fail closed on regressions, and return
+artifact paths in deterministic JSON.
+v0.1.25 adds State Explosion Management parity for large multi-agent runs:
+- `cw_summary_refresh`
+- `cw_summary_show`
+- `cw_blackboard_summarize`
+- `cw_multi_agent_summarize`
+- `cw_multi_agent_graph_compact`
+These tools refresh durable, versioned summary records, read the stale-aware
+state-explosion report, return the blackboard digest, and return compact or
+focused graph views with synthetic summary nodes. Every response keeps source
+refs and expansion hints and never deletes raw blackboard, graph, audit, or
+evidence records.
+Read and inspect:
+- `cw_multi_agent_summary`
+- `cw_multi_agent_graph`
+- `cw_multi_agent_run_show`
+- `cw_multi_agent_role_show`
+- `cw_multi_agent_group_show`
+- `cw_multi_agent_membership_show`
+- `cw_multi_agent_fanout_show`
+- `cw_multi_agent_fanin_show`
+Safe writes:
+- `cw_multi_agent_run_create`
+- `cw_multi_agent_run_transition`
+- `cw_multi_agent_role_create`
+- `cw_multi_agent_group_create`
+- `cw_multi_agent_membership_create`
+- `cw_multi_agent_fanout_create`
+- `cw_multi_agent_fanin_collect`
+These tools mirror the CLI state model. CW records and validates roles, groups,
+memberships, fanout/fanin, and lifecycle state; the host still executes agents
+and enforces OS/process/network/environment controls.
+## Verifier-Gated Commit
+`cw_commit` accepts verifier-gate fields:
+```json
+{
+  "runId": "run-id",
+  "selection": "selection-id",
+  "reason": "verified candidate selected"
+}
+```
+Beyond `runId`, the accepted fields are `verifier`, `verifierNode`,
+`candidate`, `selection`, `allowUnverifiedCheckpoint`, and `reason`. The MCP
+response includes `runId`, `commitId`, `verifierGated`, `checkpoint`,
+verifier/candidate/selection ids, `evidenceCount`, `snapshotPath`, next
+actions, and the underlying commit record.
+Use `cw_commit_summary` for a read-only view of verifier-gated commits and
+explicit checkpoints.
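A host that must never accept an unverified commit could wrap `cw_commit` as sketched below. The assumptions are that `callTool` is a hypothetical host-supplied function resolving to the parsed tool result, and that `verifierGated` in the response is a boolean; the text names the field but not its type.

```javascript
// callTool is hypothetical: (toolName, args) => Promise<object>.
async function commitVerified(callTool, runId, selection, reason) {
  const result = await callTool("cw_commit", { runId, selection, reason });

  // Assumption: verifierGated is a boolean. Anything else is treated as
  // unverified so the host fails closed.
  if (result.verifierGated !== true) {
    throw new Error(`commit ${result.commitId} for run ${runId} is not verifier-gated`);
  }
  return result;
}
```

The wrapper deliberately omits `allowUnverifiedCheckpoint`, so an explicit checkpoint has to be requested through a separate, visible code path.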
+## Operator Views
+MCP exposes structured JSON equivalents of Operator UX:
+- `cw_operator_status`
+- `cw_operator_graph`
+- `cw_operator_report`
+- `cw_worker_summary`
+- `cw_candidate_summary`
+- `cw_feedback_summary`
+- `cw_commit_summary`
+- `cw_multi_agent_summary`
+These tools return JSON summaries instead of console text. `cw_operator_report`
+refreshes the Markdown report the same way the CLI renderer does; the rest are
+read-only inspection tools.
+## CLI/MCP Parity
+The CLI remains the easiest way for humans to drive a run. MCP is the stable
+tool surface for agent hosts. New runtime capabilities should appear in both
+surfaces, keep old names as aliases or wrappers, and use explicit JSON
+contracts rather than host-specific policy hidden in the bridge.
+0.1.51