cool-workflow 0.1.78
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +20 -0
- package/.codex-plugin/mcp.json +10 -0
- package/.codex-plugin/plugin.json +38 -0
- package/.mcp.json +10 -0
- package/LICENSE +24 -0
- package/README.md +638 -0
- package/apps/architecture-review/app.json +51 -0
- package/apps/architecture-review/workflow.js +116 -0
- package/apps/end-to-end-golden-path/app.json +30 -0
- package/apps/end-to-end-golden-path/workflow.js +33 -0
- package/apps/pr-review-fix-ci/app.json +59 -0
- package/apps/pr-review-fix-ci/workflow.js +90 -0
- package/apps/release-cut/app.json +54 -0
- package/apps/release-cut/workflow.js +82 -0
- package/apps/research-synthesis/app.json +50 -0
- package/apps/research-synthesis/workflow.js +76 -0
- package/apps/workflow-app-framework-demo/app.json +29 -0
- package/apps/workflow-app-framework-demo/workflow.js +44 -0
- package/dist/agent-config.js +223 -0
- package/dist/candidate-scoring.js +715 -0
- package/dist/capability-core.js +630 -0
- package/dist/capability-dispatcher.js +86 -0
- package/dist/capability-registry.js +523 -0
- package/dist/cli.js +1276 -0
- package/dist/collaboration.js +727 -0
- package/dist/commit.js +570 -0
- package/dist/contract-migration.js +234 -0
- package/dist/coordinator.js +1163 -0
- package/dist/daemon.js +44 -0
- package/dist/dispatch.js +201 -0
- package/dist/drive.js +503 -0
- package/dist/error-feedback.js +415 -0
- package/dist/evidence-grounding.js +179 -0
- package/dist/evidence-reasoning.js +733 -0
- package/dist/execution-backend.js +1279 -0
- package/dist/harness.js +61 -0
- package/dist/mcp-server.js +1615 -0
- package/dist/multi-agent-eval.js +857 -0
- package/dist/multi-agent-host.js +764 -0
- package/dist/multi-agent-operator-ux.js +537 -0
- package/dist/multi-agent-trust.js +366 -0
- package/dist/multi-agent.js +1173 -0
- package/dist/node-snapshot.js +270 -0
- package/dist/observability.js +922 -0
- package/dist/operator-ux.js +971 -0
- package/dist/orchestrator/audit-operations.js +182 -0
- package/dist/orchestrator/candidate-operations.js +117 -0
- package/dist/orchestrator/cli-options.js +288 -0
- package/dist/orchestrator/collaboration-operations.js +86 -0
- package/dist/orchestrator/feedback-operations.js +81 -0
- package/dist/orchestrator/host-operations.js +78 -0
- package/dist/orchestrator/lifecycle-operations.js +462 -0
- package/dist/orchestrator/migration-operations.js +44 -0
- package/dist/orchestrator/multi-agent-operations.js +362 -0
- package/dist/orchestrator/report.js +369 -0
- package/dist/orchestrator/topology-operations.js +84 -0
- package/dist/orchestrator.js +874 -0
- package/dist/pipeline-contract.js +92 -0
- package/dist/pipeline-runner.js +285 -0
- package/dist/reclamation.js +882 -0
- package/dist/result-normalize.js +194 -0
- package/dist/run-export.js +64 -0
- package/dist/run-registry.js +1347 -0
- package/dist/run-state-schema.js +67 -0
- package/dist/sandbox-profile.js +471 -0
- package/dist/scheduler.js +266 -0
- package/dist/scheduling.js +184 -0
- package/dist/schema-validate.js +98 -0
- package/dist/state-explosion.js +1213 -0
- package/dist/state-migrations.js +463 -0
- package/dist/state-node.js +301 -0
- package/dist/state.js +308 -0
- package/dist/telemetry-attestation.js +156 -0
- package/dist/telemetry-ledger.js +145 -0
- package/dist/topology.js +527 -0
- package/dist/triggers.js +159 -0
- package/dist/trust-audit.js +475 -0
- package/dist/types/blackboard.js +2 -0
- package/dist/types/boundary.js +29 -0
- package/dist/types/candidate.js +2 -0
- package/dist/types/collaboration.js +2 -0
- package/dist/types/core.js +2 -0
- package/dist/types/drive.js +10 -0
- package/dist/types/error-feedback.js +2 -0
- package/dist/types/evidence-reasoning.js +2 -0
- package/dist/types/execution-backend.js +2 -0
- package/dist/types/multi-agent.js +2 -0
- package/dist/types/observability.js +2 -0
- package/dist/types/pipeline.js +2 -0
- package/dist/types/reclamation.js +8 -0
- package/dist/types/result.js +2 -0
- package/dist/types/run-registry.js +2 -0
- package/dist/types/run.js +2 -0
- package/dist/types/sandbox.js +2 -0
- package/dist/types/schedule.js +2 -0
- package/dist/types/state-node.js +2 -0
- package/dist/types/topology.js +2 -0
- package/dist/types/trust.js +2 -0
- package/dist/types/workbench.js +2 -0
- package/dist/types/worker.js +2 -0
- package/dist/types/workflow-app.js +2 -0
- package/dist/types.js +43 -0
- package/dist/verifier-registry.js +46 -0
- package/dist/verifier.js +78 -0
- package/dist/version.js +8 -0
- package/dist/workbench-host.js +172 -0
- package/dist/workbench.js +190 -0
- package/dist/worker-isolation.js +1028 -0
- package/dist/workflow-api.js +98 -0
- package/dist/workflow-app-framework.js +626 -0
- package/docs/agent-delegation-drive.7.md +190 -0
- package/docs/agent-framework.md +176 -0
- package/docs/candidate-scoring.7.md +106 -0
- package/docs/canonical-workflow-apps.7.md +137 -0
- package/docs/capability-topology-registry.7.md +168 -0
- package/docs/cli-mcp-parity.7.md +373 -0
- package/docs/contract-migration-tooling.7.md +123 -0
- package/docs/control-plane-scheduling.7.md +110 -0
- package/docs/coordinator-blackboard.7.md +183 -0
- package/docs/dogfood/architecture-review-cool-workflow.md +16 -0
- package/docs/dogfood-one-real-repo.7.md +168 -0
- package/docs/durable-state-and-locking.7.md +107 -0
- package/docs/end-to-end-golden-path.7.md +117 -0
- package/docs/error-feedback.7.md +153 -0
- package/docs/evidence-adoption-reasoning-chain.7.md +270 -0
- package/docs/execution-backends.7.md +300 -0
- package/docs/getting-started.md +99 -0
- package/docs/index.md +41 -0
- package/docs/mcp-app-surface.7.md +235 -0
- package/docs/multi-agent-cli-mcp-surface.7.md +265 -0
- package/docs/multi-agent-eval-replay-harness.7.md +302 -0
- package/docs/multi-agent-operator-ux.7.md +314 -0
- package/docs/multi-agent-runtime-core.7.md +231 -0
- package/docs/multi-agent-topologies.7.md +103 -0
- package/docs/multi-agent-trust-policy-audit.7.md +154 -0
- package/docs/node-snapshot-diff-replay.7.md +135 -0
- package/docs/observability-cost-accounting.7.md +194 -0
- package/docs/operator-ux.7.md +180 -0
- package/docs/pipeline-runner.7.md +136 -0
- package/docs/project-index.md +261 -0
- package/docs/real-execution-backends.7.md +142 -0
- package/docs/release-and-migration.7.md +280 -0
- package/docs/release-tooling.7.md +159 -0
- package/docs/routines.md +48 -0
- package/docs/run-registry-control-plane.7.md +312 -0
- package/docs/run-retention-reclamation.7.md +191 -0
- package/docs/sandbox-profiles.7.md +137 -0
- package/docs/scheduled-tasks.md +80 -0
- package/docs/security-trust-hardening.7.md +117 -0
- package/docs/state-explosion-management.7.md +264 -0
- package/docs/state-node.7.md +96 -0
- package/docs/team-collaboration.7.md +207 -0
- package/docs/unix-principles.md +192 -0
- package/docs/verifier-gated-commit.7.md +140 -0
- package/docs/web-desktop-workbench.7.md +215 -0
- package/docs/worker-isolation.7.md +167 -0
- package/docs/workflow-app-framework.7.md +274 -0
- package/manifest/README.md +43 -0
- package/manifest/plugin.manifest.json +316 -0
- package/manifest/pricing.policy.json +14 -0
- package/package.json +79 -0
- package/scripts/agents/claude-p-agent.js +104 -0
- package/scripts/agents/claude-p-agent.sh +9 -0
- package/scripts/agents/cw-attest-keygen.js +55 -0
- package/scripts/agents/cw-attest-wrap.js +143 -0
- package/scripts/block-unapproved-tag.sh +39 -0
- package/scripts/bump-version.js +249 -0
- package/scripts/canonical-apps.js +171 -0
- package/scripts/cw.js +4 -0
- package/scripts/dist-drift-check.js +79 -0
- package/scripts/dogfood-architecture-review.js +237 -0
- package/scripts/dogfood-release.js +624 -0
- package/scripts/forward-ref-docs.js +73 -0
- package/scripts/gen-manifests.js +232 -0
- package/scripts/golden-path.js +300 -0
- package/scripts/mcp-server.js +4 -0
- package/scripts/new-feature.js +121 -0
- package/scripts/parity-check.js +213 -0
- package/scripts/release-check.js +118 -0
- package/scripts/release-flow.js +272 -0
- package/scripts/release-gate.sh +85 -0
- package/scripts/sync-project-index.js +387 -0
- package/scripts/validate-run-state-schema.js +126 -0
- package/scripts/verify-container-selfref.js +64 -0
- package/scripts/version-sync-check.js +237 -0
- package/skills/cool-workflow/SKILL.md +162 -0
- package/skills/cool-workflow/references/commands.md +282 -0
- package/tsconfig.json +16 -0
- package/ui/workbench/app.css +76 -0
- package/ui/workbench/app.js +159 -0
- package/ui/workbench/index.html +32 -0
- package/workflows/architecture-review.workflow.js +84 -0
- package/workflows/research-synthesis.workflow.js +47 -0
|
@@ -0,0 +1,117 @@
# End-to-End Golden Path

Cool Workflow v0.1.10 added a deterministic golden path that proves the base
system is connected from workflow app planning through verifier-gated commit and
report generation.

Run it from the plugin root:

```bash
cd plugins/cool-workflow
npm run golden-path
```

The command uses only Node.js standard library APIs and the public CW CLI. It
does not use the network, sleeps, hidden daemon state, or real subagents.

v0.1.13 adds `test/mcp-app-surface-smoke.js`, a sibling deterministic proof
that drives the same app/worker/candidate/commit/operator chain over MCP stdio
JSON-RPC instead of direct CLI commands.

## What It Proves

The runner exercises this chain:

```text
workflow app -> plan -> dispatch -> isolated worker -> candidate scoring
  -> verifier -> gated commit -> report
```

It uses the first-class `end-to-end-golden-path` app in
`apps/end-to-end-golden-path/`. The app has one phase and one evidence-required
worker task with the `readonly` sandbox profile.

## CLI Surface

The runner performs the same public commands an operator would use:

```bash
node scripts/cw.js app validate end-to-end-golden-path
node scripts/cw.js plan end-to-end-golden-path --repo <tmp> --question "..."
node scripts/cw.js dispatch <run-id> --limit 1 --sandbox readonly
node scripts/cw.js worker manifest <run-id> <worker-id>
node scripts/cw.js worker output <run-id> <worker-id> <result.md>
node scripts/cw.js candidate register <run-id> --worker <worker-id> --id golden-candidate
node scripts/cw.js candidate score <run-id> golden-candidate \
  --criterion correctness=4 \
  --criterion evidence=4 \
  --criterion fit=2 \
  --maxTotal 10 \
  --evidence <file:line>
node scripts/cw.js candidate rank <run-id>
node scripts/cw.js candidate select <run-id> golden-candidate --reason "golden path verified"
node scripts/cw.js commit <run-id> --selection <selection-id> \
  --reason "golden path verifier-gated commit"
node scripts/cw.js report <run-id>
node scripts/cw.js status <run-id>
node scripts/cw.js graph <run-id>
node scripts/cw.js report <run-id> --show
```

After dispatch, the script reads the generated worker manifest and writes a
valid Markdown result to the worker's declared `result.md`. The result contains
a `cw:result` JSON fence with file:line evidence.
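
The `candidate score` call in the command list passes three `--criterion name=value` pairs and `--maxTotal 10`. As a rough illustration of the arithmetic only, the sketch below parses such pairs and totals them. The helper names `parseCriteria` and `scoreTotal` are hypothetical, and the rule that the total is a plain sum bounded by `maxTotal` is an assumption rather than CW's documented scoring behavior.

```ts
interface Criterion {
  name: string;
  value: number;
}

// Hypothetical helper: parse repeated `name=value` criterion arguments.
function parseCriteria(args: string[]): Criterion[] {
  return args.map((arg) => {
    const eq = arg.indexOf("=");
    const raw = eq > 0 ? arg.slice(eq + 1).trim() : "";
    if (raw === "" || !isFinite(Number(raw))) {
      throw new Error("expected name=<number>, got: " + arg);
    }
    return { name: arg.slice(0, eq), value: Number(raw) };
  });
}

// Assumed rule: the total is the plain sum of criteria and may not exceed maxTotal.
function scoreTotal(
  criteria: Criterion[],
  maxTotal: number
): { total: number; normalized: number } {
  const total = criteria.reduce((sum, c) => sum + c.value, 0);
  if (total > maxTotal) {
    throw new Error("total " + total + " exceeds maxTotal " + maxTotal);
  }
  return { total, normalized: total / maxTotal };
}

// The golden-path values: 4 + 4 + 2 against a maximum of 10.
const golden = scoreTotal(
  parseCriteria(["correctness=4", "evidence=4", "fit=2"]),
  10
);
```

Under these assumptions the golden candidate totals 10 of 10, which is consistent with it being the only candidate ranked and selected.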

## Files Written

The runner creates a temporary workspace under the OS temp directory:

```text
<tmp>/
  golden-evidence.md
  .cw/runs/<run-id>/
    state.json
    report.md
    tasks/
    dispatches/
    workers/<worker-id>/
      input.md
      worker.json
      manifest.json
      result.md
    results/
    nodes/
    candidates/
      golden-candidate/
      scores/
      ranking.json
    commits/
```

By default the workspace is left on disk so the report and state can be
inspected. Tests run the same script with `--cleanup`.

## Invariants

The golden path asserts durable state, not just exit codes:

- run state includes workflow app id and version metadata
- MCP hosts can reproduce the flow with `cw_app_run`, `cw_dispatch`,
  `cw_worker_manifest`, `cw_worker_output`, `cw_candidate_score`,
  `cw_candidate_select`, `cw_commit`, and operator summary tools
- dispatch records a worker id and `readonly` sandbox profile
- the worker manifest includes resolved sandbox policy data
- the worker reaches `verified`
- result and verifier nodes exist
- the verifier node carries evidence
- `golden-candidate` reaches `verified` after selection
- candidate score and ranking files exist
- the final commit has `verifierGated: true` and `checkpoint: false`
- the final commit references the selection, candidate, verifier node, and
  evidence
- the report mentions the workflow app, candidates, and verifier-gated commit
- operator status, graph, report, and summary commands can inspect the run
- no ErrorFeedback records are produced

If this command fails, one of the base integration contracts is broken.
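
To make "asserts durable state, not just exit codes" concrete, here is a minimal sketch of a checker over a few of the invariants listed in this section. Only `verifierGated`, `checkpoint`, and the `verified` status come from the text; the `GoldenRunView` shape and every other field name are hypothetical simplifications, not the real `state.json` schema.

```ts
// Hypothetical, flattened view of the fields some invariants mention.
interface GoldenRunView {
  appId?: string;
  workerStatus?: string;
  candidateStatus?: string;
  commit?: { verifierGated: boolean; checkpoint: boolean; evidence: string[] };
  feedbackCount: number;
}

// Returns one message per violated invariant; an empty array means all hold.
function checkGoldenInvariants(run: GoldenRunView): string[] {
  const violations: string[] = [];
  if (!run.appId) violations.push("run state is missing the workflow app id");
  if (run.workerStatus !== "verified") violations.push("worker did not reach verified");
  if (run.candidateStatus !== "verified") {
    violations.push("golden-candidate did not reach verified");
  }
  if (!run.commit) {
    violations.push("no final commit recorded");
  } else {
    if (!run.commit.verifierGated) violations.push("final commit is not verifier-gated");
    if (run.commit.checkpoint) violations.push("final commit is a checkpoint");
    if (run.commit.evidence.length === 0) {
      violations.push("final commit references no evidence");
    }
  }
  if (run.feedbackCount > 0) violations.push("ErrorFeedback records were produced");
  return violations;
}
```

Collecting every violation instead of stopping at the first one is a design choice of this sketch: it lets a failing run report which contracts broke in a single pass.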
0.1.51
@@ -0,0 +1,153 @@
# ERROR-FEEDBACK(7)

## NAME

Error Feedback Loop - inspectable diagnostic and correction state for Cool Workflow

## SYNOPSIS

```ts
import {
  collectRunErrors,
  createCorrectionTask,
  recordFeedback,
  resolveFeedback
} from "./error-feedback";

const records = collectRunErrors(run);
const task = createCorrectionTask(run, records[0].id, {
  verifierCommand: "npm test"
});
const resolved = resolveFeedback(run, records[0].id, {
  status: "resolved",
  nodeId: verifiedNode.id
});
```

## DESCRIPTION

The Error Feedback Loop is the small layer between structured failures and
operator correction. It records failures as durable JSON, classifies them with
plain identifiers, creates optional correction tasks, and resolves records only
after verifier evidence is present.

It does not repair code, retry stages, or own domain workflow behavior. Workflow
apps and operators decide how corrections are applied.

The loop follows:

```text
error -> classify -> feedback record -> correction task -> verify -> checkpoint
```

## FEEDBACK MODEL

Each feedback record is an `ErrorFeedbackRecord` with schema version `1`.
Important fields are:

- `id`
- `runId`
- `status`
- `severity`
- `classification`
- `source`
- `code`
- `message`
- `nodeId`
- `stageId`
- `contractId`
- `taskId`
- `path`
- `retryable`
- `evidence`
- `artifacts`
- `correctionTaskId`
- `resolvedByNodeId`
- `metadata`

The runtime also keeps `run.feedback` in `state.json` for quick inspection.
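
As a reading aid, the field list can be pictured as a TypeScript shape. The field names below are the ones listed in this section; every type, every optional marker, the `schemaVersion` property name, and the `"open"` initial status are assumptions made for illustration, not the declarations shipped in the package.

```ts
// "tasked", "resolved", and "rejected" appear in this page; "open" is assumed.
type FeedbackStatusSketch = "open" | "tasked" | "resolved" | "rejected";

// Illustrative shape only; types and optionality are guesses.
interface ErrorFeedbackRecordSketch {
  schemaVersion: 1;
  id: string;
  runId: string;
  status: FeedbackStatusSketch;
  severity: string;
  classification: string;
  source: string;
  code: string;
  message: string;
  nodeId?: string;
  stageId?: string;
  contractId?: string;
  taskId?: string;
  path?: string;
  retryable: boolean;
  evidence: string[];
  artifacts: string[];
  correctionTaskId?: string;
  resolvedByNodeId?: string;
  metadata: Record<string, unknown>;
}

// A made-up record, serializable as plain JSON like the durable files.
const exampleRecord: ErrorFeedbackRecordSketch = {
  schemaVersion: 1,
  id: "fb-example",
  runId: "run-example",
  status: "open",
  severity: "error",
  classification: "verifier-failure",
  source: "verifier",
  code: "EXAMPLE_CODE",
  message: "verifier command exited non-zero",
  retryable: true,
  evidence: [],
  artifacts: [],
  metadata: {}
};
```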

## FAILURE CLASSIFICATION

Classifications are stable, plain strings:

```text
contract-violation
verifier-failure
state-transition
missing-artifact
missing-evidence
parse-error
pipeline-failure
runtime-error
unknown
```

Classification is conservative. The feedback loop does not duplicate
`PipelineContract` validation logic; it classifies structured errors already
produced by StateNode, PipelineRunner, verifier, or CLI surfaces.
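
One way to picture "conservative" classification is a closed set of strings with a fallback to `unknown`. The sketch below uses exactly the nine identifiers listed in this section; the function name `toClassification` and the fallback rule itself are assumptions, not the package's actual classifier.

```ts
// The nine identifiers from the list in this section.
const CLASSIFICATIONS = [
  "contract-violation",
  "verifier-failure",
  "state-transition",
  "missing-artifact",
  "missing-evidence",
  "parse-error",
  "pipeline-failure",
  "runtime-error",
  "unknown"
] as const;

type FailureClassification = (typeof CLASSIFICATIONS)[number];

// Hypothetical helper: accept only known identifiers and never guess.
function toClassification(raw: string): FailureClassification {
  for (const known of CLASSIFICATIONS) {
    if (known === raw) return known;
  }
  return "unknown";
}
```

Anything unrecognized maps to `unknown` instead of the nearest-looking category, so a report never overstates what the loop knows about a failure.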

## CORRECTION TASKS

Correction tasks are normal task Markdown files under the run `tasks/`
directory. They include the original error, affected node/stage/contract,
evidence, expected verification command, and retry guidance.

Creating a correction task marks the feedback record as `tasked`. It does not
apply code changes.

Resolving feedback requires a node id whose status is `verified` or `committed`.
Rejected corrections are preserved by setting status to `rejected`.
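
The status rules in this section can be sketched as three small pure functions. The `tasked`, `resolved`, and `rejected` statuses and the `verified`/`committed` requirement come from the text; the type and function names are hypothetical, and the real `createCorrectionTask` and `resolveFeedback` also write files, which this sketch leaves out.

```ts
// Hypothetical minimal shapes; the real records carry more fields.
interface FeedbackSketch {
  id: string;
  status: string;
  correctionTaskId?: string;
  resolvedByNodeId?: string;
}

interface NodeSketch {
  id: string;
  status: string;
}

// Creating a correction task only marks the record; it changes no code.
function markTasked(record: FeedbackSketch, taskId: string): FeedbackSketch {
  return { ...record, status: "tasked", correctionTaskId: taskId };
}

// Resolution is refused unless the node is verified or committed.
function resolveWith(record: FeedbackSketch, node: NodeSketch): FeedbackSketch {
  if (node.status !== "verified" && node.status !== "committed") {
    throw new Error(
      "node " + node.id + " is " + node.status + "; expected verified or committed"
    );
  }
  return { ...record, status: "resolved", resolvedByNodeId: node.id };
}

// A rejected correction keeps its record and only changes status.
function rejectCorrection(record: FeedbackSketch): FeedbackSketch {
  return { ...record, status: "rejected" };
}
```

Each function returns a new object and leaves its input untouched, which keeps the earlier state available for comparison; whether the package does the same is not stated here.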

## FILES

```text
.cw/runs/<run-id>/feedback/<feedback-id>.json
.cw/runs/<run-id>/feedback/index.json
.cw/runs/<run-id>/state.json
.cw/runs/<run-id>/tasks/feedback:<feedback-id>.md
.cw/runs/<run-id>/report.md
```

## EXAMPLES

Collect failed node errors:

```text
node dist/cli.js feedback collect <run-id>
```

List feedback records:

```text
node dist/cli.js feedback list <run-id>
```

Show one feedback record:

```text
node dist/cli.js feedback show <run-id> <feedback-id>
```

Create a correction task:

```text
node dist/cli.js feedback task <run-id> <feedback-id> --verify "npm test"
```

Resolve after a verified node:

```text
node dist/cli.js feedback resolve <run-id> <feedback-id> --node <verified-node-id>
```

All commands print stable JSON.

## COMPATIBILITY

Error Feedback was introduced in CW v0.1.4. It adds optional `feedback` state and
`feedbackDir` path metadata. Older runs remain readable; missing fields are
initialized when the run is loaded.

Existing workflow, node, contract, pipeline, and CLI behavior is preserved.
0.1.51
@@ -0,0 +1,270 @@
# Evidence Adoption Reasoning Chain

CW v0.1.26 adds the Evidence Adoption Reasoning Chain. Earlier releases can
already answer *what* was adopted: `multi-agent evidence <run-id>` reports each
evidence item as `adopted`, `rejected`, `pending`, `superseded`, `conflicting`,
or `missing`, and traces the path worker result -> blackboard -> fanin ->
candidate score -> selection -> verifier-gated commit. This release answers
*why* each adoption happened, as a first-class, inspectable reasoning chain.

The design keeps the base-system observability philosophy:

- raw state is the source of truth
- the reasoning chain is a derived userland view, never a replacement for source
  records and never authoritative over them
- mechanism is separate from policy: the chain captures, stores, and renders the
  recorded "why"; what counts as a *sufficient* reason stays with the verifier
  and role policy
- fail closed, never infer: a "why" that cannot be traced to a real record
  renders as `unexplained`, never a fabricated rationale, and an unexplained
  adoption is never silently treated as explained
- plain files, stable JSON, deterministic output
- backward compatible; pre-v0.1.26 run state loads and renders with derived,
  empty-where-absent reasoning records

## What the chain records

For every adopted, rejected, superseded, or conflicting evidence item the chain
makes the following traceable and machine-readable, per gate:

- DECISION - what was adopted/rejected and at which gate (`fanin`,
  `candidate-score`, `selection`, `verifier`, or `commit`).
- BASIS - the concrete evidence refs, provenance source, parent evidence ids,
  and audit event ids that grounded the decision. These link to existing
  `EvidenceProvenance` and trust-audit records; they are not duplicated.
- AUTHORITY - which role / membership / worker / scorer / verifier made the call,
  and the role `policyRef` under which it was permitted. Links to existing
  trust / policy / audit records.
- RATIONALE - the explicit recorded reason. The chain reuses existing rationale
  fields: selection `reason` and `AcceptanceRationale`, candidate score `notes`,
  verifier commit-gate result, commit `reason`, `CoordinatorDecision.reason`, and
  judge-rationale audit metadata. No new rationale source of truth is created.
- COUNTERFACTUAL - the rejected/losing alternatives (rejected candidates, failed
  scores, rejected or superseded coordinator decisions) and the recorded reason
  each lost, with a normalized score delta when computable, so an adoption is
  understood relative to its alternatives.
- INTEGRITY - a `sourceFingerprint` and `valid|stale|absent` freshness so a
  reader knows the explanation still matches the underlying records, plus the
  explicit `unexplained` state when a rationale is absent.

## Derived record model

The records live in `src/types.ts` and reuse existing provenance / trust /
rationale types by reference:

- `EvidenceReasoningStep` - one gate's reasoning: `gate`, `decision`, `basis`,
  `authority`, `rationale`, and `counterfactuals`.
- `EvidenceReasoningChain` - the full chain for one evidence item: `id`, `ref`,
  `evidenceStatus`, a rolled-up `rationaleStatus`
  (`explained`, `unexplained`, or `not-applicable`), `sourceKind`, `steps`,
  `sourceRecordIds`, and `unexplainedReasons`.
- `EvidenceReasoningReport` - the run-level report: `freshness`,
  `sourceFingerprint`, `totals`, `chains`, and a `nextAction`.

Status values mirror the existing evidence vocabulary and add the fail-closed
`unexplained` state. A chain is `explained` only when *every* decision-bearing
step is explained; if any adopting step has no traceable rationale the chain
rolls up to `unexplained`.
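
The roll-up rule can be sketched in a few lines. The three status strings come from this section; `StepSketch`, the per-step `rationaleStatus` field, and the treatment of `not-applicable` steps as not decision-bearing are assumptions for illustration, not the actual logic in the package.

```ts
type RationaleStatus = "explained" | "unexplained" | "not-applicable";

// Hypothetical reduced step: only the fields the roll-up needs.
interface StepSketch {
  gate: string;
  rationaleStatus: RationaleStatus;
}

// Fail-closed roll-up: one unexplained decision-bearing step taints the chain.
function rollUpRationale(steps: StepSketch[]): RationaleStatus {
  // Assumption: steps marked not-applicable carry no decision.
  const deciding = steps.filter((s) => s.rationaleStatus !== "not-applicable");
  if (deciding.length === 0) return "not-applicable";
  return deciding.every((s) => s.rationaleStatus === "explained")
    ? "explained"
    : "unexplained";
}
```

For example, a chain whose `selection` step is explained but whose `commit` step is not rolls up to `unexplained`, matching the "every decision-bearing step" wording.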

## Durable storage

`multi-agent reasoning <run-id> --refresh` materializes a durable, versioned,
provenance-backed index under `.cw/runs/<run-id>/reasoning/`:

- `index.json` - the `EvidenceReasoningIndex` (schema version, run id,
  `sourceFingerprint`, totals, and per-chain entries with their own
  fingerprints).
- `chain-<evidence-id>.json` - one record per reasoning chain.
- `report.json` - the rendered report at refresh time.

Raw results, candidates, scores, selections, commits, blackboard records, and
audit events are never deleted or overwritten. The reasoning view is derived and
re-derivable; the index only persists a snapshot for freshness comparison.

## Freshness

`multi-agent reasoning <run-id>` re-derives the chain from current source state
and compares its `sourceFingerprint` against the persisted index:

- `absent` - no index has been refreshed yet.
- `valid` - the persisted fingerprint matches current source state.
- `stale` - source records changed since the last refresh; re-run with
  `--refresh`.

This follows the v0.1.25 state-explosion summary discipline exactly. Freshness
is a visible state, never an inferred guess.
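
The three freshness states reduce to a comparison of two fingerprints. A minimal sketch follows; the function name `freshnessOf` is hypothetical, and how `sourceFingerprint` is computed is not described here, so fingerprints are treated as opaque strings.

```ts
type Freshness = "absent" | "valid" | "stale";

// Compare the re-derived fingerprint with the one in the persisted index.
// `persisted` is undefined when no index has been written yet.
function freshnessOf(current: string, persisted: string | undefined): Freshness {
  if (persisted === undefined) return "absent";
  return persisted === current ? "valid" : "stale";
}
```

Because the result depends only on the two inputs, the same source state always yields the same freshness, which fits the "never an inferred guess" rule.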

## Commands

`multi-agent reasoning <run-id>` (MCP: `cw_evidence_reasoning`) renders the
report. Add `--evidence <id>` to explain a single adoption, `--refresh` to
materialize the durable index first, and `--json` / `--format json` for the full
machine-readable report.

`multi-agent reasoning <run-id> --refresh` with no `--evidence` returns the
written index (MCP: `cw_evidence_reasoning_refresh`).

`multi-agent evidence <run-id>` is unchanged in shape but each row now carries an
additive `rationaleStatus` field (`explained`, `unexplained`, or
`not-applicable`), so the existing evidence surface answers both *what* and
whether the *why* is recorded.

The console report adds a single new panel, `Adoption Rationale`, alongside the
existing operator panels. It is the only panel added by this release.

## Composition with graph views and compaction

The reasoning chain composes with the existing graph views, especially
`multi-agent graph <run-id> --view evidence`. A reasoning step is on the critical
path: every decision-gate node backing an adopted chain (candidate, score,
selection, commit, and fanin nodes) is protected from state-explosion
compaction and is never collapsed into a synthetic summary node. In particular
score nodes, which are otherwise collapsible, stay expanded when they carry a
reasoning step.

## Eval / replay regression gates

The eval harness adds three deterministic, replay-stable metrics, reported under
the `Evidence Adoption Reasoning Chain` section of the replay report:

- `reasoning_freshness` - the derived chain totals and `sourceFingerprint` are
  stable across replay.
- `reasoning_chain_parity` - every chain's gates, decisions, rationale statuses,
  and counterfactual counts match the baseline.
- `reasoning_unexplained_parity` - the set of `unexplained` chains is unchanged,
  so a regression that hides or fabricates a rationale fails the gate.

These sections are optional on pre-v0.1.26 snapshots so older fixtures stay
loadable.
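
As an illustration of the `reasoning_unexplained_parity` idea, the sketch below compares the set of `unexplained` chain ids in a baseline and a replay. The reduced `ChainSummary` shape and both function names are hypothetical; the eval harness's own comparison may differ.

```ts
// Hypothetical reduced chain: only what the parity check needs.
interface ChainSummary {
  id: string;
  rationaleStatus: string;
}

// Sorted ids of unexplained chains, so input order cannot affect the result.
function unexplainedIds(chains: ChainSummary[]): string[] {
  return chains
    .filter((c) => c.rationaleStatus === "unexplained")
    .map((c) => c.id)
    .sort();
}

// True only when both runs have exactly the same unexplained chains.
function unexplainedParity(baseline: ChainSummary[], replay: ChainSummary[]): boolean {
  const a = unexplainedIds(baseline);
  const b = unexplainedIds(replay);
  return a.length === b.length && a.every((id, i) => id === b[i]);
}
```

A replay that turns an `unexplained` chain into `explained` without a new source record changes the set and fails the check, which is the regression the gate is meant to catch.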

## Data flow

```
worker result / blackboard / coordinator decision
  -> EvidenceProvenance + trust-audit event (BASIS)
  -> role / membership / worker + role policyRef (AUTHORITY)
  -> fanin coverage / score notes+verdict / selection reason
     / verifier gate / commit reason / judge rationale (RATIONALE)
  -> rejected candidates / failed scores / rejected decisions (COUNTERFACTUAL)
  -> EvidenceReasoningChain (per evidence item)
  -> EvidenceReasoningReport + sourceFingerprint + freshness (INTEGRITY)
```

No daemon, no hidden dashboard, no LLM call. The chain is derived from recorded
state by `src/evidence-reasoning.ts` and rendered on demand.

## CLI ↔ MCP Parity (v0.1.27)

Every command and tool referenced above is declared in the v0.1.27 capability
registry (`src/capability-registry.ts`) and validated by `npm run parity:check`,
so `cw <cmd> --json` and the matching `cw_<tool>` result render one data source.
See [cli-mcp-parity.7.md](cli-mcp-parity.7.md).

## Run Registry / Control Plane (v0.1.28)

The runs described here are indexed, searchable, resumable, archivable, and
rerunnable across repos by the v0.1.28 Run Registry / Control Plane, which derives
a fingerprinted, fail-closed index over the same per-run `.cw/runs/<id>/state.json`
source of truth. See [run-registry-control-plane.7.md](run-registry-control-plane.7.md).

## Execution Backends (v0.1.29)

v0.1.29 lifts execution into a pluggable driver layer: one narrow `ExecutionBackend`
contract with interchangeable `node`/`bun`/`shell`/`container`/`remote`/`ci`
drivers, selected by `--backend` (parallel to `--sandbox`) and inspected via
`backend list|show|probe`. The result/evidence envelope is schema-identical across
backends; the backend id + sandbox attestation are recorded as provenance, so this
surface is unchanged regardless of which backend executed a run. See
[execution-backends.7.md](execution-backends.7.md).

## Web / Desktop Workbench (v0.1.30)

v0.1.30 adds the Web / Desktop Workbench: a read-only, localhost-only human
console that renders this surface (and the other operator panels: run graph,
blackboard, worker logs, candidate compare, audit timeline) for any run,
reading the SAME capability `--json` payloads. It is a THIRD FRONT DOOR alongside
the CLI and MCP that holds no authoritative state and forks no schema: each panel
equals its `cw <cmd> --json` payload byte-for-byte (parity-gated), and refresh
re-derives everything from disk. See
[web-desktop-workbench.7.md](web-desktop-workbench.7.md).

## Observability + Cost Accounting (v0.1.31)

v0.1.31 adds Observability + Cost Accounting: `metrics show`/`metrics summary`
derive durations, failure/verifier/acceptance rates (with sample counts and
fail-closed `n/a`), and host-attested token/cost from existing durable run state,
with no metrics database, no collector daemon, and no hidden counter. Usage is
additive and optional (absent ⇒ `unreported`, never 0); cost is `attested`
(attested usage × a recorded pricing policy) or clearly `estimated`, with pricing
as policy. Both verbs are parity-gated and render read-only in the v0.1.30
Workbench. See
[observability-cost-accounting.7.md](observability-cost-accounting.7.md).
|
|
199
|
+
|
|
200
|
+
|
|
201
|
+
## Team Collaboration (v0.1.32)
|
|
202
|
+
|
|
203
|
+
v0.1.32 adds Team Collaboration: a host-attested actor and append-only
|
|
204
|
+
approvals/rejections/comments/handoffs provenance-linked to a durable target,
|
|
205
|
+
plus a review gate that STACKS ON the verifier gate — required approvals from
|
|
206
|
+
authorized roles, enforced inside `resolveCommitGate` AFTER the verifier checks
|
|
207
|
+
and never instead of them, failing closed on quorum/authority/self-approval and
|
|
208
|
+
recording who approved the very artifact that shipped. Policy (required approvals,
|
|
209
|
+
authorized roles, self-approval) is data, default off (pre-v0.1.32 behavior
|
|
210
|
+
unchanged). The verbs are parity-gated and render read-only in the v0.1.30
|
|
211
|
+
Workbench. See [Team Collaboration](team-collaboration.7.md).
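
As a rough illustration of a review gate that stacks on a verifier gate, the sketch below checks the verifier result first and only then counts approvals. It is not the real `resolveCommitGate`; the function name `sketchCommitGate`, the policy fields (`requiredApprovals`, `authorizedRoles`, `allowSelfApproval`), and the approval record shape are all assumptions.

```javascript
// Hypothetical sketch; field and function names are illustrative only.
function sketchCommitGate({ verifierPassed, policy, approvals, author }) {
  // The verifier gate always runs first; review never replaces it.
  if (!verifierPassed) return { allowed: false, reason: 'verifier' };

  // Review policy is data and defaults to off.
  if (!policy || !policy.requiredApprovals) {
    return { allowed: true, reason: 'review-off' };
  }

  // Count distinct approvers that hold an authorized role and, unless the
  // policy allows it, are not the author of the change.
  const approvers = new Set();
  for (const a of approvals || []) {
    if (!policy.authorizedRoles.includes(a.role)) continue;
    if (a.actor === author && !policy.allowSelfApproval) continue;
    approvers.add(a.actor);
  }

  // Fail closed when the quorum is not met.
  if (approvers.size < policy.requiredApprovals) {
    return { allowed: false, reason: 'quorum' };
  }
  return { allowed: true, reason: 'approved', approvers: [...approvers] };
}
```

In this simplification, unauthorized or self-issued approvals are just not counted, so they surface as a quorum failure rather than as separate refusal reasons.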
|
|
212
|
+
|
|
213
|
+
## Release Tooling (v0.1.33)
|
|
214
|
+
|
|
215
|
+
The per-tag mechanical surfaces (version bump across 17 surfaces, feature scaffold, and the forward-reference docs) become deterministic scripts, with a de-duplicated release gate. See release-tooling(7).
|
|
216
|
+
|
|
217
|
+
## Real Execution Backend Integrations (v0.1.34)
|
|
218
|
+
|
|
219
|
+
The container/remote/ci backends really execute (docker/podman run, remote/CI POST-and-poll) under the sandbox contract, with byte-stable evidence versus node and fail-closed refusal when a runtime or endpoint is unavailable. See real-execution-backends(7).
|
|
220
|
+
|
|
221
|
+
## Node Snapshot / Diff / Replay (v0.1.35)
|
|
222
|
+
|
|
223
|
+
Per-node snapshot, structural diff, and isolated deterministic replay over StateNode, reusing the v0.1.23 eval harness; fail-closed on source drift (valid|stale|absent). See node-snapshot-diff-replay(7).
|
|
224
|
+
|
|
225
|
+
## Contract Migration Tooling (v0.1.36)
|
|
226
|
+
|
|
227
|
+
First-class declared migration registry (run-state + workflow-app) with per-edge compatibility proofs, fail-closed reachability, and a round-trip/non-destruction prover. See contract-migration-tooling(7).
|
|
228
|
+
|
|
229
|
+
## Control-Plane Scheduling (v0.1.37)
|
|
230
|
+
|
|
231
|
+
Priority + concurrency limits + lease lifecycle + retry/backoff + fail-closed park over the v0.1.28 Run Registry queue; policy-as-data, deterministic. See control-plane-scheduling(7).
|
|
232
|
+
|
|
233
|
+
## Agent Delegation Drive (v0.1.38)
|
|
234
|
+
|
|
235
|
+
Spawns an external agent process per worker, captures result.md + attestation, and auto-drives plan->dispatch->fulfill->accept->commit.
|
|
236
|
+
|
|
237
|
+
## Run Retention & Provable Reclamation (v0.1.39)
|
|
238
|
+
|
|
239
|
+
Tiered, append-only, cryptographically-verifiable run reclamation: seal the audit skeleton, free the reconstructable bulk, prove it.
|
|
240
|
+
|
|
241
|
+
## Durable State & Locking (v0.1.40)
|
|
242
|
+
|
|
243
|
+
Atomic temp->rename writes + fsync-durability for authoritative stores; a portable stale-stealing file lock serializes the cross-process read-modify-write stores.
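
The temp->rename write with fsync mentioned above is a general technique; a minimal Node.js sketch follows. The helper name `atomicWriteFileSync` and the temp-file naming scheme are assumptions for illustration and are not taken from the cool-workflow source.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: readers of `target` see either the old file or the
// complete new file, never a partially written one.
function atomicWriteFileSync(target, data) {
  const dir = path.dirname(target);
  // Temp file lives in the same directory so the rename stays on one filesystem.
  const tmp = path.join(dir, `.${path.basename(target)}.${process.pid}.tmp`);
  const fd = fs.openSync(tmp, 'w');
  try {
    fs.writeFileSync(fd, data); // write the full payload to the temp file
    fs.fsyncSync(fd);           // flush contents to storage before the rename
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmp, target);   // replace the target in a single step
}
```

This sketch does not fsync the containing directory and does not take a lock, so on its own it covers only the single-writer case; serializing concurrent read-modify-write cycles needs a separate lock.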
|
|
244
|
+
|
|
245
|
+
## Self-Audit Hardening & Pure-Router Decomposition (v0.1.41)
|
|
246
|
+
|
|
247
|
+
Evidence grounding + durable audit append + symlink-hardened containment + deterministic worker ids + recursive redaction; BackendRegistry self-describing drivers (no per-id switches); orchestrator god-object decomposed into per-domain operation modules (pure loadRun->delegate router).
|
|
248
|
+
|
|
249
|
+
## Robust Result Ingest (v0.1.42)
|
|
250
|
+
|
|
251
|
+
Captures findings/evidence from any reasonable agent shape (alt keys + prose); CW derives grounded evidence itself and warns on empty capture. Closes the v0.1.41 live-drive 'accepted with 0 captured' failure.
|
|
252
|
+
|
|
253
|
+
## No-False-Green Gate & Launch Prep (v0.1.43)
|
|
254
|
+
|
|
255
|
+
Hard gate blocking empty-capture verifier-gated commits, plus quickstart and launch-prep docs.
|
|
256
|
+
|
|
257
|
+
## Release-Gate Determinism & Agents Vendor (v0.1.44)
|
|
258
|
+
|
|
259
|
+
Release-readiness checks now validate the committed blob (`git show HEAD:<path>`) instead of the mutable working tree — eliminating false-red/false-green from concurrent working-tree writes (iCloud/Spotlight/editor). Adds the `agents` vendor manifest target: a generated `.agents/plugins/cool-workflow/` adapter giving any non-Claude AI agent one common interface to CW.
|
|
260
|
+
|
|
261
|
+
## P1-P2 Fixes & CI Content Surfaces (v0.1.49)
|
|
262
|
+
|
|
263
|
+
Migration DAG with reversible edges (v0.1.45), capability auto-discovery (v0.1.46), vendor-adapter registry (v0.1.47), state auto-compaction and P2 fixes (v0.1.48), plus CI content-surface determinism hardening (v0.1.49).
|
|
264
|
+
0.1.51
|
|
265
|
+
|
|
266
|
+
0.1.76
|
|
267
|
+
|
|
268
|
+
0.1.77
|
|
269
|
+
|
|
270
|
+
0.1.78
|