@chllming/wave-orchestration 0.5.2 → 0.5.4
- package/CHANGELOG.md +15 -0
- package/README.md +56 -501
- package/docs/README.md +39 -0
- package/docs/concepts/context7-vs-skills.md +94 -0
- package/docs/concepts/operating-modes.md +91 -0
- package/docs/concepts/runtime-agnostic-orchestration.md +95 -0
- package/docs/concepts/what-is-a-wave.md +133 -0
- package/docs/guides/planner.md +113 -0
- package/docs/guides/terminal-surfaces.md +80 -0
- package/docs/image.png +0 -0
- package/docs/plans/context7-wave-orchestrator.md +2 -0
- package/docs/plans/current-state.md +10 -0
- package/docs/plans/master-plan.md +3 -3
- package/docs/plans/migration.md +4 -3
- package/docs/plans/wave-orchestrator.md +27 -3
- package/docs/reference/runtime-config/README.md +19 -0
- package/docs/reference/skills.md +156 -0
- package/docs/roadmap.md +160 -564
- package/package.json +2 -1
- package/releases/manifest.json +32 -0
- package/scripts/wave-orchestrator/config.mjs +17 -0
- package/scripts/wave-orchestrator/context7.mjs +9 -0
- package/scripts/wave-orchestrator/coordination.mjs +16 -0
- package/scripts/wave-orchestrator/executors.mjs +24 -11
- package/scripts/wave-orchestrator/install.mjs +41 -2
- package/scripts/wave-orchestrator/launcher.mjs +131 -25
- package/scripts/wave-orchestrator/planner.mjs +1328 -0
- package/scripts/wave-orchestrator/project-profile.mjs +190 -0
- package/scripts/wave-orchestrator/shared.mjs +2 -0
- package/scripts/wave-orchestrator/skills.mjs +448 -0
- package/scripts/wave-orchestrator/terminals.mjs +16 -0
- package/scripts/wave-orchestrator/traces.mjs +23 -0
- package/scripts/wave-orchestrator/wave-files.mjs +299 -84
- package/scripts/wave.mjs +11 -0
- package/skills/provider-aws/SKILL.md +6 -0
- package/skills/provider-aws/skill.json +5 -0
- package/skills/provider-custom-deploy/SKILL.md +5 -0
- package/skills/provider-custom-deploy/skill.json +5 -0
- package/skills/provider-docker-compose/SKILL.md +6 -0
- package/skills/provider-docker-compose/skill.json +5 -0
- package/skills/provider-github-release/SKILL.md +6 -0
- package/skills/provider-github-release/skill.json +5 -0
- package/skills/provider-kubernetes/SKILL.md +6 -0
- package/skills/provider-kubernetes/skill.json +5 -0
- package/skills/provider-railway/SKILL.md +6 -0
- package/skills/provider-railway/adapters/claude.md +1 -0
- package/skills/provider-railway/adapters/codex.md +1 -0
- package/skills/provider-railway/adapters/local.md +1 -0
- package/skills/provider-railway/adapters/opencode.md +1 -0
- package/skills/provider-railway/skill.json +5 -0
- package/skills/provider-ssh-manual/SKILL.md +6 -0
- package/skills/provider-ssh-manual/skill.json +5 -0
- package/skills/repo-coding-rules/SKILL.md +7 -0
- package/skills/repo-coding-rules/skill.json +5 -0
- package/skills/role-deploy/SKILL.md +6 -0
- package/skills/role-deploy/skill.json +5 -0
- package/skills/role-documentation/SKILL.md +6 -0
- package/skills/role-documentation/skill.json +5 -0
- package/skills/role-evaluator/SKILL.md +6 -0
- package/skills/role-evaluator/skill.json +5 -0
- package/skills/role-implementation/SKILL.md +6 -0
- package/skills/role-implementation/skill.json +5 -0
- package/skills/role-infra/SKILL.md +6 -0
- package/skills/role-infra/skill.json +5 -0
- package/skills/role-integration/SKILL.md +6 -0
- package/skills/role-integration/skill.json +5 -0
- package/skills/role-research/SKILL.md +6 -0
- package/skills/role-research/skill.json +5 -0
- package/skills/runtime-claude/SKILL.md +6 -0
- package/skills/runtime-claude/skill.json +5 -0
- package/skills/runtime-codex/SKILL.md +6 -0
- package/skills/runtime-codex/skill.json +5 -0
- package/skills/runtime-local/SKILL.md +5 -0
- package/skills/runtime-local/skill.json +5 -0
- package/skills/runtime-opencode/SKILL.md +6 -0
- package/skills/runtime-opencode/skill.json +5 -0
- package/skills/wave-core/SKILL.md +7 -0
- package/skills/wave-core/skill.json +5 -0
- package/wave.config.json +27 -0
package/docs/roadmap.md
CHANGED
|
@@ -1,626 +1,222 @@
|
|
|
1
1
|
# Wave Orchestrator Roadmap
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
Wave Orchestrator should keep wave markdown as the authored plan surface, but it needs a higher planning-fidelity bar and a better authoring loop.
|
|
4
4
|
|
|
5
|
-
-
|
|
6
|
-
- wave markdown as the authored plan surface
|
|
7
|
-
- multi-role agents with explicit ownership
|
|
8
|
-
- component promotions, exit contracts, documentation stewardship, and evaluator closure
|
|
5
|
+
The same planning and execution substrate should also support ad-hoc operator requests without forcing every one-off task into the long-lived numbered roadmap sequence.
|
|
9
6
|
|
|
10
|
-
The
|
|
7
|
+
The target is the level of specificity shown in [Wave 7](/home/coder/slowfast.ai/docs/plans/waves/wave-7.md): explicit sequencing, hard requirements, exact validation commands, earlier-wave inputs, concrete ownership, and clear closure rules. This roadmap focuses on how to get this repo there without replacing the current architecture.
|
|
11
8
|
|
|
12
|
-
## Current
|
|
9
|
+
## Current Position
|
|
13
10
|
|
|
14
|
-
|
|
11
|
+
The repository already has the right runtime substrate:
|
|
15
12
|
|
|
16
|
-
- the package-first install and upgrade flow is shipped
|
|
17
|
-
- the canonical coordination JSONL store, rendered board projection, compiled inboxes, per-wave ledger, docs queue, integration summaries, and trace bundles are shipped
|
|
18
|
-
- `A8` integration stewardship and staged closure are shipped
|
|
19
|
-
- orchestrator-first clarification triage and human-escalation artifacts are shipped
|
|
20
|
-
- per-agent executor profiles, per-lane runtime policy, hard runtime mix targets, retry-time fallback, and generic budgets are shipped
|
|
21
|
-
- required inbound cross-lane dependency tickets now block both autonomous wave launch and lane finalization
|
|
22
|
-
- integration summaries now carry actionable evidence for claims, interface drift, proof gaps, docs gaps, and deploy or ops risk
|
|
23
|
-
- cumulative `quality.json` metrics and internal, read-only hermetic trace replay validation are shipped
|
|
24
|
-
- capability-targeted requests now become explicit helper assignments with deterministic assignee selection, ledger/traces coverage, and closure barriers
|
|
25
|
-
- typed cross-lane dependency workflows now have operator commands, per-wave dependency snapshots, and replay-visible gating
|
|
26
|
-
|
|
27
|
-
The remaining roadmap work is mostly about extending those foundations rather than inventing a new orchestration model.
|
|
28
|
-
|
|
29
|
-
## Design Position
|
|
30
|
-
|
|
31
|
-
The recent harness and blackboard sources point in the same direction:
|
|
32
|
-
|
|
33
|
-
- compaction alone is not enough for long-running work
|
|
34
|
-
- append-only communication logs are useful, but not sufficient as the canonical coordination substrate
|
|
35
|
-
- messaging quality matters less than whether the system can integrate distributed findings into a coherent decision
|
|
36
|
-
- runtime choice should be treated as authored plan data, not only as a launch-time default
|
|
37
|
-
- clarification should stay inside the harness loop until the orchestrator can prove that human input is actually required
|
|
38
|
-
- the harness needs reproducible traces, explicit loop control, and durable state across sessions
|
|
39
|
-
|
|
40
|
-
Wave Orchestration already has a strong base:
|
|
41
|
-
|
|
42
|
-
- wave parsing and role imports
|
|
43
13
|
- lane-scoped state under `.tmp/`
|
|
44
|
-
-
|
|
45
|
-
-
|
|
46
|
-
-
|
|
47
|
-
-
|
|
48
|
-
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
-
|
|
59
|
-
-
|
|
60
|
-
-
|
|
61
|
-
-
|
|
62
|
-
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
-
|
|
74
|
-
-
|
|
75
|
-
-
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
-
|
|
84
|
-
- `
|
|
85
|
-
- `
|
|
86
|
-
- `
|
|
87
|
-
- `
|
|
88
|
-
-
|
|
89
|
-
-
|
|
90
|
-
- `
|
|
91
|
-
-
|
|
92
|
-
|
|
93
|
-
Required fields:
|
|
94
|
-
|
|
95
|
-
- `id`
|
|
96
|
-
- `kind`
|
|
97
|
-
- `wave`
|
|
98
|
-
- `lane`
|
|
99
|
-
- `agentId`
|
|
100
|
-
- `targets`
|
|
101
|
-
- `status`
|
|
102
|
-
- `priority`
|
|
103
|
-
- `artifactRefs`
|
|
104
|
-
- `dependsOn`
|
|
105
|
-
- `closureCondition`
|
|
106
|
-
- `createdAt`
|
|
107
|
-
- `updatedAt`
|
|
108
|
-
- `confidence`
|
|
109
|
-
- `summary`
|
|
110
|
-
- `detail`
|
|
111
|
-
|
|
112
|
-
Compatibility rule:
|
|
113
|
-
|
|
114
|
-
- keep writing the markdown board, but generate it from the coordination store and append a short human-readable projection
|
|
115
|
-
|
|
116
|
-
### 2. Agent Inbox Compiler
|
|
117
|
-
|
|
118
|
-
Stop injecting a raw board tail into every agent prompt. Compile role-specific inboxes from the canonical coordination state.
|
|
119
|
-
|
|
120
|
-
Why this is high value:
|
|
121
|
-
|
|
122
|
-
- Long-running harness guidance favors explicit handoff artifacts and short “get up to speed” paths.
|
|
123
|
-
- Raw tail snapshots are noisy and lose important old-but-still-open obligations.
|
|
124
|
-
- Multi-agent blackboard systems work when the current blackboard state determines who should act next and what they should see.
|
|
125
|
-
|
|
126
|
-
Proposed artifacts:
|
|
127
|
-
|
|
128
|
-
- `.tmp/<lane>-wave-launcher/inboxes/wave-<n>/<agent-id>.md`
|
|
129
|
-
- `.tmp/<lane>-wave-launcher/inboxes/wave-<n>/shared-summary.md`
|
|
130
|
-
|
|
131
|
-
Each inbox should contain:
|
|
132
|
-
|
|
133
|
-
- owned open requests
|
|
134
|
-
- claims that conflict with this agent’s work
|
|
135
|
-
- unresolved blockers affecting owned files or components
|
|
136
|
-
- required doc deltas
|
|
137
|
-
- human feedback relevant to that agent
|
|
138
|
-
- integration findings from prior attempts
|
|
139
|
-
- only the minimal recent audit context needed for recovery
|
|
140
|
-
|
|
141
|
-
Prompt change:
|
|
14
|
+
- wave parsing and validation
|
|
15
|
+
- role-based execution with evaluator, integration, and documentation stewards
|
|
16
|
+
- executor profiles and lane runtime policy
|
|
17
|
+
- compiled inboxes, ledgers, docs queues, dependency snapshots, and trace bundles
|
|
18
|
+
- orchestrator-first clarification handling and human feedback workflows
|
|
19
|
+
|
|
20
|
+
The biggest remaining gap is not runtime execution. It is authored planning quality, the tooling around planning, and a lower-friction entry point for ad-hoc work that still preserves the same coordination and trace surfaces.
|
|
21
|
+
|
|
22
|
+
## Planning Fidelity Target
|
|
23
|
+
|
|
24
|
+
Every serious wave should be able to answer these questions before launch:
|
|
25
|
+
|
|
26
|
+
- What earlier waves or artifacts are prerequisites?
|
|
27
|
+
- What exact components are being promoted and why now?
|
|
28
|
+
- What is the required runtime mix and fallback policy?
|
|
29
|
+
- Which deploy environment or infra substrate is in scope?
|
|
30
|
+
- Is the run `oversight` or `dark-factory`?
|
|
31
|
+
- What exact validation commands must pass?
|
|
32
|
+
- What exact artifact closes the role?
|
|
33
|
+
|
|
34
|
+
Generated waves and transient ad-hoc runs should default to these sections when relevant:
|
|
35
|
+
|
|
36
|
+
- sequencing note
|
|
37
|
+
- reference rule or source-of-truth note
|
|
38
|
+
- project bootstrap context
|
|
39
|
+
- deploy environments
|
|
40
|
+
- component promotions
|
|
41
|
+
- Context7 defaults
|
|
42
|
+
- per-agent required context
|
|
43
|
+
- earlier-wave outputs to read
|
|
44
|
+
- requirements
|
|
45
|
+
- validation
|
|
46
|
+
- output or closure contract
|
|
47
|
+
|
|
48
|
+
## Phase 1: Planner Foundation
|
|
49
|
+
|
|
50
|
+
Status: shipped in `0.5.4`.
|
|
51
|
+
|
|
52
|
+
- Add saved project bootstrap memory in `.wave/project-profile.json`.
|
|
53
|
+
- Ask once whether the repo is a new project and keep that answer for future drafts.
|
|
54
|
+
- Add `wave project setup` and `wave project show`.
|
|
55
|
+
- Add interactive `wave draft` that writes:
|
|
56
|
+
  - `docs/plans/waves/specs/wave-<n>.json`
|
|
57
|
+
  - `docs/plans/waves/wave-<n>.md`
|
|
58
|
+
- Treat the JSON draft spec as the canonical authoring artifact and render markdown from it.
|
|
59
|
+
- Keep generated waves fully compatible with the current parser and launcher.
|
|
60
|
+
- Add `wave launch --terminal-surface vscode|tmux|none`.
|
|
61
|
+
- Support a tmux-only operator mode that never touches `.vscode/terminals.json`.
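The draft output layout described in this phase can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name `draft_paths` and the `repo_root` parameter are hypothetical, and only the two path patterns come from the roadmap text.

```python
from pathlib import PurePosixPath


def draft_paths(wave_number: int, repo_root: str = ".") -> dict:
    """Return the two files a draft for wave <n> would write (hypothetical helper)."""
    waves_dir = PurePosixPath(repo_root) / "docs" / "plans" / "waves"
    return {
        # The JSON draft spec is the canonical authoring artifact.
        "spec": waves_dir / "specs" / f"wave-{wave_number}.json",
        # The markdown wave is rendered from the JSON spec.
        "markdown": waves_dir / f"wave-{wave_number}.md",
    }
```

Keeping both paths derived from one wave number makes it harder for the spec and its rendered markdown to drift apart.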
|
|
142
62
|
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
### 3. Explicit Integration Phase Before Final Closure
|
|
146
|
-
|
|
147
|
-
Add a dedicated integration phase between implementation completion and documentation/evaluator closure.
|
|
148
|
-
|
|
149
|
-
Why this is essential:
|
|
63
|
+
Why first:
|
|
150
64
|
|
|
151
|
-
-
|
|
152
|
-
-
|
|
153
|
-
-
|
|
65
|
+
- Better planning is the highest-leverage missing piece.
|
|
66
|
+
- The repo already has strong runtime and closure machinery.
|
|
67
|
+
- Project memory removes repeated setup questions and gives future planner steps a durable baseline.
|
|
154
68
|
|
|
155
|
-
|
|
69
|
+
## Phase 2: Ad-Hoc Task Runs
|
|
156
70
|
|
|
157
|
-
-
|
|
158
|
-
- the integration steward does not own feature implementation
|
|
159
|
-
- it owns synthesis, conflict detection, integration risk, and open dependency reconciliation
|
|
71
|
+
The orchestrator should support operator-driven one-off requests without requiring the user to author or commit a numbered roadmap wave first.
|
|
160
72
|
|
|
161
|
-
|
|
73
|
+
CLI target:
|
|
162
74
|
|
|
163
|
-
-
|
|
164
|
-
-
|
|
75
|
+
- `wave adhoc plan --task "..."`
|
|
76
|
+
- `wave adhoc run --task "..." [--task "..."]`
|
|
77
|
+
- `wave adhoc list`
|
|
78
|
+
- `wave adhoc show --run <id>`
|
|
165
79
|
|
|
166
|
-
|
|
80
|
+
Behavior:
|
|
167
81
|
|
|
168
|
-
-
|
|
169
|
-
-
|
|
170
|
-
-
|
|
171
|
-
-
|
|
172
|
-
-
|
|
173
|
-
- proof gaps
|
|
174
|
-
- doc gaps
|
|
175
|
-
- release/deploy risks
|
|
176
|
-
- final recommendation: `ready-for-doc-closure` or `needs-more-work`
|
|
82
|
+
- accept one or more free-form task requests
|
|
83
|
+
- normalize them into a single transient plan or spec
|
|
84
|
+
- synthesize the worker roles needed for the request while still preserving evaluator, integration, and documentation closure when relevant
|
|
85
|
+
- run that transient plan through the existing launcher, coordination, inbox, ledger, docs queue, integration, and trace machinery
|
|
86
|
+
- keep ad-hoc runs logged, inspectable, and replayable with the same basic operator surfaces as roadmap waves
|
|
177
87
|
|
|
178
|
-
|
|
88
|
+
Storage model:
|
|
179
89
|
|
|
180
|
-
-
|
|
90
|
+
- do not write ad-hoc runs into the canonical numbered wave sequence under `docs/plans/waves/`
|
|
91
|
+
- store the original request, generated spec, rendered markdown, and final result under `.wave/adhoc/runs/<run-id>/`
|
|
92
|
+
- keep runtime state isolated under `.tmp/<lane>-wave-launcher/adhoc/<run-id>/`
|
|
93
|
+
- extend trace metadata with `runKind: adhoc` and `runId`
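The storage model above can be sketched as a small path mapping. The function name `adhoc_layout` and the returned dictionary keys are assumptions made for illustration; the directory patterns and the `runKind` and `runId` metadata fields are the ones named in the list.

```python
from pathlib import PurePosixPath


def adhoc_layout(run_id: str, lane: str) -> dict:
    """Map one ad-hoc run to its storage locations (hypothetical helper)."""
    if not run_id or "/" in run_id:
        # Assumption: a run id is a single path segment.
        raise ValueError("run_id must be a non-empty path segment")
    return {
        # Original request, generated spec, rendered markdown, final result.
        "run_dir": PurePosixPath(".wave/adhoc/runs") / run_id,
        # Runtime state, isolated from numbered roadmap waves.
        "runtime_dir": PurePosixPath(".tmp") / f"{lane}-wave-launcher" / "adhoc" / run_id,
        # Extra fields that mark a trace as belonging to an ad-hoc run.
        "trace_metadata": {"runKind": "adhoc", "runId": run_id},
    }
```

Nothing in this layout touches `docs/plans/waves/`, which is the point of the first storage rule.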
|
|
181
94
|
|
|
182
|
-
|
|
95
|
+
Design constraints:
|
|
183
96
|
|
|
184
|
-
|
|
97
|
+
- reuse the planner and launcher instead of building a second runtime
|
|
98
|
+
- treat ad-hoc as a transient single-run execution unit, not a fake roadmap wave
|
|
99
|
+
- do not let ad-hoc completion mutate normal `completedWaves` lane state
|
|
100
|
+
- give `wave coord`, `wave feedback`, and future replay or reporting flows a way to target `--run <id>`
|
|
185
101
|
|
|
186
102
|
Why this matters:
|
|
187
103
|
|
|
188
|
-
-
|
|
189
|
-
-
|
|
104
|
+
- many real operator requests are one-off bugfix, investigation, doc, infra, or release tasks
|
|
105
|
+
- the framework's coordination, closure, and traceability should apply to ad-hoc work too
|
|
106
|
+
- isolated ad-hoc runs preserve auditability without polluting the long-lived roadmap
|
|
190
107
|
|
|
191
|
-
|
|
108
|
+
## Phase 3: Forward Replanning
|
|
192
109
|
|
|
193
|
-
|
|
110
|
+
Add `wave update --from-wave <n>`.
|
|
194
111
|
|
|
195
|
-
|
|
112
|
+
Rules:
|
|
196
113
|
|
|
197
|
-
-
|
|
198
|
-
-
|
|
199
|
-
-
|
|
200
|
-
-
|
|
201
|
-
- docs status
|
|
202
|
-
- infra/deploy status
|
|
203
|
-
- dependent tasks
|
|
204
|
-
- baseline verification status
|
|
114
|
+
- closed waves are immutable
|
|
115
|
+
- the current open wave and later waves may be regenerated
|
|
116
|
+
- replanning must record what changed and why
|
|
117
|
+
- new repo state, new user intent, and refreshed research may all trigger a replan
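A minimal sketch of these rules, under stated assumptions: the function `plan_forward_replan` and its return shape are hypothetical, and treating closed waves at or after `<n>` as skipped (rather than as an error) is one possible reading of the immutability rule, not confirmed behavior.

```python
def plan_forward_replan(all_waves, closed_waves, from_wave, reason):
    """Select the waves a forward replan may regenerate (illustrative only)."""
    if not reason.strip():
        # Replanning must record what changed and why.
        raise ValueError("a replan must record why it happened")
    # Only the starting wave and later waves are candidates.
    candidates = sorted(w for w in all_waves if w >= from_wave)
    return {
        # Open waves from the starting point onward may be regenerated.
        "regenerate": [w for w in candidates if w not in closed_waves],
        # Closed waves are immutable, so they are left untouched.
        "skipped_closed": [w for w in candidates if w in closed_waves],
        "reason": reason.strip(),
    }
```

The returned record doubles as raw material for the short replan summary an operator would review.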
|
|
205
118
|
|
|
206
|
-
|
|
119
|
+
Outputs:
|
|
207
120
|
|
|
208
|
-
-
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
Use coordination state to drive execution decisions.
|
|
121
|
+
- updated draft JSON specs
|
|
122
|
+
- regenerated markdown waves
|
|
123
|
+
- a short replan summary for operator review
|
|
213
124
|
|
|
214
125
|
Why this matters:
|
|
215
126
|
|
|
216
|
-
-
|
|
217
|
-
-
|
|
218
|
-
|
|
219
|
-
Additions:
|
|
220
|
-
|
|
221
|
-
- if an agent has unacknowledged targeted requests, prioritize or relaunch that agent
|
|
222
|
-
- if a high-priority blocker remains unresolved, prevent wave completion
|
|
223
|
-
- if integration detects unresolved cross-agent contradictions, force a focused follow-up round
|
|
224
|
-
- if only documentation deltas remain, relaunch only the documentation steward
|
|
225
|
-
- if only deployment or infra proof remains, relaunch only the relevant infra/deploy role
|
|
226
|
-
|
|
227
|
-
### 6. Mixed-Runtime Planning And Runtime Profiles
|
|
228
|
-
|
|
229
|
-
Treat executor choice as authored plan data at wave design time, not only as a launcher default.
|
|
230
|
-
|
|
231
|
-
Why this is useful:
|
|
232
|
-
|
|
233
|
-
- The current harness already supports per-agent executor selection, but the planning surface is too narrow for real mixed-runtime lane design.
|
|
234
|
-
- Different roles benefit from different runtimes: implementation, evaluation, documentation, integration, and infra/deploy do not need identical execution substrates.
|
|
235
|
-
- The OpenAI App Server pattern and OPENDEV's provider-conditional harness design both point toward a stable harness loop with swappable underlying runtimes.
|
|
236
|
-
|
|
237
|
-
Wave file change:
|
|
238
|
-
|
|
239
|
-
- strengthen `### Executor` from optional override into a first-class planning section for roles that need non-default runtime behavior
|
|
240
|
-
- allow runtime profiles plus inline overrides
|
|
241
|
-
|
|
242
|
-
Recommended keys:
|
|
243
|
-
|
|
244
|
-
- `id`
|
|
245
|
-
- `profile`
|
|
246
|
-
- `model`
|
|
247
|
-
- `fallbacks`
|
|
248
|
-
- `tags`
|
|
249
|
-
- `budget.turns`
|
|
250
|
-
- `budget.minutes`
|
|
251
|
-
- `codex.command`
|
|
252
|
-
- `codex.sandbox`
|
|
253
|
-
- `claude.command`
|
|
254
|
-
- `claude.agent`
|
|
255
|
-
- `claude.permission_mode`
|
|
256
|
-
- `claude.permission_prompt_tool`
|
|
257
|
-
- `claude.max_turns`
|
|
258
|
-
- `claude.mcp_config`
|
|
259
|
-
- `claude.settings`
|
|
260
|
-
- `claude.output_format`
|
|
261
|
-
- `claude.allowed_tools`
|
|
262
|
-
- `claude.disallowed_tools`
|
|
263
|
-
- `opencode.command`
|
|
264
|
-
- `opencode.agent`
|
|
265
|
-
- `opencode.attach`
|
|
266
|
-
- `opencode.format`
|
|
267
|
-
- `opencode.steps`
|
|
268
|
-
- `opencode.instructions`
|
|
269
|
-
- `opencode.permission`
|
|
270
|
-
|
|
271
|
-
Lane config additions:
|
|
272
|
-
|
|
273
|
-
- `executors.profiles.<profile-name>`
|
|
274
|
-
- `lanes.<lane>.runtimeMixTargets`
|
|
275
|
-
- `lanes.<lane>.defaultExecutorByRole`
|
|
276
|
-
- `lanes.<lane>.fallbackExecutorOrder`
|
|
277
|
-
|
|
278
|
-
Example runtime mix target:
|
|
279
|
-
|
|
280
|
-
- `codex: 3`
|
|
281
|
-
- `claude: 3`
|
|
282
|
-
- `opencode: 2`
|
|
283
|
-
|
|
284
|
-
Use:
|
|
285
|
-
|
|
286
|
-
- planners assign runtime and runtime profile inside the wave, not only at launch time
|
|
287
|
-
- launcher validation accepts only supported runtime fields and rejects silent drift
|
|
288
|
-
- the orchestrator can reassign an agent only when the fallback policy allows it
|
|
289
|
-
- dashboards, ledgers, and traces report runtime by agent, by role, and by fallback path
|
|
290
|
-
|
|
291
|
-
### 7. Capability-Based Volunteer Roles
|
|
292
|
-
|
|
293
|
-
Extend fixed roles with optional capability-based volunteering.
|
|
294
|
-
|
|
295
|
-
Why this is useful:
|
|
296
|
-
|
|
297
|
-
- the blackboard papers show that rigid controller knowledge does not scale well
|
|
298
|
-
- the current wave format already supports multiple roles; capability tags make routing smarter without removing explicit ownership
|
|
299
|
-
|
|
300
|
-
Wave file addition:
|
|
301
|
-
|
|
302
|
-
- optional `### Capabilities`
|
|
303
|
-
|
|
304
|
-
Examples:
|
|
305
|
-
|
|
306
|
-
- `integration`
|
|
307
|
-
- `docs-shared-plan`
|
|
308
|
-
- `infra-k8s`
|
|
309
|
-
- `deploy-railway`
|
|
310
|
-
- `schema-migration`
|
|
311
|
-
- `frontend-validation`
|
|
312
|
-
|
|
313
|
-
Use:
|
|
314
|
-
|
|
315
|
-
- requests can target a named agent or a capability class
|
|
316
|
-
- the launcher can assign the next step to the least-busy matching agent or a configured preferred role
|
|
127
|
+
- multi-wave plans drift as code lands
|
|
128
|
+
- research and infra assumptions change
|
|
129
|
+
- forward-only replanning preserves auditability without pretending older waves never existed
|
|
317
130
|
|
|
318
|
-
|
|
131
|
+
## Phase 4: Infra and Deploy-Aware Planning
|
|
319
132
|
|
|
320
|
-
|
|
133
|
+
Infra and deploy roles need typed environment context, not just free-form prompt notes.
|
|
321
134
|
|
|
322
|
-
|
|
135
|
+
The project profile should support typed deploy providers with a `custom` escape hatch:
|
|
323
136
|
|
|
324
|
-
-
|
|
325
|
-
-
|
|
326
|
-
-
|
|
137
|
+
- `railway-mcp`
|
|
138
|
+
- `railway-cli`
|
|
139
|
+
- `docker-compose`
|
|
140
|
+
- `kubernetes`
|
|
141
|
+
- `ssh-manual`
|
|
142
|
+
- `custom`
|
|
327
143
|
|
|
328
|
-
|
|
144
|
+
Planner-generated infra or deploy roles should know:
|
|
329
145
|
|
|
330
|
-
|
|
331
|
-
|
|
332
|
-
|
|
333
|
-
|
|
334
|
-
|
|
335
|
-
- resolves it from existing policy or prior decisions
|
|
336
|
-
- escalates to a human when external intent is truly missing
|
|
337
|
-
4. only unresolved product, policy, safety, or externally-owned decisions become human tickets
|
|
146
|
+
- which environment they own
|
|
147
|
+
- which substrate is authoritative
|
|
148
|
+
- what credentials or executors are expected
|
|
149
|
+
- what validation commands prove readiness
|
|
150
|
+
- what rollback or recovery guidance applies
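One way to picture typed environment context is a small validated record. The sketch below is an assumption-laden illustration: the class `DeployEnvironment` and all of its field names are hypothetical, and only the provider identifiers and the five things a role should know are taken from the lists above.

```python
from dataclasses import dataclass

# Provider identifiers listed in the roadmap; "custom" is the escape hatch.
DEPLOY_PROVIDERS = frozenset(
    {"railway-mcp", "railway-cli", "docker-compose", "kubernetes", "ssh-manual", "custom"}
)


@dataclass(frozen=True)
class DeployEnvironment:
    """Typed context for an infra or deploy role (hypothetical shape)."""

    name: str                     # which environment the role owns
    provider: str                 # which substrate is authoritative
    expected_credentials: tuple = ()   # credentials or executors expected
    validation_commands: tuple = ()    # commands that prove readiness
    rollback_guidance: str = ""        # rollback or recovery notes

    def __post_init__(self):
        # Reject unknown providers instead of silently accepting free text.
        if self.provider not in DEPLOY_PROVIDERS:
            raise ValueError(f"unknown deploy provider: {self.provider!r}")
```

A provider that does not fit the typed set would be declared as `custom` rather than smuggled in as an unrecognized string.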
|
|
338
151
|
|
|
339
|
-
|
|
152
|
+
This is especially important for `dark-factory` mode. Fully autonomous infra work should require stronger environment modeling than human-overseen work.
|
|
340
153
|
|
|
341
|
-
-
|
|
342
|
-
- `orchestrator-guidance`
|
|
343
|
-
- `resolved-by-policy`
|
|
344
|
-
- `human-escalation`
|
|
345
|
-
- `human-feedback`
|
|
154
|
+
## Phase 5: Oversight and Dark-Factory Modes
|
|
346
155
|
|
|
347
|
-
|
|
156
|
+
Execution posture must be explicit plan data.
|
|
348
157
|
|
|
349
|
-
|
|
350
|
-
- `.tmp/<lane>-wave-launcher/feedback/triage/wave-<n>/pending-human.md`
|
|
158
|
+
Default:
|
|
351
159
|
|
|
352
|
-
|
|
160
|
+
- `oversight`
|
|
353
161
|
|
|
354
|
-
-
|
|
355
|
-
- autonomous mode should drain orchestrator-resolvable clarification items before refusing to continue
|
|
356
|
-
- answered human feedback should be written back into the coordination store and wave ledger so the same question is not asked twice
|
|
162
|
+
Opt-in:
|
|
357
163
|
|
|
358
|
-
|
|
164
|
+
- `dark-factory`
|
|
359
165
|
|
|
360
|
-
|
|
166
|
+
`oversight` means:
|
|
361
167
|
|
|
362
|
-
|
|
168
|
+
- human checkpoints remain normal for live mutation, deploy, release, or risky infra work
|
|
169
|
+
- the planner should generate explicit review gates
|
|
363
170
|
|
|
364
|
-
-
|
|
365
|
-
- without this, harness changes are anecdotal
|
|
171
|
+
`dark-factory` means:
|
|
366
172
|
|
|
367
|
-
|
|
173
|
+
- the wave is intended to run end-to-end without routine human approvals
|
|
174
|
+
- deploy environment, validation, rollback, and closure signals must be stricter
|
|
175
|
+
- missing environment context is a planning error, not a runtime surprise
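The difference between the two modes can be sketched as a pre-launch check. Everything named here is hypothetical: the function `planning_errors`, the plan dictionary, and its keys (`mode`, `deploy_environment`, `validation`, `rollback`, `closure`) are illustrative stand-ins for whatever the planner actually records.

```python
EXECUTION_MODES = ("oversight", "dark-factory")
DEFAULT_MODE = "oversight"  # the roadmap names oversight as the default

# Assumed field names for the context dark-factory plans must carry.
REQUIRED_DARK_FACTORY_FIELDS = ("deploy_environment", "validation", "rollback", "closure")


def planning_errors(plan: dict) -> list:
    """Return planning errors for a plan's execution posture (illustrative)."""
    mode = plan.get("mode", DEFAULT_MODE)
    if mode not in EXECUTION_MODES:
        return [f"unknown execution mode: {mode!r}"]
    if mode == "oversight":
        # Human checkpoints cover gaps, so nothing extra is enforced here.
        return []
    # In dark-factory mode, missing context is caught at planning time.
    return [
        f"dark-factory plan is missing {name}"
        for name in REQUIRED_DARK_FACTORY_FIELDS
        if not plan.get(name)
    ]
```

Running a check like this before launch is what turns a missing rollback note into a planning error instead of a runtime surprise.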
|
|
368
176
|
|
|
369
|
-
|
|
370
|
-
- prompt fingerprints
|
|
371
|
-
- compiled inboxes
|
|
372
|
-
- coordination store snapshot
|
|
373
|
-
- structured markers from logs
|
|
374
|
-
- exit contract outcomes
|
|
375
|
-
- integration summary
|
|
376
|
-
- evaluator verdict
|
|
377
|
-
- docs closure state
|
|
378
|
-
- runtime budgets and retries
|
|
379
|
-
- cumulative quality metrics
|
|
380
|
-
- gate snapshot and artifact-presence metadata
|
|
381
|
-
- replay context and cumulative history snapshot for hermetic replay
|
|
177
|
+
## Phase 6: Coordination and Integration Upgrades
|
|
382
178
|
|
|
383
|
-
|
|
179
|
+
The runtime already has strong coordination primitives, but the roadmap should still push these areas:
|
|
384
180
|
|
|
385
|
-
-
|
|
181
|
+
- keep the canonical coordination store as the source of truth and the markdown board as a rendered view
|
|
182
|
+
- keep compiled per-agent inboxes and shared summaries central to prompt construction
|
|
183
|
+
- strengthen the integration steward output as the single closure-ready synthesis artifact
|
|
184
|
+
- add `wave lint` for ownership, component promotion, runtime mix, deploy environment, and closure completeness
|
|
185
|
+
- expand replay scenarios for replanning, autonomy modes, and infra-heavy waves
|
|
386
186
|
|
|
387
|
-
|
|
187
|
+
## Additional Features Worth Scheduling
|
|
388
188
|
|
|
389
|
-
-
|
|
390
|
-
-
|
|
391
|
-
-
|
|
392
|
-
-
|
|
393
|
-
-
|
|
394
|
-
-
|
|
189
|
+
- template packs for common wave shapes: implementation, QA, infra, release, migration
|
|
190
|
+
- doc-delta extraction plus changelog or release-note queues when waves change public behavior
|
|
191
|
+
- executor and credential preflight checks before launch
|
|
192
|
+
- project-profile-aware defaults for lane, template, terminal surface, and oversight mode
|
|
193
|
+
- richer branch and PR guidance in draft specs when the wave is release or deploy oriented
|
|
194
|
+
- benchmark scenarios that compare oversight vs dark-factory outcomes
|
|
395
195
|
|
|
396
|
-
##
|
|
196
|
+
## Research Notes
|
|
397
197
|
|
|
398
|
-
|
|
198
|
+
The direction above is consistent with the local source set and the current external references:
|
|
399
199
|
|
|
400
|
-
|
|
401
|
-
|
|
402
|
-
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
|
|
406
|
-
|
|
407
|
-
|
|
408
|
-
|
|
409
|
-
|
|
410
|
-
- communication is mostly free-text
|
|
411
|
-
- integration is implicit
|
|
412
|
-
- runtime planning is still too lane-default and not expressive enough for deliberate mixed-runtime teams
|
|
413
|
-
- clarification escalates too early to a human queue
|
|
414
|
-
- scheduling is not strongly driven by shared state
|
|
415
|
-
|
|
416
|
-
### Proposed Model
|
|
417
|
-
|
|
418
|
-
Upgraded flow, still wave- and lane-native:
|
|
419
|
-
|
|
420
|
-
1. Parse the wave file into the manifest, runtime plan, and wave ledger.
|
|
421
|
-
2. Resolve executor profiles, fallback policy, and runtime-mix targets for the lane.
|
|
422
|
-
3. Build or update the canonical coordination store.
|
|
423
|
-
4. Compile the shared summary and per-agent inboxes.
|
|
424
|
-
5. Launch implementation, infra, deploy, docs, research, or evaluation roles based on the ledger, runtime plan, and open requests.
|
|
425
|
-
6. Let the orchestrator triage clarification requests and resolve or route them before escalating to a human.
|
|
426
|
-
7. Continuously ingest structured outputs into the coordination store and ledger.
|
|
427
|
-
8. Run a dedicated integration phase to synthesize all claims and remaining gaps.
|
|
428
|
-
9. Run documentation closure using the integration summary.
|
|
429
|
-
10. Run evaluator closure using the integration summary plus final doc state.
|
|
430
|
-
11. Persist the attempt trace bundle for replay and evaluation.
|
|
431
|
-
|
|
432
|
-
## Recommended Role Model
|
|
433
|
-
|
|
434
|
-
This role model works with the current multi-role architecture and extends it rather than replacing it:
|
|
435
|
-
|
|
436
|
-
- `A0` evaluator
|
|
437
|
-
- `A8` integration steward
|
|
438
|
-
- `A9` documentation steward
|
|
439
|
-
- implementation roles, each owning explicit files and components
|
|
440
|
-
- optional infra role for identity, admission, machine conformance, or deployment substrates
|
|
441
|
-
- optional deploy verifier role for rollout, health, and operational proof
|
|
442
|
-
|
|
443
|
-
Responsibilities:
|
|
444
|
-
|
|
445
|
-
- implementation roles produce code, proofs, and doc deltas
|
|
446
|
-
- infra/deploy roles produce structured environment proof
|
|
447
|
-
- integration steward synthesizes cross-role state
|
|
448
|
-
- documentation steward reconciles shared docs and component matrix
|
|
449
|
-
- evaluator decides whether the wave is coherent enough to pass
|
|
450
|
-
|
|
451
|
-
## Runtime Planning And Lane Mix
|
|
452
|
-
|
|
453
|
-
Wave orchestration should support a deliberate runtime mix inside one lane. A lane can run `3 codex`, `2 claude`, and `2 opencode` agents as long as the wave declares which agent prefers which runtime and what fallbacks are allowed.
|
|
454
|
-
|
|
455
|
-
Recommended starting mapping for this repo:
|
|
456
|
-
|
|
457
|
-
- implementation and test-fix roles: `codex`
|
|
458
|
-
- integration steward, evaluator, and documentation steward: `claude`
|
|
459
|
-
- exploratory helper, research, and CLI-heavy ops roles: `opencode`
|
|
460
|
-
- infra and deploy roles: choose `codex` or `opencode` based on the command workflow and tooling needs, not by habit
|
|
461
|
-
|
|
462
|
-
Planning rules:
|
|
463
|
-
|
|
464
|
-
- every agent in a deliberate mixed-runtime wave should declare `### Executor`
|
|
465
|
-
- runtime reassignment during execution must preserve ownership and leave an audit record
|
|
466
|
-
- runtime profiles should capture the common presets such as `implement-fast`, `deep-review`, `docs-pass`, and `ops-triage`
|
|
467
|
-
- integration summaries should report the final runtime used by each agent and whether any fallback fired
|
|
468
|
-
|
|
469
|
-
This keeps runtime choice visible in the authored plan instead of hiding it inside CLI defaults.
|
|
470
|
-
|
|
471
|
-
## Lanes And Cross-Lane Coordination
|
|
472
|
-
|
|
473
|
-
Lanes should remain isolated in execution state but gain typed cross-lane dependency tickets.
|
|
474
|
-
|
|
475
|
-
Current strength:
|
|
476
|
-
|
|
477
|
-
- lane-scoped paths already exist
|
|
478
|
-
- an orchestrator board already exists
|
|
479
|
-
|
|
480
|
-
Upgrade:
|
|
481
|
-
|
|
482
|
-
- add `.tmp/wave-orchestrator/dependencies/<lane>.jsonl`
|
|
483
|
-
- each cross-lane dependency is a typed ticket with owner lane, requester lane, closure condition, and related waves
|
|
484
|
-
- lane autonomous mode should refuse to finalize if it has unresolved required inbound dependencies
|
|
485
|
-
|
|
486
|
-
This keeps lane isolation while making cross-lane work explicit and schedulable.
|
|
487
|
-
|
|
488
|
-
## Documentation Upgrades
|
|
489
|
-
|
|
490
|
-
The current documentation steward role is good, but it is overloaded.
|
|
491
|
-
|
|
492
|
-
Improve it by adding:
|
|
493
|
-
|
|
494
|
-
- doc delta extraction from implementation markers into a machine-readable queue
|
|
495
|
-
- explicit shared-plan reconciliation checklist
|
|
496
|
-
- component-matrix reconciliation checklist
|
|
497
|
-
- release-notes or changelog queue when a wave changes public package behavior
|
|
498
|
-
- a per-wave runtime assignment summary so doc and eval roles can see which runtime owned which artifacts
|
|
499
|
-
|
|
500
|
-
Documentation should consume integration outputs, not rediscover them from raw logs.
|
|
501
|
-
|
|
502
|
-
## Evaluation Upgrades
|
|
503
|
-
|
|
504
|
-
The harness should move from “wave passed or failed” to “wave quality is replayable and comparable.”
|
|
505
|
-
|
|
506
|
-
Add:
|
|
507
|
-
|
|
508
|
-
- per-wave regression datasets
|
|
509
|
-
- replayable trace bundles
|
|
510
|
-
- scoring for communication health, integration quality, and proof quality
|
|
511
|
-
- continuous-history benchmark scenarios, not only single-wave success
|
|
512
|
-
- runtime-mix reporting so success can be segmented by executor and by role
|
|
513
|
-
- clarification reporting so orchestrator-resolved questions and human escalations are both measurable
|
|
514
|
-
|
|
515
|
-
Suggested metrics:
|
|
516
|
-
|
|
517
|
-
- unresolved request count at closure
|
|
518
|
-
- integration contradiction count
|
|
519
|
-
- documentation drift count
|
|
520
|
-
- proof completeness ratio
|
|
521
|
-
- relaunch count by role
|
|
522
|
-
- relaunch count by executor
|
|
523
|
-
- runtime fallback rate
|
|
524
|
-
- mean time to first acknowledgement
|
|
525
|
-
- mean time to blocker resolution
|
|
526
|
-
- orchestrator clarification resolution rate
|
|
527
|
-
- human escalation rate
|
|
528
|
-
- evaluator reversal rate between early and final verdicts
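
As an illustration of how a few of these metrics might be derived, the sketch below computes the runtime fallback rate and the relaunch counts by role and by executor. The attempt-record shape and all field names are hypothetical; the actual trace format is not described in this section.

```javascript
// Hypothetical per-agent attempt records; field names are illustrative only.
const attempts = [
  { role: "implementation", plannedExecutor: "codex", finalExecutor: "codex", relaunches: 1 },
  { role: "implementation", plannedExecutor: "codex", finalExecutor: "opencode", relaunches: 2 },
  { role: "evaluator", plannedExecutor: "claude", finalExecutor: "claude", relaunches: 0 },
  { role: "documentation", plannedExecutor: "claude", finalExecutor: "claude", relaunches: 0 },
];

// Share of agents whose final runtime differs from the planned one.
function runtimeFallbackRate(records) {
  if (records.length === 0) return 0;
  const fallbacks = records.filter((r) => r.finalExecutor !== r.plannedExecutor).length;
  return fallbacks / records.length;
}

// Sum relaunches grouped by an arbitrary record key, such as "role" or "finalExecutor".
function relaunchCountBy(records, key) {
  const counts = {};
  for (const record of records) {
    counts[record[key]] = (counts[record[key]] ?? 0) + record.relaunches;
  }
  return counts;
}

console.log(runtimeFallbackRate(attempts)); // 0.25
console.log(relaunchCountBy(attempts, "role"));
console.log(relaunchCountBy(attempts, "finalExecutor"));
```

Grouping by a key rather than hard-coding roles keeps the same helper usable for both the by-role and by-executor segments listed above.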
|
|
529
|
-
|
|
530
|
-
## Infra And DevOps Upgrades
|
|
531
|
-
|
|
532
|
-
The harness already has structured deploy and infra markers. The next step is to make them durable and wave-aware.
|
|
533
|
-
|
|
534
|
-
Add:
|
|
535
|
-
|
|
536
|
-
- infra proof records into the coordination store and ledger
|
|
537
|
-
- deploy readiness and deploy verification as separate states
|
|
538
|
-
- environment baseline checks at wave start
|
|
539
|
-
- executor binary, credential, and profile availability checks for every runtime referenced by the wave
|
|
540
|
-
- required rollback or recovery guidance for waves that touch live systems
|
|
541
|
-
|
|
542
|
-
For infra- or deploy-heavy lanes, the integration steward should treat infra proof as first-class, not as a side detail in implementation logs.
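
One way to picture the availability check for every runtime referenced by a wave is sketched below. The agent shape, the `fallbacks` field, and the `probe` callback are hypothetical; a real probe would need to inspect binaries, credentials, and profiles, which this sketch deliberately leaves to the caller.

```javascript
// Collect every runtime a wave references, then report the ones a probe rejects.
// `probe` is a caller-supplied function (hypothetical) that returns true when the
// executor's binary, credentials, and profile are all available.
function missingRuntimes(agents, probe) {
  const referenced = new Set();
  for (const agent of agents) {
    referenced.add(agent.executor);
    for (const fallback of agent.fallbacks ?? []) referenced.add(fallback);
  }
  return [...referenced].filter((runtime) => !probe(runtime)).sort();
}

const waveAgents = [
  { id: "A1", executor: "codex", fallbacks: ["opencode"] },
  { id: "A8", executor: "claude" },
];
// Stand-in for a real environment probe: only these runtimes count as available.
const installed = new Set(["codex", "claude"]);
const missing = missingRuntimes(waveAgents, (runtime) => installed.has(runtime));
console.log(missing); // [ 'opencode' ]
```

Running such a check at wave start would surface a missing fallback runtime before it is needed, rather than at the moment a fallback fires.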
|
|
543
|
-
|
|
544
|
-
## Prioritized Delivery Order
|
|
545
|
-
|
|
546
|
-
### Phase 1: Coordination And Planning Foundation
|
|
547
|
-
|
|
548
|
-
- canonical coordination store
|
|
549
|
-
- markdown board as rendered view
|
|
550
|
-
- per-agent inbox compiler
|
|
551
|
-
- full per-agent `### Executor` schema with runtime profiles
|
|
552
|
-
- typed clarification and human-feedback events
|
|
553
|
-
|
|
554
|
-
Why first:
|
|
555
|
-
|
|
556
|
-
- every other improvement depends on better shared state, a durable runtime plan, and a typed clarification model
|
|
557
|
-
|
|
558
|
-
### Phase 2: Integration And Scheduling
|
|
559
|
-
|
|
560
|
-
- integration steward role
|
|
561
|
-
- integration summary artifacts
|
|
562
|
-
- communication-aware relaunch and closure rules
|
|
563
|
-
- orchestrator-first clarification resolver
|
|
564
|
-
- wave ledger
|
|
565
|
-
|
|
566
|
-
Why second:
|
|
567
|
-
|
|
568
|
-
- this closes the communication-reasoning gap and the too-early human escalation loop without changing the authored wave format
|
|
569
|
-
|
|
570
|
-
### Phase 3: Evaluation And Replay
|
|
571
|
-
|
|
572
|
-
- shipped:
|
|
573
|
-
- trace bundles
|
|
574
|
-
- cumulative wave quality metrics
|
|
575
|
-
- runtime-mix and clarification metrics
|
|
576
|
-
- internal replay validation against stored attempt bundles
|
|
577
|
-
- launcher-generated replay acceptance coverage for hermetic pass, clarification, blocking, and retry/fallback traces
|
|
578
|
-
- still open:
|
|
579
|
-
- larger continuous-history replay scenario sets across more than one wave
|
|
580
|
-
- a public replay CLI if the internal helper proves stable
|
|
581
|
-
|
|
582
|
-
Why third:
|
|
583
|
-
|
|
584
|
-
- once state and flow are structured, evaluation becomes meaningful
|
|
585
|
-
|
|
586
|
-
### Phase 4: Capability Routing And Cross-Lane Dependencies
|
|
587
|
-
|
|
588
|
-
- shipped:
|
|
589
|
-
- capability tags
|
|
590
|
-
- deterministic helper-assignment routing from open requests
|
|
591
|
-
- helper-assignment snapshots under `.tmp/<lane>-wave-launcher/assignments/`
|
|
592
|
-
- typed `wave dep post|show|resolve|render` operator workflows
|
|
593
|
-
- per-wave inbound/outbound dependency snapshots under `.tmp/<lane>-wave-launcher/dependencies/`
|
|
594
|
-
- dependency-aware gating, inboxes, dashboards, and trace/replay artifacts
|
|
595
|
-
- still open:
|
|
596
|
-
- larger multi-lane benchmark scenarios that stress dependency resolution across more than one wave
|
|
597
|
-
- richer dependency-specific operator dashboards if the current JSON and markdown projections prove insufficient
|
|
598
|
-
|
|
599
|
-
Why fourth:
|
|
600
|
-
|
|
601
|
-
- this only became high leverage once the coordination, integration, and replay layers were already trustworthy
|
|
200
|
+
- OpenAI, “Harness engineering: leveraging Codex in an agent-first world”
|
|
201
|
+
- repository-local plans and environment design matter more than prompt-only control
|
|
202
|
+
- Anthropic, “Effective harnesses for long-running agents”
|
|
203
|
+
- first-run initialization and durable progress artifacts are critical
|
|
204
|
+
- DOVA
|
|
205
|
+
- deliberation-first orchestration and transparent intermediate state support better refinement loops
|
|
206
|
+
- Silo-Bench
|
|
207
|
+
- communication alone is not enough; integration quality is the real bottleneck
|
|
208
|
+
- Evaluating AGENTS.md
|
|
209
|
+
- repository-level context files help, but they should complement executable and versioned planning artifacts rather than replace them
|
|
602
210
|
|
|
603
211
|
## Immediate Recommendation
|
|
604
212
|
|
|
605
|
-
The
|
|
606
|
-
|
|
607
|
-
1. canonical coordination store
|
|
608
|
-
2. compiled agent inboxes
|
|
609
|
-
3. explicit integration steward and integration summary
|
|
610
|
-
4. full planning-time runtime profiles in `### Executor`
|
|
611
|
-
5. orchestrator-first clarification triage
|
|
612
|
-
|
|
613
|
-
That combination gives the harness the biggest improvement in:
|
|
614
|
-
|
|
615
|
-
- long-running robustness
|
|
616
|
-
- intra-agent messaging quality
|
|
617
|
-
- mixed-runtime planning quality
|
|
618
|
-
- reduced unnecessary human interruption
|
|
619
|
-
- closure reliability
|
|
620
|
-
- lane and multi-role scalability
|
|
621
|
-
|
|
622
|
-
without forcing a rewrite of wave files, lane structure, or existing proof markers.
|
|
213
|
+
The next shipping sequence should be:
|
|
623
214
|
|
|
624
|
-
|
|
215
|
+
1. planner foundation
|
|
216
|
+
2. ad-hoc task runs on the same substrate
|
|
217
|
+
3. forward replanning
|
|
218
|
+
4. typed infra and deploy planning
|
|
219
|
+
5. explicit oversight vs dark-factory workflows
|
|
220
|
+
6. stronger linting, replay, and benchmark coverage
|
|
625
221
|
|
|
626
|
-
|
|
222
|
+
That sequence keeps the current harness intact while making planning, execution posture, and infra ownership much more explicit and durable.
|