@chllming/wave-orchestration 0.5.3 → 0.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +53 -3
- package/README.md +81 -506
- package/docs/README.md +53 -0
- package/docs/agents/wave-cont-eval-role.md +36 -0
- package/docs/agents/{wave-evaluator-role.md → wave-cont-qa-role.md} +14 -11
- package/docs/agents/wave-documentation-role.md +1 -1
- package/docs/agents/wave-infra-role.md +1 -1
- package/docs/agents/wave-integration-role.md +3 -3
- package/docs/agents/wave-launcher-role.md +4 -3
- package/docs/agents/wave-security-role.md +40 -0
- package/docs/concepts/context7-vs-skills.md +94 -0
- package/docs/concepts/operating-modes.md +91 -0
- package/docs/concepts/runtime-agnostic-orchestration.md +95 -0
- package/docs/concepts/what-is-a-wave.md +183 -0
- package/docs/evals/README.md +166 -0
- package/docs/evals/benchmark-catalog.json +663 -0
- package/docs/guides/author-and-run-waves.md +135 -0
- package/docs/guides/planner.md +118 -0
- package/docs/guides/terminal-surfaces.md +82 -0
- package/docs/image.png +0 -0
- package/docs/plans/component-cutover-matrix.json +1 -1
- package/docs/plans/component-cutover-matrix.md +1 -1
- package/docs/plans/context7-wave-orchestrator.md +2 -0
- package/docs/plans/current-state.md +29 -1
- package/docs/plans/examples/wave-example-live-proof.md +435 -0
- package/docs/plans/master-plan.md +3 -3
- package/docs/plans/migration.md +46 -3
- package/docs/plans/wave-orchestrator.md +71 -8
- package/docs/plans/waves/wave-0.md +4 -4
- package/docs/reference/live-proof-waves.md +177 -0
- package/docs/reference/migration-0.2-to-0.5.md +26 -19
- package/docs/reference/npmjs-trusted-publishing.md +6 -5
- package/docs/reference/runtime-config/README.md +29 -0
- package/docs/reference/sample-waves.md +87 -0
- package/docs/reference/skills.md +224 -0
- package/docs/research/agent-context-sources.md +130 -11
- package/docs/research/coordination-failure-review.md +266 -0
- package/docs/roadmap.md +164 -564
- package/package.json +3 -2
- package/releases/manifest.json +37 -2
- package/scripts/research/agent-context-archive.mjs +83 -1
- package/scripts/research/manifests/agent-context-expanded-2026-03-22.mjs +811 -0
- package/scripts/wave-orchestrator/adhoc.mjs +1331 -0
- package/scripts/wave-orchestrator/agent-state.mjs +358 -6
- package/scripts/wave-orchestrator/artifact-schemas.mjs +173 -0
- package/scripts/wave-orchestrator/clarification-triage.mjs +10 -3
- package/scripts/wave-orchestrator/config.mjs +65 -12
- package/scripts/wave-orchestrator/context7.mjs +11 -0
- package/scripts/wave-orchestrator/coord-cli.mjs +51 -19
- package/scripts/wave-orchestrator/coordination-store.mjs +26 -4
- package/scripts/wave-orchestrator/coordination.mjs +99 -9
- package/scripts/wave-orchestrator/dashboard-state.mjs +20 -8
- package/scripts/wave-orchestrator/dep-cli.mjs +5 -2
- package/scripts/wave-orchestrator/docs-queue.mjs +8 -2
- package/scripts/wave-orchestrator/evals.mjs +451 -0
- package/scripts/wave-orchestrator/executors.mjs +24 -11
- package/scripts/wave-orchestrator/feedback.mjs +15 -1
- package/scripts/wave-orchestrator/install.mjs +69 -7
- package/scripts/wave-orchestrator/launcher-closure.mjs +281 -0
- package/scripts/wave-orchestrator/launcher-runtime.mjs +334 -0
- package/scripts/wave-orchestrator/launcher.mjs +778 -577
- package/scripts/wave-orchestrator/ledger.mjs +123 -20
- package/scripts/wave-orchestrator/local-executor.mjs +99 -12
- package/scripts/wave-orchestrator/planner.mjs +1463 -0
- package/scripts/wave-orchestrator/project-profile.mjs +190 -0
- package/scripts/wave-orchestrator/replay.mjs +6 -3
- package/scripts/wave-orchestrator/role-helpers.mjs +84 -0
- package/scripts/wave-orchestrator/shared.mjs +77 -11
- package/scripts/wave-orchestrator/skills.mjs +979 -0
- package/scripts/wave-orchestrator/terminals.mjs +16 -0
- package/scripts/wave-orchestrator/traces.mjs +73 -27
- package/scripts/wave-orchestrator/wave-files.mjs +1224 -163
- package/scripts/wave.mjs +20 -0
- package/skills/README.md +202 -0
- package/skills/provider-aws/SKILL.md +117 -0
- package/skills/provider-aws/adapters/claude.md +1 -0
- package/skills/provider-aws/adapters/codex.md +1 -0
- package/skills/provider-aws/references/service-verification.md +39 -0
- package/skills/provider-aws/skill.json +54 -0
- package/skills/provider-custom-deploy/SKILL.md +64 -0
- package/skills/provider-custom-deploy/skill.json +50 -0
- package/skills/provider-docker-compose/SKILL.md +96 -0
- package/skills/provider-docker-compose/adapters/local.md +1 -0
- package/skills/provider-docker-compose/skill.json +53 -0
- package/skills/provider-github-release/SKILL.md +121 -0
- package/skills/provider-github-release/adapters/claude.md +1 -0
- package/skills/provider-github-release/adapters/codex.md +1 -0
- package/skills/provider-github-release/skill.json +55 -0
- package/skills/provider-kubernetes/SKILL.md +143 -0
- package/skills/provider-kubernetes/adapters/claude.md +1 -0
- package/skills/provider-kubernetes/adapters/codex.md +1 -0
- package/skills/provider-kubernetes/references/kubectl-patterns.md +58 -0
- package/skills/provider-kubernetes/skill.json +52 -0
- package/skills/provider-railway/SKILL.md +123 -0
- package/skills/provider-railway/adapters/claude.md +1 -0
- package/skills/provider-railway/adapters/codex.md +1 -0
- package/skills/provider-railway/adapters/local.md +1 -0
- package/skills/provider-railway/adapters/opencode.md +1 -0
- package/skills/provider-railway/references/verification-commands.md +39 -0
- package/skills/provider-railway/skill.json +71 -0
- package/skills/provider-ssh-manual/SKILL.md +97 -0
- package/skills/provider-ssh-manual/skill.json +54 -0
- package/skills/repo-coding-rules/SKILL.md +91 -0
- package/skills/repo-coding-rules/skill.json +34 -0
- package/skills/role-cont-eval/SKILL.md +90 -0
- package/skills/role-cont-eval/adapters/codex.md +1 -0
- package/skills/role-cont-eval/skill.json +36 -0
- package/skills/role-cont-qa/SKILL.md +93 -0
- package/skills/role-cont-qa/adapters/claude.md +1 -0
- package/skills/role-cont-qa/skill.json +36 -0
- package/skills/role-deploy/SKILL.md +96 -0
- package/skills/role-deploy/skill.json +36 -0
- package/skills/role-documentation/SKILL.md +72 -0
- package/skills/role-documentation/skill.json +36 -0
- package/skills/role-implementation/SKILL.md +68 -0
- package/skills/role-implementation/skill.json +36 -0
- package/skills/role-infra/SKILL.md +80 -0
- package/skills/role-infra/skill.json +36 -0
- package/skills/role-integration/SKILL.md +84 -0
- package/skills/role-integration/skill.json +36 -0
- package/skills/role-research/SKILL.md +64 -0
- package/skills/role-research/skill.json +36 -0
- package/skills/role-security/SKILL.md +60 -0
- package/skills/role-security/skill.json +36 -0
- package/skills/runtime-claude/SKILL.md +65 -0
- package/skills/runtime-claude/skill.json +36 -0
- package/skills/runtime-codex/SKILL.md +57 -0
- package/skills/runtime-codex/skill.json +36 -0
- package/skills/runtime-local/SKILL.md +44 -0
- package/skills/runtime-local/skill.json +36 -0
- package/skills/runtime-opencode/SKILL.md +57 -0
- package/skills/runtime-opencode/skill.json +36 -0
- package/skills/wave-core/SKILL.md +114 -0
- package/skills/wave-core/references/marker-syntax.md +62 -0
- package/skills/wave-core/skill.json +35 -0
- package/wave.config.json +61 -5
@@ -0,0 +1,266 @@
---
title: "Coordination Failure Review"
summary: "Assessment of whether the Wave orchestrator constructively addresses coordination and blackboard failure modes highlighted by recent multi-agent papers."
---

# Coordination Failure Review

## Bottom Line

The Wave orchestrator addresses several coordination failure modes constructively in code, not just in prose. In particular, it has:

- a canonical machine-readable coordination log
- compiled shared summaries plus per-agent inboxes
- explicit clarification, helper-assignment, dependency, integration, documentation, and cont-QA barriers
- structured proof and verdict validation
- replayable trace bundles with coordination-quality metrics

That is materially stronger than the common "agents talk in a shared channel and we hope that was enough" pattern criticized by recent multi-agent papers.

The main weakness is empirical, not architectural. The repo does not yet contain a benchmark family that proves the blackboard actually helps agents reconstruct distributed state under HiddenBench- or Silo-Bench-style pressure, or that it handles DPBench-style simultaneous coordination reliably.

## What The Papers Warn About

### `Why Do Multi-Agent LLM Systems Fail?`

This paper is the broadest warning. Its failure taxonomy groups problems into:

- system design issues
- inter-agent misalignment
- task verification failures

Those categories are useful here because they distinguish "we gave agents a shared workspace" from "the workspace is actually enforceable and auditable."

### `HiddenBench` / `Systematic Failures in Collective Reasoning under Distributed Information in Multi-Agent LLMs`

This is the clearest warning for blackboard-style systems. The central result is that multi-agent groups often fail not because they never communicate, but because they do not notice latent information asymmetry and do not actively surface unshared evidence. They converge on shared evidence too early.

For this repo, the key question is therefore not "do agents have a board?" but "does the shared state force enough evidence pooling to avoid premature convergence?"

### `Silo-Bench`

Silo-Bench sharpens the same point. Agents can exchange information and even form reasonable communication topologies, yet still fail at the reasoning-integration step. Communication volume is not the same thing as distributed-state synthesis.

For this repo, the corresponding question is whether summaries, inboxes, and integration passes merely move information around, or actually make the final decision depend on the integrated state.

### `DPBench`

DPBench shows that LLM teams can look coordinated in serial settings and still collapse in simultaneous coordination settings, with communication often failing to save them. Its practical lesson is that explicit external coordination mechanisms matter when concurrent access or simultaneous action is involved.

For this repo, the relevant question is whether coordination is only conversational or whether there are explicit external barriers and tickets that serialize or block unsafe progress.

### `Multi-Agent Teams Hold Experts Back`

This paper argues that unconstrained teams underuse expertise. Even when the best agent is identifiable, teams often drift toward integrative compromise instead of properly weighting expert judgment.

For this repo, the key question is whether the design relies on self-organizing consensus or on explicit role ownership, routing, and gating.

## What This Repo Already Does Constructively

### Implemented In Code And Tests

#### 1. It uses a real canonical shared state, not a cosmetic board

The strongest blackboard-like mechanism is the canonical JSONL coordination log plus materialized state in [scripts/wave-orchestrator/coordination-store.mjs](../../scripts/wave-orchestrator/coordination-store.mjs). The markdown board is explicitly a projection for humans, not the scheduler's source of truth, as stated in [docs/plans/wave-orchestrator.md](../plans/wave-orchestrator.md).

That state is then compiled into:

- a wave-level shared summary via `compileSharedSummary()`
- targeted per-agent inboxes via `compileAgentInbox()`

This is a real mitigation against information silos because agents are not expected to reconstruct the whole wave by rereading raw logs. The inbox compiler also pulls in relevant open coordination through `artifactRefs`, ownership, components, docs items, helper assignments, and dependencies. That behavior is exercised in [test/wave-orchestrator/coordination-store.test.ts](../../test/wave-orchestrator/coordination-store.test.ts).
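
The shape of this mechanism can be sketched roughly as follows. The event fields and reducer logic here are illustrative assumptions, not the package's actual implementation; only the function names `compileAgentInbox`-style projection and the append-only JSONL log come from the design above.

```javascript
// Illustrative sketch: an append-only JSONL coordination log is materialized
// into state, then projected into per-agent inboxes. All field names here
// (type, status, owner, artifactRefs) are assumptions for illustration.
function materializeState(jsonlLines) {
  const state = { clarifications: [], dependencies: [] };
  for (const line of jsonlLines) {
    const event = JSON.parse(line);
    if (event.type === "clarification") state.clarifications.push(event);
    if (event.type === "dependency") state.dependencies.push(event);
  }
  return state;
}

// A per-agent inbox carries only open items that name the agent, so agents
// never have to reread the raw log to find their obligations.
function compileInboxFor(state, agentId) {
  const mine = (item) =>
    item.owner === agentId || (item.artifactRefs ?? []).includes(agentId);
  return {
    clarifications: state.clarifications.filter((c) => c.status === "open" && mine(c)),
    dependencies: state.dependencies.filter((d) => d.status === "open" && mine(d)),
  };
}

const log = [
  '{"type":"clarification","status":"open","owner":"impl-1","question":"schema?"}',
  '{"type":"dependency","status":"resolved","owner":"impl-1","on":"infra-1"}',
];
const inbox = compileInboxFor(materializeState(log), "impl-1");
// The open clarification lands in impl-1's inbox; the resolved dependency does not.
```

The design point is that the JSONL log stays canonical while summaries and inboxes are cheap, regenerable projections.
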

Assessment against the papers:

- `HiddenBench`: partially addressed in design
- `Silo-Bench`: partially addressed in design
- proof that this works under benchmarked distributed-information pressure: missing

#### 2. It makes completion depend on integrated state, not on agent self-report

The launcher's gate stack in [scripts/wave-orchestrator/launcher.mjs](../../scripts/wave-orchestrator/launcher.mjs) is the clearest constructive safeguard in the repo. Closure is blocked by:

- open clarifications
- unresolved clarification-linked follow-up requests
- pending human input
- unresolved helper assignments
- open required dependencies
- integration failures
- documentation closure failures
- cont-EVAL failures
- cont-QA failures

This matters because several paper failure modes are really verification failures: agents say they are done, but the system has no hard check that the distributed state was reconciled. Here, the final decision is made by barrier logic rather than informal consensus.
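
A barrier stack of this kind reduces to a simple idea: closure is a conjunction of independently checked gates. The sketch below is a hypothetical reduction, not the launcher's actual API; the gate names mirror the list above, but the state shape is assumed.

```javascript
// Illustrative sketch of a barrier-based closure gate: a wave may close only
// when every gate reports clear. State fields are assumptions.
const gates = [
  ["open clarifications", (s) => s.openClarifications === 0],
  ["helper assignments", (s) => s.unresolvedHelpers === 0],
  ["required dependencies", (s) => s.openRequiredDeps === 0],
  ["integration", (s) => s.integrationStatus === "ready"],
  ["cont-QA", (s) => s.contQaVerdict === "pass"],
];

// Returns the names of every gate still blocking closure; empty means closable.
function closureBlockers(state) {
  return gates.filter(([, ok]) => !ok(state)).map(([name]) => name);
}

const blockers = closureBlockers({
  openClarifications: 1,
  unresolvedHelpers: 0,
  openRequiredDeps: 0,
  integrationStatus: "ready",
  contQaVerdict: "pass",
});
// One open clarification is enough to keep the wave from closing.
```

The point of listing blockers rather than returning a boolean is diagnosability: the launcher can report exactly which barrier held the wave open.
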

Tests in [test/wave-orchestrator/clarification-triage.test.ts](../../test/wave-orchestrator/clarification-triage.test.ts) and [test/wave-orchestrator/launcher.test.ts](../../test/wave-orchestrator/launcher.test.ts) confirm that routed clarification work remains blocking until the linked follow-up is resolved and that integration evidence is derived from coordination, docs, validation, and runtime signals.

Assessment against the papers:

- `Why Do Multi-Agent LLM Systems Fail?`: strong mitigation of task-verification failures
- `Silo-Bench`: helps because integrated state has operational consequences
- `DPBench`: helps by using external barriers instead of relying on emergent coordination alone

#### 3. It validates structured evidence instead of trusting narrative summaries

[scripts/wave-orchestrator/agent-state.mjs](../../scripts/wave-orchestrator/agent-state.mjs) validates structured markers for implementation proof, integration, cont-EVAL, documentation closure, and cont-QA verdicts. That means the orchestrator can reject:

- missing proof markers
- weaker completion or durability than promised
- missing doc-delta markers
- missing component evidence
- missing deliverables
- non-ready integration summaries
- non-satisfied cont-EVAL outcomes
- non-pass cont-QA gates

This directly addresses the "don't kid yourself" critique behind the failure-taxonomy paper. A system that validates explicit proof contracts is much less vulnerable to premature closure than a system that trusts free-form role reports.
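
In miniature, marker validation looks like the sketch below. The `[[NAME:value]]` syntax and the marker names are hypothetical stand-ins; the real marker grammar lives in `agent-state.mjs` and `skills/wave-core/references/marker-syntax.md`.

```javascript
// Illustrative sketch of proof-marker validation: a completion report is
// rejected unless it carries every required structured marker. The marker
// names and [[NAME:value]] syntax are assumptions for illustration.
const REQUIRED_MARKERS = ["PROOF", "DOC_DELTA", "COMPONENT", "DELIVERABLE"];

function validateReport(reportText) {
  const missing = REQUIRED_MARKERS.filter(
    (m) => !new RegExp(`\\[\\[${m}:[^\\]]+\\]\\]`).test(reportText)
  );
  return { accepted: missing.length === 0, missing };
}

const verdict = validateReport(
  "[[PROOF:tests-green]] [[COMPONENT:auth]] work is done, trust me"
);
// Rejected: the narrative claim of completion does not compensate for the
// missing DOC_DELTA and DELIVERABLE markers.
```

The key property is that acceptance is decided by the presence of machine-checkable evidence, never by the persuasiveness of the surrounding prose.
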

Assessment against the papers:

- `Why Do Multi-Agent LLM Systems Fail?`: strong mitigation for verification and termination failures
- `Multi-Agent Teams Hold Experts Back`: indirect mitigation, because expert or steward judgment must still be grounded in evidence

#### 4. It reduces naive self-organizing compromise through explicit ownership and routing

The repo does not rely on free-form team consensus in the way criticized by `Multi-Agent Teams Hold Experts Back`. Instead it uses:

- named stewardship roles such as integration and cont-QA in [docs/agents/wave-integration-role.md](../agents/wave-integration-role.md) and [docs/agents/wave-cont-qa-role.md](../agents/wave-cont-qa-role.md)
- capability-targeted request routing in [scripts/wave-orchestrator/routing-state.mjs](../../scripts/wave-orchestrator/routing-state.mjs)
- deterministic assignment based on explicit target, preferred agent, or least-busy capability owner
- staged closure order documented in [docs/plans/current-state.md](../plans/current-state.md) and enforced in the launcher

This is a constructive response to the paper's warning about teams averaging expert and non-expert views. The repo favors explicit owner selection and role-specific closure authority over emergent compromise.
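
The deterministic assignment order described above can be sketched as a three-step cascade. The agent and request shapes below are assumptions, not the routing-state module's real interface.

```javascript
// Illustrative sketch of deterministic routing: explicit target first, then a
// configured preferred agent, then the least-busy declared owner of the
// capability. All data shapes are assumptions for illustration.
function routeRequest(request, agents) {
  if (request.target) return request.target; // 1. explicit target wins
  const candidates = agents.filter((a) =>
    a.capabilities.includes(request.capability)
  );
  const preferred = candidates.find((a) => a.id === request.preferred);
  if (preferred) return preferred.id; // 2. configured preferred agent
  // 3. Deterministic least-busy fallback: sort by open work, then by id so
  // ties never depend on input order.
  candidates.sort(
    (a, b) => a.openItems - b.openItems || a.id.localeCompare(b.id)
  );
  return candidates[0]?.id ?? null;
}

const agents = [
  { id: "impl-a", capabilities: ["deploy"], openItems: 2 },
  { id: "impl-b", capabilities: ["deploy"], openItems: 1 },
];
const assignee = routeRequest({ capability: "deploy" }, agents);
// impl-b is selected: it owns the capability and has the least open work.
```

Determinism is the point: given the same coordination state, replaying a wave reproduces the same assignments.
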

Assessment against the papers:

- `Multi-Agent Teams Hold Experts Back`: partially addressed and better than unconstrained collaboration
- not fully solved, because routing is based mostly on declared capability and load, not demonstrated expertise quality

#### 5. It is unusually observable and replayable

[scripts/wave-orchestrator/traces.mjs](../../scripts/wave-orchestrator/traces.mjs) and [scripts/wave-orchestrator/replay.mjs](../../scripts/wave-orchestrator/replay.mjs) give the system an unusually strong postmortem surface. A trace bundle includes:

- raw coordination log
- materialized coordination state
- ledger
- docs queue
- integration summary
- shared summary
- copied prompts, logs, status, and inbox artifacts
- structured signals
- `quality.json`
- replay metadata and outcome baseline

The quality metrics include unresolved clarifications, contradiction count, capability-assignment timing, dependency-resolution timing, blocker-resolution timing, and fallback counts. Tests in [test/wave-orchestrator/traces.test.ts](../../test/wave-orchestrator/traces.test.ts) verify replay integrity and hash validation.
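
Metrics of this kind fall out of the event timestamps already in the trace. A minimal sketch, assuming a hypothetical event shape with `openedAt`/`resolvedAt` fields:

```javascript
// Illustrative sketch of coordination-quality metrics derived from trace
// events. The event shape (type, openedAt, resolvedAt) is an assumption.
function qualityMetrics(events) {
  const clar = events.filter((e) => e.type === "clarification");
  const deps = events.filter((e) => e.type === "dependency" && e.resolvedAt);
  const avg = (xs) => (xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0);
  return {
    unresolvedClarifications: clar.filter((e) => !e.resolvedAt).length,
    avgDependencyResolutionMs: avg(deps.map((e) => e.resolvedAt - e.openedAt)),
  };
}

const metrics = qualityMetrics([
  { type: "clarification", openedAt: 0 },
  { type: "dependency", openedAt: 100, resolvedAt: 400 },
  { type: "dependency", openedAt: 200, resolvedAt: 300 },
]);
// One clarification was never resolved; the two dependencies took 300 ms and
// 100 ms, averaging 200 ms.
```

Because the metrics are pure functions of the trace, they can be recomputed on replay and compared against the recorded `quality.json` baseline.
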

This does not by itself solve coordination failure, but it is a serious safeguard against hidden failure modes because it makes them inspectable and replayable.

Assessment against the papers:

- `Why Do Multi-Agent LLM Systems Fail?`: strong support for diagnosis and failure analysis
- `Silo-Bench` and `HiddenBench`: useful observability layer, but not yet a direct capability benchmark

### Stated In Docs And Also Reflected In The Software

The docs are not purely aspirational here. The main claims in [docs/plans/current-state.md](../plans/current-state.md) and [docs/plans/wave-orchestrator.md](../plans/wave-orchestrator.md) are broadly backed by the code:

- canonical coordination log plus generated board
- compiled shared summaries and per-agent inboxes
- orchestrator-first clarification triage
- blocking helper assignments and cross-lane dependencies
- staged closure order
- trace bundles and replay validation

That alignment matters. In many MAS projects the docs promise a blackboard, but the runtime still reduces to prompt-only coordination. Here the repo's architectural claims are mostly real.

## What Is Still Missing To Make The Claim Credible

### 1. No distributed-information benchmark family yet

The biggest gap is in [docs/evals/benchmark-catalog.json](../evals/benchmark-catalog.json). The current families are:

- `service-output`
- `latency`
- `quality-regression`

There is nothing yet for:

- hidden-profile reconstruction
- silo escape under partial information
- blackboard consistency across raw log, summary, inboxes, ledger, and integration state
- contradiction injection and recovery
- simultaneous coordination under contention

So the repo can reasonably claim "we built mechanisms intended to mitigate these failures," but it cannot yet claim "we demonstrated that these mechanisms overcome the failures highlighted by HiddenBench, Silo-Bench, or DPBench."
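
A hidden-profile family could be declared alongside the existing ones. The entry below is a hypothetical sketch of what such a family might look like, not the catalog's actual schema:

```json
{
  "family": "hidden-profile",
  "description": "Each agent receives one private fact; the wave passes only if the final integration summary contains every private fact.",
  "cases": [
    {
      "id": "hidden-profile-3-agents",
      "agents": 3,
      "privateFacts": ["fact-a", "fact-b", "fact-c"],
      "scoring": "recall of private facts in the shared summary and final verdict"
    }
  ]
}
```

The scoring rule is the important part: it measures evidence pooling directly, which is exactly what HiddenBench-style tasks stress.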

### 2. Information integration is supported, but not measured directly

The shared summary, inboxes, and integration pass are all constructive. But there is still no metric that asks:

- Did the team reconstruct the globally correct hidden state?
- Did the summary preserve the critical fact that was originally siloed?
- Did a wave converge too early on shared evidence while missing private evidence?

This is the central failure highlighted by `HiddenBench` and `Silo-Bench`, and the repo does not yet score it directly.
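
Scoring this directly does not require much machinery. A minimal sketch of a siloed-fact recall metric, with hypothetical inputs (the repo has no such function today):

```javascript
// Illustrative sketch of a direct integration metric: recall of originally
// siloed facts in the compiled shared summary. Substring matching is a
// deliberate simplification; a real scorer would need semantic matching.
function siloedFactRecall(privateFacts, sharedSummary) {
  const preserved = privateFacts.filter((f) => sharedSummary.includes(f));
  return preserved.length / privateFacts.length;
}

const recall = siloedFactRecall(
  ["port 8443 only", "eu-west-1", "rollback window 5m"],
  "Deploy to eu-west-1 on port 8443 only; verify health endpoint."
);
// recall is 2/3: the rollback window was lost during summarization, which is
// precisely the premature-convergence signal the papers warn about.
```

A metric like this could be computed per wave from the trace bundle and tracked as a regression signal in `quality.json`.
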

### 3. Expertise routing is explicit, but shallow

[scripts/wave-orchestrator/routing-state.mjs](../../scripts/wave-orchestrator/routing-state.mjs) is better than unconstrained self-organization, but it still routes mostly by:

- explicit target
- configured preferred agents
- declared capability ownership
- least-busy fallback

It does not yet weight:

- historical success on a capability
- evidence quality by agent
- confidence calibration
- expert-leverage metrics

So the repo partially addresses the concern from `Multi-Agent Teams Hold Experts Back`, but it does not yet prove that the best agent's expertise is actually being exploited rather than merely named.
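
One way to close that gap would be to weight routing by demonstrated track record instead of declared capability alone. The sketch below is a hypothetical extension, not existing code; the scoring formula and the 0.1 load penalty are arbitrary assumptions.

```javascript
// Illustrative sketch of expertise-weighted routing: score each capability
// owner by historical pass rate, penalized by current load. The history
// shape and the 0.1 load-penalty weight are assumptions for illustration.
function routeByTrackRecord(capability, agents) {
  const scored = agents
    .filter((a) => a.capabilities.includes(capability))
    .map((a) => {
      const record = a.history[capability] ?? { passed: 0, total: 0 };
      const passRate = record.total ? record.passed / record.total : 0.5;
      return { id: a.id, score: passRate - 0.1 * a.openItems };
    })
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
  return scored[0]?.id ?? null;
}

const pick = routeByTrackRecord("deploy", [
  { id: "a", capabilities: ["deploy"], openItems: 0, history: { deploy: { passed: 2, total: 5 } } },
  { id: "b", capabilities: ["deploy"], openItems: 1, history: { deploy: { passed: 9, total: 10 } } },
]);
// Agent "b" wins: a 90% pass rate outweighs its slightly higher load.
```

This is the kind of mechanism that would turn "the expert is named" into "the expert is measurably leveraged."
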

### 4. Clarification and contradiction handling are still somewhat heuristic

Clarification triage and integration evidence aggregation are real safeguards, but they still lean heavily on:

- ownership mappings
- artifact references
- structured markers
- text-level summaries and conflict extraction

That is enough to make the runtime operationally safer, but it is not yet a richer semantic evidence-integration layer. Subtle contradictions or latent information asymmetries may still be missed.

### 5. DPBench-style simultaneous coordination is only indirectly addressed

The repo already uses external coordination mechanisms such as blocking assignments, dependency tickets, and closure barriers. That is directionally aligned with DPBench's lesson that explicit external coordination beats naive emergent coordination.

But there is still no direct stress harness for:

- simultaneous resource contention
- many-way concurrent dependencies
- lock-step coordination failures
- deadlock-like patterns caused by convergent reasoning

So the design points in the right direction, but the claim is not yet validated.
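
The missing harness could start very small. A sketch of the contention case, with a hypothetical first-claim-wins ticket (the real coordination store's claim semantics may differ):

```javascript
// Illustrative stress-harness sketch: several agents race to claim a single
// resource ticket, and the external claim barrier must admit exactly one
// winner. The ticket shape and claim rule are assumptions.
function claim(ticket, agentId) {
  if (ticket.owner !== null) return false; // already claimed, reject
  ticket.owner = agentId;
  return true;
}

function contentionRound(agentIds) {
  const ticket = { resource: "deploy-slot", owner: null };
  const winners = agentIds.filter((id) => claim(ticket, id));
  return { winners, owner: ticket.owner };
}

const round = contentionRound(["a", "b", "c", "d"]);
// Exactly one agent holds the slot; the other three must wait or reroute.
```

A real harness would run many such rounds with concurrent agent processes and assert that the coordination log never records two owners for the same ticket.
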

## Gap Matrix

| Paper | Main warning | Repo response | Assessment |
| --- | --- | --- | --- |
| [Why Do Multi-Agent LLM Systems Fail?](https://arxiv.org/abs/2503.13657) | MAS fail through bad system design, misalignment, and weak verification | Canonical coordination state, barrier-based closure, structured evidence validation, replayable traces | Addressed materially in architecture and software |
| [Systematic Failures in Collective Reasoning under Distributed Information in Multi-Agent LLMs](https://arxiv.org/abs/2505.11556) | Teams miss latent information asymmetry and converge too early on shared evidence | Shared summaries, per-agent inboxes, integration steward, clarification flow | Partially addressed in design, not validated empirically |
| [Silo-Bench](https://arxiv.org/abs/2603.01045) | Communication is not enough; reasoning integration is the bottleneck | Integration evidence aggregation and barrier-driven closure | Partially addressed in design, but no direct integration-quality benchmark |
| [DPBench](https://arxiv.org/abs/2602.13255) | Simultaneous coordination can fail badly even with communication | External helper assignments, dependency barriers, explicit blocking workflow | Directionally addressed, but not benchmarked under simultaneous contention |
| [Multi-Agent Teams Hold Experts Back](https://arxiv.org/abs/2602.01011) | Self-organizing teams underuse experts and drift toward compromise | Named stewards, explicit role authority, capability routing, proof gates | Better than naive teams, but expertise leverage is not measured or optimized deeply |

## Final Assessment

If the standard is "does this repo merely claim multi-agent coordination," the answer is no. It has real machinery for blackboard-like state sharing, evidence-based closure, clarification handling, and coordination diagnostics.

If the standard is "has this repo already demonstrated that its design beats the core failure modes isolated by HiddenBench, Silo-Bench, DPBench, and related work," the answer is also no. The design is substantially more credible than most MAS stacks, but the empirical proof is still missing.

The most accurate claim today is:

> Wave already implements several constructive anti-failure mechanisms for coordination and blackboard-style orchestration, especially around shared state, gating, and observability. What it still lacks is a benchmark suite that proves those mechanisms actually overcome distributed-information and simultaneous-coordination failures rather than simply organizing them better.