@chllming/wave-orchestration 0.5.4 → 0.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (126)
  1. package/CHANGELOG.md +52 -3
  2. package/README.md +33 -5
  3. package/docs/README.md +18 -4
  4. package/docs/agents/wave-cont-eval-role.md +36 -0
  5. package/docs/agents/{wave-evaluator-role.md → wave-cont-qa-role.md} +14 -11
  6. package/docs/agents/wave-documentation-role.md +1 -1
  7. package/docs/agents/wave-infra-role.md +1 -1
  8. package/docs/agents/wave-integration-role.md +3 -3
  9. package/docs/agents/wave-launcher-role.md +4 -3
  10. package/docs/agents/wave-security-role.md +40 -0
  11. package/docs/concepts/context7-vs-skills.md +1 -1
  12. package/docs/concepts/what-is-a-wave.md +56 -6
  13. package/docs/evals/README.md +166 -0
  14. package/docs/evals/benchmark-catalog.json +663 -0
  15. package/docs/guides/author-and-run-waves.md +135 -0
  16. package/docs/guides/planner.md +5 -0
  17. package/docs/guides/terminal-surfaces.md +2 -0
  18. package/docs/plans/component-cutover-matrix.json +1 -1
  19. package/docs/plans/component-cutover-matrix.md +1 -1
  20. package/docs/plans/current-state.md +19 -1
  21. package/docs/plans/examples/wave-example-live-proof.md +435 -0
  22. package/docs/plans/migration.md +42 -0
  23. package/docs/plans/wave-orchestrator.md +46 -7
  24. package/docs/plans/waves/wave-0.md +4 -4
  25. package/docs/reference/live-proof-waves.md +177 -0
  26. package/docs/reference/migration-0.2-to-0.5.md +26 -19
  27. package/docs/reference/npmjs-trusted-publishing.md +6 -5
  28. package/docs/reference/runtime-config/README.md +14 -4
  29. package/docs/reference/sample-waves.md +87 -0
  30. package/docs/reference/skills.md +110 -42
  31. package/docs/research/agent-context-sources.md +130 -11
  32. package/docs/research/coordination-failure-review.md +266 -0
  33. package/docs/roadmap.md +6 -2
  34. package/package.json +2 -2
  35. package/releases/manifest.json +35 -2
  36. package/scripts/research/agent-context-archive.mjs +83 -1
  37. package/scripts/research/manifests/agent-context-expanded-2026-03-22.mjs +811 -0
  38. package/scripts/wave-orchestrator/adhoc.mjs +1331 -0
  39. package/scripts/wave-orchestrator/agent-state.mjs +358 -6
  40. package/scripts/wave-orchestrator/artifact-schemas.mjs +173 -0
  41. package/scripts/wave-orchestrator/clarification-triage.mjs +10 -3
  42. package/scripts/wave-orchestrator/config.mjs +48 -12
  43. package/scripts/wave-orchestrator/context7.mjs +2 -0
  44. package/scripts/wave-orchestrator/coord-cli.mjs +51 -19
  45. package/scripts/wave-orchestrator/coordination-store.mjs +26 -4
  46. package/scripts/wave-orchestrator/coordination.mjs +83 -9
  47. package/scripts/wave-orchestrator/dashboard-state.mjs +20 -8
  48. package/scripts/wave-orchestrator/dep-cli.mjs +5 -2
  49. package/scripts/wave-orchestrator/docs-queue.mjs +8 -2
  50. package/scripts/wave-orchestrator/evals.mjs +451 -0
  51. package/scripts/wave-orchestrator/feedback.mjs +15 -1
  52. package/scripts/wave-orchestrator/install.mjs +32 -9
  53. package/scripts/wave-orchestrator/launcher-closure.mjs +281 -0
  54. package/scripts/wave-orchestrator/launcher-runtime.mjs +334 -0
  55. package/scripts/wave-orchestrator/launcher.mjs +709 -601
  56. package/scripts/wave-orchestrator/ledger.mjs +123 -20
  57. package/scripts/wave-orchestrator/local-executor.mjs +99 -12
  58. package/scripts/wave-orchestrator/planner.mjs +177 -42
  59. package/scripts/wave-orchestrator/replay.mjs +6 -3
  60. package/scripts/wave-orchestrator/role-helpers.mjs +84 -0
  61. package/scripts/wave-orchestrator/shared.mjs +75 -11
  62. package/scripts/wave-orchestrator/skills.mjs +637 -106
  63. package/scripts/wave-orchestrator/traces.mjs +71 -48
  64. package/scripts/wave-orchestrator/wave-files.mjs +947 -101
  65. package/scripts/wave.mjs +9 -0
  66. package/skills/README.md +202 -0
  67. package/skills/provider-aws/SKILL.md +111 -0
  68. package/skills/provider-aws/adapters/claude.md +1 -0
  69. package/skills/provider-aws/adapters/codex.md +1 -0
  70. package/skills/provider-aws/references/service-verification.md +39 -0
  71. package/skills/provider-aws/skill.json +50 -1
  72. package/skills/provider-custom-deploy/SKILL.md +59 -0
  73. package/skills/provider-custom-deploy/skill.json +46 -1
  74. package/skills/provider-docker-compose/SKILL.md +90 -0
  75. package/skills/provider-docker-compose/adapters/local.md +1 -0
  76. package/skills/provider-docker-compose/skill.json +49 -1
  77. package/skills/provider-github-release/SKILL.md +116 -1
  78. package/skills/provider-github-release/adapters/claude.md +1 -0
  79. package/skills/provider-github-release/adapters/codex.md +1 -0
  80. package/skills/provider-github-release/skill.json +51 -1
  81. package/skills/provider-kubernetes/SKILL.md +137 -0
  82. package/skills/provider-kubernetes/adapters/claude.md +1 -0
  83. package/skills/provider-kubernetes/adapters/codex.md +1 -0
  84. package/skills/provider-kubernetes/references/kubectl-patterns.md +58 -0
  85. package/skills/provider-kubernetes/skill.json +48 -1
  86. package/skills/provider-railway/SKILL.md +118 -1
  87. package/skills/provider-railway/references/verification-commands.md +39 -0
  88. package/skills/provider-railway/skill.json +67 -1
  89. package/skills/provider-ssh-manual/SKILL.md +91 -0
  90. package/skills/provider-ssh-manual/skill.json +50 -1
  91. package/skills/repo-coding-rules/SKILL.md +84 -0
  92. package/skills/repo-coding-rules/skill.json +30 -1
  93. package/skills/role-cont-eval/SKILL.md +90 -0
  94. package/skills/role-cont-eval/adapters/codex.md +1 -0
  95. package/skills/role-cont-eval/skill.json +36 -0
  96. package/skills/role-cont-qa/SKILL.md +93 -0
  97. package/skills/role-cont-qa/adapters/claude.md +1 -0
  98. package/skills/role-cont-qa/skill.json +36 -0
  99. package/skills/role-deploy/SKILL.md +90 -0
  100. package/skills/role-deploy/skill.json +32 -1
  101. package/skills/role-documentation/SKILL.md +66 -0
  102. package/skills/role-documentation/skill.json +32 -1
  103. package/skills/role-implementation/SKILL.md +62 -0
  104. package/skills/role-implementation/skill.json +32 -1
  105. package/skills/role-infra/SKILL.md +74 -0
  106. package/skills/role-infra/skill.json +32 -1
  107. package/skills/role-integration/SKILL.md +79 -1
  108. package/skills/role-integration/skill.json +32 -1
  109. package/skills/role-research/SKILL.md +58 -0
  110. package/skills/role-research/skill.json +32 -1
  111. package/skills/role-security/SKILL.md +60 -0
  112. package/skills/role-security/skill.json +36 -0
  113. package/skills/runtime-claude/SKILL.md +60 -1
  114. package/skills/runtime-claude/skill.json +32 -1
  115. package/skills/runtime-codex/SKILL.md +52 -1
  116. package/skills/runtime-codex/skill.json +32 -1
  117. package/skills/runtime-local/SKILL.md +39 -0
  118. package/skills/runtime-local/skill.json +32 -1
  119. package/skills/runtime-opencode/SKILL.md +51 -0
  120. package/skills/runtime-opencode/skill.json +32 -1
  121. package/skills/wave-core/SKILL.md +107 -0
  122. package/skills/wave-core/references/marker-syntax.md +62 -0
  123. package/skills/wave-core/skill.json +31 -1
  124. package/wave.config.json +35 -6
  125. package/skills/role-evaluator/SKILL.md +0 -6
  126. package/skills/role-evaluator/skill.json +0 -5
@@ -1,6 +1,6 @@
 # Runtime Configuration Reference
 
-This directory is the canonical reference for executor configuration in Wave `0.5.x`.
+This directory is the canonical reference for executor configuration in Wave `0.6.1`.
 
 Use it when you need the full supported surface for:
 
@@ -50,6 +50,8 @@ Skill settings resolve after executor selection, because runtime and deploy-kind
 8. `lanes.<lane>.skills.byDeployKind[defaultDeployEnvironmentKind]`
 9. agent `### Skills`
 
+Then Wave filters configured skills through each bundle's activation metadata. Explicit per-agent `### Skills` still force attachment even when activation metadata would not auto-match.
+
 When retry-time fallback changes the runtime, Wave recomputes the effective skill set and rewrites the executor overlay before relaunch.
 
 ## Common Fields
@@ -82,14 +84,22 @@ Wave writes runtime artifacts here:
 Common files:
 
 - `launch-preview.json`: resolved invocation lines, env vars, and retry mode
-- `skills.resolved.md`: canonical merged skill payload for the selected agent and runtime
-- `skills.metadata.json`: resolved skill ids, bundle metadata, hashes, and generated artifact paths
-- `<runtime>-skills.txt`: runtime-projected skill text used by the selected executor
+- `skills.resolved.md`: compact metadata-first skill catalog for the selected agent and runtime
+- `skills.expanded.md`: full canonical/debug skill payload with `SKILL.md` bodies and adapters
+- `skills.metadata.json`: resolved skill ids, activation metadata, permissions, hashes, and generated artifact paths
+- `<runtime>-skills.txt`: runtime-projected compact skill text used by the selected executor
 - `claude-system-prompt.txt`: generated Claude harness prompt overlay
 - `claude-settings.json`: generated Claude settings overlay when inline settings data is present
 - `opencode-agent-prompt.txt`: generated OpenCode harness prompt overlay
 - `opencode.json`: generated OpenCode runtime config overlay
 
+Runtime-specific delivery:
+
+- Codex uses the compact catalog in the compiled prompt and attaches bundle directories through `--add-dir`.
+- Claude appends the compact catalog to the generated system-prompt overlay.
+- OpenCode injects the compact catalog into `opencode.json` and attaches `skill.json`, `SKILL.md`, the selected adapter, and recursive `references/**` files through `--file`.
+- Local keeps skills prompt-only.
+
 `launch-preview.json` also records the resolved skill metadata so dry-run can verify the exact runtime plus skill combination before any live launch.
 
 ## Recommended Validation Path
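Editorial note: the nine-layer resolution order and activation filtering described in the diff above can be sketched as a small helper. This is an illustrative sketch only; the function and config shapes below are hypothetical and may differ from the package's actual implementation in `scripts/wave-orchestrator/skills.mjs`.

```javascript
// Sketch of the documented skill-layer gathering: nine ordered layers
// (global/lane base, byRole, byRuntime, byDeployKind, then explicit
// per-agent `### Skills`), with duplicates removed in first-seen order.
// All names here are illustrative, not the package's real API.
function resolveSkillLayers(global, lane, ctx) {
  const layers = [
    global.base,
    lane.base,
    global.byRole[ctx.role],
    lane.byRole[ctx.role],
    global.byRuntime[ctx.runtime],
    lane.byRuntime[ctx.runtime],
    global.byDeployKind[ctx.deployKind],
    lane.byDeployKind[ctx.deployKind],
    ctx.agentSkills, // explicit per-agent `### Skills`
  ];
  const seen = new Set();
  const ordered = [];
  for (const layer of layers) {
    for (const id of layer ?? []) {
      if (!seen.has(id)) {
        seen.add(id);
        ordered.push(id);
      }
    }
  }
  return ordered;
}

const ids = resolveSkillLayers(
  { base: ["wave-core"], byRole: { deploy: ["role-deploy"] }, byRuntime: {}, byDeployKind: {} },
  { base: ["wave-core"], byRole: {}, byRuntime: {}, byDeployKind: { "railway-mcp": ["provider-railway"] } },
  { role: "deploy", runtime: "claude", deployKind: "railway-mcp", agentSkills: ["provider-aws"] }
);
// ids → ["wave-core", "role-deploy", "provider-railway", "provider-aws"]
```

Note that the duplicate `wave-core` from the lane base is dropped because the global base already contributed it; first-seen order is what keeps prompt fingerprints stable across lanes.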
@@ -0,0 +1,87 @@
+---
+title: "Sample Waves"
+summary: "A showcase-first sample wave that demonstrates the current 0.6.1 Wave surface."
+---
+
+# Sample Waves
+
+This guide points to one showcase-first sample wave that demonstrates the current `0.6.1` authored Wave surface.
+
+The example is intentionally denser than a typical production wave. Its job is to teach the current authoring and runtime surface quickly, not to be the smallest possible launch-ready file.
+
+## Canonical Example
+
+- [Full modern sample wave](../plans/examples/wave-example-live-proof.md)
+  Shows the combined `0.6.1` authored surface in one file: closure roles, `E0`, optional security review, delegated and pinned benchmark targets, richer executor config, `### Skills`, `### Capabilities`, `### Deliverables`, `### Exit contract`, `### Proof artifacts`, sticky retry, deploy environments, and proof-first live-wave structure.
+
+## What This Example Teaches
+
+- the standard closure-role structure with `A0`, `E0`, `A8`, and `A9`
+- wave-level `## Eval targets`
+- delegated versus pinned benchmark selection
+- coordination benchmark families from `docs/evals/benchmark-catalog.json`
+- richer executor blocks, runtime budgets, and retry policy
+- cross-runtime `### Skills`
+- helper-routing hints via `### Capabilities`
+- `### Deliverables`
+- `### Exit contract`
+- proof-first `### Proof artifacts`
+- sticky retry for proof-bearing owners
+- deploy environments and provider-skill examples
+- infra and deploy-verifier specialist slices
+
+## Feature Coverage Map
+
+This sample covers the main surfaces added or hardened for `0.6.1`:
+
+- planner-era authored wave structure
+- cross-runtime `### Skills`
+- richer `### Executor` blocks and runtime budgets
+- `cont-EVAL` plus `## Eval targets`
+- delegated and pinned benchmark selection
+- coordination benchmark families from `docs/evals/benchmark-catalog.json`
+- helper-routing hints through `### Capabilities`
+- `### Deliverables`
+- `### Proof artifacts`
+- sticky retry for proof-bearing owners
+- proof-first live-wave prompts
+- deploy environments and deploy-kind-aware skills
+- integration, documentation, and cont-QA closure-role structure
+
+## When To Copy Literally Vs Adapt
+
+Copy more literally when:
+
+- you need the section layout
+- you want concrete wording for delegated versus pinned benchmark targets
+- you want a proof-first owner example with local artifact bundles and sticky retry
+
+Adapt more aggressively when:
+
+- your repo has different role ids or role prompts
+- your component promotions and maturity levels differ
+- your runtime policy uses different executor profiles or runtime mix targets
+- your deploy environments or provider skills differ from the example
+
+## How This Example Maps To Other Docs
+
+- Use [docs/guides/planner.md](../guides/planner.md) for the planner-generated baseline, then use this sample to see how a human would enrich the generated draft.
+- Use [docs/evals/README.md](../evals/README.md) with this sample when you need to see delegated and pinned benchmark targets in a real wave.
+- Use [docs/reference/live-proof-waves.md](./live-proof-waves.md) with this sample when you need proof-first authoring for `pilot-live` and above.
+- Use [docs/plans/wave-orchestrator.md](../plans/wave-orchestrator.md) for the operational runbook that explains how the launcher interprets these sections.
+
+## Suggested Reading Order
+
+1. Start with [Full modern sample wave](../plans/examples/wave-example-live-proof.md).
+2. Read [docs/evals/README.md](../evals/README.md) if you want more background on benchmark target selection.
+3. Read [docs/reference/live-proof-waves.md](./live-proof-waves.md) if you want more detail on proof-first `pilot-live` authoring.
+
+## Why This Example Lives In `docs/plans/examples/`
+
+The example lives outside `docs/plans/waves/` on purpose.
+
+That keeps it:
+
+- easy to browse as teaching material
+- clearly separate from the repo's real launcher-facing wave sequence
+- safe to evolve as reference material without implying that it is part of the current lane's actual plan history
@@ -1,6 +1,6 @@
 # Skills Reference
 
-Skills are repo-owned reusable instruction bundles that can be attached by lane, role, runtime, deploy kind, or explicit per-agent declaration.
+Skills are repo-owned reusable instruction bundles. Wave resolves them by config layer, then filters them through each bundle's activation metadata before projecting them into the selected runtime.
 
 ## Canonical Bundle Layout
 
@@ -9,12 +9,13 @@ Each bundle lives under `skills/<skill-id>/` and requires:
 - `skill.json`
 - `SKILL.md`
 
-Optional runtime adapters live under:
+Optional files:
 
 - `adapters/codex.md`
 - `adapters/claude.md`
 - `adapters/opencode.md`
 - `adapters/local.md`
+- `references/**` for on-demand reference material
 
 Minimal example:
 
@@ -26,30 +27,76 @@ skills/provider-railway/
     codex.md
     claude.md
     opencode.md
-    local.md
+  references/
+    verification-commands.md
 ```
 
 ## `skill.json`
 
-Required fields in practice:
+Required fields:
 
 - `id`
 - `title`
 - `description`
+- `activation.when`
 
-The bundle directory name and manifest `id` must match the normalized skill id.
+Optional fields:
 
-## `SKILL.md`
+- `version`
+- `tags`
+- `activation.roles`
+- `activation.runtimes`
+- `activation.deployKinds`
+- `termination.when`
+- `permissions.network`
+- `permissions.shell`
+- `permissions.mcpServers`
+- `trust.tier`
+- `evalCases[]`
 
-This is the canonical human-authored instruction body for the skill.
+Example:
 
-Keep it focused on reusable guidance that should survive across:
+```json
+{
+  "id": "provider-railway",
+  "title": "Railway",
+  "description": "Provider-aware Railway verification and rollback guidance.",
+  "activation": {
+    "when": "Attach when the wave deploy surface is Railway and the agent is acting in deploy, infra, integration, or cont-qa scope.",
+    "roles": ["deploy", "infra", "integration", "cont-qa"],
+    "runtimes": [],
+    "deployKinds": ["railway-cli", "railway-mcp"]
+  },
+  "termination": "Stop when Railway evidence is recorded or the blocking surface is explicit.",
+  "permissions": {
+    "network": ["railway.app"],
+    "shell": ["railway"],
+    "mcpServers": ["railway"]
+  },
+  "trust": {
+    "tier": "repo-owned"
+  },
+  "evalCases": [
+    {
+      "id": "deploy-railway-cli",
+      "role": "deploy",
+      "runtime": "opencode",
+      "deployKind": "railway-cli",
+      "expectActive": true
+    }
+  ]
+}
+```
 
-- many waves
-- multiple roles
-- multiple runtimes
+The bundle directory name and manifest `id` must match the normalized skill id.
+
+## `SKILL.md`
 
-Do not duplicate volatile assignment-specific details that belong in the wave prompt instead.
+`SKILL.md` is the canonical instruction body. Keep it reusable and procedural:
+
+- reusable across many waves
+- free of assignment-specific details that belong in the wave prompt
+- compact enough that long catalogs and command references can move into `references/`
 
 ## `wave.config.json` Surface
 
@@ -59,12 +106,12 @@ Top-level and lane-local skill attachment use the same shape:
 {
   "skills": {
     "dir": "skills",
-    "base": ["wave-core"],
+    "base": ["wave-core", "repo-coding-rules"],
    "byRole": {
-      "implementation": ["role-implementation"]
+      "deploy": ["role-deploy"]
    },
    "byRuntime": {
-      "codex": ["runtime-codex"]
+      "claude": ["runtime-claude"]
    },
    "byDeployKind": {
      "railway-mcp": ["provider-railway"]
@@ -77,7 +124,7 @@ Lane-local `lanes.<lane>.skills` extends the global config instead of replacing
 
 ## Resolution Order
 
-Resolved skills attach in this order:
+Resolved skills are gathered in this order:
 
 1. global `skills.base`
 2. lane `skills.base`
@@ -89,20 +136,12 @@ Resolved skills attach in this order:
 8. lane `skills.byDeployKind[defaultDeployEnvironmentKind]`
 9. agent `### Skills`
 
-Duplicates are removed while preserving first-seen order.
-
-## Per-Agent Attachment
+Then Wave applies manifest activation filtering:
 
-Wave markdown can add explicit skills:
+- configured skills only stay active if their `activation.roles`, `activation.runtimes`, and `activation.deployKinds` match the agent context
+- explicit agent `### Skills` still attach even if activation metadata would not auto-match
 
-````md
-### Skills
-
-- provider-github-release
-- provider-aws
-````
-
-These are additive. They do not replace the base, role, runtime, or deploy-kind skill layers.
+Duplicates are removed while preserving first-seen order.
 
 ## Deploy-Kind Attachment
 
@@ -116,41 +155,70 @@ If the wave declares:
 - `prod`: `railway-mcp` default
 ````
 
-then `byDeployKind.railway-mcp` skills become eligible for agents in that wave.
+then `byDeployKind.railway-mcp` skills become eligible for that wave. Whether they actually attach still depends on each bundle's activation metadata.
+
+Config-time validation rules:
+
+- `skills.byRole` keys must be supported Wave roles
+- `skills.byRuntime` keys must be supported runtimes
+- `skills.byDeployKind` keys are validated by `wave doctor` against built-in kinds plus kinds declared in wave files
+
+Built-in deploy kinds shipped by the starter profile are:
+
+- `railway-cli`
+- `railway-mcp`
+- `docker-compose`
+- `kubernetes`
+- `ssh-manual`
+- `custom`
+- `aws`
+- `github-release`
 
 ## Runtime Projection
 
-The canonical bundle is shared, but projection is runtime specific:
+Wave now projects skills metadata-first:
+
+- `skills.resolved.md` is a compact catalog with bundle summaries, activation scope, permissions, manifest paths, adapter paths, and available references
+- `skills.expanded.md` contains the full canonical `SKILL.md` bodies plus runtime adapters for debugging and audit
+
+Runtime delivery:
 
 - Codex
-  Skill bundle directories become `--add-dir` inputs, and the merged skill text is included in the compiled prompt.
+  Bundle directories become `--add-dir` inputs. The compact catalog stays in the compiled prompt, and the agent can read bundle files directly from disk.
 - Claude
-  The merged skill payload is appended to the generated system-prompt overlay.
+  The compact catalog is appended to the generated system-prompt overlay.
 - OpenCode
-  Skill instructions flow into `opencode.json`, and relevant files are attached through `--file`.
+  The compact catalog is injected into `opencode.json`, and `skill.json`, `SKILL.md`, the selected adapter, and every recursive `references/**` file are attached through `--file`.
 - Local
-  Skill text stays prompt-only.
+  The compact catalog stays prompt-only.
 
 ## Generated Artifacts
 
 Executor overlay directories can contain:
 
 - `skills.resolved.md`
+- `skills.expanded.md`
 - `skills.metadata.json`
 - `<runtime>-skills.txt`
 
-Dry-run `launch-preview.json` and live trace metadata also record the resolved skill ids and bundle metadata.
+Dry-run `launch-preview.json` and live trace metadata also record the resolved skill ids, bundle metadata, hashes, activation metadata, and artifact paths.
 
 ## Validation
 
-`wave doctor` validates that all configured skill bundles referenced by lane skill config exist and can be loaded.
+`wave doctor` validates the skill surface end to end:
+
+- referenced bundles exist and load
+- every bundle under the skills directory has a valid manifest and `SKILL.md`
+- `skills.byRole`, `skills.byRuntime`, and `skills.byDeployKind` selectors are valid
+- config mapping does not contradict manifest activation metadata
+- every shipped `evalCases[]` route resolves to the expected active or inactive outcome
 
-Missing or malformed bundles are treated as configuration errors, not silent no-ops.
+Missing or malformed bundles are configuration errors, not silent no-ops.
 
 ## Best Practices
 
-- Put repo-specific norms into skills, not repeated wave prompts.
-- Keep skills short and reusable.
-- Use runtime adapters only for runtime-specific instructions.
-- Prefer deploy-kind mapping for environment conventions and explicit `### Skills` only for special cases.
-- Keep bundle ids stable so traces and prompt fingerprints stay intelligible across runs.
+- Keep `SKILL.md` procedural and move long catalogs into `references/`.
+- Put routing intent into `activation.*`, not only prose.
+- Use explicit per-agent `### Skills` for true exceptions, not as a substitute for missing activation metadata.
+- Keep provider skills role-scoped unless every role genuinely needs the provider context.
+- Keep bundle ids stable so traces and prompt fingerprints remain intelligible across runs.
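Editorial note: the manifest activation filtering introduced in the skills diff (configured skills must match `activation.roles`, `activation.runtimes`, and `activation.deployKinds`; explicit per-agent `### Skills` always attach) can be sketched as follows. This is an illustrative sketch only; the function names and the treatment of empty activation arrays as "no restriction" are assumptions, not the package's confirmed behavior.

```javascript
// Sketch of manifest activation filtering as documented in the diff above.
// Assumption: an empty or missing activation array means "no restriction".
// Names are illustrative; the real launcher implementation may differ.
function activationMatches(manifest, ctx) {
  const act = (manifest && manifest.activation) || {};
  const open = (list) => !Array.isArray(list) || list.length === 0;
  return (
    (open(act.roles) || act.roles.includes(ctx.role)) &&
    (open(act.runtimes) || act.runtimes.includes(ctx.runtime)) &&
    (open(act.deployKinds) || act.deployKinds.includes(ctx.deployKind))
  );
}

// Explicit `### Skills` entries bypass activation matching entirely.
function filterConfiguredSkills(resolved, manifests, ctx, explicit) {
  return resolved.filter(
    (id) => explicit.includes(id) || activationMatches(manifests[id], ctx)
  );
}

const manifests = {
  "provider-railway": {
    activation: { roles: ["deploy"], runtimes: [], deployKinds: ["railway-cli", "railway-mcp"] },
  },
  "provider-aws": {
    activation: { roles: ["deploy"], runtimes: [], deployKinds: ["aws"] },
  },
};
const ctx = { role: "deploy", runtime: "opencode", deployKind: "railway-cli" };
// provider-aws would not auto-match (deployKind differs), but it survives
// because it is explicitly listed under the agent's `### Skills`.
const active = filterConfiguredSkills(
  ["provider-railway", "provider-aws"], manifests, ctx, ["provider-aws"]
);
// active → ["provider-railway", "provider-aws"]
```

Dropping `"provider-aws"` from the explicit list would leave only `"provider-railway"` active, which mirrors the documented rule that config mapping makes a skill eligible while activation metadata decides attachment.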
@@ -1,54 +1,173 @@
1
1
  ---
2
2
  title: "Agent Context Sources"
3
- summary: "Primary external sources used as inspiration for harness design, long-running agents, blackboard coordination, and repository-context evaluation."
3
+ summary: "Primary external sources used as inspiration for planning, harness design, skills and procedural memory, long-running agents, blackboard coordination, repository-context evaluation, and secure code generation."
4
4
  ---
5
5
 
6
6
  # Agent Context Sources
7
7
 
8
8
  This repository does not commit converted paper/article caches. Keep any hydrated local copies under `docs/research/agent-context-cache/` or another ignored cache directory.
9
9
 
10
- ## Harnesses and Long-Running Agents
10
+ ## Practice Articles
11
11
 
12
12
  - [Harness engineering: leveraging Codex in an agent-first world](https://openai.com/index/harness-engineering/)
13
13
  - [Unlocking the Codex harness: how we built the App Server](https://openai.com/index/unlocking-the-codex-harness/)
14
14
  - [Effective harnesses for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
15
+
16
+ ## Planning and Orchestration
17
+
18
+ - [SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly](https://arxiv.org/abs/2601.22623)
19
+ - [Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution](https://arxiv.org/abs/2603.11445)
20
+ - [DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation](https://arxiv.org/abs/2603.13327)
21
+ - [TodoEvolve: Learning to Architect Agent Planning Systems](https://arxiv.org/abs/2602.07839)
22
+ - [Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems in Minecraft](https://arxiv.org/abs/2503.03505)
23
+ - [OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents](https://arxiv.org/abs/2603.03005)
24
+ - [Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture](https://arxiv.org/abs/2507.01701)
25
+ - [LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science](https://arxiv.org/abs/2510.01285)
26
+ - [Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies](https://arxiv.org/abs/2510.14312)
27
+ - [MACC: Multi-Agent Collaborative Competition for Scientific Exploration](https://arxiv.org/abs/2603.03780)
15
28
  - [Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned](https://arxiv.org/abs/2603.05344)
16
29
  - [VeRO: An Evaluation Harness for Agents to Optimize Agents](https://arxiv.org/abs/2602.22480)
17
30
  - [EvoClaw: Evaluating AI Agents on Continuous Software Evolution](https://arxiv.org/abs/2603.13428)
31
+ - [Towards Engineering Multi-Agent LLMs: A Protocol-Driven Approach](https://arxiv.org/abs/2510.12120)
32
+ - [Advancing Multi-Agent Systems Through Model Context Protocol: Architecture, Implementation, and Applications](https://arxiv.org/abs/2504.21030)
33
+ - [Enhancing Model Context Protocol (MCP) with Context-Aware Server Collaboration](https://arxiv.org/abs/2601.11595)
34
+ - [Why Do Multi-Agent LLM Systems Fail?](https://arxiv.org/abs/2503.13657)
35
+ - [Systematic Failures in Collective Reasoning under Distributed Information in Multi-Agent LLMs](https://arxiv.org/abs/2505.11556)
36
+ - [Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems](https://arxiv.org/abs/2603.01045)
37
+ - [DPBench: Large Language Models Struggle with Simultaneous Coordination](https://arxiv.org/abs/2602.13255)
38
+ - [Multi-Agent Teams Hold Experts Back](https://arxiv.org/abs/2602.01011)
39
+ - [A Survey on LLM-based Multi-agent Systems: Workflow, Infrastructure, and Challenges](https://link.springer.com/article/10.1007/s44336-024-00009-2)
40
+ - [LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead](https://arxiv.org/abs/2404.04834)
41
+ - [The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption](https://arxiv.org/abs/2601.13671)
42
+ - [Describing Agentic AI Systems with C4: Lessons from Industry Projects](https://arxiv.org/abs/2603.15021)
43
+ - [A Taxonomy of Hierarchical Multi-Agent Systems: Design Patterns, Coordination Mechanisms, and Industrial Applications](https://arxiv.org/abs/2508.12683)
44
+ - [Blackboard Systems, Part One: The Blackboard Model of Problem Solving and the Evolution of Blackboard Architectures](https://ojs.aaai.org/index.php/aimagazine/article/view/537)
45
+ - [A Blackboard Architecture for Control](https://www.sciencedirect.com/science/article/abs/pii/0004370285900633)
46
+ - [Incremental Planning to Control a Blackboard-Based Problem Solver](https://cdn.aaai.org/AAAI/1986/AAAI86-010.pdf)
47
+ - [Blackboard Systems](https://mas.cs.umass.edu/Documents/Corkill/ai-expert.pdf)
48
+
49
+ ## Harnesses, Context Engineering, and Long-Running Agents
50
+
51
+ - [Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned](https://arxiv.org/abs/2603.05344)
52
+ - [Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models](https://arxiv.org/abs/2510.04618)
53
+ - [ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization](https://arxiv.org/abs/2509.13313)
54
+ - [ARC: Active and Reflection-driven Context Management for Long-Horizon Information Seeking Agents](https://arxiv.org/abs/2601.12030)
+ - [Meta Context Engineering via Agentic Skill Evolution](https://arxiv.org/abs/2601.21557)
+ - [Context Engineering for AI Agents in Open-Source Software](https://arxiv.org/abs/2510.21413)
+ - [Context Engineering: From Prompts to Corporate Multi-Agent Architecture](https://arxiv.org/abs/2603.09619)
+ - [CEDAR: Context Engineering for Agentic Data Science](https://arxiv.org/abs/2601.06606)
+ - [VeRO: An Evaluation Harness for Agents to Optimize Agents](https://arxiv.org/abs/2602.22480)
+ - [EvoClaw: Evaluating AI Agents on Continuous Software Evolution](https://arxiv.org/abs/2603.13428)
+ - [Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers](https://arxiv.org/abs/2603.07670)
+
+ ## Skills and Procedural Memory
+
+ - [SoK: Agentic Skills -- Beyond Tool Use in LLM Agents](https://arxiv.org/abs/2602.20867)
+ - [Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward](https://arxiv.org/abs/2602.12430)
+ - [Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers](https://arxiv.org/abs/2603.07670)
+ - [Meta Context Engineering via Agentic Skill Evolution](https://arxiv.org/abs/2601.21557)
+ - [Voyager: An Open-Ended Embodied Agent with Large Language Models](https://arxiv.org/abs/2305.16291)
+ - [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/abs/2302.04761)
+ - [Large Language Models as Tool Makers](https://arxiv.org/abs/2305.17126)
+ - [Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control](https://arxiv.org/abs/2306.07863)
+ - [ExpeL: LLM Agents Are Experiential Learners](https://arxiv.org/abs/2308.10144)
+ - [Agent Workflow Memory](https://arxiv.org/abs/2409.07429)
+ - [SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills](https://arxiv.org/abs/2504.07079)
+ - [ReUseIt: Synthesizing Reusable AI Agent Workflows for Web Automation](https://arxiv.org/abs/2510.14308)
+ - [ProcMEM: Learning Reusable Procedural Memory from Experience via Non-Parametric PPO for LLM Agents](https://arxiv.org/abs/2602.01869)
+ - [Learning Hierarchical Procedural Memory for LLM Agents through Bayesian Selection and Contrastive Refinement](https://arxiv.org/abs/2512.18950)
+ - [Mem^p: Exploring Agent Procedural Memory](https://arxiv.org/abs/2508.06433)
+ - [MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents](https://arxiv.org/abs/2602.02474)
+ - [Reinforcement Learning for Self-Improving Agent with Skill Library](https://arxiv.org/abs/2512.17102)
+ - [Evolving Programmatic Skill Networks](https://arxiv.org/abs/2601.03509)
+ - [XSkill: Continual Learning from Experience and Skills in Multimodal Agents](https://arxiv.org/abs/2603.12056)
+ - [Memento-Skills: Let Agents Design Agents](https://arxiv.org/abs/2603.18743)
+ - [MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild](https://arxiv.org/abs/2603.17187)
+ - [SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks](https://arxiv.org/abs/2602.12670)
+
+ ## Agent Context Files and Configuration
+
+ - [Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?](https://arxiv.org/html/2602.11988v1)
+ - [Agent READMEs: An Empirical Study of Context Files for Agentic Coding](https://arxiv.org/abs/2511.12884)
+ - [Beyond the Prompt: An Empirical Study of Cursor Rules](https://arxiv.org/abs/2512.18925)
+ - [Decoding the Configuration of AI Coding Agents: Insights from Claude Code Projects](https://arxiv.org/abs/2511.09268)
+ - [On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents](https://arxiv.org/abs/2601.20404)
+ - [Interpretable Context Methodology: Folder Structure as Agentic Architecture](https://arxiv.org/abs/2603.16021)

  ## Blackboard and Shared Workspaces

  - [LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science](https://arxiv.org/abs/2510.01285)
  - [Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture](https://arxiv.org/abs/2507.01701)
+ - [Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies](https://arxiv.org/abs/2510.14312)
  - [DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation](https://arxiv.org/abs/2603.13327)
+ - [MACC: Multi-Agent Collaborative Competition for Scientific Exploration](https://arxiv.org/abs/2603.03780)
+ - [M3-ACE: Rectifying Visual Perception in Multimodal Math Reasoning via Multi-Agentic Context Engineering](https://arxiv.org/abs/2603.08369)
  - [SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly](https://arxiv.org/abs/2601.22623)
  - [Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems](https://arxiv.org/abs/2603.01045)
  - [An Open Agent Architecture](https://cdn.aaai.org/Symposia/Spring/1994/SS-94-03/SS94-03-001.pdf)
+ - [Blackboard Systems, Part One: The Blackboard Model of Problem Solving and the Evolution of Blackboard Architectures](https://ojs.aaai.org/index.php/aimagazine/article/view/537)
+ - [A Blackboard Architecture for Control](https://www.sciencedirect.com/science/article/abs/pii/0004370285900633)
+ - [Incremental Planning to Control a Blackboard-Based Problem Solver](https://cdn.aaai.org/AAAI/1986/AAAI86-010.pdf)
+ - [Blackboard Systems](https://mas.cs.umass.edu/Documents/Corkill/ai-expert.pdf)

- ## Repo Context and Evaluation
+ ## Multi-Agent Orchestration and Architecture

- - [Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?](https://arxiv.org/html/2602.11988v1)
- - [VeRO: An Evaluation Harness for Agents to Optimize Agents](https://arxiv.org/abs/2602.22480)
- - [EvoClaw: Evaluating AI Agents on Continuous Software Evolution](https://arxiv.org/abs/2603.13428)
- - [Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems](https://arxiv.org/abs/2603.01045)
+ - [The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption](https://arxiv.org/abs/2601.13671)
+ - [Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents](https://arxiv.org/abs/2601.12560)
+ - [Describing Agentic AI Systems with C4: Lessons from Industry Projects](https://arxiv.org/abs/2603.15021)
+ - [A Taxonomy of Hierarchical Multi-Agent Systems: Design Patterns, Coordination Mechanisms, and Industrial Applications](https://arxiv.org/abs/2508.12683)

- ## Adjacent Context and Memory
+ ## Security and Secure Code Generation
+
+ - [Prompting Techniques for Secure Code Generation: A Systematic Investigation](https://arxiv.org/abs/2407.07064)
+ - [Retrieve, Refine, or Both? Using Task-Specific Guidelines for Secure Python Code Generation](https://emaiannone.github.io/assets/pdf/c6.pdf)
+ - [Discrete Prompt Optimization Using Genetic Algorithm for Secure Python Code Generation](https://www.sciencedirect.com/science/article/pii/S0164121225003516)
+ - [SCGAgent: Recreating the Benefits of Reasoning Models for Secure Code Generation with Agentic Workflows](https://arxiv.org/abs/2506.07313)
+ - [RESCUE: Retrieval Augmented Secure Code Generation](https://arxiv.org/abs/2510.18204)
+ - [Improving LLM-Assisted Secure Code Generation through Retrieval-Augmented-Generation and Multi-Tool Feedback](https://arxiv.org/abs/2601.00509)
+ - [Autoregressive, Yet Revisable: In Decoding Revision for Secure Code Generation](https://arxiv.org/abs/2602.01187)
+ - [Security-by-Design for LLM-Based Code Generation: Leveraging Internal Representations for Concept-Driven Steering Mechanisms](https://arxiv.org/abs/2603.11212)
+ - [Persistent Human Feedback, LLMs, and Static Analyzers for Secure Code Generation and Vulnerability Detection](https://arxiv.org/abs/2602.05868)
+
+ ## Security Benchmarks and Evaluation
+
+ - [SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories](https://arxiv.org/abs/2504.21205)
+ - [SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios](https://arxiv.org/abs/2509.22097)
+ - [SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI](https://arxiv.org/abs/2410.11096)
+ - [TOSSS: a CVE-based Software Security Benchmark for Large Language Models](https://arxiv.org/abs/2603.10969)
+ - [From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security](https://arxiv.org/abs/2412.15004)
+ - [Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics](https://arxiv.org/abs/2511.10271)
+
+ ## Multi-Agent Security
+
+ - [Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies](https://arxiv.org/abs/2510.14312)
+ - [Security Considerations for Multi-agent Systems](https://arxiv.org/abs/2603.09002)
+
+ ## Skill Practice and Open Standards
+
+ - [Agent Skills - Codex](https://developers.openai.com/codex/skills/)
+ - [Customization - Codex](https://developers.openai.com/codex/concepts/customization/)
+ - [Testing Agent Skills Systematically with Evals](https://developers.openai.com/blog/eval-skills/)
+ - [Shell + Skills + Compaction: Tips for long-running agents that do real work](https://developers.openai.com/blog/skills-shell-tips/)
+ - [Using skills to accelerate OSS maintenance](https://developers.openai.com/blog/skills-agents-sdk/)
+ - [Equipping agents for the real world with Agent Skills](https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills)
+ - [Best practices for skill creators](https://agentskills.io/skill-creation/best-practices)
+ - [Using scripts in skills](https://agentskills.io/skill-creation/using-scripts)
+
+ ## Adjacent Memory and Prompting Research

- - [Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers](https://arxiv.org/abs/2603.07670)
  - [Effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
  - [Run long horizon tasks with Codex](https://developers.openai.com/blog/run-long-horizon-tasks-with-codex/)
  - [Prompt guidance for GPT-5](https://developers.openai.com/api/docs/guides/prompt-guidance/)
  - [Codex Prompting Guide](https://developers.openai.com/cookbook/examples/gpt-5/codex_prompting_guide/)
  - [Writing effective tools for agents](https://www.anthropic.com/engineering/writing-tools-for-agents)
  - [Building effective agents](https://www.anthropic.com/engineering/building-effective-agents)
- - [Context Engineering for AI Agents in Open-Source Software](https://arxiv.org/abs/2510.21413)
  - [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
  - [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/abs/2302.04761)
  - [Plan-and-Solve Prompting](https://arxiv.org/abs/2305.04091)
  - [Augmenting Language Models with Long-Term Memory](https://arxiv.org/abs/2306.07174)
  - [MemGPT: Towards LLMs as Operating Systems](https://arxiv.org/abs/2310.08560)
  - [Lost in the Middle: How Language Models Use Long Contexts](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00638/119630/Lost-in-the-Middle-How-Language-Models-Use-Long)
- - [ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization](https://arxiv.org/abs/2509.13313)

  ## Notes