@chllming/wave-orchestration 0.5.4 → 0.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (126)
  1. package/CHANGELOG.md +52 -3
  2. package/README.md +33 -5
  3. package/docs/README.md +18 -4
  4. package/docs/agents/wave-cont-eval-role.md +36 -0
  5. package/docs/agents/{wave-evaluator-role.md → wave-cont-qa-role.md} +14 -11
  6. package/docs/agents/wave-documentation-role.md +1 -1
  7. package/docs/agents/wave-infra-role.md +1 -1
  8. package/docs/agents/wave-integration-role.md +3 -3
  9. package/docs/agents/wave-launcher-role.md +4 -3
  10. package/docs/agents/wave-security-role.md +40 -0
  11. package/docs/concepts/context7-vs-skills.md +1 -1
  12. package/docs/concepts/what-is-a-wave.md +56 -6
  13. package/docs/evals/README.md +166 -0
  14. package/docs/evals/benchmark-catalog.json +663 -0
  15. package/docs/guides/author-and-run-waves.md +135 -0
  16. package/docs/guides/planner.md +5 -0
  17. package/docs/guides/terminal-surfaces.md +2 -0
  18. package/docs/plans/component-cutover-matrix.json +1 -1
  19. package/docs/plans/component-cutover-matrix.md +1 -1
  20. package/docs/plans/current-state.md +19 -1
  21. package/docs/plans/examples/wave-example-live-proof.md +435 -0
  22. package/docs/plans/migration.md +42 -0
  23. package/docs/plans/wave-orchestrator.md +46 -7
  24. package/docs/plans/waves/wave-0.md +4 -4
  25. package/docs/reference/live-proof-waves.md +177 -0
  26. package/docs/reference/migration-0.2-to-0.5.md +26 -19
  27. package/docs/reference/npmjs-trusted-publishing.md +6 -5
  28. package/docs/reference/runtime-config/README.md +14 -4
  29. package/docs/reference/sample-waves.md +87 -0
  30. package/docs/reference/skills.md +110 -42
  31. package/docs/research/agent-context-sources.md +130 -11
  32. package/docs/research/coordination-failure-review.md +266 -0
  33. package/docs/roadmap.md +6 -2
  34. package/package.json +2 -2
  35. package/releases/manifest.json +35 -2
  36. package/scripts/research/agent-context-archive.mjs +83 -1
  37. package/scripts/research/manifests/agent-context-expanded-2026-03-22.mjs +811 -0
  38. package/scripts/wave-orchestrator/adhoc.mjs +1331 -0
  39. package/scripts/wave-orchestrator/agent-state.mjs +358 -6
  40. package/scripts/wave-orchestrator/artifact-schemas.mjs +173 -0
  41. package/scripts/wave-orchestrator/clarification-triage.mjs +10 -3
  42. package/scripts/wave-orchestrator/config.mjs +48 -12
  43. package/scripts/wave-orchestrator/context7.mjs +2 -0
  44. package/scripts/wave-orchestrator/coord-cli.mjs +51 -19
  45. package/scripts/wave-orchestrator/coordination-store.mjs +26 -4
  46. package/scripts/wave-orchestrator/coordination.mjs +83 -9
  47. package/scripts/wave-orchestrator/dashboard-state.mjs +20 -8
  48. package/scripts/wave-orchestrator/dep-cli.mjs +5 -2
  49. package/scripts/wave-orchestrator/docs-queue.mjs +8 -2
  50. package/scripts/wave-orchestrator/evals.mjs +451 -0
  51. package/scripts/wave-orchestrator/feedback.mjs +15 -1
  52. package/scripts/wave-orchestrator/install.mjs +32 -9
  53. package/scripts/wave-orchestrator/launcher-closure.mjs +281 -0
  54. package/scripts/wave-orchestrator/launcher-runtime.mjs +334 -0
  55. package/scripts/wave-orchestrator/launcher.mjs +709 -601
  56. package/scripts/wave-orchestrator/ledger.mjs +123 -20
  57. package/scripts/wave-orchestrator/local-executor.mjs +99 -12
  58. package/scripts/wave-orchestrator/planner.mjs +177 -42
  59. package/scripts/wave-orchestrator/replay.mjs +6 -3
  60. package/scripts/wave-orchestrator/role-helpers.mjs +84 -0
  61. package/scripts/wave-orchestrator/shared.mjs +75 -11
  62. package/scripts/wave-orchestrator/skills.mjs +637 -106
  63. package/scripts/wave-orchestrator/traces.mjs +71 -48
  64. package/scripts/wave-orchestrator/wave-files.mjs +947 -101
  65. package/scripts/wave.mjs +9 -0
  66. package/skills/README.md +202 -0
  67. package/skills/provider-aws/SKILL.md +111 -0
  68. package/skills/provider-aws/adapters/claude.md +1 -0
  69. package/skills/provider-aws/adapters/codex.md +1 -0
  70. package/skills/provider-aws/references/service-verification.md +39 -0
  71. package/skills/provider-aws/skill.json +50 -1
  72. package/skills/provider-custom-deploy/SKILL.md +59 -0
  73. package/skills/provider-custom-deploy/skill.json +46 -1
  74. package/skills/provider-docker-compose/SKILL.md +90 -0
  75. package/skills/provider-docker-compose/adapters/local.md +1 -0
  76. package/skills/provider-docker-compose/skill.json +49 -1
  77. package/skills/provider-github-release/SKILL.md +116 -1
  78. package/skills/provider-github-release/adapters/claude.md +1 -0
  79. package/skills/provider-github-release/adapters/codex.md +1 -0
  80. package/skills/provider-github-release/skill.json +51 -1
  81. package/skills/provider-kubernetes/SKILL.md +137 -0
  82. package/skills/provider-kubernetes/adapters/claude.md +1 -0
  83. package/skills/provider-kubernetes/adapters/codex.md +1 -0
  84. package/skills/provider-kubernetes/references/kubectl-patterns.md +58 -0
  85. package/skills/provider-kubernetes/skill.json +48 -1
  86. package/skills/provider-railway/SKILL.md +118 -1
  87. package/skills/provider-railway/references/verification-commands.md +39 -0
  88. package/skills/provider-railway/skill.json +67 -1
  89. package/skills/provider-ssh-manual/SKILL.md +91 -0
  90. package/skills/provider-ssh-manual/skill.json +50 -1
  91. package/skills/repo-coding-rules/SKILL.md +84 -0
  92. package/skills/repo-coding-rules/skill.json +30 -1
  93. package/skills/role-cont-eval/SKILL.md +90 -0
  94. package/skills/role-cont-eval/adapters/codex.md +1 -0
  95. package/skills/role-cont-eval/skill.json +36 -0
  96. package/skills/role-cont-qa/SKILL.md +93 -0
  97. package/skills/role-cont-qa/adapters/claude.md +1 -0
  98. package/skills/role-cont-qa/skill.json +36 -0
  99. package/skills/role-deploy/SKILL.md +90 -0
  100. package/skills/role-deploy/skill.json +32 -1
  101. package/skills/role-documentation/SKILL.md +66 -0
  102. package/skills/role-documentation/skill.json +32 -1
  103. package/skills/role-implementation/SKILL.md +62 -0
  104. package/skills/role-implementation/skill.json +32 -1
  105. package/skills/role-infra/SKILL.md +74 -0
  106. package/skills/role-infra/skill.json +32 -1
  107. package/skills/role-integration/SKILL.md +79 -1
  108. package/skills/role-integration/skill.json +32 -1
  109. package/skills/role-research/SKILL.md +58 -0
  110. package/skills/role-research/skill.json +32 -1
  111. package/skills/role-security/SKILL.md +60 -0
  112. package/skills/role-security/skill.json +36 -0
  113. package/skills/runtime-claude/SKILL.md +60 -1
  114. package/skills/runtime-claude/skill.json +32 -1
  115. package/skills/runtime-codex/SKILL.md +52 -1
  116. package/skills/runtime-codex/skill.json +32 -1
  117. package/skills/runtime-local/SKILL.md +39 -0
  118. package/skills/runtime-local/skill.json +32 -1
  119. package/skills/runtime-opencode/SKILL.md +51 -0
  120. package/skills/runtime-opencode/skill.json +32 -1
  121. package/skills/wave-core/SKILL.md +107 -0
  122. package/skills/wave-core/references/marker-syntax.md +62 -0
  123. package/skills/wave-core/skill.json +31 -1
  124. package/wave.config.json +35 -6
  125. package/skills/role-evaluator/SKILL.md +0 -6
  126. package/skills/role-evaluator/skill.json +0 -5
@@ -1,6 +1,6 @@
 # Runtime Configuration Reference
 
-This directory is the canonical reference for executor configuration in Wave `0.5.x`.
+This directory is the canonical reference for executor configuration in Wave `0.6.1`.
 
 Use it when you need the full supported surface for:
 
@@ -50,6 +50,8 @@ Skill settings resolve after executor selection, because runtime and deploy-kind
 8. `lanes.<lane>.skills.byDeployKind[defaultDeployEnvironmentKind]`
 9. agent `### Skills`
 
+Then Wave filters configured skills through each bundle's activation metadata. Explicit per-agent `### Skills` still force attachment even when activation metadata would not auto-match.
+
 When retry-time fallback changes the runtime, Wave recomputes the effective skill set and rewrites the executor overlay before relaunch.
 
 ## Common Fields
@@ -82,14 +84,22 @@ Wave writes runtime artifacts here:
 Common files:
 
 - `launch-preview.json`: resolved invocation lines, env vars, and retry mode
-- `skills.resolved.md`: canonical merged skill payload for the selected agent and runtime
-- `skills.metadata.json`: resolved skill ids, bundle metadata, hashes, and generated artifact paths
-- `<runtime>-skills.txt`: runtime-projected skill text used by the selected executor
+- `skills.resolved.md`: compact metadata-first skill catalog for the selected agent and runtime
+- `skills.expanded.md`: full canonical/debug skill payload with `SKILL.md` bodies and adapters
+- `skills.metadata.json`: resolved skill ids, activation metadata, permissions, hashes, and generated artifact paths
+- `<runtime>-skills.txt`: runtime-projected compact skill text used by the selected executor
 - `claude-system-prompt.txt`: generated Claude harness prompt overlay
 - `claude-settings.json`: generated Claude settings overlay when inline settings data is present
 - `opencode-agent-prompt.txt`: generated OpenCode harness prompt overlay
 - `opencode.json`: generated OpenCode runtime config overlay
 
+Runtime-specific delivery:
+
+- Codex uses the compact catalog in the compiled prompt and attaches bundle directories through `--add-dir`.
+- Claude appends the compact catalog to the generated system-prompt overlay.
+- OpenCode injects the compact catalog into `opencode.json` and attaches `skill.json`, `SKILL.md`, the selected adapter, and recursive `references/**` files through `--file`.
+- Local keeps skills prompt-only.
+
 `launch-preview.json` also records the resolved skill metadata so dry-run can verify the exact runtime plus skill combination before any live launch.
 
 ## Recommended Validation Path
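Editorial note: the nine-layer resolution order and activation filtering described in the diff above can be sketched as a small helper. This is an illustrative sketch only; the function and config shapes below are hypothetical and may differ from the package's actual implementation in `scripts/wave-orchestrator/skills.mjs`.

```javascript
// Sketch of the documented skill-layer gathering: nine ordered layers
// (global/lane base, byRole, byRuntime, byDeployKind, then explicit
// per-agent `### Skills`), with duplicates removed in first-seen order.
// All names here are illustrative, not the package's real API.
function resolveSkillLayers(global, lane, ctx) {
  const layers = [
    global.base,
    lane.base,
    global.byRole[ctx.role],
    lane.byRole[ctx.role],
    global.byRuntime[ctx.runtime],
    lane.byRuntime[ctx.runtime],
    global.byDeployKind[ctx.deployKind],
    lane.byDeployKind[ctx.deployKind],
    ctx.agentSkills, // explicit per-agent `### Skills`
  ];
  const seen = new Set();
  const ordered = [];
  for (const layer of layers) {
    for (const id of layer ?? []) {
      if (!seen.has(id)) {
        seen.add(id);
        ordered.push(id);
      }
    }
  }
  return ordered;
}

const ids = resolveSkillLayers(
  { base: ["wave-core"], byRole: { deploy: ["role-deploy"] }, byRuntime: {}, byDeployKind: {} },
  { base: ["wave-core"], byRole: {}, byRuntime: {}, byDeployKind: { "railway-mcp": ["provider-railway"] } },
  { role: "deploy", runtime: "claude", deployKind: "railway-mcp", agentSkills: ["provider-aws"] }
);
// ids → ["wave-core", "role-deploy", "provider-railway", "provider-aws"]
```

Note that the duplicate `wave-core` from the lane base is dropped because the global base already contributed it; first-seen order is what keeps prompt fingerprints stable across lanes.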
@@ -0,0 +1,87 @@
+---
+title: "Sample Waves"
+summary: "A showcase-first sample wave that demonstrates the current 0.6.1 Wave surface."
+---
+
+# Sample Waves
+
+This guide points to one showcase-first sample wave that demonstrates the current `0.6.1` authored Wave surface.
+
+The example is intentionally denser than a typical production wave. Its job is to teach the current authoring and runtime surface quickly, not to be the smallest possible launch-ready file.
+
+## Canonical Example
+
+- [Full modern sample wave](../plans/examples/wave-example-live-proof.md)
+  Shows the combined `0.6.1` authored surface in one file: closure roles, `E0`, optional security review, delegated and pinned benchmark targets, richer executor config, `### Skills`, `### Capabilities`, `### Deliverables`, `### Exit contract`, `### Proof artifacts`, sticky retry, deploy environments, and proof-first live-wave structure.
+
+## What This Example Teaches
+
+- the standard closure-role structure with `A0`, `E0`, `A8`, and `A9`
+- wave-level `## Eval targets`
+- delegated versus pinned benchmark selection
+- coordination benchmark families from `docs/evals/benchmark-catalog.json`
+- richer executor blocks, runtime budgets, and retry policy
+- cross-runtime `### Skills`
+- helper-routing hints via `### Capabilities`
+- `### Deliverables`
+- `### Exit contract`
+- proof-first `### Proof artifacts`
+- sticky retry for proof-bearing owners
+- deploy environments and provider-skill examples
+- infra and deploy-verifier specialist slices
+
+## Feature Coverage Map
+
+This sample covers the main surfaces added or hardened for `0.6.1`:
+
+- planner-era authored wave structure
+- cross-runtime `### Skills`
+- richer `### Executor` blocks and runtime budgets
+- `cont-EVAL` plus `## Eval targets`
+- delegated and pinned benchmark selection
+- coordination benchmark families from `docs/evals/benchmark-catalog.json`
+- helper-routing hints through `### Capabilities`
+- `### Deliverables`
+- `### Proof artifacts`
+- sticky retry for proof-bearing owners
+- proof-first live-wave prompts
+- deploy environments and deploy-kind-aware skills
+- integration, documentation, and cont-QA closure-role structure
+
+## When To Copy Literally Vs Adapt
+
+Copy more literally when:
+
+- you need the section layout
+- you want concrete wording for delegated versus pinned benchmark targets
+- you want a proof-first owner example with local artifact bundles and sticky retry
+
+Adapt more aggressively when:
+
+- your repo has different role ids or role prompts
+- your component promotions and maturity levels differ
+- your runtime policy uses different executor profiles or runtime mix targets
+- your deploy environments or provider skills differ from the example
+
+## How This Example Maps To Other Docs
+
+- Use [docs/guides/planner.md](../guides/planner.md) for the planner-generated baseline, then use this sample to see how a human would enrich the generated draft.
+- Use [docs/evals/README.md](../evals/README.md) with this sample when you need to see delegated and pinned benchmark targets in a real wave.
+- Use [docs/reference/live-proof-waves.md](./live-proof-waves.md) with this sample when you need proof-first authoring for `pilot-live` and above.
+- Use [docs/plans/wave-orchestrator.md](../plans/wave-orchestrator.md) for the operational runbook that explains how the launcher interprets these sections.
+
+## Suggested Reading Order
+
+1. Start with [Full modern sample wave](../plans/examples/wave-example-live-proof.md).
+2. Read [docs/evals/README.md](../evals/README.md) if you want more background on benchmark target selection.
+3. Read [docs/reference/live-proof-waves.md](./live-proof-waves.md) if you want more detail on proof-first `pilot-live` authoring.
+
+## Why This Example Lives In `docs/plans/examples/`
+
+The example lives outside `docs/plans/waves/` on purpose.
+
+That keeps it:
+
+- easy to browse as teaching material
+- clearly separate from the repo's real launcher-facing wave sequence
+- safe to evolve as reference material without implying that it is part of the current lane's actual plan history
@@ -1,6 +1,6 @@
 # Skills Reference
 
-Skills are repo-owned reusable instruction bundles that can be attached by lane, role, runtime, deploy kind, or explicit per-agent declaration.
+Skills are repo-owned reusable instruction bundles. Wave resolves them by config layer, then filters them through each bundle's activation metadata before projecting them into the selected runtime.
 
 ## Canonical Bundle Layout
 
@@ -9,12 +9,13 @@ Each bundle lives under `skills/<skill-id>/` and requires:
 - `skill.json`
 - `SKILL.md`
 
-Optional runtime adapters live under:
+Optional files:
 
 - `adapters/codex.md`
 - `adapters/claude.md`
 - `adapters/opencode.md`
 - `adapters/local.md`
+- `references/**` for on-demand reference material
 
 Minimal example:
 
@@ -26,30 +27,76 @@ skills/provider-railway/
     codex.md
     claude.md
     opencode.md
-    local.md
+  references/
+    verification-commands.md
 ```
 
 ## `skill.json`
 
-Required fields in practice:
+Required fields:
 
 - `id`
 - `title`
 - `description`
+- `activation.when`
 
-The bundle directory name and manifest `id` must match the normalized skill id.
+Optional fields:
 
-## `SKILL.md`
+- `version`
+- `tags`
+- `activation.roles`
+- `activation.runtimes`
+- `activation.deployKinds`
+- `termination.when`
+- `permissions.network`
+- `permissions.shell`
+- `permissions.mcpServers`
+- `trust.tier`
+- `evalCases[]`
 
-This is the canonical human-authored instruction body for the skill.
+Example:
 
-Keep it focused on reusable guidance that should survive across:
+```json
+{
+  "id": "provider-railway",
+  "title": "Railway",
+  "description": "Provider-aware Railway verification and rollback guidance.",
+  "activation": {
+    "when": "Attach when the wave deploy surface is Railway and the agent is acting in deploy, infra, integration, or cont-qa scope.",
+    "roles": ["deploy", "infra", "integration", "cont-qa"],
+    "runtimes": [],
+    "deployKinds": ["railway-cli", "railway-mcp"]
+  },
+  "termination": "Stop when Railway evidence is recorded or the blocking surface is explicit.",
+  "permissions": {
+    "network": ["railway.app"],
+    "shell": ["railway"],
+    "mcpServers": ["railway"]
+  },
+  "trust": {
+    "tier": "repo-owned"
+  },
+  "evalCases": [
+    {
+      "id": "deploy-railway-cli",
+      "role": "deploy",
+      "runtime": "opencode",
+      "deployKind": "railway-cli",
+      "expectActive": true
+    }
+  ]
+}
+```
 
-- many waves
-- multiple roles
-- multiple runtimes
+The bundle directory name and manifest `id` must match the normalized skill id.
+
+## `SKILL.md`
 
-Do not duplicate volatile assignment-specific details that belong in the wave prompt instead.
+`SKILL.md` is the canonical instruction body. Keep it reusable and procedural:
+
+- reusable across many waves
+- free of assignment-specific details that belong in the wave prompt
+- compact enough that long catalogs and command references can move into `references/`
 
 ## `wave.config.json` Surface
 
@@ -59,12 +106,12 @@ Top-level and lane-local skill attachment use the same shape:
 {
   "skills": {
     "dir": "skills",
-    "base": ["wave-core"],
+    "base": ["wave-core", "repo-coding-rules"],
    "byRole": {
-      "implementation": ["role-implementation"]
+      "deploy": ["role-deploy"]
    },
    "byRuntime": {
-      "codex": ["runtime-codex"]
+      "claude": ["runtime-claude"]
    },
    "byDeployKind": {
      "railway-mcp": ["provider-railway"]
@@ -77,7 +124,7 @@ Lane-local `lanes.<lane>.skills` extends the global config instead of replacing
 
 ## Resolution Order
 
-Resolved skills attach in this order:
+Resolved skills are gathered in this order:
 
 1. global `skills.base`
 2. lane `skills.base`
@@ -89,20 +136,12 @@ Resolved skills attach in this order:
 8. lane `skills.byDeployKind[defaultDeployEnvironmentKind]`
 9. agent `### Skills`
 
-Duplicates are removed while preserving first-seen order.
-
-## Per-Agent Attachment
+Then Wave applies manifest activation filtering:
 
-Wave markdown can add explicit skills:
+- configured skills only stay active if their `activation.roles`, `activation.runtimes`, and `activation.deployKinds` match the agent context
+- explicit agent `### Skills` still attach even if activation metadata would not auto-match
 
-````md
-### Skills
-
-- provider-github-release
-- provider-aws
-````
-
-These are additive. They do not replace the base, role, runtime, or deploy-kind skill layers.
+Duplicates are removed while preserving first-seen order.
 
 ## Deploy-Kind Attachment
 
@@ -116,41 +155,70 @@ If the wave declares:
 - `prod`: `railway-mcp` default
 ````
 
-then `byDeployKind.railway-mcp` skills become eligible for agents in that wave.
+then `byDeployKind.railway-mcp` skills become eligible for that wave. Whether they actually attach still depends on each bundle's activation metadata.
+
+Config-time validation rules:
+
+- `skills.byRole` keys must be supported Wave roles
+- `skills.byRuntime` keys must be supported runtimes
+- `skills.byDeployKind` keys are validated by `wave doctor` against built-in kinds plus kinds declared in wave files
+
+Built-in deploy kinds shipped by the starter profile are:
+
+- `railway-cli`
+- `railway-mcp`
+- `docker-compose`
+- `kubernetes`
+- `ssh-manual`
+- `custom`
+- `aws`
+- `github-release`
 
 ## Runtime Projection
 
-The canonical bundle is shared, but projection is runtime specific:
+Wave now projects skills metadata-first:
+
+- `skills.resolved.md` is a compact catalog with bundle summaries, activation scope, permissions, manifest paths, adapter paths, and available references
+- `skills.expanded.md` contains the full canonical `SKILL.md` bodies plus runtime adapters for debugging and audit
+
+Runtime delivery:
 
 - Codex
-  Skill bundle directories become `--add-dir` inputs, and the merged skill text is included in the compiled prompt.
+  Bundle directories become `--add-dir` inputs. The compact catalog stays in the compiled prompt, and the agent can read bundle files directly from disk.
 - Claude
-  The merged skill payload is appended to the generated system-prompt overlay.
+  The compact catalog is appended to the generated system-prompt overlay.
 - OpenCode
-  Skill instructions flow into `opencode.json`, and relevant files are attached through `--file`.
+  The compact catalog is injected into `opencode.json`, and `skill.json`, `SKILL.md`, the selected adapter, and every recursive `references/**` file are attached through `--file`.
 - Local
-  Skill text stays prompt-only.
+  The compact catalog stays prompt-only.
 
 ## Generated Artifacts
 
 Executor overlay directories can contain:
 
 - `skills.resolved.md`
+- `skills.expanded.md`
 - `skills.metadata.json`
 - `<runtime>-skills.txt`
 
-Dry-run `launch-preview.json` and live trace metadata also record the resolved skill ids and bundle metadata.
+Dry-run `launch-preview.json` and live trace metadata also record the resolved skill ids, bundle metadata, hashes, activation metadata, and artifact paths.
 
 ## Validation
 
-`wave doctor` validates that all configured skill bundles referenced by lane skill config exist and can be loaded.
+`wave doctor` validates the skill surface end to end:
+
+- referenced bundles exist and load
+- every bundle under the skills directory has a valid manifest and `SKILL.md`
+- `skills.byRole`, `skills.byRuntime`, and `skills.byDeployKind` selectors are valid
+- config mapping does not contradict manifest activation metadata
+- every shipped `evalCases[]` route resolves to the expected active or inactive outcome
 
-Missing or malformed bundles are treated as configuration errors, not silent no-ops.
+Missing or malformed bundles are configuration errors, not silent no-ops.
 
 ## Best Practices
 
-- Put repo-specific norms into skills, not repeated wave prompts.
-- Keep skills short and reusable.
-- Use runtime adapters only for runtime-specific instructions.
-- Prefer deploy-kind mapping for environment conventions and explicit `### Skills` only for special cases.
-- Keep bundle ids stable so traces and prompt fingerprints stay intelligible across runs.
+- Keep `SKILL.md` procedural and move long catalogs into `references/`.
+- Put routing intent into `activation.*`, not only prose.
+- Use explicit per-agent `### Skills` for true exceptions, not as a substitute for missing activation metadata.
+- Keep provider skills role-scoped unless every role genuinely needs the provider context.
+- Keep bundle ids stable so traces and prompt fingerprints remain intelligible across runs.
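Editorial note: the manifest activation filtering introduced in the skills diff (configured skills must match `activation.roles`, `activation.runtimes`, and `activation.deployKinds`; explicit per-agent `### Skills` always attach) can be sketched as follows. This is an illustrative sketch only; the function names and the treatment of empty activation arrays as "no restriction" are assumptions, not the package's confirmed behavior.

```javascript
// Sketch of manifest activation filtering as documented in the diff above.
// Assumption: an empty or missing activation array means "no restriction".
// Names are illustrative; the real launcher implementation may differ.
function activationMatches(manifest, ctx) {
  const act = (manifest && manifest.activation) || {};
  const open = (list) => !Array.isArray(list) || list.length === 0;
  return (
    (open(act.roles) || act.roles.includes(ctx.role)) &&
    (open(act.runtimes) || act.runtimes.includes(ctx.runtime)) &&
    (open(act.deployKinds) || act.deployKinds.includes(ctx.deployKind))
  );
}

// Explicit `### Skills` entries bypass activation matching entirely.
function filterConfiguredSkills(resolved, manifests, ctx, explicit) {
  return resolved.filter(
    (id) => explicit.includes(id) || activationMatches(manifests[id], ctx)
  );
}

const manifests = {
  "provider-railway": {
    activation: { roles: ["deploy"], runtimes: [], deployKinds: ["railway-cli", "railway-mcp"] },
  },
  "provider-aws": {
    activation: { roles: ["deploy"], runtimes: [], deployKinds: ["aws"] },
  },
};
const ctx = { role: "deploy", runtime: "opencode", deployKind: "railway-cli" };
// provider-aws would not auto-match (deployKind differs), but it survives
// because it is explicitly listed under the agent's `### Skills`.
const active = filterConfiguredSkills(
  ["provider-railway", "provider-aws"], manifests, ctx, ["provider-aws"]
);
// active → ["provider-railway", "provider-aws"]
```

Dropping `"provider-aws"` from the explicit list would leave only `"provider-railway"` active, which mirrors the documented rule that config mapping makes a skill eligible while activation metadata decides attachment.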
@@ -1,54 +1,173 @@
1
1
  ---
2
2
  title: "Agent Context Sources"
3
- summary: "Primary external sources used as inspiration for harness design, long-running agents, blackboard coordination, and repository-context evaluation."
3
+ summary: "Primary external sources used as inspiration for planning, harness design, skills and procedural memory, long-running agents, blackboard coordination, repository-context evaluation, and secure code generation."
4
4
  ---
5
5
 
6
6
  # Agent Context Sources
7
7
 
8
8
  This repository does not commit converted paper/article caches. Keep any hydrated local copies under `docs/research/agent-context-cache/` or another ignored cache directory.
9
9
 
10
- ## Harnesses and Long-Running Agents
10
+ ## Practice Articles
11
11
 
12
12
  - [Harness engineering: leveraging Codex in an agent-first world](https://openai.com/index/harness-engineering/)
13
13
  - [Unlocking the Codex harness: how we built the App Server](https://openai.com/index/unlocking-the-codex-harness/)
14
14
  - [Effective harnesses for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
15
+
16
+ ## Planning and Orchestration
17
+
18
+ - [SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly](https://arxiv.org/abs/2601.22623)
19
+ - [Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution](https://arxiv.org/abs/2603.11445)
20
+ - [DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation](https://arxiv.org/abs/2603.13327)
21
+ - [TodoEvolve: Learning to Architect Agent Planning Systems](https://arxiv.org/abs/2602.07839)
22
+ - [Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems in Minecraft](https://arxiv.org/abs/2503.03505)
23
+ - [OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents](https://arxiv.org/abs/2603.03005)
24
+ - [Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture](https://arxiv.org/abs/2507.01701)
25
+ - [LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science](https://arxiv.org/abs/2510.01285)
26
+ - [Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies](https://arxiv.org/abs/2510.14312)
27
+ - [MACC: Multi-Agent Collaborative Competition for Scientific Exploration](https://arxiv.org/abs/2603.03780)
15
28
  - [Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned](https://arxiv.org/abs/2603.05344)
16
29
  - [VeRO: An Evaluation Harness for Agents to Optimize Agents](https://arxiv.org/abs/2602.22480)
17
30
  - [EvoClaw: Evaluating AI Agents on Continuous Software Evolution](https://arxiv.org/abs/2603.13428)
31
+ - [Towards Engineering Multi-Agent LLMs: A Protocol-Driven Approach](https://arxiv.org/abs/2510.12120)
32
+ - [Advancing Multi-Agent Systems Through Model Context Protocol: Architecture, Implementation, and Applications](https://arxiv.org/abs/2504.21030)
33
+ - [Enhancing Model Context Protocol (MCP) with Context-Aware Server Collaboration](https://arxiv.org/abs/2601.11595)
34
+ - [Why Do Multi-Agent LLM Systems Fail?](https://arxiv.org/abs/2503.13657)
35
+ - [Systematic Failures in Collective Reasoning under Distributed Information in Multi-Agent LLMs](https://arxiv.org/abs/2505.11556)
36
+ - [Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems](https://arxiv.org/abs/2603.01045)
37
+ - [DPBench: Large Language Models Struggle with Simultaneous Coordination](https://arxiv.org/abs/2602.13255)
38
+ - [Multi-Agent Teams Hold Experts Back](https://arxiv.org/abs/2602.01011)
39
+ - [A Survey on LLM-based Multi-agent Systems: Workflow, Infrastructure, and Challenges](https://link.springer.com/article/10.1007/s44336-024-00009-2)
40
+ - [LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead](https://arxiv.org/abs/2404.04834)
41
+ - [The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption](https://arxiv.org/abs/2601.13671)
42
+ - [Describing Agentic AI Systems with C4: Lessons from Industry Projects](https://arxiv.org/abs/2603.15021)
43
+ - [A Taxonomy of Hierarchical Multi-Agent Systems: Design Patterns, Coordination Mechanisms, and Industrial Applications](https://arxiv.org/abs/2508.12683)
44
+ - [Blackboard Systems, Part One: The Blackboard Model of Problem Solving and the Evolution of Blackboard Architectures](https://ojs.aaai.org/index.php/aimagazine/article/view/537)
45
+ - [A Blackboard Architecture for Control](https://www.sciencedirect.com/science/article/abs/pii/0004370285900633)
46
+ - [Incremental Planning to Control a Blackboard-Based Problem Solver](https://cdn.aaai.org/AAAI/1986/AAAI86-010.pdf)
47
+ - [Blackboard Systems](https://mas.cs.umass.edu/Documents/Corkill/ai-expert.pdf)
48
+
49
+ ## Harnesses, Context Engineering, and Long-Running Agents
50
+
51
+ - [Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned](https://arxiv.org/abs/2603.05344)
52
+ - [Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models](https://arxiv.org/abs/2510.04618)
53
+ - [ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization](https://arxiv.org/abs/2509.13313)
54
+ - [ARC: Active and Reflection-driven Context Management for Long-Horizon Information Seeking Agents](https://arxiv.org/abs/2601.12030)
+ - [Meta Context Engineering via Agentic Skill Evolution](https://arxiv.org/abs/2601.21557)
+ - [Context Engineering for AI Agents in Open-Source Software](https://arxiv.org/abs/2510.21413)
+ - [Context Engineering: From Prompts to Corporate Multi-Agent Architecture](https://arxiv.org/abs/2603.09619)
+ - [CEDAR: Context Engineering for Agentic Data Science](https://arxiv.org/abs/2601.06606)
+ - [VeRO: An Evaluation Harness for Agents to Optimize Agents](https://arxiv.org/abs/2602.22480)
+ - [EvoClaw: Evaluating AI Agents on Continuous Software Evolution](https://arxiv.org/abs/2603.13428)
+ - [Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers](https://arxiv.org/abs/2603.07670)
+
+ ## Skills and Procedural Memory
+
+ - [SoK: Agentic Skills -- Beyond Tool Use in LLM Agents](https://arxiv.org/abs/2602.20867)
+ - [Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward](https://arxiv.org/abs/2602.12430)
+ - [Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers](https://arxiv.org/abs/2603.07670)
+ - [Meta Context Engineering via Agentic Skill Evolution](https://arxiv.org/abs/2601.21557)
+ - [Voyager: An Open-Ended Embodied Agent with Large Language Models](https://arxiv.org/abs/2305.16291)
+ - [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/abs/2302.04761)
+ - [Large Language Models as Tool Makers](https://arxiv.org/abs/2305.17126)
+ - [Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control](https://arxiv.org/abs/2306.07863)
+ - [ExpeL: LLM Agents Are Experiential Learners](https://arxiv.org/abs/2308.10144)
+ - [Agent Workflow Memory](https://arxiv.org/abs/2409.07429)
+ - [SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills](https://arxiv.org/abs/2504.07079)
+ - [ReUseIt: Synthesizing Reusable AI Agent Workflows for Web Automation](https://arxiv.org/abs/2510.14308)
+ - [ProcMEM: Learning Reusable Procedural Memory from Experience via Non-Parametric PPO for LLM Agents](https://arxiv.org/abs/2602.01869)
+ - [Learning Hierarchical Procedural Memory for LLM Agents through Bayesian Selection and Contrastive Refinement](https://arxiv.org/abs/2512.18950)
+ - [Mem^p: Exploring Agent Procedural Memory](https://arxiv.org/abs/2508.06433)
+ - [MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents](https://arxiv.org/abs/2602.02474)
+ - [Reinforcement Learning for Self-Improving Agent with Skill Library](https://arxiv.org/abs/2512.17102)
+ - [Evolving Programmatic Skill Networks](https://arxiv.org/abs/2601.03509)
+ - [XSkill: Continual Learning from Experience and Skills in Multimodal Agents](https://arxiv.org/abs/2603.12056)
+ - [Memento-Skills: Let Agents Design Agents](https://arxiv.org/abs/2603.18743)
+ - [MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild](https://arxiv.org/abs/2603.17187)
+ - [SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks](https://arxiv.org/abs/2602.12670)
+
+ ## Agent Context Files and Configuration
+
+ - [Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?](https://arxiv.org/html/2602.11988v1)
+ - [Agent READMEs: An Empirical Study of Context Files for Agentic Coding](https://arxiv.org/abs/2511.12884)
+ - [Beyond the Prompt: An Empirical Study of Cursor Rules](https://arxiv.org/abs/2512.18925)
+ - [Decoding the Configuration of AI Coding Agents: Insights from Claude Code Projects](https://arxiv.org/abs/2511.09268)
+ - [On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents](https://arxiv.org/abs/2601.20404)
+ - [Interpretable Context Methodology: Folder Structure as Agentic Architecture](https://arxiv.org/abs/2603.16021)

  ## Blackboard and Shared Workspaces

  - [LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science](https://arxiv.org/abs/2510.01285)
  - [Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture](https://arxiv.org/abs/2507.01701)
+ - [Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies](https://arxiv.org/abs/2510.14312)
  - [DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation](https://arxiv.org/abs/2603.13327)
+ - [MACC: Multi-Agent Collaborative Competition for Scientific Exploration](https://arxiv.org/abs/2603.03780)
+ - [M3-ACE: Rectifying Visual Perception in Multimodal Math Reasoning via Multi-Agentic Context Engineering](https://arxiv.org/abs/2603.08369)
  - [SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly](https://arxiv.org/abs/2601.22623)
  - [Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems](https://arxiv.org/abs/2603.01045)
  - [An Open Agent Architecture](https://cdn.aaai.org/Symposia/Spring/1994/SS-94-03/SS94-03-001.pdf)
+ - [Blackboard Systems, Part One: The Blackboard Model of Problem Solving and the Evolution of Blackboard Architectures](https://ojs.aaai.org/index.php/aimagazine/article/view/537)
+ - [A Blackboard Architecture for Control](https://www.sciencedirect.com/science/article/abs/pii/0004370285900633)
+ - [Incremental Planning to Control a Blackboard-Based Problem Solver](https://cdn.aaai.org/AAAI/1986/AAAI86-010.pdf)
+ - [Blackboard Systems](https://mas.cs.umass.edu/Documents/Corkill/ai-expert.pdf)

- ## Repo Context and Evaluation
+ ## Multi-Agent Orchestration and Architecture

- - [Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?](https://arxiv.org/html/2602.11988v1)
- - [VeRO: An Evaluation Harness for Agents to Optimize Agents](https://arxiv.org/abs/2602.22480)
- - [EvoClaw: Evaluating AI Agents on Continuous Software Evolution](https://arxiv.org/abs/2603.13428)
- - [Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems](https://arxiv.org/abs/2603.01045)
+ - [The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption](https://arxiv.org/abs/2601.13671)
+ - [Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents](https://arxiv.org/abs/2601.12560)
+ - [Describing Agentic AI Systems with C4: Lessons from Industry Projects](https://arxiv.org/abs/2603.15021)
+ - [A Taxonomy of Hierarchical Multi-Agent Systems: Design Patterns, Coordination Mechanisms, and Industrial Applications](https://arxiv.org/abs/2508.12683)

- ## Adjacent Context and Memory
+ ## Security and Secure Code Generation
+
+ - [Prompting Techniques for Secure Code Generation: A Systematic Investigation](https://arxiv.org/abs/2407.07064)
+ - [Retrieve, Refine, or Both? Using Task-Specific Guidelines for Secure Python Code Generation](https://emaiannone.github.io/assets/pdf/c6.pdf)
+ - [Discrete Prompt Optimization Using Genetic Algorithm for Secure Python Code Generation](https://www.sciencedirect.com/science/article/pii/S0164121225003516)
+ - [SCGAgent: Recreating the Benefits of Reasoning Models for Secure Code Generation with Agentic Workflows](https://arxiv.org/abs/2506.07313)
+ - [RESCUE: Retrieval Augmented Secure Code Generation](https://arxiv.org/abs/2510.18204)
+ - [Improving LLM-Assisted Secure Code Generation through Retrieval-Augmented-Generation and Multi-Tool Feedback](https://arxiv.org/abs/2601.00509)
+ - [Autoregressive, Yet Revisable: In Decoding Revision for Secure Code Generation](https://arxiv.org/abs/2602.01187)
+ - [Security-by-Design for LLM-Based Code Generation: Leveraging Internal Representations for Concept-Driven Steering Mechanisms](https://arxiv.org/abs/2603.11212)
+ - [Persistent Human Feedback, LLMs, and Static Analyzers for Secure Code Generation and Vulnerability Detection](https://arxiv.org/abs/2602.05868)
+
+ ## Security Benchmarks and Evaluation
+
+ - [SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories](https://arxiv.org/abs/2504.21205)
+ - [SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios](https://arxiv.org/abs/2509.22097)
+ - [SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI](https://arxiv.org/abs/2410.11096)
+ - [TOSSS: a CVE-based Software Security Benchmark for Large Language Models](https://arxiv.org/abs/2603.10969)
+ - [From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security](https://arxiv.org/abs/2412.15004)
+ - [Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics](https://arxiv.org/abs/2511.10271)
+
+ ## Multi-Agent Security
+
+ - [Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies](https://arxiv.org/abs/2510.14312)
+ - [Security Considerations for Multi-agent Systems](https://arxiv.org/abs/2603.09002)
+
+ ## Skill Practice and Open Standards
+
+ - [Agent Skills - Codex](https://developers.openai.com/codex/skills/)
+ - [Customization - Codex](https://developers.openai.com/codex/concepts/customization/)
+ - [Testing Agent Skills Systematically with Evals](https://developers.openai.com/blog/eval-skills/)
+ - [Shell + Skills + Compaction: Tips for long-running agents that do real work](https://developers.openai.com/blog/skills-shell-tips/)
+ - [Using skills to accelerate OSS maintenance](https://developers.openai.com/blog/skills-agents-sdk/)
+ - [Equipping agents for the real world with Agent Skills](https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills)
+ - [Best practices for skill creators](https://agentskills.io/skill-creation/best-practices)
+ - [Using scripts in skills](https://agentskills.io/skill-creation/using-scripts)
+
+ ## Adjacent Memory and Prompting Research

- - [Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers](https://arxiv.org/abs/2603.07670)
  - [Effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
  - [Run long horizon tasks with Codex](https://developers.openai.com/blog/run-long-horizon-tasks-with-codex/)
  - [Prompt guidance for GPT-5](https://developers.openai.com/api/docs/guides/prompt-guidance/)
  - [Codex Prompting Guide](https://developers.openai.com/cookbook/examples/gpt-5/codex_prompting_guide/)
  - [Writing effective tools for agents](https://www.anthropic.com/engineering/writing-tools-for-agents)
  - [Building effective agents](https://www.anthropic.com/engineering/building-effective-agents)
- - [Context Engineering for AI Agents in Open-Source Software](https://arxiv.org/abs/2510.21413)
  - [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
  - [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/abs/2302.04761)
  - [Plan-and-Solve Prompting](https://arxiv.org/abs/2305.04091)
  - [Augmenting Language Models with Long-Term Memory](https://arxiv.org/abs/2306.07174)
  - [MemGPT: Towards LLMs as Operating Systems](https://arxiv.org/abs/2310.08560)
  - [Lost in the Middle: How Language Models Use Long Contexts](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00638/119630/Lost-in-the-Middle-How-Language-Models-Use-Long)
- - [ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization](https://arxiv.org/abs/2509.13313)

  ## Notes