npm - macro-agent - Versions diffs - 0.0.11 → 0.0.13 - Mend

macro-agent 0.0.11 → 0.0.13

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (408)

package/.macro-agent/teams/self-driving/prompts/grinder.md +27 -0
package/.macro-agent/teams/self-driving/prompts/judge.md +27 -0
package/.macro-agent/teams/self-driving/prompts/planner.md +33 -0
package/.macro-agent/teams/self-driving/roles/grinder.yaml +17 -0
package/.macro-agent/teams/self-driving/roles/judge.yaml +24 -0
package/.macro-agent/teams/self-driving/roles/planner.yaml +18 -0
package/.macro-agent/teams/self-driving/team.yaml +103 -0
package/.macro-agent/teams/structured/prompts/developer.md +26 -0
package/.macro-agent/teams/structured/prompts/lead.md +25 -0
package/.macro-agent/teams/structured/prompts/reviewer.md +24 -0
package/.macro-agent/teams/structured/roles/developer.yaml +12 -0
package/.macro-agent/teams/structured/roles/lead.yaml +11 -0
package/.macro-agent/teams/structured/roles/reviewer.yaml +19 -0
package/.macro-agent/teams/structured/team.yaml +89 -0
package/.sudocode/issues.jsonl +6 -0
package/.sudocode/specs.jsonl +7 -0
package/CLAUDE.md +110 -30
package/README.md +60 -3
package/dist/acp/macro-agent.d.ts +4 -0
package/dist/acp/macro-agent.d.ts.map +1 -1
package/dist/acp/macro-agent.js +50 -4
package/dist/acp/macro-agent.js.map +1 -1
package/dist/acp/session-mapper.d.ts +20 -1
package/dist/acp/session-mapper.d.ts.map +1 -1
package/dist/acp/session-mapper.js +90 -1
package/dist/acp/session-mapper.js.map +1 -1
package/dist/acp/types.d.ts +24 -1
package/dist/acp/types.d.ts.map +1 -1
package/dist/acp/types.js.map +1 -1
package/dist/agent/agent-manager.d.ts +25 -1
package/dist/agent/agent-manager.d.ts.map +1 -1
package/dist/agent/agent-manager.js +93 -7
package/dist/agent/agent-manager.js.map +1 -1
package/dist/agent/types.d.ts +22 -0
package/dist/agent/types.d.ts.map +1 -1
package/dist/agent/types.js.map +1 -1
package/dist/agent-detection/command-builder.d.ts +30 -0
package/dist/agent-detection/command-builder.d.ts.map +1 -0
package/dist/agent-detection/command-builder.js +71 -0
package/dist/agent-detection/command-builder.js.map +1 -0
package/dist/agent-detection/detector.d.ts +84 -0
package/dist/agent-detection/detector.d.ts.map +1 -0
package/dist/agent-detection/detector.js +240 -0
package/dist/agent-detection/detector.js.map +1 -0
package/dist/agent-detection/index.d.ts +12 -0
package/dist/agent-detection/index.d.ts.map +1 -0
package/dist/agent-detection/index.js +14 -0
package/dist/agent-detection/index.js.map +1 -0
package/dist/agent-detection/registry.d.ts +53 -0
package/dist/agent-detection/registry.d.ts.map +1 -0
package/dist/agent-detection/registry.js +177 -0
package/dist/agent-detection/registry.js.map +1 -0
package/dist/agent-detection/types.d.ts +121 -0
package/dist/agent-detection/types.d.ts.map +1 -0
package/dist/agent-detection/types.js +20 -0
package/dist/agent-detection/types.js.map +1 -0
package/dist/api/server.d.ts.map +1 -1
package/dist/api/server.js +95 -0
package/dist/api/server.js.map +1 -1
package/dist/cli/index.js +29 -0
package/dist/cli/index.js.map +1 -1
package/dist/cli/mcp.js +38 -0
package/dist/cli/mcp.js.map +1 -1
package/dist/config/index.d.ts +2 -0
package/dist/config/index.d.ts.map +1 -0
package/dist/config/index.js +2 -0
package/dist/config/index.js.map +1 -0
package/dist/config/project-config.d.ts +46 -0
package/dist/config/project-config.d.ts.map +1 -0
package/dist/config/project-config.js +68 -0
package/dist/config/project-config.js.map +1 -0
package/dist/lifecycle/cascade.d.ts +1 -1
package/dist/lifecycle/cascade.d.ts.map +1 -1
package/dist/lifecycle/handlers/index.d.ts +4 -0
package/dist/lifecycle/handlers/index.d.ts.map +1 -1
package/dist/lifecycle/handlers/index.js +2 -0
package/dist/lifecycle/handlers/index.js.map +1 -1
package/dist/lifecycle/handlers/worker.d.ts +4 -0
package/dist/lifecycle/handlers/worker.d.ts.map +1 -1
package/dist/lifecycle/handlers/worker.js +35 -3
package/dist/lifecycle/handlers/worker.js.map +1 -1
package/dist/map/adapter/acp-over-map.d.ts.map +1 -1
package/dist/map/adapter/acp-over-map.js +32 -2
package/dist/map/adapter/acp-over-map.js.map +1 -1
package/dist/map/adapter/event-translator.d.ts.map +1 -1
package/dist/map/adapter/event-translator.js +1 -0
package/dist/map/adapter/event-translator.js.map +1 -1
package/dist/map/adapter/extensions/agent-detection.d.ts +49 -0
package/dist/map/adapter/extensions/agent-detection.d.ts.map +1 -0
package/dist/map/adapter/extensions/agent-detection.js +91 -0
package/dist/map/adapter/extensions/agent-detection.js.map +1 -0
package/dist/map/adapter/extensions/index.d.ts +10 -1
package/dist/map/adapter/extensions/index.d.ts.map +1 -1
package/dist/map/adapter/extensions/index.js +39 -0
package/dist/map/adapter/extensions/index.js.map +1 -1
package/dist/map/adapter/extensions/resume.d.ts +47 -0
package/dist/map/adapter/extensions/resume.d.ts.map +1 -0
package/dist/map/adapter/extensions/resume.js +59 -0
package/dist/map/adapter/extensions/resume.js.map +1 -0
package/dist/map/adapter/extensions/workspace-files.d.ts +42 -0
package/dist/map/adapter/extensions/workspace-files.d.ts.map +1 -0
package/dist/map/adapter/extensions/workspace-files.js +338 -0
package/dist/map/adapter/extensions/workspace-files.js.map +1 -0
package/dist/mcp/mcp-server.d.ts +6 -0
package/dist/mcp/mcp-server.d.ts.map +1 -1
package/dist/mcp/mcp-server.js +45 -0
package/dist/mcp/mcp-server.js.map +1 -1
package/dist/mcp/tools/claim_task.d.ts +35 -0
package/dist/mcp/tools/claim_task.d.ts.map +1 -0
package/dist/mcp/tools/claim_task.js +58 -0
package/dist/mcp/tools/claim_task.js.map +1 -0
package/dist/mcp/tools/done.d.ts +11 -2
package/dist/mcp/tools/done.d.ts.map +1 -1
package/dist/mcp/tools/done.js +15 -10
package/dist/mcp/tools/done.js.map +1 -1
package/dist/mcp/tools/list_claimable_tasks.d.ts +38 -0
package/dist/mcp/tools/list_claimable_tasks.d.ts.map +1 -0
package/dist/mcp/tools/list_claimable_tasks.js +63 -0
package/dist/mcp/tools/list_claimable_tasks.js.map +1 -0
package/dist/mcp/tools/unclaim_task.d.ts +31 -0
package/dist/mcp/tools/unclaim_task.d.ts.map +1 -0
package/dist/mcp/tools/unclaim_task.js +47 -0
package/dist/mcp/tools/unclaim_task.js.map +1 -0
package/dist/metrics/index.d.ts +2 -0
package/dist/metrics/index.d.ts.map +1 -0
package/dist/metrics/index.js +2 -0
package/dist/metrics/index.js.map +1 -0
package/dist/metrics/metrics.d.ts +79 -0
package/dist/metrics/metrics.d.ts.map +1 -0
package/dist/metrics/metrics.js +166 -0
package/dist/metrics/metrics.js.map +1 -0
package/dist/roles/capabilities.d.ts +1 -0
package/dist/roles/capabilities.d.ts.map +1 -1
package/dist/roles/capabilities.js +3 -0
package/dist/roles/capabilities.js.map +1 -1
package/dist/roles/types.d.ts +1 -1
package/dist/roles/types.d.ts.map +1 -1
package/dist/router/message-router.d.ts +41 -0
package/dist/router/message-router.d.ts.map +1 -1
package/dist/router/message-router.js +136 -5
package/dist/router/message-router.js.map +1 -1
package/dist/store/event-store.d.ts +8 -1
package/dist/store/event-store.d.ts.map +1 -1
package/dist/store/event-store.js +120 -4
package/dist/store/event-store.js.map +1 -1
package/dist/store/types/agents.d.ts +1 -1
package/dist/store/types/agents.d.ts.map +1 -1
package/dist/store/types/events.d.ts +1 -1
package/dist/store/types/events.d.ts.map +1 -1
package/dist/store/types/events.js.map +1 -1
package/dist/store/types/index.d.ts +1 -0
package/dist/store/types/index.d.ts.map +1 -1
package/dist/store/types/index.js +1 -0
package/dist/store/types/index.js.map +1 -1
package/dist/store/types/sessions.d.ts +44 -0
package/dist/store/types/sessions.d.ts.map +1 -0
package/dist/store/types/sessions.js +9 -0
package/dist/store/types/sessions.js.map +1 -0
package/dist/store/types/tasks.d.ts +2 -0
package/dist/store/types/tasks.d.ts.map +1 -1
package/dist/task/backend/memory.d.ts +4 -1
package/dist/task/backend/memory.d.ts.map +1 -1
package/dist/task/backend/memory.js +81 -0
package/dist/task/backend/memory.js.map +1 -1
package/dist/task/backend/types.d.ts +30 -0
package/dist/task/backend/types.d.ts.map +1 -1
package/dist/task/backend/types.js.map +1 -1
package/dist/teams/index.d.ts +4 -0
package/dist/teams/index.d.ts.map +1 -0
package/dist/teams/index.js +4 -0
package/dist/teams/index.js.map +1 -0
package/dist/teams/team-loader.d.ts +20 -0
package/dist/teams/team-loader.d.ts.map +1 -0
package/dist/teams/team-loader.js +293 -0
package/dist/teams/team-loader.js.map +1 -0
package/dist/teams/team-runtime.d.ts +139 -0
package/dist/teams/team-runtime.d.ts.map +1 -0
package/dist/teams/team-runtime.js +613 -0
package/dist/teams/team-runtime.js.map +1 -0
package/dist/teams/types.d.ts +266 -0
package/dist/teams/types.d.ts.map +1 -0
package/dist/teams/types.js +20 -0
package/dist/teams/types.js.map +1 -0
package/dist/workspace/dataplane-adapter.d.ts +1 -1
package/dist/workspace/dataplane-adapter.d.ts.map +1 -1
package/dist/workspace/dataplane-adapter.js +1 -1
package/dist/workspace/dataplane-adapter.js.map +1 -1
package/dist/workspace/index.d.ts +1 -1
package/dist/workspace/index.d.ts.map +1 -1
package/dist/workspace/strategies/index.d.ts +6 -0
package/dist/workspace/strategies/index.d.ts.map +1 -0
package/dist/workspace/strategies/index.js +5 -0
package/dist/workspace/strategies/index.js.map +1 -0
package/dist/workspace/strategies/optimistic.d.ts +26 -0
package/dist/workspace/strategies/optimistic.d.ts.map +1 -0
package/dist/workspace/strategies/optimistic.js +121 -0
package/dist/workspace/strategies/optimistic.js.map +1 -0
package/dist/workspace/strategies/queue.d.ts +26 -0
package/dist/workspace/strategies/queue.d.ts.map +1 -0
package/dist/workspace/strategies/queue.js +67 -0
package/dist/workspace/strategies/queue.js.map +1 -0
package/dist/workspace/strategies/registry.d.ts +37 -0
package/dist/workspace/strategies/registry.d.ts.map +1 -0
package/dist/workspace/strategies/registry.js +63 -0
package/dist/workspace/strategies/registry.js.map +1 -0
package/dist/workspace/strategies/trunk.d.ts +20 -0
package/dist/workspace/strategies/trunk.d.ts.map +1 -0
package/dist/workspace/strategies/trunk.js +108 -0
package/dist/workspace/strategies/trunk.js.map +1 -0
package/dist/workspace/strategies/types.d.ts +104 -0
package/dist/workspace/strategies/types.d.ts.map +1 -0
package/dist/workspace/strategies/types.js +11 -0
package/dist/workspace/strategies/types.js.map +1 -0
package/dist/workspace/types.d.ts +1 -1
package/dist/workspace/types.d.ts.map +1 -1
package/dist/workspace/workspace-manager.d.ts +1 -1
package/dist/workspace/workspace-manager.d.ts.map +1 -1
package/docs/implementation-details.md +1127 -0
package/docs/implementation-summary.md +448 -0
package/docs/plan-self-driving-support.md +433 -0
package/docs/spec-self-driving-support.md +462 -0
package/docs/team-templates.md +860 -0
package/docs/teams.md +233 -0
package/package.json +5 -3
package/src/acp/__tests__/integration.test.ts +161 -1
package/src/acp/__tests__/macro-agent.test.ts +95 -0
package/src/acp/__tests__/session-persistence.test.ts +276 -0
package/src/acp/macro-agent.ts +79 -7
package/src/acp/session-mapper.ts +108 -1
package/src/acp/types.ts +33 -1
package/src/agent/agent-manager.ts +158 -6
package/src/agent/types.ts +27 -0
package/src/agent-detection/__tests__/command-builder.test.ts +336 -0
package/src/agent-detection/__tests__/detector.test.ts +768 -0
package/src/agent-detection/__tests__/registry.test.ts +254 -0
package/src/agent-detection/command-builder.ts +90 -0
package/src/agent-detection/detector.ts +307 -0
package/src/agent-detection/index.ts +36 -0
package/src/agent-detection/registry.ts +200 -0
package/src/agent-detection/types.ts +184 -0
package/src/api/server.ts +110 -0
package/src/cli/index.ts +44 -0
package/src/cli/mcp.ts +47 -0
package/src/config/index.ts +9 -0
package/src/config/project-config.ts +107 -0
package/src/lifecycle/cascade.ts +1 -1
package/src/lifecycle/handlers/index.ts +8 -0
package/src/lifecycle/handlers/worker.ts +48 -3
package/src/map/adapter/__tests__/extensions.test.ts +359 -0
package/src/map/adapter/__tests__/workspace-files.test.ts +673 -0
package/src/map/adapter/acp-over-map.ts +45 -2
package/src/map/adapter/event-translator.ts +1 -0
package/src/map/adapter/extensions/agent-detection.ts +201 -0
package/src/map/adapter/extensions/index.ts +63 -0
package/src/map/adapter/extensions/resume.ts +114 -0
package/src/map/adapter/extensions/workspace-files.ts +449 -0
package/src/mcp/mcp-server.ts +67 -0
package/src/mcp/tools/claim_task.ts +86 -0
package/src/mcp/tools/done.ts +24 -10
package/src/mcp/tools/list_claimable_tasks.ts +93 -0
package/src/mcp/tools/unclaim_task.ts +71 -0
package/src/metrics/index.ts +9 -0
package/src/metrics/metrics.ts +280 -0
package/src/roles/capabilities.ts +3 -0
package/src/roles/types.ts +2 -1
package/src/router/__tests__/message-router.test.ts +561 -0
package/src/router/message-router.ts +223 -6
package/src/store/event-store.ts +151 -3
package/src/store/types/agents.ts +1 -1
package/src/store/types/events.ts +2 -1
package/src/store/types/index.ts +1 -0
package/src/store/types/sessions.ts +53 -0
package/src/store/types/tasks.ts +3 -0
package/src/task/backend/memory.ts +116 -0
package/src/task/backend/types.ts +43 -0
package/src/teams/__tests__/cross-subsystem.integration.test.ts +983 -0
package/src/teams/__tests__/e2e/team-runtime.e2e.test.ts +553 -0
package/src/teams/__tests__/team-system.test.ts +1280 -0
package/src/teams/index.ts +13 -0
package/src/teams/team-loader.ts +434 -0
package/src/teams/team-runtime.ts +727 -0
package/src/teams/types.ts +377 -0
package/src/workspace/dataplane-adapter.ts +1 -1
package/src/workspace/index.ts +1 -1
package/src/workspace/strategies/index.ts +18 -0
package/src/workspace/strategies/optimistic.ts +136 -0
package/src/workspace/strategies/queue.ts +81 -0
package/src/workspace/strategies/registry.ts +89 -0
package/src/workspace/strategies/trunk.ts +123 -0
package/src/workspace/strategies/types.ts +145 -0
package/src/workspace/types.ts +1 -1
package/src/workspace/workspace-manager.ts +1 -1
package/.claude/settings.local.json +0 -59
package/dist/map/utils/address-translation.d.ts +0 -99
package/dist/map/utils/address-translation.d.ts.map +0 -1
package/dist/map/utils/address-translation.js +0 -285
package/dist/map/utils/address-translation.js.map +0 -1
package/dist/map/utils/index.d.ts +0 -7
package/dist/map/utils/index.d.ts.map +0 -1
package/dist/map/utils/index.js +0 -7
package/dist/map/utils/index.js.map +0 -1
package/references/acp-factory-ref/CHANGELOG.md +0 -33
package/references/acp-factory-ref/LICENSE +0 -21
package/references/acp-factory-ref/README.md +0 -341
package/references/acp-factory-ref/package-lock.json +0 -3102
package/references/acp-factory-ref/package.json +0 -96
package/references/acp-factory-ref/python/CHANGELOG.md +0 -33
package/references/acp-factory-ref/python/LICENSE +0 -21
package/references/acp-factory-ref/python/Makefile +0 -57
package/references/acp-factory-ref/python/README.md +0 -253
package/references/acp-factory-ref/python/pyproject.toml +0 -73
package/references/acp-factory-ref/python/tests/__init__.py +0 -0
package/references/acp-factory-ref/python/tests/e2e/__init__.py +0 -1
package/references/acp-factory-ref/python/tests/e2e/test_codex_e2e.py +0 -349
package/references/acp-factory-ref/python/tests/e2e/test_gemini_e2e.py +0 -165
package/references/acp-factory-ref/python/tests/e2e/test_opencode_e2e.py +0 -296
package/references/acp-factory-ref/python/tests/test_client_handler.py +0 -543
package/references/acp-factory-ref/python/tests/test_pushable.py +0 -199
package/references/claude-code-acp/.github/workflows/ci.yml +0 -45
package/references/claude-code-acp/.github/workflows/publish.yml +0 -34
package/references/claude-code-acp/.prettierrc.json +0 -4
package/references/claude-code-acp/CHANGELOG.md +0 -249
package/references/claude-code-acp/LICENSE +0 -222
package/references/claude-code-acp/README.md +0 -53
package/references/claude-code-acp/docs/RELEASES.md +0 -24
package/references/claude-code-acp/eslint.config.js +0 -48
package/references/claude-code-acp/package-lock.json +0 -4570
package/references/claude-code-acp/package.json +0 -88
package/references/claude-code-acp/scripts/release.sh +0 -119
package/references/claude-code-acp/src/acp-agent.ts +0 -2065
package/references/claude-code-acp/src/index.ts +0 -26
package/references/claude-code-acp/src/lib.ts +0 -38
package/references/claude-code-acp/src/mcp-server.ts +0 -911
package/references/claude-code-acp/src/settings.ts +0 -522
package/references/claude-code-acp/src/tests/.claude/commands/quick-math.md +0 -5
package/references/claude-code-acp/src/tests/.claude/commands/say-hello.md +0 -6
package/references/claude-code-acp/src/tests/acp-agent-fork.test.ts +0 -479
package/references/claude-code-acp/src/tests/acp-agent.test.ts +0 -1502
package/references/claude-code-acp/src/tests/extract-lines.test.ts +0 -103
package/references/claude-code-acp/src/tests/fork-session.test.ts +0 -335
package/references/claude-code-acp/src/tests/replace-and-calculate-location.test.ts +0 -334
package/references/claude-code-acp/src/tests/settings.test.ts +0 -617
package/references/claude-code-acp/src/tests/skills-options.test.ts +0 -187
package/references/claude-code-acp/src/tests/tools.test.ts +0 -318
package/references/claude-code-acp/src/tests/typescript-declarations.test.ts +0 -558
package/references/claude-code-acp/src/tools.ts +0 -819
package/references/claude-code-acp/src/utils.ts +0 -171
package/references/claude-code-acp/tsconfig.json +0 -18
package/references/claude-code-acp/vitest.config.ts +0 -19
package/references/multi-agent-protocol/.sudocode/issues.jsonl +0 -111
package/references/multi-agent-protocol/.sudocode/specs.jsonl +0 -13
package/references/multi-agent-protocol/LICENSE +0 -21
package/references/multi-agent-protocol/README.md +0 -113
package/references/multi-agent-protocol/docs/00-design-specification.md +0 -496
package/references/multi-agent-protocol/docs/01-open-questions.md +0 -1050
package/references/multi-agent-protocol/docs/02-wire-protocol.md +0 -296
package/references/multi-agent-protocol/docs/03-streaming-semantics.md +0 -252
package/references/multi-agent-protocol/docs/04-error-handling.md +0 -231
package/references/multi-agent-protocol/docs/05-connection-model.md +0 -244
package/references/multi-agent-protocol/docs/06-visibility-permissions.md +0 -243
package/references/multi-agent-protocol/docs/07-federation.md +0 -259
package/references/multi-agent-protocol/docs/08-macro-agent-migration.md +0 -253
package/references/multi-agent-protocol/docs/09-authentication.md +0 -680
package/references/multi-agent-protocol/docs/10-mail-protocol.md +0 -553
package/references/multi-agent-protocol/docs/agent-iam-integration.md +0 -877
package/references/multi-agent-protocol/docs/agentic-mesh-integration-draft.md +0 -459
package/references/multi-agent-protocol/docs/git-transport-draft.md +0 -251
package/references/multi-agent-protocol/docs-site/Gemfile +0 -22
package/references/multi-agent-protocol/docs-site/README.md +0 -82
package/references/multi-agent-protocol/docs-site/_config.yml +0 -91
package/references/multi-agent-protocol/docs-site/_includes/head_custom.html +0 -20
package/references/multi-agent-protocol/docs-site/_sass/color_schemes/map.scss +0 -42
package/references/multi-agent-protocol/docs-site/_sass/custom/custom.scss +0 -34
package/references/multi-agent-protocol/docs-site/examples/full-integration.md +0 -510
package/references/multi-agent-protocol/docs-site/examples/index.md +0 -138
package/references/multi-agent-protocol/docs-site/examples/simple-chat.md +0 -282
package/references/multi-agent-protocol/docs-site/examples/task-queue.md +0 -399
package/references/multi-agent-protocol/docs-site/getting-started/index.md +0 -98
package/references/multi-agent-protocol/docs-site/getting-started/installation.md +0 -219
package/references/multi-agent-protocol/docs-site/getting-started/overview.md +0 -172
package/references/multi-agent-protocol/docs-site/getting-started/quickstart.md +0 -237
package/references/multi-agent-protocol/docs-site/index.md +0 -136
package/references/multi-agent-protocol/docs-site/protocol/authentication.md +0 -391
package/references/multi-agent-protocol/docs-site/protocol/connection-model.md +0 -376
package/references/multi-agent-protocol/docs-site/protocol/design.md +0 -284
package/references/multi-agent-protocol/docs-site/protocol/error-handling.md +0 -312
package/references/multi-agent-protocol/docs-site/protocol/federation.md +0 -449
package/references/multi-agent-protocol/docs-site/protocol/index.md +0 -129
package/references/multi-agent-protocol/docs-site/protocol/permissions.md +0 -398
package/references/multi-agent-protocol/docs-site/protocol/streaming.md +0 -353
package/references/multi-agent-protocol/docs-site/protocol/wire-protocol.md +0 -369
package/references/multi-agent-protocol/docs-site/sdk/api/agent.md +0 -357
package/references/multi-agent-protocol/docs-site/sdk/api/client.md +0 -380
package/references/multi-agent-protocol/docs-site/sdk/api/index.md +0 -62
package/references/multi-agent-protocol/docs-site/sdk/api/server.md +0 -453
package/references/multi-agent-protocol/docs-site/sdk/api/types.md +0 -468
package/references/multi-agent-protocol/docs-site/sdk/guides/agent.md +0 -375
package/references/multi-agent-protocol/docs-site/sdk/guides/authentication.md +0 -405
package/references/multi-agent-protocol/docs-site/sdk/guides/client.md +0 -352
package/references/multi-agent-protocol/docs-site/sdk/guides/index.md +0 -89
package/references/multi-agent-protocol/docs-site/sdk/guides/server.md +0 -360
package/references/multi-agent-protocol/docs-site/sdk/guides/testing.md +0 -446
package/references/multi-agent-protocol/docs-site/sdk/guides/transports.md +0 -363
package/references/multi-agent-protocol/docs-site/sdk/index.md +0 -206
package/references/multi-agent-protocol/package-lock.json +0 -3886
package/references/multi-agent-protocol/package.json +0 -56
package/references/multi-agent-protocol/schema/meta.json +0 -467
package/references/multi-agent-protocol/schema/schema.json +0 -2558

package/.sudocode/specs.jsonl CHANGED

@@ -39,3 +39,10 @@
 {"id":"s-5yhx","uuid":"2109c48c-feda-4604-a43b-9e27065d3cc5","title":"Multi-Agent Orchestration: Operational & Extensibility Features","file_path":"specs/s-5yhx_multi_agent_orchestration_operational_extensibilit.md","content":"# Multi-Agent Orchestration: Operational & Extensibility Features\n\n## Overview\n\nThis spec defines the second wave of implementation following [[s-7t8b]] (Multi-Agent Orchestration Implementation Plan). While s-7t8b delivers the core orchestration infrastructure (Phases 0-9), this spec covers operational features needed for production-readiness and extensibility features for customization.\n\n**Prerequisite:** All phases of [[s-7t8b]] complete.\n\n**Last Updated:** Post-Phase 7/8 exploration - refined scope based on codebase analysis.\n\n---\n\n## Implementation Status Summary\n\nBased on codebase exploration, here's what's already implemented vs still needed:\n\n| Feature | Status | Notes |\n|---------|--------|-------|\n| Role override/extension logic | ✅ Done | Registry supports merge/replace, extends |\n| Tool filtering by capability | ✅ Done | CAPABILITY_TOOL_MAP in capabilities.ts |\n| Task tool provider wiring | ✅ Done | MCP server integrates TaskToolProvider |\n| Execution tracking config | ✅ Done | ExecutionTrackingConfig in sudocode backend |\n| YAML role config loading | ❌ Missing | Registry uses runtime registration only |\n| Role validation & fallbacks | ⚠️ Partial | Basic validation, needs graceful fallbacks |\n| GUPP violation detection | ❌ Missing | Not implemented |\n| Health check timers | ✅ Done | `HealthCheckService` in `src/monitor/` |\n| Stalled/stuck detection | ✅ Done | `StallDetector` in `src/monitor/` |\n| Retry policy framework | ✅ Done | `RetryPolicy` types and helpers in `src/task/retry-policy.ts` |\n| Activity tracking | ✅ Done | `last_activity_at` on Agent type |\n| Session processing state | ✅ Done | `isProcessing` in SessionMapping |\n| **Phase B Integration** | ❌ Missing | Infrastructure exists but not 
wired into coordinator |\n| Lease renewal/expiration | ❌ Missing | No lease management |\n| Handoff protocol | ❌ Missing | Not implemented |\n| Channel history API | ❌ Missing | No message history retrieval |\n| Cross-batch dependencies | ❌ Missing | Single-batch only |\n\n---\n\n## Revised Scope\n\n### Removed from Scope (Already Implemented)\n\nThe following were originally planned but are now implemented:\n\n1. **Phase A3: Tool Filtering by Capability** - `CAPABILITY_TOOL_MAP` exists in `src/roles/capabilities.ts:132-171`, `filterToolsForRole()` in registry\n2. **Phase D: Execution Binding** - `ExecutionTrackingConfig` implemented in sudocode backend with modes (none, bound-only, all, filter)\n3. **Phase F: Tool Provider Wiring** - MCP server checks `taskToolProvider?.getExcludedTools()` and registers tools dynamically\n\n### Remaining Scope\n\n| Phase | Scope | Priority | Status |\n|-------|-------|----------|--------|\n| **A** | YAML role config loading, graceful fallbacks | Medium | Not started |\n| **B** | Monitor active behaviors (GUPP, health checks, stuck detection, retry) | **High** | ✅ Infrastructure done, ⚠️ Integration pending |\n| **C** | Cross-batch coordinator dependencies | Low | Not started |\n| **E** | Session management (handoff, lease, history) | Medium | Not started |\n\n> **Note on Phase B:** The monitoring infrastructure (B0-B4) is fully implemented with 72+ tests passing. However, this infrastructure is not yet wired into the coordinator lifecycle. 
See \"Phase B Integration (Future Work)\" section below for details on what's needed to activate monitoring.\n\n---\n\n## Phase A: Role Configuration Files (Refined)\n\n**What exists:** RoleRegistry with runtime registration, override/extend logic, validation\n**What's missing:** File-based configuration loading\n\n### A1: YAML Config File Loading\n\nEnable users to define roles via configuration files.\n\n**Files to create:**\n- `src/roles/config-loader.ts` - YAML loading and parsing\n\n```typescript\ninterface RoleConfigLoader {\n  // Load roles from standard locations\n  loadProjectRoles(projectPath: string): RoleConfig[];  // .macro-agent/roles/\n  loadUserRoles(): RoleConfig[];                        // ~/.macro-agent/roles/\n  \n  // Watch for changes (optional hot reload)\n  watch(callback: (roles: RoleConfig[]) => void): Unsubscribe;\n}\n\n// Configuration file format\n// .macro-agent/roles/roles.yaml\nroles:\n  worker:\n    override: merge\n    lifecycle:\n      maxDurationMs: 3600000\n\n  security-auditor:\n    extends: worker\n    capabilities:\n      - file.read\n      - exec.command\n```\n\n**Integration:** Wire into RoleRegistry initialization to load from files before runtime registration.\n\n### A2: Graceful Fallback Enhancement\n\n**What exists:** Basic validation in registry\n**What's missing:** Structured fallback chain with logging\n\n```typescript\nfunction loadRoleWithFallback(name: string): RoleDefinition {\n  try {\n    const role = registry.resolveRole(name);\n    const validation = validateRole(role);\n    \n    if (!validation.isValid) {\n      logger.error(`Invalid role '${name}':`, validation.errors);\n      return getFallbackRole(name);\n    }\n    \n    if (validation.warnings.length > 0) {\n      logger.warn(`Role '${name}' warnings:`, validation.warnings);\n    }\n    \n    return role;\n  } catch (error) {\n    logger.error(`Failed to load role '${name}':`, error);\n    return getFallbackRole(name);\n  }\n}\n\n// Fallback 
hierarchy:\n// 1. Invalid custom role → built-in with same name\n// 2. Invalid built-in → generic role\n// 3. Invalid extends target → generic with warning\n```\n\n---\n\n## Phase B: Monitor Active Behaviors (Critical)\n\n**What exists:** MonitorRole definition, ActivityWatcher, event-driven architecture\n**What's missing:** Actual monitoring logic implementation\n\nThis is the highest priority phase - without it, stuck workers won't be detected.\n\n---\n\n### Design Decisions (Resolved)\n\n| Question | Decision | Rationale |\n|----------|----------|-----------|\n| **Activity tracking** | Add `lastActivityAt` to Agent type | Consistent, queryable, updated on agent events |\n| **Timer architecture** | Per-coordinator timer (service) | Starts when coordinator spawns, runs until issues, then activates monitor agent |\n| **Session liveness** | Use ACP-factory session status | `session.isProcessing` via SessionMapper |\n| **Terminology** | Use \"stalled\" only; detect & remove zombies | Simpler mental model |\n| **Remediation** | Escalate to coordinator | Coordinator decides re-spawn strategy |\n| **Retry orchestration** | Coordinator handles | Centralized decision making |\n\n---\n\n### Architecture Overview\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│                     Coordinator Agent                            │\n│  - Receives STALLED_AGENT signals                               │\n│  - Decides: restart worker, reassign task, or escalate          │\n│  - Handles retry orchestration                                   │\n└─────────────────────────────────────────────────────────────────┘\n                              ▲\n                              │ STALLED_AGENT signal\n                              │\n┌─────────────────────────────────────────────────────────────────┐\n│                  HealthCheckService (TypeScript)                 │\n│  - Per-coordinator timer, started on coordinator spawn          │\n│  - Runs periodically 
until issues detected                       │\n│  - Checks: lastActivityAt, ACP session status                   │\n│  - Emits: STALLED_AGENT, HEALTH_CHECK signals                   │\n│  - Detects & removes zombies automatically                       │\n└─────────────────────────────────────────────────────────────────┘\n                              │\n                              │ queries\n                              ▼\n┌─────────────────────────────────────────────────────────────────┐\n│  Agent (with lastActivityAt)     │    SessionMapper              │\n│  - Updated on any agent event    │    - Maps agentId → ACP session│\n│  - Queryable via EventStore      │    - Provides isProcessing    │\n└─────────────────────────────────────────────────────────────────┘\n```\n\n---\n\n### B0: Add `lastActivityAt` to Agent Type\n\n**Schema change required.** Update Agent type to track activity.\n\n**Files to modify:**\n- `src/store/types/agents.ts` - Add `lastActivityAt` field\n- `src/store/event-store.ts` - Update on agent events\n\n```typescript\n// src/store/types/agents.ts\nexport interface Agent {\n  // ... 
existing fields ...\n\n  /** Last time this agent emitted an event (updated automatically) */\n  lastActivityAt?: Timestamp;\n}\n```\n\n**Update triggers:** Any event emitted by the agent updates `lastActivityAt`:\n- Agent emits status\n- Agent creates/updates task\n- Agent spawns child\n- Agent calls done()\n\n---\n\n### B1: Health Check Timer Service\n\nPer-coordinator timer that runs as TypeScript service (not an agent).\n\n**Files to create:**\n- `src/monitor/health-check-service.ts` - Timer management and health checking\n\n```typescript\ninterface HealthCheckConfig {\n  intervalMs: number;               // Default: 5 min (300000)\n  stalledThresholdMs: number;       // Default: 10 min (600000)\n  guppThresholdMs: number;          // Default: 30 min (1800000)\n  consecutiveFailuresBeforeEscalate: number; // Default: 3\n}\n\ninterface HealthCheckService {\n  /** Start monitoring workers for a coordinator */\n  startForCoordinator(coordinatorId: AgentId, config?: Partial<HealthCheckConfig>): void;\n\n  /** Stop monitoring (coordinator terminated) */\n  stopForCoordinator(coordinatorId: AgentId): void;\n\n  /** Manual health check trigger */\n  checkNow(coordinatorId: AgentId): Promise<HealthCheckResult>;\n\n  /** Get current health state for debugging */\n  getHealthState(coordinatorId: AgentId): CoordinatorHealthState | undefined;\n}\n\ninterface CoordinatorHealthState {\n  coordinatorId: AgentId;\n  workers: Map<AgentId, WorkerHealthState>;\n  lastCheckAt: Timestamp;\n  nextCheckAt: Timestamp;\n}\n\ninterface WorkerHealthState {\n  agentId: AgentId;\n  lastActivityAt: Timestamp;\n  consecutiveFailures: number;\n  status: 'healthy' | 'warning' | 'stalled';\n  acpSessionId?: string;\n  isSessionProcessing?: boolean;\n}\n```\n\n**Health check flow:**\n1. Timer fires every `intervalMs`\n2. Query EventStore for workers under coordinator\n3. For each worker:\n   a. Check `lastActivityAt` - if stale, mark warning\n   b. 
Query SessionMapper for ACP session status\n   c. If `!session.isProcessing` and stale activity → STALLED\n4. If stalled: emit `STALLED_AGENT` signal to coordinator\n5. Detect zombies: agent.state === 'stopped' but ACP session still exists → cleanup\n\n---\n\n### B2: Session Status Integration\n\nUse ACP-factory session status to detect dead sessions.\n\n**Files to modify:**\n- `src/acp/session-mapper.ts` - Add method to get session status\n- `src/monitor/health-check-service.ts` - Use session status in checks\n\n```typescript\n// Addition to SessionMapper\nexport class SessionMapper {\n  // ... existing methods ...\n\n  /**\n   * Get session processing status for an agent\n   * Returns undefined if no session mapped\n   */\n  getSessionStatus(agentId: AgentId): { isProcessing: boolean; sessionId: string } | undefined {\n    // Find session for this agent\n    for (const [sessionId, mapping] of this.mappings) {\n      if (mapping.agentId === agentId) {\n        // Would need reference to actual ACP Session object\n        // This requires architectural consideration\n        return { isProcessing: false, sessionId }; // Placeholder\n      }\n    }\n    return undefined;\n  }\n}\n```\n\n**Integration approach:** The MacroAgent needs to expose session status:\n- Option A: SessionMapper holds reference to Session objects (memory overhead)\n- Option B: Add extension method `_macro/getSessionStatus` to query status\n- Option C: Track `isProcessing` state in SessionMapping (updated on prompt start/end)\n\n**Recommended:** Option C - track in mapping, lightweight and already have hooks.\n\n```typescript\ninterface SessionMapping {\n  // ... 
existing fields ...\n\n  /** Whether the session is currently processing a prompt */\n  isProcessing: boolean;\n\n  /** Last time isProcessing changed */\n  lastProcessingChangeAt: number;\n}\n```\n\n---\n\n### B3: Stalled Agent Detection & Zombie Cleanup\n\nUnified detection for stalled agents and zombie cleanup.\n\n**Files to create:**\n- `src/monitor/stall-detector.ts`\n\n```typescript\ninterface StalledAgent {\n  agentId: AgentId;\n  coordinatorId: AgentId;\n  lastActivityAt: Timestamp;\n  stalledDurationMs: number;\n  assignedTaskId?: TaskId;\n  sessionStatus: 'processing' | 'idle' | 'unknown';\n}\n\ninterface ZombieAgent {\n  agentId: AgentId;\n  stoppedAt: Timestamp;\n  acpSessionId: string;\n  reason: 'done_called' | 'parent_terminated';\n}\n\nclass StallDetector {\n  constructor(\n    private eventStore: EventStore,\n    private sessionMapper: SessionMapper,\n    private config: { stalledThresholdMs: number }\n  ) {}\n\n  /**\n   * Detect stalled workers under a coordinator\n   */\n  detectStalled(coordinatorId: AgentId): StalledAgent[] {\n    const workers = this.eventStore.listAgents({ parent: coordinatorId, state: 'running' });\n    const now = Date.now();\n    const stalled: StalledAgent[] = [];\n\n    for (const worker of workers) {\n      const stalledMs = now - (worker.lastActivityAt ?? worker.created_at);\n      if (stalledMs > this.config.stalledThresholdMs) {\n        const sessionStatus = this.sessionMapper.getSessionStatus(worker.id);\n\n        // Only stalled if session is NOT processing\n        if (!sessionStatus?.isProcessing) {\n          stalled.push({\n            agentId: worker.id,\n            coordinatorId,\n            lastActivityAt: worker.lastActivityAt ?? worker.created_at,\n            stalledDurationMs: stalledMs,\n            assignedTaskId: worker.task_id,\n            sessionStatus: sessionStatus ? 
'idle' : 'unknown',\n          });\n        }\n      }\n    }\n\n    return stalled;\n  }\n\n  /**\n   * Detect and clean up zombie agents\n   */\n  async cleanupZombies(): Promise<ZombieAgent[]> {\n    // Agents in 'stopped' state but still have ACP session\n    const stoppedAgents = this.eventStore.listAgents({ state: 'stopped' });\n    const zombies: ZombieAgent[] = [];\n\n    for (const agent of stoppedAgents) {\n      const sessions = this.sessionMapper.getSessionsForAgent(agent.id);\n      if (sessions.length > 0) {\n        zombies.push({\n          agentId: agent.id,\n          stoppedAt: agent.stopped_at!,\n          acpSessionId: sessions[0],\n          reason: agent.stop_reason === 'completed' ? 'done_called' : 'parent_terminated',\n        });\n\n        // Cleanup: remove the mapping\n        for (const sessionId of sessions) {\n          this.sessionMapper.removeMapping(sessionId);\n        }\n      }\n    }\n\n    return zombies;\n  }\n}\n```\n\n---\n\n### B4: Retry Policy (Coordinator-Orchestrated)\n\nRetry policy stored on tasks, coordinator executes retries.\n\n**Files to create:**\n- `src/task/retry-policy.ts` - Policy types and helpers\n\n```typescript\ninterface RetryPolicy {\n  maxRetries: number;                 // Default: 0 (no retry)\n  retryOn: ('failed' | 'stalled')[];  // Which conditions trigger retry\n  backoffMs: number;                  // Initial delay (default: 1000)\n  backoffMultiplier: number;          // Exponential backoff (default: 2)\n  maxBackoffMs: number;               // Cap (default: 60000)\n}\n\ninterface RetryState {\n  attemptCount: number;\n  lastAttemptAt: Timestamp;\n  lastError?: string;\n}\n\n// Helper functions for coordinator to use\nfunction shouldRetry(task: Task, policy: RetryPolicy, reason: 'failed' | 'stalled'): boolean;\nfunction calculateBackoff(attemptCount: number, policy: RetryPolicy): number;\nfunction prepareForRetry(task: Task): Task; // Reset status to 'pending', increment 
attempt\n```\n\n**Files to modify:**\n- `src/store/types/tasks.ts` - Add `retryPolicy` and `retryState` to Task\n\n```typescript\nexport interface Task {\n  // ... existing fields ...\n\n  /** Retry policy for this task */\n  retryPolicy?: RetryPolicy;\n\n  /** Current retry state */\n  retryState?: RetryState;\n}\n```\n\n**Coordinator flow on STALLED_AGENT:**\n1. Receive signal with agentId and taskId\n2. Get task, check `retryPolicy`\n3. If `shouldRetry(task, policy, 'stalled')`:\n   a. Calculate backoff\n   b. After delay: reset task to 'pending'\n   c. Terminate stalled agent\n   d. Task will be picked up by next available worker\n4. If no retries left: mark task as 'failed', notify user\n\n---\n\n### Implementation Order\n\n```\nB0: Add lastActivityAt to Agent type        ✅ DONE (i-499t)\n    ↓ (schema change, tests)\nB2: Session status integration              ✅ DONE (i-6gsh)\n    ↓ (isProcessing tracking in SessionMapping)\nB3: Stall detector                          ✅ DONE (i-7p6u)\n    ↓ (uses B0 and B2)\nB1: Health check service                    ✅ DONE (i-1oyx)\n    ↓ (uses B3, emits signals)\nB4: Retry policy                            ✅ DONE (i-9jmd)\n    ↓ (coordinator consumes signals, uses policy)\nB5: Integration                             ⚠️ PENDING (no issue yet)\n    ↓ (wire into coordinator lifecycle)\n```\n\n---\n\n### Files Summary\n\n| File | Type | Purpose | Status |\n|------|------|---------|--------|\n| `src/store/types/agents.ts` | Modify | Add `last_activity_at` | ✅ Done |\n| `src/store/types/tasks.ts` | Modify | Add `RetryPolicy`, `RetryState`, task fields | ✅ Done |\n| `src/store/event-store.ts` | Modify | Update `last_activity_at` on events, retry support | ✅ Done |\n| `src/acp/session-mapper.ts` | Modify | Track `isProcessing` in mapping | ✅ Done |\n| `src/acp/types.ts` | Modify | Add `isProcessing` to `SessionMapping` | ✅ Done |\n| `src/acp/macro-agent.ts` | Modify | Hook `isProcessing` into prompt lifecycle | ✅ Done |\n| 
`src/task/task-manager.ts` | Modify | Add `prepareForRetry()`, `updateRetryState()` | ✅ Done |\n| `src/task/types.ts` | Modify | Add `retryPolicy` to `CreateTaskOptions` | ✅ Done |\n| `src/monitor/health-check-service.ts` | Create | Per-coordinator timer service | ✅ Done |\n| `src/monitor/stall-detector.ts` | Create | Stalled agent and zombie detection | ✅ Done |\n| `src/task/retry-policy.ts` | Create | Retry policy types and helpers | ✅ Done |\n| `src/monitor/index.ts` | Create | Module exports | ✅ Done |\n| `src/monitor/__tests__/stall-detector.test.ts` | Create | 17 tests | ✅ Done |\n| `src/monitor/__tests__/health-check-service.test.ts` | Create | 13 tests | ✅ Done |\n| `src/task/__tests__/retry-policy.test.ts` | Create | 32 tests | ✅ Done |\n\n---\n\n### Phase B Integration (Future Work)\n\n**Status:** Phase B infrastructure is implemented, but not yet wired into active code paths.\n\nThe following integration work is needed to activate the monitoring system:\n\n#### What's Implemented (Infrastructure)\n\n| Component | Location | Status |\n|-----------|----------|--------|\n| `last_activity_at` tracking | `EventStore` | ✅ Auto-updates on agent events |\n| `isProcessing` tracking | `SessionMapper` / `MacroAgent.prompt()` | ✅ Auto-updates on prompt lifecycle |\n| `StallDetector` | `src/monitor/stall-detector.ts` | ✅ Ready to use |\n| `HealthCheckService` | `src/monitor/health-check-service.ts` | ✅ Ready to use |\n| `RetryPolicy` helpers | `src/task/retry-policy.ts` | ✅ Ready to use |\n| `prepareForRetry()` | `TaskManager` | ✅ Ready to use |\n\n#### What's Missing (Integration)\n\n1. 
**HealthCheckService Startup**\n   - Nothing currently instantiates `HealthCheckService`\n   - Needs to be started when a coordinator spawns\n   - Needs to be stopped when coordinator terminates\n\n   ```typescript\n   // Example: In coordinator spawn handler\n   const healthService = new HealthCheckService(eventStore, sessionMapper, messageRouter);\n   healthService.startForCoordinator(coordinatorId);\n\n   // On coordinator terminate\n   healthService.stopForCoordinator(coordinatorId);\n   ```\n\n2. **STALE_AGENT Signal Handler**\n   - `CoordinatorRole` subscribes to `STALE_AGENT` (defined in `src/roles/builtin/coordinator.ts:68`)\n   - No actual handler code exists to process these signals\n   - Coordinator needs to implement the retry flow\n\n   ```typescript\n   // Example: Coordinator signal handler (does not exist yet)\n   function handleStaleAgent(signal: { workerId: AgentId, taskId: TaskId }) {\n     const { workerId, taskId } = signal;\n     const task = taskManager.get(taskId);\n     const policy = task.retryPolicy;\n\n     // shouldRetry takes the policy explicitly (see the B4 helper signatures)\n     if (policy && shouldRetry(task, policy, 'stalled')) {\n       const backoff = calculateBackoff(\n         task.retryState?.attemptCount ?? 0,\n         policy\n       );\n\n       // Reset task for retry after backoff\n       setTimeout(() => {\n         taskManager.prepareForRetry(taskId, 'Agent stalled');\n         // Task returns to 'pending', will be picked up by available worker\n       }, backoff);\n\n       // Terminate the stalled agent\n       agentManager.terminate(workerId, 'stalled');\n     } else {\n       // No retry policy or no retries left\n       taskManager.updateStatus(taskId, 'failed');\n       // Notify user or escalate\n     }\n   }\n   ```\n\n3. 
**Coordinator Implementation**\n   - The `CoordinatorRole` is just a role definition (capabilities, subscriptions)\n   - No `Coordinator` class exists that implements actual orchestration logic\n   - This is where signal handlers, retry logic, and worker management would live\n\n#### Integration Options\n\n**Option A: Coordinator Class**\nCreate a `Coordinator` class that encapsulates orchestration logic:\n- Instantiates and manages `HealthCheckService`\n- Handles `STALE_AGENT`, `WORKER_DONE`, etc. signals\n- Implements retry flow using `RetryPolicy` helpers\n\n**Option B: Agent Manager Extension**\nExtend `AgentManager` to handle monitoring:\n- Start `HealthCheckService` when spawning coordinator-role agents\n- Wire signal routing to appropriate handlers\n\n**Option C: Lifecycle Hooks**\nAdd lifecycle hooks that auto-wire monitoring:\n- `onCoordinatorSpawn` → start health checks\n- `onCoordinatorTerminate` → stop health checks\n- Role-specific signal handlers\n\n#### Recommended Next Steps\n\n1. Create issue to track integration work\n2. Decide on integration architecture (Option A/B/C above)\n3. Implement coordinator signal handling\n4. 
Add integration tests for full monitoring flow\n\n---\n\n## Phase C: Cross-Batch Dependencies\n\n**Priority:** Low (most use cases work with single coordinator)\n\n### C1: Dependency Declaration\n\n```typescript\ninterface CoordinatorConfig {\n  // Existing fields...\n  \n  dependencies?: BatchDependency[];\n}\n\ninterface BatchDependency {\n  type: 'start' | 'merge' | 'baseline';\n  coordinatorId: CoordinatorId;\n}\n\n// Semantics:\n// - start: Can't spawn workers until dependency's work lands\n// - merge: Workers can start, can't merge until dependency lands\n// - baseline: Base branch forks from dependency's base branch\n```\n\n### C2: Dependency Tracking\n\n```typescript\ninterface DependencyTracker {\n  register(coordinatorId: CoordinatorId, deps: BatchDependency[]): void;\n  canStart(coordinatorId: CoordinatorId): boolean;\n  canMerge(coordinatorId: CoordinatorId): boolean;\n  getBlockers(coordinatorId: CoordinatorId): CoordinatorId[];\n  onLanded(coordinatorId: CoordinatorId): void;\n}\n```\n\n---\n\n## Phase E: Session Management\n\n### E1: Handoff Protocol\n\nEnable graceful context transfer when sessions cycle.\n\n```typescript\ninterface HandoffPayload {\n  context: string;           // What was being worked on\n  hookedWork: HookedWork[];  // Pending operations\n  reason: 'timeout' | 'manual' | 'error';\n  checkpoint?: {\n    lastCommit?: string;\n    uncommittedFiles?: string[];\n  };\n}\n\n// Handoff flow:\n// 1. Agent detects session ending (timeout, error)\n// 2. Commit any uncommitted work\n// 3. Emit HANDOFF signal with context\n// 4. 
Parent decides: spawn replacement, queue, or escalate\n```\n\n### E2: Lease Renewal & Expiration\n\n```typescript\ninterface LeaseConfig {\n  defaultLeaseMs: number;      // Default: 30 min\n  maxRenewals: number;         // Default: 3\n  renewalExtensionMs: number;  // Default: 15 min per renewal\n}\n\ninterface LeaseManager {\n  grantLease(taskId: TaskId, agentId: AgentId, config?: LeaseConfig): Lease;\n  renewLease(taskId: TaskId): boolean;\n  checkExpired(): ExpiredLease[];\n  onExpired(callback: (lease: ExpiredLease) => void): Unsubscribe;\n}\n\n// Expiration handling:\n// 1. LeaseManager detects expired lease\n// 2. Emit LEASE_EXPIRED signal\n// 3. Unassign task (returns to ready pool)\n// 4. Optionally: terminate agent if unresponsive\n```\n\n### E3: Channel History API\n\nSupport late-joiner catch-up.\n\n```typescript\ninterface ChannelHistory {\n  getHistory(channel: string, options: HistoryOptions): Promise<Message[]>;\n  replayTo(agentId: AgentId, channel: string, since: Timestamp): Promise<void>;\n}\n\ninterface HistoryOptions {\n  since?: Timestamp;\n  limit?: number;        // Default: 100\n  messageTypes?: string[];\n}\n```\n\n---\n\n## Implementation Order (Revised)\n\n```\nPhase B (Monitor Behaviors)        ← CRITICAL for production safety\n    ↓\nPhase A (YAML Config Loading)      ← Enables user customization\n    ↓\nPhase E (Session Management)       ← Operational resilience\n    ↓\nPhase C (Multi-Coordinator)        ← Advanced orchestration (optional)\n```\n\n**Rationale:**\n- Phase B is critical - without it, stuck workers accumulate\n- Phase A is medium priority - users can still register roles at runtime\n- Phase E builds on B for session resilience\n- Phase C is lowest priority - single coordinator works for most cases\n\n---\n\n## Resolved Design Questions\n\n### Q1: Monitor Implementation Model ✅\n\n**Decision:** Hybrid - HealthCheckService (TypeScript service) for timers/detection, Coordinator agent for remediation decisions.\n\n### Q2: 
Stall Remediation ✅\n\n**Decision:** Escalate to coordinator via STALLED_AGENT signal. Coordinator decides based on retry policy.\n\n### Q3: Retry Scope ✅\n\n**Decision:** Task-level policy, coordinator-orchestrated execution.\n\n### Q4: Activity Tracking ✅\n\n**Decision:** Add `lastActivityAt` field to Agent type, updated on any agent event emission.\n\n### Q5: Session Liveness ✅\n\n**Decision:** Use ACP session status via `isProcessing` tracked in SessionMapping.\n\n### Q6: Terminology ✅\n\n**Decision:** Use \"stalled\" for inactive agents. Detect and auto-remove zombies (stopped agents with lingering sessions).\n\n---\n\n## Dependencies\n\n- [[s-7t8b]] Multi-Agent Orchestration Implementation Plan (prerequisite - complete)\n- [[s-60tc]] Specialized Agent Roles (Phase A)\n- [[s-32xs]] Self-Cleaning Workers (Phase B)\n- [[s-9rld]] In-Flight Steering (Phase E)\n- [[s-bcqm]] Change Management and Merge Queue (Phase C)\n\n## References\n\n**Existing Code:**\n- Role registry: `src/roles/registry.ts`\n- Capabilities: `src/roles/capabilities.ts`\n- Activity watcher: `src/activity/watcher.ts`\n- Monitor role: `src/roles/builtin/monitor.ts`\n- Agent types: `src/store/types/agents.ts`\n- Task types: `src/store/types/tasks.ts`\n- Signal types: `src/router/signals.ts`\n- Session mapper: `src/acp/session-mapper.ts`\n- ACP-factory Session: `references/acp-factory/src/session.ts` (isProcessing)\n\n**External References:**\n- Gastown health checks: `references/gastown/internal/deacon/stuck.go`\n- Gastown GUPP: `references/gastown/internal/daemon/lifecycle.go`\n","priority":1,"archived":0,"archived_at":null,"created_at":"2026-01-23 
03:12:05","updated_at":"2026-01-23T20:37:48.054Z","parent_id":null,"parent_uuid":null,"relationships":[{"from":"s-5yhx","from_type":"spec","to":"s-32xs","to_type":"spec","type":"references"},{"from":"s-5yhx","from_type":"spec","to":"s-60tc","to_type":"spec","type":"references"},{"from":"s-5yhx","from_type":"spec","to":"s-7t8b","to_type":"spec","type":"depends-on"},{"from":"s-5yhx","from_type":"spec","to":"s-7t8b","to_type":"spec","type":"references"},{"from":"s-5yhx","from_type":"spec","to":"s-8472","to_type":"spec","type":"references"},{"from":"s-5yhx","from_type":"spec","to":"s-9rld","to_type":"spec","type":"references"},{"from":"s-5yhx","from_type":"spec","to":"s-bcqm","to_type":"spec","type":"references"}],"tags":["implementation","meta-spec","phase-2","planning"]}
 {"id":"s-1zcx","uuid":"5126aca3-8059-437b-bdae-84bd7d5f86d4","title":"Multi-Agent Orchestration Testing Strategy","file_path":"specs/s-1zcx_multi_agent_orchestration_testing_strategy.md","content":"# Multi-Agent Orchestration Testing Strategy\n\n## Overview\n\nThis spec defines the comprehensive testing strategy for the multi-agent orchestration system implemented across six constituent specs ([[s-60tc]], [[s-7ktd]], [[s-32xs]], [[s-bcqm]], [[s-8472]], [[s-9rld]]). The focus is on E2E and integration tests that exercise multiple components together, including tests with live agents and real git environments.\n\n## Goals\n\n1. **Comprehensive coverage** - Test all orchestration flows, both implemented and planned\n2. **Realistic environments** - Use real git repos, real databases, real agent interactions\n3. **User-facing scenarios** - Test how users actually interact with the system\n4. **Multi-coordinator support** - Test concurrent coordinators with independent streams\n5. **Cost-effective** - Simulate agents where possible, use live agents strategically\n\n---\n\n## Test Categories\n\n### Category 1: Component Integration Tests\n\nMultiple macro-agent components working together, no agents involved. Uses mocked agent context.\n\n| Scope | Speed | Isolation | Git | Database |\n|-------|-------|-----------|-----|----------|\n| 2-5 components | Fast | Per-test | Mocked | In-memory SQLite |\n\n### Category 2: Simulated Agent Tests\n\nFull orchestration flows using the Agent Simulator Harness. No Claude API calls.\n\n| Scope | Speed | Isolation | Git | Database |\n|-------|-------|-----------|-----|----------|\n| Full system | Medium | Per-test | Real (temp repo) | Real SQLite |\n\n### Category 3: Live Agent Tests\n\nReal Claude sessions exercising the full system. 
Run on-demand or nightly.\n\n| Scope | Speed | Isolation | Git | Database |\n|-------|-------|-----------|-----|----------|\n| Full system + Claude | Slow | Per-test | Real (temp repo) | Real SQLite |\n\n### Category 4: User Interaction Tests\n\nTests that simulate how a human user interacts with the system via CLI/API.\n\n| Scope | Speed | Isolation | Git | Database |\n|-------|-------|-----------|-----|----------|\n| User → System | Medium-Slow | Per-test | Real | Real SQLite |\n\n### Category 5: Multi-Coordinator Tests\n\nMultiple coordinators running concurrently with different integration streams.\n\n| Scope | Speed | Isolation | Git | Database |\n|-------|-------|-----------|-----|----------|\n| N coordinators | Slow | Per-test | Real (shared repo) | Real SQLite |\n\n---\n\n## Phase 2: Comprehensive E2E Test Implementation\n\nThis section documents the design decisions and detailed requirements for comprehensive E2E testing using the test harness infrastructure built in Phase 1.\n\n### Design Decisions\n\n| Decision | Choice | Rationale |\n|----------|--------|-----------|\n| **Worktree Strategy** | Hybrid (bare repo + real worktrees) | Tests real git behavior where it matters, controlled environment |\n| **Signal Flow** | Both auto-wire and manual trigger | Auto-wire for convenience, manual for targeted testing |\n| **Merge Queue Processing** | Both simulated integrator and harness helper | Full E2E with integrator, focused tests with helper |\n| **Stream Scoping** | Explicit streamId on every spawn | Full visibility and control since we set up the tests |\n\n### Extended TestHarness Requirements\n\nThe TestHarness must be extended with additional components to support comprehensive E2E testing:\n\n#### New Components\n\n| Component | Purpose | Key Methods |\n|-----------|---------|-------------|\n| **WorkspaceManager** | Create/manage worktrees for agents | `createWorktree()`, `deleteWorktree()`, `getWorkspacePath()` |\n| **MergeQueue** | Track merge 
requests, process merges | `submit()`, `process()`, `getStatus()`, `listPending()` |\n| **LifecycleManager** | Handle done() signals, cascade termination | `handleWorkerDone()`, `handleCascade()` |\n\n#### Extended Interface\n\n```typescript\ninterface TestHarness {\n  // Existing services\n  readonly eventStore: EventStore;\n  readonly messageRouter: MessageRouter;\n  readonly taskManager: TaskManager;\n  \n  // New components\n  readonly workspaceManager: WorkspaceManager;\n  readonly mergeQueue: MergeQueue;\n  \n  // Workspace management\n  createWorktreeForAgent(agentId: string, branch?: string): Promise<string>;\n  getAgentWorkspace(agentId: string): string | undefined;\n  \n  // Merge queue operations\n  submitMergeRequest(options: MergeRequestOptions): Promise<string>;\n  processMergeQueue(streamId?: string): Promise<ProcessResult>;\n  getMergeRequestStatus(mrId: string): MergeRequestStatus;\n  listMergeRequests(streamId: string): MergeRequest[];\n  getMergeLog(streamId: string): MergeLogEntry[];\n  \n  // Lifecycle helpers\n  triggerWorkerDone(agentId: string, status: DoneStatus): Promise<DoneResult>;\n  triggerCascadeTermination(agentId: string): Promise<CascadeResult>;\n  \n  // Checkpoint management\n  getCheckpoints(agentId: string): Checkpoint[];\n  \n  // New assertions\n  assertMergeRequestStatus(mrId: string, status: MergeRequestStatus): void;\n  assertMergeQueueLength(streamId: string, length: number): void;\n  assertWorktreeExists(agentId: string): void;\n  assertWorktreeDeleted(agentId: string): void;\n  assertBranchContainsCommit(branch: string, commitMessage: string): void;\n}\n```\n\n#### TempRepoFactory Extension\n\n```typescript\ninterface TempRepoOptions {\n  initialFiles?: Record<string, string>;\n  initialBranch?: string;\n  bare?: boolean;  // NEW: Create bare repo for worktree support\n  withDataplane?: boolean;\n  withSudocode?: boolean;\n  sudocodeSpecs?: Partial<Spec>[];\n  sudocodeIssues?: Partial<Issue>[];\n}\n```\n\n---\n\n### 
Test Scenario Category 1: Full Orchestration Flow\n\n**File:** `test_fixtures/harness/__tests__/orchestration-flow.e2e.test.ts`\n\nTests the complete worker lifecycle with real components:\n\n```\nWorker spawns → does work → commits → calls done() \n  → WORKER_DONE signal → merge request submitted to queue \n  → integrator processes → merges to integration branch \n  → workspace cleanup\n```\n\n#### Scenario 1a: Single Worker Happy Path\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Worker completes task and merges successfully |\n| **Components** | Simulator, EventStore, MergeQueue, WorkspaceManager |\n| **Setup** | Bare repo, coordinator with stream, worker with worktree |\n| **Steps** | 1. Spawn coordinator 2. Spawn worker 3. Worker writes file 4. Worker commits 5. Worker calls done() 6. Process merge queue |\n| **Assertions** | Agent terminated, MR status=merged, commit on integration branch, worktree deleted |\n\n```typescript\nit('worker completes and merges to integration branch', async () => {\n  const repo = await harness.createTempRepo({\n    initialFiles: TYPESCRIPT_PROJECT,\n    bare: true,\n  });\n  \n  const coord = await harness.spawnSimulator({\n    role: 'coordinator',\n    streamId: 'test-stream',\n    behavior: SIMPLE_COORDINATOR,\n  });\n  \n  const worker = await harness.spawnSimulator({\n    role: 'worker',\n    parentId: coord.agentId,\n    streamId: 'test-stream',\n    behavior: {\n      onStart: [\n        { type: 'write_file', path: 'feature.ts', content: 'export const x = 1;' },\n        { type: 'commit', message: 'Add feature' },\n        { type: 'done', status: 'completed' },\n      ],\n    },\n  });\n  \n  await harness.waitForAll();\n  await harness.processMergeQueue('test-stream');\n  \n  harness.assertAgentTerminated(worker.agentId);\n  harness.assertMergeRequestStatus(worker.agentId, 'merged');\n  harness.assertBranchContainsCommit('integration/test-stream', 'Add feature');\n  
harness.assertWorktreeDeleted(worker.agentId);\n});\n```\n\n#### Scenario 1b: Multiple Workers Sequential Merge\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Multiple workers complete and merge in FIFO order |\n| **Components** | 3 Simulators, MergeQueue, WorkspaceManager |\n| **Setup** | 3 workers editing different files |\n| **Steps** | Workers complete in order, process queue |\n| **Assertions** | Merge order matches completion order |\n\n#### Scenario 1c: Worker with Multiple Commits (Checkpoints)\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Worker creates checkpoints for each commit |\n| **Components** | Simulator, EventStore (checkpoints) |\n| **Behavior** | MULTI_COMMIT_WORKER (3 commits) |\n| **Assertions** | 3 checkpoints created, all commits in final merge |\n\n#### Scenario 1d: Worker Fails Mid-Work\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Failed worker does not submit merge request |\n| **Behavior** | FAILING_WORKER |\n| **Assertions** | No MR exists, worktree still cleaned up |\n\n#### Scenario 1e: Worker Blocked, Emits HELP\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Blocked worker signals coordinator for help |\n| **Behavior** | BLOCKED_WORKER |\n| **Assertions** | HELP signal received by coordinator, worker status=blocked |\n\n---\n\n### Test Scenario Category 2: Conflict Resolution Flow\n\n**File:** `test_fixtures/harness/__tests__/conflict-resolution.e2e.test.ts`\n\nTests the full conflict detection and resolution cycle:\n\n```\nWorker A completes → Worker B completes (conflicts with A)\n  → Integrator detects conflict → spawns resolver worker\n  → Resolver works on resolver/<mr-id> branch\n  → Resolver calls done() → RESOLVER_DONE\n  → Inline merge to integration branch\n```\n\n#### Scenario 2a: Simple Conflict Resolution\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Two workers conflict, resolver fixes it |\n| **Setup** | 2 workers editing same 
file with conflicting content |\n| **Steps** | 1. Worker A edits file 2. Worker B edits same file differently 3. Both complete 4. Process queue 5. Conflict detected 6. Resolver spawned 7. Resolver fixes 8. Inline merge |\n| **Assertions** | MR status transitions (pending→processing→conflict→merged), resolver branch exists then deleted |\n\n#### Scenario 2b: Nested Conflict (Resolver Also Conflicts)\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Resolver's fix also conflicts, escalates to coordinator |\n| **Assertions** | CONFLICT_UNRESOLVED signal emitted, escalation logged |\n\n#### Scenario 2c: Multiple Conflicts in Queue\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Queue with multiple conflicting MRs |\n| **Assertions** | Each conflict handled independently, queue ordering maintained |\n\n#### Scenario 2d: Resolver Fails\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Resolver fails to resolve, MR stays in conflict |\n| **Assertions** | MR status remains 'conflict', error logged |\n\n---\n\n### Test Scenario Category 3: Cascade Termination\n\n**File:** `test_fixtures/harness/__tests__/cascade-termination.e2e.test.ts`\n\nTests hierarchical agent termination with change consolidation:\n\n```\nCoordinator terminates → children terminated depth-first\n  → child changes consolidated to parent branches\n  → workspaces cleaned up\n```\n\n#### Scenario 3a: Coordinator + 2 Workers Cascade\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Coordinator terminates, children follow |\n| **Setup** | Coordinator with 2 active workers |\n| **Assertions** | All terminated, child branches merged to coordinator branch |\n\n#### Scenario 3b: Deep Hierarchy (3+ Levels)\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Cascade through multiple levels |\n| **Setup** | Coordinator → Worker → Sub-workers |\n| **Assertions** | Depth-first termination order, all changes preserved |\n\n#### Scenario 3c: 
Cascade During Active Work\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Workers have uncommitted changes during cascade |\n| **Assertions** | Uncommitted changes auto-committed before termination |\n\n#### Scenario 3d: Partial Cascade\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Some workers already done when cascade triggered |\n| **Assertions** | Only active workers terminated, completed workers unchanged |\n\n---\n\n### Test Scenario Category 4: Multi-Coordinator Tests\n\n**File:** `test_fixtures/harness/__tests__/multi-coordinator.e2e.test.ts`\n\nTests concurrent coordinators with independent streams.\n\n#### Scenario 4a: Independent Parallel Coordinators\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Two coordinators working on separate features |\n| **Setup** | Coord A (stream-a), Coord B (stream-b), each with 2 workers |\n| **Assertions** | Separate branches, separate merge queues, no interference |\n\n```typescript\nit('parallel coordinators with stream isolation', async () => {\n  const coordA = await harness.spawnSimulator({\n    role: 'coordinator',\n    streamId: 'stream-a',\n    behavior: createMultiWorkerCoordinator(2),\n  });\n  \n  const coordB = await harness.spawnSimulator({\n    role: 'coordinator', \n    streamId: 'stream-b',\n    behavior: createMultiWorkerCoordinator(2),\n  });\n  \n  await harness.waitForAll();\n  \n  // Verify stream isolation\n  const queueA = harness.listMergeRequests('stream-a');\n  const queueB = harness.listMergeRequests('stream-b');\n  \n  expect(queueA.every(mr => mr.streamId === 'stream-a')).toBe(true);\n  expect(queueB.every(mr => mr.streamId === 'stream-b')).toBe(true);\n  \n  harness.assertBranchExists('integration/stream-a');\n  harness.assertBranchExists('integration/stream-b');\n});\n```\n\n#### Scenario 4b: Shared File, Different Parts\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Coordinators edit different parts of same file |\n| 
**Assertions** | Both merge successfully, no conflict |\n\n#### Scenario 4c: Sequential Dependency (B Waits for A)\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Coordinator B blocked until A completes |\n| **Setup** | B has `blockedBy: ['stream-a']` |\n| **Assertions** | B doesn't spawn workers until A merges to main |\n\n#### Scenario 4d: Shared Integration Branch\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Two coordinators target same integration branch |\n| **Setup** | Both target `develop` branch |\n| **Assertions** | Merge queue serializes correctly, no conflicts |\n\n---\n\n### Test Scenario Category 5: Message Routing & Steering\n\n**File:** `test_fixtures/harness/__tests__/steering-integration.e2e.test.ts`\n\nTests cross-component communication flows.\n\n#### Scenario 5a: Broadcast to @workers\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Coordinator broadcasts to all workers |\n| **Setup** | 2 workers + 1 monitor |\n| **Assertions** | Only workers receive message |\n\n#### Scenario 5b: Priority Message Wakes Sleeping Agent\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Urgent message wakes sleeping worker |\n| **Assertions** | Wake action triggered, worker resumes |\n\n#### Scenario 5c: Context Injection During Execution\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Inject context into running worker |\n| **Assertions** | Worker receives injected context in next turn |\n\n#### Scenario 5d: Message to Terminated Agent\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Send message to already-terminated agent |\n| **Assertions** | Handled gracefully (queued or dropped based on config) |\n\n---\n\n### Test Scenario Category 6: Task Backend Integration\n\n**File:** `test_fixtures/harness/__tests__/task-integration.e2e.test.ts`\n\nTests task management with simulated agents.\n\n#### Scenario 6a: Worker Claims and Completes Task\n\n| Aspect | Detail 
|\n|--------|--------|\n| **Description** | Full task lifecycle with worker |\n| **Assertions** | Task status: pending → assigned → in_progress → completed |\n\n#### Scenario 6b: Blocked Task Not in Ready List\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Task with dependencies excluded from ready |\n| **Assertions** | Blocked task only appears after blocker completes |\n\n#### Scenario 6c: Sudocode Issue → Task Mapping\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Sudocode issue syncs to task backend |\n| **Assertions** | Task fields match issue, status synchronized |\n\n#### Scenario 6d: Task Completion Updates Issue\n\n| Aspect | Detail |\n|--------|--------|\n| **Description** | Completing task updates sudocode issue |\n| **Assertions** | Issue closed, feedback added to spec |\n\n---\n\n## Implementation Plan\n\n### Phase 2a: Extend TestHarness (Prerequisite)\n\n1. Add WorkspaceManager integration (bare repo + worktrees)\n2. Add MergeQueue integration\n3. Add lifecycle helpers (triggerWorkerDone, triggerCascade)\n4. Add new assertions\n5. Extend TempRepoFactory for bare repos\n\n### Phase 2b: Full Orchestration Flow Tests\n\n1. Implement scenarios 1a-1e\n2. Test file: `orchestration-flow.e2e.test.ts`\n\n### Phase 2c: Conflict Resolution Flow Tests\n\n1. Implement scenarios 2a-2d\n2. Test file: `conflict-resolution.e2e.test.ts`\n\n### Phase 2d: Cascade Termination Tests\n\n1. Implement scenarios 3a-3d\n2. Test file: `cascade-termination.e2e.test.ts`\n\n### Phase 2e: Multi-Coordinator Tests\n\n1. Implement scenarios 4a-4d\n2. Test file: `multi-coordinator.e2e.test.ts`\n\n### Phase 2f: Steering & Task Integration Tests\n\n1. Implement scenarios 5a-5d, 6a-6d\n2. Test files: `steering-integration.e2e.test.ts`, `task-integration.e2e.test.ts`\n\n---\n\n## Agent Simulator Harness\n\n### Purpose\n\nEnable testing of complex multi-agent scenarios without Claude API calls. 
Simulates agent decision-making and tool usage based on configurable behavior scripts.\n\n### Design\n\n```typescript\ninterface AgentSimulator {\n  // Identity\n  agentId: string;\n  role: AgentRole;\n  \n  // Behavior configuration\n  behavior: SimulatedBehavior;\n  \n  // Lifecycle\n  start(context: SimulatedContext): Promise<void>;\n  stop(): Promise<void>;\n  \n  // Tool execution (called by harness when agent \"uses\" a tool)\n  handleToolCall(tool: string, params: unknown): Promise<unknown>;\n  \n  // Event injection (simulate receiving messages/signals)\n  injectEvent(event: SimulatedEvent): void;\n  \n  // State access (for conditional behaviors)\n  getWorkspaceState(): WorkspaceState;\n  getGitState(): GitState;\n}\n\ninterface SimulatedBehavior {\n  // What the agent does on start\n  onStart: BehaviorStep[];\n  \n  // How agent responds to events (optional: simple behaviors define only onStart)\n  onEvent?: Record<string, BehaviorStep[]>;\n  \n  // Conditional behaviors\n  conditions?: ConditionalBehavior[];\n  \n  // Failure injection\n  failAfter?: number;\n  failWith?: Error;\n  \n  // Timing\n  stepDelayMs?: number;\n  timeoutMs?: number;\n}\n\ntype BehaviorStep =\n  | { type: 'call_tool'; tool: string; params: unknown }\n  | { type: 'emit_signal'; signal: string; payload: unknown }\n  | { type: 'wait_for_event'; event: string; timeoutMs?: number }\n  | { type: 'wait_for_condition'; condition: ConditionFn; timeoutMs?: number }\n  | { type: 'write_file'; path: string; content: string }\n  | { type: 'read_file'; path: string; into: string }  // Store result under the name given by `into`\n  | { type: 'commit'; message: string }\n  | { type: 'sleep'; ms: number }\n  | { type: 'spawn_child'; role: AgentRole; behavior: SimulatedBehavior }\n  | { type: 'done'; status: DoneStatus }\n  | { type: 'conditional'; if: ConditionFn; then: BehaviorStep[]; else?: BehaviorStep[] };\n\ninterface ConditionalBehavior {\n  condition: ConditionFn;\n  behavior: BehaviorStep[];\n}\n\n// Receives the same context type that is passed to AgentSimulator.start()\ntype ConditionFn = (context: SimulatedContext) => boolean;\n```\n\n### 
Predefined Behaviors\n\n```typescript\n// Worker that completes successfully\nconst SUCCESSFUL_WORKER: SimulatedBehavior = {\n  onStart: [\n    { type: 'write_file', path: 'output.txt', content: 'work done' },\n    { type: 'commit', message: 'Complete task' },\n    { type: 'done', status: 'completed' },\n  ],\n};\n\n// Worker that implements a function\nconst IMPLEMENT_FUNCTION_WORKER: SimulatedBehavior = {\n  onStart: [\n    { type: 'read_file', path: 'src/index.ts', into: 'existing' },\n    { type: 'write_file', path: 'src/index.ts', content: '/* existing */\\nexport function newFeature() { return true; }' },\n    { type: 'call_tool', tool: 'bash', params: { command: 'npm test' } },\n    { type: 'commit', message: 'Add newFeature function' },\n    { type: 'done', status: 'completed' },\n  ],\n};\n\n// Coordinator that breaks down work and spawns workers\nconst PLANNING_COORDINATOR: SimulatedBehavior = {\n  onStart: [\n    // Analyze task and create subtasks\n    { type: 'call_tool', tool: 'create_task', params: { description: 'Implement feature A' } },\n    { type: 'call_tool', tool: 'create_task', params: { description: 'Implement feature B' } },\n    { type: 'call_tool', tool: 'create_task', params: { description: 'Write tests' } },\n    // Spawn workers for each\n    { type: 'spawn_child', role: 'worker', behavior: IMPLEMENT_FUNCTION_WORKER },\n    { type: 'spawn_child', role: 'worker', behavior: IMPLEMENT_FUNCTION_WORKER },\n    { type: 'wait_for_event', event: 'all_children_done' },\n    { type: 'done', status: 'completed' },\n  ],\n  onEvent: {\n    'WORKER_DONE': [\n      { type: 'call_tool', tool: 'update_task_status', params: { status: 'completed' } },\n    ],\n  },\n};\n\n// Monitor that checks health\nconst HEALTH_CHECK_MONITOR: SimulatedBehavior = {\n  onStart: [\n    { type: 'wait_for_event', event: 'HEALTH_CHECK_TIMER' },\n  ],\n  onEvent: {\n    'HEALTH_CHECK_TIMER': [\n      { type: 'call_tool', tool: 'list_agents', params: {} },\n      { \n       
 type: 'conditional',\n        if: (ctx) => ctx.stuckAgents.length > 0,\n        then: [\n          { type: 'emit_signal', signal: 'GUPP_VIOLATION', payload: { agents: '{{stuckAgents}}' } },\n        ],\n      },\n    ],\n  },\n};\n```\n\n### Harness API\n\n```typescript\ninterface TestHarness {\n  // Setup\n  createTempRepo(options?: TempRepoOptions): Promise<TempRepo>;\n  createDatabase(): Promise<Database>;\n  initializeSudocode(options?: SudocodeOptions): Promise<SudocodeFixture>;\n  \n  // Coordinator lifecycle (simulates user starting a coordinator)\n  startCoordinator(config: CoordinatorConfig): Promise<SimulatedCoordinator>;\n  stopCoordinator(coordinatorId: string): Promise<void>;\n  \n  // Agent management\n  spawnSimulator(config: SimulatorConfig): Promise<AgentSimulator>;\n  getSimulator(agentId: string): AgentSimulator | undefined;\n  getAllSimulators(): AgentSimulator[];\n  \n  // User interaction simulation\n  sendUserMessage(coordinatorId: string, message: string): Promise<void>;\n  injectUserContext(coordinatorId: string, context: string): Promise<void>;\n  cancelCoordinator(coordinatorId: string): Promise<void>;\n  \n  // Orchestration components (real implementations)\n  agentManager: AgentManager;\n  workspaceManager: WorkspaceManager;\n  taskBackend: TaskBackend;\n  messageRouter: MessageRouter;\n  mergeQueue: MergeQueue;\n  \n  // Multi-coordinator support\n  getActiveCoordinators(): SimulatedCoordinator[];\n  getStreamForCoordinator(coordinatorId: string): StreamId;\n  \n  // Assertions\n  assertAgentTerminated(agentId: string): void;\n  assertTaskStatus(taskId: string, status: TaskStatus): void;\n  assertBranchExists(branch: string): void;\n  assertBranchMerged(source: string, target: string): void;\n  assertMergeRequestStatus(mrId: string, status: MergeRequestStatus): void;\n  assertIssueStatus(issueId: string, status: IssueStatus): void;\n  assertFeedbackOnSpec(specId: string, fromIssue: string): void;\n  \n  // Timing control\n  
advanceTime(ms: number): Promise<void>;\n  waitForCondition(condition: () => boolean, timeoutMs?: number): Promise<void>;\n  \n  // Cleanup\n  cleanup(): Promise<void>;\n}\n```\n\n---\n\n## User Interaction Test Scenarios\n\nThese scenarios test how a human user interacts with macro-agent through CLI or API.\n\n### User → Coordinator Connection\n\n| Scenario | Description | Steps | Assertions |\n|----------|-------------|-------|------------|\n| **Start coordinator with goal** | User starts macro-agent with a feature request | 1. User runs `macro-agent \"Implement user auth\"` 2. Coordinator spawns 3. Coordinator creates integration branch | Coordinator running, branch exists |\n| **Start coordinator on existing branch** | User specifies existing feature branch | 1. User runs `macro-agent --branch feature/auth` 2. Coordinator attaches to branch | Coordinator uses existing branch, no new branch |\n| **Start with sudocode spec** | User points to spec for implementation | 1. User runs `macro-agent --spec s-xxxx` 2. Coordinator reads spec 3. Coordinator creates issues | Issues created with `implements` link to spec |\n| **Start with sudocode issue** | User points to existing issue | 1. User runs `macro-agent --issue i-xxxx` 2. Coordinator claims issue 3. Worker spawned for issue | Issue status → in_progress, worker assigned |\n\n### User Feedback Mid-Execution\n\n| Scenario | Description | Steps | Assertions |\n|----------|-------------|-------|------------|\n| **User provides clarification** | User answers coordinator question | 1. Coordinator asks \"OAuth or JWT?\" 2. User responds \"JWT\" 3. Coordinator continues | Context injected, decision reflected in work |\n| **User redirects work** | User changes priority mid-execution | 1. Workers running on feature A 2. User says \"Focus on auth first\" 3. Coordinator reprioritizes | Workers on A paused/terminated, new workers on auth |\n| **User requests status** | User asks for progress update | 1. 
User sends \"What's the status?\" 2. Coordinator summarizes | STATUS signal, accurate summary |\n| **User cancels coordinator** | User aborts the session | 1. User presses Ctrl+C 2. Coordinator receives cancel 3. Cascade termination | All agents terminated, workspaces cleaned |\n\n### User Reviews Output\n\n| Scenario | Description | Steps | Assertions |\n|----------|-------------|-------|------------|\n| **User reviews PR** | Coordinator creates PR for review | 1. All workers complete 2. Integrator merges to integration branch 3. Coordinator creates PR | PR created with summary, all commits included |\n| **User requests changes** | User comments on PR | 1. PR created 2. User comments \"Add error handling\" 3. Coordinator spawns fix worker | New worker addresses feedback |\n| **User approves and merges** | User merges PR | 1. PR approved 2. User merges 3. Coordinator notified | Coordinator calls done(), cleanup |\n\n---\n\n## Multi-Coordinator Test Scenarios\n\nMultiple coordinators running concurrently, each with their own integration stream.\n\n### Independent Coordinators\n\n| Scenario | Description | Setup | Assertions |\n|----------|-------------|-------|------------|\n| **Two features in parallel** | Separate features, no interaction | Coord A: auth feature, Coord B: payment feature | Separate branches, separate workers, no interference |\n| **Shared file, no conflict** | Features touch different parts of same file | Coord A edits top of file, Coord B edits bottom | Both merge successfully to main |\n| **Stream isolation** | Workers only see their coordinator's stream | 2 coordinators, 3 workers each | Workers in stream A can't access stream B tasks |\n\n### Dependent Coordinators\n\n| Scenario | Description | Setup | Assertions |\n|----------|-------------|-------|------------|\n| **Sequential dependency** | B waits for A to complete | Coord B has `blockedBy: [A]` | B doesn't spawn workers until A merges |\n| **Baseline dependency** | B forks from A's branch 
| Coord B has `baseline: A` | B's integration branch forks from A's |\n| **Shared integration branch** | Two coordinators target same branch | Coord A and B both target `develop` | Merge queue serializes, no conflicts |\n\n### Coordinator Interaction\n\n| Scenario | Description | Setup | Assertions |\n|----------|-------------|-------|------------|\n| **Cross-coordinator messaging** | Coordinator A notifies B | A completes critical component | B receives signal, adjusts plan |\n| **Resource contention** | Both coordinators need same file | A and B both edit `config.ts` | Conflict detected, coordinated resolution |\n| **Staggered start** | B starts while A is mid-execution | A running with 2 workers, B starts | B correctly inherits A's committed changes |\n\n### Coordinator Lifecycle\n\n| Scenario | Description | Setup | Assertions |\n|----------|-------------|-------|------------|\n| **Coordinator crash recovery** | Coordinator dies, restarts | Coord running, force kill, restart | Picks up from last state, workers continue |\n| **Coordinator handoff** | User switches context | Coord A active, user starts Coord B | A paused/terminated cleanly, B starts |\n| **Long-running coordinators** | Coordinators run for extended period | 2 coordinators, 30+ min simulated | No resource leaks, queues stay healthy |\n\n---\n\n## Sudocode Integration Test Scenarios\n\nFull integration with sudocode for spec-driven development.\n\n### Spec → Issue → Task Flow\n\n| Scenario | Description | Steps | Assertions |\n|----------|-------------|-------|------------|\n| **Coordinator reads spec** | Coordinator uses spec as requirements | 1. Spec exists with requirements 2. Coordinator started with --spec 3. Coordinator breaks into issues | Issues created, `implements` links to spec |\n| **Issue becomes task** | Sudocode issue maps to task | 1. Issue exists in sudocode 2. TaskBackend configured for sudocode 3. 
Worker claims task | Task fields match issue, status synced |\n| **Task completion updates issue** | Worker done updates issue status | 1. Worker completes task 2. done() called 3. Issue updated | Issue status → closed, execution recorded |\n| **Blocked issue handling** | Issue with dependencies | 1. Issue B blocked by Issue A 2. Coordinator queries ready 3. Only A appears | B not in listReady(), A worked first |\n\n### Feedback Loop\n\n| Scenario | Description | Steps | Assertions |\n|----------|-------------|-------|------------|\n| **Worker adds spec feedback** | Implementation feedback to spec | 1. Worker implements feature 2. Worker finds spec ambiguity 3. Worker calls add_feedback | Feedback appears on spec with anchor |\n| **Spec updated mid-work** | User updates spec during implementation | 1. Worker implementing 2. User edits spec 3. Coordinator notified | Coordinator adjusts plan, workers updated |\n| **Cross-issue feedback** | Issue provides feedback to another issue | 1. Issue A discovers Issue B needed 2. Feedback added 3. Issue B created | B linked to A with `discovered-from` |\n\n### Multi-Spec Coordination\n\n| Scenario | Description | Setup | Assertions |\n|----------|-------------|-------|------------|\n| **Related specs** | Coordinator implements multiple specs | 2 related specs, 1 coordinator | Issues for both, proper `implements` links |\n| **Spec hierarchy** | Parent spec with child specs | Parent spec, 3 child specs | Coordinator traverses hierarchy |\n| **Spec dependencies** | Spec A depends on Spec B | A references B | B's issues worked before A's |\n\n### Execution Tracking\n\n| Scenario | Description | Steps | Assertions |\n|----------|-------------|-------|------------|\n| **Execution created on spawn** | Agent spawn creates execution record | 1. Worker spawned 2. Execution record created | Execution linked to issue, started_at set |\n| **Execution chain** | Follow-up executions linked | 1. Worker fails 2. Retry spawned 3. 
Linked execution | Chain: exec1 → exec2, both linked to issue |\n| **Execution outputs** | Summary and files recorded | 1. Worker completes 2. Execution finalized | summary, files_changed populated |\n\n### Backend Parity\n\n| Scenario | Description | Setup | Assertions |\n|----------|-------------|-------|------------|\n| **InMemory ↔ Sudocode same behavior** | Identical operations, identical results | Same test, both backends | Outputs match exactly |\n| **Sudocode unavailable fallback** | Graceful degradation | Sudocode server down | Falls back to cached state or error |\n| **Sync conflict resolution** | External edit during execution | User edits issue via sudocode CLI | Conflict detected, handled gracefully |\n\n---\n\n## E2E Test Scenarios (Updated)\n\n### s-60tc: Specialized Agent Roles\n\n| Scenario | Description | Agents | Assertions |\n|----------|-------------|--------|------------|\n| **Role capability enforcement** | Worker cannot spawn integrator | 1 simulated worker | Spawn rejected with capability error |\n| **Role resolution fallback** | Invalid custom role falls back | 1 simulated agent | Agent runs with generic capabilities |\n| **Custom role override** | Project-level role config | 1 simulated worker | Custom timeout applied |\n| **Tool filtering by role** | Monitor cannot use write tools | 1 simulated monitor | Tool call rejected |\n| **Role-based message routing** | Message to @workers role | 2 workers + 1 monitor | Only workers receive |\n\n### s-7ktd: Structured Workspace Isolation\n\n| Scenario | Description | Agents | Assertions |\n|----------|-------------|--------|------------|\n| **Worker workspace isolation** | Parallel workers isolated | 2 simulated workers | Different paths, different branches |\n| **Coordinator sees children** | Coordinator has visibility | 1 coord + 2 workers | childWorkspacePaths populated |\n| **Workspace cleanup on done** | Clean termination | 1 simulated worker | Worktree gone, slot released |\n| **Monitor has 
no workspace** | No git workspace | 1 simulated monitor | workspace is undefined |\n| **Integrator on merge branch** | Separate from coordinator | 1 coord + 1 integrator | Different branches |\n| **Concurrent worktree creation** | Race condition handling | 5 workers spawned simultaneously | All get unique workspaces |\n| **Slot exhaustion** | Pool overflow | More workers than slots | Graceful overflow naming |\n\n### s-32xs: Self-Cleaning Workers\n\n| Scenario | Description | Agents | Assertions |\n|----------|-------------|--------|------------|\n| **Happy path done()** | Normal completion | 1 simulated worker | WORKER_DONE, terminated |\n| **Failed done()** | Failure handling | 1 simulated worker | Status=failed, no MR |\n| **Blocked done()** | Awaiting help | 1 simulated worker | HELP emitted |\n| **Cascade termination** | Parent terminates children | 1 worker + 2 children | All terminated depth-first |\n| **Change consolidation** | Child merged to parent | 1 worker + 1 child | Parent has child commits |\n| **Uncommitted changes** | Auto-commit on done | 1 simulated worker | Commit before signal |\n| **done() called twice** | Idempotency | 1 simulated worker | Second call is no-op |\n| **Checkpoint creation** | Commits become checkpoints | 1 simulated worker | Checkpoints in dataplane |\n\n### s-bcqm: Change Management and Merge Queue\n\n| Scenario | Description | Agents | Assertions |\n|----------|-------------|--------|------------|\n| **FIFO queue processing** | Order preserved | 3 workers + integrator | Merge order = submit order |\n| **Priority override** | High priority first | 3 workers (1 high) | High merged first |\n| **Merge conflict detection** | Conflicts identified | 2 conflicting workers | CONFLICT status, files listed |\n| **Resolver spawn on conflict** | Resolver created | 2 workers + integrator | Resolver spawned |\n| **Resolver inline merge** | Resolver completes | Full conflict flow | MR merged via resolver |\n| **Integrator drains queue** | 
All processed before done | 3 workers + integrator | Queue empty |\n| **Stream isolation** | Streams independent | 2 coordinators | Correct stream assignment |\n| **Queue reordering** | Dynamic priority | 3 workers, mid-reorder | New order respected |\n| **Nested conflict** | Resolver also conflicts | Complex conflict | Escalated to coordinator |\n\n### s-8472: Pluggable Task Backend\n\n| Scenario | Description | Backend | Assertions |\n|----------|-------------|---------|------------|\n| **InMemory CRUD** | Basic operations | InMemory | All succeed |\n| **InMemory blockers** | Dependency tracking | InMemory | Blocked excluded from ready |\n| **Sudocode read** | Issue → Task | Sudocode | Fields mapped correctly |\n| **Sudocode write** | Task → Issue | Sudocode | Issue created in sudocode |\n| **Sudocode blockers** | Link API | Sudocode | depends-on created |\n| **Backend parity** | Same behavior | Both | Identical results |\n| **High-frequency updates** | Event coalescing | Both | No event storms |\n\n### s-9rld: In-Flight Steering\n\n| Scenario | Description | Agents | Assertions |\n|----------|-------------|--------|------------|\n| **Broadcast delivery** | Fan-out | 3 agents | All receive |\n| **Role channel** | @workers | 2 workers + 1 monitor | Only workers |\n| **Priority wake** | Urgent interrupts | 1 sleeping agent | Woken immediately |\n| **Activity waking** | Event triggers wake | 1 monitor | Activated |\n| **Context injection** | inject() queued | 1 live agent | In next turn |\n| **Injection fallback** | interruptWith() | 1 live agent | Restarted with context |\n| **Message to terminated** | Delivery failure | 1 terminated agent | Error handled |\n| **Wake loop prevention** | Cycle detection | Circular wake | Detected, stopped |\n\n---\n\n## Live Agent Test Scenarios (Detailed)\n\n### Happy Path Scenarios\n\n| Scenario | Task Description | Agents | Expected Flow |\n|----------|-----------------|--------|---------------|\n| **Worker adds function** | 
\"Add a `formatDate` function to `src/utils.ts` that formats dates as YYYY-MM-DD\" | 1 live worker | Worker reads file → writes function → runs tests → commits → done() |\n| **Worker fixes bug** | \"Fix the off-by-one error in `calculateTotal` in `src/cart.ts`\" | 1 live worker | Worker reads file → identifies bug → fixes → tests → commits → done() |\n| **Coordinator plans feature** | \"Implement user profile page with avatar upload\" | 1 coord + 2 workers | Coordinator breaks down → spawns workers → tracks → merges |\n| **Full merge flow** | Worker completes → queue → integrator → merged | 1 worker + 1 integrator | Worker done → MR submitted → integrator merges → branch updated |\n\n### Error Handling Scenarios\n\n| Scenario | Task Description | Agents | Expected Flow |\n|----------|-----------------|--------|---------------|\n| **Worker encounters test failure** | \"Add validation to `createUser` but tests fail\" | 1 live worker | Worker implements → tests fail → done(failed) with error |\n| **Worker needs clarification** | \"Implement the payment flow\" (ambiguous) | 1 live worker | Worker asks HELP → awaits response → continues |\n| **Coordinator retries failed worker** | Worker fails, coordinator spawns replacement | 1 coord + 2 workers | Worker 1 fails → coordinator detects → worker 2 spawned |\n| **Merge conflict resolution** | Two workers edit same function | 2 workers + resolver | Conflict detected → resolver spawned → resolution merged |\n\n### Complex Coordination Scenarios\n\n| Scenario | Task Description | Agents | Expected Flow |\n|----------|-----------------|--------|---------------|\n| **3 parallel workers** | \"Implement API endpoints for users, posts, comments\" | 1 coord + 3 workers | Coordinator creates 3 tasks → 3 workers parallel → all merge |\n| **Hierarchical workers** | \"Worker needs to spawn helpers for subtasks\" | 1 worker + 2 sub-workers | Worker breaks down → spawns children → consolidates |\n| **Monitor detects stuck** | \"Worker 
gets stuck for 30+ min\" | 1 coord + 1 worker + 1 monitor | Worker hangs → monitor detects → GUPP_VIOLATION → escalation |\n| **Cross-coordinator dependency** | \"Feature B needs Feature A complete\" | 2 coordinators | Coord A finishes → Coord B unblocked → B proceeds |\n\n### Sudocode Integration Scenarios\n\n| Scenario | Task Description | Setup | Expected Flow |\n|----------|-----------------|-------|---------------|\n| **Implement issue** | \"Implement issue i-xxxx: Add login endpoint\" | Issue exists | Worker reads issue → implements → closes issue with feedback |\n| **Spec-driven implementation** | \"Implement spec s-xxxx: User Authentication\" | Spec with requirements | Coordinator creates issues → workers implement → feedback on spec |\n| **Issue with dependencies** | \"Issue i-yyyy blocked by i-xxxx\" | Dependency exists | i-xxxx worked first → i-yyyy unblocked → worked |\n| **Full feedback loop** | Implementation reveals spec issue | Spec + issue | Worker adds feedback → user sees on spec → spec updated |\n\n---\n\n## Edge Case Scenarios\n\n### Timing and Race Conditions\n\n| Scenario | Description | Test Strategy |\n|----------|-------------|---------------|\n| **Simultaneous done()** | Two workers call done() at same moment | Simulated with synchronized timing |\n| **Message during termination** | Message arrives as agent terminates | Inject message during terminate() |\n| **Spawn during cascade** | Parent spawns child while being terminated | Race parent terminate with child spawn |\n| **Queue submit during processing** | New MR while integrator processing | Concurrent submit and process |\n\n### Failure and Recovery\n\n| Scenario | Description | Test Strategy |\n|----------|-------------|---------------|\n| **Database connection lost** | SQLite file locked/deleted | Inject filesystem error |\n| **Git operation fails** | Merge fails unexpectedly | Mock git to fail |\n| **Agent process dies** | Simulated crash | Kill simulator mid-operation |\n| 
**Coordinator crash recovery** | Restart after crash | Kill and restart with same state |\n\n### Resource Limits\n\n| Scenario | Description | Test Strategy |\n|----------|-------------|---------------|\n| **Many concurrent workers** | 10+ workers in parallel | Spawn 10+ simulators |\n| **Deep cascade** | 5+ levels of child agents | Nested spawn chain |\n| **Large merge queue** | 50+ pending MRs | Bulk submit |\n| **Long-running session** | 30+ min simulated time | Time advancement |\n\n---\n\n## Test Fixtures\n\n### Temp Repo Factory\n\n```typescript\ninterface TempRepoOptions {\n  initialFiles?: Record<string, string>;\n  initialBranch?: string;\n  bare?: boolean;\n  withDataplane?: boolean;\n  withSudocode?: boolean;\n  sudocodeSpecs?: Partial<Spec>[];\n  sudocodeIssues?: Partial<Issue>[];\n}\n\n// Signature only; options are optional, matching TestHarness.createTempRepo\ndeclare function createTempRepo(options?: TempRepoOptions): Promise<TempRepo>;\n```\n\n### Realistic Project Fixtures\n\n```typescript\n// TypeScript project with common structure\nconst TYPESCRIPT_PROJECT = {\n  'package.json': '{ \"name\": \"test-project\", \"scripts\": { \"test\": \"jest\" } }',\n  'tsconfig.json': '{ \"compilerOptions\": { \"target\": \"es2020\" } }',\n  'src/index.ts': 'export {}',\n  'src/utils.ts': 'export function helper() { return true; }',\n  // The test file must import the helper it exercises\n  'tests/utils.test.ts': 'import { helper } from \"../src/utils\";\\ntest(\"helper\", () => expect(helper()).toBe(true))',\n};\n\n// Project with existing feature branches\nconst PROJECT_WITH_BRANCHES = {\n  ...TYPESCRIPT_PROJECT,\n  branches: ['main', 'develop', 'feature/existing'],\n};\n\n// Project with sudocode specs\nconst PROJECT_WITH_SPECS = {\n  ...TYPESCRIPT_PROJECT,\n  specs: [\n    { id: 's-test', title: 'Test Feature', description: 'Requirements...' 
},\n  ],\n  issues: [\n    { id: 'i-test', title: 'Implement test', implements: 's-test' },\n  ],\n};\n```\n\n### Behavior Fixtures (Extended)\n\n```typescript\nexport const BEHAVIORS = {\n  // Workers\n  successfulWorker: SUCCESSFUL_WORKER,\n  failingWorker: FAILING_WORKER,\n  stuckWorker: STUCK_WORKER,\n  implementFunction: IMPLEMENT_FUNCTION_WORKER,\n  fixBug: (file: string, bugDesc: string) => createBugFixWorker(file, bugDesc),\n  conflicting: (file: string, content: string) => createConflictingWorker(file, content),\n  \n  // Coordinators\n  simpleCoordinator: SIMPLE_COORDINATOR,\n  planningCoordinator: PLANNING_COORDINATOR,\n  multiWorkerCoordinator: (count: number) => createMultiWorkerCoordinator(count),\n  sudocodeCoordinator: (specId: string) => createSudocodeCoordinator(specId),\n  \n  // Integrators\n  integrator: INTEGRATOR,\n  conflictResolver: CONFLICT_RESOLVER,\n  \n  // Monitors\n  healthCheckMonitor: HEALTH_CHECK_MONITOR,\n  guppMonitor: GUPP_MONITOR,\n};\n```\n\n---\n\n## Coverage Targets\n\n### Quantitative Coverage\n\n| Category | Line Coverage | Branch Coverage |\n|----------|---------------|-----------------|\n| src/roles/ | ≥90% | ≥85% |\n| src/workspace/ | ≥90% | ≥85% |\n| src/lifecycle/ | ≥90% | ≥85% |\n| src/router/ | ≥90% | ≥85% |\n| src/task/backend/ | ≥90% | ≥85% |\n| src/activity/ | ≥90% | ≥85% |\n| src/merge/ | ≥90% | ≥85% |\n| **Overall** | **≥85%** | **≥80%** |\n\n### Qualitative Coverage\n\n| Category | Scenario Count |\n|----------|----------------|\n| Component Integration | 15+ |\n| Simulated Agent E2E | 40+ |\n| User Interaction | 12+ |\n| Multi-Coordinator | 9+ |\n| Sudocode Integration | 15+ |\n| Live Agent | 12+ |\n| Edge Cases | 12+ |\n| **Total** | **115+** |\n\n### Critical Path Coverage\n\nMust pass before release:\n\n1. User starts coordinator → workers spawn → work completes → PR created\n2. Worker done() → signal → queue → integrator → merge → cleanup\n3. Conflict → resolver spawn → resolution → inline merge\n4. 
Cascade termination with change consolidation\n5. Sudocode spec → issues → tasks → executions → feedback\n6. Multi-coordinator with separate streams\n7. Coordinator crash recovery\n\n---\n\n## Test Organization\n\n### Directory Structure\n\n```\ntest_fixtures/\n├── fixtures/                  # Test fixtures\n│   ├── repos/                 # TempRepoFactory\n│   ├── behaviors/             # Behavior fixtures\n│   ├── projects/              # Project templates\n│   └── sudocode/              # Sudocode fixtures\n├── harness/                   # Test harness code\n│   ├── simulator/             # AgentSimulator\n│   ├── timing/                # EventStepper\n│   ├── assertions/            # Assertion helpers\n│   ├── test-harness.ts        # Main harness\n│   └── __tests__/             # Harness tests\n│       ├── temp-repo-and-simulator.test.ts\n│       ├── behavior-executor-and-stepper.test.ts\n│       ├── test-harness-and-assertions.test.ts\n│       ├── fixtures.test.ts\n│       ├── orchestration-flow.e2e.test.ts      # NEW\n│       ├── conflict-resolution.e2e.test.ts     # NEW\n│       ├── cascade-termination.e2e.test.ts     # NEW\n│       ├── multi-coordinator.e2e.test.ts       # NEW\n│       ├── steering-integration.e2e.test.ts    # NEW\n│       └── task-integration.e2e.test.ts        # NEW\n```\n\n### CI/CD Configuration\n\n| Trigger | Test Suite | Duration |\n|---------|------------|----------|\n| Every PR | Unit + Integration | ~2 min |\n| PR merge | + E2E (simulated) | ~10 min |\n| Nightly | + User + Multi-coord + Sudocode | ~30 min |\n| Weekly | + Live Agent | ~60 min |\n| Release | All + Extended | ~90 min |\n\n---\n\n## Success Criteria\n\n1. **Agent Simulator Harness** - Complex multi-agent scenarios without Claude\n2. **User Interaction Coverage** - All user-facing flows tested\n3. **Multi-Coordinator Support** - Concurrent coordinators work correctly\n4. **Sudocode Integration** - Full spec → feedback cycle tested\n5. 
**115+ E2E Scenarios** - Comprehensive scenario coverage\n6. **12+ Live Agent Tests** - Real Claude validation\n7. **Coverage Targets** - ≥85% line, ≥80% branch\n8. **CI/CD Integration** - Automatic at appropriate triggers\n\n---\n\n## Dependencies\n\n- [[s-60tc]] Specialized Agent Roles\n- [[s-7ktd]] Structured Workspace Isolation\n- [[s-32xs]] Self-Cleaning Workers\n- [[s-bcqm]] Change Management and Merge Queue\n- [[s-8472]] Pluggable Task Backend Integration\n- [[s-9rld]] In-Flight Steering\n- [[s-7t8b]] Multi-Agent Orchestration Implementation Plan\n\n## References\n\n- Existing test patterns: `src/**/__tests__/`\n- Jest configuration: `jest.config.js`\n- CI workflow: `.github/workflows/`","priority":1,"archived":0,"archived_at":null,"created_at":"2026-01-23 03:18:46","updated_at":"2026-01-24 05:03:44","parent_id":null,"parent_uuid":null,"relationships":[{"from":"s-1zcx","from_type":"spec","to":"s-32xs","to_type":"spec","type":"references"},{"from":"s-1zcx","from_type":"spec","to":"s-60tc","to_type":"spec","type":"references"},{"from":"s-1zcx","from_type":"spec","to":"s-7ktd","to_type":"spec","type":"references"},{"from":"s-1zcx","from_type":"spec","to":"s-7t8b","to_type":"spec","type":"references"},{"from":"s-1zcx","from_type":"spec","to":"s-8472","to_type":"spec","type":"references"},{"from":"s-1zcx","from_type":"spec","to":"s-9rld","to_type":"spec","type":"references"},{"from":"s-1zcx","from_type":"spec","to":"s-bcqm","to_type":"spec","type":"references"}],"tags":["e2e","integration","strategy","testing"]}
 {"id":"s-gu8h","uuid":"a2b37456-8582-4a14-bbe6-ec57903c1794","title":"Mail Protocol Integration — Structured Conversation Tracking","file_path":"specs/s-gu8h_mail_protocol_integration_structured_conversation_.md","content":"\n# Mail Protocol Integration — Structured Conversation Tracking\n\n## Background\n\nmacro-agent coordinates multiple agents with message routing, status events, and lifecycle signals. Today, `conversationHistory[]` on the API server is a flat array with no structure. Internal agent-to-agent messages flow through the MessageRouter but aren't linked to any conversation context. There is no cross-session history or audit trail.\n\nThe Multi-Agent Protocol (MAP) SDK recently added a **mail system** — a structured conversation layer providing conversations, turns, threads, and participants. This spec describes how to integrate MAP mail into macro-agent.\n\n### What Mail Adds\n\n- **Conversations** — Named, stateful containers grouping related interactions (one per user session, one per delegated task, one per peer relationship)\n- **Turns** — Ordered records of every message within a conversation\n- **Threads** — Sub-conversations for branching discussions within a conversation\n- **Participants** — Tracked membership with roles and permissions per conversation\n\n### What Doesn't Change\n\n- Agent spawning still uses acp-factory\n- The role system (worker, integrator, coordinator, monitor) stays the same\n- Workspace isolation and merge queue are unaffected\n- MCP remains the agent-facing tool interface\n- All existing messaging semantics (send_message, check_messages) are unchanged from the agent's perspective\n\n---\n\n## Design Motivations\n\n### 1. MAP Protocol Alignment for External Clients\n\nInternal messages currently bypass the MAP protocol entirely — they flow through `MessageRouter → EventStore` without touching the MAPAdapter. 
By making internal messaging produce MAP mail-compatible data, any MAP client gets a standardized view into macro-agent's internals:\n\n- External dashboards use standard `mail/list`, `mail/turns/list`, `mail/replay` — no custom API needed\n- Remote macro-agent instances connecting via MAP WebSocket see conversations through the same protocol\n- Cross-instance delegation (macro-agent A delegates to macro-agent B) gets unified conversation tracking through the standard mail protocol\n\n### 2. Structured Audit Trail\n\nToday there is no way to answer \"what happened during this task?\" without reconstructing it from raw events. Mail conversations provide a queryable, structured record of every interaction in the context of a task or session.\n\n### 3. Federation-Ready\n\nWhen macro-agent A delegates work to a remote macro-agent B:\n- B's work produces turns in a local conversation\n- A can query B's conversation via MAP mail protocol over the existing MAP connection\n- Conversation IDs provide cross-system correlation\n\n### 4. Zero Agent Burden\n\nAgents should not need to know about or manage conversations. Mail is purely an infrastructure/observability layer. No MCP tool changes, no system prompt changes, no conversation IDs in agent messages.\n\n---\n\n## Key Design Decisions\n\n### D1: Extend the Existing MAPAdapter Instance\n\nmacro-agent already creates a MAPAdapter via `createMAPAdapter()` for ACP/MAP external client handling. Rather than creating a separate MAPServer for mail, we extend the existing instance with `mail: { enabled: true }` and plug in EventStore-backed mail stores.\n\n**Rationale:** Single server instance, single event bus, single connection surface for external clients. Avoids duplicate infrastructure.\n\n### D2: Hook at the MessageRouter Level\n\nTurn recording happens inside `MessageRouter.sendToAddress()`, not at the MCP tool layer. 
This captures all message paths (direct, task-based, peer) regardless of how they're initiated.\n\n**Rationale:** The router is the single choke point for all inter-agent messaging. Hooking here via a callback (similar to the existing `wakeHandler` pattern) gives complete coverage without coupling mail to the MCP tool layer.\n\n**Mechanism:** A `turnRecorder` callback in `MessageRouterConfig`, invoked after event emission when a conversation context can be resolved for the message.\n\n### D3: Mail as an EventStore Materialized View\n\nConversations, turns, and threads are stored as new event types in the existing EventStore, with materialized views computed from them — the same pattern used for agents, tasks, and messages today.\n\n**Rationale:**\n- All data flows through the existing persistence pipeline (SQLite or memory backend)\n- Replay-on-restart works identically to agents/tasks/messages\n- Single source of truth — no parallel persistence system\n- MAP's store interfaces (`ConversationStore`, `TurnStore`, etc.) become thin adapters over these materialized views\n\n**New event types:**\n| Event type | Actions |\n|---|---|\n| `conversation` | `created`, `closed`, `participant_joined`, `participant_left` |\n| `turn` | `recorded` |\n| `thread` | `created` |\n\n**New materialized views:**\n| View | Computed from |\n|---|---|\n| `conversations` | `conversation` events |\n| `turns` (per-conversation) | `turn` events |\n| `threads` | `thread` events |\n\nMAP events (`mail.created`, `mail.turn.added`, etc.) are emitted from these store adapters so external MAP subscribers receive real-time updates through the standard protocol.\n\n### D4: Conversation Context Abstracted from Agents\n\nAgents never see, pass, or manage conversation IDs. 
The system maintains an internal mapping:\n\n```\nagentConversationMap: Map<AgentId, ConversationId>\n```\n\nThis is populated automatically:\n- **On agent spawn** → create task conversation, map agent to it, auto-join parent + child\n- **On user session start** → create session conversation, map head manager to it\n- **On agent terminate / done()** → close the agent's conversation, leave participant\n\nThe MCP tools (`send_message`, `check_messages`, `done`) remain unchanged. No schema changes, no system prompt additions for mail awareness.\n\n### D5: Hierarchy-Aware Turn Assignment\n\nWhen two agents in different conversations communicate, the system determines which conversation the turn belongs to using the agent hierarchy (lineage) already tracked in EventStore:\n\n- **Parent → child message:** turn recorded in child's task conversation\n- **Child → parent message:** turn recorded in child's task conversation\n- **Same conversation:** turn recorded in that shared conversation\n- **Peers (no parent-child relationship):** turn recorded in a peer conversation (see D6)\n\nThis means each task conversation is a **complete, self-contained record** of the interaction about that task — delegation, back-and-forth clarification, progress updates, completion. When a MAP client queries \"what happened with this task?\", they get everything in one conversation.\n\nThe head manager participates in multiple conversations: the session conversation (with the user) and each child's task conversation (as the delegator). 
This is standard in MAP mail — a participant can be in many conversations.\n\n### D6: Only Direct Messages Produce Conversation Turns\n\nTo keep the model simple, only explicit direct messages are tracked as conversations:\n\n**Tracked:**\n| Address type | Conversation |\n|---|---|\n| `{ agent: \"child-or-parent\" }` | Child's task conversation |\n| `{ agent: \"peer\" }` | Peer conversation (per agent-pair) |\n| `{ task: \"task-id\" }` | Task's conversation (resolves to agent) |\n\n**Not tracked (flow through EventStore as today, no mail turns):**\n| Address type | Reason |\n|---|---|\n| `{ broadcast: true }` | One-way announcements, not conversational |\n| `{ role: \"worker\" }` | Fan-out commands, responses go through task convs |\n| `{ siblings: true }` | Fan-out, can add later if needed |\n| `{ agents: [...] }` | Multi-target, can add later |\n| `{ scope: \"topic\" }` | Pub/sub, can add later as topic conversations |\n\nFan-out messages (role, siblings, broadcast) continue to work exactly as today through the EventStore message system. 
If topic or sibling group conversations are needed later, they layer on without changing the core model.\n\n### D7: Peer Conversations (Auto-Created)\n\nWhen Agent A sends a direct message to Agent B and neither is an ancestor of the other, the system creates a **peer conversation**:\n\n- Keyed by sorted agent-pair: `[min(A,B), max(A,B)]` — same conversation regardless of who initiates\n- Parent conversation set to nearest common ancestor conversation (e.g., their shared coordinator's task conv)\n- Both agents auto-joined as participants\n- Cached in a `peerConvIndex` for fast lookup on subsequent messages\n- Closed when either agent terminates\n\n### D8: Conversation Lifecycle Aligned with Agent Lifecycle\n\n**Creation:**\n- Session conversation: created by API server on first user message\n- Task conversation: created by agent-manager on agent spawn\n- Peer conversation: created by MessageRouter on first direct peer message\n\n**Closing:**\n- Integrated into the `done()` pipeline, between step 4 (task status update) and step 5 (status event emission)\n- This is the natural extension point — after the task is marked complete but before role-specific handlers deal with git/merge concerns\n- Cascade termination also closes child conversations\n\n---\n\n## Conversation Resolution Logic\n\nThe MessageRouter resolves conversations with this logic:\n\n```\nsendToAddress(from, to, content):\n  1. Route message via existing logic (unchanged)\n  2. Resolve conversation:\n     a. If not a direct agent or task address → null (no turn recorded)\n     b. Resolve target agent (for task addresses, look up assigned agent)\n     c. Check relationship via lineage:\n        - Parent/child → child's task conversation\n        - Same conversation → that conversation  \n        - Peers → getOrCreatePeerConv(from, target)\n     d. If conversation resolved → invoke turnRecorder callback\n  3. 
Proceed with wake decision (unchanged)\n```\n\n## Conversation Tree (Example)\n\nFor a user request \"Refactor the auth module\" that spawns two workers, where Worker A coordinates with Worker B:\n\n```\nconv-001 (session: user ↔ Head Manager)\n│  Turn 1: user → \"Refactor the auth module\"\n│  Turn 2: HM → \"Splitting into analysis and implementation...\"\n│  Turn 5: HM → \"Done. Here's what changed...\"\n│\n├── conv-002 (task: HM ↔ Worker A — analyze)\n│   Turn 1: HM → \"Analyze current auth code\"\n│   Turn 2: Worker A → \"Found 3 issues...\"\n│   Turn 3: Worker A → done(completed)\n│   [closed]\n│\n├── conv-003 (task: HM ↔ Worker B — implement)\n│   Turn 1: HM → \"Implement new auth, here's the analysis...\"\n│   Turn 2: Worker B → \"Need clarification on token store\"\n│   Turn 3: HM → \"Use the new interface from analysis\"\n│   Turn 4: Worker B → done(completed)\n│   [closed]\n│\n└── conv-004 (peer: Worker A ↔ Worker B)\n    Turn 1: Worker A → \"Here's the schema I found, you'll need it\"\n    Turn 2: Worker B → \"Got it, thanks\"\n    [closed when either terminates]\n```\n\nA MAP client queries `mail/list` and sees all four conversations. Drilling into any one shows the complete dialogue. 
Subscribing to `mail.turn.added` gives real-time turn updates for dashboards.\n\n---\n\n## Scope\n\n### In Scope\n- EventStore event types and materialized views for conversations/turns\n- MAP store adapter implementations backed by EventStore\n- MAPAdapter configuration to enable mail with EventStore-backed stores\n- MessageRouter turn recording hook (turnRecorder callback)\n- Conversation lifecycle management (create on spawn/first-message, close on done/terminate)\n- Peer conversation auto-creation and indexing\n- API server session conversation management\n- Conversation REST endpoints and WebSocket subscription channels\n\n### Out of Scope (Future)\n- Topic/scope conversations\n- Sibling group conversations\n- Multi-agent group conversations\n- Thread support within conversations (MAP supports it, we defer usage)\n- Conversation summarization\n- Persistent conversation storage beyond EventStore (external databases)\n- Agent-visible mail tools (agents remain unaware of conversations)\n\n---\n\n## References\n\n- MAP SDK mail implementation: `references/multi-agent-protocol/ts-sdk/src/server/mail/`\n- MAP mail types: `references/multi-agent-protocol/ts-sdk/src/server/types.ts`\n- Preliminary integration guide (informational, partially superseded by this spec): `docs/mail-integration.md`\n- Existing MAPAdapter: `src/map/adapter/map-adapter.ts`\n- MessageRouter: `src/router/message-router.ts`\n- EventStore: `src/store/event-store.ts`\n","priority":1,"archived":0,"archived_at":null,"created_at":"2026-02-06 04:28:45","updated_at":"2026-02-06 04:28:45","parent_id":null,"parent_uuid":null,"relationships":[],"tags":["architecture","conversations","mail","MAP"]}
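
The resolution rules in D5 through D7 of the mail spec above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the package's implementation: `Lineage`, `isAncestor`, `peerKey`, `resolveConversation`, and the `peer:` ID scheme are hypothetical names, and the sketch treats any ancestor like a direct parent. Only the rule ordering, `agentConversationMap`, and `peerConvIndex` come from the spec text.

```typescript
type AgentId = string;
type ConversationId = string;

// Hypothetical lineage view: each agent's parent, or null for a root agent.
type Lineage = Map<AgentId, AgentId | null>;

// True when `ancestor` appears on the parent chain of `agent` (assumes an acyclic tree).
function isAncestor(lineage: Lineage, ancestor: AgentId, agent: AgentId): boolean {
  let current: AgentId | null = lineage.get(agent) ?? null;
  while (current !== null) {
    if (current === ancestor) return true;
    current = lineage.get(current) ?? null;
  }
  return false;
}

// D7: a sorted pair key, so A->B and B->A map to the same peer conversation.
function peerKey(a: AgentId, b: AgentId): string {
  return a < b ? `${a}|${b}` : `${b}|${a}`;
}

function resolveConversation(
  from: AgentId,
  to: AgentId,
  lineage: Lineage,
  agentConversationMap: Map<AgentId, ConversationId>,
  peerConvIndex: Map<string, ConversationId>,
): ConversationId | null {
  // D5: ancestor/descendant traffic is recorded in the descendant's task conversation.
  if (isAncestor(lineage, from, to)) return agentConversationMap.get(to) ?? null;
  if (isAncestor(lineage, to, from)) return agentConversationMap.get(from) ?? null;

  // D5: both agents already share a conversation.
  const fromConv = agentConversationMap.get(from);
  if (fromConv !== undefined && fromConv === agentConversationMap.get(to)) return fromConv;

  // D7: unrelated agents get a lazily created, cached peer conversation.
  const key = peerKey(from, to);
  let conv = peerConvIndex.get(key);
  if (conv === undefined) {
    conv = `peer:${key}`; // hypothetical ID scheme
    peerConvIndex.set(key, conv);
  }
  return conv;
}
```

Because the peer key is sorted, the first message in either direction creates the conversation and every later message reuses it, which is the behavior D7 describes for `peerConvIndex`.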
+{"id":"s-1z9o","uuid":"251030e1-ea2a-4857-95b3-2d94619f0b69","title":"Self-Driving Codebases Support","file_path":"specs/s-1z9o_self_driving_codebases_support.md","content":"\n# Self-Driving Codebases Support\n\nEnable macro-agent to support multiple agent team configurations — from the existing structured coordinator/integrator/worker pattern to Cursor-style autonomous self-driving teams with planners, workers, and judges — through a declarative team template layer on top of the existing role-agnostic core.\n\n## Design Principles\n\n- **Team templates as configuration, not code** — YAML directories that compose roles, define spawn graphs, set integration strategies, and configure coordination protocols\n- **No core rewrites** — team system composes on top of existing primitives (RoleRegistry, MessageRouter, EventStore, AgentManager) via surgical additions (interceptor hooks, new dep fields)\n- **Backward compatible** — all existing tests pass unchanged; no team loaded = identical behavior to pre-feature codebase\n\n## Resolved Design Decisions (Binding)\n\n| ID | Decision | Rationale |\n|----|----------|-----------|\n| **RD1** | Replace hardcoded `done()` role check with `roleRegistry.hasCapability()` | Team-defined roles need lifecycle.done |\n| **RD2** | Share team config via EventStore `team_config` event | Cross-process MCP subprocess access without HTTP API |\n| **RD3** | `spawn_rules` translate to capability additions (`agent.spawn.*`) | Single enforcement mechanism via existing capability system |\n| **RD4** | Team `customPrompt` replaces base role `systemPrompt` entirely | Team author owns the full role prompt |\n| **RD5** | Optimistic strategy is thin — validation is the judge agent's job | Strategy stays simple; behavior lives in agent prompts |\n| **RD6** | Team selection via `.macro-agent/config.json` with CLI `--team` override | Persistent project-level config without CLI flags |\n| **RD7** | Use `js-yaml` for YAML parsing | MIT, 1.4M weekly 
downloads, no native bindings |\n\n## Architecture Overview\n\n```\nTeam Template (.macro-agent/teams/<name>/)\n  ├── team.yaml        → TeamLoader.load()\n  ├── roles/*.yaml     → Role inheritance resolution\n  └── prompts/*.md     → Static prompt injection\n          │\n          ▼\n    TeamRuntime\n      ├── initialize() → Register roles, emit team_config, install spawn interceptor\n      ├── bootstrap()  → Spawn root + companions, peer subscriptions\n      └── teardown()   → Cleanup interceptor, stop monitoring\n          │\n    ┌─────┼──────────────────┐\n    ▼     ▼                  ▼\nRoleRegistry  IntegrationStrategy  TaskBackend\n(team roles)  Registry (strategy)  (claim methods)\n```\n\n## Subsystems\n\nSee child specs for detailed requirements per subsystem:\n- **Team Template System** (Phase 0-1): Types, loader, runtime, spawn interceptor, CLI integration\n- **Pluggable Integration Strategies** (Phase 2): Strategy interface, registry, queue/trunk/optimistic implementations\n- **Task Pull Model** (Phase 3): claim/unclaim/listClaimable on TaskBackend, MCP tools, capability gating\n- **Session Continuations** (Phase 4): continueAgent(), continuation monitoring for daemon agents\n- **Observability** (Phase 5): Throughput/utilization/error metrics, REST endpoints\n- **Communication Topology Gaps** (remaining work): Signal filtering, peer routing, enforcement, wake logic\n\n## Implementation Status: SUBSTANTIALLY COMPLETE\n\nAll 6 phases implemented. 71 team-specific tests passing, 3,052 total tests passing. 
Known gaps are documented in the Communication Topology Gaps child spec.\n\n## Source Documents\n\n- `docs/spec-self-driving-support.md` — Full spec with phase requirements\n- `docs/plan-self-driving-support.md` — High-level plan, design decisions D1-D5\n- `docs/implementation-details.md` — Resolved ambiguities A1-A10, code integration points\n- `docs/implementation-summary.md` — Post-implementation summary with gap audit\n- `docs/team-templates.md` — Team template format, communication topology, examples\n- `docs/teams.md` — User-facing schema reference and guide\n\n## File Inventory\n\n### New Modules (30 files)\n```\nsrc/teams/           — types.ts, team-loader.ts, team-runtime.ts, index.ts\nsrc/workspace/strategies/ — types.ts, registry.ts, queue.ts, trunk.ts, optimistic.ts, index.ts\nsrc/mcp/tools/       — claim_task.ts, unclaim_task.ts, list_claimable_tasks.ts\nsrc/metrics/         — metrics.ts, index.ts\nsrc/config/          — project-config.ts\n.macro-agent/teams/self-driving/ — team.yaml, roles/*, prompts/*\n.macro-agent/teams/structured/   — team.yaml, roles/*, prompts/*\n```\n\n### Modified Core Files (13 files)\n```\nsrc/agent/agent-manager.ts  — setSpawnInterceptor(), continueAgent(), customPrompt\nsrc/agent/types.ts          — ContinueAgentOptions, customPrompt, interactionPatterns\nsrc/lifecycle/handlers/worker.ts — Strategy dispatch, pull mode shouldTerminate\nsrc/lifecycle/handlers/index.ts  — integrationStrategy, taskMode in AllHandlerDeps\nsrc/mcp/tools/done.ts       — integrationStrategy, taskMode in DoneToolDeps\nsrc/mcp/mcp-server.ts       — claim tool registration, MCPServices extensions\nsrc/store/types/tasks.ts    — tags field on Task\nsrc/task/backend/types.ts   — ClaimFilter, claim/unclaim/listClaimable\nsrc/task/backend/memory.ts  — claim(), unclaim(), listClaimable() implementations\nsrc/roles/types.ts          — \"task.claim\" in TaskCapability union\nsrc/roles/capabilities.ts   — TASK_CAPABILITIES.CLAIM, 
CAPABILITY_TOOL_MAP\nsrc/api/server.ts           — /api/team, /api/metrics/* endpoints\nsrc/router/message-router.ts — emitStatus → topic routing bridge\n```\n","priority":1,"archived":0,"archived_at":null,"created_at":"2026-02-07 21:25:10","updated_at":"2026-02-07 21:25:10","parent_id":null,"parent_uuid":null,"relationships":[],"tags":["integration-strategies","observability","self-driving","task-pull","teams"]}
+{"id":"s-8pqr","uuid":"b63e29db-7c1e-47c5-bf36-247546a34f7c","title":"Team Template System (Phases 0-1)","file_path":"specs/s-8pqr_team_template_system_phases_0_1.md","content":"\n# Team Template System (Phases 0-1)\n\nFoundation layer: YAML-based team configuration loading, runtime wiring, spawn interception, CLI integration, and MCP subprocess context.\n\n## Status: COMPLETE\n\nAll requirements implemented and tested.\n\n## Phase 0: Prerequisites\n\n### P0.1: Fix done() capability check\n- **File**: `src/mcp/tools/done.ts`\n- **Change**: Replaced hardcoded role set in `hasLifecycleDoneCapability()` with `roleRegistry.hasCapability()` lookup\n- **Status**: Done\n\n### P0.2: Project config loader\n- **File**: `src/config/project-config.ts` (107 LOC)\n- **Schema**: `{ team?: string, [key: string]: unknown }`\n- **Behavior**: Reads `.macro-agent/config.json`, CLI flags override, missing file = empty config\n- **Status**: Done\n\n## Phase 1: Team Template System\n\n### P1.1: TeamManifest types (`src/teams/types.ts`, 377 LOC)\n- 16+ types: TeamManifest, TeamTopology, TopologyNode, TeamCommunication, ChannelDefinition, ChannelSubscription, CommunicationRouting, PeerConnection, MacroAgentExtensions, TeamRoleDefinition, ResolvedTeamRole, etc.\n- Generic fields separated from `macro_agent` namespace per interoperability design\n- **Status**: Done\n\n### P1.2: TeamLoader (`src/teams/team-loader.ts`, 434 LOC)\n7-step loading pipeline:\n1. Parse `team.yaml` with `js-yaml`\n2. Validate required fields\n3. Resolve each role: load `roles/<name>.yaml`, find base via `extends`, compute capabilities\n4. Translate `spawn_rules` → capability additions (RD3)\n5. Load prompt markdown files\n6. Load optional MCP server configs (`tools/mcp-servers.json`)\n7. 
Validate communication topology references\n\nError handling: `TeamLoadError` with codes: `MANIFEST_NOT_FOUND`, `INVALID_MANIFEST`, `ROLE_NOT_FOUND`, `PROMPT_NOT_FOUND`, `INVALID_COMMUNICATION`\n- **Status**: Done\n\n### P1.3: TeamRuntime (`src/teams/team-runtime.ts`, 428 LOC)\n- `initialize()`: Register roles → emit team_config → install spawn interceptor\n- `bootstrap()`: Spawn root + companions (parent: null) → peer subscriptions → continuation monitoring\n- `teardown()`: Clear interceptor, stop monitoring\n- **Status**: Done\n\n### P1.4: Spawn interceptor (`src/agent/agent-manager.ts`)\n- `setSpawnInterceptor()` method, called before capability checks\n- Injects: topics, MCP servers, env vars, customPrompt, interaction patterns\n- No-op when not set (backward compatible)\n- **Status**: Done\n\n### P1.5: System prompt team support\n- `customPrompt` replaces `resolvedRole.systemPrompt` (RD4)\n- `interactionPatterns` appended as additional sections (pull mode, trunk strategy notes)\n- **Status**: Done\n\n### P1.6: CLI and config integration (`src/cli/index.ts`)\n- `--team <name>` flag, project config fallback\n- TeamLoader → TeamRuntime → initialize → bootstrap after server start\n- **Status**: Done\n\n### P1.7: MCP subprocess team context\n- `team_config` event emitted to EventStore during initialize()\n- MCP subprocess reads it to reconstruct strategy and taskMode (RD2)\n- **Status**: Done\n\n### P1.8: API endpoint (`src/api/server.ts`)\n- `GET /api/team` returns `{ active, name, strategy, taskMode, enforcement }`\n- **Status**: Done\n\n### P1.9: Reference self-driving template\n- `.macro-agent/teams/self-driving/` — planner, grinder, judge roles\n- `.macro-agent/teams/structured/` — lead, developer, reviewer roles\n- **Status**: Done\n\n## Test Coverage\n- `src/teams/__tests__/team-system.test.ts` — 37 tests (loading, runtime, interceptor, getters)\n- `src/teams/__tests__/cross-subsystem.integration.test.ts` — 34 tests (strategy dispatch, pull mode, claims, 
metrics, capabilities, topology)\n- All 71 tests passing\n","priority":1,"archived":0,"archived_at":null,"created_at":"2026-02-07 21:25:34","updated_at":"2026-02-07 21:25:34","parent_id":"s-1z9o","parent_uuid":"251030e1-ea2a-4857-95b3-2d94619f0b69","relationships":[{"from":"s-8pqr","from_type":"spec","to":"s-1z9o","to_type":"spec","type":"implements"}],"tags":["phase-0","phase-1","self-driving","teams"]}
+{"id":"s-6ok3","uuid":"d635e0ed-3b7d-49c1-a8cb-db85a153dc96","title":"Pluggable Integration Strategies (Phase 2)","file_path":"specs/s-6ok3_pluggable_integration_strategies_phase_2.md","content":"\n# Pluggable Integration Strategies (Phase 2)\n\nStrategy pattern for how worker changes are landed onto the integration branch. Replaces hardcoded merge queue logic with pluggable dispatch.\n\n## Status: COMPLETE (minor gaps noted)\n\n## Architecture\n\n```\nWorker done() handler\n  ├── integrationStrategy exists? → strategy.land(request)\n  ├── mergeQueue exists?          → existing merge queue (fallback)\n  └── neither?                    → skip integration, warn\n```\n\n## Components\n\n### IntegrationStrategy interface (`src/workspace/strategies/types.ts`)\n```typescript\ninterface IntegrationStrategy {\n  readonly name: string;\n  land(request: LandRequest): Promise<LandResult>;\n  initialize?(): Promise<void>;\n  close?(): Promise<void>;\n}\n```\n- `LandRequest`: sourceBranch, targetBranch, workspacePath, agentId, taskId, streamId\n- `LandResult`: status (\"landed\"|\"conflict\"|\"failed\"), commitHash?, conflictFiles?, error?\n\n### Strategy Registry (`src/workspace/strategies/registry.ts`)\n- Factory pattern: `register(name, factory)`, `get(name, config)`\n- `defaultStrategyRegistry` singleton pre-registers all three built-in strategies\n\n### Built-in Strategies\n\n| Strategy | File | Behavior |\n|----------|------|----------|\n| **Queue** | `queue.ts` | Wraps `MergeQueueInterface.submit()`. Late binding via `setMergeQueue()` |\n| **Trunk** | `trunk.ts` | Direct push with rebase-retry loop. 
`maxRetries` (default: 3), 30s git timeout |\n| **Optimistic** | `optimistic.ts` | Same as trunk + emits `validation:requested` event via EventStore |\n\n### Worker handler changes (`src/lifecycle/handlers/worker.ts`)\n- Strategy dispatch in Step 4 before merge queue fallback\n- Pull mode: `shouldTerminate = false` when `taskMode === \"pull\"` and `status === \"completed\"`\n\n### Dependency wiring\n```\nMCPServices.integrationStrategy → DoneToolDeps → AllHandlerDeps → WorkerHandlerDeps → strategy.land()\n```\n\n## Minor Gaps\n- `TrunkStrategy.conflictAction` config stored but not used in `land()` (always abandons)\n- `initialize()`/`close()` lifecycle hooks defined in interface but never called by TeamRuntime or worker handler\n","priority":1,"archived":0,"archived_at":null,"created_at":"2026-02-07 21:25:48","updated_at":"2026-02-07 21:25:48","parent_id":"s-1z9o","parent_uuid":"251030e1-ea2a-4857-95b3-2d94619f0b69","relationships":[{"from":"s-6ok3","from_type":"spec","to":"s-1z9o","to_type":"spec","type":"implements"}],"tags":["integration-strategies","phase-2","self-driving","teams"]}
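
The rebase-retry behavior described for the trunk strategy can be sketched as follows. This is an illustration only: `GitOps` and `TrunkSketch` are hypothetical names, the `LandRequest` field types are assumed to be strings, `maxRetries` is treated as the total number of attempts, and the 30s git timeout mentioned in the spec is not modeled.

```typescript
interface LandRequest {
  sourceBranch: string;
  targetBranch: string;
  workspacePath: string;
  agentId: string;
  taskId: string;
  streamId: string;
}

interface LandResult {
  status: "landed" | "conflict" | "failed";
  commitHash?: string;
  conflictFiles?: string[];
  error?: string;
}

interface IntegrationStrategy {
  readonly name: string;
  land(request: LandRequest): Promise<LandResult>;
}

// Hypothetical git port, injected so the retry loop can be exercised without a repository.
interface GitOps {
  rebase(req: LandRequest): Promise<{ ok: boolean; conflictFiles?: string[] }>;
  push(req: LandRequest): Promise<{ ok: boolean; commitHash?: string }>;
}

class TrunkSketch implements IntegrationStrategy {
  readonly name = "trunk-sketch";

  constructor(private readonly git: GitOps, private readonly maxRetries = 3) {}

  async land(request: LandRequest): Promise<LandResult> {
    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      const rebased = await this.git.rebase(request);
      if (!rebased.ok) {
        // Conflicts are not retried; the result is returned to the caller.
        return { status: "conflict", conflictFiles: rebased.conflictFiles ?? [] };
      }
      const pushed = await this.git.push(request);
      if (pushed.ok) return { status: "landed", commitHash: pushed.commitHash };
      // Push rejected, most likely because the target moved: rebase and try again.
    }
    return { status: "failed", error: `push rejected after ${this.maxRetries} attempts` };
  }
}
```

Injecting the git operations keeps the strategy thin, in line with the spec's note that an optimistic variant is the same loop plus a `validation:requested` event.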
+{"id":"s-9yr3","uuid":"5852699c-d4bc-4a96-bfad-03e38d302ba8","title":"Task Pull Model (Phase 3)","file_path":"specs/s-9yr3_task_pull_model_phase_3.md","content":"\n# Task Pull Model (Phase 3)\n\nAutonomous task claiming where agents pull work from a shared pool rather than being explicitly assigned tasks.\n\n## Status: COMPLETE (InMemory backend only)\n\n## Components\n\n### TaskBackend extensions (`src/task/backend/types.ts`)\nOptional methods to avoid breaking existing backends:\n```typescript\nclaim?(agentId, filter?: ClaimFilter): Promise<ExtendedTask | null>;\nunclaim?(taskId): Promise<void>;\nlistClaimable?(filter?: ClaimFilter): Promise<ExtendedTask[]>;\n```\n`ClaimFilter`: `{ tags?, rootTasksOnly?, created_by? }`\n\n### InMemoryTaskBackend implementation (`src/task/backend/memory.ts`, lines 630-735)\n- `claim()`: listClaimable → take first candidate → re-check for contention → atomically assign\n- `unclaim()`: validate task exists + assigned → emit \"unassigned\" event\n- `listClaimable()`: pending + unassigned + unblocked + matching filters\n\n### Task tags (`src/store/types/tasks.ts`)\n- `tags?: string[]` on Task type for classification and filtered claiming\n\n### MCP tools\n| Tool | File | Purpose | Capability |\n|------|------|---------|------------|\n| `claim_task` | `src/mcp/tools/claim_task.ts` | Atomically claim next task | `task.claim` |\n| `unclaim_task` | `src/mcp/tools/unclaim_task.ts` | Return task to pool | `task.claim` |\n| `list_claimable_tasks` | `src/mcp/tools/list_claimable_tasks.ts` | Preview available tasks | `task.claim` |\n\n### Capability system\n- `TASK_CAPABILITIES.CLAIM = \"task.claim\"` in `src/roles/capabilities.ts`\n- `CAPABILITY_TOOL_MAP[task.claim] = [\"claim_task\", \"unclaim_task\", \"list_claimable_tasks\"]`\n\n## Known Gap\n- **SudocodeTaskBackend** does not implement `claim()`/`unclaim()`/`listClaimable()` — pull model is InMemory-only\n","priority":1,"archived":0,"archived_at":null,"created_at":"2026-02-07 
21:25:59","updated_at":"2026-02-07 21:25:59","parent_id":"s-1z9o","parent_uuid":"251030e1-ea2a-4857-95b3-2d94619f0b69","relationships":[{"from":"s-9yr3","from_type":"spec","to":"s-1z9o","to_type":"spec","type":"implements"}],"tags":["phase-3","self-driving","task-pull","teams"]}
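
The claim flow described in the Task Pull Model spec (list claimable, take the first candidate, re-check, assign) can be sketched with an in-memory pool. The names `PoolTask`, `TaskPoolSketch`, `parentId`, and `blockedBy` are hypothetical, the methods are synchronous for brevity whereas the spec's backend methods return promises, and tag matching is assumed to mean "at least one requested tag is present".

```typescript
type TaskStatus = "pending" | "in_progress" | "completed";

interface PoolTask {
  id: string;
  status: TaskStatus;
  assignee?: string;
  parentId?: string;
  blockedBy: string[];
  tags?: string[];
}

interface ClaimFilter {
  tags?: string[];
  rootTasksOnly?: boolean;
}

class TaskPoolSketch {
  private readonly tasks = new Map<string, PoolTask>();

  add(task: PoolTask): void {
    this.tasks.set(task.id, task);
  }

  // Pending + unassigned + unblocked + matching filters.
  listClaimable(filter: ClaimFilter = {}): PoolTask[] {
    return Array.from(this.tasks.values()).filter((t) => {
      if (t.status !== "pending" || t.assignee !== undefined) return false;
      // A task is blocked until every blocker exists and is completed.
      const blocked = t.blockedBy.some((id) => this.tasks.get(id)?.status !== "completed");
      if (blocked) return false;
      if (filter.rootTasksOnly && t.parentId !== undefined) return false;
      const wanted = filter.tags;
      if (wanted && !wanted.some((tag) => (t.tags ?? []).indexOf(tag) !== -1)) return false;
      return true;
    });
  }

  claim(agentId: string, filter?: ClaimFilter): PoolTask | null {
    const candidate = this.listClaimable(filter)[0];
    if (candidate === undefined) return null;
    // Re-check before assigning; this matters once awaits separate listing from assignment.
    if (candidate.assignee !== undefined) return null;
    candidate.assignee = agentId;
    return candidate;
  }

  unclaim(taskId: string): void {
    const task = this.tasks.get(taskId);
    if (!task || task.assignee === undefined) throw new Error(`task ${taskId} is not claimed`);
    delete task.assignee;
  }
}
```

Returning `null` when nothing is claimable lets a pull-mode worker idle or poll again instead of treating an empty pool as an error.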
+{"id":"s-5w36","uuid":"2e027264-1fcb-4943-bab6-4b2e35a5c2a0","title":"Session Continuations (Phase 4)","file_path":"specs/s-5w36_session_continuations_phase_4.md","content":"\n# Session Continuations (Phase 4)\n\nPersist agent session history so long-running agents can be resumed across process restarts.\n\n## Status: COMPLETE (untested continuation monitoring)\n\n## Components\n\n### continueAgent() (`src/agent/agent-manager.ts`, lines 1384-1450)\n1. Load original agent from EventStore\n2. Query status events (up to `maxMessages`, default 50)\n3. Format event summaries as \"Prior Session Context\" markdown\n4. Spawn new agent with same role/parent + resume context as `customPrompt`\n5. Emit continuation event (`continuation_of: agentId`)\n\n```typescript\ninterface ContinueAgentOptions {\n  maxMessages?: number;       // History depth (default: 50)\n  task?: string;              // Override task description\n  additionalContext?: string; // Extra context to prepend\n}\n```\n\n### Continuation monitoring (`src/teams/team-runtime.ts`)\n- `monitorContinuations()` watches agent lifecycle events via `onLifecycleEvent()`\n- Monitors root and companion agent IDs\n- On unexpected stop (not \"completed\" or \"cancelled\"): wait 1s → `continueAgent()` → update tracking\n- Only active when `lifecycle.continuations.enabled === true` in team manifest\n- Unsubscribes on teardown\n\n## Known Gap\n- **No test coverage** for `monitorContinuations()` — edge cases untested (multiple lifecycle events, spawn failure, state consistency)\n","priority":1,"archived":0,"archived_at":null,"created_at":"2026-02-07 21:26:08","updated_at":"2026-02-07 21:26:08","parent_id":"s-1z9o","parent_uuid":"251030e1-ea2a-4857-95b3-2d94619f0b69","relationships":[{"from":"s-5w36","from_type":"spec","to":"s-1z9o","to_type":"spec","type":"implements"}],"tags":["continuations","phase-4","self-driving","teams"]}
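
Two pieces of the continuation flow above lend themselves to a small sketch: deciding whether a stop is unexpected, and formatting prior status events as a "Prior Session Context" section. `StatusEvent`, `shouldContinue`, and `formatPriorContext` are hypothetical names, and keeping the most recent events when truncating to `maxMessages` is an assumption; the spec states only the limit and its default of 50.

```typescript
interface StatusEvent {
  timestamp: number; // epoch milliseconds
  summary: string;
}

// Stop reasons that should NOT trigger a continuation, per the spec.
const EXPECTED_STOPS = new Set<string>(["completed", "cancelled"]);

function shouldContinue(stopReason: string): boolean {
  return !EXPECTED_STOPS.has(stopReason);
}

// Keep the most recent `maxMessages` events and render them as a markdown section.
function formatPriorContext(events: StatusEvent[], maxMessages = 50): string {
  const sorted = events.slice().sort((a, b) => a.timestamp - b.timestamp);
  const recent = maxMessages > 0 ? sorted.slice(-maxMessages) : [];
  const lines = recent.map((e) => `- [${new Date(e.timestamp).toISOString()}] ${e.summary}`);
  return ["## Prior Session Context", "", ...lines].join("\n");
}
```

In the flow the spec describes, text like this would be passed as the resume context when the replacement agent is spawned with the same role and parent.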
+{"id":"s-ir97","uuid":"9d8a27cd-2a68-4664-8384-a5c1bb657f71","title":"Autonomous Observability (Phase 5)","file_path":"specs/s-ir97_autonomous_observability_phase_5.md","content":"\n# Autonomous Observability (Phase 5)\n\nMetrics primitives for monitoring throughput, error rates, and agent utilization during long-running multi-agent runs.\n\n## Status: COMPLETE\n\n## Components\n\n### Metrics functions (`src/metrics/metrics.ts`, 280 LOC)\n\n| Function | Queries | Returns |\n|----------|---------|---------|\n| `getThroughputMetrics(store, windowMs)` | Task events | tasksCompleted, tasksFailed, tasksCreated, completedPerMinute, avgCompletionTimeMs |\n| `getUtilizationMetrics(store, windowMs)` | Agent spawn/terminate | activeAgents, totalSpawned, totalStopped, agentsByRole, agentsByState |\n| `getErrorMetrics(store, windowMs, limit)` | Failed status + task events | totalErrors, errorsByType, recentErrors |\n\nAll functions query EventStore directly — no new event types or materialized views needed.\n\n### REST API endpoints (`src/api/server.ts`)\n\n| Endpoint | Query Params | Description |\n|----------|-------------|-------------|\n| `GET /api/metrics/throughput` | `window_ms` (default: 5min) | Task completion rates |\n| `GET /api/metrics/utilization` | — | Agent counts by role and state |\n| `GET /api/metrics/errors` | `window_ms` (default: 30min), `limit` (default: 20) | Error counts and recent failures |\n","priority":1,"archived":0,"archived_at":null,"created_at":"2026-02-07 21:26:16","updated_at":"2026-02-07 21:26:16","parent_id":"s-1z9o","parent_uuid":"251030e1-ea2a-4857-95b3-2d94619f0b69","relationships":[{"from":"s-ir97","from_type":"spec","to":"s-1z9o","to_type":"spec","type":"implements"}],"tags":["metrics","phase-5","self-driving","teams"]}
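
The throughput fields listed in the table above can be derived from task events with a single pass over a time window. The sketch below is an assumption-laden illustration: `TaskEvent`, `ThroughputSketch`, and `computeThroughput` are hypothetical, the events are a plain array rather than an EventStore query, and `now` is passed in so the result is deterministic.

```typescript
interface TaskEvent {
  taskId: string;
  action: "created" | "completed" | "failed";
  timestamp: number; // epoch milliseconds
}

interface ThroughputSketch {
  tasksCompleted: number;
  tasksFailed: number;
  tasksCreated: number;
  completedPerMinute: number;
  avgCompletionTimeMs: number;
}

function computeThroughput(events: TaskEvent[], windowMs: number, now: number): ThroughputSketch {
  const cutoff = now - windowMs;

  // Creation times are looked up across all events, so a task created
  // before the window still contributes a completion duration.
  const createdAt = new Map<string, number>();
  for (const e of events) {
    if (e.action === "created") createdAt.set(e.taskId, e.timestamp);
  }

  const inWindow = events.filter((e) => e.timestamp >= cutoff && e.timestamp <= now);
  const completed = inWindow.filter((e) => e.action === "completed");

  const durations: number[] = [];
  for (const e of completed) {
    const start = createdAt.get(e.taskId);
    if (start !== undefined) durations.push(e.timestamp - start);
  }
  const total = durations.reduce((sum, d) => sum + d, 0);

  return {
    tasksCompleted: completed.length,
    tasksFailed: inWindow.filter((e) => e.action === "failed").length,
    tasksCreated: inWindow.filter((e) => e.action === "created").length,
    completedPerMinute: windowMs > 0 ? completed.length / (windowMs / 60000) : 0,
    avgCompletionTimeMs: durations.length > 0 ? total / durations.length : 0,
  };
}
```

Computing the numbers on demand from events matches the spec's statement that the metrics need no new event types or materialized views.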
+{"id":"s-29pg","uuid":"7e8bbb6d-51df-403b-8929-5b2793603367","title":"Communication Topology Gaps","file_path":"specs/s-29pg_communication_topology_gaps.md","content":"\n# Communication Topology Gaps\n\nFeatures where YAML config was loaded and validated but not wired into runtime behavior. Identified in the post-implementation audit.\n\n## Status: COMPLETE\n\nAll 5 gaps resolved. Implementation feedback attached to each closed issue.\n\n## Gap 1: Signal Filtering — IMPLEMENTED\n\n**Issue**: `i-3o8g` (closed)\n\n`SignalFilter` callback on MessageRouter, installed by TeamRuntime. Checks two sources:\n1. Peer connection filters (per-agent-pair, directional) from `routing.peers[].signals`\n2. Channel subscription filters (per-role) from `subscriptions[role][].signals`\n\nSignal name carried in `details.signal` field. Untagged events always pass through.\n\n## Gap 2: Peer Routing from Config — IMPLEMENTED\n\n**Issue**: `i-96f6` (closed)\n\n`wirePeerRoutes()` in TeamRuntime reads `routing.peers`, maps `via` to subscription type:\n- `direct` → subtree subscription (directional)\n- `topic` → shared named topic for both agents\n- `scope` → role-based subscription\n\nDeferred wiring for roles not spawned at bootstrap. Signal filters stored per-peer.\n\n## Gap 3: Wake Logic for Status Delivery — IMPLEMENTED\n\n**Issue**: `i-4dh7` (closed)\n\nBoth `routeStatusToSubtreeSubscribers()` and `routeStatusToTopicSubscribers()` now call `wakeHandler` with `priority: \"normal\"` (wakes idle agents, queues for busy).\n\n## Gap 4: Emission Restrictions — IMPLEMENTED\n\n**Issue**: `i-1zso` (closed)\n\n`EmissionValidator` callback on MessageRouter. Checks agent role against `communication.emissions` allowlist. 
Enforcement mode determines action.\n\n## Gap 5: Enforcement Mode — IMPLEMENTED\n\n**Issue**: `i-1zso` (closed)\n\nBranches on enforcement mode in emission validator:\n- `strict` → reject (block emission)\n- `permissive` → warn (allow through)\n- `audit` → record `emission_violation` event in EventStore, allow through\n","priority":2,"archived":0,"archived_at":null,"created_at":"2026-02-07 21:26:38","updated_at":"2026-02-09 08:17:22","parent_id":"s-1z9o","parent_uuid":"251030e1-ea2a-4857-95b3-2d94619f0b69","relationships":[{"from":"s-29pg","from_type":"spec","to":"s-1z9o","to_type":"spec","type":"implements"}],"tags":["communication","gaps","self-driving","teams"]}
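The signal filtering rule from Gap 1 in the spec above can be sketched roughly as follows. This is an illustration only: the `StatusEvent` shape, the `makeSignalFilter` helper, and the "no entry means unfiltered" behaviour are assumptions, not the package's actual `SignalFilter` signature. Only the `details.signal` field and the rule that untagged events always pass through come from the spec text.

```typescript
// Hypothetical shapes for illustration; the real macro-agent types may differ.
interface StatusEvent {
  from: string;
  details?: { signal?: string };
}

type SignalFilter = (event: StatusEvent, recipientRole: string) => boolean;

// Builds a filter from per-role allowlists, loosely modeled on
// `subscriptions[role][].signals`. Assumption: a role with no entry
// is unfiltered and receives every signal.
function makeSignalFilter(allowedByRole: Map<string, string[]>): SignalFilter {
  return (event, recipientRole) => {
    const signal = event.details?.signal;
    if (signal === undefined) return true; // untagged events always pass through
    const allowed = allowedByRole.get(recipientRole);
    if (allowed === undefined) return true; // no filter configured for this role
    return allowed.includes(signal);
  };
}

const filter = makeSignalFilter(new Map([["judge", ["task_completed"]]]));
const tagged: StatusEvent = { from: "grinder-1", details: { signal: "task_started" } };
const untagged: StatusEvent = { from: "grinder-1" };

const judgeSeesTagged = filter(tagged, "judge"); // false: signal not in judge's list
const judgeSeesUntagged = filter(untagged, "judge"); // true: no signal tag
const plannerSeesTagged = filter(tagged, "planner"); // true: planner has no filter
```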

package/CLAUDE.md CHANGED

@@ -5,11 +5,16 @@ A multi-agent orchestration system for spawning and managing hierarchical Claude
 ## Project Overview
 macro-agent enables coordinated work across multiple AI agents with:
-- **Role-based agents** (Worker, Integrator, Coordinator, Monitor)
+- **Role-based agents** (Worker, Integrator, Coordinator, Monitor + custom team roles)
+- **Team templates** for declarative multi-agent topologies (YAML config)
+- **Pluggable integration strategies** (queue, trunk, optimistic)
 - **Workspace isolation** via git worktrees
 - **Merge queue** for serialized integration
-- **Task backend** abstraction (memory or sudocode)
+- **Task backend** abstraction (memory or sudocode) with push and pull modes
 - **In-flight steering** via context injection
+- **Signal filtering and emission enforcement** for communication topology
+- **Session continuations** for long-running daemon agents
+- **Observability** via throughput, utilization, and error metrics
 ## Architecture
@@ -20,10 +25,18 @@ macro-agent enables coordinated work across multiple AI agents with:
 └───────────────────────────┬─────────────────────────────────┘
                             │
 ┌───────────────────────────▼─────────────────────────────────┐
+│                     Team Runtime (optional)                   │
+│  - Loads team YAML config (topology, communication)         │
+│  - Bootstraps root + companion agents                       │
+│  - Installs spawn interceptor, signal filters, validators   │
+│  - Manages integration strategy and session continuations   │
+└───────────────────────────┬─────────────────────────────────┘
+                            │
+┌───────────────────────────▼─────────────────────────────────┐
 │                     Agent Manager                            │
 │  - Spawns agents via acp-factory                            │
-│  - Manages lifecycle (spawn, prompt, stop)                  │
-│  - Exposes sessions for context injection                   │
+│  - Manages lifecycle (spawn, prompt, stop, continue)        │
+│  - Spawn interception for team role/topic injection         │
 └───────────────────────────┬─────────────────────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
@@ -31,10 +44,11 @@ macro-agent enables coordinated work across multiple AI agents with:
         ▼                   ▼                   ▼
 ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
 │    Roles     │   │   Workspace  │   │    Tasks     │
-│  Worker      │   │  Bare Repo   │   │  Backend     │
-│  Integrator  │   │  Worktrees   │   │  (memory/    │
-│  Coordinator │   │  Pool        │   │   sudocode)  │
-│  Monitor     │   │              │   │              │
+│  Built-in +  │   │  Bare Repo   │   │  Backend     │
+│  Team-defined│   │  Worktrees   │   │  (memory/    │
+│  (via YAML)  │   │  Strategies  │   │   sudocode)  │
+│              │   │  (queue/     │   │  Push/Pull   │
+│              │   │   trunk/opt) │   │   modes      │
 └──────────────┘   └──────────────┘   └──────────────┘
         │                   │                   │
         └───────────────────┼───────────────────┘
@@ -43,7 +57,8 @@ macro-agent enables coordinated work across multiple AI agents with:
 │                     Message Router                           │
 │  - MAP addressing (agent, role, scope, parent/child)        │
 │  - sendToAddress() for all message routing                  │
-│  - Priority ordering (urgent > high > normal > low)         │
+│  - Topic-based status routing with signal filtering         │
+│  - Emission validation (strict/permissive/audit)            │
 │  - Activity waking for sleeping agents                      │
 └───────────────────────────┬─────────────────────────────────┘
                             │
@@ -51,7 +66,7 @@ macro-agent enables coordinated work across multiple AI agents with:
 │                      Event Store                             │
 │  - Append-only event log (SQLite)                           │
 │  - Materialized views for queries                           │
-│  - Agents, tasks, messages, events                          │
+│  - Agents, tasks, messages, events, team config             │
 └─────────────────────────────────────────────────────────────┘
 ```
@@ -65,34 +80,43 @@ src/
 │   └── session-mapper.ts   # Session → Agent ID mapping
 │
 ├── agent/               # Agent lifecycle
-│   ├── agent-manager.ts    # Spawn, prompt, stop agents
+│   ├── agent-manager.ts    # Spawn, prompt, stop, continue agents
 │   ├── wake.ts             # Wake sleeping agents
 │   └── system-prompt.ts    # Agent system prompts
 │
 ├── api/                 # REST API
-│   ├── server.ts           # Express routes
+│   ├── server.ts           # Express routes (agents, tasks, team, metrics)
 │   └── types.ts            # Request/response types
 │
 ├── cli/                 # Command-line interface
-│   └── index.ts            # CLI commands (start, chat, status)
+│   └── index.ts            # CLI commands (start, chat, status, --team)
+│
+├── config/              # Project configuration
+│   └── project-config.ts   # .macro-agent/config.json loader
 │
 ├── lifecycle/           # Agent lifecycle management
 │   ├── handlers/           # Role-specific done() handlers
-│   │   ├── worker.ts       # Worker completion (commit, merge request)
+│   │   ├── worker.ts       # Worker completion (strategy dispatch)
 │   │   ├── integrator.ts   # Integrator completion (merge queue)
 │   │   └── monitor.ts      # Monitor completion (health reporting)
 │   ├── cascade.ts          # Cascade termination
 │   └── cleanup.ts          # Workspace cleanup helpers
 │
 ├── mcp/                 # Model Context Protocol
-│   ├── mcp-server.ts       # Per-agent MCP server
+│   ├── mcp-server.ts       # Per-agent MCP server (role-based tool filtering)
 │   └── tools/              # MCP tool implementations
 │       ├── done.ts         # Generalized done() tool
-│       └── inject_context.ts # Context injection tool
+│       ├── inject_context.ts # Context injection tool
+│       ├── claim_task.ts   # Claim task from pool (pull mode)
+│       ├── unclaim_task.ts # Release claimed task (pull mode)
+│       └── list_claimable_tasks.ts # List available tasks (pull mode)
+│
+├── metrics/             # Observability
+│   └── metrics.ts          # Throughput, utilization, error metrics
 │
 ├── roles/               # Role system
 │   ├── types.ts            # RoleDefinition, Capability types
-│   ├── capabilities.ts     # Capability constants
+│   ├── capabilities.ts     # Capability constants (incl. task.claim)
 │   ├── registry.ts         # Role registry with resolution
 │   └── builtin/            # Built-in role definitions
 │       ├── worker.ts
@@ -101,10 +125,10 @@ src/
 │       └── monitor.ts
 │
 ├── router/              # Message routing
-│   ├── message-router.ts   # Core router with priority
+│   ├── message-router.ts   # Core router with signal filtering + emission validation
 │   ├── broadcast.ts        # Broadcast channel (fan-out)
 │   ├── role-resolver.ts    # Role → Agent resolution
-│   ├── wake.ts             # Activity waking
+│   ├── wake.ts             # Activity waking (status + message)
 │   └── types.ts            # Message, Channel types
 │
 ├── steering/            # In-flight steering
@@ -126,22 +150,42 @@ src/
 ├── task/                # Task management
 │   ├── task-manager.ts     # Legacy task manager
 │   └── backend/            # Pluggable task backends
-│       ├── types.ts        # TaskBackend interface
-│       ├── memory.ts       # InMemoryTaskBackend
+│       ├── types.ts        # TaskBackend interface (+ claim/unclaim/listClaimable)
+│       ├── memory.ts       # InMemoryTaskBackend (push + pull)
 │       ├── tool-provider.ts # Task MCP tools
 │       └── sudocode/       # Sudocode integration
 │
+├── teams/               # Team template system
+│   ├── types.ts            # TeamManifest, TeamTopology, TeamCommunication
+│   ├── team-loader.ts      # YAML loading, role resolution, validation
+│   ├── team-runtime.ts     # Initialize, bootstrap, peer routing, signal filtering
+│   └── index.ts            # Public exports
+│
 └── workspace/           # Workspace isolation
     ├── workspace-manager.ts # Worktree management
     ├── config.ts           # Workspace configuration
-    └── merge-queue/        # Merge queue
-        ├── merge-queue.ts  # SQLite-backed queue
-        ├── types.ts        # Queue types
-        └── schema.ts       # Database schema
+    ├── merge-queue/        # Merge queue
+    │   ├── merge-queue.ts  # SQLite-backed queue
+    │   ├── types.ts        # Queue types
+    │   └── schema.ts       # Database schema
+    └── strategies/         # Integration strategies
+        ├── types.ts        # IntegrationStrategy interface
+        ├── registry.ts     # Strategy factory registry
+        ├── queue.ts        # Queue strategy (wraps merge queue)
+        ├── trunk.ts        # Trunk strategy (direct push + rebase)
+        └── optimistic.ts   # Optimistic strategy (push + validation event)
 ```
 ## Key Concepts
+### Teams
+Teams are declarative YAML configurations that define multi-agent topologies:
+- **TeamLoader** (`team-loader.ts`): Parses `team.yaml`, resolves role inheritance, validates topology
+- **TeamRuntime** (`team-runtime.ts`): Wires team config into running services (roles, spawn interceptor, peer routing, signal filtering, emission validation, continuation monitoring)
+- Teams compose on top of existing primitives — no team loaded = identical behavior to pre-team codebase
+- Team config is shared across processes via EventStore `team_config` event
 ### Roles
 Agents are assigned roles that determine their capabilities:
@@ -153,18 +197,40 @@ Agents are assigned roles that determine their capabilities:
 | **Coordinator** | Orchestrate workers and manage tasks | Spawn agents, assign tasks, broadcast |
 | **Monitor** | Health monitoring and alerts | Read-only access, activity watching |
+Teams can define custom roles (e.g., planner, grinder, judge) that extend built-in roles via `extends` in `roles/<name>.yaml`. Custom roles can add/remove capabilities and provide custom prompts.
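A rough sketch of how an `extends`-based team role might resolve to a final capability set. The `resolveTeamRole` function, the interface shapes, the capability name `agent.spawn`, and the add-then-remove ordering are all assumptions made for illustration; only `extends`, `capabilities_add`, `capabilities_remove`, and `task.claim` appear in this diff.

```typescript
// Hypothetical shapes; the real RoleDefinition in src/roles/types.ts may differ.
interface RoleDefinition {
  name: string;
  capabilities: string[];
}

interface TeamRoleConfig {
  name: string;
  extends: string;
  capabilities_add?: string[];
  capabilities_remove?: string[];
}

// Resolve a YAML-defined role against the built-in roles.
// Assumption: additions are applied before removals.
function resolveTeamRole(
  config: TeamRoleConfig,
  builtins: Map<string, RoleDefinition>,
): RoleDefinition {
  const base = builtins.get(config.extends);
  if (base === undefined) {
    throw new Error(`unknown base role: ${config.extends}`);
  }
  const caps = new Set<string>(base.capabilities);
  for (const cap of config.capabilities_add ?? []) caps.add(cap);
  for (const cap of config.capabilities_remove ?? []) caps.delete(cap);
  return { name: config.name, capabilities: Array.from(caps) };
}

const builtins = new Map<string, RoleDefinition>([
  // `agent.spawn` is a made-up capability name for this example.
  ["worker", { name: "worker", capabilities: ["agent.spawn"] }],
]);

const grinder = resolveTeamRole(
  {
    name: "grinder",
    extends: "worker",
    capabilities_add: ["task.claim"],
    capabilities_remove: ["agent.spawn"],
  },
  builtins,
);
// grinder.capabilities is ["task.claim"]
```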
+### Integration Strategies
+Pluggable strategies for landing worker changes:
+- **Queue** (`queue.ts`): Wraps merge queue with serialized integration
+- **Trunk** (`trunk.ts`): Direct push with rebase-retry loop, configurable `conflictAction`
+- **Optimistic** (`optimistic.ts`): Same as trunk + emits `validation:requested` event
+Strategy is set per-team in `team.yaml` under `macro_agent.integration.strategy`.
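The trunk strategy's rebase-retry loop could look something like the sketch below. It is not the code in `trunk.ts`: the `GitOps` interface, the synchronous calls, the `maxAttempts` parameter, and the `ConflictAction` values are hypothetical stand-ins chosen so the example runs without git.

```typescript
type PushResult = "ok" | "rejected";
type RebaseResult = "clean" | "conflict";

// Injected git operations so the loop can be exercised without a repository.
interface GitOps {
  push(): PushResult;
  rebase(): RebaseResult;
}

// Hypothetical values; the diff only says `conflictAction` is configurable.
type ConflictAction = "fail" | "escalate";

type Outcome =
  | { status: "merged"; attempts: number }
  | { status: "conflict"; action: ConflictAction }
  | { status: "exhausted"; attempts: number };

function integrateOnTrunk(
  git: GitOps,
  maxAttempts: number,
  conflictAction: ConflictAction,
): Outcome {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (git.push() === "ok") {
      return { status: "merged", attempts: attempt };
    }
    // Push was rejected, so trunk has moved: rebase onto the new tip and retry.
    if (git.rebase() === "conflict") {
      return { status: "conflict", action: conflictAction };
    }
  }
  return { status: "exhausted", attempts: maxAttempts };
}

// Fake git: the first push is rejected, the second succeeds.
let pushes = 0;
const fakeGit: GitOps = {
  push: () => (++pushes < 2 ? "rejected" : "ok"),
  rebase: () => "clean",
};

const outcome = integrateOnTrunk(fakeGit, 3, "fail");
// outcome is { status: "merged", attempts: 2 }
```

Returning a tagged outcome rather than throwing keeps the conflict path explicit, which is where a configurable `conflictAction` would plug in.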
 ### Workspace Isolation
 Each worker gets an isolated git worktree:
 - Workers operate on feature branches
-- Changes merge through the queue
+- Changes merge through the queue or strategy
 - Conflicts detected and resolved by integrator
 ### Task Backend
 Two backends available:
-- **memory**: In-memory tasks with EventStore persistence
-- **sudocode**: External issue tracking with dependency management
+- **memory**: In-memory tasks with EventStore persistence (supports push + pull modes)
+- **sudocode**: External issue tracking with dependency management (push mode only)
+Pull mode adds `claim_task`, `unclaim_task`, `list_claimable_tasks` MCP tools (gated by `task.claim` capability).
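Pull mode can be pictured as a shared pool that agents claim from. The class below is a minimal in-memory sketch under that assumption; `ClaimableTaskPool`, the `Task` shape, and the rule that only the current holder may unclaim are hypothetical, and the `task.claim` capability check is not modeled here.

```typescript
// Hypothetical task shape; the real TaskBackend types may differ.
interface Task {
  id: string;
  title: string;
  claimedBy?: string;
}

class ClaimableTaskPool {
  private tasks = new Map<string, Task>();

  add(task: Task): void {
    this.tasks.set(task.id, task);
  }

  // Tasks that no agent currently holds.
  listClaimable(): Task[] {
    return Array.from(this.tasks.values()).filter((t) => t.claimedBy === undefined);
  }

  // Returns false if the task is unknown or already claimed.
  claim(taskId: string, agentId: string): boolean {
    const task = this.tasks.get(taskId);
    if (task === undefined || task.claimedBy !== undefined) return false;
    task.claimedBy = agentId;
    return true;
  }

  // Assumption: only the agent holding the claim may release it.
  unclaim(taskId: string, agentId: string): boolean {
    const task = this.tasks.get(taskId);
    if (task === undefined || task.claimedBy !== agentId) return false;
    task.claimedBy = undefined;
    return true;
  }
}

const pool = new ClaimableTaskPool();
pool.add({ id: "t1", title: "Fix flaky test" });
pool.add({ id: "t2", title: "Update docs" });

const firstClaim = pool.claim("t1", "grinder-1"); // true
const secondClaim = pool.claim("t1", "grinder-2"); // false: already claimed
const remaining = pool.listClaimable().map((t) => t.id); // ["t2"]
```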
+### Communication Topology
+Teams configure non-hierarchical communication via:
+- **Channels**: Named topics (e.g., `work_coordination`, `task_updates`) with defined signals
+- **Subscriptions**: Per-role channel subscriptions with optional signal filters
+- **Peer routing**: Directional connections between roles (`via: direct|topic|scope`) with per-peer signal filters
+- **Emissions**: Per-role allowed signal lists, enforced by emission validator
+- **Enforcement**: `strict` (reject violations), `permissive` (warn), `audit` (record to EventStore)
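The three enforcement modes listed above could be branched on roughly like this. The function name, the `Sinks` callbacks, and the decision to treat a role with no allowlist entry as having no permitted signals are assumptions for the sketch; the mode names and the `emission_violation` event type come from the diff.

```typescript
type EnforcementMode = "strict" | "permissive" | "audit";

interface EmissionDecision {
  allowed: boolean;
  violation: boolean;
}

// Hypothetical sinks standing in for a logger and the EventStore.
interface Sinks {
  warn: (message: string) => void;
  recordEvent: (event: { type: "emission_violation"; role: string; signal: string }) => void;
}

function validateEmission(
  mode: EnforcementMode,
  allowlist: Map<string, string[]>,
  role: string,
  signal: string,
  sinks: Sinks,
): EmissionDecision {
  // Assumption: a role missing from the allowlist may emit nothing.
  const permitted = allowlist.get(role)?.includes(signal) ?? false;
  if (permitted) return { allowed: true, violation: false };

  switch (mode) {
    case "strict": // reject: block the emission
      return { allowed: false, violation: true };
    case "permissive": // warn, but let it through
      sinks.warn(`role "${role}" emitted unlisted signal "${signal}"`);
      return { allowed: true, violation: true };
    case "audit": // record the violation, then let it through
      sinks.recordEvent({ type: "emission_violation", role, signal });
      return { allowed: true, violation: true };
  }
}

const warnings: string[] = [];
const recorded: string[] = [];
const sinks: Sinks = {
  warn: (message) => { warnings.push(message); },
  recordEvent: (event) => { recorded.push(event.type); },
};
const allowlist = new Map<string, string[]>([["grinder", ["task_completed"]]]);

const strictResult = validateEmission("strict", allowlist, "grinder", "plan_updated", sinks);
const auditResult = validateEmission("audit", allowlist, "grinder", "plan_updated", sinks);
// strictResult.allowed is false; auditResult.allowed is true and one event is recorded
```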
 ### Context Injection
@@ -226,13 +292,20 @@ npm run test:e2e            # E2E tests (requires RUN_E2E_TESTS=true)
 3. Export tool info and handler
 4. Register in `src/mcp/mcp-server.ts`
-### Adding a New Role
+### Adding a New Built-in Role
 1. Create `src/roles/builtin/your_role.ts`
 2. Define `RoleDefinition` with capabilities
 3. Add enforcement implementations
 4. Register in `src/roles/builtin/index.ts`
+### Adding a Team Role (via YAML)
+1. Create `.macro-agent/teams/<team>/roles/<role>.yaml` with `extends` base role
+2. Add `capabilities_add`/`capabilities_remove` as needed
+3. Create `.macro-agent/teams/<team>/prompts/<role>.md` for custom prompt
+4. Reference the role in `team.yaml` topology and communication sections
 ### Modifying Task Backend
 1. Update interface in `src/task/backend/types.ts`
@@ -247,10 +320,17 @@ npm run test:e2e            # E2E tests (requires RUN_E2E_TESTS=true)
 | `SUDOCODE_PROJECT_PATH` | Path to sudocode project | `cwd` |
 | `MACRO_WORKSPACE_POOL_SIZE` | Max concurrent workspaces | `10` |
 | `MACRO_MERGE_QUEUE_DB` | Merge queue SQLite path | `:memory:` |
+| `MACRO_TEAM_NAME` | Team name (injected into agent env by team runtime) | — |
+| `MACRO_TASK_MODE` | Task mode: `push` or `pull` (injected by team runtime) | — |
+| `MACRO_INTEGRATION_STRATEGY` | Integration strategy name (injected by team runtime) | — |
+| `MACRO_INSTANCE_ID` | EventStore instance ID (for MCP subprocess access) | — |
+| `MACRO_BASE_DIR` | EventStore base directory (for MCP subprocess access) | — |
 ## References
 - [README.md](README.md) - User-facing documentation
-- [docs/sudocode-integration.md](docs/sudocode-integration.md) - Sudocode backend details
 - [docs/architecture.md](docs/architecture.md) - Full architecture documentation
 - [docs/configuration.md](docs/configuration.md) - Configuration reference
+- [docs/teams.md](docs/teams.md) - Team template schema reference
+- [docs/team-templates.md](docs/team-templates.md) - Team template format and examples
+- [docs/sudocode-integration.md](docs/sudocode-integration.md) - Sudocode backend details