@doingdev/opencode-claude-manager-plugin 0.1.55 → 0.1.57 (npm version diff, Mend)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/dist/manager/team-orchestrator.d.ts +10 -3
package/dist/manager/team-orchestrator.js +108 -17
package/dist/plugin/agent-hierarchy.js +7 -3
package/dist/plugin/claude-manager.plugin.d.ts +8 -0
package/dist/plugin/claude-manager.plugin.js +38 -15
package/dist/prompts/registry.js +107 -57
package/dist/src/manager/team-orchestrator.d.ts +12 -5
package/dist/src/manager/team-orchestrator.js +111 -20
package/dist/src/plugin/agent-hierarchy.d.ts +2 -2
package/dist/src/plugin/agent-hierarchy.js +15 -20
package/dist/src/plugin/claude-manager.plugin.d.ts +8 -0
package/dist/src/plugin/claude-manager.plugin.js +51 -27
package/dist/src/plugin/service-factory.js +1 -1
package/dist/src/prompts/registry.js +115 -57
package/dist/src/types/contracts.d.ts +4 -1
package/dist/test/claude-manager.plugin.test.js +94 -13
package/dist/test/prompt-registry.test.js +26 -12
package/dist/test/report-claude-event.test.js +16 -3
package/dist/test/team-orchestrator.test.js +127 -7
package/dist/types/contracts.d.ts +1 -1
package/package.json +1 -1

package/dist/src/prompts/registry.js CHANGED

@@ -1,68 +1,93 @@
 export const managerPromptRegistry = {
     ctoSystemPrompt: [
         'You are a principal engineer orchestrating a team of AI-powered engineers.',
-        'You multiply your output by delegating precisely and reviewing critically.',
-        'Every prompt you send to an engineer costs time and tokens. Make each one count.',
-        '',
-        'Understand first:',
-        '- Before asking the user anything, extract what you can from the user message, codebase (read/grep/glob/codesearch), prior engineer results, and `websearch`/`webfetch` when relevant.',
-        '- Ask the user only when the answer would materially change scope, architecture, risk, or how you verify the outcome—and you cannot resolve it from those sources.',
-        '- Do not ask for facts you can discover yourself: file paths, current behavior, architecture, or framework conventions.',
-        '- Before using `question`, silently check: is it in the user message? answerable from code or transcripts? from web? If still blocked, is this a real decision or only uncertainty tolerance?',
-        '- Identify what already exists in the codebase before creating anything new.',
-        '- Think about what could go wrong and address it upfront.',
-        '- When a bug is reported, always explore the root cause before implementing a fix. No fix without investigation. If three fix attempts fail, question the architecture, not the hypothesis.',
-        '',
-        'Questions (high bar):',
-        '- Good questions resolve irreversible choices, product tradeoffs, or ambiguous success criteria that the codebase cannot answer.',
-        '- Bad questions ask for information already in context, or vague prompts like "what exactly do you want?" when you can give a concrete recommendation and what would change your mind.',
-        '- Each `question` should name the blocked decision, offer 2–3 concrete options, state your recommendation, and what breaks if the user picks differently.',
-        '- Use the `question` tool only when you cannot proceed safely from available evidence. One high-leverage question at a time, with a sensible fallback if the user defers.',
-        '',
-        'Challenge the framing:',
-        '- Not a mandatory opener: if the request is concrete, derive context first; reframe only when it would change what you build.',
-        '- Before planning, ask what the user is actually trying to achieve, not just what they asked for.',
-        '- If the request sounds like a feature ("add photo upload"), ask what job-to-be-done it serves. The real feature might be larger or different.',
-        '- One good reframe question saves more time than ten implementation questions.',
-        '',
-        'Plan and decompose:',
-        '- Break work into independent pieces that can run in parallel. Two engineers exploring in parallel then synthesizing beats one engineer doing everything sequentially.',
-        '- For medium or large tasks, delegate dual-engineer exploration to two engineers, then task the `architect` subagent to synthesize their independent plans into one stronger plan.',
-        '- Define clear success criteria before delegating. A good assignment includes: what to do, why, which files/areas are relevant, and how to verify it worked.',
-        '',
-        'Delegate through the Task tool:',
-        '- Tom, John, Maya, Sara, and Alex are persistent engineers. Each keeps a Claude Code session that remembers prior turns.',
-        '- Reuse the same engineer when follow-up work belongs to their prior context.',
-        '- Only one implementing engineer should modify the worktree at a time. Parallelize exploration freely.',
-        '- Do not delegate without telling the engineer what done looks like.',
-        '',
-        'Review and iterate:',
-        '- Review diffs with `git_diff`, inspect changed files with `git_status`, and use `git_log` for recent context.',
-        '- Give specific, actionable feedback. Not "this could be better" but "this is wrong because X, fix it by doing Y."',
-        '- Trust engineer findings but verify critical claims. Do not re-examine every file they already reviewed.',
-        '- If something fails, figure out what you missed in the assignment, not just what the engineer got wrong.',
-        '- After an engineer reports implementation done, review the diff looking for issues that pass tests but break in production: race conditions, N+1 queries, missing error handling, trust boundary violations, stale reads, forgotten enum cases.',
-        '- Auto-fix mechanical issues by sending a follow-up to the same engineer. Surface genuinely ambiguous issues to the user.',
-        '- Check scope: did the engineer build what was asked — nothing more, nothing less?',
-        '',
-        'Verify before declaring done:',
+        'Your role is to decompose work, delegate precisely, review diffs for production risks, and verify outcomes.',
+        'You do not write code. All edits go through engineers. You multiply output by coordinating parallel work and catching issues others miss.',
+        '',
+        '# Operating Loop: Orient → Classify → Plan → Delegate → Review → Verify → Close',
+        '',
+        '## Orient: Understand the request',
+        '- Extract what you can from the user message, codebase (read/grep/glob/codesearch), prior engineer results, and `websearch`/`webfetch` when relevant.',
+        '- Light investigation is fine: read files briefly to understand scope, check what already exists, avoid re-inventing.',
+        '- When a bug is reported, ask: what is the root cause? Do not assume. Delegate root-cause exploration if the answer is in code the user should review first.',
+        '- If requirements are vague or architecture is unclear, use `question` tool with 2–3 concrete options, your recommendation, and what breaks if user picks differently.',
+        '- Only ask when the decision will materially change scope, architecture, risk, or how you verify—and you cannot resolve it from context.',
+        '',
+        '## Classify: Frame the work',
+        '- Is this a bug fix, feature, refactor, or something else?',
+        '- What could go wrong? Is it reversible or irreversible? Can it fail in prod?',
+        '- Does it require careful rollout, data migration, observability, or backwards compatibility handling?',
+        '- Are there decisions the user has not explicitly made (architecture, scope, deployment strategy)?',
+        '',
+        '## Plan: Decompose into engineer work',
+        '- For small, focused tasks: delegate to a named engineer with structured context (goal, acceptance criteria, relevant files, constraints, verification).',
+        "- For medium or large tasks: use `task(subagent_type: 'team-planner', ...)` for dual-engineer exploration and plan synthesis.",
+        '  - Team-planner automatically selects two non-overlapping engineers by availability and context; you may optionally specify lead and challenger.',
+        '  - Challenger engineer identifies missing decisions, risks, and scope gaps before implementation.',
+        '- Break work into independent pieces that can run in parallel. Two engineers exploring then synthesizing beats one engineer doing everything sequentially.',
+        '- Before delegating, state your success criteria, not just the task. What done looks like. How you will verify it.',
+        '',
+        '## Delegate: Send precise assignments',
+        "- For single-engineer work: use `task(subagent_type: 'tom'|'john'|'maya'|'sara'|'alex', ...)` and structure the prompt with goal, acceptance criteria, relevant files, constraints, and verification.",
+        "- For dual-engineer planning: use `task(subagent_type: 'team-planner', ...)` which will lead + challenger synthesis.",
+        '- Each assignment includes: goal, acceptance criteria, relevant files/areas, constraints, and verification method.',
+        '- Reuse the same engineer when follow-up work builds on their prior context.',
+        '- Only one implementing engineer modifies the worktree at a time. Parallelize exploration and research freely.',
+        '',
+        '## Review: Inspect diffs for production safety',
+        '- After an engineer reports implementation done, review the diff with `git_diff` before declaring it complete.',
+        '- Use `git_log` and `git_status` for recent context.',
+        '- Check for these production-risk patterns (issues tests may not catch):',
+        '  - Race conditions: concurrent access to shared state, missing locks or atomic operations.',
+        '  - N+1 queries: loops that fetch data repeatedly instead of batch-loading.',
+        '  - Missing error handling: uncaught exceptions, unhandled promise rejections, missing null checks.',
+        '  - Trust boundary violations: user input used without validation, permissions not checked.',
+        '  - Stale reads: reading state without synchronization or caching without invalidation logic.',
+        '  - Forgotten enum cases: switches without default, missing case handlers.',
+        '  - Backwards compatibility: breaking API changes, schema migrations without rollback plan.',
+        '  - Observability gaps: no logging, metrics, or tracing for critical paths.',
+        '  - Rollout risk: changes that must be coordinated across services or require staged rollout.',
+        '- Give specific, actionable feedback. Not "this could be better" but "line 42 has a race condition because X; fix it by doing Y."',
+        '- Trust engineer findings but verify critical claims.',
+        '- Check scope: did the engineer build what was asked—nothing more, nothing less?',
+        '',
+        '## Verify: Run checks before shipping',
         '- After review passes, dispatch an engineer in verify mode to run the most relevant checks (tests, lint, typecheck, build) for what changed.',
         '- Do not declare a task complete until verification passes. If it fails, fix and re-verify.',
         '',
-        'Constraints:',
+        '## Close: Report outcome to user',
+        '- If everything verifies and passes review, tell the user the work is done and what changed.',
+        '- If a recommended question from planning was not yet surfaced to the user, surface it now with `question` tool before closing.',
+        '- If the work discovered unexpected scope or product decisions, ask the user before proceeding further.',
+        '',
+        '# Decision-Making Rules',
+        '',
+        '- Questions: Use the `question` tool when a decision will materially affect scope, architecture, or how you verify the outcome. Name the decision, offer 2–3 concrete options, state your recommendation, and say what breaks if the user picks differently. One high-leverage question at a time.',
+        '- Reframing: Before planning, ask what the user is actually trying to achieve, not just what they asked for. If the request sounds like a feature, ask what job-to-be-done it serves.',
+        '- Engineer selection: When assigning to a single engineer, prefer lower context pressure and less-recently-used engineers. Reuse if follow-up work builds on prior context.',
+        '- Failure handling:',
+        "  - contextExhausted: The engineer's session ran out of tokens. The system automatically resets and retries once with the same task on a fresh session.",
+        '  - sdkError or toolDenied: The underlying SDK failed or a tool call was denied. Investigate the error, adjust constraints, and retry.',
+        '  - engineerBusy: Wait, or choose a different engineer.',
+        '  - aborted: The user cancelled the work. Stop and report the cancellation.',
+        '',
+        '# Constraints',
+        '',
         '- Do not edit files or run bash directly. Engineers do the hands-on work.',
-        '- Do not read files or grep when an engineer can answer the question faster.',
+        '- Light investigation is fine for orientation (read, grep, glob). Delegate deeper exploration if it saves the engineer context.',
         '- Communicate proactively. If the plan changes or you discover something unexpected, tell the user.',
-        '- Ask follow-up questions when exploration, engineer results, or diffs expose a product or architecture tradeoff you could not have known at the start. Prefer that timing over opening with speculative clarifiers.',
+        '- Do not proceed with implementation if you cannot state success criteria.',
     ].join('\n'),
     engineerAgentPrompt: [
         "You are a named engineer on the CTO's team.",
-        'Your job is to run assignments through the `claude` tool, which connects to a persistent Claude Code session that remembers your prior turns.',
+        'The CTO sends assignments through a structured prompt containing: goal, mode (explore/implement/verify), context, acceptance criteria, relevant paths, constraints, and verification method.',
+        'Your job is to parse the assignment and run it through the `claude` tool, which connects to a persistent Claude Code session that remembers your prior turns.',
         '',
-        'Frame each assignment well:',
-        '- Include relevant context, file paths, and constraints the CTO provided.',
+        'How to handle assignments:',
+        '- Extract goal, mode, acceptance criteria, relevant files, and verification from the prompt.',
+        '- If any critical field is missing (e.g., no verification method), ask the CTO for clarification before proceeding.',
+        '- Frame the assignment for Claude Code using the provided structure.',
         '- Specify the work mode: explore (investigate, no edits), implement (make changes and verify), or verify (run checks and report).',
-        "- If the CTO's assignment is unclear, ask for clarification before sending it to Claude Code.",
         '',
         'Your wrapper context from prior turns is reloaded automatically. Use it to avoid repeating work or re-explaining context that Claude Code already knows.',
         "Return the tool result directly. Add your own commentary only when something was unexpected or needs the CTO's attention.",
@@ -71,16 +96,40 @@ export const managerPromptRegistry = {
         'You are an expert software engineer working inside Claude Code.',
         'Start with the smallest investigation that resolves the key uncertainty, then act.',
         'Follow repository conventions, AGENTS.md, and any project-level instructions.',
-        'Verify your own work before reporting done. Run the most relevant check (test, lint, typecheck, build) for what you changed.',
-        'Review your own diff before reporting done. Look for issues tests would not catch: race conditions, missing error handling, hardcoded values, incomplete enum handling.',
+        '',
+        'When investigating bugs:',
+        '- Always explore the root cause before implementing a fix. Do not assume; verify.',
+        '- If three fix attempts fail, question the architecture, not the hypothesis.',
+        '',
+        'When writing code:',
+        '- Consider rollout/migration/observability implications: Will this require staged rollout, data migration, new metrics, or log/trace points?',
+        '- Check for backwards compatibility: Will this change break existing APIs, integrations, or data formats?',
+        '- Think about failure modes: What happens if this code fails? Is it recoverable? Is there an audit trail?',
+        '',
+        'Verify your work before reporting done:',
+        '- Run the most relevant check (test, lint, typecheck, build) for what you changed.',
+        '- Review your own diff. Look for these issues tests may not catch:',
+        '  - Race conditions (concurrent access, missing locks).',
+        '  - N+1 queries or similar performance patterns.',
+        '  - Missing error handling or unhandled edge cases.',
+        '  - Hardcoded values that should be configurable.',
+        '  - Incomplete enum handling (missing cases).',
+        '  - Trust boundary violations (user input not validated).',
+        '  - Stale reads or cache invalidation bugs.',
+        '',
         'Report blockers immediately with exact error output. Do not retry silently more than once.',
         'Do not run git commit, git push, git reset, git checkout, or git stash.',
     ].join('\n'),
-    architectSystemPrompt: [
-        'You are the Architect. Your role is to synthesize two independent engineering plans into one stronger, unified plan.',
+    planSynthesisPrompt: [
+        'You are synthesizing two independent engineering plans into one stronger, unified plan.',
         'Compare the lead and challenger plans on clarity, feasibility, risk, and fit to the user request.',
         'Prefer the simplest path that fully addresses the goal. Surface tradeoffs honestly.',
-        'If the plans disagree on something only the user can decide, surface exactly one recommended question and one recommended answer.',
+        '',
+        'Identify the single most important decision the user must make to execute this plan safely and correctly.',
+        '- Look for disagreements between plans, scope boundaries, deployment/rollout strategy, backwards compatibility, or architectural tradeoffs.',
+        '- The user may have stated preferences in their request; check if anything is still unsolved.',
+        'Write it as Recommended Question and Recommended Answer. Only write NONE if no external decision is genuinely required.',
+        '',
         'Do not editorialize or over-explain. Be direct and concise.',
         '',
         'Use this output format exactly:',
@@ -91,6 +140,15 @@ export const managerPromptRegistry = {
         '## Recommended Answer',
         '<answer or NONE>',
     ].join('\n'),
+    teamPlannerPrompt: [
+        'You are the team planner. Your only job is to invoke `plan_with_team`.',
+        '`plan_with_team` dispatches two engineers in parallel (lead + challenger) then synthesizes their plans.',
+        '',
+        'Call `plan_with_team` immediately with the task and any engineer names provided.',
+        '- If lead and challenger engineer names are both specified, use them.',
+        '- If either name is missing, `plan_with_team` will auto-select two non-overlapping engineers based on availability and context.',
+        'Do not attempt any planning or analysis yourself. Delegate entirely to `plan_with_team`.',
+    ].join('\n'),
     contextWarnings: {
         moderate: 'Engineer context is getting full ({percent}% estimated). Reuse is still fine, but keep the next prompt focused.',
         high: 'Engineer context is heavy ({percent}% estimated, {turns} turns, ${cost}). Prefer a narrowly scoped follow-up or internal compaction.',
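
The failure-handling rules added to `ctoSystemPrompt` in this file (retry once on `contextExhausted`, investigate on `sdkError` or `toolDenied`, wait or reassign on `engineerBusy`, stop on `aborted`) can be read as a small control flow. The sketch below is only an illustration of that reading: `runWithFailureHandling`, `runTask`, `resetSession`, the result shape, and the action labels are all hypothetical and do not appear in the package.

```javascript
// Hypothetical sketch; none of these names come from the package.
// Assumption: runTask(engineer, task) resolves to either
//   { ok: true, output } or { ok: false, failure: '<failure kind>' }.
async function runWithFailureHandling(runTask, resetSession, engineer, task) {
    let result = await runTask(engineer, task);

    if (!result.ok && result.failure === 'contextExhausted') {
        // Reset the session and retry exactly once with the same task.
        await resetSession(engineer);
        result = await runTask(engineer, task);
    }

    if (result.ok) {
        return { action: 'done', output: result.output };
    }

    // Map the remaining failure kinds to the next step the prompt describes.
    switch (result.failure) {
        case 'sdkError':
        case 'toolDenied':
            return { action: 'investigate-and-retry', failure: result.failure };
        case 'engineerBusy':
            return { action: 'wait-or-reassign', failure: result.failure };
        case 'aborted':
            return { action: 'stop-and-report', failure: result.failure };
        default:
            // Includes a second contextExhausted after the single retry.
            return { action: 'escalate', failure: result.failure };
    }
}
```

Injecting `runTask` and `resetSession` keeps the sketch independent of whatever the real orchestrator exposes; only the retry-once rule and the four failure kinds are taken from the prompt text.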

package/dist/src/types/contracts.d.ts CHANGED

@@ -2,7 +2,10 @@ export interface ManagerPromptRegistry {
     ctoSystemPrompt: string;
     engineerAgentPrompt: string;
     engineerSessionPrompt: string;
-    architectSystemPrompt: string;
+    /** Prompt prepended to the user prompt of the synthesis runTask call inside plan_with_team. */
+    planSynthesisPrompt: string;
+    /** Visible subagent prompt for teamPlanner — thin bridge that calls plan_with_team. */
+    teamPlannerPrompt: string;
     contextWarnings: {
         moderate: string;
         high: string;
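
`planSynthesisPrompt` asks for a fixed output format with `## Synthesis`, `## Recommended Question`, and `## Recommended Answer` sections, where `NONE` marks the absence of a question or answer. A consumer of that output might split it as sketched below. This is a guess at one reasonable approach; `parseSynthesisOutput` and the returned field names are hypothetical, and the package's own parsing is not shown in this diff.

```javascript
// Hypothetical helper; not part of the package.
// Splits text on "## <Heading>" lines and treats "NONE" (or an empty body) as null.
function parseSynthesisOutput(text) {
    const sections = {};
    let current = null;

    for (const line of text.split('\n')) {
        const heading = /^## (.+)$/.exec(line.trim());
        if (heading) {
            current = heading[1];
            sections[current] = [];
        } else if (current !== null) {
            sections[current].push(line);
        }
    }

    const body = (name) => (sections[name] ?? []).join('\n').trim();
    const orNull = (value) => (value === '' || value === 'NONE' ? null : value);

    return {
        synthesis: body('Synthesis'),
        recommendedQuestion: orNull(body('Recommended Question')),
        recommendedAnswer: orNull(body('Recommended Answer')),
    };
}
```

Returning `null` for `NONE` lets a caller decide whether to surface a `question` without comparing strings at each call site.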

package/dist/test/claude-manager.plugin.test.js CHANGED

@@ -1,6 +1,6 @@
 import { describe, expect, it } from 'vitest';
 import { ClaudeManagerPlugin } from '../src/plugin/claude-manager.plugin.js';
-import { AGENT_CTO, AGENT_ARCHITECT, ENGINEER_AGENT_IDS, ENGINEER_AGENT_NAMES, } from '../src/plugin/agent-hierarchy.js';
+import { AGENT_CTO, AGENT_TEAM_PLANNER, ENGINEER_AGENT_IDS, ENGINEER_AGENT_NAMES, } from '../src/plugin/agent-hierarchy.js';
 describe('ClaudeManagerPlugin', () => {
     it('configures CTO with orchestration tools and question access', async () => {
         const plugin = await ClaudeManagerPlugin({
@@ -34,16 +34,38 @@ describe('ClaudeManagerPlugin', () => {
             git_log: 'allow',
             claude: 'deny',
         });
+        // Task permissions should include both uppercase (user-friendly) and lowercase (canonical) agent IDs.
         expect(cto.permission.task).toEqual({
             '*': 'deny',
+            Tom: 'allow',
             tom: 'allow',
+            John: 'allow',
             john: 'allow',
+            Maya: 'allow',
             maya: 'allow',
+            Sara: 'allow',
             sara: 'allow',
+            Alex: 'allow',
             alex: 'allow',
-            architect: 'allow',
+            'team-planner': 'allow',
         });
     });
+    it('allows CTO to delegate to engineers using both uppercase and lowercase agent IDs', async () => {
+        const plugin = await ClaudeManagerPlugin({
+            worktree: '/tmp/project',
+        });
+        const config = {};
+        await plugin.config?.(config);
+        const agents = (config.agent ?? {});
+        const cto = agents[AGENT_CTO];
+        const taskPerms = cto.permission.task;
+        // Verify both uppercase and lowercase can be used for delegation.
+        // This prevents delegation failures when users write task(subagent_type: 'Maya') vs task(subagent_type: 'maya').
+        expect(taskPerms.Tom).toBe('allow');
+        expect(taskPerms.tom).toBe('allow');
+        expect(taskPerms.Maya).toBe('allow');
+        expect(taskPerms.maya).toBe('allow');
+    });
     it('configures every named engineer with only the claude bridge tool', async () => {
         const plugin = await ClaudeManagerPlugin({
             worktree: '/tmp/project',
@@ -69,26 +91,25 @@ describe('ClaudeManagerPlugin', () => {
             expect(agent.permission).not.toHaveProperty('grep');
         }
     });
-    it('configures architect as a read-only subagent for plan synthesis', async () => {
+    it('configures team-planner as a planning-bridge subagent', async () => {
         const plugin = await ClaudeManagerPlugin({
             worktree: '/tmp/project',
         });
         const config = {};
         await plugin.config?.(config);
         const agents = (config.agent ?? {});
-        const architect = agents[AGENT_ARCHITECT];
-        expect(architect).toBeDefined();
-        expect(architect.mode).toBe('subagent');
-        expect(architect.description.toLowerCase()).toContain('synthesiz');
-        expect(architect.permission).toMatchObject({
+        const teamPlanner = agents[AGENT_TEAM_PLANNER];
+        expect(teamPlanner).toBeDefined();
+        expect(teamPlanner.mode).toBe('subagent');
+        expect(teamPlanner.description.toLowerCase()).toContain('plan');
+        expect(teamPlanner.permission).toMatchObject({
             '*': 'deny',
-            read: 'allow',
-            grep: 'allow',
-            glob: 'allow',
-            list: 'allow',
-            codesearch: 'allow',
+            plan_with_team: 'allow',
+            question: 'allow',
             claude: 'deny',
         });
+        expect(teamPlanner.permission).not.toHaveProperty('read');
+        expect(teamPlanner.permission).not.toHaveProperty('grep');
     });
     it('registers the named engineer bridge and team status tools', async () => {
         const plugin = await ClaudeManagerPlugin({
@@ -129,3 +150,63 @@ describe('ClaudeManagerPlugin', () => {
         expect(plugin['experimental.chat.system.transform']).toBeTypeOf('function');
     });
 });
+describe('Agent ID normalization and lookup helpers', () => {
+    it('normalizeAgentId converts mixed-case agent IDs to lowercase', async () => {
+        const { normalizeAgentId } = await import('../src/plugin/claude-manager.plugin.js');
+        expect(normalizeAgentId('Tom')).toBe('tom');
+        expect(normalizeAgentId('MAYA')).toBe('maya');
+        expect(normalizeAgentId('john')).toBe('john');
+        expect(normalizeAgentId('JoHn')).toBe('john');
+    });
+    it('engineerFromAgent resolves both uppercase and lowercase agent IDs', async () => {
+        const { engineerFromAgent } = await import('../src/plugin/claude-manager.plugin.js');
+        // Lowercase (canonical)
+        expect(engineerFromAgent('tom')).toBe('Tom');
+        expect(engineerFromAgent('maya')).toBe('Maya');
+        expect(engineerFromAgent('john')).toBe('John');
+        // Uppercase (normalized)
+        expect(engineerFromAgent('Tom')).toBe('Tom');
+        expect(engineerFromAgent('Maya')).toBe('Maya');
+        expect(engineerFromAgent('John')).toBe('John');
+        // Mixed case
+        expect(engineerFromAgent('JoHn')).toBe('John');
+        expect(engineerFromAgent('mAyA')).toBe('Maya');
+    });
+    it('engineerFromAgent throws on invalid agent IDs', async () => {
+        const { engineerFromAgent } = await import('../src/plugin/claude-manager.plugin.js');
+        expect(() => engineerFromAgent('invalid')).toThrow('The claude tool can only be used from a named engineer agent');
+        expect(() => engineerFromAgent('TomInvalid')).toThrow('The claude tool can only be used from a named engineer agent');
+    });
+    it('isEngineerAgent identifies both uppercase and lowercase agent IDs', async () => {
+        const { isEngineerAgent } = await import('../src/plugin/claude-manager.plugin.js');
+        // Lowercase (canonical)
+        expect(isEngineerAgent('tom')).toBe(true);
+        expect(isEngineerAgent('maya')).toBe(true);
+        expect(isEngineerAgent('john')).toBe(true);
+        expect(isEngineerAgent('sara')).toBe(true);
+        expect(isEngineerAgent('alex')).toBe(true);
+        // Uppercase (normalized)
+        expect(isEngineerAgent('Tom')).toBe(true);
+        expect(isEngineerAgent('Maya')).toBe(true);
+        expect(isEngineerAgent('John')).toBe(true);
+        expect(isEngineerAgent('Sara')).toBe(true);
+        expect(isEngineerAgent('Alex')).toBe(true);
+        // Mixed case
+        expect(isEngineerAgent('JoHn')).toBe(true);
+        expect(isEngineerAgent('mAyA')).toBe(true);
+        // Invalid
+        expect(isEngineerAgent('invalid')).toBe(false);
+        expect(isEngineerAgent('cto')).toBe(false);
+        expect(isEngineerAgent('team-planner')).toBe(false);
+    });
+    it('CTO agent config does not have direct assign_engineer access (delegates to named engineers)', async () => {
+        const { buildCtoAgentConfig } = await import('../src/plugin/agent-hierarchy.js');
+        const { managerPromptRegistry } = await import('../src/prompts/registry.js');
+        const ctoConfig = buildCtoAgentConfig(managerPromptRegistry);
+        const ctoPermissions = ctoConfig.permission;
+        // CTO should NOT have direct access to assign_engineer (uses task() to named engineers instead)
+        expect(ctoPermissions['assign_engineer']).not.toBe('allow');
+        // CTO should NOT have direct access to plan_with_team (must delegate to team-planner)
+        expect(ctoPermissions['plan_with_team']).not.toBe('allow');
+    });
+});
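
The new tests above pin down the observable behaviour of `normalizeAgentId`, `engineerFromAgent`, and `isEngineerAgent`, and the shape of the CTO `task` permission map. The following is a minimal reimplementation that would satisfy those assertions, not the package's actual source (which this diff view does not include for these helpers). `ENGINEER_NAMES`, `ENGINEER_BY_ID`, and `buildTaskPermissions` are hypothetical names introduced for illustration.

```javascript
// Assumption: the five engineer names are taken from the test expectations above.
const ENGINEER_NAMES = ['Tom', 'John', 'Maya', 'Sara', 'Alex'];

// Lowercase (canonical) ID -> display name.
const ENGINEER_BY_ID = new Map(ENGINEER_NAMES.map((name) => [name.toLowerCase(), name]));

function normalizeAgentId(agentId) {
    return agentId.toLowerCase();
}

function isEngineerAgent(agentId) {
    return ENGINEER_BY_ID.has(normalizeAgentId(agentId));
}

function engineerFromAgent(agentId) {
    const name = ENGINEER_BY_ID.get(normalizeAgentId(agentId));
    if (name === undefined) {
        // Message text copied from the toThrow() expectation in the tests.
        throw new Error('The claude tool can only be used from a named engineer agent');
    }
    return name;
}

// Hypothetical builder for the permission map the first test expects:
// deny by default, allow both casings of each engineer, allow team-planner.
function buildTaskPermissions() {
    const permissions = { '*': 'deny' };
    for (const name of ENGINEER_NAMES) {
        permissions[name] = 'allow';
        permissions[name.toLowerCase()] = 'allow';
    }
    permissions['team-planner'] = 'allow';
    return permissions;
}
```

Normalizing once at lookup time covers mixed-case input such as `'JoHn'`, while the permission map still lists both `Tom` and `tom` because, per the test comment, the CTO may write either form in `task(subagent_type: ...)`.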

package/dist/test/prompt-registry.test.js CHANGED

@@ -3,11 +3,13 @@ import { managerPromptRegistry } from '../src/prompts/registry.js';
 describe('managerPromptRegistry', () => {
     it('gives the CTO explicit orchestration guidance', () => {
         expect(managerPromptRegistry.ctoSystemPrompt).toContain('You are a principal engineer orchestrating a team of AI-powered engineers');
-        expect(managerPromptRegistry.ctoSystemPrompt).toContain('Task tool');
-        expect(managerPromptRegistry.ctoSystemPrompt).toContain('dual-engineer');
-        expect(managerPromptRegistry.ctoSystemPrompt).toContain('architect');
+        expect(managerPromptRegistry.ctoSystemPrompt).toContain('Operating Loop');
+        expect(managerPromptRegistry.ctoSystemPrompt).toContain('named engineer');
+        expect(managerPromptRegistry.ctoSystemPrompt).toContain('team-planner');
         expect(managerPromptRegistry.ctoSystemPrompt).toContain('question');
-        expect(managerPromptRegistry.ctoSystemPrompt).toContain('Tom, John, Maya, Sara, and Alex');
+        expect(managerPromptRegistry.ctoSystemPrompt).toContain('Review: Inspect diffs for production safety');
+        expect(managerPromptRegistry.ctoSystemPrompt).toContain('race condition');
+        expect(managerPromptRegistry.ctoSystemPrompt).toContain('contextExhausted');
         expect(managerPromptRegistry.ctoSystemPrompt).not.toContain('clear_session');
         expect(managerPromptRegistry.ctoSystemPrompt).not.toContain('freshSession');
     });
@@ -21,20 +23,32 @@ describe('managerPromptRegistry', () => {
     it('keeps the engineer session prompt direct and repo-aware', () => {
         expect(managerPromptRegistry.engineerSessionPrompt).toContain('expert software engineer');
         expect(managerPromptRegistry.engineerSessionPrompt).toContain('Start with the smallest investigation that resolves the key uncertainty');
-        expect(managerPromptRegistry.engineerSessionPrompt).toContain('Verify your own work');
+        expect(managerPromptRegistry.engineerSessionPrompt).toContain('Verify your work before reporting done');
         expect(managerPromptRegistry.engineerSessionPrompt).toContain('Do not run git commit');
+        expect(managerPromptRegistry.engineerSessionPrompt).toContain('rollout');
+        expect(managerPromptRegistry.engineerSessionPrompt).toContain('backwards compatibility');
     });
     it('keeps context warnings available for engineer sessions', () => {
         expect(managerPromptRegistry.contextWarnings.moderate).toContain('{percent}');
         expect(managerPromptRegistry.contextWarnings.high).toContain('{turns}');
         expect(managerPromptRegistry.contextWarnings.critical).toContain('near capacity');
     });
-    it('gives the architect synthesis guidance with complete output format', () => {
-        expect(managerPromptRegistry.architectSystemPrompt).toContain('Architect');
-        expect(managerPromptRegistry.architectSystemPrompt).toContain('synthesiz');
-        expect(managerPromptRegistry.architectSystemPrompt).toContain('two independent');
-        expect(managerPromptRegistry.architectSystemPrompt).toContain('## Synthesis');
-        expect(managerPromptRegistry.architectSystemPrompt).toContain('## Recommended Question');
-        expect(managerPromptRegistry.architectSystemPrompt).toContain('## Recommended Answer');
+    it('planSynthesisPrompt contains synthesis guidance and complete output format', () => {
+        expect(managerPromptRegistry.planSynthesisPrompt).toContain('synthesiz');
+        expect(managerPromptRegistry.planSynthesisPrompt).toContain('two independent');
+        expect(managerPromptRegistry.planSynthesisPrompt).toContain('## Synthesis');
+        expect(managerPromptRegistry.planSynthesisPrompt).toContain('## Recommended Question');
+        expect(managerPromptRegistry.planSynthesisPrompt).toContain('## Recommended Answer');
+    });
+    it('teamPlannerPrompt directs the agent to call plan_with_team with autonomous engineer selection', () => {
+        expect(managerPromptRegistry.teamPlannerPrompt).toContain('plan_with_team');
+        expect(managerPromptRegistry.teamPlannerPrompt).toContain('auto-select');
+        expect(managerPromptRegistry.teamPlannerPrompt).toContain('engineer');
+    });
+    it('ctoSystemPrompt delegates single work to named engineers via task() and dual work to team-planner', () => {
+        expect(managerPromptRegistry.ctoSystemPrompt).toContain('task(subagent_type:');
+        expect(managerPromptRegistry.ctoSystemPrompt).toContain('single-engineer');
+        expect(managerPromptRegistry.ctoSystemPrompt).toContain('team-planner');
+        expect(managerPromptRegistry.ctoSystemPrompt).toContain('automatically selects');
     });
 });

package/dist/test/report-claude-event.test.js CHANGED

@@ -163,6 +163,19 @@ describe('reportClaudeEvent — via plugin onEvent chain', () => {
         expect(call.title).toBe('⚡ Maya → git_status');
         expect(call.metadata.toolArgs).toEqual({});
     });
+    it('surfaces status event as visible metadata', async () => {
+        const event = {
+            type: 'status',
+            text: 'Context exhausted; resetting session and retrying once with a fresh session.',
+        };
+        const { plugin } = await setupPlugin([event]);
+        const { metadata, ctx } = makeContext(tempRoot, ENGINEER_AGENT_IDS.Sara, 'wrapper-6');
+        await executeClaude(plugin, ctx);
+        const statusCall = metadata.mock.calls.find(([c]) => c?.title?.includes('ℹ️'))?.[0];
+        expect(statusCall).toBeDefined();
+        expect(statusCall.title).toBe('ℹ️ Sara: Context exhausted; resetting session and retrying once with a fresh session.');
+        expect(statusCall.metadata.status).toBe('Context exhausted; resetting session and retrying once with a fresh session.');
+    });
 });
 // ── Second-invocation continuity ─────────────────────────────────────────────
 describe('second invocation continuity', () => {
@@ -180,7 +193,7 @@ describe('second invocation continuity', () => {
         // ── Phase 1: first task via orchestrator (no real SDK needed) ──────────
         const store = new TeamStateStore();
         await store.setActiveTeam(tempRoot, 'cto-1');
-        const orchestrator = new TeamOrchestrator({ runTask: vi.fn() }, store, { appendEvents: vi.fn(async () => undefined) }, 'Base prompt', 'Architect prompt');
+        const orchestrator = new TeamOrchestrator({ runTask: vi.fn() }, store, { appendEvents: vi.fn(async () => undefined) }, 'Base prompt', 'Synthesis prompt');
         await orchestrator.recordWrapperSession(tempRoot, 'cto-1', 'Tom', 'wrapper-tom-1');
         await orchestrator.recordWrapperExchange(tempRoot, 'cto-1', 'Tom', 'wrapper-tom-1', 'explore', 'Investigate the auth flow', 'Found two race conditions in the token refresh path.');
         // ── Phase 2: process restart ───────────────────────────────────────────
@@ -206,7 +219,7 @@ describe('second invocation continuity', () => {
         // ── Phase 1: pre-seed Tom with a claudeSessionId ───────────────────────
         const store = new TeamStateStore();
         await store.setActiveTeam(tempRoot, 'cto-1');
-        const orchestrator = new TeamOrchestrator({ runTask: vi.fn() }, store, { appendEvents: vi.fn(async () => undefined) }, 'Base prompt', 'Architect prompt');
+        const orchestrator = new TeamOrchestrator({ runTask: vi.fn() }, store, { appendEvents: vi.fn(async () => undefined) }, 'Base prompt', 'Synthesis prompt');
         await orchestrator.getOrCreateTeam(tempRoot, 'cto-1');
         await store.updateTeam(tempRoot, 'cto-1', (team) => ({
             ...team,
@@ -240,7 +253,7 @@ describe('second invocation continuity', () => {
         expect(runTask).toHaveBeenCalledOnce();
         expect(runTask.mock.calls[0]?.[0]).toMatchObject({
             resumeSessionId: 'ses-tom-persisted',
-            systemPrompt: undefined, // no new system prompt when resuming
         });
+        expect(runTask.mock.calls[0]?.[0].systemPrompt).toBeUndefined(); // no system prompt when resuming
     });
 });