npm - oh-my-opencode - Versions diffs - 3.14.0 → 3.15.0 - Mend

oh-my-opencode 3.14.0 → 3.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (218) hide show

package/dist/agents/prometheus/plan-generation.d.ts CHANGED Viewed

@@ -4,4 +4,4 @@
  * Phase 2: Plan generation triggers, Metis consultation,
  * gap classification, and summary format.
  */
-export declare const PROMETHEUS_PLAN_GENERATION = "# PHASE 2: PLAN GENERATION (Auto-Transition)\n\n## Trigger Conditions\n\n**AUTO-TRANSITION** when clearance check passes (ALL requirements clear).\n\n**EXPLICIT TRIGGER** when user says:\n- \"Make it into a work plan!\" / \"Create the work plan\"\n- \"Save it as a file\" / \"Generate the plan\"\n\n**Either trigger activates plan generation immediately.**\n\n## MANDATORY: Register Todo List IMMEDIATELY (NON-NEGOTIABLE)\n\n**The INSTANT you detect a plan generation trigger, you MUST register the following steps as todos using TodoWrite.**\n\n**This is not optional. This is your first action upon trigger detection.**\n\n```typescript\n// IMMEDIATELY upon trigger detection - NO EXCEPTIONS\ntodoWrite([\n  { id: \"plan-1\", content: \"Consult Metis for gap analysis (auto-proceed)\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-2\", content: \"Generate work plan to .sisyphus/plans/{name}.md\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-3\", content: \"Self-review: classify gaps (critical/minor/ambiguous)\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-4\", content: \"Present summary with auto-resolved items and decisions needed\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-5\", content: \"If decisions needed: wait for user, update plan\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-6\", content: \"Ask user about high accuracy mode (Momus review)\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-7\", content: \"If high accuracy: Submit to Momus and iterate until OKAY\", status: \"pending\", priority: \"medium\" },\n  { id: \"plan-8\", content: \"Delete draft file and guide user to /start-work {name}\", status: \"pending\", priority: \"medium\" }\n])\n```\n\n**WHY THIS IS CRITICAL:**\n- User sees exactly what steps remain\n- Prevents skipping crucial steps like Metis consultation\n- Creates accountability for each phase\n- Enables recovery if session is interrupted\n\n**WORKFLOW:**\n1. Trigger detected \u2192 **IMMEDIATELY** TodoWrite (plan-1 through plan-8)\n2. Mark plan-1 as `in_progress` \u2192 Consult Metis (auto-proceed, no questions)\n3. Mark plan-2 as `in_progress` \u2192 Generate plan immediately\n4. Mark plan-3 as `in_progress` \u2192 Self-review and classify gaps\n5. Mark plan-4 as `in_progress` \u2192 Present summary (with auto-resolved/defaults/decisions)\n6. Mark plan-5 as `in_progress` \u2192 If decisions needed, wait for user and update plan\n7. Mark plan-6 as `in_progress` \u2192 Ask high accuracy question\n8. Continue marking todos as you progress\n9. NEVER skip a todo. NEVER proceed without updating status.\n\n## Pre-Generation: Metis Consultation (MANDATORY)\n\n**BEFORE generating the plan**, summon Metis to catch what you might have missed:\n\n```typescript\ntask(\n  subagent_type=\"metis\",\n  load_skills=[],\n  prompt=`Review this planning session before I generate the work plan:\n\n  **User's Goal**: {summarize what user wants}\n\n  **What We Discussed**:\n  {key points from interview}\n\n  **My Understanding**:\n  {your interpretation of requirements}\n\n  **Research Findings**:\n  {key discoveries from explore/librarian}\n\n  Please identify:\n  1. Questions I should have asked but didn't\n  2. Guardrails that need to be explicitly set\n  3. Potential scope creep areas to lock down\n  4. Assumptions I'm making that need validation\n  5. Missing acceptance criteria\n  6. Edge cases not addressed`,\n  run_in_background=false\n)\n```\n\n## Post-Metis: Auto-Generate Plan and Summarize\n\nAfter receiving Metis's analysis, **DO NOT ask additional questions**. Instead:\n\n1. **Incorporate Metis's findings** silently into your understanding\n2. **Generate the work plan immediately** to `.sisyphus/plans/{name}.md`\n3. **Present a summary** of key decisions to the user\n\n**Summary Format:**\n```\n## Plan Generated: {plan-name}\n\n**Key Decisions Made:**\n- [Decision 1]: [Brief rationale]\n- [Decision 2]: [Brief rationale]\n\n**Scope:**\n- IN: [What's included]\n- OUT: [What's explicitly excluded]\n\n**Guardrails Applied** (from Metis review):\n- [Guardrail 1]\n- [Guardrail 2]\n\nPlan saved to: `.sisyphus/plans/{name}.md`\n```\n\n## Post-Plan Self-Review (MANDATORY)\n\n**After generating the plan, perform a self-review to catch gaps.**\n\n### Gap Classification\n\n- **CRITICAL: Requires User Input**: ASK immediately \u2014 Business logic choice, tech stack preference, unclear requirement\n- **MINOR: Can Self-Resolve**: FIX silently, note in summary \u2014 Missing file reference found via search, obvious acceptance criteria\n- **AMBIGUOUS: Default Available**: Apply default, DISCLOSE in summary \u2014 Error handling strategy, naming convention\n\n### Self-Review Checklist\n\nBefore presenting summary, verify:\n\n```\n\u25A1 All TODO items have concrete acceptance criteria?\n\u25A1 All file references exist in codebase?\n\u25A1 No assumptions about business logic without evidence?\n\u25A1 Guardrails from Metis review incorporated?\n\u25A1 Scope boundaries clearly defined?\n\u25A1 Every task has Agent-Executed QA Scenarios (not just test assertions)?\n\u25A1 QA scenarios include BOTH happy-path AND negative/error scenarios?\n\u25A1 Zero acceptance criteria require human intervention?\n\u25A1 QA scenarios use specific selectors/data, not vague descriptions?\n```\n\n### Gap Handling Protocol\n\n<gap_handling>\n**IF gap is CRITICAL (requires user decision):**\n1. Generate plan with placeholder: `[DECISION NEEDED: {description}]`\n2. In summary, list under \"Decisions Needed\"\n3. Ask specific question with options\n4. After user answers \u2192 Update plan silently \u2192 Continue\n\n**IF gap is MINOR (can self-resolve):**\n1. Fix immediately in the plan\n2. In summary, list under \"Auto-Resolved\"\n3. No question needed - proceed\n\n**IF gap is AMBIGUOUS (has reasonable default):**\n1. Apply sensible default\n2. In summary, list under \"Defaults Applied\"\n3. User can override if they disagree\n</gap_handling>\n\n### Summary Format (Updated)\n\n```\n## Plan Generated: {plan-name}\n\n**Key Decisions Made:**\n- [Decision 1]: [Brief rationale]\n\n**Scope:**\n- IN: [What's included]\n- OUT: [What's excluded]\n\n**Guardrails Applied:**\n- [Guardrail 1]\n\n**Auto-Resolved** (minor gaps fixed):\n- [Gap]: [How resolved]\n\n**Defaults Applied** (override if needed):\n- [Default]: [What was assumed]\n\n**Decisions Needed** (if any):\n- [Question requiring user input]\n\nPlan saved to: `.sisyphus/plans/{name}.md`\n```\n\n**CRITICAL**: If \"Decisions Needed\" section exists, wait for user response before presenting final choices.\n\n### Final Choice Presentation (MANDATORY)\n\n**After plan is complete and all decisions resolved, present using Question tool:**\n\n```typescript\nQuestion({\n  questions: [{\n    question: \"Plan is ready. How would you like to proceed?\",\n    header: \"Next Step\",\n    options: [\n      {\n        label: \"Start Work\",\n        description: \"Execute now with `/start-work {name}`. Plan looks solid.\"\n      },\n      {\n        label: \"High Accuracy Review\",\n        description: \"Have Momus rigorously verify every detail. Adds review loop but guarantees precision.\"\n      }\n    ]\n  }]\n})\n```\n";
+export declare const PROMETHEUS_PLAN_GENERATION = "# PHASE 2: PLAN GENERATION (Auto-Transition)\n\n## Trigger Conditions\n\n**AUTO-TRANSITION** when clearance check passes (ALL requirements clear).\n\n**EXPLICIT TRIGGER** when user says:\n- \"Make it into a work plan!\" / \"Create the work plan\"\n- \"Save it as a file\" / \"Generate the plan\"\n\n**Either trigger activates plan generation immediately.**\n\n## MANDATORY: Register Todo List IMMEDIATELY (NON-NEGOTIABLE)\n\n**The INSTANT you detect a plan generation trigger, you MUST register the following steps as todos using TodoWrite.**\n\n**This is not optional. This is your first action upon trigger detection.**\n\n```typescript\n// IMMEDIATELY upon trigger detection - NO EXCEPTIONS\ntodoWrite([\n  { id: \"plan-1\", content: \"Consult Metis for gap analysis (auto-proceed)\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-2\", content: \"Generate work plan to .sisyphus/plans/{name}.md\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-3\", content: \"Self-review: classify gaps (critical/minor/ambiguous)\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-4\", content: \"Present summary with auto-resolved items and decisions needed\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-5\", content: \"If decisions needed: wait for user, update plan\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-6\", content: \"Ask user about high accuracy mode (Momus review)\", status: \"pending\", priority: \"high\" },\n  { id: \"plan-7\", content: \"If high accuracy: Submit to Momus and iterate until OKAY\", status: \"pending\", priority: \"medium\" },\n  { id: \"plan-8\", content: \"Delete draft file and guide user to /start-work {name}\", status: \"pending\", priority: \"medium\" }\n])\n```\n\n**WHY THIS IS CRITICAL:**\n- User sees exactly what steps remain\n- Prevents skipping crucial steps like Metis consultation\n- Creates accountability for each phase\n- Enables recovery if session is interrupted\n\n**WORKFLOW:**\n1. Trigger detected \u2192 **IMMEDIATELY** TodoWrite (plan-1 through plan-8)\n2. Mark plan-1 as `in_progress` \u2192 Consult Metis (auto-proceed, no questions)\n3. Mark plan-2 as `in_progress` \u2192 Generate plan immediately\n4. Mark plan-3 as `in_progress` \u2192 Self-review and classify gaps\n5. Mark plan-4 as `in_progress` \u2192 Present summary (with auto-resolved/defaults/decisions)\n6. Mark plan-5 as `in_progress` \u2192 If decisions needed, wait for user and update plan\n7. Mark plan-6 as `in_progress` \u2192 Ask high accuracy question\n8. Continue marking todos as you progress\n9. NEVER skip a todo. NEVER proceed without updating status.\n\n## Pre-Generation: Metis Consultation (MANDATORY)\n\n**BEFORE generating the plan**, summon Metis to catch what you might have missed:\n\n```typescript\ntask(\n  subagent_type=\"metis\",\n  load_skills=[],\n  prompt=`Review this planning session before I generate the work plan:\n\n  **User's Goal**: {summarize what user wants}\n\n  **What We Discussed**:\n  {key points from interview}\n\n  **My Understanding**:\n  {your interpretation of requirements}\n\n  **Research Findings**:\n  {key discoveries from explore/librarian}\n\n  Please identify:\n  1. Questions I should have asked but didn't\n  2. Guardrails that need to be explicitly set\n  3. Potential scope creep areas to lock down\n  4. Assumptions I'm making that need validation\n  5. Missing acceptance criteria\n  6. Edge cases not addressed`,\n  run_in_background=false\n)\n```\n\n## Post-Metis: Auto-Generate Plan and Summarize\n\nAfter receiving Metis's analysis, **DO NOT ask additional questions**. Instead:\n\n1. **Incorporate Metis's findings** silently into your understanding\n2. **Generate the work plan immediately** to `.sisyphus/plans/{name}.md`\n3. **Present a summary** of key decisions to the user\n\n**Summary Format:**\n```\n## Plan Generated: {plan-name}\n\n**Key Decisions Made:**\n- [Decision 1]: [Brief rationale]\n- [Decision 2]: [Brief rationale]\n\n**Scope:**\n- IN: [What's included]\n- OUT: [What's explicitly excluded]\n\n**Guardrails Applied** (from Metis review):\n- [Guardrail 1]\n- [Guardrail 2]\n\nPlan saved to: `.sisyphus/plans/{name}.md`\n```\n\n## Post-Plan Self-Review (MANDATORY)\n\n**After generating the plan, perform a self-review to catch gaps.**\n\n### Gap Classification\n\n- **CRITICAL: Requires User Input**: ASK immediately - Business logic choice, tech stack preference, unclear requirement\n- **MINOR: Can Self-Resolve**: FIX silently, note in summary - Missing file reference found via search, obvious acceptance criteria\n- **AMBIGUOUS: Default Available**: Apply default, DISCLOSE in summary - Error handling strategy, naming convention\n\n### Self-Review Checklist\n\nBefore presenting summary, verify:\n\n```\n\u25A1 All TODO items have concrete acceptance criteria?\n\u25A1 All file references exist in codebase?\n\u25A1 No assumptions about business logic without evidence?\n\u25A1 Guardrails from Metis review incorporated?\n\u25A1 Scope boundaries clearly defined?\n\u25A1 Every task has Agent-Executed QA Scenarios (not just test assertions)?\n\u25A1 QA scenarios include BOTH happy-path AND negative/error scenarios?\n\u25A1 Zero acceptance criteria require human intervention?\n\u25A1 QA scenarios use specific selectors/data, not vague descriptions?\n```\n\n### Gap Handling Protocol\n\n<gap_handling>\n**IF gap is CRITICAL (requires user decision):**\n1. Generate plan with placeholder: `[DECISION NEEDED: {description}]`\n2. In summary, list under \"Decisions Needed\"\n3. Ask specific question with options\n4. After user answers \u2192 Update plan silently \u2192 Continue\n\n**IF gap is MINOR (can self-resolve):**\n1. Fix immediately in the plan\n2. In summary, list under \"Auto-Resolved\"\n3. No question needed - proceed\n\n**IF gap is AMBIGUOUS (has reasonable default):**\n1. Apply sensible default\n2. In summary, list under \"Defaults Applied\"\n3. User can override if they disagree\n</gap_handling>\n\n### Summary Format (Updated)\n\n```\n## Plan Generated: {plan-name}\n\n**Key Decisions Made:**\n- [Decision 1]: [Brief rationale]\n\n**Scope:**\n- IN: [What's included]\n- OUT: [What's excluded]\n\n**Guardrails Applied:**\n- [Guardrail 1]\n\n**Auto-Resolved** (minor gaps fixed):\n- [Gap]: [How resolved]\n\n**Defaults Applied** (override if needed):\n- [Default]: [What was assumed]\n\n**Decisions Needed** (if any):\n- [Question requiring user input]\n\nPlan saved to: `.sisyphus/plans/{name}.md`\n```\n\n**CRITICAL**: If \"Decisions Needed\" section exists, wait for user response before presenting final choices.\n\n### Final Choice Presentation (MANDATORY)\n\n**After plan is complete and all decisions resolved, present using Question tool:**\n\n```typescript\nQuestion({\n  questions: [{\n    question: \"Plan is ready. How would you like to proceed?\",\n    header: \"Next Step\",\n    options: [\n      {\n        label: \"Start Work\",\n        description: \"Execute now with `/start-work {name}`. Plan looks solid.\"\n      },\n      {\n        label: \"High Accuracy Review\",\n        description: \"Have Momus rigorously verify every detail. Adds review loop but guarantees precision.\"\n      }\n    ]\n  }]\n})\n```\n";

package/dist/agents/prometheus/plan-template.d.ts CHANGED Viewed

@@ -4,4 +4,4 @@
  * The markdown template structure for work plans generated by Prometheus.
  * Includes TL;DR, context, objectives, verification strategy, TODOs, and success criteria.
  */
-export declare const PROMETHEUS_PLAN_TEMPLATE = "## Plan Structure\n\nGenerate plan to: `.sisyphus/plans/{name}.md`\n\n```markdown\n# {Plan Title}\n\n## TL;DR\n\n> **Quick Summary**: [1-2 sentences capturing the core objective and approach]\n> \n> **Deliverables**: [Bullet list of concrete outputs]\n> - [Output 1]\n> - [Output 2]\n> \n> **Estimated Effort**: [Quick | Short | Medium | Large | XL]\n> **Parallel Execution**: [YES - N waves | NO - sequential]\n> **Critical Path**: [Task X \u2192 Task Y \u2192 Task Z]\n\n---\n\n## Context\n\n### Original Request\n[User's initial description]\n\n### Interview Summary\n**Key Discussions**:\n- [Point 1]: [User's decision/preference]\n- [Point 2]: [Agreed approach]\n\n**Research Findings**:\n- [Finding 1]: [Implication]\n- [Finding 2]: [Recommendation]\n\n### Metis Review\n**Identified Gaps** (addressed):\n- [Gap 1]: [How resolved]\n- [Gap 2]: [How resolved]\n\n---\n\n## Work Objectives\n\n### Core Objective\n[1-2 sentences: what we're achieving]\n\n### Concrete Deliverables\n- [Exact file/endpoint/feature]\n\n### Definition of Done\n- [ ] [Verifiable condition with command]\n\n### Must Have\n- [Non-negotiable requirement]\n\n### Must NOT Have (Guardrails)\n- [Explicit exclusion from Metis review]\n- [AI slop pattern to avoid]\n- [Scope boundary]\n\n---\n\n## Verification Strategy (MANDATORY)\n\n> **ZERO HUMAN INTERVENTION** \u2014 ALL verification is agent-executed. No exceptions.\n> Acceptance criteria requiring \"user manually tests/confirms\" are FORBIDDEN.\n\n### Test Decision\n- **Infrastructure exists**: [YES/NO]\n- **Automated tests**: [TDD / Tests-after / None]\n- **Framework**: [bun test / vitest / jest / pytest / none]\n- **If TDD**: Each task follows RED (failing test) \u2192 GREEN (minimal impl) \u2192 REFACTOR\n\n### QA Policy\nEvery task MUST include agent-executed QA scenarios (see TODO template below).\nEvidence saved to `.sisyphus/evidence/task-{N}-{scenario-slug}.{ext}`.\n\n- **Frontend/UI**: Use Playwright (playwright skill) \u2014 Navigate, interact, assert DOM, screenshot\n- **TUI/CLI**: Use interactive_bash (tmux) \u2014 Run command, send keystrokes, validate output\n- **API/Backend**: Use Bash (curl) \u2014 Send requests, assert status + response fields\n- **Library/Module**: Use Bash (bun/node REPL) \u2014 Import, call functions, compare output\n\n---\n\n## Execution Strategy\n\n### Parallel Execution Waves\n\n> Maximize throughput by grouping independent tasks into parallel waves.\n> Each wave completes before the next begins.\n> Target: 5-8 tasks per wave. Fewer than 3 per wave (except final) = under-splitting.\n\n```\nWave 1 (Start Immediately \u2014 foundation + scaffolding):\n\u251C\u2500\u2500 Task 1: Project scaffolding + config [quick]\n\u251C\u2500\u2500 Task 2: Design system tokens [quick]\n\u251C\u2500\u2500 Task 3: Type definitions [quick]\n\u251C\u2500\u2500 Task 4: Schema definitions [quick]\n\u251C\u2500\u2500 Task 5: Storage interface + in-memory impl [quick]\n\u251C\u2500\u2500 Task 6: Auth middleware [quick]\n\u2514\u2500\u2500 Task 7: Client module [quick]\n\nWave 2 (After Wave 1 \u2014 core modules, MAX PARALLEL):\n\u251C\u2500\u2500 Task 8: Core business logic (depends: 3, 5, 7) [deep]\n\u251C\u2500\u2500 Task 9: API endpoints (depends: 4, 5) [unspecified-high]\n\u251C\u2500\u2500 Task 10: Secondary storage impl (depends: 5) [unspecified-high]\n\u251C\u2500\u2500 Task 11: Retry/fallback logic (depends: 8) [deep]\n\u251C\u2500\u2500 Task 12: UI layout + navigation (depends: 2) [visual-engineering]\n\u251C\u2500\u2500 Task 13: API client + hooks (depends: 4) [quick]\n\u2514\u2500\u2500 Task 14: Telemetry middleware (depends: 5, 10) [unspecified-high]\n\nWave 3 (After Wave 2 \u2014 integration + UI):\n\u251C\u2500\u2500 Task 15: Main route combining modules (depends: 6, 11, 14) [deep]\n\u251C\u2500\u2500 Task 16: UI data visualization (depends: 12, 13) [visual-engineering]\n\u251C\u2500\u2500 Task 17: Deployment config A (depends: 15) [quick]\n\u251C\u2500\u2500 Task 18: Deployment config B (depends: 15) [quick]\n\u251C\u2500\u2500 Task 19: Deployment config C (depends: 15) [quick]\n\u2514\u2500\u2500 Task 20: UI request log + build (depends: 16) [visual-engineering]\n\nWave FINAL (After ALL tasks \u2014 4 parallel reviews, then user okay):\n\u251C\u2500\u2500 Task F1: Plan compliance audit (oracle)\n\u251C\u2500\u2500 Task F2: Code quality review (unspecified-high)\n\u251C\u2500\u2500 Task F3: Real manual QA (unspecified-high)\n\u2514\u2500\u2500 Task F4: Scope fidelity check (deep)\n-> Present results -> Get explicit user okay\n\nCritical Path: Task 1 \u2192 Task 5 \u2192 Task 8 \u2192 Task 11 \u2192 Task 15 \u2192 Task 21 \u2192 F1-F4 \u2192 user okay\nParallel Speedup: ~70% faster than sequential\nMax Concurrent: 7 (Waves 1 & 2)\n```\n\n### Dependency Matrix (abbreviated \u2014 show ALL tasks in your generated plan)\n\n- **1-7**: \u2014 \u2014 8-14, 1\n- **8**: 3, 5, 7 \u2014 11, 15, 2\n- **11**: 8 \u2014 15, 2\n- **14**: 5, 10 \u2014 15, 2\n- **15**: 6, 11, 14 \u2014 17-19, 21, 3\n- **21**: 15 \u2014 23, 24, 4\n\n> This is abbreviated for reference. YOUR generated plan must include the FULL matrix for ALL tasks.\n\n### Agent Dispatch Summary\n\n- **1**: **7** \u2014 T1-T4 \u2192 `quick`, T5 \u2192 `quick`, T6 \u2192 `quick`, T7 \u2192 `quick`\n- **2**: **7** \u2014 T8 \u2192 `deep`, T9 \u2192 `unspecified-high`, T10 \u2192 `unspecified-high`, T11 \u2192 `deep`, T12 \u2192 `visual-engineering`, T13 \u2192 `quick`, T14 \u2192 `unspecified-high`\n- **3**: **6** \u2014 T15 \u2192 `deep`, T16 \u2192 `visual-engineering`, T17-T19 \u2192 `quick`, T20 \u2192 `visual-engineering`\n- **4**: **4** \u2014 T21 \u2192 `deep`, T22 \u2192 `unspecified-high`, T23 \u2192 `deep`, T24 \u2192 `git`\n- **FINAL**: **4** \u2014 F1 \u2192 `oracle`, F2 \u2192 `unspecified-high`, F3 \u2192 `unspecified-high`, F4 \u2192 `deep`\n\n---\n\n## TODOs\n\n> Implementation + Test = ONE Task. Never separate.\n> EVERY task MUST have: Recommended Agent Profile + Parallelization info + QA Scenarios.\n> **A task WITHOUT QA Scenarios is INCOMPLETE. No exceptions.**\n\n- [ ] 1. [Task Title]\n\n  **What to do**:\n  - [Clear implementation steps]\n  - [Test cases to cover]\n\n  **Must NOT do**:\n  - [Specific exclusions from guardrails]\n\n  **Recommended Agent Profile**:\n  > Select category + skills based on task domain. Justify each choice.\n  - **Category**: `[visual-engineering | ultrabrain | artistry | quick | unspecified-low | unspecified-high | writing]`\n    - Reason: [Why this category fits the task domain]\n  - **Skills**: [`skill-1`, `skill-2`]\n    - `skill-1`: [Why needed - domain overlap explanation]\n    - `skill-2`: [Why needed - domain overlap explanation]\n  - **Skills Evaluated but Omitted**:\n    - `omitted-skill`: [Why domain doesn't overlap]\n\n  **Parallelization**:\n  - **Can Run In Parallel**: YES | NO\n  - **Parallel Group**: Wave N (with Tasks X, Y) | Sequential\n  - **Blocks**: [Tasks that depend on this task completing]\n  - **Blocked By**: [Tasks this depends on] | None (can start immediately)\n\n  **References** (CRITICAL - Be Exhaustive):\n\n  > The executor has NO context from your interview. References are their ONLY guide.\n  > Each reference must answer: \"What should I look at and WHY?\"\n\n  **Pattern References** (existing code to follow):\n  - `src/services/auth.ts:45-78` - Authentication flow pattern (JWT creation, refresh token handling)\n\n  **API/Type References** (contracts to implement against):\n  - `src/types/user.ts:UserDTO` - Response shape for user endpoints\n\n  **Test References** (testing patterns to follow):\n  - `src/__tests__/auth.test.ts:describe(\"login\")` - Test structure and mocking patterns\n\n  **External References** (libraries and frameworks):\n  - Official docs: `https://zod.dev/?id=basic-usage` - Zod validation syntax\n\n  **WHY Each Reference Matters** (explain the relevance):\n  - Don't just list files - explain what pattern/information the executor should extract\n  - Bad: `src/utils.ts` (vague, which utils? why?)\n  - Good: `src/utils/validation.ts:sanitizeInput()` - Use this sanitization pattern for user input\n\n  **Acceptance Criteria**:\n\n  > **AGENT-EXECUTABLE VERIFICATION ONLY** \u2014 No human action permitted.\n  > Every criterion MUST be verifiable by running a command or using a tool.\n\n  **If TDD (tests enabled):**\n  - [ ] Test file created: src/auth/login.test.ts\n  - [ ] bun test src/auth/login.test.ts \u2192 PASS (3 tests, 0 failures)\n\n  **QA Scenarios (MANDATORY \u2014 task is INCOMPLETE without these):**\n\n  > **This is NOT optional. A task without QA scenarios WILL BE REJECTED.**\n  >\n  > Write scenario tests that verify the ACTUAL BEHAVIOR of what you built.\n  > Minimum: 1 happy path + 1 failure/edge case per task.\n  > Each scenario = exact tool + exact steps + exact assertions + evidence path.\n  >\n  > **The executing agent MUST run these scenarios after implementation.**\n  > **The orchestrator WILL verify evidence files exist before marking task complete.**\n\n  \\`\\`\\`\n  Scenario: [Happy path \u2014 what SHOULD work]\n    Tool: [Playwright / interactive_bash / Bash (curl)]\n    Preconditions: [Exact setup state]\n    Steps:\n      1. [Exact action \u2014 specific command/selector/endpoint, no vagueness]\n      2. [Next action \u2014 with expected intermediate state]\n      3. [Assertion \u2014 exact expected value, not \"verify it works\"]\n    Expected Result: [Concrete, observable, binary pass/fail]\n    Failure Indicators: [What specifically would mean this failed]\n    Evidence: .sisyphus/evidence/task-{N}-{scenario-slug}.{ext}\n\n  Scenario: [Failure/edge case \u2014 what SHOULD fail gracefully]\n    Tool: [same format]\n    Preconditions: [Invalid input / missing dependency / error state]\n    Steps:\n      1. [Trigger the error condition]\n      2. [Assert error is handled correctly]\n    Expected Result: [Graceful failure with correct error message/code]\n    Evidence: .sisyphus/evidence/task-{N}-{scenario-slug}-error.{ext}\n  \\`\\`\\`\n\n  > **Specificity requirements \u2014 every scenario MUST use:**\n  > - **Selectors**: Specific CSS selectors (`.login-button`, not \"the login button\")\n  > - **Data**: Concrete test data (`\"test@example.com\"`, not `\"[email]\"`)\n  > - **Assertions**: Exact values (`text contains \"Welcome back\"`, not \"verify it works\")\n  > - **Timing**: Wait conditions where relevant (`timeout: 10s`)\n  > - **Negative**: At least ONE failure/error scenario per task\n  >\n  > **Anti-patterns (your scenario is INVALID if it looks like this):**\n  > - \u274C \"Verify it works correctly\" \u2014 HOW? What does \"correctly\" mean?\n  > - \u274C \"Check the API returns data\" \u2014 WHAT data? What fields? What values?\n  > - \u274C \"Test the component renders\" \u2014 WHERE? What selector? What content?\n  > - \u274C Any scenario without an evidence path\n\n  **Evidence to Capture:**\n  - [ ] Each evidence file named: task-{N}-{scenario-slug}.{ext}\n  - [ ] Screenshots for UI, terminal output for CLI, response bodies for API\n\n  **Commit**: YES | NO (groups with N)\n  - Message: `type(scope): desc`\n  - Files: `path/to/file`\n  - Pre-commit: `test command`\n\n---\n\n## Final Verification Wave (MANDATORY \u2014 after ALL implementation tasks)\n\n> 4 review agents run in PARALLEL. ALL must APPROVE. Present consolidated results to user and get explicit \"okay\" before completing.\n>\n> **Do NOT auto-proceed after verification. Wait for user's explicit approval before marking work complete.**\n> **Never mark F1-F4 as checked before getting user's okay.** Rejection or user feedback -> fix -> re-run -> present again -> wait for okay.\n\n- [ ] F1. **Plan Compliance Audit** \u2014 `oracle`\n  Read the plan end-to-end. For each \"Must Have\": verify implementation exists (read file, curl endpoint, run command). For each \"Must NOT Have\": search codebase for forbidden patterns \u2014 reject with file:line if found. Check evidence files exist in .sisyphus/evidence/. Compare deliverables against plan.\n  Output: `Must Have [N/N] | Must NOT Have [N/N] | Tasks [N/N] | VERDICT: APPROVE/REJECT`\n\n- [ ] F2. **Code Quality Review** \u2014 `unspecified-high`\n  Run `tsc --noEmit` + linter + `bun test`. Review all changed files for: `as any`/`@ts-ignore`, empty catches, console.log in prod, commented-out code, unused imports. Check AI slop: excessive comments, over-abstraction, generic names (data/result/item/temp).\n  Output: `Build [PASS/FAIL] | Lint [PASS/FAIL] | Tests [N pass/N fail] | Files [N clean/N issues] | VERDICT`\n\n- [ ] F3. **Real Manual QA** \u2014 `unspecified-high` (+ `playwright` skill if UI)\n  Start from clean state. Execute EVERY QA scenario from EVERY task \u2014 follow exact steps, capture evidence. Test cross-task integration (features working together, not isolation). Test edge cases: empty state, invalid input, rapid actions. Save to `.sisyphus/evidence/final-qa/`.\n  Output: `Scenarios [N/N pass] | Integration [N/N] | Edge Cases [N tested] | VERDICT`\n\n- [ ] F4. **Scope Fidelity Check** \u2014 `deep`\n  For each task: read \"What to do\", read actual diff (git log/diff). Verify 1:1 \u2014 everything in spec was built (no missing), nothing beyond spec was built (no creep). Check \"Must NOT do\" compliance. Detect cross-task contamination: Task N touching Task M's files. Flag unaccounted changes.\n  Output: `Tasks [N/N compliant] | Contamination [CLEAN/N issues] | Unaccounted [CLEAN/N files] | VERDICT`\n\n---\n\n## Commit Strategy\n\n- **1**: `type(scope): desc` \u2014 file.ts, npm test\n\n---\n\n## Success Criteria\n\n### Verification Commands\n```bash\ncommand  # Expected: output\n```\n\n### Final Checklist\n- [ ] All \"Must Have\" present\n- [ ] All \"Must NOT Have\" absent\n- [ ] All tests pass\n```\n\n---\n";
+export declare const PROMETHEUS_PLAN_TEMPLATE = "## Plan Structure\n\nGenerate plan to: `.sisyphus/plans/{name}.md`\n\n```markdown\n# {Plan Title}\n\n## TL;DR\n\n> **Quick Summary**: [1-2 sentences capturing the core objective and approach]\n> \n> **Deliverables**: [Bullet list of concrete outputs]\n> - [Output 1]\n> - [Output 2]\n> \n> **Estimated Effort**: [Quick | Short | Medium | Large | XL]\n> **Parallel Execution**: [YES - N waves | NO - sequential]\n> **Critical Path**: [Task X \u2192 Task Y \u2192 Task Z]\n\n---\n\n## Context\n\n### Original Request\n[User's initial description]\n\n### Interview Summary\n**Key Discussions**:\n- [Point 1]: [User's decision/preference]\n- [Point 2]: [Agreed approach]\n\n**Research Findings**:\n- [Finding 1]: [Implication]\n- [Finding 2]: [Recommendation]\n\n### Metis Review\n**Identified Gaps** (addressed):\n- [Gap 1]: [How resolved]\n- [Gap 2]: [How resolved]\n\n---\n\n## Work Objectives\n\n### Core Objective\n[1-2 sentences: what we're achieving]\n\n### Concrete Deliverables\n- [Exact file/endpoint/feature]\n\n### Definition of Done\n- [ ] [Verifiable condition with command]\n\n### Must Have\n- [Non-negotiable requirement]\n\n### Must NOT Have (Guardrails)\n- [Explicit exclusion from Metis review]\n- [AI slop pattern to avoid]\n- [Scope boundary]\n\n---\n\n## Verification Strategy (MANDATORY)\n\n> **ZERO HUMAN INTERVENTION** - ALL verification is agent-executed. No exceptions.\n> Acceptance criteria requiring \"user manually tests/confirms\" are FORBIDDEN.\n\n### Test Decision\n- **Infrastructure exists**: [YES/NO]\n- **Automated tests**: [TDD / Tests-after / None]\n- **Framework**: [bun test / vitest / jest / pytest / none]\n- **If TDD**: Each task follows RED (failing test) \u2192 GREEN (minimal impl) \u2192 REFACTOR\n\n### QA Policy\nEvery task MUST include agent-executed QA scenarios (see TODO template below).\nEvidence saved to `.sisyphus/evidence/task-{N}-{scenario-slug}.{ext}`.\n\n- **Frontend/UI**: Use Playwright (playwright skill) - Navigate, interact, assert DOM, screenshot\n- **TUI/CLI**: Use interactive_bash (tmux) - Run command, send keystrokes, validate output\n- **API/Backend**: Use Bash (curl) - Send requests, assert status + response fields\n- **Library/Module**: Use Bash (bun/node REPL) - Import, call functions, compare output\n\n---\n\n## Execution Strategy\n\n### Parallel Execution Waves\n\n> Maximize throughput by grouping independent tasks into parallel waves.\n> Each wave completes before the next begins.\n> Target: 5-8 tasks per wave. Fewer than 3 per wave (except final) = under-splitting.\n\n```\nWave 1 (Start Immediately - foundation + scaffolding):\n\u251C\u2500\u2500 Task 1: Project scaffolding + config [quick]\n\u251C\u2500\u2500 Task 2: Design system tokens [quick]\n\u251C\u2500\u2500 Task 3: Type definitions [quick]\n\u251C\u2500\u2500 Task 4: Schema definitions [quick]\n\u251C\u2500\u2500 Task 5: Storage interface + in-memory impl [quick]\n\u251C\u2500\u2500 Task 6: Auth middleware [quick]\n\u2514\u2500\u2500 Task 7: Client module [quick]\n\nWave 2 (After Wave 1 - core modules, MAX PARALLEL):\n\u251C\u2500\u2500 Task 8: Core business logic (depends: 3, 5, 7) [deep]\n\u251C\u2500\u2500 Task 9: API endpoints (depends: 4, 5) [unspecified-high]\n\u251C\u2500\u2500 Task 10: Secondary storage impl (depends: 5) [unspecified-high]\n\u251C\u2500\u2500 Task 11: Retry/fallback logic (depends: 8) [deep]\n\u251C\u2500\u2500 Task 12: UI layout + navigation (depends: 2) [visual-engineering]\n\u251C\u2500\u2500 Task 13: API client + hooks (depends: 4) [quick]\n\u2514\u2500\u2500 Task 14: Telemetry middleware (depends: 5, 10) [unspecified-high]\n\nWave 3 (After Wave 2 - integration + UI):\n\u251C\u2500\u2500 Task 15: Main route combining modules (depends: 6, 11, 14) [deep]\n\u251C\u2500\u2500 Task 16: UI data visualization (depends: 12, 13) [visual-engineering]\n\u251C\u2500\u2500 Task 17: Deployment config A (depends: 15) [quick]\n\u251C\u2500\u2500 Task 18: Deployment config B (depends: 15) [quick]\n\u251C\u2500\u2500 Task 19: Deployment config C (depends: 15) [quick]\n\u2514\u2500\u2500 Task 20: UI request log + build (depends: 16) [visual-engineering]\n\nWave FINAL (After ALL tasks \u2014 4 parallel reviews, then user okay):\n\u251C\u2500\u2500 Task F1: Plan compliance audit (oracle)\n\u251C\u2500\u2500 Task F2: Code quality review (unspecified-high)\n\u251C\u2500\u2500 Task F3: Real manual QA (unspecified-high)\n\u2514\u2500\u2500 Task F4: Scope fidelity check (deep)\n-> Present results -> Get explicit user okay\n\nCritical Path: Task 1 \u2192 Task 5 \u2192 Task 8 \u2192 Task 11 \u2192 Task 15 \u2192 Task 21 \u2192 F1-F4 \u2192 user okay\nParallel Speedup: ~70% faster than sequential\nMax Concurrent: 7 (Waves 1 & 2)\n```\n\n### Dependency Matrix (abbreviated - show ALL tasks in your generated plan)\n\n- **1-7**: - - 8-14, 1\n- **8**: 3, 5, 7 - 11, 15, 2\n- **11**: 8 - 15, 2\n- **14**: 5, 10 - 15, 2\n- **15**: 6, 11, 14 - 17-19, 21, 3\n- **21**: 15 - 23, 24, 4\n\n> This is abbreviated for reference. YOUR generated plan must include the FULL matrix for ALL tasks.\n\n### Agent Dispatch Summary\n\n- **1**: **7** - T1-T4 \u2192 `quick`, T5 \u2192 `quick`, T6 \u2192 `quick`, T7 \u2192 `quick`\n- **2**: **7** - T8 \u2192 `deep`, T9 \u2192 `unspecified-high`, T10 \u2192 `unspecified-high`, T11 \u2192 `deep`, T12 \u2192 `visual-engineering`, T13 \u2192 `quick`, T14 \u2192 `unspecified-high`\n- **3**: **6** - T15 \u2192 `deep`, T16 \u2192 `visual-engineering`, T17-T19 \u2192 `quick`, T20 \u2192 `visual-engineering`\n- **4**: **4** - T21 \u2192 `deep`, T22 \u2192 `unspecified-high`, T23 \u2192 `deep`, T24 \u2192 `git`\n- **FINAL**: **4** - F1 \u2192 `oracle`, F2 \u2192 `unspecified-high`, F3 \u2192 `unspecified-high`, F4 \u2192 `deep`\n\n---\n\n## TODOs\n\n> Implementation + Test = ONE Task. Never separate.\n> EVERY task MUST have: Recommended Agent Profile + Parallelization info + QA Scenarios.\n> **A task WITHOUT QA Scenarios is INCOMPLETE. No exceptions.**\n\n- [ ] 1. [Task Title]\n\n  **What to do**:\n  - [Clear implementation steps]\n  - [Test cases to cover]\n\n  **Must NOT do**:\n  - [Specific exclusions from guardrails]\n\n  **Recommended Agent Profile**:\n  > Select category + skills based on task domain. Justify each choice.\n  - **Category**: `[visual-engineering | ultrabrain | artistry | quick | unspecified-low | unspecified-high | writing]`\n    - Reason: [Why this category fits the task domain]\n  - **Skills**: [`skill-1`, `skill-2`]\n    - `skill-1`: [Why needed - domain overlap explanation]\n    - `skill-2`: [Why needed - domain overlap explanation]\n  - **Skills Evaluated but Omitted**:\n    - `omitted-skill`: [Why domain doesn't overlap]\n\n  **Parallelization**:\n  - **Can Run In Parallel**: YES | NO\n  - **Parallel Group**: Wave N (with Tasks X, Y) | Sequential\n  - **Blocks**: [Tasks that depend on this task completing]\n  - **Blocked By**: [Tasks this depends on] | None (can start immediately)\n\n  **References** (CRITICAL - Be Exhaustive):\n\n  > The executor has NO context from your interview. References are their ONLY guide.\n  > Each reference must answer: \"What should I look at and WHY?\"\n\n  **Pattern References** (existing code to follow):\n  - `src/services/auth.ts:45-78` - Authentication flow pattern (JWT creation, refresh token handling)\n\n  **API/Type References** (contracts to implement against):\n  - `src/types/user.ts:UserDTO` - Response shape for user endpoints\n\n  **Test References** (testing patterns to follow):\n  - `src/__tests__/auth.test.ts:describe(\"login\")` - Test structure and mocking patterns\n\n  **External References** (libraries and frameworks):\n  - Official docs: `https://zod.dev/?id=basic-usage` - Zod validation syntax\n\n  **WHY Each Reference Matters** (explain the relevance):\n  - Don't just list files - explain what pattern/information the executor should extract\n  - Bad: `src/utils.ts` (vague, which utils? why?)\n  - Good: `src/utils/validation.ts:sanitizeInput()` - Use this sanitization pattern for user input\n\n  **Acceptance Criteria**:\n\n  > **AGENT-EXECUTABLE VERIFICATION ONLY** - No human action permitted.\n  > Every criterion MUST be verifiable by running a command or using a tool.\n\n  **If TDD (tests enabled):**\n  - [ ] Test file created: src/auth/login.test.ts\n  - [ ] bun test src/auth/login.test.ts \u2192 PASS (3 tests, 0 failures)\n\n  **QA Scenarios (MANDATORY - task is INCOMPLETE without these):**\n\n  > **This is NOT optional. A task without QA scenarios WILL BE REJECTED.**\n  >\n  > Write scenario tests that verify the ACTUAL BEHAVIOR of what you built.\n  > Minimum: 1 happy path + 1 failure/edge case per task.\n  > Each scenario = exact tool + exact steps + exact assertions + evidence path.\n  >\n  > **The executing agent MUST run these scenarios after implementation.**\n  > **The orchestrator WILL verify evidence files exist before marking task complete.**\n\n  \\`\\`\\`\n  Scenario: [Happy path - what SHOULD work]\n    Tool: [Playwright / interactive_bash / Bash (curl)]\n    Preconditions: [Exact setup state]\n    Steps:\n      1. [Exact action - specific command/selector/endpoint, no vagueness]\n      2. [Next action - with expected intermediate state]\n      3. [Assertion - exact expected value, not \"verify it works\"]\n    Expected Result: [Concrete, observable, binary pass/fail]\n    Failure Indicators: [What specifically would mean this failed]\n    Evidence: .sisyphus/evidence/task-{N}-{scenario-slug}.{ext}\n\n  Scenario: [Failure/edge case - what SHOULD fail gracefully]\n    Tool: [same format]\n    Preconditions: [Invalid input / missing dependency / error state]\n    Steps:\n      1. [Trigger the error condition]\n      2. [Assert error is handled correctly]\n    Expected Result: [Graceful failure with correct error message/code]\n    Evidence: .sisyphus/evidence/task-{N}-{scenario-slug}-error.{ext}\n  \\`\\`\\`\n\n  > **Specificity requirements - every scenario MUST use:**\n  > - **Selectors**: Specific CSS selectors (`.login-button`, not \"the login button\")\n  > - **Data**: Concrete test data (`\"test@example.com\"`, not `\"[email]\"`)\n  > - **Assertions**: Exact values (`text contains \"Welcome back\"`, not \"verify it works\")\n  > - **Timing**: Wait conditions where relevant (`timeout: 10s`)\n  > - **Negative**: At least ONE failure/error scenario per task\n  >\n  > **Anti-patterns (your scenario is INVALID if it looks like this):**\n  > - \u274C \"Verify it works correctly\" - HOW? What does \"correctly\" mean?\n  > - \u274C \"Check the API returns data\" - WHAT data? What fields? What values?\n  > - \u274C \"Test the component renders\" - WHERE? What selector? What content?\n  > - \u274C Any scenario without an evidence path\n\n  **Evidence to Capture:**\n  - [ ] Each evidence file named: task-{N}-{scenario-slug}.{ext}\n  - [ ] Screenshots for UI, terminal output for CLI, response bodies for API\n\n  **Commit**: YES | NO (groups with N)\n  - Message: `type(scope): desc`\n  - Files: `path/to/file`\n  - Pre-commit: `test command`\n\n---\n\n## Final Verification Wave (MANDATORY \u2014 after ALL implementation tasks)\n\n> 4 review agents run in PARALLEL. ALL must APPROVE. Present consolidated results to user and get explicit \"okay\" before completing.\n>\n> **Do NOT auto-proceed after verification. Wait for user's explicit approval before marking work complete.**\n> **Never mark F1-F4 as checked before getting user's okay.** Rejection or user feedback -> fix -> re-run -> present again -> wait for okay.\n\n- [ ] F1. **Plan Compliance Audit** \u2014 `oracle`\n  Read the plan end-to-end. For each \"Must Have\": verify implementation exists (read file, curl endpoint, run command). For each \"Must NOT Have\": search codebase for forbidden patterns \u2014 reject with file:line if found. Check evidence files exist in .sisyphus/evidence/. Compare deliverables against plan.\n  Output: `Must Have [N/N] | Must NOT Have [N/N] | Tasks [N/N] | VERDICT: APPROVE/REJECT`\n\n- [ ] F2. **Code Quality Review** \u2014 `unspecified-high`\n  Run `tsc --noEmit` + linter + `bun test`. Review all changed files for: `as any`/`@ts-ignore`, empty catches, console.log in prod, commented-out code, unused imports. Check AI slop: excessive comments, over-abstraction, generic names (data/result/item/temp).\n  Output: `Build [PASS/FAIL] | Lint [PASS/FAIL] | Tests [N pass/N fail] | Files [N clean/N issues] | VERDICT`\n\n- [ ] F3. **Real Manual QA** \u2014 `unspecified-high` (+ `playwright` skill if UI)\n  Start from clean state. Execute EVERY QA scenario from EVERY task \u2014 follow exact steps, capture evidence. Test cross-task integration (features working together, not isolation). Test edge cases: empty state, invalid input, rapid actions. Save to `.sisyphus/evidence/final-qa/`.\n  Output: `Scenarios [N/N pass] | Integration [N/N] | Edge Cases [N tested] | VERDICT`\n\n- [ ] F4. **Scope Fidelity Check** \u2014 `deep`\n  For each task: read \"What to do\", read actual diff (git log/diff). Verify 1:1 \u2014 everything in spec was built (no missing), nothing beyond spec was built (no creep). Check \"Must NOT do\" compliance. Detect cross-task contamination: Task N touching Task M's files. Flag unaccounted changes.\n  Output: `Tasks [N/N compliant] | Contamination [CLEAN/N issues] | Unaccounted [CLEAN/N files] | VERDICT`\n\n---\n\n## Commit Strategy\n\n- **1**: `type(scope): desc` - file.ts, npm test\n\n---\n\n## Success Criteria\n\n### Verification Commands\n```bash\ncommand  # Expected: output\n```\n\n### Final Checklist\n- [ ] All \"Must Have\" present\n- [ ] All \"Must NOT Have\" absent\n- [ ] All tests pass\n```\n\n---\n";

package/dist/agents/sisyphus/gpt-5-4.d.ts CHANGED Viewed

@@ -1,24 +1,24 @@
 /**
- * GPT-5.4-native Sisyphus prompt — rewritten with 8-block architecture.
+ * GPT-5.4-native Sisyphus prompt - rewritten with 8-block architecture.
  *
  * Design principles (derived from OpenAI's GPT-5.4 prompting guidance):
  * - Compact, block-structured prompts with XML tags + named sub-anchors
- * - reasoning.effort defaults to "none" — explicit thinking encouragement required
- * - GPT-5.4 generates preambles natively — do NOT add preamble instructions
- * - GPT-5.4 follows instructions well — less repetition, fewer threats needed
+ * - reasoning.effort defaults to "none" - explicit thinking encouragement required
+ * - GPT-5.4 generates preambles natively - do NOT add preamble instructions
+ * - GPT-5.4 follows instructions well - less repetition, fewer threats needed
  * - GPT-5.4 benefits from: output contracts, verification loops, dependency checks, completeness contracts
- * - GPT-5.4 can be over-literal — add intent inference layer for nuanced behavior
- * - "Start with the smallest prompt that passes your evals" — keep it dense
+ * - GPT-5.4 can be over-literal - add intent inference layer for nuanced behavior
+ * - "Start with the smallest prompt that passes your evals" - keep it dense
  *
  * Architecture (8 blocks, ~9 named sub-anchors):
- *   1. <identity>          — Role, instruction priority, orchestrator bias
- *   2. <constraints>       — Hard blocks + anti-patterns (early placement for GPT-5.4 attention)
- *   3. <intent>            — Think-first + intent gate + autonomy (merged, domain_guess routing)
- *   4. <explore>           — Codebase assessment + research + tool rules (named sub-anchors preserved)
- *   5. <execution_loop>    — EXPLORE→PLAN→ROUTE→EXECUTE_OR_SUPERVISE→VERIFY→RETRY→DONE (heart of prompt)
- *   6. <delegation>        — Category+skills, 6-section prompt, session continuity, oracle
- *   7. <tasks>             — Task/todo management
- *   8. <style>             — Tone (prose) + output contract + progress updates
+ *   1. <identity>          - Role, instruction priority, orchestrator bias
+ *   2. <constraints>       - Hard blocks + anti-patterns (early placement for GPT-5.4 attention)
+ *   3. <intent>            - Think-first + intent gate + autonomy (merged, domain_guess routing)
+ *   4. <explore>           - Codebase assessment + research + tool rules (named sub-anchors preserved)
+ *   5. <execution_loop>    - EXPLORE→PLAN→ROUTE→EXECUTE_OR_SUPERVISE→VERIFY→RETRY→DONE (heart of prompt)
+ *   6. <delegation>        - Category+skills, 6-section prompt, session continuity, oracle
+ *   7. <tasks>             - Task/todo management
+ *   8. <style>             - Tone (prose) + output contract + progress updates
  */
 import type { AvailableAgent, AvailableTool, AvailableSkill, AvailableCategory } from "../dynamic-agent-prompt-builder";
 import { categorizeTools } from "../dynamic-agent-prompt-builder";

package/dist/agents/sisyphus/index.d.ts CHANGED Viewed

@@ -1,5 +1,5 @@
 /**
- * Sisyphus agent — multi-model orchestrator.
+ * Sisyphus agent - multi-model orchestrator.
  *
  * This directory contains model-specific prompt variants:
  * - default.ts: Base implementation for Claude and general models

package/dist/agents/sisyphus.d.ts CHANGED Viewed

@@ -4,5 +4,5 @@ export declare const SISYPHUS_PROMPT_METADATA: AgentPromptMetadata;
 import type { AvailableAgent, AvailableSkill, AvailableCategory } from "./dynamic-agent-prompt-builder";
 export declare function createSisyphusAgent(model: string, availableAgents?: AvailableAgent[], availableToolNames?: string[], availableSkills?: AvailableSkill[], availableCategories?: AvailableCategory[], useTaskSystem?: boolean): AgentConfig;
 export declare namespace createSisyphusAgent {
-    var mode: "all";
+    var mode: "primary";
 }

package/dist/agents/types.d.ts CHANGED Viewed

@@ -56,6 +56,7 @@ export declare function isGptModel(model: string): boolean;
 export declare function isGpt5_4Model(model: string): boolean;
 export declare function isGpt5_3CodexModel(model: string): boolean;
 export declare function isMiniMaxModel(model: string): boolean;
+export declare function isGlmModel(model: string): boolean;
 export declare function isGeminiModel(model: string): boolean;
 export type BuiltinAgentName = "sisyphus" | "hephaestus" | "oracle" | "librarian" | "explore" | "multimodal-looker" | "metis" | "momus" | "atlas" | "sisyphus-junior";
 export type OverridableAgentName = "build" | BuiltinAgentName;