@exaudeus/workrail 0.7.0 → 0.7.1-beta.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,5 +1,33 @@
 # Changelog - Systematic Bug Investigation Workflow
 
+## [1.1.0-beta.17] - 2025-01-06
+
+### Major Restructuring
+- **Phase 0 Consolidation**: Merged 4 separate Phase 0 steps into single comprehensive setup step
+  - Combined: Triage (0), User Preferences (0a), Tool Check (0b), Context Creation (0c)
+  - Result: Single "Phase 0: Complete Investigation Setup" step covering all mechanical preparation
+  - Rationale: Reduce workflow overhead while maintaining thorough setup
+  - New structure: Phase 0 (Setup) → Phase 0a (Commitment Checkpoint, conditional)
+
+- **Assumption Verification Relocation**: Moved from Phase 0a to Phase 1f
+  - Previously: Early assumption check before ANY code analysis (removed)
+  - Now: Assumption verification AFTER all 5 analysis iterations complete (Phase 1f Step 2.5)
+  - Rationale: Assumptions can only be properly verified with full code context
+  - Timing: Happens after neighborhood mapping, pattern analysis, component ranking, data flow tracing, and test gap analysis
+  - Location: Integrated into Phase 1f "Final Breadth & Scope Verification" before hypothesis development
+
+### Impact
+- **Step Count**: Reduced from 27 steps to 23 steps (4 Phase 0 steps → 1)
+- **Phase Numbering**: Simplified Phase 0 structure (Phase 0d → Phase 0a)
+- **Debugging Workflow Alignment**: Better follows traditional debugging principles (observe fully THEN question assumptions THEN hypothesize)
+- **Agent Experience**: Faster setup phase, more informed assumption checking
+
+### Breaking Changes
+- `completedSteps` array format changed:
+  - OLD: `["phase-0-triage", "phase-0a-user-preferences", "phase-0b-tool-check", "phase-0c-create-context", "phase-0d-workflow-commitment"]`
+  - NEW: `["phase-0-complete-setup", "phase-0a-workflow-commitment"]`
+- Step IDs changed: `phase-0d-workflow-commitment` → `phase-0a-workflow-commitment`
+
 ## [1.1.0-beta.9] - 2025-01-06
 
 ### Enhanced
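The breaking change above means any resumption state persisted by an earlier beta must be migrated before calling `workflow_next` again. A minimal migration sketch in TypeScript — the payload shape mirrors the resumption JSON embedded in the removed `phase-0d-create-context` prompt further down this diff, and nothing here is an official migration API:

```typescript
// Hypothetical migration of persisted resumption state. Only the step IDs and
// the { workflowId, completedSteps, context } shape come from this diff.
const oldCompletedSteps = [
  "phase-0-triage",
  "phase-0a-user-preferences",
  "phase-0b-tool-check",
  "phase-0c-create-context",
  "phase-0d-workflow-commitment",
];

// All four Phase 0 setup steps collapse into one ID; the commitment step is renamed.
const newCompletedSteps = [
  "phase-0-complete-setup",       // replaces triage + preferences + tools + context
  "phase-0a-workflow-commitment", // was: phase-0d-workflow-commitment
];

const resumption = {
  workflowId: "systematic-bug-investigation-with-loops",
  completedSteps: newCompletedSteps,
  context: { /* preserve all previously saved context variables */ },
};
```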
@@ -1,7 +1,7 @@
 {
   "id": "systematic-bug-investigation-with-loops",
   "name": "Systematic Bug Investigation Workflow",
-  "version": "1.1.0-beta.
+  "version": "1.1.0-beta.17",
   "description": "A comprehensive workflow for systematic bug and failing test investigation that prevents LLMs from jumping to conclusions. Enforces thorough evidence gathering, hypothesis formation, debugging instrumentation, and validation to achieve near 100% certainty about root causes. This workflow does NOT fix bugs - it produces detailed diagnostic writeups that enable effective fixing by providing complete understanding of what is happening, why it's happening, and supporting evidence.",
   "clarificationPrompts": [
     "What type of system is this? (web app, mobile app, backend service, desktop app, etc.)",
@@ -26,7 +26,10 @@
     "You CANNOT \"figure out the bug\" and stop. You MUST execute all 27 workflow steps by repeatedly calling workflow_next and following instructions until the MCP returns isComplete=true.",
     "WORKFLOW MECHANICS: Each call to workflow_next returns the next required step. You MUST execute that step, then call workflow_next again. Repeat until isComplete=true.",
     "DO NOT STOP CALLING WORKFLOW_NEXT: Even if you think you know the bug, even if you have high confidence, even if it seems obvious - you MUST continue calling workflow_next.",
-    "STEP COUNTER: Every prompt shows \"Step X of
+    "STEP COUNTER: Every prompt shows \"Step X of 26\" - you are NOT done until you reach Step 23/23 and isComplete=true.",
+    "**\ud83c\udfaf PHASE 0 = PURE SETUP (NO ANALYSIS):**",
+    "Phase 0 is MECHANICAL SETUP ONLY: triage, user preferences, tool checking, context creation. No code analysis, no assumption checking. That comes in Phase 1.",
+    "Phase 1a NOW includes assumption verification - AFTER you've seen the code and built structural understanding. You can't meaningfully question assumptions before understanding the codebase.",
     "**\ud83d\udea8 CRITICAL: ANALYSIS \u2260 DIAGNOSIS \u2260 PROOF:**",
     "AFTER PHASE 1 (Analysis): You have analyzed code and identified suspicious patterns. This is NOT proof. You have ZERO evidence yet. You are ~20% done.",
     "AFTER PHASE 2 (Hypotheses): You have theories about the bug. This is NOT proof. You still have ZERO evidence. You are ~40% done.",
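The contract these metaGuidance strings enforce — call `workflow_next`, execute the returned step, repeat until `isComplete=true` — can be pictured as a small driver loop. A hedged TypeScript sketch; only `workflow_next`, `completedSteps`, `context`, and `isComplete` appear in this diff, so `callTool` and the `step.id` field are assumptions about the client, not workrail's documented API:

```typescript
// Hypothetical driver loop for an MCP client talking to this workflow.
async function runWorkflow(
  callTool: (name: string, args: object) => Promise<any>
): Promise<void> {
  const completedSteps: string[] = [];
  const context: Record<string, unknown> = {};

  for (;;) {
    // Ask the server for the next required step.
    const step = await callTool("workflow_next", {
      workflowId: "systematic-bug-investigation-with-loops",
      completedSteps,
      context,
    });
    if (step.isComplete) break; // true only after Phase 6 (the writeup)
    // ...execute the returned step's prompt, updating `context` as instructed...
    completedSteps.push(step.id); // assumed field name for the step identifier
  }
}
```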
@@ -96,76 +99,25 @@
   ],
   "steps": [
     {
-      "id": "phase-0-
-      "title": "Phase 0:
-      "prompt": "**SYSTEMATIC INVESTIGATION
-      "agentRole": "You are a senior
+      "id": "phase-0-complete-setup",
+      "title": "Phase 0: Complete Investigation Setup",
+      "prompt": "**SYSTEMATIC INVESTIGATION SETUP** - Complete all mechanical setup before analysis begins.\n\n**This phase is PURELY MECHANICAL - no code analysis or hypothesis formation yet.**\n\n---\n\n**PART 1: Bug Report Triage**\n\nPlease provide complete bug context:\n- **Bug Description**: Observed vs expected behavior?\n- **Error Messages/Stack Traces**: Complete error output\n- **Reproduction Steps**: Consistent reproduction method?\n- **Environment Details**: OS, language/framework versions\n- **Recent Changes**: Commits, deployments, config changes?\n\n**Classify Project Type:**\n- Languages/Frameworks (primary tech stack)\n- Build System (Maven, Gradle, npm, etc.)\n- Testing Framework (JUnit, Jest, pytest, etc.)\n- Logging System (available mechanisms)\n- Architecture (monolithic, microservices, distributed, serverless)\n\n**Assess Bug Complexity:**\n- Simple: Single function, clear error path, minimal dependencies\n- Standard: Multiple components, moderate investigation required\n- Complex: Cross-system, race conditions, complex state management\n\n**Determine Automation Level:**\nAsk user: \"What automation level for this investigation?\"\n- High: Auto-approve decisions >8.0 confidence, minimal confirmations\n- Medium: Standard confirmations for key decisions\n- Low: Extra confirmations, manual approval for all changes\n\n---\n\n**PART 2: User Debugging Preferences**\n\n**Check for preferences in:**\n- User settings/memory\n- Project documentation (team standards)\n- Previous instructions/guidance\n\n**Categorize preferences:**\n- Debugging Tools: debugger vs logs vs traces\n- Log Verbosity: detailed vs concise\n- Output Format: structured vs human-readable\n- Testing Approach: unit vs integration test focus\n- Commit Style: conventional vs descriptive\n- Documentation: inline comments vs separate docs\n- Error Handling: fail fast vs defensive\n\n**If no explicit preferences, ask user:**\n- \"Verbose logging or concise summaries?\"\n- \"Interactive debuggers or log analysis?\"\n- \"Any specific tools or approaches your team prefers?\"\n\n---\n\n**PART 3: Tool Availability Check**\n\n**Verify core tools:**\n\n1. **Analysis Tools**: Test availability of grep_search, read_file, codebase_search\n2. **Git Operations**: Check `git --version`, set gitAvailable flag\n3. **Build/Test Tools** (based on projectType): npm/yarn, Maven/Gradle, pytest, etc.\n4. **Debugging Tools**: Language-specific debuggers, profilers, log aggregation\n\n**Fallback strategies if tools unavailable:**\n- grep_search fails \u2192 use file_search\n- codebase_search fails \u2192 use grep_search with context\n- Git unavailable \u2192 track changes in INVESTIGATION_CONTEXT.md\n- Build tools missing \u2192 focus on static analysis\n\n---\n\n**PART 4: Initialize Investigation Context Document**\n\nUse createInvestigationBranch() for version control, then create INVESTIGATION_CONTEXT.md with bug summary, progress tracking, environment setup, and resumption instructions.\n\n**REQUIRED OUTPUTS:**\n\nSet ALL context variables:\n- `projectType`, `bugComplexity`, `debuggingMechanism`, `isDistributed`\n- `automationLevel` (High/Medium/Low)\n- `userDebugPreferences` (categorized preferences object)\n- `availableTools` (array of available tool names)\n- `gitAvailable` (boolean)\n- `toolLimitations` (string describing any restrictions)\n- `contextInitialized` = true\n\nCreate comprehensive INVESTIGATION_CONTEXT.md with all function definitions from metaGuidance.",
+      "agentRole": "You are a senior investigation setup specialist with expertise in triage, environment configuration, and systematic investigation preparation. You excel at gathering complete context and preparing comprehensive investigation infrastructure.",
       "guidance": [
-        "
-        "
-        "
-        "
-      ]
-    },
-    {
-      "id": "phase-0a-assumption-check",
-      "title": "Phase 0a: Assumption Verification Checkpoint",
-      "prompt": "**ASSUMPTION CHECK** - Before proceeding, verify key assumptions to prevent bias.\n\n**VERIFY**:\n1. **Data State**: Confirm variable types and null handling\n2. **API/Library**: Check documentation for actual vs assumed behavior\n3. **Environment**: Verify bug exists in clean environment\n4. **Recent Changes**: Review last 5 commits for relevance\n\n**OUTPUT**: List verified assumptions with evidence sources.",
-      "agentRole": "You are a skeptical analyst who challenges every assumption. Question everything that hasn't been explicitly verified.",
-      "guidance": [
-        "Use analysis tools to verify, don't assume",
-        "Document each assumption with its verification method",
-        "Flag any unverifiable assumptions for tracking",
-        "CHECK API DOCS: Never assume function behavior from names - verify actual documentation",
-        "VERIFY DATA TYPES: Use debugger or logs to confirm actual runtime types and values",
-        "TEST ENVIRONMENT: Reproduce in minimal environment to rule out configuration issues"
-      ]
-    },
-    {
-      "id": "phase-0b-user-preferences",
-      "title": "Phase 0b: Identify User Debugging Preferences",
-      "prompt": "**USER DEBUGGING PREFERENCES** - Identify and document user-specific debugging preferences.\n\n**CHECK FOR PREFERENCES IN:**\n1. **User Settings/Memory**: Any stored debugging preferences\n2. **Project Documentation**: Team debugging standards\n3. **Previous Instructions**: Past user guidance on debugging approach\n\n**CATEGORIZE PREFERENCES:**\n- **Debugging Tools**: Preference for debugger vs logs vs traces\n- **Log Verbosity**: Detailed vs concise output\n- **Output Format**: Structured logs vs human-readable\n- **Testing Approach**: Unit tests vs integration tests focus\n- **Commit Style**: Conventional commits vs descriptive\n- **Documentation**: Inline comments vs separate docs\n- **Error Handling**: Fail fast vs defensive programming\n\n**IF NO EXPLICIT PREFERENCES:**\nAsk user:\n- \"Do you prefer verbose logging or concise summaries?\"\n- \"Should I use interactive debuggers or rely on log analysis?\"\n- \"Any specific tools or approaches your team prefers?\"\n\n**OUTPUT**: Set `userDebugPreferences` context variable with categorized preferences.\n\n**APPLY**: Use applyDebugPreferences() throughout investigation to adapt approach.",
-      "agentRole": "You are a debugging preferences specialist who understands how different teams and developers approach problem-solving. You excel at identifying and applying user-specific debugging styles.",
-      "guidance": [
-        "This step ensures the investigation aligns with user/team practices",
-        "Capture both explicit and implicit preferences",
-        "Default to standard practices if no preferences found",
-        "These preferences will be applied throughout the workflow"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-0c-tool-check",
-      "title": "Phase 0c: Tool Availability Verification",
-      "prompt": "**TOOL AVAILABILITY CHECK** - Verify required debugging tools before investigation.\n\n**CORE TOOLS CHECK:**\n1. **Analysis Tools**:\n - grep_search: Text pattern searching\n - read_file: File content reading\n - codebase_search: Semantic code search\n - Test availability, note any failures\n\n2. **Git Operations**:\n - Check git availability: `git --version`\n - If unavailable, set `gitAvailable = false`\n - Plan fallback: manual change tracking\n\n3. **Build/Test Tools** (based on projectType):\n - npm/yarn for JavaScript\n - Maven/Gradle for Java\n - pytest/unittest for Python\n - Document which are available\n\n4. **Debugging Tools**:\n - Language-specific debuggers\n - Profilers if needed\n - Log aggregation tools\n\n**FALLBACK STRATEGIES:**\n- grep_search fails \u2192 use file_search\n- codebase_search fails \u2192 use grep_search with context\n- Git unavailable \u2192 track changes in INVESTIGATION_CONTEXT.md\n- Build tools missing \u2192 focus on static analysis\n\n**OUTPUT**:\n- Set `availableTools` context variable\n- Set `toolLimitations` with any restrictions\n- Document fallback strategies in context\n\n**ADAPTATION**: Adjust investigation approach based on available tools.",
-      "agentRole": "You are a tool availability specialist ensuring the investigation can proceed smoothly with available resources. You excel at creating fallback strategies.",
-      "guidance": [
-        "Test each tool category systematically",
-        "Don't fail if some tools are unavailable - adapt",
-        "Document limitations clearly for user awareness",
-        "Prefer degraded functionality over investigation failure"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-0d-create-context",
-      "title": "Phase 0d: Initialize Investigation Context",
-      "prompt": "**CREATE INVESTIGATION CONTEXT** - Initialize comprehensive tracking document.\n\nUse createInvestigationBranch() to set up version control, then create INVESTIGATION_CONTEXT.md:\n\n```markdown\n# Investigation Context\n\n## 1. Bug Summary\n- **ID**: {{bugId || 'investigation-' + Date.now()}}\n- **Description**: [from bug report]\n- **Complexity**: {{bugComplexity}}\n- **Started**: {{new Date().toISOString()}}\n- **Status**: Phase 0d - Context Initialization\n- **Automation Level**: {{automationLevel}}\n\n## 2. Progress Tracking\n{{visualProgress()}}\n\u2705 Completed: Phase 0 (Triage), Phase 0a (Assumptions), Phase 0b (User Preferences), Phase 0c (Tools)\n\ud83d\udd04 Current: Phase 0d (Context Creation)\n\u23f3 Remaining: Phase 1 (Analysis), Phase 2 (Hypotheses), Phase 3-5 (Validation), Phase 6 (Writeup)\n\ud83d\udcca Confidence: 0/10\n\n## 3. Environment & Setup\n- **Project Type**: {{projectType}}\n- **Debugging Mechanism**: {{debuggingMechanism}}\n- **Architecture**: {{isDistributed ? 'Distributed' : 'Monolithic'}}\n- **User Preferences**: {{userDebugPreferences}}\n- **Available Tools**: {{availableTools}}\n- **Tool Limitations**: {{toolLimitations || 'None'}}\n\n## 4. Analysis Findings\n*To be populated during Phase 1*\n\n## 5. Hypothesis Registry\n*To be populated during Phase 2*\n\n## 6. Evidence Log\n*To be populated during validation*\n\n## 7. Experiment Results\n*To be populated if experiments conducted*\n\n## 8. Dead Ends & Lessons\n*Track approaches that didn't work*\n\n## 9. Function Definitions\n[Include all function definitions from metaGuidance for reference]\n\n## 10. Resumption Instructions\n\n### How to Resume This Investigation\n\n1. **Get the workflow**: Call `workflow_get` with:\n - id: \"systematic-bug-investigation-with-loops\"\n - mode: \"preview\" (to see next step)\n\n2. **Resume from saved state**: Call `workflow_next` with the JSON below:\n\n```json\n{\n \"workflowId\": \"systematic-bug-investigation-with-loops\",\n \"completedSteps\": [\"phase-0-triage\", \"phase-0a-assumption-check\", \"phase-0b-user-preferences\", \"phase-0c-tool-check\", \"phase-0d-create-context\"],\n \"context\": {\n \"bugComplexity\": \"{{bugComplexity}}\",\n \"projectType\": \"{{projectType}}\",\n \"debuggingMechanism\": \"{{debuggingMechanism}}\",\n \"isDistributed\": {{isDistributed || false}},\n \"automationLevel\": \"{{automationLevel}}\",\n \"userDebugPreferences\": {{JSON.stringify(userDebugPreferences)}},\n \"availableTools\": {{JSON.stringify(availableTools)}},\n \"toolLimitations\": {{JSON.stringify(toolLimitations)}}\n }\n}\n```\n\n3. **Continue investigation**: The workflow will pick up from where it left off\n\n### Important Notes\n- Update `completedSteps` array after completing each phase\n- Preserve all context variables for proper state restoration\n- This JSON should be updated after major milestones\n```\n\n**Set `contextInitialized` = true**",
-      "agentRole": "You are creating the central documentation hub for this investigation. This document will track all progress, findings, and enable seamless handoffs.",
-      "guidance": [
-        "Create a comprehensive but scannable document",
-        "Include all context variables discovered so far",
-        "Set up structure for future updates",
-        "Include function definitions for reference",
-        "Update the resumption JSON after each major phase using addResumptionJson()",
-        "Always include the workflow_get and workflow_next instructions for proper resumption"
+        "This phase is MECHANICAL ONLY - no code analysis or hypothesis formation",
+        "Complete all 4 parts thoroughly before proceeding",
+        "If critical information is missing, explicitly request it",
+        "Test tool availability - don't assume",
+        "Create comprehensive INVESTIGATION_CONTEXT.md for handoffs",
+        "All context variables must be set before next phase",
+        "This setup enables the entire investigation workflow"
       ],
       "requireConfirmation": false
     },
     {
-      "id": "phase-
-      "title": "Phase
-      "prompt": "**\u26a0\ufe0f WORKFLOW EXECUTION COMMITMENT CHECKPOINT \u26a0\ufe0f**\n\n*(Note: This checkpoint only appears in Medium/Low automation modes. High automation mode proceeds automatically.)*\n\nYou have completed Phase 0 (
+      "id": "phase-0a-workflow-commitment",
+      "title": "Phase 0a: Workflow Execution Commitment & Early Termination Checkpoint",
+      "prompt": "**\u26a0\ufe0f WORKFLOW EXECUTION COMMITMENT CHECKPOINT \u26a0\ufe0f**\n\n*(Note: This checkpoint only appears in Medium/Low automation modes. High automation mode proceeds automatically.)*\n\nYou have completed Phase 0 (Complete Setup). Before proceeding to the investigation phases, you MUST acknowledge your understanding of workflow execution requirements AND make a critical decision.\n\n**CRITICAL UNDERSTANDING:**\n\n1. **This is a 23-step structured workflow, not freestyle debugging**\n - You MUST call workflow_next repeatedly until isComplete=true\n - You CANNOT stop early, even if you think you know the bug\n - You CANNOT \"figure it out\" and skip steps\n\n2. **Professional research shows 90% error rate for premature conclusions**\n - Even with 9-10/10 confidence, skipping systematic steps leads to wrong conclusions\n - Edge cases, alternative explanations, and interaction effects are missed\n - The workflow FORCES thoroughness for a reason\n\n3. **Remaining phases you MUST complete (regardless of confidence):**\n - \u2705 Phase 0: Triage & Setup (COMPLETED)\n - \u23f3 Phase 1: Multi-Dimensional Codebase Analysis (5 sub-phases)\n - \u23f3 Phase 2: Hypothesis Formation & Validation (8 sub-phases)\n - \u23f3 Phase 3: Debugging Instrumentation\n - \u23f3 Phase 4: Evidence Collection & Experimentation\n - \u23f3 Phase 5: Final Analysis & Confidence Assessment\n - \u23f3 Phase 6: Comprehensive Diagnostic Writeup (REQUIRED DELIVERABLE)\n\n4. **What \"completion\" means:**\n - isComplete=true from workflow_next (this happens ONLY after Phase 6)\n - Comprehensive diagnostic writeup produced\n - All evidence documented\n - Investigation context finalized\n\n**DECISION POINT: Continue Full Investigation or Request Early Termination?**\n\n**OPTION 1: Complete Full Investigation (STRONGLY RECOMMENDED)**\n- Proceed through all 23 workflow steps\n- Professional standard for thorough bug investigations\n- 90% success rate for identifying true root cause\n- Produces comprehensive diagnostic writeup with actionable findings\n\n**OPTION 2: Request Early Termination (STRONGLY DISCOURAGED)**\n- Skip systematic investigation and attempt to diagnose now\n- \u26a0\ufe0f **EVIDENCE-BASED WARNING:**\n - **Professional research spanning 20+ years shows that agents who skip systematic investigation are wrong about the root cause ~90% of the time**\n - Even if you feel 9-10/10 confident right now, completing the full workflow dramatically increases accuracy\n - Quick conclusions miss: contracts & invariants, pattern deviations, HOT paths, edge cases, interaction effects, and alternative explanations\n - The full workflow takes ~30-60 minutes but increases fix success rate from ~10% to ~90%\n\n**REQUIRED AGENT ACTIONS:**\n\n1. **Acknowledge workflow commitment:**\n\"I acknowledge that I am executing a structured 23-step workflow and I MUST call workflow_next for every step until isComplete=true. I understand that stopping early leads to incorrect conclusions ~90% of the time. I will complete all phases (0-6) regardless of my confidence level.\"\n\n2. **State your recommendation:**\nYou MUST recommend OPTION 1 (complete full investigation) unless you have truly exceptional circumstances.\n\n3. **MANDATORY USER COMMUNICATION:**\nYou MUST explicitly tell the user (this is NOT optional):\n\"**I strongly recommend we complete the full systematic investigation (26 steps through Phase 6). Professional research shows this approach identifies the TRUE root cause ~90% of the time, compared to ~10% for quick conclusions. Even if I develop high confidence early, completing the full workflow\u2014including contracts analysis, pattern discovery, HOT path analysis, instrumentation, and evidence collection\u2014dramatically increases the likelihood of correctly identifying the root cause and preventing wasted time on wrong fixes.**\n\nDo you want to proceed with the full investigation (recommended), or would you prefer I attempt a quick diagnosis now (discouraged)?\"\n\n**USER CONFIRMATION REQUIRED:**\nThe user must explicitly choose to proceed with full investigation or request early termination.",
       "agentRole": "You are a workflow governance specialist ensuring agents understand they are bound to execute all workflow steps systematically, and that they MUST communicate the value of full workflow completion to users.",
       "guidance": [
         "This checkpoint prevents premature termination at the earliest possible point",
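The consolidated setup step's REQUIRED OUTPUTS amount to a single context object. One possible shape, sketched in TypeScript — the field names come from the prompt above; every value shown is an invented example, not package output:

```typescript
// Illustrative only: context variables the new "phase-0-complete-setup" step
// must set before the workflow proceeds. Names from the REQUIRED OUTPUTS list.
const investigationContext = {
  projectType: "backend service (Node.js + Express)", // example classification
  bugComplexity: "Standard",            // Simple | Standard | Complex
  debuggingMechanism: "structured logs",
  isDistributed: false,
  automationLevel: "Medium",            // High | Medium | Low
  userDebugPreferences: { logVerbosity: "concise", tooling: "log analysis" },
  availableTools: ["grep_search", "read_file", "codebase_search"],
  gitAvailable: true,
  toolLimitations: "no profiler available",
  contextInitialized: true,
};
```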
@@ -196,12 +148,15 @@
     {
       "id": "analysis-neighborhood-contracts",
       "title": "Analysis 1/5: Neighborhood, Call Graph & Contracts",
-      "prompt": "**NEIGHBORHOOD & CONTRACTS DISCOVERY - Build Structural Foundation**\n\nGoal: Build lightweight understanding of code structure, relationships, and contracts BEFORE diving into details. This provides the scaffolding for all subsequent analysis.\n\n**STEP 1: Compute Module Root**\n- Find nearest common ancestor of error stack trace files\n- Clamp to package boundary or src/ directory\n- This defines your investigation scope\n- Set `moduleRoot` context variable\n\n**STEP 2: Neighborhood Map** (cap per file to prevent analysis paralysis)\n- For each file in error stack trace:\n - List immediate neighbors (same directory, max 8)\n - Find imports/exports directly used (max 10)\n - Locate co-located tests (same name pattern)\n - Identify closest entry points: routes, endpoints, CLI commands (max 5)\n- Produce table: File | Neighbors | Tests | Entry Points\n\n**STEP 3: Bounded Call Graph** (Small Multiples with HOT Path Ranking)\n- For each failing function/class in stack trace:\n - Build call graph \u22642 hops deep (inbound and outbound)\n - Cap total nodes at \u226415 per failing symbol\n - Score edges for HOT path ranking:\n * Error location in path: +3\n * Entry point to path: +2 \n * Test coverage exists: +1\n * Mentioned in ticket/error message: +1\n - Tag paths as HOT if score \u22653\n - Use Small Multiples ASCII visualization:\n * Width \u2264100 chars per path\n * Format: `EntryPoint -> Caller -> [*FailingSymbol*] -> Callee`\n * Mark changed/failing code as `[*name*]`\n * Add HOT tag for high-impact paths\n * \u22648 total paths, prioritize HOT paths first\n - If graph exceeds caps, use Adjacency Summary instead:\n * Table: Node | Inbound | Outbound | Notes\n * Top-K by degree/frequency\n- Create Alias Legend for repeated subpaths:\n * A1 = common.validation.validateInput\n * A2 = database.connection.getPool\n * Reuse aliases across all paths\n\n**STEP 4: Flow Anchors** (Entry Points to Bug)\n- Map how users/systems trigger the bug:\n - HTTP routes \u2192 handlers \u2192 failing code\n - CLI commands \u2192 execution \u2192 failing code \n - Scheduled jobs \u2192 workers \u2192 failing code\n - Event handlers \u2192 callbacks \u2192 failing code\n- Produce table: Anchor Type | Entry Point | Target Symbol | User Action\n- Cap at \u22645 most relevant anchors\n- Note: This tells us HOW the bug is reached\n\n**STEP 5: Contracts & Invariants**\n- Within `moduleRoot` and immediate neighbors:\n - List public API symbols (exported functions/classes)\n - Document API endpoints (REST/GraphQL/RPC)\n - Identify database tables/collections touched\n - Note message queue topics/events\n - Extract stated invariants from:\n * JSDoc/docstrings with @invariant\n * Assertions in code\n * Validation logic patterns\n * Comments describing guarantees\n- Produce table: Symbol/API | Contract | Invariant | Location\n- Focus on contracts related to failing code\n\n**OUTPUT: Create StructuralAnalysis.md with:**\n- Module Root declaration\n- Neighborhood Map table\n- Bounded Call Graph (Small Multiples ASCII or Adjacency Summary)\n- Alias Legend (for call graph subpaths)\n- Flow Anchors table\n- Contracts & Invariants table\n- Self-Critique: 1-2 areas of uncertainty\n\n**CAPS (strictly enforce to prevent analysis paralysis):**\n- \u22648 neighbors per file\n- \u226410 imports per file\n- \u22645 entry points total\n- \u226415 call graph nodes per failing symbol\n- \u22648 total call graph paths\n- \u22645 flow anchors\n- \u2264100 chars width for ASCII paths",
+      "prompt": "**NEIGHBORHOOD & CONTRACTS DISCOVERY - Build Structural Foundation**\n\nGoal: Build lightweight understanding of code structure, relationships, and contracts BEFORE diving into details. This provides the scaffolding for all subsequent analysis.\n\n**STEP 1: Compute Module Root**\n- Find nearest common ancestor of error stack trace files\n- Clamp to package boundary or src/ directory\n- This defines your investigation scope\n- Set `moduleRoot` context variable\n\n**STEP 2: Neighborhood Map** (cap per file to prevent analysis paralysis)\n- For each file in error stack trace:\n - List immediate neighbors (same directory, max 8)\n - Find imports/exports directly used (max 10)\n - Locate co-located tests (same name pattern)\n - Identify closest entry points: routes, endpoints, CLI commands (max 5)\n- Produce table: File | Neighbors | Tests | Entry Points\n\n**STEP 3: Bounded Call Graph** (Small Multiples with HOT Path Ranking)\n- For each failing function/class in stack trace:\n - Build call graph \u22642 hops deep (inbound and outbound)\n - Cap total nodes at \u226415 per failing symbol\n - Score edges for HOT path ranking:\n * Error location in path: +3\n * Entry point to path: +2 \n * Test coverage exists: +1\n * Mentioned in ticket/error message: +1\n - Tag paths as HOT if score \u22653\n - Use Small Multiples ASCII visualization:\n * Width \u2264100 chars per path\n * Format: `EntryPoint -> Caller -> [*FailingSymbol*] -> Callee`\n * Mark changed/failing code as `[*name*]`\n * Add HOT tag for high-impact paths\n * \u22648 total paths, prioritize HOT paths first\n - If graph exceeds caps, use Adjacency Summary instead:\n * Table: Node | Inbound | Outbound | Notes\n * Top-K by degree/frequency\n- Create Alias Legend for repeated subpaths:\n * A1 = common.validation.validateInput\n * A2 = database.connection.getPool\n * Reuse aliases across all paths\n\n**STEP 4: Flow Anchors** (Entry Points to Bug)\n- Map how users/systems trigger the bug:\n - HTTP routes \u2192 handlers \u2192 failing code\n - CLI commands \u2192 execution \u2192 failing code \n - Scheduled jobs \u2192 workers \u2192 failing code\n - Event handlers \u2192 callbacks \u2192 failing code\n- Produce table: Anchor Type | Entry Point | Target Symbol | User Action\n- Cap at \u22645 most relevant anchors\n- Note: This tells us HOW the bug is reached\n\n**STEP 5: Contracts & Invariants**\n- Within `moduleRoot` and immediate neighbors:\n - List public API symbols (exported functions/classes)\n - Document API endpoints (REST/GraphQL/RPC)\n - Identify database tables/collections touched\n - Note message queue topics/events\n - Extract stated invariants from:\n * JSDoc/docstrings with @invariant\n * Assertions in code\n * Validation logic patterns\n * Comments describing guarantees\n- Produce table: Symbol/API | Contract | Invariant | Location\n- Focus on contracts related to failing code\n\n**STEP 6: Assumption Verification** (NOW that you've seen the code)\nNow that you understand the code structure, verify assumptions from the bug report:\n\n1. **Bug Report Assumptions**:\n - Is the described behavior actually a bug, or might it be expected based on what you've seen?\n - Are the reproduction steps accurate given the code paths you've mapped?\n - Is the error message consistent with the actual code flow?\n - Are there missing steps or context in the bug report?\n\n2. **API/Library Assumptions**:\n - Check documentation for any APIs/libraries mentioned in stack trace\n - Verify actual behavior vs assumed behavior\n - Note any version-specific behavior that might matter\n\n3. **Environment Assumptions**:\n - Based on code, could this be environment-specific?\n - Are there configuration dependencies visible in the code?\n - Could timing/concurrency be a factor (based on code structure)?\n\n4. **Recent Changes Impact**:\n - Review last 5 commits affecting the failing code\n - Do they relate to the bug or point to alternative causes?\n\n**Document**: Create AssumptionVerification.md with verified/challenged assumptions.\n\n---\n\n**OUTPUT: Create StructuralAnalysis.md with:**\n- Module Root declaration\n- Neighborhood Map table\n- Bounded Call Graph (Small Multiples ASCII or Adjacency Summary)\n- Alias Legend (for call graph subpaths)\n- Flow Anchors table\n- Contracts & Invariants table\n- Self-Critique: 1-2 areas of uncertainty\n\n**CAPS (strictly enforce to prevent analysis paralysis):**\n- \u22648 neighbors per file\n- \u226410 imports per file\n- \u22645 entry points total\n- \u226415 call graph nodes per failing symbol\n- \u22648 total call graph paths\n- \u22645 flow anchors\n- \u2264100 chars width for ASCII paths",
       "agentRole": "You are a codebase navigator building structural understanding. Your focus is mapping relationships, entry points, and contracts WITHOUT diving into implementation details yet.",
       "guidance": [
         "This is analysis phase 1 of 5 total phases",
-        "Phase 1a = Structure - Build the map
+        "Phase 1a = Structure + Assumption Verification - Build the map, THEN question the bug report",
         "Initialize majorIssuesFound = false",
+        "STEPS 1-5: Build structural understanding FIRST",
+        "STEP 6: NOW verify assumptions - you have context to challenge the bug report",
+        "CRITICAL: You can't meaningfully question assumptions before seeing code",
         "STRICTLY ENFORCE CAPS - this prevents 2-hour rabbit holes",
         "Small Multiples: Render mini ASCII path diagrams (\u22646 nodes per path)",
         "HOT Path Ranking: Score and prioritize high-impact paths",
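The edge-scoring rubric in this prompt is concrete enough to express directly. A sketch in TypeScript — the `CallPath` shape and field names are hypothetical, while the weights and the HOT threshold (score ≥ 3) come straight from the prompt text:

```typescript
// HOT-path scoring as described in the Analysis 1/5 prompt.
interface CallPath {
  containsErrorLocation: boolean; // error location in path: +3
  startsAtEntryPoint: boolean;    // entry point to path: +2
  hasTestCoverage: boolean;       // test coverage exists: +1
  mentionedInTicket: boolean;     // mentioned in ticket/error message: +1
}

function hotScore(p: CallPath): number {
  return (
    (p.containsErrorLocation ? 3 : 0) +
    (p.startsAtEntryPoint ? 2 : 0) +
    (p.hasTestCoverage ? 1 : 0) +
    (p.mentionedInTicket ? 1 : 0)
  );
}

// Tag a path as HOT when its score reaches the prompt's threshold.
const isHot = (p: CallPath): boolean => hotScore(p) >= 3;
```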
@@ -209,7 +164,7 @@
         "Adjacency Summary: If caps exceeded, use tabular summary instead of full graph",
         "Contracts are CRITICAL: They tell us what guarantees the code must maintain",
         "Flow Anchors show HOW users trigger the bug - essential for reproduction",
-        "Create StructuralAnalysis.md
+        "Create StructuralAnalysis.md AND AssumptionVerification.md",
         "Update INVESTIGATION_CONTEXT.md with module root and structural summary",
         "This phase provides the scaffolding for all subsequent analysis"
       ],
@@ -366,12 +321,14 @@
     {
       "id": "phase-1f-breadth-verification",
       "title": "Phase 1f: Final Breadth & Scope Verification",
-      "prompt": "**FINAL BREADTH & SCOPE VERIFICATION - Catch Tunnel Vision NOW**\n\n\u26a0\ufe0f **CRITICAL CHECKPOINT BEFORE HYPOTHESES**: This step prevents the #1 cause of wrong conclusions: looking in the wrong place or missing the wider context.\n\n**Goal**: Verify you analyzed the RIGHT code with sufficient breadth AND depth before committing to hypotheses.\n\n\ud83d\udea8 **DO NOT STOP HERE**: Even if you think you found the bug during analysis, you have ZERO PROOF. Analysis = educated guesses. Proof comes from Phases 3-5 (instrumentation + evidence). You are only ~25% done. MUST continue to Phase 2.\n\n---\n\n**STEP 1: Scope Sanity Check**\n\nAsk yourself these questions:\n1. **Module Root Correctness**: Is the `moduleRoot` from Phase 1a actually correct?\n - Does it include ALL files in the error stack trace?\n - Did I clamp too narrowly to a subdirectory when the bug spans multiple modules?\n - Should I expand scope to parent directory or adjacent modules?\n\n2. **Missing Adjacent Systems**: Did I consider:\n - Adjacent microservices/modules that interact with this one?\n - Shared libraries or utilities used here?\n - Configuration systems (env vars, config files, feature flags)?\n - Caching layers or state management systems?\n - Database schema or data migration issues?\n\n3. **Entry Point Coverage**: From Phase 1a Flow Anchors, did I verify:\n - ALL entry points that could trigger this bug?\n - Less obvious entry points (background jobs, scheduled tasks, webhooks)?\n - Initialization code that runs before the failing code?\n\n---\n\n**STEP 2: Wide-Angle Review**\n\nReview your Phase 1 analysis outputs and answer:\n\n1. **Pattern Confidence** (from Phase 1, sub-phase 2):\n - Do I have a solid Pattern Catalog with \u22652 occurrences per pattern?\n - Did I identify clear pattern deviations in failing code?\n - Are there OTHER files that deviate from patterns I haven't looked at?\n\n2. **Call Graph Completeness** (from Phase 1, sub-phase 1 & 2):\n - Did my bounded call graph capture all HOT paths?\n - Are there callers OUTSIDE my 2-hop boundary I should check?\n - Did I trace BACKWARDS from the error far enough (to true entry points)?\n\n3. **Component Rankings** (from Phase 1, sub-phase 3):\n - Are my top 5 components actually the most suspicious?\n - Did I miss components because they're not in the stack trace?\n - Should I re-rank based on new understanding?\n\n4. **Data Flow Completeness** (from Phase 1, sub-phase 4):\n - Did I trace data flow from TRUE origin (user input, external system)?\n - Are there data transformations BEFORE my analyzed scope?\n - Did I check data validation at ALL boundaries?\n\n5. **Test Coverage Gaps** (from Phase 1, sub-phase 5):\n - Did I find tests that SHOULD exist but don't?\n - Are there missing test categories (integration, edge cases, error conditions)?\n - Do test gaps reveal I'm looking in wrong place?\n\n---\n\n**STEP 3: Alternative Scope Analysis**\n\n**Generate 2-3 alternative investigation scopes and evaluate:**\n\nFor each alternative scope, assess:\n- **Scope Description**: What module/area would this focus on?\n- **Why It Might Be Better**: What evidence suggests this scope?\n- **Evidence For**: What supports investigating this area?\n- **Evidence Against**: Why might this be wrong direction?\n- **Confidence**: Rate 1-10 that this is the right scope\n\n**Example Alternative Scopes**:\n- Expand to parent module (if current feels too narrow)\n- Shift to adjacent service (if this might be symptom not cause)\n- Focus on infrastructure layer (if might be env/config issue)\n- Focus on data layer (if might be data corruption/migration issue)\n\n---\n\n**STEP 4: Breadth Decision**\n\nBased on Steps 1-3, make ONE of these decisions:\n\n**OPTION A: SCOPE IS CORRECT - Continue to Hypothesis Development**\n- Current module root and analyzed components are right\n- Breadth and depth are sufficient\n- Ready to form hypotheses with confidence\n- Set `scopeVerified = true` and proceed\n\n**OPTION B: EXPAND SCOPE - Additional Analysis Required**\n- Identified critical gaps in breadth or depth\n- Need to analyze additional modules/components\n- Set specific components/areas to add to analysis\n- Set `needsScopeExpansion = true`\n- Document what to add: `additionalAnalysisNeeded = [list]`\n\n**OPTION C: SHIFT SCOPE - Wrong Area**\n- Current focus is likely wrong place\n- Alternative scope has stronger evidence\n- Need to restart Phase 1 with new module root\n- Set `needsScopeShift = true`\n- Set `newModuleRoot = [path]`\n\n---\n\n**OUTPUT: Create ScopeVerification.md**\n\nMust include:\n1. **Scope Sanity Check Results** (answers to Step 1 questions)\n2. **Wide-Angle Review Findings** (answers to Step 2 questions)\n3. **Alternative Scopes Evaluated** (2-3 alternatives with scores)\n4. **Breadth Decision** (A, B, or C with justification)\n5. **Confidence in Current Scope** (1-10)\n6. **Action Items** (if Option B or C selected)\n\n**Context Variables to Set**:\n- `scopeVerified` (true/false)\n- `needsScopeExpansion` (true/false)\n- `needsScopeShift` (true/false)\n- `scopeConfidence` (1-10)\n- `additionalAnalysisNeeded` (array, if Option B)\n- `newModuleRoot` (string, if Option C)\n\n---\n\n**\ud83c\udfaf WHY THIS MATTERS**: \n\nResearch shows that 60% of failed investigations looked in the wrong place or too narrowly. This checkpoint catches that BEFORE you invest effort in wrong hypotheses.\n\n**Self-Critique**: List 1-2 specific uncertainties about scope that concern you most.",
+      "prompt": "**FINAL BREADTH & SCOPE VERIFICATION - Catch Tunnel Vision NOW**\n\n\u26a0\ufe0f **CRITICAL CHECKPOINT BEFORE HYPOTHESES**: This step prevents the #1 cause of wrong conclusions: looking in the wrong place or missing the wider context.\n\n**Goal**: Verify you analyzed the RIGHT code with sufficient breadth AND depth before committing to hypotheses.\n\n\ud83d\udea8 **DO NOT STOP HERE**: Even if you think you found the bug during analysis, you have ZERO PROOF. Analysis = educated guesses. Proof comes from Phases 3-5 (instrumentation + evidence). You are only ~25% done. MUST continue to Phase 2.\n\n---\n\n**STEP 1: Scope Sanity Check**\n\nAsk yourself these questions:\n1. **Module Root Correctness**: Is the `moduleRoot` from Phase 1a actually correct?\n - Does it include ALL files in the error stack trace?\n - Did I clamp too narrowly to a subdirectory when the bug spans multiple modules?\n - Should I expand scope to parent directory or adjacent modules?\n\n2. **Missing Adjacent Systems**: Did I consider:\n - Adjacent microservices/modules that interact with this one?\n - Shared libraries or utilities used here?\n - Configuration systems (env vars, config files, feature flags)?\n - Caching layers or state management systems?\n - Database schema or data migration issues?\n\n3. **Entry Point Coverage**: From Phase 1a Flow Anchors, did I verify:\n - ALL entry points that could trigger this bug?\n - Less obvious entry points (background jobs, scheduled tasks, webhooks)?\n - Initialization code that runs before the failing code?\n\n---\n\n**STEP 2: Wide-Angle Review**\n\nReview your Phase 1 analysis outputs and answer:\n\n1. **Pattern Confidence** (from Phase 1, sub-phase 2):\n - Do I have a solid Pattern Catalog with \u22652 occurrences per pattern?\n - Did I identify clear pattern deviations in failing code?\n - Are there OTHER files that deviate from patterns I haven't looked at?\n\n2. **Call Graph Completeness** (from Phase 1, sub-phase 1 & 2):\n - Did my bounded call graph capture all HOT paths?\n - Are there callers OUTSIDE my 2-hop boundary I should check?\n - Did I trace BACKWARDS from the error far enough (to true entry points)?\n\n3. **Component Rankings** (from Phase 1, sub-phase 3):\n - Are my top 5 components actually the most suspicious?\n - Did I miss components because they're not in the stack trace?\n - Should I re-rank based on new understanding?\n\n4. **Data Flow Completeness** (from Phase 1, sub-phase 4):\n - Did I trace data flow from TRUE origin (user input, external system)?\n - Are there data transformations BEFORE my analyzed scope?\n - Did I check data validation at ALL boundaries?\n\n5. **Test Coverage Gaps** (from Phase 1, sub-phase 5):\n - Did I find tests that SHOULD exist but don't?\n - Are there missing test categories (integration, edge cases, error conditions)?\n - Do test gaps reveal I'm looking in wrong place?\n\n---\n\n\n\n**STEP 2.5: Assumption Verification**\n\n**NOW that you've completed 5 phases of code analysis, verify all assumptions:**\n\n1. **Bug Report Assumptions**:\n - Is the described behavior actually a bug based on what you now know about the code?\n - Are the reproduction steps accurate given the code paths you've mapped?\n - Is the error message consistent with the actual code flow you've traced?\n - Are there missing steps or context in the bug report that your analysis revealed?\n\n2. **API/Library Assumptions**:\n - Check documentation for any APIs/libraries mentioned in stack trace\n - Verify actual behavior vs assumed behavior based on your code analysis\n - Note any version-specific behavior that might matter\n - Did your call graph analysis reveal unexpected library usage patterns?\n\n3. **Environment Assumptions**:\n - Based on code analysis, is this environment-specific?\n - Are there configuration dependencies you discovered in the code?\n - Could timing/concurrency be a factor (based on code structure you analyzed)?\n - Did pattern analysis reveal environment-dependent code paths?\n\n4. **Recent Changes Impact**:\n - Review last 5-10 commits affecting the analyzed code\n - Do they relate to the bug or point to alternative causes?\n - Did your analysis reveal recent changes that break established patterns?\n\n**Document**: Create or update AssumptionVerification.md with verified/challenged assumptions.\n\n**Set**: `assumptionsVerified = true` in context\n\n---\n**STEP 3: Alternative Scope Analysis**\n\n**Generate 2-3 alternative investigation scopes and evaluate:**\n\nFor each alternative scope, assess:\n- **Scope Description**: What module/area would this focus on?\n- **Why It Might Be Better**: What evidence suggests this scope?\n- **Evidence For**: What supports investigating this area?\n- **Evidence Against**: Why might this be wrong direction?\n- **Confidence**: Rate 1-10 that this is the right scope\n\n**Example Alternative Scopes**:\n- Expand to parent module (if current feels too narrow)\n- Shift to adjacent service (if this might be symptom not cause)\n- Focus on infrastructure layer (if might be env/config issue)\n- Focus on data layer (if might be data corruption/migration issue)\n\n---\n\n**STEP 4: Breadth Decision**\n\nBased on Steps 1-3, make ONE of these decisions:\n\n**OPTION A: SCOPE IS CORRECT - Continue to Hypothesis Development**\n- Current module root and analyzed components are right\n- Breadth and depth are sufficient\n- Ready to form hypotheses with confidence\n- Set `scopeVerified = true` and proceed\n\n**OPTION B: EXPAND SCOPE - Additional Analysis Required**\n- Identified critical gaps in breadth or depth\n- Need to analyze additional modules/components\n- Set specific components/areas to add to analysis\n- Set `needsScopeExpansion = true`\n- Document what to add: `additionalAnalysisNeeded = [list]`\n\n**OPTION C: SHIFT SCOPE - Wrong Area**\n- Current focus is likely wrong place\n- Alternative scope has stronger evidence\n- Need to restart Phase 1 with new module root\n- Set `needsScopeShift = true`\n- Set `newModuleRoot = [path]`\n\n---\n\n**OUTPUT: Create ScopeVerification.md**\n\nMust include:\n1. **Scope Sanity Check Results** (answers to Step 1 questions)\n2. **Wide-Angle Review Findings** (answers to Step 2 questions)\n3. **Alternative Scopes Evaluated** (2-3 alternatives with scores)\n4. **Breadth Decision** (A, B, or C with justification)\n5. **Confidence in Current Scope** (1-10)\n6. **Action Items** (if Option B or C selected)\n\n**Context Variables to Set**:\n- `scopeVerified` (true/false)\n- `needsScopeExpansion` (true/false)\n- `needsScopeShift` (true/false)\n- `scopeConfidence` (1-10)\n- `additionalAnalysisNeeded` (array, if Option B)\n- `newModuleRoot` (string, if Option C)\n\n---\n\n**\ud83c\udfaf WHY THIS MATTERS**: \n\nResearch shows that 60% of failed investigations looked in the wrong place or too narrowly. This checkpoint catches that BEFORE you invest effort in wrong hypotheses.\n\n**Self-Critique**: List 1-2 specific uncertainties about scope that concern you most.",
       "agentRole": "You are a senior investigator performing final scope verification. Your expertise is catching tunnel vision, identifying missing context, and ensuring investigations focus on the right area. You excel at meta-analysis and sanity checking investigative scope.",
       "guidance": [
         "This step comes AFTER Phase 1 (5-phase analysis loop) and BEFORE Phase 2a (hypothesis development)",
         "Goal: Catch tunnel vision and wrong-place investigations BEFORE committing to hypotheses",
         "Create ScopeVerification.md with structured findings",
+        "STEP 2.5: Verify ALL assumptions from bug report now that you have full code context",
+        "Assumption verification MUST happen AFTER all 5 analysis iterations for full context",
         "CRITICAL: Evaluate 2-3 ALTERNATIVE scopes to challenge your current focus",
         "Common mistakes: too narrow scope, missed adjacent systems, wrong module root, insufficient entry point coverage",
         "If Option B (expand) or C (shift) selected, you MUST execute additional analysis before proceeding",
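The breadth decision in this step reduces to a small set of context variables. An illustrative shape in TypeScript — variable names come from the prompt's "Context Variables to Set" list; the Option B values shown are invented for the example:

```typescript
// Illustrative Phase 1f decision output (an Option B "expand scope" outcome).
const scopeDecision = {
  scopeVerified: false,
  needsScopeExpansion: true,  // Option B: critical gaps found, expand analysis
  needsScopeShift: false,     // Option C would restart Phase 1 with a new root
  scopeConfidence: 6,         // 1-10 confidence in the current scope
  additionalAnalysisNeeded: ["shared validation library", "config loader"],
  newModuleRoot: undefined,   // only set for Option C
};
```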