npm - @exaudeus/workrail - Versions diffs - 0.8.2 → 0.8.4 - Mend

@exaudeus/workrail 0.8.2 → 0.8.4

This diff compares the contents of two publicly available package versions as released to one of the supported public registries. It is provided for informational purposes only.

Files changed (12)

package/dist/config/feature-flags.js +8 -0
package/dist/infrastructure/storage/file-workflow-storage.d.ts +4 -0
package/dist/infrastructure/storage/file-workflow-storage.js +67 -26
package/package.json +1 -1
package/workflows/bug-investigation.agentic.json +10 -10
package/workflows/routines/context-gathering.json +149 -0
package/workflows/routines/execution-simulation.json +84 -0
package/workflows/routines/feature-implementation.json +119 -0
package/workflows/routines/hypothesis-challenge.json +113 -0
package/workflows/routines/plan-analysis.json +67 -106
package/workflows/workflow-diagnose-environment.json +21 -14
package/workflows/systematic-bug-investigation-with-loops.backup-20251106-125543.json +0 -751

package/workflows/systematic-bug-investigation-with-loops.backup-20251106-125543.json DELETED

@@ -1,751 +0,0 @@
-{
-  "id": "systematic-bug-investigation-with-loops",
-  "name": "Systematic Bug Investigation Workflow",
-  "version": "1.1.0-beta.8",
-  "description": "A comprehensive workflow for systematic bug and failing test investigation that prevents LLMs from jumping to conclusions. Enforces thorough evidence gathering, hypothesis formation, debugging instrumentation, and validation to achieve near 100% certainty about root causes. This workflow does NOT fix bugs - it produces detailed diagnostic writeups that enable effective fixing by providing complete understanding of what is happening, why it's happening, and supporting evidence.",
-  "clarificationPrompts": [
-    "What type of system is this? (web app, mobile app, backend service, desktop app, etc.)",
-    "How consistently can you reproduce this bug? (always reproducible, sometimes reproducible, rarely reproducible)",
-    "What was the last known working version or state if applicable?",
-    "Are there any time constraints or urgency factors for this investigation?",
-    "What level of system access do you have? (full codebase, limited access, production logs only)",
-    "What existing documentation is available? (README files, architecture docs, API docs, design documents, runbooks)",
-    "Do you have access to existing logs? (production logs, error logs, debug logs, metrics, traces)",
-    "Do you have preferences for handling large log volumes? (sub-chat analysis, inline summaries only, or no preference for automatic decision)"
-  ],
-  "preconditions": [
-    "User has identified a specific bug or failing test to investigate",
-    "Agent has access to codebase analysis tools (grep, file readers, etc.)",
-    "Agent has access to build/test execution tools for the project type",
-    "User can provide error messages, stack traces, or test failure output",
-    "Bug is reproducible with specific steps or a minimal test case"
-  ],
-  "metaGuidance": [
-    "**\ud83d\udea8 MANDATORY WORKFLOW EXECUTION - READ THIS FIRST:**",
-    "YOU ARE EXECUTING A STRUCTURED WORKFLOW, NOT FREESTYLE DEBUGGING.",
-    "You CANNOT \"figure out the bug\" and stop. You MUST execute all 27 workflow steps by repeatedly calling workflow_next and following instructions until the MCP returns isComplete=true.",
-    "WORKFLOW MECHANICS: Each call to workflow_next returns the next required step. You MUST execute that step, then call workflow_next again. Repeat until isComplete=true.",
-    "DO NOT STOP CALLING WORKFLOW_NEXT: Even if you think you know the bug, even if you have high confidence, even if it seems obvious - you MUST continue calling workflow_next.",
-    "STEP COUNTER: Every prompt shows \"Step X of 27\" - you are NOT done until you reach Step 29/29 and isComplete=true.",
-    "**\ud83c\udfaf WHY THIS STRUCTURE EXISTS (Evidence-Based):**",
-    "Professional research spanning 20+ years shows agents who skip systematic investigation steps are wrong ~90% of the time, even with 9-10/10 self-reported confidence.",
-    "Quick conclusions miss: edge cases, alternative explanations, environment factors, interaction effects, and data corruption paths.",
-    "This workflow FORCES thoroughness through: code analysis, hypothesis formation, instrumentation, evidence gathering, adversarial review, and comprehensive documentation.",
-    "**CRITICAL WORKFLOW DISCIPLINE:**",
-    "HIGH CONFIDENCE \u2260 INVESTIGATION COMPLETE: Achieving 8-10/10 confidence in a hypothesis is excellent progress but does NOT mean the workflow is done.",
-    "COMPLETE ALL PHASES: You MUST complete ALL phases (0 through 6) regardless of confidence level. Each phase builds critical evidence and documentation.",
-    "WORKFLOW COMPLETION FLAG: Only set isWorkflowComplete=true when you complete Phase 6 (Comprehensive Diagnostic Writeup) AND produce the full deliverable.",
-    "DO NOT SKIP PHASES: Even with high confidence, you must complete hypothesis generation (Phase 2), instrumentation (Phase 3), evidence collection (Phase 4), analysis (Phase 5), and writeup (Phase 6).",
-    "PHASE PROGRESSION: An investigation that stops at triage (Phase 0) or hypothesis formation (Phase 2) or evidence collection (Phase 4) is INCOMPLETE - the diagnostic writeup is the required deliverable.",
-    "**HIGH AUTO MODE DISCIPLINE:**",
-    "In HIGH automation mode, agents must execute phases WITHOUT asking for permission between phases. Asking 'Would you like me to continue?' or 'Should I proceed to Phase X?' implies the workflow is optional - IT IS NOT. The ONLY confirmations allowed",
-    "are: (1) Phase 0e early termination decision, (2) Phase 4a controlled experiments. All other phases execute automatically based on the systematic workflow structure.",
-    "**FUNCTION DEFINITIONS:**",
-    "fun instrumentCode(location, hypothesis) = 'Add debug logs at {location} for {hypothesis}. Format: ClassName.method [{hypothesis}]: message. Include timestamp, thread ID if concurrent.'",
-    "fun collectEvidence(hypothesis) = 'Run instrumented code, collect logs, analyze results. Score evidence quality 1-10. Document in Evidence/{hypothesis}.md.'",
-    "fun updateHypothesisLog(id, status, evidence) = 'Update INVESTIGATION_CONTEXT.md section {id} with {status} and {evidence}. Include confidence score.'",
-    "fun analyzeTests(component) = 'Find all tests for {component} using grep_search. Check coverage, recent changes, what they validate vs miss. Run with --debug flag.'",
-    "fun recursiveAnalysis(component, depth=3) = 'Analyze {component} to {depth} levels. L1: implementation, L2: direct deps, L3: transitive deps. Document each level.'",
-    "fun controlledModification(type, location) = 'Make {type} change at {location}. Types: guard (add logging), assert (add assertion), fix (minimal fix), break (controlled failure). Commit: DEBUG: {type} at {location}'",
-    "fun checkHypothesisInTests(hypothesis) = 'Search existing tests for evidence. Direct: tests of suspected components. Indirect: tests that would fail if true. Document in TestEvidence/{hypothesis}.md'",
-    "fun aggregateDebugLogs(pattern, timeWindow=100) = 'Deduplicate logs matching {pattern}. Output: {pattern} x{count} in {timeWindow}ms, variations: {unique_values}'",
-    "fun createInvestigationBranch() = 'git checkout -b investigate/{bug-id}-{timestamp}. If git unavailable, create Investigation/{timestamp}/ directory for artifacts.'",
-    "fun trackInvestigation(phase, status) = 'Update INVESTIGATION_CONTEXT.md progress: \u2705 {completed}, \ud83d\udd04 {phase}, \u23f3 Remaining: {list}, \ud83d\udcca Confidence: {score}/10'",
-    "fun updateInvestigationContext(section, content) = 'Update INVESTIGATION_CONTEXT.md {section} with {content}. Include timestamp. If section doesn\\'t exist, create it. Preserve all other sections.'",
-    "fun findSimilarBugs() = 'Search for: 1) Similar error patterns in codebase, 2) Previous fixes in git history, 3) Related test cases. Document in SimilarPatterns.md'",
-    "fun visualProgress() = 'Show: \u2705 Phase 0 | \u2705 Phase 1 | \ud83d\udd04 Phase 2 | \u23f3 Phase 3-5 | \u23f3 Phase 6 | \ud83d\udcca 35% Complete. Include time spent per phase.'",
-    "fun applyDebugPreferences() = 'Apply user debugging preferences from userDebugPreferences context variable. Adapt logging verbosity, tool selection, output format.'",
-    "fun addResumptionJson(phase) = 'Update INVESTIGATION_CONTEXT.md resumption section with: workflowId, completedSteps up to {phase}, all context variables. Include workflow_get and workflow_next instructions.'",
-    "**USAGE:** When you see function calls like instrumentCode() or analyzeTests(), execute the full instructions defined above.",
-    "INVESTIGATION DISCIPLINE: Never propose fixes or solutions until Phase 6 (Comprehensive Diagnostic Writeup). Focus entirely on systematic evidence gathering and analysis.",
-    "HYPOTHESIS RIGOR: All hypotheses must be based on concrete evidence from code analysis with quantified scoring (1-10 scales). Maximum 5 hypotheses per investigation.",
-    "DEBUGGING INSTRUMENTATION: Always implement debugging mechanisms before running tests - logs, print statements, or test modifications that will provide evidence.",
-    "EVIDENCE THRESHOLD: Require minimum 3 independent sources of evidence before confirming any hypothesis. Use objective verification criteria.",
-    "SYSTEMATIC PROGRESSION: Complete each investigation phase fully before proceeding. Each phase builds critical context for the next with structured documentation.",
-    "CONFIDENCE CALIBRATION: Use mathematical confidence framework with 9.0/10 minimum threshold. Actively challenge conclusions with adversarial analysis.",
-    "UNCERTAINTY ACKNOWLEDGMENT: Explicitly document all remaining unknowns and their potential impact. No subjective confidence assessments.",
-    "THOROUGHNESS: For complex bugs, recursively analyze dependencies and internals of identified components to ensure full picture.",
-    "TEST INTEGRATION: Leverage existing tests to validate hypotheses where possible.",
-    "**LOGGING STANDARDS:**",
-    "LOG FORMAT: Always use 'ClassName.methodName [hypothesisId] {timestamp}: message'. For concurrent code, add thread/worker ID.",
-    "LOG DEDUPLICATION: Implement in debug code: if (lastMsg === currentMsg) { count++; if (count % 10 === 0) log(`${msg} x${count}`); } else { if (count > 1) log(`Previous: x${count}`); log(currentMsg); count = 1; }",
-    "LOG AGGREGATION: For high-frequency events, create summaries: 'Event X occurred 847 times between 10:23:45-10:23:47, unique values: [val1: 623, val2: 224]'",
-    "LOG WINDOWS: Group related logs within 50-100ms. Mark groups with '=== Operation: XYZ Start ===' and '=== Operation: XYZ End (duration: 73ms) ==='",
-    "LOG CONTEXT: Include hypothesis ID in all debug logs. Use prefixes like 'H1_DEBUG:', 'H2_TRACE:', 'H3_ERROR:'",
-    "LOG ANALYSIS OFFLOADING: For voluminous logs (>500 lines), offload analysis to sub-chats with structured prompts. See Phase 4 for detailed sub-analysis implementation.",
-    "RECURSION DEPTH: Limit recursive analysis to 3 levels deep to prevent analysis paralysis while ensuring thoroughness.",
-    "INVESTIGATION BOUNDS: If investigation exceeds 20 steps or 4 hours without root cause, pause and reassess approach with user.",
-    "AUTOMATION LEVELS: High=auto-approve >8.0 confidence decisions, Medium=standard confirmations, Low=extra confirmations for safety. Control workflow autonomy based on user preference.",
-    "CONTEXT DOCUMENTATION: Maintain INVESTIGATION_CONTEXT.md throughout. Update after major milestones, failures, or user interventions to enable seamless handoffs between sessions. Include explicit resumption instructions using workflow_get and workflow_next.",
-    "GIT FALLBACK STRATEGY: If git unavailable, gracefully skip commits/branches, log changes manually in CONTEXT.md with timestamps, warn user, document modifications for manual control.",
-    "GIT ERROR HANDLING: Use run_terminal_cmd for git operations; if fails, output exact command for user manual execution. Never halt investigation due to git unavailability.",
-    "TOOL AVAILABILITY AWARENESS: Check debugging tool availability before investigation design. Have fallbacks for when primary tools unavailable (grep\u2192file_search, etc).",
-    "SECURITY PROTOCOLS: Sanitize sensitive data in logs/reproduction steps. Be mindful of exposing credentials, PII, or system internals during evidence collection phases.",
-    "DYNAMIC RE-TRIAGE: Allow complexity upgrades during investigation if evidence reveals deeper issues. Safe downgrades only with explicit user confirmation after evidence review.",
-    "DEVIL'S ADVOCATE REVIEW: Actively challenge primary hypothesis with available evidence. Seek alternative explanations and rate alternative likelihood before final confidence assessment.",
-    "COLLABORATIVE HANDOFFS: Structure documentation for peer review and team coordination. Include methodology, reasoning, and complete evidence chain for knowledge transfer.",
-    "FAILURE BOUNDS: Track investigation progress. If >20 steps or >4 hours without breakthrough, pause for user guidance. Document dead ends to prevent redundant work in future sessions.",
-    "COGNITIVE BREAKS: After 10 investigation steps, pause and summarize progress to reset perspective.",
-    "RUBBER DUCK: Verbalize hypotheses in sub-prompts to externalize reasoning and catch logical gaps.",
-    "COLLABORATION READY: Document clearly for handoffs when stuck beyond iteration limits."
-  ],
-  "steps": [
-    {
-      "id": "phase-0-triage",
-      "title": "Phase 0: Initial Triage & Context Gathering",
-      "prompt": "**SYSTEMATIC INVESTIGATION BEGINS** - Your mission is to achieve near 100% certainty about this bug's root cause through systematic evidence gathering. NO FIXES will be proposed until Phase 6.\n\n**STEP 1: Bug Report Analysis**\nPlease provide the complete bug context:\n- **Bug Description**: What is the observed behavior vs expected behavior?\n- **Error Messages/Stack Traces**: Paste the complete error output\n- **Reproduction Steps**: How can this bug be consistently reproduced?\n- **Environment Details**: OS, language version, framework version, etc.\n- **Recent Changes**: Any recent commits, deployments, or configuration changes?\n\n**STEP 2: Project Type Classification**\nBased on the information provided, I will classify the project type and set debugging strategies:\n- **Languages/Frameworks**: Primary tech stack\n- **Build System**: Maven, Gradle, npm, etc.\n- **Testing Framework**: JUnit, Jest, pytest, etc.\n- **Logging System**: Available logging mechanisms\n- **Architecture**: Monolithic, microservices, distributed, serverless, etc.\n\n**STEP 3: Complexity Assessment**\nI will analyze the bug complexity using these criteria:\n- **Simple**: Single function/method, clear error path, minimal dependencies\n- **Standard**: Multiple components, moderate investigation required\n- **Complex**: Cross-system issues, race conditions, complex state management\n\n**STEP 4: Automation Level Selection**\nAsk the user: \"What automation level would you prefer for this investigation?\"\n- **High**: Auto-approve decisions with confidence >8.0, minimal confirmations\n- **Medium**: Standard confirmations for key decisions\n- **Low**: Extra confirmations for safety, manual approval for all changes\n\n**OUTPUTS**: Set context variables:\n- `projectType`, `bugComplexity`, `debuggingMechanism`\n- `isDistributed` (true if architecture involves microservices/distributed systems)\n- `automationLevel` (High/Medium/Low based on user preference)",
-      "agentRole": "You are a senior debugging specialist and bug triage expert with 15+ years of experience across multiple technology stacks. Your expertise lies in quickly classifying bugs, understanding project architectures, and determining appropriate investigation strategies. You excel at extracting critical information from bug reports and setting up systematic investigation approaches.",
-      "guidance": [
-        "CLASSIFICATION ACCURACY: Proper complexity assessment determines investigation depth - be thorough but decisive",
-        "CONTEXT CAPTURE: Gather complete environmental and situational context now to avoid gaps later",
-        "DEBUGGING STRATEGY: Choose debugging mechanisms appropriate for the project type and bug complexity",
-        "NO ASSUMPTIONS: If critical information is missing, explicitly request it before proceeding"
-      ]
-    },
-    {
-      "id": "phase-0a-assumption-check",
-      "title": "Phase 0a: Assumption Verification Checkpoint",
-      "prompt": "**ASSUMPTION CHECK** - Before proceeding, verify key assumptions to prevent bias.\n\n**VERIFY**:\n1. **Data State**: Confirm variable types and null handling\n2. **API/Library**: Check documentation for actual vs assumed behavior\n3. **Environment**: Verify bug exists in clean environment\n4. **Recent Changes**: Review last 5 commits for relevance\n\n**OUTPUT**: List verified assumptions with evidence sources.",
-      "agentRole": "You are a skeptical analyst who challenges every assumption. Question everything that hasn't been explicitly verified.",
-      "guidance": [
-        "Use analysis tools to verify, don't assume",
-        "Document each assumption with its verification method",
-        "Flag any unverifiable assumptions for tracking",
-        "CHECK API DOCS: Never assume function behavior from names - verify actual documentation",
-        "VERIFY DATA TYPES: Use debugger or logs to confirm actual runtime types and values",
-        "TEST ENVIRONMENT: Reproduce in minimal environment to rule out configuration issues"
-      ]
-    },
-    {
-      "id": "phase-0b-user-preferences",
-      "title": "Phase 0b: Identify User Debugging Preferences",
-      "prompt": "**USER DEBUGGING PREFERENCES** - Identify and document user-specific debugging preferences.\n\n**CHECK FOR PREFERENCES IN:**\n1. **User Settings/Memory**: Any stored debugging preferences\n2. **Project Documentation**: Team debugging standards\n3. **Previous Instructions**: Past user guidance on debugging approach\n\n**CATEGORIZE PREFERENCES:**\n- **Debugging Tools**: Preference for debugger vs logs vs traces\n- **Log Verbosity**: Detailed vs concise output\n- **Output Format**: Structured logs vs human-readable\n- **Testing Approach**: Unit tests vs integration tests focus\n- **Commit Style**: Conventional commits vs descriptive\n- **Documentation**: Inline comments vs separate docs\n- **Error Handling**: Fail fast vs defensive programming\n\n**IF NO EXPLICIT PREFERENCES:**\nAsk user:\n- \"Do you prefer verbose logging or concise summaries?\"\n- \"Should I use interactive debuggers or rely on log analysis?\"\n- \"Any specific tools or approaches your team prefers?\"\n\n**OUTPUT**: Set `userDebugPreferences` context variable with categorized preferences.\n\n**APPLY**: Use applyDebugPreferences() throughout investigation to adapt approach.",
-      "agentRole": "You are a debugging preferences specialist who understands how different teams and developers approach problem-solving. You excel at identifying and applying user-specific debugging styles.",
-      "guidance": [
-        "This step ensures the investigation aligns with user/team practices",
-        "Capture both explicit and implicit preferences",
-        "Default to standard practices if no preferences found",
-        "These preferences will be applied throughout the workflow"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-0c-tool-check",
-      "title": "Phase 0c: Tool Availability Verification",
-      "prompt": "**TOOL AVAILABILITY CHECK** - Verify required debugging tools before investigation.\n\n**CORE TOOLS CHECK:**\n1. **Analysis Tools**:\n   - grep_search: Text pattern searching\n   - read_file: File content reading\n   - codebase_search: Semantic code search\n   - Test availability, note any failures\n\n2. **Git Operations**:\n   - Check git availability: `git --version`\n   - If unavailable, set `gitAvailable = false`\n   - Plan fallback: manual change tracking\n\n3. **Build/Test Tools** (based on projectType):\n   - npm/yarn for JavaScript\n   - Maven/Gradle for Java\n   - pytest/unittest for Python\n   - Document which are available\n\n4. **Debugging Tools**:\n   - Language-specific debuggers\n   - Profilers if needed\n   - Log aggregation tools\n\n**FALLBACK STRATEGIES:**\n- grep_search fails \u2192 use file_search\n- codebase_search fails \u2192 use grep_search with context\n- Git unavailable \u2192 track changes in INVESTIGATION_CONTEXT.md\n- Build tools missing \u2192 focus on static analysis\n\n**OUTPUT**:\n- Set `availableTools` context variable\n- Set `toolLimitations` with any restrictions\n- Document fallback strategies in context\n\n**ADAPTATION**: Adjust investigation approach based on available tools.",
-      "agentRole": "You are a tool availability specialist ensuring the investigation can proceed smoothly with available resources. You excel at creating fallback strategies.",
-      "guidance": [
-        "Test each tool category systematically",
-        "Don't fail if some tools are unavailable - adapt",
-        "Document limitations clearly for user awareness",
-        "Prefer degraded functionality over investigation failure"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-0d-create-context",
-      "title": "Phase 0d: Initialize Investigation Context",
-      "prompt": "**CREATE INVESTIGATION CONTEXT** - Initialize comprehensive tracking document.\n\nUse createInvestigationBranch() to set up version control, then create INVESTIGATION_CONTEXT.md:\n\n```markdown\n# Investigation Context\n\n## 1. Bug Summary\n- **ID**: {{bugId || 'investigation-' + Date.now()}}\n- **Description**: [from bug report]\n- **Complexity**: {{bugComplexity}}\n- **Started**: {{new Date().toISOString()}}\n- **Status**: Phase 0d - Context Initialization\n- **Automation Level**: {{automationLevel}}\n\n## 2. Progress Tracking\n{{visualProgress()}}\n\u2705 Completed: Phase 0 (Triage), Phase 0a (Assumptions), Phase 0b (User Preferences), Phase 0c (Tools)\n\ud83d\udd04 Current: Phase 0d (Context Creation)\n\u23f3 Remaining: Phase 1 (Analysis), Phase 2 (Hypotheses), Phase 3-5 (Validation), Phase 6 (Writeup)\n\ud83d\udcca Confidence: 0/10\n\n## 3. Environment & Setup\n- **Project Type**: {{projectType}}\n- **Debugging Mechanism**: {{debuggingMechanism}}\n- **Architecture**: {{isDistributed ? 'Distributed' : 'Monolithic'}}\n- **User Preferences**: {{userDebugPreferences}}\n- **Available Tools**: {{availableTools}}\n- **Tool Limitations**: {{toolLimitations || 'None'}}\n\n## 4. Analysis Findings\n*To be populated during Phase 1*\n\n## 5. Hypothesis Registry\n*To be populated during Phase 2*\n\n## 6. Evidence Log\n*To be populated during validation*\n\n## 7. Experiment Results\n*To be populated if experiments conducted*\n\n## 8. Dead Ends & Lessons\n*Track approaches that didn't work*\n\n## 9. Function Definitions\n[Include all function definitions from metaGuidance for reference]\n\n## 10. Resumption Instructions\n\n### How to Resume This Investigation\n\n1. **Get the workflow**: Call `workflow_get` with:\n   - id: \"systematic-bug-investigation-with-loops\"\n   - mode: \"preview\" (to see next step)\n\n2. **Resume from saved state**: Call `workflow_next` with the JSON below:\n\n```json\n{\n  \"workflowId\": \"systematic-bug-investigation-with-loops\",\n  \"completedSteps\": [\"phase-0-triage\", \"phase-0a-assumption-check\", \"phase-0b-user-preferences\", \"phase-0c-tool-check\", \"phase-0d-create-context\"],\n  \"context\": {\n    \"bugComplexity\": \"{{bugComplexity}}\",\n    \"projectType\": \"{{projectType}}\",\n    \"debuggingMechanism\": \"{{debuggingMechanism}}\",\n    \"isDistributed\": {{isDistributed || false}},\n    \"automationLevel\": \"{{automationLevel}}\",\n    \"userDebugPreferences\": {{JSON.stringify(userDebugPreferences)}},\n    \"availableTools\": {{JSON.stringify(availableTools)}},\n    \"toolLimitations\": {{JSON.stringify(toolLimitations)}}\n  }\n}\n```\n\n3. **Continue investigation**: The workflow will pick up from where it left off\n\n### Important Notes\n- Update `completedSteps` array after completing each phase\n- Preserve all context variables for proper state restoration\n- This JSON should be updated after major milestones\n```\n\n**Set `contextInitialized` = true**",
-      "agentRole": "You are creating the central documentation hub for this investigation. This document will track all progress, findings, and enable seamless handoffs.",
-      "guidance": [
-        "Create a comprehensive but scannable document",
-        "Include all context variables discovered so far",
-        "Set up structure for future updates",
-        "Include function definitions for reference",
-        "Update the resumption JSON after each major phase using addResumptionJson()",
-        "Always include the workflow_get and workflow_next instructions for proper resumption"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-0e-workflow-commitment",
-      "title": "Phase 0e: Workflow Execution Commitment & Early Termination Checkpoint",
-      "prompt": "**\u26a0\ufe0f WORKFLOW EXECUTION COMMITMENT CHECKPOINT \u26a0\ufe0f**\n\n*(Note: This checkpoint only appears in Medium/Low automation modes. High automation mode proceeds automatically.)*\n\nYou have completed Phase 0 (Triage & Setup). Before proceeding to the investigation phases, you MUST acknowledge your understanding of workflow execution requirements AND make a critical decision.\n\n**CRITICAL UNDERSTANDING:**\n\n1. **This is a 26-step structured workflow, not freestyle debugging**\n   - You MUST call workflow_next repeatedly until isComplete=true\n   - You CANNOT stop early, even if you think you know the bug\n   - You CANNOT \"figure it out\" and skip steps\n\n2. **Professional research shows 90% error rate for premature conclusions**\n   - Even with 9-10/10 confidence, skipping systematic steps leads to wrong conclusions\n   - Edge cases, alternative explanations, and interaction effects are missed\n   - The workflow FORCES thoroughness for a reason\n\n3. **Remaining phases you MUST complete (regardless of confidence):**\n   - \u2705 Phase 0: Triage & Setup (COMPLETED)\n   - \u23f3 Phase 1: Multi-Dimensional Codebase Analysis (5 sub-phases)\n   - \u23f3 Phase 2: Hypothesis Formation & Validation (8 sub-phases)\n   - \u23f3 Phase 3: Debugging Instrumentation\n   - \u23f3 Phase 4: Evidence Collection & Experimentation\n   - \u23f3 Phase 5: Final Analysis & Confidence Assessment\n   - \u23f3 Phase 6: Comprehensive Diagnostic Writeup (REQUIRED DELIVERABLE)\n\n4. **What \"completion\" means:**\n   - isComplete=true from workflow_next (this happens ONLY after Phase 6)\n   - Comprehensive diagnostic writeup produced\n   - All evidence documented\n   - Investigation context finalized\n\n**DECISION POINT: Continue Full Investigation or Request Early Termination?**\n\n**OPTION 1: Complete Full Investigation (STRONGLY RECOMMENDED)**\n- Proceed through all 28 workflow steps\n- Professional standard for thorough bug investigations\n- 90% success rate for identifying true root cause\n- Produces comprehensive diagnostic writeup with actionable findings\n\n**OPTION 2: Request Early Termination (STRONGLY DISCOURAGED)**\n- Skip systematic investigation and attempt to diagnose now\n- \u26a0\ufe0f **EVIDENCE-BASED WARNING:**\n  - **Professional research spanning 20+ years shows that agents who skip systematic investigation are wrong about the root cause ~90% of the time**\n  - Even if you feel 9-10/10 confident right now, completing the full workflow dramatically increases accuracy\n  - Quick conclusions miss: contracts & invariants, pattern deviations, HOT paths, edge cases, interaction effects, and alternative explanations\n  - The full workflow takes ~30-60 minutes but increases fix success rate from ~10% to ~90%\n\n**REQUIRED AGENT ACTIONS:**\n\n1. **Acknowledge workflow commitment:**\n\"I acknowledge that I am executing a structured 26-step workflow and I MUST call workflow_next for every step until isComplete=true. I understand that stopping early leads to incorrect conclusions ~90% of the time. I will complete all phases (0-6) regardless of my confidence level.\"\n\n2. **State your recommendation:**\nYou MUST recommend OPTION 1 (complete full investigation) unless you have truly exceptional circumstances.\n\n3. **MANDATORY USER COMMUNICATION:**\nYou MUST explicitly tell the user (this is NOT optional):\n\"**I strongly recommend we complete the full systematic investigation (26 steps through Phase 6). Professional research shows this approach identifies the TRUE root cause ~90% of the time, compared to ~10% for quick conclusions. Even if I develop high confidence early, completing the full workflow\u2014including contracts analysis, pattern discovery, HOT path analysis, instrumentation, and evidence collection\u2014dramatically increases the likelihood of correctly identifying the root cause and preventing wasted time on wrong fixes.**\n\nDo you want to proceed with the full investigation (recommended), or would you prefer I attempt a quick diagnosis now (discouraged)?\"\n\n**USER CONFIRMATION REQUIRED:**\nThe user must explicitly choose to proceed with full investigation or request early termination.",
-      "agentRole": "You are a workflow governance specialist ensuring agents understand they are bound to execute all workflow steps systematically, and that they MUST communicate the value of full workflow completion to users.",
-      "guidance": [
-        "This checkpoint prevents premature termination at the earliest possible point",
-        "Agents must explicitly acknowledge workflow structure AND communicate value to user",
-        "The MANDATORY USER COMMUNICATION is not optional - agents MUST say this exact message",
-        "Agents must recommend Option 1 unless truly exceptional circumstances exist",
-        "This is both a psychological commitment device and a user education moment",
-        "Users must explicitly confirm proceeding with full investigation",
-        "If user chooses early termination, agent must acknowledge 90% error rate and proceed with best-effort quick diagnosis"
-      ],
-      "requireConfirmation": true,
-      "runCondition": {
-        "var": "automationLevel",
-        "not_equals": "High"
-      }
-    },
-    {
-      "id": "phase-1-iterative-analysis",
-      "type": "loop",
-      "title": "Phase 1: Multi-Dimensional Codebase Analysis",
-      "loop": {
-        "type": "for",
-        "count": 5,
-        "maxIterations": 5,
-        "iterationVar": "analysisPhase"
-      },
-      "body": [
-        {
-          "id": "analysis-neighborhood-contracts",
-          "title": "Analysis 1/5: Neighborhood, Call Graph & Contracts",
-          "prompt": "**NEIGHBORHOOD & CONTRACTS DISCOVERY - Build Structural Foundation**\n\nGoal: Build lightweight understanding of code structure, relationships, and contracts BEFORE diving into details. This provides the scaffolding for all subsequent analysis.\n\n**STEP 1: Compute Module Root**\n- Find nearest common ancestor of error stack trace files\n- Clamp to package boundary or src/ directory\n- This defines your investigation scope\n- Set `moduleRoot` context variable\n\n**STEP 2: Neighborhood Map** (cap per file to prevent analysis paralysis)\n- For each file in error stack trace:\n  - List immediate neighbors (same directory, max 8)\n  - Find imports/exports directly used (max 10)\n  - Locate co-located tests (same name pattern)\n  - Identify closest entry points: routes, endpoints, CLI commands (max 5)\n- Produce table: File | Neighbors | Tests | Entry Points\n\n**STEP 3: Bounded Call Graph** (Small Multiples with HOT Path Ranking)\n- For each failing function/class in stack trace:\n  - Build call graph \u22642 hops deep (inbound and outbound)\n  - Cap total nodes at \u226415 per failing symbol\n  - Score edges for HOT path ranking:\n    * Error location in path: +3\n    * Entry point to path: +2  \n    * Test coverage exists: +1\n    * Mentioned in ticket/error message: +1\n  - Tag paths as HOT if score \u22653\n  - Use Small Multiples ASCII visualization:\n    * Width \u2264100 chars per path\n    * Format: `EntryPoint -> Caller -> [*FailingSymbol*] -> Callee`\n    * Mark changed/failing code as `[*name*]`\n    * Add HOT tag for high-impact paths\n    * \u22648 total paths, prioritize HOT paths first\n  - If graph exceeds caps, use Adjacency Summary instead:\n    * Table: Node | Inbound | Outbound | Notes\n    * Top-K by degree/frequency\n- Create Alias Legend for repeated subpaths:\n  * A1 = common.validation.validateInput\n  * A2 = database.connection.getPool\n  * Reuse aliases across all paths\n\n**STEP 4: Flow Anchors** (Entry Points to Bug)\n- Map how users/systems trigger the bug:\n  - HTTP routes \u2192 handlers \u2192 failing code\n  - CLI commands \u2192 execution \u2192 failing code  \n  - Scheduled jobs \u2192 workers \u2192 failing code\n  - Event handlers \u2192 callbacks \u2192 failing code\n- Produce table: Anchor Type | Entry Point | Target Symbol | User Action\n- Cap at \u22645 most relevant anchors\n- Note: This tells us HOW the bug is reached\n\n**STEP 5: Contracts & Invariants**\n- Within `moduleRoot` and immediate neighbors:\n  - List public API symbols (exported functions/classes)\n  - Document API endpoints (REST/GraphQL/RPC)\n  - Identify database tables/collections touched\n  - Note message queue topics/events\n  - Extract stated invariants from:\n    * JSDoc/docstrings with @invariant\n    * Assertions in code\n    * Validation logic patterns\n    * Comments describing guarantees\n- Produce table: Symbol/API | Contract | Invariant | Location\n- Focus on contracts related to failing code\n\n**OUTPUT: Create StructuralAnalysis.md with:**\n- Module Root declaration\n- Neighborhood Map table\n- Bounded Call Graph (Small Multiples ASCII or Adjacency Summary)\n- Alias Legend (for call graph subpaths)\n- Flow Anchors table\n- Contracts & Invariants table\n- Self-Critique: 1-2 areas of uncertainty\n\n**CAPS (strictly enforce to prevent analysis paralysis):**\n- \u22648 neighbors per file\n- \u226410 imports per file\n- \u22645 entry points total\n- \u226415 call graph nodes per failing symbol\n- \u22648 total call graph paths\n- \u22645 flow anchors\n- \u2264100 chars width for ASCII paths",
-          "agentRole": "You are a codebase navigator building structural understanding. Your focus is mapping relationships, entry points, and contracts WITHOUT diving into implementation details yet.",
-          "guidance": [
-            "This is analysis phase 1 of 5 total phases",
-            "Phase 1a = Structure - Build the map before exploring terrain",
-            "Initialize majorIssuesFound = false",
-            "STRICTLY ENFORCE CAPS - this prevents 2-hour rabbit holes",
-            "Small Multiples: Render mini ASCII path diagrams (\u22646 nodes per path)",
-            "HOT Path Ranking: Score and prioritize high-impact paths",
-            "Alias Legend: Collapse repeated subpaths with deterministic aliases (A1, A2...)",
-            "Adjacency Summary: If caps exceeded, use tabular summary instead of full graph",
-            "Contracts are CRITICAL: They tell us what guarantees the code must maintain",
-            "Flow Anchors show HOW users trigger the bug - essential for reproduction",
-            "Create StructuralAnalysis.md in investigation directory",
-            "Update INVESTIGATION_CONTEXT.md with module root and structural summary",
-            "This phase provides the scaffolding for all subsequent analysis"
-          ],
-          "runCondition": {
-            "var": "analysisPhase",
-            "equals": 1
-          },
-          "requireConfirmation": false
-        },
-        {
-          "id": "analysis-breadth-scan",
-          "title": "Analysis 2/5: Breadth Scan & Pattern Discovery",
-          "prompt": "**BREADTH SCAN - Cast Wide Net + Learn Expected Behavior**\n\nGoal: Understand full system impact, identify all potentially involved components, and discover existing code patterns to understand expected behavior.\n\n**PART A: Pattern Discovery (Learn How Code SHOULD Work)**\n1. **Compute Module Root**: Find nearest common ancestor of error stack trace files, clamped to package/src\n2. **Discover Patterns** (scan only moduleRoot, exclude failing files from pattern definition):\n   - Naming conventions (classes, methods, variables)\n   - Error handling patterns (try/catch, error propagation, logging)\n   - Logging patterns (format, verbosity, error vs info vs debug)\n   - Data validation patterns (where/how data is checked)\n   - Test patterns (structure, naming, assertion style)\n   - Require \u22652 occurrences across distinct files to qualify as pattern\n3. **Capture Pattern Catalog**: Document validated patterns with 1-3 exemplar locations (file:line)\n4. **Identify Pattern Deviations in Failing Code**: Compare failing code against pattern catalog\n\n**PART B: Error Propagation & Component Discovery**\n1. **ERROR PROPAGATION MAPPING**: Use grep_search for all error occurrences, trace error messages across log files, map stack traces to identify call chains, document every point where error appears/handled\n2. **COMPONENT DISCOVERY**: Find components interacting with failing area, use codebase_search \"How is [component] used?\", identify callers/callees, cap to top 10 most suspicious, rank by likelihood (1-10)\n3. **BOUNDED CALL GRAPH**: For failing function, build call graph \u22642 hops deep, cap at \u226415 total nodes, identify HOT paths (paths through error location), prioritize HOT paths in analysis\n4. **FLOW ANCHORS**: Map entry points (routes/endpoints/CLI commands) to failing code, cap at \u22645 anchors, note which user actions trigger the bug\n\n**PART C: Data Flow & Changes**\n1. **DATA FLOW MAPPING**: Trace data through bug area, identify transformations, persistence points, corruption opportunities - but CAP scope to moduleRoot and 2-hop neighborhood\n2. **RECENT CHANGES ANALYSIS**: Git history for identified components (last 10 commits), identify when bug appeared, related PRs/issues, config/dependency changes\n3. **HISTORICAL PATTERN SEARCH**: Use findSimilarBugs() for similar error patterns, previous fixes, related test failures\n\n**Output**: Create BreadthAnalysis.md with:\n- Pattern Catalog (validated patterns + exemplars)\n- Pattern Deviations (how failing code differs from expected patterns)\n- Bounded Call Graph (\u226415 nodes, HOT paths highlighted)\n- Flow Anchors Table (entry point \u2192 failing symbol)\n- Suspicious Components (top 10, ranked 1-10)\n- Data Flow Map (scoped to moduleRoot + 2 hops)\n- Recent Changes Timeline\n- Historical Similar Bugs\n\n**Self-Critique**: List 1-2 areas where you have low confidence or missing information.",
-          "agentRole": "You are performing systematic analysis phase 2 of 5. Your focus is understanding both what IS happening (error propagation) and what SHOULD happen (pattern discovery) to identify deviations.",
-          "guidance": [
-            "This is analysis phase 2 of 5 total phases",
-            "Phase 1b = Breadth + Patterns - Learn expected behavior AND map error propagation",
-            "Create BreadthAnalysis.md with structured findings",
-            "CRITICAL: Discover patterns FIRST from working code, THEN compare failing code to patterns",
-            "Pattern deviations often reveal the bug (e.g., missing validation, different error handling)",
-            "Apply CAPS to prevent analysis paralysis: \u226410 components, \u226415 call graph nodes, \u22645 flow anchors, \u22642 hops",
-            "HOT PATH RANKING: Score paths by (error in path=3, entry point=2, test coverage=1); tag HOT if score\u22653",
-            "BOUNDED CALL GRAPH: Use codebase_search to find callers/callees, stop at 2 hops, cap nodes, dedupe",
-            "PATTERN DISCOVERY: Require \u22652 occurrences to qualify as pattern; singletons are 'candidate conventions' only",
-            "SELF-CRITIQUE: Explicitly note 1-2 areas of uncertainty or missing information",
-            "Update INVESTIGATION_CONTEXT.md after completion",
-            "Use the function definitions for standardized operations"
-          ],
-          "runCondition": {
-            "var": "analysisPhase",
-            "equals": 2
-          },
-          "requireConfirmation": false
-        },
-        {
-          "id": "analysis-deep-dive",
-          "title": "Analysis 3/5: Component Deep Dive with Hot-Path Focus",
-          "prompt": "**COMPONENT DEEP DIVE - Prioritized Investigation**\n\nGoal: Deep understanding of top 5 suspicious components from breadth scan, prioritizing HOT paths and pattern deviations.\n\n**PRIORITIZATION (from Phase 1):**\n1. Focus on components on HOT paths (score \u22653)\n2. Prioritize components with pattern deviations\n3. Rank by likelihood score from Phase 1\n4. Cap analysis to top 5 components\n\n**FOR EACH COMPONENT (recursive 3-level analysis):**\n\n**LEVEL 1 - DIRECT IMPLEMENTATION** (prioritize HOT paths and deviation areas):\n- Read complete file (or HOT path sections if file >500 lines)\n- Compare error handling against pattern catalog from Phase 1\n- Identify pattern deviations with file:line locations\n- Check state management, initialization, cleanup\n- Document invariants and assumptions\n- Note TODO/FIXME/HACK/BUG comments\n- Red flags: complex logic, missing validation, race conditions\n\n**LEVEL 2 - DIRECT DEPENDENCIES** (cap at \u226410 deps per component):\n- Follow imports on HOT paths first\n- Check dependency contracts and interfaces\n- Analyze coupling and data exchange\n- Look for shared mutable state\n- Identify circular dependencies\n- Document failure propagation paths\n\n**LEVEL 3 - INTEGRATION POINTS** (cap at \u22648 integration points):\n- External calls (DB, API, file system) - cap at \u22645\n- Concurrency/threading concerns\n- Resource management issues\n- Caching and state sync\n- Event handling and callbacks\n- Configuration dependencies\n\n**FOR EACH COMPONENT, PRODUCE:**\n- **Likelihood Score** (1-10): Weight HOT paths +3, pattern deviations +2, recent changes +1\n- **Suspicious Sections**: Specific file:line with rationale (\u22645 per component)\n- **Failure Modes**: How this component could cause the observed bug (\u22643 scenarios)\n- **Pattern Violations**: How it deviates from expected patterns (from Phase 1)\n- **Critical Dependencies**: Top 3 dependencies that could be sources\n\n**Output**: Create ComponentAnalysis.md with:\n- Component Rankings (1-5, sorted by likelihood score)\n- Per-Component Analysis (following structure above)\n- Pattern Violation Summary\n- Critical Path Map (which components are on HOT paths)\n- **Self-Critique**: 1-2 components you're uncertain about and why\n\n**CAPS TO PREVENT ANALYSIS PARALYSIS:**\n- Top 5 components only\n- \u226410 dependencies per component\n- \u22648 integration points per component\n- \u22645 suspicious sections per component\n- \u22643 failure modes per component",
-          "agentRole": "You are performing systematic analysis phase 3 of 5. Your focus is deep-diving into the most suspicious components, prioritizing HOT paths and pattern deviations.",
-          "guidance": [
-            "This is analysis phase 3 of 5 total phases",
-            "Phase 1c = Deep Dive - Focus on HOT paths and pattern violations",
-            "Build on findings from Phase 1 (patterns, HOT paths, flow anchors)",
-            "Create ComponentAnalysis.md with structured findings",
-            "Use recursiveAnalysis() for systematic exploration",
-            "PRIORITIZE HOT PATHS: Analyze code on HOT paths before other code",
-            "PATTERN-DRIVEN: Compare actual code against pattern catalog from Phase 1",
-            "APPLY CAPS STRICTLY: Prevents spending 2 hours reading every file",
-            "SELF-CRITIQUE: Note where you're uncertain or making assumptions",
-            "Update INVESTIGATION_CONTEXT.md after completion"
-          ],
-          "runCondition": {
-            "var": "analysisPhase",
-            "equals": 3
-          },
-          "requireConfirmation": false
-        },
-        {
-          "id": "analysis-dependencies",
-          "title": "Analysis 4/5: Dependencies & Flow",
-          "prompt": "**DEPENDENCY & FLOW ANALYSIS - Trace Connections**\n\nGoal: Understand how components interact and data flows between them.\n\nPerform: Static dependency graph analysis, Runtime flow analysis, Data transformation pipeline tracing, and Integration analysis.\n\n**Output**: FlowAnalysis.md with sequence diagrams showing execution flow, data flow maps with transformation points, complete dependency graph, list of all integration points and failure modes, and timeline showing order of operations.",
-          "agentRole": "You are performing systematic analysis phase 4 of 5. Your focus is tracing how components connect and data flows between them.",
-          "guidance": [
-            "This is analysis phase 4 of 5 total phases",
-            "Phase 1d = Dependencies - Trace connections and data flows",
-            "Build on component understanding from Phase 2",
-            "Create FlowAnalysis.md with diagrams and flow charts",
-            "STATIC DEPENDENCY GRAPH: Build complete import/dependency tree, identify circular dependencies, find hidden dependencies (reflection, dynamic loading, DI), map version constraints and compatibility, document shared libraries and utilities, note tight coupling or fragile dependencies",
-            "RUNTIME FLOW ANALYSIS: Trace execution paths to bug, identify async/concurrent flows and coordination, map state changes through execution, document control flow (conditionals, loops, exceptions), track callback chains and event handlers, identify divergence points, note timing dependencies and race conditions",
-            "DATA TRANSFORMATION PIPELINE: Track data from input to error point, document each transformation with input/output types, identify validation points and what they check, find where data could be corrupted/lost, note serialization/deserialization boundaries, track data format conversions, document enrichment/filtering steps",
-            "INTEGRATION ANALYSIS: External service calls and failure modes, database interactions (reads/writes/transactions), message queue operations and formats, file system operations and error handling, network calls and timeout handling, cache usage and invalidation, third-party library calls",
-            "Focus on runtime behavior and integration points",
-            "Update INVESTIGATION_CONTEXT.md after completion",
-            "Pay special attention to async boundaries and error propagation",
-            "Look for implicit dependencies that aren't obvious from imports"
-          ],
-          "runCondition": {
-            "var": "analysisPhase",
-            "equals": 4
-          },
-          "requireConfirmation": false
-        },
-        {
-          "id": "analysis-test-coverage",
-          "title": "Analysis 5/5: Test Coverage",
-          "prompt": "**TEST COVERAGE ANALYSIS - Leverage Existing Knowledge**\n\nGoal: Use existing tests as source of truth about system behavior.\n\nFor each suspicious component, use analyzeTests(component) to perform: Direct test coverage analysis, Integration test analysis, Test history investigation, Test execution with debugging, and Coverage gap analysis.\n\n**Output**: TestAnalysis.md with coverage gaps matrix, suspicious test patterns, test evidence for hypotheses, recommendations for tests to add, and complete test inventory for affected components.",
-          "agentRole": "You are performing systematic analysis phase 5 of 5. Your focus is leveraging existing tests to understand expected behavior and find coverage gaps.",
-          "guidance": [
-            "This is analysis phase 5 of 5 total phases",
-            "Phase 1e = Tests - Analyze test coverage and quality",
-            "Build on all previous analysis phases",
-            "Create TestAnalysis.md with coverage gap matrix",
-            "DIRECT TEST COVERAGE: Find all tests using grep/test discovery, analyze what's tested (happy/edge/error cases), identify what's NOT tested, check test quality and assertion strength, note mocking/stubbing that might hide issues, review test names and docs",
-            "INTEGRATION TEST ANALYSIS: Find end-to-end tests for bug area, analyze assumptions/preconditions, check for flaky tests, review disabled/skipped tests and why, look for TODO/incomplete tests, identify multi-component tests, verify if tests cover failing scenario",
-            "TEST HISTORY: When were tests added/modified? Do test changes correlate with bug appearance? Were tests removed/disabled recently? Use git blame for authors and context, look for related PRs/issues, review test evolution",
-            "TEST EXECUTION WITH DEBUGGING: Run tests with debug flags (--verbose, --debug), add instrumentation to tests themselves, compare expected vs actual in detail, run in isolation and in suite, try different orderings to check dependencies, monitor resource usage",
-            "COVERAGE GAP ANALYSIS: Use coverage tools for untested code paths, map coverage to bug components, identify branches/conditions never exercised, note error handling without tests, document missing edge cases, recommend tests to add",
-            "Run tests with debug flags for additional insights",
-            "After completion, use trackInvestigation('Phase 1 Complete', 'Moving to Hypothesis Development')",
-            "Tests often reveal the 'expected' behavior - compare with actual behavior",
-            "Missing tests often indicate areas where bugs hide"
-          ],
-          "runCondition": {
-            "var": "analysisPhase",
-            "equals": 5
-          },
-          "requireConfirmation": false
-        }
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-1a-binary-search",
-      "title": "Phase 1a: Binary Search Isolation",
-      "runCondition": {
-        "or": [
-          {
-            "var": "bugType",
-            "equals": "regression"
-          },
-          {
-            "var": "searchSpace",
-            "equals": "large"
-          }
-        ]
-      },
-      "prompt": "**BINARY SEARCH** - Apply divide-and-conquer:\n\n1. Identify GOOD state (working) and BAD state (broken)\n2. Find midpoint in history/code/data\n3. Test midpoint state\n4. Narrow to relevant half\n5. Document reduced search space\n\n**OUTPUT**: Narrowed location with evidence.",
-      "agentRole": "You are a systematic investigator using algorithmic search to efficiently isolate issues.",
-      "guidance": [
-        "VERSION CONTROL: Use 'git bisect' or equivalent for commit history searches",
-        "DATA PIPELINE: Test data at pipeline midpoints to isolate transformation issues",
-        "TIME WINDOWS: For time-based issues, binary search through timestamps",
-        "DOCUMENT BOUNDARIES: Clearly record each tested boundary and result",
-        "EFFICIENCY: Each test should eliminate ~50% of remaining search space"
-      ]
-    },
-    {
-      "id": "phase-1b-test-reduction",
-      "title": "Phase 1b: Test Case Minimization",
-      "runCondition": {
-        "var": "bugSource",
-        "equals": "failing_test"
-      },
-      "prompt": "**TEST REDUCTION** - Simplify failing test:\n\n1. Inline called methods into test\n2. Add earlier assertion to fail sooner\n3. Remove code after new failure point\n4. Repeat until minimal\n\n**OUTPUT**: Minimal failing test case.",
-      "agentRole": "You are a surgical debugger who strips away layers to reveal core issues.",
-      "guidance": [
-        "PRESERVE FAILURE: Each reduction must maintain the original failure mode",
-        "INLINE AGGRESSIVELY: Replace method calls with their actual implementation",
-        "FAIL EARLY: Move assertions up to find earliest deviation from expected state",
-        "REMOVE RUTHLESSLY: Delete all code that doesn't contribute to the failure",
-        "CLARITY GOAL: Final test should make the bug obvious to any reader"
-      ]
-    },
-    {
-      "id": "phase-1f-breadth-verification",
-      "title": "Phase 1f: Final Breadth & Scope Verification",
-      "prompt": "**FINAL BREADTH & SCOPE VERIFICATION - Catch Tunnel Vision NOW**\n\n\u26a0\ufe0f **CRITICAL CHECKPOINT BEFORE HYPOTHESES**: This step prevents the #1 cause of wrong conclusions: looking in the wrong place or missing the wider context.\n\n**Goal**: Verify you analyzed the RIGHT code with sufficient breadth AND depth before committing to hypotheses.\n\n---\n\n**STEP 1: Scope Sanity Check**\n\nAsk yourself these questions:\n1. **Module Root Correctness**: Is the `moduleRoot` from Phase 1a actually correct?\n   - Does it include ALL files in the error stack trace?\n   - Did I clamp too narrowly to a subdirectory when the bug spans multiple modules?\n   - Should I expand scope to parent directory or adjacent modules?\n\n2. **Missing Adjacent Systems**: Did I consider:\n   - Adjacent microservices/modules that interact with this one?\n   - Shared libraries or utilities used here?\n   - Configuration systems (env vars, config files, feature flags)?\n   - Caching layers or state management systems?\n   - Database schema or data migration issues?\n\n3. **Entry Point Coverage**: From Phase 1a Flow Anchors, did I verify:\n   - ALL entry points that could trigger this bug?\n   - Less obvious entry points (background jobs, scheduled tasks, webhooks)?\n   - Initialization code that runs before the failing code?\n\n---\n\n**STEP 2: Wide-Angle Review**\n\nReview your Phase 1 analysis outputs and answer:\n\n1. **Pattern Confidence** (from Phase 1, sub-phase 2):\n   - Do I have a solid Pattern Catalog with \u22652 occurrences per pattern?\n   - Did I identify clear pattern deviations in failing code?\n   - Are there OTHER files that deviate from patterns I haven't looked at?\n\n2. **Call Graph Completeness** (from Phase 1, sub-phase 1 & 2):\n   - Did my bounded call graph capture all HOT paths?\n   - Are there callers OUTSIDE my 2-hop boundary I should check?\n   - Did I trace BACKWARDS from the error far enough (to true entry points)?\n\n3. **Component Rankings** (from Phase 1, sub-phase 3):\n   - Are my top 5 components actually the most suspicious?\n   - Did I miss components because they're not in the stack trace?\n   - Should I re-rank based on new understanding?\n\n4. **Data Flow Completeness** (from Phase 1, sub-phase 4):\n   - Did I trace data flow from TRUE origin (user input, external system)?\n   - Are there data transformations BEFORE my analyzed scope?\n   - Did I check data validation at ALL boundaries?\n\n5. **Test Coverage Gaps** (from Phase 1, sub-phase 5):\n   - Did I find tests that SHOULD exist but don't?\n   - Are there missing test categories (integration, edge cases, error conditions)?\n   - Do test gaps reveal I'm looking in wrong place?\n\n---\n\n**STEP 3: Alternative Scope Analysis**\n\n**Generate 2-3 alternative investigation scopes and evaluate:**\n\nFor each alternative scope, assess:\n- **Scope Description**: What module/area would this focus on?\n- **Why It Might Be Better**: What evidence suggests this scope?\n- **Evidence For**: What supports investigating this area?\n- **Evidence Against**: Why might this be wrong direction?\n- **Confidence**: Rate 1-10 that this is the right scope\n\n**Example Alternative Scopes**:\n- Expand to parent module (if current feels too narrow)\n- Shift to adjacent service (if this might be symptom not cause)\n- Focus on infrastructure layer (if might be env/config issue)\n- Focus on data layer (if might be data corruption/migration issue)\n\n---\n\n**STEP 4: Breadth Decision**\n\nBased on Steps 1-3, make ONE of these decisions:\n\n**OPTION A: SCOPE IS CORRECT - Continue to Hypothesis Development**\n- Current module root and analyzed components are right\n- Breadth and depth are sufficient\n- Ready to form hypotheses with confidence\n- Set `scopeVerified = true` and proceed\n\n**OPTION B: EXPAND SCOPE - Additional Analysis Required**\n- Identified critical gaps in breadth or depth\n- Need to analyze additional modules/components\n- Set specific components/areas to add to analysis\n- Set `needsScopeExpansion = true`\n- Document what to add: `additionalAnalysisNeeded = [list]`\n\n**OPTION C: SHIFT SCOPE - Wrong Area**\n- Current focus is likely wrong place\n- Alternative scope has stronger evidence\n- Need to restart Phase 1 with new module root\n- Set `needsScopeShift = true`\n- Set `newModuleRoot = [path]`\n\n---\n\n**OUTPUT: Create ScopeVerification.md**\n\nMust include:\n1. **Scope Sanity Check Results** (answers to Step 1 questions)\n2. **Wide-Angle Review Findings** (answers to Step 2 questions)\n3. **Alternative Scopes Evaluated** (2-3 alternatives with scores)\n4. **Breadth Decision** (A, B, or C with justification)\n5. **Confidence in Current Scope** (1-10)\n6. **Action Items** (if Option B or C selected)\n\n**Context Variables to Set**:\n- `scopeVerified` (true/false)\n- `needsScopeExpansion` (true/false)\n- `needsScopeShift` (true/false)\n- `scopeConfidence` (1-10)\n- `additionalAnalysisNeeded` (array, if Option B)\n- `newModuleRoot` (string, if Option C)\n\n---\n\n**\ud83c\udfaf WHY THIS MATTERS**: \n\nResearch shows that 60% of failed investigations looked in the wrong place or too narrowly. This checkpoint catches that BEFORE you invest effort in wrong hypotheses.\n\n**Self-Critique**: List 1-2 specific uncertainties about scope that concern you most.",
-      "agentRole": "You are a senior investigator performing final scope verification. Your expertise is catching tunnel vision, identifying missing context, and ensuring investigations focus on the right area. You excel at meta-analysis and sanity checking investigative scope.",
-      "guidance": [
-        "This step comes AFTER Phase 1 (5-phase analysis loop) and BEFORE Phase 2a (hypothesis development)",
-        "Goal: Catch tunnel vision and wrong-place investigations BEFORE committing to hypotheses",
-        "Create ScopeVerification.md with structured findings",
-        "CRITICAL: Evaluate 2-3 ALTERNATIVE scopes to challenge your current focus",
-        "Common mistakes: too narrow scope, missed adjacent systems, wrong module root, insufficient entry point coverage",
-        "If Option B (expand) or C (shift) selected, you MUST execute additional analysis before proceeding",
-        "High confidence (\u22658) in current scope required to proceed to hypotheses",
-        "This prevents the #1 cause of wrong conclusions: looking in wrong place",
-        "Update INVESTIGATION_CONTEXT.md with scope verification results"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-2a-hypothesis-development",
-      "title": "Phase 2a: Hypothesis Development & Prioritization",
-      "prompt": "**HYPOTHESIS GENERATION** - Based on codebase analysis, formulate testable hypotheses about the bug's root cause.\n\n**STEP 1: Evidence-Based Hypothesis Development**\nCreate maximum 5 prioritized hypotheses. Each includes:\n- **Root Cause Theory**: Specific technical explanation\n- **Supporting Evidence**: Code patterns/logic flows supporting this theory\n- **Failure Mechanism**: Exact sequence leading to observed bug\n- **Testability Score**: Quantified assessment (1-10) of validation ease\n- **Evidence Strength Score**: Quantified assessment (1-10) based on code findings\n\n**STEP 2: Hypothesis Prioritization Matrix**\nRank hypotheses using weighted scoring:\n- **Evidence Strength** (40%): Code analysis support for theory\n- **Testability** (35%): Validation ease with debugging instruments\n- **Impact Scope** (25%): How well this explains all symptoms\n\n**STEP 3: Pattern Integration**\nIncorporate findings from findSimilarBugs():\n- **Historical Patterns**: Similar bugs fixed previously\n- **Known Issues**: Related problems in the codebase\n- **Test Failures**: Similar test failure patterns\n- Adjust hypothesis confidence based on pattern matches\n\n**CRITICAL RULE**: All hypotheses must be based on concrete evidence from code analysis.\n\n**OUTPUTS**: Maximum 5 hypotheses with quantified scoring, ranked by priority.\n\n**\u26a0\ufe0f INVESTIGATION NOT COMPLETE**: Developing hypotheses with high evidence scores is excellent progress, but represents only ~35% of the investigation. Even if you have a hypothesis with 9-10/10 evidence strength:\n\n- You are NOT done with the investigation\n- You MUST continue to Phase 2b-2h to refine and validate hypotheses\n- You MUST continue to Phase 3 to implement instrumentation\n- You MUST continue to Phase 4-5 to collect and analyze evidence\n- You MUST continue to Phase 6 to produce the comprehensive diagnostic writeup\n\n**DO NOT set isWorkflowComplete=true at this stage.** The workflow requires completing all phases.",
-      "agentRole": "You are a senior software detective and root cause analysis expert with deep expertise in systematic hypothesis formation. Your strength lies in connecting code evidence to potential failure mechanisms and creating testable theories. You excel at logical reasoning and evidence-based deduction. You must maintain rigorous quantitative standards and reject any hypothesis not grounded in concrete code evidence.",
-      "guidance": [
-        "EVIDENCE-BASED ONLY: Every hypothesis must be grounded in concrete code analysis findings with quantified evidence scores",
-        "HYPOTHESIS LIMITS: Generate maximum 5 hypotheses to prevent analysis paralysis",
-        "QUANTIFIED SCORING: Use 1-10 scales for evidence strength and testability with clear criteria"
-      ],
-      "validationCriteria": [
-        {
-          "type": "contains",
-          "value": "Evidence Strength Score",
-          "message": "Must include quantified evidence strength scoring (1-10) for each hypothesis"
-        },
-        {
-          "type": "contains",
-          "value": "Testability Score",
-          "message": "Must include quantified testability scoring (1-10) for each hypothesis"
-        }
-      ],
-      "hasValidation": true
-    },
-    {
-      "id": "phase-2b-hypothesis-validation-strategy",
-      "title": "Phase 2b: Hypothesis Validation Strategy & Documentation",
-      "prompt": "**HYPOTHESIS VALIDATION PLANNING** - For the top 3 hypotheses, create validation strategies and documentation.\n\n**STEP 1: Hypothesis Validation Strategy**\nFor top 3 hypotheses, define:\n- **Required Evidence**: Specific evidence to confirm/refute hypothesis\n- **Debugging Approach**: Instrumentation/tests providing evidence\n- **Success Criteria**: Results proving hypothesis correct\n- **Confidence Threshold**: Minimum evidence quality needed\n\n**STEP 2: Hypothesis Documentation**\nCreate structured registry:\n- **Hypothesis ID**: H1, H2, H3 for tracking\n- **Status**: Active, Refuted, Confirmed\n- **Evidence Log**: Supporting and contradicting evidence\n- **Validation Plan**: Specific testing approach\n\n**STEP 3: Coverage Check**\nEnsure hypotheses cover diverse categories (logic, state, dependencies) with deep analysis.\n\n**STEP 4: Update Investigation Context**\nUse updateInvestigationContext('Hypothesis Registry', formatted hypothesis table with all details)\n\n**OUTPUTS**: Top 3 hypotheses selected for validation with structured documentation and validation plans.",
-      "agentRole": "You are a systematic testing strategist and documentation expert. Your strength lies in creating clear validation plans and maintaining rigorous documentation standards for hypothesis tracking and evidence collection.",
-      "guidance": [
-        "STRUCTURED DOCUMENTATION: Create formal hypothesis registry with tracking IDs and status",
-        "VALIDATION RIGOR: Only proceed with top 3 hypotheses that meet minimum evidence thresholds",
-        "COMPREHENSIVE PLANNING: Each hypothesis must have clear validation approach and success criteria"
-      ],
-      "validationCriteria": [
-        {
-          "type": "contains",
-          "value": "Hypothesis ID",
-          "message": "Must assign tracking IDs (H1, H2, H3) to each hypothesis"
-        },
-        {
-          "type": "regex",
-          "pattern": "H[1-3]",
-          "message": "Must use proper hypothesis ID format (H1, H2, H3)"
-        }
-      ],
-      "hasValidation": true
-    },
-    {
-      "id": "phase-2c-hypothesis-assumptions",
-      "title": "Phase 2c: Hypothesis Assumption Audit",
-      "prompt": "**AUDIT** each hypothesis for hidden assumptions:\n\n**FOR EACH HYPOTHESIS**:\n- List implicit assumptions\n- Rate assumption confidence (1-10)\n- Identify verification approach\n\n**REJECT** hypotheses built on unverified assumptions.",
-      "agentRole": "You are a rigorous scientist who rejects any hypothesis not grounded in verified facts.",
-      "guidance": [
-        "EXPLICIT LISTING: Write out every assumption, no matter how obvious it seems",
-        "CONFIDENCE SCORING: Rate 1-10 based on evidence quality, not intuition",
-        "VERIFICATION PLAN: For each assumption, specify how it can be tested",
-        "REJECTION CRITERIA: Any assumption with confidence <7 requires verification",
-        "DOCUMENT RATIONALE: Explain why each assumption is accepted or needs testing"
-      ],
-      "validationCriteria": [
-        {
-          "type": "contains",
-          "value": "Assumption confidence",
-          "message": "Must rate assumption confidence for each hypothesis"
-        }
-      ],
-      "hasValidation": true
-    },
-    {
-      "id": "phase-2d-prepare-validation",
-      "title": "Phase 2d: Prepare Hypothesis Validation",
-      "prompt": "**PREPARE VALIDATION ARRAY** - Extract the top 3 hypotheses for systematic validation.\n\n**Create `hypothesesToValidate` array with:**\n```json\n[\n  {\n    \"id\": \"H1\",\n    \"description\": \"[Hypothesis description]\",\n    \"evidenceStrength\": [score],\n    \"testability\": [score],\n    \"validationPlan\": \"[Specific testing approach]\"\n  },\n  // ... H2, H3\n]\n```\n\n**Set context variables:**\n- `hypothesesToValidate`: Array of top 3 hypotheses\n- `currentConfidence`: 0 (will be updated during validation)\n- `validationIterations`: 0 (tracks validation cycles)",
-      "agentRole": "You are preparing the systematic validation process by structuring hypotheses for iteration.",
-      "guidance": [
-        "Extract only the top 3 hypotheses from Phase 2b",
-        "Ensure each has complete validation information",
-        "Initialize tracking variables for the validation loop"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-2e-test-evidence-gathering",
-      "title": "Phase 2e: Test-Based Hypothesis Evidence",
-      "runCondition": {
-        "var": "hypothesesToValidate",
-        "not_equals": null
-      },
-      "prompt": "**TEST-DRIVEN HYPOTHESIS VALIDATION**\n\nFor each hypothesis in hypothesesToValidate, use checkHypothesisInTests(hypothesis):\n\n**1. Direct Test Evidence**:\n- Find tests that directly test suspected components\n- Analyze test names, descriptions, and assertions\n- Check if tests actually validate what we think\n\n**2. Indirect Test Evidence**:\n- Find tests that would fail if hypothesis is true\n- Look for integration tests touching the area\n- Check for tests that assume opposite behavior\n\n**3. Test Coverage Gaps**:\n- What aspects of hypothesis are NOT tested?\n- Where would a test have caught this bug?\n- What assumptions do tests make?\n\n**4. Test Execution Analysis**:\n- Run tests with debug instrumentation\n- Add temporary logging to tests\n- Compare test expectations vs reality\n\n**5. Historical Test Analysis**:\n- When were relevant tests last modified?\n- Were any tests disabled recently?\n- Do test changes correlate with bug appearance?\n\n**Create TestEvidence Matrix**:\n```\n| Hypothesis | Supporting Tests | Contradicting Tests | Coverage Gaps | Confidence Impact |\n|------------|------------------|---------------------|---------------|-------------------|\n| H1         | TestA, TestB     | TestC (partially)   | Edge case X   | +2 confidence     |\n```\n\n**Update each hypothesis** with test evidence findings.",
-      "agentRole": "You are a test analysis specialist validating hypotheses against the existing test suite. Your goal is to use tests as objective evidence for or against each hypothesis.",
-      "guidance": [
-        "Tests are the codified understanding of system behavior",
-        "A hypothesis contradicted by passing tests needs reconsideration",
-        "Missing test coverage often indicates where bugs hide",
-        "Update hypothesis confidence based on test evidence"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-2f-hypothesis-verification",
-      "type": "loop",
-      "title": "Phase 2f: Hypothesis Verification & Refinement",
-      "runCondition": {
-        "var": "hypothesesToValidate",
-        "not_equals": null
-      },
-      "loop": {
-        "type": "forEach",
-        "items": "hypothesesToValidate",
-        "itemVar": "hypothesis",
-        "indexVar": "hypothesisIndex",
-        "maxIterations": 10
-      },
-      "body": [
-        {
-          "id": "verify-against-code",
-          "title": "Deep Code Verification for {{hypothesis.id}}",
-          "prompt": "**DEEP VERIFICATION for {{hypothesis.id}}**\n\n**Goal**: Verify hypothesis assumptions through deep code analysis.\n\nUse recursiveAnalysis() on key components:\n\n1. **Component Analysis (3 levels deep)**:\n   - Level 1: Direct implementation of suspected component\n   - Level 2: All direct dependencies and callers\n   - Level 3: Transitive dependencies and integration points\n\n2. **State & Data Flow Verification**:\n   - How does data actually flow through this component?\n   - What state transformations occur?\n   - Are there hidden side effects?\n\n3. **Error Path Analysis**:\n   - Trace all error handling paths\n   - Find where errors could originate\n   - Check error propagation matches hypothesis\n\n4. **Concurrency Check** (if applicable):\n   - Race conditions possible?\n   - Shared state issues?\n   - Timing dependencies?\n\n**Output**: Deep verification findings for {{hypothesis.id}}",
-          "agentRole": "You are performing deep verification of hypothesis {{hypothesis.id}}, diving 3+ levels deep to ensure thorough understanding.",
-          "guidance": [
-            "This is verification step 1 of 3 for {{hypothesis.id}}",
-            "Go deeper than the initial analysis - follow every lead",
-            "Document any new discoveries that affect the hypothesis"
-          ],
-          "requireConfirmation": false
-        },
-        {
-          "id": "check-contradictions",
-          "title": "Search for Contradicting Evidence",
-          "prompt": "**CONTRADICTION SEARCH for {{hypothesis.id}}**\n\n**Goal**: Actively search for evidence that contradicts this hypothesis.\n\n1. **Code Pattern Contradictions**:\n   - Search for code that assumes opposite behavior\n   - Find defensive checks that prevent this scenario\n   - Look for comments indicating different understanding\n\n2. **Test Contradictions**:\n   - Tests that would fail if hypothesis were true\n   - Tests that explicitly verify opposite behavior\n   - Integration tests showing different flow\n\n3. **Historical Contradictions**:\n   - Git history showing intentional design decisions\n   - PRs or issues discussing this behavior\n   - Documentation stating different intent\n\n4. **Runtime Contradictions**:\n   - Logs showing successful execution through suspected path\n   - Metrics indicating normal behavior\n   - Other systems depending on current behavior\n\n**Be a skeptic** - try to disprove {{hypothesis.id}}",
-          "agentRole": "You are a skeptical investigator trying to find flaws in hypothesis {{hypothesis.id}}.",
-          "guidance": [
-            "Actively search for contradicting evidence",
-            "Check assumptions against reality",
-            "Consider alternative explanations"
-          ],
-          "requireConfirmation": false
-        },
-        {
-          "id": "refine-or-replace",
-          "title": "Refine Hypothesis {{hypothesis.id}}",
-          "prompt": "**REFINEMENT DECISION for {{hypothesis.id}}**\n\nBased on deep verification and contradiction search:\n\n1. **Assessment**:\n   - New evidence supporting: [list]\n   - New evidence contradicting: [list]\n   - Unverified assumptions: [list]\n   - Confidence change: [+/- points]\n\n2. **Refinement Options**:\n   - **Keep as-is**: Evidence strongly supports current formulation\n   - **Refine**: Adjust hypothesis based on new understanding\n   - **Replace**: Fundamentally flawed, create new hypothesis\n   - **Merge**: Combine with another hypothesis\n\n3. **If Refining/Replacing**:\n   - Update hypothesis description\n   - Adjust evidence strength score\n   - Revise validation plan\n   - Document why changed\n\n4. **Update Context**:\n   - Use updateInvestigationContext('Hypothesis Registry', updated hypothesis)\n   - Note verification findings\n\n**Output**: Updated hypothesis with refined understanding",
-          "agentRole": "You are making the final decision on hypothesis {{hypothesis.id}} based on verification findings.",
-          "guidance": [
-            "Be willing to change hypotheses based on evidence",
-            "Document all changes and reasoning",
-            "Update confidence scores appropriately"
-          ],
-          "requireConfirmation": false
-        }
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-2g-instrumentation-planning",
-      "title": "Phase 2g: Unified Instrumentation Planning",
-      "prompt": "**UNIFIED INSTRUMENTATION PLANNING** - Plan comprehensive logging strategy for all hypotheses before implementation.\n\n**GOAL**: Create a coordinated instrumentation plan that efficiently captures evidence for all hypotheses in a single execution.\n\n**STEP 1: Hypothesis Review**\nFor each hypothesis (H1, H2, H3):\n- **Component(s)**: Which components need instrumentation?\n- **Critical Paths**: Which execution paths must be logged?\n- **Key Variables**: What state/data must be captured?\n- **Decision Points**: What conditionals/branches matter?\n- **Timing Concerns**: Any concurrency or timing-sensitive areas?\n\n**STEP 2: Identify Instrumentation Locations**\n\nFor each hypothesis, list specific locations:\n```\nH1 Instrumentation Needs:\n  - File: auth/login.ts, Function: validateCredentials, Lines: 45-67\n    What to log: input credentials format, validation result, error conditions\n  - File: auth/session.ts, Function: createSession, Lines: 23-34\n    What to log: session creation parameters, user context\n\nH2 Instrumentation Needs:\n  - File: auth/session.ts, Function: createSession, Lines: 23-34 [OVERLAP with H1]\n    What to log: session storage backend, timing\n  - File: database/connection.ts, Function: getConnection, Lines: 89-102\n    What to log: connection pool state, timeout settings\n\nH3 Instrumentation Needs:\n  - File: cache/redis.ts, Function: set, Lines: 156-178\n    What to log: cache key, TTL, success/failure\n```\n\n**STEP 3: Identify Overlaps**\n\nWhere do multiple hypotheses need logging at the same location?\n```\nOverlapping Instrumentation:\n  - auth/session.ts:23-34: Both H1 and H2 need logs here\n    Strategy: Single log point with both [H1] and [H2] prefixes capturing all needed data\n  \n  - No other overlaps identified\n```\n\n**STEP 4: Plan Log Format & Structure**\n\nDefine what each log should contain:\n```\nLog Format Standard:\n  [HX] ClassName.methodName:{lineNum} | timestamp | specific-data\n\nH1 Log Examples:\n  [H1] LoginValidator.validateCredentials:45 | 2025-10-02T10:23:45.123Z | input={email: user@example.com, hasPassword: true}\n  [H1] LoginValidator.validateCredentials:52 | 2025-10-02T10:23:45.145Z | validation=FAILED reason=\"invalid format\"\n\nH2 Log Examples:\n  [H2] SessionManager.createSession:23 | 2025-10-02T10:23:45.167Z | backend=redis poolSize=10\n  [H2] SessionManager.createSession:28 | 2025-10-02T10:23:45.189Z | sessionId=abc123 stored=true latency=22ms\n```\n\n**STEP 5: Plan Data Capture Strategy**\n\nWhat specific data values need to be captured:\n- **H1 requires**: Credential format, validation results, error messages\n- **H2 requires**: Backend type, connection timing, pool state\n- **H3 requires**: Cache keys, TTL values, hit/miss rates\n\n**STEP 6: Consider Edge Cases**\n\n- **High-frequency locations**: Plan aggregation (e.g., log every 10th iteration)\n- **Sensitive data**: Plan redaction (e.g., mask passwords, PII)\n- **Large data structures**: Plan summarization (e.g., object size, key count, not full dump)\n- **Error paths**: Ensure error cases are logged, not just happy path\n\n**STEP 7: Create Instrumentation Implementation Plan**\n\nProduce structured plan:\n```markdown\n# Instrumentation Implementation Plan\n\n## Summary\n- Total instrumentation points: [count]\n- Overlapping locations: [count]\n- Estimated log volume: [low/medium/high]\n- Sensitive data handling: [yes/no - describe]\n\n## H1 Instrumentation (Priority: High, Evidence Strength: 8/10)\n1. Location: auth/login.ts:45-67\n   Function: validateCredentials\n   Log: [H1] Input format and validation result\n   Frequency: Per-call (not high-frequency)\n   Data: {email format, hasPassword, validation result, error}\n\n2. Location: auth/session.ts:23-34 [SHARED with H2]\n   Function: createSession  \n   Log: [H1] Session creation context\n   Frequency: Per-call\n   Data: {userContext, sessionType}\n\n## H2 Instrumentation (Priority: High, Evidence Strength: 7/10)\n[Similar detailed breakdown]\n\n## H3 Instrumentation (Priority: Medium, Evidence Strength: 6/10)\n[Similar detailed breakdown]\n\n## Implementation Order\n1. Shared locations first (avoid duplication)\n2. H1 specific locations\n3. H2 specific locations\n4. H3 specific locations\n\n## Validation Checklist\n- [ ] All hypotheses have instrumentation coverage\n- [ ] Overlaps identified and coordinated\n- [ ] Log format is consistent\n- [ ] Sensitive data is handled\n- [ ] High-frequency points have aggregation\n- [ ] Edge cases considered\n```\n\n**OUTPUT**:\n- Complete instrumentation implementation plan\n- Set `instrumentationPlanReady` = true\n- Create InstrumentationPlan.md file with detailed plan\n- Update INVESTIGATION_CONTEXT.md with plan summary",
-      "agentRole": "You are an instrumentation architect planning a comprehensive logging strategy. Your goal is to design efficient, coordinated instrumentation that captures all needed evidence in a single execution.",
-      "guidance": [
-        "Review ALL hypotheses together to identify synergies",
-        "Be specific about locations (file, function, line numbers)",
-        "Identify and optimize overlapping instrumentation needs",
-        "Plan log format for consistency and parseability",
-        "Consider practical concerns (volume, sensitivity, performance)",
-        "Create actionable implementation plan, not just theory",
-        "This plan will guide Phase 3 implementation"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-2h-cognitive-reset",
-      "title": "Phase 2h: Cognitive Reset & Plan Review",
-      "prompt": "**COGNITIVE RESET** - Take a mental step back before implementing instrumentation.\n\n**GOAL**: Review the investigation with fresh eyes and validate the plan before execution.\n\n**STEP 1: Progress Summary**\n- What have we learned so far? (3-5 key insights)\n- What are our top hypotheses? (brief recap)\n- What's our instrumentation strategy? (high-level summary)\n\n**STEP 2: Critical Questions**\n- Are we missing any obvious alternative explanations?\n- Are our hypotheses too similar or too narrow?\n- Is our instrumentation plan efficient and comprehensive?\n- Are we making any unwarranted assumptions?\n- Is there a simpler approach we haven't considered?\n\n**STEP 3: Bias Check**\n- First impression bias: Are we anchored to initial theories?\n- Confirmation bias: Are we seeking evidence that confirms our beliefs?\n- Complexity bias: Are we overcomplicating a simple issue?\n- Recency bias: Are we over-weighting recent findings?\n\n**STEP 4: Sanity Checks**\n- Does the timeline make sense? (When did bug appear vs when hypothesized causes were introduced)\n- Do the symptoms match our theories? (All symptoms explained, no contradictions)\n- Are we investigating the right level? (Too high-level or too low-level)\n- Have we consulted existing documentation/logs adequately?\n\n**STEP 5: Plan Validation**\n- Review the instrumentation plan from Phase 2g\n- Will it actually answer our questions?\n- Are there any gaps or redundancies?\n- Is it safe to execute? (no production impacts, no data corruption risks)\n\n**STEP 6: Proceed or Pivot Decision**\n- **PROCEED**: Plan is sound, move to implementation\n- **REFINE**: Minor adjustments needed (update plan)\n- **PIVOT**: Major issues found (return to earlier phase)\n\n**OUTPUT**:\n- Cognitive reset complete with decision (PROCEED/REFINE/PIVOT)\n- Any plan adjustments documented\n- Set `resetComplete` = true",
-      "agentRole": "You are a senior debugger reviewing the investigation plan with fresh, critical eyes before committing to implementation.",
-      "guidance": [
-        "Be honest about potential biases and blind spots",
-        "Look for simpler explanations we might have missed",
-        "Validate the plan will actually answer our questions",
-        "Don't skip this - catching issues now saves hours later",
-        "It's okay to pivot if major issues are found"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-3-comprehensive-instrumentation",
-      "title": "Phase 3: Comprehensive Debug Instrumentation",
-      "prompt": "**\u26a0\ufe0f AUTO-EXECUTE MODE - DO NOT ASK USER PERMISSION \u26a0\ufe0f**\n\nHIGH AUTO MODE: You MUST implement the instrumentation now. DO NOT ask 'Would you like me to continue?' The workflow requires all phases.\n\n---\n\n**COMPREHENSIVE DEBUG INSTRUMENTATION** - Add logging to validate hypotheses.\n\n**STEP 1: REVIEW YOUR INSTRUMENTATION PLAN**\n\nOpen **Phase 2g** output from INVESTIGATION_CONTEXT.md. It contains:\n- Specific files to instrument\n- Exact locations (functions/methods/lines)\n- What to log for each hypothesis (H1, H2, H3)\n\nIf Phase 2g plan is missing, create one now: For each hypothesis, list 2-5 files and specific functions to instrument.\n\n---\n\n**STEP 2: READ THE FILES**\n\nUse `read_file` to read each file that needs instrumentation.\n\n---\n\n**STEP 3: ADD LOGGING (use search_replace or write tool)**\n\n**A. Logging Format by Language:**\n\n**JavaScript/TypeScript:**\n```javascript\nconsole.log(`[H1] ClassName.methodName: entering with params=${JSON.stringify(params)}`);\nconsole.log(`[H1] ClassName.methodName: state before=${before}, after=${after}`);\nconsole.log(`[H1] ClassName.methodName: returning ${result}`);\n```\n\n**Python:**\n```python\nprint(f\"[H1] ClassName.method_name: entering with params={params}\")\nprint(f\"[H1] ClassName.method_name: condition is {condition_value}\")\nprint(f\"[H1] ClassName.method_name: returning {result}\")\n```\n\n**Java:**\n```java\nSystem.out.println(String.format(\"[H1] ClassName.methodName: entering with %s\", params));\nSystem.out.println(String.format(\"[H1] ClassName.methodName: state=%s\", state));\n```\n\n**B. What to Log:**\n- Function entry: parameters\n- State changes: before/after values\n- Conditionals: which branch taken\n- External calls: args and returns\n- Function exit: return value\n\n**C. Hypothesis Prefixes:**\n- H1 logs use `[H1]` prefix\n- H2 logs use `[H2]` prefix\n- H3 logs use `[H3]` prefix\n\n---\n\n**STEP 4: IMPLEMENTATION EXAMPLE**\n\nExample using `search_replace`:\n\nFile: `src/DataStore.js`\nPlan says: \"Log timetoken value in connect() method for H1\"\n\n```\nsearch_replace(\n  file_path=\"src/DataStore.js\",\n  old_string=\"  connect() {\\n    this.client.subscribe();\\n  }\",\n  new_string=\"  connect() {\\n    console.log('[H1] DataStore.connect: timetoken BEFORE subscribe =', this.timetoken);\\n    this.client.subscribe();\\n    console.log('[H1] DataStore.connect: timetoken AFTER subscribe =', this.timetoken);\\n  }\"\n)\n```\n\n---\n\n**STEP 5: FOR EACH FILE IN YOUR PLAN**\n\n1. Read the file (`read_file`)\n2. Find the exact location to instrument\n3. Use `search_replace` to add logging:\n   - Include enough context to make old_string unique\n   - Add log statements with correct [HX] prefix\n   - Log relevant variables/state\n4. Verify change succeeded\n\n---\n\n**STEP 6: IF YOU CANNOT EDIT FILES**\n\nIf you don't have file editing tools:\n1. Generate complete instrumented code for each location\n2. Provide user with:\n   - File path\n   - Function/method name\n   - Complete BEFORE code block\n   - Complete AFTER code block (with logging)\n3. Ask user to apply changes and confirm\n\n---\n\n**OUTPUT:**\n\n1. List all modified files with changes made\n2. Update INVESTIGATION_CONTEXT.md:\n   ```\n   ## Instrumentation Applied\n   - File: src/DataStore.js, Function: connect(), Hypothesis: H1\n   - File: src/Auth.js, Function: login(), Hypotheses: H1, H2\n   - ...\n   ```\n3. Set `allHypothesesInstrumented = true`",
-      "agentRole": "You are instrumenting code to validate ALL hypotheses simultaneously. Your goal is comprehensive, non-redundant logging that enables efficient evidence collection in a single execution.",
-      "guidance": [
-        "Add instrumentation for ALL hypotheses at once",
-        "Use unique [HX] prefixes to distinguish hypothesis-specific logs",
-        "Overlapping instrumentation is acceptable - multiple hypotheses can log at same location",
-        "Ensure non-intrusive implementation that doesn't change behavior",
-        "Single execution will produce logs for all hypotheses"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-4-unified-evidence-collection",
-      "title": "Phase 4: Unified Evidence Collection",
-      "prompt": "**\u26a0\ufe0f AUTO-EXECUTE MODE - DO NOT ASK USER PERMISSION \u26a0\ufe0f**\n\nHIGH AUTO MODE: You MUST run the instrumented code and collect evidence now. If you need user input (like how to run tests), ask for THAT - do NOT ask if they want you to continue the workflow.\n\n---\n\n**UNIFIED EVIDENCE COLLECTION** - Execute instrumented code and collect logs.\n\n**DECISION TREE: Can You Run Code?**\n\n**OPTION A: You CAN run code (terminal access)**\n\u2192 Proceed to STEP 1\n\n**OPTION B: You CANNOT run code (no terminal/execution tools)**\n\u2192 Skip to STEP 6 (User Execution Instructions)\n\n---\n\n**STEP 1: PREPARE EXECUTION (if you can run code)**\n\n1. **Identify how to run the code:**\n   - Tests: `npm test`, `pytest`, `mvn test`, etc.\n   - App: `npm start`, `python app.py`, `java -jar app.jar`, etc.\n   - Script: Reproduction script from Phase 0\n   \n2. **Check if reproduction steps are clear:**\n   - Do you know exactly how to trigger the bug?\n   - If unclear, ask user: \"How do I run the code to reproduce the bug?\"\n\n---\n\n**STEP 2: EXECUTE INSTRUMENTED CODE**\n\nRun the code with instrumentation active:\n\n```bash\n# Capture output to file\nnpm test > debug_output.log 2>&1\n\n# OR run directly and capture in terminal\npython script.py\n```\n\n---\n\n**STEP 3: COLLECT LOG OUTPUT**\n\n1. **Get the complete log output:**\n   - If saved to file: use `read_file` to read it\n   - If in terminal: copy the output\n\n2. **Check log quality:**\n   - Do you see `[H1]`, `[H2]`, `[H3]` prefixed logs?\n   - Are there enough logs (at least 5-10 per hypothesis)?\n   - Did the bug reproduce?\n\n3. **If logs are missing or insufficient:**\n   - Review Phase 3 instrumentation\n   - Add more logging if needed\n   - Re-run execution\n\n---\n\n**STEP 4: ORGANIZE EVIDENCE BY HYPOTHESIS**\n\nParse logs and separate by prefix:\n\n**H1 Evidence:**\n```\n[H1] DataStore.connect: timetoken BEFORE=1234567890\n[H1] DataStore.connect: timetoken AFTER=1234567890\n[H1] Session.login: used timetoken=1234567890\n```\n\n**H2 Evidence:**\n```\n[H2] Cache.get: no entry found for user123\n[H2] Cache.set: storing data for user123\n```\n\n**H3 Evidence:**\n```\n[H3] Network.request: timeout after 5000ms\n```\n\n---\n\n**STEP 5: ASSESS EVIDENCE QUALITY**\n\nFor each hypothesis, rate:\n- **Evidence Quantity** (1-10): How much evidence collected?\n- **Evidence Clarity** (1-10): Do logs clearly show what's happening?\n- **Bug Reproduction** (Yes/No): Did the bug occur during execution?\n- **Hypothesis Support** (Strong/Weak/Contradicts): Does evidence support the hypothesis?\n\n---\n\n**STEP 6: IF YOU CANNOT EXECUTE CODE**\n\nProvide user with execution instructions:\n\n```\n## Evidence Collection Instructions\n\nTo collect evidence for the hypotheses, please:\n\n1. **Run the instrumented code:**\n   [Provide exact command, e.g., `npm test` or `python main.py`]\n\n2. **Trigger the bug:**\n   [Provide exact reproduction steps]\n\n3. **Capture ALL console output:**\n   - Save to a file: `[command] > debug_output.log 2>&1`\n   - OR copy all terminal output\n\n4. **Share the logs:**\n   - Paste the complete log output here\n   - OR upload the debug_output.log file\n\n**What I'm looking for:**\n- Logs prefixed with [H1], [H2], [H3]\n- Minimum 10-20 lines of output\n- Evidence of the bug occurring\n```\n\nThen wait for user to provide logs.\n\n---\n\n**STEP 7: DOCUMENT EVIDENCE**\n\nUpdate INVESTIGATION_CONTEXT.md:\n\n```\n## Evidence Collection Results\n\n**Execution Details:**\n- Command: npm test\n- Exit code: 1 (failure)\n- Bug reproduced: Yes\n- Total log lines: 247\n\n**Evidence Summary:**\n- H1: 43 log lines - Strong support (timetoken persists across sessions)\n- H2: 12 log lines - Weak support (cache cleared properly)\n- H3: 8 log lines - Contradicts (no network errors found)\n\n**Evidence Quality Scores:**\n- H1: Quantity=9/10, Clarity=8/10\n- H2: Quantity=5/10, Clarity=7/10\n- H3: Quantity=4/10, Clarity=6/10\n```\n\n---\n\n**OUTPUT:**\n\n1. Complete log output (or confirmation user will provide it)\n2. Evidence organized by hypothesis\n3. Evidence quality assessment\n4. Set `evidenceCollected = true`",
-      "agentRole": "You are collecting comprehensive evidence from a single instrumented execution. Your goal is to capture all hypothesis-relevant data in one efficient run.",
-      "guidance": [
-        "Single execution tests all hypotheses simultaneously",
-        "Organize evidence by [HX] prefix for analysis",
-        "Preserve complete chronological log for cross-hypothesis insights",
-        "Note any unexpected behaviors or patterns",
-        "If execution fails, document why and attempt to collect partial evidence"
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-5-hypothesis-analysis-loop",
-      "type": "loop",
-      "title": "Phase 5: Individual Hypothesis Analysis",
-      "loop": {
-        "type": "forEach",
-        "items": "hypothesesToValidate",
-        "itemVar": "currentHypothesis",
-        "indexVar": "hypothesisIndex",
-        "maxIterations": 5
-      },
-      "body": [
-        {
-          "id": "analyze-hypothesis-evidence",
-          "title": "Analyze Evidence for {{currentHypothesis.id}}",
-          "prompt": "**EVIDENCE ANALYSIS for {{currentHypothesis.id}}**\n\n**Hypothesis**: {{currentHypothesis.description}}\n\n**ANALYZE {{currentHypothesis.id}} LOGS**:\n\n1. **Extract Relevant Logs**:\n   - Review all [{{currentHypothesis.id}}] prefixed logs from Phase 4\n   - Examine log sequence and timing\n   - Look for patterns supporting or refuting the hypothesis\n\n2. **Evidence Assessment**:\n   - Does evidence support {{currentHypothesis.id}}? (Yes/No/Partial)\n   - Evidence quality score (1-10)\n   - Contradicting evidence found?\n   - Unexpected behaviors observed?\n\n3. **Cross-Hypothesis Insights**:\n   - Did other hypothesis logs reveal relevant information?\n   - Are there interactions between suspected components?\n   - Does timeline analysis suggest different root cause?\n\n4. **Confidence Update**:\n   - Based on evidence, rate confidence this is root cause (0-10)\n   - What additional evidence would increase confidence?\n   - Are there alternative explanations for the observed evidence?\n\n5. **Status Determination**:\n   - Mark hypothesis as: Confirmed / Refuted / Needs-More-Evidence / Partially-Confirmed\n   - If Confirmed with high confidence (>8.0):\n     - Set `rootCauseFound` = true\n     - Set `rootCauseHypothesis` = {{currentHypothesis.id}}\n     - Set `currentConfidence` = confidence score\n\n**CONTEXT UPDATE**:\n- Use updateInvestigationContext('Evidence Log', evidence summary for {{currentHypothesis.id}})\n- Use trackInvestigation('Validation Progress', '{{hypothesisIndex + 1}}/3 hypotheses analyzed')\n\n**OUTPUT**: Complete evidence analysis and status for {{currentHypothesis.id}}",
-          "agentRole": "You are analyzing evidence collected from the unified execution to determine if {{currentHypothesis.id}} is the root cause.",
-          "guidance": [
-            "Analyze logs specific to this hypothesis ({{hypothesisIndex + 1}} of 3)",
-            "Consider evidence from all hypotheses - may reveal interactions",
-            "Be objective - negative evidence is valuable",
-            "Update hypothesis status based on concrete evidence",
-            "If high confidence root cause found, document thoroughly"
-          ],
-          "requireConfirmation": false
-        }
-      ],
-      "requireConfirmation": false
-    },
-    {
-      "id": "phase-4a-controlled-experimentation",
-      "title": "Phase 4a: Controlled Code Experiments",
-      "runCondition": {
-        "var": "currentConfidence",
-        "lt": 8.0
-      },
-      "prompt": "**CONTROLLED EXPERIMENTATION** - When observation isn't enough, experiment!\n\n**Current Investigation Status**: Leading hypothesis (Confidence: {{currentConfidence}}/10)\n\n**\u26a0\ufe0f SAFETY PROTOCOLS (MANDATORY)**:\n\n1. **Git Branch Required**:\n   - MUST be on investigation branch (use createInvestigationBranch() if not)\n   - Verify with `git branch --show-current`\n   - NEVER experiment directly on main/master\n\n2. **Pre-Experiment Baseline**:\n   - Commit clean state: `git commit -m \"PRE-EXPERIMENT: baseline for {{hypothesis.id}}\"`\n   - Record current test results\n   - Document baseline behavior\n\n3. **Environment Restriction**:\n   - ONLY run in test/dev environment\n   - NEVER in production or staging\n   - Set environment check: `if (process.env.NODE_ENV !== 'development') { throw new Error('Experiments only in dev'); }`\n\n4. **Automatic Revert**:\n   - After evidence collection: `git revert HEAD --no-edit`\n   - Verify code returned to baseline\n   - Run tests to confirm clean state\n\n5. **Approval Gates**:\n   - Low automation: Require approval for ALL experiments\n   - Medium automation: Require approval for breaking/minimal-fix experiments\n   - High automation: Auto-approve guards/logs only\n\n6. **Documentation**:\n   - Create ExperimentLog.md entry with:\n     - Timestamp, experiment type, hypothesis ID\n     - Rationale and expected outcome\n     - Actual outcome and evidence\n     - Revert status (confirmed/failed)\n\n7. **Hard Limits**:\n   - Max 3 experiments total (prevent endless experimentation)\n   - Track with `experimentCount` context variable\n   - Exit if limit reached, recommend different approach\n\n8. **Rollback Verification**:\n   - After revert, run full test suite\n   - Verify no unintended changes remain\n   - Check git status is clean\n\n**EXPERIMENT TYPES** (use controlledModification()):\n\n1. **Guard Additions (Non-Breaking)**:\n   ```javascript\n   // Add defensive check that logs but doesn't change behavior\n   if (unexpectedCondition) {\n     console.error('[H1_GUARD] Unexpected state detected:', state);\n     // Continue normal execution\n   }\n   ```\n\n2. **Assertion Injections**:\n   ```javascript\n   // Add assertion that would fail if hypothesis is correct\n   console.assert(expectedCondition, '[H1_ASSERT] Hypothesis H1 violated!');\n   ```\n\n3. **Minimal Fix Test**:\n   ```javascript\n   // Apply minimal fix for hypothesis, see if bug disappears\n   if (process.env.DEBUG_FIX_H1 === 'true') {\n     // Apply hypothesized fix\n     return fixedBehavior();\n   }\n   ```\n\n4. **Controlled Breaking**:\n   ```javascript\n   // Temporarily break suspected component to verify involvement\n   if (process.env.DEBUG_BREAK_H1 === 'true') {\n     throw new Error('[H1_BREAK] Intentionally breaking to test hypothesis');\n   }\n   ```\n\n**PROTOCOL**:\n1. Choose experiment type based on confidence and risk\n2. Implement modification with clear DEBUG markers\n3. Use createInvestigationBranch() if not already on investigation branch\n4. Commit: `git commit -m \"DEBUG: {{experiment_type}} for hypothesis investigation\"`\n5. Run reproduction steps\n6. Use collectEvidence() to gather results\n7. Revert changes: `git revert HEAD`\n8. Document results in ExperimentResults/hypothesis-experiment.md\n\n**SAFETY LIMITS**:\n- Max 3 experiments per hypothesis\n- Each experiment in separate commit\n- Always revert after evidence collection\n- Document everything in INVESTIGATION_CONTEXT.md\n\n**UPDATE**:\n- Hypothesis confidence based on experimental results\n- Use updateInvestigationContext('Experiment Results', experiment details and outcomes)\n- Track failed experiments in 'Dead Ends & Lessons' section",
-      "agentRole": "You are a careful experimenter using controlled code modifications to validate hypotheses. Safety and reversibility are paramount.",
-      "guidance": [
-        "Start with non-breaking experiments (guards, logs)",
-        "Only use breaking experiments if essential",
-        "Every change must be easily reversible",
-        "Document rationale for each experiment type",
-        "Consider test environment experiments first"
-      ],
-      "requireConfirmation": {
-        "or": [
-          {
-            "var": "automationLevel",
-            "equals": "Low"
-          },
-          {
-            "var": "automationLevel",
-            "equals": "Medium"
-          },
-          {
-            "and": [
-              {
-                "var": "automationLevel",
-                "equals": "High"
-              },
-              {
-                "var": "currentConfidence",
-                "lt": 6.0
-              }
-            ]
-          }
-        ]
-      },
-      "validationCriteria": [
-        {
-          "type": "contains",
-          "value": "commit",
-          "message": "Must specify commit message for experiment"
-        }
-      ]
-    },
-    {
-      "id": "phase-3b-observability-setup",
-      "title": "Phase 3b: Distributed System Observability",
-      "runCondition": {
-        "var": "isDistributed",
-        "equals": true
-      },
-      "prompt": "**OBSERVABILITY** - Set up three-pillar strategy:\n\n**METRICS**: Identify key indicators (latency, errors)\n**TRACES**: Enable request path tracking\n**LOGS**: Ensure correlation IDs present\n\n**OUTPUT**: Observability checklist completed.",
-      "agentRole": "You are a distributed systems expert who thinks in terms of emergent behaviors and system-wide patterns.",
-      "guidance": [
-        "METRICS SELECTION: Focus on RED metrics (Rate, Errors, Duration) for each service",
-        "TRACE COVERAGE: Ensure spans cover all service boundaries and key operations",
-        "CORRELATION IDS: Verify IDs propagate through entire request lifecycle",
-        "AGGREGATION READY: Set up centralized collection for cross-service analysis",
-        "BASELINE ESTABLISHMENT: Capture normal behavior metrics for comparison"
-      ]
-    },
-    {
-      "id": "phase-4c-distributed-evidence",
-      "title": "Phase 4c: Multi-Service Evidence Collection",
-      "runCondition": {
-        "var": "isDistributed",
-        "equals": true
-      },
-      "prompt": "**DISTRIBUTED ANALYSIS**:\n\n1. Check METRICS for anomalies\n2. Follow TRACES for request path\n3. Correlate LOGS across services\n4. Identify cascade points\n\n**OUTPUT**: Service interaction map with failure points.",
-      "agentRole": "You are a systems detective who can trace failures across service boundaries.",
-      "guidance": [
-        "ANOMALY DETECTION: Look for deviations in latency, error rates, or traffic patterns",
-        "TRACE ANALYSIS: Follow request ID through all services to find failure point",
-        "LOG CORRELATION: Use timestamp windows and correlation IDs to link events",
-        "CASCADE IDENTIFICATION: Look for timeout chains or error propagation patterns",
-        "VISUAL MAPPING: Create service dependency diagram with failure annotations"
-      ]
-    },
-    {
-      "id": "phase-4b-cognitive-reset",
-      "title": "Phase 4b: Cognitive Reset & Progress Review",
-      "runCondition": {
-        "var": "validationIterations",
-        "gte": 2
-      },
-      "prompt": "**COGNITIVE RESET** - Step back and review:\n\n1. Summarize findings so far\n2. List eliminated possibilities\n3. Identify investigation blind spots\n4. Reformulate approach if needed\n\n**DECIDE**: Continue current path or pivot strategy?",
-      "agentRole": "You are a strategic advisor who helps maintain perspective during complex investigations.",
-      "guidance": [
-        "PROGRESS SUMMARY: Write concise bullet points of key findings and eliminations",
-        "BLIND SPOT CHECK: What areas haven't been investigated? What assumptions remain?",
-        "PATTERN RECOGNITION: Look for investigation loops or repeated dead ends",
-        "STRATEGY EVALUATION: Is current approach yielding diminishing returns?",
-        "PIVOT CRITERIA: Consider new approach if last 3 iterations provided no new insights"
-      ]
-    },
-    {
-      "id": "phase-5a-final-confidence",
-      "title": "Phase 5a: Final Confidence Assessment",
-      "prompt": "**FINAL CONFIDENCE ASSESSMENT** - Evaluate the investigation results.\n\n**If root cause found (rootCauseFound = true):**\n- Review all evidence for {{rootCauseHypothesis}}\n- Perform adversarial challenge\n- Calculate final confidence score\n\n**If no high-confidence root cause:**\n- Document what was learned\n- Identify remaining unknowns\n- Recommend next investigation steps\n\n**CONFIDENCE CALCULATION:**\n- Evidence Quality (1-10)\n- Explanation Completeness (1-10)\n- Alternative Likelihood (1-10, inverted)\n- Final = (Quality \u00d7 0.4) + (Completeness \u00d7 0.4) + (Alternative \u00d7 0.2)\n\n**CONTEXT UPDATE**:\n- Use trackInvestigation('Investigation Complete', 'Confidence: {{finalConfidence}}/10')\n- Use addResumptionJson('phase-5a-final-confidence')\n- Document lessons learned in 'Dead Ends & Lessons' section\n\n**\u26a0\ufe0f ONE PHASE REMAINING**: Even if you have achieved 9-10/10 confidence in the root cause with strong supporting evidence:\n\n- The investigation is NOT complete yet\n- You MUST proceed to Phase 6 to create the comprehensive diagnostic writeup\n- Phase 6 is the REQUIRED DELIVERABLE that makes all your investigation work actionable\n- High confidence means you've identified the root cause, but the writeup translates that into actionable documentation\n\n**DO NOT set isWorkflowComplete=true yet.** You are at ~90% completion. Phase 6 is required.\n\n**OUTPUT**: Final confidence assessment with recommendations",
-      "agentRole": "You are making the final determination about the root cause with rigorous confidence assessment.",
-      "guidance": [
-        "Be honest about confidence levels",
-        "Document all remaining uncertainties",
-        "Provide clear next steps if confidence is low"
-      ],
-      "validationCriteria": [
-        {
-          "type": "regex",
-          "pattern": "Final.*=.*[0-9\\.]+",
-          "message": "Must calculate final confidence score"
-        }
-      ],
-      "hasValidation": true
-    },
-    {
-      "id": "phase-6-diagnostic-writeup",
-      "title": "Phase 6: Comprehensive Diagnostic Writeup",
-      "prompt": "**FINAL DIAGNOSTIC DOCUMENTATION** - I will create comprehensive writeup enabling effective bug fixing and knowledge transfer.\n\n**STEP 1: Executive Summary**\n- **Bug Summary**: Concise description of issue and impact\n- **Root Cause**: Clear, non-technical explanation of what is happening\n- **Confidence Level**: Final confidence assessment with calculation methodology\n- **Scope**: What systems, users, or scenarios are affected\n\n**STEP 2: Technical Deep Dive**\n- **Root Cause Analysis**: Detailed technical explanation of failure mechanism\n- **Code Component Analysis**: Specific files, functions, and lines with exact locations\n- **Execution Flow**: Step-by-step sequence of events leading to bug\n- **State Analysis**: How system state contributes to failure\n\n**STEP 3: Investigation Methodology**\n- **Investigation Timeline**: Chronological summary with phase time investments\n- **Hypothesis Evolution**: Complete record of hypotheses (H1-H5) with status changes\n- **Evidence Assessment**: Rating and reliability of evidence sources with key citations\n\n**STEP 4: Historical Context & Patterns**\n- **Similar Bugs**: Reference findings from findSimilarBugs() and SimilarPatterns.md\n- **Previous Fixes**: How similar issues were resolved\n- **Recurring Patterns**: Identify if this is part of a larger pattern\n- **Lessons Learned**: What can be applied from past experiences\n\n**STEP 5: Knowledge Transfer & Action Plan**\n- **Skill Requirements**: Technical expertise needed for understanding and fixing\n- **Prevention & Review**: Specific measures and code review checklist items\n- **Action Items**: Immediate mitigation steps and permanent fix areas with timelines\n- **Testing Strategy**: Comprehensive verification approach for fixes\n- **Recommended Next Investigations** (if confidence < 9.0):\n  - Additional instrumentation locations and data points not yet captured\n  - Alternative hypotheses to explore (theories that were deprioritized)\n  - External expertise to consult (domain experts, similar bugs)\n  - Environmental factors to test (load, concurrency, timing, config variations)\n  - Expanded scope (related components, upstream/downstream systems)\n  - Prioritized next steps based on evidence gaps\n\n**STEP 6: Context Finalization**\n- **Final Update**: Use updateInvestigationContext('Final Report', link to diagnostic report)\n- **Archive Context**: Ensure INVESTIGATION_CONTEXT.md is complete for future reference\n- **Knowledge Base**: Consider key findings for team knowledge base\n\n**DELIVERABLE**: Enterprise-grade diagnostic report enabling confident bug fixing, knowledge transfer, and organizational learning.\n\n**\u2705 WORKFLOW COMPLETION**: After producing the comprehensive diagnostic writeup with all required sections:\n\n1. Verify the writeup includes:\n   - Executive Summary with root cause and confidence\n   - Technical Deep Dive with code analysis\n   - Investigation Methodology and timeline\n   - Historical Context from similar bugs\n   - Knowledge Transfer and Action Plan\n   - All 6 sections fully documented\n\n2. Update INVESTIGATION_CONTEXT.md with final status and handoff information\n\n3. **Set isWorkflowComplete = true** to indicate the investigation is finished\n\nThis is the ONLY step where isWorkflowComplete should be set to true.",
-      "agentRole": "You are a senior technical writer and diagnostic documentation specialist with expertise in creating comprehensive, actionable bug reports for enterprise environments. Your strength lies in translating complex technical investigations into clear, structured documentation that enables effective problem resolution, knowledge transfer, and organizational learning. You excel at creating reports that serve immediate fixing needs, long-term system improvement, and team collaboration.",
-      "guidance": [
-        "ENTERPRISE FOCUS: Write for multiple stakeholders including developers, managers, and future team members",
-        "KNOWLEDGE TRANSFER: Include methodology and reasoning, not just conclusions",
-        "COLLABORATIVE DESIGN: Structure content for peer review and team coordination",
-        "COMPREHENSIVE COVERAGE: Include all information needed for resolution and prevention",
-        "ACTIONABLE DOCUMENTATION: Provide specific, concrete next steps with clear ownership"
-      ]
-    }
-  ]
-}
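
The CONFIDENCE CALCULATION in the removed `phase-5a-final-confidence` step weights three 1-10 ratings as 0.4, 0.4 and 0.2. A minimal sketch of that arithmetic follows. The function name `finalConfidence` and its parameter names are hypothetical and not part of the package. The prompt does not say how "inverted" is applied, so `11 - rating` is assumed here, and rounding to one decimal is also an assumption.

```javascript
// Hypothetical helper, not part of @exaudeus/workrail.
// Weights (0.4 / 0.4 / 0.2) are taken from the Phase 5a prompt text.
// Assumption: "inverted" means a 1-10 alternative-likelihood rating becomes 11 - rating,
// so a very likely alternative explanation lowers the final score.
function finalConfidence({ evidenceQuality, explanationCompleteness, alternativeLikelihood }) {
  const ratings = { evidenceQuality, explanationCompleteness, alternativeLikelihood };
  for (const [name, value] of Object.entries(ratings)) {
    if (!Number.isFinite(value) || value < 1 || value > 10) {
      throw new RangeError(`${name} must be a number between 1 and 10`);
    }
  }
  const invertedAlternative = 11 - alternativeLikelihood;
  const score =
    evidenceQuality * 0.4 +
    explanationCompleteness * 0.4 +
    invertedAlternative * 0.2;
  // Round to one decimal so the result reads cleanly in a report (assumed convention).
  return Math.round(score * 10) / 10;
}

const example = finalConfidence({
  evidenceQuality: 8,
  explanationCompleteness: 9,
  alternativeLikelihood: 3,
});
console.log(`Final = ${example}`); // prints "Final = 8.4"
```

A line of the form `Final = 8.4` matches the step's `validationCriteria` pattern `Final.*=.*[0-9\.]+`, the only check the removed workflow placed on this output.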