@exaudeus/workrail 0.8.0 → 0.8.2

Files changed (28)
  1. package/dist/application/app.d.ts +0 -1
  2. package/dist/application/app.js +0 -6
  3. package/dist/application/services/workflow-service.js +56 -4
  4. package/dist/mcp-server.js +0 -35
  5. package/package.json +1 -1
  6. package/workflows/bug-investigation.agentic.json +112 -0
  7. package/workflows/document-creation-workflow.json +1 -1
  8. package/workflows/documentation-update-workflow.json +1 -1
  9. package/workflows/routines/plan-analysis.json +139 -0
  10. package/workflows/scoped-documentation-workflow.json +252 -0
  11. package/workflows/workflow-diagnose-environment.json +24 -0
  12. package/spec/mcp-compliance-summary.md +0 -211
  13. package/spec/mcp-protocol-handshake.md +0 -604
  14. package/web/DESIGN_SYSTEM_INTEGRATION.md +0 -305
  15. package/web/assets/images/favicon-amber-16.png +0 -0
  16. package/web/assets/images/favicon-amber-32.png +0 -0
  17. package/web/assets/images/favicon-white-16-clean.png +0 -0
  18. package/web/assets/images/favicon-white-32-clean.png +0 -0
  19. package/web/assets/images/icon-amber-192.png +0 -0
  20. package/web/assets/images/icon-amber-512.png +0 -0
  21. package/web/assets/images/icon-amber.svg +0 -27
  22. package/web/assets/images/icon-white-192-clean.png +0 -0
  23. package/web/assets/images/icon-white-512-clean.png +0 -0
  24. package/web/assets/images/icon-white.svg +0 -27
  25. package/web/examples/BEFORE_AFTER.md +0 -691
  26. package/workflows/IMPROVEMENTS-simplified.md +0 -122
  27. package/workflows/systematic-bug-investigation-simplified.backup-20251106-155300.json +0 -117
  28. package/workflows/systematic-bug-investigation-with-loops.backup-20251106-162241.json +0 -731
@@ -1,731 +0,0 @@
- {
- "id": "systematic-bug-investigation-with-loops",
- "name": "Systematic Bug Investigation Workflow",
- "version": "1.1.0-beta.22",
- "description": "A comprehensive workflow for systematic bug and failing test investigation that prevents LLMs from jumping to conclusions. Enforces thorough evidence gathering, hypothesis formation, debugging instrumentation, and validation to achieve near 100% certainty about root causes. This workflow does NOT fix bugs - it produces detailed diagnostic writeups that enable effective fixing by providing complete understanding of what is happening, why it's happening, and supporting evidence.",
- "clarificationPrompts": [
- "What type of system is this? (web app, mobile app, backend service, desktop app, etc.)",
- "How consistently can you reproduce this bug? (always reproducible, sometimes reproducible, rarely reproducible)",
- "What was the last known working version or state if applicable?",
- "Are there any time constraints or urgency factors for this investigation?",
- "What level of system access do you have? (full codebase, limited access, production logs only)",
- "What existing documentation is available? (README files, architecture docs, API docs, design documents, runbooks)",
- "Do you have access to existing logs? (production logs, error logs, debug logs, metrics, traces)",
- "Do you have preferences for handling large log volumes? (sub-chat analysis, inline summaries only, or no preference for automatic decision)"
- ],
- "preconditions": [
- "User has identified a specific bug or failing test to investigate",
- "Agent has access to codebase analysis tools (grep, file readers, etc.)",
- "Agent has access to build/test execution tools for the project type",
- "User can provide error messages, stack traces, or test failure output",
- "Bug is reproducible with specific steps or a minimal test case"
- ],
- "metaGuidance": [
- "**\ud83d\udea8 MANDATORY WORKFLOW EXECUTION - READ THIS FIRST:**",
- "YOU ARE EXECUTING A STRUCTURED WORKFLOW, NOT FREESTYLE DEBUGGING.",
- "You CANNOT \"figure out the bug\" and stop. You MUST execute all 23 workflow steps by repeatedly calling workflow_next and following instructions until the MCP returns isComplete=true.",
- "WORKFLOW MECHANICS: Each call to workflow_next returns the next required step. You MUST execute that step, then call workflow_next again. Repeat until isComplete=true.",
- "DO NOT STOP CALLING WORKFLOW_NEXT: Even if you think you know the bug, even if you have high confidence, even if it seems obvious - you MUST continue calling workflow_next.",
- "STEP COUNTER: Every prompt shows \"Step X of 23\" - you are NOT done until you reach Step 23/23 and isComplete=true.",
- "**\ud83d\udea8 CRITICAL MISUNDERSTANDING TO AVOID:**",
- "THE GOAL IS NOT \"FINDING\" THE BUG. THE GOAL IS \"PROVING\" THE BUG WITH EVIDENCE.",
- "\"I found the bug\" = YOU HAVE A GUESS. \"I proved the bug\" = YOU HAVE EVIDENCE FROM PHASES 3-5.",
- "FINDING \u2260 DONE. PROVING = DONE. Only after completing instrumentation, evidence collection, and validation do you have proof.",
- "NEVER say \"I've identified the root cause\" and stop. That is a THEORY, not PROOF. Continue to evidence collection.",
- "**\ud83d\udea8 USER SAYS: NO RATIONALIZATION. NO EXCEPTIONS. NO \"BUT IN MY CASE...\":**",
- "DO NOT say \"However, given that I have...\" or \"Let me do a targeted Phase X...\" or \"Based on my high confidence...\"",
- "USER SAYS: YOUR SITUATION IS NOT SPECIAL. YOU ARE NOT THE EXCEPTION. Complete ALL 23 steps. Complete ALL 5 analysis iterations.",
- "USER SAYS: \"I found the bug early\" = ALL THE MORE REASON to validate it properly through ALL phases. Quick conclusions are WRONG 90% of the time.",
- "DO NOT create \"summary documents\" or \"diagnostic writeups\" until Phase 6. That is SKIPPING THE WORKFLOW.",
- "**\ud83c\udfaf PHASE 0 = PURE SETUP (NO ANALYSIS):**",
- "Phase 0 is MECHANICAL SETUP ONLY: triage, user preferences, tool checking, context creation. No code analysis, no assumption checking. That comes in Phase 1.",
- "Phase 1a NOW includes assumption verification - AFTER you've seen the code and built structural understanding. You can't meaningfully question assumptions before understanding the codebase.",
- "**\ud83d\udea8 CRITICAL: ANALYSIS \u2260 DIAGNOSIS \u2260 PROOF:**",
- "AFTER PHASE 1 (Analysis): You have analyzed code and identified suspicious patterns. This is NOT proof. You have ZERO evidence yet. You are ~20% done.",
- "AFTER PHASE 2 (Hypotheses): You have theories about the bug. This is NOT proof. You still have ZERO evidence. You are ~40% done.",
- "EVIDENCE COMES FROM PHASES 3-5: Instrumentation, evidence collection, and validation. Only THEN do you have proof.",
- "STOP = WRONG: Stopping after analysis (even with \"100% confidence\") means you have ZERO PROOF and are providing GUESSES, not diagnosis.",
- "**\ud83c\udfaf WHY THIS STRUCTURE EXISTS (Evidence-Based):**",
- "Professional research spanning 20+ years shows agents who skip systematic investigation steps are wrong ~90% of the time, even with 9-10/10 self-reported confidence.",
- "Quick conclusions miss: edge cases, alternative explanations, environment factors, interaction effects, and data corruption paths.",
- "This workflow FORCES thoroughness through: code analysis, hypothesis formation, instrumentation, evidence gathering, adversarial review, and comprehensive documentation.",
- "**CRITICAL WORKFLOW DISCIPLINE:**",
- "HIGH CONFIDENCE \u2260 INVESTIGATION COMPLETE: Achieving 8-10/10 confidence in a hypothesis is excellent progress but does NOT mean the workflow is done.",
- "COMPLETE ALL PHASES: You MUST complete ALL phases (0 through 6) regardless of confidence level. Each phase builds critical evidence and documentation.",
- "WORKFLOW COMPLETION FLAG: Only set isWorkflowComplete=true when you complete Phase 6 (Comprehensive Diagnostic Writeup) AND produce the full deliverable.",
- "DO NOT SKIP PHASES: Even with high confidence, you must complete hypothesis generation (Phase 2), instrumentation (Phase 3), evidence collection (Phase 4), analysis (Phase 5), and writeup (Phase 6).",
- "PHASE PROGRESSION: An investigation that stops at triage (Phase 0) or hypothesis formation (Phase 2) or evidence collection (Phase 4) is INCOMPLETE - the diagnostic writeup is the required deliverable.",
- "**HIGH AUTO MODE DISCIPLINE:**",
- "In HIGH automation mode, agents must execute phases WITHOUT asking permission between phases. This means: proceed automatically from Phase 1\u21922\u21923\u21924\u21925\u21926.",
- "HIGH AUTO \u2260 PERMISSION TO SKIP PHASES. HIGH AUTO = NO INTERRUPTIONS, NOT NO PHASES.",
- "**CRITICAL: HIGH AUTOMATION \u2260 AUTONOMY TO SKIP:**",
- "USER SAYS: 'High automation mode' means you DON'T ASK PERMISSION. It does NOT mean you have autonomy to decide which phases to skip.",
- "High auto = Faster execution of ALL phases. NOT = Smarter agent gets to skip phases it thinks are unnecessary.",
- "are: (1) Phase 0e early termination decision, (2) Phase 4a controlled experiments. All other phases execute automatically based on the systematic workflow structure.",
- "**FUNCTION DEFINITIONS:**",
- "fun instrumentCode(location, hypothesis) = 'Add debug logs at {location} for {hypothesis}. Format: ClassName.method [{hypothesis}]: message. Include timestamp, thread ID if concurrent.'",
- "fun collectEvidence(hypothesis) = 'Run instrumented code, collect logs, analyze results. Score evidence quality 1-10. Document in Evidence/{hypothesis}.md.'",
- "fun updateHypothesisLog(id, status, evidence) = 'Update INVESTIGATION_CONTEXT.md section {id} with {status} and {evidence}. Include confidence score.'",
- "fun analyzeTests(component) = 'Find all tests for {component} using grep_search. Check coverage, recent changes, what they validate vs miss. Run with --debug flag.'",
- "fun recursiveAnalysis(component, depth=3) = 'Analyze {component} to {depth} levels. L1: implementation, L2: direct deps, L3: transitive deps. Document each level.'",
- "fun controlledModification(type, location) = 'Make {type} change at {location}. Types: guard (add logging), assert (add assertion), fix (minimal fix), break (controlled failure). Commit: DEBUG: {type} at {location}'",
- "fun checkHypothesisInTests(hypothesis) = 'Search existing tests for evidence. Direct: tests of suspected components. Indirect: tests that would fail if true. Document in TestEvidence/{hypothesis}.md'",
- "fun aggregateDebugLogs(pattern, timeWindow=100) = 'Deduplicate logs matching {pattern}. Output: {pattern} x{count} in {timeWindow}ms, variations: {unique_values}'",
- "fun createInvestigationBranch() = 'git checkout -b investigate/{bug-id}-{timestamp}. If git unavailable, create Investigation/{timestamp}/ directory for artifacts.'",
- "fun trackInvestigation(phase, status) = 'Update INVESTIGATION_CONTEXT.md progress: \u2705 {completed}, \ud83d\udd04 {phase}, \u23f3 Remaining: {list}, \ud83d\udcca Confidence: {score}/10'",
- "fun updateInvestigationContext(section, content) = 'Update INVESTIGATION_CONTEXT.md {section} with {content}. Include timestamp. If section doesn\\'t exist, create it. Preserve all other sections.'",
- "fun findSimilarBugs() = 'Search for: 1) Similar error patterns in codebase, 2) Previous fixes in git history, 3) Related test cases. Document in SimilarPatterns.md'",
- "fun visualProgress() = 'Show: \u2705 Phase 0 | \u2705 Phase 1 | \ud83d\udd04 Phase 2 | \u23f3 Phase 3-5 | \u23f3 Phase 6 | \ud83d\udcca 35% Complete. Include time spent per phase.'",
- "fun applyDebugPreferences() = 'Apply user debugging preferences from userDebugPreferences context variable. Adapt logging verbosity, tool selection, output format.'",
- "fun addResumptionJson(phase) = 'Update INVESTIGATION_CONTEXT.md resumption section with: workflowId, completedSteps up to {phase}, all context variables. Include workflow_get and workflow_next instructions.'",
- "**USAGE:** When you see function calls like instrumentCode() or analyzeTests(), execute the full instructions defined above.",
- "INVESTIGATION DISCIPLINE: Never propose fixes or solutions until Phase 6 (Comprehensive Diagnostic Writeup). Focus entirely on systematic evidence gathering and analysis.",
- "HYPOTHESIS RIGOR: All hypotheses must be based on concrete evidence from code analysis with quantified scoring (1-10 scales). Maximum 5 hypotheses per investigation.",
- "DEBUGGING INSTRUMENTATION: Always implement debugging mechanisms before running tests - logs, print statements, or test modifications that will provide evidence.",
- "EVIDENCE THRESHOLD: Require minimum 3 independent sources of evidence before confirming any hypothesis. Use objective verification criteria.",
- "SYSTEMATIC PROGRESSION: Complete each investigation phase fully before proceeding. Each phase builds critical context for the next with structured documentation.",
- "CONFIDENCE CALIBRATION: Use mathematical confidence framework with 9.0/10 minimum threshold. Actively challenge conclusions with adversarial analysis.",
- "UNCERTAINTY ACKNOWLEDGMENT: Explicitly document all remaining unknowns and their potential impact. No subjective confidence assessments.",
- "THOROUGHNESS: For complex bugs, recursively analyze dependencies and internals of identified components to ensure full picture.",
- "TEST INTEGRATION: Leverage existing tests to validate hypotheses where possible.",
- "**LOGGING STANDARDS:**",
- "LOG FORMAT: Always use 'ClassName.methodName [hypothesisId] {timestamp}: message'. For concurrent code, add thread/worker ID.",
- "LOG DEDUPLICATION: Implement in debug code: if (lastMsg === currentMsg) { count++; if (count % 10 === 0) log(`${msg} x${count}`); } else { if (count > 1) log(`Previous: x${count}`); log(currentMsg); count = 1; }",
- "LOG AGGREGATION: For high-frequency events, create summaries: 'Event X occurred 847 times between 10:23:45-10:23:47, unique values: [val1: 623, val2: 224]'",
- "LOG WINDOWS: Group related logs within 50-100ms. Mark groups with '=== Operation: XYZ Start ===' and '=== Operation: XYZ End (duration: 73ms) ==='",
- "LOG CONTEXT: Include hypothesis ID in all debug logs. Use prefixes like 'H1_DEBUG:', 'H2_TRACE:', 'H3_ERROR:'",
- "LOG ANALYSIS OFFLOADING: For voluminous logs (>500 lines), offload analysis to sub-chats with structured prompts. See Phase 4 for detailed sub-analysis implementation.",
- "RECURSION DEPTH: Limit recursive analysis to 3 levels deep to prevent analysis paralysis while ensuring thoroughness.",
- "INVESTIGATION BOUNDS: If investigation exceeds 20 steps or 4 hours without root cause, pause and reassess approach with user.",
- "AUTOMATION LEVELS: High=execute phases automatically WITHOUT asking permission between phases (but MUST complete ALL phases), Medium=standard confirmations, Low=extra confirmations for safety.",
- "CONTEXT DOCUMENTATION: Maintain INVESTIGATION_CONTEXT.md throughout. Update after major milestones, failures, or user interventions to enable seamless resumption and handoffs. Include explicit resumption instructions using workflow_get and workflow_next.",
- "GIT FALLBACK STRATEGY: If git unavailable, gracefully skip commits/branches, log changes manually in CONTEXT.md with timestamps, warn user, document modifications for manual control.",
- "GIT ERROR HANDLING: Use run_terminal_cmd for git operations; if fails, output exact command for user manual execution. Never halt investigation due to git unavailability.",
- "TOOL AVAILABILITY AWARENESS: Check debugging tool availability before investigation design. Have fallbacks for when primary tools unavailable (grep\u2192file_search, etc).",
- "SECURITY PROTOCOLS: Sanitize sensitive data in logs/reproduction steps. Be mindful of exposing credentials, PII, or system internals during evidence collection phases.",
- "DYNAMIC RE-TRIAGE: Allow complexity upgrades during investigation if evidence reveals deeper issues. Safe downgrades only with explicit user confirmation after evidence review.",
- "DEVIL'S ADVOCATE REVIEW: Actively challenge primary hypothesis with available evidence. Seek alternative explanations and rate alternative likelihood before final confidence assessment.",
- "COLLABORATIVE HANDOFFS: Structure documentation for peer review and team coordination. Include methodology, reasoning, and complete evidence chain for knowledge transfer.",
- "FAILURE BOUNDS: Track investigation progress. If >20 steps or >4 hours without breakthrough, pause for user guidance. Document dead ends to prevent redundant work if investigation is resumed.",
- "COGNITIVE BREAKS: After 10 investigation steps, pause and summarize progress to reset perspective.",
- "RUBBER DUCK: Verbalize hypotheses in sub-prompts to externalize reasoning and catch logical gaps.",
- "COLLABORATION READY: Document clearly for handoffs when stuck beyond iteration limits."
- ],
- "steps": [
- {
- "id": "phase-0-complete-setup",
- "title": "Phase 0: Complete Investigation Setup",
- "prompt": "**SYSTEMATIC INVESTIGATION SETUP** - Complete all mechanical setup before analysis begins.\n\n**This phase is PURELY MECHANICAL - no code analysis or hypothesis formation yet.**\n\n---\n\n**PART 1: Bug Report Triage**\n\nPlease provide complete bug context:\n- **Bug Description**: Observed vs expected behavior?\n- **Error Messages/Stack Traces**: Complete error output\n- **Reproduction Steps**: Consistent reproduction method?\n- **Environment Details**: OS, language/framework versions\n- **Recent Changes**: Commits, deployments, config changes?\n\n**Classify Project Type:**\n- Languages/Frameworks (primary tech stack)\n- Build System (Maven, Gradle, npm, etc.)\n- Testing Framework (JUnit, Jest, pytest, etc.)\n- Logging System (available mechanisms)\n- Architecture (monolithic, microservices, distributed, serverless)\n\n**Assess Bug Complexity:**\n- Simple: Single function, clear error path, minimal dependencies\n- Standard: Multiple components, moderate investigation required\n- Complex: Cross-system, race conditions, complex state management\n\n**Determine Automation Level:**\nAsk user: \"What automation level for this investigation?\"\n- High: Auto-approve decisions >8.0 confidence, minimal confirmations\n- Medium: Standard confirmations for key decisions\n- Low: Extra confirmations, manual approval for all changes\n\n---\n\n**PART 2: User Debugging Preferences**\n\n**Check for preferences in:**\n- User settings/memory\n- Project documentation (team standards)\n- Previous instructions/guidance\n\n**Categorize preferences:**\n- Debugging Tools: debugger vs logs vs traces\n- Log Verbosity: detailed vs concise\n- Output Format: structured vs human-readable\n- Testing Approach: unit vs integration test focus\n- Commit Style: conventional vs descriptive\n- Documentation: inline comments vs separate docs\n- Error Handling: fail fast vs defensive\n\n**If no explicit preferences, ask user:**\n- \"Verbose logging or concise summaries?\"\n- \"Interactive 
debuggers or log analysis?\"\n- \"Any specific tools or approaches your team prefers?\"\n\n---\n\n**PART 3: Tool Availability Check**\n\n**Verify core tools:**\n\n1. **Analysis Tools**: Test availability of grep_search, read_file, codebase_search\n2. **Git Operations**: Check `git --version`, set gitAvailable flag\n3. **Build/Test Tools** (based on projectType): npm/yarn, Maven/Gradle, pytest, etc.\n4. **Debugging Tools**: Language-specific debuggers, profilers, log aggregation\n\n**Fallback strategies if tools unavailable:**\n- grep_search fails \u2192 use file_search\n- codebase_search fails \u2192 use grep_search with context\n- Git unavailable \u2192 track changes in INVESTIGATION_CONTEXT.md\n- Build tools missing \u2192 focus on static analysis\n\n---\n\n**PART 4: Initialize Investigation Context Document**\n\nUse createInvestigationBranch() for version control, then create INVESTIGATION_CONTEXT.md with bug summary, progress tracking, environment setup, and resumption instructions.\n\n**REQUIRED OUTPUTS:**\n\nSet ALL context variables:\n- `projectType`, `bugComplexity`, `debuggingMechanism`, `isDistributed`\n- `automationLevel` (High/Medium/Low)\n- `userDebugPreferences` (categorized preferences object)\n- `availableTools` (array of available tool names)\n- `gitAvailable` (boolean)\n- `toolLimitations` (string describing any restrictions)\n- `contextInitialized` = true\n\nCreate comprehensive INVESTIGATION_CONTEXT.md with all function definitions from metaGuidance.",
- "agentRole": "You are a senior investigation setup specialist with expertise in triage, environment configuration, and systematic investigation preparation. You excel at gathering complete context and preparing comprehensive investigation infrastructure.",
- "guidance": [
- "This phase is MECHANICAL ONLY - no code analysis or hypothesis formation",
- "Complete all 4 parts thoroughly before proceeding",
- "If critical information is missing, explicitly request it",
- "Test tool availability - don't assume",
- "Create comprehensive INVESTIGATION_CONTEXT.md for handoffs",
- "All context variables must be set before next phase",
- "This setup enables the entire investigation workflow"
- ],
- "requireConfirmation": false
- },
- {
- "id": "phase-0a-workflow-commitment",
- "title": "Phase 0a: Workflow Execution Commitment & Early Termination Checkpoint",
- "prompt": "**\u26a0\ufe0f WORKFLOW EXECUTION COMMITMENT CHECKPOINT \u26a0\ufe0f**\n\n*(Note: This checkpoint only appears in Medium/Low automation modes. High automation mode proceeds automatically.)*\n\nYou have completed Phase 0 (Complete Setup). Before proceeding to the investigation phases, you MUST acknowledge your understanding of workflow execution requirements AND make a critical decision.\n\n**CRITICAL UNDERSTANDING:**\n\n1. **This is a 23-step structured workflow, not freestyle debugging**\n - You MUST call workflow_next repeatedly until isComplete=true\n - You CANNOT stop early, even if you think you know the bug\n - You CANNOT \"figure it out\" and skip steps\n\n2. **Professional research shows 90% error rate for premature conclusions**\n - Even with 9-10/10 confidence, skipping systematic steps leads to wrong conclusions\n - Edge cases, alternative explanations, and interaction effects are missed\n - The workflow FORCES thoroughness for a reason\n\n3. **Remaining phases you MUST complete (regardless of confidence):**\n - \u2705 Phase 0: Triage & Setup (COMPLETED)\n - \u23f3 Phase 1: Multi-Dimensional Codebase Analysis (5 sub-phases)\n - \u23f3 Phase 2: Hypothesis Formation & Validation (8 sub-phases)\n - \u23f3 Phase 3: Debugging Instrumentation\n - \u23f3 Phase 4: Evidence Collection & Experimentation\n - \u23f3 Phase 5: Final Analysis & Confidence Assessment\n - \u23f3 Phase 6: Comprehensive Diagnostic Writeup (REQUIRED DELIVERABLE)\n\n4. 
**What \"completion\" means:**\n - isComplete=true from workflow_next (this happens ONLY after Phase 6)\n - Comprehensive diagnostic writeup produced\n - All evidence documented\n - Investigation context finalized\n\n**DECISION POINT: Continue Full Investigation or Request Early Termination?**\n\n**OPTION 1: Complete Full Investigation (STRONGLY RECOMMENDED)**\n- Proceed through all 23 workflow steps\n- Professional standard for thorough bug investigations\n- 90% success rate for identifying true root cause\n- Produces comprehensive diagnostic writeup with actionable findings\n\n**OPTION 2: Request Early Termination (STRONGLY DISCOURAGED)**\n- Skip systematic investigation and attempt to diagnose now\n- \u26a0\ufe0f **EVIDENCE-BASED WARNING:**\n - **Professional research spanning 20+ years shows that agents who skip systematic investigation are wrong about the root cause ~90% of the time**\n - Even if you feel 9-10/10 confident right now, completing the full workflow dramatically increases accuracy\n - Quick conclusions miss: contracts & invariants, pattern deviations, HOT paths, edge cases, interaction effects, and alternative explanations\n - The full workflow takes ~30-60 minutes but increases fix success rate from ~10% to ~90%\n\n**REQUIRED AGENT ACTIONS:**\n\n1. **Acknowledge workflow commitment:**\n\"I acknowledge that I am executing a structured 23-step workflow and I MUST call workflow_next for every step until isComplete=true. I understand that stopping early leads to incorrect conclusions ~90% of the time. I will complete all phases (0-6) regardless of my confidence level.\"\n\n2. **State your recommendation:**\nYou MUST recommend OPTION 1 (complete full investigation) unless you have truly exceptional circumstances.\n\n3. **MANDATORY USER COMMUNICATION:**\nYou MUST explicitly tell the user (this is NOT optional):\n\"**I strongly recommend we complete the full systematic investigation (26 steps through Phase 6). 
Professional research shows this approach identifies the TRUE root cause ~90% of the time, compared to ~10% for quick conclusions. Even if I develop high confidence early, completing the full workflow\u2014including contracts analysis, pattern discovery, HOT path analysis, instrumentation, and evidence collection\u2014dramatically increases the likelihood of correctly identifying the root cause and preventing wasted time on wrong fixes.**\n\nDo you want to proceed with the full investigation (recommended), or would you prefer I attempt a quick diagnosis now (discouraged)?\"\n\n**USER CONFIRMATION REQUIRED:**\nThe user must explicitly choose to proceed with full investigation or request early termination.",
- "agentRole": "You are a workflow governance specialist ensuring agents understand they are bound to execute all workflow steps systematically, and that they MUST communicate the value of full workflow completion to users.",
- "guidance": [
- "This checkpoint prevents premature termination at the earliest possible point",
- "Agents must explicitly acknowledge workflow structure AND communicate value to user",
- "The MANDATORY USER COMMUNICATION is not optional - agents MUST say this exact message",
- "Agents must recommend Option 1 unless truly exceptional circumstances exist",
- "This is both a psychological commitment device and a user education moment",
- "Users must explicitly confirm proceeding with full investigation",
- "If user chooses early termination, agent must acknowledge 90% error rate and proceed with best-effort quick diagnosis"
- ],
- "requireConfirmation": true,
- "runCondition": {
- "var": "automationLevel",
- "not_equals": "High"
- }
- },
- {
- "id": "phase-1-iterative-analysis",
- "type": "loop",
- "title": "Phase 1: Multi-Dimensional Codebase Analysis",
- "loop": {
- "type": "for",
- "count": 5,
- "maxIterations": 5,
- "iterationVar": "analysisPhase"
- },
- "body": [
- {
- "id": "analysis-neighborhood-contracts",
- "title": "Analysis 1/5: Neighborhood, Call Graph & Contracts",
- "prompt": "**NEIGHBORHOOD & CONTRACTS DISCOVERY - Build Structural Foundation**\n\nGoal: Build lightweight understanding of code structure, relationships, and contracts BEFORE diving into details. This provides the scaffolding for all subsequent analysis.\n\n**STEP 1: Compute Module Root**\n- Find nearest common ancestor of error stack trace files\n- Clamp to package boundary or src/ directory\n- This defines your investigation scope\n- Set `moduleRoot` context variable\n\n**STEP 2: Neighborhood Map** (cap per file to prevent analysis paralysis)\n- For each file in error stack trace:\n - List immediate neighbors (same directory, max 8)\n - Find imports/exports directly used (max 10)\n - Locate co-located tests (same name pattern)\n - Identify closest entry points: routes, endpoints, CLI commands (max 5)\n- Produce table: File | Neighbors | Tests | Entry Points\n\n**STEP 3: Bounded Call Graph** (Small Multiples with HOT Path Ranking)\n- For each failing function/class in stack trace:\n - Build call graph \u22642 hops deep (inbound and outbound)\n - Cap total nodes at \u226415 per failing symbol\n - Score edges for HOT path ranking:\n * Error location in path: +3\n * Entry point to path: +2 \n * Test coverage exists: +1\n * Mentioned in ticket/error message: +1\n - Tag paths as HOT if score \u22653\n - Use Small Multiples ASCII visualization:\n * Width \u2264100 chars per path\n * Format: `EntryPoint -> Caller -> [*FailingSymbol*] -> Callee`\n * Mark changed/failing code as `[*name*]`\n * Add HOT tag for high-impact paths\n * \u22648 total paths, prioritize HOT paths first\n - If graph exceeds caps, use Adjacency Summary instead:\n * Table: Node | Inbound | Outbound | Notes\n * Top-K by degree/frequency\n- Create Alias Legend for repeated subpaths:\n * A1 = common.validation.validateInput\n * A2 = database.connection.getPool\n * Reuse aliases across all paths\n\n**STEP 4: Flow Anchors** (Entry Points to Bug)\n- Map how users/systems trigger the bug:\n - HTTP routes 
\u2192 handlers \u2192 failing code\n - CLI commands \u2192 execution \u2192 failing code \n - Scheduled jobs \u2192 workers \u2192 failing code\n - Event handlers \u2192 callbacks \u2192 failing code\n- Produce table: Anchor Type | Entry Point | Target Symbol | User Action\n- Cap at \u22645 most relevant anchors\n- Note: This tells us HOW the bug is reached\n\n**STEP 5: Contracts & Invariants**\n- Within `moduleRoot` and immediate neighbors:\n - List public API symbols (exported functions/classes)\n - Document API endpoints (REST/GraphQL/RPC)\n - Identify database tables/collections touched\n - Note message queue topics/events\n - Extract stated invariants from:\n * JSDoc/docstrings with @invariant\n * Assertions in code\n * Validation logic patterns\n * Comments describing guarantees\n- Produce table: Symbol/API | Contract | Invariant | Location\n- Focus on contracts related to failing code\n\n**STEP 6: Assumption Verification** (NOW that you've seen the code)\nNow that you understand the code structure, verify assumptions from the bug report:\n\n1. **Bug Report Assumptions**:\n - Is the described behavior actually a bug, or might it be expected based on what you've seen?\n - Are the reproduction steps accurate given the code paths you've mapped?\n - Is the error message consistent with the actual code flow?\n - Are there missing steps or context in the bug report?\n\n2. **API/Library Assumptions**:\n - Check documentation for any APIs/libraries mentioned in stack trace\n - Verify actual behavior vs assumed behavior\n - Note any version-specific behavior that might matter\n\n3. **Environment Assumptions**:\n - Based on code, could this be environment-specific?\n - Are there configuration dependencies visible in the code?\n - Could timing/concurrency be a factor (based on code structure)?\n\n4. 
**Recent Changes Impact**:\n - Review last 5 commits affecting the failing code\n - Do they relate to the bug or point to alternative causes?\n\n**Document**: Create AssumptionVerification.md with verified/challenged assumptions.\n\n---\n\n**OUTPUT: Create StructuralAnalysis.md with:**\n- Module Root declaration\n- Neighborhood Map table\n- Bounded Call Graph (Small Multiples ASCII or Adjacency Summary)\n- Alias Legend (for call graph subpaths)\n- Flow Anchors table\n- Contracts & Invariants table\n- Self-Critique: 1-2 areas of uncertainty\n\n**CAPS (strictly enforce to prevent analysis paralysis):**\n- \u22648 neighbors per file\n- \u226410 imports per file\n- \u22645 entry points total\n- \u226415 call graph nodes per failing symbol\n- \u22648 total call graph paths\n- \u22645 flow anchors\n- \u2264100 chars width for ASCII paths",
166
- "agentRole": "You are a codebase navigator building structural understanding. Your focus is mapping relationships, entry points, and contracts WITHOUT diving into implementation details yet.",
167
- "guidance": [
168
- "\ud83d\udea8 USER SAYS: This loop MUST complete ALL 5 iterations. Do NOT exit early even if you think you found the bug.",
169
- "DO NOT rationalize: 'I have high confidence so I can do a targeted Phase 2.' NO. Complete all 5 iterations FIRST.",
170
- "Agents who skip analysis iterations are wrong ~95% of the time. The later iterations catch edge cases and alternative explanations.",
171
- "Iteration 2/5 is NOT enough. Iteration 3/5 is NOT enough. Complete 5/5.",
172
- "This is analysis phase 1 of 5 total phases",
173
- "Phase 1a = Structure + Assumption Verification - Build the map, THEN question the bug report",
174
- "Initialize majorIssuesFound = false",
175
- "STEPS 1-5: Build structural understanding FIRST",
176
- "STEP 6: NOW verify assumptions - you have context to challenge the bug report",
177
- "CRITICAL: You can't meaningfully question assumptions before seeing code",
178
- "STRICTLY ENFORCE CAPS - this prevents 2-hour rabbit holes",
179
- "Small Multiples: Render mini ASCII path diagrams (\u22646 nodes per path)",
180
- "HOT Path Ranking: Score and prioritize high-impact paths",
181
- "Alias Legend: Collapse repeated subpaths with deterministic aliases (A1, A2...)",
182
- "Adjacency Summary: If caps exceeded, use tabular summary instead of full graph",
183
- "Contracts are CRITICAL: They tell us what guarantees the code must maintain",
184
- "Flow Anchors show HOW users trigger the bug - essential for reproduction",
185
- "Create StructuralAnalysis.md AND AssumptionVerification.md",
186
- "Update INVESTIGATION_CONTEXT.md with module root and structural summary",
187
- "This phase provides the scaffolding for all subsequent analysis"
188
- ],
189
- "runCondition": {
190
- "var": "analysisPhase",
191
- "equals": 1
192
- },
193
- "requireConfirmation": false
194
- },
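Illustrative note on `moduleRoot`: the breadth-scan step of this workflow describes it as the nearest common ancestor of the stack-trace files, clamped to a package or `src` directory. A minimal sketch of that computation follows. The function name `compute_module_root`, the `clamp_to` parameter, and the example paths are assumptions made for illustration and are not part of the workflow file.

```python
import posixpath


def compute_module_root(stack_files, clamp_to):
    """Return the nearest common ancestor directory of stack_files.

    Hypothetical helper: if the common ancestor lies outside (above)
    clamp_to, clamp_to itself is returned instead.
    """
    common = posixpath.commonpath(stack_files)
    # A single-file trace yields the file itself; step up to its directory.
    if common in stack_files:
        common = posixpath.dirname(common)
    # The ancestor is acceptable only if clamp_to is a prefix of it.
    inside = posixpath.commonpath([common, clamp_to]) == clamp_to
    return common if inside else clamp_to


files = ["src/app/services/a.ts", "src/app/handlers/b.ts"]
print(compute_module_root(files, "src"))
```

If the trace spans directories with no common ancestor under `clamp_to`, this sketch falls back to `clamp_to`. That is one reading of "clamped" and may not match the author's intent.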
195
- {
196
- "id": "analysis-breadth-scan",
197
- "title": "Analysis 2/5: Breadth Scan & Pattern Discovery",
198
- "prompt": "**BREADTH SCAN - Cast Wide Net + Learn Expected Behavior**\n\nGoal: Understand full system impact, identify all potentially involved components, and discover existing code patterns to understand expected behavior.\n\n**PART A: Pattern Discovery (Learn How Code SHOULD Work)**\n1. **Compute Module Root**: Find nearest common ancestor of error stack trace files, clamped to package/src\n2. **Discover Patterns** (scan only moduleRoot, exclude failing files from pattern definition):\n - Naming conventions (classes, methods, variables)\n - Error handling patterns (try/catch, error propagation, logging)\n - Logging patterns (format, verbosity, error vs info vs debug)\n - Data validation patterns (where/how data is checked)\n - Test patterns (structure, naming, assertion style)\n - Require \u22652 occurrences across distinct files to qualify as pattern\n3. **Capture Pattern Catalog**: Document validated patterns with 1-3 exemplar locations (file:line)\n4. **Identify Pattern Deviations in Failing Code**: Compare failing code against pattern catalog\n\n**PART B: Error Propagation & Component Discovery**\n1. **ERROR PROPAGATION MAPPING**: Use grep_search for all error occurrences, trace error messages across log files, map stack traces to identify call chains, document every point where error appears/handled\n2. **COMPONENT DISCOVERY**: Find components interacting with failing area, use codebase_search \"How is [component] used?\", identify callers/callees, cap to top 10 most suspicious, rank by likelihood (1-10)\n3. **BOUNDED CALL GRAPH**: For failing function, build call graph \u22642 hops deep, cap at \u226415 total nodes, identify HOT paths (paths through error location), prioritize HOT paths in analysis\n4. **FLOW ANCHORS**: Map entry points (routes/endpoints/CLI commands) to failing code, cap at \u22645 anchors, note which user actions trigger the bug\n\n**PART C: Data Flow & Changes**\n1. 
**DATA FLOW MAPPING**: Trace data through bug area, identify transformations, persistence points, corruption opportunities - but CAP scope to moduleRoot and 2-hop neighborhood\n2. **RECENT CHANGES ANALYSIS**: Git history for identified components (last 10 commits), identify when bug appeared, related PRs/issues, config/dependency changes\n3. **HISTORICAL PATTERN SEARCH**: Use findSimilarBugs() for similar error patterns, previous fixes, related test failures\n\n**Output**: Create BreadthAnalysis.md with:\n- Pattern Catalog (validated patterns + exemplars)\n- Pattern Deviations (how failing code differs from expected patterns)\n- Bounded Call Graph (\u226415 nodes, HOT paths highlighted)\n- Flow Anchors Table (entry point \u2192 failing symbol)\n- Suspicious Components (top 10, ranked 1-10)\n- Data Flow Map (scoped to moduleRoot + 2 hops)\n- Recent Changes Timeline\n- Historical Similar Bugs\n\n**Self-Critique**: List 1-2 areas where you have low confidence or missing information.",
199
- "agentRole": "You are performing systematic analysis phase 2 of 5. Your focus is understanding both what IS happening (error propagation) and what SHOULD happen (pattern discovery) to identify deviations.",
200
- "guidance": [
201
- "This is analysis phase 2 of 5 total phases",
202
- "Phase 1b = Breadth + Patterns - Learn expected behavior AND map error propagation",
203
- "Create BreadthAnalysis.md with structured findings",
204
- "CRITICAL: Discover patterns FIRST from working code, THEN compare failing code to patterns",
205
- "Pattern deviations often reveal the bug (e.g., missing validation, different error handling)",
206
- "Apply CAPS to prevent analysis paralysis: \u226410 components, \u226415 call graph nodes, \u22645 flow anchors, \u22642 hops",
207
- "HOT PATH RANKING: Score paths by (error in path=3, entry point=2, test coverage=1); tag HOT if score\u22653",
208
- "BOUNDED CALL GRAPH: Use codebase_search to find callers/callees, stop at 2 hops, cap nodes, dedupe",
209
- "PATTERN DISCOVERY: Require \u22652 occurrences to qualify as pattern; singletons are 'candidate conventions' only",
210
- "SELF-CRITIQUE: Explicitly note 1-2 areas of uncertainty or missing information",
211
- "Update INVESTIGATION_CONTEXT.md after completion",
212
- "Use the function definitions for standardized operations"
213
- ],
214
- "runCondition": {
215
- "var": "analysisPhase",
216
- "equals": 2
217
- },
218
- "requireConfirmation": false
219
- },
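Illustrative note on HOT path ranking: the guidance above gives weights (error in path = 3, entry point = 2, test coverage = 1) and a HOT threshold of 3, but does not say how the weights combine. The sketch below assumes they are summed. The `CallPath` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CallPath:
    name: str
    contains_error: bool         # path passes through the error location
    starts_at_entry_point: bool  # path begins at a route/CLI/job entry point
    has_test_coverage: bool      # some test exercises the path


def hot_score(path):
    """Additive score using the weights quoted in the guidance (assumed additive)."""
    return (3 * path.contains_error
            + 2 * path.starts_at_entry_point
            + 1 * path.has_test_coverage)


def rank_paths(paths, threshold=3):
    """Return (name, score, is_hot) tuples, highest score first."""
    scored = [(p.name, hot_score(p), hot_score(p) >= threshold) for p in paths]
    return sorted(scored, key=lambda item: item[1], reverse=True)


paths = [
    CallPath("handler->save", True, True, False),
    CallPath("cron->cleanup", False, True, True),
    CallPath("util->format", False, False, True),
]
for row in rank_paths(paths):
    print(row)
```

Under the additive reading, a path that starts at an entry point and has test coverage reaches the threshold without touching the error location. If that is not the intent, `contains_error` could be made a precondition for the HOT tag.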
220
- {
221
- "id": "analysis-deep-dive",
222
- "title": "Analysis 3/5: Component Deep Dive with Hot-Path Focus",
223
- "prompt": "**COMPONENT DEEP DIVE - Prioritized Investigation**\n\nGoal: Deep understanding of top 5 suspicious components from breadth scan, prioritizing HOT paths and pattern deviations.\n\n**PRIORITIZATION (from Phase 1):**\n1. Focus on components on HOT paths (score \u22653)\n2. Prioritize components with pattern deviations\n3. Rank by likelihood score from Phase 1\n4. Cap analysis to top 5 components\n\n**FOR EACH COMPONENT (recursive 3-level analysis):**\n\n**LEVEL 1 - DIRECT IMPLEMENTATION** (prioritize HOT paths and deviation areas):\n- Read complete file (or HOT path sections if file >500 lines)\n- Compare error handling against pattern catalog from Phase 1\n- Identify pattern deviations with file:line locations\n- Check state management, initialization, cleanup\n- Document invariants and assumptions\n- Note TODO/FIXME/HACK/BUG comments\n- Red flags: complex logic, missing validation, race conditions\n\n**LEVEL 2 - DIRECT DEPENDENCIES** (cap at \u226410 deps per component):\n- Follow imports on HOT paths first\n- Check dependency contracts and interfaces\n- Analyze coupling and data exchange\n- Look for shared mutable state\n- Identify circular dependencies\n- Document failure propagation paths\n\n**LEVEL 3 - INTEGRATION POINTS** (cap at \u22648 integration points):\n- External calls (DB, API, file system) - cap at \u22645\n- Concurrency/threading concerns\n- Resource management issues\n- Caching and state sync\n- Event handling and callbacks\n- Configuration dependencies\n\n**FOR EACH COMPONENT, PRODUCE:**\n- **Likelihood Score** (1-10): Weight HOT paths +3, pattern deviations +2, recent changes +1\n- **Suspicious Sections**: Specific file:line with rationale (\u22645 per component)\n- **Failure Modes**: How this component could cause the observed bug (\u22643 scenarios)\n- **Pattern Violations**: How it deviates from expected patterns (from Phase 1)\n- **Critical Dependencies**: Top 3 dependencies that could be sources\n\n**Output**: Create 
ComponentAnalysis.md with:\n- Component Rankings (1-5, sorted by likelihood score)\n- Per-Component Analysis (following structure above)\n- Pattern Violation Summary\n- Critical Path Map (which components are on HOT paths)\n- **Self-Critique**: 1-2 components you're uncertain about and why\n\n**CAPS TO PREVENT ANALYSIS PARALYSIS:**\n- Top 5 components only\n- \u226410 dependencies per component\n- \u22648 integration points per component\n- \u22645 suspicious sections per component\n- \u22643 failure modes per component",
224
- "agentRole": "You are performing systematic analysis phase 3 of 5. Your focus is deep-diving into the most suspicious components, prioritizing HOT paths and pattern deviations.",
225
- "guidance": [
226
- "This is analysis phase 3 of 5 total phases",
227
- "Phase 1c = Deep Dive - Focus on HOT paths and pattern violations",
228
- "Build on findings from Phase 1b (patterns, HOT paths, flow anchors)",
229
- "Create ComponentAnalysis.md with structured findings",
230
- "Use recursiveAnalysis() for systematic exploration",
231
- "PRIORITIZE HOT PATHS: Analyze code on HOT paths before other code",
232
- "PATTERN-DRIVEN: Compare actual code against pattern catalog from Phase 1b",
233
- "APPLY CAPS STRICTLY: Prevents spending 2 hours reading every file",
234
- "SELF-CRITIQUE: Note where you're uncertain or making assumptions",
235
- "Update INVESTIGATION_CONTEXT.md after completion"
236
- ],
237
- "runCondition": {
238
- "var": "analysisPhase",
239
- "equals": 3
240
- },
241
- "requireConfirmation": false
242
- },
243
- {
244
- "id": "analysis-dependencies",
245
- "title": "Analysis 4/5: Dependencies & Flow",
246
- "prompt": "**DEPENDENCY & FLOW ANALYSIS - Trace Connections**\n\nGoal: Understand how components interact and data flows between them.\n\nPerform: Static dependency graph analysis, Runtime flow analysis, Data transformation pipeline tracing, and Integration analysis.\n\n**Output**: FlowAnalysis.md with sequence diagrams showing execution flow, data flow maps with transformation points, complete dependency graph, list of all integration points and failure modes, and timeline showing order of operations.",
247
- "agentRole": "You are performing systematic analysis phase 4 of 5. Your focus is tracing how components connect and data flows between them.",
248
- "guidance": [
249
- "This is analysis phase 4 of 5 total phases",
250
- "Phase 1d = Dependencies - Trace connections and data flows",
251
- "Build on component understanding from Phase 1c",
252
- "Create FlowAnalysis.md with diagrams and flow charts",
253
- "STATIC DEPENDENCY GRAPH: Build complete import/dependency tree, identify circular dependencies, find hidden dependencies (reflection, dynamic loading, DI), map version constraints and compatibility, document shared libraries and utilities, note tight coupling or fragile dependencies",
254
- "RUNTIME FLOW ANALYSIS: Trace execution paths to bug, identify async/concurrent flows and coordination, map state changes through execution, document control flow (conditionals, loops, exceptions), track callback chains and event handlers, identify divergence points, note timing dependencies and race conditions",
255
- "DATA TRANSFORMATION PIPELINE: Track data from input to error point, document each transformation with input/output types, identify validation points and what they check, find where data could be corrupted/lost, note serialization/deserialization boundaries, track data format conversions, document enrichment/filtering steps",
256
- "INTEGRATION ANALYSIS: External service calls and failure modes, database interactions (reads/writes/transactions), message queue operations and formats, file system operations and error handling, network calls and timeout handling, cache usage and invalidation, third-party library calls",
257
- "Focus on runtime behavior and integration points",
258
- "Update INVESTIGATION_CONTEXT.md after completion",
259
- "Pay special attention to async boundaries and error propagation",
260
- "Look for implicit dependencies that aren't obvious from imports"
261
- ],
262
- "runCondition": {
263
- "var": "analysisPhase",
264
- "equals": 4
265
- },
266
- "requireConfirmation": false
267
- },
268
- {
269
- "id": "analysis-test-coverage",
270
- "title": "Analysis 5/5: Test Coverage",
271
- "prompt": "**TEST COVERAGE ANALYSIS - Leverage Existing Knowledge**\n\nGoal: Use existing tests as source of truth about system behavior.\n\nFor each suspicious component, use analyzeTests(component) to perform: Direct test coverage analysis, Integration test analysis, Test history investigation, Test execution with debugging, and Coverage gap analysis.\n\n**Output**: TestAnalysis.md with coverage gaps matrix, suspicious test patterns, test evidence for hypotheses, recommendations for tests to add, and complete test inventory for affected components.",
272
- "agentRole": "You are performing systematic analysis phase 5 of 5. Your focus is leveraging existing tests to understand expected behavior and find coverage gaps.",
273
- "guidance": [
274
- "This is analysis phase 5 of 5 total phases",
275
- "Phase 1e = Tests - Analyze test coverage and quality",
276
- "Build on all previous analysis phases",
277
- "Create TestAnalysis.md with coverage gap matrix",
278
- "DIRECT TEST COVERAGE: Find all tests using grep/test discovery, analyze what's tested (happy/edge/error cases), identify what's NOT tested, check test quality and assertion strength, note mocking/stubbing that might hide issues, review test names and docs",
279
- "INTEGRATION TEST ANALYSIS: Find end-to-end tests for bug area, analyze assumptions/preconditions, check for flaky tests, review disabled/skipped tests and why, look for TODO/incomplete tests, identify multi-component tests, verify if tests cover failing scenario",
280
- "TEST HISTORY: When were tests added/modified? Do test changes correlate with bug appearance? Were tests removed/disabled recently? Use git blame for authors and context, look for related PRs/issues, review test evolution",
281
- "TEST EXECUTION WITH DEBUGGING: Run tests with debug flags (--verbose, --debug), add instrumentation to tests themselves, compare expected vs actual in detail, run in isolation and in suite, try different orderings to check dependencies, monitor resource usage",
282
- "COVERAGE GAP ANALYSIS: Use coverage tools for untested code paths, map coverage to bug components, identify branches/conditions never exercised, note error handling without tests, document missing edge cases, recommend tests to add",
283
- "Run tests with debug flags for additional insights",
284
- "After completion, use trackInvestigation('Phase 1 Complete', 'Moving to Hypothesis Development')",
285
- "Tests often reveal the 'expected' behavior - compare with actual behavior",
286
- "Missing tests often indicate areas where bugs hide"
287
- ],
288
- "runCondition": {
289
- "var": "analysisPhase",
290
- "equals": 5
291
- },
292
- "requireConfirmation": false
293
- }
294
- ],
295
- "requireConfirmation": false
296
- },
297
- {
298
- "id": "phase-1a-binary-search",
299
- "title": "Phase 1a: Binary Search Isolation",
300
- "runCondition": {
301
- "or": [
302
- {
303
- "var": "bugType",
304
- "equals": "regression"
305
- },
306
- {
307
- "var": "searchSpace",
308
- "equals": "large"
309
- }
310
- ]
311
- },
312
- "prompt": "**BINARY SEARCH** - Apply divide-and-conquer:\n\n1. Identify GOOD state (working) and BAD state (broken)\n2. Find midpoint in history/code/data\n3. Test midpoint state\n4. Narrow to relevant half\n5. Document reduced search space\n\n**OUTPUT**: Narrowed location with evidence.",
313
- "agentRole": "You are a systematic investigator using algorithmic search to efficiently isolate issues.",
314
- "guidance": [
315
- "VERSION CONTROL: Use 'git bisect' or equivalent for commit history searches",
316
- "DATA PIPELINE: Test data at pipeline midpoints to isolate transformation issues",
317
- "TIME WINDOWS: For time-based issues, binary search through timestamps",
318
- "DOCUMENT BOUNDARIES: Clearly record each tested boundary and result",
319
- "EFFICIENCY: Each test should eliminate ~50% of remaining search space"
320
- ]
321
- },
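Illustrative note on binary search isolation: the five steps in the prompt above amount to a bisection over an ordered sequence of states (commits, pipeline stages, timestamps). The sketch below assumes the first state is known good, the last is known bad, and the failure is monotonic, meaning every state after the first bad one is also bad. The names `first_bad` and `is_bad` are hypothetical. For commit history, `git bisect` automates the same loop.

```python
def first_bad(states, is_bad):
    """Find the first bad state by bisection.

    Assumes states[0] is good, states[-1] is bad, and badness is monotonic.
    Returns the first bad state plus a log of every tested boundary.
    """
    lo, hi = 0, len(states) - 1  # lo: last known good, hi: first known bad
    tested = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        bad = is_bad(states[mid])
        tested.append((states[mid], bad))  # record each boundary and result
        if bad:
            hi = mid   # failure already present: search the earlier half
        else:
            lo = mid   # still working: search the later half
    return states[hi], tested


commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
culprit, log = first_bad(commits, lambda c: int(c[1:]) >= 5)
print(culprit, log)
```

Each probe halves the remaining range, so eight states need three tests here. The `tested` log corresponds to the "DOCUMENT BOUNDARIES" guidance.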
322
- {
323
- "id": "phase-1b-test-reduction",
324
- "title": "Phase 1b: Test Case Minimization",
325
- "runCondition": {
326
- "var": "bugSource",
327
- "equals": "failing_test"
328
- },
329
- "prompt": "**TEST REDUCTION** - Simplify failing test:\n\n1. Inline called methods into test\n2. Add earlier assertion to fail sooner\n3. Remove code after new failure point\n4. Repeat until minimal\n\n**OUTPUT**: Minimal failing test case.",
330
- "agentRole": "You are a surgical debugger who strips away layers to reveal core issues.",
331
- "guidance": [
332
- "PRESERVE FAILURE: Each reduction must maintain the original failure mode",
333
- "INLINE AGGRESSIVELY: Replace method calls with their actual implementation",
334
- "FAIL EARLY: Move assertions up to find earliest deviation from expected state",
335
- "REMOVE RUTHLESSLY: Delete all code that doesn't contribute to the failure",
336
- "CLARITY GOAL: Final test should make the bug obvious to any reader"
337
- ]
338
- },
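Illustrative note on `runCondition`: the steps above use two shapes, a `var`/`equals` pair and an `or` list of such pairs. The sketch below shows one way such conditions could be evaluated against a context dictionary. It is an assumption made for illustration, not the package's actual evaluator, and `evaluate_condition` is a hypothetical name.

```python
def evaluate_condition(condition, context):
    """Evaluate a runCondition-shaped dict against workflow context variables.

    Supports only the two shapes seen in the steps above:
    {"var": name, "equals": value} and {"or": [condition, ...]}.
    """
    if "or" in condition:
        return any(evaluate_condition(c, context) for c in condition["or"])
    if "var" in condition and "equals" in condition:
        # A variable missing from the context simply fails the comparison.
        return context.get(condition["var"]) == condition["equals"]
    raise ValueError(f"unsupported condition shape: {condition!r}")


binary_search_condition = {
    "or": [
        {"var": "bugType", "equals": "regression"},
        {"var": "searchSpace", "equals": "large"},
    ]
}
print(evaluate_condition(binary_search_condition, {"bugType": "regression"}))
```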
339
- {
340
- "id": "phase-1f-breadth-verification",
341
- "title": "Phase 1f: Final Breadth & Scope Verification",
342
- "prompt": "**FINAL BREADTH & SCOPE VERIFICATION - Catch Tunnel Vision NOW**\n\n\u26a0\ufe0f **CRITICAL CHECKPOINT BEFORE HYPOTHESES**: This step prevents the #1 cause of wrong conclusions: looking in the wrong place or missing the wider context.\n\n**Goal**: Verify you analyzed the RIGHT code with sufficient breadth AND depth before committing to hypotheses.\n\n\ud83d\udea8 **DO NOT STOP HERE - CRITICAL MISUNDERSTANDING:**\n\n**\"I FOUND THE BUG\" \u2260 DONE. \"I PROVED THE BUG\" = DONE.**\n\nEven if you think you found the bug during analysis, you have ZERO PROOF:\n- \"Finding\" the bug = You have a THEORY/GUESS based on code analysis\n- \"Proving\" the bug = You have EVIDENCE from instrumentation + logs + validation (Phases 3-5)\n\nAnalysis = educated guesses. Proof comes from Phases 3-5 (instrumentation + evidence). You are only ~25% done.\n\n**DO NOT create summary documents or \"comprehensive findings\" now. That is Phase 6, not Phase 1f.**\n\nMUST continue to Phase 2 (Hypothesis Formation), then Phases 3-5 (Evidence Collection).\n\n---\n\n**STEP 1: Scope Sanity Check**\n\nAsk yourself these questions:\n1. **Module Root Correctness**: Is the `moduleRoot` from Phase 1a actually correct?\n - Does it include ALL files in the error stack trace?\n - Did I clamp too narrowly to a subdirectory when the bug spans multiple modules?\n - Should I expand scope to parent directory or adjacent modules?\n\n2. **Missing Adjacent Systems**: Did I consider:\n - Adjacent microservices/modules that interact with this one?\n - Shared libraries or utilities used here?\n - Configuration systems (env vars, config files, feature flags)?\n - Caching layers or state management systems?\n - Database schema or data migration issues?\n\n3. 
**Entry Point Coverage**: From Phase 1a Flow Anchors, did I verify:\n - ALL entry points that could trigger this bug?\n - Less obvious entry points (background jobs, scheduled tasks, webhooks)?\n - Initialization code that runs before the failing code?\n\n---\n\n**STEP 2: Wide-Angle Review**\n\nReview your Phase 1 analysis outputs and answer:\n\n1. **Pattern Confidence** (from Phase 1, sub-phase 2):\n - Do I have a solid Pattern Catalog with \u22652 occurrences per pattern?\n - Did I identify clear pattern deviations in failing code?\n - Are there OTHER files that deviate from patterns I haven't looked at?\n\n2. **Call Graph Completeness** (from Phase 1, sub-phase 1 & 2):\n - Did my bounded call graph capture all HOT paths?\n - Are there callers OUTSIDE my 2-hop boundary I should check?\n - Did I trace BACKWARDS from the error far enough (to true entry points)?\n\n3. **Component Rankings** (from Phase 1, sub-phase 3):\n - Are my top 5 components actually the most suspicious?\n - Did I miss components because they're not in the stack trace?\n - Should I re-rank based on new understanding?\n\n4. **Data Flow Completeness** (from Phase 1, sub-phase 4):\n - Did I trace data flow from TRUE origin (user input, external system)?\n - Are there data transformations BEFORE my analyzed scope?\n - Did I check data validation at ALL boundaries?\n\n5. **Test Coverage Gaps** (from Phase 1, sub-phase 5):\n - Did I find tests that SHOULD exist but don't?\n - Are there missing test categories (integration, edge cases, error conditions)?\n - Do test gaps reveal I'm looking in wrong place?\n\n---\n\n\n\n**STEP 2.5: Assumption Verification**\n\n**NOW that you've completed 5 phases of code analysis, verify all assumptions:**\n\n1. 
**Bug Report Assumptions**:\n - Is the described behavior actually a bug based on what you now know about the code?\n - Are the reproduction steps accurate given the code paths you've mapped?\n - Is the error message consistent with the actual code flow you've traced?\n - Are there missing steps or context in the bug report that your analysis revealed?\n\n2. **API/Library Assumptions**:\n - Check documentation for any APIs/libraries mentioned in stack trace\n - Verify actual behavior vs assumed behavior based on your code analysis\n - Note any version-specific behavior that might matter\n - Did your call graph analysis reveal unexpected library usage patterns?\n\n3. **Environment Assumptions**:\n - Based on code analysis, is this environment-specific?\n - Are there configuration dependencies you discovered in the code?\n - Could timing/concurrency be a factor (based on code structure you analyzed)?\n - Did pattern analysis reveal environment-dependent code paths?\n\n4. **Recent Changes Impact**:\n - Review last 5-10 commits affecting the analyzed code\n - Do they relate to the bug or point to alternative causes?\n - Did your analysis reveal recent changes that break established patterns?\n\n**Document**: Create or update AssumptionVerification.md with verified/challenged assumptions.\n\n**Set**: `assumptionsVerified = true` in context\n\n---\n**STEP 3: Alternative Scope Analysis**\n\n**Generate 2-3 alternative investigation scopes and evaluate:**\n\nFor each alternative scope, assess:\n- **Scope Description**: What module/area would this focus on?\n- **Why It Might Be Better**: What evidence suggests this scope?\n- **Evidence For**: What supports investigating this area?\n- **Evidence Against**: Why might this be wrong direction?\n- **Confidence**: Rate 1-10 that this is the right scope\n\n**Example Alternative Scopes**:\n- Expand to parent module (if current feels too narrow)\n- Shift to adjacent service (if this might be symptom not cause)\n- Focus on 
infrastructure layer (if might be env/config issue)\n- Focus on data layer (if might be data corruption/migration issue)\n\n---\n\n**STEP 4: Breadth Decision**\n\nBased on Steps 1-3, make ONE of these decisions:\n\n**OPTION A: SCOPE IS CORRECT - Continue to Hypothesis Development**\n- Current module root and analyzed components are right\n- Breadth and depth are sufficient\n- Ready to form hypotheses with confidence\n- Set `scopeVerified = true` and proceed\n\n**OPTION B: EXPAND SCOPE - Additional Analysis Required**\n- Identified critical gaps in breadth or depth\n- Need to analyze additional modules/components\n- Set specific components/areas to add to analysis\n- Set `needsScopeExpansion = true`\n- Document what to add: `additionalAnalysisNeeded = [list]`\n\n**OPTION C: SHIFT SCOPE - Wrong Area**\n- Current focus is likely wrong place\n- Alternative scope has stronger evidence\n- Need to restart Phase 1 with new module root\n- Set `needsScopeShift = true`\n- Set `newModuleRoot = [path]`\n\n---\n\n**OUTPUT: Create ScopeVerification.md**\n\nMust include:\n1. **Scope Sanity Check Results** (answers to Step 1 questions)\n2. **Wide-Angle Review Findings** (answers to Step 2 questions)\n3. **Alternative Scopes Evaluated** (2-3 alternatives with scores)\n4. **Breadth Decision** (A, B, or C with justification)\n5. **Confidence in Current Scope** (1-10)\n6. **Action Items** (if Option B or C selected)\n\n**Context Variables to Set**:\n- `scopeVerified` (true/false)\n- `needsScopeExpansion` (true/false)\n- `needsScopeShift` (true/false)\n- `scopeConfidence` (1-10)\n- `additionalAnalysisNeeded` (array, if Option B)\n- `newModuleRoot` (string, if Option C)\n\n---\n\n**\ud83c\udfaf WHY THIS MATTERS**: \n\nResearch shows that 60% of failed investigations looked in the wrong place or too narrowly. This checkpoint catches that BEFORE you invest effort in wrong hypotheses.\n\n**Self-Critique**: List 1-2 specific uncertainties about scope that concern you most.",
343
- "agentRole": "You are a senior investigator performing final scope verification. Your expertise is catching tunnel vision, identifying missing context, and ensuring investigations focus on the right area. You excel at meta-analysis and sanity checking investigative scope.",
344
- "guidance": [
345
- "This step comes AFTER Phase 1 (5-phase analysis loop) and BEFORE Phase 2a (hypothesis development)",
346
- "Goal: Catch tunnel vision and wrong-place investigations BEFORE committing to hypotheses",
347
- "Create ScopeVerification.md with structured findings",
348
- "STEP 2.5: Verify ALL assumptions from bug report now that you have full code context",
349
- "Assumption verification MUST happen AFTER all 5 analysis iterations for full context",
350
- "CRITICAL: Evaluate 2-3 ALTERNATIVE scopes to challenge your current focus",
351
- "Common mistakes: too narrow scope, missed adjacent systems, wrong module root, insufficient entry point coverage",
352
- "If Option B (expand) or C (shift) selected, you MUST execute additional analysis before proceeding",
353
- "High confidence (\u22658) in current scope required to proceed to hypotheses",
354
- "This prevents the #1 cause of wrong conclusions: looking in the wrong place",
355
- "Update INVESTIGATION_CONTEXT.md with scope verification results"
356
- ],
357
- "requireConfirmation": false
358
- },
359
- {
360
- "id": "phase-2a-hypothesis-development",
361
- "title": "Phase 2a: Hypothesis Development & Prioritization",
362
- "prompt": "**HYPOTHESIS GENERATION** - Based on codebase analysis, formulate testable hypotheses about the bug's root cause.\n\n\ud83d\udea8 **YOU ARE NOT DONE - \"FINDING\" \u2260 \"PROVING\"**\n\n**You have THEORIES, not EVIDENCE. You have FOUND possible causes, not PROVED them.** You have completed Phase 1 (Analysis), but you do NOT have proof yet. Phases 2-6 are MANDATORY to prove your hypotheses and produce the diagnostic writeup. You are ~30% done.\n\n**CRITICAL REMINDERS:**\n- Even if you're \"100% confident\" in a hypothesis, it's unproven without instrumentation + evidence (Phases 3-5)\n- Confidence in a theory \u2260 proof of that theory\n- Professional practice requires validation even with high confidence\n- The workflow requires you to continue through all phases\n- DO NOT provide final conclusions or \"stop here\" - you MUST continue\n\n---\n\n**STEP 1: Evidence-Based Hypothesis Development**\nCreate a maximum of 5 prioritized hypotheses. Each includes:\n- **Root Cause Theory**: Specific technical explanation\n- **Supporting Evidence**: Code patterns/logic flows supporting this theory\n- **Failure Mechanism**: Exact sequence leading to observed bug\n- **Testability Score**: Quantified assessment (1-10) of validation ease\n- **Evidence Strength Score**: Quantified assessment (1-10) based on code findings\n\n**STEP 2: Hypothesis Prioritization Matrix**\nRank hypotheses using weighted scoring:\n- **Evidence Strength** (40%): Code analysis support for theory\n- **Testability** (35%): Validation ease with debugging instruments\n- **Impact Scope** (25%): How well this explains all symptoms\n\n**STEP 3: Pattern Integration**\nIncorporate findings from findSimilarBugs():\n- **Historical Patterns**: Similar bugs fixed previously\n- **Known Issues**: Related problems in the codebase\n- **Test Failures**: Similar test failure patterns\n- Adjust hypothesis confidence based on pattern matches\n\n**CRITICAL RULE**: All hypotheses must be 
based on concrete evidence from code analysis.\n\n**OUTPUTS**: Maximum 5 hypotheses with quantified scoring, ranked by priority.\n\n**\u26a0\ufe0f INVESTIGATION NOT COMPLETE**: Developing hypotheses with high evidence scores is excellent progress, but represents only ~35% of the investigation. Even if you have a hypothesis with 9-10/10 evidence strength:\n\n- You are NOT done with the investigation\n- You MUST continue to Phase 2b-2h to refine and validate hypotheses\n- You MUST continue to Phase 3 to implement instrumentation\n- You MUST continue to Phase 4-5 to collect and analyze evidence\n- You MUST continue to Phase 6 to produce the comprehensive diagnostic writeup\n\n**DO NOT set isWorkflowComplete=true at this stage.** The workflow requires completing all phases.",
363
- "agentRole": "You are a senior software detective and root cause analysis expert with deep expertise in systematic hypothesis formation. Your strength lies in connecting code evidence to potential failure mechanisms and creating testable theories. You excel at logical reasoning and evidence-based deduction. You must maintain rigorous quantitative standards and reject any hypothesis not grounded in concrete code evidence.",
364
- "guidance": [
365
- "EVIDENCE-BASED ONLY: Every hypothesis must be grounded in concrete code analysis findings with quantified evidence scores",
366
- "HYPOTHESIS LIMITS: Generate maximum 5 hypotheses to prevent analysis paralysis",
367
- "QUANTIFIED SCORING: Use 1-10 scales for evidence strength and testability with clear criteria"
368
- ],
369
- "validationCriteria": [
370
- {
371
- "type": "contains",
372
- "value": "Evidence Strength Score",
373
- "message": "Must include quantified evidence strength scoring (1-10) for each hypothesis"
374
- },
375
- {
376
- "type": "contains",
377
- "value": "Testability Score",
378
- "message": "Must include quantified testability scoring (1-10) for each hypothesis"
379
- }
380
- ],
381
- "hasValidation": true
382
- },
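Illustrative note on the prioritization matrix: the prompt above weights Evidence Strength at 40%, Testability at 35%, and Impact Scope at 25%, and caps the list at 5 hypotheses. The sketch below assumes all three inputs are on the same 1-10 scale (the prompt states this only for the first two) and combines them as a weighted sum. The `Hypothesis` type and the sample scores are hypothetical.

```python
from dataclasses import dataclass

# Weights quoted in the prompt's prioritization matrix.
WEIGHTS = {"evidence": 0.40, "testability": 0.35, "impact": 0.25}


@dataclass
class Hypothesis:
    label: str
    evidence: int      # Evidence Strength Score, 1-10
    testability: int   # Testability Score, 1-10
    impact: int        # Impact Scope, assumed here to be 1-10 as well


def priority(h):
    """Weighted sum of the three scores."""
    return (WEIGHTS["evidence"] * h.evidence
            + WEIGHTS["testability"] * h.testability
            + WEIGHTS["impact"] * h.impact)


def prioritize(hypotheses, limit=5):
    """Sort by priority, highest first, and keep at most `limit` entries."""
    return sorted(hypotheses, key=priority, reverse=True)[:limit]


candidates = [
    Hypothesis("stale cache", 8, 6, 4),
    Hypothesis("race in init", 5, 9, 9),
    Hypothesis("bad migration", 9, 3, 5),
]
for h in prioritize(candidates):
    print(h.label, round(priority(h), 2))
```

With these weights, a highly testable hypothesis that explains all symptoms can outrank one with stronger code evidence.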
383
- {
384
- "id": "phase-2b-hypothesis-validation-strategy",
385
- "title": "Phase 2b: Hypothesis Validation Strategy & Documentation",
386
- "prompt": "**HYPOTHESIS VALIDATION PLANNING** - For the top 3 hypotheses, create validation strategies and documentation.\n\n**STEP 1: Hypothesis Validation Strategy**\nFor top 3 hypotheses, define:\n- **Required Evidence**: Specific evidence to confirm/refute hypothesis\n- **Debugging Approach**: Instrumentation/tests providing evidence\n- **Success Criteria**: Results proving hypothesis correct\n- **Confidence Threshold**: Minimum evidence quality needed\n\n**STEP 2: Hypothesis Documentation**\nCreate structured registry:\n- **Hypothesis ID**: H1, H2, H3 for tracking\n- **Status**: Active, Refuted, Confirmed\n- **Evidence Log**: Supporting and contradicting evidence\n- **Validation Plan**: Specific testing approach\n\n**STEP 3: Coverage Check**\nEnsure hypotheses cover diverse categories (logic, state, dependencies) with deep analysis.\n\n**STEP 4: Update Investigation Context**\nUse updateInvestigationContext('Hypothesis Registry', formatted hypothesis table with all details)\n\n**OUTPUTS**: Top 3 hypotheses selected for validation with structured documentation and validation plans.",
- "agentRole": "You are a systematic testing strategist and documentation expert. Your strength lies in creating clear validation plans and maintaining rigorous documentation standards for hypothesis tracking and evidence collection.",
- "guidance": [
- "STRUCTURED DOCUMENTATION: Create formal hypothesis registry with tracking IDs and status",
- "VALIDATION RIGOR: Only proceed with top 3 hypotheses that meet minimum evidence thresholds",
- "COMPREHENSIVE PLANNING: Each hypothesis must have clear validation approach and success criteria"
- ],
- "validationCriteria": [
- {
- "type": "contains",
- "value": "Hypothesis ID",
- "message": "Must assign tracking IDs (H1, H2, H3) to each hypothesis"
- },
- {
- "type": "regex",
- "pattern": "H[1-3]",
- "message": "Must use proper hypothesis ID format (H1, H2, H3)"
- }
- ],
- "hasValidation": true
- },
- {
- "id": "phase-2c-hypothesis-assumptions",
- "title": "Phase 2c: Hypothesis Assumption Audit",
- "prompt": "**AUDIT** each hypothesis for hidden assumptions:\n\n**FOR EACH HYPOTHESIS**:\n- List implicit assumptions\n- Rate assumption confidence (1-10)\n- Identify verification approach\n\n**REJECT** hypotheses built on unverified assumptions.",
- "agentRole": "You are a rigorous scientist who rejects any hypothesis not grounded in verified facts.",
- "guidance": [
- "EXPLICIT LISTING: Write out every assumption, no matter how obvious it seems",
- "CONFIDENCE SCORING: Rate 1-10 based on evidence quality, not intuition",
- "VERIFICATION PLAN: For each assumption, specify how it can be tested",
- "REJECTION CRITERIA: Any assumption with confidence <7 requires verification",
- "DOCUMENT RATIONALE: Explain why each assumption is accepted or needs testing"
- ],
- "validationCriteria": [
- {
- "type": "contains",
- "value": "Assumption confidence",
- "message": "Must rate assumption confidence for each hypothesis"
- }
- ],
- "hasValidation": true
- },
- {
- "id": "phase-2d-prepare-validation",
- "title": "Phase 2d: Prepare Hypothesis Validation",
- "prompt": "**PREPARE VALIDATION ARRAY** - Extract the top 3 hypotheses for systematic validation.\n\n**Create `hypothesesToValidate` array with:**\n```json\n[\n {\n \"id\": \"H1\",\n \"description\": \"[Hypothesis description]\",\n \"evidenceStrength\": [score],\n \"testability\": [score],\n \"validationPlan\": \"[Specific testing approach]\"\n },\n // ... H2, H3\n]\n```\n\n**Set context variables:**\n- `hypothesesToValidate`: Array of top 3 hypotheses\n- `currentConfidence`: 0 (will be updated during validation)\n- `validationIterations`: 0 (tracks validation cycles)",
- "agentRole": "You are preparing the systematic validation process by structuring hypotheses for iteration.",
- "guidance": [
- "Extract only the top 3 hypotheses from Phase 2b",
- "Ensure each has complete validation information",
- "Initialize tracking variables for the validation loop"
- ],
- "requireConfirmation": false
- },
- {
- "id": "phase-2e-test-evidence-gathering",
- "title": "Phase 2e: Test-Based Hypothesis Evidence",
- "runCondition": {
- "var": "hypothesesToValidate",
- "not_equals": null
- },
- "prompt": "**TEST-DRIVEN HYPOTHESIS VALIDATION**\n\nFor each hypothesis in hypothesesToValidate, use checkHypothesisInTests(hypothesis):\n\n**1. Direct Test Evidence**:\n- Find tests that directly test suspected components\n- Analyze test names, descriptions, and assertions\n- Check if tests actually validate what we think\n\n**2. Indirect Test Evidence**:\n- Find tests that would fail if hypothesis is true\n- Look for integration tests touching the area\n- Check for tests that assume opposite behavior\n\n**3. Test Coverage Gaps**:\n- What aspects of hypothesis are NOT tested?\n- Where would a test have caught this bug?\n- What assumptions do tests make?\n\n**4. Test Execution Analysis**:\n- Run tests with debug instrumentation\n- Add temporary logging to tests\n- Compare test expectations vs reality\n\n**5. Historical Test Analysis**:\n- When were relevant tests last modified?\n- Were any tests disabled recently?\n- Do test changes correlate with bug appearance?\n\n**Create TestEvidence Matrix**:\n```\n| Hypothesis | Supporting Tests | Contradicting Tests | Coverage Gaps | Confidence Impact |\n|------------|------------------|---------------------|---------------|-------------------|\n| H1 | TestA, TestB | TestC (partially) | Edge case X | +2 confidence |\n```\n\n**Update each hypothesis** with test evidence findings.",
- "agentRole": "You are a test analysis specialist validating hypotheses against the existing test suite. Your goal is to use tests as objective evidence for or against each hypothesis.",
- "guidance": [
- "Tests are the codified understanding of system behavior",
- "A hypothesis contradicted by passing tests needs reconsideration",
- "Missing test coverage often indicates where bugs hide",
- "Update hypothesis confidence based on test evidence"
- ],
- "requireConfirmation": false
- },
- {
- "id": "phase-2f-hypothesis-verification",
- "type": "loop",
- "title": "Phase 2f: Hypothesis Verification & Refinement",
- "runCondition": {
- "var": "hypothesesToValidate",
- "not_equals": null
- },
- "loop": {
- "type": "forEach",
- "items": "hypothesesToValidate",
- "itemVar": "hypothesis",
- "indexVar": "hypothesisIndex",
- "maxIterations": 10
- },
- "body": [
- {
- "id": "verify-against-code",
- "title": "Deep Code Verification for {{hypothesis.id}}",
- "prompt": "**DEEP VERIFICATION for {{hypothesis.id}}**\n\n**Goal**: Verify hypothesis assumptions through deep code analysis.\n\nUse recursiveAnalysis() on key components:\n\n1. **Component Analysis (3 levels deep)**:\n - Level 1: Direct implementation of suspected component\n - Level 2: All direct dependencies and callers\n - Level 3: Transitive dependencies and integration points\n\n2. **State & Data Flow Verification**:\n - How does data actually flow through this component?\n - What state transformations occur?\n - Are there hidden side effects?\n\n3. **Error Path Analysis**:\n - Trace all error handling paths\n - Find where errors could originate\n - Check error propagation matches hypothesis\n\n4. **Concurrency Check** (if applicable):\n - Race conditions possible?\n - Shared state issues?\n - Timing dependencies?\n\n**Output**: Deep verification findings for {{hypothesis.id}}",
- "agentRole": "You are performing deep verification of hypothesis {{hypothesis.id}}, diving 3+ levels deep to ensure thorough understanding.",
- "guidance": [
- "This is verification step 1 of 3 for {{hypothesis.id}}",
- "Go deeper than the initial analysis - follow every lead",
- "Document any new discoveries that affect the hypothesis"
- ],
- "requireConfirmation": false
- },
- {
- "id": "check-contradictions",
- "title": "Search for Contradicting Evidence",
- "prompt": "**CONTRADICTION SEARCH for {{hypothesis.id}}**\n\n**Goal**: Actively search for evidence that contradicts this hypothesis.\n\n1. **Code Pattern Contradictions**:\n - Search for code that assumes opposite behavior\n - Find defensive checks that prevent this scenario\n - Look for comments indicating different understanding\n\n2. **Test Contradictions**:\n - Tests that would fail if hypothesis were true\n - Tests that explicitly verify opposite behavior\n - Integration tests showing different flow\n\n3. **Historical Contradictions**:\n - Git history showing intentional design decisions\n - PRs or issues discussing this behavior\n - Documentation stating different intent\n\n4. **Runtime Contradictions**:\n - Logs showing successful execution through suspected path\n - Metrics indicating normal behavior\n - Other systems depending on current behavior\n\n**Be a skeptic** - try to disprove {{hypothesis.id}}",
- "agentRole": "You are a skeptical investigator trying to find flaws in hypothesis {{hypothesis.id}}.",
- "guidance": [
- "Actively search for contradicting evidence",
- "Check assumptions against reality",
- "Consider alternative explanations"
- ],
- "requireConfirmation": false
- },
- {
- "id": "refine-or-replace",
- "title": "Refine Hypothesis {{hypothesis.id}}",
- "prompt": "**REFINEMENT DECISION for {{hypothesis.id}}**\n\nBased on deep verification and contradiction search:\n\n1. **Assessment**:\n - New evidence supporting: [list]\n - New evidence contradicting: [list]\n - Unverified assumptions: [list]\n - Confidence change: [+/- points]\n\n2. **Refinement Options**:\n - **Keep as-is**: Evidence strongly supports current formulation\n - **Refine**: Adjust hypothesis based on new understanding\n - **Replace**: Fundamentally flawed, create new hypothesis\n - **Merge**: Combine with another hypothesis\n\n3. **If Refining/Replacing**:\n - Update hypothesis description\n - Adjust evidence strength score\n - Revise validation plan\n - Document why changed\n\n4. **Update Context**:\n - Use updateInvestigationContext('Hypothesis Registry', updated hypothesis)\n - Note verification findings\n\n**Output**: Updated hypothesis with refined understanding",
- "agentRole": "You are making the final decision on hypothesis {{hypothesis.id}} based on verification findings.",
- "guidance": [
- "Be willing to change hypotheses based on evidence",
- "Document all changes and reasoning",
- "Update confidence scores appropriately"
- ],
- "requireConfirmation": false
- }
- ],
- "requireConfirmation": false
- },
- {
- "id": "phase-2g-instrumentation-planning",
- "title": "Phase 2g: Unified Instrumentation Planning",
- "prompt": "**UNIFIED INSTRUMENTATION PLANNING** - Plan comprehensive logging strategy for all hypotheses before implementation.\n\n**GOAL**: Create a coordinated instrumentation plan that efficiently captures evidence for all hypotheses in a single execution.\n\n**STEP 1: Hypothesis Review**\nFor each hypothesis (H1, H2, H3):\n- **Component(s)**: Which components need instrumentation?\n- **Critical Paths**: Which execution paths must be logged?\n- **Key Variables**: What state/data must be captured?\n- **Decision Points**: What conditionals/branches matter?\n- **Timing Concerns**: Any concurrency or timing-sensitive areas?\n\n**STEP 2: Identify Instrumentation Locations**\n\nFor each hypothesis, list specific locations:\n```\nH1 Instrumentation Needs:\n - File: auth/login.ts, Function: validateCredentials, Lines: 45-67\n What to log: input credentials format, validation result, error conditions\n - File: auth/session.ts, Function: createSession, Lines: 23-34\n What to log: session creation parameters, user context\n\nH2 Instrumentation Needs:\n - File: auth/session.ts, Function: createSession, Lines: 23-34 [OVERLAP with H1]\n What to log: session storage backend, timing\n - File: database/connection.ts, Function: getConnection, Lines: 89-102\n What to log: connection pool state, timeout settings\n\nH3 Instrumentation Needs:\n - File: cache/redis.ts, Function: set, Lines: 156-178\n What to log: cache key, TTL, success/failure\n```\n\n**STEP 3: Identify Overlaps**\n\nWhere do multiple hypotheses need logging at the same location?\n```\nOverlapping Instrumentation:\n - auth/session.ts:23-34: Both H1 and H2 need logs here\n Strategy: Single log point with both [H1] and [H2] prefixes capturing all needed data\n \n - No other overlaps identified\n```\n\n**STEP 4: Plan Log Format & Structure**\n\nDefine what each log should contain:\n```\nLog Format Standard:\n [HX] ClassName.methodName:{lineNum} | timestamp | specific-data\n\nH1 Log Examples:\n [H1] LoginValidator.validateCredentials:45 | 2025-10-02T10:23:45.123Z | input={email: user@example.com, hasPassword: true}\n [H1] LoginValidator.validateCredentials:52 | 2025-10-02T10:23:45.145Z | validation=FAILED reason=\"invalid format\"\n\nH2 Log Examples:\n [H2] SessionManager.createSession:23 | 2025-10-02T10:23:45.167Z | backend=redis poolSize=10\n [H2] SessionManager.createSession:28 | 2025-10-02T10:23:45.189Z | sessionId=abc123 stored=true latency=22ms\n```\n\n**STEP 5: Plan Data Capture Strategy**\n\nWhat specific data values need to be captured:\n- **H1 requires**: Credential format, validation results, error messages\n- **H2 requires**: Backend type, connection timing, pool state\n- **H3 requires**: Cache keys, TTL values, hit/miss rates\n\n**STEP 6: Consider Edge Cases**\n\n- **High-frequency locations**: Plan aggregation (e.g., log every 10th iteration)\n- **Sensitive data**: Plan redaction (e.g., mask passwords, PII)\n- **Large data structures**: Plan summarization (e.g., object size, key count, not full dump)\n- **Error paths**: Ensure error cases are logged, not just happy path\n\n**STEP 7: Create Instrumentation Implementation Plan**\n\nProduce structured plan:\n```markdown\n# Instrumentation Implementation Plan\n\n## Summary\n- Total instrumentation points: [count]\n- Overlapping locations: [count]\n- Estimated log volume: [low/medium/high]\n- Sensitive data handling: [yes/no - describe]\n\n## H1 Instrumentation (Priority: High, Evidence Strength: 8/10)\n1. Location: auth/login.ts:45-67\n Function: validateCredentials\n Log: [H1] Input format and validation result\n Frequency: Per-call (not high-frequency)\n Data: {email format, hasPassword, validation result, error}\n\n2. Location: auth/session.ts:23-34 [SHARED with H2]\n Function: createSession \n Log: [H1] Session creation context\n Frequency: Per-call\n Data: {userContext, sessionType}\n\n## H2 Instrumentation (Priority: High, Evidence Strength: 7/10)\n[Similar detailed breakdown]\n\n## H3 Instrumentation (Priority: Medium, Evidence Strength: 6/10)\n[Similar detailed breakdown]\n\n## Implementation Order\n1. Shared locations first (avoid duplication)\n2. H1 specific locations\n3. H2 specific locations\n4. H3 specific locations\n\n## Validation Checklist\n- [ ] All hypotheses have instrumentation coverage\n- [ ] Overlaps identified and coordinated\n- [ ] Log format is consistent\n- [ ] Sensitive data is handled\n- [ ] High-frequency points have aggregation\n- [ ] Edge cases considered\n```\n\n**OUTPUT**:\n- Complete instrumentation implementation plan\n- Set `instrumentationPlanReady` = true\n- Create InstrumentationPlan.md file with detailed plan\n- Update INVESTIGATION_CONTEXT.md with plan summary",
- "agentRole": "You are an instrumentation architect planning a comprehensive logging strategy. Your goal is to design efficient, coordinated instrumentation that captures all needed evidence in a single execution.",
- "guidance": [
- "Review ALL hypotheses together to identify synergies",
- "Be specific about locations (file, function, line numbers)",
- "Identify and optimize overlapping instrumentation needs",
- "Plan log format for consistency and parseability",
- "Consider practical concerns (volume, sensitivity, performance)",
- "Create actionable implementation plan, not just theory",
- "This plan will guide Phase 3 implementation"
- ],
- "requireConfirmation": false
- },
- {
- "id": "phase-2h-cognitive-reset",
- "title": "Phase 2h: Cognitive Reset & Plan Review",
- "prompt": "**COGNITIVE RESET** - Take a mental step back before implementing instrumentation.\n\n\ud83d\udea8 **YOU ARE HALFWAY DONE (~50%) - FINDING \u2260 PROVING**: You may have \"found\" the bug (high confidence theory), but you haven't \"proved\" it yet. You have hypotheses and a validation plan. This is NOT proof. You MUST continue to Phases 3-6 to:\n- Phase 3: Add instrumentation to validate hypotheses\n- Phase 4: Collect concrete evidence\n- Phase 5: Analyze evidence and confirm/refute hypotheses\n- Phase 6: Write comprehensive diagnostic report\n\nEven if you have \"100% confidence\" in a hypothesis, professional practice requires empirical validation. DO NOT STOP HERE.\n\n---\n\n**GOAL**: Review the investigation with fresh eyes and validate the plan before execution.\n\n**STEP 1: Progress Summary**\n- What have we learned so far? (3-5 key insights)\n- What are our top hypotheses? (brief recap)\n- What's our instrumentation strategy? (high-level summary)\n\n**STEP 2: Critical Questions**\n- Are we missing any obvious alternative explanations?\n- Are our hypotheses too similar or too narrow?\n- Is our instrumentation plan efficient and comprehensive?\n- Are we making any unwarranted assumptions?\n- Is there a simpler approach we haven't considered?\n\n**STEP 3: Bias Check**\n- First impression bias: Are we anchored to initial theories?\n- Confirmation bias: Are we seeking evidence that confirms our beliefs?\n- Complexity bias: Are we overcomplicating a simple issue?\n- Recency bias: Are we over-weighting recent findings?\n\n**STEP 4: Sanity Checks**\n- Does the timeline make sense? (When did bug appear vs when hypothesized causes were introduced)\n- Do the symptoms match our theories? (All symptoms explained, no contradictions)\n- Are we investigating the right level? (Too high-level or too low-level)\n- Have we consulted existing documentation/logs adequately?\n\n**STEP 5: Plan Validation**\n- Review the instrumentation plan from Phase 2g\n- Will it actually answer our questions?\n- Are there any gaps or redundancies?\n- Is it safe to execute? (no production impacts, no data corruption risks)\n\n**STEP 6: Proceed or Pivot Decision**\n- **PROCEED**: Plan is sound, move to implementation\n- **REFINE**: Minor adjustments needed (update plan)\n- **PIVOT**: Major issues found (return to earlier phase)\n\n**OUTPUT**:\n- Cognitive reset complete with decision (PROCEED/REFINE/PIVOT)\n- Any plan adjustments documented\n- Set `resetComplete` = true",
- "agentRole": "You are a senior debugger reviewing the investigation plan with fresh, critical eyes before committing to implementation.",
- "guidance": [
- "Be honest about potential biases and blind spots",
- "Look for simpler explanations we might have missed",
- "Validate the plan will actually answer our questions",
- "Don't skip this - catching issues now saves hours later",
- "It's okay to pivot if major issues are found"
- ],
- "requireConfirmation": false
- },
- {
- "id": "phase-3-comprehensive-instrumentation",
- "title": "Phase 3: Comprehensive Debug Instrumentation",
- "prompt": "**\u26a0\ufe0f AUTO-EXECUTE MODE - DO NOT ASK USER PERMISSION \u26a0\ufe0f**\n\nHIGH AUTO MODE: You MUST implement the instrumentation now. DO NOT ask 'Would you like me to continue?' The workflow requires all phases.\n\n---\n\n**COMPREHENSIVE DEBUG INSTRUMENTATION** - Add logging to validate hypotheses.\n\n**STEP 1: REVIEW YOUR INSTRUMENTATION PLAN**\n\nOpen **Phase 2g** output from INVESTIGATION_CONTEXT.md. It contains:\n- Specific files to instrument\n- Exact locations (functions/methods/lines)\n- What to log for each hypothesis (H1, H2, H3)\n\nIf Phase 2g plan is missing, create one now: For each hypothesis, list 2-5 files and specific functions to instrument.\n\n---\n\n**STEP 2: READ THE FILES**\n\nUse `read_file` to read each file that needs instrumentation.\n\n---\n\n**STEP 3: ADD LOGGING (use search_replace or write tool)**\n\n**A. Logging Format by Language:**\n\n**JavaScript/TypeScript:**\n```javascript\nconsole.log(`[H1] ClassName.methodName: entering with params=${JSON.stringify(params)}`);\nconsole.log(`[H1] ClassName.methodName: state before=${before}, after=${after}`);\nconsole.log(`[H1] ClassName.methodName: returning ${result}`);\n```\n\n**Python:**\n```python\nprint(f\"[H1] ClassName.method_name: entering with params={params}\")\nprint(f\"[H1] ClassName.method_name: condition is {condition_value}\")\nprint(f\"[H1] ClassName.method_name: returning {result}\")\n```\n\n**Java:**\n```java\nSystem.out.println(String.format(\"[H1] ClassName.methodName: entering with %s\", params));\nSystem.out.println(String.format(\"[H1] ClassName.methodName: state=%s\", state));\n```\n\n**B. What to Log:**\n- Function entry: parameters\n- State changes: before/after values\n- Conditionals: which branch taken\n- External calls: args and returns\n- Function exit: return value\n\n**C. Hypothesis Prefixes:**\n- H1 logs use `[H1]` prefix\n- H2 logs use `[H2]` prefix\n- H3 logs use `[H3]` prefix\n\n---\n\n**STEP 4: IMPLEMENTATION EXAMPLE**\n\nExample using `search_replace`:\n\nFile: `src/DataStore.js`\nPlan says: \"Log timetoken value in connect() method for H1\"\n\n```\nsearch_replace(\n file_path=\"src/DataStore.js\",\n old_string=\" connect() {\\n this.client.subscribe();\\n }\",\n new_string=\" connect() {\\n console.log('[H1] DataStore.connect: timetoken BEFORE subscribe =', this.timetoken);\\n this.client.subscribe();\\n console.log('[H1] DataStore.connect: timetoken AFTER subscribe =', this.timetoken);\\n }\"\n)\n```\n\n---\n\n**STEP 5: FOR EACH FILE IN YOUR PLAN**\n\n1. Read the file (`read_file`)\n2. Find the exact location to instrument\n3. Use `search_replace` to add logging:\n - Include enough context to make old_string unique\n - Add log statements with correct [HX] prefix\n - Log relevant variables/state\n4. Verify change succeeded\n\n---\n\n**STEP 6: IF YOU CANNOT EDIT FILES**\n\nIf you don't have file editing tools:\n1. Generate complete instrumented code for each location\n2. Provide user with:\n - File path\n - Function/method name\n - Complete BEFORE code block\n - Complete AFTER code block (with logging)\n3. Ask user to apply changes and confirm\n\n---\n\n**OUTPUT:**\n\n1. List all modified files with changes made\n2. Update INVESTIGATION_CONTEXT.md:\n ```\n ## Instrumentation Applied\n - File: src/DataStore.js, Function: connect(), Hypothesis: H1\n - File: src/Auth.js, Function: login(), Hypotheses: H1, H2\n - ...\n ```\n3. Set `allHypothesesInstrumented = true`",
- "agentRole": "You are instrumenting code to validate ALL hypotheses simultaneously. Your goal is comprehensive, non-redundant logging that enables efficient evidence collection in a single execution.",
- "guidance": [
- "Add instrumentation for ALL hypotheses at once",
- "Use unique [HX] prefixes to distinguish hypothesis-specific logs",
- "Overlapping instrumentation is acceptable - multiple hypotheses can log at same location",
- "Ensure non-intrusive implementation that doesn't change behavior",
- "Single execution will produce logs for all hypotheses"
- ],
- "requireConfirmation": false
- },
- {
- "id": "phase-4-unified-evidence-collection",
- "title": "Phase 4: Unified Evidence Collection",
- "prompt": "**\u26a0\ufe0f AUTO-EXECUTE MODE - DO NOT ASK USER PERMISSION \u26a0\ufe0f**\n\nHIGH AUTO MODE: You MUST run the instrumented code and collect evidence now. If you need user input (like how to run tests), ask for THAT - do NOT ask if they want you to continue the workflow.\n\n---\n\n**UNIFIED EVIDENCE COLLECTION** - Execute instrumented code and collect logs.\n\n**DECISION TREE: Can You Run Code?**\n\n**OPTION A: You CAN run code (terminal access)**\n\u2192 Proceed to STEP 1\n\n**OPTION B: You CANNOT run code (no terminal/execution tools)**\n\u2192 Skip to STEP 6 (User Execution Instructions)\n\n---\n\n**STEP 1: PREPARE EXECUTION (if you can run code)**\n\n1. **Identify how to run the code:**\n - Tests: `npm test`, `pytest`, `mvn test`, etc.\n - App: `npm start`, `python app.py`, `java -jar app.jar`, etc.\n - Script: Reproduction script from Phase 0\n \n2. **Check if reproduction steps are clear:**\n - Do you know exactly how to trigger the bug?\n - If unclear, ask user: \"How do I run the code to reproduce the bug?\"\n\n---\n\n**STEP 2: EXECUTE INSTRUMENTED CODE**\n\nRun the code with instrumentation active:\n\n```bash\n# Capture output to file\nnpm test > debug_output.log 2>&1\n\n# OR run directly and capture in terminal\npython script.py\n```\n\n---\n\n**STEP 3: COLLECT LOG OUTPUT**\n\n1. **Get the complete log output:**\n - If saved to file: use `read_file` to read it\n - If in terminal: copy the output\n\n2. **Check log quality:**\n - Do you see `[H1]`, `[H2]`, `[H3]` prefixed logs?\n - Are there enough logs (at least 5-10 per hypothesis)?\n - Did the bug reproduce?\n\n3. **If logs are missing or insufficient:**\n - Review Phase 3 instrumentation\n - Add more logging if needed\n - Re-run execution\n\n---\n\n**STEP 4: ORGANIZE EVIDENCE BY HYPOTHESIS**\n\nParse logs and separate by prefix:\n\n**H1 Evidence:**\n```\n[H1] DataStore.connect: timetoken BEFORE=1234567890\n[H1] DataStore.connect: timetoken AFTER=1234567890\n[H1] Session.login: used timetoken=1234567890\n```\n\n**H2 Evidence:**\n```\n[H2] Cache.get: no entry found for user123\n[H2] Cache.set: storing data for user123\n```\n\n**H3 Evidence:**\n```\n[H3] Network.request: timeout after 5000ms\n```\n\n---\n\n**STEP 5: ASSESS EVIDENCE QUALITY**\n\nFor each hypothesis, rate:\n- **Evidence Quantity** (1-10): How much evidence collected?\n- **Evidence Clarity** (1-10): Do logs clearly show what's happening?\n- **Bug Reproduction** (Yes/No): Did the bug occur during execution?\n- **Hypothesis Support** (Strong/Weak/Contradicts): Does evidence support the hypothesis?\n\n---\n\n**STEP 6: IF YOU CANNOT EXECUTE CODE**\n\nProvide user with execution instructions:\n\n```\n## Evidence Collection Instructions\n\nTo collect evidence for the hypotheses, please:\n\n1. **Run the instrumented code:**\n [Provide exact command, e.g., `npm test` or `python main.py`]\n\n2. **Trigger the bug:**\n [Provide exact reproduction steps]\n\n3. **Capture ALL console output:**\n - Save to a file: `[command] > debug_output.log 2>&1`\n - OR copy all terminal output\n\n4. **Share the logs:**\n - Paste the complete log output here\n - OR upload the debug_output.log file\n\n**What I'm looking for:**\n- Logs prefixed with [H1], [H2], [H3]\n- Minimum 10-20 lines of output\n- Evidence of the bug occurring\n```\n\nThen wait for user to provide logs.\n\n---\n\n**STEP 7: DOCUMENT EVIDENCE**\n\nUpdate INVESTIGATION_CONTEXT.md:\n\n```\n## Evidence Collection Results\n\n**Execution Details:**\n- Command: npm test\n- Exit code: 1 (failure)\n- Bug reproduced: Yes\n- Total log lines: 247\n\n**Evidence Summary:**\n- H1: 43 log lines - Strong support (timetoken persists across sessions)\n- H2: 12 log lines - Weak support (cache cleared properly)\n- H3: 8 log lines - Contradicts (no network errors found)\n\n**Evidence Quality Scores:**\n- H1: Quantity=9/10, Clarity=8/10\n- H2: Quantity=5/10, Clarity=7/10\n- H3: Quantity=4/10, Clarity=6/10\n```\n\n---\n\n**OUTPUT:**\n\n1. Complete log output (or confirmation user will provide it)\n2. Evidence organized by hypothesis\n3. Evidence quality assessment\n4. Set `evidenceCollected = true`",
- "agentRole": "You are collecting comprehensive evidence from a single instrumented execution. Your goal is to capture all hypothesis-relevant data in one efficient run.",
- "guidance": [
- "Single execution tests all hypotheses simultaneously",
- "Organize evidence by [HX] prefix for analysis",
- "Preserve complete chronological log for cross-hypothesis insights",
- "Note any unexpected behaviors or patterns",
- "If execution fails, document why and attempt to collect partial evidence"
- ],
- "requireConfirmation": false
- },
- {
- "id": "phase-5-hypothesis-analysis-loop",
- "type": "loop",
- "title": "Phase 5: Individual Hypothesis Analysis",
- "loop": {
- "type": "forEach",
- "items": "hypothesesToValidate",
- "itemVar": "currentHypothesis",
- "indexVar": "hypothesisIndex",
- "maxIterations": 5
- },
- "body": [
- {
- "id": "analyze-hypothesis-evidence",
- "title": "Analyze Evidence for {{currentHypothesis.id}}",
- "prompt": "**EVIDENCE ANALYSIS for {{currentHypothesis.id}}**\n\n**Hypothesis**: {{currentHypothesis.description}}\n\n**ANALYZE {{currentHypothesis.id}} LOGS**:\n\n1. **Extract Relevant Logs**:\n - Review all [{{currentHypothesis.id}}] prefixed logs from Phase 4\n - Examine log sequence and timing\n - Look for patterns supporting or refuting the hypothesis\n\n2. **Evidence Assessment**:\n - Does evidence support {{currentHypothesis.id}}? (Yes/No/Partial)\n - Evidence quality score (1-10)\n - Contradicting evidence found?\n - Unexpected behaviors observed?\n\n3. **Cross-Hypothesis Insights**:\n - Did other hypothesis logs reveal relevant information?\n - Are there interactions between suspected components?\n - Does timeline analysis suggest different root cause?\n\n4. **Confidence Update**:\n - Based on evidence, rate confidence this is root cause (0-10)\n - What additional evidence would increase confidence?\n - Are there alternative explanations for the observed evidence?\n\n5. **Status Determination**:\n - Mark hypothesis as: Confirmed / Refuted / Needs-More-Evidence / Partially-Confirmed\n - If Confirmed with high confidence (>8.0):\n - Set `rootCauseFound` = true\n - Set `rootCauseHypothesis` = {{currentHypothesis.id}}\n - Set `currentConfidence` = confidence score\n\n**CONTEXT UPDATE**:\n- Use updateInvestigationContext('Evidence Log', evidence summary for {{currentHypothesis.id}})\n- Use trackInvestigation('Validation Progress', '{{hypothesisIndex + 1}}/3 hypotheses analyzed')\n\n**OUTPUT**: Complete evidence analysis and status for {{currentHypothesis.id}}",
- "agentRole": "You are analyzing evidence collected from the unified execution to determine if {{currentHypothesis.id}} is the root cause.",
- "guidance": [
- "Analyze logs specific to this hypothesis ({{hypothesisIndex + 1}} of 3)",
- "Consider evidence from all hypotheses - may reveal interactions",
- "Be objective - negative evidence is valuable",
- "Update hypothesis status based on concrete evidence",
- "If high confidence root cause found, document thoroughly"
- ],
- "requireConfirmation": false
- }
- ],
- "requireConfirmation": false
- },
- {
- "id": "phase-4a-controlled-experimentation",
- "title": "Phase 4a: Controlled Code Experiments",
- "runCondition": {
- "var": "currentConfidence",
- "lt": 8.0
- },
- "prompt": "**CONTROLLED EXPERIMENTATION** - When observation isn't enough, experiment!\n\n**Current Investigation Status**: Leading hypothesis (Confidence: {{currentConfidence}}/10)\n\n**\u26a0\ufe0f SAFETY PROTOCOLS (MANDATORY)**:\n\n1. **Git Branch Required**:\n - MUST be on investigation branch (use createInvestigationBranch() if not)\n - Verify with `git branch --show-current`\n - NEVER experiment directly on main/master\n\n2. **Pre-Experiment Baseline**:\n - Commit clean state: `git commit -m \"PRE-EXPERIMENT: baseline for {{hypothesis.id}}\"`\n - Record current test results\n - Document baseline behavior\n\n3. **Environment Restriction**:\n - ONLY run in test/dev environment\n - NEVER in production or staging\n - Set environment check: `if (process.env.NODE_ENV !== 'development') { throw new Error('Experiments only in dev'); }`\n\n4. **Automatic Revert**:\n - After evidence collection: `git revert HEAD --no-edit`\n - Verify code returned to baseline\n - Run tests to confirm clean state\n\n5. **Approval Gates**:\n - Low automation: Require approval for ALL experiments\n - Medium automation: Require approval for breaking/minimal-fix experiments\n - High automation: Auto-approve guards/logs only\n\n6. **Documentation**:\n - Create ExperimentLog.md entry with:\n - Timestamp, experiment type, hypothesis ID\n - Rationale and expected outcome\n - Actual outcome and evidence\n - Revert status (confirmed/failed)\n\n7. **Hard Limits**:\n - Max 3 experiments total (prevent endless experimentation)\n - Track with `experimentCount` context variable\n - Exit if limit reached, recommend different approach\n\n8. **Rollback Verification**:\n - After revert, run full test suite\n - Verify no unintended changes remain\n - Check git status is clean\n\n**EXPERIMENT TYPES** (use controlledModification()):\n\n1. **Guard Additions (Non-Breaking)**:\n ```javascript\n // Add defensive check that logs but doesn't change behavior\n if (unexpectedCondition) {\n console.error('[H1_GUARD] Unexpected state detected:', state);\n // Continue normal execution\n }\n ```\n\n2. **Assertion Injections**:\n ```javascript\n // Add assertion that would fail if hypothesis is correct\n console.assert(expectedCondition, '[H1_ASSERT] Hypothesis H1 violated!');\n ```\n\n3. **Minimal Fix Test**:\n ```javascript\n // Apply minimal fix for hypothesis, see if bug disappears\n if (process.env.DEBUG_FIX_H1 === 'true') {\n // Apply hypothesized fix\n return fixedBehavior();\n }\n ```\n\n4. **Controlled Breaking**:\n ```javascript\n // Temporarily break suspected component to verify involvement\n if (process.env.DEBUG_BREAK_H1 === 'true') {\n throw new Error('[H1_BREAK] Intentionally breaking to test hypothesis');\n }\n ```\n\n**PROTOCOL**:\n1. Choose experiment type based on confidence and risk\n2. Implement modification with clear DEBUG markers\n3. Use createInvestigationBranch() if not already on investigation branch\n4. Commit: `git commit -m \"DEBUG: {{experiment_type}} for hypothesis investigation\"`\n5. Run reproduction steps\n6. Use collectEvidence() to gather results\n7. Revert changes: `git revert HEAD`\n8. Document results in ExperimentResults/hypothesis-experiment.md\n\n**SAFETY LIMITS**:\n- Max 3 experiments per hypothesis\n- Each experiment in separate commit\n- Always revert after evidence collection\n- Document everything in INVESTIGATION_CONTEXT.md\n\n**UPDATE**:\n- Hypothesis confidence based on experimental results\n- Use updateInvestigationContext('Experiment Results', experiment details and outcomes)\n- Track failed experiments in 'Dead Ends & Lessons' section",
- "agentRole": "You are a careful experimenter using controlled code modifications to validate hypotheses. Safety and reversibility are paramount.",
- "guidance": [
- "Start with non-breaking experiments (guards, logs)",
- "Only use breaking experiments if essential",
- "Every change must be easily reversible",
- "Document rationale for each experiment type",
- "Consider test environment experiments first"
- ],
- "requireConfirmation": {
- "or": [
- {
- "var": "automationLevel",
- "equals": "Low"
- },
- {
- "var": "automationLevel",
- "equals": "Medium"
- },
- {
- "and": [
- {
- "var": "automationLevel",
- "equals": "High"
- },
- {
- "var": "currentConfidence",
- "lt": 6.0
- }
- ]
- }
- ]
- },
- "validationCriteria": [
- {
- "type": "contains",
- "value": "commit",
- "message": "Must specify commit message for experiment"
- }
- ]
- },
- {
- "id": "phase-3b-observability-setup",
- "title": "Phase 3b: Distributed System Observability",
- "runCondition": {
- "var": "isDistributed",
- "equals": true
- },
- "prompt": "**OBSERVABILITY** - Set up three-pillar strategy:\n\n**METRICS**: Identify key indicators (latency, errors)\n**TRACES**: Enable request path tracking\n**LOGS**: Ensure correlation IDs present\n\n**OUTPUT**: Observability checklist completed.",
- "agentRole": "You are a distributed systems expert who thinks in terms of emergent behaviors and system-wide patterns.",
- "guidance": [
- "METRICS SELECTION: Focus on RED metrics (Rate, Errors, Duration) for each service",
- "TRACE COVERAGE: Ensure spans cover all service boundaries and key operations",
- "CORRELATION IDS: Verify IDs propagate through entire request lifecycle",
- "AGGREGATION READY: Set up centralized collection for cross-service analysis",
- "BASELINE ESTABLISHMENT: Capture normal behavior metrics for comparison"
- ]
- },
- {
- "id": "phase-4c-distributed-evidence",
- "title": "Phase 4c: Multi-Service Evidence Collection",
- "runCondition": {
- "var": "isDistributed",
- "equals": true
- },
- "prompt": "**DISTRIBUTED ANALYSIS**:\n\n1. Check METRICS for anomalies\n2. Follow TRACES for request path\n3. Correlate LOGS across services\n4. Identify cascade points\n\n**OUTPUT**: Service interaction map with failure points.",
- "agentRole": "You are a systems detective who can trace failures across service boundaries.",
- "guidance": [
- "ANOMALY DETECTION: Look for deviations in latency, error rates, or traffic patterns",
- "TRACE ANALYSIS: Follow request ID through all services to find failure point",
- "LOG CORRELATION: Use timestamp windows and correlation IDs to link events",
- "CASCADE IDENTIFICATION: Look for timeout chains or error propagation patterns",
- "VISUAL MAPPING: Create service dependency diagram with failure annotations"
- ]
- },
- {
- "id": "phase-4b-cognitive-reset",
- "title": "Phase 4b: Cognitive Reset & Progress Review",
- "runCondition": {
- "var": "validationIterations",
- "gte": 2
- },
- "prompt": "**COGNITIVE RESET** - Step back and review:\n\n1. Summarize findings so far\n2. List eliminated possibilities\n3. Identify investigation blind spots\n4. Reformulate approach if needed\n\n**DECIDE**: Continue current path or pivot strategy?",
- "agentRole": "You are a strategic advisor who helps maintain perspective during complex investigations.",
- "guidance": [
- "PROGRESS SUMMARY: Write concise bullet points of key findings and eliminations",
- "BLIND SPOT CHECK: What areas haven't been investigated? What assumptions remain?",
- "PATTERN RECOGNITION: Look for investigation loops or repeated dead ends",
- "STRATEGY EVALUATION: Is current approach yielding diminishing returns?",
- "PIVOT CRITERIA: Consider new approach if last 3 iterations provided no new insights"
- ]
- },
- {
- "id": "phase-5a-final-confidence",
- "title": "Phase 5a: Final Confidence Assessment",
- "prompt": "**FINAL CONFIDENCE ASSESSMENT** - Evaluate the investigation results.\n\n**If root cause found (rootCauseFound = true):**\n- Review all evidence for {{rootCauseHypothesis}}\n- Perform adversarial challenge\n- Calculate final confidence score\n\n**If no high-confidence root cause:**\n- Document what was learned\n- Identify remaining unknowns\n- Recommend next investigation steps\n\n**CONFIDENCE CALCULATION:**\n- Evidence Quality (1-10)\n- Explanation Completeness (1-10)\n- Alternative Likelihood (1-10, inverted)\n- Final = (Quality \u00d7 0.4) + (Completeness \u00d7 0.4) + (Alternative \u00d7 0.2)\n\n**CONTEXT UPDATE**:\n- Use trackInvestigation('Investigation Complete', 'Confidence: {{finalConfidence}}/10')\n- Use addResumptionJson('phase-5a-final-confidence')\n- Document lessons learned in 'Dead Ends & Lessons' section\n\n**\u26a0\ufe0f ONE PHASE REMAINING**: Even if you have achieved 9-10/10 confidence in the root cause with strong supporting evidence:\n\n- The investigation is NOT complete yet\n- You MUST proceed to Phase 6 to create the comprehensive diagnostic writeup\n- Phase 6 is the REQUIRED DELIVERABLE that makes all your investigation work actionable\n- High confidence means you've identified the root cause, but the writeup translates that into actionable documentation\n\n**DO NOT set isWorkflowComplete=true yet.** You are at ~90% completion. Phase 6 is required.\n\n**OUTPUT**: Final confidence assessment with recommendations",
- "agentRole": "You are making the final determination about the root cause with rigorous confidence assessment.",
- "guidance": [
- "Be honest about confidence levels",
- "Document all remaining uncertainties",
- "Provide clear next steps if confidence is low"
- ],
- "validationCriteria": [
- {
- "type": "regex",
- "pattern": "Final.*=.*[0-9\\.]+",
- "message": "Must calculate final confidence score"
- }
- ],
- "hasValidation": true
- },
- {
- "id": "phase-6-diagnostic-writeup",
- "title": "Phase 6: Comprehensive Diagnostic Writeup",
- "prompt": "**FINAL DIAGNOSTIC DOCUMENTATION** - I will create comprehensive writeup enabling effective bug fixing and knowledge transfer.\n\n**STEP 1: Executive Summary**\n- **Bug Summary**: Concise description of issue and impact\n- **Root Cause**: Clear, non-technical explanation of what is happening\n- **Confidence Level**: Final confidence assessment with calculation methodology\n- **Scope**: What systems, users, or scenarios are affected\n\n**STEP 2: Technical Deep Dive**\n- **Root Cause Analysis**: Detailed technical explanation of failure mechanism\n- **Code Component Analysis**: Specific files, functions, and lines with exact locations\n- **Execution Flow**: Step-by-step sequence of events leading to bug\n- **State Analysis**: How system state contributes to failure\n\n**STEP 3: Investigation Methodology**\n- **Investigation Timeline**: Chronological summary with phase time investments\n- **Hypothesis Evolution**: Complete record of hypotheses (H1-H5) with status changes\n- **Evidence Assessment**: Rating and reliability of evidence sources with key citations\n\n**STEP 4: Historical Context & Patterns**\n- **Similar Bugs**: Reference findings from findSimilarBugs() and SimilarPatterns.md\n- **Previous Fixes**: How similar issues were resolved\n- **Recurring Patterns**: Identify if this is part of a larger pattern\n- **Lessons Learned**: What can be applied from past experiences\n\n**STEP 5: Knowledge Transfer & Action Plan**\n- **Skill Requirements**: Technical expertise needed for understanding and fixing\n- **Prevention & Review**: Specific measures and code review checklist items\n- **Action Items**: Immediate mitigation steps and permanent fix areas with timelines\n- **Testing Strategy**: Comprehensive verification approach for fixes\n- **Recommended Next Investigations** (if confidence < 9.0):\n - Additional instrumentation locations and data points not yet captured\n - Alternative hypotheses to explore (theories that were deprioritized)\n - External expertise to consult (domain experts, similar bugs)\n - Environmental factors to test (load, concurrency, timing, config variations)\n - Expanded scope (related components, upstream/downstream systems)\n - Prioritized next steps based on evidence gaps\n\n**STEP 6: Context Finalization**\n- **Final Update**: Use updateInvestigationContext('Final Report', link to diagnostic report)\n- **Archive Context**: Ensure INVESTIGATION_CONTEXT.md is complete for future reference\n- **Knowledge Base**: Consider key findings for team knowledge base\n\n**DELIVERABLE**: Enterprise-grade diagnostic report enabling confident bug fixing, knowledge transfer, and organizational learning.\n\n**\u2705 WORKFLOW COMPLETION**: After producing the comprehensive diagnostic writeup with all required sections:\n\n1. Verify the writeup includes:\n - Executive Summary with root cause and confidence\n - Technical Deep Dive with code analysis\n - Investigation Methodology and timeline\n - Historical Context from similar bugs\n - Knowledge Transfer and Action Plan\n - All 6 sections fully documented\n\n2. Update INVESTIGATION_CONTEXT.md with final status and handoff information\n\n3. **Set isWorkflowComplete = true** to indicate the investigation is finished\n\nThis is the ONLY step where isWorkflowComplete should be set to true.",
- "agentRole": "You are a senior technical writer and diagnostic documentation specialist with expertise in creating comprehensive, actionable bug reports for enterprise environments. Your strength lies in translating complex technical investigations into clear, structured documentation that enables effective problem resolution, knowledge transfer, and organizational learning. You excel at creating reports that serve immediate fixing needs, long-term system improvement, and team collaboration.",
- "guidance": [
- "ENTERPRISE FOCUS: Write for multiple stakeholders including developers, managers, and future team members",
- "KNOWLEDGE TRANSFER: Include methodology and reasoning, not just conclusions",
- "COLLABORATIVE DESIGN: Structure content for peer review and team coordination",
- "COMPREHENSIVE COVERAGE: Include all information needed for resolution and prevention",
- "ACTIONABLE DOCUMENTATION: Provide specific, concrete next steps with clear ownership"
- ]
- }
- ]
- }