@exaudeus/workrail 0.7.2-beta.5 → 0.8.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/workflows/IMPROVEMENTS-simplified.md +122 -0
- package/workflows/bug-investigation.json +112 -0
- package/workflows/{systematic-bug-investigation-simplified.json → systematic-bug-investigation-simplified.backup-20251106-155300.json} +12 -10
- package/workflows/{systematic-bug-investigation-with-loops.json → systematic-bug-investigation-with-loops.backup-20251106-162241.json} +0 -0
package/package.json
CHANGED

package/workflows/IMPROVEMENTS-simplified.md
ADDED

@@ -0,0 +1,122 @@
# Improvements to Simplified Bug Investigation Workflow

## Problem Reported

**Issue 1**: "What about having it follow the flow of code to help track down what could be happening?"

**Issue 2**: "The agent stopped after phase two because it was 'very confident' that it had found the issue"

## Root Cause

The agent stopped after Phase 2 (Hypothesis Formation) because it felt confident it had found the bug. But at that point, it only had a **theory** based on reading code, not **proof** from evidence. This is the #1 failure mode we're trying to prevent.

## Changes Made

### 1. Enhanced Phase 1 - Execution Flow Tracing

**Before**: Vague guidance about "understanding how code is reached" and "tracing data flow"

**After**: Concrete, step-by-step execution flow tracing:
- Start at entry point (API call, test, event)
- Trace the call chain function-by-function
- Track state changes at each step
- Follow data transformations
- Document the complete path from entry to error

**Why**: This gives agents a **concrete technique** rather than abstract guidance. Following actual execution flow prevents surface-level code reading.

**Output**: `ExecutionFlow.md` with:
- Entry point
- Step-by-step call chain with file:line references
- Data flow diagram
- State changes
- Decision points
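In the workflow the agent builds this call chain by reading code, but the same "name (file:line)" record can also be captured mechanically at runtime. A minimal Python sketch using `sys.settrace` — the toy `handle_request` → `parse` → `validate` path is hypothetical, not from the package:

```python
import sys

def trace_calls(func, *args):
    """Run func(*args) and record each Python-level call as 'name (file:line)'."""
    chain = []

    def tracer(frame, event, arg):
        if event == "call":  # only record function entries, ignore line/return events
            code = frame.f_code
            chain.append(f"{code.co_name} ({code.co_filename}:{frame.f_lineno})")
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)  # always detach the tracer
    return chain

# Hypothetical execution path for a bug report: entry point -> parse -> validate
def validate(value):
    return value > 0

def parse(raw):
    return validate(int(raw))

def handle_request(raw):  # entry point (e.g. an API handler)
    return parse(raw)

for step in trace_calls(handle_request, "42"):
    print(step)
```

The printed chain is exactly the "Call Chain" section ExecutionFlow.md asks for, which is why tracing beats skimming: it shows the path actually taken, not the path the reader assumes.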

### 2. Added Explicit Anti-Early-Exit Warning in Phase 2

Added at the end of the Phase 2 prompt:

```
🚨 CRITICAL - DO NOT STOP HERE:

Even if you have a hypothesis with 10/10 confidence, you do NOT have proof yet.
You have an educated guess based on reading code.

You MUST continue to Phase 3 (Instrumentation) and Phase 4 (Evidence Collection)
to gather actual proof.

Having "very high confidence" after reading code is NOT the same as having
evidence from running instrumented code.

Call workflow_next to continue to Phase 3. This is not optional.
```

**Why**: Catches agents right at the moment they're tempted to stop. Makes it explicit that confidence ≠ completion.

### 3. Strengthened MetaGuidance

Enhanced the "Finding vs Proving" section:

**Before**:
- "When you look at code and think 'I found the bug!', you have formed a hypothesis..."
- "This is why you must complete all phases even when Phase 1 makes the bug 'obvious'."

**After** (added):
- "Reading code and feeling confident = THEORY. Running instrumented code and collecting evidence = PROOF."
- "Even with 10/10 confidence after Phase 1 or 2, you have ZERO proof. Continue to Phases 3-5 to gather evidence. This is NOT negotiable."
- "Common mistake: 'I'm very confident so I'll skip instrumentation.' This fails ~90% of the time. High confidence without evidence = educated guess, not diagnosis."

**Why**: Uses clearer language about the distinction. Explicitly calls out the "I'm confident" mistake.

### 4. Updated Phase 1 Closing

**Before**: "You're building understanding, not diagnosing yet."

**After**: "This analysis builds understanding. You do NOT have a diagnosis yet. You're mapping the terrain before forming theories."

**Why**: More forceful language to prevent premature conclusions.

## Why This Matters

### The Core Problem

Agents (like humans) naturally:
1. Pattern match quickly when reading code
2. Form confident conclusions based on that pattern matching
3. Feel like they've "solved it" and want to move on

But bugs often have:
- Alternative explanations
- Edge cases not visible from reading code
- Unexpected interactions only visible at runtime
- Environmental factors

### The Solution

The workflow now:
1. **Provides a concrete technique** (execution flow tracing) vs abstract "analyze code"
2. **Intercepts at the decision point** (end of Phase 2) with an explicit warning
3. **Explains WHY** phases matter in metaGuidance
4. **Uses clear language** about theory vs proof

## Testing Recommendations

When testing this workflow:

1. **Watch for Phase 2 exits**: Does the agent try to stop after forming hypotheses?
2. **Check for execution flow**: Does Phase 1 produce a detailed call chain, or just general analysis?
3. **Look for instrumentation**: Does Phase 3 actually add logging/debugging, or skip it?
4. **Verify evidence collection**: Does Phase 4 run instrumented code and collect real data?
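Item 3 checks whether Phase 3 actually adds logging. As a rough illustration of what hypothesis-labeled instrumentation can look like — the suspect function and both hypotheses are made up, but the `[H1]`/`[H2]` labeling follows the workflow's instrumentation guidance:

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("bug-investigation")

def apply_discount(price, pct):
    """Hypothetical suspect function under investigation."""
    # [H1] caller may pass an already-discounted price
    log.debug("[H1] apply_discount entry: price=%r pct=%r", price, pct)
    factor = 1 - pct / 100
    # [H2] the discount factor may be computed wrong for edge-case percentages
    log.debug("[H2] computed factor=%r", factor)
    return price * factor

result = apply_discount(100, 25)
```

Tagging every log line with the hypothesis it tests makes Phase 4's evidence files (Evidence_H1.md, Evidence_H2.md) a matter of grepping the output rather than re-interpreting it.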

## Remaining Challenges

Even with these improvements, agents may still try to exit early if:
- They have extremely high confidence
- The bug seems "obvious"
- The codebase is small/simple

If this continues to be an issue, we may need to:
- Add a "commitment checkpoint" that requires explicit acknowledgment
- Make workflow_next calls more automatic (less agent discretion)
- Add validation that checks for completed artifacts before allowing progression
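The last idea, gating progression on completed artifacts, could be sketched as below. The phase IDs and artifact filenames echo the workflow files in this diff; the gate itself is a hypothetical addition, not an existing workrail feature:

```python
from pathlib import Path

# Hypothetical artifact gate: map each phase to the files that earlier
# phases must already have produced before this phase may be served.
REQUIRED_BEFORE = {
    "phase-2a-plan": ["INVESTIGATION_CONTEXT.md", "ExecutionFlow.md"],
    "phase-3a-plan": ["Hypotheses.md"],
    "phase-6-writeup": ["ValidationReport.md"],
}

def missing_artifacts(step_id, workspace="."):
    """Return the required artifacts that do not yet exist in the workspace."""
    root = Path(workspace)
    return [name for name in REQUIRED_BEFORE.get(step_id, [])
            if not (root / name).is_file()]

def can_progress(step_id, workspace="."):
    """A gate like this could run before the next step is handed to the agent."""
    return not missing_artifacts(step_id, workspace)
```

Because the check is on files rather than self-reported confidence, an agent that skipped instrumentation simply has nothing to show and cannot advance.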

package/workflows/bug-investigation.json
ADDED

@@ -0,0 +1,112 @@
{
  "id": "bug-investigation",
  "name": "Bug Investigation",
  "version": "1.0.0",
  "description": "A systematic bug investigation workflow that finds the true source of bugs through strategic planning and evidence-based analysis. Guides agents through plan-then-execute phases to avoid jumping to conclusions.",
  "clarificationPrompts": [
    "What type of system is this? (web app, backend service, CLI tool, etc.)",
    "How reproducible is this bug? (always, sometimes, rarely)",
    "What access do you have? (full codebase, logs, tests, etc.)"
  ],
  "preconditions": [
    "User has a specific bug or failing test to investigate",
    "Agent has codebase access and can run tests/build",
    "Bug is reproducible with specific steps"
  ],
  "metaGuidance": [
    "WHO YOU ARE: You are a special investigator - one of the few who has the patience, determination, and skill to find the TRUE source of bugs.",
    "Most investigators stop at the obvious explanation. You don't. You look past red herrings, challenge assumptions, and dig until you have certainty.",
    "YOUR MISSION: Find the REAL cause of this bug. Not the apparent cause, not the first explanation, but the actual source with evidence to prove it.",
    "WHY THIS WORKFLOW EXISTS: It gives you a systematic process to avoid the traps that catch other investigators - jumping to conclusions, confirmation bias, surface-level analysis.",
    "HOW IT WORKS: Each phase has two steps: First you PLAN your approach (think strategically), then you EXECUTE it (do the work).",
    "This planning step is critical - it forces you to think about HOW you'll investigate before diving in. Better plans lead to better investigations.",
    "THE PHASES:",
    "Phase 0: Understand what you're investigating and set up your workspace",
    "Phase 1: Trace how execution flows from entry point to error (follow the code path)",
    "Phase 2: Form multiple hypotheses about what could be causing this (stay open-minded)",
    "Phase 3: Design and add instrumentation to gather evidence (set up your surveillance)",
    "Phase 4: Run instrumented code and collect evidence (gather proof, not assumptions)",
    "Phase 5: Validate your conclusion rigorously (be your harshest critic)",
    "Phase 6: Document your findings so others can understand and fix it (prove your case)",
    "CRITICAL DISTINCTION - THEORY VS PROOF:",
    "When you read code and think 'I found it!', you have a THEORY. Theories feel certain but are often wrong.",
    "PROOF comes from running instrumented code, collecting evidence, ruling out alternatives, and validating rigorously.",
    "You must complete all phases to get from theory to proof. No shortcuts, even with high confidence.",
    "YOUR DELIVERABLE: A diagnostic writeup that proves you found the true source - complete with evidence, alternative explanations ruled out, and reproduction steps.",
    "SUCCESS MEANS: Someone reading your writeup can fix the bug confidently because you've proven what's actually happening and why.",
    "WORKFLOW MECHANICS: Call workflow_next to get each phase. Complete the phase (both plan and execute). Call workflow_next again. Repeat until isComplete=true."
  ],
  "steps": [
    {
      "id": "phase-0-setup",
      "title": "Phase 0: Investigation Setup",
      "prompt": "**UNDERSTAND THE PROBLEM & SET UP YOUR WORKSPACE**\n\nBefore you start investigating, you need to understand what you're looking for and prepare your workspace.\n\n**Your Task**: Set up everything you need for a systematic investigation.\n\n**Questions to Answer**:\n- What exactly is the reported problem?\n- What's the expected vs actual behavior?\n- How is it reproduced?\n- What error messages or symptoms exist?\n- What information do you have (logs, stack traces, etc.)?\n- What tools and access do you have?\n- What workspace do you need (branch, investigation directory)?\n\n**Set Up**:\n- Create INVESTIGATION_CONTEXT.md to track your investigation\n- Document the bug description and reproduction steps\n- Note any initial assumptions you'll need to verify\n- Set up a workspace (branch or directory) if appropriate\n- Clarify any user preferences\n\n**OUTPUT**: INVESTIGATION_CONTEXT.md with:\n- Clear description of the bug\n- Reproduction steps\n- Initial information (stack traces, logs, errors)\n- Your workspace location\n- Any early assumptions to verify later\n\n**Before Proceeding**: Can you clearly explain this bug to someone else? Do you know how to reproduce it?",
      "agentRole": "You are beginning your investigation. Take time to understand what you're looking for before you start looking.",
      "requireConfirmation": false
    },
    {
      "id": "phase-1a-plan",
      "title": "Phase 1A: Plan Your Investigation Approach",
      "prompt": "**PLAN HOW YOU'LL TRACK DOWN THIS BUG**\n\nYou're about to analyze the codebase. But first, think strategically about HOW you'll investigate.\n\n**Think Through**:\n\n1. **Where does execution start?**\n - What triggers this bug? (API call, user action, test, scheduled job?)\n - Where in the code does execution begin?\n\n2. **What's your investigation strategy?**\n - Will you trace execution flow from entry to error?\n - Will you start at the error and work backwards?\n - Will you examine recent changes first?\n - How will you identify the key points to investigate?\n\n3. **What could cause you to miss the real issue?**\n - Focusing too narrowly on one area?\n - Missing indirect causes or side effects?\n - Assuming things work as documented?\n - Not checking alternative execution paths?\n\n4. **What's your analysis plan?**\n - List the sequence of investigations you'll do\n - What will you look for at each step?\n - How will you know when you understand enough?\n\n**OUTPUT**: Update INVESTIGATION_CONTEXT.md with \"Phase 1 Investigation Plan\" section:\n- Your investigation strategy\n- Sequence of steps you'll take\n- Key questions you need to answer\n- Risks you're watching out for\n\n**Self-Check**: Is your plan specific enough to follow? Does it account for the ways you might miss the real cause?",
      "agentRole": "You are a strategic investigator planning your approach. Think before you dive in.",
      "requireConfirmation": false
    },
    {
      "id": "phase-1b-execute",
      "title": "Phase 1B: Execute Your Investigation",
      "prompt": "**CARRY OUT YOUR INVESTIGATION PLAN**\n\nNow execute the investigation strategy you designed.\n\n**Execute Your Plan**:\n- Follow the sequence of investigations you planned\n- Trace execution flow from entry point to error\n- Track how data flows and state changes\n- Read the actual code at key points\n- Note anything suspicious or unexpected\n- Adapt your plan if you discover new information\n\n**Document As You Go**:\nCreate ExecutionFlow.md with:\n- **Entry Point**: Where execution begins\n- **Call Chain**: Step-by-step path from entry to error (with file:line)\n- **Data Flow**: How data transforms along the way\n- **State Changes**: What gets modified\n- **Suspicious Points**: Code that could be problematic\n- **Patterns**: How things normally work vs how they work in failing case\n\n**Self-Critique**:\n- Did you follow your plan or skip steps?\n- Did you actually trace the execution flow, or just read code?\n- What did you learn that surprised you?\n- What are you still uncertain about?\n- Did your plan work, or should you investigate differently?\n\n**Critical Reminder**: You're building understanding of what the code DOES. You don't have a diagnosis yet - that comes later after you form and test hypotheses.",
      "agentRole": "You are executing your investigation plan. Stay systematic and document what you find.",
      "requireConfirmation": false
    },
    {
      "id": "phase-2a-plan",
      "title": "Phase 2A: Plan Your Hypothesis Development",
      "prompt": "**PLAN HOW YOU'LL FORM HYPOTHESES**\n\nBased on your investigation, you'll now develop hypotheses about what's causing the bug.\n\n**Think Through**:\n\n1. **What patterns did you notice?**\n - From your execution flow tracing, what stood out?\n - What code seemed suspicious?\n - What assumptions are baked into the code?\n\n2. **What types of causes should you consider?**\n - Logic errors in the code?\n - Data issues (wrong format, corruption, missing)?\n - Timing or race conditions?\n - Environment or configuration issues?\n - Integration problems with dependencies?\n\n3. **How will you avoid anchoring on your first idea?**\n - How many alternative hypotheses will you generate?\n - How will you challenge your initial impressions?\n - What evidence would contradict your leading theory?\n\n4. **What makes a good hypothesis?**\n - Specific enough to test\n - Explains all the symptoms\n - Has clear evidence for/against\n - Can be proven or disproven\n\n**OUTPUT**: Update INVESTIGATION_CONTEXT.md with \"Phase 2 Hypothesis Strategy\":\n- How you'll generate multiple hypotheses\n- What types of causes you'll consider\n- How you'll avoid confirmation bias\n- How you'll test your hypotheses\n\n**Self-Check**: Are you committed to generating multiple hypotheses, or are you already attached to one idea?",
      "agentRole": "You are strategizing about hypothesis formation. Commit to staying open-minded.",
      "requireConfirmation": false
    },
    {
      "id": "phase-2b-execute",
      "title": "Phase 2B: Develop and Prioritize Hypotheses",
      "prompt": "**FORM MULTIPLE HYPOTHESES ABOUT THE BUG**\n\nNow generate your hypotheses following your strategy.\n\n**Generate Hypotheses**:\n\nFor each possible cause, create a hypothesis:\n\n**Hypothesis Template**:\n- **ID**: H1, H2, H3, etc.\n- **Statement**: \"The bug occurs because [specific cause]\"\n- **Evidence For**: What from your investigation supports this?\n- **Evidence Against**: What contradicts this or makes it unlikely?\n- **How to Test**: What evidence would prove/disprove this?\n- **Likelihood** (1-10): Based on current evidence\n\n**Generate 3-7 hypotheses**. Force yourself to consider alternatives even if one seems obvious.\n\n**Prioritize**:\nRank by:\n1. Likelihood (evidence strength)\n2. Testability (can you validate it?)\n3. Completeness (explains all symptoms?)\n\n**Plan Validation**:\nFor top 3-5 hypotheses:\n- What instrumentation would prove/disprove each?\n- What tests should you run?\n- What experiments could distinguish between them?\n\n**OUTPUT**: Create Hypotheses.md with all hypotheses, rankings, and validation strategy.\n\n**🚨 CRITICAL - YOU ARE NOT DONE:**\n\nYou now have theories. You do NOT have proof.\n\nEven if H1 has 10/10 likelihood, it's based on reading code, not evidence from running code.\n\nYou MUST continue to Phase 3 (design instrumentation) and Phase 4 (collect evidence).\n\nThis is not optional. High confidence without evidence = educated guess, not diagnosis.\n\nCall workflow_next to continue.",
      "agentRole": "You are forming competing hypotheses. Stay open to alternatives even if one seems obvious.",
      "requireConfirmation": false
    },
    {
      "id": "phase-3a-plan",
      "title": "Phase 3A: Design Your Instrumentation Strategy",
      "prompt": "**PLAN HOW YOU'LL GATHER EVIDENCE**\n\nYou have hypotheses. Now design how you'll gather evidence to test them.\n\n**Think Through**:\n\n1. **What evidence would prove each hypothesis?**\n - For H1, what specific data points would confirm it?\n - For H2, what would you observe if it's correct?\n - How can you distinguish between competing hypotheses?\n\n2. **Where should you add instrumentation?**\n - What points in the execution flow are critical?\n - Where could you observe the data/state you need?\n - What's already being logged vs what do you need to add?\n\n3. **What's the right level of detail?**\n - Too much logging = noise and hard to analyze\n - Too little = gaps and missing evidence\n - How will you balance this?\n\n4. **Can you use existing tests?**\n - Are there tests you can enhance instead of adding new logging?\n - Can you modify tests to expose the state you need?\n - Should you write new targeted tests?\n\n**OUTPUT**: Update INVESTIGATION_CONTEXT.md with \"Phase 3 Instrumentation Plan\":\n- What evidence you need for each hypothesis\n- Where you'll add instrumentation (file:line)\n- What you'll log/observe at each point\n- Test scenarios you'll prepare\n- How you'll organize output to distinguish hypotheses\n\n**Self-Check**: Will this instrumentation actually give you the evidence you need? What might you miss?",
      "agentRole": "You are designing your evidence collection strategy. Think carefully about what you need to prove.",
      "requireConfirmation": false
    },
    {
      "id": "phase-3b-execute",
      "title": "Phase 3B: Implement Your Instrumentation",
      "prompt": "**ADD INSTRUMENTATION AND PREPARE TEST SCENARIOS**\n\nNow implement the instrumentation strategy you designed.\n\n**Implement**:\n- Add debug logging at the points you identified\n- Enhance or create tests to expose necessary state\n- Add assertions to catch violations\n- Set up controlled experiments if needed\n- Label everything clearly ([H1], [H2], etc.)\n\n**Prepare Test Scenarios**:\n- Minimal reproduction case\n- Edge cases that might behave differently\n- Working scenarios for comparison\n- Variations that test specific hypotheses\n\n**OUTPUT**: Update INVESTIGATION_CONTEXT.md with:\n- List of instrumentation added (what/where/why)\n- Test scenarios prepared\n- Expected outcomes for each hypothesis\n- How you'll analyze results\n\n**Self-Critique**:\n- Did you add the instrumentation you planned?\n- Did you skip any because it seemed unnecessary?\n- Is your instrumentation labeled clearly?\n- Are your test scenarios sufficient?\n\n**Readiness Check**: If you run these tests, will you get the evidence you need to prove/disprove your hypotheses?",
      "agentRole": "You are implementing your evidence collection plan. Good instrumentation is the foundation of proof.",
      "requireConfirmation": false
    },
    {
      "id": "phase-4-execute",
      "title": "Phase 4: Collect Evidence",
      "prompt": "**RUN INSTRUMENTED CODE AND COLLECT EVIDENCE**\n\nNow run your test scenarios and collect the evidence.\n\n**Execute**:\n- Run minimal reproduction case\n- Run edge cases and variations\n- Run working scenarios for comparison\n- Capture all output (logs, errors, test results)\n\n**Organize Evidence**:\nFor each hypothesis, create Evidence_H1.md, Evidence_H2.md, etc.:\n- What did the instrumentation reveal?\n- Does behavior match predictions?\n- What unexpected findings emerged?\n- Quality rating (1-10): How strong is this evidence?\n\n**Analyze Patterns**:\n- Which hypotheses are supported by evidence?\n- Which are contradicted?\n- Are there patterns you didn't predict?\n- Do you need different instrumentation?\n- Should you form new hypotheses?\n\n**Update Hypotheses**:\nUpdate Hypotheses.md with:\n- Evidence collected for each\n- New likelihood scores based on evidence\n- Evidence quality ratings\n- New insights or remaining questions\n\n**Decision Point**:\n- Strong evidence (8+/10) for one hypothesis? → Proceed to validation\n- Need more instrumentation? → Go back and add it\n- Need to revise hypotheses? → Update them\n\nBut you're not done until you have strong evidence. Keep investigating.",
      "agentRole": "You are collecting evidence systematically. Let the data guide you, not your assumptions.",
      "requireConfirmation": false
    },
    {
      "id": "phase-5-validate",
      "title": "Phase 5: Validate Your Conclusion",
      "prompt": "**RIGOROUSLY VALIDATE YOUR FINDING**\n\nYou have a leading hypothesis with evidence. Now be your harshest critic.\n\n**State Your Conclusion**:\n- What hypothesis has the strongest evidence?\n- What's your confidence (1-10)?\n- What evidence supports it?\n\n**Challenge Yourself (Adversarial Review)**:\n\n1. **Alternative Explanations**: What else could explain the evidence you collected?\n2. **Contradicting Evidence**: What evidence doesn't fit your conclusion?\n3. **Bias Check**: Are you seeing what you expect to see?\n4. **Completeness**: Does this explain ALL symptoms, or just some?\n5. **Edge Cases**: Does your explanation hold for all scenarios?\n6. **Reproducibility**: Can you reliably reproduce the bug based on your understanding?\n\n**If confidence < 9/10**:\n- What specific test would raise confidence?\n- What alternative should you rule out?\n- What additional evidence do you need?\n- Go collect that evidence\n\n**Final Assessment**:\nAnswer these YES/NO:\n- Does this explain all observed symptoms?\n- Have you ruled out major alternatives?\n- Can you reproduce the bug based on this understanding?\n- Would you stake your reputation on this diagnosis?\n- Is there any contradicting evidence?\n\n**OUTPUT**: ValidationReport.md with:\n- Leading hypothesis and evidence\n- Alternatives considered and ruled out\n- Adversarial review findings\n- Final confidence score\n- Remaining uncertainties\n\n**Threshold**: 9+/10 confidence with strong evidence to proceed. If not, keep investigating.",
      "agentRole": "You are validating your conclusion rigorously. Be skeptical of your own findings.",
      "requireConfirmation": false
    },
    {
      "id": "phase-6-writeup",
      "title": "Phase 6: Prove Your Case",
      "prompt": "**DOCUMENT YOUR INVESTIGATION - PROVE YOU FOUND THE TRUE SOURCE**\n\nYou've found the true source of the bug. Now prove it to others.\n\n**Your Task**: Create a diagnostic writeup that proves your case.\n\n**Structure**:\n\n**1. EXECUTIVE SUMMARY** (3-5 sentences)\n- What's the bug?\n- What's the true cause?\n- How confident are you? (should be 9-10/10)\n- What's the impact?\n\n**2. THE TRUE SOURCE** (detailed)\n- Explain the root cause\n- Why this causes the observed symptoms\n- Code locations (file:line)\n- Relevant code snippets\n\n**3. THE PROOF** (your evidence)\n- Key evidence that proves this diagnosis\n- How you collected it (instrumentation, tests)\n- Evidence quality and sources\n- Why alternative explanations don't fit\n\n**4. HOW TO REPRODUCE**\n- Minimal steps to reproduce\n- What to observe that confirms the diagnosis\n- Conditions required\n\n**5. YOUR INVESTIGATION**\n- What you analyzed\n- Hypotheses you tested\n- How you arrived at the conclusion\n- Key turning points\n\n**6. FIXING IT**\n- Suggested approach (conceptual)\n- Risks to consider\n- How to verify the fix\n- Tests that should be added\n\n**7. UNCERTAINTIES** (if any)\n- What you're still unsure about\n- Edge cases needing more investigation\n\n**OUTPUT**: DIAGNOSTIC_WRITEUP.md\n\n**Quality Check**:\n- Could someone fix this bug confidently from your writeup?\n- Have you proven your case with evidence?\n- Is it clear WHY this is the true source, not just a symptom?\n\n**Mission Complete**: You've tracked down the true source and proven it. Well done.",
      "agentRole": "You are documenting your successful investigation. You found the truth - now prove it to others.",
      "requireConfirmation": false
    }
  ]
}

package/workflows/{systematic-bug-investigation-simplified.json → systematic-bug-investigation-simplified.backup-20251106-155300.json}
RENAMED

@@ -30,8 +30,9 @@
 "Phase 6 (Writeup): Synthesizes everything into actionable knowledge for whoever fixes this",
 "THE CRITICAL DISTINCTION - FINDING VS PROVING:",
 "When you look at code and think 'I found the bug!', you have formed a hypothesis based on pattern matching. This is valuable but not sufficient.",
-"
-"
+"Reading code and feeling confident = THEORY. Running instrumented code and collecting evidence = PROOF. Only proof completes the investigation.",
+"Even with 10/10 confidence after Phase 1 or 2, you have ZERO proof. Continue to Phases 3-5 to gather evidence. This is NOT negotiable.",
+"Common mistake: 'I'm very confident so I'll skip instrumentation.' This fails ~90% of the time. High confidence without evidence = educated guess, not diagnosis.",
 "HOW TO USE THIS WORKFLOW: Call workflow_next to get each phase. Complete that phase's work (including all documentation). Call workflow_next again. Repeat until isComplete=true.",
 "Each phase will guide you through what to do and what to produce.",
 "REFLECTIVE PRACTICE: This workflow asks you to design your approach for each phase, then execute it. Think through 'what would be most effective here?' before diving in.",

@@ -50,25 +51,26 @@
 {
 "id": "phase-1-analysis",
 "title": "Phase 1: Codebase Analysis",
-"prompt": "**ANALYZE THE CODEBASE**\n\n**Your Task**: Understand the code around this bug
-"agentRole": "You are a systematic investigator
+"prompt": "**ANALYZE THE CODEBASE BY FOLLOWING EXECUTION FLOW**\n\n**Your Task**: Understand the code around this bug by tracing how execution flows from entry point to error.\n\n**STEP 1 - Design Your Analysis Approach**\n\nBefore you start reading code, think through:\n- Where does execution start? (user action, API call, test, scheduled job)\n- What's the path from entry point to the error location?\n- What are the key decision points along that path?\n- Where could data be transformed or corrupted?\n- What would prevent you from missing the real cause?\n\n**Document your approach in INVESTIGATION_CONTEXT.md under \"Phase 1 Analysis Plan\"**\n\n**STEP 2 - Trace Execution Flow**\n\nFollow the code execution step-by-step from entry to error:\n\n1. **Entry Point**: Where does execution begin for this bug?\n - API endpoint, CLI command, event handler, test case?\n - What are the initial inputs/parameters?\n\n2. **Execution Path**: Trace the call chain step-by-step\n - List each function/method call in order\n - Note what data flows between calls\n - Identify branches/conditionals and which path is taken\n - Mark where the error occurs\n\n3. **State Changes**: Track how state evolves\n - What variables are created/modified?\n - What database/file operations happen?\n - What gets cached or stored?\n\n4. **Data Transformations**: Follow data through the system\n - Input format → transformations → output format\n - Where could data become invalid?\n - What validation happens (or doesn't)?\n\n**STEP 3 - Document Findings**\n\nCreate ExecutionFlow.md with:\n- **Entry Point**: Where execution starts\n- **Call Chain**: Step-by-step execution path with file:line references\n- **Data Flow**: How data transforms from input to error point\n- **State Changes**: What gets modified along the way\n- **Decision Points**: Conditionals/branches that affect the path\n- **Suspicious Points**: Where things could go wrong\n\n**STEP 4 - Analyze Code at Each Step**\n\nFor the key steps in your execution path:\n- Read the actual implementation\n- Check for error handling (or lack of it)\n- Look for validation logic\n- Note assumptions in the code\n- Identify patterns and deviations\n\n**STEP 5 - Self-Critique**\n\nAnswer honestly:\n- Did I trace the complete execution path from entry to error?\n- Are there alternative paths I didn't consider?\n- Did I understand what each step does?\n- What am I still uncertain about?\n- Did I skip any steps in the call chain?\n\n**Confidence Check**: Rate 1-10 how well you understand the execution flow. (Should be 7-9 to proceed)\n\n**CRITICAL**: This analysis builds understanding. You do NOT have a diagnosis yet. You're mapping the terrain before forming theories.",
+"agentRole": "You are a systematic investigator tracing code execution like following breadcrumbs. Focus on the actual path the code takes, step by step.",
 "guidance": [
-"
-"The
-"
-"
+"Execution flow tracing is concrete: follow function calls, not just read code",
+"The goal is to see what ACTUALLY happens, not what should happen",
+"This creates a foundation for hypotheses in Phase 2",
+"Don't jump to conclusions yet - just map the flow"
 ],
 "requireConfirmation": false
 },
 {
 "id": "phase-2-hypotheses",
 "title": "Phase 2: Hypothesis Formation",
-"prompt": "**FORM HYPOTHESES ABOUT THE BUG**\n\n**Your Task**: Based on your analysis, develop testable hypotheses about what's causing the bug.\n\n**STEP 1 - Brainstorm Possible Causes**\n\nFrom your analysis, what could be causing this bug? Consider:\n- Code defects (logic errors, missing validation, race conditions)\n- Data issues (corruption, unexpected formats, missing data)\n- Environment factors (config, timing, resource limits)\n- Integration problems (API changes, dependency issues)\n\nGenerate 3-7 possible causes. Be creative but grounded in your analysis.\n\n**STEP 2 - Develop Testable Hypotheses**\n\nFor each possible cause, formulate a testable hypothesis:\n\n**Hypothesis Template**:\n- **ID**: H1, H2, etc.\n- **Statement**: \"The bug occurs because [specific cause]\"\n- **Evidence For**: What from your analysis supports this?\n- **Evidence Against**: What contradicts or weakens this?\n- **How to Test**: What evidence would prove/disprove this?\n- **Likelihood**: 1-10 based on current evidence\n\n**STEP 3 - Prioritize**\n\nRank your hypotheses by:\n1. Likelihood (based on evidence)\n2. Testability (can you validate it easily?)\n3. Impact (does it fully explain the symptoms?)\n\nFocus on top 3-5 hypotheses.\n\n**STEP 4 - Plan Validation Strategy**\n\nFor your top hypotheses, design how you'll gather evidence:\n- What instrumentation/logging do you need?\n- What tests should you run?\n- What code experiments could prove/disprove?\n- What data should you examine?\n\n**OUTPUT**: Create Hypotheses.md with:\n- All hypotheses (using template above)\n- Priority ranking with justification\n- Validation strategy for top 3-5\n- Questions that would help narrow down\n\n**Self-Assessment**:\n- Do your hypotheses explain all the symptoms?\n- Are they specific enough to be testable?\n- Have you considered alternative explanations?\n- Are you anchoring too much on your first impression
+"prompt": "**FORM HYPOTHESES ABOUT THE BUG**\n\n**Your Task**: Based on your analysis, develop testable hypotheses about what's causing the bug.\n\n**STEP 1 - Brainstorm Possible Causes**\n\nFrom your analysis, what could be causing this bug? Consider:\n- Code defects (logic errors, missing validation, race conditions)\n- Data issues (corruption, unexpected formats, missing data)\n- Environment factors (config, timing, resource limits)\n- Integration problems (API changes, dependency issues)\n\nGenerate 3-7 possible causes. Be creative but grounded in your analysis.\n\n**STEP 2 - Develop Testable Hypotheses**\n\nFor each possible cause, formulate a testable hypothesis:\n\n**Hypothesis Template**:\n- **ID**: H1, H2, etc.\n- **Statement**: \"The bug occurs because [specific cause]\"\n- **Evidence For**: What from your analysis supports this?\n- **Evidence Against**: What contradicts or weakens this?\n- **How to Test**: What evidence would prove/disprove this?\n- **Likelihood**: 1-10 based on current evidence\n\n**STEP 3 - Prioritize**\n\nRank your hypotheses by:\n1. Likelihood (based on evidence)\n2. Testability (can you validate it easily?)\n3. Impact (does it fully explain the symptoms?)\n\nFocus on top 3-5 hypotheses.\n\n**STEP 4 - Plan Validation Strategy**\n\nFor your top hypotheses, design how you'll gather evidence:\n- What instrumentation/logging do you need?\n- What tests should you run?\n- What code experiments could prove/disprove?\n- What data should you examine?\n\n**OUTPUT**: Create Hypotheses.md with:\n- All hypotheses (using template above)\n- Priority ranking with justification\n- Validation strategy for top 3-5\n- Questions that would help narrow down\n\n**Self-Assessment**:\n- Do your hypotheses explain all the symptoms?\n- Are they specific enough to be testable?\n- Have you considered alternative explanations?\n- Are you anchoring too much on your first impression?\n\n**🚨 CRITICAL - DO NOT STOP HERE:**\n\nEven if you have a hypothesis with 10/10 confidence, you do NOT have proof yet. You have an educated guess based on reading code.\n\nYou MUST continue to Phase 3 (Instrumentation) and Phase 4 (Evidence Collection) to gather actual proof.\n\nHaving \"very high confidence\" after reading code is NOT the same as having evidence from running instrumented code.\n\nCall workflow_next to continue to Phase 3. This is not optional.",
|
|
67
68
|
"agentRole": "You are forming testable hypotheses based on evidence, not jumping to conclusions. Multiple competing hypotheses are healthy at this stage.",
|
|
68
69
|
"guidance": [
|
|
69
70
|
"Agents should generate multiple hypotheses, not just their first idea",
|
|
70
71
|
"Forcing 'Evidence Against' helps combat confirmation bias",
|
|
71
|
-
"The validation strategy prepares for next phase"
|
|
72
|
+
"The validation strategy prepares for next phase",
|
|
73
|
+
"CRITICAL: Agents must not stop after Phase 2 even with high confidence"
|
|
72
74
|
],
|
|
73
75
|
"requireConfirmation": false
|
|
74
76
|
},
|