@exaudeus/workrail 0.7.2-beta.4 → 0.7.2-beta.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@exaudeus/workrail",
-  "version": "0.7.2-beta.4",
+  "version": "0.7.2-beta.5",
   "description": "MCP server for structured workflow orchestration and step-by-step task guidance",
   "license": "MIT",
   "bin": {
@@ -0,0 +1,115 @@
+{
+  "id": "systematic-bug-investigation-simplified",
+  "name": "Bug Investigation (Simplified)",
+  "version": "2.0.0-alpha.1",
+  "description": "A streamlined bug investigation workflow that guides agents through systematic analysis without excessive prescription. Focuses on reflective practice - agents design their approach, then execute it.",
+  "clarificationPrompts": [
+    "What type of system is this? (web app, backend service, CLI tool, etc.)",
+    "How reproducible is this bug? (always, sometimes, rarely)",
+    "What access do you have? (full codebase, logs, tests, etc.)"
+  ],
+  "preconditions": [
+    "User has a specific bug or failing test to investigate",
+    "Agent has codebase access and can run tests/build",
+    "Bug is reproducible with specific steps"
+  ],
+  "metaGuidance": [
+    "WHY THIS WORKFLOW EXISTS: Without structure, agents naturally jump to conclusions after seeing a few lines of code. This feels efficient but leads to wrong diagnoses ~90% of the time.",
+    "Bugs that seem obvious often have deeper root causes, alternative explanations, or critical context that only emerges through systematic investigation.",
+    "WHAT THIS WORKFLOW DOES: Separates investigation into distinct phases that prevent premature conclusions: understand code → form hypotheses → gather evidence → validate → document.",
+    "Each phase builds on the previous and serves a specific purpose in building confidence from theory to proof.",
+    "YOUR GOAL: Produce a comprehensive diagnostic writeup that explains what's happening, why, and provides evidence. Think 'investigative journalist' not 'quick fix developer'.",
+    "The deliverable is NOT a fix, but understanding so complete that someone else could fix it confidently.",
+    "WHY ALL 6 PHASES MATTER:",
+    "Phase 0 (Setup): Prevents misunderstanding the problem you're solving",
+    "Phase 1 (Analysis): Builds context needed to form good hypotheses, not just first impressions",
+    "Phase 2 (Hypotheses): Forces consideration of multiple explanations before committing to one",
+    "Phase 3 (Instrumentation): Sets up the ability to gather real evidence, not just theories",
+    "Phase 4 (Evidence): Collects actual data about what's happening, not assumptions",
+    "Phase 5 (Validation): Challenges your conclusion to catch confirmation bias and alternative explanations",
+    "Phase 6 (Writeup): Synthesizes everything into actionable knowledge for whoever fixes this",
+    "THE CRITICAL DISTINCTION - FINDING VS PROVING:",
+    "When you look at code and think 'I found the bug!', you have formed a hypothesis based on pattern matching. This is valuable but not sufficient.",
+    "Proving requires: evidence from running instrumented code, multiple independent confirmations, ruling out alternatives, and validation that your explanation accounts for ALL symptoms.",
+    "This is why you must complete all phases even when Phase 1 makes the bug 'obvious'. What feels obvious is often wrong or incomplete.",
+    "HOW TO USE THIS WORKFLOW: Call workflow_next to get each phase. Complete that phase's work (including all documentation). Call workflow_next again. Repeat until isComplete=true.",
+    "Each phase will guide you through what to do and what to produce.",
+    "REFLECTIVE PRACTICE: This workflow asks you to design your approach for each phase, then execute it. Think through 'what would be most effective here?' before diving in.",
+    "Your expertise matters, but within the structure of gathering evidence systematically rather than jumping to conclusions.",
+    "SUCCESS LOOKS LIKE: Someone reading your writeup understands: what the bug is, why it occurs, how you know (evidence), what was ruled out, how to reproduce, and what to consider when fixing.",
+    "They should feel confident proceeding with a fix based on your thorough investigation."
+  ],
+  "steps": [
+    {
+      "id": "phase-0-setup",
+      "title": "Phase 0: Investigation Setup",
+      "prompt": "**SETUP YOUR INVESTIGATION**\n\nBefore diving into code, establish your investigation context:\n\n1. **Triage**: Understand the bug report\n - What's the reported problem?\n - What's the expected vs actual behavior?\n - What error messages or symptoms exist?\n - How is it reproduced?\n\n2. **Context Gathering**: Collect initial information\n - Stack traces, error logs, or test failures\n - Recent changes that might be related\n - System type and architecture\n - Your access level and available tools\n\n3. **Investigation Workspace**: Set up\n - Create a branch or investigation directory if appropriate\n - Document initial understanding in INVESTIGATION_CONTEXT.md\n - Note any early assumptions to verify later\n\n4. **User Preferences**: Clarify\n - How should I handle large log volumes?\n - Should I proceed automatically between phases or check in?\n - Any specific areas you want me to focus on or avoid?\n\n**OUTPUT**: Create INVESTIGATION_CONTEXT.md with:\n- Bug description and symptoms\n- Reproduction steps\n- Initial context\n- Investigation workspace location\n- User preferences\n\n**Self-Assessment**: On a scale of 1-10, how well do you understand what you're investigating? (Should be 6-8 after this phase)",
+      "agentRole": "You are setting up a systematic investigation. Focus on understanding the problem and establishing your workspace.",
+      "requireConfirmation": false
+    },
+    {
+      "id": "phase-1-analysis",
+      "title": "Phase 1: Codebase Analysis",
+      "prompt": "**ANALYZE THE CODEBASE**\n\n**Your Task**: Understand the code around this bug well enough to form solid hypotheses.\n\n**STEP 1 - Design Your Analysis Approach**\n\nBefore you start reading code, think through:\n- What would someone unfamiliar with this codebase need to understand to diagnose this bug?\n- What are the key areas to investigate? (error location, callers, data flow, dependencies, tests?)\n- How deep should you go? (When do you have enough understanding?)\n- What would prevent you from missing the real cause?\n- What's your analysis plan? (sequence of investigations)\n\n**Document your approach in INVESTIGATION_CONTEXT.md under \"Phase 1 Analysis Plan\"**\n\nInclude:\n- Key questions you need to answer\n- Areas to investigate (in order)\n- How you'll know when you have sufficient understanding\n- Specific risks or blind spots you want to avoid\n\n**STEP 2 - Execute Your Analysis**\n\nNow carry out your plan. As you analyze:\n- Map the code structure around the bug\n- Understand how the failing code is reached\n- Identify patterns and any deviations\n- Trace data flow and dependencies\n- Review tests and what they cover/miss\n- Check recent changes\n\n**STEP 3 - Document Findings**\n\nCreate AnalysisFindings.md with:\n- **Code Structure**: Key components and their relationships\n- **Suspicious Areas**: Code that seems problematic (with file:line references)\n- **Patterns Observed**: How this code typically works vs how it works in failing case\n- **Data Flow**: How data moves through the system\n- **Test Coverage**: What's tested, what's not\n- **Recent Changes**: Relevant commits or modifications\n\n**STEP 4 - Self-Critique**\n\nAnswer honestly:\n- What areas did you analyze thoroughly?\n- What areas did you skim or skip? Why?\n- What are you still uncertain about?\n- Did you analyze too narrowly? Too broadly?\n- What might you have missed?\n\n**Confidence Check**: Rate 1-10 how well you understand the code now. (Should be 7-9 to proceed)\n\n**NOTE**: You're building understanding, not diagnosing yet. You should have suspicious areas but not firm conclusions.",
+      "agentRole": "You are a systematic investigator analyzing unfamiliar code. Design your approach thoughtfully, then execute it thoroughly.",
+      "guidance": [
+        "Let the agent design the analysis approach - don't prescribe exactly how",
+        "The self-critique step helps catch shallow analysis",
+        "Confidence check provides a calibration point",
+        "Focus on understanding, not jumping to conclusions"
+      ],
+      "requireConfirmation": false
+    },
+    {
+      "id": "phase-2-hypotheses",
+      "title": "Phase 2: Hypothesis Formation",
+      "prompt": "**FORM HYPOTHESES ABOUT THE BUG**\n\n**Your Task**: Based on your analysis, develop testable hypotheses about what's causing the bug.\n\n**STEP 1 - Brainstorm Possible Causes**\n\nFrom your analysis, what could be causing this bug? Consider:\n- Code defects (logic errors, missing validation, race conditions)\n- Data issues (corruption, unexpected formats, missing data)\n- Environment factors (config, timing, resource limits)\n- Integration problems (API changes, dependency issues)\n\nGenerate 3-7 possible causes. Be creative but grounded in your analysis.\n\n**STEP 2 - Develop Testable Hypotheses**\n\nFor each possible cause, formulate a testable hypothesis:\n\n**Hypothesis Template**:\n- **ID**: H1, H2, etc.\n- **Statement**: \"The bug occurs because [specific cause]\"\n- **Evidence For**: What from your analysis supports this?\n- **Evidence Against**: What contradicts or weakens this?\n- **How to Test**: What evidence would prove/disprove this?\n- **Likelihood**: 1-10 based on current evidence\n\n**STEP 3 - Prioritize**\n\nRank your hypotheses by:\n1. Likelihood (based on evidence)\n2. Testability (can you validate it easily?)\n3. Impact (does it fully explain the symptoms?)\n\nFocus on top 3-5 hypotheses.\n\n**STEP 4 - Plan Validation Strategy**\n\nFor your top hypotheses, design how you'll gather evidence:\n- What instrumentation/logging do you need?\n- What tests should you run?\n- What code experiments could prove/disprove?\n- What data should you examine?\n\n**OUTPUT**: Create Hypotheses.md with:\n- All hypotheses (using template above)\n- Priority ranking with justification\n- Validation strategy for top 3-5\n- Questions that would help narrow down\n\n**Self-Assessment**:\n- Do your hypotheses explain all the symptoms?\n- Are they specific enough to be testable?\n- Have you considered alternative explanations?\n- Are you anchoring too much on your first impression?",
+      "agentRole": "You are forming testable hypotheses based on evidence, not jumping to conclusions. Multiple competing hypotheses are healthy at this stage.",
+      "guidance": [
+        "Agents should generate multiple hypotheses, not just their first idea",
+        "Forcing 'Evidence Against' helps combat confirmation bias",
+        "The validation strategy prepares for next phase"
+      ],
+      "requireConfirmation": false
+    },
+    {
+      "id": "phase-3-instrumentation",
+      "title": "Phase 3: Instrumentation & Test Setup",
+      "prompt": "**INSTRUMENT THE CODE FOR EVIDENCE COLLECTION**\n\n**Your Task**: Add instrumentation that will generate evidence to test your hypotheses.\n\n**STEP 1 - Design Your Instrumentation Strategy**\n\nBefore adding any logging or test modifications, think through:\n- What specific data points would prove/disprove each hypothesis?\n- Where in the code should you add instrumentation?\n- What's the right level of detail? (too much = noise, too little = gaps)\n- How will you organize/label output to distinguish between hypotheses?\n- Are there existing tests you can enhance instead of adding new logging?\n\nDocument your strategy.\n\n**STEP 2 - Implement Instrumentation**\n\nAdd the instrumentation you designed. This might include:\n- Debug logging at key points\n- Assertions to catch violations\n- Test modifications to expose state\n- Controlled code experiments (add guards, inject failures)\n- Enhanced error messages\n\nLabel your instrumentation clearly (e.g., \"[H1]\" for hypothesis 1 evidence)\n\n**STEP 3 - Prepare Test Scenarios**\n\nSet up scenarios that will trigger the bug and generate evidence:\n- Minimal reproduction case\n- Edge cases that might behave differently\n- Known working scenarios for comparison\n- Variations that test specific hypotheses\n\n**OUTPUT**: Update INVESTIGATION_CONTEXT.md with:\n- Instrumentation points (what/where/why)\n- Test scenarios prepared\n- Expected outcomes for each hypothesis\n- How you'll analyze the results\n\n**Readiness Check**: Are you confident this instrumentation will generate useful evidence? What might you be missing?",
+      "agentRole": "You are a detective setting up surveillance. Good instrumentation makes the evidence collection phase productive.",
+      "requireConfirmation": false
+    },
+    {
+      "id": "phase-4-evidence",
+      "title": "Phase 4: Evidence Collection",
+      "prompt": "**COLLECT EVIDENCE BY RUNNING INSTRUMENTED CODE**\n\n**Your Task**: Execute your test scenarios and collect evidence about your hypotheses.\n\n**STEP 1 - Run Test Scenarios**\n\nExecute the scenarios you prepared:\n- Run the minimal reproduction case\n- Run edge cases and variations\n- Run working cases for comparison\n- Capture all output (logs, errors, test results)\n\n**STEP 2 - Organize Evidence**\n\nFor each hypothesis, collect the evidence:\n- What does the instrumentation reveal?\n- Does behavior match predictions?\n- What unexpected findings emerged?\n- What questions remain unanswered?\n\nCreate evidence files: Evidence_H1.md, Evidence_H2.md, etc.\n\n**STEP 3 - Analyze Patterns**\n\nLook across all evidence:\n- Which hypotheses are supported?\n- Which are contradicted?\n- Are there patterns you didn't predict?\n- Do you need additional instrumentation?\n- Should you form new hypotheses?\n\n**STEP 4 - Evidence Quality Assessment**\n\nFor each hypothesis, rate evidence quality (1-10):\n- How direct is the evidence?\n- How reproducible?\n- Are there alternative explanations?\n- Do multiple independent sources confirm?\n\n**OUTPUT**: Update Hypotheses.md with:\n- Evidence collected for each hypothesis\n- Updated likelihood scores\n- Evidence quality ratings\n- New insights or questions\n\n**Decision Point**: \n- Do you have strong evidence (8+/10) for one hypothesis? → Proceed to validation\n- Do you need more instrumentation? → Document what's needed\n- Do you need to revise hypotheses? → Update and continue",
+      "agentRole": "You are gathering and analyzing evidence systematically. Let the data guide you, not your initial assumptions.",
+      "guidance": [
+        "Evidence quality matters - weak evidence shouldn't drive conclusions",
+        "Multiple independent sources of evidence are stronger than one",
+        "Be open to unexpected findings that suggest new hypotheses"
+      ],
+      "requireConfirmation": false
+    },
+    {
+      "id": "phase-5-validation",
+      "title": "Phase 5: Hypothesis Validation",
+      "prompt": "**VALIDATE YOUR LEADING HYPOTHESIS**\n\n**Your Task**: Rigorously validate your strongest hypothesis before writing it up.\n\n**STEP 1 - State Your Leading Hypothesis**\n\nBased on evidence from Phase 4:\n- What hypothesis has the strongest support?\n- What's your confidence level (1-10)?\n- What evidence supports it?\n\n**STEP 2 - Adversarial Review**\n\nChallenge your own conclusion:\n- **Alternative Explanations**: What else could explain the evidence?\n- **Contradicting Evidence**: What evidence doesn't fit?\n- **Bias Check**: Are you anchoring on your first impression?\n- **Completeness**: Does this explain ALL symptoms?\n- **Edge Cases**: Does it hold for all scenarios?\n\n**STEP 3 - Additional Validation**\n\nIf confidence is below 9/10, gather more evidence:\n- What specific test would raise confidence?\n- What alternative hypothesis should you rule out?\n- What code experiment would be definitive?\n\nExecute these additional validations.\n\n**STEP 4 - Final Confidence Assessment**\n\nAnswer these questions:\n- Does this hypothesis explain all observed symptoms? (Yes/No)\n- Is there contradicting evidence? (Yes/No)\n- Have you ruled out major alternatives? (Yes/No)\n- Can you reproduce the bug based on this understanding? (Yes/No)\n- Would you bet your reputation on this diagnosis? (Yes/No)\n\n**OUTPUT**: Create ValidationReport.md with:\n- Leading hypothesis statement\n- Supporting evidence (with quality ratings)\n- Alternative explanations considered and why ruled out\n- Adversarial review findings\n- Final confidence score (1-10)\n- Remaining uncertainties\n\n**Threshold**: You should have 9+/10 confidence to proceed to writeup. If not, identify what's missing and continue investigation.",
+      "agentRole": "You are rigorously validating your conclusion. Be your own harshest critic.",
+      "guidance": [
+        "Adversarial review helps catch confirmation bias",
+        "9/10 confidence threshold prevents premature conclusions",
+        "Being explicit about remaining uncertainties is valuable"
+      ],
+      "requireConfirmation": false
+    },
+    {
+      "id": "phase-6-writeup",
+      "title": "Phase 6: Diagnostic Writeup",
+      "prompt": "**CREATE COMPREHENSIVE DIAGNOSTIC WRITEUP**\n\n**Your Task**: Document your investigation in a clear, actionable writeup.\n\n**Structure Your Writeup**:\n\n**1. EXECUTIVE SUMMARY** (3-5 sentences)\n- What is the bug?\n- What causes it?\n- How confident are you?\n- What's the impact?\n\n**2. ROOT CAUSE ANALYSIS**\n- Detailed explanation of the root cause\n- Why this causes the observed symptoms\n- Code locations involved (file:line references)\n- Relevant code snippets\n\n**3. EVIDENCE**\n- Key evidence that proves the diagnosis\n- Evidence quality and sources\n- How you validated the hypothesis\n- Alternative explanations considered and ruled out\n\n**4. REPRODUCTION**\n- Minimal steps to reproduce\n- What to observe that confirms the diagnosis\n- Conditions required (environment, data, timing)\n\n**5. INVESTIGATION SUMMARY**\n- What you analyzed\n- Hypotheses you tested\n- How you arrived at the conclusion\n- Time spent and key turning points\n\n**6. NEXT STEPS (for whoever fixes this)**\n- Suggested fix approach (conceptual, not implementation)\n- Risks or considerations for the fix\n- How to verify the fix works\n- Tests that should be added\n\n**7. REMAINING UNCERTAINTIES**\n- What you're still unsure about\n- What couldn't be fully validated\n- Edge cases that need more investigation\n\n**OUTPUT**: Create DIAGNOSTIC_WRITEUP.md with the above structure.\n\n**Quality Check**:\n- Could someone unfamiliar with this investigation understand the bug from reading this?\n- Is it clear enough to enable an effective fix?\n- Have you provided sufficient evidence?\n- Have you been honest about uncertainties?\n\n**WORKFLOW COMPLETE**: Once writeup is created, set isWorkflowComplete=true.",
+      "agentRole": "You are documenting your investigation for others. Clarity and completeness matter. This is the deliverable.",
+      "requireConfirmation": false
+    }
+  ]
+}
+
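
The workflow's metaGuidance describes a simple driver loop: call workflow_next, complete that phase's work, and call workflow_next again until isComplete is true. A minimal TypeScript sketch of that loop follows; the callTool helper, its argument shape, and the workflowId parameter are illustrative assumptions and not APIs confirmed by this diff.

```typescript
// Sketch of the loop described in metaGuidance: call workflow_next, do the
// phase's work, repeat until isComplete is true. The callTool helper and the
// workflowId argument are hypothetical stand-ins, not this package's documented API.

type WorkflowStep = {
  id?: string;
  title?: string;
  prompt?: string;
  isComplete?: boolean;
};

// Assumed MCP client helper; a real client would issue a tools/call request.
declare function callTool(
  name: string,
  args: Record<string, unknown>
): Promise<WorkflowStep>;

async function runBugInvestigation(): Promise<void> {
  // Workflow id taken from the JSON added in this version.
  const workflowId = "systematic-bug-investigation-simplified";

  let step = await callTool("workflow_next", { workflowId });
  while (!step.isComplete) {
    // The agent performs the phase's work here (e.g. writing
    // INVESTIGATION_CONTEXT.md, Hypotheses.md, DIAGNOSTIC_WRITEUP.md)
    // before requesting the next phase.
    console.log(`Current phase: ${step.title ?? step.id}`);
    step = await callTool("workflow_next", { workflowId });
  }
}
```

Each step in the added JSON sets requireConfirmation to false, which suggests a driver like this can advance between phases without pausing for user confirmation.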