@exaudeus/workrail 3.12.0 → 3.13.0

Files changed (47)
  1. package/dist/console/assets/{index-CRgjJiMS.js → index-EsSXrC_a.js} +11 -11
  2. package/dist/console/index.html +1 -1
  3. package/dist/di/container.js +8 -0
  4. package/dist/di/tokens.d.ts +1 -0
  5. package/dist/di/tokens.js +1 -0
  6. package/dist/infrastructure/session/HttpServer.js +2 -14
  7. package/dist/manifest.json +83 -43
  8. package/dist/mcp/boundary-coercion.d.ts +2 -0
  9. package/dist/mcp/boundary-coercion.js +73 -0
  10. package/dist/mcp/handler-factory.d.ts +1 -1
  11. package/dist/mcp/handler-factory.js +13 -6
  12. package/dist/mcp/handlers/v2-manage-workflow-source.d.ts +7 -0
  13. package/dist/mcp/handlers/v2-manage-workflow-source.js +50 -0
  14. package/dist/mcp/server.js +2 -0
  15. package/dist/mcp/tool-descriptions.js +20 -0
  16. package/dist/mcp/tools.js +6 -0
  17. package/dist/mcp/types/tool-description-types.d.ts +1 -1
  18. package/dist/mcp/types/tool-description-types.js +1 -0
  19. package/dist/mcp/types/workflow-tool-edition.d.ts +1 -1
  20. package/dist/mcp/types.d.ts +2 -0
  21. package/dist/mcp/v2/tool-registry.js +8 -0
  22. package/dist/mcp/v2/tools.d.ts +12 -0
  23. package/dist/mcp/v2/tools.js +7 -1
  24. package/dist/v2/infra/in-memory/managed-source-store/index.d.ts +8 -0
  25. package/dist/v2/infra/in-memory/managed-source-store/index.js +33 -0
  26. package/dist/v2/infra/local/data-dir/index.d.ts +2 -0
  27. package/dist/v2/infra/local/data-dir/index.js +6 -0
  28. package/dist/v2/infra/local/managed-source-store/index.d.ts +15 -0
  29. package/dist/v2/infra/local/managed-source-store/index.js +164 -0
  30. package/dist/v2/ports/data-dir.port.d.ts +2 -0
  31. package/dist/v2/ports/managed-source-store.port.d.ts +25 -0
  32. package/dist/v2/ports/managed-source-store.port.js +2 -0
  33. package/package.json +1 -1
  34. package/workflows/adaptive-ticket-creation.json +276 -282
  35. package/workflows/document-creation-workflow.json +70 -191
  36. package/workflows/documentation-update-workflow.json +59 -309
  37. package/workflows/intelligent-test-case-generation.json +37 -212
  38. package/workflows/personal-learning-materials-creation-branched.json +1 -21
  39. package/workflows/presentation-creation.json +143 -308
  40. package/workflows/relocation-workflow-us.json +161 -535
  41. package/workflows/scoped-documentation-workflow.json +110 -181
  42. package/workflows/workflow-for-workflows.v2.json +21 -5
  43. package/workflows/CHANGELOG-bug-investigation.md +0 -298
  44. package/workflows/bug-investigation.agentic.json +0 -212
  45. package/workflows/bug-investigation.json +0 -112
  46. package/workflows/mr-review-workflow.agentic.json +0 -538
  47. package/workflows/mr-review-workflow.json +0 -277
package/workflows/CHANGELOG-bug-investigation.md +0 -298
@@ -1,298 +0,0 @@
1
- # Changelog - Systematic Bug Investigation Workflow
2
-
3
- ## [1.1.0-beta.22] - 2025-01-06
4
-
5
- ### CRITICAL FIX - Invalid Loop Step Schema
6
- - **ROOT CAUSE**: In beta.19, we added `guidance` to the loop step, but loop steps DON'T support guidance in the schema
7
- - Schema allows: `id`, `type`, `title`, `loop`, `body`, `functionDefinitions`, `requireConfirmation`, `runCondition`
8
- - Does NOT allow: `guidance`, `prompt`, `agentRole`
9
- - **Fix**: Moved loop enforcement guidance to first body step (`analysis-neighborhood-contracts`)
10
- - "USER SAYS: This loop MUST complete ALL 5 iterations..."
11
- - Now properly enforced on each iteration
12
- - **Validation**: Workflow now passes full schema validation
13
-
14
- ### Why This Matters
15
- Without proper validation, the MCP server couldn't load the workflow at all. Beta.19-21 were broken due to schema violations.
16
-
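The allow-list in the beta.22 entry can be turned into a quick pre-publish check. The following is a minimal sketch, not WorkRail's actual validator: the helper name `findDisallowedLoopKeys` and the step ID `phase-1-analysis-loop` are hypothetical, and only the list of allowed keys comes from the changelog.

```typescript
// Keys the changelog lists as allowed on a loop step.
const ALLOWED_LOOP_STEP_KEYS = new Set([
  "id", "type", "title", "loop", "body",
  "functionDefinitions", "requireConfirmation", "runCondition",
]);

// Hypothetical helper: returns the keys on a loop step that the schema would reject.
function findDisallowedLoopKeys(step: Record<string, unknown>): string[] {
  return Object.keys(step).filter((key) => !ALLOWED_LOOP_STEP_KEYS.has(key));
}

// Illustrative step shaped like the beta.19 mistake: `guidance` set on the loop itself.
// The ID is made up for this example.
const brokenLoopStep = {
  id: "phase-1-analysis-loop",
  type: "loop",
  title: "Phase 1 analysis",
  guidance: ["USER SAYS: This loop MUST complete ALL 5 iterations."],
  body: [],
};

// The only rejected key here is "guidance".
console.log(findDisallowedLoopKeys(brokenLoopStep));
```

Moving the `guidance` array onto the first body step, as the fix describes, would leave this check with nothing to report for the loop step.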
17
- ## [1.1.0-beta.21] - 2025-01-06
18
-
19
- ### HOTFIX - metaGuidance Schema Violations
20
- - **Fixed**: metaGuidance entry 35 exceeded 256 character limit (266 chars)
21
- - Split "HIGH AUTO MODE DISCIPLINE" into 3 separate entries
22
- - **Fixed**: Duplicate metaGuidance entries after split
23
- - Removed duplicates, cleaned to 89 unique entries
24
- - **Note**: CLI validator reports loop step errors (false positive - loops have different schema)
25
- - Workflow loads successfully in MCP server
26
- - Same loop structure as beta.18 which worked fine
27
-
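Both beta.21 fixes (the 256-character limit and the duplicate entries) are easy to check mechanically. This is a sketch under assumptions: `checkMetaGuidance` and `MetaGuidanceIssue` are hypothetical names, and counting Unicode code points (as JSON Schema's `maxLength` does) is assumed to match how the real limit is enforced.

```typescript
// Limit quoted in the changelog entry.
const MAX_META_GUIDANCE_LENGTH = 256;

interface MetaGuidanceIssue {
  index: number;
  problem: "too-long" | "duplicate";
}

// Hypothetical helper: flags entries over the limit and repeats of earlier entries.
function checkMetaGuidance(entries: string[]): MetaGuidanceIssue[] {
  const issues: MetaGuidanceIssue[] = [];
  const seen = new Set<string>();
  entries.forEach((entry, index) => {
    // Array.from counts code points rather than UTF-16 units (assumed to match the schema).
    if (Array.from(entry).length > MAX_META_GUIDANCE_LENGTH) {
      issues.push({ index, problem: "too-long" });
    }
    if (seen.has(entry)) {
      issues.push({ index, problem: "duplicate" });
    }
    seen.add(entry);
  });
  return issues;
}

// A 266-character entry (the size the changelog reports) and an exact repeat.
const sample = ["FINDING != DONE", "x".repeat(266), "FINDING != DONE"];
console.log(checkMetaGuidance(sample));
```

For the sample above, the second entry is reported as too long and the third as a duplicate.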
28
- ## [1.1.0-beta.20] - 2025-01-06
29
-
30
- ### CRITICAL FIX - Dangerous "Autonomy" Language
31
- - **ROOT CAUSE IDENTIFIED**: Our automation level descriptions were giving agents permission to skip!
32
- - OLD: "High=**auto-approve >8.0 confidence decisions**"
33
- - Interpreted as: "I have 9/10 confidence → I can approve my decision to skip phases"
34
- - OLD: "Control workflow **autonomy**"
35
- - Interpreted as: "High mode gives me autonomy to decide what to skip"
36
-
37
- ### Language Fixes
38
- 1. **Removed "auto-approve decisions"**: Changed to "execute phases automatically WITHOUT asking permission between phases"
39
- 2. **Removed "autonomy"**: Changed to "Control confirmation frequency"
40
- 3. **Clarified HIGH AUTO MODE**:
41
- - NEW: "HIGH AUTO = NO INTERRUPTIONS, NOT NO PHASES"
42
- - NEW: "HIGH AUTO ≠ PERMISSION TO SKIP PHASES"
43
- 4. **Explicit USER SAYS**:
44
- - "USER SAYS: 'High automation mode' means you DON'T ASK PERMISSION. It does NOT mean you have autonomy to decide which phases to skip."
45
- - "High auto = Faster execution of ALL phases. NOT = Smarter agent gets to skip phases."
46
-
47
- ### Credit
48
- User insight: "Could the high automation be causing it to do this? do we frame it as letting it do whatever it wants?" - YES, we were!
49
-
50
- ## [1.1.0-beta.19] - 2025-01-06
51
-
52
- ### CRITICAL FIX - Anti-Rationalization
53
- - **NEW PATTERN DETECTED**: Agents now **acknowledge** the warnings but then **rationalize** why they don't apply
54
- - Example: "I know finding ≠ done... **However, given that I have high confidence...**"
55
- - Example: "Let me proceed with a **more targeted Phase 2**..." (skipping remaining iterations)
56
- - **Problem**: Agents stopped at **iteration 2 of 5** in Phase 1 loop - didn't even finish the analysis phase!
57
- - **Root Cause**: Agents think they can judge when to skip based on their "special" situation
58
-
59
- ### New Anti-Rationalization Safeguards
60
- 1. **Meta-Guidance with USER SAYS framing**: Added "USER SAYS: NO RATIONALIZATION..." section
61
- - **Why USER SAYS**: Agents follow direct user commands more reliably than abstract principles
62
- - "USER SAYS: YOUR SITUATION IS NOT SPECIAL. YOU ARE NOT THE EXCEPTION."
63
- - "USER SAYS: 'I found the bug early' = ALL THE MORE REASON to validate properly"
64
- - Explicitly forbids phrases like "However, given that..." or "targeted Phase X"
65
-
66
- 2. **Loop Enforcement with USER SAYS** (Phase 1 - 5 iterations):
67
- - "USER SAYS: This loop MUST complete ALL 5 iterations. Do NOT exit early."
68
- - "Iteration 2/5 is NOT enough. Iteration 3/5 is NOT enough. Complete 5/5."
69
- - "Agents who skip analysis iterations are wrong ~95% of the time."
70
-
71
- ### Meta-Learning Moment
72
- During implementation, the AI implementing this fix attempted to skip validation by rationalizing "the workflow structure is fine, let me just publish" - demonstrating the EXACT behavior this fix prevents! This validates the need for explicit USER SAYS framing.
73
-
74
- ### Why This Is Different
75
- - Beta.18 addressed goal misunderstanding ("finding" vs "proving")
76
- - Beta.19 addresses **rationalization** - agents who acknowledge the rules but think they're exceptions
77
- - Targets the "smart agent" problem: "I understand the principle, BUT in my case..."
78
-
79
- ## [1.1.0-beta.18] - 2025-01-06
80
-
81
- ### CRITICAL FIX
82
- - **Addresses persistent early-stopping bug**: Agents were still stopping after Phase 1/2 saying "I found the bug"
83
- - **Root Cause Identified**: Agents fundamentally misunderstand THE GOAL
84
- - WRONG: "The goal is finding the bug" → Stop after analysis with high confidence
85
- - RIGHT: "The goal is PROVING the bug with evidence" → Must complete Phases 3-5
86
- - **New Meta-Guidance Section**: Added explicit "CRITICAL MISUNDERSTANDING TO AVOID" section
87
- - "FINDING ≠ DONE. PROVING = DONE."
88
- - "\"I found the bug\" = YOU HAVE A GUESS. \"I proved the bug\" = YOU HAVE EVIDENCE."
89
- - "NEVER create summary documents until Phase 6"
90
- - **Step-Level Warnings**: Added "FINDING ≠ PROVING" warnings at all critical stopping points:
91
- - **Phase 1f** (after analysis): Full explanation of why analysis ≠ proof
92
- - **Phase 2a** (hypothesis development): "You have THEORIES, not EVIDENCE"
93
- - **Phase 2h** (midpoint): "You may have 'found' the bug, but haven't 'proved' it"
94
- - **Step Count Corrections**: Fixed inconsistencies (27 → 23 steps throughout)
95
-
96
- ### Why This Fix Is Different
97
- Previous fixes (beta.1-beta.17) added warnings about "high confidence ≠ done" but didn't address the fundamental goal misunderstanding. Agents thought their job was to "identify" the bug, not "prove" it. This fix makes the distinction crystal clear upfront.
98
-
99
- ## [1.1.0-beta.17] - 2025-01-06
100
-
101
- ### Major Restructuring
102
- - **Phase 0 Consolidation**: Merged 4 separate Phase 0 steps into single comprehensive setup step
103
- - Combined: Triage (0), User Preferences (0a), Tool Check (0b), Context Creation (0c)
104
- - Result: Single "Phase 0: Complete Investigation Setup" step covering all mechanical preparation
105
- - Rationale: Reduce workflow overhead while maintaining thorough setup
106
- - New structure: Phase 0 (Setup) → Phase 0a (Commitment Checkpoint, conditional)
107
-
108
- - **Assumption Verification Relocation**: Moved from Phase 0a to Phase 1f
109
- - Previously: Early assumption check before ANY code analysis (removed)
110
- - Now: Assumption verification AFTER all 5 analysis iterations complete (Phase 1f Step 2.5)
111
- - Rationale: Assumptions can only be properly verified with full code context
112
- - Timing: Happens after neighborhood mapping, pattern analysis, component ranking, data flow tracing, and test gap analysis
113
- - Location: Integrated into Phase 1f "Final Breadth & Scope Verification" before hypothesis development
114
-
115
- ### Impact
116
- - **Step Count**: Reduced from 27 steps to 23 steps (4 Phase 0 steps → 1)
117
- - **Phase Numbering**: Simplified Phase 0 structure (Phase 0d → Phase 0a)
118
- - **Debugging Workflow Alignment**: Better follows traditional debugging principles (observe fully THEN question assumptions THEN hypothesize)
119
- - **Agent Experience**: Faster setup phase, more informed assumption checking
120
-
121
- ### Breaking Changes
122
- - `completedSteps` array format changed:
123
- - OLD: `["phase-0-triage", "phase-0a-user-preferences", "phase-0b-tool-check", "phase-0c-create-context", "phase-0d-workflow-commitment"]`
124
- - NEW: `["phase-0-complete-setup", "phase-0a-workflow-commitment"]`
125
- - Step IDs changed: `phase-0d-workflow-commitment` → `phase-0a-workflow-commitment`
126
-
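For sessions saved before beta.17, the breaking change above implies a one-time rewrite of `completedSteps`. The sketch below is an illustration only: `migrateCompletedSteps` is a hypothetical helper, and treating the merged setup step as complete only when all four old setup steps were complete is an assumption the changelog does not state.

```typescript
// Old Phase 0 step IDs that were merged into "phase-0-complete-setup".
const OLD_SETUP_STEPS = new Set([
  "phase-0-triage",
  "phase-0a-user-preferences",
  "phase-0b-tool-check",
  "phase-0c-create-context",
]);
const OLD_COMMITMENT_STEP = "phase-0d-workflow-commitment";

// Hypothetical migration from the pre-beta.17 format to the new one.
function migrateCompletedSteps(completed: string[]): string[] {
  const done = new Set(completed);
  const migrated: string[] = [];

  // Assumption: the merged step counts as done only if every old setup step was done.
  let allSetupDone = true;
  OLD_SETUP_STEPS.forEach((id) => {
    if (!done.has(id)) allSetupDone = false;
  });
  if (allSetupDone) migrated.push("phase-0-complete-setup");

  // The commitment checkpoint was renamed from 0d to 0a.
  if (done.has(OLD_COMMITMENT_STEP)) migrated.push("phase-0a-workflow-commitment");

  // All other step IDs are carried over unchanged (also an assumption).
  for (const id of completed) {
    if (!OLD_SETUP_STEPS.has(id) && id !== OLD_COMMITMENT_STEP) migrated.push(id);
  }
  return migrated;
}

const oldFormat = [
  "phase-0-triage",
  "phase-0a-user-preferences",
  "phase-0b-tool-check",
  "phase-0c-create-context",
  "phase-0d-workflow-commitment",
];
console.log(migrateCompletedSteps(oldFormat));
```

Applied to the OLD array quoted in the changelog, this yields the NEW array quoted there.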
127
- ## [1.1.0-beta.9] - 2025-01-06
128
-
129
- ### Enhanced
130
- - **CRITICAL**: Strengthened anti-premature-completion safeguards throughout the workflow
131
- - Added explicit "ANALYSIS ≠ DIAGNOSIS ≠ PROOF" section in metaGuidance
132
- - Phase 1f: Added "DO NOT STOP HERE" warning emphasizing ZERO PROOF after analysis (~25% done)
133
- - Phase 2a: Added "YOU ARE NOT DONE" warning with 5-point reminder about mandatory validation
134
- - Phase 2h: Added "YOU ARE HALFWAY DONE (~50%)" warning before instrumentation phase
135
- - Clarified progression: Analysis (20%) → Hypotheses (40%) → Evidence (80%) → Writeup (100%)
136
- - Reinforced: Even with "100% confidence," stopping before evidence collection = providing guesses, not diagnosis
137
-
138
- ### Context
139
- - **Problem**: Agents were stopping after Phase 1 or 2 when they reached "100% confidence" in analysis/hypotheses
140
- - **Root Cause**: Agents conflating "confident theory" with "proven diagnosis"
141
- - **Solution**: Explicit warnings at every potential stopping point emphasizing lack of proof until Phases 3-5 complete
142
- - **Impact**: Forces agents to understand that analysis/hypotheses are NOT evidence, and professional practice requires validation
143
-
144
- ## [1.1.0-beta.8] - 2025-01-06
145
-
146
- ### Fixed
147
- - **CRITICAL**: Fixed loop execution bug where body steps with `runCondition` using iteration variables were completely skipped
148
- - Root cause: Loop variables (e.g., `analysisPhase`) were being injected AFTER evaluating runConditions, causing all conditions to fail
149
- - Impact: Phase 1's 5-iteration analysis loop was being entirely skipped, jumping straight to Phase 1f
150
- - Fix: Reordered logic to inject loop variables BEFORE evaluating body step runConditions
151
- - Also fixed: Pre-existing bug where single-step loop bodies didn't increment iterations properly
152
- - Test coverage: Added comprehensive integration tests (`loop-runCondition-bug.test.ts`) to prevent regression
153
-
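The ordering bug fixed in beta.8 can be reproduced in a few lines. This is a simplified sketch, not the engine's code: the `BodyStep` shape, the `equals`-only condition, and the step ID `analysis-breadth-scan` are assumptions made for illustration; `analysisPhase` and `analysis-neighborhood-contracts` are names taken from the changelog.

```typescript
type Context = Record<string, unknown>;

interface BodyStep {
  id: string;
  // Simplified condition: run only when context[var] equals the given value.
  runCondition?: { var: string; equals: unknown };
}

function shouldRun(step: BodyStep, context: Context): boolean {
  if (!step.runCondition) return true;
  return context[step.runCondition.var] === step.runCondition.equals;
}

// Fixed order: inject the loop variable first, then evaluate each body step's condition.
function stepsForIteration(
  body: BodyStep[],
  context: Context,
  loopVar: string,
  iteration: number,
): string[] {
  const iterationContext: Context = { ...context, [loopVar]: iteration };
  return body
    .filter((step) => shouldRun(step, iterationContext))
    .map((step) => step.id);
}

const analysisBody: BodyStep[] = [
  { id: "analysis-neighborhood-contracts", runCondition: { var: "analysisPhase", equals: 1 } },
  { id: "analysis-breadth-scan", runCondition: { var: "analysisPhase", equals: 2 } },
];

// Buggy order: conditions evaluated before injection, so no step matches.
console.log(analysisBody.filter((step) => shouldRun(step, {})).length);

// Fixed order: iteration 1 selects the first body step only.
console.log(stepsForIteration(analysisBody, {}, "analysisPhase", 1));
```

With the variable missing from the context, every condition fails and the whole loop body is skipped, which matches the symptom the entry describes.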
154
- ## [1.1.0-beta.7] - 2025-01-06
155
-
156
- ### Fixed
157
- - **HOTFIX**: Corrected Phase 0e `runCondition` to use `not_equals` instead of invalid `notEquals` operator
158
- - Phase 0e now properly executes only when `automationLevel != 'High'`
159
- - High automation mode now proceeds through all phases without early termination checkpoint
160
-
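A misspelled operator such as `notEquals` is easier to catch if the evaluator rejects unknown keys instead of ignoring them. The sketch below assumes a condition shape of `{ var, <operator>: value }` and supports only two operators. How the real engine treated `notEquals` is not stated in the changelog, so throwing is a design choice of this example, and the level `"Low"` is hypothetical.

```typescript
// Deliberately tiny operator set for illustration.
const KNOWN_OPERATORS = new Set(["equals", "not_equals"]);

function evaluateCondition(
  condition: Record<string, unknown>,
  context: Record<string, unknown>,
): boolean {
  const variable = condition["var"];
  if (typeof variable !== "string") {
    throw new Error("condition needs a string `var`");
  }
  const operators = Object.keys(condition).filter((key) => key !== "var");
  for (const op of operators) {
    // Fail loudly on typos such as `notEquals`.
    if (!KNOWN_OPERATORS.has(op)) throw new Error(`unknown operator: ${op}`);
  }
  const value = context[variable];
  // Every listed operator must hold; a condition with no operators passes trivially.
  return operators.every((op) =>
    op === "equals" ? value === condition[op] : value !== condition[op],
  );
}

// Runs the checkpoint for any level other than "High".
console.log(evaluateCondition({ var: "automationLevel", not_equals: "High" }, { automationLevel: "Low" }));
```

With `automationLevel` set to `"High"` the same condition evaluates to false, so the checkpoint is skipped, which is the behavior the beta.7 entry describes.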
161
- ## [1.1.0-beta.6] - 2025-01-06
162
-
163
- ### Added
164
- - **New Phase 1f**: Final Breadth & Scope Verification checkpoint after codebase analysis
165
- - Prevents tunnel vision by forcing scope sanity checks before hypothesis development
166
- - Requires evaluation of 2-3 alternative investigation scopes
167
- - Catches the #1 cause of wrong conclusions: looking in wrong place or too narrowly
168
- - Positioned strategically after Phase 1 analysis and before Phase 2 hypothesis formation
169
-
170
- ### Enhanced
171
- - **Phase 3 (Instrumentation)**: Dramatically expanded with concrete, step-by-step instructions
172
- - Language-specific code examples (JavaScript/TypeScript, Python, Java)
173
- - Detailed `search_replace` usage examples for applying instrumentation
174
- - Hypothesis-specific prefixes ([H1], [H2], [H3]) with standard formatting
175
- - File-by-file workflow: read → locate → instrument → verify
176
- - Fallback strategy if edit tools unavailable
177
- - Instrumentation checklist for tracking progress
178
-
179
- - **Phase 4 (Evidence Collection)**: Comprehensive decision tree and 7-step process
180
- - **OPTION A**: Agent can execute code → 4-step execution workflow
181
- - **OPTION B**: Agent cannot execute → User instruction template
182
- - Clear instructions on when to use each approach
183
- - Log consolidation and evidence organization by hypothesis
184
- - Evidence quality assessment (1-10 scale)
185
-
186
- - **metaGuidance**: Added explicit high auto mode discipline
187
- - Clarified that agents should not ask for permission between phases in high auto mode
188
- - Exception: Phase 0e early termination and Phase 4a controlled experiments
189
- - Reinforced that asking "should I continue?" implies investigation is optional (it is NOT)
190
-
191
- ### Changed
192
- - Total workflow steps: 26 steps (added Phase 1f)
193
- - Phase 1 analysis loop: Now clearly labeled as "Analysis 1/5" through "Analysis 5/5"
194
-
195
- ## [1.1.0-beta.5] - 2025-01-06
196
-
197
- ### Changed
198
- - **Phase 0e Relocation**: Moved early termination checkpoint from Phase 5b to Phase 0e (after triage)
199
- - Now appears immediately after setup, before any investigation work begins
200
- - Eliminates sunk cost fallacy (decision at 5% vs 90% completion)
201
- - Forces upfront decision-making about workflow commitment
202
-
203
- ### Added
204
- - **Mandatory User Communication**: Phase 0e now requires agents to explicitly tell users about 90% accuracy difference
205
- - Template message is NOT optional - agents MUST communicate this
206
- - User must explicitly confirm proceeding with full investigation
207
-
208
- ### Removed
209
- - **Phase 5b**: Removed old completion checkpoint (now Phase 0e)
210
- - Total workflow steps reduced from 28 to 26
211
-
212
- ## [1.1.0-beta.4] - 2025-01-05
213
-
214
- ### Enhanced
215
- - **Sophisticated Code Analysis**: Integrated advanced analysis techniques from MR review workflow into Phase 1
216
-
217
- ### Added
218
- - **New Phase 1a**: Neighborhood, Call Graph & Contracts analysis
219
- - Module root computation (nearest common ancestor, clamped to package boundary)
220
- - Neighborhood mapping (immediate neighbors, imports, tests, entry points)
221
- - Bounded call graph with HOT path ranking (Small Multiples ASCII visualization)
222
- - Flow anchors (entry points to bug: HTTP routes, CLI commands, scheduled jobs, event handlers)
223
- - Contracts & invariants discovery (API symbols, endpoints, database tables, stated guarantees)
224
-
225
- - **Enhanced Phase 1 Structure**: Now 5 sub-phases (was 4)
226
- 1. Neighborhood, Call Graph & Contracts (NEW)
227
- 2. Breadth Scan (pattern discovery)
228
- 3. Deep Dive (suspicious code analysis)
229
- 4. Dependencies & Data Flow
230
- 5. Test Coverage Analysis
231
-
232
- ### Changed
233
- - Total workflow steps increased from 27 to 28 (added Phase 1a)
234
- - Phase 1 loop now iterates 5 times (was 4)
235
- - Each analysis phase now produces more structured, evidence-based outputs
236
-
237
- ## [1.1.0-beta.3] - 2025-01-05
238
-
239
- ### Fixed
240
- - **Critical**: Prevented ALL phase skipping, not just final documentation phase
241
- - Root cause: Agents didn't understand they MUST repeatedly call workflow_next
242
- - Added mandatory workflow execution instructions to metaGuidance
243
- - Added early commitment checkpoint (Phase 0e) requiring user confirmation
244
- - Reinforced evidence-based persuasion: 90% error rate for premature conclusions
245
-
246
- ### Added
247
- - **Phase 0e**: Workflow Execution Commitment checkpoint
248
- - Appears immediately after triage (before investigation begins)
249
- - Requires agent acknowledgment of workflow structure (26 steps)
250
- - Requires user confirmation to proceed with full investigation
251
- - Explicit warning: stopping early leads to wrong conclusions ~90% of time
252
-
253
- ### Enhanced
254
- - **metaGuidance**: Added comprehensive workflow execution discipline
255
- - Agents MUST call workflow_next until isComplete=true
256
- - High confidence (9-10/10) does NOT mean workflow is complete
257
- - Professional research shows 90% error rate for jumping to conclusions
258
- - Added "WHY THIS STRUCTURE EXISTS (Evidence-Based)" section
259
-
260
- ## [1.1.0-beta.2] - 2025-01-05
261
-
262
- ### Added
263
- - **Phase 5b**: Mandatory completion checkpoint with user confirmation
264
- - Prevents agents from skipping comprehensive diagnostic writeup (Phase 6)
265
- - Requires explicit acknowledgment that Phase 6 is the required deliverable
266
- - User must confirm proceeding to final documentation phase
267
-
268
- ### Enhanced
269
- - **metaGuidance**: Added critical workflow discipline instructions
270
- - Emphasized that high confidence does NOT equal completion
271
- - Clarified that Phase 6 is a mandatory deliverable, not optional
272
- - Added explicit instructions on when to set `isWorkflowComplete=true`
273
-
274
- ## [1.1.0-beta.1] - 2025-01-05
275
-
276
- ### Fixed
277
- - **Critical**: Prevented premature workflow completion
278
- - Agents were jumping to conclusions and skipping phases with high confidence
279
- - Root cause: Misinterpreting progress/confidence as final completion
280
-
281
- ### Added
282
- - **metaGuidance Section**: "CRITICAL WORKFLOW DISCIPLINE"
283
- - High confidence (9-10/10) does NOT mean completion
284
- - Agent MUST complete all phases (0-6) regardless of confidence
285
- - Only set `isWorkflowComplete=true` after Phase 6 comprehensive writeup
286
-
287
- - **Phase-Specific Warnings**:
288
- - Phase 2a (Hypothesis Formation): Warning against treating hypothesis as conclusion
289
- - Phase 5a (Confidence Assessment): Warning that 10/10 confidence still requires Phase 6
290
-
291
- ### Enhanced
292
- - **Phase 6 Instructions**: Explicit completion marking
293
- - Must set `isWorkflowComplete=true` in this phase
294
- - Must produce comprehensive diagnostic writeup
295
- - This is the ONLY phase that marks workflow as truly complete
296
-
297
- ### Changed
298
- - All phase prompts updated to reference 27 total workflow steps for clarity
package/workflows/bug-investigation.agentic.json +0 -212
@@ -1,212 +0,0 @@
1
- {
2
- "id": "bug-investigation-agentic",
3
- "name": "Bug Investigation (Adaptive - Agentic)",
4
- "version": "3.0.0",
5
- "description": "Adaptive bug investigation workflow that adjusts rigor based on complexity assessed after Phase 0. Automatically chooses between QUICK (minimal delegation), STANDARD (sequential delegation), or THOROUGH (parallel delegation) modes.",
6
- "clarificationPrompts": [
7
- "What type of system is this? (web app, backend service, CLI tool, etc.)",
8
- "How reproducible is this bug? (always, sometimes, rarely)",
9
- "What access do you have? (full codebase, logs, tests, etc.)"
10
- ],
11
- "preconditions": [
12
- "User has a specific bug or failing test to investigate",
13
- "Agent has codebase access and can run tests/build",
14
- "Bug is reproducible with specific steps"
15
- ],
16
- "metaGuidance": [
17
- "WHO YOU ARE: You are a special investigator - one of the few who has the patience, determination, and skill to find the TRUE source of bugs.",
18
- "Most investigators stop at the obvious explanation. You don't. You look past red herrings, challenge assumptions, and dig until you have certainty.",
19
- "YOUR MISSION: Find the REAL cause of this bug. Not the apparent cause, not the first explanation, but the actual source with evidence to prove it.",
20
- "WHY THIS WORKFLOW EXISTS: It gives you a systematic process to avoid the traps that catch other investigators - jumping to conclusions, confirmation bias, surface-level analysis.",
21
- "HOW IT WORKS: Each phase has two steps: First you PLAN your approach (think strategically), then you EXECUTE it (do the work).",
22
- "This planning step is critical - it forces you to think about HOW you'll investigate before diving in. Better plans lead to better investigations.",
23
- "DELEGATION: This workflow will explicitly tell you when to delegate to the WorkRail Executor. Do NOT delegate spontaneously - ONLY when the workflow says 'Delegate to the WorkRail Executor'. If the workflow doesn't mention delegation, do the work yourself.",
24
- "THE PHASES:",
25
- "Phase 0: Investigate and understand the code (both structure and execution flow)",
26
- "Phase 1: Form multiple hypotheses about what could be causing this (stay open-minded)",
27
- "Phase 2: Design and add instrumentation to gather evidence (set up your surveillance)",
28
- "Phase 3: Run instrumented code and collect evidence (gather proof, not assumptions)",
29
- "Phase 4: Validate your conclusion rigorously and hand off (be your harshest critic)",
30
- "CRITICAL DISTINCTION - THEORY VS PROOF:",
31
- "When you read code and think 'I found it!', you have a THEORY. Theories feel certain but are often wrong.",
32
- "PROOF comes from running instrumented code, collecting evidence, ruling out alternatives, and validating rigorously.",
33
- "You must complete all phases to get from theory to proof. No shortcuts, even with high confidence.",
34
- "YOUR DELIVERABLE: A diagnostic writeup that proves you found the true source - complete with evidence, alternative explanations ruled out, and reproduction steps.",
35
- "SUCCESS MEANS: Someone reading your writeup can fix the bug confidently because you've proven what's actually happening and why.",
36
- "WORKFLOW MECHANICS: Call workflow_next to get each phase. Complete the phase (both plan and execute). Call workflow_next again. Repeat until isComplete=true."
37
- ],
38
- "steps": [
39
- {
40
- "id": "phase-0a-plan",
41
- "title": "Phase 0A: Plan Your Investigation Strategy",
42
- "prompt": "**PLAN HOW YOU'LL INVESTIGATE THIS BUG**\n\nBefore diving in, think strategically about HOW you'll investigate.\n\n**Understand the Problem:**\n- What's the bug? (symptoms, reproduction steps)\n- Where does execution start? (API call, user action, test, scheduled job?)\n- What's the error or unexpected behavior?\n\n**Plan Your Investigation:**\n\n1. **Context Gathering Strategy:**\n - What parts of the codebase are relevant?\n - What components/systems are involved?\n - How will you identify the key areas to investigate?\n\n2. **Execution Tracing Strategy:**\n - Will you trace from entry point to error?\n - Or start at the error and work backwards?\n - What's the execution path you need to understand?\n\n3. **Risk Mitigation:**\n - What could cause you to miss the real issue?\n - Focusing too narrowly? Missing indirect causes?\n - Assuming things work as documented?\n\n**OUTPUT**: Create BUG_investigation.md with \"Investigation Plan\":\n- Problem summary\n- Investigation strategy (context + execution)\n- Key questions to answer\n- Risks to watch out for\n\n**Self-Check**: Is your plan specific enough to follow? Does it account for ways you might miss the real cause?",
43
- "agentRole": "You are a strategic investigator planning your approach. Think before you dive in.",
44
- "requireConfirmation": false,
45
- "guidance": [
46
- "STRATEGY: Plan both context gathering AND execution tracing",
47
- "SCOPE: Identify relevant parts of codebase",
48
- "RISKS: Identify ways you might miss the real cause",
49
- "OUTPUT: Create BUG_investigation.md with your plan"
50
- ]
51
- },
52
- {
53
- "id": "phase-0b-execute",
54
- "title": "Phase 0B: Execute Investigation (Context + Execution)",
55
- "prompt": "**INVESTIGATE THE CODE - UNDERSTAND STRUCTURE AND EXECUTION**\n\nNow execute your investigation plan. You need to understand BOTH what code exists AND what it does.\n\n**STEP 1: GATHER CONTEXT (What code exists)**\n\nDO NOT DELEGATE - Do this investigation yourself. Use systematic techniques to understand the codebase structure.\n\n**Prepare your Work Package:**\n```\nMISSION: Understand how [feature/system from bug report] works in this codebase\n\nTARGET: [Extract from bug report]\n- Bug: \"Login fails with 401\" \u2192 Target: \"authentication system\"\n- Bug: \"Export crashes\" \u2192 Target: \"data export functionality\"\n\nCONTEXT:\n- Bug Description: [Paste the bug description]\n- Reproduction Steps: [How to trigger the bug]\n- Symptoms: [What you observe]\n\nDEPTH: 2 (Explore - balance breadth and detail)\n\nDELIVERABLE: Working notes for BUG_investigation.md\n- Component map (what exists, how it's organized)\n- Key files and their purposes\n- Architecture patterns observed\n```\n\n**Execute the routine:**\n```\nUse the tools directly (list_dir, codebase_search, grep, read_file) at depth=2 (Explore level).\n\nWork Package: [Paste the work package from above]\n```\n\n**STEP 2: TRACE EXECUTION (What code does)**\n\nNow trace the execution flow from entry point to error.\n\n**Trace the execution:**\n- Where does execution begin? (entry point)\n- Follow the call chain step-by-step (with file:line)\n- Track how data flows and transforms\n- Note state changes along the way\n- Identify suspicious points\n\n**STEP 3: DOCUMENT EVERYTHING**\n\nUpdate BUG_investigation.md with complete investigation:\n\n```markdown\n# Bug Investigation\n\n## Problem Summary\n[Bug description, reproduction steps, symptoms]\n\n## Codebase Context\n### Component Map\n[What code exists, how it's organized]\n\n### Key Files\n- [file:line] - [purpose]\n\n### Architecture Patterns\n[Patterns observed]\n\n## Execution Flow\n### Entry Point\n[Where execution begins]\n\n### Call Chain\n1. [file:line] - [what happens]\n2. [file:line] - [what happens]\n...\n\n### Data Flow\n[How data transforms]\n\n### State Changes\n[What gets modified]\n\n### Suspicious Points\n[Code that could be problematic]\n\n## Gaps & Uncertainties\n[What we don't understand yet]\n```\n\n**Self-Critique**:\n- Did you understand both structure AND execution?\n- Did you actually trace the execution flow, or just read code?\n- What surprised you?\n- What are you still uncertain about?\n\n**Before Proceeding**: Can you explain how this code works (both structure and execution) to someone else?",
56
- "agentRole": "You are executing your investigation plan systematically. Document both structure and execution.",
57
- "requireConfirmation": false,
58
- "guidance": [
59
- "DO NOT DELEGATE: Execute investigation yourself, don't delegate to subagent",
60
- "TECHNIQUES: Use Context Gathering Routine techniques (list_dir, codebase_search, grep, read_file)",
61
- "TRACING: Manually trace execution flow from entry to error",
62
- "DOCUMENTATION: Create BUG_investigation.md with BOTH structure and execution",
63
- "CITATIONS: Include file:line references for everything",
64
- "DISCIPLINE: Follow your plan, don't jump to conclusions"
65
- ]
66
- },
67
- {
68
- "id": "phase-0c-assess",
69
- "title": "Phase 0C: Assess Complexity & Choose Mode",
70
- "prompt": "**ASSESS INVESTIGATION COMPLEXITY & CHOOSE MODE**\n\nBased on your Phase 0 investigation, assess the complexity and choose the appropriate mode.\n\n**COMPLEXITY ASSESSMENT:**\n\nReview your BUG_investigation.md:\n\n1. **Scope:** Components involved [count], Systems interacting [count]\n2. **Execution:** Call chain depth [steps], Execution paths [count]\n3. **Understanding:** Suspicious points [count], Gaps [count], Confidence [1-10]\n4. **Risk:** Production system? Critical bug? Multiple causes? Unclear path?\n\n**MODE DECISION CRITERIA:**\n\n**THOROUGH** (Use if ANY true):\n- 5+ components OR 10+ step call chain OR 5+ suspicious points\n- Confidence < 7/10 OR Many gaps\n- Production + Critical bug OR Multiple possible causes\n\n**QUICK** (Use if ALL true):\n- 1-2 components AND < 5 step call chain AND 1-2 suspicious points\n- Few gaps AND Confidence 8+/10\n- Non-critical bug AND Clear single cause\n\n**STANDARD** (Otherwise): Middle ground between QUICK and THOROUGH\n\n**MAKE YOUR DECISION:**\n\nChoose: **QUICK**, **STANDARD**, or **THOROUGH**\n\nDocument in BUG_investigation.md:\n```markdown\n## Mode Decision: [QUICK/STANDARD/THOROUGH]\n**Rationale:** [Your reasoning based on criteria above]\n```\n\n**What Each Mode Means:**\n\n- **QUICK**: You'll form hypotheses yourself, minimal delegation, move fast\n- **STANDARD**: You'll form hypotheses, then get sequential challenge/validation\n- **THOROUGH**: You'll use parallel ideation + parallel challenge + parallel validation\n\n**Remember your choice** for later phases!",
71
- "agentRole": "You are assessing complexity to choose the right level of rigor.",
72
- "requireConfirmation": false,
73
- "guidance": [
74
- "HONEST ASSESSMENT: Don't underestimate complexity",
75
- "USE CRITERIA: Follow the decision rules, don't just pick your favorite",
76
- "DOCUMENT: Write your decision and rationale clearly",
77
- "REMEMBER: You'll adapt your approach in later phases"
78
- ]
79
- },
80
- {
81
- "id": "phase-0d-audit",
82
- "title": "Phase 0D: Audit Investigation (Optional - THOROUGH mode)",
83
- "prompt": "**AUDIT YOUR INVESTIGATION** (THOROUGH mode only)\n\n\u26a0\ufe0f **MODE CHECK**: Did you choose THOROUGH mode in Phase 0C?\n- If YES: Continue with this audit\n- If NO (STANDARD or QUICK): Skip this step, proceed to Phase 1\n\n**AUDIT YOUR INVESTIGATION FOR COMPLETENESS AND DEPTH**\n\nYou've investigated the code. Now get independent review.\n\n**PARALLEL CONTEXT AUDIT**\n\n\u26a0\ufe0f **CRITICAL: Delegate to the WorkRail Executor TWICE SIMULTANEOUSLY, not sequentially.**\n\nDelegate to the WorkRail Executor TWICE AT THE SAME TIME with different focuses:\n\n**Delegation (Completeness Focus):**\n```\nPlease execute the 'Context Gathering Routine' workflow in audit mode.\n\nAudit Request:\nMISSION: Audit my investigation for COMPLETENESS\n\nMY INVESTIGATION:\n[Paste the contents of BUG_investigation.md]\n\nFOCUS: Completeness\n- Did I miss any critical files or areas?\n- Are there important components I didn't investigate?\n- Did I trace the complete execution path?\n- What else should I have looked at?\n```\n\n**Delegation (Depth Focus):**\n```\nPlease execute the 'Context Gathering Routine' workflow in audit mode.\n\nAudit Request:\nMISSION: Audit my investigation for DEPTH\n\nMY INVESTIGATION:\n[Paste the contents of BUG_investigation.md]\n\nFOCUS: Depth\n- Did I go deep enough in my understanding?\n- Did I understand WHY, not just WHAT?\n- Should I have read implementations instead of just signatures?\n- What areas need deeper investigation?\n```\n\n**SYNTHESIZE AUDIT FEEDBACK**\n\nReview both audit deliverables:\n1. Completeness audit: What did I miss?\n2. Depth audit: Where should I go deeper?\n\n**Synthesis Strategy:**\n- Common concerns: If both flag same area \u2192 High priority\n- Unique insights: Each may catch different gaps\n- Conflicting advice: Investigate to understand why\n\n**ITERATE IF NEEDED**\n\nBased on feedback, investigate further if significant gaps:\n- Run Context Gathering Routine again (depth=3 or different target)\n- Trace additional execution paths\n- Read implementations you skipped\n\n**FINALIZE INVESTIGATION**\n\nUpdate BUG_investigation.md with:\n- Audit findings synthesized\n- Additional investigation completed\n- Final understanding of structure and execution\n\n**Before Proceeding**: Did both auditors confirm your understanding is sufficient?",
84
- "agentRole": "You are getting independent review of your investigation to ensure completeness and depth.",
85
- "requireConfirmation": false,
86
- "guidance": [
87
- "MODE: Only do this if you chose THOROUGH mode in Phase 0C",
88
- "PARALLEL: Delegate twice to WorkRail Executor SIMULTANEOUSLY",
89
- "DIVERSITY: Each has different focus (completeness vs depth)",
90
- "SYNTHESIS: Combine both perspectives",
91
- "ITERATE: Investigate further if significant gaps found",
92
- "QUALITY GATE: Don't proceed until both perspectives satisfied"
93
- ]
94
- },
95
- {
96
- "id": "phase-1a-plan",
97
- "title": "Phase 1A: Plan Your Hypothesis Development",
98
- "prompt": "**CRITICAL CHECKPOINT: IS THIS ACTUALLY A BUG?**\n\nBefore forming hypotheses about what's wrong, investigate the original intent.\n\n**Investigate Original Intent:**\n\n1. **Check git history:**\n - Use git log/blame on relevant files\n - Look for commit messages explaining this behavior\n - When was this code written? By whom?\n\n2. **Search for documentation:**\n - Comments explaining the behavior\n - README or design docs\n - ADRs (Architecture Decision Records)\n - Issue/ticket history\n\n3. **Examine tests:**\n - What behavior do tests validate?\n - Do tests expect this behavior?\n - Are there tests that would fail if you \"fixed\" this?\n\n4. **Consider constraints:**\n - What requirements might explain this?\n - What technical constraints could justify this?\n - What trade-offs might have been made?\n\n**CRITICAL QUESTION**: Could this be working as designed, not a bug?\n\n---\n\n**PLAN HOW YOU'LL FORM HYPOTHESES**\n\nBased on your investigation, you'll now develop hypotheses about what's causing the bug.\n\n**Think Through**:\n\n1. **What patterns did you notice?**\n - From your execution flow tracing, what stood out?\n - What code seemed suspicious?\n - What assumptions are baked into the code?\n\n2. **What types of causes should you consider?**\n - Logic errors in the code?\n - Data issues (wrong format, corruption, missing)?\n - Timing or race conditions?\n - Environment or configuration issues?\n - Integration problems with dependencies?\n - **Working as designed** (not actually a bug)?\n\n3. **How will you avoid anchoring on your first idea?**\n - How many alternative hypotheses will you generate?\n - How will you challenge your initial impressions?\n - What evidence would contradict your leading theory?\n\n4. 
**What makes a good hypothesis?**\n - Specific enough to test\n - Explains all the symptoms\n - Has clear evidence for/against\n - Can be proven or disproven\n\n**OUTPUT**: Update BUG_investigation.md with \"Phase 1 Hypothesis Strategy\":\n- Results of your intent investigation\n- How you'll generate multiple hypotheses\n- What types of causes you'll consider (including \"not a bug\")\n- How you'll avoid confirmation bias\n- How you'll test your hypotheses\n\n**Self-Check**: Are you committed to generating multiple hypotheses, or are you already attached to one idea? Have you seriously considered that this might not be a bug?",
99
- "agentRole": "You are strategizing about hypothesis formation. Commit to staying open-minded.",
100
- "requireConfirmation": false,
101
- "guidance": [
102
- "INTENT: Check git history, comments, tests, docs for original intent",
103
- "QUESTION: Could this be working as designed?",
104
- "MINDSET: Commit to generating multiple hypotheses, not just one",
105
- "DIVERSITY: Consider different types of causes (including 'not a bug')",
106
- "BIAS: Plan how you'll avoid anchoring on your first idea",
107
- "OUTPUT: Document intent investigation and hypothesis strategy"
108
- ]
109
- },
110
- {
111
- "id": "phase-1b-execute",
112
- "title": "Phase 1B: Form Hypotheses (Mode-Adaptive)",
113
- "prompt": "**FORM HYPOTHESES BASED ON YOUR CHOSEN MODE**\n\n\u26a0\ufe0f **CHECK YOUR MODE** (from Phase 0C): QUICK, STANDARD, or THOROUGH?\n\n---\n\n**IF YOU CHOSE THOROUGH MODE:**\n\n**FORM MULTIPLE HYPOTHESES ABOUT THE BUG**\n\nYou've completed your investigation and traced execution flow. Now generate hypotheses about what's causing the bug.\n\n**PARALLEL IDEATION - DIVERGENT THINKING**\n\n\u26a0\ufe0f **CRITICAL: Delegate to the WorkRail Executor THREE TIMES SIMULTANEOUSLY, not sequentially.**\n\nInstead of forming hypotheses yourself, delegate to the WorkRail Executor THREE TIMES SIMULTANEOUSLY with different perspectives:\n\n**Delegation - Logic Errors Perspective:**\n```\nPlease execute the 'Ideation Routine' workflow.\n\nIdeation Request:\nPROBLEM: What is causing this bug?\n\nCONSTRAINTS:\n- Must explain all observed symptoms\n- Must be testable with instrumentation\n- Must be specific (file:line level)\n\nCONTEXT:\nRead these files:\n- BUG_investigation.md (codebase context)\n- BUG_investigation.md (execution trace)\n\nPERSPECTIVE: Logic Errors\nFocus on: Wrong conditions, off-by-one errors, incorrect operators, \nmissing checks, wrong order of operations, flawed algorithms\n\nQUANTITY: 5-7 hypotheses\n\nDELIVERABLE: BUG_hypotheses_logic.md\n```\n\n**Delegation - Data/State Perspective:**\n```\nPlease execute the 'Ideation Routine' workflow.\n\nIdeation Request:\nPROBLEM: What is causing this bug?\n\nCONSTRAINTS:\n- Must explain all observed symptoms\n- Must be testable with instrumentation\n- Must be specific (file:line level)\n\nCONTEXT:\nRead these files:\n- BUG_investigation.md (codebase context)\n- BUG_investigation.md (execution trace)\n\nPERSPECTIVE: Data/State Issues\nFocus on: Data corruption, wrong state management, race conditions,\nuninitialized variables, type mismatches, mutation issues\n\nQUANTITY: 5-7 hypotheses\n\nDELIVERABLE: BUG_hypotheses_data.md\n```\n\n**Delegation - Integration/Environment Perspective:**\n```\nPlease execute the 
'Ideation Routine' workflow.\n\nIdeation Request:\nPROBLEM: What is causing this bug?\n\nCONSTRAINTS:\n- Must explain all observed symptoms\n- Must be testable with instrumentation\n- Must be specific (file:line level)\n\nCONTEXT:\nRead these files:\n- BUG_investigation.md (codebase context)\n- BUG_investigation.md (execution trace)\n\nPERSPECTIVE: Integration/Environment\nFocus on: Dependency issues, config problems, environment differences,\nAPI contract violations, timing issues, resource constraints\n\nQUANTITY: 5-7 hypotheses\n\nDELIVERABLE: BUG_hypotheses_integration.md\n```\n\n**SYNTHESIZE ALL HYPOTHESES**\n\nReview all three deliverables (15-21 hypotheses total):\n1. Logic errors perspective\n2. Data/state perspective \n3. Integration/environment perspective\n\n**Synthesis Process:**\n1. **Deduplicate**: Combine similar hypotheses from different perspectives\n2. **Refine**: Improve clarity and specificity\n3. **Rank**: Order by likelihood based on evidence\n4. **Select**: Pick top 5-7 most promising hypotheses\n\n**Create BUG_hypotheses.md** with:\n- Synthesized hypothesis list (5-7 hypotheses)\n- Each hypothesis using this template:\n - **ID**: H1, H2, H3, etc.\n - **Statement**: \"The bug occurs because [specific cause]\"\n - **Source**: Which perspective(s) generated this\n - **Evidence For**: What from investigation supports this\n - **Evidence Against**: What contradicts this\n - **How to Test**: What evidence would prove/disprove this\n - **Likelihood** (1-10): Based on current evidence\n\n**PARALLEL ADVERSARIAL CHALLENGE**\n\n\u26a0\ufe0f **CRITICAL: Spawn BOTH challengers SIMULTANEOUSLY.**\n\nAfter synthesizing hypotheses, delegate to TWO Hypothesis Challengers AT THE SAME TIME:\n\n**Delegation - Moderate Rigor (rigor=3):**\n```\nPlease execute the 'Hypothesis Challenge Routine' workflow at rigor=3.\n\nChallenge Request:\nHYPOTHESES: [Paste your 5-7 synthesized hypotheses from BUG_hypotheses.md]\n\nEVIDENCE:\nRead these files:\n- 
BUG_investigation.md\n- BUG_investigation.md\n\nRIGOR: 3 (Thorough adversarial analysis)\n\nChallenge each hypothesis and find weaknesses.\n\nDELIVERABLE: BUG_challenges_moderate.md\n```\n\n**Delegation - Maximum Rigor (rigor=5):**\n```\nPlease execute the 'Hypothesis Challenge Routine' workflow at rigor=5.\n\nChallenge Request:\nLEADING HYPOTHESIS: [Your most likely hypothesis from BUG_hypotheses.md]\n\nEVIDENCE:\nRead these files:\n- BUG_investigation.md\n- BUG_investigation.md\n\nRIGOR: 5 (Try to completely break this)\n\nTry to prove this hypothesis is WRONG. Find any way it could fail.\n\nDELIVERABLE: BUG_challenges_maximum.md\n```\n\n**FINAL SYNTHESIS**\n\nReview both challenge deliverables:\n- If Challenger 1 finds issues: Strengthen or revise hypotheses\n- If Challenger 2 breaks leading hypothesis: Reconsider alternatives\n- If both give green light: High confidence in your hypotheses\n\n**Update BUG_hypotheses.md** with:\n- Challenge results and findings\n- Revised hypotheses (if needed)\n- Updated likelihood scores\n- Final ranking for testing\n\n**\ud83d\udea8 CRITICAL - YOU ARE NOT DONE:**\n\nYou now have theories. You do NOT have proof.\n\nEven if H1 has 10/10 likelihood, it's based on reading code, not evidence from running code.\n\nYou MUST continue to Phase 2 (design instrumentation) and Phase 3 (collect evidence).\n\nThis is not optional. High confidence without evidence = educated guess, not diagnosis.\n\nCall workflow_next to continue.\n\n---\n\n**IF YOU CHOSE STANDARD MODE:**\n\nSkip the parallel ideation. Instead:\n\n1. **Form 5-7 hypotheses yourself** using the investigation\n2. **Delegate to ONE Hypothesis Challenger** (rigor=3):\n ```\n Please execute the 'Hypothesis Challenge Routine' workflow at rigor=3.\n \n Challenge Request:\n HYPOTHESES: [Your 5-7 hypotheses]\n EVIDENCE: Read BUG_investigation.md\n RIGOR: 3\n \n Challenge each hypothesis.\n ```\n3. 
**Refine** based on challenge feedback\n\n---\n\n**IF YOU CHOSE QUICK MODE:**\n\nSkip all delegation. Instead:\n\n1. **Form 3-5 hypotheses yourself** quickly\n2. **Self-challenge**: What could I be wrong about?\n3. **Rank** by likelihood\n4. **Move on** - don't over-analyze\n\n---\n\n**OUTPUT:** BUG_hypotheses.md (format based on your mode)",
114
- "agentRole": "You are forming competing hypotheses and subjecting them to rigorous challenge. Stay open to alternatives even if one seems obvious.",
115
- "requireConfirmation": false,
116
- "guidance": [
117
- "MODE: Adapt your approach based on your Phase 0C decision",
118
- "PARALLEL IDEATION: Delegate 3 times to WorkRail Executor SIMULTANEOUSLY (not one-by-one)",
119
- "DIVERSITY: Each ideator has different perspective (logic, data, integration)",
120
- "SYNTHESIS: Combine all hypotheses, deduplicate, refine to top 5-7",
121
- "PARALLEL CHALLENGE: Delegate twice to WorkRail Executor SIMULTANEOUSLY",
122
- "RIGOR: Different challenge levels (3 and 5)",
123
- "FINAL SYNTHESIS: Integrate all perspectives",
124
- "REMINDER: High confidence \u2260 proof. You still need evidence."
125
- ]
126
- },
127
- {
128
- "id": "phase-2a-plan",
129
- "title": "Phase 2A: Design Your Instrumentation Strategy",
130
- "prompt": "**PLAN HOW YOU'LL GATHER EVIDENCE**\n\nYou have hypotheses. Now design how you'll gather evidence to test them.\n\n**Think Through**:\n\n1. **What evidence would prove each hypothesis?**\n - For H1, what specific data points would confirm it?\n - For H2, what would you observe if it's correct?\n - How can you distinguish between competing hypotheses?\n\n2. **Where should you add instrumentation?**\n - What points in the execution flow are critical?\n - Where could you observe the data/state you need?\n - What's already being logged vs what do you need to add?\n\n3. **What's the right level of detail?**\n - Too much logging = noise and hard to analyze\n - Too little = gaps and missing evidence\n - How will you balance this?\n\n4. **Can you use existing tests?**\n - Are there tests you can enhance instead of adding new logging?\n - Can you modify tests to expose the state you need?\n - Should you write new targeted tests?\n\n**OUTPUT**: Update BUG_investigation.md with \"Phase 2 Instrumentation Plan\":\n- What evidence you need for each hypothesis\n- Where you'll add instrumentation (file:line)\n- What you'll log/observe at each point\n- Test scenarios you'll prepare\n- How you'll organize output to distinguish hypotheses\n\n**Self-Check**: Will this instrumentation actually give you the evidence you need? What might you miss?",
131
- "agentRole": "You are designing your evidence collection strategy. Think carefully about what you need to prove.",
132
- "requireConfirmation": false,
133
- "guidance": [
134
- "EVIDENCE: Define what would prove/disprove each hypothesis",
135
- "INSTRUMENTATION: Plan where to add logging/debugging",
136
- "BALANCE: Enough detail to distinguish hypotheses, not so much you drown in noise",
137
- "OUTPUT: Document your instrumentation plan"
138
- ]
139
- },
140
- {
141
- "id": "phase-2b-execute",
142
- "title": "Phase 2B: Implement Your Instrumentation",
143
- "prompt": "**ADD INSTRUMENTATION AND PREPARE TEST SCENARIOS**\n\nNow implement the instrumentation strategy you designed.\n\n**DELEGATION OPPORTUNITY: Execution Simulation**\n\nBefore adding instrumentation, consider simulating execution to predict outcomes and refine your strategy.\n\n**HOW TO DELEGATE:**\n\n1. **Prepare the work package:**\n```\nMISSION: Simulate execution paths for my top hypotheses\n\nHYPOTHESES: [Paste top 3-5 hypotheses]\n\nCONTEXT:\n- BUG_investigation.md (file reference)\n- BUG_hypotheses.md (file reference)\n- Key files: [List critical files from investigation]\n\nMODE: trace (Detailed execution path analysis)\n\nDELIVERABLE:\nFor each hypothesis:\n- Predicted execution path (step-by-step)\n- State changes at each step\n- Where instrumentation would be most revealing\n- Expected outputs if hypothesis is correct\n- Distinguishing characteristics between hypotheses\n```\n\n2. **Delegate to the WorkRail Executor:**\n\nDelegate to the WorkRail Executor:\n```\nPlease execute the 'Execution Simulation Routine' workflow in trace mode.\n\nWork Package: [Paste the work package from above]\n```\n\n**AFTER DELEGATION:**\nUse the simulation results to refine your instrumentation plan.\n\n**Implement**:\n- Add debug logging at the points identified by simulation\n- Enhance or create tests to expose necessary state\n- Add assertions to catch violations\n- Set up controlled experiments if needed\n- Label everything clearly ([H1], [H2], etc.)\n\n**Prepare Test Scenarios**:\n- Minimal reproduction case\n- Edge cases that might behave differently\n- Working scenarios for comparison\n- Variations that test specific hypotheses\n\n**OUTPUT**: Update BUG_investigation.md with:\n- List of instrumentation added (what/where/why)\n- Test scenarios prepared\n- Expected outcomes for each hypothesis (from simulation)\n- How you'll analyze results\n\n**Self-Critique**:\n- Did you add the instrumentation you planned?\n- Did you skip any because it seemed 
unnecessary?\n- Is your instrumentation labeled clearly?\n- Are your test scenarios sufficient?\n\n**Readiness Check**: If you run these tests, will you get the evidence you need to prove/disprove your hypotheses?",
144
- "agentRole": "You are implementing your evidence collection plan with precision, informed by execution simulation. Good instrumentation is the foundation of proof.",
145
- "requireConfirmation": false,
146
- "guidance": [
147
- "DELEGATION: Consider using execution-simulator subagent first",
148
- "TOOLS: Use edit tools to add instrumentation",
149
- "LABELING: Mark instrumentation clearly ([H1], [H2], etc.)",
150
- "TESTS: Prepare test scenarios before running"
151
- ]
152
- },
153
- {
154
- "id": "phase-3-execute",
155
- "title": "Phase 3: Collect Evidence",
156
- "prompt": "**RUN INSTRUMENTED CODE AND COLLECT EVIDENCE**\n\nNow run your test scenarios and collect the evidence.\n\n**Execute**:\n- Run minimal reproduction case\n- Run edge cases and variations\n- Run working scenarios for comparison\n- Capture all output (logs, errors, test results)\n\n**Organize Evidence**:\nFor each hypothesis, create BUG_evidence_H1.md, BUG_evidence_H2.md, etc.:\n- What did the instrumentation reveal?\n- Does behavior match predictions?\n- What unexpected findings emerged?\n- Quality rating (1-10): How strong is this evidence?\n\n**Analyze Patterns**:\n- Which hypotheses are supported by evidence?\n- Which are contradicted?\n- Are there patterns you didn't predict?\n- Do you need different instrumentation?\n- Should you form new hypotheses?\n\n**Update Hypotheses**:\nUpdate BUG_hypotheses.md with:\n- Evidence collected for each\n- New likelihood scores based on evidence\n- Evidence quality ratings\n- New insights or remaining questions\n\n**Decision Point**:\n- Strong evidence (8+/10) for one hypothesis? \u2192 Proceed to validation\n- Need more instrumentation? \u2192 Go back and add it\n- Need to revise hypotheses? \u2192 Update them\n\nBut you're not done until you have strong evidence. Keep investigating.",
157
- "agentRole": "You are collecting evidence systematically. Let the data guide you, not your assumptions.",
158
- "requireConfirmation": false,
159
- "guidance": [
160
- "EXECUTION: Run your test scenarios and capture all output",
161
- "ORGANIZATION: Create BUG_evidence_H1.md, BUG_evidence_H2.md, etc.",
162
- "ANALYSIS: Compare actual behavior to predictions",
163
- "ITERATION: If evidence is weak, add more instrumentation"
164
- ]
165
- },
166
- {
167
- "id": "phase-4a-validate",
168
- "title": "Phase 4A: Validate Conclusion (Mode-Adaptive)",
169
- "prompt": "**VALIDATE YOUR CONCLUSION BASED ON YOUR MODE**\n\n\u26a0\ufe0f **CHECK YOUR MODE** (from Phase 0C): QUICK, STANDARD, or THOROUGH?\n\n---\n\n**IF YOU CHOSE THOROUGH MODE:**\n\nUse parallel multi-perspective validation:\n\n**RIGOROUSLY VALIDATE YOUR FINDING**\n\nYou have a leading hypothesis with evidence. Now be your harshest critic.\n\n**State Your Conclusion**:\n- What hypothesis has the strongest evidence?\n- What's your confidence (1-10)?\n- What evidence supports it?\n\n**DELEGATION OPPORTUNITY: Adversarial Validation**\n\nYour conclusion needs rigorous challenge. Delegate to the WorkRail Executor for maximum-rigor adversarial review.\n\n**HOW TO DELEGATE:**\n\n1. **Prepare the work package:**\n```\nMISSION: Rigorously validate my bug diagnosis\n\nHYPOTHESES:\n- Leading hypothesis: [Your conclusion]\n- Alternatives considered: [List other hypotheses you ruled out]\n\nEVIDENCE:\n- BUG_evidence_H*.md files (file references)\n- BUG_hypotheses.md (file reference)\n- BUG_investigation.md (file reference)\n- Instrumentation output/logs (file references or inline)\n\nRIGOR: 5 (Maximum - exhaustive adversarial review)\n\nDELIVERABLE:\n- Strengths of leading hypothesis\n- Weaknesses and gaps\n- Alternative explanations not yet ruled out\n- Contradicting evidence\n- Edge cases that might break the explanation\n- Verdict with confidence assessment\n- Recommendations for additional validation\n```\n\n2. 
**Instruct the Hypothesis Challenger:**\n\nDelegate to the WorkRail Executor:\n```\nPlease execute the 'Hypothesis Challenge Routine' workflow at rigor=5 (Maximum level).\n\nWork Package: [Paste the work package from above]\n```\n\n**AFTER DELEGATION:**\nReview the Hypothesis Challenger's adversarial critique.\n\n**If confidence < 9/10**:\n- What specific test would raise confidence?\n- What alternative should you rule out?\n- What additional evidence do you need?\n- Go collect that evidence\n\n**Final Assessment**:\nAnswer these YES/NO:\n- Does this explain all observed symptoms?\n- Have you ruled out major alternatives?\n- Can you reproduce the bug based on this understanding?\n- Would you stake your reputation on this diagnosis?\n- Is there any contradicting evidence?\n- Did the adversarial review strengthen or weaken your confidence?\n\n**OUTPUT**: BUG_validation.md with:\n- Leading hypothesis and evidence\n- Alternatives considered and ruled out\n- Adversarial review findings (from Hypothesis Challenger)\n- Final confidence score\n- Remaining uncertainties\n\n**Threshold**: 9+/10 confidence with strong evidence to proceed. If not, keep investigating.\n\n**PARALLEL MULTI-PERSPECTIVE VALIDATION**\n\n\u26a0\ufe0f **CRITICAL: Spawn ALL THREE subagents SIMULTANEOUSLY.**\n\nBefore committing to your conclusion, get three independent perspectives AT THE SAME TIME:\n\n**Validator 1 - Hypothesis Challenger (rigor=5):**\n```\nPlease execute the 'Hypothesis Challenge Routine' workflow at rigor=5.\n\nChallenge Request:\nCONCLUSION: [Your final conclusion about the bug]\nEVIDENCE: [All evidence supporting this conclusion]\nRIGOR: 5 (Maximum adversarial review)\n\nTry to prove this conclusion is WRONG. 
Find any holes in the logic.\n```\n\n**Validator 2 - Execution Simulator:**\n```\nPlease execute the 'Execution Simulation Routine' workflow.\n\nSimulation Request:\nMISSION: Simulate the proposed fix\n\nPROPOSED FIX: [Your fix from BUG_validation.md]\nCONTEXT: [Relevant code context]\n\nSCENARIOS:\n1. Does the fix resolve the bug?\n2. Does the fix introduce new issues?\n3. Are there edge cases the fix doesn't handle?\n\nDELIVERABLE: BUG_fix_simulation.md\n```\n\n**Validator 3 - Plan Analyzer:**\n```\nPlease execute the 'Plan Analysis Routine' workflow.\n\nAnalysis Request:\nPLAN: [Your fix plan from BUG_validation.md]\n\nANALYZE:\n- Is the plan complete?\n- Are there missing steps?\n- Are there risks or gotchas?\n- Does it follow best practices?\n\nDELIVERABLE: BUG_plan_analysis.md\n```\n\n**FINAL SYNTHESIS - TRIPLE VALIDATION GATE:**\n\nReview all three perspectives:\n1. Hypothesis Challenger: Can they break your conclusion?\n2. Execution Simulator: Does the fix work in simulation?\n3. Plan Analyzer: Is the fix plan sound?\n\n**Quality Gate:**\n- \u2705 ALL THREE give green light \u2192 Proceed with confidence\n- \u26a0\ufe0f ONE raises concerns \u2192 Investigate and address\n- \ud83d\uded1 TWO+ raise concerns \u2192 Return to Phase 1 (re-form hypotheses)\n\n**ONLY PROCEED if you have triple validation.**\n\nUpdate BUG_validation.md with all validation results.\n\n---\n\n**IF YOU CHOSE STANDARD MODE:**\n\nUse sequential validation:\n\n1. **Delegate to Hypothesis Challenger** (rigor=5):\n ```\n Please execute the 'Hypothesis Challenge Routine' workflow at rigor=5.\n \n Challenge Request:\n CONCLUSION: [Your conclusion]\n EVIDENCE: Read BUG_investigation.md and evidence files\n RIGOR: 5\n \n Try to prove this is WRONG.\n ```\n\n2. **Address concerns** if any raised\n3. **Proceed** if validation passes\n\n---\n\n**IF YOU CHOSE QUICK MODE:**\n\nSelf-validate:\n\n1. **Self-challenge**: What could I be wrong about?\n2. **Check**: Does this explain ALL symptoms?\n3. 
**Verify**: Can I reproduce based on this understanding?\n4. **Proceed** if confident\n\n---\n\n",
170
- "agentRole": "You are validating your conclusion with maximum rigor, leveraging adversarial challenge to ensure you haven't missed anything.",
171
- "requireConfirmation": false,
172
- "guidance": [
173
- "MODE: Adapt validation rigor based on your Phase 0C decision",
174
- "CONSTRAINT: Validate your conclusion, but do NOT implement the fix",
175
- "DELEGATION: Use hypothesis-challenger subagent at maximum rigor",
176
- "THRESHOLD: Need 9+/10 confidence to proceed",
177
- "ALTERNATIVES: Ensure you've ruled out major competing explanations",
178
- "HONESTY: If confidence is low, go back and gather more evidence",
179
- "PARALLEL: Spawn all 3 validators SIMULTANEOUSLY",
180
- "DIVERSITY: Three different cognitive modes (adversarial, simulation, planning)",
181
- "TRIPLE GATE: ALL THREE must validate before proceeding",
182
- "ITERATE: If 2+ raise concerns, return to Phase 1"
183
- ]
184
- },
185
- {
186
- "id": "phase-4b-writeup",
187
- "title": "Phase 4B: Prove Your Case",
188
- "prompt": "**DOCUMENT YOUR INVESTIGATION - PROVE YOU FOUND THE TRUE SOURCE**\n\nYou've found the true source of the bug. Now prove it to others.\n\n**Your Task**: Create a diagnostic writeup that proves your case.\n\n**Structure**:\n\n**1. EXECUTIVE SUMMARY** (3-5 sentences)\n- What's the bug?\n- What's the true cause?\n- How confident are you? (should be 9-10/10)\n- What's the impact?\n\n**2. THE TRUE SOURCE** (detailed)\n- Explain the root cause\n- Why this causes the observed symptoms\n- Code locations (file:line)\n- Relevant code snippets\n\n**3. THE PROOF** (your evidence)\n- Key evidence that proves this diagnosis\n- How you collected it (instrumentation, tests)\n- Evidence quality and sources\n- Why alternative explanations don't fit\n\n**4. HOW TO REPRODUCE**\n- Minimal steps to reproduce\n- What to observe that confirms the diagnosis\n- Conditions required\n\n**5. YOUR INVESTIGATION**\n- What you analyzed\n- Hypotheses you tested\n- How you arrived at the conclusion\n- Key turning points\n\n**6. FIXING IT**\n- Suggested approach (conceptual)\n- Risks to consider\n- How to verify the fix\n- Tests that should be added\n\n**7. UNCERTAINTIES** (if any)\n- What you're still unsure about\n- Edge cases needing more investigation\n\n**OUTPUT**: BUG_diagnostic.md\n\n**Quality Check**:\n- Could someone fix this bug confidently from your writeup?\n- Have you proven your case with evidence?\n- Is it clear WHY this is the true source, not just a symptom?\n\n**Mission Complete**: You've tracked down the true source and proven it. Well done.",
189
- "agentRole": "You are documenting your successful investigation. You found the truth - now prove it to others.",
190
- "requireConfirmation": false,
191
- "guidance": [
192
- "AUDIENCE: Write for someone who will fix the bug",
193
- "PROOF: Include evidence, not just assertions",
194
- "REPRODUCTION: Provide clear steps to reproduce",
195
- "OUTPUT: Create BUG_diagnostic.md"
196
- ]
197
- },
198
- {
199
- "id": "phase-4b-handoff",
200
- "title": "Phase 4B: Investigation Complete - Handoff",
201
- "prompt": "**INVESTIGATION COMPLETE - DO NOT IMPLEMENT THE FIX**\n\n\u26a0\ufe0f **CRITICAL BOUNDARY: This workflow is for INVESTIGATION ONLY.**\n\nYour job was to:\n- \u2705 Understand the bug\n- \u2705 Trace execution flow\n- \u2705 Form and test hypotheses\n- \u2705 Identify the root cause\n- \u2705 Validate your conclusion\n\nYour job is NOT to:\n- \u274c Write the fix\n- \u274c Modify any code\n- \u274c Create a PR\n- \u274c Implement the solution\n\n**HANDOFF DELIVERABLE:**\n\nCreate a final handoff document that someone else can use to implement the fix:\n\n**File: BUG_[VERSION]_handoff.md**\n\n```markdown\n# Bug Investigation Handoff\n\n## Executive Summary\n- **Bug**: [One sentence description]\n- **Root Cause**: [What's actually wrong]\n- **Confidence**: [High/Medium/Low based on validation]\n\n## Investigation Files\n- Investigation context: BUG_[VERSION]_investigation.md\n- Execution flow: BUG_[VERSION]_execution_flow.md\n- Hypotheses: BUG_[VERSION]_hypotheses.md\n- Validation: BUG_[VERSION]_validation.md\n\n## Root Cause Analysis\n[Detailed explanation of what's wrong and why]\n\n## Recommended Fix\n[High-level description of what needs to change]\n\n**Files to modify:**\n- [file:line] - [what needs to change]\n- [file:line] - [what needs to change]\n\n**Why this fixes it:**\n[Explain how the fix addresses the root cause]\n\n## Edge Cases & Risks\n[Things to watch out for when implementing]\n\n## Testing Recommendations\n[How to verify the fix works]\n\n## Next Steps\n1. Review this investigation\n2. Implement the recommended fix\n3. Test thoroughly\n4. Submit for code review\n```\n\n**OUTPUT**: BUG_[VERSION]_handoff.md with complete investigation summary\n\n**Before Finishing**: Have you clearly documented the root cause and recommended fix so someone else can implement it?",
202
- "agentRole": "You are completing your investigation and handing off to an implementation team. Document everything clearly.",
203
- "requireConfirmation": false,
204
- "guidance": [
205
- "BOUNDARY: Do NOT implement the fix yourself",
206
- "HANDOFF: Create clear documentation for implementation team",
207
- "COMPLETENESS: Include all investigation artifacts",
208
- "CLARITY: Someone else should be able to implement from your handoff"
209
- ]
210
- }
211
- ]
212
- }