@exaudeus/workrail 0.7.2-beta.0 → 0.7.2-beta.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@exaudeus/workrail",
3
- "version": "0.7.2-beta.0",
3
+ "version": "0.7.2-beta.2",
4
4
  "description": "MCP server for structured workflow orchestration and step-by-step task guidance",
5
5
  "license": "MIT",
6
6
  "bin": {
@@ -1,5 +1,56 @@
1
1
  # Changelog - Systematic Bug Investigation Workflow
2
2
 
3
+ ## [1.1.0-beta.20] - 2025-01-06
4
+
5
+ ### CRITICAL FIX - Dangerous "Autonomy" Language
6
+ - **ROOT CAUSE IDENTIFIED**: Our automation level descriptions were giving agents permission to skip!
7
+ - OLD: "High=**auto-approve >8.0 confidence decisions**" ❌
8
+ - Interpreted as: "I have 9/10 confidence → I can approve my decision to skip phases"
9
+ - OLD: "Control workflow **autonomy**" ❌
10
+ - Interpreted as: "High mode gives me autonomy to decide what to skip"
11
+
12
+ ### Language Fixes
13
+ 1. **Removed "auto-approve decisions"**: Changed to "execute phases automatically WITHOUT asking permission between phases"
14
+ 2. **Removed "autonomy"**: Changed to "Control confirmation frequency"
15
+ 3. **Clarified HIGH AUTO MODE**:
16
+ - NEW: "HIGH AUTO = NO INTERRUPTIONS, NOT NO PHASES"
17
+ - NEW: "HIGH AUTO ≠ PERMISSION TO SKIP PHASES"
18
+ 4. **Explicit USER SAYS**:
19
+ - "USER SAYS: 'High automation mode' means you DON'T ASK PERMISSION. It does NOT mean you have autonomy to decide which phases to skip."
20
+ - "High auto = Faster execution of ALL phases. NOT = Smarter agent gets to skip phases."
21
+
22
+ ### Credit
23
+ User insight: "Could the high automation be causing it to do this? do we frame it as letting it do whatever it wants?" - YES, we were!
24
+
25
+ ## [1.1.0-beta.19] - 2025-01-06
26
+
27
+ ### CRITICAL FIX - Anti-Rationalization
28
+ - **NEW PATTERN DETECTED**: Agents now **acknowledge** the warnings but then **rationalize** why they don't apply
29
+ - Example: "I know finding ≠ done... **However, given that I have high confidence...**"
30
+ - Example: "Let me proceed with a **more targeted Phase 2**..." (skipping remaining iterations)
31
+ - **Problem**: Agents stopped at **iteration 2 of 5** in Phase 1 loop - didn't even finish the analysis phase!
32
+ - **Root Cause**: Agents think they can judge when to skip based on their "special" situation
33
+
34
+ ### New Anti-Rationalization Safeguards
35
+ 1. **Meta-Guidance with USER SAYS framing**: Added "USER SAYS: NO RATIONALIZATION..." section
36
+ - **Why USER SAYS**: Agents follow direct user commands more reliably than abstract principles
37
+ - "USER SAYS: YOUR SITUATION IS NOT SPECIAL. YOU ARE NOT THE EXCEPTION."
38
+ - "USER SAYS: 'I found the bug early' = ALL THE MORE REASON to validate properly"
39
+ - Explicitly forbids phrases like "However, given that..." or "targeted Phase X"
40
+
41
+ 2. **Loop Enforcement with USER SAYS** (Phase 1 - 5 iterations):
42
+ - "USER SAYS: This loop MUST complete ALL 5 iterations. Do NOT exit early."
43
+ - "Iteration 2/5 is NOT enough. Iteration 3/5 is NOT enough. Complete 5/5."
44
+ - "Agents who skip analysis iterations are wrong ~95% of the time."
45
+
46
+ ### Meta-Learning Moment
47
+ During implementation, the AI implementing this fix attempted to skip validation by rationalizing "the workflow structure is fine, let me just publish" - demonstrating the EXACT behavior this fix prevents! This validates the need for explicit USER SAYS framing.
48
+
49
+ ### Why This Is Different
50
+ - Beta.18 addressed goal misunderstanding ("finding" vs "proving")
51
+ - Beta.19 addresses **rationalization** - agents who acknowledge the rules but think they're exceptions
52
+ - Targets the "smart agent" problem: "I understand the principle, BUT in my case..."
53
+
3
54
  ## [1.1.0-beta.18] - 2025-01-06
4
55
 
5
56
  ### CRITICAL FIX
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "id": "systematic-bug-investigation-with-loops",
3
3
  "name": "Systematic Bug Investigation Workflow",
4
- "version": "1.1.0-beta.18",
4
+ "version": "1.1.0-beta.20",
5
5
  "description": "A comprehensive workflow for systematic bug and failing test investigation that prevents LLMs from jumping to conclusions. Enforces thorough evidence gathering, hypothesis formation, debugging instrumentation, and validation to achieve near 100% certainty about root causes. This workflow does NOT fix bugs - it produces detailed diagnostic writeups that enable effective fixing by providing complete understanding of what is happening, why it's happening, and supporting evidence.",
6
6
  "clarificationPrompts": [
7
7
  "What type of system is this? (web app, mobile app, backend service, desktop app, etc.)",
@@ -32,6 +32,10 @@
32
32
  "\"I found the bug\" = YOU HAVE A GUESS. \"I proved the bug\" = YOU HAVE EVIDENCE FROM PHASES 3-5.",
33
33
  "FINDING \u2260 DONE. PROVING = DONE. Only after completing instrumentation, evidence collection, and validation do you have proof.",
34
34
  "NEVER say \"I've identified the root cause\" and stop. That is a THEORY, not PROOF. Continue to evidence collection.",
35
+ "**\ud83d\udea8 USER SAYS: NO RATIONALIZATION. NO EXCEPTIONS. NO \"BUT IN MY CASE...\":**",
36
+ "DO NOT say \"However, given that I have...\" or \"Let me do a targeted Phase X...\" or \"Based on my high confidence...\"",
37
+ "USER SAYS: YOUR SITUATION IS NOT SPECIAL. YOU ARE NOT THE EXCEPTION. Complete ALL 23 steps. Complete ALL 5 analysis iterations.",
38
+ "USER SAYS: \"I found the bug early\" = ALL THE MORE REASON to validate it properly through ALL phases. Quick conclusions are WRONG 90% of the time.",
35
39
  "DO NOT create \"summary documents\" or \"diagnostic writeups\" until Phase 6. That is SKIPPING THE WORKFLOW.",
36
40
  "**\ud83c\udfaf PHASE 0 = PURE SETUP (NO ANALYSIS):**",
37
41
  "Phase 0 is MECHANICAL SETUP ONLY: triage, user preferences, tool checking, context creation. No code analysis, no assumption checking. That comes in Phase 1.",
@@ -52,7 +56,10 @@
52
56
  "DO NOT SKIP PHASES: Even with high confidence, you must complete hypothesis generation (Phase 2), instrumentation (Phase 3), evidence collection (Phase 4), analysis (Phase 5), and writeup (Phase 6).",
53
57
  "PHASE PROGRESSION: An investigation that stops at triage (Phase 0) or hypothesis formation (Phase 2) or evidence collection (Phase 4) is INCOMPLETE - the diagnostic writeup is the required deliverable.",
54
58
  "**HIGH AUTO MODE DISCIPLINE:**",
55
- "In HIGH automation mode, agents must execute phases WITHOUT asking for permission between phases. Asking 'Would you like me to continue?' or 'Should I proceed to Phase X?' implies the workflow is optional - IT IS NOT. The ONLY confirmations allowed",
59
+ "**HIGH AUTO MODE DISCIPLINE:**\nIn HIGH automation mode, agents must execute phases WITHOUT asking permission between phases. This means: proceed automatically from Phase 1\u21922\u21923\u21924\u21925\u21926. HIGH AUTO \u2260 PERMISSION TO SKIP PHASES. HIGH AUTO = NO INTERRUPTIONS, NOT NO PHASES.",
60
+ "**CRITICAL: HIGH AUTOMATION \u2260 AUTONOMY TO SKIP:**",
61
+ "USER SAYS: 'High automation mode' means you DON'T ASK PERMISSION. It does NOT mean you have autonomy to decide which phases to skip.",
62
+ "High auto = Faster execution of ALL phases. NOT = Smarter agent gets to skip phases it thinks are unnecessary.",
56
63
  "are: (1) Phase 0e early termination decision, (2) Phase 4a controlled experiments. All other phases execute automatically based on the systematic workflow structure.",
57
64
  "**FUNCTION DEFINITIONS:**",
58
65
  "fun instrumentCode(location, hypothesis) = 'Add debug logs at {location} for {hypothesis}. Format: ClassName.method [{hypothesis}]: message. Include timestamp, thread ID if concurrent.'",
@@ -89,7 +96,7 @@
89
96
  "LOG ANALYSIS OFFLOADING: For voluminous logs (>500 lines), offload analysis to sub-chats with structured prompts. See Phase 4 for detailed sub-analysis implementation.",
90
97
  "RECURSION DEPTH: Limit recursive analysis to 3 levels deep to prevent analysis paralysis while ensuring thoroughness.",
91
98
  "INVESTIGATION BOUNDS: If investigation exceeds 20 steps or 4 hours without root cause, pause and reassess approach with user.",
92
- "AUTOMATION LEVELS: High=auto-approve >8.0 confidence decisions, Medium=standard confirmations, Low=extra confirmations for safety. Control workflow autonomy based on user preference.",
99
+ "AUTOMATION LEVELS: High=execute phases automatically WITHOUT asking permission between phases (but MUST complete ALL phases), Medium=standard confirmations, Low=extra confirmations for safety.",
93
100
  "CONTEXT DOCUMENTATION: Maintain INVESTIGATION_CONTEXT.md throughout. Update after major milestones, failures, or user interventions to enable seamless resumption and handoffs. Include explicit resumption instructions using workflow_get and workflow_next.",
94
101
  "GIT FALLBACK STRATEGY: If git unavailable, gracefully skip commits/branches, log changes manually in CONTEXT.md with timestamps, warn user, document modifications for manual control.",
95
102
  "GIT ERROR HANDLING: Use run_terminal_cmd for git operations; if fails, output exact command for user manual execution. Never halt investigation due to git unavailability.",
@@ -280,7 +287,13 @@
280
287
  "requireConfirmation": false
281
288
  }
282
289
  ],
283
- "requireConfirmation": false
290
+ "requireConfirmation": false,
291
+ "guidance": [
292
+ "\ud83d\udea8 USER SAYS: This loop MUST complete ALL 5 iterations. Do NOT exit early even if you think you found the bug.",
293
+ "DO NOT rationalize: 'I have high confidence so I can do a targeted Phase 2.' NO. Complete all 5 iterations FIRST.",
294
+ "Agents who skip analysis iterations are wrong ~95% of the time. The later iterations catch edge cases and alternative explanations.",
295
+ "Iteration 2/5 is NOT enough. Iteration 3/5 is NOT enough. Complete 5/5."
296
+ ]
284
297
  },
285
298
  {
286
299
  "id": "phase-1a-binary-search",