@exaudeus/workrail 3.12.0 → 3.14.0

Files changed (54)
  1. package/dist/console/assets/{index-CRgjJiMS.js → index-EsSXrC_a.js} +11 -11
  2. package/dist/console/index.html +1 -1
  3. package/dist/di/container.js +8 -0
  4. package/dist/di/tokens.d.ts +1 -0
  5. package/dist/di/tokens.js +1 -0
  6. package/dist/infrastructure/session/HttpServer.js +2 -14
  7. package/dist/manifest.json +93 -53
  8. package/dist/mcp/boundary-coercion.d.ts +2 -0
  9. package/dist/mcp/boundary-coercion.js +73 -0
  10. package/dist/mcp/handler-factory.d.ts +1 -1
  11. package/dist/mcp/handler-factory.js +13 -6
  12. package/dist/mcp/handlers/v2-manage-workflow-source.d.ts +7 -0
  13. package/dist/mcp/handlers/v2-manage-workflow-source.js +50 -0
  14. package/dist/mcp/handlers/v2-workflow.d.ts +3 -0
  15. package/dist/mcp/handlers/v2-workflow.js +58 -0
  16. package/dist/mcp/output-schemas.d.ts +93 -0
  17. package/dist/mcp/output-schemas.js +8 -1
  18. package/dist/mcp/server.js +2 -0
  19. package/dist/mcp/tool-descriptions.js +20 -0
  20. package/dist/mcp/tools.js +6 -0
  21. package/dist/mcp/types/tool-description-types.d.ts +1 -1
  22. package/dist/mcp/types/tool-description-types.js +1 -0
  23. package/dist/mcp/types/workflow-tool-edition.d.ts +1 -1
  24. package/dist/mcp/types.d.ts +2 -0
  25. package/dist/mcp/v2/tool-registry.js +8 -0
  26. package/dist/mcp/v2/tools.d.ts +12 -0
  27. package/dist/mcp/v2/tools.js +7 -1
  28. package/dist/types/workflow-definition.d.ts +1 -0
  29. package/dist/v2/infra/in-memory/managed-source-store/index.d.ts +8 -0
  30. package/dist/v2/infra/in-memory/managed-source-store/index.js +33 -0
  31. package/dist/v2/infra/local/data-dir/index.d.ts +2 -0
  32. package/dist/v2/infra/local/data-dir/index.js +6 -0
  33. package/dist/v2/infra/local/managed-source-store/index.d.ts +15 -0
  34. package/dist/v2/infra/local/managed-source-store/index.js +164 -0
  35. package/dist/v2/ports/data-dir.port.d.ts +2 -0
  36. package/dist/v2/ports/managed-source-store.port.d.ts +25 -0
  37. package/dist/v2/ports/managed-source-store.port.js +2 -0
  38. package/package.json +2 -1
  39. package/spec/authoring-spec.json +9 -2
  40. package/spec/workflow.schema.json +418 -96
  41. package/workflows/adaptive-ticket-creation.json +276 -282
  42. package/workflows/document-creation-workflow.json +70 -191
  43. package/workflows/documentation-update-workflow.json +59 -309
  44. package/workflows/intelligent-test-case-generation.json +37 -212
  45. package/workflows/personal-learning-materials-creation-branched.json +1 -21
  46. package/workflows/presentation-creation.json +143 -308
  47. package/workflows/relocation-workflow-us.json +161 -535
  48. package/workflows/scoped-documentation-workflow.json +110 -181
  49. package/workflows/workflow-for-workflows.v2.json +72 -16
  50. package/workflows/CHANGELOG-bug-investigation.md +0 -298
  51. package/workflows/bug-investigation.agentic.json +0 -212
  52. package/workflows/bug-investigation.json +0 -112
  53. package/workflows/mr-review-workflow.agentic.json +0 -538
  54. package/workflows/mr-review-workflow.json +0 -277
package/workflows/workflow-for-workflows.v2.json +72 -16
@@ -1,7 +1,7 @@
  {
  "id": "workflow-for-workflows",
  "name": "Workflow Authoring Workflow (Quality Gate v2)",
- "version": "2.1.0",
+ "version": "2.3.0",
  "description": "Guides an agent through authoring or modernizing a WorkRail workflow with a stronger quality gate: understand the task, define effectiveness targets, design both workflow and quality architecture, draft, validate, simulate execution, run adversarial review, redesign if needed, and only then hand off.",
  "recommendedPreferences": {
  "recommendedAutonomy": "guided",
@@ -28,6 +28,8 @@
  "ARTIFACT STRATEGY: the workflow JSON file is the primary output. Intermediate notes go in output.notesMarkdown. Do not create extra planning artifacts unless the workflow is genuinely complex.",
  "V2 DURABILITY: use output.notesMarkdown as the primary durable record. Do not mirror execution state into CONTEXT.md or markdown checkpoint files.",
  "ANTI-PATTERNS TO AVOID IN AUTHORED WORKFLOWS: no pseudo-function metaGuidance, no learning-path branching, no satisfaction-score loops, no heavy clarification batteries, no regex-as-primary-validation, no celebration phases.",
+ "MODERNIZATION DISTINCTION: remove format problems (pseudo-DSL, regex, A/B phases). Preserve or equivalently replace behavioral mechanisms (forcing functions, hard gates, domain knowledge). Never silently drop a mechanism that prevents a real failure mode.",
+ "EQUIVALENT REPLACEMENT: a replacement only qualifies if it prevents the same failure mode with similar enforcement strength. A rubric suggestion is not equivalent to a hard gate. Document the tradeoff explicitly when the replacement is weaker.",
  "NEVER COMMIT MARKDOWN FILES UNLESS USER EXPLICITLY ASKS."
  ],
  "references": [
@@ -112,7 +114,10 @@
  "goal": "Understand what workflow you are authoring or modernizing, and classify the task before you design anything.",
  "constraints": [
  [
- { "kind": "ref", "refId": "wr.refs.notes_first_durability" }
+ {
+ "kind": "ref",
+ "refId": "wr.refs.notes_first_durability"
+ }
  ],
  "Explore first. Ask the user only what you genuinely cannot determine with tools and references.",
  "Choose baselines as models, not templates. Copy structural patterns, not another workflow's domain voice."
@@ -123,11 +128,12 @@
  "Classify the target workflow archetype: `review_audit`, `coding_execution`, `diagnostic_investigation`, `planning_design`, `linear_operational`, or `content_analysis`.",
  "Classify `workflowComplexity`: Simple, Medium, or Complex. Classify `rigorMode`: QUICK, STANDARD, or THOROUGH.",
  "Choose an `authoringBaseline` for engine-native authoring quality and an `outcomeBaseline` for the kind of job the authored workflow should perform. If no good baseline exists for one of them, set it to `none` and explain why.",
+ "If `authoringMode = modernize_existing`, build a value inventory BEFORE forming opinions about what to change. Read the original and classify each meaningful mechanism: (1) enforcement mechanisms (forcing functions, hard gates, required outputs), (2) domain knowledge (problem-specific principles the agent would not otherwise know), (3) behavioral rules (persistent constraints on how the agent works). This inventory is the preservation checklist.",
  "If `authoringMode = modernize_existing`, identify what must stay the same about purpose, what feels stale, and what modernization constraints apply."
  ],
  "outputRequired": {
  "notesMarkdown": "Task understanding, baseline choices, patterns to borrow or avoid, and any real open questions.",
- "context": "Capture authoringMode, workflowArchetype, workflowComplexity, rigorMode, taskDescription, intendedAudience, successCriteria, domainConstraints, targetWorkflowPath, modernizationGoals, authoringBaseline, outcomeBaseline, baselineDecisionRationale, authoringPatternsToBorrow, outcomePatternsToBorrow, patternsToAvoid, openQuestions."
+ "context": "Capture authoringMode, workflowArchetype, workflowComplexity, rigorMode, taskDescription, intendedAudience, successCriteria, domainConstraints, targetWorkflowPath, modernizationGoals, authoringBaseline, outcomeBaseline, baselineDecisionRationale, authoringPatternsToBorrow, outcomePatternsToBorrow, patternsToAvoid, openQuestions, and valueInventory (modernize_existing only)."
  },
  "verify": [
  "The task is understood well enough to design the workflow without guessing blindly.",
@@ -178,11 +184,11 @@
  "Decide the phase list, one-line goal for each phase, and overall ordering.",
  "Design loops with explicit exit rules, bounded maxIterations, and real reasons for another pass.",
  "Decide confirmation gates, delegation vs template injection vs direct execution, promptFragments, references, artifacts, and metaGuidance.",
- "If `authoringMode = modernize_existing`, decide whether the plan is preserve-in-place, restructure, or rewrite, and map legacy behaviors as `keep`, `merge`, `remove`, or `replace`."
+ "If `authoringMode = modernize_existing`, decide preserve-in-place, restructure, or rewrite. For each item in valueInventory, record: `preserved` (structurally present with equivalent enforcement), `replaced` (new mechanism prevents same failure mode \u2014 justify equivalence), or `dropped` (intentionally removed \u2014 justify the loss). Phase-level mapping alone is insufficient; track what was inside each restructured or removed phase."
  ],
  "outputRequired": {
  "notesMarkdown": "Structured workflow outline, loop design, confirmation design, delegation design, artifact plan, and modernization mapping.",
- "context": "Capture workflowOutline, loopDesign, confirmationDesign, delegationDesign, artifactPlan, contextModel, voiceStrategy, routineAudit, delegationBoundaries, templateInjectionPlan, modernizationStrategy, legacyMapping, and behaviorPreservationNotes."
+ "context": "Capture workflowOutline, loopDesign, confirmationDesign, delegationDesign, artifactPlan, contextModel, voiceStrategy, routineAudit, delegationBoundaries, templateInjectionPlan, modernizationStrategy, legacyMapping, behaviorPreservationNotes, and valuePreservationMap (modernize_existing only)."
  },
  "verify": [
  "The authored workflow architecture is coherent before JSON drafting begins."
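The new `valuePreservationMap` rule in the hunk above can be pictured as a small completeness check. This is only a sketch under stated assumptions, not WorkRail code: the `PreservationEntry` shape and the `findPreservationGaps` helper are hypothetical names introduced for illustration. Only the three statuses (`preserved`, `replaced`, `dropped`) and the requirement to justify `replaced` and `dropped` come from the step text.

```typescript
// Hypothetical shapes for illustration; WorkRail does not define these types.
type PreservationStatus = "preserved" | "replaced" | "dropped";

interface PreservationEntry {
  item: string;               // name of a valueInventory item
  status: PreservationStatus; // outcome recorded during modernization
  justification?: string;     // expected for "replaced" and "dropped"
}

// Returns inventory items that never made it into the map, plus entries
// whose status calls for a justification but carries none.
function findPreservationGaps(
  inventory: string[],
  map: PreservationEntry[],
): { unmapped: string[]; unjustified: string[] } {
  const mapped = new Set(map.map((entry) => entry.item));
  const unmapped = inventory.filter((item) => !mapped.has(item));
  const unjustified = map
    .filter((entry) => entry.status !== "preserved")
    .filter((entry) => (entry.justification ?? "").trim() === "")
    .map((entry) => entry.item);
  return { unmapped, unjustified };
}
```

A non-empty `unmapped` list corresponds to the "silently dropped" case the metaGuidance warns about; a non-empty `unjustified` list corresponds to a replacement or removal recorded without the required reasoning.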
@@ -191,14 +197,23 @@
  "promptFragments": [
  {
  "id": "phase-2-simple-direct",
- "when": { "var": "workflowComplexity", "equals": "Simple" },
+ "when": {
+ "var": "workflowComplexity",
+ "equals": "Simple"
+ },
  "text": "For Simple workflows, keep the architecture linear and compact. Do not invent loops or ceremony unless the task truly needs them."
  }
  ],
  "requireConfirmation": {
  "or": [
- { "var": "workflowComplexity", "not_equals": "Simple" },
- { "var": "rigorMode", "not_equals": "QUICK" }
+ {
+ "var": "workflowComplexity",
+ "not_equals": "Simple"
+ },
+ {
+ "var": "rigorMode",
+ "not_equals": "QUICK"
+ }
  ]
  }
  },
@@ -227,8 +242,14 @@
  },
  "requireConfirmation": {
  "or": [
- { "var": "rigorMode", "equals": "THOROUGH" },
- { "var": "workflowComplexity", "equals": "Complex" }
+ {
+ "var": "rigorMode",
+ "equals": "THOROUGH"
+ },
+ {
+ "var": "workflowComplexity",
+ "equals": "Complex"
+ }
  ]
  }
  },
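The condition objects reformatted in these hunks (`var` with `equals` or `not_equals`, combined under `or`) could be read by an evaluator along the following lines. This is a minimal sketch, not the WorkRail engine's implementation: `WorkflowCondition`, `WorkflowContext`, and `evaluateCondition` are hypothetical names, and treating an unset variable as "not equal" is an assumption.

```typescript
// Condition shapes as they appear in the diff; everything else is illustrative.
type WorkflowCondition =
  | { var: string; equals: string }
  | { var: string; not_equals: string }
  | { or: WorkflowCondition[] };

type WorkflowContext = Record<string, string | undefined>;

function evaluateCondition(
  cond: WorkflowCondition,
  ctx: WorkflowContext,
): boolean {
  if ("or" in cond) {
    // An "or" passes as soon as any branch passes.
    return cond.or.some((branch) => evaluateCondition(branch, ctx));
  }
  const actual = ctx[cond.var];
  if ("equals" in cond) {
    return actual === cond.equals;
  }
  // Assumption: an unset variable compares as "not equal".
  return actual !== cond.not_equals;
}

// The requireConfirmation block from the hunk above, written as a value.
const requireConfirmation: WorkflowCondition = {
  or: [
    { var: "rigorMode", equals: "THOROUGH" },
    { var: "workflowComplexity", equals: "Complex" },
  ],
};
```

Under this reading, the one-line and multi-line JSON forms in the diff are the same condition; the change is formatting only.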
@@ -258,7 +279,10 @@
  "promptFragments": [
  {
  "id": "phase-4-simple-fast",
- "when": { "var": "workflowComplexity", "equals": "Simple" },
+ "when": {
+ "var": "workflowComplexity",
+ "equals": "Simple"
+ },
  "text": "For Simple workflows, keep the file compact and linear. Do not create extra metaGuidance or loops unless the task truly needs them."
  }
  ],
@@ -303,7 +327,10 @@
  "promptFragments": [
  {
  "id": "phase-5a-thorough",
- "when": { "var": "rigorMode", "equals": "THOROUGH" },
+ "when": {
+ "var": "rigorMode",
+ "equals": "THOROUGH"
+ },
  "text": "After structural validation passes, also check the workflow manually against required-level authoring-spec rules and fix any failures before moving on."
  }
  ],
@@ -405,7 +432,9 @@
  "procedure": [
  "Trace the authored workflow step by step against the user's actual task or the closest realistic scenario.",
  "For each step, ask: what would the agent actually do, what context would it have, what would it likely produce, and what would the next step inherit?",
+ "Also trace at least one degraded or edge-case path \u2014 not just the happy path. Ask: what happens when a condition evaluates unexpectedly, a loop has nothing to iterate, a runCondition skips a phase, or the user provides minimal input? Quality gates that only protect the happy path are not quality gates.",
  "Identify likely weak steps, likely unsatisfying outputs, and likely false-confidence modes.",
+ "For any loop in the workflow, explicitly check: does the exit condition have structural teeth (artifact contract, bounded maxIterations), or does it rely on prose instructions the engine cannot enforce?",
  "Fix issues directly in the workflow file when the right improvement is clear."
  ],
  "outputRequired": {
@@ -419,8 +448,19 @@
  "promptFragments": [
  {
  "id": "phase-6b-quick",
- "when": { "var": "rigorMode", "equals": "QUICK" },
+ "when": {
+ "var": "rigorMode",
+ "equals": "QUICK"
+ },
  "text": "For QUICK rigor, keep the simulation compact but still answer where the workflow would likely disappoint the user if it disappointed them at all."
+ },
+ {
+ "id": "phase-6b-modernize-check",
+ "when": {
+ "var": "authoringMode",
+ "equals": "modernize_existing"
+ },
+ "text": "For modernize_existing: after tracing the workflow forward, check each item in valueInventory. For each enforcement mechanism and domain knowledge item: would the modernized workflow produce the same behavior? Any item where the answer is no or weaker is a loss \u2014 fix it directly or record the accepted tradeoff with justification."
  }
  ],
  "requireConfirmation": false
@@ -435,9 +475,10 @@
  "Reviewer-family or validator output is evidence, not authority."
  ],
  "procedure": [
- "Score these dimensions 0-2 with one sentence of evidence each: `voiceClarity`, `ceremonyLevel`, `loopSoundness`, `delegationBoundedness`, `artifactClarity`, `taskEffectiveness`, `falseConfidenceResistance`, `stateMinimality`, `coverageSharpness`, `domainFit`, `handoffUtility`, and `modernizationDiscipline`.",
+ "Score these dimensions 0-2 with one sentence of evidence each: `voiceClarity`, `ceremonyLevel`, `loopSoundness`, `delegationBoundedness`, `artifactClarity`, `taskEffectiveness`, `falseConfidenceResistance`, `stateMinimality`, `coverageSharpness`, `domainFit`, `handoffUtility`, `rigorAdaptability` (0 = adapts to complexity/rigor levels, 2 = single-weight), `enforcementStrength` (0 = behavioral rules have structural teeth; 2 = important rules are prose-only with no enforcement mechanism), and `modernizationDiscipline` (0 = every valueInventory item preserved, equivalently replaced with justification, or dropped with justification; 2 = items missing or replaced with weaker versions without justification \u2014 score 0 for create mode).",
  "If delegation is available and rigor is THOROUGH, run an adversarial review bundle with these lenses: `engine_native_reviewer`, `task_effectiveness_reviewer`, `state_economy_reviewer`, `false_confidence_reviewer`, `domain_fit_reviewer`, and `maintainer_reviewer`.",
  "Synthesize what the review confirmed, what it challenged, and what changed your mind.",
+ "When scoring `falseConfidenceResistance`, explicitly check: do the workflow's quality gates protect edge cases and degraded paths, or only the happy path? A workflow that passes its own checks on ideal input but fails silently on minimal or unexpected input scores 2.",
  "Set hard-gate failures whenever any of these are materially weak: `taskEffectiveness`, `falseConfidenceResistance`, `stateMinimality`, `coverageSharpness`, `domainFit`, or `handoffUtility`.",
  "Set `authoringIntegrityPassed = true` only if structural and authoring-quality dimensions are all acceptable. Set `outcomeEffectivenessPassed = true` only if the workflow is likely to achieve satisfying results for the user."
  ],
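The scoring procedure in the hunk above uses an inverted scale (0 is the best score, 2 the worst) and names six hard-gate dimensions. A rough sketch of the hard-gate step follows. It is not taken from the package: `hardGateFailures` is a hypothetical helper, and reading "materially weak" as a score of 2 is an assumption, since the step text does not define a threshold.

```typescript
// Rubric scale from the step text: 0 is best, 2 is worst.
type Score = 0 | 1 | 2;

// The six dimensions the procedure lists as hard gates.
const HARD_GATE_DIMENSIONS = [
  "taskEffectiveness",
  "falseConfidenceResistance",
  "stateMinimality",
  "coverageSharpness",
  "domainFit",
  "handoffUtility",
] as const;

type HardGateDimension = (typeof HARD_GATE_DIMENSIONS)[number];

// Assumption: "materially weak" is interpreted here as a score of 2.
function hardGateFailures(
  scores: Record<HardGateDimension, Score>,
): HardGateDimension[] {
  return HARD_GATE_DIMENSIONS.filter((dimension) => scores[dimension] === 2);
}
```

Because the scale is inverted, a workflow that "fails silently on minimal or unexpected input" lands on 2 for `falseConfidenceResistance` and would therefore trip a hard gate under this reading.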
@@ -452,13 +493,27 @@
  "promptFragments": [
  {
  "id": "phase-6c-standard",
- "when": { "var": "rigorMode", "equals": "STANDARD" },
+ "when": {
+ "var": "rigorMode",
+ "equals": "STANDARD"
+ },
  "text": "For STANDARD rigor, you may keep the review self-executed unless uncertainty remains material. If you do delegate, prefer a small adversarial bundle."
  },
  {
  "id": "phase-6c-thorough",
- "when": { "var": "rigorMode", "equals": "THOROUGH" },
+ "when": {
+ "var": "rigorMode",
+ "equals": "THOROUGH"
+ },
  "text": "For THOROUGH rigor, assume the first review is not enough. Use adversarial reviewer lanes unless a hard limitation makes them impossible."
+ },
+ {
+ "id": "phase-6c-heritage-review",
+ "when": {
+ "var": "authoringMode",
+ "equals": "modernize_existing"
+ },
+ "text": "For modernize_existing: add a heritage_reviewer to the adversarial bundle. Its job is to check each valueInventory item and find what was lost or weakened \u2014 it ignores format improvements. It must answer: which enforcement mechanisms are now prose-only? Which domain knowledge items are absent? Which behavioral rules were removed without equivalent replacement? Heritage_reviewer findings drive enforcementStrength and modernizationDiscipline scores."
  }
  ],
  "requireConfirmation": false
@@ -526,6 +581,7 @@
  "Keep it concise. The workflow file is the deliverable, not the summary."
  ],
  "procedure": [
+ "Stamp the workflow file: read the current `version` from `spec/authoring-spec.json` and write `validatedAgainstSpecVersion: <N>` as a top-level field in the workflow JSON. Commit the change \u2014 the stamp has no effect if only saved locally.",
  "State the workflow file path and name, whether it was created or modernized, and what it does in one sentence.",
  "Summarize the step structure, loops, confirmations, and delegation profile.",
  "Report validation status, authoring-integrity status, and outcome-effectiveness status.",
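The stamping step added in this hunk amounts to copying one value from the spec into the workflow. The sketch below shows that as a pure function, leaving file I/O and the commit to the caller. `AuthoringSpec`, `WorkflowJson`, and `stampWorkflow` are hypothetical names, and typing the spec `version` as a number is an assumption based on the `<N>` placeholder in the step text.

```typescript
// Hypothetical minimal shapes; the real spec and workflow files carry more fields.
interface AuthoringSpec {
  version: number; // assumption: the spec version is numeric
}

type WorkflowJson = Record<string, unknown>;

// Returns a copy of the workflow with validatedAgainstSpecVersion set to the
// spec's current version. The input object is left untouched.
function stampWorkflow(
  workflow: WorkflowJson,
  spec: AuthoringSpec,
): WorkflowJson {
  return { ...workflow, validatedAgainstSpecVersion: spec.version };
}
```

In practice the caller would parse `spec/authoring-spec.json` and the workflow file, apply the function, write the result back, and commit it, since the step text notes the stamp has no effect if it is only saved locally.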
@@ -1,298 +0,0 @@
1
- # Changelog - Systematic Bug Investigation Workflow
2
-
3
- ## [1.1.0-beta.22] - 2025-01-06
4
-
5
- ### CRITICAL FIX - Invalid Loop Step Schema
6
- - **ROOT CAUSE**: In beta.19, we added `guidance` to the loop step, but loop steps DON'T support guidance in the schema
7
- - Schema allows: `id`, `type`, `title`, `loop`, `body`, `functionDefinitions`, `requireConfirmation`, `runCondition`
8
- - Does NOT allow: `guidance`, `prompt`, `agentRole`
9
- - **Fix**: Moved loop enforcement guidance to first body step (`analysis-neighborhood-contracts`)
10
- - "USER SAYS: This loop MUST complete ALL 5 iterations..."
11
- - Now properly enforced on each iteration
12
- - **Validation**: Workflow now passes full schema validation
13
-
14
- ### Why This Matters
15
- Without proper validation, the MCP server couldn't load the workflow at all. Beta.19-21 were broken due to schema violations.
16
-
17
- ## [1.1.0-beta.21] - 2025-01-06
18
-
19
- ### HOTFIX - metaGuidance Schema Violations
20
- - **Fixed**: metaGuidance entry 35 exceeded 256 character limit (266 chars)
21
- - Split "HIGH AUTO MODE DISCIPLINE" into 3 separate entries
22
- - **Fixed**: Duplicate metaGuidance entries after split
23
- - Removed duplicates, cleaned to 89 unique entries
24
- - **Note**: CLI validator reports loop step errors (false positive - loops have different schema)
25
- - Workflow loads successfully in MCP server
26
- - Same loop structure as beta.18 which worked fine
27
-
28
- ## [1.1.0-beta.20] - 2025-01-06
29
-
30
- ### CRITICAL FIX - Dangerous "Autonomy" Language
31
- - **ROOT CAUSE IDENTIFIED**: Our automation level descriptions were giving agents permission to skip!
32
- - OLD: "High=**auto-approve >8.0 confidence decisions**"
33
- - Interpreted as: "I have 9/10 confidence → I can approve my decision to skip phases"
34
- - OLD: "Control workflow **autonomy**"
35
- - Interpreted as: "High mode gives me autonomy to decide what to skip"
36
-
37
- ### Language Fixes
38
- 1. **Removed "auto-approve decisions"**: Changed to "execute phases automatically WITHOUT asking permission between phases"
39
- 2. **Removed "autonomy"**: Changed to "Control confirmation frequency"
40
- 3. **Clarified HIGH AUTO MODE**:
41
- - NEW: "HIGH AUTO = NO INTERRUPTIONS, NOT NO PHASES"
42
- - NEW: "HIGH AUTO ≠ PERMISSION TO SKIP PHASES"
43
- 4. **Explicit USER SAYS**:
44
- - "USER SAYS: 'High automation mode' means you DON'T ASK PERMISSION. It does NOT mean you have autonomy to decide which phases to skip."
45
- - "High auto = Faster execution of ALL phases. NOT = Smarter agent gets to skip phases."
46
-
47
- ### Credit
48
- User insight: "Could the high automation be causing it to do this? do we frame it as letting it do whatever it wants?" - YES, we were!
49
-
50
- ## [1.1.0-beta.19] - 2025-01-06
51
-
52
- ### CRITICAL FIX - Anti-Rationalization
53
- - **NEW PATTERN DETECTED**: Agents now **acknowledge** the warnings but then **rationalize** why they don't apply
54
- - Example: "I know finding ≠ done... **However, given that I have high confidence...**"
55
- - Example: "Let me proceed with a **more targeted Phase 2**..." (skipping remaining iterations)
56
- - **Problem**: Agents stopped at **iteration 2 of 5** in Phase 1 loop - didn't even finish the analysis phase!
57
- - **Root Cause**: Agents think they can judge when to skip based on their "special" situation
58
-
59
- ### New Anti-Rationalization Safeguards
60
- 1. **Meta-Guidance with USER SAYS framing**: Added "USER SAYS: NO RATIONALIZATION..." section
61
- - **Why USER SAYS**: Agents follow direct user commands more reliably than abstract principles
62
- - "USER SAYS: YOUR SITUATION IS NOT SPECIAL. YOU ARE NOT THE EXCEPTION."
63
- - "USER SAYS: 'I found the bug early' = ALL THE MORE REASON to validate properly"
64
- - Explicitly forbids phrases like "However, given that..." or "targeted Phase X"
65
-
66
- 2. **Loop Enforcement with USER SAYS** (Phase 1 - 5 iterations):
67
- - "USER SAYS: This loop MUST complete ALL 5 iterations. Do NOT exit early."
68
- - "Iteration 2/5 is NOT enough. Iteration 3/5 is NOT enough. Complete 5/5."
69
- - "Agents who skip analysis iterations are wrong ~95% of the time."
70
-
71
- ### Meta-Learning Moment
72
- During implementation, the AI implementing this fix attempted to skip validation by rationalizing "the workflow structure is fine, let me just publish" - demonstrating the EXACT behavior this fix prevents! This validates the need for explicit USER SAYS framing.
73
-
74
- ### Why This Is Different
75
- - Beta.18 addressed goal misunderstanding ("finding" vs "proving")
76
- - Beta.19 addresses **rationalization** - agents who acknowledge the rules but think they're exceptions
77
- - Targets the "smart agent" problem: "I understand the principle, BUT in my case..."
78
-
79
- ## [1.1.0-beta.18] - 2025-01-06
80
-
81
- ### CRITICAL FIX
82
- - **Addresses persistent early-stopping bug**: Agents were still stopping after Phase 1/2 saying "I found the bug"
83
- - **Root Cause Identified**: Agents fundamentally misunderstand THE GOAL
84
- - WRONG: "The goal is finding the bug" → Stop after analysis with high confidence
85
- - RIGHT: "The goal is PROVING the bug with evidence" → Must complete Phases 3-5
86
- - **New Meta-Guidance Section**: Added explicit "CRITICAL MISUNDERSTANDING TO AVOID" section
87
- - "FINDING ≠ DONE. PROVING = DONE."
88
- - "\"I found the bug\" = YOU HAVE A GUESS. \"I proved the bug\" = YOU HAVE EVIDENCE."
89
- - "NEVER create summary documents until Phase 6"
90
- - **Step-Level Warnings**: Added "FINDING ≠ PROVING" warnings at all critical stopping points:
91
- - **Phase 1f** (after analysis): Full explanation of why analysis ≠ proof
92
- - **Phase 2a** (hypothesis development): "You have THEORIES, not EVIDENCE"
93
- - **Phase 2h** (midpoint): "You may have 'found' the bug, but haven't 'proved' it"
94
- - **Step Count Corrections**: Fixed inconsistencies (27 → 23 steps throughout)
95
-
96
- ### Why This Fix Is Different
97
- Previous fixes (beta.1-beta.17) added warnings about "high confidence ≠ done" but didn't address the fundamental goal misunderstanding. Agents thought their job was to "identify" the bug, not "prove" it. This fix makes the distinction crystal clear upfront.
98
-
99
- ## [1.1.0-beta.17] - 2025-01-06
100
-
101
- ### Major Restructuring
102
- - **Phase 0 Consolidation**: Merged 4 separate Phase 0 steps into single comprehensive setup step
103
- - Combined: Triage (0), User Preferences (0a), Tool Check (0b), Context Creation (0c)
104
- - Result: Single "Phase 0: Complete Investigation Setup" step covering all mechanical preparation
105
- - Rationale: Reduce workflow overhead while maintaining thorough setup
106
- - New structure: Phase 0 (Setup) → Phase 0a (Commitment Checkpoint, conditional)
107
-
108
- - **Assumption Verification Relocation**: Moved from Phase 0a to Phase 1f
109
- - Previously: Early assumption check before ANY code analysis (removed)
110
- - Now: Assumption verification AFTER all 5 analysis iterations complete (Phase 1f Step 2.5)
111
- - Rationale: Assumptions can only be properly verified with full code context
112
- - Timing: Happens after neighborhood mapping, pattern analysis, component ranking, data flow tracing, and test gap analysis
113
- - Location: Integrated into Phase 1f "Final Breadth & Scope Verification" before hypothesis development
114
-
115
- ### Impact
116
- - **Step Count**: Reduced from 27 steps to 23 steps (4 Phase 0 steps → 1)
117
- - **Phase Numbering**: Simplified Phase 0 structure (Phase 0d → Phase 0a)
118
- - **Debugging Workflow Alignment**: Better follows traditional debugging principles (observe fully THEN question assumptions THEN hypothesize)
119
- - **Agent Experience**: Faster setup phase, more informed assumption checking
120
-
121
- ### Breaking Changes
122
- - `completedSteps` array format changed:
123
- - OLD: `["phase-0-triage", "phase-0a-user-preferences", "phase-0b-tool-check", "phase-0c-create-context", "phase-0d-workflow-commitment"]`
124
- - NEW: `["phase-0-complete-setup", "phase-0a-workflow-commitment"]`
125
- - Step IDs changed: `phase-0d-workflow-commitment` → `phase-0a-workflow-commitment`
126
-
127
- ## [1.1.0-beta.9] - 2025-01-06
128
-
129
- ### Enhanced
130
- - **CRITICAL**: Strengthened anti-premature-completion safeguards throughout the workflow
131
- - Added explicit "ANALYSIS ≠ DIAGNOSIS ≠ PROOF" section in metaGuidance
132
- - Phase 1f: Added "DO NOT STOP HERE" warning emphasizing ZERO PROOF after analysis (~25% done)
133
- - Phase 2a: Added "YOU ARE NOT DONE" warning with 5-point reminder about mandatory validation
134
- - Phase 2h: Added "YOU ARE HALFWAY DONE (~50%)" warning before instrumentation phase
135
- - Clarified progression: Analysis (20%) → Hypotheses (40%) → Evidence (80%) → Writeup (100%)
136
- - Reinforced: Even with "100% confidence," stopping before evidence collection = providing guesses, not diagnosis
137
-
138
- ### Context
139
- - **Problem**: Agents were stopping after Phase 1 or 2 when they reached "100% confidence" in analysis/hypotheses
140
- - **Root Cause**: Agents conflating "confident theory" with "proven diagnosis"
141
- - **Solution**: Explicit warnings at every potential stopping point emphasizing lack of proof until Phases 3-5 complete
142
- - **Impact**: Forces agents to understand that analysis/hypotheses are NOT evidence, and professional practice requires validation
143
-
144
- ## [1.1.0-beta.8] - 2025-01-06
145
-
146
- ### Fixed
147
- - **CRITICAL**: Fixed loop execution bug where body steps with `runCondition` using iteration variables were completely skipped
148
- - Root cause: Loop variables (e.g., `analysisPhase`) were being injected AFTER evaluating runConditions, causing all conditions to fail
149
- - Impact: Phase 1's 5-iteration analysis loop was being entirely skipped, jumping straight to Phase 1f
150
- - Fix: Reordered logic to inject loop variables BEFORE evaluating body step runConditions
151
- - Also fixed: Pre-existing bug where single-step loop bodies didn't increment iterations properly
152
- - Test coverage: Added comprehensive integration tests (`loop-runCondition-bug.test.ts`) to prevent regression
153
-
154
- ## [1.1.0-beta.7] - 2025-01-06
155
-
156
- ### Fixed
157
- - **HOTFIX**: Corrected Phase 0e `runCondition` to use `not_equals` instead of invalid `notEquals` operator
158
- - Phase 0e now properly executes only when `automationLevel != 'High'`
159
- - High automation mode now proceeds through all phases without early termination checkpoint
160
-
161
- ## [1.1.0-beta.6] - 2025-01-06
162
-
163
- ### Added
164
- - **New Phase 1f**: Final Breadth & Scope Verification checkpoint after codebase analysis
165
- - Prevents tunnel vision by forcing scope sanity checks before hypothesis development
166
- - Requires evaluation of 2-3 alternative investigation scopes
167
- - Catches the #1 cause of wrong conclusions: looking in wrong place or too narrowly
168
- - Positioned strategically after Phase 1 analysis and before Phase 2 hypothesis formation
169
-
170
- ### Enhanced
171
- - **Phase 3 (Instrumentation)**: Dramatically expanded with concrete, step-by-step instructions
172
- - Language-specific code examples (JavaScript/TypeScript, Python, Java)
173
- - Detailed `search_replace` usage examples for applying instrumentation
174
- - Hypothesis-specific prefixes ([H1], [H2], [H3]) with standard formatting
175
- - File-by-file workflow: read → locate → instrument → verify
176
- - Fallback strategy if edit tools unavailable
177
- - Instrumentation checklist for tracking progress
178
-
179
- - **Phase 4 (Evidence Collection)**: Comprehensive decision tree and 7-step process
180
- - **OPTION A**: Agent can execute code → 4-step execution workflow
181
- - **OPTION B**: Agent cannot execute → User instruction template
182
- - Clear instructions on when to use each approach
183
- - Log consolidation and evidence organization by hypothesis
184
- - Evidence quality assessment (1-10 scale)
185
-
186
- - **metaGuidance**: Added explicit high auto mode discipline
187
- - Clarified that agents should not ask for permission between phases in high auto mode
188
- - Exception: Phase 0e early termination and Phase 4a controlled experiments
189
- - Reinforced that asking "should I continue?" implies investigation is optional (it is NOT)
190
-
191
- ### Changed
192
- - Total workflow steps: 26 steps (added Phase 1f)
193
- - Phase 1 analysis loop: Now clearly labeled as "Analysis 1/5" through "Analysis 5/5"
194
-
195
- ## [1.1.0-beta.5] - 2025-01-06
196
-
197
- ### Changed
198
- - **Phase 0e Relocation**: Moved early termination checkpoint from Phase 5b to Phase 0e (after triage)
199
- - Now appears immediately after setup, before any investigation work begins
200
- - Eliminates sunk cost fallacy (decision at 5% vs 90% completion)
201
- - Forces upfront decision-making about workflow commitment
202
-
203
- ### Added
204
- - **Mandatory User Communication**: Phase 0e now requires agents to explicitly tell users about 90% accuracy difference
205
- - Template message is NOT optional - agents MUST communicate this
206
- - User must explicitly confirm proceeding with full investigation
207
-
208
- ### Removed
209
- - **Phase 5b**: Removed old completion checkpoint (now Phase 0e)
210
- - Total workflow steps reduced from 28 to 26
211
-
- ## [1.1.0-beta.4] - 2025-01-05
-
- ### Enhanced
- - **Sophisticated Code Analysis**: Integrated advanced analysis techniques from MR review workflow into Phase 1
-
- ### Added
- - **New Phase 1a**: Neighborhood, Call Graph & Contracts analysis
-   - Module root computation (nearest common ancestor, clamped to package boundary)
-   - Neighborhood mapping (immediate neighbors, imports, tests, entry points)
-   - Bounded call graph with HOT path ranking (Small Multiples ASCII visualization)
-   - Flow anchors (entry points to bug: HTTP routes, CLI commands, scheduled jobs, event handlers)
-   - Contracts & invariants discovery (API symbols, endpoints, database tables, stated guarantees)
-
- - **Enhanced Phase 1 Structure**: Now 5 sub-phases (was 4)
-   1. Neighborhood, Call Graph & Contracts (NEW)
-   2. Breadth Scan (pattern discovery)
-   3. Deep Dive (suspicious code analysis)
-   4. Dependencies & Data Flow
-   5. Test Coverage Analysis
-
- ### Changed
- - Total workflow steps increased from 27 to 28 (added Phase 1a)
- - Phase 1 loop now iterates 5 times (was 4)
- - Each analysis phase now produces more structured, evidence-based outputs
-
- ## [1.1.0-beta.3] - 2025-01-05
-
- ### Fixed
- - **Critical**: Prevented ALL phase skipping, not just final documentation phase
-   - Root cause: Agents didn't understand they MUST repeatedly call workflow_next
-   - Added mandatory workflow execution instructions to metaGuidance
-   - Added early commitment checkpoint (Phase 0e) requiring user confirmation
-   - Reinforced evidence-based persuasion: 90% error rate for premature conclusions
-
- ### Added
- - **Phase 0e**: Workflow Execution Commitment checkpoint
-   - Appears immediately after triage (before investigation begins)
-   - Requires agent acknowledgment of workflow structure (26 steps)
-   - Requires user confirmation to proceed with full investigation
-   - Explicit warning: stopping early leads to wrong conclusions ~90% of time
-
- ### Enhanced
- - **metaGuidance**: Added comprehensive workflow execution discipline
-   - Agents MUST call workflow_next until isComplete=true
-   - High confidence (9-10/10) does NOT mean workflow is complete
-   - Professional research shows 90% error rate for jumping to conclusions
-   - Added "WHY THIS STRUCTURE EXISTS (Evidence-Based)" section
-
- ## [1.1.0-beta.2] - 2025-01-05
-
- ### Added
- - **Phase 5b**: Mandatory completion checkpoint with user confirmation
-   - Prevents agents from skipping comprehensive diagnostic writeup (Phase 6)
-   - Requires explicit acknowledgment that Phase 6 is the required deliverable
-   - User must confirm proceeding to final documentation phase
-
- ### Enhanced
- - **metaGuidance**: Added critical workflow discipline instructions
-   - Emphasized that high confidence does NOT equal completion
-   - Clarified that Phase 6 is a mandatory deliverable, not optional
-   - Added explicit instructions on when to set `isWorkflowComplete=true`
-
- ## [1.1.0-beta.1] - 2025-01-05
-
- ### Fixed
- - **Critical**: Prevented premature workflow completion
-   - Agents were jumping to conclusions and skipping phases with high confidence
-   - Root cause: Misinterpreting progress/confidence as final completion
-
- ### Added
- - **metaGuidance Section**: "CRITICAL WORKFLOW DISCIPLINE"
-   - High confidence (9-10/10) does NOT mean completion
-   - Agent MUST complete all phases (0-6) regardless of confidence
-   - Only set `isWorkflowComplete=true` after Phase 6 comprehensive writeup
-
- - **Phase-Specific Warnings**:
-   - Phase 2a (Hypothesis Formation): Warning against treating hypothesis as conclusion
-   - Phase 5a (Confidence Assessment): Warning that 10/10 confidence still requires Phase 6
-
- ### Enhanced
- - **Phase 6 Instructions**: Explicit completion marking
-   - Must set `isWorkflowComplete=true` in this phase
-   - Must produce comprehensive diagnostic writeup
-   - This is the ONLY phase that marks workflow as truly complete
-
- ### Changed
- - All phase prompts updated to reference 27 total workflow steps for clarity