@exaudeus/workrail 3.12.0 → 3.14.0
- package/dist/console/assets/{index-CRgjJiMS.js → index-EsSXrC_a.js} +11 -11
- package/dist/console/index.html +1 -1
- package/dist/di/container.js +8 -0
- package/dist/di/tokens.d.ts +1 -0
- package/dist/di/tokens.js +1 -0
- package/dist/infrastructure/session/HttpServer.js +2 -14
- package/dist/manifest.json +93 -53
- package/dist/mcp/boundary-coercion.d.ts +2 -0
- package/dist/mcp/boundary-coercion.js +73 -0
- package/dist/mcp/handler-factory.d.ts +1 -1
- package/dist/mcp/handler-factory.js +13 -6
- package/dist/mcp/handlers/v2-manage-workflow-source.d.ts +7 -0
- package/dist/mcp/handlers/v2-manage-workflow-source.js +50 -0
- package/dist/mcp/handlers/v2-workflow.d.ts +3 -0
- package/dist/mcp/handlers/v2-workflow.js +58 -0
- package/dist/mcp/output-schemas.d.ts +93 -0
- package/dist/mcp/output-schemas.js +8 -1
- package/dist/mcp/server.js +2 -0
- package/dist/mcp/tool-descriptions.js +20 -0
- package/dist/mcp/tools.js +6 -0
- package/dist/mcp/types/tool-description-types.d.ts +1 -1
- package/dist/mcp/types/tool-description-types.js +1 -0
- package/dist/mcp/types/workflow-tool-edition.d.ts +1 -1
- package/dist/mcp/types.d.ts +2 -0
- package/dist/mcp/v2/tool-registry.js +8 -0
- package/dist/mcp/v2/tools.d.ts +12 -0
- package/dist/mcp/v2/tools.js +7 -1
- package/dist/types/workflow-definition.d.ts +1 -0
- package/dist/v2/infra/in-memory/managed-source-store/index.d.ts +8 -0
- package/dist/v2/infra/in-memory/managed-source-store/index.js +33 -0
- package/dist/v2/infra/local/data-dir/index.d.ts +2 -0
- package/dist/v2/infra/local/data-dir/index.js +6 -0
- package/dist/v2/infra/local/managed-source-store/index.d.ts +15 -0
- package/dist/v2/infra/local/managed-source-store/index.js +164 -0
- package/dist/v2/ports/data-dir.port.d.ts +2 -0
- package/dist/v2/ports/managed-source-store.port.d.ts +25 -0
- package/dist/v2/ports/managed-source-store.port.js +2 -0
- package/package.json +2 -1
- package/spec/authoring-spec.json +9 -2
- package/spec/workflow.schema.json +418 -96
- package/workflows/adaptive-ticket-creation.json +276 -282
- package/workflows/document-creation-workflow.json +70 -191
- package/workflows/documentation-update-workflow.json +59 -309
- package/workflows/intelligent-test-case-generation.json +37 -212
- package/workflows/personal-learning-materials-creation-branched.json +1 -21
- package/workflows/presentation-creation.json +143 -308
- package/workflows/relocation-workflow-us.json +161 -535
- package/workflows/scoped-documentation-workflow.json +110 -181
- package/workflows/workflow-for-workflows.v2.json +72 -16
- package/workflows/CHANGELOG-bug-investigation.md +0 -298
- package/workflows/bug-investigation.agentic.json +0 -212
- package/workflows/bug-investigation.json +0 -112
- package/workflows/mr-review-workflow.agentic.json +0 -538
- package/workflows/mr-review-workflow.json +0 -277
@@ -1,7 +1,7 @@
  {
  "id": "workflow-for-workflows",
  "name": "Workflow Authoring Workflow (Quality Gate v2)",
- "version": "2.
+ "version": "2.3.0",
  "description": "Guides an agent through authoring or modernizing a WorkRail workflow with a stronger quality gate: understand the task, define effectiveness targets, design both workflow and quality architecture, draft, validate, simulate execution, run adversarial review, redesign if needed, and only then hand off.",
  "recommendedPreferences": {
  "recommendedAutonomy": "guided",
@@ -28,6 +28,8 @@
  "ARTIFACT STRATEGY: the workflow JSON file is the primary output. Intermediate notes go in output.notesMarkdown. Do not create extra planning artifacts unless the workflow is genuinely complex.",
  "V2 DURABILITY: use output.notesMarkdown as the primary durable record. Do not mirror execution state into CONTEXT.md or markdown checkpoint files.",
  "ANTI-PATTERNS TO AVOID IN AUTHORED WORKFLOWS: no pseudo-function metaGuidance, no learning-path branching, no satisfaction-score loops, no heavy clarification batteries, no regex-as-primary-validation, no celebration phases.",
+ "MODERNIZATION DISTINCTION: remove format problems (pseudo-DSL, regex, A/B phases). Preserve or equivalently replace behavioral mechanisms (forcing functions, hard gates, domain knowledge). Never silently drop a mechanism that prevents a real failure mode.",
+ "EQUIVALENT REPLACEMENT: a replacement only qualifies if it prevents the same failure mode with similar enforcement strength. A rubric suggestion is not equivalent to a hard gate. Document the tradeoff explicitly when the replacement is weaker.",
  "NEVER COMMIT MARKDOWN FILES UNLESS USER EXPLICITLY ASKS."
  ],
  "references": [
@@ -112,7 +114,10 @@
  "goal": "Understand what workflow you are authoring or modernizing, and classify the task before you design anything.",
  "constraints": [
  [
- {
+ {
+ "kind": "ref",
+ "refId": "wr.refs.notes_first_durability"
+ }
  ],
  "Explore first. Ask the user only what you genuinely cannot determine with tools and references.",
  "Choose baselines as models, not templates. Copy structural patterns, not another workflow's domain voice."
@@ -123,11 +128,12 @@
  "Classify the target workflow archetype: `review_audit`, `coding_execution`, `diagnostic_investigation`, `planning_design`, `linear_operational`, or `content_analysis`.",
  "Classify `workflowComplexity`: Simple, Medium, or Complex. Classify `rigorMode`: QUICK, STANDARD, or THOROUGH.",
  "Choose an `authoringBaseline` for engine-native authoring quality and an `outcomeBaseline` for the kind of job the authored workflow should perform. If no good baseline exists for one of them, set it to `none` and explain why.",
+ "If `authoringMode = modernize_existing`, build a value inventory BEFORE forming opinions about what to change. Read the original and classify each meaningful mechanism: (1) enforcement mechanisms (forcing functions, hard gates, required outputs), (2) domain knowledge (problem-specific principles the agent would not otherwise know), (3) behavioral rules (persistent constraints on how the agent works). This inventory is the preservation checklist.",
  "If `authoringMode = modernize_existing`, identify what must stay the same about purpose, what feels stale, and what modernization constraints apply."
  ],
  "outputRequired": {
  "notesMarkdown": "Task understanding, baseline choices, patterns to borrow or avoid, and any real open questions.",
- "context": "Capture authoringMode, workflowArchetype, workflowComplexity, rigorMode, taskDescription, intendedAudience, successCriteria, domainConstraints, targetWorkflowPath, modernizationGoals, authoringBaseline, outcomeBaseline, baselineDecisionRationale, authoringPatternsToBorrow, outcomePatternsToBorrow, patternsToAvoid, openQuestions."
+ "context": "Capture authoringMode, workflowArchetype, workflowComplexity, rigorMode, taskDescription, intendedAudience, successCriteria, domainConstraints, targetWorkflowPath, modernizationGoals, authoringBaseline, outcomeBaseline, baselineDecisionRationale, authoringPatternsToBorrow, outcomePatternsToBorrow, patternsToAvoid, openQuestions, and valueInventory (modernize_existing only)."
  },
  "verify": [
  "The task is understood well enough to design the workflow without guessing blindly.",
@@ -178,11 +184,11 @@
  "Decide the phase list, one-line goal for each phase, and overall ordering.",
  "Design loops with explicit exit rules, bounded maxIterations, and real reasons for another pass.",
  "Decide confirmation gates, delegation vs template injection vs direct execution, promptFragments, references, artifacts, and metaGuidance.",
- "If `authoringMode = modernize_existing`, decide
+ "If `authoringMode = modernize_existing`, decide preserve-in-place, restructure, or rewrite. For each item in valueInventory, record: `preserved` (structurally present with equivalent enforcement), `replaced` (new mechanism prevents same failure mode \u2014 justify equivalence), or `dropped` (intentionally removed \u2014 justify the loss). Phase-level mapping alone is insufficient; track what was inside each restructured or removed phase."
  ],
  "outputRequired": {
  "notesMarkdown": "Structured workflow outline, loop design, confirmation design, delegation design, artifact plan, and modernization mapping.",
- "context": "Capture workflowOutline, loopDesign, confirmationDesign, delegationDesign, artifactPlan, contextModel, voiceStrategy, routineAudit, delegationBoundaries, templateInjectionPlan, modernizationStrategy, legacyMapping, and
+ "context": "Capture workflowOutline, loopDesign, confirmationDesign, delegationDesign, artifactPlan, contextModel, voiceStrategy, routineAudit, delegationBoundaries, templateInjectionPlan, modernizationStrategy, legacyMapping, behaviorPreservationNotes, and valuePreservationMap (modernize_existing only)."
  },
  "verify": [
  "The authored workflow architecture is coherent before JSON drafting begins."
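The added procedure step above asks for every `valueInventory` item to be recorded as `preserved`, `replaced`, or `dropped`, with a justification for the last two. A minimal TypeScript sketch of that bookkeeping follows. The type names, the string-keyed map, and `checkPreservation` are hypothetical illustrations, not part of the WorkRail package, and the actual shape of `valuePreservationMap` is not shown in this diff.

```typescript
// Hypothetical shapes for illustration only; the diff does not define them.
type Disposition = "preserved" | "replaced" | "dropped";

interface PreservationEntry {
  disposition: Disposition;
  justification?: string;
}

// Returns human-readable problems; an empty array means every
// inventory item is accounted for.
function checkPreservation(
  valueInventory: string[],
  valuePreservationMap: Record<string, PreservationEntry>,
): string[] {
  const problems: string[] = [];
  for (const item of valueInventory) {
    const entry = Object.prototype.hasOwnProperty.call(valuePreservationMap, item)
      ? valuePreservationMap[item]
      : undefined;
    if (entry === undefined) {
      problems.push(`${item}: no disposition recorded`);
    } else if (entry.disposition !== "preserved" && !entry.justification?.trim()) {
      // "replaced" and "dropped" both require a written justification.
      problems.push(`${item}: ${entry.disposition} without justification`);
    }
  }
  return problems;
}
```

A check like this covers only the bookkeeping. Whether a replacement really has equivalent enforcement strength remains a review judgment.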
@@ -191,14 +197,23 @@
  "promptFragments": [
  {
  "id": "phase-2-simple-direct",
- "when": {
+ "when": {
+ "var": "workflowComplexity",
+ "equals": "Simple"
+ },
  "text": "For Simple workflows, keep the architecture linear and compact. Do not invent loops or ceremony unless the task truly needs them."
  }
  ],
  "requireConfirmation": {
  "or": [
- {
-
+ {
+ "var": "workflowComplexity",
+ "not_equals": "Simple"
+ },
+ {
+ "var": "rigorMode",
+ "not_equals": "QUICK"
+ }
  ]
  }
  },
@@ -227,8 +242,14 @@
  },
  "requireConfirmation": {
  "or": [
- {
-
+ {
+ "var": "rigorMode",
+ "equals": "THOROUGH"
+ },
+ {
+ "var": "workflowComplexity",
+ "equals": "Complex"
+ }
  ]
  }
  },
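The `when` and `requireConfirmation` objects in the hunks above use three condition shapes: `{ var, equals }`, `{ var, not_equals }`, and `{ or: [...] }`. A minimal TypeScript sketch of how such conditions could be evaluated against a flat context follows. The `Condition` type and `evaluate` function are hypothetical, written only from the shapes visible in this diff, and are not WorkRail's actual evaluator. In particular, treating an unset variable as "not equal" is an assumption.

```typescript
// Hypothetical condition shapes inferred from the diff; not WorkRail's types.
type Condition =
  | { var: string; equals: string }
  | { var: string; not_equals: string }
  | { or: Condition[] };

type Context = Record<string, string | undefined>;

// Evaluate a condition against a flat context of string variables.
function evaluate(cond: Condition, ctx: Context): boolean {
  if ("or" in cond) {
    // An "or" holds when at least one branch holds.
    return cond.or.some((c) => evaluate(c, ctx));
  }
  const actual = ctx[cond.var];
  if ("equals" in cond) {
    return actual === cond.equals;
  }
  // Assumption: an unset variable counts as "not equal".
  return actual !== cond.not_equals;
}

// The requireConfirmation condition added in the hunk above.
const requireConfirmation: Condition = {
  or: [
    { var: "rigorMode", equals: "THOROUGH" },
    { var: "workflowComplexity", equals: "Complex" },
  ],
};

// true: the second branch matches.
const confirmForComplex = evaluate(requireConfirmation, {
  rigorMode: "QUICK",
  workflowComplexity: "Complex",
});

// false: neither branch matches.
const confirmForSimple = evaluate(requireConfirmation, {
  rigorMode: "STANDARD",
  workflowComplexity: "Simple",
});
```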
@@ -258,7 +279,10 @@
  "promptFragments": [
  {
  "id": "phase-4-simple-fast",
- "when": {
+ "when": {
+ "var": "workflowComplexity",
+ "equals": "Simple"
+ },
  "text": "For Simple workflows, keep the file compact and linear. Do not create extra metaGuidance or loops unless the task truly needs them."
  }
  ],
@@ -303,7 +327,10 @@
  "promptFragments": [
  {
  "id": "phase-5a-thorough",
- "when": {
+ "when": {
+ "var": "rigorMode",
+ "equals": "THOROUGH"
+ },
  "text": "After structural validation passes, also check the workflow manually against required-level authoring-spec rules and fix any failures before moving on."
  }
  ],
@@ -405,7 +432,9 @@
  "procedure": [
  "Trace the authored workflow step by step against the user's actual task or the closest realistic scenario.",
  "For each step, ask: what would the agent actually do, what context would it have, what would it likely produce, and what would the next step inherit?",
+ "Also trace at least one degraded or edge-case path \u2014 not just the happy path. Ask: what happens when a condition evaluates unexpectedly, a loop has nothing to iterate, a runCondition skips a phase, or the user provides minimal input? Quality gates that only protect the happy path are not quality gates.",
  "Identify likely weak steps, likely unsatisfying outputs, and likely false-confidence modes.",
+ "For any loop in the workflow, explicitly check: does the exit condition have structural teeth (artifact contract, bounded maxIterations), or does it rely on prose instructions the engine cannot enforce?",
  "Fix issues directly in the workflow file when the right improvement is clear."
  ],
  "outputRequired": {
@@ -419,8 +448,19 @@
  "promptFragments": [
  {
  "id": "phase-6b-quick",
- "when": {
+ "when": {
+ "var": "rigorMode",
+ "equals": "QUICK"
+ },
  "text": "For QUICK rigor, keep the simulation compact but still answer where the workflow would likely disappoint the user if it disappointed them at all."
+ },
+ {
+ "id": "phase-6b-modernize-check",
+ "when": {
+ "var": "authoringMode",
+ "equals": "modernize_existing"
+ },
+ "text": "For modernize_existing: after tracing the workflow forward, check each item in valueInventory. For each enforcement mechanism and domain knowledge item: would the modernized workflow produce the same behavior? Any item where the answer is no or weaker is a loss \u2014 fix it directly or record the accepted tradeoff with justification."
  }
  ],
  "requireConfirmation": false
@@ -435,9 +475,10 @@
  "Reviewer-family or validator output is evidence, not authority."
  ],
  "procedure": [
- "Score these dimensions 0-2 with one sentence of evidence each: `voiceClarity`, `ceremonyLevel`, `loopSoundness`, `delegationBoundedness`, `artifactClarity`, `taskEffectiveness`, `falseConfidenceResistance`, `stateMinimality`, `coverageSharpness`, `domainFit`, `handoffUtility`, and `modernizationDiscipline
+ "Score these dimensions 0-2 with one sentence of evidence each: `voiceClarity`, `ceremonyLevel`, `loopSoundness`, `delegationBoundedness`, `artifactClarity`, `taskEffectiveness`, `falseConfidenceResistance`, `stateMinimality`, `coverageSharpness`, `domainFit`, `handoffUtility`, `rigorAdaptability` (0 = adapts to complexity/rigor levels, 2 = single-weight), `enforcementStrength` (0 = behavioral rules have structural teeth; 2 = important rules are prose-only with no enforcement mechanism), and `modernizationDiscipline` (0 = every valueInventory item preserved, equivalently replaced with justification, or dropped with justification; 2 = items missing or replaced with weaker versions without justification \u2014 score 0 for create mode).",
  "If delegation is available and rigor is THOROUGH, run an adversarial review bundle with these lenses: `engine_native_reviewer`, `task_effectiveness_reviewer`, `state_economy_reviewer`, `false_confidence_reviewer`, `domain_fit_reviewer`, and `maintainer_reviewer`.",
  "Synthesize what the review confirmed, what it challenged, and what changed your mind.",
+ "When scoring `falseConfidenceResistance`, explicitly check: do the workflow's quality gates protect edge cases and degraded paths, or only the happy path? A workflow that passes its own checks on ideal input but fails silently on minimal or unexpected input scores 2.",
  "Set hard-gate failures whenever any of these are materially weak: `taskEffectiveness`, `falseConfidenceResistance`, `stateMinimality`, `coverageSharpness`, `domainFit`, or `handoffUtility`.",
  "Set `authoringIntegrityPassed = true` only if structural and authoring-quality dimensions are all acceptable. Set `outcomeEffectivenessPassed = true` only if the workflow is likely to achieve satisfying results for the user."
  ],
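The scoring step above uses a 0-2 scale where, judging by the per-dimension descriptions, 0 is the good end and 2 the bad end, and six dimensions act as hard gates. A minimal TypeScript sketch of the hard-gate check follows. The function name and the threshold are hypothetical: the workflow text says "materially weak" without giving a number, so treating a score of 2 as materially weak is an assumption.

```typescript
type Score = 0 | 1 | 2;

// The six hard-gate dimensions named in the procedure above.
const HARD_GATE_DIMENSIONS = [
  "taskEffectiveness",
  "falseConfidenceResistance",
  "stateMinimality",
  "coverageSharpness",
  "domainFit",
  "handoffUtility",
] as const;

type HardGateDimension = (typeof HARD_GATE_DIMENSIONS)[number];

// Assumption: a score of 2 is what the procedure calls "materially weak".
function hardGateFailures(
  scores: Partial<Record<string, Score>>,
): HardGateDimension[] {
  return HARD_GATE_DIMENSIONS.filter((dimension) => scores[dimension] === 2);
}

// voiceClarity scores 2 but is not a hard-gate dimension, so only
// domainFit is reported.
const failures = hardGateFailures({
  taskEffectiveness: 0,
  domainFit: 2,
  voiceClarity: 2,
});
```

Dimensions outside the hard-gate list still feed `authoringIntegrityPassed`, which this sketch does not model.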
@@ -452,13 +493,27 @@
  "promptFragments": [
  {
  "id": "phase-6c-standard",
- "when": {
+ "when": {
+ "var": "rigorMode",
+ "equals": "STANDARD"
+ },
  "text": "For STANDARD rigor, you may keep the review self-executed unless uncertainty remains material. If you do delegate, prefer a small adversarial bundle."
  },
  {
  "id": "phase-6c-thorough",
- "when": {
+ "when": {
+ "var": "rigorMode",
+ "equals": "THOROUGH"
+ },
  "text": "For THOROUGH rigor, assume the first review is not enough. Use adversarial reviewer lanes unless a hard limitation makes them impossible."
+ },
+ {
+ "id": "phase-6c-heritage-review",
+ "when": {
+ "var": "authoringMode",
+ "equals": "modernize_existing"
+ },
+ "text": "For modernize_existing: add a heritage_reviewer to the adversarial bundle. Its job is to check each valueInventory item and find what was lost or weakened \u2014 it ignores format improvements. It must answer: which enforcement mechanisms are now prose-only? Which domain knowledge items are absent? Which behavioral rules were removed without equivalent replacement? Heritage_reviewer findings drive enforcementStrength and modernizationDiscipline scores."
  }
  ],
  "requireConfirmation": false
@@ -526,6 +581,7 @@
  "Keep it concise. The workflow file is the deliverable, not the summary."
  ],
  "procedure": [
+ "Stamp the workflow file: read the current `version` from `spec/authoring-spec.json` and write `validatedAgainstSpecVersion: <N>` as a top-level field in the workflow JSON. Commit the change \u2014 the stamp has no effect if only saved locally.",
  "State the workflow file path and name, whether it was created or modernized, and what it does in one sentence.",
  "Summarize the step structure, loops, confirmations, and delegation profile.",
  "Report validation status, authoring-integrity status, and outcome-effectiveness status.",
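The stamping step added in the last hunk above copies the spec's `version` into a top-level `validatedAgainstSpecVersion` field of the workflow JSON. A minimal TypeScript sketch of that transformation on already-parsed objects follows. The `AuthoringSpec` and `WorkflowJson` shapes and the `stampWorkflow` helper are hypothetical, and the assumption that `version` is a number is inferred only from the `<N>` placeholder. Reading and writing the files, and the commit itself, are left out.

```typescript
// Hypothetical minimal shapes; only the fields used here are modeled.
interface AuthoringSpec {
  version: number; // assumption: the spec version is numeric
}

type WorkflowJson = Record<string, unknown>;

// Returns a stamped copy and leaves the input object untouched.
function stampWorkflow(workflow: WorkflowJson, spec: AuthoringSpec): WorkflowJson {
  return { ...workflow, validatedAgainstSpecVersion: spec.version };
}

const draft: WorkflowJson = { id: "example-workflow", version: "1.0.0" };
const stamped = stampWorkflow(draft, { version: 3 });
// stamped now carries validatedAgainstSpecVersion: 3; draft is unchanged.
```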
@@ -1,298 +0,0 @@
|
|
|
1
|
-
# Changelog - Systematic Bug Investigation Workflow
|
|
2
|
-
|
|
3
|
-
## [1.1.0-beta.22] - 2025-01-06
|
|
4
|
-
|
|
5
|
-
### CRITICAL FIX - Invalid Loop Step Schema
|
|
6
|
-
- **ROOT CAUSE**: In beta.19, we added `guidance` to the loop step, but loop steps DON'T support guidance in the schema
|
|
7
|
-
- Schema allows: `id`, `type`, `title`, `loop`, `body`, `functionDefinitions`, `requireConfirmation`, `runCondition`
|
|
8
|
-
- Does NOT allow: `guidance`, `prompt`, `agentRole`
|
|
9
|
-
- **Fix**: Moved loop enforcement guidance to first body step (`analysis-neighborhood-contracts`)
|
|
10
|
-
- "USER SAYS: This loop MUST complete ALL 5 iterations..."
|
|
11
|
-
- Now properly enforced on each iteration
|
|
12
|
-
- **Validation**: Workflow now passes full schema validation
|
|
13
|
-
|
|
14
|
-
### Why This Matters
|
|
15
|
-
Without proper validation, the MCP server couldn't load the workflow at all. Beta.19-21 were broken due to schema violations.
|
|
16
|
-
|
|
17
|
-
## [1.1.0-beta.21] - 2025-01-06
|
|
18
|
-
|
|
19
|
-
### HOTFIX - metaGuidance Schema Violations
|
|
20
|
-
- **Fixed**: metaGuidance entry 35 exceeded 256 character limit (266 chars)
|
|
21
|
-
- Split "HIGH AUTO MODE DISCIPLINE" into 3 separate entries
|
|
22
|
-
- **Fixed**: Duplicate metaGuidance entries after split
|
|
23
|
-
- Removed duplicates, cleaned to 89 unique entries
|
|
24
|
-
- **Note**: CLI validator reports loop step errors (false positive - loops have different schema)
|
|
25
|
-
- Workflow loads successfully in MCP server
|
|
26
|
-
- Same loop structure as beta.18 which worked fine
|
|
27
|
-
|
|
28
|
-
## [1.1.0-beta.20] - 2025-01-06
|
|
29
|
-
|
|
30
|
-
### CRITICAL FIX - Dangerous "Autonomy" Language
|
|
31
|
-
- **ROOT CAUSE IDENTIFIED**: Our automation level descriptions were giving agents permission to skip!
|
|
32
|
-
- OLD: "High=**auto-approve >8.0 confidence decisions**"
|
|
33
|
-
- Interpreted as: "I have 9/10 confidence → I can approve my decision to skip phases"
|
|
34
|
-
- OLD: "Control workflow **autonomy**"
|
|
35
|
-
- Interpreted as: "High mode gives me autonomy to decide what to skip"
|
|
36
|
-
|
|
37
|
-
### Language Fixes
|
|
38
|
-
1. **Removed "auto-approve decisions"**: Changed to "execute phases automatically WITHOUT asking permission between phases"
|
|
39
|
-
2. **Removed "autonomy"**: Changed to "Control confirmation frequency"
|
|
40
|
-
3. **Clarified HIGH AUTO MODE**:
|
|
41
|
-
- NEW: "HIGH AUTO = NO INTERRUPTIONS, NOT NO PHASES"
|
|
42
|
-
- NEW: "HIGH AUTO ≠ PERMISSION TO SKIP PHASES"
|
|
43
|
-
4. **Explicit USER SAYS**:
|
|
44
|
-
- "USER SAYS: 'High automation mode' means you DON'T ASK PERMISSION. It does NOT mean you have autonomy to decide which phases to skip."
|
|
45
|
-
- "High auto = Faster execution of ALL phases. NOT = Smarter agent gets to skip phases."
|
|
46
|
-
|
|
47
|
-
### Credit
|
|
48
|
-
User insight: "Could the high automation be causing it to do this? do we frame it as letting it do whatever it wants?" - YES, we were!
|
|
49
|
-
|
|
50
|
-
## [1.1.0-beta.19] - 2025-01-06
|
|
51
|
-
|
|
52
|
-
### CRITICAL FIX - Anti-Rationalization
|
|
53
|
-
- **NEW PATTERN DETECTED**: Agents now **acknowledge** the warnings but then **rationalize** why they don't apply
|
|
54
|
-
- Example: "I know finding ≠ done... **However, given that I have high confidence...**"
|
|
55
|
-
- Example: "Let me proceed with a **more targeted Phase 2**..." (skipping remaining iterations)
|
|
56
|
-
- **Problem**: Agents stopped at **iteration 2 of 5** in Phase 1 loop - didn't even finish the analysis phase!
|
|
57
|
-
- **Root Cause**: Agents think they can judge when to skip based on their "special" situation
|
|
58
|
-
|
|
59
|
-
### New Anti-Rationalization Safeguards
|
|
60
|
-
1. **Meta-Guidance with USER SAYS framing**: Added "USER SAYS: NO RATIONALIZATION..." section
|
|
61
|
-
- **Why USER SAYS**: Agents follow direct user commands more reliably than abstract principles
|
|
62
|
-
- "USER SAYS: YOUR SITUATION IS NOT SPECIAL. YOU ARE NOT THE EXCEPTION."
|
|
63
|
-
- "USER SAYS: 'I found the bug early' = ALL THE MORE REASON to validate properly"
|
|
64
|
-
- Explicitly forbids phrases like "However, given that..." or "targeted Phase X"
|
|
65
|
-
|
|
66
|
-
2. **Loop Enforcement with USER SAYS** (Phase 1 - 5 iterations):
|
|
67
|
-
- "USER SAYS: This loop MUST complete ALL 5 iterations. Do NOT exit early."
|
|
68
|
-
- "Iteration 2/5 is NOT enough. Iteration 3/5 is NOT enough. Complete 5/5."
|
|
69
|
-
- "Agents who skip analysis iterations are wrong ~95% of the time."
|
|
70
|
-
|
|
71
|
-
### Meta-Learning Moment
|
|
72
|
-
During implementation, the AI implementing this fix attempted to skip validation by rationalizing "the workflow structure is fine, let me just publish" - demonstrating the EXACT behavior this fix prevents! This validates the need for explicit USER SAYS framing.
|
|
73
|
-
|
|
74
|
-
### Why This Is Different
|
|
75
|
-
- Beta.18 addressed goal misunderstanding ("finding" vs "proving")
|
|
76
|
-
- Beta.19 addresses **rationalization** - agents who acknowledge the rules but think they're exceptions
|
|
77
|
-
- Targets the "smart agent" problem: "I understand the principle, BUT in my case..."
|
|
78
|
-
|
|
79
|
-
## [1.1.0-beta.18] - 2025-01-06
|
|
80
|
-
|
|
81
|
-
### CRITICAL FIX
|
|
82
|
-
- **Addresses persistent early-stopping bug**: Agents were still stopping after Phase 1/2 saying "I found the bug"
|
|
83
|
-
- **Root Cause Identified**: Agents fundamentally misunderstand THE GOAL
|
|
84
|
-
- WRONG: "The goal is finding the bug" → Stop after analysis with high confidence
|
|
85
|
-
- RIGHT: "The goal is PROVING the bug with evidence" → Must complete Phases 3-5
|
|
86
|
-
- **New Meta-Guidance Section**: Added explicit "CRITICAL MISUNDERSTANDING TO AVOID" section
|
|
87
|
-
- "FINDING ≠ DONE. PROVING = DONE."
|
|
88
|
-
- "\"I found the bug\" = YOU HAVE A GUESS. \"I proved the bug\" = YOU HAVE EVIDENCE."
|
|
89
|
-
- "NEVER create summary documents until Phase 6"
|
|
90
|
-
- **Step-Level Warnings**: Added "FINDING ≠ PROVING" warnings at all critical stopping points:
|
|
91
|
-
- **Phase 1f** (after analysis): Full explanation of why analysis ≠ proof
|
|
92
|
-
- **Phase 2a** (hypothesis development): "You have THEORIES, not EVIDENCE"
|
|
93
|
-
- **Phase 2h** (midpoint): "You may have 'found' the bug, but haven't 'proved' it"
|
|
94
|
-
- **Step Count Corrections**: Fixed inconsistencies (27 → 23 steps throughout)
|
|
95
|
-
|
|
96
|
-
### Why This Fix Is Different
|
|
97
|
-
Previous fixes (beta.1-beta.17) added warnings about "high confidence ≠ done" but didn't address the fundamental goal misunderstanding. Agents thought their job was to "identify" the bug, not "prove" it. This fix makes the distinction crystal clear upfront.
|
|
98
|
-
|
|
99
|
-
## [1.1.0-beta.17] - 2025-01-06
|
|
100
|
-
|
|
101
|
-
### Major Restructuring
|
|
102
|
-
- **Phase 0 Consolidation**: Merged 4 separate Phase 0 steps into single comprehensive setup step
|
|
103
|
-
- Combined: Triage (0), User Preferences (0a), Tool Check (0b), Context Creation (0c)
|
|
104
|
-
- Result: Single "Phase 0: Complete Investigation Setup" step covering all mechanical preparation
|
|
105
|
-
- Rationale: Reduce workflow overhead while maintaining thorough setup
|
|
106
|
-
- New structure: Phase 0 (Setup) → Phase 0a (Commitment Checkpoint, conditional)
|
|
107
|
-
|
|
108
|
-
- **Assumption Verification Relocation**: Moved from Phase 0a to Phase 1f
|
|
109
|
-
- Previously: Early assumption check before ANY code analysis (removed)
|
|
110
|
-
- Now: Assumption verification AFTER all 5 analysis iterations complete (Phase 1f Step 2.5)
|
|
111
|
-
- Rationale: Assumptions can only be properly verified with full code context
|
|
112
|
-
- Timing: Happens after neighborhood mapping, pattern analysis, component ranking, data flow tracing, and test gap analysis
|
|
113
|
-
- Location: Integrated into Phase 1f "Final Breadth & Scope Verification" before hypothesis development
|
|
114
|
-
|
|
115
|
-
### Impact
|
|
116
|
-
- **Step Count**: Reduced from 27 steps to 23 steps (4 Phase 0 steps → 1)
|
|
117
|
-
- **Phase Numbering**: Simplified Phase 0 structure (Phase 0d → Phase 0a)
|
|
118
|
-
- **Debugging Workflow Alignment**: Better follows traditional debugging principles (observe fully THEN question assumptions THEN hypothesize)
|
|
119
|
-
- **Agent Experience**: Faster setup phase, more informed assumption checking
|
|
120
|
-
|
|
121
|
-
### Breaking Changes
|
|
122
|
-
- `completedSteps` array format changed:
|
|
123
|
-
- OLD: `["phase-0-triage", "phase-0a-user-preferences", "phase-0b-tool-check", "phase-0c-create-context", "phase-0d-workflow-commitment"]`
|
|
124
|
-
- NEW: `["phase-0-complete-setup", "phase-0a-workflow-commitment"]`
|
|
125
|
-
- Step IDs changed: `phase-0d-workflow-commitment` → `phase-0a-workflow-commitment`
|
|
126
|
-
|
|
127
|
-
## [1.1.0-beta.9] - 2025-01-06
|
|
128
|
-
|
|
129
|
-
### Enhanced
|
|
130
|
-
- **CRITICAL**: Strengthened anti-premature-completion safeguards throughout the workflow
|
|
131
|
-
- Added explicit "ANALYSIS ≠ DIAGNOSIS ≠ PROOF" section in metaGuidance
|
|
132
|
-
- Phase 1f: Added "DO NOT STOP HERE" warning emphasizing ZERO PROOF after analysis (~25% done)
|
|
133
|
-
- Phase 2a: Added "YOU ARE NOT DONE" warning with 5-point reminder about mandatory validation
|
|
134
|
-
- Phase 2h: Added "YOU ARE HALFWAY DONE (~50%)" warning before instrumentation phase
|
|
135
|
-
- Clarified progression: Analysis (20%) → Hypotheses (40%) → Evidence (80%) → Writeup (100%)
|
|
136
|
-
- Reinforced: Even with "100% confidence," stopping before evidence collection = providing guesses, not diagnosis
|
|
137
|
-
|
|
138
|
-
### Context
|
|
139
|
-
- **Problem**: Agents were stopping after Phase 1 or 2 when they reached "100% confidence" in analysis/hypotheses
|
|
140
|
-
- **Root Cause**: Agents conflating "confident theory" with "proven diagnosis"
|
|
141
|
-
- **Solution**: Explicit warnings at every potential stopping point emphasizing lack of proof until Phases 3-5 complete
|
|
142
|
-
- **Impact**: Forces agents to understand that analysis/hypotheses are NOT evidence, and professional practice requires validation
|
|
143
|
-
|
|
144
|
-
## [1.1.0-beta.8] - 2025-01-06
|
|
145
|
-
|
|
146
|
-
### Fixed
|
|
147
|
-
- **CRITICAL**: Fixed loop execution bug where body steps with `runCondition` using iteration variables were completely skipped
|
|
148
|
-
- Root cause: Loop variables (e.g., `analysisPhase`) were being injected AFTER evaluating runConditions, causing all conditions to fail
|
|
149
|
-
- Impact: Phase 1's 5-iteration analysis loop was being entirely skipped, jumping straight to Phase 1f
|
|
150
|
-
- Fix: Reordered logic to inject loop variables BEFORE evaluating body step runConditions
|
|
151
|
-
- Also fixed: Pre-existing bug where single-step loop bodies didn't increment iterations properly
|
|
152
|
-
- Test coverage: Added comprehensive integration tests (`loop-runCondition-bug.test.ts`) to prevent regression
|
|
153
|
-
|
|
154
|
-
## [1.1.0-beta.7] - 2025-01-06
|
|
155
|
-
|
|
156
|
-
### Fixed
|
|
157
|
-
- **HOTFIX**: Corrected Phase 0e `runCondition` to use `not_equals` instead of invalid `notEquals` operator
|
|
158
|
-
- Phase 0e now properly executes only when `automationLevel != 'High'`
|
|
159
|
-
- High automation mode now proceeds through all phases without early termination checkpoint
|
|
160
|
-
-## [1.1.0-beta.6] - 2025-01-06
-
-### Added
-- **New Phase 1f**: Final Breadth & Scope Verification checkpoint after codebase analysis
-  - Prevents tunnel vision by forcing scope sanity checks before hypothesis development
-  - Requires evaluation of 2-3 alternative investigation scopes
-  - Catches the #1 cause of wrong conclusions: looking in wrong place or too narrowly
-  - Positioned strategically after Phase 1 analysis and before Phase 2 hypothesis formation
-
-### Enhanced
-- **Phase 3 (Instrumentation)**: Dramatically expanded with concrete, step-by-step instructions
-  - Language-specific code examples (JavaScript/TypeScript, Python, Java)
-  - Detailed `search_replace` usage examples for applying instrumentation
-  - Hypothesis-specific prefixes ([H1], [H2], [H3]) with standard formatting
-  - File-by-file workflow: read → locate → instrument → verify
-  - Fallback strategy if edit tools unavailable
-  - Instrumentation checklist for tracking progress
-
-- **Phase 4 (Evidence Collection)**: Comprehensive decision tree and 7-step process
-  - **OPTION A**: Agent can execute code → 4-step execution workflow
-  - **OPTION B**: Agent cannot execute → User instruction template
-  - Clear instructions on when to use each approach
-  - Log consolidation and evidence organization by hypothesis
-  - Evidence quality assessment (1-10 scale)
-
-- **metaGuidance**: Added explicit high auto mode discipline
-  - Clarified that agents should not ask for permission between phases in high auto mode
-  - Exception: Phase 0e early termination and Phase 4a controlled experiments
-  - Reinforced that asking "should I continue?" implies investigation is optional (it is NOT)
-
-### Changed
-- Total workflow steps: 26 steps (added Phase 1f)
-- Phase 1 analysis loop: Now clearly labeled as "Analysis 1/5" through "Analysis 5/5"
-
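Reviewer note: the removed beta.6 entry mentions hypothesis-specific prefixes ([H1], [H2], [H3]) and evidence organization by hypothesis, but the changelog does not show the actual line format. The sketch below is only an illustration of the idea; `formatProbe`, `groupByHypothesis`, and the `location {json}` layout are assumptions, not part of the package.

```typescript
// Hypothetical helpers: the changelog names the [H1]/[H2]/[H3] prefixes
// but does not specify the rest of the log line, so this layout is assumed.
type HypothesisId = "H1" | "H2" | "H3";

// Build one instrumentation line tagged with the hypothesis it tests.
function formatProbe(
  hypothesis: HypothesisId,
  location: string,
  values: Record<string, unknown>,
): string {
  return `[${hypothesis}] ${location} ${JSON.stringify(values)}`;
}

// Group collected log lines by their hypothesis prefix.
// Lines without an [H<n>] prefix are treated as unrelated output and skipped.
function groupByHypothesis(lines: readonly string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const line of lines) {
    const match = /^\[(H\d+)\]/.exec(line);
    if (!match) continue;
    const id = match[1] as string;
    const bucket = groups.get(id) ?? [];
    bucket.push(line);
    groups.set(id, bucket);
  }
  return groups;
}
```

Tagging every probe with its hypothesis makes it possible to split a mixed log back into per-hypothesis evidence without re-reading the instrumented code.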
-## [1.1.0-beta.5] - 2025-01-06
-
-### Changed
-- **Phase 0e Relocation**: Moved early termination checkpoint from Phase 5b to Phase 0e (after triage)
-  - Now appears immediately after setup, before any investigation work begins
-  - Eliminates sunk cost fallacy (decision at 5% vs 90% completion)
-  - Forces upfront decision-making about workflow commitment
-
-### Added
-- **Mandatory User Communication**: Phase 0e now requires agents to explicitly tell users about 90% accuracy difference
-  - Template message is NOT optional - agents MUST communicate this
-  - User must explicitly confirm proceeding with full investigation
-
-### Removed
-- **Phase 5b**: Removed old completion checkpoint (now Phase 0e)
-  - Total workflow steps reduced from 28 to 26
-
-## [1.1.0-beta.4] - 2025-01-05
-
-### Enhanced
-- **Sophisticated Code Analysis**: Integrated advanced analysis techniques from MR review workflow into Phase 1
-
-### Added
-- **New Phase 1a**: Neighborhood, Call Graph & Contracts analysis
-  - Module root computation (nearest common ancestor, clamped to package boundary)
-  - Neighborhood mapping (immediate neighbors, imports, tests, entry points)
-  - Bounded call graph with HOT path ranking (Small Multiples ASCII visualization)
-  - Flow anchors (entry points to bug: HTTP routes, CLI commands, scheduled jobs, event handlers)
-  - Contracts & invariants discovery (API symbols, endpoints, database tables, stated guarantees)
-
-- **Enhanced Phase 1 Structure**: Now 5 sub-phases (was 4)
-  1. Neighborhood, Call Graph & Contracts (NEW)
-  2. Breadth Scan (pattern discovery)
-  3. Deep Dive (suspicious code analysis)
-  4. Dependencies & Data Flow
-  5. Test Coverage Analysis
-
-### Changed
-- Total workflow steps increased from 27 to 28 (added Phase 1a)
-- Phase 1 loop now iterates 5 times (was 4)
-- Each analysis phase now produces more structured, evidence-based outputs
-
-## [1.1.0-beta.3] - 2025-01-05
-
-### Fixed
-- **Critical**: Prevented ALL phase skipping, not just final documentation phase
-  - Root cause: Agents didn't understand they MUST repeatedly call workflow_next
-  - Added mandatory workflow execution instructions to metaGuidance
-  - Added early commitment checkpoint (Phase 0e) requiring user confirmation
-  - Reinforced evidence-based persuasion: 90% error rate for premature conclusions
-
-### Added
-- **Phase 0e**: Workflow Execution Commitment checkpoint
-  - Appears immediately after triage (before investigation begins)
-  - Requires agent acknowledgment of workflow structure (26 steps)
-  - Requires user confirmation to proceed with full investigation
-  - Explicit warning: stopping early leads to wrong conclusions ~90% of time
-
-### Enhanced
-- **metaGuidance**: Added comprehensive workflow execution discipline
-  - Agents MUST call workflow_next until isComplete=true
-  - High confidence (9-10/10) does NOT mean workflow is complete
-  - Professional research shows 90% error rate for jumping to conclusions
-  - Added "WHY THIS STRUCTURE EXISTS (Evidence-Based)" section
-
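Reviewer note: the removed beta.3 entry states that agents must call `workflow_next` until `isComplete=true` and that high confidence is not an exit condition. A minimal sketch of that loop follows. The `StepResult` shape, the `WorkflowNext` callback signature, and the `maxSteps` bound are assumptions made for illustration; the real `workflow_next` request and response types are not shown in this changelog.

```typescript
// Hypothetical response shape: only the isComplete flag is taken from the changelog.
type StepResult =
  | { isComplete: true }
  | { isComplete: false; stepId: string };

// Hypothetical stand-in for a call to the workflow_next tool.
type WorkflowNext = (completedSteps: readonly string[]) => StepResult;

// Keep requesting steps until the workflow itself reports completion.
// The agent's confidence is deliberately not consulted anywhere in the loop.
function runToCompletion(
  workflowNext: WorkflowNext,
  executeStep: (stepId: string) => void,
  maxSteps = 100, // safety bound for this sketch, not part of the workflow
): string[] {
  const completed: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const next = workflowNext(completed);
    if (next.isComplete) return completed;
    executeStep(next.stepId);
    completed.push(next.stepId);
  }
  throw new Error(`workflow did not complete within ${maxSteps} steps`);
}
```

The only way out of the loop is the completion flag returned by the workflow (or the safety bound), which mirrors the discipline the entry describes.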
-## [1.1.0-beta.2] - 2025-01-05
-
-### Added
-- **Phase 5b**: Mandatory completion checkpoint with user confirmation
-  - Prevents agents from skipping comprehensive diagnostic writeup (Phase 6)
-  - Requires explicit acknowledgment that Phase 6 is the required deliverable
-  - User must confirm proceeding to final documentation phase
-
-### Enhanced
-- **metaGuidance**: Added critical workflow discipline instructions
-  - Emphasized that high confidence does NOT equal completion
-  - Clarified that Phase 6 is a mandatory deliverable, not optional
-  - Added explicit instructions on when to set `isWorkflowComplete=true`
-
-## [1.1.0-beta.1] - 2025-01-05
-
-### Fixed
-- **Critical**: Prevented premature workflow completion
-  - Agents were jumping to conclusions and skipping phases with high confidence
-  - Root cause: Misinterpreting progress/confidence as final completion
-
-### Added
-- **metaGuidance Section**: "CRITICAL WORKFLOW DISCIPLINE"
-  - High confidence (9-10/10) does NOT mean completion
-  - Agent MUST complete all phases (0-6) regardless of confidence
-  - Only set `isWorkflowComplete=true` after Phase 6 comprehensive writeup
-
-- **Phase-Specific Warnings**:
-  - Phase 2a (Hypothesis Formation): Warning against treating hypothesis as conclusion
-  - Phase 5a (Confidence Assessment): Warning that 10/10 confidence still requires Phase 6
-
-### Enhanced
-- **Phase 6 Instructions**: Explicit completion marking
-  - Must set `isWorkflowComplete=true` in this phase
-  - Must produce comprehensive diagnostic writeup
-  - This is the ONLY phase that marks workflow as truly complete
-
-### Changed
-- All phase prompts updated to reference 27 total workflow steps for clarity
|