@exaudeus/workrail 3.67.0 → 3.68.1

Files changed (144)
  1. package/dist/application/services/compiler/template-registry.js +10 -1
  2. package/dist/cli/commands/worktrain-init.js +1 -1
  3. package/dist/console-ui/assets/{index-tOl8Vowf.js → index-DPdRJHMX.js} +1 -1
  4. package/dist/console-ui/index.html +1 -1
  5. package/dist/coordinators/modes/full-pipeline.js +4 -4
  6. package/dist/coordinators/modes/implement-shared.js +5 -5
  7. package/dist/coordinators/modes/implement.js +4 -4
  8. package/dist/coordinators/pr-review.js +4 -4
  9. package/dist/daemon/workflow-runner.d.ts +1 -0
  10. package/dist/daemon/workflow-runner.js +1 -0
  11. package/dist/manifest.json +31 -31
  12. package/dist/mcp/handlers/v2-context-budget.js +18 -0
  13. package/dist/mcp/handlers/v2-workflow.js +1 -1
  14. package/dist/mcp/workflow-protocol-contracts.js +2 -2
  15. package/dist/v2/durable-core/constants.d.ts +2 -0
  16. package/dist/v2/durable-core/constants.js +2 -1
  17. package/dist/v2/projections/session-metrics.js +1 -1
  18. package/docs/authoring-v2.md +4 -4
  19. package/docs/changelog-recent.md +3 -3
  20. package/docs/configuration.md +1 -1
  21. package/docs/design/adaptive-coordinator-context-candidates.md +1 -1
  22. package/docs/design/adaptive-coordinator-context.md +1 -1
  23. package/docs/design/adaptive-coordinator-routing-candidates.md +18 -18
  24. package/docs/design/adaptive-coordinator-routing-review.md +1 -1
  25. package/docs/design/adaptive-coordinator-routing.md +34 -34
  26. package/docs/design/agent-cascade-protocol.md +2 -2
  27. package/docs/design/console-daemon-separation-discovery.md +323 -0
  28. package/docs/design/context-assembly-design-candidates.md +1 -1
  29. package/docs/design/context-assembly-implementation-plan.md +1 -1
  30. package/docs/design/context-assembly-layer.md +2 -2
  31. package/docs/design/context-assembly-review-findings.md +1 -1
  32. package/docs/design/coordinator-access-audit.md +293 -0
  33. package/docs/design/coordinator-architecture-audit.md +62 -0
  34. package/docs/design/coordinator-error-handling-audit.md +240 -0
  35. package/docs/design/coordinator-testability-audit.md +426 -0
  36. package/docs/design/daemon-architecture-discovery.md +1 -1
  37. package/docs/design/daemon-console-separation-discovery.md +242 -0
  38. package/docs/design/daemon-memory-audit.md +203 -0
  39. package/docs/design/design-candidates-console-daemon-separation.md +256 -0
  40. package/docs/design/design-candidates-discovery-loop-fix.md +141 -0
  41. package/docs/design/design-review-findings-console-daemon-separation.md +106 -0
  42. package/docs/design/design-review-findings-discovery-loop-fix.md +81 -0
  43. package/docs/design/discovery-loop-fix-candidates.md +161 -0
  44. package/docs/design/discovery-loop-fix-design-review.md +106 -0
  45. package/docs/design/discovery-loop-fix-validation.md +258 -0
  46. package/docs/design/discovery-loop-investigation-A.md +188 -0
  47. package/docs/design/discovery-loop-investigation-B.md +287 -0
  48. package/docs/design/exploration-workflow-candidates.md +205 -0
  49. package/docs/design/exploration-workflow-design-review.md +166 -0
  50. package/docs/design/exploration-workflow-discovery.md +443 -0
  51. package/docs/design/ide-context-files-candidates.md +231 -0
  52. package/docs/design/ide-context-files-design-review.md +85 -0
  53. package/docs/design/ide-context-files.md +615 -0
  54. package/docs/design/implementation-plan-discovery-loop-fix.md +199 -0
  55. package/docs/design/implementation-plan-queue-poll-rotation.md +102 -0
  56. package/docs/design/in-process-http-audit.md +190 -0
  57. package/docs/design/layer3b-ghost-nodes-design-candidates.md +2 -2
  58. package/docs/design/loadSessionNotes-candidates.md +108 -0
  59. package/docs/design/loadSessionNotes-test-coverage-discovery.md +297 -0
  60. package/docs/design/loadSessionNotes-test-coverage-session4.md +209 -0
  61. package/docs/design/loadSessionNotes-test-coverage-v3.md +321 -0
  62. package/docs/design/probe-session-design-candidates.md +261 -0
  63. package/docs/design/probe-session-phase0.md +490 -0
  64. package/docs/design/routines-guide.md +7 -7
  65. package/docs/design/session-metrics-attribution-candidates.md +250 -0
  66. package/docs/design/session-metrics-attribution-design-review.md +115 -0
  67. package/docs/design/session-metrics-attribution-discovery.md +319 -0
  68. package/docs/design/session-metrics-candidates.md +227 -0
  69. package/docs/design/session-metrics-design-review.md +104 -0
  70. package/docs/design/session-metrics-discovery.md +454 -0
  71. package/docs/design/spawn-session-debug.md +202 -0
  72. package/docs/design/trigger-validator-candidates.md +214 -0
  73. package/docs/design/trigger-validator-review.md +109 -0
  74. package/docs/design/trigger-validator-shaping-phase0.md +239 -0
  75. package/docs/design/trigger-validator.md +454 -0
  76. package/docs/design/v2-core-design-locks.md +2 -2
  77. package/docs/design/workflow-extension-points.md +15 -15
  78. package/docs/design/workflow-id-validation-at-startup.md +1 -1
  79. package/docs/design/workflow-id-validation-implementation-plan.md +2 -2
  80. package/docs/design/workflow-trigger-lifecycle-audit.md +175 -0
  81. package/docs/design/worktrain-task-queue-candidates.md +5 -5
  82. package/docs/design/worktrain-task-queue.md +4 -4
  83. package/docs/discovery/coordinator-script-design.md +1 -1
  84. package/docs/discovery/coordinator-ux-discovery.md +3 -3
  85. package/docs/discovery/simulation-report.md +1 -1
  86. package/docs/discovery/workflow-modernization-discovery.md +326 -0
  87. package/docs/discovery/workflow-selection-for-discovery-tasks.md +33 -33
  88. package/docs/discovery/worktrain-status-briefing.md +1 -1
  89. package/docs/discovery/wr-discovery-goal-reframing.md +1 -1
  90. package/docs/docker.md +1 -1
  91. package/docs/ideas/backlog.md +227 -0
  92. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1 -1
  93. package/docs/integrations/claude-code.md +5 -5
  94. package/docs/integrations/firebender.md +1 -1
  95. package/docs/plans/agentic-orchestration-roadmap.md +2 -2
  96. package/docs/plans/mr-review-workflow-redesign.md +9 -9
  97. package/docs/plans/ui-ux-workflow-design-candidates.md +4 -4
  98. package/docs/plans/ui-ux-workflow-discovery.md +2 -2
  99. package/docs/plans/workflow-categories-candidates.md +8 -8
  100. package/docs/plans/workflow-categories-discovery.md +4 -4
  101. package/docs/plans/workflow-modernization-design.md +430 -0
  102. package/docs/plans/workflow-staleness-detection-candidates.md +11 -11
  103. package/docs/plans/workflow-staleness-detection-review.md +4 -4
  104. package/docs/plans/workflow-staleness-detection.md +9 -9
  105. package/docs/plans/workrail-platform-vision.md +3 -3
  106. package/docs/reference/agent-context-cleaner-snippet.md +1 -1
  107. package/docs/reference/agent-context-guidance.md +4 -4
  108. package/docs/reference/context-optimization.md +2 -2
  109. package/docs/roadmap/now-next-later.md +2 -2
  110. package/docs/roadmap/open-work-inventory.md +16 -16
  111. package/docs/workflows.md +31 -31
  112. package/package.json +1 -1
  113. package/spec/workflow-tags.json +47 -47
  114. package/workflows/adaptive-ticket-creation.json +16 -16
  115. package/workflows/architecture-scalability-audit.json +22 -22
  116. package/workflows/bug-investigation.agentic.v2.json +3 -3
  117. package/workflows/classify-task-workflow.json +1 -1
  118. package/workflows/coding-task-workflow-agentic.json +6 -6
  119. package/workflows/cross-platform-code-conversion.v2.json +8 -8
  120. package/workflows/document-creation-workflow.json +8 -8
  121. package/workflows/documentation-update-workflow.json +8 -8
  122. package/workflows/intelligent-test-case-generation.json +2 -2
  123. package/workflows/learner-centered-course-workflow.json +2 -2
  124. package/workflows/mr-review-workflow.agentic.v2.json +4 -4
  125. package/workflows/personal-learning-materials-creation-branched.json +8 -8
  126. package/workflows/presentation-creation.json +5 -5
  127. package/workflows/production-readiness-audit.json +1 -1
  128. package/workflows/relocation-workflow-us.json +31 -31
  129. package/workflows/routines/context-gathering.json +1 -1
  130. package/workflows/routines/design-review.json +1 -1
  131. package/workflows/routines/execution-simulation.json +1 -1
  132. package/workflows/routines/feature-implementation.json +3 -3
  133. package/workflows/routines/final-verification.json +1 -1
  134. package/workflows/routines/hypothesis-challenge.json +1 -1
  135. package/workflows/routines/ideation.json +1 -1
  136. package/workflows/routines/parallel-work-partitioning.json +3 -3
  137. package/workflows/routines/philosophy-alignment.json +2 -2
  138. package/workflows/routines/plan-analysis.json +1 -1
  139. package/workflows/routines/plan-generation.json +1 -1
  140. package/workflows/routines/tension-driven-design.json +6 -6
  141. package/workflows/scoped-documentation-workflow.json +26 -26
  142. package/workflows/ui-ux-design-workflow.json +14 -14
  143. package/workflows/workflow-diagnose-environment.json +1 -1
  144. package/workflows/workflow-for-workflows.json +1 -1
@@ -49,7 +49,7 @@ See `docs/plans/ui-ux-workflow-design-candidates.md` for full candidate analysis
49
49
  **Recommendation: Two composing workflows**
50
50
 
51
51
  ### Workflow B: UI/UX Design Creation Workflow
52
- For designing UI/UX from scratch. Adapted from `production-readiness-audit.json`.
52
+ For designing UI/UX from scratch. Adapted from `wr.production-readiness-audit.json`.
53
53
 
54
54
  **Phase structure:**
55
55
  - Phase 0: Problem framing, user goals, constraints, existing design context (requireConfirmation always)
@@ -62,7 +62,7 @@ For designing UI/UX from scratch. Adapted from `production-readiness-audit.json`
62
62
  **Complexity branching**: Simple (single component, no new flows) skips Phases 1-3.
63
63
 
64
64
  ### Workflow D: UI/UX Design Audit Workflow
65
- For reviewing an existing design description/spec before implementation. Adapted from `architecture-scalability-audit.json`.
65
+ For reviewing an existing design description/spec before implementation. Adapted from `wr.architecture-scalability-audit.json`.
66
66
 
67
67
  User provides design description; agent audits against declared dimensions: information architecture, Hick/Miller/Jakob/Fitts laws, accessibility (WCAG), edge cases (empty/error/loading/first-use), content/microcopy, visual hierarchy.
68
68
 
@@ -25,7 +25,7 @@
25
25
  - `V2WorkflowListOutputSchema` (new optional `categorySummary` field)
26
26
  - `handleV2ListWorkflows` (response branching logic)
27
27
  - `validate-workflows-registry.ts` (new uncategorized workflow warning)
28
- - `workflow-for-workflows.v2.json` Phase 7 (should stamp category when authoring)
28
+ - `wr.workflow-for-workflows.v2.json` Phase 7 (should stamp category when authoring)
29
29
 
30
30
  ## Candidates
31
31
 
@@ -79,20 +79,20 @@ B covers ~30% of workflows. C adds compiler complexity for a problem A already s
79
79
 
80
80
  | Category | Count | Examples |
81
81
  |---|---|---|
82
- | coding | 3 | coding-task, cross-platform-code-conversion |
83
- | review_audit | 3 | mr-review, production-readiness-audit, architecture-scalability-audit |
82
+ | coding | 3 | coding-task, wr.cross-platform-code-conversion |
83
+ | review_audit | 3 | mr-review, wr.production-readiness-audit, wr.architecture-scalability-audit |
84
84
  | investigation | 2 | bug-investigation, workflow-diagnose |
85
85
  | design | 2 | ui-ux-design, wr.discovery |
86
86
  | documentation | 3 | document-creation, scoped-documentation, documentation-update |
87
- | tickets | 4 | adaptive-ticket-creation, ticket-grooming, intelligent-test-case-generation |
88
- | learning | 4 | personal-learning-*, presentation-creation, relocation |
87
+ | tickets | 4 | wr.adaptive-ticket-creation, ticket-grooming, wr.intelligent-test-case-generation |
88
+ | learning | 4 | personal-learning-*, wr.presentation-creation, relocation |
89
89
  | routines | ~10 | all routine-* |
90
- | authoring | 1-2 | workflow-for-workflows |
90
+ | authoring | 1-2 | wr.workflow-for-workflows |
91
91
  | testing | 3 | test-* (hidden from default summary) |
92
92
 
93
93
  ## Self-Critique
94
94
 
95
- **Strongest counter-argument**: two-file maintenance burden (workflow JSON + overlay). Mitigated by: validate:registry warning on uncategorized workflows makes omission loud; workflow-for-workflows can be updated to prompt for category at authoring time.
95
+ **Strongest counter-argument**: two-file maintenance burden (workflow JSON + overlay). Mitigated by: validate:registry warning on uncategorized workflows makes omission loud; wr.workflow-for-workflows can be updated to prompt for category at authoring time.
96
96
 
97
97
  **Pivot condition**: if teams want per-workspace custom categories, A needs extension (workspace-level categories.json overlay). Defer to v2.
98
98
 
@@ -102,4 +102,4 @@ B covers ~30% of workflows. C adds compiler complexity for a problem A already s
102
102
  2. Should routines be surfaced in summary mode at all, or hidden by default (they're internal, not user-invoked)?
103
103
  3. Should the `categorySummary` include a short description per category (e.g., "Review code changes, audit systems") or just names + counts?
104
104
  4. What's the right `displayName` for `review_audit`? "Review & Audit"?
105
- 5. Should `workflow-for-workflows` Phase 7 be updated to stamp the category, or is that a separate ticket?
105
+ 5. Should `wr.workflow-for-workflows` Phase 7 be updated to stamp the category, or is that a separate ticket?
@@ -66,9 +66,9 @@ The workflow catalog has grown to ~36 items (25 JSON files + routines + bundled)
66
66
  { "id": "testing", "displayName": "Testing & Diagnostics" }
67
67
  ],
68
68
  "workflows": {
69
- "mr-review-workflow-agentic": { "category": "review_audit" },
70
- "bug-investigation-agentic": { "category": "investigation" },
71
- "coding-task-workflow-agentic": { "category": "coding" },
69
+ "wr.mr-review": { "category": "review_audit" },
70
+ "wr.bug-investigation": { "category": "investigation" },
71
+ "wr.coding-task": { "category": "coding" },
72
72
  "test-session-persistence": { "category": "testing", "hidden": true },
73
73
  ...
74
74
  }
@@ -107,4 +107,4 @@ The workflow catalog has grown to ~36 items (25 JSON files + routines + bundled)
107
107
  2. Should routines appear in summary or be hidden?
108
108
  3. Should `categorySummary` include a short description per category?
109
109
  4. What display name for `review_audit`?
110
- 5. Should workflow-for-workflows Phase 7 prompt for category?
110
+ 5. Should wr.workflow-for-workflows Phase 7 prompt for category?
@@ -0,0 +1,430 @@
1
+ # Workflow Modernization Design
2
+
3
+ **Status:** Active
4
+ **Created:** 2026-04-20
5
+ **Updated:** 2026-04-21 (Phase 0 complete -- goal challenged, path set, context populated)
6
+ **Owner:** WorkTrain daemon session (shaping)
7
+
8
+ ---
9
+
10
+ ## Artifact Strategy
11
+
12
+ **This document is for human readability only.** It is NOT required workflow memory. If a chat rewind occurs, the durable record lives in:
13
+ - WorkRail step notes (notesMarkdown in each `complete_step` call)
14
+ - Explicit context variables passed between steps
15
+
16
+ Do not treat this file as the source of truth for what step the session is on, what decisions have been made, or what constraints apply. Those live in the session notes.
17
+
18
+ This file is maintained alongside the session as a readable summary of findings and decisions. It may lag behind the session notes slightly.
19
+
20
+ ### Capability status (verified Phase 0b, 2026-04-21)
21
+
22
+ | Capability | Available | How verified | Notes |
23
+ |---|:---:|---|---|
24
+ | Web browsing | YES | `curl https://example.com` returned HTML (5s timeout) | Available via curl; no dedicated browser tool needed |
25
+ | Delegation (spawn_agent) | YES | `spawn_agent` with `wr.discovery` returned `{childSessionId, outcome: "stuck"}` -- mechanism works | `wr.discovery` is a multi-step workflow, unsuitable as a trivial probe; stuck on internal heuristic. Spawn mechanism itself is functional. |
26
+ | Git / GitHub CLI | YES | `gh pr list`, `git log` working throughout session | No issues |
27
+
28
+ **Capability decisions:**
29
+ - **Web browsing:** Available but not needed. All evidence for this task is in-repo (workflow files, schema, planning docs). No external references needed. Skipping -- fallback to in-repo data is fully sufficient.
30
+ - **Delegation:** Mechanism is available. Whether to use it is a per-step judgment. For design/synthesis work (this phase), delegation adds overhead without benefit -- the main agent owns synthesis by rule. For independent parallel audits (e.g. gap-scoring multiple workflows simultaneously), delegation could reduce latency. Decision will be made per step.
31
+
32
+ ---
33
+
34
+ ## Context / Ask
35
+
36
+ **Stated goal (original):** "Legacy workflow modernization -- `exploration-workflow.json` is the highest-priority candidate."
37
+
38
+ **Why this was a solution statement, not a problem statement:**
39
+ The original framing prescribes the fix (modernize specific files) and even names the approach (migrate to v2/lean patterns). It does not describe what is wrong with agent outputs or why those outputs are suboptimal.
40
+
41
+ **Critical factual finding from goal challenge:**
42
+ The stated #1 candidate (`workflows/exploration-workflow.json`) no longer exists. It was modernized in commit `f27507f4` (Mar 27) and then consolidated into `wr.discovery.json` in commit `a0ddaaac` (Mar 29). The planning docs (`docs/tickets/next-up.md`, `docs/roadmap/open-work-inventory.md`) were not updated and are stale.
43
+
44
+ **Reframed problem statement:**
45
+ Agents running several bundled workflows produce lower-quality outputs than they should because those workflows lack structural features (loop-control, evidence-gating, notes-first durability, assessment gates) that the current engine supports -- and the planning documents pointing to this work are themselves stale and misdirected.
46
+
47
+ ---
48
+
49
+ ## Path Recommendation
50
+
51
+ **Chosen path: `design_first`**
52
+
53
+ Rationale (justified against alternatives):
54
+ - **vs. `landscape_first`:** Landscape data was already available in-repo (workflow files, gap scores in prior session notes). Running a landscape-first pass would re-derive what we already have. The dominant risk is NOT "we don't know what's out there" -- it is "we pick the wrong candidates or wrong unit of work."
55
+ - **vs. `full_spectrum`:** Full spectrum adds reframing work on top of landscape + design. The reframing was already done in the goal-challenge step (`goalWasSolutionStatement = true`, reframed problem captured). No additional reframing needed; the design question is sharp enough.
56
+ - **`design_first` is correct because:** the primary decision to resolve is: (A) which candidates deserve assessment-gate redesign vs. cosmetic migration, and (B) whether planning doc correction is a prerequisite gate or can be done in parallel with workflow work. These are design/sequencing questions, not landscape gaps.
57
+ - The existing `docs/plans/workflow-modernization-design.md` (this file) from the prior session provides the landscape packet already. Phase 0's job is to correct errors in that packet, finalize the path, and set up the direction decision for Phase 1.
58
+
59
+ ---
60
+
61
+ ## Constraints / Anti-goals
62
+
63
+ **Core constraints:**
64
+ - Do not modify `src/daemon/`, `src/trigger/`, `src/v2/`, `triggers.yml`, or `~/.workrail/daemon-soul.md`
65
+ - All workflow changes must pass `npx vitest run tests/lifecycle/bundled-workflow-smoke.test.ts` (currently 37/37)
66
+ - All workflows must validate via `npm run validate:registry` (no structural regressions)
67
+ - No new markdown documentation files unless explicitly authorized
68
+ - Each modernized workflow needs a GitHub issue before implementation begins
69
+ - Never push directly to main -- branch + PR
70
+
71
+ **Anti-goals:**
72
+ - Do NOT treat stamping (`npm run stamp-workflow`) as a proxy for behavioral improvement
73
+ - Do NOT modernize workflows that are currently working well enough and rarely used (unknown usage data)
74
+ - Do NOT scope-creep into engine changes or authoring-spec changes during workflow migration
75
+ - Do NOT treat `recommendedPreferences` and `features` field addition as sufficient for "done"
76
+ - Do NOT preserve legacy step structures that are architecturally wrong -- if a workflow needs redesign, name it as redesign not modernization
77
+
78
+ ---
79
+
80
+ ## Landscape Packet
81
+
82
+ ### Current workflow inventory (corrected -- Phase 1c, 2026-04-21)
83
+
84
+ > **CRITICAL CORRECTION FROM PRIOR VERSION:** The prior landscape incorrectly identified `wfw.v2.json` and `coding-task` as having orphaned (unused) assessment gates. The orphan check did not recurse into loop body steps. A recursive check confirms ALL declared assessments in ALL workflows are properly wired. There are ZERO orphaned assessment gates in the repo.
85
+
86
+ > **Phase 1c scan methodology:** Full recursive walk of steps + loop bodies. Fields checked: `metaGuidance`, `recommendedPreferences`, `features`, `references`, `validatedAgainstSpecVersion`, and functional assessment gates (declared + referenced without broken refs). Loop body assessment refs counted correctly.
87
+
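The recursive check described above can be sketched as follows. This is a minimal illustration, not the project's actual scanner: the `Step` and `Workflow` shapes are simplified, and the `body` field used for loop bodies is an assumed name. Only `assessments`, `id`, and `assessmentRefs` come from the text.

```typescript
// Hypothetical, simplified shapes; the real schema has more fields.
interface Step {
  id: string;
  assessmentRefs?: string[];
  // Assumed name: a loop container carries its body steps here.
  body?: Step[];
}

interface Workflow {
  assessments?: { id: string }[];
  steps: Step[];
}

// Collect every assessment ID referenced by a step or by any nested loop body step.
function collectRefs(steps: Step[], found: Set<string> = new Set<string>()): Set<string> {
  for (const step of steps) {
    for (const ref of step.assessmentRefs ?? []) found.add(ref);
    if (step.body) collectRefs(step.body, found);
  }
  return found;
}

// Orphaned: declared but never referenced. Broken: referenced but never declared.
function checkAssessments(wf: Workflow): { orphaned: string[]; broken: string[] } {
  const declared = new Set<string>((wf.assessments ?? []).map((a) => a.id));
  const referenced = collectRefs(wf.steps);
  return {
    orphaned: Array.from(declared).filter((id) => !referenced.has(id)),
    broken: Array.from(referenced).filter((id) => !declared.has(id)),
  };
}

// The only reference lives inside a loop body, which a non-recursive walk would miss.
const example: Workflow = {
  assessments: [{ id: "frame-soundness" }],
  steps: [
    { id: "phase-6", body: [{ id: "phase-6a", assessmentRefs: ["frame-soundness"] }] },
  ],
};

console.log(checkAssessments(example));
```

A walk that only inspects top-level `steps` would report `frame-soundness` as orphaned here, which is the kind of false positive the correction describes.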
88
+ **Summary counts:**
89
+ - Workflows WITH functional assessment gates: **7** (bug-investigation, coding-task, mr-review, test-artifact-loop-control, wfw, wfw.v2, wr.shaping)
90
+ - Workflows WITHOUT functional assessment gates: **17**
91
+ - Missing `recommendedPreferences`: **11**
92
+ - Missing `references`: **21** (all but wfw, wfw.v2, wr.production-readiness-audit)
93
+ - Missing `validatedAgainstSpecVersion`: **19**
94
+
95
+ | Workflow | MG | RP | Feat | Refs | Stamp | Gates | Steps |
96
+ |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
97
+ | `wr.adaptive-ticket-creation.json` | Y | N | N | N | N | **N** | 8 |
98
+ | `wr.architecture-scalability-audit.json` | Y | Y | Y | N | N | **N** | 7 |
99
+ | `bug-investigation.agentic.v2.json` | Y | Y | N | N | N | **Y** | 9 |
100
+ | `wr.classify-task.json` | Y | Y | N | N | Y | N | 1 |
101
+ | `wr.coding-task.json` | Y | Y | N | N | N | **Y** | 14 |
102
+ | `wr.cross-platform-code-conversion.v2.json` | Y | Y | N | N | N | N | 14 |
103
+ | `wr.document-creation.json` | Y | N | N | N | N | **N** | 8 |
104
+ | `wr.documentation-update.json` | Y | N | N | N | N | **N** | 6 |
105
+ | `wr.intelligent-test-case-generation.json` | Y | N | N | N | N | **N** | 6 |
106
+ | `learner-centered-course-workflow.json` | Y | N | N | N | N | N | 11 |
107
+ | `mr-review-workflow.agentic.v2.json` | Y | Y | Y | N | N | **Y** | 7 |
108
+ | `wr.personal-learning-materials.json` | Y | N | N | N | N | N | 6 |
109
+ | `wr.presentation-creation.json` | Y | N | N | N | N | N | 5 |
110
+ | `wr.production-readiness-audit.json` | Y | Y | Y | Y | N | **N** | 7 |
111
+ | `wr.relocation-us.json` | Y | Y | N | N | N | N | 9 |
112
+ | `wr.scoped-documentation.json` | Y | N | N | N | N | N | 5 |
113
+ | `test-artifact-loop-control.json` | Y | N | N | N | N | Y | 3 |
114
+ | `test-session-persistence.json` | N | N | N | N | N | N | 5 |
115
+ | `wr.ui-ux-design.json` | Y | Y | Y | N | Y | N | 8 |
116
+ | `wr.diagnose-environment.json` | N | N | N | N | N | N | 2 |
117
+ | `wr.workflow-for-workflows.json` | Y | Y | Y | Y | Y | Y | 10 |
118
+ | `wr.workflow-for-workflows.v2.json` | Y | Y | Y | Y | Y | **Y** | 10 |
119
+ | `wr.discovery.json` | Y | Y | Y | N | Y | N | 22 |
120
+ | `wr.shaping.json` | Y | Y | N | N | N | **Y** | 9 |
121
+
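As a cross-check, the summary counts can be re-derived from the flag columns of the table above. The sketch below transcribes each row as a six-character string in column order (MG, RP, Feat, Refs, Stamp, Gates); the `rows` encoding and the `count` helper are illustrative, not part of the repo.

```typescript
// One entry per table row; flags are in column order: MG, RP, Feat, Refs, Stamp, Gates.
const rows: [string, string][] = [
  ["wr.adaptive-ticket-creation", "YNNNNN"],
  ["wr.architecture-scalability-audit", "YYYNNN"],
  ["bug-investigation.agentic.v2", "YYNNNY"],
  ["wr.classify-task", "YYNNYN"],
  ["wr.coding-task", "YYNNNY"],
  ["wr.cross-platform-code-conversion.v2", "YYNNNN"],
  ["wr.document-creation", "YNNNNN"],
  ["wr.documentation-update", "YNNNNN"],
  ["wr.intelligent-test-case-generation", "YNNNNN"],
  ["learner-centered-course-workflow", "YNNNNN"],
  ["mr-review-workflow.agentic.v2", "YYYNNY"],
  ["wr.personal-learning-materials", "YNNNNN"],
  ["wr.presentation-creation", "YNNNNN"],
  ["wr.production-readiness-audit", "YYYYNN"],
  ["wr.relocation-us", "YYNNNN"],
  ["wr.scoped-documentation", "YNNNNN"],
  ["test-artifact-loop-control", "YNNNNY"],
  ["test-session-persistence", "NNNNNN"],
  ["wr.ui-ux-design", "YYYNYN"],
  ["wr.diagnose-environment", "NNNNNN"],
  ["wr.workflow-for-workflows", "YYYYYY"],
  ["wr.workflow-for-workflows.v2", "YYYYYY"],
  ["wr.discovery", "YYYNYN"],
  ["wr.shaping", "YYNNNY"],
];

// Count rows whose flag in the given column equals the given value.
function count(col: number, value: "Y" | "N"): number {
  return rows.filter(([, flags]) => flags.charAt(col) === value).length;
}

const summary = {
  total: rows.length,
  withGates: count(5, "Y"),
  withoutGates: count(5, "N"),
  missingRecommendedPreferences: count(1, "N"),
  missingReferences: count(3, "N"),
  missingStamp: count(4, "N"),
};

console.log(summary);
```

The derived values agree with the stated counts: 24 workflows, 7 with gates and 17 without, 11 missing `recommendedPreferences`, 21 missing `references`, and 19 missing `validatedAgainstSpecVersion`.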
122
+ **Working examples for assessment gate patterns:**
123
+ - `wr.shaping.json` -- cleanest: 1 dimension per assessment, `low`/`high` levels, `require_followup` on `low`
124
+ - `wr.coding-task.json` -- multi-assessment per step, loop-body refs
125
+ - `mr-review-workflow.agentic.v2.json` -- 3 refs on a single final validation step
126
+ - `bug-investigation.agentic.v2.json` -- single gate on diagnosis validation
127
+
128
+ **Current smoke test baseline:** 37/37 (verified 2026-04-21)
129
+
130
+ ### Key landscape observations (corrected Phase 1c, 2026-04-21)
131
+
132
+ 1. **Two prompt formats coexist:** `promptBlocks` (structured object with goal/constraints/procedure/verify) and raw `prompt` string. The authoring spec recommends `promptBlocks`. Not all "modern" workflows use it consistently.
133
+
134
+ 2. **`exploration-workflow.json` is gone:** Absorbed into `wr.discovery.json`. Planning docs must be corrected before any implementation work begins.
135
+
136
+ 3. **Several "candidates" from open-work-inventory also no longer exist:** `mr-review-workflow.json`, `bug-investigation.json`, `design-thinking-workflow.json` -- all absorbed or renamed. The list in `open-work-inventory.md` is materially stale.
137
+
138
+ 4. **Assessment gates are the biggest behavioral differentiator:** 7/24 workflows have functional assessment gates. The 17 without them have no engine-enforced quality checkpoints -- all validation is prose-only.
139
+
140
+ 5. **`wfw.v2.json` DOES have functional assessment gates** -- the prior session's "orphaned assessments" finding was wrong. The gates live in loop body steps (`phase-6a`, `phase-6b`, `phase-6c`). All 4 declared gates are referenced and wired. Prior design doc contained a material error on this point.
141
+
142
+ 6. **`recommendedPreferences` is a common gap:** 11/24 workflows are missing it. Easy to add, genuine behavioral improvement.
143
+
144
+ 7. **`references` is almost universally missing:** Only 3 workflows have it (`wfw`, `wfw.v2`, `wr.production-readiness-audit`). This is cosmetic for most workflows -- references are informational, not enforced.
145
+
146
+ 8. **The "unstamped" list from `validate:registry` is cosmetic advisory only** -- it names 17 unstamped workflows but stamping alone is not a quality improvement goal.
147
+
148
+ 9. **`wr.production-readiness-audit.json` has no assessment gates** -- despite being a review workflow with a clear audit focus, it uses no `assessmentRefs`. This is a behavioral gap on a high-value workflow.
149
+
150
+ ### Phase 1c hard-constraint findings (engine/schema reality checks)
151
+
152
+ **Assessment gate mechanism (confirmed from schema + engine source):**
153
+ - Schema: `assessments` is a top-level array on the workflow; each entry has `id` and `dimensions[]`
154
+ - Step field: `assessmentRefs` (plural array of assessment IDs) + optional `assessmentConsequences` (at most one per step)
155
+ - Engine: `require_followup` consequence is genuinely enforced -- `assessment-consequence-event-builder.ts` emits a blocking event
156
+ - Rule `assessment-v1-constraints` (required level): a step MAY have multiple assessmentRefs; at most one assessmentConsequences; trigger uses `anyEqualsLevel`
157
+ - Rule `assessment-use-for-bounded-judgment` (recommended level): use when step needs bounded judgment before workflow can safely advance
158
+ - **IMPORTANT:** `assessmentRefs` is defined ONLY on `standardStep` (not `loopStep`). Loop body steps ARE standard steps and CAN have assessmentRefs. Only the loop container step itself cannot.
159
+
160
+ **Valid `recommendedPreferences` values (from schema enum):**
161
+ - `recommendedAutonomy`: `guided` | `full_auto_stop_on_user_deps` | `full_auto_never_stop`
162
+ - `recommendedRiskPolicy`: `conservative` | `balanced` | `aggressive`
163
+ - Pattern for review/audit workflows: `guided` + `conservative`
164
+
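The enum values above map naturally onto literal union types. The following is a sketch under the assumption that `recommendedPreferences` is an object holding exactly these two fields; the guard function name is hypothetical and the real schema may allow more.

```typescript
// Enum values copied from the list above.
const AUTONOMY = ["guided", "full_auto_stop_on_user_deps", "full_auto_never_stop"] as const;
const RISK_POLICY = ["conservative", "balanced", "aggressive"] as const;

type Autonomy = (typeof AUTONOMY)[number];
type RiskPolicy = (typeof RISK_POLICY)[number];

// Assumed object shape; only the two field names come from the text.
interface RecommendedPreferences {
  recommendedAutonomy: Autonomy;
  recommendedRiskPolicy: RiskPolicy;
}

// Hypothetical runtime guard: true only when both fields hold a listed value.
function isRecommendedPreferences(value: unknown): value is RecommendedPreferences {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    (AUTONOMY as readonly unknown[]).indexOf(candidate.recommendedAutonomy) !== -1 &&
    (RISK_POLICY as readonly unknown[]).indexOf(candidate.recommendedRiskPolicy) !== -1
  );
}

// The pattern named above for review/audit workflows.
const reviewAuditDefaults: RecommendedPreferences = {
  recommendedAutonomy: "guided",
  recommendedRiskPolicy: "conservative",
};

console.log(isRecommendedPreferences(reviewAuditDefaults));
```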
165
+ **Valid `features` values (closed set, from feature-registry.ts):**
166
+ - `wr.features.memory_context`
167
+ - `wr.features.capabilities`
168
+ - `wr.features.subagent_guidance`
169
+ - Only `wr.features.subagent_guidance` is used by modern baselines; only declare when actually applicable
170
+
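Because the feature set is closed, a declared `features` array can be checked against it directly. This sketch copies the three identifiers from the list above; the `unknownFeatures` helper is hypothetical and is not the logic in `feature-registry.ts`.

```typescript
// Closed set of feature identifiers, as listed above.
const KNOWN_FEATURES = [
  "wr.features.memory_context",
  "wr.features.capabilities",
  "wr.features.subagent_guidance",
] as const;

// Hypothetical helper: return any declared feature that is not in the closed set.
function unknownFeatures(declared: string[]): string[] {
  return declared.filter((f) => (KNOWN_FEATURES as readonly string[]).indexOf(f) === -1);
}

console.log(unknownFeatures(["wr.features.subagent_guidance", "wr.features.not_a_feature"]));
```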
171
+ **Precedent patterns from working examples (`wr.shaping.json` is the cleanest):**
172
+ - Declare assessment at workflow level: `{ id: "frame-soundness", dimensions: [{ id: "frame_soundness", ... }] }`
173
+ - Reference from step: `assessmentRefs: ["frame-soundness"]`
174
+ - Add consequence: `assessmentConsequences: [{ when: { anyEqualsLevel: "low" }, effect: { kind: "require_followup", guidance: "..." } }]`
175
+ - Each assessment has one dimension in the clean examples; multiple dimensions are valid but must be orthogonal
176
+
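To make the `anyEqualsLevel` trigger concrete, here is a rough model of how a gated step might be evaluated. It is a sketch under stated assumptions, not the engine's implementation: the `followupGuidance` function, the `assessed` map of dimension ID to level, and the guidance string are all hypothetical. The field names and the at-most-one-consequence rule come from the text.

```typescript
// Field names follow the text; the types themselves are a simplified assumption.
interface Consequence {
  when: { anyEqualsLevel: string };
  effect: { kind: "require_followup"; guidance: string };
}

interface GatedStep {
  id: string;
  assessmentRefs: string[];
  assessmentConsequences?: Consequence[];
}

// Hypothetical evaluator: returns follow-up guidance when any assessed dimension
// sits at the trigger level, otherwise null (the step may advance).
function followupGuidance(step: GatedStep, assessed: Record<string, string>): string | null {
  const consequences = step.assessmentConsequences ?? [];
  if (consequences.length > 1) {
    throw new Error(`step ${step.id}: at most one assessmentConsequences entry is allowed`);
  }
  if (consequences.length === 0) return null;
  const consequence = consequences[0];
  const triggered = Object.keys(assessed).some(
    (dimension) => assessed[dimension] === consequence.when.anyEqualsLevel,
  );
  return triggered ? consequence.effect.guidance : null;
}

// Illustrative step; the guidance text is made up for this example.
const gatedStep: GatedStep = {
  id: "frame-gate",
  assessmentRefs: ["frame-soundness"],
  assessmentConsequences: [
    {
      when: { anyEqualsLevel: "low" },
      effect: { kind: "require_followup", guidance: "Revisit the problem frame." },
    },
  ],
};

console.log(followupGuidance(gatedStep, { frame_soundness: "low" }));
console.log(followupGuidance(gatedStep, { frame_soundness: "high" }));
```

With a single `low`/`high` dimension per assessment, as in the clean examples, the trigger reduces to "block on `low`". With several dimensions, any one of them at the trigger level is enough to require follow-up.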
177
+ ---
178
+
179
+ ## Problem Frame Packet
180
+
181
+ ### Stakeholders and jobs
182
+
183
+ **Stakeholder 1: Project owner (Etienne)**
184
+ - *Job to be done:* Maintain a catalog of reliable, high-quality bundled workflows that make autonomous daemon sessions (full-pipeline, implement, mr-review) produce better outputs with less rework.
185
+ - *Pain:* Planning docs reference deleted files. "Modernization" tasks are in the queue but it's unclear which ones are worth doing vs. which are documentation hygiene.
186
+ - *Constraint:* Active focus is on engine/daemon/console layer (recent commits). Workflow authoring work competes for bandwidth.
187
+
188
+ **Stakeholder 2: Autonomous agents (daemon sessions)**
189
+ - *Job to be done:* Complete coding tasks, reviews, discovery, and shaping with high output quality and minimal wasted iterations.
190
+ - *Pain:* Workflows with no assessment gates have no engine-enforced quality checkpoints. All verification is prose-only -- the engine cannot block advancement on poor outputs.
191
+ - *Constraint:* Agents only run the workflows they're spawned with. The 4 production workflows are what matter.
192
+
193
+ **Stakeholder 3: Future workflow authors**
194
+ - *Job to be done:* Write new workflows modeled on existing bundled ones. Bad exemplars get copied.
195
+ - *Pain:* The "lower priority" workflows in the catalog (document-creation, adaptive-ticket, documentation-update) have no assessment gates -- if authors copy these as templates, the pattern propagates.
196
+ - *Constraint:* No authoring enforcement beyond what `validate:registry` catches.
197
+
198
+ ### The 4 production workflows (what actually runs in the daemon pipeline)
199
+
200
+ From `triggers.yml` and `src/coordinators/modes/`:
201
+ 1. **`wr.discovery`** (full-pipeline mode, step 1) -- already validated (v3), has promptFragments and routines. No assessment gates -- but discovery is a research step, gates may not be appropriate here.
202
+ 2. **`wr.shaping`** (full-pipeline mode, step 3) -- has functional assessment gates on `frame-gate` and `breadboard-and-elements`. Not validated (`validatedAgainstSpecVersion` missing).
203
+ 3. **`wr.coding-task`** (full-pipeline + implement mode, step 5) -- has functional assessment gates (8 of them). Not validated. This is the highest-stakes workflow: it writes code.
204
+ 4. **`wr.mr-review`** (full-pipeline + implement mode, final step) -- has functional assessment gates (3 of them). Not validated. Issue #174 (add assessment gate) is OPEN but appears already done -- commit c83aa180 marked it done.
205
+
206
+ **Key tension**: The 4 production workflows already have assessment gates. The "legacy" workflows that don't have gates (`wr.adaptive-ticket-creation`, `wr.documentation-update`, `wr.production-readiness-audit`, etc.) are NOT used in the autonomous pipeline -- they're human-triggered workflows.
207
+
208
+ ### The real problem, decomposed
209
+
210
+ **Layer 1 (surface):** Planning docs reference workflows that no longer exist (`exploration-workflow.json`, etc.). Issue #174 is open but appears closed in practice.
211
+
212
+ **Layer 2 (operational):** The 4 production daemon workflows are missing `validatedAgainstSpecVersion` stamps. This is the most meaningful "modernization" gap for the actual production system -- not adding gates (they have them) but formally validating them against the current spec.
213
+
214
+ **Layer 3 (quality catalog):** Human-triggered workflows (`wr.adaptive-ticket-creation`, `wr.documentation-update`, `wr.production-readiness-audit`) lack assessment gates. These workflows run when humans explicitly invoke them. Adding gates here improves quality for human-driven sessions, not daemon sessions.
215
+
216
+ **Layer 4 (strategic):** "Modernization" as a concept conflates two different operations:
217
+ - **Cosmetic migration:** add schema fields, update prompt format, stamp the workflow
218
+ - **Behavioral redesign:** add assessment gates, restructure loops, tighten output contracts
219
+
220
+ ### Tensions
221
+
222
+ **Tension 1: Production value vs. legacy catalog**
223
+ - The 4 production workflows already have gates. Modernizing legacy workflows (document-creation, ticket-creation, etc.) helps human-driven sessions but doesn't improve the autonomous pipeline.
224
+ - *Implication:* "modernization for the daemon" is mostly done. "Modernization for human users" is the real remaining work.
225
+
226
+ **Tension 2: Stamping vs. behavioral improvement**
227
+ - `validatedAgainstSpecVersion` is a stamp that says "this workflow was reviewed against the current authoring spec." Most production workflows are missing this stamp.
228
+ - Running `wr.workflow-for-workflows.v2.json` on a workflow is the intended process to earn the stamp.
229
+ - But running `wr.workflow-for-workflows.v2.json` takes significant agent time and may find things to fix, making the "just stamp it" shortcut dishonest.
230
+
231
+ **Tension 3: Documentation rot creates misdirected work**
232
+ - The open-work-inventory and tickets/next-up.md reference deleted files and closed work (issue #174, exploration-workflow.json).
233
+ - If these docs are used to prioritize work, they'll produce the wrong priorities.
234
+ - Fixing the docs first is cheap but it's not "shipping workflow improvements."
235
+
236
+ **Tension 4: Active focus is elsewhere**
237
+ - Recent commits (Apr 20-21) are all engine/daemon/console: trigger fixes, coordinator crashes, console bugs.
238
+ - The project owner's actual momentum is on infrastructure, not workflow authoring.
239
+ - Starting a workflow modernization project now means context-switching from hot infrastructure work.
240
+
241
+ ### Success criteria (observable)
242
+
243
+ 1. The 4 production daemon workflows (`wr.discovery`, `wr.shaping`, `wr.coding-task`, `wr.mr-review`) all have `validatedAgainstSpecVersion: 3` after genuine review via `wr.workflow-for-workflows.v2.json`
244
+ 2. Planning docs (`open-work-inventory.md`, `tickets/next-up.md`) reference only files that exist in the repo, and issue #174 is closed if it's actually done
245
+ 3. At least one non-production workflow with a review/audit purpose (`wr.production-readiness-audit.json` or `wr.adaptive-ticket-creation.json`) gains functional assessment gates
246
+ 4. `npx vitest run` passes (37/37 minimum) before and after any changes
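Criterion 2 could be checked mechanically. The following is a minimal sketch, assuming planning docs cite files as inline-code spans ending in a known extension; `findStaleReferences` and the `exists` callback are hypothetical names, not part of the WorkRail codebase.

```typescript
// Hypothetical helper (not an existing WorkRail API): list the file
// references in a planning doc that do not resolve to a real file.
type Exists = (path: string) => boolean;

function findStaleReferences(markdown: string, exists: Exists): string[] {
  // Assumption: references appear as inline code such as `tickets/next-up.md`.
  const fileRef = /`([^`\s]+\.(?:json|md|yml|ts))`/g;
  const stale: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = fileRef.exec(markdown)) !== null) {
    const ref = match[1];
    // Record each missing file once, even if it is cited repeatedly.
    if (!exists(ref) && stale.indexOf(ref) === -1) {
      stale.push(ref);
    }
  }
  return stale.sort();
}
```

In a real check, `exists` would wrap a filesystem lookup relative to the repo root; injecting it as a callback keeps the function pure and easy to test.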
247
+
248
+ ### Reframes and HMW questions
249
+
250
+ **Reframe 1: "Modernization" is actually two separate projects**
251
+ - Project A: Fix documentation rot (cheap, prerequisite, no workflow changes)
252
+ - Project B: Validate + stamp the 4 production workflows (high value, expensive, requires running quality gate)
253
+
254
+ **Reframe 2: The daemon doesn't need modernization -- it needs validation**
255
+ The autonomous pipeline workflows already use assessment gates. What they're missing is the formal `validatedAgainstSpecVersion` stamp, which is earned by running them through `wr.workflow-for-workflows.v2.json`. The work is validation, not "modernization."
256
+
257
+ **HMW 1:** How might we get the 4 production workflows stamped without the full 10-step `wr.workflow-for-workflows.v2.json` process for each?
258
+
259
+ **HMW 2:** How might we prioritize the non-production workflows without session outcome data to guide us?
260
+
261
+ ### Primary framing risk
262
+
263
+ **The specific condition that would make this framing wrong:**
264
+
265
+ If `wr.discovery.json`, `wr.shaping.json`, `wr.coding-task.json`, or `mr-review-workflow.agentic.v2.json` actually have material quality problems that assessment gates don't catch (e.g., poorly structured prompts, missing output contracts, wrong loop structure), then the framing "production workflows are fine, legacy workflows need work" is wrong. The production workflows might need behavioral redesign, not just stamping. This would only be discoverable by actually running `wr.workflow-for-workflows.v2.json` on each of them and seeing what quality gate failures come back.
266
+
267
+ ### Primary uncertainty
268
+
269
+ **What does the quality gate actually find?** We know the production workflows have assessment gates and pass the smoke test. We do NOT know whether running `wr.workflow-for-workflows.v2.json` on them at THOROUGH depth would surface material prompt quality issues. This is the single highest-uncertainty input for the design.
270
+
271
+ ---
272
+
273
+ ## Phase 2 Synthesis
274
+
275
+ ### The opportunity
276
+
277
+ The autonomous pipeline runs on 4 workflows (`wr.discovery`, `wr.shaping`, `wr.coding-task`, `wr.mr-review`) that already have structural quality (assessment gates, loops, evidence contracts) but have never been formally validated against the current authoring spec. Closing this gap makes WorkRail's own daemon sessions exemplars of spec-compliant workflow execution -- which matters both for quality and for platform credibility.
278
+
279
+ At the same time, planning docs reference 7 deleted files and one open-but-done issue (#174), causing any planning effort to misdirect work. Fixing this is cheap and is a prerequisite to trustworthy prioritization.
280
+
281
+ ### Decision criteria (a good direction must satisfy all 5)
282
+
283
+ 1. **Sequencing discipline:** Corrects documentation rot before touching workflow JSON
284
+ 2. **Empirical before prescriptive:** Runs quality gate on at least one production workflow before committing to full redesign scope
285
+ 3. **Production-first value:** Prioritizes the 4 daemon pipeline workflows over the 17 ungated legacy catalog workflows
286
+ 4. **No cosmetic compliance:** Does not stamp `validatedAgainstSpecVersion` without a genuine quality gate review
287
+ 5. **Incremental shippability:** Each piece produces a standalone, shippable improvement
288
+
289
+ ### Riskiest assumption
290
+
291
+ "The 4 production workflows have sound prompt quality and will pass the quality gate with minor fixes." If they have structural quality issues (missing output contracts, weak evidence requirements, poor loop termination), Stream B expands into redesign territory. Only testable by running the gate.
292
+
293
+ ### Remaining uncertainty type
294
+
295
+ **Prototype-learning uncertainty.** We know what to do and in what order. We don't know how much work the quality gate will surface. The scope of Stream B is only knowable by doing it.
296
+
297
+ ---
298
+
299
+ ## Candidate Generation Setup (Phase 3b)
300
+
301
+ **Path:** `design_first`
302
+ **candidateCountTarget:** 3
303
+
304
+ ### Required properties of the candidate set
305
+
306
+ Per the `design_first` path contract, the 3 candidates must satisfy:
307
+
308
+ 1. **At least one reframe candidate:** One candidate must challenge whether the two work streams (docs fix + production workflow validation) are the right investment at all. The obvious directions are "clean up docs" and "run quality gate on production workflows." The reframe asks: what if neither is the best use of the same effort right now? A valid reframe might be: retire low-value workflows from the catalog, invest in lint tooling that prevents future regression, or defer workflow work entirely in favor of the active engine/daemon/console work.
309
+
310
+ 2. **Meaningful differentiation:** Candidates must differ in their primary bet, not just in ordering or scope. Minor variations on "do A then B in different order" do not count as distinct candidates.
311
+
312
+ 3. **Ground in the 5 decision criteria:** Each candidate must be evaluable against: sequencing discipline, empirical-before-prescriptive, production-first value, no cosmetic compliance, incremental shippability. A candidate that violates decision criterion 4 (cosmetic compliance) is disqualified.
313
+
314
+ 4. **Prototype-learning uncertainty honored:** At least one candidate must explicitly account for the unknown scope of Stream B (what the quality gate finds) rather than assuming it away.
315
+
316
+ ### Bias to guard against
317
+
318
+ Because the two streams are well-defined, generation will be pulled toward micro-variations: "do A then B1 only," "do A then B1 and B2," "do A then B3." This is the clustering failure. Each of the 3 candidates must be defendable as the *right* strategy in some scenario, not just the same strategy with different scope.
319
+
320
+ ### Anti-candidates (explicitly ruled out by decision criteria)
321
+
322
+ - Any candidate that adds `validatedAgainstSpecVersion` without running `wr.workflow-for-workflows.v2.json` -- violates criterion 4
323
+ - Any candidate that prioritizes legacy catalog workflows (adaptive-ticket, document-creation, etc.) over production pipeline workflows -- violates criterion 3 unless it argues from the reframe position that this is intentionally the right bet
324
+
325
+ ---
326
+
327
+ ## Candidate Directions
328
+
329
+ ### Direction A: Docs-first + empirical production validation (recommended)
330
+
331
+ **Core bet:** Fix the documentation foundation first, then run `wr.workflow-for-workflows.v2.json` on `wr.coding-task.json` as a probe -- let that run's findings determine the scope of remaining work.
332
+
333
+ **What:**
334
+ 1. Update `open-work-inventory.md` and `tickets/next-up.md` to remove 7 stale file references
335
+ 2. Close GitHub issue #174 (assessment-gate adoption in MR review is already done per commit c83aa180)
336
+ 3. Run `wr.workflow-for-workflows.v2.json` on `wr.coding-task.json` at STANDARD depth
337
+ 4. If quality gate finds only minor issues: stamp it, then repeat for `wr.shaping.json`
338
+ 5. If quality gate finds significant issues: create a focused GitHub issue for the specific fixes, do NOT stamp until fixed
339
+
340
+ **Satisfies decision criteria:**
341
+ - ✅ Sequencing discipline (docs first)
342
+ - ✅ Empirical before prescriptive (quality gate run before scope commitment)
343
+ - ✅ Production-first value (coding-task is the highest-stakes production workflow)
344
+ - ✅ No cosmetic compliance (stamp only after genuine gate run)
345
+ - ✅ Incremental shippability (docs fix ships independently; each stamped workflow ships independently)
346
+
347
+ **Handles prototype-learning uncertainty:** Yes -- the quality gate finding is explicitly the branch point for scope.
348
+
349
+ **Risks:** Quality gate may find significant issues requiring more work than expected. Mitigated by treating the first gate run as a probe, not a commitment to fix everything.
350
+
351
+ ---
352
+
353
+ ### Direction B: Catalog pruning + tooling investment (reframe)
354
+
355
+ **Core bet:** Instead of migrating legacy workflows, retire the ones that add clutter without adding value, and invest the remaining effort in lint tooling that prevents future regression across all workflows.
356
+
357
+ **What:**
358
+ 1. Audit all 24 bundled workflows for "is this worth keeping?" -- retire workflows that are low-use, domain-specific (e.g., `wr.relocation-us.json`), or fully superseded
359
+ 2. For surviving non-production workflows, add a `validate:registry` rule that flags workflows lacking `recommendedPreferences` and `assessments` when they have review/validation steps
360
+ 3. Skip the manual quality-gate-run process entirely -- let the linter enforce quality going forward
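The lint rule in step 2 might look roughly like the sketch below. Everything about the shape is an assumption: `WorkflowLike`, `lintGateCoverage`, and the keyword heuristic for detecting review/validation steps are illustrative only, and the actual `validate:registry` implementation is not shown in this document.

```typescript
// Hypothetical workflow shape; the real schema lives in spec/workflow.schema.json.
interface WorkflowLike {
  id: string;
  recommendedPreferences?: unknown;
  assessments?: unknown[];
  steps: { id: string; title?: string }[];
}

// Returns one message per missing field, or an empty list if the rule passes.
function lintGateCoverage(workflow: WorkflowLike): string[] {
  // Assumption: a review/validation step is recognizable by its id or title.
  const hasReviewStep = workflow.steps.some((step) =>
    /review|validat/i.test(`${step.id} ${step.title ?? ""}`)
  );
  if (!hasReviewStep) {
    return []; // The rule only applies to workflows with review/validation steps.
  }
  const problems: string[] = [];
  if (workflow.recommendedPreferences === undefined) {
    problems.push(`${workflow.id}: missing recommendedPreferences`);
  }
  if (!workflow.assessments || workflow.assessments.length === 0) {
    problems.push(`${workflow.id}: missing assessments`);
  }
  return problems;
}
```

A keyword heuristic like this will produce false positives and negatives; an explicit step-level marker would be more reliable if the schema offers one.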
361
+
362
+ **The reframe argument:** "Modernization" of individual workflows is a treadmill -- as soon as you finish, new engine features exist and the catalog drifts again. Tooling that enforces quality across all current and future workflows has higher expected return than manual per-workflow migration.
363
+
364
+ **Satisfies decision criteria:**
365
+ - ✅ Sequencing discipline (retirement audit precedes tooling)
366
+ - ⚠️ Empirical before prescriptive (linting is forward-looking, not backward-empirical)
367
+ - ⚠️ Production-first value (linting helps all workflows equally, not production-first)
368
+ - ✅ No cosmetic compliance (linting catches cosmetic-only changes)
369
+ - ✅ Incremental shippability (each new lint rule ships independently)
370
+
371
+ **Fails criterion 3 if:** The production pipeline workflows are never stamped -- which this direction leaves unaddressed.
372
+
373
+ **When this is the right bet:** If the project owner's real goal is sustainable quality rather than a one-time migration.
374
+
375
+ ---
376
+
377
+ ### Direction C: Defer workflow work, close the open issues, nothing more
378
+
379
+ **Core bet:** Workflow quality is not the bottleneck for the autonomous pipeline today. The active momentum is on engine/daemon/console infrastructure. Switching context to workflow authoring work now has negative expected value. The right action is minimal: close #174, note the stale doc references, and return to workflow work when engine work is in a stable state.
380
+
381
+ **What:**
382
+ 1. Close GitHub issue #174 (it's already done)
383
+ 2. Add a comment to `open-work-inventory.md` noting that 7 file references are stale (no edits -- avoid accidental commits)
384
+ 3. Stop. No workflow JSON changes. No quality gate runs.
385
+
386
+ **The reframe argument:** The project owner said "next" but the actual commit velocity and open issue set show infrastructure work is active and urgent. A half-completed workflow migration (docs fixed, one workflow stamped) is worse than a clean slate -- it creates partial work artifacts that block future context.
387
+
388
+ **Satisfies decision criteria:**
389
+ - ✅ Sequencing discipline (N/A -- nothing is done)
390
+ - N/A Empirical before prescriptive
391
+ - ✅ Production-first value (by omission -- production workflows already have gates)
392
+ - ✅ No cosmetic compliance (nothing stamped)
393
+ - ✅ Incremental shippability (single atomic action: close #174)
394
+
395
+ **When this is the right bet:** If the project owner's bandwidth is genuinely constrained by active engine work, and the workflow modernization task was added to the queue prematurely.
396
+
397
+ ---
398
+
399
+ ## Challenge Notes (from goal challenge step)
400
+
401
+ 1. **Assumption challenged:** `exploration-workflow.json` is the top priority → **Refuted.** File does not exist.
402
+ 2. **Assumption challenged:** Adding modern schema fields improves agent outcomes → **Unverified.** Fields alone don't change behavior; assessment gates do.
403
+ 3. **Assumption challenged:** "Modernization" (preserve structure, upgrade syntax) is the right unit of work → **Contested.** Some workflows may need redesign, not migration.
404
+
405
+ ---
406
+
407
+ ## Resolution Notes
408
+
409
+ **Phase 0 (2026-04-21):** Path confirmed as `design_first`. Context fully populated for downstream steps. Key new finding from this session: the prior design doc (2026-04-20) was not committed to git and is an untracked sidecar -- this is correct per the artifact strategy. No engine changes since the prior session that affect workflow schema (the `findingCategory` addition in #644 is schema/engine but only adds a new field to review-verdict findings; it does not change the assessment gate contract). Smoke test baseline confirmed: 37/37 still passing. Open GitHub issues: only #174 ("Adopt assessment-gate follow-up in MR review") is directly related.
410
+
411
+ ---
412
+
413
+ ## Decision Log
414
+
415
+ | Decision | Rationale | Date |
416
+ |---|---|---|
417
+ | path = `design_first` | Goal was solution-statement; primary risk is wrong candidates/wrong unit of work | 2026-04-20 |
418
+ | No subagent delegation in Phase 0 | All data available in-repo via Bash/Read tools; synthesis task is single-thread | 2026-04-20 |
419
+ | Prior landscape corrected | assessmentRef (singular) vs assessmentRefs (plural) error fixed; modern baselines re-verified | 2026-04-20 |
420
+ | `wr.workflow-for-workflows.v2.json` added to HIGH-priority list | It is the quality gate workflow but lacks assessmentRefs at step level -- ironic and high-value fix | 2026-04-20 |
421
+ | Stale planning docs identified as prerequisite gate | Must correct docs before implementation begins -- they reference deleted targets | 2026-04-20 |
422
+ | Delegation: mechanism available, not used for design work | spawn_agent returned childSessionId on probe (outcome: "stuck" because wr.discovery is multi-step, not because spawn is broken). Not used for Phase 0 design/synthesis -- main agent owns synthesis by rule. Will use for independent parallel audits if latency matters in later phases. | 2026-04-20/21 |
423
+ | Web browsing: available via curl | curl to example.com returned HTML -- network reachable; no web browsing needed for this task (all data is in-repo) | 2026-04-20/21 |
424
+ | Artifact strategy: doc is readable summary only | Execution truth lives in step notes + context variables; design doc is for human reference only | 2026-04-20 |
425
+
426
+ ---
427
+
428
+ ## Final Summary
429
+
430
+ *(to be filled in at end of shaping session)*
@@ -22,7 +22,7 @@
22
22
  ## Impact Surface
23
23
 
24
24
  - `spec/workflow.schema.json` — new optional field
25
- - `workflow-for-workflows.v2.json` — new stamp step in Phase 7
25
+ - `wr.workflow-for-workflows.v2.json` — new stamp step in Phase 7
26
26
  - `src/mcp/output-schemas.ts` — new field in `V2WorkflowListItemSchema` and `V2WorkflowInspectOutputSchema`
27
27
  - `src/mcp/handlers/v2-workflow.ts` — staleness computation at list/inspect time
28
28
  - Console workflow list — new staleness indicator (follow `migration`/`staleRoots` visual pattern)
@@ -48,21 +48,21 @@
48
48
 
49
49
  ### Candidate B: Spec-version stamp in workflow JSON ✓ RECOMMENDED
50
50
 
51
- **Summary:** Add optional `validatedAgainstSpecVersion: number` to the workflow JSON schema. `workflow-for-workflows` stamps this in Phase 7. Engine reads the field and compares against `spec/authoring-spec.json` version. Three-tier signal: `none` (stamp matches current), `likely` (stamp < current), `possible` (no stamp).
51
+ **Summary:** Add optional `validatedAgainstSpecVersion: number` to the workflow JSON schema. `wr.workflow-for-workflows` stamps this in Phase 7. Engine reads the field and compares against `spec/authoring-spec.json` version. Three-tier signal: `none` (stamp matches current), `likely` (stamp < current), `possible` (no stamp).
52
52
 
53
- - **Tensions resolved:** deterministic, precise three-tier signal, actionable reason string, running workflow-for-workflows naturally clears the flag
53
+ - **Tensions resolved:** deterministic, precise three-tier signal, actionable reason string, running wr.workflow-for-workflows naturally clears the flag
54
54
  - **Tensions accepted:** bootstrapping — existing workflows start as `possible` until reviewed
55
- - **Boundary:** schema + workflow-for-workflows + output-schemas + handler
56
- - **Failure mode:** teams run workflow-for-workflows locally but forget to commit the JSON
55
+ - **Boundary:** schema + wr.workflow-for-workflows + output-schemas + handler
56
+ - **Failure mode:** teams run wr.workflow-for-workflows locally but forget to commit the JSON
57
57
  - **Repo pattern:** adapts `workflowHash` pattern (content-derived identity) to spec-version identity
58
- - **Gains:** deterministic, self-documenting, architectural fix, clears naturally with workflow-for-workflows
58
+ - **Gains:** deterministic, self-documenting, architectural fix, clears naturally with wr.workflow-for-workflows
59
59
  - **Losses:** migration cost (organic, not forced), adds schema field most workflows won't have immediately
60
60
  - **Scope:** best-fit long-term
61
61
  - **Philosophy:** honors Determinism, Make illegal states unrepresentable, Explicit domain types. Minor YAGNI tension.
62
62
 
63
63
  **Implementation steps:**
64
64
  1. `spec/workflow.schema.json`: add `validatedAgainstSpecVersion?: number` (optional, no existing workflow breaks)
65
- 2. `workflow-for-workflows.v2.json`: Phase 7 stamps `validatedAgainstSpecVersion` to current spec version before handoff
65
+ 2. `wr.workflow-for-workflows.v2.json`: Phase 7 stamps `validatedAgainstSpecVersion` to current spec version before handoff
66
66
  3. `src/mcp/output-schemas.ts`: add `staleness?: { level: 'none' | 'possible' | 'likely', reason: string, specVersionAtLastReview?: number }` to `V2WorkflowListItemSchema` and `V2WorkflowInspectOutputSchema`
67
67
  4. `src/mcp/handlers/v2-workflow.ts`: read `validatedAgainstSpecVersion` from compiled workflow; compare against `spec/authoring-spec.json` version; compute staleness
68
68
  5. Console: show staleness indicator in workflow list
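The three-tier comparison in step 4 can be sketched as follows. This is a minimal illustration built from the field names in step 3, not the shipped handler; `computeStaleness` is a hypothetical name, and the handling of a stamp newer than the current spec is an assumption, since the design does not cover that case.

```typescript
type StalenessLevel = "none" | "possible" | "likely";

interface Staleness {
  level: StalenessLevel;
  reason: string;
  specVersionAtLastReview?: number;
}

function computeStaleness(
  stamp: number | undefined,
  currentSpecVersion: number
): Staleness {
  // No stamp: the workflow has never been reviewed against any spec version.
  if (stamp === undefined) {
    return {
      level: "possible",
      reason: "No validatedAgainstSpecVersion stamp found.",
    };
  }
  // Stamp older than the current spec: rules may have changed since review.
  if (stamp < currentSpecVersion) {
    return {
      level: "likely",
      reason: `Validated against spec v${stamp}; current spec is v${currentSpecVersion}.`,
      specVersionAtLastReview: stamp,
    };
  }
  // Stamp matches the current spec.
  if (stamp === currentSpecVersion) {
    return {
      level: "none",
      reason: `Validated against current spec v${currentSpecVersion}.`,
      specVersionAtLastReview: stamp,
    };
  }
  // Assumption: a stamp ahead of the current spec is treated as unverified.
  return {
    level: "possible",
    reason: `Stamp v${stamp} is newer than current spec v${currentSpecVersion}.`,
    specVersionAtLastReview: stamp,
  };
}
```

Keeping the function pure (the stamp and spec version are passed in) means it yields the same result at list time and inspect time, which matches the determinism goal stated for Candidate B.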
@@ -76,7 +76,7 @@
76
76
 
77
77
  - **Tensions resolved:** zero migration + precision where stamps exist
78
78
  - **Tensions accepted:** git-date fallback inherits A's determinism problem; two code paths
79
- - **Failure mode:** `possible` from the fallback becomes permanent wallpaper for workflows never run through workflow-for-workflows
79
+ - **Failure mode:** `possible` from the fallback becomes permanent wallpaper for workflows never run through wr.workflow-for-workflows
80
80
  - **Scope:** slightly too broad for a first version, correct long-term shape
81
81
  - **Philosophy:** deterministic where stamps exist; conflicts with Determinism in the fallback path
82
82
 
@@ -84,7 +84,7 @@
84
84
 
85
85
  **Recommendation: Candidate B**
86
86
 
87
- The stamp is the architecturally correct fix. It's deterministic, self-documenting, and cleared by the existing workflow-for-workflows tool. The bootstrap problem is real but manageable: unstamped workflows show `possible` (not `likely`), and teams clear it organically by running workflow-for-workflows. No mass migration needed.
87
+ The stamp is the architecturally correct fix. It's deterministic, self-documenting, and cleared by the existing wr.workflow-for-workflows tool. The bootstrap problem is real but manageable: unstamped workflows show `possible` (not `likely`), and teams clear it organically by running wr.workflow-for-workflows. No mass migration needed.
88
88
 
89
89
  **Why A loses:** The CI-noise failure mode is hard to avoid and produces permanent noise. No actionable reason string.
90
90
 
@@ -94,11 +94,11 @@ The stamp is the architecturally correct fix. It's deterministic, self-documenti
94
94
 
95
95
  **Strongest counter-argument:** Spec version is too coarse — v3 may have added rules that don't apply to a given workflow's archetype, making `likely stale` misleading. Mitigation: add a `changedRules` summary to the reason string when the spec version changes.
96
96
 
97
- **Pivot condition:** If teams rarely run workflow-for-workflows and `possible` becomes permanent noise for 80%+ of workflows, add a CI step that reads the spec version and stamps workflows automatically (no human needed, just a `git commit --amend` or separate commit in CI).
97
+ **Pivot condition:** If teams rarely run wr.workflow-for-workflows and `possible` becomes permanent noise for 80%+ of workflows, add a CI step that reads the spec version and stamps workflows automatically (no human needed, just a `git commit --amend` or separate commit in CI).
98
98
 
99
99
  ## Open Questions for Main Agent
100
100
 
101
- 1. Should `workflow-for-workflows` stamp before or after the quality gate loop? (After seems right — only stamp if the workflow passes.)
101
+ 1. Should `wr.workflow-for-workflows` stamp before or after the quality gate loop? (After seems right — only stamp if the workflow passes.)
102
102
  2. Should `validatedAgainstSpecVersion` be a required field for new workflows going forward, or permanently optional?
103
103
  3. Does the console need a new visual treatment, or can it reuse the `migration` badge pattern that already exists?
104
104
  4. Should `inspect_workflow` in `metadata` mode also return the staleness field, or only in `preview` mode?
@@ -4,7 +4,7 @@
4
4
 
5
5
  | Tradeoff | Verdict | Hidden Assumption | Fails If |
6
6
  |---|---|---|---|
7
- | Bootstrap: existing workflows show `possible` | Acceptable | Teams will eventually run workflow-for-workflows on important workflows | `possible` shown as equally urgent as `likely` |
7
+ | Bootstrap: existing workflows show `possible` | Acceptable | Teams will eventually run wr.workflow-for-workflows on important workflows | `possible` shown as equally urgent as `likely` |
8
8
  | External workflows never get stamped | Acceptable | Stamp is optional, no workflow breaks | Same as above |
9
9
  | Spec granularity: one update flags all workflows | Acceptable with mitigation | Spec has a changelog per version increment | Spec version bumps silently with no explanation |
10
10
 
@@ -15,7 +15,7 @@
15
15
  | Failure Mode | Risk | Coverage | Missing Mitigation |
16
16
  |---|---|---|---|
17
17
  | Spec version not bumped when rules change | **High** | Not in code — process fix needed | Add explicit trigger to `authoring-spec.json` `changeProtocol` |
18
- | Stamp committed locally but not pushed | Medium | Phase 7 handoff note needed | Note in workflow-for-workflows Phase 7: "stamp must be committed" |
18
+ | Stamp committed locally but not pushed | Medium | Phase 7 handoff note needed | Note in wr.workflow-for-workflows Phase 7: "stamp must be committed" |
19
19
  | `possible` becomes wallpaper | Medium | Three-tier design helps | Ensure `possible` and `likely` are visually distinct in console |
20
20
 
21
21
  **Highest-risk failure mode: spec version not bumped.** This would make the entire system unreliable. Process fix required.
@@ -42,14 +42,14 @@
42
42
  **Yellow — No per-version changelog in authoring spec**
43
43
  The `reason` string in the staleness output must reference what changed between spec versions for the signal to be actionable. Currently there's no changelog. Must be added when version increments.
44
44
 
45
- **Yellow — workflow-for-workflows Phase 7 doesn't mention the stamp**
45
+ **Yellow — wr.workflow-for-workflows Phase 7 doesn't mention the stamp**
46
46
  The Phase 7 handoff step should explicitly tell the agent: "the `validatedAgainstSpecVersion` stamp was written to the workflow file — commit it for the staleness signal to take effect." Without this, teams may miss it.
47
47
 
48
48
  ## Recommended Revisions
49
49
 
50
50
  1. Add to `authoring-spec.json` `changeProtocol`: "Increment `version` when any required-level rule is added, removed, or materially changed. Add a `changelog` entry for the new version."
51
51
  2. Add `changelog` array to `authoring-spec.json` schema structure — each entry: `{ version, date, summary, affectedRules }`.
52
- 3. Add stamp reminder to workflow-for-workflows Phase 7 handoff step.
52
+ 3. Add stamp reminder to wr.workflow-for-workflows Phase 7 handoff step.
53
53
  4. Ensure console renders `likely` more prominently than `possible` (not just two shades of the same badge).
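Revisions 1 and 2 could feed the staleness `reason` string along these lines. This is a sketch under stated assumptions: the entry field names come from revision 2, but their types, the function name `describeChangesSince`, and the output wording are all hypothetical.

```typescript
// Field names follow revision 2; the types are assumed, not specified.
interface SpecChangelogEntry {
  version: number;
  date: string;
  summary: string;
  affectedRules: string[];
}

// Summarize what changed after the stamped version, up to the current one.
function describeChangesSince(
  stamp: number,
  currentSpecVersion: number,
  changelog: SpecChangelogEntry[]
): string {
  const relevant = changelog
    .filter((e) => e.version > stamp && e.version <= currentSpecVersion)
    .sort((a, b) => a.version - b.version); // filter() copies, so input is untouched
  if (relevant.length === 0) {
    return `No changelog entries between spec v${stamp} and v${currentSpecVersion}.`;
  }
  return relevant
    .map((e) => `v${e.version}: ${e.summary} (rules: ${e.affectedRules.join(", ")})`)
    .join("; ");
}
```

A reason built this way stays actionable only if every version increment carries a changelog entry, which is the process fix that revision 1 asks for.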
54
54
 
55
55
  ## Residual Concerns