@exaudeus/workrail 3.28.0 → 3.30.0

Files changed (160)
  1. package/dist/console/assets/{index-C146q2kN.js → index-Bl5-Ghuu.js} +1 -1
  2. package/dist/console/index.html +1 -1
  3. package/dist/manifest.json +3 -3
  4. package/docs/README.md +57 -0
  5. package/docs/adrs/001-hybrid-storage-backend.md +38 -0
  6. package/docs/adrs/002-four-layer-context-classification.md +38 -0
  7. package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
  8. package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
  9. package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
  10. package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
  11. package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
  12. package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
  13. package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
  14. package/docs/adrs/010-release-pipeline.md +89 -0
  15. package/docs/architecture/README.md +7 -0
  16. package/docs/architecture/refactor-audit.md +364 -0
  17. package/docs/authoring-v2.md +527 -0
  18. package/docs/authoring.md +873 -0
  19. package/docs/changelog-recent.md +201 -0
  20. package/docs/configuration.md +505 -0
  21. package/docs/ctc-mcp-proposal.md +518 -0
  22. package/docs/design/README.md +22 -0
  23. package/docs/design/agent-cascade-protocol.md +96 -0
  24. package/docs/design/autonomous-console-design-candidates.md +253 -0
  25. package/docs/design/autonomous-console-design-review.md +111 -0
  26. package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
  27. package/docs/design/claude-code-source-deep-dive.md +713 -0
  28. package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
  29. package/docs/design/console-execution-trace-candidates-final.md +160 -0
  30. package/docs/design/console-execution-trace-candidates.md +211 -0
  31. package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
  32. package/docs/design/console-execution-trace-design-review.md +74 -0
  33. package/docs/design/console-execution-trace-discovery.md +394 -0
  34. package/docs/design/console-execution-trace-final-review.md +77 -0
  35. package/docs/design/console-execution-trace-review.md +92 -0
  36. package/docs/design/console-performance-discovery.md +415 -0
  37. package/docs/design/console-ui-backlog.md +280 -0
  38. package/docs/design/daemon-architecture-discovery.md +853 -0
  39. package/docs/design/daemon-design-candidates.md +318 -0
  40. package/docs/design/daemon-design-review-findings.md +119 -0
  41. package/docs/design/daemon-engine-design-candidates.md +210 -0
  42. package/docs/design/daemon-engine-design-review.md +131 -0
  43. package/docs/design/daemon-execution-engine-discovery.md +280 -0
  44. package/docs/design/daemon-gap-analysis.md +554 -0
  45. package/docs/design/daemon-owns-console-plan.md +168 -0
  46. package/docs/design/daemon-owns-console-review.md +91 -0
  47. package/docs/design/daemon-owns-console.md +195 -0
  48. package/docs/design/data-model-erd.md +11 -0
  49. package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
  50. package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
  51. package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
  52. package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
  53. package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
  54. package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
  55. package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
  56. package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
  57. package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
  58. package/docs/design/list-workflows-latency-fix-plan.md +128 -0
  59. package/docs/design/list-workflows-latency-fix-review.md +55 -0
  60. package/docs/design/list-workflows-latency-fix.md +109 -0
  61. package/docs/design/native-context-management-api.md +11 -0
  62. package/docs/design/performance-sweep-2026-04.md +96 -0
  63. package/docs/design/routines-guide.md +219 -0
  64. package/docs/design/sequence-diagrams.md +11 -0
  65. package/docs/design/subagent-design-principles.md +220 -0
  66. package/docs/design/temporal-patterns-design-candidates.md +312 -0
  67. package/docs/design/temporal-patterns-design-review-findings.md +163 -0
  68. package/docs/design/test-isolation-from-config-file.md +335 -0
  69. package/docs/design/v2-core-design-locks.md +2746 -0
  70. package/docs/design/v2-lock-registry.json +734 -0
  71. package/docs/design/workflow-authoring-v2.md +1044 -0
  72. package/docs/design/workflow-docs-spec.md +218 -0
  73. package/docs/design/workflow-extension-points.md +687 -0
  74. package/docs/design/workrail-auto-trigger-system.md +359 -0
  75. package/docs/design/workrail-config-file-discovery.md +513 -0
  76. package/docs/docker.md +110 -0
  77. package/docs/generated/v2-lock-closure-plan.md +26 -0
  78. package/docs/generated/v2-lock-coverage.json +797 -0
  79. package/docs/generated/v2-lock-coverage.md +177 -0
  80. package/docs/ideas/backlog.md +3927 -0
  81. package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
  82. package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
  83. package/docs/ideas/implementation_plan.md +249 -0
  84. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
  85. package/docs/implementation/02-architecture.md +316 -0
  86. package/docs/implementation/04-testing-strategy.md +124 -0
  87. package/docs/implementation/09-simple-workflow-guide.md +835 -0
  88. package/docs/implementation/13-advanced-validation-guide.md +874 -0
  89. package/docs/implementation/README.md +21 -0
  90. package/docs/integrations/claude-code.md +300 -0
  91. package/docs/integrations/firebender.md +315 -0
  92. package/docs/migration/v0.1.0.md +147 -0
  93. package/docs/naming-conventions.md +45 -0
  94. package/docs/planning/README.md +104 -0
  95. package/docs/planning/github-ticketing-playbook.md +195 -0
  96. package/docs/plans/README.md +24 -0
  97. package/docs/plans/agent-managed-ticketing-design.md +605 -0
  98. package/docs/plans/agentic-orchestration-roadmap.md +112 -0
  99. package/docs/plans/assessment-gates-engine-handoff.md +536 -0
  100. package/docs/plans/content-coherence-and-references.md +151 -0
  101. package/docs/plans/library-extraction-plan.md +340 -0
  102. package/docs/plans/mr-review-workflow-redesign.md +1451 -0
  103. package/docs/plans/native-context-management-epic.md +11 -0
  104. package/docs/plans/perf-fixes-design-candidates.md +225 -0
  105. package/docs/plans/perf-fixes-design-review-findings.md +61 -0
  106. package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
  107. package/docs/plans/perf-fixes-new-issues-review.md +110 -0
  108. package/docs/plans/prompt-fragments.md +53 -0
  109. package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
  110. package/docs/plans/ui-ux-workflow-discovery.md +100 -0
  111. package/docs/plans/ui-ux-workflow-review.md +48 -0
  112. package/docs/plans/v2-followup-enhancements.md +587 -0
  113. package/docs/plans/workflow-categories-candidates.md +105 -0
  114. package/docs/plans/workflow-categories-discovery.md +110 -0
  115. package/docs/plans/workflow-categories-review.md +51 -0
  116. package/docs/plans/workflow-discovery-model-candidates.md +94 -0
  117. package/docs/plans/workflow-discovery-model-discovery.md +74 -0
  118. package/docs/plans/workflow-discovery-model-review.md +48 -0
  119. package/docs/plans/workflow-source-setup-phase-1.md +245 -0
  120. package/docs/plans/workflow-source-setup-phase-2.md +361 -0
  121. package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
  122. package/docs/plans/workflow-staleness-detection-review.md +58 -0
  123. package/docs/plans/workflow-staleness-detection.md +80 -0
  124. package/docs/plans/workflow-v2-design.md +69 -0
  125. package/docs/plans/workflow-v2-roadmap.md +74 -0
  126. package/docs/plans/workflow-validation-design.md +98 -0
  127. package/docs/plans/workflow-validation-roadmap.md +108 -0
  128. package/docs/plans/workrail-platform-vision.md +420 -0
  129. package/docs/reference/agent-context-cleaner-snippet.md +94 -0
  130. package/docs/reference/agent-context-guidance.md +140 -0
  131. package/docs/reference/context-optimization.md +284 -0
  132. package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
  133. package/docs/reference/example-workflow-repository-template/README.md +268 -0
  134. package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
  135. package/docs/reference/external-workflow-repositories.md +916 -0
  136. package/docs/reference/feature-flags-architecture.md +472 -0
  137. package/docs/reference/feature-flags.md +349 -0
  138. package/docs/reference/god-tier-workflow-validation.md +272 -0
  139. package/docs/reference/loop-optimization.md +209 -0
  140. package/docs/reference/loop-validation.md +176 -0
  141. package/docs/reference/loops.md +465 -0
  142. package/docs/reference/mcp-platform-constraints.md +59 -0
  143. package/docs/reference/recovery.md +88 -0
  144. package/docs/reference/releases.md +177 -0
  145. package/docs/reference/troubleshooting.md +105 -0
  146. package/docs/reference/workflow-execution-contract.md +998 -0
  147. package/docs/roadmap/README.md +22 -0
  148. package/docs/roadmap/legacy-planning-status.md +103 -0
  149. package/docs/roadmap/now-next-later.md +70 -0
  150. package/docs/roadmap/open-work-inventory.md +389 -0
  151. package/docs/tickets/README.md +39 -0
  152. package/docs/tickets/next-up.md +76 -0
  153. package/docs/workflow-management.md +317 -0
  154. package/docs/workflow-templates.md +423 -0
  155. package/docs/workflow-validation.md +184 -0
  156. package/docs/workflows.md +254 -0
  157. package/package.json +4 -1
  158. package/spec/authoring-spec.json +61 -16
  159. package/workflows/workflow-for-workflows.json +3 -3
  160. package/workflows/workflow-for-workflows.v2.json +3 -3
@@ -0,0 +1,998 @@
1
+ # Workflow Execution Contract (Token-Based)
2
+
3
+ This document describes the proposed “token-based” workflow execution tools intended to be **agent-first**, **rewind/fork safe**, and **idempotent**.
4
+
5
+ Decision record: `docs/adrs/005-agent-first-workflow-execution-tokens.md`
6
+
7
+ ## Normative vs illustrative (how to read this doc)
8
+ Sections explicitly labeled **(normative)** define binding protocol semantics. Examples and appendices are illustrative and must not be treated as authoritative when they conflict with normative sections or the code-canonical schemas referenced by v2 locks.
9
+
10
+ ## Recorded decisions (from design discussions)
11
+
12
+ - **Text rendering template versioning (internal-only)**: we may version the deterministic `text` template for testing and export stability, but we do **not** expose the template version to the agent as part of the MCP contract.
13
+ - **Text-first scope**: “text-first + JSON backbone” is **required for execution outputs** (e.g., `start_workflow`, `continue_workflow`, and `checkpoint_workflow` when present). It is optional for discovery/inspection tools (`list_workflows`, `inspect_workflow`).
14
+ - **AC/REJECT usage**: the detailed acceptance criteria and rejection triggers are treated as **non-normative guardrails** (test targets / design constraints). The contract remains defined by the normative sections in this document.
15
+
16
+ ## MCP platform constraints
17
+
18
+ This contract is shaped by constraints of the stdio MCP environment (no server push, no transcript access, lossy agents, etc.). The full list is recorded in:
19
+
20
+ - `docs/reference/mcp-platform-constraints.md`
21
+
22
+ ## Shared locks (authoritative references)
23
+
24
+ To prevent drift between MCP/CLI/Studio and keep error handling deterministic:
25
+
26
+ - **Unified error envelope** (including the closed-set `retry` union): `docs/design/v2-core-design-locks.md` (Section 12)
27
+ - **Corruption/salvage gating** (including `SessionHealth` and execution-vs-read-only tool gating): `docs/design/v2-core-design-locks.md` (Operational envelope: Corruption handling)
28
+
29
+ ## Goals
30
+
31
+ - Provide a minimal, primitives-only MCP contract that agents can use reliably.
32
+ - Support rewinds/forks/parallel runs naturally (chat UIs are not monotonic).
33
+ - Keep the workflow engine internal; avoid leaking execution internals to clients.
34
+ - Treat errors as data (structured error payloads; no throwing across boundaries).
35
+ - Preserve high-signal progress even when work happens outside a workflow step loop (rewinds can delete chat context without warning).
36
+
37
+ ## Error handling (normative)
38
+
39
+ - Tool handlers MUST return errors as data using the unified error envelope shape (see v2 locks).
40
+ - Retryability MUST be conveyed via the closed-set `retry` union (do not encode retry semantics in free-form prose).
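As an illustration of "errors as data", the following TypeScript sketch wraps a handler so that exceptions never cross the tool boundary and retryability is carried by a closed union. The field names (`code`, `message`, `retry`), the `Retry` variants, and the `INTERNAL_ERROR` code are assumptions made for this example; the authoritative envelope is the one defined in the v2 locks.

```typescript
// Hypothetical shapes for illustration only; the authoritative envelope is
// defined in docs/design/v2-core-design-locks.md (Section 12).
type Retry =
  | { kind: "not_retryable" }
  | { kind: "retryable_immediate" }
  | { kind: "retryable_after_ms"; afterMs: number };

interface ErrorEnvelope {
  code: string;
  message: string;
  retry: Retry;
}

type ToolResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: ErrorEnvelope };

// Wraps a handler so that thrown exceptions are converted into data.
function asData<T>(run: () => T): ToolResult<T> {
  try {
    return { ok: true, value: run() };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    return {
      ok: false,
      error: { code: "INTERNAL_ERROR", message, retry: { kind: "not_retryable" } },
    };
  }
}

// Exhaustive switch over the closed set: every variant is handled explicitly.
function describeRetry(retry: Retry): string {
  switch (retry.kind) {
    case "not_retryable":
      return "do not retry";
    case "retryable_immediate":
      return "retry now";
    case "retryable_after_ms":
      return `retry after ${retry.afterMs} ms`;
  }
}
```

Because `Retry` is a discriminated union, a client can branch on `retry.kind` instead of parsing prose, which is the property the normative rule above asks for.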
41
+
42
+ ## Non-Goals
43
+
44
+ - Automatically infer structured output semantics from workflow prompts.
45
+ - Require the agent to manage dashboard sessions explicitly.
46
+
47
+ ## Tool Set
48
+
49
+ These tool names are chosen to make the workflow lifecycle explicit, reduce agent confusion, and keep durable state inside WorkRail:
50
+
51
+ - **Inspect**: read-only discovery and preview (never mutates execution)
52
+ - **Start**: begins a new run and returns the first pending step
53
+ - **Advance**: progresses an existing run from opaque tokens (or rehydrates/resumes a pending step)
54
+
55
+ ### `list_workflows`
56
+
57
+ Lists available workflows.
58
+
59
+ ### `inspect_workflow`
60
+
61
+ Read-only retrieval of workflow metadata and/or a preview to help select a workflow. Includes workflow-declared `references` (if any) for discoverability before starting execution.
62
+
63
+ ### `start_workflow`
64
+
65
+ Starts a new workflow run and returns the first pending step plus opaque tokens.
66
+
67
+ ### `continue_workflow`
68
+
69
+ Continues an existing workflow run.
70
+
71
+ - If `ackToken` is provided: acknowledge completion of the pending step for the given snapshot (idempotent).
72
+ - If `ackToken` is omitted: rehydrate/resume the pending step for the given snapshot (**no advancement and no durable mutation**).
73
+
74
+ **Rehydrate-only is side-effect-free (normative):**
75
+ - Calling `continue_workflow` without `ackToken` MUST NOT create nodes, edges, outputs, gaps, observations, or any other durable events.
76
+ - It exists solely to recover a lost pending prompt/recap after rewinds, restarts, or long chats.
77
+
78
+ ### `checkpoint_workflow` (optional / experimental)
79
+
80
+ Records durable “work progress” without advancing workflow state. This exists because meaningful work often happens outside a workflow step loop, and rewinds can delete chat context without warning.
81
+
82
+ This tool can be gated behind a feature flag while it is validated in real usage.
83
+
84
+ To keep WorkRail opt-in and avoid “checkpointing every chat”, `checkpoint_workflow` should require an existing workflow run handle (`stateToken`) and a WorkRail-minted `checkpointToken` unless checkpoint-only sessions are explicitly enabled.
85
+
86
+ Idempotency (required for rewind-safe correctness):
87
+ - `checkpoint_workflow` MUST be idempotent under retries/replays.
88
+ - WorkRail achieves this by minting a `checkpointToken` (opaque, scoped, replay-safe) alongside `stateToken`/`ackToken` in `start_workflow`/`continue_workflow` responses.
89
+ - Callers MUST round-trip `checkpointToken` unchanged when invoking `checkpoint_workflow`.
90
+
91
+ If checkpoint-only sessions (no workflow has started) are desired later, introduce a `start_session` tool behind a separate feature flag and extend `checkpoint_workflow` to accept `sessionToken`. Until then, checkpointing outside workflows is intentionally unsupported.
92
+
93
+ ### `start_session` (optional / feature-flagged)
94
+
95
+ Creates a session handle for checkpoint-only sessions (no active run). This tool exists to reduce friction in “brand new chat” scenarios where a user wants durable notes without starting a workflow run.
96
+
97
+ This tool is intentionally narrow and should not reintroduce session CRUD surfaces.
98
+
99
+ ### `resume_session` (optional / feature-flagged)
100
+
101
+ Read-only lookup for resuming work in a brand new chat. Supports queries like “resume my session about xyz” and returns **tip-only** resume targets (latest branch tip by deterministic policy), plus small snippets for disambiguation.
102
+
103
+ ## End-to-End Flows
104
+
105
+ ### Basic flow (single workflow)
106
+
107
+ 1. Call `list_workflows` and `inspect_workflow` to select a workflow (read-only).
108
+ 2. Call `start_workflow` to begin execution.
109
+ 3. Repeat:
110
+ - Follow `pending.prompt`
111
+ - Call `continue_workflow` with the returned `stateToken` and `ackToken` to advance (or call without `ackToken` to rehydrate a pending step)
112
+ 4. Stop when `isComplete == true` (and `pending == null`).
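The loop in steps 2 to 4 can be sketched from the agent's side as follows. Only the field names `stateToken`, `ackToken`, `pending.prompt`, and `isComplete` come from this contract; the `WorkflowClient` interface, the method names, and the in-memory `fakeClient` are hypothetical stand-ins so the example can run without a server.

```typescript
// Hypothetical client-side types for illustration.
interface StepResponse {
  stateToken: string;
  ackToken: string | null;
  pending: { prompt: string } | null;
  isComplete: boolean;
}

interface WorkflowClient {
  startWorkflow(workflowId: string): StepResponse;
  continueWorkflow(stateToken: string, ackToken?: string): StepResponse;
}

// Drives a run to completion, calling `doWork` for every pending prompt.
// Returns the number of steps that were acknowledged.
function runToCompletion(
  client: WorkflowClient,
  workflowId: string,
  doWork: (prompt: string) => void,
): number {
  let res = client.startWorkflow(workflowId);
  let steps = 0;
  while (!res.isComplete) {
    const { pending, ackToken, stateToken } = res;
    if (pending === null || ackToken === null) break;
    doWork(pending.prompt);
    res = client.continueWorkflow(stateToken, ackToken);
    steps += 1;
  }
  return steps;
}

// In-memory stand-in for a server, used only to exercise the loop.
function fakeClient(prompts: string[]): WorkflowClient {
  const at = (i: number): StepResponse => {
    const prompt: string | undefined = prompts[i];
    return prompt === undefined
      ? { stateToken: `st-${i}`, ackToken: null, pending: null, isComplete: true }
      : { stateToken: `st-${i}`, ackToken: `ack-${i}`, pending: { prompt }, isComplete: false };
  };
  return {
    startWorkflow: () => at(0),
    continueWorkflow: (stateToken, ackToken) => {
      const i = Number(stateToken.slice(3));
      // Without an ackToken the same pending step is returned (rehydrate only).
      return ackToken === undefined ? at(i) : at(i + 1);
    },
  };
}
```

Note that the fake encodes the step index in a readable token purely for brevity; real tokens are opaque and must be round-tripped unchanged.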
113
+
114
+ #### Full-auto execution (modes)
115
+
116
+ WorkRail supports “full-auto” execution as a first-class behavior. In full-auto modes, the agent must play the role of both the agent and the user: it does not silently skip user-directed prompts. Instead, it resolves them via best-effort context gathering and explicit assumptions.
117
+
118
+ Two full-auto variants are intentionally supported:
119
+
120
+ - **`full_auto_never_stop`**: never returns `blocked`. When required user input is unavailable, the agent continues by gathering context elsewhere, making explicit assumptions, or skipping steps, while recording durable warnings and gaps.
121
+ - **`full_auto_stop_on_user_deps`**: blocks only for formalized user-only dependencies (see below).
122
+
123
+ ### Off-workflow work (checkpoint)
124
+
125
+ When the agent is doing substantial work outside a workflow step loop (implementation, iteration, tuning output, etc.), it should call `checkpoint_workflow` to persist a short recap. This reduces the cost of rewinds and long chats by moving durable memory into the session store.
126
+
127
+ ### Rewind/fork behavior (chat UIs)
128
+
129
+ If the user rewinds conversation history, the agent may repeat a prior call sequence and reuse an older `stateToken`.
130
+
131
+ This is expected and correct:
132
+
133
+ - An older `stateToken` represents an older snapshot.
134
+ - Advancing from that token creates a new branch in the run's lineage.
135
+ - The dashboard should render forks rather than treating this as “desync”.
136
+
137
+ ### Multiple workflows (sequential, parallel, nested)
138
+
139
+ Agents may run multiple workflows in a single chat:
140
+
141
+ - **Sequential**: finish workflow A, then start workflow B.
142
+ - **Parallel/interleaved**: keep multiple `{stateToken, ackToken}` pairs and advance any run in any order.
143
+ - **Nested**: run workflow B “inside” a step of workflow A by:
144
+ - starting and advancing B separately
145
+ - writing B’s results into A via `context` or step output
146
+
147
+ No special “nesting API” is required for correctness; it is an orchestration choice.
148
+
149
+ ## Core Concepts
150
+
151
+ ### `stateToken` (opaque snapshot)
152
+
153
+ - **Minted by WorkRail**.
154
+ - Encodes (internally): `workflowId`, `workflowHash`, `runId`, and execution snapshot data.
155
+ - Must be **opaque** to clients: clients round-trip it without modification.
156
+ - Must be **validated** by WorkRail (version + signature/HMAC) to prevent tampering.
157
+
158
+ ### `workflowHash` (pinned workflow identity)
159
+
160
+ Runs are pinned to a specific workflow definition at `start_workflow` time to avoid “live” behavior changes when workflow files evolve.
161
+
162
+ - WorkRail computes `workflowHash` from a normalized (or compiled) workflow definition.
163
+ - The hash is embedded into `stateToken` so future calls are deterministic.
164
+ - WorkRail persists a workflow snapshot keyed by `workflowHash` in the session store (required for export/import and for continuing runs when the workflow file changes or disappears).
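One way to make `workflowHash` independent of incidental formatting is to hash a canonical serialization. The sketch below sorts object keys before hashing. It is an assumption-laden illustration: this document does not specify the normalization rules or the hash function, and the FNV-1a-style hash here is a dependency-free stand-in for whatever cryptographic hash the implementation actually uses.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes a JSON value with object keys sorted, so that key order in the
// workflow file does not affect the hash input.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  const keys = Object.keys(value).sort();
  const body = keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k] as Json)}`);
  return `{${body.join(",")}}`;
}

// FNV-1a-style 32-bit hash over UTF-16 code units. Stand-in only: a real
// implementation would use a cryptographic hash instead.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Hypothetical helper: hash of the canonical form of a workflow definition.
function workflowHash(definition: Json): string {
  return fnv1a32(canonicalize(definition));
}
```

Two definitions that differ only in key order produce the same hash, while any change to a value produces a different canonical string and therefore pins a different snapshot.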
165
+
166
+ ### `ackToken` (opaque completion acknowledgement)
167
+
168
+ - **Minted by WorkRail** per pending step and per `stateToken`.
169
+ - Represents “the client completed the pending step instruction returned with this snapshot”.
170
+ - Must be **idempotent**:
171
+ - Replaying the same `(stateToken, ackToken)` returns the same response payload.
172
+ - Replaying does not advance the run twice.
173
+ - **Idempotency must not be implemented via recompute (normative):**
174
+ - When replaying the same `(stateToken, ackToken)`, WorkRail MUST return from durable recorded facts keyed by the attempt identity (see v2 locks `advance_recorded`) and MUST NOT re-run step selection, contract validation, or other execution logic that could drift.
175
+ - Must be **scoped**:
176
+ - An `ackToken` from run A must not be usable on run B.
177
+ - An `ackToken` from snapshot X must not be usable on snapshot Y.
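The idempotency and scoping rules above can be illustrated with a small in-memory ledger. This is a sketch under stated assumptions: tokens are plain strings rather than signed values, the `AckLedger` class and the `ACK_SCOPE_MISMATCH` code are hypothetical, and the `recorded` map only stands in for the durable `advance_recorded` facts referenced in the v2 locks.

```typescript
interface Advance {
  stateToken: string;
  ackToken: string;
  step: number;
}

type AckResult =
  | { kind: "ok"; response: Advance }
  | { kind: "error"; code: "ACK_SCOPE_MISMATCH" };

class AckLedger {
  private minted = new Map<string, string>(); // ackToken -> stateToken it was minted for
  private recorded = new Map<string, Advance>(); // attempt key -> recorded response
  private steps = new Map<string, number>(); // stateToken -> step index
  private counter = 0;
  executions = 0; // counts how often advancement logic actually ran

  // Mints a snapshot token together with an ack token scoped to it.
  mint(step: number): Advance {
    const id = this.counter++;
    const res: Advance = { stateToken: `st-${id}`, ackToken: `ack-${id}`, step };
    this.minted.set(res.ackToken, res.stateToken);
    this.steps.set(res.stateToken, step);
    return res;
  }

  acknowledge(stateToken: string, ackToken: string): AckResult {
    // Scoping: the ack must have been minted for exactly this snapshot.
    if (this.minted.get(ackToken) !== stateToken) {
      return { kind: "error", code: "ACK_SCOPE_MISMATCH" };
    }
    const key = `${stateToken}|${ackToken}`;
    const prior = this.recorded.get(key);
    // Replay: answer from the recorded fact, never by recomputing.
    if (prior !== undefined) return { kind: "ok", response: prior };
    this.executions += 1;
    const next = this.mint((this.steps.get(stateToken) ?? 0) + 1);
    this.recorded.set(key, next);
    return { kind: "ok", response: next };
  }
}
```

Replaying the same pair returns the stored response and leaves `executions` unchanged, which is the observable difference between replay-from-record and recompute.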
178
+
179
+ #### Branching and “attempt acks” (normative)
180
+
181
+ Rewinds and replays are expected. The system must support **branching** from the same snapshot (older `stateToken`) without requiring the agent to construct identifiers.
182
+
183
+ To enable branching, WorkRail must be able to mint a **fresh attempt acknowledgement** for the same snapshot when the agent wants to intentionally fork (or when a replay is detected).
184
+
185
+ Idempotency is keyed to the server-minted ack capability (replay of the same `ackToken` is a no-op returning the same response).
186
+
187
+ ### `checkpointToken` (opaque checkpoint acknowledgement)
188
+
189
+ - **Minted by WorkRail** for checkpointing against a specific `stateToken` snapshot/node.
190
+ - Represents “the client wants to append a checkpoint at this node”.
191
+ - Must be **idempotent**:
192
+ - Replaying the same `(stateToken, checkpointToken)` MUST NOT create duplicate checkpoint nodes/edges/outputs.
193
+ - Replaying returns the same response deterministically.
194
+ - Must be **scoped**:
195
+ - A `checkpointToken` from run A must not be usable on run B.
196
+ - A `checkpointToken` from snapshot X must not be usable on snapshot Y.
197
+
198
+ ### `context` (external inputs)
199
+
200
+ `context` carries external facts that can influence conditions and loop inputs, e.g.:
201
+
202
+ - identifiers: ticket id, repo path, branch
203
+ - workflow parameters: `quantity`, `deliverable`
204
+ - constraints: “don’t run detekt”, “no network”, etc.
205
+
206
+ Do not place workflow progress state in `context`.
207
+
208
+ ## Preferences & modes (normative)
209
+
210
+ WorkRail v2 supports user-selectable execution behavior (e.g., guided vs full-auto) without expanding the MCP boundary or leaking engine internals.
211
+
212
+ ### Preferences (closed set)
213
+
214
+ Preferences are a **WorkRail-defined closed set** of typed values (enums / discriminated unions). They are not arbitrary key/value bags.
215
+
216
+ - Workflows may **recommend** preferences (or presets), but they do not invent new preference keys.
217
+ - Preferences influence execution behavior, but correctness remains token-driven (`stateToken`, `ackToken`).
218
+
219
+ ### Modes (presets)
220
+
221
+ “Modes” are **display-friendly presets** (Studio/UX-facing) that map to one or more preference values. WorkRail owns the preset set and labels so the UX can stay simple without sacrificing determinism.
222
+
223
+ ### Scopes and precedence
224
+
225
+ Preferences exist at multiple scopes:
226
+
227
+ - **Global**: developer defaults.
228
+ - **Session**: defaults for a workstream; override global.
229
+
230
+ Global preferences are treated as defaults only: they are copied into a session baseline at the start of work, so future global changes do not retroactively affect past runs.
231
+
232
+ ### Node-attached effective preferences
233
+
234
+ WorkRail must evaluate each next-step decision against the **effective preference snapshot** and record it durably as part of the run graph (e.g., stored on the node or via append-only events).
235
+
236
+ That snapshot applies to descendant nodes until another preference change occurs. This makes preference-driven behavior rewind-safe and export/import safe: replaying an older `stateToken` replays with the preference state that was effective at that node, not “whatever is configured today”.
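A minimal sketch of scope precedence and node-attached snapshots, assuming a single hypothetical preference key (`autonomy`) and a hypothetical `RunNode` shape. The mode names are taken from this document; everything else is illustrative.

```typescript
// Hypothetical closed set; this document does not list the actual preference keys.
type Autonomy = "guided" | "full_auto_stop_on_user_deps" | "full_auto_never_stop";

interface Preferences {
  autonomy: Autonomy;
}

interface RunNode {
  id: string;
  parentId: string | null;
  prefChange?: Partial<Preferences>; // recorded only on nodes where a preference changed
}

// Copies global defaults into a session baseline, then applies session overrides.
function sessionBaseline(global: Preferences, session: Partial<Preferences>): Preferences {
  return { ...global, ...session };
}

// Resolves the snapshot effective at `nodeId` by replaying recorded changes
// from the root down to that node.
function effectiveAt(
  nodes: Map<string, RunNode>,
  nodeId: string,
  baseline: Preferences,
): Preferences {
  const chain: RunNode[] = [];
  let cursor: string | null = nodeId;
  while (cursor !== null) {
    const node: RunNode | undefined = nodes.get(cursor);
    if (node === undefined) throw new Error(`unknown node: ${cursor}`);
    chain.unshift(node);
    cursor = node.parentId;
  }
  return chain.reduce<Preferences>((acc, n) => ({ ...acc, ...n.prefChange }), baseline);
}
```

Because the result depends only on the baseline and the recorded changes along the path, resolving an older node yields the preferences that were effective there, regardless of later changes on other branches.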
237
+
238
+ ## Optional capabilities (normative)
239
+
240
+ Some workflows can optionally leverage enhanced agent capabilities (e.g., delegation/subagents or web browsing). WorkRail cannot introspect what tools an agentic IDE provides, so capability availability must be learned through explicit, durable observations rather than assumed or inferred.
241
+
242
+ ### Capabilities (closed set)
243
+
244
+ Capabilities are a WorkRail-defined closed set. For v2 we explicitly plan for:
245
+
246
+ - `delegation` (subagents / parallel delegation)
247
+ - `web_browsing` (external knowledge lookup via agent tooling)
248
+
249
+ ### Desired vs observed
250
+
251
+ Workflows may declare whether a capability is:
252
+
253
+ - **required**: the workflow is not meaningfully executable without it.
254
+ - **preferred**: use it when available; otherwise degrade.
255
+
256
+ WorkRail does not attempt to enumerate or model “baseline” agent tools (file read/write, grep, terminal, etc.). Capabilities are only for optional enhancements that materially change how a workflow is executed or whether it can run at all.
257
+
258
+ Capability requirements are part of the workflow’s compiled behavior (they change prompts, probing steps, and fallback paths) and must therefore be included in the compiled workflow that is hashed into `workflowHash`.
259
+
260
+ WorkRail tracks the observed status per run/branch:
261
+
262
+ - `unknown` (default)
263
+ - `available` (observed working)
264
+ - `unavailable` (observed failing / not supported)
265
+
266
+ Observed status must be recorded durably (node-attached or via append-only events) so resumption and export/import do not depend on ambient IDE configuration.
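The desired-vs-observed split can be sketched as a fold over append-only events plus a small decision table. The event shape, the `nextAction` helper, and its return labels are assumptions for illustration; the sketch also treats the event list as belonging to a single branch.

```typescript
type Capability = "delegation" | "web_browsing";
type ObservedStatus = "unknown" | "available" | "unavailable";
type Requirement = "required" | "preferred";

// Hypothetical append-only event recording an agent-reported observation.
interface CapabilityObserved {
  kind: "capability_observed";
  nodeId: string;
  capability: Capability;
  status: "available" | "unavailable";
}

// Folds the event list into the current status; the latest observation wins
// and the default is "unknown".
function observedStatus(
  events: readonly CapabilityObserved[],
  capability: Capability,
): ObservedStatus {
  let status: ObservedStatus = "unknown";
  for (const e of events) {
    if (e.capability === capability) status = e.status;
  }
  return status;
}

type Action = "proceed" | "probe" | "degrade" | "block_or_record_gap";

// Maps (declared requirement, observed status) to what should happen next.
function nextAction(requirement: Requirement, status: ObservedStatus): Action {
  if (status === "unknown") return "probe";
  if (status === "available") return "proceed";
  // Unavailable: preferred capabilities degrade; required ones block in
  // blocking modes or are recorded as a gap in never-stop mode.
  return requirement === "preferred" ? "degrade" : "block_or_record_gap";
}
```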
267
+
268
+ ### Probing and degradation
269
+
270
+ Because capability status is learned, workflows must specify how to discover it:
271
+
272
+ - If `web_browsing` is **required**, probe early so blocking modes can fail fast with an actionable recommendation (e.g., “install a web browsing MCP” or provide the needed source material manually).
273
+ - If `delegation` is **preferred**, probing can be lazy: attempt when needed and fall back to a sequential approach if unavailable.
274
+
275
+ When a preferred capability is unavailable, WorkRail should degrade gracefully and surface a Studio warning (and/or durable notes) that the enhanced path was not applied.
276
+
277
+ ### Recording capability observations (recommended)
278
+
279
+ Observed capability status should be recorded as durable data associated with the current node (or as an append-only event). Because WorkRail cannot introspect the agent environment, this observation must come from explicit agent-reported results (e.g., a probe step that attempts to use the capability).
280
+
281
+ Where structured artifacts are used, WorkRail should provide a small, closed-set artifact kind for capability observations so Studio can render consistent warnings and history.
282
+
283
+ ### Example patterns (recommended)
284
+
285
+ #### Web browsing required: injected early probe
286
+
287
+ If a workflow requires `web_browsing`, the compiled workflow should include an early, injected probe step (collapsed by default for agent UX) whose purpose is to determine observed capability status.
288
+
289
+ Behavior:
290
+
291
+ - The probe step instructs the agent to attempt a minimal web-browsing action (e.g., fetch any short page or search query).
292
+ - On acknowledgement, the agent reports a durable capability observation (e.g., `capability=web_browsing`, `status=available|unavailable`, optional remediation).
293
+ - If the agent attempts to advance without providing the required observation, WorkRail returns `blocked` with a structured “missing required output” reason and an example payload.
294
+
295
+ This enables:
296
+
297
+ - `full_auto_stop_on_user_deps` (or guided) to fail fast when web browsing is required but unavailable.
298
+ - `full_auto_never_stop` to continue while recording a critical gap when web browsing is required but unavailable.
299
+
300
+ #### Delegation preferred: lazy attempt + sequential fallback
301
+
302
+ If a workflow prefers `delegation`, it should not require an upfront probe. Instead:
303
+
304
+ - At steps that can benefit from parallelism, the prompt instructs the agent to attempt delegation/subagents when available.
305
+ - If delegation is unavailable, the agent executes the sequential alternative and records a durable capability observation indicating `delegation` is unavailable.
306
+ - Studio surfaces a warning that the delegated path was not applied, but the workflow continues normally.
307
+
308
+ ## Response content structure (normative)
309
+
310
+ Execution tool responses (`start_workflow`, `continue_workflow`) are delivered as multiple MCP content items, each with `type: "text"`. The items are ordered:
311
+
312
+ 1. **Primary content**: the authored prompt (or system message for completed/blocked states). Always present.
313
+ 2. **Workflow references** (when present): a dedicated content item listing external documents the workflow points at. Only emitted when the workflow declares `references` and the lifecycle warrants it.
314
+ 3. **Response supplements** (when present): system-level guidance items (e.g., boundary-owned delivery guidance, one-time supplements).
315
+
316
+ ### Reference delivery by lifecycle
317
+
318
+ | Lifecycle | Reference content |
319
+ |-----------|------------------|
320
+ | `start` | Full reference set: title, resolved path, purpose, authority level, resolution status |
321
+ | `rehydrate` | Compact reminder: title and path only |
322
+ | `advance` | Not emitted (agent already has references from start/rehydrate) |
323
+
324
+ References with unresolved paths (file not found at start time) are surfaced with an `[unresolved]` tag. Unresolved references produce a warning but do not block execution.
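The lifecycle table can be read as a small rendering function. The sketch below is illustrative only: the `ResolvedReference` fields, the heading text, and the exact line layout are assumptions, and only the per-lifecycle behavior and the `[unresolved]` tag come from this document.

```typescript
type Lifecycle = "start" | "rehydrate" | "advance";

// Hypothetical shape of a reference after path resolution.
interface ResolvedReference {
  title: string;
  path: string;
  purpose: string;
  authority: string;
  resolved: boolean;
}

// Returns the text of the references content item, or null when none is emitted.
function renderReferences(
  lifecycle: Lifecycle,
  refs: readonly ResolvedReference[],
): string | null {
  if (lifecycle === "advance" || refs.length === 0) return null;
  const lines = refs.map((r) => {
    // Compact reminder: title and path only.
    if (lifecycle === "rehydrate") return `- ${r.title} (${r.path})`;
    // Full set: also purpose, authority level, and resolution status.
    const tag = r.resolved ? "" : " [unresolved]";
    return `- ${r.title} (${r.path})${tag} | purpose: ${r.purpose} | authority: ${r.authority}`;
  });
  return ["Workflow references:", ...lines].join("\n");
}
```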
325
+
326
+ ### Content envelope (internal)
327
+
328
+ Internally, WorkRail assembles a `StepContentEnvelope` that carries typed content categories (authored prompt, resolved references, supplements). The formatter consumes this envelope to produce the MCP content items above. This is an implementation detail not exposed in the public tool output schema.
329
+
330
+ ## Durable outputs (`output` envelope)
331
+
332
+ WorkRail needs durable memory outside the chat transcript. To keep the system simple for agents, there should be a **single write path** for durable updates:
333
+
334
+ - Use `output` for durable summaries and structured artifacts that should appear in the session/dashboard and survive rewinds.
335
+ - Use `context` only for external inputs that influence execution (conditions, loops, parameters), not for durable notes.
336
+
337
+ ## Resumption vs rewind behavior (normative)
338
+
339
+ WorkRail cannot read the chat transcript. It must infer “resume” vs “fork” from the durable run graph.
340
+
341
+ - **Resumption (tip node)**:
342
+ - When the provided snapshot is the latest tip of its branch, WorkRail should return a durable recap (“rehydration”) up to the pending step to help agents recover from lost chat context.
343
+
344
+ - **Rewind/fork (non-tip node)**:
345
+ - When the provided snapshot already has children (advancing would create a new sibling branch), WorkRail should:
346
+ - return branch-focused information (existing children summaries)
347
+ - automatically fork (no user confirmation required)
348
+ - return branch context the agent likely lost (including a bounded “downstream recap” for the preferred/latest branch), while still avoiding an unbounded full-history dump (“confusing soup”)
349
+
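A minimal sketch of how "resume" versus "fork" could be inferred from the durable run graph alone. The `Edge` shape and `classifySnapshot` name are hypothetical, and treating only advancement edges as children is an assumption of this example, not a stated lock.

```typescript
// Hypothetical edge shape; edge kinds follow the names used later in this document.
type EdgeKind = "acked_step" | "checkpoint";

interface Edge {
  parentId: string;
  childId: string;
  edgeKind: EdgeKind;
}

type ResumeDecision =
  | { kind: "resume"; nodeId: string }
  | { kind: "fork"; nodeId: string; existingChildren: string[] };

function classifySnapshot(nodeId: string, edges: Edge[]): ResumeDecision {
  // Assumption: only advancement edges make a node a non-tip; checkpoint edges do not.
  const existingChildren = edges
    .filter((edge) => edge.parentId === nodeId && edge.edgeKind === "acked_step")
    .map((edge) => edge.childId);
  if (existingChildren.length === 0) {
    return { kind: "resume", nodeId }; // tip node: return a durable recap
  }
  return { kind: "fork", nodeId, existingChildren }; // non-tip: fork automatically
}
```

The decision depends only on stored edges, so the same snapshot always classifies the same way regardless of what the chat transcript contains.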
350
+ ### Recap budgets and truncation (normative)
351
+
352
+ WorkRail should return the **full recap when it is small**, and a **deterministically truncated recap** when it would exceed reasonable payload budgets.
353
+
354
+ - **Budgeting rule**:
355
+ - Prefer byte-based budgets (most deterministic across models/clients).
356
+ - Include as many most-recent recap entries as fit within the budget, preserving deterministic ordering.
357
+
358
+ - **Truncation marker**:
359
+ - When truncating, include an explicit marker in both `text` and structured fields indicating:
360
+ - that the recap was truncated
361
+ - how many entries were omitted (when known)
362
+ - the policy used (e.g., “kept most recent entries”)
363
+
364
+ This keeps the “rewind resilience” promise without turning every response into an unbounded history dump.
365
+
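The budgeting rule and truncation marker can be illustrated with a short sketch. The `RecapEntry` and `BudgetedRecap` shapes, the function names, and the marker wording are assumptions for illustration; for simplicity the budget here counts only the bytes of each entry's notes, not headings or the marker itself.

```typescript
// Hypothetical shapes for illustration; not WorkRail's actual recap types.
interface RecapEntry {
  stepId: string;
  notesMarkdown: string;
}

interface BudgetedRecap {
  entries: RecapEntry[]; // kept entries, oldest first
  truncated: boolean;
  omittedCount: number;
  policy: string;
}

// UTF-8 byte length computed from UTF-16 code units (no platform APIs needed).
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair encodes one 4-byte code point
        i++;
      } else {
        bytes += 3; // lone surrogate, counted like a replacement character
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Keeps as many most-recent entries as fit, preserving original order.
function budgetRecap(entries: RecapEntry[], maxBytes: number): BudgetedRecap {
  const kept: RecapEntry[] = [];
  let usedBytes = 0;
  for (const entry of entries.slice().reverse()) {
    const size = utf8ByteLength(entry.notesMarkdown);
    if (usedBytes + size > maxBytes) break;
    usedBytes += size;
    kept.unshift(entry);
  }
  const omittedCount = entries.length - kept.length;
  return { entries: kept, truncated: omittedCount > 0, omittedCount, policy: "kept most recent entries" };
}

// Renders the text form, with an explicit marker when entries were dropped.
function renderRecapText(recap: BudgetedRecap): string {
  const body = recap.entries.map((e) => "### " + e.stepId + "\n" + e.notesMarkdown).join("\n\n");
  if (!recap.truncated) return body;
  const marker = "[recap truncated: " + recap.omittedCount + " entries omitted; policy: " + recap.policy + "]";
  return body.length > 0 ? marker + "\n\n" + body : marker;
}
```

Because the loop stops at the first entry that does not fit, the kept entries are always a contiguous most-recent suffix, which keeps the result deterministic for a given input and budget.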
366
+ ### Function definitions in rehydrate/rewind recovery (normative clarification)
367
+ Some workflows use `functionDefinitions` + `functionReferences` to reduce repeated instructions (define once, reference many times). Because WorkRail cannot access chat history, `continue_workflow` (rehydrate-only) MUST return enough recovery context for the agent to understand any referenced functions.
368
+
369
+ Lock intent:
370
+ - Function definition recovery MUST be satisfied by deterministic rendering from the pinned compiled workflow snapshot (identified by `workflowHash`), not by transcript memory.
371
+ - Function definitions SHOULD be included as part of the bounded recovery text (e.g., expanded into `pending.prompt`) and MUST respect the same byte-budget and truncation rules as other recap/recovery content.
372
+
373
+ ## User-only dependencies (normative)
374
+
375
+ WorkRail should treat “user-only dependencies” as a **closed set of reasons** that can justify returning `kind: "blocked"` (e.g., a required design doc that only the user can supply).
376
+
377
+ The behavior depends on the effective full-auto preference:
378
+
379
+ - Under **`full_auto_stop_on_user_deps`**, WorkRail returns `blocked` with structured reasons and next-input guidance.
380
+ - Under **`full_auto_never_stop`**, WorkRail never blocks. User-only dependency reasons must be converted into structured warnings plus durable disclosure (“gaps”) while execution continues.
381
+
382
+ The closed set for user-only dependency reasons is locked in `docs/design/v2-core-design-locks.md` (see “User-only dependencies: closed reasons”).
383
+
384
+ ## Blocked vs gaps (mode-driven, drift prevention) (recommended)
385
+ To keep behavior deterministic across modes and prevent semantic drift, treat “blocked” (control flow) and “gaps” (durable disclosure) as two views over the same underlying closed-set reasons.
386
+
387
+ Recommended rules:
388
+ - In blocking modes (`guided`, `full_auto_stop_on_user_deps`), eligible reasons return `kind:"blocked"` with structured blockers.
389
+ - In `full_auto_never_stop`, the engine must not return `blocked`; instead it records critical gaps and continues, while still disclosing the same underlying reason.
390
+
391
+ Additional recommendation:
392
+ - Blockers should use a closed-set `code` enum and deterministic ordering, and include a typed pointer so Studio can render actionable unblock guidance without reading chat history.
393
+
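One way to read "two views over the same underlying reasons" is a single function that takes the reasons plus the effective mode and returns either blockers or gaps. This is a sketch under assumptions: the `Reason`, `Gap`, and `ModeOutcome` shapes and the `applyMode` name are hypothetical.

```typescript
type Autonomy = "guided" | "full_auto_stop_on_user_deps" | "full_auto_never_stop";

// Hypothetical reason shape; the real closed set lives in the design locks.
interface Reason {
  code: string;
  message: string;
}

interface Gap extends Reason {
  severity: "critical";
}

type ModeOutcome =
  | { kind: "blocked"; blockers: Reason[] }
  | { kind: "ok"; gaps: Gap[] };

function applyMode(autonomy: Autonomy, reasons: Reason[]): ModeOutcome {
  if (reasons.length === 0) {
    return { kind: "ok", gaps: [] };
  }
  if (autonomy === "full_auto_never_stop") {
    // Never block: disclose the same reasons as durable critical gaps and continue.
    const gaps = reasons.map((reason): Gap => ({
      code: reason.code,
      message: reason.message,
      severity: "critical",
    }));
    return { kind: "ok", gaps };
  }
  // Blocking modes surface the same reasons as structured blockers.
  return { kind: "blocked", blockers: reasons };
}
```

Keeping the mode switch in one place is what prevents drift: the reason data is identical in both branches and only the control-flow outcome differs.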
394
+ ## `blocked.blockers[]` schema (normative)
395
+ When WorkRail returns `kind:"blocked"`, the `blockers[]` payload MUST conform to a closed, deterministic shape so clients do not infer meaning from prose.
396
+
397
+ Locks:
398
+ - `blockers` is a non-empty list.
399
+ - `blockers` MUST be deterministically ordered by `(code, pointer.kind, pointer.* stable fields)` ascending.
400
+ - Each blocker MUST include: `code`, `pointer`, `message`. `suggestedFix` is optional but strongly recommended.
401
+ - Payloads are bounded:
402
+ - max blockers: 10
403
+ - max `message` bytes: 512 (UTF-8)
404
+ - max `suggestedFix` bytes: 1024 (UTF-8)
405
+
406
+ `blockers[].code` (closed set, initial):
407
+ - `USER_ONLY_DEPENDENCY`
408
+ - `MISSING_REQUIRED_OUTPUT`
409
+ - `INVALID_REQUIRED_OUTPUT`
410
+ - `REQUIRED_CAPABILITY_UNKNOWN`
411
+ - `REQUIRED_CAPABILITY_UNAVAILABLE`
412
+ - `INVARIANT_VIOLATION`
413
+ - `STORAGE_CORRUPTION_DETECTED`
414
+
415
+ `blockers[].pointer` (closed set, initial):
416
+ - `{ "kind": "context_key", "key": "..." }`
417
+ - `{ "kind": "context_budget" }`
418
+ - `{ "kind": "output_contract", "contractRef": "..." }`
419
+ - `{ "kind": "capability", "capability": "delegation" | "web_browsing" }`
420
+ - `{ "kind": "workflow_step", "stepId": "..." }`
421
+
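The closed sets and the ordering lock above translate naturally into discriminated unions and a comparator. The type names and `normalizeBlockers` are illustrative, and rejecting an over-long list by throwing is an assumption; the text above only states the bound. Byte limits on `message` and `suggestedFix` are left out of this sketch.

```typescript
type BlockerCode =
  | "USER_ONLY_DEPENDENCY"
  | "MISSING_REQUIRED_OUTPUT"
  | "INVALID_REQUIRED_OUTPUT"
  | "REQUIRED_CAPABILITY_UNKNOWN"
  | "REQUIRED_CAPABILITY_UNAVAILABLE"
  | "INVARIANT_VIOLATION"
  | "STORAGE_CORRUPTION_DETECTED";

type BlockerPointer =
  | { kind: "context_key"; key: string }
  | { kind: "context_budget" }
  | { kind: "output_contract"; contractRef: string }
  | { kind: "capability"; capability: "delegation" | "web_browsing" }
  | { kind: "workflow_step"; stepId: string };

interface Blocker {
  code: BlockerCode;
  pointer: BlockerPointer;
  message: string;
  suggestedFix?: string;
}

const MAX_BLOCKERS = 10;

// The stable field of each pointer kind, used as the final sort key.
function pointerStableField(pointer: BlockerPointer): string {
  switch (pointer.kind) {
    case "context_key": return pointer.key;
    case "context_budget": return "";
    case "output_contract": return pointer.contractRef;
    case "capability": return pointer.capability;
    case "workflow_step": return pointer.stepId;
  }
}

// Code-unit comparison: deterministic and independent of locale settings.
function compareText(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Checks the list bounds and returns a sorted copy (the input is not mutated).
function normalizeBlockers(blockers: Blocker[]): Blocker[] {
  if (blockers.length === 0) throw new Error("blockers must be non-empty");
  if (blockers.length > MAX_BLOCKERS) throw new Error("at most " + MAX_BLOCKERS + " blockers are allowed");
  return blockers.slice().sort((a, b) =>
    compareText(a.code, b.code) ||
    compareText(a.pointer.kind, b.pointer.kind) ||
    compareText(pointerStableField(a.pointer), pointerStableField(b.pointer))
  );
}
```

Plain `<` comparison is used instead of `localeCompare` so that ordering does not vary with the host locale.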
422
+ ## Durable accounting for outcomes (normative, drift-prevention)
423
+
424
+ Because chat transcripts are not reliable storage, WorkRail should not require Studio/exports to infer what happened from transient tool responses.
425
+
426
+ Locks:
427
+ - WorkRail MUST persist a durable, node-scoped record of each attempted `continue_workflow` **ack** intent (advancement attempt) and its outcome (blocked | advanced) as append-only truth (see v2 lock: `advance_recorded`).
428
+ - Replay MUST be derived from this durable record (fact-returning); replays MUST NOT recompute outcomes.
429
+ - Dedupe/idempotency MUST be first-class: retries must not create duplicate “attempt” records, but legitimate evolution (e.g., a later unblock with a new attempt) must remain appendable.
430
+
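A minimal in-memory sketch of the dedupe and replay locks above. The `attemptId` dedupe key, the `AdvanceAttempt` shape, and the `AttemptLog` class are hypothetical; a real implementation would append to durable storage rather than an array.

```typescript
// Hypothetical record shape; `attemptId` stands in for whatever dedupe key the engine uses.
interface AdvanceAttempt {
  attemptId: string;
  nodeId: string;
  outcome: "blocked" | "advanced";
}

class AttemptLog {
  private readonly records: AdvanceAttempt[] = [];

  // Returns the durable record: the stored fact on a retry, a new append otherwise.
  record(attempt: AdvanceAttempt): AdvanceAttempt {
    for (const existing of this.records) {
      if (existing.attemptId === attempt.attemptId) {
        return existing; // replay returns the recorded outcome; nothing is recomputed
      }
    }
    this.records.push(attempt);
    return attempt;
  }

  // All attempts recorded against one node, in append order.
  forNode(nodeId: string): AdvanceAttempt[] {
    return this.records.filter((record) => record.nodeId === nodeId);
  }
}
```

A retry with the same `attemptId` gets the original outcome back, while a later attempt with a new `attemptId` (for example after an unblock) appends a second record for the same node.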
431
+ ## Mode safety, warnings, and recommendations (normative)
432
+
433
+ WorkRail must never hard-block a user-selected mode. Instead:
434
+
435
+ - Workflows may declare a **recommended maximum automation** (a suggested preset).
436
+ - If the user selects a more aggressive mode, WorkRail returns structured warnings and recommends the highest automation combination it considers safe for that workflow.
437
+
438
+ ## Brand new chat resumption (normative)
439
+
440
+ Because WorkRail cannot access chat history, a brand new chat must either:
441
+
442
+ - supply an existing handle (e.g., a `stateToken` or a short `resumeRef`), or
443
+ - use `resume_session` (when enabled) to find the correct session/run tip.
444
+
445
+ `resume_session` should use a layered search strategy:
446
+
447
+ 1. session keys/titles/tags and obvious identifiers (high precision)
448
+ 2. durable notes (`output.notesMarkdown`) and small artifact previews on run tips
449
+ 3. deep search across durable outputs as a last resort (bounded)
450
+
451
+ Results should be **tip-only** and deterministically ranked.
452
+
453
+ ### Minimal `output` shape (recommended)
454
+
455
+ - `output.notesMarkdown` (strongly encouraged): detailed recap of this step’s work (see quality guidance below).
456
+ - `output.artifacts[]` (optional): small structured payloads, used only when you have concrete structured results.
457
+
458
+ ### Per-step notes semantics (normative)
459
+
460
+ `output.notesMarkdown` represents a **per-step fresh summary**, not a cumulative log:
461
+
462
+ - Each `continue_workflow` call should provide a summary of work accomplished in **THIS specific step only**.
463
+ - Agents MUST NOT accumulate or append previous step notes into `notesMarkdown`.
464
+ - WorkRail aggregates notes across steps via the recap projection with deterministic budgeting when presenting recovery context in rehydrate-only responses.
465
+
466
+ **Rationale**:
467
+ - Enables deterministic truncation (per-step notes have predictable size)
468
+ - Enforces byte budget compliance (cumulative notes would violate the 4096-byte limit by construction)
469
+ - Preserves rewind safety (each step's notes are independent; no need to read chat history)
470
+ - Allows projections to aggregate, filter, and budget notes deterministically
471
+
472
+ **Notes quality guidance**:
473
+
474
+ These notes are displayed to the user in a markdown viewer and serve as the durable record of the agent's work. They should be written for a human reader. Include:
475
+
476
+ 1. **What you did** and the key decisions or trade-offs made
477
+ 2. **What you produced** — files changed, functions added, test results, specific numbers
478
+ 3. **Anything notable** — risks, open questions, things deliberately NOT done and why
479
+
480
+ Use markdown formatting: headings, bullet lists, `code references`, **bold** for emphasis. Be specific — file paths, function names, counts, not vague summaries. 10–30 lines is a good target; too short is worse than too long.
481
+
482
+ Artifact kinds should come from a closed set (examples):
483
+
484
+ - `mr_review.changed_files`
485
+ - `mr_review.findings`
486
+ - `working_agreement_patch` (rare; only derived from explicit user preferences)
487
+
488
+ The exact allowed artifact kinds and schemas can be workflow-specific via explicit output contracts.
489
+
490
+ ## Workflow pinning and evolution
491
+
492
+ ### Workflow identity and namespaces (normative)
493
+
494
+ WorkRail v2 adopts a **namespaced workflow ID format** for clarity, organization, and protection of core workflows.
495
+
496
+ **ID format:**
497
+ - `namespace.name` with **exactly one dot**
498
+ - Both `namespace` and `name` segments use: `[a-z][a-z0-9_-]*` (lowercase, alphanumeric, hyphens, underscores)
499
+ - Examples: `wr.bug_investigation`, `project.auth_review`, `team.onboarding`
500
+
501
+ **Reserved namespace:**
502
+ - The `wr.*` namespace is **reserved exclusively for bundled/core workflows**.
503
+ - Non-core sources (user, project, git, remote, plugin) must not define workflows whose IDs start with `wr.`.
504
+ - WorkRail must reject such definitions at load/validate time with an actionable error.
505
+
506
+ **Legacy IDs (no dot):**
507
+ - Workflows with legacy IDs (e.g., `bug-investigation`) remain **runnable** for backward compatibility.
508
+ - Creating or saving new workflows with legacy IDs is **rejected**.
509
+ - Usage/inspection of legacy workflows must emit **structured warnings** with suggested namespaced renames based on the workflow's source:
510
+ - User directory → `user.<id>`
511
+ - Project directory → `project.<id>`
512
+ - Git/remote/plugin → `repo.<id>` or `team.<id>` (deterministic suggestion)
513
+
514
+ **Discovery behavior:**
515
+ - `list_workflows` returns both workflows and routines, including:
516
+ - `kind: "workflow" | "routine"`
517
+ - `idStatus: "legacy" | "namespaced"`
518
+ - Deterministic sort order: **namespace → kind (workflow first) → name/id**
519
+
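The ID rules above can be sketched as a load-time check. The `WorkflowSource` values mirror the sources named in the text, but the `checkWorkflowId` function, its result shape, and the error strings are assumptions for illustration.

```typescript
type IdStatus = "legacy" | "namespaced";
type WorkflowSource = "bundled" | "user" | "project" | "git" | "remote" | "plugin";

type IdCheck =
  | { ok: true; idStatus: IdStatus }
  | { ok: false; error: string };

// Both segments use the same pattern; exactly one dot separates them.
const SEGMENT = "[a-z][a-z0-9_-]*";
const NAMESPACED_ID = new RegExp("^(" + SEGMENT + ")\\.(" + SEGMENT + ")$");

function checkWorkflowId(id: string, source: WorkflowSource): IdCheck {
  if (id.indexOf(".") === -1) {
    // Legacy IDs stay runnable; callers are expected to attach a structured warning.
    return { ok: true, idStatus: "legacy" };
  }
  const match = NAMESPACED_ID.exec(id);
  if (match === null) {
    return { ok: false, error: "workflow ID must be namespace.name with exactly one dot: " + id };
  }
  if (match[1] === "wr" && source !== "bundled") {
    return { ok: false, error: "the wr.* namespace is reserved for bundled workflows: " + id };
  }
  return { ok: true, idStatus: "namespaced" };
}
```

This sketch covers loading and validation only; rejecting the creation of new workflows with legacy IDs would be a separate check on the save path.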
520
+ ### Pinning policy (normative)
521
+
522
+ - `start_workflow` MUST compute a `workflowHash` and pin the run to it.
523
+ - The `workflowHash` is computed from the **fully expanded compiled workflow**, including:
524
+ - the workflow definition (with namespaced ID)
525
+ - all builtin template expansions
526
+ - all feature applications
527
+ - all selected contract packs
528
+ - Subsequent `continue_workflow` calls MUST execute against the pinned workflow snapshot identified by the `workflowHash` embedded in `stateToken`.
529
+
530
+ ### Workflow changes on disk (recommended behavior)
531
+
532
+ If the on-disk workflow definition for `workflowId` changes after a run has started:
533
+
534
+ - WorkRail should continue using the pinned snapshot for that run.
535
+ - WorkRail should surface a structured warning (as data) that the on-disk workflow differs from the pinned snapshot.
536
+
537
+ Explicit “migration” of a run to a new workflow version is a separate, opt-in feature.
538
+
539
+ ## What the Agent Must and Must Not Do
540
+
541
+ - **MUST**:
542
+ - Treat `stateToken`, `ackToken`, and `checkpointToken` as opaque values.
543
+ - Round-trip all tokens exactly as returned.
544
+ - Only advance the workflow by calling `continue_workflow` with the current tokens (`stateToken` + `ackToken`).
545
+ - Only record a checkpoint by calling `checkpoint_workflow` with the current `stateToken` and `checkpointToken`.
546
+ - In full-auto modes, resolve user-directed prompts by best-effort context gathering and explicit assumptions rather than silently skipping questions.
547
+ - Disclose assumptions, skips, and missing inputs via durable `output` so progress survives rewinds.
548
+ - **MUST NOT**:
549
+ - Construct or mutate workflow execution state (completed steps, loop stacks, etc.).
550
+ - Guess tool payload shapes beyond what the tool schema and examples provide.
551
+
552
+ ## Request/Response Shapes
553
+
554
+ ### `start_workflow` request
555
+
556
+ ```json
557
+ {
558
+ "workflowId": "mr-review-workflow",
559
+ "context": {
560
+ "ticketId": "AUTH-1234",
561
+ "complexity": "Standard"
562
+ }
563
+ }
564
+ ```
565
+
566
+ ### `start_workflow` response (example)
567
+
568
+ ```json
569
+ {
570
+ "stateToken": "st.v1....",
571
+ "pending": {
572
+ "stepId": "phase-0-triage",
573
+ "title": "Phase 0: Triage & Review Focus",
574
+ "prompt": "…",
575
+ "requireConfirmation": true
576
+ },
577
+ "ackToken": "ack.v1....",
578
+ "checkpointToken": "chk.v1....",
579
+ "isComplete": false,
580
+ "session": {
581
+ "sessionId": "sess_01JH8X2...",
582
+ "runId": "run_01JFD..."
583
+ },
584
+ "preferences": {
585
+ "autonomy": "guided",
586
+ "riskPolicy": "conservative"
587
+ }
588
+ }
589
+ ```
590
+
591
+ Notes:
592
+ - `session` is **informational** and for dashboard UX only. Correctness is driven by tokens.
593
+ - `preferences` is **informational** for UX/debugging; it does not replace the durable run graph as source of truth.
594
+
595
+ ### `continue_workflow` request
596
+
597
+ ```json
598
+ {
599
+ "stateToken": "st.v1....",
600
+ "ackToken": "ack.v1....",
601
+ "context": {
602
+ "ticketId": "AUTH-1234",
603
+ "complexity": "Standard"
604
+ },
605
+ "output": {
606
+ "notesMarkdown": "Completed phase 0. MR is Standard complexity; focus on DI wiring and tool contract correctness."
607
+ }
608
+ }
609
+ ```
610
+
611
+ ### `continue_workflow` response (example)
612
+
613
+ ```json
614
+ {
615
+ "stateToken": "st.v1.next....",
616
+ "pending": {
617
+ "stepId": "phase-1-context",
618
+ "title": "Phase 1: Contextual Understanding & Confirmation",
619
+ "prompt": "…",
620
+ "requireConfirmation": true
621
+ },
622
+ "ackToken": "ack.v1.next....",
623
+ "checkpointToken": "chk.v1.next....",
624
+ "isComplete": false,
625
+ "session": {
626
+ "sessionId": "sess_01JH8X2...",
627
+ "runId": "run_01JFD..."
628
+ },
629
+ "preferences": {
630
+ "autonomy": "full_auto_stop_on_user_deps",
631
+ "riskPolicy": "balanced"
632
+ }
633
+ }
634
+ ```
635
+
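On the agent side, the request/response pairs above reduce to copying tokens verbatim from the latest response into the next request. The following sketch assumes the field names shown in the examples; the interface and function names are hypothetical.

```typescript
// Field names follow the JSON examples above; the interfaces themselves are illustrative.
interface ExecutionResponse {
  stateToken: string;
  ackToken: string;
  checkpointToken: string;
  isComplete: boolean;
}

interface ContinueRequest {
  stateToken: string;
  ackToken: string;
  context?: Record<string, unknown>;
  output: { notesMarkdown: string };
}

// Builds the next continue_workflow request without inspecting or altering tokens.
function buildContinueRequest(
  previous: ExecutionResponse,
  notesMarkdown: string,
  context?: Record<string, unknown>
): ContinueRequest {
  if (previous.isComplete) {
    throw new Error("workflow is already complete; there is nothing to continue");
  }
  const request: ContinueRequest = {
    stateToken: previous.stateToken, // opaque: round-tripped exactly as returned
    ackToken: previous.ackToken,
    output: { notesMarkdown },
  };
  if (context !== undefined) {
    request.context = context;
  }
  return request;
}
```

The function never parses the `st.v1...` or `ack.v1...` prefixes, which keeps the client compatible with future token encodings.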
636
+ ### `continue_workflow` response with stepContext (example)
637
+
638
+ When the completed step recorded structured execution facts (e.g. an accepted assessment), the response includes `stepContext`:
639
+
640
+ ```json
641
+ {
642
+ "kind": "ok",
643
+ "continueToken": "ct_...",
644
+ "checkpointToken": "chk_...",
645
+ "isComplete": false,
646
+ "pending": { "stepId": "phase-3-implement", "title": "Phase 3: Implement", "prompt": "..." },
647
+ "stepContext": {
648
+ "assessments": {
649
+ "assessmentId": "diagnosis_readiness_gate",
650
+ "dimensions": [
651
+ { "dimensionId": "confidence", "level": "high", "rationale": "Root cause confirmed by stack trace and reproducer." }
652
+ ],
+ "normalizationNotes": []
653
+ }
654
+ }
655
+ }
656
+ ```
657
+
658
+ `stepContext` is absent (not null, not empty object) when the completed step had no recorded facts. Consumers MUST treat its absence as equivalent to no step-level facts.
659
+
660
+ ### `stepContext` (normative)
661
+
662
+ `stepContext` is an optional backward-looking envelope on `continue_workflow` ok responses. It records structured facts about the step that just completed. It is distinct from all other top-level response fields, which are forward-looking (next pending step, tokens, intent).
663
+
664
+ **Invariants:**
665
+ - Present only when the completed step recorded at least one structured fact.
666
+ - Absent (not null) when no facts were recorded.
667
+ - `assessments` is present when the step declared an `assessmentRef` and the agent submitted a valid assessment that was accepted by the engine. Contains the assessmentId, each dimension's normalized level and optional rationale, and `normalizationNotes`.
668
+ - `assessments.normalizationNotes` is an array of human-readable strings explaining any level normalization WorkRail applied (e.g. "HIGH" accepted as "high"). Empty array means all levels matched exactly. Agents SHOULD check this to self-correct future submissions.
669
+ - The internal `normalization` enum per dimension (exact vs normalized) is an engine implementation detail and is NOT exposed in `stepContext`.
670
+ - `stepContext` is only present in the immediate tool response. It does NOT flow automatically into future pending steps. Agents that need assessment results in later steps MUST copy relevant values into `context` variables when calling `continue_workflow`.
671
+ - On assessment projection error (malformed event log), `stepContext` will be absent and a warning will be logged server-side. The advance is NOT rolled back; the durable event record is authoritative.
672
+
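Because `stepContext` does not flow into later steps automatically, an agent that needs assessment results later has to copy them into `context` itself. A minimal sketch of that copy step follows; the `assessmentId.dimensionId` key naming and the function name are assumptions of this example.

```typescript
// Shapes follow the invariants above; the interface names are illustrative.
interface AssessmentDimension {
  dimensionId: string;
  level: string;
  rationale?: string;
}

interface AssessmentFacts {
  assessmentId: string;
  dimensions: AssessmentDimension[];
  normalizationNotes: string[];
}

interface ContinueOkResponse {
  kind: "ok";
  stepContext?: { assessments?: AssessmentFacts };
}

interface CarriedFacts {
  context: Record<string, string>; // values to pass as `context` on the next call
  normalizationNotes: string[];    // hints for correcting future submissions
}

function carryAssessmentFacts(response: ContinueOkResponse): CarriedFacts {
  const carried: CarriedFacts = { context: {}, normalizationNotes: [] };
  // Absence of stepContext (or of assessments) simply means no step-level facts.
  const assessments = response.stepContext === undefined ? undefined : response.stepContext.assessments;
  if (assessments === undefined) {
    return carried;
  }
  for (const dimension of assessments.dimensions) {
    carried.context[assessments.assessmentId + "." + dimension.dimensionId] = dimension.level;
  }
  carried.normalizationNotes = assessments.normalizationNotes.slice();
  return carried;
}
```

Treating a missing `stepContext` as "no facts" rather than an error matches the invariant that the field is absent, not null, when nothing was recorded.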
673
+ ### `checkpoint_workflow` request (example)
674
+
675
+ ```json
676
+ {
677
+ "stateToken": "st.v1....",
678
+ "checkpointToken": "chk.v1....",
679
+ "output": {
680
+ "notesMarkdown": "Implemented token-based description updates. Next: update tool naming to `start_workflow`/`continue_workflow` and add checkpoint tool behind flag."
681
+ }
682
+ }
683
+ ```
684
+
685
+ Notes:
686
+ - For now, `checkpoint_workflow` requires both `stateToken` and `checkpointToken`, so every checkpoint attaches to a specific workflow node. Session-only checkpointing is a future feature behind `start_session`.
687
+
688
+ ## Dashboard / Sessions (UX Projection)
689
+
690
+ Sessions are a UX layer that should be updated **natively** as a side effect of `start_workflow`/`continue_workflow`:
691
+
692
+ - A single **session** represents a single workstream (ticket/PR/chat) and may contain **multiple workflow runs**.
693
+ - Each **run** corresponds to a single workflow execution and has its own branching token lineage.
694
+
695
+ ### Persistence model (recommended)
696
+
697
+ Use an **append-only event log as the source of truth**, stored per session.
698
+
699
+ Storage invariants (segmentation, crash-safe append, integrity/recovery, snapshot identity/layout, etc.) are consolidated and locked in:
700
+ - `docs/design/v2-core-design-locks.md`
701
+
702
+ - Events drive the dashboard; projections are derived (pure functions).
703
+ - Token lineage is derived from durable node and edge events. At minimum:
704
+ - advancing a step creates an edge (`edgeKind=acked_step`) from parent snapshot to child snapshot
705
+ - checkpointing creates a node snapshot **and** an edge (`edgeKind=checkpoint`) from parent snapshot to checkpoint snapshot (no advancement)
706
+ - Nodes represent durable snapshots. For Studio, it is useful to treat node kinds as a closed set, e.g.:
707
+ - `nodeKind=step` (created by `continue_workflow` advancement)
708
+ - `nodeKind=checkpoint` (created by `checkpoint_workflow`)
709
+ - Rewinds naturally create branches (multiple children for the same parent) instead of “desync”.
710
+ - Session pointers (like “latest”) are derived views, not authoritative state.
711
+
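As an illustration of "projections are derived (pure functions)", the sketch below folds a run's events into the tips and branch points a dashboard might display. The event shapes and the `projectRun` name are hypothetical; only the node and edge kinds come from the list above.

```typescript
// Hypothetical event shapes; node and edge kinds follow the closed sets named above.
type RunEvent =
  | { type: "node_created"; nodeId: string; nodeKind: "step" | "checkpoint" }
  | { type: "edge_created"; parentId: string; childId: string; edgeKind: "acked_step" | "checkpoint" };

interface RunSummary {
  tips: string[];         // step nodes with no advancement children
  branchPoints: string[]; // step nodes with more than one advancement child (rewinds)
}

// Pure projection: the same event list always yields the same summary.
function projectRun(events: RunEvent[]): RunSummary {
  const stepNodes: string[] = [];
  const childCounts: Record<string, number> = {};
  for (const event of events) {
    if (event.type === "node_created") {
      if (event.nodeKind === "step") stepNodes.push(event.nodeId);
    } else if (event.edgeKind === "acked_step") {
      childCounts[event.parentId] = (childCounts[event.parentId] || 0) + 1;
    }
  }
  const countOf = (nodeId: string): number => childCounts[nodeId] || 0;
  return {
    tips: stepNodes.filter((id) => countOf(id) === 0),
    branchPoints: stepNodes.filter((id) => countOf(id) > 1),
  };
}
```

Nothing here is stored: a "latest" pointer or branch count can be recomputed from the log at any time, which is why such pointers are derived views rather than authoritative state.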
712
+ ### Preferences, capabilities, and divergence (recommended)
713
+
714
+ Studio-visible signals should be recorded as durable data attached to the node where they occurred, for example:
715
+
716
+ - effective preferences (and preference-change markers)
717
+ - capability observations (requested vs observed)
718
+ - divergence markers (when the agent intentionally deviates from step instructions)
719
+
720
+ ### Environment observations (recommended)
721
+
722
+ Record high-signal local observations (e.g., git branch name and HEAD SHA) as append-only events. Use these observations to improve resume ranking and session identification in `resume_session`.
723
+
724
+ ### UI guidance (avoid “confusing soup”)
725
+
726
+ If a session contains multiple workflow runs, the UI MUST make boundaries explicit.
727
+
728
+ Recommended baseline UI:
729
+
730
+ - **Runs sidebar**: list runs with `workflowId` + human title + status (Running/Complete) + branch count.
731
+ - **Single active run view**: render one run at a time (its branch graph + steps + artifacts) to avoid mixing content.
732
+ - **Session Notes**: a session-level notes area for global context and “between workflows” summaries.
733
+
734
+ Advanced view (optional):
735
+
736
+ - **Session Timeline**: a chronological timeline view with lanes per run (color-coded), plus an optional session-level lane for global checkpoints. This makes multi-workflow sessions understandable without intermixing details in the default view.
737
+
738
+ ### Local-only dashboard and sharing
739
+
740
+ The dashboard is local-only. Sharing is achieved via explicit export/import:
741
+
742
+ - Export session bundle (versioned) for another developer to import into their local dashboard.
743
+ - Export rendered views (e.g., Markdown, optionally PDF) as projections of stored session artifacts.
744
+
745
+ Retention/expiration (TTL) should be configurable; a reasonable default is 30–90 days.
746
+
747
+ ## Export/import bundles (resumable) (normative)
748
+
749
+ WorkRail must support **resumable** export/import of stored sessions. After import, an agent should be able to use `resume_session` and `continue_workflow` to proceed deterministically.
750
+
751
+ ### Bundle format (recommended)
752
+
753
+ - A single, versioned bundle file (e.g., JSON). Zip/folder formats can be added later, but the bundle must remain self-describing and deterministic.
754
+ - The bundle MUST include a `bundleSchemaVersion` so imports can fail fast (or migrate explicitly).
755
+
756
+ ### Required bundle contents (normative)
757
+
758
+ To be resumable, a bundle MUST include:
759
+
760
+ - **Session metadata** used for lookup/ranking and timestamps (when present). v2 does not require mutable session-level fields; lookup/ranking may be derived from durable observations and outputs.
761
+ - **Observations** (e.g., git branch name + HEAD SHA) as append-only data for better resume ranking
762
+ - **Runs** (0..N) and their run DAG (nodes + edges), including stable identifiers
763
+ - **Portable node snapshots** sufficient to rehydrate execution deterministically on another machine
764
+ - **Durable outputs** (`output.notesMarkdown` and artifacts/previews) attached to nodes for recap/search
765
+ - **Pinned workflow snapshots by `workflowHash`**, where `workflowHash` is computed from the **fully expanded compiled workflow**
766
+
767
+ Tokens (`stateToken`, `ackToken`) are not portable and must not be relied upon across export/import. On import, WorkRail re-mints new tokens from stored node snapshots.
768
+
769
+ ### Integrity and conflicts (recommended behavior)
770
+
771
+ - Include a manifest of digests (hashes) to detect bundle corruption and surface an actionable error.
772
+ - If importing a bundle collides with an existing session identifier, default to importing as a **new** session (no implicit merges). Merge can be an explicit, opt-in feature later.
773
+
774
+ ## Replacing File-Based Docs with Dashboard Artifacts (Optional)
775
+
776
+ Some workflows currently instruct the agent to write markdown files. The token-based contract can support dashboard-native documents by allowing an optional, step-defined `output` payload in `continue_workflow`:
777
+
778
+ - Default: accept `output.notesMarkdown` and render it per step.
779
+ - For workflows that need structured dashboards: steps should explicitly define an output contract (schema + example) to avoid inference.
780
+
781
+ ### Defaults for legacy workflows (no output contract)
782
+
783
+ If a workflow has not been updated to include an explicit output contract, WorkRail can still provide a usable dashboard without guessing semantics:
784
+
785
+ - Render a per-step “Notes” artifact from `output.notesMarkdown`.
786
+ - Show token lineage, pending step metadata, and completion timestamps.
787
+
788
+ This is intentionally generic. Structured artifacts (tables, findings, MR comments, etc.) require explicit contracts.
789
+
790
+ ### How explicit output contracts can work
791
+
792
+ Two compatible approaches:
793
+
794
+ 1. **WorkRail-owned contract packs (preferred, v2 direction)**: steps reference `output.contractRef` pointing to a WorkRail-owned contract pack (`wr.contracts.*`). The pinned compiled workflow snapshot embeds the resolved schemas/examples to keep behavior deterministic.
795
+ 2. **Server-side registry (future, optional)**: the workflow references a named output contract, and WorkRail provides the schema and example (still WorkRail-owned; not project-local).
796
+
797
+ Either way, the workflow (not heuristics) is authoritative.
798
+
799
+ This approach keeps the agent interaction primitive and moves deterministic “doc updates” into server-side reducers.
800
+
801
+ ### Appendix A: Example dashboard artifacts for `mr-review-workflow` (illustrative)
802
+
803
+ This appendix illustrates the *shape* of what an agent might send and what the dashboard might render. It is intentionally small and primitives-only for the agent. The exact schema should be declared explicitly by the workflow (or by a referenced contract registry).
804
+
805
+ #### Example: `continue_workflow` with structured `output`
806
+
807
+ ```json
808
+ {
809
+ "stateToken": "st.v1....",
810
+ "ackToken": "ack.v1....",
811
+ "context": {
812
+ "ticketId": "AUTH-1234",
813
+ "complexity": "Standard"
814
+ },
815
+ "output": {
816
+ "kind": "mr_review.phase_0_triage",
817
+ "review": {
818
+ "mrTitle": "fix(di): explicitly wire MCP description provider",
819
+ "classification": "Standard",
820
+ "focusAreas": ["tool contract correctness", "DI wiring", "agent usability"]
821
+ },
822
+ "revisionLogEntry": "Triage completed; created review session and established focus areas."
823
+ }
824
+ }
825
+ ```
826
+
827
+ WorkRail would store this as dashboard artifacts (example conceptual mapping):
828
+
829
+ - `review.header`: `mrTitle`, `classification`, `focusAreas`
830
+ - `review.revisionLog[]`: append entry
831
+
832
+ #### Example: changed files table rows (Phase 1)
833
+
834
+ ```json
835
+ {
836
+ "kind": "mr_review.changed_files",
837
+ "changedFiles": [
838
+ {
839
+ "path": "src/mcp/tool-description-provider.ts",
840
+ "summary": "Remove debug log; ensure provider wiring is explicit",
841
+ "risk": "L"
842
+ },
843
+ {
844
+ "path": "src/di/container.ts",
845
+ "summary": "Wire description provider in composition root",
846
+ "risk": "M"
847
+ }
848
+ ]
849
+ }
850
+ ```
851
+
852
+ Dashboard render intent:
853
+
854
+ - a “Changed Files” table rendered from `changedFiles[]` with deterministic ordering and dedupe keyed by `path`.
855
+
856
+ #### Example: findings and copy-ready MR comments (Phase 2+)
857
+
858
+ ```json
859
+ {
860
+ "kind": "mr_review.findings",
861
+ "findings": [
862
+ {
863
+ "severity": "Major",
864
+ "location": { "file": "src/mcp/tool-descriptions.ts", "line": 79 },
865
+ "title": "Tool description drift: mentions workflow_next/completedSteps but contract uses start_workflow/continue_workflow tokens",
866
+ "rationale": "Agents will send the wrong shape and fail the tool boundary contract.",
867
+ "suggestion": "Update authoritative and standard descriptions to match the current schema."
868
+ }
869
+ ],
870
+ "mrComments": [
871
+ {
872
+ "location": { "file": "src/mcp/tool-descriptions.ts", "line": 79 },
873
+ "title": "Fix workflow tool description drift",
874
+ "body": "The description references older workflow_next/completedSteps terminology, but the current v2 contract uses start_workflow/continue_workflow with stateToken/ackToken. This mismatch will cause agent misuse; please update descriptions to match the current contract."
875
+ }
876
+ ]
877
+ }
878
+ ```
879
+
880
+ Notes:
881
+ - The workflow can keep using its rich prompts (and `functionReferences` in the workflow definition) while the dashboard replaces file I/O by treating these outputs as structured artifacts.
882
+ - If a workflow does not provide an output contract, WorkRail should fall back to the generic per-step notes dashboard (no inference).
883
+
884
+ ## Output contracts & enforcement (normative)
885
+
886
+ WorkRail v2 enables workflows to declare **required structured outputs** via contract packs, and enforces these requirements (or records gaps) based on the effective mode.
887
+
888
+ ### How contracts are declared
889
+
890
+ Steps (and templates) may declare output requirements via an `output` object:
891
+
892
+ - `output.contractRef` (optional): references a WorkRail-owned closed-set contract pack (e.g., `wr.contracts.capability_observation`).
893
+ - `output.hints` (optional): non-enforced guidance for the agent (e.g., "≤10 lines").
894
+
895
+ Template calls may **automatically imply a contractRef** without the author specifying it (e.g., `wr.templates.capability_probe` implies `wr.contracts.capability_observation`).
896
+
897
+ ### Enforcement on `continue_workflow`
898
+
899
+ When a step declares `output.contractRef`, WorkRail validates the contract output before advancing:
900
+
901
+ - **Blocking modes (guided / full_auto_stop_on_user_deps)**: if required output is missing or invalid, return `kind: "blocked"` with a structured blocker (`MISSING_REQUIRED_OUTPUT` or `INVALID_REQUIRED_OUTPUT`), an example payload, and the same pending step.
902
+ - **Never-stop mode**: if required output is missing or invalid, record a **critical gap** and continue.
903
+
904
+ This enables the self-correcting loop: step tells the agent what to fill out; the next `continue_workflow` verifies it.
905
+
906
+ ### Contract pack versioning
907
+
908
+ Contract packs are referenced by ID only. Versioning is implicit: the pinned compiled workflow snapshot carries the exact contract pack schemas resolved at compile time.
909
+
910
+ ## PromptBlocks & rendered prompts (normative)
911
+
912
+ Workflows may author steps with **structured `promptBlocks`** rather than a single `prompt` string. WorkRail compiles `promptBlocks` into a deterministic, text-first `pending.prompt`.
913
+
914
+ Canonical block set: `goal`, `constraints`, `procedure`, `outputRequired`, `verify`.
915
+
916
+ Blocks are **optional**; plain `prompt` strings are still allowed.
917
+
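A sketch of what deterministic compilation of `promptBlocks` could look like. The block names come from the canonical set above, but the content types (a string for `goal`, string lists for the rest), the headings, and the rendering layout are assumptions made for this example.

```typescript
// Assumed block content types; the real authoring schema may differ.
interface PromptBlocks {
  goal?: string;
  constraints?: string[];
  procedure?: string[];
  outputRequired?: string[];
  verify?: string[];
}

interface AuthoredStep {
  prompt?: string;
  promptBlocks?: PromptBlocks;
}

// Renders one list-valued block, or nothing when the block is absent or empty.
function section(title: string, items: string[] | undefined, numbered: boolean): string[] {
  if (items === undefined || items.length === 0) return [];
  const lines = items.map((item, index) => (numbered ? String(index + 1) + ". " : "- ") + item);
  return ["## " + title + "\n" + lines.join("\n")];
}

// Blocks are always emitted in canonical order, so equal input yields equal text.
function renderPendingPrompt(step: AuthoredStep): string {
  const blocks = step.promptBlocks;
  if (blocks === undefined) {
    return step.prompt === undefined ? "" : step.prompt; // plain prompts pass through
  }
  const goal: string[] = blocks.goal === undefined ? [] : ["## Goal\n" + blocks.goal];
  return goal
    .concat(section("Constraints", blocks.constraints, false))
    .concat(section("Procedure", blocks.procedure, true))
    .concat(section("Output required", blocks.outputRequired, false))
    .concat(section("Verify", blocks.verify, false))
    .join("\n\n");
}
```

Fixing the section order in code, rather than following the order in which an author listed the blocks, is what makes the rendered `pending.prompt` stable across equivalent workflow files.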
918
+ ## Boundary discipline (`nextIntent`) (recommended)
919
+
920
+ Agents do not (and must not) know the next workflow step until WorkRail returns it. In practice, agents may “fill the gap” with confident speculation (“after this I’ll implement…”) and may even skip calling `continue_workflow` when they believe they know what’s next. This undermines the “one step at a time” property that makes v2 rewind-safe.
921
+
922
+ Recommended response affordance:
923
+ - Add a **closed-set** `nextIntent` field to execution responses that states the **only safe next action**, without revealing future steps:
924
+ - `perform_pending_then_continue`
925
+ - `await_user_confirmation`
926
+ - `rehydrate_only`
927
+ - `complete`
928
+ - Pair `nextIntent` with a deterministic, byte-budgeted footer in `pending.prompt` that reinforces:
929
+ - the next step is unknown until fetched
930
+ - the next move is to call `continue_workflow`
931
+
932
+ This is behavioral shaping, not enforcement (WorkRail cannot inspect the transcript). It is still valuable because it reduces both narrative drift and tool-call drift across model variability.
933
+
934
+ ## AgentRole (normative clarification)
935
+
936
+ WorkRail **cannot control the agent's system prompt**. The `agentRole` field is workflow/step-scoped stance text injected into the rendered prompt. A workflow-level `agentRole` applies to all steps; a step-level `agentRole` overrides it for that step.
937
+
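The override rule can be sketched in a few lines. The `RoleScoped` shape and the choice to place the role text before the prompt body are assumptions for illustration; the source only states that the text is injected into the rendered prompt.

```typescript
// Hypothetical shape shared by workflows and steps for this sketch.
type RoleScoped = { agentRole?: string };

// Step-level role wins; otherwise fall back to the workflow-level role.
function resolveAgentRole(workflow: RoleScoped, step: RoleScoped): string | undefined {
  return step.agentRole ?? workflow.agentRole;
}

// Injects the resolved role into the rendered prompt (placement is assumed).
function renderWithRole(workflow: RoleScoped, step: RoleScoped, prompt: string): string {
  const role = resolveAgentRole(workflow, step);
  return role === undefined ? prompt : `${role}\n\n${prompt}`;
}
```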
938
+ ## Divergence markers (normative)
939
+
940
+ Agents may report `workflow_divergence` artifacts when intentionally deviating from step instructions. Structure: `reason` (closed set), `summary`, optional `relatedStepId`. Studio badges these nodes. Enforcement: optional unless a step explicitly requires it.
941
+
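A sketch of the artifact structure and a validating parser follows. The field names (`reason`, `summary`, `relatedStepId`) come from the text above; the `reason` values are placeholders, because the actual closed set is not listed in this section, and the `kind` discriminator is likewise an assumption.

```typescript
// PLACEHOLDER values: the real closed set of reasons is defined elsewhere.
const DIVERGENCE_REASONS = ["placeholder_reason_a", "placeholder_reason_b"] as const;
type DivergenceReason = (typeof DIVERGENCE_REASONS)[number];

type WorkflowDivergence = {
  kind: "workflow_divergence"; // assumed discriminator
  reason: DivergenceReason;
  summary: string;
  relatedStepId?: string;
};

// Returns a typed artifact, or null when the input does not match the structure.
function parseDivergence(input: unknown): WorkflowDivergence | null {
  if (typeof input !== "object" || input === null) return null;
  const { reason, summary, relatedStepId } = input as Record<string, unknown>;

  if (typeof reason !== "string") return null;
  if (!(DIVERGENCE_REASONS as readonly string[]).includes(reason)) return null;
  if (typeof summary !== "string" || summary.length === 0) return null;
  if (relatedStepId !== undefined && typeof relatedStepId !== "string") return null;

  return {
    kind: "workflow_divergence",
    reason: reason as DivergenceReason,
    summary,
    ...(typeof relatedStepId === "string" ? { relatedStepId } : {}),
  };
}
```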
942
+ ## FAQ
943
+
944
+ ### How is `stateToken` “opaque”?
945
+
946
+ Opaque means clients treat it as an uninterpreted string. WorkRail is free to encode internal state however it wants (and change that encoding over time) as long as it can validate and decode it server-side.
947
+
948
+ In practice, WorkRail should make tokens tamper-evident (e.g., signature/HMAC) and versioned (e.g., `st.v1...`, `st.v2...`) to support safe evolution.
949
+
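A minimal sketch of a tamper-evident, versioned token using Node's built-in `crypto` module. The `st.v1` prefix follows the example above; the three-part layout (`prefix.payload.mac`), the JSON payload, and the function names are assumptions for illustration, not WorkRail's actual encoding.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const VERSION = "st.v1"; // version prefix style from the example above

function sign(data: string, secret: string): string {
  return createHmac("sha256", secret).update(data).digest("base64url");
}

// Assumed layout: st.v1.<base64url(JSON state)>.<base64url(HMAC-SHA256)>
function mintStateToken(state: object, secret: string): string {
  const payload = Buffer.from(JSON.stringify(state), "utf8").toString("base64url");
  return `${VERSION}.${payload}.${sign(`${VERSION}.${payload}`, secret)}`;
}

// Server-side only: verifies version and signature before decoding.
function decodeStateToken(token: string, secret: string): unknown {
  const parts = token.split(".");
  if (parts.length !== 4 || `${parts[0]}.${parts[1]}` !== VERSION) {
    throw new Error("unsupported token version or layout");
  }
  const payload = parts[2] ?? "";
  const mac = parts[3] ?? "";

  const expected = Buffer.from(sign(`${VERSION}.${payload}`, secret));
  const actual = Buffer.from(mac);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    throw new Error("token signature mismatch");
  }
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}
```

Clients only ever pass the string back unchanged. Because the version prefix is covered by the signature, a later `st.v2` encoding can be introduced without old tokens being reinterpreted.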
950
+ ### Are tokens portable across export/import?
951
+
952
+ Tokens are handles, not durable truth. Exports/imports must be **resumable**, which implies:
953
+
954
+ - the durable store must persist portable run graph nodes and pinned workflow snapshots
955
+ - on import, WorkRail re-mints new tokens from stored node snapshots
956
+
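The two requirements above can be illustrated with a small sketch in which tokens are deliberately left out of the export and re-minted on import. The `RunNode`, `LiveNode`, and `ExportBundle` shapes and the injected `mint` function are hypothetical.

```typescript
// Hypothetical shapes: a stored node holds durable truth, a live node adds a handle.
type RunNode = { nodeId: string; snapshot: unknown };
type LiveNode = RunNode & { stateToken: string };
type ExportBundle = { nodes: RunNode[] };

// Export keeps only portable data; tokens are handles and are dropped.
function exportRun(live: readonly LiveNode[]): ExportBundle {
  return { nodes: live.map(({ nodeId, snapshot }) => ({ nodeId, snapshot })) };
}

// Import re-mints a fresh token for each stored node snapshot.
function importRun(bundle: ExportBundle, mint: (node: RunNode) => string): LiveNode[] {
  return bundle.nodes.map((node) => ({ ...node, stateToken: mint(node) }));
}
```

Passing `mint` in as a parameter keeps the sketch independent of any particular token encoding.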
957
+ See ADR 006 and ADR 007.
958
+
959
+ ### Is `ackToken` enough? What about loops and confirmations?
960
+
961
+ Yes. Loops, confirmations, and other control structures are internal workflow mechanics represented in the snapshot behind `stateToken`. The public contract is simply:
962
+
963
+ - WorkRail tells the agent what to do next (`pending`).
964
+ - The agent completes it.
965
+ - The agent acknowledges completion using the `ackToken` issued for that snapshot.
966
+
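The three-step contract above can be exercised with a toy in-memory engine. This is a simplified sketch: it omits `stateToken` entirely, uses a flat list of prompts in place of a real workflow, and the `ack-N` token format and `createToyEngine` name are invented for illustration.

```typescript
type StepResponse = {
  pending: { prompt: string } | null; // null once the workflow is complete
  ackToken: string | null;
};

// Toy engine: any loops or confirmations would be expanded internally;
// the caller only ever sees `pending` plus an `ackToken`.
function createToyEngine(prompts: readonly string[]) {
  let cursor = 0;
  let currentAck: string | null = null;

  function respond(): StepResponse {
    const prompt = prompts[cursor];
    if (prompt === undefined) {
      currentAck = null;
      return { pending: null, ackToken: null };
    }
    currentAck = `ack-${cursor}`;
    return { pending: { prompt }, ackToken: currentAck };
  }

  return {
    start: (): StepResponse => respond(),
    // Advances only when the ackToken matches the snapshot it was issued for.
    continueWorkflow(ackToken: string): StepResponse {
      if (currentAck === null || ackToken !== currentAck) {
        throw new Error("ackToken does not match the current snapshot");
      }
      cursor += 1;
      return respond();
    },
  };
}
```

A stale `ackToken` (for example, one replayed after the engine has already advanced) is rejected, which is why the agent never needs to know about the control structures behind the snapshot.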
967
+ ### Do we need `workflowId` on `continue_workflow`?
968
+
969
+ No. `workflowId` can be embedded into `stateToken`. Keep `workflowId` only on `start_workflow` (because there is no token yet). Optionally return `workflowId` in responses as informational metadata for the dashboard.
970
+
971
+ ### If sessions are “demoted”, do we still need session tools?
972
+
973
+ Sessions become a UX projection, so session creation and updates should happen as a side effect of `start_workflow`/`continue_workflow` without broad agent-facing session CRUD tools.
974
+
975
+ If checkpoint-only sessions are desired later, add a narrowly scoped `start_session` tool behind a feature flag. To resume in a brand-new chat without token copy/paste, use a read-only `resume_session` lookup tool behind a feature flag.
976
+
977
+ ### Why do we need `checkpoint_workflow` at all?
978
+
979
+ Because rewinds are external to the workflow engine. If meaningful work happens outside a workflow step loop and the user rewinds without warning, that progress is lost unless WorkRail has already recorded a durable recap in the session store.
980
+
981
+ ## Status notes (non-normative)
982
+
983
+ The previously listed “open items” have been **locked in the v2 core design locks** (so the contract does not become a second, drifting source of truth). For the authoritative definitions, see:
984
+
985
+ - **Preferred tip policy**: `docs/design/v2-core-design-locks.md` (Section 2)
986
+ - **Gaps + user-only dependencies + unified reason model**: `docs/design/v2-core-design-locks.md` (Section 3)
987
+ - **Preferences + modes (minimal closed set + preset guidance)**: `docs/design/v2-core-design-locks.md` (Section 4)
988
+ - **`resume_session` deterministic ranking + budgets + normalization**: `docs/design/v2-core-design-locks.md` (Section 2.3)
989
+ - **Authoring model (promptBlocks optional, contract packs, builtins/feature configs)**: `docs/design/workflow-authoring-v2.md`
990
+
991
+ Remaining work is implementation (Slice 4+) and keeping docs/code aligned.
992
+
993
+ ## Related
994
+
995
+ - MCP constraints: `docs/reference/mcp-platform-constraints.md`
996
+ - ADR 005 (opaque tokens): `docs/adrs/005-agent-first-workflow-execution-tokens.md`
997
+ - ADR 006 (append-only session/run log): `docs/adrs/006-append-only-session-run-event-log.md`
998
+ - ADR 007 (resume + checkpoint-only sessions): `docs/adrs/007-resume-and-checkpoint-only-sessions.md`