@exaudeus/workrail 3.27.0 → 3.29.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (160)
  1. package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
  2. package/dist/console/index.html +1 -1
  3. package/dist/manifest.json +3 -3
  4. package/docs/README.md +57 -0
  5. package/docs/adrs/001-hybrid-storage-backend.md +38 -0
  6. package/docs/adrs/002-four-layer-context-classification.md +38 -0
  7. package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
  8. package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
  9. package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
  10. package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
  11. package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
  12. package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
  13. package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
  14. package/docs/adrs/010-release-pipeline.md +89 -0
  15. package/docs/architecture/README.md +7 -0
  16. package/docs/architecture/refactor-audit.md +364 -0
  17. package/docs/authoring-v2.md +527 -0
  18. package/docs/authoring.md +873 -0
  19. package/docs/changelog-recent.md +201 -0
  20. package/docs/configuration.md +505 -0
  21. package/docs/ctc-mcp-proposal.md +518 -0
  22. package/docs/design/README.md +22 -0
  23. package/docs/design/agent-cascade-protocol.md +96 -0
  24. package/docs/design/autonomous-console-design-candidates.md +253 -0
  25. package/docs/design/autonomous-console-design-review.md +111 -0
  26. package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
  27. package/docs/design/claude-code-source-deep-dive.md +713 -0
  28. package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
  29. package/docs/design/console-execution-trace-candidates-final.md +160 -0
  30. package/docs/design/console-execution-trace-candidates.md +211 -0
  31. package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
  32. package/docs/design/console-execution-trace-design-review.md +74 -0
  33. package/docs/design/console-execution-trace-discovery.md +394 -0
  34. package/docs/design/console-execution-trace-final-review.md +77 -0
  35. package/docs/design/console-execution-trace-review.md +92 -0
  36. package/docs/design/console-performance-discovery.md +415 -0
  37. package/docs/design/console-ui-backlog.md +280 -0
  38. package/docs/design/daemon-architecture-discovery.md +853 -0
  39. package/docs/design/daemon-design-candidates.md +318 -0
  40. package/docs/design/daemon-design-review-findings.md +119 -0
  41. package/docs/design/daemon-engine-design-candidates.md +210 -0
  42. package/docs/design/daemon-engine-design-review.md +131 -0
  43. package/docs/design/daemon-execution-engine-discovery.md +280 -0
  44. package/docs/design/daemon-gap-analysis.md +554 -0
  45. package/docs/design/daemon-owns-console-plan.md +168 -0
  46. package/docs/design/daemon-owns-console-review.md +91 -0
  47. package/docs/design/daemon-owns-console.md +195 -0
  48. package/docs/design/data-model-erd.md +11 -0
  49. package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
  50. package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
  51. package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
  52. package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
  53. package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
  54. package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
  55. package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
  56. package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
  57. package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
  58. package/docs/design/list-workflows-latency-fix-plan.md +128 -0
  59. package/docs/design/list-workflows-latency-fix-review.md +55 -0
  60. package/docs/design/list-workflows-latency-fix.md +109 -0
  61. package/docs/design/native-context-management-api.md +11 -0
  62. package/docs/design/performance-sweep-2026-04.md +96 -0
  63. package/docs/design/routines-guide.md +219 -0
  64. package/docs/design/sequence-diagrams.md +11 -0
  65. package/docs/design/subagent-design-principles.md +220 -0
  66. package/docs/design/temporal-patterns-design-candidates.md +312 -0
  67. package/docs/design/temporal-patterns-design-review-findings.md +163 -0
  68. package/docs/design/test-isolation-from-config-file.md +335 -0
  69. package/docs/design/v2-core-design-locks.md +2746 -0
  70. package/docs/design/v2-lock-registry.json +734 -0
  71. package/docs/design/workflow-authoring-v2.md +1044 -0
  72. package/docs/design/workflow-docs-spec.md +218 -0
  73. package/docs/design/workflow-extension-points.md +687 -0
  74. package/docs/design/workrail-auto-trigger-system.md +359 -0
  75. package/docs/design/workrail-config-file-discovery.md +513 -0
  76. package/docs/docker.md +110 -0
  77. package/docs/generated/v2-lock-closure-plan.md +26 -0
  78. package/docs/generated/v2-lock-coverage.json +797 -0
  79. package/docs/generated/v2-lock-coverage.md +177 -0
  80. package/docs/ideas/backlog.md +3927 -0
  81. package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
  82. package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
  83. package/docs/ideas/implementation_plan.md +249 -0
  84. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
  85. package/docs/implementation/02-architecture.md +316 -0
  86. package/docs/implementation/04-testing-strategy.md +124 -0
  87. package/docs/implementation/09-simple-workflow-guide.md +835 -0
  88. package/docs/implementation/13-advanced-validation-guide.md +874 -0
  89. package/docs/implementation/README.md +21 -0
  90. package/docs/integrations/claude-code.md +300 -0
  91. package/docs/integrations/firebender.md +315 -0
  92. package/docs/migration/v0.1.0.md +147 -0
  93. package/docs/naming-conventions.md +45 -0
  94. package/docs/planning/README.md +104 -0
  95. package/docs/planning/github-ticketing-playbook.md +195 -0
  96. package/docs/plans/README.md +24 -0
  97. package/docs/plans/agent-managed-ticketing-design.md +605 -0
  98. package/docs/plans/agentic-orchestration-roadmap.md +112 -0
  99. package/docs/plans/assessment-gates-engine-handoff.md +536 -0
  100. package/docs/plans/content-coherence-and-references.md +151 -0
  101. package/docs/plans/library-extraction-plan.md +340 -0
  102. package/docs/plans/mr-review-workflow-redesign.md +1451 -0
  103. package/docs/plans/native-context-management-epic.md +11 -0
  104. package/docs/plans/perf-fixes-design-candidates.md +225 -0
  105. package/docs/plans/perf-fixes-design-review-findings.md +61 -0
  106. package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
  107. package/docs/plans/perf-fixes-new-issues-review.md +110 -0
  108. package/docs/plans/prompt-fragments.md +53 -0
  109. package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
  110. package/docs/plans/ui-ux-workflow-discovery.md +100 -0
  111. package/docs/plans/ui-ux-workflow-review.md +48 -0
  112. package/docs/plans/v2-followup-enhancements.md +587 -0
  113. package/docs/plans/workflow-categories-candidates.md +105 -0
  114. package/docs/plans/workflow-categories-discovery.md +110 -0
  115. package/docs/plans/workflow-categories-review.md +51 -0
  116. package/docs/plans/workflow-discovery-model-candidates.md +94 -0
  117. package/docs/plans/workflow-discovery-model-discovery.md +74 -0
  118. package/docs/plans/workflow-discovery-model-review.md +48 -0
  119. package/docs/plans/workflow-source-setup-phase-1.md +245 -0
  120. package/docs/plans/workflow-source-setup-phase-2.md +361 -0
  121. package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
  122. package/docs/plans/workflow-staleness-detection-review.md +58 -0
  123. package/docs/plans/workflow-staleness-detection.md +80 -0
  124. package/docs/plans/workflow-v2-design.md +69 -0
  125. package/docs/plans/workflow-v2-roadmap.md +74 -0
  126. package/docs/plans/workflow-validation-design.md +98 -0
  127. package/docs/plans/workflow-validation-roadmap.md +108 -0
  128. package/docs/plans/workrail-platform-vision.md +420 -0
  129. package/docs/reference/agent-context-cleaner-snippet.md +94 -0
  130. package/docs/reference/agent-context-guidance.md +140 -0
  131. package/docs/reference/context-optimization.md +284 -0
  132. package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
  133. package/docs/reference/example-workflow-repository-template/README.md +268 -0
  134. package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
  135. package/docs/reference/external-workflow-repositories.md +916 -0
  136. package/docs/reference/feature-flags-architecture.md +472 -0
  137. package/docs/reference/feature-flags.md +349 -0
  138. package/docs/reference/god-tier-workflow-validation.md +272 -0
  139. package/docs/reference/loop-optimization.md +209 -0
  140. package/docs/reference/loop-validation.md +176 -0
  141. package/docs/reference/loops.md +465 -0
  142. package/docs/reference/mcp-platform-constraints.md +59 -0
  143. package/docs/reference/recovery.md +88 -0
  144. package/docs/reference/releases.md +177 -0
  145. package/docs/reference/troubleshooting.md +105 -0
  146. package/docs/reference/workflow-execution-contract.md +998 -0
  147. package/docs/roadmap/README.md +22 -0
  148. package/docs/roadmap/legacy-planning-status.md +103 -0
  149. package/docs/roadmap/now-next-later.md +70 -0
  150. package/docs/roadmap/open-work-inventory.md +389 -0
  151. package/docs/tickets/README.md +39 -0
  152. package/docs/tickets/next-up.md +76 -0
  153. package/docs/workflow-management.md +317 -0
  154. package/docs/workflow-templates.md +423 -0
  155. package/docs/workflow-validation.md +184 -0
  156. package/docs/workflows.md +254 -0
  157. package/package.json +3 -1
  158. package/spec/authoring-spec.json +61 -16
  159. package/workflows/workflow-for-workflows.json +252 -93
  160. package/workflows/workflow-for-workflows.v2.json +188 -77
@@ -0,0 +1,527 @@
1
+ # Workflow Authoring Guide (v2)
2
+
3
+ WorkRail v2 authoring is **JSON-first** and is designed for **determinism**, **rewind-safety**, and **resumability**.
4
+
5
+ > **Status:** v2 authoring is design-locked but not necessarily shipped yet. This doc is a v2-only entry point.
6
+
7
+ ## Canonical references (v2)
8
+
9
+ - **Authoring model + JSON examples:** `docs/design/workflow-authoring-v2.md`
10
+ - **Execution contract (token-based):** `docs/reference/workflow-execution-contract.md`
11
+ - **Core design locks (anti-drift):** `docs/design/v2-core-design-locks.md`
12
+
13
+ ## v2 authoring principles (high level)
14
+
15
+ ### Structured freedom over rigid scripts
16
+
17
+ WorkRail workflows should constrain **outcomes and invariants**, not micromanage cognition.
18
+
19
+ Material branching, pathing, loop continuation, and gating should live in the workflow/engine as declarative control flow whenever possible, not in implicit agent judgment hidden inside prompt prose.
20
+
21
+ Authors should aim for:
22
+
23
+ - **rigid on invariants**: required outputs, loop decisions, confidence disclosure, blocked vs never-stop behavior, final handoff structure
24
+ - **semi-structured on heuristics**: routing matrices, severity guidance, confidence combination rules, artifact vs context split
25
+ - **adaptive on reasoning**: exploration order, clue prioritization, synthesis, finding phrasing, and unusual-case handling
26
+
27
+ The goal is **structured freedom**:
28
+
29
+ - not "trust the model" vagueness
30
+ - not bureaucratic form-filling
31
+
32
+ The agent should usually determine and record the route-driving facts. The engine should usually decide what node, branch, or loop state comes next.
33
+
34
+ Prefer asking:
35
+
36
+ - what must be known before leaving this phase?
37
+ - what must be disclosed if it is not known?
38
+
39
+ over prescribing the exact internal thought sequence the agent must follow.
40
+
41
+ ### Never-stop by default for enrichment and confidence gaps
42
+
43
+ For most workflows, missing enrichment sources or weak confidence should **degrade and disclose**, not block.
44
+
45
+ Typical examples:
46
+
47
+ - preferred capability unavailable
48
+ - missing ticket or supporting docs
49
+ - weak boundary confidence
50
+ - incomplete policy/context discovery
51
+
52
+ Blocking should be reserved for cases where:
53
+
54
+ - the review/task target is not meaningfully available
55
+ - a truly required capability is unavailable
56
+ - a required output contract is missing in blocking modes
57
+
58
+ ### Confidence is multi-dimensional
59
+
60
+ Avoid a single vague "confidence" concept when different uncertainty sources matter differently.
61
+
62
+ Reusable confidence dimensions often include:
63
+
64
+ - boundary confidence
65
+ - context / intent confidence
66
+ - policy-context confidence
67
+ - evidence confidence
68
+ - validation confidence
69
+
70
+ Authors should explicitly decide:
71
+
72
+ - which confidence dimensions matter for this workflow
73
+ - which ones cap final conclusions
74
+ - which ones trigger follow-up loops versus just downgrade the final handoff
75
+
76
+ ### Use structure only when it earns its place
77
+
78
+ A matrix, field, ledger, or classification should exist only if it does at least one of these:
79
+
80
+ - prevents a real recurring failure mode
81
+ - improves deterministic control flow or resumability
82
+ - improves user-visible honesty / explainability
83
+ - materially changes routing or rigor
84
+
85
+ If it does none of those, it should be removed or downgraded to advisory guidance.
86
+
87
+ Practical example:
88
+
89
+ - a `boundaryConfidence` field earns its place because it can cap conclusions and trigger follow-up
90
+ - a five-level taxonomy that never changes routing probably does not
91
+
92
+ ### Anti-lazy wording
93
+
94
+ Structured freedom should not become vague permission for shallow work.
95
+
96
+ Be careful with wording like:
97
+
98
+ - `if appropriate`
99
+ - `minimal pass`
100
+ - `light scan`
101
+ - `you may`
102
+ - `smallest`
103
+ - `cheapest`
104
+
105
+ These phrases are often useful, but they should usually be paired with a clear floor for what still must be achieved.
106
+
107
+ Prefer wording like:
108
+
109
+ - "do the lightest pass that still surfaces the main approaches, hard constraints, and obvious contradictions"
110
+ - "if you do not delegate, record why solo execution is enough"
111
+ - "generate enough distinct options to support a real choice"
112
+
113
+ The goal is freedom in method, not softness in rigor.
114
+
115
+ ### User-voice prose is a real option
116
+
117
+ For bundled and user-facing workflows, authors should consider prose that sounds like the user is directly instructing the agent.
118
+
119
+ This is often better than detached framework narration for exploratory, advisory, or design-heavy workflows because it:
120
+
121
+ - keeps the workflow grounded in user intent
122
+ - reduces internal-boilerplate tone
123
+ - makes the workflow feel more like an expression of the user's will
124
+
125
+ Neutral/system-style prose is still appropriate for internal, infrastructural, or highly mechanical workflows. Choose deliberately.
126
+
127
+ ### Auditor-first delegation is often the better default
128
+
129
+ When using subagents or routines, prefer bounded **audits** of the main agent's work over delegating broad task ownership.
130
+
131
+ Good auditor uses:
132
+
133
+ - context completeness audit
134
+ - depth audit
135
+ - adversarial challenge
136
+ - philosophy alignment review
137
+ - final verification
138
+
139
+ Executor-style delegation still makes sense for bounded independent work, but the parent workflow should usually remain the canonical synthesizer and decision-maker.
140
+
141
+ ### JSON-first authoring
142
+
143
+ WorkRail v2 uses **JSON** as the canonical authoring format. DSL and YAML remain possible future input formats, but for v2 we optimize for determinism and straightforward validation.
144
+
145
+ Workflows are hashed based on their **compiled canonical model** (after templates/features/contracts are expanded), not raw text, so the hash remains stable and deterministic.
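The idea of hashing a compiled canonical model rather than raw text can be sketched as follows. This is a minimal illustration, not WorkRail's actual implementation: the `canonicalize` and `hashCompiledModel` helpers, the sorted-key serialization, and the choice of SHA-256 are all assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted recursively, so key order and whitespace
// in the authored file cannot influence the result.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  const entries = Object.entries(value).sort(([x], [y]) => (x < y ? -1 : x > y ? 1 : 0));
  return "{" + entries.map(([k, v]) => JSON.stringify(k) + ":" + canonicalize(v)).join(",") + "}";
}

// Hypothetical helper: hash the compiled model, not the raw authored text.
function hashCompiledModel(compiled: Json): string {
  return createHash("sha256").update(canonicalize(compiled)).digest("hex");
}

// Same model, different key order and whitespace in the source text.
const a = JSON.parse('{"id":"demo","steps":[{"id":"s1","title":"One"}]}') as Json;
const b = JSON.parse('{\n  "steps": [ { "title": "One", "id": "s1" } ],\n  "id": "demo"\n}') as Json;

console.log(hashCompiledModel(a) === hashCompiledModel(b)); // true
```

In a real compiler the input to the hash would be the model after templates, features, and contracts have been expanded; the sketch only shows why formatting differences do not change the result.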
146
+
147
+ ### Authoring primitives (v2)
148
+
149
+ WorkRail v2 introduces several primitives for expressive workflows:
150
+
151
+ - **Capabilities** (workflow-global): declare agent capabilities such as `delegation` or `web_browsing`, marking each as required or preferred.
152
+ - **Features** (compiler middleware): mostly toggle IDs; a small subset supports typed config objects (`{id, config}`).
153
+ - **Templates**: reusable step sequences, called explicitly via `type: "template_call"`.
154
+ - **Contract packs**: WorkRail-owned output schemas for structured artifacts (e.g., `wr.contracts.capability_observation`).
155
+ - **PromptBlocks** (optional): structure step prompts as blocks (goal/constraints/procedure/outputRequired/verify) which compile to deterministic text.
156
+ - **AgentRole**: workflow and/or step-level stance/persona (not system prompt control).
157
+ - **Extension points**: named slots declared with `extensionPoints` and referenced via `{{wr.bindings.slotId}}` tokens; resolved at compile time from project `.workrail/bindings.json` overrides or workflow defaults. Enables project-overridable delegation seams without forking workflow JSON.
158
+ - **References**: workflow-declared pointers to external documents (schemas, specs, guides). Resolved at start time, delivered as a separate MCP content item. The agent reads the files itself if needed. See "Workflow references" section below.
159
+ - **Assessments**: workflow-declared assessment shapes (`assessments`). Steps reference them with one or more `assessmentRefs` and, in v1, may attach one exact-match `require_followup` consequence via `assessmentConsequences`, which fires if any dimension across any referenced assessment equals the trigger level.
160
+
161
+ For detailed JSON syntax and examples, see: `docs/design/workflow-authoring-v2.md`.
162
+
163
+ ### Choosing the right authoring mechanism
164
+
165
+ Use different primitives for different jobs. A good rule of thumb is:
166
+
167
+ - **Different flow** → use `runCondition`
168
+ - **Same flow, different wording** → use `promptFragments`
169
+
170
+ | Mechanism | What it does | When it applies | Best for |
171
+ |---|---|---|---|
172
+ | **`features`** | Inject reusable guidance into `promptBlocks` sections | **Compile time** | Cross-cutting rules like memory or subagent discipline |
173
+ | **`promptFragments`** | Add small conditional prompt text to an existing step | **Render/runtime** | Context-sensitive nudges without branching the DAG |
174
+ | **`templateCall`** | Insert reusable step structure or step sequences inline | **Compile time** | Reusing standard routines or audits as first-class parent steps |
175
+ | **`extensionPoints`** | Resolve overridable bound routine/workflow IDs in prompt text | **Compile time** | Swapping delegated bounded seams without forking |
176
+ | **`runCondition`** | Decide whether a step runs | **Runtime** | Real branching, pathing, or routing |
177
+
178
+ Use these as the default choice rules:
179
+
180
+ - **Use `runCondition`** when the workflow should actually take a different path.
181
+ - **Use `promptFragments`** when the step stays the same but needs small context-sensitive additions.
182
+ - **Use `templateCall`** when you want reusable standard structure or routine injection.
183
+ - **Use `extensionPoints`** when you want a project-overridable delegated seam, not inline structure.
184
+ - **Use `features`** for workflow-wide repeated guidance.
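The difference between `runCondition` and `promptFragments` can be sketched in a few lines. This is only a conceptual model: real conditions are declarative JSON, so the predicate functions, the `when` and `text` fields, and the `renderRunnableSteps` helper are assumptions made for illustration.

```typescript
type Context = Record<string, unknown>;
// Stand-in for a declarative condition; an assumption for this sketch.
type Condition = (ctx: Context) => boolean;

interface Step {
  id: string;
  prompt: string;
  runCondition?: Condition; // decides whether the step runs at all
  promptFragments?: { when: Condition; text: string }[]; // adds wording only
}

function renderRunnableSteps(steps: Step[], ctx: Context): { id: string; prompt: string }[] {
  return steps
    // runCondition changes the path: a failing condition removes the step.
    .filter((s) => s.runCondition === undefined || s.runCondition(ctx))
    // promptFragments keep the path and append matching text to the prompt.
    .map((s) => ({
      id: s.id,
      prompt: [
        s.prompt,
        ...(s.promptFragments ?? []).filter((f) => f.when(ctx)).map((f) => f.text),
      ].join("\n"),
    }));
}

const steps: Step[] = [
  {
    id: "triage",
    prompt: "Triage the change.",
    promptFragments: [
      { when: (c) => c.hasTicket === false, text: "No ticket was found; disclose that gap." },
    ],
  },
  { id: "deep-audit", prompt: "Run the deep audit.", runCondition: (c) => c.risk === "high" },
];

const rendered = renderRunnableSteps(steps, { risk: "low", hasTicket: false });
console.log(rendered.map((s) => s.id)); // ["triage"]
```

With `risk: "low"` the `deep-audit` step drops out of the path entirely, while `triage` still runs and only gains one extra sentence.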
185
+
186
+ #### References are for execution-time companion material, not authoring provenance
187
+
188
+ Use workflow `references` sparingly.
189
+
190
+ - **Good use**: documents the running workflow may genuinely need while doing its job, such as a shipped rubric, a target-system spec, or a project policy document.
191
+ - **Bad use**: authoring-only material about how the workflow was designed, including the workflow schema, authoring spec, or maintainer provenance, unless the workflow is itself about authoring or validating workflows.
192
+
193
+ Practical test:
194
+
195
+ - If the reference helps the agent perform the workflow's **runtime task**, keep it.
196
+ - If it only helps a maintainer justify or inspect the workflow's design, remove it from normal execution workflows.
197
+
198
+ #### Strong default: inject first, override only when delegation is intentional
199
+
200
+ When choosing between `templateCall` and `extensionPoints`, prefer this rule:
201
+
202
+ - **If the routine should be part of the parent workflow's visible step structure, use `templateCall`.**
203
+ - **If the routine should remain an opaque bounded implementation that the parent may delegate to, use delegation.**
204
+ - **If that delegated seam should be customizable per project, wrap that delegation seam in `extensionPoints`.**
205
+
206
+ Do **not** use `extensionPoints` as a generic substitute for routine injection.
207
+
208
+ Important implementation detail:
209
+
210
+ - `templateCall` expands routines into real steps during the compiler's template pass.
211
+ - `{{wr.bindings.*}}` tokens are resolved later, during binding resolution.
212
+ - Therefore, **extension points cannot currently choose which routine gets injected by a `templateCall`**.
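The ordering constraint above can be sketched as a two-pass compile. This is a simplified model under stated assumptions: the `templates` registry, the template ID `audit.depth`, the slot names, and both pass functions are hypothetical, and only the `{{wr.bindings.slotId}}` token shape comes from this guide.

```typescript
interface Step {
  id: string;
  prompt: string;
  templateCall?: string;
}

// Hypothetical template registry: template ID -> steps it expands to.
const templates: Record<string, Step[]> = {
  "audit.depth": [{ id: "depth-audit", prompt: "Audit depth via {{wr.bindings.auditor}}" }],
};

// Pass 1: expand templateCall steps. The template ID must be a literal here.
function expandTemplates(steps: Step[]): Step[] {
  return steps.flatMap((s) => {
    if (s.templateCall === undefined) return [s];
    const expansion = templates[s.templateCall];
    if (expansion === undefined) throw new Error(`unknown template: ${s.templateCall}`);
    return expansion;
  });
}

// Pass 2: resolve binding tokens; project overrides win over workflow defaults.
function resolveBindings(
  steps: Step[],
  defaults: Record<string, string>,
  overrides: Record<string, string>,
): Step[] {
  const bindings: Record<string, string> = { ...defaults, ...overrides };
  return steps.map((s) => ({
    ...s,
    prompt: s.prompt.replace(
      /\{\{wr\.bindings\.(\w+)\}\}/g,
      (token: string, slot: string) => bindings[slot] ?? token,
    ),
  }));
}

const compiled = resolveBindings(
  expandTemplates([{ id: "call", prompt: "", templateCall: "audit.depth" }]),
  { auditor: "routine-default" },
  { auditor: "routine-custom" },
);
console.log(compiled.map((s) => s.prompt)); // ["Audit depth via routine-custom"]

// A binding token used as the template ID is still unresolved during pass 1.
let templatePassError = "";
try {
  expandTemplates([{ id: "call", prompt: "", templateCall: "{{wr.bindings.auditTemplate}}" }]);
} catch (e) {
  templatePassError = (e as Error).message;
}
console.log(templatePassError); // unknown template: {{wr.bindings.auditTemplate}}
```

Because the template pass runs first, a token in the `templateCall` position is just an unknown template name; bindings can only affect text inside steps that already exist.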
213
+
214
+ ### Baseline (Tier 0): notes-first
215
+
216
+ - **You can write workflows with no special authoring features.**
217
+ - The default durable output is a short recap in `output.notesMarkdown` (recorded by the agent when advancing or checkpointing).
218
+ - Structured artifacts are **optional** and must never be required for a workflow to be usable.
219
+
220
+ ### Assessment-gate authoring (v1)
221
+
222
+ Assessment gates are now a shipped authoring/runtime feature, but the first slice is intentionally narrow.
223
+
224
+ Use them when:
225
+
226
+ - the workflow needs the agent to submit a bounded judgment explicitly
227
+ - that judgment should be durably recorded
228
+ - one specific result should keep the same step pending and require follow-up before retry
229
+
230
+ The v1 shape is:
231
+
232
+ - declare one or more workflow-level assessments in `assessments`
233
+ - reference them from a step with one or more `assessmentRefs` entries
234
+ - optionally declare one step-level `assessmentConsequences` rule
235
+ - if any dimension across any referenced assessment matches that rule, the engine returns a retryable same-step follow-up block
236
+
237
+ Important limits in v1:
238
+
239
+ - at least one `assessmentRefs` entry is required when `assessmentConsequences` is present; multiple refs are supported
240
+ - at most one `assessmentConsequences` entry per step
241
+ - one supported effect only: `require_followup`
242
+ - exact-match trigger only: one declared dimension equals one declared canonical level
243
+
244
+ Keep the responsibility split clean:
245
+
246
+ - **assessment definition** = reusable vocabulary (`dimensions`, allowed `levels`, purpose)
247
+ - **step usage** = local execution behavior (`assessmentRefs`, `assessmentConsequences`)
248
+
249
+ Do not put execution-policy meaning into the assessment definition itself.
250
+
251
+ Practical guidance:
252
+
253
+ - use assessments for **bounded judgment**, not generic scoring
254
+ - keep dimensions small and semantically meaningful
255
+ - use canonical level names authors and agents can understand easily
256
+ - write follow-up guidance as **same-step retry guidance**, not as a subflow or rewind instruction
257
+ - prefer one strong follow-up trigger over multiple weak ones
258
+
259
+ **Dimension design: orthogonality matters**
260
+
261
+ The value of multi-dimensional assessments is that each dimension independently blocks advancement for a different reason. A dimension that restates existing workflow state adds ceremony without structure.
262
+
263
+ Good dimensions are:
264
+ - **Orthogonal**: each captures a distinct failure mode the others don't catch
265
+ - **Independently checkable**: a `low` rating on one dimension alone justifies follow-up, regardless of the others
266
+ - **Specific**: about a concrete, observable thing -- not "is the overall result good"
267
+
268
+ Bad dimensions:
269
+ - A single `confidence` dimension that mirrors the workflow's existing `recommendationConfidenceBand` -- it just restates what the workflow already knows
270
+ - Multiple dimensions that all reduce to the same question phrased differently
271
+ - Dimensions so correlated that one being low always implies the others are low too
272
+
273
+ Example of orthogonal dimensions for an MR review handoff:
274
+ - `evidence_quality` -- are findings grounded in specific code locations? (catches: weak analysis)
275
+ - `coverage_completeness` -- are all relevant domains checked? (catches: blind spots)
276
+ - `contradiction_resolution` -- are competing interpretations resolved? (catches: premature synthesis)
277
+
278
+ Each catches a distinct failure mode. A review can have strong evidence but miss whole domains, or have good coverage with unresolved contradictions.
279
+
280
+ **Consequence trigger**: use `anyEqualsLevel` to specify which level should block. WorkRail checks all submitted dimensions and fires the consequence if any of them equals that level. For single-dimension assessments this works the same as an exact match.
281
+
282
+ ```json
283
+ {
284
+ "when": { "anyEqualsLevel": "low" },
285
+ "effect": { "kind": "require_followup", "guidance": "..." }
286
+ }
287
+ ```
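How an `anyEqualsLevel` trigger might be evaluated can be sketched as below. The `when` and `effect` shapes mirror the JSON above; the `AssessmentSubmission` shape, the `evaluateConsequence` function, and the `triggeredBy` field are assumptions for illustration and do not describe the engine's real types.

```typescript
interface AssessmentSubmission {
  assessmentId: string;
  dimensions: Record<string, string>; // dimension name -> submitted level
}

interface Consequence {
  when: { anyEqualsLevel: string };
  effect: { kind: "require_followup"; guidance: string };
}

type GateResult =
  | { blocked: false }
  | { blocked: true; guidance: string; triggeredBy: string[] };

// Fires if any dimension across any referenced assessment equals the trigger level.
function evaluateConsequence(
  submissions: AssessmentSubmission[],
  consequence: Consequence | undefined,
): GateResult {
  if (consequence === undefined) return { blocked: false };
  const trigger = consequence.when.anyEqualsLevel;
  const triggeredBy = submissions.flatMap((s) =>
    Object.entries(s.dimensions)
      .filter(([, level]) => level === trigger)
      .map(([dimension]) => `${s.assessmentId}.${dimension}`),
  );
  return triggeredBy.length > 0
    ? { blocked: true, guidance: consequence.effect.guidance, triggeredBy }
    : { blocked: false };
}

const consequence: Consequence = {
  when: { anyEqualsLevel: "low" },
  effect: { kind: "require_followup", guidance: "Anchor findings to specific code before retry." },
};

const result = evaluateConsequence(
  [
    {
      assessmentId: "mr_review",
      dimensions: { evidence_quality: "high", coverage_completeness: "low", contradiction_resolution: "high" },
    },
  ],
  consequence,
);
console.log(result.blocked); // true
```

A single `low` dimension is enough to keep the step pending, which is why orthogonal dimensions matter: each one can block on its own.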
288
+
289
+ **V1 limit**: at most one consequence per step.
290
+
291
+ Good fit:
292
+
293
+ - "Before handing off, assess whether the diagnosis is ready -- coverage is complete, evidence is grounded, and contradictions are resolved."
294
+ - "If evidence quality is low, require follow-up to anchor findings to specific code before retry."
295
+
296
+ Bad fit:
297
+
298
+ - generic five-level scores that never affect workflow behavior
299
+ - a single `confidence` dimension that mirrors a confidence band the workflow already tracks
300
+ - multiple interacting rule chains that really want a policy DSL
301
+ - subflow-style recovery sequences masquerading as a single follow-up consequence
302
+
303
+ ### Rigor vs rigidity
304
+
305
+ Workflows should provide a **strong skeleton, not a straitjacket**.
306
+
307
+ - Be strict about the things that protect quality:
308
+ - required outcomes
309
+ - important decision points
310
+ - evidence and uncertainty accounting
311
+ - final validation or challenge passes
312
+ - Be flexible about the things LLMs are good at:
313
+ - synthesis order
314
+ - framing moves
315
+ - idea generation
316
+ - creative problem decomposition
317
+
318
+ Author for **what must be accomplished**, not for an exact internal choreography of thought.
319
+
320
+ ### Anti-lazy wording
321
+
322
+ Adaptive workflows should leave room for judgment without creating easy escape hatches.
323
+
324
+ Be careful with wording like:
325
+
326
+ - `if appropriate`
327
+ - `minimal pass`
328
+ - `light scan`
329
+ - `you may`
330
+ - `smallest`
331
+ - `cheapest`
332
+
333
+ These phrases are often useful, but they become lazy when they are not paired with a quality floor.
334
+
335
+ Prefer wording like:
336
+
337
+ - "do the lightest pass that still surfaces the main approaches, hard constraints, and obvious contradictions"
338
+ - "choose the lighter path only if it still answers the real question"
339
+ - "if you do not delegate, record why solo execution is enough"
340
+
341
+ Good adaptive wording gives the agent freedom in method while still requiring it to **earn confidence**.
342
+
343
+ ### User-voice prose
344
+
345
+ For bundled and user-facing workflows, prefer prose that reads as if **the user is directly instructing the agent**.
346
+
347
+ This usually works better than detached author or framework narration because it:
348
+
349
+ - keeps the workflow grounded in user intent
350
+ - makes prompts feel less like internal boilerplate
351
+ - encourages the agent to treat the workflow as an expression of the user's will
352
+
353
+ Neutral/system-style prose is still fine for internal or infrastructural workflows, but user-facing flows generally benefit from a clearer user voice.
354
+
355
+ ### Builtins (no user-defined plugins)
356
+
357
+ WorkRail v2 provides **built-in** building blocks that workflows (including external workflows) can reference:
358
+
359
+ - **Templates**: pre-built steps (or step sequences) authors can “call” to speed up authoring and ensure consistency.
360
+ - **Features**: deterministic, closed-set “middleware” applied by WorkRail (e.g., tier-aware instructions, formatting, durable recap guidance).
361
+ - **Contract packs**: server-side definitions for allowed artifact kinds and small examples (no schema authoring required by workflow authors).
362
+
363
+ External workflows can reference these builtins, but cannot define arbitrary new plugin code.
364
+
365
+ ### Where injections happen: templates as anchors
366
+
367
+ When something needs to be injected at a specific point (“run an audit here”, “insert a standard gate here”), **template references are the primary anchor**:
368
+
369
+ - Explicit at the callsite (less hidden magic).
370
+ - Deterministic and debuggable.
371
+ - Avoids tag-taxonomy sprawl.
372
+
373
+ If what you want is **team-overridable delegation**, use `extensionPoints` instead — but treat that as a different mechanism with different runtime behavior, not another form of injection.
374
+
375
+ Tags can still exist as optional **classification** metadata (for UI organization and search), but should not be the primary injection mechanism.
376
+
377
+ ### Response supplements for start/resume-only instructions
378
+
379
+ Some instructions should **not** be mixed into the workflow-authored step prompt:
380
+
381
+ - short onboarding guidance
382
+ - authority/provenance framing for the WorkRail channel
383
+ - logistics that should appear only at workflow start or when resuming
384
+
385
+ For these, use **response supplements** at the MCP response boundary rather than editing workflow JSON prompts directly.
386
+
387
+ Current implementation lives in `src/mcp/response-supplements.ts`.
388
+
389
+ #### When to use a response supplement
390
+
391
+ Use a response supplement when all of the following are true:
392
+
393
+ - the instruction is **system-owned** or delivery-owned, not part of the workflow author's actual step text
394
+ - it should be shown only for specific lifecycle moments like **`start`** or **`rehydrate`**
395
+ - it should remain **structurally separate** from the main step prompt so agents do not confuse it with the user's core instruction
396
+
397
+ Do **not** use a response supplement for:
398
+
399
+ - normal step instructions that belong in the workflow prompt
400
+ - durable session state
401
+ - anything that must be remembered as part of the workflow's semantic execution state
402
+
403
+ #### Delivery modes
404
+
405
+ Response supplements support two delivery modes:
406
+
407
+ - **`per_lifecycle`**: emit on every eligible lifecycle (for example, every `rehydrate`)
408
+ - **`once_per_session`**: emit only on one designated lifecycle (for example, `start`) without persisting delivery state
409
+
410
+ In the current design, `once_per_session` is a **policy-level one-time instruction**, not a durable delivery record. It means:
411
+
412
+ - choose the single lifecycle where the supplement should appear
413
+ - render it there deterministically
414
+ - do **not** store "shown/not shown" in session state unless exact delivery history becomes a real execution requirement
415
+
416
+ This keeps presentation policy out of durable workflow state.
417
+
418
+ #### How to add a one-time instruction
419
+
420
+ 1. Add a new supplement entry in `src/mcp/response-supplements.ts`
421
+ 2. Give it a stable `kind` and explicit `order`
422
+ 3. Choose the eligible `lifecycles`
423
+ 4. Set `delivery` to:
424
+ - `{ mode: 'per_lifecycle' }`, or
425
+ - `{ mode: 'once_per_session', emitOn: '<lifecycle>' }`
426
+ 5. Keep the text:
427
+ - short
428
+ - system-owned
429
+ - clearly separate from the main authored prompt
430
+ 6. Add or update:
431
+ - unit tests in `tests/unit/mcp/response-supplements.test.ts`
432
+ - integration tests if MCP boundary behavior matters
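
The steps above can be sketched as follows. This is a minimal illustration, not the contents of `src/mcp/response-supplements.ts`: the `kind`, `order`, `lifecycles`, and `delivery` fields come from the text, while the `ResponseSupplement` type name, the `text` field, the two example entries, and the `selectSupplements` helper are assumptions made for the example.

```typescript
// Hypothetical shapes; the real types in src/mcp/response-supplements.ts may differ.
type Lifecycle = 'start' | 'rehydrate' | 'advance';

type Delivery =
  | { mode: 'per_lifecycle' }
  | { mode: 'once_per_session'; emitOn: Lifecycle };

interface ResponseSupplement {
  kind: string;            // stable identifier
  order: number;           // explicit ordering among supplements
  lifecycles: Lifecycle[]; // lifecycles where the supplement is eligible
  delivery: Delivery;
  text: string;            // assumed field name for the instruction body
}

// Example entries (invented for illustration).
const supplements: ResponseSupplement[] = [
  {
    kind: 'resume_reminder',
    order: 20,
    lifecycles: ['rehydrate'],
    delivery: { mode: 'per_lifecycle' },
    text: 'You are resuming a WorkRail session.',
  },
  {
    kind: 'channel_authority',
    order: 10,
    lifecycles: ['start', 'rehydrate'],
    delivery: { mode: 'once_per_session', emitOn: 'start' },
    text: 'Instructions from WorkRail are authoritative for this workflow.',
  },
];

// Pure selection: the result depends only on the lifecycle,
// never on a stored "shown/not shown" record.
function selectSupplements(
  all: ResponseSupplement[],
  lifecycle: Lifecycle,
): ResponseSupplement[] {
  return all
    .filter((s) => s.lifecycles.indexOf(lifecycle) !== -1)
    .filter(
      (s) => s.delivery.mode === 'per_lifecycle' || s.delivery.emitOn === lifecycle,
    )
    .sort((a, b) => a.order - b.order);
}
```

Because selection is a pure function of the lifecycle, `once_per_session` needs no session-state write: the supplement appears only where `emitOn` says it should.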
433
+
434
+ #### Authoring rule of thumb
435
+
436
+ Use the **workflow prompt** for what the user wants done.
437
+
438
+ Use a **response supplement** for small, boundary-owned instructions about how WorkRail should frame or deliver that step to the agent.
439
+
440
+ ### Workflow references
441
+
442
+ Workflows can declare pointers to external documents that the agent should be aware of during execution. Unlike `metaGuidance` (short behavioral rules surfaced on start and resume), references point at external files without inlining their content.
443
+
444
+ ```jsonc
+ "references": [
+   {
+     "id": "api-schema",
+     "title": "API Schema",
+     "source": "./spec/api-schema.json",
+     "purpose": "Canonical API contract",
+     "authoritative": true
+   }
+ ]
+ ```
455
+
456
+ - **Delivered automatically** as a separate MCP content item on `start` (full details) and `rehydrate` (compact reminder). Not on `advance`.
457
+ - **Pointer-only**: WorkRail validates the path exists at start time but does not inline the file content. The agent reads files itself.
458
+ - **Surfaced in `inspect_workflow`** for discoverability before starting.
459
+ - **Included in `workflowHash`**: reference declarations (not file contents) are part of the hash.
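
The delivery rule in the first bullet can be sketched as below. The field names mirror the JSON example above; the `renderReferences` helper and the exact wording of the rendered text are assumptions for illustration only, since the document does not specify how WorkRail formats the content item.

```typescript
type Lifecycle = 'start' | 'rehydrate' | 'advance';

interface WorkflowReference {
  id: string;
  title: string;
  source: string;
  purpose: string;
  authoritative: boolean;
}

const refs: WorkflowReference[] = [
  {
    id: 'api-schema',
    title: 'API Schema',
    source: './spec/api-schema.json',
    purpose: 'Canonical API contract',
    authoritative: true,
  },
];

// Returns the text of the separate content item, or null when nothing is delivered.
// Only pointers are rendered; file contents are never read or inlined here.
function renderReferences(
  lifecycle: Lifecycle,
  references: WorkflowReference[],
): string | null {
  if (references.length === 0 || lifecycle === 'advance') return null;
  if (lifecycle === 'start') {
    // Full details on start.
    return references
      .map(
        (r) =>
          `- ${r.title} (${r.id}): ${r.source}; purpose: ${r.purpose}` +
          (r.authoritative ? ' [authoritative]' : ''),
      )
      .join('\n');
  }
  // Compact reminder on rehydrate.
  return 'References: ' + references.map((r) => `${r.id} -> ${r.source}`).join(', ');
}
```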
460
+
461
+ For JSON syntax details, see: `docs/design/workflow-authoring-v2.md` → "References" section.
462
+
463
+ ### Step identity and provenance
464
+
465
+ To keep authoring simple:
466
+
467
+ - Author step IDs remain the primary, stable identifiers (what agents see as `pending.stepId`).
468
+ - Template-expanded/internal step IDs are **reserved/internal** and carry provenance (what injected them, where, and why).
469
+ - By default, injected steps should be **collapsed** for agent UX; provenance exists for debugging/auditing and advanced views.
470
+
471
+ ### Versioning and determinism
472
+
473
+ - The canonical pin is a **content hash** of the **fully expanded compiled workflow** (including template expansions, feature application, and contract pack selection), not a human-maintained `version` string.
474
+ - Human `version` fields may exist as labels, but should not be the source of truth for determinism.
475
+
476
+ ### Workflow staleness
477
+
478
+ Workflows can drift out of sync with the authoring spec they were written against. WorkRail surfaces this as a `staleness` signal in `list_workflows` and `inspect_workflow` output.
479
+
480
+ **How it works:** Workflows carry an optional `validatedAgainstSpecVersion` field stamped by `workflow-for-workflows` after the quality gate passes. The engine compares this against the current `spec/authoring-spec.json` version at list/inspect time and returns:
481
+
482
+ - `none` — workflow was validated against the current spec version
483
+ - `likely` — spec was updated since the workflow was last reviewed
484
+ - `possible` — workflow has never been run through `workflow-for-workflows`
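
The three outcomes above amount to a small comparison, sketched here. The `computeStaleness` name and the use of numeric versions are assumptions; the engine's actual version format and comparison may differ.

```typescript
type Staleness = 'none' | 'likely' | 'possible';

// validatedAgainstSpecVersion is optional on the workflow; undefined means "never stamped".
// Assumption: versions are plain numbers and any mismatch counts as "spec updated since review".
function computeStaleness(
  validatedAgainstSpecVersion: number | undefined,
  currentSpecVersion: number,
): Staleness {
  if (validatedAgainstSpecVersion === undefined) return 'possible';
  return validatedAgainstSpecVersion === currentSpecVersion ? 'none' : 'likely';
}
```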
485
+
486
+ **Stamping a workflow:**
487
+
488
+ ```bash
+ npm run stamp-workflow -- workflows/my-workflow.json
+ git add workflows/my-workflow.json && git commit -m "chore: stamp workflow"
+ ```
492
+
493
+ The stamp must be committed to take effect. The `workflow-for-workflows` Phase 7 step includes a reminder to do this.
494
+
495
+ **Visibility:** By default, the staleness signal is shown only for user-owned/imported workflows (`personal`, `rooted_sharing`, `external`). Built-in and `legacy_project` workflows are excluded. Set `WORKRAIL_DEV=1` to see staleness for all categories (useful for catalog maintenance).
496
+
497
+ **In `validate:registry`:** The validator prints a non-blocking advisory listing unstamped and outdated workflows after each run. This is always visible regardless of the dev flag.
498
+
499
+ ### Debugging and auditing
500
+
501
+ WorkRail v2 treats debugging/auditing as first-class:
502
+
503
+ - WorkRail should record a bounded “decision trace” (why a step was selected/skipped, loop decisions, fork detection) as durable data.
504
+ - Dashboards and exports can surface this trace for post-mortems without requiring the agent to carry debugging internals in chat.
505
+ - “Cognitive audits” (subagent auditor model) are supported via built-in templates/features, not bespoke author boilerplate.
506
+
507
+ ### Forced self-audit over self-reported confidence
508
+
509
+ Agents will often take the easy way out:
510
+
511
+ - assume they already have enough context
512
+ - assume they already understand the boundary
513
+ - skip challenge or audit because it "probably isn't needed"
514
+
515
+ So when a workflow needs an honest self-check, do **not** rely on vibes-only fields like:
516
+
517
+ - `stillFuzzy = true|false`
518
+ - `contextAuditNeeded = true|false`
519
+ - optional challenge wording with no rubric or trigger
520
+
521
+ Prefer patterns that force the agent to confront uncertainty:
522
+
523
+ - score concrete dimensions instead of reporting confidence directly
524
+ - require a short evidence statement for each score
525
+ - derive the next action from the rubric or trigger rules
526
+
527
+ The workflow should prove to the agent that it may not know enough yet, instead of asking the agent whether it feels confident.
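
One way to picture the preferred pattern is a rubric whose scores, not the agent's self-reported confidence, determine the next action. Everything in this sketch is hypothetical: the `DimensionScore` shape, the 0 to 2 scale, the action names, and the thresholds are illustrative choices, not WorkRail features.

```typescript
interface DimensionScore {
  dimension: string; // e.g. "boundary understood", "context gathered"
  score: 0 | 1 | 2;  // 0 = unknown, 1 = partial, 2 = verified
  evidence: string;  // short statement backing the score
}

type NextAction = 'proceed' | 'gather_context' | 'run_audit';

// The next action is derived from the rubric; the agent never answers
// "do you feel confident?" directly.
function deriveNextAction(scores: DimensionScore[]): NextAction {
  if (scores.length === 0) return 'gather_context';
  // A score without evidence is treated as unsupported.
  const unsupported = scores.some((s) => s.evidence.trim() === '');
  if (unsupported || scores.some((s) => s.score === 0)) return 'gather_context';
  if (scores.some((s) => s.score === 1)) return 'run_audit';
  return 'proceed';
}
```

The design point is that a missing evidence statement or a low score on any single dimension forces follow-up work, so skipping the audit is no longer the path of least resistance.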