@exaudeus/workrail 3.27.0 → 3.29.0

Files changed (160)
  1. package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
  2. package/dist/console/index.html +1 -1
  3. package/dist/manifest.json +3 -3
  4. package/docs/README.md +57 -0
  5. package/docs/adrs/001-hybrid-storage-backend.md +38 -0
  6. package/docs/adrs/002-four-layer-context-classification.md +38 -0
  7. package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
  8. package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
  9. package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
  10. package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
  11. package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
  12. package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
  13. package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
  14. package/docs/adrs/010-release-pipeline.md +89 -0
  15. package/docs/architecture/README.md +7 -0
  16. package/docs/architecture/refactor-audit.md +364 -0
  17. package/docs/authoring-v2.md +527 -0
  18. package/docs/authoring.md +873 -0
  19. package/docs/changelog-recent.md +201 -0
  20. package/docs/configuration.md +505 -0
  21. package/docs/ctc-mcp-proposal.md +518 -0
  22. package/docs/design/README.md +22 -0
  23. package/docs/design/agent-cascade-protocol.md +96 -0
  24. package/docs/design/autonomous-console-design-candidates.md +253 -0
  25. package/docs/design/autonomous-console-design-review.md +111 -0
  26. package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
  27. package/docs/design/claude-code-source-deep-dive.md +713 -0
  28. package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
  29. package/docs/design/console-execution-trace-candidates-final.md +160 -0
  30. package/docs/design/console-execution-trace-candidates.md +211 -0
  31. package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
  32. package/docs/design/console-execution-trace-design-review.md +74 -0
  33. package/docs/design/console-execution-trace-discovery.md +394 -0
  34. package/docs/design/console-execution-trace-final-review.md +77 -0
  35. package/docs/design/console-execution-trace-review.md +92 -0
  36. package/docs/design/console-performance-discovery.md +415 -0
  37. package/docs/design/console-ui-backlog.md +280 -0
  38. package/docs/design/daemon-architecture-discovery.md +853 -0
  39. package/docs/design/daemon-design-candidates.md +318 -0
  40. package/docs/design/daemon-design-review-findings.md +119 -0
  41. package/docs/design/daemon-engine-design-candidates.md +210 -0
  42. package/docs/design/daemon-engine-design-review.md +131 -0
  43. package/docs/design/daemon-execution-engine-discovery.md +280 -0
  44. package/docs/design/daemon-gap-analysis.md +554 -0
  45. package/docs/design/daemon-owns-console-plan.md +168 -0
  46. package/docs/design/daemon-owns-console-review.md +91 -0
  47. package/docs/design/daemon-owns-console.md +195 -0
  48. package/docs/design/data-model-erd.md +11 -0
  49. package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
  50. package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
  51. package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
  52. package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
  53. package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
  54. package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
  55. package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
  56. package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
  57. package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
  58. package/docs/design/list-workflows-latency-fix-plan.md +128 -0
  59. package/docs/design/list-workflows-latency-fix-review.md +55 -0
  60. package/docs/design/list-workflows-latency-fix.md +109 -0
  61. package/docs/design/native-context-management-api.md +11 -0
  62. package/docs/design/performance-sweep-2026-04.md +96 -0
  63. package/docs/design/routines-guide.md +219 -0
  64. package/docs/design/sequence-diagrams.md +11 -0
  65. package/docs/design/subagent-design-principles.md +220 -0
  66. package/docs/design/temporal-patterns-design-candidates.md +312 -0
  67. package/docs/design/temporal-patterns-design-review-findings.md +163 -0
  68. package/docs/design/test-isolation-from-config-file.md +335 -0
  69. package/docs/design/v2-core-design-locks.md +2746 -0
  70. package/docs/design/v2-lock-registry.json +734 -0
  71. package/docs/design/workflow-authoring-v2.md +1044 -0
  72. package/docs/design/workflow-docs-spec.md +218 -0
  73. package/docs/design/workflow-extension-points.md +687 -0
  74. package/docs/design/workrail-auto-trigger-system.md +359 -0
  75. package/docs/design/workrail-config-file-discovery.md +513 -0
  76. package/docs/docker.md +110 -0
  77. package/docs/generated/v2-lock-closure-plan.md +26 -0
  78. package/docs/generated/v2-lock-coverage.json +797 -0
  79. package/docs/generated/v2-lock-coverage.md +177 -0
  80. package/docs/ideas/backlog.md +3927 -0
  81. package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
  82. package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
  83. package/docs/ideas/implementation_plan.md +249 -0
  84. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
  85. package/docs/implementation/02-architecture.md +316 -0
  86. package/docs/implementation/04-testing-strategy.md +124 -0
  87. package/docs/implementation/09-simple-workflow-guide.md +835 -0
  88. package/docs/implementation/13-advanced-validation-guide.md +874 -0
  89. package/docs/implementation/README.md +21 -0
  90. package/docs/integrations/claude-code.md +300 -0
  91. package/docs/integrations/firebender.md +315 -0
  92. package/docs/migration/v0.1.0.md +147 -0
  93. package/docs/naming-conventions.md +45 -0
  94. package/docs/planning/README.md +104 -0
  95. package/docs/planning/github-ticketing-playbook.md +195 -0
  96. package/docs/plans/README.md +24 -0
  97. package/docs/plans/agent-managed-ticketing-design.md +605 -0
  98. package/docs/plans/agentic-orchestration-roadmap.md +112 -0
  99. package/docs/plans/assessment-gates-engine-handoff.md +536 -0
  100. package/docs/plans/content-coherence-and-references.md +151 -0
  101. package/docs/plans/library-extraction-plan.md +340 -0
  102. package/docs/plans/mr-review-workflow-redesign.md +1451 -0
  103. package/docs/plans/native-context-management-epic.md +11 -0
  104. package/docs/plans/perf-fixes-design-candidates.md +225 -0
  105. package/docs/plans/perf-fixes-design-review-findings.md +61 -0
  106. package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
  107. package/docs/plans/perf-fixes-new-issues-review.md +110 -0
  108. package/docs/plans/prompt-fragments.md +53 -0
  109. package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
  110. package/docs/plans/ui-ux-workflow-discovery.md +100 -0
  111. package/docs/plans/ui-ux-workflow-review.md +48 -0
  112. package/docs/plans/v2-followup-enhancements.md +587 -0
  113. package/docs/plans/workflow-categories-candidates.md +105 -0
  114. package/docs/plans/workflow-categories-discovery.md +110 -0
  115. package/docs/plans/workflow-categories-review.md +51 -0
  116. package/docs/plans/workflow-discovery-model-candidates.md +94 -0
  117. package/docs/plans/workflow-discovery-model-discovery.md +74 -0
  118. package/docs/plans/workflow-discovery-model-review.md +48 -0
  119. package/docs/plans/workflow-source-setup-phase-1.md +245 -0
  120. package/docs/plans/workflow-source-setup-phase-2.md +361 -0
  121. package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
  122. package/docs/plans/workflow-staleness-detection-review.md +58 -0
  123. package/docs/plans/workflow-staleness-detection.md +80 -0
  124. package/docs/plans/workflow-v2-design.md +69 -0
  125. package/docs/plans/workflow-v2-roadmap.md +74 -0
  126. package/docs/plans/workflow-validation-design.md +98 -0
  127. package/docs/plans/workflow-validation-roadmap.md +108 -0
  128. package/docs/plans/workrail-platform-vision.md +420 -0
  129. package/docs/reference/agent-context-cleaner-snippet.md +94 -0
  130. package/docs/reference/agent-context-guidance.md +140 -0
  131. package/docs/reference/context-optimization.md +284 -0
  132. package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
  133. package/docs/reference/example-workflow-repository-template/README.md +268 -0
  134. package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
  135. package/docs/reference/external-workflow-repositories.md +916 -0
  136. package/docs/reference/feature-flags-architecture.md +472 -0
  137. package/docs/reference/feature-flags.md +349 -0
  138. package/docs/reference/god-tier-workflow-validation.md +272 -0
  139. package/docs/reference/loop-optimization.md +209 -0
  140. package/docs/reference/loop-validation.md +176 -0
  141. package/docs/reference/loops.md +465 -0
  142. package/docs/reference/mcp-platform-constraints.md +59 -0
  143. package/docs/reference/recovery.md +88 -0
  144. package/docs/reference/releases.md +177 -0
  145. package/docs/reference/troubleshooting.md +105 -0
  146. package/docs/reference/workflow-execution-contract.md +998 -0
  147. package/docs/roadmap/README.md +22 -0
  148. package/docs/roadmap/legacy-planning-status.md +103 -0
  149. package/docs/roadmap/now-next-later.md +70 -0
  150. package/docs/roadmap/open-work-inventory.md +389 -0
  151. package/docs/tickets/README.md +39 -0
  152. package/docs/tickets/next-up.md +76 -0
  153. package/docs/workflow-management.md +317 -0
  154. package/docs/workflow-templates.md +423 -0
  155. package/docs/workflow-validation.md +184 -0
  156. package/docs/workflows.md +254 -0
  157. package/package.json +3 -1
  158. package/spec/authoring-spec.json +61 -16
  159. package/workflows/workflow-for-workflows.json +252 -93
  160. package/workflows/workflow-for-workflows.v2.json +188 -77
@@ -0,0 +1,536 @@
# Assessment Gates Engine Handoff

## Status

This is a handoff document for a new agent picking up the **native assessment / decision gates** engine feature.

It is intentionally written as a **catch-up and execution-orientation** doc that covers:

- what problem we are solving
- why it matters now
- what the current engine can and cannot do
- where to read first
- what to avoid doing by accident

This is **not** the final design spec. It is the best current starting point for a fresh agent.

## What this feature is

We want a **first-class engine feature** for structured assessments that can drive workflow behavior.

Today, workflows can express confidence, readiness, or risk decisions only in prose. The agent can write notes like:

- boundary confidence is low
- coverage confidence is medium
- we should continue because uncertainty remains

But the engine cannot reason over those assessments directly. As a result:

- routing still depends on prompt interpretation
- confidence caps are prose-only
- follow-up triggers are prose-only
- traces can say what happened, but cannot cleanly expose the structured decision that drove it

The proposed feature is a **typed assessment / decision gate system** that lets:

- the **agent** assess named dimensions and provide short rationales
- the **engine** apply declared rules such as caps, routing outcomes, and follow-up triggers

## Why this is the next biggest engine win

This feature has high leverage because it unlocks better workflow behavior across multiple domains:

- **MR review**
  - confidence assessment
  - boundary/context/routing caps
  - block vs continue / follow-up decisions
- **planning**
  - readiness gates
  - “good enough to implement?” checks
- **debugging / investigation**
  - next-step routing based on confidence, evidence, or ambiguity
- **future explainability**
  - cleaner traceability of why the engine chose to continue, loop, or downgrade confidence

Compared with other ideas:

- it is more powerful than a workflow previewer because it improves **runtime behavior**, not just authoring UX
- it is more foundational than note scaffolding because it changes **decision quality and engine expressiveness**

## Problem statement

WorkRail currently has a gap between:

- what workflows want to express in a structured way
- what the engine can actually enforce or reason over

Examples:

- a workflow wants to say “if boundary confidence is Low, final confidence cannot exceed Low”
- a workflow wants to say “if coverage confidence is Low, reopen targeted follow-up”
- a workflow wants to say “if readiness is Medium with one specific concern, continue; otherwise stop”

Today, those rules live in prompts and notes. That is useful, but weak. The rules are:

- not compiler-validated
- not engine-enforced
- not structurally visible in runtime traces
- prone to drift across workflows

## Current recommendation

Build this as a **real engine feature**, not a tiny helper.

The intended shape is:

- **typed assessment definitions**
- **engine-applied gate rules**
- **durable traceability**
- **compiler/schema support**
- **reusable built-in or repo-owned assessment shapes**

This should be the **smallest complete thing worth living with**, not a toy MVP.

## What success looks like

A strong first version should support all of the following:

### 1. Typed assessment definitions

Workflows can declare assessment structures with:

- a stable name / reference
- named dimensions
- allowed levels
- rationale requirements

Examples:

- `confidenceAssessment`
- `readinessAssessment`
- `riskAssessment`
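
As an illustration only, here is a TypeScript sketch of what a typed definition could look like. All names in it (`AssessmentDefinition`, `DimensionDefinition`, the `Low`/`Medium`/`High` levels, the `rationale` flag) are hypothetical and are not existing WorkRail types; choosing the real declaration shape is one of the open design questions.

```typescript
// Hypothetical authoring shape for a typed assessment definition.
// None of these names exist in WorkRail today; they illustrate the
// "stable name + named dimensions + allowed levels + rationale" idea.

interface DimensionDefinition {
  readonly id: string;
  readonly description: string;
}

interface AssessmentDefinition<Level extends string> {
  // Stable reference name, e.g. "confidenceAssessment".
  readonly id: string;
  readonly dimensions: readonly DimensionDefinition[];
  // Closed, ordered set of allowed levels (lowest first).
  readonly levels: readonly Level[];
  // Whether the agent must give a short rationale for each dimension.
  readonly rationale: "required" | "optional";
}

type ConfidenceLevel = "Low" | "Medium" | "High";

// Example: a confidence assessment using the boundary/coverage dimensions
// mentioned in this document.
const confidenceAssessment: AssessmentDefinition<ConfidenceLevel> = {
  id: "confidenceAssessment",
  dimensions: [
    { id: "boundary", description: "How well the change boundary is understood" },
    { id: "coverage", description: "How much of the relevant code was actually reviewed" },
  ],
  levels: ["Low", "Medium", "High"],
  rationale: "required",
};
```

Keeping `levels` as a closed, ordered list is what would let the engine compare levels (for example, to enforce a cap) without any free-form scoring.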

### 2. Engine-applied rules

The engine can consume a completed assessment and apply rules like:

- cap final confidence
- trigger follow-up
- continue vs stop
- reopen loop
- downgrade recommendation band
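
A minimal sketch of engine-side rule evaluation, assuming ordered `Low`/`Medium`/`High` levels (lowest first) and two hypothetical rule kinds, `cap` and `followUp`. The example rules encode two prose rules from the problem statement. `GateRule`, `evaluateGate`, and the follow-up target name are illustrative assumptions, not WorkRail APIs.

```typescript
// Hypothetical engine-side evaluation over a closed set of rule kinds.
type Level = "Low" | "Medium" | "High";
const LEVEL_ORDER: readonly Level[] = ["Low", "Medium", "High"]; // lowest first

// An agent-submitted assessment: one level per dimension id.
type Assessment = Readonly<Record<string, Level>>;

// Closed, discriminated set of rule kinds; no free-form expressions.
type GateRule =
  | { readonly kind: "cap"; readonly dimension: string; readonly when: Level; readonly capAt: Level }
  | { readonly kind: "followUp"; readonly dimension: string; readonly when: Level; readonly target: string };

interface GateOutcome {
  finalConfidenceCap: Level | null; // most restrictive cap that fired, if any
  followUps: string[];              // follow-up targets to reopen
}

const rank = (level: Level): number => LEVEL_ORDER.indexOf(level);

function evaluateGate(assessment: Assessment, rules: readonly GateRule[]): GateOutcome {
  const outcome: GateOutcome = { finalConfidenceCap: null, followUps: [] };
  for (const rule of rules) {
    // A rule fires only when its dimension was assessed at exactly `when`.
    if (assessment[rule.dimension] !== rule.when) continue;
    switch (rule.kind) {
      case "cap": {
        const current = outcome.finalConfidenceCap;
        if (current === null || rank(rule.capAt) < rank(current)) {
          outcome.finalConfidenceCap = rule.capAt;
        }
        break;
      }
      case "followUp":
        outcome.followUps.push(rule.target);
        break;
      default: {
        // Exhaustiveness check: a new rule kind fails to compile until handled.
        const unhandled: never = rule;
        throw new Error(`Unhandled rule: ${JSON.stringify(unhandled)}`);
      }
    }
  }
  return outcome;
}

// The two prose rules from the problem statement, expressed as data.
const rules: readonly GateRule[] = [
  { kind: "cap", dimension: "boundary", when: "Low", capAt: "Low" },
  { kind: "followUp", dimension: "coverage", when: "Low", target: "targeted-coverage-follow-up" },
];

const example = evaluateGate({ boundary: "Low", coverage: "Medium" }, rules);
// example → { finalConfidenceCap: "Low", followUps: [] }
```

The discriminated union plus the `never` check keeps the rule language closed: adding a rule kind is a deliberate, compiler-checked change rather than a new free-form expression, which matches the preference for typed, closed-set constructs over rule DSLs.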

### 3. Durable execution visibility

Assessment results should be visible in durable execution history and usable for projection/trace surfaces.

That likely means:

- structured persistence
- explicit event(s)
- console/trace visibility later
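
A sketch of one possible persistence model, assuming assessments are recorded as a new event kind in the append-only session event log and surfaced through a projection. The `assessment_recorded` kind, its fields, the `other` stand-in event, and `projectLatestAssessments` are hypothetical; the real event model is governed by the v2 core design locks.

```typescript
// Hypothetical event shapes for recording assessments in an append-only log.
type Level = "Low" | "Medium" | "High";

interface AssessmentRecordedEvent {
  readonly kind: "assessment_recorded";
  readonly nodeId: string;        // execution node that produced the assessment
  readonly assessmentId: string;  // e.g. "confidenceAssessment"
  readonly levels: Readonly<Record<string, Level>>;
  readonly rationales: Readonly<Record<string, string>>;
}

// Stand-in for every other event kind in the log.
interface OtherEvent {
  readonly kind: "other";
  readonly nodeId: string;
}

type SessionEvent = AssessmentRecordedEvent | OtherEvent;

// Projection: the latest assessment per node, derived purely by folding the
// log in append order. Events are never edited; a later assessment for the
// same node supersedes the earlier one in the projection only.
function projectLatestAssessments(
  events: readonly SessionEvent[],
): ReadonlyMap<string, AssessmentRecordedEvent> {
  const latest = new Map<string, AssessmentRecordedEvent>();
  for (const event of events) {
    if (event.kind === "assessment_recorded") latest.set(event.nodeId, event);
  }
  return latest;
}
```

Because the projection is a pure fold, a console or trace surface can rebuild it from history at any time instead of depending on mutable engine state.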

### 4. Compiler/schema validation

Workflow definitions should be validated so authors cannot:

- reference missing dimensions
- reference invalid levels
- define malformed gate rules
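
A sketch of these compile-time checks, assuming declarations list dimension ids and levels as plain strings. `validateGateRules` and both declaration interfaces are hypothetical. Because runtime/validation parity is mandatory, a real implementation would likely share this logic between the compiler and the engine rather than duplicate it in JSON Schema alone.

```typescript
// Hypothetical compile-time validation of an assessment declaration and its
// gate rules. Errors are returned as data so callers can report them all.
interface AssessmentDecl {
  readonly id: string;
  readonly dimensions: readonly string[];
  readonly levels: readonly string[];
}

interface RuleDecl {
  readonly dimension: string;
  readonly when: string;
  readonly capAt?: string;
}

function validateGateRules(decl: AssessmentDecl, rules: readonly RuleDecl[]): string[] {
  const errors: string[] = [];
  const dims = new Set(decl.dimensions);
  const levels = new Set(decl.levels);
  // Malformed declarations: no levels, or the same dimension declared twice.
  if (levels.size === 0) errors.push(`${decl.id}: no levels declared`);
  if (dims.size !== decl.dimensions.length) errors.push(`${decl.id}: duplicate dimension ids`);
  rules.forEach((rule, i) => {
    const where = `${decl.id}.rules[${i}]`;
    if (!dims.has(rule.dimension)) errors.push(`${where}: unknown dimension "${rule.dimension}"`);
    if (!levels.has(rule.when)) errors.push(`${where}: invalid level "${rule.when}" in "when"`);
    if (rule.capAt !== undefined && !levels.has(rule.capAt)) {
      errors.push(`${where}: invalid level "${rule.capAt}" in "capAt"`);
    }
  });
  return errors;
}

const errors = validateGateRules(
  { id: "confidenceAssessment", dimensions: ["boundary", "coverage"], levels: ["Low", "Medium", "High"] },
  [{ dimension: "scope", when: "Low" }, { dimension: "boundary", when: "Lowish" }],
);
// errors → [
//   'confidenceAssessment.rules[0]: unknown dimension "scope"',
//   'confidenceAssessment.rules[1]: invalid level "Lowish" in "when"',
// ]
```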

### 5. Reuse

The feature should support one of these options:

- inline assessment declarations
- reusable refs
- both

Whichever option is chosen, it must not force every workflow to invent its own one-off matrix shape.
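
A sketch of the "both" option, assuming a registry of reusable shapes keyed by name. `AssessmentDeclaration`, `resolveAssessment`, and the registry are hypothetical, and the `readinessAssessment` dimensions are placeholders.

```typescript
// Hypothetical resolution of an assessment declaration that is either inline
// or a reference to a reusable (built-in or repo-owned) definition.
interface AssessmentShape {
  readonly id: string;
  readonly dimensions: readonly string[];
  readonly levels: readonly string[];
}

type AssessmentDeclaration =
  | { readonly inline: AssessmentShape }
  | { readonly ref: string };

type Resolution =
  | { readonly ok: true; readonly shape: AssessmentShape }
  | { readonly ok: false; readonly error: string };

function resolveAssessment(
  decl: AssessmentDeclaration,
  registry: ReadonlyMap<string, AssessmentShape>,
): Resolution {
  if ("inline" in decl) {
    return { ok: true, shape: decl.inline };
  }
  const shape = registry.get(decl.ref);
  return shape
    ? { ok: true, shape }
    : { ok: false, error: `unknown assessment ref "${decl.ref}"` };
}

// Example registry with one built-in family (placeholder dimensions).
const builtIns = new Map<string, AssessmentShape>([
  [
    "readinessAssessment",
    { id: "readinessAssessment", dimensions: ["scope", "concerns"], levels: ["Low", "Medium", "High"] },
  ],
]);
```

Running the same resolver during validation would make an unknown ref a compile-time error instead of a runtime surprise.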

## Recommended product scope

### In scope for the first serious build

- a first-class assessment primitive such as:
  - `assessmentGate`
  - `assessmentRef`
  - or a closely related name
- assessment dimensions with a closed set of allowed levels
- short rationale capture per dimension
- rule evaluation in the engine
- durable persistence / traceability
- compile-time validation
- a few good built-in patterns:
  - confidence
  - readiness
  - risk

### Out of scope for this feature

Do **not** bundle these into the first implementation:

- a generic, arbitrary decision-table engine
- engine-injected note scaffolding
- large UI/console preview work
- overly open-ended value types
- “anything can assess anything” without clear type boundaries

These may come later, but they should not bloat the first solid implementation.

## Agent vs engine responsibility split

This split should remain sharp.

### Agent responsibilities

- assess each declared dimension
- choose one allowed level per dimension
- provide a short rationale
- submit the assessment result as part of workflow output / continuation

### Engine responsibilities

- validate the assessment shape
- validate levels and dimension names
- apply declared gate rules
- expose derived outcomes to later workflow behavior
- persist assessment facts durably
- record enough trace information to explain the decision path later

The engine should not replace the agent’s judgment. It should **formalize and enforce the consequences** of that judgment.
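
The first two engine responsibilities can be sketched as a pure check on the agent's submission, run before any gate rule. The payload shape (dimension id mapped to `{ level, rationale }`), `validateSubmission`, and the `maxRationaleLength` limit are hypothetical.

```typescript
// Hypothetical runtime check of an agent-submitted assessment, run before any
// gate rule is evaluated.
interface DimensionSubmission {
  readonly level: string;
  readonly rationale: string;
}

interface ExpectedShape {
  readonly dimensions: readonly string[];
  readonly levels: readonly string[];
  readonly maxRationaleLength: number; // "short rationale" as a hard limit
}

function validateSubmission(
  shape: ExpectedShape,
  submission: Readonly<Record<string, DimensionSubmission>>,
): string[] {
  const errors: string[] = [];
  // Every declared dimension must be assessed.
  for (const dim of shape.dimensions) {
    if (!(dim in submission)) errors.push(`missing dimension "${dim}"`);
  }
  for (const [dim, entry] of Object.entries(submission)) {
    if (!shape.dimensions.includes(dim)) {
      errors.push(`undeclared dimension "${dim}"`);
      continue;
    }
    if (!shape.levels.includes(entry.level)) {
      errors.push(`"${dim}": level "${entry.level}" is not allowed`);
    }
    const rationale = entry.rationale.trim();
    if (rationale.length === 0) {
      errors.push(`"${dim}": rationale is required`);
    } else if (rationale.length > shape.maxRationaleLength) {
      errors.push(`"${dim}": rationale longer than ${shape.maxRationaleLength} characters`);
    }
  }
  return errors;
}
```

Rejecting a bad submission here, rather than coercing it, keeps the split sharp: the engine never picks a level the agent did not choose.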

## What this is not

This is **not**:

- a generic policy engine for all workflow logic
- a replacement for prompts
- a free-form confidence essay system
- a note-formatting feature

It is a **structured decision layer** for a small class of decisions that are currently trapped in prose.

## Existing repo context

The idea already exists in the backlog:

- `docs/ideas/backlog.md`
  - **Native assessment / decision gates for workflows**
  - **Engine-injected note scaffolding** is now split out as a related follow-on idea

The MR review redesign work is one of the main reasons this feature is now compelling:

- `docs/plans/mr-review-workflow-redesign.md`

That doc now has a narrowed next slice with:

- compact confidence dimensions
- routing minimalism
- explicit engine-compatibility constraints

The main takeaway is:

- workflows want structured confidence/routing
- the current engine still forces those ideas to live in prompts

## Reading order for a new agent

If you are picking this up fresh, read in this order.

### 1. Repo workflow and operating rules

- `AGENTS.md`

Pay attention to:

- deliberate progression
- planning-doc expectations
- verification rules
- release rules

### 2. Normative execution semantics

- `docs/reference/workflow-execution-contract.md`

Focus on:

- token-driven execution
- continuation behavior
- blocked / continue semantics
- where optional capabilities and durable state already fit

### 3. Core durable engine design locks

- `docs/design/v2-core-design-locks.md`

Focus on:

- append-only truth model
- projections and durable state shape
- event philosophy
- anything that constrains new execution events or derived state

### 4. Workflow validation philosophy

- `docs/plans/workflow-validation-design.md`

Focus on:

- runtime/validation parity
- shared resolution logic
- why validation must mirror real engine behavior

This feature must not create a second “looks valid but runtime disagrees” layer.

### 5. Current assessment-gate backlog note

- `docs/ideas/backlog.md`

Find:

- **Native assessment / decision gates for workflows**

That captures the current product intuition and open questions.

### 6. MR review redesign context

- `docs/plans/mr-review-workflow-redesign.md`

Focus on:

- the narrowed implementation slice
- compact confidence model
- why structured routing/caps matter

## Code-reading path

### Authoring / workflow definition types

Start here:

- `src/types/workflow-definition.ts`

This is the key place to understand:

- what workflow definitions can express today
- existing step/loop/output-contract shapes
- where a new authoring primitive would naturally live

### Validation layer

Read:

- `src/application/services/validation-engine.ts`
- `spec/workflow.schema.json`

You need to understand both:

- runtime-side validation expectations
- schema-level authoring support

This repo has already hit real schema/compiler mismatches, so this feature must be introduced carefully and consistently.

### Compiler / template path

Read:

- `src/application/services/compiler/template-registry.ts`

Use this to understand how reusable authoring constructs are currently expanded and validated, and whether assessment gates should participate in compilation directly.

### Engine/runtime surfaces

Read:

- `src/engine/index.ts`
- `src/engine/types.ts`
- `src/engine/engine-factory.ts`

The goal is to find the right place for:

- assessment results
- derived outcomes
- execution integration

### MCP output / trace surfaces

Read:

- `src/mcp/step-content-envelope.ts`
- `src/mcp/v2-response-formatter.ts`
- `src/mcp/output-schemas.ts`

Assessment gates likely do not need first-class agent-facing prose immediately, but they should fit cleanly into the existing response/contract model.

### Projections / durable views

Read:

- `src/v2/projections/`

Especially anything around:

- run status
- preferences
- DAG state
- node outputs

You want to understand how a structured assessment result would later appear in durable projections.

## Design constraints that matter

### 1. Runtime/validation parity is mandatory

Do not design an assessment feature that validates in the schema but is not truly enforced at runtime, or the reverse.

### 2. Keep the engine/agent split clean

The agent assesses.

The engine applies gate rules.

Do not let the engine become a generic policy brain.

### 3. Avoid giant generality

A real feature does **not** mean a universal decision language.

For the first serious implementation, prefer:

- typed, bounded, closed-set constructs

over:

- open-ended user-programmable rule DSLs

### 4. Preserve traceability

One of the biggest benefits of this feature is explainability.

If the engine applies a cap or follow-up trigger from an assessment, that should be traceable later.
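
One way to keep applied caps and follow-up triggers traceable is to persist, with each derived outcome, the rule that fired and the assessed value that matched. A sketch under that assumption; `TracedOutcome`, `explainOutcomes`, the rule id, and the rationale text are hypothetical.

```typescript
// Hypothetical provenance record: every derived outcome names its cause.
interface TracedOutcome {
  readonly effect: "capConfidence" | "triggerFollowUp";
  readonly value: string;          // e.g. the cap level or follow-up target
  readonly ruleId: string;         // declared rule that fired
  readonly dimension: string;      // assessed dimension that matched
  readonly assessedLevel: string;  // the level the agent chose
  readonly rationale: string;      // the agent's rationale for that level
}

// Render a human-readable explanation from durable outcome records.
function explainOutcomes(outcomes: readonly TracedOutcome[]): string[] {
  return outcomes.map(
    (o) =>
      `${o.effect}=${o.value} because rule "${o.ruleId}" matched ` +
      `${o.dimension}=${o.assessedLevel} ("${o.rationale}")`,
  );
}

const trace = explainOutcomes([
  {
    effect: "capConfidence",
    value: "Low",
    ruleId: "boundary-low-caps-final",
    dimension: "boundary",
    assessedLevel: "Low",
    rationale: "Callers outside the diff were not inspected",
  },
]);
// trace[0] → 'capConfidence=Low because rule "boundary-low-caps-final" matched boundary=Low ("Callers outside the diff were not inspected")'
```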

### 5. Keep note scaffolding separate

This came up during design discussion, but it is a separate feature.

Do not quietly smuggle note-structure requirements into assessment gates just because they are adjacent concepts.

## Recommended first design pass

The first real design pass should answer these questions explicitly.

### Authoring shape

- Is the core primitive inline, referenced, or both?
- What is the minimum stable declaration shape?
- How are dimensions declared?
- How are allowed levels declared?

### Runtime behavior

- When does the engine evaluate the gate?
- What inputs does it consume?
- What outputs does it produce for later steps/conditions?
- How are gate outcomes persisted?

### Validation

- What can schema validate?
- What must runtime validate?
- How do we prevent authoring/runtime drift?

### Traceability

- What event(s) or durable records are emitted?
- What do projections need to expose later?
- What should console/trace surfaces eventually show?

### Reuse

- What built-in assessment families should exist first?
- What does inline-only authoring lose?
- What should refs buy us?

## Good first built-in families

If built-ins are included in the first proper version, the best candidates are:

- **confidence assessment**
- **readiness assessment**
- **risk assessment**

These are broad enough to matter across workflows, but still conceptually tight.

## Suggested non-goals for the first implementation

Keep these explicitly out unless the user directs otherwise:

- arbitrary free-form scoring systems
- weighted, math-heavy assessment engines
- bundled UI work for previewing assessments
- note scaffolding
- generalized business-rule language

## Known risks

### Over-generalization

The biggest risk is building something too generic too early.

That would likely:

- slow adoption
- complicate schema/compiler work
- blur the engine/agent boundary

### Validation drift

If the schema, compiler, and runtime do not all agree on the feature shape, confidence in the system will drop quickly.

### Trace debt

If the engine uses assessment outcomes internally but they are not visible in durable traces, the feature will feel magical and be hard to debug.

## A good end state

By the end of the first solid implementation, a workflow author should be able to say:

- here are the dimensions
- here are the allowed levels
- here are the gate rules

And the engine should be able to:

- validate the shape
- accept the agent’s assessment
- apply the gate outcomes
- persist the result
- later explain what happened

## Immediate next step for a new agent

Do **not** jump straight to implementation.

First:

1. read the docs/code listed above
2. write a compact design note or plan that proposes:
   - authoring shape
   - runtime behavior
   - validation shape
   - persistence/trace model
   - non-goals
3. compare at least two design options:
   - narrower typed gate
   - slightly more reusable ref-based model
4. bring the tradeoffs back to the user before coding

That is the right restart point.
@@ -0,0 +1,151 @@
# Content Coherence and Linked References

> **Active initiative plan**
>
> Canonical design and slice plan for increasing coherence across WorkRail's content delivery seams
> and introducing workflow-declared linked references.

**Status**: Implemented (slices 1–4 and 6 complete; Slice 5 deferred)
**Date**: 2026-03-22
**Completed**: 2026-03-22

---

## Problem

WorkRail has grown seven independent mechanisms (seams) for injecting content into what the agent sees:

| Seam | Phase | Declared on | Override mechanism |
|---|---|---|---|
| Extension points / bindings | Compile-time | `WorkflowDefinition.extensionPoints` | `.workrail/bindings.json` |
| Features | Compile-time | `WorkflowDefinition.features` | None (closed set) |
| Refs (`wr.refs.*`) | Compile-time | `promptBlocks` parts | None (closed set) |
| Context templates (`{{varName}}`) | Render-time | Inline in prompt text | Session context |
| Prompt fragments | Render-time | `WorkflowStepDefinition.promptFragments` | Session context conditions |
| Response supplements | Transport-time | Hardcoded in `response-supplements.ts` | None |
| `metaGuidance` | Always visible | `WorkflowDefinition.metaGuidance` | None |

Each seam was well-motivated in isolation, but they share no vocabulary, no resolution protocol, and no unified introspection surface. An author deciding "where does this content belong?" must understand all of them and their interactions.

Additionally, there is no first-class way for a workflow to point at authoritative external documents (schemas, authoring specs, team guides, playbooks) without inlining content into the prompt or `metaGuidance` strings.

## Goal

1. Introduce a **typed intermediate representation** (`StepContentEnvelope`) that makes the categories of agent-visible content explicit in the type system, replacing implicit string concatenation in the prompt renderer.
2. Introduce **workflow-declared linked references** as a new declaration surface for external supporting documents.
3. Make the boundary between the compiler pipeline (compile-time, deterministic, hashed) and the render/transport pipeline (runtime, session-aware) explicit and typed.

## Non-goals

- Grand unification of all seams into one abstraction. The compiler and render pipelines serve different purposes and should stay distinct.
- Moving prompt fragments out of the prompt string. Fragments are authored prompt content; they belong inline. The envelope documents what matched, but the text stays in `authoredPrompt`.
- Content inlining for references. V1 references are pointers only. The agent reads the file itself if needed.
- User-defined refs replacing the closed `wr.refs.*` set. Workflow-declared references are a separate concept.

## Key design decisions

### StepContentEnvelope

The prompt renderer currently returns `StepMetadata` (stepId, title, prompt string, agentRole, requireConfirmation). The response formatter receives the final response object and relies on shape detection to interpret it. Between the two, content categories are implicit.

The envelope makes them explicit:

```typescript
// `Requirement` and `ResolvedReference` are defined elsewhere in the codebase.
interface StepContentEnvelope {
  // Authored prompt text; matched prompt fragments stay inline here.
  readonly authoredPrompt: string;
  // IDs of matched prompt fragments: records what matched, while the text stays in authoredPrompt.
  readonly matchedFragmentIds: readonly string[];
  readonly requirements: readonly Requirement[];
  readonly loopBanner: string | null;
  readonly recoveryContext: string | null;
  // Workflow-declared linked references, resolved at session start.
  readonly references: readonly ResolvedReference[];
}
```

The handler assembles the envelope from renderer output plus handler-level knowledge (binding drift, preferences, blockers). The `V2ExecutionRenderEnvelope` grows from `{ response, lifecycle }` to `{ response, lifecycle, contentEnvelope }`. The formatter consumes the envelope instead of relying on ad-hoc shape detection.
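
A sketch of that assembly step, using simplified stand-ins for the real types. `RendererOutput`, `HandlerKnowledge`, `assembleContentEnvelope`, `formatPrompt`, the string-typed `Requirement`, and the section order in `formatPrompt` are all assumptions; which side (renderer or handler) supplies each field is also assumed.

```typescript
// Simplified stand-ins: the real Requirement / ResolvedReference types differ.
type Requirement = string;
interface ResolvedReference {
  readonly id: string;
  readonly resolvedPath: string;
}

interface StepContentEnvelope {
  readonly authoredPrompt: string;
  readonly matchedFragmentIds: readonly string[];
  readonly requirements: readonly Requirement[];
  readonly loopBanner: string | null;
  readonly recoveryContext: string | null;
  readonly references: readonly ResolvedReference[];
}

// Hypothetical renderer output: authored prompt with fragments already inline.
interface RendererOutput {
  readonly prompt: string;
  readonly matchedFragmentIds: readonly string[];
  readonly loopBanner: string | null;
}

// Hypothetical handler-level knowledge that the renderer does not see.
interface HandlerKnowledge {
  readonly requirements: readonly Requirement[];
  readonly recoveryContext: string | null;
  readonly references: readonly ResolvedReference[];
}

function assembleContentEnvelope(r: RendererOutput, h: HandlerKnowledge): StepContentEnvelope {
  return {
    authoredPrompt: r.prompt,
    matchedFragmentIds: r.matchedFragmentIds,
    requirements: h.requirements,
    loopBanner: r.loopBanner,
    recoveryContext: h.recoveryContext,
    references: h.references,
  };
}

// The formatter reads named categories instead of detecting response shape.
function formatPrompt(env: StepContentEnvelope): string {
  const sections: string[] = [];
  if (env.loopBanner !== null) sections.push(env.loopBanner);
  sections.push(env.authoredPrompt);
  if (env.recoveryContext !== null) sections.push(env.recoveryContext);
  for (const ref of env.references) sections.push(`Reference ${ref.id}: ${ref.resolvedPath}`);
  return sections.join("\n\n");
}
```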

### Linked references

A reference declaration on `WorkflowDefinition`:

```typescript
interface WorkflowReference {
  readonly id: string;            // unique within the workflow
  readonly title: string;
  readonly source: string;        // path or URI
  readonly purpose: string;
  readonly authoritative: boolean;
}
```

Reference handling splits into two phases:

- **Compile-time** (pure): validate declarations structurally (unique IDs, non-empty paths, valid shapes). Include declarations in the workflow hash.
- **Start-time** (I/O): resolve paths against the workspace, validate existence, and capture resolved references as observation events. This follows the existing pattern in `resolveWorkspaceAnchors` in `start.ts`.

Only workflow-declared references participate in the hash. Project-attached references (future) are handled like binding overrides: captured at session start and drift-detected against current state.
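
The two phases can be sketched as two functions. `validateReferenceDeclarations`, `resolveReferencesAtStart`, the injected `WorkspaceIo` interface, and this `ResolvedReference` shape are hypothetical; the real start-time code follows `resolveWorkspaceAnchors` in `start.ts`, which is not reproduced here. URI sources and writing the observation events are omitted.

```typescript
interface WorkflowReference {
  readonly id: string;
  readonly title: string;
  readonly source: string; // path or URI
  readonly purpose: string;
  readonly authoritative: boolean;
}

// Phase 1 (compile-time, pure): structural checks only, no file system access.
function validateReferenceDeclarations(refs: readonly WorkflowReference[]): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  for (const ref of refs) {
    if (ref.id.trim() === "") errors.push("reference with empty id");
    else if (seen.has(ref.id)) errors.push(`duplicate reference id "${ref.id}"`);
    seen.add(ref.id);
    if (ref.source.trim() === "") errors.push(`reference "${ref.id}": empty source`);
  }
  return errors;
}

// Phase 2 (start-time, I/O): I/O is injected so the resolution policy stays testable.
interface WorkspaceIo {
  resolve(workspaceRoot: string, source: string): string;
  exists(absolutePath: string): boolean;
}

interface ResolvedReference {
  readonly id: string;
  readonly resolvedPath: string;
  readonly exists: boolean;
}

function resolveReferencesAtStart(
  refs: readonly WorkflowReference[],
  workspaceRoot: string,
  io: WorkspaceIo,
): ResolvedReference[] {
  return refs.map((ref) => {
    const resolvedPath = io.resolve(workspaceRoot, ref.source);
    return { id: ref.id, resolvedPath, exists: io.exists(resolvedPath) };
  });
}
```

Keeping phase 1 free of I/O is what makes it safe to fold declarations into the workflow hash, while phase 2 remains the only place that touches the workspace.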
+
+ ### Data flow awareness
+
+ The Zod schema boundary (`V2StartWorkflowOutputSchema.parse`) knows only about `pending.prompt`, as a plain string. The envelope therefore travels as a parallel channel through the render envelope wrapper, not through the Zod-validated response. For backward compatibility, `pending.prompt` is serialized from the envelope's `authoredPrompt`.
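
A sketch of the parallel channel using simplified stand-ins: the hypothetical `parseStartOutput` plays the role of `V2StartWorkflowOutputSchema.parse` (it only checks that `pending.prompt` is a string), the lifecycle is reduced to a string, and `buildRenderEnvelope` is a hypothetical name. The response that crosses the schema boundary carries only the prompt string, while the typed envelope rides alongside it.

```typescript
// Simplified stand-ins for illustration only.
interface StepContentEnvelope {
  readonly authoredPrompt: string;
  readonly matchedFragmentIds: readonly string[];
}

// All the schema-validated response knows about the step content.
interface StartWorkflowOutput {
  readonly pending: { readonly prompt: string };
}

interface V2ExecutionRenderEnvelope {
  readonly response: StartWorkflowOutput;
  readonly lifecycle: string; // stand-in for the real lifecycle type
  readonly contentEnvelope: StepContentEnvelope;
}

// Stand-in for V2StartWorkflowOutputSchema.parse: checks only that pending.prompt is a string.
function parseStartOutput(value: unknown): StartWorkflowOutput {
  const prompt = (value as { pending?: { prompt?: unknown } } | null)?.pending?.prompt;
  if (typeof prompt !== "string") {
    throw new Error("pending.prompt must be a string");
  }
  return { pending: { prompt } };
}

function buildRenderEnvelope(content: StepContentEnvelope, lifecycle: string): V2ExecutionRenderEnvelope {
  // Backward compatibility: pending.prompt is serialized from authoredPrompt.
  const response = parseStartOutput({ pending: { prompt: content.authoredPrompt } });
  // The typed envelope bypasses the schema and travels next to the validated response.
  return { response, lifecycle, contentEnvelope: content };
}
```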
+
+ ### metaGuidance status
+
+ `metaGuidance` is declared on `WorkflowDefinition`, but in the v2 clean-format path it is not delivered to the agent during execution; it is only visible in `inspect_workflow` output. Some metaGuidance entries (e.g. "follow this coding guide") are references in disguise. This initiative should either clarify metaGuidance's delivery semantics or deprecate it in favor of references plus the existing prompt composition primitives.
+
+ ## Constraints
+
+ - Prompt fragments must not move out of the prompt string. They participate in the authored prompt and affect recovery budget calculations (`RECOVERY_BUDGET_BYTES`). Moving them would change prompt hashes and break rehydrate for existing sessions.
+ - Reference content must not be inlined at compile time. Referenced files change independently of the workflow, so inlining their content would make workflow hashes unstable.
+ - Project-attached references must not participate in the workflow hash. The same workflow in two projects with different local refs must produce the same hash. Project refs are observation-level, not definition-level.
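
The hashing constraints can be illustrated with a small sketch using Node's `crypto` module. `hashReferenceDeclarations` is a hypothetical helper, not the compiler's actual hashing code: it hashes only the declaration fields in a fixed order and never opens the referenced files, and callers pass only workflow-declared references, so local project references cannot affect the result. In this sketch, declaration order is part of the hash.

```typescript
import { createHash } from "node:crypto";

interface WorkflowReference {
  readonly id: string;
  readonly title: string;
  readonly source: string; // path or URI
  readonly purpose: string;
  readonly authoritative: boolean;
}

// Hashes declaration fields only, in a fixed field order. The referenced files
// are never read, so editing them cannot change the hash. Project-attached
// references are never passed in, so they cannot change it either.
function hashReferenceDeclarations(workflowDeclared: readonly WorkflowReference[]): string {
  const canonical = workflowDeclared.map((r) => [r.id, r.title, r.source, r.purpose, r.authoritative]);
  return createHash("sha256").update(JSON.stringify(canonical)).digest("hex");
}
```

Serializing to positional arrays avoids depending on object key order, which keeps the input to the hash stable.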
+
+ ## Slice plan
+
+ > Slices 1–4 and 6 are implemented. Slice 5 (project-attached references) is deferred as future work.
+
+ ### Slice 1: StepContentEnvelope type and render envelope extension (done)
+
+ Define the `StepContentEnvelope` type and extend `V2ExecutionRenderEnvelope` to carry it. Have the handler assemble it from renderer output plus handler-level knowledge, and have the formatter consume it. **No behavioral change**: the formatter produces identical output, sourced from a typed representation instead of ad-hoc shape detection.
+
+ **Key files**: `render-envelope.ts`, `prompt-renderer.ts`, `v2-response-formatter.ts`, `v2-execution/start.ts`, `v2-execution/continue-rehydrate.ts`, `v2-execution/continue-advance.ts`
+
+ ### Slice 2: Reference declarations (done)
+
+ Add `references` as an optional array on `WorkflowDefinition` and in `workflow.schema.json`. Add structural validation to the validation engine (unique IDs, non-empty paths). The compiler includes the declarations in the workflow hash, and `inspect_workflow` output surfaces them.
+
+ **Key files**: `workflow-definition.ts`, `workflow.schema.json`, `validation-engine.ts`, `v2-workflow.ts` (inspect handler)
+
+ ### Slice 3: Reference resolution at start-time (done)
+
+ An I/O phase in `start_workflow` validates reference paths against the workspace and stores the resolved references as observation events. The handler populates the envelope's reference section from them.
+
+ **Key files**: `v2-execution/start.ts`, `v2-workspace-resolution.ts`, observation event schema
+
+ ### Slice 4: Reference delivery (done)
+
+ Formatter renders resolved references as a dedicated MCP content item on `start` (full set) and `rehydrate` (compact reminder). Separate from the authored prompt and from supplements.
+
+ **Key files**: `v2-response-formatter.ts`, `handler-factory.ts` (toMcpResult)
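
One way the formatter could build that separate content item is sketched below. The `{ type: "text", text }` shape is MCP's text content item; `renderReferencesItem`, `toContentItems`, `DeliveryMode`, the header wording, and the compact reminder format are hypothetical and not the actual output of `v2-response-formatter.ts`.

```typescript
// MCP text content item: { type: "text", text }.
interface TextContent {
  readonly type: "text";
  readonly text: string;
}

// Hypothetical resolved-reference shape used by the formatter.
interface ResolvedReference {
  readonly id: string;
  readonly title: string;
  readonly resolvedPath: string;
  readonly purpose: string;
  readonly authoritative: boolean;
}

type DeliveryMode = "start" | "rehydrate";

// start gets the full set (purpose and authority flag); rehydrate gets a compact reminder.
function renderReferencesItem(refs: readonly ResolvedReference[], mode: DeliveryMode): TextContent | null {
  if (refs.length === 0) return null;
  const header = mode === "start" ? "Workflow references:" : "Reminder: workflow references";
  const lines = refs.map((r) =>
    mode === "start"
      ? `- ${r.title} (${r.resolvedPath})${r.authoritative ? " [authoritative]" : ""}: ${r.purpose}`
      : `- ${r.title} (${r.resolvedPath})`,
  );
  return { type: "text", text: [header, ...lines].join("\n") };
}

// The authored prompt stays its own item; references are never appended to it.
function toContentItems(authoredPrompt: string, refs: readonly ResolvedReference[], mode: DeliveryMode): TextContent[] {
  const items: TextContent[] = [{ type: "text", text: authoredPrompt }];
  const refItem = renderReferencesItem(refs, mode);
  if (refItem !== null) items.push(refItem);
  return items;
}
```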
+
+ ### Slice 5: Project-attached references (deferred — future work)
+
+ References listed in `.workrail/references.json` are merged with workflow-declared references at start-time. A provenance field (`workflow_declared` | `project_attached`) distinguishes their origin. Drift is detected by comparing observations (the same pattern as binding drift in `binding-drift.ts`).
+
+ **Key files**: new `reference-registry.ts`, `v2-execution/continue-rehydrate.ts` (drift detection), `v2-response-formatter.ts` (drift warnings)
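
The merge could look like the sketch below. `AttributedReference` and `mergeReferences` are hypothetical names, and the collision rule (a workflow-declared reference shadows a project-attached one with the same ID) is an assumption; the plan does not specify one. Loading `.workrail/references.json` is left out.

```typescript
type Provenance = "workflow_declared" | "project_attached";

interface WorkflowReference {
  readonly id: string;
  readonly title: string;
  readonly source: string; // path or URI
  readonly purpose: string;
  readonly authoritative: boolean;
}

// Hypothetical merged shape: a declaration plus where it came from.
interface AttributedReference extends WorkflowReference {
  readonly provenance: Provenance;
}

// Assumed collision rule: the first entry with a given ID wins, and
// workflow-declared entries are added first.
function mergeReferences(
  workflowDeclared: readonly WorkflowReference[],
  projectAttached: readonly WorkflowReference[],
): AttributedReference[] {
  const merged: AttributedReference[] = workflowDeclared.map(
    (r): AttributedReference => ({ ...r, provenance: "workflow_declared" }),
  );
  const seen = new Set(merged.map((r) => r.id));
  for (const r of projectAttached) {
    if (seen.has(r.id)) continue; // shadowed by an earlier entry
    merged.push({ ...r, provenance: "project_attached" });
    seen.add(r.id);
  }
  return merged;
}
```

Tagging provenance at merge time lets hashing and drift detection filter by origin instead of re-deriving it.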
+
+ ### Slice 6: metaGuidance clarification (done)
+
+ Either make metaGuidance delivery explicit through the envelope (a supplement or dedicated content section with clear lifecycle semantics) or deprecate it with a migration path to references + prompt composition.
+
+ **Key files**: `workflow-definition.ts`, `prompt-renderer.ts`, `v2-response-formatter.ts`, authoring spec, authoring docs
+
+ ## Relationship to other initiatives
+
+ - **Composition and middleware engine** (agentic-orchestration-roadmap.md Phase 2): the StepContentEnvelope provides a typed surface for a future assembler/middleware engine to populate, so that engine fills typed fields instead of producing raw strings.
+ - **Authorable response supplements** (agentic-orchestration-roadmap.md backlog): the envelope gives supplements a typed home. Authorable supplements would declare their content in workflow JSON and flow through the envelope rather than being hardcoded in `response-supplements.ts`.
+ - **Clean response formatting** (active partial): this initiative completes the boundary clarification between authored prompts, system-injected content, and delivery framing by making each category typed and inspectable.
+
+ ## Open questions
+
+ - Should references support URI schemes beyond file paths (e.g. `https://`, `wr.refs.*`)? Deferred until there is feedback on v1.
+ - Should the envelope carry the full supplement specs or just the rendered text? Leaning toward rendered text to keep the formatter's presentation logic in one place.
+ - Should drift detection for project-attached references be blocking or advisory? Leaning advisory (same as binding drift).