@exaudeus/workrail 3.27.0 → 3.29.0
- package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
- package/dist/console/index.html +1 -1
- package/dist/manifest.json +3 -3
- package/docs/README.md +57 -0
- package/docs/adrs/001-hybrid-storage-backend.md +38 -0
- package/docs/adrs/002-four-layer-context-classification.md +38 -0
- package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
- package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
- package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
- package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
- package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
- package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
- package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
- package/docs/adrs/010-release-pipeline.md +89 -0
- package/docs/architecture/README.md +7 -0
- package/docs/architecture/refactor-audit.md +364 -0
- package/docs/authoring-v2.md +527 -0
- package/docs/authoring.md +873 -0
- package/docs/changelog-recent.md +201 -0
- package/docs/configuration.md +505 -0
- package/docs/ctc-mcp-proposal.md +518 -0
- package/docs/design/README.md +22 -0
- package/docs/design/agent-cascade-protocol.md +96 -0
- package/docs/design/autonomous-console-design-candidates.md +253 -0
- package/docs/design/autonomous-console-design-review.md +111 -0
- package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
- package/docs/design/claude-code-source-deep-dive.md +713 -0
- package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
- package/docs/design/console-execution-trace-candidates-final.md +160 -0
- package/docs/design/console-execution-trace-candidates.md +211 -0
- package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
- package/docs/design/console-execution-trace-design-review.md +74 -0
- package/docs/design/console-execution-trace-discovery.md +394 -0
- package/docs/design/console-execution-trace-final-review.md +77 -0
- package/docs/design/console-execution-trace-review.md +92 -0
- package/docs/design/console-performance-discovery.md +415 -0
- package/docs/design/console-ui-backlog.md +280 -0
- package/docs/design/daemon-architecture-discovery.md +853 -0
- package/docs/design/daemon-design-candidates.md +318 -0
- package/docs/design/daemon-design-review-findings.md +119 -0
- package/docs/design/daemon-engine-design-candidates.md +210 -0
- package/docs/design/daemon-engine-design-review.md +131 -0
- package/docs/design/daemon-execution-engine-discovery.md +280 -0
- package/docs/design/daemon-gap-analysis.md +554 -0
- package/docs/design/daemon-owns-console-plan.md +168 -0
- package/docs/design/daemon-owns-console-review.md +91 -0
- package/docs/design/daemon-owns-console.md +195 -0
- package/docs/design/data-model-erd.md +11 -0
- package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
- package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
- package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
- package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
- package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
- package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
- package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
- package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
- package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
- package/docs/design/list-workflows-latency-fix-plan.md +128 -0
- package/docs/design/list-workflows-latency-fix-review.md +55 -0
- package/docs/design/list-workflows-latency-fix.md +109 -0
- package/docs/design/native-context-management-api.md +11 -0
- package/docs/design/performance-sweep-2026-04.md +96 -0
- package/docs/design/routines-guide.md +219 -0
- package/docs/design/sequence-diagrams.md +11 -0
- package/docs/design/subagent-design-principles.md +220 -0
- package/docs/design/temporal-patterns-design-candidates.md +312 -0
- package/docs/design/temporal-patterns-design-review-findings.md +163 -0
- package/docs/design/test-isolation-from-config-file.md +335 -0
- package/docs/design/v2-core-design-locks.md +2746 -0
- package/docs/design/v2-lock-registry.json +734 -0
- package/docs/design/workflow-authoring-v2.md +1044 -0
- package/docs/design/workflow-docs-spec.md +218 -0
- package/docs/design/workflow-extension-points.md +687 -0
- package/docs/design/workrail-auto-trigger-system.md +359 -0
- package/docs/design/workrail-config-file-discovery.md +513 -0
- package/docs/docker.md +110 -0
- package/docs/generated/v2-lock-closure-plan.md +26 -0
- package/docs/generated/v2-lock-coverage.json +797 -0
- package/docs/generated/v2-lock-coverage.md +177 -0
- package/docs/ideas/backlog.md +3927 -0
- package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
- package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
- package/docs/ideas/implementation_plan.md +249 -0
- package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
- package/docs/implementation/02-architecture.md +316 -0
- package/docs/implementation/04-testing-strategy.md +124 -0
- package/docs/implementation/09-simple-workflow-guide.md +835 -0
- package/docs/implementation/13-advanced-validation-guide.md +874 -0
- package/docs/implementation/README.md +21 -0
- package/docs/integrations/claude-code.md +300 -0
- package/docs/integrations/firebender.md +315 -0
- package/docs/migration/v0.1.0.md +147 -0
- package/docs/naming-conventions.md +45 -0
- package/docs/planning/README.md +104 -0
- package/docs/planning/github-ticketing-playbook.md +195 -0
- package/docs/plans/README.md +24 -0
- package/docs/plans/agent-managed-ticketing-design.md +605 -0
- package/docs/plans/agentic-orchestration-roadmap.md +112 -0
- package/docs/plans/assessment-gates-engine-handoff.md +536 -0
- package/docs/plans/content-coherence-and-references.md +151 -0
- package/docs/plans/library-extraction-plan.md +340 -0
- package/docs/plans/mr-review-workflow-redesign.md +1451 -0
- package/docs/plans/native-context-management-epic.md +11 -0
- package/docs/plans/perf-fixes-design-candidates.md +225 -0
- package/docs/plans/perf-fixes-design-review-findings.md +61 -0
- package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
- package/docs/plans/perf-fixes-new-issues-review.md +110 -0
- package/docs/plans/prompt-fragments.md +53 -0
- package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
- package/docs/plans/ui-ux-workflow-discovery.md +100 -0
- package/docs/plans/ui-ux-workflow-review.md +48 -0
- package/docs/plans/v2-followup-enhancements.md +587 -0
- package/docs/plans/workflow-categories-candidates.md +105 -0
- package/docs/plans/workflow-categories-discovery.md +110 -0
- package/docs/plans/workflow-categories-review.md +51 -0
- package/docs/plans/workflow-discovery-model-candidates.md +94 -0
- package/docs/plans/workflow-discovery-model-discovery.md +74 -0
- package/docs/plans/workflow-discovery-model-review.md +48 -0
- package/docs/plans/workflow-source-setup-phase-1.md +245 -0
- package/docs/plans/workflow-source-setup-phase-2.md +361 -0
- package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
- package/docs/plans/workflow-staleness-detection-review.md +58 -0
- package/docs/plans/workflow-staleness-detection.md +80 -0
- package/docs/plans/workflow-v2-design.md +69 -0
- package/docs/plans/workflow-v2-roadmap.md +74 -0
- package/docs/plans/workflow-validation-design.md +98 -0
- package/docs/plans/workflow-validation-roadmap.md +108 -0
- package/docs/plans/workrail-platform-vision.md +420 -0
- package/docs/reference/agent-context-cleaner-snippet.md +94 -0
- package/docs/reference/agent-context-guidance.md +140 -0
- package/docs/reference/context-optimization.md +284 -0
- package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
- package/docs/reference/example-workflow-repository-template/README.md +268 -0
- package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
- package/docs/reference/external-workflow-repositories.md +916 -0
- package/docs/reference/feature-flags-architecture.md +472 -0
- package/docs/reference/feature-flags.md +349 -0
- package/docs/reference/god-tier-workflow-validation.md +272 -0
- package/docs/reference/loop-optimization.md +209 -0
- package/docs/reference/loop-validation.md +176 -0
- package/docs/reference/loops.md +465 -0
- package/docs/reference/mcp-platform-constraints.md +59 -0
- package/docs/reference/recovery.md +88 -0
- package/docs/reference/releases.md +177 -0
- package/docs/reference/troubleshooting.md +105 -0
- package/docs/reference/workflow-execution-contract.md +998 -0
- package/docs/roadmap/README.md +22 -0
- package/docs/roadmap/legacy-planning-status.md +103 -0
- package/docs/roadmap/now-next-later.md +70 -0
- package/docs/roadmap/open-work-inventory.md +389 -0
- package/docs/tickets/README.md +39 -0
- package/docs/tickets/next-up.md +76 -0
- package/docs/workflow-management.md +317 -0
- package/docs/workflow-templates.md +423 -0
- package/docs/workflow-validation.md +184 -0
- package/docs/workflows.md +254 -0
- package/package.json +3 -1
- package/spec/authoring-spec.json +61 -16
- package/workflows/workflow-for-workflows.json +252 -93
- package/workflows/workflow-for-workflows.v2.json +188 -77
|
@@ -0,0 +1,536 @@
|
|
|
1
|
+
# Assessment Gates Engine Handoff
|
|
2
|
+
|
|
3
|
+
## Status
|
|
4
|
+
|
|
5
|
+
This is a handoff document for a new agent picking up the **native assessment / decision gates** engine feature.
|
|
6
|
+
|
|
7
|
+
It is intentionally written as a **catch-up + execution-orientation** doc:
|
|
8
|
+
|
|
9
|
+
- what problem we are solving
|
|
10
|
+
- why it matters now
|
|
11
|
+
- what the current engine can and cannot do
|
|
12
|
+
- where to read first
|
|
13
|
+
- what not to accidentally do
|
|
14
|
+
|
|
15
|
+
This is **not** the final design spec. It is the best current starting point for a fresh agent.
|
|
16
|
+
|
|
17
|
+
## What this feature is
|
|
18
|
+
|
|
19
|
+
We want a **first-class engine feature** for structured assessments that can drive workflow behavior.
|
|
20
|
+
|
|
21
|
+
Today, workflows can only express confidence, readiness, or risk decisions in prose. The agent can write notes like:
|
|
22
|
+
|
|
23
|
+
- boundary confidence is low
|
|
24
|
+
- coverage confidence is medium
|
|
25
|
+
- we should continue because uncertainty remains
|
|
26
|
+
|
|
27
|
+
But the engine cannot reason over those assessments directly. That means:
|
|
28
|
+
|
|
29
|
+
- routing still depends on prompt interpretation
|
|
30
|
+
- confidence caps are prose-only
|
|
31
|
+
- follow-up triggers are prose-only
|
|
32
|
+
- traces can say what happened, but not cleanly expose the structured decision that drove it
|
|
33
|
+
|
|
34
|
+
The proposed feature is a **typed assessment / decision gate system** that lets:
|
|
35
|
+
|
|
36
|
+
- the **agent** assess named dimensions and provide short rationales
|
|
37
|
+
- the **engine** apply declared rules such as caps, routing outcomes, and follow-up triggers
|
|
38
|
+
|
|
39
|
+
## Why this is the next biggest engine win
|
|
40
|
+
|
|
41
|
+
This feature has high leverage because it unlocks better workflow behavior across multiple domains:
|
|
42
|
+
|
|
43
|
+
- **MR review**
|
|
44
|
+
  - confidence assessment
|
|
45
|
+
  - boundary/context/routing caps
|
|
46
|
+
  - block vs continue / follow-up decisions
|
|
47
|
+
- **planning**
|
|
48
|
+
  - readiness gates
|
|
49
|
+
  - “good enough to implement?” checks
|
|
50
|
+
- **debugging / investigation**
|
|
51
|
+
  - next-step routing based on confidence, evidence, or ambiguity
|
|
52
|
+
- **future explainability**
|
|
53
|
+
  - cleaner traceability of why the engine chose to continue, loop, or downgrade confidence
|
|
54
|
+
|
|
55
|
+
Compared with other ideas:
|
|
56
|
+
|
|
57
|
+
- it is more powerful than a workflow previewer because it improves **runtime behavior**, not just authoring UX
|
|
58
|
+
- it is more foundational than note scaffolding because it changes **decision quality and engine expressiveness**
|
|
59
|
+
|
|
60
|
+
## Problem statement
|
|
61
|
+
|
|
62
|
+
WorkRail currently has a gap between:
|
|
63
|
+
|
|
64
|
+
- what workflows want to say in a structured way
|
|
65
|
+
- and what the engine can actually enforce or reason over
|
|
66
|
+
|
|
67
|
+
Examples:
|
|
68
|
+
|
|
69
|
+
- a workflow wants to say “if boundary confidence is Low, final confidence cannot exceed Low”
|
|
70
|
+
- a workflow wants to say “if coverage confidence is Low, reopen targeted follow-up”
|
|
71
|
+
- a workflow wants to say “if readiness is Medium with one specific concern, continue; otherwise stop”
|
|
72
|
+
|
|
73
|
+
Today, those rules live in prompts and notes. That is useful, but weak:
|
|
74
|
+
|
|
75
|
+
- not compiler-validated
|
|
76
|
+
- not engine-enforced
|
|
77
|
+
- not structurally visible in runtime traces
|
|
78
|
+
- easy to drift across workflows
|
|
79
|
+
|
|
80
|
+
## Current recommendation
|
|
81
|
+
|
|
82
|
+
Build this as a **real engine feature**, not a tiny helper.
|
|
83
|
+
|
|
84
|
+
The intended shape is:
|
|
85
|
+
|
|
86
|
+
- **typed assessment definitions**
|
|
87
|
+
- **engine-applied gate rules**
|
|
88
|
+
- **durable traceability**
|
|
89
|
+
- **compiler/schema support**
|
|
90
|
+
- **reusable built-in or repo-owned assessment shapes**
|
|
91
|
+
|
|
92
|
+
This should be the **smallest complete thing worth living with**, not a toy MVP.
|
|
93
|
+
|
|
94
|
+
## What success looks like
|
|
95
|
+
|
|
96
|
+
A strong first version should support all of the following:
|
|
97
|
+
|
|
98
|
+
### 1. Typed assessment definitions
|
|
99
|
+
|
|
100
|
+
Workflows can declare assessment structures with:
|
|
101
|
+
|
|
102
|
+
- a stable name / reference
|
|
103
|
+
- named dimensions
|
|
104
|
+
- allowed levels
|
|
105
|
+
- rationale requirements
|
|
106
|
+
|
|
107
|
+
Examples:
|
|
108
|
+
|
|
109
|
+
- `confidenceAssessment`
|
|
110
|
+
- `readinessAssessment`
|
|
111
|
+
- `riskAssessment`
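
A minimal sketch of what such a declaration could look like. Every name below (`AssessmentDefinition`, `DimensionDefinition`, `rationaleRequired`, `findDimension`) is hypothetical and chosen for illustration; none of them is taken from the WorkRail source, and the real authoring shape is still an open design question.

```typescript
// Hypothetical shape for a typed assessment definition (illustrative only).
interface DimensionDefinition {
  readonly name: string;               // e.g. "boundary", "coverage"
  readonly levels: readonly string[];  // closed set, ordered lowest to highest
  readonly rationaleRequired: boolean; // must the agent explain its choice?
}

interface AssessmentDefinition {
  readonly ref: string; // stable name / reference
  readonly dimensions: readonly DimensionDefinition[];
}

// Example declaration with two named dimensions sharing the same level set.
const confidenceAssessment: AssessmentDefinition = {
  ref: "confidenceAssessment",
  dimensions: [
    { name: "boundary", levels: ["Low", "Medium", "High"], rationaleRequired: true },
    { name: "coverage", levels: ["Low", "Medium", "High"], rationaleRequired: true },
  ],
};

// Look up a dimension by name; undefined signals an unknown dimension.
function findDimension(
  def: AssessmentDefinition,
  name: string,
): DimensionDefinition | undefined {
  return def.dimensions.find((d) => d.name === name);
}
```

Keeping the levels as an ordered, closed list is one way to make later cap and routing rules checkable, but it is only one of several plausible encodings.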
|
|
112
|
+
|
|
113
|
+
### 2. Engine-applied rules
|
|
114
|
+
|
|
115
|
+
The engine can consume a completed assessment and apply rules like:
|
|
116
|
+
|
|
117
|
+
- cap final confidence
|
|
118
|
+
- trigger follow-up
|
|
119
|
+
- continue vs stop
|
|
120
|
+
- reopen loop
|
|
121
|
+
- downgrade recommendation band
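
As an illustration of the first rule kind, the sketch below applies a cap such as "if boundary confidence is Low, final confidence cannot exceed Low". It assumes an ordered three-level scale and hypothetical names (`CapRule`, `applyCaps`, `whenAtMost`, `capTo`); it is not the engine's actual rule format.

```typescript
// Illustrative cap rule: a low score on one dimension limits the final level.
// All names here are hypothetical, not WorkRail APIs.
const CONFIDENCE_LEVELS = ["Low", "Medium", "High"] as const;
type ConfidenceLevel = (typeof CONFIDENCE_LEVELS)[number];

interface CapRule {
  readonly dimension: string;           // dimension the rule inspects
  readonly whenAtMost: ConfidenceLevel; // fires when the assessed level is at or below this
  readonly capTo: ConfidenceLevel;      // ceiling applied to the final level
}

// Position of a level on the ordered scale (0 = lowest).
const rank = (level: ConfidenceLevel): number => CONFIDENCE_LEVELS.indexOf(level);

// Returns the proposed final level after every matching cap has been applied.
function applyCaps(
  assessed: Readonly<Record<string, ConfidenceLevel>>,
  proposedFinal: ConfidenceLevel,
  rules: readonly CapRule[],
): ConfidenceLevel {
  let result = proposedFinal;
  for (const rule of rules) {
    const level = assessed[rule.dimension];
    if (level === undefined) continue; // dimension not assessed: rule does not fire
    if (rank(level) <= rank(rule.whenAtMost) && rank(result) > rank(rule.capTo)) {
      result = rule.capTo;
    }
  }
  return result;
}

const capRules: CapRule[] = [{ dimension: "boundary", whenAtMost: "Low", capTo: "Low" }];
const capped = applyCaps({ boundary: "Low", coverage: "High" }, "High", capRules);
const untouched = applyCaps({ boundary: "High", coverage: "High" }, "High", capRules);
```

Here `capped` is lowered to `"Low"` because the boundary dimension triggered the rule, while `untouched` keeps its proposed `"High"`.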
|
|
122
|
+
|
|
123
|
+
### 3. Durable execution visibility
|
|
124
|
+
|
|
125
|
+
Assessment results should be visible in durable execution history and usable for projection/trace surfaces.
|
|
126
|
+
|
|
127
|
+
That likely means:
|
|
128
|
+
|
|
129
|
+
- structured persistence
|
|
130
|
+
- explicit event(s)
|
|
131
|
+
- console/trace visibility later
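
One way to picture "structured persistence" plus "explicit event(s)" is an append-only record that stores both the agent's assessment and the outcomes the engine derived from it. The event name, field names, and the `AssessmentEventLog` class below are assumptions made for illustration, not the project's event model.

```typescript
// Hypothetical append-only record of one assessment and the outcomes derived from it.
interface AssessmentRecordedEvent {
  readonly kind: "assessment_recorded";
  readonly assessmentRef: string;
  readonly levels: Readonly<Record<string, string>>;     // dimension -> chosen level
  readonly rationales: Readonly<Record<string, string>>; // dimension -> short rationale
  readonly derivedOutcomes: readonly string[];           // e.g. "cap:final=Low"
}

class AssessmentEventLog {
  private readonly events: AssessmentRecordedEvent[] = [];

  // Appends an event and returns its index; existing entries are never mutated.
  append(event: AssessmentRecordedEvent): number {
    this.events.push(event);
    return this.events.length - 1;
  }

  // Projection: every outcome the engine derived for a given assessment ref.
  outcomesFor(assessmentRef: string): string[] {
    const outcomes: string[] = [];
    for (const event of this.events) {
      if (event.assessmentRef === assessmentRef) {
        outcomes.push(...event.derivedOutcomes);
      }
    }
    return outcomes;
  }
}

const assessmentLog = new AssessmentEventLog();
const firstIndex = assessmentLog.append({
  kind: "assessment_recorded",
  assessmentRef: "confidenceAssessment",
  levels: { boundary: "Low", coverage: "Medium" },
  rationales: { boundary: "module ownership unclear", coverage: "tests cover the main path" },
  derivedOutcomes: ["cap:final=Low", "followUp:coverage"],
});
```

A projection like `outcomesFor` is what a console or trace surface could later read to explain why a run continued, looped, or was capped.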
|
|
132
|
+
|
|
133
|
+
### 4. Compiler/schema validation
|
|
134
|
+
|
|
135
|
+
Workflow definitions should be validated so authors cannot:
|
|
136
|
+
|
|
137
|
+
- reference missing dimensions
|
|
138
|
+
- reference invalid levels
|
|
139
|
+
- define malformed gate rules
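
The three checks above could be sketched as a single pass over the declared rules. The `GateRule` shape and the `validateGateRules` function are hypothetical; in particular, treating an empty `outcome` as "malformed" is only one assumed example of what malformed might mean.

```typescript
// Illustrative compile-time check for gate rules; names are hypothetical.
type AllowedLevels = Readonly<Record<string, readonly string[]>>; // dimension -> levels

interface GateRule {
  readonly dimension: string;
  readonly level: string;
  readonly outcome: string; // e.g. "capFinalConfidence", "triggerFollowUp"
}

// Returns one message per problem; an empty array means the rules are valid.
function validateGateRules(allowed: AllowedLevels, rules: readonly GateRule[]): string[] {
  const errors: string[] = [];
  rules.forEach((rule, i) => {
    const levels = Object.prototype.hasOwnProperty.call(allowed, rule.dimension)
      ? allowed[rule.dimension]
      : undefined;
    if (levels === undefined) {
      errors.push(`rule ${i}: unknown dimension "${rule.dimension}"`);
    } else if (levels.indexOf(rule.level) === -1) {
      errors.push(`rule ${i}: level "${rule.level}" is not allowed for "${rule.dimension}"`);
    }
    if (rule.outcome.trim() === "") {
      errors.push(`rule ${i}: outcome must not be empty`);
    }
  });
  return errors;
}

const allowedLevels: AllowedLevels = {
  boundary: ["Low", "Medium", "High"],
  coverage: ["Low", "Medium", "High"],
};

const ruleErrors = validateGateRules(allowedLevels, [
  { dimension: "boundary", level: "Low", outcome: "capFinalConfidence" }, // valid
  { dimension: "scope", level: "Low", outcome: "triggerFollowUp" },       // missing dimension
  { dimension: "coverage", level: "Unknown", outcome: "" },               // invalid level, empty outcome
]);
```

Running the same function from both the authoring validator and the runtime would be one way to keep the two from drifting apart.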
|
|
140
|
+
|
|
141
|
+
### 5. Reuse
|
|
142
|
+
|
|
143
|
+
The feature should support either:
|
|
144
|
+
|
|
145
|
+
- inline assessment declarations
|
|
146
|
+
- reusable refs
|
|
147
|
+
- or both
|
|
148
|
+
|
|
149
|
+
without forcing every workflow to invent its own one-off matrix shape.
|
|
150
|
+
|
|
151
|
+
## Recommended product scope
|
|
152
|
+
|
|
153
|
+
### In scope for the first serious build
|
|
154
|
+
|
|
155
|
+
- a first-class assessment primitive such as:
|
|
156
|
+
  - `assessmentGate`
|
|
157
|
+
  - `assessmentRef`
|
|
158
|
+
  - or a closely related name
|
|
159
|
+
- assessment dimensions with a closed set of allowed levels
|
|
160
|
+
- short rationale capture per dimension
|
|
161
|
+
- rule evaluation in the engine
|
|
162
|
+
- durable persistence / traceability
|
|
163
|
+
- compile-time validation
|
|
164
|
+
- a few good built-in patterns:
|
|
165
|
+
  - confidence
|
|
166
|
+
  - readiness
|
|
167
|
+
  - risk
|
|
168
|
+
|
|
169
|
+
### Out of scope for this feature
|
|
170
|
+
|
|
171
|
+
Do **not** bundle these into the first implementation:
|
|
172
|
+
|
|
173
|
+
- generic arbitrary decision-table engine
|
|
174
|
+
- engine-injected note scaffolding
|
|
175
|
+
- large UI/console preview work
|
|
176
|
+
- overly open-ended value types
|
|
177
|
+
- “anything can assess anything” without clear type boundaries
|
|
178
|
+
|
|
179
|
+
Those may come later, but they should not bloat the first solid implementation.
|
|
180
|
+
|
|
181
|
+
## Agent vs engine responsibility split
|
|
182
|
+
|
|
183
|
+
This split should remain sharp.
|
|
184
|
+
|
|
185
|
+
### Agent responsibilities
|
|
186
|
+
|
|
187
|
+
- assess each declared dimension
|
|
188
|
+
- choose one allowed level per dimension
|
|
189
|
+
- provide a short rationale
|
|
190
|
+
- submit the assessment result as part of workflow output / continuation
|
|
191
|
+
|
|
192
|
+
### Engine responsibilities
|
|
193
|
+
|
|
194
|
+
- validate the assessment shape
|
|
195
|
+
- validate levels and dimension names
|
|
196
|
+
- apply declared gate rules
|
|
197
|
+
- expose derived outcomes to later workflow behavior
|
|
198
|
+
- persist assessment facts durably
|
|
199
|
+
- record enough trace information to explain the decision path later
|
|
200
|
+
|
|
201
|
+
The engine should not replace the agent’s judgment. It should **formalize and enforce the consequences** of that judgment.
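
The split can be pictured as follows: the agent supplies one level and a rationale per dimension, and the engine only checks that the submission matches what was declared. This is a sketch under stated assumptions; `DimensionAnswer`, `SubmittedAssessment`, and `validateSubmission` are hypothetical names, not part of the existing engine.

```typescript
// Illustrative runtime check of an agent-submitted assessment; names are hypothetical.
interface DimensionAnswer {
  readonly level: string;
  readonly rationale: string;
}

type SubmittedAssessment = Readonly<Record<string, DimensionAnswer>>;
type DeclaredDimensions = Readonly<Record<string, readonly string[]>>; // dimension -> levels

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Returns one message per problem; an empty array means the submission is acceptable.
function validateSubmission(
  declared: DeclaredDimensions,
  submitted: SubmittedAssessment,
): string[] {
  const problems: string[] = [];
  for (const name of Object.keys(declared)) {
    if (!hasOwn(submitted, name)) {
      problems.push(`missing dimension "${name}"`);
      continue;
    }
    const answer = submitted[name];
    if (declared[name].indexOf(answer.level) === -1) {
      problems.push(`level "${answer.level}" is not allowed for "${name}"`);
    }
    if (answer.rationale.trim() === "") {
      problems.push(`rationale missing for "${name}"`);
    }
  }
  for (const name of Object.keys(submitted)) {
    if (!hasOwn(declared, name)) {
      problems.push(`undeclared dimension "${name}"`);
    }
  }
  return problems;
}

const declaredDimensions: DeclaredDimensions = {
  boundary: ["Low", "Medium", "High"],
  coverage: ["Low", "Medium", "High"],
};

const accepted = validateSubmission(declaredDimensions, {
  boundary: { level: "Low", rationale: "ownership of the module is unclear" },
  coverage: { level: "High", rationale: "all changed files were read" },
});

const rejected = validateSubmission(declaredDimensions, {
  boundary: { level: "Very low", rationale: "" },
  scope: { level: "Low", rationale: "not a declared dimension" },
});
```

Note that the function never judges whether a chosen level is the right one; it only enforces the declared shape, which matches the responsibility split described above.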
|
|
202
|
+
|
|
203
|
+
## What this is not
|
|
204
|
+
|
|
205
|
+
This is **not**:
|
|
206
|
+
|
|
207
|
+
- a generic policy engine for all workflow logic
|
|
208
|
+
- a replacement for prompts
|
|
209
|
+
- a free-form confidence essay system
|
|
210
|
+
- a note-formatting feature
|
|
211
|
+
|
|
212
|
+
It is a **structured decision layer** for a small class of decisions that are currently trapped in prose.
|
|
213
|
+
|
|
214
|
+
## Existing repo context
|
|
215
|
+
|
|
216
|
+
The idea already exists in the backlog:
|
|
217
|
+
|
|
218
|
+
- `docs/ideas/backlog.md`
|
|
219
|
+
  - **Native assessment / decision gates for workflows**
|
|
220
|
+
  - **Engine-injected note scaffolding** is now split out as a related follow-on idea
|
|
221
|
+
|
|
222
|
+
The MR review redesign work is one of the main reasons this feature is now compelling:
|
|
223
|
+
|
|
224
|
+
- `docs/plans/mr-review-workflow-redesign.md`
|
|
225
|
+
|
|
226
|
+
That doc now has a narrowed next slice with:
|
|
227
|
+
|
|
228
|
+
- compact confidence dimensions
|
|
229
|
+
- routing minimalism
|
|
230
|
+
- explicit engine-compatibility constraints
|
|
231
|
+
|
|
232
|
+
The main takeaway is:
|
|
233
|
+
|
|
234
|
+
- workflows want structured confidence/routing
|
|
235
|
+
- the current engine still forces those ideas to live in prompts
|
|
236
|
+
|
|
237
|
+
## Reading order for a new agent
|
|
238
|
+
|
|
239
|
+
If you are picking this up fresh, read in this order.
|
|
240
|
+
|
|
241
|
+
### 1. Repo workflow and operating rules
|
|
242
|
+
|
|
243
|
+
- `AGENTS.md`
|
|
244
|
+
|
|
245
|
+
Pay attention to:
|
|
246
|
+
|
|
247
|
+
- deliberate progression
|
|
248
|
+
- planning-doc expectations
|
|
249
|
+
- verification rules
|
|
250
|
+
- release rules
|
|
251
|
+
|
|
252
|
+
### 2. Normative execution semantics
|
|
253
|
+
|
|
254
|
+
- `docs/reference/workflow-execution-contract.md`
|
|
255
|
+
|
|
256
|
+
Focus on:
|
|
257
|
+
|
|
258
|
+
- token-driven execution
|
|
259
|
+
- continuation behavior
|
|
260
|
+
- blocked / continue semantics
|
|
261
|
+
- where optional capabilities and durable state already fit
|
|
262
|
+
|
|
263
|
+
### 3. Core durable engine design locks
|
|
264
|
+
|
|
265
|
+
- `docs/design/v2-core-design-locks.md`
|
|
266
|
+
|
|
267
|
+
Focus on:
|
|
268
|
+
|
|
269
|
+
- append-only truth model
|
|
270
|
+
- projections and durable state shape
|
|
271
|
+
- event philosophy
|
|
272
|
+
- anything that constrains new execution events or derived state
|
|
273
|
+
|
|
274
|
+
### 4. Workflow validation philosophy
|
|
275
|
+
|
|
276
|
+
- `docs/plans/workflow-validation-design.md`
|
|
277
|
+
|
|
278
|
+
Focus on:
|
|
279
|
+
|
|
280
|
+
- runtime/validation parity
|
|
281
|
+
- shared resolution logic
|
|
282
|
+
- why validation must mirror real engine behavior
|
|
283
|
+
|
|
284
|
+
This feature must not create a second “looks valid but runtime disagrees” layer.
|
|
285
|
+
|
|
286
|
+
### 5. Current assessment-gate backlog note
|
|
287
|
+
|
|
288
|
+
- `docs/ideas/backlog.md`
|
|
289
|
+
|
|
290
|
+
Find:
|
|
291
|
+
|
|
292
|
+
- **Native assessment / decision gates for workflows**
|
|
293
|
+
|
|
294
|
+
That captures the current product intuition and open questions.
|
|
295
|
+
|
|
296
|
+
### 6. MR review redesign context
|
|
297
|
+
|
|
298
|
+
- `docs/plans/mr-review-workflow-redesign.md`
|
|
299
|
+
|
|
300
|
+
Focus on:
|
|
301
|
+
|
|
302
|
+
- the narrowed implementation slice
|
|
303
|
+
- compact confidence model
|
|
304
|
+
- why structured routing/caps matter
|
|
305
|
+
|
|
306
|
+
## Code-reading path
|
|
307
|
+
|
|
308
|
+
### Authoring / workflow definition types
|
|
309
|
+
|
|
310
|
+
Start here:
|
|
311
|
+
|
|
312
|
+
- `src/types/workflow-definition.ts`
|
|
313
|
+
|
|
314
|
+
This is the key place to understand:
|
|
315
|
+
|
|
316
|
+
- what workflow definitions can express today
|
|
317
|
+
- existing step/loop/output-contract shapes
|
|
318
|
+
- where a new authoring primitive would naturally live
|
|
319
|
+
|
|
320
|
+
### Validation layer
|
|
321
|
+
|
|
322
|
+
Read:
|
|
323
|
+
|
|
324
|
+
- `src/application/services/validation-engine.ts`
|
|
325
|
+
- `spec/workflow.schema.json`
|
|
326
|
+
|
|
327
|
+
You need to understand both:
|
|
328
|
+
|
|
329
|
+
- runtime-side validation expectations
|
|
330
|
+
- schema-level authoring support
|
|
331
|
+
|
|
332
|
+
This repo has already hit real schema/compiler mismatches, so this feature must be introduced carefully and consistently.
|
|
333
|
+
|
|
334
|
+
### Compiler / template path
|
|
335
|
+
|
|
336
|
+
Read:
|
|
337
|
+
|
|
338
|
+
- `src/application/services/compiler/template-registry.ts`
|
|
339
|
+
|
|
340
|
+
Use this to understand how reusable authoring constructs are currently expanded/validated and whether assessment gates should participate in compilation directly.
|
|
341
|
+
|
|
342
|
+
### Engine/runtime surfaces
|
|
343
|
+
|
|
344
|
+
Read:
|
|
345
|
+
|
|
346
|
+
- `src/engine/index.ts`
|
|
347
|
+
- `src/engine/types.ts`
|
|
348
|
+
- `src/engine/engine-factory.ts`
|
|
349
|
+
|
|
350
|
+
The goal is to find the right place for:
|
|
351
|
+
|
|
352
|
+
- assessment results
|
|
353
|
+
- derived outcomes
|
|
354
|
+
- execution integration
|
|
355
|
+
|
|
356
|
+
### MCP output / trace surfaces
|
|
357
|
+
|
|
358
|
+
Read:
|
|
359
|
+
|
|
360
|
+
- `src/mcp/step-content-envelope.ts`
|
|
361
|
+
- `src/mcp/v2-response-formatter.ts`
|
|
362
|
+
- `src/mcp/output-schemas.ts`
|
|
363
|
+
|
|
364
|
+
Assessment gates likely do not need first-class agent-facing prose immediately, but they should fit the existing response/contract model cleanly.
|
|
365
|
+
|
|
366
|
+
### Projections / durable views
|
|
367
|
+
|
|
368
|
+
Read:
|
|
369
|
+
|
|
370
|
+
- `src/v2/projections/`
|
|
371
|
+
|
|
372
|
+
Especially anything around:
|
|
373
|
+
|
|
374
|
+
- run status
|
|
375
|
+
- preferences
|
|
376
|
+
- DAG state
|
|
377
|
+
- node outputs
|
|
378
|
+
|
|
379
|
+
You want to understand how a structured assessment result would appear in durable projections later.
|
|
380
|
+
|
|
381
|
+
## Design constraints that matter
|
|
382
|
+
|
|
383
|
+
### 1. Runtime/validation parity is mandatory
|
|
384
|
+
|
|
385
|
+
Do not design an assessment feature that:
|
|
386
|
+
|
|
387
|
+
- validates in schema
|
|
388
|
+
- but is not truly enforced in runtime
|
|
389
|
+
|
|
390
|
+
or the reverse.
|
|
391
|
+
|
|
392
|
+
### 2. Keep the engine/agent split clean
|
|
393
|
+
|
|
394
|
+
The agent assesses.
|
|
395
|
+
|
|
396
|
+
The engine applies gate rules.
|
|
397
|
+
|
|
398
|
+
Do not let the engine become a generic policy brain.
|
|
399
|
+
|
|
400
|
+
### 3. Avoid giant generality
|
|
401
|
+
|
|
402
|
+
A real feature does **not** mean a universal decision-language.
|
|
403
|
+
|
|
404
|
+
Prefer:
|
|
405
|
+
|
|
406
|
+
- typed, bounded, closed-set constructs
|
|
407
|
+
|
|
408
|
+
over:
|
|
409
|
+
|
|
410
|
+
- open-ended user-programmable rule DSLs
|
|
411
|
+
|
|
412
|
+
for the first serious implementation.
|
|
413
|
+
|
|
414
|
+
### 4. Preserve traceability
|
|
415
|
+
|
|
416
|
+
One of the biggest benefits of this feature is explainability.
|
|
417
|
+
|
|
418
|
+
If the engine applies a cap or follow-up trigger from an assessment, that should be traceable later.
|
|
419
|
+
|
|
420
|
+
### 5. Keep note scaffolding separate
|
|
421
|
+
|
|
422
|
+
This came up during design discussion, but it is a separate feature.
|
|
423
|
+
|
|
424
|
+
Do not quietly smuggle note-structure requirements into assessment gates just because they are adjacent concepts.
|
|
425
|
+
|
|
426
|
+
## Recommended first design pass
|
|
427
|
+
|
|
428
|
+
The first real design pass should answer these questions explicitly.
|
|
429
|
+
|
|
430
|
+
### Authoring shape
|
|
431
|
+
|
|
432
|
+
- Is the core primitive inline, referenced, or both?
|
|
433
|
+
- What is the minimum stable declaration shape?
|
|
434
|
+
- How are dimensions declared?
|
|
435
|
+
- How are allowed levels declared?
|
|
436
|
+
|
|
437
|
+
### Runtime behavior
|
|
438
|
+
|
|
439
|
+
- When does the engine evaluate the gate?
|
|
440
|
+
- What inputs does it consume?
|
|
441
|
+
- What outputs does it produce for later steps/conditions?
|
|
442
|
+
- How are gate outcomes persisted?
|
|
443
|
+
|
|
444
|
+
### Validation
|
|
445
|
+
|
|
446
|
+
- What can schema validate?
|
|
447
|
+
- What must runtime validate?
|
|
448
|
+
- How do we prevent authoring/runtime drift?
|
|
449
|
+
|
|
450
|
+
### Traceability
|
|
451
|
+
|
|
452
|
+
- What event(s) or durable records are emitted?
|
|
453
|
+
- What do projections need to expose later?
|
|
454
|
+
- What should console/trace surfaces eventually show?
|
|
455
|
+
|
|
456
|
+
### Reuse
|
|
457
|
+
|
|
458
|
+
- What built-in assessment families should exist first?
|
|
459
|
+
- What does inline-only authoring lose?
|
|
460
|
+
- What should refs buy us?
|
|
461
|
+
|
|
462
|
+
## Good first built-in families
|
|
463
|
+
|
|
464
|
+
If built-ins are included in the first proper version, the best candidates are:
|
|
465
|
+
|
|
466
|
+
- **confidence assessment**
|
|
467
|
+
- **readiness assessment**
|
|
468
|
+
- **risk assessment**
|
|
469
|
+
|
|
470
|
+
These are broad enough to matter across workflows, but still conceptually tight.
|
|
471
|
+
|
|
472
|
+
## Suggested non-goals for the first implementation
|
|
473
|
+
|
|
474
|
+
Keep these explicitly out unless the user directs otherwise:
|
|
475
|
+
|
|
476
|
+
- arbitrary free-form scoring systems
|
|
477
|
+
- weighted math-heavy assessment engines
|
|
478
|
+
- bundled UI work for previewing assessments
|
|
479
|
+
- note scaffolding
|
|
480
|
+
- generalized business-rule language
|
|
481
|
+
|
|
482
|
+
## Known risks
|
|
483
|
+
|
|
484
|
+
### Over-generalization
|
|
485
|
+
|
|
486
|
+
The biggest risk is building something too generic too early.
|
|
487
|
+
|
|
488
|
+
That would likely:
|
|
489
|
+
|
|
490
|
+
- slow adoption
|
|
491
|
+
- complicate schema/compiler work
|
|
492
|
+
- blur the engine/agent boundary
|
|
493
|
+
|
|
494
|
+
### Validation drift
|
|
495
|
+
|
|
496
|
+
If schema, compiler, and runtime do not all agree on the feature shape, confidence in the system will drop quickly.
|
|
497
|
+
|
|
498
|
+
### Trace debt
|
|
499
|
+
|
|
500
|
+
If the engine uses assessment outcomes internally but they are not visible in durable traces, the feature will feel magical and hard to debug.
|
|
501
|
+
|
|
502
|
+
## A good end state
|
|
503
|
+
|
|
504
|
+
By the end of the first solid implementation, a workflow author should be able to say:
|
|
505
|
+
|
|
506
|
+
- here are the dimensions
|
|
507
|
+
- here are the allowed levels
|
|
508
|
+
- here are the gate rules
|
|
509
|
+
|
|
510
|
+
And the engine should be able to:
|
|
511
|
+
|
|
512
|
+
- validate the shape
|
|
513
|
+
- accept the agent’s assessment
|
|
514
|
+
- apply the gate outcomes
|
|
515
|
+
- persist the result
|
|
516
|
+
- explain later what happened
|
|
517
|
+
|
|
518
|
+
## Immediate next step for a new agent
|
|
519
|
+
|
|
520
|
+
Do **not** jump straight to implementation.
|
|
521
|
+
|
|
522
|
+
First:
|
|
523
|
+
|
|
524
|
+
1. read the docs/code listed above
|
|
525
|
+
2. write a compact design note or plan that proposes:
|
|
526
|
+
   - authoring shape
|
|
527
|
+
   - runtime behavior
|
|
528
|
+
   - validation shape
|
|
529
|
+
   - persistence/trace model
|
|
530
|
+
   - non-goals
|
|
531
|
+
3. compare at least two design options:
|
|
532
|
+
   - narrower typed gate
|
|
533
|
+
   - slightly more reusable ref-based model
|
|
534
|
+
4. bring the tradeoffs back to the user before coding
|
|
535
|
+
|
|
536
|
+
That is the right restart point.
|
|
@@ -0,0 +1,151 @@
|
|
|
1
|
+
# Content Coherence and Linked References
|
|
2
|
+
|
|
3
|
+
> **Active initiative plan**
|
|
4
|
+
>
|
|
5
|
+
> Canonical design and slice plan for increasing coherence across WorkRail's content delivery seams
|
|
6
|
+
> and introducing workflow-declared linked references.
|
|
7
|
+
|
|
8
|
+
**Status**: Implemented (slices 1–6 complete)
|
|
9
|
+
**Date**: 2026-03-22
|
|
10
|
+
**Completed**: 2026-03-22
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## Problem
|
|
15
|
+
|
|
16
|
+
WorkRail has grown seven independent mechanisms for injecting content into what the agent sees at execution time:
|
|
17
|
+
|
|
18
|
+
| Seam | Phase | Declared on | Override mechanism |
|
|
19
|
+
|---|---|---|---|
|
|
20
|
+
| Extension points / bindings | Compile-time | `WorkflowDefinition.extensionPoints` | `.workrail/bindings.json` |
|
|
21
|
+
| Features | Compile-time | `WorkflowDefinition.features` | None (closed set) |
|
|
22
|
+
| Refs (`wr.refs.*`) | Compile-time | `promptBlocks` parts | None (closed set) |
|
|
23
|
+
| Context templates (`{{varName}}`) | Render-time | Inline in prompt text | Session context |
|
|
24
|
+
| Prompt fragments | Render-time | `WorkflowStepDefinition.promptFragments` | Session context conditions |
|
|
25
|
+
| Response supplements | Transport-time | Hardcoded in `response-supplements.ts` | None |
|
|
26
|
+
| `metaGuidance` | Always visible | `WorkflowDefinition.metaGuidance` | None |
|
|
27
|
+
|
|
28
|
+
Each seam was well-motivated in isolation, but they share no vocabulary, no resolution protocol, and no unified introspection surface. An author deciding "where does this content belong?" must understand all of them and their interactions.
|
|
29
|
+
|
|
30
|
+
Additionally, there is no first-class way for a workflow to point at authoritative external documents (schemas, authoring specs, team guides, playbooks) without inlining content into the prompt or metaGuidance strings.
|
|
31
|
+
|
|
32
|
+
## Goal
|
|
33
|
+
|
|
34
|
+
1. Introduce a **typed intermediate representation** (StepContentEnvelope) that makes the categories of agent-visible content explicit in the type system, replacing implicit string concatenation in the prompt renderer.
|
|
35
|
+
2. Introduce **workflow-declared linked references** as a new declaration surface for external supporting documents.
|
|
36
|
+
3. Make the boundary between the compiler pipeline (compile-time, deterministic, hashed) and the render/transport pipeline (runtime, session-aware) explicit and typed.
## Non-goals
- Grand unification of all seams into one abstraction. The compiler and render pipelines serve different purposes and should stay distinct.
- Moving prompt fragments out of the prompt string. Fragments are authored prompt content; they belong inline. The envelope documents what matched, but the text stays in `authoredPrompt`.
- Content inlining for references. V1 references are pointers only. The agent reads the file itself if needed.
- User-defined refs replacing the closed `wr.refs.*` set. Workflow-declared references are a separate concept.
## Key design decisions
### StepContentEnvelope
The prompt renderer currently returns `StepMetadata` (stepId, title, prompt string, agentRole, requireConfirmation). The response formatter receives the final response object and works out what it contains through shape detection. Between the two, content categories are implicit.
The envelope makes them explicit:
```typescript
interface StepContentEnvelope {
  readonly authoredPrompt: string;
  readonly matchedFragmentIds: readonly string[];
  readonly requirements: readonly Requirement[];
  readonly loopBanner: string | null;
  readonly recoveryContext: string | null;
  readonly references: readonly ResolvedReference[];
}
```
The handler assembles the envelope from renderer output + handler-level knowledge (binding drift, preferences, blockers). The `V2ExecutionRenderEnvelope` grows from `{ response, lifecycle }` to `{ response, lifecycle, contentEnvelope }`. The formatter consumes the envelope instead of relying on ad-hoc shape detection.
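As a rough illustration of that assembly step, the sketch below combines renderer output with handler-level knowledge into an envelope and attaches it to the render envelope. Only the `StepContentEnvelope` field names and the `{ response, lifecycle, contentEnvelope }` shape come from this document; `Requirement`, `ResolvedReference`, `RendererOutput`, `HandlerKnowledge`, and `assembleEnvelope` are hypothetical stand-ins, not the actual WorkRail types.

```typescript
// Hypothetical shapes: the real Requirement and ResolvedReference types are not shown in this document.
interface Requirement { readonly id: string; readonly text: string }
interface ResolvedReference { readonly id: string; readonly title: string; readonly resolvedPath: string }

interface StepContentEnvelope {
  readonly authoredPrompt: string;
  readonly matchedFragmentIds: readonly string[];
  readonly requirements: readonly Requirement[];
  readonly loopBanner: string | null;
  readonly recoveryContext: string | null;
  readonly references: readonly ResolvedReference[];
}

// Assumed renderer output: the prompt string plus the fragments that matched.
interface RendererOutput {
  readonly prompt: string;
  readonly matchedFragmentIds: readonly string[];
}

// Assumed handler-level knowledge that the renderer never sees.
interface HandlerKnowledge {
  readonly requirements: readonly Requirement[];
  readonly loopBanner?: string;
  readonly recoveryContext?: string;
  readonly references: readonly ResolvedReference[];
}

// Pure assembly: every content category ends up in a named, typed field.
function assembleEnvelope(r: RendererOutput, h: HandlerKnowledge): StepContentEnvelope {
  return {
    authoredPrompt: r.prompt,
    matchedFragmentIds: r.matchedFragmentIds,
    requirements: h.requirements,
    loopBanner: h.loopBanner ?? null,
    recoveryContext: h.recoveryContext ?? null,
    references: h.references,
  };
}

// TResponse and TLifecycle are placeholders for the real response and lifecycle types.
interface V2ExecutionRenderEnvelope<TResponse, TLifecycle> {
  readonly response: TResponse;
  readonly lifecycle: TLifecycle;
  readonly contentEnvelope: StepContentEnvelope;
}

const envelope = assembleEnvelope(
  { prompt: "Review the diff.", matchedFragmentIds: ["frag-strict"] },
  { requirements: [], references: [] },
);

const renderEnvelope: V2ExecutionRenderEnvelope<{ kind: string }, { phase: string }> = {
  response: { kind: "start" },     // placeholder response
  lifecycle: { phase: "start" },   // placeholder lifecycle value
  contentEnvelope: envelope,
};
```

With this shape the formatter can read `renderEnvelope.contentEnvelope.loopBanner` directly instead of probing the response object for optional keys.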
### Linked references
A reference declaration on `WorkflowDefinition`:
```typescript
interface WorkflowReference {
  readonly id: string;
  readonly title: string;
  readonly source: string; // path or URI
  readonly purpose: string;
  readonly authoritative: boolean;
}
```
Reference handling splits into two phases:
- **Compile-time** (pure): validate declarations structurally (unique IDs, non-empty paths, valid shapes). Include declarations in the workflow hash.
- **Start-time** (I/O): resolve paths against the workspace, validate existence, capture resolved references as observation events. This follows the existing pattern in `resolveWorkspaceAnchors` in `start.ts`.
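The two phases above can be sketched as a pure validator and a resolver with injected I/O. This is a minimal sketch under stated assumptions: `validateReferences`, `resolveReferences`, the `resolvedPath` and `exists` fields, and the naive path join are all hypothetical, and the real start-time code would also record observation events, which is omitted here.

```typescript
interface WorkflowReference {
  readonly id: string;
  readonly title: string;
  readonly source: string; // path or URI
  readonly purpose: string;
  readonly authoritative: boolean;
}

// Compile-time phase: structural checks only, no file-system access.
function validateReferences(refs: readonly WorkflowReference[]): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  for (const ref of refs) {
    if (ref.id.trim() === "") errors.push("reference has an empty id");
    if (seen.has(ref.id)) errors.push(`duplicate reference id: ${ref.id}`);
    seen.add(ref.id);
    if (ref.source.trim() === "") errors.push(`reference ${ref.id} has an empty source`);
  }
  return errors;
}

// Hypothetical resolved shape: the declaration plus what start-time I/O learned.
interface ResolvedReference extends WorkflowReference {
  readonly resolvedPath: string;
  readonly exists: boolean;
}

// Start-time phase: the existence check is injected so the sketch runs without a file system.
// The join is deliberately naive (relative sources only).
function resolveReferences(
  refs: readonly WorkflowReference[],
  workspaceRoot: string,
  fileExists: (path: string) => boolean,
): ResolvedReference[] {
  const root = workspaceRoot.replace(/\/+$/, "");
  return refs.map((ref) => {
    const resolvedPath = `${root}/${ref.source}`;
    return { ...ref, resolvedPath, exists: fileExists(resolvedPath) };
  });
}

const refs: WorkflowReference[] = [
  { id: "schema", title: "Workflow schema", source: "spec/workflow.schema.json", purpose: "Authoritative schema", authoritative: true },
  { id: "schema", title: "Duplicate entry", source: "", purpose: "Triggers validation errors", authoritative: false },
];

const errors = validateReferences(refs); // duplicate id + empty source
const resolved = resolveReferences(
  refs.slice(0, 1),
  "/repo/",
  (p) => p === "/repo/spec/workflow.schema.json",
);
```

Keeping the validator free of I/O is what lets it run inside the deterministic compiler pipeline, while the resolver's result can differ from one workspace to the next.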
Only workflow-declared references participate in the hash. Project-attached references (future) are handled like binding overrides: captured at session start, drift-detected against current state.
### Data flow awareness
The Zod schema boundary (`V2StartWorkflowOutputSchema.parse`) only knows about `pending.prompt` as a string. The envelope travels as a parallel channel through the render envelope wrapper, not through the Zod-validated response. `pending.prompt` is serialized from the envelope's `authoredPrompt` for backward compatibility.
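A small sketch of the parallel channel, assuming simplified types: `parseResponse` is a hand-written stand-in for the schema boundary (not the real `V2StartWorkflowOutputSchema`), and `buildRenderEnvelope` is a hypothetical helper. The point is only that the validated response carries `pending.prompt` while the envelope rides alongside it.

```typescript
// Trimmed-down envelope; only `authoredPrompt` and `matchedFragmentIds` matter for this sketch.
interface StepContentEnvelope {
  readonly authoredPrompt: string;
  readonly matchedFragmentIds: readonly string[];
}

// What the validated response knows about: a plain prompt string.
interface ValidatedResponse {
  readonly pending: { readonly prompt: string };
}

// Stand-in for the schema boundary: copies only the keys the schema declares.
function parseResponse(input: { pending: { prompt: string } }): ValidatedResponse {
  return { pending: { prompt: input.pending.prompt } };
}

// The envelope travels next to the response, never inside it.
function buildRenderEnvelope(content: StepContentEnvelope) {
  const response = parseResponse({ pending: { prompt: content.authoredPrompt } });
  return { response, contentEnvelope: content };
}

const out = buildRenderEnvelope({
  authoredPrompt: "Summarize the failing tests.",
  matchedFragmentIds: ["frag-ci"],
});
```

Existing consumers that read only `out.response.pending.prompt` keep working unchanged, which is the backward-compatibility property described above.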
### metaGuidance status
`metaGuidance` is declared on `WorkflowDefinition` but in the v2 clean-format path it is not delivered to the agent during execution (only visible in `inspect_workflow` output). Some things currently in metaGuidance (e.g. "follow this coding guide") are references in disguise. This initiative should clarify metaGuidance's delivery semantics or deprecate it in favor of references + existing prompt composition primitives.
## Constraints
- Prompt fragments must not move out of the prompt string. They participate in the authored prompt and affect recovery budget calculations (`RECOVERY_BUDGET_BYTES`). Moving them would change prompt hashes and break rehydrate for existing sessions.
- Reference content must not be inlined at compile time. Referenced files change independently of the workflow; content inlining would make hashes unstable.
- Project-attached references must not participate in the workflow hash. The same workflow in two projects with different local refs must produce the same hash. Project refs are observation-level, not definition-level.
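The hash-related constraints can be illustrated with a toy hash function. This is a sketch under assumptions: FNV-1a stands in for whatever hash WorkRail actually uses, and `hashWorkflow` with its canonical form is hypothetical. What it shows is the intended input boundary: declarations go in, file contents and project-attached refs do not.

```typescript
interface WorkflowReference {
  readonly id: string;
  readonly title: string;
  readonly source: string;
  readonly purpose: string;
  readonly authoritative: boolean;
}

// Trimmed-down definition: only the fields this sketch hashes.
interface WorkflowDefinition {
  readonly id: string;
  readonly references: readonly WorkflowReference[];
}

// FNV-1a (32-bit): a tiny stand-in hash, not the one WorkRail uses.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Hash input is the definition alone. Referenced file contents and
// project-attached refs are never passed in, so they cannot affect the result.
function hashWorkflow(def: WorkflowDefinition): string {
  const canonical = JSON.stringify({
    id: def.id,
    references: def.references.map((r) => [r.id, r.title, r.source, r.purpose, r.authoritative]),
  });
  return fnv1a(canonical);
}

const guide: WorkflowReference = {
  id: "guide", title: "Coding guide", source: "docs/a.md", purpose: "Team conventions", authoritative: true,
};

// Same definition checked out in two projects: identical hash.
const hashInProjectA = hashWorkflow({ id: "review", references: [guide] });
const hashInProjectB = hashWorkflow({ id: "review", references: [guide] });

// Editing the declaration itself (here, its source) changes the hash.
const hashAfterEdit = hashWorkflow({ id: "review", references: [{ ...guide, source: "docs/b.md" }] });
```

Because the file behind `docs/a.md` is never read during hashing, editing that file leaves the workflow hash untouched, while editing the declaration does not.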
## Slice plan
> Slices 1–4 and 6 are implemented. Slice 5 (project-attached references) is deferred as future work.
### Slice 1: StepContentEnvelope type and render envelope extension (done)
Define the `StepContentEnvelope` type. Extend `V2ExecutionRenderEnvelope` to carry it. Have the handler assemble it from renderer output + handler-level knowledge. Formatter consumes it. **No behavioral change**: the formatter produces identical output, sourced from a typed representation instead of ad-hoc shape detection.
**Key files**: `render-envelope.ts`, `prompt-renderer.ts`, `v2-response-formatter.ts`, `v2-execution/start.ts`, `v2-execution/continue-rehydrate.ts`, `v2-execution/continue-advance.ts`
### Slice 2: Reference declarations (done)
Add `references` as an optional array on `WorkflowDefinition` and `workflow.schema.json`. Structural validation in the validation engine (unique IDs, non-empty paths). Compiler includes declarations in workflow hash. Surfaced in `inspect_workflow` output.
**Key files**: `workflow-definition.ts`, `workflow.schema.json`, `validation-engine.ts`, `v2-workflow.ts` (inspect handler)
### Slice 3: Reference resolution at start-time (done)
I/O phase at `start_workflow` validates reference paths against the workspace, stores resolved references as observation events. Handler populates the envelope's reference section.
**Key files**: `v2-execution/start.ts`, `v2-workspace-resolution.ts`, observation event schema
### Slice 4: Reference delivery (done)
Formatter renders resolved references as a dedicated MCP content item on `start` (full set) and `rehydrate` (compact reminder). Separate from the authored prompt and from supplements.
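One way the delivery rule could look, as a hedged sketch: `renderReferences`, `toContent`, the `Lifecycle` union, and the exact wording of the rendered text are assumptions for illustration. The `{ type: "text", text }` item shape follows MCP text content. Emitting nothing on `advance` is also an assumption; the document only specifies `start` and `rehydrate`.

```typescript
interface ResolvedReference {
  readonly id: string;
  readonly title: string;
  readonly resolvedPath: string;
  readonly purpose: string;
  readonly authoritative: boolean;
}

interface TextContent { type: "text"; text: string }
type Lifecycle = "start" | "rehydrate" | "advance";

// Full listing on start, compact reminder on rehydrate, nothing otherwise (assumed).
function renderReferences(refs: readonly ResolvedReference[], lifecycle: Lifecycle): TextContent | null {
  if (refs.length === 0 || lifecycle === "advance") return null;
  if (lifecycle === "rehydrate") {
    const ids = refs.map((r) => r.id).join(", ");
    return { type: "text", text: `References still apply: ${ids}` };
  }
  const lines = refs.map(
    (r) => `- ${r.title} (${r.resolvedPath})${r.authoritative ? " [authoritative]" : ""}: ${r.purpose}`,
  );
  return { type: "text", text: ["References:", ...lines].join("\n") };
}

// References become their own content item, separate from the authored prompt.
function toContent(prompt: string, refs: readonly ResolvedReference[], lifecycle: Lifecycle): TextContent[] {
  const items: TextContent[] = [{ type: "text", text: prompt }];
  const refItem = renderReferences(refs, lifecycle);
  if (refItem !== null) items.push(refItem);
  return items;
}

const deliveredRefs: ResolvedReference[] = [
  {
    id: "authoring",
    title: "Authoring guide",
    resolvedPath: "/repo/docs/authoring.md",
    purpose: "Source of truth for step shape",
    authoritative: true,
  },
];

const onStart = toContent("Draft the workflow.", deliveredRefs, "start");
const onRehydrate = toContent("Draft the workflow.", deliveredRefs, "rehydrate");
const onAdvance = toContent("Draft the workflow.", deliveredRefs, "advance");
```

Since the prompt item is never modified, the authored prompt stays byte-identical whether or not references are present.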
**Key files**: `v2-response-formatter.ts`, `handler-factory.ts` (toMcpResult)
### Slice 5: Project-attached references (deferred — future work)
`.workrail/references.json` merges with workflow-declared references at start-time. Provenance field (`workflow_declared` | `project_attached`) distinguishes origin. Drift detection via observation comparison (same pattern as binding drift in `binding-drift.ts`).
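Since this slice is deferred, the following is only a speculative sketch of the merge and drift check. The two provenance values come from the text above; `mergeReferences`, `detectDrift`, the trimmed `Ref` shape, and the collision rule (workflow-declared wins) are assumptions, not decided behavior.

```typescript
type Provenance = "workflow_declared" | "project_attached";

// Trimmed-down reference: just enough to show merging and drift.
interface Ref { readonly id: string; readonly source: string }
interface TaggedRef extends Ref { readonly provenance: Provenance }

// Merge at start-time, tagging each entry with where it came from.
function mergeReferences(declared: readonly Ref[], attached: readonly Ref[]): TaggedRef[] {
  const merged = new Map<string, TaggedRef>();
  for (const r of attached) merged.set(r.id, { ...r, provenance: "project_attached" });
  // Assumption: on an id collision the workflow-declared entry wins.
  for (const r of declared) merged.set(r.id, { ...r, provenance: "workflow_declared" });
  return Array.from(merged.values());
}

// Drift: a project-attached ref observed at session start whose source
// has since changed or disappeared. Returns the drifted ids.
function detectDrift(observed: readonly TaggedRef[], current: readonly TaggedRef[]): string[] {
  const now = new Map<string, string>();
  for (const r of current) now.set(r.id, r.source);
  return observed
    .filter((r) => r.provenance === "project_attached" && now.get(r.id) !== r.source)
    .map((r) => r.id);
}

const declared: Ref[] = [{ id: "guide", source: "docs/guide.md" }];

// Captured at session start.
const observed = mergeReferences(declared, [
  { id: "guide", source: "local/guide.md" },
  { id: "playbook", source: "team/playbook.md" },
]);

// State at rehydrate: the project moved its playbook.
const current = mergeReferences(declared, [{ id: "playbook", source: "team/playbook-v2.md" }]);

const drifted = detectDrift(observed, current);
```

Workflow-declared entries are skipped in the drift check because they are already covered by the workflow hash.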
**Key files**: new `reference-registry.ts`, `v2-execution/continue-rehydrate.ts` (drift detection), `v2-response-formatter.ts` (drift warnings)
### Slice 6: metaGuidance clarification (done)
Either make metaGuidance delivery explicit through the envelope (a supplement or dedicated content section with clear lifecycle semantics) or deprecate it with a migration path to references + prompt composition.
**Key files**: `workflow-definition.ts`, `prompt-renderer.ts`, `v2-response-formatter.ts`, authoring spec, authoring docs
## Relationship to other initiatives
- **Composition and middleware engine** (agentic-orchestration-roadmap.md Phase 2): the StepContentEnvelope provides a typed surface that a future assembler/middleware engine would populate, rather than producing raw strings.
- **Authorable response supplements** (agentic-orchestration-roadmap.md backlog): the envelope gives supplements a typed home. Authorable supplements would declare their content in workflow JSON and flow through the envelope rather than being hardcoded in `response-supplements.ts`.
- **Clean response formatting** (active partial): this initiative completes the boundary clarification between authored prompts, system-injected content, and delivery framing by making each category typed and inspectable.
## Open questions
- Should references support URI schemes beyond file paths (e.g. `https://`, `wr.refs.*`)? Deferring to v1 feedback.
- Should the envelope carry the full supplement specs or just the rendered text? Leaning toward rendered text to keep the formatter's presentation logic in one place.
- Should drift detection for project-attached references be blocking or advisory? Leaning advisory (same as binding drift).