@exaudeus/workrail 3.27.0 → 3.29.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
- package/dist/console/index.html +1 -1
- package/dist/manifest.json +3 -3
- package/docs/README.md +57 -0
- package/docs/adrs/001-hybrid-storage-backend.md +38 -0
- package/docs/adrs/002-four-layer-context-classification.md +38 -0
- package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
- package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
- package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
- package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
- package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
- package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
- package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
- package/docs/adrs/010-release-pipeline.md +89 -0
- package/docs/architecture/README.md +7 -0
- package/docs/architecture/refactor-audit.md +364 -0
- package/docs/authoring-v2.md +527 -0
- package/docs/authoring.md +873 -0
- package/docs/changelog-recent.md +201 -0
- package/docs/configuration.md +505 -0
- package/docs/ctc-mcp-proposal.md +518 -0
- package/docs/design/README.md +22 -0
- package/docs/design/agent-cascade-protocol.md +96 -0
- package/docs/design/autonomous-console-design-candidates.md +253 -0
- package/docs/design/autonomous-console-design-review.md +111 -0
- package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
- package/docs/design/claude-code-source-deep-dive.md +713 -0
- package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
- package/docs/design/console-execution-trace-candidates-final.md +160 -0
- package/docs/design/console-execution-trace-candidates.md +211 -0
- package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
- package/docs/design/console-execution-trace-design-review.md +74 -0
- package/docs/design/console-execution-trace-discovery.md +394 -0
- package/docs/design/console-execution-trace-final-review.md +77 -0
- package/docs/design/console-execution-trace-review.md +92 -0
- package/docs/design/console-performance-discovery.md +415 -0
- package/docs/design/console-ui-backlog.md +280 -0
- package/docs/design/daemon-architecture-discovery.md +853 -0
- package/docs/design/daemon-design-candidates.md +318 -0
- package/docs/design/daemon-design-review-findings.md +119 -0
- package/docs/design/daemon-engine-design-candidates.md +210 -0
- package/docs/design/daemon-engine-design-review.md +131 -0
- package/docs/design/daemon-execution-engine-discovery.md +280 -0
- package/docs/design/daemon-gap-analysis.md +554 -0
- package/docs/design/daemon-owns-console-plan.md +168 -0
- package/docs/design/daemon-owns-console-review.md +91 -0
- package/docs/design/daemon-owns-console.md +195 -0
- package/docs/design/data-model-erd.md +11 -0
- package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
- package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
- package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
- package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
- package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
- package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
- package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
- package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
- package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
- package/docs/design/list-workflows-latency-fix-plan.md +128 -0
- package/docs/design/list-workflows-latency-fix-review.md +55 -0
- package/docs/design/list-workflows-latency-fix.md +109 -0
- package/docs/design/native-context-management-api.md +11 -0
- package/docs/design/performance-sweep-2026-04.md +96 -0
- package/docs/design/routines-guide.md +219 -0
- package/docs/design/sequence-diagrams.md +11 -0
- package/docs/design/subagent-design-principles.md +220 -0
- package/docs/design/temporal-patterns-design-candidates.md +312 -0
- package/docs/design/temporal-patterns-design-review-findings.md +163 -0
- package/docs/design/test-isolation-from-config-file.md +335 -0
- package/docs/design/v2-core-design-locks.md +2746 -0
- package/docs/design/v2-lock-registry.json +734 -0
- package/docs/design/workflow-authoring-v2.md +1044 -0
- package/docs/design/workflow-docs-spec.md +218 -0
- package/docs/design/workflow-extension-points.md +687 -0
- package/docs/design/workrail-auto-trigger-system.md +359 -0
- package/docs/design/workrail-config-file-discovery.md +513 -0
- package/docs/docker.md +110 -0
- package/docs/generated/v2-lock-closure-plan.md +26 -0
- package/docs/generated/v2-lock-coverage.json +797 -0
- package/docs/generated/v2-lock-coverage.md +177 -0
- package/docs/ideas/backlog.md +3927 -0
- package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
- package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
- package/docs/ideas/implementation_plan.md +249 -0
- package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
- package/docs/implementation/02-architecture.md +316 -0
- package/docs/implementation/04-testing-strategy.md +124 -0
- package/docs/implementation/09-simple-workflow-guide.md +835 -0
- package/docs/implementation/13-advanced-validation-guide.md +874 -0
- package/docs/implementation/README.md +21 -0
- package/docs/integrations/claude-code.md +300 -0
- package/docs/integrations/firebender.md +315 -0
- package/docs/migration/v0.1.0.md +147 -0
- package/docs/naming-conventions.md +45 -0
- package/docs/planning/README.md +104 -0
- package/docs/planning/github-ticketing-playbook.md +195 -0
- package/docs/plans/README.md +24 -0
- package/docs/plans/agent-managed-ticketing-design.md +605 -0
- package/docs/plans/agentic-orchestration-roadmap.md +112 -0
- package/docs/plans/assessment-gates-engine-handoff.md +536 -0
- package/docs/plans/content-coherence-and-references.md +151 -0
- package/docs/plans/library-extraction-plan.md +340 -0
- package/docs/plans/mr-review-workflow-redesign.md +1451 -0
- package/docs/plans/native-context-management-epic.md +11 -0
- package/docs/plans/perf-fixes-design-candidates.md +225 -0
- package/docs/plans/perf-fixes-design-review-findings.md +61 -0
- package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
- package/docs/plans/perf-fixes-new-issues-review.md +110 -0
- package/docs/plans/prompt-fragments.md +53 -0
- package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
- package/docs/plans/ui-ux-workflow-discovery.md +100 -0
- package/docs/plans/ui-ux-workflow-review.md +48 -0
- package/docs/plans/v2-followup-enhancements.md +587 -0
- package/docs/plans/workflow-categories-candidates.md +105 -0
- package/docs/plans/workflow-categories-discovery.md +110 -0
- package/docs/plans/workflow-categories-review.md +51 -0
- package/docs/plans/workflow-discovery-model-candidates.md +94 -0
- package/docs/plans/workflow-discovery-model-discovery.md +74 -0
- package/docs/plans/workflow-discovery-model-review.md +48 -0
- package/docs/plans/workflow-source-setup-phase-1.md +245 -0
- package/docs/plans/workflow-source-setup-phase-2.md +361 -0
- package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
- package/docs/plans/workflow-staleness-detection-review.md +58 -0
- package/docs/plans/workflow-staleness-detection.md +80 -0
- package/docs/plans/workflow-v2-design.md +69 -0
- package/docs/plans/workflow-v2-roadmap.md +74 -0
- package/docs/plans/workflow-validation-design.md +98 -0
- package/docs/plans/workflow-validation-roadmap.md +108 -0
- package/docs/plans/workrail-platform-vision.md +420 -0
- package/docs/reference/agent-context-cleaner-snippet.md +94 -0
- package/docs/reference/agent-context-guidance.md +140 -0
- package/docs/reference/context-optimization.md +284 -0
- package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
- package/docs/reference/example-workflow-repository-template/README.md +268 -0
- package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
- package/docs/reference/external-workflow-repositories.md +916 -0
- package/docs/reference/feature-flags-architecture.md +472 -0
- package/docs/reference/feature-flags.md +349 -0
- package/docs/reference/god-tier-workflow-validation.md +272 -0
- package/docs/reference/loop-optimization.md +209 -0
- package/docs/reference/loop-validation.md +176 -0
- package/docs/reference/loops.md +465 -0
- package/docs/reference/mcp-platform-constraints.md +59 -0
- package/docs/reference/recovery.md +88 -0
- package/docs/reference/releases.md +177 -0
- package/docs/reference/troubleshooting.md +105 -0
- package/docs/reference/workflow-execution-contract.md +998 -0
- package/docs/roadmap/README.md +22 -0
- package/docs/roadmap/legacy-planning-status.md +103 -0
- package/docs/roadmap/now-next-later.md +70 -0
- package/docs/roadmap/open-work-inventory.md +389 -0
- package/docs/tickets/README.md +39 -0
- package/docs/tickets/next-up.md +76 -0
- package/docs/workflow-management.md +317 -0
- package/docs/workflow-templates.md +423 -0
- package/docs/workflow-validation.md +184 -0
- package/docs/workflows.md +254 -0
- package/package.json +3 -1
- package/spec/authoring-spec.json +61 -16
- package/workflows/workflow-for-workflows.json +252 -93
- package/workflows/workflow-for-workflows.v2.json +188 -77

@@ -0,0 +1,527 @@

# Workflow Authoring Guide (v2)

WorkRail v2 authoring is **JSON-first** and is designed for **determinism**, **rewind-safety**, and **resumability**.

> **Status:** v2 authoring is design-locked but not necessarily shipped yet. This doc is a v2-only entry point.

## Canonical references (v2)

- **Authoring model + JSON examples:** `docs/design/workflow-authoring-v2.md`
- **Execution contract (token-based):** `docs/reference/workflow-execution-contract.md`
- **Core design locks (anti-drift):** `docs/design/v2-core-design-locks.md`

## v2 authoring principles (high level)

### Structured freedom over rigid scripts

WorkRail workflows should constrain **outcomes and invariants**, not micromanage cognition.

Material branching, pathing, loop continuation, and gating should live in the workflow/engine as declarative control flow whenever possible, not in implicit agent judgment hidden inside prompt prose.

Authors should aim for:

- **rigid on invariants**: required outputs, loop decisions, confidence disclosure, blocked vs never-stop behavior, final handoff structure
- **semi-structured on heuristics**: routing matrices, severity guidance, confidence combination rules, artifact vs context split
- **adaptive on reasoning**: exploration order, clue prioritization, synthesis, finding phrasing, and unusual-case handling

The goal is **structured freedom**:

- not "trust the model" vagueness
- not bureaucratic form-filling

The agent should usually determine and record the route-driving facts. The engine should usually decide what node, branch, or loop state comes next.

Rather than prescribing the exact internal thought sequence the agent must follow, prefer asking:

- what must be known before leaving this phase?
- what must be disclosed if it is not known?
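
A minimal sketch of that split, assuming a simplified `{ var, equals }` condition shape: the agent's only job is to record a route-driving fact (here a hypothetical `taskSize`), and a small engine loop picks the next eligible step. `StepDef`, `Facts`, and `nextStep` are illustrative names, not WorkRail's engine code; see `docs/design/workflow-authoring-v2.md` for the real condition syntax.

```typescript
// Hypothetical shapes: a simplified stand-in for declarative step conditions.
type Facts = Record<string, string | number | boolean>;

interface Condition {
  var: string;
  equals: string | number | boolean;
}

interface StepDef {
  id: string;
  runCondition?: Condition;
}

// The engine, not prompt prose, decides what runs next:
// it skips any step whose condition does not match the recorded facts.
function nextStep(steps: StepDef[], afterIndex: number, facts: Facts): StepDef | undefined {
  for (let i = afterIndex + 1; i < steps.length; i++) {
    const cond = steps[i].runCondition;
    if (!cond || facts[cond.var] === cond.equals) {
      return steps[i];
    }
  }
  return undefined; // no eligible step remains
}

const steps: StepDef[] = [
  { id: "triage" },
  { id: "deep-dive", runCondition: { var: "taskSize", equals: "large" } },
  { id: "quick-fix", runCondition: { var: "taskSize", equals: "small" } },
  { id: "handoff" },
];

// The agent only recorded the fact; the route follows from it.
const recorded: Facts = { taskSize: "small" };
console.log(nextStep(steps, 0, recorded)?.id); // prints: quick-fix
```

Because the route is a function of recorded facts, the same facts always select the same next step.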

### Never-stop by default for enrichment and confidence gaps

For most workflows, missing enrichment sources or weak confidence should **degrade and disclose**, not block.

Typical examples:

- preferred capability unavailable
- missing ticket or supporting docs
- weak boundary confidence
- incomplete policy/context discovery

Blocking should be reserved for cases where:

- the review/task target is not meaningfully available
- a truly required capability is unavailable
- a required output contract is missing in blocking modes
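
The blocking rule above can be sketched as a small classifier. This is a hypothetical TypeScript model (the `Gap` union and the `decideGaps` function are illustrative names, not WorkRail APIs): every gap becomes a disclosure for the final handoff unless it falls into one of the three listed blocking cases.

```typescript
// Hypothetical model of "degrade and disclose" vs "block".
type Gap =
  | { kind: "capability"; name: string; requirement: "required" | "preferred" }
  | { kind: "enrichment"; source: string }
  | { kind: "low_confidence"; dimension: string }
  | { kind: "missing_output"; contract: string; blockingMode: boolean }
  | { kind: "target_unavailable" };

interface GapDecision {
  outcome: "continue" | "blocked";
  disclosures: string[]; // must surface in the final handoff
  blockers: string[];
}

function decideGaps(gaps: Gap[]): GapDecision {
  const disclosures: string[] = [];
  const blockers: string[] = [];
  for (const gap of gaps) {
    switch (gap.kind) {
      case "target_unavailable":
        blockers.push("review/task target is not meaningfully available");
        break;
      case "capability":
        (gap.requirement === "required" ? blockers : disclosures).push(
          `${gap.requirement} capability unavailable: ${gap.name}`,
        );
        break;
      case "missing_output":
        (gap.blockingMode ? blockers : disclosures).push(`missing output contract: ${gap.contract}`);
        break;
      case "enrichment":
        disclosures.push(`missing enrichment source: ${gap.source}`);
        break;
      case "low_confidence":
        disclosures.push(`weak ${gap.dimension} confidence`);
        break;
    }
  }
  return { outcome: blockers.length > 0 ? "blocked" : "continue", disclosures, blockers };
}

// A preferred capability and a missing ticket degrade the run; they do not stop it.
const decision = decideGaps([
  { kind: "capability", name: "web_browsing", requirement: "preferred" },
  { kind: "enrichment", source: "ticket" },
]);
console.log(decision.outcome, decision.disclosures.length); // prints: continue 2
```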

### Confidence is multi-dimensional

Avoid a single vague "confidence" concept when different uncertainty sources matter differently.

Reusable confidence dimensions often include:

- boundary confidence
- context / intent confidence
- policy-context confidence
- evidence confidence
- validation confidence

Authors should explicitly decide:

- which confidence dimensions matter for this workflow
- which ones cap final conclusions
- which ones trigger follow-up loops versus just downgrade the final handoff
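
One way to encode those three decisions is a per-workflow policy table. In this hypothetical sketch, the `DimensionPolicy` shape, the three-level scale, and the dimension keys are assumptions for illustration: dimensions that cap conclusions bound the final confidence by their weakest rating, and only dimensions marked as follow-up triggers cause a loop when rated `low`.

```typescript
type Level = "low" | "medium" | "high";
const ORDER: Level[] = ["low", "medium", "high"];

interface DimensionPolicy {
  capsConclusion: boolean; // final conclusion can be no stronger than this rating
  triggersFollowUp: boolean; // "low" loops back instead of only downgrading the handoff
}

// Hypothetical choices for one workflow; unlisted dimensions are ignored.
const policy: Record<string, DimensionPolicy> = {
  boundary: { capsConclusion: true, triggersFollowUp: true },
  evidence: { capsConclusion: true, triggersFollowUp: false },
  policyContext: { capsConclusion: false, triggersFollowUp: false },
};

function summarize(ratings: Record<string, Level>): { cap: Level; followUp: string[] } {
  let cap: Level = "high";
  const followUp: string[] = [];
  for (const [dimension, level] of Object.entries(ratings)) {
    const rule = policy[dimension];
    if (!rule) continue;
    if (rule.capsConclusion && ORDER.indexOf(level) < ORDER.indexOf(cap)) cap = level;
    if (rule.triggersFollowUp && level === "low") followUp.push(dimension);
  }
  return { cap, followUp };
}

// policyContext is rated low, but this policy neither caps nor loops on it.
console.log(summarize({ boundary: "medium", evidence: "high", policyContext: "low" }));
```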

### Use structure only when it earns its place

A matrix, field, ledger, or classification should exist only if it does at least one of these:

- prevents a real recurring failure mode
- improves deterministic control flow or resumability
- improves user-visible honesty / explainability
- materially changes routing or rigor

If it does none of those, it should be removed or downgraded to advisory guidance.

Practical example:

- a `boundaryConfidence` field earns its place because it can cap conclusions and trigger follow-up
- a five-level taxonomy that never changes routing probably does not

### Anti-lazy wording

Structured freedom should not become vague permission for shallow work.

Be careful with wording like:

- `if appropriate`
- `minimal pass`
- `light scan`
- `you may`
- `smallest`
- `cheapest`

These phrases are often useful, but they should usually be paired with a clear floor for what must still be achieved.

Prefer wording like:

- "do the lightest pass that still surfaces the main approaches, hard constraints, and obvious contradictions"
- "if you do not delegate, record why solo execution is enough"
- "generate enough distinct options to support a real choice"

The goal is freedom in method, not softness in rigor.

### User-voice prose is a real option

For bundled and user-facing workflows, authors should consider prose that sounds like the user is directly instructing the agent.

This is often better than detached framework narration for exploratory, advisory, or design-heavy workflows because it:

- keeps the workflow grounded in user intent
- reduces internal-boilerplate tone
- makes the workflow feel more like an expression of the user's will

Neutral/system-style prose is still appropriate for internal, infrastructural, or highly mechanical workflows. Choose deliberately.

### Auditor-first delegation is often the better default

When using subagents or routines, prefer bounded **audits** of the main agent's work over delegating broad task ownership.

Good auditor uses:

- context completeness audit
- depth audit
- adversarial challenge
- philosophy alignment review
- final verification

Executor-style delegation still makes sense for bounded independent work, but the parent workflow should usually remain the canonical synthesizer and decision-maker.

### JSON-first authoring

WorkRail v2 uses **JSON** as the canonical authoring format. DSL and YAML remain possible future input formats, but for v2 we optimize for determinism and straightforward validation.

Workflows are hashed based on their **compiled canonical model** (after templates/features/contracts are expanded), not raw text, so the hash remains stable and deterministic.
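
To illustrate why hashing a canonical model is stable where hashing raw text is not, here is a hedged TypeScript sketch. It assumes the compiled model is plain JSON, uses `JSON.parse` as a stand-in for the real compiler, canonicalizes by sorting object keys, and uses 32-bit FNV-1a purely as a dependency-free placeholder digest; WorkRail's actual canonicalization rules and hash function are not specified here.

```typescript
// Stand-in for the compiled canonical model: any JSON value.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted, so key order and whitespace in the
// source text cannot affect the result.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj: { [key: string]: Json } = value;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units: a placeholder digest, not WorkRail's.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

function workflowHash(compiledModel: Json): string {
  return fnv1a32(canonicalize(compiledModel));
}

const compact = `{"id":"demo","steps":[{"id":"s1","prompt":"Do X"}]}`;
const reformatted = `{
  "steps": [{ "prompt": "Do X", "id": "s1" }],
  "id": "demo"
}`;

// JSON.parse stands in for compilation here.
console.log(workflowHash(JSON.parse(compact)) === workflowHash(JSON.parse(reformatted))); // prints: true
```

Any real implementation would also need canonical handling of whatever the compiler expands (templates, features, contracts) before hashing; this sketch only shows the ordering and whitespace part.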

### Authoring primitives (v2)

WorkRail v2 introduces several primitives for expressive workflows:

- **Capabilities** (workflow-global): declare the agent capabilities a workflow can use, like `delegation` or `web_browsing`, each marked required or preferred.
- **Features** (compiler middleware): mostly toggle IDs; a small subset supports typed config objects (`{id, config}`).
- **Templates**: reusable step sequences, called explicitly via `type: "template_call"`.
- **Contract packs**: WorkRail-owned output schemas for structured artifacts (e.g., `wr.contracts.capability_observation`).
- **PromptBlocks** (optional): structure step prompts as blocks (goal/constraints/procedure/outputRequired/verify) which compile to deterministic text.
- **AgentRole**: workflow and/or step-level stance/persona (not system prompt control).
- **Extension points**: named slots declared with `extensionPoints` and referenced via `{{wr.bindings.slotId}}` tokens; resolved at compile time from project `.workrail/bindings.json` overrides or workflow defaults. Enables project-overridable delegation seams without forking workflow JSON.
- **References**: workflow-declared pointers to external documents (schemas, specs, guides). Resolved at start time, delivered as a separate MCP content item. The agent reads the files itself if needed. See the "Workflow references" section below.
- **Assessments**: workflow-declared assessment shapes (`assessments`) that steps can reference with one or more `assessmentRefs` and, in v1, use for one exact-match `require_followup` consequence via `assessmentConsequences` (fires if any dimension across any referenced assessment equals the trigger level).

For detailed JSON syntax and examples, see: `docs/design/workflow-authoring-v2.md`.

### Choosing the right authoring mechanism

Use different primitives for different jobs. A good rule of thumb is:

- **Different flow** → use `runCondition`
- **Same flow, different wording** → use `promptFragments`

| Mechanism | What it does | When it applies | Best for |
|---|---|---|---|
| **`features`** | Inject reusable guidance into `promptBlocks` sections | **Compile time** | Cross-cutting rules like memory or subagent discipline |
| **`promptFragments`** | Add small conditional prompt text to an existing step | **Render/runtime** | Context-sensitive nudges without branching the DAG |
| **`templateCall`** | Insert reusable step structure or step sequences inline | **Compile time** | Reusing standard routines or audits as first-class parent steps |
| **`extensionPoints`** | Resolve overridable bound routine/workflow IDs in prompt text | **Compile time** | Swapping delegated bounded seams without forking |
| **`runCondition`** | Decide whether a step runs | **Runtime** | Real branching, pathing, or routing |

Use these as the default choice rules:

- **Use `runCondition`** when the workflow should actually take a different path.
- **Use `promptFragments`** when the step stays the same but needs small context-sensitive additions.
- **Use `templateCall`** when you want reusable standard structure or routine injection.
- **Use `extensionPoints`** when you want a project-overridable delegated seam, not inline structure.
- **Use `features`** for workflow-wide repeated guidance.
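
The difference between the first two rules can be sketched in a few lines of TypeScript. The `Step` and `Fragment` shapes and the `{ var, equals }` condition are hypothetical simplifications, not the real schema: `runCondition` decides whether a step exists on this path at all, while `promptFragments` only append text to a step that runs either way.

```typescript
type Ctx = Record<string, string>;

// Hypothetical condition and step shapes, simplified for illustration.
interface Cond {
  var: string;
  equals: string;
}

interface Fragment {
  when: Cond;
  text: string;
}

interface Step {
  id: string;
  prompt: string;
  runCondition?: Cond;
  promptFragments?: Fragment[];
}

const matches = (cond: Cond, ctx: Ctx): boolean => ctx[cond.var] === cond.equals;

function render(steps: Step[], ctx: Ctx): string[] {
  return steps
    .filter((s) => !s.runCondition || matches(s.runCondition, ctx)) // different flow
    .map((s) => {
      const extras = (s.promptFragments ?? []).filter((f) => matches(f.when, ctx)).map((f) => f.text);
      return [s.prompt, ...extras].join("\n"); // same flow, different wording
    });
}

const workflow: Step[] = [
  {
    id: "analyze",
    prompt: "Analyze the change.",
    promptFragments: [{ when: { var: "rigor", equals: "strict" }, text: "Cite a file and line for every finding." }],
  },
  { id: "deep-audit", prompt: "Run the full audit.", runCondition: { var: "rigor", equals: "strict" } },
  { id: "summarize", prompt: "Summarize for the user." },
];

console.log(render(workflow, { rigor: "quick" }).length); // prints: 2
console.log(render(workflow, { rigor: "strict" })[0]);
```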

#### References are for execution-time companion material, not authoring provenance

Use workflow `references` sparingly.

- **Good use**: documents the running workflow may genuinely need while doing its job, such as a shipped rubric, a target-system spec, or a project policy document.
- **Bad use**: authoring-only material about how the workflow was designed, including the workflow schema, authoring spec, or maintainer provenance, unless the workflow is itself about authoring or validating workflows.

Practical test:

- If the reference helps the agent perform the workflow's **runtime task**, keep it.
- If it only helps a maintainer justify or inspect the workflow's design, remove it from normal execution workflows.

#### Strong default: inject first, override only when delegation is intentional

When choosing between `templateCall` and `extensionPoints`, prefer this rule:

- **If the routine should be part of the parent workflow's visible step structure, use `templateCall`.**
- **If the routine should remain an opaque bounded implementation that the parent may delegate to, use delegation.**
- **If that delegated seam should be customizable per project, wrap that delegation seam in `extensionPoints`.**

Do **not** use `extensionPoints` as a generic substitute for routine injection.

Important implementation detail:

- `templateCall` expands routines into real steps during the compiler's template pass.
- `{{wr.bindings.*}}` tokens are resolved later, during binding resolution.
- Therefore, **extension points cannot currently choose which routine gets injected by a `templateCall`**.
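
The binding pass can be pictured as plain token substitution over already-expanded prompt text, which is why it cannot steer a `templateCall`: by the time it runs, template expansion is finished. This is a hedged TypeScript sketch; the `ExtensionPoint` shape, the slot-ID character set, and the `resolveBindings` name are assumptions, and `overrides` stands in for whatever the project's `.workrail/bindings.json` provides.

```typescript
// Hypothetical shape of a declared slot with its workflow default.
interface ExtensionPoint {
  slotId: string;
  default: string; // e.g. a bound routine or workflow ID
}

const BINDING_TOKEN = /\{\{wr\.bindings\.([A-Za-z0-9_-]+)\}\}/g;

// Project overrides win; otherwise fall back to the workflow default.
function resolveBindings(
  text: string,
  points: ExtensionPoint[],
  overrides: Partial<Record<string, string>>,
): string {
  const defaults = new Map(points.map((p) => [p.slotId, p.default] as const));
  return text.replace(BINDING_TOKEN, (_match: string, slotId: string) => {
    const bound = overrides[slotId] ?? defaults.get(slotId);
    if (bound === undefined) {
      throw new Error(`Unknown extension point: ${slotId}`);
    }
    return bound;
  });
}

const points: ExtensionPoint[] = [{ slotId: "review", default: "routine-default-review" }];
const prompt = "Delegate the audit to {{wr.bindings.review}}.";

console.log(resolveBindings(prompt, points, {})); // prints: Delegate the audit to routine-default-review.
console.log(resolveBindings(prompt, points, { review: "team-review" })); // prints: Delegate the audit to team-review.
```

Failing loudly on an unknown slot, rather than leaving the token in the prompt, is one reasonable design choice for a compile-time pass.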

### Baseline (Tier 0): notes-first

- **You can write workflows with no special authoring features.**
- The default durable output is a short recap in `output.notesMarkdown` (recorded by the agent when advancing or checkpointing).
- Structured artifacts are **optional** and must never be required for a workflow to be usable.

### Assessment-gate authoring (v1)

Assessment gates are now a shipped authoring/runtime feature, but the first slice is intentionally narrow.

Use them when:

- the workflow needs the agent to submit a bounded judgment explicitly
- that judgment should be durably recorded
- one specific result should keep the same step pending and require follow-up before retry

The v1 shape is:

- declare one or more workflow-level assessments in `assessments`
- reference them from a step with one or more `assessmentRefs` entries
- optionally declare one step-level `assessmentConsequences` rule
- if any dimension across any referenced assessment matches that rule, the engine returns a retryable same-step follow-up block

Important limits in v1:

- at least one `assessmentRefs` entry when `assessmentConsequences` is present; multiple refs are supported
- at most one `assessmentConsequences` entry per step
- one supported effect only: `require_followup`
- exact-match trigger only: the rule names one declared canonical level and fires when any referenced dimension equals exactly that level
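
An authoring-time check for those four limits might look like the following TypeScript sketch. The step and definition shapes reuse the field names from this guide (`assessmentRefs`, `assessmentConsequences`, `anyEqualsLevel`, `require_followup`), but the `validateAssessmentStep` function and its error messages are hypothetical, not WorkRail's actual validator.

```typescript
interface AssessmentDef {
  id: string;
  dimensions: string[];
  levels: string[];
}

interface AssessmentConsequence {
  when: { anyEqualsLevel: string };
  effect: { kind: string; guidance: string };
}

interface AssessmentStep {
  id: string;
  assessmentRefs?: string[];
  assessmentConsequences?: AssessmentConsequence[];
}

function validateAssessmentStep(step: AssessmentStep, defs: AssessmentDef[]): string[] {
  const errors: string[] = [];
  const refs = step.assessmentRefs ?? [];
  const consequences = step.assessmentConsequences ?? [];
  const referenced = defs.filter((d) => refs.includes(d.id));

  for (const ref of refs) {
    if (!defs.some((d) => d.id === ref)) errors.push(`unknown assessment "${ref}"`);
  }
  if (consequences.length > 0 && refs.length === 0) {
    errors.push("assessmentConsequences requires at least one assessmentRefs entry");
  }
  if (consequences.length > 1) {
    errors.push("at most one assessmentConsequences entry per step in v1");
  }
  for (const c of consequences) {
    if (c.effect.kind !== "require_followup") {
      errors.push(`unsupported effect "${c.effect.kind}"`);
    }
    // Exact match only: the trigger must be a level some referenced assessment declares.
    if (!referenced.some((d) => d.levels.includes(c.when.anyEqualsLevel))) {
      errors.push(`level "${c.when.anyEqualsLevel}" is not declared by a referenced assessment`);
    }
  }
  return errors.map((e) => `${step.id}: ${e}`);
}

const readiness: AssessmentDef = {
  id: "readiness",
  dimensions: ["evidence_quality", "coverage_completeness", "contradiction_resolution"],
  levels: ["low", "medium", "high"],
};

const followUpOnLow: AssessmentConsequence = {
  when: { anyEqualsLevel: "low" },
  effect: { kind: "require_followup", guidance: "Anchor each finding to specific code, then retry." },
};

console.log(
  validateAssessmentStep(
    { id: "handoff", assessmentRefs: ["readiness"], assessmentConsequences: [followUpOnLow] },
    [readiness],
  ),
);
```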

Keep the responsibility split clean:

- **assessment definition** = reusable vocabulary (`dimensions`, allowed `levels`, purpose)
- **step usage** = local execution behavior (`assessmentRefs`, `assessmentConsequences`)

Do not put execution-policy meaning into the assessment definition itself.

Practical guidance:

- use assessments for **bounded judgment**, not generic scoring
- keep dimensions small and semantically meaningful
- use canonical level names authors and agents can understand easily
- write follow-up guidance as **same-step retry guidance**, not as a subflow or rewind instruction
- prefer one strong follow-up trigger over multiple weak ones

**Dimension design: orthogonality matters**

The value of multi-dimensional assessments is that each dimension independently blocks advancement for a different reason. A dimension that restates existing workflow state adds ceremony without structure.

Good dimensions are:

- **Orthogonal**: each captures a distinct failure mode the others don't catch
- **Independently checkable**: a `low` rating on one dimension alone justifies follow-up, regardless of the others
- **Specific**: about a concrete, observable thing -- not "is the overall result good"

Bad dimensions:

- A single `confidence` dimension that mirrors the workflow's existing `recommendationConfidenceBand` -- it just restates what the workflow already knows
- Multiple dimensions that all reduce to the same question phrased differently
- Dimensions so correlated that one being low always implies the others are low too

Example of orthogonal dimensions for an MR review handoff:

- `evidence_quality` -- are findings grounded in specific code locations? (catches: weak analysis)
- `coverage_completeness` -- are all relevant domains checked? (catches: blind spots)
- `contradiction_resolution` -- are competing interpretations resolved? (catches: premature synthesis)

Each catches a distinct failure mode. A review can have strong evidence but miss whole domains, or have good coverage with unresolved contradictions.

**Consequence trigger**: use `anyEqualsLevel` to specify which level should block. WorkRail checks all submitted dimensions and fires the consequence if any of them equals that level. For single-dimension assessments this works the same as an exact match.

```json
{
  "when": { "anyEqualsLevel": "low" },
  "effect": { "kind": "require_followup", "guidance": "..." }
}
```
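
To make the `anyEqualsLevel` semantics concrete, here is a hypothetical TypeScript sketch of how an engine could evaluate that rule against the submitted ratings for a step's referenced assessments. The `Submission` and `Outcome` types and the `evaluateConsequence` function are illustrative names, not part of WorkRail's documented contract.

```typescript
// assessmentId -> dimension -> submitted level
type Submission = Record<string, Record<string, string>>;

interface Consequence {
  when: { anyEqualsLevel: string };
  effect: { kind: "require_followup"; guidance: string };
}

type Outcome =
  | { kind: "advance" }
  | { kind: "followup_required"; guidance: string; triggeredBy: string[] };

function evaluateConsequence(refs: string[], submission: Submission, rule?: Consequence): Outcome {
  if (!rule) return { kind: "advance" };
  const triggeredBy: string[] = [];
  // Check every dimension of every referenced assessment, not just the first.
  for (const ref of refs) {
    const dims: Record<string, string> = submission[ref] ?? {};
    for (const [dimension, level] of Object.entries(dims)) {
      if (level === rule.when.anyEqualsLevel) triggeredBy.push(`${ref}.${dimension}`);
    }
  }
  return triggeredBy.length > 0
    ? { kind: "followup_required", guidance: rule.effect.guidance, triggeredBy } // same step stays pending
    : { kind: "advance" };
}

const rule: Consequence = {
  when: { anyEqualsLevel: "low" },
  effect: { kind: "require_followup", guidance: "Check the uncovered domains, then retry." },
};

const result = evaluateConsequence(
  ["readiness"],
  { readiness: { evidence_quality: "high", coverage_completeness: "low", contradiction_resolution: "high" } },
  rule,
);
console.log(result);
```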

**V1 limit**: at most one consequence per step.

Good fit:

- "Before handing off, assess whether the diagnosis is ready -- coverage is complete, evidence is grounded, and contradictions are resolved."
- "If evidence quality is low, require follow-up to anchor findings to specific code before retry."

Bad fit:

- generic five-level scores that never affect workflow behavior
- a single `confidence` dimension that mirrors a confidence band the workflow already tracks
- multiple interacting rule chains that really want a policy DSL
- subflow-style recovery sequences masquerading as a single follow-up consequence

### Rigor vs rigidity

Workflows should provide a **strong skeleton, not a straitjacket**.

- Be strict about the things that protect quality:
  - required outcomes
  - important decision points
  - evidence and uncertainty accounting
  - final validation or challenge passes
- Be flexible about the things LLMs are good at:
  - synthesis order
  - framing moves
  - idea generation
  - creative problem decomposition

Author for **what must be accomplished**, not for an exact internal choreography of thought.

### Anti-lazy wording

Adaptive workflows should leave room for judgment without creating easy escape hatches.

Be careful with wording like:

- `if appropriate`
- `minimal pass`
- `light scan`
- `you may`
- `smallest`
- `cheapest`

These phrases are often useful, but they become lazy when they are not paired with a quality floor.

Prefer wording like:

- "do the lightest pass that still surfaces the main approaches, hard constraints, and obvious contradictions"
- "choose the lighter path only if it still answers the real question"
- "if you do not delegate, record why solo execution is enough"

Good adaptive wording gives the agent freedom in method while still requiring it to **earn confidence**.

### User-voice prose

For bundled and user-facing workflows, prefer prose that feels like **the user is directly instructing the agent**.

This usually works better than detached author or framework narration because it:

- keeps the workflow grounded in user intent
- makes prompts feel less like internal boilerplate
- encourages the agent to treat the workflow as an expression of the user's will

Neutral/system-style prose is still fine for internal or infrastructural workflows, but user-facing flows generally benefit from a clearer user voice.

### Builtins (no user-defined plugins)

WorkRail v2 provides **built-in** building blocks that workflows (including external workflows) can reference:

- **Templates**: pre-built steps (or step sequences) authors can “call” to speed up authoring and ensure consistency.
- **Features**: deterministic, closed-set “middleware” applied by WorkRail (e.g., tier-aware instructions, formatting, durable recap guidance).
- **Contract packs**: server-side definitions for allowed artifact kinds and small examples (no schema authoring required by workflow authors).

External workflows can reference these builtins, but cannot define arbitrary new plugin code.
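
Because features are a closed set, a workflow can only name them; it cannot supply their behavior. The following is a hedged TypeScript sketch of a normalization step that accepts both toggle IDs and the `{id, config}` form mentioned under authoring primitives. The `example.*` feature IDs are made-up placeholders, and `resolveFeatures` is a hypothetical helper rather than WorkRail's compiler API.

```typescript
type FeatureSpec = string | { id: string; config: Record<string, unknown> };

interface ResolvedFeature {
  id: string;
  config?: Record<string, unknown>;
}

// Placeholder IDs: the real builtin set is owned by WorkRail, not by workflow authors.
const BUILTIN_FEATURES = new Set(["example.durable_recap", "example.subagent_discipline", "example.output_format"]);
const CONFIGURABLE_FEATURES = new Set(["example.output_format"]);

function resolveFeatures(specs: FeatureSpec[]): ResolvedFeature[] {
  return specs.map((spec) => {
    const feature: ResolvedFeature = typeof spec === "string" ? { id: spec } : { id: spec.id, config: spec.config };
    if (!BUILTIN_FEATURES.has(feature.id)) {
      throw new Error(`Unknown feature "${feature.id}": only builtin features can be referenced`);
    }
    if (feature.config !== undefined && !CONFIGURABLE_FEATURES.has(feature.id)) {
      throw new Error(`Feature "${feature.id}" is a plain toggle and does not accept config`);
    }
    return feature;
  });
}

console.log(resolveFeatures(["example.durable_recap", { id: "example.output_format", config: { style: "bullets" } }]));
```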

### Where injections happen: templates as anchors

When something needs to be injected at a specific point (“run an audit here”, “insert a standard gate here”), **template references are the primary anchor**:

- Explicit at the callsite (less hidden magic).
- Deterministic and debuggable.
- Avoids tag-taxonomy sprawl.

If what you want is **team-overridable delegation**, use `extensionPoints` instead, but treat that as a different mechanism with different runtime behavior, not another form of injection.

Tags can still exist as optional **classification** metadata (for UI organization and search), but should not be the primary injection mechanism.
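
Template expansion at the callsite can be sketched as a compile-time pass that replaces each `template_call` step with the template's steps, in place. In this hypothetical TypeScript sketch, the `templateId` field, the registry shape, and the `callsite.innerId` naming scheme are assumptions chosen to keep expanded steps traceable to where they were called; only `type: "template_call"` comes from this guide.

```typescript
// Hypothetical authored step shapes.
interface PromptStep {
  type: "prompt";
  id: string;
  prompt: string;
}

interface TemplateCallStep {
  type: "template_call";
  id: string;
  templateId: string;
}

type AuthoredStep = PromptStep | TemplateCallStep;

function expandTemplates(steps: AuthoredStep[], templates: Record<string, PromptStep[]>): PromptStep[] {
  const out: PromptStep[] = [];
  for (const step of steps) {
    if (step.type === "prompt") {
      out.push(step);
      continue;
    }
    const body = templates[step.templateId];
    if (!body) {
      throw new Error(`Unknown template "${step.templateId}" called at step "${step.id}"`);
    }
    // Prefix inner IDs with the callsite ID so every injected step is explicit and traceable.
    for (const inner of body) {
      out.push({ ...inner, id: `${step.id}.${inner.id}` });
    }
  }
  return out;
}

const registry: Record<string, PromptStep[]> = {
  "example.audit": [
    { type: "prompt", id: "check", prompt: "Check the plan against the hard constraints." },
    { type: "prompt", id: "report", prompt: "Report gaps with evidence." },
  ],
};

const compiled = expandTemplates(
  [
    { type: "prompt", id: "plan", prompt: "Draft the plan." },
    { type: "template_call", id: "audit", templateId: "example.audit" },
    { type: "prompt", id: "ship", prompt: "Hand off the result." },
  ],
  registry,
);
console.log(compiled.map((s) => s.id).join(", ")); // prints: plan, audit.check, audit.report, ship
```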

### Response supplements for start/resume-only instructions

Some instructions should **not** be mixed into the workflow-authored step prompt:

- short onboarding guidance
- authority/provenance framing for the WorkRail channel
- logistics that should appear only at workflow start or when resuming

For these, use **response supplements** at the MCP response boundary rather than editing workflow JSON prompts directly.

The current implementation lives in `src/mcp/response-supplements.ts`.
|
|
388
|
+
|
|
389
|
+
#### When to use a response supplement

Use a response supplement when all of the following are true:

- the instruction is **system-owned** or delivery-owned, not part of the workflow author's actual step text
- it should be shown only for specific lifecycle moments like **`start`** or **`rehydrate`**
- it should remain **structurally separate** from the main step prompt so agents do not confuse it with the user's core instruction

Do **not** use a response supplement for:

- normal step instructions that belong in the workflow prompt
- durable session state
- anything that must be remembered as part of the workflow's semantic execution state

#### Delivery modes

Response supplements support two delivery modes:

- **`per_lifecycle`**: emit on every eligible lifecycle (for example, every `rehydrate`)
- **`once_per_session`**: emit only on one designated lifecycle (for example, `start`) without persisting delivery state

In the current design, `once_per_session` is a **policy-level one-time instruction**, not a durable delivery record. It means:

- choose the single lifecycle where the supplement should appear
- render it there deterministically
- do **not** store "shown/not shown" in session state unless exact delivery history becomes a real execution requirement

This keeps presentation policy out of durable workflow state.

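
Because both delivery modes are stateless, whether a supplement renders can be decided from the supplement definition and the current lifecycle alone. The following is a minimal TypeScript sketch of that decision. It assumes hypothetical names (`Lifecycle`, `SupplementDelivery`, `Supplement`, `shouldEmit`) modeled on the fields this page describes, and it assumes the only lifecycles are `start` and `rehydrate`. The real definitions in `src/mcp/response-supplements.ts` may differ.

```typescript
// Hypothetical types modeled on the fields described on this page.
type Lifecycle = "start" | "rehydrate";

type SupplementDelivery =
  | { mode: "per_lifecycle" }
  | { mode: "once_per_session"; emitOn: Lifecycle };

interface Supplement {
  kind: string;
  order: number;
  lifecycles: readonly Lifecycle[];
  delivery: SupplementDelivery;
  text: string;
}

// Pure decision: no session state is read or written.
function shouldEmit(s: Supplement, lifecycle: Lifecycle): boolean {
  if (!s.lifecycles.includes(lifecycle)) return false;
  const d = s.delivery;
  if (d.mode === "per_lifecycle") return true;
  // "Once" is a policy choice of a single lifecycle, not a stored "already shown" flag.
  return lifecycle === d.emitOn;
}
```

Calling `shouldEmit(s, "rehydrate")` twice in the same session returns the same answer both times. That is the point: no delivery history needs to be persisted.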
#### How to add a one-time instruction

1. Add a new supplement entry in `src/mcp/response-supplements.ts`
2. Give it a stable `kind` and explicit `order`
3. Choose the eligible `lifecycles`
4. Set `delivery` to:
   - `{ mode: 'per_lifecycle' }`, or
   - `{ mode: 'once_per_session', emitOn: '<lifecycle>' }`
5. Keep the text:
   - short
   - system-owned
   - clearly separate from the main authored prompt
6. Add or update:
   - unit tests in `tests/unit/mcp/response-supplements.test.ts`
   - integration tests if MCP boundary behavior matters

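
Steps 2 to 5 imply a few invariants that are cheap to check in a unit test:

- `kind` values are unique.
- `order` is an explicit number.
- Every entry has at least one eligible lifecycle.
- A `once_per_session` entry's `emitOn` is one of its `lifecycles`.
- Text is non-empty.

The sketch below assumes a hypothetical entry shape, a hypothetical `checkSupplements` helper, and placeholder entry text. It is not the actual contents of `src/mcp/response-supplements.ts` or its test file.

```typescript
// Hypothetical entry shape mirroring the checklist above.
type Lifecycle = "start" | "rehydrate";

interface SupplementEntry {
  kind: string; // stable identifier
  order: number; // explicit position among emitted supplements
  lifecycles: readonly Lifecycle[];
  delivery: { mode: "per_lifecycle" } | { mode: "once_per_session"; emitOn: Lifecycle };
  text: string;
}

// Returns human-readable problems; an empty array means the entries look well-formed.
function checkSupplements(entries: readonly SupplementEntry[]): string[] {
  const problems: string[] = [];
  const seenKinds = new Set<string>();
  for (const e of entries) {
    if (seenKinds.has(e.kind)) problems.push(`duplicate kind: ${e.kind}`);
    seenKinds.add(e.kind);
    if (!Number.isFinite(e.order)) problems.push(`${e.kind}: order must be a finite number`);
    if (e.lifecycles.length === 0) problems.push(`${e.kind}: no eligible lifecycles`);
    const d = e.delivery;
    if (d.mode === "once_per_session" && !e.lifecycles.includes(d.emitOn)) {
      problems.push(`${e.kind}: emitOn '${d.emitOn}' is not an eligible lifecycle`);
    }
    if (e.text.trim().length === 0) problems.push(`${e.kind}: empty text`);
  }
  return problems;
}

// Hypothetical entry following steps 1 to 5 (placeholder text).
const ONBOARDING: SupplementEntry = {
  kind: "onboarding_hint",
  order: 10,
  lifecycles: ["start"],
  delivery: { mode: "once_per_session", emitOn: "start" },
  text: "Short, system-owned onboarding note (placeholder text).",
};
```

In a real test file, a check like this would run over the actual supplement list and assert an empty result.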
#### Authoring rule of thumb

Use the **workflow prompt** for what the user wants done.

Use a **response supplement** for small, boundary-owned instructions about how WorkRail should frame or deliver that step to the agent.

### Workflow references

Workflows can declare pointers to external documents that the agent should be aware of during execution. Unlike `metaGuidance` (short behavioral rules surfaced on start and resume), references point at external files without inlining their content.

```jsonc
"references": [
  {
    "id": "api-schema",
    "title": "API Schema",
    "source": "./spec/api-schema.json", // a path only; the file content is not inlined
    "purpose": "Canonical API contract",
    "authoritative": true
  }
]
```

- **Delivered automatically** as a separate MCP content item on `start` (full details) and on `rehydrate` (compact reminder). References are not delivered on `advance`.
- **Pointer-only**: WorkRail validates that the path exists at start time but does not inline the file content. The agent reads the files itself.
- **Surfaced in `inspect_workflow`** for discoverability before starting.
- **Included in `workflowHash`**: reference declarations (not file contents) are part of the hash.

For JSON syntax details, see: `docs/design/workflow-authoring-v2.md` → "References" section.

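
The pointer-only behavior can be sketched as two small pure functions. The first checks that each declared `source` exists. The second renders full details for `start`, a compact reminder for `rehydrate`, and nothing for `advance`. This TypeScript sketch uses hypothetical names (`WorkflowReference`, `missingReferences`, `renderReferences`) and treats `authoritative` as optional by assumption. It takes an injected `exists` callback so it runs without touching the filesystem. WorkRail's actual path resolution and message wording are not shown here and may differ.

```typescript
// Mirrors the JSON fields in the example above (authoritative is optional here by assumption).
interface WorkflowReference {
  id: string;
  title: string;
  source: string;
  purpose: string;
  authoritative?: boolean;
}

// Returns the ids whose source path does not exist; file contents are never read.
function missingReferences(
  refs: readonly WorkflowReference[],
  exists: (path: string) => boolean,
): string[] {
  return refs.filter((r) => !exists(r.source)).map((r) => r.id);
}

// Full details on start, a compact reminder on rehydrate, nothing on advance.
function renderReferences(
  refs: readonly WorkflowReference[],
  lifecycle: "start" | "rehydrate" | "advance",
): string | null {
  if (refs.length === 0 || lifecycle === "advance") return null;
  if (lifecycle === "start") {
    return refs
      .map((r) => `- ${r.title} (${r.source})${r.authoritative ? " [authoritative]" : ""}: ${r.purpose}`)
      .join("\n");
  }
  return `References: ${refs.map((r) => r.id).join(", ")}`;
}
```

With Node, `exists` could be `fs.existsSync` applied to a resolved path. The base directory WorkRail resolves against is not specified on this page.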
### Step identity and provenance

To keep authoring simple:

- Author step IDs remain the primary, stable identifiers (what agents see as `pending.stepId`).
- Template-expanded/internal step IDs are **reserved/internal** and carry provenance (what injected them, where, and why).
- By default, injected steps should be **collapsed** for agent UX; provenance exists for debugging/auditing and advanced views.

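
One way to read these rules is as a two-view model. The compiled step list keeps every step with its provenance, while the default agent-facing view folds injected steps under the authored callsite ID. The TypeScript sketch below uses hypothetical shapes and names (`Provenance`, `CompiledStep`, `AgentViewItem`, `agentView`) and a hypothetical `__tpl:` ID prefix in its test. How WorkRail actually collapses steps, and what it shows in advanced views, may differ.

```typescript
interface Provenance {
  injectedBy: string; // e.g. a template id
  callsiteStepId: string; // authored step where the injection was anchored
  reason: string;
}

interface CompiledStep {
  id: string; // authored id, or a reserved internal id for injected steps
  provenance?: Provenance; // present only on injected steps
}

interface AgentViewItem {
  stepId: string; // the author-facing id an agent would see
  collapsedCount: number; // injected steps folded into this item
}

// Collapse consecutive injected steps into their authored callsite id.
function agentView(steps: readonly CompiledStep[]): AgentViewItem[] {
  const view: AgentViewItem[] = [];
  for (const step of steps) {
    if (step.provenance === undefined) {
      view.push({ stepId: step.id, collapsedCount: 0 });
      continue;
    }
    const last = view[view.length - 1];
    if (last !== undefined && last.stepId === step.provenance.callsiteStepId) {
      last.collapsedCount += 1;
    } else {
      view.push({ stepId: step.provenance.callsiteStepId, collapsedCount: 1 });
    }
  }
  return view;
}
```

The full compiled list, with provenance intact, remains available for debugging and audit views. Only the default presentation is collapsed.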
### Versioning and determinism

- The canonical pin is a **content hash** of the **fully expanded compiled workflow** (including template expansions, feature application, and contract pack selection), not a human-maintained `version` string.
- Human `version` fields may exist as labels, but should not be the source of truth for determinism.

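
For a pin to depend only on content, the expanded workflow has to be serialized canonically before hashing, so that key order cannot change the result. The following is a minimal TypeScript sketch using Node's built-in `crypto` module. The choice of SHA-256, the recursively sorted-key JSON encoding, and the `canonicalJson` and `computeWorkflowHash` helpers are assumptions for illustration, not a description of how WorkRail computes `workflowHash`.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted recursively so equal content yields equal bytes.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`).join(",")}}`;
}

// Hash the fully expanded, compiled workflow (after templates, features, and contract packs).
function computeWorkflowHash(compiled: Json): string {
  return createHash("sha256").update(canonicalJson(compiled)).digest("hex");
}
```

Reordering keys in the input leaves the hash unchanged, while editing any expanded prompt changes it. Array order is preserved, because step order is meaningful.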
### Workflow staleness

Workflows can drift out of sync with the authoring spec they were written against. WorkRail surfaces this as a `staleness` signal in `list_workflows` and `inspect_workflow` output.

**How it works:** Workflows carry an optional `validatedAgainstSpecVersion` field, stamped by `workflow-for-workflows` after the quality gate passes. At list/inspect time, the engine compares this field against the current `spec/authoring-spec.json` version and returns:

- `none`: the workflow was validated against the current spec version
- `likely`: the spec has been updated since the workflow was last validated
- `possible`: the workflow has never been run through `workflow-for-workflows` (no `validatedAgainstSpecVersion` stamp)

**Stamping a workflow:**

```bash
# Stamp the workflow with the current authoring-spec version
npm run stamp-workflow -- workflows/my-workflow.json
# The stamp only takes effect once committed
git add workflows/my-workflow.json && git commit -m "chore: stamp workflow"
```

The stamp must be committed to take effect. The `workflow-for-workflows` Phase 7 step includes a reminder to do this.

**Visibility:** By default, the staleness signal is shown only for user-owned/imported workflows (`personal`, `rooted_sharing`, `external`). Built-in and `legacy_project` workflows are excluded. Set `WORKRAIL_DEV=1` to see staleness for all categories (useful for catalog maintenance).

**In `validate:registry`:** After each run, the validator prints a non-blocking advisory listing unstamped and outdated workflows. This advisory is always visible, regardless of the dev flag.

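
The three staleness values and the visibility rule reduce to a small pure function. This TypeScript sketch assumes the following, and the engine's actual comparison logic may differ:

- hypothetical names (`classifyStaleness`, `isStalenessVisible`, `USER_OWNED`)
- plain string equality for comparing spec versions
- that only the value `1` enables `WORKRAIL_DEV`
- the category strings listed above

```typescript
type Staleness = "none" | "likely" | "possible";

// validatedAgainstSpecVersion comes from the workflow; currentSpecVersion from spec/authoring-spec.json.
function classifyStaleness(
  validatedAgainstSpecVersion: string | undefined,
  currentSpecVersion: string,
): Staleness {
  if (validatedAgainstSpecVersion === undefined) return "possible"; // never stamped
  return validatedAgainstSpecVersion === currentSpecVersion ? "none" : "likely";
}

// Categories whose staleness is shown without the dev flag (per the Visibility rule above).
const USER_OWNED: ReadonlySet<string> = new Set(["personal", "rooted_sharing", "external"]);

function isStalenessVisible(category: string, env: Record<string, string | undefined>): boolean {
  return env["WORKRAIL_DEV"] === "1" || USER_OWNED.has(category);
}
```

In Node, `env` would typically be `process.env`.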
### Debugging and auditing

WorkRail v2 treats debugging/auditing as first-class:

- WorkRail should record a bounded “decision trace” (why a step was selected/skipped, loop decisions, fork detection) as durable data.
- Dashboards and exports can surface this trace for post-mortems without requiring the agent to carry debugging internals in chat.
- “Cognitive audits” (subagent auditor model) are supported via built-in templates/features, not bespoke author boilerplate.

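
A bounded trace can be as simple as a capped list that discards the oldest entries and counts what it dropped. The entry kinds below echo the examples in the first bullet. The `DecisionTrace` class, its fields, and the drop-oldest policy are hypothetical and do not describe WorkRail's actual event format.

```typescript
type DecisionKind = "step_selected" | "step_skipped" | "loop_decision" | "fork_detected";

interface DecisionEntry {
  kind: DecisionKind;
  stepId: string;
  reason: string; // short, human-readable explanation
}

// Keeps at most `cap` entries, discarding the oldest, and counts how many were dropped.
class DecisionTrace {
  private readonly entries: DecisionEntry[] = [];
  private dropped = 0;
  private readonly cap: number;

  constructor(cap: number) {
    this.cap = cap;
  }

  record(entry: DecisionEntry): void {
    this.entries.push(entry);
    if (this.entries.length > this.cap) {
      this.entries.shift();
      this.dropped += 1;
    }
  }

  // A copy suitable for persisting alongside the session or exporting to a dashboard.
  snapshot(): { entries: readonly DecisionEntry[]; dropped: number } {
    return { entries: [...this.entries], dropped: this.dropped };
  }
}
```

Keeping the trace as data outside the chat transcript lets a dashboard read it later without the agent ever seeing it.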
### Forced self-audit over self-reported confidence

Agents will often take the easy way out:

- assume they already have enough context
- assume they already understand the boundary
- skip challenge or audit because it "probably isn't needed"

So when a workflow needs an honest self-check, do **not** rely on vibes-only fields like:

- `stillFuzzy = true|false`
- `contextAuditNeeded = true|false`
- optional challenge wording with no rubric or trigger

Prefer patterns that force the agent to confront uncertainty:

- score concrete dimensions instead of reporting confidence directly
- require a short evidence statement for each score
- derive the next action from the rubric or trigger rules

The workflow should prove to the agent that it may not know enough yet, instead of asking the agent whether it feels confident.

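
To make the contrast concrete, here is a TypeScript sketch of the preferred pattern. The agent scores named dimensions and attaches evidence to each score, and fixed rules (rather than the agent) derive the next action. The dimension names, the 0 to 2 scale, the rules, and `nextAction` are all hypothetical.

```typescript
// Hypothetical rubric: 0 = unknown, 1 = partially verified, 2 = verified with evidence.
type Score = 0 | 1 | 2;

interface RubricItem {
  dimension: "context_coverage" | "boundary_clarity" | "risk_understanding";
  score: Score;
  evidence: string; // a concrete pointer: file, command output, or quoted requirement
}

type NextAction = "gather_more_context" | "run_challenge" | "proceed";

// The rules, not the agent's self-reported confidence, decide what happens next.
function nextAction(items: readonly RubricItem[]): NextAction {
  // A score without evidence counts as unknown.
  const effective = items.map((i) => (i.evidence.trim() === "" ? 0 : i.score));
  if (effective.length === 0 || effective.some((s) => s === 0)) return "gather_more_context";
  if (effective.some((s) => s === 1)) return "run_challenge";
  return "proceed";
}
```

An empty evidence string downgrades its score to 0. As a result, claiming full confidence without pointing at anything routes the agent back to context gathering instead of letting it skip ahead.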