@exaudeus/workrail 3.73.1 → 3.74.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/cli-worktrain.js +126 -1
- package/dist/console-ui/assets/{index-txIYXGHx.js → index-CfU3va8H.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/coordinators/pr-review.d.ts +11 -1
- package/dist/coordinators/types.d.ts +15 -0
- package/dist/coordinators/types.js +2 -0
- package/dist/manifest.json +81 -57
- package/dist/mcp/handlers/v2-advance-core/index.d.ts +1 -0
- package/dist/mcp/handlers/v2-advance-core/index.js +3 -3
- package/dist/mcp/handlers/v2-advance-core/outcome-success.js +4 -18
- package/dist/mcp/handlers/v2-advance-events.d.ts +1 -1
- package/dist/mcp/handlers/v2-advance-events.js +1 -1
- package/dist/mcp/handlers/v2-execution/advance.d.ts +1 -0
- package/dist/mcp/handlers/v2-execution/advance.js +3 -3
- package/dist/mcp/handlers/v2-execution/continue-advance.d.ts +1 -0
- package/dist/mcp/handlers/v2-execution/continue-advance.js +2 -1
- package/dist/mcp/handlers/v2-execution/index.js +3 -1
- package/dist/mcp/server.js +6 -4
- package/dist/mcp/types.d.ts +2 -0
- package/dist/trigger/coordinator-deps.js +203 -36
- package/dist/trigger/delivery-action.d.ts +1 -0
- package/dist/trigger/delivery-action.js +1 -1
- package/dist/trigger/delivery-pipeline.d.ts +13 -2
- package/dist/trigger/delivery-pipeline.js +58 -3
- package/dist/trigger/trigger-router.js +6 -3
- package/dist/v2/durable-core/constants.d.ts +1 -0
- package/dist/v2/durable-core/constants.js +1 -0
- package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +202 -0
- package/dist/v2/durable-core/schemas/session/events.d.ts +56 -0
- package/dist/v2/durable-core/schemas/session/events.js +8 -0
- package/dist/v2/infra/local/git-snapshot/index.d.ts +6 -0
- package/dist/v2/infra/local/git-snapshot/index.js +39 -0
- package/dist/v2/ports/git-snapshot.port.d.ts +10 -0
- package/dist/v2/ports/git-snapshot.port.js +9 -0
- package/dist/v2/projections/session-metrics.js +17 -2
- package/docs/authoring.md +23 -0
- package/docs/design/engine-boundary-discovery.md +123 -0
- package/docs/design/engine-boundary-review-findings.md +72 -0
- package/docs/ideas/backlog.md +129 -48
- package/package.json +1 -1
- package/spec/authoring-spec.json +36 -1
package/docs/ideas/backlog.md
CHANGED
@@ -18,62 +18,29 @@ See the scoring rubric in the "Agent-assisted backlog prioritization" entry (Wor
 
 ## P0 / Critical (blocks WorkTrain from working correctly)
 
-###
-
-**Status: idea** | Priority: high
-
-**Score: 13** | Cor:3 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
-
-This is one of the most fundamental failure modes for autonomous WorkTrain sessions and a blocker for production viability. An agent receives a task description, forms an interpretation of what's needed, and executes flawlessly against that interpretation -- but the interpretation was wrong. The code is correct for what the agent thought was asked. It is not what the user actually wanted. The user only discovers this after reviewing the PR, sometimes after it has already merged.
-
-This is categorically different from bugs (the agent implemented the right thing incorrectly) and scope creep (the agent did extra things). This is the agent solving the wrong problem well.
-
-**Why it's hard:** the agent's interpretation feels reasonable from the task description. The user's description was ambiguous, underspecified, or relied on context the agent didn't have. Neither party made an obvious mistake -- the gap is structural.
-
-**Known manifestations:**
-- Agent fixes the symptom instead of the root cause because the task description named the symptom
-- Agent implements feature X when the user wanted feature Y that happens to use X
-- Agent interprets "add support for Z" as extending the existing system when the user wanted a new abstraction
-- Agent makes a local fix when the user wanted an architectural change
-- Agent's implementation is technically correct but violates unstated invariants the user assumed were obvious
+### wr.coding-task implementation loop does not exit when slices complete (Apr 30, 2026)
 
-**
-- Where in the workflow should intent validation happen? Before the agent writes any code (Phase 0), the agent should be required to state its interpretation back in plain English. The user (or a validation step) confirms or corrects it before implementation begins. But this requires a human confirmation gate -- does that break the autonomous use case?
-- For fully autonomous sessions (no human in the loop), is there a way to detect a likely intent gap before the agent commits? Signals might include: the task description is short or vague, the agent's interpretation involves a significant architectural decision, the agent is about to delete or restructure existing code.
-- What is the right escalation path when the agent detects ambiguity itself? Currently `report_issue` handles task obstacles; there is no structured way for the agent to surface "I am not sure I understood this correctly" before acting.
-- The `wr.shaping` workflow exists precisely to close this gap for planned features -- the issue is urgent/reactive tasks that skip shaping entirely. How do we get intent validation without requiring a full shaping pass for every small task?
-- Can historical session notes help? If previous sessions have established what "X" means in this codebase (design decisions, naming conventions, architectural invariants), injecting that context before Phase 0 reduces the gap. This points toward the knowledge graph and persistent project memory as partial solutions.
-- Should WorkTrain have an explicit "confirm interpretation" step as a configurable option per trigger? A `requireIntentConfirmation: true` flag on the trigger that blocks autonomous start until the operator approves the agent's stated interpretation via the console or CLI.
+**Status: bug** | Priority: high
 
-
+**Score: 13** | Cor:3 Cap:1 Eff:2 Lev:2 Con:3 | Blocked: no
 
-
-
-**Status: idea** | Priority: high
-
-**Score: 13** | Cor:3 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
+The `wr.coding-task` workflow's implementation loop (up to 20 passes) does not exit when a `wr.loop_control` stop artifact is emitted. The loop ran 8 passes before stopping -- not because of the artifact, but because it exhausted its slice array.
 
-
+**Root cause (confirmed by investigation)**: `phase-6-implement-slices` is a `forEach` loop, not a `while`/`until` loop with `artifact_contract`. The `wr.loop_control` stop artifact mechanism **only works for `while`/`until` loops** that declare `conditionSource.kind = artifact_contract`. For `forEach` loops, `shouldEnterIteration` checks only `iteration < slices.length` -- artifacts passed to `interpreter.next()` are never consulted. Confirmed in `workflow-interpreter.ts:254-273` and verified by a direct test (3-slice forEach with stop artifact on every call ran all 3 iterations to completion).
 
-
+**Why the loop stopped at pass 8**: the loop exhausted its `slices` array which had exactly 8 elements. `metrics_outcome = success` appearing at pass 8 was a coincidence.
 
-
+**`currentSlice.name` showing `[unset]`**: secondary issue. `buildLoopRenderContext` in `prompt-renderer.ts:190-197` requires `sessionContext['slices']` to be an array at render time. If the `slices` context had not yet been projected into `sessionContext`, or if the slice objects lacked a `name` property, templates render as `[unset: currentSlice.name]`.
 
-**
--
-
-
-- Agent's change passes all tests but the tests don't cover the degraded behavior
-- Agent notes a downstream impact in session notes but does not block, escalate, or file a follow-up ticket
-- **Agent reframes a bug as "a key tradeoff to document."** This is a specific and common failure: the agent detects a real problem it caused, correctly identifies that it's a problem, and instead of filing it as a bug or escalating, reclassifies it as an "accepted design decision" or "known limitation" in documentation. The bug is real. Documenting it is not fixing it. This pattern actively buries bugs.
+**Three fix directions:**
+1. **Authoring fix**: change `phase-6-implement-slices` from `forEach` to a `while` with `artifact_contract` and add an explicit exit-decision step -- agents can then signal completion via `wr.loop_control`
+2. **Engine feature**: add early-exit support to `forEach` loops when a `wr.loop_control` stop artifact is emitted
+3. **Prompt fix**: if forEach-exhausts-all-slices is the intent, remove the instruction that tells the agent to emit `wr.loop_control` artifacts
 
 **Things to hash out:**
--
--
--
-- Test coverage is the obvious mitigation -- if Y has tests, the agent's change would fail them. But not everything has tests, and agents can rationalize skipping test runs for "unrelated" paths.
-- Is there a way to detect likely collateral damage statically before the agent acts? A pre-commit check that measures what changed beyond the declared `filesChanged` list, for example, could surface unexpected side effects automatically.
-- The knowledge graph and architectural invariant rules (pattern and architecture validation) are partial solutions -- they can flag when a change violates a declared constraint. But they only work for constraints that have been explicitly codified.
+- Which fix direction is correct depends on the intended behavior: should the agent be able to stop the loop early (fix 1 or 2), or should it always run all slices (fix 3)?
+- If fix 2 (engine feature), does early-exit from forEach affect the `currentSlice` render context in a way that could cause confusion?
+- Does fix 1 require re-authoring the workflow through `wr.workflow-for-workflows`, or is it a targeted JSON edit?
 
 ---
 
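The root-cause note above describes an asymmetry in the interpreter's loop-entry check. A minimal sketch of that asymmetry, with hypothetical type shapes (only `shouldEnterIteration`, `forEach`, `while`/`until`, `artifact_contract`, and `wr.loop_control` come from the note; the signatures and field names here are assumptions, not the package's real internals):

```typescript
// Sketch of the exit-check asymmetry: forEach consults only the iteration
// counter, so a wr.loop_control stop artifact cannot end it early, while a
// while-loop with an artifact_contract condition source honors the stop.
type Artifact = { kind: string; signal?: "stop" | "continue" };

interface ForEachLoop { kind: "forEach"; items: unknown[] }
interface WhileLoop {
  kind: "while";
  conditionSource: { kind: "artifact_contract"; stopKind: string };
}
type Loop = ForEachLoop | WhileLoop;

function shouldEnterIteration(loop: Loop, iteration: number, artifacts: Artifact[]): boolean {
  if (loop.kind === "forEach") {
    // Artifacts are never consulted here -- the bug described above.
    return iteration < loop.items.length;
  }
  // artifact_contract path: a matching stop artifact ends the loop.
  const stopped = artifacts.some(
    (a) => a.kind === loop.conditionSource.stopKind && a.signal === "stop"
  );
  return !stopped;
}

const stop: Artifact = { kind: "wr.loop_control", signal: "stop" };
const forEachLoop: Loop = { kind: "forEach", items: ["a", "b", "c"] };
const whileLoop: Loop = {
  kind: "while",
  conditionSource: { kind: "artifact_contract", stopKind: "wr.loop_control" },
};

console.log(shouldEnterIteration(forEachLoop, 1, [stop])); // true: stop ignored
console.log(shouldEnterIteration(whileLoop, 1, [stop]));   // false: stop honored
```

Under this model, fix 2 (the engine feature) amounts to adding the same artifact check to the `forEach` branch.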
@@ -157,9 +124,97 @@ The delivery pipeline was extracted into `delivery-pipeline.ts` with explicit st
 
 ## WorkTrain Daemon
 
+### Intent gap: agent builds what it understood, not what the user meant (Apr 30, 2026)
+
+**Status: idea** | Priority: medium
+
+**Score: 13** | Cor:3 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
+
+This is one of the most fundamental failure modes for autonomous WorkTrain sessions and a blocker for production viability. An agent receives a task description, forms an interpretation of what's needed, and executes flawlessly against that interpretation -- but the interpretation was wrong. The code is correct for what the agent thought was asked. It is not what the user actually wanted. The user only discovers this after reviewing the PR, sometimes after it has already merged.
+
+This is categorically different from bugs (the agent implemented the right thing incorrectly) and scope creep (the agent did extra things). This is the agent solving the wrong problem well.
+
+**Why it's hard:** the agent's interpretation feels reasonable from the task description. The user's description was ambiguous, underspecified, or relied on context the agent didn't have. Neither party made an obvious mistake -- the gap is structural.
+
+**Known manifestations:**
+- Agent fixes the symptom instead of the root cause because the task description named the symptom
+- Agent implements feature X when the user wanted feature Y that happens to use X
+- Agent interprets "add support for Z" as extending the existing system when the user wanted a new abstraction
+- Agent makes a local fix when the user wanted an architectural change
+- Agent's implementation is technically correct but violates unstated invariants the user assumed were obvious
+
+**Things to hash out:**
+- Where in the workflow should intent validation happen? Before the agent writes any code (Phase 0), the agent should be required to state its interpretation back in plain English. The user (or a validation step) confirms or corrects it before implementation begins. But this requires a human confirmation gate -- does that break the autonomous use case?
+- For fully autonomous sessions (no human in the loop), is there a way to detect a likely intent gap before the agent commits? Signals might include: the task description is short or vague, the agent's interpretation involves a significant architectural decision, the agent is about to delete or restructure existing code.
+- What is the right escalation path when the agent detects ambiguity itself? Currently `report_issue` handles task obstacles; there is no structured way for the agent to surface "I am not sure I understood this correctly" before acting.
+- The `wr.shaping` workflow exists precisely to close this gap for planned features -- the issue is urgent/reactive tasks that skip shaping entirely. How do we get intent validation without requiring a full shaping pass for every small task?
+- Can historical session notes help? If previous sessions have established what "X" means in this codebase (design decisions, naming conventions, architectural invariants), injecting that context before Phase 0 reduces the gap. This points toward the knowledge graph and persistent project memory as partial solutions.
+- Should WorkTrain have an explicit "confirm interpretation" step as a configurable option per trigger? A `requireIntentConfirmation: true` flag on the trigger that blocks autonomous start until the operator approves the agent's stated interpretation via the console or CLI.
+
+---
+
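The `requireIntentConfirmation` flag floated in the last bullet could gate session start roughly like this. Everything here is a hypothetical sketch: the flag name comes from the entry, but `TriggerConfig`, `StartDecision`, and `decideStart` are invented for illustration and are not WorkTrain APIs:

```typescript
// Hypothetical per-trigger confirmation gate: when the flag is set, the
// daemon holds the session until the operator approves the agent's stated
// interpretation via the console or CLI.
interface TriggerConfig {
  workflow: string;
  requireIntentConfirmation?: boolean;
}

type StartDecision =
  | { kind: "start" }
  | { kind: "await_confirmation"; statedInterpretation: string };

function decideStart(
  trigger: TriggerConfig,
  statedInterpretation: string,
  operatorApproved: boolean
): StartDecision {
  if (trigger.requireIntentConfirmation && !operatorApproved) {
    // Block autonomous start; surface the interpretation for review.
    return { kind: "await_confirmation", statedInterpretation };
  }
  return { kind: "start" };
}
```

The open question in the entry -- whether such a gate breaks the fully autonomous use case -- is exactly the `operatorApproved` input: something has to supply it.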
+### Scope rationalization: agent silently accepts collateral damage (Apr 30, 2026)
+
+**Status: idea** | Priority: medium
+
+**Score: 13** | Cor:3 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
+
+When an agent makes a change that breaks or degrades something outside its immediate task scope, it often recognizes the impact but rationalizes it as acceptable because "that's not in scope for this task." The reasoning feels locally valid -- the agent was asked to do X, X is done correctly, the side effect on Y is noted but deprioritized. This produces a PR that is correct for X and silently broken for Y.
+
+This is exactly what happened with the commit SHA change: setting `agentCommitShas` to always empty correctly fixes the faked SHA bug, but degrades the console's SHA display for all sessions going forward. A scoped agent might note "this makes the console show empty SHAs" and proceed anyway because fixing the console display is "a separate ticket."
+
+**Why this is insidious:** the agent's reasoning is locally coherent. It did not make a mistake within its scope. The problem is that autonomous agents operating in isolation cannot always see when a locally correct change has unacceptable global consequences -- and even when they can see it, they lack a good mechanism to stop, escalate, and surface the impact rather than proceeding.
+
+**Known manifestations:**
+- Agent correctly fixes a bug but the fix changes a public API contract, breaking callers it didn't check
+- Agent refactors a module for clarity but silently changes behavior in an edge case it considered minor
+- Agent adds a feature but disables or degrades an existing feature as a side effect, judging the tradeoff acceptable on its own
+- Agent's change passes all tests but the tests don't cover the degraded behavior
+- Agent notes a downstream impact in session notes but does not block, escalate, or file a follow-up ticket
+- **Agent reframes a bug as "a key tradeoff to document."** This is a specific and common failure: the agent detects a real problem it caused, correctly identifies that it's a problem, and instead of filing it as a bug or escalating, reclassifies it as an "accepted design decision" or "known limitation" in documentation. The bug is real. Documenting it is not fixing it. This pattern actively buries bugs.
+
+**Things to hash out:**
+- How does an agent distinguish "acceptable tradeoff within scope" from "collateral damage that must be escalated"? The line is fuzzy and context-dependent. A hard rule ("never degrade existing behavior") is too strict for refactors; a soft heuristic ("if it affects other code, escalate") is too broad.
+- Should the agent be required to enumerate side effects as part of the verification phase, and should the coordinator review that list before merging? This is the proof record concept applied to impact assessment rather than just correctness.
+- What is the right mechanism for the agent to pause and escalate? Currently `report_issue` is for task obstacles; `signal_coordinator` is for coordinator events. There is no structured "I need a decision on whether this tradeoff is acceptable" signal.
+- Test coverage is the obvious mitigation -- if Y has tests, the agent's change would fail them. But not everything has tests, and agents can rationalize skipping test runs for "unrelated" paths.
+- Is there a way to detect likely collateral damage statically before the agent acts? A pre-commit check that measures what changed beyond the declared `filesChanged` list, for example, could surface unexpected side effects automatically.
+- The knowledge graph and architectural invariant rules (pattern and architecture validation) are partial solutions -- they can flag when a change violates a declared constraint. But they only work for constraints that have been explicitly codified.
+
+---
+
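The static check suggested in the "hash out" list -- measure what changed beyond the declared `filesChanged` list -- reduces to a set difference. A minimal sketch, assuming the actual file list would come from something like `git diff --name-only` (the function name and inputs are illustrative, not an existing check):

```typescript
// Compare the files an agent declared it would touch against the files it
// actually touched; any undeclared file is a candidate collateral-damage
// signal to escalate rather than merge silently.
function undeclaredChanges(declared: string[], actuallyChanged: string[]): string[] {
  const declaredSet = new Set(declared);
  return actuallyChanged.filter((file) => !declaredSet.has(file));
}

// Example: the agent declared one file but also touched a console component.
const extras = undeclaredChanges(
  ["src/trigger/delivery-pipeline.ts"],
  ["src/trigger/delivery-pipeline.ts", "src/console-ui/SessionView.tsx"]
);
console.log(extras); // ["src/console-ui/SessionView.tsx"]
```

A non-empty result does not prove damage -- refactors legitimately touch extra files -- but it gives the coordinator a concrete list to review instead of relying on the agent's own scope judgment.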
 The autonomous workflow runner (`worktrain daemon`). Completely separate from the MCP server -- calls the engine directly in-process.
 
 
+### Subagent context package: project vision and task goal baked into spawning (Apr 30, 2026)
+
+**Status: idea** | Priority: high
+
+**Score: 12** | Cor:2 Cap:3 Eff:2 Lev:3 Con:3 | Blocked: no
+
+When WorkTrain spawns a subagent today, the operator (or the main agent) must manually write out all context: what the project is, what WorkTrain's vision is, what the task is trying to accomplish, what documents exist, what the end goal is. Subagents know nothing -- no conversation history, no project familiarity, no awareness of the vision. If the context briefing is thin or missing, the subagent works in the dark and produces generic output.
+
+Two things need to be baked into the spawning infrastructure:
+
+1. **Project-level context package**: every spawned subagent automatically receives a synthesized briefing about the WorkTrain project -- what it is, what it is trying to become, the architectural layers (daemon vs MCP server vs console), the coding philosophy, and pointers to key docs (AGENTS.md, backlog.md, relevant design docs). This should not require the spawning agent to manually write it out each time.
+
+2. **Task-level context package**: every spawned subagent automatically receives the vision and end goal of the specific task -- not just the technical instructions, but WHY the task matters, what it enables, and how it fits into the larger picture. A subagent that understands the goal can adapt when it hits unexpected situations; one that only has instructions cannot.
+
+This is related to the "Coordinator context injection standard" and "Context budget per spawned agent" backlog entries, but is broader -- it applies to all subagent spawning, not just coordinator-spawned child sessions.
+
+**Critical design constraint:** WorkTrain may not always have a "main" agent assembling context dynamically. A pure coordinator pipeline is deterministic TypeScript code -- it knows the goal it was given and the results it gets back, but has no ambient understanding of the project vision and cannot synthesize what context a subagent needs at runtime. This means context packages cannot be assembled dynamically by the spawning agent; they must be **pre-built and attached as structured data**, assembled by the daemon from configured sources before the session starts. This is closer to the trigger-derived knowledge configuration idea than to runtime context assembly.
+
+**Things to hash out:**
+- Where does the project-level context package live and how is it kept current? A static template in `~/.workrail/daemon-soul.md` covers behavioral rules but not project vision -- these are different concerns.
+- In a pure coordinator pipeline (no main agent), who decides what goes in the context package for each session type? Must be declared configuration, not runtime synthesis.
+- Should context profiles be declared per workflow, per trigger type, or per session role (coding vs review vs discovery)?
+- What is the right size for an auto-injected context package? Too small loses signal; too large crowds out the actual task prompt.
+- Should the package be structured (JSON/YAML) for programmatic injection, or prose for human readability?
+- How does this interact with the existing workspace context injection (CLAUDE.md, AGENTS.md, daemon-soul.md)?
+- Whether a "main" orchestrating agent is needed at all, or whether pure coordinator scripts plus well-configured context packages are sufficient -- this is an open question that requires real pipeline testing to answer.
+
+---
+
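The "pre-built and attached as structured data" constraint above implies a declared shape rather than runtime synthesis. One hypothetical shape -- every interface and field name here is an assumption for illustration, not a WorkTrain type:

```typescript
// Hypothetical structured context package, assembled by the daemon from
// configured sources before a subagent session starts.
interface ProjectContextPackage {
  projectSummary: string;        // what WorkTrain is and is trying to become
  architecturalLayers: string[]; // e.g. daemon, MCP server, console
  keyDocs: string[];             // pointers: AGENTS.md, backlog.md, design docs
}

interface TaskContextPackage {
  goal: string;         // WHY the task matters, not just what to do
  enables: string;      // what completing it unlocks
  instructions: string; // the concrete technical brief
}

interface SubagentSpawnContext {
  project: ProjectContextPackage;
  task: TaskContextPackage;
}

// Deterministic render into a prompt preamble -- no "main" agent needed,
// which is the design constraint the entry calls out.
function renderBriefing(ctx: SubagentSpawnContext): string {
  return [
    `# Project\n${ctx.project.projectSummary}`,
    `Layers: ${ctx.project.architecturalLayers.join(", ")}`,
    `Docs: ${ctx.project.keyDocs.join(", ")}`,
    `# Task goal\n${ctx.task.goal}`,
    `Enables: ${ctx.task.enables}`,
    `# Instructions\n${ctx.task.instructions}`,
  ].join("\n\n");
}
```

Whether the package should be JSON like this or prose is one of the open questions in the entry; a structured type at least makes the size budget measurable.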
 ### Agent-assisted backlog and issue enrichment (Apr 28, 2026)
 
 **Status: idea** | Priority: medium

@@ -1336,7 +1391,7 @@ Routing by `finding.category` from `wr.review_verdict`:
 
 ### Workflow execution time tracking and prediction
 
-**Status:
+**Status: partial** | Tracking shipped; prediction/calibration layer not yet built
 
 **Score: 11** | Cor:1 Cap:2 Eff:3 Lev:2 Con:3 | Blocked: no
 
@@ -1639,6 +1694,32 @@ Ghost nodes represent steps that were compiled into the DAG but skipped at runti
 
 ## Workflow Library
 
+### Automatic root cause analysis when MR review finds issues post-coding (Apr 30, 2026)
+
+**Status: idea** | Priority: high
+
+**Score: 13** | Cor:3 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
+
+When an MR review session (run by a WorkTrain agent) finds issues in a coding session's output, WorkTrain should automatically investigate why the coding agent missed it and determine whether the workflow, the prompts, or the process can be improved.
+
+**Two distinct triggers:**
+
+1. **WorkTrain MR review finds something**: after a WorkTrain review session produces findings, the coordinator should automatically spawn an analysis session asking: why did the coding agent produce code with this issue? Was it a workflow gap (missing verification step, insufficient scrutiny at a phase), a prompt gap (the agent wasn't told to check this), or a context gap (the agent didn't have the information needed)?
+
+2. **Human finds something post-review**: when a human reviewer comments on or requests changes to a PR that already passed WorkTrain's review, this is doubly significant -- it means both the coding agent AND the review agent missed it. WorkTrain should automatically investigate why both missed it and whether the review workflow has a systematic blind spot.
+
+**Why this matters**: every finding that slips through is a signal about a workflow or process gap. Today that signal is lost. Capturing it systematically and feeding it back into workflow improvement closes the quality loop.
+
+**Things to hash out:**
+- How does WorkTrain detect that a human has commented on a PR post-review? This requires monitoring the PR for new review activity after WorkTrain's session completed -- either webhook events or polling.
+- What does the analysis session actually produce? A structured finding about the gap? A concrete proposal for workflow improvement? Both?
+- Who reviews the analysis output before it becomes a workflow change? Auto-applying workflow changes based on analysis is risky.
+- How do you distinguish "the workflow is fine but this was a genuinely hard edge case" from "the workflow has a systematic gap"? A single miss doesn't prove a gap; multiple misses of the same kind do.
+- Should the analysis result feed directly into `workflow-effectiveness-assessment`, or is it a separate concern?
+- For the "coding agent missed it" case: is the right fix to change the coding workflow, or to make the review workflow more adversarial?
+
+---
+
 ### Workflow previewer for compiled and runtime behavior
 
 **Status: idea** | Priority: medium
package/package.json
CHANGED
package/spec/authoring-spec.json
CHANGED
@@ -3,7 +3,7 @@
   "version": 3,
   "title": "WorkRail Authoring Rules",
   "purpose": "Canonical current rules for authoring good WorkRail workflows. workflow.schema.json remains the source of truth for legal structure.",
-  "lastReviewed": "2026-04-
+  "lastReviewed": "2026-04-28",
   "principles": [
     "Schema defines what is valid. These rules define what is good.",
     "Prefer current authoring rules over design rationale or historical notes.",
@@ -189,6 +189,10 @@
       "id": "artifact.verification",
       "description": "Verification or handoff artifacts"
     },
+    {
+      "id": "artifact.coordinator-result",
+      "description": "wr.coordinator_result artifact emitted by coordinator-phase workflows to signal phase completion to the coordinator"
+    },
     {
       "id": "delegation.context-packet",
       "description": "Structured context passed to subagents"
@@ -1429,6 +1433,37 @@
       "id": "artifacts",
       "title": "Artifacts and planning surfaces",
       "rules": [
+        {
+          "id": "coordinator-result-artifact-schema",
+          "status": "active",
+          "level": "required",
+          "scope": [
+            "artifact.coordinator-result"
+          ],
+          "rule": "When a workflow step signals coordinator phase completion, emit a `wr.coordinator_result` artifact with exactly 4 fields: `outcome` (enum: success|failed|timed_out|await_degraded), `summary` (string), `sessionId` (string), `error` (string|null). No additional fields allowed.",
+          "why": "Coordinators read this artifact to determine whether to proceed, retry, or escalate. Extra fields pollute the schema boundary and break forward compatibility. The 4-field constraint is a hard limit, not a guideline.",
+          "enforcement": [
+            "advisory"
+          ],
+          "checks": [
+            "Exactly 4 fields present: outcome, summary, sessionId, error.",
+            "outcome is one of: success, failed, timed_out, await_degraded.",
+            "error is string|null -- null when outcome is success, non-null string when outcome is failed.",
+            "No workflow-specific fields (prUrl, branchName, commitSha, etc.) in wr.coordinator_result. Those belong in workflow-specific artifacts."
+          ],
+          "antiPatterns": [
+            "Adding prUrl, branchName, or commitSha to wr.coordinator_result",
+            "Using a free-form notes string instead of the typed outcome enum",
+            "Omitting sessionId (required for coordinator tracing and console parent-child display)"
+          ],
+          "sourceRefs": [
+            {
+              "kind": "runtime",
+              "path": "src/coordinators/types.ts",
+              "note": "ChildSessionResult discriminated union -- the runtime type that wr.coordinator_result maps to."
+            }
+          ]
+        },
         {
           "id": "artifact-canonicality",
          "status": "active",