@exaudeus/workrail 3.44.0 → 3.45.0

@@ -0,0 +1,172 @@
+ # Implementation Plan: WorkTrain Stuck-Escalation
+
+ *2026-04-19 | Pitch: .workrail/current-pitch.md*
+
+ ---
+
+ ## 1. Problem Statement
+
+ When a WorkTrain daemon session enters a `repeated_tool_call` loop, the session
+ currently burns turns until wall-clock or max-turn timeout. The result is
+ `_tag: 'timeout'`, indistinguishable from a legitimate slow session. Automated
+ routing is impossible without string-parsing.
+
+ ---
+
+ ## 2. Acceptance Criteria
+
+ 1. `WorkflowRunStuck` interface exported from `workflow-runner.ts` with fields:
+ `_tag: 'stuck'`, `workflowId`, `reason`, `message`, `stopReason`, `issueSummaries?`
+ 2. `WorkflowRunResult` union includes `WorkflowRunStuck`.
+ 3. `ChildWorkflowRunResult` union includes `WorkflowRunStuck` (SAME COMMIT as #2).
+ 4. `WorkflowTrigger.agentConfig` has `stuckAbortPolicy?` and `noProgressAbortEnabled?`.
+ 5. `TriggerDefinition.agentConfig` has the same two fields.
+ 6. When `repeated_tool_call` fires and `stuckAbortPolicy !== 'notify_only'`:
+ outbox entry written, `agent.abort()` called, `stuckReason = 'repeated_tool_call'`.
+ 7. When `notify_only` is set: outbox written, abort NOT called.
+ 8. When `noProgressAbortEnabled: true` and `no_progress` fires with `stuckAbortPolicy !== 'notify_only'`:
+ same abort + outbox write.
+ 9. Return path returns `{ _tag: 'stuck', ... }` before `timeoutReason` check.
+ 10. `trigger-router.ts` `route()` and `dispatch()` handle `stuck` without assertNever fallthrough.
+ 11. `notification-service.ts` `buildNotificationBody()` and `buildDetail()` handle `stuck`.
+ 12. `NotificationPayload.outcome` union includes `'stuck'`.
+ 13. `makeSpawnAgentTool` handles `stuck` child result, returns `outcome: 'stuck'`.
+ 14. All 6 test cases in `workflow-runner-stuck-escalation.test.ts` pass.
+ 15. `npm run build` clean. `npx vitest run` no regressions.
+
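Criteria 1 to 5 could look roughly like the sketch below. The plan names the fields but not their types, so every field type here is an assumption, and `WorkflowRunSuccess` and the trimmed `WorkflowRunTimeout` are hypothetical stand-ins for the existing variants in `workflow-runner.ts`.

```typescript
// Sketch only: field types are assumed, not taken from the package source.
type StuckReason = 'repeated_tool_call' | 'no_progress';

interface WorkflowRunStuck {
  readonly _tag: 'stuck';
  readonly workflowId: string;
  readonly reason: StuckReason;
  readonly message: string;
  readonly stopReason: string;
  readonly issueSummaries?: readonly string[];
}

// Hypothetical stand-ins for variants that already exist in the real union.
interface WorkflowRunSuccess {
  readonly _tag: 'success';
  readonly workflowId: string;
}
interface WorkflowRunTimeout {
  readonly _tag: 'timeout';
  readonly workflowId: string;
  readonly message: string;
}

type WorkflowRunResult = WorkflowRunSuccess | WorkflowRunTimeout | WorkflowRunStuck;

// The two new optional agentConfig fields (criteria 4 and 5).
interface StuckAgentConfig {
  readonly stuckAbortPolicy?: 'abort' | 'notify_only';
  readonly noProgressAbortEnabled?: boolean;
}

// Callers can route on the tag instead of parsing a timeout message.
function isStuck(result: WorkflowRunResult): result is WorkflowRunStuck {
  return result._tag === 'stuck';
}
```

With a dedicated tag, a router can branch on `isStuck(result)` rather than inspecting the text of a `timeout` result.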
+ ---
+
+ ## 3. Non-Goals
+
+ - No `onStuck:` hook in TriggerDefinition (follow-up)
+ - No console live panel stuck indicator
+ - No `worktrain logs` formatting changes
+ - No automatic retry on stuck
+ - No Signal 5 (wall-clock at 80%) wiring
+ - No new heuristics beyond Signal 1 and 2
+ - No changes to `src/mcp/`
+ - No `trigger-store.ts` parser changes
+
+ ---
+
+ ## 4. Philosophy-Driven Constraints
+
+ - All new fields `readonly`
+ - `issueSummaries` spread to new readonly array when included in return value
+ - `writeStuckOutboxEntry` is fire-and-forget (void + catch)
+ - `stuckReason` flag: first-writer-wins (same as `timeoutReason`)
+ - Outbox write and abort are independent effects (write before abort gate check)
+
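A minimal sketch of the "void + catch" constraint, under assumptions: the real `writeStuckOutboxEntry` signature is not shown in this diff, so the `OutboxWriter` callback, the `StuckOutboxEntry` shape, and the injectable `warn` parameter are all hypothetical. The point is only that a failed write is logged and never reaches the caller.

```typescript
// Hypothetical entry shape and writer type; the real ones are not in this diff.
interface StuckOutboxEntry {
  readonly workflowId: string;
  readonly reason: string;
  readonly issueSummaries?: readonly string[];
}
type OutboxWriter = (entry: StuckOutboxEntry) => Promise<void>;

// Fire-and-forget: returns immediately, never throws, never rejects.
function writeStuckOutboxEntry(
  write: OutboxWriter,
  entry: StuckOutboxEntry,
  warn: (msg: string) => void = (msg) => console.warn(msg),
): void {
  // Starting from a resolved promise also captures synchronous throws from `write`.
  void Promise.resolve()
    .then(() => write(entry))
    .catch((err: unknown) => {
      warn(`stuck outbox write failed: ${String(err)}`);
    });
}

// Copy issue summaries into a fresh array so the returned value does not
// alias mutable runner state (the "spread to new readonly array" constraint).
function copyIssueSummaries(issues: readonly string[]): readonly string[] {
  return [...issues];
}
```

Because the helper returns `void` synchronously, the abort decision that follows it does not wait on, or depend on, the outcome of the write.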
+ ---
+
+ ## 5. Invariants
+
+ - **I1**: `ChildWorkflowRunResult` and `WorkflowRunResult` updates ship in the same commit.
+ - **I2**: `stuckReason` is checked BEFORE `timeoutReason` in the return path.
+ - **I3**: Outbox write fires regardless of `stuckAbortPolicy`.
+ - **I4**: `no_progress` never aborts unless `noProgressAbortEnabled: true`.
+ - **I5**: `repeated_tool_call` abort fires on the same turn as detection.
+ - **I6**: First writer wins on `stuckReason` (guard: `stuckReason === null && timeoutReason === null`).
+
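Invariants I3 to I6 can be read as one small decision function. The sketch below is an interpretation, not the package's code: `handleStuckSignal`, `SessionFlags`, and `StuckEffects` are hypothetical names, and treating a disabled `no_progress` signal as fully inert (no outbox write either) is an assumption the plan does not settle.

```typescript
type StuckSignal = 'repeated_tool_call' | 'no_progress';
type StuckAbortPolicy = 'abort' | 'notify_only';

interface StuckConfig {
  readonly stuckAbortPolicy?: StuckAbortPolicy;
  readonly noProgressAbortEnabled?: boolean;
}

// Mutable per-session flags, mirroring the `stuckReason` / `timeoutReason` pair.
interface SessionFlags {
  stuckReason: StuckSignal | null;
  timeoutReason: string | null;
}

// Side effects are injected so the decision logic can be exercised with fakes.
interface StuckEffects {
  writeOutbox(signal: StuckSignal): void;
  abort(): void;
}

function handleStuckSignal(
  signal: StuckSignal,
  config: StuckConfig,
  flags: SessionFlags,
  effects: StuckEffects,
): void {
  // I4: no_progress does nothing unless explicitly enabled (assumed fully inert).
  if (signal === 'no_progress' && config.noProgressAbortEnabled !== true) return;

  // I3: the outbox write happens before, and independently of, the abort gate.
  effects.writeOutbox(signal);

  if (config.stuckAbortPolicy === 'notify_only') return;

  // I6: first writer wins; a reason that is already recorded is never overwritten.
  if (flags.stuckReason === null && flags.timeoutReason === null) {
    flags.stuckReason = signal;
    // I5: abort on the same turn the signal was detected.
    effects.abort();
  }
}
```

Injecting the effects keeps the function deterministic, which is what lets a test replicate the `turn_end` logic without mocking modules.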
+ ---
+
+ ## 6. Selected Approach
+
+ New `_tag: 'stuck'` discriminated union variant. Wire abort in `turn_end` subscriber
+ after Signal 1 and Signal 2 emitter calls. Return stuck result before `timeoutReason`
+ check. Update both union types atomically. Add `writeStuckOutboxEntry` module-level
+ helper. Propagate to trigger-router, notification-service, makeSpawnAgentTool.
+
+ **Runner-up rejected**: Extend `WorkflowRunTimeout.reason` -- violates make-illegal-states-unrepresentable.
+
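The ordering of the return path (stuck before timeout) can be sketched as follows. `resolveRunResult`, `EndOfRunState`, and the trimmed `RunResult` union are hypothetical and carry fewer fields than the real types; the sketch only illustrates that a session with both flags set reports `stuck`.

```typescript
type StuckReason = 'repeated_tool_call' | 'no_progress';

// Trimmed, hypothetical version of the result union.
type RunResult =
  | { readonly _tag: 'success'; readonly workflowId: string }
  | { readonly _tag: 'timeout'; readonly workflowId: string; readonly reason: string }
  | {
      readonly _tag: 'stuck';
      readonly workflowId: string;
      readonly reason: StuckReason;
      readonly issueSummaries?: readonly string[];
    };

interface EndOfRunState {
  readonly stuckReason: StuckReason | null;
  readonly timeoutReason: string | null;
  readonly issueSummaries: readonly string[];
}

function resolveRunResult(workflowId: string, state: EndOfRunState): RunResult {
  // Checked first, so a stuck session is never reported as a plain timeout.
  if (state.stuckReason !== null) {
    return {
      _tag: 'stuck',
      workflowId,
      reason: state.stuckReason,
      issueSummaries: [...state.issueSummaries], // fresh array, not runner state
    };
  }
  if (state.timeoutReason !== null) {
    return { _tag: 'timeout', workflowId, reason: state.timeoutReason };
  }
  return { _tag: 'success', workflowId };
}
```

Keeping `stuck` as its own variant means a `timeout` value can never carry a stuck reason, which is the property the rejected runner-up would have given up.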
+ ---
+
+ ## 7. Vertical Slices
+
+ ### Slice 1: Core types (workflow-runner.ts)
+ - Add `WorkflowRunStuck` interface after `WorkflowRunTimeout`
+ - Add to `WorkflowRunResult` union
+ - Add to `ChildWorkflowRunResult` union (ATOMIC with above)
+ - Add `stuckAbortPolicy?` and `noProgressAbortEnabled?` to `WorkflowTrigger.agentConfig`
+ - **Done when**: `npm run build` clean after this slice
+
+ ### Slice 2: TriggerDefinition.agentConfig (types.ts)
+ - Add `stuckAbortPolicy?` and `noProgressAbortEnabled?` after `maxTurns`
+ - **Done when**: `npm run build` clean
+
+ ### Slice 3: Runtime wiring (workflow-runner.ts)
+ - Add `sessionStartMs` constant after `maxTurns` resolution
+ - Add `stuckReason` flag after `timeoutReason` flag
+ - Add `writeStuckOutboxEntry` module-level helper
+ - Wire abort after Signal 1 emitter call in `turn_end`
+ - Wire abort after Signal 2 emitter call in `turn_end`
+ - Add stuck return path before `timeoutReason` check
+ - Update `makeSpawnAgentTool` resultObj type + add `stuck` arm before `assertNever`
+ - **Done when**: `npm run build` clean
+
+ ### Slice 4: Caller propagation (trigger-router.ts, notification-service.ts)
+ - Add `stuck` arm in `route()` exhaustive chain
+ - Add `stuck` arm in `dispatch()` exhaustive chain
+ - Add `'stuck'` to `NotificationPayload.outcome` union
+ - Add `stuck` case in `buildNotificationBody()`
+ - Add `stuck` case in `buildDetail()`
+ - **Done when**: `npm run build` clean
+
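The caller-side change in Slice 4 follows the usual exhaustive-switch pattern. The sketch below is illustrative only: the message strings and the trimmed `RunResult` union are invented for the example, and the real `buildNotificationBody()` signature may differ.

```typescript
// Exhaustiveness guard: reaching this at compile time means a variant is unhandled.
function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

// Trimmed, hypothetical result union.
type RunResult =
  | { readonly _tag: 'success'; readonly workflowId: string }
  | { readonly _tag: 'timeout'; readonly workflowId: string; readonly message: string }
  | { readonly _tag: 'stuck'; readonly workflowId: string; readonly reason: string; readonly message: string };

type NotificationOutcome = 'success' | 'timeout' | 'stuck';

function toOutcome(result: RunResult): NotificationOutcome {
  return result._tag;
}

function buildNotificationBody(result: RunResult): string {
  switch (result._tag) {
    case 'success':
      return `Workflow ${result.workflowId} completed.`;
    case 'timeout':
      return `Workflow ${result.workflowId} timed out: ${result.message}`;
    case 'stuck':
      return `Workflow ${result.workflowId} is stuck (${result.reason}): ${result.message}`;
    default:
      // Deleting the 'stuck' case above turns this line into a compile error.
      return assertNever(result);
  }
}
```

This is why the plan can lean on `npm run build` as the done-check for this slice: a missing `stuck` arm fails type-checking rather than surfacing at runtime.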
+ ### Slice 5: Tests
+ - Write `tests/unit/workflow-runner-stuck-escalation.test.ts` with 6 test cases
+ - **Done when**: all 6 tests pass, no regressions
+
+ ---
+
+ ## 8. Test Design
+
+ File: `tests/unit/workflow-runner-stuck-escalation.test.ts`
+
+ Pattern: replicate turn_end subscriber logic (same as workflow-runner-stuck-detection.test.ts).
+
+ **Test 1**: `stuckAbortPolicy: 'abort'` default -- repeated_tool_call fires, stuckReason set, abort called, would return _tag:'stuck'
+ **Test 2**: `stuckAbortPolicy: 'notify_only'` -- abort NOT called, emitter still fires
+ **Test 3**: `noProgressAbortEnabled: false` default -- no_progress does NOT set stuckReason
+ **Test 4**: `noProgressAbortEnabled: true` -- no_progress sets stuckReason = 'no_progress', abort called
+ **Test 5**: Compile-time assignability test: `WorkflowRunStuck` is assignable to `ChildWorkflowRunResult`
+ **Test 6**: trigger-router exhaustive switch handles 'stuck' (import trigger-router, verify no assertNever path hit)
+
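As a rough illustration of the fake-based pattern behind Tests 1, 2, and 5, the sketch below uses a hand-written `FakeAgent` and a type-level assignability check. Everything here is hypothetical: the replicated `onRepeatedToolCall` is a simplification of the subscriber logic, the types are trimmed, and leaving `stuckReason` unset under `notify_only` is an assumption. In the real suite these checks would presumably sit inside vitest `it` blocks.

```typescript
// Trimmed, hypothetical versions of the real types.
interface WorkflowRunStuck {
  readonly _tag: 'stuck';
  readonly workflowId: string;
  readonly reason: 'repeated_tool_call' | 'no_progress';
}
type ChildWorkflowRunResult =
  | { readonly _tag: 'success'; readonly workflowId: string }
  | WorkflowRunStuck;

// Test 5 idea: this line stops compiling if the variant is dropped from the union.
type IsAssignable<From, To> = [From] extends [To] ? true : false;
const stuckFitsChildResult: IsAssignable<WorkflowRunStuck, ChildWorkflowRunResult> = true;

// A fake records calls; no module mocking is involved.
class FakeAgent {
  abortCalls = 0;
  abort(): void {
    this.abortCalls += 1;
  }
}

// Simplified replica of the Signal 1 branch; returns the stuckReason it would set.
function onRepeatedToolCall(
  agent: { abort(): void },
  policy: 'abort' | 'notify_only' = 'abort',
): 'repeated_tool_call' | null {
  if (policy === 'notify_only') return null; // assumed: flag stays unset
  agent.abort();
  return 'repeated_tool_call';
}
```

The fake keeps the assertions about `abort()` direct: a test reads `agent.abortCalls` instead of configuring a mock framework.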
+ ---
+
+ ## 9. Risk Register
+
+ | Risk | Likelihood | Severity | Mitigation |
+ |------|------------|----------|------------|
+ | ChildWorkflowRunResult not updated atomically | Low | High | Single-PR, Slice 1 includes both updates, Test 5 catches gap |
+ | NotificationPayload.outcome union gap | Low | Medium | Slice 4 adds 'stuck'; build catches it |
+ | stuckReason/timeoutReason race | Low | Low | Guard condition (both null check) |
+ | writeStuckOutboxEntry silent failure | Low | Low | Fire-and-forget with console.warn |
+
+ ---
+
+ ## 10. PR Packaging Strategy
+
+ Single PR: `feat/stuck-escalation`
+ Single atomic commit with all 4 source files + test file.
+ PR title: `feat(daemon): WorkflowRunStuck result variant with abort and outbox notification`
+
+ ---
+
+ ## 11. Philosophy Alignment Per Slice
+
+ | Slice | Principle | Status |
+ |-------|-----------|--------|
+ | Slice 1 (types) | Make illegal states unrepresentable | Satisfied |
+ | Slice 1 (types) | Exhaustiveness everywhere | Satisfied |
+ | Slice 1 (types) | Type safety as first line of defense | Satisfied (ChildWorkflowRunResult updated) |
+ | Slice 3 (runtime) | Errors are data | Satisfied |
+ | Slice 3 (runtime) | Determinism over cleverness | Satisfied (simple flag) |
+ | Slice 3 (runtime) | Fire-and-forget side effects | Satisfied (outbox write) |
+ | Slice 4 (callers) | Exhaustiveness everywhere | Satisfied (all assertNever guards updated) |
+ | Slice 5 (tests) | Prefer fakes over mocks | Satisfied (replicate subscriber logic, not vi.mock) |
+
+ ---
+
+ **unresolvedUnknownCount**: 0
+ **planConfidenceBand**: High
+ **estimatedPRCount**: 1
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@exaudeus/workrail",
- "version": "3.44.0",
+ "version": "3.45.0",
  "description": "Step-by-step workflow enforcement for AI agents via MCP",
  "license": "MIT",
  "repository": {