@exaudeus/workrail 3.41.0 → 3.43.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/cli-worktrain.js +40 -11
- package/dist/console-ui/assets/{index-CQt4UhPB.js → index-Sb57DW4B.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/context-assembly/deps.d.ts +8 -0
- package/dist/context-assembly/deps.js +2 -0
- package/dist/context-assembly/index.d.ts +6 -0
- package/dist/context-assembly/index.js +50 -0
- package/dist/context-assembly/infra.d.ts +3 -0
- package/dist/context-assembly/infra.js +154 -0
- package/dist/context-assembly/types.d.ts +30 -0
- package/dist/context-assembly/types.js +2 -0
- package/dist/coordinators/pr-review.d.ts +3 -1
- package/dist/coordinators/pr-review.js +25 -4
- package/dist/daemon/workflow-runner.d.ts +11 -1
- package/dist/daemon/workflow-runner.js +82 -9
- package/dist/domain/execution/state.d.ts +6 -6
- package/dist/manifest.json +76 -44
- package/dist/mcp/handlers/v2-workflow.d.ts +2 -2
- package/dist/mcp/output-schemas.d.ts +234 -234
- package/dist/mcp/tools.d.ts +2 -2
- package/dist/mcp/v2/tools.d.ts +24 -24
- package/dist/trigger/delivery-action.d.ts +2 -0
- package/dist/trigger/delivery-action.js +24 -0
- package/dist/trigger/trigger-router.js +24 -1
- package/dist/trigger/trigger-store.js +42 -0
- package/dist/trigger/types.d.ts +3 -0
- package/dist/v2/durable-core/schemas/artifacts/assessment.d.ts +2 -2
- package/dist/v2/durable-core/schemas/artifacts/coordinator-signal.d.ts +2 -2
- package/dist/v2/durable-core/schemas/artifacts/loop-control.d.ts +6 -6
- package/dist/v2/durable-core/schemas/artifacts/review-verdict.d.ts +6 -6
- package/dist/v2/durable-core/schemas/compiled-workflow/index.d.ts +56 -56
- package/dist/v2/durable-core/schemas/execution-snapshot/blocked-snapshot.d.ts +83 -83
- package/dist/v2/durable-core/schemas/execution-snapshot/execution-snapshot.v1.d.ts +1024 -1024
- package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +2336 -2336
- package/dist/v2/durable-core/schemas/session/dag-topology.d.ts +6 -6
- package/dist/v2/durable-core/schemas/session/events.d.ts +339 -339
- package/dist/v2/durable-core/schemas/session/gaps.d.ts +30 -30
- package/dist/v2/durable-core/schemas/session/manifest.d.ts +6 -6
- package/dist/v2/durable-core/schemas/session/outputs.d.ts +8 -8
- package/dist/v2/durable-core/schemas/session/validation-event.d.ts +3 -3
- package/docs/design/adaptive-coordinator-context-candidates.md +265 -0
- package/docs/design/adaptive-coordinator-context-review.md +101 -0
- package/docs/design/adaptive-coordinator-context.md +504 -0
- package/docs/design/adaptive-coordinator-routing-candidates.md +340 -0
- package/docs/design/adaptive-coordinator-routing-design-review.md +135 -0
- package/docs/design/adaptive-coordinator-routing-review.md +156 -0
- package/docs/design/adaptive-coordinator-routing.md +660 -0
- package/docs/design/context-assembly-design-candidates.md +199 -0
- package/docs/design/context-assembly-implementation-plan.md +211 -0
- package/docs/design/context-assembly-layer-design-review.md +110 -0
- package/docs/design/context-assembly-layer.md +622 -0
- package/docs/design/context-assembly-review-findings.md +112 -0
- package/docs/design/stuck-escalation-candidates.md +176 -0
- package/docs/design/stuck-escalation-design-review.md +70 -0
- package/docs/design/stuck-escalation.md +326 -0
- package/docs/design/worktrain-task-queue-candidates.md +252 -0
- package/docs/design/worktrain-task-queue-design-review.md +109 -0
- package/docs/design/worktrain-task-queue.md +443 -0
- package/docs/design/worktree-review-findings-candidates.md +101 -0
- package/docs/design/worktree-review-findings-design-review.md +65 -0
- package/docs/design/worktree-review-findings-implementation-plan.md +153 -0
- package/docs/ideas/backlog.md +212 -0
- package/package.json +3 -3
@@ -0,0 +1,112 @@
# Design Review Findings: Context Assembly Layer v1

*Generated during coding-task-workflow-agentic session | 2026-04-19*

---

## Tradeoff Review

| Tradeoff | Verdict | Condition for Failure |
|---|---|---|
| FileSystemPortV2 write stubs | RESOLVED -- switching to direct SessionEventLogReadonlyStorePortV2 adapter | N/A |
| Inline DI wiring in infra.ts | Acceptable -- explicitly endorsed by pitch R1 mitigation | N/A |
| Global session list (no workspace filtering) | Accepted -- v2 improvement documented in pitch | Multiple simultaneous workspaces with unrelated sessions |
| Promise.all partial failure | Adequately handled -- each source returns Result, try/catch in infra | N/A |
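
The last row relies on two properties: each context source resolves to a `Result` instead of throwing, and infra wraps each call in try/catch. The sketch below illustrates why that keeps `Promise.all` from short-circuiting when one source fails. It is a minimal illustration under assumptions: `Result`, `ok`, `err`, `guarded`, and `assembleSources` are hypothetical names, not the actual `context-assembly` exports.

```typescript
// Hypothetical minimal Result type; the real code base defines its own.
type Result<T, E> =
  | { readonly kind: "ok"; readonly value: T }
  | { readonly kind: "err"; readonly error: E };

const ok = <T>(value: T): Result<T, never> => ({ kind: "ok", value });
const err = <E>(error: E): Result<never, E> => ({ kind: "err", error });

// A context source (git diff, PR body, recent sessions, ...) is an async
// function that is *supposed* to return a Result but may still throw.
type Source<T> = () => Promise<Result<T, string>>;

interface NamedSource<T> {
  readonly name: string;
  readonly run: Source<T>;
}

// The "try/catch in infra" layer: a thrown error or rejected promise
// becomes an err() value, so the wrapped source always resolves.
function guarded<T>(name: string, source: Source<T>): Source<T> {
  return async () => {
    try {
      return await source();
    } catch (e) {
      return err(`${name} threw: ${e instanceof Error ? e.message : String(e)}`);
    }
  };
}

// Every guarded source resolves, so Promise.all never rejects because of
// one failure; each source's outcome is reported individually instead.
async function assembleSources<T>(
  sources: readonly NamedSource<T>[],
): Promise<ReadonlyArray<{ readonly name: string; readonly result: Result<T, string> }>> {
  const results = await Promise.all(sources.map((s) => guarded(s.name, s.run)()));
  return sources.map((s, i) => ({ name: s.name, result: results[i] }));
}
```

A consumer can then render whatever succeeded and skip the rest, which matches the "N/A" failure condition in the table.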

---

## Failure Mode Review

| Failure Mode | Covered By | Residual Risk |
|---|---|---|
| Write stubs called | Outer try/catch in createListRecentSessions | Low -- FM1 resolved by switching adapter |
| Git commands fail | execGit/execGh return err(), fallback chain | None |
| Sessions dir missing | LocalDirectoryListingV2 returns [] for FS_NOT_FOUND | None |
| Corrupt session data | LocalSessionSummaryProviderV2 graceful skip | None |
| spawnSession backward compat | Optional 4th arg | None |
| contextAssembler not provided | Optional field + if-guard in coordinator | None |

---

## Runner-Up / Simpler Alternative Review

**Runner-up** (Candidate B: raw fs.promises) has no elements worth borrowing into the
selected design.

**Simpler variant identified**: Use a direct `SessionEventLogReadonlyStorePortV2`
adapter in `infra.ts` instead of `LocalSessionEventLogStoreV2`. This eliminates all
`FileSystemPortV2` and `Sha256PortV2` stubs while still reusing `LocalSessionSummaryProviderV2`
and the full projection pipeline. The adapter implements only `load()` and
`loadValidatedPrefix()` using `fs.promises` + existing Zod schemas (`ManifestRecordV1Schema`,
`DomainEventV1Schema`). Approximately 40 lines. This is strictly cleaner.

**Recommendation**: Use the simpler `SessionEventLogReadonlyStorePortV2` adapter approach.

---

## Philosophy Alignment

| Principle | Alignment |
|---|---|
| Errors are data | STRONG -- `Result<T,E>` throughout |
| DI for boundaries | STRONG -- ContextAssemblerDeps, inline construction accepted for separate process |
| Prefer fakes over mocks | STRONG -- test spec uses fake deps |
| Immutability by default | STRONG -- readonly everywhere |
| Make illegal states unrepresentable | STRONG -- AssemblyTask discriminated union |
| Functional/declarative | STRONG -- factory functions, pure renderContextBundle |
| YAGNI | MILD TENSION -- empty RenderOpts interface (accepted, documented) |
| Architectural fixes over patches | STRONG with adapter approach |

---

## Findings

### YELLOW: Empty RenderOpts interface

**Severity**: Yellow (minor)

**Finding**: `RenderOpts` is an empty interface in v1. In TypeScript, `interface RenderOpts {}` is structurally equivalent to the `{}` type, so the empty interface is useful only as a future extension point.

**Assessment**: Acceptable -- the pitch explicitly documents this as an intentional v1 placeholder. No action needed.

### YELLOW: Global session filtering (no workspace scope)

**Severity**: Yellow (minor, known)

**Finding**: `listRecentSessions` returns the 3 most recent sessions globally, not scoped to the coordinator's workspace path. For users running multiple workspaces, prior notes from unrelated workspaces may appear.

**Assessment**: Acceptable for v1 -- the pitch explicitly documents this as a known limitation. The unused workspace-path parameter should be named `_workspacePath` (leading underscore) to signal intentional non-use.

### GREEN: All other design elements

Tradeoffs, failure modes, and philosophy alignment are all sound. No RED or ORANGE findings.

---

## Recommended Revisions

1. **Use direct `SessionEventLogReadonlyStorePortV2` adapter** in `infra.ts` instead of
`LocalSessionEventLogStoreV2`. Eliminates all write stubs. Implements `load()` using
`fs.promises.readFile` + `ManifestRecordV1Schema` + `DomainEventV1Schema` parsing.
Approximately 40 lines.

2. **Name unused parameter** `_workspacePath` in `createListRecentSessions` to signal
intentional non-use.

---

## Residual Concerns

1. **Import path for Zod schemas**: `ManifestRecordV1Schema` and `DomainEventV1Schema`
must be importable from `src/v2/durable-core/schemas/session/index.ts`. Before
implementing, verify that these schemas are exported from that module.

2. **`asSessionId` usage**: The `SessionEventLogReadonlyStorePortV2.load()` adapter
receives a `SessionId` branded type. When reading session directories and mapping
entry names to `SessionId`, the `asSessionId` import from
`src/v2/durable-core/ids/index.ts` must be used correctly.

3. **Neverthrow ResultAsync wrapping**: The adapter must return neverthrow `ResultAsync`
objects (using `okAsync`/`errAsync` from neverthrow), not the custom `Result` type.
The bridge from neverthrow to custom Result happens in the `createListRecentSessions`
function via `const result = await resultAsync; if (result.isErr()) return err(...)`.
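
Residual concerns 2 and 3 can be sketched together. Everything in this block is a hypothetical stand-in: `SessionId` and `asSessionId` are simplified versions of the real exports of `src/v2/durable-core/ids/index.ts`, `NtOk`/`NtErr` imitate neverthrow's `Ok`/`Err` (whose `isErr()` is a type predicate and whose `ResultAsync` is awaitable), and the deps shape of `createListRecentSessions` is assumed. It shows raw directory names being branded at the boundary and the `await` + `isErr()` bridge into the custom `Result`.

```typescript
// ---- Hypothetical stand-ins (not the real code base modules) ----

// Branded ID: a plain string that has been explicitly marked as a SessionId.
type SessionId = string & { readonly __brand: "SessionId" };

// Assumed behaviour of asSessionId: validate, then brand.
function asSessionId(raw: string): SessionId {
  if (raw.trim() === "") throw new Error("empty session id");
  return raw as SessionId;
}

// Imitation of neverthrow's Ok/Err: isErr() is a type predicate, so an
// `if (result.isErr())` check narrows the union in both branches.
class NtOk<T, E> {
  readonly value: T;
  constructor(value: T) { this.value = value; }
  isErr(): this is NtErr<T, E> { return false; }
}
class NtErr<T, E> {
  readonly error: E;
  constructor(error: E) { this.error = error; }
  isErr(): this is NtErr<T, E> { return true; }
}
type NtResult<T, E> = NtOk<T, E> | NtErr<T, E>;
// Like neverthrow's ResultAsync, this is awaitable and yields a result.
type NtResultAsync<T, E> = PromiseLike<NtResult<T, E>>;

// The custom (non-neverthrow) Result used by the context-assembly layer.
type Result<T, E> =
  | { readonly kind: "ok"; readonly value: T }
  | { readonly kind: "err"; readonly error: E };

interface SessionEvents {
  readonly sessionId: SessionId;
  readonly lastActivityMs: number;
}

// Read-only port: only load() is needed by the listing code.
interface ReadonlyStorePort {
  load(id: SessionId): NtResultAsync<SessionEvents, string>;
}

function createListRecentSessions(deps: {
  readonly listSessionDirs: () => Promise<readonly string[]>;
  readonly store: ReadonlyStorePort;
}) {
  // `_workspacePath` is intentionally unused in v1 (global listing).
  return async (_workspacePath: string, limit = 3): Promise<Result<readonly SessionEvents[], string>> => {
    const loaded: SessionEvents[] = [];
    for (const name of await deps.listSessionDirs()) {
      // Brand raw directory names at the boundary, before calling the port.
      const result = await deps.store.load(asSessionId(name));
      // Bridge: neverthrow-style result -> custom Result.
      if (result.isErr()) return { kind: "err", error: `load ${name}: ${result.error}` };
      loaded.push(result.value);
    }
    loaded.sort((a, b) => b.lastActivityMs - a.lastActivityMs);
    return { kind: "ok", value: loaded.slice(0, limit) };
  };
}
```

In the real adapter, `load()` would build its `ResultAsync` with neverthrow's `okAsync`/`errAsync`. This sketch returns `err` on the first failed load, as the bridge snippet above describes; skipping corrupt sessions instead would be a one-line change.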
@@ -0,0 +1,176 @@
# Design Candidates: Stuck Escalation for Overnight-Autonomous WorkTrain Sessions

> Raw investigative material for main-agent review. Not a final decision.

## Problem Understanding

### Core Tensions

1. **Early certainty vs. false positives.** Aborting at threshold 3 for `repeated_tool_call` saves up to 27 minutes of a 30-minute session. But a legitimate retry loop (transient network error, idempotent file read called 3x) also triggers at 3. Higher threshold = fewer false positives but less wall-clock savings; lower = more savings but more false positives.

2. **Structural correctness vs. maintenance surface.** Adding `_tag: 'stuck'` to `WorkflowRunResult` is structurally clean (make illegal states unrepresentable) but widens the maintenance surface: every switch on the union must be updated. The naive alternative (add `reason: 'stuck_loop'` to `WorkflowRunTimeout`) avoids the new variant but conflates stuck abort with wall-clock timeout -- these have categorically different implications for retry logic.

3. **Abort power vs. future recoverability.** `agent.abort()` is terminal. A `steer()` injection could warn the agent and let it self-correct. Aborting closes the door to LLM-driven self-recovery. For overnight-autonomous use, abort is deterministic and saves resources. For supervised use, steer-and-warn might be preferred.

4. **Outbox write timing vs. fire-and-forget contract.** The outbox write must happen as close as possible to the abort moment. But `turn_end` is synchronous, and any blocking `await` would stall the abort path. Resolution: initiate outbox write as a detached fire-and-forget Promise in `turn_end`, same contract as `DaemonEventEmitter.emit()`.
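
A minimal sketch of the fire-and-forget contract from tension 4, assuming a hypothetical `append` function that writes one line to `outbox.jsonl`. The entry shape is illustrative only (the proposed entry has 10 fields, which are not enumerated here), and `writeOutboxFireAndForget` and `abortStuckSession` are hypothetical names. The write is started but never awaited, and both synchronous throws and rejections are swallowed, so the `turn_end` path reaches `abort()` immediately.

```typescript
// Illustrative entry shape only; the proposed entry has more fields.
interface StuckOutboxEntry {
  readonly kind: "stuck";
  readonly sessionId: string;
  readonly reason: "repeated_tool_call" | "no_progress";
  readonly turnCount: number;
}

// Hypothetical appender, e.g. a wrapper around appending to outbox.jsonl.
type AppendLine = (line: string) => Promise<void>;

// Fire-and-forget: start the write, never await it, swallow every failure.
// WHY: turn_end is synchronous; awaiting here would stall the abort path.
function writeOutboxFireAndForget(append: AppendLine, entry: StuckOutboxEntry): void {
  try {
    void append(JSON.stringify(entry) + "\n").catch(() => {
      // Rejection swallowed: observability must never affect the session.
    });
  } catch {
    // A synchronous throw from `append` is swallowed as well.
  }
}

// Sketch of the turn_end side: initiate the write, then abort immediately.
function abortStuckSession(
  deps: { readonly append: AppendLine; readonly abort: () => void },
  entry: StuckOutboxEntry,
): void {
  writeOutboxFireAndForget(deps.append, entry); // detached, not awaited
  deps.abort(); // runs even if the write has not settled yet
}
```

The consequence, noted under Candidate B's failure mode, is that the write may settle after the result has already reached TriggerRouter.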

### Likely Seam

`turn_end` subscriber in `workflow-runner.ts`. Confirmed -- not just where the symptom appears, but where all relevant state exists (`turnCount`, `stepAdvanceCount`, `lastNToolCalls`, `timeoutReason`). The `max_turns` abort at lines 3088-3104 is the exact template: set closure variable, emit event, call `agent.abort()`, return.
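
Applied to `repeated_tool_call`, the max_turns template (set closure variable, emit event, call `agent.abort()`, return) might look roughly like this. `AgentHandle`, `ToolCall`, `StuckContext`, and `createTurnEndSubscriber` are hypothetical names, and "repeated" is assumed to mean that the last three calls share the same tool name and argument summary; the real detector behind `lastNToolCalls` may differ.

```typescript
// Hypothetical minimal agent handle; only abort() matters here.
interface AgentHandle {
  abort(): void;
}

interface ToolCall {
  readonly toolName: string;
  readonly argsSummary: string;
}

interface StuckContext {
  readonly reason: "repeated_tool_call";
  readonly toolName: string;
  readonly argsSummary: string;
  readonly turnCount: number;
}

// Assumed threshold, matching the "threshold 3" discussed above.
const REPEATED_TOOL_CALL_THRESHOLD = 3;

// True when the last `threshold` calls used the same tool with the same args.
function isRepeatedToolCall(calls: readonly ToolCall[], threshold: number): boolean {
  if (calls.length < threshold) return false;
  const tail = calls.slice(calls.length - threshold);
  const first = tail[0];
  return tail.every((c) => c.toolName === first.toolName && c.argsSummary === first.argsSummary);
}

function createTurnEndSubscriber(agent: AgentHandle, emit: (event: string) => void) {
  let turnCount = 0;
  let stuckContext: StuckContext | null = null; // read later by the catch block
  const lastNToolCalls: ToolCall[] = [];

  const onToolCall = (call: ToolCall): void => {
    lastNToolCalls.push(call);
    if (lastNToolCalls.length > REPEATED_TOOL_CALL_THRESHOLD) lastNToolCalls.shift();
  };

  // Synchronous by design: no await anywhere on the abort path.
  const onTurnEnd = (): void => {
    turnCount++;
    if (stuckContext !== null) return; // already aborted once
    if (!isRepeatedToolCall(lastNToolCalls, REPEATED_TOOL_CALL_THRESHOLD)) return;
    const last = lastNToolCalls[lastNToolCalls.length - 1];
    // 1. set closure variable
    stuckContext = { reason: "repeated_tool_call", toolName: last.toolName, argsSummary: last.argsSummary, turnCount };
    emit("repeated_tool_call"); // 2. emit event
    agent.abort(); // 3. abort, then 4. return
  };

  return { onToolCall, onTurnEnd, getStuckContext: (): StuckContext | null => stuckContext };
}
```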

### What Makes This Hard

1. `ChildWorkflowRunResult` type alias (line 396) -- must be updated alongside `WorkflowRunResult`. If missed, the cast at line 2014 silently hides the new variant from `makeSpawnAgentTool`'s switch, producing a runtime `assertNever` error in production.

2. The fire-and-forget contract -- any `await` in the `turn_end` subscriber blocks the abort path. The outbox write must be a detached Promise.

3. Double-emit of `timeout_imminent` -- the max_turns path emits it AND the `timeoutReason !== null` check at line 3157 would also emit it. The design must not add a third abort here.

4. `maybeRunDelivery` gate in TriggerRouter -- must exclude `stuck` results from autoCommit delivery (there is no successful output to commit).
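
The hazard in item 1 comes from how TypeScript treats type assertions: `x as T` compiles whenever either type is assignable to the other, so casting the wider `WorkflowRunResult` to a narrower `ChildWorkflowRunResult` is accepted, and a switch over the narrower type passes the exhaustiveness check without a `stuck` case. The sketch below shows the corrected shape. Variant fields are trimmed-down and hypothetical; the `success`, `timeout`, and `stuck` tags appear in the design, while the `error` tag and `describeChildResult` are assumptions.

```typescript
// Trimmed-down, hypothetical variant shapes; real variants carry more fields.
interface WorkflowRunSuccess { readonly _tag: "success"; readonly output: string }
interface WorkflowRunError { readonly _tag: "error"; readonly message: string }
interface WorkflowRunTimeout { readonly _tag: "timeout"; readonly reason: string }
interface WorkflowRunStuck {
  readonly _tag: "stuck";
  readonly reason: "repeated_tool_call" | "no_progress";
  readonly toolName?: string;
}

type WorkflowRunResult = WorkflowRunSuccess | WorkflowRunError | WorkflowRunTimeout | WorkflowRunStuck;

// WARNING: keep in sync with WorkflowRunResult. If WorkflowRunStuck were
// missing here, the `as` cast below would still compile and the switch
// would still pass the exhaustiveness check, but a stuck result would
// fall through to assertNever and throw at runtime.
type ChildWorkflowRunResult = WorkflowRunSuccess | WorkflowRunError | WorkflowRunTimeout | WorkflowRunStuck;

function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

function describeChildResult(raw: WorkflowRunResult): string {
  const result = raw as ChildWorkflowRunResult; // the cast site
  switch (result._tag) {
    case "success":
      return `child succeeded: ${result.output}`;
    case "error":
      return `child failed: ${result.message}`;
    case "timeout":
      return `child timed out: ${result.reason}`;
    case "stuck":
      return `child stuck: ${result.reason}${result.toolName ? ` (tool ${result.toolName})` : ""}`;
    default:
      return assertNever(result);
  }
}
```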

## Philosophy Constraints

From `CLAUDE.md` and codebase patterns:

- **Make illegal states unrepresentable** -- stuck and timeout are categorically different; a new discriminant is required.
- **Exhaustiveness everywhere** -- all `assertNever` guards must be updated when the union grows.
- **Errors are data** -- `WorkflowRunStuck` as a result value, not an exception.
- **Fire-and-forget observability** -- `DaemonEventEmitter.emit()` and `NotificationService.notify()` both return void and swallow errors. Outbox write must follow this contract.
- **YAGNI with discipline** -- do not add `issue_reported severity=fatal` abort without production evidence.
- **Pure functions for message building** -- `buildNotificationBody`, `buildOutcome`, `buildDetail` are pure switch-dispatch functions; new cases extend them cleanly.
- **WHY comments** -- every non-obvious decision must have an inline rationale comment.

No conflicts between stated philosophy and repo patterns.

## Impact Surface

Changes required beyond the immediate task if `WorkflowRunResult` is widened:

| Location | File | Required Change |
|---|---|---|
| `WorkflowRunResult` type alias | `workflow-runner.ts` | Add `WorkflowRunStuck` variant |
| `ChildWorkflowRunResult` type alias | `workflow-runner.ts` | Add `WorkflowRunStuck` variant |
| `makeSpawnAgentTool` switch | `workflow-runner.ts` | Add `stuck` case (assertNever guard) |
| `turn_end` subscriber | `workflow-runner.ts` | Add abort logic for `repeated_tool_call` and `no_progress` |
| `runWorkflow()` catch block | `workflow-runner.ts` | Add branch: if `stuckContext !== null` return `WorkflowRunStuck` |
| `TriggerRouter.route()` | `trigger-router.ts` | Add `stuck` log branch before `assertNever` |
| `TriggerRouter.dispatch()` | `trigger-router.ts` | Add `stuck` log branch before `assertNever` |
| `maybeRunDelivery` gate | `trigger-router.ts` | Exclude `stuck` from delivery |
| `buildNotificationBody` | `notification-service.ts` | Add `stuck` case |
| `buildOutcome` | `notification-service.ts` | Add `stuck` to return type |
| `buildDetail` | `notification-service.ts` | Add `stuck` case |
| `NotificationPayload.outcome` | `notification-service.ts` | Add `'stuck'` to union |
| `TriggerDefinition.agentConfig` | `types.ts` | Add `stuckAbortPolicy?: 'abort' \| 'notify_only'` |

## Candidates

### Candidate A: Minimal -- Abort with existing result types

**Summary:** Wire `agent.abort()` after `repeated_tool_call` emit, add `reason: 'stuck_loop'` to `WorkflowRunTimeout` (new string value), skip outbox write and notification extension.

**Tensions resolved:** Maintenance surface (2 files changed).

**Tensions accepted:** Structural correctness (conflates stuck with wall-clock timeout), coordinator readiness (no toolName/argsSummary in result), notification distinctness (NotificationService cannot distinguish stuck from timeout).

**Boundary solved at:** `turn_end` subscriber + `WorkflowRunTimeout` extension. This is a symptom-level fix -- it stops the waste but provides no diagnostic value.

**Failure mode:** A coordinator script reading `result._tag === 'timeout'` has no way to distinguish stuck abort from wall-clock timeout without parsing `result.reason`. This is string parsing -- exactly what discriminated unions are designed to prevent.

**Repo pattern relationship:** Adapts max_turns abort template. Does NOT follow the `WorkflowRunTimeout` vs. `WorkflowRunError` precedent of 'one variant per categorically distinct outcome.'

**Gains:** 2 files changed. Minimal assertNever surface.

**Gives up:** Semantic precision (stuck != timeout), coordinator readiness, notification distinctness.

**Scope judgment:** Too narrow. Violates 'make illegal states unrepresentable.'

**Philosophy:** Honors YAGNI. Conflicts with 'make illegal states unrepresentable', 'exhaustiveness everywhere', 'errors are data.'

---

### Candidate B: Full -- New `WorkflowRunStuck` variant + outbox + notification (RECOMMENDED)

**Summary:** Add `WorkflowRunStuck` to `WorkflowRunResult` and `ChildWorkflowRunResult`; abort on `repeated_tool_call` and `no_progress`; write a 10-field outbox entry as a fire-and-forget Promise; extend `NotificationService` with a `stuck` case; add `stuckAbortPolicy: 'abort' | 'notify_only'` to `WorkflowTrigger.agentConfig`.

**Tensions resolved:** All four. Structural correctness (new variant), coordinator readiness (toolName/argsSummary/turnCount/stepAdvanceCount on the result), notification distinctness (new message body), policy expressiveness (stuckAbortPolicy with abort default).

**Tensions accepted:** Maintenance surface (5 files, all switch statements widened), policy granularity (no per-signal policy within a trigger -- only per-trigger).

**Boundary solved at:**
- `turn_end` subscriber: abort + fire-and-forget outbox write
- `runWorkflow()` catch block: construct `WorkflowRunStuck` from `stuckContext` closure variable
- `TriggerRouter.route()`/`dispatch()`: log and skip delivery
- `NotificationService.notify()`: new message body
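
The `runWorkflow()` catch-block boundary can be sketched as below. Two assumptions: the agent run is modelled as a hypothetical `runAgent` callback that hands the turn_end side a `recordStuck` hook, and `agent.abort()` is assumed to surface as a rejection of the awaited run, which is exactly what Open Question 1 asks to confirm. The closure variable `stuckContext` is the only thing that distinguishes a stuck abort from an ordinary error.

```typescript
interface StuckContext {
  readonly reason: "repeated_tool_call" | "no_progress";
  readonly toolName: string;
  readonly argsSummary: string;
  readonly turnCount: number;
  readonly stepAdvanceCount: number;
}

type WorkflowRunResult =
  | { readonly _tag: "success" }
  | { readonly _tag: "error"; readonly message: string }
  | ({ readonly _tag: "stuck" } & StuckContext);

// Hypothetical stand-in for the agent loop: it receives the hook the
// turn_end subscriber would use, and rejects once aborted (assumed).
type RunAgent = (recordStuck: (ctx: StuckContext) => void) => Promise<void>;

async function runWorkflow(runAgent: RunAgent): Promise<WorkflowRunResult> {
  // Closure variable: written in turn_end, read in the catch block.
  // The `as` stops TypeScript from narrowing it to `null` here, since the
  // only assignment happens inside a callback.
  let stuckContext = null as StuckContext | null;
  try {
    await runAgent((ctx) => {
      stuckContext = ctx;
    });
    return { _tag: "success" };
  } catch (e) {
    // WHY: abort surfaces as a thrown error; only the closure variable
    // tells a stuck abort apart from a genuine failure.
    if (stuckContext !== null) {
      return { _tag: "stuck", ...stuckContext };
    }
    return { _tag: "error", message: e instanceof Error ? e.message : String(e) };
  }
}
```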

**Failure mode:** The outbox write initiates in `turn_end` as a detached Promise but the `WorkflowRunStuck` result is returned synchronously from the catch block. The outbox write may complete AFTER the result reaches TriggerRouter. Acceptable (outbox is diagnostic, not delivery) but must be documented.

**Repo pattern relationship:** Follows the `WorkflowRunTimeout` precedent exactly. Uses `DaemonEventEmitter` fire-and-forget pattern for outbox write. Uses max_turns abort template. Extends `NotificationService` pure-function pattern.

**Gains:** Full semantic precision, coordinator-ready structured data, notification distinctness, type-safe exhaustiveness.

**Gives up:** 5-file maintenance surface, `ChildWorkflowRunResult` and `makeSpawnAgentTool` must be updated.

**Scope judgment:** Best-fit. Directly addresses all 4 decision criteria. No speculative abstractions.

**Philosophy:** Honors all core principles. Minor YAGNI pressure vs. Candidate A, but the added files are necessary consequences of correct union design.

---

### Candidate C: Extended -- Candidate B + `issue_reported severity=fatal` abort trigger

**Summary:** All of Candidate B, plus: abort when the `onIssueSummary` callback receives `severity: 'fatal'`, implemented via a `fatalIssueAbortPending` closure flag set by the callback and checked/cleared in `turn_end`. Adds `stuckReason: 'fatal_issue_report'` to `WorkflowRunStuck` and an optional `issueSummary` field.

**Tensions resolved:** All of Candidate B's tensions, plus the primary framing risk (agent self-report is more reliable than heuristics per session ea2de6e5).

**Tensions accepted:** Higher implementation complexity, one-turn abort latency for fatal issues (the flag is checked in `turn_end`, not inline in the callback).

**Boundary solved at:** All of Candidate B's boundaries, plus `onIssueSummary` callback wiring.

**Failure mode:** One-turn latency -- the abort fires on the turn AFTER `report_issue` calls, not immediately. For a fatal issue, one extra LLM turn is acceptable but must be documented.
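
The latency follows from where the flag is consumed. A sketch with hypothetical names (`createFatalIssueGate`, `IssueSeverity`, `IssueSummary`) and an assumed summary shape: the callback only raises `fatalIssueAbortPending`, and the next `turn_end` checks it, clears it, and aborts, so the abort never fires from inside the callback itself.

```typescript
// Hypothetical severity set; only "fatal" matters for this sketch.
type IssueSeverity = "info" | "warning" | "fatal";

interface IssueSummary {
  readonly severity: IssueSeverity;
  readonly text: string;
}

function createFatalIssueGate(abort: () => void) {
  // Raised by the onIssueSummary callback, consumed by turn_end.
  let fatalIssueAbortPending = false;
  let aborted = false;

  return {
    // Runs when report_issue fires. It never aborts inline; it only sets
    // the flag, which is the source of the one-turn latency.
    onIssueSummary(summary: IssueSummary): void {
      if (summary.severity === "fatal") fatalIssueAbortPending = true;
    },
    // Runs at turn_end: check the flag, clear it, abort once.
    onTurnEnd(): void {
      if (!fatalIssueAbortPending || aborted) return;
      fatalIssueAbortPending = false;
      aborted = true;
      abort();
    },
  };
}
```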

**Repo pattern relationship:** Extends Candidate B. Adapts the existing `onIssueSummary` callback infrastructure.

**Gains:** Catches the most reliable real-world stuck signal. Directly addresses primary framing risk.

**Gives up:** Higher initial complexity. `stuckReason` union grows to 3 values. Requires production evidence to justify over Candidate B.

**Scope judgment:** Slightly broad for the initial design. The primary use case (blind tool loop) is covered by Candidate B. Candidate C is the correct Phase 2 extension.

**Philosophy:** Fully honors all principles. Marginal YAGNI pressure vs. Candidate B -- grounded in real log evidence (session ea2de6e5) but requires more than one data point to justify the added complexity upfront.

## Comparison and Recommendation

| Criterion | A | B | C |
|---|---|---|---|
| Structural correctness | FAIL | PASS | PASS |
| Coordinator readiness | FAIL | PASS | PASS |
| Notification distinctness | FAIL | PASS | PASS |
| Policy expressiveness | PARTIAL | PASS | PASS |
| Maintenance surface | Best | Medium | Highest |
| Covers primary framing risk | No | No | Yes |
| YAGNI compliance | Best | Good | Marginal |
| Reversibility | Hard | Easy | Easy |

**Recommendation: Candidate B.**

Reasoning: Structural correctness is non-negotiable (CLAUDE.md: 'make illegal states unrepresentable'). Candidate A fails this criterion regardless of its maintenance advantage. Candidate C is architecturally correct but the `issue_reported severity=fatal` trigger requires production evidence beyond session ea2de6e5. The 5-file surface of Candidate B is manageable because all changes are additive (new union members and switch cases), and TypeScript exhaustiveness checking catches missed locations at compile time, with one exception: the `ChildWorkflowRunResult` cast at line 2014, which must be updated by hand.

## Self-Critique

**Strongest counter-argument against Candidate B:** The `repeated_tool_call` heuristic may have an unacceptable false-positive rate in production. If it aborts legitimate sessions frequently, operators will set `stuckAbortPolicy: 'notify_only'` everywhere, negating the feature. A minimal Candidate A approach would have caused less collateral damage in this scenario.

**Response:** The `stuckAbortPolicy: 'notify_only'` escape hatch directly addresses this. The structural correctness argument still stands -- conflating stuck with timeout is a design debt that compounds over time.

**Pivot to Candidate A:** If there is a hard constraint against widening `WorkflowRunResult` (e.g., a serialization layer or cross-process protocol that can't handle new variants). No such constraint exists.

**Pivot to Candidate C:** If production logs show `repeated_tool_call` false-positive rate exceeds 20% while `issue_reported severity=fatal` false-positive rate is under 5%.

## Open Questions for the Main Agent

1. **Verify abort propagation:** Does `agent.abort()` called in `turn_end` correctly propagate to the `runWorkflow()` catch block without clearing closure state? The `stuckContext` variable must be readable in the catch block after abort. Confirm by reading `AgentLoop.abort()` implementation.

2. **`sessionStartMs` presence:** Is `const sessionStartMs = Date.now()` already set before the agent loop in `runWorkflow()`? If not, it must be added to support `elapsedMs` in the outbox entry.

3. **`no_progress` false-positive rate:** The 80%-turns threshold can fire even on a legitimate research session. Should the initial design wire `no_progress` abort, or start with only `repeated_tool_call` abort and add `no_progress` in a follow-on?

4. **`stuckAbortPolicy` placement:** Should the policy live in `WorkflowTrigger.agentConfig` (as proposed), or in a separate top-level `TriggerDefinition.stuckPolicy` field to distinguish session-behavior knobs from routing knobs?
@@ -0,0 +1,70 @@
# Design Review: Stuck Escalation for Overnight-Autonomous WorkTrain Sessions

> Findings from adversarial review of the selected direction (Candidate B, adjusted).

## Tradeoff Review

| Tradeoff | Verdict | Condition That Invalidates It |
|---|---|---|
| 5-file maintenance surface for type safety | ACCEPTABLE | Union grows to 20+ consumers without a code-gen layer |
| `sessionStartMs` must be added | TRIVIALLY ACCEPTABLE | None |
| `no_progress` abort gated by `noProgressAbortEnabled` (default false) | ACCEPTABLE | Production evidence shows `no_progress` is the dominant failure mode |

## Failure Mode Review

| Failure Mode | Design Handling | Missing Mitigation | Risk |
|---|---|---|---|
| `ChildWorkflowRunResult` not updated | Called out in design doc as required update | Add explicit WARNING comment at cast site (line 2014) | HIGH -- silent compile-time pass, runtime crash |
| `await` in `turn_end` subscriber blocks abort | Fire-and-forget pattern specified explicitly | Add WHY comment at the outbox write call | MEDIUM -- junior devs may add await by analogy |
| `maybeRunDelivery` not updated | Existing gate `if (result._tag !== 'success') return` already handles it | None needed | LOW -- already handled |
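
Why the last row needs no change: an early return on anything other than `success` excludes every other variant, including ones added later such as `stuck`. A sketch with hypothetical result fields and a hypothetical `deliver` callback; the boolean return value is an assumption that exists only to make the behaviour observable.

```typescript
type WorkflowRunResult =
  | { readonly _tag: "success"; readonly output: string }
  | { readonly _tag: "error"; readonly message: string }
  | { readonly _tag: "timeout" }
  | { readonly _tag: "stuck"; readonly reason: "repeated_tool_call" | "no_progress" };

// Hypothetical delivery step (for example an autoCommit of the session output).
type Deliver = (output: string) => Promise<void>;

// Returns whether delivery ran; the boolean only makes the sketch testable.
async function maybeRunDelivery(result: WorkflowRunResult, deliver: Deliver): Promise<boolean> {
  // WHY: only a successful run has output worth delivering. Returning early
  // on "anything but success" also excludes variants added later, like "stuck".
  if (result._tag !== "success") return false;
  await deliver(result.output);
  return true;
}
```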

## Runner-Up / Simpler Alternative Review

**Candidate C element borrowed:** `issueSummaries?: readonly string[]` added to `WorkflowRunStuck`. The `issueSummaries` array is already tracked in session closures (zero new collection code). Adds coordinator-readiness at near-zero cost.

**Simplest alternative:** Abort on `repeated_tool_call` only, no `no_progress` gate, no `issueSummaries`. Satisfies all 6 acceptance criteria. Excluded because `issueSummaries` and `noProgressAbortEnabled` add meaningful value at minimal cost.

**Hybrid result:** Candidate B adjusted = Candidate B + `issueSummaries` field (borrowed from C) + `noProgressAbortEnabled: boolean` gate (default false) for `no_progress` abort.
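
The adjusted design keeps two independent knobs: `noProgressAbortEnabled` decides whether `no_progress` escalates at all, and `stuckAbortPolicy` decides whether escalation aborts or only notifies. A sketch of how they might combine, assuming a hypothetical `decideStuckAction` helper and reading the `no_progress` heuristic as "at least 80% of the turn budget used with zero step advances". In this sketch, `"none"` means no escalation action is taken; whatever event emission the heuristic already does is out of scope.

```typescript
// Hypothetical slice of WorkflowTrigger.agentConfig with the two knobs.
interface StuckAgentConfig {
  readonly stuckAbortPolicy?: "abort" | "notify_only";
  readonly noProgressAbortEnabled?: boolean;
}

type StuckSignal = "repeated_tool_call" | "no_progress";
type StuckAction = "abort" | "notify" | "none";

// no_progress heuristic as described: at least 80% of the turn budget used
// with zero step advances. Integer math avoids floating-point edge cases.
function isNoProgress(turnCount: number, stepAdvanceCount: number, maxTurns: number): boolean {
  return stepAdvanceCount === 0 && turnCount * 5 >= maxTurns * 4;
}

function decideStuckAction(signal: StuckSignal, config: StuckAgentConfig): StuckAction {
  // Flag: is no_progress an active trigger at all? Defaults to false.
  if (signal === "no_progress" && !(config.noProgressAbortEnabled ?? false)) return "none";
  // Policy: abort or only notify. Defaults to "abort".
  return (config.stuckAbortPolicy ?? "abort") === "abort" ? "abort" : "notify";
}
```

Setting `stuckAbortPolicy: 'notify_only'` on every trigger turns this into the manual shadow mode mentioned under Residual Concerns.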

## Philosophy Alignment

| Principle | Alignment |
|---|---|
| Make illegal states unrepresentable | FULL -- new discriminant, not reused timeout |
| Exhaustiveness everywhere | FULL -- all assertNever guards updated |
| Errors are data | FULL -- result value, not exception |
| Immutability by default | FULL -- all new fields readonly |
| Fire-and-forget observability | FULL -- outbox write is void+detached+swallowed |
| YAGNI | TENSION -- 5-file surface; resolved in favor of structural correctness |
| Determinism over cleverness | TENSION -- no_progress is a heuristic; resolved by gating it behind explicit opt-in |

## Findings

### RED (Blocking)
None. No design-correctness violations found.

### ORANGE (High Risk)
**Finding O1: `ChildWorkflowRunResult` update is easy to miss.**
The coding agent must update `ChildWorkflowRunResult` alongside `WorkflowRunResult`. If missed, a `_tag: 'stuck'` result from a child session spawned via `makeSpawnAgentTool` will reach `assertNever` at runtime and crash. The design doc mentions it but the failure mode is severe enough to warrant a WARNING comment in the code at the cast site (line 2014 in workflow-runner.ts).

**Recommended action:** Add to design doc: 'CRITICAL: Update ChildWorkflowRunResult on the SAME commit as WorkflowRunResult. Do not split across commits.'

### YELLOW (Medium Risk)
**Finding Y1: `no_progress` false-positive rate is unvalidated.**
The 80%-turns threshold with 0 step advances can fire on legitimate deep-research sessions. The `noProgressAbortEnabled: false` default mitigates this but means the feature is effectively inactive until explicitly enabled. Users who expect `no_progress` to work out-of-the-box will be surprised.

**Recommended action:** Document the default explicitly in the trigger YAML schema comment and in the CLI help text.

**Finding Y2: Fire-and-forget outbox write timing.**
The outbox write initiates in `turn_end` as a detached Promise but the `WorkflowRunStuck` result reaches TriggerRouter before the write completes. If the process exits immediately after TriggerRouter logs the result (e.g. during a rapid daemon shutdown), the outbox write may be lost.

**Recommended action:** Accept this risk -- it is the same risk accepted by `DaemonEventEmitter`. Document it in the code with a WHY comment.

## Recommended Revisions to Design Doc

1. Add a **CRITICAL** callout in the '5-File Change Estimate' section: 'ChildWorkflowRunResult must be updated in the same commit as WorkflowRunResult. If ChildWorkflowRunResult is not updated, the cast at line 2014 lets a stuck result pass the compile-time exhaustiveness check of the makeSpawnAgentTool switch and hit its assertNever at runtime.'

2. Add `issueSummaries?: readonly string[]` to the `WorkflowRunStuck` interface definition. Update the outbox entry schema to include `issueSummaries`.

3. Add `noProgressAbortEnabled?: boolean` (default: false) to `WorkflowTrigger.agentConfig` as a separate field from `stuckAbortPolicy`. The abort policy controls whether to abort or notify; this flag controls whether `no_progress` is an active trigger at all.

4. Update the '5-File Change Estimate' table to show 5 files clearly (the confusion is that workflow-runner.ts has multiple edit locations).

## Residual Concerns

1. **`repeated_tool_call` false-positive rate**: Not empirically validated. The `stuckAbortPolicy: 'notify_only'` escape hatch is the mitigation. If the false-positive rate is high in production, the recommended path is: set `stuckAbortPolicy: 'notify_only'` by default and make `'abort'` opt-in, reversing the current default.

2. **Coordinator consumption of outbox.jsonl**: `outbox.jsonl` has no automated consumer today. The stuck entry will persist in the file until a human or coordinator reads it. This is a 'build it now, connect it later' tradeoff -- acceptable for the initial design.

3. **No shadow-mode validation**: Ideally, the heuristics would run in shadow mode (emit but never abort) for 20+ production sessions before enabling abort. This design does not include shadow mode. The `stuckAbortPolicy: 'notify_only'` setting can serve as a manual shadow mode.