@exaudeus/workrail 3.59.3 → 3.59.5
- package/dist/console-ui/assets/{index-C8iMtnPv.js → index-Ctoxo1z6.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/coordinators/modes/full-pipeline.js +43 -11
- package/dist/coordinators/modes/implement-shared.js +84 -17
- package/dist/coordinators/modes/implement.js +18 -1
- package/dist/coordinators/pr-review.d.ts +1 -1
- package/dist/manifest.json +15 -15
- package/dist/trigger/trigger-listener.js +83 -72
- package/dist/trigger/trigger-router.js +4 -1
- package/docs/design/coordinator-in-process-await-candidates.md +128 -0
- package/docs/design/coordinator-in-process-await-design-review.md +93 -0
- package/docs/design/coordinator-io-error-handling-candidates.md +199 -0
- package/docs/design/coordinator-io-error-handling-design-review.md +120 -0
- package/docs/design/dispatch-dedup-prealloc-bypass-candidates.md +187 -0
- package/docs/design/dispatch-dedup-prealloc-bypass-design-review.md +100 -0
- package/docs/design/dispatch-dedup-prealloc-bypass-implementation-plan.md +218 -0
- package/docs/ideas/backlog.md +52 -0
- package/package.json +1 -1
|
@@ -0,0 +1,93 @@
|
|
|
1
|
+
# Design Review: In-Process awaitSessions and getAgentResult
|
|
2
|
+
|
|
3
|
+
**Date:** 2026-04-19
|
|
4
|
+
**Candidate reviewed:** Candidate A from `coordinator-in-process-await-candidates.md`
|
|
5
|
+
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Tradeoff Review
|
|
9
|
+
|
|
10
|
+
**Tradeoff: `port` field made optional in `CoordinatorDeps`**
|
|
11
|
+
- Verified: `deps.port` is unused by all coordinator logic (grep returns zero results)
|
|
12
|
+
- Condition that invalidates: TypeScript errors showing `port` required elsewhere
|
|
13
|
+
- Mitigation: Run `npm run build` immediately; pivot to `port: 0` sentinel if errors appear
|
|
14
|
+
- **Verdict: Acceptable**
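The difference between an optional `port` and a `port: 0` sentinel can be illustrated with a small sketch. The interface below is a trimmed-down, hypothetical stand-in for `CoordinatorDeps` (the real interface has more fields), and `describeDeps` is an illustrative helper, not part of the package.

```typescript
// Hypothetical, trimmed-down shape; the real CoordinatorDeps has more fields.
interface CoordinatorDepsSketch {
  // Optional: in-process callers have no HTTP port to report.
  port?: number;
  stderr: (line: string) => void;
}

// Illustrative consumer: absence is handled explicitly, so no magic value
// such as 0 has to be interpreted as "there is no port".
function describeDeps(deps: CoordinatorDepsSketch): string {
  return deps.port === undefined
    ? "in-process (no port)"
    : `http port ${deps.port}`;
}

const inProcessDeps: CoordinatorDepsSketch = { stderr: () => {} };
const httpDeps: CoordinatorDepsSketch = { port: 3456, stderr: () => {} };
```

With the optional field, a caller that forgets to handle the missing port gets a type error wherever a `number` is required, which a sentinel value would not provide.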
|
|
15
|
+
|
|
16
|
+
**Tradeoff: `null consoleService` fallback for missing `ctx.v2` ports**
|
|
17
|
+
- Verified: `createToolContext()` always provides non-null values in production
|
|
18
|
+
- Condition that invalidates: Not applicable in production path
|
|
19
|
+
- Mitigation: Warn on stderr and degrade gracefully to the same all-failed behavior as the current HTTP failure path
|
|
20
|
+
- **Verdict: Acceptable**
|
|
21
|
+
|
|
22
|
+
**Tradeoff: Pending-Set polling over check-all-handles-every-poll**
|
|
23
|
+
- Verified: Terminal state transitions are monotonic (event log is append-only)
|
|
24
|
+
- Condition that invalidates: Not possible given ConsoleRunStatus projection semantics
|
|
25
|
+
- **Verdict: Correct and more efficient**
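A minimal sketch of pending-Set polling, under stated assumptions: the status values, the `StatusLookup` callback, the poll budget, and the choice to report timed-out handles as `failed` are all hypothetical and stand in for the real `ConsoleService` calls. A real loop would also sleep between polls.

```typescript
type RunStatus = "running" | "complete" | "failed";
// Hypothetical lookup; "SESSION_LOAD_FAILED" models a session not yet visible.
type StatusLookup = (handle: string) => Promise<RunStatus | "SESSION_LOAD_FAILED">;

async function awaitSessionsSketch(
  handles: string[],
  getStatus: StatusLookup,
  maxPolls: number,
): Promise<Map<string, RunStatus>> {
  const results = new Map<string, RunStatus>();
  const pending = new Set<string>(handles);
  for (let poll = 0; poll < maxPolls && pending.size > 0; poll++) {
    // Snapshot so handles can be removed from `pending` while iterating.
    const snapshot = Array.from(pending);
    for (const handle of snapshot) {
      const status = await getStatus(handle);
      // Not yet visible or still running: keep it pending and retry next poll.
      if (status === "SESSION_LOAD_FAILED" || status === "running") continue;
      // Terminal states are monotonic, so this handle is never polled again.
      results.set(handle, status);
      pending.delete(handle);
    }
  }
  // Assumption: handles still pending after the poll budget count as failed.
  pending.forEach((handle) => results.set(handle, "failed"));
  return results;
}
```

Because a terminal handle leaves the Set as soon as it is observed, the per-poll cost shrinks as sessions finish, which is the efficiency gain the verdict refers to.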
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Failure Mode Review
|
|
30
|
+
|
|
31
|
+
| Failure Mode | Handled? | Risk |
|
|
32
|
+
|---|---|---|
|
|
33
|
+
| `ctx.v2.dataDir`/`directoryListing` null | Yes -- null guard + graceful fallback | Low |
|
|
34
|
+
| Session not yet visible after spawnSession | Yes -- SESSION_LOAD_FAILED treated as retry | Low |
|
|
35
|
+
| TypeScript error from port optional | Partially -- pivot to sentinel if needed | Low |
|
|
36
|
+
| ConsoleService circular dependency | Yes -- dynamic import pattern | Low |
|
|
37
|
+
| getSessionDetail/getNodeDetail unexpected throw | Yes -- ResultAsync + outer try/catch | Low |
|
|
38
|
+
|
|
39
|
+
**Highest-risk**: TypeScript port optional change causing build failure. Pivot defined and trivial.
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
## Runner-Up / Simpler Alternative Review
|
|
44
|
+
|
|
45
|
+
**Candidate B** (port: 0 sentinel): Nothing worth borrowing. The only advantage was zero interface changes, but it introduces a dead required field with a misleading sentinel value.
|
|
46
|
+
|
|
47
|
+
**Simpler variant** (keep `port: DAEMON_CONSOLE_PORT`): Insufficient -- leaves dead code after removing the HTTP deps that needed it. Design doc explicitly lists port removal as required.
|
|
48
|
+
|
|
49
|
+
**No hybrid needed.**
|
|
50
|
+
|
|
51
|
+
---
|
|
52
|
+
|
|
53
|
+
## Philosophy Alignment
|
|
54
|
+
|
|
55
|
+
| Principle | Status |
|
|
56
|
+
|---|---|
|
|
57
|
+
| Architectural fixes over patches | SATISFIED -- root-cause fix, not workaround |
|
|
58
|
+
| Make illegal states unrepresentable | SATISFIED -- no sentinel, optional instead |
|
|
59
|
+
| Dependency injection for boundaries | SATISFIED -- ConsoleService gets ports injected |
|
|
60
|
+
| Immutability by default | SATISFIED -- minimal mutable state (pending Set only) |
|
|
61
|
+
| Errors are data | SATISFIED -- ResultAsync.isOk()/isErr() throughout |
|
|
62
|
+
| YAGNI with discipline | SATISFIED -- no speculative abstractions |
|
|
63
|
+
| Document "why", not "what" | REQUIRED -- add WHY comments in implementation |
|
|
64
|
+
|
|
65
|
+
One acceptable tension: the null `consoleService` case uses a nullable variable rather than a Result type. This is acceptable at the initialization boundary (it is not domain logic).
|
|
66
|
+
|
|
67
|
+
---
|
|
68
|
+
|
|
69
|
+
## Findings
|
|
70
|
+
|
|
71
|
+
**No Red (blocking) findings.**
|
|
72
|
+
|
|
73
|
+
**Orange (should address before shipping):**
|
|
74
|
+
- None identified.
|
|
75
|
+
|
|
76
|
+
**Yellow (advisory):**
|
|
77
|
+
- Y1: The `ctx.v2` null guard creates a nullable `consoleService` variable. If `ctx.v2` is ever null in production, the stderr warning may be missed. Recommend making the warning prominent: `[CRITICAL trigger-listener:reason=consoleService_unavailable]` prefix.
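The null guard and the prominent warning prefix could look roughly like the sketch below. The `V2Ports`, `ToolContextSketch`, and `ConsoleServiceSketch` shapes are hypothetical placeholders, and the message text after the prefix is invented for illustration; only the prefix itself comes from the finding above.

```typescript
// Hypothetical shapes; the real ctx.v2 ports and ConsoleService differ.
interface V2Ports {
  dataDir: string;
  directoryListing: (dir: string) => string[];
}
interface ToolContextSketch {
  v2: V2Ports | null;
}
class ConsoleServiceSketch {
  ports: V2Ports;
  constructor(ports: V2Ports) {
    this.ports = ports;
  }
}

function buildConsoleService(
  ctx: ToolContextSketch,
  stderr: (line: string) => void,
): ConsoleServiceSketch | null {
  if (ctx.v2 === null) {
    // WHY: a missing port set must be loud in logs but must not crash the listener.
    stderr(
      "[CRITICAL trigger-listener:reason=consoleService_unavailable] " +
        "ctx.v2 is null; session results will be reported as failed",
    );
    return null;
  }
  return new ConsoleServiceSketch(ctx.v2);
}
```

Callers then branch on the `null` return once, at initialization, rather than threading a Result type through domain logic.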
|
|
78
|
+
|
|
79
|
+
---
|
|
80
|
+
|
|
81
|
+
## Recommended Revisions
|
|
82
|
+
|
|
83
|
+
1. Add `[CRITICAL]` prefix to the `ctx.v2` null guard warning to make it visible in logs.
|
|
84
|
+
2. Add WHY comment on new `awaitSessions` explaining the in-process approach (mirrors the spawnSession WHY comment pattern).
|
|
85
|
+
3. Add WHY comment on `ConsoleService` construction explaining it avoids the HTTP race condition.
|
|
86
|
+
|
|
87
|
+
---
|
|
88
|
+
|
|
89
|
+
## Residual Concerns
|
|
90
|
+
|
|
91
|
+
- **None blocking.** The design is sound, the tradeoffs are acceptable, and the failure modes are covered.
|
|
92
|
+
- Build verification (`npm run build`) will immediately catch any TypeScript issues from the `port` optional change.
|
|
93
|
+
- The implementation is a near-direct transcription of the design doc pseudocode, reducing creative risk.
|
|
@@ -0,0 +1,199 @@
|
|
|
1
|
+
# Coordinator I/O Error Handling -- Design Candidates
|
|
2
|
+
|
|
3
|
+
Generated: 2026-04-19
|
|
4
|
+
|
|
5
|
+
## Problem Understanding
|
|
6
|
+
|
|
7
|
+
### Core Tensions
|
|
8
|
+
|
|
9
|
+
1. **Crash-safety vs. DI purity**: The coordinator declares "all phase failures produce
|
|
10
|
+
`PipelineOutcome { kind: 'escalated' }` -- never thrown" as a design invariant, but three
|
|
11
|
+
injected dep functions (`getAgentResult`, `postToOutbox`, `pollForPR`) are called without
|
|
12
|
+
try/catch in the mode files. Any throw from these functions crashes the coordinator silently
|
|
13
|
+
instead of returning a structured `PipelineOutcome`. The fix must enforce the invariant at
|
|
14
|
+
the right boundary.
|
|
15
|
+
|
|
16
|
+
2. **Verbosity vs. DRY**: `postToOutbox` is called at 8+ critical escalation points across
|
|
17
|
+
`implement-shared.ts` and `full-pipeline.ts`. Each call site needs individual protection.
|
|
18
|
+
Inline try/catch at 8 sites is repetitive; a helper would reduce duplication but adds
|
|
19
|
+
abstraction not in the existing codebase pattern.
|
|
20
|
+
|
|
21
|
+
3. **`process.stderr.write()` vs. `deps.stderr()`**: The prescribed pattern uses
|
|
22
|
+
`process.stderr.write()` in catch blocks, but the rest of the coordinator uses the injected
|
|
23
|
+
`deps.stderr()`. The tension is minor -- catch blocks represent unexpected I/O failures,
|
|
24
|
+
so using `process.stderr.write()` signals this is an emergency log path, not a normal
|
|
25
|
+
operational log.
|
|
26
|
+
|
|
27
|
+
### Likely Seam
|
|
28
|
+
|
|
29
|
+
The mode files are the correct seam. `implement-shared.ts`, `full-pipeline.ts`, and
|
|
30
|
+
`implement.ts` are the callers of the three unsafe deps. The coordinator owns the
|
|
31
|
+
escalation-first invariant -- not the injectors (`trigger-listener.ts`, `cli-worktrain.ts`).
|
|
32
|
+
|
|
33
|
+
### What Makes This Hard
|
|
34
|
+
|
|
35
|
+
- `postToOutbox` calls are immediately followed by `return { kind: 'escalated', ... }`. The
|
|
36
|
+
try/catch must wrap ONLY the `postToOutbox` call, not the return statement. Careful
|
|
37
|
+
placement required.
|
|
38
|
+
- `pollForPR` is called in BOTH `implement.ts` (explicitly mentioned in task) AND
|
|
39
|
+
`full-pipeline.ts` line 454 (not mentioned but equally unsafe). Both must be wrapped.
|
|
40
|
+
- UX gate zombie detection: `implement.ts` line 144 assigns `uxHandle` from `uxSpawnResult.value`
|
|
41
|
+
without a null/empty-string guard before passing to `awaitSessions`. This is the only session
|
|
42
|
+
handle in the coordinator without the guard -- a separate gap alongside the I/O error handling.
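The placement constraint from the first bullet can be sketched as follows. The `PipelineOutcomeSketch` and `DepsSketch` types, the `postToOutbox` parameter list, and the phase name are assumptions made for illustration; logging goes through an injected `stderr` function here only to keep the sketch self-contained.

```typescript
type PipelineOutcomeSketch =
  | { kind: "completed" }
  | { kind: "escalated"; escalationReason: { phase: string; reason: string } };

// Assumed dep shape; the real AdaptiveCoordinatorDeps is larger.
interface DepsSketch {
  postToOutbox: (message: string, meta: Record<string, string>) => Promise<void>;
  stderr: (line: string) => void;
}

async function runReviewPhaseSketch(deps: DepsSketch): Promise<PipelineOutcomeSketch> {
  const reason = "review verdict missing"; // hypothetical escalation reason
  try {
    // Only the I/O call sits inside the try block.
    await deps.postToOutbox(`Escalation: ${reason}`, { phase: "review" });
  } catch (e) {
    // A failed outbox write is logged and ignored; it must not replace the outcome.
    const msg = e instanceof Error ? e.message : String(e);
    deps.stderr(`[WARN coordinator] postToOutbox failed: ${msg}`);
  }
  // Outside the try block: returned whether or not the post succeeded.
  return { kind: "escalated", escalationReason: { phase: "review", reason } };
}
```

Keeping the `return` outside the `try` means a throw from `postToOutbox` can never swallow or alter the escalated outcome.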
|
|
43
|
+
|
|
44
|
+
---
|
|
45
|
+
|
|
46
|
+
## Philosophy Constraints
|
|
47
|
+
|
|
48
|
+
**From `CLAUDE.md`:**
|
|
49
|
+
- "Errors are data -- represent failure as values (Result/Either), not exceptions as control flow"
|
|
50
|
+
- "Type safety as the first line of defense"
|
|
51
|
+
|
|
52
|
+
**From `adaptive-pipeline.ts` header (design invariant):**
|
|
53
|
+
- "All phase failures produce PipelineOutcome { kind: 'escalated' } -- never thrown."
|
|
54
|
+
- "All I/O is injected via AdaptiveCoordinatorDeps. Zero direct fs/fetch/exec imports."
|
|
55
|
+
|
|
56
|
+
**Repo precedent:**
|
|
57
|
+
- `archiveFile` (in `implement.ts` and `full-pipeline.ts`): try/catch inline in finally block, log-and-continue. This is the exact model for `postToOutbox`.
|
|
58
|
+
- `writeFile` routing log (in `adaptive-pipeline.ts`): try/catch inline, log-and-continue.
|
|
59
|
+
|
|
60
|
+
**Conflicts:** None material. The stated philosophy (errors as data) and the repo pattern (inline try/catch for non-Result deps) are consistent.
|
|
61
|
+
|
|
62
|
+
---
|
|
63
|
+
|
|
64
|
+
## Impact Surface
|
|
65
|
+
|
|
66
|
+
- `runReviewAndVerdictCycle` is called from both `implement.ts` and `full-pipeline.ts`. Fixing
|
|
67
|
+
`getAgentResult` in `implement-shared.ts` protects both callers automatically.
|
|
68
|
+
- `runAuditChain` (also in `implement-shared.ts`) calls both `getAgentResult` and `postToOutbox`
|
|
69
|
+
at multiple points.
|
|
70
|
+
- `adaptive-pipeline.ts` line 362 calls `postToOutbox` in the `ESCALATE` routing case -- this is
|
|
71
|
+
OUT OF SCOPE for this task (task restricts changes to the 3 mode files).
|
|
72
|
+
- No callers outside these files change signature or return type.
|
|
73
|
+
|
|
74
|
+
---
|
|
75
|
+
|
|
76
|
+
## Candidates
|
|
77
|
+
|
|
78
|
+
### Candidate 1: Inline try/catch at each call site (prescribed pattern)
|
|
79
|
+
|
|
80
|
+
**Summary:** Wrap each `getAgentResult`, `postToOutbox`, and `pollForPR` call individually in
|
|
81
|
+
a try/catch block in the 3 mode files.
|
|
82
|
+
|
|
83
|
+
**Tensions resolved:** Crash-safety fully addressed. Accepts: slight verbosity from 8+
|
|
84
|
+
`postToOutbox` sites.
|
|
85
|
+
|
|
86
|
+
**Boundary:** At the mode file call sites -- the correct boundary. The coordinator owns the
|
|
87
|
+
escalation-first invariant; the mode files are where the invariant must be enforced.
|
|
88
|
+
|
|
89
|
+
**Failure mode:** Missing the `pollForPR` call in `full-pipeline.ts` (not explicitly called out
|
|
90
|
+
in task description but confirmed unsafe by code analysis). Must be systematic.
|
|
91
|
+
|
|
92
|
+
**Repo-pattern relationship:** Follows `archiveFile` try/catch pattern exactly. Adapts
|
|
93
|
+
`writeFile` routing-log pattern from `adaptive-pipeline.ts`.
|
|
94
|
+
|
|
95
|
+
**Gains:** Zero risk to happy path. Locally visible -- reviewer can see exactly what is
|
|
96
|
+
protected at each call site. No new abstractions.
|
|
97
|
+
|
|
98
|
+
**Losses:** Mildly repetitive for `postToOutbox` sites. Functions grow slightly.
|
|
99
|
+
|
|
100
|
+
**Scope:** Best-fit. The 3 files are exactly the seam.
|
|
101
|
+
|
|
102
|
+
**Philosophy fit:** Honors "errors are data", "escalation-first invariant", "DI for boundaries".
|
|
103
|
+
Minor: uses `process.stderr.write()` in catch blocks rather than `deps.stderr()`, consistent
|
|
104
|
+
with prescribed pattern and emergency-log semantics.
|
|
105
|
+
|
|
106
|
+
---
|
|
107
|
+
|
|
108
|
+
### Candidate 2: Wrap at injection site (safe wrapper functions)
|
|
109
|
+
|
|
110
|
+
**Summary:** Wrap `getAgentResult`, `postToOutbox`, `pollForPR` in safe adapter functions at
|
|
111
|
+
the injection sites (`trigger-listener.ts`, `cli-worktrain.ts`) so the deps never throw from
|
|
112
|
+
the coordinator's perspective.
|
|
113
|
+
|
|
114
|
+
**Tensions resolved:** DI purity -- the mode files stay clean. Accepts: changes to 2 files
|
|
115
|
+
outside the permitted scope.
|
|
116
|
+
|
|
117
|
+
**Boundary:** At the injection layer. Wrong boundary for this task -- the coordinator owns the
|
|
118
|
+
escalation invariant, not the injectors. Injectors wire up the real implementation; they are
|
|
119
|
+
not responsible for the coordinator's recovery behavior.
|
|
120
|
+
|
|
121
|
+
**Failure mode:** Wrapping at injection site catches throws but cannot return `PipelineOutcome`
|
|
122
|
+
-- would need to return null or a sentinel, which the mode files then check. Adds complexity
|
|
123
|
+
at both ends, solving neither fully.
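The sentinel problem described in this failure mode can be made concrete with a short sketch. `makeNonThrowing` is a hypothetical adapter, and the `PollForPR` signature is inferred from the call shown elsewhere in these docs rather than taken from the source.

```typescript
// Assumed signature for the injected dep.
type PollForPR = (branchPattern: string, timeoutMs: number) => Promise<string | null>;

// Hypothetical injection-site adapter: swallows throws and returns null.
function makeNonThrowing(
  pollForPR: PollForPR,
  stderr: (line: string) => void,
): PollForPR {
  return async (branchPattern, timeoutMs) => {
    try {
      return await pollForPR(branchPattern, timeoutMs);
    } catch (e) {
      stderr(`pollForPR threw: ${e instanceof Error ? e.message : String(e)}`);
      // Same value as "no PR found before the timeout", so the caller
      // cannot tell an I/O failure from a legitimate empty result.
      return null;
    }
  };
}
```

The mode file would still need its own check to turn `null` into an escalation, and it would report the wrong reason for the I/O failure case.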
|
|
124
|
+
|
|
125
|
+
**Repo-pattern relationship:** Departs -- existing injected deps use Result types for
|
|
126
|
+
error-returning deps; plain-promise deps are not wrapped at injection sites.
|
|
127
|
+
|
|
128
|
+
**Scope:** Too broad -- reaches outside the permitted 3 files.
|
|
129
|
+
|
|
130
|
+
**Verdict: Rejected.** Out of scope and wrong seam.
|
|
131
|
+
|
|
132
|
+
---
|
|
133
|
+
|
|
134
|
+
### Candidate 3: Private helper `safePostToOutbox(deps, msg, meta)`
|
|
135
|
+
|
|
136
|
+
**Summary:** Extract a private helper that wraps `deps.postToOutbox` in try/catch, reducing
|
|
137
|
+
repetition at the 8+ `postToOutbox` call sites.
|
|
138
|
+
|
|
139
|
+
**Tensions resolved:** DRY for `postToOutbox`. Accepts: new abstraction not prescribed by task.
|
|
140
|
+
|
|
141
|
+
**Boundary:** Same 3 mode files, plus a local helper function in `implement-shared.ts`.
|
|
142
|
+
|
|
143
|
+
**Failure mode:** Helper abstraction obscures the try/catch from reviewers; may hide future
|
|
144
|
+
misuse (e.g., someone using the helper for a call that SHOULD escalate on failure).
|
|
145
|
+
|
|
146
|
+
**Repo-pattern relationship:** No precedent for dep-wrapper helpers in the mode files. `archiveFile`
|
|
147
|
+
try/catch is inline without a helper.
|
|
148
|
+
|
|
149
|
+
**Scope:** Best-fit only if `postToOutbox` had 15+ sites. At 8, YAGNI says no.
|
|
150
|
+
|
|
151
|
+
**Verdict: Skipped.** Task spec gives explicit inline pattern. YAGNI applies.
|
|
152
|
+
|
|
153
|
+
---
|
|
154
|
+
|
|
155
|
+
## Comparison and Recommendation
|
|
156
|
+
|
|
157
|
+
**Candidate 1 is the clear choice.**
|
|
158
|
+
|
|
159
|
+
All three candidates converge on the same underlying mechanism (try/catch). The only real
|
|
160
|
+
alternatives differ in location (injection site -- wrong boundary) or DRY abstraction (helper --
|
|
161
|
+
not warranted at 8 sites). Convergence is honest here.
|
|
162
|
+
|
|
163
|
+
Candidate 1:
|
|
164
|
+
- Follows the prescribed pattern from the task description exactly
|
|
165
|
+
- Follows the repo precedent (`archiveFile`, `writeFile` try/catch)
|
|
166
|
+
- Is locally visible and reviewable
|
|
167
|
+
- Carries zero happy-path risk
|
|
168
|
+
- Can be applied systematically to all confirmed call sites
|
|
169
|
+
|
|
170
|
+
---
|
|
171
|
+
|
|
172
|
+
## Self-Critique
|
|
173
|
+
|
|
174
|
+
**Strongest argument against:** The 8+ `postToOutbox` call sites produce repetitive code. If
|
|
175
|
+
the count grew to 20+, a helper would be clearly warranted. At 8, the verbosity is manageable.
|
|
176
|
+
|
|
177
|
+
**Narrower option that might work:** Only fix `getAgentResult` and `pollForPR` (HIGH severity),
|
|
178
|
+
skip `postToOutbox` wrapping (MEDIUM severity). Would reduce scope. Loses: `postToOutbox` crash
|
|
179
|
+
at escalation decision points is still a real failure mode that kills the pipeline silently.
|
|
180
|
+
|
|
181
|
+
**Broader option:** Candidate 2 (wrap at injection). Would be justified only if there were a
|
|
182
|
+
precedent of wrapping injected deps at the injection layer. No such precedent exists.
|
|
183
|
+
|
|
184
|
+
**Invalidating assumption:** If `postToOutbox` is guaranteed never to throw in production (e.g.,
|
|
185
|
+
the real impl is in-memory rather than disk-based). The audit doc confirms it uses
|
|
186
|
+
`fs.promises.appendFile`, which can fail on disk full or permission error, so `postToOutbox` can throw and the fix is still needed.
|
|
187
|
+
|
|
188
|
+
---
|
|
189
|
+
|
|
190
|
+
## Open Questions for Main Agent
|
|
191
|
+
|
|
192
|
+
None blocking. The problem, solution, and scope are fully specified, and implementation is mechanical. Points to carry into implementation:
|
|
193
|
+
|
|
194
|
+
- Confirm `pollForPR` in `full-pipeline.ts` line 454 also needs wrapping (not explicitly in
|
|
195
|
+
task description but confirmed unsafe -- include it).
|
|
196
|
+
- For `postToOutbox`: the task says "log a warning and continue". Use `process.stderr.write()`
|
|
197
|
+
as prescribed, not `deps.stderr()`.
|
|
198
|
+
- UX gate zombie detection in `implement.ts`: add `if (!uxHandle || uxHandle.trim() === '')` guard
|
|
199
|
+
after line 144, consistent with all 9 other session handle checks in the coordinator.
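The handle guard from the last bullet might look like the sketch below. The `GateOutcome` type, the `ux-gate` phase name, and the reason text are hypothetical; only the guard condition is taken from the text above.

```typescript
type GateOutcome =
  | { kind: "proceed"; handle: string }
  | { kind: "escalated"; escalationReason: { phase: string; reason: string } };

function guardUxHandle(uxHandle: string | null | undefined): GateOutcome {
  // An empty handle would make the coordinator wait on a session that never existed.
  if (!uxHandle || uxHandle.trim() === "") {
    return {
      kind: "escalated",
      escalationReason: { phase: "ux-gate", reason: "UX gate session handle is empty" },
    };
  }
  return { kind: "proceed", handle: uxHandle };
}
```

The `trim()` check matters because `!uxHandle` alone lets a whitespace-only handle through.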
|
|
@@ -0,0 +1,120 @@
|
|
|
1
|
+
# Coordinator I/O Error Handling -- Design Review Findings
|
|
2
|
+
|
|
3
|
+
Generated: 2026-04-19
|
|
4
|
+
|
|
5
|
+
## Tradeoff Review
|
|
6
|
+
|
|
7
|
+
### Verbosity (8+ `postToOutbox` inline try/catch sites)
|
|
8
|
+
|
|
9
|
+
- **Verdict:** Acceptable. At 8 sites, YAGNI wins. The `archiveFile` precedent in the same files
|
|
10
|
+
shows inline try/catch is the established pattern.
|
|
11
|
+
- **Break condition:** If `postToOutbox` call sites grow to 15+, extract a private helper.
|
|
12
|
+
- **Hidden assumption:** `postToOutbox` call count stays roughly constant in the near term.
|
|
13
|
+
|
|
14
|
+
### `deps.stderr()` vs. `process.stderr.write()` in catch blocks
|
|
15
|
+
|
|
16
|
+
- **Verdict:** Use `deps.stderr()` to match the existing `archiveFile` pattern in `implement.ts`
|
|
17
|
+
and `full-pipeline.ts`. The task spec example uses `process.stderr.write()` but the actual repo
|
|
18
|
+
uses `deps.stderr()` for the identical use case. `deps.stderr()` is more consistent and testable.
|
|
19
|
+
- **Break condition:** None. `deps.stderr()` is strictly better here.
|
|
20
|
+
|
|
21
|
+
### `pollForPR` in `full-pipeline.ts` not in task description but included
|
|
22
|
+
|
|
23
|
+
- **Verdict:** Include it. It's the same unsafe call pattern, same dep, same risk. Excluding it
|
|
24
|
+
would leave the fix incomplete.
|
|
25
|
+
- **Hidden assumption:** Both `pollForPR` call sites use the same real implementation -- confirmed.
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Failure Mode Review
|
|
30
|
+
|
|
31
|
+
| Mode | Handled? | Risk | Notes |
|
|
32
|
+
|------|----------|------|-------|
|
|
33
|
+
| Missing a call site | Mitigated | Medium | Grep check after implementation |
|
|
34
|
+
| `postToOutbox` throw during escalation sequence | Yes | Low | `return` is on next line after try/catch |
|
|
35
|
+
| `getAgentResult` throw with non-Error | Yes | Low | `e instanceof Error ? e.message : String(e)` |
|
|
36
|
+
| `pollForPR` throw leaving `prUrl` uninitialized | Yes (with care) | Medium | Must use `let prUrl; try {...} catch -> return escalated` |
|
|
37
|
+
| UX gate empty `uxHandle` zombie | Yes (after fix) | Low | Same 4-line guard as 9 other handles |
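The non-Error row in the table can be illustrated with a sketch of a guarded `getAgentResult` call. The return type of `getAgentResult`, the `StepResult` union, and the `review` phase name are assumptions; the message-extraction expression is the one quoted in the table.

```typescript
type StepResult =
  | { kind: "ok"; notes: string }
  | { kind: "escalated"; escalationReason: { phase: string; reason: string } };

// Assumed dep shape: the real getAgentResult may return a richer value.
interface AgentDeps {
  getAgentResult: (handle: string) => Promise<string>;
  stderr: (line: string) => void;
}

async function fetchAgentNotes(deps: AgentDeps, handle: string): Promise<StepResult> {
  let notes: string;
  try {
    notes = await deps.getAgentResult(handle);
  } catch (e) {
    // Thrown values are not always Error instances (strings, plain objects).
    const msg = e instanceof Error ? e.message : String(e);
    deps.stderr(`[WARN coordinator] getAgentResult threw: ${msg}`);
    return {
      kind: "escalated",
      escalationReason: { phase: "review", reason: `getAgentResult threw: ${msg}` },
    };
  }
  return { kind: "ok", notes };
}
```

Using `String(e)` as the fallback keeps the escalation reason readable even when a dependency throws a bare string.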
|
|
38
|
+
|
|
39
|
+
**Highest-risk failure mode:** `pollForPR` catch block structure. If written incorrectly (catch
|
|
40
|
+
logs but falls through), `prUrl` would be undefined and the subsequent `if (!prUrl)` check would
|
|
41
|
+
catch it -- but the `prUrl` variable would need to be declared with `let` outside the try block.
|
|
42
|
+
The fix requires care in the variable declaration pattern.
|
|
43
|
+
|
|
44
|
+
---
|
|
45
|
+
|
|
46
|
+
## Runner-Up / Simpler Alternative Review
|
|
47
|
+
|
|
48
|
+
**Runner-up:** Private `safePostToOutbox` helper.
|
|
49
|
+
- **Strength worth borrowing:** Standardized log message format across all `postToOutbox` sites.
|
|
50
|
+
- **Adopted:** Standardize the log message format inline (consistent `[WARN coordinator] postToOutbox failed: ...` prefix across all sites).
|
|
51
|
+
- **Rejected:** Full helper extraction. YAGNI at 8 sites. No precedent in repo.
|
|
52
|
+
|
|
53
|
+
**Simpler alternative:** Skip `postToOutbox` wrapping (only fix `getAgentResult` and `pollForPR`).
|
|
54
|
+
- **Rejected:** `postToOutbox` crashes at critical escalation points. Medium severity is still a
|
|
55
|
+
production crash path that must be fixed.
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
## Philosophy Alignment
|
|
60
|
+
|
|
61
|
+
| Principle | Status |
|
|
62
|
+
|-----------|--------|
|
|
63
|
+
| Errors are data | Fully satisfied -- throws become `PipelineOutcome` values |
|
|
64
|
+
| Escalation-first invariant | Enforced -- no throw-exit paths remain after fix |
|
|
65
|
+
| Make illegal states unrepresentable | Satisfied -- coordinator now always returns a value |
|
|
66
|
+
| DI for boundaries | Satisfied -- no new imports, changes are in mode files only |
|
|
67
|
+
| Compose with small functions | Under acceptable tension -- functions grow slightly |
|
|
68
|
+
| Document why not what | Needs 1-line comment per postToOutbox catch explaining non-fatal rationale |
|
|
69
|
+
| YAGNI with discipline | Satisfied -- no speculative helper |
|
|
70
|
+
|
|
71
|
+
---
|
|
72
|
+
|
|
73
|
+
## Findings
|
|
74
|
+
|
|
75
|
+
### Yellow: `pollForPR` variable declaration pattern
|
|
76
|
+
|
|
77
|
+
The `let prUrl` declaration must be placed BEFORE the try/catch block (not inside it) so that
|
|
78
|
+
the catch block can `return` an escalated outcome and the variable remains in scope after. If
|
|
79
|
+
the variable is declared inside `try`, it is block-scoped, so the later `if (!prUrl)` check will not compile. This is a known TypeScript
|
|
80
|
+
pattern but worth flagging explicitly.
|
|
81
|
+
|
|
82
|
+
**Fix:** Use the explicit two-step pattern from the task spec:
|
|
83
|
+
```typescript
|
|
84
|
+
let prUrl: string | null;
|
|
85
|
+
try {
|
|
86
|
+
  prUrl = await deps.pollForPR(branchPattern, PR_POLL_TIMEOUT_MS);
|
|
87
|
+
} catch (e) {
|
|
88
|
+
  const msg = e instanceof Error ? e.message : String(e);
|
|
89
|
+
  deps.stderr(`[WARN coordinator] pollForPR threw: ${msg}`);
|
|
90
|
+
  return { kind: 'escalated', escalationReason: { phase: 'pr-detection', reason: `pollForPR threw: ${msg}` } };
|
|
91
|
+
}
|
|
92
|
+
if (!prUrl) { ... }
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
### Yellow: `deps.stderr()` vs. `process.stderr.write()`
|
|
96
|
+
|
|
97
|
+
Use `deps.stderr()` in catch blocks. The task spec example uses `process.stderr.write()` but the
|
|
98
|
+
repo's `archiveFile` catch blocks use `deps.stderr()`. Consistency with repo pattern wins.
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
## Recommended Revisions
|
|
103
|
+
|
|
104
|
+
1. Use `deps.stderr()` (not `process.stderr.write()`) in all catch blocks.
|
|
105
|
+
2. Use `let prUrl: string | null` declared before the try block for `pollForPR` calls.
|
|
106
|
+
3. Add a one-line comment in each `postToOutbox` catch explaining non-fatal policy:
|
|
107
|
+
`// postToOutbox write failure is non-fatal -- escalation still returns below`
|
|
108
|
+
4. Include `pollForPR` in `full-pipeline.ts` even though task description only names `implement.ts`.
|
|
109
|
+
5. Include UX gate zombie detection fix in `implement.ts` line 144.
|
|
110
|
+
|
|
111
|
+
---
|
|
112
|
+
|
|
113
|
+
## Residual Concerns
|
|
114
|
+
|
|
115
|
+
- **No tests for throw injection:** This PR fixes the runtime behavior but adds no tests for
|
|
116
|
+
the throw paths. Tests are a planned follow-up (per the audit doc). The absence of tests means
|
|
117
|
+
a regression in this fix would not be caught by CI. Low concern for this PR -- the fix is
|
|
118
|
+
mechanical and the pattern is simple.
|
|
119
|
+
- **`adaptive-pipeline.ts` line 362 `postToOutbox` is unguarded** but is explicitly out of scope
|
|
120
|
+
for this task. Should be addressed in a follow-up.
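The missing throw-path coverage could start from a sketch like the one below. It is framework-free on purpose: `detectPr` is a stand-in for the coordinator code under test, and the `StepOutcome` type, the `pr-found` kind, the branch pattern, and the error text are all hypothetical.

```typescript
type StepOutcome =
  | { kind: "pr-found"; prUrl: string }
  | { kind: "escalated"; escalationReason: { phase: string; reason: string } };

interface PrDeps {
  pollForPR: (branchPattern: string, timeoutMs: number) => Promise<string | null>;
  stderr: (line: string) => void;
}

// Stand-in for the coordinator code under test.
async function detectPr(deps: PrDeps): Promise<StepOutcome> {
  let prUrl: string | null;
  try {
    prUrl = await deps.pollForPR("worktrain/*", 1000);
  } catch (e) {
    const msg = e instanceof Error ? e.message : String(e);
    deps.stderr(`[WARN coordinator] pollForPR threw: ${msg}`);
    return { kind: "escalated", escalationReason: { phase: "pr-detection", reason: `pollForPR threw: ${msg}` } };
  }
  if (!prUrl) {
    return { kind: "escalated", escalationReason: { phase: "pr-detection", reason: "no PR found" } };
  }
  return { kind: "pr-found", prUrl };
}

// Throw-injection check: a rejecting dep must yield an outcome, never a rejection.
async function pollForPrThrowIsEscalated(): Promise<boolean> {
  const logged: string[] = [];
  const deps: PrDeps = {
    pollForPR: async () => { throw new Error("gh CLI not found"); },
    stderr: (line) => { logged.push(line); },
  };
  const outcome = await detectPr(deps);
  return outcome.kind === "escalated"
    && outcome.escalationReason.reason === "pollForPR threw: gh CLI not found"
    && logged.length === 1;
}
```

The same fake-dep approach extends to `getAgentResult` and `postToOutbox`: swap in a rejecting implementation and check that the function resolves with an `escalated` value.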
|
|
@@ -0,0 +1,187 @@
|
|
|
1
|
+
# Design Candidates: Bypass Dispatch Dedup for Pre-Allocated Sessions
|
|
2
|
+
|
|
3
|
+
**Date:** 2026-04-19
|
|
4
|
+
**Status:** Decided -- Option A (guard before dedup block)
|
|
5
|
+
**Scope:** `src/trigger/trigger-router.ts`, `dispatch()` method only
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Problem Understanding
|
|
10
|
+
|
|
11
|
+
### Core Tensions
|
|
12
|
+
|
|
13
|
+
1. **Dedup protection vs session liveness** -- The 30s dedup window in `dispatch()` exists to prevent
|
|
14
|
+
duplicate pipeline sessions from webhook retries. But the same mechanism incorrectly kills child
|
|
15
|
+
sessions spawned by `spawnSession()`, which already pre-created the session in the store via
|
|
16
|
+
`executeStartWorkflow()`. The invariant 'same key = duplicate' is false for pre-allocated sessions.
|
|
17
|
+
|
|
18
|
+
2. **Shared state vs path-specific logic** -- `_recentAdaptiveDispatches` is intentionally shared
|
|
19
|
+
across `route()`, `dispatch()`, and `dispatchAdaptivePipeline()`. This cross-path coupling causes
|
|
20
|
+
the key collision: `dispatchAdaptivePipeline()` writes `goal::workspace` at t=0, then `dispatch()`
|
|
21
|
+
reads it at t~=0 and returns early. The fix must carve out an exception for one specific case
|
|
22
|
+
without breaking the shared-state intent.
|
|
23
|
+
|
|
24
|
+
3. **Code reuse vs early-exit clarity** -- The dedup block is a scoped `{...}` block at the top of
|
|
25
|
+
`dispatch()`. Adding the guard before it avoids duplicating the enqueue block, but requires
|
|
26
|
+
careful restructuring to keep one enqueue call reached by both paths.
|
|
27
|
+
|
|
28
|
+
### Likely Seam
|
|
29
|
+
|
|
30
|
+
The real seam is `dispatch()` lines 851-866 (the dedup block). The symptom (zombie session) is
|
|
31
|
+
downstream, but the root cause (early return that bypasses `queue.enqueue()`) is exactly here.
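The key collision can be reproduced in a few lines. This is a simplified model, not the router's code: the function names, the explicit `now` parameter, and the `enqueued` array standing in for `queue.enqueue()` are assumptions; the `goal::workspace` key format and the 30s window come from the text above.

```typescript
const DEDUP_WINDOW_MS = 30000;
// Shared across entry points, as `_recentAdaptiveDispatches` is in the router.
const recentDispatches = new Map<string, number>(); // key -> last dispatch time (ms)
const enqueued: string[] = []; // stand-in for queue.enqueue()

// Parent pipeline entry point: records the key at t=0.
function dispatchAdaptivePipelineSketch(goal: string, workspace: string, now: number): void {
  recentDispatches.set(`${goal}::${workspace}`, now);
}

// Pre-fix behavior: dedup applies to every caller, including pre-allocated sessions.
function dispatchBeforeFix(goal: string, workspace: string, now: number): boolean {
  const key = `${goal}::${workspace}`;
  const last = recentDispatches.get(key);
  if (last !== undefined && now - last < DEDUP_WINDOW_MS) {
    return false; // Early return: the enqueue step is never reached.
  }
  recentDispatches.set(key, now);
  enqueued.push(key);
  return true;
}
```

A child dispatch issued a few milliseconds after the parent shares the parent's key, hits the early return, and is never enqueued, which leaves its pre-created session as a zombie.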
|
|
32
|
+
|
|
33
|
+
### What Makes It Hard
|
|
34
|
+
|
|
35
|
+
A junior developer might:
|
|
36
|
+
- Add `_preAllocatedStartResponse` as a new parameter instead of checking the existing field.
|
|
37
|
+
- Delete the dedup block from `dispatch()` entirely (Option B) -- too broad.
|
|
38
|
+
- Add the guard inside the dedup block as a `return` that skips `_recentAdaptiveDispatches.set()`,
|
|
39
|
+
which is technically acceptable (the map entry from `dispatchAdaptivePipeline()` is still valid
|
|
40
|
+
for blocking duplicate top-level calls) but less clear.
|
|
41
|
+
- Accidentally duplicate the `queue.enqueue()` callback body, creating divergent result-handling.
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
45
|
+
## Philosophy Constraints
|
|
46
|
+
|
|
47
|
+
**Source: `/Users/etienneb/CLAUDE.md`**
|
|
48
|
+
|
|
49
|
+
- **Architectural fixes over patches** -- the guard models the invariant, not a special case
|
|
50
|
+
- **Make illegal states unrepresentable** -- `_preAllocatedStartResponse !== undefined` is a
|
|
51
|
+
discriminator that TypeScript narrows on; the guard makes 'dedup fires for pre-allocated session' impossible
|
|
52
|
+
- **YAGNI with discipline** -- Option A is the minimal fix; no speculative abstractions
|
|
53
|
+
- **Document why, not what** -- the guard comment must explain the invariant, not describe the code
|
|
54
|
+
|
|
55
|
+
No conflicts between stated philosophy and repo patterns detected.
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
## Impact Surface
|
|
60
|
+
|
|
61
|
+
- `spawnSession()` in `src/trigger/trigger-listener.ts` is the only caller that sets
|
|
62
|
+
`_preAllocatedStartResponse`. The fix must not change its call signature.
|
|
63
|
+
- `queue.enqueue()` callback body in `dispatch()` handles the full `WorkflowRunResult` union
|
|
64
|
+
(success, error, timeout, stuck, delivery_failed). Both the guard path and the normal path must
|
|
65
|
+
reach the same callback body -- no duplication.
|
|
66
|
+
- `_recentAdaptiveDispatches` -- for non-prealloc calls, cleanup-on-entry and set must still run.
|
|
67
|
+
- `route()` and `dispatchAdaptivePipeline()` -- no changes to either.
|
|
68
|
+
|
|
69
|
+
---
|
|
70
|
+
|
|
71
|
+
## Candidates
|
|
72
|
+
|
|
73
|
+
### Candidate A: Early-return guard before the dedup block (Option A from design doc)
|
|
74
|
+
|
|
75
|
+
**Summary:** Add `if (workflowTrigger._preAllocatedStartResponse !== undefined) { void this.queue.enqueue(...); return workflowTrigger.workflowId; }` before the scoped dedup block. The dedup block is only reached when the field is absent.
|
|
76
|
+
|
|
77
|
+
**Tensions resolved:** Dedup protection vs session liveness (fully resolved for pre-alloc path).
|
|
78
|
+
**Tensions accepted:** Shared state coupling remains -- the map is still shared.
|
|
79
|
+
|
|
80
|
+
**Boundary solved at:** `dispatch()` entry, before the dedup block. This is the real seam.
|
|
81
|
+
|
|
82
|
+
**Why this boundary:** The bug fires at the dedup block. The guard is placed exactly where the
|
|
83
|
+
divergence must happen. No other location would be more direct.
|
|
84
|
+
|
|
85
|
+
**Failure mode:** If the guard path and the normal path have separate `queue.enqueue()` callback
|
|
86
|
+
bodies, any future change to one body must be mirrored to the other. Mitigated by restructuring
|
|
87
|
+
so both paths reach the same enqueue call.
|
|
88
|
+
|
|
89
|
+
**Repo pattern:** Follows the `dispatchCondition` early-exit pattern in `route()` (lines 641-653).
|
|
90
|
+
Departs in that `dispatch()` previously had no such guard.
|
|
91
|
+
|
|
92
|
+
**Gains:** Minimal blast radius. Self-documenting -- the guard directly encodes the invariant.
|
|
93
|
+
**Loses:** Slightly more complex method structure if naively implemented with two enqueue blocks.
|
|
94
|
+
|
|
95
|
+
**Scope:** Best-fit. Single method, single guard.
|
|
96
|
+
|
|
97
|
+
**Philosophy:** Honors 'Architectural fixes', 'Make illegal states unrepresentable', 'YAGNI'.
|
|
98
|
+
No conflicts.
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
### Candidate B: Wrap dedup block in `if (!_preAllocatedStartResponse)` (same intent, cleaner structure)
|
|
103
|
+
|
|
104
|
+
**Summary:** Wrap the entire scoped dedup block in `if (workflowTrigger._preAllocatedStartResponse === undefined)`. Both paths then fall through to the same `void this.queue.enqueue(...)` call at the bottom of the method.
|
|
105
|
+
|
|
106
|
+
**Tensions resolved:** Same as A, plus eliminates code duplication risk.
|
|
107
|
+
**Tensions accepted:** Same as A.
|
|
108
|
+
|
|
109
|
+
**Boundary:** Same as A.
|
|
110
|
+
|
|
111
|
+
**Failure mode:** `_recentAdaptiveDispatches.set()` must still run for non-prealloc paths that
|
|
112
|
+
pass dedup. This is naturally handled by the wrap -- the set is inside the block.
|
|
113
|
+
|
|
114
|
+
**Repo pattern:** More consistent with the existing scoped-block style in `dispatch()`.
|
|
115
|
+
|
|
116
|
+
**Gains:** Single enqueue block -- no duplication risk.
|
|
117
|
+
**Loses:** The `_preAllocatedStartResponse` check is farther from the enqueue call than in A,
|
|
118
|
+
making the intent slightly less immediate.
|
|
119
|
+
|
|
120
|
+
**Scope:** Best-fit. Same scope as A.
|
|
121
|
+
|
|
122
|
+
**Philosophy:** Same as A.
|
|
123
|
+
|
|
124
|
+
---
|
|
125
|
+
|
|
126
|
+
### Candidate C: Remove dedup from `dispatch()` entirely (Option B from design doc)
|
|
127
|
+
|
|
128
|
+
**Summary:** Delete the entire dedup block from `dispatch()`. The dedup that protects against
|
|
129
|
+
webhook retries lives in `dispatchAdaptivePipeline()` and `route()`, both of which are the
|
|
130
|
+
actual entry points for external events.
|
|
131
|
+
|
|
132
|
+
**Tensions resolved:** Eliminates the shared-state coupling entirely.
|
|
133
|
+
|
|
134
|
+
**Boundary:** Too broad -- removes protection from the HTTP console route at `console-routes.ts:868`
|
|
135
|
+
which calls `dispatch()` directly.
|
|
136
|
+
|
|
137
|
+
**Failure mode:** Rapid-fire console dispatches could spawn duplicates if the HTTP layer does not
|
|
138
|
+
deduplicate. No evidence exists that this protection is currently needed, but removing it is a
|
|
139
|
+
behavioral change beyond the scope of this fix.
|
|
140
|
+
|
|
141
|
+
**Repo pattern:** Departs from the established shared-dedup-map pattern.
|
|
142
|
+
|
|
143
|
+
**Scope:** Too broad. The task explicitly specifies Option A.
|
|
144
|
+
|
|
145
|
+
**Philosophy:** Conflicts with 'YAGNI with discipline' -- speculative fix for an unproven gap.
|
|
146
|
+
|
|
147
|
+
---
|
|
148
|
+
|
|
149
|
+
## Comparison and Recommendation
|
|
150
|
+
|
|
151
|
+
All three candidates converge on the same fundamental fix. C is ruled out (too broad). A and B
|
|
152
|
+
are structurally equivalent -- the choice is whether to use an early-return guard (A) or a
|
|
153
|
+
wrapping if-block (B).
|
|
154
|
+
|
|
155
|
+
**Recommendation: Implement B's structure with A's intent.**
|
|
156
|
+
|
|
157
|
+
Use `if (workflowTrigger._preAllocatedStartResponse === undefined)` to wrap the dedup block,
|
|
158
|
+
with the `void this.queue.enqueue(...)` call appearing once after the if-block. This gives:
|
|
159
|
+
- No code duplication (single enqueue call)
|
|
160
|
+
- Clear separation: 'if non-prealloc, check dedup; then enqueue regardless'
|
|
161
|
+
- Consistent with the scoped-block style already in `dispatch()`
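The recommended shape can be sketched as below. `RouterSketch`, the `WorkflowTriggerSketch` fields other than `_preAllocatedStartResponse` and `workflowId`, the explicit `now` parameter, and the value returned on a dedup hit are assumptions; the map cleanup-on-entry step is omitted for brevity.

```typescript
interface WorkflowTriggerSketch {
  workflowId: string;
  goal: string;
  workspace: string;
  _preAllocatedStartResponse?: { sessionId: string };
}

class RouterSketch {
  private recent = new Map<string, number>();
  enqueued: string[] = []; // stand-in for queue.enqueue()

  // Models dispatchAdaptivePipeline() writing the shared key.
  markAdaptiveDispatch(goal: string, workspace: string, now: number): void {
    this.recent.set(`${goal}::${workspace}`, now);
  }

  dispatch(trigger: WorkflowTriggerSketch, now: number): string {
    // WHY: a pre-allocated session already exists in the store, so a matching
    // key means "child of an in-flight pipeline", not "webhook retry".
    if (trigger._preAllocatedStartResponse === undefined) {
      const key = `${trigger.goal}::${trigger.workspace}`;
      const last = this.recent.get(key);
      if (last !== undefined && now - last < 30000) {
        return trigger.workflowId; // duplicate within the window: skip enqueue
      }
      this.recent.set(key, now);
    }
    // Single enqueue call reached by both paths.
    this.enqueued.push(trigger.workflowId);
    return trigger.workflowId;
  }
}
```

Because the enqueue step appears once, any later change to result handling applies to pre-allocated and ordinary dispatches alike.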
|
|
162
|
+
|
|
163
|
+
**Rationale:** The task pseudocode suggests A's early-return pattern, but B is equivalent and
|
|
164
|
+
avoids the duplication risk. Any reviewer familiar with the design doc will understand either shape.
|
|
165
|
+
|
|
166
|
+
---
|
|
167
|
+
|
|
168
|
+
## Self-Critique
|
|
169
|
+
|
|
170
|
+
**Strongest counter-argument:** The task pseudocode explicitly shows an early-return guard (A).
|
|
171
|
+
A reviewer seeing the design doc side-by-side with the implementation may expect exactly that
|
|
172
|
+
pattern. Diverging to a wrap (B) adds a tiny friction.
|
|
173
|
+
|
|
174
|
+
**Pivot conditions:**
|
|
175
|
+
- If the enqueue callback body diverges between paths in A, switch to B immediately.
|
|
176
|
+
- If `dispatch()` gains a third call path that also needs dedup bypass, consider Candidate C or
|
|
177
|
+
a separate dedup map (Option C from design doc) at that point.
|
|
178
|
+
|
|
179
|
+
**Assumption that would invalidate this design:** If `_preAllocatedStartResponse` could ever be
|
|
180
|
+
set on a call where dedup should still fire. The JSDoc is explicit: the field is only set by
|
|
181
|
+
`spawnSession`/`spawn_agent` which already hold a pre-created session. This assumption is safe.
|
|
182
|
+
|
|
183
|
+
---
|
|
184
|
+
|
|
185
|
+
## Open Questions for the Main Agent
|
|
186
|
+
|
|
187
|
+
None. The problem, solution, boundary, and tests are fully specified.
|