@exaudeus/workrail 3.59.4 → 3.59.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,218 @@
+ # Implementation Plan: Bypass Dispatch Dedup for Pre-Allocated Sessions
+
+ **Date:** 2026-04-19
+ **Branch:** fix/dispatch-dedup-prealloc-bypass
+ **Scope:** src/trigger/trigger-router.ts only (src/mcp/ excluded)
+
+ ---
+
+ ## 1. Problem Statement
+
+ `TriggerRouter.dispatch()` has a 30-second deduplication guard that compares the incoming
+ `goal::workspacePath` key against `_recentAdaptiveDispatches`. When `dispatchAdaptivePipeline()`
+ runs, it writes this key. Milliseconds later, `spawnSession()` calls `dispatch()` with the same
+ goal and workspace (plus `_preAllocatedStartResponse`). The dedup guard fires, `queue.enqueue()`
+ is never called, and the session that `executeStartWorkflow()` already wrote to the store
+ becomes a permanent zombie: persisted, but never run.
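The race can be reproduced with a minimal, self-contained model of the guard. This is a sketch, not the real `TriggerRouter`: `DedupGuard`, `record()`, `isDuplicate()` and the explicit `now` argument are hypothetical; only the `goal::workspacePath` key and the 30-second window come from the text above.

```typescript
// Hypothetical model of the 30-second dedup guard in TriggerRouter.dispatch().
// Only the key format (goal::workspacePath) and the 30 s window come from the plan.
const DEDUP_TTL_MS = 30_000;

class DedupGuard {
  // "goal::workspacePath" -> time (ms) of the most recent recorded dispatch.
  private readonly recent = new Map<string, number>();

  static key(goal: string, workspacePath: string): string {
    return `${goal}::${workspacePath}`;
  }

  /** Records a dispatch, as dispatchAdaptivePipeline() does before spawning. */
  record(goal: string, workspacePath: string, now: number): void {
    this.recent.set(DedupGuard.key(goal, workspacePath), now);
  }

  /** True when the same key was recorded less than DEDUP_TTL_MS ago. */
  isDuplicate(goal: string, workspacePath: string, now: number): boolean {
    const last = this.recent.get(DedupGuard.key(goal, workspacePath));
    return last !== undefined && now - last < DEDUP_TTL_MS;
  }
}

// The race: the pipeline records the key, then spawnSession() -> dispatch()
// arrives a few milliseconds later with the same goal and workspace.
const guard = new DedupGuard();
const t0 = 1_000_000;
guard.record("fix flaky test", "/repo", t0);
const droppedAsDuplicate = guard.isDuplicate("fix flaky test", "/repo", t0 + 5);
// droppedAsDuplicate is true, so queue.enqueue() is skipped and the session that
// executeStartWorkflow() already persisted never runs.
```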
+
+ ---
+
+ ## 2. Acceptance Criteria
+
+ 1. `dispatch()` called with `_preAllocatedStartResponse` set bypasses the dedup check and calls
+ `runWorkflowFn` exactly once.
+ 2. `dispatch()` called WITHOUT `_preAllocatedStartResponse` still deduplicates correctly within 30s.
+ 3. `npm run build` exits clean (no TypeScript errors).
+ 4. `npx vitest run tests/unit/trigger-router.test.ts` -- all tests pass, including the new bypass test and the existing dedup regression test.
+ 5. `npx vitest run` -- no regressions in any other test file.
+ 6. PR merged to main via `gh pr merge <N> --squash`.
+ 7. Daemon rebuilt and reinstalled (`npm run build && node dist/cli-worktrain.js daemon --install`).
+ 8. `node dist/cli-worktrain.js trigger poll self-improvement` starts a session and `session_started`
+ appears in the event log within 30s.
+
+ ---
+
+ ## 3. Non-Goals
+
+ - Do NOT touch `src/mcp/` (any file).
+ - Do NOT implement Option B (remove dedup from `dispatch()` entirely).
+ - Do NOT implement Option C (separate dedup maps).
+ - Do NOT touch `route()` or `dispatchAdaptivePipeline()`.
+ - Do NOT change the dedup TTL or map key format.
+
+ ---
+
+ ## 4. Philosophy-Driven Constraints
+
+ - Guard comment must explain WHY, not what (CLAUDE.md: 'Document why, not what').
+ - No code duplication: both the pre-alloc path and the normal path reach the same single
+ `queue.enqueue()` call (CLAUDE.md: 'Compose with small, pure functions').
+ - Guard compares strictly against `undefined` (`=== undefined` / `!== undefined`), not a falsy check (CLAUDE.md: 'Type safety as the first line of defense').
+
+ ---
+
+ ## 5. Invariants
+
+ 1. When `_preAllocatedStartResponse !== undefined`, `queue.enqueue()` MUST be called.
+ 2. When `_preAllocatedStartResponse === undefined`, the existing dedup check runs unchanged.
+ 3. `_recentAdaptiveDispatches` is not updated for pre-alloc dispatch calls (the entry from
+ `dispatchAdaptivePipeline()` remains and is the correct TTL anchor for top-level dedup).
+ 4. The `assertNever` exhaustiveness guard in the enqueue callback remains intact.
+
+ ---
+
+ ## 6. Selected Approach + Rationale
+
+ **Approach:** Wrap the dedup block in `dispatch()` with:
+ ```typescript
+ if (workflowTrigger._preAllocatedStartResponse === undefined) {
+   // ... existing dedup block ...
+ }
+ ```
+ Both the pre-alloc path and the normal (post-dedup) path fall through to the same single
+ `void this.queue.enqueue(...)` call.
+
+ **Rationale:** Minimal blast radius. Single guard. No code duplication. Directly models the
+ invariant documented in the `WorkflowTrigger._preAllocatedStartResponse` JSDoc.
+
+ **Runner-up:** Early-return guard before the dedup block (Candidate A from design review).
+ Lost because it risks duplicating the enqueue callback body. Structurally equivalent otherwise.
+
+ ---
+
+ ## 7. Vertical Slices
+
+ ### Slice 1: Implementation fix in trigger-router.ts
+
+ **Scope:** `src/trigger/trigger-router.ts`, `dispatch()` method only (lines 847-933).
+
+ **Change:**
+ 1. Add a guard comment before the dedup block explaining the pre-alloc invariant.
+ 2. Wrap the scoped dedup block `{...}` in `if (workflowTrigger._preAllocatedStartResponse === undefined)`.
+ 3. Add an optional `console.log` in the pre-alloc path for daemon observability.
+ 4. The existing `void this.queue.enqueue(...)` call remains after the if-block.
+
+ **Acceptance criterion:** TypeScript compiles clean. The guard is visible and commented correctly.
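The four Slice 1 changes can be sketched in isolation. `MiniRouter`, `WorkflowTriggerLike`, the `enqueued` array standing in for `queue.enqueue()`, and the log text are all hypothetical, and the sketch assumes the normal path records its own dedup key (as the existing dedup test implies). What it takes from the plan is the `=== undefined` wrap, the WHY comment, and a single enqueue reached by both paths.

```typescript
// Hypothetical, simplified shape; the real WorkflowTrigger has more fields.
interface WorkflowTriggerLike {
  workflowId: string;
  goal: string;
  workspacePath: string;
  _preAllocatedStartResponse?: unknown;
}

const DEDUP_TTL_MS = 30_000;

class MiniRouter {
  readonly recentAdaptiveDispatches = new Map<string, number>();
  readonly enqueued: string[] = []; // stands in for queue.enqueue()
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  dispatch(trigger: WorkflowTriggerLike): void {
    // WHY: a pre-allocated start response means executeStartWorkflow() has already
    // persisted this session. Deduplicating it would strand that session as a
    // zombie, so only dispatches without one are subject to the 30 s guard.
    if (trigger._preAllocatedStartResponse === undefined) {
      const key = `${trigger.goal}::${trigger.workspacePath}`;
      const last = this.recentAdaptiveDispatches.get(key);
      if (last !== undefined && this.now() - last < DEDUP_TTL_MS) {
        return; // genuine duplicate of a recent top-level dispatch
      }
      // Assumption: the normal path records its own key.
      this.recentAdaptiveDispatches.set(key, this.now());
    } else {
      // Optional daemon observability (Slice 1, step 3). The map is deliberately untouched.
      console.log(`[dispatch] ${trigger.workflowId}: pre-allocated session, skipping dedup`);
    }
    // Single enqueue reached by both paths.
    this.enqueued.push(trigger.workflowId);
  }
}
```

Because the enqueue sits after the `if`/`else` rather than inside an early-return branch, there is no second copy of the enqueue body to keep in sync, which is why the runner-up lost.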
+
+ ---
+
+ ### Slice 2: Unit tests in trigger-router.test.ts
+
+ **Scope:** `tests/unit/trigger-router.test.ts`, new describe block or additions to existing
+ 'TriggerRouter.route and dispatch deduplication' describe block.
+
+ **Tests (one new, one existing):**
+
+ **Test 1: dispatch() with _preAllocatedStartResponse bypasses dedup and calls runWorkflowFn**
+ - Call `dispatchAdaptivePipeline(goal, workspace)` to prime the dedup map.
+ - Then call `dispatch({ workflowId, goal, workspacePath: workspace, context: {}, _preAllocatedStartResponse: <fake> })`.
+ - Flush the async queue.
+ - Assert `expect(calls).toHaveLength(1)` -- runWorkflowFn was called exactly once.
+
+ **Test 2: dispatch() WITHOUT _preAllocatedStartResponse still deduplicates within 30s**
+ - This test already exists at line 1604. Verify it still passes after the change.
+ - No new test needed for this case; the existing test is the regression guard.
+
+ **Acceptance criterion:** Both tests pass (`npx vitest run tests/unit/trigger-router.test.ts`).
+
+ ---
+
+ ### Slice 3: Build, test, PR, CI, merge
+
+ **Steps:**
+ 1. `npm run build` -- clean
+ 2. `npx vitest run tests/unit/trigger-router.test.ts` -- all pass
+ 3. `npx vitest run` -- no regressions
+ 4. Create branch `fix/dispatch-dedup-prealloc-bypass`
+ 5. Commit: `fix(trigger): bypass dispatch dedup for pre-allocated sessions to prevent zombie sessions`
+ 6. Push + open PR
+ 7. Wait for CI
+ 8. Merge: `gh pr merge <N> --squash`
+
+ ---
+
+ ### Slice 4: Daemon reinstall and smoke test
+
+ **Steps:**
+ 1. `npm run build && node dist/cli-worktrain.js daemon --install`
+ 2. `node dist/cli-worktrain.js trigger poll self-improvement`
+ 3. Watch the event log for `session_started` for at least 30s
+ 4. Confirm the session is not a zombie (it completes or progresses past `run_started`)
+
+ ---
+
+ ## 8. Test Design
+
+ ### New Test 1: Bypass case (primary regression test for this fix)
+
+ ```typescript
+ it('dispatch(): bypasses dedup and calls runWorkflowFn when _preAllocatedStartResponse is set', async () => {
+   // Real timers on purpose: vi.useFakeTimers() fakes setImmediate by default,
+   // so the flush below would never resolve. The clock never needs to advance,
+   // because both dispatches land well inside the 30s dedup window.
+   const { fn, calls } = makeFakeRunWorkflow();
+   const trigger = makeTrigger();
+   const router = new TriggerRouter(
+     makeIndex(trigger), FAKE_CTX, FAKE_API_KEY, fn,
+     undefined, undefined, undefined, undefined, undefined,
+     FAKE_DEPS, executors,
+   );
+
+   const goal = trigger.goal;
+   const workspace = trigger.workspacePath;
+
+   // Prime the dedup map via dispatchAdaptivePipeline
+   await router.dispatchAdaptivePipeline(goal, workspace);
+
+   // Now dispatch with _preAllocatedStartResponse set -- must bypass dedup
+   router.dispatch({
+     workflowId: trigger.workflowId,
+     goal,
+     workspacePath: workspace,
+     context: {},
+     _preAllocatedStartResponse: {} as any, // any non-undefined value triggers the bypass
+   });
+
+   // Flush the async queue
+   await new Promise((r) => setImmediate(r));
+
+   // runWorkflowFn must have been called exactly once
+   expect(calls).toHaveLength(1);
+ });
+ ```
+
+ ### Existing Test (regression guard): dispatch() dedup still works without _preAllocatedStartResponse
+
+ The test at lines 1604-1644 of trigger-router.test.ts covers this. It will serve as the
+ regression guard after the change.
+
+ ---
+
+ ## 9. Risk Register
+
+ | Risk | Likelihood | Impact | Mitigation |
+ |---|---|---|---|
+ | Guard removed in future refactor | Low | High | Unit test catches it in CI |
+ | Comment becomes stale | Low | Medium | Comment explains invariant, not mechanics -- less likely to go stale |
+ | `_preAllocatedStartResponse` type changes | Very low | Low | TypeScript would catch it at compile time |
+
+ ---
+
+ ## 10. PR Packaging Strategy
+
+ Single PR. One commit. No breaking changes.
+
+ Branch: `fix/dispatch-dedup-prealloc-bypass`
+ Commit: `fix(trigger): bypass dispatch dedup for pre-allocated sessions to prevent zombie sessions`
+
+ ---
+
+ ## 11. Philosophy Alignment
+
+ | Principle | Status | Why |
+ |---|---|---|
+ | Architectural fixes over patches | Satisfied | Guard models the root invariant, not a special case |
+ | Make illegal states unrepresentable | Satisfied | The typed optional `_preAllocatedStartResponse` field is the discriminator (`!== undefined`) |
+ | YAGNI with discipline | Satisfied | Minimal change, no speculative abstractions |
+ | Document why, not what | Satisfied | Guard comment explains invariant |
+ | Type safety as first line of defense | Satisfied | `!== undefined` check, typed optional field |
+ | Immutability by default | Tension (pre-existing) | Shared mutable map; not introduced by this fix |
@@ -7351,3 +7351,138 @@ An agent can die from: stream watchdog timeout (600s no progress), OOM kill, or
  ### Priority
  
  High. Agent crash recovery makes the overnight-autonomous bar achievable. Without it, any hung LLM call or tool timeout fails the entire pipeline silently. With it, transient failures are automatically retried and the pipeline continues.
+
+ ---
+
+ ## Workflow execution time tracking and prediction (Apr 21, 2026)
+
+ **The problem:** WorkTrain has no data on how long workflows actually take. Timeouts are set by intuition (55 min for discovery, 35 for shaping, 65 for coding). We just discovered that discovery on a real WorkRail task takes ~16 minutes, so the 55-minute timeout is more than 3x the actual time -- but we didn't know that until we ran a benchmark manually.
+
+ ### What to track
+
+ For every completed session, record:
+ - Workflow ID
+ - Total wall-clock duration
+ - Number of turns
+ - Number of step advances
+ - Outcome (success / timeout / stuck / error)
+ - Task complexity signals (codebase size, number of files read, task type from context)
+
+ Store in `~/.workrail/data/execution-stats.jsonl` -- one line per completed session, append-only.
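A minimal sketch of the record and the append-only write, assuming Node's `fs` API. The `ExecutionStat` field names, the two illustrative complexity fields, and the helpers `appendExecutionStat()`/`readExecutionStats()` are hypothetical; the file location and the list of things to record come from the bullets above.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical field names for the data listed above.
type SessionOutcome = "success" | "timeout" | "stuck" | "error";

interface ExecutionStat {
  workflowId: string;
  durationMs: number; // total wall-clock duration
  turns: number;
  stepAdvances: number;
  outcome: SessionOutcome;
  filesRead?: number; // complexity signal (illustrative)
  taskType?: string; // complexity signal (illustrative)
  completedAt: string; // ISO timestamp
}

const DEFAULT_STATS_PATH = path.join(os.homedir(), ".workrail", "data", "execution-stats.jsonl");

/** Appends one JSON line per completed session; existing lines are never rewritten. */
function appendExecutionStat(stat: ExecutionStat, file: string = DEFAULT_STATS_PATH): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(stat) + "\n", "utf8");
}

/** Reads the log back, skipping blank or malformed lines (e.g. a torn final write). */
function readExecutionStats(file: string = DEFAULT_STATS_PATH): ExecutionStat[] {
  if (!fs.existsSync(file)) return [];
  const out: ExecutionStat[] = [];
  for (const line of fs.readFileSync(file, "utf8").split("\n")) {
    if (line.trim() === "") continue;
    try {
      out.push(JSON.parse(line) as ExecutionStat);
    } catch {
      // ignore a partially written line
    }
  }
  return out;
}
```

In `runWorkflow()`'s finally block, the call would sit in its own `try/catch` so that a failed stats write can never mask the session's real outcome.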
+
+ ### What to do with it
+
+ **Immediate use: calibrate timeouts automatically**
+
+ Instead of the hardcoded `DISCOVERY_TIMEOUT_MS = 55 * 60 * 1000`, read the p95 completion time from execution stats and set the timeout to `p95 * 1.5`. Start with the hardcoded values as seeds; refine after 10+ samples.
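The calibration rule (seed until 10+ samples, then `p95 * 1.5`) can be sketched as below. The helper names and the nearest-rank percentile method are assumptions; the sketch also assumes only `success` sessions count, because a timed-out run only says that the true duration exceeded the old limit.

```typescript
// Hypothetical helpers; the p95 * 1.5 rule, the hardcoded seed, and the
// 10-sample threshold come from the text above.
const MIN_SAMPLES = 10;
const HEADROOM = 1.5;
const DISCOVERY_TIMEOUT_MS = 55 * 60 * 1000; // current hardcoded value, used as the seed

/** Nearest-rank percentile, p in (0, 100], of a non-empty list. */
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of an empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based
  const index = Math.min(sorted.length, Math.max(1, rank)) - 1;
  return sorted[index]!;
}

/** Seed until enough samples exist, then p95 * 1.5 (rounded to whole ms). */
function calibratedTimeoutMs(durationsMs: readonly number[], seedMs: number): number {
  if (durationsMs.length < MIN_SAMPLES) return seedMs;
  return Math.round(percentile(durationsMs, 95) * HEADROOM);
}

interface DurationSample {
  outcome: string;
  durationMs: number;
}

/** Only successful runs have a true completion time; timeouts are censored at the old limit. */
function timeoutFromHistory(samples: readonly DurationSample[], seedMs: number): number {
  const completed = samples.filter((s) => s.outcome === "success").map((s) => s.durationMs);
  return calibratedTimeoutMs(completed, seedMs);
}
```

With ten successful runs of 10 to 19 minutes, p95 is the 19-minute run and the calibrated timeout is 28.5 minutes; with nine runs, the 55-minute seed still applies.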
+
+ **Medium-term use: predict duration before dispatch**
+
+ Given: task description + workflow ID + codebase characteristics → predicted duration range.
+
+ The coordinator could use this to:
+ - Warn when a task is likely to exceed session limits before starting
+ - Adjust timeout budgets per-dispatch based on predicted complexity
+ - Surface "this type of task usually takes 45 minutes" in `worktrain trigger test` output
+
+ **Longer-term use: quality/efficiency metrics**
+
+ Track step-advance rate (steps per turn) as a proxy for workflow efficiency. A session with 50 turns and 2 step advances is spending too many turns between steps. This feeds into the workflow improvement loop.
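Step-advance rate is a simple ratio; the 50-turn, 2-advance session above scores 0.04. The helpers and the 0.1 threshold in this sketch are hypothetical and would need tuning against real data.

```typescript
// Hypothetical helpers; the 0.1 threshold is illustrative, not from the text.
const LOW_EFFICIENCY_THRESHOLD = 0.1;

/** Step advances per turn; 0 for a session with no turns. */
function stepAdvanceRate(stepAdvances: number, turns: number): number {
  return turns === 0 ? 0 : stepAdvances / turns;
}

/** Flags sessions that spend many turns between step advances. */
function isLowEfficiency(stepAdvances: number, turns: number): boolean {
  return turns > 0 && stepAdvanceRate(stepAdvances, turns) < LOW_EFFICIENCY_THRESHOLD;
}
```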
+
+ ### Implementation notes
+
+ - Append to `execution-stats.jsonl` in `runWorkflow()`'s finally block, same pattern as the daemon event log
+ - Keep it simple: flat JSONL, no database, no schema migration
+ - `worktrain status` can show recent timing stats: "Last 10 wr.discovery sessions: avg 18min, p95 31min"
+ - `worktrain trigger validate` can warn if configured timeouts are well below historical p95
+
+ ### Priority
+
+ Medium. The data collection is small (~5 lines in `runWorkflow()`). The prediction and calibration are more involved. Ship collection first, calibration second.
+
+ ---
+
+ ## WorkRail MCP server self-cleanup (Apr 21, 2026)
+
+ **The problem:** The WorkRail MCP server accumulates stale state that never cleans itself up: old workflow copies in `~/.workrail/workflows/`, dead managed sources, git repo caches that can't pull, 500+ sessions in the store, stale remembered roots. None of it has a TTL or cleanup mechanism. Every server startup loads everything and logs validation errors for stale state.
+
+ ### Sources of stale state
+
+ 1. **`~/.workrail/workflows/`** -- manually copied or `worktrain init`-placed workflows that go stale when the repo updates. The MCP server loads both the repo copy and the user copy; the older one fails validation, sometimes silently and sometimes noisily.
+
+ 2. **Managed sources** (`~/.workrail/data/managed-sources/`) -- paths that no longer exist stay registered. The server tries to load them on every startup.
+
+ 3. **Git workflow cache** (`~/.workrail/cache/git-*`) -- cloned repos whose remotes have changed or been deleted, or whose auth has expired. `git pull` fails; errors are logged on every startup.
+
+ 4. **Session store** (`~/.workrail/data/sessions/`) -- sessions accumulate forever. No TTL, no archival. The console loads all 500+ on every `/api/v2/sessions` request (partially mitigated by the mtime cache).
+
+ 5. **Remembered roots** (`~/.workrail/data/managed-sources/remembered-roots.json`) -- workspace paths from past sessions that no longer exist.
+
+ ### Fix: four layers
+
+ **Layer 1: Defensive loading (mostly already done)**
+ Every loader should already handle missing/broken sources gracefully. Audit: are all managed source failures caught and logged as warnings rather than errors? Are git cache failures non-fatal?
+
+ **Layer 2: `workrail cleanup` command**
+ ```
+ workrail cleanup [--yes] [--sessions --older-than <age>] [--sources] [--cache] [--roots]
+ ```
+ - `--sources`: remove managed sources where the path doesn't exist on disk
+ - `--cache`: remove git caches where `git pull` fails (remote gone or auth expired)
+ - `--sessions --older-than 30d`: archive or delete sessions older than the given age
+ - `--roots`: remove remembered roots where the path doesn't exist
+ - Without `--yes`: show what would be removed and ask for confirmation
+ - With `--yes`: remove without prompting (for CI / worktrain init)
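A sketch of how `--older-than` might be parsed and applied. Only the `30d` form appears above; the `h` and `w` units, `parseAge()`, `SessionEntry`, and `selectSessionsOlderThan()` are hypothetical. Without `--yes`, the selected list would be printed for confirmation; with `--yes`, it would be removed directly.

```typescript
// Hypothetical units and helpers; only the "30d" form appears in the proposal.
const UNIT_MS = {
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
  w: 7 * 24 * 60 * 60 * 1000,
} as const;
type AgeUnit = keyof typeof UNIT_MS;

/** Parses "<integer><h|d|w>", e.g. "30d", into milliseconds. */
function parseAge(input: string): number {
  const match = /^(\d+)([hdw])$/.exec(input.trim());
  if (match === null) throw new Error(`invalid --older-than value: ${input}`);
  return Number(match[1]) * UNIT_MS[match[2] as AgeUnit];
}

interface SessionEntry {
  id: string;
  lastModifiedMs: number; // e.g. the session file's mtime
}

/** Sessions last modified before (now - age): candidates for archive or delete. */
function selectSessionsOlderThan(
  sessions: readonly SessionEntry[],
  ageMs: number,
  nowMs: number,
): SessionEntry[] {
  const cutoff = nowMs - ageMs;
  return sessions.filter((s) => s.lastModifiedMs < cutoff);
}
```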
+
+ **Layer 3: Automatic startup cleanup (light)**
+ On MCP server startup, silently remove managed sources where the filesystem path doesn't exist (non-destructive -- the path is already gone). Log a single "removed N stale sources" line. Do not auto-remove sessions or caches -- those require explicit user intent.
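Layer 3 amounts to a filter over the registered sources. In this sketch `ManagedSource`, `pruneMissingSources()`, and the injected `pathExists`/`log` parameters are hypothetical (injection keeps it testable without touching disk); the single summary log line mirrors the text.

```typescript
import * as fs from "node:fs";

// Hypothetical record shape; the real managed-source entry may carry more fields.
interface ManagedSource {
  id: string;
  path: string;
}

/**
 * Drops sources whose directory no longer exists and logs one summary line.
 * Non-destructive: nothing on disk is deleted, only dead registrations.
 */
function pruneMissingSources(
  sources: readonly ManagedSource[],
  pathExists: (p: string) => boolean = (p) => fs.existsSync(p),
  log: (msg: string) => void = (msg) => console.log(msg),
): ManagedSource[] {
  const kept = sources.filter((s) => pathExists(s.path));
  const removed = sources.length - kept.length;
  if (removed > 0) log(`removed ${removed} stale sources`);
  return kept;
}
```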
+
+ **Layer 4: User workflow directory sync**
+ `~/.workrail/workflows/` should not be a place users copy workflows to manually. It should either:
+ - Be deprecated entirely (use managed sources / workspace roots instead)
+ - Have a `workrail sync` command that updates it from the canonical sources
+ - Auto-detect when a user workflow is an older version of a bundled workflow and skip loading it
+
+ ### Priority
+
+ Medium for the cleanup command (quality of life, stops log noise). High for startup auto-cleanup of dead managed sources (prevents the `Invalid workflow` errors that have been confusing throughout this session). Low for session TTL/archival (the mtime cache handles the performance concern).
+
+ ---
+
+ ## Worktree orphan leak on delivery failure (Apr 21, 2026, from Audit 4)
+
+ **The bug:** In `src/trigger/trigger-router.ts`, `maybeRunDelivery()` on the success path deletes the session sidecar file before attempting worktree removal. If worktree removal fails (network error, git command failure), the sidecar is already gone. `runStartupRecovery()` scans sidecar files to find orphan worktrees -- so the orphaned worktree is invisible and accumulates indefinitely.
+
+ **Fix:** In the success-path cleanup, delete the sidecar only AFTER worktree removal succeeds, not before. If removal fails, leave the sidecar in place so that `runStartupRecovery()` can find the orphaned worktree and retry on the next startup. (A `try/finally` that deletes the sidecar regardless of whether removal succeeded would hide the orphan exactly as today.)
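The ordering rule can be sketched with injected dependencies. `DeliveryCleanupDeps` and `cleanupAfterDelivery()` are hypothetical names, not the real `maybeRunDelivery()` code; the point is that the sidecar is deleted only once worktree removal has succeeded.

```typescript
// Hypothetical dependency shape; only the ordering rule comes from the text.
interface DeliveryCleanupDeps {
  removeWorktree: (worktreePath: string) => Promise<void>;
  deleteSidecar: (sessionId: string) => Promise<void>;
  warn: (msg: string) => void;
}

/** Returns true when both the worktree and its sidecar were removed. */
async function cleanupAfterDelivery(
  sessionId: string,
  worktreePath: string,
  deps: DeliveryCleanupDeps,
): Promise<boolean> {
  try {
    await deps.removeWorktree(worktreePath);
  } catch (err) {
    // Keep the sidecar: it is what runStartupRecovery() scans to find this orphan.
    deps.warn(`worktree removal failed for ${sessionId}; sidecar kept for startup recovery: ${String(err)}`);
    return false;
  }
  // Only reached after a successful removal, so no orphan can lose its sidecar.
  await deps.deleteSidecar(sessionId);
  return true;
}
```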
+
+ **File:** `src/trigger/trigger-router.ts`, `maybeRunDelivery()` success path.
+
+ **Priority:** Medium. Worktrees are small, but the leak is permanent across daemon restarts.
+
+ ---
+
+ ## queue-poll.jsonl never rotated (Apr 21, 2026, from Audit 5)
+
+ **The bug:** `~/.workrail/queue-poll.jsonl` grows indefinitely. It is written with `appendFile` only and never rotated. At 5-minute poll intervals with 2-3 events per cycle, this is ~8-87 MB/month depending on activity. Disk exhaustion risk on long-running daemons.
+
+ **Fix:** Add a size check before appending in `appendQueuePollLog()`. If the file exceeds 10 MB, rotate it: shift `queue-poll.jsonl.1` to `queue-poll.jsonl.2` (replacing any older `.2`), rename the live file to `queue-poll.jsonl.1`, and start fresh. This keeps at most 2 rotated files.
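A sketch of size-based rotation using Node's synchronous `fs` calls. The helper names and the default-path constant are hypothetical; the 10 MB limit, the `.1` suffix, and the cap of two rotated files come from the fix above. Synchronous calls keep the check-then-rename sequence simple, at the cost of briefly blocking the event loop on each append.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical names; the 10 MB limit and the cap of 2 rotated files come from the text.
const QUEUE_POLL_LOG = path.join(os.homedir(), ".workrail", "queue-poll.jsonl");
const MAX_LOG_BYTES = 10 * 1024 * 1024;
const MAX_ROTATED_FILES = 2;

/** Rotates file -> file.1 -> file.2 once the live file exceeds maxBytes. */
function rotateIfNeeded(file: string, maxBytes: number): void {
  if (!fs.existsSync(file)) return; // nothing to rotate yet
  if (fs.statSync(file).size <= maxBytes) return;
  // Shift older files up first; renameSync replaces an existing target, dropping the oldest.
  for (let i = MAX_ROTATED_FILES - 1; i >= 1; i--) {
    const from = `${file}.${i}`;
    if (fs.existsSync(from)) fs.renameSync(from, `${file}.${i + 1}`);
  }
  fs.renameSync(file, `${file}.1`);
}

/** Appends one JSONL line, rotating first if the live file is over the limit. */
function appendQueuePollLine(
  line: string,
  file: string = QUEUE_POLL_LOG,
  maxBytes: number = MAX_LOG_BYTES,
): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  rotateIfNeeded(file, maxBytes);
  fs.appendFileSync(file, line.endsWith("\n") ? line : `${line}\n`, "utf8");
}
```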
+
+ **File:** `src/trigger/polling-scheduler.ts`, `appendQueuePollLog()` function.
+
+ **Priority:** Medium. Not urgent, but a production correctness issue.
+
+ ---
+
+ ## ReviewSeverity missing assertNever + stderr bypassing injected dep (Apr 21, 2026, from Audit 2)
+
+ **Bug 1 (Major):** In `src/coordinators/modes/implement-shared.ts`, the `switch(findings.severity)` over `ReviewSeverity` has no `default: assertNever(findings.severity)`. Widening `ReviewSeverity` with a new variant compiles cleanly and falls through silently.
+
+ **Fix:** Add `default: assertNever(findings.severity)` to the switch.
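The exhaustiveness pattern looks like this. The three `ReviewSeverity` variants and `nextAction()` are placeholders, not the real union in `implement-shared.ts`; with the `default` branch in place, adding a fourth variant without a matching `case` becomes a compile error instead of a silent fall-through.

```typescript
// Placeholder variants; the real ReviewSeverity union lives in the codebase.
type ReviewSeverity = "clean" | "minor" | "blocking";

function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

function nextAction(severity: ReviewSeverity): string {
  switch (severity) {
    case "clean":
      return "merge";
    case "minor":
      return "fix-and-merge";
    case "blocking":
      return "escalate";
    default:
      // Type error here if ReviewSeverity gains a variant with no case above.
      return assertNever(severity);
  }
}
```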
+
+ **Bug 2 (Major):** In `src/coordinators/pr-review.ts`, `readVerdictArtifact()` calls `process.stderr.write(...)` directly instead of using the injected `deps.stderr`. Tests that inject a fake dep will miss this log.
+
+ **Fix:** Replace `process.stderr.write(...)` with `deps.stderr(...)`.
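The injected-dep version can be sketched as follows. `VerdictDeps`, its `readFile` member, and this `readVerdictArtifact()` signature are hypothetical; only `deps.stderr` comes from the text. Routing every log through the dependency lets a test capture it with a fake.

```typescript
// Hypothetical dependency shape and signature; only deps.stderr comes from the text.
interface VerdictDeps {
  readFile: (filePath: string) => string | undefined;
  stderr: (msg: string) => void;
}

/** Reads and parses a verdict artifact, logging problems through the injected dep. */
function readVerdictArtifact(artifactPath: string, deps: VerdictDeps): unknown {
  const raw = deps.readFile(artifactPath);
  if (raw === undefined) {
    // Previously process.stderr.write(...), which a test's fake deps could not observe.
    deps.stderr(`[pr-review] verdict artifact missing: ${artifactPath}\n`);
    return undefined;
  }
  try {
    return JSON.parse(raw) as unknown;
  } catch {
    deps.stderr(`[pr-review] verdict artifact is not valid JSON: ${artifactPath}\n`);
    return undefined;
  }
}
```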
+
+ **Files:** `src/coordinators/modes/implement-shared.ts`, `src/coordinators/pr-review.ts`.
+
+ **Priority:** Medium. Correctness issues that won't crash in production but make future refactors unsafe.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@exaudeus/workrail",
- "version": "3.59.4",
+ "version": "3.59.6",
  "description": "Step-by-step workflow enforcement for AI agents via MCP",
  "license": "MIT",
  "repository": {