npm - @exaudeus/workrail - Versions diffs - 3.59.4 → 3.59.6 - Mend

@exaudeus/workrail 3.59.4 → 3.59.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/dist/console-ui/assets/{index-BuMfiLrV.js → index-xMwhHmR2.js} +1 -1
package/dist/console-ui/index.html +1 -1
package/dist/coordinators/modes/full-pipeline.js +4 -4
package/dist/coordinators/pr-review.d.ts +4 -1
package/dist/manifest.json +19 -19
package/dist/trigger/adapters/github-queue-poller.js +25 -1
package/dist/trigger/polling-scheduler.d.ts +1 -0
package/dist/trigger/polling-scheduler.js +48 -5
package/dist/trigger/trigger-listener.js +2 -1
package/dist/trigger/trigger-router.js +4 -1
package/docs/design/dispatch-dedup-prealloc-bypass-candidates.md +187 -0
package/docs/design/dispatch-dedup-prealloc-bypass-design-review.md +100 -0
package/docs/design/dispatch-dedup-prealloc-bypass-implementation-plan.md +218 -0
package/docs/ideas/backlog.md +135 -0
package/package.json +1 -1

package/docs/design/dispatch-dedup-prealloc-bypass-implementation-plan.md ADDED

@@ -0,0 +1,218 @@
+# Implementation Plan: Bypass Dispatch Dedup for Pre-Allocated Sessions
+**Date:** 2026-04-19
+**Branch:** fix/dispatch-dedup-prealloc-bypass
+**Scope:** src/trigger/trigger-router.ts only (src/mcp/ excluded)
+---
+## 1. Problem Statement
+`TriggerRouter.dispatch()` has a 30-second deduplication guard that compares incoming
+`goal::workspacePath` against `_recentAdaptiveDispatches`. When `dispatchAdaptivePipeline()`
+runs, it writes this key. Milliseconds later, `spawnSession()` calls `dispatch()` with the same
+goal and workspace (plus `_preAllocatedStartResponse`). The dedup guard fires, `queue.enqueue()`
+is never called, and the session that was already written to the store by `executeStartWorkflow()`
+zombies permanently.
+---
+## 2. Acceptance Criteria
+1. `dispatch()` called with `_preAllocatedStartResponse` set bypasses the dedup check and calls
+   `runWorkflowFn` exactly once.
+2. `dispatch()` called WITHOUT `_preAllocatedStartResponse` still deduplicates correctly within 30s.
+3. `npm run build` exits clean (no TypeScript errors).
+4. `npx vitest run tests/unit/trigger-router.test.ts` -- all tests pass, including the new bypass test.
+5. `npx vitest run` -- no regressions in any other test file.
+6. PR merged to main via `gh pr merge <N> --squash`.
+7. Daemon rebuilt and reinstalled (`npm run build && node dist/cli-worktrain.js daemon --install`).
+8. `node dist/cli-worktrain.js trigger poll self-improvement` starts a session and `session_started`
+   appears in the event log within 30s.
+---
+## 3. Non-Goals
+- Do NOT touch `src/mcp/` (any file).
+- Do NOT implement Option B (remove dedup from dispatch() entirely).
+- Do NOT implement Option C (separate dedup maps).
+- Do NOT touch `route()` or `dispatchAdaptivePipeline()`.
+- Do NOT change the dedup TTL or map key format.
+---
+## 4. Philosophy-Driven Constraints
+- Guard comment must explain WHY, not what (CLAUDE.md: 'Document why, not what').
+- No code duplication: both the pre-alloc path and the normal path reach the same single
+  `queue.enqueue()` call (CLAUDE.md: 'Compose with small, pure functions').
+- Guard uses `!== undefined` not a falsy check (CLAUDE.md: 'Type safety as the first line of defense').
+---
+## 5. Invariants
+1. When `_preAllocatedStartResponse !== undefined`, `queue.enqueue()` MUST be called.
+2. When `_preAllocatedStartResponse === undefined`, the existing dedup check runs unchanged.
+3. `_recentAdaptiveDispatches` is not updated for pre-alloc dispatch calls (the entry from
+   `dispatchAdaptivePipeline()` remains and is the correct TTL anchor for top-level dedup).
+4. The `assertNever` exhaustiveness guard in the enqueue callback remains intact.
+---
+## 6. Selected Approach + Rationale
+**Approach:** Wrap the dedup block in `dispatch()` with:
+```typescript
+if (workflowTrigger._preAllocatedStartResponse === undefined) {
+  // ... existing dedup block ...
+}
+```
+Both the pre-alloc path and the normal (post-dedup) path fall through to the same single
+`void this.queue.enqueue(...)` call.
+**Rationale:** Minimal blast radius. Single guard. No code duplication. Directly models the
+documented invariant in `WorkflowTrigger._preAllocatedStartResponse` JSDoc.
+**Runner-up:** Early-return guard before the dedup block (Candidate A from design review).
+Lost because it risks duplicating the enqueue callback body. Structurally equivalent otherwise.
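The guard described above can be sketched in isolation. This is a minimal illustration, not the real `TriggerRouter`: `MiniRouter`, the `Trigger` interface, the `prime()` helper, the explicit `now` parameter, and the `enqueued` array (standing in for `queue.enqueue()`) are all hypothetical names introduced here.

```typescript
// Hypothetical stand-in type; the real WorkflowTrigger has more fields.
interface Trigger {
  goal: string;
  workspacePath: string;
  _preAllocatedStartResponse?: unknown;
}

const DEDUP_TTL_MS = 30_000; // 30-second window, as stated in the plan

class MiniRouter {
  // key (goal::workspacePath) -> time of last recorded dispatch, in ms
  private readonly recent = new Map<string, number>();
  // Stands in for queue.enqueue(); records what would have been enqueued.
  readonly enqueued: Trigger[] = [];

  // Simulates dispatchAdaptivePipeline() writing the dedup key.
  prime(goal: string, workspacePath: string, now: number): void {
    this.recent.set(`${goal}::${workspacePath}`, now);
  }

  dispatch(trigger: Trigger, now: number): void {
    // A pre-allocated session already exists in the store, so dropping it
    // here would leave it without a runner. Dedup only when nothing was
    // pre-allocated; the map is left untouched on the pre-alloc path.
    if (trigger._preAllocatedStartResponse === undefined) {
      const key = `${trigger.goal}::${trigger.workspacePath}`;
      const last = this.recent.get(key);
      if (last !== undefined && now - last < DEDUP_TTL_MS) {
        return; // duplicate within the TTL: drop
      }
      this.recent.set(key, now);
    }
    // Single enqueue call shared by both paths.
    this.enqueued.push(trigger);
  }
}
```

Passing `now` explicitly is a choice made for this sketch so the TTL behaviour can be exercised without fake timers; the real method presumably reads the clock itself.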
+---
+## 7. Vertical Slices
+### Slice 1: Implementation fix in trigger-router.ts
+**Scope:** `src/trigger/trigger-router.ts`, `dispatch()` method only (lines 847-933).
+**Change:**
+1. Add a guard comment before the dedup block explaining the pre-alloc invariant.
+2. Wrap the scoped dedup block `{...}` in `if (workflowTrigger._preAllocatedStartResponse === undefined)`.
+3. Add an optional `console.log` in the pre-alloc path for daemon observability.
+4. The existing `void this.queue.enqueue(...)` call remains after the if-block.
+**Acceptance criterion:** TypeScript compiles clean. The guard is visible and commented correctly.
+---
+### Slice 2: Unit tests in trigger-router.test.ts
+**Scope:** `tests/unit/trigger-router.test.ts`, new describe block or additions to existing
+'TriggerRouter.route and dispatch deduplication' describe block.
+**Two tests (one new, one existing regression guard):**
+**Test 1: dispatch() with _preAllocatedStartResponse bypasses dedup and calls runWorkflowFn**
+- Call `dispatchAdaptivePipeline(goal, workspace)` to prime the dedup map.
+- Then call `dispatch({ workflowId, goal, workspacePath: workspace, context: {}, _preAllocatedStartResponse: <fake> })`.
+- Flush the async queue.
+- Assert `expect(calls).toHaveLength(1)` -- runWorkflowFn was called exactly once.
+**Test 2: dispatch() WITHOUT _preAllocatedStartResponse still deduplicates within 30s**
+- This test already exists at line 1604. Verify it still passes after the change.
+- No new test needed for this case; the existing test is the regression guard.
+**Acceptance criterion:** Both tests pass (`vitest run tests/unit/trigger-router.test.ts`).
+---
+### Slice 3: Build, test, PR, CI, merge
+**Steps:**
+1. `npm run build` -- clean
+2. `npx vitest run tests/unit/trigger-router.test.ts` -- all pass
+3. `npx vitest run` -- no regressions
+4. Create branch `fix/dispatch-dedup-prealloc-bypass`
+5. Commit: `fix(trigger): bypass dispatch dedup for pre-allocated sessions to prevent zombie sessions`
+6. Push + open PR
+7. Wait for CI
+8. Merge: `gh pr merge <N> --squash`
+---
+### Slice 4: Daemon reinstall and smoke test
+**Steps:**
+1. `npm run build && node dist/cli-worktrain.js daemon --install`
+2. `node dist/cli-worktrain.js trigger poll self-improvement`
+3. Watch for `session_started` in event log for at least 30s
+4. Confirm session is not zombie (status completes or progresses past `run_started`)
+---
+## 8. Test Design
+### New Test 1: Bypass case (primary regression test for this fix)
+```typescript
+it('dispatch(): bypasses dedup and calls runWorkflowFn when _preAllocatedStartResponse is set', async () => {
+  vi.useFakeTimers();
+  const { fn, calls } = makeFakeRunWorkflow();
+  const trigger = makeTrigger();
+  const router = new TriggerRouter(
+    makeIndex(trigger), FAKE_CTX, FAKE_API_KEY, fn,
+    undefined, undefined, undefined, undefined, undefined,
+    FAKE_DEPS, executors,
+  );
+  const goal = trigger.goal;
+  const workspace = trigger.workspacePath;
+  // Prime the dedup map via dispatchAdaptivePipeline
+  await router.dispatchAdaptivePipeline(goal, workspace);
+  // Now dispatch with _preAllocatedStartResponse set -- must bypass dedup
+  router.dispatch({
+    workflowId: trigger.workflowId,
+    goal,
+    workspacePath: workspace,
+    context: {},
+    _preAllocatedStartResponse: {} as any, // non-undefined value triggers bypass
+  });
+  // Flush
+  await new Promise((r) => setImmediate(r));
+  // runWorkflowFn must have been called exactly once
+  expect(calls).toHaveLength(1);
+  vi.useRealTimers();
+});
+```
+### Existing Test (regression guard): dispatch() dedup still works without _preAllocatedStartResponse
+The test at lines 1604-1644 of trigger-router.test.ts covers this. It will serve as the
+regression guard after the change.
+---
+## 9. Risk Register
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| Guard removed in future refactor | Low | High | Unit test catches it in CI |
+| Comment becomes stale | Low | Medium | Comment explains invariant, not mechanics -- less likely to go stale |
+| _preAllocatedStartResponse type changes | Very low | Low | TypeScript would catch it at compile time |
+---
+## 10. PR Packaging Strategy
+Single PR. One commit. No breaking changes.
+Branch: `fix/dispatch-dedup-prealloc-bypass`
+Commit: `fix(trigger): bypass dispatch dedup for pre-allocated sessions to prevent zombie sessions`
+---
+## 11. Philosophy Alignment
+| Principle | Status | Why |
+|---|---|---|
+| Architectural fixes over patches | Satisfied | Guard models the root invariant, not a special-case |
+| Make illegal states unrepresentable | Satisfied | `_preAllocatedStartResponse !== undefined` is a compile-time discriminator |
+| YAGNI with discipline | Satisfied | Minimal change, no speculative abstractions |
+| Document why, not what | Satisfied | Guard comment explains invariant |
+| Type safety as first line of defense | Satisfied | `!== undefined` check, typed optional field |
+| Immutability by default | Tension (pre-existing) | Shared mutable map; not introduced by this fix |

package/docs/ideas/backlog.md CHANGED

@@ -7351,3 +7351,138 @@ An agent can die from: stream watchdog timeout (600s no progress), OOM kill, or
 ### Priority
 High. Agent crash recovery makes the overnight-autonomous bar achievable. Without it, any hung LLM call or tool timeout fails the entire pipeline silently. With it, transient failures are automatically retried and the pipeline continues.
+---
+## Workflow execution time tracking and prediction (Apr 21, 2026)
+**The problem:** WorkTrain has no data on how long workflows actually take. Timeouts are set by intuition (55 min for discovery, 35 for shaping, 65 for coding). We just discovered that discovery on a real workrail task takes ~16 minutes. The 55-minute timeout is roughly 3.4x the actual time -- but we didn't know that until we ran a benchmark manually.
+### What to track
+For every completed session, record:
+- Workflow ID
+- Total wall-clock duration
+- Number of turns
+- Number of step advances
+- Outcome (success / timeout / stuck / error)
+- Task complexity signals (codebase size, number of files read, task type from context)
+Store in `~/.workrail/data/execution-stats.jsonl` -- one line per completed session, append-only.
+### What to do with it
+**Immediate use: calibrate timeouts automatically**
+Instead of hardcoded `DISCOVERY_TIMEOUT_MS = 55 * 60 * 1000`, read the p95 completion time from execution stats and set the timeout to `p95 * 1.5`. Start with the hardcoded values as seeds; refine after 10+ samples.
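The calibration rule above (`p95 * 1.5`, with the hardcoded value as a seed until 10+ samples exist) could look roughly like this. The function names and the nearest-rank percentile method are assumptions made for this sketch; the backlog entry does not specify how p95 is computed.

```typescript
// Nearest-rank percentile over an unsorted, non-empty sample (p in (0, 100]).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Returns the seed until enough samples exist, then p95 * 1.5.
function calibratedTimeoutMs(
  durationsMs: number[],
  seedMs: number,
  minSamples = 10,
): number {
  if (durationsMs.length < minSamples) return seedMs;
  return Math.round(percentile(durationsMs, 95) * 1.5);
}

// Seed taken from the text: 55 minutes for discovery.
const DISCOVERY_SEED_MS = 55 * 60 * 1000;
const timeoutWithFewSamples = calibratedTimeoutMs([16 * 60_000], DISCOVERY_SEED_MS);
```

With a single 16-minute sample the seed is still returned, which matches the "start with the hardcoded values as seeds" wording.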
+**Medium-term use: predict duration before dispatch**
+Given: task description + workflow ID + codebase characteristics → predicted duration range.
+The coordinator could use this to:
+- Warn when a task is likely to exceed session limits before starting
+- Adjust timeout budgets per-dispatch based on predicted complexity
+- Surface "this type of task usually takes 45 minutes" in `worktrain trigger test` output
+**Longer-term use: quality/efficiency metrics**
+Track step-advance rate (steps per turn) as a proxy for workflow efficiency. A session with 50 turns and 2 step advances is spending too many turns between steps. This feeds into the workflow improvement loop.
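As a small illustration of the metric, assuming it is simply step advances divided by turns (the function name is hypothetical):

```typescript
// Steps advanced per turn; returns 0 for a session with no turns.
function stepAdvanceRate(stepAdvances: number, turns: number): number {
  return turns === 0 ? 0 : stepAdvances / turns;
}

// The example from the text: 50 turns, 2 step advances.
const rate = stepAdvanceRate(2, 50); // 0.04
```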
+### Implementation notes
+- Append to `execution-stats.jsonl` in `runWorkflow()`'s finally block, same pattern as the daemon event log
+- Keep it simple: flat JSONL, no database, no schema migration
+- `worktrain status` can show recent timing stats: "Last 10 wr.discovery sessions: avg 18min, p95 31min"
+- `worktrain trigger validate` can warn if configured timeouts are well below historical p95
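A minimal sketch of the append-only JSONL collection described in these notes. The `ExecutionStat` shape is an assumption derived from the "What to track" list (complexity signals omitted), and both function names are hypothetical; the synchronous `fs` calls are used only to keep the example short.

```typescript
import * as fs from "fs";

// Hypothetical record shape based on the fields listed under "What to track".
interface ExecutionStat {
  workflowId: string;
  durationMs: number;
  turns: number;
  stepAdvances: number;
  outcome: "success" | "timeout" | "stuck" | "error";
}

// Appends one JSON object per line; existing content is never rewritten.
function appendExecutionStat(filePath: string, stat: ExecutionStat): void {
  fs.appendFileSync(filePath, JSON.stringify(stat) + "\n", "utf8");
}

// Reads all records back, skipping blank lines.
function readExecutionStats(filePath: string): ExecutionStat[] {
  return fs
    .readFileSync(filePath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ExecutionStat);
}
```

In the real daemon the append would sit in the `finally` block of `runWorkflow()` as the notes suggest, so that timeouts and errors are recorded as well as successes.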
+### Priority
+Medium. The data collection is small (~5 lines in `runWorkflow()`). The prediction and calibration are more involved. Ship collection first, calibration second.
+---
+## WorkRail MCP server self-cleanup (Apr 21, 2026)
+**The problem:** The WorkRail MCP server accumulates stale state that never cleans itself up: old workflow copies in `~/.workrail/workflows/`, dead managed sources, git repo caches that can't pull, 500+ sessions in the store, stale remembered roots. None of it has a TTL or cleanup mechanism. Every server startup loads everything and logs validation errors for stale state.
+### Sources of stale state
+1. **`~/.workrail/workflows/`** -- manually copied or `worktrain init`-placed workflows that go stale when the repo updates. MCP server loads both repo copy and user copy; older one fails validation silently or noisily.
+2. **Managed sources** (`~/.workrail/data/managed-sources/`) -- paths that no longer exist stay registered. Server tries to load them on every startup.
+3. **Git workflow cache** (`~/.workrail/cache/git-*`) -- cloned repos whose remotes have changed, been deleted, or whose auth has expired. `git pull` fails; errors logged on every startup.
+4. **Session store** (`~/.workrail/data/sessions/`) -- sessions accumulate forever. No TTL, no archival. Console loads all 500+ on every `/api/v2/sessions` request (partially mitigated by mtime cache).
+5. **Remembered roots** (`~/.workrail/data/managed-sources/remembered-roots.json`) -- workspace paths from past sessions that no longer exist.
+### Fix: four layers
+**Layer 1: Defensive loading (mostly already done)**
+Every loader should already handle missing/broken sources gracefully. Audit: are all managed source failures caught and logged as warnings rather than errors? Are git cache failures non-fatal?
+**Layer 2: `workrail cleanup` command**
+```
+workrail cleanup [--yes] [--sessions --older-than <age>] [--sources] [--cache] [--roots]
+```
+- `--sources`: remove managed sources where path doesn't exist on disk
+- `--cache`: remove git caches where `git pull` fails (remote gone or auth expired)
+- `--sessions --older-than 30d`: archive or delete sessions older than N days
+- `--roots`: remove remembered roots where path doesn't exist
+- Without `--yes`: show what would be removed and ask for confirmation
+- With `--yes`: remove without prompting (for CI / worktrain init)
+**Layer 3: Automatic startup cleanup (light)**
+On MCP server startup, silently remove managed sources where the filesystem path doesn't exist (non-destructive -- the path is already gone). Log a single "removed N stale sources" line. Do not auto-remove sessions or caches -- those require explicit user intent.
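The startup pass in Layer 3 might be sketched as follows. The `ManagedSource` shape and `pruneStaleSources` name are hypothetical, since the on-disk format of managed sources is not shown here; only the "drop entries whose path is gone, report a count" behaviour comes from the text.

```typescript
import * as fs from "fs";

// Hypothetical shape; the real managed-source record format is not shown.
interface ManagedSource {
  path: string;
}

// Keeps only sources whose path still exists and reports how many were dropped.
function pruneStaleSources(
  sources: ManagedSource[],
): { kept: ManagedSource[]; removed: number } {
  const kept = sources.filter((s) => fs.existsSync(s.path));
  return { kept, removed: sources.length - kept.length };
}
```

A caller could then log the single "removed N stale sources" line when `removed > 0` and persist `kept`.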
+**Layer 4: User workflow directory sync**
+`~/.workrail/workflows/` should not be a place users copy workflows to manually. It should either:
+- Be deprecated entirely (use managed sources / workspace roots instead)
+- Have a `workrail sync` command that updates it from the canonical sources
+- Auto-detect when a user workflow is an older version of a bundled workflow and skip loading it
+### Priority
+Medium for the cleanup command (quality of life, stops log noise). High for startup auto-cleanup of dead managed sources (prevents the `Invalid workflow` errors that have been confusing throughout this session). Low for session TTL/archival (the mtime cache handles the performance concern).
+---
+## Worktree orphan leak on delivery failure (Apr 21, 2026, from Audit 4)
+**The bug:** In `src/trigger/trigger-router.ts`, `maybeRunDelivery()` on the success path deletes the session sidecar file before attempting worktree removal. If worktree removal fails (network error, git command failure), the sidecar is already gone. `runStartupRecovery()` scans sidecar files to find orphan worktrees -- so the orphaned worktree is invisible and accumulates indefinitely.
+**Fix:** In the success path cleanup, delete the sidecar AFTER worktree removal, not before. Or better: always attempt worktree removal in a `try/finally` that deletes the sidecar regardless of whether removal succeeded.
+**File:** `src/trigger/trigger-router.ts`, `maybeRunDelivery()` success path.
+**Priority:** Medium. Worktrees are small, but the leak is permanent across daemon restarts.
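A sketch of the first option in the fix above (sidecar deleted only after worktree removal succeeds). `CleanupDeps` and `cleanupAfterDelivery` are hypothetical names; the actual helpers used by `maybeRunDelivery()` are not shown in this diff.

```typescript
// Hypothetical dependency interface for illustration only.
interface CleanupDeps {
  removeWorktree(sessionId: string): void; // may throw on git/network failure
  deleteSidecar(sessionId: string): void;
}

// Deletes the sidecar only after the worktree is confirmed gone, so a failed
// removal leaves the sidecar in place for startup recovery to find.
function cleanupAfterDelivery(sessionId: string, deps: CleanupDeps): boolean {
  try {
    deps.removeWorktree(sessionId);
  } catch {
    return false; // sidecar intentionally kept
  }
  deps.deleteSidecar(sessionId);
  return true;
}
```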
+---
+## queue-poll.jsonl never rotated (Apr 21, 2026, from Audit 5)
+**The bug:** `~/.workrail/queue-poll.jsonl` grows indefinitely. `appendFile`-only, no rotation. At 5-minute poll intervals with 2-3 events per cycle, this is ~8-87 MB/month depending on activity. Disk exhaustion risk on long-running daemons.
+**Fix:** Add a size check before appending in `appendQueuePollLog()`. If file exceeds 10 MB, rotate it: rename to `queue-poll.jsonl.1`, start fresh. Keep at most 2 rotated files.
+**File:** `src/trigger/polling-scheduler.ts`, `appendQueuePollLog()` function.
+**Priority:** Medium. Not urgent but a production correctness issue.
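The size-check-then-rotate fix could be sketched like this. `appendWithRotation` is a hypothetical name (the real function is `appendQueuePollLog()`, whose signature is not shown), and synchronous `fs` calls are used only for brevity.

```typescript
import * as fs from "fs";

const MAX_BYTES = 10 * 1024 * 1024; // 10 MB threshold from the fix above
const MAX_ROTATED = 2; // keep at most <file>.1 and <file>.2

function appendWithRotation(file: string, line: string, maxBytes = MAX_BYTES): void {
  let size = 0;
  try {
    size = fs.statSync(file).size;
  } catch {
    // File does not exist yet; nothing to rotate.
  }
  if (size > maxBytes) {
    // Shift older files up (.1 -> .2); the previous .2 is overwritten.
    for (let i = MAX_ROTATED - 1; i >= 1; i--) {
      const from = `${file}.${i}`;
      if (fs.existsSync(from)) fs.renameSync(from, `${file}.${i + 1}`);
    }
    fs.renameSync(file, `${file}.1`);
  }
  fs.appendFileSync(file, line + "\n", "utf8");
}
```

Because the check runs before each append, the live file can exceed the threshold by at most one line before it is rotated.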
+---
+## ReviewSeverity missing assertNever + stderr bypassing injected dep (Apr 21, 2026, from Audit 2)
+**Bug 1 (Major):** In `src/coordinators/modes/implement-shared.ts`, the `switch(findings.severity)` over `ReviewSeverity` has no `default: assertNever(findings.severity)`. Widening `ReviewSeverity` with a new variant compiles cleanly and falls through silently.
+**Fix:** Add `default: assertNever(findings.severity)` to the switch.
+**Bug 2 (Major):** In `src/coordinators/pr-review.ts`, `readVerdictArtifact()` calls `process.stderr.write(...)` directly instead of using the injected `deps.stderr`. Tests that inject a fake dep will miss this log.
+**Fix:** Replace `process.stderr.write(...)` with `deps.stderr(...)`.
+**Files:** `src/coordinators/modes/implement-shared.ts`, `src/coordinators/pr-review.ts`.
+**Priority:** Medium. Correctness issues that won't crash in production but make future refactors unsafe.
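For Bug 1, the exhaustiveness pattern looks roughly like the following. The `ReviewSeverity` variants and `shouldBlockMerge` are hypothetical, since the real union and switch body are not shown in this diff; only the `default: assertNever(...)` idea comes from the text.

```typescript
// Hypothetical variants; the real ReviewSeverity union is not shown.
type ReviewSeverity = "minor" | "major" | "blocking";

function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${String(value)}`);
}

function shouldBlockMerge(severity: ReviewSeverity): boolean {
  switch (severity) {
    case "minor":
      return false;
    case "major":
    case "blocking":
      return true;
    default:
      // Adding a new variant to ReviewSeverity without a case above
      // makes this line a compile error instead of a silent fall-through.
      return assertNever(severity);
  }
}
```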

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@exaudeus/workrail",
-  "version": "3.59.4",
+  "version": "3.59.6",
   "description": "Step-by-step workflow enforcement for AI agents via MCP",
   "license": "MIT",
   "repository": {