npm - @exaudeus/workrail - Versions diffs - 3.67.0 → 3.68.1 - Mend

@exaudeus/workrail 3.67.0 → 3.68.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (144)

package/dist/application/services/compiler/template-registry.js +10 -1
package/dist/cli/commands/worktrain-init.js +1 -1
package/dist/console-ui/assets/{index-tOl8Vowf.js → index-DPdRJHMX.js} +1 -1
package/dist/console-ui/index.html +1 -1
package/dist/coordinators/modes/full-pipeline.js +4 -4
package/dist/coordinators/modes/implement-shared.js +5 -5
package/dist/coordinators/modes/implement.js +4 -4
package/dist/coordinators/pr-review.js +4 -4
package/dist/daemon/workflow-runner.d.ts +1 -0
package/dist/daemon/workflow-runner.js +1 -0
package/dist/manifest.json +31 -31
package/dist/mcp/handlers/v2-context-budget.js +18 -0
package/dist/mcp/handlers/v2-workflow.js +1 -1
package/dist/mcp/workflow-protocol-contracts.js +2 -2
package/dist/v2/durable-core/constants.d.ts +2 -0
package/dist/v2/durable-core/constants.js +2 -1
package/dist/v2/projections/session-metrics.js +1 -1
package/docs/authoring-v2.md +4 -4
package/docs/changelog-recent.md +3 -3
package/docs/configuration.md +1 -1
package/docs/design/adaptive-coordinator-context-candidates.md +1 -1
package/docs/design/adaptive-coordinator-context.md +1 -1
package/docs/design/adaptive-coordinator-routing-candidates.md +18 -18
package/docs/design/adaptive-coordinator-routing-review.md +1 -1
package/docs/design/adaptive-coordinator-routing.md +34 -34
package/docs/design/agent-cascade-protocol.md +2 -2
package/docs/design/console-daemon-separation-discovery.md +323 -0
package/docs/design/context-assembly-design-candidates.md +1 -1
package/docs/design/context-assembly-implementation-plan.md +1 -1
package/docs/design/context-assembly-layer.md +2 -2
package/docs/design/context-assembly-review-findings.md +1 -1
package/docs/design/coordinator-access-audit.md +293 -0
package/docs/design/coordinator-architecture-audit.md +62 -0
package/docs/design/coordinator-error-handling-audit.md +240 -0
package/docs/design/coordinator-testability-audit.md +426 -0
package/docs/design/daemon-architecture-discovery.md +1 -1
package/docs/design/daemon-console-separation-discovery.md +242 -0
package/docs/design/daemon-memory-audit.md +203 -0
package/docs/design/design-candidates-console-daemon-separation.md +256 -0
package/docs/design/design-candidates-discovery-loop-fix.md +141 -0
package/docs/design/design-review-findings-console-daemon-separation.md +106 -0
package/docs/design/design-review-findings-discovery-loop-fix.md +81 -0
package/docs/design/discovery-loop-fix-candidates.md +161 -0
package/docs/design/discovery-loop-fix-design-review.md +106 -0
package/docs/design/discovery-loop-fix-validation.md +258 -0
package/docs/design/discovery-loop-investigation-A.md +188 -0
package/docs/design/discovery-loop-investigation-B.md +287 -0
package/docs/design/exploration-workflow-candidates.md +205 -0
package/docs/design/exploration-workflow-design-review.md +166 -0
package/docs/design/exploration-workflow-discovery.md +443 -0
package/docs/design/ide-context-files-candidates.md +231 -0
package/docs/design/ide-context-files-design-review.md +85 -0
package/docs/design/ide-context-files.md +615 -0
package/docs/design/implementation-plan-discovery-loop-fix.md +199 -0
package/docs/design/implementation-plan-queue-poll-rotation.md +102 -0
package/docs/design/in-process-http-audit.md +190 -0
package/docs/design/layer3b-ghost-nodes-design-candidates.md +2 -2
package/docs/design/loadSessionNotes-candidates.md +108 -0
package/docs/design/loadSessionNotes-test-coverage-discovery.md +297 -0
package/docs/design/loadSessionNotes-test-coverage-session4.md +209 -0
package/docs/design/loadSessionNotes-test-coverage-v3.md +321 -0
package/docs/design/probe-session-design-candidates.md +261 -0
package/docs/design/probe-session-phase0.md +490 -0
package/docs/design/routines-guide.md +7 -7
package/docs/design/session-metrics-attribution-candidates.md +250 -0
package/docs/design/session-metrics-attribution-design-review.md +115 -0
package/docs/design/session-metrics-attribution-discovery.md +319 -0
package/docs/design/session-metrics-candidates.md +227 -0
package/docs/design/session-metrics-design-review.md +104 -0
package/docs/design/session-metrics-discovery.md +454 -0
package/docs/design/spawn-session-debug.md +202 -0
package/docs/design/trigger-validator-candidates.md +214 -0
package/docs/design/trigger-validator-review.md +109 -0
package/docs/design/trigger-validator-shaping-phase0.md +239 -0
package/docs/design/trigger-validator.md +454 -0
package/docs/design/v2-core-design-locks.md +2 -2
package/docs/design/workflow-extension-points.md +15 -15
package/docs/design/workflow-id-validation-at-startup.md +1 -1
package/docs/design/workflow-id-validation-implementation-plan.md +2 -2
package/docs/design/workflow-trigger-lifecycle-audit.md +175 -0
package/docs/design/worktrain-task-queue-candidates.md +5 -5
package/docs/design/worktrain-task-queue.md +4 -4
package/docs/discovery/coordinator-script-design.md +1 -1
package/docs/discovery/coordinator-ux-discovery.md +3 -3
package/docs/discovery/simulation-report.md +1 -1
package/docs/discovery/workflow-modernization-discovery.md +326 -0
package/docs/discovery/workflow-selection-for-discovery-tasks.md +33 -33
package/docs/discovery/worktrain-status-briefing.md +1 -1
package/docs/discovery/wr-discovery-goal-reframing.md +1 -1
package/docs/docker.md +1 -1
package/docs/ideas/backlog.md +227 -0
package/docs/ideas/third-party-workflow-setup-design-thinking.md +1 -1
package/docs/integrations/claude-code.md +5 -5
package/docs/integrations/firebender.md +1 -1
package/docs/plans/agentic-orchestration-roadmap.md +2 -2
package/docs/plans/mr-review-workflow-redesign.md +9 -9
package/docs/plans/ui-ux-workflow-design-candidates.md +4 -4
package/docs/plans/ui-ux-workflow-discovery.md +2 -2
package/docs/plans/workflow-categories-candidates.md +8 -8
package/docs/plans/workflow-categories-discovery.md +4 -4
package/docs/plans/workflow-modernization-design.md +430 -0
package/docs/plans/workflow-staleness-detection-candidates.md +11 -11
package/docs/plans/workflow-staleness-detection-review.md +4 -4
package/docs/plans/workflow-staleness-detection.md +9 -9
package/docs/plans/workrail-platform-vision.md +3 -3
package/docs/reference/agent-context-cleaner-snippet.md +1 -1
package/docs/reference/agent-context-guidance.md +4 -4
package/docs/reference/context-optimization.md +2 -2
package/docs/roadmap/now-next-later.md +2 -2
package/docs/roadmap/open-work-inventory.md +16 -16
package/docs/workflows.md +31 -31
package/package.json +1 -1
package/spec/workflow-tags.json +47 -47
package/workflows/adaptive-ticket-creation.json +16 -16
package/workflows/architecture-scalability-audit.json +22 -22
package/workflows/bug-investigation.agentic.v2.json +3 -3
package/workflows/classify-task-workflow.json +1 -1
package/workflows/coding-task-workflow-agentic.json +6 -6
package/workflows/cross-platform-code-conversion.v2.json +8 -8
package/workflows/document-creation-workflow.json +8 -8
package/workflows/documentation-update-workflow.json +8 -8
package/workflows/intelligent-test-case-generation.json +2 -2
package/workflows/learner-centered-course-workflow.json +2 -2
package/workflows/mr-review-workflow.agentic.v2.json +4 -4
package/workflows/personal-learning-materials-creation-branched.json +8 -8
package/workflows/presentation-creation.json +5 -5
package/workflows/production-readiness-audit.json +1 -1
package/workflows/relocation-workflow-us.json +31 -31
package/workflows/routines/context-gathering.json +1 -1
package/workflows/routines/design-review.json +1 -1
package/workflows/routines/execution-simulation.json +1 -1
package/workflows/routines/feature-implementation.json +3 -3
package/workflows/routines/final-verification.json +1 -1
package/workflows/routines/hypothesis-challenge.json +1 -1
package/workflows/routines/ideation.json +1 -1
package/workflows/routines/parallel-work-partitioning.json +3 -3
package/workflows/routines/philosophy-alignment.json +2 -2
package/workflows/routines/plan-analysis.json +1 -1
package/workflows/routines/plan-generation.json +1 -1
package/workflows/routines/tension-driven-design.json +6 -6
package/workflows/scoped-documentation-workflow.json +26 -26
package/workflows/ui-ux-design-workflow.json +14 -14
package/workflows/workflow-diagnose-environment.json +1 -1
package/workflows/workflow-for-workflows.json +1 -1

package/docs/design/design-review-findings-console-daemon-separation.md ADDED

@@ -0,0 +1,106 @@
+# Design Review Findings: Console-Daemon Separation
+**Selected Design:** Candidate A -- Delete daemon-console.ts
+**Reviewed:** 2026-04-21
+**Status:** Raw review material for main agent synthesis
+---
+## Tradeoff Review
+### Tradeoff 1: CLI commands may fail if daemon runs without console
+- `worktrain spawn`, `worktrain await`, `worktrain-trigger-poll` discover the console port via `daemon-console.lock`. If the daemon no longer auto-starts the console, these commands will get ECONNREFUSED when the user has not run `worktrain console` separately.
+- **Verdict:** Acceptable. Failure mode is explicit (ECONNREFUSED), not silent. The lock-file discovery still works -- it just requires the user to start the console. Not a violation of acceptance criteria.
+- **Condition under which it fails:** An automated pipeline that expects the daemon to auto-start the console. Would need pipeline update.
+### Tradeoff 2: Browser dispatch returns 503 with a confusing message
+- Current message: "Autonomous dispatch requires v2 tools enabled." This is wrong for standalone console context -- the daemon IS enabled, but standalone console has no live v2 context.
+- **Verdict:** MUST FIX as part of implementation. Not optional. The message should clearly direct users to run `worktrain console` alongside the daemon or use the CLI.
+- **Condition under which it fails:** If left with the current message, users will be confused when they click the dispatch button with the standalone console running.
+---
+## Failure Mode Review
+| Failure Mode | Handled? | Mitigation |
+|---|---|---|
+| Browser dispatch 503 confusing message | REQUIRES FIX | Change message in console-routes.ts POST /api/v2/auto/dispatch |
+| CLI spawn/await/poll ECONNREFUSED | ACCEPTABLE | Clear error, users know to start console |
+| Lock file race (old daemon-console.ts and standalone conflict) | NOT APPLICABLE | daemon-console.ts is deleted |
+| Daemon tests referencing startDaemonConsole | CLEANUP NEEDED | Delete tests/unit/daemon-console.test.ts |
+**Most dangerous failure mode:** The 503 message confusion. All others are low risk.
+---
+## Runner-Up / Simpler Alternative Review
+**Runner-up (Candidate C -- proxy):** One element worth borrowing: making the 503 response more actionable. Instead of a generic message, distinguish between "standalone console (daemon not available)" and "daemon console with working dispatch". The improvement is 2 lines in console-routes.ts. Proxy routes from Candidate C are not needed.
+**Simpler variant of Candidate A:** Candidate A without the 503 message fix. Satisfies structural acceptance criteria but leaves confusing UX. Not recommended -- the message fix is too small to omit.
+**Hybrid:** Candidate A + improved 503 message = the correct implementation. This is what is recommended.
+---
+## Philosophy Alignment
+| Principle | Status | Notes |
+|---|---|---|
+| Architectural fixes over patches | SATISFIED | Deleting root cause, not patching |
+| YAGNI with discipline | SATISFIED | Net -220 lines |
+| Make illegal states unrepresentable | SATISFIED | Dual-console ambiguity eliminated |
+| Dependency injection for boundaries | SATISFIED | standalone-console.ts already uses DI correctly |
+| Errors are data | ACCEPTABLE TENSION | 503 HTTP response is correct at HTTP boundary |
+No risky philosophy tensions.
+---
+## Findings
+### RED (Blocking)
+None.
+### ORANGE (Must Address Before Implementation)
+**O1: 503 error message is wrong for standalone console context**
+- Location: `src/v2/usecases/console-routes.ts`, POST /api/v2/auto/dispatch handler, line ~758
+- Current: "Autonomous dispatch requires v2 tools enabled."
+- Required: "Autonomous dispatch requires the WorkTrain daemon. Run `worktrain console` alongside `worktrain daemon`, or use the `worktrain dispatch` CLI command."
+- Why orange: UX-blocking confusion for the primary user. Small fix but must be done.
+### YELLOW (Should Address, Not Blocking)
+**Y1: trigger-listener.ts comment still mentions startDaemonConsole**
+- Location: `src/trigger/trigger-listener.ts`, TriggerListenerHandle interface JSDoc (lines 81-88)
+- The JSDoc says "Pass to startDaemonConsole() so POST /sessions/:id/steer can dispatch steers..."
+- After Candidate A, this guidance is stale. Remove the `steerRegistry` JSDoc reference to `startDaemonConsole`.
+**Y2: cli-worktrain.ts daemon startup may have a console handle cleanup path**
+- Location: `src/cli-worktrain.ts`, around line 439 -- `consoleHandle` variable used in signal handlers
+- After removing `startDaemonConsole()` call, verify the SIGINT/SIGTERM handler correctly cleans up without attempting `consoleHandle.stop()`.
+**Y3: launchd service definition may need updating**
+- If the owner runs `worktrain daemon` via launchd, a second launchd plist (or a wrapper script) is needed to also start `worktrain console`. This is documentation/deployment work, not code.
+---
+## Recommended Revisions
+1. **Delete `src/trigger/daemon-console.ts`** (220 lines)
+2. **Delete `tests/unit/daemon-console.test.ts`** (tests for deleted file)
+3. **In `src/cli-worktrain.ts`**: Remove lines 370, 431-449 (startDaemonConsole import and call); update signal handler to not reference `consoleHandle`
+4. **In `src/v2/usecases/console-routes.ts`**: Update POST /api/v2/auto/dispatch 503 message (O1)
+5. **In `src/trigger/trigger-listener.ts`**: Remove stale JSDoc reference to `startDaemonConsole` (Y1)
+6. **(Optional)** Document launchd startup change in `docs/configuration.md`
+---
+## Residual Concerns
+**RC1: The `worktrain dispatch` CLI command must exist or be easy to use**
+The improved 503 message references `worktrain dispatch`. If this command does not exist, the message is misleading. Verify that a dispatch CLI command exists (`worktrain-trigger-poll.ts` does force-poll, but does a dispatch-by-workflow-id CLI exist?). If not, the 503 message should be adjusted to remove the CLI reference.
+**RC2: Owner may not know to run both `worktrain daemon` and `worktrain console` separately**
+This is a documentation and UX concern, not a code concern. The daemon startup log should include a line like "[DaemonConsole] Start the WorkRail console with: worktrain console" when the console is not detected. This would be a small addition to cli-worktrain.ts after the trigger listener starts.
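
Reader's note (not part of the package diff): finding O1 and the Candidate C borrowing above amount to choosing the 503 body from the console's context. A minimal sketch follows, assuming a two-state context type. `DispatchContext`, `DispatchUnavailable`, `dispatchUnavailable`, and the `{ error }` body shape are hypothetical names for illustration, not identifiers from `console-routes.ts`. Only the 503 status and the message wording come from the review.

```typescript
// Sketch only: all type and function names here are hypothetical.

type DispatchContext =
  | { readonly kind: 'daemon_attached' }      // live v2 context, dispatch can proceed
  | { readonly kind: 'standalone_console' };  // console started without the daemon

interface DispatchUnavailable {
  readonly status: 503;
  readonly body: { readonly error: string }; // body shape is an assumption
}

// Wording proposed in finding O1. RC1 notes that the `worktrain dispatch`
// reference should be dropped if that command does not exist.
const STANDALONE_MESSAGE =
  'Autonomous dispatch requires the WorkTrain daemon. ' +
  'Run `worktrain console` alongside `worktrain daemon`, ' +
  'or use the `worktrain dispatch` CLI command.';

// Returns null when dispatch may proceed, otherwise the 503 payload to send.
function dispatchUnavailable(ctx: DispatchContext): DispatchUnavailable | null {
  switch (ctx.kind) {
    case 'daemon_attached':
      return null;
    case 'standalone_console':
      return { status: 503, body: { error: STANDALONE_MESSAGE } };
  }
}
```

Keeping the decision in a pure function means the message can be unit-tested without starting an HTTP server; the route handler would only forward the returned status and body.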

package/docs/design/design-review-findings-discovery-loop-fix.md ADDED

@@ -0,0 +1,81 @@
+# Design Review Findings: Discovery Loop Fix
+**Date:** 2026-04-19
+**Design reviewed:** Candidate A (exact spec implementation)
+---
+## Tradeoff Review
+| Tradeoff | Assessment | Verdict |
+|---|---|---|
+| `checkIdempotency` gains TTL behavior | Time-dependent, but TTL is read from file -- no `Date.now()` mock needed in tests | Acceptable |
+| `applyGitHubLabel` adds I/O in fire-and-forget callback | Non-fatal, warn logged, does not block cycle | Acceptable |
+| Optional 5th param on `spawnSession` | All test fakes use `vi.fn()` without strict param-count assertions | Acceptable |
+| Malformed sidecar -> conservative 'active' | Controlled write path; TTL handles cleanup on successful parse | Acceptable |
+---
+## Failure Mode Review
+| Failure Mode | Coverage | Risk |
+|---|---|---|
+| GitHub token expired when applying label | Non-fatal (warn), loop could restart | Low -- config error, warn is the signal |
+| Sidecar write fails | Dispatch proceeds, Fix 2 label is primary guard | Low |
+| Sidecar delete fails | TTL handles cleanup after 56 min | Low |
+| Daemon crash between sidecar write and dispatch | Sidecar blocks re-dispatch until TTL expires | Low -- correct behavior |
+| Fix 2 deployed without Fix 1 | Single PR strategy eliminates this risk | Mitigated |
+| New PipelineOutcome kind in future | If-check safe -- new kinds don't trigger label | Low |
+---
+## Runner-Up / Simpler Alternative Review
+- **Candidate B** (separate sidecar check function): Rejected -- creates coordination risk, no benefit over single function
+- **Skip Fix 3**: Cannot satisfy acceptance criteria (sidecar test cases required)
+- **No hybrid needed**: Candidate A is already minimal
+---
+## Philosophy Alignment
+| Principle | Status |
+|---|---|
+| Immutability by default | Satisfied -- all new fields `readonly` |
+| Type safety | Satisfied -- `Promise<PipelineOutcome>` replaces `Promise<unknown>` |
+| Errors are data | Satisfied with acceptable fire-and-forget exception for non-fatal I/O |
+| Dependency injection | Satisfied -- `fetchFn` injected |
+| Exhaustiveness | Satisfied -- all PipelineOutcome kinds handled |
+| YAGNI | Satisfied -- no extra abstractions |
+---
+## Findings
+### Yellow (observe, no change required)
+**Y1: `checkIdempotency` JSDoc needs updating**
+The function comment currently describes only session file scanning. With sidecar file scanning added, the comment should document the new sidecar file format and TTL behavior. Update the JSDoc before shipping.
+**Y2: `applyGitHubLabel` POST body format**
+GitHub Labels API `POST /repos/:owner/:repo/issues/:issue_number/labels` requires body `{ "labels": ["label-name"] }`. Verify the implementation sends JSON with `Content-Type: application/json` header.
+**Y3: Sidecar delete in .catch() handler**
+The `.catch()` handler currently only calls `this.dispatchingIssues.delete(top.issue.number)`. The sidecar delete must also happen in `.catch()` to avoid a stale sidecar when the pipeline promise rejects (not escalates).
+---
+## Recommended Revisions
+1. Update `checkIdempotency` JSDoc to describe sidecar file format and TTL behavior
+2. Add `Content-Type: application/json` to `applyGitHubLabel` request headers
+3. Ensure sidecar is deleted in BOTH `.then()` and `.catch()` handlers
+None of these are blockers -- all are implementation details that can be addressed during coding.
+---
+## Residual Concerns
+- **Token rotation**: If the GitHub token is rotated and the env var is not updated, `applyGitHubLabel` will fail on every call without stopping the poll cycle. The warn log is the only signal. This is an operational concern, not a code concern.
+- **Sidecar file accumulation**: If both `.then()` and `.catch()` fail to delete the sidecar (e.g., permissions issue), sidecars accumulate. Once their TTL passes they no longer block dispatch, but the files are never removed from disk. Low risk -- cleanup could be added to daemon startup as a follow-on.
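
Reader's note (not part of the package diff): two points from the review above can be sketched as pure helpers, namely the "malformed sidecar -> conservative 'active'" tradeoff and the Y2 request shape. This is a sketch under stated assumptions: the sidecar field names `dispatchedAtMs` and `ttlMs`, and the helper names `sidecarStatus` and `buildLabelRequest`, are hypothetical, since the diff does not show the real sidecar format. The endpoint path and the `{ "labels": [...] }` body follow Y2, and the `Accept` and `Authorization` headers follow GitHub's documented REST conventions.

```typescript
// Sketch only: field names and helper names are assumptions.

type SidecarStatus = 'active' | 'expired';

// Classifies a sidecar file's raw contents. Anything that cannot be parsed
// is treated as 'active', so a corrupt file blocks re-dispatch rather than
// allowing a duplicate one.
function sidecarStatus(raw: string, nowMs: number): SidecarStatus {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return 'active';
  }
  if (typeof parsed !== 'object' || parsed === null) return 'active';
  const { dispatchedAtMs, ttlMs } = parsed as {
    dispatchedAtMs?: unknown;
    ttlMs?: unknown;
  };
  if (typeof dispatchedAtMs !== 'number' || typeof ttlMs !== 'number') {
    return 'active';
  }
  return nowMs - dispatchedAtMs >= ttlMs ? 'expired' : 'active';
}

interface LabelRequest {
  readonly url: string;
  readonly init: {
    readonly method: 'POST';
    readonly headers: Readonly<Record<string, string>>;
    readonly body: string;
  };
}

// Builds the request for POST /repos/:owner/:repo/issues/:issue_number/labels.
function buildLabelRequest(
  owner: string,
  repo: string,
  issueNumber: number,
  labels: readonly string[],
  token: string,
): LabelRequest {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/issues/${issueNumber}/labels`,
    init: {
      method: 'POST',
      headers: {
        Accept: 'application/vnd.github+json',
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ labels }),
    },
  };
}
```

With an injected `fetchFn`, the caller would pass `req.url` and `req.init` straight through, which keeps the network call out of the unit tests. Passing `nowMs` as a parameter is a choice made for this sketch; the review itself notes that the TTL is read from the file, so tests can also control expiry through the sidecar contents.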

package/docs/design/discovery-loop-fix-candidates.md ADDED

@@ -0,0 +1,161 @@
+# Discovery Loop Fix -- Design Candidates
+**Date:** 2026-04-21
+**Context:** Candidates generated as part of `wr.discovery` workflow validation. Final recommendations live in `discovery-loop-fix-validation.md`.
+---
+## Problem Understanding
+**Core tensions:**
+1. Fire-and-forget dispatch (scalability) vs. outcome-aware cleanup (correctness). The existing `void dispatchP` pattern is right for responsiveness but wrong when cleanup requires type information from the result.
+2. In-memory state (fast, zero I/O) vs. persistent state (crash-safe, cross-restart). `dispatchingIssues` Set is fine for within-process dedup but fails on daemon restart.
+3. Coordinator budget vs. runner default. `DISCOVERY_TIMEOUT_MS=55min` (adaptive-pipeline.ts:39) and `DEFAULT_SESSION_TIMEOUT_MINUTES=30` (workflow-runner.ts:83) are owned by different concerns with no structural coupling.
+**Likely seam:** The queue-poller/pipeline boundary in `polling-scheduler.ts`. The poller fires a `Promise<unknown>` and discards the typed `PipelineOutcome`. This is where the fix belongs.
+**What makes it hard:**
+- Three root causes are independent but interact -- fixing only one leaves the others active
+- The idempotency guard `checkIdempotency` looks correct but is dead because session files never contain the `context` field it looks for
+- The `Promise<unknown>` cast is a one-line antipattern that silently disables the type system
+---
+## Philosophy Constraints
+From `CLAUDE.md` and observed repo patterns:
+- **Type safety as first line of defense** -- `Promise<unknown>` cast violated this; fix must restore it
+- **Make illegal states unrepresentable** -- session timeout and coordinator budget should be structurally coupled, not independently configured
+- **Errors are data** -- `PipelineOutcome` is a proper Result-style discriminated union; the caller must inspect it
+- **Exhaustiveness everywhere** -- the discriminated union demands all `kind` values be handled
+- **YAGNI with discipline** -- incident response should be minimal; architectural refactors are follow-on
+---
+## Impact Surface
+- `queueConfig.excludeLabels`: adding a label changes which issues are globally excluded from selection
+- `CoordinatorDeps.spawnSession` interface in `pr-review.ts`: changing the signature requires updating all implementations and test fakes
+- GitHub API calls from `polling-scheduler.ts`: new failure mode -- API error on label write must not crash the daemon
+- `full-pipeline.ts` spawn sites: all three (discovery, shaping, coding) need the `agentConfig` threading for completeness
+---
+## Candidates
+### Candidate 1 -- Minimal Viable: Fix 2 only (PipelineOutcome inspection + label)
+**Summary:** In `polling-scheduler.ts`, change the `dispatchAdaptivePipeline` cast from `Promise<unknown>` to `Promise<PipelineOutcome>`, add `(outcome)` parameter to `.then()`, and on `outcome.kind === 'escalated'` apply `worktrain:blocked` label to the issue via GitHub API. Add `worktrain:blocked` to `queueConfig.excludeLabels`.
+**Tensions resolved:** Loop prevention via persistent external signal. Fire-and-forget semantics preserved (label write is inside non-blocking `.then()`).
+**Tensions accepted:** Session timeout mismatch not fixed. Discovery still dies at 30 min; loop stops by label rather than by completion. Future new issues will also be immediately labeled `worktrain:blocked` after their first 30-minute timeout.
+**Boundary:** Queue-poller/pipeline boundary in `polling-scheduler.ts`. Correct seam -- this is where the issue lifecycle decision lives.
+**Why this boundary is best-fit:** The poller owns issue selection and skipping. Adding label writes to the completion handler keeps the concern co-located with the dispatch decision.
+**Failure mode:** GitHub API error on label write. Must be caught and logged; daemon must not crash. Add try/catch around the label write.
+**Repo-pattern relationship:** Adapts existing `excludeLabels` check and `worktrain:in-progress` label pattern. No new patterns introduced.
+**Gains:** Stops the loop immediately with ~20 lines of code change. Single file. Deployable today.
+**Losses:** Sessions still time out -- future issues will hit the same 30-minute wall. Every issue that times out needs manual label removal before it can be retried.
+**Scope judgment:** Best-fit for incident response. Too narrow for a production fix.
+**Philosophy fit:** Restores type safety (removes `unknown` cast). Honors 'errors are data'. YAGNI-compatible for an incident fix.
+---
+### Candidate 2 -- Complete Fix Set: Fixes 1+2+3
+**Summary:** Apply all three fixes. (1) Extend `CoordinatorDeps.spawnSession` to accept 5th param `agentConfig?: { readonly maxSessionMinutes?: number }` and forward it to `routerRef.dispatch()` in `trigger-listener.ts`; update all three spawn sites in `full-pipeline.ts` with the correct per-phase timeout (discovery=55, shaping=35, coding=65). (2) Fix 2 as above. (3) Write issue-ownership sidecar file to `~/.workrail/daemon-sessions/<issueNumber>-queue.json` in `doPollGitHubQueue` before dispatch, delete it in `.then`/`.catch`.
+**Tensions resolved:** All three root causes addressed. Session completes successfully. Cross-restart safety via sidecar + label.
+**Tensions accepted:** Larger changeset requiring test fake updates. Sidecar write adds new I/O path.
+**Boundary:** Three seams simultaneously -- session timeout (trigger-listener + full-pipeline), outcome discard (polling-scheduler), idempotency (polling-scheduler + doPollGitHubQueue).
+**Why this boundary is best-fit:** Each fix is at the correct seam. Fixing all three is coherent because they address separate failure modes in the same execution path.
+**Failure mode:** Sidecar write failure could in theory block dispatch -- must be fire-and-forget (log error, continue). `agentConfig` threading requires updating `CoordinatorDeps` interface in `pr-review.ts` and all test fakes.
+**Repo-pattern relationship:** Fix 1 follows the same pattern as `context?` threading in `spawnSession`. Fix 2 restores PipelineOutcome type safety. Fix 3 introduces a new naming convention for issue-ownership sidecars (distinguished from regular session files by `-queue.json` suffix).
+**Gains:** Pipeline can actually complete successfully. Issue loop stops. Crash safety for concurrent dispatch window. All future issues benefit.
+**Losses:** Larger changeset. More test updates. Slightly more complex `.then()` handler.
+**Scope judgment:** Best-fit for a production fix. Slightly broad for incident response only (Fix 1 is not strictly needed to stop the loop).
+**Philosophy fit:** Best alignment. Restores type safety, exhaustiveness, and structural coupling of timeout values.
+---
+### Candidate 3 -- Architectural: Coordinator-Owns-Termination
+**Summary:** Move issue lifecycle management out of the queue poller and into the adaptive coordinator. After `runAdaptivePipeline` returns `PipelineOutcome`, the coordinator calls GitHub API via an injected dep to (a) apply `worktrain:blocked` + unassign on `kind === 'escalated'`, (b) close issue on `kind === 'merged'`. Queue poller becomes a pure selector -- it only reads from GitHub, never writes back.
+**Tensions resolved:** Clean separation of concerns -- selector (poller) and executor (coordinator) are fully decoupled. Coordinator owns its own cleanup.
+**Tensions accepted:** Coordinator gains a GitHub API dependency it currently does not have. Requires extending `AdaptiveCoordinatorDeps` with a `labelIssue(issueNumber, labels)` dep and threading issue context into the coordinator opts.
+**Boundary:** The coordinator's `runAdaptivePipeline` function in `adaptive-pipeline.ts` / `full-pipeline.ts`. The coordinator already has `deps.postToOutbox` for side effects -- adding `deps.labelIssue` is consistent.
+**Why this boundary is best-fit for the long run:** Whoever runs the pipeline should own its cleanup. This is the correct invariant for a production system. But today the coordinator has zero GitHub API surface.
+**Failure mode:** Requires threading `issueNumber` into `AdaptivePipelineOpts` (currently has `taskCandidate?: Readonly<Record<string, unknown>>` -- untyped). Proper threading would need a typed `issueNumber?: number` field. Invasive but clean.
+**Repo-pattern relationship:** Departs from existing pattern where GitHub API interactions live exclusively in the trigger layer. Introduces a new dep type into the coordinator.
+**Gains:** Clean architectural invariant. Coordinator is self-contained. Poller logic simplifies.
+**Losses:** Invasive change to coordinator interface. Out of scope for incident response. Adds deps to coordinator that have no tests yet.
+**Scope judgment:** Too broad for incident response. Correct long-term architecture.
+**Philosophy fit:** Honors 'architectural fixes over patches'. Conflicts with YAGNI for incident response.
+---
+## Comparison and Recommendation
+| Criterion | Candidate 1 | Candidate 2 | Candidate 3 |
+|---|---|---|---|
+| Stops the loop | Yes | Yes | Yes |
+| Pipeline can complete | No | Yes | Yes |
+| Cross-restart safe | Partially | Fully | Fully |
+| Scope | Minimal | Medium | Large |
+| Test impact | Low | Medium | High |
+| Philosophy fit | Good | Best | Good (long-term) |
+**Recommendation: Candidate 2 (complete fix set)**
+Rationale: Candidate 1 stops the loop but immediately creates a new operational burden -- every new issue that fails discovery gets permanently labeled `worktrain:blocked` and requires manual label removal. Candidate 2 is the correct production fix. Fix 1 is the necessary pairing with Fix 2; without it, every issue through FULL mode hits the 30-minute wall regardless.
+Candidate 3 is the right long-term architecture but wrong scope for an incident fix.
+---
+## Self-Critique
+**Strongest counter-argument to Candidate 2:** The signature change to `CoordinatorDeps.spawnSession` touches multiple files and test fakes. For an incident response, Candidate 1 (Fix 2 only) stops the loop with 5 minutes of work; Fix 1 can be a follow-on PR. The recommendation depends on how urgent the 'pipeline must complete' requirement is vs. 'loop must stop now'.
+**Pivot conditions:**
+- If operator can accept manual label removal for retry: deploy Candidate 1 now, Candidate 2 in the next sprint
+- If the codebase has comprehensive test coverage of `spawnSession`: use Candidate 2 immediately (the interface change is mechanical)
+- If coordinator already has GitHub API deps: Candidate 3 becomes feasible scope
+**Invalidating assumption:** If `queueConfig.excludeLabels` is not configurable at deploy time, Fix 2's label has no effect. Verify the label name appears in the config before deploying.
+---
+## Open Questions for the Main Agent
+1. Should `worktrain:blocked` be a permanent label, or should it have a TTL so transient failures can be retried autonomously after N hours?
+2. Should the sidecar file (Fix 3) use the issue number as the key (not the session ID) to survive session deletion on timeout?
+3. Should Fix 2 also remove the `worktrain:in-progress` label and unassign the bot from the issue when applying `worktrain:blocked`?
+4. Does the coordinator need to post a GitHub comment on escalation so the operator knows what happened?
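
Reader's note (not part of the package diff): the core of Fix 2 in the candidates above is inspecting a typed `PipelineOutcome` instead of discarding a `Promise<unknown>`, and catching label-write errors so the daemon keeps running. A minimal sketch follows. The union below lists only the three kinds named in this diff (`escalated`, `merged`, `dry_run`); the real type may have more members and other fields. `labelsForOutcome`, `applyOutcomeLabels`, the `LabelIssue` signature, and the log text are hypothetical.

```typescript
// Sketch only: the union members' payloads and all helper names are assumptions.

type PipelineOutcome =
  | { readonly kind: 'merged' }
  | { readonly kind: 'escalated'; readonly reason: string }
  | { readonly kind: 'dry_run' };

const BLOCKED_LABEL = 'worktrain:blocked';

// Pure decision: which labels should be written for a given outcome.
// The `never` assignment makes an unhandled future kind a compile error.
function labelsForOutcome(outcome: PipelineOutcome): readonly string[] {
  switch (outcome.kind) {
    case 'escalated':
      return [BLOCKED_LABEL];
    case 'merged':
    case 'dry_run':
      return []; // explicit no-ops
    default: {
      const unreachable: never = outcome;
      return unreachable;
    }
  }
}

// Injected side effect, so the poller can be tested without GitHub access.
type LabelIssue = (issueNumber: number, labels: readonly string[]) => Promise<void>;

// Applies the labels and reports success. A failed write is logged and
// swallowed so the poll cycle continues.
async function applyOutcomeLabels(
  issueNumber: number,
  outcome: PipelineOutcome,
  labelIssue: LabelIssue,
): Promise<boolean> {
  const labels = labelsForOutcome(outcome);
  if (labels.length === 0) return true;
  try {
    await labelIssue(issueNumber, labels);
    return true;
  } catch (err) {
    console.warn(`label write failed for issue #${issueNumber}`, err);
    return false;
  }
}
```

In the poller this would sit inside the non-blocking `.then((outcome) => ...)` handler, preserving fire-and-forget dispatch. As the invalidating assumption above points out, the label only stops re-selection if it also appears in `queueConfig.excludeLabels`.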

package/docs/design/discovery-loop-fix-design-review.md ADDED

@@ -0,0 +1,106 @@
+# Discovery Loop Fix -- Design Review Findings
+**Date:** 2026-04-21
+**Selected direction:** Candidate 2 -- complete fix set (Fix 1+2+3)
+**Context:** Review of tradeoffs, failure modes, and philosophy alignment for the selected direction.
+---
+## Tradeoff Review
+| Tradeoff | Verdict | Condition that makes it unacceptable |
+|---|---|---|
+| Interface signature change to `CoordinatorDeps.spawnSession` (5th optional param) | Acceptable | If test fakes use `as unknown as CoordinatorDeps` casts that miss the new param |
+| Sidecar write is new I/O (fire-and-forget) | Acceptable | If `~/.workrail/` is on a slow mount; negligible on local disk |
+| Permanent `worktrain:blocked` label on first escalation | Acceptable for this incident (13 consecutive timeouts) | Would be unacceptable for a single transient failure |
+---
+## Failure Mode Review
+| Failure mode | Coverage | Missing mitigation | Risk |
+|---|---|---|---|
+| GitHub API error on label write propagates to daemon | Partial -- outer `void` swallows it but log is confusing | Add explicit `catch` on label write with structured log entry `[QueuePoll] label-write-failed #N` | Medium |
+| Escalation label not in `queueConfig.excludeLabels` | Not enforced | Use `worktrain:in-progress` (already excluded) OR add startup assertion | **High** |
+| `agentConfig` threading missed for shaping/coding spawn sites | Not enforced by TypeScript | Code review checklist item for all 3 spawn sites | Medium |
+**Most dangerous:** Failure mode 2. If the label name is not in `excludeLabels`, Fix 2 applies the label visibly but has zero effect on re-selection. The loop continues and the operator sees a confusing state.
+---
+## Runner-Up / Simpler Alternative Review
+**Runner-up (Candidate 1, Fix 2 only):** Faster to deploy but operationally worse in production -- every new issue that goes through FULL mode is permanently labeled after its first escalation. Ruled out as a standalone deployment.
+**Hybrid opportunity:** Deploy Fix 2 in PR #1 (immediate loop stop), then Fix 1+3 in PR #2. The two PRs are independent. This captures Candidate 1's deployment velocity without sacrificing Candidate 2's completeness.
+**Label simplification:** Using `worktrain:in-progress` as the escalation label instead of `worktrain:blocked` eliminates the highest-risk failure mode (failure mode 2) at no cost. The semantics are slightly off (the issue is not in progress after escalation), but the choice is pragmatically correct. Recommended.
+---
+## Philosophy Alignment
+| Principle | Status |
+|---|---|
+| Type safety as first line of defense | Restored by removing `Promise<unknown>` cast |
+| Exhaustiveness everywhere | Partially satisfied -- `merged` and `dry_run` cases need explicit handling even if they are no-ops |
+| Errors are data | Satisfied -- `PipelineOutcome` is used, not discarded |
+| Make illegal states unrepresentable | Under tension -- timeout mismatch becomes structurally bounded but not fully eliminated |
+| Architectural fixes over patches | Under tension -- Candidate 3 (coordinator-owns-termination) is the correct architectural direction; this fix patches the poller boundary |
+| YAGNI with discipline | Satisfied -- Fix 3 is low-cost and addresses a real crash scenario |
+---
+## Findings
+### RED
+**Finding R1: Escalation label not in excludeLabels is a silently ineffective fix**
+If `worktrain:blocked` (or whatever label is applied on escalation) is not present in `queueConfig.excludeLabels`, Fix 2 writes the label to GitHub but has no effect on re-selection. The loop continues. The operator sees a label but no behavioral change. This is the most dangerous failure mode.
+- **Mitigation:** Use `worktrain:in-progress` (already in `excludeLabels`) as the escalation label, or add a startup runtime assertion.
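The startup-assertion option in Finding R1 could look roughly like the sketch below. Only `queueConfig.excludeLabels` and the two label names come from the review; the `QueueConfig` shape and the helper `assertEscalationLabelExcluded` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape: the review names only the `excludeLabels` field.
interface QueueConfig {
  readonly excludeLabels: readonly string[];
}

// Hypothetical helper: fail fast at startup if applying the escalation label
// would have no effect on issue re-selection.
function assertEscalationLabelExcluded(
  queueConfig: QueueConfig,
  escalationLabel: string,
): void {
  if (!queueConfig.excludeLabels.includes(escalationLabel)) {
    throw new Error(
      `Escalation label "${escalationLabel}" is not in queueConfig.excludeLabels; ` +
        'applying it would not stop re-selection.',
    );
  }
}

// Example: passes because the label is already excluded.
const exampleQueueConfig: QueueConfig = { excludeLabels: ['worktrain:in-progress'] };
assertEscalationLabelExcluded(exampleQueueConfig, 'worktrain:in-progress');
```

Running a check like this once at daemon startup would turn the silent failure into an immediate, visible configuration error, assuming the daemon has a single place where queue configuration is loaded.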
+### ORANGE
+**Finding O1: Fix 2 `.then()` handler needs explicit `catch` on label write**
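+A rejection from the GitHub API call inside `.then()` has no local handler, so it could produce a confusing unhandled-rejection warning. The failure stays out of the caller (the chain is detached with an outer `void`), but `void` only discards the promise: nothing handles the rejection, so the error is not logged in a structured form.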
+- **Mitigation:** Add `try/catch` around the label write with `console.warn('[QueuePoll] label-write-failed #N ...')`.
+**Finding O2: Exhaustiveness in `.then()` outcome handler**
+The `.then((outcome) =>` handler should handle all three `PipelineOutcome` kinds explicitly: `escalated` (apply label), `merged` (log success), `dry_run` (log, no action). Without exhaustive handling, a future change to `PipelineOutcome` could silently skip new outcome kinds.
+- **Mitigation:** Add explicit cases for `merged` and `dry_run` (even if they are just `console.log` calls), with a `never`-typed default branch, so TypeScript flags any new outcome kind at compile time.
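Findings O1 and O2 can be addressed together. The following is a minimal sketch, not the package's actual code: the three outcome kinds and the `[QueuePoll] label-write-failed` log prefix come from the review, while the `PipelineOutcome` field layout, the `AddLabel` callback type, and the `handleOutcome` function are assumptions made for illustration.

```typescript
// Hypothetical outcome shape; the review names only the three kinds.
type PipelineOutcome =
  | { kind: 'escalated'; reason: string }
  | { kind: 'merged' }
  | { kind: 'dry_run' };

// Hypothetical stand-in for the GitHub label-write call.
type AddLabel = (issueNumber: number, label: string) => Promise<void>;

async function handleOutcome(
  outcome: PipelineOutcome,
  issueNumber: number,
  addLabel: AddLabel,
  escalationLabel: string,
): Promise<string> {
  switch (outcome.kind) {
    case 'escalated':
      try {
        await addLabel(issueNumber, escalationLabel);
        return 'labeled';
      } catch (err) {
        // Structured log entry instead of an unhandled rejection.
        console.warn(`[QueuePoll] label-write-failed #${issueNumber}`, err);
        return 'label-write-failed';
      }
    case 'merged':
      console.log(`[QueuePoll] merged #${issueNumber}`);
      return 'merged';
    case 'dry_run':
      console.log(`[QueuePoll] dry-run #${issueNumber}`);
      return 'dry_run';
    default: {
      // Compile-time exhaustiveness: adding a new kind makes this assignment fail.
      const unreachable: never = outcome;
      throw new Error(`Unhandled outcome: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

A handler of this shape could be passed to the existing `.then((outcome) => ...)` call. Because the label write is awaited inside `try/catch`, a GitHub API failure becomes a log line rather than a rejection on the detached chain.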
+**Finding O3: Fix 1 must cover all three spawn sites**
+The `agentConfig` threading must be applied to `wr.discovery` (55 min), `wr.shaping` (35 min), and `wr.coding` (65 min) spawn sites in `full-pipeline.ts`. Omitting any one results in that phase inheriting the 30-minute default.
+- **Mitigation:** Code review checklist; confirm all three TIMEOUT constants are passed.
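One way to make Finding O3 harder to miss is to key the timeouts by phase so that an omitted phase is a compile-time error. This is a sketch under assumptions: the workflow ids, minute values, and the 30-minute default are taken from the finding, while `PipelinePhase`, `PHASE_TIMEOUT_MINUTES`, the `AgentConfig` shape, and `agentConfigFor` are hypothetical names.

```typescript
// Workflow ids and minute values come from Finding O3; all other names are hypothetical.
type PipelinePhase = 'wr.discovery' | 'wr.shaping' | 'wr.coding';

const DEFAULT_TIMEOUT_MINUTES = 30;

// Record<PipelinePhase, number> makes a missing phase a compile-time error.
const PHASE_TIMEOUT_MINUTES: Record<PipelinePhase, number> = {
  'wr.discovery': 55,
  'wr.shaping': 35,
  'wr.coding': 65,
};

// Hypothetical agentConfig shape carrying the per-phase timeout.
interface AgentConfig {
  readonly timeoutMinutes: number;
}

function agentConfigFor(phase: PipelinePhase): AgentConfig {
  return { timeoutMinutes: PHASE_TIMEOUT_MINUTES[phase] };
}
```

If each spawn site obtained its configuration through a single function like `agentConfigFor`, the code review checklist would reduce to confirming that no spawn site bypasses it.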
+### YELLOW
+**Finding Y1: Permanent label may be too aggressive for transient failures**
+A network outage or rate limit error would trigger the same `worktrain:blocked` label as a structural failure. For a production system with many issues, this creates operational toil.
+- **Mitigation (follow-on):** Add an N-strike mechanism (label only after 3+ consecutive escalations). Out of scope for incident fix.
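The N-strike idea in Finding Y1 might be sketched as a small counter keyed by issue number. Everything here is hypothetical (the class name, the in-memory `Map`, and the reset rule); the review specifies only that the label should be applied after 3 or more consecutive escalations.

```typescript
// Hypothetical in-memory strike counter; state is lost on daemon restart.
class EscalationStrikes {
  private readonly strikes = new Map<number, number>();
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Record one escalation; returns true once the label should be applied.
  recordEscalation(issueNumber: number): boolean {
    const next = (this.strikes.get(issueNumber) ?? 0) + 1;
    this.strikes.set(issueNumber, next);
    return next >= this.threshold;
  }

  // Any non-escalated outcome breaks the streak, keeping the count "consecutive".
  reset(issueNumber: number): void {
    this.strikes.delete(issueNumber);
  }
}

// Usage: with a threshold of 3, only the third consecutive escalation labels.
const exampleStrikes = new EscalationStrikes(3);
exampleStrikes.recordEscalation(101); // false
exampleStrikes.recordEscalation(101); // false
exampleStrikes.recordEscalation(101); // true
```

An in-memory counter is the simplest form. A production version would likely need to persist counts so that a daemon restart does not reset the streak, which is a separate design decision.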
+**Finding Y2: Sidecar file naming convention is undocumented**
+The proposed `-queue.json` suffix distinguishes issue-ownership sidecars from session sidecars, but there is no comment in `checkIdempotency` explaining the naming convention.
+- **Mitigation:** Add a comment to `checkIdempotency` noting that `*-queue.json` files are issue-ownership sidecars written by the queue poller.
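The naming convention in Finding Y2 could also be captured in code rather than only in a comment. In this sketch the `-queue.json` suffix comes from the finding; the helper names are hypothetical, and keying the file by issue number is an assumption (the design doc lists it as an open question).

```typescript
// Suffix taken from Finding Y2; helper names below are hypothetical.
const QUEUE_SIDECAR_SUFFIX = '-queue.json';

// `*-queue.json` files are issue-ownership sidecars written by the queue poller.
function isIssueOwnershipSidecar(fileName: string): boolean {
  return fileName.endsWith(QUEUE_SIDECAR_SUFFIX);
}

// Assumed key choice: the issue number, so the name does not depend on a session ID.
function queueSidecarName(issueNumber: number): string {
  return `${issueNumber}${QUEUE_SIDECAR_SUFFIX}`;
}
```

Centralizing the suffix in one constant means `checkIdempotency` and the poller cannot drift apart on the file name.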
+---
+## Recommended Revisions
+1. **Use `worktrain:in-progress` as the escalation label** (not a new `worktrain:blocked` label) to eliminate Finding R1. If semantics matter, remove `worktrain:in-progress` and add a new label in the same config change -- but do not deploy label application before config verification.
+2. **Add explicit `catch` on label write** inside `.then()` handler (Finding O1).
+3. **Handle all three `PipelineOutcome` kinds** in the `.then()` handler (Finding O2).
+4. **Verify all three spawn sites** in `full-pipeline.ts` during code review (Finding O3).
+5. **Deploy as two sequential PRs:** Fix 2 first (loop stop), then Fix 1+3 (completeness). This makes the loop fix independently verifiable.
+---
+## Residual Concerns
+- **Coordinator-owns-termination (Candidate 3)** is the architecturally correct long-term design. The current fix patches the poller. This should be logged as technical debt.
+- **N-strike mechanism** for label application would reduce operational toil for transient failures. Worth a follow-on issue.
+- **`full-pipeline.ts` spawn sites** should be verified during implementation -- the investigation docs describe them but line numbers were not directly verified in this session.