npm - @exaudeus/workrail - Versions diffs - 3.42.0 → 3.44.0 - Mend

@exaudeus/workrail 3.42.0 → 3.44.0


Files changed (36)

package/dist/console-ui/assets/{index-DwfWMKvv.js → index-Bi38ITiQ.js} +1 -1
package/dist/console-ui/index.html +1 -1
package/dist/daemon/workflow-runner.d.ts +15 -1
package/dist/daemon/workflow-runner.js +86 -9
package/dist/manifest.json +39 -23
package/dist/trigger/adapters/github-queue-poller.d.ts +34 -0
package/dist/trigger/adapters/github-queue-poller.js +200 -0
package/dist/trigger/delivery-action.d.ts +2 -0
package/dist/trigger/delivery-action.js +24 -0
package/dist/trigger/github-queue-config.d.ts +18 -0
package/dist/trigger/github-queue-config.js +155 -0
package/dist/trigger/polling-scheduler.d.ts +1 -0
package/dist/trigger/polling-scheduler.js +185 -6
package/dist/trigger/trigger-router.js +24 -1
package/dist/trigger/trigger-store.js +77 -2
package/dist/trigger/types.d.ts +19 -0
package/docs/design/adaptive-coordinator-context-candidates.md +265 -0
package/docs/design/adaptive-coordinator-context-review.md +101 -0
package/docs/design/adaptive-coordinator-context.md +504 -0
package/docs/design/adaptive-coordinator-routing-candidates.md +340 -0
package/docs/design/adaptive-coordinator-routing-design-review.md +135 -0
package/docs/design/adaptive-coordinator-routing-review.md +156 -0
package/docs/design/adaptive-coordinator-routing.md +660 -0
package/docs/design/context-assembly-layer-design-review.md +110 -0
package/docs/design/context-assembly-layer.md +622 -0
package/docs/design/stuck-escalation-candidates.md +176 -0
package/docs/design/stuck-escalation-design-review.md +70 -0
package/docs/design/stuck-escalation.md +326 -0
package/docs/design/worktrain-task-queue-candidates.md +252 -0
package/docs/design/worktrain-task-queue-design-review.md +109 -0
package/docs/design/worktrain-task-queue.md +443 -0
package/docs/design/worktree-review-findings-candidates.md +101 -0
package/docs/design/worktree-review-findings-design-review.md +65 -0
package/docs/design/worktree-review-findings-implementation-plan.md +153 -0
package/docs/ideas/backlog.md +148 -0
package/package.json +3 -3

package/docs/design/adaptive-coordinator-routing-candidates.md ADDED

@@ -0,0 +1,340 @@
+# Adaptive Coordinator Routing -- Design Candidates
+**Status:** Generated by wr.discovery workflow, 2026-04-19
+**Main design doc:** `docs/design/adaptive-coordinator-routing.md`
+**For:** Main agent review and synthesis -- this is raw investigative material, not a final decision
+---
+## Problem Understanding
+### Core tensions
+1. **Determinism vs intelligence**: Static routing is deterministic (same input, same pipeline). LLM routing is intelligent but non-deterministic. The design must decide where the determinism boundary is.
+2. **Completeness vs YAGNI**: The backlog envisions a full autonomous pipeline covering ideation through production audit. The immediate need is 4-5 pipeline modes for code-level tasks. Designing for completeness now is over-engineering.
+3. **Monolithic coordinator vs per-mode decomposition**: `pr-review.ts` is 1462 lines for one pipeline mode. Five modes in one file would be unmanageable. The right architecture decomposes into mode files with a thin dispatcher -- but this requires deciding the seam deliberately.
+4. **`recommendedPipeline` verbatim vs advisory**: If classify-task-workflow's pipeline output is authoritative, the coordinator cannot apply static overrides. If advisory, the coordinator re-implements routing and classify-task's rules become redundant for common cases.
+5. **Phase 0.5 vs coordinator routing for upstream context**: `coding-task-workflow-agentic` Phase 0.5 auto-detects `pitch.md`. The coordinator's "should I skip shaping?" routing decision partially overlaps with this detection. They must agree.
+### What the codebase already solves (and how)
+**`pr-review.ts` pattern:**
+- `CoordinatorDeps` interface (16 injectable methods) -- all I/O behind the interface, coordinator core is pure
+- `ReviewSeverity` as a discriminated union -- illegal states unrepresentable
+- `parseFindingsFromNotes()` as a pure function with two-tier strategy (structured JSON block first, keyword scan fallback)
+- Escalation-first: every failure produces `escalated: true` + `escalationReason`, never silent substitution
+- TRACE log before acting on routing decision
+**`classify-task-workflow.json`:**
+- Exists as of v3.40.0. Single LLM step, no tools, outputs `recommendedPipeline` as ordered workflow ID array
+- Output format: structured text block with `recommendedPipeline: ["...", "..."]` line
+- Note: `spawn_agent` does NOT return artifacts (v3.40.0 limitation #5) -- output must be read via `spawnSession` + `awaitSessions` + `getAgentResult` + note parsing
+**Phase 0.5 (`coding-task-workflow-agentic`):**
+- Already detects `pitch.md` and sets `solutionFixed=true`, skipping design phases
+- The coordinator's "IMPLEMENT mode" (skip discovery/shaping) and Phase 0.5 are complementary, not conflicting
+**`context-passing agent` findings (from `docs/design/adaptive-coordinator-context.md`):**
+- File-based handoff (pitch.md) already covers Shaping->Coding
+- Discovery->Shaping gap: coordinator must inject `lastStepNotes` from discovery session as `assembledContextSummary` for shaping spawn
+- This is execution logic within FULL pipeline mode, not a routing/classification concern
+### Likely seam
+The real seam is `routeTask(goal: string, workspace: string) -> PipelineMode`. This function is the heart of the routing layer. All inputs flow into it; all pipeline execution flows out of it.
+### What makes this hard
+- Note parsing for classify-task output: no typed artifact yet. Text parsing of LLM output for `recommendedPipeline` is fragile.
+- Static rule conflicts: `"fix the BLOCKING issue in PR #47"` contains a PR number (-> REVIEW_ONLY) and a severity keyword. Disambiguation needed.
+- Phase 0.5 in coding-task and coordinator-level routing can diverge: coordinator spawns shaping but coding-task's Phase 0.5 would skip design phases anyway if pitch.md appears later.
+- The number of pipeline modes is bounded for now but the backlog implies growth. The architecture must be additive without modification.
+---
+## Philosophy Constraints
+From CLAUDE.md (stated) and pr-review.ts (practiced):
+- **Immutability by default**: all interfaces readonly. `PipelineMode`, `AdaptivePipelineOpts`, `AdaptiveCoordinatorDeps` must be fully readonly.
+- **Make illegal states unrepresentable**: `PipelineMode` as a discriminated union, not a string constant. `switch(mode.kind)` with `assertNever` fallthrough.
+- **Errors are data**: `routeTask()` returns `Result<PipelineMode, string>`, never throws. Phase failures return `err(reason)`.
+- **Exhaustiveness everywhere**: switch on `PipelineMode` must handle all variants.
+- **Dependency injection for boundaries**: `AdaptiveCoordinatorDeps` injectable interface. No direct fs/fetch in coordinator core.
+- **YAGNI with discipline**: cover the 5 named modes; do not build a general pipeline engine.
+- **Determinism over cleverness**: static routing preferred; LLM as bounded fallback.
+- **Document 'why', not 'what'**: coordinator header block must explain invariants and design decisions (pr-review.ts header is the template).
+**Philosophy conflict identified:** LLM classification (all of Candidate B, and the fallback tier of Candidate C) is non-deterministic, which conflicts with "determinism over cleverness". Resolution: the static tier is deterministic; the LLM fallback is bounded and documented as a deliberate trade-off.
+---
+## Impact Surface
+- `src/cli-worktrain.ts` -- needs `worktrain run pipeline` subcommand wiring
+- `src/coordinators/pr-review.ts` -- must remain unchanged; new coordinator is additive
+- `src/trigger/types.ts` -- if Candidate D's `pipelineMode` field is added; otherwise unchanged
+- `workflows/classify-task-workflow.json` -- coordinator depends on its note output format; format changes break parsing
+- `src/coordinators/routing/route-task.ts` (new) -- pure routing function; all mode selection logic lives here
+- `src/coordinators/modes/*.ts` (new files) -- each mode's pipeline execution logic
+- Test suite: each mode coordinator needs its own unit tests with `CoordinatorDeps` fakes
+---
+## Candidates
+### Candidate A: Pure static routing with named pipeline modes
+**Summary:** `routeTask()` is a pure function that applies static rules against goal text and workspace filesystem. No LLM. Returns one of 5 `PipelineMode` variants.
+**PipelineMode type:**
+```typescript
+type PipelineMode =
+  | { kind: 'REVIEW_ONLY'; prNumbers: readonly number[] }
+  | { kind: 'QUICK_REVIEW'; prNumbers: readonly number[] }
+  | { kind: 'IMPLEMENT'; pitchPath: string }
+  | { kind: 'FULL'; goal: string }
+  | { kind: 'ESCALATE'; reason: string };
+```
+**Static rules (in priority order):**
+1. goal contains dep-bump keywords AND PR/MR number -> QUICK_REVIEW
+2. goal contains PR/MR number -> REVIEW_ONLY
+3. `.workrail/current-pitch.md` exists in workspace -> IMPLEMENT
+4. else -> FULL
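The four rules above can be sketched as a single pure function. This is a minimal illustration, not the package's implementation: it assumes the caller performs the pitch-file check and passes the result in as a boolean, and the keyword list, the PR/MR regex, and the helper name `extractPrNumbers` are hypothetical.

```typescript
type PipelineMode =
  | { kind: 'REVIEW_ONLY'; prNumbers: readonly number[] }
  | { kind: 'QUICK_REVIEW'; prNumbers: readonly number[] }
  | { kind: 'IMPLEMENT'; pitchPath: string }
  | { kind: 'FULL'; goal: string }
  | { kind: 'ESCALATE'; reason: string };

// Assumed values, for illustration only.
const DEP_BUMP_KEYWORDS: readonly string[] = ['bump', 'chore:', 'dependabot'];
const PITCH_PATH = '.workrail/current-pitch.md';

// Collects the numbers that follow an explicit "PR #" or "MR !" prefix.
function extractPrNumbers(goal: string): readonly number[] {
  const pattern = /\b(?:PR\s*#|MR\s*!?)(\d+)/gi;
  const found: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(goal)) !== null) {
    found.push(Number(match[1]));
  }
  return found;
}

// Rules are checked in priority order; the first match wins.
function applyStaticRules(goal: string, pitchMdExists: boolean): PipelineMode {
  const prNumbers = extractPrNumbers(goal);
  const lower = goal.toLowerCase();
  const isDepBump = DEP_BUMP_KEYWORDS.some((k) => lower.indexOf(k) !== -1);
  if (isDepBump && prNumbers.length > 0) return { kind: 'QUICK_REVIEW', prNumbers };
  if (prNumbers.length > 0) return { kind: 'REVIEW_ONLY', prNumbers };
  if (pitchMdExists) return { kind: 'IMPLEMENT', pitchPath: PITCH_PATH };
  return { kind: 'FULL', goal };
}
```

Because rule 1 requires both a keyword and a PR/MR number, a goal such as `"bump the cache TTL to 300 seconds"` does not match it and falls through to the later rules.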
+**Per-mode pipelines:**
+- REVIEW_ONLY: `mr-review-workflow.agentic.v2` -> route by verdict
+- QUICK_REVIEW: same + light model config, no arch audit override
+- IMPLEMENT: `coding-task-workflow-agentic` (Phase 0.5 picks up pitch) -> PR -> review -> merge
+- FULL: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> PR -> review -> merge
+**Tensions resolved:** determinism, YAGNI, no LLM latency.
+**Tensions accepted:** all ambiguous tasks fall to FULL (wasteful for Medium complexity tasks that don't need full discovery).
+**Boundary:** routing function is pure, filesystem-only I/O (check for pitch.md).
+**Failure mode:** task `"fix the race condition in auth.ts"` (Medium complexity) falls to FULL and runs wr.discovery even if the task may not need it. This is wasteful but still correct behavior, not a failure.
+**Repo pattern:** follows. Pure function routing, discriminated union, escalation-first.
+**Gain:** Zero dispatch latency; fully deterministic; simplest possible implementation (~100 lines).
+**Give up:** Ambiguous Medium tasks all run FULL (discovery + shaping + coding) even when they might not need discovery.
+**Scope judgment:** Best-fit for the named use cases. Slight over-inclusion for Medium tasks.
+**Philosophy:** Fully honors CLAUDE.md. Best alignment.
+---
+### Candidate B: classify-task-workflow as authoritative source
+**Summary:** Always spawn `classify-task-workflow` first, parse `recommendedPipeline` output, execute the returned workflow sequence. Pipeline modes are not named at the coordinator level.
+**Architecture:**
+```typescript
+async function routeTask(goal, workspace, deps): Promise<Result<readonly string[], string>> {
+  const handle = await deps.spawnSession('classify-task-workflow', `Classify: ${goal}`, workspace);
+  await deps.awaitSessions([handle], CLASSIFY_TIMEOUT_MS); // 3 minutes max
+  const agentResult = await deps.getAgentResult(handle);
+  return parseRecommendedPipeline(agentResult.recapMarkdown);
+}
+```
+`parseRecommendedPipeline` is a pure function parsing the text block (two-tier: JSON array first, regex fallback).
+**Fallback:** if parsing fails, default to `['wr.discovery', 'coding-task-workflow-agentic', 'mr-review-workflow.agentic.v2']`.
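A two-tier parser of this kind might look like the sketch below. It assumes the notes contain a single `recommendedPipeline: [...]` line as described earlier; the `Result` helpers and the error strings are hypothetical, and the real note format may differ.

```typescript
type Result<T, E> = { kind: 'ok'; value: T } | { kind: 'err'; error: E };
const ok = <T>(value: T): Result<T, never> => ({ kind: 'ok', value });
const err = <E>(error: E): Result<never, E> => ({ kind: 'err', error });

function parseRecommendedPipeline(notes: string): Result<readonly string[], string> {
  // Locate the bracketed array on the recommendedPipeline line.
  const line = /recommendedPipeline:\s*(\[[^\]]*\])/.exec(notes);
  if (line === null) return err('no recommendedPipeline line found');
  const rawArray = line[1] ?? '';

  // Tier 1: the bracketed text is a valid, non-empty JSON array of strings.
  try {
    const parsed: unknown = JSON.parse(rawArray);
    if (Array.isArray(parsed) && parsed.length > 0 && parsed.every((x) => typeof x === 'string')) {
      return ok(parsed as readonly string[]);
    }
  } catch {
    // Not valid JSON (single quotes, trailing comma, ...); fall through to tier 2.
  }

  // Tier 2: regex fallback that pulls out any quoted tokens.
  const ids: string[] = [];
  const quoted = /["']([^"']+)["']/g;
  let m: RegExpExecArray | null;
  while ((m = quoted.exec(rawArray)) !== null) {
    ids.push(m[1] ?? '');
  }
  return ids.length > 0 ? ok(ids) : err('recommendedPipeline array is empty or unreadable');
}
```

Returning `err` rather than silently substituting a default keeps the choice of fallback pipeline with the caller.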
+**Tensions resolved:** intelligent routing for all tasks including ambiguous ones; single source of truth for pipeline selection rules.
+**Tensions accepted:** non-deterministic; 5-15 second LLM latency per dispatch; no typed `PipelineMode` discriminated union (pipeline is a string[] at coordinator level).
+**Boundary:** classify-task-workflow is the routing authority; coordinator is a runner.
+**Failure mode:** classify-task-workflow misclassifies a PR-only task and returns discovery+coding phases, wasting 30+ minutes. Recovery: add a pre-check for PR number before spawning classify-task (hybrid).
+**Repo pattern:** departs from determinism-over-cleverness principle. No named discriminated union.
+**Gain:** routing rules live in a workflow file -- updatable without code deployment.
+**Give up:** determinism, transparency, typed modes, dispatch speed for obvious cases.
+**Scope judgment:** Too broad -- the coordinator becomes a generic workflow runner, not a policy owner.
+**Philosophy:** Conflicts with determinism-over-cleverness and make-illegal-states-unrepresentable.
+---
+### Candidate C: Static-first with LLM fallback (hybrid, recommended for routing)
+**Summary:** Two-tier `routeTask()`: Tier 1 applies static rules (pure, covers 80% of cases); Tier 2 falls back to classify-task-workflow for ambiguous tasks and returns a `CLASSIFY_AND_RUN` mode.
+**PipelineMode type (6 variants):**
+```typescript
+type PipelineMode =
+  | { kind: 'REVIEW_ONLY'; prNumbers: readonly number[] }
+  | { kind: 'QUICK_REVIEW'; prNumbers: readonly number[] }
+  | { kind: 'IMPLEMENT'; pitchPath: string }
+  | { kind: 'FULL'; goal: string }
+  | { kind: 'CLASSIFY_AND_RUN'; classifiedPipeline: readonly string[]; goal: string }
+  | { kind: 'ESCALATE'; reason: string };
+```
+**Two-tier routing function:**
+```typescript
+async function routeTask(
+  goal: string,
+  workspace: string,
+  deps: Pick<AdaptiveCoordinatorDeps, 'spawnSession' | 'awaitSessions' | 'getAgentResult'>,
+): Promise<Result<PipelineMode, string>> {
+  // Tier 1: static (pure, no I/O except filesystem check for pitch.md)
+  const staticMode = applyStaticRules(goal, workspace);
+  if (staticMode !== null) return ok(staticMode);
+  // Tier 2: classify-task-workflow
+  const classified = await runClassification(goal, workspace, deps);
+  if (classified.kind === 'err') return err(`classification failed: ${classified.error}`);
+  return ok({ kind: 'CLASSIFY_AND_RUN', classifiedPipeline: classified.value, goal });
+}
+```
+`applyStaticRules(goal, workspace): PipelineMode | null` -- pure function, same rules as Candidate A.
+`runClassification(goal, workspace, deps): Promise<Result<readonly string[], string>>` -- same as Candidate B's routeTask.
+**CLASSIFY_AND_RUN execution:** coordinator iterates `classifiedPipeline` and spawns each workflow in sequence; unknown workflow IDs escalate with structured reason.
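One way to realize the "unknown workflow IDs escalate" rule is a pure pre-flight check that runs before any session is spawned. The sketch below is an assumption about how that could look: the allow-list contains only workflow IDs named in this document, and checking the whole pipeline up front (rather than failing mid-sequence) is a design choice of this example, not something the design states.

```typescript
type Result<T, E> = { kind: 'ok'; value: T } | { kind: 'err'; error: E };

// Hypothetical allow-list built from the workflow IDs mentioned in this document.
const KNOWN_WORKFLOW_IDS: readonly string[] = [
  'wr.discovery',
  'wr.shaping',
  'coding-task-workflow-agentic',
  'mr-review-workflow.agentic.v2',
];

// Rejects the whole pipeline if any ID is unknown, so no phase runs
// before the problem is detected.
function validateClassifiedPipeline(
  pipeline: readonly string[],
): Result<readonly string[], string> {
  if (pipeline.length === 0) {
    return { kind: 'err', error: 'classified pipeline is empty' };
  }
  const unknown = pipeline.filter((id) => KNOWN_WORKFLOW_IDS.indexOf(id) === -1);
  if (unknown.length > 0) {
    return { kind: 'err', error: `unknown workflow IDs: ${unknown.join(', ')}` };
  }
  return { kind: 'ok', value: pipeline };
}
```

The `err` string can be passed straight through as the `escalationReason`.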
+**Tensions resolved:** determinism for well-known cases; intelligence for ambiguous cases; fast (no LLM) for 80% of cases.
+**Tensions accepted:** CLASSIFY_AND_RUN mode is less typed than named modes; two-tier adds ~30 lines of complexity vs pure static.
+**Boundary:** static rules handle the policy for known cases; classify-task handles the policy for unknown cases.
+**Failure mode:** developer adds a new static rule that catches cases formerly handled by classify-task; routing changes silently for those cases. Documentation + tests mitigate this.
+**Repo pattern:** follows parseFindingsFromNotes two-tier strategy precisely.
+**Gain:** fast for common cases, intelligent for ambiguous cases, deterministic for all named modes.
+**Give up:** CLASSIFY_AND_RUN is not a named typed mode with typed data (it carries a string[] pipeline).
+**Scope judgment:** Best-fit.
+**Philosophy:** Honors all CLAUDE.md principles. Determinism-over-cleverness: static tier is deterministic; LLM fallback is explicitly bounded and documented.
+---
+### Candidate D: Explicit pipelineMode in TriggerDefinition + CLI flag (configuration-driven)
+**Summary:** Add optional `pipelineMode: 'review_only' | 'quick_review' | 'implement' | 'full' | 'auto'` to `TriggerDefinition`; CLI `--mode` flag for `worktrain run pipeline`. `auto` falls back to Candidate C.
+**New optional fields:**
+```typescript
+// In TriggerDefinition (src/trigger/types.ts):
+readonly pipelineMode?: 'review_only' | 'quick_review' | 'implement' | 'full' | 'auto';
+```
+**CLI override:**
+```
+worktrain run pipeline --task "..." --mode review_only
+worktrain run pipeline --task "..." --mode full
+worktrain run pipeline --task "..."  # auto: falls back to Candidate C hybrid
+```
+**Coordinator uses explicit mode when present; otherwise Candidate C.**
+**Tensions resolved:** removes routing ambiguity for configured triggers; fully explicit for trigger operators.
+**Tensions accepted:** configuration overhead; TriggerDefinition schema change; `auto` still needs C's complexity.
+**Boundary:** trigger config as the routing authority for configured workflows; C for ad-hoc CLI.
+**Failure mode:** trigger operator omits `pipelineMode` and gets unexpected auto-routing.
+**Repo pattern:** departs -- adds a new field to TriggerDefinition. The existing `workflowId` field plays a similar role (telling the trigger what to run). Adding `pipelineMode` is a competing second field for routing.
+**Gain:** deterministic, transparent, auditable routing for trigger-based pipelines.
+**Give up:** schema change, configuration overhead, potential confusion with `workflowId`.
+**Scope judgment:** Slightly broad for an initial design. CLI `--mode` flag is sufficient without the TriggerDefinition change.
+**Philosophy:** Honors explicit-over-implicit. Minor YAGNI conflict for the TriggerDefinition field.
+---
+### Candidate E: Per-mode coordinator files with thin dispatcher (architectural decomposition, recommended for architecture)
+**Summary:** The adaptive coordinator is decomposed into per-mode files following `pr-review.ts` independently; a thin `dispatch.ts` reads the routing result and calls the right coordinator.
+**File structure:**
+```
+src/coordinators/
+  adaptive-pipeline.ts          <- thin entry point + CoordinatorDeps wiring
+  routing/
+    route-task.ts               <- routeTask() [Candidate A or C logic]
+    parse-classify-output.ts    <- parseRecommendedPipeline() pure function
+    classify.ts                 <- runClassification() -- LLM fallback
+  modes/
+    review-only.ts              <- runReviewOnlyPipeline(deps, opts, mode)
+    quick-review.ts             <- runQuickReviewPipeline(deps, opts, mode)
+    implement.ts                <- runImplementPipeline(deps, opts, mode)
+    full-pipeline.ts            <- runFullPipeline(deps, opts, mode)
+    classify-and-run.ts         <- runClassifyAndRunPipeline(deps, opts, mode)
+```
+**Dispatcher (adaptive-pipeline.ts):**
+```typescript
+export async function runAdaptivePipeline(
+  deps: AdaptiveCoordinatorDeps,
+  opts: AdaptivePipelineOpts,
+): Promise<PipelineResult> {
+  const modeResult = await routeTask(opts.goal, opts.workspace, deps);
+  if (modeResult.kind === 'err') return escalate(modeResult.error);
+  const mode = modeResult.value;
+  deps.stderr(`[routing] mode=${mode.kind} goal="${opts.goal.slice(0, 60)}"`);
+  switch (mode.kind) {
+    case 'REVIEW_ONLY': return runReviewOnlyPipeline(deps, opts, mode);
+    case 'QUICK_REVIEW': return runQuickReviewPipeline(deps, opts, mode);
+    case 'IMPLEMENT': return runImplementPipeline(deps, opts, mode);
+    case 'FULL': return runFullPipeline(deps, opts, mode);
+    case 'CLASSIFY_AND_RUN': return runClassifyAndRunPipeline(deps, opts, mode);
+    case 'ESCALATE': return { escalated: true, escalationReason: mode.reason, ... };
+    default: return assertNever(mode);
+  }
+}
+```
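The dispatcher above relies on an `assertNever` helper whose definition is not shown. A common way to write it, given here as a sketch with a reduced, hypothetical `Mode` union, is:

```typescript
// Compile time: the parameter type `never` means this call only type-checks
// when every variant has been handled by an earlier case.
// Run time: throws if an unexpected value slips through anyway.
function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

// Reduced union for illustration; the real PipelineMode has more variants.
type Mode =
  | { kind: 'REVIEW_ONLY' }
  | { kind: 'FULL' }
  | { kind: 'ESCALATE'; reason: string };

function describeMode(mode: Mode): string {
  switch (mode.kind) {
    case 'REVIEW_ONLY':
      return 'review existing PR';
    case 'FULL':
      return 'discovery -> shaping -> coding -> review';
    case 'ESCALATE':
      return `escalate: ${mode.reason}`;
    default:
      return assertNever(mode);
  }
}
```

Adding a new variant to `Mode` without a matching `case` makes the `assertNever(mode)` call a type error, which is what makes adding a mode safe.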
+**Tensions resolved:** monolithic-vs-decomposition (fully decomposed); open/closed (adding a mode is additive, not modification); each mode independently testable.
+**Tensions accepted:** more files to navigate; thin dispatcher adds one level of indirection.
+**Boundary:** the seam is the `PipelineMode` discriminated union passed from routing to dispatch to mode executors.
+**Failure mode:** mode executor interfaces diverge over time. Mitigation: shared `AdaptivePipelineOpts` base type, `AdaptiveCoordinatorDeps` as single shared interface.
+**Repo pattern:** direct extension of pr-review.ts pattern -- each mode file IS a pr-review.ts equivalent.
+**Gain:** each mode file is small (~300-600 lines), focused, testable in isolation.
+**Give up:** more files than a monolithic coordinator.
+**Scope judgment:** Best-fit for 5+ modes. The decomposition cost is low (~3 extra files); the maintenance benefit is high.
+**Philosophy:** Honors YAGNI (each file is exactly what that mode needs), exhaustiveness (dispatch switch), compose-with-small-pure-functions.
+---
+## Comparison and Recommendation
+### Recommendation: Candidate C (routing mechanism) + Candidate E (architecture)
+**Routing (C):** Two-tier static-first with LLM fallback. Static rules cover the 4 well-defined cases at zero cost and latency. `CLASSIFY_AND_RUN` handles genuinely ambiguous tasks via classify-task-workflow. This precisely mirrors the `parseFindingsFromNotes` two-tier strategy already established in `pr-review.ts`.
+**Architecture (E):** Per-mode coordinator files with thin dispatcher. Each mode file follows `pr-review.ts` independently. The dispatcher's `switch(mode.kind)` is exhaustive with `assertNever`. Adding a new mode is additive.
+**Candidate D (pipelineMode config):** Not part of the initial design. CLI `--mode` flag provides explicit override. TriggerDefinition field deferred.
+### Why not A (pure static)?
+Candidate A is simpler but all tasks without static signals fall to FULL (wr.discovery + wr.shaping + coding). A genuinely vague idea needs FULL. But a Medium complexity coding task with no pitch.md and no PR number -- e.g., `"refactor auth.ts to use Result types"` -- also falls to FULL, running unnecessary discovery phases. Candidate C covers this case with classify-task-workflow returning `['coding-task-workflow-agentic', 'mr-review-workflow.agentic.v2']`.
+### Why not B (pure LLM)?
+Non-deterministic routing is unacceptable for the coordinator. A PR review task must always route to REVIEW_ONLY regardless of LLM mood. The latency and indeterminism costs of always-classify outweigh its benefits given the existing static signal coverage.
+---
+## Self-Critique
+### Strongest counter-argument against C+E
+**Against C's static rules:** The dep-bump heuristic (`bump`, `chore:`, `dependabot`) may match task descriptions from queue items that aren't actually PR-linked dep bumps. Example: `"bump the cache TTL to 300 seconds"` would match `bump` but isn't a dep bump. Static rule fix: require BOTH a bump keyword AND a PR/MR number for QUICK_REVIEW. This is already the design (rules are ANDed).
+**Against E's decomposition:** A developer unfamiliar with the codebase must navigate 7+ files to understand one end-to-end pipeline. Counter: the dispatcher is the entry point; the routing logic is in one file; each mode file is self-contained and understandable in isolation. The navigation cost is lower than reading 4000 lines in one file.
+### Pivot conditions
+1. If classify-task-workflow format drifts and `parseRecommendedPipeline` fails more than 10% of the time -> pivot to pure static (Candidate A) and accept FULL as default for ambiguous tasks
+2. If trigger operators need deterministic routing for automated workflows -> add `pipelineMode` to TriggerDefinition (Candidate D addition)
+3. If context-passing agent's design requires structured handoff data from routing to mode executors -> add a `contextBundle` field to mode types (implementation change, not routing design change)
+### Assumption that would invalidate this design
+If `spawn_agent`'s artifact return limitation is fixed before implementation (planned PR), the coordinator could read classify-task's `recommendedPipeline` as a typed artifact instead of parsing notes. This would change `classify.ts` but not the overall C+E architecture. The design is forward-compatible with this improvement.
+---
+## Open Questions for the Main Agent
+1. Should `CLASSIFY_AND_RUN` mode be treated as a named mode with a typed variant (as in Candidate C), or should the coordinator convert classify-task's output into one of the 4 named modes (REVIEW_ONLY / QUICK_REVIEW / IMPLEMENT / FULL) based on what workflows appear in `recommendedPipeline`? The latter would eliminate the `CLASSIFY_AND_RUN` variant but require mapping logic.
+2. The context-passing agent has not yet filled in the "Assumptions the routing agent needs to know about" section of their doc. Before finalizing, should the routing design explicitly specify what each mode executor must inject as `assembledContextSummary` for each phase transition?
+3. Should the `AdaptiveCoordinatorDeps` interface be a strict superset of `CoordinatorDeps` from pr-review.ts, or should it be a separate interface that may share some methods? The safest approach: new interface, shared common methods via intersection type or copy.
+4. What is the wall-clock timeout per phase in the full pipeline? pr-review.ts hardcodes 15 minutes per child session. For wr.discovery and wr.shaping (which may run longer), a higher timeout is appropriate -- but it should be hardcoded, not LLM-computed (robustness rule 1 from pr-review.ts).

package/docs/design/adaptive-coordinator-routing-design-review.md ADDED

@@ -0,0 +1,135 @@
+# Adaptive Coordinator Routing -- Design Review Findings
+**Status:** Review complete
+**Date:** 2026-04-19
+**Reviewing:** `docs/design/adaptive-coordinator-routing.md` (selected design: A+E)
+**For:** Main agent interpretation and final decision
+---
+## Tradeoff Review
+| Tradeoff | Acceptable? | Condition that breaks it |
+|----------|-------------|--------------------------|
+| All ambiguous tasks run FULL (wasteful for Medium-complexity refactors) | Yes for MVP | >20% of real tasks are Medium-complexity refactors routed to FULL unnecessarily |
+| `routeTask()` filesystem check injectable via `TaskSignals` | Yes | Multiple expensive filesystem signals needed (currently only pitchMdExists) |
+| `QUICK_REVIEW` and `REVIEW_ONLY` as separate DU variants | Yes | Behaviors converge to same implementation (trivial merge at that point) |
+**Hidden assumptions surfaced:**
+1. Discovery and shaping phases produce helpful (not misleading) output for every task routed through them
+2. `pitchMdExists` is the only filesystem routing signal (future signals added to `TaskSignals`)
+---
+## Failure Mode Review
+| Failure mode | Design handles it? | Missing mitigation | Risk |
+|--------------|-------------------|--------------------|------|
+| PR number in non-review task misroutes to REVIEW_ONLY | Partially (--mode override exists) | Regex pattern too broad -- bare `#\d+` matches non-PR numbers | LOW-MEDIUM |
+| Stale pitch.md routes new task to IMPLEMENT incorrectly | No -- convention not enforced | `runImplementPipeline()` must archive pitch.md after successful coding | MEDIUM |
+| FULL pipeline timeout leaves intermediate state | Partial (wall-clock budget mentioned) | Per-phase timeouts not specified; intermediate state cleanup undefined | HIGH |
+---
+## Runner-Up / Simpler Alternative Review
+**Runner-up (C+E):** Only element worth borrowing is `TaskSignals` as explicit value object -- already incorporated into design. CLASSIFY_AND_RUN mode correctly excluded (non-determinism + format-parsing fragility + FULL mode redundancy).
+**Simpler variant (A without E -- monolithic file):** Would satisfy acceptance criteria but violates open/closed (adding a mode modifies shared file). `pr-review.ts` at 1462 lines for one mode is justification for E's decomposition from day one. Monolithic rejected.
+---
+## Philosophy Alignment
+**All key principles satisfied:** illegal-states-unrepresentable, exhaustiveness, errors-as-data, validate-at-boundaries, determinism-over-cleverness, dependency-injection, YAGNI.
+**Two acceptable tensions:**
+1. `fileExists()` I/O behind `CoordinatorDeps` injectable -- principle preserved by injection
+2. Mode executors are imperative (sequential spawns) -- routing layer is declarative/pure; this is the best achievable split for sequential pipeline coordination
+---
+## Findings
+### RED -- must fix before implementation
+**R1: FULL pipeline per-phase timeouts and total budget not specified**
+The design says "hardcoded timeouts" but does not specify the values. For a coordinator that chains 4 sessions, these constants must be fixed before implementation. If the coordinator times out mid-pipeline (after shaping but before coding completes), the repository is left with a `.workrail/current-pitch.md` file and no implementation -- a silent intermediate state that will misroute the next invocation.
+**Required additions to the design:**
+```typescript
+const DISCOVERY_SESSION_TIMEOUT_MS  = 30 * 60 * 1000;  // 30 min
+const SHAPING_SESSION_TIMEOUT_MS    = 30 * 60 * 1000;  // 30 min
+const CODING_SESSION_TIMEOUT_MS     = 60 * 60 * 1000;  // 60 min
+const REVIEW_SESSION_TIMEOUT_MS     = 20 * 60 * 1000;  // 20 min
+const FULL_PIPELINE_MAX_MS          = 160 * 60 * 1000; // 160 min total
+const FULL_PIPELINE_SPAWN_CUTOFF_MS = 130 * 60 * 1000; // 130 min (stop spawning new phases)
+```
+And: `runFullPipeline()` must archive pitch.md if it was produced by shaping but coding times out or fails.
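To show how these constants could interact, the sketch below computes the timeout for the next phase from the elapsed wall-clock time. The function name `phaseBudgetMs` and the rule of clamping a phase to the remaining total budget are assumptions of this example; the review only specifies the constants themselves.

```typescript
const MINUTE_MS = 60 * 1000;

// Per-phase limits, using the values proposed above.
const PHASE_TIMEOUT_MS = {
  discovery: 30 * MINUTE_MS,
  shaping: 30 * MINUTE_MS,
  coding: 60 * MINUTE_MS,
  review: 20 * MINUTE_MS,
} as const;
type Phase = keyof typeof PHASE_TIMEOUT_MS;

const FULL_PIPELINE_MAX_MS = 160 * MINUTE_MS;
const FULL_PIPELINE_SPAWN_CUTOFF_MS = 130 * MINUTE_MS;

// Returns the timeout to give the next phase, or null when the spawn
// cutoff has passed and the coordinator should escalate instead.
function phaseBudgetMs(phase: Phase, elapsedMs: number): number | null {
  if (elapsedMs >= FULL_PIPELINE_SPAWN_CUTOFF_MS) return null;
  // Assumption: never let a phase outlive the total pipeline budget.
  return Math.min(PHASE_TIMEOUT_MS[phase], FULL_PIPELINE_MAX_MS - elapsedMs);
}
```

With these numbers, a coding phase that starts 120 minutes in would receive 40 minutes instead of its usual 60.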
+---
+### ORANGE -- should fix before implementation
+**O1: PR number regex pattern too broad**
+Current routing rule: "goal contains PR/MR number" matches any bare `#\d+` in the goal text. This produces false positives for `"refactor PR #47 related auth code"` (wants IMPLEMENT, gets REVIEW_ONLY) or `"fix issue #123 in the auth module"` (wants FULL, gets REVIEW_ONLY).
+**Required fix:** Use context-sensitive pattern matching:
+- REVIEW_ONLY: `\bPR\s*#\d+\b` or `\bMR\s*!?\d+\b` with leading verb context (`review`, `check`, `approve`, etc.)
+- Or more conservatively: require the goal to START with review-intent keywords (`"Review PR #..."`, `"Check MR ..."`) rather than contain a PR number anywhere
+Recommendation: the routing function should be aware of ambiguous patterns and log a warning when a PR number is found but no review-intent verb precedes it.
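The conservative variant of this fix can be sketched as a small classifier that separates "review intent" from "a PR number merely appears". The verb list is limited to the three verbs named above; the `PrSignal` type and the function name are hypothetical.

```typescript
// Goal must START with a review-intent verb (conservative option above).
const REVIEW_INTENT_PREFIX = /^\s*(?:review|check|approve)\b/i;
// Explicit "PR #47" or "MR !12" style references; a bare "#123" does not count.
const PR_OR_MR_REFERENCE = /\bPR\s*#\d+\b|\bMR\s*!?\d+\b/i;

type PrSignal = 'review' | 'ambiguous' | 'none';

function classifyPrSignal(goal: string): PrSignal {
  if (!PR_OR_MR_REFERENCE.test(goal)) return 'none';
  // A PR/MR reference without a leading review verb is the case
  // that should be logged as a warning rather than routed to REVIEW_ONLY.
  return REVIEW_INTENT_PREFIX.test(goal) ? 'review' : 'ambiguous';
}
```

Under this sketch only `'review'` would route to REVIEW_ONLY, while `'ambiguous'` would fall through to the remaining rules after a logged warning.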
+**O2: pitch.md archival not specified**
+`runImplementPipeline()` must rename `.workrail/current-pitch.md` to `.workrail/pitches/[timestamp]-[goal-slug]-pitch.md` (or similar) after the coding session completes successfully. Without this, stale pitch files cause incorrect IMPLEMENT routing for subsequent tasks.
+The `IMPLEMENT` mode coordinator needs a post-coding cleanup step. This is an implementation detail but must be in the design spec before coding begins.
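The archive path itself can be computed by a pure function, leaving the actual rename to an injected dependency. The sketch below assumes an ISO-8601-derived timestamp and a 40-character slug limit; both choices, and the names `slugifyGoal` and `archivedPitchPath`, are illustrative.

```typescript
const PITCH_ARCHIVE_DIR = '.workrail/pitches';

// Lowercases the goal and collapses anything outside [a-z0-9] into single dashes.
function slugifyGoal(goal: string, maxLength = 40): string {
  return goal
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .slice(0, maxLength)
    .replace(/^-+|-+$/g, ''); // trim dashes left by punctuation or truncation
}

// Builds ".workrail/pitches/[timestamp]-[goal-slug]-pitch.md".
function archivedPitchPath(goal: string, completedAt: Date): string {
  // Colons and dots are replaced so the name is safe on all filesystems.
  const timestamp = completedAt.toISOString().replace(/[:.]/g, '-');
  const slug = slugifyGoal(goal) || 'untitled';
  return `${PITCH_ARCHIVE_DIR}/${timestamp}-${slug}-pitch.md`;
}
```

Taking `completedAt` as a parameter instead of calling `new Date()` inside the function keeps it deterministic and easy to unit test.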
+---
+### YELLOW -- nice to fix, non-blocking
+**Y1: `QUICK_REVIEW` and `REVIEW_ONLY` distinction underspecified**
+The design mentions QUICK_REVIEW uses a "lighter model config" but does not define what "lighter" means -- which model, what `agentConfig` fields, what the expected speed/quality tradeoff is. Before implementing QUICK_REVIEW, define: `{ model: 'amazon-bedrock/claude-haiku-4-5', maxSessionMinutes: 5 }` or equivalent.
+**Y2: `TaskSignals` interface not fully specified**
+The design refers to `TaskSignals` but does not define all fields. A complete definition is needed before implementation:
+```typescript
+interface TaskSignals {
+  readonly triggerProvider: string;        // 'generic' | 'github_prs_poll' | 'github_issues_poll' | 'gitlab_poll'
+  readonly pitchMdExists: boolean;         // .workrail/current-pitch.md exists in workspace
+  readonly issueLabels: readonly string[]; // labels from trigger payload (empty if not from polling trigger)
+  readonly explicitMode?: string;          // from --mode CLI flag or trigger context variable
+}
+```
+**Y3: `AdaptiveCoordinatorDeps` vs `CoordinatorDeps` relationship**
+The design does not specify whether `AdaptiveCoordinatorDeps` extends `CoordinatorDeps` from pr-review.ts or is a separate interface. Recommendation: separate interface that copies shared methods (no inheritance from pr-review.ts since that coordinator's deps are highly specific to PR review). Shared pattern, not shared type.
+---
+## Recommended Revisions
+1. **(RED) Add per-phase timeout constants and intermediate state cleanup to design** -- required before implementation
+2. **(ORANGE) Tighten PR number routing regex** -- reduces false positives meaningfully
+3. **(ORANGE) Specify pitch.md archival in `runImplementPipeline()`** -- prevents stale routing
+4. **(YELLOW) Define `TaskSignals` interface fully**
+5. **(YELLOW) Define QUICK_REVIEW model config**
+6. **(YELLOW) Specify `AdaptiveCoordinatorDeps` relationship to `CoordinatorDeps`**
+---
+## Residual Concerns
+1. **Discovery/shaping quality for tasks they shouldn't run on.** If `wr.discovery` is unhelpful for "refactor auth.ts" tasks (just produces boilerplate), the FULL default becomes actively harmful, not just wasteful. Monitoring needed in production: log which tasks route to FULL and whether discovery findings are actually used by the shaping phase.
+2. **No checkpoint/resume for multi-phase pipeline.** If the coordinator crashes mid-FULL-pipeline, there is no way to resume from the completed phases. The current design requires re-running from the beginning. This is acceptable for MVP but should be tracked as a gap.
+3. **The context-passing agent's `adaptive-coordinator-context.md` did not exist at review time.** The assumptions section in the routing design is speculative. If the context-passing design introduces new contracts (e.g., coordinator must inject a `discoveryDoc` at the shaping spawn), the routing design is unaffected (routing is pure, context injection is per-mode), but the mode coordinator implementations need updating.

package/docs/design/adaptive-coordinator-routing-review.md ADDED

@@ -0,0 +1,156 @@
+# Adaptive Coordinator Routing -- Design Review Findings
+**Status:** Generated by wr.discovery workflow, 2026-04-19
+**Selected design:** Candidate A (pure static routing) + Candidate E (per-mode file architecture)
+**Main design doc:** `docs/design/adaptive-coordinator-routing.md`
+---
+## Tradeoff Review
+### Tradeoff 1: All ambiguous tasks default to FULL pipeline
+- **Status: ACCEPTED**
+- Covers all 5 stated use cases correctly
+- Hidden assumption: wr.discovery runtime < 30 minutes (needs validation)
+- Phase-level escalation required: discovery-fail and shaping-fail must be independent escalation points in full-pipeline.ts
+- **Action required:** Document in design doc that FULL pipeline requires per-phase escalation, not just a top-level catch
+### Tradeoff 2: routeTask() filesystem check is injectable
+- **Status: ACCEPTED with implementation note**
+- `AdaptiveCoordinatorDeps` needs a new `fileExists(path: string): Promise<boolean>` method
+- This method is NOT in the current `CoordinatorDeps` from pr-review.ts
+- **Action required:** Document `AdaptiveCoordinatorDeps` as a new interface (not extending CoordinatorDeps), specifying which methods are shared vs. new
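As an illustration of why the injectable check matters, here is a small sketch covering only the new `fileExists` method. The shared methods are omitted because their signatures are not reproduced in this review. The names `FileExistsDep`, `fakeDeps`, and `pitchMdExists` are hypothetical.

```typescript
// Only the method this review introduces; the real AdaptiveCoordinatorDeps
// would also carry the shared coordinator methods (not sketched here).
interface FileExistsDep {
  fileExists(path: string): Promise<boolean>;
}

// Hypothetical in-memory fake: lets routing tests run without a filesystem.
function fakeDeps(existing: readonly string[]): FileExistsDep {
  const files = new Set(existing);
  return { fileExists: async (path) => files.has(path) };
}

// Hypothetical helper that would feed the `pitchMdExists` routing signal.
async function pitchMdExists(
  deps: FileExistsDep,
  workspace: string,
): Promise<boolean> {
  return deps.fileExists(`${workspace}/.workrail/current-pitch.md`);
}
```

With a fake like this, the routing decision can be exercised for both the "pitch present" and "pitch absent" cases in plain unit tests.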
+### Tradeoff 3: QUICK_REVIEW and REVIEW_ONLY as separate discriminated union variants
+- **Status: ACCEPTED with clarification needed**
+- Hidden assumption exposed: the QUICK_REVIEW behavior difference must be realized via a specialized goal string passed to mr-review, not via a new workflow flag
+- If mr-review ignores goal hints, QUICK_REVIEW = REVIEW_ONLY (acceptable for MVP)
+- **Action required:** Document that QUICK_REVIEW passes goal prefix `[DEP BUMP]` to the review session
+---
+## Failure Mode Review
+### FM1: PR number in non-review task goal string
+- **Status: ADEQUATE mitigation** -- routing log printed before spawn; `--mode` override available
+- Severity: LOW -- edge case, transparent to user
+### FM2: Stale pitch.md from previous task (HIGHEST RISK)
+- **Status: INADEQUATE mitigation** -- routing design doc does not address pitch.md lifecycle
+- The IMPLEMENT mode can silently route a new task to the wrong pitch
+- **Action required (ORANGE):** Document pitch.md lifecycle invariant: IMPLEMENT mode executor (`modes/implement.ts`) must archive or delete pitch.md after coding session completes (success OR failure). This is a pipeline executor invariant, not a routing layer change.
+### FM3: FULL pipeline phase timeout
+- **Status: INCOMPLETE specification**
+- pr-review.ts timeouts (15 min child session, 70 min spawn cutoff) are too short for the FULL pipeline
+- A FULL pipeline run can take up to 120 min (discovery 30 min + shaping 30 min + coding 60 min)
+- **Action required (ORANGE):** Specify explicit timeout constants for adaptive coordinator in the design doc: `DISCOVERY_TIMEOUT_MS = 35*60*1000`, `SHAPING_TIMEOUT_MS = 35*60*1000`, `CODING_TIMEOUT_MS = 65*60*1000`, `COORDINATOR_SPAWN_CUTOFF_MS = 100*60*1000`, `COORDINATOR_MAX_MS = 120*60*1000`
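The proposed constants can be written down directly. Note that the three per-phase budgets add up to 135 min, which is more than `COORDINATOR_MAX_MS` (120 min), so a coordinator would presumably need to clamp each phase to the time remaining. The sketch below assumes that interpretation, and assumes the spawn cutoff means "no new sessions are started after this point"; `phaseTimeoutMs` and `canSpawn` are hypothetical helpers.

```typescript
const MIN = 60 * 1000;

// Values as proposed in this review.
const DISCOVERY_TIMEOUT_MS = 35 * MIN;
const SHAPING_TIMEOUT_MS = 35 * MIN;
const CODING_TIMEOUT_MS = 65 * MIN;
const COORDINATOR_SPAWN_CUTOFF_MS = 100 * MIN;
const COORDINATOR_MAX_MS = 120 * MIN;

// Hypothetical helper: a phase gets its own budget or whatever is left of
// the overall coordinator budget, whichever is smaller (never negative).
function phaseTimeoutMs(
  phaseBudgetMs: number,
  startedAtMs: number,
  nowMs: number,
): number {
  const remaining = COORDINATOR_MAX_MS - (nowMs - startedAtMs);
  return Math.max(0, Math.min(phaseBudgetMs, remaining));
}

// Hypothetical helper: assumed meaning of the spawn cutoff.
function canSpawn(startedAtMs: number, nowMs: number): boolean {
  return nowMs - startedAtMs < COORDINATOR_SPAWN_CUTOFF_MS;
}
```

Under these assumptions, if discovery and shaping both use their full 35 min, the coding phase starts at minute 70 and is clamped to 50 min rather than its nominal 65.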
+### FM4: Discovery session produces empty/trivial notes
+- **Status: EXPLICITLY ACCEPTED** -- acceptable for MVP to proceed to shaping with thin context
+- Severity: LOW -- quality degradation, not a correctness failure
+- Note: FULL mode executor should log a warning when discovery notes are < N characters before spawning shaping
+---
+## Runner-Up / Simpler Alternative Review
+### Runner-up (Candidate C: static-first + LLM fallback)
+**One element worth preserving:** Write `parseRecommendedPipeline()` as a pure function in `routing/parse-classify-output.ts` with tests, but do NOT call it in the coordinator at MVP. This preserves the upgrade path to Candidate C if static routing proves insufficient.
+**Cost:** ~30 lines of code + tests. Low cost, high future value.
+### Simpler variant (single coordinator file)
+**Rejected.** For 4+ pipeline modes, a single file would be 4000+ lines or require the same internal decomposition as Candidate E. Candidate E makes the seams explicit and testable without adding meaningful overhead.
+---
+## Philosophy Alignment
+| Principle | Status | Note |
+|-----------|--------|------|
+| Immutability by default | SATISFIED | All types readonly, routeTask() pure |
+| Make illegal states unrepresentable | SATISFIED | PipelineMode discriminated union |
+| Type safety first | SATISFIED | Result<PipelineMode, string>, no null |
+| Errors are data | SATISFIED | Result pattern, escalated+reason |
+| Exhaustiveness | SATISFIED | switch + assertNever in dispatcher |
+| Dependency injection | SATISFIED | AdaptiveCoordinatorDeps injectable |
+| YAGNI with discipline | SATISFIED | 4 modes, no general engine |
+| Determinism over cleverness | SATISFIED | Pure static routing function |
+| Document why not what | REQUIRES ACTION | Coordinator header block needed |
+| Functional/declarative | UNDER TENSION (acceptable) | Mode executors are imperative; mitigation: pure routing core, imperative execution shell |
+| Compose with small pure functions | UNDER TENSION (acceptable) | Same tension as pr-review.ts; same mitigation |
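The "illegal states unrepresentable" and "exhaustiveness" rows can be illustrated with a small sketch. The four mode names come from this review; the `kind` discriminant, the `prNumber` payload, the `Result` shape, and the description strings are assumptions for illustration only.

```typescript
// Assumed Result shape: errors are returned as data, never thrown.
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

// Hypothetical payloads: only the review modes carry a PR number, so a
// FULL run with a PR number cannot be constructed.
type PipelineMode =
  | { readonly kind: 'FULL' }
  | { readonly kind: 'IMPLEMENT' }
  | { readonly kind: 'REVIEW_ONLY'; readonly prNumber: number }
  | { readonly kind: 'QUICK_REVIEW'; readonly prNumber: number };

// Compile-time exhaustiveness guard: adding a fifth mode without a case
// below makes the `default` branch a type error.
function assertNever(x: never): never {
  throw new Error(`Unhandled pipeline mode: ${JSON.stringify(x)}`);
}

function describeMode(mode: PipelineMode): string {
  switch (mode.kind) {
    case 'FULL':
      return 'discovery -> shaping -> coding';
    case 'IMPLEMENT':
      return 'coding from existing pitch';
    case 'REVIEW_ONLY':
      return `review PR #${mode.prNumber}`;
    case 'QUICK_REVIEW':
      return `quick review PR #${mode.prNumber}`;
    default:
      return assertNever(mode);
  }
}

// Hypothetical parser for an explicit mode override; unknown values come
// back as an error value instead of null or an exception.
function parseExplicitMode(raw: string): Result<'FULL' | 'IMPLEMENT', string> {
  if (raw === 'FULL' || raw === 'IMPLEMENT') return { ok: true, value: raw };
  return { ok: false, error: `unknown mode: ${raw}` };
}
```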
+---
+## Findings
+### RED (blocking -- must address before implementation)
+None.
+### ORANGE (significant -- should address before implementation)
+**O1: FULL pipeline timeout constants not specified**
+The design doc does not specify concrete timeout values for the adaptive coordinator. The constants hardcoded in pr-review.ts are wrong for the FULL pipeline duration. Without explicit guidance, implementers will have to guess the values.
+*Required action:* Add a "Timing Constants" section to the design doc with explicit millisecond values for each timeout.
+**O2: Pitch.md lifecycle invariant not documented**
+The IMPLEMENT mode can silently misroute if pitch.md is stale. The routing design doc is the natural place to document that the IMPLEMENT mode executor (`modes/implement.ts`) must archive or delete pitch.md once the coding session completes, whether it succeeds or fails.
+*Required action:* Add pitch.md lifecycle invariant to the design doc.
+### YELLOW (advisory -- address if convenient)
+**Y1: AdaptiveCoordinatorDeps interface specification missing**
+The design doc does not specify which methods are shared with `CoordinatorDeps` and which are new (e.g., `fileExists`). An implementer will have to infer this.
+*Required action:* Add a "New deps interface methods" section.
+**Y2: QUICK_REVIEW goal string format not specified**
+QUICK_REVIEW is expected to pass a `[DEP BUMP]` goal prefix to the review session. The exact format affects how mr-review interprets the task, so it should be spelled out explicitly.
+*Required action:* Specify goal string template for QUICK_REVIEW in the design doc.
+**Y3: parseRecommendedPipeline() should be written at implementation time**
+Even though not called at MVP, having the pure function ready preserves the upgrade path to Candidate C.
+*Required action:* Include in implementation scope as a low-cost pure function (~30 lines) with tests.
+---
+## Recommended Revisions to Design Doc
+1. Add **Timing Constants** section with explicit millisecond values:
+   - `DISCOVERY_TIMEOUT_MS = 35 * 60 * 1000` (35 min)
+   - `SHAPING_TIMEOUT_MS = 35 * 60 * 1000` (35 min)
+   - `CODING_TIMEOUT_MS = 65 * 60 * 1000` (65 min)
+   - `COORDINATOR_SPAWN_CUTOFF_MS = 100 * 60 * 1000` (100 min)
+   - `COORDINATOR_MAX_MS = 120 * 60 * 1000` (120 min)
+2. Add **Pitch.md Lifecycle Invariant** to the design doc under pipeline invariants:
+   - IMPLEMENT mode executor archives/deletes `.workrail/current-pitch.md` after coding session completes or fails
+   - Archive path: `.workrail/used-pitches/pitch-{timestamp}.md`
+3. Add **AdaptiveCoordinatorDeps interface** sketch:
+   - New methods beyond CoordinatorDeps: `fileExists(path: string): Promise<boolean>`
+   - Shared methods (same signature): `spawnSession`, `awaitSessions`, `getAgentResult`, `mergePR`, `listOpenPRs`, `writeFile`, `stderr`, `now`
+   - Dropped methods (not needed for adaptive): none, but `listOpenPRs` is only used by REVIEW_ONLY and QUICK_REVIEW modes
+4. Add **QUICK_REVIEW goal template**:
+   - `[DEP BUMP] Review PR #${prNumber}: ${prTitle} -- skip architecture audit, verify version compatibility and test coverage only`
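The goal template in item 4 maps directly onto a small pure function. A minimal sketch follows; the function name `quickReviewGoal` is hypothetical, and the template text is copied from the recommendation above.

```typescript
// Builds the QUICK_REVIEW goal string from the recommended template.
// Pure: the same PR number and title always yield the same goal.
function quickReviewGoal(prNumber: number, prTitle: string): string {
  return (
    `[DEP BUMP] Review PR #${prNumber}: ${prTitle} ` +
    '-- skip architecture audit, verify version compatibility and test coverage only'
  );
}
```

Keeping the template in one function means a later format change touches one place and can be covered by a single test.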
+---
+## Residual Concerns
+1. **wr.discovery output standardization**: the routing design assumes wr.discovery notes are injected by the coordinator as `assembledContextSummary` for wr.shaping. But wr.discovery's `designDocPath` output location is not standardized (finding from context-passing agent's doc). The FULL mode executor must parse `lastStepNotes` from the discovery session to build the shaping context -- this is per the context-passing agent's Candidate D (coordinator-injected text). This concern is correctly owned by the context-passing design, not the routing design.
+2. **classify-task-workflow format stability**: if `parseRecommendedPipeline()` is written as a pure function now, it has no tests against real classify-task output. The function should include an integration test stub that documents the expected format.
+3. **REVIEW_ONLY vs pr-review coordinator**: the existing `worktrain run pr-review` command already provides REVIEW_ONLY+QUICK_REVIEW behavior. The new `worktrain run pipeline --mode review_only` should either (a) delegate to pr-review coordinator, or (b) reimplement the same logic in `modes/review-only.ts`. Recommendation: (a) delegate -- avoid duplicating the fix-agent loop logic. Document this delegation explicitly.