@exaudeus/workrail 3.42.0 → 3.44.0

Files changed (36)
  1. package/dist/console-ui/assets/{index-DwfWMKvv.js → index-Bi38ITiQ.js} +1 -1
  2. package/dist/console-ui/index.html +1 -1
  3. package/dist/daemon/workflow-runner.d.ts +15 -1
  4. package/dist/daemon/workflow-runner.js +86 -9
  5. package/dist/manifest.json +39 -23
  6. package/dist/trigger/adapters/github-queue-poller.d.ts +34 -0
  7. package/dist/trigger/adapters/github-queue-poller.js +200 -0
  8. package/dist/trigger/delivery-action.d.ts +2 -0
  9. package/dist/trigger/delivery-action.js +24 -0
  10. package/dist/trigger/github-queue-config.d.ts +18 -0
  11. package/dist/trigger/github-queue-config.js +155 -0
  12. package/dist/trigger/polling-scheduler.d.ts +1 -0
  13. package/dist/trigger/polling-scheduler.js +185 -6
  14. package/dist/trigger/trigger-router.js +24 -1
  15. package/dist/trigger/trigger-store.js +77 -2
  16. package/dist/trigger/types.d.ts +19 -0
  17. package/docs/design/adaptive-coordinator-context-candidates.md +265 -0
  18. package/docs/design/adaptive-coordinator-context-review.md +101 -0
  19. package/docs/design/adaptive-coordinator-context.md +504 -0
  20. package/docs/design/adaptive-coordinator-routing-candidates.md +340 -0
  21. package/docs/design/adaptive-coordinator-routing-design-review.md +135 -0
  22. package/docs/design/adaptive-coordinator-routing-review.md +156 -0
  23. package/docs/design/adaptive-coordinator-routing.md +660 -0
  24. package/docs/design/context-assembly-layer-design-review.md +110 -0
  25. package/docs/design/context-assembly-layer.md +622 -0
  26. package/docs/design/stuck-escalation-candidates.md +176 -0
  27. package/docs/design/stuck-escalation-design-review.md +70 -0
  28. package/docs/design/stuck-escalation.md +326 -0
  29. package/docs/design/worktrain-task-queue-candidates.md +252 -0
  30. package/docs/design/worktrain-task-queue-design-review.md +109 -0
  31. package/docs/design/worktrain-task-queue.md +443 -0
  32. package/docs/design/worktree-review-findings-candidates.md +101 -0
  33. package/docs/design/worktree-review-findings-design-review.md +65 -0
  34. package/docs/design/worktree-review-findings-implementation-plan.md +153 -0
  35. package/docs/ideas/backlog.md +148 -0
  36. package/package.json +3 -3
@@ -0,0 +1,340 @@
1
+ # Adaptive Coordinator Routing -- Design Candidates
2
+
3
+ **Status:** Generated by wr.discovery workflow, 2026-04-19
4
+ **Main design doc:** `docs/design/adaptive-coordinator-routing.md`
5
+ **For:** Main agent review and synthesis -- this is raw investigative material, not a final decision
6
+
7
+ ---
8
+
9
+ ## Problem Understanding
10
+
11
+ ### Core tensions
12
+
13
+ 1. **Determinism vs intelligence**: Static routing is deterministic (same input, same pipeline). LLM routing is intelligent but non-deterministic. The design must decide where the determinism boundary is.
14
+
15
+ 2. **Completeness vs YAGNI**: The backlog envisions a full autonomous pipeline covering ideation through production audit. The immediate need is 4-5 pipeline modes for code-level tasks. Designing for completeness now is over-engineering.
16
+
17
+ 3. **Monolithic coordinator vs per-mode decomposition**: `pr-review.ts` is 1462 lines for one pipeline mode. Five modes in one file would be unmanageable. The right architecture decomposes into mode files with a thin dispatcher -- but this requires deciding the seam deliberately.
18
+
19
+ 4. **`recommendedPipeline` verbatim vs advisory**: If classify-task-workflow's pipeline output is authoritative, the coordinator cannot apply static overrides. If advisory, the coordinator re-implements routing and classify-task's rules become redundant for common cases.
20
+
21
+ 5. **Phase 0.5 vs coordinator routing for upstream context**: `coding-task-workflow-agentic` Phase 0.5 auto-detects `pitch.md`. The coordinator's "should I skip shaping?" routing decision partially overlaps with this detection. They must agree.
22
+
23
+ ### What the codebase already solves (and how)
24
+
25
+ **`pr-review.ts` pattern:**
26
+ - `CoordinatorDeps` interface (16 injectable methods) -- all I/O behind the interface, coordinator core is pure
27
+ - `ReviewSeverity` as a discriminated union -- illegal states unrepresentable
28
+ - `parseFindingsFromNotes()` as a pure function with two-tier strategy (structured JSON block first, keyword scan fallback)
29
+ - Escalation-first: every failure produces `escalated: true` + `escalationReason`, never silent substitution
30
+ - TRACE log before acting on routing decision
31
+
32
+ **`classify-task-workflow.json`:**
33
+ - Exists as of v3.40.0. Single LLM step, no tools, outputs `recommendedPipeline` as ordered workflow ID array
34
+ - Output format: structured text block with `recommendedPipeline: ["...", "..."]` line
35
+ - Note: `spawn_agent` does NOT return artifacts (v3.40.0 limitation #5) -- output must be read via `spawnSession` + `awaitSessions` + `getAgentResult` + note parsing
36
+
37
+ **Phase 0.5 (`coding-task-workflow-agentic`):**
38
+ - Already detects `pitch.md` and sets `solutionFixed=true`, skipping design phases
39
+ - The coordinator's "IMPLEMENT mode" (skip discovery/shaping) and Phase 0.5 are complementary, not conflicting
40
+
41
+ **`context-passing agent` findings (from `docs/design/adaptive-coordinator-context.md`):**
42
+ - File-based handoff (pitch.md) already covers Shaping->Coding
43
+ - Discovery->Shaping gap: coordinator must inject `lastStepNotes` from discovery session as `assembledContextSummary` for shaping spawn
44
+ - This is execution logic within FULL pipeline mode, not a routing/classification concern
45
+
46
+ ### Likely seam
47
+
48
+ The real seam is `routeTask(goal: string, workspace: string) -> PipelineMode`. This function is the heart of the routing layer. All inputs flow into it; all pipeline execution flows out of it.
49
+
50
+ ### What makes this hard
51
+
52
+ - Note parsing for classify-task output: no typed artifact yet. Text parsing of LLM output for `recommendedPipeline` is fragile.
53
+ - Static rule conflicts: `"fix the BLOCKING issue in PR #47"` contains a PR number (-> REVIEW_ONLY) and a severity keyword. Disambiguation needed.
54
+ - Phase 0.5 in coding-task and coordinator-level routing can diverge: coordinator spawns shaping but coding-task's Phase 0.5 would skip design phases anyway if pitch.md appears later.
55
+ - The number of pipeline modes is bounded for now but the backlog implies growth. The architecture must be additive without modification.
56
+
57
+ ---
58
+
59
+ ## Philosophy Constraints
60
+
61
+ From CLAUDE.md (stated) and pr-review.ts (practiced):
62
+
63
+ - **Immutability by default**: all interfaces readonly. `PipelineMode`, `AdaptivePipelineOpts`, `AdaptiveCoordinatorDeps` must be fully readonly.
64
+ - **Make illegal states unrepresentable**: `PipelineMode` as a discriminated union, not a string constant. `switch(mode.kind)` with `assertNever` fallthrough.
65
+ - **Errors are data**: `routeTask()` returns `Result<PipelineMode, string>`, never throws. Phase failures return `err(reason)`.
66
+ - **Exhaustiveness everywhere**: switch on `PipelineMode` must handle all variants.
67
+ - **Dependency injection for boundaries**: `AdaptiveCoordinatorDeps` injectable interface. No direct fs/fetch in coordinator core.
68
+ - **YAGNI with discipline**: cover the 5 named modes; do not build a general pipeline engine.
69
+ - **Determinism over cleverness**: static routing preferred; LLM as bounded fallback.
70
+ - **Document 'why', not 'what'**: coordinator header block must explain invariants and design decisions (pr-review.ts header is the template).
71
+
72
+ **Philosophy conflict identified:** LLM classification (Candidate B, C's fallback tier) is non-deterministic, conflicting with "determinism over cleverness". Resolution: static tier is deterministic; LLM fallback is bounded and documented as a deliberate trade.
73
+
74
+ ---
75
+
76
+ ## Impact Surface
77
+
78
+ - `src/cli-worktrain.ts` -- needs `worktrain run pipeline` subcommand wiring
79
+ - `src/coordinators/pr-review.ts` -- must remain unchanged; new coordinator is additive
80
+ - `src/trigger/types.ts` -- if Candidate D's `pipelineMode` field is added; otherwise unchanged
81
+ - `workflows/classify-task-workflow.json` -- coordinator depends on its note output format; format changes break parsing
82
+ - `src/coordinators/routing/route-task.ts` (new) -- pure routing function; all mode selection logic lives here
83
+ - `src/coordinators/modes/*.ts` (new files) -- each mode's pipeline execution logic
84
+ - Test suite: each mode coordinator needs its own unit tests with `CoordinatorDeps` fakes
85
+
86
+ ---
87
+
88
+ ## Candidates
89
+
90
+ ### Candidate A: Pure static routing with named pipeline modes
91
+
92
+ **Summary:** `routeTask()` is a pure function that applies static rules against goal text and workspace filesystem. No LLM. Returns one of 5 `PipelineMode` variants.
93
+
94
+ **PipelineMode type:**
95
+ ```typescript
96
+ type PipelineMode =
97
+ | { kind: 'REVIEW_ONLY'; prNumbers: readonly number[] }
98
+ | { kind: 'QUICK_REVIEW'; prNumbers: readonly number[] }
99
+ | { kind: 'IMPLEMENT'; pitchPath: string }
100
+ | { kind: 'FULL'; goal: string }
101
+ | { kind: 'ESCALATE'; reason: string };
102
+ ```
103
+
104
+ **Static rules (in priority order):**
105
+ 1. goal contains dep-bump keywords AND PR/MR number -> QUICK_REVIEW
106
+ 2. goal contains PR/MR number -> REVIEW_ONLY
107
+ 3. `.workrail/current-pitch.md` exists in workspace -> IMPLEMENT
108
+ 4. else -> FULL
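
The priority order above could be sketched as a pure function. This is a minimal sketch under stated assumptions, not the package's implementation: the dep-bump keyword list (borrowed from the `bump`, `chore:`, `dependabot` examples in the Self-Critique section), the PR/MR regex, the `StaticRouteMode` name, and the `pitchMdExists` parameter are all hypothetical. The `ESCALATE` variant is omitted because none of the four static rules produces it.

```typescript
// Hypothetical sketch of Candidate A's static rules. Regexes and names are assumptions.
type StaticRouteMode =
  | { readonly kind: 'REVIEW_ONLY'; readonly prNumbers: readonly number[] }
  | { readonly kind: 'QUICK_REVIEW'; readonly prNumbers: readonly number[] }
  | { readonly kind: 'IMPLEMENT'; readonly pitchPath: string }
  | { readonly kind: 'FULL'; readonly goal: string };

const PITCH_PATH = '.workrail/current-pitch.md';

// Assumed dep-bump keywords; the real list is not specified in this document.
const DEP_BUMP = /\b(bump|dependabot)\b|chore:/i;

// Collects numbers written as "PR #47", "PR 47", or "MR !12" (assumed syntax).
function extractPrNumbers(goal: string): readonly number[] {
  const pattern = /\b(?:PR|MR)\s*[#!]?(\d+)\b/gi;
  const found: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(goal)) !== null) {
    found.push(Number(match[1]));
  }
  return found;
}

// Rules are checked in priority order; the first match wins.
function applyStaticRules(goal: string, pitchMdExists: boolean): StaticRouteMode {
  const prNumbers = extractPrNumbers(goal);
  // Rule 1: dep-bump keyword AND a PR/MR number.
  if (prNumbers.length > 0 && DEP_BUMP.test(goal)) return { kind: 'QUICK_REVIEW', prNumbers };
  // Rule 2: any PR/MR number.
  if (prNumbers.length > 0) return { kind: 'REVIEW_ONLY', prNumbers };
  // Rule 3: a pitch file is present (the caller performs the filesystem check).
  if (pitchMdExists) return { kind: 'IMPLEMENT', pitchPath: PITCH_PATH };
  // Rule 4: everything else.
  return { kind: 'FULL', goal };
}
```

Passing `pitchMdExists` in as a boolean keeps the function free of I/O, so each rule can be unit-tested with plain strings.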
109
+
110
+ **Per-mode pipelines:**
111
+ - REVIEW_ONLY: `mr-review-workflow.agentic.v2` -> route by verdict
112
+ - QUICK_REVIEW: same + light model config, no arch audit override
113
+ - IMPLEMENT: `coding-task-workflow-agentic` (Phase 0.5 picks up pitch) -> PR -> review -> merge
114
+ - FULL: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> PR -> review -> merge
115
+
116
+ **Tensions resolved:** determinism, YAGNI, no LLM latency.
117
+ **Tensions accepted:** all ambiguous tasks fall to FULL (wasteful for Medium complexity tasks that don't need full discovery).
118
+ **Boundary:** routing function is pure, filesystem-only I/O (check for pitch.md).
119
+ **Failure mode:** a task such as `"fix the race condition in auth.ts"` (Medium complexity) has no static signal, so it falls to FULL and runs wr.discovery even when discovery adds little. This is over-inclusive but correct behavior, not a routing failure.
120
+ **Repo pattern:** follows. Pure function routing, discriminated union, escalation-first.
121
+ **Gain:** Zero dispatch latency; fully deterministic; simplest possible implementation (~100 lines).
122
+ **Give up:** Ambiguous Medium tasks all run FULL (discovery + shaping + coding) even when they might not need discovery.
123
+ **Scope judgment:** Best-fit for the named use cases. Slight over-inclusion for Medium tasks.
124
+ **Philosophy:** Fully honors CLAUDE.md. Best alignment.
125
+
126
+ ---
127
+
128
+ ### Candidate B: classify-task-workflow as authoritative source
129
+
130
+ **Summary:** Always spawn `classify-task-workflow` first, parse `recommendedPipeline` output, execute the returned workflow sequence. Pipeline modes are not named at the coordinator level.
131
+
132
+ **Architecture:**
133
+ ```typescript
134
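+ async function routeTask(goal: string, workspace: string, deps: Pick<AdaptiveCoordinatorDeps, 'spawnSession' | 'awaitSessions' | 'getAgentResult'>): Promise<Result<readonly string[], string>> {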
135
+ const handle = await deps.spawnSession('classify-task-workflow', `Classify: ${goal}`, workspace);
136
+ await deps.awaitSessions([handle], CLASSIFY_TIMEOUT_MS); // 3 minutes max
137
+ const agentResult = await deps.getAgentResult(handle);
138
+ return parseRecommendedPipeline(agentResult.recapMarkdown);
139
+ }
140
+ ```
141
+
142
+ `parseRecommendedPipeline` is a pure function parsing the text block (two-tier: JSON array first, regex fallback).
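
The two-tier parse described above might look like the following. This is a sketch only: the `Result` shape (`kind: 'ok' | 'err'`) is inferred from the `classified.kind === 'err'` usage in Candidate C, and the exact text layout of the `recommendedPipeline: [...]` line is an assumption based on the output format noted earlier.

```typescript
// Assumed Result shape, inferred from how Candidate C inspects `.kind`, `.value`, `.error`.
type Result<T, E> =
  | { readonly kind: 'ok'; readonly value: T }
  | { readonly kind: 'err'; readonly error: E };

const ok = <T>(value: T): Result<T, never> => ({ kind: 'ok', value });
const err = <E>(error: E): Result<never, E> => ({ kind: 'err', error });

function parseRecommendedPipeline(notes: string): Result<readonly string[], string> {
  // Locate the bracketed array on the `recommendedPipeline:` line.
  const line = /recommendedPipeline\s*:\s*(\[[^\]]*\])/.exec(notes);
  if (line === null) return err('no recommendedPipeline line found');

  // Tier 1: strict JSON array of non-empty length containing only strings.
  try {
    const parsed: unknown = JSON.parse(line[1]);
    if (Array.isArray(parsed) && parsed.length > 0 && parsed.every((id) => typeof id === 'string')) {
      return ok(parsed as string[]);
    }
  } catch {
    // Not valid JSON (e.g. single quotes or a trailing comma): fall through to tier 2.
  }

  // Tier 2: lenient scan for quoted IDs inside the brackets.
  const ids: string[] = [];
  const quoted = /["']([^"']+)["']/g;
  let m: RegExpExecArray | null;
  while ((m = quoted.exec(line[1])) !== null) {
    ids.push(m[1]);
  }
  return ids.length > 0 ? ok(ids) : err('recommendedPipeline is empty or malformed');
}
```

Returning `err` rather than a default pipeline keeps the parser pure; whether to apply a fallback pipeline or escalate is then a separate, visible decision in the caller.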
143
+
144
+ **Fallback:** if parsing fails, default to `['wr.discovery', 'coding-task-workflow-agentic', 'mr-review-workflow.agentic.v2']`.
145
+
146
+ **Tensions resolved:** intelligent routing for all tasks including ambiguous ones; single source of truth for pipeline selection rules.
147
+ **Tensions accepted:** non-deterministic; 5-15 second LLM latency per dispatch; no typed `PipelineMode` discriminated union (pipeline is a string[] at coordinator level).
148
+ **Boundary:** classify-task-workflow is the routing authority; coordinator is a runner.
149
+ **Failure mode:** classify-task-workflow misclassifies a PR-only task and returns discovery+coding phases, wasting 30+ minutes. Recovery: add a pre-check for PR number before spawning classify-task (hybrid).
150
+ **Repo pattern:** departs from determinism-over-cleverness principle. No named discriminated union.
151
+ **Gain:** routing rules live in a workflow file -- updatable without code deployment.
152
+ **Give up:** determinism, transparency, typed modes, dispatch speed for obvious cases.
153
+ **Scope judgment:** Too broad -- the coordinator becomes a generic workflow runner, not a policy owner.
154
+ **Philosophy:** Conflicts with determinism-over-cleverness and make-illegal-states-unrepresentable.
155
+
156
+ ---
157
+
158
+ ### Candidate C: Static-first with LLM fallback (hybrid, recommended for routing)
159
+
160
+ **Summary:** Two-tier `routeTask()`: Tier 1 applies static rules (pure, covers 80% of cases); Tier 2 falls back to classify-task-workflow for ambiguous tasks and returns a `CLASSIFY_AND_RUN` mode.
161
+
162
+ **PipelineMode type (6 variants):**
163
+ ```typescript
164
+ type PipelineMode =
165
+ | { kind: 'REVIEW_ONLY'; prNumbers: readonly number[] }
166
+ | { kind: 'QUICK_REVIEW'; prNumbers: readonly number[] }
167
+ | { kind: 'IMPLEMENT'; pitchPath: string }
168
+ | { kind: 'FULL'; goal: string }
169
+ | { kind: 'CLASSIFY_AND_RUN'; classifiedPipeline: readonly string[]; goal: string }
170
+ | { kind: 'ESCALATE'; reason: string };
171
+ ```
172
+
173
+ **Two-tier routing function:**
174
+ ```typescript
175
+ async function routeTask(
176
+ goal: string,
177
+ workspace: string,
178
+ deps: Pick<AdaptiveCoordinatorDeps, 'spawnSession' | 'awaitSessions' | 'getAgentResult'>,
179
+ ): Promise<Result<PipelineMode, string>> {
180
+ // Tier 1: static (pure, no I/O except filesystem check for pitch.md)
181
+ const staticMode = applyStaticRules(goal, workspace);
182
+ if (staticMode !== null) return ok(staticMode);
183
+ // Tier 2: classify-task-workflow
184
+ const classified = await runClassification(goal, workspace, deps);
185
+ if (classified.kind === 'err') return err(`classification failed: ${classified.error}`);
186
+ return ok({ kind: 'CLASSIFY_AND_RUN', classifiedPipeline: classified.value, goal });
187
+ }
188
+ ```
189
+
190
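+ `applyStaticRules(goal, workspace): PipelineMode | null` -- pure function, same rules as Candidate A, except that it must return `null` when no rule matches instead of applying A's unconditional `else -> FULL`; otherwise Tier 2 would be unreachable.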
191
+ `runClassification(goal, workspace, deps): Promise<Result<readonly string[], string>>` -- same as Candidate B's routeTask.
192
+
193
+ **CLASSIFY_AND_RUN execution:** coordinator iterates `classifiedPipeline` and spawns each workflow in sequence; unknown workflow IDs escalate with structured reason.
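
Before spawning anything, the coordinator could validate the whole classified pipeline so that an unknown ID escalates up front rather than midway through a run. The sketch below is illustrative: the allow-list contains only workflow IDs named in this document, and `PipelineCheck` and `checkClassifiedPipeline` are hypothetical names.

```typescript
// Workflow IDs mentioned in this document; the real registry may differ.
const KNOWN_WORKFLOW_IDS: ReadonlySet<string> = new Set([
  'wr.discovery',
  'wr.shaping',
  'coding-task-workflow-agentic',
  'mr-review-workflow.agentic.v2',
]);

type PipelineCheck =
  | { readonly kind: 'runnable'; readonly pipeline: readonly string[] }
  | { readonly kind: 'escalate'; readonly reason: string };

// Pure pre-flight check: every ID must be known, and the pipeline must be non-empty.
function checkClassifiedPipeline(pipeline: readonly string[]): PipelineCheck {
  if (pipeline.length === 0) {
    return { kind: 'escalate', reason: 'classified pipeline is empty' };
  }
  const unknown = pipeline.filter((id) => !KNOWN_WORKFLOW_IDS.has(id));
  if (unknown.length > 0) {
    return { kind: 'escalate', reason: `unknown workflow IDs: ${unknown.join(', ')}` };
  }
  return { kind: 'runnable', pipeline };
}
```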
194
+
195
+ **Tensions resolved:** determinism for well-known cases; intelligence for ambiguous cases; fast (no LLM) for 80% of cases.
196
+ **Tensions accepted:** CLASSIFY_AND_RUN mode is less typed than named modes; two-tier adds ~30 lines of complexity vs pure static.
197
+ **Boundary:** static rules handle the policy for known cases; classify-task handles the policy for unknown cases.
198
+ **Failure mode:** developer adds a new static rule that catches cases formerly handled by classify-task; routing changes silently for those cases. Documentation + tests mitigate this.
199
+ **Repo pattern:** follows parseFindingsFromNotes two-tier strategy precisely.
200
+ **Gain:** fast for common cases, intelligent for ambiguous cases, deterministic for all named modes.
201
+ **Give up:** CLASSIFY_AND_RUN is not a named typed mode with typed data (it carries a string[] pipeline).
202
+ **Scope judgment:** Best-fit.
203
+ **Philosophy:** Honors all CLAUDE.md principles. Determinism-over-cleverness: static tier is deterministic; LLM fallback is explicitly bounded and documented.
204
+
205
+ ---
206
+
207
+ ### Candidate D: Explicit pipelineMode in TriggerDefinition + CLI flag (configuration-driven)
208
+
209
+ **Summary:** Add optional `pipelineMode: 'review_only' | 'quick_review' | 'implement' | 'full' | 'auto'` to `TriggerDefinition`; CLI `--mode` flag for `worktrain run pipeline`. `auto` falls back to Candidate C.
210
+
211
+ **New optional fields:**
212
+ ```typescript
213
+ // In TriggerDefinition (src/trigger/types.ts):
214
+ readonly pipelineMode?: 'review_only' | 'quick_review' | 'implement' | 'full' | 'auto';
215
+ ```
216
+
217
+ **CLI override:**
218
+ ```
219
+ worktrain run pipeline --task "..." --mode review_only
220
+ worktrain run pipeline --task "..." --mode full
221
+ worktrain run pipeline --task "..." # auto: falls back to Candidate C hybrid
222
+ ```
223
+
224
+ **Coordinator uses explicit mode when present; otherwise Candidate C.**
225
+
226
+ **Tensions resolved:** removes routing ambiguity for configured triggers; fully explicit for trigger operators.
227
+ **Tensions accepted:** configuration overhead; TriggerDefinition schema change; `auto` still needs C's complexity.
228
+ **Boundary:** trigger config as the routing authority for configured workflows; C for ad-hoc CLI.
229
+ **Failure mode:** trigger operator omits `pipelineMode` and gets unexpected auto-routing.
230
+ **Repo pattern:** departs -- adds a new field to TriggerDefinition. The existing `workflowId` field plays a similar role (telling the trigger what to run). Adding `pipelineMode` is a competing second field for routing.
231
+ **Gain:** deterministic, transparent, auditable routing for trigger-based pipelines.
232
+ **Give up:** schema change, configuration overhead, potential confusion with `workflowId`.
233
+ **Scope judgment:** Slightly broad for an initial design. CLI `--mode` flag is sufficient without the TriggerDefinition change.
234
+ **Philosophy:** Honors explicit-over-implicit. Minor YAGNI conflict for the TriggerDefinition field.
235
+
236
+ ---
237
+
238
+ ### Candidate E: Per-mode coordinator files with thin dispatcher (architectural decomposition, recommended for architecture)
239
+
240
+ **Summary:** The adaptive coordinator is decomposed into per-mode files following `pr-review.ts` independently; a thin `dispatch.ts` reads the routing result and calls the right coordinator.
241
+
242
+ **File structure:**
243
+ ```
244
+ src/coordinators/
245
+ adaptive-pipeline.ts <- thin entry point + CoordinatorDeps wiring
246
+ routing/
247
+ route-task.ts <- routeTask() [Candidate A or C logic]
248
+ parse-classify-output.ts <- parseRecommendedPipeline() pure function
249
+ classify.ts <- runClassification() -- LLM fallback
250
+ modes/
251
+ review-only.ts <- runReviewOnlyPipeline(deps, opts, mode)
252
+ quick-review.ts <- runQuickReviewPipeline(deps, opts, mode)
253
+ implement.ts <- runImplementPipeline(deps, opts, mode)
254
+ full-pipeline.ts <- runFullPipeline(deps, opts, mode)
255
+ classify-and-run.ts <- runClassifyAndRunPipeline(deps, opts, mode)
256
+ ```
257
+
258
+ **Dispatcher (adaptive-pipeline.ts):**
259
+ ```typescript
260
+ export async function runAdaptivePipeline(
261
+ deps: AdaptiveCoordinatorDeps,
262
+ opts: AdaptivePipelineOpts,
263
+ ): Promise<PipelineResult> {
264
+ const modeResult = await routeTask(opts.goal, opts.workspace, deps);
265
+ if (modeResult.kind === 'err') return escalate(modeResult.error);
266
+ const mode = modeResult.value;
267
+ deps.stderr(`[routing] mode=${mode.kind} goal="${opts.goal.slice(0, 60)}"`);
268
+ switch (mode.kind) {
269
+ case 'REVIEW_ONLY': return runReviewOnlyPipeline(deps, opts, mode);
270
+ case 'QUICK_REVIEW': return runQuickReviewPipeline(deps, opts, mode);
271
+ case 'IMPLEMENT': return runImplementPipeline(deps, opts, mode);
272
+ case 'FULL': return runFullPipeline(deps, opts, mode);
273
+ case 'CLASSIFY_AND_RUN': return runClassifyAndRunPipeline(deps, opts, mode);
274
+ case 'ESCALATE': return { escalated: true, escalationReason: mode.reason, /* ...remaining PipelineResult fields */ };
275
+ default: return assertNever(mode);
276
+ }
277
+ }
278
+ ```
279
+
280
+ **Tensions resolved:** monolithic-vs-decomposition (fully decomposed); open/closed (adding a mode is additive, not modification); each mode independently testable.
281
+ **Tensions accepted:** more files to navigate; thin dispatcher adds one level of indirection.
282
+ **Boundary:** the seam is the `PipelineMode` discriminated union passed from routing to dispatch to mode executors.
283
+ **Failure mode:** mode executor interfaces diverge over time. Mitigation: shared `AdaptivePipelineOpts` base type, `AdaptiveCoordinatorDeps` as single shared interface.
284
+ **Repo pattern:** direct extension of pr-review.ts pattern -- each mode file IS a pr-review.ts equivalent.
285
+ **Gain:** each mode file is small (~300-600 lines), focused, testable in isolation.
286
+ **Give up:** more files than a monolithic coordinator.
287
+ **Scope judgment:** Best-fit for 5+ modes. The decomposition cost is low (~3 extra files); the maintenance benefit is high.
288
+ **Philosophy:** Honors YAGNI (each file is exactly what that mode needs), exhaustiveness (dispatch switch), compose-with-small-pure-functions.
289
+
290
+ ---
291
+
292
+ ## Comparison and Recommendation
293
+
294
+ ### Recommendation: Candidate C (routing mechanism) + Candidate E (architecture)
295
+
296
+ **Routing (C):** Two-tier static-first with LLM fallback. Static rules cover the 4 well-defined cases at zero cost and latency. `CLASSIFY_AND_RUN` handles genuinely ambiguous tasks via classify-task-workflow. This precisely mirrors the `parseFindingsFromNotes` two-tier strategy already established in `pr-review.ts`.
297
+
298
+ **Architecture (E):** Per-mode coordinator files with thin dispatcher. Each mode file follows `pr-review.ts` independently. The dispatcher's `switch(mode.kind)` is exhaustive with `assertNever`. Adding a new mode is additive.
299
+
300
+ **Candidate D (pipelineMode config):** Not part of the initial design. CLI `--mode` flag provides explicit override. TriggerDefinition field deferred.
301
+
302
+ ### Why not A (pure static)?
303
+
304
+ Candidate A is simpler but all tasks without static signals fall to FULL (wr.discovery + wr.shaping + coding). A genuinely vague idea needs FULL. But a Medium complexity coding task with no pitch.md and no PR number -- e.g., `"refactor auth.ts to use Result types"` -- also falls to FULL, running unnecessary discovery phases. Candidate C covers this case with classify-task-workflow returning `['coding-task-workflow-agentic', 'mr-review-workflow.agentic.v2']`.
305
+
306
+ ### Why not B (pure LLM)?
307
+
308
+ Non-deterministic routing is unacceptable for the coordinator. A PR review task must always route to REVIEW_ONLY regardless of LLM mood. The latency and indeterminism costs of always-classify outweigh its benefits given the existing static signal coverage.
309
+
310
+ ---
311
+
312
+ ## Self-Critique
313
+
314
+ ### Strongest counter-argument against C+E
315
+
316
+ **Against C's static rules:** The dep-bump heuristic (`bump`, `chore:`, `dependabot`) may match task descriptions from queue items that aren't actually PR-linked dep bumps. Example: `"bump the cache TTL to 300 seconds"` would match `bump` but isn't a dep bump. Static rule fix: require BOTH a bump keyword AND a PR/MR number for QUICK_REVIEW. This is already the design (rules are ANDed).
317
+
318
+ **Against E's decomposition:** A developer unfamiliar with the codebase must navigate 7+ files to understand one end-to-end pipeline. Counter: the dispatcher is the entry point; the routing logic is in one file; each mode file is self-contained and understandable in isolation. The navigation cost is lower than reading 4000 lines in one file.
319
+
320
+ ### Pivot conditions
321
+
322
+ 1. If classify-task-workflow format drifts and `parseRecommendedPipeline` fails more than 10% of the time -> pivot to pure static (Candidate A) and accept FULL as default for ambiguous tasks
323
+ 2. If trigger operators need deterministic routing for automated workflows -> add `pipelineMode` to TriggerDefinition (Candidate D addition)
324
+ 3. If context-passing agent's design requires structured handoff data from routing to mode executors -> add a `contextBundle` field to mode types (implementation change, not routing design change)
325
+
326
+ ### Assumption that would invalidate this design
327
+
328
+ If `spawn_agent`'s artifact return limitation is fixed before implementation (planned PR), the coordinator could read classify-task's `recommendedPipeline` as a typed artifact instead of parsing notes. This would change `classify.ts` but not the overall C+E architecture. The design is forward-compatible with this improvement.
329
+
330
+ ---
331
+
332
+ ## Open Questions for the Main Agent
333
+
334
+ 1. Should `CLASSIFY_AND_RUN` mode be treated as a named mode with a typed variant (as in Candidate C), or should the coordinator convert classify-task's output into one of the 4 named modes (REVIEW_ONLY / QUICK_REVIEW / IMPLEMENT / FULL) based on what workflows appear in `recommendedPipeline`? The latter would eliminate the `CLASSIFY_AND_RUN` variant but require mapping logic.
335
+
336
+ 2. The context-passing agent has not yet filled in the "Assumptions the routing agent needs to know about" section of their doc. Before finalizing, should the routing design explicitly specify what each mode executor must inject as `assembledContextSummary` for each phase transition?
337
+
338
+ 3. Should the `AdaptiveCoordinatorDeps` interface be a strict superset of `CoordinatorDeps` from pr-review.ts, or should it be a separate interface that may share some methods? The safest approach: new interface, shared common methods via intersection type or copy.
339
+
340
+ 4. What is the wall-clock timeout per phase in the full pipeline? pr-review.ts hardcodes 15 minutes per child session. For wr.discovery and wr.shaping (which may run longer), a higher timeout is appropriate -- but it should be hardcoded, not LLM-computed (robustness rule 1 from pr-review.ts).
@@ -0,0 +1,135 @@
1
+ # Adaptive Coordinator Routing -- Design Review Findings
2
+
3
+ **Status:** Review complete
4
+ **Date:** 2026-04-19
5
+ **Reviewing:** `docs/design/adaptive-coordinator-routing.md` (selected design: A+E)
6
+ **For:** Main agent interpretation and final decision
7
+
8
+ ---
9
+
10
+ ## Tradeoff Review
11
+
12
+ | Tradeoff | Acceptable? | Condition that breaks it |
13
+ |----------|-------------|--------------------------|
14
+ | All ambiguous tasks run FULL (wasteful for Medium-complexity refactors) | Yes for MVP | >20% of real tasks are Medium-complexity refactors routed to FULL unnecessarily |
15
+ | `routeTask()` filesystem check injectable via `TaskSignals` | Yes | Multiple expensive filesystem signals needed (currently only pitchMdExists) |
16
+ | `QUICK_REVIEW` and `REVIEW_ONLY` as separate DU variants | Yes | Behaviors converge to same implementation (trivial merge at that point) |
17
+
18
+ **Hidden assumptions surfaced:**
19
+ 1. Discovery and shaping phases produce helpful (not misleading) output for all tasks applied to them
20
+ 2. `pitchMdExists` is the only filesystem routing signal (future signals added to `TaskSignals`)
21
+
22
+ ---
23
+
24
+ ## Failure Mode Review
25
+
26
+ | Failure mode | Design handles it? | Missing mitigation | Risk |
27
+ |--------------|-------------------|--------------------|------|
28
+ | PR number in non-review task misroutes to REVIEW_ONLY | Partially (--mode override exists) | Regex pattern too broad -- bare `#\d+` matches non-PR numbers | LOW-MEDIUM |
29
+ | Stale pitch.md routes new task to IMPLEMENT incorrectly | No -- convention not enforced | `runImplementPipeline()` must archive pitch.md after successful coding | MEDIUM |
30
+ | FULL pipeline timeout leaves intermediate state | Partial (wall-clock budget mentioned) | Per-phase timeouts not specified; intermediate state cleanup undefined | HIGH |
31
+
32
+ ---
33
+
34
+ ## Runner-Up / Simpler Alternative Review
35
+
36
+ **Runner-up (C+E):** Only element worth borrowing is `TaskSignals` as explicit value object -- already incorporated into design. CLASSIFY_AND_RUN mode correctly excluded (non-determinism + format-parsing fragility + FULL mode redundancy).
37
+
38
+ **Simpler variant (A without E -- monolithic file):** Would satisfy acceptance criteria but violates open/closed (adding a mode modifies shared file). `pr-review.ts` at 1462 lines for one mode is justification for E's decomposition from day one. Monolithic rejected.
39
+
40
+ ---
41
+
42
+ ## Philosophy Alignment
43
+
44
+ **All key principles satisfied:** illegal-states-unrepresentable, exhaustiveness, errors-as-data, validate-at-boundaries, determinism-over-cleverness, dependency-injection, YAGNI.
45
+
46
+ **Two acceptable tensions:**
47
+ 1. `fileExists()` I/O behind `CoordinatorDeps` injectable -- principle preserved by injection
48
+ 2. Mode executors are imperative (sequential spawns) -- routing layer is declarative/pure; this is the best achievable split for sequential pipeline coordination
49
+
50
+ ---
51
+
52
+ ## Findings
53
+
54
+ ### RED -- must fix before implementation
55
+
56
+ **R1: FULL pipeline per-phase timeouts and total budget not specified**
57
+
58
+ The design says "hardcoded timeouts" but does not specify the values. For a coordinator that chains 4 sessions, this is a required constant before implementation. If the coordinator times out mid-pipeline (after shaping has written the pitch but before coding completes), the repository is left with a `.workrail/current-pitch.md` file and no implementation -- a silent intermediate state that will misroute the next invocation.
59
+
60
+ **Required additions to the design:**
61
+ ```typescript
62
+ const DISCOVERY_SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30 min
63
+ const SHAPING_SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30 min
64
+ const CODING_SESSION_TIMEOUT_MS = 60 * 60 * 1000; // 60 min
65
+ const REVIEW_SESSION_TIMEOUT_MS = 20 * 60 * 1000; // 20 min
66
+ const FULL_PIPELINE_MAX_MS = 160 * 60 * 1000; // 160 min total
67
+ const FULL_PIPELINE_SPAWN_CUTOFF_MS = 130 * 60 * 1000; // 130 min (stop spawning new phases)
68
+ ```
69
+
70
+ And: `runFullPipeline()` must archive pitch.md if it was produced by shaping but coding times out or fails.
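
One way the constants above could gate each phase is a pure decision function called before every spawn. This is a sketch under assumptions: `Phase`, `SpawnDecision`, and `decideSpawn` are hypothetical names, and clamping a phase's timeout to the remaining total budget is an illustrative policy that the review does not itself specify.

```typescript
type Phase = 'discovery' | 'shaping' | 'coding' | 'review';

// Values copied from the constants proposed in R1.
const PHASE_TIMEOUT_MS: Readonly<Record<Phase, number>> = {
  discovery: 30 * 60 * 1000,
  shaping: 30 * 60 * 1000,
  coding: 60 * 60 * 1000,
  review: 20 * 60 * 1000,
};
const FULL_PIPELINE_MAX_MS = 160 * 60 * 1000;
const FULL_PIPELINE_SPAWN_CUTOFF_MS = 130 * 60 * 1000;

type SpawnDecision =
  | { readonly kind: 'spawn'; readonly timeoutMs: number }
  | { readonly kind: 'stop'; readonly reason: string };

// Decides whether the next phase may start, given wall-clock time already spent.
function decideSpawn(phase: Phase, elapsedMs: number): SpawnDecision {
  if (elapsedMs >= FULL_PIPELINE_SPAWN_CUTOFF_MS) {
    return { kind: 'stop', reason: `spawn cutoff reached before ${phase}` };
  }
  // Assumed policy: never let a phase run past the total pipeline budget.
  const remainingMs = FULL_PIPELINE_MAX_MS - elapsedMs;
  return { kind: 'spawn', timeoutMs: Math.min(PHASE_TIMEOUT_MS[phase], remainingMs) };
}
```

A `stop` decision is the natural place to trigger the pitch.md archival step, since it marks exactly the intermediate state R1 warns about.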
71
+
72
+ ---
73
+
74
+ ### ORANGE -- should fix before implementation
75
+
76
+ **O1: PR number regex pattern too broad**
77
+
78
+ Current routing rule: "goal contains PR/MR number" matches any bare `#\d+` in the goal text. This produces false positives for `"refactor PR #47 related auth code"` (wants IMPLEMENT, gets REVIEW_ONLY) or `"fix issue #123 in the auth module"` (wants FULL, gets REVIEW_ONLY).
79
+
80
+ **Required fix:** Use context-sensitive pattern matching:
81
+ - REVIEW_ONLY: `\bPR\s*#\d+\b` or `\bMR\s*!?\d+\b` with leading verb context (`review`, `check`, `approve`, etc.)
82
+ - Or more conservatively: require the goal to START with review-intent keywords (`"Review PR #..."`, `"Check MR ..."`) rather than contain a PR number anywhere
83
+
84
+ Recommendation: the routing function should be aware of ambiguous patterns and log a warning when a PR number is found but no review-intent verb precedes it.
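
The conservative option (goal must start with a review-intent verb) together with the warning case could be sketched as below. The verb list is limited to the three examples named above, and `PrSignal` and `classifyPrReference` are hypothetical names; the real routing function may use a different shape.

```typescript
type PrSignal =
  | { readonly kind: 'review'; readonly prNumber: number }
  // PR/MR reference present but no leading review verb: caller should log a warning.
  | { readonly kind: 'ambiguous'; readonly prNumber: number }
  | { readonly kind: 'none' };

// Requires an explicit "PR #47" or "MR !12" / "MR 12" form; a bare "#123" does not match.
const PR_REF = /\b(?:PR\s*#|MR\s*!?)(\d+)\b/i;
// Assumed review-intent verbs, taken from the examples in O1.
const STARTS_WITH_REVIEW_VERB = /^\s*(?:review|check|approve)\b/i;

function classifyPrReference(goal: string): PrSignal {
  const ref = PR_REF.exec(goal);
  if (ref === null) return { kind: 'none' };
  const prNumber = Number(ref[1]);
  return STARTS_WITH_REVIEW_VERB.test(goal)
    ? { kind: 'review', prNumber }
    : { kind: 'ambiguous', prNumber };
}
```

With this shape, only `review` routes to REVIEW_ONLY; `ambiguous` lets the router fall through to its other rules while still recording that a PR number was seen.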
85
+
86
+ **O2: pitch.md archival not specified**
87
+
88
+ `runImplementPipeline()` must rename `.workrail/current-pitch.md` to `.workrail/pitches/[timestamp]-[goal-slug]-pitch.md` (or similar) after the coding session completes successfully. Without this, stale pitch files cause incorrect IMPLEMENT routing for subsequent tasks.
89
+
90
+ The `IMPLEMENT` mode coordinator needs a post-coding cleanup step. This is an implementation detail but must be in the design spec before coding begins.
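
As an illustration of the naming scheme, the archive path could be derived by a pure helper like the one below. `slugifyGoal` and `archivedPitchPath` are hypothetical names, and the timestamp format and slug length are assumptions; the actual rename would go through the coordinator's injected file operations.

```typescript
// Hypothetical helper: lowercases the goal and keeps only [a-z0-9-].
function slugifyGoal(goal: string, maxLength = 40): string {
  const slug = goal
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse runs of other characters into one dash
    .replace(/^-+|-+$/g, '') // trim leading and trailing dashes
    .slice(0, maxLength)
    .replace(/-+$/, ''); // re-trim in case the cut landed on a dash
  return slug || 'untitled';
}

// Hypothetical helper: builds `.workrail/pitches/[timestamp]-[goal-slug]-pitch.md`.
function archivedPitchPath(goal: string, now: Date): string {
  // Assumption: ISO timestamp with ':' and '.' replaced so it is filename-safe.
  const timestamp = now.toISOString().replace(/[:.]/g, '-');
  return `.workrail/pitches/${timestamp}-${slugifyGoal(goal)}-pitch.md`;
}
```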
91
+
92
+ ---
93
+
94
+ ### YELLOW -- nice to fix, non-blocking
95
+
96
+ **Y1: `QUICK_REVIEW` and `REVIEW_ONLY` distinction underspecified**
97
+
98
+ The design mentions QUICK_REVIEW uses a "lighter model config" but does not define what "lighter" means -- which model, what `agentConfig` fields, what the expected speed/quality tradeoff is. Before implementing QUICK_REVIEW, define: `{ model: 'amazon-bedrock/claude-haiku-4-5', maxSessionMinutes: 5 }` or equivalent.
99
+
100
+ **Y2: `TaskSignals` interface not fully specified**
101
+
102
+ The design refers to `TaskSignals` but does not define all fields. A complete definition is needed before implementation:
103
+ ```typescript
104
+ interface TaskSignals {
105
+ readonly triggerProvider: string; // 'generic' | 'github_prs_poll' | 'github_issues_poll' | 'gitlab_poll'
106
+ readonly pitchMdExists: boolean; // .workrail/current-pitch.md exists in workspace
107
+ readonly issueLabels: readonly string[]; // labels from trigger payload (empty if not from polling trigger)
108
+ readonly explicitMode?: string; // from --mode CLI flag or trigger context variable
109
+ }
110
+ ```
111
+
112
+ **Y3: `AdaptiveCoordinatorDeps` vs `CoordinatorDeps` relationship**
113
+
114
+ The design does not specify whether `AdaptiveCoordinatorDeps` extends `CoordinatorDeps` from pr-review.ts or is a separate interface. Recommendation: separate interface that copies shared methods (no inheritance from pr-review.ts since that coordinator's deps are highly specific to PR review). Shared pattern, not shared type.
115
+
116
+ ---
117
+
118
+ ## Recommended Revisions
119
+
120
+ 1. **(RED) Add per-phase timeout constants and intermediate state cleanup to design** -- required before implementation
121
+ 2. **(ORANGE) Tighten PR number routing regex** -- reduces false positives meaningfully
122
+ 3. **(ORANGE) Specify pitch.md archival in `runImplementPipeline()`** -- prevents stale routing
123
+ 4. **(YELLOW) Define `TaskSignals` interface fully**
124
+ 5. **(YELLOW) Define QUICK_REVIEW model config**
125
+ 6. **(YELLOW) Specify `AdaptiveCoordinatorDeps` relationship to `CoordinatorDeps`**
126
+
127
+ ---
128
+
129
+ ## Residual Concerns
130
+
131
+ 1. **Discovery/shaping quality for tasks they shouldn't run on.** If `wr.discovery` is unhelpful for "refactor auth.ts" tasks (just produces boilerplate), the FULL default becomes actively harmful, not just wasteful. Monitoring needed in production: log which tasks route to FULL and whether discovery findings are actually used by the shaping phase.
132
+
133
+ 2. **No checkpoint/resume for multi-phase pipeline.** If the coordinator crashes mid-FULL-pipeline, there is no way to resume from the completed phases. The current design requires re-running from the beginning. This is acceptable for MVP but should be tracked as a gap.
134
+
135
+ 3. **The context-passing agent's `adaptive-coordinator-context.md` did not exist at review time.** The assumptions section in the routing design is speculative. If the context-passing design introduces new contracts (e.g., coordinator must inject a `discoveryDoc` at the shaping spawn), the routing design is unaffected (routing is pure, context injection is per-mode), but the mode coordinator implementations need updating.
@@ -0,0 +1,156 @@
1
+ # Adaptive Coordinator Routing -- Design Review Findings
2
+
3
+ **Status:** Generated by wr.discovery workflow, 2026-04-19
4
+ **Selected design:** Candidate A (pure static routing) + Candidate E (per-mode file architecture)
5
+ **Main design doc:** `docs/design/adaptive-coordinator-routing.md`
6
+
7
+ ---
8
+
9
+ ## Tradeoff Review
10
+
11
+ ### Tradeoff 1: All ambiguous tasks default to FULL pipeline
12
+
13
+ - **Status: ACCEPTED**
14
+ - Covers all 5 stated use cases correctly
15
+ - Hidden assumption: wr.discovery runtime < 30 minutes (needs validation)
16
+ - Phase-level escalation required: discovery-fail and shaping-fail must be independent escalation points in full-pipeline.ts
17
+ - **Action required:** Document in design doc that FULL pipeline requires per-phase escalation, not just a top-level catch
18
+
19
+ ### Tradeoff 2: routeTask() filesystem check is injectable
20
+
21
+ - **Status: ACCEPTED with implementation note**
22
+ - `AdaptiveCoordinatorDeps` needs a new `fileExists(path: string): Promise<boolean>` method
23
+ - This method is NOT in the current `CoordinatorDeps` from pr-review.ts
24
+ - **Action required:** Document `AdaptiveCoordinatorDeps` as a new interface (not extending CoordinatorDeps), specifying which methods are shared vs. new
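
A sketch of what the injectable check might look like in practice. Only `fileExists(path: string): Promise<boolean>` comes from the note above; `AdaptiveCoordinatorDepsSketch`, `pitchMdExists`, and `fakeDeps` are hypothetical names used for illustration.

```typescript
// Only the new method is shown; the shared methods are omitted in this sketch.
interface AdaptiveCoordinatorDepsSketch {
  fileExists(path: string): Promise<boolean>;
}

const PITCH_PATH = '.workrail/current-pitch.md';

// Gathers the filesystem signal through the injected dependency, so the
// routing function itself can stay pure and take a plain boolean.
async function pitchMdExists(deps: AdaptiveCoordinatorDepsSketch): Promise<boolean> {
  return deps.fileExists(PITCH_PATH);
}

// Hypothetical in-memory fake for tests: no real filesystem access.
function fakeDeps(existingPaths: readonly string[]): AdaptiveCoordinatorDepsSketch {
  const known = new Set(existingPaths);
  return { fileExists: async (path: string) => known.has(path) };
}
```

Tests can then exercise both routing outcomes by passing `fakeDeps([PITCH_PATH])` or `fakeDeps([])` without touching disk.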
25
+
26
+ ### Tradeoff 3: QUICK_REVIEW and REVIEW_ONLY as separate discriminated union variants
27
+
28
+ - **Status: ACCEPTED with clarification needed**
29
+ - Hidden assumption exposed: QUICK_REVIEW behavior difference must be realized via specialized goal string to mr-review, not a new workflow flag
30
+ - If mr-review ignores goal hints, QUICK_REVIEW = REVIEW_ONLY (acceptable for MVP)
31
+ - **Action required:** Document that QUICK_REVIEW passes goal prefix `[DEP BUMP]` to the review session
32
+
33
+ ---
34
+
35
+ ## Failure Mode Review
36
+
37
+ ### FM1: PR number in non-review task goal string
38
+
39
+ - **Status: ADEQUATE mitigation** -- routing log printed before spawn; `--mode` override available
40
+ - Severity: LOW -- edge case, transparent to user
41
+
42
+ ### FM2: Stale pitch.md from previous task (HIGHEST RISK)
43
+
44
+ - **Status: INADEQUATE mitigation** -- routing design doc does not address pitch.md lifecycle
45
+ - The IMPLEMENT mode can silently route a new task to the wrong pitch
46
+ - **Action required (ORANGE):** Document pitch.md lifecycle invariant: IMPLEMENT mode executor (`modes/implement.ts`) must archive or delete pitch.md after coding session completes (success OR failure). This is a pipeline executor invariant, not a routing layer change.
47
+
48
+ ### FM3: FULL pipeline phase timeout
49
+
50
+ - **Status: INCOMPLETE specification**
51
+ - pr-review.ts timeouts (15 min child session, 70 min spawn cutoff) are wrong for FULL pipeline
52
+ - FULL pipeline can run up to 120 min (discovery 30 min + shaping 30 min + coding 60 min)
53
+ - **Action required (ORANGE):** Specify explicit timeout constants for adaptive coordinator in the design doc: `DISCOVERY_TIMEOUT_MS = 35*60*1000`, `SHAPING_TIMEOUT_MS = 35*60*1000`, `CODING_TIMEOUT_MS = 65*60*1000`, `COORDINATOR_SPAWN_CUTOFF_MS = 100*60*1000`, `COORDINATOR_MAX_MS = 120*60*1000`
54
+
55
+ ### FM4: Discovery session produces empty/trivial notes
56
+
57
+ - **Status: EXPLICITLY ACCEPTED** -- acceptable for MVP to proceed to shaping with thin context
58
+ - Severity: LOW -- quality degradation, not a correctness failure
59
+ - Note: FULL mode executor should log a warning when discovery notes are < N characters before spawning shaping
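
The warning in FM4 could be a small pure check, sketched below. The threshold N is not specified in the review, so it is left as a required parameter; `thinDiscoveryWarning` is a hypothetical name.

```typescript
// Hypothetical helper: returns a warning message when discovery notes look
// too thin, or null when they are long enough. The threshold is supplied by
// the caller because the review leaves N unspecified.
function thinDiscoveryWarning(notes: string, minChars: number): string | null {
  const length = notes.trim().length;
  if (length >= minChars) return null;
  return `discovery notes are ${length} chars (< ${minChars}); proceeding to shaping with thin context`;
}
```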
60
+
61
+ ---
62
+
63
+ ## Runner-Up / Simpler Alternative Review
64
+
65
+ ### Runner-up (Candidate C: static-first + LLM fallback)
66
+
67
+ **One element worth preserving:** Write `parseRecommendedPipeline()` as a pure function in `routing/parse-classify-output.ts` with tests, but do NOT call it in the coordinator at MVP. This preserves the upgrade path to Candidate C if static routing proves insufficient.
68
+
69
+ **Cost:** ~30 lines of code + tests. Low cost, high future value.
70
+
71
+ ### Simpler variant (single coordinator file)
72
+
73
+ **Rejected.** For 4+ pipeline modes, a single file would be 4000+ lines or require the same internal decomposition as Candidate E. Candidate E makes the seams explicit and testable without adding meaningful overhead.
74
+
75
+ ---
76
+
77
+ ## Philosophy Alignment
78
+
79
+ | Principle | Status | Note |
80
+ |-----------|--------|------|
81
+ | Immutability by default | SATISFIED | All types readonly, routeTask() pure |
82
+ | Make illegal states unrepresentable | SATISFIED | PipelineMode discriminated union |
83
+ | Type safety first | SATISFIED | Result<PipelineMode, string>, no null |
84
+ | Errors are data | SATISFIED | Result pattern, escalated+reason |
85
+ | Exhaustiveness | SATISFIED | switch + assertNever in dispatcher |
86
+ | Dependency injection | SATISFIED | AdaptiveCoordinatorDeps injectable |
87
+ | YAGNI with discipline | SATISFIED | 4 modes, no general engine |
88
+ | Determinism over cleverness | SATISFIED | Pure static routing function |
89
+ | Document why not what | REQUIRES ACTION | Coordinator header block needed |
90
+ | Functional/declarative | UNDER TENSION (acceptable) | Mode executors are imperative; mitigation: pure routing core, imperative execution shell |
91
+ | Compose with small pure functions | UNDER TENSION (acceptable) | Same tension as pr-review.ts; same mitigation |
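
To illustrate the "Make illegal states unrepresentable" and "Exhaustiveness" rows, here is a sketch of a discriminated union with an `assertNever` dispatcher. The `kind` field and the `executorFor` function are assumptions; `modes/implement.ts` and `modes/review-only.ts` are named in this review, while the other two module paths are hypothetical.

```typescript
// The four modes named in the review; the `kind` discriminant is an assumption.
type PipelineMode =
  | { readonly kind: 'FULL' }
  | { readonly kind: 'IMPLEMENT' }
  | { readonly kind: 'REVIEW_ONLY' }
  | { readonly kind: 'QUICK_REVIEW' };

// Fails to compile if a union variant is not handled by the switch below.
function assertNever(value: never): never {
  throw new Error(`Unhandled pipeline mode: ${JSON.stringify(value)}`);
}

// Hypothetical dispatcher: maps each mode to its executor module.
function executorFor(mode: PipelineMode): string {
  switch (mode.kind) {
    case 'FULL':
      return 'modes/full-pipeline.ts'; // hypothetical path
    case 'IMPLEMENT':
      return 'modes/implement.ts';
    case 'REVIEW_ONLY':
      return 'modes/review-only.ts';
    case 'QUICK_REVIEW':
      return 'modes/quick-review.ts'; // hypothetical path
    default:
      return assertNever(mode);
  }
}
```

Adding a fifth variant to `PipelineMode` without a matching `case` turns the `default` branch into a type error, which is the property the table refers to.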
92
+
93
+ ---
94
+
95
+ ## Findings
96
+
97
+ ### RED (blocking -- must address before implementation)
98
+
99
+ None.
100
+
101
+ ### ORANGE (significant -- should address before implementation)
102
+
103
+ **O1: FULL pipeline timeout constants not specified**
104
+ The design doc does not specify concrete timeout values for the adaptive coordinator. The hardcoded constants in pr-review.ts are wrong for the FULL pipeline duration. Without explicit guidance, implementers will have to guess values.
105
+ *Required action:* Add a "Timing Constants" section to the design doc with explicit millisecond values for each timeout.
106
+
107
+ **O2: Pitch.md lifecycle invariant not documented**
108
+ The IMPLEMENT mode can silently misroute if pitch.md is stale. The routing design doc is the natural place to document that IMPLEMENT mode's `modes/implement.ts` must archive or delete pitch.md once the coding session completes (success or failure).
109
+ *Required action:* Add pitch.md lifecycle invariant to the design doc.
110
+
111
+ ### YELLOW (advisory -- address if convenient)
112
+
113
+ **Y1: AdaptiveCoordinatorDeps interface specification missing**
114
+ The design doc does not specify which methods are shared with `CoordinatorDeps` and which are new (e.g., `fileExists`). An implementer will have to infer this.
115
+ *Required action:* Add a "New deps interface methods" section.
116
+
117
+ **Y2: QUICK_REVIEW goal string format not specified**
118
+ The design says QUICK_REVIEW passes a `[DEP BUMP]` goal prefix. The exact format affects how mr-review interprets the task, so it should be explicit.
119
+ *Required action:* Specify goal string template for QUICK_REVIEW in the design doc.
120
+
121
+ **Y3: parseRecommendedPipeline() should be written at implementation time**
122
+ Even though not called at MVP, having the pure function ready preserves the upgrade path to Candidate C.
123
+ *Required action:* Include in implementation scope as a low-cost pure function (~30 lines) with tests.
124
+
125
+ ---
126
+
127
+ ## Recommended Revisions to Design Doc
128
+
129
+ 1. Add **Timing Constants** section with explicit millisecond values:
130
+ - `DISCOVERY_TIMEOUT_MS = 35 * 60 * 1000` (35 min)
131
+ - `SHAPING_TIMEOUT_MS = 35 * 60 * 1000` (35 min)
132
+ - `CODING_TIMEOUT_MS = 65 * 60 * 1000` (65 min)
133
+ - `COORDINATOR_SPAWN_CUTOFF_MS = 100 * 60 * 1000` (100 min)
134
+ - `COORDINATOR_MAX_MS = 120 * 60 * 1000` (120 min)
135
+
136
+ 2. Add **Pitch.md Lifecycle Invariant** to the design doc under pipeline invariants:
137
+ - IMPLEMENT mode executor archives/deletes `.workrail/current-pitch.md` after coding session completes or fails
138
+ - Archive path: `.workrail/used-pitches/pitch-{timestamp}.md`
139
+
140
+ 3. Add **AdaptiveCoordinatorDeps interface** sketch:
141
+ - New methods beyond CoordinatorDeps: `fileExists(path: string): Promise<boolean>`
142
+ - Shared methods (same signature): `spawnSession`, `awaitSessions`, `getAgentResult`, `mergePR`, `listOpenPRs`, `writeFile`, `stderr`, `now`
143
+ - Dropped methods (not needed for adaptive): none, but `listOpenPRs` is only used by REVIEW_ONLY and QUICK_REVIEW modes
144
+
145
+ 4. Add **QUICK_REVIEW goal template**:
146
+ - `[DEP BUMP] Review PR #${prNumber}: ${prTitle} -- skip architecture audit, verify version compatibility and test coverage only`
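
The template in item 4 could be captured as a small pure function so the format is fixed in one place. `quickReviewGoal` is a hypothetical name; the string itself is the template given above.

```typescript
// Hypothetical helper: renders the QUICK_REVIEW goal string from the template.
function quickReviewGoal(prNumber: number, prTitle: string): string {
  return `[DEP BUMP] Review PR #${prNumber}: ${prTitle} -- skip architecture audit, verify version compatibility and test coverage only`;
}
```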
147
+
148
+ ---
149
+
150
+ ## Residual Concerns
151
+
152
+ 1. **wr.discovery output standardization**: the routing design assumes wr.discovery notes are injected by the coordinator as `assembledContextSummary` for wr.shaping. But wr.discovery's `designDocPath` output location is not standardized (finding from context-passing agent's doc). The FULL mode executor must parse `lastStepNotes` from the discovery session to build the shaping context -- this is per the context-passing agent's Candidate D (coordinator-injected text). This concern is correctly owned by the context-passing design, not the routing design.
153
+
154
+ 2. **classify-task-workflow format stability**: if `parseRecommendedPipeline()` is written as a pure function now, it has no tests against real classify-task output. The function should include an integration test stub that documents the expected format.
155
+
156
+ 3. **REVIEW_ONLY vs pr-review coordinator**: the existing `worktrain run pr-review` command already provides REVIEW_ONLY+QUICK_REVIEW behavior. The new `worktrain run pipeline --mode review_only` should either (a) delegate to pr-review coordinator, or (b) reimplement the same logic in `modes/review-only.ts`. Recommendation: (a) delegate -- avoid duplicating the fix-agent loop logic. Document this delegation explicitly.