@exaudeus/workrail 3.42.0 → 3.44.0
- package/dist/console-ui/assets/{index-DwfWMKvv.js → index-Bi38ITiQ.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/daemon/workflow-runner.d.ts +15 -1
- package/dist/daemon/workflow-runner.js +86 -9
- package/dist/manifest.json +39 -23
- package/dist/trigger/adapters/github-queue-poller.d.ts +34 -0
- package/dist/trigger/adapters/github-queue-poller.js +200 -0
- package/dist/trigger/delivery-action.d.ts +2 -0
- package/dist/trigger/delivery-action.js +24 -0
- package/dist/trigger/github-queue-config.d.ts +18 -0
- package/dist/trigger/github-queue-config.js +155 -0
- package/dist/trigger/polling-scheduler.d.ts +1 -0
- package/dist/trigger/polling-scheduler.js +185 -6
- package/dist/trigger/trigger-router.js +24 -1
- package/dist/trigger/trigger-store.js +77 -2
- package/dist/trigger/types.d.ts +19 -0
- package/docs/design/adaptive-coordinator-context-candidates.md +265 -0
- package/docs/design/adaptive-coordinator-context-review.md +101 -0
- package/docs/design/adaptive-coordinator-context.md +504 -0
- package/docs/design/adaptive-coordinator-routing-candidates.md +340 -0
- package/docs/design/adaptive-coordinator-routing-design-review.md +135 -0
- package/docs/design/adaptive-coordinator-routing-review.md +156 -0
- package/docs/design/adaptive-coordinator-routing.md +660 -0
- package/docs/design/context-assembly-layer-design-review.md +110 -0
- package/docs/design/context-assembly-layer.md +622 -0
- package/docs/design/stuck-escalation-candidates.md +176 -0
- package/docs/design/stuck-escalation-design-review.md +70 -0
- package/docs/design/stuck-escalation.md +326 -0
- package/docs/design/worktrain-task-queue-candidates.md +252 -0
- package/docs/design/worktrain-task-queue-design-review.md +109 -0
- package/docs/design/worktrain-task-queue.md +443 -0
- package/docs/design/worktree-review-findings-candidates.md +101 -0
- package/docs/design/worktree-review-findings-design-review.md +65 -0
- package/docs/design/worktree-review-findings-implementation-plan.md +153 -0
- package/docs/ideas/backlog.md +148 -0
- package/package.json +3 -3
@@ -0,0 +1,660 @@
# Adaptive Coordinator Routing -- Discovery Design Document

**Status:** COMPLETE (wr.discovery workflow, 2026-04-19)
**Date:** 2026-04-19
**Author:** WorkTrain autonomous session

---

## Context / Ask

**Stated goal (original framing):** Design the routing/classification layer of an adaptive WorkTrain pipeline coordinator -- one that looks at an incoming task and decides which phases to run.

**Reframed problem:** WorkTrain has no way to dispatch the right workflow sequence for a task without a human deciding which coordinator to invoke -- every task type today needs a bespoke hardcoded coordinator script.

**Scope:** `src/coordinators/`, `src/trigger/`, `src/cli/`. NOT `src/mcp/`.

**Coordination artifact:** This doc captures routing/classification design decisions. A parallel discovery agent is designing inter-phase context passing in `docs/design/adaptive-coordinator-context.md`.

---

## Path Recommendation

**Chosen path:** `design_first`

**Rationale:** The goal was stated as a solution (a coordinator with a routing/classification layer). The risk is designing the wrong routing mechanism. The landscape is well-understood from existing code (`pr-review.ts`, `classify-task-workflow.json`). The dominant risk is not lack of knowledge -- it is solving the wrong subproblem (e.g., treating all routing as LLM classification when static heuristics cover most cases, or treating one monolithic script as the right shape when decomposition into per-mode coordinators may be cleaner).

---

## Constraints / Anti-goals

**Constraints:**
- Must follow `CoordinatorDeps` injection pattern from `pr-review.ts`
- Must not design for `src/mcp/`
- Must use `spawnSession`/`awaitSessions` for workflow session dispatch
- Failure policy: escalate with structured reason, never silently substitute a different pipeline
- TypeScript script, not a workflow JSON

**Anti-goals:**
- Do not design an orchestration workflow (JSON-defined pipeline) -- this is a coordinator script
- Do not require the daemon to know about pipeline modes -- the coordinator owns routing
- Do not add `pipelineMode` to `TriggerDefinition` unless there is no better option
- Do not couple the adaptive coordinator to `src/mcp/`

---

## Artifact Strategy

This document is a **human-readable artifact only**. It is not workflow execution truth.
Workflow execution truth lives in WorkRail step notes and context variables.

If a chat rewind occurs: the notes and context variables survive; this file may not. Do not rely on this file as the sole record of design decisions -- always cross-check with WorkRail session notes.

---

## Landscape Packet

### Current state (as of Apr 19, 2026)

**What exists:**
- `src/coordinators/pr-review.ts` -- 1462-line hardcoded coordinator for PR review. Establishes the `CoordinatorDeps` injectable interface (16 methods), `spawnSession`/`awaitSessions`/`getAgentResult` pattern, fix-agent loop with escalation-first failure policy.
- `workflows/classify-task-workflow.json` -- EXISTS as of v3.40.0 (contrary to Apr 15 backlog entry that listed it as missing). Single LLM step, no tools, outputs 7 variables including `recommendedPipeline` (ordered workflow ID array with decision rules already encoded).
- `src/cli-worktrain.ts` -- wires `worktrain run pr-review` subcommand. No `worktrain run pipeline` or adaptive coordinator command exists yet.
- `src/trigger/types.ts` -- `TriggerDefinition` has `workflowId`, `goal`, `goalTemplate`, `contextMapping`, `agentConfig`. No `pipelineMode` field.
- Three-Workflow Pipeline decision (Apr 18): `wr.discovery -> wr.shaping -> coding-task-workflow-agentic`. Phase 0.5 in coding-task detects pitch.md and sets `solutionFixed=true` to skip design phases.
- `wr.shaping` and `wr.discovery` workflows both exist as of v3.40.0.
- `coding-task-workflow-agentic` Phase 0.5 detects upstream context (pitch.md, BRD, PRD, etc.).

**The Apr 15 backlog full pipeline DAG** (still relevant design intent):
```
trigger
  -> [always] classify-task (outputs: taskComplexity, riskLevel, hasUI, touchesArchitecture)
  -> [if taskComplexity != Small] discovery
  -> [if hasUI] ux-design
  -> [if touchesArchitecture OR riskLevel=High] architecture-design + arch-review
  -> [always] coding-task
  -> [always] mr-review -> (clean: merge | minor: fix-agent-loop | blocking: escalate)
  -> [if riskLevel=High] prod-risk-audit
```
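The conditional edges in the DAG above can be read as a pure function from classification outputs to an ordered phase list. The following is a minimal sketch under stated assumptions: the `TaskClassification` shape, the `Low`/`Medium` risk values, and the `planPhases` name are hypothetical, and the phase names mirror the DAG labels rather than confirmed workflow IDs.

```typescript
// Hypothetical shape for the four classify-task outputs named in the DAG.
// 'Low' and 'Medium' risk values are assumptions; the DAG only names 'High'.
interface TaskClassification {
  readonly taskComplexity: 'Small' | 'Medium' | 'Large';
  readonly riskLevel: 'Low' | 'Medium' | 'High';
  readonly hasUI: boolean;
  readonly touchesArchitecture: boolean;
}

// Phase labels copied from the DAG; they are not confirmed workflow IDs.
type Phase =
  | 'discovery'
  | 'ux-design'
  | 'architecture-design'
  | 'arch-review'
  | 'coding-task'
  | 'mr-review'
  | 'prod-risk-audit';

// Translates each DAG edge into one conditional push, preserving DAG order.
function planPhases(c: TaskClassification): readonly Phase[] {
  const phases: Phase[] = [];
  if (c.taskComplexity !== 'Small') phases.push('discovery');
  if (c.hasUI) phases.push('ux-design');
  if (c.touchesArchitecture || c.riskLevel === 'High') {
    phases.push('architecture-design', 'arch-review');
  }
  phases.push('coding-task', 'mr-review'); // the two [always] phases
  if (c.riskLevel === 'High') phases.push('prod-risk-audit');
  return phases;
}
```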

**What is NOT yet built:**
- `src/coordinators/adaptive-pipeline.ts` (the target of this design)
- `worktrain run pipeline` CLI command
- Pipeline-mode routing logic of any kind

### Hard constraints

1. Coordinator is a TypeScript script, not a workflow JSON -- it calls `spawnSession`/`awaitSessions`.
2. Failure policy from `pr-review.ts` is canonical: escalate with structured reason, never silently substitute a different pipeline.
3. `CoordinatorDeps` injection pattern must be followed (testability requirement).
4. Scope: `src/coordinators/`, `src/trigger/`, `src/cli/` only.

### Contradictions and tensions

- **classify-task-workflow is listed as NOT YET BUILT in the Apr 15 backlog** but the file `workflows/classify-task-workflow.json` exists today (v3.40.0, Apr 19). This is resolved: it was built between Apr 15 and Apr 19.
- **"Always run classify-task first"** (Apr 15 backlog) vs. **"Static heuristics for well-known cases"** (primary uncertainty). The Apr 15 backlog says "always" but this was written before Phase 0.5 upstream context detection was built. With Phase 0.5, many routing decisions can be made statically.
- **`recommendedPipeline` from classify-task** includes `wr.discovery` for Medium/Large tasks, but the Three-Workflow Pipeline decision treats `wr.discovery` as optional. The coordinator must decide: use classify-task's `recommendedPipeline` verbatim, or treat it as a hint that can be overridden by static signals (e.g., pitch.md already present = skip discovery even if classify says Medium)?

### Evidence gaps

1. Does `spawn_agent` (the in-workflow tool) return the `recommendedPipeline` output variable from `classify-task-workflow`? The backlog note says `spawn_agent` currently does NOT return `artifacts` (limitation #5 in v3.40.0 current state). This means the coordinator script cannot use `spawn_agent` to run classify-task and read output -- it must use `spawnSession` + `getAgentResult` + parse the notes, just as `pr-review.ts` does for verdict artifacts.
2. No existing test harness for a multi-mode coordinator. `pr-review.ts` tests exist but only cover the review pipeline.
3. The `worktrain-spawn.ts` CLI wiring for `spawnSession` is the only proven path to dispatch sessions from a coordinator script. No other dispatch mechanism has been tested.

---

## Problem Frame Packet

### Users and stakeholders

| User | Job | Pain | What success looks like |
|------|-----|------|------------------------|
| Developer triggering tasks via CLI | Run the right pipeline without knowing pipeline internals | Has to manually pick the right coordinator command per task type | Types `worktrain run pipeline --task "..."` and the right phases run |
| Trigger operator (triggers.yml author) | Configure automatic response to webhooks/PRs | Must hardcode a single workflowId per trigger -- no pipeline awareness | Can configure a trigger that routes to the right pipeline mode dynamically |
| WorkTrain developer extending coordinator | Add a new pipeline mode | Must read 1462 lines of pr-review.ts to understand the pattern, then duplicate the structure | New mode is a named, documented, testable function following a clear interface |
| WorkTrain runtime (daemon) | Dispatch the right coordinator | Knows nothing about pipeline modes -- just spawns what it's told | Coordinator handles all pipeline routing; daemon stays generic |

### Key tensions

1. **LLM accuracy vs dispatch latency**: always-classify gives accurate routing but adds a full LLM turn (classification cost: ~$0.002 on Haiku, but latency is ~5-15 seconds) before any real work starts. Static heuristics are instant but fail on ambiguous tasks.

2. **Flexibility vs explicit configuration**: static heuristics are implicit and may surprise users ("why did it skip discovery?"). Explicit `pipelineMode` on the trigger is transparent but requires more config. The ideal is: explicit where the user knows the mode, heuristic where they don't.

3. **Single coordinator file vs per-mode decomposition**: `pr-review.ts` is 1462 lines for one mode. A monolithic adaptive coordinator handling all modes risks becoming unmaintainable. Per-mode coordinator functions (each independently testable) with a thin routing dispatcher is a cleaner architecture -- but introduces coordination between files.

4. **`recommendedPipeline` verbatim vs as a hint**: classify-task-workflow encodes pipeline selection rules. If the coordinator uses these verbatim, it cannot apply static overrides (e.g., pitch.md present -> skip discovery). If it treats them as hints, it re-implements routing logic and classify-task's rules become advisory only.

5. **Phase 0.5 vs coordinator routing for upstream context**: coding-task already auto-detects pitch.md. So the coordinator's routing decision for "skip wr.shaping?" partially duplicates Phase 0.5's detection. The coordinator should route based on what phases to _spawn_, not what the coding workflow will internally skip -- but these can diverge (coordinator spawns shaping but coding-task's Phase 0.5 would have skipped it anyway).

### Success criteria (observable)

- [ ] A `worktrain run pipeline --task "fix the race condition in auth.ts"` command routes to the correct pipeline mode and logs the routing decision before spawning any sessions
- [ ] A task with `#123` or `PR #123` in the goal routes to REVIEW_ONLY without spawning discovery or shaping sessions
- [ ] A task with `pitch.md` present in the workspace routes to IMPLEMENT (coding-task-workflow-agentic only)
- [ ] An ambiguous task (no static signal) routes to classify-task-workflow session, parses `recommendedPipeline`, and executes that pipeline
- [ ] A `dep bump` or `chore:` task routes to QUICK_REVIEW (mr-review only, no arch audit) based on goal text heuristics
- [ ] Any phase failure produces a `PipelineOutcome` with `escalated: true` and a structured `escalationReason` -- no silent substitution
- [ ] The `CoordinatorDeps` interface for the adaptive coordinator extends or reuses the existing `CoordinatorDeps` pattern from `pr-review.ts`
- [ ] A developer reading the coordinator code can identify which pipeline mode a given task will route to by reading a single routing function
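The escalation criterion above (a `PipelineOutcome` with `escalated: true` and a structured `escalationReason`) might look roughly like the sketch below. Everything except the names `PipelineOutcome`, `escalated`, and `escalationReason` is an assumption: `PhaseRunnerDeps` and `runPhase` are hypothetical stand-ins for the relevant slice of `CoordinatorDeps`, whose real 16-method interface is not reproduced here.

```typescript
// Hypothetical subset of injected dependencies (not the real CoordinatorDeps).
interface PhaseRunnerDeps {
  // Runs one workflow session to completion and reports success or failure.
  readonly runPhase: (
    workflowId: string,
    goal: string,
  ) => Promise<{ ok: true } | { ok: false; error: string }>;
  readonly log: (message: string) => void;
}

// Assumed outcome shape: the escalation reason is structured data, not free text.
type PipelineOutcome =
  | { readonly escalated: false; readonly completedPhases: readonly string[] }
  | {
      readonly escalated: true;
      readonly completedPhases: readonly string[];
      readonly escalationReason: { readonly phase: string; readonly detail: string };
    };

// Runs phases in order and stops at the first failure.
async function runPhases(
  phases: readonly string[],
  goal: string,
  deps: PhaseRunnerDeps,
): Promise<PipelineOutcome> {
  const completed: string[] = [];
  for (const phase of phases) {
    deps.log(`starting phase ${phase}`);
    const result = await deps.runPhase(phase, goal);
    if (!result.ok) {
      // Escalate immediately: no retry with a different pipeline, no substitution.
      return {
        escalated: true,
        completedPhases: completed,
        escalationReason: { phase, detail: result.error },
      };
    }
    completed.push(phase);
  }
  return { escalated: false, completedPhases: completed };
}
```

Because the runner only sees injected functions, a test can supply a fake `runPhase` that fails on a chosen phase and assert on the returned outcome without spawning real sessions.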

### Assumptions not yet verified

1. `classify-task-workflow` can be invoked via `spawnSession` + `awaitSessions` + `getAgentResult` with note parsing (same as pr-review reads verdict artifacts) -- this is assumed based on the spawn_agent artifact limitation
2. The `recommendedPipeline` text can be reliably parsed from classify-task-workflow's note output using a regex or structured block parser
3. A new CLI subcommand `worktrain run pipeline` can be added following the same pattern as `worktrain run pr-review` in `src/cli-worktrain.ts`
4. Pipeline modes can be named and bounded at design time (not open-ended)

### Primary framing risk

**The framing assumes that "which phases to run" is the right decomposition.** If the real problem is "how to pass context between phases so each phase doesn't re-discover what the previous phase already found", then a routing/classification layer solves the wrong problem -- the bottleneck is inter-phase context, not phase selection. Evidence that would confirm this risk: if a parallel discovery session produces `docs/design/adaptive-coordinator-context.md` showing that context passing is the dominant complexity, the routing layer design should be subordinate to that.

### HMW (How Might We) reframes

- HMW make the pipeline mode explicit in the trigger config so routing is never ambiguous, while still supporting dynamic routing for ad-hoc CLI invocations?
- HMW use classify-task-workflow's `recommendedPipeline` as the default while allowing static overrides to be applied on top, treating classification as advisory rather than authoritative?

### Primary uncertainty (updated)

Can classify-task-workflow's `recommendedPipeline` output be used as the canonical routing source, with static overrides applied on top for well-known signal patterns (PR number, pitch.md, dep-bump keywords) -- rather than choosing between LLM and heuristics as mutually exclusive?

### Known approaches

1. **classify-task-workflow first** -- always spawn a classification session, parse `recommendedPipeline`, then execute the pipeline. LLM-accurate, adds latency and cost per dispatch.
2. **Static heuristics** -- parse goal text and trigger metadata (PR number present, labels, pitch.md present, explicit pipelineMode flag on trigger). Zero LLM cost, covers well-defined cases.
3. **Hybrid** -- static heuristics handle high-confidence cases; LLM classification handles ambiguous tasks. `classify-task-workflow` is an optional fast path, not always required.
4. **Explicit `pipelineMode` on trigger** -- add a `pipelineMode` field to `TriggerDefinition` (or as a context variable). Users/triggers declare mode explicitly. Removes ambiguity but requires configuration overhead.
5. **classify-task advisory + static overrides** -- run classify-task first (small cost, accurate), then apply static override rules on top of `recommendedPipeline` to handle well-known signals. Classify sets the baseline; static rules correct known exceptions.

---

## Candidate Generation Expectations

**Path:** `design_first`, rigor: `thorough`

**Requirements for the candidate set:**
1. At least one candidate must meaningfully reframe the problem (not just package obvious LLM-vs-heuristics variants)
2. At least one candidate must address the monolithic-vs-decomposition tension directly (architecture of the coordinator script itself, not just routing logic)
3. At least one candidate must be more conservative -- building only what is needed for the immediate use cases without speculative generality
4. Candidates must collectively span the full design space: pure static, pure LLM-classify, hybrid, advisory+overrides, and at least one unexpected direction
5. Every candidate must address failure handling explicitly (not leave it open)
6. Extra push required: if the 5 candidates feel clustered around "hybrid LLM+heuristics", force a 6th that radically simplifies or radically separates concerns

**Anti-criteria (eliminate these):**
- Candidates that require new engine primitives (context-gather step type, new daemon features) -- out of scope for this coordinator design
- Candidates that route through `src/mcp/` -- explicitly out of scope
- Candidates that do not follow `CoordinatorDeps` injection pattern

## Candidate Directions

### Cross-check with context-passing agent

`docs/design/adaptive-coordinator-context.md` exists and was read before generating candidates. Key finding: the context-passing agent confirms that file-based handoff (pitch.md) already covers Shaping->Coding, and the dominant context gap is Discovery->Shaping. The routing design must account for:
- Discovery writes a design doc to a path (e.g., `.workrail/current-discovery.md`) if the file convention is adopted
- Shaping session needs this path injected as `assembledContextSummary` at spawn time
- The coordinator is the bridging layer -- it reads `lastStepNotes` from the Discovery session and injects context for the Shaping spawn
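The bridging step in the three bullets above could be sketched as a small pure function. This is illustrative only: the names `lastStepNotes`, `assembledContextSummary`, and the example path come from the text, while the `DiscoveryResult` and `ShapingSpawnContext` shapes, the `buildShapingContext` name, and the truncation limit are assumptions rather than the design settled in the context document.

```typescript
// Hypothetical shapes; only the two field names are taken from the text above.
interface DiscoveryResult {
  readonly lastStepNotes: string;
}
interface ShapingSpawnContext {
  readonly assembledContextSummary: string;
}

// Example path from the text; it applies only if the file convention is adopted.
const DISCOVERY_DOC_PATH = '.workrail/current-discovery.md';
// Assumed cap so the injected summary stays small; not specified in the source.
const MAX_SUMMARY_CHARS = 2000;

// Builds the context the coordinator would inject when spawning the Shaping session.
function buildShapingContext(discovery: DiscoveryResult): ShapingSpawnContext {
  const notes = discovery.lastStepNotes.trim();
  const excerpt =
    notes.length > MAX_SUMMARY_CHARS
      ? notes.slice(0, MAX_SUMMARY_CHARS) + '\n[truncated]'
      : notes;
  return {
    assembledContextSummary:
      `Discovery design doc: ${DISCOVERY_DOC_PATH}\n\nDiscovery notes:\n${excerpt}`,
  };
}
```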

---

### Candidate A: Pure static routing with named pipeline modes (simplest, YAGNI)

**One-sentence summary:** A `routeTask()` function applies prioritized static rules against the goal string and workspace filesystem to select one of 5 named `PipelineMode` variants; no LLM classification step.

**Pipeline modes:**
- `REVIEW_ONLY` -- triggered by: goal contains PR/MR number (`#\d+`, `PR #\d+`, `MR \d+`) or explicit `review:` prefix
- `QUICK_REVIEW` -- triggered by: goal contains dep-bump keywords (`bump`, `chore:`, `dependabot`, `dependency upgrade`) AND contains PR/MR number
- `IMPLEMENT` -- triggered by: `.workrail/current-pitch.md` exists in workspace (Phase 0.5 will auto-detect it)
- `FULL` -- default: none of the above static signals present
- `ESCALATE` -- triggered by: static routing fails with a structural error (workspace not found, etc.)

**Routing function shape:**
```typescript
type PipelineMode =
  | { kind: 'REVIEW_ONLY'; prNumbers: readonly number[] }
  | { kind: 'QUICK_REVIEW'; prNumbers: readonly number[] }
  | { kind: 'IMPLEMENT'; pitchPath: string }
  | { kind: 'FULL'; goal: string }
  | { kind: 'ESCALATE'; reason: string };

function routeTask(goal: string, workspace: string): PipelineMode
```
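One way the static rules listed above could be ordered is sketched below. It is not the coordinator's actual implementation: the injected `RoutingFs` parameter (added so the function can be tested without touching disk), the exact regular expressions, and the rule ordering are assumptions. The ordering checks PR/MR numbers before the pitch file, so REVIEW_ONLY and QUICK_REVIEW win over IMPLEMENT when both signals are present.

```typescript
type PipelineMode =
  | { kind: 'REVIEW_ONLY'; prNumbers: readonly number[] }
  | { kind: 'QUICK_REVIEW'; prNumbers: readonly number[] }
  | { kind: 'IMPLEMENT'; pitchPath: string }
  | { kind: 'FULL'; goal: string }
  | { kind: 'ESCALATE'; reason: string };

// Hypothetical filesystem seam; the signature in the text takes no such parameter.
interface RoutingFs {
  readonly exists: (path: string) => boolean;
}

// Assumed encoding of the dep-bump keywords listed above.
const DEP_BUMP_PATTERN = /\bbump\b|chore:|dependabot|dependency upgrade/i;

// Collects numbers written as "#123", "PR #123", or "MR 123".
function extractPrNumbers(goal: string): readonly number[] {
  const pattern = /(?:\bMR\s+|#)(\d+)/gi;
  const numbers: number[] = [];
  let match = pattern.exec(goal);
  while (match !== null) {
    numbers.push(Number(match[1]));
    match = pattern.exec(goal);
  }
  return numbers;
}

function routeTask(goal: string, workspace: string, fs: RoutingFs): PipelineMode {
  if (!fs.exists(workspace)) {
    return { kind: 'ESCALATE', reason: `workspace not found: ${workspace}` };
  }
  const prNumbers = extractPrNumbers(goal);
  if (prNumbers.length > 0) {
    // QUICK_REVIEW needs both a PR/MR number and a dep-bump keyword.
    return DEP_BUMP_PATTERN.test(goal)
      ? { kind: 'QUICK_REVIEW', prNumbers }
      : { kind: 'REVIEW_ONLY', prNumbers };
  }
  if (/^\s*review:/i.test(goal)) return { kind: 'REVIEW_ONLY', prNumbers: [] };
  const pitchPath = `${workspace}/.workrail/current-pitch.md`;
  if (fs.exists(pitchPath)) return { kind: 'IMPLEMENT', pitchPath };
  return { kind: 'FULL', goal };
}
```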

**Per-mode pipeline sequences:**
- `REVIEW_ONLY`: `mr-review-workflow.agentic.v2` -> route by verdict (clean: merge, minor: fix-agent-loop, blocking: escalate)
- `QUICK_REVIEW`: same as REVIEW_ONLY but `agentConfig: { model: 'haiku-light' }`, no arch audit even if touched
- `IMPLEMENT`: `coding-task-workflow-agentic` (Phase 0.5 finds pitch.md) -> `mr-review-workflow.agentic.v2` -> merge
- `FULL`: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> PR -> `mr-review-workflow.agentic.v2` -> merge

**Failure handling:** each phase failure returns a `PipelineOutcome` with `escalated: true` and `escalationReason`. No fallback to simpler pipeline. Same pattern as `PrOutcome` in pr-review.ts.

**Tensions resolved:** determinism (pure function), YAGNI (no LLM cost), CoordinatorDeps (routing pure, execution injected).
**Tensions accepted:** routing is heuristic, not intelligent -- a PR-based task with a pitch in the repo would route to REVIEW_ONLY and skip the IMPLEMENT mode.
**Failure mode to watch:** edge cases where static signals conflict (PR number AND pitch.md both present). Disambiguation rule needed: REVIEW_ONLY wins over IMPLEMENT.
**Follows:** CoordinatorDeps injection pattern, pr-review.ts discriminated union approach.
**Gain:** Zero dispatch latency for routing; fully deterministic; easy to test.
**Give up:** Cannot handle ambiguous tasks. Any task not matching a static signal falls into FULL.
**Impact surface:** CLI `worktrain run pipeline`; trigger.yml operators who rely on goal text format.
**Scope judgment:** Best-fit for the immediate 4-5 use cases named in the problem statement.
**Philosophy:** Honors immutability, exhaustiveness, determinism-over-cleverness, YAGNI. Conflicts with nothing.

---

### Candidate B: classify-task-workflow as authoritative source (pure LLM routing)

**One-sentence summary:** The coordinator always spawns a `classify-task-workflow` session first, parses the `recommendedPipeline` output from step notes, and executes the pipeline that workflow specifies -- the coordinator script is a runner for whatever classify-task returns.

**Architecture:**
```typescript
async function routeTask(
  goal: string,
  workspace: string,
  deps: CoordinatorDeps,
): Promise<Result<readonly string[], string>> {
  const handle = await deps.spawnSession('classify-task-workflow', goal, workspace);
  // The await result (timeout or failure status) should be checked before reading notes.
  const result = await deps.awaitSessions([handle], CLASSIFY_TIMEOUT_MS);
  const notes = await deps.getAgentResult(handle);
  return parseRecommendedPipeline(notes.recapMarkdown); // pure function, text block parser
}
// Then: for workflowId of recommendedPipeline, spawn in sequence
```

`parseRecommendedPipeline` is a pure function that extracts the `recommendedPipeline: ["...", "..."]` line from the structured text block, following `parseFindingsFromNotes` two-tier strategy: JSON block first, text regex fallback.
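A two-tier parser of the kind described above might be sketched as follows. The exact note format is not shown in this document, so the sketch assumes the JSON tier is a fenced `json` block holding an object with a `recommendedPipeline` key and the text tier is a single `recommendedPipeline: [...]` line; the `Result` shape is likewise an assumption modeled on the `kind === 'ok'` usage elsewhere in this doc.

```typescript
// Assumed Result shape (the real type lives in the codebase and may differ).
type Result<T, E> =
  | { readonly kind: 'ok'; readonly value: T }
  | { readonly kind: 'err'; readonly error: E };

// Three backticks, built indirectly so this example stays fence-safe.
const FENCE = ['`', '`', '`'].join('');
const JSON_BLOCK_PATTERN = new RegExp(FENCE + 'json\\s*([\\s\\S]*?)' + FENCE);
const TEXT_LINE_PATTERN = /recommendedPipeline\s*:\s*(\[[^\]]*\])/;

function isNonEmptyStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.length > 0 && v.every((x) => typeof x === 'string');
}

function parseRecommendedPipeline(notes: string): Result<readonly string[], string> {
  // Tier 1: structured JSON block.
  const jsonBlock = JSON_BLOCK_PATTERN.exec(notes);
  if (jsonBlock !== null) {
    try {
      const parsed: unknown = JSON.parse(jsonBlock[1]);
      if (typeof parsed === 'object' && parsed !== null) {
        const candidate = (parsed as Record<string, unknown>)['recommendedPipeline'];
        if (isNonEmptyStringArray(candidate)) return { kind: 'ok', value: candidate };
      }
    } catch {
      // Malformed JSON: fall through to the text tier.
    }
  }
  // Tier 2: plain-text line with a JSON-style array literal.
  const line = TEXT_LINE_PATTERN.exec(notes);
  if (line !== null) {
    try {
      const parsed: unknown = JSON.parse(line[1]);
      if (isNonEmptyStringArray(parsed)) return { kind: 'ok', value: parsed };
    } catch {
      // Not valid JSON array syntax: report a parse failure below.
    }
  }
  return { kind: 'err', error: 'recommendedPipeline not found or malformed in classify-task notes' };
}
```

Returning an `err` value instead of throwing keeps the decision about what to do on a parse failure with the caller.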

**Pipeline modes:** not named at the coordinator level -- the pipeline IS whatever classify-task returns. The coordinator just runs the sequence.

**Failure handling:** if `parseRecommendedPipeline` fails (LLM deviated from format), default to `['wr.discovery', 'coding-task-workflow-agentic', 'mr-review-workflow.agentic.v2']`. Any spawned phase failure escalates with structured reason.

**Tensions resolved:** intelligent routing for ambiguous tasks; single source of truth for pipeline selection rules (the workflow, not the coordinator).
**Tensions accepted:** non-deterministic (same task may classify differently); adds 5-15 second LLM latency per dispatch; `recommendedPipeline` is a string array of workflow IDs, not a typed discriminated union.
**Failure mode to watch:** coordinator runs `wr.discovery` unnecessarily for PR-only tasks if classify-task misclassifies them. Recovery: add static pre-check before spawning classify-task.
**Follows:** classify-task-workflow's existing decision rules are already correct; this candidate delegates trust to them.
**Gain:** routing rules live in the workflow, not the coordinator -- can be updated without code changes.
**Give up:** determinism, routing transparency (routing reason requires parsing LLM output), typed pipeline modes.
**Impact surface:** classify-task-workflow becomes a critical dependency -- format changes break coordinator.
**Scope judgment:** Best-fit for teams that want routing rules to evolve without code deployment.
**Philosophy:** Honors dependency injection (classify-task as a boundary). Conflicts with determinism-over-cleverness (LLM routing is clever but non-deterministic).

---

### Candidate C: static-first with LLM fallback (hybrid, recommended)

**One-sentence summary:** A two-tier `routeTask()` applies static rules first (fast, deterministic, covers 80% of cases), then falls back to classify-task-workflow only for ambiguous tasks where no static signal fires.

**Architecture:**
```typescript
type PipelineMode =
  | { kind: 'REVIEW_ONLY'; prNumbers: readonly number[] }
  | { kind: 'QUICK_REVIEW'; prNumbers: readonly number[] }
  | { kind: 'IMPLEMENT'; pitchPath: string }
  | { kind: 'FULL'; goal: string }
  | { kind: 'CLASSIFY_AND_RUN'; classifiedPipeline: readonly string[] }
  | { kind: 'ESCALATE'; reason: string };

async function routeTask(
  goal: string,
  workspace: string,
  deps: CoordinatorDeps,
): Promise<Result<PipelineMode, string>> {
  // Tier 1: static signals (deterministic; no LLM call)
  const staticResult = applyStaticRules(goal, workspace);
  if (staticResult !== null) return ok(staticResult);
  // Tier 2: LLM classification
  const classified = await runClassification(goal, workspace, deps);
  return classified.kind === 'ok'
    ? ok({ kind: 'CLASSIFY_AND_RUN', classifiedPipeline: classified.value })
    : err(classified.error);
}
```

`CLASSIFY_AND_RUN` mode executes the `recommendedPipeline` array as a sequential phase list. `REVIEW_ONLY`/`QUICK_REVIEW`/`IMPLEMENT`/`FULL` have hardcoded phase sequences in the coordinator.

**Per-mode sequences:**
- `REVIEW_ONLY`: same as Candidate A
- `QUICK_REVIEW`: same as Candidate A
- `IMPLEMENT`: same as Candidate A
- `FULL`: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> PR -> review -> merge
- `CLASSIFY_AND_RUN`: execute phases from classify-task output in order; unknown workflow IDs escalate
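The rule that unknown workflow IDs escalate in `CLASSIFY_AND_RUN` mode could be enforced with a check before any session is spawned. The sketch below assumes a fixed allow-list built from the workflow IDs named in this document; `KNOWN_WORKFLOW_IDS`, `ValidatedPipeline`, and `validateClassifiedPipeline` are hypothetical names, and a real coordinator would more likely consult the workflow registry.

```typescript
// Allow-list assembled from workflow IDs mentioned in this document (assumption:
// the real source of truth would be the workflow registry, not a constant).
const KNOWN_WORKFLOW_IDS: readonly string[] = [
  'wr.discovery',
  'wr.shaping',
  'coding-task-workflow-agentic',
  'mr-review-workflow.agentic.v2',
];

type ValidatedPipeline =
  | { readonly kind: 'ok'; readonly phases: readonly string[] }
  | { readonly kind: 'escalate'; readonly reason: string };

// Validates the whole list up front so no phase runs from a partly invalid pipeline.
function validateClassifiedPipeline(classifiedPipeline: readonly string[]): ValidatedPipeline {
  if (classifiedPipeline.length === 0) {
    return { kind: 'escalate', reason: 'classify-task returned an empty pipeline' };
  }
  const unknown = classifiedPipeline.filter((id) => KNOWN_WORKFLOW_IDS.indexOf(id) === -1);
  if (unknown.length > 0) {
    return { kind: 'escalate', reason: `unknown workflow IDs: ${unknown.join(', ')}` };
  }
  return { kind: 'ok', phases: classifiedPipeline };
}
```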

**Failure handling:** escalation-first, same as pr-review.ts. The routing failure (classify-task parse failure) produces ESCALATE mode with reason.

**Tensions resolved:** determinism for well-known cases (static tier); intelligence for ambiguous cases (LLM fallback); no LLM latency for 80% of cases.
**Tensions accepted:** two-tier routing is more complex than either pure approach; `CLASSIFY_AND_RUN` mode is less typed than named modes.
**Failure mode to watch:** static rules and LLM classification disagree. Resolution: static always wins. If a developer adds a new static rule that catches cases formerly handled by classify-task, behavior changes silently.
**Follows:** parseFindingsFromNotes two-tier strategy pattern. CoordinatorDeps injection for the LLM fallback path.
**Gain:** fast for common cases, intelligent for ambiguous cases, deterministic for all named modes.
**Give up:** complexity of two tiers; CLASSIFY_AND_RUN mode is not a named type with typed data.
**Impact surface:** same as Candidate A plus classify-task-workflow dependency.
**Scope judgment:** Best-fit -- covers all named use cases efficiently. YAGNI risk is low because the LLM fallback adds ~30 lines of code, not a new architecture.
**Philosophy:** Honors immutability, exhaustiveness (switch on PipelineMode is exhaustive), determinism-over-cleverness (static tier is deterministic, LLM is bounded fallback), errors-as-data.

---

### Candidate D: explicit pipelineMode in trigger config + CLI flag (configuration-driven)

**One-sentence summary:** Add an optional `pipelineMode: 'review_only' | 'quick_review' | 'implement' | 'full' | 'auto'` field to `TriggerDefinition` and `worktrain run pipeline --mode <mode>` CLI flag; `auto` falls back to Candidate C's hybrid routing.

**Architecture:**
```typescript
// In TriggerDefinition (new optional field)
readonly pipelineMode?: 'review_only' | 'quick_review' | 'implement' | 'full' | 'auto';

// In coordinator: read from opts or trigger config
const mode = opts.pipelineMode ?? 'auto';
if (mode !== 'auto') return ok(toPipelineMode(mode, goal, workspace));
// Else: Candidate C hybrid routing
```

**Trigger config example:**
```yaml
triggers:
  - id: github-prs
    workflowId: adaptive-pipeline
    pipelineMode: review_only  # explicit: always run review pipeline for PR events
  - id: backlog-implement
    workflowId: adaptive-pipeline
    pipelineMode: full  # explicit: always run full pipeline for backlog tasks
```

**Failure handling:** same escalation-first policy. `pipelineMode` validation at trigger load time catches invalid values.
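Load-time validation of `pipelineMode` could be a small parser over the raw config value. This sketch is not the validation in `trigger-store.ts`: the `parsePipelineMode` name, the `ModeParse` result shape, and the error wording are assumptions. Treating a missing field as `auto` follows the `opts.pipelineMode ?? 'auto'` default shown above.

```typescript
// The five values from the proposed TriggerDefinition field.
const PIPELINE_MODES = ['review_only', 'quick_review', 'implement', 'full', 'auto'] as const;
type PipelineModeSetting = (typeof PIPELINE_MODES)[number];

// Assumed result shape for the load-time check.
type ModeParse =
  | { readonly kind: 'ok'; readonly value: PipelineModeSetting }
  | { readonly kind: 'err'; readonly error: string };

// Validates the raw YAML value for one trigger; triggerId is used only in messages.
function parsePipelineMode(raw: unknown, triggerId: string): ModeParse {
  // The field is optional, so an absent value means automatic routing.
  if (raw === undefined) return { kind: 'ok', value: 'auto' };
  for (const mode of PIPELINE_MODES) {
    if (raw === mode) return { kind: 'ok', value: mode };
  }
  return {
    kind: 'err',
    error:
      `trigger "${triggerId}": invalid pipelineMode ${JSON.stringify(raw)}; ` +
      `expected one of ${PIPELINE_MODES.join(', ')}`,
  };
}
```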

**Tensions resolved:** eliminates routing ambiguity for trigger operators (explicit config is authoritative); removes LLM classification cost for well-configured triggers.
**Tensions accepted:** requires configuration overhead; trigger.yml changes needed for each new use case; `auto` mode still requires Candidate C's complexity.
**Failure mode to watch:** trigger operator forgets to set `pipelineMode` and gets unexpected routing from `auto` fallback.
**Follows / departs:** departs from `TriggerDefinition` design (adds a new field); follows the principle of explicit > implicit.
**Gain:** total routing clarity for trigger-based pipelines; observable in trigger.yml config without reading coordinator code.
**Give up:** adds a field to `TriggerDefinition` (src/trigger/types.ts change); configuration overhead.
**Impact surface:** `TriggerDefinition`, trigger-store.ts validation, CLI `worktrain run pipeline` opts.
**Scope judgment:** Slightly broad for an initial coordinator design -- the `TriggerDefinition` change is a schema change with broader impact. But it resolves the root tension between implicit and explicit routing.
**Philosophy:** Honors explicit-over-implicit (not named in CLAUDE.md but consistent with the spirit of 'make illegal states unrepresentable'). Minor conflict with YAGNI (schema change is speculative for users who only use CLI).

---
|
|
362
|
+
|
|
363
|
+
### Candidate E: per-mode coordinator files with thin dispatcher (architectural decomposition)
|
|
364
|
+
|
|
365
|
+
**One-sentence summary:** Instead of one adaptive coordinator file, each pipeline mode is a separate coordinator function in its own file, mirroring the `pr-review.ts` pattern; a thin `dispatch.ts` reads the routing result and calls the right coordinator function.
|
|
366
|
+
|
|
367
|
+
**Architecture:**
|
|
368
|
+
```
|
|
369
|
+
src/coordinators/
|
|
370
|
+
dispatch.ts <- thin router: calls routeTask(), dispatches to mode coordinator
|
|
371
|
+
modes/
|
|
372
|
+
review-only.ts <- runReviewOnlyPipeline(deps, opts)
|
|
373
|
+
quick-review.ts <- runQuickReviewPipeline(deps, opts)
|
|
374
|
+
implement.ts <- runImplementPipeline(deps, opts)
|
|
375
|
+
full-pipeline.ts <- runFullPipeline(deps, opts)
|
|
376
|
+
routing/
|
|
377
|
+
route-task.ts <- routeTask() pure function (Candidate A's static rules)
|
|
378
|
+
classify.ts <- runClassification() -- LLM fallback
|
|
379
|
+
```
|
|
380
|
+
|
|
381
|
+
**dispatch.ts:**
|
|
382
|
+
```typescript
|
|
383
|
+
const mode = await routeTask(goal, workspace, deps);
|
|
384
|
+
switch (mode.kind) {
|
|
385
|
+
case 'REVIEW_ONLY': return runReviewOnlyPipeline(deps, opts, mode);
|
|
386
|
+
case 'QUICK_REVIEW': return runQuickReviewPipeline(deps, opts, mode);
|
|
387
|
+
case 'IMPLEMENT': return runImplementPipeline(deps, opts, mode);
|
|
388
|
+
case 'FULL': return runFullPipeline(deps, opts, mode);
|
|
389
|
+
case 'ESCALATE': return err(mode.reason);
|
|
390
|
+
default: return assertNever(mode);
|
|
391
|
+
}
|
|
392
|
+
```
|
|
393
|
+
|
|
394
|
+
Each mode coordinator is ~300-600 lines, fully independently testable. No mode-specific logic bleeds into other modes.
|
|
395
|
+
|
|
396
|
+
**Failure handling:** each mode coordinator has its own escalation policy appropriate to that mode. Full pipeline might have shaping-failure escalation logic. Review-only mirrors pr-review.ts.
|
|
397
|
+
|
|
398
|
+
**Tensions resolved:** monolithic-vs-decomposition tension (fully decomposed); each mode independently testable; adding a new mode is additive, not modification of existing code.
|
|
399
|
+
**Tensions accepted:** more files to navigate; routing layer is separate from execution, which is the right seam but adds indirection.
|
|
400
|
+
**Failure mode to watch:** mode coordinator interfaces diverge over time (each team member adds different fields to their mode's `Opts` type).
|
|
401
|
+
**Follows:** directly extends the `pr-review.ts` single-mode pattern -- this is N instances of that pattern.
|
|
402
|
+
**Gain:** each mode coordinator is small, focused, testable in isolation. Open/closed principle: adding a new mode does not touch existing mode files; only the dispatcher gains a case.
|
|
403
|
+
**Give up:** more files; thin dispatcher adds a layer.
|
|
404
|
+
**Impact surface:** CLI wiring, each mode coordinator's test suite.
|
|
405
|
+
**Scope judgment:** Best-fit for a growing coordinator surface. The decomposition is the right architecture for 5+ modes.
|
|
406
|
+
**Philosophy:** Honors YAGNI (each mode file is exactly what that mode needs), exhaustiveness (switch in dispatch.ts), compose-with-small-pure-functions.
|
|
407
|
+
|
|
408
|
+
---
|
|
409
|
+
|
|
410
|
+
## Challenge Notes
|
|
411
|
+
|
|
412
|
+
### Comparison matrix
|
|
413
|
+
|
|
414
|
+
| Tension | A (static) | B (LLM-only) | C (hybrid) | D (config) | E (decomposed) |
|
|
415
|
+
|---------|-----------|-------------|-----------|-----------|---------------|
|
|
416
|
+
| LLM accuracy vs dispatch latency | A wins latency (zero added) but gets no LLM accuracy | B wins accuracy, loses latency | C wins both for common cases | D wins for configured triggers | neutral |
|
|
417
|
+
| Flexibility vs explicit configuration | A: implicit heuristics | B: implicit LLM | C: implicit hybrid | D: explicit config (best) | neutral |
|
|
418
|
+
| Monolithic vs decomposition | all except E are monolithic routing | same | same | same | E wins |
|
|
419
|
+
| recommendedPipeline verbatim vs advisory | A: ignores it entirely | B: verbatim | C: advisory for static cases, verbatim for classify | D: bypassed by config | neutral |
|
|
420
|
+
| Phase 0.5 vs coordinator routing | A: delegates to Phase 0.5 | B: may duplicate Phase 0.5 | C: static pitch.md check before Phase 0.5 | D: config resolves it | neutral |
|
|
421
|
+
|
|
422
|
+
### Recommendation: C + E (Candidate C routing mechanism, Candidate E file architecture)
|
|
423
|
+
|
|
424
|
+
**The routing mechanism decision (C):** Two-tier routing is the best fit. Static rules cover the 4 well-defined cases (PR number, dep-bump, pitch.md, vague idea) without LLM cost. `CLASSIFY_AND_RUN` as the 5th mode handles genuinely ambiguous tasks via classify-task-workflow. This follows the `parseFindingsFromNotes` precedent in pr-review.ts (two-tier: structured first, fallback second).
|
|
425
|
+
|
|
426
|
+
**The architecture decision (E):** Per-mode coordinator files with a thin dispatcher is the correct architecture for 5 modes. Each mode file follows pr-review.ts independently. The dispatcher is the only code that changes when a new mode is added. This is how the codebase is already structured (pr-review.ts is one mode file) -- Candidate E just makes the pattern explicit.
|
|
427
|
+
|
|
428
|
+
**Combined:** the routing logic lives in `src/coordinators/routing/route-task.ts` and `routing/classify.ts`. The dispatcher lives in `src/coordinators/adaptive-pipeline.ts` (thin). The mode executors live in `src/coordinators/modes/`.
|
|
429
|
+
|
|
430
|
+
### Candidate C alone handles the routing; Candidate E handles the architecture; D is additive
|
|
431
|
+
|
|
432
|
+
Candidate D (pipelineMode in TriggerDefinition) is not mutually exclusive with C+E. It can be added as a later optimization -- the CLI `--mode` flag gives explicit override without requiring a schema change in TriggerDefinition. Start with `--mode` CLI flag; add TriggerDefinition field later if trigger operators need it.
|
|
433
|
+
|
|
434
|
+
### Strongest argument against C+E
|
|
435
|
+
|
|
436
|
+
**Against C (static rules):** The static rules are heuristics. A task `"fix the BLOCKING issue in PR #47"` contains both a blocking keyword (from review vocabulary) and a PR number. It routes to REVIEW_ONLY but the user may intend to implement a fix. The ambiguity is real. Counter: the routing decision is logged with reason before any spawn. Users who see an unexpected routing can add `--mode full` as an override.
|
|
437
|
+
|
|
438
|
+
**Against E (decomposition):** More files means more navigation overhead and more risk of interface divergence between mode coordinators. Counter: the shared `CoordinatorDeps` interface is the contract; mode-specific opts types can extend a base type. The decomposition is justified by the maintenance benefit at 5+ modes.
|
|
439
|
+
|
|
440
|
+
### Narrower option that loses: Candidate A (pure static)
|
|
441
|
+
|
|
442
|
+
Candidate A loses because tasks that don't match any static signal fall to FULL (run all phases). This is wasteful for Medium complexity tasks that don't need full discovery. Classify-task-workflow covers these for ~$0.002 and 5-15 seconds. The cost/benefit favors the hybrid over pure static.
|
|
443
|
+
|
|
444
|
+
### Broader option that might be justified: Candidate D
|
|
445
|
+
|
|
446
|
+
Candidate D (pipelineMode in TriggerDefinition) would be justified if trigger operators need deterministic routing for automated workflows (e.g., a GitHub PR webhook should ALWAYS route to REVIEW_ONLY, regardless of goal text). Evidence required: at least one trigger configuration where heuristic routing produces wrong results and explicit config is the only safe option. Start without it; add it if this evidence appears.
|
|
447
|
+
|
|
448
|
+
### Pivot conditions
|
|
449
|
+
|
|
450
|
+
- If `classify-task-workflow` note parsing proves unreliable (format drift), pivot to pure static (Candidate A) and accept that ambiguous tasks run FULL
|
|
451
|
+
- If `TriggerDefinition` change is needed for automated workflows, add Candidate D's pipelineMode field
|
|
452
|
+
- If the context-passing agent's design shows that the coordinator must inject structured context at spawn time, the mode coordinator files must include context injection logic -- this is an implementation detail, not a routing design change
|
|
453
|
+
|
|
454
|
+
---
|
|
455
|
+
|
|
456
|
+
## Resolution Notes
|
|
457
|
+
|
|
458
|
+
### Selected direction: C (routing) + E (architecture)
|
|
459
|
+
|
|
460
|
+
**Winner:** Candidate C two-tier routing + Candidate E per-mode file decomposition.
|
|
461
|
+
|
|
462
|
+
**Runner-up:** Candidate A (pure static routing). The challenge revealed that Candidate A covers all 5 stated use cases. It is a legitimate MVP starting point. C adds value for future Medium-complexity tasks not in the stated use cases. Both are correct -- choose based on timeline.
|
|
463
|
+
|
|
464
|
+
**Challenge findings:**
|
|
465
|
+
|
|
466
|
+
1. **CLASSIFY_AND_RUN seam crack (genuine weakness, not blocking):** C's CLASSIFY_AND_RUN mode creates a typed/untyped seam in the dispatcher. Mitigation: CLASSIFY_AND_RUN fires only for tasks with no static signal; the dispatcher handles it with a dedicated `runClassifyAndRunPipeline` function that is documented as the "catch-all" path. Alternatively: fold CLASSIFY_AND_RUN into FULL (just run the three-workflow pipeline for all ambiguous tasks) and remove the LLM fallback entirely. This would make C = A for ambiguous tasks, simplifying the design.
|
|
467
|
+
- **Final decision: simplify C by removing CLASSIFY_AND_RUN. Ambiguous tasks (no static signal) default to FULL. This gives Candidate A's simplicity with Candidate C's structure.**
|
|
468
|
+
|
|
469
|
+
2. **A is sufficient for MVP:** Challenge confirmed that Candidate A covers all 5 stated use cases. C adds value for future Medium tasks. For an MVP, A is correct. The recommended design IS essentially Candidate A + Candidate E architecture. No classify-task-workflow dependency at all for the initial implementation.
|
|
470
|
+
|
|
471
|
+
### Final simplified design (A + E, not C + E)
|
|
472
|
+
|
|
473
|
+
**Routing (revised -- effectively A):**
|
|
474
|
+
```typescript
|
|
475
|
+
type PipelineMode =
|
|
476
|
+
| { kind: 'REVIEW_ONLY'; prNumbers: readonly number[] }
|
|
477
|
+
| { kind: 'QUICK_REVIEW'; prNumbers: readonly number[] }
|
|
478
|
+
| { kind: 'IMPLEMENT'; pitchPath: string }
|
|
479
|
+
| { kind: 'FULL'; goal: string }
|
|
480
|
+
| { kind: 'ESCALATE'; reason: string };
|
|
481
|
+
|
|
482
|
+
function routeTask(goal: string, workspace: string): Result<PipelineMode, string>
|
|
483
|
+
// Pure function. No LLM. No async.
|
|
484
|
+
```
|
|
485
|
+
|
|
486
|
+
Static rules (prioritized):
|
|
487
|
+
1. goal matches dep-bump keywords AND PR/MR number -> `QUICK_REVIEW`
|
|
488
|
+
2. goal matches PR/MR number (`#\d+`, `PR #\d+`, `MR !?\d+`) -> `REVIEW_ONLY`
|
|
489
|
+
3. `.workrail/current-pitch.md` exists -> `IMPLEMENT`
|
|
490
|
+
4. else -> `FULL`
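
The four prioritized rules above can be sketched as a small function. This is an illustration under assumptions, not the shipped implementation: the dep-bump keyword list is invented (the design does not enumerate it), the `pitchExists` predicate is a hypothetical injection point for the filesystem check, and the error branch of `Result` is left unused.

```typescript
// Sketch of the prioritized static rules. The keyword list and `pitchExists` are assumptions.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

type PipelineMode =
  | { kind: 'REVIEW_ONLY'; prNumbers: readonly number[] }
  | { kind: 'QUICK_REVIEW'; prNumbers: readonly number[] }
  | { kind: 'IMPLEMENT'; pitchPath: string }
  | { kind: 'FULL'; goal: string }
  | { kind: 'ESCALATE'; reason: string };

// Assumed vocabulary for dependency bumps.
const DEP_BUMP = /\b(bump|dependabot|renovate|dependency update)\b/i;

function extractPrNumbers(goal: string): number[] {
  // Matches `#47`, `PR #47`, `MR !47`, and `MR 47`. Fresh regex per call: no shared lastIndex.
  const pattern = /(?:\bMR\s*!?|#)(\d+)/gi;
  const found: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(goal)) !== null) found.push(Number(m[1]));
  return found;
}

function routeTask(
  goal: string,
  workspace: string,
  pitchExists: (path: string) => boolean,
): Result<PipelineMode, string> {
  const prNumbers = extractPrNumbers(goal);
  if (prNumbers.length > 0) {
    // Rule 1 before rule 2: dep-bump wording selects the lighter review.
    return DEP_BUMP.test(goal)
      ? { ok: true, value: { kind: 'QUICK_REVIEW', prNumbers } }
      : { ok: true, value: { kind: 'REVIEW_ONLY', prNumbers } };
  }
  // Rule 3: an existing pitch routes straight to coding.
  const pitchPath = `${workspace}/.workrail/current-pitch.md`;
  if (pitchExists(pitchPath)) return { ok: true, value: { kind: 'IMPLEMENT', pitchPath } };
  // Rule 4: conservative default.
  return { ok: true, value: { kind: 'FULL', goal } };
}
```

Passing the existence check as a parameter keeps the function deterministic in tests: the same goal and the same predicate always yield the same mode.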
|
|
491
|
+
|
|
492
|
+
**Why remove CLASSIFY_AND_RUN:** classify-task-workflow adds latency, non-determinism, and format-parsing fragility for no concrete benefit over FULL for the stated use cases. The "YAGNI with discipline" principle wins. If Medium tasks turn out to be wasteful with FULL, add classify-task as a future enhancement with a typed artifact (not text parsing).
|
|
493
|
+
|
|
494
|
+
**Architecture (E as designed):**
|
|
495
|
+
```
|
|
496
|
+
src/coordinators/
|
|
497
|
+
adaptive-pipeline.ts <- thin entry point
|
|
498
|
+
routing/
|
|
499
|
+
route-task.ts <- routeTask() pure function
|
|
500
|
+
modes/
|
|
501
|
+
review-only.ts <- runReviewOnlyPipeline()
|
|
502
|
+
quick-review.ts <- runQuickReviewPipeline()
|
|
503
|
+
implement.ts <- runImplementPipeline()
|
|
504
|
+
full-pipeline.ts <- runFullPipeline()
|
|
505
|
+
```
|
|
506
|
+
|
|
507
|
+
### Accepted tradeoffs
|
|
508
|
+
|
|
509
|
+
1. All tasks without static signals run FULL (discovery + shaping + coding). This is correct for vague ideas; slightly wasteful for "refactor X" tasks that might skip discovery. Accepted: correctness matters more than cost optimization at MVP stage.
|
|
510
|
+
2. `routeTask()` is pure apart from a single filesystem check, `fs.existsSync('.workrail/current-pitch.md')`. That check must be injectable via `AdaptiveCoordinatorDeps` so the function can be tested without touching disk.
|
|
511
|
+
3. QUICK_REVIEW and REVIEW_ONLY are structurally similar; QUICK_REVIEW just passes a lighter model hint. They could be merged into one mode with an `isLight` flag, but a discriminated union with two variants is cleaner.
|
|
512
|
+
|
|
513
|
+
### Identified failure modes
|
|
514
|
+
|
|
515
|
+
1. **PR number in a non-review task**: `"refactor PR #47 related auth code"` contains `#47` and routes to REVIEW_ONLY incorrectly. Mitigation: the routing decision is logged before any spawn; users see the unexpected routing and can use `--mode full` override.
|
|
516
|
+
2. **pitch.md stale**: `.workrail/current-pitch.md` exists from a previous task and routes a new task to IMPLEMENT incorrectly. Mitigation: document that pitch.md is consumed (moved/deleted) after the coding task completes. This is a Phase 0.5 / wr.shaping convention issue, not a routing design issue.
|
|
517
|
+
3. **FULL pipeline timeout**: wr.discovery + wr.shaping + coding is potentially 90+ minutes. The coordinator must enforce a wall-clock cutoff (same pattern as pr-review.ts COORDINATOR_SPAWN_CUTOFF_MS). Each phase gets a hardcoded timeout: discovery 30 min, shaping 30 min, coding 60 min.
|
|
518
|
+
|
|
519
|
+
### Switch triggers
|
|
520
|
+
|
|
521
|
+
- If `routeTask()` produces wrong routing more than 5% of the time in real use -> add classify-task as Tier 2 fallback (upgrade to C)
|
|
522
|
+
- If trigger operators need deterministic routing without static signals -> add Candidate D's `pipelineMode` field to TriggerDefinition
|
|
523
|
+
- If wr.discovery produces a standardized file at `.workrail/current-discovery.md` -> add a static rule to detect it and route to an `IMPLEMENT_FROM_DISCOVERY` mode
|
|
524
|
+
|
|
525
|
+
---
|
|
526
|
+
|
|
527
|
+
## Decision Log
|
|
528
|
+
|
|
529
|
+
| Date | Decision | Rationale |
|
|
530
|
+
|------|----------|-----------|
|
|
531
|
+
| 2026-04-19 | Path: design_first | Goal was solution-stated; dominant risk is wrong routing mechanism design, not lack of landscape knowledge |
|
|
532
|
+
| 2026-04-19 | Routing mechanism: pure static (A), not hybrid (C) | Challenge revealed Candidate A covers all 5 stated use cases. CLASSIFY_AND_RUN adds non-determinism and format-parsing risk for no concrete MVP benefit. YAGNI wins. |
|
|
533
|
+
| 2026-04-19 | Architecture: per-mode files + thin dispatcher (E) | pr-review.ts is 1462 lines for one mode. Five modes in one file would be unmanageable. Decomposition is required, not premature. |
|
|
534
|
+
| 2026-04-19 | Candidate D (TriggerDefinition pipelineMode field) deferred | CLI --mode flag is sufficient. Schema change not justified until evidence of trigger-operator need. |
|
|
535
|
+
| 2026-04-19 | REVIEW_ONLY/QUICK_REVIEW delegate to pr-review coordinator | Review finding: reimplementing fix-agent loop would duplicate pr-review.ts logic. Delegation keeps behavior consistent. |
|
|
536
|
+
| 2026-04-19 | Per-phase timeouts required in implementation (R1 from review) | Discovery 30min, Shaping 30min, Coding 60min, Review 20min, FULL_MAX 160min, SPAWN_CUTOFF 130min -- hardcoded, never LLM-computed. |
|
|
537
|
+
| 2026-04-19 | PR number regex must be context-sensitive (O1 from review) | Bare `#\d+` produces false positives. Use `\bPR\s*#\d+\b` or `\bMR\s*!?\d+\b` patterns with verb context check. |
|
|
538
|
+
| 2026-04-19 | pitch.md must be archived after IMPLEMENT mode (O2 from review) | runImplementPipeline() archives .workrail/current-pitch.md to .workrail/pitches/[timestamp]-pitch.md after coding succeeds. Prevents stale routing. |
|
|
539
|
+
| 2026-04-19 | Timing constants specified explicitly | Review finding O1: FULL pipeline is 4x longer than pr-review.ts; pr-review.ts constants were wrong. New constants: discovery 35min, shaping 35min, coding 65min, cutoff 100min, max 120min. |
|
|
540
|
+
| 2026-04-19 | Pitch.md lifecycle invariant documented | Review finding O2: stale pitch.md silently misroutes future tasks. IMPLEMENT mode executor must archive pitch.md after completion. |
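
The context-sensitive PR matching recorded in the log (O1) might be sketched as follows. The two patterns come from the decision entry; the review-verb list and the `extractReviewTargets` name are assumptions, since the log only says a "verb context check" is required.

```typescript
// Sketch: only treat PR/MR references as review targets when the goal reads like a review request.
// REVIEW_VERBS is an assumed list; the design does not specify which verbs count.
const REVIEW_VERBS = /\b(review|check|audit)\b/i;

function extractReviewTargets(goal: string): number[] {
  if (!REVIEW_VERBS.test(goal)) return [];
  // Bare `#123` is deliberately not matched; a `PR` or `MR` prefix is required.
  const pattern = /\bPR\s*#(\d+)\b|\bMR\s*!?(\d+)\b/gi;
  const found: number[] = [];
  let m: RegExpExecArray | null;
  // Exactly one of the two capture groups is set per match.
  while ((m = pattern.exec(goal)) !== null) found.push(Number(m[1] || m[2]));
  return found;
}
```

With this shape, a goal such as `refactor PR #47 related auth code` yields no review targets and falls through to the later rules, while `review PR #47` still routes to a review mode.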
|
|
541
|
+
|
|
542
|
+
---
|
|
543
|
+
|
|
544
|
+
## Assumptions for the Context-Passing Agent
|
|
545
|
+
|
|
546
|
+
**Note:** `docs/design/adaptive-coordinator-context.md` did not exist at the time of this session's finalization. The following assumptions are based on what the routing design implies for inter-phase context passing. The context-passing agent should verify or challenge these.
|
|
547
|
+
|
|
548
|
+
### Assumptions the context-passing agent must know about:
|
|
549
|
+
|
|
550
|
+
1. **Routing determines spawn order, not context shape.** The routing layer (`routeTask()`) produces a `PipelineMode` variant. It does NOT know what context to pass to each spawned session. Context injection is entirely the responsibility of each mode coordinator (full-pipeline.ts, implement.ts, etc.), not the routing layer.
|
|
551
|
+
|
|
552
|
+
2. **FULL pipeline phase order is: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> review -> merge.** If the context-passing agent's design changes this order (e.g., by making shaping optional based on discovery findings), the `runFullPipeline()` function must be updated accordingly. The routing layer itself does not need to change.
|
|
553
|
+
|
|
554
|
+
3. **pitch.md is the canonical Shaping->Coding handoff.** The `IMPLEMENT` mode routes directly to coding because `current-pitch.md` already exists. The coding-task Phase 0.5 detects it and uses it. If the context-passing agent introduces a different handoff mechanism (e.g., coordinator-injected context instead of a file), the `IMPLEMENT` mode coordinator needs to inject that context at spawn time rather than relying on Phase 0.5 file detection.
|
|
555
|
+
|
|
556
|
+
4. **Discovery->Shaping context passing is not yet solved.** The `FULL` pipeline currently spawns `wr.discovery` then `wr.shaping` without passing discovery findings to shaping. The coordinator must bridge this gap by reading discovery's final step notes and injecting them as `assembledContextSummary` for the shaping spawn. This is an implementation detail inside `full-pipeline.ts` but the context-passing agent's design will determine exactly what to inject.
|
|
557
|
+
|
|
558
|
+
5. **The routing layer has no opinion on context.** The `PipelineMode` discriminated union does NOT carry context bundles. Context assembly and injection is done at spawn time within each mode coordinator, not in `routeTask()` or `adaptive-pipeline.ts`. If the context-passing agent's design needs the routing decision to carry context, this is a new requirement that changes the `PipelineMode` type shape.
|
|
559
|
+
|
|
560
|
+
6. **ESCALATE mode carries no phases.** If `routeTask()` returns `ESCALATE`, no sessions are spawned. The context-passing agent does not need to handle this case.
|
|
561
|
+
|
|
562
|
+
---
|
|
563
|
+
|
|
564
|
+
## Final Summary
|
|
565
|
+
|
|
566
|
+
### The routing/classification design for WorkTrain's adaptive pipeline coordinator
|
|
567
|
+
|
|
568
|
+
**What was decided:**
|
|
569
|
+
|
|
570
|
+
The adaptive coordinator uses **pure static routing with per-mode file decomposition** (Candidate A routing + Candidate E architecture).
|
|
571
|
+
|
|
572
|
+
**Routing mechanism:** `routeTask(goal: string, workspace: string): Result<PipelineMode, string>` is a pure function that performs no direct I/O (the filesystem check for pitch.md is injected via deps). It applies static rules in priority order:
|
|
573
|
+
|
|
574
|
+
1. Dep-bump keywords AND PR/MR number in goal → `QUICK_REVIEW`
|
|
575
|
+
2. PR/MR number in goal OR `github_prs_poll` trigger provider → `REVIEW_ONLY`
|
|
576
|
+
3. `.workrail/current-pitch.md` exists in workspace → `IMPLEMENT`
|
|
577
|
+
4. Default → `FULL` (conservative)
|
|
578
|
+
|
|
579
|
+
**Named pipeline modes with step sequences:**
|
|
580
|
+
|
|
581
|
+
| Mode | Step sequence |
|
|
582
|
+
|------|---------------|
|
|
583
|
+
| `REVIEW_ONLY` | `mr-review-workflow.agentic.v2` → verdict routing (clean: merge, minor: fix-loop, blocking: escalate) |
|
|
584
|
+
| `QUICK_REVIEW` | same as REVIEW_ONLY with lighter model config |
|
|
585
|
+
| `IMPLEMENT` | `coding-task-workflow-agentic` (Phase 0.5 reads pitch.md) → PR → `mr-review-workflow.agentic.v2` → merge |
|
|
586
|
+
| `FULL` | `wr.discovery` → `wr.shaping` → `coding-task-workflow-agentic` → PR → `mr-review-workflow.agentic.v2` → merge |
|
|
587
|
+
|
|
588
|
+
**File architecture (Candidate E):**
|
|
589
|
+
```
|
|
590
|
+
src/coordinators/
|
|
591
|
+
adaptive-pipeline.ts -- thin entry point + AdaptiveCoordinatorDeps wiring
|
|
592
|
+
routing/
|
|
593
|
+
route-task.ts -- routeTask() pure function + applyStaticRules()
|
|
594
|
+
modes/
|
|
595
|
+
review-only.ts -- runReviewOnlyPipeline()
|
|
596
|
+
quick-review.ts -- runQuickReviewPipeline()
|
|
597
|
+
implement.ts -- runImplementPipeline()
|
|
598
|
+
full-pipeline.ts -- runFullPipeline()
|
|
599
|
+
```
|
|
600
|
+
|
|
601
|
+
**Entry point:** `worktrain run pipeline --task "..." [--mode FULL|IMPLEMENT|REVIEW_ONLY|QUICK_REVIEW]` CLI command. The `--mode` flag provides explicit override for all routing decisions.
|
|
602
|
+
|
|
603
|
+
**REVIEW_ONLY and QUICK_REVIEW delegate to existing pr-review coordinator:**
|
|
604
|
+
The `modes/review-only.ts` and `modes/quick-review.ts` executors should delegate to the existing `runPrReviewCoordinator()` from `src/coordinators/pr-review.ts` rather than reimplementing the fix-agent loop. This avoids duplicating the verdict parsing, fix-agent loop, and merge logic. The only difference is the goal string passed to the review session.
|
|
605
|
+
|
|
606
|
+
**Timing constants (hardcoded, never LLM-computed -- robustness rule from pr-review.ts):**
|
|
607
|
+
```typescript
|
|
608
|
+
const DISCOVERY_TIMEOUT_MS = 35 * 60 * 1000; // 35 minutes
|
|
609
|
+
const SHAPING_TIMEOUT_MS = 35 * 60 * 1000; // 35 minutes
|
|
610
|
+
const CODING_TIMEOUT_MS = 65 * 60 * 1000; // 65 minutes
|
|
611
|
+
const REVIEW_TIMEOUT_MS = 25 * 60 * 1000; // 25 minutes (child session)
|
|
612
|
+
const COORDINATOR_SPAWN_CUTOFF_MS = 100 * 60 * 1000; // 100 min (refuse new spawns after)
|
|
613
|
+
const COORDINATOR_MAX_MS = 120 * 60 * 1000; // 120 min total coordinator wall-clock
|
|
614
|
+
```
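
One way the spawn cutoff and the total budget could interact is sketched below. `canSpawn` and `SpawnDecision` are hypothetical names, and clamping a phase timeout to the remaining coordinator budget is an assumption that the design does not state.

```typescript
// Sketch: decide whether the coordinator may still spawn a phase, and with what timeout.
const COORDINATOR_SPAWN_CUTOFF_MS = 100 * 60 * 1000;
const COORDINATOR_MAX_MS = 120 * 60 * 1000;

type SpawnDecision =
  | { allowed: true; timeoutMs: number }
  | { allowed: false; reason: string };

function canSpawn(startedAtMs: number, nowMs: number, phaseTimeoutMs: number): SpawnDecision {
  const elapsed = nowMs - startedAtMs;
  if (elapsed >= COORDINATOR_SPAWN_CUTOFF_MS) {
    return { allowed: false, reason: `spawn cutoff reached after ${Math.floor(elapsed / 60000)} min` };
  }
  // Assumed policy: clamp so a late phase cannot outlive the coordinator's total wall-clock budget.
  return { allowed: true, timeoutMs: Math.min(phaseTimeoutMs, COORDINATOR_MAX_MS - elapsed) };
}
```

Under these assumptions, a coding phase requested at minute 99 is still allowed but is capped at 21 minutes instead of its nominal 65, and any request at minute 100 or later is refused.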
|
|
615
|
+
|
|
616
|
+
**Pitch.md lifecycle invariant:**
|
|
617
|
+
`IMPLEMENT` mode routes to coding because `.workrail/current-pitch.md` exists. After the coding session completes (success OR failure), the mode executor must archive the pitch:
|
|
618
|
+
- Archive path: `.workrail/used-pitches/pitch-{ISO-timestamp}.md`
|
|
619
|
+
- This prevents stale pitch.md from incorrectly routing future tasks to IMPLEMENT mode
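
The archive step could be sketched as below. `PitchFs` and `archivePitch` are hypothetical names; the filesystem is passed in as a small interface so the step can be exercised without touching disk, and replacing colons in the timestamp is an assumption made for filename portability.

```typescript
// Sketch of the pitch archive step. `PitchFs` is an assumed injectable interface.
interface PitchFs {
  exists(path: string): boolean;
  move(from: string, to: string): void;
}

function archivePitch(fs: PitchFs, workspace: string, now: Date): string | null {
  const current = `${workspace}/.workrail/current-pitch.md`;
  if (!fs.exists(current)) return null; // nothing to archive
  // Colons are replaced so the name is valid on common filesystems (an assumption).
  const stamp = now.toISOString().replace(/:/g, '-');
  const target = `${workspace}/.workrail/used-pitches/pitch-${stamp}.md`;
  fs.move(current, target);
  return target;
}
```

Because the invariant covers success and failure alike, the mode executor would most naturally call this from a `finally` block around the coding session.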
|
|
620
|
+
|
|
621
|
+
**AdaptiveCoordinatorDeps new methods (beyond pr-review.ts CoordinatorDeps):**
|
|
622
|
+
- `fileExists(path: string): Promise<boolean>` -- for pitch.md detection in routeTask()
|
|
623
|
+
- All other methods: same as `CoordinatorDeps` (copied, not inherited -- separate interface avoids forced coupling to pr-review.ts)
|
|
624
|
+
|
|
625
|
+
**QUICK_REVIEW goal string template:**
|
|
626
|
+
```
|
|
627
|
+
[DEP BUMP] Review PR #${prNumber}: ${prTitle} -- skip architecture audit, verify version compatibility and test coverage only
|
|
628
|
+
```
|
|
629
|
+
|
|
630
|
+
**Failure handling:**
|
|
631
|
+
- Any phase failure produces a `PipelineOutcome` with `escalated: true` and structured `escalationReason: { phase: string, reason: string }`
|
|
632
|
+
- No silent substitution (e.g., shaping failure does not fall back to a simplified pipeline)
|
|
633
|
+
- Routing decision is logged as traceability JSON before any session spawn
|
|
634
|
+
- FULL pipeline: each phase is an independent escalation point (discovery-fail, shaping-fail, coding-fail each escalate independently)
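
The escalation-first policy above can be sketched as a sequential runner that stops at the first failing phase. `runPhases`, `PhaseStep`, and `PhaseResult` are hypothetical names, and the phases are synchronous here only to keep the sketch short; real phases would be asynchronous session spawns.

```typescript
// Sketch: each phase is an independent escalation point; there is no fallback pipeline.
type PipelineOutcome =
  | { escalated: false }
  | { escalated: true; escalationReason: { phase: string; reason: string } };

type PhaseResult = { ok: true } | { ok: false; error: string };

interface PhaseStep {
  name: string;
  run: () => PhaseResult;
}

function runPhases(phases: readonly PhaseStep[]): PipelineOutcome {
  for (const phase of phases) {
    const result = phase.run();
    if (!result.ok) {
      // Stop immediately: later phases never run and nothing is silently substituted.
      return { escalated: true, escalationReason: { phase: phase.name, reason: result.error } };
    }
  }
  return { escalated: false };
}
```

Returning the failing phase name alongside the reason keeps the outcome structured, so callers can report exactly where the pipeline stopped.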
|
|
635
|
+
|
|
636
|
+
**Why LLM classification (classify-task-workflow) was excluded:**
|
|
637
|
+
|
|
638
|
+
After adversarial challenge, CLASSIFY_AND_RUN mode was removed. The LLM classification path adds non-determinism and format-parsing fragility (notes parsing vs typed artifact) for no concrete MVP benefit. All 5 stated use cases are covered by static rules. The upgrade path to add classify-task as a Tier 2 fallback exists when evidence shows >5% misrouting in production.
|
|
639
|
+
|
|
640
|
+
**Deferred:** Candidate D's `pipelineMode` field in `TriggerDefinition`. CLI `--mode` flag is sufficient. Schema change deferred until trigger operators demonstrate need for deterministic routing without static signals.
|
|
641
|
+
|
|
642
|
+
**Switch triggers for the context-passing agent:**
|
|
643
|
+
- If context-passing design requires the routing decision to carry a context bundle → `PipelineMode` type changes to include context
|
|
644
|
+
- If discovery->shaping handoff uses a new file convention → routing logic may gain a new static signal (detect `.workrail/current-discovery.md`)
|
|
645
|
+
- If pitch.md handoff is replaced by coordinator-injected context → `IMPLEMENT` mode routing condition changes
|
|
646
|
+
|
|
647
|
+
---
|
|
648
|
+
|
|
649
|
+
### Confidence and residual risks
|
|
650
|
+
|
|
651
|
+
**Confidence band: HIGH**
|
|
652
|
+
|
|
653
|
+
**Residual risks (non-blocking):**
|
|
654
|
+
1. wr.discovery runtime > 35 minutes would bust the FULL pipeline timing budget. Mitigation: per-phase timeout constants are hardcoded and will surface the issue as a timeout escalation.
|
|
655
|
+
2. `parseRecommendedPipeline()` pure function not yet written. Should be written at implementation time as upgrade-path preparation (does not block MVP).
|
|
656
|
+
3. REVIEW_ONLY delegation to `runPrReviewCoordinator()` -- the delegation API (how `modes/review-only.ts` calls into pr-review.ts) needs to be designed at implementation time. This is a clean internal API design question, not a routing design question.
|
|
657
|
+
|
|
658
|
+
**What would change the design:**
|
|
659
|
+
- Misrouting rate above 5% in real use → upgrade to hybrid C (add classify-task as Tier 2 fallback)
|
|
660
|
+
- Trigger operators need deterministic routing without goal-text signals → add Candidate D's `pipelineMode` to TriggerDefinition
|