@exaudeus/workrail 3.66.0 → 3.68.0
- package/dist/application/services/compiler/template-registry.js +10 -1
- package/dist/application/validation.js +1 -1
- package/dist/cli/commands/worktrain-init.js +1 -1
- package/dist/console/standalone-console.js +4 -1
- package/dist/console-ui/assets/{index-BynU38Vu.js → index-CyzltI6D.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/coordinators/modes/full-pipeline.js +4 -4
- package/dist/coordinators/modes/implement-shared.js +5 -5
- package/dist/coordinators/modes/implement.js +4 -4
- package/dist/coordinators/pr-review.js +4 -4
- package/dist/daemon/workflow-runner.d.ts +1 -0
- package/dist/daemon/workflow-runner.js +1 -0
- package/dist/infrastructure/storage/schema-validating-workflow-storage.d.ts +21 -2
- package/dist/infrastructure/storage/schema-validating-workflow-storage.js +48 -0
- package/dist/manifest.json +41 -41
- package/dist/mcp/handlers/v2-workflow.js +24 -7
- package/dist/mcp/output-schemas.d.ts +36 -0
- package/dist/mcp/output-schemas.js +11 -1
- package/dist/mcp/workflow-protocol-contracts.js +2 -2
- package/dist/v2/projections/session-metrics.d.ts +1 -1
- package/dist/v2/projections/session-metrics.js +16 -35
- package/dist/v2/usecases/console-routes.d.ts +2 -2
- package/docs/authoring-v2.md +4 -4
- package/docs/changelog-recent.md +3 -3
- package/docs/configuration.md +1 -1
- package/docs/design/adaptive-coordinator-context-candidates.md +1 -1
- package/docs/design/adaptive-coordinator-context.md +1 -1
- package/docs/design/adaptive-coordinator-routing-candidates.md +18 -18
- package/docs/design/adaptive-coordinator-routing-review.md +1 -1
- package/docs/design/adaptive-coordinator-routing.md +34 -34
- package/docs/design/agent-cascade-protocol.md +2 -2
- package/docs/design/console-daemon-separation-discovery.md +323 -0
- package/docs/design/context-assembly-design-candidates.md +1 -1
- package/docs/design/context-assembly-implementation-plan.md +1 -1
- package/docs/design/context-assembly-layer.md +2 -2
- package/docs/design/context-assembly-review-findings.md +1 -1
- package/docs/design/coordinator-access-audit.md +293 -0
- package/docs/design/coordinator-architecture-audit.md +62 -0
- package/docs/design/coordinator-error-handling-audit.md +240 -0
- package/docs/design/coordinator-testability-audit.md +426 -0
- package/docs/design/daemon-architecture-discovery.md +1 -1
- package/docs/design/daemon-console-separation-discovery.md +242 -0
- package/docs/design/daemon-memory-audit.md +203 -0
- package/docs/design/design-candidates-console-daemon-separation.md +256 -0
- package/docs/design/design-candidates-discovery-loop-fix.md +141 -0
- package/docs/design/design-review-findings-console-daemon-separation.md +106 -0
- package/docs/design/design-review-findings-discovery-loop-fix.md +81 -0
- package/docs/design/discovery-loop-fix-candidates.md +161 -0
- package/docs/design/discovery-loop-fix-design-review.md +106 -0
- package/docs/design/discovery-loop-fix-validation.md +258 -0
- package/docs/design/discovery-loop-investigation-A.md +188 -0
- package/docs/design/discovery-loop-investigation-B.md +287 -0
- package/docs/design/exploration-workflow-candidates.md +205 -0
- package/docs/design/exploration-workflow-design-review.md +166 -0
- package/docs/design/exploration-workflow-discovery.md +443 -0
- package/docs/design/ide-context-files-candidates.md +231 -0
- package/docs/design/ide-context-files-design-review.md +85 -0
- package/docs/design/ide-context-files.md +615 -0
- package/docs/design/implementation-plan-discovery-loop-fix.md +199 -0
- package/docs/design/implementation-plan-queue-poll-rotation.md +102 -0
- package/docs/design/in-process-http-audit.md +190 -0
- package/docs/design/layer3b-ghost-nodes-design-candidates.md +2 -2
- package/docs/design/loadSessionNotes-candidates.md +108 -0
- package/docs/design/loadSessionNotes-test-coverage-discovery.md +297 -0
- package/docs/design/loadSessionNotes-test-coverage-session4.md +209 -0
- package/docs/design/loadSessionNotes-test-coverage-v3.md +321 -0
- package/docs/design/probe-session-design-candidates.md +261 -0
- package/docs/design/probe-session-phase0.md +490 -0
- package/docs/design/routines-guide.md +7 -7
- package/docs/design/session-metrics-attribution-candidates.md +250 -0
- package/docs/design/session-metrics-attribution-design-review.md +115 -0
- package/docs/design/session-metrics-attribution-discovery.md +319 -0
- package/docs/design/session-metrics-candidates.md +227 -0
- package/docs/design/session-metrics-design-review.md +104 -0
- package/docs/design/session-metrics-discovery.md +454 -0
- package/docs/design/spawn-session-debug.md +202 -0
- package/docs/design/trigger-validator-candidates.md +214 -0
- package/docs/design/trigger-validator-review.md +109 -0
- package/docs/design/trigger-validator-shaping-phase0.md +239 -0
- package/docs/design/trigger-validator.md +454 -0
- package/docs/design/v2-core-design-locks.md +2 -2
- package/docs/design/workflow-extension-points.md +15 -15
- package/docs/design/workflow-id-validation-at-startup.md +1 -1
- package/docs/design/workflow-id-validation-implementation-plan.md +2 -2
- package/docs/design/workflow-trigger-lifecycle-audit.md +175 -0
- package/docs/design/worktrain-task-queue-candidates.md +5 -5
- package/docs/design/worktrain-task-queue.md +4 -4
- package/docs/discovery/coordinator-script-design.md +1 -1
- package/docs/discovery/coordinator-ux-discovery.md +3 -3
- package/docs/discovery/simulation-report.md +1 -1
- package/docs/discovery/workflow-modernization-discovery.md +326 -0
- package/docs/discovery/workflow-selection-for-discovery-tasks.md +33 -33
- package/docs/discovery/worktrain-status-briefing.md +1 -1
- package/docs/discovery/wr-discovery-goal-reframing.md +1 -1
- package/docs/docker.md +1 -1
- package/docs/ideas/backlog.md +227 -0
- package/docs/ideas/third-party-workflow-setup-design-thinking.md +1 -1
- package/docs/integrations/claude-code.md +5 -5
- package/docs/integrations/firebender.md +1 -1
- package/docs/plans/agentic-orchestration-roadmap.md +2 -2
- package/docs/plans/mr-review-workflow-redesign.md +9 -9
- package/docs/plans/ui-ux-workflow-design-candidates.md +4 -4
- package/docs/plans/ui-ux-workflow-discovery.md +2 -2
- package/docs/plans/workflow-categories-candidates.md +8 -8
- package/docs/plans/workflow-categories-discovery.md +4 -4
- package/docs/plans/workflow-modernization-design.md +430 -0
- package/docs/plans/workflow-staleness-detection-candidates.md +11 -11
- package/docs/plans/workflow-staleness-detection-review.md +4 -4
- package/docs/plans/workflow-staleness-detection.md +9 -9
- package/docs/plans/workrail-platform-vision.md +3 -3
- package/docs/reference/agent-context-cleaner-snippet.md +1 -1
- package/docs/reference/agent-context-guidance.md +4 -4
- package/docs/reference/context-optimization.md +2 -2
- package/docs/roadmap/now-next-later.md +2 -2
- package/docs/roadmap/open-work-inventory.md +16 -16
- package/docs/workflows.md +31 -31
- package/package.json +1 -1
- package/spec/workflow-tags.json +47 -47
- package/workflows/adaptive-ticket-creation.json +16 -16
- package/workflows/architecture-scalability-audit.json +22 -22
- package/workflows/bug-investigation.agentic.v2.json +3 -3
- package/workflows/classify-task-workflow.json +1 -1
- package/workflows/coding-task-workflow-agentic.json +6 -6
- package/workflows/cross-platform-code-conversion.v2.json +8 -8
- package/workflows/document-creation-workflow.json +8 -8
- package/workflows/documentation-update-workflow.json +8 -8
- package/workflows/intelligent-test-case-generation.json +2 -2
- package/workflows/learner-centered-course-workflow.json +2 -2
- package/workflows/mr-review-workflow.agentic.v2.json +4 -4
- package/workflows/personal-learning-materials-creation-branched.json +8 -8
- package/workflows/presentation-creation.json +5 -5
- package/workflows/production-readiness-audit.json +1 -1
- package/workflows/relocation-workflow-us.json +31 -31
- package/workflows/routines/context-gathering.json +1 -1
- package/workflows/routines/design-review.json +1 -1
- package/workflows/routines/execution-simulation.json +1 -1
- package/workflows/routines/feature-implementation.json +3 -3
- package/workflows/routines/final-verification.json +1 -1
- package/workflows/routines/hypothesis-challenge.json +1 -1
- package/workflows/routines/ideation.json +1 -1
- package/workflows/routines/parallel-work-partitioning.json +3 -3
- package/workflows/routines/philosophy-alignment.json +2 -2
- package/workflows/routines/plan-analysis.json +1 -1
- package/workflows/routines/plan-generation.json +1 -1
- package/workflows/routines/tension-driven-design.json +6 -6
- package/workflows/scoped-documentation-workflow.json +26 -26
- package/workflows/ui-ux-design-workflow.json +14 -14
- package/workflows/workflow-diagnose-environment.json +1 -1
- package/workflows/workflow-for-workflows.json +32 -77
- package/workflows/workflow-for-workflows.v2.json +0 -788
@@ -4,7 +4,7 @@ export interface SessionMetricsV2 {
     readonly endGitSha: string | null;
     readonly gitBranch: string | null;
     readonly agentCommitShas: readonly string[];
-    readonly captureConfidence: 'high' | '
+    readonly captureConfidence: 'high' | 'none';
     readonly durationMs: number | undefined;
     readonly outcome: 'success' | 'partial' | 'abandoned' | 'error' | null;
     readonly prNumbers: readonly number[];
@@ -3,24 +3,22 @@ Object.defineProperty(exports, "__esModule", { value: true });
 exports.projectSessionMetricsV2 = projectSessionMetricsV2;
 const constants_js_1 = require("../durable-core/constants.js");
 function projectSessionMetricsV2(events) {
-    let
-    let runCompletedRunId = null;
+    let runCompleted = null;
     for (const e of events) {
-
-
-            runCompletedData = asUnknown.data;
-            runCompletedRunId = asUnknown.scope?.runId ?? null;
+        if (e.kind === 'run_completed') {
+            runCompleted = e;
             break;
         }
     }
-    if (
+    if (runCompleted === null) {
         return null;
     }
+    const runCompletedRunId = runCompleted.scope.runId;
     const metricsContext = {};
     for (const e of events) {
         if (e.kind !== constants_js_1.EVENT_KIND.CONTEXT_SET)
             continue;
-        if (
+        if (e.scope?.runId !== runCompletedRunId)
             continue;
         const ctx = e.data.context;
         if (!ctx || typeof ctx !== 'object' || Array.isArray(ctx))
@@ -32,25 +30,15 @@ function projectSessionMetricsV2(events) {
             }
         }
     }
-    const d =
+    const d = runCompleted.data;
     const startGitSha = typeof d.startGitSha === 'string' ? d.startGitSha : null;
     const endGitSha = typeof d.endGitSha === 'string' ? d.endGitSha : null;
     const gitBranch = typeof d.gitBranch === 'string' ? d.gitBranch : null;
-    const agentCommitShas =
-
-
-
-
-            }
-        }
-    }
-    const captureConfidenceRaw = d.captureConfidence;
-    const captureConfidence = captureConfidenceRaw === 'high' || captureConfidenceRaw === 'medium' || captureConfidenceRaw === 'none'
-        ? captureConfidenceRaw
-        : 'none';
-    const durationMs = typeof d.durationMs === 'number' && Number.isFinite(d.durationMs)
-        ? d.durationMs
-        : undefined;
+    const agentCommitShas = Array.isArray(d.agentCommitShas)
+        ? d.agentCommitShas.filter((s) => typeof s === 'string')
+        : [];
+    const durationMs = typeof d.durationMs === 'number' && Number.isFinite(d.durationMs) ? d.durationMs : undefined;
+    const captureConfidence = d.captureConfidence === 'high' ? 'high' : 'none';
     const outcomeRaw = metricsContext['metrics_outcome'];
     const outcome = outcomeRaw === 'success' || outcomeRaw === 'partial' || outcomeRaw === 'abandoned' || outcomeRaw === 'error'
         ? outcomeRaw
@@ -68,24 +56,17 @@ function projectSessionMetricsV2(events) {
     const metricCommitShas = [];
     if (Array.isArray(commitShasRaw)) {
         for (const sha of commitShasRaw) {
-            if (typeof sha === 'string')
+            if (typeof sha === 'string')
                 metricCommitShas.push(sha);
-            }
         }
     }
     const finalAgentCommitShas = metricCommitShas.length > 0 ? metricCommitShas : agentCommitShas;
     const filesChangedRaw = metricsContext['metrics_files_changed'];
-    const filesChanged = typeof filesChangedRaw === 'number' && Number.isFinite(filesChangedRaw)
-        ? filesChangedRaw
-        : null;
+    const filesChanged = typeof filesChangedRaw === 'number' && Number.isFinite(filesChangedRaw) ? filesChangedRaw : null;
     const linesAddedRaw = metricsContext['metrics_lines_added'];
-    const linesAdded = typeof linesAddedRaw === 'number' && Number.isFinite(linesAddedRaw)
-        ? linesAddedRaw
-        : null;
+    const linesAdded = typeof linesAddedRaw === 'number' && Number.isFinite(linesAddedRaw) ? linesAddedRaw : null;
     const linesRemovedRaw = metricsContext['metrics_lines_removed'];
-    const linesRemoved = typeof linesRemovedRaw === 'number' && Number.isFinite(linesRemovedRaw)
-        ? linesRemovedRaw
-        : null;
+    const linesRemoved = typeof linesRemovedRaw === 'number' && Number.isFinite(linesRemovedRaw) ? linesRemovedRaw : null;
     return {
         startGitSha,
         endGitSha,
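Taken together, the `projectSessionMetricsV2` hunks above tighten how the `run_completed` payload is read: `captureConfidence` collapses to `'high'` or `'none'`, and non-string entries are filtered out of `agentCommitShas`. The sketch below restates that normalization in TypeScript so the behavior change is easier to see. It is illustrative only: `RunCompletedData` and the two `normalize*` helper names are assumptions made for this example and do not appear in the package.

```typescript
// Hypothetical stand-in for the `data` payload of a run_completed event.
type RunCompletedData = Record<string, unknown>;

// Mirrors the narrowed union in the .d.ts hunk: only 'high' or 'none' remain.
type CaptureConfidence = 'high' | 'none';

// Anything other than the literal 'high' falls back to 'none',
// matching the ternary added in the 3.68.0 hunk.
function normalizeCaptureConfidence(d: RunCompletedData): CaptureConfidence {
  return d.captureConfidence === 'high' ? 'high' : 'none';
}

// Keeps only string entries; a non-array value yields an empty list.
function normalizeAgentCommitShas(d: RunCompletedData): readonly string[] {
  return Array.isArray(d.agentCommitShas)
    ? d.agentCommitShas.filter((s): s is string => typeof s === 'string')
    : [];
}
```

Under this reading, a stored value of `'medium'` (which the removed 3.66.0 branch accepted) would now surface as `'none'`.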
@@ -1,6 +1,6 @@
 import type { Application } from 'express';
 import type { ConsoleService } from './console-service.js';
-import type {
+import type { IWorkflowReader } from '../../types/storage.js';
 import type { ToolCallTimingRingBuffer } from '../../mcp/tool-call-timing.js';
 import type { V2ToolContext } from '../../mcp/types.js';
-export declare function mountConsoleRoutes(app: Application, consoleService: ConsoleService, workflowService?:
+export declare function mountConsoleRoutes(app: Application, consoleService: ConsoleService, workflowService?: IWorkflowReader, timingRingBuffer?: ToolCallTimingRingBuffer, toolCallsPerfFile?: string, serverVersion?: string, v2ToolContext?: V2ToolContext): () => void;
package/docs/authoring-v2.md
CHANGED
@@ -240,7 +240,7 @@ Profile selection guide:
 | `"research"` | Workflow produces a finding or recommendation but no commits | Outcome-only reminder on final step only |
 | `"none"` or absent | Meta-workflows, utilities, authoring tools | No injection -- existing behavior unchanged |
 
-The engine does NOT derive the profile from tags automatically. Authors must set this field explicitly. When using `workflow-for-workflows` to author or modernize a workflow, the `phase-7b` step will prompt you for this decision.
+The engine does NOT derive the profile from tags automatically. Authors must set this field explicitly. When using `wr.workflow-for-workflows` to author or modernize a workflow, the `phase-7b` step will prompt you for this decision.
 
 **Final step detection**: The engine injects the final-step footer on the last top-level step, or on the exit step of a loop that is the last top-level step. A loop in a non-terminal position does not trigger the final-step footer on its exit step.
 
@@ -551,11 +551,11 @@ To keep authoring simple:
 
 Workflows can drift out of sync with the authoring spec they were written against. WorkRail surfaces this as a `staleness` signal in `list_workflows` and `inspect_workflow` output.
 
-**How it works:** Workflows carry an optional `validatedAgainstSpecVersion` field stamped by `workflow-for-workflows` after the quality gate passes. The engine compares this against the current `spec/authoring-spec.json` version at list/inspect time and returns:
+**How it works:** Workflows carry an optional `validatedAgainstSpecVersion` field stamped by `wr.workflow-for-workflows` after the quality gate passes. The engine compares this against the current `spec/authoring-spec.json` version at list/inspect time and returns:
 
 - `none` — workflow was validated against the current spec version
 - `likely` — spec was updated since the workflow was last reviewed
-- `possible` — workflow has never been run through `workflow-for-workflows`
+- `possible` — workflow has never been run through `wr.workflow-for-workflows`
 
 **Stamping a workflow:**
 
@@ -564,7 +564,7 @@ npm run stamp-workflow -- workflows/my-workflow.json
 git add workflows/my-workflow.json && git commit -m "chore: stamp workflow"
 ```
 
-The stamp must be committed to take effect. The `workflow-for-workflows` Phase 7 step includes a reminder to do this.
+The stamp must be committed to take effect. The `wr.workflow-for-workflows` Phase 7 step includes a reminder to do this.
 
 **Visibility:** By default, the staleness signal is only shown for user-owned/imported workflows (`personal`, `rooted_sharing`, `external`). Built-in and legacy_project workflows are excluded. Set `WORKRAIL_DEV=1` to see staleness for all categories (useful for catalog maintenance).
 
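The staleness rules quoted in the `authoring-v2.md` hunks above reduce to a three-way decision on the workflow's stamp. A minimal sketch of that decision follows. It assumes spec versions are plain numbers compared by equality, and the function name `stalenessFor` is hypothetical; the engine's actual comparison is not shown in this diff.

```typescript
// Signal levels named in the documentation text.
type Staleness = 'none' | 'likely' | 'possible';

// Assumption: `validatedAgainstSpecVersion` is absent when the workflow was
// never stamped, and versions are compared by simple equality.
function stalenessFor(
  validatedAgainstSpecVersion: number | undefined,
  currentSpecVersion: number,
): Staleness {
  // No stamp: the workflow never went through the quality gate.
  if (validatedAgainstSpecVersion === undefined) return 'possible';
  // Stamp matches the current spec: nothing to flag.
  if (validatedAgainstSpecVersion === currentSpecVersion) return 'none';
  // Stamp exists but the spec has moved on since the last review.
  return 'likely';
}
```

For example, with a current spec version of 3, an unstamped workflow would report `possible`, and a workflow stamped at version 2 would report `likely`.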
package/docs/changelog-recent.md
CHANGED
@@ -117,7 +117,7 @@ Structured migration workflow for moving code between platforms (Android to iOS,
 
 Since you've created workflows yourself, these changes are directly relevant.
 
-### `workflow-for-workflows.v2.json` was rebuilt
+### `wr.workflow-for-workflows.v2.json` was rebuilt
 
 The workflow used to create or modernize other workflows was significantly redesigned. The full phase structure now includes:
 
@@ -179,12 +179,12 @@ A visual catalog of every available workflow. Eight category filter pills. Click
 WorkRail can now detect when a workflow hasn't been reviewed against the current authoring spec. Three signal levels:
 
 - `none` -- validated against the current spec (has a version stamp and it's current)
-- `possible` -- no version stamp (was never run through `workflow-for-workflows`)
+- `possible` -- no version stamp (was never run through `wr.workflow-for-workflows`)
 - `likely` -- has a stamp, but the spec has been updated since the workflow was last reviewed
 
 This shows up in `list_workflows` output (agents see it) and in the CI registry validation check. It's shown only for non-built-in workflows -- built-in workflows ship with their own quality process and don't show staleness signals.
 
-**What this means for your team:** Your team's existing workflows will show as `possible` (no stamp) until they're run through `workflow-for-workflows.v2.json`. That's expected -- it's not an error, just a signal that they haven't been through the new quality gate. Over time, as you modernize them, they'll show `none`.
+**What this means for your team:** Your team's existing workflows will show as `possible` (no stamp) until they're run through `wr.workflow-for-workflows.v2.json`. That's expected -- it's not an error, just a signal that they haven't been through the new quality gate. Over time, as you modernize them, they'll show `none`.
 
 ---
 
package/docs/configuration.md
CHANGED
@@ -407,7 +407,7 @@ When `isComplete: true` is returned, summarize all work done across the workflow
 After creating this file, the agent becomes available via the Agent tool:
 
 ```
-Agent(subagent_type="workrail-executor", prompt="Start the bug-investigation
+Agent(subagent_type="workrail-executor", prompt="Start the wr.bug-investigation workflow...")
 ```
 
 ### Cursor
@@ -59,7 +59,7 @@ Must stay consistent with:
 - `src/v2/durable-core/schemas/artifacts/` (typed artifact schemas)
 - `workflows/wr.discovery.json` (Phase 7 -- if emitting artifact)
 - `workflows/wr.shaping.json` (Step 1 -- if adding file search)
-- `workflows/coding-task
+- `workflows/wr.coding-task.json` (Phase 0.5 -- no changes expected)
 
 ---
 
@@ -73,7 +73,7 @@ WorkTrain sessions are fully isolated. Each spawned session starts from the work
 
 **1. File-based handoff (wr.shaping -> coding):**
 - `wr.shaping` Step 9 writes `.workrail/current-pitch.md` at the workspace path
-- `coding-task
+- `wr.coding-task` Phase 0.5 actively searches for upstream docs via repo search, WebFetch, MCP integrations
 - Phase 0.5 would find `.workrail/current-pitch.md` automatically
 - **Status: effectively already works** -- no coordinator intervention needed for Shaping->Coding
 
@@ -16,9 +16,9 @@
 
 3. **Monolithic coordinator vs per-mode decomposition**: `pr-review.ts` is 1462 lines for one pipeline mode. Five modes in one file would be unmanageable. The right architecture decomposes into mode files with a thin dispatcher -- but this requires deciding the seam deliberately.
 
-4. **`recommendedPipeline` verbatim vs advisory**: If classify-task
+4. **`recommendedPipeline` verbatim vs advisory**: If wr.classify-task's pipeline output is authoritative, the coordinator cannot apply static overrides. If advisory, the coordinator re-implements routing and classify-task's rules become redundant for common cases.
 
-5. **Phase 0.5 vs coordinator routing for upstream context**: `coding-task
+5. **Phase 0.5 vs coordinator routing for upstream context**: `wr.coding-task` Phase 0.5 auto-detects `pitch.md`. The coordinator's "should I skip shaping?" routing decision partially overlaps with this detection. They must agree.
 
 ### What the codebase already solves (and how)
 
@@ -29,12 +29,12 @@
 - Escalation-first: every failure produces `escalated: true` + `escalationReason`, never silent substitution
 - TRACE log before acting on routing decision
 
-**`classify-task
+**`wr.classify-task.json`:**
 - Exists as of v3.40.0. Single LLM step, no tools, outputs `recommendedPipeline` as ordered workflow ID array
 - Output format: structured text block with `recommendedPipeline: ["...", "..."]` line
 - Note: `spawn_agent` does NOT return artifacts (v3.40.0 limitation #5) -- output must be read via `spawnSession` + `awaitSessions` + `getAgentResult` + note parsing
 
-**Phase 0.5 (`coding-task
+**Phase 0.5 (`wr.coding-task`):**
 - Already detects `pitch.md` and sets `solutionFixed=true`, skipping design phases
 - The coordinator's "IMPLEMENT mode" (skip discovery/shaping) and Phase 0.5 are complementary, not conflicting
 
@@ -78,7 +78,7 @@ From CLAUDE.md (stated) and pr-review.ts (practiced):
 - `src/cli-worktrain.ts` -- needs `worktrain run pipeline` subcommand wiring
 - `src/coordinators/pr-review.ts` -- must remain unchanged; new coordinator is additive
 - `src/trigger/types.ts` -- if Candidate D's `pipelineMode` field is added; otherwise unchanged
-- `workflows/classify-task
+- `workflows/wr.classify-task.json` -- coordinator depends on its note output format; format changes break parsing
 - `src/coordinators/routing/route-task.ts` (new) -- pure routing function; all mode selection logic lives here
 - `src/coordinators/modes/*.ts` (new files) -- each mode's pipeline execution logic
 - Test suite: each mode coordinator needs its own unit tests with `CoordinatorDeps` fakes
@@ -110,8 +110,8 @@ type PipelineMode =
 **Per-mode pipelines:**
 - REVIEW_ONLY: `mr-review-workflow.agentic.v2` -> route by verdict
 - QUICK_REVIEW: same + light model config, no arch audit override
-- IMPLEMENT: `coding-task
-- FULL: `wr.discovery` -> `wr.shaping` -> `coding-task
+- IMPLEMENT: `wr.coding-task` (Phase 0.5 picks up pitch) -> PR -> review -> merge
+- FULL: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> PR -> review -> merge
 
 **Tensions resolved:** determinism, YAGNI, no LLM latency.
 **Tensions accepted:** all ambiguous tasks fall to FULL (wasteful for Medium complexity tasks that don't need full discovery).
@@ -125,14 +125,14 @@ type PipelineMode =
 
 ---
 
-### Candidate B: classify-task
+### Candidate B: wr.classify-task as authoritative source
 
-**Summary:** Always spawn `classify-task
+**Summary:** Always spawn `wr.classify-task` first, parse `recommendedPipeline` output, execute the returned workflow sequence. Pipeline modes are not named at the coordinator level.
 
 **Architecture:**
 ```typescript
 async function routeTask(goal, workspace, deps): Promise<Result<readonly string[], string>> {
-const handle = await deps.spawnSession('classify-task
+const handle = await deps.spawnSession('wr.classify-task', `Classify: ${goal}`, workspace);
 await deps.awaitSessions([handle], CLASSIFY_TIMEOUT_MS); // 3 minutes max
 const agentResult = await deps.getAgentResult(handle);
 return parseRecommendedPipeline(agentResult.recapMarkdown);
@@ -141,12 +141,12 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[
 
 `parseRecommendedPipeline` is a pure function parsing the text block (two-tier: JSON array first, regex fallback).
 
-**Fallback:** if parsing fails, default to `['wr.discovery', 'coding-task
+**Fallback:** if parsing fails, default to `['wr.discovery', 'wr.coding-task', 'mr-review-workflow.agentic.v2']`.
 
 **Tensions resolved:** intelligent routing for all tasks including ambiguous ones; single source of truth for pipeline selection rules.
 **Tensions accepted:** non-deterministic; 5-15 second LLM latency per dispatch; no typed `PipelineMode` discriminated union (pipeline is a string[] at coordinator level).
-**Boundary:** classify-task
-**Failure mode:** classify-task
+**Boundary:** wr.classify-task is the routing authority; coordinator is a runner.
+**Failure mode:** wr.classify-task misclassifies a PR-only task and returns discovery+coding phases, wasting 30+ minutes. Recovery: add a pre-check for PR number before spawning classify-task (hybrid).
 **Repo pattern:** departs from determinism-over-cleverness principle. No named discriminated union.
 **Gain:** routing rules live in a workflow file -- updatable without code deployment.
 **Give up:** determinism, transparency, typed modes, dispatch speed for obvious cases.
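The hunk above describes `parseRecommendedPipeline` as a pure, two-tier parser (JSON array first, regex fallback) over a text block containing a `recommendedPipeline: [...]` line. The sketch below is one way such a parser could look. It is not the package's implementation: the `Result` shape is inferred from the `kind` / `value` / `error` fields used in the quoted design doc, and the exact regexes are assumptions.

```typescript
// Result shape inferred from the design doc's `classified.kind === 'err'` usage.
type Result<T, E> = { kind: 'ok'; value: T } | { kind: 'err'; error: E };

function parseRecommendedPipeline(notes: string): Result<readonly string[], string> {
  // Locate the `recommendedPipeline: ...` line; `.` stops at the line break.
  const lineMatch = /recommendedPipeline\s*:\s*(.*)/.exec(notes);
  if (lineMatch === null) return { kind: 'err', error: 'no recommendedPipeline line found' };
  const rest = lineMatch[1];

  // Tier 1: treat the bracketed part as a strict JSON array of strings.
  const bracketed = /\[.*\]/.exec(rest);
  if (bracketed !== null) {
    try {
      const parsed: unknown = JSON.parse(bracketed[0]);
      if (Array.isArray(parsed) && parsed.length > 0 && parsed.every((x) => typeof x === 'string')) {
        return { kind: 'ok', value: parsed as string[] };
      }
    } catch {
      // Not valid JSON (for example single-quoted IDs); fall through to tier 2.
    }
  }

  // Tier 2: regex fallback that collects every quoted token on the line.
  const ids: string[] = [];
  const quoted = /['"]([^'"]+)['"]/g;
  let m: RegExpExecArray | null;
  while ((m = quoted.exec(rest)) !== null) ids.push(m[1]);
  return ids.length > 0
    ? { kind: 'ok', value: ids }
    : { kind: 'err', error: 'recommendedPipeline line contained no workflow IDs' };
}
```

A caller following the documented fallback would substitute `['wr.discovery', 'wr.coding-task', 'mr-review-workflow.agentic.v2']` whenever this returns an `err` result.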
@@ -157,7 +157,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[
 
 ### Candidate C: Static-first with LLM fallback (hybrid, recommended for routing)
 
-**Summary:** Two-tier `routeTask()`: Tier 1 applies static rules (pure, covers 80% of cases); Tier 2 falls back to classify-task
+**Summary:** Two-tier `routeTask()`: Tier 1 applies static rules (pure, covers 80% of cases); Tier 2 falls back to wr.classify-task for ambiguous tasks and returns a `CLASSIFY_AND_RUN` mode.
 
 **PipelineMode type (6 variants):**
 ```typescript
@@ -180,7 +180,7 @@ async function routeTask(
 // Tier 1: static (pure, no I/O except filesystem check for pitch.md)
 const staticMode = applyStaticRules(goal, workspace);
 if (staticMode !== null) return ok(staticMode);
-// Tier 2: classify-task
+// Tier 2: wr.classify-task
 const classified = await runClassification(goal, workspace, deps);
 if (classified.kind === 'err') return err(`classification failed: ${classified.error}`);
 return ok({ kind: 'CLASSIFY_AND_RUN', classifiedPipeline: classified.value, goal });
@@ -293,7 +293,7 @@ export async function runAdaptivePipeline(
 
 ### Recommendation: Candidate C (routing mechanism) + Candidate E (architecture)
 
-**Routing (C):** Two-tier static-first with LLM fallback. Static rules cover the 4 well-defined cases at zero cost and latency. `CLASSIFY_AND_RUN` handles genuinely ambiguous tasks via classify-task
+**Routing (C):** Two-tier static-first with LLM fallback. Static rules cover the 4 well-defined cases at zero cost and latency. `CLASSIFY_AND_RUN` handles genuinely ambiguous tasks via wr.classify-task. This precisely mirrors the `parseFindingsFromNotes` two-tier strategy already established in `pr-review.ts`.
 
 **Architecture (E):** Per-mode coordinator files with thin dispatcher. Each mode file follows `pr-review.ts` independently. The dispatcher's `switch(mode.kind)` is exhaustive with `assertNever`. Adding a new mode is additive.
 
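The recommendation above relies on a dispatcher whose `switch(mode.kind)` is exhaustive with `assertNever`. The following sketch shows that TypeScript pattern on a reduced `PipelineMode` union. It is a simplification under stated assumptions: only four of the documented variants are modeled, `pipelineFor` is a hypothetical name, and the PR, review, and merge stages from the per-mode pipelines are left out.

```typescript
// Reduced union for illustration; the design doc lists six variants.
type PipelineMode =
  | { kind: 'REVIEW_ONLY' }
  | { kind: 'IMPLEMENT' }
  | { kind: 'FULL' }
  | { kind: 'CLASSIFY_AND_RUN'; classifiedPipeline: readonly string[] };

// Accepts only `never`, so the compiler rejects any switch that misses a variant.
function assertNever(x: never): never {
  throw new Error(`Unhandled pipeline mode: ${JSON.stringify(x)}`);
}

// Hypothetical dispatcher: maps a mode to the workflow IDs it would run.
function pipelineFor(mode: PipelineMode): readonly string[] {
  switch (mode.kind) {
    case 'REVIEW_ONLY':
      return ['mr-review-workflow.agentic.v2'];
    case 'IMPLEMENT':
      return ['wr.coding-task'];
    case 'FULL':
      return ['wr.discovery', 'wr.shaping', 'wr.coding-task'];
    case 'CLASSIFY_AND_RUN':
      return mode.classifiedPipeline;
    default:
      return assertNever(mode);
  }
}
```

Adding a fifth variant to the union without a matching `case` makes the `default` branch fail to type-check, which is what keeps new modes additive and hard to forget.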
@@ -301,7 +301,7 @@ export async function runAdaptivePipeline(
 
 ### Why not A (pure static)?
 
-Candidate A is simpler but all tasks without static signals fall to FULL (wr.discovery + wr.shaping + coding). A genuinely vague idea needs FULL. But a Medium complexity coding task with no pitch.md and no PR number -- e.g., `"refactor auth.ts to use Result types"` -- also falls to FULL, running unnecessary discovery phases. Candidate C covers this case with classify-task
+Candidate A is simpler but all tasks without static signals fall to FULL (wr.discovery + wr.shaping + coding). A genuinely vague idea needs FULL. But a Medium complexity coding task with no pitch.md and no PR number -- e.g., `"refactor auth.ts to use Result types"` -- also falls to FULL, running unnecessary discovery phases. Candidate C covers this case with wr.classify-task returning `['wr.coding-task', 'mr-review-workflow.agentic.v2']`.
 
 ### Why not B (pure LLM)?
 
@@ -319,7 +319,7 @@ Non-deterministic routing is unacceptable for the coordinator. A PR review task
 
 ### Pivot conditions
 
-1. If classify-task
+1. If wr.classify-task format drifts and `parseRecommendedPipeline` fails more than 10% of the time -> pivot to pure static (Candidate A) and accept FULL as default for ambiguous tasks
 2. If trigger operators need deterministic routing for automated workflows -> add `pipelineMode` to TriggerDefinition (Candidate D addition)
 3. If context-passing agent's design requires structured handoff data from routing to mode executors -> add a `contextBundle` field to mode types (implementation change, not routing design change)
 
@@ -151,6 +151,6 @@ Even though not called at MVP, having the pure function ready preserves the upgr

 1. **wr.discovery output standardization**: the routing design assumes wr.discovery notes are injected by the coordinator as `assembledContextSummary` for wr.shaping. But wr.discovery's `designDocPath` output location is not standardized (finding from context-passing agent's doc). The FULL mode executor must parse `lastStepNotes` from the discovery session to build the shaping context -- this is per the context-passing agent's Candidate D (coordinator-injected text). This concern is correctly owned by the context-passing design, not the routing design.

-2. **classify-task
+2. **wr.classify-task format stability**: if `parseRecommendedPipeline()` is written as a pure function now, it has no tests against real classify-task output. The function should include an integration test stub that documents the expected format.

 3. **REVIEW_ONLY vs pr-review coordinator**: the existing `worktrain run pr-review` command already provides REVIEW_ONLY+QUICK_REVIEW behavior. The new `worktrain run pipeline --mode review_only` should either (a) delegate to pr-review coordinator, or (b) reimplement the same logic in `modes/review-only.ts`. Recommendation: (a) delegate -- avoid duplicating the fix-agent loop logic. Document this delegation explicitly.
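For item 2 above, a pure `parseRecommendedPipeline()` might be sketched as follows. The real note format emitted by wr.classify-task is not shown in this diff, so the `recommendedPipeline: [...]` line matched here is purely an assumption. The `Result` shape is also illustrative and only mirrors the `Result<readonly string[], string>` return type used elsewhere in the design.

```typescript
// Illustrative Result type; the project's own Result type may differ.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// ASSUMED note format (not confirmed by the source): a single line such as
//   recommendedPipeline: ['wr.coding-task', 'mr-review-workflow.agentic.v2']
function parseRecommendedPipeline(notes: string): Result<readonly string[], string> {
  const match = /recommendedPipeline\s*[:=]\s*(\[[^\]]*\])/.exec(notes);
  if (match === null) {
    return { ok: false, error: 'recommendedPipeline block not found' };
  }
  const literal = match[1] ?? '';
  let parsed: unknown;
  try {
    // Accept single-quoted arrays by normalising quotes before JSON parsing.
    parsed = JSON.parse(literal.replace(/'/g, '"'));
  } catch {
    return { ok: false, error: 'recommendedPipeline is not a valid array literal' };
  }
  if (
    !Array.isArray(parsed) ||
    parsed.length === 0 ||
    !parsed.every((id) => typeof id === 'string' && id.length > 0)
  ) {
    return { ok: false, error: 'recommendedPipeline must be a non-empty string array' };
  }
  return { ok: true, value: parsed as string[] };
}
```

Because the function returns errors as data instead of throwing, a test stub can pin the expected format with one passing sample and one failing sample, and format drift shows up as a failed assertion.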
@@ -22,7 +22,7 @@

 **Chosen path:** `design_first`

-**Rationale:** The goal was stated as a solution (a coordinator with a routing/classification layer). The risk is designing the wrong routing mechanism. The landscape is well-understood from existing code (`pr-review.ts`, `classify-task
+**Rationale:** The goal was stated as a solution (a coordinator with a routing/classification layer). The risk is designing the wrong routing mechanism. The landscape is well-understood from existing code (`pr-review.ts`, `wr.classify-task.json`). The dominant risk is not lack of knowledge -- it is solving the wrong subproblem (e.g., treating all routing as LLM classification when static heuristics cover most cases, or treating one monolithic script as the right shape when decomposition into per-mode coordinators may be cleaner).

 ---
@@ -58,12 +58,12 @@ If a chat rewind occurs: the notes and context variables survive; this file may

 **What exists:**
 - `src/coordinators/pr-review.ts` -- 1462-line hardcoded coordinator for PR review. Establishes the `CoordinatorDeps` injectable interface (16 methods), `spawnSession`/`awaitSessions`/`getAgentResult` pattern, fix-agent loop with escalation-first failure policy.
-- `workflows/classify-task
+- `workflows/wr.classify-task.json` -- EXISTS as of v3.40.0 (contrary to Apr 15 backlog entry that listed it as missing). Single LLM step, no tools, outputs 7 variables including `recommendedPipeline` (ordered workflow ID array with decision rules already encoded).
 - `src/cli-worktrain.ts` -- wires `worktrain run pr-review` subcommand. No `worktrain run pipeline` or adaptive coordinator command exists yet.
 - `src/trigger/types.ts` -- `TriggerDefinition` has `workflowId`, `goal`, `goalTemplate`, `contextMapping`, `agentConfig`. No `pipelineMode` field.
-- Three-Workflow Pipeline decision (Apr 18): `wr.discovery -> wr.shaping -> coding-task
+- Three-Workflow Pipeline decision (Apr 18): `wr.discovery -> wr.shaping -> wr.coding-task`. Phase 0.5 in coding-task detects pitch.md and sets `solutionFixed=true` to skip design phases.
 - `wr.shaping` and `wr.discovery` workflows both exist as of v3.40.0.
-- `coding-task
+- `wr.coding-task` Phase 0.5 detects upstream context (pitch.md, BRD, PRD, etc.).

 **The Apr 15 backlog full pipeline DAG** (still relevant design intent):
 ```
@@ -91,13 +91,13 @@ trigger

 ### Contradictions and tensions

-- **classify-task
+- **wr.classify-task is listed as NOT YET BUILT in the Apr 15 backlog** but the file `workflows/wr.classify-task.json` exists today (v3.40.0, Apr 19). This is resolved: it was built between Apr 15 and Apr 19.
 - **"Always run classify-task first"** (Apr 15 backlog) vs. **"Static heuristics for well-known cases"** (primary uncertainty). The Apr 15 backlog says "always" but this was written before Phase 0.5 upstream context detection was built. With Phase 0.5, many routing decisions can be made statically.
 - **`recommendedPipeline` from classify-task** includes `wr.discovery` for Medium/Large tasks, but the Three-Workflow Pipeline decision treats `wr.discovery` as optional. The coordinator must decide: use classify-task's `recommendedPipeline` verbatim, or treat it as a hint that can be overridden by static signals (e.g., pitch.md already present = skip discovery even if classify says Medium)?

 ### Evidence gaps

-1. Does `spawn_agent` (the in-workflow tool) return the `recommendedPipeline` output variable from `classify-task
+1. Does `spawn_agent` (the in-workflow tool) return the `recommendedPipeline` output variable from `wr.classify-task`? The backlog note says `spawn_agent` currently does NOT return `artifacts` (limitation #5 in v3.40.0 current state). This means the coordinator script cannot use `spawn_agent` to run classify-task and read output -- it must use `spawnSession` + `getAgentResult` + parse the notes, just as `pr-review.ts` does for verdict artifacts.
 2. No existing test harness for a multi-mode coordinator. `pr-review.ts` tests exist but only cover the review pipeline.
 3. The `worktrain-spawn.ts` CLI wiring for `spawnSession` is the only proven path to dispatch sessions from a coordinator script. No other dispatch mechanism has been tested.
@@ -122,7 +122,7 @@ trigger

 3. **Single coordinator file vs per-mode decomposition**: `pr-review.ts` is 1462 lines for one mode. A monolithic adaptive coordinator handling all modes risks becoming unmaintainable. Per-mode coordinator functions (each independently testable) with a thin routing dispatcher is a cleaner architecture -- but introduces coordination between files.

-4. **`recommendedPipeline` verbatim vs as a hint**: classify-task
+4. **`recommendedPipeline` verbatim vs as a hint**: wr.classify-task encodes pipeline selection rules. If the coordinator uses these verbatim, it cannot apply static overrides (e.g., pitch.md present -> skip discovery). If it treats them as hints, it re-implements routing logic and classify-task's rules become advisory only.

 5. **Phase 0.5 vs coordinator routing for upstream context**: coding-task already auto-detects pitch.md. So the coordinator's routing decision for "skip wr.shaping?" partially duplicates Phase 0.5's detection. The coordinator should route based on what phases to _spawn_, not what the coding workflow will internally skip -- but these can diverge (coordinator spawns shaping but coding-task's Phase 0.5 would have skipped it anyway).
@@ -130,8 +130,8 @@ trigger

 - [ ] A `worktrain run pipeline --task "fix the race condition in auth.ts"` command routes to the correct pipeline mode and logs the routing decision before spawning any sessions
 - [ ] A task with `#123` or `PR #123` in the goal routes to REVIEW_ONLY without spawning discovery or shaping sessions
-- [ ] A task with `pitch.md` present in the workspace routes to IMPLEMENT (coding-task
-- [ ] An ambiguous task (no static signal) routes to classify-task
+- [ ] A task with `pitch.md` present in the workspace routes to IMPLEMENT (wr.coding-task only)
+- [ ] An ambiguous task (no static signal) routes to wr.classify-task session, parses `recommendedPipeline`, and executes that pipeline
 - [ ] A `dep bump` or `chore:` task routes to QUICK_REVIEW (mr-review only, no arch audit) based on goal text heuristics
 - [ ] Any phase failure produces a `PipelineOutcome` with `escalated: true` and a structured `escalationReason` -- no silent substitution
 - [ ] The `CoordinatorDeps` interface for the adaptive coordinator extends or reuses the existing `CoordinatorDeps` pattern from `pr-review.ts`
@@ -139,8 +139,8 @@ trigger

 ### Assumptions not yet verified

-1. `classify-task
-2. The `recommendedPipeline` text can be reliably parsed from classify-task
+1. `wr.classify-task` can be invoked via `spawnSession` + `awaitSessions` + `getAgentResult` with note parsing (same as pr-review reads verdict artifacts) -- this is assumed based on the spawn_agent artifact limitation
+2. The `recommendedPipeline` text can be reliably parsed from wr.classify-task's note output using a regex or structured block parser
 3. A new CLI subcommand `worktrain run pipeline` can be added following the same pattern as `worktrain run pr-review` in `src/cli-worktrain.ts`
 4. Pipeline modes can be named and bounded at design time (not open-ended)
@@ -151,17 +151,17 @@ trigger
 ### HMW (How Might We) reframes

 - HMW make the pipeline mode explicit in the trigger config so routing is never ambiguous, while still supporting dynamic routing for ad-hoc CLI invocations?
-- HMW use classify-task
+- HMW use wr.classify-task's `recommendedPipeline` as the default while allowing static overrides to be applied on top, treating classification as advisory rather than authoritative?

 ### Primary uncertainty (updated)

-Can classify-task
+Can wr.classify-task's `recommendedPipeline` output be used as the canonical routing source, with static overrides applied on top for well-known signal patterns (PR number, pitch.md, dep-bump keywords) -- rather than choosing between LLM and heuristics as mutually exclusive?

 ### Known approaches

-1. **classify-task
+1. **wr.classify-task first** -- always spawn a classification session, parse `recommendedPipeline`, then execute the pipeline. LLM-accurate, adds latency and cost per dispatch.
 2. **Static heuristics** -- parse goal text and trigger metadata (PR number present, labels, pitch.md present, explicit pipelineMode flag on trigger). Zero LLM cost, covers well-defined cases.
-3. **Hybrid** -- static heuristics handle high-confidence cases; LLM classification handles ambiguous tasks. `classify-task
+3. **Hybrid** -- static heuristics handle high-confidence cases; LLM classification handles ambiguous tasks. `wr.classify-task` is an optional fast path, not always required.
 4. **Explicit `pipelineMode` on trigger** -- add a `pipelineMode` field to `TriggerDefinition` (or as a context variable). Users/triggers declare mode explicitly. Removes ambiguity but requires configuration overhead.
 5. **classify-task advisory + static overrides** -- run classify-task first (small cost, accurate), then apply static override rules on top of `recommendedPipeline` to handle well-known signals. Classify sets the baseline; static rules correct known exceptions.
@@ -221,8 +221,8 @@ function routeTask(goal: string, workspace: string): PipelineMode
 **Per-mode pipeline sequences:**
 - `REVIEW_ONLY`: `mr-review-workflow.agentic.v2` -> route by verdict (clean: merge, minor: fix-agent-loop, blocking: escalate)
 - `QUICK_REVIEW`: same as REVIEW_ONLY but `agentConfig: { model: 'haiku-light' }`, no arch audit even if touched
-- `IMPLEMENT`: `coding-task
-- `FULL`: `wr.discovery` -> `wr.shaping` -> `coding-task
+- `IMPLEMENT`: `wr.coding-task` (Phase 0.5 finds pitch.md) -> `mr-review-workflow.agentic.v2` -> merge
+- `FULL`: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> PR -> `mr-review-workflow.agentic.v2` -> merge

 **Failure handling:** each phase failure returns a `PipelineOutcome` with `escalated: true` and `escalationReason`. No fallback to simpler pipeline. Same pattern as `PrOutcome` in pr-review.ts.
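The escalation-first policy above can be illustrated with a small sketch. Only `PipelineOutcome`, `escalated`, and `escalationReason` come from the design text. The fields inside `escalationReason`, `completedPhases`, `RunPhase`, and `runPhases` are assumptions made for illustration, and the phase runner is synchronous here only to keep the example short (a real one would await sessions).

```typescript
// Assumed outcome shape: a discriminated union keyed on `escalated`.
type PipelineOutcome =
  | { escalated: false; completedPhases: readonly string[] }
  | {
      escalated: true;
      escalationReason: { phase: string; message: string }; // assumed structure
      completedPhases: readonly string[];
    };

// Hypothetical phase runner: returns null on success, or a failure message.
type RunPhase = (workflowId: string) => string | null;

// Runs phases in order and stops at the first failure.
// There is deliberately no fallback to a simpler pipeline.
function runPhases(phases: readonly string[], runPhase: RunPhase): PipelineOutcome {
  const completed: string[] = [];
  for (const workflowId of phases) {
    const failure = runPhase(workflowId);
    if (failure !== null) {
      return {
        escalated: true,
        escalationReason: { phase: workflowId, message: failure },
        completedPhases: completed,
      };
    }
    completed.push(workflowId);
  }
  return { escalated: false, completedPhases: completed };
}
```

Recording which phases completed before the failure keeps the escalation actionable: an operator can see that, for example, discovery succeeded and shaping is the point to resume from.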
@@ -238,14 +238,14 @@ function routeTask(goal: string, workspace: string): PipelineMode

 ---

-### Candidate B: classify-task
+### Candidate B: wr.classify-task as authoritative source (pure LLM routing)

-**One-sentence summary:** The coordinator always spawns a `classify-task
+**One-sentence summary:** The coordinator always spawns a `wr.classify-task` session first, parses the `recommendedPipeline` output from step notes, and executes the pipeline that workflow specifies -- the coordinator script is a runner for whatever classify-task returns.

 **Architecture:**
 ```typescript
 async function routeTask(goal, workspace, deps): Promise<Result<readonly string[], string>> {
-const handle = await deps.spawnSession('classify-task
+const handle = await deps.spawnSession('wr.classify-task', goal, workspace);
 const result = await deps.awaitSessions([handle], CLASSIFY_TIMEOUT_MS);
 const notes = await deps.getAgentResult(handle);
 return parseRecommendedPipeline(notes.recapMarkdown); // pure function, text block parser
@@ -257,15 +257,15 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[

 **Pipeline modes:** not named at the coordinator level -- the pipeline IS whatever classify-task returns. The coordinator just runs the sequence.

-**Failure handling:** if `parseRecommendedPipeline` fails (LLM deviated from format), default to `['wr.discovery', 'coding-task
+**Failure handling:** if `parseRecommendedPipeline` fails (LLM deviated from format), default to `['wr.discovery', 'wr.coding-task', 'mr-review-workflow.agentic.v2']`. Any spawned phase failure escalates with structured reason.

 **Tensions resolved:** intelligent routing for ambiguous tasks; single source of truth for pipeline selection rules (the workflow, not the coordinator).
 **Tensions accepted:** non-deterministic (same task may classify differently); adds 5-15 second LLM latency per dispatch; `recommendedPipeline` is a string array of workflow IDs, not a typed discriminated union.
 **Failure mode to watch:** coordinator runs `wr.discovery` unnecessarily for PR-only tasks if classify-task misclassifies them. Recovery: add static pre-check before spawning classify-task.
-**Follows:** classify-task
+**Follows:** wr.classify-task's existing decision rules are already correct; this candidate delegates trust to them.
 **Gain:** routing rules live in the workflow, not the coordinator -- can be updated without code changes.
 **Give up:** determinism, routing transparency (routing reason requires parsing LLM output), typed pipeline modes.
-**Impact surface:** classify-task
+**Impact surface:** wr.classify-task becomes a critical dependency -- format changes break coordinator.
 **Scope judgment:** Best-fit for teams that want routing rules to evolve without code deployment.
 **Philosophy:** Honors dependency injection (classify-task as a boundary). Conflicts with determinism-over-cleverness (LLM routing is clever but non-deterministic).
@@ -273,7 +273,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[

 ### Candidate C: static-first with LLM fallback (hybrid, recommended)

-**One-sentence summary:** A two-tier `routeTask()` applies static rules first (fast, deterministic, covers 80% of cases), then falls back to classify-task
+**One-sentence summary:** A two-tier `routeTask()` applies static rules first (fast, deterministic, covers 80% of cases), then falls back to wr.classify-task only for ambiguous tasks where no static signal fires.

 **Architecture:**
 ```typescript
@@ -303,7 +303,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<PipelineMode, st
 - `REVIEW_ONLY`: same as Candidate A
 - `QUICK_REVIEW`: same as Candidate A
 - `IMPLEMENT`: same as Candidate A
-- `FULL`: `wr.discovery` -> `wr.shaping` -> `coding-task
+- `FULL`: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> PR -> review -> merge
 - `CLASSIFY_AND_RUN`: execute phases from classify-task output in order; unknown workflow IDs escalate

 **Failure handling:** escalation-first, same as pr-review.ts. The routing failure (classify-task parse failure) produces ESCALATE mode with reason.
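The `CLASSIFY_AND_RUN` rule that unknown workflow IDs escalate could be enforced with a small pure check before any session is spawned. In this sketch the workflow IDs are the ones named in this document, while `KNOWN_WORKFLOW_IDS`, `PhasePlan`, and `planClassifyAndRun` are hypothetical names introduced for illustration.

```typescript
// Workflow IDs mentioned in this design; a real allow-list may differ.
const KNOWN_WORKFLOW_IDS: ReadonlySet<string> = new Set([
  'wr.discovery',
  'wr.shaping',
  'wr.coding-task',
  'mr-review-workflow.agentic.v2',
]);

// Either a validated phase sequence or an escalation with a reason.
type PhasePlan =
  | { kind: 'RUN'; phases: readonly string[] }
  | { kind: 'ESCALATE'; reason: string };

// Validates a parsed recommendedPipeline. Unknown IDs are never
// skipped or substituted; they turn the whole plan into an escalation.
function planClassifyAndRun(recommendedPipeline: readonly string[]): PhasePlan {
  if (recommendedPipeline.length === 0) {
    return { kind: 'ESCALATE', reason: 'recommendedPipeline is empty' };
  }
  const unknown = recommendedPipeline.filter((id) => !KNOWN_WORKFLOW_IDS.has(id));
  if (unknown.length > 0) {
    return { kind: 'ESCALATE', reason: `unknown workflow IDs: ${unknown.join(', ')}` };
  }
  return { kind: 'RUN', phases: recommendedPipeline };
}
```

Validating the whole sequence up front means a bad ID is reported before the first phase runs, instead of after discovery or shaping has already consumed time.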
@@ -314,7 +314,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<PipelineMode, st
 **Follows:** parseFindingsFromNotes two-tier strategy pattern. CoordinatorDeps injection for the LLM fallback path.
 **Gain:** fast for common cases, intelligent for ambiguous cases, deterministic for all named modes.
 **Give up:** complexity of two tiers; CLASSIFY_AND_RUN mode is not a named type with typed data.
-**Impact surface:** same as Candidate A plus classify-task
+**Impact surface:** same as Candidate A plus wr.classify-task dependency.
 **Scope judgment:** Best-fit -- covers all named use cases efficiently. YAGNI risk is low because the LLM fallback adds ~30 lines of code, not a new architecture.
 **Philosophy:** Honors immutability, exhaustiveness (switch on PipelineMode is exhaustive), determinism-over-cleverness (static tier is deterministic, LLM is bounded fallback), errors-as-data.
@@ -421,7 +421,7 @@ Each mode coordinator is ~300-600 lines, fully independently testable. No mode-s

 ### Recommendation: C + E (Candidate C routing mechanism, Candidate E file architecture)

-**The routing mechanism decision (C):** Two-tier routing is the best-fit. Static rules cover the 4 well-defined cases (PR number, dep-bump, pitch.md, vague idea) without LLM cost. `CLASSIFY_AND_RUN` as the 5th mode handles genuinely ambiguous tasks via classify-task
+**The routing mechanism decision (C):** Two-tier routing is the best-fit. Static rules cover the 4 well-defined cases (PR number, dep-bump, pitch.md, vague idea) without LLM cost. `CLASSIFY_AND_RUN` as the 5th mode handles genuinely ambiguous tasks via wr.classify-task. This follows the `parseFindingsFromNotes` precedent in pr-review.ts (two-tier: structured first, fallback second).

 **The architecture decision (E):** Per-mode coordinator files with a thin dispatcher is the correct architecture for 5 modes. Each mode file follows pr-review.ts independently. The dispatcher is the only code that changes when a new mode is added. This is how the codebase is already structured (pr-review.ts is one mode file) -- Candidate E just makes the pattern explicit.
@@ -447,7 +447,7 @@ Candidate D (pipelineMode in TriggerDefinition) would be justified if trigger op

 ### Pivot conditions

-- If `classify-task
+- If `wr.classify-task` note parsing proves unreliable (format drift), pivot to pure static (Candidate A) and accept that ambiguous tasks run FULL
 - If `TriggerDefinition` change is needed for automated workflows, add Candidate D's pipelineMode field
 - If context-passing agent's design shows that the coordinator must inject structured context at spawn time, the mode coordinator files must include context injection logic -- this is implementation detail, not a routing design change
@@ -466,7 +466,7 @@ Candidate D (pipelineMode in TriggerDefinition) would be justified if trigger op
 1. **CLASSIFY_AND_RUN seam crack (genuine weakness, not blocking):** C's CLASSIFY_AND_RUN mode creates a typed/untyped seam in the dispatcher. Mitigation: CLASSIFY_AND_RUN fires only for tasks with no static signal; the dispatcher handles it with a dedicated `runClassifyAndRunPipeline` function that is documented as the "catch-all" path. Alternatively: fold CLASSIFY_AND_RUN into FULL (just run the three-workflow pipeline for all ambiguous tasks) and remove the LLM fallback entirely. This would make C = A for ambiguous tasks, simplifying the design.
 - **Final decision: simplify C by removing CLASSIFY_AND_RUN. Ambiguous tasks (no static signal) default to FULL. This gives Candidate A's simplicity with Candidate C's structure.**

-2. **A is sufficient for MVP:** Challenge confirmed that Candidate A covers all 5 stated use cases. C adds value for future Medium tasks. For an MVP, A is correct. The recommended design IS essentially Candidate A + Candidate E architecture. No classify-task
+2. **A is sufficient for MVP:** Challenge confirmed that Candidate A covers all 5 stated use cases. C adds value for future Medium tasks. For an MVP, A is correct. The recommended design IS essentially Candidate A + Candidate E architecture. No wr.classify-task dependency at all for the initial implementation.

 ### Final simplified design (A + E, not C + E)
@@ -489,7 +489,7 @@ Static rules (prioritized):
 3. `.workrail/current-pitch.md` exists -> `IMPLEMENT`
 4. else -> `FULL`
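A minimal sketch of pure static routing along these lines follows. Rules 3 and 4 are taken from the list above. Rules 1 and 2 fall outside this hunk, so the PR-number and dep-bump checks, and their order, are inferred from the acceptance criteria elsewhere in this diff and should be read as assumptions. `RoutingDeps`, `RoutingDecision`, and the regular expressions are hypothetical, and the injected `fileExists` stands in for whatever filesystem access the real coordinator uses.

```typescript
type PipelineModeKind = 'REVIEW_ONLY' | 'QUICK_REVIEW' | 'IMPLEMENT' | 'FULL';

// Hypothetical injected dependency, so routing stays pure and testable.
interface RoutingDeps {
  fileExists(path: string): boolean;
}

// The reason string supports logging the routing decision before any spawn.
interface RoutingDecision {
  mode: PipelineModeKind;
  reason: string;
}

function routeTask(goal: string, workspace: string, deps: RoutingDeps): RoutingDecision {
  // Assumed rule 1: "#123" or "PR #123" in the goal text.
  if (/(?:^|\s)(?:PR\s*)?#\d+\b/i.test(goal)) {
    return { mode: 'REVIEW_ONLY', reason: 'goal references a PR number' };
  }
  // Assumed rule 2: "dep bump" or a "chore:" prefix in the goal text.
  if (/\bdep bump\b|(?:^|\s)chore:/i.test(goal)) {
    return { mode: 'QUICK_REVIEW', reason: 'goal looks like a dependency bump or chore' };
  }
  // Rule 3 (from the list above): a shaped pitch already exists.
  if (deps.fileExists(`${workspace}/.workrail/current-pitch.md`)) {
    return { mode: 'IMPLEMENT', reason: '.workrail/current-pitch.md exists' };
  }
  // Rule 4: no static signal fired.
  return { mode: 'FULL', reason: 'no static signal matched' };
}
```

Because every branch is a plain string or file check, the same goal and workspace always produce the same mode, which is the determinism property the design argues for.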

-**Why remove CLASSIFY_AND_RUN:** classify-task
+**Why remove CLASSIFY_AND_RUN:** wr.classify-task adds latency, non-determinism, and format-parsing fragility for no concrete benefit over FULL for the stated use cases. The "YAGNI with discipline" principle wins. If Medium tasks turn out to be wasteful with FULL, add classify-task as a future enhancement with a typed artifact (not text parsing).

 **Architecture (E as designed):**
 ```
@@ -549,7 +549,7 @@ src/coordinators/

 1. **Routing determines spawn order, not context shape.** The routing layer (`routeTask()`) produces a `PipelineMode` variant. It does NOT know what context to pass to each spawned session. Context injection is entirely the responsibility of each mode coordinator (full-pipeline.ts, implement.ts, etc.), not the routing layer.

-2. **FULL pipeline phase order is: `wr.discovery` -> `wr.shaping` -> `coding-task
+2. **FULL pipeline phase order is: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> review -> merge.** If the context-passing agent's design changes this order (e.g., by making shaping optional based on discovery findings), the `runFullPipeline()` function must be updated accordingly. The routing layer itself does not need to change.

 3. **pitch.md is the canonical Shaping->Coding handoff.** The `IMPLEMENT` mode routes directly to coding because `current-pitch.md` already exists. The coding-task Phase 0.5 detects it and uses it. If the context-passing agent introduces a different handoff mechanism (e.g., coordinator-injected context instead of a file), the `IMPLEMENT` mode coordinator needs to inject that context at spawn time rather than relying on Phase 0.5 file detection.
@@ -582,8 +582,8 @@ The adaptive coordinator uses **pure static routing with per-mode file decomposi
 |------|---------------|
 | `REVIEW_ONLY` | `mr-review-workflow.agentic.v2` → verdict routing (clean: merge, minor: fix-loop, blocking: escalate) |
 | `QUICK_REVIEW` | same as REVIEW_ONLY with lighter model config |
-| `IMPLEMENT` | `coding-task
-| `FULL` | `wr.discovery` → `wr.shaping` → `coding-task
+| `IMPLEMENT` | `wr.coding-task` (Phase 0.5 reads pitch.md) → PR → `mr-review-workflow.agentic.v2` → merge |
+| `FULL` | `wr.discovery` → `wr.shaping` → `wr.coding-task` → PR → `mr-review-workflow.agentic.v2` → merge |

 **File architecture (Candidate E):**
 ```
@@ -633,7 +633,7 @@ const COORDINATOR_MAX_MS = 120 * 60 * 1000; // 120 min total coordinator wa
 - Routing decision is logged as traceability JSON before any session spawn
 - FULL pipeline: each phase is an independent escalation point (discovery-fail, shaping-fail, coding-fail each escalate independently)

-**Why LLM classification (classify-task
+**Why LLM classification (wr.classify-task) was excluded:**

 After adversarial challenge, CLASSIFY_AND_RUN mode was removed. The LLM classification path adds non-determinism and format-parsing fragility (notes parsing vs typed artifact) for no concrete MVP benefit. All 5 stated use cases are covered by static rules. The upgrade path to add classify-task as a Tier 2 fallback exists when evidence shows >5% misrouting in production.
@@ -46,7 +46,7 @@ WorkRail defines three distinct tiers of execution. The system automatically sel
 How does WorkRail know which tier to use? It uses a **"Verify then Delegate"** pattern (The Probe Protocol).

 ### 1. The Boot Check (Diagnostic Phase)
-When a session starts (or via the `
+When a session starts (or via the `wr.diagnose-environment` workflow), WorkRail guides the Main Agent to probe the environment:

 1. **Check for Subagents:** "Do you have a 'Researcher' subagent?"
 * *No:* **Fallback to Tier 1 (Solo).**
@@ -74,7 +74,7 @@ When executing a workflow step that calls for a specialized routine:

 To support this protocol, WorkRail provides:

-1. **The Diagnostic Workflow:** A guided utility (`
+1. **The Diagnostic Workflow:** A guided utility (`wr.diagnose-environment.json`) to help users verify and configure their agents.
 2. **The Asset Pack:** Standardized definitions for common roles (Researcher, Architect, Builder, Reviewer) that users can copy-paste into their IDE configs.
 * Includes System Prompts (for Tiers 1-3).
 * Includes Tool Whitelists (for enabling Tier 3).