@exaudeus/workrail 3.66.0 → 3.68.0

Files changed (150)
  1. package/dist/application/services/compiler/template-registry.js +10 -1
  2. package/dist/application/validation.js +1 -1
  3. package/dist/cli/commands/worktrain-init.js +1 -1
  4. package/dist/console/standalone-console.js +4 -1
  5. package/dist/console-ui/assets/{index-BynU38Vu.js → index-CyzltI6D.js} +1 -1
  6. package/dist/console-ui/index.html +1 -1
  7. package/dist/coordinators/modes/full-pipeline.js +4 -4
  8. package/dist/coordinators/modes/implement-shared.js +5 -5
  9. package/dist/coordinators/modes/implement.js +4 -4
  10. package/dist/coordinators/pr-review.js +4 -4
  11. package/dist/daemon/workflow-runner.d.ts +1 -0
  12. package/dist/daemon/workflow-runner.js +1 -0
  13. package/dist/infrastructure/storage/schema-validating-workflow-storage.d.ts +21 -2
  14. package/dist/infrastructure/storage/schema-validating-workflow-storage.js +48 -0
  15. package/dist/manifest.json +41 -41
  16. package/dist/mcp/handlers/v2-workflow.js +24 -7
  17. package/dist/mcp/output-schemas.d.ts +36 -0
  18. package/dist/mcp/output-schemas.js +11 -1
  19. package/dist/mcp/workflow-protocol-contracts.js +2 -2
  20. package/dist/v2/projections/session-metrics.d.ts +1 -1
  21. package/dist/v2/projections/session-metrics.js +16 -35
  22. package/dist/v2/usecases/console-routes.d.ts +2 -2
  23. package/docs/authoring-v2.md +4 -4
  24. package/docs/changelog-recent.md +3 -3
  25. package/docs/configuration.md +1 -1
  26. package/docs/design/adaptive-coordinator-context-candidates.md +1 -1
  27. package/docs/design/adaptive-coordinator-context.md +1 -1
  28. package/docs/design/adaptive-coordinator-routing-candidates.md +18 -18
  29. package/docs/design/adaptive-coordinator-routing-review.md +1 -1
  30. package/docs/design/adaptive-coordinator-routing.md +34 -34
  31. package/docs/design/agent-cascade-protocol.md +2 -2
  32. package/docs/design/console-daemon-separation-discovery.md +323 -0
  33. package/docs/design/context-assembly-design-candidates.md +1 -1
  34. package/docs/design/context-assembly-implementation-plan.md +1 -1
  35. package/docs/design/context-assembly-layer.md +2 -2
  36. package/docs/design/context-assembly-review-findings.md +1 -1
  37. package/docs/design/coordinator-access-audit.md +293 -0
  38. package/docs/design/coordinator-architecture-audit.md +62 -0
  39. package/docs/design/coordinator-error-handling-audit.md +240 -0
  40. package/docs/design/coordinator-testability-audit.md +426 -0
  41. package/docs/design/daemon-architecture-discovery.md +1 -1
  42. package/docs/design/daemon-console-separation-discovery.md +242 -0
  43. package/docs/design/daemon-memory-audit.md +203 -0
  44. package/docs/design/design-candidates-console-daemon-separation.md +256 -0
  45. package/docs/design/design-candidates-discovery-loop-fix.md +141 -0
  46. package/docs/design/design-review-findings-console-daemon-separation.md +106 -0
  47. package/docs/design/design-review-findings-discovery-loop-fix.md +81 -0
  48. package/docs/design/discovery-loop-fix-candidates.md +161 -0
  49. package/docs/design/discovery-loop-fix-design-review.md +106 -0
  50. package/docs/design/discovery-loop-fix-validation.md +258 -0
  51. package/docs/design/discovery-loop-investigation-A.md +188 -0
  52. package/docs/design/discovery-loop-investigation-B.md +287 -0
  53. package/docs/design/exploration-workflow-candidates.md +205 -0
  54. package/docs/design/exploration-workflow-design-review.md +166 -0
  55. package/docs/design/exploration-workflow-discovery.md +443 -0
  56. package/docs/design/ide-context-files-candidates.md +231 -0
  57. package/docs/design/ide-context-files-design-review.md +85 -0
  58. package/docs/design/ide-context-files.md +615 -0
  59. package/docs/design/implementation-plan-discovery-loop-fix.md +199 -0
  60. package/docs/design/implementation-plan-queue-poll-rotation.md +102 -0
  61. package/docs/design/in-process-http-audit.md +190 -0
  62. package/docs/design/layer3b-ghost-nodes-design-candidates.md +2 -2
  63. package/docs/design/loadSessionNotes-candidates.md +108 -0
  64. package/docs/design/loadSessionNotes-test-coverage-discovery.md +297 -0
  65. package/docs/design/loadSessionNotes-test-coverage-session4.md +209 -0
  66. package/docs/design/loadSessionNotes-test-coverage-v3.md +321 -0
  67. package/docs/design/probe-session-design-candidates.md +261 -0
  68. package/docs/design/probe-session-phase0.md +490 -0
  69. package/docs/design/routines-guide.md +7 -7
  70. package/docs/design/session-metrics-attribution-candidates.md +250 -0
  71. package/docs/design/session-metrics-attribution-design-review.md +115 -0
  72. package/docs/design/session-metrics-attribution-discovery.md +319 -0
  73. package/docs/design/session-metrics-candidates.md +227 -0
  74. package/docs/design/session-metrics-design-review.md +104 -0
  75. package/docs/design/session-metrics-discovery.md +454 -0
  76. package/docs/design/spawn-session-debug.md +202 -0
  77. package/docs/design/trigger-validator-candidates.md +214 -0
  78. package/docs/design/trigger-validator-review.md +109 -0
  79. package/docs/design/trigger-validator-shaping-phase0.md +239 -0
  80. package/docs/design/trigger-validator.md +454 -0
  81. package/docs/design/v2-core-design-locks.md +2 -2
  82. package/docs/design/workflow-extension-points.md +15 -15
  83. package/docs/design/workflow-id-validation-at-startup.md +1 -1
  84. package/docs/design/workflow-id-validation-implementation-plan.md +2 -2
  85. package/docs/design/workflow-trigger-lifecycle-audit.md +175 -0
  86. package/docs/design/worktrain-task-queue-candidates.md +5 -5
  87. package/docs/design/worktrain-task-queue.md +4 -4
  88. package/docs/discovery/coordinator-script-design.md +1 -1
  89. package/docs/discovery/coordinator-ux-discovery.md +3 -3
  90. package/docs/discovery/simulation-report.md +1 -1
  91. package/docs/discovery/workflow-modernization-discovery.md +326 -0
  92. package/docs/discovery/workflow-selection-for-discovery-tasks.md +33 -33
  93. package/docs/discovery/worktrain-status-briefing.md +1 -1
  94. package/docs/discovery/wr-discovery-goal-reframing.md +1 -1
  95. package/docs/docker.md +1 -1
  96. package/docs/ideas/backlog.md +227 -0
  97. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1 -1
  98. package/docs/integrations/claude-code.md +5 -5
  99. package/docs/integrations/firebender.md +1 -1
  100. package/docs/plans/agentic-orchestration-roadmap.md +2 -2
  101. package/docs/plans/mr-review-workflow-redesign.md +9 -9
  102. package/docs/plans/ui-ux-workflow-design-candidates.md +4 -4
  103. package/docs/plans/ui-ux-workflow-discovery.md +2 -2
  104. package/docs/plans/workflow-categories-candidates.md +8 -8
  105. package/docs/plans/workflow-categories-discovery.md +4 -4
  106. package/docs/plans/workflow-modernization-design.md +430 -0
  107. package/docs/plans/workflow-staleness-detection-candidates.md +11 -11
  108. package/docs/plans/workflow-staleness-detection-review.md +4 -4
  109. package/docs/plans/workflow-staleness-detection.md +9 -9
  110. package/docs/plans/workrail-platform-vision.md +3 -3
  111. package/docs/reference/agent-context-cleaner-snippet.md +1 -1
  112. package/docs/reference/agent-context-guidance.md +4 -4
  113. package/docs/reference/context-optimization.md +2 -2
  114. package/docs/roadmap/now-next-later.md +2 -2
  115. package/docs/roadmap/open-work-inventory.md +16 -16
  116. package/docs/workflows.md +31 -31
  117. package/package.json +1 -1
  118. package/spec/workflow-tags.json +47 -47
  119. package/workflows/adaptive-ticket-creation.json +16 -16
  120. package/workflows/architecture-scalability-audit.json +22 -22
  121. package/workflows/bug-investigation.agentic.v2.json +3 -3
  122. package/workflows/classify-task-workflow.json +1 -1
  123. package/workflows/coding-task-workflow-agentic.json +6 -6
  124. package/workflows/cross-platform-code-conversion.v2.json +8 -8
  125. package/workflows/document-creation-workflow.json +8 -8
  126. package/workflows/documentation-update-workflow.json +8 -8
  127. package/workflows/intelligent-test-case-generation.json +2 -2
  128. package/workflows/learner-centered-course-workflow.json +2 -2
  129. package/workflows/mr-review-workflow.agentic.v2.json +4 -4
  130. package/workflows/personal-learning-materials-creation-branched.json +8 -8
  131. package/workflows/presentation-creation.json +5 -5
  132. package/workflows/production-readiness-audit.json +1 -1
  133. package/workflows/relocation-workflow-us.json +31 -31
  134. package/workflows/routines/context-gathering.json +1 -1
  135. package/workflows/routines/design-review.json +1 -1
  136. package/workflows/routines/execution-simulation.json +1 -1
  137. package/workflows/routines/feature-implementation.json +3 -3
  138. package/workflows/routines/final-verification.json +1 -1
  139. package/workflows/routines/hypothesis-challenge.json +1 -1
  140. package/workflows/routines/ideation.json +1 -1
  141. package/workflows/routines/parallel-work-partitioning.json +3 -3
  142. package/workflows/routines/philosophy-alignment.json +2 -2
  143. package/workflows/routines/plan-analysis.json +1 -1
  144. package/workflows/routines/plan-generation.json +1 -1
  145. package/workflows/routines/tension-driven-design.json +6 -6
  146. package/workflows/scoped-documentation-workflow.json +26 -26
  147. package/workflows/ui-ux-design-workflow.json +14 -14
  148. package/workflows/workflow-diagnose-environment.json +1 -1
  149. package/workflows/workflow-for-workflows.json +32 -77
  150. package/workflows/workflow-for-workflows.v2.json +0 -788
@@ -4,7 +4,7 @@ export interface SessionMetricsV2 {
      readonly endGitSha: string | null;
      readonly gitBranch: string | null;
      readonly agentCommitShas: readonly string[];
-     readonly captureConfidence: 'high' | 'medium' | 'none';
+     readonly captureConfidence: 'high' | 'none';
      readonly durationMs: number | undefined;
      readonly outcome: 'success' | 'partial' | 'abandoned' | 'error' | null;
      readonly prNumbers: readonly number[];
@@ -3,24 +3,22 @@ Object.defineProperty(exports, "__esModule", { value: true });
  exports.projectSessionMetricsV2 = projectSessionMetricsV2;
  const constants_js_1 = require("../durable-core/constants.js");
  function projectSessionMetricsV2(events) {
-     let runCompletedData = null;
-     let runCompletedRunId = null;
+     let runCompleted = null;
      for (const e of events) {
-         const asUnknown = e;
-         if (asUnknown.kind === 'run_completed') {
-             runCompletedData = asUnknown.data;
-             runCompletedRunId = asUnknown.scope?.runId ?? null;
+         if (e.kind === 'run_completed') {
+             runCompleted = e;
              break;
          }
      }
-     if (runCompletedData === null) {
+     if (runCompleted === null) {
          return null;
      }
+     const runCompletedRunId = runCompleted.scope.runId;
      const metricsContext = {};
      for (const e of events) {
          if (e.kind !== constants_js_1.EVENT_KIND.CONTEXT_SET)
              continue;
-         if (runCompletedRunId !== null && e.scope?.runId !== runCompletedRunId)
+         if (e.scope?.runId !== runCompletedRunId)
              continue;
          const ctx = e.data.context;
          if (!ctx || typeof ctx !== 'object' || Array.isArray(ctx))
@@ -32,25 +30,15 @@ function projectSessionMetricsV2(events) {
              }
          }
      }
-     const d = runCompletedData;
+     const d = runCompleted.data;
      const startGitSha = typeof d.startGitSha === 'string' ? d.startGitSha : null;
      const endGitSha = typeof d.endGitSha === 'string' ? d.endGitSha : null;
      const gitBranch = typeof d.gitBranch === 'string' ? d.gitBranch : null;
-     const agentCommitShas = [];
-     if (Array.isArray(d.agentCommitShas)) {
-         for (const sha of d.agentCommitShas) {
-             if (typeof sha === 'string') {
-                 agentCommitShas.push(sha);
-             }
-         }
-     }
-     const captureConfidenceRaw = d.captureConfidence;
-     const captureConfidence = captureConfidenceRaw === 'high' || captureConfidenceRaw === 'medium' || captureConfidenceRaw === 'none'
-         ? captureConfidenceRaw
-         : 'none';
-     const durationMs = typeof d.durationMs === 'number' && Number.isFinite(d.durationMs)
-         ? d.durationMs
-         : undefined;
+     const agentCommitShas = Array.isArray(d.agentCommitShas)
+         ? d.agentCommitShas.filter((s) => typeof s === 'string')
+         : [];
+     const durationMs = typeof d.durationMs === 'number' && Number.isFinite(d.durationMs) ? d.durationMs : undefined;
+     const captureConfidence = d.captureConfidence === 'high' ? 'high' : 'none';
      const outcomeRaw = metricsContext['metrics_outcome'];
      const outcome = outcomeRaw === 'success' || outcomeRaw === 'partial' || outcomeRaw === 'abandoned' || outcomeRaw === 'error'
          ? outcomeRaw
@@ -68,24 +56,17 @@ function projectSessionMetricsV2(events) {
      const metricCommitShas = [];
      if (Array.isArray(commitShasRaw)) {
          for (const sha of commitShasRaw) {
-             if (typeof sha === 'string') {
+             if (typeof sha === 'string')
                  metricCommitShas.push(sha);
-             }
          }
      }
      const finalAgentCommitShas = metricCommitShas.length > 0 ? metricCommitShas : agentCommitShas;
      const filesChangedRaw = metricsContext['metrics_files_changed'];
-     const filesChanged = typeof filesChangedRaw === 'number' && Number.isFinite(filesChangedRaw)
-         ? filesChangedRaw
-         : null;
+     const filesChanged = typeof filesChangedRaw === 'number' && Number.isFinite(filesChangedRaw) ? filesChangedRaw : null;
      const linesAddedRaw = metricsContext['metrics_lines_added'];
-     const linesAdded = typeof linesAddedRaw === 'number' && Number.isFinite(linesAddedRaw)
-         ? linesAddedRaw
-         : null;
+     const linesAdded = typeof linesAddedRaw === 'number' && Number.isFinite(linesAddedRaw) ? linesAddedRaw : null;
      const linesRemovedRaw = metricsContext['metrics_lines_removed'];
-     const linesRemoved = typeof linesRemovedRaw === 'number' && Number.isFinite(linesRemovedRaw)
-         ? linesRemovedRaw
-         : null;
+     const linesRemoved = typeof linesRemovedRaw === 'number' && Number.isFinite(linesRemovedRaw) ? linesRemovedRaw : null;
      return {
          startGitSha,
          endGitSha,
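The normalization performed by the `projectSessionMetricsV2` hunks above can be illustrated with a small standalone sketch. It is not the package's source: the helper names `normalizeCaptureConfidence`, `extractCommitShas`, and `finiteOrUndefined`, and the `RunCompletedData` shape, are hypothetical and exist only to show the observable effect of the new expressions. One consequence worth noting, as read from the diff, is that a stored `captureConfidence` of `'medium'` now projects as `'none'`.

```typescript
// Hypothetical, simplified shape of the data carried by a run_completed event.
type RunCompletedData = Record<string, unknown>;

type CaptureConfidence = 'high' | 'none';

// Mirrors the 3.68.0 expression: anything other than the literal 'high' becomes 'none'.
function normalizeCaptureConfidence(raw: unknown): CaptureConfidence {
  return raw === 'high' ? 'high' : 'none';
}

// Mirrors the filter-based extraction: keep only string entries, or [] for non-arrays.
function extractCommitShas(raw: unknown): string[] {
  return Array.isArray(raw)
    ? raw.filter((s): s is string => typeof s === 'string')
    : [];
}

// Mirrors the finite-number guard used for durationMs.
function finiteOrUndefined(raw: unknown): number | undefined {
  return typeof raw === 'number' && Number.isFinite(raw) ? raw : undefined;
}

// Illustrative input only; the values are made up.
const sample: RunCompletedData = {
  captureConfidence: 'medium',
  agentCommitShas: ['abc123', 42, null],
  durationMs: Infinity,
};

console.log(normalizeCaptureConfidence(sample.captureConfidence)); // none
console.log(extractCommitShas(sample.agentCommitShas).length); // 1
console.log(finiteOrUndefined(sample.durationMs)); // undefined
```

Consumers that previously branched on `'medium'` would, under this reading, need to treat it the same as `'none'`.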
@@ -1,6 +1,6 @@
  import type { Application } from 'express';
  import type { ConsoleService } from './console-service.js';
- import type { WorkflowService } from '../../application/services/workflow-service.js';
+ import type { IWorkflowReader } from '../../types/storage.js';
  import type { ToolCallTimingRingBuffer } from '../../mcp/tool-call-timing.js';
  import type { V2ToolContext } from '../../mcp/types.js';
- export declare function mountConsoleRoutes(app: Application, consoleService: ConsoleService, workflowService?: WorkflowService, timingRingBuffer?: ToolCallTimingRingBuffer, toolCallsPerfFile?: string, serverVersion?: string, v2ToolContext?: V2ToolContext): () => void;
+ export declare function mountConsoleRoutes(app: Application, consoleService: ConsoleService, workflowService?: IWorkflowReader, timingRingBuffer?: ToolCallTimingRingBuffer, toolCallsPerfFile?: string, serverVersion?: string, v2ToolContext?: V2ToolContext): () => void;
@@ -240,7 +240,7 @@ Profile selection guide:
  | `"research"` | Workflow produces a finding or recommendation but no commits | Outcome-only reminder on final step only |
  | `"none"` or absent | Meta-workflows, utilities, authoring tools | No injection -- existing behavior unchanged |
 
- The engine does NOT derive the profile from tags automatically. Authors must set this field explicitly. When using `workflow-for-workflows` to author or modernize a workflow, the `phase-7b` step will prompt you for this decision.
+ The engine does NOT derive the profile from tags automatically. Authors must set this field explicitly. When using `wr.workflow-for-workflows` to author or modernize a workflow, the `phase-7b` step will prompt you for this decision.
 
  **Final step detection**: The engine injects the final-step footer on the last top-level step, or on the exit step of a loop that is the last top-level step. A loop in a non-terminal position does not trigger the final-step footer on its exit step.
 
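The final-step detection rule quoted in the hunk above can be sketched as a small pure function. This is only an illustration under stated assumptions: the step shapes (`SimpleStep`, `LoopStep`), the function name `finalFooterStepId`, and the choice to treat the last entry of a loop body as its exit step are all hypothetical, since the real workflow schema is not part of this diff.

```typescript
// Hypothetical step shapes; the real workflow schema is not shown in this diff.
type SimpleStep = { kind: 'step'; id: string };
// Assumption: the last entry of `body` stands in for the loop's exit step.
type LoopStep = { kind: 'loop'; id: string; body: SimpleStep[] };
type TopLevelStep = SimpleStep | LoopStep;

// Returns the id of the step that would receive the final-step footer, or null if none.
function finalFooterStepId(steps: TopLevelStep[]): string | null {
  if (steps.length === 0) return null;
  const last = steps[steps.length - 1] as TopLevelStep;
  // Case 1: the last top-level step is a plain step.
  if (last.kind === 'step') return last.id;
  // Case 2: the last top-level step is a loop, so its exit step gets the footer.
  if (last.body.length === 0) return null;
  return (last.body[last.body.length - 1] as SimpleStep).id;
}

const terminalLoop: TopLevelStep[] = [
  { kind: 'step', id: 'phase-0' },
  { kind: 'loop', id: 'review-loop', body: [{ kind: 'step', id: 'revise' }, { kind: 'step', id: 'loop-exit' }] },
];
const midLoop: TopLevelStep[] = [
  { kind: 'loop', id: 'review-loop', body: [{ kind: 'step', id: 'loop-exit' }] },
  { kind: 'step', id: 'phase-final' },
];

console.log(finalFooterStepId(terminalLoop)); // loop-exit
console.log(finalFooterStepId(midLoop)); // phase-final
```

In the second case the loop sits in a non-terminal position, so its exit step is never selected.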
@@ -551,11 +551,11 @@ To keep authoring simple:
 
  Workflows can drift out of sync with the authoring spec they were written against. WorkRail surfaces this as a `staleness` signal in `list_workflows` and `inspect_workflow` output.
 
- **How it works:** Workflows carry an optional `validatedAgainstSpecVersion` field stamped by `workflow-for-workflows` after the quality gate passes. The engine compares this against the current `spec/authoring-spec.json` version at list/inspect time and returns:
+ **How it works:** Workflows carry an optional `validatedAgainstSpecVersion` field stamped by `wr.workflow-for-workflows` after the quality gate passes. The engine compares this against the current `spec/authoring-spec.json` version at list/inspect time and returns:
 
  - `none` — workflow was validated against the current spec version
  - `likely` — spec was updated since the workflow was last reviewed
- - `possible` — workflow has never been run through `workflow-for-workflows`
+ - `possible` — workflow has never been run through `wr.workflow-for-workflows`
 
  **Stamping a workflow:**
 
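The three staleness levels listed in this hunk can be sketched as a pure comparison. This is a minimal illustration, not the engine's code: the function name `computeStaleness` is hypothetical, and representing spec versions as plain numbers compared for equality is an assumption, since the version format is not shown in the diff.

```typescript
type Staleness = 'none' | 'likely' | 'possible';

// `validatedAgainstSpecVersion` is the optional stamp carried by the workflow;
// `currentSpecVersion` stands for the version in spec/authoring-spec.json.
// Assumption: versions are numbers and "current" means strictly equal.
function computeStaleness(
  validatedAgainstSpecVersion: number | undefined,
  currentSpecVersion: number,
): Staleness {
  // No stamp: the workflow was never run through the quality gate.
  if (validatedAgainstSpecVersion === undefined) return 'possible';
  // Stamp matches the current spec: nothing to flag.
  if (validatedAgainstSpecVersion === currentSpecVersion) return 'none';
  // Stamp exists but the spec has moved on since.
  return 'likely';
}

console.log(computeStaleness(undefined, 3)); // possible
console.log(computeStaleness(3, 3)); // none
console.log(computeStaleness(2, 3)); // likely
```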
@@ -564,7 +564,7 @@ npm run stamp-workflow -- workflows/my-workflow.json
  git add workflows/my-workflow.json && git commit -m "chore: stamp workflow"
  ```
 
- The stamp must be committed to take effect. The `workflow-for-workflows` Phase 7 step includes a reminder to do this.
+ The stamp must be committed to take effect. The `wr.workflow-for-workflows` Phase 7 step includes a reminder to do this.
 
  **Visibility:** By default, the staleness signal is only shown for user-owned/imported workflows (`personal`, `rooted_sharing`, `external`). Built-in and legacy_project workflows are excluded. Set `WORKRAIL_DEV=1` to see staleness for all categories (useful for catalog maintenance).
 
@@ -117,7 +117,7 @@ Structured migration workflow for moving code between platforms (Android to iOS,
 
  Since you've created workflows yourself, these changes are directly relevant.
 
- ### `workflow-for-workflows.v2.json` was rebuilt
+ ### `wr.workflow-for-workflows.v2.json` was rebuilt
 
  The workflow used to create or modernize other workflows was significantly redesigned. The full phase structure now includes:
 
@@ -179,12 +179,12 @@ A visual catalog of every available workflow. Eight category filter pills. Click
  WorkRail can now detect when a workflow hasn't been reviewed against the current authoring spec. Three signal levels:
 
  - `none` -- validated against the current spec (has a version stamp and it's current)
- - `possible` -- no version stamp (was never run through `workflow-for-workflows`)
+ - `possible` -- no version stamp (was never run through `wr.workflow-for-workflows`)
  - `likely` -- has a stamp, but the spec has been updated since the workflow was last reviewed
 
  This shows up in `list_workflows` output (agents see it) and in the CI registry validation check. It's shown only for non-built-in workflows -- built-in workflows ship with their own quality process and don't show staleness signals.
 
- **What this means for your team:** Your team's existing workflows will show as `possible` (no stamp) until they're run through `workflow-for-workflows.v2.json`. That's expected -- it's not an error, just a signal that they haven't been through the new quality gate. Over time, as you modernize them, they'll show `none`.
+ **What this means for your team:** Your team's existing workflows will show as `possible` (no stamp) until they're run through `wr.workflow-for-workflows.v2.json`. That's expected -- it's not an error, just a signal that they haven't been through the new quality gate. Over time, as you modernize them, they'll show `none`.
 
  ---
 
@@ -407,7 +407,7 @@ When `isComplete: true` is returned, summarize all work done across the workflow
  After creating this file, the agent becomes available via the Agent tool:
 
  ```
- Agent(subagent_type="workrail-executor", prompt="Start the bug-investigation-agentic workflow...")
+ Agent(subagent_type="workrail-executor", prompt="Start the wr.bug-investigation workflow...")
  ```
 
  ### Cursor
@@ -59,7 +59,7 @@ Must stay consistent with:
  - `src/v2/durable-core/schemas/artifacts/` (typed artifact schemas)
  - `workflows/wr.discovery.json` (Phase 7 -- if emitting artifact)
  - `workflows/wr.shaping.json` (Step 1 -- if adding file search)
- - `workflows/coding-task-workflow-agentic.json` (Phase 0.5 -- no changes expected)
+ - `workflows/wr.coding-task.json` (Phase 0.5 -- no changes expected)
 
  ---
 
@@ -73,7 +73,7 @@ WorkTrain sessions are fully isolated. Each spawned session starts from the work
 
  **1. File-based handoff (wr.shaping -> coding):**
  - `wr.shaping` Step 9 writes `.workrail/current-pitch.md` at the workspace path
- - `coding-task-workflow-agentic` Phase 0.5 actively searches for upstream docs via repo search, WebFetch, MCP integrations
+ - `wr.coding-task` Phase 0.5 actively searches for upstream docs via repo search, WebFetch, MCP integrations
  - Phase 0.5 would find `.workrail/current-pitch.md` automatically
  - **Status: effectively already works** -- no coordinator intervention needed for Shaping->Coding
 
@@ -16,9 +16,9 @@
 
  3. **Monolithic coordinator vs per-mode decomposition**: `pr-review.ts` is 1462 lines for one pipeline mode. Five modes in one file would be unmanageable. The right architecture decomposes into mode files with a thin dispatcher -- but this requires deciding the seam deliberately.
 
- 4. **`recommendedPipeline` verbatim vs advisory**: If classify-task-workflow's pipeline output is authoritative, the coordinator cannot apply static overrides. If advisory, the coordinator re-implements routing and classify-task's rules become redundant for common cases.
+ 4. **`recommendedPipeline` verbatim vs advisory**: If wr.classify-task's pipeline output is authoritative, the coordinator cannot apply static overrides. If advisory, the coordinator re-implements routing and classify-task's rules become redundant for common cases.
 
- 5. **Phase 0.5 vs coordinator routing for upstream context**: `coding-task-workflow-agentic` Phase 0.5 auto-detects `pitch.md`. The coordinator's "should I skip shaping?" routing decision partially overlaps with this detection. They must agree.
+ 5. **Phase 0.5 vs coordinator routing for upstream context**: `wr.coding-task` Phase 0.5 auto-detects `pitch.md`. The coordinator's "should I skip shaping?" routing decision partially overlaps with this detection. They must agree.
 
  ### What the codebase already solves (and how)
 
@@ -29,12 +29,12 @@
  - Escalation-first: every failure produces `escalated: true` + `escalationReason`, never silent substitution
  - TRACE log before acting on routing decision
 
- **`classify-task-workflow.json`:**
+ **`wr.classify-task.json`:**
  - Exists as of v3.40.0. Single LLM step, no tools, outputs `recommendedPipeline` as ordered workflow ID array
  - Output format: structured text block with `recommendedPipeline: ["...", "..."]` line
  - Note: `spawn_agent` does NOT return artifacts (v3.40.0 limitation #5) -- output must be read via `spawnSession` + `awaitSessions` + `getAgentResult` + note parsing
 
- **Phase 0.5 (`coding-task-workflow-agentic`):**
+ **Phase 0.5 (`wr.coding-task`):**
  - Already detects `pitch.md` and sets `solutionFixed=true`, skipping design phases
  - The coordinator's "IMPLEMENT mode" (skip discovery/shaping) and Phase 0.5 are complementary, not conflicting
 
@@ -78,7 +78,7 @@ From CLAUDE.md (stated) and pr-review.ts (practiced):
  - `src/cli-worktrain.ts` -- needs `worktrain run pipeline` subcommand wiring
  - `src/coordinators/pr-review.ts` -- must remain unchanged; new coordinator is additive
  - `src/trigger/types.ts` -- if Candidate D's `pipelineMode` field is added; otherwise unchanged
- - `workflows/classify-task-workflow.json` -- coordinator depends on its note output format; format changes break parsing
+ - `workflows/wr.classify-task.json` -- coordinator depends on its note output format; format changes break parsing
  - `src/coordinators/routing/route-task.ts` (new) -- pure routing function; all mode selection logic lives here
  - `src/coordinators/modes/*.ts` (new files) -- each mode's pipeline execution logic
  - Test suite: each mode coordinator needs its own unit tests with `CoordinatorDeps` fakes
@@ -110,8 +110,8 @@ type PipelineMode =
  **Per-mode pipelines:**
  - REVIEW_ONLY: `mr-review-workflow.agentic.v2` -> route by verdict
  - QUICK_REVIEW: same + light model config, no arch audit override
- - IMPLEMENT: `coding-task-workflow-agentic` (Phase 0.5 picks up pitch) -> PR -> review -> merge
- - FULL: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> PR -> review -> merge
+ - IMPLEMENT: `wr.coding-task` (Phase 0.5 picks up pitch) -> PR -> review -> merge
+ - FULL: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> PR -> review -> merge
 
  **Tensions resolved:** determinism, YAGNI, no LLM latency.
  **Tensions accepted:** all ambiguous tasks fall to FULL (wasteful for Medium complexity tasks that don't need full discovery).
@@ -125,14 +125,14 @@ type PipelineMode =
 
  ---
 
- ### Candidate B: classify-task-workflow as authoritative source
+ ### Candidate B: wr.classify-task as authoritative source
 
- **Summary:** Always spawn `classify-task-workflow` first, parse `recommendedPipeline` output, execute the returned workflow sequence. Pipeline modes are not named at the coordinator level.
+ **Summary:** Always spawn `wr.classify-task` first, parse `recommendedPipeline` output, execute the returned workflow sequence. Pipeline modes are not named at the coordinator level.
 
  **Architecture:**
  ```typescript
  async function routeTask(goal, workspace, deps): Promise<Result<readonly string[], string>> {
-   const handle = await deps.spawnSession('classify-task-workflow', `Classify: ${goal}`, workspace);
+   const handle = await deps.spawnSession('wr.classify-task', `Classify: ${goal}`, workspace);
    await deps.awaitSessions([handle], CLASSIFY_TIMEOUT_MS); // 3 minutes max
    const agentResult = await deps.getAgentResult(handle);
    return parseRecommendedPipeline(agentResult.recapMarkdown);
@@ -141,12 +141,12 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[
 
  `parseRecommendedPipeline` is a pure function parsing the text block (two-tier: JSON array first, regex fallback).
 
- **Fallback:** if parsing fails, default to `['wr.discovery', 'coding-task-workflow-agentic', 'mr-review-workflow.agentic.v2']`.
+ **Fallback:** if parsing fails, default to `['wr.discovery', 'wr.coding-task', 'mr-review-workflow.agentic.v2']`.
 
  **Tensions resolved:** intelligent routing for all tasks including ambiguous ones; single source of truth for pipeline selection rules.
  **Tensions accepted:** non-deterministic; 5-15 second LLM latency per dispatch; no typed `PipelineMode` discriminated union (pipeline is a string[] at coordinator level).
- **Boundary:** classify-task-workflow is the routing authority; coordinator is a runner.
- **Failure mode:** classify-task-workflow misclassifies a PR-only task and returns discovery+coding phases, wasting 30+ minutes. Recovery: add a pre-check for PR number before spawning classify-task (hybrid).
+ **Boundary:** wr.classify-task is the routing authority; coordinator is a runner.
+ **Failure mode:** wr.classify-task misclassifies a PR-only task and returns discovery+coding phases, wasting 30+ minutes. Recovery: add a pre-check for PR number before spawning classify-task (hybrid).
  **Repo pattern:** departs from determinism-over-cleverness principle. No named discriminated union.
  **Gain:** routing rules live in a workflow file -- updatable without code deployment.
  **Give up:** determinism, transparency, typed modes, dispatch speed for obvious cases.
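The design doc in this hunk describes `parseRecommendedPipeline` as a pure, two-tier parser (JSON array first, regex fallback) but does not show its body. The sketch below is one possible shape under stated assumptions, not the package's implementation: the `Result` type is inferred from the `kind`/`value`/`error` usage elsewhere in the doc, and the exact regular expressions and error strings are hypothetical.

```typescript
// Assumed Result shape, inferred from `classified.kind === 'err'` usage in the design doc.
type Result<T, E> = { kind: 'ok'; value: T } | { kind: 'err'; error: E };

function parseRecommendedPipeline(notes: string): Result<readonly string[], string> {
  // Locate the `recommendedPipeline: [...]` line in the structured text block.
  const line = /recommendedPipeline\s*:\s*(\[[^\]]*\])/.exec(notes);
  if (line === null) return { kind: 'err', error: 'no recommendedPipeline line found' };
  const raw = String(line[1]);

  // Tier 1: strict JSON array of non-empty length containing only strings.
  try {
    const parsed: unknown = JSON.parse(raw);
    if (Array.isArray(parsed) && parsed.length > 0 && parsed.every((x) => typeof x === 'string')) {
      return { kind: 'ok', value: parsed as string[] };
    }
  } catch {
    // Not valid JSON (e.g. single quotes or a trailing comma); fall through to tier 2.
  }

  // Tier 2: regex fallback that collects every quoted token inside the brackets.
  const ids: string[] = [];
  const token = /["']([^"']+)["']/g;
  let m: RegExpExecArray | null;
  while ((m = token.exec(raw)) !== null) {
    ids.push(String(m[1]));
  }
  return ids.length > 0
    ? { kind: 'ok', value: ids }
    : { kind: 'err', error: 'recommendedPipeline was empty or unparseable' };
}

const strict = parseRecommendedPipeline('recommendedPipeline: ["wr.discovery", "wr.coding-task"]');
const lenient = parseRecommendedPipeline("recommendedPipeline: ['wr.coding-task', 'mr-review-workflow.agentic.v2',]");
console.log(strict.kind, lenient.kind); // ok ok
```

A caller following the doc's stated fallback would substitute the default pipeline whenever the result has `kind: 'err'`.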
@@ -157,7 +157,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[
 
  ### Candidate C: Static-first with LLM fallback (hybrid, recommended for routing)
 
- **Summary:** Two-tier `routeTask()`: Tier 1 applies static rules (pure, covers 80% of cases); Tier 2 falls back to classify-task-workflow for ambiguous tasks and returns a `CLASSIFY_AND_RUN` mode.
+ **Summary:** Two-tier `routeTask()`: Tier 1 applies static rules (pure, covers 80% of cases); Tier 2 falls back to wr.classify-task for ambiguous tasks and returns a `CLASSIFY_AND_RUN` mode.
 
  **PipelineMode type (6 variants):**
  ```typescript
@@ -180,7 +180,7 @@ async function routeTask(
    // Tier 1: static (pure, no I/O except filesystem check for pitch.md)
    const staticMode = applyStaticRules(goal, workspace);
    if (staticMode !== null) return ok(staticMode);
-   // Tier 2: classify-task-workflow
+   // Tier 2: wr.classify-task
    const classified = await runClassification(goal, workspace, deps);
    if (classified.kind === 'err') return err(`classification failed: ${classified.error}`);
    return ok({ kind: 'CLASSIFY_AND_RUN', classifiedPipeline: classified.value, goal });
@@ -293,7 +293,7 @@ export async function runAdaptivePipeline(
 
  ### Recommendation: Candidate C (routing mechanism) + Candidate E (architecture)
 
- **Routing (C):** Two-tier static-first with LLM fallback. Static rules cover the 4 well-defined cases at zero cost and latency. `CLASSIFY_AND_RUN` handles genuinely ambiguous tasks via classify-task-workflow. This precisely mirrors the `parseFindingsFromNotes` two-tier strategy already established in `pr-review.ts`.
+ **Routing (C):** Two-tier static-first with LLM fallback. Static rules cover the 4 well-defined cases at zero cost and latency. `CLASSIFY_AND_RUN` handles genuinely ambiguous tasks via wr.classify-task. This precisely mirrors the `parseFindingsFromNotes` two-tier strategy already established in `pr-review.ts`.
 
  **Architecture (E):** Per-mode coordinator files with thin dispatcher. Each mode file follows `pr-review.ts` independently. The dispatcher's `switch(mode.kind)` is exhaustive with `assertNever`. Adding a new mode is additive.
 
@@ -301,7 +301,7 @@ export async function runAdaptivePipeline(
301
301
 
302
302
  ### Why not A (pure static)?
303
303
 
304
- Candidate A is simpler but all tasks without static signals fall to FULL (wr.discovery + wr.shaping + coding). A genuinely vague idea needs FULL. But a Medium complexity coding task with no pitch.md and no PR number -- e.g., `"refactor auth.ts to use Result types"` -- also falls to FULL, running unnecessary discovery phases. Candidate C covers this case with classify-task-workflow returning `['coding-task-workflow-agentic', 'mr-review-workflow.agentic.v2']`.
+ Candidate A is simpler but all tasks without static signals fall to FULL (wr.discovery + wr.shaping + coding). A genuinely vague idea needs FULL. But a Medium complexity coding task with no pitch.md and no PR number -- e.g., `"refactor auth.ts to use Result types"` -- also falls to FULL, running unnecessary discovery phases. Candidate C covers this case with wr.classify-task returning `['wr.coding-task', 'mr-review-workflow.agentic.v2']`.
 
  ### Why not B (pure LLM)?
 
@@ -319,7 +319,7 @@ Non-deterministic routing is unacceptable for the coordinator. A PR review task
 
  ### Pivot conditions
 
- 1. If classify-task-workflow format drifts and `parseRecommendedPipeline` fails more than 10% of the time -> pivot to pure static (Candidate A) and accept FULL as default for ambiguous tasks
+ 1. If wr.classify-task format drifts and `parseRecommendedPipeline` fails more than 10% of the time -> pivot to pure static (Candidate A) and accept FULL as default for ambiguous tasks
  2. If trigger operators need deterministic routing for automated workflows -> add `pipelineMode` to TriggerDefinition (Candidate D addition)
  3. If context-passing agent's design requires structured handoff data from routing to mode executors -> add a `contextBundle` field to mode types (implementation change, not routing design change)
 
@@ -151,6 +151,6 @@ Even though not called at MVP, having the pure function ready preserves the upgr
 
  1. **wr.discovery output standardization**: the routing design assumes wr.discovery notes are injected by the coordinator as `assembledContextSummary` for wr.shaping. But wr.discovery's `designDocPath` output location is not standardized (finding from context-passing agent's doc). The FULL mode executor must parse `lastStepNotes` from the discovery session to build the shaping context -- this is per the context-passing agent's Candidate D (coordinator-injected text). This concern is correctly owned by the context-passing design, not the routing design.
 
- 2. **classify-task-workflow format stability**: if `parseRecommendedPipeline()` is written as a pure function now, it has no tests against real classify-task output. The function should include an integration test stub that documents the expected format.
+ 2. **wr.classify-task format stability**: if `parseRecommendedPipeline()` is written as a pure function now, it has no tests against real classify-task output. The function should include an integration test stub that documents the expected format.
 
  3. **REVIEW_ONLY vs pr-review coordinator**: the existing `worktrain run pr-review` command already provides REVIEW_ONLY+QUICK_REVIEW behavior. The new `worktrain run pipeline --mode review_only` should either (a) delegate to pr-review coordinator, or (b) reimplement the same logic in `modes/review-only.ts`. Recommendation: (a) delegate -- avoid duplicating the fix-agent loop logic. Document this delegation explicitly.
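
For the format-stability concern in item 2, a pure parser could be sketched as follows. The real note format emitted by `wr.classify-task` is not shown in this diff, so the `recommendedPipeline: [...]` line assumed here is hypothetical, as are the local `Result`, `ok`, and `err` helpers (shaped after the `kind === 'err'` check visible in the routing code).

```typescript
// Local Result helpers (assumption: mirrors the ok/err shape used in the routing code).
type Result<T, E> = { kind: 'ok'; value: T } | { kind: 'err'; error: E };
const ok = <T>(value: T): Result<T, never> => ({ kind: 'ok', value });
const err = <E>(error: E): Result<never, E> => ({ kind: 'err', error });

// Pure text-block parser. ASSUMED format: a line such as
//   recommendedPipeline: ["wr.coding-task", "mr-review-workflow.agentic.v2"]
// somewhere in the step notes. The actual format may differ.
function parseRecommendedPipeline(notes: string): Result<readonly string[], string> {
  const match = /recommendedPipeline\s*[:=]\s*(\[[^\]]*\])/.exec(notes);
  if (match === null) return err('recommendedPipeline block not found');

  let parsed: unknown;
  try {
    parsed = JSON.parse(match[1] ?? '');
  } catch {
    return err('recommendedPipeline is not a valid JSON array');
  }

  const isIdList =
    Array.isArray(parsed) && parsed.length > 0 && parsed.every((id) => typeof id === 'string');
  if (!isIdList) return err('recommendedPipeline must be a non-empty array of workflow IDs');

  return ok(parsed as string[]);
}
```

Because the function takes a string and returns data, an integration test stub only needs a captured notes sample as its fixture; a format change then surfaces as an `err` value rather than a thrown exception.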
@@ -22,7 +22,7 @@
 
  **Chosen path:** `design_first`
 
- **Rationale:** The goal was stated as a solution (a coordinator with a routing/classification layer). The risk is designing the wrong routing mechanism. The landscape is well-understood from existing code (`pr-review.ts`, `classify-task-workflow.json`). The dominant risk is not lack of knowledge -- it is solving the wrong subproblem (e.g., treating all routing as LLM classification when static heuristics cover most cases, or treating one monolithic script as the right shape when decomposition into per-mode coordinators may be cleaner).
+ **Rationale:** The goal was stated as a solution (a coordinator with a routing/classification layer). The risk is designing the wrong routing mechanism. The landscape is well-understood from existing code (`pr-review.ts`, `wr.classify-task.json`). The dominant risk is not lack of knowledge -- it is solving the wrong subproblem (e.g., treating all routing as LLM classification when static heuristics cover most cases, or treating one monolithic script as the right shape when decomposition into per-mode coordinators may be cleaner).
 
  ---
 
@@ -58,12 +58,12 @@ If a chat rewind occurs: the notes and context variables survive; this file may
 
  **What exists:**
  - `src/coordinators/pr-review.ts` -- 1462-line hardcoded coordinator for PR review. Establishes the `CoordinatorDeps` injectable interface (16 methods), `spawnSession`/`awaitSessions`/`getAgentResult` pattern, fix-agent loop with escalation-first failure policy.
- - `workflows/classify-task-workflow.json` -- EXISTS as of v3.40.0 (contrary to Apr 15 backlog entry that listed it as missing). Single LLM step, no tools, outputs 7 variables including `recommendedPipeline` (ordered workflow ID array with decision rules already encoded).
+ - `workflows/wr.classify-task.json` -- EXISTS as of v3.40.0 (contrary to Apr 15 backlog entry that listed it as missing). Single LLM step, no tools, outputs 7 variables including `recommendedPipeline` (ordered workflow ID array with decision rules already encoded).
  - `src/cli-worktrain.ts` -- wires `worktrain run pr-review` subcommand. No `worktrain run pipeline` or adaptive coordinator command exists yet.
  - `src/trigger/types.ts` -- `TriggerDefinition` has `workflowId`, `goal`, `goalTemplate`, `contextMapping`, `agentConfig`. No `pipelineMode` field.
- - Three-Workflow Pipeline decision (Apr 18): `wr.discovery -> wr.shaping -> coding-task-workflow-agentic`. Phase 0.5 in coding-task detects pitch.md and sets `solutionFixed=true` to skip design phases.
+ - Three-Workflow Pipeline decision (Apr 18): `wr.discovery -> wr.shaping -> wr.coding-task`. Phase 0.5 in coding-task detects pitch.md and sets `solutionFixed=true` to skip design phases.
  - `wr.shaping` and `wr.discovery` workflows both exist as of v3.40.0.
- - `coding-task-workflow-agentic` Phase 0.5 detects upstream context (pitch.md, BRD, PRD, etc.).
+ - `wr.coding-task` Phase 0.5 detects upstream context (pitch.md, BRD, PRD, etc.).
 
  **The Apr 15 backlog full pipeline DAG** (still relevant design intent):
  ```
@@ -91,13 +91,13 @@ trigger
 
  ### Contradictions and tensions
 
- - **classify-task-workflow is listed as NOT YET BUILT in the Apr 15 backlog** but the file `workflows/classify-task-workflow.json` exists today (v3.40.0, Apr 19). This is resolved: it was built between Apr 15 and Apr 19.
+ - **wr.classify-task is listed as NOT YET BUILT in the Apr 15 backlog** but the file `workflows/wr.classify-task.json` exists today (v3.40.0, Apr 19). This is resolved: it was built between Apr 15 and Apr 19.
  - **"Always run classify-task first"** (Apr 15 backlog) vs. **"Static heuristics for well-known cases"** (primary uncertainty). The Apr 15 backlog says "always" but this was written before Phase 0.5 upstream context detection was built. With Phase 0.5, many routing decisions can be made statically.
  - **`recommendedPipeline` from classify-task** includes `wr.discovery` for Medium/Large tasks, but the Three-Workflow Pipeline decision treats `wr.discovery` as optional. The coordinator must decide: use classify-task's `recommendedPipeline` verbatim, or treat it as a hint that can be overridden by static signals (e.g., pitch.md already present = skip discovery even if classify says Medium)?
 
  ### Evidence gaps
 
- 1. Does `spawn_agent` (the in-workflow tool) return the `recommendedPipeline` output variable from `classify-task-workflow`? The backlog note says `spawn_agent` currently does NOT return `artifacts` (limitation #5 in v3.40.0 current state). This means the coordinator script cannot use `spawn_agent` to run classify-task and read output -- it must use `spawnSession` + `getAgentResult` + parse the notes, just as `pr-review.ts` does for verdict artifacts.
+ 1. Does `spawn_agent` (the in-workflow tool) return the `recommendedPipeline` output variable from `wr.classify-task`? The backlog note says `spawn_agent` currently does NOT return `artifacts` (limitation #5 in v3.40.0 current state). This means the coordinator script cannot use `spawn_agent` to run classify-task and read output -- it must use `spawnSession` + `getAgentResult` + parse the notes, just as `pr-review.ts` does for verdict artifacts.
  2. No existing test harness for a multi-mode coordinator. `pr-review.ts` tests exist but only cover the review pipeline.
  3. The `worktrain-spawn.ts` CLI wiring for `spawnSession` is the only proven path to dispatch sessions from a coordinator script. No other dispatch mechanism has been tested.
 
@@ -122,7 +122,7 @@ trigger
 
  3. **Single coordinator file vs per-mode decomposition**: `pr-review.ts` is 1462 lines for one mode. A monolithic adaptive coordinator handling all modes risks becoming unmaintainable. Per-mode coordinator functions (each independently testable) with a thin routing dispatcher is a cleaner architecture -- but introduces coordination between files.
 
- 4. **`recommendedPipeline` verbatim vs as a hint**: classify-task-workflow encodes pipeline selection rules. If the coordinator uses these verbatim, it cannot apply static overrides (e.g., pitch.md present -> skip discovery). If it treats them as hints, it re-implements routing logic and classify-task's rules become advisory only.
+ 4. **`recommendedPipeline` verbatim vs as a hint**: wr.classify-task encodes pipeline selection rules. If the coordinator uses these verbatim, it cannot apply static overrides (e.g., pitch.md present -> skip discovery). If it treats them as hints, it re-implements routing logic and classify-task's rules become advisory only.
 
  5. **Phase 0.5 vs coordinator routing for upstream context**: coding-task already auto-detects pitch.md. So the coordinator's routing decision for "skip wr.shaping?" partially duplicates Phase 0.5's detection. The coordinator should route based on what phases to _spawn_, not what the coding workflow will internally skip -- but these can diverge (coordinator spawns shaping but coding-task's Phase 0.5 would have skipped it anyway).
 
@@ -130,8 +130,8 @@ trigger
 
  - [ ] A `worktrain run pipeline --task "fix the race condition in auth.ts"` command routes to the correct pipeline mode and logs the routing decision before spawning any sessions
  - [ ] A task with `#123` or `PR #123` in the goal routes to REVIEW_ONLY without spawning discovery or shaping sessions
- - [ ] A task with `pitch.md` present in the workspace routes to IMPLEMENT (coding-task-workflow-agentic only)
- - [ ] An ambiguous task (no static signal) routes to classify-task-workflow session, parses `recommendedPipeline`, and executes that pipeline
+ - [ ] A task with `pitch.md` present in the workspace routes to IMPLEMENT (wr.coding-task only)
+ - [ ] An ambiguous task (no static signal) routes to wr.classify-task session, parses `recommendedPipeline`, and executes that pipeline
  - [ ] A `dep bump` or `chore:` task routes to QUICK_REVIEW (mr-review only, no arch audit) based on goal text heuristics
  - [ ] Any phase failure produces a `PipelineOutcome` with `escalated: true` and a structured `escalationReason` -- no silent substitution
  - [ ] The `CoordinatorDeps` interface for the adaptive coordinator extends or reuses the existing `CoordinatorDeps` pattern from `pr-review.ts`
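
The static-routing criteria above can be sketched as a pure function. This is an illustration under stated assumptions: the regular expressions, the `pitchExists` parameter (standing in for a workspace file check), and the `RoutingDecision` shape are hypothetical, and the rule order follows the "PR number, dep-bump, pitch.md, else FULL" ordering given elsewhere in the design doc.

```typescript
type StaticMode = 'REVIEW_ONLY' | 'QUICK_REVIEW' | 'IMPLEMENT' | 'FULL';

// Hypothetical decision record: the reason string is what would be logged before spawning.
interface RoutingDecision {
  readonly mode: StaticMode;
  readonly reason: string;
}

// Assumed heuristics: "#123" or "PR #123" in the goal; "dep bump" or a "chore:" prefix.
const PR_NUMBER = /(?:^|\s)#\d+\b/;
const QUICK_REVIEW_HINT = /\bdep(?:endency)? bump\b|^chore:/i;

// Pure static router: the first matching rule wins, and no rule ever calls an LLM.
function routeTaskStatic(goal: string, pitchExists: boolean): RoutingDecision {
  if (PR_NUMBER.test(goal)) {
    return { mode: 'REVIEW_ONLY', reason: 'goal references a PR number' };
  }
  if (QUICK_REVIEW_HINT.test(goal)) {
    return { mode: 'QUICK_REVIEW', reason: 'goal looks like a dep bump or chore' };
  }
  if (pitchExists) {
    return { mode: 'IMPLEMENT', reason: 'pitch.md present in workspace' };
  }
  return { mode: 'FULL', reason: 'no static signal matched' };
}

// The decision is serialized for the log before any session is spawned.
const decision = routeTaskStatic('fix the race condition in auth.ts', false);
console.log(JSON.stringify(decision));
```

Passing `pitchExists` in as a boolean keeps the router free of filesystem access, so each acceptance criterion maps to a single deterministic test case.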
@@ -139,8 +139,8 @@ trigger
 
  ### Assumptions not yet verified
 
- 1. `classify-task-workflow` can be invoked via `spawnSession` + `awaitSessions` + `getAgentResult` with note parsing (same as pr-review reads verdict artifacts) -- this is assumed based on the spawn_agent artifact limitation
- 2. The `recommendedPipeline` text can be reliably parsed from classify-task-workflow's note output using a regex or structured block parser
+ 1. `wr.classify-task` can be invoked via `spawnSession` + `awaitSessions` + `getAgentResult` with note parsing (same as pr-review reads verdict artifacts) -- this is assumed based on the spawn_agent artifact limitation
+ 2. The `recommendedPipeline` text can be reliably parsed from wr.classify-task's note output using a regex or structured block parser
  3. A new CLI subcommand `worktrain run pipeline` can be added following the same pattern as `worktrain run pr-review` in `src/cli-worktrain.ts`
  4. Pipeline modes can be named and bounded at design time (not open-ended)
 
@@ -151,17 +151,17 @@ trigger
  ### HMW (How Might We) reframes
 
  - HMW make the pipeline mode explicit in the trigger config so routing is never ambiguous, while still supporting dynamic routing for ad-hoc CLI invocations?
- - HMW use classify-task-workflow's `recommendedPipeline` as the default while allowing static overrides to be applied on top, treating classification as advisory rather than authoritative?
+ - HMW use wr.classify-task's `recommendedPipeline` as the default while allowing static overrides to be applied on top, treating classification as advisory rather than authoritative?
 
  ### Primary uncertainty (updated)
 
- Can classify-task-workflow's `recommendedPipeline` output be used as the canonical routing source, with static overrides applied on top for well-known signal patterns (PR number, pitch.md, dep-bump keywords) -- rather than choosing between LLM and heuristics as mutually exclusive?
+ Can wr.classify-task's `recommendedPipeline` output be used as the canonical routing source, with static overrides applied on top for well-known signal patterns (PR number, pitch.md, dep-bump keywords) -- rather than choosing between LLM and heuristics as mutually exclusive?
 
  ### Known approaches
 
- 1. **classify-task-workflow first** -- always spawn a classification session, parse `recommendedPipeline`, then execute the pipeline. LLM-accurate, adds latency and cost per dispatch.
+ 1. **wr.classify-task first** -- always spawn a classification session, parse `recommendedPipeline`, then execute the pipeline. LLM-accurate, adds latency and cost per dispatch.
  2. **Static heuristics** -- parse goal text and trigger metadata (PR number present, labels, pitch.md present, explicit pipelineMode flag on trigger). Zero LLM cost, covers well-defined cases.
- 3. **Hybrid** -- static heuristics handle high-confidence cases; LLM classification handles ambiguous tasks. `classify-task-workflow` is an optional fast path, not always required.
+ 3. **Hybrid** -- static heuristics handle high-confidence cases; LLM classification handles ambiguous tasks. `wr.classify-task` is an optional fast path, not always required.
  4. **Explicit `pipelineMode` on trigger** -- add a `pipelineMode` field to `TriggerDefinition` (or as a context variable). Users/triggers declare mode explicitly. Removes ambiguity but requires configuration overhead.
  5. **classify-task advisory + static overrides** -- run classify-task first (small cost, accurate), then apply static override rules on top of `recommendedPipeline` to handle well-known signals. Classify sets the baseline; static rules correct known exceptions.
 
@@ -221,8 +221,8 @@ function routeTask(goal: string, workspace: string): PipelineMode
  **Per-mode pipeline sequences:**
  - `REVIEW_ONLY`: `mr-review-workflow.agentic.v2` -> route by verdict (clean: merge, minor: fix-agent-loop, blocking: escalate)
  - `QUICK_REVIEW`: same as REVIEW_ONLY but `agentConfig: { model: 'haiku-light' }`, no arch audit even if touched
- - `IMPLEMENT`: `coding-task-workflow-agentic` (Phase 0.5 finds pitch.md) -> `mr-review-workflow.agentic.v2` -> merge
- - `FULL`: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> PR -> `mr-review-workflow.agentic.v2` -> merge
+ - `IMPLEMENT`: `wr.coding-task` (Phase 0.5 finds pitch.md) -> `mr-review-workflow.agentic.v2` -> merge
+ - `FULL`: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> PR -> `mr-review-workflow.agentic.v2` -> merge
 
  **Failure handling:** each phase failure returns a `PipelineOutcome` with `escalated: true` and `escalationReason`. No fallback to simpler pipeline. Same pattern as `PrOutcome` in pr-review.ts.
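
The escalation-first policy in the failure-handling paragraph could look roughly like the sketch below. Only `escalated` and `escalationReason` are named in the design doc; the `completedPhases` field, the `PhaseRunner` type, and `runPhases` itself are hypothetical stand-ins for the real spawn/await plumbing.

```typescript
// Outcome record: `escalated` and `escalationReason` come from the design doc,
// `completedPhases` is an assumption added for traceability.
interface PipelineOutcome {
  readonly escalated: boolean;
  readonly escalationReason?: string;
  readonly completedPhases: readonly string[];
}

// Hypothetical injected runner: spawns one workflow session and reports success or failure as data.
type PhaseResult = { kind: 'ok' } | { kind: 'err'; error: string };
type PhaseRunner = (workflowId: string) => Promise<PhaseResult>;

// Runs phases in order and stops at the first failure.
// There is deliberately no fallback to a simpler pipeline: a failure escalates.
async function runPhases(
  phases: readonly string[],
  runPhase: PhaseRunner,
): Promise<PipelineOutcome> {
  const completed: string[] = [];
  for (const workflowId of phases) {
    const result = await runPhase(workflowId);
    if (result.kind === 'err') {
      return {
        escalated: true,
        escalationReason: `${workflowId} failed: ${result.error}`,
        completedPhases: completed,
      };
    }
    completed.push(workflowId);
  }
  return { escalated: false, completedPhases: completed };
}
```

Returning the failure as a value keeps each phase an independent escalation point, and the caller can log `escalationReason` without catching exceptions.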
 
@@ -238,14 +238,14 @@ function routeTask(goal: string, workspace: string): PipelineMode
 
  ---
 
- ### Candidate B: classify-task-workflow as authoritative source (pure LLM routing)
+ ### Candidate B: wr.classify-task as authoritative source (pure LLM routing)
 
- **One-sentence summary:** The coordinator always spawns a `classify-task-workflow` session first, parses the `recommendedPipeline` output from step notes, and executes the pipeline that workflow specifies -- the coordinator script is a runner for whatever classify-task returns.
+ **One-sentence summary:** The coordinator always spawns a `wr.classify-task` session first, parses the `recommendedPipeline` output from step notes, and executes the pipeline that workflow specifies -- the coordinator script is a runner for whatever classify-task returns.
 
  **Architecture:**
  ```typescript
  async function routeTask(goal, workspace, deps): Promise<Result<readonly string[], string>> {
- const handle = await deps.spawnSession('classify-task-workflow', goal, workspace);
+ const handle = await deps.spawnSession('wr.classify-task', goal, workspace);
  const result = await deps.awaitSessions([handle], CLASSIFY_TIMEOUT_MS);
  const notes = await deps.getAgentResult(handle);
  return parseRecommendedPipeline(notes.recapMarkdown); // pure function, text block parser
@@ -257,15 +257,15 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[
 
  **Pipeline modes:** not named at the coordinator level -- the pipeline IS whatever classify-task returns. The coordinator just runs the sequence.
 
- **Failure handling:** if `parseRecommendedPipeline` fails (LLM deviated from format), default to `['wr.discovery', 'coding-task-workflow-agentic', 'mr-review-workflow.agentic.v2']`. Any spawned phase failure escalates with structured reason.
+ **Failure handling:** if `parseRecommendedPipeline` fails (LLM deviated from format), default to `['wr.discovery', 'wr.coding-task', 'mr-review-workflow.agentic.v2']`. Any spawned phase failure escalates with structured reason.
 
  **Tensions resolved:** intelligent routing for ambiguous tasks; single source of truth for pipeline selection rules (the workflow, not the coordinator).
  **Tensions accepted:** non-deterministic (same task may classify differently); adds 5-15 second LLM latency per dispatch; `recommendedPipeline` is a string array of workflow IDs, not a typed discriminated union.
  **Failure mode to watch:** coordinator runs `wr.discovery` unnecessarily for PR-only tasks if classify-task misclassifies them. Recovery: add static pre-check before spawning classify-task.
- **Follows:** classify-task-workflow's existing decision rules are already correct; this candidate delegates trust to them.
+ **Follows:** wr.classify-task's existing decision rules are already correct; this candidate delegates trust to them.
  **Gain:** routing rules live in the workflow, not the coordinator -- can be updated without code changes.
  **Give up:** determinism, routing transparency (routing reason requires parsing LLM output), typed pipeline modes.
- **Impact surface:** classify-task-workflow becomes a critical dependency -- format changes break coordinator.
+ **Impact surface:** wr.classify-task becomes a critical dependency -- format changes break coordinator.
  **Scope judgment:** Best-fit for teams that want routing rules to evolve without code deployment.
  **Philosophy:** Honors dependency injection (classify-task as a boundary). Conflicts with determinism-over-cleverness (LLM routing is clever but non-deterministic).
 
@@ -273,7 +273,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[
 
  ### Candidate C: static-first with LLM fallback (hybrid, recommended)
 
- **One-sentence summary:** A two-tier `routeTask()` applies static rules first (fast, deterministic, covers 80% of cases), then falls back to classify-task-workflow only for ambiguous tasks where no static signal fires.
+ **One-sentence summary:** A two-tier `routeTask()` applies static rules first (fast, deterministic, covers 80% of cases), then falls back to wr.classify-task only for ambiguous tasks where no static signal fires.
 
  **Architecture:**
  ```typescript
@@ -303,7 +303,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<PipelineMode, st
  - `REVIEW_ONLY`: same as Candidate A
  - `QUICK_REVIEW`: same as Candidate A
  - `IMPLEMENT`: same as Candidate A
- - `FULL`: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> PR -> review -> merge
+ - `FULL`: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> PR -> review -> merge
  - `CLASSIFY_AND_RUN`: execute phases from classify-task output in order; unknown workflow IDs escalate
 
  **Failure handling:** escalation-first, same as pr-review.ts. The routing failure (classify-task parse failure) produces ESCALATE mode with reason.
@@ -314,7 +314,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<PipelineMode, st
  **Follows:** parseFindingsFromNotes two-tier strategy pattern. CoordinatorDeps injection for the LLM fallback path.
  **Gain:** fast for common cases, intelligent for ambiguous cases, deterministic for all named modes.
  **Give up:** complexity of two tiers; CLASSIFY_AND_RUN mode is not a named type with typed data.
- **Impact surface:** same as Candidate A plus classify-task-workflow dependency.
+ **Impact surface:** same as Candidate A plus wr.classify-task dependency.
  **Scope judgment:** Best-fit -- covers all named use cases efficiently. YAGNI risk is low because the LLM fallback adds ~30 lines of code, not a new architecture.
  **Philosophy:** Honors immutability, exhaustiveness (switch on PipelineMode is exhaustive), determinism-over-cleverness (static tier is deterministic, LLM is bounded fallback), errors-as-data.
 
@@ -421,7 +421,7 @@ Each mode coordinator is ~300-600 lines, fully independently testable. No mode-s
 
  ### Recommendation: C + E (Candidate C routing mechanism, Candidate E file architecture)
 
- **The routing mechanism decision (C):** Two-tier routing is the best-fit. Static rules cover the 4 well-defined cases (PR number, dep-bump, pitch.md, vague idea) without LLM cost. `CLASSIFY_AND_RUN` as the 5th mode handles genuinely ambiguous tasks via classify-task-workflow. This follows the `parseFindingsFromNotes` precedent in pr-review.ts (two-tier: structured first, fallback second).
+ **The routing mechanism decision (C):** Two-tier routing is the best-fit. Static rules cover the 4 well-defined cases (PR number, dep-bump, pitch.md, vague idea) without LLM cost. `CLASSIFY_AND_RUN` as the 5th mode handles genuinely ambiguous tasks via wr.classify-task. This follows the `parseFindingsFromNotes` precedent in pr-review.ts (two-tier: structured first, fallback second).
 
  **The architecture decision (E):** Per-mode coordinator files with a thin dispatcher is the correct architecture for 5 modes. Each mode file follows pr-review.ts independently. The dispatcher is the only code that changes when a new mode is added. This is how the codebase is already structured (pr-review.ts is one mode file) -- Candidate E just makes the pattern explicit.
 
@@ -447,7 +447,7 @@ Candidate D (pipelineMode in TriggerDefinition) would be justified if trigger op
 
  ### Pivot conditions
 
- - If `classify-task-workflow` note parsing proves unreliable (format drift), pivot to pure static (Candidate A) and accept that ambiguous tasks run FULL
+ - If `wr.classify-task` note parsing proves unreliable (format drift), pivot to pure static (Candidate A) and accept that ambiguous tasks run FULL
  - If `TriggerDefinition` change is needed for automated workflows, add Candidate D's pipelineMode field
  - If context-passing agent's design shows that the coordinator must inject structured context at spawn time, the mode coordinator files must include context injection logic -- this is implementation detail, not a routing design change
 
@@ -466,7 +466,7 @@ Candidate D (pipelineMode in TriggerDefinition) would be justified if trigger op
  1. **CLASSIFY_AND_RUN seam crack (genuine weakness, not blocking):** C's CLASSIFY_AND_RUN mode creates a typed/untyped seam in the dispatcher. Mitigation: CLASSIFY_AND_RUN fires only for tasks with no static signal; the dispatcher handles it with a dedicated `runClassifyAndRunPipeline` function that is documented as the "catch-all" path. Alternatively: fold CLASSIFY_AND_RUN into FULL (just run the three-workflow pipeline for all ambiguous tasks) and remove the LLM fallback entirely. This would make C = A for ambiguous tasks, simplifying the design.
  - **Final decision: simplify C by removing CLASSIFY_AND_RUN. Ambiguous tasks (no static signal) default to FULL. This gives Candidate A's simplicity with Candidate C's structure.**
 
- 2. **A is sufficient for MVP:** Challenge confirmed that Candidate A covers all 5 stated use cases. C adds value for future Medium tasks. For an MVP, A is correct. The recommended design IS essentially Candidate A + Candidate E architecture. No classify-task-workflow dependency at all for the initial implementation.
+ 2. **A is sufficient for MVP:** Challenge confirmed that Candidate A covers all 5 stated use cases. C adds value for future Medium tasks. For an MVP, A is correct. The recommended design IS essentially Candidate A + Candidate E architecture. No wr.classify-task dependency at all for the initial implementation.
 
  ### Final simplified design (A + E, not C + E)
 
@@ -489,7 +489,7 @@ Static rules (prioritized):
  3. `.workrail/current-pitch.md` exists -> `IMPLEMENT`
  4. else -> `FULL`
 
- **Why remove CLASSIFY_AND_RUN:** classify-task-workflow adds latency, non-determinism, and format-parsing fragility for no concrete benefit over FULL for the stated use cases. The "YAGNI with discipline" principle wins. If Medium tasks turn out to be wasteful with FULL, add classify-task as a future enhancement with a typed artifact (not text parsing).
+ **Why remove CLASSIFY_AND_RUN:** wr.classify-task adds latency, non-determinism, and format-parsing fragility for no concrete benefit over FULL for the stated use cases. The "YAGNI with discipline" principle wins. If Medium tasks turn out to be wasteful with FULL, add classify-task as a future enhancement with a typed artifact (not text parsing).
 
  **Architecture (E as designed):**
  ```
@@ -549,7 +549,7 @@ src/coordinators/
 
  1. **Routing determines spawn order, not context shape.** The routing layer (`routeTask()`) produces a `PipelineMode` variant. It does NOT know what context to pass to each spawned session. Context injection is entirely the responsibility of each mode coordinator (full-pipeline.ts, implement.ts, etc.), not the routing layer.
 
- 2. **FULL pipeline phase order is: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> review -> merge.** If the context-passing agent's design changes this order (e.g., by making shaping optional based on discovery findings), the `runFullPipeline()` function must be updated accordingly. The routing layer itself does not need to change.
+ 2. **FULL pipeline phase order is: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> review -> merge.** If the context-passing agent's design changes this order (e.g., by making shaping optional based on discovery findings), the `runFullPipeline()` function must be updated accordingly. The routing layer itself does not need to change.
 
  3. **pitch.md is the canonical Shaping->Coding handoff.** The `IMPLEMENT` mode routes directly to coding because `current-pitch.md` already exists. The coding-task Phase 0.5 detects it and uses it. If the context-passing agent introduces a different handoff mechanism (e.g., coordinator-injected context instead of a file), the `IMPLEMENT` mode coordinator needs to inject that context at spawn time rather than relying on Phase 0.5 file detection.
 
@@ -582,8 +582,8 @@ The adaptive coordinator uses **pure static routing with per-mode file decomposi
  |------|---------------|
  | `REVIEW_ONLY` | `mr-review-workflow.agentic.v2` → verdict routing (clean: merge, minor: fix-loop, blocking: escalate) |
  | `QUICK_REVIEW` | same as REVIEW_ONLY with lighter model config |
- | `IMPLEMENT` | `coding-task-workflow-agentic` (Phase 0.5 reads pitch.md) → PR → `mr-review-workflow.agentic.v2` → merge |
- | `FULL` | `wr.discovery` → `wr.shaping` → `coding-task-workflow-agentic` → PR → `mr-review-workflow.agentic.v2` → merge |
+ | `IMPLEMENT` | `wr.coding-task` (Phase 0.5 reads pitch.md) → PR → `mr-review-workflow.agentic.v2` → merge |
+ | `FULL` | `wr.discovery` → `wr.shaping` → `wr.coding-task` → PR → `mr-review-workflow.agentic.v2` → merge |
 
  **File architecture (Candidate E):**
  ```
@@ -633,7 +633,7 @@ const COORDINATOR_MAX_MS = 120 * 60 * 1000; // 120 min total coordinator wa
  - Routing decision is logged as traceability JSON before any session spawn
  - FULL pipeline: each phase is an independent escalation point (discovery-fail, shaping-fail, coding-fail each escalate independently)
 
- **Why LLM classification (classify-task-workflow) was excluded:**
+ **Why LLM classification (wr.classify-task) was excluded:**
 
  After adversarial challenge, CLASSIFY_AND_RUN mode was removed. The LLM classification path adds non-determinism and format-parsing fragility (notes parsing vs typed artifact) for no concrete MVP benefit. All 5 stated use cases are covered by static rules. The upgrade path to add classify-task as a Tier 2 fallback exists when evidence shows >5% misrouting in production.
 
@@ -46,7 +46,7 @@ WorkRail defines three distinct tiers of execution. The system automatically sel
  How does WorkRail know which tier to use? It uses a **"Verify then Delegate"** pattern (The Probe Protocol).
 
  ### 1. The Boot Check (Diagnostic Phase)
- When a session starts (or via the `workflow-diagnose-environment` workflow), WorkRail guides the Main Agent to probe the environment:
+ When a session starts (or via the `wr.diagnose-environment` workflow), WorkRail guides the Main Agent to probe the environment:
 
  1. **Check for Subagents:** "Do you have a 'Researcher' subagent?"
  * *No:* **Fallback to Tier 1 (Solo).**
@@ -74,7 +74,7 @@ When executing a workflow step that calls for a specialized routine:
 
  To support this protocol, WorkRail provides:
 
- 1. **The Diagnostic Workflow:** A guided utility (`workflow-diagnose-environment.json`) to help users verify and configure their agents.
+ 1. **The Diagnostic Workflow:** A guided utility (`wr.diagnose-environment.json`) to help users verify and configure their agents.
  2. **The Asset Pack:** Standardized definitions for common roles (Researcher, Architect, Builder, Reviewer) that users can copy-paste into their IDE configs.
  * Includes System Prompts (for Tiers 1-3).
  * Includes Tool Whitelists (for enabling Tier 3).