npm - @exaudeus/workrail - Versions diffs - 3.66.0 → 3.68.0 - Mend

@exaudeus/workrail 3.66.0 → 3.68.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (150)

package/dist/application/services/compiler/template-registry.js +10 -1
package/dist/application/validation.js +1 -1
package/dist/cli/commands/worktrain-init.js +1 -1
package/dist/console/standalone-console.js +4 -1
package/dist/console-ui/assets/{index-BynU38Vu.js → index-CyzltI6D.js} +1 -1
package/dist/console-ui/index.html +1 -1
package/dist/coordinators/modes/full-pipeline.js +4 -4
package/dist/coordinators/modes/implement-shared.js +5 -5
package/dist/coordinators/modes/implement.js +4 -4
package/dist/coordinators/pr-review.js +4 -4
package/dist/daemon/workflow-runner.d.ts +1 -0
package/dist/daemon/workflow-runner.js +1 -0
package/dist/infrastructure/storage/schema-validating-workflow-storage.d.ts +21 -2
package/dist/infrastructure/storage/schema-validating-workflow-storage.js +48 -0
package/dist/manifest.json +41 -41
package/dist/mcp/handlers/v2-workflow.js +24 -7
package/dist/mcp/output-schemas.d.ts +36 -0
package/dist/mcp/output-schemas.js +11 -1
package/dist/mcp/workflow-protocol-contracts.js +2 -2
package/dist/v2/projections/session-metrics.d.ts +1 -1
package/dist/v2/projections/session-metrics.js +16 -35
package/dist/v2/usecases/console-routes.d.ts +2 -2
package/docs/authoring-v2.md +4 -4
package/docs/changelog-recent.md +3 -3
package/docs/configuration.md +1 -1
package/docs/design/adaptive-coordinator-context-candidates.md +1 -1
package/docs/design/adaptive-coordinator-context.md +1 -1
package/docs/design/adaptive-coordinator-routing-candidates.md +18 -18
package/docs/design/adaptive-coordinator-routing-review.md +1 -1
package/docs/design/adaptive-coordinator-routing.md +34 -34
package/docs/design/agent-cascade-protocol.md +2 -2
package/docs/design/console-daemon-separation-discovery.md +323 -0
package/docs/design/context-assembly-design-candidates.md +1 -1
package/docs/design/context-assembly-implementation-plan.md +1 -1
package/docs/design/context-assembly-layer.md +2 -2
package/docs/design/context-assembly-review-findings.md +1 -1
package/docs/design/coordinator-access-audit.md +293 -0
package/docs/design/coordinator-architecture-audit.md +62 -0
package/docs/design/coordinator-error-handling-audit.md +240 -0
package/docs/design/coordinator-testability-audit.md +426 -0
package/docs/design/daemon-architecture-discovery.md +1 -1
package/docs/design/daemon-console-separation-discovery.md +242 -0
package/docs/design/daemon-memory-audit.md +203 -0
package/docs/design/design-candidates-console-daemon-separation.md +256 -0
package/docs/design/design-candidates-discovery-loop-fix.md +141 -0
package/docs/design/design-review-findings-console-daemon-separation.md +106 -0
package/docs/design/design-review-findings-discovery-loop-fix.md +81 -0
package/docs/design/discovery-loop-fix-candidates.md +161 -0
package/docs/design/discovery-loop-fix-design-review.md +106 -0
package/docs/design/discovery-loop-fix-validation.md +258 -0
package/docs/design/discovery-loop-investigation-A.md +188 -0
package/docs/design/discovery-loop-investigation-B.md +287 -0
package/docs/design/exploration-workflow-candidates.md +205 -0
package/docs/design/exploration-workflow-design-review.md +166 -0
package/docs/design/exploration-workflow-discovery.md +443 -0
package/docs/design/ide-context-files-candidates.md +231 -0
package/docs/design/ide-context-files-design-review.md +85 -0
package/docs/design/ide-context-files.md +615 -0
package/docs/design/implementation-plan-discovery-loop-fix.md +199 -0
package/docs/design/implementation-plan-queue-poll-rotation.md +102 -0
package/docs/design/in-process-http-audit.md +190 -0
package/docs/design/layer3b-ghost-nodes-design-candidates.md +2 -2
package/docs/design/loadSessionNotes-candidates.md +108 -0
package/docs/design/loadSessionNotes-test-coverage-discovery.md +297 -0
package/docs/design/loadSessionNotes-test-coverage-session4.md +209 -0
package/docs/design/loadSessionNotes-test-coverage-v3.md +321 -0
package/docs/design/probe-session-design-candidates.md +261 -0
package/docs/design/probe-session-phase0.md +490 -0
package/docs/design/routines-guide.md +7 -7
package/docs/design/session-metrics-attribution-candidates.md +250 -0
package/docs/design/session-metrics-attribution-design-review.md +115 -0
package/docs/design/session-metrics-attribution-discovery.md +319 -0
package/docs/design/session-metrics-candidates.md +227 -0
package/docs/design/session-metrics-design-review.md +104 -0
package/docs/design/session-metrics-discovery.md +454 -0
package/docs/design/spawn-session-debug.md +202 -0
package/docs/design/trigger-validator-candidates.md +214 -0
package/docs/design/trigger-validator-review.md +109 -0
package/docs/design/trigger-validator-shaping-phase0.md +239 -0
package/docs/design/trigger-validator.md +454 -0
package/docs/design/v2-core-design-locks.md +2 -2
package/docs/design/workflow-extension-points.md +15 -15
package/docs/design/workflow-id-validation-at-startup.md +1 -1
package/docs/design/workflow-id-validation-implementation-plan.md +2 -2
package/docs/design/workflow-trigger-lifecycle-audit.md +175 -0
package/docs/design/worktrain-task-queue-candidates.md +5 -5
package/docs/design/worktrain-task-queue.md +4 -4
package/docs/discovery/coordinator-script-design.md +1 -1
package/docs/discovery/coordinator-ux-discovery.md +3 -3
package/docs/discovery/simulation-report.md +1 -1
package/docs/discovery/workflow-modernization-discovery.md +326 -0
package/docs/discovery/workflow-selection-for-discovery-tasks.md +33 -33
package/docs/discovery/worktrain-status-briefing.md +1 -1
package/docs/discovery/wr-discovery-goal-reframing.md +1 -1
package/docs/docker.md +1 -1
package/docs/ideas/backlog.md +227 -0
package/docs/ideas/third-party-workflow-setup-design-thinking.md +1 -1
package/docs/integrations/claude-code.md +5 -5
package/docs/integrations/firebender.md +1 -1
package/docs/plans/agentic-orchestration-roadmap.md +2 -2
package/docs/plans/mr-review-workflow-redesign.md +9 -9
package/docs/plans/ui-ux-workflow-design-candidates.md +4 -4
package/docs/plans/ui-ux-workflow-discovery.md +2 -2
package/docs/plans/workflow-categories-candidates.md +8 -8
package/docs/plans/workflow-categories-discovery.md +4 -4
package/docs/plans/workflow-modernization-design.md +430 -0
package/docs/plans/workflow-staleness-detection-candidates.md +11 -11
package/docs/plans/workflow-staleness-detection-review.md +4 -4
package/docs/plans/workflow-staleness-detection.md +9 -9
package/docs/plans/workrail-platform-vision.md +3 -3
package/docs/reference/agent-context-cleaner-snippet.md +1 -1
package/docs/reference/agent-context-guidance.md +4 -4
package/docs/reference/context-optimization.md +2 -2
package/docs/roadmap/now-next-later.md +2 -2
package/docs/roadmap/open-work-inventory.md +16 -16
package/docs/workflows.md +31 -31
package/package.json +1 -1
package/spec/workflow-tags.json +47 -47
package/workflows/adaptive-ticket-creation.json +16 -16
package/workflows/architecture-scalability-audit.json +22 -22
package/workflows/bug-investigation.agentic.v2.json +3 -3
package/workflows/classify-task-workflow.json +1 -1
package/workflows/coding-task-workflow-agentic.json +6 -6
package/workflows/cross-platform-code-conversion.v2.json +8 -8
package/workflows/document-creation-workflow.json +8 -8
package/workflows/documentation-update-workflow.json +8 -8
package/workflows/intelligent-test-case-generation.json +2 -2
package/workflows/learner-centered-course-workflow.json +2 -2
package/workflows/mr-review-workflow.agentic.v2.json +4 -4
package/workflows/personal-learning-materials-creation-branched.json +8 -8
package/workflows/presentation-creation.json +5 -5
package/workflows/production-readiness-audit.json +1 -1
package/workflows/relocation-workflow-us.json +31 -31
package/workflows/routines/context-gathering.json +1 -1
package/workflows/routines/design-review.json +1 -1
package/workflows/routines/execution-simulation.json +1 -1
package/workflows/routines/feature-implementation.json +3 -3
package/workflows/routines/final-verification.json +1 -1
package/workflows/routines/hypothesis-challenge.json +1 -1
package/workflows/routines/ideation.json +1 -1
package/workflows/routines/parallel-work-partitioning.json +3 -3
package/workflows/routines/philosophy-alignment.json +2 -2
package/workflows/routines/plan-analysis.json +1 -1
package/workflows/routines/plan-generation.json +1 -1
package/workflows/routines/tension-driven-design.json +6 -6
package/workflows/scoped-documentation-workflow.json +26 -26
package/workflows/ui-ux-design-workflow.json +14 -14
package/workflows/workflow-diagnose-environment.json +1 -1
package/workflows/workflow-for-workflows.json +32 -77
package/workflows/workflow-for-workflows.v2.json +0 -788

package/dist/v2/projections/session-metrics.d.ts CHANGED

@@ -4,7 +4,7 @@ export interface SessionMetricsV2 {
     readonly endGitSha: string | null;
     readonly gitBranch: string | null;
     readonly agentCommitShas: readonly string[];
-    readonly captureConfidence: 'high' | 'medium' | 'none';
+    readonly captureConfidence: 'high' | 'none';
     readonly durationMs: number | undefined;
     readonly outcome: 'success' | 'partial' | 'abandoned' | 'error' | null;
     readonly prNumbers: readonly number[];

package/dist/v2/projections/session-metrics.js CHANGED

@@ -3,24 +3,22 @@ Object.defineProperty(exports, "__esModule", { value: true });
 exports.projectSessionMetricsV2 = projectSessionMetricsV2;
 const constants_js_1 = require("../durable-core/constants.js");
 function projectSessionMetricsV2(events) {
-    let runCompletedData = null;
-    let runCompletedRunId = null;
+    let runCompleted = null;
     for (const e of events) {
-        const asUnknown = e;
-        if (asUnknown.kind === 'run_completed') {
-            runCompletedData = asUnknown.data;
-            runCompletedRunId = asUnknown.scope?.runId ?? null;
+        if (e.kind === 'run_completed') {
+            runCompleted = e;
             break;
         }
     }
-    if (runCompletedData === null) {
+    if (runCompleted === null) {
         return null;
     }
+    const runCompletedRunId = runCompleted.scope.runId;
     const metricsContext = {};
     for (const e of events) {
         if (e.kind !== constants_js_1.EVENT_KIND.CONTEXT_SET)
             continue;
-        if (runCompletedRunId !== null && e.scope?.runId !== runCompletedRunId)
+        if (e.scope?.runId !== runCompletedRunId)
             continue;
         const ctx = e.data.context;
         if (!ctx || typeof ctx !== 'object' || Array.isArray(ctx))
@@ -32,25 +30,15 @@ function projectSessionMetricsV2(events) {
             }
         }
     }
-    const d = runCompletedData;
+    const d = runCompleted.data;
     const startGitSha = typeof d.startGitSha === 'string' ? d.startGitSha : null;
     const endGitSha = typeof d.endGitSha === 'string' ? d.endGitSha : null;
     const gitBranch = typeof d.gitBranch === 'string' ? d.gitBranch : null;
-    const agentCommitShas = [];
-    if (Array.isArray(d.agentCommitShas)) {
-        for (const sha of d.agentCommitShas) {
-            if (typeof sha === 'string') {
-                agentCommitShas.push(sha);
-            }
-        }
-    }
-    const captureConfidenceRaw = d.captureConfidence;
-    const captureConfidence = captureConfidenceRaw === 'high' || captureConfidenceRaw === 'medium' || captureConfidenceRaw === 'none'
-        ? captureConfidenceRaw
-        : 'none';
-    const durationMs = typeof d.durationMs === 'number' && Number.isFinite(d.durationMs)
-        ? d.durationMs
-        : undefined;
+    const agentCommitShas = Array.isArray(d.agentCommitShas)
+        ? d.agentCommitShas.filter((s) => typeof s === 'string')
+        : [];
+    const durationMs = typeof d.durationMs === 'number' && Number.isFinite(d.durationMs) ? d.durationMs : undefined;
+    const captureConfidence = d.captureConfidence === 'high' ? 'high' : 'none';
     const outcomeRaw = metricsContext['metrics_outcome'];
     const outcome = outcomeRaw === 'success' || outcomeRaw === 'partial' || outcomeRaw === 'abandoned' || outcomeRaw === 'error'
         ? outcomeRaw
@@ -68,24 +56,17 @@ function projectSessionMetricsV2(events) {
     const metricCommitShas = [];
     if (Array.isArray(commitShasRaw)) {
         for (const sha of commitShasRaw) {
-            if (typeof sha === 'string') {
+            if (typeof sha === 'string')
                 metricCommitShas.push(sha);
-            }
         }
     }
     const finalAgentCommitShas = metricCommitShas.length > 0 ? metricCommitShas : agentCommitShas;
     const filesChangedRaw = metricsContext['metrics_files_changed'];
-    const filesChanged = typeof filesChangedRaw === 'number' && Number.isFinite(filesChangedRaw)
-        ? filesChangedRaw
-        : null;
+    const filesChanged = typeof filesChangedRaw === 'number' && Number.isFinite(filesChangedRaw) ? filesChangedRaw : null;
     const linesAddedRaw = metricsContext['metrics_lines_added'];
-    const linesAdded = typeof linesAddedRaw === 'number' && Number.isFinite(linesAddedRaw)
-        ? linesAddedRaw
-        : null;
+    const linesAdded = typeof linesAddedRaw === 'number' && Number.isFinite(linesAddedRaw) ? linesAddedRaw : null;
     const linesRemovedRaw = metricsContext['metrics_lines_removed'];
-    const linesRemoved = typeof linesRemovedRaw === 'number' && Number.isFinite(linesRemovedRaw)
-        ? linesRemovedRaw
-        : null;
+    const linesRemoved = typeof linesRemovedRaw === 'number' && Number.isFinite(linesRemovedRaw) ? linesRemovedRaw : null;
     return {
         startGitSha,
         endGitSha,
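
Note on the hunks above: the normalization they introduce can be sketched in TypeScript as follows. This is a minimal illustration rather than the package's source. The helper names `normalizeCaptureConfidence`, `stringsOnly`, and `finiteOrUndefined` are hypothetical; only the expressions inside them mirror the `+` lines. The visible consequence is that a stored `'medium'` value now projects to `'none'`.

```typescript
// Narrowed union, as declared in session-metrics.d.ts after the change.
type CaptureConfidence = 'high' | 'none';

// Hypothetical helper: anything other than the literal 'high' collapses to 'none',
// including the previously accepted 'medium'.
function normalizeCaptureConfidence(raw: unknown): CaptureConfidence {
  return raw === 'high' ? 'high' : 'none';
}

// Hypothetical helper: keeps only string entries, replacing the old push loop.
function stringsOnly(raw: unknown): string[] {
  return Array.isArray(raw)
    ? raw.filter((s): s is string => typeof s === 'string')
    : [];
}

// Hypothetical helper: rejects NaN, Infinity, and non-numbers.
function finiteOrUndefined(raw: unknown): number | undefined {
  return typeof raw === 'number' && Number.isFinite(raw) ? raw : undefined;
}
```

Consumers that branched on `captureConfidence === 'medium'` would no longer see that value under this behavior.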

package/dist/v2/usecases/console-routes.d.ts CHANGED

@@ -1,6 +1,6 @@
 import type { Application } from 'express';
 import type { ConsoleService } from './console-service.js';
-import type { WorkflowService } from '../../application/services/workflow-service.js';
+import type { IWorkflowReader } from '../../types/storage.js';
 import type { ToolCallTimingRingBuffer } from '../../mcp/tool-call-timing.js';
 import type { V2ToolContext } from '../../mcp/types.js';
-export declare function mountConsoleRoutes(app: Application, consoleService: ConsoleService, workflowService?: WorkflowService, timingRingBuffer?: ToolCallTimingRingBuffer, toolCallsPerfFile?: string, serverVersion?: string, v2ToolContext?: V2ToolContext): () => void;
+export declare function mountConsoleRoutes(app: Application, consoleService: ConsoleService, workflowService?: IWorkflowReader, timingRingBuffer?: ToolCallTimingRingBuffer, toolCallsPerfFile?: string, serverVersion?: string, v2ToolContext?: V2ToolContext): () => void;

package/docs/authoring-v2.md CHANGED

@@ -240,7 +240,7 @@ Profile selection guide:
 | `"research"` | Workflow produces a finding or recommendation but no commits | Outcome-only reminder on final step only |
 | `"none"` or absent | Meta-workflows, utilities, authoring tools | No injection -- existing behavior unchanged |
-The engine does NOT derive the profile from tags automatically. Authors must set this field explicitly. When using `workflow-for-workflows` to author or modernize a workflow, the `phase-7b` step will prompt you for this decision.
+The engine does NOT derive the profile from tags automatically. Authors must set this field explicitly. When using `wr.workflow-for-workflows` to author or modernize a workflow, the `phase-7b` step will prompt you for this decision.
 **Final step detection**: The engine injects the final-step footer on the last top-level step, or on the exit step of a loop that is the last top-level step. A loop in a non-terminal position does not trigger the final-step footer on its exit step.
@@ -551,11 +551,11 @@ To keep authoring simple:
 Workflows can drift out of sync with the authoring spec they were written against. WorkRail surfaces this as a `staleness` signal in `list_workflows` and `inspect_workflow` output.
-**How it works:** Workflows carry an optional `validatedAgainstSpecVersion` field stamped by `workflow-for-workflows` after the quality gate passes. The engine compares this against the current `spec/authoring-spec.json` version at list/inspect time and returns:
+**How it works:** Workflows carry an optional `validatedAgainstSpecVersion` field stamped by `wr.workflow-for-workflows` after the quality gate passes. The engine compares this against the current `spec/authoring-spec.json` version at list/inspect time and returns:
 - `none` — workflow was validated against the current spec version
 - `likely` — spec was updated since the workflow was last reviewed
-- `possible` — workflow has never been run through `workflow-for-workflows`
+- `possible` — workflow has never been run through `wr.workflow-for-workflows`
 **Stamping a workflow:**
@@ -564,7 +564,7 @@ npm run stamp-workflow -- workflows/my-workflow.json
 git add workflows/my-workflow.json && git commit -m "chore: stamp workflow"
 ```
-The stamp must be committed to take effect. The `workflow-for-workflows` Phase 7 step includes a reminder to do this.
+The stamp must be committed to take effect. The `wr.workflow-for-workflows` Phase 7 step includes a reminder to do this.
 **Visibility:** By default, the staleness signal is only shown for user-owned/imported workflows (`personal`, `rooted_sharing`, `external`). Built-in and legacy_project workflows are excluded. Set `WORKRAIL_DEV=1` to see staleness for all categories (useful for catalog maintenance).
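
Note on the staleness rules quoted in this hunk: the three signal levels can be sketched as a small pure function. This is an illustration under assumptions, not the engine's implementation. The function name `stalenessSignal` is hypothetical, spec versions are assumed to be plain numbers, and any mismatch between the stamp and the current version is assumed to mean `likely`.

```typescript
type Staleness = 'none' | 'likely' | 'possible';

// Hypothetical sketch of the comparison described in authoring-v2.md.
// Assumption: versions are numbers and are compared for equality only.
function stalenessSignal(
  validatedAgainstSpecVersion: number | undefined,
  currentSpecVersion: number,
): Staleness {
  // No stamp: the workflow was never run through wr.workflow-for-workflows.
  if (validatedAgainstSpecVersion === undefined) return 'possible';
  // Stamp matches the current spec: validated and up to date.
  if (validatedAgainstSpecVersion === currentSpecVersion) return 'none';
  // Stamp present but different: the spec changed since the last review.
  return 'likely';
}
```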

package/docs/changelog-recent.md CHANGED

@@ -117,7 +117,7 @@ Structured migration workflow for moving code between platforms (Android to iOS,
 Since you've created workflows yourself, these changes are directly relevant.
-### `workflow-for-workflows.v2.json` was rebuilt
+### `wr.workflow-for-workflows.v2.json` was rebuilt
 The workflow used to create or modernize other workflows was significantly redesigned. The full phase structure now includes:
@@ -179,12 +179,12 @@ A visual catalog of every available workflow. Eight category filter pills. Click
 WorkRail can now detect when a workflow hasn't been reviewed against the current authoring spec. Three signal levels:
 - `none` -- validated against the current spec (has a version stamp and it's current)
-- `possible` -- no version stamp (was never run through `workflow-for-workflows`)
+- `possible` -- no version stamp (was never run through `wr.workflow-for-workflows`)
 - `likely` -- has a stamp, but the spec has been updated since the workflow was last reviewed
 This shows up in `list_workflows` output (agents see it) and in the CI registry validation check. It's shown only for non-built-in workflows -- built-in workflows ship with their own quality process and don't show staleness signals.
-**What this means for your team:** Your team's existing workflows will show as `possible` (no stamp) until they're run through `workflow-for-workflows.v2.json`. That's expected -- it's not an error, just a signal that they haven't been through the new quality gate. Over time, as you modernize them, they'll show `none`.
+**What this means for your team:** Your team's existing workflows will show as `possible` (no stamp) until they're run through `wr.workflow-for-workflows.v2.json`. That's expected -- it's not an error, just a signal that they haven't been through the new quality gate. Over time, as you modernize them, they'll show `none`.
 ---

package/docs/configuration.md CHANGED

@@ -407,7 +407,7 @@ When `isComplete: true` is returned, summarize all work done across the workflow
 After creating this file, the agent becomes available via the Agent tool:
 ```
-Agent(subagent_type="workrail-executor", prompt="Start the bug-investigation-agentic workflow...")
+Agent(subagent_type="workrail-executor", prompt="Start the wr.bug-investigation workflow...")
 ```
 ### Cursor

package/docs/design/adaptive-coordinator-context-candidates.md CHANGED

@@ -59,7 +59,7 @@ Must stay consistent with:
 - `src/v2/durable-core/schemas/artifacts/` (typed artifact schemas)
 - `workflows/wr.discovery.json` (Phase 7 -- if emitting artifact)
 - `workflows/wr.shaping.json` (Step 1 -- if adding file search)
-- `workflows/coding-task-workflow-agentic.json` (Phase 0.5 -- no changes expected)
+- `workflows/wr.coding-task.json` (Phase 0.5 -- no changes expected)
 ---

package/docs/design/adaptive-coordinator-context.md CHANGED

@@ -73,7 +73,7 @@ WorkTrain sessions are fully isolated. Each spawned session starts from the work
 **1. File-based handoff (wr.shaping -> coding):**
 - `wr.shaping` Step 9 writes `.workrail/current-pitch.md` at the workspace path
-- `coding-task-workflow-agentic` Phase 0.5 actively searches for upstream docs via repo search, WebFetch, MCP integrations
+- `wr.coding-task` Phase 0.5 actively searches for upstream docs via repo search, WebFetch, MCP integrations
 - Phase 0.5 would find `.workrail/current-pitch.md` automatically
 - **Status: effectively already works** -- no coordinator intervention needed for Shaping->Coding

package/docs/design/adaptive-coordinator-routing-candidates.md CHANGED

@@ -16,9 +16,9 @@
 3. **Monolithic coordinator vs per-mode decomposition**: `pr-review.ts` is 1462 lines for one pipeline mode. Five modes in one file would be unmanageable. The right architecture decomposes into mode files with a thin dispatcher -- but this requires deciding the seam deliberately.
-4. **`recommendedPipeline` verbatim vs advisory**: If classify-task-workflow's pipeline output is authoritative, the coordinator cannot apply static overrides. If advisory, the coordinator re-implements routing and classify-task's rules become redundant for common cases.
+4. **`recommendedPipeline` verbatim vs advisory**: If wr.classify-task's pipeline output is authoritative, the coordinator cannot apply static overrides. If advisory, the coordinator re-implements routing and classify-task's rules become redundant for common cases.
-5. **Phase 0.5 vs coordinator routing for upstream context**: `coding-task-workflow-agentic` Phase 0.5 auto-detects `pitch.md`. The coordinator's "should I skip shaping?" routing decision partially overlaps with this detection. They must agree.
+5. **Phase 0.5 vs coordinator routing for upstream context**: `wr.coding-task` Phase 0.5 auto-detects `pitch.md`. The coordinator's "should I skip shaping?" routing decision partially overlaps with this detection. They must agree.
 ### What the codebase already solves (and how)
@@ -29,12 +29,12 @@
 - Escalation-first: every failure produces `escalated: true` + `escalationReason`, never silent substitution
 - TRACE log before acting on routing decision
-**`classify-task-workflow.json`:**
+**`wr.classify-task.json`:**
 - Exists as of v3.40.0. Single LLM step, no tools, outputs `recommendedPipeline` as ordered workflow ID array
 - Output format: structured text block with `recommendedPipeline: ["...", "..."]` line
 - Note: `spawn_agent` does NOT return artifacts (v3.40.0 limitation #5) -- output must be read via `spawnSession` + `awaitSessions` + `getAgentResult` + note parsing
-**Phase 0.5 (`coding-task-workflow-agentic`):**
+**Phase 0.5 (`wr.coding-task`):**
 - Already detects `pitch.md` and sets `solutionFixed=true`, skipping design phases
 - The coordinator's "IMPLEMENT mode" (skip discovery/shaping) and Phase 0.5 are complementary, not conflicting
@@ -78,7 +78,7 @@ From CLAUDE.md (stated) and pr-review.ts (practiced):
 - `src/cli-worktrain.ts` -- needs `worktrain run pipeline` subcommand wiring
 - `src/coordinators/pr-review.ts` -- must remain unchanged; new coordinator is additive
 - `src/trigger/types.ts` -- if Candidate D's `pipelineMode` field is added; otherwise unchanged
-- `workflows/classify-task-workflow.json` -- coordinator depends on its note output format; format changes break parsing
+- `workflows/wr.classify-task.json` -- coordinator depends on its note output format; format changes break parsing
 - `src/coordinators/routing/route-task.ts` (new) -- pure routing function; all mode selection logic lives here
 - `src/coordinators/modes/*.ts` (new files) -- each mode's pipeline execution logic
 - Test suite: each mode coordinator needs its own unit tests with `CoordinatorDeps` fakes
@@ -110,8 +110,8 @@ type PipelineMode =
 **Per-mode pipelines:**
 - REVIEW_ONLY: `mr-review-workflow.agentic.v2` -> route by verdict
 - QUICK_REVIEW: same + light model config, no arch audit override
-- IMPLEMENT: `coding-task-workflow-agentic` (Phase 0.5 picks up pitch) -> PR -> review -> merge
-- FULL: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> PR -> review -> merge
+- IMPLEMENT: `wr.coding-task` (Phase 0.5 picks up pitch) -> PR -> review -> merge
+- FULL: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> PR -> review -> merge
 **Tensions resolved:** determinism, YAGNI, no LLM latency.
 **Tensions accepted:** all ambiguous tasks fall to FULL (wasteful for Medium complexity tasks that don't need full discovery).
@@ -125,14 +125,14 @@ type PipelineMode =
 ---
-### Candidate B: classify-task-workflow as authoritative source
+### Candidate B: wr.classify-task as authoritative source
-**Summary:** Always spawn `classify-task-workflow` first, parse `recommendedPipeline` output, execute the returned workflow sequence. Pipeline modes are not named at the coordinator level.
+**Summary:** Always spawn `wr.classify-task` first, parse `recommendedPipeline` output, execute the returned workflow sequence. Pipeline modes are not named at the coordinator level.
 **Architecture:**
 ```typescript
 async function routeTask(goal, workspace, deps): Promise<Result<readonly string[], string>> {
-  const handle = await deps.spawnSession('classify-task-workflow', `Classify: ${goal}`, workspace);
+  const handle = await deps.spawnSession('wr.classify-task', `Classify: ${goal}`, workspace);
   await deps.awaitSessions([handle], CLASSIFY_TIMEOUT_MS); // 3 minutes max
   const agentResult = await deps.getAgentResult(handle);
   return parseRecommendedPipeline(agentResult.recapMarkdown);
@@ -141,12 +141,12 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[
 `parseRecommendedPipeline` is a pure function parsing the text block (two-tier: JSON array first, regex fallback).
-**Fallback:** if parsing fails, default to `['wr.discovery', 'coding-task-workflow-agentic', 'mr-review-workflow.agentic.v2']`.
+**Fallback:** if parsing fails, default to `['wr.discovery', 'wr.coding-task', 'mr-review-workflow.agentic.v2']`.
 **Tensions resolved:** intelligent routing for all tasks including ambiguous ones; single source of truth for pipeline selection rules.
 **Tensions accepted:** non-deterministic; 5-15 second LLM latency per dispatch; no typed `PipelineMode` discriminated union (pipeline is a string[] at coordinator level).
-**Boundary:** classify-task-workflow is the routing authority; coordinator is a runner.
-**Failure mode:** classify-task-workflow misclassifies a PR-only task and returns discovery+coding phases, wasting 30+ minutes. Recovery: add a pre-check for PR number before spawning classify-task (hybrid).
+**Boundary:** wr.classify-task is the routing authority; coordinator is a runner.
+**Failure mode:** wr.classify-task misclassifies a PR-only task and returns discovery+coding phases, wasting 30+ minutes. Recovery: add a pre-check for PR number before spawning classify-task (hybrid).
 **Repo pattern:** departs from determinism-over-cleverness principle. No named discriminated union.
 **Gain:** routing rules live in a workflow file -- updatable without code deployment.
 **Give up:** determinism, transparency, typed modes, dispatch speed for obvious cases.
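
Note on `parseRecommendedPipeline` as described in this hunk (JSON array first, regex fallback): one possible shape is sketched below. The design doc does not show the function body, so the details here are assumptions: the `recommendedPipeline:` line is located with a multiline regex, the fallback collects any quoted tokens on that line, and the function returns `null` instead of the `Result` type used in the doc.

```typescript
// Hypothetical two-tier parser for a notes block containing a line such as:
//   recommendedPipeline: ["wr.discovery", "wr.coding-task"]
function parseRecommendedPipeline(notes: string): string[] | null {
  const lineMatch = /^\s*recommendedPipeline:\s*(.*)$/m.exec(notes);
  if (lineMatch === null) return null;
  const payload = lineMatch[1].trim();

  // Tier 1: strict JSON array of non-empty length containing only strings.
  try {
    const parsed: unknown = JSON.parse(payload);
    if (
      Array.isArray(parsed) &&
      parsed.length > 0 &&
      parsed.every((item) => typeof item === 'string')
    ) {
      return parsed as string[];
    }
  } catch {
    // Not valid JSON; fall through to the regex tier.
  }

  // Tier 2: tolerate single quotes or trailing commas by collecting quoted tokens.
  const quoted = /["']([^"']+)["']/g;
  const ids: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = quoted.exec(payload)) !== null) {
    ids.push(m[1]);
  }
  return ids.length > 0 ? ids : null;
}
```

A `null` result would correspond to the parse-failure case for which the doc specifies a default pipeline.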
@@ -157,7 +157,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[
 ### Candidate C: Static-first with LLM fallback (hybrid, recommended for routing)
-**Summary:** Two-tier `routeTask()`: Tier 1 applies static rules (pure, covers 80% of cases); Tier 2 falls back to classify-task-workflow for ambiguous tasks and returns a `CLASSIFY_AND_RUN` mode.
+**Summary:** Two-tier `routeTask()`: Tier 1 applies static rules (pure, covers 80% of cases); Tier 2 falls back to wr.classify-task for ambiguous tasks and returns a `CLASSIFY_AND_RUN` mode.
 **PipelineMode type (6 variants):**
 ```typescript
@@ -180,7 +180,7 @@ async function routeTask(
   // Tier 1: static (pure, no I/O except filesystem check for pitch.md)
   const staticMode = applyStaticRules(goal, workspace);
   if (staticMode !== null) return ok(staticMode);
-  // Tier 2: classify-task-workflow
+  // Tier 2: wr.classify-task
   const classified = await runClassification(goal, workspace, deps);
   if (classified.kind === 'err') return err(`classification failed: ${classified.error}`);
   return ok({ kind: 'CLASSIFY_AND_RUN', classifiedPipeline: classified.value, goal });
@@ -293,7 +293,7 @@ export async function runAdaptivePipeline(
 ### Recommendation: Candidate C (routing mechanism) + Candidate E (architecture)
-**Routing (C):** Two-tier static-first with LLM fallback. Static rules cover the 4 well-defined cases at zero cost and latency. `CLASSIFY_AND_RUN` handles genuinely ambiguous tasks via classify-task-workflow. This precisely mirrors the `parseFindingsFromNotes` two-tier strategy already established in `pr-review.ts`.
+**Routing (C):** Two-tier static-first with LLM fallback. Static rules cover the 4 well-defined cases at zero cost and latency. `CLASSIFY_AND_RUN` handles genuinely ambiguous tasks via wr.classify-task. This precisely mirrors the `parseFindingsFromNotes` two-tier strategy already established in `pr-review.ts`.
 **Architecture (E):** Per-mode coordinator files with thin dispatcher. Each mode file follows `pr-review.ts` independently. The dispatcher's `switch(mode.kind)` is exhaustive with `assertNever`. Adding a new mode is additive.
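
Note on the exhaustive `switch(mode.kind)` with `assertNever` mentioned in this hunk: the pattern can be sketched as below. This is a simplified illustration, not the dispatcher from the design doc. It uses only four of the mode kinds named in the doc, the `goal` field on each variant is assumed, and `workflowsFor` is a hypothetical function that returns the leading workflow IDs for a mode instead of delegating to per-mode files.

```typescript
// Subset of the modes named in the design doc; field shapes are assumptions,
// except CLASSIFY_AND_RUN, which mirrors the object literal shown in the doc.
type PipelineMode =
  | { kind: 'REVIEW_ONLY'; goal: string }
  | { kind: 'IMPLEMENT'; goal: string }
  | { kind: 'FULL'; goal: string }
  | { kind: 'CLASSIFY_AND_RUN'; goal: string; classifiedPipeline: readonly string[] };

// Compile-time exhaustiveness guard: reaching this with a real value is a bug.
function assertNever(value: never): never {
  throw new Error(`Unhandled pipeline mode: ${JSON.stringify(value)}`);
}

// Hypothetical dispatcher: maps each mode to the workflows it starts with.
function workflowsFor(mode: PipelineMode): readonly string[] {
  switch (mode.kind) {
    case 'REVIEW_ONLY':
      return ['mr-review-workflow.agentic.v2'];
    case 'IMPLEMENT':
      return ['wr.coding-task'];
    case 'FULL':
      return ['wr.discovery', 'wr.shaping', 'wr.coding-task'];
    case 'CLASSIFY_AND_RUN':
      return mode.classifiedPipeline;
    default:
      // Adding a new variant to PipelineMode without a case makes this line fail to compile.
      return assertNever(mode);
  }
}
```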
@@ -301,7 +301,7 @@ export async function runAdaptivePipeline(
 ### Why not A (pure static)?
-Candidate A is simpler but all tasks without static signals fall to FULL (wr.discovery + wr.shaping + coding). A genuinely vague idea needs FULL. But a Medium complexity coding task with no pitch.md and no PR number -- e.g., `"refactor auth.ts to use Result types"` -- also falls to FULL, running unnecessary discovery phases. Candidate C covers this case with classify-task-workflow returning `['coding-task-workflow-agentic', 'mr-review-workflow.agentic.v2']`.
+Candidate A is simpler but all tasks without static signals fall to FULL (wr.discovery + wr.shaping + coding). A genuinely vague idea needs FULL. But a Medium complexity coding task with no pitch.md and no PR number -- e.g., `"refactor auth.ts to use Result types"` -- also falls to FULL, running unnecessary discovery phases. Candidate C covers this case with wr.classify-task returning `['wr.coding-task', 'mr-review-workflow.agentic.v2']`.
 ### Why not B (pure LLM)?
@@ -319,7 +319,7 @@ Non-deterministic routing is unacceptable for the coordinator. A PR review task
 ### Pivot conditions
-1. If classify-task-workflow format drifts and `parseRecommendedPipeline` fails more than 10% of the time -> pivot to pure static (Candidate A) and accept FULL as default for ambiguous tasks
+1. If wr.classify-task format drifts and `parseRecommendedPipeline` fails more than 10% of the time -> pivot to pure static (Candidate A) and accept FULL as default for ambiguous tasks
 2. If trigger operators need deterministic routing for automated workflows -> add `pipelineMode` to TriggerDefinition (Candidate D addition)
 3. If context-passing agent's design requires structured handoff data from routing to mode executors -> add a `contextBundle` field to mode types (implementation change, not routing design change)

package/docs/design/adaptive-coordinator-routing-review.md CHANGED

@@ -151,6 +151,6 @@ Even though not called at MVP, having the pure function ready preserves the upgr
 1. **wr.discovery output standardization**: the routing design assumes wr.discovery notes are injected by the coordinator as `assembledContextSummary` for wr.shaping. But wr.discovery's `designDocPath` output location is not standardized (finding from context-passing agent's doc). The FULL mode executor must parse `lastStepNotes` from the discovery session to build the shaping context -- this is per the context-passing agent's Candidate D (coordinator-injected text). This concern is correctly owned by the context-passing design, not the routing design.
-2. **classify-task-workflow format stability**: if `parseRecommendedPipeline()` is written as a pure function now, it has no tests against real classify-task output. The function should include an integration test stub that documents the expected format.
+2. **wr.classify-task format stability**: if `parseRecommendedPipeline()` is written as a pure function now, it has no tests against real classify-task output. The function should include an integration test stub that documents the expected format.
 3. **REVIEW_ONLY vs pr-review coordinator**: the existing `worktrain run pr-review` command already provides REVIEW_ONLY+QUICK_REVIEW behavior. The new `worktrain run pipeline --mode review_only` should either (a) delegate to pr-review coordinator, or (b) reimplement the same logic in `modes/review-only.ts`. Recommendation: (a) delegate -- avoid duplicating the fix-agent loop logic. Document this delegation explicitly.

package/docs/design/adaptive-coordinator-routing.md CHANGED

@@ -22,7 +22,7 @@
 **Chosen path:** `design_first`
-**Rationale:** The goal was stated as a solution (a coordinator with a routing/classification layer). The risk is designing the wrong routing mechanism. The landscape is well-understood from existing code (`pr-review.ts`, `classify-task-workflow.json`). The dominant risk is not lack of knowledge -- it is solving the wrong subproblem (e.g., treating all routing as LLM classification when static heuristics cover most cases, or treating one monolithic script as the right shape when decomposition into per-mode coordinators may be cleaner).
+**Rationale:** The goal was stated as a solution (a coordinator with a routing/classification layer). The risk is designing the wrong routing mechanism. The landscape is well-understood from existing code (`pr-review.ts`, `wr.classify-task.json`). The dominant risk is not lack of knowledge -- it is solving the wrong subproblem (e.g., treating all routing as LLM classification when static heuristics cover most cases, or treating one monolithic script as the right shape when decomposition into per-mode coordinators may be cleaner).
 ---
@@ -58,12 +58,12 @@ If a chat rewind occurs: the notes and context variables survive; this file may
 **What exists:**
 - `src/coordinators/pr-review.ts` -- 1462-line hardcoded coordinator for PR review. Establishes the `CoordinatorDeps` injectable interface (16 methods), `spawnSession`/`awaitSessions`/`getAgentResult` pattern, fix-agent loop with escalation-first failure policy.
-- `workflows/classify-task-workflow.json` -- EXISTS as of v3.40.0 (contrary to Apr 15 backlog entry that listed it as missing). Single LLM step, no tools, outputs 7 variables including `recommendedPipeline` (ordered workflow ID array with decision rules already encoded).
+- `workflows/wr.classify-task.json` -- EXISTS as of v3.40.0 (contrary to Apr 15 backlog entry that listed it as missing). Single LLM step, no tools, outputs 7 variables including `recommendedPipeline` (ordered workflow ID array with decision rules already encoded).
 - `src/cli-worktrain.ts` -- wires `worktrain run pr-review` subcommand. No `worktrain run pipeline` or adaptive coordinator command exists yet.
 - `src/trigger/types.ts` -- `TriggerDefinition` has `workflowId`, `goal`, `goalTemplate`, `contextMapping`, `agentConfig`. No `pipelineMode` field.
-- Three-Workflow Pipeline decision (Apr 18): `wr.discovery -> wr.shaping -> coding-task-workflow-agentic`. Phase 0.5 in coding-task detects pitch.md and sets `solutionFixed=true` to skip design phases.
+- Three-Workflow Pipeline decision (Apr 18): `wr.discovery -> wr.shaping -> wr.coding-task`. Phase 0.5 in coding-task detects pitch.md and sets `solutionFixed=true` to skip design phases.
 - `wr.shaping` and `wr.discovery` workflows both exist as of v3.40.0.
-- `coding-task-workflow-agentic` Phase 0.5 detects upstream context (pitch.md, BRD, PRD, etc.).
+- `wr.coding-task` Phase 0.5 detects upstream context (pitch.md, BRD, PRD, etc.).
 **The Apr 15 backlog full pipeline DAG** (still relevant design intent):
 ```
@@ -91,13 +91,13 @@ trigger
 ### Contradictions and tensions
-- **classify-task-workflow is listed as NOT YET BUILT in the Apr 15 backlog** but the file `workflows/classify-task-workflow.json` exists today (v3.40.0, Apr 19). This is resolved: it was built between Apr 15 and Apr 19.
+- **wr.classify-task is listed as NOT YET BUILT in the Apr 15 backlog** but the file `workflows/wr.classify-task.json` exists today (v3.40.0, Apr 19). This is resolved: it was built between Apr 15 and Apr 19.
 - **"Always run classify-task first"** (Apr 15 backlog) vs. **"Static heuristics for well-known cases"** (primary uncertainty). The Apr 15 backlog says "always" but this was written before Phase 0.5 upstream context detection was built. With Phase 0.5, many routing decisions can be made statically.
 - **`recommendedPipeline` from classify-task** includes `wr.discovery` for Medium/Large tasks, but the Three-Workflow Pipeline decision treats `wr.discovery` as optional. The coordinator must decide: use classify-task's `recommendedPipeline` verbatim, or treat it as a hint that can be overridden by static signals (e.g., pitch.md already present = skip discovery even if classify says Medium)?
 ### Evidence gaps
-1. Does `spawn_agent` (the in-workflow tool) return the `recommendedPipeline` output variable from `classify-task-workflow`? The backlog note says `spawn_agent` currently does NOT return `artifacts` (limitation #5 in v3.40.0 current state). This means the coordinator script cannot use `spawn_agent` to run classify-task and read output -- it must use `spawnSession` + `getAgentResult` + parse the notes, just as `pr-review.ts` does for verdict artifacts.
+1. Does `spawn_agent` (the in-workflow tool) return the `recommendedPipeline` output variable from `wr.classify-task`? The backlog note says `spawn_agent` currently does NOT return `artifacts` (limitation #5 in v3.40.0 current state). This means the coordinator script cannot use `spawn_agent` to run classify-task and read output -- it must use `spawnSession` + `getAgentResult` + parse the notes, just as `pr-review.ts` does for verdict artifacts.
 2. No existing test harness for a multi-mode coordinator. `pr-review.ts` tests exist but only cover the review pipeline.
 3. The `worktrain-spawn.ts` CLI wiring for `spawnSession` is the only proven path to dispatch sessions from a coordinator script. No other dispatch mechanism has been tested.
@@ -122,7 +122,7 @@ trigger
 3. **Single coordinator file vs per-mode decomposition**: `pr-review.ts` is 1462 lines for one mode. A monolithic adaptive coordinator handling all modes risks becoming unmaintainable. Per-mode coordinator functions (each independently testable) with a thin routing dispatcher is a cleaner architecture -- but introduces coordination between files.
-4. **`recommendedPipeline` verbatim vs as a hint**: classify-task-workflow encodes pipeline selection rules. If the coordinator uses these verbatim, it cannot apply static overrides (e.g., pitch.md present -> skip discovery). If it treats them as hints, it re-implements routing logic and classify-task's rules become advisory only.
+4. **`recommendedPipeline` verbatim vs as a hint**: wr.classify-task encodes pipeline selection rules. If the coordinator uses these verbatim, it cannot apply static overrides (e.g., pitch.md present -> skip discovery). If it treats them as hints, it re-implements routing logic and classify-task's rules become advisory only.
 5. **Phase 0.5 vs coordinator routing for upstream context**: coding-task already auto-detects pitch.md. So the coordinator's routing decision for "skip wr.shaping?" partially duplicates Phase 0.5's detection. The coordinator should route based on what phases to _spawn_, not what the coding workflow will internally skip -- but these can diverge (coordinator spawns shaping but coding-task's Phase 0.5 would have skipped it anyway).
@@ -130,8 +130,8 @@ trigger
 - [ ] A `worktrain run pipeline --task "fix the race condition in auth.ts"` command routes to the correct pipeline mode and logs the routing decision before spawning any sessions
 - [ ] A task with `#123` or `PR #123` in the goal routes to REVIEW_ONLY without spawning discovery or shaping sessions
-- [ ] A task with `pitch.md` present in the workspace routes to IMPLEMENT (coding-task-workflow-agentic only)
-- [ ] An ambiguous task (no static signal) routes to classify-task-workflow session, parses `recommendedPipeline`, and executes that pipeline
+- [ ] A task with `pitch.md` present in the workspace routes to IMPLEMENT (wr.coding-task only)
+- [ ] An ambiguous task (no static signal) routes to wr.classify-task session, parses `recommendedPipeline`, and executes that pipeline
 - [ ] A `dep bump` or `chore:` task routes to QUICK_REVIEW (mr-review only, no arch audit) based on goal text heuristics
 - [ ] Any phase failure produces a `PipelineOutcome` with `escalated: true` and a structured `escalationReason` -- no silent substitution
 - [ ] The `CoordinatorDeps` interface for the adaptive coordinator extends or reuses the existing `CoordinatorDeps` pattern from `pr-review.ts`
@@ -139,8 +139,8 @@ trigger
 ### Assumptions not yet verified
-1. `classify-task-workflow` can be invoked via `spawnSession` + `awaitSessions` + `getAgentResult` with note parsing (same as pr-review reads verdict artifacts) -- this is assumed based on the spawn_agent artifact limitation
-2. The `recommendedPipeline` text can be reliably parsed from classify-task-workflow's note output using a regex or structured block parser
+1. `wr.classify-task` can be invoked via `spawnSession` + `awaitSessions` + `getAgentResult` with note parsing (same as pr-review reads verdict artifacts) -- this is assumed based on the spawn_agent artifact limitation
+2. The `recommendedPipeline` text can be reliably parsed from wr.classify-task's note output using a regex or structured block parser
 3. A new CLI subcommand `worktrain run pipeline` can be added following the same pattern as `worktrain run pr-review` in `src/cli-worktrain.ts`
 4. Pipeline modes can be named and bounded at design time (not open-ended)
@@ -151,17 +151,17 @@ trigger
 ### HMW (How Might We) reframes
 - HMW make the pipeline mode explicit in the trigger config so routing is never ambiguous, while still supporting dynamic routing for ad-hoc CLI invocations?
-- HMW use classify-task-workflow's `recommendedPipeline` as the default while allowing static overrides to be applied on top, treating classification as advisory rather than authoritative?
+- HMW use wr.classify-task's `recommendedPipeline` as the default while allowing static overrides to be applied on top, treating classification as advisory rather than authoritative?
 ### Primary uncertainty (updated)
-Can classify-task-workflow's `recommendedPipeline` output be used as the canonical routing source, with static overrides applied on top for well-known signal patterns (PR number, pitch.md, dep-bump keywords) -- rather than choosing between LLM and heuristics as mutually exclusive?
+Can wr.classify-task's `recommendedPipeline` output be used as the canonical routing source, with static overrides applied on top for well-known signal patterns (PR number, pitch.md, dep-bump keywords) -- rather than choosing between LLM and heuristics as mutually exclusive?
 ### Known approaches
-1. **classify-task-workflow first** -- always spawn a classification session, parse `recommendedPipeline`, then execute the pipeline. LLM-accurate, adds latency and cost per dispatch.
+1. **wr.classify-task first** -- always spawn a classification session, parse `recommendedPipeline`, then execute the pipeline. LLM-accurate, adds latency and cost per dispatch.
 2. **Static heuristics** -- parse goal text and trigger metadata (PR number present, labels, pitch.md present, explicit pipelineMode flag on trigger). Zero LLM cost, covers well-defined cases.
-3. **Hybrid** -- static heuristics handle high-confidence cases; LLM classification handles ambiguous tasks. `classify-task-workflow` is an optional fast path, not always required.
+3. **Hybrid** -- static heuristics handle high-confidence cases; LLM classification handles ambiguous tasks. `wr.classify-task` is an optional fast path, not always required.
 4. **Explicit `pipelineMode` on trigger** -- add a `pipelineMode` field to `TriggerDefinition` (or as a context variable). Users/triggers declare mode explicitly. Removes ambiguity but requires configuration overhead.
 5. **classify-task advisory + static overrides** -- run classify-task first (small cost, accurate), then apply static override rules on top of `recommendedPipeline` to handle well-known signals. Classify sets the baseline; static rules correct known exceptions.
@@ -221,8 +221,8 @@ function routeTask(goal: string, workspace: string): PipelineMode
 **Per-mode pipeline sequences:**
 - `REVIEW_ONLY`: `mr-review-workflow.agentic.v2` -> route by verdict (clean: merge, minor: fix-agent-loop, blocking: escalate)
 - `QUICK_REVIEW`: same as REVIEW_ONLY but `agentConfig: { model: 'haiku-light' }`, no arch audit even if touched
-- `IMPLEMENT`: `coding-task-workflow-agentic` (Phase 0.5 finds pitch.md) -> `mr-review-workflow.agentic.v2` -> merge
-- `FULL`: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> PR -> `mr-review-workflow.agentic.v2` -> merge
+- `IMPLEMENT`: `wr.coding-task` (Phase 0.5 finds pitch.md) -> `mr-review-workflow.agentic.v2` -> merge
+- `FULL`: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> PR -> `mr-review-workflow.agentic.v2` -> merge
 **Failure handling:** each phase failure returns a `PipelineOutcome` with `escalated: true` and `escalationReason`. No fallback to simpler pipeline. Same pattern as `PrOutcome` in pr-review.ts.
@@ -238,14 +238,14 @@ function routeTask(goal: string, workspace: string): PipelineMode
 ---
-### Candidate B: classify-task-workflow as authoritative source (pure LLM routing)
+### Candidate B: wr.classify-task as authoritative source (pure LLM routing)
-**One-sentence summary:** The coordinator always spawns a `classify-task-workflow` session first, parses the `recommendedPipeline` output from step notes, and executes the pipeline that workflow specifies -- the coordinator script is a runner for whatever classify-task returns.
+**One-sentence summary:** The coordinator always spawns a `wr.classify-task` session first, parses the `recommendedPipeline` output from step notes, and executes the pipeline that workflow specifies -- the coordinator script is a runner for whatever classify-task returns.
 **Architecture:**
 ```typescript
 async function routeTask(goal, workspace, deps): Promise<Result<readonly string[], string>> {
-  const handle = await deps.spawnSession('classify-task-workflow', goal, workspace);
+  const handle = await deps.spawnSession('wr.classify-task', goal, workspace);
   const result = await deps.awaitSessions([handle], CLASSIFY_TIMEOUT_MS);
   const notes = await deps.getAgentResult(handle);
   return parseRecommendedPipeline(notes.recapMarkdown); // pure function, text block parser
@@ -257,15 +257,15 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[
 **Pipeline modes:** not named at the coordinator level -- the pipeline IS whatever classify-task returns. The coordinator just runs the sequence.
-**Failure handling:** if `parseRecommendedPipeline` fails (LLM deviated from format), default to `['wr.discovery', 'coding-task-workflow-agentic', 'mr-review-workflow.agentic.v2']`. Any spawned phase failure escalates with structured reason.
+**Failure handling:** if `parseRecommendedPipeline` fails (LLM deviated from format), default to `['wr.discovery', 'wr.coding-task', 'mr-review-workflow.agentic.v2']`. Any spawned phase failure escalates with structured reason.
 **Tensions resolved:** intelligent routing for ambiguous tasks; single source of truth for pipeline selection rules (the workflow, not the coordinator).
 **Tensions accepted:** non-deterministic (same task may classify differently); adds 5-15 second LLM latency per dispatch; `recommendedPipeline` is a string array of workflow IDs, not a typed discriminated union.
 **Failure mode to watch:** coordinator runs `wr.discovery` unnecessarily for PR-only tasks if classify-task misclassifies them. Recovery: add static pre-check before spawning classify-task.
-**Follows:** classify-task-workflow's existing decision rules are already correct; this candidate delegates trust to them.
+**Follows:** wr.classify-task's existing decision rules are already correct; this candidate delegates trust to them.
 **Gain:** routing rules live in the workflow, not the coordinator -- can be updated without code changes.
 **Give up:** determinism, routing transparency (routing reason requires parsing LLM output), typed pipeline modes.
-**Impact surface:** classify-task-workflow becomes a critical dependency -- format changes break coordinator.
+**Impact surface:** wr.classify-task becomes a critical dependency -- format changes break coordinator.
 **Scope judgment:** Best-fit for teams that want routing rules to evolve without code deployment.
 **Philosophy:** Honors dependency injection (classify-task as a boundary). Conflicts with determinism-over-cleverness (LLM routing is clever but non-deterministic).
@@ -273,7 +273,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<readonly string[
 ### Candidate C: static-first with LLM fallback (hybrid, recommended)
-**One-sentence summary:** A two-tier `routeTask()` applies static rules first (fast, deterministic, covers 80% of cases), then falls back to classify-task-workflow only for ambiguous tasks where no static signal fires.
+**One-sentence summary:** A two-tier `routeTask()` applies static rules first (fast, deterministic, covers 80% of cases), then falls back to wr.classify-task only for ambiguous tasks where no static signal fires.
 **Architecture:**
 ```typescript
@@ -303,7 +303,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<PipelineMode, st
 - `REVIEW_ONLY`: same as Candidate A
 - `QUICK_REVIEW`: same as Candidate A
 - `IMPLEMENT`: same as Candidate A
-- `FULL`: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> PR -> review -> merge
+- `FULL`: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> PR -> review -> merge
 - `CLASSIFY_AND_RUN`: execute phases from classify-task output in order; unknown workflow IDs escalate
 **Failure handling:** escalation-first, same as pr-review.ts. The routing failure (classify-task parse failure) produces ESCALATE mode with reason.
@@ -314,7 +314,7 @@ async function routeTask(goal, workspace, deps): Promise<Result<PipelineMode, st
 **Follows:** parseFindingsFromNotes two-tier strategy pattern. CoordinatorDeps injection for the LLM fallback path.
 **Gain:** fast for common cases, intelligent for ambiguous cases, deterministic for all named modes.
 **Give up:** complexity of two tiers; CLASSIFY_AND_RUN mode is not a named type with typed data.
-**Impact surface:** same as Candidate A plus classify-task-workflow dependency.
+**Impact surface:** same as Candidate A plus wr.classify-task dependency.
 **Scope judgment:** Best-fit -- covers all named use cases efficiently. YAGNI risk is low because the LLM fallback adds ~30 lines of code, not a new architecture.
 **Philosophy:** Honors immutability, exhaustiveness (switch on PipelineMode is exhaustive), determinism-over-cleverness (static tier is deterministic, LLM is bounded fallback), errors-as-data.
@@ -421,7 +421,7 @@ Each mode coordinator is ~300-600 lines, fully independently testable. No mode-s
 ### Recommendation: C + E (Candidate C routing mechanism, Candidate E file architecture)
-**The routing mechanism decision (C):** Two-tier routing is the best-fit. Static rules cover the 4 well-defined cases (PR number, dep-bump, pitch.md, vague idea) without LLM cost. `CLASSIFY_AND_RUN` as the 5th mode handles genuinely ambiguous tasks via classify-task-workflow. This follows the `parseFindingsFromNotes` precedent in pr-review.ts (two-tier: structured first, fallback second).
+**The routing mechanism decision (C):** Two-tier routing is the best-fit. Static rules cover the 4 well-defined cases (PR number, dep-bump, pitch.md, vague idea) without LLM cost. `CLASSIFY_AND_RUN` as the 5th mode handles genuinely ambiguous tasks via wr.classify-task. This follows the `parseFindingsFromNotes` precedent in pr-review.ts (two-tier: structured first, fallback second).
 **The architecture decision (E):** Per-mode coordinator files with a thin dispatcher is the correct architecture for 5 modes. Each mode file follows pr-review.ts independently. The dispatcher is the only code that changes when a new mode is added. This is how the codebase is already structured (pr-review.ts is one mode file) -- Candidate E just makes the pattern explicit.
@@ -447,7 +447,7 @@ Candidate D (pipelineMode in TriggerDefinition) would be justified if trigger op
 ### Pivot conditions
-- If `classify-task-workflow` note parsing proves unreliable (format drift), pivot to pure static (Candidate A) and accept that ambiguous tasks run FULL
+- If `wr.classify-task` note parsing proves unreliable (format drift), pivot to pure static (Candidate A) and accept that ambiguous tasks run FULL
 - If `TriggerDefinition` change is needed for automated workflows, add Candidate D's pipelineMode field
 - If context-passing agent's design shows that the coordinator must inject structured context at spawn time, the mode coordinator files must include context injection logic -- this is implementation detail, not a routing design change
@@ -466,7 +466,7 @@ Candidate D (pipelineMode in TriggerDefinition) would be justified if trigger op
 1. **CLASSIFY_AND_RUN seam crack (genuine weakness, not blocking):** C's CLASSIFY_AND_RUN mode creates a typed/untyped seam in the dispatcher. Mitigation: CLASSIFY_AND_RUN fires only for tasks with no static signal; the dispatcher handles it with a dedicated `runClassifyAndRunPipeline` function that is documented as the "catch-all" path. Alternatively: fold CLASSIFY_AND_RUN into FULL (just run the three-workflow pipeline for all ambiguous tasks) and remove the LLM fallback entirely. This would make C = A for ambiguous tasks, simplifying the design.
    - **Final decision: simplify C by removing CLASSIFY_AND_RUN. Ambiguous tasks (no static signal) default to FULL. This gives Candidate A's simplicity with Candidate C's structure.**
-2. **A is sufficient for MVP:** Challenge confirmed that Candidate A covers all 5 stated use cases. C adds value for future Medium tasks. For an MVP, A is correct. The recommended design IS essentially Candidate A + Candidate E architecture. No classify-task-workflow dependency at all for the initial implementation.
+2. **A is sufficient for MVP:** Challenge confirmed that Candidate A covers all 5 stated use cases. C adds value for future Medium tasks. For an MVP, A is correct. The recommended design IS essentially Candidate A + Candidate E architecture. No wr.classify-task dependency at all for the initial implementation.
 ### Final simplified design (A + E, not C + E)
@@ -489,7 +489,7 @@ Static rules (prioritized):
 3. `.workrail/current-pitch.md` exists -> `IMPLEMENT`
 4. else -> `FULL`
-**Why remove CLASSIFY_AND_RUN:** classify-task-workflow adds latency, non-determinism, and format-parsing fragility for no concrete benefit over FULL for the stated use cases. The "YAGNI with discipline" principle wins. If Medium tasks turn out to be wasteful with FULL, add classify-task as a future enhancement with a typed artifact (not text parsing).
+**Why remove CLASSIFY_AND_RUN:** wr.classify-task adds latency, non-determinism, and format-parsing fragility for no concrete benefit over FULL for the stated use cases. The "YAGNI with discipline" principle wins. If Medium tasks turn out to be wasteful with FULL, add classify-task as a future enhancement with a typed artifact (not text parsing).
 **Architecture (E as designed):**
 ```
@@ -549,7 +549,7 @@ src/coordinators/
 1. **Routing determines spawn order, not context shape.** The routing layer (`routeTask()`) produces a `PipelineMode` variant. It does NOT know what context to pass to each spawned session. Context injection is entirely the responsibility of each mode coordinator (full-pipeline.ts, implement.ts, etc.), not the routing layer.
-2. **FULL pipeline phase order is: `wr.discovery` -> `wr.shaping` -> `coding-task-workflow-agentic` -> review -> merge.** If the context-passing agent's design changes this order (e.g., by making shaping optional based on discovery findings), the `runFullPipeline()` function must be updated accordingly. The routing layer itself does not need to change.
+2. **FULL pipeline phase order is: `wr.discovery` -> `wr.shaping` -> `wr.coding-task` -> review -> merge.** If the context-passing agent's design changes this order (e.g., by making shaping optional based on discovery findings), the `runFullPipeline()` function must be updated accordingly. The routing layer itself does not need to change.
 3. **pitch.md is the canonical Shaping->Coding handoff.** The `IMPLEMENT` mode routes directly to coding because `current-pitch.md` already exists. The coding-task Phase 0.5 detects it and uses it. If the context-passing agent introduces a different handoff mechanism (e.g., coordinator-injected context instead of a file), the `IMPLEMENT` mode coordinator needs to inject that context at spawn time rather than relying on Phase 0.5 file detection.
@@ -582,8 +582,8 @@ The adaptive coordinator uses **pure static routing with per-mode file decomposi
 |------|---------------|
 | `REVIEW_ONLY` | `mr-review-workflow.agentic.v2` → verdict routing (clean: merge, minor: fix-loop, blocking: escalate) |
 | `QUICK_REVIEW` | same as REVIEW_ONLY with lighter model config |
-| `IMPLEMENT` | `coding-task-workflow-agentic` (Phase 0.5 reads pitch.md) → PR → `mr-review-workflow.agentic.v2` → merge |
-| `FULL` | `wr.discovery` → `wr.shaping` → `coding-task-workflow-agentic` → PR → `mr-review-workflow.agentic.v2` → merge |
+| `IMPLEMENT` | `wr.coding-task` (Phase 0.5 reads pitch.md) → PR → `mr-review-workflow.agentic.v2` → merge |
+| `FULL` | `wr.discovery` → `wr.shaping` → `wr.coding-task` → PR → `mr-review-workflow.agentic.v2` → merge |
 **File architecture (Candidate E):**
 ```
@@ -633,7 +633,7 @@ const COORDINATOR_MAX_MS = 120 * 60 * 1000;      // 120 min total coordinator wa
 - Routing decision is logged as traceability JSON before any session spawn
 - FULL pipeline: each phase is an independent escalation point (discovery-fail, shaping-fail, coding-fail each escalate independently)
-**Why LLM classification (classify-task-workflow) was excluded:**
+**Why LLM classification (wr.classify-task) was excluded:**
 After adversarial challenge, CLASSIFY_AND_RUN mode was removed. The LLM classification path adds non-determinism and format-parsing fragility (notes parsing vs typed artifact) for no concrete MVP benefit. All 5 stated use cases are covered by static rules. The upgrade path to add classify-task as a Tier 2 fallback exists when evidence shows >5% misrouting in production.
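
The pure static routing and per-mode sequences described in this design can be sketched roughly as follows. The mode names, workflow IDs, and the `.workrail/current-pitch.md` path come from the document. The regexes, the `PipelineMode` field names, the injected `fileExists` callback, and the relative order of the first two rules are assumptions made for illustration, since the diff elides those details.

```typescript
// Hypothetical discriminated union; field names are illustrative only.
type PipelineMode =
  | { kind: 'REVIEW_ONLY'; prNumber: number }
  | { kind: 'QUICK_REVIEW' }
  | { kind: 'IMPLEMENT'; pitchPath: string }
  | { kind: 'FULL' };

const PITCH_PATH = '.workrail/current-pitch.md';
const REVIEW_WORKFLOW = 'mr-review-workflow.agentic.v2';

// Static rules only, no LLM call. File access is injected so the
// function stays pure and testable (an assumption of this sketch).
function routeTaskSketch(
  goal: string,
  fileExists: (relativePath: string) => boolean,
): PipelineMode {
  const pr = /#(\d+)\b/.exec(goal); // matches "#123" and "PR #123"
  if (pr) return { kind: 'REVIEW_ONLY', prNumber: Number(pr[1]) };

  const isDepBump = /\bdep(?:endency)?[ -]bump\b/i.test(goal);
  const isChore = /^chore:/i.test(goal.trim());
  if (isDepBump || isChore) return { kind: 'QUICK_REVIEW' };

  if (fileExists(PITCH_PATH)) return { kind: 'IMPLEMENT', pitchPath: PITCH_PATH };

  return { kind: 'FULL' }; // ambiguous tasks default to FULL
}

function assertNever(value: never): never {
  throw new Error(`Unhandled pipeline mode: ${JSON.stringify(value)}`);
}

// Workflow sequence per mode, taken from the table in the design doc.
// The switch is exhaustive: adding a mode without a case fails to compile.
function workflowsFor(mode: PipelineMode): readonly string[] {
  switch (mode.kind) {
    case 'REVIEW_ONLY':
    case 'QUICK_REVIEW':
      return [REVIEW_WORKFLOW];
    case 'IMPLEMENT':
      return ['wr.coding-task', REVIEW_WORKFLOW];
    case 'FULL':
      return ['wr.discovery', 'wr.shaping', 'wr.coding-task', REVIEW_WORKFLOW];
    default:
      return assertNever(mode);
  }
}
```

Because every branch is a plain predicate on the goal text or the workspace, the same input always yields the same mode, which is the determinism property the design gives as its reason for excluding LLM classification.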

package/docs/design/agent-cascade-protocol.md CHANGED

@@ -46,7 +46,7 @@ WorkRail defines three distinct tiers of execution. The system automatically sel
 How does WorkRail know which tier to use? It uses a **"Verify then Delegate"** pattern (The Probe Protocol).
 ### 1. The Boot Check (Diagnostic Phase)
-When a session starts (or via the `workflow-diagnose-environment` workflow), WorkRail guides the Main Agent to probe the environment:
+When a session starts (or via the `wr.diagnose-environment` workflow), WorkRail guides the Main Agent to probe the environment:
 1.  **Check for Subagents:** "Do you have a 'Researcher' subagent?"
     *   *No:* **Fallback to Tier 1 (Solo).**
@@ -74,7 +74,7 @@ When executing a workflow step that calls for a specialized routine:
 To support this protocol, WorkRail provides:
-1.  **The Diagnostic Workflow:** A guided utility (`workflow-diagnose-environment.json`) to help users verify and configure their agents.
+1.  **The Diagnostic Workflow:** A guided utility (`wr.diagnose-environment.json`) to help users verify and configure their agents.
 2.  **The Asset Pack:** Standardized definitions for common roles (Researcher, Architect, Builder, Reviewer) that users can copy-paste into their IDE configs.
     *   Includes System Prompts (for Tiers 1-3).
     *   Includes Tool Whitelists (for enabling Tier 3).