@exaudeus/workrail 3.74.1 → 3.74.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -65,26 +65,50 @@ No proposed solutions here -- just the problem.]
 
  ### wr.coding-task forEach loop exposes broken agent-facing state (Apr 30, 2026)
 
- **Status: bug** | Priority: high
+ **Status: done** | Shipped May 1, 2026 (PR #926)
 
  **Score: 13** | Cor:3 Cap:1 Eff:2 Lev:2 Con:3 | Blocked: no
 
- The `phase-6-implement-slices` loop (forEach over `slices`) ran correctly mechanically -- it iterated all 8 slices and stopped. But the agent-facing representation was broken in ways that violate WorkRail's promise of consistency and determinism:
+ **Root cause (diagnosed Apr 30, 2026):** The agent wrote `slices` as an array of plain strings (`["1: slice name", ...]`) instead of objects (`[{name: "...", ...}]`). The engine accepted the array (it was an array), entered the loop, and `{{currentSlice.name}}` silently resolved to `[unset]` on every iteration because strings don't have a `.name` property.
 
- 1. **`currentSlice.name` showed `[unset]`** -- the agent was inside a forEach loop over `slices` with `itemVar: "currentSlice"`, but the template variable wasn't being projected into sessionContext before rendering. The agent couldn't see which slice it was on. This is an engine rendering issue in `buildLoopRenderContext` / `prompt-renderer.ts`.
+ **Shipped (PR #926):**
+ 1. **forEach shape guard** (`workflow-interpreter.ts`): at iteration 0, if the body uses `{{itemVar.field}}` dot-path access but the items array contains primitives, returns `LOOP_MISSING_CONTEXT` with a message naming the actual type and a preview of the bad value. The loop never enters with broken state.
+ 2. **Diagnostic `[unset]` messages** (`context-template-resolver.ts`): when dot-path navigation fails mid-path due to a type mismatch (e.g. `currentSlice` is a string), the rendered prompt now shows `[unset: currentSlice.name -- 'currentSlice' is string ("1: Auth..."), not object]` instead of just `[unset: currentSlice.name]`.
 
- 2. **Agent emitted `wr.loop_control` artifacts that had no effect** -- the forEach loop silently ignores these. The agent did useless work the engine discarded without signaling that this was happening. A correct system should either prevent the agent from emitting artifacts that can't affect the loop, or tell the agent explicitly that artifact-based exit isn't available in this loop type.
+ **Remaining open (separate items):** context contract enforcement (systemic fix), `todoList` abstraction, `wr.loop_control` shown in forEach prompts.
 
- 3. **Loop presented as "Pass N of 20" not "Slice 3 of 8"** -- the framing confused the agent about what was happening. The agent should be told it's iterating over concrete slices, not burning through a budget.
+ **GitHub issue:** https://github.com/EtienneBBeaulac/workrail/issues/920
+
+ ---
 
- The forEach loop *worked* but the agent experience was wrong. This matters because WorkRail's value is that agents should not be confused about their own loop state. An agent that emits useless artifacts, can't see its own iteration variable, and misunderstands whether the loop is progress-based or budget-based is not operating under the deterministic, correct framework WorkRail promises.
+ ### Context contract: steps must declare required and produced context keys (Apr 30, 2026)
 
- **GitHub issue:** https://github.com/EtienneBBeaulac/workrail/issues/920
+ **Status: tentative** | Priority: medium
+
+ **Score: 12** | Cor:3 Cap:2 Eff:1 Lev:3 Con:2 | Blocked: no
+
+ The engine has no mechanism to enforce context between steps. `Capture:` instructions in step prompts are prose -- the engine accepts `continue_workflow` with empty context on every advance, silently. This is the systemic root of the forEach `[unset]` bug: the agent wrote planning output as notes, not as context, and the engine accepted every advance without complaint. The same failure can happen in any workflow that passes state between steps.
+
+ **Things to hash out:**
+ - What schema format should `contextContract` use -- JSON Schema subset or a simpler workrail-specific type DSL?
+ - Should validation be blocking (engine rejects the advance) or advisory (engine warns in the next step prompt)?
+ - Does context contract cover loop entry preconditions, or does the separate forEach guard item handle that?
+
+ ---
+
+ ### `todoList` step type: ergonomic abstraction over forEach (Apr 30, 2026)
+
+ **Status: idea** | Priority: medium
+
+ **Score: 10** | Cor:2 Cap:3 Eff:1 Lev:2 Con:2 | Blocked: no
+
+ Workflow authors using forEach must manually wire a prior step to populate the items array, understand iteration variables, avoid emitting `wr.loop_control` artifacts (which have no effect in forEach), and explain the loop framing to the agent. The forEach shape guard (PR #926) now catches primitive-item arrays loudly at loop entry, but the wiring between "the step that produces items" and "the loop that consumes them" remains implicit and invisible to the engine. The `todoList` abstraction would make this wiring structural.
 
  **Things to hash out:**
- - Is `currentSlice.name = [unset]` a bug in `buildLoopRenderContext` (engine fix needed), or is it a workflow authoring issue (the slices array items don't have a `name` property)?
- - Should the engine prevent agents from emitting `wr.loop_control` artifacts inside forEach loops, or simply document that they have no effect?
- - Should forEach loops surface iteration progress ("slice 3 of 8") differently than while loops ("pass 3 of 20") in the step header text?
+ - Should `todoList` compile to a forEach loop at the engine layer, or be a new execution primitive?
+ - How does the setup step that produces the items array get authored -- inline prompt, routine reference, or both?
+ - What does the agent-facing presentation look like: "Item 3 of 8" with item content injected, or something else?
+ - Should `wr.loop_control` artifacts be stripped from the step prompt entirely in a `todoList`, or does the agent still need an explicit completion signal?
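The "compile to forEach" option in the first bullet could look like this sketch, where one `todoList` declaration expands into a setup step plus a forEach wired to its output. All field names here are hypothetical:

```typescript
// Hypothetical compilation of a todoList step into setup + forEach, making the
// producer/consumer wiring (currently "implicit and invisible") structural.

interface TodoListStep {
  type: "todoList";
  id: string;
  setupPrompt: string; // produces the items array
  itemPrompt: string;  // run once per item
}

interface ForEachLoop {
  type: "forEach";
  id: string;
  items: string;   // context key holding the array
  itemVar: string;
  body: { prompt: string };
}

function compileTodoList(step: TodoListStep): { setup: { id: string; prompt: string }; loop: ForEachLoop } {
  // The loop consumes exactly the key the setup step is contracted to produce.
  const itemsKey = `${step.id}.items`;
  return {
    setup: { id: `${step.id}-setup`, prompt: step.setupPrompt },
    loop: {
      type: "forEach",
      id: `${step.id}-loop`,
      items: itemsKey,
      itemVar: "currentItem",
      body: { prompt: step.itemPrompt },
    },
  };
}
```

Because the engine generates both halves, the items key can never be misspelled or forgotten by a workflow author.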
 
  ---
 
@@ -234,6 +258,71 @@ This is exactly what happened with the commit SHA change: setting `agentCommitSh
 
  The autonomous workflow runner (`worktrain daemon`). Completely separate from the MCP server -- calls the engine directly in-process.
 
 
+ ### Living work context: shared knowledge document that accumulates across the full pipeline (Apr 30, 2026)
+
+ **Status: idea** | Priority: high
+
+ **Score: 13** | Cor:3 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
+
+ When a multi-agent pipeline runs -- discovery → shaping → coding → review → fix → re-review -- no agent has a complete picture of what came before it. The coding agent has the goal. The review agent has the code. The fix agent has the findings. None of them have the accumulated context from the full pipeline: why this approach was chosen over alternatives, what was ruled out, what constraints were discovered, what architectural decisions were made, what edge cases were handled, what the review found and why.
+
+ Each agent reconstructs intent from incomplete context, which is why review finds things coding missed (review doesn't know what the coding agent was trying to do), why fix sessions address symptoms without understanding causes (no access to the architectural reasoning), and why agents repeat work that earlier agents already did.
+
+ **The real need:** a **living work context document** that every agent in the pipeline both reads from and contributes to:
+
+ - **Discovery adds**: why this approach over alternatives, what was ruled out, constraints found
+ - **Shaping adds**: the bounded problem, no-gos, acceptance criteria -- the verifiable contract
+ - **Architecture/coding adds**: why specific decisions were made, what invariants must hold, what was deliberately deferred and why
+ - **Review adds**: what was found, the underlying reason it was missed, what the fix must address
+ - **Fix adds**: what was changed and why the fix is correct per the spec
+
+ The spec from shaping is one layer of this -- the *what to build* contract. But the full context also includes the *why* from discovery, the *how* decisions from coding, and the *what was missed* from review. All of it should be accessible to every downstream agent.
+
+ This is related to the "session knowledge log" backlog entry (agents appending to `session-knowledge.jsonl`) but is explicitly a **multi-agent shared artifact**, not a single session's private log. The coordinator is responsible for maintaining and passing this document to each spawned agent.
+
+ **Things to hash out:**
+ - What is the right format? A growing markdown document is human-readable but hard to query. Structured JSON is queryable but loses the narrative. A hybrid (structured frontmatter + narrative body) may be best.
+ - Where does it live? In the worktree (accessible to the coding agent)? In a well-known workspace path? In the session store (accessible to all agents via `read_artifact`)?
+ - Who owns writing to it -- the coordinator (scripts that have no LLM)? Each agent? Both?
+ - When a pure coordinator pipeline has no main agent, who synthesizes the discovery findings into the document? The discovery agent writes its own section; the coordinator passes it through. But synthesis across sections (connecting discovery constraints to coding decisions) requires reasoning.
+ - How does the review agent know which work context applies to the current PR? It needs to discover this without being told explicitly.
+ - What's the minimum viable version -- is just passing the shaped spec (`SPEC.md`) to the coding and review agents already a major improvement, even without the full living document?
+ - This is distinct from "context injection at dispatch time" (passing a static bundle) -- the living document evolves as the pipeline progresses. Does the coordinator update it after each phase completes?
+ - **Is "document" even the right abstraction?** A flat document implies agents read it linearly. But agents need to query it selectively -- the coding agent needs "what constraints affect this decision?", the review agent needs "what did the coding agent say about this module?". A structured knowledge store (typed facts, queryable by agent role and topic) may be more useful than a document. This connects to the knowledge graph backlog entry -- the work-unit knowledge store may be a per-pipeline instance of the same infrastructure. This is worth hashing out before designing the format.
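The "structured knowledge store" alternative in the last bullet can be sketched as typed facts queryable by phase and topic. Everything here is illustrative -- the phase names come from the pipeline above, but the `ContextFact` shape is invented:

```typescript
// Hypothetical queryable work-context store: typed facts instead of a flat document.

type PipelinePhase = "discovery" | "shaping" | "coding" | "review" | "fix";

interface ContextFact {
  phase: PipelinePhase; // which agent contributed it
  topic: string;        // e.g. "constraint", "decision", "finding"
  body: string;         // the narrative content
}

class WorkContext {
  private facts: ContextFact[] = [];

  add(fact: ContextFact): void {
    this.facts.push(fact);
  }

  // Selective query, e.g. the coding agent asking "what constraints apply?"
  query(filter: { phase?: PipelinePhase; topic?: string }): ContextFact[] {
    return this.facts.filter(
      (f) =>
        (filter.phase === undefined || f.phase === filter.phase) &&
        (filter.topic === undefined || f.topic === filter.topic)
    );
  }
}
```

A hybrid remains possible: the narrative body of each fact preserves the story, while the `phase`/`topic` tags make it queryable.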
+
+ ---
+
+ ### Move backlog to a dedicated worktrain-meta repo (Apr 30, 2026)
+
+ **Status: idea** | Priority: high
+
+ **Score: 11** | Cor:2 Cap:2 Eff:2 Lev:3 Con:3 | Blocked: no
+
+ The backlog (`docs/ideas/backlog.md`) lives in the code repo, which means every feature branch has its own version of it. Ideas added mid-session on a feature branch are held hostage until that PR merges. If two branches both modify the backlog, git merge conflicts occur. There is no single authoritative place to add an idea that immediately applies everywhere.
+
+ **Proposed fix:** move the backlog to a dedicated `worktrain-meta` repo (e.g. `~/git/personal/worktrain-meta/`). This is a separate git repo that is never branched for feature work -- you commit and push directly to main whenever an idea is added. Full git history is preserved. No code branch ever touches it. WorkTrain daemon sessions and the `npm run backlog` script are configured with the path to this repo.
+
+ **Why separate repo over a dedicated branch in this repo:**
+ - A dedicated branch in this repo can be accidentally contaminated by a rebase or merge
+ - CI runs on every push to a branch here -- wasting resources on docs-only changes
+ - The backlog lifecycle (ideas, grooming, scoring) is independent of the code release cycle -- they should be independent repos
+ - When native backlog operations (structured data, SQLite) are built later, the backlog is already isolated and the migration doesn't touch the code repo
+
+ **Migration steps:**
+ 1. Create `~/git/personal/worktrain-meta/` git repo, push to GitHub as a new repo
+ 2. Move `docs/ideas/backlog.md` there as the initial commit
+ 3. Update `scripts/backlog-priority.ts` path
+ 4. Update AGENTS.md reference to `npm run backlog`
+ 5. Update daemon-soul.md and any session context that references the backlog path
+ 6. Add `backlogRepoPath` to `~/.workrail/config.json` so the daemon knows where to find it
+
+ **Things to hash out:**
+ - Should the worktrain-meta repo also hold other cross-cutting artifacts like planning docs, the now-next-later roadmap, open-work-inventory? Or just the backlog?
+ - How do subagents spawned in a worktree find the backlog? They need the path configured, not relative to the code workspace.
+ - When native structured backlog operations are built, does the storage backend (SQLite) live in worktrain-meta or in `~/.workrail/data/`? The history requirement points toward worktrain-meta (git-tracked), but query performance points toward `~/.workrail/data/` (local database).
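Migration step 6 implies a small resolution shim in the daemon and scripts. A sketch under the assumption that `backlogRepoPath` is a top-level key in `~/.workrail/config.json` (the fallback logic and helper name are hypothetical):

```typescript
// Hypothetical resolution of the backlog location once it moves out of the
// code repo: prefer the configured worktrain-meta path, fall back to in-repo.

import * as fs from "fs";
import * as path from "path";

interface WorkrailConfig {
  backlogRepoPath?: string;
}

function resolveBacklogPath(configFile: string): string {
  let config: WorkrailConfig = {};
  if (fs.existsSync(configFile)) {
    config = JSON.parse(fs.readFileSync(configFile, "utf8"));
  }
  return config.backlogRepoPath
    ? path.join(config.backlogRepoPath, "backlog.md")
    : path.join(process.cwd(), "docs", "ideas", "backlog.md"); // pre-migration fallback
}
```

Keeping the fallback means `scripts/backlog-priority.ts` and subagents in worktrees work before, during, and after the migration.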
+
+ ---
+
 
  ### Subagent context package: project vision and task goal baked into spawning (Apr 30, 2026)
 
  **Status: idea** | Priority: high
@@ -333,21 +422,7 @@ Five dimensions, each scored 1-3. Score = sum (max 15). Items marked **Blocked**
 
  ### `delivery_failed` unreachable in `getChildSessionResult` -- type promises more than code delivers (Apr 30, 2026)
 
- **Status: bug** | Priority: medium
-
- **Score: 10** | Cor:3 Cap:1 Eff:2 Lev:2 Con:2 | Blocked: no
-
- `ChildSessionResult` has `reason: 'delivery_failed'` as a variant of `kind: 'failed'`. However `fetchChildSessionResult` in `coordinator-deps.ts` reads session status through `ConsoleService.getSessionDetail`, which returns statuses like `complete`/`blocked`/`in_progress` -- it never returns a `delivery_failed` status. `delivery_failed` is a `TriggerRouter`-level concept (callbackUrl POST failure) that is not stored as a session status in the event log. Child sessions spawned via `spawnSession`/`spawnAndAwait` have no `callbackUrl` and cannot produce it through this code path.
-
- The result: coordinators using `getChildSessionResult` can never observe `reason: 'delivery_failed'`, even though the type says they might. This violates the "make illegal states unrepresentable" principle -- the type union promises a variant the implementation cannot produce on this path.
-
- **Architectural fix (not a comment):** surface `delivery_failed` through session status. When `TriggerRouter` records a `delivery_failed` outcome, write a corresponding session event or status that `ConsoleService.getSessionDetail` returns. Then `fetchChildSessionResult` can map it correctly. This closes the gap between what the type promises and what the infrastructure delivers.
-
- Alternative: if `spawnSession`/`spawnAndAwait` child sessions genuinely cannot have `delivery_failed` outcomes by design, remove `reason: 'delivery_failed'` from `ChildSessionResult` entirely and document that it only exists in `spawn_agent`'s direct outcome mapping.
-
- **Things to hash out:**
- - Should `delivery_failed` be surfaced through ConsoleService (requires touching session status storage), or removed from `ChildSessionResult` since the `spawnSession` path provably cannot produce it?
- - If surfaced: what event or field in the session store carries this status, and how does ConsoleService project it?
+ **Status: done** | Fixed in `cd8aaeb8` -- `delivery_failed` removed from `ChildSessionResult` entirely. The `spawnSession`/`spawnAndAwait` path cannot produce it by design; it only exists in `spawn_agent`'s direct outcome mapping.
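The shape of the fix is a union narrowing. This is an illustrative reconstruction, not the actual `cd8aaeb8` code -- the exact variant fields and the other `reason` values are assumptions:

```typescript
// Before: the union promised a variant this code path could never produce.
type ChildSessionResultBefore =
  | { kind: "complete"; output: string }
  | { kind: "failed"; reason: "agent_error" | "timeout" | "delivery_failed" };

// After: every variant a coordinator can observe is one the implementation
// can actually deliver -- "illegal states unrepresentable" restored.
type ChildSessionResult =
  | { kind: "complete"; output: string }
  | { kind: "failed"; reason: "agent_error" | "timeout" };

function describe(result: ChildSessionResult): string {
  // Exhaustive switch: no dead branch for an unreachable variant.
  switch (result.kind) {
    case "complete":
      return `done: ${result.output}`;
    case "failed":
      return `failed (${result.reason})`;
  }
}
```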
 
  ---
 
@@ -367,19 +442,27 @@ Alternative: if `spawnSession`/`spawnAndAwait` child sessions genuinely cannot h
 
  ### Daemon architecture: remaining migrations (Apr 29, 2026)
 
- **Status: partial** | A9 shipped Apr 29, 2026.
+ **Status: partial** | A9 shipped Apr 29, 2026. FC/IS follow-on shipped Apr 30 -- May 1, 2026.
 
  **Score: 8** | Cor:1 Cap:1 Eff:2 Lev:1 Con:3 | Blocked: no
 
  Track A (A1-A9) shipped and the `SessionSource` migration is complete. `WorkflowTrigger._preAllocatedStartResponse` is gone.
 
+ **Shipped Apr 30 -- May 1, 2026 (PR #925):**
+ - `TerminalSignal` union replaces `stuckReason` + `timeoutReason`. Illegal state (stuck AND timeout simultaneously) now structurally impossible. Stall overwrite bug fixed. `Readonly<SessionState>` at pure read sites.
+ - `SessionScope` capability boundary complete: `onTokenUpdate`, `onIssueReported`, `onSteer`, `getCurrentToken`, `sessionWorkspacePath`, spawn depths all named scope fields. `constructTools` signature is `(ctx, apiKey, schemas, scope)` -- zero direct `state.X` references.
+ - Early-exit paths unified through `finalizeSession`. `SteerRegistry`/`AbortRegistry` dead exports removed.
+ - Architecture tests enforce `state.terminalSignal` write restriction and `constructTools` state-access restriction in CI.
+ - `persistTokens` failure early-exit path covered by new outcome invariants tests.
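The `TerminalSignal` replacement is the same "illegal states unrepresentable" move: one discriminated union instead of two independent nullable fields. A sketch -- the variant names and the first-signal-wins rule are assumptions about how the stall overwrite bug was closed, not the PR #925 code:

```typescript
// Before: nothing stopped both reasons from being set simultaneously.
interface SessionStateBefore {
  stuckReason?: string;
  timeoutReason?: string;
}

// After: at most one terminal signal can exist at a time.
type TerminalSignal =
  | { kind: "stuck"; reason: string }
  | { kind: "timeout"; reason: string }
  | { kind: "none" };

// All writes funnel through one function (the write restriction the
// architecture tests enforce); the first terminal signal wins.
function setTerminalSignal(current: TerminalSignal, next: TerminalSignal): TerminalSignal {
  return current.kind === "none" ? next : current;
}
```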
+
 
  **Remaining items:**
 
  - `CriticalEffect<T>` / `ObservabilityEffect` type distinction -- categorize side effects in `runAgentLoop` and finalization as either crash-relevant or observability-only
- - `StateRef` mutation wrapper -- replace direct `state.pendingSteerParts.push()` mutations with an explicit mutation API
  - Zod tool param validation -- replace manual `typeof` checks in tool factories with Zod schema validation (requires `zodToJsonSchema` or maintaining two sources of truth for param schemas)
  - `createCoordinatorDeps` unit tests -- extraction in B3 improved testability; cover `spawnSession`, `awaitSessions`, `getAgentResult` at minimum
  - ~~Wire `AllocatedSession.triggerSource` to the `run_started` event for session attribution~~ -- **done**, PR #899 (Apr 30, 2026)
+ - ~~`SessionStateWriter` capability interfaces~~ -- **done** as part of PR #925 (`SessionScope` now owns all mutation callbacks)
+ - ~~Architecture test: forbid `state.terminalSignal =` direct writes outside `setTerminalSignal()`~~ -- **done**, PR #925
 
  ---
 
@@ -444,6 +527,8 @@ Phase 3 (PRs #835, #837): `buildTurnEndSubscriber`, `buildAgentCallbacks`, `buil
 
  **Total workflow-runner.ts reduction: ~4,955 → ~2,800 lines (44%).**
 
+ **FC/IS follow-on (PR #925, Apr 30 -- May 1, 2026):** `TerminalSignal` union, `SessionScope` capability boundary completion, early-exit unification through `finalizeSession`, architecture tests. See "Daemon architecture: remaining migrations" entry for full details.
+
  **Follow-on:** `wr.refactoring` workflow (see backlog entry above). Remaining items in "Daemon architecture: remaining migrations" entry below.
 
  ---
 
1249
1334
 
1250
1335
  ---
1251
1336
 
1337
+ ### Task-scoped rules: step-level rule injection by task type (Apr 30, 2026)
1338
+
1339
+ **Status: idea** | Priority: medium
1340
+
1341
+ **Score: 10** | Cor:2 Cap:3 Eff:2 Lev:2 Con:2 | Blocked: no
1342
+
1343
+ Workspace rules today are injected globally -- every session gets the same rules regardless of what the session is doing. This means PR-opening rules, issue-creation rules, commit message rules, and merge rules are all visible to a discovery session that will never do any of those things. Worse, a PR-opening step in a coding workflow doesn't get the rules injected precisely when it needs them -- they're diluted in the full rules blob. There is no mechanism to say "inject these rules only when the agent is about to open a PR" or "inject these rules only when creating a GitHub issue."
1344
+
1345
+ The idea: a rule declaration mechanism (either in the workflow step definition or in a workspace rules file) that tags rules by task type. At step execution time, the engine injects only the rules tagged for that step's declared task type. Examples: a step with `taskType: 'git.open_pr'` automatically receives PR-opening rules; a step with `taskType: 'github.create_issue'` receives issue-creation rules. Rules not tagged for the current task type are not injected into that step's prompt. This is complementary to the phase-scoped rules preprocessing item -- phase scoping is coarse-grained (coding vs review), task scoping is fine-grained (which specific action within a step).
1346
+
1347
+ **Things to hash out:**
1348
+ - Where are task-scoped rules declared -- in the workflow step definition (`taskType` field), in a workspace rules file with tags, or both?
1349
+ - What is the taxonomy of task types -- is it an open string, a closed enum, or a hierarchical namespace (e.g. `git.*`, `github.*`, `jira.*`)?
1350
+ - Does this interact with the ephemeral per-turn injection idea? Task-scoped rules are a natural candidate for ephemeral injection -- visible when needed, not accumulated in history.
1351
+ - Should task-scoped rules override or augment the global rules? What is the precedence and load order?
1352
+ - Who authors the task-scoped rules -- the workflow author (in the workflow JSON) or the workspace operator (in a workspace rules file)? Both seem valid but have different ownership models.
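The filtering step itself is small regardless of how the taxonomy question resolves. A sketch assuming tagged rules and hierarchical prefix matching (the `taskTypes` field and the `.*` wildcard semantics are invented for illustration):

```typescript
// Hypothetical task-type rule filter: global rules always apply; tagged rules
// apply only when the step declares a matching task type.

interface Rule {
  text: string;
  taskTypes?: string[]; // e.g. ["git.open_pr"] or ["git.*"]; untagged = global
}

function rulesForStep(rules: Rule[], taskType: string | undefined): Rule[] {
  return rules.filter((rule) => {
    if (!rule.taskTypes) return true; // untagged rules stay global
    if (!taskType) return false;      // tagged rules need a declared task type
    return rule.taskTypes.some(
      // "git.*" matches "git.open_pr" but not "github.create_issue".
      (tag) => tag === taskType || (tag.endsWith(".*") && taskType.startsWith(tag.slice(0, -1)))
    );
  });
}
```

Note the prefix check compares against `"git."` (including the dot), so the `git.*` namespace cannot accidentally capture `github.*`.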
+
+ ---
+
 
  ### Rules preprocessing: normalize workspace rules before injection
 
  **Status: idea** | Priority: medium
@@ -2466,8 +2570,52 @@ A workflow that aggregates activity across git history, GitLab/GitHub MRs and re
 
  ---
 
+ ### Ephemeral per-turn context injection in the agent loop (Apr 30, 2026)
+
+ **Status: idea** | Priority: medium
+
+ **Score: 10** | Cor:1 Cap:3 Eff:2 Lev:2 Con:2 | Blocked: no
+
+ The agent loop injects content (rules, soul, workspace context) into the system prompt once at session start. This means rules and behavioral constraints consume tokens for the entire session history. For long-running sessions, this is wasteful: every LLM API call re-sends the full system prompt including rules that were injected 50 turns ago. The alternative -- injecting rules on every turn as a fresh user or system message -- keeps them current but pollutes the conversation history with repetitive injections that further inflate context. There is no mechanism to inject content that is "always fresh, never historical" -- present on every loop iteration but not accumulated in the turn-by-turn conversation log.
+
+ The desired behavior: certain content (rules, behavioral constraints, workspace context, soul principles) should be re-injected on every turn as an ephemeral "floating system message" that is visible to the LLM during inference but not stored in the conversation history. The LLM always sees it but it never grows the history.
+
+ **Things to hash out:**
+ - Does the Anthropic API (or other LLM providers) support a distinct ephemeral/volatile content slot that is not part of the messages array? If not, what is the closest approximation?
+ - Is this a system prompt update per turn, or a separate "ephemeral context" message type? The distinction affects how context windows are managed by the provider.
+ - Should ephemeral content be declared in the workflow (as a `volatileContext` field) or injected by the daemon's buildSystemPrompt() at the infrastructure level?
+ - Which content actually benefits from this -- rules/soul only, or also things like "current git status", "last test run output", workspace context that may change mid-session?
+ - Does this interact with the WorkRail engine's `continue_workflow` step injection? Step prompts are already injected per turn via `steer()` -- is this just a generalization of that mechanism?
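One approximation that works with any provider: keep volatile content out of the stored history and re-render it into the request on every call. A minimal sketch -- the class and callback names are invented, and this models the loop side only, not any provider-specific ephemeral slot:

```typescript
// Sketch of "always fresh, never historical": the volatile block is rebuilt
// per call and prepended to the request, but never appended to stored history.

interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

class AgentLoop {
  private history: Message[] = [];

  // volatileContext is re-evaluated each turn, so mid-session changes
  // (rules, git status, soul) are always current.
  constructor(private volatileContext: () => string) {}

  buildRequest(userTurn: string): Message[] {
    this.history.push({ role: "user", content: userTurn });
    return [
      { role: "system", content: this.volatileContext() }, // fresh every call
      ...this.history,                                      // grows; volatile does not
    ];
  }

  historyLength(): number {
    return this.history.length;
  }
}
```

The history grows by one message per turn while the volatile block contributes a constant token cost per call, never a compounding one.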
+
+ ---
+
 
  ## Platform Vision (longer-term)
 
+ ### Epic-mode: full autonomous delivery of a multi-task feature from discovery to merged PRs (Apr 30, 2026)
+
+ **Status: idea** | Priority: high
+
+ **Score: 10** | Cor:1 Cap:3 Eff:1 Lev:3 Con:1 | Blocked: yes (blocked by: living work context, coordinator pipeline operational end-to-end, spawn_agent depth + parallel worktree support)
+
+ Today WorkTrain handles one ticket at a time. An epic -- a feature that requires 5-10 interdependent changes across multiple files, modules, or services -- requires the operator to manually decompose it into tickets and dispatch each one separately. The decomposition, dependency ordering, and integration are all human work. This is the gap between "WorkTrain handles tickets" and "WorkTrain handles features."
+
+ The idea: a single operator action kicks off an end-to-end autonomous pipeline for an entire epic. A planning phase fully decomposes the epic into a dependency-ordered task graph. Each task is a concrete, independently-implementable unit of work. Dependent tasks wait for their predecessors to land. Independent tasks are dispatched simultaneously to parallel agents in separate worktrees. Each task produces a PR. PRs target each other in a chain (each PR's base branch is the previous task's feature branch, or a shared integration branch). A coordinator monitors progress, re-plans when a task produces unexpected output, and handles failures by re-dispatching or escalating. When all tasks are merged (in dependency order), the epic is done.
+
+ This is the feature that makes WorkTrain feel like it can take on real engineering work, not just isolated bug fixes and small features.
+
+ **Things to hash out:**
+ - What is the planning artifact? The decomposition step needs to produce a typed task graph -- not just a list of tasks, but explicit dependency edges, estimated scope per task, and the integration strategy (shared branch, stacked PRs, merge train). What schema captures this in a way the coordinator can route on deterministically?
+ - How are dependencies enforced? If task B depends on task A, does B's agent start only after A's PR is merged, or does it work against A's branch before merge? The latter is faster but requires the coordinator to handle A's branch being rebased or amended.
+ - How does the coordinator handle a task whose output invalidates the plan? If task A's implementation reveals a constraint that makes task C unnecessary or changes its scope, the coordinator needs to re-plan. How does task A signal this to the coordinator, and what does re-planning look like? Does it spawn a new planning agent, or does the coordinator apply deterministic rules?
+ - What is the integration strategy for parallel tasks that touch overlapping files? Two agents working in separate worktrees may produce conflicting changes. Is this detected at PR-open time (merge conflicts), at plan time (the planner tries to assign non-overlapping scopes), or both?
+ - What is the failure model? If one task in a 10-task epic fails after 3 tasks have merged, what happens to the already-landed work? The coordinator can't un-merge. Does it escalate to the operator, attempt a compensating task, or leave the partial state as-is?
+ - How does this interact with the living work context design? Each task agent needs context from the planning phase (what the epic is trying to accomplish, what other tasks are doing, what invariants the whole feature must satisfy). This is exactly the cross-session context problem but at epic scale -- the context store needs to accumulate across a task graph, not just a linear pipeline.
+ - What is the operator experience? Does the operator see a dashboard of all tasks in flight, their dependencies, and their status? Can they pause the epic, re-scope a task, or cancel a branch of the task graph mid-execution?
+
+ **Why it's high leverage despite low confidence:** getting this right makes WorkTrain the tool for large-scale autonomous development. Every other item in the backlog improves WorkTrain's reliability or quality for one ticket. This item changes the unit of work from "ticket" to "feature."
+
2617
+ ---
2618
+
2471
2619
  ### Move backlog to a dedicated worktrain-meta repo with version control (Apr 30, 2026)
2472
2620
 
2473
2621
  **Status: idea** | Priority: high
@@ -4074,3 +4222,55 @@ WorkTrain has no tooling to surface the state of worktrees and branches relative
4074
4222
 
4075
4223
  ---
4076
4224
 
4225
+
4226
+ ## WorkRail usage report as a mercury-mobile team script (May 4, 2026)
4227
+
4228
+ **Goal:** Make the WorkRail usage report dead simple to run for any mercury-mobile engineer -- one command, zero config beyond a GitLab token.
4229
+
4230
+ ### Distribution
4231
+
4232
+ - Lives in mercury-mobile's common-ground team directory (`src/teams/mercury/mercury-mobile/scripts/workrail-report.sh`)
4233
+ - Distributed to every mercury engineer's machine by common-ground via `make sync`
4234
+ - Runnable as `~/.cg/dist/scripts/workrail-report.sh` or wrapped as a skill/alias
4235
+
4236
+ ### What it does
4237
+
4238
+ 1. Reads `~/.cg/config.toml` for the engineer's team identity
4239
+ 2. Reads `~/.cg/repo-list.cache` to resolve repo names to local paths
4240
+ 3. Scans `~/.workrail/data/sessions/` for sessions in the report window -- this is the authoritative source of what repos WorkRail was used on
4241
+ 4. Fetches GitLab MRs via API for each repo that had sessions
4242
+ 5. Builds the HTML report and writes to `~/Downloads/workrail-report-YYYY-MM-DD.html`
4243
+ 6. Auto-opens the report
4244
+
4245
+ ### Configuration
4246
+
4247
+ - **Token:** checks `GITLAB_TOKEN` env var → `~/.cg/secrets` → prompts once and offers to save. Zero setup if engineer already has `GITLAB_TOKEN` set.
4248
+ - **Date range:** defaults to last 30 days rolling. Override via `WORKRAIL_REPORT_DAYS=60 ./workrail-report.sh` or `--days 90` flag.
4249
+ - **Nothing else** -- team, repos, and GitLab paths are all auto-detected.
4250
+
4251
+ ### Report behavior
4252
+
4253
+ - Only shows repos where WorkRail sessions exist in the window -- absence is signal, not a bug
4254
+ - Repos worked in outside WorkRail simply don't appear (the report is a WorkRail usage report, not a total productivity report)
4255
+ - "WorkRail shipped" correlation tab disabled in distributed version -- too expensive to run automatically. Available as a separate manual step for advanced users.
4256
+
4257
+ ### Error handling
4258
+
4259
+ - No WorkRail installed → clear message with install instructions
4260
+ - No sessions in window → "No WorkRail activity in the last 30 days" with suggestion to check date range
4261
+ - No GitLab token → prompt with instructions for creating one
4262
+ - Repo not cloned locally → skip with note (LOC stats require local clone, rest of report works without it)
4263
+
4264
+ ### Non-goals
4265
+
4266
+ - Not a team-level aggregated report (that's a future feature once `triggerSource` attribution is built)
4267
+ - Not a real-time dashboard
4268
+ - Not responsible for repos where WorkRail wasn't used
4269
+
4270
+ ### Depends on
4271
+
4272
+ - The shared report scripts (`01-collect-sessions.py`, `02-collect-commits.py`, `04-build-html.py`) being stable -- ship this only after those are solid
4273
+ - `triggerSource: 'daemon' | 'mcp'` attribution (backlog) for distinguishing autonomous vs manual sessions -- not blocking but would improve the report
4274
+ - Common-ground `make sync` distributing the script reliably
4275
+
4276
+ **Priority:** Medium. The shared scripts work and have been tested. Main remaining work is the shell wrapper, token storage, and integration with common-ground's team config.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@exaudeus/workrail",
- "version": "3.74.1",
+ "version": "3.74.3",
  "description": "Step-by-step workflow enforcement for AI agents via MCP",
  "license": "MIT",
  "repository": {
@@ -1,18 +1,20 @@
  {
  "id": "wr.routine-tension-driven-design",
  "name": "Tension-Driven Design Generation",
- "version": "1.0.0",
+ "version": "1.2.0",
  "metricsProfile": "none",
- "description": "Generates design candidates by deeply understanding a problem, identifying real tensions and constraints (including the dev's philosophy), and producing candidates that resolve those tensions differently. Each candidate includes explicit tradeoffs, failure modes, and philosophy alignment. Replaces perspective-based generation with constraint-driven generation for higher-quality, genuinely diverse candidates.",
+ "features": [
+ "wr.features.capabilities",
+ "wr.features.subagent_guidance"
+ ],
+ "description": "Generates design candidates grounded in real tensions. Supports standalone and parallel-executor modes. In executor mode, anchors to an assigned focus angle from the goal string and ranks candidates without recommending a winner -- the calling agent owns synthesis and selection.",
  "clarificationPrompts": [
- "What problem should this design solve?",
+ "What problem should this design solve? (When spawned as an executor, the full fact packet and assigned focus angle should be in the goal string.)",
  "What acceptance criteria, invariants, and constraints must it respect?",
- "Are `philosophySources` and `philosophyConflicts` available in context? (if not, I will discover from scratch)",
- "What artifact name should I produce?"
+ "What artifact name should I produce? (Default: design-candidates.md)"
  ],
  "preconditions": [
- "Problem statement is available",
- "Acceptance criteria and non-goals are available",
+ "Problem statement is available -- either in the goal string (executor mode) or as context the agent can discover",
  "Relevant files, patterns, or codebase references are available",
  "Agent has read access to the codebase"
  ],
@@ -22,41 +24,50 @@
  "PHILOSOPHY: the dev's coding philosophy is a design constraint, not an afterthought review lens. Discover it and use it.",
  "SIMPLICITY BIAS: always consider whether the problem even needs an architectural solution. The simplest change that works is a valid candidate.",
  "REPO PATTERNS: study how the codebase already solves similar problems. The best design often adapts an existing pattern.",
- "HONESTY: for each candidate, state what you gain, what you give up, and how it fails. Optimize for useful comparison, not persuasion."
+ "HONESTY: for each candidate, state what you gain, what you give up, and how it fails. Optimize for useful comparison, not persuasion.",
+ "EXECUTOR MODE: if goal contains 'FOCUS ANGLE:', you are a parallel executor. Generate candidates anchored to that angle only. Step 4 ranks, does not recommend. Main agent owns synthesis and selection.",
+ "OUTPUT FILE: use the filename from 'OUTPUT FILE:' in your goal string, defaulting to design-candidates.md."
  ],
  "steps": [
+ {
+ "id": "step-anchor-and-orient",
+ "title": "Step 0: Anchor to Assigned Context and Focus Angle",
+ "prompt": "Before doing any research or generation, read your goal string carefully and extract your operating context.\n\nFrom the goal string, extract and record:\n- **FOCUS ANGLE** (marked 'FOCUS ANGLE:' in the goal): your assigned generation angle. If present, ALL candidates must be anchored to this. You are in executor mode.\n- **PROBLEM** (marked 'PROBLEM:' or 'REFRAMED PROBLEM:'): what you are solving\n- **TENSIONS** (marked 'TENSIONS:'): core tensions the main agent already identified -- use these, do not re-investigate\n- **DECISION CRITERIA** (marked 'CRITERIA:'): what the final direction must satisfy\n- **IDEAL END STATE** (marked 'IDEAL END STATE:'): what the best achievable outcome looks like\n- **RISKIEST ASSUMPTION** (marked 'RISKIEST ASSUMPTION:'): the assumption most likely to invalidate the design\n- **PHILOSOPHY SOURCES** (marked 'PHILOSOPHY:'): pointers to rules or repo files encoding the dev's philosophy\n- **OUTPUT FILE** (marked 'OUTPUT FILE:'): the filename to produce (default: design-candidates.md)\n\nIf your goal contains a FOCUS ANGLE:\n- State your angle explicitly at the top of your notes\n- Confirm what it means for generation: which tensions it asks you to prioritize, which assumptions it asks you to stress-test\n- You will NOT recommend a winner in step 4 -- you will rank. Selection belongs to the main agent.\n\nIf your goal contains no FOCUS ANGLE (standalone execution):\n- Note that you are running standalone\n- You will discover missing context in subsequent steps\n\nWorking notes:\n- Assigned focus angle (or 'standalone')\n- All fact packet fields extracted from goal\n- What this angle means for generation\n- Output filename\n- Executor vs standalone confirmation",
+ "agentRole": "You are anchoring to your assigned role before doing any work. Read the goal string carefully. Do not skip this step.",
+ "requireConfirmation": false
+ },
  {
  "id": "step-discover-philosophy",
  "title": "Step 1: Discover the Dev's Philosophy",
- "prompt": "Discover the dev's coding philosophy and preferences before designing anything.\n\nCheck `philosophySources` context variable first \u2014 if it contains pointers to rules, Memory entries, or repo files, go read those sources directly.\n\nIf `philosophySources` is empty or unavailable, discover from scratch:\n1. Memory MCP (if available): call `mcp_memory_conventions`, `mcp_memory_prefer`, `mcp_memory_recall` to retrieve learned preferences\n2. Active session rules / Firebender rules: read any rules or philosophy documents in context\n3. Repo patterns: infer preferences from how the codebase works \u2014 error handling, mutability, test style, architecture\n\nNote any `philosophyConflicts` (stated rules vs actual repo patterns).\n\nWorking notes:\n- Philosophy sources consulted\n- Key principles discovered\n- Conflicts between stated and practiced philosophy\n- Which principles are likely to constrain this design",
+ "prompt": "Discover the dev's coding philosophy and preferences before designing anything.\n\nIf PHILOSOPHY SOURCES were provided in step 0, go read those sources directly (they are file paths or Memory entry names).\n\nIf no philosophy sources were provided, discover from scratch:\n1. Memory MCP (if available): call `mcp_memory_conventions`, `mcp_memory_prefer`, `mcp_memory_recall` to retrieve learned preferences\n2. Active session rules / CLAUDE.md / AGENTS.md: read any rules or philosophy documents in context\n3. Repo patterns: infer preferences from how the codebase works -- error handling, mutability, test style, architecture\n\nNote any philosophy conflicts (stated rules vs actual repo patterns).\n\nWorking notes:\n- Philosophy sources consulted\n- Key principles discovered\n- Conflicts between stated and practiced philosophy\n- Which principles are likely to constrain this design",
  "agentRole": "You are discovering what the dev actually cares about before designing solutions.",
  "requireConfirmation": false
  },
  {
  "id": "step-understand-deeply",
  "title": "Step 2: Understand the Problem Deeply",
- "prompt": "Understand the problem before proposing anything.\n\nReason through:\n- What are the core tensions in this problem? (e.g., performance vs simplicity, flexibility vs type safety, backward compatibility vs clean design)\n- How does the codebase already solve similar problems? Study the most relevant existing patterns \u2014 analyze the architectural decisions and constraints they protect, not just list files.\n- Where does the problem most likely live? Is the requested location the real seam, or just where the symptom appears?\n- What nearby callers, consumers, sibling paths, or contracts must remain consistent if that boundary changes?\n- What's the simplest naive solution? Why is it insufficient? (If it IS sufficient, note that \u2014 it may be the best candidate.)\n- What makes this problem hard? What would a junior developer miss?\n- Which of the dev's philosophy principles are under pressure from this problem's constraints?\n\nWorking notes:\n- Core tensions (2-4 real tradeoffs, not generic labels)\n- Existing patterns analysis (decisions, invariants they protect)\n- Likely seam / plausible boundaries\n- Nearby impact surface that must stay consistent\n- Naive solution and why it's insufficient (or sufficient)\n- What makes this hard\n- Philosophy principles under pressure",
+ "prompt": "Understand the problem before proposing anything.\n\nIf TENSIONS and a REFRAMED PROBLEM were extracted in step 0, use them as your starting point -- do not re-investigate what the main agent already resolved. Build on that foundation and add what the main agent may have missed from your assigned angle's perspective.\n\nReason through:\n- What are the core tensions in this problem?\n- How does the codebase already solve similar problems? Study the most relevant existing patterns -- analyze the architectural decisions and constraints they protect, not just list files.\n- Where does the problem most likely live? Is the requested location the real seam, or just where the symptom appears?\n- What nearby callers, consumers, sibling paths, or contracts must remain consistent if that boundary changes?\n- What is the simplest naive solution? Why is it insufficient? (If it IS sufficient, note that -- it may be the best candidate.)\n- What makes this problem hard? What would a junior developer miss?\n- Which of the dev's philosophy principles are under pressure from this problem's constraints?\n- If in executor mode: what does the problem look like specifically from your assigned angle? What tensions does your angle ask you to prioritize?\n\nWorking notes:\n- Core tensions (2-4 real tradeoffs, not generic labels)\n- Existing patterns analysis (decisions, invariants they protect)\n- Likely seam / plausible boundaries\n- Nearby impact surface that must stay consistent\n- Naive solution and why it's insufficient (or sufficient)\n- What makes this hard\n- Philosophy principles under pressure\n- How your assigned angle (if in executor mode) shapes your view of the problem",
  "agentRole": "You are reasoning deeply about the problem space before generating any solutions.",
  "requireConfirmation": false
  },
  {
  "id": "step-generate-candidates",
  "title": "Step 3: Generate Candidates from Tensions",
- "prompt": "Generate design candidates that resolve the identified tensions differently.\n\nMANDATORY candidates:\n1. The simplest possible change that satisfies acceptance criteria. If the problem doesn't need an architectural solution, say so.\n2. Follow the existing repo pattern \u2014 adapt what the codebase already does for similar problems. Don't invent when you can adapt.\n\nAdditional candidates (1-2 more):\n- Each must resolve the identified tensions DIFFERENTLY, not just vary surface details\n- Each must be grounded in a real constraint or tradeoff, not an abstract perspective label\n- Consider philosophy conflicts: if the stated philosophy disagrees with repo patterns, one candidate could follow the stated philosophy and another could follow the established pattern\n\nFor each candidate, produce:\n- One-sentence summary of the approach\n- Which tensions it resolves and which it accepts\n- Boundary solved at, and why that boundary is the best fit\n- The specific failure mode you'd watch for\n- How it relates to existing repo patterns (follows / adapts / departs)\n- What you gain and what you give up\n- Impact surface beyond the immediate task\n- Scope judgment: too narrow / best-fit / too broad, with concrete evidence\n- Which philosophy principles it honors and which it conflicts with (by name)\n\nRules:\n- candidates must be genuinely different in shape, not just wording\n- if all candidates converge on the same approach, that's signal \u2014 note it honestly rather than manufacturing fake diversity\n- broader scope requires concrete evidence\n- cite specific files or patterns when they materially shape a candidate\n- specify each candidate at the level of concrete shape, not concept labels: 'tags' is not a candidate specification; 'per-workflow multi-labels drawn from a closed 9-value enum' is. If you find yourself using a concept label (tags, categories, events, hooks), you have not yet specified the candidate \u2014 name the data structure, the vocabulary or value set it uses, who maintains it, and how it is queried",
+ "prompt": "Generate design candidates that resolve the identified tensions differently.\n\nIf in executor mode (FOCUS ANGLE was set in step 0):\n- All candidates must be anchored to your assigned angle. Do not generate generic candidates that ignore it.\n- You are NOT required to include the simplest possible change or the standard repo-pattern candidate unless they genuinely arise from your angle. Those are covered by other executors or by the main agent's synthesis.\n- Generate 2-3 candidates that each explore your angle from a different sub-direction -- vary the scope, the boundary, or the tradeoff accepted, but keep all of them anchored to the angle.\n- One candidate should be the most ambitious expression of your angle. One should be the most constrained. Others fill the space between.\n\nIf running standalone:\n- MANDATORY candidates:\n 1. The simplest possible change that satisfies acceptance criteria. If the problem doesn't need an architectural solution, say so.\n 2. Follow the existing repo pattern -- adapt what the codebase already does for similar problems. Don't invent when you can adapt.\n- Additional candidates (1-2 more): each must resolve the identified tensions DIFFERENTLY, not just vary surface details.\n\nFor each candidate, produce:\n- One-sentence summary of the approach\n- Which tensions it resolves and which it accepts\n- Boundary solved at, and why that boundary is the best fit\n- The specific failure mode you'd watch for\n- How it relates to existing repo patterns (follows / adapts / departs)\n- What you gain and what you give up\n- Impact surface beyond the immediate task\n- Scope judgment: too narrow / best-fit / too broad, with concrete evidence\n- Which philosophy principles it honors and which it conflicts with (by name)\n\nRules:\n- Candidates must be genuinely different in shape, not just wording\n- If all candidates converge on the same approach, note it honestly rather than manufacturing fake diversity\n- Broader scope requires concrete evidence\n- Cite specific files or patterns when they materially shape a candidate\n- Specify each candidate at the level of concrete shape, not concept labels: 'typed store' is not a specification; 'append-only per-run JSON file at a deterministic path, written atomically via temp-rename, read by coordinator before each spawn' is",
  "agentRole": "You are generating genuinely diverse design candidates grounded in real tensions.",
  "requireConfirmation": false
  },
  {
  "id": "step-compare-and-recommend",
- "title": "Step 4: Compare via Tradeoffs and Recommend",
- "prompt": "Compare candidates through tradeoff analysis, not checklists.\n\nFor the set of candidates, assess:\n- Which tensions does each resolve best?\n- Which solves the problem at the best-fit boundary?\n- Which has the most manageable failure mode?\n- Which best fits the dev's philosophy? Where are the philosophy conflicts?\n- Which is most consistent with existing repo patterns?\n- Which would be easiest to evolve or reverse if assumptions are wrong?\n- Which is too narrow, best-fit, or too broad \u2014 and why?\n\nProduce a clear recommendation with rationale tied back to tensions, scope judgment, repo patterns, and philosophy. If two candidates are close, say so and explain what would tip the decision.\n\nSelf-critique your recommendation:\n- What's the strongest argument against your pick?\n- What narrower option might still work, and why did it lose?\n- What broader option might be justified, and what evidence would be required?\n- What assumption, if wrong, would invalidate this design?\n\nWorking notes:\n- Comparison matrix (tensions x candidates)\n- Recommendation and rationale\n- Strongest counter-argument\n- Pivot conditions",
- "agentRole": "You are comparing candidates honestly and recommending based on tradeoffs, not advocacy.",
+ "title": "Step 4: Compare Candidates",
+ "prompt": "Compare candidates through tradeoff analysis, not checklists.\n\nIf in executor mode (FOCUS ANGLE was set in step 0):\n- Do NOT select a winner or make a final recommendation. The main agent owns selection across the full cross-executor candidate set.\n- Rank your candidates by how well each serves your assigned angle. State which is the strongest expression of the angle, which is the most defensible fallback, and what tradeoff separates them.\n- For each candidate: the strongest argument for it from your angle, and the strongest argument against it that the main agent should weigh.\n- State what a candidate from a DIFFERENT angle would need to offer to beat your strongest candidate from this angle's perspective. This is the cross-angle boundary -- it helps the main agent understand where each angle's value runs out.\n\nIf running standalone:\n- Produce a clear recommendation with rationale tied back to tensions, scope judgment, repo patterns, and philosophy.\n- Self-critique: strongest argument against your pick, narrower option that might still work and why it lost, broader option that might be justified and what evidence would be required, assumption that if wrong would invalidate the design.\n\nWorking notes:\n- Ranking (executor) or recommendation (standalone)\n- Strongest argument for and against each candidate\n- Cross-angle boundary statement (executor mode only)\n- Pivot conditions",
+ "agentRole": "You are ranking or recommending honestly. In executor mode you are producing material for the main agent to synthesize -- not closing the decision.",
  "requireConfirmation": false
  },
  {
  "id": "step-deliver",
  "title": "Step 5: Deliver the Design Candidates",
- "prompt": "Create `{deliverableName}`.\n\nRequired structure:\n- Problem Understanding (tensions, likely seam, what makes it hard)\n- Philosophy Constraints (which principles matter, any conflicts)\n- Impact Surface (what nearby paths, consumers, or contracts must stay consistent)\n- Candidates (each with: summary, tensions resolved/accepted, boundary solved at, why that boundary is the best fit, failure mode, repo-pattern relationship, gains/losses, scope judgment, philosophy fit)\n- Comparison and Recommendation\n- Self-Critique (strongest counter-argument, pivot conditions)\n- Open Questions for the Main Agent\n\nThe main agent will interrogate this output \u2014 it is raw investigative material, not a final decision. Optimize for honest, useful analysis over polished presentation.",
+ "prompt": "Create the output file. Use the filename from OUTPUT FILE in your goal string, defaulting to `design-candidates.md` if none was specified.\n\nRequired structure:\n- Assigned Focus Angle (executor mode) or 'Standalone' -- state this first so the main agent knows the lens\n- Problem Understanding (tensions, likely seam, what makes it hard)\n- Philosophy Constraints (which principles matter, any conflicts)\n- Impact Surface (what nearby paths, consumers, or contracts must stay consistent)\n- Candidates (each with: summary, tensions resolved/accepted, boundary solved at, why that boundary is the best fit, failure mode, repo-pattern relationship, gains/losses, scope judgment, philosophy fit)\n- Ranking (executor mode: ranked by angle fit, no winner declared) or Recommendation (standalone: winner, rationale, self-critique)\n- Cross-Angle Boundary (executor mode only): what a candidate from a different angle would need to offer to beat the strongest candidate from this angle\n- Open Questions for the Main Agent\n\nThe main agent will interrogate this output -- it is raw investigative material, not a final decision. Optimize for honest, useful analysis over polished presentation.",
  "agentRole": "You are delivering design analysis for the main agent to interrogate and build on.",
  "requireConfirmation": false
  }