@exaudeus/workrail 3.74.2 → 3.75.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/console-ui/assets/{index-CK8Zux9a.js → index-BvBihscd.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/infrastructure/storage/schema-validating-workflow-storage.js +25 -2
- package/dist/manifest.json +5 -5
- package/docs/ideas/backlog.md +198 -0
- package/package.json +1 -1
- package/workflows/routines/tension-driven-design.json +33 -22
- package/workflows/workflow-for-workflows.json +5 -11
- package/workflows/wr.discovery.json +155 -16
package/dist/manifest.json
CHANGED

@@ -473,8 +473,8 @@
       "sha256": "5fe866e54f796975dec5d8ba9983aefd86074db212d3fccd64eed04bc9f0b3da",
       "bytes": 8011
     },
-    "console-ui/assets/index-CK8Zux9a.js": {
-      "sha256": "
+    "console-ui/assets/index-BvBihscd.js": {
+      "sha256": "58bbe4f1249aa4987c65367e6675d7081da84f69d0877b0922e21bc9fcd28ce0",
       "bytes": 768234
     },
     "console-ui/assets/index-DHrKiMCf.css": {
@@ -482,7 +482,7 @@
       "bytes": 60673
     },
     "console-ui/index.html": {
-      "sha256": "
+      "sha256": "b441d6137262e5fbe4673e515579928db3e073d6fbd1fa1dceedb20c3ac48d25",
       "bytes": 417
     },
     "console/standalone-console.d.ts": {
@@ -1026,8 +1026,8 @@
       "bytes": 2023
     },
     "infrastructure/storage/schema-validating-workflow-storage.js": {
-      "sha256": "
-      "bytes":
+      "sha256": "06a8dd9b05f3186dc305d39436b49c6c13e08b30b1fa9ae1f3d6161789c3b993",
+      "bytes": 8878
     },
     "infrastructure/storage/storage.d.ts": {
       "sha256": "481c5c0ef797baa7f18cff6a468a1de6d1ef34dd4b35f53e318e30b825b31e63",
package/docs/ideas/backlog.md
CHANGED

@@ -258,6 +258,71 @@ This is exactly what happened with the commit SHA change: setting `agentCommitSh
 The autonomous workflow runner (`worktrain daemon`). Completely separate from the MCP server -- calls the engine directly in-process.
 
 
+### Living work context: shared knowledge document that accumulates across the full pipeline (Apr 30, 2026)
+
+**Status: idea** | Priority: high
+
+**Score: 13** | Cor:3 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
+
+When a multi-agent pipeline runs -- discovery → shaping → coding → review → fix → re-review -- no agent has a complete picture of what came before it. The coding agent has the goal. The review agent has the code. The fix agent has the findings. None of them have the accumulated context from the full pipeline: why this approach was chosen over alternatives, what was ruled out, what constraints were discovered, what architectural decisions were made, what edge cases were handled, what the review found and why.
+
+Each agent reconstructs intent from incomplete context, which is why review finds things coding missed (review doesn't know what the coding agent was trying to do), why fix sessions address symptoms without understanding causes (no access to the architectural reasoning), and why agents repeat work that earlier agents already did.
+
+**The real need:** a **living work context document** that every agent in the pipeline both reads from and contributes to:
+
+- **Discovery adds**: why this approach over alternatives, what was ruled out, constraints found
+- **Shaping adds**: the bounded problem, no-gos, acceptance criteria -- the verifiable contract
+- **Architecture/coding adds**: why specific decisions were made, what invariants must hold, what was deliberately deferred and why
+- **Review adds**: what was found, the underlying reason it was missed, what the fix must address
+- **Fix adds**: what was changed and why the fix is correct per the spec
+
+The spec from shaping is one layer of this -- the *what to build* contract. But the full context also includes the *why* from discovery, the *how* decisions from coding, and the *what was missed* from review. All of it should be accessible to every downstream agent.
+
+This is related to the "session knowledge log" backlog entry (agents appending to `session-knowledge.jsonl`) but is explicitly a **multi-agent shared artifact**, not a single session's private log. The coordinator is responsible for maintaining and passing this document to each spawned agent.
+
+**Things to hash out:**
+- What is the right format? A growing markdown document is human-readable but hard to query. Structured JSON is queryable but loses the narrative. A hybrid (structured frontmatter + narrative body) may be best -- see the sketch after this entry.
+- Where does it live? In the worktree (accessible to the coding agent)? In a well-known workspace path? In the session store (accessible to all agents via `read_artifact`)?
+- Who owns writing to it -- the coordinator (scripts that have no LLM)? Each agent? Both?
+- When a pure coordinator pipeline has no main agent, who synthesizes the discovery findings into the document? The discovery agent writes its own section; the coordinator passes it through. But synthesis across sections (connecting discovery constraints to coding decisions) requires reasoning.
+- How does the review agent know which work context applies to the current PR? It needs to discover this without being told explicitly.
+- What's the minimum viable version -- is just passing the shaped spec (`SPEC.md`) to the coding and review agents already a major improvement, even without the full living document?
+- This is distinct from "context injection at dispatch time" (passing a static bundle) -- the living document evolves as the pipeline progresses. Does the coordinator update it after each phase completes?
+- **Is "document" even the right abstraction?** A flat document implies agents read it linearly. But agents need to query it selectively -- the coding agent needs "what constraints affect this decision?", the review agent needs "what did the coding agent say about this module?". A structured knowledge store (typed facts, queryable by agent role and topic) may be more useful than a document. This connects to the knowledge graph backlog entry -- the work-unit knowledge store may be a per-pipeline instance of the same infrastructure. This is worth hashing out before designing the format.
+
+---
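
To make the hybrid-format bullet above concrete, here is a minimal sketch of what such an artifact could look like. Everything in it is hypothetical -- the `WorkContext` type, the phase list, and the helpers are illustrations, not anything that exists in the package:

```ts
// Hypothetical sketch of a "living work context" artifact: structured
// fields the coordinator can route on, plus a narrative body per phase.
type PipelinePhase = "discovery" | "shaping" | "coding" | "review" | "fix";

interface WorkContextEntry {
  phase: PipelinePhase;
  agentId: string;       // who wrote this section
  writtenAt: string;     // ISO timestamp
  decisions: string[];   // "chose X over Y because Z"
  constraints: string[]; // discovered invariants downstream agents must hold
  narrative: string;     // free-form markdown body -- keeps the "why"
}

interface WorkContext {
  workUnitId: string; // ties the document to one pipeline run
  entries: WorkContextEntry[];
}

// The coordinator appends a section after each phase completes and passes
// the whole document to the next spawned agent.
function appendPhase(ctx: WorkContext, entry: WorkContextEntry): WorkContext {
  return { ...ctx, entries: [...ctx.entries, entry] };
}

// Selective query, per the "is a document even the right abstraction"
// bullet: e.g. the review agent asks only for earlier phases' constraints.
function constraintsBefore(ctx: WorkContext, phase: PipelinePhase): string[] {
  const order: PipelinePhase[] = ["discovery", "shaping", "coding", "review", "fix"];
  const cutoff = order.indexOf(phase);
  return ctx.entries
    .filter((e) => order.indexOf(e.phase) < cutoff)
    .flatMap((e) => e.constraints);
}
```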
+
+### Move backlog to a dedicated worktrain-meta repo (Apr 30, 2026)
+
+**Status: idea** | Priority: high
+
+**Score: 11** | Cor:2 Cap:2 Eff:2 Lev:3 Con:3 | Blocked: no
+
+The backlog (`docs/ideas/backlog.md`) lives in the code repo, which means every feature branch has its own version of it. Ideas added mid-session on a feature branch are held hostage until that PR merges. If two branches both modify the backlog, git merge conflicts occur. There is no single authoritative place to add an idea that immediately applies everywhere.
+
+**Proposed fix:** move the backlog to a dedicated `worktrain-meta` repo (e.g. `~/git/personal/worktrain-meta/`). This is a separate git repo that is never branched for feature work -- you commit and push directly to main whenever an idea is added. Full git history is preserved. No code branch ever touches it. WorkTrain daemon sessions and the `npm run backlog` script are configured with the path to this repo.
+
+**Why a separate repo over a dedicated branch in this repo:**
+- A dedicated branch in this repo can be accidentally contaminated by a rebase or merge
+- CI runs on every push to a branch here -- wasting resources on docs-only changes
+- The backlog lifecycle (ideas, grooming, scoring) is independent of the code release cycle -- they should be independent repos
+- When native backlog operations (structured data, SQLite) are built later, the backlog is already isolated and the migration doesn't touch the code repo
+
+**Migration steps** (a config sketch follows this entry):
+1. Create `~/git/personal/worktrain-meta/` git repo, push to GitHub as a new repo
+2. Move `docs/ideas/backlog.md` there as the initial commit
+3. Update `scripts/backlog-priority.ts` path
+4. Update the AGENTS.md reference to `npm run backlog`
+5. Update daemon-soul.md and any session context that references the backlog path
+6. Add `backlogRepoPath` to `~/.workrail/config.json` so the daemon knows where to find it
+
+**Things to hash out:**
+- Should the worktrain-meta repo also hold other cross-cutting artifacts like planning docs, the now-next-later roadmap, and the open-work-inventory? Or just the backlog?
+- How do subagents spawned in a worktree find the backlog? They need the path configured, not relative to the code workspace.
+- When native structured backlog operations are built, does the storage backend (SQLite) live in worktrain-meta or in `~/.workrail/data/`? The history requirement points toward worktrain-meta (git-tracked), but query performance points toward `~/.workrail/data/` (local database).
+
+---
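
A minimal sketch of what step 6 could look like from the daemon's side, assuming `~/.workrail/config.json` is plain JSON. Only the `backlogRepoPath` key comes from the entry; the reader function and error text are invented:

```ts
// Illustrative only: reads the proposed `backlogRepoPath` key from
// ~/.workrail/config.json. The surrounding config shape is assumed,
// not the package's actual schema.
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

interface WorkrailConfig {
  backlogRepoPath?: string; // e.g. "~/git/personal/worktrain-meta"
}

function resolveBacklogPath(): string {
  const configPath = join(homedir(), ".workrail", "config.json");
  const config: WorkrailConfig = JSON.parse(readFileSync(configPath, "utf8"));
  if (!config.backlogRepoPath) {
    throw new Error(
      "backlogRepoPath not set in ~/.workrail/config.json -- " +
        "subagents cannot locate the backlog relative to the code workspace"
    );
  }
  // Expand a leading ~ so daemon sessions and worktree subagents agree.
  return config.backlogRepoPath.replace(/^~(?=\/|$)/, homedir());
}

console.log(resolveBacklogPath());
```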
+
 ### Subagent context package: project vision and task goal baked into spawning (Apr 30, 2026)
 
 **Status: idea** | Priority: high
@@ -1269,6 +1334,43 @@ This is already how mid-run resume works. The same mechanism extends naturally t
 
 ---
 
+### Extensible output contract registration: coordinator-owned schemas, engine-enforced (Apr 30, 2026)
+
+**Status: idea** | Priority: medium
+
+**Score: 8** | Cor:1 Cap:2 Eff:2 Lev:2 Con:2 | Blocked: no
+
+The engine's output contract registry (`ARTIFACT_CONTRACT_REFS` in `src/v2/durable-core/schemas/artifacts/index.ts`) is a closed list maintained in the engine source. Adding a new contract type requires modifying the engine: adding to the registry, implementing a validator in `artifact-contract-validator.ts`, and adding a Zod schema. This is the correct pattern today and works fine at 5 items. But as the pipeline gains more phase types, every new coordinator-domain artifact contract becomes an engine change. The registry is already mixed -- `review_verdict` and `discovery_handoff` are coordinator-domain artifacts registered there. At 15-20 items this becomes a maintenance burden and a coupling that is harder to justify.
+
+The better long-term design: the engine owns the enforcement mechanism (validate presence and schema at `complete_step`) but not the schema definitions. Coordinator-domain contracts register their Zod schemas from outside the engine. The engine validates against whatever is registered, without a hardcoded case per contract type. (A registration sketch follows this entry.)
+
+**Things to hash out:**
+- What is the registration API? DI injection at startup (consistent with the existing container pattern), a module-level call, or a config file?
+- How does registration work at compile time vs runtime? Workflow compilation and `complete_step` validation happen at different points -- the registry must be available at both.
+- Does this change the `workflowHash`? If registered schemas change, should the hash change? Does the hash include registered external schemas or only the workflow JSON?
+- Should the existing 5 contracts migrate, or stay hardcoded? A two-tier system (some hardcoded, some registered) is confusing, but migration is low priority.
+
+---
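
A rough sketch of the registration idea, under the entry's assumptions. The `ArtifactContractRegistry` class and its methods are invented names; Zod and the `review_verdict` contract name come from the entry, while the verdict field shape shown is a guess:

```ts
// Hypothetical registration API: the engine owns enforcement, callers own
// schemas. z is the real Zod library; everything else is invented.
import { z, type ZodTypeAny } from "zod";

class ArtifactContractRegistry {
  private schemas = new Map<string, ZodTypeAny>();

  // Coordinator-domain code calls this at startup (e.g. via a DI container).
  register(contractType: string, schema: ZodTypeAny): void {
    if (this.schemas.has(contractType)) {
      throw new Error(`contract '${contractType}' already registered`);
    }
    this.schemas.set(contractType, schema);
  }

  // Engine calls this at complete_step: no hardcoded case per contract type.
  validate(contractType: string, artifact: unknown): void {
    const schema = this.schemas.get(contractType);
    if (!schema) throw new Error(`unknown contract type '${contractType}'`);
    schema.parse(artifact); // throws ZodError on schema violation
  }
}

// Example: review_verdict (named in the entry) registered from outside the
// engine. The field shape here is assumed, not the package's actual schema.
const registry = new ArtifactContractRegistry();
registry.register(
  "review_verdict",
  z.object({
    verdict: z.enum(["approve", "request_changes"]),
    findings: z.array(z.string()),
  })
);
```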
+
+### Task-scoped rules: step-level rule injection by task type (Apr 30, 2026)
+
+**Status: idea** | Priority: medium
+
+**Score: 10** | Cor:2 Cap:3 Eff:2 Lev:2 Con:2 | Blocked: no
+
+Workspace rules today are injected globally -- every session gets the same rules regardless of what the session is doing. This means PR-opening rules, issue-creation rules, commit-message rules, and merge rules are all visible to a discovery session that will never do any of those things. Worse, a PR-opening step in a coding workflow doesn't get the rules injected precisely when it needs them -- they're diluted in the full rules blob. There is no mechanism to say "inject these rules only when the agent is about to open a PR" or "inject these rules only when creating a GitHub issue."
+
+The idea: a rule declaration mechanism (either in the workflow step definition or in a workspace rules file) that tags rules by task type. At step execution time, the engine injects only the rules tagged for that step's declared task type. Examples: a step with `taskType: 'git.open_pr'` automatically receives PR-opening rules; a step with `taskType: 'github.create_issue'` receives issue-creation rules. Rules not tagged for the current task type are not injected into that step's prompt. This is complementary to the phase-scoped rules preprocessing item -- phase scoping is coarse-grained (coding vs review), task scoping is fine-grained (which specific action within a step). (A filtering sketch follows this entry.)
+
+**Things to hash out:**
+- Where are task-scoped rules declared -- in the workflow step definition (`taskType` field), in a workspace rules file with tags, or both?
+- What is the taxonomy of task types -- an open string, a closed enum, or a hierarchical namespace (e.g. `git.*`, `github.*`, `jira.*`)?
+- Does this interact with the ephemeral per-turn injection idea? Task-scoped rules are a natural candidate for ephemeral injection -- visible when needed, not accumulated in history.
+- Should task-scoped rules override or augment the global rules? What is the precedence and load order?
+- Who authors the task-scoped rules -- the workflow author (in the workflow JSON) or the workspace operator (in a workspace rules file)? Both seem valid but have different ownership models.
+
+---
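
A sketch of what task-type filtering might look like, assuming the hierarchical-namespace taxonomy from the open questions above. Every name here is hypothetical:

```ts
// Hypothetical task-scoped rule filtering with wildcard namespace tags.
interface WorkspaceRule {
  text: string;
  taskTypes?: string[]; // e.g. ["git.open_pr"]; absent = global rule
}

// Supports exact matches and wildcard tags like "git.*".
function matchesTaskType(tag: string, taskType: string): boolean {
  if (tag === taskType) return true;
  return tag.endsWith(".*") && taskType.startsWith(tag.slice(0, -1));
}

// At step execution time: global rules always apply; tagged rules apply
// only when the step declares a matching taskType.
function rulesForStep(rules: WorkspaceRule[], taskType?: string): string[] {
  return rules
    .filter(
      (r) =>
        !r.taskTypes ||
        (taskType !== undefined &&
          r.taskTypes.some((t) => matchesTaskType(t, taskType)))
    )
    .map((r) => r.text);
}

// A step with taskType 'git.open_pr' receives the PR-opening rules; a
// discovery step with no taskType receives only the global rules.
const rules: WorkspaceRule[] = [
  { text: "Never force-push to main." },
  { text: "PR titles follow conventional commits.", taskTypes: ["git.*"] },
  { text: "Issues need a repro section.", taskTypes: ["github.create_issue"] },
];
console.log(rulesForStep(rules, "git.open_pr"));
// -> ["Never force-push to main.", "PR titles follow conventional commits."]
```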
+
 ### Rules preprocessing: normalize workspace rules before injection
 
 **Status: idea** | Priority: medium
@@ -2486,8 +2588,52 @@ A workflow that aggregates activity across git history, GitLab/GitHub MRs and re
 
 ---
 
+### Ephemeral per-turn context injection in the agent loop (Apr 30, 2026)
+
+**Status: idea** | Priority: medium
+
+**Score: 10** | Cor:1 Cap:3 Eff:2 Lev:2 Con:2 | Blocked: no
+
+The agent loop injects content (rules, soul, workspace context) into the system prompt once at session start. This means rules and behavioral constraints consume tokens for the entire session history. For long-running sessions this is wasteful: every LLM API call re-sends the full system prompt, including rules that were injected 50 turns ago. The alternative -- injecting rules on every turn as a fresh user or system message -- keeps them current but pollutes the conversation history with repetitive injections that further inflate context. There is no mechanism to inject content that is "always fresh, never historical" -- present on every loop iteration but not accumulated in the turn-by-turn conversation log.
+
+The desired behavior: certain content (rules, behavioral constraints, workspace context, soul principles) should be re-injected on every turn as an ephemeral "floating system message" that is visible to the LLM during inference but not stored in the conversation history. The LLM always sees it, but it never grows the history. (A loop sketch follows this entry.)
+
+**Things to hash out:**
+- Does the Anthropic API (or other LLM providers) support a distinct ephemeral/volatile content slot that is not part of the messages array? If not, what is the closest approximation?
+- Is this a system prompt update per turn, or a separate "ephemeral context" message type? The distinction affects how context windows are managed by the provider.
+- Should ephemeral content be declared in the workflow (as a `volatileContext` field) or injected by the daemon's `buildSystemPrompt()` at the infrastructure level?
+- Which content actually benefits from this -- rules/soul only, or also things like "current git status", "last test run output", and workspace context that may change mid-session?
+- Does this interact with the WorkRail engine's `continue_workflow` step injection? Step prompts are already injected per turn via `steer()` -- is this just a generalization of that mechanism?
+
+---
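
Since no provider-side ephemeral slot is assumed to exist, the closest approximation is probably rebuilding the system prompt each turn from a stable prefix plus volatile content, while persisting only the real turns. A sketch of that shape -- all names invented:

```ts
// Illustrative approximation only: the volatile block is always fresh and
// is never appended to the stored transcript, so it cannot accumulate.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface VolatileContext {
  rules: string[];        // e.g. task-scoped rules for the current step
  workspaceState: string; // e.g. current git status -- may change mid-session
}

// Called before every LLM call: stable prefix + always-fresh volatile tail.
function buildSystemPrompt(stable: string, volatile: VolatileContext): string {
  return [
    stable,
    "## Current rules",
    ...volatile.rules,
    "## Workspace state",
    volatile.workspaceState,
  ].join("\n");
}

async function runTurn(
  history: ChatMessage[],
  userInput: string,
  stablePrompt: string,
  getVolatile: () => VolatileContext,
  callLLM: (system: string, messages: ChatMessage[]) => Promise<string>
): Promise<ChatMessage[]> {
  const system = buildSystemPrompt(stablePrompt, getVolatile());
  const messages = [...history, { role: "user" as const, content: userInput }];
  const reply = await callLLM(system, messages);
  // Only the real conversation is persisted; the volatile block is not.
  return [...messages, { role: "assistant" as const, content: reply }];
}
```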
+
 ## Platform Vision (longer-term)
 
+### Epic-mode: full autonomous delivery of a multi-task feature from discovery to merged PRs (Apr 30, 2026)
+
+**Status: idea** | Priority: high
+
+**Score: 10** | Cor:1 Cap:3 Eff:1 Lev:3 Con:1 | Blocked: yes (blocked by: living work context, coordinator pipeline operational end-to-end, spawn_agent depth + parallel worktree support)
+
+Today WorkTrain handles one ticket at a time. An epic -- a feature that requires 5-10 interdependent changes across multiple files, modules, or services -- requires the operator to manually decompose it into tickets and dispatch each one separately. The decomposition, dependency ordering, and integration are all human work. This is the gap between "WorkTrain handles tickets" and "WorkTrain handles features."
+
+The idea: a single operator action kicks off an end-to-end autonomous pipeline for an entire epic. A planning phase fully decomposes the epic into a dependency-ordered task graph. Each task is a concrete, independently implementable unit of work. Dependent tasks wait for their predecessors to land. Independent tasks are dispatched simultaneously to parallel agents in separate worktrees. Each task produces a PR. PRs target each other in a chain (each PR's base branch is the previous task's feature branch, or a shared integration branch). A coordinator monitors progress, re-plans when a task produces unexpected output, and handles failures by re-dispatching or escalating. When all tasks are merged (in dependency order), the epic is done.
+
+This is the feature that makes WorkTrain feel like it can take on real engineering work, not just isolated bug fixes and small features.
+
+**Things to hash out:**
+- What is the planning artifact? The decomposition step needs to produce a typed task graph -- not just a list of tasks, but explicit dependency edges, estimated scope per task, and the integration strategy (shared branch, stacked PRs, merge train). What schema captures this in a way the coordinator can route on deterministically? (A schema sketch follows this entry.)
+- How are dependencies enforced? If task B depends on task A, does B's agent start only after A's PR is merged, or does it work against A's branch before merge? The latter is faster but requires the coordinator to handle A's branch being rebased or amended.
+- How does the coordinator handle a task whose output invalidates the plan? If task A's implementation reveals a constraint that makes task C unnecessary or changes its scope, the coordinator needs to re-plan. How does task A signal this to the coordinator, and what does re-planning look like? Does it spawn a new planning agent, or does the coordinator apply deterministic rules?
+- What is the integration strategy for parallel tasks that touch overlapping files? Two agents working in separate worktrees may produce conflicting changes. Is this detected at PR-open time (merge conflicts), at plan time (the planner tries to assign non-overlapping scopes), or both?
+- What is the failure model? If one task in a 10-task epic fails after 3 tasks have merged, what happens to the already-landed work? The coordinator can't un-merge. Does it escalate to the operator, attempt a compensating task, or leave the partial state as-is?
+- How does this interact with the living work context design? Each task agent needs context from the planning phase (what the epic is trying to accomplish, what other tasks are doing, what invariants the whole feature must satisfy). This is exactly the cross-session context problem, but at epic scale -- the context store needs to accumulate across a task graph, not just a linear pipeline.
+- What is the operator experience? Does the operator see a dashboard of all tasks in flight, their dependencies, and their status? Can they pause the epic, re-scope a task, or cancel a branch of the task graph mid-execution?
+
+**Why it's high leverage despite low confidence:** getting this right makes WorkTrain the tool for large-scale autonomous development. Every other item in the backlog improves WorkTrain's reliability or quality for one ticket. This item changes the unit of work from "ticket" to "feature."
+
+---
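
One possible shape for the typed task graph, sketched under the entry's own requirements (dependency edges, per-task scope, integration strategy). Field names and the strategy enum are invented:

```ts
// Hypothetical planning artifact for epic-mode. Nothing here exists in the
// package; the entry only requires edges, scope, and a strategy.
type IntegrationStrategy = "shared_branch" | "stacked_prs" | "merge_train";

interface EpicTask {
  id: string;          // e.g. "task-3"
  summary: string;     // concrete, independently implementable unit
  dependsOn: string[]; // explicit dependency edges, by task id
  estimatedScope: "small" | "medium" | "large";
  touches: string[];   // files/modules, so the planner can avoid overlap
}

interface EpicPlan {
  epicId: string;
  strategy: IntegrationStrategy;
  tasks: EpicTask[];
}

// Deterministic routing: tasks whose dependencies have all merged are
// dispatchable now; independent tasks come back together (parallel agents
// in separate worktrees).
function dispatchable(plan: EpicPlan, merged: Set<string>): EpicTask[] {
  return plan.tasks.filter(
    (t) => !merged.has(t.id) && t.dependsOn.every((d) => merged.has(d))
  );
}
```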
+
 ### Move backlog to a dedicated worktrain-meta repo with version control (Apr 30, 2026)
 
 **Status: idea** | Priority: high
@@ -4094,3 +4240,55 @@ WorkTrain has no tooling to surface the state of worktrees and branches relative
 
 ---
 
+
+## WorkRail usage report as a mercury-mobile team script (May 4, 2026)
+
+**Goal:** Make the WorkRail usage report dead simple to run for any mercury-mobile engineer -- one command, zero config beyond a GitLab token.
+
+### Distribution
+
+- Lives in mercury-mobile's common-ground team directory (`src/teams/mercury/mercury-mobile/scripts/workrail-report.sh`)
+- Distributed to every mercury engineer's machine by common-ground via `make sync`
+- Runnable as `~/.cg/dist/scripts/workrail-report.sh` or wrapped as a skill/alias
+
+### What it does
+
+1. Reads `~/.cg/config.toml` for the engineer's team identity
+2. Reads `~/.cg/repo-list.cache` to resolve repo names to local paths
+3. Scans `~/.workrail/data/sessions/` for sessions in the report window -- this is the authoritative source of which repos WorkRail was used on
+4. Fetches GitLab MRs via API for each repo that had sessions
+5. Builds the HTML report and writes it to `~/Downloads/workrail-report-YYYY-MM-DD.html`
+6. Auto-opens the report
+
+### Configuration
+
+- **Token:** checks the `GITLAB_TOKEN` env var → `~/.cg/secrets` → prompts once and offers to save. Zero setup if the engineer already has `GITLAB_TOKEN` set. (A resolution sketch follows this entry.)
+- **Date range:** defaults to a rolling last 30 days. Override via `WORKRAIL_REPORT_DAYS=60 ./workrail-report.sh` or a `--days 90` flag.
+- **Nothing else** -- team, repos, and GitLab paths are all auto-detected.
+
+### Report behavior
+
+- Only shows repos where WorkRail sessions exist in the window -- absence is signal, not a bug
+- Repos worked in outside WorkRail simply don't appear (the report is a WorkRail usage report, not a total productivity report)
+- The "WorkRail shipped" correlation tab is disabled in the distributed version -- too expensive to run automatically. Available as a separate manual step for advanced users.
+
+### Error handling
+
+- No WorkRail installed → clear message with install instructions
+- No sessions in window → "No WorkRail activity in the last 30 days" with a suggestion to check the date range
+- No GitLab token → prompt with instructions for creating one
+- Repo not cloned locally → skip with a note (LOC stats require a local clone; the rest of the report works without it)
+
+### Non-goals
+
+- Not a team-level aggregated report (that's a future feature once `triggerSource` attribution is built)
+- Not a real-time dashboard
+- Not responsible for repos where WorkRail wasn't used
+
+### Depends on
+
+- The shared report scripts (`01-collect-sessions.py`, `02-collect-commits.py`, `04-build-html.py`) being stable -- ship this only after those are solid
+- `triggerSource: 'daemon' | 'mcp'` attribution (backlog) for distinguishing autonomous vs manual sessions -- not blocking, but would improve the report
+- Common-ground `make sync` distributing the script reliably
+
+**Priority:** Medium. The shared scripts work and have been tested. The main remaining work is the shell wrapper, token storage, and integration with common-ground's team config.
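
The token resolution order under Configuration, rendered as a sketch. The real wrapper would be a shell script; this TypeScript version, and the KEY=VALUE format assumed for `~/.cg/secrets`, are illustrative only:

```ts
// Sketch of: env var -> ~/.cg/secrets -> interactive prompt.
import { readFileSync, existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";
import { createInterface } from "node:readline/promises";

async function resolveGitlabToken(): Promise<string> {
  // 1. Environment variable: zero setup if already exported.
  if (process.env.GITLAB_TOKEN) return process.env.GITLAB_TOKEN;

  // 2. ~/.cg/secrets, assumed here to hold KEY=VALUE lines.
  const secretsPath = join(homedir(), ".cg", "secrets");
  if (existsSync(secretsPath)) {
    for (const line of readFileSync(secretsPath, "utf8").split("\n")) {
      const match = line.match(/^GITLAB_TOKEN=(.+)$/);
      if (match) return match[1].trim();
    }
  }

  // 3. Prompt once (the real script would also offer to save the token).
  const rl = createInterface({ input: process.stdin, output: process.stdout });
  const token = await rl.question("GitLab token: ");
  rl.close();
  return token.trim();
}
```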
package/workflows/routines/tension-driven-design.json
CHANGED

@@ -1,62 +1,73 @@
 {
   "id": "wr.routine-tension-driven-design",
   "name": "Tension-Driven Design Generation",
-  "version": "1.
+  "version": "1.3.0",
   "metricsProfile": "none",
-  "
+  "features": [
+    "wr.features.capabilities",
+    "wr.features.subagent_guidance"
+  ],
+  "description": "Generates candidates grounded in real tensions for any problem domain. Supports standalone and parallel-executor modes. In executor mode, anchors to an assigned focus angle and ranks without recommending -- the calling agent owns synthesis and selection.",
   "clarificationPrompts": [
-    "What problem should this design solve?",
+    "What problem should this design solve? (When spawned as an executor, the full fact packet and assigned focus angle should be in the goal string.)",
     "What acceptance criteria, invariants, and constraints must it respect?",
-    "
-    "What artifact name should I produce?"
+    "What artifact name should I produce? (Default: design-candidates.md)"
   ],
   "preconditions": [
-    "Problem statement is available",
-    "Acceptance criteria and non-goals are available",
+    "Problem statement is available -- either in the goal string (executor mode) or as context the agent can discover",
     "Relevant files, patterns, or codebase references are available",
     "Agent has read access to the codebase"
   ],
   "metaGuidance": [
-    "PURPOSE: generate genuinely different
-    "ROLE: you are a designer, not an auditor or implementer. Think deeply about the problem before proposing solutions.",
-    "
-    "SIMPLICITY BIAS: always consider whether the problem
-    "
-    "HONESTY: for each candidate, state what you gain, what you give up, and how it fails. Optimize for useful comparison, not persuasion."
+    "PURPOSE: generate genuinely different candidates grounded in real tensions for any problem domain -- software, product, UX, personal, or general.",
+    "ROLE: you are a designer and strategic thinker, not an auditor or implementer. Think deeply about the problem before proposing solutions.",
+    "PRINCIPLES: the decision-maker's principles and constraints are a design constraint, not an afterthought. Discover them (step 1) and use them throughout.",
+    "SIMPLICITY BIAS: always consider whether the problem needs an ambitious solution at all. The simplest option that works is a valid candidate.",
+    "EXISTING PATTERNS: study how similar problems have been solved before in this context. The best solution often adapts an existing pattern rather than inventing from scratch.",
+    "HONESTY: for each candidate, state what you gain, what you give up, and how it fails. Optimize for useful comparison, not persuasion.",
+    "EXECUTOR MODE: if goal contains 'FOCUS ANGLE:', you are a parallel executor. Generate candidates anchored to that angle only. Step 4 ranks, does not recommend. Main agent owns synthesis and selection.",
+    "OUTPUT FILE: use the filename from 'OUTPUT FILE:' in your goal string, defaulting to design-candidates.md."
   ],
   "steps": [
+    {
+      "id": "step-anchor-and-orient",
+      "title": "Step 0: Anchor to Assigned Context and Focus Angle",
+      "prompt": "Before doing any research or generation, read your goal string carefully and extract your operating context.\n\nFrom the goal string, extract and record:\n- **FOCUS ANGLE** (marked 'FOCUS ANGLE:' in the goal): your assigned generation angle. If present, ALL candidates must be anchored to this. You are in executor mode.\n- **PROBLEM** (marked 'PROBLEM:' or 'REFRAMED PROBLEM:'): what you are solving\n- **TENSIONS** (marked 'TENSIONS:'): core tensions the main agent already identified -- use these, do not re-investigate\n- **DECISION CRITERIA** (marked 'CRITERIA:'): what the final direction must satisfy\n- **IDEAL END STATE** (marked 'IDEAL END STATE:'): what the best achievable outcome looks like\n- **RISKIEST ASSUMPTION** (marked 'RISKIEST ASSUMPTION:'): the assumption most likely to invalidate the design\n- **PHILOSOPHY SOURCES** (marked 'PHILOSOPHY:'): pointers to rules or repo files encoding the dev's philosophy\n- **OUTPUT FILE** (marked 'OUTPUT FILE:'): the filename to produce (default: design-candidates.md)\n\nIf your goal contains a FOCUS ANGLE:\n- State your angle explicitly at the top of your notes\n- Confirm what it means for generation: which tensions it asks you to prioritize, which assumptions it asks you to stress-test\n- You will NOT recommend a winner in step 4 -- you will rank. Selection belongs to the main agent.\n\nIf your goal contains no FOCUS ANGLE (standalone execution):\n- Note that you are running standalone\n- You will discover missing context in subsequent steps\n\nWorking notes:\n- Assigned focus angle (or 'standalone')\n- All fact packet fields extracted from goal\n- What this angle means for generation\n- Output filename\n- Executor vs standalone confirmation",
+      "agentRole": "You are anchoring to your assigned role before doing any work. Read the goal string carefully. Do not skip this step.",
+      "requireConfirmation": false
+    },
     {
       "id": "step-discover-philosophy",
-      "title": "Step 1: Discover
-      "prompt": "Discover the
-      "agentRole": "You are discovering what the
+      "title": "Step 1: Discover Principles and Constraints",
+      "prompt": "Discover the decision-maker's principles, preferences, and constraints before designing anything. The approach depends on the problem domain.\n\nIf PHILOSOPHY SOURCES were provided in step 0, go read those sources directly first.\n\nIf no philosophy sources were provided, discover based on domain:\n\n**For software / architecture problems:**\n1. Memory MCP (if available): call `mcp_memory_conventions`, `mcp_memory_prefer`, `mcp_memory_recall`\n2. Read CLAUDE.md, AGENTS.md, .cursor/rules/ or equivalent\n3. Infer from repo patterns: error handling, mutability, test style, type safety, architecture decisions\nNote conflicts between stated rules and actual repo patterns.\n\n**For product / strategy problems:**\n1. Look for product principles, north star metrics, company mission, OKRs\n2. Memory MCP for stated product values or past decisions\n3. Infer from existing product decisions: what tradeoffs were previously accepted?\n\n**For UX / design problems:**\n1. Look for design system docs, accessibility guidelines, brand principles\n2. Infer from existing UI decisions: what patterns are already established?\n\n**For personal / career problems:**\n1. Memory MCP for stated values, priorities, past decisions the user has shared\n2. Infer from what the user has said they care about most\n\n**For general problems:**\n1. Identify the decision-maker's stated priorities and constraints\n2. Note what tradeoffs they have accepted in the past\n\nWorking notes:\n- Sources consulted\n- Key principles discovered\n- Conflicts between stated principles and past behavior\n- Which principles are likely to constrain this design",
+      "agentRole": "You are discovering what the decision-maker actually cares about before designing solutions.",
       "requireConfirmation": false
    },
    {
      "id": "step-understand-deeply",
      "title": "Step 2: Understand the Problem Deeply",
-      "prompt": "Understand the problem before proposing anything.\n\nReason through:\n- What are the core tensions in this problem? (e.g.,
+      "prompt": "Understand the problem before proposing anything.\n\nIf TENSIONS and a REFRAMED PROBLEM were extracted in step 0, use them as your starting point -- do not re-investigate what the main agent already resolved. Build on that foundation and add what the main agent may have missed from your assigned angle's perspective.\n\nReason through these universal questions first:\n- What are the core tensions in this problem? (e.g., speed vs quality, simplicity vs flexibility, short-term vs long-term)\n- What is the simplest naive solution? Why is it insufficient? (If it IS sufficient, note that -- it may be the best candidate.)\n- What makes this problem hard? What would someone without deep context miss?\n- Which principles discovered in step 1 are under pressure from this problem's constraints?\n- If in executor mode: what does the problem look like specifically from your assigned angle?\n\nThen reason through domain-specific questions:\n\n**For software / architecture problems:**\n- How does the codebase already solve similar problems? Study existing patterns -- the decisions they encode, the invariants they protect.\n- Where does the problem most likely live? Is the requested location the real seam?\n- What nearby callers, consumers, sibling paths, or contracts must remain consistent?\n\n**For product / strategy problems:**\n- What does the competitive and user landscape look like for this decision?\n- What decisions has the team made before that constrain or inform this one?\n- What signals (data, user research, market trends) are relevant?\n\n**For UX / design problems:**\n- What existing patterns in the design system constrain or inform this?\n- What are the user's mental models and where does the current design break them?\n- What edge cases and accessibility implications need to be considered?\n\n**For personal problems:**\n- What are the real stakes and who else is affected?\n- What is the decision-maker's actual track record with similar decisions?\n- What are they optimizing for, stated and unstated?\n\n**For general problems:**\n- Who are the stakeholders and what do they actually need?\n- What existing constraints or commitments narrow the solution space?\n\nWorking notes:\n- Core tensions (2-4 real tradeoffs, not generic labels)\n- Domain-specific context (patterns, landscape, constraints)\n- Naive solution and why it's insufficient (or sufficient)\n- What makes this hard\n- Principles under pressure\n- How your assigned angle (if in executor mode) shapes your view",
       "agentRole": "You are reasoning deeply about the problem space before generating any solutions.",
       "requireConfirmation": false
     },
     {
       "id": "step-generate-candidates",
       "title": "Step 3: Generate Candidates from Tensions",
-      "prompt": "Generate design candidates that resolve the identified tensions differently.\n\
+      "prompt": "Generate design candidates that resolve the identified tensions differently.\n\nIf in executor mode (FOCUS ANGLE was set in step 0):\n- All candidates must be anchored to your assigned angle. Do not generate generic candidates that ignore it.\n- You are NOT required to include the simplest possible change or the standard repo-pattern candidate unless they genuinely arise from your angle. Those are covered by other executors or by the main agent's synthesis.\n- Generate 2-3 candidates that each explore your angle from a different sub-direction -- vary the scope, the boundary, or the tradeoff accepted, but keep all of them anchored to the angle.\n- One candidate should be the most ambitious expression of your angle. One should be the most constrained. Others fill the space between.\n\nIf running standalone:\n- MANDATORY candidates vary by domain:\n - **Software:** (1) simplest possible change that satisfies acceptance criteria -- if no architectural solution is needed, say so; (2) adapt the existing codebase pattern -- don't invent when you can extend\n - **Product:** (1) lowest-friction option that achieves the core outcome; (2) highest-differentiation option that maximizes long-term positioning\n - **UX:** (1) incremental improvement within existing patterns; (2) clean-slate redesign that best serves the user's mental model\n - **Personal:** (1) most conservative/reversible option; (2) most aligned with stated values even if uncomfortable\n - **General:** (1) most conservative option; (2) most ambitious option\n- Additional candidates (1-2 more): each must resolve the identified tensions DIFFERENTLY, not just vary surface details.\n\nFor each candidate, produce:\n- One-sentence summary of the approach\n- Which tensions it resolves and which it accepts\n- The boundary or seam this solution addresses, and why that boundary is the right fit\n- The specific failure mode you'd watch for\n- How it relates to existing patterns or precedents (follows / adapts / departs)\n- What you gain and what you give up\n- Impact surface beyond the immediate problem\n- Scope judgment: too narrow / best-fit / too broad, with concrete evidence\n- Which principles from step 1 it honors and which it conflicts with (by name)\n\nRules:\n- Candidates must be genuinely different in shape, not just wording\n- If all candidates converge on the same approach, note it honestly rather than manufacturing fake diversity\n- Broader scope requires concrete evidence\n- Be specific: 'typed store' is not a specification; 'append-only per-run JSON file at a deterministic path, written atomically via temp-rename, read before each spawn' is. Apply the same concreteness to non-software domains: 'focus on enterprise' is not a strategy; 'target CTOs at 500-2000 person companies with a self-serve trial that converts to annual contracts' is.",
       "agentRole": "You are generating genuinely diverse design candidates grounded in real tensions.",
       "requireConfirmation": false
     },
     {
       "id": "step-compare-and-recommend",
-      "title": "Step 4: Compare
-      "prompt": "Compare candidates through tradeoff analysis, not checklists.\n\
-      "agentRole": "You are
+      "title": "Step 4: Compare Candidates",
+      "prompt": "Compare candidates through tradeoff analysis, not checklists.\n\nIf in executor mode (FOCUS ANGLE was set in step 0):\n- Do NOT select a winner or make a final recommendation. The main agent owns selection across the full cross-executor candidate set.\n- Rank your candidates by how well each serves your assigned angle. State which is the strongest expression of the angle, which is the most defensible fallback, and what tradeoff separates them.\n- For each candidate: the strongest argument for it from your angle, and the strongest argument against it that the main agent should weigh.\n- State what a candidate from a DIFFERENT angle would need to offer to beat your strongest candidate from this angle's perspective. This is the cross-angle boundary -- it helps the main agent understand where each angle's value runs out.\n\nIf running standalone:\n- Produce a clear recommendation with rationale tied back to tensions, scope judgment, repo patterns, and philosophy.\n- Self-critique: strongest argument against your pick, narrower option that might still work and why it lost, broader option that might be justified and what evidence would be required, assumption that if wrong would invalidate the design.\n\nWorking notes:\n- Ranking (executor) or recommendation (standalone)\n- Strongest argument for and against each candidate\n- Cross-angle boundary statement (executor mode only)\n- Pivot conditions",
+      "agentRole": "You are ranking or recommending honestly. In executor mode you are producing material for the main agent to synthesize -- not closing the decision.",
       "requireConfirmation": false
     },
     {
       "id": "step-deliver",
       "title": "Step 5: Deliver the Design Candidates",
-      "prompt": "Create `
+      "prompt": "Create the output file. Use the filename from OUTPUT FILE in your goal string, defaulting to `design-candidates.md` if none was specified.\n\nRequired structure:\n- Assigned Focus Angle (executor mode) or 'Standalone' -- state this first so the main agent knows the lens\n- Problem Understanding (tensions, likely seam, what makes it hard)\n- Philosophy Constraints (which principles matter, any conflicts)\n- Impact Surface (what nearby paths, consumers, or contracts must stay consistent)\n- Candidates (each with: summary, tensions resolved/accepted, boundary solved at, why that boundary is the best fit, failure mode, repo-pattern relationship, gains/losses, scope judgment, philosophy fit)\n- Ranking (executor mode: ranked by angle fit, no winner declared) or Recommendation (standalone: winner, rationale, self-critique)\n- Cross-Angle Boundary (executor mode only): what a candidate from a different angle would need to offer to beat the strongest candidate from this angle\n- Open Questions for the Main Agent\n\nThe main agent will interrogate this output -- it is raw investigative material, not a final decision. Optimize for honest, useful analysis over polished presentation.",
       "agentRole": "You are delivering design analysis for the main agent to interrogate and build on.",
       "requireConfirmation": false
     }
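
The new Step 0 has executors pull labeled markers ('FOCUS ANGLE:', 'OUTPUT FILE:', and so on) out of the goal string. A sketch of that convention -- the parser is invented; only the marker names come from the workflow JSON above:

```ts
// Invented helper illustrating Step 0's goal-string convention. Only the
// marker labels are taken from the workflow; the parsing is illustrative.
const MARKERS = [
  "FOCUS ANGLE",
  "PROBLEM",
  "REFRAMED PROBLEM",
  "TENSIONS",
  "CRITERIA",
  "IDEAL END STATE",
  "RISKIEST ASSUMPTION",
  "PHILOSOPHY",
  "OUTPUT FILE",
] as const;

type Marker = (typeof MARKERS)[number];

function parseGoal(goal: string): Partial<Record<Marker, string>> {
  // Split the goal at each known label; the capture group keeps the label,
  // so odd/even pairs give (marker, value). Longer labels are tried first
  // so "REFRAMED PROBLEM:" is not misread as "PROBLEM:".
  const byLength = [...MARKERS].sort((a, b) => b.length - a.length);
  const splitter = new RegExp(`(${byLength.join("|")}):`, "g");
  const parts = goal.split(splitter);
  const packet: Partial<Record<Marker, string>> = {};
  for (let i = 1; i + 1 < parts.length; i += 2) {
    packet[parts[i] as Marker] = parts[i + 1].trim();
  }
  return packet;
}

const packet = parseGoal(
  "PROBLEM: reduce flaky CI runs FOCUS ANGLE: test isolation OUTPUT FILE: design-candidates.md"
);
// Executor mode iff a focus angle was assigned; output file has a default.
const executorMode = packet["FOCUS ANGLE"] !== undefined;
const outputFile = packet["OUTPUT FILE"] ?? "design-candidates.md";
console.log({ executorMode, outputFile, angle: packet["FOCUS ANGLE"] });
```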
package/workflows/workflow-for-workflows.json
CHANGED

@@ -647,26 +647,20 @@
       "id": "phase-7a-assign-tags",
       "title": "Phase 7a: Assign Tags",
       "promptBlocks": {
-        "goal": "
+        "goal": "Choose the right tags for the workflow and write the about and examples fields into the workflow JSON so humans and agents can discover and understand it.",
         "procedure": [
-          "Read spec/workflow-tags.json to see the available tags and their 'when' phrases.",
           "Based on the workflow's purpose and description, select 1-3 tags from the closed set (coding, review_audit, investigation, design, documentation, tickets, learning, routines, authoring).",
-          "Check whether the workflow ID already exists in the `workflows` section. If it does, update the existing entry tags rather than adding a duplicate. If it does not exist, add a new entry under 'workflows' in spec/workflow-tags.json: { \"tags\": [\"<tag1>\"] }.",
-          "If the workflow is a test fixture or internal utility not meant for end-user discovery, add 'hidden': true.",
-          "Save the tags file. Do not modify any other field.",
           "Write the 'about' field into the workflow JSON: a markdown string (100-400 words) written for a human deciding whether to use this workflow. Cover what it does, when to use it, what it produces, and how to get good results. This is a user-facing surface -- not agent instructions (use metaGuidance for that).",
-          "Write the 'examples' field into the workflow JSON: an array of 2-4 short, concrete goal strings (10-120 chars each) showing what this workflow is used for. Each example should be specific enough to be informative -- not generic ('implement a feature'). These appear in list_workflows output so agents can communicate concrete goal phrasing to users."
-          "Skip 'about' and 'examples' only if the workflow is marked hidden: true."
+          "Write the 'examples' field into the workflow JSON: an array of 2-4 short, concrete goal strings (10-120 chars each) showing what this workflow is used for. Each example should be specific enough to be informative -- not generic ('implement a feature'). These appear in list_workflows output so agents can communicate concrete goal phrasing to users."
         ],
         "constraints": [
-          "Only use tags from the closed set. Do not invent new tags.",
-          "
-          "Tags should reflect what the workflow does, not what it is named.",
+          "Only use tags from the closed set in spec/workflow-tags.json. Do not invent new tags.",
+          "Do not write tags into the workflow JSON file -- the tags field is not part of the workflow schema. Tags are for catalog registration only, and external users do not have access to the package's spec/workflow-tags.json.",
           "Write 'about' for humans, not agents -- do not copy metaGuidance or step prompt text into it.",
           "Examples must be specific to this workflow; reject generic examples that would fit any workflow."
         ],
         "outputRequired": {
-          "notesMarkdown": "
+          "notesMarkdown": "State the chosen tags with a one-line justification for each. Confirm about and examples were written into the workflow JSON."
         }
       },
       "requireConfirmation": false
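
The revised Phase 7a splits discovery metadata across two surfaces: tags in the package's catalog file, about/examples in the workflow JSON itself. A sketch of that split -- the interfaces are illustrative, with only the tag set and the { "tags": ["<tag1>"] } entry shape taken from the text above:

```ts
// Closed tag set quoted from the procedure text.
type WorkflowTag =
  | "coding" | "review_audit" | "investigation" | "design"
  | "documentation" | "tickets" | "learning" | "routines" | "authoring";

// spec/workflow-tags.json: catalog registration only -- not part of the
// workflow schema, and not visible to external users of the package.
interface TagCatalog {
  workflows: Record<string, { tags: WorkflowTag[]; hidden?: boolean }>;
}

// The workflow JSON carries the user-facing discovery surfaces.
interface WorkflowDiscoveryFields {
  about: string;      // 100-400 words, written for a human
  examples: string[]; // 2-4 concrete goal strings, 10-120 chars each
}

// The new constraint in one check: tags must never leak into workflow JSON.
function assertNoTagsField(workflowJson: Record<string, unknown>): void {
  if ("tags" in workflowJson) {
    throw new Error(
      "tags belong in spec/workflow-tags.json, not the workflow schema"
    );
  }
}
```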