@exaudeus/workrail 3.13.0 → 3.15.0 (npm)

This diff shows the contents of publicly released package versions as they appear in their public registries. It is provided for informational purposes only.

Files changed (62)

package/dist/application/services/validation-engine.js +4 -9
package/dist/application/services/workflow-compiler.js +4 -6
package/dist/console/assets/index-BZYIjrzJ.js +28 -0
package/dist/console/assets/index-OLCKbDdm.css +1 -0
package/dist/console/index.html +2 -2
package/dist/engine/engine-factory.js +2 -2
package/dist/engine/types.d.ts +1 -1
package/dist/manifest.json +63 -63
package/dist/mcp/handlers/shared/request-workflow-reader.d.ts +5 -0
package/dist/mcp/handlers/shared/request-workflow-reader.js +47 -2
package/dist/mcp/handlers/v2-advance-core/assessment-consequences.d.ts +1 -1
package/dist/mcp/handlers/v2-advance-core/assessment-consequences.js +4 -5
package/dist/mcp/handlers/v2-advance-core/index.js +1 -1
package/dist/mcp/handlers/v2-advance-core/outcome-blocked.js +1 -1
package/dist/mcp/handlers/v2-execution/start.d.ts +1 -0
package/dist/mcp/handlers/v2-execution/start.js +20 -1
package/dist/mcp/handlers/v2-workflow.d.ts +23 -0
package/dist/mcp/handlers/v2-workflow.js +177 -10
package/dist/mcp/output-schemas.d.ts +202 -8
package/dist/mcp/output-schemas.js +38 -11
package/dist/mcp/server.js +48 -1
package/dist/mcp/tool-descriptions.js +17 -9
package/dist/mcp/v2/tools.d.ts +6 -0
package/dist/mcp/v2/tools.js +2 -0
package/dist/mcp/workflow-protocol-contracts.js +5 -1
package/dist/types/workflow-definition.d.ts +2 -2
package/dist/v2/infra/local/workspace-anchor/index.js +4 -1
package/dist/v2/usecases/console-routes.js +49 -1
package/dist/v2/usecases/console-service.d.ts +1 -0
package/dist/v2/usecases/console-service.js +4 -1
package/dist/v2/usecases/console-types.d.ts +12 -0
package/dist/v2/usecases/worktree-service.js +55 -7
package/package.json +3 -2
package/spec/authoring-spec.json +91 -3
package/spec/workflow-tags.json +132 -0
package/spec/workflow.schema.json +411 -97
package/workflows/adaptive-ticket-creation.json +40 -22
package/workflows/architecture-scalability-audit.json +65 -31
package/workflows/bug-investigation.agentic.v2.json +36 -14
package/workflows/coding-task-workflow-agentic.json +50 -38
package/workflows/coding-task-workflow-agentic.lean.v2.json +124 -37
package/workflows/coding-task-workflow-agentic.v2.json +90 -30
package/workflows/cross-platform-code-conversion.v2.json +168 -48
package/workflows/document-creation-workflow.json +47 -17
package/workflows/documentation-update-workflow.json +8 -8
package/workflows/intelligent-test-case-generation.json +2 -2
package/workflows/learner-centered-course-workflow.json +267 -267
package/workflows/mr-review-workflow.agentic.v2.json +81 -14
package/workflows/personal-learning-materials-creation-branched.json +175 -175
package/workflows/presentation-creation.json +159 -159
package/workflows/production-readiness-audit.json +54 -15
package/workflows/relocation-workflow-us.json +44 -35
package/workflows/routines/tension-driven-design.json +1 -1
package/workflows/scoped-documentation-workflow.json +25 -25
package/workflows/test-artifact-loop-control.json +1 -2
package/workflows/ui-ux-design-workflow.json +327 -0
package/workflows/workflow-diagnose-environment.json +1 -1
package/workflows/workflow-for-workflows.json +507 -484
package/workflows/workflow-for-workflows.v2.json +90 -18
package/workflows/wr.discovery.json +112 -30
package/dist/console/assets/index-DW78t31j.css +0 -1
package/dist/console/assets/index-EsSXrC_a.js +0 -28

package/workflows/adaptive-ticket-creation.json CHANGED

@@ -2,15 +2,15 @@
   "id": "adaptive-ticket-creation",
   "name": "Adaptive Ticket Creation Workflow",
   "version": "1.0.0",
-  "description": "Create high-quality Jira tickets by automatically selecting the right complexity path (Simple, Standard, or Epic) based on request analysis. One polished ticket for simple requests; structured decomposition with estimates for epics.",
+  "description": "Use this to create high-quality Jira tickets for features, tasks, or epics. Automatically selects the right complexity path (Simple, Standard, or Epic) and generates properly structured tickets with acceptance criteria and estimates.",
   "preconditions": [
     "User has provided a description of the feature, task, or work to be ticketed.",
     "Agent has file system access for loading team preferences and persisting rules."
   ],
   "metaGuidance": [
-    "ROLE: expert Product Manager and Mobile Tech Lead. Triage autonomously, write developer-ready tickets with full context, and produce objectively testable acceptance criteria — not user-story paraphrases.",
+    "ROLE: expert Product Manager and Mobile Tech Lead. Triage autonomously, write developer-ready tickets with full context, and produce objectively testable acceptance criteria \u2014 not user-story paraphrases.",
     "EXPLORE FIRST: use tools to gather context before asking the user anything. Ask only for information you genuinely cannot determine with tools or from the request itself.",
-    "TEAM RULES: load and follow ./.workflow_rules/ticket_creation.md when it exists. Preferences there override your defaults. Rules are captured only on the Epic path — complex sessions are where durable conventions emerge and where the investment pays off.",
+    "TEAM RULES: load and follow ./.workflow_rules/ticket_creation.md when it exists. Preferences there override your defaults. Rules are captured only on the Epic path \u2014 complex sessions are where durable conventions emerge and where the investment pays off.",
     "AUTONOMOUS TRIAGE: decide pathComplexity (Simple / Standard / Epic) yourself from the request. Surface your reasoning, then wait for confirmation.",
     "QUALITY FLOOR: every ticket must have a context-rich description, checkbox-style acceptance criteria that are objectively testable, and an effort estimate."
   ],
@@ -21,7 +21,7 @@
       "promptBlocks": {
         "goal": "Analyze the request, gather available context, and select the right complexity path before doing any ticket work.",
         "constraints": [
-          "Decide the path yourself — do not ask the user to choose.",
+          "Decide the path yourself \u2014 do not ask the user to choose.",
           "Load ./.workflow_rules/ticket_creation.md if it exists and let it influence your triage. If the file does not exist, note this explicitly in your output so the user knows team conventions were not applied.",
           "Set pathComplexity to exactly one of: Simple, Standard, or Epic."
         ],
@@ -29,7 +29,7 @@
           "Read any attached documents, linked PRDs, or referenced specs.",
           "Identify complexity signals: scope breadth, number of distinct deliverables, cross-team dependencies, technical unknowns, and estimated ticket count.",
           "Apply the triage rubric: Simple = single ticket, clear requirements, no blocking unknowns, minimal dependencies. Standard = multiple related tickets, moderate scope, some analysis needed. Epic = complex feature requiring decomposition, multiple teams or significant unknowns, likely 6+ tickets.",
-          "Upgrade triggers — escalate to Standard if: request implies more than one clearly separate work item. Escalate to Epic if: multiple teams are involved, architecture decisions are unresolved, or you estimate more than five tickets.",
+          "Upgrade triggers \u2014 escalate to Standard if: request implies more than one clearly separate work item. Escalate to Epic if: multiple teams are involved, architecture decisions are unresolved, or you estimate more than five tickets.",
           "State your selected path and the top three reasons. Capture pathComplexity in context."
         ],
         "outputRequired": {
@@ -53,7 +53,7 @@
       "promptBlocks": {
         "goal": "Generate one complete, developer-ready Jira ticket for this request.",
         "constraints": [
-          "Acceptance criteria must be phrased as observable, testable conditions — not user-story restatements.",
+          "Acceptance criteria must be phrased as observable, testable conditions \u2014 not user-story restatements.",
           "Follow any team conventions from ./.workflow_rules/ticket_creation.md.",
           "Include all fields a developer needs to start work without asking follow-up questions."
         ],
@@ -81,8 +81,14 @@
       "title": "Path C, Phase 1: Gather Context and Surface Gaps",
       "runCondition": {
         "or": [
-          { "var": "pathComplexity", "equals": "Standard" },
-          { "var": "pathComplexity", "equals": "Epic" }
+          {
+            "var": "pathComplexity",
+            "equals": "Standard"
+          },
+          {
+            "var": "pathComplexity",
+            "equals": "Epic"
+          }
         ]
       },
       "promptBlocks": {
@@ -97,7 +103,7 @@
           "Load ./.workflow_rules/ticket_creation.md and note any relevant team conventions.",
           "Identify: key stakeholders, team dependencies, technical constraints, known risks, and any conflicting requirements.",
           "Classify each gap as: Critical (blocks planning), Important (affects scope), or Nice-to-have (can proceed without it).",
-          "For Critical and Important gaps that tools cannot resolve, ask the user — in a single consolidated question block, not one at a time.",
+          "For Critical and Important gaps that tools cannot resolve, ask the user \u2014 in a single consolidated question block, not one at a time.",
           "After receiving answers, check whether any response reveals scope that would change `pathComplexity` (e.g. the user confirms three teams are involved, or the feature is narrower than initially assessed). If so, state the new classification and reasoning, and ask the user to confirm before continuing to Phase 2."
         ],
         "outputRequired": {
@@ -116,23 +122,29 @@
       "title": "Path C, Phase 2: Create High-Level Plan",
       "runCondition": {
         "or": [
-          { "var": "pathComplexity", "equals": "Standard" },
-          { "var": "pathComplexity", "equals": "Epic" }
+          {
+            "var": "pathComplexity",
+            "equals": "Standard"
+          },
+          {
+            "var": "pathComplexity",
+            "equals": "Epic"
+          }
         ]
       },
       "promptBlocks": {
         "goal": "Produce a structured plan that will drive ticket generation. This plan is the source of truth for scope.",
         "constraints": [
-          "Be explicit about scope boundaries — ambiguous scope will produce ambiguous tickets.",
+          "Be explicit about scope boundaries \u2014 ambiguous scope will produce ambiguous tickets.",
           "Success criteria must be measurable, not just descriptive.",
           "For Standard path: this plan feeds directly into batch ticket generation."
         ],
         "procedure": [
           "Write: Project Summary (2-3 sentences, what is being built and why).",
           "Write: Key Deliverables (bulleted list of distinct components or features).",
-          "Write: In-Scope (explicit list — prevents scope creep).",
-          "Write: Out-of-Scope (explicit exclusions — prevents misunderstandings).",
-          "Write: Success Criteria (measurable definition of done — each item verifiable).",
+          "Write: In-Scope (explicit list \u2014 prevents scope creep).",
+          "Write: Out-of-Scope (explicit exclusions \u2014 prevents misunderstandings).",
+          "Write: Success Criteria (measurable definition of done \u2014 each item verifiable).",
           "Write: High-Level Timeline (phases or milestones with rough sizing).",
           "Review: does every deliverable map clearly to implementable work? Is anything in scope that should be out?"
         ],
@@ -158,7 +170,7 @@
         "goal": "Break the approved plan into a logical work hierarchy that development teams can execute.",
         "constraints": [
           "Every item in the plan's In-Scope list must map to at least one work item in the hierarchy.",
-          "Dependencies must be explicit — not implied by ordering alone.",
+          "Dependencies must be explicit \u2014 not implied by ordering alone.",
           "Oversized stories (more than one sprint of work) should be split."
         ],
         "procedure": [
@@ -190,7 +202,7 @@
       "promptBlocks": {
         "goal": "Add effort estimates, risk assessments, and team assignments to each story in the hierarchy.",
         "constraints": [
-          "Conservative estimates are better than optimistic ones — note uncertainty explicitly.",
+          "Conservative estimates are better than optimistic ones \u2014 note uncertainty explicitly.",
           "Justify each estimate with one sentence of reasoning.",
           "Flag stories on the critical path."
         ],
@@ -200,7 +212,7 @@
           "Assign priority: must-have for MVP, should-have, nice-to-have.",
           "Note suggested team or skill area for each story.",
           "Identify critical path: which stories block the most downstream work? Surface these explicitly.",
-          "Flag any stories whose estimates feel uncertain — surface the unknowns rather than hiding them in a range."
+          "Flag any stories whose estimates feel uncertain \u2014 surface the unknowns rather than hiding them in a range."
         ],
         "outputRequired": {
           "notesMarkdown": "Total story point estimate, critical path items, high-risk stories."
@@ -218,8 +230,14 @@
       "title": "Path C/E, Phase 5: Batch Ticket Generation",
       "runCondition": {
         "or": [
-          { "var": "pathComplexity", "equals": "Standard" },
-          { "var": "pathComplexity", "equals": "Epic" }
+          {
+            "var": "pathComplexity",
+            "equals": "Standard"
+          },
+          {
+            "var": "pathComplexity",
+            "equals": "Epic"
+          }
         ]
       },
       "promptBlocks": {
@@ -259,7 +277,7 @@
       "promptBlocks": {
         "goal": "Extract actionable team preferences from this session and persist them so future runs use them automatically.",
         "constraints": [
-          "Only write rules that are genuinely reusable across future tickets — skip one-off project specifics.",
+          "Only write rules that are genuinely reusable across future tickets \u2014 skip one-off project specifics.",
           "Keep rules concise and actionable, not narrative.",
           "Append to ./.workflow_rules/ticket_creation.md rather than replacing it."
         ],
@@ -267,7 +285,7 @@
           "Review what conventions, preferences, or requirements emerged during this session.",
           "Identify patterns worth preserving: naming conventions, field usage, AC format preferences, estimation approach, labeling rules.",
           "Draft new rules as short, imperative statements (e.g., 'Use T-shirt sizing not Fibonacci', 'Always include a Figma link in design tickets').",
-          "Check against existing rules — avoid duplicates or contradictions.",
+          "Check against existing rules \u2014 avoid duplicates or contradictions.",
           "Append new rules to ./.workflow_rules/ticket_creation.md, creating the file if it does not exist."
         ],
         "outputRequired": {
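
Apart from the reworded "description", the hunks above for adaptive-ticket-creation.json appear to change only how the JSON is serialized: the em dash is written as the escape \u2014, and the single-line runCondition entries are expanded across several lines. The sketch below is one way to check that such changes are no-ops once parsed. It assumes only Python's standard json module; the helper name same_json and the variable names are illustrative and are not part of the package.

```python
import json

# Old and new forms of one string from the hunks above. The only difference
# is how the em dash is written: as a literal character (built with chr()
# here) or as the JSON escape sequence \u2014.
EM_DASH = chr(0x2014)
old_constraint = '"Decide the path yourself ' + EM_DASH + ' do not ask the user to choose."'
new_constraint = '"Decide the path yourself \\u2014 do not ask the user to choose."'

# Old (single-line) and new (expanded) layouts of the "or" condition object.
old_condition = """
{"or": [
  { "var": "pathComplexity", "equals": "Standard" },
  { "var": "pathComplexity", "equals": "Epic" }
]}
"""
new_condition = """
{"or": [
  {
    "var": "pathComplexity",
    "equals": "Standard"
  },
  {
    "var": "pathComplexity",
    "equals": "Epic"
  }
]}
"""


def same_json(a: str, b: str) -> bool:
    """Return True when two JSON texts parse to equal values."""
    return json.loads(a) == json.loads(b)


escape_only = same_json(old_constraint, new_constraint)
layout_only = same_json(old_condition, new_condition)
print(escape_only, layout_only)  # prints: True True
```

Comparing parsed values rather than raw text is what separates serialization noise from real content changes; the same check applied to the old and new "description" strings would return False.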

package/workflows/architecture-scalability-audit.json CHANGED

@@ -1,8 +1,8 @@
 {
   "id": "architecture-scalability-audit",
-  "name": "Architecture Scalability Audit (v1 • Evidence-Driven • Dimension-Scoped • rigorMode-Adaptive)",
+  "name": "Architecture Scalability Audit (v1 \u2022 Evidence-Driven \u2022 Dimension-Scoped \u2022 rigorMode-Adaptive)",
   "version": "0.1.0",
-  "description": "Audit a bounded codebase scope for architecture scalability. The user declares which scalability dimensions matter (load, data volume, team/org, feature extensibility, operational); the workflow audits only those dimensions and produces per-dimension verdicts grounded in actual code, not generic advice.",
+  "description": "Use this to audit a bounded codebase scope for architecture scalability. Declare which scalability dimensions matter (load, data volume, team size, feature extensibility, operational); the workflow investigates each and produces evidence-grounded findings.",
   "recommendedPreferences": {
     "recommendedAutonomy": "guided",
     "recommendedRiskPolicy": "conservative"
@@ -20,7 +20,7 @@
     "DEFAULT BEHAVIOR: self-execute with tools. Ask only for true scope or dimension decisions you cannot resolve yourself.",
     "V2 DURABILITY: keep workflow truth in output.notesMarkdown and explicit context fields. Human-facing markdown artifacts are optional companions only.",
     "OWNERSHIP: the main agent owns the fact packet, synthesis, verdict calibration, and final handoff. Delegated dimension audits are evidence, not authority.",
-    "DIMENSION DISCIPLINE: audit only the dimensions the user declared. Do not add dimensions the user did not select, even if they look relevant — surface them as advisory notes instead.",
+    "DIMENSION DISCIPLINE: audit only the dimensions the user declared. Do not add dimensions the user did not select, even if they look relevant \u2014 surface them as advisory notes instead.",
     "EVIDENCE FIRST: every risk or will_break finding must cite a specific file, class, method, or pattern in the codebase. Technology name alone is not evidence.",
     "GROWTH SCENARIO: every concern must name a growth scenario (e.g. 10x traffic, 100x records, 3x team size). Generic 'won't scale' findings are not acceptable.",
     "VERDICT TIERS: use will_break / risk / fine. Do not force a cleaner answer than the evidence supports.",
@@ -33,16 +33,21 @@
       "promptBlocks": {
         "goal": "Establish a precise bounded scope and confirm which scalability dimensions this audit will cover.",
         "constraints": [
-          [{ "kind": "ref", "refId": "wr.refs.notes_first_durability" }],
+          [
+            {
+              "kind": "ref",
+              "refId": "wr.refs.notes_first_durability"
+            }
+          ],
           "Scope must be bounded before investigation begins. Unbounded scope produces generic findings.",
           "Dimension selection is the user's decision. Explore the codebase to inform the conversation, but the user declares which dimensions matter."
         ],
         "procedure": [
           "Read the codebase to understand the architecture: key components, entry points, data flows, and main patterns within the declared scope.",
-          "Present the five scalability dimensions and ask the user to select which apply: (1) load — handles more requests, users, or throughput; (2) data_volume — handles more records, storage, or query size; (3) team_org — more teams or developers working on this scope; (4) feature_extensibility — more features added without rearchitecting; (5) operational — more deployments, environments, or operational complexity.",
-          "Ask the user to confirm the scope boundary — what is explicitly in and explicitly out.",
-          "Classify audit complexity: Simple (1–2 dimensions, small scope), Medium (2–3 dimensions, moderate scope), Complex (4–5 dimensions or large scope).",
-          "Run a context-clarity check: score boundary_clarity, dimension_clarity, and codebase_familiarity 1–3. If any score is 1, gather more context before advancing."
+          "Present the five scalability dimensions and ask the user to select which apply: (1) load \u2014 handles more requests, users, or throughput; (2) data_volume \u2014 handles more records, storage, or query size; (3) team_org \u2014 more teams or developers working on this scope; (4) feature_extensibility \u2014 more features added without rearchitecting; (5) operational \u2014 more deployments, environments, or operational complexity.",
+          "Ask the user to confirm the scope boundary \u2014 what is explicitly in and explicitly out.",
+          "Classify audit complexity: Simple (1\u20132 dimensions, small scope), Medium (2\u20133 dimensions, moderate scope), Complex (4\u20135 dimensions or large scope).",
+          "Run a context-clarity check: score boundary_clarity, dimension_clarity, and codebase_familiarity 1\u20133. If any score is 1, gather more context before advancing."
         ],
         "outputRequired": {
           "notesMarkdown": "Scope boundary (in and out), declared dimensions with rationale, audit complexity classification, and any open boundary questions.",
@@ -87,7 +92,12 @@
       "promptBlocks": {
         "goal": "Freeze a neutral scalability fact packet and assign one reviewer family per declared dimension.",
         "constraints": [
-          [{ "kind": "ref", "refId": "wr.refs.notes_first_durability" }],
+          [
+            {
+              "kind": "ref",
+              "refId": "wr.refs.notes_first_durability"
+            }
+          ],
           "The fact packet is the primary truth for all dimension reviewer families.",
           "Keep the scalability hypothesis as a reference to challenge, not a frame to defend.",
           "One reviewer family per declared dimension only. Do not add families for undeclared dimensions."
@@ -95,7 +105,7 @@
         "procedure": [
           "Create a neutral `scalabilityFactPacket` containing: scope boundary (in and out), declared dimensions, key architectural patterns found, main components and their roles, data flow and storage patterns, concurrency and state management approach, dependency boundaries and coupling, deployment and runtime assumptions, and explicit open unknowns.",
           "Include realism signals: code that looks scalable at a glance but may have hidden limits (e.g. in-memory state, synchronous choke points, missing pagination, tight coupling between components).",
-          "For each declared dimension, assign a reviewer family mission: load = examine request handling, concurrency, session/state management, caching, connection pools, and horizontal scaling readiness — check whether session state is in-memory or distributed, whether connection pools are bounded, whether synchronous bottlenecks exist in hot paths; data_volume = examine query patterns, pagination, indexing, result set bounds, storage growth, and data access layer scalability — check for unbounded queries (missing LIMIT/pagination), missing indexes on filtered columns, N+1 patterns in repository/service layers, and data structures that grow unboundedly; team_org = examine module coupling, shared state, and parallel development friction — specifically check import graphs for cross-module dependencies that would cause merge conflicts, identify shared mutable singletons or global state, look for test setup that requires spinning up adjacent modules, and check whether public interfaces change frequently or are stable; feature_extensibility = examine how much code changes when a new variant of a core concept is added — specifically look for switch/when/if-else chains on type discriminators that would need a new branch per feature, hardcoded business-rule constants, direct concrete dependencies instead of interfaces or abstractions, and files that are edited for every new feature; operational = examine deployment complexity, environment-specific behavior, observability, configuration surface, and operational runbook needs — specifically check for environment-specific code paths (if/switch on env vars that create different behavior per environment), configuration that must be updated in multiple places per deployment, whether logs and metrics cover the main operational failure modes, and whether a new deployment of this scope would require manual steps beyond a standard deploy.",
+          "For each declared dimension, assign a reviewer family mission: load = examine request handling, concurrency, session/state management, caching, connection pools, and horizontal scaling readiness \u2014 check whether session state is in-memory or distributed, whether connection pools are bounded, whether synchronous bottlenecks exist in hot paths; data_volume = examine query patterns, pagination, indexing, result set bounds, storage growth, and data access layer scalability \u2014 check for unbounded queries (missing LIMIT/pagination), missing indexes on filtered columns, N+1 patterns in repository/service layers, and data structures that grow unboundedly; team_org = examine module coupling, shared state, and parallel development friction \u2014 specifically check import graphs for cross-module dependencies that would cause merge conflicts, identify shared mutable singletons or global state, look for test setup that requires spinning up adjacent modules, and check whether public interfaces change frequently or are stable; feature_extensibility = examine how much code changes when a new variant of a core concept is added \u2014 specifically look for switch/when/if-else chains on type discriminators that would need a new branch per feature, hardcoded business-rule constants, direct concrete dependencies instead of interfaces or abstractions, and files that are edited for every new feature; operational = examine deployment complexity, environment-specific behavior, observability, configuration surface, and operational runbook needs \u2014 specifically check for environment-specific code paths (if/switch on env vars that create different behavior per environment), configuration that must be updated in multiple places per deployment, whether logs and metrics cover the main operational failure modes, and whether a new deployment of this scope would require manual steps beyond a standard deploy.",
           "Set selectedReviewerFamilies to the list of assigned families (one per declared dimension). Set contradictionCount and blindSpotCount to 0."
         ],
         "outputRequired": {
@@ -110,8 +120,11 @@
       "promptFragments": [
         {
           "id": "phase-2-quick",
-          "when": { "var": "auditComplexity", "equals": "Simple" },
-          "text": "For a Simple audit, keep the fact packet compact — scope summary, key patterns, and declared dimensions only. Skip exhaustive realism signal enumeration."
+          "when": {
+            "var": "auditComplexity",
+            "equals": "Simple"
+          },
+          "text": "For a Simple audit, keep the fact packet compact \u2014 scope summary, key patterns, and declared dimensions only. Skip exhaustive realism signal enumeration."
         }
       ],
       "requireConfirmation": false
@@ -122,15 +135,25 @@
       "promptBlocks": {
         "goal": "Run one reviewer family per declared dimension in parallel, then synthesize their findings as evidence rather than verdicts.",
         "constraints": [
-          [{ "kind": "ref", "refId": "wr.refs.notes_first_durability" }],
-          [{ "kind": "ref", "refId": "wr.refs.synthesis_under_disagreement" }],
+          [
+            {
+              "kind": "ref",
+              "refId": "wr.refs.notes_first_durability"
+            }
+          ],
+          [
+            {
+              "kind": "ref",
+              "refId": "wr.refs.synthesis_under_disagreement"
+            }
+          ],
           "Each reviewer family uses scalabilityFactPacket as primary truth.",
           "Reviewer-family outputs are raw evidence. The main agent owns synthesis and verdict assignment.",
-          "Each reviewer family audits only its declared dimension — no cross-dimension scope creep."
+          "Each reviewer family audits only its declared dimension \u2014 no cross-dimension scope creep."
         ],
         "procedure": [
           "Before investigating, restate your scalabilityHypothesis and name which dimension is most likely to challenge it.",
-          "Run one investigation per declared dimension. For each dimension, the investigation must return: top findings, evidence for each finding (specific file, class, method, or pattern references — not just technology names), verdict tier per finding (will_break / risk / fine), growth scenario for each concern (e.g. 10x traffic, 100x records, 3x team size), biggest uncertainty, and likely false-confidence vector for this dimension.",
+          "Run one investigation per declared dimension. For each dimension, the investigation must return: top findings, evidence for each finding (specific file, class, method, or pattern references \u2014 not just technology names), verdict tier per finding (will_break / risk / fine), growth scenario for each concern (e.g. 10x traffic, 100x records, 3x team size), biggest uncertainty, and likely false-confidence vector for this dimension.",
           "After completing all dimension investigations, synthesize explicitly: what was confirmed, what was genuinely new, what looks weak or overstated, and what changed your current hypothesis.",
           "Build dimensionFindings keyed by dimension containing: findings list, verdict summary, evidence quality assessment, and open questions.",
           "Identify cross-cutting concerns: architectural patterns or components that appear in findings from multiple dimensions."
@@ -148,12 +171,18 @@
       "promptFragments": [
         {
           "id": "phase-3-quick",
-          "when": { "var": "auditComplexity", "equals": "Simple" },
+          "when": {
+            "var": "auditComplexity",
+            "equals": "Simple"
+          },
           "text": "For a Simple audit, self-execute each dimension investigation directly without spawning WorkRail Executors. One dimension at a time, using tools to inspect the codebase. This keeps the audit proportionate to the scope."
         },
         {
           "id": "phase-3-thorough",
-          "when": { "var": "auditComplexity", "equals": "Complex" },
+          "when": {
+            "var": "auditComplexity",
+            "equals": "Complex"
+          },
           "text": "For a Complex audit, spawn all dimension executors simultaneously, then after synthesis run routine-hypothesis-challenge against any will_break finding before closing this phase. This adds an adversarial check on the most serious findings."
         }
       ],
@@ -179,7 +208,12 @@
           "promptBlocks": {
             "goal": "Resolve contradictions between dimension findings and sharpen cross-cutting concerns.",
             "constraints": [
-              [{ "kind": "ref", "refId": "wr.refs.parallelize_cognition_serialize_synthesis" }],
+              [
+                {
+                  "kind": "ref",
+                  "refId": "wr.refs.parallelize_cognition_serialize_synthesis"
+                }
+              ],
               "Contradiction resolution is main-agent work. Do not delegate synthesis.",
               "A cross-cutting concern that spans multiple dimensions is its own finding."
             ],
@@ -210,10 +244,10 @@
               "This is a structured four-item check, not a free-form review."
             ],
             "procedure": [
-              "Check 1 — Technology-vs-usage: did any reviewer identify a scalable technology without checking actual usage patterns in the code? (e.g. Postgres was identified as the DB, but were N+1 queries, missing indexes, or unbounded result sets actually checked?) Fix any instances found.",
-              "Check 2 — Scope drift: did any reviewer audit components outside the declared scope boundary? Remove out-of-scope findings.",
-              "Check 3 — Undeclared relevant dimensions: does the codebase have patterns suggesting a declared-out dimension actually matters for this scope? If so, surface it as an advisory note without adding it to the audit verdict.",
-              "Check 4 — Growth scenario vagueness: does every concern name a specific growth scenario? If not, assign one now based on the most realistic growth pattern for this scope.",
+              "Check 1 \u2014 Technology-vs-usage: did any reviewer identify a scalable technology without checking actual usage patterns in the code? (e.g. Postgres was identified as the DB, but were N+1 queries, missing indexes, or unbounded result sets actually checked?) Fix any instances found.",
+              "Check 2 \u2014 Scope drift: did any reviewer audit components outside the declared scope boundary? Remove out-of-scope findings.",
+              "Check 3 \u2014 Undeclared relevant dimensions: does the codebase have patterns suggesting a declared-out dimension actually matters for this scope? If so, surface it as an advisory note without adding it to the audit verdict.",
+              "Check 4 \u2014 Growth scenario vagueness: does every concern name a specific growth scenario? If not, assign one now based on the most realistic growth pattern for this scope.",
               "Set blindSpotCount to the number of blind spots found across all four checks."
             ],
             "outputRequired": {
@@ -265,11 +299,11 @@
           "Do not advance to handoff with known hard gate failures."
         ],
         "procedure": [
-          "Verdict aggregation — derive scalabilityVerdict from dimensionFindings using these explicit rules: (1) at_risk if any declared dimension has a will_break finding; (2) conditional if no will_break findings exist but at least one dimension has a risk finding; (3) ready_to_scale if all declared dimensions have only fine findings; (4) inconclusive if any dimension still has evidenceWeak = true after the synthesis loop, making a reliable verdict impossible. Capture verdictRationale naming the specific dimension and finding that drove the verdict.",
-          "Hard gate 1 — Evidence grounding: for every will_break and risk finding in dimensionFindings, confirm it cites a specific file, class, method, or code pattern. Technology name alone fails this gate. Fix by locating the code evidence or downgrading to risk with an evidence-needed note.",
-          "Hard gate 2 — Dimension coverage: confirm every declared dimension has at least one substantive finding. A verdict of fine with supporting evidence counts. A dimension with no findings at all fails this gate.",
-          "Hard gate 3 — Hypothesis revisited: confirm that scalabilityHypothesis from Phase 1 is either confirmed or explicitly revised in synthesis notes. If it was never addressed, address it now.",
-          "Hard gate 4 — Growth scenario specificity: confirm every concern in dimensionFindings names a growth scenario. If any do not, assign one now.",
+          "Verdict aggregation \u2014 derive scalabilityVerdict from dimensionFindings using these explicit rules: (1) at_risk if any declared dimension has a will_break finding; (2) conditional if no will_break findings exist but at least one dimension has a risk finding; (3) ready_to_scale if all declared dimensions have only fine findings; (4) inconclusive if any dimension still has evidenceWeak = true after the synthesis loop, making a reliable verdict impossible. Capture verdictRationale naming the specific dimension and finding that drove the verdict.",
+          "Hard gate 1 \u2014 Evidence grounding: for every will_break and risk finding in dimensionFindings, confirm it cites a specific file, class, method, or code pattern. Technology name alone fails this gate. Fix by locating the code evidence or downgrading to risk with an evidence-needed note.",
+          "Hard gate 2 \u2014 Dimension coverage: confirm every declared dimension has at least one substantive finding. A verdict of fine with supporting evidence counts. A dimension with no findings at all fails this gate.",
+          "Hard gate 3 \u2014 Hypothesis revisited: confirm that scalabilityHypothesis from Phase 1 is either confirmed or explicitly revised in synthesis notes. If it was never addressed, address it now.",
+          "Hard gate 4 \u2014 Growth scenario specificity: confirm every concern in dimensionFindings names a growth scenario. If any do not, assign one now.",
           "Set hardGatesPassed = true only when the verdict aggregation and all four gates pass. Set hardGateFailures to the list of any that needed fixing."
         ],
         "outputRequired": {
@@ -293,13 +327,13 @@
           "Do not drift into implementation planning or remediation design unless the user explicitly asks."
         ],
         "procedure": [
-          "Open with the overall scalability readiness verdict (ready_to_scale / conditional / at_risk / inconclusive) and the verdictRationale — name the specific dimension and finding that drove it.",
+          "Open with the overall scalability readiness verdict (ready_to_scale / conditional / at_risk / inconclusive) and the verdictRationale \u2014 name the specific dimension and finding that drove it.",
           "For each declared dimension, give: dimension name, verdict tier (will_break / risk / fine), top finding with specific code reference, growth scenario, and severity.",
           "List cross-cutting concerns: patterns that create scalability risk across multiple dimensions.",
           "Revisit scalabilityHypothesis from Phase 1: was it confirmed or revised? What evidence changed your view?",
           "Give a prioritized concern list ordered by: (1) will_break findings first, (2) risk findings by severity, (3) cross-cutting concerns, (4) fine findings worth noting as already solid.",
           "Surface any advisory notes for undeclared dimensions that may be worth considering.",
-          "State what is already well-designed for scale — not everything should be a concern."
+          "State what is already well-designed for scale \u2014 not everything should be a concern."
         ],
         "outputRequired": {
           "notesMarkdown": "Decision-ready scalability handoff: overall verdict, per-dimension summary with code references, prioritized concerns, cross-cutting concerns, hypothesis outcome, and what is already solid."
@@ -308,7 +342,7 @@
           "The handoff is verdict-first and evidence-grounded.",
           "Every concern is tied to a specific code reference and growth scenario.",
           "The hypothesis from Phase 1 is explicitly addressed.",
-          "What is already well-designed is stated — not just the concerns."
+          "What is already well-designed is stated \u2014 not just the concerns."
         ]
       },
       "requireConfirmation": false
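
Note on the verdict aggregation rules in the hunk above: the four rules can be read as a small decision function. The following is a minimal sketch, not code from the package. The `DimensionFinding` shape and the `aggregateVerdict` name are hypothetical, and two points are assumptions the workflow text does not settle: that `inconclusive` is checked before the other rules (because weak evidence is said to make a reliable verdict impossible), and that an empty findings list is also treated as `inconclusive`.

```typescript
// Hypothetical types for illustration; the package does not export these.
type Tier = "will_break" | "risk" | "fine";
type Verdict = "at_risk" | "conditional" | "ready_to_scale" | "inconclusive";

interface DimensionFinding {
  dimension: string;      // declared dimension name
  tier: Tier;             // verdict tier for this finding
  evidenceWeak: boolean;  // still weak after the synthesis loop
}

function aggregateVerdict(findings: DimensionFinding[]): Verdict {
  // Assumption: weak evidence (or no findings at all) overrides every other rule.
  if (findings.length === 0 || findings.some((f) => f.evidenceWeak)) {
    return "inconclusive";
  }
  // Rule 1: any will_break finding means at_risk.
  if (findings.some((f) => f.tier === "will_break")) {
    return "at_risk";
  }
  // Rule 2: no will_break, but at least one risk finding means conditional.
  if (findings.some((f) => f.tier === "risk")) {
    return "conditional";
  }
  // Rule 3: only fine findings remain.
  return "ready_to_scale";
}
```

For example, a single `risk` finding with solid evidence alongside `fine` findings would give `conditional` under this reading. If the workflow author intended `inconclusive` as a last-resort fallback rather than an override, the first check would move to the end.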

package/workflows/bug-investigation.agentic.v2.json CHANGED

@@ -1,8 +1,8 @@
 {
   "id": "bug-investigation-agentic",
-  "name": "Bug Investigation (v2 • Notes-First • WorkRail Executor)",
+  "name": "Bug Investigation (v2 \u2022 Notes-First \u2022 WorkRail Executor)",
   "version": "2.0.0",
-  "description": "A v2-first bug investigation workflow focused on moving from theory to proof with notes-first durability, explicit trigger fields, de-anchored fresh-eye review, and investigation-only handoff boundaries.",
+  "description": "Use this to diagnose a bug or unexpected behavior in code. Builds a hypothesis, gathers evidence, and proves or disproves the root cause before concluding.",
   "recommendedPreferences": {
     "recommendedAutonomy": "guided",
     "recommendedRiskPolicy": "conservative"
@@ -39,7 +39,10 @@
         {
           "id": "confidence",
           "purpose": "How confident the agent is that the diagnosis is ready for final handoff.",
-          "levels": ["low", "high"]
+          "levels": [
+            "low",
+            "high"
+          ]
         }
       ]
     }
@@ -47,7 +50,7 @@
   "steps": [
     {
       "id": "phase-0-triage-and-intake",
-      "title": "Phase 0: Triage (Bug Intake • Risk • Mode)",
+      "title": "Phase 0: Triage (Bug Intake \u2022 Risk \u2022 Mode)",
       "prompt": "Understand the bug report and choose the right rigor.\n\nCapture:\n- `bugSummary`: concise statement of the issue\n- `reproSummary`: repro steps, symptoms, expected behavior, environment notes\n- `investigationComplexity`: Small / Medium / Large\n- `riskLevel`: Low / Medium / High\n- `rigorMode`: QUICK / STANDARD / THOROUGH\n- `automationLevel`: High / Medium / Low\n- `maxParallelism`: 0 / 2 / 3\n\nDecision guidance:\n- QUICK: clear repro, narrow surface area, low ambiguity\n- STANDARD: moderate ambiguity, moderate system breadth, or meaningful risk\n- THOROUGH: high ambiguity, high-risk production impact, broad surface area, or multiple plausible causes\n\nSet context variables:\n- `bugSummary`\n- `reproSummary`\n- `investigationComplexity`\n- `riskLevel`\n- `rigorMode`\n- `automationLevel`\n- `maxParallelism`\n- `reproducibilityConfidence` (High / Medium / Low)\n\nAsk for confirmation only if the chosen rigor materially affects expectations or if critical repro details are still missing.",
       "requireConfirmation": true
     },
@@ -57,8 +60,14 @@
       "prompt": "If critical inputs are missing, ask only for the minimum needed to investigate.\n\nPossible asks:\n- missing repro steps or failing test command\n- missing expected behavior\n- missing environment constraints or permissions\n- missing logs or stack traces when the codebase alone cannot answer the gap\n\nDo NOT ask for information you can discover with tools.",
       "requireConfirmation": {
         "or": [
-          { "var": "automationLevel", "equals": "Low" },
-          { "var": "automationLevel", "equals": "Medium" }
+          {
+            "var": "automationLevel",
+            "equals": "Low"
+          },
+          {
+            "var": "automationLevel",
+            "equals": "Medium"
+          }
         ]
       }
     },
@@ -78,8 +87,14 @@
       "prompt": "Reassess investigation scope after real context is known.\n\nReview:\n- `contextUnknownCount`\n- `executionPathCount`\n- `suspiciousPointCount`\n- actual systems/components involved\n- whether risk or ambiguity is larger than originally assessed\n\nDo:\n- confirm or adjust `investigationComplexity`\n- confirm or adjust `riskLevel`\n- confirm or adjust `rigorMode`\n- confirm or adjust `maxParallelism`\n\nSet context variables:\n- `investigationComplexity`\n- `riskLevel`\n- `rigorMode`\n- `maxParallelism`\n- `retriageChanged`\n\nRule:\n- upgrade rigor when the real investigation surface is broader or riskier than expected",
       "requireConfirmation": {
         "or": [
-          { "var": "retriageChanged", "equals": true },
-          { "var": "automationLevel", "equals": "Low" }
+          {
+            "var": "retriageChanged",
+            "equals": true
+          },
+          {
+            "var": "automationLevel",
+            "equals": "Low"
+          }
         ]
       }
     },
@@ -118,7 +133,7 @@
         {
           "id": "phase-4b-loop-decision",
           "title": "Evidence Loop Decision",
-          "prompt": "Decide whether the evidence loop should continue.\n\nDecision rules:\n- if `contradictionCount > 0` → continue\n- else if `unresolvedEvidenceGapCount > 0` → continue\n- else if `hasStrongAlternative = true` and the alternative is not meaningfully weaker → continue\n- else if `diagnosisType = inconclusive_but_narrowed` and further evidence is not realistically available → stop with bounded uncertainty\n- else → stop\n\nOutput exactly:\n```json\n{\n  \"artifacts\": [{\n    \"kind\": \"wr.loop_control\",\n    \"decision\": \"continue\"\n  }]\n}\n```",
+          "prompt": "Decide whether the evidence loop should continue.\n\nDecision rules:\n- if `contradictionCount > 0` \u2192 continue\n- else if `unresolvedEvidenceGapCount > 0` \u2192 continue\n- else if `hasStrongAlternative = true` and the alternative is not meaningfully weaker \u2192 continue\n- else if `diagnosisType = inconclusive_but_narrowed` and further evidence is not realistically available \u2192 stop with bounded uncertainty\n- else \u2192 stop\n\nOutput exactly:\n```json\n{\n  \"artifacts\": [{\n    \"kind\": \"wr.loop_control\",\n    \"decision\": \"continue\"\n  }]\n}\n```",
           "requireConfirmation": true,
           "outputContract": {
             "contractRef": "wr.contracts.loop_control"
@@ -130,12 +145,13 @@
       "id": "phase-5-diagnosis-validation",
       "title": "Phase 5: Diagnosis Validation Bundle",
       "prompt": "Stress-test the current diagnosis before handoff.\n\nSet `diagnosisConfidenceBand` using these rules:\n- High = all symptoms explained, no material contradictions, no unresolved evidence gaps\n- Medium = likely diagnosis, but one bounded uncertainty remains\n- Low = multiple viable explanations remain or contradictions are unresolved\n\nMode-adaptive validation:\n- QUICK: self-challenge; if `diagnosisConfidenceBand != High` or contradictions remain, optionally spawn ONE WorkRail Executor running `routine-hypothesis-challenge`\n- STANDARD: if delegation is available, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge` and `routine-execution-simulation`\n- THOROUGH: if delegation is available, spawn THREE WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge`, `routine-execution-simulation`, and an additional `routine-hypothesis-challenge` pass focused on breaking the current diagnosis from a different angle\n\nParallel-output synthesis rules:\n- if 2+ validators raise serious concerns, reopen evidence or shortlist work\n- if exactly one validator raises a concern, investigate it before escalating\n- if no validator can materially break the diagnosis and `contradictionCount = 0`, proceed to handoff\n\nAfter synthesizing the validation result, assess whether the diagnosis is ready for final handoff.\n\nSet context variables:\n- `diagnosisConfidenceBand`\n- `validationFindingsCountBySeverity`\n- `validationSummary`\n\nBoundary rule:\n- allowed: high-level fix direction, likely files involved, verification recommendations\n- not allowed: implementation plan, patch sequencing, PR plan, or code-writing momentum",
-      "assessmentRefs": ["diagnosis_readiness_gate"],
+      "assessmentRefs": [
+        "diagnosis_readiness_gate"
+      ],
       "assessmentConsequences": [
         {
           "when": {
-            "dimensionId": "confidence",
-            "equalsLevel": "low"
+            "anyEqualsLevel": "low"
           },
           "effect": {
             "kind": "require_followup",
@@ -145,8 +161,14 @@
       ],
       "requireConfirmation": {
         "or": [
-          { "var": "diagnosisConfidenceBand", "equals": "Low" },
-          { "var": "contradictionCount", "not_equals": 0 }
+          {
+            "var": "diagnosisConfidenceBand",
+            "equals": "Low"
+          },
+          {
+            "var": "contradictionCount",
+            "not_equals": 0
+          }
         ]
       }
     },