npm - @exaudeus/workrail - Versions diffs - 3.27.0 → 3.29.0 - Mend

@exaudeus/workrail 3.27.0 → 3.29.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (160) hide show

package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
package/dist/console/index.html +1 -1
package/dist/manifest.json +3 -3
package/docs/README.md +57 -0
package/docs/adrs/001-hybrid-storage-backend.md +38 -0
package/docs/adrs/002-four-layer-context-classification.md +38 -0
package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
package/docs/adrs/010-release-pipeline.md +89 -0
package/docs/architecture/README.md +7 -0
package/docs/architecture/refactor-audit.md +364 -0
package/docs/authoring-v2.md +527 -0
package/docs/authoring.md +873 -0
package/docs/changelog-recent.md +201 -0
package/docs/configuration.md +505 -0
package/docs/ctc-mcp-proposal.md +518 -0
package/docs/design/README.md +22 -0
package/docs/design/agent-cascade-protocol.md +96 -0
package/docs/design/autonomous-console-design-candidates.md +253 -0
package/docs/design/autonomous-console-design-review.md +111 -0
package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
package/docs/design/claude-code-source-deep-dive.md +713 -0
package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
package/docs/design/console-execution-trace-candidates-final.md +160 -0
package/docs/design/console-execution-trace-candidates.md +211 -0
package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
package/docs/design/console-execution-trace-design-review.md +74 -0
package/docs/design/console-execution-trace-discovery.md +394 -0
package/docs/design/console-execution-trace-final-review.md +77 -0
package/docs/design/console-execution-trace-review.md +92 -0
package/docs/design/console-performance-discovery.md +415 -0
package/docs/design/console-ui-backlog.md +280 -0
package/docs/design/daemon-architecture-discovery.md +853 -0
package/docs/design/daemon-design-candidates.md +318 -0
package/docs/design/daemon-design-review-findings.md +119 -0
package/docs/design/daemon-engine-design-candidates.md +210 -0
package/docs/design/daemon-engine-design-review.md +131 -0
package/docs/design/daemon-execution-engine-discovery.md +280 -0
package/docs/design/daemon-gap-analysis.md +554 -0
package/docs/design/daemon-owns-console-plan.md +168 -0
package/docs/design/daemon-owns-console-review.md +91 -0
package/docs/design/daemon-owns-console.md +195 -0
package/docs/design/data-model-erd.md +11 -0
package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
package/docs/design/list-workflows-latency-fix-plan.md +128 -0
package/docs/design/list-workflows-latency-fix-review.md +55 -0
package/docs/design/list-workflows-latency-fix.md +109 -0
package/docs/design/native-context-management-api.md +11 -0
package/docs/design/performance-sweep-2026-04.md +96 -0
package/docs/design/routines-guide.md +219 -0
package/docs/design/sequence-diagrams.md +11 -0
package/docs/design/subagent-design-principles.md +220 -0
package/docs/design/temporal-patterns-design-candidates.md +312 -0
package/docs/design/temporal-patterns-design-review-findings.md +163 -0
package/docs/design/test-isolation-from-config-file.md +335 -0
package/docs/design/v2-core-design-locks.md +2746 -0
package/docs/design/v2-lock-registry.json +734 -0
package/docs/design/workflow-authoring-v2.md +1044 -0
package/docs/design/workflow-docs-spec.md +218 -0
package/docs/design/workflow-extension-points.md +687 -0
package/docs/design/workrail-auto-trigger-system.md +359 -0
package/docs/design/workrail-config-file-discovery.md +513 -0
package/docs/docker.md +110 -0
package/docs/generated/v2-lock-closure-plan.md +26 -0
package/docs/generated/v2-lock-coverage.json +797 -0
package/docs/generated/v2-lock-coverage.md +177 -0
package/docs/ideas/backlog.md +3927 -0
package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
package/docs/ideas/implementation_plan.md +249 -0
package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
package/docs/implementation/02-architecture.md +316 -0
package/docs/implementation/04-testing-strategy.md +124 -0
package/docs/implementation/09-simple-workflow-guide.md +835 -0
package/docs/implementation/13-advanced-validation-guide.md +874 -0
package/docs/implementation/README.md +21 -0
package/docs/integrations/claude-code.md +300 -0
package/docs/integrations/firebender.md +315 -0
package/docs/migration/v0.1.0.md +147 -0
package/docs/naming-conventions.md +45 -0
package/docs/planning/README.md +104 -0
package/docs/planning/github-ticketing-playbook.md +195 -0
package/docs/plans/README.md +24 -0
package/docs/plans/agent-managed-ticketing-design.md +605 -0
package/docs/plans/agentic-orchestration-roadmap.md +112 -0
package/docs/plans/assessment-gates-engine-handoff.md +536 -0
package/docs/plans/content-coherence-and-references.md +151 -0
package/docs/plans/library-extraction-plan.md +340 -0
package/docs/plans/mr-review-workflow-redesign.md +1451 -0
package/docs/plans/native-context-management-epic.md +11 -0
package/docs/plans/perf-fixes-design-candidates.md +225 -0
package/docs/plans/perf-fixes-design-review-findings.md +61 -0
package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
package/docs/plans/perf-fixes-new-issues-review.md +110 -0
package/docs/plans/prompt-fragments.md +53 -0
package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
package/docs/plans/ui-ux-workflow-discovery.md +100 -0
package/docs/plans/ui-ux-workflow-review.md +48 -0
package/docs/plans/v2-followup-enhancements.md +587 -0
package/docs/plans/workflow-categories-candidates.md +105 -0
package/docs/plans/workflow-categories-discovery.md +110 -0
package/docs/plans/workflow-categories-review.md +51 -0
package/docs/plans/workflow-discovery-model-candidates.md +94 -0
package/docs/plans/workflow-discovery-model-discovery.md +74 -0
package/docs/plans/workflow-discovery-model-review.md +48 -0
package/docs/plans/workflow-source-setup-phase-1.md +245 -0
package/docs/plans/workflow-source-setup-phase-2.md +361 -0
package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
package/docs/plans/workflow-staleness-detection-review.md +58 -0
package/docs/plans/workflow-staleness-detection.md +80 -0
package/docs/plans/workflow-v2-design.md +69 -0
package/docs/plans/workflow-v2-roadmap.md +74 -0
package/docs/plans/workflow-validation-design.md +98 -0
package/docs/plans/workflow-validation-roadmap.md +108 -0
package/docs/plans/workrail-platform-vision.md +420 -0
package/docs/reference/agent-context-cleaner-snippet.md +94 -0
package/docs/reference/agent-context-guidance.md +140 -0
package/docs/reference/context-optimization.md +284 -0
package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
package/docs/reference/example-workflow-repository-template/README.md +268 -0
package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
package/docs/reference/external-workflow-repositories.md +916 -0
package/docs/reference/feature-flags-architecture.md +472 -0
package/docs/reference/feature-flags.md +349 -0
package/docs/reference/god-tier-workflow-validation.md +272 -0
package/docs/reference/loop-optimization.md +209 -0
package/docs/reference/loop-validation.md +176 -0
package/docs/reference/loops.md +465 -0
package/docs/reference/mcp-platform-constraints.md +59 -0
package/docs/reference/recovery.md +88 -0
package/docs/reference/releases.md +177 -0
package/docs/reference/troubleshooting.md +105 -0
package/docs/reference/workflow-execution-contract.md +998 -0
package/docs/roadmap/README.md +22 -0
package/docs/roadmap/legacy-planning-status.md +103 -0
package/docs/roadmap/now-next-later.md +70 -0
package/docs/roadmap/open-work-inventory.md +389 -0
package/docs/tickets/README.md +39 -0
package/docs/tickets/next-up.md +76 -0
package/docs/workflow-management.md +317 -0
package/docs/workflow-templates.md +423 -0
package/docs/workflow-validation.md +184 -0
package/docs/workflows.md +254 -0
package/package.json +3 -1
package/spec/authoring-spec.json +61 -16
package/workflows/workflow-for-workflows.json +252 -93
package/workflows/workflow-for-workflows.v2.json +188 -77

package/docs/design/workflow-authoring-v2.md ADDED Viewed

@@ -0,0 +1,1044 @@
+# WorkRail v2 — Workflow Authoring Reference
+This document describes the v2 JSON authoring model for workflows, including capabilities, features, templates, output contracts, and prompt engineering patterns.
+See also:
+- `docs/authoring.md` (general guide)
+- `docs/reference/workflow-execution-contract.md` (normative contract)
+- `docs/design/studio.md` (Studio discoverability and catalog)
+## JSON-first authoring
+WorkRail v2 uses **JSON** as the canonical authoring format.
+- DSL and YAML remain possible future input formats, but v2 optimizes for determinism and straightforward validation.
+- Workflows are pinned to a `workflowHash` computed from the **compiled canonical model** (after all templates/features/contracts are expanded), not raw source text.
+## Top-level workflow structure
+```jsonc
+{
+  "id": "namespace.workflow_name",
+  "name": "Human-Readable Name",
+  "description": "What this workflow does",
+  "agentRole": "Optional: workflow-level stance/persona for the agent",
+  "capabilities": { ... },
+  "features": [ ... ],
+  "references": [ ... ],
+  "conditions": [ ... ],
+  "steps": [ ... ]
+}
+```
+## Capabilities (workflow-global)
+Capabilities declare optional agent environment enhancements that materially affect workflow execution or are required for correctness.
+```jsonc
+"capabilities": {
+  "delegation": "preferred",
+  "web_browsing": "required"
+}
+```
+**Allowed values**:
+- `required`: workflow cannot run meaningfully without it
+- `preferred`: use when available; degrade gracefully otherwise
+- `disabled`: do not use even if available (rare)
+**Closed set for v2**:
+- `delegation` (subagents / parallel delegation)
+- `web_browsing` (external knowledge lookup)
+**Behavior**:
+- WorkRail cannot introspect the agent environment.
+- Capability availability is learned via **explicit probe/attempt steps** and recorded durably as node-attached observations.
+- If required and unavailable:
+  - blocking modes → `blocked` with remediation ("install web browsing MCP" or provide sources manually)
+  - never-stop → record critical gap + continue
+## Features (compiler middleware)
+Features are closed-set, WorkRail-defined behaviors applied globally during workflow compilation.
+Most features are simple toggles:
+```jsonc
+"features": [
+  "wr.features.mode_guidance",
+  "wr.features.durable_recap_guidance"
+]
+```
+A small subset of features can accept typed configuration:
+```jsonc
+"features": [
+  {
+    "id": "wr.features.capabilities",
+    "config": {
+      "probeVisibility": "collapsed",
+      "recordObservationsAs": "artifact"
+    }
+  },
+  {
+    "id": "wr.features.output_contracts",
+    "config": {
+      "enforce": "block_in_blocking_modes_gap_in_never_stop"
+    }
+  }
+]
+```
+**Whitelist of configurable features** (initial):
+- `wr.features.capabilities`
+- `wr.features.output_contracts`
+- `wr.features.mode_guidance`
+Config schemas are WorkRail-owned and validated per feature.
+## References (workflow-declared external documents)
+References let a workflow point at authoritative external documents (schemas, specs, team guides, playbooks) without inlining their content into prompts or `metaGuidance`.
+```jsonc
+"references": [
+  {
+    "id": "api-schema",
+    "title": "API Schema",
+    "source": "./spec/api-schema.json",       // resolves against user's workspace (default)
+    "purpose": "Canonical API contract the implementation must satisfy",
+    "authoritative": true
+  },
+  {
+    "id": "authoring-spec",
+    "title": "Authoring Specification",
+    "source": "./spec/authoring-spec.json",   // resolves against workrail package root
+    "purpose": "Canonical workflow authoring rules",
+    "authoritative": true,
+    "resolveFrom": "package"
+  }
+]
+```
+**Fields:**
+- `id` (string, unique): identifier for the reference
+- `title` (string): human-readable display name
+- `source` (string): file path, resolved relative to the base specified by `resolveFrom`
+- `purpose` (string): why this reference matters to the workflow
+- `authoritative` (boolean): whether the agent should treat this as a binding constraint vs advisory guidance
+- `resolveFrom` (optional, `"workspace"` | `"package"`, default `"workspace"`): where to resolve `source` from. Use `"workspace"` for project-specific files; use `"package"` for files shipped with the workflow (specs, schemas, bundled guides)
+**Behavior:**
+- References are **pointers, not content**. WorkRail resolves paths at start time (against workspace root or package root per `resolveFrom`) and surfaces them to the agent as a separate content section. The agent reads files itself if needed.
+- On `start_workflow`: full reference set with titles, paths, purposes, and authority level.
+- On `rehydrate`: compact reminder (titles and paths only).
+- On `advance`: no references emitted (the agent already has them).
+- Unresolved paths (file missing) produce a warning but do not block the workflow. The reference is surfaced with an `[unresolved]` tag.
+- Reference declarations are included in the `workflowHash` (the declarations, not the file contents), so changing which references a workflow declares creates a new hash.
+**When to use references vs metaGuidance:**
+- Use **references** only for external documents the agent may genuinely need while **executing the workflow**.
+- Use **metaGuidance** for short behavioral rules surfaced on start and resume (e.g., "always maintain CONTEXT.md", "use Memory MCP for persistence").
+Good fits for **references**:
+- a shipped rubric the running workflow must consult
+- a workspace spec, policy, schema, or playbook that constrains the task being executed
+- a package-shipped companion document that materially affects runtime judgment
+Bad fits for **references** in ordinary execution workflows:
+- authoring-only provenance about how the workflow itself was designed
+- the workflow JSON schema, unless the workflow is explicitly a workflow-authoring or validation workflow
+- authoring guides/specs that only matter to workflow maintainers, not to the running agent
+Rule of thumb:
+- If removing the reference would not make the running workflow materially worse at doing its job, it probably should not be a workflow reference.
+References are surfaced in `inspect_workflow` output for discoverability before starting execution.
+## Steps
+Steps can be either normal steps or template calls.
+### Template calls versus extension points
+These mechanisms solve different problems:
+- **`templateCall`** is for **routine injection**. The compiler expands the called routine into real parent-workflow steps.
+- **`extensionPoints`** are for **project-overridable delegation seams**. The compiler resolves `{{wr.bindings.slotId}}` to a routine/workflow ID in prompt text, but the parent agent still decides whether to call or follow that implementation.
+Use this default rule:
+- If you want **visible inline structure**, **confirmation behavior**, or **step-level session traceability**, use **`templateCall`**.
+- If you want an intentionally opaque bounded implementation that a team may swap per project, use **delegation + `extensionPoints`**.
+Important compiler ordering:
+- Template expansion runs before binding resolution.
+- Therefore, **extension points cannot currently select which routine a `templateCall` injects**.
+### Identifier constraints (authoring-time validation, locked)
+To keep execution deterministic and avoid escaping footguns, step and loop identifiers used in execution state must be delimiter-safe.
+Locks:
+- `step.id` (StepId) MUST match: `[a-z0-9_-]+`
+- `loopId` (when/if loop constructs are used) MUST match: `[a-z0-9_-]+`
+- The following characters are not allowed in these identifiers: `@`, `/`, `:`
+Studio/CLI should provide deterministic auto-fix suggestions (lowercase + replace invalid characters with `_`).
+### Loops (explicit, deterministic) (initial v2 authoring)
+Loops are authored as first-class steps to keep execution deterministic and Studio-renderable. The compiler assigns a stable `bodyIndex` corresponding to the index in the authored `body[]` list.
+Locks:
+- Loops are expressed as `type: "loop"` steps with a unique `loopId` (delimiter-safe).
+- Loop bodies are explicit and ordered: `body[]` is the authoritative step list for `bodyIndex`.
+- Every loop MUST declare `maxIterations` (no defaults).
+- Loop continuation is defined by a `while` condition reference (see Conditions below), not by ad-hoc booleans or free-form strings.
+Example:
+```jsonc
+{
+  "type": "loop",
+  "loopId": "investigation_pass",
+  "while": { "kind": "condition_ref", "conditionId": "hypotheses_stable" },
+  "maxIterations": 5,
+  "body": [
+    { "id": "gather_evidence", "title": "Gather evidence", "prompt": "..." },
+    { "id": "update_hypotheses", "title": "Update hypotheses", "prompt": "..." }
+  ]
+}
+```
+### Conditions (closed set, reusable) (initial v2 authoring)
+Conditions are authored as a closed set of reusable definitions and referenced by loops and (later) other control structures. This keeps control flow based on data state and avoids stringly-typed expression bags.
+Identifier constraints:
+- `conditions[].id` MUST match: `[a-z0-9_-]+`
+Shape:
+```jsonc
+{
+  "conditions": [
+    {
+      "id": "hypotheses_stable",
+      "kind": "always_false"
+    }
+  ]
+}
+```
+Initial closed set (minimal, expandable):
+- `always_true`
+- `always_false`
+- `loop_control` (recommended for most real loops)
+#### `loop_control` condition kind (recommended)
+Because WorkRail cannot infer intent from prompts and we avoid arbitrary expression strings, the most practical deterministic loop exit is an explicit loop control signal produced by the workflow itself.
+Pattern:
+- A step in the loop body emits a small, contract-validated control artifact indicating whether to continue.
+- The loop’s `while` condition references a `loop_control` condition definition.
+Example condition definition:
+```jsonc
+{
+  "id": "keep_iterating",
+  "kind": "loop_control",
+  "continueWhen": "continue" // closed set: continue | stop
+}
+```
+Notes:
+- The control signal should be validated via a WorkRail-owned output contract pack (e.g., `wr.contracts.loop_control`) to keep it self-correcting and deterministic.
+- This avoids reading arbitrary `context` keys or relying on free-form strings for control flow.
+- Deterministic evaluation intent:
+  - On each loop iteration, the workflow must produce a validated `wr.loop_control` artifact for the loop’s `loopId`.
+  - The loop continues while the most recent validated artifact indicates `decision == continueWhen`.
+  - Missing/invalid loop control output is handled by effective autonomy:
+    - blocking modes: `blocked` with structured missing/invalid required output
+    - never-stop: record a critical gap and treat as `decision="stop"` (fail-safe to prevent runaway loops)
+#### `wr.contracts.loop_control` (initial contract pack, locked)
+This contract pack validates a loop control artifact emitted via `output.artifacts[]`.
+Artifact kind:
+- `wr.loop_control`
+Required fields:
+- `loopId` (must match the enclosing loop step’s `loopId`)
+- `decision`: `continue | stop`
+Optional fields:
+- `summary` (bounded text)
+Example artifact:
+```jsonc
+{
+  "kind": "wr.loop_control",
+  "loopId": "investigation_pass",
+  "decision": "continue",
+  "summary": "More hypotheses to test; proceed."
+}
+```
+Notes:
+- The initial set is intentionally tiny; richer condition kinds should be added only when needed and must remain a closed set.
+### Normal step
+```jsonc
+{
+  "id": "phase-1",
+  "title": "Phase 1: Analysis",
+  "agentRole": "Optional: override workflow-level role for this step",
+  "prompt": "A single prompt string (traditional)"
+}
+```
+Or using structured blocks:
+```jsonc
+{
+  "id": "phase-1",
+  "title": "Phase 1: Analysis",
+  "promptBlocks": {
+    "goal": "What this step accomplishes",
+    "constraints": [
+      "Follow the selected mode (guided vs full-auto)",
+      "Do not assume baseline tools exist in workflows"
+    ],
+    "procedure": [
+      "Step 1: ...",
+      "Step 2: ..."
+    ],
+    "outputRequired": {
+      "notesMarkdown": "Short recap (≤10 lines)"
+    },
+    "verify": [
+      "Check that ..."
+    ]
+  },
+  "output": {
+    "contractRef": "wr.contracts.some_contract",
+    "hints": {
+      "notesMarkdown": "If you can't produce the contract, at least write notes."
+    }
+  }
+}
+```
+**PromptBlocks** (optional):
+- Canonical block set: `goal`, `constraints`, `procedure`, `outputRequired`, `verify`
+- WorkRail renders blocks in deterministic order into a text-first `pending.prompt`
+- Features can inject/override specific blocks (e.g., mode guidance → constraints)
+## Authoring for intelligence instead of compliance theater
+The point of a strong workflow is not to script every thought. The point is to create a structure that preserves quality while still letting the agent use its intelligence well.
+### Strong skeleton, not a straitjacket
+Good workflows are rigid about:
+- required outputs
+- important decision points
+- evidence and uncertainty accounting
+- explicit challenge or validation
+- durable state and handoff expectations
+Good workflows stay flexible about:
+- reasoning order
+- synthesis moves
+- ideation style
+- how the agent reaches a sound answer inside the phase boundary
+In practice: constrain the **failure modes**, not the entire cognitive path.
+### Anti-lazy wording patterns
+Adaptive wording is powerful, but it can also make it too easy for the agent to do shallow work.
+Be careful with phrases like:
+- `if appropriate`
+- `minimal pass`
+- `light scan`
+- `you may`
+- `smallest`
+- `cheapest`
+These phrases usually need a quality floor.
+Bad:
+- "Do a light landscape scan."
+- "If appropriate, delegate."
+- "Generate 2-3 options."
+Better:
+- "Do the lightest landscape scan that still surfaces the main existing approaches, hard constraints, and obvious contradictions."
+- "Delegate if it is likely to improve the result enough to be worth the extra step. If not, keep going yourself and record why."
+- "Generate enough genuinely distinct options to support a real choice. If they cluster too tightly, add another pass."
+The pattern to aim for is:
+- creative freedom
+- plus a clear quality floor
+- plus accountability when choosing the lighter path
+### User-voice prose
+For bundled, user-facing workflows, prefer prose that often sounds like the user is directly telling the agent what to do.
+This usually works better than detached author or framework narration for exploratory, creative, or advisory workflows.
+Less effective:
+- "The agent should gather stakeholder input and synthesize next actions."
+More effective:
+- "Figure out what I actually need, gather the missing context, and then tell me what you think I should do next."
+This is not a hard rule for every workflow. Internal, infrastructural, or highly mechanical workflows may still read better in neutral system prose. But user-voice should be considered a first-class authoring option.
+### Extension point ownership
+Extension points work best when they customize a bounded cognitive unit, not the entire workflow's reasoning contract.
+Recommended pattern:
+- parent workflow owns:
+  - orchestration
+  - synthesis
+  - final decision
+  - loop exits
+  - handoff
+- extension slot owns:
+  - bounded candidate generation
+  - bounded review or audit
+  - bounded challenge or validation pass
+This preserves customization without letting the core workflow become incoherent.
+### Prompt references (`wr.refs.*`) (initial v2 authoring)
+Workflows may inject small, canonical WorkRail-owned snippets inline (e.g., “WorkRail v2 definition”, “append-only truth”, “modes semantics”) to avoid copy/paste and keep prompts consistent.
+Locks (v2 intent):
+- **Compile-time only**: references are resolved during compilation and included in the compiled snapshot (and therefore `workflowHash`).
+- **Closed set**: reference IDs are WorkRail-owned and namespaced: `wr.refs.*`.
+- **No templating**: do not support `{{ }}` interpolation, file-path includes, or URL includes. References must be typed and validated.
+- **Budgets**: references are byte-bounded; budget violations are validation errors (no silent truncation).
+- **Placement discipline**: references are allowed only within structured prompt sections (to keep prompts instruction-first).
+Locked choice (v2): the compiled workflow snapshot embeds the fully resolved reference text for every `wr.refs.*` usage (not just `{refId, refContentHash}`), so pinned compiled snapshots are self-contained for export/import.
+Authoring shape (conceptual, code-canonical schema will be generated):
+- `PromptValue = string | PromptParts`
+- `PromptParts = [PromptPart, ...]` (non-empty list)
+- `PromptPart` (closed union):
+  - `{ "kind": "text", "text": "..." }`
+  - `{ "kind": "ref", "refId": "wr.refs.some_snippet" }`
+Recommended placements:
+- `promptBlocks.goal`: `PromptValue`
+- `promptBlocks.constraints[]`: `PromptValue[]`
+- `promptBlocks.procedure[]`: `PromptValue[]`
+- `promptBlocks.verify[]`: `PromptValue[]`
+Example (conceptual):
+```jsonc
+{
+  "id": "project.planning_v2",
+  "steps": [
+    {
+      "id": "plan_storage",
+      "title": "Plan the two-stream commit protocol",
+      "promptBlocks": {
+        "goal": [
+          { "kind": "text", "text": "Plan the durable commit protocol for WorkRail v2." },
+          { "kind": "ref", "refId": "wr.refs.v2_definition" }
+        ],
+        "constraints": [
+          [
+            { "kind": "ref", "refId": "wr.refs.append_only_truth" },
+            { "kind": "text", "text": "Do not introduce salvage scanning as a correctness path." }
+          ]
+        ],
+        "procedure": [
+          "Define the AppendPlan commit ordering.",
+          "List crash states and recovery rules."
+        ],
+        "verify": [
+          [
+            { "kind": "text", "text": "Hashing inputs are canonical and deterministic." },
+            { "kind": "ref", "refId": "wr.refs.jcs_hashing" }
+          ]
+        ]
+      }
+    }
+  ]
+}
+```
+**Output object** (optional):
+- `contractRef` (string): references a WorkRail-owned contract pack; WorkRail validates on `continue_workflow`
+- `hints` (object): non-enforced guidance
+### Template call step
+```jsonc
+{
+  "type": "template_call",
+  "templateId": "wr.templates.capability_probe",
+  "args": {
+    "capability": "web_browsing",
+    "when": "early_if_required"
+  }
+}
+```
+Templates are WorkRail-owned builtins that expand into one or more steps.
+**Template-implied contracts**:
+- Templates may automatically imply an `output.contractRef` without the author specifying it.
+- Example: `wr.templates.capability_probe` implies `wr.contracts.capability_observation`.
+## AgentRole
+`agentRole` is a stance/persona snippet injected into the rendered prompt.
+- WorkRail **cannot control the agent's system prompt** (that lives in the agent/IDE runtime).
+- `agentRole` is simply text guidance included in the workflow/step instructions.
+**Scoping**:
+- Workflow-level `agentRole` applies to all steps unless overridden.
+- Step-level `agentRole` overrides the workflow default.
+**Best practice**: keep short; rely on templates/features for reusable stance content.
+## Reusable authoring patterns for adaptive workflows
+These patterns are emerging as the most reusable v2 design techniques for workflows that need both determinism and adaptive reasoning.
+### 1. Constrain outcomes, not cognition
+Author phases so they specify:
+- what must be established before the phase can end
+- what must be recorded if it cannot be established
+- what control-flow or confidence changes follow
+Do **not** over-prescribe the exact reasoning order unless the order itself is critical for correctness.
+This should also influence wording. Avoid soft escape-hatch phrases like `if appropriate`, `minimal pass`, `light scan`, or `you may` unless you also say what still must be achieved and when the lighter path is actually enough.
+### 2. Locate → Bound → Enrich → Classify front-half
+For workflows that investigate an external or partially-known target, a strong default front-half pattern is:
+1. **Locate** the true target
+2. **Bound** the correct scope / boundary
+3. **Enrich** with all realistically available context
+4. **Classify** risk, confidence, shape, and routing needs
+This is often better than splitting triage, input gating, context gathering, and re-triage into many smaller phases.
+### 3. Confidence-aware orchestration
+Treat confidence as workflow state, not hand-wavy prose.
+Common confidence dimensions:
+- boundary confidence
+- context / intent confidence
+- policy-context confidence
+- evidence confidence
+- validation confidence
+Authors should define:
+- which dimensions the workflow tracks
+- which ones cap final conclusions
+- which ones trigger deeper follow-up versus only downgrade the handoff
+#### Example confidence matrix
+| Dimension | High | Medium | Low |
+|---|---|---|---|
+| Boundary confidence | True scope/base is clear | Likely scope with some ambiguity | Scope may be wrong |
+| Context confidence | Intent and constraints are well-supported | Partial intent reconstruction | Intent mostly inferred or missing |
+| Policy-context confidence | Repo/user rules are clear | Some rules known, some gaps | Rules/preferences mostly unknown |
+| Evidence confidence | Findings are directly supported | Mostly reasoned with some missing proof | Major inference gaps |
+| Validation confidence | Strong challenge/verification happened | Some verification happened | Little meaningful verification |
+#### Example confidence cap rules
+- if **boundary confidence = Low**, final recommendation confidence should not exceed **Low**
+- if **evidence confidence = Low**, Major/Critical claims should be softened or reopened
+- if **policy-context confidence = Low**, style/philosophy findings should be framed more cautiously than correctness findings
+### 4. Non-blocking enrichment with durable disclosure
+Preferred capabilities and enrichment sources should usually follow this pattern:
+- try to use them
+- record what happened durably
+- degrade gracefully if unavailable
+- disclose the limitation in the final handoff
+Only block when the target is not meaningfully reviewable/executable or when a truly required capability/output is missing in blocking modes.
+#### Example gap semantics
+| Situation | Blocking mode | Never-stop mode |
+|---|---|---|
+| Preferred capability unavailable | degrade + disclose | degrade + record gap |
+| Supporting docs missing | continue + disclose | continue + record gap if material |
+| Boundary remains ambiguous | continue with downgraded confidence | continue with downgraded confidence + gap |
+| Required contract output missing | blocked | continue + critical gap |
+### 5. Artifact vs context split
+Use **context** for routing-critical fields that later phases, loops, and conditions need to inspect cheaply and deterministically.
+Use **artifacts** for richer synthesis, human-readable ledgers, and review/handoff material.
+Rule of thumb:
+- **context** = drives control flow
+- **artifact** = drives reading, synthesis, and handoff
+Authoring implication:
+- the agent should compute and record route-driving facts in `context`
+- the workflow/engine should evaluate those facts through conditions, loops, and branching constructs
+- avoid leaving material path decisions to free-form prompt wording such as "choose the best next path" when the route can be made declarative
+#### Example split
+Keep in **context**:
+- `boundaryConfidence`
+- `needsFollowup`
+- `reviewMode`
+- `shapeProfile`
+Keep as **artifacts**:
+- human-readable review/source ledgers
+- boundary-analysis summaries
+- synthesized findings reports
+### 6. Shape/type routing as semi-structured guidance
+Routing guidance should help the agent classify the work, but material pathing should still be encoded in workflow structure wherever practical.
+Good pattern:
+- agent classifies shape, risk, or confidence
+- agent records the route-driving outputs in context
+- engine/workflow uses those outputs for branching, loops, or gates
+Weaker pattern:
+- prompt prose leaves the next branch to implicit agent choice without a condition, route-driving field, or declared control-flow rule
+### 7. Prompt wording should resist shallow compliance
+Prompts should not only be valid or well-structured. They should also be hard to satisfy with shallow compliance.
+Good prompt wording:
+- makes success conditions concrete
+- asks for evidence, alternatives, tradeoffs, or uncertainty disclosure when those matter
+- avoids vague opt-out language unless paired with a real quality floor
+Weak:
+- "Do a light review."
+- "If appropriate, delegate."
+- "Generate 2-3 ideas."
+Stronger:
+- "Do the lightest review that still surfaces the main risks, contradictions, and missing context."
+- "Delegate only if it is likely to improve the result enough to be worth the extra step. Otherwise continue yourself and record why."
+- "Generate enough distinct ideas to support a real choice. If they cluster too tightly, add another pass."
+### 8. Choose prose stance deliberately
+Workflow prose has a stance, whether authors choose it consciously or not.
+Common options:
+- **user-voice**: sounds like the user directly instructing the agent
+- **neutral system prose**: detached but clear
+- **internal-author prose**: suitable for highly technical or maintenance-oriented workflows
+For bundled and user-facing workflows, user-voice is often the strongest default. For deeply internal or infrastructural workflows, neutral/system prose may be a better fit. The important thing is to choose deliberately rather than drifting into accidental framework boilerplate.
+Shape and type classifications are often valuable, but they should remain compact and behavior-linked.
+Use them to influence:
+- reviewer-family selection
+- validation depth
+- simulation need
+- false-positive suppression
+- partitioning and follow-up depth
+Avoid large taxonomies that do not change behavior.
+#### Example routing table
+| Shape / Type | Strong default behavior |
+|---|---|
+| tiny isolated + test/docs only | lighter review depth, stronger false-positive suppression |
+| broad cross-cutting + API/contract change | stronger architecture/runtime review and deeper validation |
+| stacked/ambiguous boundary | boundary follow-up before high-confidence conclusions |
+| migration-heavy | stronger rollout, compatibility, and simulation scrutiny |
+### 7. Auditor-first delegation
+When using subagents/routines, prefer auditor-style use when the main agent should remain the owner of truth and synthesis.
+Good uses:
+- completeness audit
+- depth audit
+- adversarial challenge
+- philosophy alignment review
+- final verification
+Executor-style delegation is still useful for bounded independent work, but should not silently replace workflow-owned judgment.
+## Compact end-to-end example
+For a workflow reviewing an external artifact:
+1. **Phase 0** establishes target, boundary, context, and confidence fields
+2. **Context keys** store routing-critical outputs such as `boundaryConfidence` and `needsFollowup`
+3. **Artifacts** store richer synthesis such as a `source-ledger` or `boundary-analysis`
+4. An **auditor routine** challenges the current hypothesis
+5. Final confidence is capped by weak boundary/evidence state even if the prose summary sounds strong
+## Validation Criteria (v1 compatibility, design guidance)
+Steps can include `validationCriteria` to validate agent output. This is a v1 feature maintained for backward compatibility.
+### Evidence-Based Validation Design (CRITICAL)
+**Lock (§19)**: Validate **evidence of work**, not **completion flags**.
+When agents see validation requirements (via prompt enhancement, error messages, or guidance fields), they may optimize for **passing validation** rather than **doing quality work**. Prevent this with evidence-based validation.
+#### Anti-Pattern: Flag-Only Validation
+```json
+{
+  "prompt": "Analyze security vulnerabilities",
+  "validationCriteria": {
+    "type": "contains",
+    "value": "analysisComplete = true"
+  }
+}
+```
+**Problem**: Agent writes `analysisComplete = true` without analysis. Passes validation, produces no value.
+#### Good Pattern: Evidence-Based Validation
+```json
+{
+  "prompt": "Analyze security vulnerabilities. List findings with severity.",
+  "validationCriteria": {
+    "and": [
+      {"type": "regex", "pattern": "Finding \\d+:.*severity:(high|medium|low)", "message": "List numbered findings with severity"},
+      {"type": "contains", "value": "file:", "message": "Reference specific file locations"},
+      {"type": "length", "min": 200, "max": 5000, "message": "Substantive analysis (200-5000 chars)"}
+    ]
+  }
+}
+```
+**Why it works**: Can't fake structured findings without doing analysis.
+#### Design Guidelines
+**DO**:
+- Check for work products ("Finding N:", "file:line", "rationale:")
+- Require patterns that indicate substance (not just keywords)
+- Use multiple independent criteria (cross-validation)
+- Set reasonable length bounds (ensures substance, prevents spam)
+**DON'T**:
+- Validate single boolean flags (`complete = true`, `done = yes`)
+- Check for magic phrases alone (`must contain 'finished'`)
+- Make validation trivially satisfiable with minimal text
+- Rely on agent honesty without verification
+### Migration Note
+**Existing workflows** using flag-only validation will continue to work (backward compatible), but should be audited and enhanced to follow evidence-based patterns.
+**New workflows** should follow evidence-based validation from the start.
+See §19 in `v2-core-design-locks.md` for full rationale.
+## Output contracts
+Output contracts enable WorkRail to validate required outputs and return self-correcting `blocked` responses (or record gaps).
+### Contract packs (closed set)
+WorkRail defines a small, closed set of contract packs. Initial set (conceptual):
+- `wr.contracts.capability_observation`
+- `wr.contracts.workflow_divergence`
+- `wr.contracts.loop_control`
+- (and gaps-related contracts when formalized)
+Each pack includes:
+- allowed artifact kind(s)
+- required fields
+- minimal example payload
+### Workflow-authored output schemas (rejected for v2)
+v2 does **not** allow workflows to define arbitrary inline JSON schemas (or point to project-local / git-hosted schema files) for required outputs.
+Instead:
+- Use `output.contractRef` referencing a **WorkRail-owned** contract pack (`wr.contracts.*`).
+- If you need richer structured artifacts than existing packs support, prefer:
+  - expanding the WorkRail-owned contract pack catalog (recommended), or
+  - using `output.notesMarkdown` temporarily (generic durability) until a pack exists.
+This preserves determinism, prevents drift, and keeps Studio rendering consistent.
+### Enforcement
+When a step declares `output.contractRef`:
+- On `continue_workflow`, WorkRail validates the contract output.
+- **Blocking modes**: if missing/invalid, return `kind: "blocked"` with structured "missing required output" + example.
+- **Never-stop mode**: if missing/invalid, record critical gap and continue.
+### Versioning
+Contract packs are referenced by ID only (no explicit version refs).
+Versioning is **implicit** via the pinned compiled workflow snapshot (`workflowHash`). The snapshot carries the exact contract schemas resolved at compile time, ensuring deterministic behavior even if packs evolve.
+## Verify (instructional vs enforceable)
+The `verify` block (in `promptBlocks`) is **instructional by default**: the agent follows it as a self-check before acknowledging.
+To make verification **enforceable**, express it as an output contract:
+```jsonc
+"output": {
+  "contractRef": "wr.contracts.verification_report",
+  "hints": { ... }
+}
+```
+Then WorkRail can validate the verification output before advancing.
+## Divergence markers
+Agents can report `workflow_divergence` when intentionally deviating from instructions.
+Divergence is a structured artifact:
+```jsonc
+{
+  "kind": "workflow_divergence",
+  "reason": "efficiency_skip",  // closed set (WorkRail-owned)
+  "summary": "Skipped hypothesis X because ...",
+  "relatedStepId": "phase-2"
+}
+```
+Initial closed set (conceptual; generated from canonical contract pack definitions to prevent drift):
+- `missing_user_context`
+- `capability_unavailable`
+- `efficiency_skip`
+- `safety_stop`
+- `policy_constraint`
+Studio badges nodes with divergence for auditability.
+Enforcement: divergence is **optional** (agent reports when applicable); it should not block unless a step explicitly requires it.
+## Example: complete v2 workflow
+```jsonc
+{
+  "id": "project.bug_investigation_v2",
+  "name": "Bug Investigation (v2)",
+  "description": "Deterministic, rewind-safe bug investigation with mode-aware ambiguity handling.",
+  "agentRole": "You are a senior engineer. Be explicit about assumptions and verification. Follow workflow instructions as the user's process.",
+  "capabilities": {
+    "delegation": "preferred",
+    "web_browsing": "preferred"
+  },
+  "features": [
+    "wr.features.mode_guidance",
+    {
+      "id": "wr.features.capabilities",
+      "config": {
+        "probeVisibility": "collapsed",
+        "recordObservationsAs": "artifact"
+      }
+    }
+  ],
+  "steps": [
+    {
+      "id": "triage",
+      "title": "Triage and focus",
+      "promptBlocks": {
+        "goal": "Classify scope/risks and choose focus areas for the investigation.",
+        "constraints": [
+          "Follow the selected mode (guided vs full-auto).",
+          "If delegation is unavailable, do sequential passes.",
+          "Record a durable recap at the end."
+        ],
+        "procedure": [
+          "Summarize the bug in 3 bullets.",
+          "List top 3 hypotheses.",
+          "Choose 2–4 investigation focus areas."
+        ],
+        "outputRequired": {
+          "notesMarkdown": "≤10 lines: scope, hypotheses, focus areas, and what's next."
+        },
+        "verify": [
+          "Hypotheses are testable and mutually distinguishable."
+        ]
+      }
+    },
+    {
+      "type": "template_call",
+      "templateId": "wr.templates.capability_probe",
+      "args": {
+        "capability": "delegation",
+        "when": "lazy_on_first_use"
+      }
+    },
+    {
+      "id": "investigate",
+      "title": "Run investigation passes",
+      "promptBlocks": {
+        "goal": "Test hypotheses via evidence gathering.",
+        "constraints": [
+          "If delegation is available: run parallel passes; otherwise do sequential passes.",
+          "If you intentionally deviate (skip a hypothesis), record a divergence marker."
+        ],
+        "procedure": [
+          "For each hypothesis: gather minimal evidence to confirm or falsify.",
+          "Record observations durably."
+        ],
+        "outputRequired": {
+          "notesMarkdown": "Key observations + updated confidence per hypothesis."
+        }
+      },
+      "output": {
+        "hints": {
+          "divergence": "Report if you skip hypotheses or deviate from procedure."
+        }
+      }
+    },
+    {
+      "id": "finalize",
+      "title": "Finalize root cause",
+      "promptBlocks": {
+        "goal": "Provide final root cause and recommendations.",
+        "procedure": [
+          "State the most likely root cause with confidence level.",
+          "Propose 2–3 preventative recommendations."
+        ],
+        "outputRequired": {
+          "notesMarkdown": "Root cause + recommendations."
+        },
+        "verify": [
+          "Root cause is grounded in observed evidence."
+        ]
+      }
+    }
+  ]
+}
+```
+## What happens at compile time
+WorkRail compiles the source workflow:
+1) Expands `template_call` steps into their full step sequences (with provenance).
+2) Applies `features` in deterministic order (injects prompt content, adds probe logic, etc.).
+3) Resolves `contractRef` to actual schemas from the contract pack registry.
+4) Validates `references` structurally (unique IDs, non-empty fields).
+5) Renders `promptBlocks` into deterministic text `pending.prompt` strings.
+6) Computes `workflowHash` from the fully expanded canonical model (including reference declarations).
+## Source vs compiled (Studio)
+- **Source**: what the author wrote (editable depending on source provenance/namespace).
+- **Compiled**: what WorkRail executes and pins.
+Studio's compiled view shows:
+- injected steps (collapsed by default, expandable with provenance)
+- effective `agentRole` per step
+- resolved contract schemas
+- rendered `pending.prompt` text
+This makes injection and determinism transparent.
+## Prompt engineering best practices (baked into templates/features)
+WorkRail templates/features should encode:
+- **Instruction clarity**: single clear goal, explicit constraints, prioritized procedure.
+- **Structured layout**: consistent sections (goal/constraints/procedure/output/verify).
+- **Role/stance**: appropriate persona per workflow/step.
+- **Context discipline**: provide only necessary context; avoid bloat.
+- **Tool-use guidance**: when capabilities matter (delegation/web), include probe + fallback.
+- **Output format constraints**: explicit schemas via contracts; minimal examples.
+- **Ambiguity handling**: mode-aware (ask/block vs assume+disclose).
+- **Verification**: explicit self-check step; optionally enforceable via contracts.
+- **Error handling as data**: blocked with structured reasons, not vague failures.
+Authors should **rely on templates/features** for repetitive structure rather than hand-rolling best practices in every step.
+## Discoverability
+Builtins (templates/features/contracts/capabilities) should be discoverable via:
+- **Studio Builtins Catalog**: searchable, with copyable snippets and insert actions.
+- **Contextual autocomplete**: while editing workflows in Studio.
+- **Validation as suggestions**: "You declared capability X; consider feature Y."
+The catalog is powered by a **generated registry** sourced from the same canonical builtin definitions the compiler uses (to prevent drift).
+See: `docs/design/studio.md` for Studio catalog and autocomplete UX.
+## Anti-patterns (avoid)
+- Assuming WorkRail can control the agent's system prompt (it can't).
+- Modeling "baseline tools" (file ops, grep, terminal) in workflow capabilities (noise + drift).
+- Inline schema authoring (use contract pack references instead).
+- Huge unstructured prompts (use promptBlocks + templates/features for reusable structure).
+- Silent divergence or assumptions (report via divergence markers/gaps).
+- Trusting agent-declared capabilities without probe/attempt (WorkRail can't introspect).
+## Related
+- Normative execution protocol: `docs/reference/workflow-execution-contract.md`
+- Studio UX and catalog: `docs/design/studio.md`
+- General authoring guide: `docs/authoring.md`