npm - @exaudeus/workrail - Versions diffs - 3.27.0 → 3.29.0 - Mend

@exaudeus/workrail 3.27.0 → 3.29.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (160)

package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
package/dist/console/index.html +1 -1
package/dist/manifest.json +3 -3
package/docs/README.md +57 -0
package/docs/adrs/001-hybrid-storage-backend.md +38 -0
package/docs/adrs/002-four-layer-context-classification.md +38 -0
package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
package/docs/adrs/010-release-pipeline.md +89 -0
package/docs/architecture/README.md +7 -0
package/docs/architecture/refactor-audit.md +364 -0
package/docs/authoring-v2.md +527 -0
package/docs/authoring.md +873 -0
package/docs/changelog-recent.md +201 -0
package/docs/configuration.md +505 -0
package/docs/ctc-mcp-proposal.md +518 -0
package/docs/design/README.md +22 -0
package/docs/design/agent-cascade-protocol.md +96 -0
package/docs/design/autonomous-console-design-candidates.md +253 -0
package/docs/design/autonomous-console-design-review.md +111 -0
package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
package/docs/design/claude-code-source-deep-dive.md +713 -0
package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
package/docs/design/console-execution-trace-candidates-final.md +160 -0
package/docs/design/console-execution-trace-candidates.md +211 -0
package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
package/docs/design/console-execution-trace-design-review.md +74 -0
package/docs/design/console-execution-trace-discovery.md +394 -0
package/docs/design/console-execution-trace-final-review.md +77 -0
package/docs/design/console-execution-trace-review.md +92 -0
package/docs/design/console-performance-discovery.md +415 -0
package/docs/design/console-ui-backlog.md +280 -0
package/docs/design/daemon-architecture-discovery.md +853 -0
package/docs/design/daemon-design-candidates.md +318 -0
package/docs/design/daemon-design-review-findings.md +119 -0
package/docs/design/daemon-engine-design-candidates.md +210 -0
package/docs/design/daemon-engine-design-review.md +131 -0
package/docs/design/daemon-execution-engine-discovery.md +280 -0
package/docs/design/daemon-gap-analysis.md +554 -0
package/docs/design/daemon-owns-console-plan.md +168 -0
package/docs/design/daemon-owns-console-review.md +91 -0
package/docs/design/daemon-owns-console.md +195 -0
package/docs/design/data-model-erd.md +11 -0
package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
package/docs/design/list-workflows-latency-fix-plan.md +128 -0
package/docs/design/list-workflows-latency-fix-review.md +55 -0
package/docs/design/list-workflows-latency-fix.md +109 -0
package/docs/design/native-context-management-api.md +11 -0
package/docs/design/performance-sweep-2026-04.md +96 -0
package/docs/design/routines-guide.md +219 -0
package/docs/design/sequence-diagrams.md +11 -0
package/docs/design/subagent-design-principles.md +220 -0
package/docs/design/temporal-patterns-design-candidates.md +312 -0
package/docs/design/temporal-patterns-design-review-findings.md +163 -0
package/docs/design/test-isolation-from-config-file.md +335 -0
package/docs/design/v2-core-design-locks.md +2746 -0
package/docs/design/v2-lock-registry.json +734 -0
package/docs/design/workflow-authoring-v2.md +1044 -0
package/docs/design/workflow-docs-spec.md +218 -0
package/docs/design/workflow-extension-points.md +687 -0
package/docs/design/workrail-auto-trigger-system.md +359 -0
package/docs/design/workrail-config-file-discovery.md +513 -0
package/docs/docker.md +110 -0
package/docs/generated/v2-lock-closure-plan.md +26 -0
package/docs/generated/v2-lock-coverage.json +797 -0
package/docs/generated/v2-lock-coverage.md +177 -0
package/docs/ideas/backlog.md +3927 -0
package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
package/docs/ideas/implementation_plan.md +249 -0
package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
package/docs/implementation/02-architecture.md +316 -0
package/docs/implementation/04-testing-strategy.md +124 -0
package/docs/implementation/09-simple-workflow-guide.md +835 -0
package/docs/implementation/13-advanced-validation-guide.md +874 -0
package/docs/implementation/README.md +21 -0
package/docs/integrations/claude-code.md +300 -0
package/docs/integrations/firebender.md +315 -0
package/docs/migration/v0.1.0.md +147 -0
package/docs/naming-conventions.md +45 -0
package/docs/planning/README.md +104 -0
package/docs/planning/github-ticketing-playbook.md +195 -0
package/docs/plans/README.md +24 -0
package/docs/plans/agent-managed-ticketing-design.md +605 -0
package/docs/plans/agentic-orchestration-roadmap.md +112 -0
package/docs/plans/assessment-gates-engine-handoff.md +536 -0
package/docs/plans/content-coherence-and-references.md +151 -0
package/docs/plans/library-extraction-plan.md +340 -0
package/docs/plans/mr-review-workflow-redesign.md +1451 -0
package/docs/plans/native-context-management-epic.md +11 -0
package/docs/plans/perf-fixes-design-candidates.md +225 -0
package/docs/plans/perf-fixes-design-review-findings.md +61 -0
package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
package/docs/plans/perf-fixes-new-issues-review.md +110 -0
package/docs/plans/prompt-fragments.md +53 -0
package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
package/docs/plans/ui-ux-workflow-discovery.md +100 -0
package/docs/plans/ui-ux-workflow-review.md +48 -0
package/docs/plans/v2-followup-enhancements.md +587 -0
package/docs/plans/workflow-categories-candidates.md +105 -0
package/docs/plans/workflow-categories-discovery.md +110 -0
package/docs/plans/workflow-categories-review.md +51 -0
package/docs/plans/workflow-discovery-model-candidates.md +94 -0
package/docs/plans/workflow-discovery-model-discovery.md +74 -0
package/docs/plans/workflow-discovery-model-review.md +48 -0
package/docs/plans/workflow-source-setup-phase-1.md +245 -0
package/docs/plans/workflow-source-setup-phase-2.md +361 -0
package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
package/docs/plans/workflow-staleness-detection-review.md +58 -0
package/docs/plans/workflow-staleness-detection.md +80 -0
package/docs/plans/workflow-v2-design.md +69 -0
package/docs/plans/workflow-v2-roadmap.md +74 -0
package/docs/plans/workflow-validation-design.md +98 -0
package/docs/plans/workflow-validation-roadmap.md +108 -0
package/docs/plans/workrail-platform-vision.md +420 -0
package/docs/reference/agent-context-cleaner-snippet.md +94 -0
package/docs/reference/agent-context-guidance.md +140 -0
package/docs/reference/context-optimization.md +284 -0
package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
package/docs/reference/example-workflow-repository-template/README.md +268 -0
package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
package/docs/reference/external-workflow-repositories.md +916 -0
package/docs/reference/feature-flags-architecture.md +472 -0
package/docs/reference/feature-flags.md +349 -0
package/docs/reference/god-tier-workflow-validation.md +272 -0
package/docs/reference/loop-optimization.md +209 -0
package/docs/reference/loop-validation.md +176 -0
package/docs/reference/loops.md +465 -0
package/docs/reference/mcp-platform-constraints.md +59 -0
package/docs/reference/recovery.md +88 -0
package/docs/reference/releases.md +177 -0
package/docs/reference/troubleshooting.md +105 -0
package/docs/reference/workflow-execution-contract.md +998 -0
package/docs/roadmap/README.md +22 -0
package/docs/roadmap/legacy-planning-status.md +103 -0
package/docs/roadmap/now-next-later.md +70 -0
package/docs/roadmap/open-work-inventory.md +389 -0
package/docs/tickets/README.md +39 -0
package/docs/tickets/next-up.md +76 -0
package/docs/workflow-management.md +317 -0
package/docs/workflow-templates.md +423 -0
package/docs/workflow-validation.md +184 -0
package/docs/workflows.md +254 -0
package/package.json +3 -1
package/spec/authoring-spec.json +61 -16
package/workflows/workflow-for-workflows.json +252 -93
package/workflows/workflow-for-workflows.v2.json +188 -77

package/docs/adrs/001-hybrid-storage-backend.md ADDED

@@ -0,0 +1,38 @@
+# ADR 001: Hybrid Storage Backend for Context Management
+**Status:** Accepted
+**Date:** 2025-07-27
+## Context
+The native context management feature requires a persistent storage layer to save workflow checkpoints. The primary challenges are:
+1.  **Efficiently storing large context data blobs** (which can be many megabytes) without degrading the performance of metadata operations.
+2.  **Providing fast and flexible querying capabilities** for session and checkpoint metadata (e.g., listing, filtering by tags, finding the latest).
+3.  **Maintaining a zero-configuration setup** that works out-of-the-box for users.
+4.  **Ensuring data integrity** through atomic operations.
+Three main options were considered:
+-   **Filesystem-only**: Simple, but inefficient for queries and complex metadata management.
+-   **Database-only**: Excellent for queries, but can lead to database bloat and performance issues when storing large blobs directly.
+-   **Hybrid Approach**: Combines a database for metadata with a filesystem for large data blobs.
+## Decision
+We will implement a **hybrid storage backend** using **SQLite for metadata** and the **local filesystem for large context blobs**.
+-   **SQLite Database (`workrail.db`):** This file will store all metadata, including `Session` and `CheckpointMetadata` records. This includes session info, checkpoint headers, user-provided tags, and search indices.
+-   **Filesystem (`contexts/` directory):** Large context blobs will be compressed (e.g., as `.json.gz` files) and stored in a structured directory layout. The database will store a relative path to the corresponding blob file in each checkpoint record.
+## Consequences
+### Positive:
+-   **High Performance:** Keeps the SQLite database lean and fast, ensuring metadata queries are highly performant.
+-   **Efficient Blob Storage:** The filesystem is optimized for storing large, opaque files. This also allows for efficient streaming of compression/decompression operations.
+-   **Data Integrity:** SQLite's ACID-compliant transactions ensure metadata operations are always atomic. Atomic writes to the filesystem (write to temp file, then rename) will prevent corrupted context blobs.
+-   **Inspectability:** Developers and power users can easily inspect or back up the context files on the filesystem, which are human-readable once decompressed.
+-   **Scalability:** This is a proven, scalable pattern for local-first applications that handles data growth well.
+### Negative:
+-   **Increased Complexity:** The implementation is slightly more complex than a single-storage solution, as it requires coordinating between the database and the filesystem.
+-   **Orphan Management:** We must implement logic to handle potential orphan files (e.g., blobs whose corresponding database records were deleted) through cleanup jobs.
+-   **Backup & Restore:** A full backup requires copying both the SQLite database file and the entire `contexts` directory.
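ADR 001 mentions atomic blob writes (write to a temp file, then rename) and gzip-compressed context files. The following is a minimal sketch of that pattern using only Node.js built-ins. The helper names `writeContextBlob` and `readContextBlob`, the temp-file naming, and the example directory layout are assumptions for illustration and are not taken from the package.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical helper: writes a gzip-compressed JSON blob atomically.
// The data goes to a temp file in the same directory first, then is renamed
// over the final path, so readers never observe a partially written blob.
function writeContextBlob(dir: string, relPath: string, context: unknown): string {
  const finalPath = path.join(dir, relPath);
  fs.mkdirSync(path.dirname(finalPath), { recursive: true });
  const tmpPath = `${finalPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, gzipSync(Buffer.from(JSON.stringify(context), "utf8")));
  // rename is atomic on POSIX when both paths are on the same filesystem.
  fs.renameSync(tmpPath, finalPath);
  return relPath; // the relative path a checkpoint record would store
}

// Hypothetical helper: reads and decompresses a blob written above.
function readContextBlob(dir: string, relPath: string): unknown {
  const compressed = fs.readFileSync(path.join(dir, relPath));
  return JSON.parse(gunzipSync(compressed).toString("utf8"));
}

// Usage: round-trip a small context object through a temp "contexts" directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "contexts-"));
const stored = writeContextBlob(root, "session-1/checkpoint-1.json.gz", { goal: "ship it" });
const restored = readContextBlob(root, stored);
```

Keeping the temp file next to its final location is what makes the rename a same-filesystem operation; a temp file under a different mount would fall back to a non-atomic copy on most systems.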

package/docs/adrs/002-four-layer-context-classification.md ADDED

@@ -0,0 +1,38 @@
+# ADR 002: Four-Layer Context Classification Model
+**Status:** Accepted
+**Date:** 2024-07-27
+## Context
+To intelligently manage the LLM context window and storage, the system needs a way to differentiate the importance of various pieces of information within a workflow's context. A simple, undifferentiated approach would treat all data equally, leading to suboptimal compression and potential loss of critical information when the context budget is exceeded.
+We considered several approaches:
+1.  **No Classification:** Treat all context as a single blob. Simple, but ineffective.
+2.  **Binary Classification:** A simple `critical` / `non-critical` flag. Better, but lacks nuance for gradual compression.
+3.  **Content-Based Scoring:** Use algorithms to score the importance of text. Potentially powerful, but complex and computationally expensive for a real-time system.
+4.  **A Multi-Layered Hierarchy:** A predefined set of importance levels that context can be sorted into.
+## Decision
+We will adopt a **four-layer context classification hierarchy**, which categorizes information into one of four levels:
+1.  **CRITICAL:** Essential information that must never be compressed or dropped (e.g., user goals, final outputs).
+2.  **IMPORTANT:** High-value information that should be preserved, but can be compressed under pressure (e.g., reasoning chains, implementation plans).
+3.  **USEFUL:** Detailed information that is valuable but can be aggressively compressed or summarized (e.g., code examples, verbose tool outputs).
+4.  **EPHEMERAL:** Temporary data that can be safely dropped between steps (e.g., debug logs, timestamps).
+Classification will be implemented using a hybrid approach: automatic pattern-based rules (e.g., regex on context keys) as a baseline, with optional, explicit hints in the workflow schema and manual agent overrides (`workflow_mark_critical`) for fine-grained control.
+## Consequences
+### Positive:
+-   **Intelligent Compression:** Provides clear, tiered priorities for the compression engine, ensuring that the most critical information is preserved with the highest fidelity.
+-   **Efficient Token Management:** Allows the system to make informed decisions about what to summarize or drop when facing context window limits.
+-   **Research-Validated:** This model is based on research into effective token distribution, which shows it aligns well with typical information patterns in complex tasks.
+-   **Flexible and Controllable:** The combination of automatic rules and manual overrides provides a powerful system that is both easy to use by default and highly controllable for advanced workflows.
+### Negative:
+-   **Requires Upfront Definition:** The patterns for automatic classification need to be well-defined and maintained.
+-   **Potential for Misclassification:** If patterns are not reliable, information could be assigned to the wrong category, although the manual override acts as a safeguard.
+-   **Minor Agent Overhead:** While mostly automatic, the agent needs to be aware of the system to use the override tool effectively.
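As a rough illustration of the hybrid classification described in ADR 002 (pattern rules on context keys as a baseline, with explicit overrides taking precedence), the sketch below shows one way the lookup could be ordered. The regex patterns, the `classifyKey` function, and the `USEFUL` fallback for unmatched keys are assumptions made for this example; the ADR does not specify them.

```typescript
type ContextLayer = "CRITICAL" | "IMPORTANT" | "USEFUL" | "EPHEMERAL";

// Hypothetical baseline rules, checked in order; the first match wins.
const RULES: ReadonlyArray<{ pattern: RegExp; layer: ContextLayer }> = [
  { pattern: /goal|finalOutput/i, layer: "CRITICAL" },
  { pattern: /plan|reasoning/i, layer: "IMPORTANT" },
  { pattern: /example|toolOutput/i, layer: "USEFUL" },
  { pattern: /debug|timestamp/i, layer: "EPHEMERAL" },
];

// Explicit overrides (schema hints or agent-marked keys) beat pattern rules.
// Keys that match nothing fall back to USEFUL, which is an assumed default.
function classifyKey(
  key: string,
  overrides: ReadonlyMap<string, ContextLayer> = new Map(),
): ContextLayer {
  const override = overrides.get(key);
  if (override !== undefined) return override;
  for (const rule of RULES) {
    if (rule.pattern.test(key)) return rule.layer;
  }
  return "USEFUL";
}

// Usage: the same key is EPHEMERAL by pattern but CRITICAL once overridden.
const byPattern = classifyKey("debugLog");
const byOverride = classifyKey("debugLog", new Map<string, ContextLayer>([["debugLog", "CRITICAL"]]));
```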

package/docs/adrs/003-checkpoint-trigger-strategy.md ADDED

@@ -0,0 +1,35 @@
+# ADR 003: Checkpoint Triggering Strategy
+**Status:** Accepted
+**Date:** 2024-07-27
+## Context
+The system needs a reliable strategy for when to automatically save a checkpoint. The goal is to ensure data durability and create logical, meaningful save points for resumption, without creating excessive, low-value checkpoints or putting the entire burden on the agent.
+We considered the following options:
+1.  **Manual-Only Trigger:** The agent is solely responsible for calling `workflow_checkpoint_save`. This provides maximum control but is brittle, as agent oversight could lead to data loss.
+2.  **Time-Based Trigger:** Automatically save a checkpoint every N minutes. This is simple but disconnected from the workflow's actual progress, potentially creating checkpoints in awkward, non-resumable states.
+3.  **Step-Based Trigger:** Save a checkpoint after every N steps. This is better, but still arbitrary and can create too many checkpoints for simple workflows.
+4.  **Phase-Based Trigger:** Save a checkpoint after a logical "phase" of work is completed, as defined in the workflow's structure.
+## Decision
+We will implement a **hybrid checkpoint triggering strategy** that combines automatic phase-based triggers with a manual agent override.
+-   **Automatic Phase-Based Trigger (Default):** The system will automatically save a checkpoint at the end of a major workflow "phase." A phase is a logical unit of work, which can be defined in the workflow's metadata (e.g., a group of steps). This provides a reliable, automatic baseline.
+-   **Manual Override (Agent Control):** The agent can explicitly call `workflow_checkpoint_save` at any time to force a save at a critical moment that might not align with a phase boundary.
+This approach creates a system that is both reliable by default and flexible enough to handle the unpredictable nature of AI-driven tasks.
+## Consequences
+### Positive:
+-   **High Data Durability:** Guarantees that progress is saved regularly at logical intervals without requiring perfect agent behavior.
+-   **Meaningful Checkpoints:** Checkpoints align with the workflow's semantic structure, making them easier for a user or agent to understand and choose from when resuming.
+-   **Agent Flexibility:** The manual override provides a crucial escape hatch for agents to save state at critical junctures (e.g., after receiving a key insight or before attempting a risky operation).
+-   **Balanced Performance:** Avoids the overhead of saving after every single step, striking a balance between data safety and performance.
+### Negative:
+-   **Requires Workflow Annotation:** For the automatic trigger to be most effective, workflow authors are encouraged to structure their workflows with logical "phases" in their metadata. Workflows without this annotation will have less meaningful automatic checkpoints.
+-   **Slightly More Complex Implementation:** The server needs logic to parse workflow phase boundaries in addition to handling the manual tool call.
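The hybrid trigger in ADR 003 can be sketched as a small predicate: checkpoint when the completed step closes a phase, or when the agent explicitly asks. The `Step` shape with an optional `phase` field and the function names are hypothetical; the ADR only says phases can be defined in workflow metadata. In this sketch, steps without a phase annotation never trigger an automatic checkpoint, which is one possible reading rather than documented behavior.

```typescript
interface Step {
  id: string;
  phase?: string; // hypothetical metadata field naming the step's phase
}

// True when the step at `index` closes a phase: it is the final step, or the
// next step belongs to a different phase. Unannotated steps never qualify.
function isPhaseBoundary(steps: readonly Step[], index: number): boolean {
  const current = steps[index];
  if (current === undefined || current.phase === undefined) return false;
  const next = steps[index + 1];
  return next === undefined || next.phase !== current.phase;
}

// Combines the automatic phase-based trigger with the manual agent override.
function shouldCheckpoint(steps: readonly Step[], index: number, manualRequest: boolean): boolean {
  return manualRequest || isPhaseBoundary(steps, index);
}

// Usage: two phases of two steps each; boundaries fall after "b" and "d".
const steps: Step[] = [
  { id: "a", phase: "plan" },
  { id: "b", phase: "plan" },
  { id: "c", phase: "build" },
  { id: "d", phase: "build" },
];
const automatic = steps.map((_, i) => shouldCheckpoint(steps, i, false));
```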

package/docs/adrs/004-opt-in-encryption-strategy.md ADDED

@@ -0,0 +1,36 @@
+# ADR 004: Opt-In Encryption Strategy
+**Status:** Accepted
+**Date:** 2024-07-27
+## Context
+Workflow context can contain sensitive information. Storing this data in plain text on a user's local machine poses a security risk, especially on multi-user systems or if the machine is compromised. We need a strategy to protect this data at rest.
+The primary options considered were:
+1.  **No Encryption:** Simple, but insecure. Not a viable option for a production tool.
+2.  **Encryption by Default:** Maximum security, but introduces performance overhead and potential key management complexity for all users, even those who do not handle sensitive data.
+3.  **User-Provided Key:** Require the user to supply an encryption key via an environment variable or config file. This is flexible but places a significant security burden on the user (e.g., key storage, rotation).
+4.  **Opt-In Encryption with Secure Key Management:** Make encryption an optional feature that, when enabled, uses the native, secure credential storage facilities of the host operating system.
+## Decision
+We will implement an **opt-in encryption strategy using the host OS's native keychain** for secure, non-interactive key management.
+-   **Disabled by Default:** Encryption will be off by default to ensure maximum performance and simplicity for the common case.
+-   **Enabled via Configuration:** Users can enable encryption with a simple configuration flag (e.g., `WORKRAIL_ENCRYPTION=enabled`).
+-   **OS Keychain Integration:** When enabled, the server will generate a master encryption key and store it securely in the appropriate OS keychain (macOS Keychain, Windows Credential Manager, or Linux Secret Service API via a library like `keytar`). This avoids storing raw keys in config files.
+-   **Transparent Operation:** Once enabled, the encryption and decryption of context blobs will be handled transparently by the storage layer.
+## Consequences
+### Positive:
+-   **User Choice & Flexibility:** Users who do not need encryption are not impacted by its performance overhead. Those who do can enable it with a single, simple flag.
+-   **High Security:** Uses industry-standard, secure key storage mechanisms provided by the operating system, which is significantly more secure than storing keys in plain text files.
+-   **Good User Experience:** Avoids burdening the user with manual key management. The process is non-interactive and transparent after the initial setup.
+-   **Aligns with Professional Tooling:** This approach is standard practice for mature developer tools that handle potentially sensitive local data.
+### Negative:
+-   **Not Secure by Default:** Requires a conscious choice from the user to enable protection. Users who are unaware or forget to enable the feature will have their data stored in plain text. (This can be mitigated with clear documentation.)
+-   **Added Dependency:** Requires adding a dependency (e.g., `keytar`) to interact with the various OS keychains.
+-   **Platform Complexity:** Requires implementation and testing across all three major platforms (macOS, Windows, Linux), each of which has a different native API for secret storage.
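ADR 004 leaves the cipher unspecified. As a hedged sketch of what transparent blob encryption in the storage layer might look like, the example below uses AES-256-GCM from Node's built-in `crypto` module. The cipher choice, the blob layout, and the function names are assumptions. The keychain lookup is replaced by a random in-memory key so the block stays self-contained.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumed blob layout: 12-byte IV | 16-byte auth tag | ciphertext.
function encryptBlob(key: Buffer, plaintext: Buffer): Buffer {
  const iv = randomBytes(12); // fresh IV for every blob
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

// Reverses encryptBlob; throws if the key is wrong or the blob was modified.
function decryptBlob(key: Buffer, blob: Buffer): Buffer {
  const iv = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const ciphertext = blob.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Stand-in for a 32-byte master key that would be loaded from the OS keychain.
const masterKey = randomBytes(32);
const sealed = encryptBlob(masterKey, Buffer.from('{"secret":"value"}', "utf8"));
const opened = decryptBlob(masterKey, sealed).toString("utf8");
```

An authenticated mode such as GCM also detects accidental corruption of a blob on disk, since decryption fails when the tag does not match.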

package/docs/adrs/005-agent-first-workflow-execution-tokens.md ADDED

@@ -0,0 +1,105 @@
+# ADR 005: Agent-First Workflow Execution via Opaque Tokens
+**Status:** Accepted
+**Date:** 2025-12-17
+## Context
+WorkRail workflows are driven by an LLM agent operating inside a chat UI. Chat UIs allow users to rewind/edit conversation history, which makes any server-side “current pointer” state inherently unreliable: the agent can unknowingly resume from an earlier point in time.
+WorkRail runs locally (stdio MCP) and is intended for a single developer machine. We optimize for an “honest-but-buggy” caller model rather than malicious clients. Confidentiality is not a primary requirement, but integrity and fail-fast validation remain important to catch accidental corruption and protocol drift.
+This ADR is constrained by the MCP platform model (no server push into the chat, no transcript access, lossy/replayed tool calls). See:
+- `docs/reference/mcp-platform-constraints.md`
+We recently moved to a state/event workflow engine and exposed internal engine state at the MCP boundary (`state` + optional `event`). While correct and expressive, this leaks engine internals into the agent contract. In practice it increases agent error rate, increases payload complexity, and makes tool descriptions harder to keep aligned with actual tool schemas.
+We also have “session tools” primarily to power a dashboard. Today, session state can drift out of sync under rewinds because the dashboard/session layer treats the server’s session pointer as authoritative.
+As a concrete example of boundary drift, we observed `workflow_next` tool descriptions still referencing `completedSteps` while the actual tool contract had already shifted to a state/event schema. This kind of mismatch is easy to introduce when the public API exposes engine internals and evolves quickly.
+Rewinds create an additional reality gap: meaningful work often happens “between workflow calls.” If a user rewinds after substantial off-workflow work (e.g., implementing code guided by the agent without advancing a workflow step), that context is lost unless WorkRail offers a durable persistence primitive that does not advance workflow state.
+## Before / After (Why this ADR exists)
+### Before (agent submits progress)
+- Agent submits “progress” directly (e.g., completed steps).
+- Issues:
+  - Easy for the agent to hallucinate or drift from reality.
+  - Hard to model loops and step instances precisely.
+  - Rewinds are ambiguous (what is “the current run” if history changed?).
+### Current (engine internals exposed)
+- Agent submits engine internals (`state` + `event`) and must understand protocol mechanics.
+- Issues:
+  - Tool boundary is not agent-first; the agent constructs data it shouldn’t need to understand.
+  - Description drift is easy (we already saw mismatches between tool descriptions and input schemas).
+### Proposed (opaque tokens)
+- Agent submits opaque tokens minted by WorkRail; WorkRail owns the engine mechanics.
+- Rewinds become a first-class, correct behavior (older token → older snapshot).
+## Decision
+Make workflow execution tools **agent-first** by hiding engine internals behind two opaque primitives:
+- **`stateToken`**: an opaque, server-minted handle that refers to a workflow execution snapshot (identified by workflow hash and run/node identity). The client must round-trip it unchanged.
+- **`ackToken`**: an opaque, server-minted acknowledgement token representing “I completed the pending step WorkRail instructed for this `stateToken`”. The server must treat `(stateToken, ackToken)` as **idempotent** (replay returns the same response and does not double-advance).
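The idempotency requirement on `(stateToken, ackToken)` can be illustrated with a small ledger that remembers the response for each pair and replays it instead of advancing again. This is a sketch under assumptions: `AckLedger`, its in-memory `Map`, and the response shape are hypothetical, and a real server would presumably persist this durably rather than hold it in memory.

```typescript
// Hypothetical in-memory ledger keyed by the (stateToken, ackToken) pair.
class AckLedger<R> {
  private readonly responses = new Map<string, R>();

  // Runs `advance` only the first time a pair is seen; replays return the
  // stored response, so the workflow never advances twice.
  acknowledge(stateToken: string, ackToken: string, advance: () => R): R {
    const key = JSON.stringify([stateToken, ackToken]);
    if (this.responses.has(key)) {
      return this.responses.get(key) as R;
    }
    const response = advance();
    this.responses.set(key, response);
    return response;
  }
}

// Usage: the second call with the same pair is a replay and does not advance.
let advances = 0;
const ledger = new AckLedger<{ nextStep: string }>();
const advanceOnce = () => {
  advances += 1;
  return { nextStep: `step-${advances + 1}` };
};
const first = ledger.acknowledge("st_abc", "ack_123", advanceOnce);
const replay = ledger.acknowledge("st_abc", "ack_123", advanceOnce);
```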
+Sessions are **demoted** to a dashboard/UX projection over immutable token lineage rather than an authoritative pointer:
+- The dashboard groups and renders runs for human consumption.
+- The workflow “truth” is always derived from the token presented by the client.
+- Rewinds/forks are represented as branches in token lineage, not as “session desync”.
+Pin workflow definitions for deterministic runs:
+- `start_workflow` pins a run to a specific workflow snapshot identified by a content hash (`workflowHash`).
+- `stateToken` embeds `workflowHash` so runs remain deterministic even if workflow files evolve.
+- Pinned workflow snapshots must be persisted in session storage to support export/import and long-lived runs.
+Tools are renamed for clarity:
+- `workflow_get` → `inspect_workflow` (read-only definition/preview; never executes)
+- `workflow_next` → split into `start_workflow` + `continue_workflow` (explicit lifecycle: start a run, then continue it)
+Introduce a checkpoint primitive for rewind resilience outside the strict workflow step loop:
+- `checkpoint_workflow`: append a durable summary/artifact without advancing workflow state. This is intended for “off-workflow” work and post-workflow iteration. It should be treated as opt-in and can be gated behind a feature flag while it is validated in real usage.
+## Consequences
+### Positive
+- **Rewind/fork correctness**: rewinding a chat naturally reuses an older `stateToken` and continues from that snapshot.
+- **Agent usability**: the agent no longer constructs engine internals (loop stacks, discriminated unions, etc.).
+- **Modes without boundary expansion**: guided vs full-auto behavior can be controlled by WorkRail-defined preferences without exposing engine internals at the MCP boundary.
+- **Idempotent progress**: replays of the same completion do not advance twice.
+- **Dashboard consistency**: sessions become a graph/timeline of token lineage; rewinds produce branches rather than desync.
+- **Durable memory**: checkpoints capture high-signal progress that would otherwise be lost when rewinding or trimming chat context.
+- **Portable sharing**: sessions can be exported/imported locally, and rendered exports (e.g., Markdown/PDF) can be generated as projections of session artifacts.
+- **Deterministic runs**: pinning workflows by content hash prevents “live” behavior changes mid-run when workflow files are edited.
+### Negative
+- **Token design requirements**: tokens must be versioned, validated, and (at minimum) tamper-evident (e.g., signature/HMAC).
+- **Migration effort**: MCP tool names and contracts change; workflow authors may need to add explicit output contracts for structured dashboards (see contract spec).
+- **More surface area**: checkpointing adds a new tool and a new user/agent behavior. To mitigate, keep checkpoint payloads minimal and gate behind a feature flag until proven.
+- **More persistence**: pinning workflow snapshots and storing event logs requires explicit session storage (still local-only and small).
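The "versioned, validated, tamper-evident" token requirement above could be met in several ways; the sketch below shows one, using an HMAC-SHA256 signature over a version prefix and a base64url payload. The token layout, the `v1` version string, the payload fields, and the function names are all assumptions for illustration. The payload here is signed but not encrypted, which matches the ADR's statement that confidentiality is not a primary requirement.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TOKEN_VERSION = "v1"; // hypothetical version tag

interface StatePayload {
  workflowHash: string;
  runId: string;
  nodeId: string;
}

function sign(secret: Buffer, body: string): string {
  return createHmac("sha256", secret).update(body).digest("base64url");
}

// Assumed token layout: <version>.<base64url JSON payload>.<base64url HMAC>
function mintToken(secret: Buffer, payload: StatePayload): string {
  const body = Buffer.from(JSON.stringify(payload), "utf8").toString("base64url");
  return `${TOKEN_VERSION}.${body}.${sign(secret, `${TOKEN_VERSION}.${body}`)}`;
}

// Fails fast on unknown versions, malformed tokens, or signature mismatches.
function verifyToken(secret: Buffer, token: string): StatePayload {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed token");
  const [version, body, mac] = parts as [string, string, string];
  if (version !== TOKEN_VERSION) throw new Error("unsupported token version");
  const expected = Buffer.from(sign(secret, `${version}.${body}`), "utf8");
  const actual = Buffer.from(mac, "utf8");
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    throw new Error("token failed integrity check");
  }
  return JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as StatePayload;
}

// Usage: mint a token for a snapshot, then verify it on the next call.
const secret = Buffer.from("local-dev-secret", "utf8");
const token = mintToken(secret, { workflowHash: "sha256:abc", runId: "run-1", nodeId: "node-3" });
const payload = verifyToken(secret, token);
```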
+## Alternatives Considered
+- **Server-managed session as truth**: breaks under chat rewinds unless additional concurrency/forking machinery is added.
+- **Keep exposed engine state**: correct but not agent-first; easy to drift in descriptions/schemas and increases agent failure modes.
+- **ClientCallId idempotency**: adds surface area; we prefer to make `ackToken` idempotent.
+- **Runtime inference of step semantics**: brittle across diverse workflows; avoid guessing.
+## Related
+- Tool contract spec: `docs/reference/workflow-execution-contract.md`
+- Append-only session/run log: `docs/adrs/006-append-only-session-run-event-log.md`
+- Resumption lookup + checkpoint-only sessions: `docs/adrs/007-resume-and-checkpoint-only-sessions.md`

package/docs/adrs/006-append-only-session-run-event-log.md ADDED

@@ -0,0 +1,76 @@
+# ADR 006: Append-Only Session/Run Event Log as Source of Truth
+**Status:** Accepted
+**Date:** 2025-12-17
+## Context
+WorkRail runs as a local stdio MCP server. Chat UIs can rewind/edit history without warning, and WorkRail cannot read the chat transcript or push messages into it. Agents are lossy and can replay tool calls.
+We need workflow execution to be deterministic, rewind-safe, and export/import resumable, while keeping the MCP tool surface small and hard to misuse.
+## Decision
+Use an **append-only event log per session** as the **source of truth** for durable state, and derive all dashboard/session views as projections.
+Storage-level invariants (segmentation, crash-safe append, integrity/recovery, etc.) are consolidated and locked in:
+- `docs/design/v2-core-design-locks.md`
+Definitions:
+- **Session**: a UX grouping for a single workstream (ticket/PR/chat). Sessions are not an authoritative pointer; they are a projection over stored events.
+- **Run**: a single workflow execution pinned to a workflow snapshot. A session can contain 0..N runs.
+- **Node graph**: each run forms a **DAG of nodes** representing durable execution snapshots. Rewinds naturally create **branches** (multiple children from the same parent node). Nodes can be created both by step advancement and by checkpointing (durable writes without advancement).
+We also record **observations** (e.g., git branch, HEAD SHA) as append-only events to improve resume/search accuracy without relying on agent-provided identity.
+Preference changes are also treated as durable truth:
+- Preferences are a closed set of WorkRail-defined values (not arbitrary key/value).
+- Effective preferences are recorded on nodes (or via append-only events) so behavior is rewind-safe and export/import safe.
+Capability observations are also durable truth:
+- WorkRail cannot introspect the agent’s environment/tooling.
+- Capability availability must be learned via explicit agent-reported observations (e.g., probe steps) and recorded durably so projections and resumption do not depend on ambient IDE state.
+Output contracts and enforcement outcomes are also durable truth (when enabled):
+- WorkRail must not infer schemas from prompts; required outputs are declared explicitly (contract packs).
+- If a step acknowledgement is blocked due to missing/invalid required output, the structured reason should be recorded (or reconstructible) from the durable run graph so Studio and exports remain explainable without relying on chat transcript.
+Compiled workflow snapshots are treated as part of the durable truth:
+- Workflow execution is pinned to a `workflowHash` computed from the **fully expanded compiled workflow** (including built-in template expansion, feature application, and selected contract packs).
+- The pinned snapshot (by hash) is persisted so export/import and long-lived runs remain deterministic even as on-disk workflow files or builtin packs evolve.
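The relationship between the append-only log and derived views can be sketched as follows. This is a minimal in-memory illustration, not WorkRail's actual schema: the event shapes, `SessionLogSketch`, and `projectTips` are hypothetical names chosen for the example. It shows that branch tips are computed from stored events rather than read from a mutable cursor.

```typescript
// Hypothetical event shapes for illustration; the real schema lives in the design locks.
type SessionEvent =
  | { kind: "run_started"; runId: string; workflowHash: string }
  | { kind: "node_created"; runId: string; nodeId: string; parentNodeId: string | null }
  | { kind: "observation_recorded"; key: string; value: string };

// Append-only: events can be added and read, never updated or removed.
class SessionLogSketch {
  private readonly events: SessionEvent[] = [];

  append(event: SessionEvent): void {
    this.events.push(event);
  }

  all(): readonly SessionEvent[] {
    return this.events;
  }
}

// Projection: a tip is any node of the run that no other node names as its parent.
function projectTips(events: readonly SessionEvent[], runId: string): string[] {
  const nodeIds: string[] = [];
  const parentIds = new Set<string>();
  for (const event of events) {
    if (event.kind !== "node_created" || event.runId !== runId) continue;
    nodeIds.push(event.nodeId);
    if (event.parentNodeId !== null) parentIds.add(event.parentNodeId);
  }
  return nodeIds.filter((id) => !parentIds.has(id));
}
```

A rewind that adds a second child under the same parent simply produces two tips; nothing in the log is rewritten.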
+## Why
+- **Rewind-safe correctness**: “current pointer” state drifts under rewinds. A graph of immutable snapshots makes forks explicit and correct.
+- **Deterministic resume**: execution resumes from a stored snapshot node, not from chat context.
+- **Export/import resumability**: portable events and node snapshots can be moved between machines; tokens are re-minted on import.
+- **Tool simplicity**: agents should not manage or construct engine internals; the engine owns the mechanics.
+## Consequences
+- Tokens (`stateToken`, `ackToken`) are **handles**, not truth. The canonical truth is stored events and node snapshots.
+- The storage model must support:
+  - stable identifiers for runs and nodes
+  - edges keyed for idempotency
+  - portable node payloads sufficient to rehydrate execution deterministically
+- The event log may include a bounded “decision trace” (step selection reasons, loop decisions, fork detection) to support debugging/auditing as first-class capabilities without relying on chat transcript history.
+- “Sessions” and “latest” are derived views; no authoritative server-side cursor.
+- Global preferences are treated as defaults only: they are copied into a session baseline at the start of work, so later global changes do not retroactively affect existing runs.
+- Export/import bundles should be able to carry the durable truth (events, node snapshots, pinned workflow snapshots) so another machine can import and resume deterministically.
+- The **WorkRail Console** (UI) is a derived projection over the event log and stored workflow snapshots; it does not mutate the event log except indirectly via workflow editing (which writes to source storage).
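One way to read "edges keyed for idempotency" is that recording the same edge key twice returns the first stored edge instead of creating a second one. The sketch below assumes that reading; `EdgeStoreSketch` and the `idempotencyKey` field are hypothetical names, and real storage would be durable rather than an in-memory map.

```typescript
interface Edge {
  parentNodeId: string;
  childNodeId: string;
  idempotencyKey: string; // hypothetical field name for the edge key
}

class EdgeStoreSketch {
  private readonly byKey = new Map<string, Edge>();

  // A repeated key is a replay: return the originally stored edge unchanged.
  record(edge: Edge): Edge {
    const existing = this.byKey.get(edge.idempotencyKey);
    if (existing !== undefined) return existing;
    this.byKey.set(edge.idempotencyKey, edge);
    return edge;
  }

  size(): number {
    return this.byKey.size;
  }
}
```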
+## Related
+- MCP constraints: `docs/reference/mcp-platform-constraints.md`
+- Normative tool contract: `docs/reference/workflow-execution-contract.md`
+- Token boundary decision: `docs/adrs/005-agent-first-workflow-execution-tokens.md`
+- Resume and checkpoint-only sessions: `docs/adrs/007-resume-and-checkpoint-only-sessions.md`
+- Storage and other “easy to drift” v2 locks: `docs/design/v2-core-design-locks.md`

package/docs/adrs/007-resume-and-checkpoint-only-sessions.md ADDED

@@ -0,0 +1,51 @@
+# ADR 007: Resumption UX via Lookup + Checkpoint-Only Sessions (Flagged)
+**Status:** Accepted
+**Date:** 2025-12-17
+## Context
+WorkRail cannot access chat history. A brand-new chat has no transcript context and cannot “resume” unless it supplies some external handle.
+Requiring users to copy/paste opaque execution tokens (`stateToken`) between chats is high friction. However, adding broad session CRUD tools expands the tool surface area and is easy for agents to misuse.
+## Decision
+Introduce two **optional, feature-flagged** MCP tools that improve resumption without reintroducing mutable session state:
+1) **`resume_session` (flagged)**: a read-only lookup tool that supports queries like “resume my session about xyz”.
+   - Returns **tip-only** resume targets (latest branch tip by deterministic policy).
+   - Uses a **layered search** strategy (titles/keys first; durable notes/artifact previews next; deep search last).
+   - Returns bounded, high-signal snippets for disambiguation and a recommended candidate when confidence is high.
+2) **`start_session` (flagged)**: enables checkpoint-only sessions in the future.
+   - Mints an opaque `sessionToken` / `sessionRef` so `checkpoint_workflow` can attach durable outputs even when no workflow run is active.
+   - Does **not** introduce general-purpose session update/read/write APIs.
+Both tools are behind the same feature flag(s) as the capability they unlock.
+## Why
+- **Resumption requires a handle**: without transcript access, the system must rely on durable storage.
+- **Reduce user friction**: lookup by “xyz” is materially easier than copy/paste of long tokens.
+- **Keep tool surface area small**: one lookup tool + one narrowly-scoped session tool, both flagged, avoids a sprawling CRUD API.
+- **Deterministic + rewind-safe**: resumption should target the run’s latest tip; rewinds/forks should be explicit and represented as branches.
+## Consequences
+- `resume_session` must be backed by durable storage and stable projections (see ADR 006).
+- Resume ranking uses layered search: git observations (branch/HEAD SHA) at highest tier, then durable node recap outputs, then workflow id/name matching. No session-level title/tag fields exist; aboutness is derived from observations and outputs.
+- The contract must clearly separate:
+  - **resumption** (tip): return bounded durable recap up to the pending step
+  - **rewind/fork** (non-tip): automatically fork and return branch-focused context the agent likely lost (including a bounded downstream recap), without requiring user confirmation
+- Export/import must be **resumable**:
+  - bundles carry portable run graph nodes and pinned workflow snapshots
+  - tokens are re-minted on import from stored node snapshots
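The tiered ranking described in the consequences above (git observations first, then node recap outputs, then workflow id/name) could look roughly like this. It is a sketch under assumptions: `Candidate`, `ResumeQuery`, `matchTier`, and `rankCandidates` are hypothetical names, and plain substring matching stands in for whatever search the real tool uses.

```typescript
type Tier = 1 | 2 | 3;

interface Candidate {
  runId: string;
  gitBranch?: string;
  headSha?: string;
  recap: string; // durable node recap output
  workflowId: string;
  workflowName: string;
}

interface ResumeQuery {
  text: string;
  gitBranch?: string;
  headSha?: string;
}

// Returns the best (lowest) tier the candidate matches, or null for no match.
function matchTier(c: Candidate, q: ResumeQuery): Tier | null {
  const gitMatch =
    (q.headSha !== undefined && q.headSha === c.headSha) ||
    (q.gitBranch !== undefined && q.gitBranch === c.gitBranch);
  if (gitMatch) return 1;
  const needle = q.text.trim().toLowerCase();
  if (needle.length === 0) return null;
  if (c.recap.toLowerCase().includes(needle)) return 2;
  const inWorkflow =
    c.workflowId.toLowerCase().includes(needle) ||
    c.workflowName.toLowerCase().includes(needle);
  return inWorkflow ? 3 : null;
}

// Deterministic order: tier first, then runId as a stable tie-breaker.
function rankCandidates(cands: readonly Candidate[], q: ResumeQuery): { runId: string; tier: Tier }[] {
  const ranked: { runId: string; tier: Tier }[] = [];
  for (const c of cands) {
    const tier = matchTier(c, q);
    if (tier !== null) ranked.push({ runId: c.runId, tier });
  }
  return ranked.sort((a, b) => a.tier - b.tier || (a.runId < b.runId ? -1 : a.runId > b.runId ? 1 : 0));
}
```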
+## Related
+- MCP constraints: `docs/reference/mcp-platform-constraints.md`
+- Normative tool contract: `docs/reference/workflow-execution-contract.md`
+- Append-only storage model: `docs/adrs/006-append-only-session-run-event-log.md`
+- Token boundary decision: `docs/adrs/005-agent-first-workflow-execution-tokens.md`

package/docs/adrs/008-blocked-nodes-architectural-upgrade.md ADDED

@@ -0,0 +1,178 @@
+# ADR 008: Blocked Nodes as First-Class DAG Nodes (Architectural Upgrade)
+**Status:** Accepted and Implemented
+**Date:** 2026-01-10
+**Implemented:** 2026-02-17
+## Context
+WorkRail v2's current model records validation failures as outcome annotations on `advance_recorded` events:
+```typescript
+advance_recorded.outcome = { kind: "blocked", blockers: BlockerReport }
+```
+This architectural approach has a correctness limitation: when a client retries with corrected output using the same `ackToken`, idempotent replay returns the recorded `blocked` outcome unchanged, ignoring the new output. Agents perceive this as a stuck state and require an unnecessary rehydrate round-trip to break the loop.
+**Root cause:** Blocked is modeled as an outcome annotation on an advance attempt, not as a durable node in the run DAG. The validation that led to the block is not recorded as a first-class event, making replay deterministic but non-responsive to new inputs.
+Additionally, projections must scan the entire event log to find blocked outcomes (O(n) query), and blocked nodes cannot be rendered as topology-level DAG nodes in Studio/Console.
+## Decision
+**Upgrade the v2 model to treat blocked attempts as first-class DAG nodes:**
+1. **`nodeKind="blocked_attempt"` nodes** exist in the run DAG alongside `step` and `checkpoint` nodes.
+2. **`validation_performed` events** record validation results durably before any outcome is recorded.
+3. **`retryAckToken`** is minted for retryable blocks, enabling agents to retry with corrected output in a single call.
+4. **Terminal blocks** create nodes but do not mint retry tokens (architectural consistency; they just don't open a retry path).
+### What This Means
+#### Before (outcome annotation):
+```
+Validation fails
+  → record: advance_recorded.outcome = { kind: "blocked" }
+  → no validation event exists
+  → blocking decision not durable
+Retry with ackToken
+  → idempotent replay returns original blocked outcome
+  → ignores new output
+  → agent stuck (needs rehydrate)
+```
+#### After (DAG nodes):
+```
+Validation fails
+  → record: validation_performed event (durable validation facts)
+  → record: blocked_attempt node in DAG
+  → record: edge from parent to blocked node
+  → mint retryAckToken (if retryable)
+Retry with retryAckToken
+  → validates new output
+  → creates success node OR chains another blocked node
+  → agents can iterate without rehydrate
+```
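The "after" flow can be sketched in a few lines. This is an illustrative model only: `RunSketch`, the log entry shapes, and the `retry-<nodeId>` token format are assumptions made for the example (real tokens are opaque and signed), and every block is treated as retryable, so terminal blocks are omitted.

```typescript
type NodeKind = "step" | "checkpoint" | "blocked_attempt";

interface DagNode {
  nodeId: string;
  parentNodeId: string | null; // parent linkage doubles as the edge
  nodeKind: NodeKind;
}

type LogEntry =
  | { kind: "validation_performed"; nodeId: string; issues: string[] }
  | { kind: "node_created"; node: DagNode };

interface AttemptResult {
  node: DagNode;
  retryAckToken?: string;
}

class RunSketch {
  readonly log: LogEntry[] = [];
  private readonly retryTargets = new Map<string, string>();
  private readonly validate: (output: string) => string[];
  private counter = 0;

  constructor(validate: (output: string) => string[]) {
    this.validate = validate;
  }

  attempt(parentNodeId: string | null, output: string): AttemptResult {
    const issues = this.validate(output);
    this.counter += 1;
    const nodeId = `node-${this.counter}`;
    const nodeKind: NodeKind = issues.length === 0 ? "step" : "blocked_attempt";
    const node: DagNode = { nodeId, parentNodeId, nodeKind };
    // One push stands in for the single append transaction (validation event + node).
    this.log.push({ kind: "validation_performed", nodeId, issues }, { kind: "node_created", node });
    if (nodeKind === "step") return { node };
    const retryAckToken = `retry-${nodeId}`; // hypothetical token format
    this.retryTargets.set(retryAckToken, nodeId);
    return { node, retryAckToken };
  }

  // A retry validates the new output and chains from the blocked node.
  retry(retryAckToken: string, output: string): AttemptResult {
    const blockedNodeId = this.retryTargets.get(retryAckToken);
    if (blockedNodeId === undefined) throw new Error("unknown retryAckToken");
    return this.attempt(blockedNodeId, output);
  }
}
```

Because each retry re-runs validation against the new output and appends a new node, a corrected submission succeeds without a rehydrate round-trip, and a still-invalid one chains another `blocked_attempt` node.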
+### Alignment with Design Locks
+This change is an **intentional architectural upgrade** of v2's design locks (not a violation):
+**ADR 005 (Tokens)**: Still satisfied. Tokens remain opaque; clients round-trip them unchanged.
+**ADR 006 (Append-Only)**: Strengthened. Blocked outcomes become first-class nodes in the append-only DAG.
+**ADR 007 (Resume & Checkpoint)**: Compatible. Blocked nodes are valid tips; checkpoints can follow blocked nodes.
+**Design Locks § 1–17**: All preserved (append-only truth, idempotency, snapshot pinning, crash-safety).
+**Design Locks § 18 (Event schema)**: Updated with intent. The blocked outcome on `advance_recorded` is deprecated (marked for removal after 2 releases) in favor of `blocked_attempt` nodes. This is **deliberate evolution**, not drift: both models coexist during the deprecation buffer.
+### Decision ID
+**Locked Decision 5** (per implementation_plan.md § 7, Decision 5, lines 830–843):
+> "Blocked attempts are durable DAG nodes, not outcome annotations. Validation results are first-class events. This is an architectural upgrade (not a violation of append-only semantics) that strengthens guarantees while remaining backward compatible."
+## Consequences
+### Positive
+- **Single-call retry**: Agents can retry with corrected output in one call (better UX).
+- **Observable validation**: Validation facts are durable and auditable.
+- **DAG topology = state machine**: Run status can be derived from node kinds (O(1) query).
+- **Studio rendering**: Blocked nodes render as first-class DAG nodes.
+- **Terminal blocks**: Architecturally consistent (they create nodes, just not retry paths).
+- **Backward compatible**: New response fields (`retryable`, `retryAckToken`, `validation`) are additive; old clients ignore them.
+### Risks & Mitigations
+| Risk | Mitigation |
+|------|-----------|
+| Removing blocked from `advance_recorded` breaks projection consumers | S3-WP2.5: Comprehensive search + refactor before schema change |
+| Validation event size causes log bloat | Bounded truncation (deterministic, sorted issues/suggestions) + tests |
+| Atomicity breaks under crashes | Single append transaction (validation event + node + edge) + tests |
+| Key rotation breaks replay determinism | Validation events are durable; tokens re-signed; functional equivalence proven in tests |
+| Chained blocks need complex parent tracking | Parent linkage via `parentNodeId`; blocked nodes chain naturally |
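The "bounded truncation (deterministic, sorted issues/suggestions)" mitigation can be illustrated with a small helper. This is a sketch, assuming issues are plain strings and the bound is an item count; `truncateIssues` and its return shape are hypothetical, and the actual limits and ordering rules are not stated here.

```typescript
interface TruncatedIssues {
  items: string[];
  truncated: boolean;
}

// Sort first, then cut, so the same set of issues always yields the same stored payload
// regardless of the order in which the validator happened to report them.
function truncateIssues(issues: readonly string[], maxItems: number): TruncatedIssues {
  // Default sort compares UTF-16 code units, so the order does not depend on locale.
  const sorted = [...issues].sort();
  return {
    items: sorted.slice(0, maxItems),
    truncated: sorted.length > maxItems,
  };
}
```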
+### Non-Goals
+- Changing validation logic (`ValidationEngine` stays, but its exceptions are replaced by Result types).
+- Supporting retry for already-advanced nodes.
+- Building Studio UI (substrate only; UI is post-core).
+- Adding `ackDisposition` field (YAGNI).
+- Creating validation projection (load on-demand).
+## Implementation Safeguards
+(See implementation_plan.md § 8–15 for full execution details.)
+### Pre-Implementation
+1. **ADR sign-off** (this document): confirms intention to upgrade v2 locks.
+2. **Comprehensive search**: `rg "advance_recorded.*blocked|outcome.*blocked"` identifies all consumers.
+3. **Environment checks**: Verify `ValidationEngine` can be refactored to Result types without major surgery.
+### Per-Slice
+- **Slice 1** (Schemas): New types, backward-compatible response fields.
+- **Slice 2** (Validation Events): Emit durable validation events before blocking decisions.
+- **Slice 3** (Blocked Nodes): Create nodes; deprecate (not remove) blocked outcome.
+- **Slices 4–7** (Retry, Projections, Tests): Complete the feature with comprehensive test coverage.
+### Testing Strategy
+- Unit tests: Schema validation, truncation determinism, token signing.
+- Integration tests: End-to-end blocked→retry flow, chained blocks, atomicity.
+- Determinism tests: Key rotation, idempotency, replay equivalence.
+- Edge case tests: Terminal blocks, missing validation events, concurrent retries.
+### Backward Compatibility
+- `advance_recorded.outcome.kind="blocked"` remains valid during a 2-release buffer period (deprecated, not removed).
+- Feature flag `USE_BLOCKED_NODES` (default: true) allows rollback if needed.
+- New response fields are optional; old clients parse successfully.
+- Projections updated before schema change (no orphaned consumers).
+## Rationale
+### Why This Upgrade Aligns with WorkRail Philosophy
+**Immutability + append-only truth**: Blocked nodes are immutable DAG nodes; validation events are durable facts. No mutation.
+**Architectural fix over patch**: Root cause is "blocked is not a node"; solution is "make it one." This is not a parameter tweak but a model upgrade.
+**Make illegal states unrepresentable**: Discriminated union (`retryable_block | terminal_block`) prevents `{ terminal, retryAttemptId: "x" }`.
+**Type safety first**: Node kind encodes semantics; retry paths are topology, not flags.
+**Determinism over cleverness**: Retry token derivation is pure; validation events are immutable; replay returns recorded facts.
+**Errors as data**: Validation results are structured, durable events (not inferred from chat).
+**Observable**: Blocked nodes + validation events = first-class audit trail.
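The "make illegal states unrepresentable" point can be shown with a discriminated union. The variant names `retryable_block` and `terminal_block` and the `retryAttemptId` field come from the text above; the `kind` discriminant, the exact field set, and `nextAction` are assumptions made for this sketch.

```typescript
type BlockedOutcome =
  | { kind: "retryable_block"; retryAttemptId: string; retryAckToken: string }
  | { kind: "terminal_block" };

// The compiler narrows on `kind`, so retry fields are only reachable on the retryable variant.
function nextAction(outcome: BlockedOutcome): string {
  switch (outcome.kind) {
    case "retryable_block":
      return `retry with ${outcome.retryAckToken}`;
    case "terminal_block":
      return "stop: no retry path";
  }
}

// Rejected at compile time (excess property on the terminal variant):
// const bad: BlockedOutcome = { kind: "terminal_block", retryAttemptId: "x" };
```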
+## Comparison to Alternatives
+### Alternative 1: Outcome annotation + retry token (Hybrid 1)
+- **Pro**: Minimal schema change.
+- **Con**: Blocking decision still not durable; projections still scan events; nodes don't appear in DAG.
+- **Verdict**: Patches the symptom; doesn't address root cause.
+### Alternative 2: Blocked nodes + no validation event (Minimal)
+- **Pro**: Fewer events.
+- **Con**: No durable validation facts; replay must recompute; non-deterministic under validation engine changes.
+- **Verdict**: Violates append-only principle and determinism lock.
+### Chosen: Blocked nodes + validation events (Full Upgrade)
+- **Pro**: Durable validation, observable, deterministic, consistent.
+- **Con**: More events (but bounded by truncation and budgets).
+- **Verdict**: Strongest alignment with v2 philosophy.
+## References
+- `docs/adrs/005-agent-first-workflow-execution-tokens.md` (token basis)
+- `docs/adrs/006-append-only-session-run-event-log.md` (append-only principle)
+- `docs/design/v2-core-design-locks.md` (locked decisions)
+- `implementation_plan.md` (detailed execution plan, 1214 lines, 7 vertical slices)