@exaudeus/workrail 3.27.0 → 3.29.0
- package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
- package/dist/console/index.html +1 -1
- package/dist/manifest.json +3 -3
- package/docs/README.md +57 -0
- package/docs/adrs/001-hybrid-storage-backend.md +38 -0
- package/docs/adrs/002-four-layer-context-classification.md +38 -0
- package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
- package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
- package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
- package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
- package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
- package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
- package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
- package/docs/adrs/010-release-pipeline.md +89 -0
- package/docs/architecture/README.md +7 -0
- package/docs/architecture/refactor-audit.md +364 -0
- package/docs/authoring-v2.md +527 -0
- package/docs/authoring.md +873 -0
- package/docs/changelog-recent.md +201 -0
- package/docs/configuration.md +505 -0
- package/docs/ctc-mcp-proposal.md +518 -0
- package/docs/design/README.md +22 -0
- package/docs/design/agent-cascade-protocol.md +96 -0
- package/docs/design/autonomous-console-design-candidates.md +253 -0
- package/docs/design/autonomous-console-design-review.md +111 -0
- package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
- package/docs/design/claude-code-source-deep-dive.md +713 -0
- package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
- package/docs/design/console-execution-trace-candidates-final.md +160 -0
- package/docs/design/console-execution-trace-candidates.md +211 -0
- package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
- package/docs/design/console-execution-trace-design-review.md +74 -0
- package/docs/design/console-execution-trace-discovery.md +394 -0
- package/docs/design/console-execution-trace-final-review.md +77 -0
- package/docs/design/console-execution-trace-review.md +92 -0
- package/docs/design/console-performance-discovery.md +415 -0
- package/docs/design/console-ui-backlog.md +280 -0
- package/docs/design/daemon-architecture-discovery.md +853 -0
- package/docs/design/daemon-design-candidates.md +318 -0
- package/docs/design/daemon-design-review-findings.md +119 -0
- package/docs/design/daemon-engine-design-candidates.md +210 -0
- package/docs/design/daemon-engine-design-review.md +131 -0
- package/docs/design/daemon-execution-engine-discovery.md +280 -0
- package/docs/design/daemon-gap-analysis.md +554 -0
- package/docs/design/daemon-owns-console-plan.md +168 -0
- package/docs/design/daemon-owns-console-review.md +91 -0
- package/docs/design/daemon-owns-console.md +195 -0
- package/docs/design/data-model-erd.md +11 -0
- package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
- package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
- package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
- package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
- package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
- package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
- package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
- package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
- package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
- package/docs/design/list-workflows-latency-fix-plan.md +128 -0
- package/docs/design/list-workflows-latency-fix-review.md +55 -0
- package/docs/design/list-workflows-latency-fix.md +109 -0
- package/docs/design/native-context-management-api.md +11 -0
- package/docs/design/performance-sweep-2026-04.md +96 -0
- package/docs/design/routines-guide.md +219 -0
- package/docs/design/sequence-diagrams.md +11 -0
- package/docs/design/subagent-design-principles.md +220 -0
- package/docs/design/temporal-patterns-design-candidates.md +312 -0
- package/docs/design/temporal-patterns-design-review-findings.md +163 -0
- package/docs/design/test-isolation-from-config-file.md +335 -0
- package/docs/design/v2-core-design-locks.md +2746 -0
- package/docs/design/v2-lock-registry.json +734 -0
- package/docs/design/workflow-authoring-v2.md +1044 -0
- package/docs/design/workflow-docs-spec.md +218 -0
- package/docs/design/workflow-extension-points.md +687 -0
- package/docs/design/workrail-auto-trigger-system.md +359 -0
- package/docs/design/workrail-config-file-discovery.md +513 -0
- package/docs/docker.md +110 -0
- package/docs/generated/v2-lock-closure-plan.md +26 -0
- package/docs/generated/v2-lock-coverage.json +797 -0
- package/docs/generated/v2-lock-coverage.md +177 -0
- package/docs/ideas/backlog.md +3927 -0
- package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
- package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
- package/docs/ideas/implementation_plan.md +249 -0
- package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
- package/docs/implementation/02-architecture.md +316 -0
- package/docs/implementation/04-testing-strategy.md +124 -0
- package/docs/implementation/09-simple-workflow-guide.md +835 -0
- package/docs/implementation/13-advanced-validation-guide.md +874 -0
- package/docs/implementation/README.md +21 -0
- package/docs/integrations/claude-code.md +300 -0
- package/docs/integrations/firebender.md +315 -0
- package/docs/migration/v0.1.0.md +147 -0
- package/docs/naming-conventions.md +45 -0
- package/docs/planning/README.md +104 -0
- package/docs/planning/github-ticketing-playbook.md +195 -0
- package/docs/plans/README.md +24 -0
- package/docs/plans/agent-managed-ticketing-design.md +605 -0
- package/docs/plans/agentic-orchestration-roadmap.md +112 -0
- package/docs/plans/assessment-gates-engine-handoff.md +536 -0
- package/docs/plans/content-coherence-and-references.md +151 -0
- package/docs/plans/library-extraction-plan.md +340 -0
- package/docs/plans/mr-review-workflow-redesign.md +1451 -0
- package/docs/plans/native-context-management-epic.md +11 -0
- package/docs/plans/perf-fixes-design-candidates.md +225 -0
- package/docs/plans/perf-fixes-design-review-findings.md +61 -0
- package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
- package/docs/plans/perf-fixes-new-issues-review.md +110 -0
- package/docs/plans/prompt-fragments.md +53 -0
- package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
- package/docs/plans/ui-ux-workflow-discovery.md +100 -0
- package/docs/plans/ui-ux-workflow-review.md +48 -0
- package/docs/plans/v2-followup-enhancements.md +587 -0
- package/docs/plans/workflow-categories-candidates.md +105 -0
- package/docs/plans/workflow-categories-discovery.md +110 -0
- package/docs/plans/workflow-categories-review.md +51 -0
- package/docs/plans/workflow-discovery-model-candidates.md +94 -0
- package/docs/plans/workflow-discovery-model-discovery.md +74 -0
- package/docs/plans/workflow-discovery-model-review.md +48 -0
- package/docs/plans/workflow-source-setup-phase-1.md +245 -0
- package/docs/plans/workflow-source-setup-phase-2.md +361 -0
- package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
- package/docs/plans/workflow-staleness-detection-review.md +58 -0
- package/docs/plans/workflow-staleness-detection.md +80 -0
- package/docs/plans/workflow-v2-design.md +69 -0
- package/docs/plans/workflow-v2-roadmap.md +74 -0
- package/docs/plans/workflow-validation-design.md +98 -0
- package/docs/plans/workflow-validation-roadmap.md +108 -0
- package/docs/plans/workrail-platform-vision.md +420 -0
- package/docs/reference/agent-context-cleaner-snippet.md +94 -0
- package/docs/reference/agent-context-guidance.md +140 -0
- package/docs/reference/context-optimization.md +284 -0
- package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
- package/docs/reference/example-workflow-repository-template/README.md +268 -0
- package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
- package/docs/reference/external-workflow-repositories.md +916 -0
- package/docs/reference/feature-flags-architecture.md +472 -0
- package/docs/reference/feature-flags.md +349 -0
- package/docs/reference/god-tier-workflow-validation.md +272 -0
- package/docs/reference/loop-optimization.md +209 -0
- package/docs/reference/loop-validation.md +176 -0
- package/docs/reference/loops.md +465 -0
- package/docs/reference/mcp-platform-constraints.md +59 -0
- package/docs/reference/recovery.md +88 -0
- package/docs/reference/releases.md +177 -0
- package/docs/reference/troubleshooting.md +105 -0
- package/docs/reference/workflow-execution-contract.md +998 -0
- package/docs/roadmap/README.md +22 -0
- package/docs/roadmap/legacy-planning-status.md +103 -0
- package/docs/roadmap/now-next-later.md +70 -0
- package/docs/roadmap/open-work-inventory.md +389 -0
- package/docs/tickets/README.md +39 -0
- package/docs/tickets/next-up.md +76 -0
- package/docs/workflow-management.md +317 -0
- package/docs/workflow-templates.md +423 -0
- package/docs/workflow-validation.md +184 -0
- package/docs/workflows.md +254 -0
- package/package.json +3 -1
- package/spec/authoring-spec.json +61 -16
- package/workflows/workflow-for-workflows.json +252 -93
- package/workflows/workflow-for-workflows.v2.json +188 -77
|
@@ -0,0 +1,525 @@
|
|
|
1
|
+
# WorkRail Autonomous Platform: MVP Discovery
|
|
2
|
+
|
|
3
|
+
> Design discovery for the autonomous WorkRail platform: minimum console changes, 12-month product vision, and competitive positioning. Generated: 2026-04-14.
|
|
4
|
+
>
|
|
5
|
+
> **Artifact strategy:** This document is a human-readable reference. Execution truth (decisions, rationale, open questions) is recorded in WorkRail session notes and context variables -- not in this file. This file may be out of date if the session was rewound; the session notes are always authoritative.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Context / Ask
|
|
10
|
+
|
|
11
|
+
The goal is threefold:
|
|
12
|
+
1. Define the **minimum console changes** to make autonomous sessions visible and controllable (live view, pause/resume/cancel, real-time step progress)
|
|
13
|
+
2. Articulate the **full product vision** when daemon + triggers + evidence + live console all assemble
|
|
14
|
+
3. Establish **how WorkRail surpasses** nexus-core, OpenClaw, ruflo, and Devin in that 12-month vision
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
## Path Recommendation: `full_spectrum`
|
|
19
|
+
|
|
20
|
+
The dominant need is both landscape grounding (what exists in the codebase today) and reframing (what the product becomes). Neither a pure landscape audit nor a pure concept reframe is sufficient. The existing codebase is already sophisticated; the design question is where the minimum seam lies for the autonomous platform.
|
|
21
|
+
|
|
22
|
+
**Why not `landscape_first`:** We already have extensive context (ideas backlog, nexus comparison docs). The bottleneck is not understanding -- it's framing the coherent product arc.
|
|
23
|
+
|
|
24
|
+
**Why not `design_first`:** The MVP console changes are concrete code changes. Landing on the right design requires grounding in the actual component structure (SessionList, hooks, API layer, SSE, ConsoleService).
|
|
25
|
+
|
|
26
|
+
---
|
|
27
|
+
|
|
28
|
+
## Constraints / Anti-goals
|
|
29
|
+
|
|
30
|
+
**Hard constraints:**
|
|
31
|
+
- Console is currently read-only -- the MVP must introduce write operations carefully
|
|
32
|
+
- SSE infrastructure already exists (`/api/v2/workspace/events`) -- extend, do not replace
|
|
33
|
+
- React Query deduplication must be preserved -- no second SSE connection from new hooks
|
|
34
|
+
- The `ConsoleService` is stateless and read-only by design -- autonomous control endpoints need their own service class
|
|
35
|
+
- All new API routes must follow the `{ success: true, data: T }` envelope pattern
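
The envelope constraint above can be sketched as a small set of types. This is a minimal illustration, not the project's actual definitions: only the success shape `{ success: true, data: T }` comes from this document. The failure shape, the `ok`/`fail` helpers, and the `PauseResult` payload are assumptions made for the example.

```typescript
// Success envelope, as stated in the constraint above.
type ApiSuccess<T> = { success: true; data: T };

// ASSUMPTION: the failure shape is not specified in this document.
type ApiFailure = { success: false; error: string };

type ApiResponse<T> = ApiSuccess<T> | ApiFailure;

// Hypothetical helpers so route handlers cannot forget the envelope.
function ok<T>(data: T): ApiSuccess<T> {
  return { success: true, data };
}

function fail(error: string): ApiFailure {
  return { success: false, error };
}

// Hypothetical payload for a control endpoint such as POST .../pause.
type PauseResult = { sessionId: string; daemonStatus: "paused" };

const response: ApiResponse<PauseResult> = ok<PauseResult>({
  sessionId: "sess_123",
  daemonStatus: "paused",
});
```

Routing every new handler through helpers like these would keep the control endpoints consistent with the existing GET routes.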
|
|
36
|
+
|
|
37
|
+
**Anti-goals:**
|
|
38
|
+
- Do not build the full daemon in the MVP -- the console changes must be useful even with manual autonomous sessions
|
|
39
|
+
- Do not change the existing token protocol -- autonomous sessions are regular sessions that happen to run without a human at the keyboard
|
|
40
|
+
- Do not require OpenClaw or pi-mono as dependencies -- WorkRail's autonomous mode must be freestanding
|
|
41
|
+
- Do not add a database -- the existing append-only event log is the source of truth
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
45
|
+
## Landscape Packet
|
|
46
|
+
|
|
47
|
+
### Ruflo competitive update (verified April 2026)
|
|
48
|
+
|
|
49
|
+
ruflo v3.5 (ruvnet/claude-flow, MIT, GitHub) -- "Enterprise AI Orchestration Platform":
|
|
50
|
+
- 16 specialized agent roles, 100+ pre-built agents, multi-LLM support (Claude/GPT/Gemini/Llama)
|
|
51
|
+
- SONA self-learning: pattern storage via HNSW-indexed vector memory, "sub-millisecond retrieval"
|
|
52
|
+
- Byzantine fault-tolerant consensus, hierarchical/mesh/ring/star topologies
|
|
53
|
+
- Claims "7-phase pipeline with 4 gates the model cannot bypass" (via `@claude-flow/guidance`) -- but this is prompt-based, not cryptographic
|
|
54
|
+
- Claims session persistence ("saves context across conversations") -- but no durable append-only event log
|
|
55
|
+
- Key claim to evaluate: "4 gates the model cannot bypass" is marketing. Bypassing only requires the agent to skip the gate tool call, which is exactly what context pressure enables. WorkRail's HMAC token protocol makes the bypass computationally infeasible, not merely instructionally discouraged.
|
|
56
|
+
|
|
57
|
+
### Existing WorkRail daemon infrastructure (from codebase depth audit)
|
|
58
|
+
|
|
59
|
+
**What already exists (ready to reuse):**
|
|
60
|
+
- `ExecutionSessionGateV2` -- central choke point: lock acquisition + health validation + witness minting. The daemon reuses this exactly via `gate.withHealthySessionLock(sessionId, fn)`.
|
|
61
|
+
- `SessionLockPortV2` -- OS-level exclusive file lock, cross-process safe, fail-fast. The daemon respects this lock natively since it calls `continue_workflow` through the gate.
|
|
62
|
+
- `ResumeSessions` usecase -- discovers resumable sessions; daemon can reuse for "find sessions needing continuation"
|
|
63
|
+
- `HttpServer` -- console + MCP run in the SAME Node.js process (confirmed), so `DaemonRegistry` can be an in-process `Map` with no IPC needed. The server startup mounts routes via `mountRoutes()`; the daemon mounts its routes the same way.
|
|
64
|
+
- `mountConsoleRoutes()` pattern -- the daemon's control routes follow this exact pattern; a `mountDaemonRoutes(app, daemonRegistry)` function mounts alongside the console routes in `composeServer()`
|
|
65
|
+
- `ShutdownEvents` port -- daemon hooks into this for graceful teardown (not direct signal handlers)
|
|
66
|
+
- `ProcessLifecyclePolicy` -- daemon respects this; no signal handlers in test mode
|
|
67
|
+
|
|
68
|
+
**What does NOT exist (daemon must implement):**
|
|
69
|
+
- Daemon service class (the LLM API client + agent loop + session driver) -- net-new
|
|
70
|
+
- `DaemonRegistry` -- net-new (design exists in this doc)
|
|
71
|
+
- Control endpoints (pause/resume/cancel) -- net-new
|
|
72
|
+
- Trigger system (webhook/cron/CLI) -- net-new
|
|
73
|
+
- LLM tool call interception (before/after hooks) -- net-new
|
|
74
|
+
- `AbortController` usage in daemon -- only existing use is in remote-workflow-storage fetch timeout; daemon owns this pattern for its API calls
|
|
75
|
+
|
|
76
|
+
### Current console architecture (from source)
|
|
77
|
+
|
|
78
|
+
**Backend:**
|
|
79
|
+
- `ConsoleService` -- stateless read-only projections: `getSessionList()`, `getSessionDetail()`, `getNodeDetail()`
|
|
80
|
+
- `ConsoleServicePorts` -- 5 ports: `directoryListing`, `dataDir`, `sessionStore`, `snapshotStore`, `pinnedWorkflowStore`
|
|
81
|
+
- `mountConsoleRoutes()` -- mounts GET routes + SSE endpoint; returns `stopWatcher` disposer
|
|
82
|
+
- SSE: `/api/v2/workspace/events` broadcasts `{type: "change"}` on `.jsonl` writes; `{type: "worktrees-updated"}` on background enrichment completion
|
|
83
|
+
- File watcher: `watchSessionsDir()` watches sessions dir recursively, filters to `.jsonl` changes
|
|
84
|
+
|
|
85
|
+
**Frontend hooks / API:**
|
|
86
|
+
- `useSessionList()` -- React Query, key `['sessions']`, 30s poll, 25s stale
|
|
87
|
+
- `useSessionDetail()` -- React Query, key `['session', id]`, 5s poll, 3s stale
|
|
88
|
+
- `useWorkspaceEvents()` -- SSE client; on `{type: "change"}` invalidates `['sessions']`; on `{type: "worktrees-updated"}` invalidates `['worktrees']`
|
|
89
|
+
- `useSessionListRepository()` -- wraps `useSessionList()`, uses `isLoading` not `isFetching` to avoid background-refetch flicker
|
|
90
|
+
|
|
91
|
+
**Session list view:** `SessionList.tsx` -- search, sort, group, filter by status; `SessionCard` shows title, workflow, git branch, node count, status badge, health badge, last modified time
|
|
92
|
+
|
|
93
|
+
**Session detail view:** `SessionDetail.tsx` -- `SessionMetaCard` + per-run `RunCard`; `RunCard` has DAG tab + TRACE tab (when trace data exists); floating `NodeDetailSection` slide-in panel on node click
|
|
94
|
+
|
|
95
|
+
**DAG visualization:** `RunDag.tsx` (full) + `RunLineageDag.tsx` (lineage variant); ReactFlow-based; node kinds: step (gold), checkpoint (green), blocked_attempt (red); preferred tip gets gold glow
|
|
96
|
+
|
|
97
|
+
**Console types (`api/types.ts`):**
|
|
98
|
+
- `ConsoleSessionStatus`: `in_progress | complete | complete_with_gaps | blocked | dormant`
|
|
99
|
+
- `ConsoleRunStatus`: `in_progress | complete | complete_with_gaps | blocked`
|
|
100
|
+
- No `paused` or `cancelled` status exists yet -- needed for autonomous mode
|
|
101
|
+
|
|
102
|
+
**Key observation:** The console has no write path today. All mutation (advance, pause, cancel) would be net-new endpoints.
|
|
103
|
+
|
|
104
|
+
### Reference architecture synthesis
|
|
105
|
+
|
|
106
|
+
| Source | Stars | Key pattern for WorkRail |
|
|
107
|
+
|--------|-------|-------------------------|
|
|
108
|
+
| OpenClaw | 357k | `SessionActorQueue` per-session serialization; `RuntimeCache` idle eviction; `SpawnAcpParams` spawn interface; `AbortController` for cancellation; policy system (`isXxxEnabledByPolicy`) |
|
|
109
|
+
| pi-mono | 35k | `agentLoop`/`agentLoopContinue` pattern; `BeforeToolCallResult`/`AfterToolCallResult` hooks; `EventStream<AgentEvent>` for streaming; Slack bot as simplest channel integration |
|
|
110
|
+
| nexus-core | 11 (internal) | Phase transition enforcement; skills-as-slash-commands; per-repo context injection; multi-model routing |
|
|
111
|
+
| Claude Code | N/A | Pre-compact hooks; session memory as durable store; `PreToolUse`/`PostToolUse` hooks for evidence collection; subagent coordinator model |
|
|
112
|
+
|
|
113
|
+
**Critical insight from OpenClaw `RuntimeCache`:** The in-memory `Map<actorKey, {state, lastTouchedAt}>` with idle TTL is the right shape for tracking live daemon sessions. WorkRail's daemon equivalent: `Map<sessionId, {pid, startedAt, lastHeartbeatMs, status: 'running' | 'paused' | 'cancelled'}>`. This is the in-process state that the console live view reads.
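
A minimal sketch of that shape, assuming a hypothetical `LiveDaemonCache` class and an `idleTtlMs` knob (neither name appears in OpenClaw or WorkRail). It only illustrates the "map plus idle TTL" idea, with the clock passed in so the behavior is deterministic:

```typescript
type DaemonStatus = "running" | "paused" | "cancelled";

interface LiveDaemonEntry {
  pid: number;
  startedAt: number;
  lastHeartbeatMs: number;
  status: DaemonStatus;
}

// Hypothetical in-process cache of live daemon sessions with idle eviction.
class LiveDaemonCache {
  private readonly entries = new Map<string, LiveDaemonEntry>();
  private readonly idleTtlMs: number;

  constructor(idleTtlMs: number) {
    this.idleTtlMs = idleTtlMs;
  }

  // Create the entry on first touch, otherwise refresh its heartbeat.
  touch(sessionId: string, pid: number, nowMs: number): void {
    const existing = this.entries.get(sessionId);
    if (existing) {
      existing.lastHeartbeatMs = nowMs;
      return;
    }
    this.entries.set(sessionId, {
      pid,
      startedAt: nowMs,
      lastHeartbeatMs: nowMs,
      status: "running",
    });
  }

  get(sessionId: string): LiveDaemonEntry | undefined {
    return this.entries.get(sessionId);
  }

  // Remove entries whose last heartbeat is older than the idle TTL.
  evictIdle(nowMs: number): string[] {
    const evicted: string[] = [];
    this.entries.forEach((entry, sessionId) => {
      if (nowMs - entry.lastHeartbeatMs > this.idleTtlMs) {
        this.entries.delete(sessionId);
        evicted.push(sessionId);
      }
    });
    return evicted;
  }
}
```

Calling `evictIdle(Date.now())` on a timer would keep stale entries from lingering after a daemon stops sending heartbeats.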
|
|
114
|
+
|
|
115
|
+
**Critical insight from pi-mono `BeforeToolCallResult`:** WorkRail's evidence gating hook is exactly this. A before-tool-call hook that returns a `BeforeToolCallResult` of `{block: true, reason: "continue_workflow token required"}` is how you enforce that the agent cannot call `continue_workflow` without the required evidence. The daemon intercepts tool calls in this hook.
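
A hedged sketch of such a hook follows. The types are simplified stand-ins written for this example, not pi-mono's real definitions, and the `evidenceCollected` flag is a hypothetical placeholder for whatever evidence check the daemon ends up using.

```typescript
// ASSUMPTION: simplified stand-in; pi-mono's actual result type may differ.
type BeforeToolCallResult = { block: false } | { block: true; reason: string };

// ASSUMPTION: illustrative tool call shape.
interface ToolCall {
  toolName: string;
  args: Record<string, unknown>;
}

// Block `continue_workflow` until the required evidence has been collected.
function beforeToolCall(call: ToolCall, evidenceCollected: boolean): BeforeToolCallResult {
  if (call.toolName === "continue_workflow" && !evidenceCollected) {
    return { block: true, reason: "continue_workflow token required" };
  }
  return { block: false };
}
```

All other tool calls pass through untouched, so the hook only adds friction at the step boundary.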
|
|
116
|
+
|
|
117
|
+
---
|
|
118
|
+
|
|
119
|
+
## Problem Frame Packet
|
|
120
|
+
|
|
121
|
+
### Primary users (by value, not by volume)
|
|
122
|
+
|
|
123
|
+
1. **Solo developer running scheduled maintenance** -- nightly dependency updates, security patches, test suite maintenance. Wants to wake up to completed changes with an audit trail. Success: verify 10 sessions in 5 minutes via summary, with drill-down available.
|
|
124
|
+
2. **Team lead batch-processing code reviews** -- 10-15 PRs overnight, consistent feedback without bottleneck. Success: check batch summary over coffee, spot-check 2 flagged decisions, release all feedback.
|
|
125
|
+
3. **SRE running diagnostic playbooks** -- health checks, incident runbooks at 3am or on alert. Success: faster MTTR, confidence nothing was missed, clear escalation triggers.
|
|
126
|
+
|
|
127
|
+
### The critical reframe (from user lens)
|
|
128
|
+
|
|
129
|
+
**Autonomous mode is NOT a real-time monitoring problem. It is an asynchronous verification problem.**
|
|
130
|
+
|
|
131
|
+
Users are not at the console when the daemon runs. They need "what happened while I was away" -- not "watch it happen live." The primary job is efficient post-execution verification and progressive trust-building over the first 10 sessions.
|
|
132
|
+
|
|
133
|
+
Implications for MVP:
|
|
134
|
+
- The `[ LIVE ]` badge matters less than the post-completion summary
|
|
135
|
+
- A "session health at a glance" view matters more than a real-time tool call stream
|
|
136
|
+
- Pause/cancel are edge case operations (distress signals), not primary UX
|
|
137
|
+
|
|
138
|
+
### The key tension
|
|
139
|
+
|
|
140
|
+
**"I need to trust it ran correctly" vs. "I don't want to read 10,000 lines of logs."**
|
|
141
|
+
Design must enable: verify correctness in 30 seconds, drill down when suspicious. The existing DAG + node detail already provides the drill-down; the gap is the 30-second summary.
|
|
142
|
+
|
|
143
|
+
### Core question
|
|
144
|
+
|
|
145
|
+
What is the minimum seam in the existing console that (a) makes autonomous sessions visible for post-execution verification, and (b) provides pause/cancel as a safety net for the exceptional case when something looks wrong mid-run?
|
|
146
|
+
|
|
147
|
+
### The three facts that constrain the answer
|
|
148
|
+
|
|
149
|
+
**Fact 1: Autonomous sessions are already regular sessions.**
|
|
150
|
+
A daemon-driven session produces the same event log as a human-guided session. `ConsoleService` already reads it. The list view already shows it with `status: 'in_progress'`. The only gap is: the console cannot tell the difference between "human is at the keyboard" and "daemon is running unsupervised." Both look like `in_progress`. The MVP's job is to make that distinction visible.
|
|
151
|
+
|
|
152
|
+
**Fact 2: The control actions (pause/resume/cancel) need a live daemon handle.**
|
|
153
|
+
Pausing a session means sending a signal to the running process that has the active Claude API call. This is not a file operation -- it requires an in-process handle or an inter-process signal. The daemon must maintain a control socket, PID file, or in-memory registry that the console server can reach.
|
|
154
|
+
|
|
155
|
+
**Fact 3: Real-time tool-call progress requires a new event type or a new SSE channel.**
|
|
156
|
+
The current SSE event `{type: "change"}` fires on `.jsonl` writes -- one event per step advance. Tool calls within a step do not produce `.jsonl` writes. To show "currently executing: Bash tool call #3" the daemon must push a separate real-time stream. The simplest implementation: a separate SSE endpoint (`/api/v2/sessions/:id/live`) that the daemon writes to via an in-process pub/sub.
|
|
157
|
+
|
|
158
|
+
### Reframe: what "live view" actually means at MVP
|
|
159
|
+
|
|
160
|
+
The MVP does not need millisecond-resolution tool call streaming. It needs:
|
|
161
|
+
1. A way to know which sessions have a live daemon process (the `is_autonomous` flag)
|
|
162
|
+
2. The current step label and step number of that running session (already derivable from `getSessionDetail`)
|
|
163
|
+
3. A control surface (pause / resume / cancel button) that sends a signal to the daemon
|
|
164
|
+
|
|
165
|
+
The 5-second `refetchInterval` on `useSessionDetail` already provides near-real-time step progress for the current step. What's missing is (1) the flag and (2) the control buttons.
|
|
166
|
+
|
|
167
|
+
### Critical risk: in-memory DaemonRegistry as source-of-truth violation
|
|
168
|
+
|
|
169
|
+
The in-memory `Map<sessionId, DaemonEntry>` approach creates the highest-risk single point of failure in the design. If the registry is wrong (crash, restart, stale entry), the `[ LIVE ]` badge is wrong, control buttons send signals to ghost sessions, and users cannot trust the console. This violates WorkRail's core principle: the append-only event log is the source of truth.
|
|
170
|
+
|
|
171
|
+
**Design response to this risk:** Make the registry a WRITE-ONLY cache of daemon state. The read path (is this session live?) should derive from the event log, not the registry. Specifically: the daemon appends `daemon_heartbeat` context updates to the session at each step start. `ConsoleService` reads the latest heartbeat timestamp; if it is within N seconds, the session is "live." The registry exists only to hold the `AbortController` for cancellation -- it is not the source of truth for liveness.
|
|
172
|
+
|
|
173
|
+
**Amended MVP design:** Two-track approach:
|
|
174
|
+
- **Liveness detection:** `context_set(daemon_heartbeat: "<ISO timestamp>")` at each step advance; if the last heartbeat is less than 60 seconds old AND the session is `in_progress`, display the `[ LIVE ]` badge. Crash-safe: if the daemon crashes, no new heartbeat events are appended, and the badge disappears within 60 seconds.
|
|
175
|
+
- **Control actions:** `DaemonRegistry` holds `AbortController` only. The registry is ephemeral; it loses data on restart. That is acceptable because a restarted server cannot control sessions started before the restart -- those sessions are genuinely orphaned.
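
The liveness rule in the first track can be sketched as a pure function. The 60-second window and the status values come from this document; the function name, its parameters, and the handling of a missing or unparseable heartbeat are assumptions for illustration.

```typescript
const LIVE_WINDOW_MS = 60_000;

type ConsoleSessionStatus =
  | "in_progress"
  | "complete"
  | "complete_with_gaps"
  | "blocked"
  | "dormant";

// Hypothetical projection: a session is "live" only if it is in progress
// and its latest daemon_heartbeat is less than 60 seconds old.
function isSessionLive(
  status: ConsoleSessionStatus,
  lastHeartbeatIso: string | null,
  nowMs: number,
): boolean {
  if (status !== "in_progress" || lastHeartbeatIso === null) return false;
  const heartbeatMs = Date.parse(lastHeartbeatIso);
  // ASSUMPTION: an unparseable timestamp is treated as "not live".
  if (Number.isNaN(heartbeatMs)) return false;
  return nowMs - heartbeatMs < LIVE_WINDOW_MS;
}
```

Because the function depends only on data already in the event log plus the current time, it gives the same answer after a server restart.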
|
|
176
|
+
|
|
177
|
+
---
|
|
178
|
+
|
|
179
|
+
## Candidate Directions
|
|
180
|
+
|
|
181
|
+
### Direction A: Minimal flag + control actions (recommended for MVP)
|
|
182
|
+
|
|
183
|
+
**Summary:** Add an `isAutonomous` boolean to `ConsoleSessionSummary` and `ConsoleSessionDetail`. The daemon registers each running session in a new `DaemonRegistryService` that the console routes can query. Add three control endpoints (`POST /api/v2/sessions/:id/pause`, `POST /api/v2/sessions/:id/resume`, `POST /api/v2/sessions/:id/cancel`). Add a pause/resume/cancel button strip to the console session detail view. Real-time step progress uses existing `useSessionDetail` 5s poll (no new SSE channel needed for MVP).
|
|
184
|
+
|
|
185
|
+
**What "daemon registration" means:** When the daemon starts a session, it calls a new in-process method `daemonRegistry.register(sessionId, abortController, status)`. The console routes call `daemonRegistry.get(sessionId)` to determine if a session is live and to send control signals.
|
|
186
|
+
|
|
187
|
+
**Console changes (backend):**
|
|
188
|
+
1. Add `DaemonRegistry` service class -- in-process `Map<sessionId, DaemonEntry>` with `register/deregister/pause/resume/cancel/list` methods
|
|
189
|
+
2. Add `DaemonEntry` type: `{ sessionId, workflowId, goal, startedAtMs, lastHeartbeatMs, abortController: AbortController, status: 'running' | 'paused' | 'cancelling' }`
|
|
190
|
+
3. Extend `ConsoleServicePorts` with optional `daemonRegistry?: DaemonRegistry`
|
|
191
|
+
4. In `getSessionList()` / `getSessionSummary()`: if `daemonRegistry` is set, augment summary with `isAutonomous: boolean` and `daemonStatus: 'running' | 'paused' | 'cancelling' | null`
|
|
192
|
+
5. Mount three new POST routes in `mountConsoleRoutes()`:
|
|
193
|
+
- `POST /api/v2/sessions/:id/pause`
|
|
194
|
+
- `POST /api/v2/sessions/:id/resume`
|
|
195
|
+
- `POST /api/v2/sessions/:id/cancel`
|
|
196
|
+
6. Add new SSE event type: `{type: "daemon-status-changed", sessionId, status}` -- broadcast when daemon status changes
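
Items 1 and 2 above could look roughly like the following. This is a sketch under assumptions: the `DaemonEntry` fields and method names come from the list, but the exact signatures, the boolean return values, and the allowed state transitions are illustrative choices, not a settled design. `AbortController` is the standard global available in modern Node.js.

```typescript
type DaemonStatus = "running" | "paused" | "cancelling";

interface DaemonEntry {
  sessionId: string;
  workflowId: string;
  goal: string;
  startedAtMs: number;
  lastHeartbeatMs: number;
  abortController: AbortController;
  status: DaemonStatus;
}

// In-process registry; state is lost on restart, as accepted for the MVP.
class DaemonRegistry {
  private readonly entries = new Map<string, DaemonEntry>();

  register(entry: DaemonEntry): void {
    this.entries.set(entry.sessionId, entry);
  }

  deregister(sessionId: string): boolean {
    return this.entries.delete(sessionId);
  }

  get(sessionId: string): DaemonEntry | undefined {
    return this.entries.get(sessionId);
  }

  list(): DaemonEntry[] {
    return Array.from(this.entries.values());
  }

  // ASSUMPTION: each control method returns false when the transition is invalid.
  pause(sessionId: string): boolean {
    return this.transition(sessionId, "running", "paused");
  }

  resume(sessionId: string): boolean {
    return this.transition(sessionId, "paused", "running");
  }

  cancel(sessionId: string): boolean {
    const entry = this.entries.get(sessionId);
    if (!entry || entry.status === "cancelling") return false;
    entry.status = "cancelling";
    entry.abortController.abort(); // interrupts the daemon's in-flight API call
    return true;
  }

  private transition(sessionId: string, from: DaemonStatus, to: DaemonStatus): boolean {
    const entry = this.entries.get(sessionId);
    if (!entry || entry.status !== from) return false;
    entry.status = to;
    return true;
  }
}
```

A route handler for `POST /api/v2/sessions/:id/pause` would then call `pause(id)` and translate a `false` result into an error response.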
|
|
197
|
+
|
|
198
|
+
**Console changes (frontend):**
|
|
199
|
+
1. Extend `ConsoleSessionSummary` and `ConsoleSessionDetail` types with `isAutonomous: boolean` and `daemonStatus: 'running' | 'paused' | 'cancelling' | null`
|
|
200
|
+
2. Add `useDaemonControl(sessionId)` hook -- wraps `POST` mutations, optimistic UI updates
|
|
201
|
+
3. In `SessionCard`: show `[ LIVE ]` badge (pulsing amber dot) when `isAutonomous && daemonStatus === 'running'`; show `[ PAUSED ]` badge when `daemonStatus === 'paused'`
|
|
202
|
+
4. In `SessionDetail`: add `AutonomousControlStrip` component -- three buttons `[ PAUSE ]`, `[ RESUME ]`, `[ CANCEL ]` visible only when `isAutonomous`. Show current step label + elapsed time.
|
|
203
|
+
5. In `useSessionListRepository`: subscribe to `daemon-status-changed` SSE event to trigger immediate re-render (no poll wait)
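
The badge and control-strip rules in items 3 and 4 reduce to two small pure functions. The field names come from the list above; the function names and the `SessionBadge` type are hypothetical and only meant to show the decision logic separate from any React component.

```typescript
type DaemonStatus = "running" | "paused" | "cancelling" | null;

interface AutonomyFields {
  isAutonomous: boolean;
  daemonStatus: DaemonStatus;
}

type SessionBadge = "LIVE" | "PAUSED" | null;

// Which badge SessionCard shows, following item 3 above.
function badgeFor(session: AutonomyFields): SessionBadge {
  if (session.daemonStatus === "paused") return "PAUSED";
  if (session.isAutonomous && session.daemonStatus === "running") return "LIVE";
  return null;
}

// Whether SessionDetail renders the control strip, following item 4 above.
function showControlStrip(session: AutonomyFields): boolean {
  return session.isAutonomous;
}
```

Keeping the rules in plain functions makes them easy to unit test without rendering components.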
|
|
204
|
+
|
|
205
|
+
**Scope:** ~8 backend files changed, ~5 frontend files changed, ~2 new files. No schema changes, no new ports, no database.
|
|
206
|
+
|
|
207
|
+
**Limitations accepted:** Tool-call granularity requires a separate SSE channel (deferred to Next). The `DaemonRegistry` is in-process only -- if the server restarts mid-session, the daemon status is lost (acceptable for MVP; solve with heartbeat file later).
|
|
208
|
+
|
|
209
|
+
---
|
|
210
|
+
|
|
211
|
+
### Direction B: Typed SSE event stream per session (comprehensive but larger)
|
|
212
|
+
|
|
213
|
+
**Summary:** Add a per-session SSE endpoint (`/api/v2/sessions/:id/live`) that streams `AgentEvent` messages as the daemon executes: tool calls started/completed, step advanced, paused, cancelled, error. Frontend subscribes when viewing a session detail. Provides millisecond resolution but requires more infrastructure.
|
|
214
|
+
|
|
215
|
+
**Additional required pieces:**
|
|
216
|
+
- In-process pub/sub bus (EventEmitter or similar) keyed by sessionId
|
|
217
|
+
- Daemon writes to bus on each tool call; console routes read from bus and pipe to SSE client
|
|
218
|
+
- New `ConsoleAgentEvent` union type: `{kind: 'tool_call_started', toolName, args?} | {kind: 'tool_call_completed', toolName, durationMs} | {kind: 'step_advanced', stepLabel} | {kind: 'session_paused'} | {kind: 'session_cancelled'}`
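
The three pieces above can be sketched together. The `ConsoleAgentEvent` union is taken from the list (with `args` typed as `unknown`, an assumption). The `SessionEventBus` class and `toSseFrame` helper are hypothetical names, and a plain `Map` of listener sets stands in for "EventEmitter or similar" to keep the example dependency-free.

```typescript
type ConsoleAgentEvent =
  | { kind: "tool_call_started"; toolName: string; args?: unknown }
  | { kind: "tool_call_completed"; toolName: string; durationMs: number }
  | { kind: "step_advanced"; stepLabel: string }
  | { kind: "session_paused" }
  | { kind: "session_cancelled" };

type Listener = (event: ConsoleAgentEvent) => void;

// Hypothetical in-process pub/sub bus keyed by sessionId.
class SessionEventBus {
  private readonly listeners = new Map<string, Set<Listener>>();

  // Returns an unsubscribe function, to be called when the SSE client disconnects.
  subscribe(sessionId: string, listener: Listener): () => void {
    const existing = this.listeners.get(sessionId);
    const set = existing ?? new Set<Listener>();
    if (!existing) this.listeners.set(sessionId, set);
    set.add(listener);
    return () => {
      set.delete(listener);
      if (set.size === 0) this.listeners.delete(sessionId);
    };
  }

  // Called by the daemon on each tool call or status change.
  publish(sessionId: string, event: ConsoleAgentEvent): void {
    this.listeners.get(sessionId)?.forEach((listener) => listener(event));
  }
}

// Serialize one event as a Server-Sent Events frame ("data: ...\n\n").
function toSseFrame(event: ConsoleAgentEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}
```

A route handler for `/api/v2/sessions/:id/live` would subscribe on connect, write `toSseFrame(event)` to the response for each event, and call the unsubscribe function on close.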
|
|
219
|
+
|
|
220
|
+
**Verdict:** High value for observability, but wrong for MVP. The complexity is in the daemon side (emitting events for every tool call), not the console side. Direction A gives 80% of the user value with 20% of the work. Promote to "Next" after MVP ships.
|
|
221
|
+
|
|
222
|
+
---
|
|
223
|
+
|
|
224
|
+
### Direction C: File-based daemon heartbeat (simplest possible, but weaker)
|
|
225
|
+
|
|
226
|
+
**Summary:** The daemon writes a `daemon.json` heartbeat file into the session directory every N seconds. Console service reads `daemon.json` on each session load; if `Date.now() - heartbeatMs < 30_000`, the session is considered live.
|
|
227
|
+
|
|
228
|
+
**Verdict:** Elegant for detecting "is running" but insufficient for control actions (can't send pause/resume via a file). Also, the file watcher filters to `.jsonl` changes only, so writes to `daemon.json` would not trigger SSE updates. Too limited for the control surface. Keep as a heartbeat persistence mechanism alongside Direction A.
|
|
229
|
+
|
|
230
|
+
---
|
|
231
|
+
|
|
232
|
+
## Challenge Notes
|
|
233
|
+
|
|
234
|
+
### Challenge 1: Daemon registration vs. console server lifecycle
|
|
235
|
+
|
|
236
|
+
The `DaemonRegistry` is in-process. If the console server and daemon are separate OS processes (not co-located in the same Node.js process), the registry must communicate via a socket, pipe, or shared file. The MVP assumption is that the daemon and console server share the same Node.js process -- valid if WorkRail's HTTP server hosts both the console API and the daemon. This assumption must be validated before implementation.
|
|
237
|
+
|
|
238
|
+
**Resolution:** In the MVP, start with the shared-process model. Design the `DaemonRegistry` interface such that it can be backed by an IPC socket later without changing callers. The interface is: `register(sessionId, entry) / deregister(sessionId) / get(sessionId) / list() / pause(sessionId) / resume(sessionId) / cancel(sessionId)`. The in-process `Map` is the first implementation; a Unix socket-backed implementation is the second.
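
The interface and its first implementation might look like the sketch below. The method names come from the list above; the return types, the `DaemonEntry` fields, and the class name `InProcessDaemonRegistry` are assumptions made for illustration.

```typescript
type DaemonStatus = 'running' | 'paused';

// Assumed entry shape: an abort handle for cancel plus a status for pause/resume.
interface DaemonEntry {
  abortController: AbortController;
  status: DaemonStatus;
}

// Method names follow the design text; boolean results ("was the session found?") are an assumption.
interface DaemonRegistry {
  register(sessionId: string, entry: DaemonEntry): void;
  deregister(sessionId: string): void;
  get(sessionId: string): DaemonEntry | undefined;
  list(): string[];
  pause(sessionId: string): boolean;
  resume(sessionId: string): boolean;
  cancel(sessionId: string): boolean;
}

// First implementation: a plain in-process Map.
class InProcessDaemonRegistry implements DaemonRegistry {
  private readonly entries = new Map<string, DaemonEntry>();

  register(sessionId: string, entry: DaemonEntry): void {
    this.entries.set(sessionId, entry);
  }

  deregister(sessionId: string): void {
    this.entries.delete(sessionId);
  }

  get(sessionId: string): DaemonEntry | undefined {
    return this.entries.get(sessionId);
  }

  list(): string[] {
    return Array.from(this.entries.keys());
  }

  pause(sessionId: string): boolean {
    return this.setStatus(sessionId, 'paused');
  }

  resume(sessionId: string): boolean {
    return this.setStatus(sessionId, 'running');
  }

  // Cancel aborts the daemon's in-flight work and removes the entry.
  cancel(sessionId: string): boolean {
    const entry = this.entries.get(sessionId);
    if (!entry) return false;
    entry.abortController.abort();
    this.entries.delete(sessionId);
    return true;
  }

  private setStatus(sessionId: string, status: DaemonStatus): boolean {
    const entry = this.entries.get(sessionId);
    if (!entry) return false;
    entry.status = status;
    return true;
  }
}
```

Callers depend only on `DaemonRegistry`, so the Map can later be swapped for another backing store behind the same method names.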
|
|
239
|
+
|
|
240
|
+
### Challenge 2: The `isAutonomous` signal -- where does it come from?
|
|
241
|
+
|
|
242
|
+
A session created by the daemon vs. a session created by a human MCP call looks identical in the event log. The daemon must annotate the session at creation time to distinguish it. Options:
|
|
243
|
+
- (a) A new domain event `{kind: "autonomous_session_started"}` in the event log -- durable, visible in history, queryable without the daemon registry
|
|
244
|
+
- (b) A context variable `{key: "is_autonomous", value: "true"}` set at session start -- already in the event log as a `context_set` event; `ConsoleService` can project it from `projectRunContextV2`
|
|
245
|
+
- (c) Daemon registry only -- query the registry; fall back to `false` if session is not registered (not durable across restarts)
|
|
246
|
+
|
|
247
|
+
**Recommendation:** (b) is the right approach -- it uses an existing mechanism, requires no new event types, is durable across server restarts, and is queryable by `ConsoleService` without coupling to the daemon registry. The daemon calls `context_set` with `is_autonomous: "true"` and `daemon_goal: "<goal text>"` at session start.
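
As a rough sketch of option (b), the flag can be recovered by folding `context_set` events into a key/value map. The event shape below is hypothetical (the real event-log schema and the signature of `projectRunContextV2` are not shown in this document), and `projectContext` is a stand-in for that projection, not its actual implementation.

```typescript
// Hypothetical minimal event shape; only `context_set` events carry key/value.
interface SessionEvent {
  kind: string;
  key?: string;
  value?: string;
}

// Folds context_set events into a map; a later write to the same key wins.
function projectContext(events: readonly SessionEvent[]): Map<string, string> {
  const context = new Map<string, string>();
  for (const event of events) {
    if (event.kind === 'context_set' && event.key !== undefined && event.value !== undefined) {
      context.set(event.key, event.value);
    }
  }
  return context;
}

// A session counts as autonomous only when the daemon-reserved key is exactly "true".
function isAutonomousSession(events: readonly SessionEvent[]): boolean {
  return projectContext(events).get('is_autonomous') === 'true';
}
```

Because the flag lives in the event log, this projection gives the same answer after a server restart, when the daemon registry is empty.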
|
|
248
|
+
|
|
249
|
+
### Challenge 3: Pause semantics
|
|
250
|
+
|
|
251
|
+
"Pause" in an autonomous session means "stop executing after the current tool call completes." It does not mean "abort the current LLM response mid-stream." The cleanest implementation: a cooperative pause flag that the daemon checks before each `continue_workflow` call. The daemon loop checks `daemonRegistry.isPaused(sessionId)` before advancing; if paused, it blocks the loop and emits a `{type: "daemon-status-changed", sessionId, status: "paused"}` SSE event. The session remains `in_progress` in the event log -- it has not advanced -- but the console shows it as paused.
|
|
252
|
+
|
|
253
|
+
**Outcome:** Pause = cooperative gate. Resume = release gate. Cancel = `AbortController.abort()` + deregister. The `blocked` status in the event log is already used for workflow-level blocking (not daemon-level pausing), so a new `paused` concept belongs in the `DaemonEntry`, not in `ConsoleSessionStatus`.
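
The cooperative gate can be sketched as follows. `PauseGate` and `runDaemonLoop` are hypothetical names, `advance` stands in for the daemon's `continue_workflow` call, and SSE emission and cancellation are left out to keep the gate itself visible.

```typescript
// Cooperative gate: pause() closes it, resume() opens it and releases anyone waiting.
class PauseGate {
  private paused = false;
  private waiters: Array<() => void> = [];

  isPaused(): boolean {
    return this.paused;
  }

  // Number of loop iterations currently blocked on the gate (useful for tests).
  pendingWaiters(): number {
    return this.waiters.length;
  }

  pause(): void {
    this.paused = true;
  }

  resume(): void {
    this.paused = false;
    const toRelease = this.waiters;
    this.waiters = [];
    toRelease.forEach((release) => release());
  }

  // Resolves immediately when open; otherwise resolves on the next resume().
  waitForResume(): Promise<void> {
    if (!this.paused) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }
}

// Hypothetical daemon loop: the gate is checked before each advance, never mid-step.
// `advance` resolves to false once the workflow has no further steps.
async function runDaemonLoop(gate: PauseGate, advance: () => Promise<boolean>): Promise<void> {
  let hasMore = true;
  while (hasMore) {
    await gate.waitForResume();
    hasMore = await advance();
  }
}
```

Because the check sits between steps, a pause requested during a tool call takes effect only after that call and the current step finish, which matches the "stop after the current tool call completes" semantics described above.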
|
|
254
|
+
|
|
255
|
+
---
|
|
256
|
+
|
|
257
|
+
## Resolution Notes
|
|
258
|
+
|
|
259
|
+
### Chosen direction
|
|
260
|
+
|
|
261
|
+
**Direction A is the MVP.** The daemon registry is the minimal additional surface. The three control endpoints are the minimal write path. The `context_set` approach solves `isAutonomous` durably without new event types. The existing 5s session detail poll handles step progress visibility adequately for MVP.
|
|
262
|
+
|
|
263
|
+
### What "real-time step progress" means in practice
|
|
264
|
+
|
|
265
|
+
When a user is watching a live autonomous session in the detail view:
|
|
266
|
+
- `useSessionDetail` polls every 5 seconds -- each step advance (which writes `.jsonl` and triggers SSE) will surface within 5 seconds
|
|
267
|
+
- The current step label is visible as the preferred tip node's `stepLabel` in the DAG
|
|
268
|
+
- The step start time can be approximated from the node's `createdAtEventIndex` cross-referenced with `lastModifiedMs`
|
|
269
|
+
- "What tool calls have been observed" at MVP = the recap snippet of the current tip node (already available)
|
|
270
|
+
|
|
271
|
+
The gap vs. the stated goal: "real-time tool call observation" needs the Direction B event stream. At MVP, users see "which step is running" but not "which tool call within that step is running." This is explicitly deferred and explicitly acceptable.
|
|
272
|
+
|
|
273
|
+
### Console live view implementation plan (ordered)
|
|
274
|
+
|
|
275
|
+
**Phase 1 -- Visibility (no control, no new routes):**
|
|
276
|
+
1. Daemon calls `context_set(is_autonomous: "true", daemon_goal: "<goal>")` at session start
|
|
277
|
+
2. `ConsoleService.projectSessionSummary()` reads `is_autonomous` from `projectRunContextV2()` output
|
|
278
|
+
3. `ConsoleSessionSummary` gains `isAutonomous: boolean`
|
|
279
|
+
4. `SessionCard` shows `[ LIVE ]` pulsing badge for `isAutonomous && status === 'in_progress'`
|
|
280
|
+
5. Done. Zero new routes. Zero new ports.
|
|
281
|
+
|
|
282
|
+
**Phase 2 -- Control surface:**
|
|
283
|
+
1. `DaemonRegistry` class with Map-backed implementation
|
|
284
|
+
2. Three POST endpoints mounted in `mountConsoleRoutes()`
|
|
285
|
+
3. `useDaemonControl()` frontend hook
|
|
286
|
+
4. `AutonomousControlStrip` in `SessionDetail`
|
|
287
|
+
5. `daemon-status-changed` SSE event type
|
|
288
|
+
|
|
289
|
+
**Phase 3 -- Tool-call granularity (Next):**
|
|
290
|
+
1. Per-session SSE endpoint + in-process pub/sub bus
|
|
291
|
+
2. Daemon emits `tool_call_started / completed` events to bus
|
|
292
|
+
3. Frontend `useLiveSession()` hook subscribes to per-session stream
|
|
293
|
+
4. Live tool call log panel in `SessionDetail`
|
|
294
|
+
|
|
295
|
+
---
|
|
296
|
+
|
|
297
|
+
## 12-Month Product Vision
|
|
298
|
+
|
|
299
|
+
### What WorkRail becomes in 12 months
|
|
300
|
+
|
|
301
|
+
```
|
|
302
|
+
WorkRail Autonomous Platform (v3)
|
|
303
|
+
|
|
304
|
+
┌────────────────────────────────────────────────────────────────┐
|
|
305
|
+
│ Control plane (console) │
|
|
306
|
+
│ ─ Session list with LIVE badges for autonomous sessions │
|
|
307
|
+
│ ─ Per-session control: pause / resume / cancel │
|
|
308
|
+
│ ─ Real-time tool call stream (step granularity) │
|
|
309
|
+
│ ─ Evidence viewer: required artifacts per step, observed tools │
|
|
310
|
+
│ ─ Workflow authoring: visual step editor, markdown input │
|
|
311
|
+
│ ─ Trigger configuration: webhook setup, cron, CLI │
|
|
312
|
+
└────────────────────────────────────────────────────────────────┘
|
|
313
|
+
│ │
|
|
314
|
+
▼ ▼
|
|
315
|
+
┌──────────────────┐ ┌──────────────────────────────────────┐
|
|
316
|
+
│ Workflow engine │ │ Autonomous daemon │
|
|
317
|
+
│ (existing) │ │ ─ LLM call layer (Anthropic API) │
|
|
318
|
+
│ ─ Durable state │ │ ─ AgentLoop wrapper │
|
|
319
|
+
│ ─ HMAC tokens │ │ ─ Evidence collection hooks │
|
|
320
|
+
│ ─ DAG / trace │ │ ─ Pre/PostToolCall gating │
|
|
321
|
+
│ ─ Projections │ │ ─ Session actor queue │
|
|
322
|
+
└──────────────────┘ │ ─ Task flow chaining │
|
|
323
|
+
└──────────────────────────────────────┘
|
|
324
|
+
│
|
|
325
|
+
┌─────────────────────┼──────────────────────┐
|
|
326
|
+
▼ ▼ ▼
|
|
327
|
+
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
|
|
328
|
+
│ GitLab / GitHub │ │ Jira / Linear │ │ CLI / cron │
|
|
329
|
+
│ webhook triggers │ │ ticket triggers │ │ manual triggers │
|
|
330
|
+
└──────────────────┘ └──────────────────┘ └──────────────────┘
|
|
331
|
+
```
|
|
332
|
+
|
|
333
|
+
**The platform in prose:**
|
|
334
|
+
|
|
335
|
+
WorkRail in 12 months is the **open-source enforcement layer for autonomous AI agents**. It is the only platform that combines:
|
|
336
|
+
|
|
337
|
+
1. **Autonomous execution** -- a daemon that drives Claude (or any Anthropic model) through a structured workflow without a human at the keyboard
|
|
338
|
+
2. **Cryptographic step enforcement** -- every step advance requires a valid HMAC-signed token; the agent cannot skip steps even when running autonomously and even when context degrades
|
|
339
|
+
3. **Full session observability** -- every tool call, every step, every branch decision is visible in the console DAG and execution trace; nothing is a black box
|
|
340
|
+
4. **Durable cross-session state** -- sessions survive restarts, context compaction, and model upgrades; the event log is the ground truth regardless of what happens to Claude's context window
|
|
341
|
+
5. **Human-in-the-loop control** -- any autonomous session can be paused, inspected, and resumed from the console; pause/resume is first-class, not an afterthought
|
|
342
|
+
6. **Trigger-driven automation** -- GitLab MR opened, Jira ticket moved, cron schedule, CLI call -- any event can start a workflow; autonomous sessions are a natural extension of the existing session model
|
|
343
|
+
7. **Workflow chaining** -- a completed workflow A can automatically start workflow B with A's outputs as B's context; multi-step autonomous pipelines with no human intervention between stages
|
|
344
|
+
8. **Evidence gates** -- steps that need human approval or required artifacts block until the evidence is present; the daemon cannot bulldoze through a verification gate
|
|
345
|
+
|
|
346
|
+
### The 12 milestones
|
|
347
|
+
|
|
348
|
+
**Q2 2026 (now-next, 0-3 months):**
|
|
349
|
+
1. **Autonomous daemon alpha** -- daemon drives Claude through a workflow via Anthropic API; single trigger type (CLI); produces regular session in existing event log
|
|
350
|
+
2. **Console live view Phase 1** -- `[ LIVE ]` badge; `isAutonomous` from context; no new routes
|
|
351
|
+
3. **Console live view Phase 2** -- pause/resume/cancel endpoints; `AutonomousControlStrip`
|
|
352
|
+
|
|
353
|
+
**Q3 2026 (3-6 months):**
|
|
354
|
+
4. **Webhook trigger system** -- GitLab MR webhook → workflow start; authenticated delivery; trigger registry in daemon config
|
|
355
|
+
5. **Evidence collection hooks** -- `BeforeToolCall` intercept; step evidence requirements declared in workflow JSON; daemon blocks `continue_workflow` until evidence is present
|
|
356
|
+
6. **Real-time tool call stream** -- per-session SSE; live tool call log in console
|
|
357
|
+
|
|
358
|
+
**Q4 2026 (6-9 months):**
|
|
359
|
+
7. **Task flow chaining** -- workflow A completion produces a chain artifact; daemon picks it up and starts workflow B; visible in console as linked sessions
|
|
360
|
+
8. **Compaction survival** -- WorkRail step notes injected into Claude session memory pre-compaction; daemon sessions survive context resets without losing workflow state
|
|
361
|
+
9. **Jira / Linear triggers** -- ticket status change → workflow start; bi-directional: workflow step can update ticket status
|
|
362
|
+
|
|
363
|
+
**Q1 2027 (9-12 months):**
|
|
364
|
+
10. **Multi-model routing** -- daemon can route steps to Sonnet (fast) vs. Opus (deep) based on step metadata in the workflow JSON; cost-aware routing
|
|
365
|
+
11. **Visual workflow authoring** -- step editor in console; drag-and-drop, prompt editing, loop configuration; outputs JSON
|
|
366
|
+
12. **Workflow marketplace** -- bundled + team + public workflows discoverable from the console; install/update from URL; WorkRail becomes the npm for AI workflows
|
|
367
|
+
|
|
368
|
+
---
|
|
369
|
+
|
|
370
|
+
## How WorkRail Surpasses Competitors
|
|
371
|
+
|
|
372
|
+
### Surpassing nexus-core
|
|
373
|
+
|
|
374
|
+
**nexus-core's ceiling:** Advisory prompts that an agent under context pressure can and will ignore. No durability -- session dies with the conversation. Human-initiated only -- no autonomous mode.
|
|
375
|
+
|
|
376
|
+
**WorkRail's advantage:**
|
|
377
|
+
- Where nexus-core has skill text, WorkRail has HMAC tokens -- a step cannot be advanced without a valid signed token
|
|
378
|
+
- Where nexus-core has one conversation, WorkRail has a durable append-only event log that survives compaction, model upgrades, and restarts
|
|
379
|
+
- Where nexus-core requires a human to start `/flow`, WorkRail starts autonomously on a webhook
|
|
380
|
+
- Where nexus-core's "learning capture" is a markdown file in a conversation, WorkRail's session notes are queryable structured artifacts in a session store
|
|
381
|
+
|
|
382
|
+
**The combinatorial play:** WorkRail can *run nexus-core phases as steps*. The nexus-vs-workrail comparison document already frames this as "C3: WorkRail meta-workflow wraps nexus-core phases." WorkRail doesn't compete with nexus-core's org integrations -- it enforces the phase gates around them. This is additive, not competitive.
|
|
383
|
+
|
|
384
|
+
### Surpassing OpenClaw
|
|
385
|
+
|
|
386
|
+
**OpenClaw's ceiling:** In-memory session store (24h TTL, lost on restart). No step enforcement -- tasks can be abandoned. No audit trail -- you can't reconstruct what the agent did. Task system is SQLite-backed but not cryptographically enforced.
|
|
387
|
+
|
|
388
|
+
**WorkRail's advantage:**
|
|
389
|
+
- **Durability:** OpenClaw's `RuntimeCache` is in-memory with a 24h TTL. WorkRail's session store is an append-only event log on disk -- sessions survive restarts, process crashes, and machine reboots.
|
|
390
|
+
- **Enforcement:** OpenClaw has no equivalent of WorkRail's HMAC token protocol. An OpenClaw task can be abandoned at any step. WorkRail steps cannot be skipped.
|
|
391
|
+
- **Auditability:** WorkRail's DAG + execution trace + node detail gives complete session forensics. OpenClaw has no session replay or audit trail.
|
|
392
|
+
- **Workflow composition:** OpenClaw has task strings. WorkRail has structured JSON workflows with loops, conditionals, assessment gates, and typed context.
|
|
393
|
+
|
|
394
|
+
**What WorkRail takes from OpenClaw (patterns, not code):**
|
|
395
|
+
- The `SessionActorQueue` per-session serialization pattern (prevent concurrent modification)
|
|
396
|
+
- The `SpawnAcpParams` interface shape (minimal spawn interface)
|
|
397
|
+
- The policy system (`isXxxEnabledByPolicy`) for daemon feature flags
|
|
398
|
+
|
|
399
|
+
### Surpassing ruflo
|
|
400
|
+
|
|
401
|
+
**ruflo's ceiling:** ruflo v3.5 now claims "4 gates the model cannot bypass" via `@claude-flow/guidance`. This is a prompt-based gate -- the agent is instructed to call the gate tool. Context pressure, model substitution, or adversarial prompts can cause the agent to skip it. There is no cryptographic binding between steps. The SONA self-learning system stores "what works" in a vector database, which means successful bypasses can be learned and repeated.
|
|
402
|
+
|
|
403
|
+
**WorkRail's advantage:**
|
|
404
|
+
- Every step in a WorkRail autonomous session is cryptographically enforced -- the daemon cannot say "I advanced" without a valid token
|
|
405
|
+
- WorkRail's DAG shows exactly what happened vs. what was supposed to happen
|
|
406
|
+
- WorkRail can pause any session and inspect every decision, tool call, and output
|
|
407
|
+
- ruflo is a coordination framework; WorkRail is an enforcement framework
|
|
408
|
+
|
|
409
|
+
**The framing:** ruflo is "get agents to do more things faster." WorkRail is "get agents to do the right things in the right order with proof." These are different products for different risk tolerances.
|
|
410
|
+
|
|
411
|
+
### Surpassing Devin / GitHub Copilot Workspace
|
|
412
|
+
|
|
413
|
+
**Devin's ceiling:** Closed-source, cloud-only, one model (opaque), black box execution. You cannot see what it did. You cannot enforce a process. You cannot self-host.
|
|
414
|
+
|
|
415
|
+
**GitHub Copilot Workspace's ceiling:** GitHub-only, no process enforcement, no session durability, no audit trail, no self-hosting.
|
|
416
|
+
|
|
417
|
+
**WorkRail's advantage:**
|
|
418
|
+
- **Open source, self-hosted:** Your data stays in your infrastructure. No vendor lock-in. Audit everything.
|
|
419
|
+
- **Any model:** Anthropic today, extensible to any provider via the pi-mono unified API pattern.
|
|
420
|
+
- **Process enforcement:** Copilot Workspace can implement a change however it wants; an agent running under WorkRail cannot skip steps, by design.
|
|
421
|
+
- **Session forensics:** Every decision, branch, tool call, and output is queryable and visualizable in the console. Devin shows you a PR; WorkRail shows you why every decision was made.
|
|
422
|
+
- **Human control plane:** Any session can be paused and inspected. There is no equivalent in Devin or Copilot Workspace.
|
|
423
|
+
|
|
424
|
+
**The positioning:** WorkRail is for organizations that cannot put their IP into a closed-source cloud AI system and accept "trust the black box." It is the enforcement-first, audit-first, self-hosted autonomous agent platform.
|
|
425
|
+
|
|
426
|
+
---
|
|
427
|
+
|
|
428
|
+
## Challenge Review Findings
|
|
429
|
+
|
|
430
|
+
### BLOCKING Challenge 1: Heartbeat timer cannot hold session lock (RESOLVED)
|
|
431
|
+
|
|
432
|
+
**Finding:** The session lock (`ExecutionSessionGateV2`) is held for the entire duration of a step execution. A background timer trying to write heartbeat `context_set` events cannot acquire the same lock while the daemon holds it. This invalidates the "emit heartbeat every 30 seconds from a background timer" design.
|
|
433
|
+
|
|
434
|
+
**Resolution:** Heartbeats must be written within the daemon's existing write path, not from a separate background timer. Specifically:
|
|
435
|
+
- Heartbeat is written at step START (daemon calls `context_set(daemon_heartbeat: "<ISO>")` before beginning the LLM call, while it holds the session lock for step setup)
|
|
436
|
+
- For steps that take longer than 60 seconds: the daemon writes heartbeats at each tool call result boundary (the daemon already processes tool calls one at a time; each tool result is a write boundary where the lock can be briefly acquired)
|
|
437
|
+
- The 60-second liveness window remains valid: any step that is actively executing tool calls will produce heartbeats at tool-call-result boundaries
|
|
438
|
+
|
|
439
|
+
**Alternative (simpler for MVP):** Store liveness in DaemonRegistry as `lastHeartbeatMs: number`, updated by the daemon at each tool call without lock contention. The ConsoleService reads from the registry for liveness (not the event log). The event log still records `is_autonomous: "true"` at session start for durability. The LIVE detection uses the registry `lastHeartbeatMs` for freshness. This is a pragmatic hybrid: `is_autonomous` is durable (event log), `lastHeartbeatMs` is ephemeral (registry).
|
|
440
|
+
|
|
441
|
+
**Impact on design:** The "liveness from heartbeat events" design must be amended. For MVP, the hybrid approach (durable `is_autonomous` in event log + ephemeral `lastHeartbeatMs` in registry) is the correct implementation. The registry stores one more field than previously designed.
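
A minimal sketch of the hybrid check, assuming the 60-second window used elsewhere in this document. `HeartbeatRegistry`, `touch`, and `isLive` are hypothetical names; in the design above, `lastHeartbeatMs` is a field on the registry entry, which is simplified here to a map of timestamps.

```typescript
// Freshness window for the ephemeral heartbeat, per the 60-second liveness window above.
const LIVENESS_WINDOW_MS = 60_000;

// Simplified stand-in for the registry's `lastHeartbeatMs` field: sessionId -> timestamp.
class HeartbeatRegistry {
  private readonly lastHeartbeatMs = new Map<string, number>();

  // Called by the daemon at each tool call; an in-memory write, so no session lock is needed.
  touch(sessionId: string, nowMs: number = Date.now()): void {
    this.lastHeartbeatMs.set(sessionId, nowMs);
  }

  get(sessionId: string): number | undefined {
    return this.lastHeartbeatMs.get(sessionId);
  }

  remove(sessionId: string): void {
    this.lastHeartbeatMs.delete(sessionId);
  }
}

// Combines the durable flag (event log) with the ephemeral heartbeat (registry).
function isLive(
  isAutonomous: boolean,
  status: string,
  lastHeartbeatMs: number | undefined,
  nowMs: number = Date.now(),
): boolean {
  if (!isAutonomous || status !== 'in_progress') return false;
  if (lastHeartbeatMs === undefined) return false; // not registered, e.g. after a restart
  return nowMs - lastHeartbeatMs <= LIVENESS_WINDOW_MS;
}
```

After a server restart the registry is empty, so a session that is still flagged `is_autonomous` in the event log stops showing as live until a daemon registers and touches it again.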
|
|
442
|
+
|
|
443
|
+
### BLOCKING Challenge 2: STDIO transport assumption (NOT BLOCKING -- mis-scoped)
|
|
444
|
+
|
|
445
|
+
**Finding:** The challenge correctly identifies that STDIO mode (Claude Code using WorkRail as MCP server) does not have an HTTP server or DaemonRegistry. However, the challenge conflates two distinct WorkRail runtime modes:
|
|
446
|
+
|
|
447
|
+
- **MCP mode (STDIO/HTTP):** Claude Code is the agent; WorkRail is the MCP server providing tools. The daemon concept does not exist in this mode. WorkRail does not drive Claude -- Claude drives WorkRail via MCP tool calls.
|
|
448
|
+
- **Daemon mode:** WorkRail is the agent driver; it calls the Anthropic API directly and drives Claude through a workflow autonomously. This mode requires HTTP mode (it needs the console, the DaemonRegistry, and control endpoints).
|
|
449
|
+
|
|
450
|
+
These are mutually exclusive runtime modes. The autonomous daemon always runs in HTTP mode. STDIO mode continues to work exactly as today. No breaking change.
|
|
451
|
+
|
|
452
|
+
**Not a blocking issue.** The design is correct for daemon mode. STDIO mode users do not get autonomous mode.
|
|
453
|
+
|
|
454
|
+
### MEDIUM Challenge 3: Pause flag check location (RESOLVED -- mis-framed)
|
|
455
|
+
|
|
456
|
+
**Finding:** The challenge assumed the daemon runs inside Claude Code's harness and calls `continue_workflow` via MCP tool calls. This is incorrect for autonomous daemon mode. The daemon is a standalone WorkRail process that calls the Anthropic API and advances the session via the engine's in-process methods directly (not via MCP). The daemon has a direct reference to the DaemonRegistry (same process). The pause check is `if (daemonRegistry.isPaused(sessionId)) { await waitForResume(); }` called before the engine's `continueWorkflow()` method. No MCP hop required.
|
|
457
|
+
|
|
458
|
+
**Not a new issue.** Already resolved by the same-process design.
|
|
459
|
+
|
|
460
|
+
### MEDIUM Challenge 4: Cancelled session orphan (VALID, MITIGATED)
|
|
461
|
+
|
|
462
|
+
**Finding:** When the daemon is cancelled, the session remains `in_progress` in the event log. There is no mechanism to mark it as terminated.
|
|
463
|
+
|
|
464
|
+
**Resolution:** Leverage the existing `dormant` status. WorkRail already has dormancy detection: a session that has been `in_progress` for more than 1 hour with no activity is displayed as `dormant`. When the daemon receives a cancel signal, it writes a `context_set(daemon_status: "cancelled")` event before deregistering. The dormancy threshold already handles the display. For users who want a faster "abandoned" status, the console can show `dormant` once the last heartbeat is more than 60 seconds old, which happens within 60 seconds of cancel because no new heartbeats are emitted. No new terminal status is required for MVP.
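
One way to express this display rule is sketched below. The two thresholds come from the text above; `SessionActivity`, `displayStatusForInProgress`, and the choice to treat a missing registry heartbeat as stale are assumptions for illustration only.

```typescript
type DisplayStatus = 'in_progress' | 'dormant';

const DORMANCY_THRESHOLD_MS = 60 * 60 * 1000; // existing rule: 1 hour with no activity
const HEARTBEAT_STALE_MS = 60_000;            // autonomous rule: last heartbeat older than 60 seconds

// Hypothetical inputs for a session whose event-log status is `in_progress`.
interface SessionActivity {
  lastActivityMs: number;    // time of the last event-log write
  isAutonomous: boolean;     // projected from the `is_autonomous` context key
  lastHeartbeatMs?: number;  // from the daemon registry; undefined when not registered
}

function displayStatusForInProgress(
  session: SessionActivity,
  nowMs: number = Date.now(),
): DisplayStatus {
  // Existing dormancy detection applies to every session.
  if (nowMs - session.lastActivityMs > DORMANCY_THRESHOLD_MS) return 'dormant';

  // Assumption: for autonomous sessions, a missing or stale heartbeat also reads as dormant.
  if (session.isAutonomous) {
    const stale =
      session.lastHeartbeatMs === undefined ||
      nowMs - session.lastHeartbeatMs > HEARTBEAT_STALE_MS;
    if (stale) return 'dormant';
  }
  return 'in_progress';
}
```

Human-driven sessions are unaffected by the heartbeat rule and fall back to the one-hour threshold alone.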
|
|
465
|
+
|
|
466
|
+
### LOW Challenge 5: LIVE badge spoofing (ACCEPTED)
|
|
467
|
+
|
|
468
|
+
**Finding:** Users can manually set `is_autonomous: "true"` via `context_set` MCP calls, causing the LIVE badge to show on non-autonomous sessions. Low severity because this requires deliberate user action and only affects their own sessions.
|
|
469
|
+
|
|
470
|
+
**Resolution accepted:** Document that `is_autonomous` is a daemon-reserved context key. The badge is best-effort, not a security boundary. No enforcement for MVP.
|
|
471
|
+
|
|
472
|
+
## Decision Log
|
|
473
|
+
|
|
474
|
+
| Decision | Choice | Rationale |
|
|
475
|
+
|----------|--------|-----------|
|
|
476
|
+
| How to mark autonomous sessions | `context_set(is_autonomous: "true")` at session start | Durable in event log, queryable by existing projections, no new event types |
|
|
477
|
+
| Liveness detection | HYBRID: `is_autonomous` from event log (durable) + `lastHeartbeatMs` from DaemonRegistry (ephemeral) | Pure event-log heartbeats blocked by session lock (Challenge 1); registry holds ephemeral freshness signal; event log holds durable autonomous flag |
|
|
478
|
+
| Pause semantics | Cooperative gate (check before in-process `continueWorkflow()` call) | Daemon runs in-process with DaemonRegistry; no MCP hop required; no LLM abort needed; reversible |
|
|
479
|
+
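| Control endpoint method | POST (not DELETE/PATCH) | Simple; consistent with REST conventions for action endpoints. POST is not inherently idempotent, so repeat-call behavior must be defined per endpoint (see residual risk 3) |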
|
|
480
|
+
| DaemonRegistry scope | In-process for MVP, socket-backed later | Avoids IPC complexity in MVP; interface design allows future migration |
|
|
481
|
+
| DaemonRegistry contents | `abortController` + `pauseFlag` + `status` + `lastHeartbeatMs` | Heartbeat timer cannot write to event log (session lock); registry is the freshness signal |
|
|
482
|
+
| Abandoned session handling | Use existing `dormant` status detection | When daemon cancels, no more heartbeats emitted; session becomes `dormant` within 60s naturally; no new terminal status for MVP |
|
|
483
|
+
| Tool-call granularity in MVP | Deferred (5s poll sufficient) | 80/20: step-level progress covers the primary use case; tool-level needs Direction B infrastructure |
|
|
484
|
+
| Autonomous runtime mode | HTTP mode only | Daemon mode requires console, control endpoints, persistent process; STDIO mode continues unchanged for human-driven sessions |
|
|
485
|
+
| `is_autonomous` field security | Best-effort, not enforced | Document as daemon-reserved; badge is informational, not a security boundary; MVP acceptable |
|
|
486
|
+
|
|
487
|
+
---
|
|
488
|
+
|
|
489
|
+
## Final Recommendation Summary
|
|
490
|
+
|
|
491
|
+
### Selected direction: Candidate 2 amended (hybrid liveness + C3 features absorbed)
|
|
492
|
+
|
|
493
|
+
**Confidence: HIGH**
|
|
494
|
+
|
|
495
|
+
The direction has been grounded in full codebase read, adversarially challenged, philosophy-reviewed, compared to alternatives, and had all tradeoffs explicitly accepted. No RED findings. Two ORANGE implementation improvements incorporated.
|
|
496
|
+
|
|
497
|
+
**Strongest alternative:** Candidate 3 (History Reframe) + Candidate 1 (Visibility Only). This combination serves the primary user job (post-execution verification) without any control infrastructure. It is the right Phase 1 but fails the safety net acceptance criterion. It is not the MVP.
|
|
498
|
+
|
|
499
|
+
**Residual risks (3, all LOW to MEDIUM, none blocking):**
|
|
500
|
+
1. Heartbeat interval is an implicit behavioral contract -- document in daemon implementation spec
|
|
501
|
+
2. Backend/frontend type sharing for `DaemonEntry` status -- add string literal union to `api/types.ts`
|
|
502
|
+
3. Control endpoint idempotency not specified for edge cases -- document 200/409 behavior before coding
|
|
503
|
+
|
|
504
|
+
**Pivot condition:** If daemon and console server run in separate processes (future containerization), the in-process DaemonRegistry requires a socket-backed implementation. The registry interface design accommodates this migration without changing callers.
|
|
505
|
+
|
|
506
|
+
---
|
|
507
|
+
|
|
508
|
+
## Final Summary
|
|
509
|
+
|
|
510
|
+
**Minimum console changes for MVP (ordered):**
|
|
511
|
+
|
|
512
|
+
1. Daemon writes `context_set(is_autonomous: "true")` at session start -- zero console changes, done in daemon
|
|
513
|
+
2. `ConsoleService.projectSessionSummary()` reads `is_autonomous` context key, adds `isAutonomous: boolean` to `ConsoleSessionSummary`
|
|
514
|
+
3. `SessionCard` adds `[ LIVE ]` pulsing amber badge when `isAutonomous && status === 'in_progress'`
|
|
515
|
+
4. `DaemonRegistry` class (in-process Map, ~50 lines)
|
|
516
|
+
5. Three POST control endpoints in `mountConsoleRoutes()`
|
|
517
|
+
6. `useDaemonControl()` frontend hook
|
|
518
|
+
7. `AutonomousControlStrip` component in `SessionDetail`
|
|
519
|
+
8. New `{type: "daemon-status-changed"}` SSE event type
|
|
520
|
+
|
|
521
|
+
**Total scope:** ~10 files, ~400 lines. No schema changes. No database. No new ports in `ConsoleServicePorts`. The existing session model, event log, and SSE infrastructure are reused throughout.
|
|
522
|
+
|
|
523
|
+
**12-month vision:** WorkRail becomes the open-source, enforcement-first autonomous agent platform -- the only system that combines autonomous execution, cryptographic step enforcement, full session observability, durable state, and human control. It surpasses OpenClaw (durability + enforcement), nexus-core (autonomous + durable), ruflo (enforcement vs. coordination), and Devin (open-source + self-hosted + auditable).
|
|
524
|
+
|
|
525
|
+
**The product moat:** WorkRail's moat is not the LLM integration (anyone can call the Anthropic API) or the workflow format (JSON is not a moat). The moat is **cryptographic enforcement + full session observability + durable state** combined in a single open-source platform. This combination is not a feature -- it is an architectural invariant that cannot be bolted onto existing systems.
|