@exaudeus/workrail 3.27.0 → 3.29.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (160)
  1. package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
  2. package/dist/console/index.html +1 -1
  3. package/dist/manifest.json +3 -3
  4. package/docs/README.md +57 -0
  5. package/docs/adrs/001-hybrid-storage-backend.md +38 -0
  6. package/docs/adrs/002-four-layer-context-classification.md +38 -0
  7. package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
  8. package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
  9. package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
  10. package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
  11. package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
  12. package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
  13. package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
  14. package/docs/adrs/010-release-pipeline.md +89 -0
  15. package/docs/architecture/README.md +7 -0
  16. package/docs/architecture/refactor-audit.md +364 -0
  17. package/docs/authoring-v2.md +527 -0
  18. package/docs/authoring.md +873 -0
  19. package/docs/changelog-recent.md +201 -0
  20. package/docs/configuration.md +505 -0
  21. package/docs/ctc-mcp-proposal.md +518 -0
  22. package/docs/design/README.md +22 -0
  23. package/docs/design/agent-cascade-protocol.md +96 -0
  24. package/docs/design/autonomous-console-design-candidates.md +253 -0
  25. package/docs/design/autonomous-console-design-review.md +111 -0
  26. package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
  27. package/docs/design/claude-code-source-deep-dive.md +713 -0
  28. package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
  29. package/docs/design/console-execution-trace-candidates-final.md +160 -0
  30. package/docs/design/console-execution-trace-candidates.md +211 -0
  31. package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
  32. package/docs/design/console-execution-trace-design-review.md +74 -0
  33. package/docs/design/console-execution-trace-discovery.md +394 -0
  34. package/docs/design/console-execution-trace-final-review.md +77 -0
  35. package/docs/design/console-execution-trace-review.md +92 -0
  36. package/docs/design/console-performance-discovery.md +415 -0
  37. package/docs/design/console-ui-backlog.md +280 -0
  38. package/docs/design/daemon-architecture-discovery.md +853 -0
  39. package/docs/design/daemon-design-candidates.md +318 -0
  40. package/docs/design/daemon-design-review-findings.md +119 -0
  41. package/docs/design/daemon-engine-design-candidates.md +210 -0
  42. package/docs/design/daemon-engine-design-review.md +131 -0
  43. package/docs/design/daemon-execution-engine-discovery.md +280 -0
  44. package/docs/design/daemon-gap-analysis.md +554 -0
  45. package/docs/design/daemon-owns-console-plan.md +168 -0
  46. package/docs/design/daemon-owns-console-review.md +91 -0
  47. package/docs/design/daemon-owns-console.md +195 -0
  48. package/docs/design/data-model-erd.md +11 -0
  49. package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
  50. package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
  51. package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
  52. package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
  53. package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
  54. package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
  55. package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
  56. package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
  57. package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
  58. package/docs/design/list-workflows-latency-fix-plan.md +128 -0
  59. package/docs/design/list-workflows-latency-fix-review.md +55 -0
  60. package/docs/design/list-workflows-latency-fix.md +109 -0
  61. package/docs/design/native-context-management-api.md +11 -0
  62. package/docs/design/performance-sweep-2026-04.md +96 -0
  63. package/docs/design/routines-guide.md +219 -0
  64. package/docs/design/sequence-diagrams.md +11 -0
  65. package/docs/design/subagent-design-principles.md +220 -0
  66. package/docs/design/temporal-patterns-design-candidates.md +312 -0
  67. package/docs/design/temporal-patterns-design-review-findings.md +163 -0
  68. package/docs/design/test-isolation-from-config-file.md +335 -0
  69. package/docs/design/v2-core-design-locks.md +2746 -0
  70. package/docs/design/v2-lock-registry.json +734 -0
  71. package/docs/design/workflow-authoring-v2.md +1044 -0
  72. package/docs/design/workflow-docs-spec.md +218 -0
  73. package/docs/design/workflow-extension-points.md +687 -0
  74. package/docs/design/workrail-auto-trigger-system.md +359 -0
  75. package/docs/design/workrail-config-file-discovery.md +513 -0
  76. package/docs/docker.md +110 -0
  77. package/docs/generated/v2-lock-closure-plan.md +26 -0
  78. package/docs/generated/v2-lock-coverage.json +797 -0
  79. package/docs/generated/v2-lock-coverage.md +177 -0
  80. package/docs/ideas/backlog.md +3927 -0
  81. package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
  82. package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
  83. package/docs/ideas/implementation_plan.md +249 -0
  84. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
  85. package/docs/implementation/02-architecture.md +316 -0
  86. package/docs/implementation/04-testing-strategy.md +124 -0
  87. package/docs/implementation/09-simple-workflow-guide.md +835 -0
  88. package/docs/implementation/13-advanced-validation-guide.md +874 -0
  89. package/docs/implementation/README.md +21 -0
  90. package/docs/integrations/claude-code.md +300 -0
  91. package/docs/integrations/firebender.md +315 -0
  92. package/docs/migration/v0.1.0.md +147 -0
  93. package/docs/naming-conventions.md +45 -0
  94. package/docs/planning/README.md +104 -0
  95. package/docs/planning/github-ticketing-playbook.md +195 -0
  96. package/docs/plans/README.md +24 -0
  97. package/docs/plans/agent-managed-ticketing-design.md +605 -0
  98. package/docs/plans/agentic-orchestration-roadmap.md +112 -0
  99. package/docs/plans/assessment-gates-engine-handoff.md +536 -0
  100. package/docs/plans/content-coherence-and-references.md +151 -0
  101. package/docs/plans/library-extraction-plan.md +340 -0
  102. package/docs/plans/mr-review-workflow-redesign.md +1451 -0
  103. package/docs/plans/native-context-management-epic.md +11 -0
  104. package/docs/plans/perf-fixes-design-candidates.md +225 -0
  105. package/docs/plans/perf-fixes-design-review-findings.md +61 -0
  106. package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
  107. package/docs/plans/perf-fixes-new-issues-review.md +110 -0
  108. package/docs/plans/prompt-fragments.md +53 -0
  109. package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
  110. package/docs/plans/ui-ux-workflow-discovery.md +100 -0
  111. package/docs/plans/ui-ux-workflow-review.md +48 -0
  112. package/docs/plans/v2-followup-enhancements.md +587 -0
  113. package/docs/plans/workflow-categories-candidates.md +105 -0
  114. package/docs/plans/workflow-categories-discovery.md +110 -0
  115. package/docs/plans/workflow-categories-review.md +51 -0
  116. package/docs/plans/workflow-discovery-model-candidates.md +94 -0
  117. package/docs/plans/workflow-discovery-model-discovery.md +74 -0
  118. package/docs/plans/workflow-discovery-model-review.md +48 -0
  119. package/docs/plans/workflow-source-setup-phase-1.md +245 -0
  120. package/docs/plans/workflow-source-setup-phase-2.md +361 -0
  121. package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
  122. package/docs/plans/workflow-staleness-detection-review.md +58 -0
  123. package/docs/plans/workflow-staleness-detection.md +80 -0
  124. package/docs/plans/workflow-v2-design.md +69 -0
  125. package/docs/plans/workflow-v2-roadmap.md +74 -0
  126. package/docs/plans/workflow-validation-design.md +98 -0
  127. package/docs/plans/workflow-validation-roadmap.md +108 -0
  128. package/docs/plans/workrail-platform-vision.md +420 -0
  129. package/docs/reference/agent-context-cleaner-snippet.md +94 -0
  130. package/docs/reference/agent-context-guidance.md +140 -0
  131. package/docs/reference/context-optimization.md +284 -0
  132. package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
  133. package/docs/reference/example-workflow-repository-template/README.md +268 -0
  134. package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
  135. package/docs/reference/external-workflow-repositories.md +916 -0
  136. package/docs/reference/feature-flags-architecture.md +472 -0
  137. package/docs/reference/feature-flags.md +349 -0
  138. package/docs/reference/god-tier-workflow-validation.md +272 -0
  139. package/docs/reference/loop-optimization.md +209 -0
  140. package/docs/reference/loop-validation.md +176 -0
  141. package/docs/reference/loops.md +465 -0
  142. package/docs/reference/mcp-platform-constraints.md +59 -0
  143. package/docs/reference/recovery.md +88 -0
  144. package/docs/reference/releases.md +177 -0
  145. package/docs/reference/troubleshooting.md +105 -0
  146. package/docs/reference/workflow-execution-contract.md +998 -0
  147. package/docs/roadmap/README.md +22 -0
  148. package/docs/roadmap/legacy-planning-status.md +103 -0
  149. package/docs/roadmap/now-next-later.md +70 -0
  150. package/docs/roadmap/open-work-inventory.md +389 -0
  151. package/docs/tickets/README.md +39 -0
  152. package/docs/tickets/next-up.md +76 -0
  153. package/docs/workflow-management.md +317 -0
  154. package/docs/workflow-templates.md +423 -0
  155. package/docs/workflow-validation.md +184 -0
  156. package/docs/workflows.md +254 -0
  157. package/package.json +3 -1
  158. package/spec/authoring-spec.json +61 -16
  159. package/workflows/workflow-for-workflows.json +252 -93
  160. package/workflows/workflow-for-workflows.v2.json +188 -77
@@ -0,0 +1,3927 @@
# Ideas Backlog

Workflow and feature ideas that are worth capturing but not yet planned or designed.

---

## Research Notes: Autonomous Platform Vision (Apr 14, 2026)

### Common-Ground relationship + cross-repo execution model

**Common-Ground stays separate -- WorkRail wraps it.**

Common-Ground and WorkRail solve different problems at different layers:
- Common-Ground: "what does this agent know about this codebase and team?" (context distribution)
- WorkRail: "what should this agent do next, and did it actually do it?" (workflow enforcement)

Merging them would make WorkRail opinionated about team structure, IDE configs, AGENTS.md formats, and org-specific conventions -- breaking WorkRail's portability. `npx -y @exaudeus/workrail` works for any engineer anywhere with zero config. That's a feature to protect.

**The right relationship:** Common-Ground distributes WorkRail as part of the team toolchain (already true via `[[workflow_repos]]` in `team.toml`). WorkRail stays generic. Common-Ground stays org-specific. They're friends, not merged.

**WorkRail bootstraps Common-Ground (tentative idea, low priority):**

> ⚠️ Tentative -- not committed. Needs more thought before pursuing.

WorkRail could be the *setup layer* for Common-Ground -- a guided `workrail init` workflow that generates a Common-Ground config, runs `make sync`, and registers workflow directories as managed sources. Would make Common-Ground configurations shareable as WorkRail workflows.

Also tentative: Common-Ground's `make sync` triggering a WorkRail daemon session to validate the distributed configuration. Interesting but not a near-term priority.

---

**Cross-repo execution model -- HIGH IMPORTANCE, post-MVP:** ⭐

WorkRail must handle any environment. Not MVP, but a must-have before WorkRail can be called a general-purpose platform.

WorkRail currently assumes a single repo. The autonomous daemon breaks this assumption -- a coding task may touch Android, iOS, and a GraphQL backend simultaneously. An investigation may span 5 services.

**Workspace manifest** -- sessions declare which repos they need:
```json
{
  "context": {
    "repos": [
      { "name": "android", "path": "~/git/zillow/zillow-android-2" },
      { "name": "ios", "path": "~/git/zillow/ZillowMap" },
      { "name": "backend", "path": "~/git/zillow/mercury-graphql" }
    ]
  }
}
```

**Scoped tools** -- `BashInRepo`, `ReadRepo`, `WriteRepo` that route to the correct working directory:
```
BashInRepo(repo: "android", command: "gradle test")
ReadRepo(repo: "ios", path: "Sources/Messaging/ZIMGallery.swift")
```
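The routing these scoped tools imply can be sketched as a pure lookup against the workspace manifest. This is an illustrative sketch only -- `RepoDecl` and `resolveScopedCall` are hypothetical names, not WorkRail APIs:

```typescript
// Hypothetical sketch: route a BashInRepo-style call to the working directory
// of the repo declared in the workspace manifest.
interface RepoDecl {
  name: string;
  path: string;
}

interface ScopedCall {
  repo: string;
  command: string;
}

function resolveScopedCall(
  repos: RepoDecl[],
  call: ScopedCall
): { cwd: string; command: string } {
  const decl = repos.find((r) => r.name === call.repo);
  if (!decl) {
    // Undeclared repos fail loudly instead of running in the wrong directory.
    throw new Error(`repo "${call.repo}" not declared in workspace manifest`);
  }
  return { cwd: decl.path, command: call.command };
}
```

The point of the sketch is that the workspace manifest, not the agent, decides where a command runs.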

**Dynamic repo provisioning** -- the daemon resolves repos at session start:
- If the repo is already cloned locally, use it
- If declared as a remote URL, clone to `~/.workrail/repos/<name>/` (same pattern as Common-Ground's `[[workflow_repos]]`)
- Workflow authors declare repo requirements; WorkRail ensures they're available

**Why this matters:** This is what Common-Ground's `make scan` does manually today -- finds repos, injects context. WorkRail's daemon does it dynamically, driven by workflow declarations. Any environment, any combination of repos, any org -- zero manual setup.
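The resolution order above can be captured as a small decision function. A sketch under stated assumptions: the `~/.workrail/repos/<name>` layout comes from the notes, while `RepoRequirement` and `planProvisioning` are illustrative names:

```typescript
// Hypothetical sketch of the daemon's provisioning decision at session start.
interface RepoRequirement {
  name: string;
  path?: string; // local checkout, if the author declared one
  url?: string;  // remote to clone when no local checkout exists
}

type Provisioning =
  | { action: "use-local"; dir: string }
  | { action: "clone"; url: string; dir: string };

function planProvisioning(
  req: RepoRequirement,
  localExists: (p: string) => boolean
): Provisioning {
  // Prefer an existing local clone.
  if (req.path && localExists(req.path)) {
    return { action: "use-local", dir: req.path };
  }
  // Otherwise clone the declared remote into the managed repos directory.
  if (req.url) {
    return { action: "clone", url: req.url, dir: `~/.workrail/repos/${req.name}` };
  }
  throw new Error(`repo "${req.name}": no usable local path and no remote URL`);
}
```

Keeping the decision pure (the `localExists` probe is injected) makes the provisioning logic trivially testable without touching the filesystem.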

**Cross-repo is the feature that makes WorkRail truly freestanding.** A developer anywhere can point WorkRail at their repos, declare a workspace manifest in their workflow, and get the same autonomous multi-repo execution that Mercury Mobile gets -- without Common-Ground, without Zillow infrastructure, without anything except WorkRail.

---

### Long-term vision: WorkTrain as a general engine, domain packs as configuration (Apr 15, 2026)

WorkTrain is not just a coding tool. The underlying engine -- session management, workflow enforcement, daemon, agent loop, knowledge graph, context bundle assembly -- is domain-agnostic. What makes it a "coding tool" today is entirely configuration: the workflows, the graph schema, the context bundle queries, the trigger definitions.

**Domain packs** are the abstraction that makes this general:

A domain pack is a self-contained configuration bundle that specializes WorkTrain for a specific problem domain:
- a set of workflows (the step structure and agent instructions for that domain)
- a knowledge graph schema (the node and edge types relevant to that domain)
- context bundle query definitions (what "give me everything relevant to X" means in that domain)
- trigger definitions (what events kick off work in that domain)
- a daemon soul template (default agent persona and principles for that domain)

**Examples of domain packs:**
- `worktrain-coding` -- software engineering (the current default)
- `worktrain-research` -- literature review, synthesis, citation tracking
- `worktrain-creative` -- narrative generation, continuity tracking, style enforcement
- `worktrain-ops` -- incident response, runbook execution, alert-to-action
- `worktrain-data` -- pipeline validation, schema monitoring, anomaly investigation

**The core engine is shared across all of them.** A domain pack author writes workflows, a graph schema, and context bundle queries -- they don't reimplement session management, token protocols, daemon loops, or the console.
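One way to picture the eventual contract is as a plain data interface over the five components listed above. None of these types exist in WorkTrain today -- the notes describe the concept as latent, so this is purely an illustrative shape:

```typescript
// Hypothetical domain pack contract: configuration only, no engine code.
interface DomainPack {
  name: string;                                 // e.g. "worktrain-coding"
  workflows: string[];                          // workflow definition files shipped by the pack
  graphSchema: {
    nodeTypes: string[];                        // node types relevant to the domain
    edgeTypes: string[];                        // edge types relevant to the domain
  };
  contextBundleQueries: Record<string, string>; // named "everything relevant to X" queries
  triggers: string[];                           // event kinds that kick off work
  soulTemplate: string;                         // default agent persona and principles
}

// A sketch of what the current default could look like once extracted.
const codingPack: DomainPack = {
  name: "worktrain-coding",
  workflows: ["coding-task-workflow.json"],
  graphSchema: {
    nodeTypes: ["repo", "file", "symbol"],
    edgeTypes: ["imports", "references"],
  },
  contextBundleQueries: { relevantToFile: "neighbors(file, depth=2)" },
  triggers: ["gitlab.mr.opened", "cron"],
  soulTemplate: "You are a careful senior engineer.",
};
```

Because a pack is pure configuration, adding a second domain means authoring data like this, not forking the engine.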

**Why this matters for WorkTrain's positioning:** most autonomous agent platforms are either too generic (the user has to build everything) or too specific (locked to one use case). Domain packs give WorkTrain a middle path: powerful enough to be opinionated about engineering workflows today, open enough to run any structured agentic domain tomorrow. New domains get the session durability, enforcement, observability, and knowledge graph for free.

**What to build first:** nothing new for now. The architecture already supports this -- the domain pack concept is latent in the current design. The right time to make it explicit is when a second domain (creative writing, ops, research) is ready to be added. At that point, extract the coding-specific pieces into `worktrain-coding` and establish the domain pack contract.

---

### Core architectural principle: WorkRail drives itself

**The daemon doesn't bypass WorkRail -- it IS WorkRail.**

The autonomous engine uses WorkRail's own MCP tools (`start_workflow`, `continue_workflow`) internally, from inside the same process. When running as an MCP server, Claude Code calls these tools over the wire. When running as a daemon, WorkRail calls them itself. The session engine, token protocol, step sequencer, and workflow registry are shared -- identical in both modes.

**The workflow is the interface between the two modes.** A workflow has no knowledge of whether it's being driven by a human through Claude Code or by WorkRail's autonomous daemon. Zero changes to existing workflows are required -- every workflow in the library today runs in autonomous mode tomorrow.

```
WorkRail Core (shared)
├── Session engine (durable store, HMAC token protocol, step sequencer)
├── Workflow registry (bundled + user + managed sources)
└── Console (DAG visualization, live session view)

WorkRail MCP Server (existing entry point)
└── Claude Code / Cursor / Firebender call start_workflow, continue_workflow externally

WorkRail Daemon (new entry point -- same core, different driver)
├── Trigger listener (webhooks, cron, CLI, REST)
├── Agent loop (pi-mono's agentLoop calling WorkRail's own MCP tools internally)
└── Tool execution (Bash, Read, Write -- same tools Claude Code uses)
```

**Why this matters:**
- No duplicate session logic, no duplicate workflow format, no duplicate enforcement
- WorkRail can autonomously improve itself -- the daemon runs `workflow-for-workflows` to author new workflows, which then run in both modes
- Users who start with the Claude Code MCP get autonomous mode for free -- same config, same workflows, second entry point
- The enforcement guarantee is identical: whether a human or the daemon is driving, the agent cannot skip steps

**The single-process model:** The daemon entry point is a new `src/daemon/` module that imports and calls the same handlers as the MCP server -- `executeStartWorkflow`, `executeContinueWorkflow` -- directly, without HTTP overhead. The session store, pinned workflow store, and all other ports are shared DI singletons. The MCP server and daemon can run simultaneously in the same process.
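The single-process model can be sketched with stub handlers. The handler names `executeStartWorkflow` / `executeContinueWorkflow` come from the notes; their real signatures are not shown there, so everything below is an assumed, simplified shape:

```typescript
// Minimal sketch: two entry points, one shared core. Stubs stand in for the
// real handlers and session store.
type Handler = (args: Record<string, unknown>) => { sessionId: string; step: number };

// Shared core: a single session store used by both entry points (a DI
// singleton in the real design).
const sessionStore = new Map<string, number>();

const executeStartWorkflow: Handler = (args) => {
  const sessionId = `s-${String(args.workflowId)}-${sessionStore.size + 1}`;
  sessionStore.set(sessionId, 0);
  return { sessionId, step: 0 };
};

// MCP entry point: Claude Code / Cursor call this over the wire.
function mcpCall(tool: string, args: Record<string, unknown>) {
  if (tool !== "start_workflow") throw new Error(`unknown tool: ${tool}`);
  return executeStartWorkflow(args);
}

// Daemon entry point: same handler, invoked in-process -- no HTTP hop.
function daemonStart(workflowId: string) {
  return executeStartWorkflow({ workflowId });
}
```

Whichever driver is used, sessions land in the same store, which is what makes the enforcement guarantee identical in both modes.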

---

### The four reference architectures: synthesis

**The vision:** WorkRail as the next evolution -- an open-source, freestanding autonomous agent platform with cryptographic workflow enforcement, durable sessions, full observability, and first-class Anthropic API integration.

| Source | Stars | What to take | What WorkRail already does better |
|--------|-------|-------------|----------------------------------|
| **OpenClaw** | 357k | ACP session store pattern, task flow chaining, policy system, spawn interface, freestanding daemon architecture | Durable disk sessions, cryptographic enforcement, checkpoint/resume tokens, DAG visualization |
| **Claude Code** (leaked) | - | Compaction hooks (inject WorkRail notes into session memory before compaction), session runner pattern for programmatic Claude API calls, coordinator/subagent model, `PreToolUse`/`PostToolUse` hooks for evidence collection | Everything -- WorkRail is the enforcement layer above Claude Code |
| **nexus-core** | 11 (internal) | Org profile system concept, skills-as-slash-commands UX, per-repo context injection, multi-model routing hints | Structural enforcement (nexus: advisory prompts; WorkRail: HMAC-gated tokens), cross-session durability, portability |
| **pi-mono** | 35k | `@mariozechner/pi-ai` unified multi-provider LLM API (OpenAI, Anthropic, Google, etc.), `agentLoop`/`agentLoopContinue` pattern, `ToolExecutionMode` (sequential/parallel), `BeforeToolCallResult`/`AfterToolCallResult` hooks, `EventStream<AgentEvent>` for streaming agent events, `mom` (Slack bot) as the simplest possible channel integration reference | N/A -- pi-mono is libraries, not a workflow engine |

**pi-mono specifically:** 35k stars, MIT, TypeScript monorepo by Mario Zechner (badlogic). The most architecturally clean of the four:
- `packages/ai` -- `streamSimple`, `complete`, `stream` over a unified `Model<TApi>` abstraction covering OpenAI, Anthropic, Google, Bedrock. This is WorkRail's LLM call layer for autonomous mode.
- `packages/agent` -- `agentLoop(prompts, context, config, signal?)` returns `EventStream<AgentEvent, AgentMessage[]>`. Clean separation: the loop manages tool calls and context; the caller manages state. `ToolExecutionMode`: "sequential" vs "parallel" tool execution. `BeforeToolCallResult` (can block a tool call with a reason) + `AfterToolCallResult` (can override tool result content). These are the hooks WorkRail needs to observe and gate tool calls.
- `packages/mom` -- Slack bot that runs an agent per channel, persists MEMORY.md per workspace, loads skills from a directory. The simplest reference for "daemon receives message → runs agent → responds." WorkRail's daemon follows this exact pattern.
- `packages/coding-agent` -- `SessionManager`, `AgentSession`, skill loading from a directory. Session/skill abstractions WorkRail's daemon needs.

**The synthesis -- what WorkRail becomes:**

```
WorkRail Autonomous Platform
├── Workflow Engine (existing -- keep as-is)
│   ├── Durable session store (append-only event log)
│   ├── HMAC token protocol (cryptographic enforcement)
│   ├── Workflow format (JSON, loops, conditionals, routines)
│   └── Console + DAG visualization

├── Daemon (new -- build from pi-mono + OpenClaw patterns)
│   ├── Trigger system (GitLab/GitHub webhooks, Jira, cron, CLI)
│   │   └── Pattern: OpenClaw's block/trigger architecture
│   ├── LLM call layer (pi-mono's pi-ai unified API)
│   ├── Agent loop (pi-mono's agentLoop/agentLoopContinue)
│   ├── Session management (OpenClaw's AcpSessionStore pattern)
│   ├── Task flow chaining (OpenClaw's task-flow-registry pattern)
│   └── Tool observation (Claude Code's PreToolUse hooks → evidence gating)

├── Context survival (new -- from Claude Code compaction research)
│   ├── WorkRail step notes injected into session memory pre-compaction
│   ├── Session notes survive context resets as structured memory
│   └── WorkRail session store = ground truth across all compactions

└── Integration layer (optional extensions)
    ├── Slack bot (pi-mono's mom pattern)
    ├── OpenClaw skill (optional, not a dependency)
    └── REST API / CLI triggers
```

**Build order for MVP:**
1. `pi-ai` integration -- WorkRail daemon calls Claude API directly via pi-mono's unified API
2. `agentLoop` wrapper -- WorkRail drives agent steps using pi-mono's loop, advancing its own session
3. Single trigger: GitLab MR webhook → `coding-task-workflow` → autonomous execution
4. Evidence collection: `BeforeToolCallResult` hook intercepts tool calls, WorkRail gates the continue token on required evidence
5. Console live view: active daemon sessions visible in the existing console
6. Task flow chaining: completed workflow A triggers workflow B

**What this surpasses:**
- nexus-core: autonomous (not human-initiated), durable, enforced, observable
- OpenClaw: workflow-enforced (not just skill-prompted), cryptographically gated, full audit trail
- ruflo/oh-my-claudecode: not a black box -- every step is visible, pausable, resumable
- Devin/GitHub Copilot Workspace: open source, self-hosted, works with any LLM, enforcement-first

---

### Claude Code source reference

The leaked Claude Code source is at `https://github.com/Archie818/Claude-Code` (also mirrored at `ai-tpstudio/claude-code-haha`). Key files to study before designing WorkRail's autonomous mode:

| File | What to learn |
|------|---------------|
| `src/commands/compact/compact.ts` | How compaction works: `trySessionMemoryCompaction` first, then `compactConversation`, then `microcompactMessages`. Session memory compaction is separate from conversation compaction -- two different mechanisms. Pre-compact hooks (`executePreCompactHooks`) run before compaction, giving WorkRail an integration point to inject its session notes before context is summarized. |
| `src/services/compact/sessionMemoryCompact.ts` | Session memory as a durable store that survives compaction -- this is the pattern WorkRail should adopt: inject WorkRail step notes into session memory so they survive context resets |
| `src/assistant/sessionHistory.ts` | Paginated session event log via API (`/v1/sessions/{id}/events`). WorkRail already has this pattern in its own session store -- the key insight is that Claude Code stores events server-side and fetches them page by page, not just in the context window |
| `src/commands/agents/agents.tsx` + `src/components/CoordinatorAgentStatus.tsx` | Subagent coordination model -- the coordinator agent dispatches to worker agents, each with their own tool permission context |
| `src/commands/hooks/hooks.tsx` | Hook system: `PreToolUse`, `PostToolUse`, `Stop` hooks. WorkRail can write these via `setup-hooks.sh` to observe agent actions and gate continue tokens on required evidence |
| `src/bridge/sessionRunner.ts` | How sessions are initiated and run programmatically -- key for WorkRail's autonomous daemon mode |
| `src/components/CompactSummary.tsx` | What survives compaction as a visible summary -- informs what WorkRail should inject into the summary to preserve workflow state |

### OpenClaw architecture deep-dive

**Repo:** `https://github.com/openclaw/openclaw` -- 357k stars, MIT, TypeScript, sponsored by OpenAI + GitHub + NVIDIA. The real one. Created Nov 2025, actively maintained Apr 2026.

**What OpenClaw is:** A personal AI assistant daemon ("the lobster way 🦞") that runs 24/7 on your machine, listens on 20+ messaging channels (WhatsApp, Telegram, Slack, Discord, iMessage, etc.), and executes tasks autonomously. It's the architecture blueprint for WorkRail's autonomous mode.

**Key architectural concepts:**

**ACP (Agent Control Protocol)** -- OpenClaw's core protocol for managing autonomous agent sessions:
- `src/acp/session.ts` -- `AcpSessionStore` with in-memory session management (up to 5,000 sessions, 24h idle TTL, LRU eviction). Clean interface: `createSession`, `setActiveRun`, `cancelActiveRun`, `clearActiveRun`. Uses `AbortController` for cancellation. **WorkRail already has a superior version of this** -- durable disk-persisted sessions vs OpenClaw's in-memory store.
- `src/acp/policy.ts` -- `AcpDispatchPolicyState` ("enabled" | "acp_disabled" | "dispatch_disabled"), per-agent allowlist via `cfg.acp.allowedAgents`. Clean policy separation. WorkRail should adopt the same `isXxxEnabledByPolicy(cfg)` pattern for its daemon config.
- `src/acp/control-plane/` -- `manager.ts` (session lifecycle), `spawn.ts` (session creation), `session-actor-queue.ts` (serialized per-session message processing), `runtime-cache.ts` (in-flight session cache)
- `src/agents/acp-spawn.ts` -- `SpawnAcpParams` (`task`, `label`, `agentId`, `resumeSessionId`, `cwd`, `mode`, `thread`, `sandbox`, `streamTo`). This is the entry point for spawning an autonomous agent session. Key insight: `resumeSessionId` enables resuming a previous session -- WorkRail's checkpoint token is the superior version of this.

**Task system** (`src/tasks/`) -- Full task registry with SQLite persistence:
- `task-registry.store.sqlite.ts` -- SQLite-backed task store (vs WorkRail's append-only event log)
- `task-executor.ts` -- `createRunningTaskRun`, `TaskRuntime` ("acp" | "subagent"), `TaskScopeKind` ("session"), `TaskFlowRecord` for chained task flows
- `task-flow-registry.ts` -- Task flow registry for chaining workflows -- `createTaskFlowForTask`, `linkTaskToFlowById`. This is the workflow chaining primitive WorkRail needs for its autonomous mode.
- `TaskNotifyPolicy`, `TaskDeliveryStatus`, `TaskTerminalOutcome` -- a clean typed state machine for the task lifecycle

**Channel system** (`src/channels/`) + **Skills** (`skills/`) -- 50+ integrations as installable skills:
- Each skill is a `SKILL.md` declaring what the skill does and how the agent should use it
- Channels (WhatsApp, Telegram, Slack, etc.) are separate extensions in `extensions/`
- For WorkRail: the `skills/github/`, `skills/slack/`, `skills/taskflow/`, `skills/session-logs/` skills are directly relevant

**What WorkRail should take from OpenClaw (architectural patterns, not code):**

1. **`session-actor-queue.ts` pattern** -- serialize messages per session to prevent concurrent modification. WorkRail's gate/lock system already does this, but the OpenClaw pattern is simpler for the daemon use case.

2. **`SpawnAcpParams` interface** -- the minimal interface for spawning an autonomous task. WorkRail's equivalent: `{ workflowId, goal, context, triggerSource, resumeCheckpointToken? }`.

3. **Task flow chaining** -- `createTaskFlowForTask` + `linkTaskToFlowById` is the pattern for chaining workflows. WorkRail's version: the final step of Workflow A produces a `{kind: "wr.chain", workflowId, context}` artifact that the daemon picks up and starts Workflow B with.

4. **Policy system** -- the `isAcpEnabledByPolicy(cfg)` pattern for feature flags and agent allowlists in daemon config. WorkRail daemon config should follow this.

5. **`TaskRuntime` enum** -- distinguishing "acp" (full autonomous session) vs "subagent" (delegated sub-task). WorkRail has the same distinction in its workflow format; the daemon should surface it the same way.
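The chaining idea in item 3 can be sketched as a daemon-side scan over a finished run's artifacts. The `wr.chain` kind comes from the notes; `Artifact`, `ChainRequest`, and `chainRequests` are hypothetical names for illustration:

```typescript
// Sketch: collect chain requests from the artifacts a completed workflow
// emitted, so the daemon can start the next workflow with carried context.
interface Artifact {
  kind: string;
  workflowId?: string;
  context?: Record<string, unknown>;
}

interface ChainRequest {
  workflowId: string;
  context: Record<string, unknown>;
}

function chainRequests(artifacts: Artifact[]): ChainRequest[] {
  return artifacts
    .filter((a) => a.kind === "wr.chain" && typeof a.workflowId === "string")
    .map((a) => ({ workflowId: a.workflowId as string, context: a.context ?? {} }));
}
```

Making chaining data-driven (an artifact, not a direct call) keeps Workflow A ignorant of Workflow B, which matches the "the workflow is the interface" principle earlier in these notes.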
241
+
242
+ **What WorkRail does BETTER than OpenClaw:**
243
+ - Durable disk-persisted sessions (OpenClaw: in-memory, 24h TTL)
244
+ - Cryptographic step enforcement (OpenClaw: none -- tasks can be abandoned or skipped)
245
+ - Full execution trace + DAG visualization (OpenClaw: none)
246
+ - Checkpoint/resume with signed portable tokens (OpenClaw: `resumeSessionId` but no cryptographic binding)
247
+ - Workflow composition with loops, conditionals, typed context (OpenClaw: free-form task strings)
248
+
249
+ **The integration play (optional, not a dependency):** OpenClaw's channel system is the input layer; WorkRail's workflow engine is the execution layer. A WorkRail skill for OpenClaw could be: "when you receive a task that matches a WorkRail workflow, dispatch it to the WorkRail daemon and report back results." OpenClaw handles the messaging; WorkRail handles the enforcement.
250
+
251
+ **However: WorkRail should be freestanding.** The autonomous daemon must work completely independently -- no OpenClaw required. Triggers come from webhooks (GitLab, Jira, GitHub), cron schedules, CLI invocations, and the console UI. The OpenClaw integration is an optional add-on for users who want channel-based interaction (Slack, Telegram, etc.), not a prerequisite. WorkRail's value proposition is enforcement + durability + observability; those are fully available without OpenClaw. Build the daemon first as a self-contained system; consider an OpenClaw skill as a future distribution channel, not a core dependency.
252
+
253
+ **Key compaction insight for WorkRail:** Claude Code has three compaction tiers: (1) session memory compaction (preferred, uses durable server-side memory), (2) full conversation compaction (summarize everything into one message), (3) microcompaction (emergency, minimal). WorkRail's step notes should be injected into tier 1 (session memory) so they survive all three tiers. The `preCompactHooks` integration point is where WorkRail can do this injection.
254
+
255
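
The tier-1 injection could look roughly like this (a TypeScript sketch; the hook and `CompactionContext` shapes are assumptions about the `preCompactHooks` integration point, not its actual signature):

```typescript
// A durable WorkRail step note to be preserved across compaction.
type StepNote = { stepId: string; note: string };

// Assumed shape of what a pre-compact hook receives: the tier-1 session
// memory entries that survive all three compaction tiers.
interface CompactionContext {
  sessionMemory: string[];
}

// Build a hook that folds WorkRail's step notes into session memory before
// any compaction tier runs. De-duplicated so repeated compactions are
// idempotent.
function makeWorkrailPreCompactHook(loadStepNotes: () => StepNote[]) {
  return (ctx: CompactionContext): CompactionContext => {
    const notes = loadStepNotes().map((n) => `[workrail:${n.stepId}] ${n.note}`);
    const merged = new Set([...ctx.sessionMemory, ...notes]);
    return { ...ctx, sessionMemory: [...merged] };
  };
}
```

Because the notes land in tier-1 memory, they survive even a full conversation compaction or microcompaction.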
### Competitive landscape: autonomous agent platforms

| Project | Stars | What it is | WorkRail's advantage |
|---------|-------|------------|---------------------|
| **ruflo** (ruvnet/ruflo) | 31.8k | "Leading agent orchestration platform for Claude" -- multi-agent swarms, RAG, distributed intelligence | No workflow enforcement -- agents can drift or skip. No session durability. WorkRail's token protocol means steps can't be skipped even in long autonomous runs |
| **oh-my-claudecode** (Yeachan-Heo) | 28.8k | Teams-first multi-agent orchestration for Claude Code | Orchestration without enforcement. No auditability. WorkRail has a full session history and DAG visualization |
| **AionUi / OpenClaw** (iOfficeAI) | 21.8k | 24/7 cowork app supporting multiple CLI agents (Claude Code, Gemini CLI, Codex, etc.) | Interface/UI layer -- not a workflow engine. No step enforcement or session state |
| **OpenClaw core** (clawdkit) | ~1 | Language-agnostic autonomous agent runtime | Very early / minimal. No workflow composition, no enforcement, no console |
| **nexus-core** (Peter Yao, internal) | 11 (internal) | Full-lifecycle AI dev workflow for Zillow engineers | No autonomous mode (human-initiated only). No session durability. No cryptographic enforcement |

**The gap WorkRail fills:** Every existing autonomous agent platform is a black box -- you can't see what the agent did, you can't enforce that it followed a process, and you can't resume a session that was interrupted. WorkRail's autonomous mode would be the first open-source platform that combines:
1. Autonomous execution (daemon, triggers, API calls)
2. Cryptographic step enforcement (cannot skip)
3. Full session observability (DAG, execution trace)
4. Durable cross-session state (survives restarts, compaction)
5. Human-in-the-loop control plane (console approvals, pause/resume)

### Workflow chaining + compaction design sketch

When WorkRail chains workflows autonomously:
1. Workflow A completes -- final step output becomes context for Workflow B
2. Before starting Workflow B, WorkRail injects relevant step notes from Workflow A's session into Claude's session memory (via pre-compact hook or explicit system prompt injection)
3. If context compacts during Workflow B, the session memory contains WorkRail's structured notes -- nothing important is lost
4. WorkRail's own session store has the complete history regardless of what happens to Claude's context window -- it's the ground truth

This means WorkRail's session store is not just a log -- it's the **memory that survives compaction**. Every piece of information in a step note is recoverable even if Claude's context window is completely reset.

### Subagent design sketch

WorkRail autonomous sessions can spawn subagents for parallel work:
- Coordinator session holds the main workflow state and continue token
- Subagent sessions each run a delegated routine (already supported in WorkRail's workflow format via `mcp__nested-subagent__Task`)
- In autonomous mode, subagents are separate Claude API calls managed by WorkRail's daemon
- Each subagent reports back to the coordinator via WorkRail's session store, not via in-context communication
- This is more robust than nexus-core's Opus/Sonnet/Haiku orchestration pattern, which depends on context not degrading across the delegation boundary

---

## Workflow ideas

### Standup Status Generator

- **Status**: idea
- **Summary**: A workflow that automatically generates a daily standup status by aggregating activity across the user's tools since the last standup.
- **Data sources** (adaptive based on what the user has available):
  - Git history (commits, branches, PRs/MRs)
  - GitLab (merge requests, comments, reviews)
  - Jira (ticket transitions, comments, new assignments)
  - Other issue trackers or project management tools the user configures
- **Key behavior**:
  - Detect the last standup date (stored in session or inferred from history)
  - Aggregate activity since that date across all configured sources
  - Categorize into "what I did", "what I'm doing today", and "blockers"
  - Generate a concise, human-readable standup message
- **Design considerations**:
  - Should be tool-agnostic: detect available integrations and adapt
  - Could leverage MCP tool discovery to find available data sources at runtime
  - Needs a lightweight persistence mechanism for the last-standup timestamp
  - Output format should be configurable (Slack message, plain text, structured JSON)

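
The aggregate-and-categorize behavior could be sketched as follows (the data-source shapes and three-section format are assumptions):

```typescript
// One normalized activity item pulled from any configured source.
interface ActivityItem {
  source: "git" | "gitlab" | "jira";
  at: Date;
  summary: string;
  kind: "done" | "in_progress" | "blocker";
}

// Filter to activity since the last standup, then group into the three
// standup sections; empty sections are omitted.
function buildStandup(items: ActivityItem[], since: Date): string {
  const recent = items.filter((i) => i.at >= since);
  const section = (kind: ActivityItem["kind"], title: string): string => {
    const lines = recent
      .filter((i) => i.kind === kind)
      .map((i) => `- ${i.summary} (${i.source})`);
    return lines.length ? `${title}\n${lines.join("\n")}` : "";
  };
  return [
    section("done", "What I did:"),
    section("in_progress", "What I'm doing today:"),
    section("blocker", "Blockers:"),
  ].filter(Boolean).join("\n\n");
}
```
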
## Feature ideas

### Console interactivity and liveliness

- **Status**: idea
- **Summary**: Make the console feel more alive and interactive -- currently it is largely a static visualization layer. Key areas: DAG node hover effects, micro-animations, click-to-inspect affordances, and overall responsiveness to user input.
- **Concrete starting points**:
  - **DAG node hover effects** -- nodes in `RunLineageDag` should have visible hover states: border brightens, subtle background glow, cursor changes to pointer. Currently nodes are clickable but give no visual feedback until clicked. This is the single highest-impact item.
  - **Node selection highlight** -- the selected node should pulse or glow in a way that draws the eye, rather than just a static border change.
  - **Transition animations** -- when the node detail panel slides in, the selected node in the DAG should subtly indicate the connection (e.g. a brief highlight flash).
  - **Live session pulse** -- sessions with `status: in_progress` could have a subtle periodic animation (not just a static badge) to reinforce that something is actively running.
  - **Tooltip polish** -- the current tooltip (delayed 300ms, no animation) could fade in/out rather than appearing instantly.
- **Design constraint**: the console already has a strong aesthetic (dark navy, amber accent, cyberpunk-adjacent). Interactivity additions should reinforce this language, not contradict it. See `docs/design/console-cyberpunk-ui-discovery.md` for the ranked visual language list.
- **Where to start**: DAG node hover is in `console/src/components/RunLineageDag.tsx`. ReactFlow nodes use custom node type components -- hover state can be managed via React state or CSS. The tooltip pattern (`handleNodeMouseEnter`/`handleNodeMouseLeave`) already exists; a hover glow is a natural peer addition.
- **Related**: `docs/design/console-cyberpunk-ui-discovery.md` (ranked list of visual polish items), `docs/design/console-ui-backlog.md`

### Autonomous background agent platform ⭐ HIGH PRIORITY

- **Status**: idea -- high priority, not yet designed
- **Summary**: Transform WorkRail from an MCP server that responds to agent calls into a persistent background daemon that initiates workflows autonomously, integrates with external systems (Jira, GitLab, Slack), and uses the console as a control plane rather than a passive visualization tool.
- **The shift**: today WorkRail waits for an agent to call it. In this model, WorkRail *initiates* -- it listens for triggers, calls the Claude API directly, manages conversations, advances its own sessions, and surfaces results through the console. Humans interact via the console or via external system integrations, not necessarily via an AI coding session.
- **Core capabilities**:
  - **Triggers** -- Jira webhook when a ticket moves to "In Progress," GitLab webhook when an MR is opened, cron schedule, Slack message, manual console dispatch. WorkRail selects the right workflow and starts a session automatically.
  - **Autonomous execution** -- WorkRail spawns a Claude API session (not Claude Code -- direct Anthropic API), drives the workflow step by step, collects tool call results, and advances without a human in the loop unless a step requires approval.
  - **Integration layer** -- first-class tools for Jira (read ticket, post comment, transition status), GitLab (read MR, post review comment, approve/request changes), Slack (send message, read channel), PagerDuty (acknowledge alert). These are just tools workflows can call.
  - **Console as mission control** -- live running sessions visible in the console, not just history. Pause a session, inject context, approve a step, redirect. Think Temporal's UI but for AI workflows.
  - **Evidence collection** -- hooks into Claude Code's `PreToolUse`/`PostToolUse` events to observe what the agent actually did, not just what it reported. Required evidence is declared in workflow steps; the token is gated on observed evidence, not agent claims.
- **Why WorkRail's existing architecture already points here**:
  - Durable session store is append-only -- exactly right for long-running background jobs
  - Token protocol handles resumption -- a background job that gets interrupted can resume via checkpoint token
  - DAG console already visualizes session state -- one step from making it live
  - Workflow composition (templateCall, routines, loops) already supports complex orchestration
- **Concrete first use cases** (Zillow/Mercury Mobile):
  - Auto-review every incoming MR using `mr-review-workflow` -- post findings as a GitLab comment
  - Auto-triage new Jira tickets assigned to Mercury Mobile -- classify, estimate, link to related work
  - Daily async standup summary -- aggregate team activity, post to Slack channel
  - Auto-run `goals-update-workflow` before every 1:1 based on a calendar trigger
- **What's genuinely hard**:
  - The MCP transport assumption breaks -- WorkRail needs to *initiate* Claude API calls, not wait for them
  - Credential management -- a background process needs a Claude API key, Jira token, GitLab token; the secrets model needs design
  - Concurrency and resource limits -- multiple simultaneous autonomous sessions need guardrails
  - Human-in-the-loop design -- some steps should pause and wait for human approval before proceeding
- **Why this surpasses nexus-core**:
  - nexus-core is fundamentally human-initiated -- you run `/flow`, and it works because you're there. It cannot run autonomously while you sleep. It's a plugin, not a daemon.
  - WorkRail's durable session model is already designed for this. nexus-core would need a full architectural rewrite.
- **Why this is differentiated in the broader market**:
  - Devin, GitHub Copilot Workspace, etc. are autonomous coding agents but are black boxes -- no enforcement, no auditability, no human control plane
  - WorkRail's autonomous mode retains cryptographic step enforcement and full session observability -- you can see exactly what it did and why, pause it, resume it, roll back to a checkpoint
- **Design questions**:
  - Should the daemon run as a separate process from the MCP server, or share the same process with different entry points?
  - How does the console authenticate to the daemon for live session control?
  - What is the minimal trigger/integration surface for v1 -- just GitLab MR webhooks + Jira ticket webhooks?
  - How do we handle workflows that require human approval mid-step in an otherwise autonomous session?
  - Should WorkRail ship integration adapters, or define an integration contract that external adapters implement?
- **Related**:
  - `docs/design/console-ui-backlog.md` -- console evolution
  - `docs/roadmap/open-work-inventory.md` -- platform vision
  - Discovery notes: `~/git/zillow/etienne-2026-goals/goals/2026/discovery-notes-apr-2026.md`

---

### Forever backward compatibility via engine version declaration

- **Status**: high importance, not yet properly thought through -- the solution sketched here is tentative and needs real design work before implementation
- **Summary**: Every workflow declares the WorkRail engine version it was written against (`workrailVersion: "1.4.0"`). The engine maintains compatibility adapters for all previously declared versions -- old workflows run forever without author intervention. The engine adapts; authors never migrate. **This is one rough idea; the right solution may look completely different after proper design.**
- **Design direction**:
  - Add `workrailVersion` as a top-level required field in `workflow.schema.json`. Validated at load time; workflows without it default to `"1.0.0"` (the oldest supported version).
  - The engine has a `WorkflowVersionAdapter` layer that normalizes old workflow shapes into the current internal representation before execution. Branching paths in the compiler/executor handle version-specific semantics.
  - New fields are always additive and optional with sensible defaults -- never remove a field, only deprecate with a redirect.
  - When a workflow is loaded, the engine resolves its declared version and selects the appropriate normalization path. `workrailVersion` is recorded in `run_started` events for diagnostic traceability.
  - The validation pipeline (`npm run validate:registry`) runs all bundled workflows through all adapters in CI to catch regressions before release.
- **The web model**: this is how browsers handle HTML from 1995. A `<marquee>` tag still renders because the browser adapts, not because the author rewrote their page. WorkRail should make the same guarantee to workflow authors.
- **Engineering implication**: this is a permanent commitment. Once a version adapter is shipped, it cannot be removed. The tradeoff is real, but the alternative (expecting external authors to track WorkRail releases and migrate) breaks the platform trust model.
- **What this does NOT mean**:
  - It does not mean upgrading is pointless -- authors still benefit, since newer versions get access to new primitives (assessment gates, loop control, references, etc.)
  - The engine only adapts the schema and execution semantics, not the runtime environment (MCP tools, context variables, file system)
  - "Forever" means "as long as WorkRail is maintained" -- version adapters would only be removed with a major breaking release and an explicit migration announcement
- **Implementation sketch**:
  - Phase 1: Add the `workrailVersion` field to the schema. Default to `"1.0.0"` for existing workflows. Record it in run events.
  - Phase 2: Introduce the first adapter when the first schema-breaking change is needed. The adapter normalizes the old shape to the current internal representation.
  - Phase 3: Build a compatibility test harness that runs representative old-version workflows against the current engine in CI.
- **Related**:
  - `docs/design/v2-core-design-locks.md` -- existing invariants (must not conflict)
  - `docs/reference/workflow-execution-contract.md` -- execution contract
  - `src/v2/read-only/v1-to-v2-shim.ts` -- existing precedent for version adaptation

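
The version-adapter selection could be sketched as follows (a minimal sketch assuming `"1.4.0"` is the current version; the adapter registry shape and `normalizeWorkflow` are assumptions, not the shipped `v1-to-v2-shim`):

```typescript
interface WorkflowDoc {
  workrailVersion?: string;
  [k: string]: unknown;
}

type Adapter = (w: WorkflowDoc) => WorkflowDoc;

// Adapters keyed by the declared version they upgrade FROM; identity for the
// current version. A shipped adapter is never removed.
const adapters: Record<string, Adapter> = {
  "1.0.0": (w) => ({ ...w, workrailVersion: "1.4.0" }), // illustrative no-op upgrade
  "1.4.0": (w) => w,
};

function normalizeWorkflow(w: WorkflowDoc): WorkflowDoc {
  // Workflows without a declaration default to the oldest supported version.
  const declared = w.workrailVersion ?? "1.0.0";
  const adapt = adapters[declared];
  if (!adapt) throw new Error(`Unsupported workrailVersion: ${declared}`);
  return adapt({ ...w, workrailVersion: declared });
}
```

A real adapter chain would likely compose upgrades stepwise (1.0 → 1.1 → … → current) rather than jumping directly, but the load-time selection point is the same.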

---

### Remote references (URLs, GDocs, Confluence, etc.)

- **Status**: idea
- **Summary**: Extend the workflow `references` system to support remote sources (HTTP URLs, Google Docs, Confluence pages, etc.) in addition to local file paths. WorkRail remains a pointer system — it resolves and delivers reference metadata, and the agent does the actual fetching using whatever tools it has available. Auth is entirely delegated to the agent.
- **Core design principle**: same model as today, extended to remote sources. WorkRail validates that a reference declaration is well-formed, delivers the pointer to the agent at workflow start, and the agent fetches the content with its own HTTP or integration tools. If the agent lacks access, it surfaces that to the user — which is the right failure mode. WorkRail does not need to store credentials or act as a fetch proxy.
- **Why this matters**: teams keep their authoritative docs (architecture decisions, coding standards, runbooks, API contracts) in external systems. Remote refs let workflows point at those docs directly without requiring anyone to maintain a local copy.
- **Incremental path**:
  - Phase 1: public HTTP URLs. `resolveFrom: "url"`. WorkRail delivers the URL as a reference pointer. Agent fetches using HTTP tools. No auth surface in WorkRail.
  - Phase 2: workspace-configured bearer tokens in `.workrail/config.json` keyed by domain. Covers most internal tools (Confluence API tokens, private wikis, etc.) without native integrations.
  - Phase 3: named integrations (GDocs, Confluence, Notion) as first-class configured sources — the full platform play, only if Phases 1/2 prove insufficient.
- **Reachability validation**: soft check or skippable at start time. A URL being reachable during validation doesn't guarantee the agent can authenticate at runtime, and a failed ping shouldn't block the workflow from starting.
- **Design questions**:
  - Should WorkRail attempt a reachability check at start time, or skip it entirely for remote refs?
  - How should remote refs appear in `workflowHash`? The declaration is stable but the content is not — this may need content-hashing at fetch time or explicitly versioned URLs for determinism.
  - Should the `references` schema add a `kind` field (`local` vs `remote`) or infer it from the `source` value?
- **Risks / tradeoffs**:
  - Agent-side fetching means the workflow only works if the agent has appropriate tools — an acceptable tradeoff, explicitly the user's responsibility
  - Remote content can change between runs, weakening the determinism guarantee that local refs provide

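
A phase-1 remote reference declaration might look like this (a hedged sketch: `resolveFrom` and the proposed `kind` field come from the notes above, but the other field names are illustrative and the URL is a placeholder):

```json
{
  "references": [
    {
      "id": "api-contract",
      "kind": "remote",
      "resolveFrom": "url",
      "source": "https://wiki.example.com/teams/mobile/api-contract",
      "description": "Authoritative API contract; the agent fetches this with its own HTTP tools"
    }
  ]
}
```

WorkRail would validate the declaration shape and hand the pointer to the agent at workflow start; fetching and auth stay on the agent side.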

### Declarative workflow composition engine

- **Status**: idea
- **Summary**: Instead of authoring full workflow JSON for every use case, users or agents fill out a declarative spec (dimensions, scope, rigor level, etc.) and the WorkRail engine assembles a workflow automatically from a library of pre-validated routines and step templates. The agent is a form-filler, not an architect -- the composition logic lives in the engine.
- **Why this is different from agent-generated workflows**:
  - Agent-generated workflows have no quality gate -- you're trusting the agent's judgment on structure, which is exactly what workflow-for-workflows exists to prevent
  - Engine-composed workflows are assembled from pre-reviewed building blocks using deterministic rules -- the same spec always produces the same workflow shape
  - Trustworthy because composition logic is owned by WorkRail, not improvised at runtime
- **How it would work**:
  - A composable routine library with well-defined inputs, outputs, and composition contracts
  - A spec format that captures user intent declaratively (e.g. workflow type, dimensions to cover, scope, rigor mode)
  - A composition engine that selects and wires the right routines based on the spec
  - The assembled workflow is fully inspectable before execution -- no black box
- **Relationship to current authoring**:
  - Full workflow JSON authoring remains the escape hatch for workflows that need custom shapes the composition engine can't express
  - Composition covers the common cases; manual authoring covers the edge cases
  - Routines built for composition also remain usable as standalone delegatable units in manually authored workflows
- **Good early use cases**:
  - Audit-style workflows (scalability audit, readiness audit, tech debt audit) -- user picks dimensions, engine assembles the right auditor steps
  - Review workflows -- user picks scope and rigor, engine assembles reviewer family + synthesis
  - Investigation workflows -- user picks investigation type, engine assembles the right hypothesis + evidence + validation path
- **Design questions**:
  - What is the right spec format? Enums + variables + a workflow type identifier? A richer DSL?
  - How does the engine handle dependencies between composed steps (context flow, artifact ownership)?
  - Should composition happen at session-start time (assembled once, then executed) or be fully static (compiled to workflow JSON)?
  - How does the console/dashboard show a composed workflow's structure vs a manually authored one?
  - What is the governance model for the composable routine library -- who can add to it, and what quality bar do new routines need to meet?
- **Risks / tradeoffs**:
  - A composition engine is a significant investment -- the routine library needs enough coverage before composition is useful
  - Composition rules can become their own form of complexity if not kept simple
  - Need a clear story for when manual authoring is the right choice vs composition, so authors don't fight the system

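
A deterministic composition engine could be as simple as a pure function of the spec (a sketch; the spec fields and routine naming scheme are assumptions, not an existing WorkRail format):

```typescript
// Declarative spec the user or agent fills out.
interface CompositionSpec {
  workflowType: "audit" | "review" | "investigation";
  dimensions: string[];               // e.g. ["scalability", "tech-debt"]
  rigor: "light" | "standard" | "deep";
}

interface Step {
  id: string;
  routine: string;                    // name of a pre-validated routine
}

// Pure function of the spec: the same spec always produces the same workflow
// shape. Dimensions are sorted so input order cannot change the output.
function composeWorkflow(spec: CompositionSpec): Step[] {
  const steps: Step[] = [{ id: "scope", routine: `${spec.workflowType}-scoping` }];
  for (const d of spec.dimensions.slice().sort()) {
    steps.push({ id: `audit-${d}`, routine: `${spec.workflowType}-${d}-auditor` });
  }
  steps.push({ id: "synthesis", routine: `synthesis-${spec.rigor}` });
  return steps;
}
```

Because assembly is deterministic and the output is plain step data, the composed workflow stays fully inspectable before execution.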

### Dashboard artifacts (replace file-based docs)

- **Status**: designed, not yet implemented
- **Summary**: Instead of having agents write markdown files into the working repo, agents would submit structured artifacts through `continue_workflow` output payloads. Artifacts are stored per-session and rendered in the console/dashboard. Eliminates repo pollution and gives users a single place to see all workflow outputs.
- **Key dependencies**: console/dashboard UI (does not exist yet), server-side artifact storage
- **Design doc**: `docs/reference/workflow-execution-contract.md` (section "Replacing File-Based Docs with Dashboard Artifacts")

### Derived / overlay workflows for bundled workflow specialization

- **Status**: parked idea
- **Note**: see `docs/roadmap/open-work-inventory.md` for details

### Workflow categories and category-first discovery

- **Status**: idea
- **Summary**: Improve workflow discovery by organizing bundled workflows into categories and teaching `list_workflows` to support a category-first exploration path instead of always returning one large flat list.
- **Why this seems useful**:
  - the workflow catalog is getting large enough that flat discovery is becoming noisy
  - agents often do not know the exact workflow ID they want, but they may know the task family (coding, review, docs, investigation, planning, learning)
  - category-first discovery could reduce prompt overload and make workflow selection feel more guided
- **Possible phase 1 shape**:
  - add workflow categories as metadata on workflow definitions or a registry-side mapping
  - extend `list_workflows` with an optional category-style input
  - if no category is passed, return:
    - category names
    - workflow count per category
    - a few representative workflow titles per category
    - guidance telling the agent to call `list_workflows` again with the category it wants
  - if a category is passed, return the full workflows for that category with names, descriptions, IDs, and hashes
- **Possible phase 2 shape**:
  - support multiple discovery views such as grouped-by-category, grouped-by-source, or full flat list
  - add filtering by category + source + maybe keywords
  - align category discovery with future platform / multi-root discovery work
- **Design questions**:
  - should categories live in workflow JSON, in a registry overlay, or be inferred from directory / naming conventions?
  - should `list_workflows` become polymorphic, or should category discovery be a separate read-only tool / mode?
  - how much summary content should the uncategorized response include before it becomes too verbose again?
  - how do categories interact with routines, examples, project workflows, and external workflow repositories?
- **Risks / tradeoffs**:
  - changing `list_workflows` is a real tool contract and output-schema change, not just a UI tweak
  - overloading one tool with too many discovery modes could make the contract less predictable
  - static categories can drift unless there is a clear ownership model
- **Related docs / context**:
  - `docs/plans/workrail-platform-vision.md` (already discusses grouped discovery by source)
  - `docs/roadmap/open-work-inventory.md` (legacy workflow modernization increases the need for better discovery)
  - current implementation: `src/mcp/handlers/v2-workflow.ts`, `src/mcp/v2/tools.ts`, `src/mcp/output-schemas.ts`

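
The phase-1 two-mode response could be sketched as follows (the types and grouping logic are illustrative, not the current `list_workflows` contract):

```typescript
interface WorkflowSummary {
  id: string;
  name: string;
  category: string;
}

// Overview row returned when no category is passed: name, count, and a few
// representative titles, so the agent can pick a category to drill into.
interface CategoryOverview {
  category: string;
  count: number;
  examples: string[];
}

function listByCategory(
  all: WorkflowSummary[],
  category?: string
): CategoryOverview[] | WorkflowSummary[] {
  if (category) {
    // Category passed: return the full workflows for that category.
    return all.filter((w) => w.category === category);
  }
  // No category: group and summarize.
  const grouped = new Map<string, WorkflowSummary[]>();
  for (const w of all) {
    grouped.set(w.category, [...(grouped.get(w.category) ?? []), w]);
  }
  return [...grouped.entries()].map(([cat, ws]) => ({
    category: cat,
    count: ws.length,
    examples: ws.slice(0, 3).map((w) => w.name),
  }));
}
```

Whether this lives inside `list_workflows` or as a separate discovery tool is exactly the polymorphism question raised above.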

### Multi-root workflow discovery and setup UX

- **Status**: designing
- **Summary**: Simplify third-party and team workflow hookup by requiring explicit `workspacePath`, silently remembering repo roots in user-level `~/.workrail/config.json`, recursively discovering team/module `.workrail/workflows/` folders under remembered roots, and improving grouped source visibility / precedence explanations. Use workspace-aware ranking, cross-repo surfacing, and later console integration as the control plane for inspecting remembered roots, discovered workflow sources, and precedence. For remote repositories, prefer **managed sync by default** so users experience remote workflow repos as connected and kept current while WorkRail still reasons over a local effective state. Avoid trusting MCP roots and avoid requiring workflow config to live at the main repo root.
- **Current recommendation**:
  - phase 1: `Rooted Team Sharing + minimal Source Control Tower`
    - require explicit workspace identity
    - silently persist repo roots at the user level
    - support cross-repo workflows from remembered roots
    - make remote repos default to managed-sync mode rather than pinned snapshots or live-remote behavior
    - treat Slack/chat/file/zip sharing as an ingestion path that classifies into repo, file, pack, or snippet flows
    - design the backend so the console can eventually manage and explain the remembered/discovered source model
- **Additional idea**:
  - explore enterprise auth / SSO integration for private repo access, such as Okta-backed flows for GitHub Enterprise, GitLab, or other self-hosted providers
  - likely shape: WorkRail detects that a private repo uses org-managed auth and guides the user through the right browser/device-code/credential flow instead of assuming raw personal-access-token setup
  - main question: should WorkRail integrate directly with identity providers like Okta, or should it integrate one layer lower with Git hosts / credential helpers that are already SSO-aware?
- **Design doc**: `docs/ideas/third-party-workflow-setup-design-thinking.md`

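
The user-level config described above might take a shape like this (a hedged sketch: the filename comes from the notes, but every field name, path, and URL here is illustrative):

```json
{
  "rememberedRoots": [
    { "path": "/home/user/git/team-platform", "addedAt": "2026-04-01T00:00:00Z" }
  ],
  "remoteRepos": [
    {
      "url": "https://gitlab.example.com/team/workflows.git",
      "mode": "managed-sync"
    }
  ]
}
```

Roots are appended silently as workspaces are used; remote repos default to `managed-sync` so WorkRail reasons over a locally synced effective state.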

### Workflow rewind / re-scope support

- **Status**: idea
- **Summary**: Allow an in-progress workflow session to go back to an earlier point when new information changes scope understanding, invalidates assumptions, or reveals that the current execution path is wrong.
- **Why this seems useful**:
  - agents and users often learn important scope information only after work has already started
  - current step-by-step enforcement is strong, but it can feel rigid if the original framing turns out to be wrong
  - a first-class rewind / re-scope mechanism could make workflows feel safer and more adaptable without abandoning structure
- **Possible phase 1 shape**:
  - allow rewind to a prior checkpoint or earlier decision node with an explicit reason
  - record a short “why we rewound” note in session history
  - make the resumed path visible in the console/session timeline
- **Possible phase 2 shape**:
  - support scope-change prompts like:
    - “our understanding changed”
    - “the task is broader/narrower than we thought”
    - “we need to revisit planning before implementation”
  - let workflows declare safe rewind points or re-scope checkpoints explicitly
  - support branch-aware comparison between abandoned and current paths
- **Design questions**:
  - should rewind be limited to explicit checkpoints, or should WorkRail support arbitrary node-level rewind?
  - how should the system preserve durable notes and outputs from abandoned paths?
  - should some workflow steps be marked as non-rewindable once external side effects have happened?
  - how should the agent explain to the user what changed and why a rewind is appropriate?
- **Risks / tradeoffs**:
  - rewind power could make workflows feel less deterministic if used too casually
  - durable session history gets more complex when abandoned paths and resumed paths coexist
  - workflows with real-world side effects may need stricter rollback / compensation rules


### Assessment-gate follow-up tiers beyond v1

- **Status**: idea
- **Summary**: Capture the likely progression of assessment-triggered redo / follow-up behavior so the engine can grow beyond the narrow v1 same-step follow-up model without losing the conceptual roadmap.
- **Why this seems useful**:
  - assessment-triggered follow-up is likely to want richer behavior over time
  - the v1 consequence model is intentionally narrow, but the design pressure already points toward stronger redo semantics
  - writing the tiers down now reduces the chance that future work jumps straight to a subflow design without acknowledging the intermediate options
- **Tier 1: same-step follow-up retry**
  - consequence keeps the same step pending
  - engine returns semantic follow-up guidance
  - agent retries the same step after improving its work / evidence
  - this is the current intended v1 behavior
- **Tier 2: structured redo recipe on the same step**
  - same step still remains the logical unit of work
  - engine can surface a bounded checklist or structured follow-up actions
  - no new DAG nodes or true subflow yet
  - likely useful if “retry” is too vague but full subflow control flow would be too heavy
- **Tier 3: assessment-triggered redo subflow**
  - matched assessment consequence routes into an explicit sequence of follow-up steps
  - subflow has its own durable progress and then returns to the original step or onward path
  - this is a significantly larger feature because it introduces assessment-driven control-flow behavior rather than just a blocked follow-up requirement
- **Design questions**:
  - when does Tier 2 become necessary instead of plain semantic retry guidance?
  - what durable model would Tier 3 need for entering, progressing through, and returning from a redo subflow?
  - how should the engine distinguish “redo the same step better” from “enter a dedicated recovery path”?
  - can Tier 3 reuse existing workflow / routine primitives, or would it need dedicated assessment-triggered topology support?
- **Risks / tradeoffs**:
  - jumping straight from Tier 1 to Tier 3 could create a hidden mini control-flow DSL
  - Tier 2 may be enough for many real cases and should not be skipped without evidence
  - Tier 3 likely changes authoring, durability, replay, and console explainability at the same time

+
580
### Console engine-trace visibility and phase UX

- **Status**: idea
- **Summary**: Evolve the console from a node-only DAG viewer into an execution-aware surface that shows both created nodes and the engine decisions that explain how the run got there. This should make fast paths, skipped phases, condition evaluation, loop entry/exit, and branch selection legible instead of looking like missing DAG nodes or broken rendering.
- **Why this seems useful**:
  - users currently see only `node_created` / `edge_created`, which makes legitimate engine behavior look like missing workflow phases
  - workflows use authoring concepts like phases, fast paths, run conditions, and loop gates, but the console does not show those decisions today
  - sessions like small-task fast paths can appear to "jump" from phase 0 to phase 5 even when the engine is behaving correctly
- **Current gap**:
  - the engine event log records `decision_trace_appended`, `context_set`, and related runtime decisions
  - console DTOs expose only run status plus DAG nodes/edges and node detail
  - there is no first-class UI for "why the engine chose this path"
- **Recommended direction**:
  - keep phases as authoring / workflow-organization concepts
  - stop treating the rendered DAG as the whole execution story
  - add an engine-trace / decision layer that can show:
    - the selected next step
    - evaluated conditions
    - entered/exited loops
    - important run context variables such as `taskComplexity`
    - skipped / bypassed planning paths such as small-task fast paths
- **Possible phase 1 shape**:
  - extend the console service / DTOs with a run-scoped execution-trace summary
  - show a compact "engine decisions" strip or timeline above the DAG
  - annotate jumps such as "small-task fast path selected" so sparse DAGs do not look broken
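
The phase 1 shape above could be sketched as a small run-scoped DTO. Everything here is illustrative -- `RunTraceSummaryDto`, `TraceDecisionDto`, and the decision kinds are hypothetical names, not the real console DTOs in `console/src/api/types.ts`:

```typescript
// Hypothetical execution-trace DTO sketch; all names are illustrative, not the real console types.
export type TraceDecisionKind =
  | "step_selected"
  | "condition_evaluated"
  | "loop_entered"
  | "loop_exited"
  | "fast_path_selected";

export interface TraceDecisionDto {
  kind: TraceDecisionKind;
  at: string;                        // ISO timestamp of the engine decision
  label: string;                     // e.g. "small-task fast path selected"
  context?: Record<string, string>;  // e.g. { taskComplexity: "small" }
}

export interface RunTraceSummaryDto {
  runId: string;
  decisions: TraceDecisionDto[];
}

// A sparse-DAG jump can be annotated straight from the decision list.
export function annotateJump(summary: RunTraceSummaryDto): string | undefined {
  return summary.decisions.find((d) => d.kind === "fast_path_selected")?.label;
}
```

The point of the sketch: the console never has to infer "why did the run skip phases?" from missing nodes -- it reads the answer off the trace summary.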
- **Possible phase 2 shape**:
  - richer explainability timeline with branches, skipped authoring phases, and condition results
  - allow toggling between "execution DAG" and "engine trace" views, or combine them in one unified run narrative
  - surface effective run context and selected branch/loop decisions in node detail or run detail
- **Design questions**:
  - should the console continue using phase-oriented labels in the primary UI, or should it prefer step titles / execution-narrative labels?
  - should trace events appear as first-class timeline items, DAG annotations, or a separate run-explanation panel?
  - what subset of run context variables is useful enough to surface without becoming noisy?
  - how do we distinguish authoring structure from runtime execution structure cleanly in the UX?
- **Risks / tradeoffs**:
  - exposing too much raw engine state could make the console noisier and harder to scan
  - mixing authoring structure and runtime trace without clear separation could create more confusion, not less
  - DTO growth needs care so the console does not become tightly coupled to every low-level event detail
- **Related docs / context**:
  - `docs/reference/workflow-execution-contract.md`
  - `docs/design/v2-core-design-locks.md`
  - `docs/plans/workrail-platform-vision.md`
  - current implementation: `src/v2/usecases/console-service.ts`, `src/v2/projections/run-context.ts`, `console/src/api/types.ts`

### Workflow previewer for compiled and runtime behavior

- **Status**: idea
- **Summary**: Add a workflow previewer for the `workflows/` directory that shows what a workflow actually compiles to and how the engine can traverse it at runtime.
- **Why this seems useful**:
  - authors currently have to mentally reconstruct branching, loops, blocked-node behavior, and other runtime structure from authored JSON plus tests
  - advanced workflow authoring gets much easier when the compiled DAG and runtime edges are visible
  - it would help explain engine behavior to both contributors and workflow authors
- **What it should show**:
  - the compiled step graph / DAG
  - branch points and condition-driven paths
  - loop structure and loop-control edges
  - blocked / resumed / checkpoint-related node shapes where applicable
  - template/routine expansion boundaries or provenance
  - the gap between authored JSON structure and runtime execution structure
- **Initial scope**:
  - start as a read-only preview for bundled workflows
  - optimize for accuracy over polish
  - do not require full execution simulation in phase 1
- **Design questions**:
  - should this live in the existing console, as a dev-only page, or as a local authoring utility?
  - should it show only the compiled DAG, or also annotate likely runtime transitions such as blocked attempts, rewinds, and loop continuations?
  - how much provenance should it expose for injected routines/templates?

### Native assessment / decision gates for workflows

- **Status**: idea
- **Summary**: Add a first-class workflow primitive for structured assessments that can drive routing. The agent would assess a small set of named dimensions, give short rationales, and let the engine use explicit aggregation / gate rules to influence continuation, follow-up, branching, or final confidence.
- **Why this seems useful**:
  - some workflow decisions are clearer and more auditable as small assessment matrices than as long prompt prose
  - confidence computation is a strong example: workflows may want to derive final confidence from dimensions like boundary, intent, evidence, coverage, and disagreement
  - explicit assessment gates would let the engine drive loops/branches without relying entirely on prose interpretation
- **Near-term shape**:
  - keep the reasoning with the agent, but let the workflow declare named assessment dimensions and allowed levels such as `High | Medium | Low`
  - let the agent provide one short rationale per dimension
  - let the engine compute caps / next actions / routing outcomes from explicit gate rules
- **Ownership split**:
  - the **agent** assesses each dimension and gives the short rationale
  - the **engine** applies declared gate rules such as caps, routing outcomes, or follow-up triggers
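
A minimal sketch of that ownership split, assuming a hypothetical gate declaration -- none of these field names (`AssessmentGate`, `cap`, `finalAtMost`) exist in WorkRail yet; they only illustrate "agent assesses, engine aggregates":

```typescript
// Hypothetical assessment-gate shapes; every name here is illustrative, not a real WorkRail primitive.
type Level = "High" | "Medium" | "Low";

interface AssessmentGate {
  dimensions: string[];  // e.g. ["boundary", "intent", "evidence", "coverage"]
  levels: Level[];       // allowed levels for every dimension
  // Declarative cap rules: "if <dimension> is at most <level>, final confidence is at most <level>".
  cap: { when: { dimension: string; atMost: Level }; finalAtMost: Level }[];
}

// What the agent returns: one level + one short rationale per dimension.
interface AgentAssessment {
  [dimension: string]: { level: Level; rationale: string };
}

const order: Level[] = ["Low", "Medium", "High"];
const atMost = (a: Level, b: Level): Level =>
  order[Math.min(order.indexOf(a), order.indexOf(b))];

// Engine side: apply the declared cap rules to the agent's per-dimension levels.
function finalConfidence(gate: AssessmentGate, a: AgentAssessment): Level {
  let result: Level = "High";
  for (const rule of gate.cap) {
    const got = a[rule.when.dimension]?.level;
    if (got && order.indexOf(got) <= order.indexOf(rule.when.atMost)) {
      result = atMost(result, rule.finalAtMost);
    }
  }
  return result;
}
```

The reasoning stays with the agent; the engine only evaluates declared rules, which keeps the aggregation auditable and replayable.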
- **Longer-term shape**:
  - add a first-class authoring primitive such as `assessmentGate`, `assessmentRef`, or similar
  - optionally allow reusable built-in or repo-owned assessment schemas/matrices
  - optionally validate assessment shape against WorkRail-owned schemas
- **Good early use cases**:
  - MR review confidence assessment
  - planning readiness / confidence gates
  - debugging confidence and next-step routing
  - block-vs-continue / revisit-earlier-step decisions
- **Design questions**:
  - should this be a narrow `assessmentGate` primitive or a more generic structured decision-table feature?
  - should reusable matrices be inline first, or backed by repo-owned refs from the start?
  - how much aggregation logic should the engine support directly versus leaving it to workflow-defined rules?
  - how should assessment provenance and rationales appear in compiled/runtime traces?

### Engine-injected note scaffolding

- **Status**: related follow-on idea
- **Summary**: Add an opt-in execution-contract or note-structure feature that helps agents produce compact notes useful to both humans and future resume agents.
- **Why it may matter**:
  - some workflows want notes to consistently capture current understanding, key findings, decisions, uncertainties, and next-step implications
  - this is related to assessment-driven routing, but it is a different product concern
- **Open question**:
  - should note scaffolding live as a separate execution-contract feature, or share underlying primitives with assessment gates?

---

### Daemon architecture decision -- findings and direction (Apr 14, 2026)

**Status:** Research complete, direction chosen, not yet implemented.

**The question:** Should the autonomous daemon be (A) a same-process component calling the engine directly, (B) a separate process connecting to WorkRail's MCP server as an HTTP client, or (C) a composite same-process model with direct engine calls plus a REST control plane?

**What the research found:**

Two discovery agents independently reached opposite conclusions:

- Agent 1 (correctness focus): **Option B** -- separate process. Two hard bugs in same-process: (1) `LocalSessionLockV2.clearIfStaleLock()` uses `process.kill(pid, 0)` -- with the same PID for daemon + MCP server, a crashed daemon permanently locks sessions with no recovery; (2) the `engineActive` guard in `engine-factory.ts` explicitly blocks a second engine instance per process.

- Agent 2 (vision focus): **Option C** composite -- same process, direct engine calls, REST control plane. `V2Dependencies` is already concurrent-safe (stateless, per-session locking). The `engineActive` guard is about DI initialization, not concurrent handler safety. Self-referential workflows (a coordinator spawning sub-workflows) work immediately via existing delegation.

**Settling the disagreement -- the lock code:**

Read `src/v2/infra/local/session-lock/index.ts` directly. Line 45 confirms Agent 1's bug is real: `process.kill(pid, 0)` -- if daemon + MCP server share a PID and the daemon crashes mid-step, the lock file's PID check returns "process alive" forever. The session is permanently locked until the process restarts. No recovery path. Hard bug.

**Direction: Option C (in-process composite) -- but fix the lock first.**

Option C is the right 12-month architecture:
- No transport overhead (MCP over HTTP adds ~1ms+ per step -- meaningless in human sessions, significant in tight autonomous loops)
- Shared session store, DI, and keyring -- no sync issues
- Self-referential workflows work immediately -- the coordinator spawns sub-workflows via existing delegation
- REST control plane on the existing Express server -- 4 routes, no new process
- MCP + daemon in the same binary, same deployment, same config

**The prerequisite: fix `LocalSessionLockV2`**

Replace PID-only staleness with PID + workerId:
```json
{ "pid": 1234, "workerId": "mcp-server", "sessionId": "sess_abc" }
```

Staleness logic:
- Same PID + same workerId → I own this, proceed
- Same PID + different workerId → not stale, return SESSION_LOCK_BUSY
- Different PID, process alive → SESSION_LOCK_BUSY
- Different PID, process dead → stale, clear it

`workerId` is injected at construction: `new LocalSessionLockV2(dataDir, fs, clock, workerId)`. The MCP server passes `"mcp-server"`, the daemon passes `"daemon"`. ~50-60 lines across `session-lock/index.ts` + `session-lock.port.ts`. Zero behavior change for the existing single-process case.

Also add `isHeldByMe(sessionId)` to the lock port for clean "pause after current step" support.
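
The four staleness rules above can be sketched as a pure decision function. This is a sketch of the proposed logic, not the real `LocalSessionLockV2` code; `isProcessAlive` stands in for the `process.kill(pid, 0)` liveness probe:

```typescript
// Sketch of the proposed PID + workerId staleness rules (illustrative, not the real lock class).
interface LockFile { pid: number; workerId: string; sessionId: string }

type LockDecision = "proceed" | "SESSION_LOCK_BUSY" | "stale_clear";

function decideLock(
  lock: LockFile,
  self: { pid: number; workerId: string },
  isProcessAlive: (pid: number) => boolean, // stands in for process.kill(pid, 0)
): LockDecision {
  if (lock.pid === self.pid) {
    // Same process: workerId distinguishes the daemon from the MCP server.
    return lock.workerId === self.workerId ? "proceed" : "SESSION_LOCK_BUSY";
  }
  // Different process: fall back to PID liveness, exactly as today.
  return isProcessAlive(lock.pid) ? "SESSION_LOCK_BUSY" : "stale_clear";
}
```

Note the single-process case (different PID branches) is untouched, which is what makes the change zero-risk for existing deployments.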

**Other architecture decisions from the 5 MVP discovery agents:**

- **Context survival**: ~~3-line deletion in `prompt-renderer.ts`~~ **CORRECTED** -- injecting the ancestry recap on every normal step advance is wrong. The agent completing step 4 already has steps 1-4 in context -- injecting the recap would be noise and token waste. The correct approach: the **daemon** injects the ancestry recap into the system prompt when initializing a fresh Claude API session via pi-mono's `Agent`. Engine code stays untouched. The existing `intent: "rehydrate"` path is already correct for human-driven sessions. This is a daemon feature, not a prompt-renderer change.
- **Evidence gate**: a `requiredEvidence` field + a `record_evidence` MCP tool + a gate check in `detectBlockingReasonsV1`. MVP = assertion gate; the later push-hook upgrade requires zero schema changes.
- **Trigger system**: standalone `src/trigger/` process (~600 LOC). GitLab MR webhook → `start_workflow` → loop `continue_workflow` → post MR comment.
- **Console live view**: an `is_autonomous: true` context_set event + an ephemeral `DaemonRegistry` for heartbeat + a `[ LIVE ]` badge. The session lock is held during steps, which prevents timer-based heartbeats -- a hybrid model is required.
- **Token persistence**: the daemon must write `continueToken` + `checkpointToken` to `~/.workrail/daemon-state.json` (atomic write) before each step. A crash without this = unrecoverable sessions.
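
The atomic write can be the standard write-temp-then-rename pattern. A minimal sketch -- `persistTokens` and the `DaemonState` shape are illustrative, not existing WorkRail code:

```typescript
// Sketch of the atomic token write: write to a temp file, then rename over the target.
// rename() within the same directory is atomic on POSIX, so a crash mid-write can
// never leave a torn daemon-state.json -- readers see either the old or the new state.
import { writeFileSync, renameSync } from "node:fs";

interface DaemonState { continueToken: string; checkpointToken: string }

function persistTokens(statePath: string, state: DaemonState): void {
  const tmp = `${statePath}.tmp`;
  writeFileSync(tmp, JSON.stringify(state), "utf8");
  renameSync(tmp, statePath); // atomic replace of the previous state file
}
```

Calling this before each `continue_workflow` step is what makes a daemon crash recoverable: the last persisted tokens always point at a resumable position.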

**Build order (tentative):**

1. Fix `LocalSessionLockV2` with workerId (prerequisite for the in-process model)
2. Context survival: daemon-side ancestry recap injection (per the correction above; engine code untouched)
3. Daemon runtime: `src/daemon/` with `runWorkflow()` calling the engine directly
4. Evidence gate: `requiredEvidence` + `record_evidence` tool
5. Trigger system: `src/trigger/` webhook server
6. Console live view: `DaemonRegistry` + `[ LIVE ]` badge

**Reference for loop implementation:** pi-mono `agentLoop` vs OpenClaw `session-actor-queue` -- comparison agent running, results pending.

---

### Agent loop decision: pi-mono wins (Apr 14, 2026)

**Use `@mariozechner/pi-agent-core` (pi-mono) as the daemon loop foundation.** Pinned at 0.67.2, MIT, 246kB, 1 dependency, published on npm.

**Key finding:** OpenClaw's runner wraps pi-mono's `Agent` class internally (`src/agents/pi-embedded-runner/run/attempt.ts` imports `@mariozechner/pi-agent-core` directly). OpenClaw adds auth rotation, provider failover, and preemptive compaction -- none of which is needed at MVP. The comparison was always pi-mono vs "pi-mono + 80 internal modules." Easy call.

**What to take from pi-mono:**
- `Agent` class -- the multi-turn LLM + tool-call loop
- `AgentTool<TParameters>` with TypeBox schemas -- define `start_workflow`, `continue_workflow`, `Bash`, `Read`, `Write`
- `getFollowUpMessages` -- termination hook: return `[]` when `continue_workflow` reports `isComplete=true`
- `agent.abort()` -- cancellation threaded through every async boundary
- `agent.subscribe()` -- observability without modifying the loop

**What to reimplement from OpenClaw (not import):**
- `KeyedAsyncQueue` pattern (~30 lines) -- serializes concurrent runs against the same session ID
- Retry wrapper with backoff on `stopReason === 'error'`
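
The KeyedAsyncQueue reimplementation really is small. A sketch in the spirit of OpenClaw's pattern (written fresh, not copied -- the class name is the only thing taken from the notes above):

```typescript
// Minimal keyed async queue: tasks with the same key run strictly one after another;
// tasks with different keys run concurrently. Used to serialize runs per session ID.
class KeyedAsyncQueue {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const tail = this.tails.get(key) ?? Promise.resolve();
    // Chain after the previous task for this key, ignoring its outcome so one
    // failed run does not poison the queue for the session.
    const next = tail.catch(() => {}).then(task);
    this.tails.set(key, next.catch(() => {}));
    return next;
  }
}
```

The caller gets back the task's own promise (including its rejection), while the stored tail swallows errors -- that split is the subtle part worth getting right.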

**Non-obvious implementation detail:** pi-mono terminates structurally (no tool calls + no follow-ups), not semantically. Bridge `isComplete` from `continue_workflow` into `getFollowUpMessages` returning `[]`. Use a `createDaemonLoopConfig()` factory per run -- no shared state across concurrent sessions.
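
A sketch of that bridge, under stated assumptions: `createDaemonLoopConfig` is the per-run factory the notes propose, and the `getFollowUpMessages` shape here is paraphrased from those notes, not pi-mono's exact signature:

```typescript
// Sketch: bridging semantic completion ("isComplete" from continue_workflow) into
// pi-mono's structural termination (getFollowUpMessages returning []).
// The hook signature is an assumption; only the closure pattern is the point.
function createDaemonLoopConfig() {
  let isComplete = false; // captured per run -- one config instance per session, no shared state

  return {
    // Called by the continue_workflow tool wrapper when the engine reports completion.
    markComplete(): void {
      isComplete = true;
    },

    // pi-mono keeps looping while follow-up messages exist; returning [] ends the run.
    getFollowUpMessages(): string[] {
      return isComplete ? [] : ["Continue with the next workflow step."];
    },
  };
}
```

Because `isComplete` lives in the closure, two concurrent sessions each get their own flag -- the factory-per-run rule falls out of the pattern rather than needing discipline.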

**Typed discriminant for the `continue_workflow` result:**
```typescript
type WorkflowContinueResult =
  | { _tag: 'advance'; step: PendingStep; continueToken: string }
  | { _tag: 'complete'; finalNotes: string }
  | { _tag: 'error'; message: string };
```

**Pre-production (not MVP blocking):** Add `agent.abort()` after a wall-clock limit + a max-turn counter via `getSteeringMessages`. pi-mono's loop has no built-in timeout.

---

### Mobile monitoring + control (post-MVP) ⭐

**Goal:** Control and monitor autonomous WorkRail sessions from a phone.

**What's needed:**

1. **Mobile-responsive console** -- the existing React console needs a touch-friendly layout, readable on small screens, with tap to pause/resume/cancel sessions. The DAG is probably too complex for mobile; a linear step-by-step log view is better for quick checks.

2. **Push notifications** -- the phone is notified when a session completes, fails, or hits a human-approval gate. Simplest path: Slack/Telegram notification via a configured channel (OpenClaw's channel system is the reference). No native app required for the MVP of this feature.

3. **Human-in-the-loop approval on mobile** -- workflow steps that require sign-off before proceeding ("about to merge this MR, confirm?") send a push notification with Approve/Reject. Maps to the REST control plane: `POST /api/v2/sessions/:id/resume` from a mobile tap.

4. **Session log view** -- scroll through what the daemon did while you were away. Linear timeline, not a DAG.

**Simplest implementation path:** Make the console responsive + add Slack/Telegram notifications on session completion/failure/approval-needed. OpenClaw's 20+ channel integrations are the reference -- WorkRail doesn't need to build a native app, just configure an output channel.

**Priority:** Post-MVP, but design the REST control plane with mobile in mind from the start (clean JSON responses, no server-side rendering assumptions).


---

### Remote access: connect to local WorkRail from phone (post-MVP)

**Goal:** Access and control a WorkRail session running on your laptop from your phone, even behind NAT/VPN.

**The problem:** The laptop is behind NAT. A corporate VPN routes all traffic, blocking direct connections. The phone needs to reach the WorkRail console without port forwarding or IT involvement.

**Options to explore:**

1. **`workrail tunnel` command** -- WorkRail opens an outbound authenticated tunnel (Cloudflare Tunnel or similar) from the laptop and prints a URL. The phone opens the URL and gets the live console. Works behind any NAT/VPN since the connection is outbound from the laptop. Auth via a WorkRail keyring token. The most WorkRail-native story.

2. **Tailscale integration** -- document Tailscale as the recommended setup. Zero WorkRail code needed. The WorkRail console becomes accessible at a stable Tailscale address. Handles NAT and coexists with most corporate VPNs via split tunneling.

3. **Cloud session sync** -- the daemon pushes session events to a configured cloud store (S3, Cloudflare R2) and mobile reads from there. Most robust -- works offline and behind any firewall -- but adds complexity and a cloud dependency.

**VPN note:** Tailscale handles most corporate VPN conflicts. `workrail tunnel` sidesteps the VPN entirely since it's outbound-only from the laptop. Either approach is better than trying to punch through corporate firewalls.

**Priority:** Post-MVP. Design the REST control plane and console with this in mind -- clean JSON API, no server-side rendering assumptions, and an authentication token model that works over tunnels.


---

### WorkRail Auto: cloud-hosted autonomous platform (long-term vision) ⭐⭐

**Goal:** WorkRail Auto runs on a server 24/7, connected to your engineering ecosystem, working autonomously without a laptop open.

**What this enables:**
- GitLab opens an MR → WorkRail reviews it, posts a comment, done. Laptop closed.
- A Jira ticket moves to In Progress → WorkRail starts the coding task, pushes a branch, opens a draft MR. Review it in the morning.
- PagerDuty fires → WorkRail runs the incident-investigation workflow, posts findings to Slack.
- Scheduled: nightly test suite run, with bugs auto-filed for new failures.
- Docs updated → WorkRail triggers the documentation review workflow.

**Integrations needed (not exhaustive):**
- **Triggers:** GitLab/GitHub webhooks, Jira webhooks, Linear, PagerDuty, Slack slash commands, cron
- **Actions:** GitLab/GitHub API (MR comments, branch creation, commits), Jira (transition tickets, add comments), Slack (post messages, threads), Confluence/Notion (read docs), email
- **Auth:** Per-org credential vault (Jira token, GitLab token, Slack token, etc.)

**Architecture implications for hosted:**
- Multi-tenancy: multiple users/orgs, isolated session stores, isolated credential vaults
- The tunnel problem disappears -- the server has a public IP, so webhooks just work
- Credential vaulting: secrets stored encrypted per org, injected at session start
- Horizontal scaling: multiple daemon instances consuming from a shared trigger queue
- Rate limiting per org, per integration

**Relationship to self-hosted:**
- Self-hosted (local) is always free, always open source, always works offline
- Hosted WorkRail Auto is the natural SaaS layer -- same engine, same workflows, managed infrastructure
- Workflows written for self-hosted run unchanged on hosted (this is the portability guarantee)

**Priority:** Long-term. Design the local daemon with multi-tenancy seams in mind from the start (don't hardcode single-user assumptions), but don't build the hosted layer until the local daemon is proven.

**Reference:** OpenClaw's channel/extension architecture is the best existing model for multi-integration connectivity. AutoGPT's block/trigger system is the best model for declarative integration configuration.


---

### Business model (tentative)

Three tiers:

| Tier | Who | Price | Notes |
|------|-----|-------|-------|
| **Personal / OSS** | Individual devs, open-source projects, non-commercial | Free forever | Builds community, reputation, workflow library. Never charge for this. |
| **Corporate self-hosted** | Companies running WorkRail on their own infrastructure | Paid license | Data never leaves their VPC. Enterprise buyers pay well for data sovereignty + compliance. Priced per seat or per org. |
| **WorkRail Auto (cloud)** | Anyone who wants managed, zero-ops | Paid subscription | Higher price, lower friction. Pre-configured integrations. |

**License model options:**
- **Dual-license:** AGPL for open-source use (anyone can use it, but must open-source modifications), commercial license for everyone else who doesn't want AGPL obligations. Clean legal distinction.
- **BSL-style:** Core is source-available; commercial use requires a license after some threshold (employees, revenue, or deployment count). The model HashiCorp moved to, triggering community backlash -- careful with this one.
- **MIT core + paid features:** The core engine stays MIT forever; advanced features (hosted dashboard, enterprise SSO, multi-tenant credential vault, audit logs) are paid. Keeps community trust, monetizes the enterprise layer.

**The corporate self-hosted market is often the most lucrative.** Enterprises pay well for "runs in our VPC, the vendor can't see our code." GitLab, Grafana, and Atlassian all built significant businesses on self-hosted enterprise licenses before or alongside their cloud offerings.

**What NOT to do:** Don't charge for the workflow library or the core MCP protocol. Those are the commons that make WorkRail valuable. Charge for the infrastructure layer, not the knowledge layer.

**Priority:** Don't worry about this until there are users. Get the product right first.


---

### Competitive landscape findings (Apr 14, 2026)

**WorkRail occupies a nearly empty quadrant:** durable session state + cryptographic step enforcement + MCP-native. No other tool currently has all three.

```
                      ENFORCEMENT STRENGTH
                 Weak (Prompt)         Strong (Structural)
        ┌─────────────────────┬──────────────────────────┐
    Yes │ nexus-core          │ WorkRail  ← HERE         │
DURABLE │ LangGraph+LangSmith │ Temporal.io (not MCP)    │
STATE   │ CIAME contracts     │ mcp-graph (closest)      │
        ├─────────────────────┼──────────────────────────┤
     No │ CLAUDE.md files     │ CrewAI, AutoGen          │
        │ maestro, ADbS       │ LangGraph (standalone)   │
        └─────────────────────┴──────────────────────────┘
```

**Key findings:**

- **mcp-graph** (DiegoNogueiraDev) -- SQLite-backed MCP server with graph-based step locking. Closest external analog. Not cryptographic enforcement, but worth watching.
- **LangGraph + LangSmith** -- durable (thread IDs + Postgres) but prompt-based enforcement. Top-left quadrant, not top-right. **Watch condition:** if LangGraph adds MCP-server exposure, the MCP-native moat shrinks. Response: lean harder on JSON-authored + token-gated execution.
- **Temporal.io** -- different domain (code-defined workflows, Go), different users. Low competitive concern but high architectural learning value for event sourcing and crash recovery. Study it.
- **CrewAI / AutoGen / nexus-core** -- no durability, no structural enforcement. Not in the same quadrant.

**Internal finding -- most actionable:**
The **CIAME team** (Samuel Pérez, `samuelpe@`) is building WorkRail's exact problem manually in markdown (`rs-sdk-agent-execution-contract.md` -- execution contracts for AI agents). The most concrete internal adoption candidate. Direct cold-share hook: "you're building this by hand, here's the tool."

**Positioning anchor:** "If you know Temporal.io, WorkRail is Temporal for AI agent process governance via MCP."

**Two immediate internal actions:**
1. List WorkRail in the Zodiac AI Marketplace + ZG AI Tools Catalog
2. DM Samuel Pérez (CIAME team) -- the strongest cold-share candidate alongside Peter Yao


---

### Deep dive findings: all reference architectures (Apr 14-15, 2026)

Research is complete on all reference projects. Design docs written to `docs/design/` and `docs/ideas/`. Key findings per source:

---

#### OpenClaw findings (design-openclaw-deep-dive.md)

**Channel abstraction:** `ChannelPlugin<ResolvedAccount>` -- one TypeScript interface, ~25 optional adapter slots, lazily loaded. WorkRail equivalent: `WorkRailIntegration<TConfig>`. Keeps the daemon core integration-agnostic.

**Skills:** Not a separate primitive -- the `agentTools` slot on ChannelPlugin injects pi-mono typed tools at session start. **WorkRail workflows ARE the skill layer.** No separate skill system needed.

**Session persistence:** `AcpSessionStore` confirmed in-memory only (LRU, 5k sessions, 24h TTL, vanishes on crash). WorkRail's disk-persisted append-only store is strictly better.

**Delivery binding:** Bind the delivery target (MR iid, Jira key, Slack thread) at spawn time, not completion time; `DeliveryRouter.resolve(triggerSource)` runs at completion. WorkRail: store the `TriggerSource` when the session starts.

**Credential model:** `$secret` refs with `file:path`, `exec:command` (enables 1Password CLI, Bitwarden, Keychain), and env vars. Adopt nearly verbatim.

**DaemonRegistry shape:** `RuntimeCache` (`Map<actorKey, {runtime, handle, lastTouchedAt}>`) + `RunStateMachine` for heartbeat. Extend with `continueToken` + `checkpointToken` + `persistTokens()` for WorkRail.
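
A sketch of that extended registry entry. The `runtime`, `handle`, and `lastTouchedAt` fields come from the `RuntimeCache` shape above; the token fields are the proposed WorkRail extension, and everything else (method names, the `live()` helper) is illustrative:

```typescript
// Sketch of the extended DaemonRegistry; field names beyond the RuntimeCache shape are illustrative.
interface DaemonRunEntry {
  runtime: unknown;        // the live Agent / runtime handle
  handle: string;          // actor/session handle
  lastTouchedAt: number;   // heartbeat timestamp (ms since epoch)
  continueToken?: string;  // WorkRail additions: durable resume tokens
  checkpointToken?: string;
}

class DaemonRegistry {
  private cache = new Map<string, DaemonRunEntry>();

  // Record or refresh an entry, stamping the heartbeat.
  touch(actorKey: string, entry: Omit<DaemonRunEntry, "lastTouchedAt">): void {
    this.cache.set(actorKey, { ...entry, lastTouchedAt: Date.now() });
  }

  // Keys whose heartbeat is fresh enough to show a [ LIVE ] badge in the console.
  live(maxAgeMs: number): string[] {
    const now = Date.now();
    return [...this.cache.entries()]
      .filter(([, e]) => now - e.lastTouchedAt <= maxAgeMs)
      .map(([k]) => k);
  }
}
```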

---

#### nexus-core findings

**Org profile system:** `configs/profiles/zillow.yaml` declares CLI tool bindings (glab vs gh, acli vs jira-cli). WorkRail: `workrail profile apply <org>` writes `~/.workrail/config.json` with the active integration bindings.

**Skill loading:** Three-mirror layout (`.claude/skills/`, `skills/`, `.agents/skills/`) with symlink-based plugin discovery. Core always wins. WorkRail: `~/.workrail/plugins/` with a `workrail-plugin.yaml` manifest.

**SOUL.md:** Behavioral principles injected into agent system prompts. WorkRail Auto should ship a `SOUL.md` equivalent in daemon session system prompts -- agent character beyond workflow steps. "Evidence before assertion" = WorkRail's enforcement principle as a behavioral norm.

**Session lifecycle hooks:** JSON stdin/stdout protocol (`{session_id, reason, transcript_path}`). Maps to the WorkRail daemon: init (inject ancestry, register in DaemonRegistry, acquire lock) → end (write checkpointToken atomically, release lock, post results to the trigger source).

**Knowledge injection:** `inject-knowledge.sh` -- before the Claude API call, inject: ancestry recap + `~/.workrail/knowledge/` + repo-specific `.workrail/context.md`. Cap at N lines (200 by default). SHORT_NAME matching for repo-relevant selection.

**Skill-as-git-history:** Each skill evolves through atomic commits traceable to session context. WorkRail: session notes improve workflows via `workflow-for-workflows`.

---

#### pi-mono findings (docs/design/pi-mono-integration-discovery.md)

**`agent.state` returns a snapshot, not a live reference.** You must reassign: `agent.state.messages = [...agent.state.messages, newMsg]`.

**Tools must throw on failure** -- never encode errors in content. The LLM sees the thrown error and can retry.

**`agent.followUp()` is the termination pattern** -- the `continue_workflow` tool calls `agent.followUp(buildStepPrompt(result.step, continueToken))`. `isComplete`, captured in a closure, drives `getFollowUpMessages` returning `[]` so the loop exits naturally.

**Token persistence via `afterToolCall`** -- write `continueToken` + `checkpointToken` to `~/.workrail/daemon-state.json` atomically before returning the tool result.

**Console streaming:** Subscribe to `message_update` events where `assistantMessageEvent.type === "text_delta"`. Push them over SSE/WebSocket. `tool_execution_start/end` events drive tool progress indicators.
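
A sketch of that event routing. The event names (`message_update`, `text_delta`, `tool_execution_start/end`) come from the notes above, but the payload shapes here are assumptions, not pi-mono's actual types:

```typescript
// Sketch: route pi-mono subscription events to console SSE channels.
// Event names are from the discovery notes; the payload shapes are assumed for illustration.
type AgentEvent =
  | { type: "message_update"; assistantMessageEvent: { type: string; text?: string } }
  | { type: "tool_execution_start"; toolName: string }
  | { type: "tool_execution_end"; toolName: string };

function toSse(event: AgentEvent, push: (channel: string, data: string) => void): void {
  switch (event.type) {
    case "message_update":
      // Only forward incremental text; other assistant-message events are ignored here.
      if (event.assistantMessageEvent.type === "text_delta") {
        push("text", event.assistantMessageEvent.text ?? "");
      }
      break;
    case "tool_execution_start":
      push("tool", `start:${event.toolName}`);
      break;
    case "tool_execution_end":
      push("tool", `end:${event.toolName}`);
      break;
  }
}
```

A handler like this would sit inside `agent.subscribe()`, keeping the loop itself untouched -- which is the property the notes call out.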

**`mom` dispatch model:** One `Agent` instance per session (not per trigger). `ChannelQueue` (a KeyedAsyncQueue) serializes messages per channel. WorkRail: one `Agent` per daemon session, reconstructed from the WorkRail event log on each run.

Full tool-registration TypeScript is in the design doc.

---

#### LangGraph findings (docs/ideas/langgraph-discovery.md)

**Time-travel checkpointing:** `CheckpointMetadata.source = "fork"` enables re-invoking from any historical `checkpoint_id`. This is the implementation pattern for WorkRail's "workflow rewind" backlog feature. WorkRail's event log already stores enough -- what's missing is a branch-from-earlier-point API.

**`interrupt()` is a function, not middleware** -- it raises `GraphInterrupt`, and the node re-runs from scratch on resume (which requires idempotency). WorkRail's design is cleaner -- the step advances rather than re-executing. WorkRail's HMAC token can't be faked; LangGraph's interrupt can be bypassed.

**Streaming is a `(namespace, mode, data)` triple** -- it includes the subgraph namespace path. The right format for WorkRail Auto's console SSE events. pi-mono's `agent.subscribe()` is the direct equivalent.

**Multi-tenancy is soft** -- metadata-filter-based, with no per-tenant schema isolation. A bug in an auth handler leaks cross-tenant data. **WorkRail's opportunity: structural per-org storage roots from day one.**

**The `Workflow + Session + Run` hierarchy is confirmed at scale** -- the right entity model for WorkRail Auto cloud.

---

#### Temporal.io findings

**Event-sourcing model:** Temporal workflows replay event history deterministically on each activation, raising `DeterminismViolationError` when code changes break replay compatibility. WorkRail already has this pattern in its event log + `replay.ts`. Key addition: Temporal's `Worker.runReplayHistories()` for batch-testing workflow code changes against production history before deploying.

**Activity/workflow separation:** Workflows = deterministic orchestration (no side effects, must be pure). Activities = side-effectful work (API calls, file I/O, non-deterministic ops). WorkRail's current design conflates these -- workflow steps can have side effects. For WorkRail Auto, the distinction matters: the daemon's `runWorkflow()` loop is the "workflow" (a deterministic step sequencer), and each tool execution is an "activity" (side-effectful). Not a blocking design change, but a useful mental model.

**Worker polling vs webhook push:** Temporal workers poll a task queue; WorkRail uses webhook push. Both are valid. Worker polling is better for cloud/multi-tenant (workers can scale independently, with no direct webhook routing needed). WorkRail Auto local: webhooks are simpler. WorkRail Auto cloud: the task-queue model is worth adopting.

**Workflow versioning:** The `patched()` / `deprecatePatch()` pattern for evolving running workflows. WorkRail has no equivalent. The minimum needed: workflow definition hash pinning (already done via `workflowHash`), plus a mechanism to continue old sessions on old workflow versions while new sessions use new versions. Not MVP, but important for production.

**Namespace isolation:** Per-org Temporal namespaces with separate history and quota. WorkRail Auto cloud: per-org data dirs (`~/.workrail/orgs/<orgId>/`) from day one. No shared state between orgs.

**Schedule client:** Temporal's `ScheduleClient` has `ScheduleOverlapPolicy` (SKIP, BUFFER_ONE, BUFFER_ALL, CANCEL_OTHER, ALLOW_ALL). WorkRail's cron trigger needs the same overlap policy -- what happens if a scheduled run is still running when the next one fires?
1001
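
As a sketch of what that policy could look like in WorkRail's cron trigger (`OverlapPolicy`, `CronTriggerState`, and `onTick` are illustrative names, not existing WorkRail API -- only the policy values come from Temporal):

```typescript
// Hypothetical overlap-policy handling for a cron trigger, modeled on
// Temporal's ScheduleOverlapPolicy. All names here are assumptions.
type OverlapPolicy = "SKIP" | "BUFFER_ONE" | "BUFFER_ALL" | "CANCEL_OTHER" | "ALLOW_ALL";

interface CronTriggerState {
  running: boolean; // is a session from this trigger still active?
  buffered: number; // ticks waiting to fire
}

// Decide what to do when a scheduled tick fires.
function onTick(
  state: CronTriggerState,
  policy: OverlapPolicy,
): "start" | "skip" | "buffer" | "cancel-then-start" {
  if (!state.running) return "start";
  switch (policy) {
    case "SKIP": return "skip";                                          // drop the tick entirely
    case "BUFFER_ONE": return state.buffered === 0 ? "buffer" : "skip";  // keep at most one pending tick
    case "BUFFER_ALL": return "buffer";                                  // queue every missed tick
    case "CANCEL_OTHER": return "cancel-then-start";                     // kill the running session, start fresh
    case "ALLOW_ALL": return "start";                                    // run concurrently
  }
}
```

The policy value would live in `triggers.yml` alongside the cron expression, defaulting to `SKIP` as the safest behavior.
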
---

#### AutoGPT findings

**Block abstraction:** the `BlockType` enum includes `WEBHOOK`, `HUMAN_IN_THE_LOOP`, `MCP_TOOL`, `AGENT`, `AI`. Each block has typed `Input`/`Output` schemas (Pydantic), plus `BlockWebhookConfig` for trigger blocks. This is the right abstraction for WorkRail Auto's integration layer -- every integration (GitHub trigger, Jira action, Slack message) is a typed `WorkRailBlock<TInput, TOutput>`.

**`HUMAN_IN_THE_LOOP` block type:** AutoGPT has this as a first-class concept. It maps directly to WorkRail Auto's approval-gate feature -- workflow steps that pause for human confirmation before proceeding.

**mcp-graph:** repo not found at `DiegoNogueiraDev/mcp-graph` -- it may have been deleted or renamed, or the competitive scan agent may have surfaced a different project. Not a concern -- WorkRail has no close competitors in its quadrant.

---

#### Key synthesis: what to build vs import

| Component | Decision | Source |
|-----------|----------|--------|
| Agent loop | Import `@mariozechner/pi-agent-core` | pi-mono |
| LLM providers | Import `@mariozechner/pi-ai` | pi-mono |
| Channel abstraction | Build `WorkRailIntegration<TConfig>` | OpenClaw pattern |
| Credential system | Build `$secret` resolver | OpenClaw pattern |
| Delivery binding | Build `TriggerSource` + `DeliveryRouter` | OpenClaw pattern |
| DaemonRegistry | Build `RuntimeCache` shape | OpenClaw pattern |
| Session lifecycle | Build `session-init` / `session-end` hooks | nexus-core pattern |
| Knowledge injection | Build `buildDaemonSystemPrompt()` | nexus-core pattern |
| SOUL.md | Build daemon behavioral principles | nexus-core pattern |
| Console streaming | Build SSE with `(namespace, mode, data)` triple | LangGraph pattern |
| Approval gates | Build `HUMAN_IN_THE_LOOP` block type | AutoGPT pattern |
| Overlap policy | Build cron trigger overlap config | Temporal pattern |
| Namespace isolation | Build per-org storage roots | Temporal + LangGraph |
| Workflow versioning | Defer -- hash pinning sufficient for MVP | Temporal insight |
| Activity/workflow split | Defer -- useful mental model, not blocking | Temporal insight |
| Time-travel rewind | Defer -- fork-from-checkpoint API | LangGraph insight |

**AutoGPT + mcp-graph-workflow additional findings (from design agent):**

**AutoGPT trigger declaration pattern:** a three-layer design: the block declares its schema + `webhook_config`; `WebhooksManager` handles registration + payload validation; the payload flows in as a hidden `Input.payload` field. The distinction between auto-register (`BlockWebhookConfig`) and user-configured (`BlockManualWebhookConfig`) is exactly the pattern WorkRail's trigger system needs.

**Fernet credential encryption:** `encrypt(data: dict) -> str` / `decrypt(str) -> dict` using a symmetric key, in 40 lines. WorkRail's `CredentialStore` should be a direct port.

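
A minimal sketch of that pair in TypeScript, substituting Node's built-in AES-256-GCM for Fernet (which has no stdlib equivalent in Node) -- the function shapes mirror AutoGPT's, but everything here is an illustrative port, not existing WorkRail code:

```typescript
// Sketch of a CredentialStore encrypt/decrypt pair analogous to AutoGPT's
// Fernet helpers. AES-256-GCM stands in for Fernet; API names are assumed.
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

export function encrypt(data: Record<string, unknown>, key: Buffer): string {
  const iv = randomBytes(12); // standard GCM nonce size
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(JSON.stringify(data), "utf8"), cipher.final()]);
  // token layout: iv || authTag || ciphertext, base64url-encoded
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64url");
}

export function decrypt(token: string, key: Buffer): Record<string, unknown> {
  const raw = Buffer.from(token, "base64url");
  const decipher = createDecipheriv("aes-256-gcm", key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28)); // tampered tokens throw in final()
  const plaintext = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]);
  return JSON.parse(plaintext.toString("utf8"));
}
```

Like Fernet, GCM authenticates as well as encrypts, so a modified token fails to decrypt rather than yielding garbage.
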
**Acquire-at-execution injection:** credentials are fetched just before step execution, injected as typed objects, held under lock for the duration, and released in `finally`. This acquire-inject-release contract is the model for WorkRail's step runner.

**`SecretStr` type enforcement:** wrap secrets in an opaque type (a branded type or `class Secret<T>` in TypeScript) that prevents accidental logging.
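
A minimal sketch of that wrapper -- the class name and API are assumptions, not an existing WorkRail type:

```typescript
// Opaque secret wrapper: the value is unreachable except via expose(),
// and every serialization path renders "[Secret]" instead of the value.
class Secret<T> {
  #value: T; // ES private field -- invisible to Object.keys / spread
  constructor(value: T) { this.#value = value; }
  expose(): T { return this.#value; }        // the only way to read it
  toString(): string { return "[Secret]"; }  // template strings / string coercion
  toJSON(): string { return "[Secret]"; }    // JSON.stringify
  [Symbol.for("nodejs.util.inspect.custom")]() { return "[Secret]"; } // console.log
}
```

An accidental `logger.info({ apiKey })` then emits `[Secret]`, while the step runner calls `apiKey.expose()` at the injection point.
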
**mcp-graph-workflow `resource_locks` SQLite table:** `leaseToken + agentId + expiresAt + TTL auto-expiry`. Upgrade `LocalSessionLockV2` from a PID file to a SQLite lock table. This adds multi-process safety without Redis and directly addresses the workerId bug already fixed on `feat/session-lock-worker-id`.

**`leaseToken` for subagent step claiming:** `start_task` returns a `leaseToken`; `finish_task` requires it. For WorkRail subagent delegation: the coordinator passes the leaseToken, the subagent includes it in `continue_workflow` context, and the engine validates it.

**`nextAction` in every tool response:** mcp-graph appends `_lifecycle.nextAction` to every MCP response. WorkRail: add a typed `nextAction` field to `continue_workflow` responses (parsed step summary, suggested tool, context keys) -- a complement to HMAC enforcement.

**mcp-graph-workflow vs WorkRail, honest comparison:** mcp-graph has SQLite persistence, lifecycle phases, gate checks, multi-agent task claiming, RAG, and a knowledge store. It sits in the "durable + advisory enforcement" quadrant. WorkRail's moat: cryptographic enforcement (mcp-graph is advisory -- agents CAN call tools out of sequence), checkpoint/resume tokens, a workflow composition DSL, and DAG visualization. These are not marginal differences.

**Temporal/Prefect/Dagster additional findings (full discovery agent):**

**Central insight -- Temporal's replay model is NOT applicable to WorkRail.** Temporal's event sourcing depends on deterministic code; AI agent tool calls are inherently non-deterministic. WorkRail's checkpoint token + append-only session store is already the right architecture. "Temporal for AI agent process governance" is valid as an analogy -- take Temporal's invariants, not its mechanisms.

**Workflow versioning is already solved.** `PinnedWorkflowStorePortV2` + `workflowHash` verified in `src/mcp/handlers/v2-advance-core/outcome-success.ts` (line 57) and `src/mcp/handlers/v2-workflow.ts` (lines 460-463). Deploy-safe in-flight sessions are fully handled. No new code needed.

**Trigger system from Dagster's sensor cursor model (~200 LOC):** a `TriggerSourcePortV2<TEvent, TCursor>` port + `TriggerCursorStore` + `CronTrigger` + `GitLabMRTrigger`. The trigger event ID is used as the workflowId for idempotency (Dagster's `run_key` pattern -- prevents double-fire after daemon restarts). Prefect's lookahead pre-insertion: `CronTrigger.poll()` computes all missed ticks since the last cursor and fires them as separate sessions.

**Human approval gates (post-daemon-MVP, ~200 LOC + schema):** three new typed domain events (`step_approval_pending/received/timeout`) + a REST endpoint with an HMAC-signed approval token. Requires a workflow schema change. Build after the autonomous daemon is proven.

**Daemon crash recovery (~80 LOC, build first):** a `DaemonStateStore` port -- atomic write of `{ sessionId, continueToken, stepIndex, approvalGate? }` to `~/.workrail/daemon-state.json` before every `continue_workflow`. Follows the existing temp→fsync→rename pattern from the session store. Out-of-band from the session lock by design.

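
The write half of that pattern is small enough to sketch -- the state shape comes from the line above, while `writeDaemonState`/`readDaemonState` are illustrative helper names:

```typescript
// temp -> fsync -> rename atomic write for daemon-state.json, mirroring the
// pattern the session store already uses. Function names are assumptions.
import { openSync, writeSync, fsyncSync, closeSync, renameSync, readFileSync } from "node:fs";

interface DaemonState {
  sessionId: string;
  continueToken: string;
  stepIndex: number;
  approvalGate?: string;
}

function writeDaemonState(path: string, state: DaemonState): void {
  const tmp = `${path}.tmp`;
  const fd = openSync(tmp, "w");
  try {
    writeSync(fd, JSON.stringify(state));
    fsyncSync(fd); // flush to disk before the rename makes it visible
  } finally {
    closeSync(fd);
  }
  renameSync(tmp, path); // atomic on POSIX: readers see old or new, never partial
}

// Crash recovery reads the last durable state back on daemon restart.
function readDaemonState(path: string): DaemonState {
  return JSON.parse(readFileSync(path, "utf8"));
}
```
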
**Temporal-to-WorkRail mapping confirmed:**
- Event history → append-only session event log ✅ (exists)
- Workflow task token → `ct_`/`st_` checkpoint token ✅ (exists)
- `condition(fn, timeout)` human gate → `approvalGate` step + REST resume (to design)
- Activity heartbeat → `requiredEvidence` field (to implement)
- Deployment versioning → `PinnedWorkflowStorePortV2` + `workflowHash` ✅ (verified)
- Namespace → `orgId` prefix in `dataDir` + credential vault (cloud tier)
- Worker long-polling → direct in-process engine calls ✅ (daemon model)

**Temporal additional findings (third agent, deepest source read):**

**WorkRail's JSON model eliminates Temporal's entire determinism complexity class.** Temporal's VM isolation, `DeterminismViolationError`, `patched()`, and replay machinery exist because Temporal workflows are user TypeScript code. WorkRail workflows are JSON interpreted by the engine -- there is no determinism problem. A genuine architectural advantage, not a gap.

**Minimum additions to WorkRail schema:**
- `versioningBehavior: "PINNED" | "AUTO_UPGRADE"` -- PINNED keeps in-flight sessions on their current workflow version; AUTO_UPGRADE migrates them to the latest on the next `continue_workflow`
- `orgId` in session store paths: `~/.workrail/sessions/<orgId>/` with a startup migration (needed for multi-tenancy from day one)

**Human-in-loop signal pattern:** Temporal's `setHandler(signal, handler)` + `condition(fn)` is the right mental model for WorkRail's approval gates. Incoming signals (approval/rejection) are buffered; `condition()` unblocks when the buffer holds a matching signal. Translated to WorkRail: the daemon emits a `step_approval_pending` event, the REST endpoint receives the approval and emits `step_approval_received`, and the daemon's `condition()` equivalent unblocks `continue_workflow`.

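
One way the daemon-side `condition()` equivalent could look -- a poll of the append-only event log until a matching event appears. The event names come from the design above; `waitForApproval` and the `readEvents` callback are assumptions, not planned API:

```typescript
// Hypothetical condition()-equivalent for approval gates: poll the session
// event log until an approval/rejection event for this step shows up.
type ApprovalEvent = { type: "step_approval_received" | "step_approval_timeout"; stepId: string };

async function waitForApproval(
  readEvents: () => Promise<ApprovalEvent[]>, // reads the append-only session event log
  stepId: string,
  timeoutMs: number,
  pollMs = 1000,
): Promise<"approved" | "timeout"> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const match = (await readEvents()).find((e) => e.stepId === stepId);
    if (match) return match.type === "step_approval_received" ? "approved" : "timeout";
    await new Promise((r) => setTimeout(r, pollMs)); // no signal yet -- keep waiting
  }
  return "timeout"; // caller emits step_approval_timeout and parks the session
}
```

Polling is the simplest translation of Temporal's buffered-signal model to an event log; an in-process event emitter could replace the poll later without changing callers.
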
**Triggers in workflow schema (dedicated sprint):** `triggers: DeploymentTrigger[]` inline in workflow JSON with `posture: "reactive" | "proactive"` + optional `schedule_after` delay. Prefect's `automations.py` deployment trigger pattern. Not MVP.

**Worker polling seam:** design the trigger port with a `poll()` interface now, even though self-hosted uses webhooks. Cloud deployment can then use a long-poll task queue without architectural changes.

**AutoGPT + mcp-graph-workflow CORRECTION (deepest agent, read actual source):**

**mcp-graph-workflow is NOT a close WorkRail analog.** The earlier characterization was wrong. It's a local-first SQLite-backed MCP server that converts PRD docs into execution graphs with a fixed 9-phase lifecycle. Its gates are advisory and bypassable (`force:true` parameter). WorkRail's HMAC tokens are cryptographic and unbypassable. Different quadrants, different trust models.

**What mcp-graph does better that's worth watching:** local RAG context compression (70-85% via BM25 + ONNX embeddings, zero cloud) -- relevant to WorkRail's future context survival. AST code intelligence (out of scope, but useful for coding-task workflows).

**Concrete WorkRail Auto trigger system design (from AutoGPT + validation):**

Three-layer model: declare → register → execute.

```typescript
interface TriggerDefinition {
  id: string;
  provider: string;         // "github" | "gitlab" | "jira" | "cron" | "generic"
  triggerType: string;      // provider-specific
  resourceTemplate: string; // "{owner}/{repo}"
  eventFilter: Record<string, boolean>;
  credentialRef?: string;   // keyring named ref -- never plaintext
  workflowId: string;
  contextMapping?: ContextMapping; // optional JSONPath payload → workflow context
}
```

**The generic provider alone is a complete MVP.** Any system that can send an HTTP POST can trigger a WorkRail workflow. GitLab, Jira, Slack, and PagerDuty all work without provider-specific code. Auto-registration is post-MVP.

**Port:** 3200 (separate from MCP's 3100). **Feature flag:** `wr.features.triggers`.

**MVP build order:** `trigger-store.ts` → `trigger-listener.ts` → `trigger-router.ts` → `providers/generic.ts` → `providers/cron.ts` → MCP CRUD tools.

**Credential model:** keyring-based named refs with two backends: OS keychain (dev) + encrypted env-file (Docker/CI/headless). Never plaintext in trigger definitions.

Full design at: `docs/design/workrail-auto-trigger-system.md`

**CORRECTION: pi-mono termination bridge (third agent, deepest read):**

**`getFollowUpMessages()` is the WRONG termination bridge.** The earlier finding was incorrect. The correct approach:

- Use `agent.steer()` for step injection -- it fires after each tool batch, inside the inner loop
- `followUp()` only fires when the agent would otherwise stop -- it adds an unnecessary extra LLM turn per workflow step
- **Termination:** simply don't call `steer()` when the workflow is complete. The agent stops naturally.

**Correct daemon runner pattern (from mom's `createRunner()`):**
- Subscribe to the agent once at daemon session creation
- Mutable `runState` reset per run (in a closure)
- `agent.steer()` injects the next step after each tool batch
- When `continue_workflow` returns `isComplete=true`, stop calling `steer()` -- the agent exits cleanly

**`abort()` is best-effort** for synchronous engine operations (SQLite/HMAC can't be interrupted). Don't rely on it for immediate cancellation.

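
The core of that runner pattern can be sketched against a stub interface -- `steer()` is the pi-agent-core method named above, but the `AgentLike`/`StepResult` shapes and `onToolBatch` helper are assumptions for illustration:

```typescript
// Sketch of the corrected termination bridge: advance the workflow after each
// tool batch, steer only when there is a next step. Shapes are assumed.
interface StepResult { isComplete: boolean; stepPrompt: string }

interface AgentLike {
  steer(message: string): void; // inject the next step after a tool batch
}

// Called once per tool batch. Returns true when the workflow is done.
async function onToolBatch(
  agent: AgentLike,
  continueWorkflow: () => Promise<StepResult>,
): Promise<boolean> {
  const result = await continueWorkflow();
  if (result.isComplete) return true; // don't steer -> agent stops naturally
  agent.steer(result.stepPrompt);     // inner-loop injection, no extra LLM turn
  return false;
}
```

Termination is the *absence* of a `steer()` call, which is exactly why `followUp()` (which fires only at stop time) is the wrong hook.
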
**Claude Code deep dive -- THREE CORRECTIONS to backlog (deepest source read, 11 files):**

**Correction 1: session memory injection does NOT work for daemon mode.** The session memory file is Claude Code-internal, at a path only Claude Code controls. WorkRail's daemon calls the Anthropic API directly via pi-mono -- there is no Claude Code session memory file. **Daemon mode must use system prompt injection:** prepend a `<workrail_session_state>` XML block to the system prompt before each `agentLoop()` call (last 3 step note summaries, ~200 tokens each).

**Correction 2: PreCompact hooks do NOT fire for Tier 1 (Session Memory Compaction).** `trySessionMemoryCompaction()` runs before hooks are invoked, so when Tier 1 succeeds, PreCompact hooks are never called. Hooks only cover Tier 2 (legacy/reactive) compaction.

**Correction 3: `sessionRunner.ts` is NOT the daemon pattern.** It's Claude.ai web UI's bridge for controlling a local Claude CLI subprocess. WorkRail's daemon calls the Anthropic API directly.

**Correct integration architecture:**

For **human-driven sessions (Claude Code + WorkRail MCP):**
```
PreCompact hook → output step notes as custom compaction instructions
PostToolUse hook (Bash|Write|Edit) → log tool calls to evidence NDJSON file
PreToolUse hook (continue_workflow) → check evidence log; deny if required evidence missing
```
The evidence gate is fail-open when the log is missing.

For **daemon mode (WorkRail daemon + pi-mono):**
```
Before each agentLoop() call: prepend <workrail_session_state> XML to system prompt
Evidence gate: in-process check before executeContinueWorkflow() -- reads tool_call_observed
events from session store (stronger than hook-based, no subprocess reliability concern)
```

The in-process evidence gate is architecturally superior for daemon mode -- direct session store reads, no subprocess IPC.

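
The system-prompt injection from Correction 1 is a small pure function -- the tag name comes from the notes above, while `StepNote` and `buildSessionStateBlock` are illustrative names:

```typescript
// Build the <workrail_session_state> block from the last three step note
// summaries and prepend it to the base system prompt. Names are assumed.
interface StepNote { step: number; summary: string }

function buildSessionStateBlock(notes: StepNote[], basePrompt: string): string {
  const recent = notes.slice(-3); // last 3 summaries, ~200 tokens each
  const body = recent.map((n) => `<step n="${n.step}">${n.summary}</step>`).join("\n");
  return `<workrail_session_state>\n${body}\n</workrail_session_state>\n\n${basePrompt}`;
}
```

The daemon would call this before each `agentLoop()` invocation, so the agent always sees recent workflow state even without a Claude Code session memory file.
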
**OpenClaw final findings (deepest agent, 15+ source files):**

**`KeyedAsyncQueue` is the FIRST prerequisite -- build it before the daemon runner.** It prevents token corruption when multiple triggers fire concurrently. 30-80 LOC to reimplement from `src/acp/control-plane/session-actor-queue.ts`.

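
A minimal re-implementation sketch of the idea (not OpenClaw's actual code): tasks for the same key run strictly in order, while different keys run concurrently:

```typescript
// Per-key serialization: each key holds the promise tail of its queue;
// a new task chains after the current tail. Different keys never wait on
// each other. Class and method names are assumed.
class KeyedAsyncQueue {
  private tails = new Map<string, Promise<unknown>>();

  enqueue<T>(key: string, task: () => Promise<T>): Promise<T> {
    const tail = this.tails.get(key) ?? Promise.resolve();
    // Swallow a prior task's error so one failure doesn't poison the key's queue.
    const next = tail.catch(() => undefined).then(task);
    this.tails.set(key, next);
    return next;
  }
}
```

For the daemon, the key would be the session (or trigger) ID, so two concurrent webhook deliveries for the same session can never interleave their `continue_workflow` calls.
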

**`TriggerPlugin<TConfig, TCredentials>` interface:** Phase 1 MVP -- a typed interface + `TRIGGER_REGISTRY = new Map<TriggerId, TriggerPlugin>` + factory credential resolution. ~300 LOC. DI injection is deferred to Phase 2, when test coverage is needed. Use a branded `TriggerId` string type (not a closed union) -- extensible without recompiling.

**`deliveryContext` persistence (~20 LOC):** store routing info (MR iid, Jira key, Slack thread) in the session store at session creation. Crash recovery: on restart, `DeliveryRouter.resolve(deliveryContext)` knows where to post results.

**`TaskNotifyPolicy` enum:** `done_only` / `state_changes` / `silent` -- 5 LOC; adopt verbatim for trigger notification behavior config.

**`DaemonRegistry.snapshot()`:** the `RuntimeCache` equivalent (~50 LOC) feeding the console live view API. `snapshot()` returns the currently running sessions; `collectIdleCandidates()` drives GC.

**Pre-implementation checklist before any trigger code:**
1. `KeyedAsyncQueue` (prerequisite, ~50 LOC)
2. Branded `TriggerId` type
3. `never` branch in the startup switch over `TriggerInboundAdapter.kind`
4. `deliveryContext` stored at session start
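
Checklist items 2 and 3 are standard TypeScript idioms; a sketch, with the adapter kinds invented for illustration:

```typescript
// Item 2: branded TriggerId -- a plain string at runtime, but not assignable
// from arbitrary strings without going through the constructor.
type TriggerId = string & { readonly __brand: "TriggerId" };
const triggerId = (s: string) => s as TriggerId; // open set: extensible without recompiling

// Item 3: exhaustive switch with a never branch. Kind names are assumptions.
type TriggerInboundAdapter =
  | { kind: "webhook"; port: number }
  | { kind: "cron"; expression: string };

function describeAdapter(a: TriggerInboundAdapter): string {
  switch (a.kind) {
    case "webhook": return `webhook on :${a.port}`;
    case "cron": return `cron ${a.expression}`;
    default: {
      const exhaustive: never = a; // compile error if a new kind is unhandled
      throw new Error(`unknown adapter: ${JSON.stringify(exhaustive)}`);
    }
  }
}
```

Adding a third adapter kind then fails compilation at the `never` assignment until the startup switch handles it.
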

---

## Ultimate MVP -- non-blocking build order

Everything researched. Build order that ships fastest without blocking the future:

**Step 1 ✅ DONE:** `LocalSessionLockV2` workerId + instanceId fix (merged to main)

**Step 2: `KeyedAsyncQueue` (~50 LOC)**
Concurrent session serialization. Prerequisite for daemon safety. Prevents token corruption. Re-implement from the OpenClaw pattern, don't import.

**Step 3: `src/daemon/workflow-runner.ts` (~150 LOC)**
- `runWorkflow(trigger, apiKey)` calls the engine directly (in-process, shared DI)
- `@mariozechner/pi-agent-core`'s `Agent` class as the loop
- `agent.steer()` for step injection after each tool batch (NOT `followUp()`)
- Persist `continueToken` + `checkpointToken` to `~/.workrail/daemon-state.json` atomically before each step
- `isComplete=true` → stop calling `steer()` → agent exits naturally
- Register `start_workflow`, `continue_workflow`, `Bash`, `Read`, `Write` as `AgentTool<T>` with TypeBox schemas
- Inject a `<workrail_session_state>` XML block into the system prompt (last 3 step note summaries, ~200 tokens each)

**Step 4: Trigger webhook server (~300 LOC)**
- `TriggerPlugin<TConfig, TCredentials>` interface + `TRIGGER_REGISTRY` Map
- `POST /webhook/generic` -- accepts any JSON payload on port 3200
- `triggers.yml` config: `workflowId`, optional `contextMapping` (JSONPath payload → context)
- HMAC signature verification, async queue, 202 response
- Feature flag: `wr.features.triggers`
- Generic provider works for ALL integrations out of the box

**Step 5: Console live view (~3 files, no new routes)**
- `context_set(is_autonomous: true)` event at session start
- Ephemeral `DaemonRegistry` with `lastHeartbeatMs`
- `[ LIVE ]` pulsing badge in the session list

**What stays non-blocking:**
- In-process daemon → cloud HTTP client is a transport swap, not a rewrite
- `TriggerSourcePortV2` has a `poll()` interface → worker polling for cloud is config, not code
- Per-org session paths (`~/.workrail/sessions/default/`) → adding an orgId prefix later is a migration
- Feature flags on everything → merge incrementally
- Generic webhook → all integrations (GitLab, Jira, Slack) via config, zero code per integration

---

### Daemon context customization (implemented Apr 15, 2026)

**`~/.workrail/daemon-soul.md`** -- operator-customizable agent rules injected into every daemon session's system prompt. Analogous to nexus-core's `SOUL.md` and Common-Ground's `AGENTS.md`. A default is created on first run with commented instructions; override per-workspace or globally.

**Auto-inject `AGENTS.md` / `CLAUDE.md`** -- the daemon scans `workspacePath` for `.claude/CLAUDE.md`, `CLAUDE.md`, `AGENTS.md`, `.github/AGENTS.md` (in priority order) and injects them into the system prompt under `## Workspace Context`. Combined 32KB limit, truncated with a notice if over. This lets the daemon adapt to different repos' coding standards and conventions automatically -- the same way Claude Code uses these files.

**Daemon calls `start_workflow` directly** -- removes the "Call the start_workflow tool now" LLM indirection. The daemon calls `executeStartWorkflow()` directly, gets step 1, and passes it as the initial prompt. More reliable, cheaper (one fewer LLM turn), and the agent starts working immediately instead of being told to call a tool.

---

### WorkTrain onboarding: `worktrain init` guided setup (high priority, post-MVP)

**Goal:** a guided CLI onboarding that sets up everything WorkTrain needs to work well -- asked once, never asked again.

**What it configures (in order):**

1. **LLM provider** -- Bedrock (AWS SSO profile) or a direct Anthropic API key. Validates that the credentials actually work before proceeding.
2. **Workspace** -- the default workspacePath for daemon sessions. Offer to auto-detect from git repos in common locations.
3. **Daemon soul** -- create `~/.workrail/daemon-soul.md` interactively. Ask: "What language/framework does your main project use? Any coding conventions the agent should follow? Commit style?" Write the soul file from the answers.
4. **Trigger configuration** -- set up the first trigger. Ask: which workflow (list the available ones)? Which webhook source (GitHub/GitLab/Jira/manual)? Configure `triggers.yml`.
5. **Common-Ground** -- if detected, offer to sync the team's AGENTS.md and workflows.
6. **Notification** -- optional Slack/Telegram webhook for session completion/failure notifications.
7. **Verification** -- fire a smoke-test workflow (cheap, non-destructive) to confirm the end-to-end path works. Show the result.

**Design principles:**
- Skip sections that are already configured (idempotent)
- `--reconfigure <section>` re-runs a specific section
- All answers stored in `~/.workrail/config.json` (already exists) + `daemon-soul.md` + `triggers.yml`
- Should complete in under 5 minutes for a typical setup
- The soul questionnaire is the most important part -- a well-written soul dramatically improves output quality

**Longer term:** a WorkTrain hosted onboarding that teams can share via a URL (`worktrain init --from https://worktrain.io/teams/mercury-mobile`) -- imports team-specific soul, triggers, and workflow config in one command.


---

### Post-update onboarding: contextual feature announcements

**Goal:** when WorkTrain updates to a new version with significant new capabilities, it prompts the user to configure the new feature -- once, the first time they run after updating.

**How it works:**

Each significant feature ships with a migration step keyed to a minimum version:
```json
// ~/.workrail/config.json
{
  "onboardingCompleted": "3.17.0",
  "featureStepsCompleted": ["daemon-soul", "bedrock-setup", "triggers-v2"]
}
```

On startup, WorkTrain checks: is the current version newer than `onboardingCompleted`? Are there any `featureSteps` not in `featureStepsCompleted`? If yes, run those steps interactively before continuing.
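
That startup check is a few lines; a sketch using the config shape from the JSON example above -- `FeatureStep`, `pendingSteps`, and the naive version compare are illustrative:

```typescript
// Decide which feature onboarding steps to run on startup. The config shape
// matches the JSON above; the registry and helpers are assumptions.
interface OnboardingConfig {
  onboardingCompleted: string; // version at the last completed onboarding
  featureStepsCompleted: string[];
}

interface FeatureStep { id: string; minVersion: string }

// Naive x.y.z compare -- sufficient for plain semver without prerelease tags.
const newer = (a: string, b: string): boolean => {
  const [av, bv] = [a, b].map((v) => v.split(".").map(Number));
  for (let i = 0; i < 3; i++) if (av[i] !== bv[i]) return av[i] > bv[i];
  return false;
};

function pendingSteps(cfg: OnboardingConfig, current: string, registry: FeatureStep[]): FeatureStep[] {
  if (!newer(current, cfg.onboardingCompleted)) return []; // no update -> nothing to show
  return registry.filter((s) => !cfg.featureStepsCompleted.includes(s.id));
}
```

After each step runs (or is skipped), its id is appended to `featureStepsCompleted` and `onboardingCompleted` is bumped, so the prompt never repeats.
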

**What triggers a feature onboarding step:**
- New capability that requires user configuration to activate (e.g. daemon soul file, Bedrock credentials, new trigger source)
- Breaking change to config format that needs migration (e.g. triggers.yml schema v2)
- Feature that's opt-in and valuable but off by default (e.g. AGENTS.md auto-injection)

**What does NOT trigger it:**
- Bug fixes and performance improvements
- New workflows added to the library
- Any change that works without user input

**Tone:** brief, useful, never annoying. Each step should take < 60 seconds. Show what changed, ask what's needed, confirm it works. Skip if already configured.

**Example:**
```
WorkTrain updated to v4.1.0 ✦ One new capability to configure:

Workspace Context Injection
WorkTrain can now automatically read AGENTS.md and CLAUDE.md from
your repos and inject them into every agent session.

→ Your workspaces will be scanned automatically. No action needed.
→ To add custom rules for all sessions: ~/.workrail/daemon-soul.md
  (run: workrail init --section soul)

Press Enter to continue, or 's' to skip this setup.
```

---

### Multi-agent support: concurrent sessions + agent collaboration (high importance, post-MVP)

**Concurrent sessions (near-term):**
WorkTrain should run multiple workflows in parallel -- different agents on different repos or different tasks simultaneously. The current architecture supports this (per-session state files; `KeyedAsyncQueue` serializes per trigger ID), but the global concurrency cap from the arch audit still needs implementing:
- `maxConcurrentSessions: N` config in `~/.workrail/config.json`
- A global semaphore in `TriggerRouter` -- queues new dispatches when at capacity
- Console shows all concurrent sessions in the QueuePane
- Mobile monitoring shows a live count
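
The global cap reduces to a small counting semaphore; a sketch (the `SessionSemaphore` name and API are assumptions -- only `maxConcurrentSessions` comes from the design above):

```typescript
// Counting semaphore for the maxConcurrentSessions cap: at most `max` tasks
// run at once; later dispatches queue and receive the slot on release.
class SessionSemaphore {
  private active = 0;
  private waiters: Array<() => void> = [];
  constructor(private readonly max: number) {}

  private acquire(): Promise<void> {
    if (this.active < this.max) { this.active++; return Promise.resolve(); }
    return new Promise((resolve) => this.waiters.push(resolve)); // slot handed off in release()
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) next();     // transfer the slot directly to the next queued dispatch
    else this.active--;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try { return await task(); } finally { this.release(); }
  }
}
```

`TriggerRouter` would wrap each session dispatch in `sem.run(...)`, so webhooks past the cap simply wait in FIFO order instead of being dropped.
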

**Agent collaboration on a single task (longer-term):**
Multiple agents coordinating on one task. Two patterns:

1. **Coordinator + worker subagents** -- already possible today via WorkRail's existing `mcp__nested-subagent__Task` delegation in workflow steps. A coordinator workflow spawns subagents with scoped tasks (e.g. one agent writes Android code, another writes iOS). Each subagent has its own WorkRail session and reports back to the coordinator.

2. **Parallel agent teams** -- multiple agents working independently on separate parts of a task (e.g. separate feature branches) with a final merge/review step. Requires cross-repo execution and a workflow that understands how to partition and recombine work.

**MVP path:** concurrent sessions with `maxConcurrentSessions` first (a small change). Coordinator + subagent delegation second (it already works; it just needs workflow authoring). Full parallel teams are the longer-term investment.

---

### Core daemon design principle: scripts over agent (permanent)

**The agent is expensive, inconsistent, and slow. Scripts are free, deterministic, and instant.**

Any operation the daemon can perform with a shell script, git command, or API call should be done that way -- not delegated to the LLM. The agent's job is cognition: understanding the task, making decisions, writing code. Everything else is mechanical work that scripts do better.

**Concrete rule:** if an operation is deterministic and has no ambiguity, it is a script. Examples:

- `git add -A && git commit -m "..."` -- script (the daemon reads the handoff artifact the agent produced and runs this itself)
- `gh pr create --title "..." --body "..."` -- script (the daemon reads the PR title/body from the agent's handoff note)
- Running the build (`npm run build`, `gradle assembleDebug`) -- script
- Running tests (`npm test`, `./gradlew test`) -- script
- Checking whether a file exists -- script (use the Read tool; don't ask the agent)
- Detecting which workflow to run for a given trigger -- script (the workflowId is in `triggers.yml`)
- Formatting output, writing JSON state files, sending HTTP requests -- scripts

**The agent only does what requires judgment:**

- Understanding what files need to change and how
- Evaluating whether an approach matches the repo's patterns
- Generating commit messages and PR descriptions (because those require understanding the change)
- Deciding whether a test failure is a real issue or a flaky test
- Making tradeoff decisions when there are competing valid approaches

**Auto-commit and auto-PR design (near-term daemon work):**

The workflow's final step produces a structured handoff artifact with `commitType`, `commitScope`, `commitSubject`, `prTitle`, `prBody`, and `filesChanged`. The daemon reads this artifact after the workflow completes and runs the git commands directly:

```typescript
// After runWorkflow() resolves successfully:
const handoff = extractHandoffArtifact(result); // parse notes for the structured block
if (handoff && triggerConfig.autoCommit) {
  await execa('git', ['add', ...handoff.filesChanged], { cwd: workspacePath });
  await execa('git', ['commit', '-m', handoff.commitMessage], { cwd: workspacePath });
}
if (handoff && triggerConfig.autoOpenPR) {
  await execa('gh', ['pr', 'create', '--title', handoff.prTitle, '--body', handoff.prBody], { cwd: workspacePath });
}
```

`autoCommit` and `autoOpenPR` are opt-in flags in `triggers.yml`, default off. The daemon never commits without explicit config.

**Why this matters for quality:** LLM-run git commands have non-deterministic output, can hallucinate flags, and burn tokens on mechanical work. A script-run commit is always correct, always fast, always auditable. The agent writes the message; the daemon runs the command. That split is the right architecture.

**Key open question:** when two agents work on the same repo concurrently, file conflicts are possible. The right answer is git worktrees -- each agent gets its own worktree and merges at the end. This is what the `cw` command does for human developers; WorkTrain should do the same autonomously.
1370
+
1371
+ ---
1372
+
1373
+ ### Workflow complexity routing: fast-path thoroughness and subagent offloading (design questions, Apr 15, 2026)
1374
+
1375
+ Three open questions that should be resolved before the lean.v2 workflow is considered stable for autonomous use:
1376
+
1377
+ ---
1378
+
1379
+ **Q1: Is one step enough for Small tasks?**
1380
+
1381
+ Currently: Small tasks take one step (phase-5-small-task-fast-path). That step now requires wiring verification, build, tests, and a handoff artifact. But it is still one LLM context doing everything.
1382
+
1383
+ The real risk is not the number of steps -- it is context overload within that one step. If the task is genuinely small (add a CLI flag, fix a one-line bug), one focused context is fine and lower cost. But if "Small" is being misclassified -- or if the task is technically small but requires non-obvious wiring across several files -- a single context is likely to miss things.
1384
+
1385
+ **Tentative answer:** the classification is the real gate, not the step count. The fix is making Phase 0 classify more conservatively and making it easier to reclassify upward after the fast path discovers unexpected scope. A `reclassifyToMedium` escape hatch in the fast path step (sets a context var that routes to phase-3 planning) would cover the "started small, turned out bigger" case without forcing every Small task through the full path.
1386
+
1387
+ ---
1388
+
1389
+ **Q2: Should Medium tasks get a dedicated path?**
1390
+
1391
+ Currently: Medium falls into the same non-Small path as Large, which includes the full design review, plan audit, and final verification loops. For genuinely Medium tasks (well-understood, moderate scope, low architectural uncertainty), that path is too heavy.
1392
+
1393
+ **Tentative answer:** add a QUICK rigor path for Medium. The existing `rigorMode=QUICK` conditions already skip the hypothesis, deep design, and plan audit steps -- so Medium+QUICK already produces a lighter path. The issue is that the workflow doesn't explicitly name "Medium fast path" anywhere. Document that `taskComplexity=Medium + rigorMode=QUICK` is the intended Medium track. No new steps needed -- just make the intended routing explicit in Phase 0 guidance.
1394
+
1395
+ ---
1396
+
1397
+ **Q3: Should classification and context gathering be offloaded to subagents?**
1398
+
1399
+ The main agent's context is expensive and degrades as it fills up. The right architecture is:
1400
+
1401
+ - **Phase 0 (classify)**: delegate to a cheap subagent. It reads the task description, scans relevant files, and returns: `taskComplexity`, `riskLevel`, `rigorMode`, `candidateFiles`, `invariants`. Main agent reviews and accepts/overrides. Cost: one cheap context instead of part of the main context.
1402
+
1403
+ - **Context gathering (phase-1)**: already delegates to `routine-context-gathering` subagents. That's the right model. The question is whether those subagents share results via a persistent layer (knowledge graph) or repeat sweeps every session.
1404
+
1405
+ - **Design review, plan audit, final verification**: already delegate to routine subagents. Good.
1406
+
1407
+ The main agent should own: decisions, synthesis, and implementation. Everything else should be offloaded.
1408
+
1409
+ **Dependency:** subagent offloading at scale requires a reliable handoff/knowledge-sharing system. Right now subagent results live in step notes and context variables -- ephemeral, per-session. If agents are going to stop repeating repo sweeps, something needs to persist knowledge between sessions.
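The classification payload from the Phase 0 subagent could be typed roughly like this. The field names mirror the bullet above; the enum values and the accept/override helper are assumptions, not the actual lean.v2 contract:

```typescript
// Sketch of the Phase 0 subagent's return payload. Enum values and the
// accept/override flow are assumed, not the actual lean.v2 contract.
interface ClassificationResult {
  taskComplexity: "Small" | "Medium" | "Large";
  riskLevel: "Low" | "Medium" | "High";
  rigorMode: "QUICK" | "FULL";
  candidateFiles: string[]; // files the subagent expects the task to touch
  invariants: string[];     // constraints the main agent must preserve
}

// The main agent reviews the cheap subagent's output and may override it.
function accept(
  result: ClassificationResult,
  overrides: Partial<ClassificationResult> = {}
): ClassificationResult {
  return { ...result, ...overrides };
}

const reviewed = accept(
  {
    taskComplexity: "Small",
    riskLevel: "Low",
    rigorMode: "QUICK",
    candidateFiles: ["src/cli.ts"],
    invariants: [],
  },
  { taskComplexity: "Medium" } // main agent bumps the classification upward
);
```

The override path is what makes the cheap-subagent pattern safe: the main agent stays the decision-maker, it just stops paying for the legwork.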
1410
+
1411
+ ---
1412
+
1413
+ ### Knowledge graph for agent context (high importance, research needed, Apr 15, 2026)
1414
+
1415
+ **The problem:** every session starts with a full repo sweep. Context gathering subagents re-read the same files, re-trace the same call chains, re-identify the same invariants. This is expensive, slow, and scales badly as the codebase and team grow. The same problem appears in Storyforge (see `~/git/personal/storyforge/docs/architecture/design-notes/graph-memory-mcp.md`).
1416
+
1417
+ **The idea:** a persistent, derived knowledge graph that agents build incrementally and query instead of sweeping. Key properties from Storyforge's design thinking that apply directly to WorkRail:
1418
+
1419
+ - **Derived, not authoritative.** Source files are ground truth. The graph is a compiled/indexed view with provenance pointers back to source. Graph state never silently outranks a file read.
1420
+ - **Context bundles, not raw queries.** An agent doesn't query individual nodes -- it requests a context bundle: "give me everything relevant to `src/trigger/trigger-router.ts` for a bug investigation." The graph assembles and returns one scoped bundle.
1421
+ - **Provenance on every fact.** Every node/edge records: which file it came from, which session created it, which agent, when. Stale facts are detectable.
1422
+ - **Incremental, session-driven updates.** After each session completes, the daemon updates the graph with what the agent learned (new files read, new relationships traced, new invariants recorded). The graph grows session by session without requiring a full sweep.
1423
+
1424
+ **Node types for a code knowledge graph:**
1425
+ - `file` (path, language, last_modified, last_indexed)
1426
+ - `symbol` (function, class, type, constant -- with file + line)
1427
+ - `call_edge` (caller -> callee with file/line provenance)
1428
+ - `invariant` (named constraint with the files it spans)
1429
+ - `workflow_session` (what task was done, which files changed, what was found)
1430
+ - `dependency` (npm/gradle package with version)
1431
+ - `test` (test file -> symbols under test)
1432
+
1433
+ **Edge types:**
1434
+ - `imports`, `calls`, `exports`, `implements`, `extends`
1435
+ - `tested_by`, `modified_in_session`, `invariant_spans`
1436
+ - `depends_on`, `registered_in` (DI container, CLI map, router)
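A type-level sketch of this vocabulary. The kind names mirror the two lists above; the provenance fields follow the "provenance on every fact" property, and the shapes are otherwise assumed:

```typescript
// Sketch: node/edge shapes for the code knowledge graph. Kind names mirror
// the lists above; the field shapes themselves are assumed.
type NodeKind =
  | "file" | "symbol" | "call_edge" | "invariant"
  | "workflow_session" | "dependency" | "test";

type EdgeKind =
  | "imports" | "calls" | "exports" | "implements" | "extends"
  | "tested_by" | "modified_in_session" | "invariant_spans"
  | "depends_on" | "registered_in";

interface GraphNode {
  id: string;
  kind: NodeKind;
  attrs: Record<string, unknown>; // kind-specific: path, line, version, ...
  // Provenance on every fact:
  sourceFile: string;
  sessionId: string;
  agent: string;
  createdAt: string; // ISO timestamp -- makes stale facts detectable
}

interface GraphEdge {
  from: string;       // node id
  to: string;         // node id
  kind: EdgeKind;
  sourceFile: string; // provenance pointer back to source
  sourceLine?: number;
}

const edge: GraphEdge = {
  from: "file:src/cli.ts",
  to: "file:src/trigger/trigger-router.ts",
  kind: "imports",
  sourceFile: "src/cli.ts",
  sourceLine: 3,
};
```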
1437
+
1438
+ **What this solves for WorkRail:**
1439
+ - Context gathering drops from "sweep 200 files" to "query the graph for the relevant subgraph + fetch the 5-10 source files that are actually going to change"
1440
+ - Agents can ask "what other files import `trigger-router.ts`?" in one graph query instead of a grep sweep
1441
+ - The wiring check in the fast path becomes: "query the graph for all registrations of type `CliCommand`, confirm the new command is in the set" -- not "read index.ts, cli.ts, and hope you find all the entry points"
1442
+ - Session history is queryable: "what sessions touched `session-lock` in the last 30 days?" -- useful for debugging and for not re-investigating known issues
1443
+
1444
+ **The target architecture: vector + graph hybrid (not just a relational index)**
1445
+
1446
+ The knowledge graph vision is more than a queryable symbol index. The real goal is a system where an agent asks "give me everything related to trigger-router.ts" and the system surfaces things that are *semantically relevant* -- not just things explicitly linked by import edges, but files that implement the same pattern, functions with similar signatures, sessions that touched related concepts. This is closer to a neural network for knowledge than to a SQL database.
1447
+
1448
+ This requires two complementary layers:
1449
+
1450
+ **Layer 1: Structural graph (hard edges, deterministic)**
1451
+ Built by parsing the codebase. Captures known, explicit relationships:
1452
+ - `imports`, `calls`, `exports`, `implements`, `extends`
1453
+ - `registers_in` (DI container, CLI command map, router)
1454
+ - `tested_by`, `modified_in_session`
1455
+
1456
+ This layer answers precise questions with certainty: "what imports trigger-router.ts?", "what CLI commands are registered?", "what did session X touch?" Built by scripts (ts-morph for TypeScript, equivalent parsers for other languages), never by an LLM. Fast, deterministic, always correct.
1457
+
1458
+ **Layer 2: Vector similarity (soft weights, semantic)**
1459
+ Every node in the structural graph also gets an **embedding** -- a vector encoding its semantic meaning (function name + signature + docstring + surrounding context). Nodes that are semantically similar end up geometrically close in vector space, regardless of whether they have an explicit edge between them.
1460
+
1461
+ This layer answers fuzzy questions: "what is conceptually related to this?", "what files implement patterns similar to this one?", "what past sessions are relevant to this bug?" It surfaces things the agent didn't know to look for.
1462
+
1463
+ **Together:** the structural graph provides the skeleton; the vector layer provides the connective tissue. An agent query resolves both: exact structural neighbors first, semantically similar nodes ranked by distance second. Per-repo and per-module scoping is handled naturally -- intra-repo edges are hard structural links, cross-repo relevance falls back to vector similarity.
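A toy version of that resolution order -- exact structural neighbors first, then the top-k semantically closest nodes by cosine similarity. The three-dimensional embeddings and file names are made up for illustration; real embeddings are high-dimensional vectors from the embedding model:

```typescript
// Toy sketch: merge hard structural neighbors with vector-ranked ones.
// The 3-d embeddings here are made up; real ones come from an embedding model.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Indexed { id: string; embedding: number[] }

function contextBundle(
  target: Indexed,
  structuralNeighbors: string[], // hard edges from the graph layer
  all: Indexed[],                // candidates from the vector store
  k = 2
): string[] {
  const seen = new Set(structuralNeighbors);
  seen.add(target.id);
  const semantic = all
    .filter(n => !seen.has(n.id))
    .map(n => ({ id: n.id, score: cosine(target.embedding, n.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(n => n.id);
  // Skeleton first, connective tissue second.
  return [...structuralNeighbors, ...semantic];
}
```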
1464
+
1465
+ **Technology layers:**
1466
+
1467
+ | Layer | Technology | Role |
1468
+ |-------|-----------|------|
1469
+ | Structural parsing | ts-morph (TypeScript), tree-sitter (other langs) | Extract hard edges deterministically |
1470
+ | Structural storage + traversal | DuckDB | Store nodes/edges, recursive reachability queries |
1471
+ | Vector embeddings | Local embedding model (e.g. `nomic-embed-text` via Ollama, or `@xenova/transformers`) | Encode every node as a vector |
1472
+ | Vector storage + similarity search | LanceDB (embedded, TypeScript-native) or Qdrant (self-hosted) | ANN search over embeddings |
1473
+ | Unified query layer | WorkTrain MCP tool | Single `query_knowledge_graph(intent)` call returns merged structural + semantic results |
1474
+
1475
+ LanceDB is the strongest fit for the vector layer: embedded (no server process), TypeScript-native, local-first, co-locates vector and metadata in the same store. It pairs cleanly with DuckDB handling the structural/relational queries.
1476
+
1477
+ **Build order (spike first, hybrid later):**
1478
+
1479
+ The structural layer (ts-morph + DuckDB) is the right first spike because:
1480
+ 1. It answers the immediately valuable questions (wiring checks, import graphs, CLI registration)
1481
+ 2. It produces the nodes that the vector layer will embed -- you can't embed nothing
1482
+ 3. It proves the foundation before adding semantic complexity
1483
+
1484
+ Once the structural spike works, add the vector layer: embed each node's name + context, store in LanceDB, expose a similarity query alongside the structural query. The two layers are additive -- the structural layer doesn't get replaced, it gets augmented.
1485
+
1486
+ **Per-repo and per-module scoping:**
1487
+ Each repo gets its own structural graph partition and its own vector namespace. Cross-repo queries join partitions explicitly (structural) or search across namespaces with a distance penalty (semantic). The system handles this automatically once the partition boundaries are defined at index time. Finer-grained module-level scoping falls out naturally from the structural graph -- the subgraph rooted at a module's entry point is the module's partition.
1488
+
1489
+ **WorkRail fits:** the graph becomes a new WorkRail source -- `graphSource` alongside `bundledSource`, `userSource`, and `managedSource`. The MCP server exposes `query_knowledge_graph(intent)`. Workflow steps call it instead of running file sweeps. The daemon runs the indexer post-session as a script (structural layer: re-index changed files; vector layer: re-embed changed nodes).
1490
+
1491
+ **Cross-project note:** the same architecture applies to any domain pack. Storyforge's graph has narrative nodes (characters, promises, locations) instead of code nodes, and a different parser (YAML/markdown instead of ts-morph), but the same two-layer design -- structural edges + vector embeddings -- gives it both "what chapters does this character appear in?" (structural) and "what story elements are thematically related to this scene?" (semantic).
1492
+
1493
+ ---
1494
+
1495
+ ### Knowledge graph candidate research findings (Apr 15, 2026)
1496
+
1497
+ Four discovery subagents evaluated Cognee, GraphRAG, LightRAG, Mem0, Zep, Sourcegraph, LSP, ctags, tree-sitter, ts-morph, and DuckDB against a pure relational/structural framing. Findings below, updated with the corrected hybrid architecture understanding.
1498
+
1499
+ **Structural layer decision: ts-morph + DuckDB for the spike.**
1500
+
1501
+ - **ts-morph**: wraps the real TypeScript Compiler API, not a generic parser. Extracts exports, imports, call sites, class implementations, DI `.bind()` patterns, CLI registration maps. In-process, zero external dependencies. Strictly better than tree-sitter for a TypeScript codebase.
1502
+ - **DuckDB**: embedded SQL with recursive CTEs. Handles structural reachability queries. No server process. A 1-day spike, not a 2-week project.
1503
+ - **LSP**: correct answers but requires managing a long-running server process -- wrong operational model.
1504
+ - **ctags**: definitions only, no call edges. Too shallow.
1505
+ - **Sourcegraph**: right idea, enterprise weight. Overkill for local daemon use.
1506
+
1507
+ **Vector layer decision: LanceDB (deferred to post-spike).**
1508
+
1509
+ - **LanceDB**: embedded, TypeScript-native, local-first. Best fit for the vector layer alongside DuckDB.
1510
+ - **Qdrant**: self-hosted, strong ANN performance. Good alternative if LanceDB proves insufficient at scale.
1511
+ - **Weaviate**: vector + graph hybrid in one system. Worth revisiting if maintaining two separate stores becomes painful -- it does both layers but is heavier to self-host.
1512
+
1513
+ **Why GraphRAG/Cognee/LightRAG don't fit (even with the hybrid architecture):**
1514
+ These tools use LLMs to *build* the graph -- entity extraction, relationship identification, summarization all require LLM calls during indexing. That violates the scripts-over-agent principle. The structural layer must be deterministic (parser-built); the vector layer uses an embedding model (deterministic given the same input), not a generative LLM. GraphRAG's semantic richness is real but the wrong tradeoff for a system that needs to re-index after every session without burning tokens.
1515
+
1516
+ **The spike (structural layer, build now):**
1517
+ 1. `npm install ts-morph @duckdb/node-api`
1518
+ 2. 50-line indexer: `project.getSourceFiles()` → walk exports, imports, call expressions → rows into DuckDB nodes/edges tables
1519
+ 3. One MCP tool: `query_knowledge_graph(query: string)` running SQL, returning a context bundle
1520
+ 4. Validation: "what imports trigger-router.ts?" and "what CLI commands are registered?" must return correct answers
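The row-emitting shape of step 2 can be sketched without any dependencies -- here a naive regex stands in for ts-morph's compiler-backed extraction. The real spike should use `project.getSourceFiles()` and proper import resolution; this stand-in misses multi-line imports and re-exports and exists only to show what the DuckDB rows look like:

```typescript
// Sketch of the rows the indexer emits. A naive regex stands in for
// ts-morph's compiler-backed extraction -- it misses multi-line imports,
// re-exports, etc. The real spike should use the TypeScript Compiler API.
interface EdgeRow { fromFile: string; toModule: string; kind: "imports" }

function extractImportEdges(file: string, source: string): EdgeRow[] {
  const rows: EdgeRow[] = [];
  const re = /import\s[^'"]*['"]([^'"]+)['"]/g;
  for (const m of source.matchAll(re)) {
    rows.push({ fromFile: file, toModule: m[1], kind: "imports" });
  }
  // Next step in the spike: INSERT these rows into the DuckDB edges table.
  return rows;
}
```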
1521
+
1522
+ **Post-spike (vector layer):**
1523
+ 1. `npm install vectordb` (LanceDB) + local embedding model via Ollama or `@xenova/transformers`
1524
+ 2. After each structural node is created, embed `name + file + context snippet` → store vector alongside node ID in LanceDB
1525
+ 3. Extend `query_knowledge_graph` to merge: structural neighbors (DuckDB) + semantic neighbors (LanceDB ANN search) → unified ranked context bundle
1526
+ 4. Validate: "what is related to trigger-router.ts?" should surface files not directly imported but implementing the same webhook/routing pattern
1527
+
1528
+ **Incremental update model:**
1529
+ After each daemon session completes, re-index only files in the handoff artifact's `filesChanged` list (structural: ts-morph re-parse; vector: re-embed changed nodes). Full rebuild only on first run or schema changes. Script, not agent.
1530
+
1531
+ ---
1532
+
1533
+ ### Polling trigger model: zero-external-config integrations (Apr 15, 2026)
1534
+
1535
+ **Problem with webhooks:** GitLab/GitHub webhooks require admin access to the project, a publicly reachable URL, and per-project setup. Three friction points that break the freestanding, zero-config philosophy.
1536
+
1537
+ **Solution: polling triggers.** WorkTrain polls external APIs on a schedule instead of waiting for pushes. No external system configuration required -- just a token.
1538
+
1539
+ ```yaml
1540
+ # triggers.yml example
1541
+ triggers:
1542
+ - id: new-mrs
1543
+ type: gitlab_poll
1544
+ source:
1545
+ baseUrl: https://gitlab.com
1546
+ projectId: 12345
1547
+ token: $GITLAB_TOKEN
1548
+ events: [merge_request.opened, merge_request.updated]
1549
+ pollIntervalSeconds: 60
1550
+ workflowId: mr-review-workflow-agentic
1551
+ goalTemplate: "Review MR !{{$.iid}}: {{$.title}}"
1552
+ workspacePath: ~/git/my-project
1553
+ ```
1554
+
1555
+ **What to build:**
1556
+ 1. `PollingTriggerSource` -- new source type alongside existing `generic` (webhook). Fields: `pollIntervalSeconds`, `token`, `baseUrl`, `projectId`, `events`.
1557
+ 2. `PolledEventStore` -- lightweight local state file (`~/.workrail/polled-events.json`) tracking which event IDs have been processed. Prevents re-firing after restart.
1558
+ 3. Polling scheduler in the daemon -- calls `TriggerRouter.dispatch()` directly when a new event is detected. Clean integration, no new routing plumbing.
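The core of the `PolledEventStore` is a small dedup set. A sketch of the in-memory logic, with the flush to `~/.workrail/polled-events.json` elided:

```typescript
// Sketch: tracks processed event IDs so a poll cycle after a daemon restart
// doesn't re-fire triggers. Persistence to ~/.workrail/polled-events.json
// is elided; this is just the in-memory core.
class PolledEventStore {
  private seen = new Set<string>();

  constructor(persisted: string[] = []) {
    persisted.forEach(id => this.seen.add(id));
  }

  // Returns only the events that haven't been dispatched yet,
  // and marks them as processed.
  filterNew<T extends { id: string }>(events: T[]): T[] {
    const fresh = events.filter(e => !this.seen.has(e.id));
    fresh.forEach(e => this.seen.add(e.id));
    return fresh;
  }

  toJSON(): string[] {
    return [...this.seen]; // what would be written to disk
  }
}
```

Each poll cycle feeds the API response through `filterNew` and dispatches only what comes back, so a restart (re-hydrating from the JSON file) never re-fires old events.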
1559
+
1560
+ **Generalizes to all sources without external config:**
1561
+ - GitHub: poll `/repos/:owner/:repo/pulls`
1562
+ - Jira: poll `/rest/api/3/search?jql=...`
1563
+ - Linear: poll GraphQL for new issues
1564
+ - Slack: poll conversations for pattern matches
1565
+
1566
+ **This is the preferred trigger model for external integrations.** Webhooks remain available for high-volume or latency-sensitive use cases, but polling is the default for everything else -- it works behind firewalls, requires no admin access, and fits `worktrain init` naturally (just ask for a token).
1567
+
1568
+ **Tradeoff:** up to `pollIntervalSeconds` latency (60s default). Acceptable for MR reviews and most agentic tasks. Not acceptable for real-time chat bots.
1569
+
1570
+ **Market research needed before building the knowledge graph:**
1571
+ Several tools in this space are worth evaluating before building from scratch:
1572
+
1573
+ - **CodeGraph / Tree-sitter based indexes** -- open source, parse-based symbol graphs. Fast to build, no LLM required, but only structural (no semantic edges).
1574
+ - **Sourcegraph** -- enterprise code search + graph. Well-proven at scale. Question: does it expose an API suitable for agent context bundle queries? Overkill for solo/small team.
1575
+ - **Microsoft GraphRAG** -- LLM-built knowledge graphs with community detection. Research project, but directly relevant architecture. Slower to build (LLM-driven), richer semantic edges.
1576
+ - **Cognee** -- open source knowledge graph + RAG, designed for agent workflows. Active project, worth a close look.
1577
+ - **Mem0** -- agent memory layer with graph backend. Simpler than Cognee but less code-specific.
1578
+ - **tree-sitter + DuckDB** -- build-it-yourself option: tree-sitter parses symbols + call graph, DuckDB stores and queries. Full control, no external dependency, fits WorkRail's freestanding philosophy.
1579
+
1580
+ **Recommended approach:** research Cognee and tree-sitter+DuckDB first. Cognee may already solve 80% of this. If not, tree-sitter+DuckDB is the build path -- it fits the "scripts over agent" principle (the graph is built by a deterministic parser, not by asking an LLM to summarize files).
1581
+
1582
+ **WorkRail fits:** the graph is a new WorkRail source -- `graphSource` alongside `bundledSource`, `userSource`, and `managedSource`. The MCP server exposes `query_knowledge_graph` and `update_knowledge_graph` tools. Workflow steps call those tools instead of running file sweeps. The daemon updates the graph after each session completes (script, not agent).
1583
+
1584
+ **Cross-project note:** Storyforge will likely need the same graph layer. Worth building it once in WorkRail and making it available to both -- the node/edge schema is different (code vs narrative) but the architecture (derived layer, provenance, context bundles, session-driven updates) is identical.
1585
+
1586
+ ---
1587
+
1588
+ ### Knowledge graph candidate research findings: detailed evaluation (Apr 15, 2026)
1589
+
1590
+ Four discovery subagents evaluated Cognee, GraphRAG, LightRAG, Mem0, Zep, Sourcegraph, LSP, ctags, tree-sitter, ts-morph, and DuckDB. Findings were unanimous.
1591
+
1592
+ **Decision: ts-morph + DuckDB. Spike it now.**
1593
+
1594
+ **Why the others lost:**
1595
+
1596
+ - **Cognee**: Python-only SDK, no TypeScript client, no code-aware indexing primitives. Built for document RAG, not code graphs. Watch list only.
1597
+ - **GraphRAG / LightRAG**: Use LLMs to build the graph -- violates the "scripts over agent" principle. Non-deterministic output, expensive, no TypeScript client. Skip.
1598
+ - **Mem0 / Zep**: Conversational/session memory, not code graphs. Orthogonal problem. Skip for this use case.
1599
+ - **Sourcegraph**: Enterprise-scale, heavy Docker infrastructure. Overkill for local daemon use. Skip.
1600
+ - **LSP (typescript-language-server)**: Queryable from Node.js but requires managing a separate long-running process with stdio IPC. Correct answers, wrong operational model for a daemon.
1601
+ - **universal-ctags**: Definitions only, no call edges or cross-file references. Too shallow.
1602
+ - **tree-sitter**: Generic parser, good but requires custom TypeScript-specific traversal logic. ts-morph is strictly better for a TypeScript codebase because it uses the real TypeScript compiler.
1603
+
1604
+ **Why ts-morph + DuckDB wins:**
1605
+
1606
+ - **ts-morph** wraps the TypeScript Compiler API directly -- it understands TypeScript semantics, types, and scopes, not just syntax. Extracts exports, imports, call sites, class implementations, DI `.bind()` patterns, and CLI registration maps out of the box. Runs in-process, zero external dependencies.
1607
+ - **DuckDB** is embedded SQL with recursive CTE support. Graph reachability queries work today with `WITH RECURSIVE`. Fast, local, no server process.
1608
+ - Combined: a 1-day spike, not a 2-week project.
1609
+
1610
+ **Schema (from subagent):**
1611
+ ```sql
1612
+ nodes (id, file, name, kind, scope)
1613
+ -- kind: "function" | "class" | "interface" | "constant" | "export" | "di_binding" | "cli_command"
1614
+
1615
+ edges (from_id, to_id, kind, line)
1616
+ -- kind: "calls" | "imports" | "exports" | "registers_in" | "provides"
1617
+
1618
+ provenance (node_id, source_file, source_line, session_id, indexed_at)
1619
+ ```
1620
+
1621
+ **Reachability query example:**
1622
+ ```sql
1623
+ WITH RECURSIVE reachable AS (
1624
+ SELECT id FROM nodes WHERE name = 'executeVersionCommand'
1625
+ UNION ALL
1626
+ SELECT e.to_id FROM edges e JOIN reachable r ON e.from_id = r.id
1627
+ )
1628
+ SELECT n.* FROM nodes n WHERE n.id IN (SELECT id FROM reachable);
1629
+ ```
1630
+
1631
+ **The spike (what to build first):**
1632
+ 1. `npm install ts-morph @duckdb/node-api` -- both are available today
1633
+ 2. Write a 50-line indexer: `project.getSourceFiles()` → walk exports, imports, and call expressions → emit rows to DuckDB nodes/edges tables
1634
+ 3. Write one MCP tool: `query_knowledge_graph(query: string)` that runs SQL and returns a context bundle
1635
+ 4. Test it against the WorkRail `src/` directory: can it answer "what imports trigger-router.ts?" and "what CLI commands are registered?"
1636
+
1637
+ If the spike answers those two questions correctly, the foundation is proven and we build out incrementally from there.
1638
+
1639
+ **Incremental update model (post-spike):**
1640
+ After each daemon session completes, run the indexer only on files that appear in the session's `filesChanged` list (from the handoff artifact). Full re-index only on first run or when the schema changes. This is a script the daemon runs post-workflow, not an agent task.
1641
+
1642
+ ---
1643
+
1644
+ ### Dynamic pipeline composition: task maturity determines the workflow mix (Apr 15, 2026)
1645
+
1646
+ **The insight:** not all tasks are equal in how much work is needed before implementation. A raw idea needs a completely different pipeline than a fully-specced ticket with BRD and designs. WorkTrain should compose the pipeline dynamically based on what already exists, not always run the same fixed set of phases.
1647
+
1648
+ **The maturity spectrum:**
1649
+
1650
+ ```
1651
+ Raw idea                                                 Fully specced
+ │                                                        │
+ ▼                                                        ▼
+ "it would be nice if..."   "here's the BRD, designs,    "fix this bug in
+                             acceptance criteria, and     file X, line Y"
+                             ticket with all context"
1657
+ ```
1658
+
1659
+ **What changes at each maturity level:**
1660
+
1661
+ | What exists | Pipeline additions |
1662
+ |-------------|-------------------|
1663
+ | Nothing -- just an idea | ideation → market research → feasibility → scope definition → spec authoring → design → ticket creation → then all of implementation phases |
1664
+ | Rough spec or ticket | clarify requirements → design → then implementation |
1665
+ | BRD + designs | architecture review → implementation |
1666
+ | BRD + designs + arch decision | implementation only |
1667
+ | Fully specced + arch decided | coding → review → audit → verify |
1668
+ | Code written, needs validation | review → audit → test → verify |
1669
+
1670
+ **How classify-task-workflow learns maturity:**
1671
+ The classify step doesn't just classify complexity and risk -- it also assesses maturity:
1672
+ - `taskMaturity`: idea / rough / specced / ready / code-complete
1673
+ - `existingArtifacts`: which of [brd, designs, arch-decision, acceptance-criteria, ticket, implementation] exist
1674
+ - `missingArtifacts`: what needs to be created before implementation can begin
1675
+
1676
+ The coordinator script uses `taskMaturity` and `missingArtifacts` to prepend the right phases to the pipeline.
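A sketch of that prepending logic in the coordinator. The phase names follow the table above and the maturity labels follow the classifier vars; the exact phase lists are illustrative:

```typescript
// Sketch: compose the pipeline by prepending phases based on task maturity.
// Phase names follow the table above; exact lists are illustrative.
type Maturity = "idea" | "rough" | "specced" | "ready" | "code-complete";

const IMPLEMENTATION = ["coding", "review", "audit", "verify"];

const PREP: Record<Maturity, string[]> = {
  "idea": ["ideation", "market-research", "spec-authoring", "grooming",
           "design", "architecture", "ticket-creation"],
  "rough": ["clarify-requirements", "design"],
  "specced": ["architecture-review"],
  "ready": [],
  "code-complete": [], // validation-only: handled below
};

function composePipeline(maturity: Maturity): string[] {
  // Code already written: skip straight to the validation phases.
  if (maturity === "code-complete") return ["review", "audit", "test", "verify"];
  return [...PREP[maturity], ...IMPLEMENTATION];
}
```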
1677
+
1678
+ **New workflows needed for the early phases:**
1679
+
1680
+ | Workflow | Purpose |
1681
+ |----------|---------|
1682
+ | `ideation-workflow` | Expand a raw idea into a structured opportunity: problem statement, user value, rough scope, open questions |
1683
+ | `market-research-workflow` | Research whether this problem is solved elsewhere, what competitors do, what patterns exist |
1684
+ | `spec-authoring-workflow` | Author a BRD/PRD from scratch: user stories, acceptance criteria, non-goals, success metrics |
1685
+ | `ticket-creation-workflow` | Break a spec into actionable tickets with proper sizing and dependencies |
1686
+ | `grooming-workflow` | Review a spec or ticket for completeness, edge cases, and implementation readiness |
1687
+
1688
+ **The full lifecycle pipeline for a raw idea:**
1689
+ ```
1690
+ idea → ideation → market research → spec authoring → grooming/validation
1691
+ → design (if hasUI) → architecture → ticket creation
1692
+ → for each ticket: implementation pipeline
1693
+ → integration testing → production audit → ship
1694
+ ```
1695
+
1696
+ **The key design:** the coordinator script drives all of this. It checks what artifacts exist, decides which phases to run, spawns workers for each phase in the right order, and gates on artifacts before proceeding. No human needed to manage the pipeline -- the maturity assessment tells the coordinator exactly what to do.
1697
+
1698
+ **Context from today's session as evidence:** we've been doing exactly this manually -- ideas emerged in conversation (coordinator sessions, message queue, knowledge graph), we groomed them into backlog items, the backlog items have varying levels of completeness, and different agents are running different phases based on where each item is in the lifecycle. WorkTrain should own this entire flow.
1699
+
1700
+ ---
1701
+
1702
+ ### Verification and proof as first-class citizens (Apr 15, 2026)
1703
+
1704
+ **The problem:** today there's no single place that tells you "here's everything that was done to verify this feature is correct." Tests pass, a review ran, an audit happened -- but it's scattered across session notes, PR descriptions, CI logs, and half-remembered conversations. No verification chain.
1705
+
1706
+ **The vision:** every shipped change has a **proof record** -- a structured document that answers: what was built, how was it verified, by whom (which agents), and what was the verdict at each gate. Not a summary for humans -- a queryable record that the coordinator and watchdog can use to enforce quality gates and answer questions like "has this module been production-audited in the last 30 days?"
1707
+
1708
+ **What a proof record contains:**
1709
+
1710
+ ```json
1711
+ {
1712
+ "prNumber": 402,
1713
+ "goal": "auto-commit and auto-PR daemon feature",
1714
+ "verificationChain": [
1715
+ {
1716
+ "kind": "unit_tests",
1717
+ "outcome": "pass",
1718
+ "coverage": "14 tests, delivery-action.ts covered",
1719
+ "sessionId": "sess_abc123",
1720
+ "timestamp": "2026-04-15T22:00:00Z"
1721
+ },
1722
+ {
1723
+ "kind": "mr_review",
1724
+ "outcome": "request_changes",
1725
+ "findings": [{ "severity": "Major", "id": "F1", "description": "shell injection via exec()" }],
1726
+ "sessionId": "sess_def456",
1727
+ "timestamp": "2026-04-15T22:10:00Z"
1728
+ },
1729
+ {
1730
+ "kind": "mr_review",
1731
+ "outcome": "approve",
1732
+ "findings": [],
1733
+ "sessionId": "sess_ghi789",
1734
+ "timestamp": "2026-04-15T23:00:00Z"
1735
+ },
1736
+ {
1737
+ "kind": "production_audit",
1738
+ "outcome": "pass",
1739
+ "sessionId": "sess_jkl012",
1740
+ "timestamp": "2026-04-15T23:05:00Z"
1741
+ }
1742
+ ],
1743
+ "gates": {
1744
+ "unit_tests": "pass",
1745
+ "mr_review": "approved",
1746
+ "production_audit": "pass",
1747
+ "architecture_audit": "skipped (riskLevel=Medium)"
1748
+ },
1749
+ "overallVerdict": "verified",
1750
+ "mergedAt": "2026-04-15T23:15:00Z"
1751
+ }
1752
+ ```
1753
+
1754
+ **Verification gates the coordinator enforces:**
1755
+
1756
+ | Gate | Required for | Trigger |
1757
+ |------|-------------|---------|
1758
+ | Unit tests pass | All changes | After coding, before review |
1759
+ | MR review approved (no Critical/Major) | All changes | After unit tests |
1760
+ | Architecture audit | `touchesArchitecture=true` or `riskLevel=High` | Before coding |
1761
+ | Production audit | `riskLevel=High` or affects prod paths | After coding |
1762
+ | Integration tests | `taskComplexity=Large` | After all slices |
1763
+ | Performance audit | touches hot paths | After coding |
1764
+ | Security audit | touches auth/input/external | After coding |
1765
+
1766
+ No PR merges without passing all required gates for its classification. The coordinator enforces this -- not as a suggestion, but as a hard gate in the script.
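The gate check itself can be a few lines in the coordinator script. This sketch covers a subset of the table's rows (the hot-path performance and security triggers are omitted) and assumes the `gates` map from the proof record shown earlier:

```typescript
// Sketch: derive required gates from the classification (a subset of the
// table's rows) and check them against a proof record's `gates` map.
interface Classification {
  taskComplexity: "Small" | "Medium" | "Large";
  riskLevel: "Low" | "Medium" | "High";
  touchesArchitecture: boolean;
}

function requiredGates(c: Classification): string[] {
  const gates = ["unit_tests", "mr_review"]; // required for all changes
  if (c.touchesArchitecture || c.riskLevel === "High") gates.push("architecture_audit");
  if (c.riskLevel === "High") gates.push("production_audit");
  if (c.taskComplexity === "Large") gates.push("integration_tests");
  return gates;
}

// Hard gate: a PR merges only if every required gate passed.
function canMerge(c: Classification, proof: Record<string, string>): boolean {
  return requiredGates(c).every(g => proof[g] === "pass" || proof[g] === "approved");
}
```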
1767
+
1768
+ **Visibility surfaces:**
1769
+
1770
+ 1. **Console PR view** -- shows the full verification chain for any merged or open PR. Expandable: click any gate to see the session notes from that review.
1771
+
1772
+ 2. **Module health dashboard** -- per module (e.g. `src/trigger/`, `src/daemon/`), shows: last MR review date, last production audit date, test coverage, open findings. Answers "is this module production-ready right now?"
1773
+
1774
+ 3. **`worktrain verify <pr-number>`** -- command that checks whether a PR has passed all required gates for its classification. Output: pass/fail per gate, with session links.
1775
+
1776
+ 4. **Proof record in every PR description** -- auto-generated section: "Verification chain: ✅ 14 unit tests | ✅ MR review (0 findings) | ✅ Production audit | ⏭ Architecture audit (skipped: riskLevel=Low)"
1777
+
1778
+ **Why this matters:**
1779
+ Right now, "has this been reviewed and audited?" is a question that requires reading through PRs and session notes. With proof records, it's a query: `SELECT * FROM proof_records WHERE module='src/trigger/' AND kind='production_audit' AND outcome='pass' AND timestamp > NOW()-30days`. The knowledge graph stores these records. The watchdog checks them on a schedule. The coordinator gates on them before merging. Verification becomes infrastructure, not process.
1780
+
+ ---
1781
+
1782
+ ### Dynamic model selection: right model for the right task (Apr 15, 2026)
1783
+
1784
+ **The principle:** not every task needs Sonnet 4.6. Not every task should be locked to Anthropic. The coordinator and the task classifier should be able to select the model dynamically based on what the task actually needs.
1785
+
1786
+ **Why this matters:**
1787
+ - **Cost**: classification, simple routing decisions, and status checks don't need a frontier model. A fast cheap model (Haiku) costs ~20x less and is fast enough for deterministic tasks.
1788
+ - **Quality ceiling**: some tasks (complex architecture decisions, multi-file refactors) benefit from the best available model regardless of cost.
1789
+ - **Provider flexibility**: Anthropic goes down, pricing changes, a new provider releases a better model. Being locked to one provider is an operational risk and a competitive disadvantage.
1790
+ - **Specialization**: some models are better at specific tasks -- code generation, reasoning, multimodal (if designs/screenshots are involved).
1791
+
1792
+ **Model selection in triggers.yml:**
1793
+ ```yaml
1794
+ triggers:
1795
+ - id: mr-review
1796
+ workflowId: mr-review-workflow.agentic.v2
1797
+ agentConfig:
1798
+ model: claude-sonnet-4-6 # explicit override
1799
+ provider: anthropic # or: amazon-bedrock, openai, gemini
1800
+
1801
+ - id: classify-task
1802
+ workflowId: classify-task-workflow
1803
+ agentConfig:
1804
+ model: claude-haiku-4-5 # fast + cheap for classification
1805
+ provider: amazon-bedrock
1806
+
1807
+ - id: architecture-design
1808
+ workflowId: architecture-scalability-audit
1809
+ agentConfig:
1810
+ model: claude-opus-4-6 # best available for high-stakes design
1811
+ provider: anthropic
1812
+ ```
1813
+
1814
+ **Model selection in the classifier output:**
1815
+ The classify-task-workflow can output a `recommendedModel` var alongside the pipeline:
1816
+ - `Small + Low` → Haiku (fast, cheap)
1817
+ - `Medium + Medium` → Sonnet (balanced)
1818
+ - `Large + High` or `touchesArchitecture=true` → Opus (best quality)
1819
+
1820
+ The coordinator script reads `recommendedModel` from the classifier and passes it as `agentConfig` when spawning child sessions.
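That routing table is small enough to inline in the classifier. A sketch, using the model IDs from the examples above:

```typescript
// Sketch: the classifier's recommendedModel mapping, following the routing
// rules listed above. Model IDs are copied from this doc's examples.
type Complexity = "Small" | "Medium" | "Large";
type Risk = "Low" | "Medium" | "High";

function recommendedModel(
  complexity: Complexity,
  risk: Risk,
  touchesArchitecture = false
): string {
  if (complexity === "Large" || risk === "High" || touchesArchitecture) {
    return "claude-opus-4-6";   // best quality for high-stakes work
  }
  if (complexity === "Small" && risk === "Low") {
    return "claude-haiku-4-5";  // fast + cheap
  }
  return "claude-sonnet-4-6";   // balanced default
}
```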

**Provider abstraction (already partially built):**
The Bedrock integration (`src/daemon/pi-mono-loader.ts` / the first-party agent loop in progress) already handles Anthropic vs Bedrock. The abstraction needs to extend to:
- **OpenAI** -- GPT-4o, o3 (useful for reasoning-heavy tasks)
- **Google Gemini** -- Gemini 1.5 Pro (strong at long context, multimodal)
- **Ollama** -- local models (air-gapped environments, zero cost for classification tasks)
- **Any OpenAI-compatible API** -- covers most new providers automatically

**Implementation path:**
The first-party agent loop (in progress, PR TBD) is the right place to implement the provider abstraction. Instead of hardcoding the Anthropic SDK, it accepts a provider client that conforms to the Anthropic messages API format (which most providers support via compatibility layers). The `agentConfig.provider` and `agentConfig.model` fields are already in `TriggerDefinition` (added in GAP-8 / PR #397) -- the loop just needs to instantiate the right client.

**Cost optimization opportunity:**
With model routing, a full development pipeline becomes significantly cheaper:
- classify-task: Haiku (~$0.002)
- discovery: Sonnet (~$0.05)
- coding: Sonnet (~$0.20)
- mr-review: Sonnet (~$0.10)
- production-audit: Sonnet (~$0.08)
- architecture (if needed): Opus (~$0.50)

versus today, where everything runs on Sonnet by default. For a typical Medium task, model routing saves ~60% in cost with no quality loss on the lightweight phases.

---

### Native multi-agent orchestration: coordinator sessions + session DAG (HIGH PRIORITY, Apr 15, 2026)

**The problem:** Everything we can do manually today -- spawn parallel agents, chain discovery→implement→review→fix, react to findings, merge when clean -- WorkTrain should be able to do natively, fully autonomously, with full observability, and without any user feedback.

Today this requires a human (or Claude Code) to:
- Read completion notifications
- Interpret findings
- Decide what follow-up agents to spawn
- Track which PRs are clean vs need fixes
- Trigger the merge sequence when everything is ready

None of that should require a human. It's all policy that belongs in a coordinator workflow.

---

#### New primitives required

**`spawn_session` tool** (available inside workflow steps)
Starts a child session with a given workflowId + goal. Non-blocking -- returns a `sessionHandle` immediately. The coordinator continues executing the current step.

```typescript
spawn_session({
  workflowId: 'mr-review-workflow-agentic',
  goal: `Review PR #${prNumber}: ${prTitle}`,
  workspacePath: '/path/to/repo',
  context: { prNumber, prTitle, prDiff }
}) → { sessionHandle: 'sess_abc123' }
```

**`await_sessions` tool** (available inside workflow steps)
Blocks until one or all of a set of session handles complete. Returns their results and output artifacts (notes, handoff artifacts, MR review findings).

```typescript
await_sessions({
  handles: ['sess_abc123', 'sess_def456'],
  mode: 'all' // or 'any'
}) → [{ handle, result, outputs: { notes, findings, artifacts } }]
```

**Coordinator session type**
A session that owns child sessions. The parent-child relationship is stored in the session store. Killing a coordinator kills all its children. The console DAG view shows the full tree.

**Result routing**
Child session outputs are automatically available when `await_sessions` resolves -- the coordinator doesn't manually query the session store.

---

#### Coordinator workflow pattern

A coordinator workflow uses a `while` loop step with `spawn_session` + `await_sessions` to drive a dynamic DAG:

```
Phase 1: Gather work items (e.g. open PRs, open issues, failing tests)
Phase 2: Spawn workers in parallel (one per work item)
Phase 3: Await all workers
Phase 4: Classify results
  - Clean items: queue for merge/close
  - Items with findings: spawn fix agents
  - Items with blockers: escalate to human (fire onComplete notification)
Phase 5: Await fix agents, re-review if needed (circuit breaker: max 3 attempts)
Phase 6: Execute final action (merge sequence, create summary, post to Slack)
```

This is what we did manually all day. It should be a workflow anyone can run with a single trigger.

---

#### Observability: session DAG view in console

The QueuePane shows a flat list today. For coordinator workflows it must show a tree:

```
● coordinator: groom and fix all PRs [running, 47 min]
├── ✓ sess_abc: GAP-1 implement [merged, 18 min]
├── ✓ sess_def: GAP-1 MR review [approved, 4 min]
├── ✓ sess_ghi: GAP-6 implement [merged, 12 min]
├── ● sess_jkl: GAP-6 MR review fix [running, 3 min]
│   └── ✓ sess_mno: GAP-6 findings fix [complete, 8 min]
└── ✓ sess_pqr: TS6 tsconfig fix [merged, 6 min]
```

Each node shows a status icon, workflow type, goal snippet, and duration. Expand to see step notes. Parent-child edges are visible, and the critical path is highlighted.
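The tree itself falls straight out of the parent-child relationship in the session store. A minimal sketch, assuming session records carry a `parentSessionId` field (the field proposed for the session store; `buildSessionTree` is an illustrative name):

```typescript
// Build the console's session tree from flat session records.
type SessionRecord = { id: string; parentSessionId?: string };
type TreeNode = { id: string; children: TreeNode[] };

function buildSessionTree(records: SessionRecord[]): TreeNode[] {
  const nodes = new Map<string, TreeNode>(
    records.map((r): [string, TreeNode] => [r.id, { id: r.id, children: [] }]),
  );
  const roots: TreeNode[] = [];
  for (const r of records) {
    const node = nodes.get(r.id)!;
    const parent = r.parentSessionId ? nodes.get(r.parentSessionId) : undefined;
    if (parent) parent.children.push(node); // child of a coordinator
    else roots.push(node);                  // coordinators have no parent
  }
  return roots;
}
```

A `CoordinatorView` component can then render each `TreeNode` recursively with its status icon and duration.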

---

#### No-user-feedback policy logic

The coordinator workflow encodes the policy as workflow step instructions:

- **Critical/Major finding** → block merge, spawn fix agent, re-review (max 3 passes), escalate if still failing
- **Minor finding** → spawn fix agent if auto-fixable, else log and proceed
- **Nit** → log, proceed without fix
- **Clean** → queue for merge
- **Merge sequence** → serial (one at a time, pull before each merge to avoid conflicts)
- **Circuit breaker** → after 3 failed fix attempts on the same finding, post to Slack/GitLab and pause

This policy lives in the coordinator workflow, not in the daemon code. Different teams can have different policies by using different coordinator workflows.
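Sketched as a decision function -- illustrative only, since the real policy lives in workflow step instructions rather than daemon code:

```typescript
// Illustrative encoding of the severity policy above.
type Severity = 'Critical' | 'Major' | 'Minor' | 'Nit';
type Action =
  | 'spawn-fix-and-block-merge'
  | 'spawn-fix'
  | 'log-and-proceed'
  | 'escalate';

function actionFor(severity: Severity, autoFixable: boolean, fixAttempts: number): Action {
  if (fixAttempts >= 3) return 'escalate'; // circuit breaker: hand off to a human
  if (severity === 'Critical' || severity === 'Major') return 'spawn-fix-and-block-merge';
  if (severity === 'Minor') return autoFixable ? 'spawn-fix' : 'log-and-proceed';
  return 'log-and-proceed'; // Nit: log, proceed without fix
}
```

A team that wants a stricter policy (say, fixing Nits too) swaps the workflow, not the daemon.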

---

#### What this unlocks

A single trigger fires the entire development cycle autonomously:

```yaml
# triggers.yml
- id: daily-grooming
  type: cron
  schedule: "0 9 * * 1-5"   # 9am weekdays
  workflowId: coordinator-groom-and-ship
  goal: "Review all open PRs, fix findings, merge clean PRs, file issues for blockers"
  workspacePath: ~/git/my-project
  autoCommit: true
  autoOpenPR: true
```

WorkTrain wakes up at 9am, reviews every open PR, fixes everything it can, merges what's clean, posts a summary to Slack, and files GitHub issues for anything that needs human judgment. No human involved unless the circuit breaker fires.

---

#### Build order

1. **`spawn_session` + `await_sessions` tools** -- the core primitives. These are new MCP tools exposed to workflow steps, backed by a new `SpawnedSessionRegistry` in the DI container.
2. **Parent-child session relationship in the session store** -- a `parentSessionId` field on the session creation event.
3. **Console DAG view** -- a new `CoordinatorView` component that renders the session tree from the parent-child graph.
4. **Coordinator workflow templates** -- `coordinator-groom-and-ship`, `coordinator-review-all-prs`, `coordinator-investigate-and-fix` as bundled workflows.
5. **No-feedback policy encoding** -- document the MR review finding classification schema so coordinator workflows can reliably parse and act on it.

**This is the most important architectural work remaining in WorkTrain.** Everything else -- polling triggers, onboarding, knowledge graph -- makes WorkTrain better. This makes it genuinely autonomous.

---

### Message queue: async communication with WorkTrain from anywhere (Apr 15, 2026)

**The problem:** working with WorkTrain today requires you to be in the terminal, watching notifications, responding in real time. But the most valuable moments are often asynchronous -- you have a thought at 2am, want to redirect a running agent from your phone, or want to queue a direction before the current batch finishes.

**The design:** a persistent message queue that decouples when you send a message from when WorkTrain acts on it.

```bash
worktrain tell "skip the architecture review for the polling triggers PR, it's low risk"
worktrain tell "add knowledge graph vector layer to next sprint"
worktrain tell "stop the worktrain-init agent, I changed my mind on the UX"
```

Each command appends to `~/.workrail/message-queue.jsonl` (append-only, one JSON line per message). The daemon drains the queue between agent completions -- never mid-run, always at a natural break point. Messages are delivered in order and never lost across restarts.
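The mechanics are deliberately simple. A sketch with the log held in memory -- the real implementation would append to and truncate `~/.workrail/message-queue.jsonl`, and the function names here are illustrative:

```typescript
// Append-only message log: one JSON line per message, drained in order.
type QueuedMessage = { ts: string; text: string };

function appendMessage(log: string[], text: string): string[] {
  log.push(JSON.stringify({ ts: new Date().toISOString(), text }));
  return log;
}

function drainMessages(log: string[]): QueuedMessage[] {
  // Called only at a natural break point, never mid-run.
  const messages = log.map(line => JSON.parse(line) as QueuedMessage);
  log.length = 0; // queue is empty once drained
  return messages;
}
```

Because each message is a single line of JSON, a crash mid-append loses at most the message being written, and restart recovery is just re-reading the file.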

**What the queue enables:**

- **Direction changes while agents run** -- "actually, use polling not webhooks" can be queued while 6 agents are running; the coordinator picks it up before spawning the next batch
- **Mobile input** -- a mobile app (or a simple HTTP endpoint) writes to the queue; WorkTrain processes it when ready
- **Async ideation** -- thoughts queued whenever they occur, not forced into a synchronous conversation window
- **Stop/pause signals** -- `worktrain tell "pause after current batch"` is queue-delivered; the coordinator checks for pause signals before each spawn
- **Priority override** -- `worktrain tell "prioritize the shell injection fix, it's blocking"` bumps a task to the front of the coordinator's work list

**Outbox (WorkTrain → user):**
The same pattern in reverse. WorkTrain appends notifications to `~/.workrail/outbox.jsonl` -- agent completions, findings that need human judgment, questions that require a decision. A mobile client polls this file (or an HTTP SSE endpoint wraps it) and pushes to the user's phone. The user reads the notification, taps a response, and it goes into the message queue. A full async loop with no real-time presence required.

**Architecture:**
- `~/.workrail/message-queue.jsonl` -- inbound, append-only, drained in order
- `~/.workrail/outbox.jsonl` -- outbound, append-only, read by clients
- `worktrain tell <message>` CLI command -- appends to the message queue
- `worktrain inbox` CLI command -- reads unread outbox items
- The coordinator loop checks the message queue at the start of each cycle, before spawning new agents
- The `talk` session (interactive ideation) consumes from the same queue -- a seamless transition between async messages and live conversation

**This is the foundation for mobile monitoring.** The mobile app is just a client that reads the outbox and writes to the message queue. No new daemon capability is needed -- just a thin client over these two files.

---

### Autonomous merge: WorkTrain approves and merges its own PRs after full vetting (Apr 15, 2026)

**The idea:** after the full verification chain passes (unit tests, MR review clean, all required audits green), WorkTrain runs `gh pr review --approve && gh pr merge --squash` itself. No human is needed in the loop for PRs that pass all gates.

**This is already mostly built.** The coordinator script already calls `gh pr merge` -- we've been doing it today. The gap is formalizing the policy that makes auto-merge safe: what gates must pass, what findings are acceptable, and what always requires a human.

---

#### The auto-merge policy (what makes it safe)

**Auto-merge allowed when ALL of:**
- All required verification gates pass (defined by task classification)
- MR review: 0 Critical, 0 Major findings
- If `riskLevel=High`: production audit also passes
- If `touchesArchitecture=true`: architecture audit also passes
- CI is green (all required checks pass)
- No `needs-human-review` label on the PR
- The PR is not to a protected branch that requires human approval (configurable)

**Auto-merge blocked when ANY of:**
- Any Critical or Major finding in any review/audit
- CI is failing
- The PR was authored by a human (WorkTrain only auto-merges its own PRs)
- The PR touches security-sensitive paths (auth, credentials, network exposure) -- configurable blocklist
- The circuit breaker has fired (3+ fix attempts on the same finding = escalate to human)
- `riskLevel=Critical` (always human approval for the highest-risk changes)

**Human always required for:**
- Schema changes (breaking changes to public API contracts)
- Dependency upgrades (major version)
- Infrastructure/CI/CD changes
- Changes to WorkTrain's own merge policy
- Anything the watchdog flags as drift-from-spec

---

#### Implementation

This is a coordinator script policy, not a new capability. The required pieces:

1. **Proof record gates** (in progress -- verification chain spec) -- the coordinator checks the proof record before calling merge
2. **`--admin` merge bypass for CI false positives** -- already used today; the coordinator should note when it uses `--admin` and why
3. **`needs-human-review` label escape hatch** -- any human can block auto-merge by adding this label; WorkTrain respects it
4. **Merge audit log** -- every auto-merge appended to `~/.workrail/merge-log.jsonl`: which PR, which gates passed, which were skipped and why, timestamp. The watchdog checks this log.

**The coordinator script merge gate:**
```typescript
const proofRecord = await getProofRecord(prNumber);
const canAutoMerge =
  proofRecord.gates.unit_tests === 'pass' &&
  proofRecord.gates.mr_review === 'approved_clean' && // 0 Critical, 0 Major
  (riskLevel !== 'High' || proofRecord.gates.production_audit === 'pass') &&
  (touchesArchitecture !== true || proofRecord.gates.architecture_audit === 'pass') &&
  !prLabels.includes('needs-human-review') &&
  prAuthor.startsWith('worktrain-'); // only merge own PRs

if (canAutoMerge) {
  await exec(`gh pr merge ${prNumber} --squash`);
  appendMergeLog({ prNumber, gates: proofRecord.gates, timestamp: new Date() });
} else {
  await notifyHuman(prNumber, proofRecord); // post to Slack with what's blocking
}
```

**The trust boundary is the proof record.** WorkTrain doesn't decide "this looks fine" -- it checks whether each required gate has a recorded pass. The merge decision is deterministic. A human can always override by adding `needs-human-review`. The audit log makes every auto-merge traceable.

**Why this is safe even though it sounds scary:**
The risk of auto-merge is "something bad gets into main." The mitigations: the review agent is adversarial (it actively looks for problems), the production audit checks for runtime risks, CI validates behavior, and the proof record is the immutable record of what was checked. A human reviewing the PR manually doesn't add much signal beyond what three specialized audit agents already found. The real human value is in edge cases -- which is exactly what `needs-human-review` and the `riskLevel=Critical` block handle.

**Near-term:** WorkTrain already merges in the coordinator script (we've done it today). Formalizing the policy above just makes it explicit and auditable rather than ad hoc.

---

### Periodic analysis agents: continuous project health scanning (Apr 15, 2026)

**The idea:** WorkTrain runs agents on a schedule to proactively identify issues, gaps, improvement opportunities, and ideas -- without being asked. The watchdog (already spec'd) handles drift detection. These are deeper, domain-specific scans that run weekly or monthly.

**The agent zoo:**

**Weekly: Code health scan**
Runs `architecture-scalability-audit` on modules that haven't been audited in 30 days. Scans for: coupling violations, growing complexity hotspots (files with the most churn), missing abstractions that are emerging across multiple recent PRs, performance anti-patterns introduced in the last sprint. Output: `code-health-report.md` + GitHub issues filed for actionable findings.

**Weekly: Test coverage scan**
Identifies files modified in the last 30 days with zero or low test coverage, files with new exported symbols that have no tests, and critical paths (error handling, auth, external API boundaries) with only happy-path tests. Output: files missing test coverage, filed as GitHub issues with suggested test scenarios.

**Weekly: Documentation drift scan**
Checks whether recently merged PRs changed behavior that's described in docs. Identifies code that lacks inline documentation for non-obvious logic. Finds CLAUDE.md / AGENTS.md files that haven't been updated to reflect new modules or conventions. Output: `doc-drift-report.md` + PRs to fix the most important gaps.

**Monthly: Dependency health scan**
Goes beyond just "is it outdated?" -- assesses: are there known CVEs? are there active forks or replacements? are there lighter alternatives for heavy dependencies? is pi-mono still the right choice or should it be replaced? Output: `dependency-health-report.md` with recommendations ranked by impact.

**Monthly: Performance baseline**
Runs a set of benchmark scenarios: startup time, first workflow step latency, session store read/write throughput, knowledge graph query time on a real repo. Compares against the previous month's baseline. Flags regressions > 10%. Output: `performance-baseline-YYYY-MM.md` + issues for regressions.

**Continuous: Security scan**
On every PR merge: scan changed files for OWASP Top 10 patterns -- hardcoded secrets, command injection vectors (like the `exec()` issue we found in #402), missing input validation at boundaries, unsafe deserialization. Output: findings posted as PR comments before merge if not already reviewed.

**Monthly: Ideas generation**
The most interesting one. Runs `wr.discovery` on the current state of the codebase + backlog + recent session history and asks: "what's the most impactful thing we could build next that we haven't thought of yet?" Cross-references with the competitor landscape (GraphRAG, LangGraph, nexus-core updates), recent AI research, and user pain points in the session notes. Output: `ideas-YYYY-MM.md` -- a list of concrete improvement opportunities with rough effort estimates. The best ideas get promoted to the backlog by the watchdog.

**How this works with the coordinator:**
All of these are just cron triggers in `triggers.yml`. The coordinator script for each runs the appropriate workflow, reads the output, files GitHub issues for actionable findings, and posts a summary to Slack. No human is needed to kick them off -- they just run.

```yaml
triggers:
  - id: weekly-code-health
    type: cron
    schedule: "0 8 * * 1"   # Monday 8am
    workflowId: architecture-scalability-audit
    goal: "Weekly code health scan: identify coupling violations, complexity hotspots, missing abstractions"
    workspacePath: ~/git/personal/workrail
    agentConfig:
      model: claude-sonnet-4-6
    callbackUrl: http://localhost:3200/internal/file-issues

  - id: monthly-ideas
    type: cron
    schedule: "0 9 1 * *"   # 1st of every month
    workflowId: wr.discovery
    goal: "Monthly ideas generation: what's the most impactful improvement we haven't thought of yet?"
    workspacePath: ~/git/personal/workrail
```

**The meta-point:** WorkTrain running these agents on the WorkRail/WorkTrain repo means the product improves itself on a schedule. Every Monday it finds its own architectural problems. Every month it generates ideas for its own improvement. Every PR gets a security scan before it merges. The codebase gets continuously healthier without anyone managing it.

---

### Monitoring, analytics, and autonomous remediation (Apr 15, 2026)

**The idea:** WorkTrain watches your application's health metrics in real time, identifies anomalies, investigates root causes, and resolves what it can -- automatically. This closes the full loop from "something went wrong" to "it's fixed and here's why."

---

#### What WorkTrain monitors

**Application metrics (via polling or push):**
- Error rate (Sentry, Datadog, CloudWatch, custom endpoint)
- Latency P50/P95/P99 (per endpoint, per workflow step)
- Memory and CPU usage of the daemon itself
- Session success/failure rate (from the daemon's own session store)
- Workflow completion time trends (are sessions getting slower?)
- Queue depth (are triggers backing up?)

**Codebase health metrics (derived from WorkTrain's own data):**
- Test coverage trends (going up or down over time?)
- Build time trends
- PR cycle time (time from open to merge)
- Number of open findings by severity across all open PRs
- Number of sessions that ended in `_tag: 'error'` vs `'success'` in the last 7 days
- Workflow steps most likely to fail (from session store analysis)

**Custom metrics (user-defined):**
```yaml
# triggers.yml
monitoring:
  - id: session-error-rate
    type: metric_threshold
    source: daemon_sessions        # reads from ~/.workrail/data/sessions/
    query: "error_rate_7d > 0.15"  # >15% session failure rate
    workflowId: bug-investigation.agentic.v2
    goal: "Investigate high daemon session error rate: {{$.error_rate}}% failures in last 7 days"
    workspacePath: ~/git/personal/workrail

  - id: sentry-errors
    type: sentry_poll
    project: workrail
    token: $SENTRY_TOKEN
    threshold: new_error_rate_1h > 5
    workflowId: bug-investigation.agentic.v2
    goalTemplate: "Investigate Sentry error spike: {{$.error.type}} -- {{$.error.message}}"
```

---

#### The monitoring loop

```
monitor: detect anomaly

├── classify severity (script -- based on threshold breach magnitude)
│     Critical: > 3x normal, affects production users
│     High: > 2x normal, degraded but functional
│     Low: trending bad but within bounds

├── [if Critical] page immediately
│     script: post to Slack #incidents with metric data + session link

├── investigate
│     workflow: bug-investigation.agentic.v2
│     inputs: metric data, recent commits, error logs, affected code paths
│     outputs: root cause hypothesis, affected files, confidence score

├── [if confidence >= 0.8 AND severity <= High] attempt auto-remediation
│     ├── [if config/feature-flag fix] flip flag (script, instant)
│     ├── [if code fix, well-understood] spawn coding-task → review → merge
│     └── [if rollback needed] create rollback PR → review → merge

├── [if confidence < 0.8 OR severity == Critical] escalate
│     script: post full investigation findings to Slack + file GitHub issue

└── follow-up check
      cron: 30 min later → has the metric recovered? post update.
```

---

#### WorkTrain analytics dashboard

Beyond alerting, WorkTrain maintains a persistent analytics layer that answers questions like:

- "What's our average PR cycle time this month vs last month?"
- "Which workflow steps fail most often?"
- "How much did autonomous sessions cost in tokens this week?"
- "What percentage of bugs were auto-fixed vs escalated?"
- "Which modules have the most open findings from MR reviews?"
- "How many sessions ran today / this week / this month?"

This data lives in the knowledge graph (structured, queryable) and is visualized in the console. The `worktrain talk` interface can answer these questions conversationally: "how are we doing this week?" → it pulls the analytics and gives a natural-language summary.

---

#### Self-monitoring: WorkTrain watching itself

The most immediately useful instance is WorkTrain monitoring its own daemon:

- Session error rate rising → investigate what kinds of tasks are failing
- Queue depth growing → the daemon may be overloaded → reduce poll frequency or spawn fewer concurrent sessions
- Session duration outliers → some sessions are running way too long → investigate which workflow step is stuck
- Memory leak → the daemon process growing unbounded → restart + file a bug
- Disk usage → the session store growing too large → prune old sessions

These are all monitorable from `~/.workrail/data/sessions/` with no external dependency. WorkTrain can watch itself with zero additional infrastructure.

---

#### Implementation path

**Now (no new features needed):** a cron trigger → a `wr.discovery` workflow that reads session store metrics → posts a summary to Slack. This gives analytics immediately.

**Near-term (needs a `metric_threshold` trigger type):** a new `PollingMonitorSource` that evaluates a metric expression on a schedule and fires only when the threshold is breached. Same polling infrastructure as `gitlab_poll`.
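The core of that source is just a threshold check on each poll tick. A sketch, assuming the expression is a simple `metric op threshold` comparison like the `"error_rate_7d > 0.15"` example above (`shouldFire` and the sample shape are illustrative):

```typescript
// Evaluate a metric sample against an expression like
// "error_rate_7d > 0.15"; fire only on breach.
type MetricSample = { name: string; value: number };

function shouldFire(sample: MetricSample, expression: string): boolean {
  const match = expression.match(/^(\S+)\s*([<>])\s*([\d.]+)$/);
  if (!match) throw new Error(`unsupported expression: ${expression}`);
  const [, metric, op, raw] = match;
  if (sample.name !== metric) return false; // wrong metric, never fires
  const threshold = Number(raw);
  return op === '>' ? sample.value > threshold : sample.value < threshold;
}
```

Deduplication (don't re-fire every tick while the metric stays breached) would sit on top of this, the same way `gitlab_poll` deduplicates events.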

**Medium-term:** Sentry/Datadog/CloudWatch adapters as polling sources. Same pattern as GitLab -- poll the API, deduplicate events, dispatch a workflow.

**Long-term:** real-time metric ingestion (push rather than pull), time-series storage in DuckDB alongside the knowledge graph, an analytics dashboard in the console.

---

### Per-workspace work queue: proactive task drain instead of pure event-driven (Apr 15, 2026)

**The insight:** triggers make WorkTrain reactive (something happens, WorkTrain responds). A work queue makes WorkTrain proactive -- it pulls the next item when capacity is available, works it to completion, then pulls the next. This is how a real development team operates: you have a sprint board you drain, not just a webhook listener.

**The queue is the backlog made executable.** Every item in the backlog, every GitHub issue labeled for autonomous work, every `worktrain enqueue "..."` from the terminal -- all normalized into one ordered list per workspace that WorkTrain drains continuously.

---

#### How it works

**Internal queue format:** `~/.workrail/workspaces/<name>/queue.jsonl` -- append-only, one item per line. The daemon's coordinator loop checks this file between sessions and pulls the next item when under `maxConcurrentSessions`. Items are consumed in priority order, then FIFO.

```jsonl
{"id":"q_001","goal":"implement maxConcurrentSessions global semaphore","priority":"high","source":"manual","createdAt":"2026-04-15T22:00:00Z","workflow":null,"status":"pending"}
{"id":"q_002","goal":"add GitHub polling adapter","priority":"medium","source":"github_issue","issueNumber":410,"createdAt":"2026-04-15T22:01:00Z","workflow":null,"status":"pending"}
{"id":"q_003","goal":"investigate flaky timing test in console-service-dormancy","priority":"low","source":"manual","createdAt":"2026-04-15T22:02:00Z","workflow":"bug-investigation.agentic.v2","status":"pending"}
```
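"Priority order, then FIFO" is just a sort over the pending items. A sketch over the record shape shown above (the `dequeueOrder` helper name is illustrative):

```typescript
// Drain order: priority band first, FIFO (createdAt) within a band.
type QueueItem = {
  id: string;
  priority: 'high' | 'medium' | 'low';
  createdAt: string; // ISO timestamp, lexicographically sortable
  status: string;
};

const PRIORITY_RANK = { high: 0, medium: 1, low: 2 } as const;

function dequeueOrder(items: QueueItem[]): QueueItem[] {
  return items
    .filter(item => item.status === 'pending') // skip active/done/blocked
    .sort(
      (a, b) =>
        PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
        a.createdAt.localeCompare(b.createdAt),
    );
}
```

The coordinator then takes the first `slots` items from this ordering on each tick.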

**CLI interface:**
```bash
worktrain enqueue "implement X" --workspace workrail --priority high
worktrain enqueue "investigate this bug" --workspace workrail --workflow bug-investigation.agentic.v2
worktrain queue list --workspace workrail        # show pending items
worktrain queue pause --workspace workrail       # stop draining
worktrain queue resume --workspace workrail      # resume draining
worktrain queue remove <id> --workspace workrail # remove an item
```

**External pull sources (normalized into the internal queue):**
```yaml
workspaces:
  workrail:
    path: ~/git/personal/workrail
    queue:
      maxConcurrentSessions: 3
      sources:
        - type: github_issues
          integration: github
          filter: 'label:worktrain-queue'
          priority:
            - label: 'priority:high' → high
            - label: 'priority:medium' → medium
            - default: → low
        - type: internal   # always included
```

When a GitHub issue is labeled `worktrain-queue`, a poll cycle picks it up and normalizes it into the internal queue. When WorkTrain completes the work, it removes the label (or transitions the status) and closes the issue. The team uses GitHub issues as their task interface; WorkTrain drains them autonomously.

**Supported external sources:**
- GitHub issues (label filter)
- GitLab issues (label filter)
- Jira sprint board (active sprint items assigned to the worktrain user)
- Linear (triage queue or assignee filter)
- Internal queue.jsonl (always available, zero config)

---

#### Queue + message queue + talk: the full interface

Three modes, all async-safe, all persisted:

| Interface | Use case | Latency |
|-----------|----------|---------|
| **Work queue** | "do this when you have capacity" | Whenever a slot is free |
| **Message queue** (`worktrain tell`) | "do this now, between current sessions" | End of current batch |
| **Talk** (`worktrain talk`) | "let's discuss and decide together" | Interactive |

You can send a thought from your phone at 2am via `worktrain tell`, and separately have a queue of 10 backlog items WorkTrain is draining during the day. The talk session can inspect the queue, reorder items, and add new ones -- all from natural conversation.

---

#### Queue-aware coordinator loop

The coordinator's main loop becomes:

```typescript
while (daemon.running) {
  // 1. Drain message queue (direction changes, questions)
  const messages = await readMessageQueue();
  for (const msg of messages) await handleMessage(msg);

  // 2. Pull next queue items up to maxConcurrentSessions
  const active = await getActiveSessions();
  const slots = maxConcurrentSessions - active.length;
  if (slots > 0) {
    const items = await dequeueItems(slots);
    for (const item of items) {
      const pipeline = await classifyAndBuildPipeline(item.goal);
      await spawnCoordinatorSession(pipeline, item);
    }
  }

  // 3. Check external pull sources for new items
  await syncExternalSources();

  await sleep(5_000); // 5s coordinator tick
}
```

The queue is the thing that makes WorkTrain feel like a teammate rather than a service -- it has its own work to do, it makes progress autonomously, and you can check in on it rather than having to drive every task manually.
2361
+
2362
+ ---
2363
+
2364
+ #### Queue visibility in the console
2365
+
2366
+ The console adds a **Queue tab** (alongside Sessions and AUTO):
2367
+ - Pending items (ordered by priority, FIFO within priority)
2368
+ - Active items (with live session link)
2369
+ - Completed items (with outcome, duration, PR link if applicable)
2370
+ - Paused/blocked items (with reason)
2371
+
2372
+ Drag-to-reorder for priority. Click to expand and see the full pipeline plan. Button to pause/resume the queue. "Add item" form that goes to `worktrain enqueue`.
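The pending-item ordering described above (by priority, FIFO within priority) is a stable two-key sort. A sketch with a hypothetical `QueueItem` shape:

```typescript
type Priority = "critical" | "high" | "medium" | "low";

interface QueueItem {
  id: string;
  priority: Priority;
  enqueuedAt: number; // epoch millis -- ties broken FIFO
}

const PRIORITY_RANK: Record<Priority, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

// Higher priority first; within a priority band, oldest item first.
export function orderQueue(items: QueueItem[]): QueueItem[] {
  return [...items].sort(
    (a, b) =>
      PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
      a.enqueuedAt - b.enqueuedAt
  );
}
```

Drag-to-reorder in the console would then translate to rewriting `enqueuedAt` (or an explicit rank field) rather than changing priorities.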
2373
+
2374
+ ---
2375
+
2376
+ #### Relationship to worktrain spawn/await
2377
+
2378
+ `worktrain spawn` / `worktrain await` are for coordinator *scripts* -- explicit programmatic orchestration. The work queue is for *ambient* drain -- WorkTrain autonomously pulls items when capacity is free. Both use the same underlying session engine. The difference is who's driving: a script (spawn/await) or the queue drain loop. They compose naturally: a queue item might be a coordinator script that spawns its own child sessions via spawn/await.
2379
+
2380
+ ---
2381
+
2382
+ ### Work queue refinements: filtering, catch-all mode, and deadline-aware prioritization (Apr 15, 2026)
2383
+
2384
+ #### Issue/ticket filtering
2385
+
2386
+ The external pull sources need richer filtering than just a label. Real teams organize work by project, team, component, sprint, and assignee -- all of these should be filterable:
2387
+
2388
+ ```yaml
2389
+ workspaces:
2390
+ workrail:
2391
+ queue:
2392
+ sources:
2393
+ - type: github_issues
2394
+ integration: github
2395
+ filter:
2396
+ labels: ['worktrain-queue'] # optional -- omit only alongside catchAll: true to pull unlabeled issues
2397
+ milestone: 'Sprint 12' # optional
2398
+ assignee: 'worktrain-bot' # optional -- only issues assigned to WorkTrain
2399
+ notLabels: ['needs-human', 'blocked', 'wontfix'] # always exclude these
2400
+
2401
+ - type: jira
2402
+ integration: jira
2403
+ filter:
2404
+ project: ENG # required -- scope to one project
2405
+ sprint: active # 'active', 'backlog', or sprint name
2406
+ assignee: worktrain # Jira user
2407
+ issueTypes: ['Bug', 'Task'] # not Stories/Epics
2408
+ notStatuses: ['Done', 'Closed']
2409
+
2410
+ - type: linear
2411
+ integration: linear
2412
+ filter:
2413
+ team: platform # Linear team slug
2414
+ state: triage # pull from triage queue
2415
+ priority: [urgent, high] # only urgent and high priority
2416
+ ```
2417
+
2418
+ **Catch-all mode:** if `filter` is omitted entirely, WorkTrain pulls everything open and unassigned in the project/repo. This is the "let WorkTrain go find work" mode -- useful for batch grooming sessions, but it requires explicit opt-in (`catchAll: true`), since an unfiltered pull could bring in thousands of items.
2419
+
2420
+ ```yaml
2421
+ - type: github_issues
2422
+ integration: github
2423
+ catchAll: true # pulls ALL open issues, no label required
2424
+ filter:
2425
+ notLabels: ['needs-human', 'wontfix']
2426
+ maxItemsPerCycle: 5 # drain slowly, not everything at once
2427
+ ```
2428
+
2429
+ ---
2430
+
2431
+ #### Deadline-aware prioritization
2432
+
2433
+ WorkTrain should be able to determine priority not just from labels, but from deadlines it finds anywhere:
2434
+
2435
+ **Sources WorkTrain reads for deadline context:**
2436
+ - Issue/ticket due dates (Jira, Linear, GitHub milestones)
2437
+ - Epic end dates (Jira epics, Linear projects)
2438
+ - Sprint end date (current active sprint)
2439
+ - Release/milestone dates from the repo
2440
+ - Calendar events (via Glean or Google Calendar integration)
2441
+ - Confluence/Notion pages that mention deadlines
2442
+ - Docs in the repo (`ROADMAP.md`, `docs/milestones.md`, etc.)
2443
+
2444
+ **What WorkTrain does with deadlines:**
2445
+ The classify-task-workflow (or a new `prioritize-queue` workflow) reads the deadline context and produces an adjusted priority score:
2446
+
2447
+ ```
2448
+ base_priority     = from label/assignee (low/medium/high)
+ deadline_urgency  = from days_until_deadline:
+   past due   → +4 (overdue, surface immediately)
+   < 2 days   → +3 (critical)
+   < 7 days   → +2 (high)
+   < 14 days  → +1 (medium)
+   ≥ 14 days  → +0 (no adjustment)
+
+ adjusted_priority = base_priority + deadline_urgency
2457
+ ```
2458
+
2459
+ Items are queued in adjusted_priority order, not just the label order. A medium-priority task due tomorrow beats a high-priority task due in 3 months.
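The scoring rule can be written directly as a function. A sketch in TypeScript -- the numeric base scores (1/2/3 for low/medium/high) are an assumption, since the doc only specifies the urgency bumps:

```typescript
type BasePriority = "low" | "medium" | "high";

const BASE_SCORE: Record<BasePriority, number> = { low: 1, medium: 2, high: 3 };

// Urgency bump from days until the nearest deadline, per the table above.
export function deadlineUrgency(daysUntilDeadline: number): number {
  if (daysUntilDeadline < 0) return 4; // past due: surface immediately
  if (daysUntilDeadline < 2) return 3; // critical
  if (daysUntilDeadline < 7) return 2; // high
  if (daysUntilDeadline < 14) return 1; // medium
  return 0; // far out: no adjustment
}

export function adjustedPriority(
  base: BasePriority,
  daysUntilDeadline: number
): number {
  return BASE_SCORE[base] + deadlineUrgency(daysUntilDeadline);
}
```

With these numbers, a medium task due tomorrow scores 5 while a high task due in three months scores 3 -- exactly the inversion the text calls for.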
2460
+
2461
+ **Glean integration for deadline discovery:**
2462
+ Glean indexes everything -- Jira, Confluence, Google Docs, Slack, emails. WorkTrain can query Glean: "what are the deadlines affecting the workrail project this month?" and get a synthesized view across all systems. This is especially powerful for deadline context that lives in documents rather than tickets (e.g. a Confluence roadmap page that says "feature X must ship by Q2").
2463
+
2464
+ ```yaml
2465
+ workspaces:
2466
+ workrail:
2467
+ queue:
2468
+ deadlineContext:
2469
+ sources:
2470
+ - type: glean
2471
+ query: "workrail deadlines milestones due dates"
2472
+ maxResults: 10
2473
+ - type: github_milestones
2474
+ integration: github
2475
+ - type: jira_epics
2476
+ integration: jira
2477
+ project: ENG
2478
+ refreshInterval: 3600 # re-fetch deadline context every hour
2479
+ ```
2480
+
2481
+ **The prioritize-queue routine:**
2482
+ A cheap, fast routine (one step, Haiku model) that runs after each external sync and re-scores the queue. Reads: current queue items + deadline context. Outputs: reordered queue with deadline annotations. The coordinator's drain loop always reads the latest ordering.
2483
+
2484
+ ```
2485
+ Input: queue items + deadline context
2486
+ Output: same items reordered, each with:
2487
+ - adjustedPriority (critical/high/medium/low)
2488
+ - deadlineReason: "Sprint 12 ends in 3 days" or "Epic ENG-200 due June 1"
2489
+ - deadlineSource: URL or doc reference
2490
+ ```
2491
+
2492
+ **Escalation when deadlines are at risk:**
2493
+ If a queue item has a deadline within 48 hours and hasn't been started yet, the watchdog notifies: "WORKRAIL-410 (GitHub polling adapter) is due in 2 days and hasn't been started. Current queue position: 8. Bumping to position 1." Posts to Slack + the message outbox. The user can override via message queue if they disagree.
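The watchdog check itself is a simple filter over queue items. A sketch with a hypothetical `TrackedItem` shape (the real session-store schema is not specified here):

```typescript
interface TrackedItem {
  key: string; // e.g. "WORKRAIL-410"
  status: "pending" | "active" | "done";
  dueAt: number; // epoch millis
  queuePosition: number;
}

const FORTY_EIGHT_HOURS = 48 * 60 * 60 * 1000;

// Pending items due within 48 hours (or already overdue) that haven't
// been started yet -- these get bumped and trigger a notification.
export function atRiskItems(items: TrackedItem[], now: number): TrackedItem[] {
  return items.filter(
    (i) => i.status === "pending" && i.dueAt - now <= FORTY_EIGHT_HOURS
  );
}
```

The notification text ("due in 2 days... bumping to position 1") would be rendered from each returned item.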
2494
+
2495
+ **Why this is powerful:**
2496
+ WorkTrain effectively becomes your sprint manager. It knows what's due, in what order things need to happen, and it works the highest-urgency items first -- without anyone having to manually reorder a board. The deadline context is always fresh (re-fetched every hour), so if a Confluence page updates the roadmap, the queue re-prioritizes automatically.
2497
+
2498
+ ---
2499
+
2500
+ ### Workspace pipeline policy: artifact gates vs autonomous decomposition (Apr 15, 2026)
2501
+
2502
+ **The core tension:** some workspaces have rigorous pre-implementation processes (BRD required, design approved, shapeup doc reviewed). Others are solo/small-team projects where you figure it out as you go. WorkTrain should respect both -- waiting patiently in governed workspaces, doing the work itself in autonomous workspaces.
2503
+
2504
+ ---
2505
+
2506
+ #### Two workspace modes
2507
+
2508
+ **Governed mode** -- for projects with existing process gates:
2509
+
2510
+ ```yaml
2511
+ workspaces:
2512
+ my-work-project:
2513
+ path: ~/git/work/my-project
2514
+ pipelinePolicy:
2515
+ mode: governed
2516
+ requiredArtifacts:
2517
+ - type: brd # Business Requirements Document
2518
+ sources: [confluence, jira_epic, google_docs]
2519
+ searchQuery: "BRD {{ticket.key}}"
2520
+ - type: design # UI/UX designs
2521
+ sources: [figma, confluence]
2522
+ searchQuery: "designs {{ticket.key}}"
2523
+ - type: shapeup # Shape Up pitch/bet
2524
+ sources: [notion, confluence]
2525
+ onMissingArtifacts: wait # 'wait', 'skip', or 'escalate'
2526
+ waitCheckInterval: 3600 # re-check every hour
2527
+ waitTimeout: 168h # escalate after 7 days of waiting
2528
+ escalationMessage: "Ticket {{ticket.key}} has been waiting for required artifacts for {{wait_duration}}. Manual review needed."
2529
+ ```
2530
+
2531
+ When WorkTrain picks up a ticket in governed mode, it first searches for the required artifacts using the configured sources and search queries. If they're not found:
2532
+ - `wait`: holds the ticket in a "waiting" state, re-checks every hour, notifies when artifacts appear
2533
+ - `skip`: moves to the next ticket, re-queues this one later
2534
+ - `escalate`: posts to Slack + blocks the ticket, requires human to resolve
2535
+
2536
+ When artifacts are found, WorkTrain automatically extracts context from them, attaches them as `referenceUrls` to the session, and proceeds with implementation -- skipping the discovery/design phases since those artifacts already contain the answer.
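The gate decision reduces to comparing found artifacts against the required list and dispatching on `onMissingArtifacts`. A sketch (the `GateResult` shape is illustrative):

```typescript
type MissingPolicy = "wait" | "skip" | "escalate";

interface GateResult {
  action: "proceed" | "wait" | "requeue" | "escalate";
  reason?: string;
}

// Decide what happens after an artifact search in governed mode.
export function gateTicket(
  foundArtifacts: string[],
  requiredArtifacts: string[],
  policy: MissingPolicy
): GateResult {
  const missing = requiredArtifacts.filter((a) => !foundArtifacts.includes(a));
  if (missing.length === 0) return { action: "proceed" };
  switch (policy) {
    case "wait": // hold in place; re-check on waitCheckInterval
      return { action: "wait", reason: `waiting for: ${missing.join(", ")}` };
    case "skip": // move on; re-queue this ticket for later
      return { action: "requeue", reason: `missing: ${missing.join(", ")}` };
    case "escalate": // block and notify a human
      return { action: "escalate", reason: `missing: ${missing.join(", ")}` };
  }
}
```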
2537
+
2538
+ **Autonomous mode** -- for projects without pre-existing process:
2539
+
2540
+ ```yaml
2541
+ workspaces:
2542
+ workrail:
2543
+ path: ~/git/personal/workrail
2544
+ pipelinePolicy:
2545
+ mode: autonomous
2546
+ # No required artifacts -- WorkTrain does its own discovery and design
2547
+ # Uses the full pipeline: classify → discovery → design → arch review → implement → review
2548
+ decompositionEnabled: true # can break large tasks into sub-tickets
2549
+ decompositionThreshold: Large # tasks classified Large get decomposed
2550
+ ```
2551
+
2552
+ In autonomous mode, WorkTrain runs the full pipeline including discovery, UX design (if `hasUI`), architecture review (if `touchesArchitecture`), and implementation. It doesn't wait for external artifacts because there are none -- it generates them itself.
2553
+
2554
+ ---
2555
+
2556
+ #### Automatic task decomposition
2557
+
2558
+ When a task is classified as `Large` (or Medium with high complexity), WorkTrain decomposes it into sub-tickets before starting implementation. The sub-tickets go into the workspace queue and are worked in order.
2559
+
2560
+ **Decomposition workflow** (new, needs authoring):
2561
+ ```
2562
+ Input: task description + context from discovery
2563
+ Output: ordered list of sub-tickets, each with:
2564
+ - title (imperative, specific)
2565
+ - goal (1-2 sentence description)
2566
+ - estimatedComplexity (Small/Medium)
2567
+ - dependencies (which sub-tickets must complete first)
2568
+ - workflowId (which workflow to use)
2569
+ ```
2570
+
2571
+ **Example:** task "implement polling triggers system"
2572
+ ```
2573
+ Decomposed into:
2574
+ 1. [Small] Add PollingTriggerSource type to TriggerDefinition → depends: none
2575
+ 2. [Small] Implement PolledEventStore with atomic persistence → depends: 1
2576
+ 3. [Small] Implement GitLab MR polling adapter → depends: 2
2577
+ 4. [Small] Implement PollingScheduler with setInterval → depends: 2,3
2578
+ 5. [Small] Wire PollingScheduler into TriggerListener → depends: 4
2579
+ 6. [Small] Add unit tests for all new modules → depends: 1-5
2580
+ ```
2581
+
2582
+ Each sub-ticket is Small or Medium -- never Large. If a sub-ticket comes out Large during decomposition, it gets recursively decomposed. The decomposition agent enforces this invariant.
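The "never Large" invariant is naturally expressed as a recursive fixpoint. A sketch where `decomposeOnce` stands in for one pass of the (hypothetical) LLM decomposition agent, with a depth cap as a safety valve:

```typescript
type Complexity = "Small" | "Medium" | "Large";

interface SubTask {
  title: string;
  complexity: Complexity;
}

// Keep decomposing until no Large sub-tickets remain. `decomposeOnce`
// is a stand-in for a single agent decomposition pass.
export function decomposeUntilSmall(
  task: SubTask,
  decomposeOnce: (t: SubTask) => SubTask[],
  maxDepth = 3
): SubTask[] {
  if (task.complexity !== "Large" || maxDepth === 0) return [task];
  return decomposeOnce(task).flatMap((sub) =>
    decomposeUntilSmall(sub, decomposeOnce, maxDepth - 1)
  );
}
```

The depth cap means a pathological decomposer cannot recurse forever; in practice a Large sub-ticket at max depth would be escalated rather than silently kept.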
2583
+
2584
+ Sub-tickets are added to the queue with:
2585
+ - `parentTicketId` linking back to the original task
2586
+ - `dependsOn` list preventing out-of-order execution
2587
+ - Same priority as the parent ticket
2588
+ - Auto-label so they're visually grouped in GitHub/Jira
2589
+
2590
+ **Queue behavior with dependencies:**
2591
+ The queue drain loop respects `dependsOn` -- a sub-ticket is only picked up when all its dependencies are completed. The coordinator naturally serializes dependent work and parallelizes independent work (sub-tickets with no shared dependencies can run concurrently).
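That readiness rule is a small filter: a sub-ticket surfaces only when every id in its `dependsOn` list is completed, and independent sub-tickets surface together (so they can run concurrently). A sketch:

```typescript
interface SubTicket {
  id: string;
  status: "pending" | "active" | "completed";
  dependsOn: string[];
}

// Sub-tickets the drain loop may pick up right now.
export function readySubTickets(tickets: SubTicket[]): SubTicket[] {
  const done = new Set(
    tickets.filter((t) => t.status === "completed").map((t) => t.id)
  );
  return tickets.filter(
    (t) => t.status === "pending" && t.dependsOn.every((d) => done.has(d))
  );
}
```

Calling this on every tick after completions propagate is what makes the serialization automatic: finishing ticket 2 in the polling-triggers example unblocks tickets 3 without any explicit scheduling.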
2592
+
2593
+ ---
2594
+
2595
+ #### Hybrid: governed workspace with autonomous decomposition
2596
+
2597
+ Some workspaces need both -- a BRD required before implementation starts, but the implementation itself gets decomposed autonomously:
2598
+
2599
+ ```yaml
2600
+ workspaces:
2601
+ my-work-project:
2602
+ pipelinePolicy:
2603
+ mode: governed
2604
+ requiredArtifacts:
2605
+ - type: brd
2606
+ onMissingArtifacts: wait
2607
+ decompositionEnabled: true # once BRD is found, decompose into sub-tickets
2608
+ decompositionThreshold: Medium # decompose Medium and Large tasks
2609
+ ```
2610
+
2611
+ Flow: ticket filed → WorkTrain finds BRD → reads BRD for context → classifies task → if Medium/Large decomposes into sub-tickets → works sub-tickets in order. The BRD gates the start; decomposition handles the execution.
2612
+
2613
+ ---
2614
+
2615
+ #### The "patiently waiting" UX
2616
+
2617
+ In the console Queue tab, tickets waiting for artifacts show a distinct state:
2618
+
2619
+ ```
2620
+ ⏳ WORKRAIL-410: Implement new auth flow [waiting for: BRD, designs]
2621
+ Waiting 2d 4h · Last checked: 5 min ago · Artifacts: 0/2 found
2622
+
2623
+ → Found: none
2624
+ → Searched: Confluence ("BRD WORKRAIL-410"), Figma ("WORKRAIL-410 designs")
2625
+ ```
2626
+
2627
+ WorkTrain posts a Slack message when it starts waiting: "I picked up WORKRAIL-410 but it's missing required artifacts (BRD, designs). I'll check hourly and start automatically when they're ready." Then posts again when artifacts are found: "Found BRD and designs for WORKRAIL-410. Starting implementation now."
2628
+
2629
+ The team doesn't have to remember to trigger WorkTrain -- they just do their normal process (write the BRD, create the designs) and WorkTrain starts automatically.
2630
+
2631
+ ---
2632
+
2633
+ #### Why this matters
2634
+
2635
+ - **Governed projects**: WorkTrain integrates with existing process rather than bypassing it. PMs and designers work normally; WorkTrain picks up when the handoff is ready. No one has to remember to trigger it.
2636
+ - **Autonomous projects**: WorkTrain is a full solo developer -- it discovers, designs, decomposes, implements, reviews, and ships. The only human touchpoint is approving the final PR (or enabling auto-merge for fully vetted changes).
2637
+ - **The queue is the unifying interface**: both modes feed the same queue. The pipeline policy determines what happens when an item is picked up.
2638
+
2639
+ ---
2640
+
2641
+ ### Templates, living docs, and external workflow ingestion (Apr 15, 2026)
2642
+
2643
+ ---
2644
+
2645
+ #### Templates: consistent output formatting across all systems
2646
+
2647
+ WorkTrain should know the templates used in each workspace and apply them automatically when creating artifacts. No more agents writing PRs in inconsistent formats or filing Jira tickets that are missing required fields.
2648
+
2649
+ **Template types:**
2650
+
2651
+ ```yaml
2652
+ workspaces:
2653
+ my-work-project:
2654
+ templates:
2655
+ pullRequest:
2656
+ source: .github/pull_request_template.md # repo-local
2657
+ mergeRequest:
2658
+ source: .gitlab/merge_request_templates/default.md
2659
+ jiraTicket:
2660
+ source: confluence://ENG/ticket-template # from Confluence
2661
+ requiredFields: [summary, description, acceptanceCriteria, storyPoints, component]
2662
+ jiraBug:
2663
+ source: confluence://ENG/bug-template
2664
+ requiredFields: [summary, description, stepsToReproduce, expectedVsActual, severity]
2665
+ shapeup:
2666
+ source: notion://templates/shapeup-pitch
2667
+ brd:
2668
+ source: confluence://templates/brd-template
2669
+ designSpec:
2670
+ source: notion://templates/design-spec
2671
+ incidentPostmortem:
2672
+ source: confluence://templates/postmortem
2673
+ ```
2674
+
2675
+ When WorkTrain creates a PR, it reads the PR template and structures its output to match. When it files a Jira bug from an investigation, it reads the bug template and fills every required field. When it writes a BRD in autonomous mode, it uses the BRD template so the output looks like what the team expects.
2676
+
2677
+ Templates are resolved at session start and injected as context. The agent is told: "When creating a [type], use this template structure exactly." The handoff artifact for the auto-commit/PR path includes the PR body pre-formatted to match the template.
2678
+
2679
+ **Template sources:**
2680
+ - Local files in the repo (`.github/`, `.gitlab/`, `docs/templates/`)
2681
+ - Confluence pages
2682
+ - Notion databases/templates
2683
+ - Google Docs
2684
+ - Inline in `triggers.yml`
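Resolving a template starts with classifying its `source` string. A sketch of that dispatch -- the scheme names mirror the YAML above, and `gdocs://` is a made-up stand-in for the Google Docs case, which the doc lists but doesn't give a scheme for:

```typescript
interface TemplateRef {
  scheme: "file" | "confluence" | "notion" | "gdocs";
  location: string;
}

// Parse a template `source` string into a typed reference.
// Anything without a recognized scheme is treated as a repo-local path.
export function parseTemplateSource(source: string): TemplateRef {
  const match = source.match(/^(confluence|notion|gdocs):\/\/(.+)$/);
  if (match) {
    return { scheme: match[1] as TemplateRef["scheme"], location: match[2] };
  }
  return { scheme: "file", location: source };
}
```

A fetcher keyed on `scheme` would then read the file from disk or call the relevant integration, and the resolved text is injected into session context at start.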
2685
+
2686
+ ---
2687
+
2688
+ #### Living docs: on-demand generation and continuous updates
2689
+
2690
+ WorkTrain maintains documentation as a first-class output, not an afterthought. Docs can be generated on-demand and kept current automatically.
2691
+
2692
+ **On-demand doc generation:**
2693
+
2694
+ ```bash
2695
+ worktrain doc generate --type architecture-overview --workspace workrail
2696
+ worktrain doc generate --type api-reference --workspace workrail
2697
+ worktrain doc generate --type runbook "How to debug a stuck daemon session"
2698
+ worktrain doc generate --type adr "Why we replaced pi-mono with a first-party agent loop"
2699
+ ```
2700
+
2701
+ Each generates a doc by pulling from all available sources:
2702
+ - Knowledge graph (structural understanding of the codebase)
2703
+ - Session store (recent decisions and findings)
2704
+ - Backlog (design decisions and rationale)
2705
+ - GitHub PRs (what changed and why -- from PR descriptions)
2706
+ - Confluence/Notion (existing docs to extend, not duplicate)
2707
+
2708
+ **Continuous doc updates:**
2709
+ When code changes, affected docs are flagged for update. WorkTrain runs a `doc-drift-scan` (part of the periodic analysis agents) that identifies docs whose described behavior no longer matches the code. When drift is detected, a queue item is created: "Update architecture-overview.md -- AgentLoop class was added, pi-mono removed."
2710
+
2711
+ ```yaml
2712
+ workspaces:
2713
+ workrail:
2714
+ docs:
2715
+ autoUpdate: true
2716
+ docPaths:
2717
+ - docs/architecture/
2718
+ - docs/design/
2719
+ - README.md
2720
+ driftCheck:
2721
+ schedule: "0 8 * * 1" # Monday morning
2722
+ onDrift: queue # or: pr, notify, ignore
2723
+ ```
2724
+
2725
+ **Doc sources it pulls from:**
2726
+
2727
+ | Source | What it provides |
2728
+ |--------|-----------------|
2729
+ | Knowledge graph | Symbol relationships, module structure, call paths |
2730
+ | Session store | Recent decisions, investigation findings, design rationale |
2731
+ | Backlog | Why things were built the way they were |
2732
+ | Git log | What changed, when, linked PRs |
2733
+ | Confluence/Notion | Existing team knowledge to incorporate |
2734
+ | Glean | Cross-system knowledge synthesis |
2735
+ | Code comments and JSDoc | Inline documentation |
2736
+
2737
+ **Doc formats it produces:**
2738
+ - Architecture overview (modules, dependencies, data flow)
2739
+ - API reference (from TypeScript types + JSDoc)
2740
+ - Runbook (operational procedures)
2741
+ - ADR (Architecture Decision Record -- from backlog decisions)
2742
+ - Postmortem (from incident investigation sessions)
2743
+ - Sprint recap (from completed queue items)
2744
+ - Onboarding guide (from architecture + setup docs)
2745
+
2746
+ ---
2747
+
2748
+ #### External workflow ingestion
2749
+
2750
+ WorkTrain can already discover and run workflows from external repos via managed sources (`[[workflow_repos]]` in Common-Ground config). This should be a first-class feature, not just a Common-Ground integration.
2751
+
2752
+ **How it works today (via managed sources):**
2753
+ Any workflow JSON file in a configured directory or git repo is automatically available. `workrail list` shows all workflows from all sources.
2754
+
2755
+ **What to add:**
2756
+
2757
+ **1. Workflow registry / marketplace:**
2758
+ A curated list of community workflows that WorkTrain can pull from. `worktrain workflow install <id>` fetches a workflow from the registry and adds it to the user's workflow library.
2759
+
2760
+ ```bash
2761
+ worktrain workflow install community/postgres-migration-workflow
2762
+ worktrain workflow install company/my-company-mr-review # private org registry
2763
+ worktrain workflow install ./local-custom-workflow.json # local file
2764
+ ```
2765
+
2766
+ **2. Workflow composition:**
2767
+ A workflow that calls another workflow as a step (already possible via `templateCall`, extend to full `workflowCall`). A coordinator workflow can invoke specialized workflows as phases:
2768
+
2769
+ ```json
2770
+ {
2771
+ "id": "full-feature-pipeline",
2772
+ "steps": [
2773
+ { "workflowCall": { "workflowId": "classify-task-workflow" } },
2774
+ { "workflowCall": { "workflowId": "wr.discovery", "when": "taskComplexity != Small" } },
2775
+ { "workflowCall": { "workflowId": "coding-task-workflow-agentic" } },
2776
+ { "workflowCall": { "workflowId": "mr-review-workflow.agentic.v2" } }
2777
+ ]
2778
+ }
2779
+ ```
2780
+
2781
+ **3. Workflow sharing between workspaces:**
2782
+ A workflow authored for workrail can be shared to storyforge without copying it. Workflows are linked by reference, not copied. Updates to the source propagate automatically (or on explicit sync).
2783
+
2784
+ **4. Org-level workflow libraries:**
2785
+ Teams publish their workflow libraries to a git repo. WorkTrain pulls from it. Every team member's WorkTrain automatically gets the team's curated workflow set. This is exactly what Common-Ground's `[[workflow_repos]]` does today -- make it a first-class WorkTrain config option without requiring Common-Ground.
2786
+
2787
+ ```yaml
2788
+ workspaces:
2789
+ my-work-project:
2790
+ workflowSources:
2791
+ - type: git
2792
+ url: https://github.com/mycompany/worktrain-workflows
2793
+ branch: main
2794
+ syncInterval: 3600
2795
+ - type: local
2796
+ path: ~/git/personal/workrail/workflows
2797
+ ```
2798
+
2799
+ ---
2800
+
2801
+ ### Workflow effectiveness assessment and self-improvement proposals (Apr 15, 2026)
2802
+
2803
+ **The idea:** WorkTrain runs workflows hundreds of times. It accumulates more data about workflow effectiveness than any human author ever could. It should use that data to propose improvements back -- to the workflow library, to the workflow authors, and to the community.
2804
+
2805
+ This closes the self-improvement loop: WorkTrain uses workflows → measures outcomes → proposes improvements → workflows get better → WorkTrain produces better results.
2806
+
2807
+ ---
2808
+
2809
+ #### What WorkTrain measures per workflow run
2810
+
2811
+ Every session already stores rich data in the session store. From this, WorkTrain can derive:
2812
+
2813
+ **Efficiency metrics:**
2814
+ - Steps skipped (condition gates that always skip for a given workflow type) → candidate for removal or restructuring
2815
+ - Steps that consistently take the most tokens/time → candidates for subagent offloading or simplification
2816
+ - Steps where the agent calls `continue_workflow` immediately with minimal work → the step prompt may be too vague or redundant
2817
+ - Steps where the agent hits `requireConfirmation` and always gets the same response → the gate is unnecessary for autonomous use
2818
+
2819
+ **Quality metrics:**
2820
+ - Sessions that produced PRs: how many had MR review findings? How severe?
2821
+ - Sessions where findings required multiple fix passes → the workflow may not be thorough enough in those areas
2822
+ - Sessions where the final output was rejected or required manual correction → workflow produced low-quality output
2823
+ - Verification gate pass rate (build_correctness, invariant_preservation) → how often does the workflow produce code that actually works?
2824
+
2825
+ **Completion metrics:**
2826
+ - Sessions that completed vs hit max_turns or timeout → workflow may be too long for the given task type
2827
+ - Steps where the agent loops unexpectedly (loop_control: continue more than expected) → loop exit conditions may be wrong
2828
+ - Steps with unusually high token consumption → prompt may be bloated
2829
+
2830
+ ---
2831
+
2832
+ #### The assessment workflow
2833
+
2834
+ A new `workflow-effectiveness-assessment` workflow (or routine) that:
2835
+
2836
+ 1. Reads session store history for a given workflowId (last N sessions)
2837
+ 2. Computes the metrics above
2838
+ 3. Identifies the top 3-5 issues with evidence (specific sessions, specific steps)
2839
+ 4. Proposes concrete changes:
2840
+ - "Step `phase-1b-design-quick` was skipped in 87% of sessions because `rigorMode != QUICK`. Consider making this condition more permissive or removing the step."
2841
+ - "Step `phase-4-plan-audit` consumed an average of 4,200 tokens per session. The loop runs 1.8 times on average. Consider reducing `maxIterations` from 2 to 1 for QUICK rigor mode."
2842
+ - "3 of the last 8 `coding-task-workflow-agentic` sessions produced PRs with Critical MR review findings. The workflow's verification step may not be catching these issues."
2843
+
2844
+ 5. Outputs a structured proposal:
2845
+
2846
+ ```json
2847
+ {
2848
+ "workflowId": "coding-task-workflow-agentic.lean.v2",
2849
+ "assessmentPeriod": "last 30 sessions",
2850
+ "proposedChanges": [
2851
+ {
2852
+ "stepId": "phase-1b-design-quick",
2853
+ "issue": "Skipped in 87% of sessions",
2854
+ "evidence": ["sess_abc", "sess_def", "sess_ghi"],
2855
+ "proposedChange": "Remove or restructure -- not exercised enough to justify its existence",
2856
+ "confidence": 0.85,
2857
+ "impactEstimate": "Saves ~200 tokens per session, no quality impact"
2858
+ }
2859
+ ],
2860
+ "overallHealthScore": 0.72,
2861
+ "recommendation": "Run workflow-for-workflows on this workflow with assessment findings attached"
2862
+ }
2863
+ ```
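The "skipped in 87% of sessions" style of evidence is a straightforward aggregation over session records. A sketch of one such metric, assuming a simplified session-store shape (the real store schema is richer):

```typescript
interface StepRecord {
  stepId: string;
  skipped: boolean;
}

interface SessionRecord {
  sessionId: string;
  steps: StepRecord[];
}

// Fraction of sessions in which a step was skipped, keeping the session
// ids of the skipping runs as evidence for the proposal.
export function skipRate(
  sessions: SessionRecord[],
  stepId: string
): { rate: number; evidence: string[] } {
  const relevant = sessions.filter((s) =>
    s.steps.some((st) => st.stepId === stepId)
  );
  const skippedIn = relevant.filter((s) =>
    s.steps.some((st) => st.stepId === stepId && st.skipped)
  );
  return {
    rate: relevant.length === 0 ? 0 : skippedIn.length / relevant.length,
    evidence: skippedIn.map((s) => s.sessionId),
  };
}
```

The other metrics (loop iteration counts, token consumption per step, verification pass rate) follow the same pattern: filter, count, attach evidence.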
2864
+
2865
+ ---
2866
+
2867
+ #### How proposals flow back
2868
+
2869
+ **To WorkRail (the open-source project):**
2870
+ WorkTrain creates a GitHub issue on `EtienneBBeaulac/workrail` with the assessment findings and proposed changes. The issue includes:
2871
+ - The assessment data (anonymized session stats, no content)
2872
+ - The proposed changes with rationale
2873
+ - Label: `workflow-improvement-proposal`
2874
+
2875
+ Any WorkTrain user can contribute workflow improvements back to the community just by running WorkTrain and enabling assessments.
2876
+
2877
+ **To workflow authors (for non-bundled workflows):**
2878
+ If the workflow came from an org workflow library (`workflowSources: git`), WorkTrain opens a PR against that repo with the proposed changes. The workflow author reviews and merges.
2879
+
2880
+ **To the local workflow library:**
2881
+ WorkTrain can automatically apply low-risk changes (reordering steps, updating prompt text) to the user's local workflow copy. High-risk changes (removing steps, changing conditions) require human review. Same governed/autonomous split as everywhere else.
2882
+
2883
+ ---
2884
+
2885
+ #### Continuous improvement loop
2886
+
2887
+ ```
2888
+ WorkTrain runs workflows
2889
+ → session store accumulates data
2890
+ → weekly: assessment routine analyzes patterns
2891
+ → proposals generated per workflow
2892
+ → low-confidence proposals: GitHub issue for human review
2893
+ → high-confidence, low-risk proposals: auto-applied to local copy + PR to community
2894
+ → workflow gets better
2895
+ → WorkTrain produces better results
2896
+ → loop repeats
2897
+ ```
2898
+
2899
+ **The compounding effect:** every WorkTrain instance that runs assessments contributes signal. A workflow used by 100 teams accumulates 10x the data of a workflow used by 10 teams. The more WorkTrain is used, the better its workflows get -- for everyone. This is the flywheel that makes WorkTrain's workflow library genuinely better than hand-authored alternatives over time.
2900
+
2901
+ **What makes this different from manual workflow improvement:**
2902
+ Humans improve workflows based on intuition and memorable failures. WorkTrain improves workflows based on statistical patterns across hundreds of runs. It finds issues that no human would notice -- like a step that's almost always skipped, or a loop that almost always terminates on the first pass, or a prompt fragment that correlates with lower-quality output.
2903
+
2904
+ **Integration with `workflow-for-workflows`:**
2905
+ The assessment output is designed to feed directly into `workflow-for-workflows`. Assessment findings become the context for authoring improved workflow versions. WorkTrain literally uses its own meta-workflow to improve its own workflows, informed by real execution data.
2906
+
2907
+ ---
+
+ ### Live queuing: push sources and real-time grooming (Apr 15, 2026)
+
+ **The problem with polling-only:** the queue is only as fresh as the last poll cycle. A critical bug filed in Jira might not appear in the queue for 5 minutes. A deadline that just moved to tomorrow might not re-prioritize for an hour. The work queue should feel live -- changes in external systems should surface in the queue within seconds, not minutes.
2909
+
2910
+ **Two mechanisms for live updates:**
2911
+
2912
+ **1. Push sources (webhooks from external systems)**
2913
+ When an external system supports webhooks, WorkTrain should register a receiver and process events immediately -- no polling lag.
2914
+
2915
+ ```yaml
2916
+ workspaces:
2917
+ workrail:
2918
+ queue:
2919
+ sources:
2920
+ - type: github_issues
2921
+ integration: github
2922
+ mode: push # vs poll -- receives webhook, processes immediately
2923
+ webhookSecret: $GITHUB_WEBHOOK_SECRET
2924
+ filter:
2925
+ labels: ['worktrain-queue']
2926
+
2927
+ - type: jira
2928
+ integration: jira
2929
+ mode: push # Jira webhook on issue create/update/transition
2930
+ webhookSecret: $JIRA_WEBHOOK_SECRET
2931
+ filter:
2932
+ project: ENG
2933
+ ```
2934
+
2935
+ A new GitHub issue labeled `worktrain-queue` fires a webhook → WorkTrain adds it to the queue within milliseconds. A Jira ticket assigned to WorkTrain → in the queue before the assignee closes the tab.
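The `webhookSecret` in the config is for authenticating those deliveries before they touch the queue. A sketch of GitHub-style signature verification (sha256 HMAC of the raw body, as carried in GitHub's `X-Hub-Signature-256` header), using node:crypto:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the expected signature header value for a raw webhook body.
export function signBody(rawBody: string, secret: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Constant-time comparison; only verified events reach QueueEventProcessor.
export function verifyWebhookSignature(
  rawBody: string,
  signatureHeader: string,
  secret: string
): boolean {
  const expected = Buffer.from(signBody(rawBody, secret));
  const received = Buffer.from(signatureHeader);
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Jira and Linear use their own webhook auth schemes, so the receiver would dispatch on source type; the length guard matters because `timingSafeEqual` throws on unequal-length buffers.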
2936
+
2937
+ **2. The message queue as live input**
2938
+ `worktrain tell "add X to the queue"` is already instantaneous -- it appends to `message-queue.jsonl` which the daemon drains between sessions. This is the live grooming path for manual items. It's also how you reorder, prioritize, remove, or modify queue items in real time:
2939
+
2940
+ ```bash
2941
+ worktrain tell "move the GitHub polling adapter to the top of the queue"
2942
+ worktrain tell "remove the documentation update task -- no longer needed"
2943
+ worktrain tell "bump the maxConcurrentSessions task to high priority, we need it for the demo"
2944
+ ```
2945
+
2946
+ The daemon's coordinator loop reads these messages, interprets them as queue operations, and applies them immediately.
2947
+
2948
+ **3. Live re-prioritization via deadline watcher**
2949
+ The deadline context refresh (already spec'd) runs every hour. For live grooming, the deadline watcher should also subscribe to calendar/milestone change events via webhook where available:
2950
+ - GitHub milestone due date changed → immediate re-prioritization
2951
+ - Jira sprint end date changed → immediate re-scoring
2952
+ - Google Calendar event added/moved → immediate re-scoring
2953
+
2954
+ **The live queue architecture:**
2955
+
2956
+ ```
+ External events (webhooks) ──→ POST /webhook/queue-push
+                                        │
+                                        ▼
+                              QueueEventProcessor
+                                        │
+                                  ┌─────┴──────┐
+                                  │            │
+                            Add to queue   Re-prioritize
+                            immediately    affected items
+                                  │            │
+                                  └─────┬──────┘
+                                        │
+                                        ▼
+                               queue.jsonl updated
+                                        │
+                                        ▼
+                    Console Queue tab refreshes (SSE)
+                    Coordinator picks up next item
+ ```
2975
+
2976
+ **The queue tab in the console is live:**
2977
+ The console Queue tab streams updates via SSE (same pattern as the live session badge already implemented). When a new item is added via webhook or message queue, it appears in the tab within milliseconds -- no page refresh needed. When re-prioritization happens, items smoothly reorder. This is the always-on view of what WorkTrain is working on and what's coming next.
2978
+
2979
+ **Grooming operations the live queue supports:**
2980
+
2981
+ | Operation | How |
2982
+ |-----------|-----|
2983
+ | Add item | `worktrain tell`, webhook, `worktrain enqueue` |
2984
+ | Remove item | `worktrain tell "remove X"`, `worktrain queue remove <id>` |
2985
+ | Reprioritize | `worktrain tell "prioritize X"`, deadline watcher, manual drag in console |
2986
+ | Pause item | `worktrain queue pause <id>` -- holds its queue position but is skipped until resumed |
2987
+ | Block item | System-set when dependencies not met (auto-resolves when deps complete) |
2988
+ | Split item | `worktrain tell "split X into smaller tasks"` → runs decomposition workflow |
2989
+ | Merge items | `worktrain tell "X and Y are the same thing, merge them"` |
2990
+ | Add context | `worktrain tell "for X, the BRD is at <url>"` → attaches to queue item |
2991
+
2992
+ **Why this changes the interaction model:**
2993
+ With polling-only queues, you have to trust that WorkTrain will eventually see the work. With live queuing, WorkTrain is always current. You file a critical bug at 11pm, the webhook fires, it's at the top of the queue, and WorkTrain starts investigating within seconds. You push a doc link into `worktrain tell`, the queue item gets the context immediately. The queue feels like a shared workspace, not a batch job.
2994
+
2995
+ ---
2996
+
2997
+ ### Live status briefings: WorkTrain narrates its own work in human terms (Apr 15, 2026)
2998
+
2999
+ **The problem:** WorkTrain is doing a lot. Sessions are running, PRs are open, the queue has items. But the raw view -- session IDs, PR numbers, branch names -- is only meaningful to someone who's been following along. A user who checks in after a few hours needs a human-readable briefing, not a list of `sess_abc123` entries.
3000
+
3001
+ **The vision:** WorkTrain can produce a live status briefing at any time -- a clear, plain-language summary of what's happening, why, and what comes next. Like a teammate giving you a standup.
3002
+
3003
+ ---
3004
+
3005
+ #### The `worktrain status` command
3006
+
3007
+ ```bash
3008
+ worktrain status --workspace workrail
3009
+ ```
3010
+
3011
+ Example output:
3012
+ ```
3013
+ WorkTrain — workrail workspace [16 Apr 2026, 14:32]
3014
+
3015
+ ACTIVE (3 sessions running)
3016
+ ● Implementing GitHub polling adapter
3017
+ → Adding support for GitHub Issues/PRs without requiring webhooks
3018
+ → Step 4 of 8: writing the polling scheduler integration tests
3019
+ → Running ~22 min, estimated 15 min remaining
3020
+
3021
+ ● Reviewing PR #406: first-party agent loop
3022
+ → Critical dependency removal: eliminates private npm package blocking public install
3023
+ → Step 2 of 6: analyzing tool schema migration
3024
+ → Running ~8 min
3025
+
3026
+ ● Fixing PR #402: auto-commit shell injection
3027
+ → Security fix: replacing exec() with execFile() to prevent shell injection
3028
+ → Step 6 of 8: running verification
3029
+ → Running ~31 min
3030
+
3031
+ QUEUE (next 5 items)
3032
+ 1. [HIGH] Implement maxConcurrentSessions semaphore
3033
+ → Prevents token burn under high load
3034
+ 2. [HIGH] worktrain tell/inbox message queue CLI
3035
+ → Enables async communication from mobile
3036
+ 3. [MED] Proof record schema for verification chain
3037
+ → Gates the auto-merge capability
3038
+ 4. [MED] Workspace namespacing groundwork
3039
+ → Prerequisite for multi-project support
3040
+ 5. [MED] Native cron trigger provider
3041
+
3042
+ RECENTLY COMPLETED (last 6 hours)
3043
+ ✓ PR #403 merged — worktrain init onboarding command (now: npm install -g + worktrain init = running)
3044
+ ✓ PR #397 merged — Session timeout + max-turn limit (prevents runaway LLM loops)
3045
+ ✓ PR #392 merged — Prior session context injection (agent remembers previous work)
3046
+ ✓ PR #405 merged — classify-task workflow (coordinator can now route pipelines)
3047
+
3048
+ BLOCKED / WAITING
3049
+ ⏸ PR #406 review returned changes — fixing 2 issues (tsc breakage + max_tokens handling)
3050
+ Will resume automatically once fixed and re-reviewed
3051
+
3052
+ UPCOMING MILESTONES
3053
+ → First-party agent loop (#406) — unblocks: public npm install without private packages
3054
+ → worktrain spawn/await — unblocks: script-driven coordinator orchestration
3055
+ → Auto-merge on proof records — unblocks: fully autonomous merge without human approval
3056
+ ```
3057
+
3058
+ ---
3059
+
3060
+ #### How it works
3061
+
3062
+ The briefing is assembled by a `build-status-briefing` routine (not a full workflow -- a single fast step) that reads:
3063
+ - Active sessions from the session store (what's running, which step, how long)
3064
+ - Queue state from `queue.jsonl`
3065
+ - Recent completions from the merge audit log + session store
3066
+ - Blocked/waiting items from the queue
3067
+ - Milestone dependencies from the backlog (which items unblock what)
3068
+
3069
+ The routine summarizes each session in 2-3 plain English lines:
3070
+ - What is being built (not the PR number, the capability)
3071
+ - Why it matters (how it connects to the user's goals)
3072
+ - Where it is (which step, estimated remaining time)
3073
+
3074
+ This requires WorkTrain to maintain a brief "plain English description" for each queue item and active session -- either extracted from the goal text, or generated when the item is enqueued.
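The briefing data the routine assembles, and the 2-3-line rendering, might look like this sketch. All type and field names are illustrative assumptions:

```typescript
// Hypothetical shapes for the assembled briefing.
interface ActiveSession { description: string; why: string; step: number; totalSteps: number; runningMinutes: number; }
interface QueueEntry { priority: "HIGH" | "MED"; description: string; why?: string; }

interface StatusBriefing {
  active: ActiveSession[];
  queue: QueueEntry[];
  recentlyCompleted: string[];
  blocked: string[];
}

// Renders one session as the plain-English lines shown in the example output:
// what is being built, why it matters, and where it is.
function renderSession(s: ActiveSession): string[] {
  return [
    `● ${s.description}`,
    `  → ${s.why}`,
    `  → Step ${s.step} of ${s.totalSteps}, running ~${s.runningMinutes} min`,
  ];
}
```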
3075
+
3076
+ ---
3077
+
3078
+ #### Live view in the console
3079
+
3080
+ The console gains a **Status tab** (the default view when you open the console):
3081
+
3082
+ ```
3083
+ ┌─────────────────────────────────────────────┐
3084
+ │ WorkTrain — workrail Live │
3085
+ ├─────────────────────────────────────────────┤
3086
+ │ ACTIVE 3 │
3087
+ │ │
3088
+ │ ● GitHub polling adapter 22m ████ │
3089
+ │ Step 4/8: writing tests │
3090
+ │ │
3091
+ │ ● PR #406 agent loop review 8m ██ │
3092
+ │ Step 2/6: schema analysis │
3093
+ │ │
3094
+ │ ● PR #402 shell injection fix 31m ████ │
3095
+ │ Step 6/8: verification │
3096
+ ├─────────────────────────────────────────────┤
3097
+ │ QUEUE 8 │
3098
+ │ 1 ▲ maxConcurrentSessions (HIGH) │
3099
+ │ 2 message queue CLI (HIGH) │
3100
+ │ 3 proof record schema (MED) │
3101
+ │ 4 ▼ workspace namespacing (MED) │
3102
+ ├─────────────────────────────────────────────┤
3103
+ │ DONE TODAY 12 │
3104
+ │ ✓ worktrain init ✓ session timeout │
3105
+ │ ✓ classify-task ✓ session context │
3106
+ └─────────────────────────────────────────────┘
3107
+ ```
3108
+
3109
+ Updates via SSE -- the progress bars move in real time, completed items slide up to DONE, new queue items animate in. Click any row to expand the full session detail or queue item.
3110
+
3111
+ ---
3112
+
3113
+ #### Push notifications to mobile/Slack
3114
+
3115
+ The same briefing data drives push notifications:
3116
+
3117
+ **Milestone completions:**
3118
+ > "WorkTrain shipped: worktrain init is live. You can now run `npm install -g @exaudeus/workrail && worktrain init` to set up a new instance in under 5 minutes. 3 more PRs in review."
3119
+
3120
+ **Blockers surfaced:**
3121
+ > "PR #406 (first-party agent loop) came back with 2 issues -- one causes tsc to fail on clean install. Fixing automatically, estimated 20 min."
3122
+
3123
+ **Daily digest (optional, configurable):**
3124
+ > "WorkTrain daily summary — 6 sessions completed, 3 PRs merged, 2 in review. Top priority tomorrow: spawn/await CLI (unblocks coordinator scripts). Queue has 8 items, 3 high priority."
3125
+
3126
+ The briefing is generated by a fast, cheap routine (Haiku model) that translates raw state into the right level of detail for the audience. Technical details available on request; the default is executive summary.
3127
+
3128
+ ---
3129
+
3130
+ #### Context-aware summarization
3131
+
3132
+ The briefing adapts to who's asking and what they know:
3133
+
3134
+ - **Owner/developer** (you): full detail -- PR numbers, session steps, technical blockers
3135
+ - **Stakeholder** (PM, manager): capability level -- "implementing X which enables Y, shipping this week"
3136
+ - **External** (customer, blog post): outcome level -- "automated code review is live, auto-merge coming next sprint"
3137
+
3138
+ `worktrain status --audience stakeholder` generates the right level of detail automatically. The underlying data is the same; the presentation layer changes.
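One way the presentation layer could encode the three audience levels; the field names here are assumptions about how such a config might look:

```typescript
type Audience = "owner" | "stakeholder" | "external";

interface BriefingDetail {
  includeIds: boolean;    // PR numbers, session IDs
  includeSteps: boolean;  // per-session workflow steps
  level: "technical" | "capability" | "outcome";
}

// Same underlying data; only the presentation changes per audience.
const DETAIL: Record<Audience, BriefingDetail> = {
  owner: { includeIds: true, includeSteps: true, level: "technical" },
  stakeholder: { includeIds: false, includeSteps: false, level: "capability" },
  external: { includeIds: false, includeSteps: false, level: "outcome" },
};
```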
3139
+
3140
+ This is also what the `worktrain talk` session uses as its opening context -- before any conversation, WorkTrain gives itself a briefing on the current state so it can answer questions accurately.
3141
+
3142
+ ---
3143
+
3144
+ ### WorkTrain analytics: stats, time saved, and quality metrics (Apr 15, 2026)
3145
+
3146
+ **The principle:** WorkTrain should be accountable. Not just "it did work" but "did it do good work?" Stats without quality metrics are vanity. Quality metrics without stats lack context. Both together tell you whether WorkTrain is actually worth running.
3147
+
3148
+ ---
3149
+
3150
+ #### Volume stats (what got done)
3151
+
3152
+ Derived from session store + merge audit log + GitHub/Jira API:
3153
+
3154
+ ```
3155
+ WorkTrain — workrail workspace [last 30 days]
3156
+
3157
+ VOLUME
3158
+ PRs opened: 23 (18 merged, 3 in review, 2 closed)
3159
+ PRs reviewed: 31 (autonomous MR review sessions)
3160
+ Bugs investigated: 8 (bug-investigation workflow runs)
3161
+ Tasks completed: 19 (coding-task workflow runs → merged PRs)
3162
+ Discoveries run: 12 (wr.discovery workflow runs)
3163
+ Issues filed: 6 (by WorkTrain based on findings)
3164
+ Issues resolved: 4 (WorkTrain opened and closed)
3165
+
3166
+ QUEUE THROUGHPUT
3167
+ Items added: 34
3168
+ Items completed: 27
3169
+ Items in progress: 4
3170
+ Items deferred: 3
3171
+ Average queue time: 2.4h (enqueue → session start)
3172
+ ```
3173
+
3174
+ ---
3175
+
3176
+ #### Time saved estimates
3177
+
3178
+ "Time saved" is directionally useful but must be honest about what it's estimating. WorkTrain shouldn't claim 40 hours saved if a human would have done the same work in 30 minutes.
3179
+
3180
+ **Estimation model:**
3181
+
3182
+ Each workflow type has a calibrated human-equivalent time estimate, validated against real data where possible:
3183
+
3184
+ | Workflow | Human equivalent | Basis |
3185
+ |----------|-----------------|-------|
3186
+ | MR review (STANDARD) | 25 min | Industry average for 200-line diff |
3187
+ | MR review (THOROUGH) | 45 min | Complex architectural changes |
3188
+ | Bug investigation | 60 min | Triage + root cause hypothesis |
3189
+ | Coding task (Small) | 30 min | Estimate based on task complexity |
3190
+ | Coding task (Medium) | 2h | |
3191
+ | Coding task (Large) | 6h | |
3192
+ | Discovery run | 45 min | Research + synthesis |
3193
+
3194
+ ```
3195
+ TIME SAVINGS (estimated)
3196
+ MR reviews: 31 × 25 min = 12.9h
3197
+ Bug investigation: 8 × 60 min = 8.0h
3198
+ Coding tasks: 19 tasks = 32.5h (mix of Small/Medium)
3199
+ Discovery: 12 × 45 min = 9.0h
3200
+ ─────────────────────────────────────────
3201
+ Total estimate: 62.4h ≈ 1.5 engineer-weeks
3202
+
3203
+ COST
3204
+ Total LLM tokens used: 4.2M
3205
+ Estimated API cost: $12.40
3206
+ Cost per hour saved: $0.20/h
3207
+
3208
+ NOTE: These are estimates. Actual time savings depend on task complexity
3209
+ and whether the work would otherwise have been done at all.
3210
+ ```
3211
+
3212
+ The honesty note matters. "Time saved" is only real if the work would have been done by a human. Tasks that were deprioritized indefinitely until WorkTrain did them represent more value than 25-minute estimates suggest.
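For reference, the arithmetic behind the TIME SAVINGS block, using the counts from the volume stats and the per-item minutes from the calibration table:

```typescript
// Convert (count × minutes each) to hours, rounded to one decimal place.
function hoursSaved(count: number, minutesEach: number): number {
  return Math.round(((count * minutesEach) / 60) * 10) / 10;
}

const reviews = hoursSaved(31, 25);   // 12.9h
const bugs = hoursSaved(8, 60);       // 8.0h
const discovery = hoursSaved(12, 45); // 9.0h
const codingTasks = 32.5;             // given directly (mix of Small/Medium)

const total = reviews + bugs + discovery + codingTasks; // 62.4h
const costPerHour = Math.round((12.40 / total) * 100) / 100; // ≈ $0.20/h
```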
3213
+
3214
+ ---
3215
+
3216
+ #### Quality metrics (is WorkTrain actually doing a good job?)
3217
+
3218
+ This is the most important section. Volume without quality is noise.
3219
+
3220
+ **Output quality:**
3221
+
3222
+ ```
3223
+ QUALITY — last 30 days
3224
+
3225
+ MR REVIEWS
3226
+ Reviews with 0 findings: 14 / 31 (45%) -- clean PRs, reviewed correctly
3227
+ Reviews that caught Critical: 4 / 31 (13%) -- high-value catches
3228
+ Reviews where human disagreed: 2 / 31 (6%) -- false positives / misses
3229
+ Review finding accuracy: 94% -- verified against merge outcomes
3230
+
3231
+ CODING TASKS
3232
+ PRs merged without rework: 13 / 18 (72%)
3233
+ PRs that needed 1 fix cycle: 4 / 18 (22%)
3234
+ PRs that needed 2+ fix cycles: 1 / 18 (6%)
3235
+ PRs that were rejected/closed: 0 / 18 (0%)
3236
+
3237
+ Post-merge bugs filed (30d): 1 -- bug traced to WorkTrain PR
3238
+ Post-merge bug rate: 5.6% -- 1 in 18 PRs caused a bug
3239
+
3240
+ BUG INVESTIGATIONS
3241
+ Correct root cause identified: 6 / 8 (75%)
3242
+ Confidence was too high: 1 / 8 (13%) -- confidently wrong
3243
+ Insufficient context: 1 / 8 (13%) -- escalated correctly
3244
+
3245
+ OVERALL QUALITY SCORE: 78 / 100
3246
+ Trend: ↑ +6 vs last month
3247
+ ```
3248
+
3249
+ **What the failure rate means:**
3250
+ A 5.6% post-merge bug rate on coding tasks means roughly 1 in 18 WorkTrain PRs introduced a bug that was later filed as an issue. That's roughly in line with commonly cited junior-developer rates (~10-15%). If it rises above 10%, there's a systemic problem to investigate -- maybe the verification step isn't thorough enough, or maybe certain task types are too risky for autonomous work.
3251
+
3252
+ The quality score is a weighted composite:
3253
+ - Review accuracy (40%)
3254
+ - Coding task success rate (35%)
3255
+ - Investigation accuracy (25%)
3256
+
3257
+ It's the single number that answers "is WorkTrain doing good work?" A score below 70 should trigger a `workflow-effectiveness-assessment` run automatically.
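A sketch of the weighted composite. The document specifies only the weights; the 0-100 input scale and rounding here are assumptions:

```typescript
// Weighted composite: review accuracy 40%, coding task success 35%,
// investigation accuracy 25%. Inputs assumed on a 0-100 scale.
function qualityScore(
  reviewAccuracy: number,
  codingSuccess: number,
  investigationAccuracy: number,
): number {
  return Math.round(0.40 * reviewAccuracy + 0.35 * codingSuccess + 0.25 * investigationAccuracy);
}
```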
3258
+
3259
+ ---
3260
+
3261
+ #### Quality feedback loop
3262
+
3263
+ WorkTrain actively solicits quality signals:
3264
+
3265
+ 1. **Post-merge outcome tracking:** when a PR merged by WorkTrain has a bug filed against it within 30 days, the session that produced that PR is flagged. The bug filing creates a data point that reduces the quality score.
3266
+
3267
+ 2. **MR review validation:** when WorkTrain reviews a PR and the PR author disputes a finding (e.g. closes without fixing what WorkTrain flagged, or fixes something WorkTrain missed), that's a signal. WorkTrain tracks these via webhook: if a PR that WorkTrain reviewed with an APPROVE verdict later ships a Critical bug, that review retroactively becomes a miss.
3268
+
3269
+ 3. **Human override tracking:** when a human changes a WorkTrain decision (reorders the queue, rejects a proposed change, overrides an auto-merge), those are signals that WorkTrain got something wrong. Each override is logged with a reason (if provided) and fed into the quality model.
3270
+
3271
+ 4. **Explicit feedback:** `worktrain feedback "the PR #402 review missed the temp file cleanup issue"` appends to a feedback log. The workflow effectiveness assessment picks these up.
3272
+
3273
+ ---
3274
+
3275
+ #### The quality dashboard (console Analytics tab)
3276
+
3277
+ ```
3278
+ ┌─────────────────────────────────────────────────────┐
3279
+ │ WorkTrain Analytics — workrail Last 30 days │
3280
+ ├─────────────────────────────────────────────────────┤
3281
+ │ QUALITY SCORE 78/100 ↑+6 COST $12.40 │
3282
+ │ ████████████████░░░░ │
3283
+ ├─────────────────────────────────────────────────────┤
3284
+ │ VOLUME QUALITY │
3285
+ │ PRs opened: 23 Merge success: 94% │
3286
+ │ PRs reviewed: 31 Review accuracy: 94% │
3287
+ │ Tasks done: 19 Post-merge bugs: 5.6% │
3288
+ │ Bugs found: 8 Bug investigation: 75% │
3289
+ ├─────────────────────────────────────────────────────┤
3290
+ │ TIME SAVED (estimated) │
3291
+ │ Total: ~62h Cost/hour: $0.20 │
3292
+ │ ████████████████████████████░░░ (62/80h budget) │
3293
+ ├─────────────────────────────────────────────────────┤
3294
+ │ TREND ────────────────────────────────── │
3295
+ │ Quality score by week: │
3296
+ │ W1: 68 W2: 71 W3: 74 W4: 78 ↑ improving │
3297
+ │ │
3298
+ │ Post-merge bug rate by workflow: │
3299
+ │ coding-task (Small): 0% (Medium): 8% (Large): 0%│
3300
+ │ → Medium tasks have highest bug rate, investigate │
3301
+ └─────────────────────────────────────────────────────┘
3302
+ ```
3303
+
3304
+ The "investigate" callout in the trend section is important -- the analytics dashboard doesn't just show numbers, it flags anomalies and links to the `workflow-effectiveness-assessment` that would address them. Stats → insight → action is the full loop.
3305
+
3306
+ ---
3307
+
3308
+ ### Pattern and architecture validation: WorkTrain enforces team conventions (Apr 15, 2026)
3309
+
3310
+ **The idea:** beyond just reviewing code for bugs, WorkTrain validates that the code matches the patterns and architecture the team expects. Not "does it work?" but "does it fit?"
3311
+
3312
+ **Two levels:**
3313
+
3314
+ **1. Philosophy lens (already partially built)**
3315
+ The coding-task workflow already applies the user's coding philosophy as a review lens -- flagging violations by principle name. This needs to be extended to be:
3316
+ - **Per-workspace configurable** -- different projects have different conventions
3317
+ - **Machine-checkable** -- some patterns can be verified structurally (no direct DB access outside the repository layer, no `console.log` in production code, no `any` types) rather than relying on the LLM to catch them
3318
+
3319
+ **2. Architectural invariant checking (new)**
3320
+ Explicit rules about what the codebase's structure must look like:
3321
+
3322
+ ```yaml
3323
+ workspaces:
3324
+ workrail:
3325
+ architectureRules:
3326
+ # Layer boundaries
3327
+ - id: no-daemon-imports-from-mcp
3328
+ rule: "src/daemon/** must not import from src/mcp/**"
3329
+ type: import_boundary
3330
+ severity: error
3331
+
3332
+ - id: no-di-calls-in-daemon
3333
+ rule: "src/daemon/** must not call initializeContainer() or container.resolve()"
3334
+ type: forbidden_call
3335
+ severity: error
3336
+
3337
+ # Pattern enforcement
3338
+ - id: errors-as-data
3339
+ rule: "No throw statements in src/daemon/**, src/trigger/** -- use Result types"
3340
+ type: no_throw
3341
+ severity: warning
3342
+ exceptions: ["constructor", "assertExhaustive"]
3343
+
3344
+ - id: no-exec-shell
3345
+ rule: "No child_process.exec() -- use execFile() with args array"
3346
+ type: forbidden_call
3347
+ severity: error
3348
+
3349
+ - id: no-hardcoded-tmp
3350
+ rule: "No '/tmp/' string literals -- use os.tmpdir()"
3351
+ type: forbidden_literal
3352
+ severity: warning
3353
+ ```
3354
+
3355
+ These rules run as scripts (static analysis, not LLM) -- fast, deterministic, zero tokens. They're checked:
3356
+ - During the coding-task workflow (before the agent commits anything)
3357
+ - As part of the CI gate (same `posix_tmp_literal` rule we fixed in PR #390 -- this is exactly that pattern generalized)
3358
+ - By the periodic architecture scan
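A deterministic check for two of the rule types above might look like this sketch; a real implementation would parse an AST rather than match substrings:

```typescript
interface RuleViolation { ruleId: string; severity: "error" | "warning"; }

// Zero-token static checks over raw source text (illustrative only).
function checkSource(source: string): RuleViolation[] {
  const violations: RuleViolation[] = [];
  // no-exec-shell (forbidden_call): flags exec( but not execFile(
  if (/\bexec\s*\(/.test(source)) {
    violations.push({ ruleId: "no-exec-shell", severity: "error" });
  }
  // no-hardcoded-tmp (forbidden_literal): flags '/tmp/' string literals
  if (source.includes("/tmp/")) {
    violations.push({ ruleId: "no-hardcoded-tmp", severity: "warning" });
  }
  return violations;
}
```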
3359
+
3360
+ **What this enables combined with quality metrics:**
3361
+ If WorkTrain's coding tasks have a 5.6% post-merge bug rate AND those bugs consistently violate the same architectural rule, the pattern validation catches it before merge next time. Quality metrics identify the problem; architecture rules prevent recurrence. The self-improvement loop: bugs found → rule added → violations caught earlier → bug rate drops.
3362
+
3363
+ **The self-improvement connection:**
3364
+ When the `workflow-effectiveness-assessment` runs and finds that a certain class of bug appears repeatedly in WorkTrain's output (e.g. "3 of the last 5 coding tasks had shell injection risks"), it can propose a new architecture rule (`no-exec-shell`) that prevents the pattern going forward. Rules start as soft warnings, graduate to errors after being validated. WorkTrain learns from its own failure patterns and codifies them as invariants.
3365
+
3366
+ ---
3367
+
3368
+ ### Resource management: preventing agent congestion under high concurrency (Apr 15, 2026)
3369
+
3370
+ **Observed problem:** running 10 simultaneous agents bogs down the system -- API rate limits, token exhaustion, context degradation from too many concurrent Bedrock/Anthropic calls, and the host machine running hot. The `maxConcurrentSessions` semaphore addresses the daemon-level cap, but the broader resource management problem has several dimensions.
3371
+
3372
+ **The dimensions:**
3373
+
3374
+ **1. API rate limits**
3375
+ Anthropic and Bedrock both have tokens-per-minute limits. 10 concurrent agents each hitting the API at once creates bursts that exceed the limit, causing retries and backpressure. The daemon needs a token-bucket rate limiter shared across all sessions: before each LLM call, acquire a slot from the bucket. If the bucket is empty, wait.
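A minimal token-bucket sketch of that shared limiter; the capacity and refill numbers in a real deployment would come from the provider's actual limits:

```typescript
// Shared across all sessions: acquire before each LLM call.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity;
  }

  private refill(): void {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = now;
  }

  // Returns false when the bucket is empty -- the caller should wait and retry.
  tryAcquire(cost: number): boolean {
    this.refill();
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```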
3376
+
3377
+ **2. Host machine resources**
3378
+ Each agent loop runs in-process, consuming RAM and CPU. Node.js is single-threaded but I/O is concurrent -- 10 agents making parallel API calls is fine until they all get responses simultaneously and saturate the JS event loop with JSON parsing and session store writes. The right limit is not "10 sessions" but "N sessions where N is calibrated to the host's memory and the model's response size."
3379
+
3380
+ **3. Tiered concurrency by task type**
3381
+ Not all sessions are equal. A `wr.discovery` session is cheap (mostly reads, fast). A `coding-task-workflow-agentic` session is expensive (many tool calls, long responses). Running 10 coding tasks simultaneously is very different from running 10 discovery sessions.
3382
+
3383
+ ```yaml
3384
+ workspaces:
3385
+ workrail:
3386
+ concurrency:
3387
+ maxTotal: 6 # global cap
3388
+ perWorkflowType:
3389
+ coding-task-workflow-agentic: 2 # expensive, cap low
3390
+ mr-review-workflow.agentic.v2: 3 # medium cost
3391
+ wr.discovery: 5 # cheap, allow more
3392
+ bug-investigation.agentic.v2: 2
3393
+ ```
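Slot accounting that honors both the global cap and the per-workflow-type caps in the config above could be sketched as (names are illustrative):

```typescript
class ConcurrencyLimiter {
  private running = new Map<string, number>();
  private total = 0;

  constructor(private maxTotal: number, private perType: Record<string, number>) {}

  // Returns false when either the global or the per-type cap is reached.
  tryStart(workflowType: string): boolean {
    const current = this.running.get(workflowType) ?? 0;
    const cap = this.perType[workflowType] ?? this.maxTotal;
    if (this.total >= this.maxTotal || current >= cap) return false;
    this.running.set(workflowType, current + 1);
    this.total += 1;
    return true;
  }

  finish(workflowType: string): void {
    const current = this.running.get(workflowType) ?? 0;
    if (current > 0) {
      this.running.set(workflowType, current - 1);
      this.total -= 1;
    }
  }
}
```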
3394
+
3395
+ **4. Queue-aware throttling**
3396
+ When the queue has a mix of high-priority and low-priority items, WorkTrain should prefer starting high-priority items even if slots are available for low-priority ones. If all slots are taken by low-priority work, high-priority items wait unnecessarily.
3397
+
3398
+ **5. Graceful degradation**
3399
+ When the system is under load, WorkTrain should degrade gracefully rather than failing hard. Options:
3400
+ - Slow down polling intervals (less frequent API calls)
3401
+ - Prefer fast/cheap workflows over slow/expensive ones
3402
+ - Pause the queue drain and process the backlog sequentially
3403
+
3404
+ **Build order:**
3405
+ 1. `maxConcurrentSessions` semaphore (in flight -- simple global cap)
3406
+ 2. Token-bucket rate limiter in the agent loop (prevents API bursts)
3407
+ 3. Per-workflow-type concurrency limits (tiered caps)
3408
+ 4. Queue-aware slot allocation (high-priority first)
3409
+ 5. Adaptive throttling based on observed latency (automatic backpressure)
3410
+
3411
+ **The meta-point:** WorkTrain running at full capacity on itself is the best stress test for these constraints. Every day we run 10 simultaneous agents, we discover the edges of what the system can handle. Those discoveries should directly inform the resource management implementation.
3412
+
3413
+
3414
+ ---
3415
+
3416
+ ### Universal integration layer: WorkTrain interfaces with everything (Apr 15, 2026)
3417
+
3418
+ **The principle:** WorkTrain is not opinionated about your stack. It works with whatever version control, project management, communication, monitoring, and documentation systems you use -- cloud or self-hosted, SaaS or on-prem. The integration layer is the boundary where WorkTrain connects to the outside world.
3419
+
3420
+ ---
3421
+
3422
+ #### Integration categories
3423
+
3424
+ **Version control**
3425
+ | System | Interface | Notes |
3426
+ |--------|-----------|-------|
3427
+ | GitHub (cloud) | REST API + polling | Primary target, already designed |
3428
+ | GitLab (cloud + self-hosted) | REST API + polling | Already in polling triggers |
3429
+ | Bitbucket | REST API + polling | Same pattern as GitLab |
3430
+ | Azure DevOps | REST API + polling | Large enterprise share |
3431
+ | Gitea / Forgejo | REST API + polling | Self-hosted open source |
3432
+ | Gerrit | REST API | Google's code review system |
3433
+ | Raw git | git CLI + filesystem | No API needed -- just a remote |
3434
+
3435
+ All VCS integrations share the same polling adapter pattern. The difference is the API schema -- the `GitLabPoller` becomes a template: implement `fetchEvents(since: Date): Event[]` and WorkTrain handles the rest.
3436
+
3437
+ **Project management / ticketing**
3438
+ | System | Interface | Notes |
3439
+ |--------|-----------|-------|
3440
+ | GitHub Issues | REST API (same token as VCS) | Zero extra config for GitHub users |
3441
+ | GitLab Issues | REST API (same token as VCS) | Zero extra config for GitLab users |
3442
+ | Jira (Cloud + Server + Data Center) | REST API + polling | Dominant enterprise tracker |
3443
+ | Linear | GraphQL API | Dominant startup tracker |
3444
+ | Asana | REST API | Common in non-engineering teams |
3445
+ | Notion | REST API | Database + docs hybrid |
3446
+ | Monday.com | REST API | Common in agencies/SMB |
3447
+ | Azure Boards | REST API | Azure ecosystem |
3448
+ | Shortcut (formerly Clubhouse) | REST API | Engineering-focused |
3449
+
3450
+ WorkTrain reads tickets to understand context, writes comments/status updates when work completes, creates new tickets when investigations surface issues, and transitions ticket status when PRs merge.
3451
+
3452
+ **Communication / notifications**
3453
+ | System | Interface | Notes |
3454
+ |--------|-----------|-------|
3455
+ | Slack | Incoming webhooks + Bot API | Most common dev team chat |
3456
+ | Microsoft Teams | Incoming webhooks + Graph API | Enterprise dominant |
3457
+ | Discord | Webhooks + Bot API | Common in open source |
3458
+ | Telegram | Bot API | Common for personal/small team |
3459
+ | Email | SMTP | Universal fallback |
3460
+ | PagerDuty | Events API | Incident escalation |
3461
+ | OpsGenie | REST API | Alerting + on-call |
3462
+ | Webhook (generic) | HTTP POST | Any system that accepts webhooks |
3463
+
3464
+ WorkTrain posts to the right channel based on the event type: PR review findings → the team's dev channel, critical incidents → #incidents + on-call, weekly health summary → #engineering, ideas → #product.
3465
+
3466
+ **Monitoring / observability**
3467
+ | System | Interface | Notes |
3468
+ |--------|-----------|-------|
3469
+ | Sentry | REST API + polling | Error tracking |
3470
+ | Datadog | REST API + polling | Metrics, traces, logs |
3471
+ | New Relic | REST API | APM |
3472
+ | Grafana / Prometheus | HTTP API | Self-hosted metrics |
3473
+ | PagerDuty | Events API | Incident triggers |
3474
+ | CloudWatch | AWS SDK | AWS-native |
3475
+ | Custom HTTP endpoint | HTTP GET/POST | Any system with an API |
3476
+
3477
+ WorkTrain polls for threshold breaches (same `PollingTriggerSource` pattern as VCS), investigates anomalies, and posts findings back.
3478
+
3479
+ **Documentation**
3480
+ | System | Interface | Notes |
3481
+ |--------|-----------|-------|
3482
+ | Confluence (Cloud + Server) | REST API | Most common enterprise wiki |
3483
+ | Notion | REST API | Also a project management system |
3484
+ | Google Docs / Drive | Google API | Common in startups |
3485
+ | Markdown in repo | git + filesystem | Zero extra config |
3486
+ | ReadTheDocs / Sphinx | Filesystem | Generated docs |
3487
+ | Docusaurus | Filesystem | Modern static docs |
3488
+
3489
+ WorkTrain reads doc systems as reference context for agents (same as `referenceUrls` today). It writes back when documentation needs updating after code changes.
3490
+
3491
+ ---
3492
+
3493
+ #### The integration architecture
3494
+
3495
+ **Three integration modes:**
3496
+
3497
+ 1. **Polling source** (already built for GitLab) -- WorkTrain calls the external API on a schedule, deduplicates events, dispatches workflows. Works for: VCS (new PRs/issues), ticketing (new tickets), monitoring (threshold breaches).
3498
+
3499
+ 2. **Delivery target** (already built for `callbackUrl`) -- WorkTrain POSTs results to an external system when a workflow completes. Works for: Slack/Teams/Discord notifications, Jira status updates, GitLab MR comments, PagerDuty incident resolution.
3500
+
3501
+ 3. **Reference context** (already built for `referenceUrls`) -- WorkTrain fetches external documents and injects them into the agent's context. Works for: Confluence pages, Google Docs, Notion databases, external API docs.
3502
+
3503
+ **The integration manifest in triggers.yml:**
3504
+ ```yaml
3505
+ integrations:
3506
+ github:
3507
+ token: $GITHUB_TOKEN
3508
+ baseUrl: https://api.github.com # override for GitHub Enterprise
3509
+
3510
+ jira:
3511
+ token: $JIRA_TOKEN
3512
+ baseUrl: https://mycompany.atlassian.net
3513
+ projectKey: ENG
3514
+
3515
+ slack:
3516
+ webhookUrl: $SLACK_WEBHOOK_URL
3517
+ channels:
3518
+ reviews: "#code-review"
3519
+ incidents: "#incidents"
3520
+ weekly: "#engineering"
3521
+
3522
+ datadog:
3523
+ apiKey: $DATADOG_API_KEY
3524
+ appKey: $DATADOG_APP_KEY
3525
+
3526
+ triggers:
3527
+ - id: new-jira-bug
3528
+ type: jira_poll
3529
+ source:
3530
+ integration: jira
3531
+ jql: "project = ENG AND issuetype = Bug AND status = Open AND created >= -1h"
3532
+ pollIntervalSeconds: 300
3533
+ workflowId: bug-investigation.agentic.v2
3534
+ goalTemplate: "Investigate Jira bug {{$.key}}: {{$.fields.summary}}"
3535
+ callbackUrl: "{{jira.baseUrl}}/rest/api/3/issue/{{$.key}}/comment"
3536
+ ```
3537
+
3538
+ **The adapter pattern:**
3539
+ Each integration is a standalone adapter module in `src/trigger/adapters/`:
3540
+ - `github-poller.ts` -- `fetchEvents(since): GitHubEvent[]`
3541
+ - `gitlab-poller.ts` -- (already exists) `fetchEvents(since): GitLabMR[]`
3542
+ - `jira-poller.ts` -- `fetchEvents(since): JiraIssue[]`
3543
+ - `linear-poller.ts` -- `fetchEvents(since): LinearIssue[]`
3544
+ - `sentry-poller.ts` -- `fetchEvents(since): SentryError[]`
3545
+ - `datadog-poller.ts` -- `fetchEvents(since): DatadogAlert[]`
3546
+
3547
+ Each adapter implements the same interface. The `PollingScheduler` doesn't know which adapter it's running -- it just calls `fetchEvents()` and dispatches. Adding a new integration is: implement the adapter, add a type to `TriggerDefinition`, handle it in `trigger-store.ts`. No changes to the scheduler or router.
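The shared interface and the adapter-agnostic poll step, as a sketch; exact names in the codebase may differ:

```typescript
// The contract every poller in src/trigger/adapters/ implements.
interface PollingAdapter<E> {
  id: string;
  fetchEvents(since: Date): Promise<E[]>;
}

// The scheduler doesn't know which adapter it's running:
// fetch, deduplicate by key, dispatch.
async function pollOnce<E>(
  adapter: PollingAdapter<E>,
  since: Date,
  seen: Set<string>,
  keyOf: (e: E) => string,
  dispatch: (e: E) => void,
): Promise<void> {
  const events = await adapter.fetchEvents(since);
  for (const e of events) {
    const key = keyOf(e);
    if (seen.has(key)) continue; // already dispatched
    seen.add(key);
    dispatch(e);
  }
}
```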
3548
+
3549
+ **Delivery adapters** follow the same pattern for writing back:
3550
+ - `slack-delivery.ts` -- formats and POSTs to Slack webhook
3551
+ - `jira-delivery.ts` -- adds comment to Jira issue, transitions status
3552
+ - `github-delivery.ts` -- posts PR review comment, creates issue
3553
+ - `pagerduty-delivery.ts` -- resolves or escalates incident
3554
+
3555
+ **The `callbackUrl` field becomes `deliveryTarget`** with a richer schema:
3556
+ ```yaml
3557
+ deliveryTarget:
3558
+ type: slack # or: jira, github, gitlab, pagerduty, webhook
3559
+ integration: slack # reference to integrations block
3560
+ channel: "#code-review"
3561
+ # OR for generic webhook:
3562
+ url: https://hooks.example.com/worktrain
3563
+ ```
3564
+
3565
+ ---
3566
+
3567
+ #### What this enables
3568
+
3569
+ A fully connected WorkTrain for a typical engineering team:
3570
+
3571
+ ```
3572
+ New Jira bug filed
3573
+ → WorkTrain investigates → posts findings as Jira comment
3574
+ → if auto-fixable → opens GitHub PR → reviews it → merges
3575
+ → transitions Jira ticket to "In Review" / "Done"
3576
+ → posts to #engineering: "Fixed JIRA-1234 autonomously -- PR #456"
3577
+
3578
+ Datadog alert fires
3579
+ → WorkTrain investigates logs + recent commits
3580
+ → posts to #incidents with root cause + affected files
3581
+ → if config fix → deploys fix → resolves PagerDuty incident
3582
+ → updates Confluence runbook with new pattern
3583
+
3584
+ Weekly
3585
+ → WorkTrain posts health summary to #engineering
3586
+ → files Linear tickets for technical debt items found in audit
3587
+ → updates Google Doc "Architecture Notes" with recent decisions
3588
+ ```
3589
+
3590
+ Zero humans needed unless the circuit breaker fires.
3591
+
3592
+ ---

#### Build order

**Now (already works):** generic `callbackUrl` (HTTP POST to any endpoint). Any system that accepts webhooks works immediately.

**Near-term:** GitHub polling adapter (same pattern as the GitLab adapter, which already exists as a template), Slack delivery adapter (format + post to webhook).

**Medium-term:** Jira polling + delivery (high enterprise value), Linear polling (high startup value), PagerDuty delivery (incident escalation).

**Long-term:** the full matrix above. Each adapter is a bounded, testable, independently shippable unit. The architecture supports adding them without touching the core engine.

---

### Multi-project WorkTrain: workspace isolation vs cross-project knowledge (to investigate, Apr 15, 2026)

**The problem:** WorkTrain needs to handle multiple completely unrelated projects simultaneously, but some projects are related and need to share knowledge. Handled naively, these are contradictory requirements.

**Three axes of tension:**

1. **Isolation vs shared context** -- project A's TypeScript symbols should never pollute project B's Python context. But if A and B share architectural patterns (both use WorkTrain, both follow the same auth pattern), that shared knowledge is valuable.

2. **Independent execution vs cross-project tasks** -- most tasks are scoped to one project. But some tasks span projects: "update the mobile app AND the backend API for this feature", "apply the same refactor pattern we used in workrail to storyforge".

3. **One daemon vs many** -- one daemon is easier to manage (one config, one console, one binary). Multiple daemons give true blast-radius isolation. The right answer is probably workspace namespacing inside one process, with cross-namespace knowledge queries for when projects are related.

**The cross-project knowledge requirement:**
When two projects are related (share patterns, have a dependency relationship, or are being worked on together), the knowledge graph should be queryable across project boundaries -- but opt-in, not default. A session working on project A can explicitly query "what's the equivalent pattern in project B?" but never sees project B's context by default.

**Proposed model:** workspace namespacing with explicit cross-workspace links

```yaml
workspaces:
  workrail:
    path: ~/git/personal/workrail
    soul: ~/.workrail/souls/workrail.md
    knowledgeGraph: ~/.workrail/graphs/workrail.db
    maxConcurrentSessions: 3
    relatedWorkspaces: [storyforge]  # can query the storyforge graph when explicitly needed

  storyforge:
    path: ~/git/personal/storyforge
    soul: ~/.workrail/souls/storyforge.md
    knowledgeGraph: ~/.workrail/graphs/storyforge.db
    maxConcurrentSessions: 1
    relatedWorkspaces: [workrail]
```

A session in `workrail` gets `workrail` context by default. If it calls `query_knowledge_graph(workspace: 'storyforge', ...)`, it gets storyforge context explicitly. The coordinator script can spawn workers in multiple workspaces for cross-project tasks.
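The opt-in guard could be as small as this sketch -- `Workspace` and `resolveQueryTarget` are hypothetical names, not the real API; the `relatedWorkspaces` field comes from the config above:

```typescript
interface Workspace {
  name: string;
  relatedWorkspaces: string[]; // explicit cross-workspace links from config
}

// Resolve which workspace a knowledge-graph query may read from.
function resolveQueryTarget(
  current: Workspace,
  requested: string | undefined,
  workspaces: Map<string, Workspace>,
): Workspace {
  // Default: a session only sees its own workspace's graph.
  if (!requested || requested === current.name) return current;
  // Cross-workspace queries must follow a declared link -- never implicit.
  if (!current.relatedWorkspaces.includes(requested)) {
    throw new Error(`workspace '${requested}' is not linked from '${current.name}'`);
  }
  const target = workspaces.get(requested);
  if (!target) throw new Error(`unknown workspace '${requested}'`);
  return target;
}
```

The point of the sketch: isolation is the default code path, and sharing is an error unless the config explicitly allows it.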

**Investigation needed (discovery agent running):**
- Is workspace namespacing inside one process the right architecture, or should each project run a separate daemon?
- What exactly needs to be workspace-scoped vs globally shared?
- How do cross-project coordinator tasks work? (spawn a worker in workspace A, spawn a worker in workspace B, coordinator synthesizes)
- What's the knowledge graph query interface for cross-workspace queries?
- How does the console show multi-workspace activity without being overwhelming?
- What's the blast radius if one workspace's agent goes rogue?

**What CAN be shared globally (no namespace needed):**
- The WorkTrain binary and workflow library
- Token usage / billing tracking
- The message queue (`~/.workrail/message-queue.jsonl`)
- The merge audit log
- The outbox (notifications to the user)
- The `worktrain talk` session (can discuss any workspace)

**What MUST be workspace-scoped:**
- Knowledge graph (symbols from different codebases must not mix)
- daemon-soul.md (different stacks need different principles)
- Session store (project A's sessions should not appear in project B's console view by default)
- Concurrency limits (project A should not starve project B)
- Triggers and polling sources (each workspace has its own event sources)

---

### Never worktree main: branch safety rules for WorkTrain (Apr 16, 2026)

**Critical invariant:** WorkTrain must never check out `main` or `master` into a worktree. Locking main in a worktree blocks all other agents from checking out main and prevents fast-forward merges.

**The rule:**
- All agent worktrees must use feature branches -- never `main`, `master`, or any protected branch
- When creating a worktree for a task, WorkTrain always creates a new branch: `git worktree add <path> -b <branch-name>`
- If an agent needs to read main's state, it uses `git show origin/main:<file>` without checking out the branch
- Stale worktrees (branches that have been merged) must be cleaned up automatically after session completion

**How it breaks today:**
The `--isolation worktree` flag on subagents creates a worktree. If the agent's task involves reading and committing to main directly (e.g. a merge task), it can end up with main locked. This happened during today's session.

**The fix (three parts):**

1. **In the daemon worktree creation code:** before creating a worktree, check if the requested branch is `main`, `master`, or any branch in a configurable `protectedBranches` list. If so, create a new branch from it instead.

2. **In the daemon-soul.md:** add an explicit rule:
   ```
   ## Branch Safety
   - NEVER check out main, master, or any protected branch into a worktree
   - NEVER use 'git checkout main' -- always work on a feature branch
   - When merging to main, use 'gh pr merge' (via PR), never direct git push
   - After a PR merges, immediately clean up the local worktree
   ```

3. **Automatic stale worktree cleanup:** after each session completes (success or failure), the daemon runs `git worktree prune` and removes any worktrees whose branches have been merged to main.
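The guard in fix (1) can be sketched in a few lines. The `worktrain/` branch-name scheme here is an assumption for illustration, not the real convention:

```typescript
// Configurable list of branches that must never be checked out directly.
const protectedBranches = ["main", "master"];

// Decide which branch a new worktree should actually check out.
function worktreeBranchFor(requested: string, taskId: string): string {
  // Never lock a protected branch in a worktree; branch off it instead.
  if (protectedBranches.includes(requested)) {
    return `worktrain/${taskId}-from-${requested}`;
  }
  return requested;
}
```

The daemon would then run `git worktree add <path> -b <result>` with the returned name, so a request for `main` always produces a fresh feature branch instead.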

---

### Communication agent: Slack monitoring, email management, and suggested responses (Apr 16, 2026)

**The idea:** WorkTrain monitors your communication channels, understands context, and either responds on your behalf or prepares vetted drafts for you to send.

**Slack:**
- Monitor specified channels and DMs for messages that mention you, reference your projects, or require a response
- Understand context: "who is asking, what do they need, what's the relevant project state?"
- Options: auto-respond to routine questions ("what's the status of X?" → WorkTrain knows), draft a response for you to review and send, or surface a notification: "someone needs your input on Y"
- Configurable per-channel: some channels auto-respond, some always require your review
- Filter noise: identify which Slack threads are actually important vs chatter

**Email:**
- Same pattern as Slack -- monitor the inbox, understand context, draft responses
- Suggest email filters, folder rules, and unsubscribe candidates based on patterns WorkTrain observes
- "You've received 47 newsletters this month from 12 senders and never opened them -- want me to unsubscribe?"
- Priority surfacing: "3 emails in your inbox need a response, here are the drafts"

**Important constraint:** WorkTrain never sends on your behalf without explicit approval for anything that goes to other people. Auto-respond is opt-in per-channel, with a review window before sending. You always see what was sent and can recall/edit.

---

### Local file organization and maintenance (Apr 16, 2026)

- WorkTrain scans specified directories for stale, duplicate, and disorganized files
- Suggests folder structures based on file content and usage patterns
- Identifies documents that are out of date and offers to update them
- Keeps project-related files in sync with the repo (e.g. local design files linked to Figma specs, local notes linked to Confluence pages)
- "~/Downloads has 847 files, most untouched for 6 months -- here's what's safe to delete and what should be archived"
- Connects to the knowledge graph: files that reference code or projects get indexed alongside the code

---

### Git worktrees and branch management as a first-class capability (Apr 16, 2026)

**Critical for parallel work.** WorkTrain needs native, sophisticated git management -- not just running git commands but understanding the full branching topology and managing it intelligently.

**What this means:**

**Worktree management:**
- Create, list, switch between, and clean up worktrees automatically
- Each concurrent task gets its own worktree (WorkTrain already does this via `.claude/worktrees/`)
- Detect and warn about stale worktrees (branches that have been merged or abandoned)
- The `cw <branch>` command pattern already exists -- WorkTrain should be able to invoke it for any task that needs isolation

**Branch lifecycle:**
- Know which branches are: active (being worked on), stale (no commits in N days), merged (on main), or orphaned (created but abandoned)
- Automatic cleanup proposals: "14 branches are merged and safe to delete, 3 are stale, 2 have uncommitted work"
- Rebase management: when main advances, WorkTrain knows which in-flight branches need rebasing and does it automatically (or queues it)
- Conflict detection: before spawning a new session, check if any in-flight branch would conflict with the planned changes

**Parallel work coordination:**
- When multiple tasks touch the same files, WorkTrain detects potential conflicts before they happen
- Sequences tasks that would conflict, parallelizes those that won't
- Maintains a "file lock" mental model: this file is being modified by session A, session B should wait or work on a different scope
- When a feature branch is ready, WorkTrain handles the full merge/rebase/PR creation flow
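The "file lock" mental model reduces to an overlap check before spawning. A minimal sketch, with hypothetical names and shapes -- the real scheduler would derive the planned file scope from the task itself:

```typescript
// Given a new task's planned file scope and the files currently held by
// active sessions, return the sessions it would conflict with.
function conflictingSessions(
  planned: string[],
  active: Map<string, Set<string>>, // sessionId -> files in flight
): string[] {
  const hits: string[] = [];
  for (const [sessionId, files] of active) {
    if (planned.some((f) => files.has(f))) hits.push(sessionId);
  }
  return hits;
}
```

An empty result means the task can run in parallel; a non-empty one means it should be sequenced after the listed sessions.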

**Branch naming and organization:**
- Enforces consistent branch naming conventions (already partially done via the daemon soul)
- Groups related branches: `feat/github-polling-*` are all part of the same epic
- Links branches to tickets/queue items: opening a PR creates the Jira transition, closing a PR cleans up the branch

**The `worktrain worktree` command family:**
```bash
worktrain worktree list                    # all worktrees and their status
worktrain worktree clean                   # remove merged/stale worktrees
worktrain worktree new <branch> [--task]   # create worktree + optionally link to queue item
worktrain worktree status                  # which files are locked by active sessions
```

This is especially critical when WorkTrain is managing 10 concurrent sessions -- without explicit worktree management, two sessions could clobber each other's changes on the same branch.

---

### Thin spots: ideas that need fuller spec (Apr 16, 2026)

These were mentioned and partially captured but need more detail when the time comes:

**`worktrain feedback` command:**
An explicit quality feedback loop. `worktrain feedback "the PR #402 review missed the temp file cleanup issue"` appends to `~/.workrail/feedback.jsonl`. The workflow-effectiveness-assessment picks these up alongside statistical patterns. User feedback is weighted higher than inferred signals.

**`worktrain idea` command:**
Lightweight idea capture without interrupting active work. `worktrain idea "nested subagents up to N depth"` appends to `~/.workrail/ideas-buffer.jsonl`. The `worktrain talk` session reviews the buffer at conversation start and decides what to groom into the backlog. Prevents good ideas from getting lost when 10 agents are running.

**Audience-aware status briefings (`--audience` flag):**
`worktrain status --audience owner` (full technical detail, the default) vs `--audience stakeholder` (capability level, no PR numbers) vs `--audience external` (outcome level, no internal terminology). Same underlying data, different presentation layer. The Haiku-level routine adjusts verbosity and replaces technical terms with plain language.

**`worktrain queue` CLI commands:**
```bash
worktrain queue list [--workspace <name>]    # show queue with priorities and status
worktrain queue pause [--workspace <name>]   # stop draining
worktrain queue resume [--workspace <name>]  # resume draining
worktrain queue remove <id>                  # remove item
worktrain queue bump <id>                    # move to top
worktrain queue show <id>                    # full item details + pipeline plan
```

**Workspace-scoped soul and config:**
Each workspace has its own `daemon-soul.md` at a configurable path. Soul resolution cascade: trigger-level override → workspace soul → global `~/.workrail/daemon-soul.md` → built-in default. Enables TypeScript and Python workspaces to have different behavioral profiles on the same WorkTrain instance.
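The cascade is a straightforward fallback chain. A sketch with hypothetical option names -- the real resolver would also check that each file exists on disk:

```typescript
// Resolve which soul file governs a session, most specific first.
function resolveSoul(opts: {
  triggerSoul?: string;    // trigger-level override
  workspaceSoul?: string;  // workspace-scoped soul
  globalSoul?: string;     // ~/.workrail/daemon-soul.md, if present
}): string {
  return opts.triggerSoul
      ?? opts.workspaceSoul
      ?? opts.globalSoul
      ?? "<built-in default>";
}
```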

**Automatic worktree cleanup:**
After any session completes (success or failure), the daemon automatically runs `git worktree prune` and removes any worktrees whose branches are merged to main. Prevents the main-worktree-lock issue encountered today.

---

### The single-conversation problem: WorkTrain needs multi-threaded interaction (Apr 16, 2026)

A single chat where everything happens at once is not ideal. When WorkTrain is managing 10 concurrent agents, it becomes impossible to know what's been captured vs what's floating, to follow any one thread, or to distinguish "in progress" from "needs a decision."

**Threaded conversations per work group:**
Each active work group gets its own conversation thread. You can follow the polling-triggers work in thread A without seeing the spawn/await implementation in thread B. Threads are persistent -- come back 2 hours later and pick up exactly where you left off.

**`worktrain talk` shows a thread list:**
```
Threads:
  ● WorkRail development      [3 active agents, 2 waiting]
  ● Storyforge chapter work   [idle]
  → Select thread or type to start a new one
```

**`worktrain idea` for mid-conversation capture:**
When a new idea comes up while 10 agents are running, `worktrain idea "..."` appends to an ideas buffer without interrupting active work. The talk session reviews the buffer at the start of each conversation.

**Build order:** thread model → thread list console view → cross-thread notifications → idea capture buffer.

---

### Nested subagent depth: configurable delegation chains (Apr 16, 2026)

WorkTrain should support nested subagents -- an agent spawning a subagent, which spawns its own -- up to a configurable depth limit.

```yaml
workspaces:
  workrail:
    agentDefaults:
      maxSubagentDepth: 3        # coordinator=0, worker=1, subagent=2, sub-subagent=3
      maxTotalAgentsPerTask: 10  # hard cap across all depths for a single task
```

**Depth semantics:**
- Depth 0: coordinator script (no LLM, pure script)
- Depth 1: main worker (coding-task, mr-review)
- Depth 2: subagent from a workflow step (routine-context-gathering, etc.)
- Depth 3: sub-subagent (rare, deep investigation chains)
- Depth 4+: almost certainly a bug or runaway loop

**The `maxTotalAgentsPerTask` budget** prevents exponential explosion -- without this cap, a depth-3 tree that fans out 3 agents per node reaches 3^3 = 27 agents at the deepest level alone.
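The two limits combine into a single spawn guard. A sketch with hypothetical type names -- only `maxSubagentDepth` and `maxTotalAgentsPerTask` come from the config above:

```typescript
interface TaskBudget {
  maxSubagentDepth: number;      // e.g. 3
  maxTotalAgentsPerTask: number; // e.g. 10
  spawned: number;               // agents already created for this task
}

// Check both limits before spawning a child at the given depth
// (coordinator=0, worker=1, subagent=2, ...).
function canSpawn(childDepth: number, budget: TaskBudget): boolean {
  return childDepth <= budget.maxSubagentDepth &&
         budget.spawned < budget.maxTotalAgentsPerTask;
}
```

The depth check stops runaway delegation chains; the total-count check stops wide fan-out that stays within the depth limit.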

**Console DAG view** shows nesting depth as indentation, which makes over-delegation immediately visible.

---

### WorkTrain attribution and acting as the user (Apr 16, 2026)

**Attribution / signing:**
1. **Commit signatures:** commits made by WorkTrain include `Co-Authored-By: WorkTrain <worktrain@etienneb.dev>`. The configured `worktrain-bot` identity is consistent across all workspaces.
2. **PR/MR description footer:** `---\n🤖 Implemented by WorkTrain · Session: sess_abc123 · Workflows run: coding-task, mr-review`. Links to the session for a full audit trail.
3. **Issue/comment attribution:** WorkTrain comments include "WorkTrain investigation" with a session link. Clearly not a human.

**Value:** audit trail, trust calibration for reviewers, "how much of our code was WorkTrain-authored?" becomes queryable, open-source visibility.

**Acting as the user:**
WorkTrain uses the user's git identity and GitHub account (via the user's token) to act as them. PRs appear from @EtienneBBeaulac, commits show as Etienne Beaulac.

**Why useful:** normal PR approval flows, no bot account permissions needed, personal git history stays personal even for WorkTrain-authored work.

**Trust guardrails:** `actAsUser: true` is an explicit opt-in, only for commits/PRs (never emails or Slack without additional permission), the PR description always notes "Created by WorkTrain," and a full audit log lives in `~/.workrail/actions-as-user.jsonl`.

---

### Console session detail: more than the DAG when running standalone (Apr 16, 2026)

**The gap:** the session DAG shows structure (steps, edges, progress) but not meaning. When you're watching a session run in the console without being in Claude Code, you want to know what the agent is *actually doing* -- not just which step it's on.

**What's missing from the current DAG view:**
- The latest step output note, rendered inline and updating as it streams (not hidden behind a click)
- A plain-English summary of what the agent is doing right now ("Analyzing the diff for shell injection risks")
- The current step prompt, visible on demand (so you know what the agent was asked to do)
- Token count and cost estimate for the session so far
- Time elapsed + estimated time remaining based on step history
- A live feed of tool calls as they happen ("Reading trigger-router.ts", "Running npm test")

**The streaming step output** is the most valuable addition. Right now the DAG shows a step as "in progress" with a spinner. It should show the last few lines of the step's output note as it's being written, similar to how a terminal streams command output.

**Build order:**
1. Inline latest step output in the session detail panel (read from the session store, poll every 2s)
2. Live tool call feed alongside the DAG (SSE from the daemon, log each tool call as it fires)
3. Token/cost counter (the daemon tracks tokens per session, exposed via GET /api/v2/sessions/:id)
4. Plain-English status line ("Step 3/8: analyzing diff" vs just a spinner)

This makes the console genuinely useful as a standalone monitoring surface -- not just for developers who understand the DAG topology, but for anyone who wants to know whether WorkTrain is doing useful work or spinning.

---

### Orphaned daemon session state: smarter recovery (Apr 16, 2026)

**The problem:** when the daemon is killed mid-session, the session's in-process `KeyedAsyncQueue` promise chain is lost. On restart, the startup recovery reads orphaned session files and clears them from disk -- but the `serial` concurrency queue key based on `trigger.id` is an in-memory construct, so any external state that was tied to the queue (e.g. a lock file, a flag in the session store) is now inconsistent.

More critically: if a session is restarted by the daemon but then stalls (a Bedrock call hangs, an exception is suppressed), the daemon log shows nothing after "Injecting workspace context" -- no error, no completion. The session is in limbo.

**What needs to happen:**

1. **Startup recovery should also clear any pending queue slots.** If a session file exists in `~/.workrail/daemon-sessions/` at startup, that trigger's queue key should be treated as free -- no prior promise is alive.

2. **Session liveness detection.** If a session has been `in_progress` for more than N minutes with no `advance_recorded` events, the daemon watchdog should log a warning and optionally abort the session. Currently a hung session is invisible.

3. **Orphaned session cleanup should be user-facing.** `worktrain cleanup` or `worktrain status` should surface orphaned sessions with their age and offer to clear them. Right now they silently accumulate.

4. **Better logging when `runWorkflow()` swallows errors.** The `void runWorkflow(...)` pattern in `console-routes.ts` and `trigger-router.ts` drops errors silently. Every path that ends in silence (no log, no session advance, no error) should at minimum log `[WorkflowRunner] Session died silently` with the session ID.
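A sketch of what fix (4) could look like: keep the fire-and-forget shape but attach a `.catch` so no rejection is ever dropped. The `fireAndForget` helper is hypothetical; the log message comes from the note above:

```typescript
// Replacement for the bare `void runWorkflow(...)` pattern: the caller
// still doesn't await the run, but every rejection leaves a log line.
function fireAndForget(sessionId: string, run: () => Promise<void>): void {
  run().catch((err) => {
    console.error(`[WorkflowRunner] Session died silently: ${sessionId}`, err);
  });
}
```

Call sites would change from `void runWorkflow(session)` to `fireAndForget(session.id, () => runWorkflow(session))`, closing the silent-death path without changing the non-blocking behavior.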