@exaudeus/workrail 3.31.1 → 3.33.0

Files changed (82)
  1. package/dist/cli/commands/index.d.ts +1 -0
  2. package/dist/cli/commands/index.js +3 -1
  3. package/dist/cli/commands/worktrain-await.js +11 -9
  4. package/dist/cli/commands/worktrain-daemon-install.d.ts +35 -0
  5. package/dist/cli/commands/worktrain-daemon-install.js +291 -0
  6. package/dist/cli/commands/worktrain-daemon.d.ts +31 -0
  7. package/dist/cli/commands/worktrain-daemon.js +272 -0
  8. package/dist/cli/commands/worktrain-spawn.js +11 -9
  9. package/dist/cli-worktrain.js +329 -0
  10. package/dist/cli.js +4 -22
  11. package/dist/console/standalone-console.d.ts +28 -0
  12. package/dist/console/standalone-console.js +142 -0
  13. package/dist/{console/assets/index-6H9DeFxj.js → console-ui/assets/index-BuJFLLfY.js} +1 -1
  14. package/dist/{console → console-ui}/index.html +1 -1
  15. package/dist/daemon/agent-loop.d.ts +26 -0
  16. package/dist/daemon/agent-loop.js +53 -2
  17. package/dist/daemon/daemon-events.d.ts +103 -0
  18. package/dist/daemon/daemon-events.js +56 -0
  19. package/dist/daemon/workflow-runner.d.ts +6 -3
  20. package/dist/daemon/workflow-runner.js +229 -33
  21. package/dist/infrastructure/session/HttpServer.js +133 -34
  22. package/dist/manifest.json +134 -70
  23. package/dist/mcp/output-schemas.d.ts +30 -30
  24. package/dist/mcp/transports/bridge-events.d.ts +4 -0
  25. package/dist/mcp/transports/fatal-exit.js +4 -0
  26. package/dist/mcp/transports/http-entry.js +2 -0
  27. package/dist/mcp/transports/stdio-entry.js +26 -6
  28. package/dist/mcp/v2/tools.d.ts +4 -4
  29. package/dist/trigger/adapters/github-poller.d.ts +44 -0
  30. package/dist/trigger/adapters/github-poller.js +190 -0
  31. package/dist/trigger/adapters/gitlab-poller.d.ts +27 -0
  32. package/dist/trigger/adapters/gitlab-poller.js +81 -0
  33. package/dist/trigger/delivery-client.d.ts +2 -1
  34. package/dist/trigger/delivery-client.js +4 -1
  35. package/dist/trigger/index.d.ts +4 -1
  36. package/dist/trigger/index.js +5 -1
  37. package/dist/trigger/polled-event-store.d.ts +22 -0
  38. package/dist/trigger/polled-event-store.js +173 -0
  39. package/dist/trigger/polling-scheduler.d.ts +20 -0
  40. package/dist/trigger/polling-scheduler.js +249 -0
  41. package/dist/trigger/trigger-listener.d.ts +5 -0
  42. package/dist/trigger/trigger-listener.js +53 -4
  43. package/dist/trigger/trigger-router.d.ts +4 -2
  44. package/dist/trigger/trigger-router.js +7 -4
  45. package/dist/trigger/trigger-store.js +114 -33
  46. package/dist/trigger/types.d.ts +17 -1
  47. package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +224 -224
  48. package/dist/v2/durable-core/schemas/session/events.d.ts +42 -42
  49. package/dist/v2/durable-core/schemas/session/manifest.d.ts +6 -6
  50. package/dist/v2/durable-core/schemas/session/validation-event.d.ts +2 -2
  51. package/dist/v2/durable-core/tokens/payloads.d.ts +52 -52
  52. package/dist/v2/usecases/console-routes.js +3 -3
  53. package/dist/v2/usecases/console-service.js +133 -9
  54. package/dist/v2/usecases/console-types.d.ts +7 -0
  55. package/docs/design/daemon-conversation-logging-plan.md +98 -0
  56. package/docs/design/daemon-conversation-logging-review.md +55 -0
  57. package/docs/design/daemon-conversation-logging.md +129 -0
  58. package/docs/design/github-polling-adapter-design-candidates.md +226 -0
  59. package/docs/design/github-polling-adapter-design-review-findings.md +131 -0
  60. package/docs/design/github-polling-adapter-implementation-plan.md +284 -0
  61. package/docs/design/implementation_plan.md +192 -0
  62. package/docs/design/workflow-id-validation-at-startup.md +146 -0
  63. package/docs/design/workflow-id-validation-design-review.md +87 -0
  64. package/docs/design/workflow-id-validation-implementation-plan.md +185 -0
  65. package/docs/design/worktrain-system-prompt-report-issue-candidates.md +135 -0
  66. package/docs/design/worktrain-system-prompt-report-issue-design-review.md +73 -0
  67. package/docs/ideas/backlog.md +465 -0
  68. package/package.json +1 -1
  69. package/workflows/architecture-scalability-audit.json +1 -1
  70. package/workflows/bug-investigation.agentic.v2.json +3 -3
  71. package/workflows/coding-task-workflow-agentic.json +32 -32
  72. package/workflows/coding-task-workflow-agentic.lean.v2.json +1 -1
  73. package/workflows/coding-task-workflow-agentic.v2.json +7 -7
  74. package/workflows/mr-review-workflow.agentic.v2.json +21 -12
  75. package/workflows/personal-learning-materials-creation-branched.json +2 -2
  76. package/workflows/production-readiness-audit.json +1 -1
  77. package/workflows/relocation-workflow-us.json +2 -2
  78. package/workflows/ui-ux-design-workflow.json +14 -14
  79. package/workflows/workflow-for-workflows.json +3 -3
  80. package/workflows/workflow-for-workflows.v2.json +2 -2
  81. package/workflows/wr.discovery.json +1 -1
  82. package/dist/{console → console-ui}/assets/index-8dh0Psu-.css +0 -0
@@ -0,0 +1,192 @@
1
+ # Implementation Plan: WorkTrain System Prompt Preamble + report_issue Tool
2
+
3
+ ## Problem Statement
4
+
5
+ The WorkTrain daemon's system prompt preamble is thin (15 lines) and relies on the soul file for behavioral guidance. This leaves unattended agents without explicit direction on self-directed reasoning, the oracle hierarchy, or what to do when things go wrong. Additionally, there is no structured way for agents to record issues/errors for a future auto-fix coordinator -- failures either go unrecorded or end up buried in step notes.
6
+
7
+ ---
8
+
9
+ ## Acceptance Criteria
10
+
11
+ 1. `buildSystemPrompt()` output contains a richer preamble (~55 lines) that:
12
+ - Opens with "You are WorkRail Auto, an autonomous agent..." (existing test assertion preserved)
13
+ - Includes `## Your tools` section listing all 5 tools (existing test assertion)
14
+ - Includes `## Execution contract` section (existing test assertion)
15
+ - Adds `## What you are`, `## Your oracle`, `## Self-directed reasoning`, `## The workflow is the contract`, `## Silent failure is the worst outcome`, `## Tools are your hands not your voice`, `## You don't have a user` sections
16
+ - All existing `workflow-runner-system-prompt.test.ts` tests pass without modification
17
+
18
+ 2. `makeReportIssueTool(sessionId, emitter?, issuesDirOverride?)` is exported from `workflow-runner.ts`:
19
+ - Tool name: `report_issue`
20
+ - Input schema accepts: `kind` (5-value literal enum), `severity` (4-value literal enum), `summary` (string, required), `context` (string, optional), `toolName` (string, optional), `command` (string, optional), `suggestedFix` (string, optional), `continueToken` (string, optional)
21
+ - `execute()` appends one JSON line to `~/.workrail/issues/<sessionId>.jsonl` (or `issuesDirOverride/<sessionId>.jsonl` in tests) -- fire-and-forget (void+catch)
22
+ - `execute()` emits a `DaemonEventEmitter` event with `kind: 'issue_reported'`
23
+ - For non-fatal severity: returns `"Issue recorded (severity=<severity>). Continue with your work unless this is fatal."`
24
+ - For fatal severity: returns `"FATAL issue recorded. Call continue_workflow with notes explaining the blocker, then the session will end."`
25
+ - Wired into `runWorkflow()` tools array
26
+
27
+ 3. `IssueReportedEvent` is added to `DaemonEvent` union in `daemon-events.ts`:
28
+ - `kind: 'issue_reported'`, `sessionId: string`, `issueKind` (5-value literal union), `severity` (4-value literal union), `summary: string`, `continueToken?: string`
29
+
30
+ 4. `npm run build` succeeds (no TS errors)
31
+ 5. `npx vitest run` passes (all existing tests + new tests)
32
+
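Criteria 2 and 3 can be illustrated with a minimal sketch of the return strings and the event shape. The two message strings and the `IssueReportedEvent` field names come from the plan; everything else is an assumption: the helper names `issueResponseText` and `toIssueEvent` are hypothetical, the three non-fatal severity labels are placeholders (the plan only names `fatal`), and the 5-value `kind` union is left as a type parameter because the plan does not list its members.

```typescript
// Only 'fatal' is named in the plan; the other three labels are placeholders
// standing in for the plan's 4-value literal union.
type Severity = "info" | "warning" | "error" | "fatal";

// Field names follow acceptance criterion 3. The 5-value issueKind union is not
// listed in the plan, so it is modelled as a type parameter here.
interface IssueReportedEvent<K extends string> {
  readonly kind: "issue_reported";
  readonly sessionId: string;
  readonly issueKind: K;
  readonly severity: Severity;
  readonly summary: string;
  readonly continueToken?: string;
}

// Hypothetical helper: picks the tool's return text from the severity.
function issueResponseText(severity: Severity): string {
  if (severity === "fatal") {
    return "FATAL issue recorded. Call continue_workflow with notes explaining the blocker, then the session will end.";
  }
  return `Issue recorded (severity=${severity}). Continue with your work unless this is fatal.`;
}

// Hypothetical helper: maps the tool input field `kind` to the event field `issueKind`,
// so the event's own `kind` discriminant stays 'issue_reported'.
function toIssueEvent<K extends string>(
  sessionId: string,
  input: { kind: K; severity: Severity; summary: string; continueToken?: string },
): IssueReportedEvent<K> {
  return {
    kind: "issue_reported",
    sessionId,
    issueKind: input.kind,
    severity: input.severity,
    summary: input.summary,
    ...(input.continueToken !== undefined ? { continueToken: input.continueToken } : {}),
  };
}
```

Keeping `kind` in the input schema and `issueKind` in the event avoids a clash with the discriminant field that every `DaemonEvent` variant carries.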
33
+ ---
34
+
35
+ ## Non-Goals
36
+
37
+ - No auto-fix coordinator implementation
38
+ - No IssueStore class (YAGNI -- extract when coordinator needs it)
39
+ - No changes to `soul-template.ts`, `triggers.yml`, or `src/v2/`
40
+ - No changes to the soul file template/default
41
+ - No changes to AgentLoop behavior (fatal severity does not abort the loop)
42
+ - No async changes to `buildSystemPrompt()` (must remain synchronous and pure)
43
+
44
+ ---
45
+
46
+ ## Philosophy-Driven Constraints
47
+
48
+ - `buildSystemPrompt()` must remain a pure, synchronous function (no I/O, no side effects)
49
+ - All `DaemonEvent` variants must use `readonly` fields only
50
+ - `IssueReportedEvent.issueKind` and `.severity` must be literal union types (not `string`)
51
+ - JSONL write must be fire-and-forget: `void appendIssueAsync().catch(() => {})`
52
+ - `mkdir({ recursive: true })` before every appendFile (handles missing dir silently)
53
+ - `issuesDirOverride` parameter for test isolation (mirrors DaemonEventEmitter constructor)
54
+
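The fire-and-forget constraint above can be sketched as follows, assuming Node's `fs/promises` API. `appendIssueAsync` is the helper name used in the plan, but its parameters here are assumptions, and `recordIssue` with its return string is a hypothetical stand-in for the tool's `execute()`.

```typescript
import { appendFile, mkdir } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Appends one JSON line to <issuesDir>/<sessionId>.jsonl.
// mkdir with recursive: true succeeds even when the directory already exists.
async function appendIssueAsync(
  issuesDir: string,
  sessionId: string,
  record: Record<string, unknown>,
): Promise<void> {
  await mkdir(issuesDir, { recursive: true });
  await appendFile(join(issuesDir, `${sessionId}.jsonl`), JSON.stringify(record) + "\n", "utf8");
}

// Hypothetical stand-in for execute(): starts the write and returns immediately.
// A failed write is swallowed on purpose, so the caller never sees a rejection.
function recordIssue(issuesDir: string, sessionId: string, summary: string): string {
  void appendIssueAsync(issuesDir, sessionId, {
    ts: new Date().toISOString(),
    sessionId,
    summary,
  }).catch(() => {});
  return "Issue recorded.";
}

// Usage: the call returns at once; the file is written shortly afterwards.
const demoDir = join(tmpdir(), "workrail-issues-sketch");
console.log(recordIssue(demoDir, "session-123", "example summary"));
```

Passing the directory as a parameter plays the role of `issuesDirOverride`: tests can point the write at a temp directory without mocking `fs`.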
55
+ ---
56
+
57
+ ## Invariants
58
+
59
+ 1. `buildSystemPrompt()` is pure and synchronous -- verified by existing tests calling it directly
60
+ 2. `'You are WorkRail Auto'` is present in `buildSystemPrompt()` output -- verified by test L29
61
+ 3. `'## Your tools'` is present in `buildSystemPrompt()` output -- verified by test L30
62
+ 4. `'## Execution contract'` is present in `buildSystemPrompt()` output -- verified by test L32
63
+ 5. All `DaemonEvent` variants use `readonly` fields -- verified by TS compiler
64
+ 6. `DaemonEvent` union is exhaustive -- TS compiler enforces at every switch site
65
+ 7. `report_issue.execute()` never throws -- returns `AgentToolResult` always
66
+ 8. JSONL write never blocks `execute()` return -- `void` Promise
67
+
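Invariant 6 relies on the standard TypeScript `never` check. The sketch below shows the mechanism with two stand-in variants; `SessionStartedEvent`, `assertNever`, and `describeEvent` are hypothetical names, and the real `DaemonEvent` union has more members than shown.

```typescript
// Stand-in variants for illustration only.
interface SessionStartedEvent {
  readonly kind: "session_started"; // hypothetical variant
  readonly sessionId: string;
}
interface IssueReportedEvent {
  readonly kind: "issue_reported";
  readonly sessionId: string;
  readonly summary: string;
}
type DaemonEvent = SessionStartedEvent | IssueReportedEvent;

// If a new variant is added to DaemonEvent and not handled below, the call to
// assertNever stops compiling, because the argument is no longer of type never.
function assertNever(value: never): never {
  throw new Error(`Unhandled event: ${JSON.stringify(value)}`);
}

function describeEvent(event: DaemonEvent): string {
  switch (event.kind) {
    case "session_started":
      return `session ${event.sessionId} started`;
    case "issue_reported":
      return `issue in ${event.sessionId}: ${event.summary}`;
    default:
      return assertNever(event);
  }
}
```

Switch sites written this way surface a compile error at each location that needs a new `case` when a variant such as `IssueReportedEvent` is added.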
68
+ ---
69
+
70
+ ## Selected Approach + Rationale
71
+
72
+ **Part 1:** Module-private `BASE_SYSTEM_PROMPT` string constant defined above `buildSystemPrompt()`. The function uses it as the start of the lines array. Rationale: named constant is readable as a document; testable via `buildSystemPrompt()` output; follows `soul-template.ts` precedent for stable-content constants.
73
+
74
+ **Part 2:** `makeReportIssueTool(sessionId, emitter?, issuesDirOverride?)` inline tool factory following the exact shape of `makeReadTool`/`makeWriteTool`. Private `appendIssueAsync()` helper for JSONL write. `issuesDirOverride` for test isolation (hybrid of inline factory + runner-up's dirOverride). Rationale: YAGNI -- no IssueStore class until coordinator exists; hybrid resolves testability without over-engineering.
75
+
76
+ **Runner-up:** IssueStore class (Candidate B). Lost to YAGNI -- one caller, no coordinator yet.
77
+
78
+ ---
79
+
80
+ ## Vertical Slices
81
+
82
+ ### Slice 1: Create feature branch
83
+ - Create `feat/worktrain-system-prompt-and-report-issue` from current main
84
+ - Verify clean state
85
+
86
+ ### Slice 2: Add IssueReportedEvent to daemon-events.ts
87
+ - Add `IssueReportedEvent` interface
88
+ - Add to `DaemonEvent` union
89
+ - Verify TS compiles
90
+
91
+ ### Slice 3: Replace buildSystemPrompt() preamble
92
+ - Define `BASE_SYSTEM_PROMPT` constant above `buildSystemPrompt()`
93
+ - Replace lines 1087-1108 to use the constant
94
+ - Verify all existing system-prompt tests pass
95
+
96
+ ### Slice 4: Implement makeReportIssueTool
97
+ - Add private `appendIssueAsync()` helper
98
+ - Add `makeReportIssueTool()` factory
99
+ - Wire into `runWorkflow()` tools array
100
+
101
+ ### Slice 5: Tests
102
+ - Add tests for `makeReportIssueTool` -- verify JSONL write with temp dir, verify event emitted, verify return strings, verify fatal vs non-fatal
103
+ - Verify all existing tests still pass
104
+
105
+ ### Slice 6: Build + full test run
106
+ - `npm run build` -- zero errors
107
+ - `npx vitest run` -- all pass
108
+
109
+ ### Slice 7: PR
110
+ - Commit with conventional commit message
111
+ - Open PR to main
112
+
113
+ ---
114
+
115
+ ## Test Design
116
+
117
+ ### Existing tests (must pass unchanged)
118
+ - `tests/unit/workflow-runner-system-prompt.test.ts` -- all 11 tests
119
+ - `tests/unit/daemon-events.test.ts` -- all existing tests
120
+
121
+ ### New tests to add
122
+ File: `tests/unit/workflow-runner-report-issue.test.ts`
123
+
124
+ Test cases:
125
+ 1. `makeReportIssueTool` -- returns correct tool name and description
126
+ 2. `execute()` with non-fatal severity -- returns confirmation string with severity
127
+ 3. `execute()` with fatal severity -- returns FATAL message
128
+ 4. `execute()` -- writes JSON line to issuesDirOverride/<sessionId>.jsonl
129
+ 5. `execute()` -- written JSON contains kind, severity, summary, ts, sessionId
130
+ 6. `execute()` -- creates dir if it doesn't exist (mkdir recursive)
131
+ 7. `execute()` -- emits `issue_reported` event via emitter
132
+ 8. `execute()` -- optional fields (context, toolName, command, suggestedFix, continueToken) present in JSON when provided
133
+ 9. `execute()` -- does not throw when write fails (fire-and-forget)
134
+
135
+ ---
136
+
137
+ ## Risk Register
138
+
139
+ | Risk | Likelihood | Impact | Mitigation |
140
+ |---|---|---|---|
141
+ | BASE_SYSTEM_PROMPT missing required test strings | Low | High (CI break) | Include `'You are WorkRail Auto'`, `'## Your tools'`, `'## Execution contract'` explicitly; tests catch immediately |
142
+ | IssueReportedEvent `issueKind` vs tool input `kind` confusion | Low | Medium (runtime behavior ok, TS shape wrong) | Use `issueKind` in event interface; keep `kind` in input schema |
143
+ | Silent JSONL write failure not caught in tests | Low | Low (fire-and-forget is intentional) | issuesDirOverride isolates write path; test case #9 verifies no throw |
144
+ | Agent ignores fatal severity | Medium | Medium (tokens wasted) | Out of scope; coordinator detects post-hoc |
145
+
146
+ ---
147
+
148
+ ## PR Packaging Strategy
149
+
150
+ Single PR: `feat/worktrain-system-prompt-and-report-issue`
151
+ - All 3 files changed: `src/daemon/workflow-runner.ts`, `src/daemon/daemon-events.ts`, `tests/unit/workflow-runner-report-issue.test.ts`
152
+ - Commit message: `feat(mcp): richer daemon system prompt and report_issue tool for auto-fix coordinator`
153
+
154
+ Scope note: these are daemon changes, but the allowed scopes from CLAUDE.md are: `console`, `mcp`, `workflows`, `engine`, `schema`, `docs`.
155
+ There is no `daemon` scope, and `console` does not apply.
156
+
157
+ The daemon lives under `mcp` in this codebase (daemon is part of the WorkRail server), so the commit uses scope `mcp`.
158
+
159
+ ---
160
+
161
+ ## Philosophy Alignment Per Slice
162
+
163
+ ### Slice 2 (daemon-events.ts)
164
+ - Exhaustiveness everywhere -> satisfied (new union variant, TS enforces handling)
165
+ - Make illegal states unrepresentable -> satisfied (literal unions for issueKind/severity)
166
+ - Immutability by default -> satisfied (readonly fields)
167
+
168
+ ### Slice 3 (BASE_SYSTEM_PROMPT)
169
+ - Functional core, imperative shell -> satisfied (buildSystemPrompt remains pure)
170
+ - Immutability by default -> satisfied (const)
171
+ - Document why not what -> satisfied (JSDoc on constant)
172
+ - YAGNI with discipline -> satisfied (no speculative additions)
173
+
174
+ ### Slice 4 (makeReportIssueTool)
175
+ - Observability as a constraint -> satisfied (fire-and-forget, never blocks)
176
+ - Errors are data -> satisfied (execute() returns AgentToolResult, never throws)
177
+ - Prefer fakes over mocks -> satisfied (issuesDirOverride for tests)
178
+ - YAGNI with discipline -> satisfied (no IssueStore class)
179
+ - Exhaustiveness everywhere -> satisfied (return value handles all severity levels)
180
+
181
+ ### Slice 5 (tests)
182
+ - Prefer fakes over mocks -> satisfied (temp dir, no fs mocking)
183
+ - Determinism -> satisfied (all test writes go to unique temp dirs)
184
+
185
+ ---
186
+
187
+ ## Summary
188
+
189
+ - `estimatedPRCount`: 1
190
+ - `planConfidenceBand`: High
191
+ - `unresolvedUnknownCount`: 0
192
+ - `followUpTickets`: Extract IssueStore class when auto-fix coordinator is built
@@ -0,0 +1,146 @@
1
+ # Design: Workflow ID Validation at Daemon Startup
2
+
3
+ **Status:** Decision made -- implement Candidate A
4
+ **Date:** 2026-04-16
5
+ **Context:** Backlog item "Workflow ID validation at startup" (Tier 1, groomed Apr 18)
6
+
7
+ ---
8
+
9
+ ## Problem Understanding
10
+
11
+ ### The Bug
12
+
13
+ A user writes `workflowId: coding-task-workflow-agentic.lean.v2` (filename without extension) instead of `coding-task-workflow-agentic` (the actual workflow ID). The daemon starts fine, accepts webhooks, but every dispatch silently fails with `workflow_not_found`. The error only surfaces in logs, not at startup. The operator has no way to know their trigger is broken until they watch logs during an actual webhook event.
14
+
15
+ ### Core Tensions
16
+
17
+ 1. **Testability vs. production simplicity** -- `ctx.workflowService.getWorkflowById` is available in production, but tests use `FAKE_CTX = {} as V2ToolContext`, where `workflowService` is `undefined`. The lookup therefore has to be an injectable function rather than a direct `ctx` access.
18
+ 2. **Warn+skip consistency vs. fail-fast** -- `loadTriggerConfig` already chose warn+skip for invalid triggers. A hard-fail here would create two conflicting behaviors in the same startup path.
19
+ 3. **Where to wire the lookup** -- `StartTriggerListenerOptions` injectable (matches existing `runWorkflowFn` pattern) vs. direct `ctx` access.
20
+
21
+ ### Likely Seam
22
+
23
+ `startTriggerListener` in `src/trigger/trigger-listener.ts`, after `buildTriggerIndex()` returns ok (~line 235), before `new TriggerRouter(...)`. This is the correct seam -- triggers are loaded and indexed, but no webhooks can arrive yet.
24
+
25
+ ### What Makes This Hard
26
+
27
+ - `FAKE_CTX = {} as V2ToolContext` in tests -- direct `ctx.workflowService` use breaks existing test infrastructure without any compile-time warning.
28
+ - Need to decide what happens when `getWorkflowByIdFn` is not provided (backward compat: skip validation entirely).
29
+ - Workflows are static YAML files -- if not found at startup, they will never be found at dispatch time either. No "not found now, maybe later" case exists.
30
+
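The `FAKE_CTX` hazard can be reproduced in a few lines. This is a sketch with stand-in types: `ToolContext` and `WorkflowService` are simplified placeholders for `V2ToolContext` and the real service, and `lookupDirect` / `lookupInjected` are hypothetical names.

```typescript
// Simplified placeholders for the real context and service types.
interface WorkflowService {
  getWorkflowById(id: string): Promise<object | null>;
}
interface ToolContext {
  readonly workflowService: WorkflowService;
}

// The type assertion compiles even though the required property is missing.
const FAKE_CTX = {} as ToolContext;

// Type-checks, but throws a TypeError at runtime when given FAKE_CTX,
// because workflowService is undefined.
function lookupDirect(ctx: ToolContext, id: string): Promise<object | null> {
  return ctx.workflowService.getWorkflowById(id);
}

// With an optional injected function, an absent resolver is visible in the types
// and the caller decides what "not provided" means.
function lookupInjected(
  getFn: ((id: string) => Promise<boolean>) | undefined,
  id: string,
): Promise<boolean> | undefined {
  return getFn?.(id);
}
```

The compiler accepts `lookupDirect(FAKE_CTX, ...)` without complaint, which is why the direct-access approach fails only at test runtime.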
31
+ ---
32
+
33
+ ## Philosophy Constraints
34
+
35
+ **Sources:**
36
+ - `/Users/etienneb/CLAUDE.md`: "Dependency injection for boundaries -- inject external effects (I/O, clocks, randomness) to keep core logic testable"
37
+ - `/Users/etienneb/CLAUDE.md`: "Validate at boundaries, trust inside -- do input validation at system edges"
38
+ - Repo pattern: `runWorkflowFn?: RunWorkflowFn` in `StartTriggerListenerOptions` -- exact injectable pattern to follow
39
+ - Repo pattern: `loadTriggerConfig` warn+skip -- policy to remain consistent with
40
+
41
+ **No conflicts.** All sources agree on: DI injectable for testability, warn+skip policy, validate at the startup boundary.
42
+
43
+ ---
44
+
45
+ ## Impact Surface
46
+
47
+ - **`src/trigger/trigger-listener.ts`** -- primary change. New validation loop and new `StartTriggerListenerOptions` field.
48
+ - **`tests/unit/trigger-router.test.ts`** -- add new test cases. Existing tests unaffected (they don't provide `getWorkflowByIdFn`, so validation is skipped -- same behavior as today).
49
+ - **`src/trigger/trigger-router.ts`** -- no change. Router already handles `workflow_not_found` at dispatch; this is an earlier defense layer.
50
+ - **`src/trigger/trigger-store.ts`** -- no change. YAML parsing is separate from workflow ID resolution.
51
+ - **`src/trigger/types.ts`** -- no change. `TriggerDefinition` shape unchanged.
52
+
53
+ ---
54
+
55
+ ## Candidates
56
+
57
+ ### Candidate A -- Injectable function on StartTriggerListenerOptions (RECOMMENDED)
58
+
59
+ **Summary:** Add `getWorkflowByIdFn?: (id: string) => Promise<boolean>` to `StartTriggerListenerOptions`. The production caller passes `(id) => ctx.workflowService.getWorkflowById(id).then(w => w !== null)`. When the option is not provided, validation is skipped (backward compat for existing tests).
60
+
61
+ **Tensions resolved:** Testability (tests inject stub), warn+skip consistency, DI principle.
62
+ **Tensions accepted:** Slight verbosity (new option field). Validation silently skipped if fn not provided (intentional).
63
+
64
+ **Boundary:** `startTriggerListener`, after `buildTriggerIndex()` returns ok.
65
+ **Why this boundary:** Single assembly point before the router accepts any traffic. Earlier (store layer) would require making `loadTriggerConfig` async. Later (dispatch time) is too late -- that's the bug we're fixing.
66
+
67
+ **Failure mode:** Existing tests that don't inject `getWorkflowByIdFn` silently skip validation. This is intentional backward compat, not a latent bug -- they still test all other startup behavior.
68
+
69
+ **Repo pattern:** Exact match to `runWorkflowFn?: RunWorkflowFn` in the same `StartTriggerListenerOptions` interface.
70
+
71
+ **Gains:** Full testability, no changes to existing tests, clean DI seam, consistent with all philosophy principles.
72
+ **Losses:** Caller must inject the fn to get validation. If someone creates a new caller of `startTriggerListener` without providing it, they get no validation. (Low risk: only one production caller.)
73
+
74
+ **Scope judgment:** Best-fit. Changes only `trigger-listener.ts` and adds tests. No interface changes to store or router.
75
+
76
+ **Philosophy fit:** Honors "Dependency injection for boundaries", "Validate at boundaries, trust inside". No conflicts.
77
+
78
+ ---
79
+
80
+ ### Candidate B -- Use ctx.workflowService directly with null guard
81
+
82
+ **Summary:** Call `ctx.workflowService?.getWorkflowById(id)` directly in the validation loop, skipping the whole loop if `ctx.workflowService` is undefined.
83
+
84
+ **Tensions resolved:** Production simplicity (no new option field).
85
+ **Tensions accepted:** Testability gap -- the warn+skip behavior can't be tested without constructing a real `workflowService` in `ctx`.
86
+
87
+ **Failure mode:** New validation behavior is untestable with the existing `FAKE_CTX` test infrastructure.
88
+
89
+ **Repo pattern:** Departs from `runWorkflowFn` injectable pattern. Conflicts with DI principle.
90
+
91
+ **Scope judgment:** Best-fit for production behavior, too narrow for test coverage.
92
+
93
+ **Philosophy fit:** Conflicts with "Dependency injection for boundaries".
94
+
95
+ ---
96
+
97
+ ### Candidate C -- Validate inside loadTriggerConfig (store layer)
98
+
99
+ **Summary:** Add `workflowResolver?: (id: string) => Promise<boolean>` to `loadTriggerConfig`, filtering unknown workflowId triggers at parse time.
100
+
101
+ **Tensions resolved:** Centralizes all trigger validation.
102
+ **Tensions accepted:** `trigger-store.ts` is a pure synchronous YAML parser; making it async for the resolver breaks its pure/impure boundary and all existing sync call sites.
103
+
104
+ **Failure mode:** Breaks `loadTriggerConfig`'s synchronous interface contract. All existing callers would need updating.
105
+
106
+ **Repo pattern:** Departs from the pure-sync design of `trigger-store.ts`.
107
+
108
+ **Scope judgment:** Too broad -- adds async I/O to a pure parsing module with no justification beyond this feature.
109
+
110
+ **Philosophy fit:** Conflicts with "Compose with small, pure functions".
111
+
112
+ ---
113
+
114
+ ## Comparison and Recommendation
115
+
116
+ | Tension | A (Injectable) | B (ctx direct) | C (store layer) |
117
+ |---------|---------------|----------------|-----------------|
118
+ | Testability | Wins | Loses | N/A |
119
+ | Warn+skip consistency | Wins | Wins | Breaks pure boundary |
120
+ | DI principle | Honors | Conflicts | Conflicts |
121
+ | Repo pattern fit | Exact match | Departs | Departs |
122
+ | Reversibility | Easy | Easy | Hard |
123
+
124
+ **Recommendation: Candidate A.** It resolves all tensions, is a direct repo-pattern match, requires minimal code change, and leaves all existing tests unchanged.
125
+
126
+ ---
127
+
128
+ ## Self-Critique
129
+
130
+ **Strongest counter-argument:** "Why add a new option when `ctx.workflowService` is already there? That's extra API surface for a one-time startup check." -- Response: `FAKE_CTX = {} as V2ToolContext` (line 33, `trigger-router.test.ts`) means `ctx.workflowService` is `undefined` at test runtime. Without the injectable, the new validation behavior is untestable. Fixing a silent-failure bug without being able to test it is unacceptable.
131
+
132
+ **Narrower option that lost:** Candidate B (ctx direct with null guard). Loses because new behavior is untestable.
133
+
134
+ **Broader option that would need evidence:** Candidate C (store layer) would be justified if multiple callers of `loadTriggerConfig` needed workflow ID validation -- but there is only one production caller. The scope increase is not warranted.
135
+
136
+ **Invalidating assumption:** If `FAKE_CTX` were replaced by a real mock with a `workflowService`, Candidate B would be equally valid. But that's a larger test infrastructure change that's out of scope.
137
+
138
+ ---
139
+
140
+ ## Open Questions for the Main Agent
141
+
142
+ None. All design decisions are resolved. Implementation is straightforward:
143
+ 1. Add `getWorkflowByIdFn?: (id: string) => Promise<boolean>` to `StartTriggerListenerOptions`
144
+ 2. After `buildTriggerIndex()` returns ok, if `getWorkflowByIdFn` is provided, iterate `triggerIndex`, call fn for each `workflowId`, warn and delete unknowns
145
+ 3. Production wiring: the production caller passes `(id) => ctx.workflowService.getWorkflowById(id).then(w => w !== null)`; when the fn is not provided, validation is skipped
146
+ 4. Add test cases for: warn+skip on unknown workflowId, valid workflowId passes through, fn not provided skips validation
@@ -0,0 +1,87 @@
1
+ # Design Review: Workflow ID Validation at Daemon Startup
2
+
3
+ **Design under review:** Candidate A -- injectable `getWorkflowByIdFn` on `StartTriggerListenerOptions`
4
+ **Date:** 2026-04-16
5
+
6
+ ---
7
+
8
+ ## Tradeoff Review
9
+
10
+ | Tradeoff | Acceptable? | Condition that breaks it |
11
+ |----------|-------------|--------------------------|
12
+ | Validation silently skipped when fn not provided | Yes | A new production call site added without the fn |
13
+ | New option field (API surface) | Yes | Interface is internal, non-breaking |
14
+
15
+ Hidden assumption: single production call site for `startTriggerListener`. True today.
16
+
17
+ **Mitigation added:** Log message when fn not provided, making the skip visible in startup logs.
18
+
19
+ ---
20
+
21
+ ## Failure Mode Review
22
+
23
+ | Failure Mode | Handled? | Action Required |
24
+ |--------------|----------|-----------------|
25
+ | FM1: getWorkflowByIdFn throws/rejects | NOT YET | Add try/catch around each fn call; warn+skip on error |
26
+ | FM2: transient workflow unavailability | Acceptable | warn+skip behavior is correct here |
27
+ | FM3: Map mutation during iteration | NOT YET | Collect unknowns in first pass, delete in second pass |
28
+ | FM4: ctx.workflowService undefined in production | Needs guard | Use optional chaining `?.` in default fn |
29
+
30
+ **Highest-risk:** FM1. An unhandled rejection would crash `startTriggerListener`. Must fix.
31
+
32
+ ---
33
+
34
+ ## Runner-Up / Simpler Alternative Review
35
+
36
+ - Runner-up (Candidate B, ctx direct): no elements to borrow. Testability loss outweighs API surface saving.
37
+ - No simpler variant satisfies both testability and correctness requirements.
38
+ - Candidate A is already minimum viable for the acceptance criteria.
39
+
40
+ ---
41
+
42
+ ## Philosophy Alignment
43
+
44
+ **Satisfied:** Dependency injection, validate at boundaries, errors are data, YAGNI, surface information.
45
+ **Under tension (acceptable):**
46
+ - "Make illegal states unrepresentable" -- TriggerDefinition can still hold invalid workflowIds. Compile-time enforcement would require two-phase types; over-engineering for a Small task.
47
+ - "Immutability by default" -- triggerIndex Map is mutated, but mutation is local (created and modified within `startTriggerListener`, not shared until passed to TriggerRouter).
48
+
49
+ ---
50
+
51
+ ## Findings
52
+
53
+ **ORANGE -- FM1: Unhandled rejection from getWorkflowByIdFn**
54
+ The current design has no try/catch around the fn call. An I/O error in the workflow storage lookup would propagate as an unhandled rejection and crash `startTriggerListener`. Fix: wrap each `await getWorkflowByIdFn(trigger.workflowId)` in try/catch; on error, log warning and skip that trigger (same policy as other validation failures).
55
+
56
+ **YELLOW -- FM3: Two-pass Map deletion**
57
+ Must not delete from `triggerIndex` while iterating it. Fix: collect unknown IDs in an array during the loop, then delete in a second pass.
58
+
59
+ **YELLOW -- FM4: ctx.workflowService guard**
60
+ The default fn production expression `ctx.workflowService.getWorkflowById(id).then(...)` will throw if `workflowService` is undefined. Fix: use optional chaining `ctx.workflowService?.getWorkflowById(id).then(w => w !== null) ?? true` (treat unavailable service as "found" -- skip validation rather than crash).
61
+
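One way to express the FM4 guard so that the result is always a `Promise<boolean>` (the `?? true` form evaluates to a bare `true` when the service is missing) is sketched below. `toExistsFn` and `WorkflowLookup` are hypothetical names, and the service interface is a simplified assumption.

```typescript
type GetWorkflowByIdFn = (id: string) => Promise<boolean>;

// Simplified placeholder for the part of workflowService used here.
interface WorkflowLookup {
  getWorkflowById(id: string): Promise<object | null>;
}

// Hypothetical adapter: builds the resolver from a possibly-missing service.
function toExistsFn(service: WorkflowLookup | undefined): GetWorkflowByIdFn {
  return async (id) => {
    // Unavailable service is treated as "found": skip validation rather than crash.
    if (service === undefined) return true;
    return (await service.getWorkflowById(id)) !== null;
  };
}
```

Wrapping the lookup in an `async` function also turns a synchronous throw inside `getWorkflowById` into a rejection, which the validation loop's try/catch can then handle uniformly.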
62
+ ---
63
+
64
+ ## Recommended Revisions
65
+
66
+ 1. **Required:** Add try/catch in the validation loop; treat fn errors as warn+skip.
67
+ 2. **Required:** Collect unknowns first, delete after iteration.
68
+ 3. **Required:** Use optional chaining for the default production fn.
69
+ 4. **Nice-to-have:** Log `[TriggerListener] workflowId validation skipped (no resolver provided)` when fn is absent, for observability.
70
+
71
+ ---
72
+
73
+ ## Residual Concerns
74
+
75
+ - The "silent skip when fn not provided" is acceptable but relies on a naming convention (option field) to communicate intent. Future callers won't get a compile-time reminder to provide the fn. This is a documentation concern, not a correctness concern.
76
+ - `onComplete.workflowId` is not validated by this design. Out of scope for this task; should be a follow-up if `onComplete` usage grows.
77
+ - No RED findings. All issues are fixable at implementation time with minor code additions.
78
+
79
+ ---
80
+
81
+ ## Pass 2 Findings (incremental)
82
+
83
+ No new RED or ORANGE findings. Design revisions from pass 1 are sufficient.
84
+
85
+ **New observation (YELLOW):** `onComplete.workflowId` (secondary workflow for completion hooks) is not validated. Accepted as out of scope -- add a comment in the implementation noting this limitation.
86
+
87
+ **Performance:** Sequential validation of N triggers is acceptable at expected trigger counts (1-10). No action needed.
@@ -0,0 +1,185 @@
1
+ # Implementation Plan: Workflow ID Validation at Daemon Startup
2
+
3
+ **Status:** Ready to implement
4
+ **Branch:** `fix/workflow-id-validation-at-startup`
5
+ **Date:** 2026-04-16
6
+
7
+ ---
8
+
9
+ ## 1. Problem Statement
10
+
11
+ When a user writes an incorrect `workflowId` in `triggers.yml` (e.g., `coding-task-workflow-agentic.lean.v2` instead of `coding-task-workflow-agentic`), the daemon starts successfully, accepts webhooks, but every dispatch silently fails with `workflow_not_found`. The error only appears in logs during actual webhook events -- not at startup. This is a silent-failure bug.
12
+
13
+ ---
14
+
15
+ ## 2. Acceptance Criteria
16
+
17
+ - [x] At daemon startup, after loading and indexing triggers, validate that each trigger's `workflowId` resolves to a known workflow
18
+ - [x] Triggers with unknown `workflowId` are logged with a clear warning (naming the triggerId and the bad workflowId) and removed from the active index
19
+ - [x] Triggers with valid `workflowId` start normally
20
+ - [x] If `getWorkflowByIdFn` throws or rejects, that trigger is also warned+skipped (not a daemon crash)
21
+ - [x] Existing behavior when `getWorkflowByIdFn` is not provided: validation is skipped (backward compat, logged)
22
+ - [x] Existing tests continue to pass without modification
23
+
24
+ ---
25
+
26
+ ## 3. Non-Goals
27
+
28
+ - No hard-fail policy (daemon does not refuse to start; it starts with fewer triggers)
29
+ - No validation of `onComplete.workflowId` (secondary workflow ID -- follow-up ticket)
30
+ - No changes to `trigger-store.ts` or `TriggerDefinition` type
31
+ - No re-validation on webhook arrival
32
+ - No dynamic reload / hot-reload of trigger config
33
+ - No change to `trigger-router.ts` (it already handles `workflow_not_found` at dispatch)
34
+
35
+ ---
36
+
37
+ ## 4. Philosophy-Driven Constraints
38
+
39
+ - **Dependency injection**: `getWorkflowByIdFn` must be injectable -- no direct `ctx.workflowService` access
40
+ - **Validate at boundaries**: validation runs at `startTriggerListener` (startup boundary), not inside routing
41
+ - **Errors are data**: validation failures are warnings + skip, not thrown exceptions
42
+ - **Document why**: implementation must include WHY comments on the key decisions
43
+ - **Warn+skip over hard-fail**: consistent with `loadTriggerConfig` existing behavior
44
+
45
+ ---
46
+
47
+ ## 5. Invariants
48
+
49
+ 1. The `triggerIndex` passed to `TriggerRouter` contains ONLY triggers whose `workflowId` was confirmed to exist (when `getWorkflowByIdFn` is provided)
50
+ 2. The validation loop MUST NOT mutate `triggerIndex` during iteration (collect unknowns first, delete after)
51
+ 3. A `getWorkflowByIdFn` rejection/throw MUST NOT propagate -- it is caught, the trigger is warned+skipped
52
+ 4. When `getWorkflowByIdFn` is absent, validation is skipped entirely (backward compat) and a log message says so
53
+ 5. `DefaultWorkflowService.getWorkflowById` delegates directly to storage (no compilation cache interference) -- validation results are authoritative
54
+
55
+ ---
56
+
57
+ ## 6. Selected Approach
58
+
59
+ **Add `getWorkflowByIdFn?: (id: string) => Promise<boolean>` to `StartTriggerListenerOptions`.**
60
+
61
+ In `startTriggerListener`, after `buildTriggerIndex()` returns ok, if `getWorkflowByIdFn` is provided:
62
+ 1. Iterate `triggerIndex` entries in a read-only pass (nothing is deleted while iterating)
63
+ 2. For each, try `await getWorkflowByIdFn(trigger.workflowId)` -- catch rejection, treat as false
64
+ 3. Collect trigger IDs where result is false or threw
65
+ 4. After iteration: delete collected IDs from `triggerIndex`, log warnings
66
+ 5. Log summary if any were skipped
67
+
68
+ Production default (not on the option -- called inline): `async (id) => (await ctx.workflowService?.getWorkflowById(id)) !== null`.
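
One subtlety in the expression above is worth checking before relying on it: if `ctx.workflowService` is undefined, optional chaining yields `undefined`, and `undefined !== null` evaluates to `true`, so every workflow would be reported as found. The sketch below illustrates this. `WorkflowLookup`, `strictResolver`, and `looseResolver` are hypothetical names introduced for illustration (the real service type is not shown in this plan), and which behavior is intended is a design decision this plan does not settle.

```typescript
// Hypothetical minimal shape of the workflow service (an assumption, not the real interface).
interface WorkflowLookup {
  getWorkflowById(id: string): Promise<object | null>;
}

type WorkflowExistsFn = (id: string) => Promise<boolean>;

// Mirrors the expression in the plan: strict comparison against null.
// With no service, `service?.getWorkflowById(id)` is undefined, and
// `undefined !== null` is true, so every id is reported as found.
function strictResolver(service?: WorkflowLookup): WorkflowExistsFn {
  return async (id) => (await service?.getWorkflowById(id)) !== null;
}

// Variant: loose comparison treats both null and undefined as "not found",
// so a missing service causes every trigger to be skipped instead of kept.
function looseResolver(service?: WorkflowLookup): WorkflowExistsFn {
  return async (id) => (await service?.getWorkflowById(id)) != null;
}
```

With the strict form, a missing service behaves like "validation passed for everything"; with the loose form, it behaves like "nothing could be validated, skip all". Either can be defensible, but the choice should be deliberate and covered by a test.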
69
+
70
+ **Runner-up:** Candidate B (ctx direct with null guard) -- lost because `FAKE_CTX = {} as V2ToolContext` makes the behavior untestable.
71
+
72
+ ---
73
+
74
+ ## 7. Vertical Slices
75
+
76
+ ### Slice 1 -- Core validation logic in `startTriggerListener`
77
+
78
+ **Files:** `src/trigger/trigger-listener.ts`
79
+
80
+ **Changes:**
81
+ - Add `getWorkflowByIdFn?: (id: string) => Promise<boolean>` to `StartTriggerListenerOptions`
82
+ - After `buildTriggerIndex()` returns ok (before `new TriggerRouter`): add validation block
83
+ - Validation block logic:
84
+ ```
85
+ if (getWorkflowByIdFn) {
86
+   const unknownTriggerIds: string[] = []
87
+   for (const [triggerId, trigger] of triggerIndex) {
88
+     let found: boolean
89
+     try {
90
+       found = await getWorkflowByIdFn(trigger.workflowId)
91
+     } catch (e) {
92
+       found = false
93
+       console.warn(`[TriggerListener] Error validating workflowId '${trigger.workflowId}' for trigger '${triggerId}': ${e}`)
94
+     }
95
+     if (!found) {
96
+       unknownTriggerIds.push(triggerId)
97
+       console.warn(`[TriggerListener] Skipping trigger '${triggerId}': workflowId '${trigger.workflowId}' not found`)
98
+     }
99
+   }
100
+   for (const id of unknownTriggerIds) { triggerIndex.delete(id) }
101
+   if (unknownTriggerIds.length > 0) {
102
+     console.warn(`[TriggerListener] Skipped ${unknownTriggerIds.length} trigger(s) with unknown workflowId(s)`)
103
+   }
104
+ } else {
105
+   console.log(`[TriggerListener] workflowId validation skipped (no resolver provided)`)
106
+ }
107
+ ```
108
+ - Add production default invocation: pass `async (id) => (await ctx.workflowService?.getWorkflowById(id)) !== null` as the default when option not provided
109
+
110
+ **Acceptance criterion:** Unknown workflowId triggers are not present in the index passed to `TriggerRouter`.
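
The validation block can also be read as a standalone function, which makes the acceptance criterion easy to check in isolation. This is a minimal sketch under stated assumptions, not the shipped code: `TriggerLike`, `dropUnknownWorkflowTriggers`, and the injectable `warn` parameter are hypothetical names introduced for illustration.

```typescript
// Hypothetical minimal trigger shape; the real TriggerDefinition has more fields.
interface TriggerLike {
  workflowId: string;
}

// Removes triggers whose workflowId cannot be resolved and returns their IDs.
// `warn` is injectable so callers (and tests) can capture the messages.
async function dropUnknownWorkflowTriggers(
  triggerIndex: Map<string, TriggerLike>,
  getWorkflowByIdFn: (id: string) => Promise<boolean>,
  warn: (msg: string) => void = console.warn,
): Promise<string[]> {
  const unknownTriggerIds: string[] = [];

  // Pass 1 (read-only): resolve each workflowId and collect the failures.
  for (const [triggerId, trigger] of triggerIndex) {
    let found = false;
    try {
      found = await getWorkflowByIdFn(trigger.workflowId);
    } catch (e) {
      // WHY: a resolver failure must not crash startup; treat it as "not found".
      warn(`Error validating workflowId '${trigger.workflowId}' for trigger '${triggerId}': ${e}`);
    }
    if (!found) {
      unknownTriggerIds.push(triggerId);
      warn(`Skipping trigger '${triggerId}': workflowId '${trigger.workflowId}' not found`);
    }
  }

  // Pass 2: mutate the index only after iteration has finished.
  for (const id of unknownTriggerIds) {
    triggerIndex.delete(id);
  }
  if (unknownTriggerIds.length > 0) {
    warn(`Skipped ${unknownTriggerIds.length} trigger(s) with unknown workflowId(s)`);
  }
  return unknownTriggerIds;
}
```

Called with an index holding one valid trigger, one with a mistyped `workflowId`, and one whose lookup rejects, the function should leave only the valid trigger in the map and return the other two IDs, which is the condition the router relies on.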
111
+
112
+ ### Slice 2 -- Tests in `trigger-router.test.ts`
113
+
114
+ **Files:** `tests/unit/trigger-router.test.ts`
115
+
116
+ **New test cases (in a new describe block `startTriggerListener workflowId validation`):**
117
+ 1. Triggers with unknown workflowId are warned and skipped (index excludes them, server starts)
118
+ 2. Triggers with valid workflowId are kept in the index
119
+ 3. When `getWorkflowByIdFn` is not provided, validation is skipped and all triggers are kept
120
+ 4. When `getWorkflowByIdFn` rejects, that trigger is warned and skipped (daemon doesn't crash)
121
+ 5. Mix: some valid, some invalid -- only valid triggers remain
122
+
123
+ **Acceptance criterion:** All 5 test cases pass.
124
+
125
+ ---
126
+
127
+ ## 8. Test Design
128
+
129
+ **Pattern to follow:** `startTriggerListener` tests in `trigger-router.test.ts` (~line 432). Same structure:
130
+ - Use `tmpPath()` for workspacePath
131
+ - `env: { WORKRAIL_TRIGGERS_ENABLED: 'true' }`
132
+ - `port: 0` for OS-assigned port
133
+ - `runWorkflowFn: vi.fn()`
134
+ - `workspaces: {}` to skip workspace config loading
135
+
136
+ **Fixtures needed:**
137
+ - A minimal `triggers.yml` with two triggers: one with valid workflowId, one with invalid
138
+ - `getWorkflowByIdFn` stub: `vi.fn().mockImplementation(async (id: string) => id === 'coding-task-workflow-agentic')`
139
+
140
+ **Note:** Tests write real `triggers.yml` files to `tmpPath()` directories (pattern established in existing tests). Check how existing `startTriggerListener` tests set up workspace directories.
141
+
142
+ ---
143
+
144
+ ## 9. Risk Register
145
+
146
+ | Risk | Likelihood | Impact | Mitigation |
147
+ |------|-----------|--------|-----------|
148
+ | `ctx.workflowService` undefined in production | Low | Medium | Optional chaining `?.` in default fn |
149
+ | FM1: getWorkflowByIdFn throws | Low-Medium | High | try/catch per call, warn+skip |
150
+ | FM3: Map mutation during iteration | Low (easy to avoid) | Medium | Two-pass (collect then delete) |
151
+ | False positive (valid workflow not found due to I/O error) | Low | Low | Same as FM1 -- warn+skip, operator can restart |
152
+ | `onComplete.workflowId` still silent-fails | Medium | Low | Accepted, documented, follow-up ticket |
153
+
154
+ ---
155
+
156
+ ## 10. PR Packaging
157
+
158
+ **Single PR.** Small task, all changes in 2 files. Branch: `fix/workflow-id-validation-at-startup`.
159
+
160
+ PR title: `fix(trigger): warn and skip triggers with unknown workflowId at startup`
161
+
162
+ ---
163
+
164
+ ## 11. Philosophy Alignment Per Slice
165
+
166
+ | Principle | Slice 1 | Slice 2 |
167
+ |-----------|---------|---------|
168
+ | Dependency injection for boundaries | Satisfied -- fn injectable | Satisfied -- tests inject stub |
169
+ | Validate at boundaries | Satisfied -- startup boundary | N/A |
170
+ | Errors are data | Satisfied -- warn+skip, no throw | Satisfied -- tests verify no crash |
171
+ | Document why | Satisfied -- WHY comments required | N/A |
172
+ | Warn+skip over hard-fail | Satisfied | Verified by tests |
173
+ | Immutability by default | Tension -- triggerIndex mutated, but local scope | N/A |
174
+
175
+ ---
176
+
177
+ ## 12. Follow-up Tickets
178
+
179
+ - `onComplete.workflowId` validation (secondary workflow IDs in completion hooks)
180
+
181
+ ---
182
+
183
+ **unresolvedUnknownCount:** 0
184
+ **planConfidenceBand:** High
185
+ **estimatedPRCount:** 1