@exaudeus/workrail 3.39.0 → 3.41.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (97)
  1. package/dist/cli/commands/init.js +0 -3
  2. package/dist/cli-worktrain.js +58 -26
  3. package/dist/cli.js +0 -18
  4. package/dist/config/app-config.d.ts +0 -16
  5. package/dist/config/app-config.js +0 -14
  6. package/dist/config/config-file.js +0 -3
  7. package/dist/console-ui/assets/index-CQt4UhPB.js +28 -0
  8. package/dist/console-ui/assets/index-DGj8EsFR.css +1 -0
  9. package/dist/console-ui/index.html +2 -2
  10. package/dist/coordinators/pr-review.d.ts +23 -1
  11. package/dist/coordinators/pr-review.js +224 -5
  12. package/dist/daemon/daemon-events.d.ts +9 -1
  13. package/dist/daemon/soul-template.d.ts +2 -2
  14. package/dist/daemon/soul-template.js +11 -1
  15. package/dist/daemon/workflow-runner.d.ts +17 -3
  16. package/dist/daemon/workflow-runner.js +401 -28
  17. package/dist/di/container.js +1 -25
  18. package/dist/di/tokens.d.ts +0 -3
  19. package/dist/di/tokens.js +0 -3
  20. package/dist/engine/engine-factory.js +0 -1
  21. package/dist/infrastructure/console-defaults.d.ts +1 -0
  22. package/dist/infrastructure/console-defaults.js +4 -0
  23. package/dist/infrastructure/session/index.d.ts +0 -1
  24. package/dist/infrastructure/session/index.js +1 -3
  25. package/dist/manifest.json +124 -124
  26. package/dist/mcp/handlers/session.d.ts +1 -0
  27. package/dist/mcp/handlers/session.js +61 -13
  28. package/dist/mcp/output-schemas.d.ts +10 -10
  29. package/dist/mcp/server.js +1 -18
  30. package/dist/mcp/tools.d.ts +12 -12
  31. package/dist/mcp/transports/http-entry.js +0 -2
  32. package/dist/mcp/transports/stdio-entry.js +1 -2
  33. package/dist/mcp/types.d.ts +0 -2
  34. package/dist/trigger/daemon-console.d.ts +2 -0
  35. package/dist/trigger/daemon-console.js +1 -1
  36. package/dist/trigger/trigger-listener.d.ts +2 -0
  37. package/dist/trigger/trigger-listener.js +3 -1
  38. package/dist/trigger/trigger-router.d.ts +4 -3
  39. package/dist/trigger/trigger-router.js +13 -5
  40. package/dist/trigger/trigger-store.js +17 -4
  41. package/dist/types/workflow-source.d.ts +0 -1
  42. package/dist/types/workflow-source.js +3 -6
  43. package/dist/types/workflow.d.ts +1 -1
  44. package/dist/types/workflow.js +1 -2
  45. package/dist/v2/durable-core/domain/artifact-contract-validator.js +66 -0
  46. package/dist/v2/durable-core/schemas/artifacts/coordinator-signal.d.ts +25 -0
  47. package/dist/v2/durable-core/schemas/artifacts/coordinator-signal.js +31 -0
  48. package/dist/v2/durable-core/schemas/artifacts/index.d.ts +3 -1
  49. package/dist/v2/durable-core/schemas/artifacts/index.js +14 -1
  50. package/dist/v2/durable-core/schemas/artifacts/review-verdict.d.ts +41 -0
  51. package/dist/v2/durable-core/schemas/artifacts/review-verdict.js +30 -0
  52. package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +236 -236
  53. package/dist/v2/durable-core/schemas/session/events.d.ts +50 -50
  54. package/dist/v2/durable-core/schemas/session/gaps.d.ts +2 -2
  55. package/dist/v2/durable-core/schemas/session/manifest.d.ts +4 -4
  56. package/dist/v2/durable-core/schemas/session/outputs.d.ts +8 -8
  57. package/dist/v2/usecases/console-routes.d.ts +2 -1
  58. package/dist/v2/usecases/console-routes.js +207 -5
  59. package/dist/v2/usecases/console-service.js +14 -0
  60. package/dist/v2/usecases/console-types.d.ts +1 -0
  61. package/docs/authoring.md +16 -16
  62. package/docs/design/coordinator-artifact-protocol-design-candidates.md +155 -0
  63. package/docs/design/coordinator-artifact-protocol-design-review.md +103 -0
  64. package/docs/design/coordinator-artifact-protocol-implementation-plan.md +259 -0
  65. package/docs/design/coordinator-message-queue-drain-plan.md +241 -0
  66. package/docs/design/coordinator-message-queue-drain-review.md +120 -0
  67. package/docs/design/coordinator-message-queue-drain.md +289 -0
  68. package/docs/design/shaping-workflow-external-research.md +119 -0
  69. package/docs/discovery/late-bound-goals-impl-plan.md +147 -0
  70. package/docs/discovery/late-bound-goals-review.md +82 -0
  71. package/docs/discovery/late-bound-goals.md +118 -0
  72. package/docs/discovery/steer-endpoint-design-candidates.md +288 -0
  73. package/docs/discovery/steer-endpoint-design-review-findings.md +104 -0
  74. package/docs/discovery/steer-endpoint-implementation-plan.md +284 -0
  75. package/docs/ideas/backlog.md +447 -97
  76. package/docs/ideas/design-candidates-console-session-tree-impl.md +64 -0
  77. package/docs/ideas/design-candidates-session-tree-view.md +196 -0
  78. package/docs/ideas/design-review-findings-console-session-tree-impl.md +75 -0
  79. package/docs/ideas/design-review-findings-session-tree-view.md +88 -0
  80. package/docs/ideas/implementation_plan_session_tree_view.md +238 -0
  81. package/package.json +2 -1
  82. package/spec/authoring-spec.json +16 -16
  83. package/spec/shape.schema.json +178 -0
  84. package/spec/workflow-tags.json +232 -47
  85. package/workflows/coding-task-workflow-agentic.json +491 -480
  86. package/workflows/mr-review-workflow.agentic.v2.json +5 -1
  87. package/workflows/wr.shaping.json +182 -0
  88. package/dist/console-ui/assets/index-3oXZ_A9m.js +0 -28
  89. package/dist/console-ui/assets/index-8dh0Psu-.css +0 -1
  90. package/dist/infrastructure/session/DashboardHeartbeat.d.ts +0 -8
  91. package/dist/infrastructure/session/DashboardHeartbeat.js +0 -39
  92. package/dist/infrastructure/session/DashboardLockRelease.d.ts +0 -2
  93. package/dist/infrastructure/session/DashboardLockRelease.js +0 -29
  94. package/dist/infrastructure/session/HttpServer.d.ts +0 -60
  95. package/dist/infrastructure/session/HttpServer.js +0 -912
  96. package/workflows/coding-task-workflow-agentic.lean.v2.json +0 -648
  97. package/workflows/coding-task-workflow-agentic.v2.json +0 -324
@@ -0,0 +1,120 @@
1
+ # Design Review Findings: Coordinator Message Queue Drain
2
+
3
+ **Design reviewed:** Candidate B from `coordinator-message-queue-drain.md`
4
+ (drainMessageQueue with cursor + text parsing)
5
+
6
+ ---
7
+
8
+ ## Tradeoff Review
9
+
10
+ ### T1: Stringly-typed dispatch (free-form text parsing)
11
+
12
+ Accepted tradeoff. The `/^\s*stop\b/i` anchor pattern is narrower than bare `stop` matching
13
+ and covers realistic CLI usage. The risk of false-positive halt is real but diagnosable -- the
14
+ outbox notification includes the triggering message text. Condition for no longer acceptable:
15
+ automated tooling writing to the queue. Explicitly documented as a pivot trigger for Candidate C.
16
+
17
+ ### T2: New cursor file on disk
18
+
19
+ Fully acceptable. Same format as `InboxCursor`; desync guard handles truncation; write failure
20
+ is non-fatal. No new schema maintenance burden.
21
+
22
+ ### T3: Outbox notifications for all actionable messages
23
+
24
+ Fully acceptable. Outbox write failure is non-fatal; stderr provides a backup diagnostic.
25
+ Including notifications for all actions (not just `stop`) is the right call -- users need the
26
+ feedback loop.
27
+
28
+ ---
29
+
30
+ ## Failure Mode Review
31
+
32
+ | FM | Description | Mitigation | Residual risk |
33
+ |---|---|---|---|
34
+ | FM1 | `stop` fires on note message | `^\s*stop\b` anchor; outbox shows triggering text | Low -- diagnosable and recoverable |
35
+ | FM2 | Cursor desync after queue wipe | Reset to 0 if cursor > totalLines | Low -- the reset replays the queue, so a past `stop` re-fires if present; outbox makes it visible |
36
+ | FM3 | Duplicate add-pr | Set dedup before Stage 1 | None |
37
+ | FM4 | Outbox write failure during stop | Non-fatal; stderr fallback | None -- stop still honored |
38
+ | FM5 | ENOENT (no queue file) | Return empty DrainResult | None -- expected on fresh install |
39
+
40
+ **Highest-risk failure mode:** FM1. Must include triggering message text and timestamp in the
41
+ outbox notification and stderr log -- this is a required implementation detail, not optional.
42
+
43
+ ---
44
+
45
+ ## Runner-Up / Simpler Alternative Review
46
+
47
+ **Candidate C strengths borrowed:** Structured parse result logged to stderr
48
+ (`[INFO coord:drain kind=stop message=...]`) -- same diagnostic value as a `kind` field at zero schema cost.
49
+
50
+ **Simpler variant (skip outbox notifications):** Rejected -- silent halt is a UX regression.
51
+
52
+ **Simpler variant (skip `add-pr`):** Viable as a scope reduction. Included in this PR because the
53
+ implementation cost is ~10 lines, and `skip-pr` without `add-pr` is asymmetric.
54
+
55
+ ---
56
+
57
+ ## Philosophy Alignment
58
+
59
+ **Clearly satisfied:** Immutability, errors as values, DI, validate at boundaries, determinism,
60
+ fakes over mocks, small pure functions, document WHY.
61
+
62
+ **Under tension:**
63
+ - "Explicit domain types over primitives" -- free-form text dispatch. Acceptable: pre-existing
64
+ schema constraint, documented as follow-up.
65
+ - "Make illegal states unrepresentable" -- `DrainResult` can represent `stop: true` with
66
+ non-empty `skipPrNumbers`. Acceptable: `stop` check is first at call site; documented.
67
+
68
+ ---
69
+
70
+ ## Findings
71
+
72
+ ### YELLOW: `stop` regex false-positive on note messages
73
+
74
+ The `/^\s*stop\b/i` pattern is significantly better than bare `stop` matching, but it will still
75
+ fire on a message like "stop and think about this before merging." No additional regex constraint
76
+ is practical without excluding valid stop forms. The mitigation (outbox + stderr with triggering
77
+ message text) is the correct and sufficient response.
78
+
79
+ **Recommended revision:** None to the pattern itself. Ensure the outbox notification reads:
80
+ `WorkTrain coordinator stopped by queued message: "[full message text]" (queued at [timestamp])`
81
+ rather than a generic "coordinator stopped" message.
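
A minimal sketch of the recommended notification text, assuming a message record with `message` and `timestamp` fields. Only `message` is named in these documents; the `timestamp` field name, the `QueuedMessageLike` type, and `formatStopNotification` are hypothetical and exist for illustration only.

```typescript
// Hypothetical record shape: `message` appears in the design docs,
// `timestamp` is an assumed field name.
interface QueuedMessageLike {
  readonly message: string;
  readonly timestamp: string;
}

// Builds the stop notification so that a false-positive halt is diagnosable:
// the full triggering text and its queue time are always included.
function formatStopNotification(msg: QueuedMessageLike): string {
  return `WorkTrain coordinator stopped by queued message: "${msg.message}" (queued at ${msg.timestamp})`;
}
```

Keeping the formatter pure means the same string can be written to the outbox and to stderr without the two drifting apart.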
82
+
83
+ ### YELLOW: `DrainResult` allows `stop: true` + non-empty `skipPrNumbers`
84
+
85
+ The call site must check `stop` before anything else. If a future maintainer adds code between
86
+ the drain call and the `stop` check, or moves the check, the skip/add arrays could be acted on
87
+ before the stop is honored.
88
+
89
+ **Recommended revision:** Add a JSDoc invariant on `DrainResult`: "When `stop` is true, all
90
+ other fields are informational only. The coordinator MUST honor `stop` before inspecting
91
+ `skipPrNumbers` or `addPrNumbers`." Also add a comment at the call site.
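
One way the invariant could look in practice is sketched below. The `DrainResult` fields are copied from the candidates document; `NextAction` and `decideNextAction` are hypothetical names, added here on the assumption that funnelling the result through a single function keeps the `stop` check from being moved or skipped.

```typescript
/**
 * Invariant: when `stop` is true, all other fields are informational only.
 * The coordinator MUST honor `stop` before inspecting `skipPrNumbers`
 * or `addPrNumbers`.
 */
interface DrainResult {
  readonly stop: boolean;
  readonly stopReason: string | null;
  readonly skipPrNumbers: readonly number[];
  readonly addPrNumbers: readonly number[];
  readonly messagesProcessed: number;
}

// Hypothetical call-site type: "halt" carries no PR lists, so downstream
// code cannot act on skip/add numbers once a stop has been seen.
type NextAction =
  | { readonly kind: "halt"; readonly reason: string | null }
  | { readonly kind: "proceed"; readonly skip: readonly number[]; readonly add: readonly number[] };

function decideNextAction(drain: DrainResult): NextAction {
  // WHY: stop is checked first so the skip/add arrays are never read
  // when the coordinator is about to exit.
  if (drain.stop) {
    return { kind: "halt", reason: drain.stopReason };
  }
  return { kind: "proceed", skip: drain.skipPrNumbers, add: drain.addPrNumbers };
}
```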
92
+
93
+ ### YELLOW (minor): No structured parse log to stderr
94
+
95
+ Without logging which pattern matched and for which message, diagnosing unexpected behavior
96
+ requires reading the outbox. A one-line stderr log per actionable message helps during
97
+ development and debugging.
98
+
99
+ **Recommended revision:** For each actionable message (stop, skip-pr, add-pr), emit:
100
+ `[INFO coord:drain kind=stop handle=... message="..." ts=...]` to `deps.stderr`.
101
+
102
+ ---
103
+
104
+ ## Recommended Revisions (summary)
105
+
106
+ 1. Outbox notification for `stop` must include the full triggering message text and timestamp.
107
+ 2. Add JSDoc invariant on `DrainResult` documenting that `stop: true` takes absolute precedence.
108
+ 3. Add a `[INFO coord:drain]` stderr log line for each actionable message (diagnostics).
109
+
110
+ None of these revisions change the architecture. All are implementation-level details.
111
+
112
+ ---
113
+
114
+ ## Residual Concerns
115
+
116
+ 1. **Schema follow-up not filed yet.** A GitHub issue or backlog entry for adding a `kind`
117
+ field to `QueuedMessage` (Candidate C path) should be created as part of this PR.
118
+ 2. **No integration test.** Unit tests with fake deps are sufficient for the drain logic, but
119
+ an end-to-end test (write to real queue file, run coordinator, verify outbox) is not planned.
120
+ This is acceptable for a developer CLI tool.
@@ -0,0 +1,289 @@
1
+ # Design Candidates: Coordinator Message Queue Drain
2
+
3
+ **Task:** The PR review coordinator never reads `~/.workrail/message-queue.jsonl`, so
4
+ messages queued via `worktrain tell` (from phone, terminal, or automation) are silently ignored.
5
+ This document captures the design investigation for draining that queue inside the coordinator.
6
+
7
+ ---
8
+
9
+ ## Problem Understanding
10
+
11
+ ### Core tensions
12
+
13
+ 1. **Append-only invariant vs. consumed-message tracking.** The queue file must never be
14
+ truncated or rewritten -- the `worktrain-tell` command's documented invariant. But without
15
+ tracking which messages were processed, the coordinator re-processes the entire history on
16
+ every invocation. A cursor file (same pattern as `inbox-cursor.json`) resolves this cleanly
17
+ but adds a second file to manage.
18
+
19
+ 2. **Stringly-typed messages vs. explicit domain types.** `QueuedMessage.message` is free-form
20
+ text. The repo philosophy demands explicit domain types, but no `kind` field exists in the
21
+ current schema. Text parsing at the coordinator's read boundary is the only option within
22
+ the current schema -- it is not a patch, it is adapting to a pre-existing constraint.
23
+
24
+ 3. **Coordinator statefulness vs. single-pass design.** The coordinator is invoked once per run
25
+ today, not as a persistent loop. A cursor handles both cases correctly: repeat invocations
26
+ see only new messages; a one-time invocation drains everything queued since last run.
27
+
28
+ 4. **`stop` signal semantics vs. partial progress.** A `stop` in the queue must halt before any
29
+ spawn. But `stop` might appear alongside `skip-pr 42` in the same drain batch. `stop` takes
30
+ absolute precedence -- no partial processing, coordinator exits cleanly and writes an outbox
31
+ acknowledgment.
32
+
33
+ ### Likely seam
34
+
35
+ The real seam is the top of `runPrReviewCoordinator()`, immediately before Stage 1 (PR discovery).
36
+ This matches the backlog intent: "coordinator loop checks message-queue at the start of each cycle
37
+ before spawning new agents." The coordinator is the right owner, not a shared utility, because
38
+ message routing is coordinator-specific logic.
39
+
40
+ ### What makes this hard
41
+
42
+ Not technically difficult. The risks are:
43
+ - Forgetting to handle ENOENT (queue file doesn't exist yet = no messages, not a crash)
44
+ - Cursor desync: if the queue is wiped, cursor > total lines; reset to 0 (same guard as `inbox-cursor.json`)
45
+ - Text matching fragility: `stop` in "stop overthinking this" triggers coordinator halt
46
+
47
+ ---
48
+
49
+ ## Philosophy Constraints
50
+
51
+ From `CLAUDE.md` and observed repo patterns:
52
+
53
+ - **Errors are values, never thrown** -- `pr-review.ts` uses `Result<T, string>` throughout.
54
+ The drain result uses a plain `DrainResult` struct (stop is not an error, it is a valid outcome).
55
+ - **All I/O injected via deps** -- new `drainMessageQueue()` must accept deps, not import `fs`.
56
+ - **Immutability by default** -- all interface fields are `readonly`.
57
+ - **Prefer fakes over mocks** -- tests use in-memory fake deps, no `vi.mock()`.
58
+ - **Validate at boundaries, trust inside** -- malformed JSONL lines are skipped at the parse
59
+ boundary; core routing logic trusts parsed data.
60
+ - **Document WHY, not WHAT** -- comments explain rationale, not mechanics.
61
+
62
+ **Conflict:** "Explicit domain types over primitives" is under pressure from the free-form message
63
+ text. The mitigation is narrow keyword patterns and clear documentation. This conflict is not
64
+ resolved in this PR -- a `kind` field on `QueuedMessage` is the proper fix but changes the
65
+ public CLI interface (out of scope here).
66
+
67
+ ---
68
+
69
+ ## Impact Surface
70
+
71
+ Changes that must stay consistent if this design is implemented:
72
+
73
+ - **`CoordinatorDeps` interface** in `src/coordinators/pr-review.ts`: gains `readFile` and
74
+ `appendFile`. These are additive -- no existing caller is broken.
75
+ - **`cli-worktrain.ts` pr-review action**: must wire `readFile` and `appendFile` into the deps
76
+ object (two new lines in the composition root).
77
+ - **`tests/unit/coordinator-pr-review.test.ts`**: every fake `CoordinatorDeps` object needs the
78
+ two new fields. Mechanical but must not be missed.
79
+ - **`discoverConsolePort` deps** (mini-subset type): no change needed; it already has `readFile`.
80
+
81
+ New files introduced on disk (runtime, not source):
82
+ - `~/.workrail/message-queue-cursor.json` -- created on first coordinator run after this ships.
83
+
84
+ ---
85
+
86
+ ## Candidates
87
+
88
+ ### Candidate A -- Minimal: full-history drain, no cursor, timestamp filter
89
+
90
+ **Summary:** On each coordinator run, read all messages in `message-queue.jsonl`, discard messages
91
+ older than the coordinator's start time, act on the remainder.
92
+
93
+ **Tensions resolved:** Simplest change; no new cursor file.
94
+
95
+ **Tensions accepted:** Stale messages are re-processed under clock skew or same-second invocations.
96
+ A `stop` message from two days ago can halt a coordinator run today if the clock check is ambiguous.
97
+
98
+ **Boundary:** Inline in `runPrReviewCoordinator()`, no new function or file.
99
+
100
+ **Why this boundary is wrong:** The timestamp filter is not reliable enough. Same-second writes,
101
+ NTP jumps, or leap-second events can cause a current `stop` to be discarded or a stale `stop` to
102
+ fire. The cursor is strictly more correct.
103
+
104
+ **Failure mode:** Stale `stop` from a previous session kills today's coordinator run. No recovery
105
+ path -- the coordinator just exits. Users have to manually inspect the queue to understand why.
106
+
107
+ **Repo-pattern relationship:** Departs -- `worktrain-inbox.ts` uses a cursor precisely to avoid
108
+ the re-processing problem. This candidate ignores the established pattern.
109
+
110
+ **Gains:** Zero new files.
111
+
112
+ **Gives up:** Correctness. Behavior depends on queue history, not just current inputs -- violates
113
+ "determinism over cleverness."
114
+
115
+ **Scope judgment:** Too narrow -- solves the immediate symptom but breaks on any real usage.
116
+
117
+ **Philosophy fit:** Conflicts with "determinism over cleverness." Does not honor "validate at
118
+ boundaries" (stale messages leak through).
119
+
120
+ **Verdict: Rejected.** Stale message re-processing is a correctness bug, not a tradeoff.
121
+
122
+ ---
123
+
124
+ ### Candidate B -- Best-fit: `drainMessageQueue()` with cursor, narrow text parsing
125
+
126
+ **Summary:** Add a pure function `drainMessageQueue(deps, opts)` to `src/coordinators/pr-review.ts`.
127
+ It reads new lines since `~/.workrail/message-queue-cursor.json`, parses message text for `stop` /
128
+ `skip-pr N` / `add-pr N` using narrow regex patterns, writes outbox acknowledgments for actionable
129
+ messages, advances the cursor. Called at the top of `runPrReviewCoordinator()` before Stage 1.
130
+
131
+ **Tensions resolved:**
132
+ - Append-only invariant respected (cursor tracks progress, queue file never modified)
133
+ - Stale message re-processing eliminated by cursor
134
+ - ENOENT handled (no queue = empty drain result = coordinator proceeds normally)
135
+ - `stop` takes absolute precedence
136
+
137
+ **Tensions accepted:**
138
+ - Text parsing is not type-safe; fragile to natural language variation
139
+
140
+ **Boundary solved at:** New exported function in `src/coordinators/pr-review.ts`.
141
+
142
+ **Why this boundary is best-fit:** Message routing is coordinator-specific. The drain reads a
143
+ coordinator-managed cursor file and writes outbox notifications -- both are coordinator
144
+ responsibilities. Extracting to a shared utility would create coupling without benefit (no other
145
+ coordinator exists today).
146
+
147
+ **Key data structures:**
148
+
149
+ ```ts
150
+ export interface DrainResult {
151
+ readonly stop: boolean;
152
+ readonly stopReason: string | null;
153
+ readonly skipPrNumbers: readonly number[];
154
+ readonly addPrNumbers: readonly number[];
155
+ readonly messagesProcessed: number;
156
+ }
157
+ ```
158
+
159
+ Cursor shape: `{ lastReadCount: number }` -- identical to `InboxCursor` in `worktrain-inbox.ts`.
160
+
161
+ New `CoordinatorDeps` fields:
162
+ ```ts
163
+ readonly readFile: (path: string) => Promise<string>;
164
+ readonly appendFile: (path: string, content: string) => Promise<void>;
165
+ ```
166
+
167
+ Parsing patterns:
168
+ - stop: `/\bstop\b/i`
169
+ - skip-pr: `/\bskip[- ]pr[\s#]+([0-9]+)/i`
170
+ - add-pr: `/\badd[- ]pr[\s#]+([0-9]+)/i`
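
A minimal sketch of how the three patterns above might be applied to one message. The regexes are copied from the list; `ParsedCommand` and `parseMessage` are hypothetical names, and treating each message as at most one command with `stop` checked first is an assumption of this sketch.

```typescript
// Hypothetical parse result; one message yields at most one command here.
type ParsedCommand =
  | { readonly kind: "stop" }
  | { readonly kind: "skip-pr"; readonly prNumber: number }
  | { readonly kind: "add-pr"; readonly prNumber: number }
  | { readonly kind: "none" };

// Patterns as listed in Candidate B (no `g` flag, so they are stateless).
const STOP_PATTERN = /\bstop\b/i;
const SKIP_PR_PATTERN = /\bskip[- ]pr[\s#]+([0-9]+)/i;
const ADD_PR_PATTERN = /\badd[- ]pr[\s#]+([0-9]+)/i;

function parseMessage(text: string): ParsedCommand {
  // WHY: stop takes absolute precedence, so it is tested before anything else.
  if (STOP_PATTERN.test(text)) return { kind: "stop" };
  const skip = SKIP_PR_PATTERN.exec(text);
  if (skip) return { kind: "skip-pr", prNumber: Number(skip[1]) };
  const add = ADD_PR_PATTERN.exec(text);
  if (add) return { kind: "add-pr", prNumber: Number(add[1]) };
  return { kind: "none" };
}
```

With these patterns, `parseMessage("stop overthinking this")` returns `{ kind: "stop" }`, which is exactly the false positive the failure mode below describes.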
171
+
172
+ **Failure mode:** A note message like "stop overthinking this" triggers coordinator halt. Mitigation:
173
+ word-boundary requirement limits false positives; documented as known behavior with workaround
174
+ ("add-pr" or "note:" prefix for non-command messages).
175
+
176
+ **Repo-pattern relationship:** Follows `worktrain-inbox.ts` cursor pattern exactly; follows
177
+ `CoordinatorDeps` injection pattern exactly.
178
+
179
+ **Gains:** Correct deduplication; clean separation; fully testable with fakes.
180
+
181
+ **Gives up:** Type-safe dispatch. A `kind` field would be cleaner.
182
+
183
+ **Impact surface:** `CoordinatorDeps` (additive), `cli-worktrain.ts` (2 new dep wires),
184
+ `coordinator-pr-review.test.ts` (2 new fake dep fields).
185
+
186
+ **Scope judgment:** Best-fit.
187
+
188
+ **Philosophy fit:** Honors immutability (readonly result), DI for boundaries, errors as values,
189
+ validate at boundaries. Partial conflict with "explicit domain types" (documented and accepted).
190
+
191
+ **Verdict: Recommended.**
192
+
193
+ ---
194
+
195
+ ### Candidate C -- Broader: structured `kind` field on `QueuedMessage`
196
+
197
+ **Summary:** Extend `QueuedMessage` with `readonly kind?: 'stop' | 'skip-pr' | 'add-pr' | 'note'`
198
+ and `readonly payload?: Record<string, unknown>`. Update `worktrain-tell.ts` to accept `--kind`
199
+ flag. Coordinator drains on `kind` field instead of text parsing.
200
+
201
+ **Tensions resolved:** Eliminates the stringly-typed tension entirely. Discriminated union on
202
+ `kind` makes routing exhaustive and type-safe.
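
The exhaustive routing this candidate describes could look roughly like the sketch below. The `kind` union and `payload` type come from the summary above; `QueuedMessageWithKind`, `Route`, `routeByKind`, and the `prNumber` payload key are hypothetical and not defined anywhere in these documents.

```typescript
type MessageKind = "stop" | "skip-pr" | "add-pr" | "note";

// Hypothetical name for the extended message shape described in Candidate C.
interface QueuedMessageWithKind {
  readonly message: string;
  readonly kind?: MessageKind;
  readonly payload?: Record<string, unknown>;
}

type Route =
  | { readonly action: "halt" }
  | { readonly action: "skip" | "add"; readonly prNumber: number }
  | { readonly action: "ignore" };

function routeByKind(msg: QueuedMessageWithKind): Route {
  // A missing kind falls back to "note", as the candidate specifies.
  const kind: MessageKind = msg.kind ?? "note";
  switch (kind) {
    case "stop":
      return { action: "halt" };
    case "skip-pr":
    case "add-pr": {
      const pr = msg.payload?.["prNumber"]; // assumed payload key
      if (typeof pr !== "number" || !Number.isInteger(pr)) return { action: "ignore" };
      return { action: kind === "skip-pr" ? "skip" : "add", prNumber: pr };
    }
    case "note":
      return { action: "ignore" };
    default: {
      // Compile-time exhaustiveness: adding a new kind makes this line fail to type-check.
      const unreachable: never = kind;
      return unreachable;
    }
  }
}
```

Note that `routeByKind({ message: "stop" })` yields `{ action: "ignore" }` in this sketch, which is the silent ergonomic regression discussed under this candidate's failure mode.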
203
+
204
+ **Tensions accepted:** Schema change affects the public CLI interface. Existing `tell` invocations
205
+ omitting `--kind` fall back to `kind: 'note'` (safe), but natural language commands no longer work
206
+ (`worktrain tell "stop"` becomes a note, not a stop signal).
207
+
208
+ **Boundary solved at:** `QueuedMessage` type in `worktrain-tell.ts` + coordinator drain in
209
+ `pr-review.ts` + CLI parser in `cli-worktrain.ts`.
210
+
211
+ **Why this boundary is too broad:** Adds `kind` to `QueuedMessage` -- a public interface change.
212
+ The `tell` command is documented as accepting any free-form text. Adding a required semantic field
213
+ is a separate design decision that should be preceded by discussion of the CLI UX.
214
+
215
+ **Failure mode:** Users who currently type `worktrain tell "stop the agent"` find it ignored
216
+ unless they learn to use `--kind stop`. The ergonomic regression is silent.
217
+
218
+ **Repo-pattern relationship:** Honors "explicit domain types" and "make illegal states
219
+ unrepresentable" from philosophy. Departs from current free-form-text CLI design.
220
+
221
+ **Gains:** Type-safe dispatch, no regex fragility, forward-compatible for new action kinds.
222
+
223
+ **Gives up:** Natural language ergonomics; requires more CLI plumbing.
224
+
225
+ **Scope judgment:** Too broad for this task.
226
+
227
+ **Philosophy fit:** Strongly honors explicit domain types, discriminated unions, exhaustiveness.
228
+ Conflicts with YAGNI -- adds schema complexity before the feature is proven.
229
+
230
+ **Verdict: Out of scope for this PR. File a follow-up issue.**
231
+
232
+ ---
233
+
234
+ ## Comparison and Recommendation
235
+
236
+ | | A (timestamp) | B (cursor + text) | C (structured kind) |
237
+ |---|---|---|---|
238
+ | Stale message safety | Weak | Strong | Strong |
239
+ | Schema change | No | No | Yes |
240
+ | Scope fit | Too narrow | Best-fit | Too broad |
241
+ | Testability | Full | Full | Full |
242
+ | Text-parse fragility | Avoided (no parse) | Narrow regexes | Eliminated |
243
+ | Repo-pattern alignment | Poor | Exact | Partial |
244
+ | Philosophy fit | Weak | Good (with caveat) | Strong |
245
+
246
+ **Recommendation: Candidate B.**
247
+
248
+ Candidate A fails on correctness. Candidate C solves the right problem but changes the wrong
249
+ boundary for this task. Candidate B is a direct adaptation of the existing `worktrain-inbox.ts`
250
+ cursor pattern to the coordinator context -- it introduces no new architectural ideas, just
251
+ applies the established approach.
252
+
253
+ ---
254
+
255
+ ## Self-Critique
256
+
257
+ **Strongest argument against Candidate B:**
258
+
259
+ The text-matching approach creates an implicit, undiscoverable API. Users sending messages from
260
+ phones have no way to know that `stop` means stop but `halt` does not. There is no help text,
261
+ no validation, no error message for unrecognized commands. This is a real UX problem.
262
+
263
+ **What would tip the decision toward Candidate C:**
264
+
265
+ Evidence that multiple clients (mobile app, automation scripts) need to send structured commands.
266
+ At that point, the text-parsing approach becomes a reliability liability. The right test: if
267
+ a second coordinator (e.g., a work-queue coordinator) also needs to consume the message queue,
268
+ Candidate C's structured dispatch becomes clearly necessary.
269
+
270
+ **Invalidating assumption:**
271
+
272
+ Candidate B assumes the word-boundary `stop` regex is specific enough. If users commonly type
273
+ messages like "stop worrying and trust the process" via phone, the stop regex will fire. Mitigation:
274
+ require the stop keyword to appear as the first meaningful token in the message, or require a
275
+ command prefix (e.g., `/stop`). This can be tightened without changing the architecture.
276
+
277
+ ---
278
+
279
+ ## Open Questions for the Main Agent
280
+
281
+ 1. Should the drain function write an outbox notification for every actionable message, or only
282
+ for `stop` (where the coordinator is halting and the user needs confirmation)? Suggested:
283
+ write for all actionable messages (stop, skip-pr, add-pr) to close the feedback loop.
284
+
285
+ 2. The `stop` signal exits cleanly -- should the coordinator report which messages caused the
286
+ stop in its final report? Suggested: yes, log the message text and timestamp in the run log.
287
+
288
+ 3. Should `add-pr` messages add new PRs to the list before or after deduplication? Suggested:
289
+ add them to `prs` before Stage 1 begins, guarding against duplicates with a Set.
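
The Set-based guard suggested in question 3 can be sketched in a few lines. `mergeAddedPrs` is a hypothetical helper name; the sketch assumes `prs` is a plain list of PR numbers and that existing order should be preserved.

```typescript
// Appends queued add-pr numbers to the existing list without duplicates.
// A Set preserves insertion order, so already-listed PRs keep their position.
function mergeAddedPrs(prs: readonly number[], addPrNumbers: readonly number[]): readonly number[] {
  return Array.from(new Set<number>([...prs, ...addPrNumbers]));
}
```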
@@ -0,0 +1,119 @@
1
+ # Shaping Workflow: External Research Synthesis
2
+ # Date: Apr 18, 2026
3
+ # Source: Deep research prompt answered by frontier model
4
+
5
+ ## TL;DR
6
+
7
+ An 11-step prompt chain with two mandatory human gates, a self-refine loop with evaluator-optimizer split, sectioned solution divergence, and a hybrid JSON+markdown artifact. The single highest-leverage design decision: **generation and critique run on structurally different prompts (ideally different model families)** -- anchoring and self-preference bias are not mitigated by CoT or self-reflection alone (Lou & Sun 2025; Panickssery et al. 2024).
8
+
9
+ ## The 11-Step Skeleton
10
+
11
+ | # | Step | Pattern | Output | Tokens |
12
+ |---|---|---|---|---|
13
+ | 1 | `ingest_and_extract` | Chain | Frame candidates, forces, open questions | 2–5k |
14
+ | 2 | `frame_gate` | Interrupt (**MANDATORY HUMAN GATE**) | Confirmed problem + appetite | small |
15
+ | 3 | `diverge_solution_shapes` | Parallel ×4 | 4 candidate rough shapes | med ×4 |
16
+ | 4 | `converge_pick` | Separate judge | Chosen shape + rationale | small-med |
17
+ | 5 | `breadboard_and_elements` | Chain + 1 refine | Breadboard + fat-marker elements | 8–15k |
18
+ | 6 | `rabbit_holes_nogos` | Adversarial | Risks, mitigations, no-gos, assumptions | 3–6k |
19
+ | 7 | `context_pack_build` | Tool-augmented | File globs, utilities, conventions, related PRs | med-large |
20
+ | 8 | `example_map_and_gherkin` | Chain | Rules, examples, Gherkin scenarios | 3–6k |
21
+ | 9 | `draft_pitch` | Self-refine ×2, critic=separate prompt | Full pitch (markdown + JSON) | 8–15k ×critique |
22
+ | 10 | `approval_gate` | Interrupt (**MANDATORY HUMAN GATE**) | Approved pitch | small |
23
+ | 11 | `finalize_and_handoff` | Deterministic + schema validate | Canonical artifact + pitch.md | <1k |
24
+
25
+ Total budget: 50–200k tokens depending on divergence fan-out.
26
+
27
+ ## Key Empirical Findings
28
+
29
+ ### What actually mitigates LLM failure modes in shaping (ranked):
30
+ 1. **Generator ≠ Evaluator with authorship obfuscation** -- use different model families for generation vs critique. Beats anchoring, self-preference, and mode collapse simultaneously. CoT and self-reflection alone do NOT work (Lou & Sun 2025).
31
+ 2. **Verbalized Sampling + N-alternatives-before-selection** -- prompt for a distribution, not a single answer. 1.6–2.1× diversity gain (Zhang et al. arXiv 2510.01171).
32
+ 3. **Schema-constrained structured output** -- kills verbosity compensation, forces right abstraction level by construction.
33
+ 4. **ClarifyGPT-style consistency check** -- generate two independent interpretations; divergence triggers clarification.
34
+ 5. **Self-Refine with specific rubric**, bounded at 2–3 iterations (~20% absolute gain, Madaan et al. arXiv 2303.17651).
35
+ 6. **Red-team pass** with explicit "what's hallucinated / what's missing" prompts against a separate instance.
36
+
37
+ ### The right level of abstraction (encodable heuristic)
38
+ **Interfaces and Invariants, Not Function Bodies.**
39
+
40
+ Classify every sentence in the pitch as:
41
+ - **(a) Interface** -- user-visible surfaces, data objects, integration points, touched modules
42
+ - **(b) Invariant** -- declarative constraints (idempotency, auth model, consistency requirements, latency budgets)
43
+ - **(c) Exclusion** -- explicitly excluded functionality
44
+ - **(d) Implementation detail** -- over-specification, demote or cut
45
+ - **(e) Vague** -- under-specification, replace with concrete interface/invariant or ask clarifying question
46
+
47
+ A well-shaped pitch contains only (a), (b), (c).
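
As an illustration only, the heuristic could be encoded as a check over already-classified sentences. The five class names mirror (a) through (e) above; `ClassifiedSentence`, `findViolations`, and `isWellShaped` are hypothetical, and how sentences get classified in the first place is outside this sketch.

```typescript
type SentenceClass = "interface" | "invariant" | "exclusion" | "implementation-detail" | "vague";

interface ClassifiedSentence {
  readonly text: string;
  readonly classification: SentenceClass;
}

// Classes (a), (b), (c): the only ones a well-shaped pitch may contain.
const ALLOWED: ReadonlySet<SentenceClass> = new Set<SentenceClass>(["interface", "invariant", "exclusion"]);

// Returns the sentences that should be demoted, cut, or clarified.
function findViolations(sentences: readonly ClassifiedSentence[]): readonly ClassifiedSentence[] {
  return sentences.filter((s) => !ALLOWED.has(s.classification));
}

function isWellShaped(sentences: readonly ClassifiedSentence[]): boolean {
  return findViolations(sentences).length === 0;
}
```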
48
+
49
+ ### Shaping for AI implementers vs humans (the key asymmetry)
50
+ LLM implementers need:
51
+ - **MORE explicit** than any human spec on: interfaces, invariants, conventions, no-gos, exact API versions, file boundaries (LLMs fabricate APIs, lack tacit codebase knowledge, lack scope-shame)
52
+ - **LESS explicit** than junior-human spec on: standard implementation patterns (CRUD, routing, idiomatic error handling -- LLMs know these better)
53
+
54
+ The dominant failure mode to design against: **confident architectural divergence** -- agent produces working, tested, reviewable PR that reinvents an existing utility or lands logic in the wrong layer. Looks plausible in review. Neither tests nor LLM sensors reliably catch it. Only a better spec prevents it.
55
+
56
+ ### Context Pack (Step 7) is the highest-leverage AI-specific addition
57
+ But: **LLM-generated Context Packs are measurably inferior to human-curated ones** (ETH Zurich AGENTS.md study -- LLM-generated context reduced task success in 5 of 8 settings). Treat Step 7 output as a draft requiring spot-check.
58
+
59
+ ## The Artifact Schema
60
+
61
+ ```jsonc
62
+ {
63
+ "shaping_run_id": "uuid",
64
+ "frame": {
65
+ "problem_story_md": "...",
66
+ "appetite": {
67
+ "calendar_weeks": 6,
68
+ "token_budget_est": 120000,
69
+ "agent_turns_est": 60,
70
+ "files_touched_est": 8,
71
+ "sizing_bucket": "small|medium|large"
72
+ },
73
+ "forces": { "push": [...], "pull": [...], "anxiety": [...], "habit": [...] }
74
+ },
75
+ "solution": {
76
+ "breadboard_md": "...",
77
+ "elements": [{ "name": "...", "description_md": "...", "classification": "interface|invariant|exclusion" }],
78
+ "alternatives_considered": [{ "sketch": "...", "rejected_because": "..." }]
79
+ },
80
+ "context_pack": {
81
+ "touch_globs": ["src/billing/**"],
82
+ "do_not_touch_globs": ["src/auth/**", "migrations/**"],
83
+ "reuse_utilities": [{ "path": "...", "symbol": "...", "signature": "...", "reason_to_reuse": "..." }],
84
+ "conventions_md": "...",
85
+ "related_prior_art": [{ "path_or_pr": "...", "relevance": "..." }]
86
+ },
87
+ "acceptance_criteria": {
88
+ "gherkin": "Feature: ...\n Scenario: ...",
89
+ "verification_commands": ["pnpm test src/billing", "tsc --noEmit"],
90
+ "example_map": { "rules": [...], "examples": [...], "open_questions": [...] }
91
+ },
92
+ "rabbit_holes": [{ "risk": "...", "severity": "low|med|critical", "mitigation": "...", "patch_applied": true }],
93
+ "no_gos": ["..."],
94
+ "assumptions_log": [{ "step": "...", "assumption": "...", "confidence": 0.7, "rationale": "..." }],
95
+ "decomposition": {
96
+ "walking_skeleton": { "description": "thin end-to-end slice", "files": [...] },
97
+ "atomic_subtasks": [{ "id": "s1", "title": "...", "depends_on": [], "est_context_window": "single", "acceptance_scenario_refs": ["scenario-1"] }]
98
+ },
99
+ "pitch_md": "# Pitch: ...\n\n## Problem\n...",
100
+ "build_readiness_score": { "rubric_pass_count": 5, "critical_blockers": 0 }
101
+ }
102
+ ```
103
+
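A consumer of this artifact might type a subset of the schema and gate on it before handing off to an implementer. The sketch below is an assumption-laden illustration: the field names come from the schema above, but `isBuildReady`, `unpatchedCriticalRisks`, and the `minRubricPasses` threshold are hypothetical, since the source does not state the actual readiness rule.

```typescript
type Severity = "low" | "med" | "critical";

interface RabbitHole {
  risk: string;
  severity: Severity;
  mitigation: string;
  patch_applied: boolean;
}

interface BuildReadinessScore {
  rubric_pass_count: number;
  critical_blockers: number;
}

// Only the fields this check needs; the full artifact has many more.
interface ShapingArtifactSubset {
  rabbit_holes: RabbitHole[];
  build_readiness_score: BuildReadinessScore;
}

// Critical risks whose mitigation has not been patched into the pitch yet.
function unpatchedCriticalRisks(artifact: ShapingArtifactSubset): RabbitHole[] {
  return artifact.rabbit_holes.filter(
    (hole) => hole.severity === "critical" && !hole.patch_applied,
  );
}

// Assumed rule (not from the source): ready when there are no critical
// blockers, no unpatched critical rabbit holes, and enough rubric passes.
function isBuildReady(artifact: ShapingArtifactSubset, minRubricPasses: number): boolean {
  const score = artifact.build_readiness_score;
  return (
    score.critical_blockers === 0 &&
    score.rubric_pass_count >= minRubricPasses &&
    unpatchedCriticalRisks(artifact).length === 0
  );
}

const exampleArtifact: ShapingArtifactSubset = {
  rabbit_holes: [
    { risk: "schema drift", severity: "critical", mitigation: "pin version", patch_applied: true },
    { risk: "slow CI", severity: "low", mitigation: "none needed", patch_applied: false },
  ],
  build_readiness_score: { rubric_pass_count: 5, critical_blockers: 0 },
};

const ready = isBuildReady(exampleArtifact, 5);
```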
104
+ ## What NOT to Build
105
+ - Do NOT make this a dynamic autonomous agent -- shaping has a known skeleton (workflow, not agent)
106
+ - Do NOT use tree-of-thoughts -- no cheap partial-goal verification signal in shaping
107
+ - Do NOT build multi-agent role-plays -- single-voice judge with sectioning strictly dominates
108
+ - Do NOT skip the frame gate on "small" tasks -- wrong frame on a small task still wastes the run
109
+
110
+ ## Failure Modes and Mitigations
111
+
112
+ | Failure mode | Mitigation |
113
+ |---|---|
114
+ | Mode collapse on diverge step | Verbalized Sampling framing, explicit framing diversity, auto-retry at higher temperature if >70% overlap |
115
+ | Self-preference on judge | Obfuscate authorship by rewriting all candidates into uniform voice; ideally different model family |
116
+ | Verbosity compensation on pitch | Hard max-length on JSON fields; critic checks for vague modifiers without concrete nouns |
117
+ | Hallucinated Context Pack entries | Tool-augment Step 7 with repo grep/AST scan; schema-validate all paths before Step 10 |
118
+ | Over-decomposition | Minimum subtask size = single context window; maximum 8 subtasks per pitch; if more, appetite was wrong |
119
+ | Silent architectural divergence | Include consistency-check sub-task: implementer lists every new file/symbol and justifies why it's not a duplicate |
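
The mode-collapse mitigation in the first row depends on an overlap measure that the table does not define. As one possible reading, the sketch below assumes token-level Jaccard similarity and retries when any pair of candidates exceeds 70%. The function names and the choice of metric are assumptions for illustration only.

```typescript
// Assumed metric: Jaccard similarity over lowercase word tokens.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((t) => t.length > 0));
}

function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // two empty texts are identical
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// Highest similarity between any two candidates from the diverge step.
function maxPairwiseOverlap(candidates: readonly string[]): number {
  const tokenSets = candidates.map((c) => tokenize(c));
  let max = 0;
  for (let i = 0; i < tokenSets.length; i += 1) {
    for (let j = i + 1; j < tokenSets.length; j += 1) {
      max = Math.max(max, jaccard(tokenSets[i], tokenSets[j]));
    }
  }
  return max;
}

const OVERLAP_RETRY_THRESHOLD = 0.7; // the ">70% overlap" figure from the table

function shouldRetryDiverge(candidates: readonly string[]): boolean {
  return maxPairwiseOverlap(candidates) > OVERLAP_RETRY_THRESHOLD;
}
```

An embedding-based similarity would be a reasonable substitute; the token version is used here only because it needs no external dependency.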
@@ -0,0 +1,147 @@
1
+ # Implementation Plan: Late-Bound Goals
2
+
3
+ **Feature**: Default `goalTemplate: "{{$.goal}}"` when no `goal` and no `goalTemplate` configured.
4
+
5
+ ---
6
+
7
+ ## Problem Statement
8
+
9
+ Triggers require a static `goal` in `triggers.yml`. Dynamic-goal use cases (PR review, incident response) therefore need either a static placeholder goal or a custom `goalTemplate`. The feature enables: `curl -X POST /webhook/my-trigger -d '{"goal": "review PR #42"}'` without any goal pre-configuration.
10
+
11
+ ---
12
+
13
+ ## Acceptance Criteria
14
+
15
+ 1. A trigger with neither `goal` nor `goalTemplate` in `triggers.yml` loads successfully (no `missing_field` error).
16
+ 2. When the webhook payload contains a `goal` field, the session is started with that value as the goal.
17
+ 3. When the webhook payload has no `goal` field, the session is started with `'Autonomous task'` as the goal, AND a warning is logged to daemon stderr.
18
+ 4. Existing triggers with a static `goal` field behave identically to before.
19
+ 5. TypeScript compiles without errors (no new `any` or `!` assertions on the new code path).
20
+
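Criteria 2 and 3 describe the observable behavior at dispatch time. The sketch below models only that behavior; it is not the interpolation logic in `trigger-router.ts`, which this plan leaves unchanged. `resolveSessionGoal` and the injected `warn` callback are hypothetical, and treating a blank string as a missing goal is an assumption.

```typescript
const FALLBACK_GOAL = "Autonomous task";

// Hypothetical model of criteria 2 and 3: use the payload goal when present,
// otherwise fall back and emit a warning.
function resolveSessionGoal(
  payload: Record<string, unknown>,
  warn: (message: string) => void,
): string {
  const candidate = payload["goal"];
  // Assumption: a blank or non-string goal counts as "no goal field".
  if (typeof candidate === "string" && candidate.trim().length > 0) {
    return candidate;
  }
  warn(`webhook payload has no "goal" field; using "${FALLBACK_GOAL}"`);
  return FALLBACK_GOAL;
}

const warnings: string[] = [];
const collectWarning = (message: string): void => {
  warnings.push(message);
};

const fromPayload = resolveSessionGoal({ goal: "review PR #42" }, collectWarning);
const fromFallback = resolveSessionGoal({ ref: "main" }, collectWarning);
```

In the daemon, `warn` would write to stderr (for example via `console.warn`); it is a parameter here so the behavior can be checked without capturing process output.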
21
+ ---
22
+
23
+ ## Non-Goals
24
+
25
+ - No change to `TriggerDefinition` type (`goal: string` stays required).
26
+ - No change to `trigger-router.ts` interpolation logic.
27
+ - No change to `console-routes.ts` dispatch endpoint.
28
+ - No `goalSource` discriminant on `TriggerDefinition` (future enhancement).
29
+ - No configurable fallback string other than `'Autonomous task'`.
30
+
31
+ ---
32
+
33
+ ## Philosophy-Driven Constraints
34
+
35
+ - Validate at boundaries: injection must happen in `trigger-store.ts`, not in `trigger-router.ts`.
36
+ - Make illegal states unrepresentable: `TriggerDefinition.goal` must always be a valid `string`.
37
+ - YAGNI: minimal change -- named constant + ~8 lines + tests.
38
+ - Document why: WHY comment required above the injection block.
39
+
40
+ ---
41
+
42
+ ## Invariants
43
+
44
+ 1. `TriggerDefinition.goal: string` -- never `undefined` or empty string after parse.
45
+ 2. `trigger.goal` is only used as a display/fallback value, never as a routing key.
46
+ 3. The interpolated goal (from payload or fallback) flows to the session, not the sentinel.
47
+ 4. Injection only fires when both `raw.goal` and `raw.goalTemplate` are absent.
48
+
49
+ ---
50
+
51
+ ## Selected Approach
52
+
53
+ **Candidate A: Parse-time sentinel injection in `validateAndResolveTrigger()`**
54
+
55
+ 1. Remove `'goal'` from `requiredStringFields`.
56
+ 2. After the required-fields loop, add a block:
57
+ - If `raw.goal` absent AND `raw.goalTemplate` absent: set `resolvedGoal = LATE_BOUND_GOAL_SENTINEL` and `resolvedGoalTemplate = '{{$.goal}}'`. Log an INFO message.
58
+ - If `raw.goal` absent AND `raw.goalTemplate` present: set `resolvedGoal = LATE_BOUND_GOAL_SENTINEL`.
59
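+ - Otherwise: `resolvedGoal = raw.goal.trim()`, with `raw.goal` narrowed to `string` by the preceding checks so that no new `!` assertion is needed (acceptance criterion 5).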
60
+ 3. Replace `goal: raw.goal!.trim()` on line 974 with `goal: resolvedGoal`.
61
+ 4. If `resolvedGoalTemplate` was injected, pass it via the `goalTemplate` spread at line 980.
62
+
63
+ **Runner-up**: Candidate B (optional type) -- rejected because it cascades null checks to 3+ files.
64
+
65
+ **Rationale**: Exact match to the `concurrencyMode` default pattern at line 756. Zero downstream impact.
66
+
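The three branches of Candidate A can be sketched in isolation as follows. This is an illustration under stated assumptions, not the contents of `validateAndResolveTrigger()`: `RawTrigger`, `ResolvedGoalFields`, and `resolveGoalFields` are hypothetical names, "absent" is assumed to mean `undefined` or blank, and logging is injected as a callback.

```typescript
const LATE_BOUND_GOAL_SENTINEL = "Autonomous task";
const DEFAULT_GOAL_TEMPLATE = "{{$.goal}}";

// Hypothetical shapes: only the two fields this block cares about.
interface RawTrigger {
  goal?: string;
  goalTemplate?: string;
}

interface ResolvedGoalFields {
  goal: string; // always a non-empty string after resolution
  goalTemplate?: string;
}

function resolveGoalFields(
  raw: RawTrigger,
  logInfo: (message: string) => void,
): ResolvedGoalFields {
  // Assumption: blank strings are treated the same as missing fields.
  const goal = raw.goal?.trim() || undefined;
  const goalTemplate = raw.goalTemplate?.trim() || undefined;

  // WHY: a trigger with neither field is late-bound; the real goal arrives
  // in the webhook payload, so a sentinel keeps `goal: string` valid.
  if (goal === undefined && goalTemplate === undefined) {
    logInfo("no goal or goalTemplate configured; defaulting to late-bound goal");
    return { goal: LATE_BOUND_GOAL_SENTINEL, goalTemplate: DEFAULT_GOAL_TEMPLATE };
  }
  if (goal === undefined) {
    // Explicit template, no static goal: keep the template as configured.
    return { goal: LATE_BOUND_GOAL_SENTINEL, goalTemplate };
  }
  // Static goal present: `goal` is narrowed to string here, no `!` needed.
  return { goal, ...(goalTemplate !== undefined ? { goalTemplate } : {}) };
}

const infoLog: string[] = [];
const lateBound = resolveGoalFields({}, (m) => infoLog.push(m));
const templated = resolveGoalFields({ goalTemplate: "Review {{$.pr}}" }, (m) => infoLog.push(m));
const staticGoal = resolveGoalFields({ goal: "  Nightly audit " }, (m) => infoLog.push(m));
```

Returning a fresh object from a pure function keeps the invariant `goal: string` checkable by the compiler, which is the property Candidate B would have given up.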
67
+ ---
68
+
69
+ ## Vertical Slices
70
+
71
+ ### Slice 1: Core implementation (~8 lines in trigger-store.ts)
72
+
73
+ **File**: `src/trigger/trigger-store.ts`
74
+
75
+ **Changes**:
76
+ - Add `const LATE_BOUND_GOAL_SENTINEL = 'Autonomous task';` near the top of the validation section.
77
+ - Remove `'goal'` from `requiredStringFields` array.
78
+ - Add injection block with WHY comment + INFO log.
79
+ - Declare `let resolvedGoal: string;` and `let resolvedGoalTemplate: string | undefined;` before the injection block.
80
+ - Replace `goal: raw.goal!.trim()` with `goal: resolvedGoal` in the TriggerDefinition literal.
81
+ - Pass `resolvedGoalTemplate` to the `goalTemplate` spread.
82
+
83
+ **Done when**: TypeScript compiles. `loadTriggerConfig` returns a valid `TriggerDefinition` for a YAML with no `goal` and no `goalTemplate`.
84
+
85
+ ### Slice 2: Tests for trigger-store.ts
86
+
87
+ **File**: `tests/unit/trigger-store.test.ts`
88
+
89
+ **New test cases**:
90
+ 1. Trigger with no `goal` and no `goalTemplate` loads successfully, has `goal = 'Autonomous task'` and `goalTemplate = '{{$.goal}}'`.
91
+ 2. Trigger with no `goal` but an explicit `goalTemplate` loads successfully, has `goal = 'Autonomous task'` and the specified `goalTemplate`.
92
+ 3. Trigger with a static `goal` behaves identically to before (regression).
93
+
94
+ **Done when**: All 3 tests pass.
95
+
96
+ ### Slice 3: Test for trigger-router.ts integration
97
+
98
+ **File**: `tests/unit/trigger-router.test.ts`
99
+
100
+ **New test cases**:
101
+ 1. Route a webhook event with payload `{ goal: 'review PR #42' }` to a late-bound trigger. Verify `runWorkflow` is called with `goal = 'review PR #42'`.
102
+ 2. Route a webhook event with no `goal` field to a late-bound trigger. Verify `runWorkflow` is called with `goal = 'Autonomous task'`.
103
+
104
+ **Done when**: Both tests pass.
105
+
106
+ ---
107
+
108
+ ## Test Design
109
+
110
+ All tests use the existing `loadTriggerConfig` (pure function, no I/O) and `TriggerRouter` patterns from the test files. No mocks are needed for Slice 2. Slice 3 uses the `makeFakeRunWorkflow` pattern already in the test file.
111
+
112
+ Run: `npx vitest run tests/unit/trigger-store.test.ts tests/unit/trigger-router.test.ts`
113
+
114
+ ---
115
+
116
+ ## Risk Register
117
+
118
+ | Risk | Likelihood | Impact | Mitigation |
119
+ |---|---|---|---|
120
+ | Non-null assertion on line 974 missed | Low | Compile error | TypeScript will catch |
121
+ | Slice 3 test fails due to goal not threading through | Low | Test failure | Trace through route() logic |
122
+ | Existing trigger-store tests broken by goal no longer being required | Low | Test failure | Regression test in Slice 2 covers this |
123
+
124
+ ---
125
+
126
+ ## PR Packaging
127
+
128
+ Single PR: `feat/late-bound-goals`. All 3 slices in one commit. ~20-30 lines total (source + tests).
129
+
130
+ ---
131
+
132
+ ## Philosophy Alignment
133
+
134
+ - validate-at-boundaries -> satisfied (injection in trigger-store.ts)
135
+ - make-illegal-states-unrepresentable -> satisfied (goal: string always holds)
136
+ - YAGNI -> satisfied (~8 source lines)
137
+ - document-why -> satisfied (WHY comment in injection block)
138
+ - prefer-explicit-domain-types -> tension (sentinel is stringly-typed; acceptable -- future `goalSource` discriminant tracked)
139
+
140
+ ---
141
+
142
+ ## Plan Confidence
143
+
144
+ - unresolvedUnknownCount: 0
145
+ - planConfidenceBand: High
146
+ - estimatedPRCount: 1
147
+ - followUpTickets: "Add goalSource discriminant to TriggerDefinition for console UI enhancement"