@exaudeus/workrail 3.79.3 → 3.80.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/cli-worktrain.js +3 -3
- package/dist/console-ui/assets/{index-pA7_pNwu.js → index-2NrQPYdF.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/daemon/active-sessions.d.ts +8 -5
- package/dist/daemon/active-sessions.js +11 -2
- package/dist/daemon/core/session-result.d.ts +2 -2
- package/dist/daemon/daemon-events.d.ts +17 -13
- package/dist/daemon/daemon-events.js +4 -0
- package/dist/daemon/runner/agent-loop-runner.d.ts +4 -4
- package/dist/daemon/runner/pre-agent-session.d.ts +2 -2
- package/dist/daemon/runner/pre-agent-session.js +2 -1
- package/dist/daemon/runner/runner-types.d.ts +4 -4
- package/dist/daemon/session-scope.d.ts +2 -2
- package/dist/daemon/startup-recovery.js +2 -1
- package/dist/daemon/tools/bash.d.ts +2 -2
- package/dist/daemon/tools/continue-workflow.d.ts +3 -3
- package/dist/daemon/tools/file-tools.d.ts +4 -4
- package/dist/daemon/tools/glob-grep.d.ts +3 -3
- package/dist/daemon/tools/report-issue.d.ts +2 -2
- package/dist/daemon/tools/signal-coordinator.d.ts +2 -2
- package/dist/daemon/tools/spawn-agent.d.ts +2 -2
- package/dist/daemon/types.d.ts +4 -2
- package/dist/daemon/workflow-runner.js +2 -1
- package/dist/manifest.json +47 -47
- package/docs/ideas/backlog.md +35 -491
- package/package.json +1 -1
package/docs/ideas/backlog.md
CHANGED
@@ -55,6 +55,7 @@ No proposed solutions here -- just the problem.]
 **Rules for writing entries:**
 - **State the problem, not the solution.** "There is no way to invoke a routine directly" not "We should add a `worktrain invoke` command."
 - **No steering.** Don't tell future implementers how to build it. Capture what needs to exist, not how to make it exist.
+- **Solutions belong in "Things to hash out", not in the problem description.** If you find yourself writing "the coordinator should..." or "a script that..." in the problem body, move it to a hash-out question instead. You may mention a possible direction in a hash-out question, but frame it as an untested candidate -- not a decision.
 - **Things to hash out = genuine open questions.** Only include questions that actually need to be answered before design can start. If you know the answer, state it in the problem description.
 - **Relationships matter.** If this item depends on another, or would be superseded by another, name it explicitly.
 - **Be specific about what "done" looks like** when it's not obvious -- e.g. "done means an operator can invoke any routine by name from the CLI without writing a workflow."
@@ -152,25 +153,6 @@ Issue #241 (TTL eviction across multiple files + new tests) was classified as Sm
 
 ---
 
-### `worktrain doctor`: typed service config audit and auto-repair (May 7, 2026)
-
-**Status: idea** | Priority: high
-
-**Score: 11** | Cor:2 Cap:2 Eff:2 Lev:3 Con:2 | Blocked: no
-
-There is no command that audits whether the WorkTrain daemon is correctly configured and healthy before an operator relies on it for overnight work. Config problems (stale binary, wrong launchd plist path, missing env vars, port mismatch, bad model override) surface as silent failures at runtime -- often at 3am. The operator has no way to proactively detect or repair them.
-
-OpenClaw ships a `SERVICE_AUDIT_CODES` system (typed issue codes: `gatewayCommandMissing`, `gatewayEntrypointMismatch`, `launchdKeepAlive`, `gatewayRuntimeBun`, etc.) with a `doctor --fix` command that auto-repairs the most common issues. A `worktrain doctor` command with the same pattern -- audit returns typed `ServiceConfigIssue[]`, severity `warning | error`, suggested fix per code -- would surface config problems before they become silent 3am failures.
-
-This item is distinct from the "daemon --start reports success on crash" fix (PR #898, done): that fix verifies liveness after start; doctor would verify config correctness before start and at any time.
-
-**Things to hash out:**
-- Which audit codes are most valuable for Phase 1? Candidates: stale binary (binary mtime check), missing ANTHROPIC_API_KEY, triggers.yml parse error, launchd plist mismatch (plist path points to wrong binary), port conflict.
-- Should `doctor --fix` auto-repair or only print instructions? Auto-repair for simple cases (rewrite plist path), print instructions for secrets.
-- Where does this live -- `worktrain doctor` as a new subcommand, or integrated into `worktrain daemon --status`?
-
----
-
 ### Daemon binary stale after rebuild, no indication to user
 
 **Status: ux gap** | Priority: medium
@@ -211,137 +193,42 @@ The delivery pipeline was extracted into `delivery-pipeline.ts` with explicit st
 
 ## WorkTrain Daemon
 
-###
-
-**Status: done** | Shipped in PR #946 (fix/etienneb/context-injection-bugs, auto-merge enabled)
-
-**Score: 13** | Cor:3 Cap:1 Eff:3 Lev:3 Con:3 | Blocked: no
-
-All three bugs fixed. `WorkflowContextSlots` typed interface + `extractContextSlots()` introduced in `src/daemon/types.ts`. `buildSystemPrompt` refactored to pipeline of pure section functions. `truncateToByteLimit` uses Buffer/surrogate-safe walk-back.
-
----
-
-### Universal context enricher for all session entry points (Apr 30, 2026)
-
-**Status: done** | Shipped in PR #947 (feat/etienneb/workflow-enricher, auto-merge enabled, depends on #946)
-
-**Score: 11** | Cor:1 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
-
-`WorkflowEnricher` service in `src/daemon/workflow-enricher.ts`. Fires for root sessions (`spawnDepth === 0`) inside `runWorkflow()` before `buildPreAgentSession()`. `PriorNotesPolicy` discriminated type controls notes injection. 1s timeout with partial fallback on `listRecentSessions`. `EnricherResult` threaded as typed value through call chain -- trigger never mutated. All 6 entry points covered.
-
-**Pilot test gate still pending:** before declaring full success, verify agents reference prior notes in turn-1 reasoning in at least one real session.
-
----
-
-### Unified daemon event schema: merge daemon event log and v2 session store into one trace format (May 7, 2026)
-
-**Status: idea** | Priority: high
-
-**Score: 11** | Cor:1 Cap:2 Eff:2 Lev:3 Con:2 | Blocked: no
-
-WorkTrain writes two separate event stores: the daemon event log at `~/.workrail/events/daemon/<date>.jsonl` (tool calls, session lifecycle, trigger fired) and the v2 session event store at `~/.workrail/data/sessions/<id>/`. Every console feature that needs both structured session data and daemon-layer events (goal text, trigger provenance, tool call history) must bridge two storage formats. The status briefing discovery doc explicitly flagged this as a blocking split: "Two storage systems: Daemon event log vs per-session store. Bridging them requires either reading two systems or accepting one system's incomplete picture."
-
-OpenClaw ships a unified `TrajectoryEvent` schema (`traceId`, `source: "runtime" | "transcript" | "export"`, `type`, `ts`, `seq`, `sessionId`, `runId`, `workspaceDir`, `provider`, `modelId`) covering all event kinds from both sources. A single schema means tooling (console, export, replay, search) is built once.
-
-The migration path does not require replacing either store immediately -- it can start by making the daemon event log speak the same schema fields, so the console can query either source without a bridge layer.
-
-**Things to hash out:**
-- Phase 1 scope: unify schema fields only (no storage migration), or actually consolidate to one storage backend?
-- Does the v2 engine event format need to change, or can the daemon event log adopt v2-compatible fields without touching the engine?
-- What is the correct `source` discriminant for WorkTrain? Candidates: `daemon` (trigger/session lifecycle), `engine` (step advances, token ops), `agent` (tool calls, LLM turns).
-
----
-
-### Pluggable context assembly: replace hardcoded `buildSystemPrompt()` with an injectable interface (May 7, 2026)
+### Large-comment smell detection during implementation (May 8, 2026)
 
 **Status: idea** | Priority: medium
 
-**Score: 9** | Cor:
-
-WorkTrain's context injection is a hardcoded pipeline of pure section functions in `buildSystemPrompt()` (`sectionWorktreeScope`, `sectionWorkspaceContext`, `sectionAssembledContext`, `sectionPriorWorkspaceNotes`, `sectionChangedFiles`, `sectionReferenceUrls`). This works for the current fixed context set, but cannot support: (a) dynamic token-budget management (truncation today is a hard 8KB ceiling on assembled context with no compaction), (b) retrieval-augmented context injection (pull relevant prior session notes by semantic similarity rather than recency), (c) per-workflow context policies (a discovery session needs different context than an implementation session). Adding any of these requires modifying `buildSystemPrompt()` directly rather than composing a new context strategy.
-
-OpenClaw ships a `ContextEngine` interface (`assemble(params: { tokenBudget, availableTools, model, prompt }) => Promise<{ messages, estimatedTokens, systemPromptAddition }>`, `compact()`, `maintain()` with background/foreground mode, `ingest()`, `rewriteTranscriptEntries()` via runtime callback) that is fully pluggable. Any context strategy -- windowed recency, semantic retrieval, summary-based compaction -- can be implemented behind the interface without touching the agent loop.
+**Score: 9** | Cor:2 Cap:2 Eff:2 Lev:2 Con:2 | Blocked: no
 
-
+When an implementing agent adds a large comment to explain a code decision, that comment is often a signal that the architecture is wrong -- the agent is explaining a workaround rather than fixing the underlying constraint. There is currently no mechanism in the pipeline to detect this pattern and force the agent to investigate whether the comment reflects a design problem. Agents tend to either leave large comments in place or delete them silently; neither response surfaces the underlying architectural question.
 
 **Things to hash out:**
-- What
--
--
-
-
-
-### Add runId, provider, and modelId to DaemonEvent (May 7, 2026)
-
-**Status: idea** | Priority: high
-
-**Score: 10** | Cor:1 Cap:2 Eff:3 Lev:2 Con:3 | Blocked: no
-
-The console correlates daemon event log entries with v2 session store entries via `workrailSessionId`, but that field is only available after the continueToken is decoded (~50ms after session start). Events emitted before decode (notably `session_started`) have no `workrailSessionId`, creating a gap where the console cannot link early lifecycle events to the correct session.
-
-Adding `runId` (set to the process-local `sessionId` UUID at the top of `runWorkflow()`) as an optional field on all per-session DaemonEvent interfaces closes this gap without migration: `runId` is available immediately, constant for the session's lifetime, and already threaded through `buildPreAgentSession()`. Adding `provider` and `modelId` at the same time gives the event log the model attribution that OpenClaw's trajectory schema has and WorkTrain's currently lacks.
-
-Pattern source: OpenClaw `src/trajectory/types.ts` `TrajectoryEvent.runId` / `.provider` / `.modelId`.
-
-**Philosophy note:** `runId` should be a branded type (`type RunId = string & { readonly _brand: 'RunId' }`) so it cannot be accidentally swapped with `sessionId` or `workrailSessionId` at call sites -- explicit domain types over primitives. All three fields should be optional in the event union (additive, backward-compatible) but required in a separate `SessionEventContext` helper type that is constructed once per session and passed through explicitly -- single source of state truth, no scattered `runId` assignments.
-
-**Done looks like:** All per-session event interfaces (`SessionStartedEvent`, `ToolCalledEvent`, `StepAdvancedEvent`, etc.) have an optional `runId?: string` field. `runWorkflow()` assigns `runId = sessionId` and passes it wherever `workrailSessionId` is currently passed. `provider` and `modelId` are populated at session start from `buildAgentClient()`.
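The branded-type and `SessionEventContext` ideas in the philosophy note above can be sketched as follows; all names here are illustrative stand-ins for the proposed WorkTrain types, not existing code:

```typescript
// Branded string type: structurally still a string, but not assignable from a
// plain string (or another branded id) without an explicit cast helper.
type RunId = string & { readonly _brand: 'RunId' };

const asRunId = (id: string): RunId => id as RunId;

// Constructed once per session; fields are required here...
interface SessionEventContext {
  readonly runId: RunId;
  readonly provider: string;
  readonly modelId: string;
}

// ...but optional on the event interfaces, keeping the change additive.
interface SessionStartedEvent {
  readonly type: 'session_started';
  readonly runId?: RunId;
  readonly provider?: string;
  readonly modelId?: string;
}

// Events spread the context in, so there are no scattered runId assignments.
function makeSessionStarted(ctx: SessionEventContext): SessionStartedEvent {
  return { type: 'session_started', ...ctx };
}
```

Because `RunId` is branded, passing a bare `sessionId` string where a `RunId` is expected is a compile error unless it goes through `asRunId`.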
-
----
-
-### QueuedFileWriter for DaemonEventEmitter (May 7, 2026)
-
-**Status: idea** | Priority: medium
-
-**Score: 9** | Cor:2 Cap:1 Eff:3 Lev:1 Con:3 | Blocked: no
-
-`DaemonEventEmitter._append()` calls `fs.appendFile()` concurrently. Under burst writes (a turn with many tool calls emitting `tool_call_started`, `tool_call_completed` pairs in rapid succession), concurrent appends can interleave JSONL lines, producing a corrupt log that `JSON.parse()` fails on. The failure is silent -- `emit()` is fire-and-forget and errors are swallowed.
-
-The fix is a per-file promise chain: `this._writers.set(path, (this._writers.get(path) ?? Promise.resolve()).then(() => fs.appendFile(...)))`. Each write chains onto the previous write for the same file, serializing them without a mutex. ~10-line change.
-
-Pattern source: OpenClaw `src/trajectory/runtime.ts` `QueuedFileWriter` per-session writer map.
-
-**Philosophy note:** The promise-chain approach is the right fix over a mutex or a lock file: it is purely functional, uses no shared mutable state beyond the Map, and composes cleanly with the existing fire-and-forget `emit()` contract. The Map entry should be cleaned up when the promise resolves to prevent unbounded accumulation -- determinism over cleverness, no hidden state.
-
-**Done looks like:** `DaemonEventEmitter._append()` uses a `Map<string, Promise<void>>` to serialize writes per file path. Existing tests pass. A new test asserts that 50 concurrent emits produce 50 valid JSONL lines in the correct order.
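The per-file promise chain described in that entry can be sketched like this. An in-memory sink stands in for `fs.appendFile` so the serialization property is easy to see, and the Map cleanup the philosophy note asks for is included; the class and method names are illustrative, not WorkTrain's actual `DaemonEventEmitter`:

```typescript
// Serializes appends per path by chaining each write onto the previous one.
class QueuedAppender {
  private readonly writers = new Map<string, Promise<void>>();
  readonly lines = new Map<string, string[]>(); // in-memory stand-in for the JSONL file

  append(path: string, line: string): Promise<void> {
    const prev = this.writers.get(path) ?? Promise.resolve();
    const next = prev.then(async () => {
      await Promise.resolve(); // stand-in for `await fs.appendFile(path, line + '\n')`
      const bucket = this.lines.get(path) ?? [];
      bucket.push(line);
      this.lines.set(path, bucket);
    });
    // Drop the Map entry once this chain settles, so entries don't accumulate.
    const chained: Promise<void> = next.finally(() => {
      if (this.writers.get(path) === chained) this.writers.delete(path);
    });
    this.writers.set(path, chained);
    return next;
  }
}
```

Fifty concurrent `append` calls to the same path land as fifty whole lines in call order, which is the property the proposed test would assert.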
+- What heuristic defines a "large" comment worth flagging? The right threshold is unknown -- too sensitive produces noise, too coarse misses real smells.
+- Where in the pipeline should detection happen -- during implementation, post-implementation, or as part of review?
+- Who is responsible for detection: the agent itself, the coordinator, or the reviewer? Each has different visibility and different trust levels.
+- What is the right response when a smell is detected -- inject an extra step, block advancement, emit a signal, or something else entirely? One rough candidate: a coordinator-side script that diffs the working tree, scans for large newly-added comment blocks, and injects an additional verification step into the active session when it finds them -- but this is completely untested thinking, take with a large grain of salt.
+- How does the agent distinguish a comment explaining a non-obvious invariant (legitimate) from one explaining a workaround (smell)? This may require LLM judgment, not just pattern matching.
 
 ---
 
-###
-
-**Status: idea** | Priority: medium
-
-**Score: 9** | Cor:2 Cap:1 Eff:3 Lev:1 Con:3 | Blocked: no
-
-`workflowId` and `triggerId` values from config are validated at parse time for format correctness, but not at file-path construction time. If a malformed value slipped through (e.g. `wr/../../../etc/passwd`), file operations using those values as path segments could escape the intended directory. WorkTrain's `sessionId` values are `randomUUID()` (safe), but `workflowId` and `triggerId` are operator-supplied strings.
-
-The pattern: `assertSafeFileSegment(id: string): string` rejects `/`, `\`, and null bytes and returns the sanitized id or throws. Then `isPathInside(safeDir, resolvedPath)` as a second structural containment check. Also: add `fs.chmod(filePath, 0o600)` to sidecar writes in `persistTokens()` so session recovery files are not world-readable.
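A minimal sketch of that two-layer check, using Node's `path` module; the function bodies are assumptions about the described behavior, not the OpenClaw implementation:

```typescript
import * as path from 'node:path';

// Layer 1: reject ids that could act as path traversal when used as a segment.
function assertSafeFileSegment(id: string): string {
  if (id === '' || id === '.' || id === '..' ||
      id.includes('/') || id.includes('\\') || id.includes('\0')) {
    throw new Error(`unsafe file segment: ${JSON.stringify(id)}`);
  }
  return id;
}

// Layer 2: structural containment check on the fully resolved path.
function isPathInside(safeDir: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(safeDir), path.resolve(candidate));
  return rel !== '' && !rel.startsWith('..') && !path.isAbsolute(rel);
}
```

The two layers are deliberately redundant: even if a malformed segment slips past the first check, the resolved path still has to sit inside the expected directory.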
+### Context injection bugs: double-injection, byte-slice truncation, workspaceRules[0] drop (Apr 30, 2026)
 
-
+**Status: done** | Shipped in PR #946 (fix/etienneb/context-injection-bugs, auto-merge enabled)
 
-**
+**Score: 13** | Cor:3 Cap:1 Eff:3 Lev:3 Con:3 | Blocked: no
 
-
+All three bugs fixed. `WorkflowContextSlots` typed interface + `extractContextSlots()` introduced in `src/daemon/types.ts`. `buildSystemPrompt` refactored to pipeline of pure section functions. `truncateToByteLimit` uses Buffer/surrogate-safe walk-back.
 
 ---
 
-###
-
-**Status: idea** | Priority: medium
-
-**Score: 8** | Cor:1 Cap:2 Eff:2 Lev:1 Con:2 | Blocked: no
-
-WorkTrain's `spawn_agent` tool allows a parent session to spawn any `workflowId` without restriction. There is no mechanism for an operator to limit which child workflows a given trigger's sessions may delegate to. A misconfigured or misbehaving agent could spawn arbitrary long-running sessions, consuming queue slots and API budget.
+### Universal context enricher for all session entry points (Apr 30, 2026)
 
-
+**Status: done** | Shipped in PR #947 (feat/etienneb/workflow-enricher, auto-merge enabled, depends on #946)
 
-
+**Score: 11** | Cor:1 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
 
-
+`WorkflowEnricher` service in `src/daemon/workflow-enricher.ts`. Fires for root sessions (`spawnDepth === 0`) inside `runWorkflow()` before `buildPreAgentSession()`. `PriorNotesPolicy` discriminated type controls notes injection. 1s timeout with partial fallback on `listRecentSessions`. `EnricherResult` threaded as typed value through call chain -- trigger never mutated. All 6 entry points covered.
 
-**
--- Should this be on `TriggerDefinition` (per-trigger restriction) or on `agentConfig` (inheritable by child sessions)? Per-trigger is simpler; agentConfig inheritance is more flexible.
--- Default when absent: `'*'` (current behavior, no restriction) or an explicit opt-in? A safe default would restrict to the same `workflowId` as the parent, but that would break coordinator patterns that intentionally spawn different workflows.
+**Pilot test gate still pending:** before declaring full success, verify agents reference prior notes in turn-1 reasoning in at least one real session.
 
 ---
 
@@ -1551,143 +1438,6 @@ Essential before WorkTrain manages more than 2-3 repos.
 
 ---
 
-### Self-improvement loop MVP: WorkTrain picks up and ships workrail issues end-to-end (May 8, 2026)
-
-**Status: idea** | Priority: high
-
-**Score: 13** | Cor:3 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: yes (blocked by: convention enforcement in review, scope filter at dispatch, protected file gate, interpretation checkpoint operator approval, verification agent, resolver agent)
-
-The self-improvement loop is the vision's north star and the primary test of whether WorkTrain works. This item defines the minimum viable version: WorkTrain picks up a labeled workrail issue, runs the full pipeline, and produces a correct, convention-compliant, reviewed PR -- without the operator intervening between phases except at the interpretation checkpoint.
-
-The quality bar is non-negotiable: WorkTrain produces exemplary code that passes the same review standard it applies to others. Every finding gets fixed before merge, regardless of severity. No "we'll note it and move on."
-
----
-
-**Full gate sequence:**
-
-```
-Issue labeled worktrain:ready on workrail repo
-  ↓
-[Gate 0: Scope filter] -- coordinator checks issue is safe to dispatch
-  PASS: single subsystem, no protected files, backlog Effort:3 or less
-  FAIL: comment on issue, remove label, do not dispatch
-  (pure TypeScript, no LLM, reads issue body + changed-files prediction)
-  ↓
-Adaptive coordinator: classify and select pipeline mode
-  (implement for scoped bugs/features, full for anything needing discovery)
-  ↓
-Discovery + shaping phases (if full pipeline)
-  Shaping output becomes the verifier's specification of "done"
-  ↓
-[Gate 1: Interpretation checkpoint -- ALWAYS requires operator approval]
-  Operator approves: coding begins
-  Operator edits: revised interpretation injected as coding context
-  Operator rejects: issue returned to queue, label removed
-  NOTE: no auto_confirm for the self-improvement loop, ever
-  ↓
-Coding phase (isolated worktree, branchStrategy: 'worktree')
-  ↓
-[Gate 2: Protected file check]
-  Checks git diff for: daemon-soul.md, triggers.yml,
-  src/v2/durable-core/ HMAC layer, docs/design/v2-core-design-locks.md,
-  src/daemon/ session lifecycle core
-  ANY HIT → stop immediately, escalate to operator, do not open PR
-  (deterministic script, no LLM)
-  ↓
-PR opened automatically
-  ↓
-[Gate 3: CI]
-  PASS → continue
-  FAIL → WorkTrain reads CI output, attempts one targeted fix,
-    pushes to same branch, re-runs CI
-  STILL FAILING → escalate to operator, do not proceed
-  ↓
-[Gate 4: Verification agent] -- independent QA agent, adversarial stance
-  Fed: original issue + shaped pitch (if exists) + implementation diff
-  Tools: Read, Glob, Grep, constrained Bash (npx vitest, git diff, git show only)
-  Job: prove each requirement is met with explicit evidence
-
-  VerificationRecord output contract:
-    requirementsCoveredCount: number
-    requirementsTotal: number
-    evidencePerRequirement: Array<{
-      requirement: string
-      evidence: string // test output, grep result, etc.
-      confident: boolean
-    }>
-    gapsFound: ReadonlyArray<string> // asked for, not implemented
-    unexpectedScope: ReadonlyArray<string> // implemented, not asked for
-    verdict: 'approved' | 'gaps_found' | 'disputed' | 'uncertain'
-
-  'approved' → proceed to review
-  'gaps_found' → back to coding agent with gaps as context (one retry only)
-  'uncertain' → escalate to operator
-  'disputed' → Resolver agent (see below)
-  ↓
-[Gate 4b: Resolver agent -- fires only on 'disputed']
-  Third independent agent, no stake in either position
-  Fed: original requirement text, coding agent's implementation rationale
-    (from session notes), verifier's specific objection + evidence
-  Tools: same constrained Bash as verifier
-  Job: determine ground truth -- does the code satisfy the requirement?
-
-  Resolver verdict (binding -- coding agent cannot argue back):
-    'satisfied' → proceed to review
-    'not_satisfied' → back to coding agent with resolver's rationale (final)
-    'requirement_ambiguous' → escalate to operator with:
-      original requirement + both agents' positions + resolver's analysis
-      + suggested clarification for the operator to add to the issue
-
-  NOTE: ACP (agent-to-agent real-time messaging) is a future enhancement
-  to this gate. When ACP ships, verifier and resolver could exchange
-  structured messages through the coordinator rather than using a batch
-  three-agent panel. The coordinator-mediated panel is the MVP approach.
-  ↓
-[Gate 5: wr.mr-review -- calibrated for workrail]
-  Loaded with: coding philosophy principles, daemon invariants doc,
-  design locks, commit message rules, neverthrow/assertNever conventions
-  ALL findings fixed before proceeding, regardless of severity
-  WorkTrain fixes, re-reviews until clean -- no threshold, no "note it"
-  ↓
-Auto-merge (squash, delete worktree)
-  ↓
-Issue closed, backlog item marked done
-```
-
----
-
-**What needs to be built (in dependency order):**
-
-1. **Convention enforcement in wr.mr-review for workrail** -- workspace context that injects coding philosophy, design locks, and daemon invariants as explicit review criteria. Without this, Gate 5 is generic and toothless.
-
-2. **Scope filter at dispatch (Gate 0)** -- pure TypeScript coordinator check. Refuses dispatch if: protected files predicted in scope, issue touches multiple subsystems, issue is architectural. Reads issue body + label set.
-
-3. **Protected file gate (Gate 2)** -- post-coding delivery-layer script, runs `git diff --name-only` against a blocklist. Hard stop, no retry.
-
-4. **Interpretation checkpoint wired to operator approval (Gate 1)** -- in daemon sessions on the workrail repo, the interpretation checkpoint must never auto-confirm. Coordinator sets `requireInterpretationApproval: true` for this trigger.
-
-5. **Verification agent (Gate 4)** -- new agent role with dedicated system prompt (adversarial QA stance), constrained Bash tool variant, and `VerificationRecord` output contract enforced by the engine.
-
-6. **Resolver agent (Gate 4b)** -- new agent role, binding verdict, same constrained Bash. Coordinator spawns only on `disputed` verdict.
-
-7. **CI failure one-retry loop (Gate 3)** -- coordinator reads CI status via `gh`, spawns a targeted-fix session if failing, re-polls.
-
----
-
-**Spawn depth:** The full gate sequence uses coordinator → coding → verifier → (if disputed) resolver. That's depth 3, hitting the current default `maxSubagentDepth: 3`. The workrail self-improvement trigger needs `maxSubagentDepth: 4` in `triggers.yml`.
-
-**MVP issue scope:** Start with issues that are: scoped to one file or directory, have Effort:3 in the backlog (hours to a day), and don't touch `src/v2/durable-core/` or `src/daemon/` session lifecycle. Good candidates: infra utilities, CLI commands, test coverage, observability additions.
-
-**Trust ramp:** For the first 10 issues, operator reviews the interpretation checkpoint output in full before approving. After 10 clean runs (no gaps found by verifier, no resolver invocations), interpretation checkpoint can be reviewed asynchronously (operator approves via `worktrain inbox`). After 20 clean runs, the gate timing can be relaxed further. The loop tightens based on track record, not a timer.
-
-**Things to hash out:**
-- The constrained Bash tool for the verifier/resolver is a new tool variant not yet in the codebase -- `makeConstrainedBashTool(allowedPrefixes: readonly string[])`. Where does it live and how is it enforced?
-- `requireInterpretationApproval: true` is not a current field on `TriggerDefinition`. Does it go on `agentConfig` or as a top-level trigger field?
-- How does the operator approve/reject at Gate 1 in practice? Via `worktrain inbox`? Via console? This needs to work reliably at 3am when the operator isn't watching.
-- The one-retry CI fix (Gate 3) spawns a child session -- that child needs access to CI failure output. Does the coordinator fetch it via `gh` and inject it as context, or does the child agent fetch it itself?
-
----
-
 ### Demo repo feedback loop: WorkTrain improves itself via real task execution (Apr 20, 2026)
 
 **Status: idea** | Priority: high
@@ -2259,80 +2009,6 @@ This is already how mid-run resume works. The same mechanism extends naturally t
 
 ---
 
-### `withTimeout` + `withRetry` as first-class async boundary utilities (May 7, 2026)
-
-**Status: idea** | Priority: medium
-
-**Score: 10** | Cor:2 Cap:1 Eff:3 Lev:2 Con:3 | Blocked: no
-
-WorkTrain's daemon has no composable timeout or retry primitives. `AgentLoop` has a stall timer wired in internally; `PollingScheduler` has no error backoff; `startup-recovery.ts` makes one attempt per session with no retry. The vision says "cancellation/timeouts are first-class" and "overnight-safe" -- but the infrastructure for that is scattered or absent.
-
-The pattern (from `etienne-clone/src/types.ts`):
-
-```typescript
-withTimeout<T>(fn: (signal: AbortSignal) => Promise<T>, ms: number, label: string): Promise<Result<T, TimeoutError>>
-withRetry<T, E>(fn: () => Promise<Result<T, E>>, config: RetryConfig): Promise<Result<T, E>>
-```
-
-`RetryConfig` has `retryOn: (error: unknown) => boolean` -- the caller decides what's retryable, not the primitive. `withTimeout` threads `AbortSignal` into the function so cancellation propagates correctly. Both return `Result` types, never throw.
-
-**Adaptation note:** etienne-clone rolls its own `Result<T,E>`; WorkTrain uses `neverthrow`. The `RetryConfig` shape and `retryOn` predicate are directly portable. The function bodies need to be rewritten against `ResultAsync` from neverthrow rather than copied verbatim.
-
-**Philosophy note:** "Higher-order functions as a tool" -- retry and timeout are cross-cutting behaviors that should be composed around functions, not scattered across call sites. "Cancellation/timeouts are first-class" is a stated coding principle. These primitives make it structurally impossible to call an async boundary without deciding upfront whether it can timeout or retry.
-
-**Done looks like:** `src/infra/async-boundaries.ts` exports `withTimeout()` and `withRetry()` using neverthrow `ResultAsync`. Used at: coordinator `callbackUrl` POST retries, polling error recovery, startup-recovery rehydrate attempts.
2284
|
-
|
|
2285
|
-
---
|
|
2286
|
-
|
|
2287
|
-
### `OrchestratorWorkflowAvailability` pattern: make missing-workflow states unrepresentable (May 7, 2026)
|
|
2288
|
-
|
|
2289
|
-
**Status: idea** | Priority: medium
|
|
2290
|
-
|
|
2291
|
-
**Score: 9** | Cor:2 Cap:1 Eff:3 Lev:1 Con:3 | Blocked: no
|
|
2292
|
-
|
|
2293
|
-
WorkTrain's adaptive coordinator checks whether a requested workflow exists at dispatch time, but the check is implicit -- a missing workflow ID returns `workflow_not_found` from the engine at runtime, mid-session, after a session slot has been consumed. There is no compile-time or startup-time guarantee that the coordinator cannot route to a non-existent workflow.
|
|
2294
|
-
|
|
2295
|
-
The pattern (from `etienne-clone/src/pipeline/orchestrator-workflow-selection.ts`):
|
|
2296
|
-
|
|
2297
|
-
```typescript
|
|
2298
|
-
type OrchestratorWorkflowAvailability =
|
|
2299
|
-
| { kind: 'standard_only' }
|
|
2300
|
-
| { kind: 'standard_and_focused' }
|
|
2301
|
-
|
|
2302
|
-
assessAvailableOrchestratorWorkflows(
|
|
2303
|
-
availableWorkflowIds: readonly string[],
|
|
2304
|
-
workflowIds: OrchestratorWorkflowIds,
|
|
2305
|
-
): Result<OrchestratorWorkflowAvailability, string>
|
|
2306
|
-
```
|
|
2307
|
-
|
|
2308
|
-
A state in which the standard workflow is missing cannot be constructed -- `assessAvailableOrchestratorWorkflows` returns `err` if the standard workflow is absent. The coordinator only ever holds a valid `OrchestratorWorkflowAvailability` value, so its dispatch logic cannot route to a missing workflow.
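A minimal sketch of the assessment function, using a local `Result` helper and an assumed `OrchestratorWorkflowIds` shape with `standard` and `focused` ids:

```typescript
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

type OrchestratorWorkflowAvailability =
  | { readonly kind: 'standard_only' }
  | { readonly kind: 'standard_and_focused' };

// Assumed shape -- the real one lives in etienne-clone.
interface OrchestratorWorkflowIds {
  readonly standard: string;
  readonly focused: string;
}

function assessAvailableOrchestratorWorkflows(
  availableWorkflowIds: readonly string[],
  workflowIds: OrchestratorWorkflowIds,
): Result<OrchestratorWorkflowAvailability, string> {
  // The missing-standard state is an `err`, not a union variant:
  // downstream code can never hold it.
  if (!availableWorkflowIds.includes(workflowIds.standard)) {
    return err(`standard workflow '${workflowIds.standard}' is not registered`);
  }
  const availability: OrchestratorWorkflowAvailability =
    availableWorkflowIds.includes(workflowIds.focused)
      ? { kind: 'standard_and_focused' }
      : { kind: 'standard_only' };
  return ok(availability);
}
```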
|
|
2309
|
-
|
|
2310
|
-
**Philosophy note:** "Make illegal states unrepresentable" -- the type system enforces that routing only happens after availability is confirmed. A bare string `workflowId` can point to anything; a typed `WorkflowAvailability` discriminated union can only be constructed when the workflows actually exist. This is the difference between a label and a constraint.
|
|
2311
|
-
|
|
2312
|
-
**Done looks like:** WorkTrain's adaptive coordinator calls `assessAvailableWorkflows(ctx, workflowIds)` at startup or pre-dispatch. Returns `Result<WorkflowAvailability, string>`. The dispatch function takes `WorkflowAvailability` as a parameter -- it is structurally impossible to call without first confirming availability.
|
|
2313
|
-
|
|
2314
|
-
---
|
|
2315
|
-
|
|
2316
|
-
### Per-session cost tracking: estimated spend visible in execution stats and console (May 7, 2026)
|
|
2317
|
-
|
|
2318
|
-
**Status: idea** | Priority: medium
|
|
2319
|
-
|
|
2320
|
-
**Score: 8** | Cor:1 Cap:2 Eff:3 Lev:1 Con:3 | Blocked: no
|
|
2321
|
-
|
|
2322
|
-
WorkTrain records `inputTokens` and `outputTokens` per session in `LlmTurnCompletedEvent` and step-level metrics, but never converts them to an estimated dollar cost. Operators have no way to see how much a session cost, which workflows are expensive, or when a stuck session has burned a disproportionate budget.
|
|
2323
|
-
|
|
2324
|
-
The pattern (from `etienne-clone/src/observability/cost.ts`): `estimateCost(modelId, usage, env)` is a pure function that returns `Result<number, CostEstimationError>`. Pricing is overridable via `LLM_INPUT_COST_PER_1M` and `LLM_OUTPUT_COST_PER_1M` env vars (injected for testability). Token counts are already in `LlmTurnCompletedEvent` and the v2 session store.
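A self-contained sketch of what a ported `estimateCost` could look like. The pricing table, model id, and error variants are placeholders, and a local `Result` stand-in substitutes for neverthrow:

```typescript
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

type EnvRecord = Readonly<Record<string, string | undefined>>;

type CostEstimationError =
  | { readonly kind: 'unknown_model'; readonly modelId: string }
  | { readonly kind: 'invalid_override'; readonly envVar: string };

interface TokenUsage { readonly inputTokens: number; readonly outputTokens: number }

// Placeholder pricing table -- model ids and dollar figures are illustrative only.
const DEFAULT_PRICING: Readonly<Record<string, { inPer1M: number; outPer1M: number }>> = {
  'example-model': { inPer1M: 3, outPer1M: 15 },
};

function estimateCost(
  modelId: string,
  usage: TokenUsage,
  env: EnvRecord = process.env,
): Result<number, CostEstimationError> {
  const override = (envVar: string): Result<number | undefined, CostEstimationError> => {
    const raw = env[envVar];
    if (raw === undefined) return ok(undefined);
    const parsed = Number(raw);
    return Number.isFinite(parsed) && parsed >= 0
      ? ok(parsed)
      : err({ kind: 'invalid_override' as const, envVar });
  };
  const input = override('LLM_INPUT_COST_PER_1M');
  if (!input.ok) return input;
  const output = override('LLM_OUTPUT_COST_PER_1M');
  if (!output.ok) return output;
  const table = DEFAULT_PRICING[modelId];
  const inPer1M = input.value ?? table?.inPer1M;
  const outPer1M = output.value ?? table?.outPer1M;
  if (inPer1M === undefined || outPer1M === undefined) {
    return err({ kind: 'unknown_model' as const, modelId });
  }
  return ok((usage.inputTokens * inPer1M + usage.outputTokens * outPer1M) / 1_000_000);
}
```

Because `env` is injected, tests pass a literal record instead of touching `process.env`.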
|
|
2325
|
-
|
|
2326
|
-
**Philosophy note:** "Observability as a constraint" -- cost is a first-class observable dimension of a session, not a post-hoc calculation. The env-injection pattern (`env: EnvRecord = process.env`) is already how WorkTrain tests env-dependent code. The function is pure and trivially testable.
|
|
2327
|
-
|
|
2328
|
-
**Done looks like:** `src/observability/cost.ts` (port from etienne-clone, adapting to WorkTrain's model IDs). `estimatedCostUsd` added to `execution-stats.jsonl` rows and `SessionCompletedEvent`. Console session detail shows cost. Alert threshold emits `orchestrator_review_warning`-equivalent event when a session exceeds a configured per-workflow cost cap.
|
|
2329
|
-
|
|
2330
|
-
**Things to hash out:**
|
|
2331
|
-
- Should cost alerts trigger escalation (surface to operator) or just log? A stuck session burning $5 should probably escalate.
|
|
2332
|
-
- Should per-workflow cost caps live in `TriggerDefinition.agentConfig` or in a separate `costPolicy` block?
|
|
2333
|
-
|
|
2334
|
-
---
|
|
2335
|
-
|
|
2336
2012
|
### Extensible output contract registration: coordinator-owned schemas, engine-enforced (Apr 30, 2026)
|
|
2337
2013
|
|
|
2338
2014
|
**Status: idea** | Priority: medium
|
|
@@ -2457,155 +2133,6 @@ The problem is not just "add an LLM to make the decision." An LLM making approva
|
|
|
2457
2133
|
|
|
2458
2134
|
---
|
|
2459
2135
|
|
|
2460
|
-
### Multi-score deterministic workflow routing: replace string-matching coordinator dispatch with typed scoring (May 7, 2026)
|
|
2461
|
-
|
|
2462
|
-
**Status: idea** | Priority: high
|
|
2463
|
-
|
|
2464
|
-
**Score: 12** | Cor:2 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
|
|
2465
|
-
|
|
2466
|
-
WorkTrain's adaptive coordinator selects which pipeline to run (quick_review, review_only, implement, full) based on task content. The current dispatch uses heuristics and LLM-assisted classification. This violates vision principle #1: zero LLM turns for routing. Coordinator decisions must be deterministic TypeScript code, not LLM reasoning.
|
|
2467
|
-
|
|
2468
|
-
The pattern: compute independent typed scores (size, complexity, risk, breadth) over the task's structured metadata, classify the task into a discriminated union of shapes (`isolated_fix`, `small_cohesive_behavior`, `broad_or_risky`, etc.), find hard blockers (conditions that force a specific pipeline regardless of scores), and return a typed `WorkflowAssessment` with the score breakdown, shape, blockers, and a human-readable reason string. No LLM call in the routing path.
|
|
2469
|
-
|
|
2470
|
-
Pattern source: `etienne-clone/src/pipeline/orchestrator-workflow-selection.ts` -- `classifyMrShape()`, `assessWorkflowRoute()`, `chooseOrchestratorWorkflow()`. Also: `OrchestratorWorkflowAvailability` discriminated union ensures the "standard workflow missing" state cannot be constructed -- `assessAvailableOrchestratorWorkflows()` returns `Result<OrchestratorWorkflowAvailability, string>` so callers only hold valid states.
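To make the shape of the routing concrete, a hedged sketch follows. The dimensions, thresholds, and shape-to-pipeline mapping are illustrative only -- the right dimensions for WorkTrain tasks are an open question:

```typescript
// Illustrative scoring dimensions; the real ones are undecided.
interface TaskScores {
  readonly size: number;
  readonly complexity: number;
  readonly risk: number;
  readonly breadth: number;
}

type TaskShape =
  | { readonly kind: 'isolated_fix' }
  | { readonly kind: 'small_cohesive_behavior' }
  | { readonly kind: 'broad_or_risky' };

type Pipeline = 'quick_review' | 'review_only' | 'implement' | 'full';

interface WorkflowAssessment {
  readonly pipeline: Pipeline;
  readonly shape: TaskShape;
  readonly scores: TaskScores;
  readonly blockers: readonly string[];
  readonly reason: string;
}

function assertNever(x: never): never {
  throw new Error(`unhandled shape: ${JSON.stringify(x)}`);
}

function classifyTaskShape(s: TaskScores): TaskShape {
  if (s.risk >= 3 || s.breadth >= 3) return { kind: 'broad_or_risky' };
  if (s.size <= 1 && s.complexity <= 1) return { kind: 'isolated_fix' };
  return { kind: 'small_cohesive_behavior' };
}

function assessWorkflowRoute(scores: TaskScores, blockers: readonly string[]): WorkflowAssessment {
  const shape = classifyTaskShape(scores);
  if (blockers.length > 0) {
    // Hard blockers force a pipeline regardless of scores.
    return { pipeline: 'full', shape, scores, blockers, reason: `blocked: ${blockers.join(', ')}` };
  }
  switch (shape.kind) {
    case 'isolated_fix':
      return { pipeline: 'quick_review', shape, scores, blockers, reason: 'small isolated change' };
    case 'small_cohesive_behavior':
      return { pipeline: 'implement', shape, scores, blockers, reason: 'cohesive mid-size change' };
    case 'broad_or_risky':
      return { pipeline: 'full', shape, scores, blockers, reason: 'broad or risky change' };
    default:
      return assertNever(shape);
  }
}
```

The `assertNever` default arm is what makes the routing refactor-safe: adding a fourth shape fails to compile until every switch handles it.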
|
|
2471
|
-
|
|
2472
|
-
**Philosophy note:** This is the vision's "control flow from data state" principle made concrete: routing decisions derive from an explicit typed state machine over task scores, not from an LLM's implicit reasoning. Exhaustiveness on the shape union (`assertNever` in the confidence function) makes the routing logic refactor-safe. The `WorkflowAssessment` return type (not a bare string) makes every decision traceable without reading session transcripts.
|
|
2473
|
-
|
|
2474
|
-
**Done looks like:** The adaptive coordinator dispatches entirely from a `WorkflowAssessment` value produced by a pure TypeScript function over the task's metadata. No LLM call occurs before `runAdaptivePipeline()` selects a mode. A test can assert the routing decision for any task shape without mocking an LLM.
|
|
2475
|
-
|
|
2476
|
-
**Things to hash out:**
|
|
2477
|
-
- What dimensions make sense for WorkTrain tasks? MR review uses size/cohesion/risk/breadth over file diffs. WorkTrain tasks may need different dimensions (specificity, ambiguity, scope breadth, ticket maturity).
|
|
2478
|
-
- Where does the input data come from? The task candidate (from queue poll) has title, body, labels, issue number. Is that enough to score confidently without an LLM?
|
|
2479
|
-
- Should hard blockers be configurable per-trigger (e.g. "tasks labeled `security` always use the full pipeline") or hardcoded in the assessment function?
|
|
2480
|
-
|
|
2481
|
-
---
|
|
2482
|
-
|
|
2483
|
-
### Required `reason` field on coordinator signals: typed audit trail for headless gate decisions (May 7, 2026)
|
|
2484
|
-
|
|
2485
|
-
**Status: idea** | Priority: high
|
|
2486
|
-
|
|
2487
|
-
**Score: 11** | Cor:2 Cap:2 Eff:3 Lev:2 Con:3 | Blocked: no
|
|
2488
|
-
|
|
2489
|
-
When an agent calls `signal_coordinator` with `kind: 'approval_needed'` or `kind: 'blocked'`, the coordinator receives a signal but not necessarily the agent's reasoning. The coordinator (and operator) must read the full session transcript to understand why the agent requested approval. At scale -- dozens of sessions per day -- transcript reading is impractical. Signals without stated reasoning are unauditable.
|
|
2490
|
-
|
|
2491
|
-
Vision principle: "every decision visible in the session store." A signal that doesn't include the agent's stated reasoning violates this -- the decision (to pause and request approval) is visible, but the reason for it is buried in prose.
|
|
2492
|
-
|
|
2493
|
-
The fix is structural: make `reason` a required field on `approval_needed` and `blocked` signal kinds. The engine or coordinator rejects signals that omit it. This is the same pattern as `SelfConfirmEvent.validationReason` in `etienne-clone/src/types.ts`, where the comment reads: "`validationReason` is REQUIRED -- it's the audit trail for headless gates."
|
|
2494
|
-
|
|
2495
|
-
Pattern source: `etienne-clone/src/types.ts` `SelfConfirmEvent` -- `readonly validationReason: string` required (not optional) on the self-confirm event interface.
|
|
2496
|
-
|
|
2497
|
-
**Philosophy note:** "Errors are data" applies to under-specified signals too. A signal with no stated reason is incomplete data -- the receiver cannot act on it deterministically. Making `reason` required at the schema level turns a runtime ambiguity into a compile-time constraint. Capability-based: the signal type declares what information it carries, enforced by the schema, not by convention.
|
|
2498
|
-
|
|
2499
|
-
**Done looks like:** `CoordinatorSignalKindSchema` for `approval_needed` and `blocked` includes `reason: z.string().min(10)`. The `signal_emitted` daemon event includes the reason. `worktrain inbox` shows it. Coordinators can filter and route based on `reason` text without reading session transcripts.
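What the schema rule enforces, sketched in plain TypeScript so the example is self-contained; field names beyond `kind` and `reason` are assumptions, and the real enforcement would live in the zod schema:

```typescript
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

type GatedSignalKind = 'approval_needed' | 'blocked';

interface GatedSignal {
  readonly kind: GatedSignalKind;
  readonly sessionId: string;
  readonly reason: string; // required, not optional -- the audit trail for headless gates
}

type SignalRejection =
  | { readonly kind: 'reason_missing' }
  | { readonly kind: 'reason_too_short'; readonly minLength: number };

const REASON_MIN_LENGTH = 10; // mirrors z.string().min(10)

function parseGatedSignal(raw: {
  kind: GatedSignalKind;
  sessionId: string;
  reason?: unknown;
}): Result<GatedSignal, SignalRejection> {
  if (typeof raw.reason !== 'string' || raw.reason.length === 0) {
    return err({ kind: 'reason_missing' as const });
  }
  if (raw.reason.length < REASON_MIN_LENGTH) {
    return err({ kind: 'reason_too_short' as const, minLength: REASON_MIN_LENGTH });
  }
  return ok({ kind: raw.kind, sessionId: raw.sessionId, reason: raw.reason });
}
```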
|
|
2500
|
-
|
|
2501
|
-
---
|
|
2502
|
-
|
|
2503
|
-
### Typed finding extraction with multi-strategy fallback: enforce structured output contracts at coordinator boundaries (May 7, 2026)
|
|
2504
|
-
|
|
2505
|
-
**Status: idea** | Priority: high
|
|
2506
|
-
|
|
2507
|
-
**Score: 12** | Cor:3 Cap:2 Eff:2 Lev:2 Con:2 | Blocked: no
|
|
2508
|
-
|
|
2509
|
-
WorkTrain's coordinators read structured phase handoff artifacts (review verdicts, discovery summaries, shaped pitches) from session output. Today, if an agent doesn't produce a clean JSON handoff, the coordinator either fails hard or reads free-text that it can't reliably parse. There is no graceful degradation ladder for malformed or missing structured output.
|
|
2510
|
-
|
|
2511
|
-
Vision principle: "structured outputs at every boundary" and "typed contracts make phases composable." This requires not just that the engine validates artifacts when they're present, but that the coordinator has a systematic recovery strategy when they aren't -- without silently accepting garbage.
|
|
2512
|
-
|
|
2513
|
-
The pattern (from `etienne-clone/src/findings/extract.ts`):
|
|
2514
|
-
1. Try the ideal path: find the typed artifact in the last `complete_step` output
|
|
2515
|
-
2. Fallback: parse a JSON block from the agent's final text response, normalize field names and casing
|
|
2516
|
-
3. Fallback: reconstruct from observable side effects (e.g. tool calls that imply what the agent concluded)
|
|
2517
|
-
4. All paths return `Result<T, ExtractionError>` with a typed error union (`no_handoff_output`, `invalid_json`, `schema_mismatch`) -- never throw, never silently accept
|
|
2518
|
-
|
|
2519
|
-
The normalization layer (`normalizeRawFindingsOutput`) handles the practical reality that agents produce `"APPROVE"` when the schema says `"approve"`, or `reviewFindings` when the field should be `findings`. This is boundary validation done right.
|
|
2520
|
-
|
|
2521
|
-
**Philosophy note:** "Validate at boundaries, trust inside" -- this is exactly the boundary. The coordinator trusts the extracted artifact once it passes validation; it never trusts raw agent output. "Errors are data" -- `ExtractionError` is a typed discriminated union that lets coordinators route on failure kind, not parse error messages.
|
|
2522
|
-
|
|
2523
|
-
**Done looks like:** Every coordinator phase that reads a structured handoff uses an `extractHandoff<T>()` function that applies the three-strategy fallback chain and returns `Result<T, HandoffExtractionError>`. The coordinator handles `err` cases explicitly -- escalate to operator, retry the phase, or degrade gracefully -- rather than crashing or silently accepting bad output.
|
|
2524
|
-
|
|
2525
|
-
**Things to hash out:**
|
|
2526
|
-
- Strategy 3 (reconstruct from tool calls) is highly session-type-specific. Should each coordinator define its own reconstruction strategy, or is there a generic fallback (e.g. "use the last `complete_step` notes as free text")?
|
|
2527
|
-
- Should extraction errors produce a `report_issue` record automatically, or is that the coordinator's responsibility?
|
|
2528
|
-
|
|
2529
|
-
---
|
|
2530
|
-
|
|
2531
|
-
### `InteractionIntent` discriminated union: platform-neutral operator decision model (May 7, 2026)
|
|
2532
|
-
|
|
2533
|
-
**Status: idea** | Priority: high
|
|
2534
|
-
|
|
2535
|
-
**Score: 12** | Cor:2 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
|
|
2536
|
-
|
|
2537
|
-
WorkTrain has no typed model for operator decisions on in-flight pipeline outcomes -- approving a PR, unblocking a stuck session, dropping an escalated finding, confirming an interpretation. Today, operator actions arrive as CLI commands (`worktrain tell`) or console HTTP calls, but the domain has no typed representation of what the operator actually decided. The coordinator cannot route on decision kind without parsing free text.
|
|
2538
|
-
|
|
2539
|
-
The pattern (from `etienne-clone/src/messaging/types.ts`): `InteractionIntent` is an exhaustive discriminated union of everything an operator can decide -- `approve_all`, `skip`, `post_finding`, `drop_finding`, `edit_finding`, `batch_post`, `batch_drop`. Platform adapters (Slack buttons, console HTTP, CLI) translate operator input into these intents. The domain handler switches on them exhaustively with `assertNever`. Neither layer knows about the other.
|
|
2540
|
-
|
|
2541
|
-
This directly addresses WorkTrain's open gap around human-in-the-loop for critical findings and stuck session escalation: the console or CLI emits a typed `OperatorIntent`, the coordinator handles it the same way regardless of channel.
|
|
2542
|
-
|
|
2543
|
-
Pattern source: `etienne-clone/src/messaging/types.ts` `InteractionIntent` + `etienne-clone/src/messaging/review-handler.ts` `createReviewInteractionHandler()`.
|
|
2544
|
-
|
|
2545
|
-
**Philosophy note:** "Make illegal states unrepresentable" -- a bare string `action=approve` can carry anything; a typed `{ kind: 'approve_all'; sessionId }` cannot. "Exhaustiveness everywhere" -- the switch on `intent.kind` with `assertNever` ensures adding a new decision kind forces every handler to be updated. "Capability-based" -- the operator is given exactly the decisions they can make, enforced by the type, not by convention.
|
|
2546
|
-
|
|
2547
|
-
**Done looks like:** `OperatorIntent` discriminated union covers the decisions WorkTrain surfaces to operators: `approve_pr`, `block_pr`, `unblock_session`, `drop_escalation`, `confirm_interpretation`. Console and CLI translate input into `OperatorIntent` values. The coordinator handler switches on them. A test can assert coordinator behavior for any intent without mocking a UI.
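A sketch of the adapted union and its exhaustive handler. The five kinds come from this entry; the payload fields are assumptions:

```typescript
function assertNever(x: never): never {
  throw new Error(`unhandled intent: ${JSON.stringify(x)}`);
}

type OperatorIntent =
  | { readonly kind: 'approve_pr'; readonly sessionId: string }
  | { readonly kind: 'block_pr'; readonly sessionId: string; readonly reason: string }
  | { readonly kind: 'unblock_session'; readonly sessionId: string }
  | { readonly kind: 'drop_escalation'; readonly escalationId: string }
  | { readonly kind: 'confirm_interpretation'; readonly sessionId: string };

// Exhaustive handler: adding a sixth kind fails to compile until handled.
function handleOperatorIntent(intent: OperatorIntent): string {
  switch (intent.kind) {
    case 'approve_pr':
      return `approving PR for session ${intent.sessionId}`;
    case 'block_pr':
      return `blocking PR for session ${intent.sessionId}: ${intent.reason}`;
    case 'unblock_session':
      return `unblocking session ${intent.sessionId}`;
    case 'drop_escalation':
      return `dropping escalation ${intent.escalationId}`;
    case 'confirm_interpretation':
      return `confirming interpretation for session ${intent.sessionId}`;
    default:
      return assertNever(intent);
  }
}
```

Console, CLI, and any future channel construct the same values, so a test can drive the handler directly with no UI mocking.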
|
|
2548
|
-
|
|
2549
|
-
**Things to hash out:**
|
|
2550
|
-
- Which decisions should be in scope for v1? Not all decisions need to be interactive -- many can be autonomous. The union should cover only the cases where human judgment is genuinely required.
|
|
2551
|
-
- Should `OperatorIntent` carry a `reason?: string` (optional for human input, vs required for synthetic gates)? Or separate types for human vs synthetic decisions?
|
|
2552
|
-
|
|
2553
|
-
---
|
|
2554
|
-
|
|
2555
|
-
### `PendingDecision` store with pure state transitions: immutable approval state for operator-gated pipeline actions (May 7, 2026)
|
|
2556
|
-
|
|
2557
|
-
**Status: idea** | Priority: high
|
|
2558
|
-
|
|
2559
|
-
**Score: 11** | Cor:2 Cap:2 Eff:2 Lev:2 Con:2 | Blocked: yes (needs InteractionIntent above)
|
|
2560
|
-
|
|
2561
|
-
When a WorkTrain pipeline reaches a point requiring operator approval -- a critical finding before merge, an interpretation confirmation before coding, a PR before auto-merge -- it currently either blocks indefinitely (waiting for an outbox message) or skips the gate. There is no structured in-memory state tracking what's pending, what the operator decided, and what the disposition of each item is.
|
|
2562
|
-
|
|
2563
|
-
The pattern (from `etienne-clone/src/messaging/pending-reviews.ts`): `PendingReview` is a fully immutable snapshot with `approvedFindings: ReadonlySet<FindingId>`, `droppedFindings: ReadonlySet<FindingId>`, `editedBodies: ReadonlyMap<FindingId, string>`, `status: 'pending' | 'posted' | 'skipped'`. All state transitions are pure functions `(state: PendingReview) => PendingReview` composed with `store.update(id, fn)` -- `approveFinding`, `dropFinding`, `editFinding`, `approveAllBySeverity`. The store is a thin `Map` wrapper with `add`, `get`, `update(id, fn)`, `cleanup(olderThanMs)`.
|
|
2564
|
-
|
|
2565
|
-
Pattern source: `etienne-clone/src/messaging/pending-reviews.ts`.
|
|
2566
|
-
|
|
2567
|
-
**Philosophy note:** "Derive state, don't accumulate it" -- each transition is a pure function over the current state, not an imperative mutation. `ReadonlySet` and `ReadonlyMap` enforce immutability at the type level. "Single source of state truth" -- the `PendingDecision` store is the one place that tracks what an operator has decided; no parallel flags or session-level booleans.
|
|
2568
|
-
|
|
2569
|
-
**Done looks like:** `PendingDecisionStore` in `src/coordinator/pending-decisions.ts`. Pipeline sessions that reach an operator gate call `pendingDecisions.add(decision)`. The coordinator polls or subscribes and calls `pendingDecisions.update(id, applyIntent(intent))` when the operator acts. `cleanup(24 * 60 * 60 * 1000)` runs at startup to expire stale pending decisions.
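A sketch of the store and two pure transitions, with assumed field names:

```typescript
type DecisionStatus = 'pending' | 'posted' | 'skipped';

interface PendingDecision {
  readonly id: string;
  readonly approvedFindings: ReadonlySet<string>;
  readonly droppedFindings: ReadonlySet<string>;
  readonly status: DecisionStatus;
  readonly createdAtMs: number;
}

// Pure transitions: each returns a new snapshot, never mutates the old one.
const approveFinding = (findingId: string) => (s: PendingDecision): PendingDecision => ({
  ...s,
  approvedFindings: new Set([...s.approvedFindings, findingId]),
});

const markPosted = (s: PendingDecision): PendingDecision => ({ ...s, status: 'posted' });

class PendingDecisionStore {
  private readonly items = new Map<string, PendingDecision>();

  add(decision: PendingDecision): void {
    this.items.set(decision.id, decision);
  }

  get(id: string): PendingDecision | undefined {
    return this.items.get(id);
  }

  update(id: string, fn: (s: PendingDecision) => PendingDecision): void {
    const current = this.items.get(id);
    if (current !== undefined) this.items.set(id, fn(current));
  }

  cleanup(olderThanMs: number, nowMs: number = Date.now()): void {
    for (const [id, decision] of this.items) {
      if (nowMs - decision.createdAtMs > olderThanMs) this.items.delete(id);
    }
  }
}
```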
|
|
2570
|
-
|
|
2571
|
-
---
|
|
2572
|
-
|
|
2573
|
-
### `BriefingDelivery` + `InteractionPort` ports: hexagonal adapter contract for operator notification channels (May 7, 2026)
|
|
2574
|
-
|
|
2575
|
-
**Status: idea** | Priority: medium
|
|
2576
|
-
|
|
2577
|
-
**Score: 10** | Cor:1 Cap:3 Eff:2 Lev:2 Con:2 | Blocked: yes (needs InteractionIntent above)
|
|
2578
|
-
|
|
2579
|
-
WorkTrain's notification path (when it ships) will need to deliver pipeline results to operators and receive their decisions back. If the notification logic is coupled to a specific channel (Slack, console, CLI), adding a second channel requires duplicating the entire delivery + interaction wiring. There is no current abstraction that separates "what to deliver" from "how to deliver it."
|
|
2580
|
-
|
|
2581
|
-
The pattern (from `etienne-clone/src/messaging/port.ts`): two tiny focused interfaces -- `BriefingDelivery` (domain → adapter: `deliverBriefing`, `updateStatus`, `markCompleted`, `notifySkipped`) and `InteractionPort` (adapter → domain: `start`, `stop`, `onInteraction`). The adapter owns all layout decisions (threads, message IDs, button rendering). The domain never sees channel-specific types. Any channel -- Slack, console, webhook, CLI -- implements the same two interfaces.
|
|
2582
|
-
|
|
2583
|
-
Pattern source: `etienne-clone/src/messaging/port.ts`.
|
|
2584
|
-
|
|
2585
|
-
**Philosophy note:** "Keep interfaces small and focused" -- two interfaces with four methods each, no leakage of platform details into the domain. "Dependency injection for boundaries" -- the coordinator receives `BriefingDelivery` and `InteractionPort` as injected dependencies; swapping Slack for the console is a constructor argument change. "Capability-based architecture" -- the coordinator can only do what the `BriefingDelivery` interface exposes, not arbitrary channel operations.
|
|
2586
|
-
|
|
2587
|
-
**Done looks like:** `PipelineDelivery` and `OperatorInteractionPort` interfaces in `src/coordinator/ports.ts`. Console HTTP adapter and CLI adapter each implement both. The coordinator takes them as constructor injections. Adding a Slack adapter requires only implementing the two interfaces, no coordinator changes.
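The two ports might translate to WorkTrain like this. Method names follow the etienne-clone pattern, payload shapes are placeholders, and the recording adapter stands in for a real channel:

```typescript
// Placeholder payloads -- the real shapes are WorkTrain's to define.
interface PipelineBriefing { readonly sessionId: string; readonly summary: string }
type OperatorIntent = { readonly kind: 'approve_pr'; readonly sessionId: string };

// Domain -> adapter: how results reach the operator.
interface PipelineDelivery {
  deliverBriefing(briefing: PipelineBriefing): Promise<void>;
  updateStatus(sessionId: string, status: string): Promise<void>;
  markCompleted(sessionId: string): Promise<void>;
  notifySkipped(sessionId: string, reason: string): Promise<void>;
}

// Adapter -> domain: how operator decisions come back.
interface OperatorInteractionPort {
  start(): Promise<void>;
  stop(): Promise<void>;
  onInteraction(handler: (intent: OperatorIntent) => Promise<void>): void;
}

// A trivial in-memory adapter: useful in tests, and the same shape any real
// channel (console HTTP, CLI, Slack) would implement.
class RecordingDelivery implements PipelineDelivery {
  readonly log: string[] = [];
  async deliverBriefing(b: PipelineBriefing): Promise<void> { this.log.push(`briefing:${b.sessionId}`); }
  async updateStatus(id: string, status: string): Promise<void> { this.log.push(`status:${id}:${status}`); }
  async markCompleted(id: string): Promise<void> { this.log.push(`completed:${id}`); }
  async notifySkipped(id: string, reason: string): Promise<void> { this.log.push(`skipped:${id}:${reason}`); }
}
```

The coordinator only ever sees the interfaces, so swapping channels is a constructor-argument change.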
|
|
2588
|
-
|
|
2589
|
-
---
|
|
2590
|
-
|
|
2591
|
-
### `CorrectionEvent` with typed `correctionType`: structured learning from operator edits to pipeline outputs (May 7, 2026)
|
|
2592
|
-
|
|
2593
|
-
**Status: idea** | Priority: medium
|
|
2594
|
-
|
|
2595
|
-
**Score: 9** | Cor:1 Cap:2 Eff:2 Lev:2 Con:2 | Blocked: yes (needs PendingDecision store above)
|
|
2596
|
-
|
|
2597
|
-
When an operator edits a WorkTrain output -- rewrites a PR description, changes a finding severity, adjusts a summary -- that edit is currently invisible to the system. The session event log records that the pipeline ran; it does not record what the operator thought was wrong with the output. This is the core gap in the per-run retrospective backlog item: there is no structured way to capture what the operator corrected.
|
|
2598
|
-
|
|
2599
|
-
The pattern (from `etienne-clone/src/messaging/review-handler.ts` `emitCorrection()`): when an operator edits a finding, `CorrectionEvent` is emitted with a typed `correctionType: { textChanged, severityChanged, locationChanged, principleChanged, dropped }`. The event also carries `{ original, edited, context }`. Repeated correction patterns across sessions become training signal for improving prompts and workflows.
|
|
2600
|
-
|
|
2601
|
-
Pattern source: `etienne-clone/src/messaging/review-handler.ts` `emitCorrection()` + `etienne-clone/src/types.ts` `CorrectionEvent`.
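A sketch of the classification step. The finding fields and the boolean-flags encoding of `correctionType` are illustrative, not the etienne-clone types:

```typescript
// Assumed finding shape for illustration.
interface Finding {
  readonly body: string;
  readonly severity: 'low' | 'medium' | 'high' | 'critical';
  readonly location: string;
}

interface CorrectionType {
  readonly textChanged: boolean;
  readonly severityChanged: boolean;
  readonly locationChanged: boolean;
  readonly dropped: boolean;
}

interface CorrectionEvent {
  readonly correctionType: CorrectionType;
  readonly original: Finding;
  readonly edited: Finding | null; // null when the operator dropped the finding
}

// Pure classification: compare the original output with the operator's edit.
function classifyCorrection(original: Finding, edited: Finding | null): CorrectionEvent {
  if (edited === null) {
    return {
      correctionType: { textChanged: false, severityChanged: false, locationChanged: false, dropped: true },
      original,
      edited,
    };
  }
  return {
    correctionType: {
      textChanged: edited.body !== original.body,
      severityChanged: edited.severity !== original.severity,
      locationChanged: edited.location !== original.location,
      dropped: false,
    },
    original,
    edited,
  };
}
```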
|
|
2602
|
-
|
|
2603
|
-
**Philosophy note:** "Observability as a constraint" -- operator corrections are first-class events in the session store, not manual notes. "Errors are data" -- a correction is structured data about what the system got wrong, not a free-text annotation. The typed `correctionType` makes the correction machine-readable without LLM parsing.
|
|
2604
|
-
|
|
2605
|
-
**Done looks like:** `PipelineCorrectionEvent` added to `DaemonEvent` union. When an operator edits a WorkTrain output via the `PendingDecision` flow, the correction is classified and emitted. `worktrain logs` surfaces corrections. A future analytics pass can aggregate correction patterns per workflow and per step to identify systematic output quality gaps.
|
|
2606
|
-
|
|
2607
|
-
---
|
|
2608
|
-
|
|
2609
2136
|
### Agents must not perform delivery actions -- only the coordinator's delivery layer can (Apr 30, 2026)
|
|
2610
2137
|
|
|
2611
2138
|
**Status: idea** | Priority: high
|
|
@@ -5615,6 +5142,23 @@ The agent is expensive, inconsistent, and slow. Scripts are free, deterministic,
|
|
|
5615
5142
|
|
|
5616
5143
|
---
|
|
5617
5144
|
|
|
5145
|
+
### Branch dependency tracking: prevent accidental stacking and handle intentional stacking correctly (May 8, 2026)
|
|
5146
|
+
|
|
5147
|
+
**Status: idea** | Priority: high
|
|
5148
|
+
|
|
5149
|
+
**Score: 12** | Cor:3 Cap:2 Eff:2 Lev:3 Con:2 | Blocked: no
|
|
5150
|
+
|
|
5151
|
+
WorkTrain creates branches and opens PRs without tracking whether a branch was created from main or from another in-flight PR branch. When a branch is accidentally based on a pending PR branch, two problems follow: (1) squash-merging the base PR absorbs the dependent PR's commits, making the dependent PR either empty or conflicted; (2) CI on the dependent PR tests the combined diff, not just the intended change. This has already caused real merge failures and required manual rebases. When stacking is intentional (PR B genuinely depends on PR A), WorkTrain has no mechanism to enforce merge order or automatically rebase B after A lands.
|
|
5152
|
+
|
|
5153
|
+
**Things to hash out:**
|
|
5154
|
+
- Should WorkTrain enforce "always branch from main" as a hard rule, or support intentional stacking with explicit dependency metadata?
|
|
5155
|
+
- If stacking is allowed, what is the right representation for a stack dependency -- a field in the session store, a git note, or a GitHub PR relationship?
|
|
5156
|
+
- When the base PR merges, who is responsible for rebasing dependents -- the coordinator, a post-merge hook, or a separate `worktrain rebase` command?
|
|
5157
|
+
- One candidate approach for the accidental case: detect at branch creation time whether HEAD is on a pending PR branch and warn or block. For the intentional case, this would be wrong -- some work genuinely depends on an unmerged PR and the stack is correct. The distinction needs to be explicit, not inferred. Take with a large grain of salt.
|
|
5158
|
+
- What should happen to a dependent branch mid-session when its base merges? Should the daemon interrupt the session, or let it finish and rebase at the end?
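Only the detection logic from the candidate approach above can be sketched without settling the open questions; all names here are hypothetical and this is not a committed design:

```typescript
type BranchBaseCheck =
  | { readonly kind: 'based_on_main' }
  | { readonly kind: 'accidental_stack'; readonly baseBranch: string }
  | { readonly kind: 'intentional_stack'; readonly baseBranch: string };

function checkBranchBase(
  baseBranch: string,
  pendingPrBranches: ReadonlySet<string>,
  declaredDependency: string | undefined,
): BranchBaseCheck {
  if (!pendingPrBranches.has(baseBranch)) return { kind: 'based_on_main' };
  // The base is an unmerged PR branch: this is stacking. It counts as
  // intentional only if the dependency was explicitly declared -- never inferred.
  return declaredDependency === baseBranch
    ? { kind: 'intentional_stack', baseBranch }
    : { kind: 'accidental_stack', baseBranch };
}
```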
|
|
5159
|
+
|
|
5160
|
+
---
|
|
5161
|
+
|
|
5618
5162
|
### Worktree and branch lifecycle management
|
|
5619
5163
|
|
|
5620
5164
|
WorkTrain has no tooling to surface the state of worktrees and branches relative to main. Doing this manually today requires running git commands across every registered worktree, cross-referencing merged PR lists, and inspecting each branch's unique commits to determine if the work landed. Pain points observed in practice:
|