@exaudeus/workrail 3.79.2 → 3.80.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -55,6 +55,7 @@ No proposed solutions here -- just the problem.]
  **Rules for writing entries:**
  - **State the problem, not the solution.** "There is no way to invoke a routine directly" not "We should add a `worktrain invoke` command."
  - **No steering.** Don't tell future implementers how to build it. Capture what needs to exist, not how to make it exist.
+ - **Solutions belong in "Things to hash out", not in the problem description.** If you find yourself writing "the coordinator should..." or "a script that..." in the problem body, move it to a hash-out question instead. You may mention a possible direction in a hash-out question, but frame it as an untested candidate -- not a decision.
  - **Things to hash out = genuine open questions.** Only include questions that actually need to be answered before design can start. If you know the answer, state it in the problem description.
  - **Relationships matter.** If this item depends on another, or would be superseded by another, name it explicitly.
  - **Be specific about what "done" looks like** when it's not obvious -- e.g. "done means an operator can invoke any routine by name from the CLI without writing a workflow."
@@ -152,25 +153,6 @@ Issue #241 (TTL eviction across multiple files + new tests) was classified as Sm

  ---

- ### `worktrain doctor`: typed service config audit and auto-repair (May 7, 2026)
-
- **Status: idea** | Priority: high
-
- **Score: 11** | Cor:2 Cap:2 Eff:2 Lev:3 Con:2 | Blocked: no
-
- There is no command that audits whether the WorkTrain daemon is correctly configured and healthy before an operator relies on it for overnight work. Config problems (stale binary, wrong launchd plist path, missing env vars, port mismatch, bad model override) surface as silent failures at runtime -- often at 3am. The operator has no way to proactively detect or repair them.
-
- OpenClaw ships a `SERVICE_AUDIT_CODES` system (typed issue codes: `gatewayCommandMissing`, `gatewayEntrypointMismatch`, `launchdKeepAlive`, `gatewayRuntimeBun`, etc.) with a `doctor --fix` command that auto-repairs the most common issues. A `worktrain doctor` command with the same pattern -- audit returns typed `ServiceConfigIssue[]`, severity `warning | error`, suggested fix per code -- would surface config problems before they become silent 3am failures.
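The typed audit shape described above can be sketched as follows. This is illustrative only -- `AuditCode`, `ServiceConfigIssue`, `AuditCheck`, and `runAudit` are hypothetical names, not actual WorkTrain or OpenClaw APIs:

```typescript
// Hypothetical sketch of a typed config-audit system. Codes mirror the
// Phase 1 candidates discussed in this entry; none of this exists yet.
type AuditCode =
  | 'staleBinary'
  | 'missingApiKey'
  | 'triggersParseError'
  | 'launchdPlistMismatch'
  | 'portConflict';

interface ServiceConfigIssue {
  code: AuditCode;
  severity: 'warning' | 'error';
  message: string;
  suggestedFix: string; // human-readable repair instruction
  autoFixable: boolean; // whether a `doctor --fix` may repair it
}

// One check per code; doctor runs them all and reports typed issues.
type AuditCheck = () => ServiceConfigIssue | null;

function runAudit(checks: readonly AuditCheck[]): ServiceConfigIssue[] {
  return checks.flatMap((check) => {
    const issue = check();
    return issue ? [issue] : [];
  });
}
```

A real `doctor --fix` would then pair each code where `autoFixable` is true with a repair routine, and print `suggestedFix` for the rest.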
-
- This item is distinct from the "daemon --start reports success on crash" fix (PR #898, done): that fix verifies liveness after start; doctor would verify config correctness before start and at any time.
-
- **Things to hash out:**
- - Which audit codes are most valuable for Phase 1? Candidates: stale binary (binary mtime check), missing ANTHROPIC_API_KEY, triggers.yml parse error, launchd plist mismatch (plist path points to wrong binary), port conflict.
- - Should `doctor --fix` auto-repair or only print instructions? Auto-repair for simple cases (rewrite plist path), print instructions for secrets.
- - Where does this live -- `worktrain doctor` as a new subcommand, or integrated into `worktrain daemon --status`?
-
- ---
-
  ### Daemon binary stale after rebuild, no indication to user

  **Status: ux gap** | Priority: medium
@@ -211,137 +193,42 @@ The delivery pipeline was extracted into `delivery-pipeline.ts` with explicit st

  ## WorkTrain Daemon

- ### Context injection bugs: double-injection, byte-slice truncation, workspaceRules[0] drop (Apr 30, 2026)
-
- **Status: done** | Shipped in PR #946 (fix/etienneb/context-injection-bugs, auto-merge enabled)
-
- **Score: 13** | Cor:3 Cap:1 Eff:3 Lev:3 Con:3 | Blocked: no
-
- All three bugs fixed. `WorkflowContextSlots` typed interface + `extractContextSlots()` introduced in `src/daemon/types.ts`. `buildSystemPrompt` refactored to pipeline of pure section functions. `truncateToByteLimit` uses Buffer/surrogate-safe walk-back.
-
- ---
-
- ### Universal context enricher for all session entry points (Apr 30, 2026)
-
- **Status: done** | Shipped in PR #947 (feat/etienneb/workflow-enricher, auto-merge enabled, depends on #946)
-
- **Score: 11** | Cor:1 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
-
- `WorkflowEnricher` service in `src/daemon/workflow-enricher.ts`. Fires for root sessions (`spawnDepth === 0`) inside `runWorkflow()` before `buildPreAgentSession()`. `PriorNotesPolicy` discriminated type controls notes injection. 1s timeout with partial fallback on `listRecentSessions`. `EnricherResult` threaded as typed value through call chain -- trigger never mutated. All 6 entry points covered.
-
- **Pilot test gate still pending:** before declaring full success, verify agents reference prior notes in turn-1 reasoning in at least one real session.
-
- ---
-
- ### Unified daemon event schema: merge daemon event log and v2 session store into one trace format (May 7, 2026)
-
- **Status: idea** | Priority: high
-
- **Score: 11** | Cor:1 Cap:2 Eff:2 Lev:3 Con:2 | Blocked: no
-
- WorkTrain writes two separate event stores: the daemon event log at `~/.workrail/events/daemon/<date>.jsonl` (tool calls, session lifecycle, trigger fired) and the v2 session event store at `~/.workrail/data/sessions/<id>/`. Every console feature that needs both structured session data and daemon-layer events (goal text, trigger provenance, tool call history) must bridge two storage formats. The status briefing discovery doc explicitly flagged this as a blocking split: "Two storage systems: Daemon event log vs per-session store. Bridging them requires either reading two systems or accepting one system's incomplete picture."
-
- OpenClaw ships a unified `TrajectoryEvent` schema (`traceId`, `source: "runtime" | "transcript" | "export"`, `type`, `ts`, `seq`, `sessionId`, `runId`, `workspaceDir`, `provider`, `modelId`) covering all event kinds from both sources. A single schema means tooling (console, export, replay, search) is built once.
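For illustration, a unified envelope along these lines might look like the sketch below. Field names follow the OpenClaw shape quoted above; the `source` values are the WorkTrain candidates from this entry's hash-out list, and everything here is hypothetical, not an existing WorkTrain type:

```typescript
// Hypothetical unified event envelope -- one schema for both stores.
interface UnifiedDaemonEvent {
  traceId: string;
  source: 'daemon' | 'engine' | 'agent';
  type: string;   // e.g. 'session_started', 'tool_call_completed'
  ts: number;     // epoch millis
  seq: number;    // monotonic per trace
  sessionId?: string;
  runId?: string;
  workspaceDir?: string;
  provider?: string;
  modelId?: string;
}

// With one schema, tooling like "all events for a session, in order"
// is written once and works against either storage backend.
function eventsForSession(
  events: readonly UnifiedDaemonEvent[],
  sessionId: string,
): UnifiedDaemonEvent[] {
  return events
    .filter((e) => e.sessionId === sessionId)
    .sort((a, b) => a.seq - b.seq);
}
```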
-
- The migration path does not require replacing either store immediately -- it can start by making the daemon event log speak the same schema fields, so the console can query either source without a bridge layer.
-
- **Things to hash out:**
- - Phase 1 scope: unify schema fields only (no storage migration), or actually consolidate to one storage backend?
- - Does the v2 engine event format need to change, or can the daemon event log adopt v2-compatible fields without touching the engine?
- - What is the correct `source` discriminant for WorkTrain? Candidates: `daemon` (trigger/session lifecycle), `engine` (step advances, token ops), `agent` (tool calls, LLM turns).
-
- ---
-
- ### Pluggable context assembly: replace hardcoded `buildSystemPrompt()` with an injectable interface (May 7, 2026)
+ ### Large-comment smell detection during implementation (May 8, 2026)

  **Status: idea** | Priority: medium

- **Score: 9** | Cor:1 Cap:2 Eff:1 Lev:2 Con:2 | Blocked: no
-
- WorkTrain's context injection is a hardcoded pipeline of pure section functions in `buildSystemPrompt()` (`sectionWorktreeScope`, `sectionWorkspaceContext`, `sectionAssembledContext`, `sectionPriorWorkspaceNotes`, `sectionChangedFiles`, `sectionReferenceUrls`). This works for the current fixed context set, but cannot support: (a) dynamic token-budget management (truncation today is a hard 8KB ceiling on assembled context with no compaction), (b) retrieval-augmented context injection (pull relevant prior session notes by semantic similarity rather than recency), (c) per-workflow context policies (a discovery session needs different context than an implementation session). Adding any of these requires modifying `buildSystemPrompt()` directly rather than composing a new context strategy.
-
- OpenClaw ships a `ContextEngine` interface (`assemble(params: { tokenBudget, availableTools, model, prompt }) => Promise<{ messages, estimatedTokens, systemPromptAddition }>`, `compact()`, `maintain()` with background/foreground mode, `ingest()`, `rewriteTranscriptEntries()` via runtime callback) that is fully pluggable. Any context strategy -- windowed recency, semantic retrieval, summary-based compaction -- can be implemented behind the interface without touching the agent loop.
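As a rough sketch of what a far smaller injectable seam could look like (all names here are hypothetical -- this is not OpenClaw's `ContextEngine` surface, just a minimal stand-in wrapping current behavior):

```typescript
// Hypothetical minimal context-assembly seam. A default implementation
// wraps the existing fixed section pipeline; other strategies plug in
// behind the same interface.
interface AssembleParams {
  tokenBudget: number;
  prompt: string;
}

interface AssembledContext {
  systemPrompt: string;
  estimatedTokens: number;
}

interface ContextAssembler {
  assemble(params: AssembleParams): AssembledContext;
}

class DefaultContextAssembler implements ContextAssembler {
  constructor(
    private readonly sections: ReadonlyArray<(prompt: string) => string>,
  ) {}

  assemble({ tokenBudget, prompt }: AssembleParams): AssembledContext {
    const text = this.sections.map((s) => s(prompt)).join('\n');
    // Crude 4-chars-per-token estimate, for illustration only.
    const maxChars = tokenBudget * 4;
    const clipped = text.length > maxChars ? text.slice(0, maxChars) : text;
    return { systemPrompt: clipped, estimatedTokens: Math.ceil(clipped.length / 4) };
  }
}
```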
+ **Score: 9** | Cor:2 Cap:2 Eff:2 Lev:2 Con:2 | Blocked: no

- Phase 1 is not a full interface extraction -- it is scoping the problem: identify which callers of `buildSystemPrompt()` would benefit from a budget parameter, and what the minimal injectable seam looks like.
+ When an implementing agent adds a large comment to explain a code decision, that comment is often a signal that the architecture is wrong -- the agent is explaining a workaround rather than fixing the underlying constraint. There is currently no mechanism in the pipeline to detect this pattern and force the agent to investigate whether the comment reflects a design problem. Agents tend to either leave large comments in place or delete them silently; neither response surfaces the underlying architectural question.

  **Things to hash out:**
- - What is the right Phase 1 scope? Candidates: (a) add a `tokenBudget` parameter to `buildSystemPrompt()` and use it for truncation decisions, (b) extract a `ContextAssembler` port with a single `assemble()` method, (c) define the full interface and ship a `DefaultContextAssembler` that wraps the current behavior unchanged.
- - Does this depend on the unified event schema (above), or is it independent? If context retrieval needs session history, it depends on MemoryStore, which depends on indexed session history.
- - Effort: Phase 1 (budget parameter only) is hours. Full interface extraction is days. The backlog score reflects Phase 1.
-
- ---
-
- ### Add runId, provider, and modelId to DaemonEvent (May 7, 2026)
-
- **Status: idea** | Priority: high
-
- **Score: 10** | Cor:1 Cap:2 Eff:3 Lev:2 Con:3 | Blocked: no
-
- The console correlates daemon event log entries with v2 session store entries via `workrailSessionId`, but that field is only available after the continueToken is decoded (~50ms after session start). Events emitted before decode (notably `session_started`) have no `workrailSessionId`, creating a gap where the console cannot link early lifecycle events to the correct session.
-
- Adding `runId` (set to the process-local `sessionId` UUID at the top of `runWorkflow()`) as an optional field on all per-session DaemonEvent interfaces closes this gap without migration: `runId` is available immediately, constant for the session's lifetime, and already threaded through `buildPreAgentSession()`. Adding `provider` and `modelId` at the same time gives the event log the model attribution that OpenClaw's trajectory schema has and WorkTrain's currently lacks.
-
- Pattern source: OpenClaw `src/trajectory/types.ts` `TrajectoryEvent.runId` / `.provider` / `.modelId`.
-
- **Philosophy note:** `runId` should be a branded type (`type RunId = string & { readonly _brand: 'RunId' }`) so it cannot be accidentally swapped with `sessionId` or `workrailSessionId` at call sites -- explicit domain types over primitives. All three fields should be optional in the event union (additive, backward-compatible) but required in a separate `SessionEventContext` helper type that is constructed once per session and passed through explicitly -- single source of state truth, no scattered `runId` assignments.
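The branded-type idea can be sketched as follows (illustrative -- `makeRunId` and the `describe` helper are hypothetical, only the `RunId` brand shape comes from this entry):

```typescript
// Branded string types: structurally distinct at compile time, plain
// strings at runtime.
type RunId = string & { readonly _brand: 'RunId' };
type WorkrailSessionId = string & { readonly _brand: 'WorkrailSessionId' };

// The only way to obtain a RunId is through a constructor function, so a
// plain string (or a WorkrailSessionId) cannot be passed where a RunId is
// expected without an explicit, visible cast.
function makeRunId(uuid: string): RunId {
  return uuid as RunId;
}

// Helper context constructed once per session and threaded explicitly.
interface SessionEventContext {
  runId: RunId;
  provider: string;
  modelId: string;
}

function describe(ctx: SessionEventContext): string {
  return `${ctx.runId} via ${ctx.provider}/${ctx.modelId}`;
}
```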
-
- **Done looks like:** All per-session event interfaces (`SessionStartedEvent`, `ToolCalledEvent`, `StepAdvancedEvent`, etc.) have an optional `runId?: string` field. `runWorkflow()` assigns `runId = sessionId` and passes it wherever `workrailSessionId` is currently passed. `provider` and `modelId` are populated at session start from `buildAgentClient()`.
-
- ---
-
- ### QueuedFileWriter for DaemonEventEmitter (May 7, 2026)
-
- **Status: idea** | Priority: medium
-
- **Score: 9** | Cor:2 Cap:1 Eff:3 Lev:1 Con:3 | Blocked: no
-
- `DaemonEventEmitter._append()` calls `fs.appendFile()` concurrently. Under burst writes (a turn with many tool calls emitting `tool_call_started`, `tool_call_completed` pairs in rapid succession), concurrent appends can interleave JSONL lines, producing a corrupt log that `JSON.parse()` fails on. The failure is silent -- `emit()` is fire-and-forget and errors are swallowed.
-
- The fix is a per-file promise chain: `this._writers.set(path, (this._writers.get(path) ?? Promise.resolve()).then(() => fs.appendFile(...)))`. Each write chains onto the previous write for the same file, serializing them without a mutex. ~10-line change.
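A self-contained sketch of that per-file chain, with an in-memory sink standing in for `fs.appendFile` so the example runs anywhere (names are illustrative, not the actual emitter):

```typescript
// Sketch of a per-file promise chain: each append for a given path chains
// onto the previous one, serializing writes without a mutex.
class QueuedAppender {
  private readonly writers = new Map<string, Promise<void>>();
  readonly lines = new Map<string, string[]>(); // in-memory stand-in for the file

  append(path: string, line: string): Promise<void> {
    const prev = this.writers.get(path) ?? Promise.resolve();
    const next = prev.then(async () => {
      // Simulated async write with jitter: without the chain, these could
      // complete out of order and interleave.
      await new Promise<void>((resolve) => setTimeout(resolve, Math.random() * 5));
      const bucket = this.lines.get(path) ?? [];
      bucket.push(line);
      this.lines.set(path, bucket);
    });
    // Keep the chain alive even if one write fails, matching the existing
    // fire-and-forget emit() contract.
    this.writers.set(path, next.catch(() => undefined));
    return next;
  }
}
```

The philosophy note's cleanup concern would be handled by deleting the Map entry once its promise settles and no newer write has replaced it.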
-
- Pattern source: OpenClaw `src/trajectory/runtime.ts` `QueuedFileWriter` per-session writer map.
-
- **Philosophy note:** The promise-chain approach is the right fix over a mutex or a lock file: it is purely functional, uses no shared mutable state beyond the Map, and composes cleanly with the existing fire-and-forget emit() contract. The Map entry should be cleaned up when the promise resolves to prevent unbounded accumulation -- determinism over cleverness, no hidden state.
-
- **Done looks like:** `DaemonEventEmitter._append()` uses a `Map<string, Promise<void>>` to serialize writes per file path. Existing tests pass. A new test asserts that 50 concurrent emits produce 50 valid JSONL lines in the correct order.
+ - What heuristic defines a "large" comment worth flagging? The right threshold is unknown -- too sensitive produces noise, too coarse misses real smells.
+ - Where in the pipeline should detection happen -- during implementation, post-implementation, or as part of review?
+ - Who is responsible for detection: the agent itself, the coordinator, or the reviewer? Each has different visibility and different trust levels.
+ - What is the right response when a smell is detected -- inject an extra step, block advancement, emit a signal, or something else entirely? One rough candidate: a coordinator-side script that diffs the working tree, scans for large newly-added comment blocks, and injects an additional verification step into the active session when it finds them -- but this is untested thinking; take it with a large grain of salt.
+ - How does the agent distinguish a comment explaining a non-obvious invariant (legitimate) from one explaining a workaround (smell)? This may require LLM judgment, not just pattern matching.
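In the spirit of the rough candidate above, a diff scan might look like this. It is a completely untested sketch with placeholder names and an arbitrary threshold, matching the entry's own caveat:

```typescript
// Untested sketch: count runs of consecutive added `//` comment lines in a
// unified diff that meet or exceed a threshold. Threshold and comment
// syntax are placeholders; a real version would handle block comments,
// other languages, and LLM-judged intent.
function largeAddedCommentBlocks(diff: string, threshold = 8): number {
  let runs = 0;
  let run = 0;
  for (const line of diff.split('\n')) {
    const added = line.startsWith('+') && !line.startsWith('+++');
    const isComment = added && line.slice(1).trimStart().startsWith('//');
    if (isComment) {
      run += 1;
    } else {
      if (run >= threshold) runs += 1;
      run = 0;
    }
  }
  if (run >= threshold) runs += 1;
  return runs;
}
```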

  ---

- ### Two-layer path validation for config-derived file paths (May 7, 2026)
-
- **Status: idea** | Priority: medium
-
- **Score: 9** | Cor:2 Cap:1 Eff:3 Lev:1 Con:3 | Blocked: no
-
- `workflowId` and `triggerId` values from config are validated at parse time for format correctness, but not at file-path construction time. If a malformed value slipped through (e.g. `wr/../../../etc/passwd`), file operations using those values as path segments could escape the intended directory. WorkTrain's `sessionId` values are `randomUUID()` (safe), but `workflowId` and `triggerId` are operator-supplied strings.
-
- The pattern: `assertSafeFileSegment(id: string): string` rejects `/`, `\`, and null bytes and returns the sanitized id or throws. Then `isPathInside(safeDir, resolvedPath)` as a second structural containment check. Also: add `fs.chmod(filePath, 0o600)` to sidecar writes in `persistTokens()` so session recovery files are not world-readable.
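A sketch of the two layers (illustrative; the real module, branded return type, and call sites are still to be designed):

```typescript
import * as path from 'node:path';

// Layer 1: lexical validation of a single path segment. Rejects separators,
// null bytes, and dot segments; returns the id unchanged or throws.
function assertSafeFileSegment(id: string): string {
  if (
    id.includes('/') || id.includes('\\') || id.includes('\0') ||
    id === '' || id === '.' || id === '..'
  ) {
    throw new Error(`unsafe file segment: ${JSON.stringify(id)}`);
  }
  return id;
}

// Layer 2: structural containment check on the resolved path.
function isPathInside(parent: string, child: string): boolean {
  const rel = path.relative(path.resolve(parent), path.resolve(child));
  return rel !== '' && !rel.startsWith('..') && !path.isAbsolute(rel);
}
```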
+ ### Context injection bugs: double-injection, byte-slice truncation, workspaceRules[0] drop (Apr 30, 2026)

- Pattern source: OpenClaw `src/cron/run-log.ts` `assertSafeCronRunLogJobId()` + `isPathInside()`.
+ **Status: done** | Shipped in PR #946 (fix/etienneb/context-injection-bugs, auto-merge enabled)

- **Philosophy note:** This is validate-at-boundaries in practice: `workflowId` and `triggerId` are external inputs (operator config) and must be re-validated at every boundary where they're used to construct a file path, not just at parse time. `assertSafeFileSegment()` should return a branded type (`SafeFileSegment`) so the compiler enforces that path construction only ever uses validated segments -- type safety as the first line of defense. `isPathInside()` is the runtime guard for cases where the type system can't help.
+ **Score: 13** | Cor:3 Cap:1 Eff:3 Lev:3 Con:3 | Blocked: no

- **Done looks like:** A `src/infra/safe-path.ts` module exports `assertSafeFileSegment()`. Applied at all sites where `workflowId` or `triggerId` is used to construct a file path. `persistTokens()` adds `chmod 0o600` after the atomic rename. Tests cover the rejection cases.
+ All three bugs fixed. `WorkflowContextSlots` typed interface + `extractContextSlots()` introduced in `src/daemon/types.ts`. `buildSystemPrompt` refactored to pipeline of pure section functions. `truncateToByteLimit` uses Buffer/surrogate-safe walk-back.

  ---

- ### Spawn allowlist policy: restrict which workflows a session can spawn (May 7, 2026)
-
- **Status: idea** | Priority: medium
-
- **Score: 8** | Cor:1 Cap:2 Eff:2 Lev:1 Con:2 | Blocked: no
-
- WorkTrain's `spawn_agent` tool allows a parent session to spawn any `workflowId` without restriction. There is no mechanism for an operator to limit which child workflows a given trigger's sessions may delegate to. A misconfigured or misbehaving agent could spawn arbitrary long-running sessions, consuming queue slots and API budget.
+ ### Universal context enricher for all session entry points (Apr 30, 2026)

- A `spawnPolicy` field on `TriggerDefinition` (or on the `agentConfig` block) would let operators declare an allowlist: `allowedWorkflows: ['wr.review', 'wr.coding-task']` or `'*'` for unrestricted. `makeSpawnAgentTool()` checks the allowlist before calling `executeStartWorkflow()` and returns a typed error if the requested `workflowId` is not permitted.
+ **Status: done** | Shipped in PR #947 (feat/etienneb/workflow-enricher, auto-merge enabled, depends on #946)

- Pattern source: OpenClaw `src/agents/subagent-target-policy.ts` `resolveSubagentTargetPolicy()` -- pure function, `{ ok: true } | { ok: false, allowedText, error }`.
+ **Score: 11** | Cor:1 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no

- **Philosophy note:** The allowlist check must be a pure function with a discriminated union result -- `{ ok: true } | { ok: false; allowedText: string; error: string }` -- not a boolean or an exception. Errors are data. The check belongs in `makeSpawnAgentTool()` before `executeStartWorkflow()` is called, not inside the engine. The allowed workflows type should use a discriminated union: `'*'` (unrestricted) vs `{ kind: 'allowlist'; workflows: readonly string[] }` -- make illegal states unrepresentable, no stringly-typed `'*'` mixed with arrays.
+ `WorkflowEnricher` service in `src/daemon/workflow-enricher.ts`. Fires for root sessions (`spawnDepth === 0`) inside `runWorkflow()` before `buildPreAgentSession()`. `PriorNotesPolicy` discriminated type controls notes injection. 1s timeout with partial fallback on `listRecentSessions`. `EnricherResult` threaded as typed value through call chain -- trigger never mutated. All 6 entry points covered.

- **Things to hash out:**
- - Should this be on `TriggerDefinition` (per-trigger restriction) or on `agentConfig` (inheritable by child sessions)? Per-trigger is simpler; agentConfig inheritance is more flexible.
- - Default when absent: `'*'` (current behavior, no restriction) or an explicit opt-in? A safe default would restrict to the same `workflowId` as the parent, but that would break coordinator patterns that intentionally spawn different workflows.
+ **Pilot test gate still pending:** before declaring full success, verify agents reference prior notes in turn-1 reasoning in at least one real session.

  ---

@@ -1551,143 +1438,6 @@ Essential before WorkTrain manages more than 2-3 repos.

  ---

- ### Self-improvement loop MVP: WorkTrain picks up and ships workrail issues end-to-end (May 8, 2026)
-
- **Status: idea** | Priority: high
-
- **Score: 13** | Cor:3 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: yes (blocked by: convention enforcement in review, scope filter at dispatch, protected file gate, interpretation checkpoint operator approval, verification agent, resolver agent)
-
- The self-improvement loop is the vision's north star and the primary test of whether WorkTrain works. This item defines the minimum viable version: WorkTrain picks up a labeled workrail issue, runs the full pipeline, and produces a correct, convention-compliant, reviewed PR -- without the operator intervening between phases except at the interpretation checkpoint.
-
- The quality bar is non-negotiable: WorkTrain produces exemplary code that passes the same review standard it applies to others. Every finding gets fixed before merge, regardless of severity. No "we'll note it and move on."
-
- ---
-
- **Full gate sequence:**
-
- ```
- Issue labeled worktrain:ready on workrail repo
-
- [Gate 0: Scope filter] -- coordinator checks issue is safe to dispatch
-   PASS: single subsystem, no protected files, backlog Effort:3 or less
-   FAIL: comment on issue, remove label, do not dispatch
-   (pure TypeScript, no LLM, reads issue body + changed-files prediction)
-
- Adaptive coordinator: classify and select pipeline mode
-   (implement for scoped bugs/features, full for anything needing discovery)
-
- Discovery + shaping phases (if full pipeline)
-   Shaping output becomes the verifier's specification of "done"
-
- [Gate 1: Interpretation checkpoint -- ALWAYS requires operator approval]
-   Operator approves: coding begins
-   Operator edits: revised interpretation injected as coding context
-   Operator rejects: issue returned to queue, label removed
-   NOTE: no auto_confirm for the self-improvement loop, ever
-
- Coding phase (isolated worktree, branchStrategy: 'worktree')
-
- [Gate 2: Protected file check]
-   Checks git diff for: daemon-soul.md, triggers.yml,
-     src/v2/durable-core/ HMAC layer, docs/design/v2-core-design-locks.md,
-     src/daemon/ session lifecycle core
-   ANY HIT → stop immediately, escalate to operator, do not open PR
-   (deterministic script, no LLM)
-
- PR opened automatically
-
- [Gate 3: CI]
-   PASS → continue
-   FAIL → WorkTrain reads CI output, attempts one targeted fix,
-     pushes to same branch, re-runs CI
-   STILL FAILING → escalate to operator, do not proceed
-
- [Gate 4: Verification agent] -- independent QA agent, adversarial stance
-   Fed: original issue + shaped pitch (if exists) + implementation diff
-   Tools: Read, Glob, Grep, constrained Bash (npx vitest, git diff, git show only)
-   Job: prove each requirement is met with explicit evidence
-
-   VerificationRecord output contract:
-     requirementsCoveredCount: number
-     requirementsTotal: number
-     evidencePerRequirement: Array<{
-       requirement: string
-       evidence: string // test output, grep result, etc.
-       confident: boolean
-     }>
-     gapsFound: ReadonlyArray<string> // asked for, not implemented
-     unexpectedScope: ReadonlyArray<string> // implemented, not asked for
-     verdict: 'approved' | 'gaps_found' | 'disputed' | 'uncertain'
-
-   'approved' → proceed to review
-   'gaps_found' → back to coding agent with gaps as context (one retry only)
-   'uncertain' → escalate to operator
-   'disputed' → Resolver agent (see below)
-
- [Gate 4b: Resolver agent -- fires only on 'disputed']
-   Third independent agent, no stake in either position
-   Fed: original requirement text, coding agent's implementation rationale
-     (from session notes), verifier's specific objection + evidence
-   Tools: same constrained Bash as verifier
-   Job: determine ground truth -- does the code satisfy the requirement?
-
-   Resolver verdict (binding -- coding agent cannot argue back):
-     'satisfied' → proceed to review
-     'not_satisfied' → back to coding agent with resolver's rationale (final)
-     'requirement_ambiguous' → escalate to operator with:
-       original requirement + both agents' positions + resolver's analysis
-       + suggested clarification for the operator to add to the issue
-
-   NOTE: ACP (agent-to-agent real-time messaging) is a future enhancement
-     to this gate. When ACP ships, verifier and resolver could exchange
-     structured messages through the coordinator rather than using a batch
-     three-agent panel. The coordinator-mediated panel is the MVP approach.
-
- [Gate 5: wr.mr-review -- calibrated for workrail]
-   Loaded with: coding philosophy principles, daemon invariants doc,
-     design locks, commit message rules, neverthrow/assertNever conventions
-   ALL findings fixed before proceeding, regardless of severity
-   WorkTrain fixes, re-reviews until clean -- no threshold, no "note it"
-
- Auto-merge (squash, delete worktree)
-
- Issue closed, backlog item marked done
- ```
-
- ---
-
- **What needs to be built (in dependency order):**
-
- 1. **Convention enforcement in wr.mr-review for workrail** -- workspace context that injects coding philosophy, design locks, and daemon invariants as explicit review criteria. Without this, Gate 5 is generic and toothless.
-
- 2. **Scope filter at dispatch (Gate 0)** -- pure TypeScript coordinator check. Refuses dispatch if: protected files predicted in scope, issue touches multiple subsystems, issue is architectural. Reads issue body + label set.
-
- 3. **Protected file gate (Gate 2)** -- post-coding delivery-layer script, runs `git diff --name-only` against a blocklist. Hard stop, no retry.
-
- 4. **Interpretation checkpoint wired to operator approval (Gate 1)** -- in daemon sessions on the workrail repo, the interpretation checkpoint must never auto-confirm. Coordinator sets `requireInterpretationApproval: true` for this trigger.
-
- 5. **Verification agent (Gate 4)** -- new agent role with dedicated system prompt (adversarial QA stance), constrained Bash tool variant, and `VerificationRecord` output contract enforced by the engine.
-
- 6. **Resolver agent (Gate 4b)** -- new agent role, binding verdict, same constrained Bash. Coordinator spawns only on `disputed` verdict.
-
- 7. **CI failure one-retry loop (Gate 3)** -- coordinator reads CI status via `gh`, spawns a targeted-fix session if failing, re-polls.
-
- ---
-
- **Spawn depth:** The full gate sequence uses coordinator → coding → verifier → (if disputed) resolver. That's depth 3, hitting the current default `maxSubagentDepth: 3`. The workrail self-improvement trigger needs `maxSubagentDepth: 4` in `triggers.yml`.
-
- **MVP issue scope:** Start with issues that are: scoped to one file or directory, have Effort:3 in the backlog (hours to a day), and don't touch `src/v2/durable-core/` or `src/daemon/` session lifecycle. Good candidates: infra utilities, CLI commands, test coverage, observability additions.
-
- **Trust ramp:** For the first 10 issues, operator reviews the interpretation checkpoint output in full before approving. After 10 clean runs (no gaps found by verifier, no resolver invocations), interpretation checkpoint can be reviewed asynchronously (operator approves via `worktrain inbox`). After 20 clean runs, the gate timing can be relaxed further. The loop tightens based on track record, not a timer.
-
- **Things to hash out:**
- - The constrained Bash tool for the verifier/resolver is a new tool variant not yet in the codebase -- `makeConstrainedBashTool(allowedPrefixes: readonly string[])`. Where does it live and how is it enforced?
- - `requireInterpretationApproval: true` is not a current field on `TriggerDefinition`. Does it go on `agentConfig` or as a top-level trigger field?
- - How does the operator approve/reject at Gate 1 in practice? Via `worktrain inbox`? Via console? This needs to work reliably at 3am when the operator isn't watching.
- - The one-retry CI fix (Gate 3) spawns a child session -- that child needs access to CI failure output. Does the coordinator fetch it via `gh` and inject it as context, or does the child agent fetch it itself?
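For the constrained Bash question above, the prefix check such a tool might perform could look like this. This is an untested sketch -- `isCommandAllowed` is hypothetical, and a real enforcement layer would need much more than prefix matching:

```typescript
// Untested sketch of an allowed-prefix check for a constrained Bash tool.
function isCommandAllowed(
  command: string,
  allowedPrefixes: readonly string[],
): boolean {
  const trimmed = command.trim();
  // Reject shell chaining/redirection so a permitted prefix can't smuggle
  // a second command (e.g. "git diff; rm -rf /").
  if (/[;&|`$<>]/.test(trimmed)) return false;
  return allowedPrefixes.some(
    (p) => trimmed === p || trimmed.startsWith(p + ' '),
  );
}
```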
-
- ---
-
  ### Demo repo feedback loop: WorkTrain improves itself via real task execution (Apr 20, 2026)

  **Status: idea** | Priority: high
@@ -2259,80 +2009,6 @@ This is already how mid-run resume works. The same mechanism extends naturally t

  ---

- ### `withTimeout` + `withRetry` as first-class async boundary utilities (May 7, 2026)
-
- **Status: idea** | Priority: medium
-
- **Score: 10** | Cor:2 Cap:1 Eff:3 Lev:2 Con:3 | Blocked: no
-
- WorkTrain's daemon has no composable timeout or retry primitives. `AgentLoop` has a stall timer wired in internally; `PollingScheduler` has no error backoff; `startup-recovery.ts` makes one attempt per session with no retry. The vision says "cancellation/timeouts are first-class" and "overnight-safe" -- but the infrastructure for that is scattered or absent.
-
- The pattern (from `etienne-clone/src/types.ts`):
-
- ```typescript
- withTimeout<T>(fn: (signal: AbortSignal) => Promise<T>, ms: number, label: string): Promise<Result<T, TimeoutError>>
- withRetry<T, E>(fn: () => Promise<Result<T, E>>, config: RetryConfig): Promise<Result<T, E>>
- ```
-
- `RetryConfig` has `retryOn: (error: unknown) => boolean` -- the caller decides what's retryable, not the primitive. `withTimeout` threads `AbortSignal` into the function so cancellation propagates correctly. Both return `Result` types, never throw.
-
- **Adaptation note:** etienne-clone rolls its own `Result<T,E>`; WorkTrain uses `neverthrow`. The `RetryConfig` shape and `retryOn` predicate are directly portable. The function bodies need to be rewritten against `ResultAsync` from neverthrow rather than copied verbatim.
2280
-
2281
- **Philosophy note:** "Higher-order functions as a tool" -- retry and timeout are cross-cutting behaviors that should be composed around functions, not scattered across call sites. "Cancellation/timeouts are first-class" is a stated coding principle. These primitives make it structurally impossible to call an async boundary without deciding upfront whether it can timeout or retry.
2282
-
2283
- **Done looks like:** `src/infra/async-boundaries.ts` exports `withTimeout()` and `withRetry()` using neverthrow `ResultAsync`. Used at: coordinator `callbackUrl` POST retries, polling error recovery, startup-recovery rehydrate attempts.
2284
-
2285
- ---
2286
-
2287
### `OrchestratorWorkflowAvailability` pattern: make missing-workflow states unrepresentable (May 7, 2026)

**Status: idea** | Priority: medium

**Score: 9** | Cor:2 Cap:1 Eff:3 Lev:1 Con:3 | Blocked: no

WorkTrain's adaptive coordinator checks whether a requested workflow exists at dispatch time, but the check is implicit -- a missing workflow ID returns `workflow_not_found` from the engine at runtime, mid-session, after a session slot has been consumed. There is no compile-time or startup-time guarantee that the coordinator cannot route to a non-existent workflow.

The pattern (from `etienne-clone/src/pipeline/orchestrator-workflow-selection.ts`):

```typescript
type OrchestratorWorkflowAvailability =
  | { kind: 'standard_only' }
  | { kind: 'standard_and_focused' }

assessAvailableOrchestratorWorkflows(
  availableWorkflowIds: readonly string[],
  workflowIds: OrchestratorWorkflowIds,
): Result<OrchestratorWorkflowAvailability, string>
```

The "standard workflow missing" state cannot be constructed -- `assessAvailableOrchestratorWorkflows` returns `err` if the standard workflow is missing. The coordinator only ever holds a valid `OrchestratorWorkflowAvailability` value, so its dispatch logic cannot route to a missing workflow.

**Philosophy note:** "Make illegal states unrepresentable" -- the type system enforces that routing only happens after availability is confirmed. A bare string `workflowId` can point to anything; a typed `WorkflowAvailability` discriminated union can only be constructed when the workflows actually exist. This is the difference between a label and a constraint.

**Done looks like:** WorkTrain's adaptive coordinator calls `assessAvailableWorkflows(ctx, workflowIds)` at startup or pre-dispatch. It returns `Result<WorkflowAvailability, string>`. The dispatch function takes `WorkflowAvailability` as a parameter -- it is structurally impossible to call it without first confirming availability.
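A self-contained sketch of the check (hand-rolled `Result` in place of neverthrow; the `OrchestratorWorkflowIds` shape is assumed):

```typescript
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

type OrchestratorWorkflowAvailability =
  | { kind: 'standard_only' }
  | { kind: 'standard_and_focused' };

// Assumed shape: the coordinator's configured workflow IDs.
interface OrchestratorWorkflowIds { standard: string; focused: string }

// The "standard missing" state is unrepresentable: the only way to obtain
// an availability value is through this function, which errors instead.
function assessAvailableOrchestratorWorkflows(
  availableWorkflowIds: readonly string[],
  workflowIds: OrchestratorWorkflowIds,
): Result<OrchestratorWorkflowAvailability, string> {
  if (!availableWorkflowIds.includes(workflowIds.standard)) {
    return err(`standard workflow '${workflowIds.standard}' is not registered`);
  }
  return availableWorkflowIds.includes(workflowIds.focused)
    ? ok<OrchestratorWorkflowAvailability>({ kind: 'standard_and_focused' })
    : ok<OrchestratorWorkflowAvailability>({ kind: 'standard_only' });
}
```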
---

### Per-session cost tracking: estimated spend visible in execution stats and console (May 7, 2026)

**Status: idea** | Priority: medium

**Score: 8** | Cor:1 Cap:2 Eff:3 Lev:1 Con:3 | Blocked: no

WorkTrain records `inputTokens` and `outputTokens` per session in `LlmTurnCompletedEvent` and step-level metrics, but never converts them to an estimated dollar cost. Operators have no way to see how much a session cost, which workflows are expensive, or when a stuck session has burned a disproportionate share of the budget.

The pattern (from `etienne-clone/src/observability/cost.ts`): `estimateCost(modelId, usage, env)` is a pure function that returns `Result<number, CostEstimationError>`. Pricing is overridable via the `LLM_INPUT_COST_PER_1M` and `LLM_OUTPUT_COST_PER_1M` env vars (injected for testability). Token counts are already in `LlmTurnCompletedEvent` and the v2 session store.

**Philosophy note:** "Observability as a constraint" -- cost is a first-class observable dimension of a session, not a post-hoc calculation. The env-injection pattern (`env: EnvRecord = process.env`) is already how WorkTrain tests env-dependent code. The function is pure and trivially testable.

**Done looks like:** `src/observability/cost.ts` (ported from etienne-clone, adapted to WorkTrain's model IDs). `estimatedCostUsd` added to `execution-stats.jsonl` rows and `SessionCompletedEvent`. The console session detail shows cost. An alert threshold emits an `orchestrator_review_warning`-equivalent event when a session exceeds a configured per-workflow cost cap.

**Things to hash out:**
- Should cost alerts trigger escalation (surface to operator) or just log? A stuck session burning $5 should probably escalate.
- Do per-workflow cost caps belong in `TriggerDefinition.agentConfig` or in a separate `costPolicy` block?
---

### Extensible output contract registration: coordinator-owned schemas, engine-enforced (Apr 30, 2026)

**Status: idea** | Priority: medium

@@ -2457,155 +2133,6 @@ The problem is not just "add an LLM to make the decision." An LLM making approva

---

### Multi-score deterministic workflow routing: replace string-matching coordinator dispatch with typed scoring (May 7, 2026)

**Status: idea** | Priority: high

**Score: 12** | Cor:2 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no

WorkTrain's adaptive coordinator selects which pipeline to run (quick_review, review_only, implement, full) based on task content. The current dispatch uses heuristics and LLM-assisted classification. This violates vision principle #1: zero LLM turns for routing. Coordinator decisions must be deterministic TypeScript code, not LLM reasoning.

The pattern: compute independent typed scores (size, complexity, risk, breadth) over the task's structured metadata, classify the task into a discriminated union of shapes (`isolated_fix`, `small_cohesive_behavior`, `broad_or_risky`, etc.), find hard blockers (conditions that force a specific pipeline regardless of scores), and return a typed `WorkflowAssessment` with score breakdown, shape, blockers, and a human-readable reason string. No LLM call in the routing path.

Pattern source: `etienne-clone/src/pipeline/orchestrator-workflow-selection.ts` -- `classifyMrShape()`, `assessWorkflowRoute()`, `chooseOrchestratorWorkflow()`. Also: the `OrchestratorWorkflowAvailability` discriminated union ensures the "standard workflow missing" state cannot be constructed -- `assessAvailableOrchestratorWorkflows()` returns `Result<OrchestratorWorkflowAvailability, string>`, so callers only ever hold valid states.

**Philosophy note:** This is the vision's "control flow from data state" principle made concrete: routing decisions derive from an explicit typed state machine over task scores, not from an LLM's implicit reasoning. Exhaustiveness on the shape union (`assertNever` in the confidence function) makes the routing logic refactor-safe. The `WorkflowAssessment` return type (not a bare string) makes every decision traceable without reading session transcripts.

**Done looks like:** The adaptive coordinator dispatches entirely from a `WorkflowAssessment` value produced by a pure TypeScript function over the task's metadata. No LLM call occurs before `runAdaptivePipeline()` selects a mode. A test can assert the routing decision for any task shape without mocking an LLM.

**Things to hash out:**
- What dimensions make sense for WorkTrain tasks? MR review uses size/cohesion/risk/breadth over file diffs. WorkTrain tasks may need different dimensions (specificity, ambiguity, scope breadth, ticket maturity).
- Where does the input data come from? The task candidate (from queue poll) has a title, body, labels, and issue number. Is that enough to score confidently without an LLM?
- Should hard blockers be configurable per-trigger (e.g. "tasks labeled `security` always use the full pipeline") or hardcoded in the assessment function?
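A toy sketch of the routing shape. The dimensions, thresholds, and pipeline mapping here are placeholders -- the right dimensions for WorkTrain tasks are the first hash-out question above, so nothing in this block is settled:

```typescript
// Illustrative dimensions and thresholds -- placeholders, not a proposal.
type TaskShape = 'isolated_fix' | 'small_cohesive_behavior' | 'broad_or_risky';
interface TaskScores { size: number; complexity: number; risk: number; breadth: number }
interface WorkflowAssessment {
  pipeline: 'quick_review' | 'implement' | 'full';
  shape: TaskShape;
  scores: TaskScores;
  blockers: readonly string[];
  reason: string;
}

function assertNever(x: never): never {
  throw new Error(`unreachable: ${JSON.stringify(x)}`);
}

function classifyShape(s: TaskScores): TaskShape {
  if (s.risk >= 3 || s.breadth >= 3) return 'broad_or_risky';
  if (s.size <= 1 && s.complexity <= 1) return 'isolated_fix';
  return 'small_cohesive_behavior';
}

// Exhaustive switch: adding a TaskShape variant fails to compile here.
function pipelineForShape(shape: TaskShape): WorkflowAssessment['pipeline'] {
  switch (shape) {
    case 'isolated_fix': return 'quick_review';
    case 'small_cohesive_behavior': return 'implement';
    case 'broad_or_risky': return 'full';
    default: return assertNever(shape);
  }
}

// Pure routing: no LLM call anywhere on this path.
function assessWorkflowRoute(scores: TaskScores, labels: readonly string[]): WorkflowAssessment {
  // Hard blockers force a pipeline regardless of scores.
  const blockers = labels.includes('security')
    ? ['label:security forces the full pipeline']
    : [];
  const shape = classifyShape(scores);
  const pipeline = blockers.length > 0 ? 'full' : pipelineForShape(shape);
  return { pipeline, shape, scores, blockers, reason: `shape=${shape}, blockers=${blockers.length}` };
}
```

Because routing is a pure function, any task shape's decision can be asserted in a unit test with no mocks.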
---

### Required `reason` field on coordinator signals: typed audit trail for headless gate decisions (May 7, 2026)

**Status: idea** | Priority: high

**Score: 11** | Cor:2 Cap:2 Eff:3 Lev:2 Con:3 | Blocked: no

When an agent calls `signal_coordinator` with `kind: 'approval_needed'` or `kind: 'blocked'`, the coordinator receives a signal but not necessarily the agent's reasoning. The coordinator (and operator) must read the full session transcript to understand why the agent requested approval. At scale -- dozens of sessions per day -- transcript reading is impractical. Signals without stated reasoning are unauditable.

Vision principle: "every decision visible in the session store." A signal that doesn't include the agent's stated reasoning violates this -- the decision (to pause and request approval) is visible, but the reason for it is buried in prose.

The fix is structural: make `reason` a required field on the `approval_needed` and `blocked` signal kinds. The engine or coordinator rejects signals that omit it. This is the same pattern as `SelfConfirmEvent.validationReason` in `etienne-clone/src/types.ts`, where the comment reads: "`validationReason` is REQUIRED -- it's the audit trail for headless gates."

Pattern source: `etienne-clone/src/types.ts` `SelfConfirmEvent` -- `readonly validationReason: string` is required (not optional) on the self-confirm event interface.

**Philosophy note:** "Errors are data" applies to under-specified signals too. A signal with no stated reason is incomplete data -- the receiver cannot act on it deterministically. Making `reason` required at the schema level turns a runtime ambiguity into a compile-time constraint. Capability-based: the signal type declares what information it carries, enforced by the schema, not by convention.

**Done looks like:** `CoordinatorSignalKindSchema` for `approval_needed` and `blocked` includes `reason: z.string().min(10)`. The `signal_emitted` daemon event includes the reason. `worktrain inbox` shows it. Coordinators can filter and route based on `reason` text without reading session transcripts.
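The entry proposes enforcing this in the zod schema; the same constraint can be sketched with a plain validator (hand-rolled `Result`, signal kinds reduced to three for illustration -- WorkTrain's real kind set is larger):

```typescript
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

type CoordinatorSignal =
  | { kind: 'approval_needed'; reason: string }
  | { kind: 'blocked'; reason: string }
  | { kind: 'progress' };

function parseSignal(raw: unknown): Result<CoordinatorSignal, string> {
  if (typeof raw !== 'object' || raw === null) return err('signal must be an object');
  const r = raw as Record<string, unknown>;
  const kind = r.kind;
  if (kind === 'approval_needed' || kind === 'blocked') {
    // reason is REQUIRED for gate signals -- the audit trail for headless gates.
    if (typeof r.reason !== 'string' || r.reason.trim().length < 10) {
      return err(`'${kind}' signal requires a reason of at least 10 characters`);
    }
    return ok<CoordinatorSignal>({ kind, reason: r.reason });
  }
  if (kind === 'progress') return ok<CoordinatorSignal>({ kind: 'progress' });
  return err(`unknown signal kind: ${String(kind)}`);
}
```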
---

### Typed finding extraction with multi-strategy fallback: enforce structured output contracts at coordinator boundaries (May 7, 2026)

**Status: idea** | Priority: high

**Score: 12** | Cor:3 Cap:2 Eff:2 Lev:2 Con:2 | Blocked: no

WorkTrain's coordinators read structured phase handoff artifacts (review verdicts, discovery summaries, shaped pitches) from session output. Today, if an agent doesn't produce a clean JSON handoff, the coordinator either fails hard or reads free text that it can't reliably parse. There is no graceful degradation ladder for malformed or missing structured output.

Vision principle: "structured outputs at every boundary" and "typed contracts make phases composable." This requires not just that the engine validates artifacts when they're present, but that the coordinator has a systematic recovery strategy when they aren't -- without silently accepting garbage.

The pattern (from `etienne-clone/src/findings/extract.ts`):
1. Try the ideal path: find the typed artifact in the last `complete_step` output
2. Fallback: parse a JSON block from the agent's final text response, normalizing field names and casing
3. Fallback: reconstruct from observable side effects (e.g. tool calls that imply what the agent concluded)
4. All paths return `Result<T, ExtractionError>` with a typed error union (`no_handoff_output`, `invalid_json`, `schema_mismatch`) -- never throw, never silently accept

The normalization layer (`normalizeRawFindingsOutput`) handles the practical reality that agents produce `"APPROVE"` when the schema says `"approve"`, or `reviewFindings` when the field should be `findings`. This is boundary validation done right.

**Philosophy note:** "Validate at boundaries, trust inside" -- this is exactly the boundary. The coordinator trusts the extracted artifact once it passes validation; it never trusts raw agent output. "Errors are data" -- `ExtractionError` is a typed discriminated union that lets coordinators route on failure kind, not parse error messages.

**Done looks like:** Every coordinator phase that reads a structured handoff uses an `extractHandoff<T>()` function that applies the three-strategy fallback chain and returns `Result<T, HandoffExtractionError>`. The coordinator handles `err` cases explicitly -- escalate to operator, retry the phase, or degrade gracefully -- rather than crashing or silently accepting bad output.

**Things to hash out:**
- Strategy 3 (reconstruct from tool calls) is highly session-type-specific. Should each coordinator define its own reconstruction strategy, or is there a generic fallback (e.g. "use the last `complete_step` notes as free text")?
- Should extraction errors produce a `report_issue` record automatically, or is that the coordinator's responsibility?
---

### `InteractionIntent` discriminated union: platform-neutral operator decision model (May 7, 2026)

**Status: idea** | Priority: high

**Score: 12** | Cor:2 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no

WorkTrain has no typed model for operator decisions on in-flight pipeline outcomes -- approving a PR, unblocking a stuck session, dropping an escalated finding, confirming an interpretation. Today, operator actions arrive as CLI commands (`worktrain tell`) or console HTTP calls, but the domain has no typed representation of what the operator actually decided. The coordinator cannot route on decision kind without parsing free text.

The pattern (from `etienne-clone/src/messaging/types.ts`): `InteractionIntent` is an exhaustive discriminated union of everything an operator can decide -- `approve_all`, `skip`, `post_finding`, `drop_finding`, `edit_finding`, `batch_post`, `batch_drop`. Platform adapters (Slack buttons, console HTTP, CLI) translate operator input into these intents. The domain handler switches on them exhaustively with `assertNever`. Neither layer knows about the other.

This directly addresses WorkTrain's open gap around human-in-the-loop for critical findings and stuck-session escalation: the console or CLI emits a typed `OperatorIntent`, and the coordinator handles it the same way regardless of channel.

Pattern source: `etienne-clone/src/messaging/types.ts` `InteractionIntent` + `etienne-clone/src/messaging/review-handler.ts` `createReviewInteractionHandler()`.

**Philosophy note:** "Make illegal states unrepresentable" -- a bare string `action=approve` can carry anything; a typed `{ kind: 'approve_all'; sessionId }` cannot. "Exhaustiveness everywhere" -- the switch on `intent.kind` with `assertNever` ensures that adding a new decision kind forces every handler to be updated. "Capability-based" -- the operator is given exactly the decisions they can make, enforced by the type, not by convention.

**Done looks like:** An `OperatorIntent` discriminated union covers the decisions WorkTrain surfaces to operators: `approve_pr`, `block_pr`, `unblock_session`, `drop_escalation`, `confirm_interpretation`. Console and CLI translate input into `OperatorIntent` values. The coordinator handler switches on them. A test can assert coordinator behavior for any intent without mocking a UI.

**Things to hash out:**
- Which decisions should be in scope for v1? Not all decisions need to be interactive -- many can be autonomous. The union should cover only the cases where human judgment is genuinely required.
- Should `OperatorIntent` carry a `reason?: string` (optional for human input vs required for synthetic gates)? Or separate types for human vs synthetic decisions?
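A sketch of the proposed `OperatorIntent` union with an exhaustive handler; the variant payloads are assumptions, and the handler body is a stand-in for real coordinator logic:

```typescript
// Proposed variants from the entry above; payload fields are assumed.
type OperatorIntent =
  | { kind: 'approve_pr'; sessionId: string }
  | { kind: 'block_pr'; sessionId: string; reason: string }
  | { kind: 'unblock_session'; sessionId: string }
  | { kind: 'drop_escalation'; escalationId: string }
  | { kind: 'confirm_interpretation'; sessionId: string };

function assertNever(x: never): never {
  throw new Error(`unhandled intent: ${JSON.stringify(x)}`);
}

// Channel-neutral handler: Slack buttons, console HTTP, and the CLI all
// translate operator input into OperatorIntent before reaching this point.
// Adding a variant above makes this switch fail to compile until handled.
function describeIntent(intent: OperatorIntent): string {
  switch (intent.kind) {
    case 'approve_pr': return `approve PR for session ${intent.sessionId}`;
    case 'block_pr': return `block PR for session ${intent.sessionId}: ${intent.reason}`;
    case 'unblock_session': return `unblock session ${intent.sessionId}`;
    case 'drop_escalation': return `drop escalation ${intent.escalationId}`;
    case 'confirm_interpretation': return `confirm interpretation for ${intent.sessionId}`;
    default: return assertNever(intent);
  }
}
```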
---

### `PendingDecision` store with pure state transitions: immutable approval state for operator-gated pipeline actions (May 7, 2026)

**Status: idea** | Priority: high

**Score: 11** | Cor:2 Cap:2 Eff:2 Lev:2 Con:2 | Blocked: yes (needs InteractionIntent above)

When a WorkTrain pipeline reaches a point requiring operator approval -- a critical finding before merge, an interpretation confirmation before coding, a PR before auto-merge -- it currently either blocks indefinitely (waiting for an outbox message) or skips the gate. There is no structured in-memory state tracking what's pending, what the operator decided, and what the disposition of each item is.

The pattern (from `etienne-clone/src/messaging/pending-reviews.ts`): `PendingReview` is a fully immutable snapshot with `approvedFindings: ReadonlySet<FindingId>`, `droppedFindings: ReadonlySet<FindingId>`, `editedBodies: ReadonlyMap<FindingId, string>`, and `status: 'pending' | 'posted' | 'skipped'`. All state transitions are pure functions (in WorkTrain terms, `(state: PendingDecision) => PendingDecision`) composed with `store.update(id, fn)` -- `approveFinding`, `dropFinding`, `editFinding`, `approveAllBySeverity`. The store is a thin `Map` wrapper with `add`, `get`, `update(id, fn)`, `cleanup(olderThanMs)`.

Pattern source: `etienne-clone/src/messaging/pending-reviews.ts`.

**Philosophy note:** "Derive state, don't accumulate it" -- each transition is a pure function over the current state, not an imperative mutation. `ReadonlySet` and `ReadonlyMap` enforce immutability at the type level. "Single source of state truth" -- the `PendingDecision` store is the one place that tracks what an operator has decided; no parallel flags or session-level booleans.

**Done looks like:** A `PendingDecisionStore` in `src/coordinator/pending-decisions.ts`. Pipeline sessions that reach an operator gate call `pendingDecisions.add(decision)`. The coordinator polls or subscribes and calls `pendingDecisions.update(id, applyIntent(intent))` when the operator acts. `cleanup(24 * 60 * 60 * 1000)` runs at startup to expire stale pending decisions.
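A sketch of the adapted store: an immutable snapshot, pure transition functions, and a thin `Map` wrapper. Field names are WorkTrain-flavored guesses at the port, not the etienne-clone originals:

```typescript
type FindingId = string;

// Fully immutable snapshot: transitions return a new value, never mutate.
interface PendingDecision {
  readonly id: string;
  readonly approved: ReadonlySet<FindingId>;
  readonly dropped: ReadonlySet<FindingId>;
  readonly status: 'pending' | 'posted' | 'skipped';
  readonly createdAt: number;
}

// Every transition is (state) => state, composable via store.update.
const approveFinding = (f: FindingId) => (d: PendingDecision): PendingDecision =>
  ({ ...d, approved: new Set<FindingId>([...d.approved, f]) });
const dropFinding = (f: FindingId) => (d: PendingDecision): PendingDecision =>
  ({ ...d, dropped: new Set<FindingId>([...d.dropped, f]) });

// Thin Map wrapper: add, get, update(id, fn), cleanup(olderThanMs).
class PendingDecisionStore {
  private readonly byId = new Map<string, PendingDecision>();
  add(d: PendingDecision): void { this.byId.set(d.id, d); }
  get(id: string): PendingDecision | undefined { return this.byId.get(id); }
  update(id: string, fn: (d: PendingDecision) => PendingDecision): void {
    const current = this.byId.get(id);
    if (current !== undefined) this.byId.set(id, fn(current));
  }
  cleanup(olderThanMs: number, now = Date.now()): void {
    for (const [id, d] of this.byId) {
      if (now - d.createdAt > olderThanMs) this.byId.delete(id);
    }
  }
}
```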
---

### `BriefingDelivery` + `InteractionPort` ports: hexagonal adapter contract for operator notification channels (May 7, 2026)

**Status: idea** | Priority: medium

**Score: 10** | Cor:1 Cap:3 Eff:2 Lev:2 Con:2 | Blocked: yes (needs InteractionIntent above)

WorkTrain's notification path (when it ships) will need to deliver pipeline results to operators and receive their decisions back. If the notification logic is coupled to a specific channel (Slack, console, CLI), adding a second channel requires duplicating the entire delivery-plus-interaction wiring. There is no current abstraction that separates "what to deliver" from "how to deliver it."

The pattern (from `etienne-clone/src/messaging/port.ts`): two tiny focused interfaces -- `BriefingDelivery` (domain → adapter: `deliverBriefing`, `updateStatus`, `markCompleted`, `notifySkipped`) and `InteractionPort` (adapter → domain: `start`, `stop`, `onInteraction`). The adapter owns all layout decisions (threads, message IDs, button rendering). The domain never sees channel-specific types. Any channel -- Slack, console, webhook, CLI -- implements the same two interfaces.

Pattern source: `etienne-clone/src/messaging/port.ts`.

**Philosophy note:** "Keep interfaces small and focused" -- two interfaces with a handful of methods each, and no leakage of platform details into the domain. "Dependency injection for boundaries" -- the coordinator receives `BriefingDelivery` and `InteractionPort` as injected dependencies; swapping Slack for the console is a constructor-argument change. "Capability-based architecture" -- the coordinator can only do what the `BriefingDelivery` interface exposes, not arbitrary channel operations.

**Done looks like:** `PipelineDelivery` and `OperatorInteractionPort` interfaces in `src/coordinator/ports.ts`. The console HTTP adapter and the CLI adapter each implement both. The coordinator takes them as constructor injections. Adding a Slack adapter requires only implementing the two interfaces, with no coordinator changes.
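A sketch of the two port interfaces with a trivial in-memory adapter standing in for a real channel. Method names follow the entry; the `Briefing` payload and intent shape are placeholders:

```typescript
// Placeholder payload -- the real briefing type is undecided.
interface Briefing { sessionId: string; summary: string }

// Domain → adapter: what to deliver, never how.
interface PipelineDelivery {
  deliverBriefing(briefing: Briefing): Promise<void>;
  updateStatus(sessionId: string, status: string): Promise<void>;
  markCompleted(sessionId: string): Promise<void>;
  notifySkipped(sessionId: string, reason: string): Promise<void>;
}

// Adapter → domain: operator decisions flow back as typed intents.
interface OperatorInteractionPort {
  start(): void;
  stop(): void;
  onInteraction(handler: (intent: { kind: string }) => void): void;
}

// Any channel -- Slack, console, webhook, CLI -- implements the same pair.
// This in-memory adapter just records calls, which is also how the
// coordinator can be tested without any real channel.
class InMemoryDelivery implements PipelineDelivery {
  readonly log: string[] = [];
  async deliverBriefing(b: Briefing) { this.log.push(`briefing ${b.sessionId}: ${b.summary}`); }
  async updateStatus(id: string, status: string) { this.log.push(`status ${id}: ${status}`); }
  async markCompleted(id: string) { this.log.push(`completed ${id}`); }
  async notifySkipped(id: string, reason: string) { this.log.push(`skipped ${id}: ${reason}`); }
}
```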
---

### `CorrectionEvent` with typed `correctionType`: structured learning from operator edits to pipeline outputs (May 7, 2026)

**Status: idea** | Priority: medium

**Score: 9** | Cor:1 Cap:2 Eff:2 Lev:2 Con:2 | Blocked: yes (needs PendingDecision store above)

When an operator edits a WorkTrain output -- rewrites a PR description, changes a finding severity, adjusts a summary -- that edit is currently invisible to the system. The session event log records that the pipeline ran; it does not record what the operator thought was wrong with the output. This is the core gap in the per-run retrospective backlog item: there is no structured way to capture what the operator corrected.

The pattern (from `etienne-clone/src/messaging/review-handler.ts` `emitCorrection()`): when an operator edits a finding, a `CorrectionEvent` is emitted with a typed `correctionType: { textChanged, severityChanged, locationChanged, principleChanged, dropped }`. The event also carries `{ original, edited, context }`. Repeated correction patterns across sessions become training signal for improving prompts and workflows.

Pattern source: `etienne-clone/src/messaging/review-handler.ts` `emitCorrection()` + `etienne-clone/src/types.ts` `CorrectionEvent`.

**Philosophy note:** "Observability as a constraint" -- operator corrections are first-class events in the session store, not manual notes. "Errors are data" -- a correction is structured data about what the system got wrong, not a free-text annotation. The typed `correctionType` makes the correction machine-readable without LLM parsing.

**Done looks like:** A `PipelineCorrectionEvent` added to the `DaemonEvent` union. When an operator edits a WorkTrain output via the `PendingDecision` flow, the correction is classified and emitted. `worktrain logs` surfaces corrections. A future analytics pass can aggregate correction patterns per workflow and per step to identify systematic output-quality gaps.
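A sketch of classifying a correction by diffing the original against the edited output. The `Finding` shape and the subset of `correctionType` flags shown here are illustrative:

```typescript
// Illustrative finding shape -- not WorkTrain's real type.
interface Finding { body: string; severity: 'low' | 'high'; location: string }

interface CorrectionType {
  textChanged: boolean;
  severityChanged: boolean;
  locationChanged: boolean;
  dropped: boolean;
}

interface CorrectionEvent {
  kind: 'pipeline_correction';
  correctionType: CorrectionType;
  original: Finding;
  edited: Finding | null; // null means the operator dropped the finding
}

// Machine-readable classification: no LLM parsing, just field-level diffing.
function classifyCorrection(original: Finding, edited: Finding | null): CorrectionEvent {
  const correctionType: CorrectionType = edited === null
    ? { textChanged: false, severityChanged: false, locationChanged: false, dropped: true }
    : {
        textChanged: edited.body !== original.body,
        severityChanged: edited.severity !== original.severity,
        locationChanged: edited.location !== original.location,
        dropped: false,
      };
  return { kind: 'pipeline_correction', correctionType, original, edited };
}
```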
---

### Agents must not perform delivery actions -- only the coordinator's delivery layer can (Apr 30, 2026)

**Status: idea** | Priority: high

@@ -5615,6 +5142,23 @@ The agent is expensive, inconsistent, and slow. Scripts are free, deterministic,

---

### Branch dependency tracking: prevent accidental stacking and handle intentional stacking correctly (May 8, 2026)

**Status: idea** | Priority: high

**Score: 12** | Cor:3 Cap:2 Eff:2 Lev:3 Con:2 | Blocked: no

WorkTrain creates branches and opens PRs without tracking whether a branch was created from main or from another in-flight PR branch. When a branch is accidentally based on a pending PR branch, two problems follow: (1) squash-merging the base PR absorbs the dependent PR's commits, leaving the dependent PR either empty or conflicted; (2) CI on the dependent PR tests the combined diff, not just the intended change. This has already caused real merge failures and required manual rebases. When stacking is intentional (PR B genuinely depends on PR A), WorkTrain has no mechanism to enforce merge order or automatically rebase B after A lands.

**Things to hash out:**
- Should WorkTrain enforce "always branch from main" as a hard rule, or support intentional stacking with explicit dependency metadata?
- If stacking is allowed, what is the right representation for a stack dependency -- a field in the session store, a git note, or a GitHub PR relationship?
- When the base PR merges, who is responsible for rebasing dependents -- the coordinator, a post-merge hook, or a separate `worktrain rebase` command?
- One untested candidate for the accidental case: detect at branch-creation time whether HEAD is on a pending PR branch, and warn or block. This would be wrong for the intentional case -- some work genuinely depends on an unmerged PR, and the stack is correct -- so the distinction needs to be explicit, not inferred.
- What should happen to a dependent branch mid-session when its base merges? Should the daemon interrupt the session, or let it finish and rebase at the end?
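The branch-creation-time detection candidate from the hash-out list could be sketched as a pure decision function. Everything here is an untested assumption: the inputs (base branch name, open-PR map) would have to be gathered elsewhere, e.g. from git and the PR list:

```typescript
interface BranchBaseCheck {
  verdict: 'ok' | 'warn_stacked';
  baseBranch: string;
  pendingPr?: number; // set only when the base branch has an open PR
}

// Pure check: is the new branch based on a branch that has a pending PR?
// Deliberately only warns -- whether a stack is accidental or intentional
// cannot be inferred here, per the hash-out discussion above.
function checkBranchBase(
  baseBranch: string,
  openPrByBranch: ReadonlyMap<string, number>, // branch name -> PR number
  mainBranch = 'main',
): BranchBaseCheck {
  if (baseBranch === mainBranch) return { verdict: 'ok', baseBranch };
  const pendingPr = openPrByBranch.get(baseBranch);
  return pendingPr === undefined
    ? { verdict: 'ok', baseBranch } // based on some other branch, not a pending PR
    : { verdict: 'warn_stacked', baseBranch, pendingPr };
}
```

Keeping the decision pure means the git/PR plumbing can change without touching the policy, and the policy is testable without a repository.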
---

### Worktree and branch lifecycle management

WorkTrain has no tooling to surface the state of worktrees and branches relative to main. Doing this manually today requires running git commands across every registered worktree, cross-referencing merged PR lists, and inspecting each branch's unique commits to determine whether the work landed. Pain points observed in practice:
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@exaudeus/workrail",
-  "version": "3.79.2",
+  "version": "3.80.0",
   "description": "Step-by-step workflow enforcement for AI agents via MCP",
   "license": "MIT",
   "repository": {