@exaudeus/workrail 3.32.0 → 3.33.0
- package/dist/cli/commands/index.d.ts +1 -0
- package/dist/cli/commands/index.js +3 -1
- package/dist/cli/commands/worktrain-await.js +11 -9
- package/dist/cli/commands/worktrain-daemon-install.d.ts +35 -0
- package/dist/cli/commands/worktrain-daemon-install.js +291 -0
- package/dist/cli/commands/worktrain-daemon.d.ts +31 -0
- package/dist/cli/commands/worktrain-daemon.js +272 -0
- package/dist/cli/commands/worktrain-spawn.js +11 -9
- package/dist/cli-worktrain.js +329 -0
- package/dist/cli.js +1 -22
- package/dist/console/standalone-console.d.ts +28 -0
- package/dist/console/standalone-console.js +142 -0
- package/dist/{console/assets/index-Cb_LO718.js → console-ui/assets/index-BuJFLLfY.js} +1 -1
- package/dist/{console → console-ui}/index.html +1 -1
- package/dist/daemon/agent-loop.d.ts +26 -0
- package/dist/daemon/agent-loop.js +39 -1
- package/dist/daemon/daemon-events.d.ts +47 -1
- package/dist/daemon/workflow-runner.d.ts +3 -2
- package/dist/daemon/workflow-runner.js +205 -41
- package/dist/infrastructure/session/HttpServer.js +133 -34
- package/dist/manifest.json +118 -62
- package/dist/mcp/output-schemas.d.ts +30 -30
- package/dist/mcp/transports/bridge-events.d.ts +4 -0
- package/dist/mcp/transports/fatal-exit.js +4 -0
- package/dist/mcp/transports/http-entry.js +2 -0
- package/dist/mcp/transports/stdio-entry.js +26 -6
- package/dist/mcp/v2/tools.d.ts +4 -4
- package/dist/trigger/adapters/github-poller.d.ts +44 -0
- package/dist/trigger/adapters/github-poller.js +190 -0
- package/dist/trigger/adapters/gitlab-poller.d.ts +27 -0
- package/dist/trigger/adapters/gitlab-poller.js +81 -0
- package/dist/trigger/index.d.ts +4 -1
- package/dist/trigger/index.js +5 -1
- package/dist/trigger/polled-event-store.d.ts +22 -0
- package/dist/trigger/polled-event-store.js +173 -0
- package/dist/trigger/polling-scheduler.d.ts +20 -0
- package/dist/trigger/polling-scheduler.js +249 -0
- package/dist/trigger/trigger-listener.d.ts +3 -0
- package/dist/trigger/trigger-listener.js +47 -3
- package/dist/trigger/trigger-store.js +114 -33
- package/dist/trigger/types.d.ts +17 -1
- package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +224 -224
- package/dist/v2/durable-core/schemas/session/events.d.ts +42 -42
- package/dist/v2/durable-core/schemas/session/manifest.d.ts +6 -6
- package/dist/v2/durable-core/schemas/session/validation-event.d.ts +2 -2
- package/dist/v2/durable-core/tokens/payloads.d.ts +52 -52
- package/dist/v2/usecases/console-routes.js +3 -3
- package/dist/v2/usecases/console-service.js +133 -9
- package/dist/v2/usecases/console-types.d.ts +7 -0
- package/docs/design/daemon-conversation-logging-plan.md +98 -0
- package/docs/design/daemon-conversation-logging-review.md +55 -0
- package/docs/design/daemon-conversation-logging.md +129 -0
- package/docs/design/github-polling-adapter-design-candidates.md +226 -0
- package/docs/design/github-polling-adapter-design-review-findings.md +131 -0
- package/docs/design/github-polling-adapter-implementation-plan.md +284 -0
- package/docs/design/implementation_plan.md +192 -0
- package/docs/design/workflow-id-validation-at-startup.md +146 -0
- package/docs/design/workflow-id-validation-design-review.md +87 -0
- package/docs/design/workflow-id-validation-implementation-plan.md +185 -0
- package/docs/design/worktrain-system-prompt-report-issue-candidates.md +135 -0
- package/docs/design/worktrain-system-prompt-report-issue-design-review.md +73 -0
- package/docs/ideas/backlog.md +361 -0
- package/package.json +1 -1
- package/workflows/architecture-scalability-audit.json +1 -1
- package/workflows/bug-investigation.agentic.v2.json +3 -3
- package/workflows/coding-task-workflow-agentic.json +32 -32
- package/workflows/coding-task-workflow-agentic.lean.v2.json +1 -1
- package/workflows/coding-task-workflow-agentic.v2.json +7 -7
- package/workflows/mr-review-workflow.agentic.v2.json +21 -12
- package/workflows/personal-learning-materials-creation-branched.json +2 -2
- package/workflows/production-readiness-audit.json +1 -1
- package/workflows/relocation-workflow-us.json +2 -2
- package/workflows/ui-ux-design-workflow.json +14 -14
- package/workflows/workflow-for-workflows.json +3 -3
- package/workflows/workflow-for-workflows.v2.json +2 -2
- package/workflows/wr.discovery.json +1 -1
- package/dist/{console → console-ui}/assets/index-8dh0Psu-.css +0 -0
|
@@ -0,0 +1,135 @@
|
|
|
1
|
+
# WorkTrain: System Prompt Preamble + report_issue Tool -- Design Candidates
|
|
2
|
+
|
|
3
|
+
## Problem Understanding
|
|
4
|
+
|
|
5
|
+
### Core Tensions
|
|
6
|
+
|
|
7
|
+
1. **Preamble richness vs. existing test assertions** -- The current `buildSystemPrompt()` preamble is ~15 lines (identity, tools, execution contract, session state). The new spec replaces it with ~55 lines covering oracle hierarchy, self-directed reasoning, workflow-as-contract, silent failure, and tools-as-hands. The existing tests assert that specific strings (`'You are WorkRail Auto'`, `'## Your tools'`, `'## Execution contract'`) are present, so the new content must include those strings or the tests break.
|
|
8
|
+
|
|
9
|
+
2. **report_issue write durability vs. fire-and-forget contract** -- The JSONL file must be written reliably enough for a future coordinator to read it, but a disk-full or permission error must never interrupt the agent session. Resolution: same void+catch pattern as `DaemonEventEmitter`.
|
|
10
|
+
|
|
11
|
+
3. **Purity of buildSystemPrompt vs. tool name references in prose** -- The new preamble mentions `report_issue` by name in the "silent failure" section. If the tool name ever changes, the system prompt text becomes stale. This is an accepted documentation-drift risk; tool names are stable identifiers.
|
|
12
|
+
|
|
13
|
+
4. **YAGNI (no coordinator yet) vs. extractability (coordinator coming soon)** -- The auto-fix coordinator that will read the issue JSONL doesn't exist. Building a dedicated `IssueStore` class now is speculative. However, the file write must still work for the coordinator when it arrives.
|
|
14
|
+
|
|
15
|
+
### Likely Seam
|
|
16
|
+
|
|
17
|
+
- **Preamble:** `buildSystemPrompt()` lines 1086-1108 in `src/daemon/workflow-runner.ts`. This is the correct seam -- pure function, all callers go through it.
|
|
18
|
+
- **report_issue tool:** The tools array in `runWorkflow()` at lines 1318-1323. New tool factory `makeReportIssueTool()` slots in here.
|
|
19
|
+
- **DaemonEvent union:** `src/daemon/daemon-events.ts` -- add `IssueReportedEvent` + extend union.
|
|
20
|
+
|
|
21
|
+
### What Makes This Hard
|
|
22
|
+
|
|
23
|
+
- Existing test assertions constrain the new preamble content -- `'## Your tools'` and `'## Execution contract'` must survive the rewrite.
|
|
24
|
+
- `report_issue.execute()` must be fire-and-forget for the JSONL write. The naive implementation would `await` the write directly and let I/O errors propagate into the agent session.
|
|
25
|
+
- `IssueReportedEvent` fields must be typed as literal union strings (not `string`), otherwise illegal kind/severity values are representable at compile time.
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Philosophy Constraints
|
|
30
|
+
|
|
31
|
+
From `/Users/etienneb/CLAUDE.md` and `~/.workrail/daemon-soul.md`:
|
|
32
|
+
|
|
33
|
+
- **Exhaustiveness everywhere** -- `DaemonEvent` must remain a discriminated union; new variant must follow the pattern
|
|
34
|
+
- **Make illegal states unrepresentable** -- `kind` and `severity` on IssueReportedEvent must be literal unions, not strings
|
|
35
|
+
- **YAGNI with discipline** -- don't build IssueStore class until the coordinator needs it
|
|
36
|
+
- **Observability as a constraint** -- fire-and-forget writes must never block correctness
|
|
37
|
+
- **Document 'why' not 'what'** -- JSDoc on BASE_SYSTEM_PROMPT explaining why it's a constant
|
|
38
|
+
- **Immutability by default** -- all DaemonEvent interfaces use `readonly` fields
|
|
39
|
+
- **Functional core, imperative shell** -- buildSystemPrompt is pure; JSONL write is at the shell boundary
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
## Impact Surface
|
|
44
|
+
|
|
45
|
+
- `tests/unit/workflow-runner-system-prompt.test.ts` -- 3 assertions on preamble content that must pass after rewrite
|
|
46
|
+
- `tests/unit/daemon-events.test.ts` -- existing event tests; adding a new event kind must not break exhaustiveness checks
|
|
47
|
+
- Any future TypeScript code that does `switch (event.kind)` on `DaemonEvent` -- must handle `'issue_reported'` or get a compile error (desired)
|
|
48
|
+
- `runWorkflow()` tools array -- adding `makeReportIssueTool` here; `sessionId` and `emitter` are already in scope
|
|
49
|
+
- The `BASE_SYSTEM_PROMPT` constant mentions the tool by name -- documentation drift if tool is renamed later
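
The compile-error behavior mentioned for `switch (event.kind)` can be sketched as below. This is a minimal illustration, not the package's actual `DaemonEvent` definition: `SessionStartedEvent`, the `summary` field, the severity values other than `'fatal'`, and the helper names are assumptions made for the example.

```typescript
// Hypothetical event shapes. Only the 'issue_reported' discriminant and the
// readonly / literal-union conventions come from the design notes.
interface SessionStartedEvent {
  readonly kind: 'session_started';
  readonly sessionId: string;
}

interface IssueReportedEvent {
  readonly kind: 'issue_reported';
  readonly sessionId: string;
  // Assumed severity levels; the notes only confirm that 'fatal' exists.
  readonly severity: 'info' | 'warn' | 'error' | 'fatal';
  readonly summary: string;
}

type DaemonEvent = SessionStartedEvent | IssueReportedEvent;

// Accepts only `never`, so it type-checks only when every variant is handled.
function assertNever(value: never): never {
  throw new Error(`Unhandled event: ${JSON.stringify(value)}`);
}

function describeEvent(event: DaemonEvent): string {
  switch (event.kind) {
    case 'session_started':
      return `session ${event.sessionId} started`;
    case 'issue_reported':
      return `[${event.severity}] ${event.summary}`;
    default:
      // Deleting one of the cases above turns this line into a compile error.
      return assertNever(event);
  }
}
```

Because `severity` is a literal union rather than `string`, a call site that passes an unlisted value fails type checking instead of producing an unexpected record at runtime.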
|
|
50
|
+
|
|
51
|
+
---
|
|
52
|
+
|
|
53
|
+
## Candidates
|
|
54
|
+
|
|
55
|
+
### Part 1: New Preamble
|
|
56
|
+
|
|
57
|
+
#### Candidate A -- Inline replacement (no constant)
|
|
58
|
+
- **Summary:** Replace the 4-item preamble lines directly in the `lines` array inside `buildSystemPrompt()`.
|
|
59
|
+
- **Tensions resolved:** Existing tests still pass (required strings included). **Accepted:** Preamble not readable in isolation; harder to document with JSDoc.
|
|
60
|
+
- **Boundary:** Inside `buildSystemPrompt()`, same as today.
|
|
61
|
+
- **Failure mode:** If session state tag position shifts, tests break. Manageable.
|
|
62
|
+
- **Repo pattern:** Departs from the `soul-template.ts` precedent of extracting stable content into a constant, although it matches the current code, which has no named constant.
|
|
63
|
+
- **Gains:** No indirection. **Losses:** Not visible as a document; can't be verified without calling buildSystemPrompt.
|
|
64
|
+
- **Scope:** Too narrow.
|
|
65
|
+
|
|
66
|
+
#### Candidate B -- Named `BASE_SYSTEM_PROMPT` constant (recommended)
|
|
67
|
+
- **Summary:** Extract the static preamble into a module-level `const BASE_SYSTEM_PROMPT: string` defined above `buildSystemPrompt()`. The function uses it as the first element of the lines array. JSDoc explains why it's a constant vs inline.
|
|
68
|
+
- **Tensions resolved:** Preamble visible as a document; existing test assertions pass; honors 'Document why not what'.
|
|
69
|
+
- **Accepted:** Slight indirection; tool name drift risk (prose mentions `report_issue`).
|
|
70
|
+
- **Boundary:** Module-scoped constant, not exported (callers use `buildSystemPrompt`). Dynamic content (session state, soul, workspace) remains in the function.
|
|
71
|
+
- **Failure mode:** If a future author edits `BASE_SYSTEM_PROMPT` and removes `'## Your tools'` or `'## Execution contract'`, tests catch it immediately.
|
|
72
|
+
- **Repo pattern:** Follows soul-template.ts precedent (constant for stable content, function for dynamic assembly).
|
|
73
|
+
- **Gains:** Readable, documentable, testable in isolation. **Losses:** None significant.
|
|
74
|
+
- **Scope:** Best-fit.
|
|
75
|
+
- **Philosophy:** Honors 'Immutability by default' (const), 'Document why not what' (JSDoc), 'Determinism' (pure function unchanged).
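
A rough sketch of the shape Candidate B describes, under stated assumptions: the preamble body is placeholder text (only the three asserted strings come from the notes), and the `PromptInputs` interface is hypothetical, since the real parameters of `buildSystemPrompt()` are not shown here.

```typescript
/**
 * WHY a constant: the static preamble reads as one document and can be
 * checked without assembling a full prompt.
 * Placeholder body; only the three asserted strings come from the notes.
 */
const BASE_SYSTEM_PROMPT: string = [
  'You are WorkRail Auto',
  '',
  '## Your tools',
  '(placeholder: tool list)',
  '',
  '## Execution contract',
  '(placeholder: contract text)',
].join('\n');

// Hypothetical inputs for the dynamic part of the prompt.
interface PromptInputs {
  readonly sessionState: string;
  readonly soul?: string;
}

// Pure function: static preamble first, dynamic content appended after it.
function buildSystemPrompt(inputs: PromptInputs): string {
  const lines: string[] = [BASE_SYSTEM_PROMPT, '', inputs.sessionState];
  if (inputs.soul !== undefined) {
    lines.push('', inputs.soul);
  }
  return lines.join('\n');
}
```

Keeping the constant as the first element means the existing string assertions keep passing as long as the constant itself retains those headings.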
|
|
76
|
+
|
|
77
|
+
---
|
|
78
|
+
|
|
79
|
+
### Part 2: report_issue Tool
|
|
80
|
+
|
|
81
|
+
#### Candidate A -- Inline tool factory with private async helper (recommended)
|
|
82
|
+
- **Summary:** `makeReportIssueTool(sessionId: string, emitter?: DaemonEventEmitter): AgentTool` -- custom inline JSON schema (no schemas param), `execute()` fires a void Promise for the JSONL write (same pattern as DaemonEventEmitter), emits `issue_reported` event, returns string confirmation. A module-level private `appendIssueAsync()` function handles the actual fs writes to keep execute() clean.
|
|
83
|
+
- **Tensions resolved:** Fire-and-forget (never blocks), type-safe kind/severity, YAGNI (no IssueStore class), natural tool-factory shape.
|
|
84
|
+
- **Accepted:** Less isolated unit-testability for the JSONL write path (no dirOverride). Extractable later.
|
|
85
|
+
- **Boundary:** Tool factory in `workflow-runner.ts` alongside make*Tool siblings. Private helper in same file.
|
|
86
|
+
- **Failure mode:** JSONL write fails silently. Accepted -- same contract as DaemonEventEmitter.
|
|
87
|
+
- **Repo pattern:** Follows tool factory pattern exactly (name, description, inputSchema, label, execute).
|
|
88
|
+
- **Gains:** Simple, correct, consistent with existing code. **Losses:** JSONL write not unit-testable in isolation today.
|
|
89
|
+
- **Scope:** Best-fit.
|
|
90
|
+
- **Philosophy:** Honors 'YAGNI with discipline', 'Exhaustiveness everywhere' (literal unions), 'Make illegal states unrepresentable', 'Observability as a constraint'.
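
The factory shape and the fire-and-forget write might look roughly like the sketch below. The `AgentTool` interface, the `summary` input field, the label and description text, and the injected `AppendIssue` function (a stand-in for the private `appendIssueAsync()` helper) are all assumptions; the real definitions are not shown in these notes.

```typescript
// Hypothetical tool shape, following the five fields listed above.
interface AgentTool {
  readonly name: string;
  readonly description: string;
  readonly label: string;
  readonly inputSchema: Record<string, unknown>;
  execute(input: Record<string, unknown>): Promise<string>;
}

// Stand-in for the private write helper, injected so the sketch needs no disk access.
type AppendIssue = (sessionId: string, jsonLine: string) => Promise<void>;

function makeReportIssueTool(sessionId: string, appendIssue: AppendIssue): AgentTool {
  return {
    name: 'report_issue',
    label: 'Report issue',
    description: 'Record a problem without interrupting the session.',
    inputSchema: {
      type: 'object',
      properties: {
        kind: { type: 'string' },
        severity: { type: 'string' },
        summary: { type: 'string' },
      },
      required: ['kind', 'severity', 'summary'],
    },
    async execute(input) {
      const line = JSON.stringify({ ...input, sessionId });
      // WHY void + catch: a failed write must never reject into the agent loop.
      void appendIssue(sessionId, line).catch(() => {});
      return `Issue recorded: ${String(input['summary'])}`;
    },
  };
}
```

Even when the injected writer rejects, `execute()` still resolves with the confirmation string, which is the contract the candidate asks for.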
|
|
91
|
+
|
|
92
|
+
#### Candidate B -- Dedicated `IssueStore` class (parallel to DaemonEventEmitter)
|
|
93
|
+
- **Summary:** Extract a class `IssueStore` with `append(sessionId, issue)` and `dirOverride` for tests. Injected into `makeReportIssueTool`. Separate file or same file.
|
|
94
|
+
- **Tensions resolved:** Full unit-testability with dirOverride, separation of concerns.
|
|
95
|
+
- **Accepted:** Over-engineering for current scope (coordinator doesn't exist). YAGNI violated.
|
|
96
|
+
- **Boundary:** New abstraction layer. Only one caller today.
|
|
97
|
+
- **Failure mode:** Premature abstraction; adds complexity with no current benefit.
|
|
98
|
+
- **Repo pattern:** Matches DaemonEventEmitter exactly -- but DaemonEventEmitter was built when multiple callers needed it immediately.
|
|
99
|
+
- **Gains:** Testable in isolation, clean interface for future coordinator. **Losses:** Speculative complexity today.
|
|
100
|
+
- **Scope:** Too broad (no coordinator in this PR).
|
|
101
|
+
- **Philosophy:** Conflicts with 'YAGNI with discipline'. Honors 'Prefer fakes over mocks'.
|
|
102
|
+
|
|
103
|
+
---
|
|
104
|
+
|
|
105
|
+
## Comparison and Recommendation
|
|
106
|
+
|
|
107
|
+
### Part 1
|
|
108
|
+
Candidate B (BASE_SYSTEM_PROMPT constant) dominates on every axis. Not a close call.
|
|
109
|
+
|
|
110
|
+
### Part 2
|
|
111
|
+
Candidate A (inline tool factory) wins on YAGNI. The only real loss is JSONL write isolation in tests -- acceptable since the write is purely observational (fire-and-forget), not correctness-affecting.
|
|
112
|
+
|
|
113
|
+
**If the auto-fix coordinator were being built in this PR**, Candidate B would be justified. It isn't.
|
|
114
|
+
|
|
115
|
+
---
|
|
116
|
+
|
|
117
|
+
## Self-Critique
|
|
118
|
+
|
|
119
|
+
### Part 1 strongest counter-argument
|
|
120
|
+
'The constant is 55 lines and clutters the file.' Counter: it's in a clearly labeled `## System prompt` section. A 55-line constant is readable. The alternative (55 lines of array items with string literals) is worse. Not a real objection.
|
|
121
|
+
|
|
122
|
+
### Part 2 strongest counter-argument
|
|
123
|
+
'DaemonEventEmitter is a class -- to be consistent, IssueStore should also be a class.' True in the abstract. But DaemonEventEmitter was built when the event stream was a first-class concern. Issue recording is secondary observability for a future feature. The consistency argument applies when there are multiple callers, not one.
|
|
124
|
+
|
|
125
|
+
### Pivot conditions
|
|
126
|
+
- **Part 2:** If the coordinator PR is planned for this sprint, extract IssueStore now to save rework later.
|
|
127
|
+
- **Part 1:** If the BASE_SYSTEM_PROMPT constant is exported and used in tests directly, add it to the test file's import.
|
|
128
|
+
|
|
129
|
+
---
|
|
130
|
+
|
|
131
|
+
## Open Questions for the Main Agent
|
|
132
|
+
|
|
133
|
+
1. Should `BASE_SYSTEM_PROMPT` be exported (for tests to import directly) or left module-private? Current test assertions only check the output of `buildSystemPrompt()`, so private is sufficient.
|
|
134
|
+
2. Should `IssueReportedEvent` include `continueToken` (for coordinator to resume)? The spec says optional -- include it as `readonly continueToken?: string`.
|
|
135
|
+
3. The new preamble's `## Your tools` section should list `report_issue` -- does it also list `Bash`, `Read`, `Write`, `continue_workflow`? Yes, for completeness.
|
|
@@ -0,0 +1,73 @@
|
|
|
1
|
+
# WorkTrain: System Prompt Preamble + report_issue Tool -- Design Review Findings
|
|
2
|
+
|
|
3
|
+
## Tradeoff Review
|
|
4
|
+
|
|
5
|
+
| Tradeoff | Assessment |
|
|
6
|
+
|---|---|
|
|
7
|
+
| Tool name drift: preamble prose mentions `report_issue` | Acceptable. Prose is guidance, not tool registration. Agent uses the tools array. |
|
|
8
|
+
| JSONL write not unit-testable in isolation | **Resolved** by adding `issuesDirOverride?: string` to `makeReportIssueTool`. |
|
|
9
|
+
| No IssueStore class (YAGNI) | Acceptable. Coordinator doesn't exist in this PR. Extractable later. |
|
|
10
|
+
| Fatal severity doesn't abort the loop | Accepted limitation. Tool returns instructional text. Agent is told to stop. Recording the issue is the priority. |
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## Failure Mode Review
|
|
15
|
+
|
|
16
|
+
| Failure Mode | Handling | Risk |
|
|
17
|
+
|---|---|---|
|
|
18
|
+
| Missing issues dir | `mkdir recursive` before appendFile (DaemonEventEmitter pattern) | Low |
|
|
19
|
+
| Preamble loses required test strings | Caught immediately by existing vitest assertions | Low (self-healing) |
|
|
20
|
+
| Agent ignores fatal severity and continues | Instructional return value. Can't force stop from inside tool without violating 'errors are data'. | Medium (accepted) |
|
|
21
|
+
| sessionId is process-local UUID not server ID | Consistent with all other tools in this file. Coordinator correlates via sessionId. | Low |
|
|
22
|
+
|
|
23
|
+
---
|
|
24
|
+
|
|
25
|
+
## Runner-Up / Simpler Alternative Review
|
|
26
|
+
|
|
27
|
+
**Part 1:** Inline preamble (runner-up) offers no advantages over the named constant. Candidate B dominates.
|
|
28
|
+
|
|
29
|
+
**Part 2:** IssueStore class (runner-up) offers `dirOverride` for isolated testing. A hybrid was adopted: add `issuesDirOverride?: string` parameter to `makeReportIssueTool` without extracting a full class. This resolves the testability weakness at negligible complexity cost.
|
|
30
|
+
|
|
31
|
+
**Simpler alternative (no dirOverride):** Technically satisfies acceptance criteria. Rejected because it violates 'prefer fakes over mocks' and departs from the established DaemonEventEmitter precedent for no benefit.
|
|
32
|
+
|
|
33
|
+
---
|
|
34
|
+
|
|
35
|
+
## Philosophy Alignment
|
|
36
|
+
|
|
37
|
+
| Principle | Status |
|
|
38
|
+
|---|---|
|
|
39
|
+
| Immutability by default | Satisfied: `const BASE_SYSTEM_PROMPT`; readonly IssueReportedEvent fields |
|
|
40
|
+
| Make illegal states unrepresentable | Satisfied: `kind` and `severity` are literal union types |
|
|
41
|
+
| Exhaustiveness everywhere | Satisfied: new DaemonEvent variant triggers TS compile error on unhandled switch |
|
|
42
|
+
| Errors are data | Satisfied: report_issue returns string confirmation, never throws |
|
|
43
|
+
| YAGNI with discipline | Satisfied: no IssueStore class |
|
|
44
|
+
| Observability as a constraint | Satisfied: fire-and-forget, never blocks correctness |
|
|
45
|
+
| Prefer fakes over mocks | Satisfied (after hybrid): issuesDirOverride allows temp-dir testing |
|
|
46
|
+
| Document why not what | Satisfied: JSDoc on BASE_SYSTEM_PROMPT constant; WHY comments on void+catch |
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
## Findings
|
|
51
|
+
|
|
52
|
+
### Yellow (low-risk, note for implementer)
|
|
53
|
+
|
|
54
|
+
**Y1:** `BASE_SYSTEM_PROMPT` must include `'You are WorkRail Auto'`, `'## Your tools'`, and `'## Execution contract'` to preserve existing test assertions. The implementer must verify these strings survive the rewrite verbatim.
|
|
55
|
+
|
|
56
|
+
**Y2:** The new preamble's `## Your tools` section should list all 5 tools: `continue_workflow`, `Bash`, `Read`, `Write`, `report_issue`. If any tool is added/removed in the future, the preamble prose becomes stale. Accepted documentation-drift risk.
|
|
57
|
+
|
|
58
|
+
**Y3:** The `issuesDirOverride` parameter changes the signature of `makeReportIssueTool`. Wire it through `runWorkflow()` with `undefined` (production path). A test for the file write path should be added to `tests/unit/`.
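
One way the file write path and its temp-dir test hook could look, as a sketch only. The default directory (`~/.workrail/issues`), the one-file-per-session naming, the `IssueRecord` fields, and the `readIssues` helper are assumptions; the notes do not specify them.

```typescript
import { appendFile, mkdir, readFile } from 'node:fs/promises';
import * as os from 'node:os';
import * as path from 'node:path';

// Assumed default location; the notes do not name the issues directory.
const DEFAULT_ISSUES_DIR = path.join(os.homedir(), '.workrail', 'issues');

// Hypothetical record shape for the example.
interface IssueRecord {
  readonly kind: string;
  readonly severity: string;
  readonly summary: string;
}

// Appends one JSON line per issue to <dir>/<sessionId>.jsonl (assumed naming).
async function appendIssueAsync(
  sessionId: string,
  issue: IssueRecord,
  issuesDirOverride?: string,
): Promise<void> {
  const dir = issuesDirOverride ?? DEFAULT_ISSUES_DIR;
  // With recursive: true, mkdir succeeds even if the directory already exists.
  await mkdir(dir, { recursive: true });
  const line = JSON.stringify({ ...issue, sessionId, ts: Date.now() });
  await appendFile(path.join(dir, `${sessionId}.jsonl`), line + '\n', 'utf8');
}

// Test-side helper: reads a session's issue lines back as parsed objects.
async function readIssues(sessionId: string, dir: string): Promise<unknown[]> {
  const text = await readFile(path.join(dir, `${sessionId}.jsonl`), 'utf8');
  return text.split('\n').filter((l) => l !== '').map((l) => JSON.parse(l));
}
```

A unit test can pass a temporary directory as `issuesDirOverride`, while the production call in `runWorkflow()` passes `undefined` and falls through to the default.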
|
|
59
|
+
|
|
60
|
+
---
|
|
61
|
+
|
|
62
|
+
## Recommended Revisions
|
|
63
|
+
|
|
64
|
+
1. **Add `issuesDirOverride?: string` to `makeReportIssueTool`** -- adopt the hybrid over inline-only. (Already decided above.)
|
|
65
|
+
2. **Use `void appendIssueAsync(...).catch(() => {})` idiom** -- identical to DaemonEventEmitter._append; don't await.
|
|
66
|
+
3. **IssueReportedEvent fields:** Use `readonly` on all fields. The tool's input schema names the issue type `kind` (per spec), but `kind` is also the event discriminant (`kind: 'issue_reported'`). Rename the payload field to `issueKind` in `IssueReportedEvent` to avoid the collision, while keeping the JSON input schema field as `kind`.
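
The `kind` / `issueKind` split in revision 3 could be sketched like this. The literal values for issue kinds and severities (other than `'fatal'`), the `summary` field, and the `toIssueReportedEvent` helper are illustrative assumptions.

```typescript
// Assumed literal values; the notes do not enumerate them.
type IssueKind = 'bug' | 'blocked' | 'unexpected_state';
type IssueSeverity = 'info' | 'warn' | 'error' | 'fatal';

// Tool input keeps the spec's field name `kind`.
interface ReportIssueInput {
  readonly kind: IssueKind;
  readonly severity: IssueSeverity;
  readonly summary: string;
}

interface IssueReportedEvent {
  readonly kind: 'issue_reported'; // event discriminant
  readonly issueKind: IssueKind;   // payload field, renamed to avoid the collision
  readonly severity: IssueSeverity;
  readonly summary: string;
  readonly sessionId: string;
}

// Maps tool input to the event, moving the input's `kind` into `issueKind`.
function toIssueReportedEvent(sessionId: string, input: ReportIssueInput): IssueReportedEvent {
  return {
    kind: 'issue_reported',
    issueKind: input.kind,
    severity: input.severity,
    summary: input.summary,
    sessionId,
  };
}
```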
|
|
67
|
+
|
|
68
|
+
---
|
|
69
|
+
|
|
70
|
+
## Residual Concerns
|
|
71
|
+
|
|
72
|
+
- **Fatal severity + agent continuation:** The design cannot enforce stopping. This is a known limitation of the instruction-based approach. The coordinator will need to detect "fatal issue recorded, then agent continued for N more steps" and flag it. Out of scope for this PR.
|
|
73
|
+
- **Issue file format:** The spec says 'JSON line' but doesn't specify the schema. The implementer should include all input fields plus `ts: Date.now()` and `sessionId` for correlation -- consistent with how daemon events are written.
|
package/docs/ideas/backlog.md
CHANGED
|
@@ -4029,3 +4029,364 @@ worktrain logs --format json # machine-readable for scripts
|
|
|
4029
4029
|
3. `worktrain logs` CLI commands (reads files, correlates by sessionId)
|
|
4030
4030
|
4. SSE extension in DaemonConsole for live event streaming
|
|
4031
4031
|
5. Coordinator script subscription to event streams (replaces polling session store)
|
|
4032
|
+
|
|
4033
|
+
---
|
|
4034
|
+
|
|
4035
|
+
### Subagent context packaging: the main agent assumes too much (Apr 17, 2026)
|
|
4036
|
+
|
|
4037
|
+
**The problem:** When a main agent spawns a subagent, the work package it creates is usually too thin. The main agent has rich context from the full conversation -- why this task matters, what was already tried, what constraints were discovered -- but it packages the subagent task as if that context is shared. The subagent gets a one-liner and has to rediscover everything from scratch.
|
|
4038
|
+
|
|
4039
|
+
This is the same problem as a developer handing a junior a vague JIRA ticket instead of a proper brief. The subagent wastes tokens re-deriving what the main agent already knows, or worse, makes wrong assumptions.
|
|
4040
|
+
|
|
4041
|
+
**Where this manifests:**
|
|
4042
|
+
- Coding task subagents that don't know why a specific approach was chosen
|
|
4043
|
+
- MR review subagents that don't know what invariants matter for this codebase
|
|
4044
|
+
- Discovery subagents that re-read files the main agent just read
|
|
4045
|
+
- Fix subagents that don't know what was already tried and failed
|
|
4046
|
+
|
|
4047
|
+
**Three solution directions:**
|
|
4048
|
+
|
|
4049
|
+
**Option A: Better instructions to the main agent (prompt engineering)**
|
|
4050
|
+
Add explicit guidance to the WorkTrain system prompt: "When spawning a subagent, include: (1) what you already know that the subagent won't, (2) what was already tried, (3) why this specific approach was chosen, (4) what constraints or invariants matter, (5) what 'done' looks like." This is the cheapest fix but depends on the main agent reliably following it.
|
|
4051
|
+
|
|
4052
|
+
**Option B: Platform-assisted package creation (structured)**
|
|
4053
|
+
The `worktrain spawn` command (or the `spawn_session` tool) takes a structured work package:
|
|
4054
|
+
```typescript
|
|
4055
|
+
spawnSession({
|
|
4056
|
+
workflowId: 'coding-task-workflow-agentic',
|
|
4057
|
+
goal: '...',
|
|
4058
|
+
context: {
|
|
4059
|
+
whyThisApproach: '...', // what the main agent knows about the decision
|
|
4060
|
+
alreadyTried: [...], // what failed
|
|
4061
|
+
knownConstraints: [...], // invariants the subagent must respect
|
|
4062
|
+
relevantFiles: [...], // files the main agent already read
|
|
4063
|
+
completionCriteria: '...' // what done actually looks like
|
|
4064
|
+
}
|
|
4065
|
+
})
|
|
4066
|
+
```
|
|
4067
|
+
The platform validates that the package is complete before spawning -- missing fields emit a warning or block the spawn. The subagent's system prompt is enriched with this context automatically, without the main agent having to think about how to format it.
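
The completeness check could be sketched as follows. The interface names and `findMissingContextFields` are hypothetical, and the rule for what counts as "missing" (absent, or a blank string) is an assumption; whether the platform warns or blocks is left to the caller.

```typescript
// Field names follow the spawnSession example; the types are assumed.
interface WorkPackageContext {
  readonly whyThisApproach?: string;
  readonly alreadyTried?: readonly string[];
  readonly knownConstraints?: readonly string[];
  readonly relevantFiles?: readonly string[];
  readonly completionCriteria?: string;
}

interface SpawnRequest {
  readonly workflowId: string;
  readonly goal: string;
  readonly context?: WorkPackageContext;
}

const REQUIRED_CONTEXT_FIELDS = [
  'whyThisApproach',
  'alreadyTried',
  'knownConstraints',
  'relevantFiles',
  'completionCriteria',
] as const;

// Returns the names of context fields that are absent or blank.
function findMissingContextFields(request: SpawnRequest): string[] {
  const context: WorkPackageContext = request.context ?? {};
  return REQUIRED_CONTEXT_FIELDS.filter((field) => {
    const value = context[field];
    // An empty array is a valid answer ("nothing tried yet"); a blank string is not.
    return value === undefined || (typeof value === 'string' && value.trim() === '');
  });
}
```

A caller could log the returned names as a warning, or refuse to spawn when the list is non-empty.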
|
|
4068
|
+
|
|
4069
|
+
**Option C: Platform-mediated context transfer (autonomous)**
|
|
4070
|
+
The platform automatically packages context from the spawning session into the child session. When the main agent calls `spawn_session`, the platform reads the current session's step notes and recent advances, synthesizes a context bundle, and injects it into the child's system prompt. No explicit packaging required from the main agent.
|
|
4071
|
+
|
|
4072
|
+
This is the most powerful but also the most complex -- requires the platform to understand what's relevant, not just what's recent.
|
|
4073
|
+
|
|
4074
|
+
**Recommended approach: B + A**
|
|
4075
|
+
Use Option B (structured work package with validation) as the primary mechanism and Option A (better main agent instructions) as a fallback. Option C is a long-term goal, once the knowledge graph and session event stream are queryable enough to synthesize context automatically.
|
|
4076
|
+
|
|
4077
|
+
**The `context` field in the structured package is the key addition.** Today `worktrain spawn` takes `goal`, `workflowId`, `workspacePath`. Adding a structured `context` object that the platform validates and injects gives subagents the brief they need without depending on the main agent to remember to include it.
|
|
4078
|
+
|
|
4079
|
+
**Connection to knowledge graph:** Once the structural knowledge graph is built, `relevantFiles` can be auto-populated from a graph query rather than requiring the main agent to list them. The platform asks "what files are relevant to this goal?" and includes them automatically. This is how the context packaging problem gets solved at scale -- the platform knows what the subagent needs without the main agent having to enumerate it.
|
|
4080
|
+
|
|
4081
|
+
**Session knowledge log (extends Option B):**
|
|
4082
|
+
As the main agent progresses, it continuously appends to a structured `session-knowledge.jsonl` for the session. Not step notes (those are workflow artifacts) -- this is a running record of things that would matter to any agent picking up this work:
|
|
4083
|
+
|
|
4084
|
+
```jsonl
|
|
4085
|
+
{"kind":"decision","summary":"Using execFile not exec for all subprocess calls","reason":"Shell injection risk with user-controlled content","ts":1234567890}
|
|
4086
|
+
{"kind":"user_pushback","summary":"User rejected the polling approach","detail":"Wants webhook-based solution instead","ts":...}
|
|
4087
|
+
{"kind":"relevant_file","path":"src/trigger/trigger-router.ts","why":"Core routing logic, all trigger changes flow through here","ts":...}
|
|
4088
|
+
{"kind":"constraint","summary":"Never modify triggers.yml autonomously","source":"daemon-soul.md","ts":...}
|
|
4089
|
+
{"kind":"tried_and_failed","summary":"Tried npx approach, got version mismatch","detail":"Local build is different from installed package","ts":...}
|
|
4090
|
+
{"kind":"external_ref","url":"https://github.com/...","why":"Design doc for the delivery pattern","ts":...}
|
|
4091
|
+
{"kind":"plan","path":"implementation_plan.md","summary":"3-slice plan for the feature","ts":...}
|
|
4092
|
+
```
|
|
4093
|
+
|
|
4094
|
+
When spawning a subagent, the platform automatically includes the session knowledge log in the work package. The subagent gets the full brief without the main agent having to reconstruct it.
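
Reading the log back for inclusion in a work package might look like the sketch below. The seven kinds come from the example entries above; the `KnowledgeEntry` shape, the choice to skip malformed or unknown lines, and the function name are assumptions.

```typescript
const KNOWLEDGE_KINDS = [
  'decision',
  'user_pushback',
  'relevant_file',
  'constraint',
  'tried_and_failed',
  'external_ref',
  'plan',
] as const;
type KnowledgeKind = (typeof KNOWLEDGE_KINDS)[number];

// Hypothetical parsed shape: kind and ts are lifted out, the rest stays in `fields`.
interface KnowledgeEntry {
  readonly kind: KnowledgeKind;
  readonly ts: number;
  readonly fields: Readonly<Record<string, unknown>>;
}

function parseKnowledgeLog(jsonl: string): KnowledgeEntry[] {
  const entries: KnowledgeEntry[] = [];
  for (const rawLine of jsonl.split('\n')) {
    const line = rawLine.trim();
    if (line === '') continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip malformed lines, e.g. a partially written final line
    }
    if (typeof parsed !== 'object' || parsed === null) continue;
    const { kind, ts, ...fields } = parsed as Record<string, unknown>;
    const kindIndex = (KNOWLEDGE_KINDS as readonly unknown[]).indexOf(kind);
    // Entries with an unknown kind or a non-numeric ts are dropped.
    if (kindIndex === -1 || typeof ts !== 'number') continue;
    entries.push({ kind: KNOWLEDGE_KINDS[kindIndex], ts, fields });
  }
  return entries;
}
```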
|
|
4095
|
+
|
|
4096
|
+
**Blank subagents (intentionally uncontextualized):**
|
|
4097
|
+
Sometimes you explicitly DON'T want context from the main session -- fresh eyes are the point. A hypothesis challenge subagent should challenge the leading hypothesis, not be anchored to it. An adversarial reviewer should find problems without knowing the main agent thinks the approach is sound.
|
|
4098
|
+
|
|
4099
|
+
The `spawn_session` call should have an explicit `context: 'inherit' | 'blank' | 'custom'` field:
|
|
4100
|
+
- `inherit` -- auto-package from session knowledge log (default for most tasks)
|
|
4101
|
+
- `blank` -- no session context injected, subagent starts fresh (for adversarial roles)
|
|
4102
|
+
- `custom` -- explicit structured package (for precise control)
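
One possible encoding of the three modes, sketched as a discriminated union so that `custom` cannot be chosen without a package. The object shape, the `workPackage` field, and `resolveInjectedContext` are assumptions; the entry above only names the three string values.

```typescript
// Hypothetical encoding: the mode is the discriminant, and only 'custom' carries data.
type SpawnContext =
  | { readonly mode: 'inherit' }
  | { readonly mode: 'blank' }
  | { readonly mode: 'custom'; readonly workPackage: string };

// Returns the text to inject into the child session, or undefined for a blank start.
function resolveInjectedContext(
  context: SpawnContext,
  sessionKnowledge: string,
): string | undefined {
  switch (context.mode) {
    case 'inherit':
      return sessionKnowledge;
    case 'blank':
      return undefined;
    case 'custom':
      return context.workPackage;
  }
}
```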
|
|
4103
|
+
|
|
4104
|
+
**Subagent types with specialized system prompts and tools:**
|
|
4105
|
+
|
|
4106
|
+
Different tasks need different cognitive profiles. A subagent type bundles: system prompt, available tools, and context mode:
|
|
4107
|
+
|
|
4108
|
+
| Type | System prompt focus | Tools | Context |
|
|
4109
|
+
|------|---------------------|-------|---------|
|
|
4110
|
+
| `researcher` | Thorough, neutral, evidence-first | Read, Bash (read-only), Glob, Grep | inherit |
|
|
4111
|
+
| `challenger` | Adversarial, finds holes, challenges assumptions | Read, Bash | blank (intentionally unanchored) |
|
|
4112
|
+
| `implementer` | Precise, follows plans, no improvisation | Read, Write, Bash, continue_workflow | inherit |
|
|
4113
|
+
| `reviewer` | Finds bugs, security issues, philosophy violations | Read, Bash | blank |
|
|
4114
|
+
| `verifier` | Confirms claims with evidence, runs commands | Read, Bash | inherit |
|
|
4115
|
+
| `coordinator` | Routes work, reads event streams, dispatches | worktrain_spawn, worktrain_await | inherit |
|
|
4116
|
+
|
|
4117
|
+
The type determines the system prompt variant, not just the tools. A `challenger` gets a system prompt that explicitly says "your job is to find problems, not solve them -- do not offer solutions." A `verifier` gets "do not trust claims without running the commands yourself."
|
|
4118
|
+
|
|
4119
|
+
This is the WorkTrain equivalent of cognitive specialization -- different agents for different modes of thought, not just different tasks. The workflow step can specify which subagent type to spawn: `spawn_session({ type: 'challenger', goal: '...' })`.
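
The type table above could be captured as a typed registry, roughly as follows. The `SubagentProfile` shape and the `profileFor` helper are assumptions; the values are copied from the table, with the researcher's read-only Bash restriction noted only as a comment.

```typescript
type ContextMode = 'inherit' | 'blank';
type SubagentType =
  | 'researcher'
  | 'challenger'
  | 'implementer'
  | 'reviewer'
  | 'verifier'
  | 'coordinator';

interface SubagentProfile {
  readonly promptFocus: string;
  readonly tools: readonly string[];
  readonly context: ContextMode;
}

// Record<SubagentType, ...> forces an entry for every type at compile time.
const SUBAGENT_PROFILES: Record<SubagentType, SubagentProfile> = {
  researcher: {
    promptFocus: 'Thorough, neutral, evidence-first',
    tools: ['Read', 'Bash', 'Glob', 'Grep'], // Bash is read-only for this type
    context: 'inherit',
  },
  challenger: {
    promptFocus: 'Adversarial, finds holes, challenges assumptions',
    tools: ['Read', 'Bash'],
    context: 'blank',
  },
  implementer: {
    promptFocus: 'Precise, follows plans, no improvisation',
    tools: ['Read', 'Write', 'Bash', 'continue_workflow'],
    context: 'inherit',
  },
  reviewer: {
    promptFocus: 'Finds bugs, security issues, philosophy violations',
    tools: ['Read', 'Bash'],
    context: 'blank',
  },
  verifier: {
    promptFocus: 'Confirms claims with evidence, runs commands',
    tools: ['Read', 'Bash'],
    context: 'inherit',
  },
  coordinator: {
    promptFocus: 'Routes work, reads event streams, dispatches',
    tools: ['worktrain_spawn', 'worktrain_await'],
    context: 'inherit',
  },
};

function profileFor(type: SubagentType): SubagentProfile {
  return SUBAGENT_PROFILES[type];
}
```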
|
|
4120
|
+
|
|
4121
|
+
---
|
|
4122
|
+
|
|
4123
|
+
### Workflow-scoped system prompts for subagents (Apr 17, 2026)
|
|
4124
|
+
|
|
4125
|
+
**The idea:** Workflows (and individual steps within them) can declare a `systemPrompt` field that gets injected into subagent sessions spawned by that workflow step. The workflow author encodes the cognitive mode directly rather than describing it in step prose that the agent has to interpret.
|
|
4126
|
+
|
|
4127
|
+
**Why this is the right layer:**
|
|
4128
|
+
The workflow already controls: what steps run, what tools are available, what the output contract is, what assessments are required. The cognitive mode -- how the agent should think -- is a natural extension of that. A workflow that says "run as adversarial challenger" should be able to enforce that at the platform level, not just suggest it in a prompt.
|
|
4129
|
+
|
|
4130
|
+
**Two levels:**
|
|
4131
|
+
|
|
4132
|
+
**1. Workflow-level `systemPrompt`** -- applies to all subagents spawned by this workflow:
|
|
4133
|
+
```json
|
|
4134
|
+
{
|
|
4135
|
+
"id": "mr-review-workflow.agentic.v2",
|
|
4136
|
+
"systemPrompt": "You are an adversarial code reviewer. Your job is to find problems, not validate the approach. Do not offer solutions -- only surface issues with evidence. Treat every claim as unproven until you verify it yourself.",
|
|
4137
|
+
"steps": [...]
|
|
4138
|
+
}
|
|
4139
|
+
```
|
|
4140
|
+
|
|
4141
|
+
**2. Step-level `systemPrompt`** -- overrides the workflow-level prompt for a specific step:
|
|
4142
|
+
```json
|
|
4143
|
+
{
|
|
4144
|
+
"id": "phase-hypothesis-challenge",
|
|
4145
|
+
"systemPrompt": "You are a devil's advocate. For every assumption in the hypothesis, find the strongest counterargument. Do not be balanced -- be adversarial.",
|
|
4146
|
+
"prompt": "Challenge the leading hypothesis..."
|
|
4147
|
+
}
|
|
4148
|
+
```
|
|
4149
|
+
|
|
4150
|
+
**How it composes with the base system prompt:**
|
|
4151
|
+
The final subagent system prompt is assembled in layers:
|
|
4152
|
+
1. WorkTrain base prompt (execution contract, oracle priority, tools)
|
|
4153
|
+
2. Workflow-level `systemPrompt` (cognitive mode for this workflow)
|
|
4154
|
+
3. Step-level `systemPrompt` (cognitive override for this step)
|
|
4155
|
+
4. Soul file (operator behavioral rules)
|
|
4156
|
+
5. AGENTS.md / workspace context
|
|
4157
|
+
6. Session knowledge log (inherited context, if `context: 'inherit'`)
|
|
4158
|
+
7. Step prompt (the actual work instruction)
|
|
4159
|
+
|
|
4160
|
+
The workflow author controls layers 2-3. The operator controls layer 4. The platform assembles 1 and 5-7 automatically. Clear separation of concerns.
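A minimal sketch of the layering above, assuming the seven layers are plain strings joined in order. `PromptLayers` and `assembleSubagentPrompt` are hypothetical names, not taken from the WorkTrain codebase. The sketch also assumes that a step-level `systemPrompt` replaces the workflow-level one, as the word "overrides" suggests; if both are meant to be kept, the two would be appended in sequence instead.

```typescript
// Hypothetical sketch: none of these names come from the WorkTrain codebase.
interface PromptLayers {
  basePrompt: string;            // layer 1: WorkTrain base prompt
  workflowSystemPrompt?: string; // layer 2: workflow-level systemPrompt
  stepSystemPrompt?: string;     // layer 3: step-level systemPrompt
  soulFile?: string;             // layer 4: operator behavioral rules
  workspaceContext?: string;     // layer 5: AGENTS.md / workspace context
  knowledgeLog?: string;         // layer 6: only when context is inherited
  stepPrompt: string;            // layer 7: the actual work instruction
}

function assembleSubagentPrompt(layers: PromptLayers): string {
  // Assumption: the step-level prompt replaces the workflow-level prompt.
  const cognitiveMode = layers.stepSystemPrompt ?? layers.workflowSystemPrompt;

  const ordered: Array<string | undefined> = [
    layers.basePrompt,
    cognitiveMode,
    layers.soulFile,
    layers.workspaceContext,
    layers.knowledgeLog,
    layers.stepPrompt,
  ];

  // Drop absent or blank layers, keep the documented order.
  return ordered
    .filter((part): part is string => part !== undefined && part.trim() !== "")
    .join("\n\n");
}
```

Keeping the assembly a pure function of its layers makes the ordering easy to test independently of how each layer is loaded.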
|
|
4161
|
+
|
|
4162
|
+
**This also enables the subagent type system** (from the previous backlog entry) to be workflow-driven rather than call-site-driven. Instead of `spawn_session({ type: 'challenger' })`, the workflow step that spawns a challenger simply declares `systemPrompt: "you are adversarial..."` -- the cognitive mode travels with the workflow definition, not the spawn call.
|
|
4163
|
+
|
|
4164
|
+
**Schema addition:**
|
|
4165
|
+
```typescript
|
|
4166
|
+
interface WorkflowDefinition {
|
|
4167
|
+
systemPrompt?: string; // workflow-level, injected into all subagent sessions
|
|
4168
|
+
steps: WorkflowStep[];
|
|
4169
|
+
}
|
|
4170
|
+
|
|
4171
|
+
interface WorkflowStep {
|
|
4172
|
+
systemPrompt?: string; // step-level, overrides workflow-level for this step
|
|
4173
|
+
prompt: string;
|
|
4174
|
+
// ...existing fields
|
|
4175
|
+
}
|
|
4176
|
+
```
|
|
4177
|
+
|
|
4178
|
+
**Authoring implication:** The `workflow-for-workflows` meta-workflow should guide authors to write cognitive mode as `systemPrompt` rather than embedding it in `prompt` prose. "What mode should the agent be in?" is a structural question, not a content question.
|
|
4179
|
+
|
|
4180
|
+
---
|
|
4181
|
+
|
|
4182
|
+
### Console as the unified WorkRail dashboard -- standalone, file-reading, zero coupling (Apr 18, 2026)
|
|
4183
|
+
|
|
4184
|
+
**The insight:** The console is the unified view of all WorkRail activity -- whether sessions were started by the autonomous daemon or by a human working interactively through the MCP server. It doesn't care how a session was created. It reads the same session store either way.
|
|
4185
|
+
|
|
4186
|
+
The console doesn't need a live connection to either the daemon or the MCP server. It reads files. The current architecture where the console is owned by whichever process wins a port election is wrong -- it's a legacy of when the MCP server was the only long-running process.
|
|
4187
|
+
|
|
4188
|
+
**Target architecture -- zero coupling:**
|
|
4189
|
+
|
|
4190
|
+
```
|
|
4191
|
+
Daemon → writes ~/.workrail/data/sessions/
|
|
4192
|
+
→ writes ~/.workrail/events/daemon/
|
|
4193
|
+
→ serves :3200 (webhooks only)
|
|
4194
|
+
|
|
4195
|
+
MCP server → reads/writes session store (same files as daemon)
|
|
4196
|
+
→ serves :3100 (Claude Code bridge only)
|
|
4197
|
+
|
|
4198
|
+
Console → reads ~/.workrail/data/sessions/ (file watch, not HTTP)
|
|
4199
|
+
→ reads ~/.workrail/events/daemon/ (file watch)
|
|
4200
|
+
→ reads git for PR/commit context
|
|
4201
|
+
→ serves :3456 (browser UI only)
|
|
4202
|
+
→ `worktrain console` -- fully standalone binary
|
|
4203
|
+
```
|
|
4204
|
+
|
|
4205
|
+
**No startup coordination. No lock files. No port election. No coupling.**
|
|
4206
|
+
|
|
4207
|
+
The console works whether the daemon is running or not, whether the MCP server is running or not. Start it once, leave it running permanently. It shows whatever is in the files.
|
|
4208
|
+
|
|
4209
|
+
**How it gets live updates without HTTP:** FSEvents (macOS) / inotify (Linux) file watching on the session store and daemon event stream. When a new event is appended, the console picks it up within milliseconds and pushes to the browser via SSE -- same latency as today, no polling, no HTTP connection to the daemon required.
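The core of that pipeline can be sketched as two small pure functions, assuming the watcher hands over newly appended text from a `.jsonl` file in arbitrary chunks. `TailState`, `takeCompleteLines`, and `toSseFrame` are hypothetical names; the file-watch wiring itself (FSEvents / inotify) is deliberately left out.

```typescript
// Hypothetical helpers; the real console may be structured differently.
interface TailState {
  remainder: string; // trailing partial line carried over to the next chunk
}

// Feed newly appended (already decoded) text; returns only complete JSONL records.
// A record that was only half-written when the watcher fired stays in `remainder`.
function takeCompleteLines(state: TailState, chunk: string): string[] {
  const parts = (state.remainder + chunk).split("\n");
  state.remainder = parts.pop() ?? "";
  return parts.filter((line) => line.trim() !== "");
}

// Format one single-line JSON record as a Server-Sent Events frame.
function toSseFrame(jsonLine: string): string {
  return `data: ${jsonLine}\n\n`;
}
```

Buffering the partial trailing line matters because a file-change notification can arrive while an append is still in flight; emitting only newline-terminated records avoids pushing truncated JSON to the browser.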
|
|
4210
|
+
|
|
4211
|
+
**The `worktrain console` command:**
|
|
4212
|
+
```bash
|
|
4213
|
+
worktrain console # start on default port 3456
|
|
4214
|
+
worktrain console --port 4000 # custom port
|
|
4215
|
+
worktrain console --workspace ~/git/myproject # workspace-scoped view
|
|
4216
|
+
```
|
|
4217
|
+
|
|
4218
|
+
**Migration:** Remove console startup from both the daemon command and the MCP server startup. The primary election logic (`DashboardLock`, `bindWithPortFallback`) becomes unnecessary. The `DaemonConsole` module in `src/trigger/daemon-console.ts` becomes `src/console/standalone-console.ts` with a simpler interface.
|
|
4219
|
+
|
|
4220
|
+
**Why this matters:** Today the console goes down whenever the MCP server crashes. With this architecture, the console is as stable as the filesystem. The daemon crashing doesn't affect the console. The MCP server crashing doesn't affect the console. The only thing that can take down the console is killing the `worktrain console` process itself.
|
|
4221
|
+
|
|
4222
|
+
---
|
|
4223
|
+
|
|
4224
|
+
## WorkTrain sprint: Apr 17-18, 2026 -- shipped and current state
|
|
4225
|
+
|
|
4226
|
+
### What shipped (Apr 17-18)
|
|
4227
|
+
|
|
4228
|
+
**Daemon stabilization:**
|
|
4229
|
+
- ✅ `report_issue` tool -- agents call this instead of dying silently; structured JSON written to `~/.workrail/issues/<sessionId>.jsonl`, event emitted to daemon stream, WORKTRAIN_STUCK marker in `WorkflowRunResult`
|
|
4230
|
+
- ✅ Richer `BASE_SYSTEM_PROMPT` -- baked-in behavioral principles (oracle hierarchy, self-directed reasoning, workflow-as-contract, silent failure policy) rather than relying on soul file alone
|
|
4231
|
+
- ✅ `/bin/bash` for Bash tool -- process substitution `<(...)` and other bash-specific syntax now works
|
|
4232
|
+
- ✅ `DaemonEventEmitter` -- structured event stream at `~/.workrail/events/daemon/YYYY-MM-DD.jsonl`
|
|
4233
|
+
- ✅ Self-configuration -- `triggers.yml`, upgraded `daemon-soul.md` (WorkRail-specific rules + coding philosophy), `AGENTS.md` WorkTrain section
|
|
4234
|
+
|
|
4235
|
+
**Workflow library:**
|
|
4236
|
+
- ✅ mr-review v2.6 -- `philosophy_alignment` reviewer family; scoped philosophy extraction in fact packet; 7th coverage domain; "is this the right design?" framing
|
|
4237
|
+
- ✅ wfw v2.5 -- phases 2 and 3 split into dedicated prep-step design steps (2a/2b, 3a/3b); principle: assessments need dedicated prep steps, not on-the-fly evidence gathering
|
|
4238
|
+
- ✅ Clean workflow display names across library (removed `v2 •`, `Lean •`, etc.)
|
|
4239
|
+
- ✅ `philosophy.mdc` created at `~/.firebender/commands/philosophy.mdc` -- MR review subagents now evaluate findings against coding philosophy
|
|
4240
|
+
|
|
4241
|
+
**Integrations and infrastructure:**
|
|
4242
|
+
- ✅ GitLab polling triggers fully merged (#404) -- zero-webhook MR polling
|
|
4243
|
+
- ✅ TS6 forward-compat tsconfig fixes (#401) -- unblocks TypeScript 6 dep bumps
|
|
4244
|
+
- ✅ Standalone console spec -- `worktrain console` as independent file-reading binary, zero coupling to daemon or MCP server
|
|
4245
|
+
|
|
4246
|
+
---
|
|
4247
|
+
|
|
4248
|
+
### Current state (Apr 18, 2026)
|
|
4249
|
+
|
|
4250
|
+
**What works:**
|
|
4251
|
+
- Daemon runs autonomously on webhook triggers
|
|
4252
|
+
- Sessions advance through full workflow steps
|
|
4253
|
+
- Console is served at `:3456` when the daemon starts before the MCP server
|
|
4254
|
+
- Daemon event stream logging every tool call
|
|
4255
|
+
- GitLab + GitHub polling (no webhooks needed)
|
|
4256
|
+
- Philosophy-aligned MR reviews
|
|
4257
|
+
- `report_issue` tool available to agents
|
|
4258
|
+
|
|
4259
|
+
**Known issues / active bugs:**
|
|
4260
|
+
|
|
4261
|
+
1. **Daemon killed by MCP server reconnects** (CRITICAL) -- the daemon and MCP server share process infrastructure via the bridge mechanism. When Claude Code reconnects and a new MCP server process starts, it displaces the running daemon. The daemon must be run from a separate terminal or as a `launchd` service to survive MCP reconnects. Root fix: decouple daemon from the MCP server process tree entirely.
|
|
4262
|
+
|
|
4263
|
+
2. **Console unstable** -- the console port (3456) is contested between daemon and MCP server. Whoever starts first wins. When the MCP server reconnects, it takes the port and the daemon console goes down. Root fix: standalone `worktrain console` binary (spec in backlog).
|
|
4264
|
+
|
|
4265
|
+
3. **`workflow_not_found` on first test** -- trigger used `coding-task-workflow-agentic.lean.v2` (filename) instead of `coding-task-workflow-agentic` (workflow ID). Fixed in triggers.yml. Symptom of workflow ID vs filename confusion -- worth a validator that catches this at `worktrain daemon` startup.
|
|
4266
|
+
|
|
4267
|
+
4. **Session orphaned with 0 advances when daemon crashes** -- if the daemon dies mid-Phase-0 (before any `continue_workflow` call), the session is orphaned at `observation_recorded(8)` with 0 advances and no output. There is no automatic recovery: crash recovery reads the daemon-session token file but can't resume a session that never advanced. No fix yet.
|
|
4268
|
+
|
|
4269
|
+
---
|
|
4270
|
+
|
|
4271
|
+
### Next priorities (groomed Apr 18)
|
|
4272
|
+
|
|
4273
|
+
**Tier 1 -- Must fix for reliable autonomous operation:**
|
|
4274
|
+
1. **Daemon as a launchd service** -- run daemon outside Claude Code's process tree so MCP reconnects can't kill it. `worktrain daemon --install` creates a launchd plist and starts it.
|
|
4275
|
+
2. **Standalone `worktrain console`** -- file-watching binary independent of daemon/MCP. Zero coupling. Spec in backlog.
|
|
4276
|
+
3. **Workflow ID validation at startup** -- `worktrain daemon` should validate that all `workflowId` values in triggers.yml resolve to real workflows before starting, not fail silently at dispatch time.
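A rough sketch of what the startup validation in item 3 could look like, assuming triggers have already been parsed from triggers.yml into objects and the set of loadable workflow IDs is known. `TriggerConfig`, `ValidationIssue`, `validateTriggerWorkflowIds`, and the trigger `id` field are hypothetical; only `workflowId` comes from the text above. The hint logic targets the filename-vs-ID confusion seen in the `workflow_not_found` bug.

```typescript
// Hypothetical shapes; only the `workflowId` field is named in the backlog.
interface TriggerConfig {
  id: string;
  workflowId: string;
}

interface ValidationIssue {
  triggerId: string;
  workflowId: string;
  hint: string | undefined; // a known workflow ID the author probably meant
}

function validateTriggerWorkflowIds(
  triggers: TriggerConfig[],
  knownWorkflowIds: ReadonlySet<string>,
): ValidationIssue[] {
  const known = Array.from(knownWorkflowIds);
  const issues: ValidationIssue[] = [];

  for (const trigger of triggers) {
    if (knownWorkflowIds.has(trigger.workflowId)) continue;

    // Filename-vs-ID confusion: the bad value often starts with a real ID
    // followed by a suffix such as ".lean.v2".
    const hint = known.find((id) => trigger.workflowId.startsWith(id + "."));
    issues.push({ triggerId: trigger.id, workflowId: trigger.workflowId, hint });
  }
  return issues;
}
```

The daemon could refuse to start when the returned list is non-empty and print each hint, so the failure surfaces at startup rather than at dispatch time.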
|
|
4277
|
+
|
|
4278
|
+
**Tier 2 -- Workflow quality:**
|
|
4279
|
+
4. **mr-review prep steps** -- the audit identified missing dedicated prep steps for philosophy extraction, pattern baseline, and design decision reconstruction. These are described in the backlog but not yet in the workflow JSON. wfw v2.5 guides new workflows to add them; the mr-review workflow itself still needs a v2.7 pass to implement them.
|
|
4280
|
+
5. **Autonomous workflow variants** -- audit `requireConfirmation` gates across all workflows; confirm daemon's `autonomy: full` setting correctly bypasses the right ones.
|
|
4281
|
+
|
|
4282
|
+
**Tier 3 -- Features:**
|
|
4283
|
+
6. **`worktrain spawn` / `worktrain await`** -- already merged, needs real-world test
|
|
4284
|
+
7. **Auto-commit from handoff artifact** -- merged but untested end-to-end
|
|
4285
|
+
8. **Session knowledge log** -- continuous context accumulation for subagent packaging
|
|
4286
|
+
9. **TypeScript 6 dep bump** -- tsconfig fixes are in (#401), unblocks #244 and #231
|
|
4287
|
+
|
|
4288
|
+
**Open PRs (only dep bumps remain):**
|
|
4289
|
+
- #330, #287, #288 -- vitest 4 + vite 8 (major version, needs testing)
|
|
4290
|
+
- #244, #231 -- TypeScript 6.0.2 (now unblocked by #401)
|
|
4291
|
+
|
|
4292
|
+
---
|
|
4293
|
+
|
|
4294
|
+
### Duplicate task detection: prevent agents from doing the same work twice (Apr 18, 2026)
|
|
4295
|
+
|
|
4296
|
+
**The problem:** with multiple agents running concurrently and a persistent work queue, it's easy to accidentally start two agents on the same task -- especially when the queue drains items from external sources (GitHub issues, Jira) that may be added again after a sync. Today, two agents can independently pick up the same issue, do the same investigation, and open duplicate PRs.
|
|
4297
|
+
|
|
4298
|
+
**Detection sources:**
|
|
4299
|
+
1. **Open PRs**: before starting any coding task, check `gh pr list --state open` -- if a PR already exists that addresses the same issue/goal, skip it
|
|
4300
|
+
2. **Active sessions**: the session store knows which workflows are currently running and what their goals are; a new dispatch can check for semantic overlap before starting
|
|
4301
|
+
3. **Queue deduplication**: the work queue should deduplicate by external item ID (GitHub issue number, Jira ticket key) so the same item can't be enqueued twice
|
|
4302
|
+
4. **Session history**: before starting an investigation, check recent session notes for the same workflowId + goal combination -- if it was completed in the last 24 hours with a successful result, skip or ask the user
|
|
4303
|
+
|
|
4304
|
+
**Implementation approach:**
|
|
4305
|
+
- Queue-level dedup is the simplest and most reliable: each queue item from an external source carries its `sourceId` (e.g. `github:EtienneBBeaulac/workrail:issues:123`). On enqueue, check if `sourceId` already exists in the queue (pending or active) -- if so, skip with a log.
|
|
4306
|
+
- PR-level dedup: before `worktrain spawn` dispatches a coding task, run `gh pr list --search "<issue title keywords>"` and check for matches. If found, add to outbox ("task already in progress as PR #X") and skip.
|
|
4307
|
+
- Session-level dedup: the coordinator script checks active session goals before spawning a new one with the same goal text.
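Queue-level dedup, the first bullet above, can be sketched as follows. This is an in-memory illustration only: `WorkQueue`, `QueueItem`, and the `status` values are hypothetical, and a real queue would be persistent. Only the `sourceId` concept and the "pending or active" rule come from the text.

```typescript
// Hypothetical in-memory queue; the real work queue is persistent.
type QueueItemStatus = "pending" | "active" | "done";

interface QueueItem {
  sourceId: string; // e.g. "github:EtienneBBeaulac/workrail:issues:123"
  goal: string;
  status: QueueItemStatus;
}

class WorkQueue {
  private readonly items: QueueItem[] = [];

  // Returns false (and logs) when the same external item is already pending or active.
  enqueue(sourceId: string, goal: string): boolean {
    const duplicate = this.items.some(
      (item) => item.sourceId === sourceId && item.status !== "done",
    );
    if (duplicate) {
      console.log(`skipping duplicate queue item: ${sourceId}`);
      return false;
    }
    this.items.push({ sourceId, goal, status: "pending" });
    return true;
  }

  // Marks every open item for this source as done, so it may be enqueued again later.
  complete(sourceId: string): void {
    for (const item of this.items) {
      if (item.sourceId === sourceId && item.status !== "done") item.status = "done";
    }
  }
}
```

Whether a completed item should be re-enqueueable is a policy choice; this sketch allows it and leaves the 24-hour session-history check described above as a separate concern.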
|
|
4308
|
+
|
|
4309
|
+
**The classify-task-workflow role:** when a task is classified, it can also output a `deduplicationKey` (e.g. `fix:trigger-store:error-kind-consistency`) that is stored with the queue item. Queue items with the same key are considered duplicates.
|
|
4310
|
+
|
|
4311
|
+
**What makes this hard:** semantic dedup (two tasks described differently but solving the same problem) requires embedding-based similarity, not exact match. For MVP, exact `sourceId` match + approximate PR title search is sufficient. Semantic dedup is a post-knowledge-graph feature.
|
|
4312
|
+
|
|
4313
|
+
---
|
|
4314
|
+
|
|
4315
|
+
### Agent actions as first-class events in the session event log (Apr 18, 2026)
|
|
4316
|
+
|
|
4317
|
+
**The vision:** the console should be able to reconstruct exactly what an agent did in a session -- every tool call, every argument, every result, every decision -- by reading the event log alone. No log files, no stdout parsing, no separate monitoring infrastructure. The session event store IS the audit trail.
|
|
4318
|
+
|
|
4319
|
+
**What's already in the event log:**
|
|
4320
|
+
- `session_created`, `run_started`, `run_completed`
|
|
4321
|
+
- `node_created`, `edge_created`, `advance_recorded`
|
|
4322
|
+
- `node_output_appended` (step notes)
|
|
4323
|
+
- `preferences_changed`, `context_set`, `observation_recorded`
|
|
4324
|
+
|
|
4325
|
+
**What's missing -- agent-level actions:**
|
|
4326
|
+
- `tool_call_started` -- which tool was called, with what arguments, at what timestamp
|
|
4327
|
+
- `tool_call_completed` -- result (truncated), duration, success/error
|
|
4328
|
+
- `llm_turn_started` -- model, token count estimate, step context
|
|
4329
|
+
- `llm_turn_completed` -- stop reason, output tokens, whether steer() was injected
|
|
4330
|
+
- `steer_injected` -- what context was injected and why (session recap, workspace context)
|
|
4331
|
+
- `report_issue_recorded` -- the structured issue from the `report_issue` tool
|
|
4332
|
+
- `worktrain_stuck` -- when WORKTRAIN_STUCK marker is emitted
|
|
4333
|
+
|
|
4334
|
+
**Why this matters:**
|
|
4335
|
+
Today the `DaemonEventEmitter` writes to `~/.workrail/events/daemon/YYYY-MM-DD.jsonl` separately from the session store. That's two places to look -- and they're not correlated to specific sessions. Putting agent actions into the session event log means:
|
|
4336
|
+
- Console can show a session timeline: "Phase 0: called `bash` 3 times (12ms, 8ms, 45ms) → called `read` 2 times → advanced to Phase 1"
|
|
4337
|
+
- The proof record (verification chain spec) can link specific tool calls to assessment gate evidence
|
|
4338
|
+
- Crash recovery knows exactly where in the agent's execution it died
|
|
4339
|
+
- The knowledge graph can be updated from session events without re-reading step notes
|
|
4340
|
+
|
|
4341
|
+
**The event schema (additions to the existing event store format):**
|
|
4342
|
+
|
|
4343
|
+
```typescript
|
|
4344
|
+
// Tool call lifecycle
|
|
4345
|
+
{ kind: 'tool_call_started', tool: 'bash', args: { command: 'git status' }, nodeId, ts }
|
|
4346
|
+
{ kind: 'tool_call_completed', tool: 'bash', durationMs: 45, exitCode: 0, resultSummary: '...', nodeId, ts }
|
|
4347
|
+
{ kind: 'tool_call_failed', tool: 'bash', durationMs: 45, error: 'ENOENT', nodeId, ts }
|
|
4348
|
+
|
|
4349
|
+
// LLM turn lifecycle
|
|
4350
|
+
{ kind: 'llm_turn_started', model: 'claude-sonnet-4-6', inputTokens: 12000, nodeId, ts }
|
|
4351
|
+
{ kind: 'llm_turn_completed', stopReason: 'tool_use', outputTokens: 450, toolsRequested: ['bash'], nodeId, ts }
|
|
4352
|
+
|
|
4353
|
+
// Steer injection
|
|
4354
|
+
{ kind: 'steer_injected', reason: 'session_recap', contentLength: 800, nodeId, ts }
|
|
4355
|
+
|
|
4356
|
+
// Agent self-reporting
|
|
4357
|
+
{ kind: 'report_issue_recorded', severity: 'warning', summary: '...', sessionId, ts }
|
|
4358
|
+
```
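The tool-call events in the schema above could be typed as a discriminated union, which is what lets the console aggregate them safely. This is a sketch under assumptions: `AgentEvent` and `summarizeToolCalls` are hypothetical names, `ts` is assumed to be a numeric timestamp, and only the three tool-call kinds are modeled.

```typescript
// Hypothetical typing of the tool-call events; field names follow the schema above.
interface EventBase {
  nodeId: string;
  ts: number; // assumption: numeric timestamp
}

type AgentEvent =
  | (EventBase & { kind: "tool_call_started"; tool: string; args: Record<string, unknown> })
  | (EventBase & { kind: "tool_call_completed"; tool: string; durationMs: number; exitCode: number; resultSummary: string })
  | (EventBase & { kind: "tool_call_failed"; tool: string; durationMs: number; error: string });

// Collects the durations of finished tool calls (completed or failed) for one node,
// grouped by tool name and kept in event order.
function summarizeToolCalls(events: AgentEvent[], nodeId: string): Map<string, number[]> {
  const durations = new Map<string, number[]>();
  for (const event of events) {
    if (event.nodeId !== nodeId) continue;
    if (event.kind === "tool_call_started") continue; // no duration yet
    const list = durations.get(event.tool) ?? [];
    list.push(event.durationMs);
    durations.set(event.tool, list);
  }
  return durations;
}
```

A summary like this is enough to render a line such as "called `bash` 3 times (12ms, 8ms, 45ms)" for a given node.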
|
|
4359
|
+
|
|
4360
|
+
**Where to emit them:**
|
|
4361
|
+
- In `src/daemon/agent-loop.ts` -- before and after each `tool.execute()` call, before and after each LLM call
|
|
4362
|
+
- In `src/daemon/workflow-runner.ts` -- for steer injection and report_issue recording
|
|
4363
|
+
- Use the existing `V2ToolContext` session store to append events (same mechanism as `continue_workflow` and `start_workflow`)
|
|
4364
|
+
|
|
4365
|
+
**Console rendering:**
|
|
4366
|
+
Each session detail view gets a "Timeline" tab alongside "Steps" and "Notes":
|
|
4367
|
+
```
|
|
4368
|
+
Phase 0: Understand & Classify [2m 14s]
|
|
4369
|
+
├── llm_turn 450 tokens → 3 tool calls
|
|
4370
|
+
├── bash: git status 45ms ✓
|
|
4371
|
+
├── bash: gh pr list 180ms ✓
|
|
4372
|
+
├── read: AGENTS.md 8ms ✓
|
|
4373
|
+
└── llm_turn 280 tokens → advance
|
|
4374
|
+
Phase 1a: State Hypothesis [0m 38s]
|
|
4375
|
+
├── llm_turn 310 tokens → advance
|
|
4376
|
+
...
|
|
4377
|
+
```
|
|
4378
|
+
|
|
4379
|
+
**Relationship to DaemonEventEmitter:**
|
|
4380
|
+
The existing `DaemonEventEmitter` (written in #498) writes to a separate daily log file. Once agent actions are first-class session events, the daemon event emitter can be simplified or removed -- the session event log is the canonical record. The console reads session events, not daemon event files.
|
|
4381
|
+
|
|
4382
|
+
**Build order:**
|
|
4383
|
+
1. Add `tool_call_started`/`tool_call_completed` events to `agent-loop.ts` -- smallest change, highest value
|
|
4384
|
+
2. Add `llm_turn_started`/`llm_turn_completed` events
|
|
4385
|
+
3. Console Timeline tab reads and renders the new event kinds
|
|
4386
|
+
4. Wire `report_issue_recorded` and `steer_injected` events
|
|
4387
|
+
|
|
4388
|
+
---
|
|
4389
|
+
|
|
4390
|
+
### FatalToolError: distinguish recoverable from non-recoverable tool failures (follow-up from PR #523)
|
|
4391
|
+
The blanket try/catch in `AgentLoop._executeTools()` converts ALL tool throws into `isError` tool_results. This is correct for Bash/Read/Write, where the LLM can see the error and retry, but potentially wrong for `continue_workflow` failures, where an LLM retrying with a broken token will loop. The discovery agent proposed a `FatalToolError` subclass: tools throw `FatalToolError` for non-recoverable errors (session corruption, bad tokens) and plain `Error` for recoverable failures. `_executeTools` catches plain `Error` and returns `isError`; `FatalToolError` propagates and kills the session. Combined with the `DEFAULT_MAX_TURNS` cap (PR followup), this provides defense-in-depth.
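The proposal can be sketched as below. This is illustrative only: `ToolResult`, `Tool`, and `executeTool` are hypothetical names, and the tool is synchronous here to keep the sketch short, whereas the real `tool.execute()` is presumably asynchronous.

```typescript
// Hypothetical sketch of the FatalToolError proposal; not the actual AgentLoop code.
class FatalToolError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "FatalToolError";
    // Keeps `instanceof` working when compiled to older targets.
    Object.setPrototypeOf(this, FatalToolError.prototype);
  }
}

interface ToolResult {
  content: string;
  isError: boolean;
}

interface Tool {
  execute(): string; // simplified to synchronous for this sketch
}

function executeTool(tool: Tool): ToolResult {
  try {
    return { content: tool.execute(), isError: false };
  } catch (err) {
    // Non-recoverable: let it propagate so the session is terminated.
    if (err instanceof FatalToolError) throw err;
    // Recoverable: surface the error to the LLM as an isError tool result.
    const message = err instanceof Error ? err.message : String(err);
    return { content: message, isError: true };
  }
}
```

With this split, a tool such as `continue_workflow` would throw `FatalToolError` on a bad token, while Bash-style tools keep throwing plain `Error` so the model can see the failure and retry.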
|
|
4392
|
+
5. Deprecate `DaemonEventEmitter` once console reads from session events (continues the build order of the "Agent actions as first-class events" entry above)
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"id": "architecture-scalability-audit",
|
|
3
|
-
"name": "Architecture Scalability Audit
|
|
3
|
+
"name": "Architecture Scalability Audit",
|
|
4
4
|
"version": "0.1.0",
|
|
5
5
|
"description": "Use this to audit a bounded codebase scope for architecture scalability. Declare which scalability dimensions matter (load, data volume, team size, feature extensibility, operational); the workflow investigates each and produces evidence-grounded findings.",
|
|
6
6
|
"about": "## Architecture Scalability Audit\n\nThis workflow audits a bounded codebase scope for scalability across the dimensions you care about. It does not produce generic \"won't scale\" warnings -- every finding must cite a specific file, class, method, or pattern, and every concern must name a concrete growth scenario (e.g. 10x traffic, 100x records, 3x team size).\n\n**What it does:**\nYou declare the scope boundary and the scalability dimensions that matter for your context. The workflow reads the codebase to understand the architecture, assigns one dedicated reviewer family per dimension, runs them in parallel from a shared fact packet, reconciles contradictions and blind spots through a synthesis loop, and delivers a per-dimension verdict (will_break / risk / fine) with an overall scalability readiness verdict.\n\n**The five scalability dimensions you can select:**\n- **load** -- handles more requests, users, or throughput\n- **data_volume** -- handles more records, storage, or query size\n- **team_org** -- more teams or developers working on this scope without friction\n- **feature_extensibility** -- more features added without rearchitecting\n- **operational** -- more deployments, environments, or operational complexity\n\n**When to use it:**\n- Before investing significantly in a component you expect to grow\n- When planning capacity for a new traffic tier or data volume increase\n- When evaluating a codebase acquired through a merger, partnership, or open-source adoption\n- When a team is growing and you want to know if the architecture will hold under parallel development\n\n**What it produces:**\nAn overall scalability verdict, per-dimension findings with specific code references and growth scenarios, cross-cutting concerns that span multiple dimensions, a prioritized concern list, and explicit callouts of what is already well-designed for scale.\n\n**How to get good results:**\nBe specific about the scope boundary -- name the service, module, or 
feature explicitly and say what is out of scope. Choose the dimensions relevant to your actual growth pressures; the workflow will not add dimensions you did not select. If you know a specific growth target (e.g. \"we expect 50x user growth in 18 months\"), mention it.",
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"id": "bug-investigation-agentic",
|
|
3
|
-
"name": "Bug Investigation
|
|
3
|
+
"name": "Bug Investigation",
|
|
4
4
|
"version": "2.0.0",
|
|
5
5
|
"description": "Use this to diagnose a bug or unexpected behavior in code. Builds a hypothesis, gathers evidence, and proves or disproves the root cause before concluding.",
|
|
6
6
|
"about": "## Bug Investigation Workflow\n\nThis workflow guides an AI agent through a rigorous, evidence-driven investigation of a bug or unexpected behavior. It is designed to prevent the most common failure mode in AI debugging: jumping to a plausible-sounding conclusion without sufficient proof.\n\n**What it does:**\nThe workflow moves through triage, context gathering, hypothesis generation, evidence planning, iterative evidence collection, diagnosis validation, and a final handoff. It explicitly distinguishes between theories (formed by reading code) and proof (confirmed by running tests or reproducing the failure). The final output is a diagnosis with a confidence rating, the strongest alternative explanations that were ruled out, and a high-level fix direction -- not a patch.\n\n**When to use it:**\n- You have a specific bug report, failing test, or production incident to investigate\n- The root cause is not immediately obvious and multiple explanations are plausible\n- You want a trustworthy diagnosis before spending time writing a fix\n- The bug carries enough risk that you need to be confident before changing code\n\n**What it produces:**\nA structured investigation handoff covering: root cause type (single cause, multi-factor, working as designed, etc.), proof summary, ruled-out alternatives, residual uncertainty, likely files involved, and verification steps for whoever implements the fix.\n\n**How to get good results:**\nProvide repro steps, observed symptoms, and expected behavior upfront. Include any relevant logs, failing test commands, or environment details you already have. The more concrete the repro, the faster the workflow can gather real evidence rather than theorizing. If the bug is intermittent, say so -- the workflow adapts its rigor based on reproducibility confidence.",
|
|
@@ -57,7 +57,7 @@
|
|
|
57
57
|
"steps": [
|
|
58
58
|
{
|
|
59
59
|
"id": "phase-0-triage-and-intake",
|
|
60
|
-
"title": "Phase 0: Triage (Bug Intake
|
|
60
|
+
"title": "Phase 0: Triage (Bug Intake • Risk • Mode)",
|
|
61
61
|
"prompt": "Understand the bug report and choose the right rigor.\n\nCapture:\n- `bugSummary`: concise statement of the issue\n- `reproSummary`: repro steps, symptoms, expected behavior, environment notes\n- `investigationComplexity`: Small / Medium / Large\n- `riskLevel`: Low / Medium / High\n- `rigorMode`: QUICK / STANDARD / THOROUGH\n- `automationLevel`: High / Medium / Low\n- `maxParallelism`: 0 / 2 / 3\n\nDecision guidance:\n- QUICK: clear repro, narrow surface area, low ambiguity\n- STANDARD: moderate ambiguity, moderate system breadth, or meaningful risk\n- THOROUGH: high ambiguity, high-risk production impact, broad surface area, or multiple plausible causes\n\nSet context variables:\n- `bugSummary`\n- `reproSummary`\n- `investigationComplexity`\n- `riskLevel`\n- `rigorMode`\n- `automationLevel`\n- `maxParallelism`\n- `reproducibilityConfidence` (High / Medium / Low)\n\nAsk for confirmation only if the chosen rigor materially affects expectations or if critical repro details are still missing.",
|
|
62
62
|
"requireConfirmation": true
|
|
63
63
|
},
|
|
@@ -140,7 +140,7 @@
|
|
|
140
140
|
{
|
|
141
141
|
"id": "phase-4b-loop-decision",
|
|
142
142
|
"title": "Evidence Loop Decision",
|
|
143
|
-
"prompt": "Decide whether the evidence loop should continue.\n\nDecision rules:\n- if `contradictionCount > 0`
|
|
143
|
+
"prompt": "Decide whether the evidence loop should continue.\n\nDecision rules:\n- if `contradictionCount > 0` → continue\n- else if `unresolvedEvidenceGapCount > 0` → continue\n- else if `hasStrongAlternative = true` and the alternative is not meaningfully weaker → continue\n- else if `diagnosisType = inconclusive_but_narrowed` and further evidence is not realistically available → stop with bounded uncertainty\n- else → stop\n\nOutput exactly:\n```json\n{\n \"artifacts\": [{\n \"kind\": \"wr.loop_control\",\n \"decision\": \"continue\"\n }]\n}\n```",
|
|
144
144
|
"requireConfirmation": true,
|
|
145
145
|
"outputContract": {
|
|
146
146
|
"contractRef": "wr.contracts.loop_control"
|