@exaudeus/workrail 3.66.0 → 3.68.0
- package/dist/application/services/compiler/template-registry.js +10 -1
- package/dist/application/validation.js +1 -1
- package/dist/cli/commands/worktrain-init.js +1 -1
- package/dist/console/standalone-console.js +4 -1
- package/dist/console-ui/assets/{index-BynU38Vu.js → index-CyzltI6D.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/coordinators/modes/full-pipeline.js +4 -4
- package/dist/coordinators/modes/implement-shared.js +5 -5
- package/dist/coordinators/modes/implement.js +4 -4
- package/dist/coordinators/pr-review.js +4 -4
- package/dist/daemon/workflow-runner.d.ts +1 -0
- package/dist/daemon/workflow-runner.js +1 -0
- package/dist/infrastructure/storage/schema-validating-workflow-storage.d.ts +21 -2
- package/dist/infrastructure/storage/schema-validating-workflow-storage.js +48 -0
- package/dist/manifest.json +41 -41
- package/dist/mcp/handlers/v2-workflow.js +24 -7
- package/dist/mcp/output-schemas.d.ts +36 -0
- package/dist/mcp/output-schemas.js +11 -1
- package/dist/mcp/workflow-protocol-contracts.js +2 -2
- package/dist/v2/projections/session-metrics.d.ts +1 -1
- package/dist/v2/projections/session-metrics.js +16 -35
- package/dist/v2/usecases/console-routes.d.ts +2 -2
- package/docs/authoring-v2.md +4 -4
- package/docs/changelog-recent.md +3 -3
- package/docs/configuration.md +1 -1
- package/docs/design/adaptive-coordinator-context-candidates.md +1 -1
- package/docs/design/adaptive-coordinator-context.md +1 -1
- package/docs/design/adaptive-coordinator-routing-candidates.md +18 -18
- package/docs/design/adaptive-coordinator-routing-review.md +1 -1
- package/docs/design/adaptive-coordinator-routing.md +34 -34
- package/docs/design/agent-cascade-protocol.md +2 -2
- package/docs/design/console-daemon-separation-discovery.md +323 -0
- package/docs/design/context-assembly-design-candidates.md +1 -1
- package/docs/design/context-assembly-implementation-plan.md +1 -1
- package/docs/design/context-assembly-layer.md +2 -2
- package/docs/design/context-assembly-review-findings.md +1 -1
- package/docs/design/coordinator-access-audit.md +293 -0
- package/docs/design/coordinator-architecture-audit.md +62 -0
- package/docs/design/coordinator-error-handling-audit.md +240 -0
- package/docs/design/coordinator-testability-audit.md +426 -0
- package/docs/design/daemon-architecture-discovery.md +1 -1
- package/docs/design/daemon-console-separation-discovery.md +242 -0
- package/docs/design/daemon-memory-audit.md +203 -0
- package/docs/design/design-candidates-console-daemon-separation.md +256 -0
- package/docs/design/design-candidates-discovery-loop-fix.md +141 -0
- package/docs/design/design-review-findings-console-daemon-separation.md +106 -0
- package/docs/design/design-review-findings-discovery-loop-fix.md +81 -0
- package/docs/design/discovery-loop-fix-candidates.md +161 -0
- package/docs/design/discovery-loop-fix-design-review.md +106 -0
- package/docs/design/discovery-loop-fix-validation.md +258 -0
- package/docs/design/discovery-loop-investigation-A.md +188 -0
- package/docs/design/discovery-loop-investigation-B.md +287 -0
- package/docs/design/exploration-workflow-candidates.md +205 -0
- package/docs/design/exploration-workflow-design-review.md +166 -0
- package/docs/design/exploration-workflow-discovery.md +443 -0
- package/docs/design/ide-context-files-candidates.md +231 -0
- package/docs/design/ide-context-files-design-review.md +85 -0
- package/docs/design/ide-context-files.md +615 -0
- package/docs/design/implementation-plan-discovery-loop-fix.md +199 -0
- package/docs/design/implementation-plan-queue-poll-rotation.md +102 -0
- package/docs/design/in-process-http-audit.md +190 -0
- package/docs/design/layer3b-ghost-nodes-design-candidates.md +2 -2
- package/docs/design/loadSessionNotes-candidates.md +108 -0
- package/docs/design/loadSessionNotes-test-coverage-discovery.md +297 -0
- package/docs/design/loadSessionNotes-test-coverage-session4.md +209 -0
- package/docs/design/loadSessionNotes-test-coverage-v3.md +321 -0
- package/docs/design/probe-session-design-candidates.md +261 -0
- package/docs/design/probe-session-phase0.md +490 -0
- package/docs/design/routines-guide.md +7 -7
- package/docs/design/session-metrics-attribution-candidates.md +250 -0
- package/docs/design/session-metrics-attribution-design-review.md +115 -0
- package/docs/design/session-metrics-attribution-discovery.md +319 -0
- package/docs/design/session-metrics-candidates.md +227 -0
- package/docs/design/session-metrics-design-review.md +104 -0
- package/docs/design/session-metrics-discovery.md +454 -0
- package/docs/design/spawn-session-debug.md +202 -0
- package/docs/design/trigger-validator-candidates.md +214 -0
- package/docs/design/trigger-validator-review.md +109 -0
- package/docs/design/trigger-validator-shaping-phase0.md +239 -0
- package/docs/design/trigger-validator.md +454 -0
- package/docs/design/v2-core-design-locks.md +2 -2
- package/docs/design/workflow-extension-points.md +15 -15
- package/docs/design/workflow-id-validation-at-startup.md +1 -1
- package/docs/design/workflow-id-validation-implementation-plan.md +2 -2
- package/docs/design/workflow-trigger-lifecycle-audit.md +175 -0
- package/docs/design/worktrain-task-queue-candidates.md +5 -5
- package/docs/design/worktrain-task-queue.md +4 -4
- package/docs/discovery/coordinator-script-design.md +1 -1
- package/docs/discovery/coordinator-ux-discovery.md +3 -3
- package/docs/discovery/simulation-report.md +1 -1
- package/docs/discovery/workflow-modernization-discovery.md +326 -0
- package/docs/discovery/workflow-selection-for-discovery-tasks.md +33 -33
- package/docs/discovery/worktrain-status-briefing.md +1 -1
- package/docs/discovery/wr-discovery-goal-reframing.md +1 -1
- package/docs/docker.md +1 -1
- package/docs/ideas/backlog.md +227 -0
- package/docs/ideas/third-party-workflow-setup-design-thinking.md +1 -1
- package/docs/integrations/claude-code.md +5 -5
- package/docs/integrations/firebender.md +1 -1
- package/docs/plans/agentic-orchestration-roadmap.md +2 -2
- package/docs/plans/mr-review-workflow-redesign.md +9 -9
- package/docs/plans/ui-ux-workflow-design-candidates.md +4 -4
- package/docs/plans/ui-ux-workflow-discovery.md +2 -2
- package/docs/plans/workflow-categories-candidates.md +8 -8
- package/docs/plans/workflow-categories-discovery.md +4 -4
- package/docs/plans/workflow-modernization-design.md +430 -0
- package/docs/plans/workflow-staleness-detection-candidates.md +11 -11
- package/docs/plans/workflow-staleness-detection-review.md +4 -4
- package/docs/plans/workflow-staleness-detection.md +9 -9
- package/docs/plans/workrail-platform-vision.md +3 -3
- package/docs/reference/agent-context-cleaner-snippet.md +1 -1
- package/docs/reference/agent-context-guidance.md +4 -4
- package/docs/reference/context-optimization.md +2 -2
- package/docs/roadmap/now-next-later.md +2 -2
- package/docs/roadmap/open-work-inventory.md +16 -16
- package/docs/workflows.md +31 -31
- package/package.json +1 -1
- package/spec/workflow-tags.json +47 -47
- package/workflows/adaptive-ticket-creation.json +16 -16
- package/workflows/architecture-scalability-audit.json +22 -22
- package/workflows/bug-investigation.agentic.v2.json +3 -3
- package/workflows/classify-task-workflow.json +1 -1
- package/workflows/coding-task-workflow-agentic.json +6 -6
- package/workflows/cross-platform-code-conversion.v2.json +8 -8
- package/workflows/document-creation-workflow.json +8 -8
- package/workflows/documentation-update-workflow.json +8 -8
- package/workflows/intelligent-test-case-generation.json +2 -2
- package/workflows/learner-centered-course-workflow.json +2 -2
- package/workflows/mr-review-workflow.agentic.v2.json +4 -4
- package/workflows/personal-learning-materials-creation-branched.json +8 -8
- package/workflows/presentation-creation.json +5 -5
- package/workflows/production-readiness-audit.json +1 -1
- package/workflows/relocation-workflow-us.json +31 -31
- package/workflows/routines/context-gathering.json +1 -1
- package/workflows/routines/design-review.json +1 -1
- package/workflows/routines/execution-simulation.json +1 -1
- package/workflows/routines/feature-implementation.json +3 -3
- package/workflows/routines/final-verification.json +1 -1
- package/workflows/routines/hypothesis-challenge.json +1 -1
- package/workflows/routines/ideation.json +1 -1
- package/workflows/routines/parallel-work-partitioning.json +3 -3
- package/workflows/routines/philosophy-alignment.json +2 -2
- package/workflows/routines/plan-analysis.json +1 -1
- package/workflows/routines/plan-generation.json +1 -1
- package/workflows/routines/tension-driven-design.json +6 -6
- package/workflows/scoped-documentation-workflow.json +26 -26
- package/workflows/ui-ux-design-workflow.json +14 -14
- package/workflows/workflow-diagnose-environment.json +1 -1
- package/workflows/workflow-for-workflows.json +32 -77
- package/workflows/workflow-for-workflows.v2.json +0 -788
package/docs/discovery/workflow-selection-for-discovery-tasks.md
CHANGED
@@ -10,9 +10,9 @@

 ## Context / Ask

-A daemon session was dispatched using `coding-task
+A daemon session was dispatched using `wr.coding-task` with a goal that said "Discovery only -- Do NOT write any code". The session ran 11 advances, produced good design candidate notes, stopped at event 74 with no `run_completed`, and the later advances had no note output (likely conditional skips).

-The question: for a discovery-only task (no code, just a design document), should we use `coding-task
+The question: for a discovery-only task (no code, just a design document), should we use `wr.coding-task` or `wr.discovery`? And can `wr.coding-task` be trusted to stay in discovery mode when the goal explicitly says no code?

 ---

@@ -42,11 +42,11 @@ The question: for a discovery-only task (no code, just a design document), shoul

 ### Current state summary

-`coding-task
+`wr.coding-task` (lean v2, v1.1.0) is a full implementation lifecycle workflow. Its `about` field says: "Use this to implement a software feature or task." Its preconditions include "A deterministic validation path exists (tests, build, or an explicit verification strategy)." It explicitly describes what it produces: `implementation_plan.md`, `spec.md`, code slices, and a PR-ready handoff with commit JSON.

 `wr.discovery` (v3.1.0) is a structured thinking/design workflow. Its `about` field says: "Use this to explore and think through a problem end-to-end." Its metaGuidance explicitly states: "Boundary: this workflow can end with a recommendation memo, prototype or test plan, or a research-informed direction. It should not implement production code."

-### Step structure analysis: coding-task
+### Step structure analysis: wr.coding-task

 | Step | Condition | Discovery-relevant? |
 |------|-----------|---------------------|
@@ -67,7 +67,7 @@ The question: for a discovery-only task (no code, just a design document), shoul

 For Medium/Large tasks, the workflow runs the full design pipeline (phases 0-4) which produces `design-candidates.md` -- but it then continues directly into implementation (phases 6-7). There is no early exit after design.

-**Does coding-task
+**Does wr.coding-task have a "discovery only" mode?** No. It has no `runCondition` or context variable that would stop before implementation when a goal says "no code". The only escape hatch would be the agent choosing to stop itself based on the goal text -- which is an honor-system trust, not a structural guarantee.

 ### What phases run for Small vs Medium/Large

@@ -101,15 +101,15 @@ It explicitly cannot produce production code. It always ends with a design docum

 ### Option categories

-1. **Use wr.discovery** for discovery tasks, `coding-task
-2. **Use coding-task
-3. **Add a discovery-mode flag** to `coding-task
+1. **Use wr.discovery** for discovery tasks, `wr.coding-task` for implementation tasks
+2. **Use wr.coding-task for everything**, trusting the agent to stop early when goal says "no code"
+3. **Add a discovery-mode flag** to `wr.coding-task` via a `runCondition` on phases 6-7
 4. **Use separate triggers** in triggers.yml with different `workflowId` per task type

 ### Contradictions / disagreements

-- The daemon session with `coding-task
-- The risk is not that `coding-task
+- The daemon session with `wr.coding-task` produced "good design candidates notes" -- so the workflow does good design work even though it is intended for implementation. The design pipeline (phases 1-4) is legitimate and high quality.
+- The risk is not that `wr.coding-task` does bad design work. The risk is that (a) it might not stop before phase-6 reliably, and (b) it carries implementation framing (slices, spec, PR handoff) that pollutes a pure discovery context.

 ### Evidence gaps

@@ -131,12 +131,12 @@ It explicitly cannot produce production code. It always ends with a design docum

 - Dispatch a session that produces a design document and nothing else
 - Know with certainty that no code will be written, regardless of agent judgment
-- Get a high-quality, structured design output comparable to what coding-task
+- Get a high-quality, structured design output comparable to what wr.coding-task's design phases produce

 ### Pains / tensions / constraints

 - The daemon currently has ONE `workflowId` in triggers.yml -- no per-task routing
-- `coding-task
+- `wr.coding-task` is trusted for design quality but is not structurally bounded to stop before code
 - `wr.discovery` is structurally bounded to no-code but may produce different design output depth

 ### Success criteria
@@ -148,12 +148,12 @@ It explicitly cannot produce production code. It always ends with a design docum
 ### Assumptions

 - The daemon reads `workflowId` directly from triggers.yml and cannot dynamically select based on goal text
-- `wr.discovery` produces design candidates comparable in quality to what phases 1-4 of `coding-task
+- `wr.discovery` produces design candidates comparable in quality to what phases 1-4 of `wr.coding-task` produce
 - triggers.yml supports multiple trigger entries with different `workflowId` values

 ### Reframes / HMW questions

-- HMW: How might we route discovery tasks to `wr.discovery` and implementation tasks to `coding-task
+- HMW: How might we route discovery tasks to `wr.discovery` and implementation tasks to `wr.coding-task` at the dispatcher level instead of relying on agent judgment?
 - HMW: How might we make "discovery only" a structural guarantee rather than a goal-text instruction?

 ### What would make this framing wrong
@@ -183,15 +183,15 @@ Configure a second trigger entry in triggers.yml with `workflowId: wr.discovery`

 **Why it fits:** Structural guarantee. `wr.discovery` was explicitly designed for this use case. Its metaGuidance says "should not implement production code."

-**Strongest evidence for it:** The session incident shows the risk of relying on honor-system stop behavior in `coding-task
+**Strongest evidence for it:** The session incident shows the risk of relying on honor-system stop behavior in `wr.coding-task`. Structural routing removes the risk entirely.

-**Strongest risk against it:** triggers.yml currently supports one trigger per session. If it cannot support multiple triggers with per-task routing, this requires daemon work. Also, `wr.discovery` produces a recommendation memo/design doc, not the same `design-candidates.md` artifact shape that `coding-task
+**Strongest risk against it:** triggers.yml currently supports one trigger per session. If it cannot support multiple triggers with per-task routing, this requires daemon work. Also, `wr.discovery` produces a recommendation memo/design doc, not the same `design-candidates.md` artifact shape that `wr.coding-task` phases 1-4 produce.

 **When it should win:** Always, for any task where the desired output is a design document and there is no intent to implement code in the same session.

 ---

-### Direction B: Trust coding-task
+### Direction B: Trust wr.coding-task with honor-system stop

 Keep triggers.yml as-is. Rely on the goal text ("Discovery only -- Do NOT write any code") to instruct the agent to stop before phase-6.

@@ -205,27 +205,27 @@ Keep triggers.yml as-is. Rely on the goal text ("Discovery only -- Do NOT write

 ---

-### Direction C: Add discoveryMode flag to coding-task
+### Direction C: Add discoveryMode flag to wr.coding-task

-Modify `coding-task
+Modify `wr.coding-task` to support a `discoveryMode` context variable. Add `runCondition: { var: "discoveryMode", not_equals: true }` to phases 6 and 7. Pass `discoveryMode: true` via the goal or a trigger-level context override.

-**Why it fits:** Preserves the high-quality design pipeline of `coding-task
+**Why it fits:** Preserves the high-quality design pipeline of `wr.coding-task` while adding a structural stop before implementation.

-**Strongest evidence for it:** The design phases (1-4) of `coding-task
+**Strongest evidence for it:** The design phases (1-4) of `wr.coding-task` are well-designed and familiar. Reusing them avoids duplication.

 **Strongest risk against it:** This requires modifying a core workflow file. It adds complexity to a workflow that was designed for a different purpose. It creates a hybrid that does neither thing cleanly. And triggers.yml still only has one trigger, so the `discoveryMode` value must come from somewhere (goal text parse? trigger-level context?).

-**When it should win:** If modifying `wr.discovery` or the daemon is unavailable, and modifying `coding-task
+**When it should win:** If modifying `wr.discovery` or the daemon is unavailable, and modifying `wr.coding-task` is cheap and acceptable.

 ---

 ## Challenge Notes

-**Against Direction A (wr.discovery):** The design output format differs. `coding-task
+**Against Direction A (wr.discovery):** The design output format differs. `wr.coding-task` produces `design-candidates.md` via the `tension-driven-design` routine, followed by a `design-review-findings.md` and a full `implementation_plan.md`. `wr.discovery` produces a design doc with Candidate Directions and a recommendation. For a technical question about workflow architecture, the `wr.discovery` output (a recommendation memo) is actually _more_ appropriate than `implementation_plan.md`. The format difference is not a disadvantage.

 **Against Direction B:** The incident already showed the risk. The session stopped at event 74 with no `run_completed`. We do not know if it stopped intentionally or by timeout/connection drop. If it stopped by timeout, the next session might not stop in the same place. Structural guarantees are always preferred over honor-system constraints when the downside (code written to a wrong branch) is recoverable but costly.

-**Against Direction C:** Modifying `coding-task
+**Against Direction C:** Modifying `wr.coding-task` for a use case it was not designed for violates the "make illegal states unrepresentable" principle. It is better to use the right tool than to add a mode switch to the wrong tool.

 ---

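To make the Direction C mechanism in the hunk above concrete, here is a minimal sketch of how a `runCondition` of the form `{ var: "discoveryMode", not_equals: true }` could gate phases 6 and 7. This is an illustration only, not the WorkRail engine's evaluator: the `RunCondition`, `Phase`, `shouldRun`, and `phasesToRun` names are hypothetical, only the `var` / `not_equals` keys come from the text, and treating an unset variable as "not equal" is an assumption.

```typescript
// Hypothetical shapes: only the `var` / `not_equals` keys come from the diff text.
type RunCondition = { var: string; not_equals: unknown };
type WorkflowContext = Record<string, unknown>;

interface Phase {
  id: string;
  runCondition?: RunCondition;
}

// A phase without a condition always runs. Otherwise it runs only when the
// context value differs from `not_equals` (assumption: an unset variable
// counts as "different", so the phase runs by default).
function shouldRun(phase: Phase, context: WorkflowContext): boolean {
  if (!phase.runCondition) return true;
  return context[phase.runCondition.var] !== phase.runCondition.not_equals;
}

const skipInDiscovery: RunCondition = { var: "discoveryMode", not_equals: true };

// Illustrative subset of phases: one design phase plus the two gated phases.
const phases: Phase[] = [
  { id: "phase-4" },
  { id: "phase-6", runCondition: skipInDiscovery },
  { id: "phase-7", runCondition: skipInDiscovery },
];

function phasesToRun(context: WorkflowContext): string[] {
  return phases.filter((p) => shouldRun(p, context)).map((p) => p.id);
}
```

Under these assumptions, a session started with `discoveryMode: true` never reaches phase-6 or phase-7, which is the structural stop Direction C aims for. The open question raised in the text (where the `discoveryMode` value comes from) is untouched by this sketch.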
@@ -256,7 +256,7 @@ Modify `coding-task-workflow-agentic` to support a `discoveryMode` context varia
 #### Recommendation

 For a discovery-only task (no code, just a design document):
-- **Use `wr.discovery`**, not `coding-task
+- **Use `wr.discovery`**, not `wr.coding-task`
 - Add a second trigger entry to `triggers.yml` with a unique `id` and `workflowId: wr.discovery`
 - The daemon's trigger-store.ts and trigger-router.ts already support multiple triggers with different workflowIds -- no code change required

@@ -266,7 +266,7 @@ For a discovery-only task (no code, just a design document):
 triggers:
   - id: test-task
     provider: generic
-    workflowId: coding-task
+    workflowId: wr.coding-task
     workspacePath: /Users/etienneb/git/personal/workrail
     goal: "Add the evidenceFrom field to AssessmentDimension..."
     concurrencyMode: parallel
@@ -287,13 +287,13 @@ triggers:

 The caller must send the correct `triggerId` (`discovery-task` vs `test-task`) when firing the webhook.

-#### Why coding-task
+#### Why wr.coding-task cannot be trusted in discovery mode

-`coding-task
+`wr.coding-task` has no structural stop before phase-6 (Implement Slice-by-Slice). For Small tasks, phase-5 (Small Task Fast Path) explicitly requires writing code. For Medium/Large tasks, the design pipeline (phases 0-4) produces good design work, then phase-6 writes code. The only protection against code-writing is the agent choosing to stop based on goal text -- an honor-system constraint that can fail under context window pressure.

 The prior session stopped at event 74 (likely after phase-4, before phase-6) -- but we cannot confirm whether this was agent judgment or a connection drop. With `wr.discovery`, the question is irrelevant: there are no phases 6-7 to reach.

-#### What phases coding-task
+#### What phases wr.coding-task skips for Small tasks

 - Skips: phase-1a (hypothesis), phase-1b (design), phase-1c (challenge), phase-2 (design review), phase-3 (plan), phase-3b (spec), phase-4 (plan audit), phase-6 (implementation), phase-7 (verification)
 - Runs: phase-0 (classify) and phase-5 (Small Task Fast Path -- **writes code**)
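The two-trigger routing recommended in these hunks (the caller picks `test-task` or `discovery-task`, and the trigger entry decides the workflow) can be sketched as a plain lookup. This is a hedged illustration and not the code in the daemon's trigger-store.ts or trigger-router.ts: `TriggerEntry` and `resolveWorkflowId` are hypothetical names, and only the `id` and `workflowId` fields and their values are taken from the text.

```typescript
// Hypothetical, trimmed-down trigger entry: real entries in triggers.yml carry
// more fields (provider, workspacePath, goal, concurrencyMode).
interface TriggerEntry {
  id: string;
  workflowId: string;
}

const triggers: TriggerEntry[] = [
  { id: "test-task", workflowId: "wr.coding-task" },
  { id: "discovery-task", workflowId: "wr.discovery" },
];

// The workflow is chosen by the triggerId the caller sends, never by the goal
// text, so "discovery only" does not depend on agent judgment.
function resolveWorkflowId(triggerId: string, entries: TriggerEntry[]): string {
  const match = entries.find((t) => t.id === triggerId);
  if (!match) {
    throw new Error(`Unknown triggerId: ${triggerId}`);
  }
  return match.workflowId;
}
```

Failing loudly on an unknown `triggerId` is a design choice of this sketch: silently falling back to a default workflow would reintroduce the risk of a discovery task landing in `wr.coding-task`.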
@@ -302,24 +302,24 @@ For Medium/Large tasks, all phases run in sequence, including phase-6 (implement

 #### Would wr.discovery have been a better choice?

-Yes, without qualification. `wr.discovery` was designed for exactly this use case. Its metaGuidance states: "should not implement production code." All paths end with a recommendation memo, prototype spec, or research plan. It uses the same `tension-driven-design` routine as `coding-task
+Yes, without qualification. `wr.discovery` was designed for exactly this use case. Its metaGuidance states: "should not implement production code." All paths end with a recommendation memo, prototype spec, or research plan. It uses the same `tension-driven-design` routine as `wr.coding-task` phases 1b, so design quality is equivalent.

 #### How to configure triggers.yml for discovery vs implementation

-- **Implementation tasks**: `workflowId: coding-task
+- **Implementation tasks**: `workflowId: wr.coding-task` -- use the existing `test-task` trigger or rename it
 - **Discovery tasks**: `workflowId: wr.discovery` -- add a new trigger entry (e.g., `id: discovery-task`)
 - Route by sending the correct `triggerId` in the webhook

 #### Workflow selection strategy when the daemon has ONE workflowId configured

-The current `test-task` trigger always dispatches to `coding-task
+The current `test-task` trigger always dispatches to `wr.coding-task`. For discovery tasks, either:
 1. Add a second trigger entry (preferred -- structural routing, zero code change)
 2. Temporarily change the trigger's `workflowId` to `wr.discovery` for discovery sessions, then change it back (workable but manual and error-prone)
 3. Use console AUTO dispatch and set `workflowId: wr.discovery` explicitly in the dispatch request (for console-dispatched sessions only)

 Option 1 is the right answer.

-### Strongest alternative: Direction C (add discoveryMode flag to coding-task
+### Strongest alternative: Direction C (add discoveryMode flag to wr.coding-task)

 If the two-trigger routing were unavailable (it is not), adding `runCondition: { var: "discoveryMode", not_equals: true }` to phases 6-7 would also provide structural enforcement. Loses: workflow cleanliness, YAGNI compliance, reversibility. Not recommended when Direction A is available.

package/docs/discovery/worktrain-status-briefing.md
CHANGED
@@ -301,7 +301,7 @@ ACTIVE (2 sessions)
 Discovery: what data exists today that a 'worktrain status' plain-English briefing command could use
 Step: phase-3-synthesize Running 22 min

-● coding-task
+● wr.coding-task
 Implement GitHub polling adapter for Issues/PRs without requiring webhooks
 Step: phase-2-implement Running 8 min ⚠ no activity for 18 min

package/docs/discovery/wr-discovery-goal-reframing.md
CHANGED
@@ -296,7 +296,7 @@ C2 is more structurally correct -- a mandatory separate step enforces that goal

 ### Next actions

-These findings are the input to Phase 2: the `workflow-for-workflows` workflow will design the implementation based on this diagnosis.
+These findings are the input to Phase 2: the `wr.workflow-for-workflows` workflow will design the implementation based on this diagnosis.

 1. The wfw workflow should receive: the full diagnosis (Phase 0 is the root cause), the specific changes needed (5 changes listed above), the priority order (Phase 0 goalType classification is highest priority), and the decision to implement the C1+C3 hybrid, not C2.
 2. After wfw produces the improved workflow, write it to `workflows/wr.discovery.json`.
package/docs/docker.md
CHANGED
@@ -70,7 +70,7 @@ echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | docker run -
 echo '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"workflow_list","arguments":{}}}' | docker run --rm -i workrail-mcp

 # Test getting a specific workflow
-echo '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"workflow_get","arguments":{"id":"coding-task
+echo '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"workflow_get","arguments":{"id":"wr.coding-task","mode":"metadata"}}}' | docker run --rm -i workrail-mcp
 ```

 ## Custom Workflows
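The `workflow_get` request in the docker.md hunk above is a hand-written JSON-RPC string, which is easy to break when a workflow id changes (as it did here). A minimal sketch of building the same payload programmatically follows; `buildToolCall` and `payload` are hypothetical names introduced for illustration, while the method name, tool name, and arguments are copied from the diff.

```typescript
// Builds a JSON-RPC 2.0 "tools/call" request string with the same shape as the
// echo commands in docker.md. Key order follows insertion order, so the output
// matches the hand-written request byte for byte.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): string {
  return JSON.stringify({
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  });
}

const payload = buildToolCall(3, "workflow_get", {
  id: "wr.coding-task",
  mode: "metadata",
});
```

The resulting string can be piped to `docker run --rm -i workrail-mcp` exactly as the existing `echo` commands do.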
package/docs/ideas/backlog.md
CHANGED
@@ -7577,3 +7577,230 @@ Discovery session `ecf359d7` running: 77 turns, 11 step advances (active, making
 **Priority:** High. Every daemon crash currently wastes all in-flight work and waits up to 56 min before retrying. With even basic resume (step > 0 → resume, step = 0 → discard + fast re-dispatch), we'd recover most of the lost work and reduce retry latency from 56 min to < 5 min.

 **Depends on:** Conversation history persistence (for high-quality resume context).
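The basic resume rule in the priority note above (step > 0 → resume, step = 0 → discard + fast re-dispatch) is small enough to state as a pure function. This is a sketch under assumptions, not daemon code: `RecoveryAction` and `recoveryActionFor` are hypothetical names, and reading "step" as the number of step advances a session had recorded before the crash is an interpretation of the note.

```typescript
// Hypothetical names; the two outcomes mirror the rule in the backlog entry.
type RecoveryAction = "resume" | "discard-and-redispatch";

// stepAdvances: how many step advances the crashed session had recorded
// (assumption: this is what "step" means in the backlog note).
function recoveryActionFor(stepAdvances: number): RecoveryAction {
  if (!Number.isInteger(stepAdvances) || stepAdvances < 0) {
    throw new RangeError(
      `stepAdvances must be a non-negative integer, got ${stepAdvances}`
    );
  }
  // Any recorded progress is worth keeping; a session that never advanced has
  // nothing to resume, so it is cheaper to discard it and re-dispatch quickly.
  return stepAdvances > 0 ? "resume" : "discard-and-redispatch";
}
```

For example, a session like the one in the hunk header (11 step advances) would map to `"resume"`, while a session that crashed before its first advance would map to `"discard-and-redispatch"`.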
+
+---
+
+## Current state update (Apr 23, 2026)
+
+**npm version: v3.66.0** | Daemon: stopped (intentionally, undergoing MCP reconnect) | MCP: reconnecting to updated binary
+
+---
+
+### What shipped in this session (Apr 22-23, 2026)
+
+This was a major session covering daemon/console separation, metrics infrastructure, and workflow stability fixes.
+
+**Architecture -- daemon/console/MCP separation:**
+- ✅ **Delete daemon-console.ts** (#753) -- daemon no longer bundles an embedded console; `worktrain console` is now the sole console entry point
+- ✅ **Remove dead steer/poll endpoints** (#755) -- deleted `worktrain trigger poll` CLI and the steer/poll HTTP endpoints that were only used by the deleted daemon-console
+- ✅ **Wire workflow catalog into standalone console** (#783, open) -- `worktrain console` Workflows tab now works without the MCP server running; `EnhancedMultiSourceWorkflowStorage` constructed directly in `standalone-console.ts`
+
+**Metrics infrastructure (6-step sequence, all merged):**
+- ✅ **timestampMs on events** (#768, #772) -- `DomainEventEnvelopeV1Schema` now has required `timestampMs`; backfill script at `scripts/backfill-timestamps.ts`
+- ✅ **`run_completed` event** (#773) -- emitted on successful session completion with `startGitSha`, `endGitSha`, `agentCommitShas`, `captureConfidence`, `durationMs`
+- ✅ **Authoring docs: metrics_* keys** (#767) -- `metricsProfile` field and SHA accumulation convention documented in `docs/authoring-v2.md`
+- ✅ **`projectSessionMetricsV2` projection** (#771) -- pure projection reading `run_completed` + `context_set metrics_*` keys, wired into `ConsoleSessionSummary`
+- ✅ **Console metrics display** (#777) -- `SessionMetricsSection` in session detail view; `GET /api/v2/sessions/:id/diff-summary` endpoint
+- ✅ **`stats-summary.json` writer** (#769) -- `~/.workrail/data/stats-summary.json` aggregated from `execution-stats.jsonl`, written post-session and every 30s heartbeat
+
+**Engine improvements:**
+- ✅ **Execution time tracking** (#756) -- `execution-stats.jsonl` per session in finally block
+- ✅ **Worktree orphan leak fix** (#756) -- sidecar deletion deferred to `maybeRunDelivery()` for worktree sessions
+- ✅ **assertNever for ReviewSeverity** (#756)
+- ✅ **Crash recovery phase A** (#759) -- `clearQueueIssueSidecars()` fixes 56-min re-dispatch block; sidecar preservation for sessions with progress
+- ✅ **Conversation history persistence** (#762) -- `<sessionId>-conversation.jsonl` per daemon session, append-only delta flush at each turn
+- ✅ **queue-poll.jsonl rotation** (#761) -- 10 MB size cap with `.1` backup
+- ✅ **Remove WorkTrain-owned label writes** (#765) -- `worktrain:in-progress`, `worktrain:generated` labels removed; deduplication now purely internal (sidecar + dispatchingIssues + session scan)
+- ✅ **metricsProfile footer injection** (#779) -- engine injects `metrics_*` accumulation footers based on `metricsProfile` workflow field; all 35 bundled workflows assigned profiles
+
+**Workflow namespace:**
+- ✅ **Rename all bundled workflows to `wr.*`** (#782, open) -- `coding-task-workflow-agentic` → `wr.coding-task`, `mr-review-workflow-agentic` → `wr.mr-review`, etc. Prevents local project source from shadowing bundled workflows on version mismatch.
+
+---
+
+### Open PRs (waiting for WorkRail MCP review before merge)
+
+| PR | Title | Status |
+|---|---|---|
+| #782 | Rename all bundled workflows to `wr.*` namespace | CI passing, needs `wr.mr-review` |
+| #783 | Wire workflow catalog into standalone console | CI pending, needs `wr.mr-review` |
+
+**Do not merge #782 or #783 without running `wr.mr-review` on each.** The MCP needs to reconnect to the updated 3.66.0 binary first.
+
+---
+
+### Active bugs (investigated, not yet fixed)
+
+1. **`additionalProperties: false` not enforced in Ajv** -- `src/application/validation.ts` uses `strict: false`, making schema's `additionalProperties` advisory only. A workflow with an unknown field passes `validate:registry`. Discovery+shaping in progress (agent running). **High priority -- fix before next release.**
+
+2. **`wr.mr-review` NOT_FOUND from MCP** -- `list_workflows` finds it but `start_workflow` returns NOT_FOUND. Root cause: MCP process is still running old 3.60.0 binary (global npm was stale). Fixed by `npm update -g @exaudeus/workrail` (done). Requires MCP reconnect to take effect.
+
+3. **User's `wr.discovery` VALIDATION_ERROR** -- stale `npx` cache pre-3.11.2. Fix: `npm cache clean --force && npx @exaudeus/workrail`. No code change needed.
+
+---
+
+### Known gaps (not yet started)
+
+- **Phase B crash recovery** -- actual agent loop restart after crash (not just sidecar preservation). Blocked on conversation history being tested end-to-end. See "Autonomous crash recovery" entry above.
|
|
7645
|
+
- **`workrail cleanup` command** -- removes dead managed sources, old sessions. Still needed.
|
|
7646
|
+
- **console-routes.ts dispatch coupling** -- `POST /api/v2/auto/dispatch` still imports `runWorkflow` from `src/daemon/`. See backlog entry.
|
|
7647
|
+
- **`wr.*` list/get inconsistency** -- user-source `wr.*` copies appear in list but execution uses bundled. Low priority.
|
|
7648
|
+
|
|
7649
|
+
---
|
|
7650
|
+
|
|
7651
|
+
### Current system state (for next engineer picking this up)
|
|
7652
|
+
|
|
7653
|
+
**Daemon:** Stopped intentionally. Unload: `launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/io.worktrain.daemon.plist`
|
|
7654
|
+
**To restart daemon:** `launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/io.worktrain.daemon.plist`
|
|
7655
|
+
**MCP server:** Reconnecting -- run `/mcp` in Claude Code to get fresh 3.66.0 process
|
|
7656
|
+
**Global npm:** Updated to 3.66.0 (`npm update -g @exaudeus/workrail`)
|
|
7657
|
+
**Local build:** Built from main at 3.66.0 (`npm run build` done)
|
|
7658
|
+
**triggers.yml:** Must update `workflowId` values to new `wr.*` IDs after #782 merges (e.g. `coding-task-workflow-agentic` → `wr.coding-task`)
|
|
7659
|
+
|
|
7660
|
+
**Immediate next actions:**
|
|
7661
|
+
1. Reconnect MCP (`/mcp` in Claude Code)
|
|
7662
|
+
2. Run `wr.mr-review` on PR #782 (rename) and PR #783 (console fix)
|
|
7663
|
+
3. Merge both PRs
|
|
7664
|
+
4. Wait for validation fix shaping to complete, then code and ship it
|
|
7665
|
+
5. Update `triggers.yml` with new `wr.*` workflow IDs
|
|
7666
|
+
6. Restart daemon and monitor first pipeline run with new IDs
|
|
7667
|
+
|
|
7668
|
+
---
|
|
7669
|
+
|
|
7670
|
+
## Daemon agent loop stall detection (Apr 23, 2026)
|
|
7671
|
+
|
|
7672
|
+
**The problem:** When a subagent (workrail-executor) stalls with no progress for 600s, the stream watchdog kills it with "Agent stalled: no progress for 600s (stream watchdog did not recover)". The daemon has no equivalent mechanism. A daemon session that stops making LLM API calls (e.g. waiting on a hung tool, a network issue with no timeout, or a silent deadlock) will hang until the wall-clock timeout fires, which can take 55-65 minutes. In the meantime the operator gets no indication, there is no early abort, and no event is emitted.
|
|
7673
|
+
|
|
7674
|
+
**What we want:** The daemon's `AgentLoop` should detect when no LLM turn starts within a configurable window (e.g. 120s) and abort the session with a `'stuck'` result. This is different from the existing `repeated_tool_call` / `no_progress` stuck detection, which watches for behavioral loops. This is a liveness check: if the loop simply isn't making any API calls at all, something is frozen.
|
|
7675
|
+
|
|
7676
|
+
**Design sketch:**
|
|
7677
|
+
- In `src/daemon/agent-loop.ts`, add a per-turn heartbeat timer that resets each time an LLM call starts
|
|
7678
|
+
- If the timer fires (120s with no new turn), call `agent.abort()` and emit `agent_stuck` with `reason: 'no_llm_turn'`
|
|
7679
|
+
- Configurable via `agentConfig.stallTimeoutSeconds` in `triggers.yml` (default 120s)
|
|
7680
|
+
- Distinct from wall-clock timeout (`maxSessionMinutes`) which covers the full session
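
The liveness check in the design sketch above can be illustrated with a small, clock-injected helper. This is a minimal sketch under assumptions, not the daemon's code: `StallWatchdog`, `noteTurnStarted`, and `check` are hypothetical names, and a real `AgentLoop` would presumably poll `check` from a timer and call `agent.abort()` when it reports `stuck`.

```typescript
// Hypothetical sketch: none of these names come from the WorkRail source.
type StallCheck =
  | { kind: 'alive' }
  | { kind: 'stuck'; reason: 'no_llm_turn'; idleMs: number };

class StallWatchdog {
  private readonly stallTimeoutMs: number;
  private lastTurnStartedAtMs: number;

  constructor(stallTimeoutSeconds: number, nowMs: number) {
    this.stallTimeoutMs = stallTimeoutSeconds * 1000;
    this.lastTurnStartedAtMs = nowMs;
  }

  // Call whenever an LLM call starts; this resets the heartbeat.
  noteTurnStarted(nowMs: number): void {
    this.lastTurnStartedAtMs = nowMs;
  }

  // Call periodically; the result depends only on the supplied clock value.
  check(nowMs: number): StallCheck {
    const idleMs = nowMs - this.lastTurnStartedAtMs;
    if (idleMs >= this.stallTimeoutMs) {
      return { kind: 'stuck', reason: 'no_llm_turn', idleMs };
    }
    return { kind: 'alive' };
  }
}
```

Passing the clock in as a parameter keeps the check independent of the wall-clock session timeout and makes it testable without real timers.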
|
|
7681
|
+
|
|
7682
|
+
**Where to look:**
|
|
7683
|
+
- `src/daemon/agent-loop.ts` -- `_runLoop()`, where LLM calls are made
|
|
7684
|
+
- `src/daemon/workflow-runner.ts` -- existing stuck detection and abort logic
|
|
7685
|
+
- `src/daemon/daemon-events.ts` -- `AgentStuckEvent` already has `reason` union (add `'no_llm_turn'`)
|
|
7686
|
+
|
|
7687
|
+
**Priority:** Medium. The wall-clock timeout provides a safety net, but 55 minutes or more is a long time to wait for a frozen session. A 2-minute liveness check would let the operator see and recover from a frozen session within minutes instead of roughly an hour.
|
|
7688
|
+
|
|
7689
|
+
---
|
|
7690
|
+
|
|
7691
|
+
## Versioned workflow schema validation (Apr 23, 2026)
|
|
7692
|
+
|
|
7693
|
+
### The problem
|
|
7694
|
+
|
|
7695
|
+
WorkRail validates workflow files against the schema bundled in the currently-running MCP binary. This creates a bidirectional version mismatch problem:
|
|
7696
|
+
|
|
7697
|
+
**New binary, old workflow:** Binary's schema has tightened validation (new required field, removed enum value) → old workflow fails → silently dropped from registry.
|
|
7698
|
+
|
|
7699
|
+
**Old binary, new workflow:** Workflow has new fields the old schema doesn't know about → `additionalProperties: false` rejects it → silently dropped from registry.
|
|
7700
|
+
|
|
7701
|
+
Both directions cause the same symptom: workflow disappears from `list_workflows` with no explanation. This is what we hit Apr 22-23: local `workflows/` directory (v3.66.0 with `metricsProfile`) was loaded by an old global npm binary (v3.60.0) whose schema didn't know `metricsProfile`, causing the entire registry to appear empty.
|
|
7702
|
+
|
|
7703
|
+
The `wr.*` rename solves the specific case of bundled workflows being shadowed by local project files. But the version mismatch problem affects any non-bundled workflow (user, managed source, project) when the binary and the workflow file are at different schema versions.
|
|
7704
|
+
|
|
7705
|
+
### The right long-term fix: versioned schema validation (like Android Room migrations)
|
|
7706
|
+
|
|
7707
|
+
**Model:** Each workflow declares `"schemaVersion": 2` (integer). The binary ships validator copies for every schema version it supports. When loading a workflow, pick the validator matching the declared version -- not the current one.
|
|
7708
|
+
|
|
7709
|
+
```json
|
|
7710
|
+
{ "schemaVersion": 2, "id": "my-workflow", ... }
|
|
7711
|
+
```
|
|
7712
|
+
|
|
7713
|
+
**Load-time logic:**
|
|
7714
|
+
1. Read `schemaVersion` from the workflow file (default to 1 if absent -- legacy workflows)
|
|
7715
|
+
2. If `schemaVersion === current`: validate against current schema directly
|
|
7716
|
+
3. If `schemaVersion < current` (binary newer): validate against the declared schema version (workflow is valid for its era)
|
|
7717
|
+
4. If `schemaVersion > current` (binary too old): load leniently with warnings -- binary doesn't know this schema version, so `additionalProperties: false` doesn't apply
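
The four load-time steps above can be sketched as a single pure function. This is an illustration under assumptions: `planValidation` and `ValidationPlan` are hypothetical names, the actual validator lookup is left out, and sanity checks on the declared value (for example, rejecting non-integers) are omitted.

```typescript
// Hypothetical sketch of the load-time dispatch; names are illustrative only.
type ValidationPlan =
  | { mode: 'strict'; validateAgainst: number }
  | { mode: 'lenient'; warning: string };

function planValidation(declared: number | undefined, current: number): ValidationPlan {
  // Step 1: legacy workflows without schemaVersion are treated as version 1.
  const schemaVersion = declared ?? 1;

  // Step 4: the binary is too old to know this schema version.
  if (schemaVersion > current) {
    return {
      mode: 'lenient',
      warning: `schemaVersion ${schemaVersion} is newer than supported version ${current}; unknown fields are ignored`,
    };
  }

  // Steps 2 and 3: validate against the declared version. It equals
  // `current` in step 2 and is an older bundled schema in step 3.
  return { mode: 'strict', validateAgainst: schemaVersion };
}
```

For example, `planValidation(undefined, 3)` selects strict validation against version 1, while `planValidation(5, 4)` falls back to lenient loading with a warning.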
|
|
7718
|
+
|
|
7719
|
+
This gives you full schema freedom going forward. You can add required fields, tighten enums, rename things -- without breaking workflows written for older schema versions.
|
|
7720
|
+
|
|
7721
|
+
### Why NOT migrations (yet)
|
|
7722
|
+
|
|
7723
|
+
A migration chain (v1→v2→v3 transform functions like Room) is the logical extension but adds complexity:
|
|
7724
|
+
- Migration functions on JSON documents with free-form prose are harder to write correctly than SQL schema migrations
|
|
7725
|
+
- The forward direction (binary too old for workflow) can't be migrated -- you'd have to downgrade the workflow, which is lossy
|
|
7726
|
+
- Requires process discipline: every schema change must increment the version AND write a migration. Easy to forget.
|
|
7727
|
+
- Migration chain length grows over time -- a v1 workflow loading against a v10 binary runs 9 migrations
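
To make the chain-length concern concrete, here is a minimal sketch of what a migration chain might look like. Everything in it is hypothetical (`WorkflowDoc`, `Migration`, `migrate`, and the convention of keying each step by the version it upgrades from); it only shows that a v1 document loaded by a v10 binary passes through nine transforms.

```typescript
// Hypothetical sketch: a migration chain keyed by the version each step upgrades FROM.
type WorkflowDoc = { schemaVersion?: number; [key: string]: unknown };
type Migration = (doc: WorkflowDoc) => WorkflowDoc;

function migrate(
  doc: WorkflowDoc,
  migrations: Map<number, Migration>,
  target: number,
): { doc: WorkflowDoc; steps: number } {
  let current = doc;
  let version = doc.schemaVersion ?? 1; // absent means legacy v1
  let steps = 0;
  while (version < target) {
    const step = migrations.get(version);
    if (step === undefined) {
      // A forgotten migration breaks the whole chain.
      throw new Error(`no migration registered from v${version}`);
    }
    current = { ...step(current), schemaVersion: version + 1 };
    version += 1;
    steps += 1;
  }
  return { doc: current, steps };
}
```

The `throw` branch is the process-discipline risk from the list above: one missing step makes every older workflow unloadable.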
|
|
7728
|
+
|
|
7729
|
+
**Recommended phased approach:**
|
|
7730
|
+
- **Phase 1 (ship first):** Schema version dispatch without migrations. Keep old schema files (`spec/workflow.schema.v1.json`, `v2.json`, etc.). Validate each workflow against its declared version. No migration functions yet. Simple, low risk, and covers the backward direction (binary newer than workflow).
|
|
7731
|
+
- **Phase 2 (add when needed):** Add migration functions when you actually need to make a breaking schema change that would invalidate old workflows. Not before.
|
|
7732
|
+
|
|
7733
|
+
### Gaps and known issues with this approach
|
|
7734
|
+
|
|
7735
|
+
1. **Forward direction still requires leniency.** When a workflow declares `schemaVersion: 5` and the binary only knows up to `schemaVersion: 4`, the only option is lenient loading with warnings. This is the `additionalProperties: true` approach, scoped to the mismatch case. This is acceptable -- if the binary is too old, it can still try to run the workflow with unknown fields ignored.
|
|
7736
|
+
|
|
7737
|
+
2. **Schema version vs. authoring spec version.** WorkRail already has `validatedAgainstSpecVersion` on workflows (authoring spec -- style/quality). `schemaVersion` is separate (structural validity). Two version numbers with similar names need clear documentation.
|
|
7738
|
+
|
|
7739
|
+
3. **External author burden.** When a new schema version ships, teams using managed sources need to know what changed and whether their workflows need updating. A changelog per schema version is required.
|
|
7740
|
+
|
|
7741
|
+
4. **Default for legacy workflows.** Workflows without `schemaVersion` should default to `1` (oldest supported), not current. This means they get validated against the oldest schema -- which is lenient and permissive -- rather than the current strict one. Acceptable tradeoff.
|
|
7742
|
+
|
|
7743
|
+
5. **`workflow-for-workflows` should stamp `schemaVersion`.** When authoring or modernizing a workflow, `wfw` should set `schemaVersion` to the current version automatically. This keeps the version accurate without requiring manual maintenance.
|
|
7744
|
+
|
|
7745
|
+
### What's already in place
|
|
7746
|
+
|
|
7747
|
+
- `validatedAgainstSpecVersion` field exists on workflows (authoring spec version, different concept)
|
|
7748
|
+
- `workflow.schema.json` has a `$id` with a version string (`v0.3.0`) but it's decorative -- not used at runtime
|
|
7749
|
+
- Validation warnings in `list_workflows` (PR #787) give users visibility when their workflow is silently dropped -- this is the interim fix until versioned validation ships
|
|
7750
|
+
|
|
7751
|
+
### Files to change (Phase 1)
|
|
7752
|
+
|
|
7753
|
+
- `spec/workflow.schema.json` -- add `schemaVersion` as an optional integer field (default 1 if absent)
|
|
7754
|
+
- `spec/workflow.schema.v1.json` -- snapshot of the current schema as "v1" (baseline)
|
|
7755
|
+
- `src/application/validation.ts` -- version dispatch: load the right schema based on `schemaVersion`
|
|
7756
|
+
- `src/types/workflow-definition.ts` -- add `readonly schemaVersion?: number` to `WorkflowDefinition`
|
|
7757
|
+
- `workflow-for-workflows.json` -- add step that stamps `schemaVersion` on the authored workflow
|
|
7758
|
+
- All bundled workflows -- add `"schemaVersion": 1` (once Phase 1 ships, bump to whatever the current version is)
|
|
7759
|
+
|
|
7760
|
+
### Priority
|
|
7761
|
+
|
|
7762
|
+
Medium-High. The `wr.*` rename (PR #782) is the immediate fix. This is the permanent architectural solution that prevents the problem for all workflow sources, not just bundled ones. Should ship after the rename stabilizes.
|
|
7763
|
+
|
|
7764
|
+
**Implementation note (Apr 23):** Start with v1 = current schema. A git history audit is running to check whether any breaking changes have already been shipped. If none found: all existing workflows are valid against the current schema, v1 = today, no reconstruction needed. If breaking changes are found: snapshot the pre-break schema as v1, declare current as v2 (or higher), and existing workflows without `schemaVersion` default to v1. **Do not ship schema versioning until the audit completes and this determination is made.**
|
|
7765
|
+
|
|
7766
|
+
**Audit result (Apr 23):** Exactly one breaking change found -- commit `b3212b45` (Apr 5, 2026) restructured `assessmentConsequenceTrigger` (`dimensionId`/`equalsLevel` → `anyEqualsLevel`). This affected only the `assessmentConsequences` feature, which was introduced 4 days earlier (Apr 1). The bundled workflows that used it were migrated atomically in the same commit. No external workflows could have adopted this feature in that 4-day window. The other 14 schema changes in history are all additive or loosening, so they are fully backward compatible.
|
|
7767
|
+
|
|
7768
|
+
**Decision: v1 = current schema. No historical reconstruction needed.** The one breaking change was fully contained within the bundled workflow corpus at the time it shipped.
|
|
7769
|
+
|
|
7770
|
+
---
|
|
7771
|
+
|
|
7772
|
+
## Consider rewriting WorkRail engine in Kotlin (Apr 23, 2026)
|
|
7773
|
+
|
|
7774
|
+
### The argument
|
|
7775
|
+
|
|
7776
|
+
WorkRail's coding philosophy demands "make illegal states unrepresentable" and "type safety as the first line of defense." TypeScript is structurally at odds with this: the compiler is advisory, not enforcing. `as unknown as`, `any`, and type assertion casts are always one line away. In a codebase where autonomous agents write and merge code without deep human review, the compiler is the reviewer -- and TypeScript's escape hatches make it too easy for an agent to paper over a real design problem with a cast.
|
|
7777
|
+
|
|
7778
|
+
Evidence from today's work: the `RunCompletedDataExpected` intermediate interface and the `as unknown as` cast in `session-metrics.ts` both existed for weeks. TypeScript didn't prevent them. A stricter compiler -- one where bypass requires genuine effort -- raises the bar the agent has to clear before code is valid.
|
|
7779
|
+
|
|
7780
|
+
### What Kotlin actually buys
|
|
7781
|
+
|
|
7782
|
+
- **Sealed classes** -- exhaustive `when` is a compile error, not a runtime `assertNever` pattern that convention must enforce
|
|
7783
|
+
- **No easy escape hatch** -- `as` in Kotlin throws at runtime on type mismatch; there's no equivalent of `as unknown as` that silently lies to the compiler
|
|
7784
|
+
- **Null safety by default** -- `String` vs `String?` is a language distinction, not a `strict: true` compiler flag that can be turned off
|
|
7785
|
+
- **Value classes and data classes** -- less boilerplate for domain types, stronger invariants
|
|
7786
|
+
|
|
7787
|
+
### What TypeScript + current tooling already covers
|
|
7788
|
+
|
|
7789
|
+
- Zod at boundaries provides runtime validation that Kotlin's type system would provide at compile time -- this gap is smaller than it looks
|
|
7790
|
+
- `neverthrow` gives Result types
|
|
7791
|
+
- Discriminated unions + `assertNever` give exhaustiveness -- but enforced by convention, not the compiler
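
As an illustration of the last point, the sketch below shows the discriminated union plus `assertNever` pattern and how a double cast gets past it. `SessionOutcome` and `describeOutcome` are made-up names for this example, not types from the WorkRail source.

```typescript
// Illustrative only: SessionOutcome is an assumed shape, not a real WorkRail type.
type SessionOutcome =
  | { kind: 'completed'; durationMs: number }
  | { kind: 'stuck'; reason: string }
  | { kind: 'timed_out' };

function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

function describeOutcome(outcome: SessionOutcome): string {
  switch (outcome.kind) {
    case 'completed':
      return `completed in ${outcome.durationMs}ms`;
    case 'stuck':
      return `stuck: ${outcome.reason}`;
    case 'timed_out':
      return 'timed out';
    default:
      // Adding a variant to SessionOutcome turns this call into a type error,
      // but only because the author remembered to write this default branch.
      return assertNever(outcome);
  }
}

// The escape hatch: this compiles, and the mismatch only surfaces at runtime.
const forged = { kind: 'crashed' } as unknown as SessionOutcome;
```

Calling `describeOutcome(forged)` type-checks and then throws from `assertNever`, which is the gap between convention and compiler enforcement that this section describes.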
|
|
7792
|
+
|
|
7793
|
+
### Real costs
|
|
7794
|
+
|
|
7795
|
+
- JVM startup latency for an MCP server that starts/stops frequently -- mitigable with GraalVM native image, but adds build complexity
|
|
7796
|
+
- Full rewrite of `src/` -- months of work, not weeks
|
|
7797
|
+
- Console stays TypeScript/React regardless
|
|
7798
|
+
- The Kotlin MCP SDK exists but the ecosystem tooling (npm, Node.js file I/O patterns) needs reimplementation
|
|
7799
|
+
|
|
7800
|
+
### The honest tradeoff
|
|
7801
|
+
|
|
7802
|
+
Convention drift is a recurring tax. Migration is a one-time cost. In a codebase driven heavily by autonomous agents, the compiler is the last line of defense against accumulated drift. TypeScript's permissiveness means that defense has holes.
|
|
7803
|
+
|
|
7804
|
+
This is not urgent -- the current codebase is working well. But if autonomous agent usage grows and human review per-PR decreases further, the compiler enforcement gap becomes more important, not less.
|
|
7805
|
+
|
|
7806
|
+
**Priority:** Low / long-term. Worth revisiting when the agent is writing the majority of new code. Requires a concrete spike: rewrite one module (e.g. `src/v2/durable-core/domain/`) in Kotlin and measure the real friction before committing to a full migration.
|
|
@@ -144,7 +144,7 @@ agent = Agent(
|
|
|
144
144
|
subagent_type="workrail-executor",
|
|
145
145
|
description="Execute context gathering",
|
|
146
146
|
prompt="""
|
|
147
|
-
Start the routine-context-gathering workflow.
|
|
147
|
+
Start the wr.routine-context-gathering workflow.
|
|
148
148
|
|
|
149
149
|
Workspace: /path/to/project
|
|
150
150
|
Focus: COMPLETENESS
|
|
@@ -156,7 +156,7 @@ agent = Agent(
|
|
|
156
156
|
Or from the main agent in Claude Code:
|
|
157
157
|
|
|
158
158
|
```
|
|
159
|
-
Please use the workrail-executor agent to run the bug-investigation
|
|
159
|
+
Please use the workrail-executor agent to run the wr.bug-investigation workflow
|
|
160
160
|
```
|
|
161
161
|
|
|
162
162
|
---
|
|
@@ -273,15 +273,15 @@ Later repositories override earlier ones with the same workflow ID.
|
|
|
273
273
|
### Running a workflow directly
|
|
274
274
|
|
|
275
275
|
```
|
|
276
|
-
> Use the bug-investigation
|
|
276
|
+
> Use the wr.bug-investigation workflow to investigate the cache expiration issue
|
|
277
277
|
```
|
|
278
278
|
|
|
279
279
|
### Delegating to workrail-executor
|
|
280
280
|
|
|
281
281
|
```
|
|
282
282
|
> Spawn two workrail-executor agents in parallel:
|
|
283
|
-
> 1. One running routine-context-gathering with focus=COMPLETENESS
|
|
284
|
-
> 2. One running routine-context-gathering with focus=DEPTH
|
|
283
|
+
> 1. One running wr.routine-context-gathering with focus=COMPLETENESS
|
|
284
|
+
> 2. One running wr.routine-context-gathering with focus=DEPTH
|
|
285
285
|
```
|
|
286
286
|
|
|
287
287
|
### Resuming a checkpointed workflow
|
|
@@ -22,7 +22,7 @@ The rollout is structured in **3 Phased Tiers**, gated by feature flags, ensurin
|
|
|
22
22
|
* Creation of `bug-investigation.agentic.json` with manual delegation instructions.
|
|
23
23
|
* Implementation of the "Delegate or Proxy" prompt pattern directly in the JSON.
|
|
24
24
|
3. **The Diagnostic Suite:**
|
|
25
|
-
* `
|
|
25
|
+
* `wr.diagnose-environment.json`: Agent-driven wizard to probe capabilities and generate config.
|
|
26
26
|
* `docs/integrations/firebender.md`: Documentation on tool whitelisting constraints.
|
|
27
27
|
|
|
28
28
|
**User Experience:**
|
|
@@ -83,7 +83,7 @@ The rollout is structured in **3 Phased Tiers**, gated by feature flags, ensurin
|
|
|
83
83
|
**Why it matters:**
|
|
84
84
|
* Keeps the primary step prompt user-voiced while still allowing start/resume-only guidance.
|
|
85
85
|
* Makes current runtime-owned supplement behavior explicit and eventually authorable.
|
|
86
|
-
* Gives workflow-for-workflows and future linting a real schema surface instead of relying on hidden server policy.
|
|
86
|
+
* Gives wr.workflow-for-workflows and future linting a real schema surface instead of relying on hidden server policy.
|
|
87
87
|
|
|
88
88
|
**Constraints:**
|
|
89
89
|
* Should be a **narrow, typed feature**, not arbitrary extra prompt sludge.
|
|
@@ -299,11 +299,11 @@ The redesign currently references a few routines conceptually, but it should mak
|
|
|
299
299
|
|
|
300
300
|
High-value candidates include:
|
|
301
301
|
|
|
302
|
-
- `routine-context-gathering`
|
|
303
|
-
- `routine-hypothesis-challenge`
|
|
304
|
-
- `routine-execution-simulation`
|
|
305
|
-
- `routine-philosophy-alignment`
|
|
306
|
-
- `routine-final-verification`
|
|
302
|
+
- `wr.routine-context-gathering`
|
|
303
|
+
- `wr.routine-hypothesis-challenge`
|
|
304
|
+
- `wr.routine-execution-simulation`
|
|
305
|
+
- `wr.routine-philosophy-alignment`
|
|
306
|
+
- `wr.routine-final-verification`
|
|
307
307
|
|
|
308
308
|
These should be treated as current reusable building blocks, not future ideas.
|
|
309
309
|
|
|
@@ -786,9 +786,9 @@ The workflow should further strengthen:
|
|
|
786
786
|
|
|
787
787
|
This phase should explicitly consider use of:
|
|
788
788
|
|
|
789
|
-
- `routine-hypothesis-challenge` for adversarial reviewer challenge
|
|
790
|
-
- `routine-execution-simulation` when runtime behavior or branch-sensitive behavior is material
|
|
791
|
-
- `routine-philosophy-alignment` when policy-context is important enough to affect recommendation quality
|
|
789
|
+
- `wr.routine-hypothesis-challenge` for adversarial reviewer challenge
|
|
790
|
+
- `wr.routine-execution-simulation` when runtime behavior or branch-sensitive behavior is material
|
|
791
|
+
- `wr.routine-philosophy-alignment` when policy-context is important enough to affect recommendation quality
|
|
792
792
|
|
|
793
793
|
## Phase 4: Contradiction, Gap, and Boundary Resolution Loop
|
|
794
794
|
|
|
@@ -822,7 +822,7 @@ The current final validation idea remains useful, but it should explicitly valid
|
|
|
822
822
|
|
|
823
823
|
Final validation should also ensure the handoff reflects uncertainty honestly instead of over-stating confidence.
|
|
824
824
|
|
|
825
|
-
The current WorkRail routine catalog suggests the redesign should strongly consider `routine-final-verification` as either:
|
|
825
|
+
The current WorkRail routine catalog suggests the redesign should strongly consider `wr.routine-final-verification` as either:
|
|
826
826
|
|
|
827
827
|
- a delegated verifier
|
|
828
828
|
- an injected routine template
|
|
@@ -58,7 +58,7 @@
|
|
|
58
58
|
- **Tensions resolved**: all 6 failure categories; forces alternatives at hypothesis stage; evidence-based per-dimension findings
|
|
59
59
|
- **Tensions accepted**: inherent visual limitations; spec not mockup
|
|
60
60
|
- **Failure mode**: reviewer families produce generic UX advice not tied to actual design context
|
|
61
|
-
- **Repo pattern**: directly adapts `production-readiness-audit.json` structure; auditComplexity branching from `adaptive-ticket-creation.json`
|
|
61
|
+
- **Repo pattern**: directly adapts `wr.production-readiness-audit.json` structure; auditComplexity branching from `wr.adaptive-ticket-creation.json`
|
|
62
62
|
- **Gains**: comprehensive, structured freedom, all failure categories covered
|
|
63
63
|
- **Losses**: heavier than minimal for simple tasks (mitigated by Simple fast path)
|
|
64
64
|
- **Scope**: best-fit for feature-level and screen-level design work
|
|
@@ -73,7 +73,7 @@
|
|
|
73
73
|
- **Tensions resolved**: single-solution anchoring; forces genuine exploration
|
|
74
74
|
- **Tensions accepted**: UX laws/accessibility not explicitly enforced
|
|
75
75
|
- **Failure mode**: 3 directions are superficially different (same IA, different metaphors)
|
|
76
|
-
- **Repo pattern**: adapts `architecture-scalability-audit.json` dimension-declaration
|
|
76
|
+
- **Repo pattern**: adapts `wr.architecture-scalability-audit.json` dimension-declaration
|
|
77
77
|
- **Gains**: best for exploring solution space; documents tradeoffs
|
|
78
78
|
- **Losses**: lighter on UX law enforcement; accessibility second-class
|
|
79
79
|
- **Scope**: best as a mechanism within B rather than a standalone workflow
|
|
@@ -90,11 +90,11 @@
|
|
|
90
90
|
- **Tensions resolved**: turns agent UX knowledge into structured application; fully evidence-based
|
|
91
91
|
- **Tensions accepted**: doesn't help with design-from-scratch; requires existing design as input
|
|
92
92
|
- **Failure mode**: agent audits what's in the spec but misses implicit design assumptions not stated
|
|
93
|
-
- **Repo pattern**: directly adapts `architecture-scalability-audit.json`
|
|
93
|
+
- **Repo pattern**: directly adapts `wr.architecture-scalability-audit.json`
|
|
94
94
|
- **Gains**: actionable per-dimension findings with references; complements B
|
|
95
95
|
- **Losses**: review only, not creation
|
|
96
96
|
- **Scope**: best-fit as standalone for design review; or used after B to audit the output
|
|
97
|
-
- **Philosophy**: all principles satisfied; mirrors architecture-scalability-audit exactly
|
|
97
|
+
- **Philosophy**: all principles satisfied; mirrors wr.architecture-scalability-audit exactly
|
|
98
98
|
|
|
99
99
|
## Comparison and Recommendation
|
|
100
100
|
|