@exaudeus/workrail 3.27.0 → 3.29.0

Files changed (160)
  1. package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
  2. package/dist/console/index.html +1 -1
  3. package/dist/manifest.json +3 -3
  4. package/docs/README.md +57 -0
  5. package/docs/adrs/001-hybrid-storage-backend.md +38 -0
  6. package/docs/adrs/002-four-layer-context-classification.md +38 -0
  7. package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
  8. package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
  9. package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
  10. package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
  11. package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
  12. package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
  13. package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
  14. package/docs/adrs/010-release-pipeline.md +89 -0
  15. package/docs/architecture/README.md +7 -0
  16. package/docs/architecture/refactor-audit.md +364 -0
  17. package/docs/authoring-v2.md +527 -0
  18. package/docs/authoring.md +873 -0
  19. package/docs/changelog-recent.md +201 -0
  20. package/docs/configuration.md +505 -0
  21. package/docs/ctc-mcp-proposal.md +518 -0
  22. package/docs/design/README.md +22 -0
  23. package/docs/design/agent-cascade-protocol.md +96 -0
  24. package/docs/design/autonomous-console-design-candidates.md +253 -0
  25. package/docs/design/autonomous-console-design-review.md +111 -0
  26. package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
  27. package/docs/design/claude-code-source-deep-dive.md +713 -0
  28. package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
  29. package/docs/design/console-execution-trace-candidates-final.md +160 -0
  30. package/docs/design/console-execution-trace-candidates.md +211 -0
  31. package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
  32. package/docs/design/console-execution-trace-design-review.md +74 -0
  33. package/docs/design/console-execution-trace-discovery.md +394 -0
  34. package/docs/design/console-execution-trace-final-review.md +77 -0
  35. package/docs/design/console-execution-trace-review.md +92 -0
  36. package/docs/design/console-performance-discovery.md +415 -0
  37. package/docs/design/console-ui-backlog.md +280 -0
  38. package/docs/design/daemon-architecture-discovery.md +853 -0
  39. package/docs/design/daemon-design-candidates.md +318 -0
  40. package/docs/design/daemon-design-review-findings.md +119 -0
  41. package/docs/design/daemon-engine-design-candidates.md +210 -0
  42. package/docs/design/daemon-engine-design-review.md +131 -0
  43. package/docs/design/daemon-execution-engine-discovery.md +280 -0
  44. package/docs/design/daemon-gap-analysis.md +554 -0
  45. package/docs/design/daemon-owns-console-plan.md +168 -0
  46. package/docs/design/daemon-owns-console-review.md +91 -0
  47. package/docs/design/daemon-owns-console.md +195 -0
  48. package/docs/design/data-model-erd.md +11 -0
  49. package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
  50. package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
  51. package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
  52. package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
  53. package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
  54. package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
  55. package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
  56. package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
  57. package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
  58. package/docs/design/list-workflows-latency-fix-plan.md +128 -0
  59. package/docs/design/list-workflows-latency-fix-review.md +55 -0
  60. package/docs/design/list-workflows-latency-fix.md +109 -0
  61. package/docs/design/native-context-management-api.md +11 -0
  62. package/docs/design/performance-sweep-2026-04.md +96 -0
  63. package/docs/design/routines-guide.md +219 -0
  64. package/docs/design/sequence-diagrams.md +11 -0
  65. package/docs/design/subagent-design-principles.md +220 -0
  66. package/docs/design/temporal-patterns-design-candidates.md +312 -0
  67. package/docs/design/temporal-patterns-design-review-findings.md +163 -0
  68. package/docs/design/test-isolation-from-config-file.md +335 -0
  69. package/docs/design/v2-core-design-locks.md +2746 -0
  70. package/docs/design/v2-lock-registry.json +734 -0
  71. package/docs/design/workflow-authoring-v2.md +1044 -0
  72. package/docs/design/workflow-docs-spec.md +218 -0
  73. package/docs/design/workflow-extension-points.md +687 -0
  74. package/docs/design/workrail-auto-trigger-system.md +359 -0
  75. package/docs/design/workrail-config-file-discovery.md +513 -0
  76. package/docs/docker.md +110 -0
  77. package/docs/generated/v2-lock-closure-plan.md +26 -0
  78. package/docs/generated/v2-lock-coverage.json +797 -0
  79. package/docs/generated/v2-lock-coverage.md +177 -0
  80. package/docs/ideas/backlog.md +3927 -0
  81. package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
  82. package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
  83. package/docs/ideas/implementation_plan.md +249 -0
  84. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
  85. package/docs/implementation/02-architecture.md +316 -0
  86. package/docs/implementation/04-testing-strategy.md +124 -0
  87. package/docs/implementation/09-simple-workflow-guide.md +835 -0
  88. package/docs/implementation/13-advanced-validation-guide.md +874 -0
  89. package/docs/implementation/README.md +21 -0
  90. package/docs/integrations/claude-code.md +300 -0
  91. package/docs/integrations/firebender.md +315 -0
  92. package/docs/migration/v0.1.0.md +147 -0
  93. package/docs/naming-conventions.md +45 -0
  94. package/docs/planning/README.md +104 -0
  95. package/docs/planning/github-ticketing-playbook.md +195 -0
  96. package/docs/plans/README.md +24 -0
  97. package/docs/plans/agent-managed-ticketing-design.md +605 -0
  98. package/docs/plans/agentic-orchestration-roadmap.md +112 -0
  99. package/docs/plans/assessment-gates-engine-handoff.md +536 -0
  100. package/docs/plans/content-coherence-and-references.md +151 -0
  101. package/docs/plans/library-extraction-plan.md +340 -0
  102. package/docs/plans/mr-review-workflow-redesign.md +1451 -0
  103. package/docs/plans/native-context-management-epic.md +11 -0
  104. package/docs/plans/perf-fixes-design-candidates.md +225 -0
  105. package/docs/plans/perf-fixes-design-review-findings.md +61 -0
  106. package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
  107. package/docs/plans/perf-fixes-new-issues-review.md +110 -0
  108. package/docs/plans/prompt-fragments.md +53 -0
  109. package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
  110. package/docs/plans/ui-ux-workflow-discovery.md +100 -0
  111. package/docs/plans/ui-ux-workflow-review.md +48 -0
  112. package/docs/plans/v2-followup-enhancements.md +587 -0
  113. package/docs/plans/workflow-categories-candidates.md +105 -0
  114. package/docs/plans/workflow-categories-discovery.md +110 -0
  115. package/docs/plans/workflow-categories-review.md +51 -0
  116. package/docs/plans/workflow-discovery-model-candidates.md +94 -0
  117. package/docs/plans/workflow-discovery-model-discovery.md +74 -0
  118. package/docs/plans/workflow-discovery-model-review.md +48 -0
  119. package/docs/plans/workflow-source-setup-phase-1.md +245 -0
  120. package/docs/plans/workflow-source-setup-phase-2.md +361 -0
  121. package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
  122. package/docs/plans/workflow-staleness-detection-review.md +58 -0
  123. package/docs/plans/workflow-staleness-detection.md +80 -0
  124. package/docs/plans/workflow-v2-design.md +69 -0
  125. package/docs/plans/workflow-v2-roadmap.md +74 -0
  126. package/docs/plans/workflow-validation-design.md +98 -0
  127. package/docs/plans/workflow-validation-roadmap.md +108 -0
  128. package/docs/plans/workrail-platform-vision.md +420 -0
  129. package/docs/reference/agent-context-cleaner-snippet.md +94 -0
  130. package/docs/reference/agent-context-guidance.md +140 -0
  131. package/docs/reference/context-optimization.md +284 -0
  132. package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
  133. package/docs/reference/example-workflow-repository-template/README.md +268 -0
  134. package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
  135. package/docs/reference/external-workflow-repositories.md +916 -0
  136. package/docs/reference/feature-flags-architecture.md +472 -0
  137. package/docs/reference/feature-flags.md +349 -0
  138. package/docs/reference/god-tier-workflow-validation.md +272 -0
  139. package/docs/reference/loop-optimization.md +209 -0
  140. package/docs/reference/loop-validation.md +176 -0
  141. package/docs/reference/loops.md +465 -0
  142. package/docs/reference/mcp-platform-constraints.md +59 -0
  143. package/docs/reference/recovery.md +88 -0
  144. package/docs/reference/releases.md +177 -0
  145. package/docs/reference/troubleshooting.md +105 -0
  146. package/docs/reference/workflow-execution-contract.md +998 -0
  147. package/docs/roadmap/README.md +22 -0
  148. package/docs/roadmap/legacy-planning-status.md +103 -0
  149. package/docs/roadmap/now-next-later.md +70 -0
  150. package/docs/roadmap/open-work-inventory.md +389 -0
  151. package/docs/tickets/README.md +39 -0
  152. package/docs/tickets/next-up.md +76 -0
  153. package/docs/workflow-management.md +317 -0
  154. package/docs/workflow-templates.md +423 -0
  155. package/docs/workflow-validation.md +184 -0
  156. package/docs/workflows.md +254 -0
  157. package/package.json +3 -1
  158. package/spec/authoring-spec.json +61 -16
  159. package/workflows/workflow-for-workflows.json +252 -93
  160. package/workflows/workflow-for-workflows.v2.json +188 -77
@@ -0,0 +1,11 @@
1
+ # Epic: Native Context Management for Workflows
2
+
3
+ > **Not pursuing**
4
+ >
5
+ > WorkRail is not planning to implement native context management.
6
+ >
7
+ > This file is kept only as a stable tombstone so old references do not break.
8
+ >
9
+ > Use these instead:
10
+ > - `docs/roadmap/legacy-planning-status.md`
11
+ > - `docs/roadmap/open-work-inventory.md`
@@ -0,0 +1,225 @@
1
+ # Performance Fixes: Design Candidates
2
+
3
+ **Context:** Four remaining performance fixes after a prior session implemented expanded skip list,
4
+ `MAX_WALK_DEPTH=5`, and 30s TTL walk cache.
5
+
6
+ ---
7
+
8
+ ## Problem Understanding
9
+
10
+ ### Core tensions
11
+
12
+ 1. **Determinism vs. performance** (findWorkflowJsonFiles parallelization): Making directory scan
13
+ concurrent breaks output insertion order. Resolved by sorting the result -- adds negligible
14
+ overhead at realistic file counts.
15
+
16
+ 2. **Simplicity vs. targeted protection** (timeout on walk): The simplest placement wraps all of
17
+ `createWorkflowReaderForRequest`. But the walk is only one sub-phase -- a tighter, more targeted
18
+ timeout should wrap just `discoverRootedWorkflowDirectories`.
19
+
20
+ 3. **Lazy vs. eager eviction** (TTL in remembered roots): Lazy eviction (on write) is simple and
21
+ has no background-timer risk. It only runs when `rememberRoot` is called, so a root seen
22
+ once stays on disk past its TTL until the next write. Acceptable per issue #241.
23
+
24
+ 4. **Real I/O vs. mocked infra** (latency test): A test using real `fs.mkdir` can be slow on CI.
25
+ A 500ms budget for a small synthetic tree is generous enough to avoid flakiness.
26
+
27
+ ### Likely seam
28
+
29
+ - **Parallelization**: inside `scan()` in `findWorkflowJsonFiles` -- collect subdirs, fan out with
30
+ `Promise.all`, sort final `files` array.
31
+ - **Timeout**: inside `createWorkflowReaderForRequest` wrapping `discoverRootedWorkflowDirectories`
32
+ -- single place, both handlers automatically protected.
33
+ - **TTL eviction**: inside the `andThen` chain in `rememberRoot()`, just before
34
+ `this.persist(nextRoots)` -- lock is already held, `nextRoots` already computed.
35
+ - **Latency test**: `tests/performance/perf-fixes.test.ts` following `cache-eviction.test.ts` style.
36
+
37
+ ### What makes this hard
38
+
39
+ - Parallelization + determinism: need explicit sort, not just `Promise.all`
40
+ - Timeout constant calibration: 10s is generous for most environments but may be tight on
41
+ cold-start NFS mounts before the 30s cache warms
42
+ - TTL eviction placement: must be on write path (not read path) to avoid per-call overhead
43
+ - Latency test flakiness: tree must be small enough to be fast on CI, large enough to exercise
44
+ the depth limit
45
+
46
+ ---
47
+
48
+ ## Philosophy Constraints
49
+
50
+ From `AGENTS.md` and `/Users/etienneb/CLAUDE.md`:
51
+
52
+ - **Determinism over cleverness**: parallelization requires explicit sort to restore determinism
53
+ - **Errors are data**: `withTimeout` throws; callers use `ResultAsync.fromPromise(withTimeout(...))`
54
+ -- no change to that pattern needed
55
+ - **Immutability by default**: TTL filter produces a new `nextRoots` array (does not mutate)
56
+ - **YAGNI with discipline**: no configurable TTL parameter -- use a named constant
57
+ - **Prefer fakes over mocks**: latency test uses real `fs` operations
58
+ - **Document 'why', not 'what'**: TTL constant and parallelization rationale need explanatory
59
+ comments
60
+
61
+ ### Conflicts
62
+
63
+ - **Stated: no exceptions** vs **practiced: `withTimeout` throws**. Consistent in practice:
64
+ `withTimeout` is a low-level utility; callers convert at boundary with `ResultAsync.fromPromise`.
65
+
66
+ ---
67
+
68
+ ## Impact Surface
69
+
70
+ - `findWorkflowJsonFiles` is called by `scanRawWorkflowFiles` (same file). No caller asserts
71
+ order today. Sort makes the new order contract explicit and stable.
72
+ - `createWorkflowReaderForRequest` is called from `handleV2ListWorkflows` and
73
+ `handleV2InspectWorkflow`. Adding timeout inside the shared function protects both handlers
74
+ without modifying them.
75
+ - `rememberRoot` is called from `remembered-roots.ts` shared handler helper -- no interface
76
+ change needed.
77
+ - `LocalRememberedRootsStoreV2` implements `RememberedRootsStorePortV2` -- port interface
78
+ unchanged.
79
+
80
+ ---
81
+
82
+ ## Candidates
83
+
84
+ ### Item 1: Parallelize `findWorkflowJsonFiles`
85
+
86
+ #### Candidate A (recommended): `Promise.all` fan-out + final sort
87
+
88
+ Inside `scan()`, collect subdirectory paths from `entries`, push files immediately, then
89
+ `await Promise.all(subdirs.map(dir => scan(dir)))`. After `scan(baseDirReal)` returns, call
90
+ `files.sort()` before return.
91
+
92
+ - **Tensions resolved**: sequential scan latency
93
+ - **Tensions accepted**: minor sort overhead (negligible)
94
+ - **Boundary**: inside `scan()`, no interface change
95
+ - **Why best-fit**: targets the anti-pattern directly
96
+ - **Failure mode**: if a caller depends on insertion order (none currently do)
97
+ - **Repo pattern**: follows `Promise.all` fan-out used elsewhere
98
+ - **Gain**: concurrent I/O; **Lose**: insertion order (replaced by stable sort)
99
+ - **Scope**: best-fit
100
+ - **Philosophy**: honors Determinism (via sort), Compose with small pure functions
101
+
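A minimal sketch of the fan-out-then-sort idea from Candidate A. It does not touch the real filesystem: `Dir`, `listEntries`, and `findJsonFiles` are hypothetical names, and the in-memory tree stands in for whatever directory listing the real `scan()` performs.

```typescript
// Hypothetical in-memory directory tree standing in for the real filesystem.
type Dir = { [name: string]: Dir | 'file' };

// Assumed async listing primitive; a real scanner would read the directory from disk.
async function listEntries(tree: Dir, path: string[]): Promise<[string, Dir | 'file'][]> {
  let node: Dir = tree;
  for (const part of path) {
    node = node[part] as Dir;
  }
  return Object.entries(node);
}

async function findJsonFiles(tree: Dir): Promise<string[]> {
  const files: string[] = [];

  async function scan(path: string[]): Promise<void> {
    const entries = await listEntries(tree, path);
    const subdirs: string[][] = [];
    for (const [name, node] of entries) {
      if (node === 'file') {
        // Files are pushed immediately, as the candidate describes.
        if (name.endsWith('.json')) files.push([...path, name].join('/'));
      } else {
        subdirs.push([...path, name]);
      }
    }
    // Fan out: sibling directories are scanned concurrently.
    await Promise.all(subdirs.map((dir) => scan(dir)));
  }

  await scan([]);
  // Completion order of the fan-out is not guaranteed, so sort once at the end.
  return files.sort();
}
```

The single sort at the end is what restores determinism. Sorting inside each `scan()` call would not help, because sibling scans still append to the shared array in completion order.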
102
+ #### Candidate B: Replace `statSync` with async `fs.stat`, keep sequential loop
103
+
104
+ Replaces the blocking sync call in the scan loop with async stat, but keeps sequential descent.
105
+
106
+ - Too narrow: doesn't fix the `for...of await` sequential descent -- the main bottleneck
107
+ - **Scope**: too narrow
108
+
109
+ ---
110
+
111
+ ### Item 2: Timeout protection for walk
112
+
113
+ #### Candidate A (recommended): Wrap `discoverRootedWorkflowDirectories` inside `createWorkflowReaderForRequest`
114
+
115
+ Add `DISCOVERY_TIMEOUT_MS = 10_000` constant. Replace:
116
+ ```ts
117
+ const { discovered, stale } = await discoverRootedWorkflowDirectories(rememberedRoots);
118
+ ```
119
+ with:
120
+ ```ts
+ const { discovered, stale } = await withTimeout(
+   discoverRootedWorkflowDirectories(rememberedRoots),
+   DISCOVERY_TIMEOUT_MS,
+   'workflow_root_discovery',
+ );
126
+ ```
127
+
128
+ - **Tensions resolved**: hung walk blocking handler forever; single place to maintain
129
+ - **Boundary**: `createWorkflowReaderForRequest` in shared module
130
+ - **Failure mode**: 10s may be tight on cold NFS walk -- mitigated by 30s cache for subsequent calls
131
+ - **Repo pattern**: adapts the exact same `withTimeout` pattern from `v2-workflow.ts` lines 215/363
132
+ - **Scope**: best-fit
133
+
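For readers unfamiliar with the pattern, here is one way a `withTimeout` helper of this shape could be written. This is a sketch under assumptions: the repo's own `withTimeout` is not shown in this diff, so the signature `(promise, ms, label)` is inferred from the call above and the error message format is invented for illustration.

```typescript
// Hypothetical helper; the repo's own `withTimeout` may differ in signature and error type.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // Reject with a descriptive error if the wrapped promise has not settled in time.
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    promise.then(
      (value) => {
        clearTimeout(timer); // do not leave a pending timer behind
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

Note that a helper like this only stops waiting. The underlying walk keeps running until it finishes on its own, which is one reason a per-root timeout is listed later as a pivot option.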
134
+ #### Candidate B: Wrap `createWorkflowReaderForRequest` in each handler
135
+
136
+ Two call sites. If a 3rd handler is added, it misses the timeout. Departs from DRY.
137
+
138
+ - **Scope**: too broad (and duplicated)
139
+
140
+ ---
141
+
142
+ ### Item 3: TTL eviction in `LocalRememberedRootsStoreV2`
143
+
144
+ #### Candidate A (recommended): Filter `nextRoots` in `rememberRoot()` before persist
145
+
146
+ Add `const TTL_30_DAYS_MS = 30 * 24 * 60 * 60 * 1000`. In `rememberRoot()`:
147
+
148
+ ```ts
+ const withEviction = nextRoots.filter(
+   (root) => root.lastSeenAtMs >= nowMs - TTL_30_DAYS_MS
+ );
+ return this.persist(withEviction);
153
+ ```
154
+
155
+ - **Boundary**: inside `rememberRoot()`, lock already held, `nextRoots` already computed
156
+ - **Failure mode**: roots seen once are not evicted until the next write -- acceptable
157
+ - **Repo pattern**: adapts `normalizeRootRecords` filter pattern in same file
158
+ - **Philosophy**: Immutability (new filtered array), YAGNI (no configurable TTL)
159
+
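The filter above can be exercised in isolation as a pure function. In this sketch only `lastSeenAtMs` and `TTL_30_DAYS_MS` come from the candidate. The `RootRecord` shape and the name `filterFreshRoots` are assumptions made for illustration.

```typescript
// Hypothetical record shape; only `lastSeenAtMs` is taken from the snippet above.
interface RootRecord {
  readonly path: string;
  readonly lastSeenAtMs: number;
}

// 30 days expressed in milliseconds (2,592,000,000).
const TTL_30_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Returns a new array; the input is never mutated.
// A root seen exactly TTL_30_DAYS_MS ago is still kept (`>=`).
function filterFreshRoots(roots: readonly RootRecord[], nowMs: number): RootRecord[] {
  return roots.filter((root) => root.lastSeenAtMs >= nowMs - TTL_30_DAYS_MS);
}
```

Passing `nowMs` in as a parameter, rather than reading the clock inside the function, keeps the eviction rule deterministic and easy to test at the boundary.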
160
+ #### Candidate B: Filter in `listRootRecords()` (read path)
161
+
162
+ Eviction on read removes stale entries from the in-memory result but does not persist them.
163
+ Stale entries remain on disk. Read path is called much more often -- wrong seam.
164
+
165
+ - **Scope**: wrong boundary; doesn't reduce disk accumulation
166
+
167
+ ---
168
+
169
+ ### Item 4: Latency regression test
170
+
171
+ #### Candidate A (recommended): Synthetic tree in `tests/performance/perf-fixes.test.ts`
172
+
173
+ Create a temp directory tree (depth 5, branching factor 3) with real `fs.mkdir`. Call
174
+ `discoverRootedWorkflowDirectories([treeRoot])`. Assert elapsed < 500ms.
175
+
176
+ - **Boundary**: black-box test of the exported function
177
+ - **Failure mode**: flaky on slow CI if tree is too large -- mitigated by small breadth (3) and depth (5)
178
+ - **Repo pattern**: follows `cache-eviction.test.ts` style
179
+ - **Philosophy**: Prefer fakes over mocks (real FS); Determinism (reproducible tree)
180
+
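To size the synthetic tree, it helps to count how many directories the test would create. The sketch below assumes "depth 5" means five levels beneath the tree root and that every directory has exactly three children. `countDirs` is a hypothetical helper, not part of the package.

```typescript
// Counts directories in a full tree where every directory has `branching`
// children, down to `depth` levels below the root (the root itself is excluded).
function countDirs(depth: number, branching: number): { total: number; deepest: number } {
  let levelSize = 1;
  let total = 0;
  for (let level = 1; level <= depth; level++) {
    levelSize *= branching; // directories on this level
    total += levelSize;
  }
  return { total, deepest: levelSize };
}

// Under the stated assumptions: 3 + 9 + 27 + 81 + 243 = 363 directories in total,
// 243 of them on the deepest level.
const size = countDirs(5, 3);
```

If the test instead treats the root as level 1, the tree is one level shallower and correspondingly smaller, so the budget reasoning should state which convention it uses.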
181
+ ---
182
+
183
+ ## Comparison and Recommendation
184
+
185
+ All candidates converge. Genuine diversity does not exist for these changes -- each problem has
186
+ one clearly best-fit mechanical solution.
187
+
188
+ **Proceed with all four Candidate A choices.**
189
+
190
+ Each change:
191
+ - Touches exactly one function
192
+ - Requires no interface or contract changes
193
+ - Is reversible (one-line revert if assumptions are wrong)
194
+ - Follows an existing repo pattern
195
+
196
+ ---
197
+
198
+ ## Self-Critique
199
+
200
+ ### Strongest counter-arguments
201
+
202
+ - **Parallelization**: if downstream validation depends on processing order, sorting may not be
203
+ enough and could mask a latent ordering bug. No test currently asserts order -- low risk.
204
+ - **Walk timeout at 10s**: first cold walk on a large monorepo on NFS might legitimately exceed 10s.
205
+ Would produce a user-visible timeout error on first use. The 30s cache means subsequent calls
206
+ are instant -- only the first call is at risk.
207
+
208
+ ### Pivot conditions
209
+
210
+ - If cold walk times > 10s in production: raise `DISCOVERY_TIMEOUT_MS` or add per-root timeout
211
+ inside `walkForRootedWorkflowDirectories`.
212
+ - If `findWorkflowJsonFiles` results need filesystem order: remove sort, document non-determinism.
213
+ - If TTL eviction needs to run on stale roots that are never written again: add eviction to the
214
+ read path as a side-effecting read or add a separate `evictStaleRoots()` method.
215
+
216
+ ### Narrower option that lost
217
+
218
+ Sequential `findWorkflowJsonFiles` with only `statSync` → `fs.stat`: fixes minor blocking I/O
219
+ but doesn't address the actual sequential descent anti-pattern.
220
+
221
+ ---
222
+
223
+ ## Open Questions
224
+
225
+ None that require human decision. All design choices are bounded by existing constraints.
@@ -0,0 +1,61 @@
1
+ # Performance Fixes: Design Review Findings
2
+
3
+ ## Tradeoff Review
4
+
5
+ | Tradeoff | Verdict | Notes |
6
+ |---|---|---|
7
+ | Lazy TTL eviction (write-only) | Acceptable | Issue #241 explicitly allows lazy eviction. Roots not written again persist, but this is a known, bounded edge case. |
8
+ | Non-deterministic intermediate state during parallel scan | Acceptable | Resolved by final `files.sort()` -- stable lexicographic order. No caller asserts insertion order. |
9
+ | 10s walk timeout may be tight on slow FS | Acceptable | 30s cache means only first cold call is at risk. Error is descriptive, not silent. Constant is easy to raise. |
10
+
11
+ ## Failure Mode Review
12
+
13
+ | Failure Mode | Coverage | Residual Risk |
14
+ |---|---|---|
15
+ | Order dependency in callers after parallelization | Covered by sort | Low |
16
+ | Walk timeout fires on first cold call | Descriptive error, user recovers | Medium (UX degradation, not data loss) |
17
+ | TTL eviction false positive (active root evicted) | Impossible at 30-day TTL | None |
18
+ | Latency test flakiness (cache interference) | Mitigated: unique temp dir per test run | Low |
19
+
20
+ ## Runner-Up / Simpler Alternative Review
21
+
22
+ No runner-up elements worth pulling. No simpler alternative satisfies all acceptance criteria.
23
+ All four Candidate A approaches remain unchanged.
24
+
25
+ ## Philosophy Alignment
26
+
27
+ All key principles satisfied: Determinism (via sort), Errors are data (ResultAsync.fromPromise
28
+ wrapping), Immutability (new arrays), YAGNI (named constants), Prefer fakes over mocks (real FS
29
+ in test), Architectural fixes over patches (parallelization, timeout).
30
+
31
+ Two minor tensions:
32
+ - `files[]` shared append in parallel scan: acceptable in single-threaded Node.js
33
+ - Timeout inside utility function vs. handler boundary: acceptable -- shared module IS the discovery boundary
34
+
35
+ ## Findings
36
+
37
+ **Yellow: Walk timeout constant (10s) has no empirical basis**
38
+ - DISCOVERY_TIMEOUT_MS = 10_000 is a reasonable default but untested against real environments
39
+ - Should be commented as adjustable, not hardcoded as final
40
+ - No blocking concern for this PR; monitor in production
41
+
42
+ **Yellow: Latency test timing assertion (500ms) is generous for a small tree but may pass vacuously**
43
+ - A 500ms budget for a depth-5 breadth-3 tree (max ~243 dirs) should complete in ~10-50ms
44
+ - The test is more valuable as a non-regression guard than a strict budget test
45
+ - Document the budget reasoning in the test comment
46
+
47
+ No Red or Orange findings.
48
+
49
+ ## Recommended Revisions
50
+
51
+ 1. Add a comment near `DISCOVERY_TIMEOUT_MS` explaining it can be raised for slow NFS environments
52
+ 2. Add a comment in the latency test explaining the 500ms budget and tree size rationale
53
+ 3. Use a unique temp dir per test invocation (already in plan) to prevent walk cache interference
54
+
55
+ ## Residual Concerns
56
+
57
+ - **Walk timeout vs. UX**: if production walk times are measured and commonly > 10s, the constant
58
+ should be raised to 20s. No action needed now.
59
+ - **TTL eviction completeness**: roots that are never written again persist forever. Acceptable
60
+ per issue #241. If this becomes a problem, a separate `evictStaleRoots()` method would be the
61
+ right extension point.
@@ -0,0 +1,264 @@
1
+ # Performance Fixes: New Issues Discovery
2
+
3
+ **Date:** 2026-04-07
4
+ **Status:** Complete -- 5 new issues confirmed, HIGH confidence
5
+
6
+ ## Final Summary
7
+
8
+ **Path:** full_spectrum (landscape reading + reframing)
9
+
10
+ **Problem framing:** The known 7 issues were derived from design doc analysis. Reading the actual source code reveals 5 additional issues: a second unguarded call site (inspect_workflow), a test comment that describes nonexistent code, and three issues in `raw-workflow-file-scanner.ts` (a file the known list doesn't mention).
11
+
12
+ **Landscape takeaways:** All 4 target files are in pre-fix state. No implemented fixes. The design patterns for all fixes exist elsewhere in the codebase (`withTimeout` in v2-workflow.ts, `normalizeRootRecords` in the same remembered-roots file, `Promise.all` fan-out referenced in design docs, `sortedEntries` in request-workflow-reader.ts).
13
+
14
+ **Chosen direction:** All 5 new issues are confirmed and distinct. No single 'direction' -- this is a discovery output.
15
+
16
+ **Confidence band:** HIGH
17
+
18
+ **Residual risks:**
19
+ 1. Issue A severity: if MCP transport already converts unhandled promise rejections to structured error responses, Issue A is degraded-response rather than crash. Verify before classifying as Red.
20
+ 2. Issue C scope: `existsSync` is imported alongside `statSync` at raw-workflow-file-scanner.ts:2 -- audit its usage for the same event-loop concern.
21
+
22
+ **Next actions:**
23
+ 1. Add Issue A to the known issue #1 ticket (or create a sub-item): inspect_workflow call site at v2-workflow.ts:332
24
+ 2. Create a new ticket for raw-workflow-file-scanner.ts covering Issues C, D, E together (they are all in the same file)
25
+ 3. Fix Issue B (test comment) as part of whichever PR implements the walk cache
26
+
27
+ This document records issues found by reading the actual current state of the four target files
28
+ (`request-workflow-reader.ts`, `raw-workflow-file-scanner.ts`, `remembered-roots-store/index.ts`,
29
+ `perf-fixes.test.ts`). All 7 previously known issues are confirmed present. The 5 issues below
30
+ are NEW -- not named in the known list.
31
+
32
+ ---
33
+
34
+ ## Problem Understanding
35
+
36
+ ### Core tensions
37
+
38
+ 1. **Known list completeness vs. actual code state**: The known 7 issues were derived from design
39
+ doc analysis. Reading actual code reveals additional gaps that the design docs mentioned but the
40
+ known issue list didn't capture explicitly.
41
+
42
+ 2. **Fix scope vs. fix surface**: The design docs say 'all changes in request-workflow-reader.ts'
43
+ for the walk fixes, but the unguarded call site issue extends to `handleV2InspectWorkflow` --
44
+ a second handler not named in known issue #1.
45
+
46
+ 3. **Test reliability vs. test accuracy**: The test file describes code behavior that doesn't exist
47
+ yet (a walk cache), creating a maintenance hazard for future implementers.
48
+
49
+ ### Likely seam
50
+
51
+ - Issue A (call site): `v2-workflow.ts` lines 332-339 -- identical structural pattern to known issue #1
52
+ - Issue B (test comment): `perf-fixes.test.ts` lines 17-18 -- inline comment describing phantom cache
53
+ - Issues C, D, E (scanner): `raw-workflow-file-scanner.ts` -- all three affect the same file,
54
+ different functions: `statSync` at line 95, `scan()` sequential loop lines 19-35, unsorted return
55
+
56
+ ### What makes this hard
57
+
58
+ - Issue A is easy to miss because the design doc says 'callers need not change' -- but it was
59
+ wrong: there are two bare-await call sites, not one
60
+ - Issue B is invisible unless you cross-check test comments against actual source code
61
+ - Issues C/D/E all live in `raw-workflow-file-scanner.ts` -- a file the known issues don't mention,
62
+ even though the design doc explicitly specifies all three fixes for it
63
+
64
+ ---
65
+
66
+ ## Philosophy Constraints
67
+
68
+ - **Errors are data**: Issue A violates this -- `createWorkflowReaderForRequest` can throw, and
69
+ `handleV2InspectWorkflow` doesn't wrap it in a Result
70
+ - **Determinism over cleverness**: Issue E violates this -- filesystem readdir order is not stable
71
+ - **Document why not what**: Issue B violates this -- the comment describes a thing that doesn't
72
+ exist, not the reason the test is structured as it is
73
+ - **Dependency injection for boundaries**: Issue C violates this tangentially -- `statSync` is a
74
+ hidden sync I/O side effect inside an async function
75
+
76
+ ---
77
+
78
+ ## Impact Surface
79
+
80
+ - **Issue A**: `handleV2InspectWorkflow` in `v2-workflow.ts` -- any `listRememberedRoots` error
81
+ thrown inside `createWorkflowReaderForRequest` reaches the MCP transport layer unhandled.
82
+ `handleV2ListWorkflows` has the same exposure (known issue #1). `start.ts` is correctly wrapped.
83
+
84
+ - **Issue B**: `perf-fixes.test.ts` -- the misleading comment affects any future developer
85
+ implementing the walk cache. They might skip writing cache tests because the comment implies
86
+ the test already validates cache behavior.
87
+
88
+ - **Issues C/D/E**: `raw-workflow-file-scanner.ts` affects `FileWorkflowStorage.buildWorkflowIndex`
89
+ (via `findWorkflowJsonFiles`) and `scanRawWorkflowFiles` (which calls `findWorkflowJsonFiles`
90
+ then does per-file reads). Both callers receive non-deterministic, sequentially-scanned results.
91
+
92
+ ---
93
+
94
+ ## New Issues
95
+
96
+ ### Issue A: `handleV2InspectWorkflow` has the same unguarded call site as the known #1
97
+
98
+ **Summary:** `v2-workflow.ts` line 332 uses bare `await createWorkflowReaderForRequest(...)` in
99
+ `handleV2InspectWorkflow`, identical to the known issue at line 193 in `handleV2ListWorkflows`.
100
+
101
+ - **Tensions resolved**: names the second unguarded call site
102
+ - **Tensions accepted**: requires the same fix pattern as known issue #1
103
+ - **Boundary**: `v2-workflow.ts:332` -- the `handleV2InspectWorkflow` function
104
+ - **Failure mode**: `listRememberedRoots` error propagates as unhandled exception to MCP transport
105
+ - **Repo pattern**: `start.ts` correctly uses `RA.fromPromise(createWorkflowReaderForRequest(...), mapper)` -- that is the right pattern
106
+ - **Gains**: fixing this gives complete handler coverage; losing it means inspect_workflow crashes on remembered-roots store errors
107
+ - **Scope**: best-fit -- single line change at the call site
108
+ - **Philosophy fit**: fixing restores 'Errors are data'
109
+
110
+ **Evidence**: `src/mcp/handlers/v2-workflow.ts` line 332:
111
+ ```ts
112
+ ? await createWorkflowReaderForRequest({
113
+ ```
114
+ vs `src/mcp/handlers/v2-execution/start.ts` line 364:
115
+ ```ts
+ ? RA.fromPromise(
+     createWorkflowReaderForRequest({...}),
+     (err): StartWorkflowError => ({...})
+   )
120
+ ```
121
+
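The difference between the two call sites can be illustrated without the repo's `RA.fromPromise`. The sketch below uses a hand-rolled `Result` type and a hypothetical `fromPromise` helper to show the general idea: a rejection is converted into a value at the boundary instead of escaping to the caller. None of these names are the package's actual API.

```typescript
// Hand-rolled stand-in for a Result type; the repo uses `RA.fromPromise`,
// whose exact API is not reproduced here.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Converts a promise that may reject into one that always resolves to a Result.
function fromPromise<T, E>(
  promise: Promise<T>,
  mapError: (err: unknown) => E,
): Promise<Result<T, E>> {
  return promise.then(
    (value): Result<T, E> => ({ ok: true, value }),
    (err: unknown): Result<T, E> => ({ ok: false, error: mapError(err) }),
  );
}
```

With a bare `await`, the same rejection would surface as a thrown exception in the handler. With the wrapper, the handler receives `{ ok: false, error }` and can return a structured error response.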
122
+ ---
123
+
124
+ ### Issue B: Test file comment describes a walk cache that does not exist
125
+
126
+ **Summary:** `perf-fixes.test.ts` lines 17-18 describe 'the module-level walk cache (keyed on
127
+ sorted root paths)' -- a data structure that is entirely absent from `request-workflow-reader.ts`.
128
+
129
+ - **Tensions resolved**: names the maintenance hazard
130
+ - **Tensions accepted**: fix is purely editorial (update the comment)
131
+ - **Boundary**: `tests/performance/perf-fixes.test.ts` -- the test file JSDoc block
132
+ - **Failure mode**: future implementer reads the comment, assumes the cache is already tested,
133
+ and ships the cache implementation without writing cache hit/miss/TTL tests
134
+ - **Repo pattern**: departs from 'Document why not what' -- should describe why unique temp dirs
135
+ are used (to prevent cross-test interference), not describe a feature that doesn't exist
136
+ - **Scope**: best-fit -- comment update only
137
+ - **Philosophy fit**: violation of 'Document why not what'
138
+
139
+ **Evidence**: `tests/performance/perf-fixes.test.ts` lines 17-18:
140
+ ```
141
+ * Each test uses a unique mkdtemp path so the module-level walk cache
142
+ * (keyed on sorted root paths) does not mask the actual walk cost.
143
+ ```
144
+ No cache exists anywhere in `request-workflow-reader.ts`.
145
+
146
+ ---
147
+
148
+ ### Issue C: `statSync` in `scanRawWorkflowFiles` blocks the Node.js event loop
149
+
150
+ **Summary:** `raw-workflow-file-scanner.ts` line 95 uses the synchronous `statSync` inside an
151
+ async function, blocking the event loop during file size checks.
152
+
153
+ - **Tensions resolved**: eliminates the sync I/O stall
154
+ - **Tensions accepted**: requires replacing with `await fs.stat(...)`
155
+ - **Boundary**: `scanRawWorkflowFiles` inner loop, line 95
156
+ - **Failure mode**: in the current state, every file in a workflow directory causes an event-loop stall during `scanRawWorkflowFiles` -- under concurrent load, all in-flight requests pause
157
+ - **Repo pattern**: `fs/promises` is already imported at line 1; `statSync` and `existsSync` are imported from `'fs'` at line 2. Switching to async stat removes `statSync` from that import; `existsSync` stays until its uses are audited.
158
+ - **Scope**: best-fit -- one-line replacement
159
+ - **Philosophy fit**: violates async contract ('Determinism over cleverness', implicit event-loop contract)
160
+
161
+ **Evidence**: `src/application/use-cases/raw-workflow-file-scanner.ts` lines 2 and 95:
162
+ ```ts
163
+ import { existsSync, statSync } from 'fs';
164
+ ...
165
+ const stats = statSync(filePath);
166
+ ```
167
+ The design doc (perf-fixes-design-candidates.md, Candidate B note) mentions replacing `statSync`
168
+ with async `fs.stat`.
169
+
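The replacement described above might look like the following. This is a sketch only: `fileSizeBytes` and `demoFileSize` are hypothetical names, and the real `scanRawWorkflowFiles` loop is not reproduced here. The only library calls used are `stat`, `mkdtemp`, `writeFile`, and `rm` from `node:fs/promises`.

```typescript
import { mkdtemp, rm, stat, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: reads the file size without blocking the event loop.
async function fileSizeBytes(filePath: string): Promise<number> {
  const stats = await stat(filePath); // async counterpart of statSync(filePath)
  return stats.size;
}

// Small demo: write a temp file, read its size back, then clean up.
async function demoFileSize(): Promise<number> {
  const dir = await mkdtemp(join(tmpdir(), 'stat-demo-'));
  try {
    const filePath = join(dir, 'workflow.json');
    await writeFile(filePath, '{"id":"demo"}');
    return await fileSizeBytes(filePath);
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

While the `stat` call is pending, other queued work on the event loop can proceed, which is the property the synchronous call gives up.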
170
+ ---
171
+
172
+ ### Issue D: `findWorkflowJsonFiles` uses sequential `await` inside a `for` loop (no parallelization)
173
+
174
+ **Summary:** `raw-workflow-file-scanner.ts` lines 19-35 implement `scan()` as a sequential
175
+ `for...of` loop with `await scan(fullPath)` inside -- each subdirectory is fully scanned before
176
+ the next one starts.
177
+
178
+ - **Tensions resolved**: names the sequential I/O bottleneck in the scanner
179
+ - **Tensions accepted**: parallelization requires explicit sort to restore deterministic order
180
+ - **Boundary**: `scan()` inner function inside `findWorkflowJsonFiles`, lines 19-35
181
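+ - **Failure mode**: on a workflow directory with many subdirectories, the scan performs one sequential `readdir` round trip per directory (O(number of directories)), even on fast SSDs; fan-out would shorten the critical path to roughly O(depth)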
182
+ - **Repo pattern**: the design doc specifies `Promise.all` fan-out; this pattern is used elsewhere in the codebase
183
+ - **Scope**: best-fit -- change is inside `scan()`, no interface change
184
+ - **Philosophy fit**: honors 'Compose with small pure functions' when fixed (scan becomes fan-out); violates 'Determinism over cleverness' if fan-out added without sort (see Issue E)
185
+
186
+ **Evidence**: `src/application/use-cases/raw-workflow-file-scanner.ts` lines 23-35:
187
+ ```ts
188
+ for (const entry of entries) {
189
+ const fullPath = path.join(currentDir, entry.name);
190
+ if (entry.isDirectory()) {
191
+ if (entry.name === 'examples') { continue; }
192
+ await scan(fullPath); // sequential -- next dir waits for this one
193
+ } else if (...) { ... }
194
+ }
195
+ ```
196
+
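A fan-out version of the loop above could be sketched as follows. This is an illustration under assumptions, not the repo's `findWorkflowJsonFiles`: the names `scanJsonFiles`, `findJsonFilesSorted`, and `demoScan` are hypothetical, and the `.json` filter stands in for the elided `else if (...)` condition. Each branch returns its own array and `Promise.all` preserves input order, so completion timing does not affect the result; the final sort removes the remaining dependence on `readdir` order.

```typescript
import { mkdir, mkdtemp, readdir, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Illustrative scanner: sibling directories are scanned concurrently.
async function scanJsonFiles(currentDir: string): Promise<string[]> {
  const entries = await readdir(currentDir, { withFileTypes: true });
  const nested = await Promise.all(
    entries.map(async (entry): Promise<string[]> => {
      const fullPath = join(currentDir, entry.name);
      if (entry.isDirectory()) {
        // Mirrors the 'examples' skip in the excerpt above.
        if (entry.name === 'examples') {
          return [];
        }
        return scanJsonFiles(fullPath);
      }
      // Assumed filter; the original condition is elided in the excerpt.
      return entry.isFile() && entry.name.endsWith('.json') ? [fullPath] : [];
    }),
  );
  return ([] as string[]).concat(...nested);
}

// Sorts once at the end so the output does not depend on readdir order.
async function findJsonFilesSorted(baseDir: string): Promise<string[]> {
  const files = await scanJsonFiles(baseDir);
  return files.sort();
}

// Small demo on a temp tree; returns paths relative to the temp root.
async function demoScan(): Promise<string[]> {
  const base = await mkdtemp(join(tmpdir(), 'scan-demo-'));
  try {
    await mkdir(join(base, 'b'));
    await mkdir(join(base, 'a'));
    await mkdir(join(base, 'examples'));
    await writeFile(join(base, 'b', 'two.json'), '{}');
    await writeFile(join(base, 'a', 'one.json'), '{}');
    await writeFile(join(base, 'a', 'notes.txt'), 'not a workflow');
    await writeFile(join(base, 'examples', 'skipped.json'), '{}');
    const found = await findJsonFilesSorted(base);
    return found.map((file) => file.slice(base.length + 1));
  } finally {
    await rm(base, { recursive: true, force: true });
  }
}
```

One caveat on this shape: the fan-out is unbounded, so a very wide tree opens many directories at once. A concurrency limit may be worth considering if workflow roots can be large.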
197
+ ---
198
+
199
+ ### Issue E: `findWorkflowJsonFiles` returns files in non-deterministic filesystem order
200
+
201
+ **Summary:** The `files[]` array in `findWorkflowJsonFiles` is accumulated via sequential push
202
+ with no final sort, so output order depends on `readdir` order, which varies by OS and filesystem.
203
+
204
+ - **Tensions resolved**: names the non-determinism in the output
205
+ - **Tensions accepted**: a sort step adds minor overhead (negligible at workflow file counts)
206
+ - **Boundary**: return point of `findWorkflowJsonFiles`, after `await scan(baseDirReal)`
207
+ - **Failure mode**: callers that process workflows in order may behave differently on macOS vs Linux CI; integration tests could have latent order-dependency bugs
208
+ - **Repo pattern**: `request-workflow-reader.ts` already sorts entries: `const sortedEntries = [...entries].sort(...)` before iterating -- this is the established pattern
209
+ - **Scope**: best-fit -- `files.sort()` before return
210
+ - **Philosophy fit**: violates 'Determinism over cleverness'; fix restores it
211
+
212
+ **Evidence**: `src/application/use-cases/raw-workflow-file-scanner.ts` lines 37-39:
213
+ ```ts
214
+ await scan(baseDirReal);
215
+ return files; // no sort -- order is readdir order (OS-dependent)
216
+ }
217
+ ```
218
+ vs `request-workflow-reader.ts` line 233:
219
+ ```ts
220
+ const sortedEntries = [...entries].sort((a, b) => a.name.localeCompare(b.name));
221
+ ```
222
+
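The two sort styles mentioned above are not equivalent. `files.sort()` with no comparator orders strings by UTF-16 code units, which is the same on every host, while `localeCompare` without an explicit locale follows the host's default locale. The sketch below uses hypothetical names (`compareCodeUnits`, `sortPaths`) and made-up paths to show that a code-unit sort gives the same output regardless of arrival order.

```typescript
// Plain code-unit comparison: the ordering Array.prototype.sort applies to
// strings when no comparator is given.
function compareCodeUnits(a: string, b: string): number {
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

// Returns a sorted copy so the caller's array is left untouched.
function sortPaths(files: readonly string[]): string[] {
  return [...files].sort(compareCodeUnits);
}

// Two different arrival orders, as two filesystems might produce them.
const orderOnHostA = ['wf/b.json', 'wf/a.json', 'wf/B.json'];
const orderOnHostB = ['wf/B.json', 'wf/b.json', 'wf/a.json'];

const sortedA = sortPaths(orderOnHostA);
const sortedB = sortPaths(orderOnHostB);
// Both are ['wf/B.json', 'wf/a.json', 'wf/b.json']: uppercase sorts before lowercase.
```

Which ordering callers expect (code-unit or locale-aware) is a design choice for the repo; the point here is only that the comparator should not vary with the environment.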
223
+ ---
224
+
225
+ ## Comparison and Recommendation
226
+
227
+ | Issue | Severity | Category | Fix complexity |
228
+ |---|---|---|---|
229
+ | A: inspect_workflow unguarded | High | Robustness | Low (wrap in RA.fromPromise) |
230
+ | B: phantom cache comment | Medium | Maintenance hazard | Trivial (comment update) |
231
+ | C: statSync blocks event loop | Medium-high | Performance/correctness | Low (await fs.stat) |
232
+ | D: sequential scan | Medium | Performance | Medium (Promise.all + sort) |
233
+ | E: non-deterministic output | Low-medium | Correctness | Trivial (files.sort()) |
234
+
235
+ All five issues are real, actionable, and distinct from the seven known issues.
236
+
237
+ Fix priority: A first (crash exposure), then C (event-loop blocking), then D+E together
238
+ (parallelization + sort are coupled), then B (editorial).
239
+
240
+ ---
241
+
242
+ ## Self-Critique
243
+
244
+ **Strongest counter-arguments against including all 5:**
245
+ - Issue D and Issue E are both about `findWorkflowJsonFiles`, and the design doc (Item 1) already
246
+ covers them implicitly. But the known 7 issues don't name them explicitly -- they focus on the
247
+ walk in `request-workflow-reader.ts`. They belong in the new list.
248
+ - Issue B (phantom cache comment) is 'just a comment' -- but it actively misrepresents the code
249
+ state, which is a maintenance correctness issue, not cosmetic.
250
+
251
+ **Pivot conditions:**
252
+ - If known issue #1 is interpreted to cover 'all call sites of createWorkflowReaderForRequest',
253
+ then Issue A would be a sub-item of #1, not a new issue. The known list's wording names only
254
+ `handleV2ListWorkflows` specifically.
255
+ - If `findWorkflowJsonFiles` is not included in the perf fix scope, Issues D and E drop out.
256
+ But the design doc explicitly targets this function (Item 1).
257
+
258
+ ---
259
+
260
+ ## Open Questions
261
+
262
+ 1. Should Issue A be fixed as part of the existing issue #1 ticket, or as a separate item?
263
+ 2. Where is `existsSync` (imported at line 2 of `raw-workflow-file-scanner.ts`) actually called? (It is
264
+ synchronous by definition, so any call on an async path raises the same event-loop concern and should be audited.)
@@ -0,0 +1,110 @@
1
+ # Performance Fixes: New Issues Review Findings
2
+
3
+ **Date:** 2026-04-07
4
+ **Input:** `perf-fixes-new-issues-candidates.md`
5
+
6
+ ---
7
+
8
+ ## Tradeoff Review
9
+
10
+ | Tradeoff | Verdict | Condition under which it fails |
11
+ |---|---|---|
12
+ | Issue E (non-deterministic order) is low severity today | Acceptable | Becomes medium once Issue D (parallelization) is implemented -- the two are coupled |
13
+ | Issue D overlaps with design doc Item 1 | Acceptable | The design doc and the known-7 issue list are separate artifacts; Item 1 is not in the known list |
14
+
15
+ No tradeoffs fail under review.
16
+
17
+ ---
18
+
19
+ ## Failure Mode Review
20
+
21
+ | Failure Mode | Coverage | Highest Risk |
22
+ |---|---|---|
23
+ | Issue A: unhandled throw in inspect_workflow | No mitigation until fixed | YES -- production crash surface |
24
+ | Issue C: event-loop stall on statSync | No mitigation until fixed | Medium-high under concurrent load |
25
+ | Issue E: latent ordering bug after parallelization | No mitigation until fixed | Low today, medium once Issue D is fixed |
26
+
27
+ **Highest-risk failure mode:** Issue A -- the only one that causes a production runtime crash
28
+ (unhandled exception reaching the MCP transport layer).
29
+
30
+ ---
31
+
32
+ ## Runner-Up / Simpler Alternative Review
33
+
34
+ No runner-up -- this is issue discovery, not competing design options. All 5 issues are distinct
35
+ and minimal. No issue can be dropped without leaving a real defect or maintenance hazard.
36
+
37
+ Issues D and E are coupled (parallelization without sorting makes non-determinism worse) and should
38
+ be fixed together.
39
+
40
+ ---
41
+
42
+ ## Philosophy Alignment
43
+
44
+ | Principle | Issue | Status |
45
+ |---|---|---|
46
+ | Errors are data | A: bare await in inspect_workflow | Violated -- a thrown exception is not a data value |
47
+ | Determinism over cleverness | E: unsorted file list | Violated -- same input, different output |
48
+ | Document why not what | B: phantom cache comment | Violated -- describes nonexistent feature |
49
+ | Async contract (no sync I/O in async) | C: statSync | Violated -- blocks event loop |
50
+ | Functional/declarative | D: sequential for-of await | Tension -- sequential where fan-out is idiomatic |
51
+
52
+ All violations are in the unfixed code. The issue list accurately names them.
53
+
54
+ ---
55
+
56
+ ## Findings
57
+
58
+ ### Red
59
+
60
+ **Issue A: `handleV2InspectWorkflow` has an unguarded bare `await createWorkflowReaderForRequest(...)`**
61
+ - `v2-workflow.ts` line 332: same unhandled-throw exposure as known issue #1 at line 193
62
+ - `start.ts` line 364 is the correct reference: `RA.fromPromise(createWorkflowReaderForRequest(...), mapper)`
63
+ - A `listRememberedRoots` error propagates as an unhandled exception to the MCP transport layer
64
+ - Severity: production crash surface -- same as known issue #1, and equally urgent
65
+
66
+ ### Orange
67
+
68
+ **Issue C: `statSync` at `raw-workflow-file-scanner.ts:95` blocks the Node.js event loop**
69
+ - Synchronous I/O inside an async function; the `'fs'` sync import at line 2 is the entry point
70
+ - Blocks all in-flight concurrent MCP requests during the stat call
71
+ - Fix: `await fs.stat(filePath)` using the already-imported `fs/promises`
72
+ - Secondary: audit `existsSync` (also imported from `'fs'` at line 2) for similar usage
73
+
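If the `existsSync` audit finds calls on an async path, one possible replacement is a small wrapper around `access` from `node:fs/promises`. The helper name `pathExists` and the demo function are hypothetical. Note that this sketch reports `false` for any access failure, including permission errors, which is close to but not guaranteed identical to `existsSync` in every edge case.

```typescript
import { access, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical async stand-in for existsSync: resolves true when the path is accessible.
async function pathExists(target: string): Promise<boolean> {
  try {
    await access(target);
    return true;
  } catch {
    return false;
  }
}

// Small demo: one path that exists and one that does not.
async function demoPathExists(): Promise<[boolean, boolean]> {
  const dir = await mkdtemp(join(tmpdir(), 'exists-demo-'));
  try {
    const present = join(dir, 'workflow.json');
    await writeFile(present, '{}');
    const missing = join(dir, 'missing.json');
    return [await pathExists(present), await pathExists(missing)];
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```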
74
+ ### Yellow
75
+
76
+ **Issue D: Sequential `await scan(fullPath)` in `findWorkflowJsonFiles` (raw-workflow-file-scanner.ts:19-35)**
77
+ - Each subdirectory is fully scanned before the next starts
78
+ - Design doc (perf-fixes-design-candidates.md, Item 1) specifies `Promise.all` fan-out
79
+ - Not named in any of the known 7 issues; it is a distinct item in a different file
80
+ - Coupled with Issue E: must add `files.sort()` when parallelizing
81
+
82
+ **Issue E: `findWorkflowJsonFiles` returns files in non-deterministic OS-dependent order**
83
+ - `raw-workflow-file-scanner.ts:38`: `return files` without a preceding `files.sort()`
84
+ - `FileWorkflowStorage` and `scanRawWorkflowFiles` both consume this output
85
+ - Low risk today; escalates to medium if Issue D is parallelized without the sort
86
+ - Fix is one line: `files.sort()` before `return files`
87
+
88
+ **Issue B: Test comment describes a walk cache that does not exist**
89
+ - `perf-fixes.test.ts` lines 17-18: 'module-level walk cache (keyed on sorted root paths)'
90
+ - No such cache exists in `request-workflow-reader.ts`
91
+ - Future implementer reading the test might skip writing cache tests, believing they already exist
92
+ - Fix: replace the phantom description with the actual reason (unique temp dirs prevent cross-test pollution)
93
+
94
+ ---
95
+
96
+ ## Recommended Revisions to the Candidates Document
97
+
98
+ 1. Elevate Issue A to the same urgency as known issue #1 -- they are identical in failure mode
99
+ 2. Add note to Issue C to audit `existsSync` usage (same file, same import line)
100
+ 3. Note that Issues D and E must be implemented together -- fixing D without E makes ordering worse
101
+
102
+ ---
103
+
104
+ ## Residual Concerns
105
+
106
+ - **Issue A severity**: if the MCP transport layer already catches unhandled promise rejections
107
+ from handler functions and converts them to error responses, Issue A is mitigated at the
108
+ framework level. This should be verified before treating it as a crash vs. a degraded-response.
109
+ - **Issue E completeness**: `FileWorkflowStorage.buildWorkflowIndex` also calls
110
+ `findWorkflowJsonFiles` -- order dependency there should be checked before declaring the fix safe.
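The first residual concern turns on whether something at the handler boundary already converts rejections into error responses. What such a boundary would look like can be sketched as below. Every name here (`ToolResponse`, `ToolHandler`, `withErrorBoundary`) is hypothetical and does not reflect the MCP SDK's or the repo's actual types; the sketch only illustrates the behavior that would need to be confirmed.

```typescript
// Hypothetical shapes, used only for illustration.
interface ToolResponse {
  isError: boolean;
  text: string;
}

type ToolHandler = (args: unknown) => Promise<ToolResponse>;

// Wraps a handler so a rejection becomes an error response instead of escaping.
function withErrorBoundary(handler: ToolHandler): ToolHandler {
  return async (args) => {
    try {
      return await handler(args);
    } catch (cause) {
      const text = cause instanceof Error ? cause.message : String(cause);
      return { isError: true, text };
    }
  };
}

// Stand-in for a handler whose bare await rejects.
const throwingHandler: ToolHandler = async () => {
  throw new Error('listRememberedRoots failed');
};

const guardedHandler = withErrorBoundary(throwingHandler);
```

If the transport already behaves like `withErrorBoundary`, Issue A degrades to an unstructured error response; if it does not, the rejection is unhandled. Either way, a call-site fix keeps the error typed.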