@exaudeus/workrail 3.28.0 → 3.30.0

Files changed (160)
  1. package/dist/console/assets/{index-C146q2kN.js → index-Bl5-Ghuu.js} +1 -1
  2. package/dist/console/index.html +1 -1
  3. package/dist/manifest.json +3 -3
  4. package/docs/README.md +57 -0
  5. package/docs/adrs/001-hybrid-storage-backend.md +38 -0
  6. package/docs/adrs/002-four-layer-context-classification.md +38 -0
  7. package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
  8. package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
  9. package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
  10. package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
  11. package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
  12. package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
  13. package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
  14. package/docs/adrs/010-release-pipeline.md +89 -0
  15. package/docs/architecture/README.md +7 -0
  16. package/docs/architecture/refactor-audit.md +364 -0
  17. package/docs/authoring-v2.md +527 -0
  18. package/docs/authoring.md +873 -0
  19. package/docs/changelog-recent.md +201 -0
  20. package/docs/configuration.md +505 -0
  21. package/docs/ctc-mcp-proposal.md +518 -0
  22. package/docs/design/README.md +22 -0
  23. package/docs/design/agent-cascade-protocol.md +96 -0
  24. package/docs/design/autonomous-console-design-candidates.md +253 -0
  25. package/docs/design/autonomous-console-design-review.md +111 -0
  26. package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
  27. package/docs/design/claude-code-source-deep-dive.md +713 -0
  28. package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
  29. package/docs/design/console-execution-trace-candidates-final.md +160 -0
  30. package/docs/design/console-execution-trace-candidates.md +211 -0
  31. package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
  32. package/docs/design/console-execution-trace-design-review.md +74 -0
  33. package/docs/design/console-execution-trace-discovery.md +394 -0
  34. package/docs/design/console-execution-trace-final-review.md +77 -0
  35. package/docs/design/console-execution-trace-review.md +92 -0
  36. package/docs/design/console-performance-discovery.md +415 -0
  37. package/docs/design/console-ui-backlog.md +280 -0
  38. package/docs/design/daemon-architecture-discovery.md +853 -0
  39. package/docs/design/daemon-design-candidates.md +318 -0
  40. package/docs/design/daemon-design-review-findings.md +119 -0
  41. package/docs/design/daemon-engine-design-candidates.md +210 -0
  42. package/docs/design/daemon-engine-design-review.md +131 -0
  43. package/docs/design/daemon-execution-engine-discovery.md +280 -0
  44. package/docs/design/daemon-gap-analysis.md +554 -0
  45. package/docs/design/daemon-owns-console-plan.md +168 -0
  46. package/docs/design/daemon-owns-console-review.md +91 -0
  47. package/docs/design/daemon-owns-console.md +195 -0
  48. package/docs/design/data-model-erd.md +11 -0
  49. package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
  50. package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
  51. package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
  52. package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
  53. package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
  54. package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
  55. package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
  56. package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
  57. package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
  58. package/docs/design/list-workflows-latency-fix-plan.md +128 -0
  59. package/docs/design/list-workflows-latency-fix-review.md +55 -0
  60. package/docs/design/list-workflows-latency-fix.md +109 -0
  61. package/docs/design/native-context-management-api.md +11 -0
  62. package/docs/design/performance-sweep-2026-04.md +96 -0
  63. package/docs/design/routines-guide.md +219 -0
  64. package/docs/design/sequence-diagrams.md +11 -0
  65. package/docs/design/subagent-design-principles.md +220 -0
  66. package/docs/design/temporal-patterns-design-candidates.md +312 -0
  67. package/docs/design/temporal-patterns-design-review-findings.md +163 -0
  68. package/docs/design/test-isolation-from-config-file.md +335 -0
  69. package/docs/design/v2-core-design-locks.md +2746 -0
  70. package/docs/design/v2-lock-registry.json +734 -0
  71. package/docs/design/workflow-authoring-v2.md +1044 -0
  72. package/docs/design/workflow-docs-spec.md +218 -0
  73. package/docs/design/workflow-extension-points.md +687 -0
  74. package/docs/design/workrail-auto-trigger-system.md +359 -0
  75. package/docs/design/workrail-config-file-discovery.md +513 -0
  76. package/docs/docker.md +110 -0
  77. package/docs/generated/v2-lock-closure-plan.md +26 -0
  78. package/docs/generated/v2-lock-coverage.json +797 -0
  79. package/docs/generated/v2-lock-coverage.md +177 -0
  80. package/docs/ideas/backlog.md +3927 -0
  81. package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
  82. package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
  83. package/docs/ideas/implementation_plan.md +249 -0
  84. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
  85. package/docs/implementation/02-architecture.md +316 -0
  86. package/docs/implementation/04-testing-strategy.md +124 -0
  87. package/docs/implementation/09-simple-workflow-guide.md +835 -0
  88. package/docs/implementation/13-advanced-validation-guide.md +874 -0
  89. package/docs/implementation/README.md +21 -0
  90. package/docs/integrations/claude-code.md +300 -0
  91. package/docs/integrations/firebender.md +315 -0
  92. package/docs/migration/v0.1.0.md +147 -0
  93. package/docs/naming-conventions.md +45 -0
  94. package/docs/planning/README.md +104 -0
  95. package/docs/planning/github-ticketing-playbook.md +195 -0
  96. package/docs/plans/README.md +24 -0
  97. package/docs/plans/agent-managed-ticketing-design.md +605 -0
  98. package/docs/plans/agentic-orchestration-roadmap.md +112 -0
  99. package/docs/plans/assessment-gates-engine-handoff.md +536 -0
  100. package/docs/plans/content-coherence-and-references.md +151 -0
  101. package/docs/plans/library-extraction-plan.md +340 -0
  102. package/docs/plans/mr-review-workflow-redesign.md +1451 -0
  103. package/docs/plans/native-context-management-epic.md +11 -0
  104. package/docs/plans/perf-fixes-design-candidates.md +225 -0
  105. package/docs/plans/perf-fixes-design-review-findings.md +61 -0
  106. package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
  107. package/docs/plans/perf-fixes-new-issues-review.md +110 -0
  108. package/docs/plans/prompt-fragments.md +53 -0
  109. package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
  110. package/docs/plans/ui-ux-workflow-discovery.md +100 -0
  111. package/docs/plans/ui-ux-workflow-review.md +48 -0
  112. package/docs/plans/v2-followup-enhancements.md +587 -0
  113. package/docs/plans/workflow-categories-candidates.md +105 -0
  114. package/docs/plans/workflow-categories-discovery.md +110 -0
  115. package/docs/plans/workflow-categories-review.md +51 -0
  116. package/docs/plans/workflow-discovery-model-candidates.md +94 -0
  117. package/docs/plans/workflow-discovery-model-discovery.md +74 -0
  118. package/docs/plans/workflow-discovery-model-review.md +48 -0
  119. package/docs/plans/workflow-source-setup-phase-1.md +245 -0
  120. package/docs/plans/workflow-source-setup-phase-2.md +361 -0
  121. package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
  122. package/docs/plans/workflow-staleness-detection-review.md +58 -0
  123. package/docs/plans/workflow-staleness-detection.md +80 -0
  124. package/docs/plans/workflow-v2-design.md +69 -0
  125. package/docs/plans/workflow-v2-roadmap.md +74 -0
  126. package/docs/plans/workflow-validation-design.md +98 -0
  127. package/docs/plans/workflow-validation-roadmap.md +108 -0
  128. package/docs/plans/workrail-platform-vision.md +420 -0
  129. package/docs/reference/agent-context-cleaner-snippet.md +94 -0
  130. package/docs/reference/agent-context-guidance.md +140 -0
  131. package/docs/reference/context-optimization.md +284 -0
  132. package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
  133. package/docs/reference/example-workflow-repository-template/README.md +268 -0
  134. package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
  135. package/docs/reference/external-workflow-repositories.md +916 -0
  136. package/docs/reference/feature-flags-architecture.md +472 -0
  137. package/docs/reference/feature-flags.md +349 -0
  138. package/docs/reference/god-tier-workflow-validation.md +272 -0
  139. package/docs/reference/loop-optimization.md +209 -0
  140. package/docs/reference/loop-validation.md +176 -0
  141. package/docs/reference/loops.md +465 -0
  142. package/docs/reference/mcp-platform-constraints.md +59 -0
  143. package/docs/reference/recovery.md +88 -0
  144. package/docs/reference/releases.md +177 -0
  145. package/docs/reference/troubleshooting.md +105 -0
  146. package/docs/reference/workflow-execution-contract.md +998 -0
  147. package/docs/roadmap/README.md +22 -0
  148. package/docs/roadmap/legacy-planning-status.md +103 -0
  149. package/docs/roadmap/now-next-later.md +70 -0
  150. package/docs/roadmap/open-work-inventory.md +389 -0
  151. package/docs/tickets/README.md +39 -0
  152. package/docs/tickets/next-up.md +76 -0
  153. package/docs/workflow-management.md +317 -0
  154. package/docs/workflow-templates.md +423 -0
  155. package/docs/workflow-validation.md +184 -0
  156. package/docs/workflows.md +254 -0
  157. package/package.json +4 -1
  158. package/spec/authoring-spec.json +61 -16
  159. package/workflows/workflow-for-workflows.json +3 -3
  160. package/workflows/workflow-for-workflows.v2.json +3 -3
@@ -0,0 +1,96 @@
1
+ # Performance Sweep -- April 2026
2
+
3
+ **Date:** 2026-04-07
4
+ **Status:** Discovery complete, issues filed
5
+
6
+ Six parallel discovery agents audited the full workrail codebase for performance and efficiency issues. This document consolidates all findings.
7
+
8
+ ## Cross-cutting pattern
9
+
10
+ Every layer independently re-reads and re-computes from raw data on every call. Nothing is shared between layers. The same session event log is scanned 10+ times per `continue_workflow` call across the engine, prompt renderer, and session store.
11
+
12
+ ## Findings by area
13
+
14
+ ### 1. Session store & persistence (`src/v2/infra/local/session-store/`)
15
+
16
+ - `appendImpl` calls `loadTruthOrEmpty()` before every write -- a full manifest + all segment reads -- even though `ExecutionSessionGateV2` already loaded the session (double disk read per write)
17
+ - Two separate `open/write/fsync/close` cycles when snapshot pins are present; should be one
18
+ - 200 sequential `stat` calls in `readdirWithMtime` (for-loop, one at a time)
19
+ - Segment files read sequentially despite being independent and immutable once written
20
+ - `loadHealthySummaries` loads sessions sequentially with no concurrency cap and no cache
21
+ - `validateAppendPlan` re-runs Zod parse on every event in the plan -- already trusted data
22
+ - Full event payloads read in `loadTruthOrEmpty` just to extract `dedupeKey` fields
23
+ - `new TextDecoder()` allocated per segment read (should be module-level singleton)
24
+ - `mkdirp(eventsDir)` called on every `append`, not just session creation
25
+
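The sequential `stat` loop and sequential segment reads listed above are independent operations, so they could run with bounded concurrency. The sketch below is one way to do that; `mapWithConcurrency` is a hypothetical helper name, not something taken from the workrail codebase, and a real fix may use a different mechanism.

```typescript
// Hypothetical helper (not from the workrail source): run `fn` over `items`
// with at most `limit` calls in flight, preserving input order in the result.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let nextIndex = 0;

  // Each worker claims the next unclaimed index until the list is exhausted.
  async function worker(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await fn(items[index], index);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Under this assumption, a directory listing could call something like `mapWithConcurrency(paths, 16, (p) => fs.promises.stat(p))` instead of awaiting each `stat` in a for-loop; the cap of 16 is an arbitrary illustration.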
26
+ ### 2. V2 engine core (`src/v2/durable-core/`, `src/mcp/handlers/v2-execution/`)
27
+
28
+ - `continue_workflow` scans `truth.events` 6+ times per call across `continue-advance.ts`, `input-validation.ts`, `replay.ts` with no shared state
29
+ - Session loaded from disk a second time after the advance completes; same events scanned again
30
+ - `projectRunContextV2` called in `validateAdvanceInputs` then again inside `renderPendingPrompt`
31
+ - `projectAssessmentsV2` runs full scan on every step even when no assessment events exist
32
+ - Sortedness validation repeated in every projection (4+ times per advance) on data the store guarantees is sorted
33
+ - `createWorkflow(pinned.definition)` called on every advance for the same immutable workflow hash -- never cached
34
+ - `pinnedStore.get()` called twice on first-advance path when pin already found
35
+ - `deriveWorkflowHashRef` called 3 times with the same input per advance
36
+ - `hasPriorNotesInRun` adds a 4th+ event scan inside `renderPendingPrompt`
37
+
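Several findings above involve rebuilding a value from the same immutable input on every advance (for example `createWorkflow(pinned.definition)` for one workflow hash). A minimal sketch of caching by key follows. `memoizeByKey`, `PinnedWorkflow`, and `createWorkflowCached` are stand-ins invented for illustration; the real pinned-workflow shape is not shown in this document.

```typescript
// Hypothetical memoizer: build a value once per key and reuse it afterwards.
function memoizeByKey<K, A, V>(
  keyOf: (arg: A) => K,
  build: (arg: A) => V,
): (arg: A) => V {
  const cache = new Map<K, V>();
  return (arg: A): V => {
    const key = keyOf(arg);
    if (cache.has(key)) return cache.get(key) as V;
    const value = build(arg);
    cache.set(key, value);
    return value;
  };
}

// Stand-in shape, assumed for this example only.
interface PinnedWorkflow {
  workflowHash: string;
  definition: { id: string };
}

let compileCount = 0; // counts how often the expensive build actually runs

const createWorkflowCached = memoizeByKey(
  (pinned: PinnedWorkflow) => pinned.workflowHash,
  (pinned: PinnedWorkflow) => {
    compileCount++;
    return { id: pinned.definition.id, compiled: true };
  },
);
```

This cache is unbounded, which is only reasonable if the number of distinct hashes stays small; a long-lived process would likely want an eviction policy.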
38
+ ### 3. Workflow loading & registry (`src/infrastructure/storage/`, `src/mcp/handlers/`)
39
+
40
+ - N+1 `getWorkflowById` calls per `list_workflows`: 1 list + N individual fetches, then full 5-pass compilation + SHA-256 hash + disk read per workflow on every call
41
+ - New AJV instance + schema compilation on every request (`createWorkflowReaderForRequest`)
42
+ - Recursive filesystem walk of all remembered-root directories per request with workspace signal
43
+ - `CachingWorkflowStorage` uses linear `find` scan instead of `Map` lookup
44
+ - `listWorkflowSummaries` triggers full validation pass just to return metadata fields
45
+ - `statSync` blocking event loop in index build (`FileWorkflowStorage.buildWorkflowIndex`)
46
+ - `workflow.schema.json` re-read and JSON.parsed on every `workflow_get_schema` call
47
+ - `listWorkflowSummaries` and `loadAllWorkflows` each perform their own index read (two parallel, independent reads of the same index)
48
+
49
+ ### 4. MCP handler layer (`src/mcp/handlers/`, `src/mcp/handler-factory.ts`)
50
+
51
+ - Output schema `.parse()` on every hot-path response on data the server itself produced (handlers for `continue_workflow`, `start_workflow`, `list_workflows`, etc. all call `Schema.parse()` on their own output)
52
+ - `V2BlockerReportSchema.superRefine()` runs O(n log n) duplicate-check on every parse
53
+ - `process.env.WORKRAIL_CLEAN_RESPONSE_FORMAT` is read and string-compared on every call (not cached)
54
+ - `coerceJsonStringObjectFields` rebuilds object-field set from schema shape per call
55
+ - `JSON.stringify(..., null, 2)` with indentation on all machine-to-machine wire responses
56
+ - `getV2ExecutionRenderEnvelope` called twice per non-execution response
57
+ - Schema shape re-traversed on every validation error for suggestion generation
58
+
59
+ ### 5. Console service & data projection (`src/v2/usecases/console-service.ts`)
60
+
61
+ - Full 500-session disk load + projection rebuild on every `/api/v2/sessions` request, no caching
62
+ - `/api/v2/worktrees` calls `getSessionList()` a second time (double the I/O)
63
+ - `projectRunDagV2` called 3-4 times on the same event array per session per request
64
+ - `resolveRunCompletion` always re-projects the DAG from events even when caller has it
65
+ - `projectRunStatusSignalsV2` internally calls `projectRunDagV2` + `projectGapsV2` again
66
+ - `projectSessionHealthV2` calls `projectRunDagV2` yet again
67
+ - `projectNodeOutputsV2` called twice per session summary (title extraction + recap)
68
+ - `projectNodeDetail` runs 5 independent full event-log scans sequentially
69
+ - `loadSegmentsRecursive` O(N^2) array allocations via spread per segment
70
+
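The spread-per-segment cost mentioned for `loadSegmentsRecursive` can be illustrated in isolation. The sketch below does not reproduce that function; `Segment`, `collectBySpread`, and `collectByPush` are hypothetical names that only contrast the two accumulation styles.

```typescript
// Stand-in segment shape, assumed for this example only.
type Segment = { events: readonly string[] };

// Quadratic overall: each iteration copies everything accumulated so far
// into a brand-new array.
function collectBySpread(segments: readonly Segment[]): string[] {
  let all: string[] = [];
  for (const segment of segments) {
    all = [...all, ...segment.events];
  }
  return all;
}

// Linear overall: append into a single array that is allocated once.
function collectByPush(segments: readonly Segment[]): string[] {
  const all: string[] = [];
  for (const segment of segments) {
    for (const event of segment.events) {
      all.push(event);
    }
  }
  return all;
}
```

Both functions return the same events in the same order; only the number of intermediate arrays differs.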
71
+ ### 6. Prompt rendering & content assembly (`src/v2/durable-core/domain/`)
72
+
73
+ - `renderPendingPrompt` runs 3 independent full-event-log projections (`projectRunContextV2`, `projectRunDagV2`, `projectNodeOutputsV2`) plus `hasPriorNotesInRun` scan
74
+ - `resolveParentLoopStep` and `getStepById` both do double-nested workflow traversal on every render
75
+ - `expandFunctionDefinitions` re-searches workflow definition on every call
76
+ - `buildChain`/`buildPathBackward` in `recap-recovery.ts` allocate O(N^2) `Set` objects per ancestry traversal
77
+ - `renderBudgetedRehydrateRecovery` encodes the same string 3 times in the budget-trim loop
78
+ - Tier lookup functions use `Array.find` over constant 2-3 element arrays (should be `Record`)
79
+ - Shared mutable global `g`-flag regex in `context-template-resolver.ts` (latent correctness bug)
80
+ - `dotPath.split('.')` allocates new array on every template token match
81
+ - `JSON.stringify` for node deduplication equality in `projectRunDagV2`
82
+
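The shared `g`-flag regex finding is a correctness issue because a global regex keeps its `lastIndex` between calls. The sketch below shows the general JavaScript behavior; the pattern and function names are assumptions made for illustration and are not the actual code in `context-template-resolver.ts`.

```typescript
// A module-level regex with the `g` flag is stateful: `test` and `exec`
// resume from `lastIndex`, which survives across calls.
const SHARED_TOKEN = /\{\{(\w+)\}\}/g;

function hasTokenShared(text: string): boolean {
  return SHARED_TOKEN.test(text); // mutates SHARED_TOKEN.lastIndex
}

function hasTokenSafe(text: string): boolean {
  // No `g` flag, so every call starts matching at index 0.
  return /\{\{(\w+)\}\}/.test(text);
}

// First call matches at index 0 and leaves lastIndex at 8.
const first = hasTokenShared("{{name}}");
// Second call starts searching at index 8, finds nothing, and returns false
// even though the input is identical. lastIndex then resets to 0.
const second = hasTokenShared("{{name}}");
```

Dropping the `g` flag, resetting `lastIndex` before use, or constructing the regex per call each avoid the carried-over state.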
83
+ ## Highest-leverage fixes
84
+
85
+ | Priority | Fix | Areas | Issues |
86
+ |---|---|---|---|
87
+ | 1 | `SessionIndex`: build once at load, thread through engine + renderer | Engine, renderer | #248 |
88
+ | 2 | `(sessionId, mtime)` projection cache in console service | Console | #249 |
89
+ | 3 | Remove output-side Zod `.parse()` on server-produced responses | MCP | #250 |
90
+ | 4 | Thread loaded session into `appendImpl` (eliminate double disk read) | Session store | #252 |
91
+ | 5 | Cache `createWorkflow` by hash; fix AJV singleton; fix fs walk | Engine, workflows | #254, #256 |
92
+ | 6 | Parallelize serial I/O (stat loop, segment reads) | Session store | #253 |
93
+ | 7 | Pre-index step/loop/function lookups at Workflow construction | Engine, renderer | #255 |
94
+ | 8 | Fix N+1 workflow fetches and recursive fs walk per request | Workflows | #256 |
95
+ | 9 | Fix serialization overhead (JSON indent, env vars, coercion) | MCP | #251 |
96
+ | 10 | Fix O(N^2) ancestry + budget loop re-encoding + minor allocations | Renderer | #257 |
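
Fix 2 in the table names a `(sessionId, mtime)` projection cache. A minimal sketch of that idea follows, assuming the session's modification time changes whenever its events change. `MtimeProjectionCache` is a hypothetical class, not the implementation tracked in #249.

```typescript
interface CacheEntry<V> {
  mtimeMs: number;
  value: V;
}

// Hypothetical cache: one entry per session, invalidated when mtime changes.
class MtimeProjectionCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();

  // Recompute only when there is no entry or the stored mtime differs.
  get(sessionId: string, mtimeMs: number, project: () => V): V {
    const hit = this.entries.get(sessionId);
    if (hit !== undefined && hit.mtimeMs === mtimeMs) {
      return hit.value;
    }
    const value = project();
    this.entries.set(sessionId, { mtimeMs, value });
    return value;
  }
}
```

A caller would pass the mtime it already obtained while listing session directories, so a cache hit costs one `Map` lookup instead of a disk load plus projection.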
@@ -0,0 +1,219 @@
1
+ # Routines Guide — Three Consumption Modes
2
+
3
+ Routines are reusable cognitive workflows defined as JSON in `workflows/routines/`.
4
+ They can be consumed in three ways, each suited to different orchestration needs.
5
+
6
+ ## Mode 1: Delegation (WorkRail Executor)
7
+
8
+ The primary agent delegates a routine to a **WorkRail Executor subagent** at runtime.
9
+ The subagent runs the routine's steps independently and returns output to the parent.
10
+
11
+ **When to use**: bounded cognitive tasks (design generation, hypothesis challenge, plan analysis)
12
+ where the parent agent wants to continue working in parallel.
13
+
14
+ **How it works**:
15
+ 1. Parent agent spawns a WorkRail Executor with a `routineId`
16
+ 2. The executor runs the routine's steps sequentially
17
+ 3. Output flows back to the parent via the session
18
+
19
+ **Example** (in a workflow step prompt):
20
+ ```
21
+ Spawn ONE WorkRail Executor running `routine-tension-driven-design` with your
22
+ tensions, philosophy sources, and problem understanding as input.
23
+ ```
24
+
25
+ ## Mode 2: Direct Execution (Agent Follows Steps)
26
+
27
+ An agent reads the routine definition and **follows its steps directly** as structured guidance.
28
+ No subagent spawning — the agent itself executes each step in sequence.
29
+
30
+ **When to use**: when the agent IS the executor (e.g., inside a WorkRail Executor session),
31
+ or when delegation overhead isn't justified.
32
+
33
+ **How it works**:
34
+ 1. Agent loads the routine JSON
35
+ 2. Agent executes each step's prompt in order
36
+ 3. Agent produces the deliverable described in the final step
37
+
38
+ ## Mode 3: Injection (Compile-Time Template Expansion)
39
+
40
+ A workflow references a routine via a `type: "template_call"` step, and the **compiler expands the routine's
41
+ steps inline** at compile time. The routine's steps become first-class workflow steps.
42
+
43
+ **When to use**: when routine steps should be visible in the workflow's step list, participate
44
+ in confirmation gates, and be tracked individually in the session.
45
+
46
+ **How it works**:
47
+ 1. A workflow step declares a `type: "template_call"` step with the routine's template ID and args
48
+ 2. At compile time, the template registry expands the routine into real steps
49
+ 3. The expanded steps replace the template call step in the compiled workflow
50
+ 4. `{arg}` placeholders in prompts are substituted; `{{contextVar}}` is preserved for runtime
51
+
52
+ **Template ID convention**:
53
+ - Routine `routine-tension-driven-design` → template ID `wr.templates.routine.tension-driven-design`
54
+ - The `routine-` prefix is stripped automatically
55
+
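The template ID convention above can be sketched as a small function. `routineIdToTemplateId` is a hypothetical name, and passing through IDs that lack the `routine-` prefix is an assumption; the source only describes the prefixed case.

```typescript
const ROUTINE_PREFIX = "routine-";
const TEMPLATE_NAMESPACE = "wr.templates.routine.";

// Hypothetical helper: derive a template ID from a routine ID.
function routineIdToTemplateId(routineId: string): string {
  // Strip the `routine-` prefix when present (assumption: otherwise keep as-is).
  const name = routineId.startsWith(ROUTINE_PREFIX)
    ? routineId.slice(ROUTINE_PREFIX.length)
    : routineId;
  return TEMPLATE_NAMESPACE + name;
}
```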
56
+ **Example** (in workflow JSON):
57
+ ```json
58
+ {
59
+ "type": "template_call",
60
+ "templateId": "wr.templates.routine.tension-driven-design",
61
+ "args": {
62
+ "deliverableName": "design-candidates.md"
63
+ }
64
+ }
65
+ ```
66
+
67
+ **What happens at compile time**:
68
+ - The step above is replaced by the routine's 5 steps (step-discover-philosophy, step-understand-deeply, etc.)
69
+ - Each expanded step ID is prefixed using the compiler's provenance/step identity rules
70
+ - `{deliverableName}` in prompts becomes `design-candidates.md`
71
+ - The routine's `metaGuidance` is injected as step-level `guidance` on each expanded step
72
+ - `preconditions` and `clarificationPrompts` are NOT included (parent workflow handles those)
73
+
74
+ **Constraints**:
75
+ - Routine steps must NOT contain nested `template_call` usage (no recursive injection)
76
+ - All `{arg}` placeholders must be satisfied by the template call's `args`
77
+ - Arg values must be primitives (string, number, boolean) — objects/arrays are rejected
78
+
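The substitution rules and constraints above can be sketched as follows. This is an illustrative approximation, not the compiler's code: `substituteArgs` is a hypothetical function, and the exact placeholder grammar (identifier characters inside `{...}`) is an assumption.

```typescript
// Hypothetical sketch: substitute {arg} placeholders, leave {{contextVar}} alone.
function substituteArgs(prompt: string, args: Record<string, unknown>): string {
  // Arg values must be primitives; objects and arrays are rejected.
  for (const [name, value] of Object.entries(args)) {
    const kind = typeof value;
    if (kind !== "string" && kind !== "number" && kind !== "boolean") {
      throw new TypeError(`arg "${name}" must be a string, number, or boolean`);
    }
  }

  // The first alternative consumes {{...}} tokens so they are never treated
  // as {arg}; the second captures a single-brace placeholder name.
  const pattern = /\{\{[^{}]*\}\}|\{([A-Za-z_]\w*)\}/g;
  return prompt.replace(pattern, (match: string, name: string | undefined) => {
    if (name === undefined) return match; // {{contextVar}} is kept for runtime
    if (!Object.prototype.hasOwnProperty.call(args, name)) {
      throw new Error(`missing arg "${name}"`);
    }
    return String(args[name]);
  });
}
```

For example, with `args` of `{ deliverableName: "design-candidates.md" }`, a prompt containing `{deliverableName}` and `{{user}}` would have only the first placeholder replaced.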
79
+ ## Comparison
80
+
81
+ | Aspect | Delegation | Direct Execution | Injection |
82
+ |---|---|---|---|
83
+ | When resolved | Runtime | Runtime | Compile time |
84
+ | Parallelism | Yes (subagent) | No | N/A (steps are inline) |
85
+ | Step visibility | Opaque to parent | Transparent | Fully visible |
86
+ | Confirmation gates | Subagent only | Agent decides | Per-step as authored |
87
+ | Session tracking | Separate session | Same session | Same session, per-step |
88
+ | Arg substitution | Via context | Via context | `{arg}` → compile-time |
89
+
90
+ ## Selection guidance
91
+
92
+ Choosing the right consumption mode matters as much as choosing the right routine.
93
+
94
+ ### Default decision rule
95
+
96
+ Use this order unless you have a strong reason not to:
97
+
98
+ - **Use injection (`templateCall`) by default** when the routine is part of the parent workflow's authored structure.
99
+ - **Use delegation** when the routine's value comes from an independent perspective, parallelism, or an intentionally opaque bounded audit.
100
+ - **Use extension points only to make delegation seams overridable**, not as a substitute for routine injection.
101
+
102
+ A good litmus test:
103
+
104
+ - If you want the routine's **steps to appear in the parent workflow**, use **injection**.
105
+ - If you want the routine's **result but not its internal steps**, use **delegation**.
106
+ - If you want a team to **swap which delegated implementation is called** without forking the parent workflow, add an **extension point** around that delegated seam.
107
+
108
+ ### What extension points do not do
109
+
110
+ `extensionPoints` and `{{wr.bindings.*}}` do **not** inject a routine into the parent workflow.
111
+
112
+ They only resolve a slot to a routine/workflow ID in prompt text at compile time. The parent agent still decides whether to call or follow that bound implementation at runtime.
113
+
114
+ Because binding resolution runs **after** template expansion, extension points cannot currently choose which routine gets injected via `templateCall`.
115
+
116
+ ### Prefer delegation when
117
+
118
+ - an independent cognitive perspective adds value
119
+ - the parent can continue useful work in parallel
120
+ - the routine is acting as an auditor, challenger, or verifier
121
+ - the routine's internal steps do not need to be visible as first-class parent workflow steps
122
+
123
+ Common examples:
124
+
125
+ - context completeness / depth audits
126
+ - adversarial hypothesis challenge
127
+ - philosophy alignment review
128
+ - final verification from a fresh perspective
129
+
130
+ ### Prefer direct execution when
131
+
132
+ - delegation overhead is not justified
133
+ - the current agent is already the natural executor
134
+ - step visibility is unnecessary
135
+ - the routine is mainly a reusable thinking scaffold, not a separate perspective
136
+
137
+ ### Prefer injection when
138
+
139
+ - the routine's steps should be visible in the parent workflow
140
+ - confirmation behavior should apply per injected step
141
+ - session traceability matters
142
+ - the routine is central enough to the parent workflow that hiding it behind opaque delegation would reduce debuggability
143
+ - the author wants the engine, not the agent, to own that reusable subflow
144
+
145
+ Common examples:
146
+
147
+ - reusable design-generation cores
148
+ - reusable final-verification skeletons
149
+ - bounded reusable subflows the author wants Studio/session visibility for
150
+
151
+ ## Auditor-first guidance
152
+
153
+ For many high-value routines, the best default mental model is **auditor**, not **task owner**.
154
+
155
+ That means the parent workflow:
156
+
157
+ - gathers or synthesizes the current state
158
+ - delegates a bounded audit/challenge/verification package
159
+ - interprets the returned artifact as evidence
160
+
161
+ not as canonical truth.
162
+
163
+ This is often a better fit than executor-style delegation for:
164
+
165
+ - review workflows
166
+ - planning workflows
167
+ - verification-heavy workflows
168
+
169
+ ## High-value routine defaults
170
+
171
+ The current routine catalog suggests these default uses:
172
+
173
+ - `routine-context-gathering`: completeness/depth audit or bounded context expansion
174
+ - `routine-hypothesis-challenge`: adversarial challenge against the current leading story
175
+ - `routine-execution-simulation`: bounded runtime/flow reasoning where mental execution adds value
176
+ - `routine-philosophy-alignment`: review against user/repo principles
177
+ - `routine-final-verification`: proof-oriented end-state validation
178
+
179
+ ## Good and bad fits
180
+
181
+ ### Good fit for delegation
182
+
183
+ - an adversarial reviewer challenging the current recommendation
184
+ - a philosophy/policy auditor checking alignment against repo rules
185
+ - a fresh final verifier evaluating whether evidence really supports the conclusion
186
+
187
+ ### Bad fit for delegation
188
+
189
+ - tiny deterministic transformations that the parent can do faster directly
190
+ - parent-owned loop decisions or canonical synthesis
191
+ - work where hiding the internal steps would make the session harder to debug
192
+
193
+ ### Good fit for injection
194
+
195
+ - a reusable multi-step authoring scaffold the parent wants visible in the step list
196
+ - a reusable verification sequence that should honor parent confirmation gates
197
+
198
+ ### Bad fit for injection
199
+
200
+ - every small repeated instruction block
201
+ - routines whose value comes mainly from independent perspective rather than visible sub-steps
202
+
203
+ ### Prefer extension points when
204
+
205
+ - the parent workflow intentionally delegates a bounded seam
206
+ - teams may want to replace that delegated implementation per project
207
+ - the parent workflow still owns synthesis, loop control, and final decisions
208
+
209
+ ### Bad fit for extension points
210
+
211
+ - using `{{wr.bindings.*}}` where the real goal is inline routine structure
212
+ - hiding a core parent subflow behind a rebinding slot just to avoid hardcoding a routine ID
213
+ - expecting project bindings to change which routine a `templateCall` injects
214
+
215
+ ## See Also
216
+
217
+ - `workflows/examples/routine-injection-example.json` — example workflow using injection
218
+ - `src/application/services/compiler/template-registry.ts` — injection implementation
219
+ - `src/application/services/compiler/routine-loader.ts` — routine loading from disk
@@ -0,0 +1,11 @@
1
+ # Sequence Diagrams for Native Context Management
2
+
3
+ > **Not pursuing**
4
+ >
5
+ > WorkRail is not planning to implement native context management.
6
+ >
7
+ > This file is kept only as a stable tombstone so old links do not break.
8
+ >
9
+ > See:
10
+ > - `docs/roadmap/legacy-planning-status.md`
11
+ > - `docs/plans/native-context-management-epic.md`
@@ -0,0 +1,220 @@
1
+ # Subagent Design Principles & Catalog
2
+
3
+ ## Overview
4
+
5
+ This document defines WorkRail's approach to subagent design for agentic IDEs. It outlines the core principles, patterns, and catalog of specialized subagents that enhance WorkRail workflows.
6
+
7
+ **Philosophy:** Subagents are **specialized cognitive functions**, not task owners. They execute complete, autonomous routines and return structured deliverables to the main agent orchestrator.
8
+
9
+ ---
10
+
11
+ ## Core Principles
12
+
13
+ ### 1. **Cognitive Specialization, Not Task Ownership**
14
+
15
+ **Good:** "Context Researcher" - Specializes in deep reading and systematic exploration
16
+ **Bad:** "Debugger" - Too broad, owns entire debugging workflow
17
+
18
+ **Rule:** Subagents should embody a **specific cognitive mode** (exploration, challenge, verification) that can be applied across many workflows, not own a complete workflow themselves.
19
+
20
+ ### 2. **Stateless & Self-Contained**
21
+
22
+ Each subagent invocation is independent:
23
+ - **No memory** between calls
24
+ - **No conversational refinement**
25
+ - **No follow-up questions**
26
+
27
+ **Implication:** The main agent must provide **all necessary context upfront** in a single, complete work package.
28
+
29
+ **Pattern:**
30
+ ```
31
+ Main Agent → Subagent: [Complete Context Package]
32
+ Subagent: [Autonomous Execution]
33
+ Subagent → Main Agent: [Structured Deliverable]
34
+ ```
35
+
36
+ ### 3. **Autonomous Routine Execution**
37
+
38
+ Subagents execute **complete routines** from start to finish:
39
+ - Receive: Self-contained work package with all context
40
+ - Execute: Multi-step routine autonomously
41
+ - Return: Named, structured artifact (e.g., `execution-flow.md`)
42
+
43
+ **Not this:** Iterative back-and-forth, gradual context building, conversational refinement.
44
+
45
+ ### 4. **Depth-Aware Investigation**
46
+
47
+ For research/exploration tasks, subagents support **configurable depth levels** to balance speed vs thoroughness:
48
+
49
+ | Level | Name | Time | Use Case |
50
+ |-------|------|------|----------|
51
+ | 0 | Survey | 1-2 min | "What exists here?" |
52
+ | 1 | Scan | 5-10 min | "What are the major components?" |
53
+ | 2 | Explore | 15-30 min | "What does each component do?" |
54
+ | 3 | Analyze | 30-60 min | "How does this specific logic work?" |
55
+ | 4 | Dissect | 60+ min | "What is every line doing?" |
56
+
57
+ Main agent chooses depth based on uncertainty and importance.
58
+
+ ### 5. **Structured Deliverables**
+
+ Every subagent routine produces a **named artifact** with a **consistent structure**:
+
+ **Standard Output Format:**
+ ```markdown
+ ### Summary (3-5 bullets)
+ - Key findings
+
+ ### Detailed Findings
+ - Component breakdowns
+ - File citations (file:line)
+
+ ### Suspicious Points / Concerns / Gaps
+ - What could be problematic
+ - What couldn't be determined
+
+ ### Recommendations
+ - What main agent should do next
+ ```
+
+ **Deliverable Quality Gates:**
+
+ Main agent validates each deliverable against these criteria:
+ - **Completeness**: All required sections present
+ - **Citations**: File:line references for all findings
+ - **Gaps Section**: Explicit about limitations and unknowns
+ - **Actionability**: Clear next steps or recommendations
+
+ **If a deliverable fails quality gates**, the main agent should:
+ 1. Note the gaps in the workflow context
+ 2. Decide if the partial deliverable is sufficient
+ 3. Optionally re-run with clarified context (not automatic)
+
+ **Artifact Naming Convention:** Use kebab-case for filenames:
+ - `execution-flow.md`
+ - `hypothesis-challenges.md`
+ - `plan-analysis.md`
+
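One way to enforce the kebab-case convention is to normalize artifact names before writing them. The helper below, `to_kebab_case`, is a hypothetical illustration and not a WorkRail function; it handles PascalCase, snake_case, and spaces.

```python
import re


def to_kebab_case(filename: str) -> str:
    """Normalize a filename stem to kebab-case, keeping the extension."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension present: rpartition put the whole name in the last slot.
        stem, ext = filename, ""
    # Insert a hyphen at lower/digit-to-upper boundaries: ExecutionFlow -> Execution-Flow
    stem = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", stem)
    # Treat underscores and whitespace as word separators.
    stem = re.sub(r"[\s_]+", "-", stem)
    # Collapse repeated hyphens and trim stray ones at the ends.
    stem = re.sub(r"-+", "-", stem).strip("-").lower()
    return f"{stem}.{ext.lower()}" if dot else stem
```

Names that already follow the convention pass through unchanged, so the helper is safe to apply to every artifact name.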
+ ### 6. **Explicit Over Implicit**
+
+ While agentic IDEs support auto-invocation (system picks subagent based on task description), **WorkRail workflows use explicit delegation**:
+
+ ```
+ Use: task(subagent_type="context-researcher", prompt="...")
+ Not: "Hey, someone gather context for me" (auto-invoke)
+ ```
+
+ **Rationale:** Predictability, debuggability, user understanding.
+
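The predictability argument can be illustrated with a toy dispatcher. In practice the `task` tool is provided by the agentic IDE; the version below is only a stand-in, and the `SUBAGENTS` registry and its single handler are assumptions made for the example.

```python
from typing import Callable, Dict

# Hypothetical registry: subagent type name -> handler. A real handler would
# start a subagent; this one just echoes so the sketch stays runnable.
SUBAGENTS: Dict[str, Callable[[str], str]] = {
    "context-researcher": lambda prompt: f"[context-researcher] {prompt}",
}


def task(subagent_type: str, prompt: str) -> str:
    """Dispatch to exactly the named subagent; never guess from the prompt."""
    try:
        handler = SUBAGENTS[subagent_type]
    except KeyError:
        known = ", ".join(sorted(SUBAGENTS))
        raise ValueError(
            f"unknown subagent_type {subagent_type!r}; known types: {known}"
        ) from None
    return handler(prompt)
```

Because the subagent is named in the call, a typo fails loudly instead of silently routing the work to whichever subagent the system happens to pick.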
+ ### 7. **Auditor Model: Review, Don't Execute**
+
+ **Key Discovery:** Subagents work better as **auditors** than **executors**.
+
+ **Executor Model (Problematic):**
+ ```
+ Main Agent: "Go gather context about authentication"
+ Subagent: *reads files, builds understanding*
+ Problem: Main agent doesn't have the context, needs to re-read
+ ```
+
+ **Auditor Model (Effective):**
+ ```
+ Main Agent: *reads files, builds understanding*
+ Main Agent: "I read these files and learned X. Audit my work."
+ Subagent: "You missed Y, assumption Z is risky, go deeper on W"
+ Main Agent: *investigates gaps*
+ Result: Main agent has full context + quality control
+ ```
+
+ **Why Auditors Work Better:**
+ - **No dilution**: Main agent has full, uncompressed context
+ - **No duplication**: Main agent doesn't need to re-read what subagent read
+ - **Fresh perspective**: Auditor catches gaps and blind spots
+ - **Quality control**: Ensures sufficient understanding before proceeding
+ - **Cognitive diversity**: Different perspective on the same work
+
+ **When to Use Auditors:**
+ - Context gathering (audit for completeness and depth)
+ - Hypothesis formation (challenge assumptions)
+ - Plan creation (validate completeness and soundness)
+ - Final validation (adversarial review before committing)
+
+ **When Executors Still Make Sense:**
+ - Simulation (running "what-if" scenarios in parallel)
+ - Independent parallel work (different execution paths)
+ - Specialized tasks main agent can't do well
+
+ ### 8. **Parallel Delegation for Critical Work**
+
+ **Pattern:** Spawn multiple subagents **simultaneously** for critical phases to get diverse perspectives and ensure nothing is missed.
+
+ **Explicit Parallelism:**
+ ```
+ **CRITICAL: Spawn ALL subagents SIMULTANEOUSLY, not sequentially.**
+
+ Delegate to THREE subagents AT THE SAME TIME:
+ 1. [Subagent 1 with specific focus]
+ 2. [Subagent 2 with different focus]
+ 3. [Subagent 3 with different focus]
+ ```
+
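The difference between simultaneous and sequential spawning can be sketched with `asyncio.gather`. This is an analogy in plain Python rather than how an agentic IDE actually spawns subagents: `run_subagent` is a hypothetical stand-in that sleeps instead of doing work, and the names and focuses are placeholders.

```python
import asyncio


async def run_subagent(name: str, focus: str, delay: float = 0.01) -> dict:
    """Hypothetical stand-in for a subagent call; sleeps to simulate work."""
    await asyncio.sleep(delay)
    return {"subagent": name, "focus": focus}


async def delegate_in_parallel(assignments):
    # Build every coroutine first, then await them together. gather() runs
    # them concurrently and returns results in the order they were passed.
    return await asyncio.gather(
        *(run_subagent(name, focus) for name, focus in assignments)
    )


results = asyncio.run(
    delegate_in_parallel(
        [
            ("subagent-1", "completeness"),
            ("subagent-2", "depth"),
            ("subagent-3", "risk"),
        ]
    )
)
```

Awaiting each call one after another would give the same results but roughly three times the wall-clock time, which is the sequential behavior the prompt text above warns against.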
+ **Use Cases:**
+
+ **1. Multi-Perspective Auditing (Diverse Focuses)**
+ ```
+ Main agent gathers context
+
+ Parallel Audit (2-3 subagents):
+ ├─ Context Researcher (FOCUS: Completeness)
+ ├─ Context Researcher (FOCUS: Depth)
+ └─ [Optional 3rd perspective]
+
+ Main agent synthesizes all perspectives
+ ```
+
+ **2. Redundant Critical Work (Different Rigor)**
+ ```
+ Main agent forms hypotheses
+
+ Parallel Challenge (2 subagents):
+ ├─ Hypothesis Challenger (rigor=3: Thorough)
+ └─ Hypothesis Challenger (rigor=5: Maximum)
+
+ Main agent strengthens hypotheses based on challenges
+ ```
+
+ **3. Multi-Modal Validation (Different Cognitive Modes)**
+ ```
+ Main agent proposes fix
+
+ Parallel Validation (3 subagents):
+ ├─ Hypothesis Challenger (adversarial review)
+ ├─ Execution Simulator (simulate the fix)
+ └─ Plan Analyzer (validate the plan)
+
+ Main agent proceeds only if ALL THREE validate
+ ```
+
+ **Synthesis Guidance:**
+
+ When main agent receives multiple parallel deliverables:
+ - **Common concerns**: If 2+ subagents flag the same issue → High priority
+ - **Unique insights**: Each subagent may catch different gaps → Investigate all
+ - **Conflicting advice**: If they disagree → Investigate to understand why
+ - **Quality gate**: For critical phases, require ALL subagents to validate
+
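Three of the synthesis rules above are mechanical enough to sketch in code. The `Deliverable` shape and `synthesize` function are assumptions for illustration; concerns are modeled as short labels so that identical issues can be counted, and conflicting advice is left out because resolving it needs judgment rather than counting.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Deliverable:
    """Hypothetical summary of one subagent's parallel deliverable."""
    subagent: str
    concerns: FrozenSet[str]
    validated: bool


def synthesize(deliverables: List[Deliverable]) -> dict:
    # A set per deliverable means each subagent counts once per concern.
    counts = Counter(c for d in deliverables for c in d.concerns)
    return {
        # Flagged by 2+ subagents: treat as high priority.
        "high_priority": sorted(c for c, n in counts.items() if n >= 2),
        # Flagged by exactly one subagent: still investigate.
        "investigate": sorted(c for c, n in counts.items() if n == 1),
        # Critical-phase gate: every subagent must validate.
        "all_validated": bool(deliverables) and all(d.validated for d in deliverables),
    }
```

For example, if two of three auditors both report a `missing-retry-policy` concern, it appears under `high_priority`, while a concern raised by only one auditor appears under `investigate`.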
+ **Cost/Speed Tradeoff:**
+ - Parallel = faster wall-clock time but higher token cost
+ - Use for critical phases where quality matters most
+ - Use for phases where diverse perspectives add value
+
+ ### 9. **Focused Audits for Parallel Work**
+
+ When spawning multiple auditors in parallel, give each a **specific focus** to maximize diversity and minimize overlap.
+
+ **Pattern:**
+ ```
+ Subagent 1: FOCUS = Completeness
+ - Priority: Did they miss any critical areas?
+ - Still checks other dimensions, but emphasizes coverage
+ ```