@exaudeus/workrail 3.27.0 → 3.29.0
- package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
- package/dist/console/index.html +1 -1
- package/dist/manifest.json +3 -3
- package/docs/README.md +57 -0
- package/docs/adrs/001-hybrid-storage-backend.md +38 -0
- package/docs/adrs/002-four-layer-context-classification.md +38 -0
- package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
- package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
- package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
- package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
- package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
- package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
- package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
- package/docs/adrs/010-release-pipeline.md +89 -0
- package/docs/architecture/README.md +7 -0
- package/docs/architecture/refactor-audit.md +364 -0
- package/docs/authoring-v2.md +527 -0
- package/docs/authoring.md +873 -0
- package/docs/changelog-recent.md +201 -0
- package/docs/configuration.md +505 -0
- package/docs/ctc-mcp-proposal.md +518 -0
- package/docs/design/README.md +22 -0
- package/docs/design/agent-cascade-protocol.md +96 -0
- package/docs/design/autonomous-console-design-candidates.md +253 -0
- package/docs/design/autonomous-console-design-review.md +111 -0
- package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
- package/docs/design/claude-code-source-deep-dive.md +713 -0
- package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
- package/docs/design/console-execution-trace-candidates-final.md +160 -0
- package/docs/design/console-execution-trace-candidates.md +211 -0
- package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
- package/docs/design/console-execution-trace-design-review.md +74 -0
- package/docs/design/console-execution-trace-discovery.md +394 -0
- package/docs/design/console-execution-trace-final-review.md +77 -0
- package/docs/design/console-execution-trace-review.md +92 -0
- package/docs/design/console-performance-discovery.md +415 -0
- package/docs/design/console-ui-backlog.md +280 -0
- package/docs/design/daemon-architecture-discovery.md +853 -0
- package/docs/design/daemon-design-candidates.md +318 -0
- package/docs/design/daemon-design-review-findings.md +119 -0
- package/docs/design/daemon-engine-design-candidates.md +210 -0
- package/docs/design/daemon-engine-design-review.md +131 -0
- package/docs/design/daemon-execution-engine-discovery.md +280 -0
- package/docs/design/daemon-gap-analysis.md +554 -0
- package/docs/design/daemon-owns-console-plan.md +168 -0
- package/docs/design/daemon-owns-console-review.md +91 -0
- package/docs/design/daemon-owns-console.md +195 -0
- package/docs/design/data-model-erd.md +11 -0
- package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
- package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
- package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
- package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
- package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
- package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
- package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
- package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
- package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
- package/docs/design/list-workflows-latency-fix-plan.md +128 -0
- package/docs/design/list-workflows-latency-fix-review.md +55 -0
- package/docs/design/list-workflows-latency-fix.md +109 -0
- package/docs/design/native-context-management-api.md +11 -0
- package/docs/design/performance-sweep-2026-04.md +96 -0
- package/docs/design/routines-guide.md +219 -0
- package/docs/design/sequence-diagrams.md +11 -0
- package/docs/design/subagent-design-principles.md +220 -0
- package/docs/design/temporal-patterns-design-candidates.md +312 -0
- package/docs/design/temporal-patterns-design-review-findings.md +163 -0
- package/docs/design/test-isolation-from-config-file.md +335 -0
- package/docs/design/v2-core-design-locks.md +2746 -0
- package/docs/design/v2-lock-registry.json +734 -0
- package/docs/design/workflow-authoring-v2.md +1044 -0
- package/docs/design/workflow-docs-spec.md +218 -0
- package/docs/design/workflow-extension-points.md +687 -0
- package/docs/design/workrail-auto-trigger-system.md +359 -0
- package/docs/design/workrail-config-file-discovery.md +513 -0
- package/docs/docker.md +110 -0
- package/docs/generated/v2-lock-closure-plan.md +26 -0
- package/docs/generated/v2-lock-coverage.json +797 -0
- package/docs/generated/v2-lock-coverage.md +177 -0
- package/docs/ideas/backlog.md +3927 -0
- package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
- package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
- package/docs/ideas/implementation_plan.md +249 -0
- package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
- package/docs/implementation/02-architecture.md +316 -0
- package/docs/implementation/04-testing-strategy.md +124 -0
- package/docs/implementation/09-simple-workflow-guide.md +835 -0
- package/docs/implementation/13-advanced-validation-guide.md +874 -0
- package/docs/implementation/README.md +21 -0
- package/docs/integrations/claude-code.md +300 -0
- package/docs/integrations/firebender.md +315 -0
- package/docs/migration/v0.1.0.md +147 -0
- package/docs/naming-conventions.md +45 -0
- package/docs/planning/README.md +104 -0
- package/docs/planning/github-ticketing-playbook.md +195 -0
- package/docs/plans/README.md +24 -0
- package/docs/plans/agent-managed-ticketing-design.md +605 -0
- package/docs/plans/agentic-orchestration-roadmap.md +112 -0
- package/docs/plans/assessment-gates-engine-handoff.md +536 -0
- package/docs/plans/content-coherence-and-references.md +151 -0
- package/docs/plans/library-extraction-plan.md +340 -0
- package/docs/plans/mr-review-workflow-redesign.md +1451 -0
- package/docs/plans/native-context-management-epic.md +11 -0
- package/docs/plans/perf-fixes-design-candidates.md +225 -0
- package/docs/plans/perf-fixes-design-review-findings.md +61 -0
- package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
- package/docs/plans/perf-fixes-new-issues-review.md +110 -0
- package/docs/plans/prompt-fragments.md +53 -0
- package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
- package/docs/plans/ui-ux-workflow-discovery.md +100 -0
- package/docs/plans/ui-ux-workflow-review.md +48 -0
- package/docs/plans/v2-followup-enhancements.md +587 -0
- package/docs/plans/workflow-categories-candidates.md +105 -0
- package/docs/plans/workflow-categories-discovery.md +110 -0
- package/docs/plans/workflow-categories-review.md +51 -0
- package/docs/plans/workflow-discovery-model-candidates.md +94 -0
- package/docs/plans/workflow-discovery-model-discovery.md +74 -0
- package/docs/plans/workflow-discovery-model-review.md +48 -0
- package/docs/plans/workflow-source-setup-phase-1.md +245 -0
- package/docs/plans/workflow-source-setup-phase-2.md +361 -0
- package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
- package/docs/plans/workflow-staleness-detection-review.md +58 -0
- package/docs/plans/workflow-staleness-detection.md +80 -0
- package/docs/plans/workflow-v2-design.md +69 -0
- package/docs/plans/workflow-v2-roadmap.md +74 -0
- package/docs/plans/workflow-validation-design.md +98 -0
- package/docs/plans/workflow-validation-roadmap.md +108 -0
- package/docs/plans/workrail-platform-vision.md +420 -0
- package/docs/reference/agent-context-cleaner-snippet.md +94 -0
- package/docs/reference/agent-context-guidance.md +140 -0
- package/docs/reference/context-optimization.md +284 -0
- package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
- package/docs/reference/example-workflow-repository-template/README.md +268 -0
- package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
- package/docs/reference/external-workflow-repositories.md +916 -0
- package/docs/reference/feature-flags-architecture.md +472 -0
- package/docs/reference/feature-flags.md +349 -0
- package/docs/reference/god-tier-workflow-validation.md +272 -0
- package/docs/reference/loop-optimization.md +209 -0
- package/docs/reference/loop-validation.md +176 -0
- package/docs/reference/loops.md +465 -0
- package/docs/reference/mcp-platform-constraints.md +59 -0
- package/docs/reference/recovery.md +88 -0
- package/docs/reference/releases.md +177 -0
- package/docs/reference/troubleshooting.md +105 -0
- package/docs/reference/workflow-execution-contract.md +998 -0
- package/docs/roadmap/README.md +22 -0
- package/docs/roadmap/legacy-planning-status.md +103 -0
- package/docs/roadmap/now-next-later.md +70 -0
- package/docs/roadmap/open-work-inventory.md +389 -0
- package/docs/tickets/README.md +39 -0
- package/docs/tickets/next-up.md +76 -0
- package/docs/workflow-management.md +317 -0
- package/docs/workflow-templates.md +423 -0
- package/docs/workflow-validation.md +184 -0
- package/docs/workflows.md +254 -0
- package/package.json +3 -1
- package/spec/authoring-spec.json +61 -16
- package/workflows/workflow-for-workflows.json +252 -93
- package/workflows/workflow-for-workflows.v2.json +188 -77
|
@@ -0,0 +1,96 @@
|
|
|
1
|
+
# Performance Sweep -- April 2026
|
|
2
|
+
|
|
3
|
+
**Date:** 2026-04-07
|
|
4
|
+
**Status:** Discovery complete, issues filed
|
|
5
|
+
|
|
6
|
+
Six parallel discovery agents audited the full workrail codebase for performance and efficiency issues. This document consolidates all findings.
|
|
7
|
+
|
|
8
|
+
## Cross-cutting pattern
|
|
9
|
+
|
|
10
|
+
Every layer independently re-reads and re-computes from raw data on every call. Nothing is shared between layers. The same session event log is scanned 10+ times per `continue_workflow` call across the engine, prompt renderer, and session store.
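
One way to picture the fix for this pattern is a single-pass index that is built when the session is loaded and then handed to each layer. The sketch below is illustrative only: `SessionEvent`, `SessionIndex`, and `buildSessionIndex` are hypothetical names, and the event fields are assumptions, since the real WorkRail event schema is not shown in this document.

```typescript
// Hypothetical event shape (assumption; not the real WorkRail schema).
interface SessionEvent {
  readonly kind: string;
  readonly runId: string;
  readonly dedupeKey: string;
}

// Hypothetical index: built once per loaded session, shared by all consumers.
interface SessionIndex {
  readonly byKind: ReadonlyMap<string, readonly SessionEvent[]>;
  readonly byRunId: ReadonlyMap<string, readonly SessionEvent[]>;
  readonly dedupeKeys: ReadonlySet<string>;
}

// Append a value to the bucket for `key`, creating the bucket on first use.
function appendTo<K, V>(map: Map<K, V[]>, key: K, value: V): void {
  const bucket = map.get(key);
  if (bucket) bucket.push(value);
  else map.set(key, [value]);
}

// One scan over the event log replaces the repeated per-layer scans.
function buildSessionIndex(events: readonly SessionEvent[]): SessionIndex {
  const byKind = new Map<string, SessionEvent[]>();
  const byRunId = new Map<string, SessionEvent[]>();
  const dedupeKeys = new Set<string>();
  for (const event of events) {
    appendTo(byKind, event.kind, event);
    appendTo(byRunId, event.runId, event);
    dedupeKeys.add(event.dedupeKey);
  }
  return { byKind, byRunId, dedupeKeys };
}
```

The engine, prompt renderer, and session store would then accept the index as a parameter rather than each re-scanning the raw event array.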
|
|
11
|
+
|
|
12
|
+
## Findings by area
|
|
13
|
+
|
|
14
|
+
### 1. Session store & persistence (`src/v2/infra/local/session-store/`)
|
|
15
|
+
|
|
16
|
+
- `appendImpl` calls `loadTruthOrEmpty()` before every write -- a full manifest + all segment reads -- even though `ExecutionSessionGateV2` already loaded the session (double disk read per write)
|
|
17
|
+
- Two separate `open/write/fsync/close` cycles when snapshot pins are present; should be one
|
|
18
|
+
- 200 sequential `stat` calls in `readdirWithMtime` (for-loop, one at a time)
|
|
19
|
+
- Segment files read sequentially despite being independent and immutable once written
|
|
20
|
+
- `loadHealthySummaries` loads sessions sequentially with no concurrency cap and no cache
|
|
21
|
+
- `validateAppendPlan` re-runs Zod parse on every event in the plan -- already trusted data
|
|
22
|
+
- Full event payloads read in `loadTruthOrEmpty` just to extract `dedupeKey` fields
|
|
23
|
+
- `new TextDecoder()` allocated per segment read (should be module-level singleton)
|
|
24
|
+
- `mkdirp(eventsDir)` called on every `append`, not just session creation
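
The serial `stat` loop and the sequential segment reads listed above are both cases where independent I/O could run with bounded concurrency. The helper below is a minimal sketch under that assumption; `mapWithConcurrency` is a hypothetical name and is not part of the WorkRail codebase.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in input order regardless of completion order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, runLane));
  return results;
}
```

A directory listing could then be stat-ed with something like `mapWithConcurrency(paths, 16, (p) => fs.stat(p))`, where the limit of 16 is an arbitrary example value rather than a measured recommendation.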
|
|
25
|
+
|
|
26
|
+
### 2. V2 engine core (`src/v2/durable-core/`, `src/mcp/handlers/v2-execution/`)
|
|
27
|
+
|
|
28
|
+
- `continue_workflow` scans `truth.events` 6+ times per call across `continue-advance.ts`, `input-validation.ts`, `replay.ts` with no shared state
|
|
29
|
+
- Session loaded from disk a second time after the advance completes; same events scanned again
|
|
30
|
+
- `projectRunContextV2` called in `validateAdvanceInputs` then again inside `renderPendingPrompt`
|
|
31
|
+
- `projectAssessmentsV2` runs full scan on every step even when no assessment events exist
|
|
32
|
+
- Sortedness validation repeated in every projection (4+ times per advance) on data the store guarantees is sorted
|
|
33
|
+
- `createWorkflow(pinned.definition)` called on every advance for the same immutable workflow hash -- never cached
|
|
34
|
+
- `pinnedStore.get()` called twice on first-advance path when pin already found
|
|
35
|
+
- `deriveWorkflowHashRef` called 3 times with the same input per advance
|
|
36
|
+
- `hasPriorNotesInRun` adds a 4th+ event scan inside `renderPendingPrompt`
|
|
37
|
+
|
|
38
|
+
### 3. Workflow loading & registry (`src/infrastructure/storage/`, `src/mcp/handlers/`)
|
|
39
|
+
|
|
40
|
+
- N+1 `getWorkflowById` calls per `list_workflows`: 1 list + N individual fetches, then full 5-pass compilation + SHA-256 hash + disk read per workflow on every call
|
|
41
|
+
- New AJV instance + schema compilation on every request (`createWorkflowReaderForRequest`)
|
|
42
|
+
- Recursive filesystem walk of all remembered-root directories per request with workspace signal
|
|
43
|
+
- `CachingWorkflowStorage` uses linear `find` scan instead of `Map` lookup
|
|
44
|
+
- `listWorkflowSummaries` triggers full validation pass just to return metadata fields
|
|
45
|
+
- `statSync` blocking event loop in index build (`FileWorkflowStorage.buildWorkflowIndex`)
|
|
46
|
+
- `workflow.schema.json` re-read and JSON.parsed on every `workflow_get_schema` call
|
|
47
|
+
- `listWorkflowSummaries` and `loadAllWorkflows` run as two independent index reads in parallel instead of sharing one
|
|
48
|
+
|
|
49
|
+
### 4. MCP handler layer (`src/mcp/handlers/`, `src/mcp/handler-factory.ts`)
|
|
50
|
+
|
|
51
|
+
- Output schema `.parse()` on every hot-path response on data the server itself produced (handlers for `continue_workflow`, `start_workflow`, `list_workflows`, etc. all call `Schema.parse()` on their own output)
|
|
52
|
+
- `V2BlockerReportSchema.superRefine()` runs O(n log n) duplicate-check on every parse
|
|
53
|
+
- `process.env.WORKRAIL_CLEAN_RESPONSE_FORMAT` read as string comparison per call (not cached)
|
|
54
|
+
- `coerceJsonStringObjectFields` rebuilds object-field set from schema shape per call
|
|
55
|
+
- `JSON.stringify(..., null, 2)` with indentation on all machine-to-machine wire responses
|
|
56
|
+
- `getV2ExecutionRenderEnvelope` called twice per non-execution response
|
|
57
|
+
- Schema shape re-traversed on every validation error for suggestion generation
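
To illustrate the indentation point above: pretty-printing only adds whitespace, so a machine-to-machine response can be emitted compactly without changing what the client parses. `serializeWireResponse` is a hypothetical helper used for illustration, not an existing WorkRail function.

```typescript
// Serialize a response for the wire. Indentation is opt-in because it only
// adds bytes for machine consumers; the parsed value is identical either way.
function serializeWireResponse(value: unknown, pretty = false): string {
  return pretty ? JSON.stringify(value, null, 2) : JSON.stringify(value);
}
```

Keeping the `pretty` switch makes it possible to retain indented output for human-facing debugging paths while the hot path uses the compact form.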
|
|
58
|
+
|
|
59
|
+
### 5. Console service & data projection (`src/v2/usecases/console-service.ts`)
|
|
60
|
+
|
|
61
|
+
- Full 500-session disk load + projection rebuild on every `/api/v2/sessions` request, no caching
|
|
62
|
+
- `/api/v2/worktrees` calls `getSessionList()` a second time (double the I/O)
|
|
63
|
+
- `projectRunDagV2` called 3-4 times on the same event array per session per request
|
|
64
|
+
- `resolveRunCompletion` always re-projects the DAG from events even when caller has it
|
|
65
|
+
- `projectRunStatusSignalsV2` internally calls `projectRunDagV2` + `projectGapsV2` again
|
|
66
|
+
- `projectSessionHealthV2` calls `projectRunDagV2` yet again
|
|
67
|
+
- `projectNodeOutputsV2` called twice per session summary (title extraction + recap)
|
|
68
|
+
- `projectNodeDetail` runs 5 independent full event-log scans sequentially
|
|
69
|
+
- `loadSegmentsRecursive` O(N^2) array allocations via spread per segment
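
Several of the findings above come down to rebuilding the same projection for a session whose event log has not changed. A minimal sketch of a cache keyed by session ID and validated by modification time follows; `ProjectionCache` is a hypothetical name, and the sketch assumes every append to a session bumps the `mtime` the caller passes in.

```typescript
// Cache one projection per session, reused while the mtime is unchanged.
class ProjectionCache<T> {
  private readonly entries = new Map<string, { mtimeMs: number; value: T }>();

  // Return the cached projection if the session is unchanged, else recompute.
  get(sessionId: string, mtimeMs: number, compute: () => T): T {
    const hit = this.entries.get(sessionId);
    if (hit !== undefined && hit.mtimeMs === mtimeMs) return hit.value;
    const value = compute();
    this.entries.set(sessionId, { mtimeMs, value });
    return value;
  }

  // Number of sessions currently cached (useful for eviction decisions).
  get size(): number {
    return this.entries.size;
  }
}
```

This sketch has no eviction policy; a real cache for a 500-session list would likely need a size bound, and filesystems with coarse timestamp granularity could require an additional freshness signal.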
|
|
70
|
+
|
|
71
|
+
### 6. Prompt rendering & content assembly (`src/v2/durable-core/domain/`)
|
|
72
|
+
|
|
73
|
+
- `renderPendingPrompt` runs 3 independent full-event-log projections (`projectRunContextV2`, `projectRunDagV2`, `projectNodeOutputsV2`) plus `hasPriorNotesInRun` scan
|
|
74
|
+
- `resolveParentLoopStep` and `getStepById` both do double-nested workflow traversal on every render
|
|
75
|
+
- `expandFunctionDefinitions` re-searches workflow definition on every call
|
|
76
|
+
- `buildChain`/`buildPathBackward` in `recap-recovery.ts` allocate O(N^2) `Set` objects per ancestry traversal
|
|
77
|
+
- `renderBudgetedRehydrateRecovery` encodes the same string 3 times in the budget-trim loop
|
|
78
|
+
- Tier lookup functions use `Array.find` over constant 2-3 element arrays (should be `Record`)
|
|
79
|
+
- Shared mutable global `g`-flag regex in `context-template-resolver.ts` (latent correctness bug)
|
|
80
|
+
- `dotPath.split('.')` allocates new array on every template token match
|
|
81
|
+
- `JSON.stringify` for node deduplication equality in `projectRunDagV2`
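
The `g`-flag regex item above is flagged as a correctness issue rather than a performance one because a global regex keeps `lastIndex` state between calls. The actual pattern in `context-template-resolver.ts` is not shown in this document, so the regex and function names below are assumptions chosen only to demonstrate the class of bug.

```typescript
// A module-level regex with the `g` flag is stateful: `test` and `exec`
// resume from `lastIndex`, which persists across calls.
const SHARED_TOKEN = /\{\{([\w.]+)\}\}/g;

// Unsafe: results depend on what previous calls left in `lastIndex`.
function hasTokenUnsafe(text: string): boolean {
  return SHARED_TOKEN.test(text);
}

// Safe: a non-global regex always starts matching at index 0.
const TOKEN_PROBE = /\{\{([\w.]+)\}\}/;
function hasTokenSafe(text: string): boolean {
  return TOKEN_PROBE.test(text);
}

const first = hasTokenUnsafe("{{a}}");  // matches; lastIndex is now 5
const second = hasTokenUnsafe("{{a}}"); // search starts at 5; no match
const third = hasTokenUnsafe("{{a}}");  // lastIndex was reset; matches again
```

If the shared regex must stay global (for example for `replace` over all tokens), resetting `lastIndex = 0` before each `test`/`exec` use is another way to avoid the stale state.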
|
|
82
|
+
|
|
83
|
+
## Highest-leverage fixes
|
|
84
|
+
|
|
85
|
+
| Priority | Fix | Areas | Issues |
|
|
86
|
+
|---|---|---|---|
|
|
87
|
+
| 1 | `SessionIndex`: build once at load, thread through engine + renderer | Engine, renderer | #248 |
|
|
88
|
+
| 2 | `(sessionId, mtime)` projection cache in console service | Console | #249 |
|
|
89
|
+
| 3 | Remove output-side Zod `.parse()` on server-produced responses | MCP | #250 |
|
|
90
|
+
| 4 | Thread loaded session into `appendImpl` (eliminate double disk read) | Session store | #252 |
|
|
91
|
+
| 5 | Cache `createWorkflow` by hash; fix AJV singleton; fix fs walk | Engine, workflows | #254, #256 |
|
|
92
|
+
| 6 | Parallelize serial I/O (stat loop, segment reads) | Session store | #253 |
|
|
93
|
+
| 7 | Pre-index step/loop/function lookups at Workflow construction | Engine, renderer | #255 |
|
|
94
|
+
| 8 | Fix N+1 workflow fetches and recursive fs walk per request | Workflows | #256 |
|
|
95
|
+
| 9 | Fix serialization overhead (JSON indent, env vars, coercion) | MCP | #251 |
|
|
96
|
+
| 10 | Fix O(N^2) ancestry + budget loop re-encoding + minor allocations | Renderer | #257 |
|
|
@@ -0,0 +1,219 @@
|
|
|
1
|
+
# Routines Guide — Three Consumption Modes
|
|
2
|
+
|
|
3
|
+
Routines are reusable cognitive workflows defined as JSON in `workflows/routines/`.
|
|
4
|
+
They can be consumed in three ways, each suited to different orchestration needs.
|
|
5
|
+
|
|
6
|
+
## Mode 1: Delegation (WorkRail Executor)
|
|
7
|
+
|
|
8
|
+
The primary agent delegates a routine to a **WorkRail Executor subagent** at runtime.
|
|
9
|
+
The subagent runs the routine's steps independently and returns output to the parent.
|
|
10
|
+
|
|
11
|
+
**When to use**: bounded cognitive tasks (design generation, hypothesis challenge, plan analysis)
|
|
12
|
+
where the parent agent wants to continue working in parallel.
|
|
13
|
+
|
|
14
|
+
**How it works**:
|
|
15
|
+
1. Parent agent spawns a WorkRail Executor with a `routineId`
|
|
16
|
+
2. The executor runs the routine's steps sequentially
|
|
17
|
+
3. Output flows back to the parent via the session
|
|
18
|
+
|
|
19
|
+
**Example** (in a workflow step prompt):
|
|
20
|
+
```
|
|
21
|
+
Spawn ONE WorkRail Executor running `routine-tension-driven-design` with your
|
|
22
|
+
tensions, philosophy sources, and problem understanding as input.
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
## Mode 2: Direct Execution (Agent Follows Steps)
|
|
26
|
+
|
|
27
|
+
An agent reads the routine definition and **follows its steps directly** as structured guidance.
|
|
28
|
+
No subagent spawning — the agent itself executes each step in sequence.
|
|
29
|
+
|
|
30
|
+
**When to use**: when the agent IS the executor (e.g., inside a WorkRail Executor session),
|
|
31
|
+
or when delegation overhead isn't justified.
|
|
32
|
+
|
|
33
|
+
**How it works**:
|
|
34
|
+
1. Agent loads the routine JSON
|
|
35
|
+
2. Agent executes each step's prompt in order
|
|
36
|
+
3. Agent produces the deliverable described in the final step
|
|
37
|
+
|
|
38
|
+
## Mode 3: Injection (Compile-Time Template Expansion)
|
|
39
|
+
|
|
40
|
+
A workflow references a routine via a `type: "template_call"` step, and the **compiler expands the routine's
|
|
41
|
+
steps inline** at compile time. The routine's steps become first-class workflow steps.
|
|
42
|
+
|
|
43
|
+
**When to use**: when routine steps should be visible in the workflow's step list, participate
|
|
44
|
+
in confirmation gates, and be tracked individually in the session.
|
|
45
|
+
|
|
46
|
+
**How it works**:
|
|
47
|
+
1. A workflow step declares `type: "template_call"` along with the routine's template ID and args
|
|
48
|
+
2. At compile time, the template registry expands the routine into real steps
|
|
49
|
+
3. The expanded steps replace the template call step in the compiled workflow
|
|
50
|
+
4. `{arg}` placeholders in prompts are substituted; `{{contextVar}}` is preserved for runtime
|
|
51
|
+
|
|
52
|
+
**Template ID convention**:
|
|
53
|
+
- Routine `routine-tension-driven-design` → template ID `wr.templates.routine.tension-driven-design`
|
|
54
|
+
- The `routine-` prefix is stripped automatically
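
The convention above can be sketched as a small mapping function. `routineIdToTemplateId` is a hypothetical name, and the handling of IDs that lack the `routine-` prefix is an assumption; the document only specifies the prefixed case.

```typescript
// Map a routine ID to its template ID by stripping the `routine-` prefix
// and placing the remainder under the `wr.templates.routine.` namespace.
function routineIdToTemplateId(routineId: string): string {
  const prefix = "routine-";
  // Assumption: IDs without the prefix are passed through unchanged.
  const shortName = routineId.startsWith(prefix)
    ? routineId.slice(prefix.length)
    : routineId;
  return `wr.templates.routine.${shortName}`;
}
```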
|
|
55
|
+
|
|
56
|
+
**Example** (in workflow JSON):
|
|
57
|
+
```json
|
|
58
|
+
{
|
|
59
|
+
"type": "template_call",
|
|
60
|
+
"templateId": "wr.templates.routine.tension-driven-design",
|
|
61
|
+
"args": {
|
|
62
|
+
"deliverableName": "design-candidates.md"
|
|
63
|
+
}
|
|
64
|
+
}
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
**What happens at compile time**:
|
|
68
|
+
- The step above is replaced by the routine's 5 steps (step-discover-philosophy, step-understand-deeply, etc.)
|
|
69
|
+
- Each expanded step ID is prefixed using the compiler's provenance/step identity rules
|
|
70
|
+
- `{deliverableName}` in prompts becomes `design-candidates.md`
|
|
71
|
+
- The routine's `metaGuidance` is injected as step-level `guidance` on each expanded step
|
|
72
|
+
- `preconditions` and `clarificationPrompts` are NOT included (parent workflow handles those)
|
|
73
|
+
|
|
74
|
+
**Constraints**:
|
|
75
|
+
- Routine steps must NOT contain nested `template_call` usage (no recursive injection)
|
|
76
|
+
- All `{arg}` placeholders must be satisfied by the template call's `args`
|
|
77
|
+
- Arg values must be primitives (string, number, boolean) — objects/arrays are rejected
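
The substitution rules and constraints above can be illustrated with a short sketch. This is not the compiler's implementation (that lives in `template-registry.ts` and is not reproduced here); `substituteArgs` and the exact placeholder grammar are assumptions made for illustration.

```typescript
// Replace single-brace `{arg}` placeholders while leaving double-brace
// `{{contextVar}}` tokens untouched for runtime resolution.
function substituteArgs(prompt: string, args: Record<string, unknown>): string {
  // Constraint: arg values must be primitives (string, number, boolean).
  for (const key of Object.keys(args)) {
    const kind = typeof args[key];
    if (kind !== "string" && kind !== "number" && kind !== "boolean") {
      throw new Error(`Arg "${key}" must be a string, number, or boolean`);
    }
  }

  // The first alternative consumes `{{...}}` whole so the second alternative
  // never sees its inner braces; only `{identifier}` captures a name.
  const pattern = /\{\{[^{}]*\}\}|\{([A-Za-z_][A-Za-z0-9_]*)\}/g;
  return prompt.replace(pattern, (match: string, argName: string | undefined) => {
    if (argName === undefined) return match; // runtime token, keep as-is
    // Constraint: every `{arg}` placeholder must be satisfied.
    if (!Object.prototype.hasOwnProperty.call(args, argName)) {
      throw new Error(`Missing value for placeholder {${argName}}`);
    }
    return String(args[argName]);
  });
}
```

For example, `substituteArgs("Write {deliverableName} using {{contextVar}}", { deliverableName: "design-candidates.md" })` yields `Write design-candidates.md using {{contextVar}}` in this sketch.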
|
|
78
|
+
|
|
79
|
+
## Comparison
|
|
80
|
+
|
|
81
|
+
| Aspect | Delegation | Direct Execution | Injection |
|
|
82
|
+
|---|---|---|---|
|
|
83
|
+
| When resolved | Runtime | Runtime | Compile time |
|
|
84
|
+
| Parallelism | Yes (subagent) | No | N/A (steps are inline) |
|
|
85
|
+
| Step visibility | Opaque to parent | Transparent | Fully visible |
|
|
86
|
+
| Confirmation gates | Subagent only | Agent decides | Per-step as authored |
|
|
87
|
+
| Session tracking | Separate session | Same session | Same session, per-step |
|
|
88
|
+
| Arg substitution | Via context | Via context | `{arg}` → compile-time |
|
|
89
|
+
|
|
90
|
+
## Selection guidance
|
|
91
|
+
|
|
92
|
+
Choosing the right consumption mode matters as much as choosing the right routine.
|
|
93
|
+
|
|
94
|
+
### Default decision rule
|
|
95
|
+
|
|
96
|
+
Use this order unless you have a strong reason not to:
|
|
97
|
+
|
|
98
|
+
- **Use injection (`templateCall`) by default** when the routine is part of the parent workflow's authored structure.
|
|
99
|
+
- **Use delegation** when the routine's value comes from an independent perspective, parallelism, or an intentionally opaque bounded audit.
|
|
100
|
+
- **Use extension points only to make delegation seams overridable**, not as a substitute for routine injection.
|
|
101
|
+
|
|
102
|
+
A good litmus test:
|
|
103
|
+
|
|
104
|
+
- If you want the routine's **steps to appear in the parent workflow**, use **injection**.
|
|
105
|
+
- If you want the routine's **result but not its internal steps**, use **delegation**.
|
|
106
|
+
- If you want a team to **swap which delegated implementation is called** without forking the parent workflow, add an **extension point** around that delegated seam.
|
|
107
|
+
|
|
108
|
+
### What extension points do not do
|
|
109
|
+
|
|
110
|
+
`extensionPoints` and `{{wr.bindings.*}}` do **not** inject a routine into the parent workflow.
|
|
111
|
+
|
|
112
|
+
They only resolve a slot to a routine/workflow ID in prompt text at compile time. The parent agent still decides whether to call or follow that bound implementation at runtime.
|
|
113
|
+
|
|
114
|
+
Because binding resolution runs **after** template expansion, extension points cannot currently choose which routine gets injected via `templateCall`.
|
|
115
|
+
|
|
116
|
+
### Prefer delegation when
|
|
117
|
+
|
|
118
|
+
- an independent cognitive perspective adds value
|
|
119
|
+
- the parent can continue useful work in parallel
|
|
120
|
+
- the routine is acting as an auditor, challenger, or verifier
|
|
121
|
+
- the routine's internal steps do not need to be visible as first-class parent workflow steps
|
|
122
|
+
|
|
123
|
+
Common examples:
|
|
124
|
+
|
|
125
|
+
- context completeness / depth audits
|
|
126
|
+
- adversarial hypothesis challenge
|
|
127
|
+
- philosophy alignment review
|
|
128
|
+
- final verification from a fresh perspective
|
|
129
|
+
|
|
130
|
+
### Prefer direct execution when
|
|
131
|
+
|
|
132
|
+
- delegation overhead is not justified
|
|
133
|
+
- the current agent is already the natural executor
|
|
134
|
+
- step visibility is unnecessary
|
|
135
|
+
- the routine is mainly a reusable thinking scaffold, not a separate perspective
|
|
136
|
+
|
|
137
|
+
### Prefer injection when
|
|
138
|
+
|
|
139
|
+
- the routine's steps should be visible in the parent workflow
|
|
140
|
+
- confirmation behavior should apply per injected step
|
|
141
|
+
- session traceability matters
|
|
142
|
+
- the routine is central enough to the parent workflow that hiding it behind opaque delegation would reduce debuggability
|
|
143
|
+
- the author wants the engine, not the agent, to own that reusable subflow
|
|
144
|
+
|
|
145
|
+
Common examples:
|
|
146
|
+
|
|
147
|
+
- reusable design-generation cores
|
|
148
|
+
- reusable final-verification skeletons
|
|
149
|
+
- bounded reusable subflows the author wants Studio/session visibility for
|
|
150
|
+
|
|
151
|
+
## Auditor-first guidance
|
|
152
|
+
|
|
153
|
+
For many high-value routines, the best default mental model is **auditor**, not **task owner**.
|
|
154
|
+
|
|
155
|
+
That means the parent workflow:
|
|
156
|
+
|
|
157
|
+
- gathers or synthesizes the current state
|
|
158
|
+
- delegates a bounded audit/challenge/verification package
|
|
159
|
+
- interprets the returned artifact as evidence
|
|
160
|
+
|
|
161
|
+
not as canonical truth.
|
|
162
|
+
|
|
163
|
+
This is often a better fit than executor-style delegation for:
|
|
164
|
+
|
|
165
|
+
- review workflows
|
|
166
|
+
- planning workflows
|
|
167
|
+
- verification-heavy workflows
|
|
168
|
+
|
|
169
|
+
## High-value routine defaults
|
|
170
|
+
|
|
171
|
+
The current routine catalog suggests these default uses:
|
|
172
|
+
|
|
173
|
+
- `routine-context-gathering`: completeness/depth audit or bounded context expansion
|
|
174
|
+
- `routine-hypothesis-challenge`: adversarial challenge against the current leading story
|
|
175
|
+
- `routine-execution-simulation`: bounded runtime/flow reasoning where mental execution adds value
|
|
176
|
+
- `routine-philosophy-alignment`: review against user/repo principles
|
|
177
|
+
- `routine-final-verification`: proof-oriented end-state validation
|
|
178
|
+
|
|
179
|
+
## Good and bad fits
|
|
180
|
+
|
|
181
|
+
### Good fit for delegation
|
|
182
|
+
|
|
183
|
+
- an adversarial reviewer challenging the current recommendation
|
|
184
|
+
- a philosophy/policy auditor checking alignment against repo rules
|
|
185
|
+
- a fresh final verifier evaluating whether evidence really supports the conclusion
|
|
186
|
+
|
|
187
|
+
### Bad fit for delegation
|
|
188
|
+
|
|
189
|
+
- tiny deterministic transformations that the parent can do faster directly
|
|
190
|
+
- parent-owned loop decisions or canonical synthesis
|
|
191
|
+
- work where hiding the internal steps would make the session harder to debug
|
|
192
|
+
|
|
193
|
+
### Good fit for injection
|
|
194
|
+
|
|
195
|
+
- a reusable multi-step authoring scaffold the parent wants visible in the step list
|
|
196
|
+
- a reusable verification sequence that should honor parent confirmation gates
|
|
197
|
+
|
|
198
|
+
### Bad fit for injection
|
|
199
|
+
|
|
200
|
+
- every small repeated instruction block
|
|
201
|
+
- routines whose value comes mainly from independent perspective rather than visible sub-steps
|
|
202
|
+
|
|
203
|
+
### Prefer extension points when
|
|
204
|
+
|
|
205
|
+
- the parent workflow intentionally delegates a bounded seam
|
|
206
|
+
- teams may want to replace that delegated implementation per project
|
|
207
|
+
- the parent workflow still owns synthesis, loop control, and final decisions
|
|
208
|
+
|
|
209
|
+
### Bad fit for extension points
|
|
210
|
+
|
|
211
|
+
- using `{{wr.bindings.*}}` where the real goal is inline routine structure
|
|
212
|
+
- hiding a core parent subflow behind a rebinding slot just to avoid hardcoding a routine ID
|
|
213
|
+
- expecting project bindings to change which routine a `templateCall` injects
|
|
214
|
+
|
|
215
|
+
## See Also
|
|
216
|
+
|
|
217
|
+
- `workflows/examples/routine-injection-example.json` — example workflow using injection
|
|
218
|
+
- `src/application/services/compiler/template-registry.ts` — injection implementation
|
|
219
|
+
- `src/application/services/compiler/routine-loader.ts` — routine loading from disk
|
|
@@ -0,0 +1,11 @@
|
|
|
1
|
+
# Sequence Diagrams for Native Context Management
|
|
2
|
+
|
|
3
|
+
> **Not pursuing**
|
|
4
|
+
>
|
|
5
|
+
> WorkRail is not planning to implement native context management.
|
|
6
|
+
>
|
|
7
|
+
> This file is kept only as a stable tombstone so old links do not break.
|
|
8
|
+
>
|
|
9
|
+
> See:
|
|
10
|
+
> - `docs/roadmap/legacy-planning-status.md`
|
|
11
|
+
> - `docs/plans/native-context-management-epic.md`
|
|
@@ -0,0 +1,220 @@
|
|
|
1
|
+
# Subagent Design Principles & Catalog
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
This document defines WorkRail's approach to subagent design for agentic IDEs. It outlines the core principles, patterns, and catalog of specialized subagents that enhance WorkRail workflows.
|
|
6
|
+
|
|
7
|
+
**Philosophy:** Subagents are **specialized cognitive functions**, not task owners. They execute complete, autonomous routines and return structured deliverables to the main agent orchestrator.
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## Core Principles
|
|
12
|
+
|
|
13
|
+
### 1. **Cognitive Specialization, Not Task Ownership**
|
|
14
|
+
|
|
15
|
+
**Good:** "Context Researcher" - Specializes in deep reading and systematic exploration
|
|
16
|
+
**Bad:** "Debugger" - Too broad, owns entire debugging workflow
|
|
17
|
+
|
|
18
|
+
**Rule:** Subagents should embody a **specific cognitive mode** (exploration, challenge, verification) that can be applied across many workflows, not own a complete workflow themselves.
|
|
19
|
+
|
|
20
|
+
### 2. **Stateless & Self-Contained**
|
|
21
|
+
|
|
22
|
+
Each subagent invocation is independent:
|
|
23
|
+
- **No memory** between calls
|
|
24
|
+
- **No conversational refinement**
|
|
25
|
+
- **No follow-up questions**
|
|
26
|
+
|
|
27
|
+
**Implication:** The main agent must provide **all necessary context upfront** in a single, complete work package.
|
|
28
|
+
|
|
29
|
+
**Pattern:**
|
|
30
|
+
```
|
|
31
|
+
Main Agent → Subagent: [Complete Context Package]
|
|
32
|
+
Subagent: [Autonomous Execution]
|
|
33
|
+
Subagent → Main Agent: [Structured Deliverable]
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
### 3. **Autonomous Routine Execution**
|
|
37
|
+
|
|
38
|
+
Subagents execute **complete routines** from start to finish:
|
|
39
|
+
- Receive: Self-contained work package with all context
|
|
40
|
+
- Execute: Multi-step routine autonomously
|
|
41
|
+
- Return: Named, structured artifact (e.g., `execution-flow.md`)
|
|
42
|
+
|
|
43
|
+
**Not this:** Iterative back-and-forth, gradual context building, conversational refinement.
|
|
44
|
+
|
|
45
|
+
### 4. **Depth-Aware Investigation**

For research/exploration tasks, subagents support **configurable depth levels** to balance speed vs thoroughness:

| Level | Name | Time | Use Case |
|-------|------|------|----------|
| 0 | Survey | 1-2 min | "What exists here?" |
| 1 | Scan | 5-10 min | "What are the major components?" |
| 2 | Explore | 15-30 min | "What does each component do?" |
| 3 | Analyze | 30-60 min | "How does this specific logic work?" |
| 4 | Dissect | 60+ min | "What is every line doing?" |

Main agent chooses depth based on uncertainty and importance.

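
The depth table above maps naturally onto an ordered enum. The sketch below is one possible selection rule, not the documented one: the text only says depth depends on uncertainty and importance, so the 0.0-1.0 scores, the `choose_depth` name, and the "take the larger score" rule are assumptions made for illustration.

```python
from enum import IntEnum


class Depth(IntEnum):
    """Depth levels from the table above."""
    SURVEY = 0   # 1-2 min: "What exists here?"
    SCAN = 1     # 5-10 min: "What are the major components?"
    EXPLORE = 2  # 15-30 min: "What does each component do?"
    ANALYZE = 3  # 30-60 min: "How does this specific logic work?"
    DISSECT = 4  # 60+ min: "What is every line doing?"


def choose_depth(uncertainty: float, importance: float) -> Depth:
    """Map two 0.0-1.0 scores to a depth level (hypothetical rule).

    The larger of the two scores is scaled onto the five levels, so either
    high uncertainty or high importance is enough to justify going deeper.
    """
    for name, value in (("uncertainty", uncertainty), ("importance", importance)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    score = max(uncertainty, importance)
    return Depth(min(int(score * 5), 4))
```

A caller would then pass the chosen level along with the work package, for example `choose_depth(uncertainty=0.5, importance=0.2)` for a moderately unclear, low-stakes question.
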
### 5. **Structured Deliverables**

Every subagent routine produces a **named artifact** with a **consistent structure**:

**Standard Output Format:**
```markdown
### Summary (3-5 bullets)
- Key findings

### Detailed Findings
- Component breakdowns
- File citations (file:line)

### Suspicious Points / Concerns / Gaps
- What could be problematic
- What couldn't be determined

### Recommendations
- What main agent should do next
```

**Deliverable Quality Gates:**

Main agent validates each deliverable against these criteria:
- **Completeness**: All required sections present
- **Citations**: File:line references for all findings
- **Gaps Section**: Explicit about limitations and unknowns
- **Actionability**: Clear next steps or recommendations

**If a deliverable fails quality gates**, the main agent should:
1. Note the gaps in the workflow context
2. Decide if the partial deliverable is sufficient
3. Optionally re-run with clarified context (not automatic)

**Artifact Naming Convention:** Use kebab-case for filenames:
- `execution-flow.md`
- `hypothesis-challenges.md`
- `plan-analysis.md`

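
The quality gates and the naming convention above lend themselves to a mechanical pre-check before the main agent reads a deliverable in full. The following is a rough sketch under stated assumptions: `failed_gates`, the citation regex, and the idea of treating each `- ` bullet under "Detailed Findings" as one finding are illustrative choices, and the "naming" check is added for convenience rather than being one of the four listed gates.

```python
import re

REQUIRED_SECTIONS = (
    "### Summary",
    "### Detailed Findings",
    "### Suspicious Points / Concerns / Gaps",
    "### Recommendations",
)
# Assumed citation shape: a path with an extension, a colon, a line number.
CITATION = re.compile(r"[\w./-]+\.\w+:\d+")
KEBAB_MD = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*\.md")


def section_bullets(text: str, heading: str) -> list[str]:
    """Return the '- ' bullet lines under `heading`, up to the next '### '."""
    bullets, inside = [], False
    for line in text.splitlines():
        if line.startswith("### "):
            inside = line.startswith(heading)
        elif inside and line.startswith("- "):
            bullets.append(line)
    return bullets


def failed_gates(filename: str, text: str) -> list[str]:
    """Return the names of the gates that `text` fails (empty list = pass)."""
    headings = [line for line in text.splitlines() if line.startswith("### ")]
    failures = []
    if not all(any(h.startswith(req) for h in headings) for req in REQUIRED_SECTIONS):
        failures.append("completeness")
    findings = section_bullets(text, "### Detailed Findings")
    if not findings or not all(CITATION.search(b) for b in findings):
        failures.append("citations")
    if not section_bullets(text, "### Suspicious Points / Concerns / Gaps"):
        failures.append("gaps")
    if not section_bullets(text, "### Recommendations"):
        failures.append("actionability")
    if not KEBAB_MD.fullmatch(filename):
        failures.append("naming")
    return failures
```

A non-empty result does not trigger a re-run by itself. In line with the steps above, the main agent would record the failed gates in the workflow context and then decide whether the partial deliverable is good enough.
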
### 6. **Explicit Over Implicit**

While agentic IDEs support auto-invocation (system picks subagent based on task description), **WorkRail workflows use explicit delegation**:

```
Use: task(subagent_type="context-researcher", prompt="...")
Not: "Hey, someone gather context for me" (auto-invoke)
```

**Rationale:** Predictability, debuggability, user understanding.

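
To make the contrast concrete, here is a small sketch of what explicit delegation implies: the caller names the subagent, and an unknown name is an error rather than a guess. The `task` signature mirrors the pseudo-call above, but the `SUBAGENTS` registry, the handler bodies, and the `hypothesis-challenger` identifier are hypothetical stand-ins, not the interface of any real IDE or of WorkRail.

```python
from typing import Callable

# Hypothetical registry: subagent name -> handler that takes a prompt.
SUBAGENTS: dict[str, Callable[[str], str]] = {
    "context-researcher": lambda prompt: f"[context-researcher] {prompt}",
    "hypothesis-challenger": lambda prompt: f"[hypothesis-challenger] {prompt}",
}


def task(subagent_type: str, prompt: str) -> str:
    """Explicit delegation: the caller picks the subagent; nothing is inferred."""
    try:
        handler = SUBAGENTS[subagent_type]
    except KeyError:
        known = ", ".join(sorted(SUBAGENTS))
        raise ValueError(
            f"unknown subagent_type {subagent_type!r}; expected one of: {known}"
        ) from None
    return handler(prompt)


result = task(subagent_type="context-researcher", prompt="Map the auth module")
```

Failing loudly on an unknown name is what makes the delegation predictable and easy to debug: a log of calls states exactly which subagent ran at each step.
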
### 7. **Auditor Model: Review, Don't Execute**

**Key Discovery:** Subagents work better as **auditors** than **executors**.

**Executor Model (Problematic):**
```
Main Agent: "Go gather context about authentication"
Subagent: *reads files, builds understanding*
Problem: Main agent doesn't have the context, needs to re-read
```

**Auditor Model (Effective):**
```
Main Agent: *reads files, builds understanding*
Main Agent: "I read these files and learned X. Audit my work."
Subagent: "You missed Y, assumption Z is risky, go deeper on W"
Main Agent: *investigates gaps*
Result: Main agent has full context + quality control
```

**Why Auditors Work Better:**
- **No dilution**: Main agent has full, uncompressed context
- **No duplication**: Main agent doesn't need to re-read what subagent read
- **Fresh perspective**: Auditor catches gaps and blind spots
- **Quality control**: Ensures sufficient understanding before proceeding
- **Cognitive diversity**: Different perspective on the same work

**When to Use Auditors:**
- Context gathering (audit for completeness and depth)
- Hypothesis formation (challenge assumptions)
- Plan creation (validate completeness and soundness)
- Final validation (adversarial review before committing)

**When Executors Still Make Sense:**
- Simulation (running "what-if" scenarios in parallel)
- Independent parallel work (different execution paths)
- Specialized tasks main agent can't do well

### 8. **Parallel Delegation for Critical Work**

**Pattern:** Spawn multiple subagents **simultaneously** for critical phases to get diverse perspectives and ensure nothing is missed.

**Explicit Parallelism:**
```
**CRITICAL: Spawn ALL subagents SIMULTANEOUSLY, not sequentially.**

Delegate to THREE subagents AT THE SAME TIME:
1. [Subagent 1 with specific focus]
2. [Subagent 2 with different focus]
3. [Subagent 3 with different focus]
```

**Use Cases:**

**1. Multi-Perspective Auditing (Diverse Focuses)**
```
Main agent gathers context
↓
Parallel Audit (2-3 subagents):
├─ Context Researcher (FOCUS: Completeness)
├─ Context Researcher (FOCUS: Depth)
└─ [Optional 3rd perspective]

Main agent synthesizes all perspectives
```

**2. Redundant Critical Work (Different Rigor)**
```
Main agent forms hypotheses
↓
Parallel Challenge (2 subagents):
├─ Hypothesis Challenger (rigor=3: Thorough)
└─ Hypothesis Challenger (rigor=5: Maximum)

Main agent strengthens hypotheses based on challenges
```

**3. Multi-Modal Validation (Different Cognitive Modes)**
```
Main agent proposes fix
↓
Parallel Validation (3 subagents):
├─ Hypothesis Challenger (adversarial review)
├─ Execution Simulator (simulate the fix)
└─ Plan Analyzer (validate the plan)

Main agent proceeds only if ALL THREE validate
```

**Synthesis Guidance:**

When the main agent receives multiple parallel deliverables:
- **Common concerns**: If 2+ subagents flag the same issue → High priority
- **Unique insights**: Each subagent may catch different gaps → Investigate all
- **Conflicting advice**: If they disagree → Investigate to understand why
- **Quality gate**: For critical phases, require ALL subagents to validate

**Cost/Speed Tradeoff:**
- Parallel = faster wall-clock time but higher token cost
- Use for critical phases where quality matters most
- Use for phases where diverse perspectives add value

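
The parallel-spawn and synthesis guidance above can be sketched with `asyncio`. Everything here is illustrative: `audit` is a stand-in that returns canned concerns instead of calling a model, and the dictionary shape of a deliverable is an assumption. The sketch covers common concerns, unique insights, and the all-must-validate gate. Conflicting advice is left out because resolving it needs judgment rather than counting.

```python
import asyncio
from collections import Counter


async def audit(focus: str, concerns: list[str]) -> dict:
    """Stand-in for one subagent call; a real one would await a model here."""
    await asyncio.sleep(0)
    return {"focus": focus, "concerns": concerns, "validated": not concerns}


async def run_parallel(audits: dict[str, list[str]]) -> list[dict]:
    # gather() schedules every audit concurrently instead of one after another
    # and returns the results in the order the audits were passed in.
    return await asyncio.gather(*(audit(f, c) for f, c in audits.items()))


def synthesize(deliverables: list[dict]) -> dict:
    """Apply the synthesis rules: shared concerns first, then unique ones."""
    counts = Counter(c for d in deliverables for c in set(d["concerns"]))
    return {
        "high_priority": sorted(c for c, n in counts.items() if n >= 2),
        "unique": sorted(c for c, n in counts.items() if n == 1),
        "gate_passed": all(d["validated"] for d in deliverables),
    }


# Made-up concerns for three focused auditors.
deliverables = asyncio.run(run_parallel({
    "completeness": ["no tests read", "config skipped"],
    "depth": ["config skipped"],
    "adversarial": [],
}))
result = synthesize(deliverables)
```

In this run "config skipped" is flagged by two auditors and therefore lands in `high_priority`, while `gate_passed` is false because not every auditor validated the work.
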
### 9. **Focused Audits for Parallel Work**

When spawning multiple auditors in parallel, give each a **specific focus** to maximize diversity and minimize overlap.

**Pattern:**
```
Subagent 1: FOCUS = Completeness
- Priority: Did they miss any critical areas?
- Still checks other dimensions, but emphasizes coverage
```