keystone-cli 1.1.2 → 1.3.0

package/README.md CHANGED
@@ -34,11 +34,12 @@ Keystone allows you to define complex automation workflows using a simple YAML s
  ---

- ## Features
+ ## <a id="features"></a>✨ Features

  - ⚡ **Local-First:** Built on Bun with a local SQLite database for state management.
  - 🧩 **Declarative:** Define workflows in YAML with automatic dependency tracking (DAG).
  - 🤖 **Agentic:** First-class support for LLM agents defined in Markdown with YAML frontmatter.
+ - 🎯 **Dynamic Workflows:** LLM-driven orchestration where a supervisor generates and executes steps at runtime.
  - 🧑‍💻 **Human-in-the-Loop:** Support for manual approval and text input steps.
  - 🔄 **Resilient:** Built-in retries, timeouts, and state persistence. Resume failed or paused runs exactly where they left off.
  - 📊 **TUI Dashboard:** Built-in interactive dashboard for monitoring and managing runs.
@@ -51,7 +52,7 @@ Keystone allows you to define complex automation workflows using a simple YAML s
  ---

- ## 🚀 Installation
+ ## <a id="installation"></a>🚀 Installation

  Ensure you have [Bun](https://bun.sh) installed.
@@ -89,7 +90,7 @@ source <(keystone completion bash)
  ---

- ## 🚦 Quick Start
+ ## <a id="quick-start"></a>🚦 Quick Start

  ### 1. Initialize a Project
  ```bash
@@ -130,39 +131,42 @@ keystone ui
  ---

- ## 🧰 Bundled Workflows
+ ## <a id="bundled-workflows"></a>🧰 Bundled Workflows

  `keystone init` seeds these workflows under `.keystone/workflows/` (and the agents they rely on under `.keystone/workflows/agents/`):

- Top-level workflows:
- - `scaffold-feature`: Interactive workflow scaffolder. Prompts for requirements, plans files, generates content, and writes them.
- - `decompose-problem`: Decomposes a problem into research/implementation/review tasks, waits for approval, runs sub-workflows, and summarizes.
- - `dev`: Self-bootstrapping DevMode workflow for an interactive plan/implement/verify loop.
- - `agent-handoff`: Demonstrates agent handoffs and tool-driven context updates.
- - `script-example`: Demonstrates sandboxed JavaScript execution.
- - `artifact-example`: Demonstrates artifact upload and download between steps.
- - `idempotency-example`: Demonstrates safe retries for side-effecting steps.
-
- Sub-workflows:
- - `scaffold-plan`: Generates a file plan from `requirements` input.
- - `scaffold-generate`: Generates file contents from `requirements` plus a `files` plan.
- - `decompose-research`: Runs a single research task (`task`) with optional `context`/`constraints`.
- - `decompose-implement`: Runs a single implementation task (`task`) with optional `research` findings.
- - `decompose-review`: Reviews a single implementation task (`task`) with optional `implementation` results.
- - `review-loop`: Reusable generate critique refine loop with a quality gate.
+ Top-level workflows (seeded in `.keystone/workflows/`):
+ - `scaffold-feature.yaml`: Interactive workflow scaffolder. Prompts for requirements, plans files, generates content, and writes them.
+ - `decompose-problem.yaml`: Decomposes a problem into research/implementation/review tasks, waits for approval, runs sub-workflows, and summarizes.
+ - `dev.yaml`: Self-bootstrapping DevMode workflow for an interactive plan/implement/verify loop.
+ - `agent-handoff.yaml`: Demonstrates agent handoffs and tool-driven context updates.
+ - `full-feature-demo.yaml`: A comprehensive workflow demonstrating multiple step types (shell, file, request, etc.).
+ - `script-example.yaml`: Demonstrates sandboxed JavaScript execution.
+ - `artifact-example.yaml`: Demonstrates artifact upload and download between steps.
+ - `idempotency-example.yaml`: Demonstrates safe retries for side-effecting steps.
+ - `dynamic-demo.yaml`: Demonstrates LLM-driven dynamic workflow orchestration where steps are generated at runtime.
+
+ Sub-workflows (seeded in `.keystone/workflows/`):
+ - `scaffold-plan.yaml`: Generates a file plan from `requirements` input.
+ - `scaffold-generate.yaml`: Generates file contents from `requirements` plus a `files` plan.
+ - `decompose-research.yaml`: Runs a single research task (`task`) with optional `context`/`constraints`.
+ - `decompose-implement.yaml`: Runs a single implementation task (`task`) with optional `research` findings.
+ - `decompose-review.yaml`: Reviews a single implementation task (`task`) with optional `implementation` results.
+ - `review-loop.yaml`: Reusable generate → critique → refine loop with a quality gate.

  Example runs:
  ```bash
  keystone run scaffold-feature
  keystone run decompose-problem -i problem="Add caching to the API" -i context="Node/Bun service"
  keystone run agent-handoff -i topic="billing" -i user="Ada"
+ keystone run dynamic-demo -i task="Set up a Node.js project with TypeScript"
  ```

  Sub-workflows are used by the top-level workflows, but can be run directly if you want just one phase.

  ---

- ## ⚙️ Configuration
+ ## <a id="configuration"></a>⚙️ Configuration

  Keystone loads configuration from project `.keystone/config.yaml` (and user-level config; see `keystone config show` for search order) to manage model providers and model mappings.
@@ -198,7 +202,7 @@ providers:
    google-gemini:
      type: google-gemini
      base_url: https://cloudcode-pa.googleapis.com
-     default_model: gemini-3-pro-high
+     default_model: gemini-1.5-pro
    groq:
      type: openai
      base_url: https://api.groq.com/openai/v1
@@ -357,7 +361,7 @@ Or use the `keystone auth login` command to securely store them in your local ma
  ---

- ## 📝 Workflow Example
+ ## <a id="workflow-example"></a>📝 Workflow Example

  Workflows are defined in YAML. Dependencies are automatically resolved based on the `needs` field, and **Keystone also automatically detects implicit dependencies** from your `${{ }}` expressions.
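
As a rough illustration of how implicit dependencies might be collected, the sketch below scans `${{ }}` expressions for `steps.<id>` references and merges them with the explicit `needs` list. The `StepDef` shape, the `collectDependencies` name, and the assumption that references always take the form `steps.<id>` are made up for this example; Keystone's actual parser may work differently.

```typescript
// Hypothetical step shape, for illustration only.
interface StepDef {
  id: string;
  needs?: string[];
  // String fields that may contain ${{ ... }} expressions.
  fields: string[];
}

// Merge explicit `needs` with step ids referenced inside ${{ ... }}.
function collectDependencies(step: StepDef): string[] {
  const deps = new Set<string>(step.needs ?? []);
  for (const field of step.fields) {
    const expr = /\$\{\{([\s\S]*?)\}\}/g;
    let e: RegExpExecArray | null;
    while ((e = expr.exec(field)) !== null) {
      // Assumed reference form: steps.<id>.<property>
      const ref = /\bsteps\.([A-Za-z_][A-Za-z0-9_-]*)/g;
      let r: RegExpExecArray | null;
      while ((r = ref.exec(e[1])) !== null) {
        if (r[1] !== step.id) deps.add(r[1]);
      }
    }
  }
  return Array.from(deps).sort();
}

const deps = collectDependencies({
  id: "deploy",
  needs: ["build"],
  fields: ["echo ${{ steps.test.output }} ${{ steps.build.status }}"],
});
console.log(deps); // ["build", "test"]
```

Text outside `${{ }}` is ignored, so a literal `steps.foo` in a shell command would not create a dependency under this sketch.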
@@ -440,7 +444,7 @@ expression:
  ---

- ## 🏗️ Step Types
+ ## <a id="step-types"></a>🏗️ Step Types

  Keystone supports several specialized step types:
@@ -481,7 +485,8 @@ Keystone supports several specialized step types:
    ```yaml
    outputMapping:
      final_result: result_from_subflow
-     status: state
+     # 'from' can be used for explicit mapping or expression
+     # status: { from: "steps.some_step.status" }
    ```
  - `join`: Aggregate outputs from dependencies and enforce a completion condition.
    - `condition`: `'all'` (default), `'any'`, or a number.
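
One plausible reading of the `condition` values is sketched below: `'all'` requires every dependency to succeed, `'any'` requires at least one, and a number requires at least that many. The `joinSatisfied` name, the `DepStatus` type, and the interpretation of the numeric form as a minimum success count are assumptions, not documented Keystone behavior.

```typescript
type DepStatus = "success" | "failed" | "pending";
type JoinCondition = "all" | "any" | number;

// Decide whether a join step may proceed, given its dependencies' statuses.
function joinSatisfied(statuses: DepStatus[], condition: JoinCondition = "all"): boolean {
  const succeeded = statuses.filter((s) => s === "success").length;
  if (condition === "all") return succeeded === statuses.length;
  if (condition === "any") return succeeded >= 1;
  // Numeric form: assumed to mean "at least N dependencies succeeded".
  return succeeded >= condition;
}

console.log(joinSatisfied(["success", "failed", "success"], 2)); // true
```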
@@ -499,6 +504,7 @@ Keystone supports several specialized step types:
  - `op: store`: Store text with metadata.
  - `op: search`: Search for similar text using vector embeddings.
  - `text` / `query`: The content to store or search for.
+ - `model`: Optional embedding model (defaults to `local`). Currently only local embeddings (via `Transformers.js`) are supported.
  - `metadata`: Optional object for filtering or additional context.
  - `limit`: Number of results to return (default `5`).
  ```yaml
@@ -518,6 +524,54 @@ Keystone supports several specialized step types:
    - `env` and `cwd` are required and must be explicit.
    - `input` is sent to stdin (objects/arrays are JSON-encoded).
    - Summary is parsed from stdout or a file at `KEYSTONE_ENGINE_SUMMARY_PATH` and stored as an artifact.
+ - `git`: Execute git operations with automatic worktree management.
+   - Operations: `clone`, `checkout`, `pull`, `push`, `commit`, `worktree_add`, `worktree_remove`.
+   - `cleanup: true` automatically removes worktrees at workflow end.
+   ```yaml
+   - id: clone_repo
+     type: git
+     op: clone
+     url: https://github.com/example/repo.git
+     path: ./repo
+     branch: main
+     cleanup: true
+   ```
+ - `dynamic`: LLM-driven workflow orchestration where a supervisor agent generates steps at runtime.
+   - The supervisor LLM creates a plan of steps that are then executed dynamically.
+   - Supports resumability - state is persisted after each generated step.
+   - Generated steps can be: `llm`, `shell`, `workflow`, `file`, or `request`.
+   - `goal`: High-level goal for the supervisor to accomplish (required).
+   - `context`: Additional context for planning.
+   - `prompt`: Custom supervisor prompt (overrides default).
+   - `supervisor`: Agent for planning (defaults to `keystone-architect`).
+   - `agent`: Default agent for generated LLM steps.
+   - `templates`: Role-to-agent mapping for specialized tasks.
+   - `maxSteps`: Maximum number of steps to generate.
+   - `concurrency`: Maximum number of steps to run in parallel (default: `1`).
+   - `confirmPlan`: Review and approve/modify the plan before execution (default: `false`).
+   - `maxReplans`: Number of automatic recovery attempts if the plan fails (default: `3`).
+   - `allowStepFailure`: Continue execution even if individual generated steps fail.
+   - `library`: A list of pre-defined step patterns available to the supervisor.
+   ```yaml
+   - id: implement_feature
+     type: dynamic
+     goal: "Implement user authentication with JWT"
+     context: "This is a Node.js Express application"
+     agent: keystone-architect
+     templates:
+       planner: "keystone-architect"
+       developer: "software-engineer"
+     maxSteps: 10
+     allowStepFailure: false
+   ```
+
+ #### Dynamic Orchestration vs. Rigid Pipelines
+ Traditional workflows often require complex multi-file decomposition (e.g., `decompose-problem.yaml` calling separate research, implementation, and review workflows). The `dynamic` step type replaces these rigid patterns with **Agentic Orchestration**:
+ - **Simplified Structure**: A single `dynamic` step can replace multiple nested pipelines.
+ - **Adaptive Execution**: The agent adjusts its plan based on real-time feedback and results from previous steps.
+ - **Improved Resumability**: Each sub-step generated by the agent is persisted, allowing seamless resumption even inside long-running dynamic tasks.
+
+ Use **Deterministic Workflows** (standard steps) for predictable, repeatable processes. Use **Dynamic Orchestration** for open-ended tasks where the specific steps cannot be known in advance.
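
To make the execution model more concrete, here is a small sketch of how a generated plan could be ordered before it runs. The `PlanStep` shape borrows the `id`, `type`, and `needs` fields that appear in the package's `DynamicStateManager` tests, but `executionOrder` is a hypothetical helper that only illustrates dependency-respecting ordering; it is not Keystone's scheduler.

```typescript
interface PlanStep {
  id: string;
  type: "llm" | "shell" | "workflow" | "file" | "request";
  needs?: string[];
}

// Order steps so each one runs after everything it `needs`.
// Throws on unknown or cyclic dependencies.
function executionOrder(steps: PlanStep[]): string[] {
  const ids = new Set(steps.map((s) => s.id));
  for (const s of steps) {
    for (const n of s.needs ?? []) {
      if (!ids.has(n)) throw new Error(`Step "${s.id}" needs unknown step "${n}"`);
    }
  }
  const done = new Set<string>();
  const order: string[] = [];
  let remaining = steps.slice();
  while (remaining.length > 0) {
    // Every step in `ready` depends only on finished steps, so a batch
    // could in principle run in parallel, up to a concurrency limit.
    const ready = remaining.filter((s) => (s.needs ?? []).every((n) => done.has(n)));
    if (ready.length === 0) throw new Error("Cycle detected in generated plan");
    for (const s of ready) {
      done.add(s.id);
      order.push(s.id);
    }
    remaining = remaining.filter((s) => !done.has(s.id));
  }
  return order;
}

const order = executionOrder([
  { id: "report", type: "llm", needs: ["build", "test"] },
  { id: "install", type: "shell" },
  { id: "build", type: "shell", needs: ["install"] },
  { id: "test", type: "shell", needs: ["install"] },
]);
console.log(order); // ["install", "build", "test", "report"]
```

Validating the plan up front like this is one way to reject a malformed supervisor plan before any generated step has side effects.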

  ### Human Steps in Non-Interactive Mode
  If stdin is not a TTY (CI, piped input), `human` steps suspend. Resume by providing an answer via inputs using the step id and `__answer`:
@@ -551,8 +605,8 @@ All steps support common features:
  - `retry`: `{ count, backoff: 'linear'|'exponential', baseDelay }`.
  - `timeout`: Maximum execution time in milliseconds (best-effort; supported steps receive an abort signal).
  - `foreach`: Iterate over an array in parallel.
- - `concurrency`: Limit parallel items for `foreach` (must be a positive integer).
- - `strategy.matrix`: Experimental parser-time expansion into `foreach` (prefer explicit `foreach` for now).
+ - `concurrency`: Limit parallel items for `foreach` (must be a positive integer). Defaults to `50`.
+ - `strategy.matrix`: Multi-axis expansion into `foreach` at parse-time.
  - `pool`: Assign step to a resource pool.
  - `breakpoint`: Pause before executing the step when running with `--debug`.
  - `compensate`: Step to run if the workflow rolls back.
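
The multi-axis expansion mentioned for `strategy.matrix` can be pictured as a Cartesian product over the matrix axes, yielding one `foreach` item per combination. The following is a generic sketch of that idea; the `expandMatrix` name and the input shape are assumptions, and Keystone's parser may order or shape the items differently.

```typescript
type MatrixValue = string | number | boolean;
type Matrix = Record<string, MatrixValue[]>;
type MatrixItem = Record<string, MatrixValue>;

// Build every combination of axis values, one object per combination.
function expandMatrix(matrix: Matrix): MatrixItem[] {
  let items: MatrixItem[] = [{}];
  for (const axis of Object.keys(matrix)) {
    const next: MatrixItem[] = [];
    for (const item of items) {
      for (const value of matrix[axis]) {
        next.push({ ...item, [axis]: value });
      }
    }
    items = next;
  }
  return items;
}

const items = expandMatrix({ os: ["linux", "macos"], node: [18, 20] });
console.log(items.length); // 4
```

Note that the item count is the product of the axis lengths, so a large matrix combined with the default `foreach` concurrency can fan out quickly.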
@@ -723,7 +777,7 @@ Until `strategy.matrix` is wired end-to-end, use explicit `foreach` with an arra
  ---

- ## 🔧 Advanced Features
+ ## <a id="advanced-features"></a>🔧 Advanced Features

  ### Idempotency Keys
@@ -806,6 +860,24 @@ Upload and download files between steps without hardcoded artifact paths.
  Upload outputs include `artifactPath` and `files` for downstream references.

+ - `git`: Perform git operations (clone, worktree, checkout, pull, push, commit).
+   - `op`: Required operation (`clone`, `worktree_add`, `worktree_remove`, `checkout`, `pull`, `push`, `commit`).
+   - `path`: Local path for clone or worktree.
+   - `url`: Repository URL for clone.
+   - `branch`: Branch name for clone, checkout, push, pull, or worktree.
+   - `message`: Commit message.
+   - `cwd`: Directory to run the git command in.
+   - `allowOutsideCwd`: Boolean (default `false`). Set `true` to allow operations outside the project root.
+   - `allowInsecure`: Boolean (default `false`). Set `true` to allow git commands that fail the security whitelist.
+
+ ```yaml
+ - id: setup_feat
+   type: git
+   op: worktree_add
+   path: ../feat-branch
+   branch: feature/x
+ ```
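
A guard such as `allowOutsideCwd` usually comes down to resolving the target path and confirming that it stays under the project root. The sketch below shows one common way to do that with Node's `path` module; `isInsideRoot` is a hypothetical name and the logic is not taken from Keystone's source. It does not follow symlinks, which a hardened check would also need to consider.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Return true when `target` (resolved against `root`) stays inside `root`.
function isInsideRoot(root: string, target: string): boolean {
  const rel = relative(resolve(root), resolve(root, target));
  if (rel === "") return true; // target is the root itself
  const escapes = rel === ".." || rel.startsWith(".." + sep);
  return !escapes && !isAbsolute(rel);
}

console.log(isInsideRoot("/work/project", "./repo"));
console.log(isInsideRoot("/work/project", "../feat-branch"));
```

If paths are resolved against the project root in this way, a `path` such as `../feat-branch` would count as outside the root and would presumably need `allowOutsideCwd: true`.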
+
  ### Structured Events

  Emit NDJSON events for step and workflow lifecycle updates:
  Emit NDJSON events for step and workflow lifecycle updates:
@@ -918,7 +990,7 @@ You can also define a workflow-level `compensate` step to handle overall cleanup
  ---

- ## 🤖 Agent Definitions
+ ## <a id="agent-definitions"></a>🤖 Agent Definitions

  Agents are defined in Markdown files with YAML frontmatter, making them easy to read and version control.
@@ -1102,7 +1174,7 @@ In these examples, the agent will have access to all tools provided by the MCP s
  ---

- ## 🛠️ CLI Commands
+ ## <a id="cli-commands"></a>🛠️ CLI Commands

  | Command | Description |
  | :--- | :--- |
@@ -1166,7 +1238,7 @@ Input keys passed via `-i key=val` must be alphanumeric/underscore and cannot be
  ### Dry Run
  `keystone run --dry-run` prints shell commands without executing them and skips non-shell steps (including human prompts). Outputs from skipped steps are empty, so conditional branches may differ from a real run.

- ## 🛡️ Security
+ ## <a id="security"></a>🛡️ Security

  ### Shell Execution
  Keystone blocks shell commands that match common injection/destructive patterns (like `rm -rf /` or pipes to shells). To run them, set `allowInsecure: true` on the step. Prefer `${{ escape(...) }}` when interpolating user input.
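
For context on what escaping buys you, here is the standard POSIX single-quote technique written out in TypeScript. It is a generic illustration of safe interpolation; whether Keystone's `escape(...)` uses exactly this strategy is an assumption, and `shellQuote` is a name chosen for this example.

```typescript
// Wrap a value in single quotes, replacing each embedded ' with '\''
// so a POSIX shell treats the whole value as one literal argument.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

const userInput = "report; rm -rf /";
console.log(`cat ${shellQuote(userInput)}`); // cat 'report; rm -rf /'
```

Because the value is passed as a single quoted argument, the `;` is no longer interpreted as a command separator.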
@@ -1194,7 +1266,7 @@ Request steps enforce SSRF protections and require HTTPS by default. Cross-origi
  ---

- ## 🏗️ Architecture
+ ## <a id="architecture"></a>🏗️ Architecture

  ```mermaid
  graph TD
@@ -1224,13 +1296,18 @@ graph TD
  EX --> Script[Script Step]
  EX --> Sleep[Sleep Step]
  EX --> Memory[Memory operations]
+ EX --> Artifact[Artifact operations]
+ EX --> Git[Git operations]
+ EX --> Wait[Wait Step]
+ EX --> Join[Join Step]
+ EX --> Blueprint[Blueprint Step]

  LLM --> Adapters[LLM Adapters]
  Adapters --> Providers[OpenAI, Anthropic, Gemini, Copilot, etc.]
  LLM --> MCPClient[MCP Client]
  ```

- ## 📂 Project Structure
+ ## <a id="project-structure"></a>📂 Project Structure

  - `src/cli.ts`: CLI entry point.
  - `src/db/`: SQLite persistence layer.
@@ -1245,6 +1322,6 @@ graph TD
  ---

- ## 📄 License
+ ## <a id="license"></a>📄 License

  MIT
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "keystone-cli",
-   "version": "1.1.2",
+   "version": "1.3.0",
    "description": "A local-first, declarative, agentic workflow orchestrator built on Bun",
    "type": "module",
    "bin": {
@@ -17,9 +17,11 @@ import architectAgent from '../templates/agents/keystone-architect.md' with { ty
  import softwareEngineerAgent from '../templates/agents/software-engineer.md' with { type: 'text' };
  import summarizerAgent from '../templates/agents/summarizer.md' with { type: 'text' };
  import testerAgent from '../templates/agents/tester.md' with { type: 'text' };
+ import fullFeatureDemo from '../templates/basics/full-feature-demo.yaml' with { type: 'text' };
  import idempotencyExample from '../templates/control-flow/idempotency-example.yaml' with {
    type: 'text',
  };
+ import dynamicDemo from '../templates/dynamic-demo.yaml' with { type: 'text' };
  import artifactExample from '../templates/features/artifact-example.yaml' with { type: 'text' };
  import scriptExample from '../templates/features/script-example.yaml' with { type: 'text' };
  // Import templates
@@ -37,6 +39,9 @@ import decomposeReviewWorkflow from '../templates/scaffolding/decompose-review.y
    type: 'text',
  };
  import devWorkflow from '../templates/scaffolding/dev.yaml' with { type: 'text' };
+ import dynamicDecomposeWorkflow from '../templates/scaffolding/dynamic-decompose.yaml' with {
+   type: 'text',
+ };
  import reviewLoopWorkflow from '../templates/scaffolding/review-loop.yaml' with { type: 'text' };
  import scaffoldWorkflow from '../templates/scaffolding/scaffold-feature.yaml' with { type: 'text' };
  import scaffoldGenerateWorkflow from '../templates/scaffolding/scaffold-generate.yaml' with {
@@ -101,6 +106,7 @@ const SEEDS = [
    { path: '.keystone/workflows/scaffold-plan.yaml', content: scaffoldPlanWorkflow },
    { path: '.keystone/workflows/scaffold-generate.yaml', content: scaffoldGenerateWorkflow },
    { path: '.keystone/workflows/decompose-problem.yaml', content: decomposeWorkflow },
+   { path: '.keystone/workflows/dynamic-decompose.yaml', content: dynamicDecomposeWorkflow },
    { path: '.keystone/workflows/decompose-research.yaml', content: decomposeResearchWorkflow },
    { path: '.keystone/workflows/decompose-implement.yaml', content: decomposeImplementWorkflow },
    { path: '.keystone/workflows/decompose-review.yaml', content: decomposeReviewWorkflow },
@@ -118,6 +124,8 @@ const SEEDS = [
    { path: '.keystone/workflows/script-example.yaml', content: scriptExample },
    { path: '.keystone/workflows/artifact-example.yaml', content: artifactExample },
    { path: '.keystone/workflows/idempotency-example.yaml', content: idempotencyExample },
+   { path: '.keystone/workflows/full-feature-demo.yaml', content: fullFeatureDemo },
+   { path: '.keystone/workflows/dynamic-demo.yaml', content: dynamicDemo },
  ];

  export function registerInitCommand(program: Command): void {
@@ -0,0 +1,319 @@
+ /**
+  * Tests for DynamicStateManager
+  */
+ import { afterEach, beforeEach, describe, expect, it } from 'bun:test';
+ import { existsSync, mkdirSync, rmSync } from 'node:fs';
+ import { join } from 'node:path';
+ import {
+   type DynamicPlan,
+   DynamicStateManager,
+   type DynamicStepState,
+ } from './dynamic-state-manager.ts';
+ import { WorkflowDb } from './workflow-db.ts';
+
+ describe('DynamicStateManager', () => {
+   let db: WorkflowDb;
+   let stateManager: DynamicStateManager;
+   const testDir = join(import.meta.dir, '.test-dynamic-state');
+   const testDbPath = join(testDir, 'test.db');
+
+   beforeEach(async () => {
+     // Clean up any existing test db
+     if (existsSync(testDir)) {
+       rmSync(testDir, { recursive: true });
+     }
+     mkdirSync(testDir, { recursive: true });
+
+     db = new WorkflowDb(testDbPath);
+     stateManager = new DynamicStateManager(db);
+
+     // Create a workflow run for foreign key constraint
+     await db.createRun('test-run-1', 'test-workflow', { input: 'value' });
+   });
+
+   afterEach(() => {
+     db.close();
+     if (existsSync(testDir)) {
+       rmSync(testDir, { recursive: true });
+     }
+   });
+
+   describe('create', () => {
+     it('should create a new dynamic state', async () => {
+       const state = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'dynamic-step-1',
+         workflowId: 'wf-123',
+       });
+
+       expect(state.id).toBeDefined();
+       expect(state.runId).toBe('test-run-1');
+       expect(state.stepId).toBe('dynamic-step-1');
+       expect(state.workflowId).toBe('wf-123');
+       expect(state.status).toBe('planning');
+       expect(state.generatedPlan.steps).toEqual([]);
+       expect(state.currentStepIndex).toBe(0);
+       expect(state.startedAt).toBeDefined();
+     });
+
+     it('should create state defaulting workflowId to runId', async () => {
+       const state = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'dynamic-step-2',
+       });
+
+       expect(state.workflowId).toBe('test-run-1');
+     });
+   });
+
+   describe('load', () => {
+     it('should load existing state', async () => {
+       const created = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'dynamic-step-1',
+       });
+
+       const loaded = await stateManager.load('test-run-1', 'dynamic-step-1');
+
+       expect(loaded).not.toBeNull();
+       expect(loaded?.id).toBe(created.id);
+       expect(loaded?.status).toBe('planning');
+     });
+
+     it('should return null for non-existent state', async () => {
+       const loaded = await stateManager.load('test-run-1', 'non-existent');
+       expect(loaded).toBeNull();
+     });
+   });
+
+   describe('loadById', () => {
+     it('should load state by ID', async () => {
+       const created = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'dynamic-step-1',
+       });
+
+       if (!created.id) throw new Error('ID missing');
+       const loaded = await stateManager.loadById(created.id);
+
+       expect(loaded).not.toBeNull();
+       expect(loaded?.id).toBe(created.id);
+     });
+   });
+
+   describe('setPlan', () => {
+     it('should set the plan and create step executions', async () => {
+       const state = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'dynamic-step-1',
+       });
+       if (!state.id) throw new Error('State ID missing');
+
+       const plan: DynamicPlan = {
+         steps: [
+           { id: 'step1', name: 'First step', type: 'shell', run: 'echo hello' },
+           {
+             id: 'step2',
+             name: 'Second step',
+             type: 'llm',
+             agent: 'test',
+             prompt: 'do something',
+             needs: ['step1'],
+           },
+         ],
+         notes: 'Test plan',
+       };
+
+       await stateManager.setPlan(state.id, plan);
+
+       // Verify state was updated
+       const loaded = await stateManager.loadById(state.id);
+       expect(loaded?.status).toBe('executing');
+       expect(loaded?.generatedPlan.steps.length).toBe(2);
+       expect(loaded?.generatedPlan.notes).toBe('Test plan');
+
+       // Verify step executions were created
+       const executions = await stateManager.getStepExecutions(state.id);
+       expect(executions.length).toBe(2);
+       expect(executions[0].stepId).toBe('step1');
+       expect(executions[0].status).toBe('pending');
+       expect(executions[0].executionOrder).toBe(0);
+       expect(executions[1].stepId).toBe('step2');
+       expect(executions[1].executionOrder).toBe(1);
+     });
+   });
+
+   describe('updateProgress', () => {
+     it('should update the current step index', async () => {
+       const state = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'dynamic-step-1',
+       });
+       if (!state.id) throw new Error('State ID missing');
+
+       await stateManager.updateProgress(state.id, 3);
+
+       const loaded = await stateManager.loadById(state.id);
+       expect(loaded?.currentStepIndex).toBe(3);
+     });
+   });
+
+   describe('startStep and completeStep', () => {
+     it('should track step execution lifecycle', async () => {
+       const state = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'dynamic-step-1',
+       });
+       if (!state.id) throw new Error('State ID missing');
+
+       const plan: DynamicPlan = {
+         steps: [{ id: 'step1', name: 'First step', type: 'shell', run: 'echo hello' }],
+       };
+       await stateManager.setPlan(state.id, plan);
+
+       // Start the step
+       await stateManager.startStep(state.id, 'step1');
+
+       let executions = await stateManager.getStepExecutions(state.id);
+       expect(executions[0].status).toBe('running');
+       expect(executions[0].startedAt).toBeDefined();
+
+       // Complete the step
+       await stateManager.completeStep(state.id, 'step1', {
+         status: 'success',
+         output: { result: 'hello' },
+       });
+
+       executions = await stateManager.getStepExecutions(state.id);
+       expect(executions[0].status).toBe('success');
+       expect(executions[0].output).toEqual({ result: 'hello' });
+       expect(executions[0].completedAt).toBeDefined();
+     });
+
+     it('should handle failed steps', async () => {
+       const state = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'dynamic-step-1',
+       });
+       if (!state.id) throw new Error('State ID missing');
+
+       const plan: DynamicPlan = {
+         steps: [{ id: 'step1', name: 'First step', type: 'shell', run: 'exit 1' }],
+       };
+       await stateManager.setPlan(state.id, plan);
+       await stateManager.startStep(state.id, 'step1');
+
+       await stateManager.completeStep(state.id, 'step1', {
+         status: 'failed',
+         error: 'Command exited with code 1',
+       });
+
+       const executions = await stateManager.getStepExecutions(state.id);
+       expect(executions[0].status).toBe('failed');
+       expect(executions[0].error).toBe('Command exited with code 1');
+     });
+   });
+
+   describe('finish', () => {
+     it('should mark state as completed', async () => {
+       const state = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'dynamic-step-1',
+       });
+       if (!state.id) throw new Error('State ID missing');
+
+       await stateManager.finish(state.id, 'completed');
+
+       const loaded = await stateManager.loadById(state.id);
+       expect(loaded?.status).toBe('completed');
+       expect(loaded?.completedAt).toBeDefined();
+     });
+
+     it('should mark state as failed with error', async () => {
+       const state = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'dynamic-step-1',
+       });
+       if (!state.id) throw new Error('State ID missing');
+
+       await stateManager.finish(state.id, 'failed', 'Something went wrong');
+
+       const loaded = await stateManager.loadById(state.id);
+       expect(loaded?.status).toBe('failed');
+       expect(loaded?.error).toBe('Something went wrong');
+     });
+   });
+
+   describe('getStepResultsMap', () => {
+     it('should return completed steps as a map', async () => {
+       const state = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'dynamic-step-1',
+       });
+       if (!state.id) throw new Error('State ID missing');
+
+       const plan: DynamicPlan = {
+         steps: [
+           { id: 'step1', name: 'First', type: 'shell', run: 'echo 1' },
+           { id: 'step2', name: 'Second', type: 'shell', run: 'echo 2' },
+         ],
+       };
+       await stateManager.setPlan(state.id, plan);
+
+       // Complete first step
+       await stateManager.startStep(state.id, 'step1');
+       await stateManager.completeStep(state.id, 'step1', {
+         status: 'success',
+         output: { value: 1 },
+       });
+
+       const resultsMap = await stateManager.getStepResultsMap(state.id);
+
+       expect(resultsMap.size).toBe(1); // Only completed steps
+       expect(resultsMap.get('step1')).toEqual({
+         output: { value: 1 },
+         status: 'success',
+         error: undefined,
+       });
+       expect(resultsMap.has('step2')).toBe(false); // Still pending
+     });
+   });
+
+   describe('listActive', () => {
+     it('should list active states', async () => {
+       await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'step-1',
+       });
+
+       const state2 = await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'step-2',
+       });
+
+       // Complete one
+       if (!state2.id) throw new Error('State ID missing');
+       await stateManager.finish(state2.id, 'completed');
+
+       const active = await stateManager.listActive();
+       expect(active.length).toBe(1);
+       expect(active[0].stepId).toBe('step-1');
+     });
+   });
+
+   describe('listByRun', () => {
+     it('should list states for a run', async () => {
+       await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'step-1',
+       });
+       await stateManager.create({
+         runId: 'test-run-1',
+         stepId: 'step-2',
+       });
+
+       const states = await stateManager.listByRun('test-run-1');
+       expect(states.length).toBe(2);
+     });
+   });
+ });
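
The lifecycle these tests exercise can be summarized as a small state machine: a state starts in `planning`, moves to `executing` once a plan is set, and ends as `completed` or `failed`, while each generated step moves from `pending` to `running` to `success` or `failed`. The in-memory class below is a hypothetical stand-in written only to illustrate those transitions. It is not the package's `DynamicStateManager`, which the tests construct on top of `WorkflowDb`, and including failed steps in the results map is an assumption.

```typescript
type StateStatus = "planning" | "executing" | "completed" | "failed";
type StepStatus = "pending" | "running" | "success" | "failed";

interface StepExecution {
  stepId: string;
  status: StepStatus;
  executionOrder: number;
  output?: unknown;
  error?: string;
}

interface StepResult {
  status: "success" | "failed";
  output?: unknown;
  error?: string;
}

// Illustrative in-memory model of the transitions checked by the tests.
class InMemoryDynamicState {
  status: StateStatus = "planning";
  executions: StepExecution[] = [];

  // Setting a plan creates one pending execution per step, in plan order.
  setPlan(stepIds: string[]): void {
    if (this.status !== "planning") throw new Error("Plan already set");
    this.executions = stepIds.map(
      (stepId, i): StepExecution => ({ stepId, status: "pending", executionOrder: i }),
    );
    this.status = "executing";
  }

  startStep(stepId: string): void {
    this.find(stepId).status = "running";
  }

  completeStep(stepId: string, result: StepResult): void {
    Object.assign(this.find(stepId), result);
  }

  // Pending and running steps are excluded, as in the getStepResultsMap test.
  resultsMap(): Map<string, StepExecution> {
    const map = new Map<string, StepExecution>();
    for (const e of this.executions) {
      if (e.status === "success" || e.status === "failed") map.set(e.stepId, e);
    }
    return map;
  }

  finish(status: "completed" | "failed"): void {
    this.status = status;
  }

  private find(stepId: string): StepExecution {
    const exec = this.executions.find((e) => e.stepId === stepId);
    if (!exec) throw new Error(`Unknown step: ${stepId}`);
    return exec;
  }
}

const state = new InMemoryDynamicState();
state.setPlan(["step1", "step2"]);
state.startStep("step1");
state.completeStep("step1", { status: "success", output: { value: 1 } });
console.log(state.status, state.resultsMap().size); // executing 1
```

Persisting each of these transitions, rather than holding them in memory, is what lets a run resume after the last finished generated step.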