agent-step-gate 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (68)
  1. package/ARCHITECTURE.md +393 -0
  2. package/README.md +662 -0
  3. package/SKILL.md +190 -0
  4. package/Weaver.md +140 -0
  5. package/dist/cli.d.ts +1 -0
  6. package/dist/cli.js +573 -0
  7. package/dist/cli.js.map +1 -0
  8. package/dist/core/errors.d.ts +16 -0
  9. package/dist/core/errors.js +32 -0
  10. package/dist/core/errors.js.map +1 -0
  11. package/dist/core/gate.d.ts +20 -0
  12. package/dist/core/gate.js +82 -0
  13. package/dist/core/gate.js.map +1 -0
  14. package/dist/core/keys.d.ts +18 -0
  15. package/dist/core/keys.js +37 -0
  16. package/dist/core/keys.js.map +1 -0
  17. package/dist/core/plan.d.ts +2 -0
  18. package/dist/core/plan.js +135 -0
  19. package/dist/core/plan.js.map +1 -0
  20. package/dist/core/program.d.ts +69 -0
  21. package/dist/core/program.js +191 -0
  22. package/dist/core/program.js.map +1 -0
  23. package/dist/core/reconcile.d.ts +37 -0
  24. package/dist/core/reconcile.js +198 -0
  25. package/dist/core/reconcile.js.map +1 -0
  26. package/dist/core/session.d.ts +25 -0
  27. package/dist/core/session.js +88 -0
  28. package/dist/core/session.js.map +1 -0
  29. package/dist/index.d.ts +1 -0
  30. package/dist/index.js +29 -0
  31. package/dist/index.js.map +1 -0
  32. package/dist/storage/db.d.ts +3 -0
  33. package/dist/storage/db.js +117 -0
  34. package/dist/storage/db.js.map +1 -0
  35. package/dist/storage/repository.d.ts +24 -0
  36. package/dist/storage/repository.js +449 -0
  37. package/dist/storage/repository.js.map +1 -0
  38. package/dist/tools/activeTask.d.ts +2 -0
  39. package/dist/tools/activeTask.js +41 -0
  40. package/dist/tools/activeTask.js.map +1 -0
  41. package/dist/tools/cancelTask.d.ts +2 -0
  42. package/dist/tools/cancelTask.js +39 -0
  43. package/dist/tools/cancelTask.js.map +1 -0
  44. package/dist/tools/checkpoint.d.ts +2 -0
  45. package/dist/tools/checkpoint.js +71 -0
  46. package/dist/tools/checkpoint.js.map +1 -0
  47. package/dist/tools/current.d.ts +2 -0
  48. package/dist/tools/current.js +64 -0
  49. package/dist/tools/current.js.map +1 -0
  50. package/dist/tools/finalize.d.ts +2 -0
  51. package/dist/tools/finalize.js +95 -0
  52. package/dist/tools/finalize.js.map +1 -0
  53. package/dist/tools/index.d.ts +6 -0
  54. package/dist/tools/index.js +7 -0
  55. package/dist/tools/index.js.map +1 -0
  56. package/dist/tools/startPlan.d.ts +2 -0
  57. package/dist/tools/startPlan.js +124 -0
  58. package/dist/tools/startPlan.js.map +1 -0
  59. package/dist/types/index.d.ts +142 -0
  60. package/dist/types/index.js +6 -0
  61. package/dist/types/index.js.map +1 -0
  62. package/package.json +48 -0
  63. package/scripts/interactive-demo.ts +394 -0
  64. package/scripts/mcp-call.mjs +56 -0
  65. package/scripts/prompt-check-hook.sh +27 -0
  66. package/scripts/session-start-hook.sh +47 -0
  67. package/scripts/stop-hook.mjs +83 -0
  68. package/scripts/stop-hook.sh +75 -0
package/README.md ADDED
@@ -0,0 +1,662 @@
# 🚦 Agent Step Gate

Lightweight execution gate for long-running AI agent tasks.

Agent Step Gate helps AI agents avoid missing planned steps during complex refactors, multi-session development, and multi-agent harness workflows.

The core idea is simple:

> Trust the agent's ability, but don't trust its claims.

Agent Step Gate does **not** try to control how an agent works.
It only maintains an external execution ledger and verifies that planned steps have been completed before a task can be marked as done.

---

## ❓ Why

Long-context agents are capable, but they often miss steps in long-running tasks.

For example:

```text
Large refactor plan
├─ Phase 1: 12 tasks
├─ Phase 2: 15 tasks
├─ Phase 3: 10 tasks
└─ ...
```

Even when the roadmap is well written, the agent may skip a step because of context drift, interruption, nested subtasks, or multi-session execution.

Agent Step Gate solves this by introducing a lightweight checkpoint system:

```text
Step completed      -> StepKey issued
All Steps completed -> TaskKey issued
TaskKey verified    -> Task can be considered done
```

The agent is free to execute the work however it wants, but it cannot successfully claim completion without a valid completion key.

---

## 🧠 Design Philosophy

Agent Step Gate follows several principles:

1. **Minimal interference**

   Do not constrain the agent's reasoning or implementation style.

2. **Verify completion, not behavior**

   The system does not judge whether the agent wrote good code.
   It only checks whether the planned steps were explicitly passed.

3. **External ledger**

   Completion state is stored outside the agent context, so it is not lost when the context window becomes long.

4. **Hook-gated finalization**

   The agent may forget the Skill or CLI instruction, but the final Hook can still block incomplete tasks.

5. **Bottom-up verification**

   Verify from the smallest execution unit: `Step`.
   Then propagate completion upward to `Task`, `Node`, and `Program`.

---

## 🏗️ Architecture

Agent Step Gate uses a four-layer model:

```text
Program
└─ Node
   └─ Task
      └─ Step
```

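The four layers nest as plain records. The TypeScript sketch below is illustrative only. The interface names, the fields (`id`, `title`, `completed`), and the `currentStep` helper are assumptions, not the package's actual types. `GateNode` is used instead of `Node` to avoid clashing with the DOM `Node` type.

```typescript
// Hypothetical shapes for the Program > Node > Task > Step hierarchy.
interface Step {
  id: string; // e.g. "step_001"
  title: string; // e.g. "Read current auth implementation"
  completed: boolean;
}

interface Task {
  id: string; // e.g. "task_abc123"
  title: string;
  steps: Step[]; // ordered checkpoints
}

interface GateNode {
  id: string;
  title: string;
  tasks: Task[];
}

interface Program {
  id: string;
  title: string;
  nodes: GateNode[];
}

// The current Step is the first one not yet completed (undefined when all are done).
function currentStep(task: Task): Step | undefined {
  return task.steps.find((s) => !s.completed);
}

const program: Program = {
  id: "pgm_auth_refactor",
  title: "Refactor authentication module",
  nodes: [
    {
      id: "node_token_validation",
      title: "Refactor token validation",
      tasks: [
        {
          id: "task_abc123",
          title: "Refactor auth token validation",
          steps: [
            { id: "step_001", title: "Read current implementation", completed: true },
            { id: "step_002", title: "Refactor validation logic", completed: false },
          ],
        },
      ],
    },
  ],
};
```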
### Program

A large plan that spans multiple sessions.

Example:

```text
Program: Refactor authentication module
```

A Program can contain multiple Nodes.

---

### Node

A high-level work unit, usually corresponding to one session or one major stage.

Example:

```text
Node: Migrate login flow
Node: Refactor token validation
Node: Update tests
```

A Node can contain multiple Tasks.

---

### Task

The main unit for one agent interaction.

The recommended model is:

```text
One interaction = One Task
```

A Task contains concrete Steps.
The Hook only needs to force-check the current Task.

---

### Step

The smallest checkpoint.

Example:

```text
Step 1: Read current auth implementation
Step 2: Identify token validation logic
Step 3: Refactor validation function
Step 4: Update related tests
Step 5: Run test suite
```

Each completed Step produces a `StepKey`.

When all Steps in a Task are completed, the system produces a `TaskKey`.

---

## 🔄 Core Flow

```text
1. Create or resume a Task
2. Agent executes the current Step
3. Agent marks the Step complete
4. Gate validates the StepKey issued for that Step
5. Gate returns the next StepKey
6. Repeat until all Steps are complete
7. Gate issues a TaskKey
8. Hook or Main Agent verifies the TaskKey
9. Task is accepted as completed
```

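The flow above can be sketched as a small in-memory ledger. This is a hypothetical illustration, not the package's implementation. The `StepLedger` class and its method names are assumed. State lives in a `Map` rather than in `.agent-step-gate/`. Steps are identified by position rather than by `step_00N` ids. Keys are 8 hex characters from `randomBytes`. Following the Key Model, only SHA-256 hashes of issued keys are kept.

```typescript
import { createHash, randomBytes } from "node:crypto";

const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");
// Keys are 8 uppercase hex characters in this sketch.
const newKey = (): string => randomBytes(4).toString("hex").toUpperCase();

interface LedgerTask {
  steps: string[]; // Step titles in planned order
  current: number; // index of the Step awaiting completion
  stepKeyHash: string | null; // hash of the key issued for the current Step
  taskKeyHash: string | null; // set once the Task is finalized
}

class StepLedger {
  private tasks = new Map<string, LedgerTask>();

  // Flow step 1: create a Task and issue the StepKey for its first Step.
  create(taskId: string, steps: string[]): string {
    if (steps.length === 0) throw new Error("a Task needs at least one Step");
    const key = newKey();
    this.tasks.set(taskId, { steps, current: 0, stepKeyHash: sha256(key), taskKeyHash: null });
    return key;
  }

  // Flow steps 3-5: validate the current Step's key, advance, issue the next key.
  // Returns null once the last Step has been completed.
  completeStep(taskId: string, key: string): string | null {
    const t = this.get(taskId);
    if (t.stepKeyHash === null) throw new Error("all Steps are already complete");
    // Plain string comparison keeps the sketch short (not constant-time).
    if (sha256(key) !== t.stepKeyHash) throw new Error("invalid StepKey");
    t.current += 1;
    if (t.current === t.steps.length) {
      t.stepKeyHash = null;
      return null;
    }
    const next = newKey();
    t.stepKeyHash = sha256(next);
    return next;
  }

  // Flow step 7: issue a TaskKey only when no Step is missing.
  finalize(taskId: string): { taskKey: string } | { missing: string[] } {
    const t = this.get(taskId);
    if (t.current < t.steps.length) return { missing: t.steps.slice(t.current) };
    const taskKey = "TK_" + newKey();
    t.taskKeyHash = sha256(taskKey);
    return { taskKey };
  }

  // Flow step 8: check a TaskKey presented by a Hook or Main Agent.
  verify(taskId: string, taskKey: string): boolean {
    const t = this.tasks.get(taskId);
    if (!t || t.taskKeyHash === null) return false;
    return sha256(taskKey) === t.taskKeyHash;
  }

  private get(taskId: string): LedgerTask {
    const t = this.tasks.get(taskId);
    if (!t) throw new Error(`unknown task: ${taskId}`);
    return t;
  }
}
```

A premature `finalize` returns the titles of the missing Steps instead of a TaskKey. That is the information a Stop Hook needs to block the final answer.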
---

## 💻 CLI-first Workflow

Agent Step Gate is designed as a CLI-first tool.

```text
Skill -> tells the agent how to cooperate
CLI   -> stores state, issues keys, verifies completion
Hook  -> blocks incomplete task finalization
MCP   -> optional adapter for tool-calling agents
```

The core system does not require a shared MCP server.

This makes it easier to avoid cross-terminal conflicts:

```text
Project directory
└─ .agent-step-gate/
   ├─ state.db
   ├─ sessions/
   └─ current-task.json
```

Each project owns its local execution ledger.

---

## 🪝 Hook-gated Completion

Agent Step Gate does not need to remind the agent constantly.

A recommended pattern is:

```text
Session start:
  remind the agent to use Agent Step Gate

During work:
  do not interrupt the agent frequently

Before final answer:
  Hook checks whether the current Task is complete
```

If the Task is incomplete, the Hook blocks finalization and returns the missing Step information.

Example:

```text
Task is not complete.

Current task:
  task_auth_refactor_001

Missing steps:
  - step_004: Update related tests
  - step_005: Run test suite

Please continue the task and complete the missing steps before finalizing.
```

---

## 🤖 Multi-agent Harness Usage

Agent Step Gate can also be used as a lightweight Agent Harness primitive.

A Main Agent can assign Tasks to Sub Agents:

```text
Main Agent
├─ assigns Task A to Sub Agent A
├─ assigns Task B to Sub Agent B
└─ verifies returned TaskKeys
```

Sub Agent flow:

```text
1. Receive TaskId
2. Complete Steps using StepKeys
3. Finalize Task
4. Return TaskKey to Main Agent
```

The Main Agent does not need to inspect the full conversation or execution trace of the Sub Agent.

It only needs to verify:

```text
verify-task-key(taskId, taskKey)
```

If the TaskKey is valid, the Task is considered complete.

This saves context and enables scalable multi-agent orchestration.

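On the Main Agent side, accepting work comes down to one key check per returned Task. In this sketch, `triageResults` and `SubAgentResult` are hypothetical names. `verifyTaskKey` is injected as a plain function. It stands in for the gate's verification (for example a wrapper around `asg task verify`), so the logic does not depend on any particular CLI output format.

```typescript
// Hypothetical Main Agent helper: accept only Tasks whose TaskKey verifies.
interface SubAgentResult {
  taskId: string;
  taskKey: string | null; // null when the Sub Agent could not finalize its Task
}

// Stand-in for the gate's verification step.
type VerifyFn = (taskId: string, taskKey: string) => boolean;

function triageResults(
  results: SubAgentResult[],
  verifyTaskKey: VerifyFn,
): { accepted: string[]; rejected: string[] } {
  const accepted: string[] = [];
  const rejected: string[] = [];
  for (const r of results) {
    // The Sub Agent's transcript is never read; only the key matters.
    if (r.taskKey !== null && verifyTaskKey(r.taskId, r.taskKey)) {
      accepted.push(r.taskId);
    } else {
      rejected.push(r.taskId); // candidates to resume or reassign
    }
  }
  return { accepted, rejected };
}

// Demo verifier backed by a map of issued keys (a real gate compares hashes).
const issued = new Map<string, string>([
  ["task_a", "TK_AAAA1111"],
  ["task_b", "TK_BBBB2222"],
]);
const demoVerify: VerifyFn = (id, key) => issued.get(id) === key;

const outcome = triageResults(
  [
    { taskId: "task_a", taskKey: "TK_AAAA1111" },
    { taskId: "task_b", taskKey: "TK_WRONG" },
    { taskId: "task_c", taskKey: null },
  ],
  demoVerify,
);
console.log(outcome.accepted, outcome.rejected);
```

In the demo, only `task_a` is accepted. `task_b` presented a wrong key and `task_c` never finalized. Both can be resumed or reassigned.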
---

## 📋 Example

### Create a task

```bash
asg task create \
  --title "Refactor auth token validation" \
  --steps "Read current implementation" \
          "Refactor validation logic" \
          "Update tests" \
          "Run test suite"
```

Output:

```text
Task created: task_abc123
Current step: step_001
StepKey: K8F2QZ
```

---

### Complete a step

```bash
asg step complete \
  --task task_abc123 \
  --step step_001 \
  --key K8F2QZ
```

Output:

```text
Step completed: step_001
Next step: step_002
StepKey: 9XLM2A
```

---

### Finalize a task

```bash
asg task finalize \
  --task task_abc123
```

If all Steps are complete:

```text
Task completed.
TaskKey: TK_7H3Q9Z2M
```

If not complete:

```text
Task incomplete.

Missing steps:
  - step_003: Update tests
  - step_004: Run test suite
```

---

### Verify a TaskKey

```bash
asg task verify \
  --task task_abc123 \
  --key TK_7H3Q9Z2M
```

Output:

```text
TaskKey valid.
```

---

## 📦 Program-level Usage

For large cross-session work, create a Program:

```bash
asg program create --title "Authentication module refactor"
```

Create Nodes under the Program:

```bash
asg node create \
  --program pgm_auth_refactor \
  --title "Token validation refactor"
```

Create Tasks under a Node:

```bash
asg task create \
  --node node_token_validation \
  --title "Refactor token validation implementation" \
  --steps "Read implementation" \
          "Modify validation logic" \
          "Update tests" \
          "Run tests"
```

Completion propagates naturally:

```text
All Steps completed
  -> Task completed
    -> Node may become completed
      -> Program may become completed
```

The Hook only needs to force-check the current Task.

Program and Node are higher-level planning structures and can be finalized manually or through Skill guidance.

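The upward propagation can be expressed as derived predicates. This sketch treats Node and Program completion as eligibility to finalize, in line with the "may become completed" wording, and leaves the actual finalization to a manual or Skill-guided step. The type and function names are hypothetical. An empty Task, Node, or Program is deliberately not treated as complete. This is an assumption that keeps a structure with nothing planned from passing vacuously.

```typescript
// Hypothetical completion-only view of the hierarchy.
type StepState = { id: string; completed: boolean };
type TaskState = { id: string; steps: StepState[] };
type NodeState = { id: string; tasks: TaskState[] };
type ProgramState = { id: string; nodes: NodeState[] };

// A Task is done when every planned Step is completed.
const taskDone = (t: TaskState): boolean =>
  t.steps.length > 0 && t.steps.every((s) => s.completed);

// A Node becomes eligible for finalization when all of its Tasks are done.
const nodeReady = (n: NodeState): boolean =>
  n.tasks.length > 0 && n.tasks.every(taskDone);

// A Program becomes eligible when all of its Nodes are.
const programReady = (p: ProgramState): boolean =>
  p.nodes.length > 0 && p.nodes.every(nodeReady);

const auth: ProgramState = {
  id: "pgm_auth_refactor",
  nodes: [
    {
      id: "node_token_validation",
      tasks: [
        {
          id: "task_abc123",
          steps: [
            { id: "step_001", completed: true },
            { id: "step_002", completed: false },
          ],
        },
      ],
    },
  ],
};
```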
---

## 📖 Recommended Agent Skill Instruction

A Skill can include instructions like:

```text
You are working under Agent Step Gate.

At the beginning of a task:
1. Identify the current Task.
2. Confirm the planned Steps.
3. Use the CLI to get the current StepKey.

During execution:
1. Complete the current Step.
2. Mark the Step complete with the CLI.
3. Continue until all Steps are completed.

Before final response:
1. Finalize the Task.
2. Include the TaskKey if the task is complete.
3. If the Task is incomplete, continue working instead of claiming completion.
```

The Skill is guidance only.

The real enforcement is performed by the Hook and CLI state.

---

## 🔔 Hook Behavior

The Stop Hook should check the current Task before the agent gives a final answer.

Pseudo logic:

```text
on_stop:
  currentTask = get_current_task()

  if no current task:
    allow

  result = check_task(currentTask)

  if result.completed:
    allow

  block with missing step summary
```

Example command:

```bash
asg hook check-stop
```

Possible output:

```json
{
  "allow": false,
  "taskId": "task_abc123",
  "missingSteps": [
    {
      "stepId": "step_003",
      "title": "Update tests"
    }
  ]
}
```

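The pseudo logic above can be written as a pure function that returns the same JSON shape as the example output. This sketch rests on assumptions. The `CurrentTask` and `StepRecord` shapes are guesses. A real hook would first load the current Task from the local ledger (for example `.agent-step-gate/current-task.json`, whose format is not specified here) and then call `checkStop`.

```typescript
// Hypothetical shapes; a real hook would load these from the local ledger.
interface StepRecord {
  stepId: string;
  title: string;
  completed: boolean;
}

interface CurrentTask {
  taskId: string;
  steps: StepRecord[];
}

interface StopDecision {
  allow: boolean;
  taskId?: string;
  missingSteps?: { stepId: string; title: string }[];
}

// Mirrors the pseudo logic: no current task, or no missing Steps, means allow.
function checkStop(current: CurrentTask | null): StopDecision {
  if (current === null) return { allow: true };
  const missing = current.steps
    .filter((s) => !s.completed)
    .map(({ stepId, title }) => ({ stepId, title }));
  if (missing.length === 0) return { allow: true, taskId: current.taskId };
  return { allow: false, taskId: current.taskId, missingSteps: missing };
}

// Plain-text summary for the agent when finalization is blocked.
function blockMessage(d: StopDecision): string {
  const lines = ["Task is not complete.", "", "Current task:", `  ${d.taskId ?? "(none)"}`, "", "Missing steps:"];
  for (const s of d.missingSteps ?? []) lines.push(`  - ${s.stepId}: ${s.title}`);
  lines.push("", "Please continue the task and complete the missing steps before finalizing.");
  return lines.join("\n");
}
```

Serializing a blocking decision with `JSON.stringify(decision, null, 2)` produces a document shaped like the example output. `blockMessage` renders the plain-text variant used in the Hook-gated Completion section.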
---

## 💾 Storage

Agent Step Gate stores local project state.

Recommended storage:

```text
.agent-step-gate/
├─ state.db
├─ current-session.json
├─ current-task.json
└─ logs/
```

Suggested implementation:

- SQLite
- WAL mode
- Append-only event log where possible
- Project-local isolation

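With an append-only event log, current state is derived by replaying events rather than by editing rows in place. The sketch below shows the idea in memory. The event names and the `TaskView` shape are hypothetical. A real implementation would append rows to the SQLite database (in WAL mode) instead of pushing to an array.

```typescript
// Hypothetical events; only appended, never modified.
type GateEvent =
  | { type: "task_created"; taskId: string; stepIds: string[] }
  | { type: "step_completed"; taskId: string; stepId: string }
  | { type: "task_finalized"; taskId: string };

interface TaskView {
  pending: string[]; // Step ids not yet completed, in plan order
  finalized: boolean;
}

// Rebuild the current view of every Task by replaying the log in order.
function replay(events: readonly GateEvent[]): Map<string, TaskView> {
  const tasks = new Map<string, TaskView>();
  for (const e of events) {
    switch (e.type) {
      case "task_created":
        tasks.set(e.taskId, { pending: [...e.stepIds], finalized: false });
        break;
      case "step_completed": {
        const t = tasks.get(e.taskId);
        if (t) t.pending = t.pending.filter((id) => id !== e.stepId);
        break;
      }
      case "task_finalized": {
        const t = tasks.get(e.taskId);
        // Ignore a finalize event recorded before all Steps were done.
        if (t && t.pending.length === 0) t.finalized = true;
        break;
      }
    }
  }
  return tasks;
}

const log: GateEvent[] = [];
log.push({ type: "task_created", taskId: "task_abc123", stepIds: ["step_001", "step_002"] });
log.push({ type: "step_completed", taskId: "task_abc123", stepId: "step_001" });
```

Keeping history this way makes it possible to audit when each Step was checkpointed and to rebuild state by replaying the log.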
---

## 🔑 Key Model

Agent Step Gate uses short completion keys.

```text
StepKey:
  issued after a Step is accepted

TaskKey:
  issued after all Steps in a Task are completed
```

Keys should be generated by the system, not by the agent.

Recommended generation:

```text
randomBytes -> base32/base36 string -> hash before storing
```

Store only the hash of the key.

Example:

```text
raw key: K8F2QZ
stored:  sha256(K8F2QZ)
```

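Here is a minimal sketch of the recommended pipeline using Node's built-in `node:crypto` module (`randomBytes`, `createHash`, `timingSafeEqual`). The function names are hypothetical. The Crockford base32 alphabet is also an assumption, since the README allows base32 or base36. It was picked because 32 divides 256, so masking each random byte to 5 bits yields uniformly distributed characters.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Crockford base32 alphabet: digits plus letters without I, L, O, U.
const ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// randomBytes -> base32 string. Each byte is masked to 5 bits; because 256 is
// a multiple of 32, every character is uniformly distributed.
function generateKey(length = 6, prefix = ""): string {
  let out = "";
  for (const b of randomBytes(length)) out += ALPHABET[b & 31];
  return prefix + out;
}

// Only this digest is persisted, never the raw key.
function hashKey(rawKey: string): string {
  return createHash("sha256").update(rawKey, "utf8").digest("hex");
}

// Compare digests in constant time so timing reveals nothing about a guess.
function verifyKey(rawKey: string, storedHash: string): boolean {
  const a = Buffer.from(hashKey(rawKey), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}

const stepKey = generateKey(); // 6 characters, e.g. for a StepKey
const taskKey = generateKey(8, "TK_"); // same shape as the TaskKey example
const stored = hashKey(stepKey); // what the ledger would keep
```

Six characters carry 30 bits of randomness. That is enough to catch an agent that invents a key, but it is not a strong secret against deliberate brute force. Raising `length` is cheap if that matters.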
---

## ❌ What Agent Step Gate Does Not Do

Agent Step Gate intentionally does not do these things:

- It does not evaluate code quality.
- It does not replace tests.
- It does not force the agent to use a specific implementation path.
- It does not constantly interrupt the agent.
- It does not require the Main Agent to read all Sub Agent context.
- It does not try to become a full project management system.

It only verifies execution checkpoints.

---

## 📝 Suggested Commands

The exact command names can be adapted, but the recommended command groups are:

```bash
asg program create
asg program status
asg program finalize

asg node create
asg node status
asg node finalize

asg task create
asg task current
asg task status
asg task finalize
asg task verify

asg step current
asg step complete

asg hook check-stop
asg resume
```

---

## 🛠️ Development

Example stack:

- Node.js / TypeScript
- SQLite / better-sqlite3
- Zod for validation
- Vitest for tests
- Optional MCP adapter

Install dependencies:

```bash
pnpm install
```

Run tests:

```bash
pnpm test
```

Build:

```bash
pnpm build
```

Run locally:

```bash
pnpm dev
```

---

## 🎯 Use Cases

Agent Step Gate is useful for:

- Large refactors
- Long-context coding tasks
- Multi-session development
- Multi-agent harness orchestration
- Test migration
- Documentation rewrite
- Codebase cleanup
- Security review workflows
- Release checklist execution
- **Skill composition**: embed Agent Step Gate in other Skills so that none of their steps is silently skipped

### Skill Composition

Any Skill with a multi-step workflow can wrap itself in Agent Step Gate to prevent skipped steps.

```text
User invokes "complex-refactor" Skill
└─ Skill internally:
   1. asg task create --title "Complex refactor" --steps [...]
   2. Execute each step, checkpoint as you go
   3. asg task finalize
   4. Return TaskKey as completion proof
```

Why this matters:

- **Skill authors** don't need to build their own step tracking. They declare steps, and Step Gate handles the ledger.
- **Skill users** get a guarantee that the Skill's steps were actually executed, not just claimed.
- **Recursive composition** works naturally: a high-level Skill can spawn sub-Skills, each gated independently.

Example: a `db-migration` Skill

```text
Skill: db-migration
├─ Step 1: Snapshot current schema
├─ Step 2: Generate migration SQL
├─ Step 3: Run migration (dry-run)
├─ Step 4: Verify data integrity
└─ Step 5: Apply migration

→ Agent Step Gate ensures none of these 5 steps are skipped,
  even if the agent context drifts during a multi-hour migration.
```

The Skill defines *what* to do. The Gate verifies *that it was done*.

---

## ⚡ Core Principle

```text
Do not rely on the agent's memory to guarantee task completion.

Let the agent work freely.
Let the gate verify completion.
```

---

## 📄 License

MIT