agent-step-gate 0.3.0

Files changed (68)
  1. package/ARCHITECTURE.md +393 -0
  2. package/README.md +662 -0
  3. package/SKILL.md +190 -0
  4. package/Weaver.md +140 -0
  5. package/dist/cli.d.ts +1 -0
  6. package/dist/cli.js +573 -0
  7. package/dist/cli.js.map +1 -0
  8. package/dist/core/errors.d.ts +16 -0
  9. package/dist/core/errors.js +32 -0
  10. package/dist/core/errors.js.map +1 -0
  11. package/dist/core/gate.d.ts +20 -0
  12. package/dist/core/gate.js +82 -0
  13. package/dist/core/gate.js.map +1 -0
  14. package/dist/core/keys.d.ts +18 -0
  15. package/dist/core/keys.js +37 -0
  16. package/dist/core/keys.js.map +1 -0
  17. package/dist/core/plan.d.ts +2 -0
  18. package/dist/core/plan.js +135 -0
  19. package/dist/core/plan.js.map +1 -0
  20. package/dist/core/program.d.ts +69 -0
  21. package/dist/core/program.js +191 -0
  22. package/dist/core/program.js.map +1 -0
  23. package/dist/core/reconcile.d.ts +37 -0
  24. package/dist/core/reconcile.js +198 -0
  25. package/dist/core/reconcile.js.map +1 -0
  26. package/dist/core/session.d.ts +25 -0
  27. package/dist/core/session.js +88 -0
  28. package/dist/core/session.js.map +1 -0
  29. package/dist/index.d.ts +1 -0
  30. package/dist/index.js +29 -0
  31. package/dist/index.js.map +1 -0
  32. package/dist/storage/db.d.ts +3 -0
  33. package/dist/storage/db.js +117 -0
  34. package/dist/storage/db.js.map +1 -0
  35. package/dist/storage/repository.d.ts +24 -0
  36. package/dist/storage/repository.js +449 -0
  37. package/dist/storage/repository.js.map +1 -0
  38. package/dist/tools/activeTask.d.ts +2 -0
  39. package/dist/tools/activeTask.js +41 -0
  40. package/dist/tools/activeTask.js.map +1 -0
  41. package/dist/tools/cancelTask.d.ts +2 -0
  42. package/dist/tools/cancelTask.js +39 -0
  43. package/dist/tools/cancelTask.js.map +1 -0
  44. package/dist/tools/checkpoint.d.ts +2 -0
  45. package/dist/tools/checkpoint.js +71 -0
  46. package/dist/tools/checkpoint.js.map +1 -0
  47. package/dist/tools/current.d.ts +2 -0
  48. package/dist/tools/current.js +64 -0
  49. package/dist/tools/current.js.map +1 -0
  50. package/dist/tools/finalize.d.ts +2 -0
  51. package/dist/tools/finalize.js +95 -0
  52. package/dist/tools/finalize.js.map +1 -0
  53. package/dist/tools/index.d.ts +6 -0
  54. package/dist/tools/index.js +7 -0
  55. package/dist/tools/index.js.map +1 -0
  56. package/dist/tools/startPlan.d.ts +2 -0
  57. package/dist/tools/startPlan.js +124 -0
  58. package/dist/tools/startPlan.js.map +1 -0
  59. package/dist/types/index.d.ts +142 -0
  60. package/dist/types/index.js +6 -0
  61. package/dist/types/index.js.map +1 -0
  62. package/package.json +48 -0
  63. package/scripts/interactive-demo.ts +394 -0
  64. package/scripts/mcp-call.mjs +56 -0
  65. package/scripts/prompt-check-hook.sh +27 -0
  66. package/scripts/session-start-hook.sh +47 -0
  67. package/scripts/stop-hook.mjs +83 -0
  68. package/scripts/stop-hook.sh +75 -0
package/SKILL.md ADDED
@@ -0,0 +1,190 @@
1
+ ---
2
+ name: Step Gate
3
+ description: >
4
+ Use this skill whenever working on multi-step tasks, large refactors, cross-session
5
+ development, or multi-agent orchestration — any situation where skipping a planned step
6
+ would be costly. This skill enforces an external cryptographic ledger: every planned step
7
+ must be checkpointed with a valid key before the task can be finalized. Triggers on
8
+ phrases like "multi-step plan", "refactor in phases", "orchestrate agents", "long task",
9
+ "don't skip steps", "checkpoint my work", "gate my steps", or any mention of Step Gate.
10
+ ---
11
+
12
+ # Step Gate — External Execution Ledger
13
+
14
+ An external cryptographic gate for agent task execution. It does not control *how* you
15
+ work — it only verifies *that you did* what you planned. Think of it as a
16
+ proof-of-work chain for agent steps: every completed step produces a cryptographic key,
17
+ and you cannot finalize a task without the final chain key.
18
+
19
+ ## Why this exists
20
+
21
+ Long-context agents lose track of plans. A 15-step refactor becomes 12 steps in the
22
+ agent's memory by the time it reaches step 9. Context compression drops the original
23
+ plan. A Sub Agent claims "all done" when it skipped step 7.
24
+
25
+ The Gate solves this by moving the plan ledger **outside** the agent's context. The plan
26
+ lives in SQLite. Each step is locked behind a 6-character key. Each key appears only once,
27
+ in the `start-plan` or `checkpoint` response that issues it; without it, the step cannot be checkpointed or faked.
28
+
29
+ ## Core rule
30
+
31
+ **One interaction = One Task.** At the start of each interaction, create a Task with the
32
+ steps you plan to do. Before the interaction ends, checkpoint every step and finalize
33
+ the Task. The Stop Hook will block exit if a Task is left unfinalized.
34
+
35
+ ```
36
+ Interaction start → start-plan → checkpoint × N → finalize(taskKey) → done
37
+ ```
38
+
39
+ Node and Program layers are optional — most work only needs the Task level.
40
+
41
+ ## CLI reference
42
+
43
+ All commands return JSON. Run the CLI as `node dist/cli.js` from the project root.
44
+
45
+ ### Task commands
46
+
47
+ **start-plan** — Create a task for this interaction
48
+ ```bash
49
+ node dist/cli.js start-plan '{"title":"What this task does","steps":[...]}'
50
+ ```
51
+ Each step: `id` (optional), `title` (required), `dependsOn` (string array or omit for
52
+ auto-serial), `children` (nested container). First call auto-creates session files.
53
+ Returns `taskId`, `currentSteps`, and `stepKeys`.
54
+
55
+ **checkpoint** — Complete a step and unlock its dependents
56
+ ```bash
57
+ node dist/cli.js checkpoint '{"taskId":"tsk_XXX","stepId":"tsk_XXX_yy","stepKey":"KEY"}'
58
+ ```
59
+ The key is consumed on use — it cannot be reused. Returns `nextSteps` + `nextStepKeys`
60
+ for newly unlocked steps. When all steps are done, returns `allStepsCompleted: true` and
61
+ a `taskKey`.
62
+
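The checkpoint contract described above can be modeled in memory. This is a minimal sketch, not the package's implementation: `GateStep`, `checkpoint`, `randomKey`, and the `issueKey` parameter are hypothetical names, state lives in a `Map` rather than SQLite, and `Math.random` stands in for whatever key source the package really uses.

```typescript
// Hypothetical in-memory model of the checkpoint flow; the real package
// persists state in SQLite and its internals are not shown in this diff.
type GateStepStatus = "pending" | "current" | "completed";

interface GateStep {
  id: string;
  dependsOn: string[];
  status: GateStepStatus;
  key?: string; // present only while the step is "current"
}

type CheckpointResult =
  | { ok: false; error: string }
  | { ok: true; nextSteps: string[]; nextStepKeys: Record<string, string> }
  | { ok: true; allStepsCompleted: true; taskKey: string };

// Placeholder key source: Math.random is NOT cryptographically secure.
function randomKey(length = 6): string {
  const alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
  let key = "";
  for (let i = 0; i < length; i++) {
    key += alphabet.charAt(Math.floor(Math.random() * alphabet.length));
  }
  return key;
}

function checkpoint(
  steps: Map<string, GateStep>,
  stepId: string,
  stepKey: string,
  issueKey: () => string = randomKey,
): CheckpointResult {
  const step = steps.get(stepId);
  if (!step || step.status !== "current" || step.key !== stepKey) {
    return { ok: false, error: "invalid step or key" };
  }
  step.status = "completed";
  delete step.key; // consumed: the same key can never pass again

  const all = Array.from(steps.values());
  if (all.every((s) => s.status === "completed")) {
    return { ok: true, allStepsCompleted: true, taskKey: issueKey() };
  }
  // Unlock every pending step whose dependencies are all completed.
  const nextStepKeys: Record<string, string> = {};
  for (const s of all) {
    const ready = s.dependsOn.every((d) => steps.get(d)?.status === "completed");
    if (s.status === "pending" && ready) {
      const key = issueKey();
      s.status = "current";
      s.key = key;
      nextStepKeys[s.id] = key;
    }
  }
  return { ok: true, nextSteps: Object.keys(nextStepKeys), nextStepKeys };
}
```

Passing `issueKey` as a parameter keeps the sketch testable with a deterministic key source; a second call with an already-consumed key falls into the `ok: false` branch.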
63
+ **current** — Read current progress (does NOT return keys)
64
+ ```bash
65
+ node dist/cli.js current '{"taskId":"tsk_XXX"}'
66
+ ```
67
+
68
+ **finalize** — Complete the task and auto-propagate upward
69
+ ```bash
70
+ node dist/cli.js finalize '{"taskId":"tsk_XXX","taskKey":"KEY"}'
71
+ ```
72
+ Verifies the taskKey, marks the task completed, then automatically checks whether the
73
+ parent Node (if any) and Program are also complete. Returns a `level` field: `task`,
74
+ `node`, or `program`.
75
+
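The upward propagation behind the `level` field can be pictured as a small pure function. This is a sketch under stated assumptions: `ProgramNode`, its `taskDone` flags, and `propagationLevel` are hypothetical, and the real check presumably runs against the SQLite tables rather than an array.

```typescript
// Hypothetical shapes; the real schema lives in the package's storage layer.
type FinalizeLevel = "task" | "node" | "program";

interface ProgramNode {
  id: string;
  taskDone: boolean[]; // one flag per Task in this Node
}

// After a Task is marked completed, report how far completion propagated.
function propagationLevel(nodes: ProgramNode[], nodeId: string): FinalizeLevel {
  const node = nodes.find((n) => n.id === nodeId);
  // No parent Node, or the Node still has open Tasks: stop at the Task level.
  if (!node || !node.taskDone.every(Boolean)) return "task";
  // The Node is complete; the Program is complete only if every Node is.
  const programDone = nodes.every((n) => n.taskDone.every(Boolean));
  return programDone ? "program" : "node";
}
```

A Task that belongs to no Node always yields `"task"` here, which matches the Task-only usage described in this file.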
76
+ **cancel-task** — Cancel the current session's task
77
+ ```bash
78
+ node dist/cli.js cancel-task '{"taskId":"tsk_XXX"}'
79
+ ```
80
+ Session-gated — you can only cancel your own tasks. Cross-session cancel requires
81
+ `--admin --recovery-token <token>`.
82
+
83
+ **active-task** — List active tasks
84
+ ```bash
85
+ node dist/cli.js active-task # current session only
86
+ node dist/cli.js active-task --all # all sessions
87
+ ```
88
+
89
+ ### Program commands (cross-session projects)
90
+
91
+ ```bash
92
+ node dist/cli.js program init '{"title":"Big project","nodes":[...]}'
93
+ node dist/cli.js program start '{"programId":"pgm_XXX","nodeId":"phase-1"}'
94
+ node dist/cli.js program status '{"programId":"pgm_XXX"}'
95
+ ```
96
+
97
+ Program finalization is automatic — when the last Task in the last Node is finalized,
98
+ the system propagates completion all the way up. No manual `program finalize` needed.
99
+
100
+ **program rebuild** — Rebuild node/program after plan changes (dry-run first, then `--confirm`)
101
+ ```bash
102
+ node dist/cli.js program rebuild '{"programId":"pgm_XXX"}' # dry-run
103
+ node dist/cli.js program rebuild '{"programId":"pgm_XXX"}' --confirm
104
+ ```
105
+
106
+ Always show the user the dry-run output and get confirmation before running `--confirm`.
107
+
108
+ ### Diagnostics
109
+
110
+ ```bash
111
+ node dist/cli.js gate reconcile # full read-only health check
112
+ node dist/cli.js gate reconcile '{"programId":"pgm_XXX"}' # scoped to one program
113
+ ```
114
+
115
+ ## DAG rules
116
+
117
+ **Example — parallel branches + merge point:**
118
+ ```bash
119
+ node dist/cli.js start-plan '{
120
+ "title":"Backend refactor",
121
+ "steps":[
122
+ {"id":"auth","title":"Auth module","dependsOn":[]},
123
+ {"id":"api","title":"API layer","dependsOn":[]},
124
+ {"id":"db","title":"DB migration","dependsOn":["auth"]},
125
+ {"id":"test","title":"Integration tests","dependsOn":["api","db"]}
126
+ ]
127
+ }'
128
+ # auth + api activate immediately; db waits for auth; test waits for api + db
129
+ ```
130
+
131
+ | dependsOn | Behavior |
132
+ |-----------|----------|
133
+ | `[]` (explicit empty) | Parallel entry — activated immediately |
134
+ | omitted / undefined | Auto-serial — depends on previous leaf |
135
+ | `["a", "b"]` | Merge point — unlocks after both a and b complete |
136
+ | Container with children | Children inherit the container's dependsOn |
137
+ | `skipKey` + `skipTaskId` | Skip a previously completed step (one-time use) |
138
+
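The first three rows of the table can be sketched as a resolution pass over a flat step list. This is illustrative only: `PlanStepInput`, `resolveDependencies`, and `initiallyActive` are hypothetical names, `id` is treated as required for simplicity, and containers and `skipKey` are left out because their exact semantics are not spelled out here.

```typescript
interface PlanStepInput {
  id: string;
  title: string;
  dependsOn?: string[];
}

// Resolve each step's effective dependencies for a flat plan (no containers).
function resolveDependencies(steps: PlanStepInput[]): Map<string, string[]> {
  const resolved = new Map<string, string[]>();
  let previousId: string | undefined;
  for (const step of steps) {
    if (step.dependsOn !== undefined) {
      // Explicit list: [] means parallel entry, ["a", "b"] means merge point.
      resolved.set(step.id, step.dependsOn.slice());
    } else {
      // Omitted: auto-serial, chain onto the previous step if there is one.
      resolved.set(step.id, previousId === undefined ? [] : [previousId]);
    }
    previousId = step.id;
  }
  return resolved;
}

// Steps with no dependencies are active as soon as the plan is created.
function initiallyActive(resolved: Map<string, string[]>): string[] {
  const active: string[] = [];
  resolved.forEach((deps, id) => {
    if (deps.length === 0) active.push(id);
  });
  return active;
}
```

Applied to the "Backend refactor" plan above, `initiallyActive` returns `auth` and `api`, while `db` and `test` wait on their listed dependencies.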
139
+ Cycle detection runs at plan creation time — circular dependencies are rejected before
140
+ any step starts.
141
+
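One common way to reject circular plans up front is Kahn's topological sort: if some steps can never reach zero unmet dependencies, the graph is not a valid DAG. Whether the package uses this algorithm is not stated in this file, so treat the following as a sketch; `hasCycle` is a hypothetical helper.

```typescript
// Returns true when at least one step can never be unlocked. That covers
// cycles and, as a side effect, dependencies on step ids that do not exist.
function hasCycle(deps: Map<string, string[]>): boolean {
  const remaining = new Map<string, number>(); // unmet dependency count per step
  const dependents = new Map<string, string[]>(); // reverse edges
  deps.forEach((list, id) => {
    remaining.set(id, list.length);
    for (const d of list) {
      const existing = dependents.get(d) ?? [];
      existing.push(id);
      dependents.set(d, existing);
    }
  });

  const queue: string[] = [];
  remaining.forEach((count, id) => {
    if (count === 0) queue.push(id);
  });

  let visited = 0;
  while (queue.length > 0) {
    const id = queue.shift() as string;
    visited++;
    for (const next of dependents.get(id) ?? []) {
      const left = (remaining.get(next) ?? 0) - 1;
      remaining.set(next, left);
      if (left === 0) queue.push(next);
    }
  }
  return visited < deps.size;
}
```

Running the check once at plan creation means a rejected plan never issues any step keys.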
142
+ ## Interruption recovery
143
+
144
+ When a session is interrupted, its completed steps remain as permanent cryptographic proofs, so a new plan can skip past them:
145
+
146
+ ```bash
147
+ # Rebuild with skipKey to jump past already-completed steps
148
+ node dist/cli.js start-plan '{
149
+ "title":"Resume wave 2",
150
+ "steps":[
151
+ {"id":"auth","title":"Auth module","dependsOn":[],"skipKey":"OLD_KEY","skipTaskId":"tsk_OLD"},
152
+ {"id":"ci","title":"CI tests","dependsOn":["auth"]}
153
+ ]
154
+ }'
155
+ ```
156
+
157
+ A skipKey can only be consumed once — the system writes a `skip_key_consumed` event on
158
+ first use and rejects subsequent attempts. Skipped steps are marked `skipped` (not
159
+ `completed`) to preserve traceability.
160
+
161
+ ## Key rules
162
+
163
+ 1. Keys appear exactly once — in the checkpoint or start-plan response. If lost, they
164
+ cannot be recovered. The `current` command never returns keys.
165
+ 2. Step double-consumption is impossible — the DB transaction uses `WHERE status='current'`
166
+ with an affected-rows guard.
167
+ 3. Cancel-task is session-gated — agents cannot cancel tasks they don't own.
168
+ 4. SkipKey is one-time — the `events` table records every consumption.
169
+ 5. Cycle detection runs at plan creation — dead DAGs are rejected before execution.
170
+
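Rule 2 relies on a conditional update plus an affected-rows check. The sketch below imitates that pattern over an in-memory array; `StepRow`, `completeIfCurrent`, and `consumeStep` are hypothetical, the SQL in the comment uses illustrative table and column names, and in the real system the atomicity comes from the SQLite transaction rather than from this loop.

```typescript
interface StepRow {
  id: string;
  status: "pending" | "current" | "completed";
}

// Imitates: UPDATE steps SET status='completed' WHERE id=? AND status='current'
// and returns the number of rows the statement would have changed.
function completeIfCurrent(rows: StepRow[], stepId: string): number {
  let affected = 0;
  for (const row of rows) {
    if (row.id === stepId && row.status === "current") {
      row.status = "completed";
      affected++;
    }
  }
  return affected;
}

// Guard: anything other than exactly one affected row means the step was not
// 'current' (already consumed, or never unlocked), so the checkpoint fails.
function consumeStep(rows: StepRow[], stepId: string): boolean {
  return completeIfCurrent(rows, stepId) === 1;
}
```

Because the status test lives inside the update itself, a second attempt on the same step changes zero rows and is rejected without a separate read.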
171
+ The Gate is a proof-of-completion system, not a security product. It protects against
172
+ agent hallucination, context loss, and accidental step-skipping. It does not protect
173
+ against deliberate external attack.
174
+
175
+ ## Session files
176
+
177
+ The first `start-plan` call creates:
178
+ - `.step-gate/sessions/ses_XXXXXX.json` — session credentials
179
+ - `.step-gate/bindings/bind_cli_XXXXXX.json` — hook binding
180
+
181
+ The CLI auto-discovers the session from binding files. No manual session management needed.
182
+
183
+ ## Further reading
184
+
185
+ - `Weaver.md` — Multi-agent orchestration: how a Main Agent spawns Sub Agents, injects
186
+ taskId + stepKey, and verifies returned taskKeys. Read this before orchestrating
187
+ parallel Sub Agents.
188
+ - `ARCHITECTURE.md` — Full architecture: 4-layer model, 7 DB tables, 5 credential types,
189
+ 12 CLI commands, 20+ core functions
190
+ - `docs/security-stress-test-report.md` — Security audit: 9 issues, all resolved
package/Weaver.md ADDED
@@ -0,0 +1,140 @@
1
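+ # Weaver: Step Gate Orchestration Engine
2
+
3
+ ## Three-tier roles
4
+
5
+ ```
6
+ Main Agent (orchestrator) ← holds the global Node/Program view
7
+ │ does only three things: dispatch, verify, advance
8
+ │ writes no code, executes no Steps
9
+
10
+ ├── Sub Agent A ← knows only its own taskId + taskGoal
11
+ ├── Sub Agent B ← does not know other Tasks or the DAG
12
+ └── Sub Agent C ← does not know the global Node/Program view
13
+ ```
14
+
15
+ The Main Agent injects each Sub Agent's context precisely at spawn time. A Sub Agent cannot see the global plan, does not know the preceding or following Tasks, and holds no verification logic.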
16
+
17
+ ## Full execution flow
18
+
19
+ ```
20
+ ═══════════════════════════════════════════════════════
21
+ Phase 0: Planning
22
+ ═══════════════════════════════════════════════════════
23
+ Main Agent:
24
+ program init → split the work into Nodes
25
+ reconcile → routine diagnostics
26
+
27
+ ═══════════════════════════════════════════════════════
28
+ Phase 1: Start a Node
29
+ ═══════════════════════════════════════════════════════
30
+ Main Agent:
31
+ program start <node-id> ← bind the session to the node
32
+ start-plan → create a Task (DAG) ← one interaction = one Task
33
+ → receive taskId + stepKeys
34
+
35
+ ═══════════════════════════════════════════════════════
36
+ Phase 2: Dispatch
37
+ ═══════════════════════════════════════════════════════
38
+ Main Agent → Sub Agent:
39
+ {
40
+ "taskId": "tsk_XXX",
41
+ "taskGoal": "Extract the auth middleware",
42
+ "constraints": ["Stay within this Task's scope", "Call checkpoint when done"]
43
+ }
44
+
45
+ The Sub Agent starts in the same working directory:
46
+ → ensureSession() auto-discovers the session from .step-gate/bindings/
47
+ → no need to pass sessionId manually
48
+
49
+ ═══════════════════════════════════════════════════════
50
+ Phase 3: Sub Agent execution loop
51
+ ═══════════════════════════════════════════════════════
52
+ Sub Agent:
53
+ current(taskId)
54
+ → { currentSteps, stepKeys }
55
+
56
+ for each step:
57
+ execute the step
58
+ checkpoint(taskId, stepId, stepKey)
59
+ → { nextSteps, nextStepKeys }
60
+ → or { allStepsCompleted: true, taskKey }
61
+
62
+ ═══════════════════════════════════════════════════════
63
+ Phase 4: Return credentials
64
+ ═══════════════════════════════════════════════════════
65
+ Sub Agent → Main Agent:
66
+ {
67
+ "taskId": "tsk_XXX",
68
+ "taskKey": "A1B2C3",
69
+ "summary": "Auth middleware extraction complete",
70
+ "artifacts": ["src/middleware/auth.ts"]
71
+ }
72
+
73
+ ═══════════════════════════════════════════════════════
74
+ Phase 5: Main Agent verification + automatic advance
75
+ ═══════════════════════════════════════════════════════
76
+ Main Agent:
77
+ finalize(taskId, taskKey)
78
+
79
+ ✅ Pass:
80
+ → returns { ok: true, level, ... }
81
+ → level="task": the Node still has unfinished Tasks; keep dispatching
82
+ → level="node": Node complete; nodeKey is returned and the flow advances automatically
83
+ → level="program": all Nodes complete; work is done
84
+ → the Sub Agent is released
85
+
86
+ ❌ Fail:
87
+ → returns { actualStatus, completedSteps, missingSteps,
88
+ currentStepId, stepKey }
89
+ → the Main Agent sends the real ledger back to the Sub Agent:
90
+ "Your TaskKey failed Gate verification.
91
+ Completed: step_001, step_002
92
+ Missing: step_003, step_004
93
+ Continue from step_003, StepKey: SK_REAL33"
94
+ → the Sub Agent resumes checkpointing from currentStepId
95
+ → once fixed, call finalize again
96
+
97
+ ═══════════════════════════════════════════════════════
98
+ Phase 6: Next Node (automatic)
99
+ ═══════════════════════════════════════════════════════
100
+ When finalize returns level="node", the Main Agent:
101
+ program status → find the next ready node
102
+ program start <next-node>
103
+ → create a new Task → dispatch → repeat
104
+
105
+ ═══════════════════════════════════════════════════════
106
+ Wrap-up
107
+ ═══════════════════════════════════════════════════════
108
+ When the last Node completes:
109
+ finalize → level="program" → Program completed
110
+ done
111
+ ```
112
+
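The Phase 5 and Phase 6 branching can be summarized as one decision function on the Main Agent side. This is a sketch, not the package's API: `FinalizeResult`, `NextAction`, and `decideNextAction` are hypothetical, and the `ok: false` discriminator on the failure shape is an assumption, since the flow above lists the failure fields without it.

```typescript
// Hypothetical result shapes modeled on the fields named in Phase 5.
type FinalizeResult =
  | { ok: true; level: "task" | "node" | "program" }
  | {
      ok: false; // assumed discriminator, not confirmed by the flow above
      actualStatus: string;
      completedSteps: string[];
      missingSteps: string[];
      currentStepId: string;
      stepKey: string;
    };

type NextAction =
  | { kind: "dispatch-next-task" }
  | { kind: "start-next-node" }
  | { kind: "done" }
  | { kind: "return-to-sub-agent"; message: string };

function decideNextAction(result: FinalizeResult): NextAction {
  if (!result.ok) {
    // Send the real ledger back so the Sub Agent can resume checkpointing.
    const message = [
      "Your TaskKey failed Gate verification.",
      `Completed: ${result.completedSteps.join(", ")}`,
      `Missing: ${result.missingSteps.join(", ")}`,
      `Continue from ${result.currentStepId}, StepKey: ${result.stepKey}`,
    ].join("\n");
    return { kind: "return-to-sub-agent", message };
  }
  switch (result.level) {
    case "task":
      return { kind: "dispatch-next-task" }; // Node still has open Tasks
    case "node":
      return { kind: "start-next-node" }; // program status, then program start
    case "program":
      return { kind: "done" };
  }
}
```

Keeping the decision pure makes the orchestration loop easy to test without spawning any Sub Agents.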
113
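+ ## Key design points
114
+
115
+ **The Main Agent calls only one command**: `finalize(taskKey)`. The system propagates completion from Task → Node → Program automatically.
116
+
117
+ **Verifying a TaskKey consumes it**: finalize consumes the taskKey and advances the DAG; there is no "verified but not advanced" state.
118
+
119
+ **What a Sub Agent does not need to know**:
120
+ - the structural meaning of the taskId
121
+ - the full DAG
122
+ - which Tasks come before or after
123
+ - the global Node/Program view
124
+ - the verification logic (the system verifies on its own)
125
+
126
+ **Interruption recovery**: rebuild from taskId + skipKey; old step credentials are retained permanently.
127
+
128
+ **Task-only mode**: without Program/Node, you only need `start-plan → checkpoint → finalize`. One Task per interaction; the Stop Hook checks automatically when the interaction ends.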
129
+
130
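+ ## Progressive disclosure
131
+
132
+ ```
133
+ SKILL.md (execution protocol) ← required reading for every Agent; basic CLI commands
134
+ └─ Weaver.md (orchestration engine) ← for the Main Agent; how to orchestrate Sub Agents
135
+ └─ CLI (state machine) ← underlying implementation
136
+ └─ SQLite (persistence)
137
+ ```
138
+
139
+ A Sub Agent needs only the CLI commands in SKILL.md, not Weaver.md.
140
+ The Main Agent needs SKILL.md + Weaver.md.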
package/dist/cli.d.ts ADDED
@@ -0,0 +1 @@
1
+ export {};